WO2019114955A1 - Detecting user attention in immersive video - Google Patents

Detecting user attention in immersive video

Info

Publication number: WO2019114955A1
Authority: WO (WIPO (PCT))
Prior art keywords: immersive video, gaze vector, interest, region, immersive
Application number: PCT/EP2017/082693
Other languages: French (fr)
Inventor: Oliver Baumann
Original Assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2017/082693
Publication of WO2019114955A1

Classifications

    • H04N 21/2547: Third party billing, e.g. billing of advertiser
    • H04N 21/44222: Analytics of user selections, e.g. selection of programs or purchase activity
    • G02B 27/017: Head-up displays; head mounted
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/013: Eye tracking input arrangements
    • G06F 3/014: Hand-worn input/output arrangements, e.g. data gloves
    • G06F 3/0485: Scrolling or panning (GUI interaction techniques)
    • G06Q 30/0241: Advertisements (marketing)
    • H04N 21/23418: Processing of video elementary streams at the server, involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/4223: Cameras (input-only client peripherals)
    • H04N 21/44008: Processing of video elementary streams at the client, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44218: Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N 21/6582: Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number
    • H04N 21/812: Monomedia components involving advertisement data
    • H04N 21/816: Monomedia components involving special video data, e.g. 3D video
    • G02B 2027/014: Head-up displays comprising information/image processing systems
    • G02B 2027/0187: Display position adjusting means slaved to motion of at least a part of the body of the user, e.g. head, eye

Definitions

  • the region of interest may be an advertisement.
  • the advertisement may be a product placement.
  • the portion of the immersive video within the region of interest may be a still image or a video.
  • the method may further comprise using the identified match to identify a view of an advertisement.
  • an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
  • the method may further comprise aggregating matches for a plurality of users, and determining the total number of views of an advertisement.
  • the method may be applied to every user within a group to measure the number of views within that group.
  • the method may be applied in a subset of a group of users to give an estimate of the number of views in that group.
  • a user terminal arranged to measure views of at least one region of interest, the user terminal comprising: an immersive display arranged to show a portion of an immersive video; a directional input for detecting a gaze direction of the user.
  • the user terminal is further arranged to: update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction; and determine a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the user terminal is further arranged to: receive an indication of at least one region of interest in the immersive video; identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
  • the directional input may be a user interface device such as a mouse, keyboard, or some other form of controller.
  • the directional input may be an orientation sensor.
  • the orientation sensor may be in the immersive display.
  • the immersive display may be a headset.
  • the user terminal may be further arranged to send identified matches to a server.
  • an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
  • a server arranged to measure views of at least one region of interest, the server arranged to: receive a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the server is further arranged to receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
  • the server may be further arranged to use the identified match to identify a view of an advertisement.
  • the server may be further arranged to aggregate matches for a plurality of users, and determine the total number of views of an advertisement.
  • a user terminal arranged to measure views of at least one region of interest, the user terminal comprising: an immersive display arranged to show a portion of an immersive video; and a directional input for detecting a gaze direction of the user.
  • the user terminal is arranged to: update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction; and determine a gaze vector and a timestamp, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the user terminal is further arranged to send the gaze vector and the timestamp to a server.
  • the detected gaze direction of the user is used to determine the gaze vector.
  • a method in a user terminal arranged to display immersive video to a user comprising: displaying a portion of an immersive video; and detecting a gaze direction of the user and determining a gaze vector.
  • the method further comprises updating the portion of the immersive video displayed on the immersive display in response to the determined gaze vector; and sending a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • an apparatus for measuring views of a region of interest in immersive video comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: receive a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded; receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
  • an apparatus for displaying immersive video to a user comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: display a portion of an immersive video; detect a gaze direction of the user and determine a gaze vector; and send a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the computer program product may be in the form of a non-volatile memory or volatile memory, e.g. an EEPROM (Electrically Erasable Programmable Read-only Memory), a flash memory, a disk drive or a RAM (Random-access memory).
  • Figure 1 illustrates a user terminal displaying a portion of an immersive video
  • Figure 2 shows a man watching an immersive video on his smartphone
  • Figure 3 shows a woman watching an immersive video on a virtual reality headset
  • Figures 4a, 4b and 4c illustrate three different orientations of a displayed area in relation to an advertisement displayed in a portion of an immersive video
  • Figure 5 illustrates a threshold area
  • Figure 6 illustrates a method for measuring views of a region of interest in immersive video
  • Figure 7 illustrates a further method comprising aggregating matches for a plurality of users, and determining the total number of views of a region of interest
  • Figure 8 illustrates a user terminal arranged to measure views of at least one region of interest
  • Figure 9 illustrates a server arranged to measure views of at least one region of interest.
  • Figure 10 illustrates a further method which may be performed in a user terminal arranged to display immersive video to a user.
  • An embodiment of the invention may include functionality that may be implemented as software executed by a processor, hardware circuits or structures, or a combination of both.
  • the processor may be a general-purpose or dedicated processor, such as a processor from the families of processors made by Intel Corporation or Motorola.
  • the software may comprise programming logic, instructions or data to implement certain functionality for an embodiment of the invention.
  • the software may be stored in a medium accessible by a machine or computer-readable medium, such as read-only memory (ROM), random-access memory (RAM), magnetic disk (e.g., floppy disk and hard drive), optical disk (e.g., CD-ROM) or any other data storage medium.
  • the media may store programming instructions in a compressed and/or encrypted format, as well as instructions that may have to be compiled or installed by an installer before being executed by the processor.
  • an embodiment of the invention may be implemented as specific hardware components that contain hard-wired logic for performing the recited functionality, or by any combination of programmed general-purpose computer components and custom hardware components.
  • the gaze vector is sent back to the service provider where a back-office system processes it to determine whether any match has occurred.
  • the proposed solution can provide analytic data on, for example, how often or for how long a region of interest such as an advertisement is viewed.
  • the service provider knows an advert is being viewed simply because the channel is active or, in the case of over-the-top delivery, the media has been requested. A reasonable likelihood that the advertisement has been viewed can be associated with that information.
  • viewing statistics are used by providers to target advertising to the viewer and also report viewing figures back to the advertisement commissioning company.
  • Immersive or 360° video will offer a very different advertising medium. It may be possible for multiple advertisements or product placements to be in the full 360° frame at the same time. The nature of the consumption of 360° video is such that not all these advertising features will be visible to the user at any one time. In the transition to 360° media one could imagine multiple standard, planar adverts being in the 360° frame. The user selects which advert to view simply by looking at it. The audio for the one currently in the user's field of view is delivered to the user's headphones, for example. Without knowledge of the viewer's gaze vector there will be no way of knowing which adverts or products were viewed.
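As an illustration of the advert-selection behaviour just described, the following sketch picks the panel nearest the current gaze direction so its audio can be routed to the headphones. The names and the 30° focus threshold are assumptions for illustration, not values from the application.

```python
def advert_in_focus(gaze_az_deg, adverts, max_separation_deg=30.0):
    """adverts: list of (name, azimuth_deg) panels placed around the 360° frame.

    Returns the advert nearest the gaze direction, whose audio would be routed
    to the user's headphones, or None if no advert is close enough.
    """
    def separation(ad_az):
        # Smallest absolute angular difference, wrapped to [-180, 180).
        return abs((gaze_az_deg - ad_az + 180.0) % 360.0 - 180.0)

    name, az = min(adverts, key=lambda ad: separation(ad[1]))
    return name if separation(az) <= max_separation_deg else None

adverts = [("car", 0.0), ("cola", 120.0), ("phone", 240.0)]
print(advert_in_focus(115.0, adverts))  # -> "cola": its audio would be played
print(advert_in_focus(60.0, adverts))   # -> None: no advert in focus
```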
  • a system for detecting user attention in immersive video. This works by determining the gaze vector used by a viewer of 360° video in association with the time instance of the video. Using this gaze vector and time instance, the user terminal, or a back-office system at a remote server, will be able to identify, by correlation with a database of semantic video descriptive data, areas or items being viewed by the viewer. This data may be used to aggregate analytic data on which adverts are being viewed and for how long.
  • the system allows service providers to identify data about how often and for how long advertisements and placed products are viewed. This will feed into their advertising business model by, for example, charging for advertising space on a per-view or duration-viewed basis.
  • a 360° video asset comprises a projection of a spherical video captured by a specialized 360° camera. It can be thought of as a video projected on the inside of a sphere with the camera, and later the viewer, positioned at the center of the sphere. These assets are consumed by the viewer using an app on a PC, tablet, smartphone or head-mounted device (which may be a smartphone in a headset, as per the Samsung Gear VR).
  • the part of the 360° scene displayed in the screen of the viewing device is determined either by manual intervention, such as the click-and-drag of a mouse or the swipe of a tablet screen, or by feedback control by sensors (accelerometers/gyroscopes) in the device.
  • the concept of a gaze vector defines the azimuth and elevation angles from the center of the sphere to the center of the viewport.
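For illustration only (this sketch is not part of the published application, and the coordinate conventions are assumptions): a gaze vector given as azimuth and elevation angles can be converted to a unit vector from the sphere center, so that its angular separation from an interest vector can be computed with a dot product.

```python
import math

def to_unit_vector(azimuth_deg, elevation_deg):
    """Map (azimuth, elevation) angles from the sphere center to a unit vector.

    Assumed convention: azimuth 0 faces forward (+x), positive azimuth turns
    towards +y, positive elevation points up (+z).
    """
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angular_distance_deg(a, b):
    """Great-circle angle between two unit vectors, in degrees."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))

# Example: a gaze vector at 10° azimuth / 5° elevation is about 11.2° away
# from an interest vector at the viewport center (0°/0°).
print(angular_distance_deg(to_unit_vector(10.0, 5.0), to_unit_vector(0.0, 0.0)))
```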
  • the gaze vector is at the heart of the feedback loop used to determine what is displayed in the device screen, but can also be logged by the app running on the device and transmitted back to the service provider.
  • the gaze vector is determined from information received from the directional input.
  • the directional input may be positional sensors on the device, such as magnetic and/or gyroscopic sensors on a smartphone or VR headset.
  • the directional input may come from a traditional user interface device such as any combination of a keyboard, a mouse, or a game controller, such as a joystick or joypad.
  • the gaze vector may additionally be determined by eye tracking, whereby a camera on the user device monitors the eyeball position. Such eye-tracking position data can tell the user device where on the display the user's attention is focused. This eye-tracking data can be used to further refine where the gaze vector lies.
  • eye-tracking allows refinement of the gaze vector.
  • the increased level of accuracy may allow for a reduction in the size of a threshold area, within which the gaze vector must point for a minimum period of time for an advertisement impression to be counted.
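One possible form of this refinement, sketched under the assumption that the eye tracker reports small angular offsets relative to the viewport center (the names and the small-angle treatment are illustrative):

```python
def refine_gaze(head_az_deg, head_el_deg, eye_az_offset_deg, eye_el_offset_deg):
    """Combine the head-orientation gaze estimate with eye-tracking offsets.

    Small-angle approximation: the eye offsets are simply added to the head
    azimuth/elevation; large offsets would need a proper rotation composition.
    """
    return head_az_deg + eye_az_offset_deg, head_el_deg + eye_el_offset_deg

# Head centered at 30°/0°, eyes glancing 4° further right and 2° up:
print(refine_gaze(30.0, 0.0, 4.0, 2.0))  # -> (34.0, 2.0)
```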
  • the service provider, responsible for delivering the video asset to the viewer, may wish to sell advertising in this 360° space. This may be either by advertisements or product placements (see examples below). Since not all of the 360° space is observable to any individual viewer at any point in time, it is not known whether the advert or placed product has been viewed. This system feeds back the direction the viewer is looking, the gaze vector, to the service provider to be stored and used to inform the advertising business model.
  • planar advertisements, as are currently delivered to standard television sets, are presented in the 360° space. These can be thought of as video panels shown in an area of the immersive video, that area being viewable within a field of view of the user. A user may thus turn to look at any one of several advertisements displayed concurrently in different areas of the immersive field. The user might watch the video advertisement of most interest to them. The gaze vector information from such a use case may be used to profile the user, understanding the type of adverts they tend to watch and for how long. This will allow the service provider to deliver more relevant advertisements.
  • the gaze vector information may be used to identify which adverts have been viewed and for how long. This data might feed into the way the advertising space is charged.
  • the content includes a product placement but this is only shown in a subsection of the immersive video, in a region of interest.
  • where the service provider is responsible for the production of video assets (e.g. Netflix Originals), advertising may be sold in the form of product placements.
  • the specific brand of beer or coffee used by an actor might be sponsored.
  • the gaze vector fed back to the service provider from multiple users will allow them to know how often the placed product is in the field of view or directly in the line of sight of the user.
  • the gaze vector is used for 360° advertisement analytics.
  • An advertisement may be a 360° video asset in itself. Knowledge of what users are (and aren't) looking at in the advert will likely be of interest to the sponsor/advertisement producer, allowing them to produce more compelling 360° advertisements.
  • the gaze vector data in this case may be sold back to the sponsor/producer in various forms, from the raw vectors to distilled analytics.
  • this content analysis is of interest to all types of immersive content producers and so is also relevant beyond the scope of advertising.
  • 360° video data is sent from the service provider to the rendering device, which may be a user terminal.
  • the video asset may have semantic metadata which records the locations of placed products or advertisements in the 360° space.
  • the gaze vector data and associated video time is logged by an application on the rendering device and this information is transmitted back to the service provider where it is stored in a database.
  • the gaze vector data passed back may be, for example, a subsampled or temporally averaged version of that used in the motion (or interaction) display loop.
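A sketch of such subsampling follows; the window length and tuple layout are assumptions, not details taken from the application. Frame-rate gaze samples are averaged over fixed windows before transmission.

```python
from statistics import mean

def average_gaze_log(samples, window_s=0.5):
    """samples: iterable of (t_seconds, azimuth_deg, elevation_deg) tuples at
    frame rate. Returns one averaged (t_window_start, az, el) per window.

    Note: plain averaging of azimuth breaks near the ±180° wrap-around; a
    production version would use circular statistics there.
    """
    out, bucket, window_start = [], [], None
    for t, az, el in samples:
        if window_start is None:
            window_start = t
        if t - window_start >= window_s and bucket:
            out.append((window_start,
                        mean(b[0] for b in bucket),
                        mean(b[1] for b in bucket)))
            bucket, window_start = [], t
        bucket.append((az, el))
    if bucket:  # flush the final partial window
        out.append((window_start,
                    mean(b[0] for b in bucket),
                    mean(b[1] for b in bucket)))
    return out

# Example: one second of 60 Hz gaze samples collapses to two logged entries.
frames = [(i / 60.0, 10.0 + i * 0.01, 0.0) for i in range(60)]
print(average_gaze_log(frames, window_s=0.5))
```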
  • a process running in the server will then aggregate gaze vector data from multiple users and correlate that with the metadata associated with the video asset to allow the system to identify which adverts, advert regions, product placements or areas of content have been viewed, how many times and for how long.
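A minimal sketch of that aggregation, under an assumed record shape (the application does not specify a data model): per-user match durations are grouped per advertisement to give distinct-viewer counts and total viewing time.

```python
from collections import defaultdict

def aggregate_matches(match_records):
    """match_records: iterable of (user_id, ad_id, matched_seconds) tuples.

    Returns {ad_id: {"viewers": number of distinct users,
                     "seconds": total matched viewing time}}.
    """
    viewers = defaultdict(set)
    seconds = defaultdict(float)
    for user_id, ad_id, secs in match_records:
        viewers[ad_id].add(user_id)
        seconds[ad_id] += secs
    return {ad: {"viewers": len(users), "seconds": seconds[ad]}
            for ad, users in viewers.items()}

# Two users watched the "car" placement; one also glanced at the "cola" panel.
records = [("u1", "car", 3.5), ("u2", "car", 1.0), ("u1", "cola", 2.0)]
print(aggregate_matches(records))
```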
  • Immersive video can take a plurality of forms. Referring now in detail to the drawings, there is illustrated in Figure 1 a user terminal 100 displaying a portion of an immersive video 180.
  • the user terminal is shown as a smartphone and has a screen 110, which is shown displaying a selected portion 185 of immersive video 180.
  • immersive video 180 is a panoramic or cylindrical view of a city skyline.
  • Smartphone 100 comprises gyroscope sensors to measure its orientation, and in response to changes in its orientation the smartphone 100 displays different sections of immersive video 180. For example, if the smartphone 100 were rotated to the left about its vertical axis, the portion 185 of video 180 that is selected would also move to the left and a different area of video 180 would be displayed.
  • the user terminal 100 may comprise any kind of personal computer such as a television, a smart television, a set-top box, a games-console, a home-theatre personal computer, a tablet, a smartphone, a laptop, or even a desktop PC.
  • the gaze vector, which is the direction in which the user is looking at any given time within an immersive video, is used to identify whether a user has seen an advertisement.
  • FIG. 2 shows a man watching a video 280 on his smartphone 200.
  • Smartphone 200 is an example of a user terminal. It has a display 210 which displays area 285 of the video 280.
  • the video 280 is illustrated by a grid of dashed lines to demonstrate the curvature of the video surface.
  • the area of the video 280 that is displayed by smartphone 200 changes as the user changes the orientation of this smartphone 200.
  • the selection of the video area to be displayed by the smartphone 200 is defined by a physical location and/or orientation of the smartphone 200. This information is obtained from sensors in the smartphone 200, such as a magnetic sensor (or compass) and a gyroscope. Alternatively, the smartphone 200 may have a camera and use this together with image processing software to determine a relative orientation of the smartphone 200.
  • the displayed area may also be based on user input to the smartphone 200. For example, such a user input may be via a touch screen on the smartphone 200.
  • FIG 3 shows a woman watching video 380 on a virtual reality headset 300.
  • the virtual reality headset 300 comprises a display 310.
  • the display 310 may comprise a screen, or a plurality of screens, or a virtual retina display that projects images onto the retina.
  • the video 380 is again illustrated here as a grid of dashed lines.
  • the area of video 380 that is displayed to the user is selected as the user changes the orientation of her head, and also the orientation of the headset strapped to her head. The user sees only displayed area 385 of video 380.
  • the video surfaces in figures 2 and 3 are illustrated as the inside of a sphere.
  • the video surface may be shaped like the inside surface of a cylinder.
  • the vertical extent of the immersive video is limited by the top and bottom edges of that cylinder. If the cylinder wraps fully around the user, then this is a format of 360° video that gives a 360° field of view.
  • the selection of a subset of video segments by the user terminal is defined by a physical location and/or orientation of the headset 300. This information is obtained from gyroscope and/or magnetic sensors in the headset. The selection may also be based on user input to the user terminal. For example, such a user input may be via a keyboard connected to the headset 300.
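For illustration (the field-of-view values and the equirectangular framing are assumptions, not from the application): the displayed area can be derived from the gaze direction and the device field of view as a rectangle of azimuth/elevation centered on the gaze vector.

```python
def viewport_bounds(gaze_az_deg, gaze_el_deg, h_fov_deg=90.0, v_fov_deg=60.0):
    """Return ((az_left, az_right), (el_bottom, el_top)) of the displayed area.

    Azimuths are wrapped to [-180, 180); elevation is clamped at the poles.
    """
    wrap = lambda a: (a + 180.0) % 360.0 - 180.0
    az_left = wrap(gaze_az_deg - h_fov_deg / 2)
    az_right = wrap(gaze_az_deg + h_fov_deg / 2)
    el_bottom = max(-90.0, gaze_el_deg - v_fov_deg / 2)
    el_top = min(90.0, gaze_el_deg + v_fov_deg / 2)
    return (az_left, az_right), (el_bottom, el_top)

# Looking 30° left of center, a 90° horizontal field of view spans
# azimuths -75° to +15°.
print(viewport_bounds(-30.0, 0.0))
```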
  • Figures 4 a, b, c illustrate three different orientations of a displayed area 485 in relation to an advertisement 450 displayed in a portion of an immersive video 480.
  • a gaze vector 490 is at the center of the displayed area 485.
  • the displayed area 485 wholly encompasses the advertisement area 450.
  • the user will have seen the advertisement for "Fluff".
  • the displayed area 485 only slightly overlaps the advertisement area 450.
  • the user may notice the advertisement area 450, but will not be exposed to the message and so this will not count as a view.
  • the gaze vector 490 is sufficiently far away from the advertisement area 450 that the advertisement area 450 falls entirely outside the displayed area 485. The user is deemed not to have viewed the advertisement.
  • Figure 5 illustrates a threshold area 570. If the gaze vector 590 is within the threshold area 570, the user sees a sufficient proportion of the advertisement area 550 that an advertising impression is deemed to have been made and this counts as a view. Of course, as well as this spatial threshold, a temporal threshold may be applied whereby the gaze vector 590 must be within the threshold area 570 for a minimum period of time for an advertisement impression to be counted. If the gaze vector 590 enters the advertisement area 550 only momentarily then an advertisement impression is not counted.
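The combined spatial and temporal test might be sketched as follows; the threshold values are illustrative, as the application leaves them open.

```python
def impression_made(samples, roi_az_deg, roi_el_deg,
                    threshold_deg=15.0, min_dwell_s=1.0):
    """samples: time-ordered list of (t_seconds, az_deg, el_deg) gaze samples.

    Counts an impression only if the cumulative time the gaze vector spends
    inside the threshold area reaches min_dwell_s; a momentary pass through
    the area therefore does not count.
    """
    dwell = 0.0
    for (t0, az, el), (t1, _, _) in zip(samples, samples[1:]):
        inside = (abs(az - roi_az_deg) <= threshold_deg
                  and abs(el - roi_el_deg) <= threshold_deg)
        if inside:
            dwell += t1 - t0
            if dwell >= min_dwell_s:
                return True
    return False

# A 0.4 s glance across the area does not count; a steady 1.2 s gaze does.
glance = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.4, 40.0, 0.0), (0.6, 40.0, 0.0)]
steady = [(0.0, 0.0, 0.0), (0.6, 0.0, 0.0), (1.2, 0.0, 0.0), (1.8, 40.0, 0.0)]
print(impression_made(glance, 0.0, 0.0), impression_made(steady, 0.0, 0.0))
```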
  • Figure 6 illustrates a method for measuring views of a region of interest in immersive video, the method comprising: receiving a gaze vector 610 and receiving a timestamp 620.
  • the gaze vector defines the view of a user viewing immersive video, and the timestamp indicates a point in time in the immersive video playback when the gaze vector was recorded.
  • the method further comprises receiving an indication 630 of at least one region of interest in the immersive video.
  • a determination is made at box 640 as to whether the gaze vector is within a threshold distance of the region of interest.
  • a match is identified 650 when the gaze vector approaches within a threshold distance of the at least one region of interest. If the gaze vector is beyond a threshold distance from the region of interest it is determined 660 there is no match.
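A direct sketch of this flow, under an assumed region-of-interest record (the distance measure here is a simple angular box; the application also contemplates measuring distance at the projection plane):

```python
def find_match(gaze_az_deg, gaze_el_deg, timestamp_s, regions,
               threshold_deg=20.0):
    """regions: iterable of dicts with keys roi_id, az, el, t_start, t_end.

    Returns the first matching roi_id (box 650), or None when the gaze vector
    is beyond the threshold distance of every active region (box 660).
    """
    for roi in regions:
        if not (roi["t_start"] <= timestamp_s <= roi["t_end"]):
            continue  # region of interest not active at this timestamp
        d_az = (gaze_az_deg - roi["az"] + 180.0) % 360.0 - 180.0  # wrap azimuth
        d_el = gaze_el_deg - roi["el"]
        if abs(d_az) <= threshold_deg and abs(d_el) <= threshold_deg:
            return roi["roi_id"]
    return None

rois = [{"roi_id": "ad-1", "az": 90.0, "el": 0.0, "t_start": 10.0, "t_end": 40.0}]
print(find_match(80.0, 5.0, 12.0, rois))  # -> "ad-1": match identified
print(find_match(80.0, 5.0, 50.0, rois))  # -> None: region no longer active
```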
  • the immersive video is a two-dimensional curved surface and may be thought of as the inside of a sphere or the inside of a cylinder.
  • the gaze vector defines a spatial point in the immersive video. It is called a vector because it starts at the center of the sphere or cylinder and projects out to the inside surface of the sphere or cylinder.
  • the region of interest is an area of the immersive video where, for some time period, an advertisement is shown.
  • the advertisement may be a product placement.
  • Such advertisement space may be sold on a pay per view basis, and the present document defines a way to measure such views in an immersive video arrangement.
  • a match identified in the above method may determine an advertisement view, it may be a component in determining an advertisement view, or it may be used outside the field of advertising for example in immersive video content analysis.
  • Such content analysis may be used to determine what action a user has missed or has seen, in order to tailor future content for that user, or to modify the content presentation for future users.
  • the gaze vector and timestamp are sent back to the service provider where a back-office system processes them.
  • by correlating the gaze vector and timestamp data with a database of known advertisement locations, we can provide analytic data on, for example, how often or for how long an advertisement or placed product is viewed.
  • the gaze vector may define a viewing area of the immersive video.
  • the region of interest may have a boundary in both viewing area and time.
  • the region of interest may be defined by an interest vector at its center, such that a match is identified when a gaze vector and interest vector approach within a threshold distance.
  • the distance may be measured at the plane of the immersive video projection.
  • the threshold distance may be defined such that any object within the threshold distance of the gaze vector is observable to the viewer of the immersive video.
  • the threshold distance may comprise a threshold area.
  • the threshold area may have different horizontal and vertical dimensions to match the broader horizontal field of view in humans.
  • the threshold area may match the expected field of view of the user.
  • the threshold area may be smaller than the field of view of the user.
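One way to realise a threshold area with distinct horizontal and vertical extents is an elliptical angular test; the semi-axis values below are assumptions chosen only to reflect the wider horizontal field of view.

```python
def within_threshold_area(gaze_az_deg, gaze_el_deg, roi_az_deg, roi_el_deg,
                          h_semi_deg=30.0, v_semi_deg=20.0):
    """True if the gaze direction falls inside an ellipse centered on the
    region of interest, wider horizontally than vertically."""
    d_az = (gaze_az_deg - roi_az_deg + 180.0) % 360.0 - 180.0  # wrap azimuth
    d_el = gaze_el_deg - roi_el_deg
    return (d_az / h_semi_deg) ** 2 + (d_el / v_semi_deg) ** 2 <= 1.0

print(within_threshold_area(20.0, 5.0, 0.0, 0.0))   # True: inside the ellipse
print(within_threshold_area(29.0, 10.0, 0.0, 0.0))  # False: outside
```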
  • the region of interest may be an advertisement.
  • the advertisement may be a product placement.
  • the portion of the immersive video within the region of interest may be a still image or a video.
  • the method may further comprise using the identified match to identify a view of an advertisement or a product placement.
  • an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
  • Figure 7 illustrates a further method comprising aggregating matches for a plurality of users, and determining the total number of views of a region of interest.
  • Match information, as determined for example by the method illustrated in Figure 6, is received 710 for a plurality of users. These results are then aggregated 720 to determine the total number of views of an advertisement.
  • the method may be applied to every user within a group to measure the number of views within that group. Alternatively, the method may be applied in a subset of a group of users to give an estimate of the number of views in that group.
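A short illustration of the sampling option (all numbers invented): views measured in a random subset scale to a group estimate by the sampling fraction.

```python
def estimate_group_views(sample_views, sample_size, group_size):
    """Scale views observed in a sampled subset up to the whole group."""
    return sample_views * group_size / sample_size

# 120 views among 500 sampled users suggest roughly 2400 views across a
# group of 10,000 users.
print(estimate_group_views(120, 500, 10_000))  # -> 2400.0
```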
  • Figure 8 illustrates a user terminal 800 arranged to measure views of at least one region of interest.
  • the user terminal comprises a receiver 810, a processor 820, a memory 825, a directional input 830, and an immersive display 840.
  • the receiver 810 receives immersive video for display by the user terminal 800.
  • the processor 820 is arranged to receive instructions which, when executed, cause the processor 820 to carry out a method described herein. The instructions may be stored on the memory 825.
  • the directional input 830 is arranged to detect a gaze direction of the user.
  • the immersive display 840 is arranged to show a portion of an immersive video.
  • the user terminal 800 is further arranged to: update the portion of the immersive video displayed on the immersive display 840 in response to the gaze direction detected by directional input 830.
  • the processor 820 determines a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the user terminal 800 is further arranged to receive an indication of at least one region of interest in the immersive video and to identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
  • the directional input 830 may be a user interface device such as a mouse, keyboard, or some other form of controller.
  • the directional input may be an orientation sensor.
  • the orientation sensor may be in the immersive display.
  • the immersive display may be a headset.
  • the user terminal 800 may be further arranged to send identified matches to a server.
  • the region of interest may be an advertisement.
  • the advertisement may be a product placement.
  • the user terminal 800 may be further arranged to use the identified match to identify a view of an advertisement.
  • an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
  • Figure 9 illustrates a server 900 arranged to measure views of at least one region of interest, the server 900 comprising a receiver 910, a processor 920, a memory 925, a transmitter 930, and a storage component 940.
  • the receiver 910 receives a gaze vector and a timestamp from a user terminal, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the receiver 910 additionally receives an indication of at least one region of interest in the immersive video.
  • the processor 920 is arranged to receive instructions which, when executed, cause the processor 920 to carry out a method described herein.
  • the instructions may be stored on the memory 925.
  • the processor 920 identifies a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
  • the match details may be reported to another device via transmitter 930, or stored in storage component 940 for later retrieval and/or analysis.
  • the region of interest is an advertisement.
  • the advertisement may be a product placement.
  • the server 900 may be further arranged to use the identified match to identify a view of an advertisement.
  • the server 900 may be further arranged to aggregate matches for a plurality of users, and determine the total number of views of an advertisement.
  • a user terminal arranged to measure views of at least one region of interest, the user terminal comprising: an immersive display arranged to show a portion of an immersive video; and a directional input for detecting a gaze direction of the user.
  • the user terminal is arranged to: update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction; and determine a gaze vector and a timestamp, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the user terminal is further arranged to send the gaze vector and the timestamp to a server.
  • the detected gaze direction of the user is used to determine the gaze vector.
  • Figure 10 illustrates a further method which may be performed in a user terminal arranged to display immersive video to a user.
  • the method comprises displaying 1001 a portion of an immersive video.
  • the method further comprises detecting 1010 a gaze direction of the user and determining 1011 a gaze vector.
  • the method further comprises updating 1100 the portion of the immersive video displayed on the immersive display in response to the determined gaze vector; and sending 1101 a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • an apparatus for measuring views of a region of interest in immersive video comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: receive a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded; receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
  • an apparatus for displaying immersive video to a user comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: display a portion of an immersive video; detect a gaze direction of the user and determine a gaze vector; and send a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
  • the computer program product may be in the form of a non-volatile memory or volatile memory, e.g. an EEPROM (Electrically Erasable Programmable Read-only Memory), a flash memory, a disk drive or a RAM (Random-access memory).
  • the user terminal may be a client device.
  • the user terminal may be any kind of personal computer such as a television, a smart television, a set-top box, a games-console, a home-theatre personal computer, a tablet, a smartphone, a laptop, or even a desktop PC.
  • the user terminal may be connected to or comprise a stereoscopic display.
  • the user terminal may comprise a headset; alternatively, the user terminal may comprise a device arranged to output video to, and receive directional inputs from, a headset.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Social Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Optics & Photonics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

There is provided a method for measuring views of a region of interest in immersive video, the method comprising: receiving a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The method further comprises receiving an indication of at least one region of interest in the immersive video, and identifying a match when the gaze vector approaches within a threshold distance of the at least one region of interest.

Description

DETECTING USER ATTENTION
IN IMMERSIVE VIDEO
Technical field
The present application relates to: a method for measuring views of a region of interest in immersive video; a user terminal arranged to measure views of at least one region of interest; a server arranged to measure views of at least one region of interest; a method in a user terminal arranged to display immersive video to a user; an apparatus for measuring views of a region of interest in immersive video; an apparatus for displaying immersive video to a user; and a computer-readable medium.
Background
Virtual reality is a technology that has been around for decades but has become more widely adopted only in recent years with the release of several VR headsets. Current implementations include: headsets based on the use of mobile phones with positional sensors strapped closely in front of the eyes; and dedicated VR headsets. The image displayed on the screen changes in response to the acceleration signals and hence to the position of the user’s head. This allows the user to look around a virtual environment, giving a sense of immersion. The positional sensors may comprise at least one of accelerometers, gyroscopes, magnetic field sensors, and optical sensors.
Immersive video describes a video of a scene, where the view in multiple directions is viewed or is at least viewable at the same time. Immersive video is sometimes described as recording the view in every direction, sometimes with a caveat excluding the camera support. Strictly interpreted, this is an unduly narrow definition, and in practice the term immersive video is applied to any video with a very wide field of view.
Immersive video may also be described as video where a viewer is expected to watch only a portion of the video at any one time. For example, the IMAX® motion picture film format, developed by the IMAX Corporation, provides very high resolution video to viewers on a large screen where it is normal that at any one time some portion of the screen is outside of the viewer’s field of view. This is in contrast to a regular video on a smartphone display or even a television, where usually a viewer can see the whole screen at once.
Advertisements in traditional video are ubiquitous. Typically, a portion of screen time is devoted to advertisements and users have come to tolerate this diversion of their attention. Less common are banner or overlay advertisements which run in conjunction with video content. Here, a portion of screen area is devoted to advertisements.
Product placements are another common form of advertisement, wherein a product such as a car takes prominence in the action and a manufacturer pays to have their product used in the production of the video. In immersive video, a portion of the display area can be given over to an advertisement. This could be a traditional video or a poster/billboard arrangement, where the user will see that advertisement if their viewing area coincides with the advertisement display area.
However, given that a user viewing an immersive video will see only a small proportion of the video, and that area is dependent upon where they are looking, an advertiser cannot be certain that a viewer of the immersive video has seen their advertisement.
One solution to this problem might be to extend immersive video advertisements to the whole screen, either by disabling the immersive video and switching to a traditional limited-view video, or by taking over the whole experience with an advertisement that is itself an immersive video. Either of these approaches is likely to prove unpopular with users, given the intimate and immersive nature of an immersive video experience.
Immersive video therefore requires a less jarring advertising experience for users, but such a system must also acknowledge the commercial realities of advertising and the needs of advertisers.
Summary
The direction in which the user is looking at any given time within an immersive video is the so-called gaze vector. We use this gaze vector in the field of advertising in immersive video.
Accordingly, there is provided a method for measuring views of a region of interest in immersive video, the method comprising: receiving a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The method further comprises receiving an indication of at least one region of interest in the immersive video; and identifying a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
The immersive video is a two-dimensional curved surface and may be thought of as the inside of a sphere or the inside of a cylinder. In practice this might be any shaped surface, such as a cube or other shape with multiple flat or curved sides. Typically, a projection is made from that surface to the display area. Where the immersive video is computer generated and rendered in real time, the immersive video may comprise a 3D world in which known projection techniques are used to generate a view displayed to the user. The gaze vector defines a spatial point in the immersive video. It is called a vector because it starts at the center of the sphere or cylinder and projects out to the inside surface of the sphere or cylinder. Together with the field of view, the gaze vector defines the current view of the user viewing the immersive video. The gaze vector may define a viewing area of the immersive video.
The region of interest is an area of the immersive video where, for some time period, an advertisement is shown. The advertisement may be a product placement. Such advertisement space may be sold on a pay per view basis, and the present document defines a way to measure such views in an immersive video arrangement.
The gaze vector and timestamp are sent back to the service provider where a back-office system processes them. By correlating the gaze vector and timestamp data with a database of known advertisement locations we can provide analytic data on, for example, how often or how long an advertisement or placed product is viewed. The region of interest may have a boundary in both viewing area and time.
The region of interest may be defined by an interest vector at its center, such that a match is identified when a gaze vector and interest vector approach within a threshold distance. The distance may be measured at the plane of the immersive video projection. The threshold distance may be defined such that any object within the threshold distance of the gaze vector is observable to the viewer of the immersive video. The threshold distance may have different bounds in the vertical and horizontal directions.
The region of interest may be an advertisement. The advertisement may be a product placement. The portion of the immersive video within the region of interest may be a still image or a video.
The method may further comprise using the identified match to identify a view of an advertisement.
Where the gaze vector changes during the time period that the region of interest exists, then an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
The method may further comprise aggregating matches for a plurality of users, and determining the total number of views of an advertisement. The method may be applied to every user within a group to measure the number of views within that group.
Alternatively, the method may be applied to a subset of a group of users to give an estimate of the number of views in that group.
There is further provided a user terminal arranged to measure views of at least one region of interest, the user terminal comprising: an immersive display arranged to show a portion of an immersive video; and a directional input for detecting a gaze direction of the user. The user terminal is further arranged to: update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction; and determine a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The user terminal is further arranged to: receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest. The directional input may be a user interface device such as a mouse, keyboard, or some other form of controller. The directional input may be an orientation sensor. The orientation sensor may be in the immersive display. The immersive display may be a headset.
The user terminal may be further arranged to send identified matches to a server.
Where the gaze vector changes during the time period that the region of interest exists, then an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
There is further provided a server arranged to measure views of at least one region of interest, the server arranged to: receive a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The server is further arranged to: receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
The server may be further arranged to use the identified match to identify a view of an advertisement. The server may be further arranged to aggregate matches for a plurality of users, and determine the total number of views of an advertisement.
There is further provided a user terminal arranged to measure views of at least one region of interest, the user terminal comprising: an immersive display arranged to show a portion of an immersive video; and a directional input for detecting a gaze direction of the user. The user terminal is arranged to: update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction; and determine a gaze vector and a timestamp, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The user terminal is further arranged to send the gaze vector and the timestamp to a server.
The detected gaze direction of the user is used to determine the gaze vector.
There is further provided a method in a user terminal arranged to display immersive video to a user, the method comprising: displaying a portion of an immersive video; and detecting a gaze direction of the user and determining a gaze vector. The method further comprises updating the portion of the immersive video displayed on the immersive display in response to the determined gaze vector; and sending a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.

There is further provided an apparatus for measuring views of a region of interest in immersive video comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: receive a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded; receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
There is further provided an apparatus for displaying immersive video to a user comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: display a portion of an immersive video; detect a gaze direction of the user and determine a gaze vector; update the portion of the immersive video displayed on the immersive display in response to the determined gaze vector; and send a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
There is further provided a computer-readable medium, carrying instructions, which, when executed by computer logic, causes said computer logic to carry out any of the methods defined herein.
There is further provided a computer-readable storage medium, storing instructions, which, when executed by computer logic, causes said computer logic to carry out any of the methods defined herein. The computer program product may be in the form of a non-volatile memory or volatile memory, e.g. an EEPROM (Electrically Erasable Programmable Read-only Memory), a flash memory, a disk drive or a RAM (Random-access memory).
Brief description of the drawings
A method and apparatus for detecting user attention in immersive video will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 illustrates a user terminal displaying a portion of an immersive video;
Figure 2 shows a man watching an immersive video on his smartphone;
Figure 3 shows a woman watching an immersive video on a virtual reality headset;
Figures 4a, 4b and 4c illustrate three different orientations of a displayed area in relation to an advertisement displayed in a portion of an immersive video;
Figure 5 illustrates a threshold area;
Figure 6 illustrates a method for measuring views of a region of interest in immersive video;

Figure 7 illustrates a further method comprising aggregating matches for a plurality of users, and determining the total number of views of a region of interest;
Figure 8 illustrates a user terminal arranged to measure views of at least one region of interest;
Figure 9 illustrates a server arranged to measure views of at least one region of interest; and
Figure 10 illustrates a further method which may be performed in a user terminal arranged to display immersive video to a user.
Detailed description
In this detailed description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It will be understood by those skilled in the art, however, that the embodiments of the invention may be practiced without these specific details. In other instances, well known methods, procedures, components and circuits have not been described in detail so as not to obscure the embodiments of the invention. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the invention.
An embodiment of the invention may include functionality that may be implemented as software executed by a processor, hardware circuits or structures, or a combination of both. The processor may be a general-purpose or dedicated processor, such as a processor from the family of processors made by Intel Corporation, Motorola Incorporated, Sun Microsystems Incorporated and others. The software may comprise programming logic, instructions or data to implement certain functionality for an embodiment of the invention. The software may be stored in a medium accessible by a machine or computer-readable medium, such as read-only memory (ROM), random-access memory (RAM), magnetic disk (e.g., floppy disk and hard drive), optical disk (e.g., CD-ROM) or any other data storage medium. In one embodiment of the invention, the media may store programming instructions in a compressed and/or encrypted format, as well as instructions that may have to be compiled or installed by an installer before being executed by the processor.
Alternatively, an embodiment of the invention may be implemented as specific hardware components that contain hard-wired logic for performing the recited functionality, or by any combination of programmed general-purpose computer components and custom hardware components.
In one embodiment the gaze vector is sent back to the service provider where a back-office system processes it to determine whether any match has occurred. By correlating the gaze vector data with a database of known advertisement or product locations, the proposed solution can provide analytic data on, for example, how often or how long a region of interest such as an advertisement is viewed. In the context of a standard linear broadcast the service provider knows an advert is being viewed simply because the channel is active or, in the case of over-the-top delivery, because the media has been requested. A reasonable likelihood that the advertisement has been viewed can be associated with that information. Such viewing statistics are used by providers to target advertising to the viewer and also to report viewing figures back to the advertisement commissioning company. One drawback of these traditional methods is that it is not known whether the viewer was looking at the display, or was even in the room, at any instant. It is quite possible that a user leaves the room to make a cup of tea during a commercial break, and so does not see the advertisement.
Immersive or 360° video will offer a very different advertising medium. It may be possible for multiple advertisements or product placements to be in the full 360° frame at the same time. The nature of the consumption of 360° video is such that not all of these advertising features will be visible to the user at any one time. In the transition to 360° media one could imagine multiple standard, planar adverts being in the 360° frame. The user selects which advert to view simply by looking at it. The audio for the one currently in the user's field of view is delivered to the user's headphones, for example. Without knowledge of the viewer's gaze vector there will be no way of knowing which adverts or products were viewed.
There is provided herein a system for detecting user attention in immersive video. This works by determining the gaze vector used by a viewer of 360° video in association with the time instant of the video. Using this gaze vector and time instant, the user terminal, or a back-office system at a remote server, will be able to identify, by correlation with a database of semantic video descriptive data, areas or items being viewed by the viewer. This data may be used to aggregate analytic data on which adverts are being viewed and for how long.
The system allows service providers to identify how often and for how long advertisements and placed products are viewed. This will feed into their advertising business model by, for example, charging for advertising space on a per-view or duration-viewed basis.
A 360° video asset comprises a projection of a spherical video captured by a specialized 360° camera. It can be thought of as a video projected on the inside of a sphere, with the camera, and later the viewer, positioned at the center of the sphere. These assets are consumed by the viewer using an app on a PC, tablet, smartphone or head-mounted device (which may be a smartphone in a headset, as per the Samsung Gear VR). The part of the 360° scene displayed in the screen of the viewing device (the viewport) is determined either by manual intervention, such as the click-and-drag of a mouse or the swipe of a tablet screen, or by feedback from sensors (accelerometers/gyros) in the device.
The gaze vector defines the azimuth and elevation angles from the center of the sphere to the center of the viewport. The gaze vector is at the heart of the feedback loop used to determine what is displayed on the device screen, but it can also be logged by the app running on the device and transmitted back to the service provider.
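By way of illustration only, this geometry may be sketched in code. The following Python fragment (hypothetical helper names; the embodiments are not limited to any particular implementation) converts an (azimuth, elevation) gaze vector into a 3D unit vector from the center of the viewing sphere, and computes the angle between two such directions:

```python
import math

def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert an (azimuth, elevation) gaze vector into a 3D unit vector
    pointing from the center of the viewing sphere toward the viewport."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def angular_distance_deg(gaze_a, gaze_b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    ax, ay, az = to_unit_vector(*gaze_a)
    bx, by, bz = to_unit_vector(*gaze_b)
    dot = max(-1.0, min(1.0, ax * bx + ay * by + az * bz))
    return math.degrees(math.acos(dot))
```

Such an angular distance is one way of realizing the "threshold distance" test used throughout this document; the later sketches reuse these helpers.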
The gaze vector is determined from information received from the directional input.
The directional input may be positional sensors on the device, such as magnetic and/or gyroscopic sensors on a smartphone or VR headset. The directional input may come from a traditional user interface device such as any combination of a keyboard, a mouse, or a game controller, such as a joystick or joypad. The gaze vector may additionally be determined by eye tracking, whereby a camera on the user device monitors the eyeball position. Such eye tracking can tell the user device where on the display the user's attention is focused. This eye tracking data can be used to further refine where the gaze vector lies.
Without eye-tracking, the gaze vector is assumed to lie at the center of the displayed area; eye-tracking allows refinement of the gaze vector. The increased level of accuracy may allow for a reduction in the size of a threshold area, within which the gaze vector must point for a minimum period of time for an advertisement impression to be counted.
The service provider, responsible for delivering the video asset to the viewer, may wish to sell advertising in this 360° space. This may be either by advertisements or by product placements (see examples below). Since not all of the 360° space is observable to any individual viewer at any point in time, it is not known whether the advert or placed product has been viewed. This system feeds back the direction the viewer is looking, the gaze vector, to the service provider to be stored and used to inform the advertising business model.
Three potential examples for advertising in immersive video will now be described.
In a first example, multiple planar advertisements, as are currently delivered to standard television sets, are presented in the 360° space. These can be thought of as video panels shown in an area of the immersive video, that area viewable within a field of view of the user. A user may thus turn to look at any one of several advertisements displayed concurrently in different areas of the immersive field. The user might watch the video advertisement of most interest to them. The gaze vector information from such a use case may be used to profile the user, understanding the type of adverts they tend to watch and for how long. This will allow the service provider to deliver more relevant advertisements to that user (targeted advertising). Alternatively, the gaze vector information may be used to identify which adverts have been viewed and for how long. This data might feed into the way the advertising space is charged.
In a second example the content includes a product placement, but this is only shown in a subsection of the immersive video, in a region of interest. Where the service provider is responsible for the production of video assets (e.g. Netflix Originals), advertising may be sold in the form of product placements. For example, the specific brand of beer or coffee used by an actor might be sponsored. The gaze vector fed back to the service provider from multiple users will allow them to know how often the placed product is in the field of view, or directly in the line of sight, of the user.
In a third example, the gaze vector is used for 360° advertisement analytics. An advertisement may be a 360° video asset in itself. Knowledge of what users are (and aren't) looking at in the advert will likely be of interest to the sponsor/advertisement producer, allowing them to produce more compelling 360° advertisements. The gaze vector data in this case may be sold back to the sponsor/producer in various forms, from the raw vectors to distilled analytics. Of course, this content analysis is of interest to all types of immersive content producers and so is also relevant beyond the scope of advertising.
360° video data is sent from the service provider to the rendering device, which may be a user terminal. Note that the video asset may have semantic metadata which records the locations of placed products or advertisements in the 360° space.
The gaze vector data and associated video time are logged by an application on the rendering device and this information is transmitted back to the service provider, where it is stored in a database. The gaze vector data passed back may be a subsampled or temporally averaged version of that used in the motion (or interaction) display loop, for example.
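Purely as an illustration of such temporal averaging, the following sketch reduces the raw gaze samples from the display loop to one logged sample per time window. The window length and tuple layout are assumptions, not requirements of the embodiments:

```python
def average_gaze_log(samples, window_s=1.0):
    """samples: time-ordered (timestamp_s, azimuth_deg, elevation_deg) tuples
    from the display loop. Returns one averaged sample per time window.
    Note: naive averaging of azimuth breaks near the +/-180 degree wrap;
    a production version would average unit vectors instead."""
    averaged, bucket, bucket_start = [], [], None
    for t, az, el in samples:
        if bucket_start is None:
            bucket_start = t
        if t - bucket_start >= window_s and bucket:
            n = len(bucket)
            averaged.append((bucket_start,
                             sum(a for a, _ in bucket) / n,
                             sum(e for _, e in bucket) / n))
            bucket, bucket_start = [], t
        bucket.append((az, el))
    if bucket:
        n = len(bucket)
        averaged.append((bucket_start,
                         sum(a for a, _ in bucket) / n,
                         sum(e for _, e in bucket) / n))
    return averaged
```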
A process running in the server will then aggregate gaze vector data from multiple users and correlate that with the metadata associated with the video asset to allow the system to identify which adverts, advert regions, product placements or areas of content have been viewed, how many times and for how long.
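A sketch, for illustration only, of such a server-side correlation follows. It assumes gaze logs keyed by user, and regions of interest described in the asset metadata by start/end times, an interest vector and a threshold angle (all field names hypothetical); it reuses the angular_distance_deg helper from the earlier sketch:

```python
from collections import defaultdict

SAMPLE_PERIOD_S = 1.0  # assumed spacing of the averaged gaze samples

def correlate(gaze_logs, regions):
    """gaze_logs: {user_id: [(timestamp_s, azimuth_deg, elevation_deg), ...]}
    regions: [(start_s, end_s, azimuth_deg, elevation_deg, threshold_deg), ...]
    Returns per-region matched-sample counts and aggregated viewing time."""
    matches = defaultdict(int)
    seconds = defaultdict(float)
    for user_id, samples in gaze_logs.items():
        for t, az, el in samples:
            for i, (start, end, r_az, r_el, thresh) in enumerate(regions):
                # a sample matches when it falls inside the region's lifetime
                # and within the angular threshold of its interest vector
                if start <= t <= end and \
                        angular_distance_deg((az, el), (r_az, r_el)) <= thresh:
                    matches[i] += 1
                    seconds[i] += SAMPLE_PERIOD_S
    return matches, seconds
```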
Immersive video can take a plurality of forms. Referring now in detail to the drawings there is illustrated in Figure 1 a user terminal 100 displaying a portion of an immersive video 180. The user terminal is shown as a smartphone and has a screen 110, which is shown displaying a selected portion 185 of immersive video 180. In this example immersive video 180 is a panoramic or cylindrical view of a city skyline.
Smartphone 100 comprises gyroscope sensors to measure its orientation, and in response to changes in its orientation the smartphone 100 displays different sections of immersive video 180. For example, if the smartphone 100 were rotated to the left about its vertical axis, the portion 185 of video 180 that is selected would also move to the left and a different area of video 180 would be displayed.
The user terminal 100 may comprise any kind of personal computer such as a television, a smart television, a set-top box, a games-console, a home-theatre personal computer, a tablet, a smartphone, a laptop, or even a desktop PC.
It is apparent from Figure 1 that where the video 180 includes an advertisement in a portion of the video area, that advertisement is not necessarily displayed in portion 185. This is a problem for advertisers, given that they pay for a portion of a user's attention. There is no guarantee that an advert placed within a portion of an immersive video will be seen by the user. Without a mechanism to monitor user attention, the viability of such adverts is questionable.
A solution is presented herein to address this problem. The gaze vector, which is the direction in which the user is looking at any given time within an immersive video, is used to identify whether a user has seen an advertisement.
Figure 2 shows a man watching a video 280 on his smartphone 200. Smartphone 200 is an example of a user terminal. It has a display 210 which displays area 285 of the video 280. The video 280 is illustrated by a grid of dashed lines to demonstrate the curvature of the video surface. The area of the video 280 that is displayed by smartphone 200 changes as the user changes the orientation of the smartphone 200.
The selection of the video area to be displayed by the smartphone 200 is defined by a physical location and/or orientation of the smartphone 200. This information is obtained from sensors in the smartphone 200, such as a magnetic sensor (or compass), and a gyroscope. Alternatively, the smartphone 200 may have a camera and use this together with image processing software to determine a relative orientation of the smartphone 200. The displayed area may also be based on user input to the smartphone 200. For example, such a user input may be via a touch screen on the smartphone 200.
Figure 3 shows a woman watching video 380 on a virtual reality headset 300. The virtual reality headset 300 comprises a display 310. The display 310 may comprise a screen, or a plurality of screens, or a virtual retina display that projects images onto the retina. The video 380 is again illustrated here as a grid of dashed lines. The area of video 380 that is displayed to the user is selected as the user changes the orientation of her head, and also the orientation of the headset strapped to her head. The user sees only displayed area 385 of video 380.
The video surfaces in Figures 2 and 3 are illustrated as the inside of a sphere. Alternatively, the video surface may be shaped like the inside surface of a cylinder. Where the video is the shape of the surface of a cylinder, the vertical extent of the immersive video is limited by the top and bottom edges of that cylinder. If the cylinder wraps fully around the user, then this is a format of 360° video that gives a 360° field of view.
The selection of a subset of video segments by the user terminal is defined by a physical location and/or orientation of the headset 300. This information is obtained from gyroscope and/or magnetic sensors in the headset. The selection may also be based on user input to the user terminal. For example, such a user input may be via a keyboard connected to the headset 300.

Figures 4a, 4b and 4c illustrate three different orientations of a displayed area 485 in relation to an advertisement 450 displayed in a portion of an immersive video 480. A gaze vector 490 is at the center of the displayed area 485. In Figure 4a the displayed area 485 wholly encompasses the advertisement area 450. Here the user will have seen the advertisement for "Fluff".
In Figure 4b the displayed area 485 only slightly overlaps the advertisement area 450. The user may notice the advertisement area 450, but will not be exposed to the message and so this will not count as a view.

In Figure 4c the gaze vector 490 is sufficiently far away from the advertisement area 450 that the advertisement area 450 falls entirely outside the displayed area 485. The user is deemed not to have viewed the advertisement.
Figure 5 illustrates a threshold area 570. If the gaze vector 590 is within the threshold area 570, the user sees a sufficient proportion of the advertisement area 550 that an advertising impression is deemed to have been made and this counts as a view. Of course, as well as this spatial threshold, a temporal threshold may be applied whereby the gaze vector 590 must be within the threshold area 570 for a minimum period of time for an advertisement impression to be counted. If the gaze vector 590 enters the advertisement area 550 only momentarily then an advertisement impression is not counted.
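The combination of spatial and temporal thresholds just described might be expressed as follows; this is an illustrative sketch only, with assumed parameter values, reusing the angular_distance_deg helper from the earlier sketch:

```python
def count_impression(samples, interest, threshold_deg=20.0, min_dwell_s=2.0):
    """samples: time-ordered (timestamp_s, azimuth_deg, elevation_deg) tuples
    for one user; interest: (azimuth_deg, elevation_deg) of the ad area."""
    dwell_start = None
    for t, az, el in samples:
        inside = angular_distance_deg((az, el), interest) <= threshold_deg
        if inside and dwell_start is None:
            dwell_start = t          # gaze has just entered the threshold area
        elif not inside:
            dwell_start = None       # a momentary glance resets the clock
        if dwell_start is not None and t - dwell_start >= min_dwell_s:
            return True              # impression counted
    return False
```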
Figure 6 illustrates a method for measuring views of a region of interest in immersive video, the method comprising: receiving a gaze vector 610 and receiving a timestamp 620. The gaze vector defines the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The method further comprises receiving an indication 630 of at least one region of interest in the immersive video. A determination is made at box 640 as to whether the gaze vector is within a threshold distance of the region of interest. A match is identified 650 when the gaze vector approaches within a threshold distance of the at least one region of interest. If the gaze vector is beyond a threshold distance from the region of interest it is determined 660 there is no match.
The immersive video is a two-dimensional curved surface and may be thought of as the inside of a sphere or the inside of a cylinder. The gaze vector defines a spatial point in the immersive video. It is called a vector because it starts at the center of the sphere or cylinder and projects out to the inside surface of the sphere or cylinder.
The region of interest is an area of the immersive video where, for some time period, an advertisement is shown. The advertisement may be a product placement. Such advertisement space may be sold on a pay per view basis, and the present document defines a way to measure such views in an immersive video arrangement. A match identified in the above method may determine an advertisement view, it may be a component in determining an advertisement view, or it may be used outside the field of advertising for example in immersive video content analysis. Such content analysis may be used to determine what action a user has missed or has seen, in order to tailor future content for that user, or to modify the content presentation for future users.
The gaze vector and timestamp are sent back to the service provider where a back-office system processes them. By correlating the gaze vector and timestamp data with a database of known advertisement locations we can provide analytic data on, for example, how often or how long an advertisement or placed product is viewed.
The gaze vector may define a viewing area of the immersive video. The region of interest may have a boundary in both viewing area and time. The region of interest may be defined by an interest vector at its center, such that a match is identified when a gaze vector and interest vector approach within a threshold distance. The distance may be measured at the plane of the immersive video projection. The threshold distance may be defined such that any object within the threshold distance of the gaze vector is observable to the viewer of the immersive video. The threshold distance may comprise a threshold area. The threshold area may have different horizontal and vertical dimensions, to match the broader horizontal field of view in humans. The threshold area may match the expected field of view of the user. The threshold area may be smaller than the field of view of the user.
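An illustrative test for such a threshold area with separate horizontal and vertical bounds might look as follows (the specific angles are assumptions chosen for illustration, not values prescribed by this document):

```python
def within_threshold_area(gaze, interest, h_deg=30.0, v_deg=18.0):
    """gaze, interest: (azimuth_deg, elevation_deg) directions."""
    d_az = (gaze[0] - interest[0] + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    d_el = gaze[1] - interest[1]
    # the wider horizontal bound reflects the broader horizontal field of view
    return abs(d_az) <= h_deg and abs(d_el) <= v_deg
```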
The region of interest may be an advertisement. The advertisement may be a product placement. The portion of the immersive video within the region of interest may be a still image or a video.
The method may further comprise using the identified match to identify a view of an advertisement or a product placement. Where the gaze vector changes during the time period that the region of interest exists, then an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
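The proportion rule may be sketched as follows, again reusing the angular_distance_deg helper; the threshold angle and fraction are illustrative assumptions:

```python
def view_by_proportion(samples, interest, start_s, end_s,
                       threshold_deg=20.0, min_fraction=0.5):
    """Counts a view when the gaze vector is within the threshold distance
    for at least min_fraction of the region of interest's lifetime."""
    lifetime = [s for s in samples if start_s <= s[0] <= end_s]
    if not lifetime:
        return False
    inside = [s for s in lifetime
              if angular_distance_deg((s[1], s[2]), interest) <= threshold_deg]
    return len(inside) / len(lifetime) >= min_fraction
```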
Figure 7 illustrates a further method comprising aggregating matches for a plurality of users, and determining the total number of views of a region of interest. Match information, for example as determined in the method illustrated by Figure 6, is received 710 for a plurality of users. These results are then aggregated 720 to determine the total number of views of an advertisement. The method may be applied to every user within a group to measure the number of views within that group. Alternatively, the method may be applied to a subset of a group of users to give an estimate of the number of views in that group.
Figure 8 illustrates a user terminal 800 arranged to measure views of at least one region of interest. The user terminal comprises a receiver 810, a processor 820, a memory 825, a directional input 830, and an immersive display 840. The receiver 810 receives immersive video for display by the user terminal 800. The processor 820 is arranged to receive instructions which, when executed, cause the processor 820 to carry out a method described herein. The instructions may be stored on the memory 825. The directional input 830 detects a gaze direction of the user. The immersive display 840 is arranged to show a portion of an immersive video.
The user terminal 800 is further arranged to: update the portion of the immersive video displayed on the immersive display 840 in response to the gaze direction detected by directional input 830. The processor 820 determines a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The user terminal 800 is further arranged to receive an indication of at least one region of interest in the immersive video and to identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
The directional input 830 may be a user interface device such as a mouse, keyboard, or some other form of controller. The directional input may be an orientation sensor. The orientation sensor may be in the immersive display. The immersive display may be a headset. The user terminal 800 may be further arranged to send identified matches to a server.
The region of interest may be an advertisement. The advertisement may be a product placement. The user terminal 800 may be further arranged to use the identified match to identify a view of an advertisement.
Where the gaze vector changes during the time period that the region of interest exists, then an advertisement view may be determined to have happened if the gaze vector was within a threshold distance of the region of interest for a threshold proportion of the time period that the region of interest exists.
Figure 9 illustrates a server 900 arranged to measure views of at least one region of interest, the server 900 comprising a receiver 910, a processor 920, a memory 925, a transmitter 930, and a storage component 940.
The receiver 910 receives a gaze vector and a timestamp from a user terminal, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The receiver 910 additionally receives an indication of at least one region of interest in the immersive video.
The processor 920 is arranged to receive instructions which, when executed, cause the processor 920 to carry out a method described herein. The instructions may be stored on the memory 925. The processor 920 identifies a match when the gaze vector approaches within a threshold distance of the at least one region of interest. The match details may be reported to another device via transmitter 930, or stored in storage component 940 for later retrieval and/or analysis. The region of interest may be an advertisement. The advertisement may be a product placement. The server 900 may be further arranged to use the identified match to identify a view of an advertisement. The server 900 may be further arranged to aggregate matches for a plurality of users, and determine the total number of views of an advertisement.
There is further provided a user terminal arranged to measure views of at least one region of interest, the user terminal comprising: an immersive display arranged to show a portion of an immersive video; and a directional input for detecting a gaze direction of the user. The user terminal is arranged to: update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction; and determine a gaze vector and a timestamp, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded. The user terminal is further arranged to send the gaze vector and the timestamp to a server. The detected gaze direction of the user is used to determine the gaze vector.
Figure 10 illustrates a further method which may be performed in a user terminal arranged to display immersive video to a user. The method comprises displaying 1001 a portion of an immersive video. The method further comprises detecting 1010 a gaze direction of the user and determining 1011 a gaze vector. The method further comprises updating 1100 the portion of the immersive video displayed on the immersive display in response to the determined gaze vector; and sending 1101 a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
There is further provided an apparatus for measuring views of a region of interest in immersive video comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: receive a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded; receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
There is further provided an apparatus for displaying immersive video to a user comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to: display a portion of an immersive video; detect a gaze direction of the user and determine a gaze vector; update the portion of the immersive video displayed on the immersive display in response to the determined gaze vector; and send a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.

There is further provided a computer-readable medium, carrying instructions, which, when executed by computer logic, causes said computer logic to carry out any of the methods defined herein.
There is further provided a computer-readable storage medium, storing instructions, which, when executed by computer logic, causes said computer logic to carry out any of the methods defined herein. The computer program product may be in the form of a non-volatile memory or volatile memory, e.g. an EEPROM (Electrically Erasable Programmable Read-only Memory), a flash memory, a disk drive or a RAM (Random-access memory).
It will be apparent to the skilled person that the exact order and content of the actions carried out in the method described herein may be altered according to the requirements of a particular set of execution parameters. Accordingly, the order in which actions are described and/or claimed is not to be construed as a strict limitation on the order in which actions are to be performed.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.
The user terminal may be a client device. The user terminal may be any kind of personal computer such as a television, a smart television, a set-top box, a games-console, a home-theatre personal computer, a tablet, a smartphone, a laptop, or even a desktop PC. The user terminal may be connected to or comprise a stereoscopic display. For example, the user terminal may comprise a headset; alternatively, the user terminal may comprise a device arranged to output video to and receive directional inputs from a headset.

Claims
1. A method for measuring views of a region of interest in immersive video, the method comprising:
receiving a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded;
receiving an indication of at least one region of interest in the immersive video; and identifying a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
2. The method of claim 1, wherein the gaze vector defines a viewing area of the immersive video.
3. The method of claim 1 or 2, wherein the region of interest has a boundary in both viewing area and time.
4. The method of any preceding claim, wherein the region of interest is an advertisement or a product placement.
5. The method of any preceding claim, wherein the portion of the immersive video within the region of interest is a still image or a video.
6. The method of any preceding claim, further comprising using the identified match to identify a view of an advertisement or a product placement.
7. The method of any preceding claim, further comprising aggregating matches for a plurality of users, and determining the total number of views of an advertisement.
8. A user terminal arranged to measure views of at least one region of interest, the user terminal comprising:
an immersive display arranged to show a portion of an immersive video;
a directional input for detecting a gaze direction of the user;
the user terminal arranged to:
update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction;
determine a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded;
receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
9. The user terminal of claim 8, further arranged to send identified matches to a server.
10. The user terminal of claim 8 or 9, wherein the region of interest is an advertisement.
11. The user terminal of any of claims 8, 9 or 10, further arranged to use the identified match to identify a view of an advertisement.
12. A server arranged to measure views of at least one region of interest, the server arranged to:
receive a gaze vector and a timestamp, the gaze vector defining the direction of view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded;
receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
13. The server of claim 12, wherein the region of interest is an advertisement.
14. The server of claim 12 or 13, further arranged to use the identified match to identify a view of an advertisement.
15. The server of any of claims 12 to 14, further arranged to aggregate matches for a plurality of users, and determine the total number of views of an advertisement.
16. A user terminal arranged to measure views of at least one region of interest, the user terminal comprising:
an immersive display arranged to show a portion of an immersive video;
a directional input for detecting a gaze direction of the user;
the user terminal arranged to:
update the portion of the immersive video displayed on the immersive display in response to the detected gaze direction;
determine a gaze vector and a timestamp, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded;
send the gaze vector and the timestamp to a server.
17. A method in a user terminal arranged to display immersive video to a user, the method comprising:
displaying a portion of an immersive video;
detecting a gaze direction of the user and determining a gaze vector;
updating the portion of the immersive video displayed on the immersive display in response to the determined gaze vector;
sending a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
18. An apparatus for measuring views of a region of interest in immersive video comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to:
receive a gaze vector and a timestamp, the gaze vector defining the view of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded;
receive an indication of at least one region of interest in the immersive video; and identify a match when the gaze vector approaches within a threshold distance of the at least one region of interest.
19. An apparatus for displaying immersive video to a user comprising a processor and a memory, said memory containing instructions executable by said processor whereby said apparatus is operative to:
display a portion of an immersive video;
detect a gaze direction of the user and determine a gaze vector;
update the portion of the immersive video displayed on the immersive display in response to the determined gaze vector;
send a gaze vector and a timestamp to a server, the gaze vector defining the gaze direction of a user viewing immersive video, the timestamp indicating a point in time in the immersive video playback when the gaze vector was recorded.
20. A computer-readable medium, carrying instructions, which, when executed by computer logic, causes said computer logic to carry out any of the methods defined by claims 1 to 7 or 17.
PCT/EP2017/082693 2017-12-13 2017-12-13 Detecting user attention in immersive video WO2019114955A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/082693 WO2019114955A1 (en) 2017-12-13 2017-12-13 Detecting user attention in immersive video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/082693 WO2019114955A1 (en) 2017-12-13 2017-12-13 Detecting user attention in immersive video

Publications (1)

Publication Number Publication Date
WO2019114955A1 true WO2019114955A1 (en) 2019-06-20

Family

ID=60942970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/082693 WO2019114955A1 (en) 2017-12-13 2017-12-13 Detecting user attention in immersive video

Country Status (1)

Country Link
WO (1) WO2019114955A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100046797A1 (en) * 2008-08-20 2010-02-25 SET Corporation Methods and systems for audience monitoring
US20130340006A1 (en) * 2012-06-14 2013-12-19 Mobitv, Inc. Eye-tracking navigation
WO2015048749A1 (en) * 2013-09-30 2015-04-02 Interdigital Patent Holdings, Inc. Methods, apparatus, systems, devices, and computer program products for providing an augmented reality display and/or user interface
US9363569B1 (en) * 2014-07-28 2016-06-07 Jaunt Inc. Virtual reality system including social graph
US20160300392A1 (en) * 2015-04-10 2016-10-13 VR Global, Inc. Systems, media, and methods for providing improved virtual reality tours and associated analytics

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230043838A1 (en) * 2019-08-29 2023-02-09 Looxid Labs Inc. Method for determining preference, and device for determining preference using same
CN113115086A (en) * 2021-04-16 2021-07-13 安乐 Method for collecting elevator media viewing information based on video sight line identification
CN113115086B (en) * 2021-04-16 2023-09-19 浙江闪链科技有限公司 Method for collecting elevator media viewing information based on video line-of-sight identification

Similar Documents

Publication Publication Date Title
CN109416931B (en) Apparatus and method for gaze tracking
US10638194B2 (en) Embedding interactive objects into a video session
US10948982B2 (en) Methods and systems for integrating virtual content into an immersive virtual reality world based on real-world scenery
US8964008B2 (en) Volumetric video presentation
US8730354B2 (en) Overlay video content on a mobile device
JP5732129B2 (en) Zoom display navigation
WO2019105274A1 (en) Method, device, computing device and storage medium for displaying media content
US10511767B2 (en) Information processing device, information processing method, and program
US10356493B2 (en) Methods, systems, and media for presenting interactive elements within video content
CN107295393B (en) method and device for displaying additional media in media playing, computing equipment and computer-readable storage medium
WO2015021939A1 (en) Screen capture method, set top box and television equipment
US10769679B2 (en) System and method for interactive units within virtual reality environments
US20240077941A1 (en) Information processing system, information processing method, and program
WO2019114955A1 (en) Detecting user attention in immersive video
CN114302160B (en) Information display method, device, computer equipment and medium
US20220103901A1 (en) Customized commercial metrics and presentation via integrated virtual environment devices
WO2020206647A1 (en) Method and apparatus for controlling, by means of following motion of user, playing of video content
CN111667313A (en) Advertisement display method and device, client device and storage medium
CN114501127B (en) Inserting digital content in multi-picture video
CN105630170B (en) Information processing method and electronic equipment
CN110910508A (en) Image display method, device and system
CN116954440A (en) Interface interaction method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17826169

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17826169

Country of ref document: EP

Kind code of ref document: A1