GB2509150A - Bookmarking a scene within a video clip by identifying the associated audio stream - Google Patents

Bookmarking a scene within a video clip by identifying the associated audio stream

Info

Publication number
GB2509150A
Authority
GB
United Kingdom
Prior art keywords
media
piece
user
audio data
user device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB201223277A
Other versions
GB201223277D0 (en)
GB2509150B (en)
Inventor
Steve Hamilton Shaw
Daniel Laurence
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ROCKET PICTURES Ltd
Original Assignee
ROCKET PICTURES Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ROCKET PICTURES Ltd filed Critical ROCKET PICTURES Ltd
Priority to GB1223277.3A priority Critical patent/GB2509150B/en
Publication of GB201223277D0 publication Critical patent/GB201223277D0/en
Publication of GB2509150A publication Critical patent/GB2509150A/en
Application granted granted Critical
Publication of GB2509150B publication Critical patent/GB2509150B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/35Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H60/37Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying segments of broadcast information, e.g. scenes or extracting programme ID
    • H04H60/372Programme
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H60/58Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of audio
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47214End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for content reservation or setting reminders; for requesting event notification, e.g. of sport results or stock market
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/65Transmission of management data between client and server
    • H04N21/658Transmission by the client directed to the server
    • H04N21/6581Reference data, e.g. a movie identifier for ordering a movie or a product identifier in a home shopping application
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Bookmarking a point within a piece of audio-visual media being output to a user, wherein audio from the playing media is received at a device, the media is identified by comparing it with audio from known media programmes, the device is notified of a user input and stores an indication of a portion of the video that corresponds to the timing of the user input. Preferably the indication may be the timing of the portion of the video or it may be a video frame, scene, shot or sequence. There may be a touchscreen user device which receives the audio data from the piece of media and sends this data to a server for identification. Preferably the user is sent information containing details about an object, product or service relating to the content of the video at the time of the notification. This allows a user to return after the programme to a featured product or item.

Description

PROCESSING MEDIA DATA
Field of the Invention
The present invention relates to processing media data.
Background
As a user watches a piece of media (which may, for example, be a film, a live television programme, a pre-recorded television programme, a music video or an advertisement), he may see something in the video data of the piece of media which he would like to remember later, e.g. when the piece of media has finished being outputted at the user device. For example, during a film, the user may see a particular object being displayed in the film (e.g. a watch worn in a scene by an action hero) for which he would like to find out more details. At present, a user has to either pause the film to identify the object (if he can do that) or attempt to recall the object when the film finishes. For example, in a cinema, pausing the film is not an option for a viewer.
Summary
The inventors recognise that many viewers of media, termed herein "Onscreen Visual Media" (OVM), have a variety of user devices. For example, users may have a smartphone, a tablet, a laptop, a personal computer (PC) or a gaming device. Such devices are capable of executing applications (Apps) which interface with a user.
There are described herein methods by which a user can use an application executed at a user device in order to "remember" a moment during output of a piece of media. The pieces of media referred to herein include both video data and audio data output in synchronisation. The application can record a frame of the piece of media, corresponding to the moment within the piece of media that the user wanted to remember, for display to the user. This can be done when the movie or programme is finished, or during the movie or programme. In methods described herein, this is achieved by identifying a piece of media using the audio data of the piece of media. A small portion of audio data (e.g. of the order of 1 to 10 seconds) of a piece of media is usually sufficient to identify the piece of media and the temporal position of the audio data within the piece of media.
The application can then track the output of the piece of media.
In particular, in a first aspect there is provided a method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving notification of a user input at a user device during the output of the piece of media to the user; and storing an indication of a portion, e.g. a frame or scene, of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media. The method of the first aspect may be performed at a server or at the user device. For example, a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the server to perform the method of the first aspect.
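By way of illustration only, the following minimal Python sketch shows one way the first aspect could be realised at a server. The class and function names, the landmark-hash fingerprint representation and the in-memory data store are assumptions made for the example, not features of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class KnownMedia:
    title: str
    fingerprints: dict   # landmark hash -> offset (seconds) into the media
    duration: float

@dataclass
class MediaServer:
    catalogue: dict = field(default_factory=dict)   # media_id -> KnownMedia
    bookmarks: dict = field(default_factory=dict)   # user_id -> [(media_id, seconds)]

    def identify(self, sample_fingerprints):
        """Compare received audio data to audio data of known pieces of media:
        count landmark hashes in common and return the best-matching media id
        together with the matched offset (in seconds) into that media."""
        best_id, best_offset, best_hits = None, 0.0, 0
        for media_id, media in self.catalogue.items():
            # for each shared hash, estimate where the sample starts in the media
            starts = [media.fingerprints[h] - off
                      for h, off in sample_fingerprints.items()
                      if h in media.fingerprints]
            if len(starts) > best_hits:
                best_id, best_offset, best_hits = media_id, min(starts), len(starts)
        if best_id is None:
            raise LookupError("no known piece of media matches the sample")
        return best_id, best_offset

    def store_bookmark(self, user_id, media_id, seconds):
        """Store an indication (here, a timing) of the portion of the video
        data corresponding to the timing of the user input."""
        self.bookmarks.setdefault(user_id, []).append((media_id, seconds))
```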
In a second aspect, there is provided a method of processing media data, the method comprising: receiving at a user device audio data of a piece of media outputted to a user; sending the audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
As an example, a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the method of the second aspect.
In this way, the audio data of a piece of media is used to identify the piece of media. The timing of a user input within the piece of media corresponds to a frame of the video data of the piece of media, and an indication of that frame is stored. This allows the user to provide a user input during the output of the piece of media in order to remember a moment within the piece of media.
The invention also provides a computer device configured to process media data, the computer device comprising: a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user; a data store configured to store known pieces of media; a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
The invention also provides a user device configured to process media, the user device comprising: an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data; a user interface configured to receive a user input from the user during the output of the piece of media to the user; and a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and (ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
Brief Description of the Drawings
For a better understanding of the present invention and to show how the same may be put into effect, reference will now be made, by way of example, to the following drawings in which: Figure 1 shows a schematic illustration of a network; Figure 2 is a schematic functional block diagram of a user device; Figure 3 is a schematic functional block diagram of a server; Figure 4 is a flow chart for a process of processing media according to a preferred embodiment; Figure 5 is an example of a graph showing the amplitude of an audio signal as a function of time; and Figures 6a to 6d, 7a and 7b show examples of user interfaces displayed at a user device.
Embodiments
Preferred embodiments of the invention will now be described by way of example only.
Figure 1 shows a system including a user device 102 which is useable by a user 104. The user 104 is watching a piece of media on another device 103, referred to as a screen device, which could be a cinema screen, or a computer, DVD or television, for example. The user device 102 can connect to the network 106, which may for example be the Internet. The user device 102 may for example be a mobile phone (e.g. a smartphone), a tablet, a laptop, a personal computer ("PC"), a gaming device or other embedded device able to communicate over the network 106. The user device 102 is arranged to receive information from and output information to the user 104. The network 106 comprises a server 108 which has access to a data store such as a database 110. Many more nodes than those shown in Figure 1 may be connected to the network 106, but for clarity only the user device 102 and server 108 are shown in Figure 1. The user device 102 and the server 108 can communicate with each other over the network 106. For example, where the network 106 is the Internet, the user device 102 and the server 108 can communicate with each other by sending Internet Protocol (IP) data packets across the network 106. It will be appreciated that if the network 106 is a network other than the Internet then data packets may be formatted and sent according to some other, appropriate protocol.
Figure 2 illustrates a detailed view of the user device 102. The user device 102 comprises a processor ("CPU") 202 configured to process data on the user device 102. Connected to the CPU 202 is a display 204 which may be implemented as a touch screen for inputting data to the CPU 202. Also connected to the CPU 202 are speakers 206 for outputting audio data, a microphone 208 for receiving audio data, a keypad 210, a memory 212 for storing data and a network interface 214 for connecting to the network 106. The display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 are integrated into the user device 102 (e.g. when the user device 102 is a mobile phone). The display 204 and speakers 206 act as output apparatus of the user device 102 for outputting video and audio data respectively. The display 204 (when implemented as a touch screen), microphone 208 and keypad 210 act as input apparatus of the user device 102. In alternative user devices one or more of the display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 may not be integrated into the user device 102 and may be connected to the CPU 202 via respective interfaces. One example of such an interface is a USB interface.
The user device 102 may include other components which are not shown in Figure 2. For example, when the user device 102 is a PC, the CPU 202 may be connected to a mouse via a USB interface. Similarly, when the user device 102 is a laptop, the CPU 202 may be connected to a touchpad via a USB interface. As another example, when the user device 102 is a television, the CPU 202 may be connected to a remote control via a wireless (e.g. infrared) interface.
An operating system (OS) 216 is running on the CPU 202. The user device 102 is configured to execute a media application 218 on top of the OS 216. The media application 218 is a computer program product which is configured to process media data at the user device 102. The media application 218 is stored in the memory 212 and when executed on the CPU 202 performs methods described in more detail below for processing media data at the user device 102.
Figure 3 illustrates a detailed view of the server 108. The server 108 comprises a processor ("CPU") 302 configured to process data on the server 108. Connected to the CPU 302 is a network interface 304 for connecting to the network 106. The server 108 also includes a memory for storing data which may include the database 110. A computer program product may be stored in the memory at the server 108 and configured such that, when executed on the CPU 302, it performs methods described in more detail below for processing media at the server 108.
With reference to Figures 4 to 6 there is now described a method of a preferred embodiment. In Figure 4, the steps shown in the left hand column (i.e. steps S402, S404, S410, S412, S414, S418, S420, S424 and S426) are implemented at the user device 102, whereas the steps shown in the right hand column (i.e. steps S406, S408, S416 and S422) are implemented at the server 108.
In step S402 a piece of media is outputted to the user 104 at the screen device 103. For example, the screen device 103 may be showing a film or TV programme. The film includes synchronised streams of video and audio data. As described above, the piece of media may be a television programme, music video, advert, or any other OVM. The piece of media can come from any source, and can be shown on any suitable screen device. Moreover, it could be streamed to the user device or stored at the user device for display at the user device itself. In that case, the screen device 103 is the same as the user device 102.
The user 104 opens the media application 218, such that the media application 218 executes on the CPU 202. The media application 218 takes a sample of the audio data of the piece of media currently being output at the screen device 103. The sample may for example have a duration in the range of 1 second to 10 seconds. The sample has a sufficient duration for the piece of media to be identified as described below.
In step S404, the audio data of the piece of media which is outputted at the screen device 103 is received by the user device 102, which sends it over the network 106 to the server 108 (e.g. via the network interface 214 and the network interface 304). For example, the network interface 214 may connect the user device 102 to the network 106 via a Wi-Fi router or via a cellular telephone network (e.g. implementing 3rd or 4th generation mobile telecommunications (3G or 4G) technology).
The server 108 receives the audio data sent from the user device 102, and in step S406 the server 108 uses the received audio data to identify the piece of media being outputted at the screen device 103. The server 108 can also identify the exact point in the piece of media at which the audio data occurs. This takes a few seconds, and is implemented as described below.
The audio data may be represented as a graph of amplitude against time, such as that shown in Figure 5. A piece of media has a unique signature through samples of its audio data. This is true even for audio samples which have a duration of the order of 1 to 10 seconds.
The server 108 has access to a data store which stores audio data of known pieces of media. For example, in the embodiments described in detail herein, the data store is implemented as a database 110 stored at the server 108. The database 110 may for example store data representing power functions (such as that shown in Figure 5) for the audio data of the known pieces of media. As described above, the known pieces of media may include, for example, a film, a live television programme, a pre-recorded television programme, a music video, an advert, or any other OVM which may be output.
In step S406 the audio data received from the user device 102 is compared to audio data of known pieces of media stored in the database 110, to thereby identify the piece of media being outputted at the screen device 103. The audio signature of a piece of media is used to differentiate between known pieces of media and to determine the exact position of the audio data within a piece of media. The comparison of the received audio data with the known pieces of audio data may be performed using known algorithms. This involves comparing features (e.g. audio fingerprints) of audio data using statistical analysis to determine whether two samples of audio data match each other. For example, applications such as Shazam, Soundhound, SoundPrint and IntoNow implement algorithms for identifying audio data by comparing it with audio data from a database of audio data from known pieces of media. As an example, the IntoNow implementation is described in US patent publication number 2012/0209612 A1. Since such algorithms are known in the art, they are not described in detail herein.
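Purely to illustrate the landmark style of algorithm referred to above, and not as a description of any of the named products, a toy fingerprinting routine might look as follows; the window length, the one-peak-per-slice selection and the hashing scheme are arbitrary assumptions. Matching then reduces to counting the hashes two fingerprint sets share, as in the server sketch earlier.

```python
import numpy as np
from scipy.signal import spectrogram

def landmark_fingerprints(samples, rate, fan_out=5):
    """Toy Shazam-style fingerprinting: take the loudest spectrogram bin in
    each time slice as a peak, then hash pairs of nearby peaks together with
    their time separation to form landmarks."""
    freqs, times, sxx = spectrogram(samples, fs=rate, nperseg=1024)
    peaks = [(ti, int(np.argmax(sxx[:, ti]))) for ti in range(sxx.shape[1])]
    landmarks = {}
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            h = hash((f1, f2, t2 - t1))       # peak pair plus time delta
            landmarks[h] = float(times[t1])   # landmark offset within the sample
    return landmarks
```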
Step S406 identifies which piece of media the user 104 is viewing and also identifies the exact temporal position within the piece of media to which the audio data matches. In step S408 the server 108 sends an indication of the identified piece of media to the user device 102. In step S408 the server 108 also sends an identifier of the temporal position within the identified piece of media to the user device 102.
The user device 102 receives the indication of the piece of media (e.g. the title of the piece of media) and the identifier of the temporal position (e.g. a time from the start of the piece of media). Then in step S410, using the indication of the piece of media and the identifier of the temporal position, the media application 218 tracks the outputting of the piece of media. The tracking of the piece of media may continue until completion of the outputting of the piece of media, that is, until the piece of media finishes being output. However, if there is any disruption in the outputting of the piece of media then the media application 218 will need to reconnect to the server 108 in order to correctly track the output of the piece of media. For example, the film may be paused on TV, or buffering issues may interrupt the playout if the media is being streamed.
A reconnect process would repeat steps S404 to S410 as described above.
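One plausible way for the media application to track the output locally between reconnections is to anchor the server-reported temporal position to a monotonic clock, as in the sketch below; the class name and the resync policy are assumptions for illustration.

```python
import time

class PlaybackTracker:
    """Tracks how far (in seconds) through the piece of media the output
    has progressed, given one synchronisation point from the server."""

    def __init__(self, media_id, position_at_sync, total_length):
        self.media_id = media_id
        self.position_at_sync = position_at_sync   # seconds into the media
        self.synced_at = time.monotonic()          # local clock at sync
        self.total_length = total_length           # seconds

    def current_position(self):
        return self.position_at_sync + (time.monotonic() - self.synced_at)

    def finished(self):
        return self.current_position() >= self.total_length

    def resync(self, position_at_sync):
        """Called after any disruption (pause, buffering): the identification
        steps are repeated and the anchor is re-established."""
        self.position_at_sync = position_at_sync
        self.synced_at = time.monotonic()
```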
In this way, the media application 218 is able to obtain information (e.g. title) indicating what media is being output and how far through that media (in time) the outputting of the media is. To reassure the user 104, the media application 218 may display details about the piece of media on the display 204 when it has received them from the server 108. For example, the media application 218 may display the title, current point and total length of the piece of media currently being output.
Once the media application 218 has synched to the particular piece of media being output then the temporal position within the piece of media at which a user input is subsequently received can be determined. For example, in step S412 the media application 218 receives a user input from the user 104 during the output of the piece of media. The user 104 may provide the user input via the user interface of the user device 102. The user interface of the user device 102 comprises input apparatus by which the user can provide a user input. The input apparatus may for example comprise one or more of a touch screen, a button, a mouse, a touch pad, a voice recognition system and a remote control.
In a preferred embodiment, the user taps the touch screen 204 to provide the user input. Alternatively, a gesture can act as an input. As an example, the user may provide the user input when he sees something in the outputted piece of media which he wants to record, either to remember after the piece of media has finished or at the time the object of interest is displayed. For example, during a film, the user 104 may decide that he would like to buy an object being displayed in the film (e.g. a watch worn in a scene by an action hero). However, the user 104 may not want to interrupt his viewing experience of the film, so he decides that he will follow up on the object when the film has finished. In the past, when the film finished, the user might not proceed to buy the object, for one of many possible reasons. For example, the user may forget about his intention to buy the object, he may not know how to buy the object or he may proceed to do something else when the film finishes instead of buying the object. Alternatively, a viewer may want additional information about something shown in a TV documentary, such as an animal or location in a wildlife or holiday programme. In accordance with the novel methods described herein, by providing the user input when the user 104 sees the object, the user 104 will be reminded (e.g. when the film finishes) of what was on the screen at the time of providing the user input. As will become clear from the following description, an alternative is to present to the user information related to what was on the screen at the time of providing the user input, for example details about objects that were on the screen, or services or products related to what was on the screen. This information can be connected to the content or subject matter of the onscreen visual media itself, or only connected to a particular object in the frame or scene indicated by the user, and not connected to the overall content of the OVM.
When the user input is received, in step S414 a notification of the user input is sent to the server 108. The notification of the user input may comprise a time within the piece of media at which the user input was received in step S412.
All of the communications between the user device 102 and the server 108 occur over the network 106, e.g. via the network interface 214 and the network interface 304.
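Assuming the tracker sketched above, the notification of step S414 can be very small; the JSON field names below are invented for illustration.

```python
import json

def user_input_notification(tracker, user_id):
    """Build the step S414 notification: the timing of the user input
    within the identified piece of media."""
    return json.dumps({
        "user": user_id,
        "media": tracker.media_id,
        "position_seconds": round(tracker.current_position(), 2),
    })
```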
The server 108 receives the notification of the user input, and in step S416 the server 108 stores an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media. The portion can be a scene or frame or any other defined time period around a frame. Other examples include a particular camera shot (that is, a point of view in a scene), or a sequence of frames. A delayed motion event can cause a number of frames in a scene to be displayed to a user at the user device 102, to allow him to pick the precise frame/scene of interest. The indication of the portion (scene or frame) may be stored in a memory at the server 108. For example, the notification of the user input may indicate a time within the piece of media at which the user input is received, and step S416 may include determining which frame of the identified piece of media occurs at the identified time. The frame occurring at the identified time may then be stored at the server 108. That is, the frame itself may be stored. In this way, a screenshot can be saved of whatever is displayed on the display 204 at the time at which the user input is received.
Alternatively, the timing of the frame within the piece of media may be stored instead of the frame itself. In that case, the stored timing can subsequently be used to determine the frame of the video data of the identified piece of media occurring at the identified time. Steps S412 to S416 may be repeated throughout the outputting of the piece of media for each appropriate user input that is received by the media application 218.
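Resolving a stored timing to a frame of the video data is then simple arithmetic on the frame rate; in the sketch below the 25 fps default and the window size for the delayed motion case are assumptions.

```python
def frame_index(seconds, frame_rate=25.0):
    """Map a stored timing to the frame of the video data occurring at
    that time within the piece of media."""
    return int(seconds * frame_rate)

def frames_around(seconds, frame_rate=25.0, window=12):
    """For a delayed motion event: a small run of frame indices around the
    tapped moment, so the user can pick the precise frame or scene."""
    centre = frame_index(seconds, frame_rate)
    return list(range(max(0, centre - window), centre + window + 1))
```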
The dotted line in Figure 4 indicates a passage of time, after which in step S418 the outputting of the piece of media finishes. The finishing of the piece of media is detected by the media application 218. In step S420 the media application 218 sends an indication to the server 108 to indicate that the output of the piece of media at the user device 102 has finished.
The indication that the output of the piece of media has finished is received at the server 108. In response, in step S422 the server 108 sends the frame(s) of the piece of media, which are indicated by the indication(s) which were stored in step S416, to the user device 102. If the frames themselves were stored in step S416, then step S422 simply involves retrieving the frames from the data store where they were saved in step S416 and then sending the frames to the user device 102. Alternatively, if the timings of the frames were stored in step S416, then step S422 involves retrieving the timings from the data store where they were saved in step S416, using the timings and the video data of the known piece of media to determine the frames at the relevant timings, and sending the determined frames to the user device 102.
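Continuing the earlier server sketch (and reusing frame_index from the sketch above), step S422 could be realised along these lines; the media_frames mapping of decoded frames is an assumption made for the example.

```python
def frames_for_user(server, media_frames, user_id, frame_rate=25.0):
    """Turn each stored (media_id, seconds) bookmark into the frame
    occurring at that time and collect the frames for sending."""
    results = []
    for media_id, seconds in server.bookmarks.get(user_id, []):
        frames = media_frames[media_id]    # assumed: media_id -> list of frames
        results.append(frames[frame_index(seconds, frame_rate)])
    return results
```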
The frame(s) are received at the user device 102. In step S424 the received frames are displayed to the user 104 on the display 204 at the user device 102.
In this way the user 104 is reminded of what was on the screen when he decided to provide the user input in step S412.
In one implementation, as shown in Figure 4 by step S426, a link to a relevant webpage may be displayed with the frame that is displayed in step S424. The relevant web page may relate to an object displayed in the frame of the video data. For example, if the frame of video data includes a character wearing a dress, then there may be provided a link to a web page from which the dress can be purchased. Interaction with the link may cause a remote retailer to take action, such as to send a brochure or advertisement to the user device.
In another implementation, step S426 is not necessary, and instead step S424 of displaying the frame(s) includes automatically directing the user 104 to a webpage of an online store in which the frame(s) are displayed. In this way, the user 104 may be taken straight to an online store when the piece of media finishes, so that the user 104 can review their screenshots (i.e. the frames which caused them to provide the user input in step S412). Alternatively, the user 104 can log in separately to browse the internet for the items that attracted them in the frames which caused them to provide the user input in step S412.
Figures 6a to 6d show representations of an example web page to which the user 104 may be directed. Figure 6a shows a screen 602 to which the user 104 may first be directed in order to view the frames which he chose to be reminded about. Three frames (indicated as 604, 606 and 608) are shown in screen 602 which show the frames of the piece of media which the user chose to save.
The user 104 can select one of the saved scenes, e.g. by clicking on one of the frames 604, 606 or 608 using, for example, the touch screen 204 or keypad 210.
When the user 104 selects a scene from screen 602, screen 610 is displayed as shown in Figure 6b. Screen 610 requests that the user 104 selects a category in order to shop for items shown in the scene of the piece of media which he has selected. The example categories shown in Figure 6b are clothes 612, products 614 and accessories 616.
When the user 104 selects a category from screen 610, screen 618 is displayed as shown in Figure 6c. Screen 618 requests that the user 104 selects between the characters which are included in the selected scene. For example, as shown in Figure 6c, three characters are included in the scene, those being character A 620, character B 622 and character C 624.
When the user selects one of the characters in screen 618, he is taken to screen 626 as shown in Figure 6d. Screen 626 presents the user 104 with online shopping opportunities relating to the category and character selected in screens 610 and 618. For example, if the user has selected clothes 612 and character A 620, then screen 626 may present options for the user 104 to buy a dress or shoes by clicking on the respective links 630 and 632. The dress or shoes may be those worn by the selected character in the scene of the piece of media which includes the frame at the time for which the user provided the user input in step S412. Other information relating to the relevant products and/or characters may also be displayed in screen 626.
It can therefore be appreciated that the example implementation illustrated in Figures 6a to 6d enables the user 104 to purchase products and/or services via the media outputted at the user device 102 without interrupting the viewing experience, i.e. the purchasing of products and/or services occurs after the media has finished being outputted at the user device 102. In this implementation the media application 218 bridges the gap between product placement and product purchasing. This opens up huge potential for the way that viewers of media interact with the products included in, or extrapolated from, the media. The media application can also be used to provide instant information about an object on the screen device 103.
The media application 218 may be downloaded to the user device 102 over the network 106 and stored in the memory 212 of the user device 102. The media application 218 may, or may not, be downloaded in return for payment from the user 104 to a software provider of the media application 218.
A piece of media (e.g. a film) could have multiple versions, including for example a director's cut, extended cut, international cut and the final cut. When the audio data is matched to audio data of a known piece of media (in step S406) it may match with more than one of these versions. The default result which is assumed in this case is the final cut. However, the database 110 will store the temporal positions within the piece of media where the versions differ, and at this point the media application 218 may reconnect to the server 108 in order to verify which version of the piece of media is being output, i.e. to perform steps S404 to S410 again. This is done without requiring involvement from the user 104.
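A sketch of how those stored divergence points might be acted upon follows; the polling structure is an assumption, and resync_with_server stands in for a repeat of steps S404 to S410.

```python
def schedule_version_checks(divergence_points, tracker, resync_with_server):
    """Near each timing at which the cuts of a film diverge, reconnect to
    verify which version of the piece of media is being output."""
    pending = sorted(divergence_points)   # seconds into the default (final) cut

    def poll():
        # call periodically from the application's run loop
        while pending and tracker.current_position() >= pending[0]:
            pending.pop(0)
            resync_with_server()   # no involvement from the user is required
    return poll
```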
When the piece of media is outputted on a television channel which includes adverts, the connection between the user device 102 and the server 108 may be maintained (e.g. using a WiFi connection) and the sampling of the audio data of the outputted piece of media is continued, e.g. at regular intervals (e.g. every 5 seconds), to detect the presence of adverts. When adverts are detected the tracking of the output of the piece of media is paused until the adverts are finished and the output of the piece of media recommences. In this way, the media application 218 can maintain the sound sampling over WiFi or other wireless connection to differentiate between the piece of media and the advertisements, to ensure that the tracking of the output of the piece of media proceeds correctly.
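That periodic sampling could take the shape below, assuming the tracker from earlier and an identify callback that performs the server round trip (returning (None, None) when nothing matches); the 5-second interval matches the example above, everything else is illustrative.

```python
import time

def track_through_adverts(tracker, record_sample, identify, interval=5.0):
    """Sample the audio at regular intervals; while a sample no longer
    matches the tracked piece of media (an advert break), stop re-anchoring
    until the programme is recognised again."""
    while not tracker.finished():
        media_id, position = identify(record_sample())   # server round trip
        if media_id == tracker.media_id:
            # programme playing: correct any drift, including drift
            # accumulated while adverts were on screen
            tracker.resync(position)
        # else: advert detected; the anchor is left untouched, so tracking
        # is effectively paused until the programme recommences
        time.sleep(interval)
```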
To avoid the need to continuously track the media and advertisements so as to identify the adverts, it would be possible for the broadcaster of the television programme to provide data to the user device which would indicate when the adverts were starting and stopping, so as to indicate to the user device when to reconnect to the audio data for tracking. When the application is used with a voting TV show, the tracking identifies the points at which voting is allowed; the user's vote data can be sent to the show, and data on the overall votes can be sent to the user device.
In the embodiments described above, the user 104 wishes to remember a frame of the outputted media in order to purchase a product shown in the frame, or obtain more details about an object in the frame. These details can be connected to the content of the subject matter of the OVM, or only connected to the object itself and not the overall content of the OVM. As an alternative to details concerning an object, services or products connected to the OVM can constitute information available to a user of the user device. However, in other embodiments, the user 104 may wish to remember the frame for some other reason, which might not be related to the purchasing of products or services.
The analysis of the audio data for comparison with audio data of known pieces of media provides a fast and reliable method to identify a piece of media being outputted, and a temporal position within that piece of media.
The method steps described above and shown in Figure 4 may be implemented as functional modules at the user device 102 and the server 108 as appropriate, e.g. in hardware or software. For example, the method steps may be implemented by executing a computer program product on the respective CPU (202 or 302) to implement the steps in software. The media application 218 described above is an example of a suitable computer program product for implementing the steps at the user device 102.
In particular, in the above description, a method has been described wherein audio data is received by the user device 102 and then transmitted to the server. It would be possible for the user device 102 to carry out the steps which are described above as being carried out by the server, in particular the comparison and storing steps. This would be particularly appropriate in a situation where the provider of a piece of onscreen visual media, such as a film, created an application specifically for that film, and included a sound file with the application. That is, the media application described above could be made specific to a particular piece of onscreen visual media such that when the application is opened and the user input is received, there would be no requirement to have a server connection until a later time.
It will be appreciated that in the embodiments described above where the onscreen visual media is output on the screen device 103, the screen of the user device 102 is not displaying the media and can be considered effectively to be blank. Alternatively the screen could be used to display additional information to augment the OVM content. However, it is useful for a user to be aware that the App is open and responsive, and so the App can be designed to generate a "tracking" screen as shown in Figure 7a while the onscreen visual media is being tracked. When a user has provided a user input, for example by tap or gesture, a "scene saved" screen can be displayed to the user as shown, for example, in Figure 7b.
For watching movies in a cinema, the application could cause the device to adopt a cinema mode, in which the ringing tone is turned off, any camera is turned off, and notifications and recordings of any kind are prevented. The screen could show black, but would still allow sending and receiving data for recognition of the onscreen visual media.
In an alternative aspect of the invention, there may not be a requirement for a user to provide a user input during the output of the piece of media to identify a piece of media in which he is interested. Instead, the piece of media could be tracked using the audio data from the beginning of a particular piece, for example a television show, wherein the broadcasters of the show send information to the user device connected with the show which is being viewed by the user. For example, coupons or voting opportunities can be advised and displayed to a user of the user device while he is watching the television show, and based on the tracking of that show using the audio data.
A user could identify shows that he wishes to track in this fashion by using a tap or other input gesture at the beginning of a show once he has opened the application on his user device. Alternatively, the show could automatically (assuming the application is open) notify the application to commence tracking such that the show can interact with the user on the display of the user device.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (24)

CLAIMS:
  1. A method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving a notification of a user input during the output of the piece of media to the user; and storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  2. The method of claim 1, wherein the notification of the user input indicates the timing of the user input within the piece of media.
  3. The method of claim 1, wherein said comparing step further identifies a temporal position of the received audio data within the identified piece of media.
  4. The method of claim 1, wherein said indication of a portion of the video data is one of: a frame of the video data itself; a scene of the video data; a shot in the video data; and a sequence of frames of the video data.
  5. The method of claim 1, wherein said indication of a portion of the video data is the timing of the portion within the piece of media, wherein said timing is subsequently used to determine the portion of the video data of the identified piece of media.
  6. The method of claim 1, wherein the method steps are performed at a server, the piece of media is outputted to the user at a screen device, and the audio data is received at a user device associated with the user and transmitted to the server.
  7. The method of claim 6, further comprising sending an indication of the identified piece of media to the user device.
  8. The method of claim 6, when dependent upon claim 3, further comprising sending the identified temporal position to the user device.
  9. The method of claim 6, further comprising: using said stored indication of the portion to determine the portion; and sending the determined portion or information about an object contained in the determined portion to the user device.
  10. The method of claim 9, wherein the determined portion or information is sent to the user device after the piece of media has finished being outputted at the user device.
  11. A computer device configured to process media data, the computer device comprising: a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user; a data store configured to store known pieces of media; a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  12. The computer device of claim 11, which is one of a server and a user device.
  13. A computer program product configured to process media data, the computer program product being embodied on a computer-readable storage medium and configured so as when executed on a processor of a server to perform the operations of: receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving notification of a user input during the output of the piece of media to the user; and storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  14. A method of processing media data at a user device, the method comprising: receiving at a user device audio data of a piece of media outputted to a user; sending the audio data of the outputted piece of media for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  15. The method of claim 14, further comprising receiving an indication of the identified piece of media from the server.
  16. The method of claim 14, further comprising receiving, from the server, an identifier of a temporal position, within the identified piece of media, of the audio data sent to the server.
  17. The method of claim 16, wherein receiving the audio data causes the user device to track the output of the piece of media.
  18. The method of claim 17, further comprising using the tracking of the output of the piece of media to determine the timing of the user input within the piece of media.
  19. The method of claim 18, wherein the notification of the user input sent to the server comprises an indication of the determined timing of the user input.
  20. The method of claim 14, wherein said user input is received via a user interface of the user device, said user interface comprising at least one of a touch screen, a button, a mouse, a touch pad, a voice recognition system, a remote control and a gesture recognition system.
  21. The method of claim 14, further comprising: after the piece of media has finished being outputted, receiving said indicated portion of the video data of the identified piece of media or information about an object contained in the identified piece of media corresponding to the timing of the user input within the piece of media; and displaying said portion or information at the user device.
  22. The method of claim 21, further comprising providing a link to a web page relating to the object displayed in said portion of the video data.
  23. A user device configured to process media, the user device comprising: an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data; a user interface configured to receive a user input from the user during the output of the piece of media to the user; and a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and (ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  24. The user device of claim 23, wherein the input apparatus comprises at least one of a touch screen, a button, a mouse, a touch pad, a voice recognition system, a gesture recognition system and a remote control.
  25. The user device of claim 23, comprising: a display operable to present a tracking screen while receiving audio data from the piece of media and a storing screen when sending the notification of the user input.
  26. The user device of claim 25, operable to receive the identified piece of media or information about an object in the identified piece of media, wherein the display is operable to present the identified piece of media or information.
  27. A computer program product configured to process media data at a user device, the computer program product being embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the operations of: receiving audio data of a piece of media; sending audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  28. The method of claim 1, computer device of claim 11, computer program product of claim 15, method of claim 14, user device of claim 23 or computer program product of claim 27, wherein said indication of a portion of the video data comprises information related to the portion of video data.
  29. The method, computer device, user device or computer program product of claim 28, wherein said information comprises details relating to an object in the portion of video data or information about products or services relating to the content of the video data.
  30. The method of claim 14, further comprising: concurrently with outputting the piece of media, receiving said indicated portion of the video data of the identified piece of media or information about an object contained in the identified piece of media corresponding to the timing of the user input within the piece of media; and displaying said portion or information at the user device.
  31. A method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; tracking the outputted piece of media while it is output to a user; and receiving at a user device items for display to a user based on said tracking.
  32. A user device comprising a processor configured to execute a computer program product which when executed implements the method of claim 31, the user device further comprising a display for displaying said items.
GB1223277.3A 2012-12-21 2012-12-21 Processing media data Expired - Fee Related GB2509150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1223277.3A GB2509150B (en) 2012-12-21 2012-12-21 Processing media data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1223277.3A GB2509150B (en) 2012-12-21 2012-12-21 Processing media data

Publications (3)

Publication Number Publication Date
GB201223277D0 GB201223277D0 (en) 2013-02-06
GB2509150A (en) 2014-06-25
GB2509150B GB2509150B (en) 2016-05-18

Family

ID=47682493

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1223277.3A Expired - Fee Related GB2509150B (en) 2012-12-21 2012-12-21 Processing media data

Country Status (1)

Country Link
GB (1) GB2509150B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120315014A1 (en) * 2011-06-10 2012-12-13 Brian Shuster Audio fingerprinting to bookmark a location within a video

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754661A (en) * 2019-03-18 2019-05-14 北京一维大成科技有限公司 A kind of on-line study method, apparatus, equipment and medium

Also Published As

Publication number Publication date
GB201223277D0 (en) 2013-02-06
GB2509150B (en) 2016-05-18

Similar Documents

Publication Publication Date Title
US11443511B2 (en) Systems and methods for presenting supplemental content in augmented reality
US20180343476A1 (en) Delivery of different services through client devices by video and interactive service provider
US9015745B2 (en) Method and system for detection of user-initiated events utilizing automatic content recognition
US10650442B2 (en) Systems and methods for presentation and analysis of media content
US9460204B2 (en) Apparatus and method for scene change detection-based trigger for audio fingerprinting analysis
US20130173765A1 (en) Systems and methods for assigning roles between user devices
US20130174191A1 (en) Systems and methods for incentivizing user interaction with promotional content on a secondary device
US20120120296A1 (en) Methods and Systems for Dynamically Presenting Enhanced Content During a Presentation of a Media Content Instance
KR20100039706A (en) Method for providing dynamic contents service using analysis of user's response and apparatus thereof
US20130339998A1 (en) Systems and methods for providing related media content listings during media content credits
WO2012006023A2 (en) Apparatus, systems and methods for accessing and synchronizing presentation of media content and supplemental media rich content
GB2527415A (en) Methods and systems for performing playback operations based on the length of time a user is outside a viewing area
US20180014066A1 (en) System and methods for facile, instant, and minimally disruptive playback of media files
US20100145796A1 (en) System and apparatus for interactive product placement
US20120042041A1 (en) Information processing apparatus, information processing system, information processing method, and program
GB2534321A (en) Systems and methods for receiving product data for a product featured in a media asset
US9781492B2 (en) Systems and methods for making video discoverable
WO2015135001A1 (en) Electronic system and method to render additional information with displayed media
CN103918277A (en) System and method for determining a level of confidence that a media item is being presented
US20130085846A1 (en) System and method for online selling of products appearing on a display
US20130177286A1 (en) Noninvasive accurate audio synchronization
GB2509150A (en) Bookmarking a scene within a video clip by identifying the associated audio stream
US10141023B2 (en) Method and system for multimedia summary generation
KR101257913B1 (en) Interworking system of media relational information using a smart media play-back device and smart terminal unit and method thereof
TW201626807A (en) Method and system of displaying and controlling, breakaway judging apparatus and video/audio processing apparatus

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20161221