GB2509150A - Bookmarking a scene within a video clip by identifying the associated audio stream - Google Patents

Bookmarking a scene within a video clip by identifying the associated audio stream

Info

Publication number
GB2509150A
Authority
GB
United Kingdom
Prior art keywords
media
piece
user
audio data
user device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB201223277A
Other versions
GB201223277D0 (en)
GB2509150B (en)
Inventor
Steve Hamilton Shaw
Daniel Laurence
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ROCKET PICTURES Ltd
Original Assignee
ROCKET PICTURES Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ROCKET PICTURES Ltd filed Critical ROCKET PICTURES Ltd
Priority to GB1223277.3A priority Critical patent/GB2509150B/en
Publication of GB201223277D0 publication Critical patent/GB201223277D0/en
Publication of GB2509150A publication Critical patent/GB2509150A/en
Application granted granted Critical
Publication of GB2509150B publication Critical patent/GB2509150B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/35Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H60/37Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying segments of broadcast information, e.g. scenes or extracting programme ID
    • H04H60/372Programme
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H60/58Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of audio
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47214End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for content reservation or setting reminders; for requesting event notification, e.g. of sport results or stock market
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/65Transmission of management data between client and server
    • H04N21/658Transmission by the client directed to the server
    • H04N21/6581Reference data, e.g. a movie identifier for ordering a movie or a product identifier in a home shopping application
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Bookmarking a point within a piece of audio-visual media being output to a user, wherein audio from the playing media is received at a device, the media is identified by comparing it with audio from known media programmes, the device is notified of a user input and stores an indication of a portion of the video that corresponds to the timing of the user input. Preferably the indication may be the timing of the portion of the video or it may be a video frame, scene, shot or sequence. There may be a touchscreen user device which receives the audio data from the piece of media and sends this data to a server for identification. Preferably the user is sent information containing details about an object, product or service relating to the content of the video at the time of the notification. This allows a user to return after the programme to a featured product or item.

Description

PROCESSING MEDIA DATA
Field of the Invention
The present invention relates to processing media data.
Background
As a user watches a piece of media (which may, for example, be a film, a live television programme, a pre-recorded television programme, a music video or an advertisement), he may see something in the video data of the piece of media which he would like to remember later, e.g. when the piece of media has finished being outputted at the user device. For example, during a film, the user may see a particular object being displayed in the film (e.g. a watch worn in a scene by an action hero) for which he would like to find out more details. At present, a user has to either pause the film to identify the object (if he can do that) or attempt to recall the object when the film finishes. For example, in a cinema, pausing the film is not an option for a viewer.
Summary
The inventors recognise that many viewers of media, termed herein "Onscreen Visual Media" (OVM), have a variety of user devices. For example, users may have a smartphone, a tablet, a laptop, a personal computer (PC) or a gaming device. Such devices are capable of executing applications (Apps) which interface with a user.
There are described herein methods by which a user can use an application executed at a user device in order to "remember" a moment during output of a piece of media. The pieces of media referred to herein include both video data and audio data output in synchronisation. The application can record a frame of the piece of media, corresponding to the moment within the piece of media that the user wanted to remember, for display to the user. This can be done when the movie or programme is finished, or during the movie or programme. In methods described herein, this is achieved by identifying a piece of media using the audio data of the piece of media. A small portion of audio data (e.g. of the order of 1 to 10 seconds) of a piece of media is usually sufficient to identify the piece of media and the temporal position of the audio data within the piece of media.
The application can then track the output of the piece of media.
In particular, in a first aspect there is provided a method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving notification of a user input at a user device during the output of the piece of media to the user; and storing an indication of a portion, e.g. a frame or scene, of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media. The method of the first aspect may be performed at a server or at the user device. For example, a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the server to perform the method of the first aspect.
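By way of illustration only, the following minimal Python sketch shows one way the first aspect could be realised at a server. The class and function names, the landmark-hash fingerprint representation and the in-memory data store are assumptions made for the example, not features of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class KnownMedia:
    title: str
    fingerprints: dict   # landmark hash -> offset (seconds) into the media
    duration: float

@dataclass
class MediaServer:
    catalogue: dict = field(default_factory=dict)   # media_id -> KnownMedia
    bookmarks: dict = field(default_factory=dict)   # user_id -> [(media_id, seconds)]

    def identify(self, sample_fingerprints):
        """Compare received audio data to audio data of known pieces of media:
        count landmark hashes in common and return the best-matching media id
        together with the matched offset (in seconds) into that media."""
        best_id, best_offset, best_hits = None, 0.0, 0
        for media_id, media in self.catalogue.items():
            # for each shared hash, estimate where the sample starts in the media
            starts = [media.fingerprints[h] - off
                      for h, off in sample_fingerprints.items()
                      if h in media.fingerprints]
            if len(starts) > best_hits:
                best_id, best_offset, best_hits = media_id, min(starts), len(starts)
        if best_id is None:
            raise LookupError("no known piece of media matches the sample")
        return best_id, best_offset

    def store_bookmark(self, user_id, media_id, seconds):
        """Store an indication (here, a timing) of the portion of the video
        data corresponding to the timing of the user input."""
        self.bookmarks.setdefault(user_id, []).append((media_id, seconds))
```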
In a second aspect, there is provided a method of processing media data, the method comprising: receiving at a user device audio data of a piece of media outputted to a user; sending the audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
As an example, a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the method of the second aspect.
In this way, the audio data of a piece of media is used to identify the piece of media. The timing of a user input within the piece of media corresponds to a frame of the video data of the piece of media, and an indication of that frame is stored. This allows the user to provide a user input during the output of the piece of media in order to remember a moment within the piece of media.
The invention also provides a computer device configured to process media data, the computer device comprising: a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user; a data store configured to store known pieces of media; a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
The invention also provides a user device configured to process media, the user device comprising: an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data; a user interface configured to receive a user input from the user during the output of the piece of media to the user; and a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and (ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
Brief Description of the Drawings
For a better understanding of the present invention and to show how the same may be put into effect, reference will now be made, by way of example, to the following drawings in which: Figure 1 shows a schematic illustration of a network; Figure 2 is a schematic functional block diagram of a user device; Figure 3 is a schematic functional block diagram of a server; Figure 4 is a flow chart for a process of processing media according to a preferred embodiment; Figure 5 is an example of a graph showing the amplitude of an audio signal as a function of time; and Figures 6a to 6d, 7a and 7b show examples of user interfaces displayed at a user device.
Embodiments
Preferred embodiments of the invention will now be described by way of example only.
Figure 1 shows a system including a user device 102 which is useable by a user 104. The user 104 is watching a piece of media on another device 103, referred to as a screen device, which could be a cinema screen, or a computer, DVD or television, for example. The user device 102 can connect to the network 106, which may for example be the Internet. The user device 102 may for example be a mobile phone (e.g. a smartphone), a tablet, a laptop, a personal computer ("PC"), a gaming device or other embedded device able to communicate over the network 106. The user device 102 is arranged to receive information from and output information to the user 104. The network 106 comprises a server 108 which has access to a data store such as a database 110. Many more nodes than those shown in Figure 1 may be connected to the network 106, but for clarity only the user device 102 and server 108 are shown in Figure 1. The user device 102 and the server 108 can communicate with each other over the network 106. For example, where the network 106 is the Internet, the user device 102 and the server 108 can communicate with each other by sending Internet Protocol (IP) data packets across the network 106. It will be appreciated that if the network 106 is a network other than the Internet then data packets may be formatted and sent according to some other, appropriate protocol.
Figure 2 illustrates a detailed view of the user device 102. The user device 102 comprises a processor ("CPU") 202 configured to process data on the user device 102. Connected to the CPU 202 is a display 204 which may be implemented as a touch screen for inputting data to the CPU 202. Also connected to the CPU 202 are speakers 206 for outputting audio data, a microphone 208 for receiving audio data, a keypad 210, a memory 212 for storing data and a network interface 214 for connecting to the network 106. The display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 are integrated into the user device 102 (e.g. when the user device 102 is a mobile phone). The display 204 and speakers 206 act as output apparatus of the user device 102 for outputting video and audio data respectively. The display 204 (when implemented as a touch screen), microphone 208 and keypad 210 act as input apparatus of the user device 102. In alternative user devices one or more of the display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 may not be integrated into the user device 102 and may be connected to the CPU 202 via respective interfaces. One example of such an interface is a USB interface.
The user device 102 may include other components which are not shown in Figure 2. For example, when the user device 102 is a PC, the CPU 202 may be connected to a mouse via a USB interface. Similarly, when the user device 102 is a laptop, the CPU 202 may be connected to a touchpad via a USB interface. As another example, when the user device 102 is a television, the CPU 202 may be connected to a remote control via a wireless (e.g. infrared) interface.
An operating system (OS) 216 is running on the CPU 202. The user device 102 is configured to execute a media application 218 on top of the OS 216. The media application 218 is a computer program product which is configured to process media data at the user device 102. The media application 218 is stored in the memory 212 and when executed on the CPU 202 performs methods described in more detail below for processing media data at the user device 102.
Figure 3 illustrates a detailed view of the server 108. The server 108 comprises a processor ("CPU") 302 configured to process data on the server 108. Connected to the CPU 302 is a network interface 304 for connecting to the network 106. The server 108 also includes a memory for storing data which may include the database 110. A computer program product may be stored in the memory at the server 108 and configured such that, when executed on the CPU 302, it performs methods described in more detail below for processing media at the server 108.
With reference to Figures 4 to 6 there is now described a method of a preferred embodiment. In Figure 4, the steps shown in the left hand column (i.e. steps S402, S404, S410, S412, S414, S418, S420, S424 and S426) are implemented at the user device 102, whereas the steps shown in the right hand column (i.e. steps S406, S408, S416 and S422) are implemented at the server 108.
In step S402 a piece of media is outputted to the user 104 at the screen device 103. For example, the screen device 103 may be showing a film or TV programme. The film includes synchronised streams of video and audio data. As described above, the piece of media may be a television programme, music video, advert, or any other OVM. The piece of media can come from any source, and can be shown on any suitable screen device. Moreover, it could be streamed to the user device or stored at the user device for display at the user device itself. In that case, the screen device 103 is the same as the user device 102.
The user 104 opens the media application 218, such that the media application 218 executes on the CPU 202. The media application 218 takes a sample of the audio data of the piece of media currently being output at the screen device 103. The sample may for example have a duration in the range of 1 second to 10 seconds. The sample has a sufficient duration for the piece of media to be identified as described below.
In step S404, the audio data of the piece of media which is outputted at the screen device 103 is received by the user device 102, which sends it over the network 106 to the server 108 (e.g. via the network interface 214 and the network interface 304). For example, the network interface 214 may connect the user device 102 to the network 106 via a Wi-Fi router or via a cellular telephone network (e.g. implementing 3rd or 4th generation mobile telecommunications (3G or 4G) technology).
The server 108 receives the audio data sent from the user device 102, and in step S406 the server 108 uses the received audio data to identify the piece of media being outputted at the screen device 103. The server 108 can also identify the exact point in the piece of media at which the audio data occurs. This takes a few seconds, and is implemented as described below.
The audio data may be represented as a graph of amplitude against time, such as that shown in Figure 5. A piece of media has a unique signature through samples of its audio data. This is true even for audio samples which have a duration of the order of 1 to 10 seconds.
The server 108 has access to a data store which stores audio data of known pieces of media. For example, in the embodiments described in detail herein, the data store is implemented as a database 110 stored at the server 108. The database 110 may for example store data representing power functions (such as that shown in Figure 5) for the audio data of the known pieces of media. As described above, the known pieces of media may include, for example, a film, a live television programme, a pre-recorded television programme, a music video, an advert, or any other OVM which may be output.
In step S406 the audio data received from the user device 102 is compared to audio data of known pieces of media stored in the database 110, to thereby identify the piece of media being outputted at the screen device 103. The audio signature of a piece of media is used to differentiate between known pieces of media and to determine the exact position of the audio data within a piece of media. The comparison of the received audio data with the known pieces of audio data may be performed using known algorithms. This involves comparing features (e.g. audio fingerprints) of audio data using statistical analysis to determine whether two samples of audio data match each other. For example, applications such as Shazam, Soundhound, SoundPrint and IntoNow implement algorithms for identifying audio data by comparing it with audio data from a database of audio data from known pieces of media. As an example, the IntoNow implementation is described in US patent publication number 2012/0209612 A1. Since such algorithms are known in the art, they are not described in detail herein.
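Purely to illustrate the landmark style of algorithm referred to above, and not as a description of any of the named products, a toy fingerprinting routine might look as follows; the window length, the one-peak-per-slice selection and the hashing scheme are arbitrary assumptions. Matching then reduces to counting the hashes two fingerprint sets share, as in the server sketch earlier.

```python
import numpy as np
from scipy.signal import spectrogram

def landmark_fingerprints(samples, rate, fan_out=5):
    """Toy Shazam-style fingerprinting: take the loudest spectrogram bin in
    each time slice as a peak, then hash pairs of nearby peaks together with
    their time separation to form landmarks."""
    freqs, times, sxx = spectrogram(samples, fs=rate, nperseg=1024)
    peaks = [(ti, int(np.argmax(sxx[:, ti]))) for ti in range(sxx.shape[1])]
    landmarks = {}
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            h = hash((f1, f2, t2 - t1))       # peak pair plus time delta
            landmarks[h] = float(times[t1])   # landmark offset within the sample
    return landmarks
```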
Step S406 identifies which piece of media the user 104 is viewing and also identifies the exact temporal position within the piece of media to which the audio data matches. In step S408 the server 108 sends an indication of the identified piece of media to the user device 102. In step S408 the server 108 also sends an identifier of the temporal position within the identified piece of media to the user device 102.
The user device 102 receives the indication of the piece of media (e.g. the title of the piece of media) and the identifier of the temporal position (e.g. a time from the start of the piece of media). Then in step S410, using the indication of the piece of media and the identifier of the temporal position, the media application 218 tracks the outputting of the piece of media. The tracking of the piece of media may continue until completion of the outputting of the piece of media, that is, until the piece of media finishes being output. However, if there is any disruption in the outputting of the piece of media then the media application 218 will need to reconnect to the server 108 in order to correctly track the output of the piece of media. For example, the film may be paused on TV, or buffering issues may interrupt the playout if the media is being streamed.
A reconnect process would repeat steps S404 to S410 as described above.
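One plausible way for the media application to track the output locally between reconnections is to anchor the server-reported temporal position to a monotonic clock, as in the sketch below; the class name and the resync policy are assumptions for illustration.

```python
import time

class PlaybackTracker:
    """Tracks how far (in seconds) through the piece of media the output
    has progressed, given one synchronisation point from the server."""

    def __init__(self, media_id, position_at_sync, total_length):
        self.media_id = media_id
        self.position_at_sync = position_at_sync   # seconds into the media
        self.synced_at = time.monotonic()          # local clock at sync
        self.total_length = total_length           # seconds

    def current_position(self):
        return self.position_at_sync + (time.monotonic() - self.synced_at)

    def finished(self):
        return self.current_position() >= self.total_length

    def resync(self, position_at_sync):
        """Called after any disruption (pause, buffering): the identification
        steps are repeated and the anchor is re-established."""
        self.position_at_sync = position_at_sync
        self.synced_at = time.monotonic()
```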
In this way, the media application 218 is able to obtain information (e.g. title) indicating what media is being output and how far through that media (in time) the outputting of the media is. To reassure the user 104, the media application 218 may display details about the piece of media on the display 204 when it has received them from the server 108. For example, the media application 218 may display the title, current point and total length of the piece of media currently being output.
Once the media application 218 has synched to the particular piece of media being output then the temporal position within the piece of media at which a user input is subsequently received can be determined. For example, in step S412 the media application 218 receives a user input from the user 104 during the output of the piece of media. The user 104 may provide the user input via the user interface of the user device 102. The user interface of the user device 102 comprises input apparatus by which the user can provide a user input. The input apparatus may for example comprise one or more of a touch screen, a button, a mouse, a touch pad, a voice recognition system and a remote control.
In a preferred embodiment, the user taps the touch screen 204 to provide the user input. Alternatively, a gesture can act as an input. As an example, the user may provide the user input when he sees something in the outputted piece of media which he wants to record, either to remember after the piece of media has finished or at the time the object of interest is displayed. For example, during a film, the user 104 may decide that he would like to buy an object being displayed in the film (e.g. a watch worn in a scene by an action hero). However, the user 104 may not want to interrupt his viewing experience of the film, so he decides that he will follow up on the object when the film has finished. In the past, when the film finished, the user might not proceed to buy the object, for one of many possible reasons. For example, the user may forget about his intention to buy the object, he may not know how to buy the object or he may proceed to do something else when the film finishes instead of buying the object. Alternatively, a viewer may want additional information about something shown in a TV documentary, such as an animal or location in a wildlife or holiday programme. In accordance with the novel methods described herein, by providing the user input when the user 104 sees the object, the user 104 will be reminded (e.g. when the film finishes) of what was on the screen at the time of providing the user input. As will become clear from the following description, an alternative is to present to the user information related to what was on the screen at the time of providing the user input, for example details about objects that were on the screen, or services or products related to what was on the screen. This information can be connected to the content or subject matter of the onscreen visual media itself, or only connected to a particular object in the frame or scene indicated by the user, and not connected to the overall content of the OVM.
When the user input is received, in step S414 a notification of the user input is sent to the server 108. The notification of the user input may comprise a time within the piece of media at which the user input was received in step S412.
All of the communications between the user device 102 and the server 108 occur over the network 106, e.g. via the network interface 214 and the network interface 304.
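Assuming the tracker sketched above, the notification of step S414 can be very small; the JSON field names below are invented for illustration.

```python
import json

def user_input_notification(tracker, user_id):
    """Build the step S414 notification: the timing of the user input
    within the identified piece of media."""
    return json.dumps({
        "user": user_id,
        "media": tracker.media_id,
        "position_seconds": round(tracker.current_position(), 2),
    })
```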
The server 108 receives the notification of the user input, and in step S416 the server 108 stores an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media. The portion can be a scene or frame or any other defined time period around a frame. Other examples include a particular camera shot (that is, a point of view in a scene), or a sequence of frames. A delayed motion event can cause a number of frames in a scene to be displayed to a user at the user device 102, to allow him to pick the precise frame/scene of interest. The indication of the portion (scene or frame) may be stored in a memory at the server 108. For example, the notification of the user input may indicate a time within the piece of media at which the user input is received, and step S416 may include determining which frame of the identified piece of media occurs at the identified time. The frame occurring at the identified time may then be stored at the server 108. That is, the frame itself may be stored. In this way, a screenshot can be saved of whatever is displayed on the display 204 at the time at which the user input is received.
Alternatively, the timing of the frame within the piece of media may be stored instead of the frame itself. In that case, the stored timing can subsequently be used to determine the frame of the video data of the identified piece of media occurring at the identified time. Steps S412 to S416 may be repeated throughout the outputting of the piece of media for each appropriate user input that is received by the media application 218.
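Resolving a stored timing to a frame of the video data is then simple arithmetic on the frame rate; in the sketch below the 25 fps default and the window size for the delayed motion case are assumptions.

```python
def frame_index(seconds, frame_rate=25.0):
    """Map a stored timing to the frame of the video data occurring at
    that time within the piece of media."""
    return int(seconds * frame_rate)

def frames_around(seconds, frame_rate=25.0, window=12):
    """For a delayed motion event: a small run of frame indices around the
    tapped moment, so the user can pick the precise frame or scene."""
    centre = frame_index(seconds, frame_rate)
    return list(range(max(0, centre - window), centre + window + 1))
```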
The dotted line in Figure 4 indicates a passage of time, after which in step S418 the outputting of the piece of media finishes. The finishing of the piece of media is detected by the media application 218. In step S420 the media application 218 sends an indication to the server 108 to indicate that the output of the piece of media at the user device 102 has finished.
The indication that the output of the piece of media has finished is received at the server 108. In response, in step S422 the server 108 sends the frame(s) of the piece of media, which are indicated by the indication(s) which were stored in step S416, to the user device 102. If the frames themselves were stored in step S416, then step S422 simply involves retrieving the frames from the data store where they were saved in step S416 and then sending the frames to the user device 102. Alternatively, if the timings of the frames were stored in step S416, then step S422 involves retrieving the timings from the data store where they were saved in step S416, using the timings and the video data of the known piece of media to determine the frames at the relevant timings, and sending the determined frames to the user device 102.
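Continuing the earlier server sketch (and reusing frame_index from the sketch above), step S422 could be realised along these lines; the media_frames mapping of decoded frames is an assumption made for the example.

```python
def frames_for_user(server, media_frames, user_id, frame_rate=25.0):
    """Turn each stored (media_id, seconds) bookmark into the frame
    occurring at that time and collect the frames for sending."""
    results = []
    for media_id, seconds in server.bookmarks.get(user_id, []):
        frames = media_frames[media_id]    # assumed: media_id -> list of frames
        results.append(frames[frame_index(seconds, frame_rate)])
    return results
```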
The frame(s) are received at the user device 102. In step S424 the received frames are displayed to the user 104 on the display 204 at the user device 102.
In this way the user 104 is reminded of what was on the screen when he decided to provide the user input in step S412.
In one implementation, as shown in Figure 4 by step S426, a link to a relevant webpage may be displayed with the frame that is displayed in step S424. The relevant web page may relate to an object displayed in the frame of the video data. For example, if the frame of video data includes a character wearing a dress, then there may be provided a link to a web page from which the dress can be purchased. Interaction with the link may cause a remote retailer to take action, such as to send a brochure or advertisement to the user device.
In another implementation, step S426 is not necessary, and instead step S424 of displaying the frame(s) includes automatically directing the user 104 to a webpage of an online store in which the frame(s) are displayed. In this way, the user 104 may be taken straight to an online store when the piece of media finishes, so that the user 104 can review their screenshots (i.e. the frames which caused them to provide the user input in step S412). Alternatively, the user 104 can log in separately to browse the internet for the items that attracted them in the frames which caused them to provide the user input in step S412.
Figures 6a to 6d show representations of an example web page to which the user 104 may be directed. Figure 6a shows a screen 602 to which the user 104 may first be directed in order to view the frames which he chose to be reminded about. Three frames (indicated as 604, 606 and 608) are shown in screen 602 which show the frames of the piece of media which the user chose to save.
The user 104 can select one of the saved scenes, e.g. by clicking on one of the frames 604, 606 or 608 using, for example, the touch screen 204 or keypad 210.
When the user 104 selects a scene from screen 602, screen 610 is displayed as shown in Figure 6b. Screen 610 requests that the user 104 selects a category in order to shop for items shown in the scene of the piece of media which he has selected. The example categories shown in Figure 6b are clothes 612, products 614 and accessories 616.
When the user 104 selects a category from screen 610, screen 618 is displayed as shown in Figure 6c. Screen 618 requests that the user 104 selects between the characters which are included in the selected scene. For example, as shown in Figure 6c, three characters are included in the scene, those being character A 620, character B 622 and character C 624.
When the user selects one of the characters in screen 618, he is taken to screen 626 as shown in Figure 6d. Screen 626 presents the user 104 with online shopping opportunities relating to the category and character selected in screens 610 and 618. For example, if the user has selected clothes 612 and character A 620, then screen 626 may present options for the user 104 to buy a dress or shoes by clicking on the respective links 630 and 632. The dress or shoes may be those worn by the selected character in the scene of the piece of media which includes the frame at the time for which the user provided the user input in step S412. Other information relating to the relevant products and/or characters may also be displayed in screen 626.
It can therefore be appreciated that the example implementation illustrated in Figures 6a to 6d enables the user 104 to purchase products and/or services via the media outputted at the user device 102 without interrupting the viewing experience, i.e. the purchasing of products and/or services occurs after the media has finished being outputted at the user device 102. In this implementation the media application 218 bridges the gap between product placement and product purchasing. This opens up huge potential for the way that viewers of media interact with the products included in, or extrapolated from, the media. The media application can also be used to provide instant information about an object on the screen device 103.
The media application 218 may be downloaded to the user device 102 over the network 106 and stored in the memory 212 of the user device 102. The media application 218 may, or may not, be downloaded in return for payment from the user 104 to a software provider of the media application 218.
A piece of media (e.g. a film) could have multiple versions, including for example a director's cut, extended cut, international cut and the final cut. When the audio data is matched to audio data of a known piece of media (in step S406) it may match with more than one of these versions. The default result which is assumed in this case is the final cut. However, the database 110 will store the temporal positions within the piece of media where the versions differ, and at this point the media application 218 may reconnect to the server 108 in order to verify which version of the piece of media is being output, i.e. to perform steps S404 to S410 again. This is done without requiring involvement from the user 104.
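A sketch of how those stored divergence points might be acted upon follows; the polling structure is an assumption, and resync_with_server stands in for a repeat of steps S404 to S410.

```python
def schedule_version_checks(divergence_points, tracker, resync_with_server):
    """Near each timing at which the cuts of a film diverge, reconnect to
    verify which version of the piece of media is being output."""
    pending = sorted(divergence_points)   # seconds into the default (final) cut

    def poll():
        # call periodically from the application's run loop
        while pending and tracker.current_position() >= pending[0]:
            pending.pop(0)
            resync_with_server()   # no involvement from the user is required
    return poll
```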
When the piece of media is outputted on a television channel which includes adverts, the connection between the user device 102 and the server 108 may be maintained (e.g. using a WiFi connection) and the sampling of the audio data of the outputted piece of media is continued, e.g. at regular intervals (e.g. every 5 seconds), to detect the presence of adverts. When adverts are detected the tracking of the output of the piece of media is paused until the adverts are finished and the output of the piece of media recommences. In this way, the media application 218 can maintain the sound sampling over WiFi or other wireless connection to differentiate between the piece of media and the advertisements, to ensure that the tracking of the output of the piece of media proceeds correctly.
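That periodic sampling could take the shape below, assuming the tracker from earlier and an identify callback that performs the server round trip (returning (None, None) when nothing matches); the 5-second interval matches the example above, everything else is illustrative.

```python
import time

def track_through_adverts(tracker, record_sample, identify, interval=5.0):
    """Sample the audio at regular intervals; while a sample no longer
    matches the tracked piece of media (an advert break), stop re-anchoring
    until the programme is recognised again."""
    while not tracker.finished():
        media_id, position = identify(record_sample())   # server round trip
        if media_id == tracker.media_id:
            # programme playing: correct any drift, including drift
            # accumulated while adverts were on screen
            tracker.resync(position)
        # else: advert detected; the anchor is left untouched, so tracking
        # is effectively paused until the programme recommences
        time.sleep(interval)
```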
To avoid the need to continuously track the media and advertisements so as to identify the adverts, it would be possible for the broadcaster of the television programme to provide data to the user device which would indicate when the adverts were starting and stopping, so as to indicate to the user device when to reconnect to the audio data for tracking. When the application is used with a voting TV show, the tracking identifies the points at which voting is allowed; the user's vote data can be sent to the show, and data on the overall votes can be sent to the user device.
In the embodiments described above, the user 104 wishes to remember a frame of the outputted media in order to purchase a product shown in the frame, or obtain more details about an object in the frame. These details can be connected to the content of the subject matter of the OVM, or only connected to the object itself and not the overall content of the OVM. As an alternative to details concerning an object, services or products connected to the OVM can constitute information available to a user of the user device. However, in other embodiments, the user 104 may wish to remember the frame for some other reason, which might not be related to the purchasing of products or services.
The analysis of the audio data for comparison with audio data of known pieces of media provides a fast and reliable method to identify a piece of media being outputted, and a temporal position within that piece of media.
The method steps described above and shown in Figure 4 may be implemented as functional modules at the user device 102 and the server 108 as appropriate, e.g. in hardware or software. For example, the method steps may be implemented by executing a computer program product on the respective CPU (202 or 302) to implement the steps in software. The media application 218 described above is an example of a suitable computer program product for implementing the steps at the user device 102.
In particular, in the above description, a method has been described wherein audio data is received by the user device 102 and then transmitted to the server. It would be possible for the user device 102 to carry out the steps which are described above as being carried out by the server, in particular the comparison and storing steps. This would be particularly appropriate in a situation where the provider of a piece of onscreen visual media, such as a film, created an application specifically for that film, and included a sound file with the application. That is, the media application described above could be made specific to a particular piece of onscreen visual media such that when the application is opened and the user input is received, there would be no requirement to have a server connection until a later time.
It will be appreciated that in the embodiments described above where the onscreen visual media is output on the screen device 103, the screen of the user device 102 is not displaying the media and can be considered effectively to be blank. Alternatively the screen could be used to display additional information to augment the OVM content. However, it is useful for a user to be aware that the App is open and responsive, and so the App can be designed to generate a "tracking" screen as shown in Figure 7a while the onscreen visual media is being tracked. When a user has provided a user input, for example by tap or gesture, a "scene saved" screen can be displayed to the user as shown, for example, in Figure 7b.
For watching movies in a cinema, the application could cause the device to adopt a cinema mode, in which the ringing tone is turned off, any camera is turned off, and notifications and recordings of any kind are prevented. The screen could show black, but would still allow sending and receiving data for recognition of the onscreen visual media.
In an alternative aspect of the invention, there may not be a requirement for a user to provide a user input during the output of the piece of media to identify a piece of media in which he is interested. Instead, the piece of media could be tracked using the audio data from the beginning of a particular piece, for example a television show, wherein the broadcasters of the show send information to the user device connected with the show which is being viewed by the user. For example, coupons or voting opportunities can be advised and displayed to a user of the user device while he is watching the television show, and based on the tracking of that show using the audio data.
A user could identify shows that he wishes to track in this fashion by using a tap or other input gesture at the beginning of a show once he has opened the application on his user device. Alternatively, the show could automatically (assuming the application is open) notify the application to commence tracking such that the show can interact with the user on the display of the user device.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (24)

CLAIMS:
  1. A method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving a notification of a user input during the output of the piece of media to the user; and storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  2. The method of claim 1, wherein the notification of the user input indicates the timing of the user input within the piece of media.
  3. The method of claim 1, wherein said comparing step further identifies a temporal position of the received audio data within the identified piece of media.
  4. The method of claim 1, wherein said indication of a portion of the video data is one of: a frame of the video data itself; a scene of the video data; a shot in the video data; and a sequence of frames of the video data.
  5. The method of claim 1, wherein said indication of a portion of the video data is the timing of the portion within the piece of media, wherein said timing is subsequently used to determine the portion of the video data of the identified piece of media.
  6. The method of claim 1, wherein the method steps are performed at a server, the piece of media is outputted to the user at a screen device, and the audio data is received at a user device associated with the user and transmitted to the server.
  7. The method of claim 6, further comprising sending an indication of the identified piece of media to the user device.
  8. The method of claim 6, when dependent upon claim 3, further comprising sending the identified temporal position to the user device.
  9. The method of claim 6, further comprising: using said stored indication of the portion to determine the portion; and sending the determined portion or information about an object contained in the determined portion to the user device.
  10. The method of claim 9, wherein the determined portion or information is sent to the user device after the piece of media has finished being outputted at the user device.
  11. A computer device configured to process media data, the computer device comprising: a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user; a data store configured to store known pieces of media; a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  12. The computer device of claim 11, which is one of a server and a user device.
  13. A computer program product configured to process media data, the computer program product being embodied on a computer-readable storage medium and configured so as when executed on a processor of a server to perform the operations of: receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving notification of a user input during the output of the piece of media to the user; and storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  14. A method of processing media data at a user device, the method comprising: receiving at a user device audio data of a piece of media outputted to a user; sending the audio data of the outputted piece of media for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  15. The method of claim 14, further comprising receiving an indication of the identified piece of media from the server.
  16. The method of claim 14, further comprising receiving, from the server, an identifier of a temporal position, within the identified piece of media, of the audio data sent to the server.
  17. The method of claim 16, wherein receiving the audio data causes the user device to track the output of the piece of media.
  18. The method of claim 17, further comprising using the tracking of the output of the piece of media to determine the timing of the user input within the piece of media.
  19. The method of claim 18, wherein the notification of the user input sent to the server comprises an indication of the determined timing of the user input.
  20. The method of claim 14, wherein said user input is received via a user interface of the user device, said user interface comprising at least one of a touch screen, a button, a mouse, a touch pad, a voice recognition system, a remote control and a gesture recognition system.
  21. The method of claim 14, further comprising: after the piece of media has finished being outputted, receiving said indicated portion of the video data of the identified piece of media or information about an object contained in the identified piece of media corresponding to the timing of the user input within the piece of media; and displaying said portion or information at the user device.
  22. The method of claim 21, further comprising providing a link to a web page relating to the object displayed in said portion of the video data.
  23. A user device configured to process media, the user device comprising: an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data; a user interface configured to receive a user input from the user during the output of the piece of media to the user; and a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and (ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  24. The user device of claim 23, wherein the input apparatus comprises at least one of a touch screen, a button, a mouse, a touch pad, a voice recognition system, a gesture recognition system and a remote control.
  25. The user device of claim 23, comprising: a display operable to present a tracking screen while receiving audio data from the piece of media and a storing screen when sending the notification of the user input.
  26. The user device of claim 25, operable to receive the identified piece of media or information about an object in the identified piece of media, wherein the display is operable to present the identified piece of media or information.
  27. A computer program product configured to process media data at a user device, the computer program product being embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the operations of: receiving audio data of a piece of media; sending audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  28. The method of claim 1, computer device of claim 11, computer program product of claim 15, method of claim 14, user device of claim 23 or computer program product of claim 27, wherein said indication of a portion of the video data comprises information related to the portion of video data.
  29. The method, computer device, user device or computer program product of claim 28, wherein said information comprises details relating to an object in the portion of video data or information about products or services relating to the content of the video data.
  30. The method of claim 14, further comprising: concurrently with outputting the piece of media, receiving said indicated portion of the video data of the identified piece of media or information about an object contained in the identified piece of media corresponding to the timing of the user input within the piece of media; and displaying said portion or information at the user device.
  31. A method of processing media data, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; tracking the outputted piece of media while it is output to a user; and receiving at a user device items for display to a user based on said tracking.
  32. A user device comprising a processor configured to execute a computer program product which when executed implements the method of claim 31, the user device further comprising a display for displaying said items.
GB1223277.3A 2012-12-21 2012-12-21 Processing media data Expired - Fee Related GB2509150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1223277.3A GB2509150B (en) 2012-12-21 2012-12-21 Processing media data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1223277.3A GB2509150B (en) 2012-12-21 2012-12-21 Processing media data

Publications (3)

Publication Number Publication Date
GB201223277D0 GB201223277D0 (en) 2013-02-06
GB2509150A (en) 2014-06-25
GB2509150B GB2509150B (en) 2016-05-18

Family

ID=47682493

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1223277.3A Expired - Fee Related GB2509150B (en) 2012-12-21 2012-12-21 Processing media data

Country Status (1)

Country Link
GB (1) GB2509150B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120315014A1 (en) * 2011-06-10 2012-12-13 Brian Shuster Audio fingerprinting to bookmark a location within a video

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754661A (en) * 2019-03-18 2019-05-14 北京一维大成科技有限公司 A kind of on-line study method, apparatus, equipment and medium

Also Published As

Publication number Publication date
GB201223277D0 (en) 2013-02-06
GB2509150B (en) 2016-05-18

Similar Documents

Publication Publication Date Title
US11443511B2 (en) Systems and methods for presenting supplemental content in augmented reality
US20180343476A1 (en) Delivery of different services through client devices by video and interactive service provider
US9015745B2 (en) Method and system for detection of user-initiated events utilizing automatic content recognition
US10650442B2 (en) Systems and methods for presentation and analysis of media content
US9460204B2 (en) Apparatus and method for scene change detection-based trigger for audio fingerprinting analysis
US20130173765A1 (en) Systems and methods for assigning roles between user devices
US20130174191A1 (en) Systems and methods for incentivizing user interaction with promotional content on a secondary device
US20120120296A1 (en) Methods and Systems for Dynamically Presenting Enhanced Content During a Presentation of a Media Content Instance
KR20100039706A (en) Method for providing dynamic contents service using analysis of user's response and apparatus thereof
US20130339998A1 (en) Systems and methods for providing related media content listings during media content credits
WO2012006023A2 (en) Apparatus, systems and methods for accessing and synchronizing presentation of media content and supplemental media rich content
GB2527415A (en) Methods and systems for performing playback operations based on the length of time a user is outside a viewing area
US20180014066A1 (en) System and methods for facile, instant, and minimally disruptive playback of media files
US20100145796A1 (en) System and apparatus for interactive product placement
US20120042041A1 (en) Information processing apparatus, information processing system, information processing method, and program
GB2534321A (en) Systems and methods for receiving product data for a product featured in a media asset
US9781492B2 (en) Systems and methods for making video discoverable
WO2015135001A1 (en) Electronic system and method to render additional information with displayed media
CN103918277A (en) System and method for determining a level of confidence that a media item is being presented
US20130085846A1 (en) System and method for online selling of products appearing on a display
US20130177286A1 (en) Noninvasive accurate audio synchronization
GB2509150A (en) Bookmarking a scene within a video clip by identifying the associated audio stream
US10141023B2 (en) Method and system for multimedia summary generation
KR101257913B1 (en) Interworking system of media relational information using a smart media play-back device and smart terminal unit and method thereof
TW201626807A (en) Method and system of displaying and controlling, breakaway judging apparatus and video/audio processing apparatus

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20161221