WO2014094912A1 - Processing media data - Google Patents

Processing media data

Info

Publication number
WO2014094912A1
Authority
WO
WIPO (PCT)
Prior art keywords
media
piece
user
audio data
outputted
Prior art date
Application number
PCT/EP2012/076811
Other languages
English (en)
Inventor
Steve Hamilton SHAW
Daniel Laurence
Original Assignee
Rocket Pictures Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rocket Pictures Limited filed Critical Rocket Pictures Limited
Priority to PCT/EP2012/076811 priority Critical patent/WO2014094912A1/fr
Publication of WO2014094912A1 publication Critical patent/WO2014094912A1/fr

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/105Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34Indicating arrangements 
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/254Management at additional data server, e.g. shopping server, rights management server

Definitions

  • the present invention relates to processing media.
  • a user watches a piece of media (which may, for example, be a film, a live television programme, a pre-recorded television programme, a music video or an advertisement), he may see something in the video data of the piece of media which he would like to remember later, e.g. when the piece of media has finished being outputted at the user device.
  • the user may see a particular object being displayed in the film (e.g. a watch worn in a scene by an action hero) for which he would like to find out more details.
  • a user has to either pause the film to identify the object (if he can do that) or attempt to recall the object when the film finishes. For example, in a cinema, pausing the film is not an option for a viewer. Summary
  • OVM Onscreen Visual Media
  • users may have a smartphone, a tablet, a laptop, a personal computer (PC) or gaming device.
  • Such devices are capable of executing applications (Apps) which interface with a user.
  • the pieces of media referred to herein include both video data and audio data output in synchronisation.
  • the application can record a frame of the piece of media corresponding to the moment within the piece of media that the user wanted to remember for display to the user. This can be done when the movie or program is finished, or during the movie or program. In methods described herein, this is achieved by identifying a piece of media using the audio data of the piece of media. A small portion of audio data (e.g. of the order of 1 to 10 seconds) of a piece of media is usually sufficient to identify a piece of media and the temporal position of the audio data within the piece of media. The application can then track the output of the piece of media.
  • a method of processing media data comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving notification of a user input at a user device during the output of the piece of media to the user; and storing an indication of a portion, e.g. a frame or scene, of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • the method of the first aspect may be performed at a server or at the user device.
  • a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the server to perform the method of the first aspect.
  • a method of processing media data comprising: receiving at a user device audio data of a piece of media outputted to a user; sending the audio data of the outputted piece of media for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media; receiving a user input from the user during the output of the piece of media to the user; and sending a notification of the user input to the server for use in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • a computer program product configured to process media may be embodied on a computer-readable storage medium and configured so as when executed on a processor of the user device to perform the method of the second aspect.
  • the audio data of a piece of media is used to identify the piece of media.
  • the timing of a user input within the piece of media corresponds to a frame of the video data of the piece of media, and an indication of that frame is stored. This allows the user to provide a user input during the output of the piece of media in order to remember a moment within the piece of media.
  • the invention also provides a computer device configured to process media data, the computer device comprising: a receiving module configured to: (i) receive audio data of a piece of media outputted to a user, said piece of media comprising synchronised video data and audio data, and (ii) receive a notification of a user input during the output of the piece of media to the user; a data store configured to store known pieces of media; a comparing module configured to compare the received audio data of the outputted piece of media to audio data of the known pieces of media stored in the data store, to thereby identify the outputted piece of media; and a storing module configured to store an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • the invention also provides a user device configured to process media, the user device comprising: an audio data receiving module configured to receive audio data of a piece of media output to a user of the user device, said piece of media comprising synchronised video data and audio data; a user interface configured to receive a user input from the user during the output of the piece of media to the user; and a sending module configured to: (i) send audio data of the outputted piece of media to a server for comparison thereat with audio data of known pieces of media, to thereby identify the outputted piece of media, and (ii) send a notification of the user input to the server, said indication of the user input being for use by the server in storing an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • Figure 1 shows a schematic illustration of a network
  • Figure 2 is a schematic functional block diagram of a user device
  • Figure 3 is a schematic functional block diagram of a server
  • Figure 4 is a flow chart for a process of processing media according to a preferred embodiment
  • Figure 5 is an example of a graph showing the amplitude of an audio signal as a function of time.
  • Figures 6a to 6d, 7a and 7b show examples of user interfaces displayed at a user device.
  • Figure 1 shows a system including a user device 102 which is useable by a user 104.
  • the user 104 is watching a piece of media on another device 103, referred to as a screen device, which could be a cinema screen, or computer, DVD or television, for example.
  • the user device 102 can connect to the network 106, which may for example be the Internet.
  • the user device 102 may for example be a mobile phone (e.g. a smartphone), a tablet, a laptop, a personal computer (“PC”), a gaming device or other embedded device able to communicate over the network 106.
  • the user device 102 is arranged to receive information from and output information to the user 104.
  • the network 106 comprises a server 108 which has access to a data store such as a database 110.
  • FIG. 2 illustrates a detailed view of the user device 102.
  • the user device 102 comprises a processor ("CPU") 202 configured to process data on the user device 102.
  • Connected to the CPU 202 is a display 204 which may be implemented as a touch screen for inputting data to the CPU 202.
  • Also connected to the CPU 202 are speakers 206 for outputting audio data, a microphone 208 for receiving audio data, a keypad 210, a memory 212 for storing data and a network interface 214 for connecting to the network 106.
  • the display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 are integrated into the user device 102 (e.g. when the user device 102 is a mobile phone).
  • the display 204 and speakers 206 act as output apparatus of the user device 102 for outputting video and audio data respectively.
  • the display 204 (when implemented as a touch screen), microphone 208 and keypad 210 act as input apparatus of the user device 102.
  • one or more of the display 204, speakers 206, microphone 208, keypad 210, memory 212 and network interface 214 may not be integrated into the user device 102 and may be connected to the CPU 202 via respective interfaces.
  • One example of such an interface is a USB interface.
  • the user device 102 may include other components which are not shown in Figure 2. For example, when the user device 102 is a PC, the CPU 202 may be connected to a mouse via a USB interface.
  • the CPU 202 may be connected to a touchpad via a USB interface.
  • the CPU 202 may be connected to a remote control via a wireless (e.g. infra-red) interface.
  • An operating system (OS) 216 is running on the CPU 202.
  • the user device 102 is configured to execute a media application 218 on top of the OS 216.
  • the media application 218 is a computer program product which is configured to process media data at the user device 102.
  • the media application 218 is stored in the memory 212 and when executed on the CPU 202 performs methods described in more detail below for processing media data at the user device 102.
  • FIG 3 illustrates a detailed view of the server 108.
  • the server 108 comprises a processor ("CPU") 302 configured to process data on the server 108.
  • the server 108 also includes a memory for storing data which may include the database 110.
  • a computer program product may be stored in the memory at the server 108 and configured such that, when executed on the CPU 302, it performs methods described in more detail below for processing media at the server 108.
  • in the flow chart of Figure 4, the steps shown in the left-hand column (i.e. steps S402, S404, S410, S412, S414, S418, S420, S424 and S426) are implemented at the user device 102, whereas the steps shown in the right-hand column (i.e. steps S406, S408, S416 and S422) are implemented at the server 108.
  • a piece of media is outputted to the user 104 at the screen device 103.
  • the screen device 103 may be showing a film or TV program.
  • the film includes synchronized streams of video and audio data.
  • the piece of media may be a television programme, music video, advert, or any other OVM.
  • the piece of media can come from any source, and can be shown on any suitable screen device. Moreover, it could be streamed to the user device or stored at the user device for display at the user device itself. In that case, the screen device 103 is the same as the user device 102.
  • the user 104 opens the media application 218, such that the media application 218 executes on the CPU 202.
  • the media application 218 takes a sample of the audio data of the piece of media currently being output at the screen device 103.
  • the sample may for example have a duration in the range of 1 second to 10 seconds.
  • the sample has a sufficient duration for the piece of media to be identified as described below.
  • in step S404, the audio data of the piece of media which is outputted at the screen device 103 is received by the user device 102, which sends it over the network 106 to the server 108 (e.g. via the network interface 214 and the network interface 304).
  • the network interface 214 may connect the user device 102 to the network 106 via a Wi-Fi router or via a cellular telephone network (e.g. implementing 3rd or 4th generation mobile telecommunications (3G or 4G) technology).
  • the server 108 receives the audio data sent from the user device 102, and in step S406 the server 108 uses the received audio data to identify the piece of media being outputted at the user device 102.
  • the server 108 can also identify the exact point in the piece of media at which the audio data occurs. This takes a few seconds, and is implemented as described below.
  • the audio data may be represented as a graph of amplitude against time, such as that shown in Figure 5.
  • a piece of media has a unique signature through samples of its audio data. This is true even for audio samples which have a duration of the order of 1 to 10 seconds.
  • the server 108 has access to a data store which stores audio data of known pieces of media.
  • the data store is implemented as a database 110 stored at the server 108.
  • the database 110 may for example store data representing power functions (such as that shown in Figure 5) for the audio data of the known pieces of media.
  • the known pieces of media may include, for example, a film, a live television programme, a pre-recorded television programme, a music video, an advert, or any other OVM which may be output.
  • in step S406, the audio data received from the user device 102 is compared to audio data of known pieces of media stored in the database 110, to thereby identify the piece of media being outputted from the user device 102.
  • the audio signature of a piece of media is used to differentiate between known pieces of media and to determine the exact position of the audio data within a piece of media.
  • the comparison of the received audio data with the audio data of the known pieces of media may be performed using known algorithms. This involves comparing features (e.g. audio fingerprints) of the audio data using statistical analysis to determine whether two samples of audio data match each other. For example, applications such as Shazam, SoundHound, SoundPrint and IntoNow implement algorithms for identifying audio data by comparing it with audio data from a database of known pieces of media. As an example, the IntoNow implementation is described in US patent publication number 2012/0209612 A1. Since such algorithms are known in the art, they are not described in detail herein; a minimal illustrative fingerprint-matching sketch is given at the end of this section.
  • Step S406 identifies which piece of media the user 104 is viewing and also identifies the exact temporal position within the piece of media to which the audio data matches.
  • the server 108 sends an indication of the identified piece of media to the user device 102.
  • the server 108 also sends an identifier of the temporal position within the identified piece of media to the user device 102.
  • the user device 102 receives the indication of the piece of media (e.g. the title of the piece of media) and the identifier of the temporal position (e.g. a time from the start of the piece of media).
  • in step S410, using the indication of the piece of media and the identifier of the temporal position, the media application 218 tracks the outputting of the piece of media (a sketch of such client-side tracking is given at the end of this section).
  • the tracking of the piece of media may continue until completion of the outputting of the piece of media, that is, until the piece of media finishes being output.
  • if the output of the piece of media is interrupted, the media application 218 will need to reconnect to the server 108 in order to correctly track the output of the piece of media.
  • the film may be paused on TV, or buffering issues may interrupt the playout if the media is being streamed.
  • a reconnect process would repeat steps S404 to S410 as described above.
  • the media application 218 is able to obtain information (e.g. title) indicating what media is being output and how far through that media (in time) the outputting of the media is.
  • the media application 218 may display details about the piece of media on the display 204 when it has received them from the server 108. For example, the media application 218 may display the title, current point and total length of the piece of media currently being output.
  • the media application 218 receives a user input from the user 104 during the output of the piece of media.
  • the user 104 may provide the user input via the user interface of the user device 102.
  • the user interface of the user device 102 comprises input apparatus by which the user can provide a user input.
  • the input apparatus may for example comprise one or more of a touch screen, a button, a mouse, a touch pad, a voice recognition system and a remote control.
  • the user taps the touch screen 204 to provide the user input.
  • a gesture can act as an input.
  • the user may provide the user input when he sees something in the outputted piece of media which he wants to record, either to remember after the piece of media has finished or at the time the object of interest is displayed.
  • the user 104 may decide that he would like to buy an object being displayed in the film (e.g. a watch worn in a scene by an action hero).
  • the user 104 may not want to interrupt his viewing experience of the film, so he decides that he will follow up on the object when the film has finished.
  • the user might not proceed to buy the object, for one of many possible reasons.
  • the user may forget about his intention to buy the object, he may not know how to buy the object or he may proceed to do something else when the film finishes instead of buying the object.
  • a viewer may want additional information about something shown in a TV documentary, such as an animal or location in a wildlife or holiday program.
  • the user 104 will be reminded (e.g. when the film finishes) of what was on the screen at the time of providing the user input.
  • an alternative is to present to the user information related to what was on the screen at the time of providing the user input.
  • This information can be connected to the content or subject matter of the onscreen visual media itself, or only connected to a particular object in the frame or scene indicated by the user, and not connected to the overall content of the OVM.
  • in step S414, a notification of the user input is sent to the server 108.
  • the notification of the user input may comprise a time within the piece of media at which the user input was received in step S412.
  • All of the communications between the user device 102 and the server 108 occur over the network 106, e.g. via the network interface 214 and the network interface 304.
  • the server 108 receives the notification of the user input, and in step S416 the server 108 stores an indication of a portion of the video data of the identified piece of media corresponding to a timing of the user input within the piece of media.
  • the portion can be a scene or frame or any other defined time period around a frame. Other examples include a particular camera shot (that is, a point of view in a scene), or a sequence of frames.
  • a delayed motion event can cause a number of frames in a scene to be displayed to a user at the user device 102, to allow him to pick the precise frame/scene of interest.
  • the indication of the portion (scene or frame) may be stored in a memory at the server 108.
  • the notification of the user input may indicate a time within the piece of media at which the user input is received, and step S416 may include determining which frame of the identified piece of media occurs at the identified time (see the frame-indication sketch at the end of this section).
  • the frame occurring at the identified time may then be stored at the server 108. That is, the frame itself may be stored. In this way, a screenshot can be saved of whatever is displayed on the display 204 at the time at which the user input is received.
  • the timing of the frame within the piece of media may be stored instead of the frame itself. In that case, the stored timing can subsequently be used to determine the frame of the video data of the identified piece of media occurring at the identified time.
  • Steps S412 to S416 may be repeated throughout the outputting of the piece of media for each appropriate user input that is received by the media application 218.
  • the dotted line in Figure 4 indicates a passage of time, after which in step S418 the outputting of the piece of media finishes. The finishing of the piece of media is detected by the media application 218.
  • in step S420, the media application 218 sends an indication to the server 108 to indicate that the output of the piece of media at the user device 102 has finished.
  • in step S422, the indication that the output of the piece of media has finished is received at the server 108.
  • the server 108 sends the frame(s) of the piece of media, which are indicated by the indication(s) which were stored in step S416, to the user device 102. If the frames themselves were stored in step S416, then step S422 simply involves retrieving the frames from the data store where they were saved in step S416 and then sending the frames to the user device 102.
  • if the timings of the frames were stored in step S416, step S422 involves retrieving the timings from the data store where they were saved in step S416, using the timings and the video data of the known piece of media to determine the frames at the relevant timings, and sending the determined frames to the user device 102.
  • the frame(s) are received at the user device 102.
  • the received frames are displayed to the user 104 on the display 204 at the user device 102. In this way the user 104 is reminded of what was on the screen when he decided to provide the user input in step S412.
  • a link to a relevant webpage may be displayed with the frame that is displayed in step S424.
  • the relevant web page may relate to an object displayed in the frame of the video data. For example, if the frame of video data includes a character wearing a dress, then there may be provided a link to a web page from which the dress can be purchased. Interaction with the link may cause a remote retailer to take action, such as to send a brochure or advertisement to the user device.
  • in other embodiments, step S426 is not necessary, and instead step S424 of displaying the frame(s) includes automatically directing the user 104 to a webpage of an online store in which the frame(s) are displayed.
  • the user 104 may be taken straight to an online store when the piece of media finishes, so that the user 104 can review their screenshots (i.e. the frames which caused them to provide the user input in step S412).
  • the user 104 can log in separately to browse the internet for the items that attracted them in the frames which caused them to provide the user input in step S412.
  • Figures 6a to 6d show representations of an example web page to which the user 104 may be directed.
  • Figure 6a shows a screen 602 to which the user 104 may first be directed in order to view the frames which he chose to be reminded about.
  • Three frames 604, 606 and 608 are shown in screen 602, which show the frames of the piece of media which the user chose to save.
  • the user 104 can select one of the saved scenes, e.g. by clicking on one of the frames 604, 606 or 608 using, for example, the touch screen 204 or the keypad 210.
  • when the user 104 selects a scene from screen 602, screen 610 is displayed, as shown in Figure 6b. Screen 610 requests that the user 104 select a category in order to shop for items shown in the scene of the piece of media which he has selected.
  • the example categories shown in Figure 6b are clothes 612, products 614 and accessories 616.
  • when the user 104 selects a category from screen 610, screen 618 is displayed, as shown in Figure 6c. Screen 618 requests that the user 104 select between the characters which are included in the selected scene. For example, as shown in Figure 6c, three characters are included in the scene: character A 620, character B 622 and character C 624.
  • when the user 104 selects a character from screen 618, screen 626 is displayed, as shown in Figure 6d. Screen 626 presents the user 104 with online shopping opportunities relating to the category and character selected in screens 610 and 618. For example, if the user has selected clothes 612 and character A 620, then screen 626 may present options for the user 104 to buy a dress or shoes by clicking on the respective links 630 and 632 (an illustrative sketch of such a frame-to-product mapping is given at the end of this section).
  • the dress or shoes may be those worn by the selected character in the scene of the piece of media which includes the frame at the time for which the user provided the user input in step S412. Other information relating to the relevant products and/or characters may also be displayed in screen 626.
  • the example implementation illustrated in Figures 6a to 6d enables the user 104 to purchase products and/or services via the media outputted at the user device 102 without interrupting the viewing experience, i.e. the purchasing of products and/or services occurs after the media has finished being outputted at the user device 102.
  • the media application 218 bridges the gap between product placement and product purchasing. This opens up huge potential for the way that viewers of media interact with the products included in, or extrapolated from, the media.
  • the media application can also be used to provide instant information about an object on the screen 103.
  • the media application 218 may be downloaded to the user device 102 over the network 106 and stored in the memory 212 of the user device 102.
  • the media application 218 may, or may not, be downloaded in return for payment from the user 104 to a software provider of the media application 218.
  • a piece of media could have multiple versions, including for example a director's cut, extended cut, international cut and the final cut.
  • when the audio data is matched to audio data of a known piece of media (in step S406), it may match more than one of these versions.
  • the default result which is assumed in this case is the final cut.
  • the database 110 will store the temporal positions within the piece of media where the versions differ, and at this point the media application 218 may reconnect to the server 108 in order to verify which version of the piece of media is being output, i.e. to perform steps S404 to S410 again. This is done without requiring involvement from the user 104.
  • the connection between the user device 102 and the server 108 may be maintained (e.g. using a Wi-Fi connection) and the sampling of the audio data of the outputted piece of media is continued at regular intervals (e.g. every 5 seconds) to detect the presence of adverts.
  • if adverts are detected, the tracking of the output of the piece of media is paused until the adverts are finished and the output of the piece of media re-commences.
  • the media application 218 can maintain the sound sampling over Wi-Fi or another wireless connection to differentiate between the piece of media and the advertisements, to ensure that the tracking of the output of the piece of media proceeds correctly (see the advert-detection sketch at the end of this section).
  • to avoid the need to continuously track the media and advertisements so as to identify the adverts, it would be possible for the broadcaster of the television program to provide data to the user device indicating when the adverts were starting and stopping, so as to indicate to the user device when to reconnect to the audio data for tracking.
  • when the application is used with a voting TV show, the tracking identifies the points at which voting is allowed; the user's vote data can be sent to the show, and data on the overall votes can be sent to the user device.
  • the user 104 wishes to remember a frame of the outputted media in order to purchase a product shown in the frame, or obtain more details about an object in the frame. These details can be connected to the content of the subject matter of the OVM, or only connected to the object itself and not the overall content of the OVM. As an alternative to details concerning an object, services or products connected to the OVM can constitute information available to a user of the user device. However, in other embodiments, the user 104 may wish to remember the frame for some other reason, which might not be related to the purchasing of products or services.
  • the analysis of the audio data for comparison with audio data of known pieces of media provides a fast and reliable method to identify a piece of media, and a temporal position within the piece of media, which is outputted from the user device 102.
  • the method steps described above and shown in Figure 4 may be implemented as functional modules at the user device 102 and the server 108 as appropriate, e.g. in hardware or software.
  • the method steps may be implemented by executing a computer program product on the respective CPU (202 or 302) to implement the steps in software.
  • the media application 218 described above is an example of a suitable computer program product for implementing the steps at the user device 102.
  • while a piece of media is being tracked, the screen of the user device 102 is not displaying the media and can be considered effectively to be blank.
  • the screen could be used to display additional information to augment the OVM content.
  • it is useful for a user to be aware that the App is open and responsive, and so the App can be designed to generate a "tracking" screen as shown in Figure 7a while the onscreen visual media is being tracked.
  • a "scene saved" screen can be displayed to a user as shown, for example, in Figure 7b.
  • the application could cause the device to adopt a cinema mode, in which the ringing tone is turned off, any camera is turned off, and notifications and recordings of any kind are prevented.
  • the screen could show black, but would still allow sending and receiving data for recognition of the onscreen visual media.
  • the piece of media could be tracked using the audio data from the beginning of a particular piece, for example a television show, wherein the broadcaster of the show sends information connected with the show to the user device while the show is being viewed by the user. For example, coupons or voting opportunities can be advised and displayed to a user of the user device while he is watching the television show, based on the tracking of that show using the audio data.
  • a user could identify shows that he wishes to track in this fashion by using a tap or other input gesture at the beginning of a show once he has opened the application on his user device.
  • the show could automatically (assuming the application is open) notify the application to commence tracking such that the show can interact with the user on the display of the user device.
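For readers who want a concrete picture of the identification step (S406), the following sketch (in Python, purely illustrative) shows one way a short audio sample could be matched against stored audio data to recover both the piece of media and the temporal position. The description above does not prescribe any particular algorithm, and this is not the method used by Shazam, SoundHound, SoundPrint or IntoNow; the spectral-peak hashing, offset voting and all function names are assumptions made for the example.

```python
import numpy as np
from collections import Counter

SAMPLE_RATE = 11025   # Hz; assumed capture rate at the user device
WINDOW = 4096         # samples per FFT frame
HOP = 2048            # hop between successive frames

def fingerprint(audio: np.ndarray) -> list[tuple[int, int]]:
    """Return (hash, frame_index) pairs; the 'hash' here is simply the strongest
    spectral peak of each frame, which is enough for a toy example."""
    pairs = []
    for i, start in enumerate(range(0, max(len(audio) - WINDOW, 0), HOP)):
        spectrum = np.abs(np.fft.rfft(audio[start:start + WINDOW]))
        pairs.append((int(np.argmax(spectrum)), i))
    return pairs

def identify(sample: np.ndarray, known: dict[str, list[tuple[int, int]]]):
    """Match a 1-10 s sample against fingerprints of known pieces of media.
    Returns (title, offset_in_seconds) of the best match, or None."""
    sample_fp = fingerprint(sample)
    best = None
    for title, ref_fp in known.items():
        index: dict[int, list[int]] = {}
        for h, t in ref_fp:
            index.setdefault(h, []).append(t)
        # Matching hashes vote for a time offset between sample and reference;
        # a genuine match produces a strong peak at a single offset.
        votes = Counter(rt - st for h, st in sample_fp for rt in index.get(h, []))
        if votes:
            offset, count = votes.most_common(1)[0]
            if best is None or count > best[2]:
                best = (title, offset, count)
    if best is None:
        return None
    title, offset, _ = best
    return title, offset * HOP / SAMPLE_RATE
```

A real implementation would use more robust features and an indexed database rather than a linear scan over the known pieces of media.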
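The next sketch outlines the client-side flow of steps S402 to S414: sample the audio, ask the server to identify the piece of media and its temporal position, then track the playback position locally and report user inputs. The MediaTracker class, the server object and the capture_audio helper are hypothetical names introduced for illustration; they are not part of the media application 218 described above.

```python
import time

class MediaTracker:
    """Illustrative client-side tracker (steps S402-S414); not the real application 218."""

    def __init__(self, server):
        self.server = server            # hypothetical object exposing identify()/notify()
        self.title = None
        self.anchor_offset = 0.0        # seconds into the piece of media when identified
        self.anchor_wall_clock = 0.0    # local clock reading at that moment

    def connect(self, capture_audio):
        """Steps S402-S408: sample a few seconds of audio and have it identified.
        For simplicity this assumes the identification succeeds."""
        sample = capture_audio(seconds=5)
        self.title, self.anchor_offset = self.server.identify(sample)
        self.anchor_wall_clock = time.monotonic()

    def position(self) -> float:
        """Step S410: estimated current temporal position within the piece of media."""
        return self.anchor_offset + (time.monotonic() - self.anchor_wall_clock)

    def on_user_tap(self):
        """Steps S412-S414: notify the server of the user input and its timing."""
        self.server.notify(self.title, self.position())
```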
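For step S416, the description leaves open whether the server stores the frame itself or only its timing. The sketch below assumes the second option and stores a frame index derived from the notified time and an assumed constant frame rate; SavedMoments and FRAME_RATE are illustrative assumptions.

```python
from dataclasses import dataclass, field

FRAME_RATE = 24.0  # frames per second; an assumption about the known piece of media

@dataclass
class SavedMoments:
    """Server-side store of frame indications, keyed by the identified title."""
    store: dict[str, list[int]] = field(default_factory=dict)

    def on_notification(self, title: str, time_in_media: float) -> int:
        """Step S416: convert the notified time within the piece of media into a
        frame index and store it as the indication of the portion of video data."""
        frame_index = int(round(time_in_media * FRAME_RATE))
        self.store.setdefault(title, []).append(frame_index)
        return frame_index
```

Storing only the timing keeps the stored data small; the frame image can be recovered later from the server's copy of the video data, as described for step S422.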
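The shopping flow of screens 602 to 626 implies a mapping from a saved frame to characters, categories and purchasable items. One possible, entirely hypothetical, data model is sketched below; the catalogue contents, character names and URLs are invented placeholders and are not taken from the description.

```python
# (title, frame_index) -> character -> category -> list of (item, url)
CATALOGUE = {
    ("Example Film", 36120): {
        "Character A": {
            "clothes": [
                ("dress", "https://shop.example.com/dress"),
                ("shoes", "https://shop.example.com/shoes"),
            ],
        },
    },
}

def shopping_links(title: str, frame_index: int, character: str, category: str):
    """Return the links that screen 626 would offer for the selected
    scene, character and category (empty list if nothing is catalogued)."""
    return CATALOGUE.get((title, frame_index), {}).get(character, {}).get(category, [])
```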
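Finally, the advert-handling paragraphs above suggest periodic re-sampling of the audio so that tracking can be paused during advert breaks and resumed afterwards. The loop below sketches that behaviour on top of the MediaTracker from the earlier sketch; the 5-second interval follows the example given in the description, and the rest is an assumption.

```python
import time

def watch_for_adverts(tracker, capture_audio, interval=5.0):
    """Illustrative polling loop: pause tracking during advert breaks and
    re-anchor the tracker when the piece of media is recognised again.
    A real client would run this on a background thread with a stop condition."""
    paused = False
    while True:
        time.sleep(interval)
        sample = capture_audio(seconds=3)
        result = tracker.server.identify(sample)
        if result is None or result[0] != tracker.title:
            paused = True                      # an advert (or other audio) is playing
        elif paused:
            tracker.title, tracker.anchor_offset = result
            tracker.anchor_wall_clock = time.monotonic()
            paused = False                     # media resumed: re-anchor and continue
```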

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A method of processing media data is disclosed, the method comprising: receiving audio data of a piece of media outputted to a user, said piece of media comprising both video data and audio data; comparing the received audio data of the outputted piece of media to audio data of known pieces of media stored in a data store, to thereby identify the outputted piece of media; receiving a notification of a user input during the output of the piece of media to the user; and storing an indication of a portion of the video data of the identified piece of media corresponding to the timing of the user input within the piece of media.
PCT/EP2012/076811 2012-12-21 2012-12-21 Processing media data WO2014094912A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/076811 WO2014094912A1 (fr) 2012-12-21 2012-12-21 Processing media data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2012/076811 WO2014094912A1 (fr) 2012-12-21 2012-12-21 Processing media data

Publications (1)

Publication Number Publication Date
WO2014094912A1 (fr) 2014-06-26

Family

ID=47458992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/076811 WO2014094912A1 (fr) 2012-12-21 2012-12-21 Processing media data

Country Status (1)

Country Link
WO (1) WO2014094912A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1286541A1 (fr) * 2000-04-14 2003-02-26 Nippon Telegraph and Telephone Corporation Procede, systeme et appareil permettant d'acquerir des informations concernant des informations diffusees
US20100095326A1 (en) * 2008-10-15 2010-04-15 Robertson Iii Edward L Program content tagging system
US20100154012A1 (en) * 2008-12-15 2010-06-17 Verizon Business Network Services Inc. Television bookmarking with multiplatform distribution
WO2011090540A2 (fr) * 2009-12-29 2011-07-28 Tv Interactive Systems, Inc. Procédé d'identification de segments vidéo et d'affichage de contenu ciblé contextuellement sur téléviseur connecté
US20110247042A1 (en) * 2010-04-01 2011-10-06 Sony Computer Entertainment Inc. Media fingerprinting for content determination and retrieval
US20120209612A1 (en) 2011-02-10 2012-08-16 Intonow Extraction and Matching of Characteristic Fingerprints from Audio Signals

Similar Documents

Publication Publication Date Title
US11443511B2 (en) Systems and methods for presenting supplemental content in augmented reality
JP5837198B2 (ja) ビデオコンテンツ中の要素の視覚的選択のためのシステムおよび方法
US8913171B2 (en) Methods and systems for dynamically presenting enhanced content during a presentation of a media content instance
US9015745B2 (en) Method and system for detection of user-initiated events utilizing automatic content recognition
US20130173765A1 (en) Systems and methods for assigning roles between user devices
JP5530028B2 (ja) 放送に含まれた広告に関連した情報をクライアント端末機側へネットワークを介して提供するシステム及び方法
US9460204B2 (en) Apparatus and method for scene change detection-based trigger for audio fingerprinting analysis
US20130174191A1 (en) Systems and methods for incentivizing user interaction with promotional content on a secondary device
US20180014066A1 (en) System and methods for facile, instant, and minimally disruptive playback of media files
US9781492B2 (en) Systems and methods for making video discoverable
US20120042041A1 (en) Information processing apparatus, information processing system, information processing method, and program
US9769530B2 (en) Video-on-demand content based channel surfing methods and systems
CN103918277A (zh) 用于确定媒体项正被呈现的置信水平的系统和方法
KR20160117933A (ko) 검색을 수행하는 디스플레이 장치 및 이의 제어 방법
US20130085846A1 (en) System and method for online selling of products appearing on a display
US20130177286A1 (en) Noninvasive accurate audio synchronization
TWI571119B (zh) 顯示控制方法及系統、廣告破口判斷裝置、影音處理裝置
US10141023B2 (en) Method and system for multimedia summary generation
GB2509150A (en) Bookmarking a scene within a video clip by identifying the associated audio stream
WO2014094912A1 (fr) Traitement de données multimédia
US20190379920A1 (en) Method and system for creating a customized video associated with an advertisement
EP4386653A1 (fr) Placement de commandes pour un sujet inclus dans un segment multimédia
US20220417600A1 (en) Gesture-based parental control system
KR102524066B1 (ko) 웹기반으로 디바이스를 식별하는 방법 및 장치
KR101380963B1 (ko) 관련 정보 제공 시스템 및 제공 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12808408

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12808408

Country of ref document: EP

Kind code of ref document: A1