WO2004109549A2 - System and method for performing media content augmentation on an audio signal - Google Patents

Publication number
WO2004109549A2
Authority
WO
WIPO (PCT)
Prior art keywords
audio
user
search
speech
information
Prior art date
Application number
PCT/IB2004/050822
Other languages
French (fr)
Other versions
WO2004109549A3 (en)
Inventor
Martin Franciscus Mckinney
Jan Alexis Daniel Nesvadba
Dirk Jeroen Breebaart
Original Assignee
Koninklijke Philips Electronics N. V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N. V.
Publication of WO2004109549A2
Publication of WO2004109549A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

Definitions

  • This invention relates in general to a system and method for performing media content augmentation on an audio signal, and, in particular, to a system and method for providing media content augmentation in an audio device.
  • An audio signal can be received by a user as, for example, a radio signal or as part of an audio-visual signal originating from a television broadcast or an audio-visual device.
  • An audio device can be, for example, a radio, a receiver, etc., or an audio-visual device such as a television set, a DVD player, VCR, multimedia system, mobile telephone, etc. Regardless of the source of the audio signal, such a signal may consist of speech, music, sound effects and other audio contents.
  • the programs received by a user can be of different natures, e.g. news broadcasts, commercials, feature films, online shopping, health programs etc. While listening to or viewing such programs, a user may hear a reference to a word or phrase that he does not recognise.
  • the word or phrase may be a new buzzword, a slang word, or may denote a product with which the user is not acquainted.
  • Sometimes words are created and used by particular groups, for example rappers, who write music and communicate using words which are not used by the majority of people.
  • Examples of other groups might be small ethnic communities, whose language might be partially adopted by the surrounding communities. Some of these words might be taken over into the vernacular and used commonly in everyday speech, whilst other words might remain local to the groups that created them. Some words might persist in language whereas others may be of a short-lived nature.
  • In the case of commercials, the user might hear a reference to a product with which he is not familiar, and wish to learn more about it.
  • the product might be new on the market and therefore of interest to the user. For example, he might wish to know more about the ingredients in the case of a food product, or technical information regarding a device, or information about side-effects in the case of a medicinal product.
  • Equally, the user might see the product advertised or mentioned in a foreign-language broadcast, and might wish to locate a supplier.
  • Such information can be difficult to locate, and requires a relatively high level of effort on the part of the user, for example by looking for suppliers in telephone directories or by first locating and then making contact with the manufacturers.
  • Other information of interest regarding a product might be its price or availability. Since products are often priced differently by various suppliers, the user might be interested in locating the supplier offering the most attractive price. To this end, the user must compare prices himself, by shopping around to find suppliers of the product in which he is interested. The user might also make use of a price agency, which may, in return for a percentage of the sale price, locate the supplier with the lowest price for a particular product. The user must first locate such an agency and then request a price comparison. If the user is watching a program about a foreign country, he might wish to find out more about how to plan a visit to that country. Interesting information in this case might be travel connections, route planning, visa requirements, vaccine recommendations, currency information etc. The user can locate such information by contacting travel bureaus, researching in libraries or online, and reading appropriate literature. Again, to locate all the desired information requires effort and dedication on the part of the user.
  • an object of the present invention is to provide a system and a method which can be used to easily provide informative media content augmentation on an audio input.
  • the present invention provides a system for performing media content augmentation on an audio input signal, wherein the system comprises a speech identifier for identifying the speech content in the audio signal, a speech-to-text converter for converting the speech content into a digital text format, a key phrase identifier for identifying key phrases in the digital text; a search engine for searching a source of information for material relating to the key phrases; and a result compiler for providing the user with the results of the search.
  • Here, the key phrases may be single words or may consist of groups of words e.g. complete sentences. Therefore, for the sake of simplicity, any reference to "word" in the following text is assumed to refer also to "phrase", and vice versa, without restriction of the invention.
  • An appropriate method for media content augmentation of an audio input comprises identifying the speech content in the audio-visual signal, converting the speech content into a digital text format, identifying key phrases in the digital text, searching a source of information for material relating to the key phrases, and providing the user with results of the search.
  • the system thus provides an easy way of quickly searching for and locating information of any type concerning words identified on an audio signal which are of interest to the user, who no longer needs to invest time and resources by initiating and carrying out such a search on his own.
  • the modules which perform speech identification and conversion to digital text can be realised by one skilled in the art by using off-the-shelf components. These modules may also be realised as a single component, using available software and/or hardware components.
  • the digital text created by the speech-to-text converter can then be analysed for content by the key phrase identifier which first processes the digital text to filter out uninteresting words, such as definite/indefinite articles, conjunctions etc. What remains is a list of possible key phrases which might be candidates for an information search.
  • One possible method of operation of the key phrase identifier is described further on in the following text.
  • the source of information searched with regard to information relating to the key phrases might be, for example, an information database, the internet, or an intranet.
  • the type of information to be located might be in the form of sound clips, text, video clips, URLs, pictures etc.
  • The result compiler organises the located material into a manner suitable for presentation to the user, for example in the form of a text summary with embedded graphics and hyperlinks, or a collection of video clips.
  • the result compiler is incorporated into the system in such a way as to be able to present to the user the results of the information in the same device which contains the audio receiver.
  • the results of the information search may equally be stored in a memory for later retrieval and perusal, and may be made available for processing or viewing on another device.
  • a dictionary is incorporated in the system, for use by the key phrase identifier in identification of key phrases.
  • the user can access the dictionary by means of a suitable interface.
  • the dictionary can be updated and extended as required.
  • a particularly advantageous embodiment of the invention is such that the dictionary contains a list of known words (phrases).
  • the words already contained in the dictionary are excluded from any information search, since they are words already known to the user. Any word that is "new", i.e. does not exist in the current version of the dictionary, is a potential candidate for an information search.
  • The dictionary can actively be updated, i.e. once a search has been successfully carried out for a new, hitherto unknown word, the user can indicate, via the interface, that this word is to be entered in the dictionary and therefore excluded from all future searches. Further, the user can specify words (phrases) that are to be included in an information search, for example words that are used in everyday language might also be used in the name of a product in which the user is interested, or appear in the title of a book. This would avoid automatic exclusion from an information search of products owing to their product names consisting of common words. Context analysis can be performed on the digital text using state-of-the-art techniques to identify phrases which consist of everyday words, e.g. "Gone with the Wind", "cure for the common cold" etc., but which might as a whole be of interest to the user.
  • the system makes use of a computer network interface to search a computer network for references to the key phrases, for example in the form of URLs, links, etc.
  • the interface can be realised by means of, for example, a modem, ISDN or DSL connection, and any hardware and software required.
  • a further embodiment of the interface might use a wireless connection to make contact with the computer network.
  • the computer network with which the system makes contact might be a local intranet or the world-wide web (internet).
  • the search engine of the system might also make use of the services of existing, possibly more powerful search engines (for example a meta-crawler) to perform parallel searches, thereby minimising the amount of time required to obtain the desired results.
  • a further preferred embodiment of the invention allows the user to control the manner in which a search is to be carried out and the manner in which the results of the search are to be presented, by specifying a set of preferences, for example, automatic or manual result presentation. Therefore, the system preferably comprises a suitable interface, which may be the same interface as utilized for dictionary access.
  • the system might continually perform the information search based on words identified in the audio signal, and update a list of relevant internet pages that the user can choose to view immediately or later on.
  • the system might continuously display lists of words recently identified on the audio signal that the user can highlight and then choose to initiate an (internet) search based on the highlighted word(s).
  • The type of information sought on the internet might be specified in the user preferences or may depend on the type of program being viewed.
  • Genre information extracted from the audio-visual signal (electronic program guide) or obtained from an external source, for example a meta-data service provider on the internet, might be used to identify the type of program being received. For example, the user might be watching a history program and may wish to look for educational material relating to the subject matter of the broadcast.
  • Another use could involve internet comparison shopping engines to quickly compare prices of items advertised.
  • the user might wish to learn of the search results as soon as they are available, in which case the user would specify by means of appropriate commands that the results of the information search are to visually overlay the program which is currently being watched on TV, for example by inserting closed captions into the audio-visual signal or by presenting the results in picture-in-picture form; or that the program being listened to or watched is to be interrupted to present the results immediately.
  • the user might require the option of having the results displayed continuously on a separate screen such as a television or computer screen.
  • the commands and preferences entered by a user might also be stored in a personal user profile.
  • the system preferably offers the possibility of storing at least one user profile, more preferably a plurality of user profiles, so that anyone using the system can activate his own previously stored profile without having to enter anew his own preferences each time.
  • a user profile might cover all preferences about the manner in which the search is to be carried out, the type of information to be searched for, and the manner in which the results are to be presented to the user, for example, whether a user wishes to observe the results immediately in closed-caption form, that certain types of product advertised in commercials are to be excluded from or included in augmentation, and that nature documentaries are to be augmented whereas talk-shows are to be excluded, that a particular search engine or meta-crawler is to be invoked in augmentation, that the augmentation is to be supplemented using information provided by third-party metadata service providers etc.
  • Global preferences might also be stored so that a particular mode might be activated by any user, for example an educational mode, in which documentaries and history programs are singled out for augmentation while excluding other types of program; commercial mode where only the commercials are used to locate information regarding the products and services advertised, etc.
  • the system comprises a comparator which is used to compare qualitatively similar information in the results of the information search.
  • If the key phrase denotes, for example, a new model of car, the information returned might contain items of text relating to technical performance, or details regarding suppliers, or purchase and leasing conditions. It is advantageous to compare similar types of information, so that duplicate information can be removed from the results, and so that intelligent comparisons can be made.
  • the user might wish to have the results sorted according to the type of information they represent, for example, a list of prices might be sorted in ascending order with the most attractive price at the top of the list.
  • the comparator can also organise the search results into a manner suitable for the desired mode of perusal, for example, audio clips for outputting on a radio, video and text material for viewing on a screen, or a document suitable for printing.
  • The invention advantageously comprises an audio identifier to identify the audio content of an audio-visual signal, to facilitate use of such a system in, for example, home entertainment centres which can receive audio-visual signals from various sources (TV, VCR, DVD etc).
  • a copy of the audio content is diverted to the speech-to-text identifier for further processing as already described.
  • a preferred feature of the invention comprises a computer program for performing all the steps involved in identifying speech in the audio content of the input signal and the key phrases therein, and carrying out an information search concerning the key phrases according to the user's specifications, i.e. most or all of the components of the system, such as speech identifier, speech-to-text converter, key phrase identifier, augmentation module etc. are realised in the form of software and/or hardware modules. Any required software might be encoded on a processor of the audio device, or be encoded on a separate processor, so that an existing audio device might be adapted to benefit from the features of this invention.
  • Fig. 1 is a schematic block diagram of a system for automatic media content augmentation in accordance with an embodiment of the present invention.
  • the system is shown to incorporate an audiovisual device 17, for example a home entertainment system, TV, multimedia device or similar.
  • an interface 14 between the user and the system has been included only schematically in the diagram. It is understood, however, that the system includes a means of interpreting commands issued by the user in the usual manner of a user interface and also means for outputting the audio-visual signal, for example, TV loudspeakers, TV screen etc.
  • Fig. 1 shows a media content augmentation system 1 in which an audio identifier 15 identifies the audio content of an audio-visual input stream
  • the speech processing module 4 comprises a speech identifier 3 which identifies the speech content on the audio signal 2, and a speech-to-text converter 5 which converts the identified speech content to a digital text 6.
  • the digital text 6 is passed on to a key phrase identifier 7.
  • the key phrase identifier 7 performs some initial processing on the digital text 6 and isolates potential words that might be of interest to the user 20.
  • the key phrase identifier 7 performs a check to see whether an identified word is already covered by a dictionary 12, or specifically tagged for exclusion from or inclusion in an information search. A word not already covered by the dictionary 12 and not excluded from a search is a key phrase 19 and is passed on accordingly to the augmentation module 25.
  • the augmentation module 25 in this example comprises a search engine 8, a comparator 18, and a result compiler 10.
  • the search engine 8 can access an external computer network 9, for example the internet, by means of a computer network interface 13.
  • an information search is initiated, and the results of the search are analysed by the comparator 18, which can categorise the results into similar types of information and perform intelligent comparisons.
  • the result compiler 10 is used to compile the results 11 of the search and/or the comparison into a manner suitable for presentation to the user 20.
  • the user 20 might wish to view the results on the television screen of the audio-visual device 17, or he might want a printout or other hard copy of the results.
  • the user 20 can influence the media augmentation procedure by entering preferences and commands 21 via the user interface 14 of the audio-visual device 17.
  • the preferences and commands 21 are stored in a local database 22 and are used to control the augmentation procedure.
  • The augmentation may be further supplemented by external program genre information 26 obtained from the external computer network 9, e.g. by downloading relevant information from the internet 9 via the augmentation module 25, which passes this information 26 to the key phrase identifier 7.
  • the user may also update the dictionary 12 by specifying words that are to be excluded from or included in an information search.
  • the system 1 described in this example is shown as an extension of an audio-visual device 17.
  • All of the additional components described (automatic speech recognition 4, key phrase identifier 7, dictionary 12, preferences memory 22, augmentation module 25) might be integrated to present a single device along with the audio-visual device 17, or might be realised as part of a personal computer system which is connected to an audio-visual device 17.
  • the system might also be realised, for example, as a set-top box connected to an audio-visual device 17.
  • the dictionaries can be updated or replaced as desired by downloading new versions from the internet.
  • the media content augmentation system can make use of the most up-to-date data available.
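The comparator behaviour described in the list above (removing duplicate information and sorting prices in ascending order) can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `compare_offers` and the `supplier`/`price` record layout are assumptions made purely for illustration.

```python
def compare_offers(offers):
    """Drop duplicate (supplier, price) entries and sort by ascending price.

    `offers` is a list of dicts with hypothetical keys "supplier" and "price".
    Ties on price are broken alphabetically by supplier name.
    """
    # A set removes exact duplicates returned by different searches.
    unique = {(o["supplier"], o["price"]) for o in offers}
    # Most attractive (lowest) price first.
    return sorted(unique, key=lambda sp: (sp[1], sp[0]))
```

A real comparator would also need to recognise near-duplicates (for example the same supplier listed under slightly different names), which this sketch does not attempt.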

Abstract

The invention describes a system (1) for performing media content augmentation on an audio signal (2). The system comprises a speech identifier (3) for identifying speech content in the audio signal (2); a speech-to-text converter (5) for converting the speech content into a digital text format (6); a key phrase identifier (7) for identifying key phrases (19) in the digital text (6); a search engine (8) for searching a source of information (9) for material relating to the key phrases (19), and a search result compiler (10) to provide a user with results of the search (11). Moreover the invention describes an appropriate method for performing media content augmentation on an audio signal (2).

Description

System and method for performing media content augmentation on an audio signal
This invention relates in general to a system and method for performing media content augmentation on an audio signal, and, in particular, to a system and method for providing media content augmentation in an audio device.
An audio signal can be received by a user as, for example, a radio signal or as part of an audio-visual signal originating from a television broadcast or an audio-visual device. An audio device can be, for example, a radio, a receiver, etc., or an audio-visual device such as a television set, a DVD player, VCR, multimedia system, mobile telephone, etc. Regardless of the source of the audio signal, such a signal may consist of speech, music, sound effects and other audio contents.
The programs received by a user can be of different natures, e.g. news broadcasts, commercials, feature films, online shopping, health programs etc. While listening to or viewing such programs, a user may hear a reference to a word or phrase that he does not recognise. The word or phrase may be a new buzzword, a slang word, or may denote a product with which the user is not acquainted.
Sometimes words are created and used by particular groups, for example rappers, who write music and communicate using words which are not used by the majority of people. Examples of other groups might be small ethnic communities, whose language might be partially adopted by the surrounding communities. Some of these words might be taken over into the vernacular and used commonly in everyday speech, whilst other words might remain local to the groups that created them. Some words might persist in language whereas others may be of a short-lived nature.
If the user wishes to find out more about a particular word or phrase, he is limited to consulting standard dictionaries, which may be available in printed form, or as online internet dictionaries. In the case of new words, such as buzzwords or slang, the user may not be able to find a reference in the dictionaries available, since some time usually elapses before such words are included in new editions of the dictionary, if at all. A similar problem might be experienced by persons learning a foreign language, who listen to or view foreign language programs. Foreign language dictionaries are also generally restricted to "normal" language and may include only a relatively small proportion of new words, slang, buzzwords etc.
In the case of commercials, the user might hear a reference to a product with which he is not familiar, and wish to learn more about it. The product might be new on the market and therefore of interest to the user. For example, he might wish to know more about the ingredients in the case of a food product, or technical information regarding a device, or information about side-effects in the case of a medicinal product. Equally, the user might see the product advertised or mentioned in a foreign-language broadcast, and might wish to locate a supplier. Such information can be difficult to locate, and requires a relatively high level of effort on the part of the user, for example by looking for suppliers in telephone directories or by first locating and then making contact with the manufacturers.
Other information of interest regarding a product might be its price or availability. Since products are often priced differently by various suppliers, the user might be interested in locating the supplier offering the most attractive price. To this end, the user must compare prices himself, by shopping around to find suppliers of the product in which he is interested. The user might also make use of a price agency, which may, in return for a percentage of the sale price, locate the supplier with the lowest price for a particular product. The user must first locate such an agency and then request a price comparison. If the user is watching a program about a foreign country, he might wish to find out more about how to plan a visit to that country. Interesting information in this case might be travel connections, route planning, visa requirements, vaccine recommendations, currency information etc. The user can locate such information by contacting travel bureaus, researching in libraries or online, and reading appropriate literature. Again, to locate all the desired information requires effort and dedication on the part of the user.
Therefore, an object of the present invention is to provide a system and a method which can be used to easily provide informative media content augmentation on an audio input. To this end, the present invention provides a system for performing media content augmentation on an audio input signal, wherein the system comprises a speech identifier for identifying the speech content in the audio signal, a speech-to-text converter for converting the speech content into a digital text format, a key phrase identifier for identifying key phrases in the digital text; a search engine for searching a source of information for material relating to the key phrases; and a result compiler for providing the user with the results of the search. Here, the key phrases may be single words or may consist of groups of words e.g. complete sentences. Therefore, for the sake of simplicity, any reference to "word" in the following text is assumed to refer also to "phrase", and vice versa, without restriction of the invention.
An appropriate method for media content augmentation of an audio input comprises identifying the speech content in the audio-visual signal, converting the speech content into a digital text format, identifying key phrases in the digital text, searching a source of information for material relating to the key phrases, and providing the user with results of the search.
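As a rough illustration only, the method steps above can be read as a chain of functions. The following Python sketch is not taken from the application: every function name is hypothetical, the speech identification and speech-to-text steps are stubbed with pre-transcribed segments, and the "source of information" is assumed to be a small in-memory mapping.

```python
from typing import Dict, List, Set


def speech_to_text(audio_segments: List[dict]) -> str:
    """Stand-in for the speech identifier and speech-to-text converter.

    Keeps only segments labelled as speech and joins their transcripts;
    a real system would run speech recognition here.
    """
    return " ".join(s["transcript"] for s in audio_segments if s["kind"] == "speech")


def identify_key_phrases(text: str, known_words: Set[str]) -> List[str]:
    """Return words not already known to the user, in order, without repeats."""
    seen, phrases = set(), []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'")
        if word and word not in known_words and word not in seen:
            seen.add(word)
            phrases.append(word)
    return phrases


def search(phrases: List[str], source: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look each key phrase up in the (assumed) information source."""
    return {p: source.get(p, []) for p in phrases}


def compile_results(found: Dict[str, List[str]]) -> str:
    """Format non-empty results as one line per key phrase."""
    return "\n".join(f"{p}: {', '.join(items)}" for p, items in found.items() if items)


def augment(audio_segments, known_words, source) -> str:
    """Run the whole chain: speech -> text -> key phrases -> search -> results."""
    text = speech_to_text(audio_segments)
    return compile_results(search(identify_key_phrases(text, known_words), source))
```

Keeping each step as a separate function mirrors the modular description of the system, so that any one stage could be swapped for an off-the-shelf component.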
The system thus provides an easy way of quickly searching for and locating information of any type concerning words identified on an audio signal which are of interest to the user, who no longer needs to invest time and resources by initiating and carrying out such a search on his own. The dependent claims and the subsequent description disclose particularly advantageous embodiments and features of the invention.
The modules which perform speech identification and conversion to digital text can be realised by one skilled in the art by using off-the-shelf components. These modules may also be realised as a single component, using available software and/or hardware components. The digital text created by the speech-to-text converter can then be analysed for content by the key phrase identifier which first processes the digital text to filter out uninteresting words, such as definite/indefinite articles, conjunctions etc. What remains is a list of possible key phrases which might be candidates for an information search. One possible method of operation of the key phrase identifier is described further on in the following text.
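The first filtering pass of the key phrase identifier, which removes articles, conjunctions and similar words, might look something like the sketch below. The stop-word list and the function name `candidate_words` are assumptions for illustration; the application does not specify either.

```python
import re

# Illustrative, deliberately tiny stop-word list (an assumption; the
# application does not give one).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "is", "for"}


def candidate_words(digital_text: str) -> list:
    """Return the words that survive stop-word filtering, in order, without repeats."""
    words = re.findall(r"[a-z0-9'-]+", digital_text.lower())
    candidates = []
    for w in words:
        if w not in STOP_WORDS and w not in candidates:
            candidates.append(w)
    return candidates
```

The output is only a list of candidates; deciding which of them are actually key phrases is left to the later dictionary check.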
The source of information searched with regard to information relating to the key phrases might be, for example, an information database, the internet, or an intranet. The type of information to be located might be in the form of sound clips, text, video clips, URLs, pictures etc. The result compiler organises the located material into a manner suitable for presentation to the user, for example in the form of a text summary with embedded graphics and hyperlinks, or a collection of video clips. In one embodiment of the invention, the result compiler is incorporated into the system in such a way as to be able to present to the user the results of the information in the same device which contains the audio receiver. The results of the information search may equally be stored in a memory for later retrieval and perusal, and may be made available for processing or viewing on another device.
In a preferred embodiment of the invention, a dictionary is incorporated in the system, for use by the key phrase identifier in identification of key phrases. The user can access the dictionary by means of a suitable interface. Thus, the dictionary can be updated and extended as required. A particularly advantageous embodiment of the invention is such that the dictionary contains a list of known words (phrases). The words already contained in the dictionary are excluded from any information search, since they are words already known to the user. Any word that is "new", i.e. does not exist in the current version of the dictionary, is a potential candidate for an information search. The dictionary can actively be updated, i.e. once a search has been successfully carried out for a new, hitherto unknown word, the user can indicate, via the interface, that this word is to be entered in the dictionary and therefore excluded from all future searches.
Further, the user can specify words (phrases) that are to be included in an information search, for example words that are used in everyday language might also be used in the name of a product in which the user is interested, or appear in the title of a book. This would avoid automatic exclusion from an information search of products owing to their product names consisting of common words. Context analysis can be performed on the digital text using state-of-the-art techniques to identify phrases which consist of everyday words, e.g. "Gone with the Wind", "cure for the common cold" etc., but which might as a whole be of interest to the user.
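The interplay between known words, user-updated entries and explicitly included phrases can be sketched as a small data structure. This is a minimal illustration under assumptions: the class `UserDictionary`, its field names and the rule that a multi-word phrase is searchable if any of its words is unknown are all hypothetical, and no context analysis is attempted.

```python
from dataclasses import dataclass, field


@dataclass
class UserDictionary:
    # Words already known to the user; excluded from information searches.
    known: set = field(default_factory=set)
    # Phrases the user wants searched even though they consist of known words.
    always_include: set = field(default_factory=set)

    def mark_known(self, word: str) -> None:
        """Enter a word in the dictionary so it is excluded from future searches."""
        self.known.add(word.lower())

    def is_key_phrase(self, phrase: str) -> bool:
        """Decide whether a word or phrase is a candidate for a search."""
        p = phrase.lower()
        if p in self.always_include:
            return True
        # Searchable if at least one word in the phrase is not yet known.
        return any(w not in self.known for w in p.split())
```

For example, with "gone", "with", "the" and "wind" all marked as known, the single word "wind" is skipped, while the title "Gone with the Wind" is still searched if the user has added it to `always_include`.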
In a particularly advantageous embodiment of the invention, the system makes use of a computer network interface to search a computer network for references to the key phrases, for example in the form of URLs, links, etc. The interface can be realised by means of, for example, a modem, ISDN or DSL connection, and any hardware and software required. A further embodiment of the interface might use a wireless connection to make contact with the computer network. The computer network with which the system makes contact might be a local intranet or the world-wide web (internet). The search engine of the system might also make use of the services of existing, possibly more powerful search engines (for example a meta-crawler) to perform parallel searches, thereby minimising the amount of time required to obtain the desired results.
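As an illustration of performing parallel searches, the following sketch queries several search back-ends concurrently and merges their references. It is only a sketch under stated assumptions: the back-ends are hypothetical stub functions (`intranet_stub`, `web_stub`) rather than real search services, and `search_in_parallel` is a name introduced here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_in_parallel(key_phrase, engines):
    """Query every engine concurrently and merge the references found.

    `engines` maps a label to a callable that takes a phrase and returns a
    list of reference strings (for example URLs). Both are placeholders.
    """
    merged, seen = [], set()
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # Submit all searches first so that they run at the same time.
        futures = {name: pool.submit(fn, key_phrase) for name, fn in engines.items()}
        for name, future in futures.items():
            for ref in future.result():
                if ref not in seen:  # drop references already reported by another engine
                    seen.add(ref)
                    merged.append((name, ref))
    return merged


# Stub engines standing in for an intranet index and an external web search.
def intranet_stub(phrase):
    return [f"intranet://docs/{phrase}"]


def web_stub(phrase):
    return [f"https://example.com/search?q={phrase}", f"intranet://docs/{phrase}"]
```

Calling `search_in_parallel("zeppelin", {"intranet": intranet_stub, "web": web_stub})` returns one entry per distinct reference, labelled with the engine that reported it first.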
A further preferred embodiment of the invention allows the user to control the manner in which a search is to be carried out and the manner in which the results of the search are to be presented, by specifying a set of preferences, for example, automatic or manual result presentation. Therefore, the system preferably comprises a suitable interface, which may be the same interface as utilized for dictionary access.
In automatic mode, the system might continually perform the information search based on words identified in the audio signal, and update a list of relevant internet pages that the user can choose to view immediately or later on. In manual mode, the system might continuously display lists of words recently identified in the audio signal, from which the user can highlight one or more words and then initiate an (internet) search based on the highlighted word(s).
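The difference between the two modes can be sketched as follows. This is a minimal illustration, assuming the search is supplied as a callable that maps a word to a list of page references; the class `AugmentationSession` and its method names are hypothetical.

```python
class AugmentationSession:
    """Hypothetical controller switching between automatic and manual search modes."""

    def __init__(self, mode, search):
        if mode not in ("automatic", "manual"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.search = search    # callable: word -> list of page references
        self.recent_words = []  # words displayed to the user in manual mode
        self.pages = []         # running list of relevant pages

    def on_word(self, word):
        """Called for each word identified in the audio signal."""
        if self.mode == "automatic":
            self._add_pages(self.search(word))
        elif word not in self.recent_words:
            self.recent_words.append(word)

    def on_highlight(self, words):
        """Manual mode: the user highlighted displayed words and requested a search."""
        for word in words:
            if word in self.recent_words:
                self._add_pages(self.search(word))

    def _add_pages(self, pages):
        # Keep the page list free of repeats while preserving arrival order.
        for page in pages:
            if page not in self.pages:
                self.pages.append(page)
```

In automatic mode every identified word triggers a search at once, whereas in manual mode the words are only collected until the user highlights some of them.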
The type of information sought on the internet might be specified in the user preferences or may depend on the type of program being viewed. Genre information extracted from the audio-visual signal (electronic program guide) or obtained from an external source, for example a meta-data service provider on the internet, might be used to identify the type of program being received. For example, the user might be watching a history program and may wish to look for educational material relating to the subject matter of the broadcast. Another use could involve internet comparison shopping engines to quickly compare prices of items advertised.
Further, the user might wish to learn of the search results as soon as they are available, in which case the user would specify by means of appropriate commands that the results of the information search are to visually overlay the program which is currently being watched on TV, for example by inserting closed captions into the audio-visual signal or by presenting the results in picture-in-picture form; or that the program being listened to or watched is to be interrupted to present the results immediately. The user might require the option of having the results displayed continuously on a separate screen such as a television or computer screen. On the other hand, it might suit the user better to examine the results of the search at a later stage, in which case the user would indicate this by means of entering the appropriate commands to store the search results in a memory until required.
The commands and preferences entered by a user might also be stored in a personal user profile. To this end, the system preferably offers the possibility of storing at least one user profile, more preferably a plurality of user profiles, so that anyone using the system can activate his own previously stored profile without having to enter anew his own preferences each time.
A user profile might cover all preferences about the manner in which the search is to be carried out, the type of information to be searched for, and the manner in which the results are to be presented to the user, for example, whether a user wishes to observe the results immediately in closed-caption form, that certain types of product advertised in commercials are to be excluded from or included in augmentation, and that nature documentaries are to be augmented whereas talk-shows are to be excluded, that a particular search engine or meta-crawler is to be invoked in augmentation, that the augmentation is to be supplemented using information provided by third-party metadata service providers etc. Global preferences might also be stored so that a particular mode might be activated by any user, for example an educational mode, in which documentaries and history programs are singled out for augmentation while excluding other types of program; commercial mode where only the commercials are used to locate information regarding the products and services advertised, etc.
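A user profile of this kind might be represented as a small record of preferences. The sketch below is an assumption-laden illustration: the field names, the genre labels and the rule that an empty inclusion set means "augment everything not excluded" are all hypothetical choices, not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical record of one user's augmentation preferences."""
    name: str
    presentation: str = "store"  # e.g. "closed_caption", "picture_in_picture", "store"
    search_engine: str = "default"
    augment_genres: set = field(default_factory=set)  # empty set: no genre restriction
    exclude_genres: set = field(default_factory=set)

    def should_augment(self, genre):
        """Decide whether a program of the given genre is to be augmented."""
        if genre in self.exclude_genres:
            return False
        return not self.augment_genres or genre in self.augment_genres


# A global "educational mode" that any user could activate (illustrative values).
EDUCATIONAL_MODE = UserProfile(
    name="educational",
    augment_genres={"documentary", "history"},
)

# A personal profile that augments everything except talk shows.
PERSONAL = UserProfile(
    name="personal",
    presentation="closed_caption",
    exclude_genres={"talk-show"},
)
```

Keeping the exclusion check ahead of the inclusion check means an excluded genre is never augmented, even if it also appears in the inclusion set.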
Preferably, the system comprises a comparator which is used to compare qualitatively similar information in the results of the information search. If the key phrase denotes, for example, a new model of car, the information returned might contain items of text relating to technical performance, or details regarding suppliers, or purchase and leasing conditions. It is advantageous to compare similar types of information, so that duplicate information can be removed from the results, and so that intelligent comparisons can be made. The user might wish to have the results sorted according to the type of information they represent, for example, a list of prices might be sorted in ascending order with the most attractive price at the top of the list. The comparator can also organise the search results into a manner suitable for the desired mode of perusal, for example, audio clips for outputting on a radio, video and text material for viewing on a screen, or a document suitable for printing.
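One way to picture the comparator is as a function that groups results by type, drops exact duplicates and sorts prices in ascending order. The following is a minimal sketch, assuming each result is a dictionary with the hypothetical keys `kind`, `source` and `value`; it does not attempt the "intelligent comparisons" mentioned above.

```python
from collections import defaultdict


def compare_results(results):
    """Group results by kind, remove exact duplicates, and sort prices ascending.

    Each result is assumed to be a dict with the keys 'kind', 'source' and
    'value'; this layout is an assumption made for the example.
    """
    grouped = defaultdict(list)
    seen = set()
    for item in results:
        key = (item["kind"], item["source"], item["value"])
        if key in seen:  # identical information already collected
            continue
        seen.add(key)
        grouped[item["kind"]].append(item)
    if "price" in grouped:
        # The most attractive (lowest) price comes first.
        grouped["price"].sort(key=lambda item: item["value"])
    return dict(grouped)
```

For a list containing two price entries and a repeated technical description, the function returns the prices lowest first and a single copy of the description.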
For application of the invention to an audio-visual signal, the invention advantageously comprises an audio identifier to identify the audio content of an audio-visual signal, to facilitate use of such a system in, for example, home entertainment centres which can receive audio-visual signals from various sources (TV, VCR, DVD etc). A copy of the audio content is diverted to the speech-to-text identifier for further processing as already described.
A preferred feature of the invention comprises a computer program for performing all the steps involved in identifying speech in the audio content of the input signal and the key phrases therein, and carrying out an information search concerning the key phrases according to the user's specifications, i.e. most or all of the components of the system, such as speech identifier, speech-to-text converter, key phrase identifier, augmentation module etc. are realised in the form of software and/or hardware modules. Any required software might be encoded on a processor of the audio device, or be encoded on a separate processor, so that an existing audio device might be adapted to benefit from the features of this invention.
Other objects and features of the present invention will become apparent from the following detailed descriptions considered in conjunction with the accompanying drawing. It is to be understood, however, that the drawing is designed solely for the purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims.
The sole figure, Fig. 1, is a schematic block diagram of a system for automatic media content augmentation in accordance with an embodiment of the present invention.
In the description of the following figure, which does not exclude other possible realisations of the invention, the system is shown to incorporate an audiovisual device 17, for example a home entertainment system, TV, multimedia device or similar. For the sake of clarity, an interface 14 between the user and the system has been included only schematically in the diagram. It is understood, however, that the system includes a means of interpreting commands issued by the user in the usual manner of a user interface and also means for outputting the audio-visual signal, for example, TV loudspeakers, TV screen etc.
Fig. 1 shows a media content augmentation system 1 in which an audio identifier 15 identifies the audio content of an audio-visual input stream 16, and passes an audio signal 2, which is a copy of the identified audio content, to a speech processing module 4. Meanwhile, the original audiovisual stream 16 is passed to the audio-visual device 17.
The speech processing module 4 comprises a speech identifier 3 which identifies the speech content on the audio signal 2, and a speech-to-text converter 5 which converts the identified speech content to a digital text 6. The digital text 6 is passed on to a key phrase identifier 7. The key phrase identifier 7 performs some initial processing on the digital text 6 and isolates potential words that might be of interest to the user 20. The key phrase identifier 7 performs a check to see whether an identified word is already covered by a dictionary 12, or specifically tagged for exclusion from or inclusion in an information search. A word not already covered by the dictionary 12 and not excluded from a search is a key phrase 19 and is passed on accordingly to the augmentation module 25.
The augmentation module 25 in this example comprises a search engine 8, a comparator 18, and a result compiler 10. The search engine 8 can access an external computer network 9, for example the internet, by means of a computer network interface 13. By means of appropriate commands and parameters, an information search is initiated, and the results of the search are analysed by the comparator 18, which can categorise the results into similar types of information and perform intelligent comparisons. The result compiler 10 is used to compile the results 11 of the search and/or the comparison into a manner suitable for presentation to the user 20. The user 20 might wish to view the results on the television screen of the audio-visual device 17, or he might want a printout or other hard copy of the results.
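The flow of data through the components of Fig. 1 can be summarised in a short sketch. Each stage is passed in as a callable, so the function only shows the order in which the stages are chained; the function name `augment` and its parameter names are hypothetical, and no real speech recognition or search is performed.

```python
def augment(audio_signal, *, identify_speech, speech_to_text,
            identify_key_phrases, search, compare, compile_results):
    """Chain the stages of Fig. 1; every stage is a caller-supplied placeholder."""
    speech = identify_speech(audio_signal)    # speech identifier 3
    text = speech_to_text(speech)             # speech-to-text converter 5
    key_phrases = identify_key_phrases(text)  # key phrase identifier 7
    hits = [hit for phrase in key_phrases for hit in search(phrase)]  # search engine 8
    return compile_results(compare(hits))     # comparator 18, result compiler 10
```

Passing the stages as arguments keeps the sketch independent of any particular recogniser or search service; in a real system each callable would wrap the corresponding hardware or software module.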
The user 20 can influence the media augmentation procedure by entering preferences and commands 21 via the user interface 14 of the audio-visual device 17. The preferences and commands 21 are stored in a local database 22 and are used to control the augmentation procedure. The augmentation may be further supplemented by external program genre information 26 obtained from the external computer network 9, e.g. by downloading relevant information from the internet 9 via the augmentation module 25, which passes this information 26 to the key phrase identifier 7. The user may also update the dictionary 12 by specifying words that are to be excluded from or included in an information search.
The system 1 described in this example is shown as an extension of an audio-visual device 17. However, all of the additional components described (automatic speech recognition 4, key phrase identifier 7, dictionary 12, preferences memory 22, augmentation module 25) might be integrated with the audio-visual device 17 to form a single device, or might be realised as part of a personal computer system which is connected to an audio-visual device 17. The system might also be realised, for example, as a set-top box connected to an audio-visual device 17.
Although the present invention has been disclosed in the form of preferred embodiments and variations thereon, it will be understood that numerous additional modifications and variations could be made thereto without departing from the scope of the invention. For example, the dictionaries can be updated or replaced as desired by downloading new versions from the internet. In this way, the media content augmentation system can make use of the most up-to-date data available.
For the sake of clarity, it is to be understood that the use of "a" or "an" throughout this application does not exclude a plurality, and "comprising" does not exclude other steps or elements.

Claims

1. A system (1) for performing media content augmentation on an audio signal (2), said system comprising: a speech identifier (3) for identifying speech content in the audio signal (2); a speech-to-text converter (5) for converting the speech content into a digital text format (6); a key phrase identifier (7) for identifying key phrases (19) in the digital text (6); a search engine (8) for searching a source of information (9) for material relating to the key phrases (19); and a search result compiler (10) to provide a user with results (11) of the search.
2. The system of claim 1, wherein the system (1) contains a dictionary (12) to store a list of phrases which are to be included in or excluded from a search for material relating to the key phrases (19).
3. The system of claim 1 or claim 2, containing a computer network interface (13) for locating references to the key phrases in a computer network (9).
4. The system according to any preceding claim, wherein the system (1) comprises an interface (14) for inputting phrases and/or user preferences (21).
5. The system according to any preceding claim, comprising a comparator (18) for comparing similar types of information relating to the key phrases (19) in the located material.
6. The system according to any preceding claim, comprising an audio identifier (15) for identifying the audio content (2) in an audio-visual signal (16).
7. An audio device (17) comprising a system according to any of the preceding claims.
8. A method for automatic media content augmentation of an audio signal (2), which method comprises: identifying the speech content in the audio-visual signal (2); converting the speech content into a digital text format (6); identifying key phrases in the digital text (6); searching a source of information (9) for material relating to the key phrases; providing the user with results of the search.
9. A method according to claim 8 wherein the search of the source of information for augmentation of the key phrases, is performed according to preferences specified in a user profile.
10. A computer program to carry out all the steps of a method according to claim 8 or 9, whereby the computer program is implemented as part of an audio device.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03101655 2003-06-05
EP03101655.3 2003-06-05

Publications (2)

Publication Number Publication Date
WO2004109549A2 (en) 2004-12-16
WO2004109549A3 (en) 2005-02-17

Family

ID=33495629

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/050822 WO2004109549A2 (en) 2003-06-05 2004-06-02 System and method for performing media content augmentation on an audio signal

Country Status (1)

Country Link
WO (1) WO2004109549A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2459308A (en) * 2008-04-18 2009-10-21 Univ Montfort Creating a metadata enriched digital media file
US9836530B2 (en) 2013-12-16 2017-12-05 Entit Software Llc Determining preferred communication explanations using record-relevancy tiers

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1094406A2 (en) * 1999-08-26 2001-04-25 Matsushita Electric Industrial Co., Ltd. System and method for accessing TV-related information over the internet
US20020194004A1 (en) * 2001-06-14 2002-12-19 Glinski Stephen C. Methods and systems for enabling speech-based internet searches

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CODEN A.R. ET AL.: "Speech transcript analysis for automatic search" PROC. 34TH. ANNUAL HAWAII INTERNAT. CONF. ON SYSTEM SCIENCES, 3 January 2001 (2001-01-03), pages 1-9, XP002310679 LOS ALAMITOS, CA, USA *

Also Published As

Publication number Publication date
WO2004109549A3 (en) 2005-02-17

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase