EP1281173A1 - Voice commands based on the semantics of content information - Google Patents

Voice commands based on the semantics of content information

Info

Publication number
EP1281173A1
Authority
EP
European Patent Office
Prior art keywords
content information
user
control
speech
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01940369A
Other languages
German (de)
English (en)
Inventor
Peter J. L. A. Swillens
Jakobus Middeljans
Okke Alberda
Volker Steinbiss
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of EP1281173A1 publication Critical patent/EP1281173A1/fr
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/482End-user interface for program selection
    • H04N21/4821End-user interface for program selection using a grid, e.g. sorted out by channel and broadcast time
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/84Television signal recording using optical recording
    • H04N5/85Television signal recording using optical recording on discs or drums
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225Feedback of the input speech
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/78Television signal recording using magnetic recording
    • H04N5/781Television signal recording using magnetic recording on disks or drums
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/907Television signal recording using static stores, e.g. storage tubes or semiconductor memories

Definitions

  • the invention relates to voice control, especially for the play-out of content information by consumer electronics (CE) equipment.
  • U.S. patent 5,255,326 in particular addresses an interactive audio system that employs a sound signal processor coupled with a microprocessor as an interactive audio control system.
  • A pair of transceivers, operated as stereophonic loudspeakers and also as receiving microphones, is coupled with the signal processor for receiving voice commands from a principal user.
  • The voice commands are processed to operate a variety of different devices, such as a television, tape deck, radio or CD player, for supplying signals to the processor; these signals are then supplied to the loudspeakers of the transceivers to produce the desired sound.
  • Additional infrared sensors may be utilized to constantly triangulate the position of the principal listener to supply signals back through the transceiver system to the processor for constantly adjusting the balance of the sound to maintain the "sweet spot" of the sound focused on the principal listener.
  • Additional devices also may be controlled by the signal processor in response to voice commands which are matched with stored commands to produce an output from the signal processor to operate these other devices in accordance with the spoken voice commands.
  • the system is capable of responding to voice commands simultaneously with the reproduction of stereophonic sound from any one of the sources of sound which are operated by the system.
  • Speech recognition is a technology, aspects of which are discussed in, e.g., U.S. patent 5,987,409; U.S. patent 5,946,655; U.S. patent 5,613,034; U.S. patent 5,228,110; and U.S. patent 5,995,930, all incorporated herein by reference.
  • The known speech control and voice control of devices or applications is limited to a fixed set of commands tied to the equipment.
  • The inventors have realized that user-friendliness of, and ergonomic aspects during operational use of, voice-controllable equipment are enhanced if the voice command or voice commands are linked to the information content to be played out, rather than to the apparatus or platform. That is, the inventors believe that control of CE equipment should be content-centric, rather than device-centric.
  • the commands are preferably tailored to the semantics of the content information.
  • the content information comprises audio, e.g., a collection of songs
  • Selection of one or more specific songs is achieved by speaking the title or part of the lyrics of the song.
  • Special meta-data is added to the content of the CD to enable this feature.
  • This meta-data is typically, but not necessarily, a representation of the vocabulary required by the voice controller of the device or application to enable voice control for that particular CD and the music on it.
  • the user can hum or (attempt to) sing a part of the desired piece of music in order to select it for play out.
  • See U.S. patent 5,963,957, issued 10/5/99 to Mark Hoffberg for BIBLIOGRAPHIC MUSIC DATA BASE WITH NORMALIZED MUSICAL THEMES (attorney docket PHA 23,241), incorporated herein by reference.
  • This latter patent relates to an information processing system that comprises a music database.
  • the music database stores homophonic reference sequences of music notes.
  • the reference sequences are all normalized to the same scale degree so that they can be stored lexicographically.
  • Upon finding a match between a string of input music notes and a particular reference sequence through an N-ary query, the system provides bibliographic information associated with the matching reference sequence. This system can also be used to convert the input hummed by the user into a play command via the N-ary query.
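Purely as an illustration (no code appears in the patent, and the song titles, MIDI note numbers, and the interval representation below are assumptions), the normalize-and-look-up scheme described above might be sketched as follows: melodies are reduced to interval sequences so that transposed renditions coincide, stored in lexicographic order, and matched by prefix:

```python
from bisect import bisect_left

def normalize(notes):
    """Reduce a melody (MIDI note numbers) to its interval sequence so that
    the same tune hummed in any key yields the same representation."""
    return tuple(b - a for a, b in zip(notes, notes[1:]))

# Reference database: normalized sequences kept in sorted (lexicographic)
# order, each paired with bibliographic information. Entries are invented.
_db = sorted([
    (normalize([60, 62, 64, 65]), "Song A - Artist X"),
    (normalize([60, 64, 67, 72]), "Song B - Artist Y"),
    (normalize([62, 61, 62, 64]), "Song C - Artist Z"),
])
_keys = [k for k, _ in _db]

def lookup(hummed_notes):
    """Binary-search the lexicographically ordered database for a reference
    melody whose interval sequence starts with the hummed fragment."""
    q = normalize(hummed_notes)
    i = bisect_left(_keys, q)
    if i < len(_keys) and _keys[i][:len(q)] == q:
        return _db[i][1]
    return None
```

Because the sequences are normalized, `lookup([48, 52, 55, 60])` (the Song B melody transposed down an octave) still finds Song B.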
  • the audio output of the system may trigger an undesirable activation of the speech-controlled processing, e.g., when a song is being played out.
  • This undesirable activation is prevented, e.g., through echo cancellation, by pressing an activation button on the remote, e.g., the Pronto (TM), the universal programmable remote from Philips Electronics, to activate speech command receipt, or by having the equipment register the user making a specific gesture, etc.
  • the content information comprises video
  • key scenes are labeled by key words so that speaking those words sets the playing out at the start of the relevant scene.
  • a key word profile of the video content may be used to identify certain scenes, either through a one-to-one mapping of the user's voice input to the keywords or through a semantic mapping of the user's voice input onto an indexed list of the content's keyword labels and their synonyms.
  • undesired activation is prevented from occurring, e.g., by using certain fixed commands or parts thereof such as a prefix.
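A minimal sketch of the keyword-and-synonym scene selection described above; the scene labels, synonym table, and time offsets are invented for illustration and are not from the patent:

```python
# Hypothetical scene index: keyword label -> play-out start (seconds),
# plus a synonym list mapping spoken words onto the indexed labels.
SCENES = {"car chase": 754, "finale": 5210}
SYNONYMS = {"pursuit": "car chase", "chase": "car chase", "ending": "finale"}

def seek_scene(spoken):
    """Map the user's words onto a scene label, directly or via a synonym,
    and return the position at which play-out should start."""
    key = spoken.strip().lower()
    label = key if key in SCENES else SYNONYMS.get(key)
    return SCENES.get(label) if label else None
```

Saying either the label itself or a listed synonym selects the same scene; unrecognized words select nothing.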
  • interactive software applications using graphics e.g., virtual reality or video games, are made speech-controllable by allowing the processes to associate speech input with controllable features of graphics objects displayed or to be displayed.
  • Actions to be carried out by a graphics object are made speech-controllable or speech-selectable by having the user say the proper words fitting the semantic context.
  • This is suitable for video games allowing multiple modalities of control (e.g., both hand-input through joy-stick and speech input), as well as educational programs for teaching another language, or for teaching children the proper words and expressions for certain concepts such as tangible objects or actions.
  • The speech is converted into data that is processed to identify the intended action. This is achieved through, e.g., semantic matching of the speech data with items in a pre-determined look-up table and finding the candidate for the closest match.
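The closest-match table look-up might be sketched like this; the command phrases and action tuples are hypothetical, and `difflib`'s similarity ratio merely stands in for whatever semantic matching an actual implementation would use:

```python
import difflib

# Hypothetical look-up table: recognized phrases mapped to player actions.
COMMAND_TABLE = {
    "play mustang danny": ("PLAY_TRACK", 3),
    "next track": ("SKIP", +1),
    "previous track": ("SKIP", -1),
    "stop": ("STOP", None),
}

def resolve(speech_text, cutoff=0.6):
    """Find the table entry closest to the recognized speech and return its
    action, or None when no candidate is close enough to count as a match."""
    matches = difflib.get_close_matches(
        speech_text.lower(), list(COMMAND_TABLE), n=1, cutoff=cutoff)
    return COMMAND_TABLE[matches[0]] if matches else None
```

A slightly misrecognized phrase such as "play mustang dannie" still resolves to the intended entry, while unrelated input falls below the cutoff and yields no action.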
  • the association between speech input and action intended may be made trainable by virtue of taking user-history into account.
  • Speech commands are derived from the content when the content is stored locally after downloading from the Web and/or playing out. For example, key words in the lyrics are identified and stored as associated with the piece of audio whereto they pertain. This can be done by a dedicated software application. Either the digital data are analyzed, or the audible lyrics are analyzed during the first play-out of the audio content, for example, by isolating the voice part from the instrumental part and analyzing the former.
  • the speech commands thus created can be used in addition to, or instead of, the basic set that comes with the specific content.
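One naive way to derive candidate command keywords from locally stored lyrics, as described above, is frequency counting with stop-word filtering; the stop-word list, length threshold, and sample lyrics are illustrative assumptions only:

```python
import re
from collections import Counter

# A small stop-word list; a real application would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in",
             "my", "is", "it", "i", "you"}

def derive_keywords(lyrics, top_n=5):
    """Pick the most frequent non-stop-words from a song's lyrics as
    candidate voice-command keywords to associate with that track."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(top_n)]
```

The resulting keywords can then be stored alongside the track and merged with any command set supplied with the content.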
  • the user is enabled to download preexisting or customized commands from the Web that pertain to specific content information and that are to be stored at the user's equipment as semantically associated with the information content for the purpose of enabling voice control.
  • the user can make his/her home library of electronic content information, considered as a resource for the home network, fully speech driven.
  • the user has a collection of CD's, DVD's, in his/her jukebox and/or on a hard disk. If the content relates to publicly available audio and video, a service provider can create a library of annotations for each piece of the content in advance, and the user can download those elements that are relevant to his/her collection.
  • the annotations for a CD or DVD can be tied to the disk's identifier as well as to its segments.
  • the name of an album spoken by the user, is linked to a certain identifier that in turn enables retrieval and selection of the CD or DVD in the jukebox.
  • The name of a song or scene can be linked to both the identifier of the CD or DVD and to the relevant key frames. The user then speaks the terms "movie" and "car chase" and gets in return the movies available that have scenes in them relating to a car chase.
  • the speech commands are linked to the content as presented in an electronic program guide (EPG), e.g., as broadcast by a service provider.
  • A speech interface enables the user to select a specific program or program category matching the words spoken.
  • commands as spoken by the user are processed via a server, e.g., a home server or a server on the Web and routed back to the Web-enabled play-out equipment as instructions.
  • the server has an inventory of content available and a dictionary of words that are representative of the content's semantics.
  • the Web-enabled equipment identifies to the server the content, e.g., through the identifier code of a CD or DVD, or through the header of a file, whereupon the speech commands for this content are readily matched to instructions for the control through, e.g., a look-up table.
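The server-side matching could be as simple as a dictionary keyed by the content identifier reported by the player; the disc identifiers, commands, and instruction format below are assumptions, not from the patent:

```python
# Hypothetical server-side inventory keyed by disc identifier; each entry
# maps spoken commands for that disc to play-out instructions.
INVENTORY = {
    "CD-123456": {
        "mustang danny": {"op": "PLAY", "track": 1},
        "shuffle": {"op": "PLAY_RANDOM"},
    },
}

def instructions_for(disc_id, spoken):
    """Look up the instruction for a spoken command, given the content
    identifier (e.g., a CD/DVD code or file header) sent by the player."""
    commands = INVENTORY.get(disc_id, {})
    return commands.get(spoken.strip().lower())
```

The server routes the resulting instruction back to the Web-enabled play-out equipment; an unknown disc or command yields no instruction.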
  • the voice control enables, e.g., the selection of a piece of content information for play-out, or for storage or for fast forward until a stop, etc. Also, content bookmarked with key words in advance can be browsed under voice control for retrieval of certain excerpts matching the voice input at the key word level.
  • the first storage medium comprises the content information and the control information that enables voice control as explained above.
  • The information for the voice control is copy-protected, as a result of which the copy does not have the control commands.
  • This is considered a feature supporting the content information industry. If the consumer wants to have a full copy of the voice controlled version, he or she can download the voice control information from a server on the Internet identified by a link to the CD number or DVD number, at a certain price. This has the advantage that the author's rights are acknowledged, even if the price is merely symbolic. Thus, this feature contributes to maintaining awareness that content information is the intellectual property of the author or his/her assignees.
  • The term "voice command" as used herein indicates a voice control input that may consist of one or more keywords, but may also comprise a more verbose linguistic expression.
  • Figs. 1 and 2 are block diagrams of systems in the invention.
  • the invention allows for voice control of apparatus or software applications, in particular of those that use content pre-recorded on a storage medium.
  • Voice commands are used that semantically relate to, are associated with or based on, the content as stored in the storage medium.
  • the commands are therefore different per sample of the medium's content. For example, the commands available for a CD with music from composer or lyrics author X are different from those for a CD with music composed by composer or lyrics author Y.
  • the operation is as follows.
  • the user inserts a CD of performer Daan van Schooneveld into the player.
  • the CD stores the music and the software to enable the user to interact with the CD through voice control.
  • The user says "Mustang Danny".
  • The player starts to play the rock song of that title, one of the tracks of the CD.
  • a jukebox application is a software application that allows for archiving CD content on the PC's hard disk drive (HDD).
  • The user has archived the Jos Swillens "Greatest Hits" CD on the HDD.
  • the jukebox starts to play "My Beemer fits my crewcut", one of the tracks of Swillens' CD archived on the PC.
  • the voice commands need not consist of only keywords but may comprise more verbose linguistic expressions.
  • the system processes the voice input to match it with one of the options available using, e.g., a suitable search algorithm in an index list.
  • the user has also archived the "Greatest Hits" CD from Koos Middeljans on the PC.
  • the jukebox starts to play the folk song with that title, one of the tracks of the CD archived.
  • The jukebox starts playing "Nat the Lab".
  • the jukebox starts playing the tracks of this CD in a random order.
  • Copy protection measures are available and implemented, e.g., DRM (Digital Rights Management).
  • The speech commands as supplied together with the semantically related content information on a CD or DVD could be implemented in such a manner that they cannot be copied to a location other than the onboard memory of a player. Any copy to another location would lose this feature and become less attractive.
  • The user downloads the content via the Internet together with the semantically related control data that enables voice-controlled selection and play-out in a manner similar to that discussed for the jukebox.
  • the control data is preferably an integral part of the downloaded data in this example.
  • the same content information can be tied to phonetically different sets of voice commands, for example, to allow for differences in language and in pronunciation in different geographic regions so as to facilitate voice recognition.
  • the user preferably has a choice of the language he or she wants to use for voice control of the system.
  • the storage medium may have too small a storage capacity for storing the commands of all the languages likely to be used. If voice commands are not available from the medium in one of the languages most likely to be used, the play out device is preferably able to download the equivalent speech commands in the desired language whereupon the system will translate the commands at run time into the corresponding instructions.
  • a dedicated service can be made available on the Internet.
  • the recording is then accomplished at home under secure circumstances.
  • The local recording preferably allows the consumer to create his/her own command set semantically related to a specific piece of content information. This needs some editing and preferably a specific graphical user interface (GUI) that assists the user with establishing the relationships between content segments, voice input commands, and the actions or processing desired. For example, if the content information is not annotated at all, the user has to specify which segments he/she wants to control as separate items, with what voice commands he/she wants to control them, and what actions should be taken upon what segment under what command.
  • The phonetic transcription covers any relevant form of phonetic transcription, independent of phoneme inventory, for example, limited to a subset of the vocabulary, or just for exceptions to a standard pronunciation.
  • Optionally, a language model can be used that includes a description of how people typically interact with the system and phrase sentences, be it via example sentences, patterns or phrases, via (stochastic) finite-state grammars, via (stochastic) context-free grammars, or another kind of grammar.
  • the language model may just contain a modification of any standard way of communicating.
  • the system optionally includes any description of what action should be triggered by certain words, commands, phrases, expressions, typically as given via a grammar.
  • The system may include a dialogue model that includes a description of how the system should react to the user's input and how the system enters a dialogue mode. For example, the system may ask the user to clarify or reconfirm a command, etc., under specific circumstances.
  • the system may use a relationship between the data configuring the speech recognizer and other data. For example, the system has a display that shows what the user can say in order to play a current track.
  • the storage medium e.g., a CD, DVD, solid state (e.g., flash) memory, etc.
  • the storage medium has a bit pattern that gets recognized during start-up and that confirms the availability of the voice command feature.
  • the confirmation can be conveyed to the user through, e.g., a pop-up screen on a display or spoken pre-recorded text supplied via the loudspeakers.
  • CD-DA has the extra capacity of the R-W subcode channels, which can be used for adding the voice command feature without losing the CD's backwards compatibility.
  • The lead-in tracks may not have adequate storage for the various language versions, but the data can be downloaded from the disc into a local memory. In this case, each language needs to be present on the disc only once.
  • CD ROM on the other hand, has a file structure which makes it easy to accommodate the speech control file on the disc as required.
  • DVD also has a file structure and allows for the same approach as the CD ROM. Flash, HDD etc can be handled in the same way.
  • Fig. 1 is a block diagram of a system 100 in the invention.
  • System 100 comprises a play-out apparatus 102 for playing out content information 104 stored on a carrier 106.
  • Carrier 106 comprises, for example, a CD, a DVD or a solid state memory.
  • carrier 106 comprises a HDD onto which content information 104 has been downloaded via the Internet or another data network.
  • Content information 104 in these examples is stored in a digital format.
  • content information 104 may also be stored in an analog format.
  • Apparatus 102 has a rendering subsystem 108 for making content information 104 available to the end-user. For example, if content information 104 comprises audio, sub-system 108 comprises one or more loudspeakers, and in case content information 104 comprises video information sub-system 108 comprises a display monitor.
  • carrier 106 comprises control information 110 that is semantically associated with content information 104.
  • Control information 110 enables a data processing sub-system 112 to determine if a voice input 114 by the user via a microphone (not shown) matches an information item in the control information. If there is a match, the relevant play-out mode is selected, examples of which have been given above.
  • the semantic relationship between control information 110 on the one hand, and content information 104 on the other hand facilitates user-interaction with apparatus 102, owing to the highly intuitive correspondence, as explained above in the play-out examples of audio content.
  • visual feedback is provided via a local display, e.g., a small LCD 116, as to the content available and/or mode selected.
  • Carrier 106 can be a component that can be inserted into apparatus 102 one at a time.
  • Apparatus 102 comprises a jukebox functionality 118 that enables selecting content from among multiple carriers (not shown) like carrier 106, or even from among physically different types of carrier, CD and solid-state memory, for example.
  • Control information 110 is shown here as stored or recorded with content information 104 on carrier 106.
  • a CD, DVD or flash can thus be supplied having prerecorded voice control applications and commands.
  • control information 110 cooperates with a dedicated software application running on data processing system 112 for matching voice input 114 with one or more items available in control information 110.
  • The software application is provided via a channel other than that of the control information, e.g., via the Internet or a set-up diskette for setting up apparatus 102.
  • Voice control itself is known, and so is user-interaction with an apparatus for selecting an operational mode of the apparatus.
  • the invention here relates to using a control interface, part of which is semantically associated with the content information available for playing-out.
  • System 100 provides auditory or visual feedback in response to the user having entered a spoken command. For example, if there is a match, system 100 confirms receipt of the command, e.g., by repeating the command word or words in a pre-recorded voice, or by supplying the word "confirmed" in a pre-recorded voice. This feature can be readily implemented with a relatively small number of predetermined commands per information content item.
  • the confirmation data can be integrated within control data 110.
  • If there is no match, system 100 supplies auditory feedback indicating the negative status. For example, system 100 supplies in a pre-recorded voice "cannot process this command", "cannot find this artist", or "cannot find this song", or words of a similar meaning.
  • Instead of, or in addition to, auditory feedback, system 100 can give visual feedback, e.g., a green blinking light if system 100 is capable of processing the voice input, and a red light if it is not.
  • system 100 preferably pronounces, in a pre-recorded or synthetic voice, the name of the artist and the song title or album title of the content selected for being played out.
  • The synthetic voice uses a text-to-speech engine for this feature, so the system can use the information that becomes available from the download or the media carrier.
  • Text-to-Speech (TTS) systems convert words from a computer document (e.g., a word processor document, a web page) into audible speech through a loudspeaker.
  • the words are stored together with their phonetic transcription, comprising intonation of carrier sentences, etc.
  • control data 110 comprises pre-recorded or synthetic voice data explaining to the user which commands, e.g., which song keywords, are available.
  • the pre-recorded or synthetic voice data can again be part of control data 110.
  • The user should be able to turn this feature on or off, for the case where he/she does not want the system to provide auditory feedback.
  • Fig. 2 is a diagram illustrating a system 200 with an EPG, wherein available content information is identified and arranged in rows 202 and columns 204 on a display monitor 206. For example, each respective row represents a respective TV channel and each of the columns represents a specific time slot.
  • a label or title 212 is shown that represents the content available from that specific channel and in that particular time slot.
  • Other types of arrangements can be used instead, e.g., by topical category and time, or ranked by user-preference according to a profile per channel or resource (e.g., on the Internet), etc.
  • the user can browse the EPG by, e.g., moving a window 214 across the grid of the EPG through a suitable user-interface (e.g., arrow keys on a wireless keyboard or another directional device, not shown) in order to get the portion of the EPG displayed that falls within the boundaries of window 214.
  • the user can thereupon select particular content information by clicking or highlighting the associated label in the portion displayed.
  • an EPG is supplied via the Internet by a service provider.
  • the EPG is enhanced with additional control software 216 that enables a mode of user-interaction with the EPG other than the conventional clicking or highlighting of a desired label.
  • Control software 216 is preferably downloaded, updated or refreshed together with the EPG.
  • Control software 216 comprises control information 218 associated with the semantics of the labels that identify the programs in the EPG for user-selection.
  • The EPG's grid is re-organized to show only the available programs in the category "movie" in window 214, or the movie programs are graphically represented as distinct from programs in the other categories.
  • the user browses through the category "movies", preferably also under speech command.
  • the user sees the movie of his/her liking and enters as voice input the expression "The Magnificent Six and Okke", the title indicated in the EPG of the classic movie about an aviation event.
  • The user enters "tonight" and "from eight o'clock", whereupon window 214 is positioned to show, at least partly, the collection of programs available that day from eight o'clock (8:00 pm) on.
  • The user has identified an interesting program in the portion of the EPG displayed in window 214 and speaks the words, representative of the title of the program, into microphone 220. Then, the user speaks "watch" or "record". The words that represent the title are converted into a suitable format for comparison with control information 218.
  • the control software 216 enables a microprocessor 222 to control a tuner 224 and display monitor 206 or a recording device 226. In this manner, the user can interact with the EPG using voice control.
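In its simplest form, the EPG interaction above reduces to matching the spoken words against the program labels in the grid; the channels, time slots, titles, and return format here are invented for illustration:

```python
# Hypothetical EPG grid: (channel, time slot) -> program title.
EPG = {
    ("BBC1", "20:00"): "The Magnificent Six and Okke",
    ("BBC1", "21:30"): "Evening News",
    ("RTL4", "20:00"): "Car Chase Classics",
}

def select_program(spoken_title):
    """Match the spoken words against EPG labels; on a hit, return the
    channel and slot to tune to (for watching) or to record from."""
    wanted = spoken_title.lower()
    for (channel, slot), title in EPG.items():
        if wanted in title.lower():
            return {"channel": channel, "slot": slot, "title": title}
    return None
```

A subsequent "watch" or "record" command would then route the returned channel/slot to the tuner or the recording device, respectively.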


Abstract

The invention relates to a method of voice control of the play-out or other processing of video or audio content information, which comprises using voice commands that have a semantic relationship with said content information.
EP01940369A 2000-05-03 2001-04-26 Voice commands based on the semantics of content information Withdrawn EP1281173A1 (fr)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US20148800P 2000-05-03 2000-05-03
US201488P 2000-05-03
US62152200A 2000-07-21 2000-07-21
US621522 2000-07-21
PCT/EP2001/004714 WO2001084539A1 (fr) 2000-05-03 2001-04-26 Voice commands based on the semantics of content information

Publications (1)

Publication Number Publication Date
EP1281173A1 true EP1281173A1 (fr) 2003-02-05

Family

ID=26896795

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01940369A Withdrawn EP1281173A1 (fr) 2000-05-03 2001-04-26 Commandes vocales basees sur la semantique d'informations de contenus

Country Status (5)

Country Link
EP (1) EP1281173A1 (fr)
JP (1) JP2003532164A (fr)
KR (1) KR20020027382A (fr)
CN (1) CN1193343C (fr)
WO (1) WO2001084539A1 (fr)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1259069A1 (fr) * 2001-05-15 2002-11-20 Deutsche Thomson-Brandt Gmbh Méthode pour modifier l'interface utilisateur d'un appareil d'électronique grand public, appareil électronique grand public correspondant
US7324947B2 (en) 2001-10-03 2008-01-29 Promptu Systems Corporation Global speech user interface
US20040176959A1 (en) * 2003-03-05 2004-09-09 Wilhelm Andrew L. System and method for voice-enabling audio compact disc players via descriptive voice commands
GB2402507A (en) * 2003-06-03 2004-12-08 Canon Kk A user input interpreter and a method of interpreting user input
EP1686796A1 (fr) * 2005-01-05 2006-08-02 Alcatel Guide électronique de programmes présenté par un avatar avec une tête parlante et une voix synthétique
EP1708395A3 (fr) * 2005-03-31 2011-11-23 Yamaha Corporation Appareil de contrôle pour un système de musique constitué de plusieurs appareils reliés entre eux par un réseau, ainsi que logiciel d'ordinateur intégré pour le contrôle du système de musique
JP4655722B2 (ja) * 2005-03-31 2011-03-23 ヤマハ株式会社 ネットワーク接続された複数の機器の動作及び接続設定のための統合プログラム
KR20130140423A (ko) * 2012-06-14 2013-12-24 삼성전자주식회사 디스플레이 장치, 대화형 서버 및 응답 정보 제공 방법
DK2933796T3 (en) * 2014-04-17 2019-01-21 Softbank Robotics Europe EXECUTION OF SOFTWARE APPLICATIONS ON A ROBOT
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10609454B2 (en) 2015-07-31 2020-03-31 Promptu Systems Corporation Natural language navigation and assisted viewing of indexed audio video streams, notably sports contests
US20170127150A1 (en) * 2015-11-04 2017-05-04 Ubitus Inc. Interactive applications implemented in video streams
CN107871500B (zh) * 2017-11-16 2021-07-20 百度在线网络技术(北京)有限公司 一种播放多媒体的方法和装置
US11140450B2 (en) 2017-11-28 2021-10-05 Rovi Guides, Inc. Methods and systems for recommending content in context of a conversation
CN110880321B (zh) * 2019-10-18 2024-05-10 平安科技(深圳)有限公司 基于语音的智能刹车方法、装置、设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5774859A (en) * 1995-01-03 1998-06-30 Scientific-Atlanta, Inc. Information system having a speech interface
US6643620B1 (en) * 1999-03-15 2003-11-04 Matsushita Electric Industrial Co., Ltd. Voice activated controller for recording and retrieving audio/video programs
US6553345B1 (en) * 1999-08-26 2003-04-22 Matsushita Electric Industrial Co., Ltd. Universal remote control allowing natural language modality for television and multimedia searches and requests

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Data Retrieval through a Compact Disk Device having a Speech-Driven Interface", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 38, no. 1, January 1995 (1995-01-01), pages 267 - 268, XP000498766 *

Also Published As

Publication number Publication date
JP2003532164A (ja) 2003-10-28
CN1193343C (zh) 2005-03-16
CN1381039A (zh) 2002-11-20
WO2001084539A1 (fr) 2001-11-08
KR20020027382A (ko) 2002-04-13

Similar Documents

Publication Publication Date Title
US10956006B2 (en) Intelligent automated assistant in a media environment
JP3577454B2 (ja) 記録されたテレビジョン放送についての情報を記憶するための機構
US20090076821A1 (en) Method and apparatus to control operation of a playback device
US7684991B2 (en) Digital audio file search method and apparatus using text-to-speech processing
JP3554262B2 (ja) テレビジョン及びマルチメディアの検索及び要求に対して自然言語のモダリティーを可能にする汎用遠隔制御
US9153233B2 (en) Voice-controlled selection of media files utilizing phonetic data
EP1693830B1 (fr) Système de données à commande vocale
US8106285B2 (en) Speech-driven selection of an audio file
US6643620B1 (en) Voice activated controller for recording and retrieving audio/video programs
US7870142B2 (en) Text to grammar enhancements for media files
US20090326953A1 (en) Method of accessing cultural resources or digital contents, such as text, video, audio and web pages by voice recognition with any type of programmable device without the use of the hands or any physical apparatus.
US20150032453A1 (en) Systems and methods for providing information discovery and retrieval
US20040266337A1 (en) Method and apparatus for synchronizing lyrics
WO2001084539A1 (fr) Commandes vocales basees sur la semantique d'informations de contenus
JPH09185879A (ja) レコーディング・インデクシング方法
JP2005539254A (ja) 音声認識を利用したメディアファイルのアクセスおよび検索におけるシステムと方法
KR20100005177A (ko) 맞춤형 학습 시스템, 맞춤형 학습 방법, 및 학습기
US6741791B1 (en) Using speech to select a position in a program
JP2002189483A (ja) 音声入力式楽曲検索システム
US20240134506A1 (en) Intelligent automated assistant in a media environment
KR20080065205A (ko) 맞춤형 학습 시스템, 맞춤형 학습 방법, 및 학습기
Laia et al. Designed for Enablement or Disabled by Design? Choosing the Path to Effective Speech Application Design

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20021203

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RTI1 Title (correction)

Free format text: VOICE COMMANDS DEPENDENT ON CONTENT INFORMATION SEMANTICS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20070817