WO1992000586A1 - Keyword-based speaker selector - Google Patents


Info

Publication number
WO1992000586A1
Authority
WO
WIPO (PCT)
Prior art keywords
users
keyword
templates
uniquely identified
template
Prior art date
Application number
PCT/US1991/004327
Other languages
English (en)
Inventor
Paul F. Smith
Kamyar Rohani
Mark R. Harrison
Original Assignee
Motorola, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola, Inc.
Publication of WO1992000586A1


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/06 — Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 — Adaptation
    • G10L15/07 — Adaptation to the speaker

Definitions

  • This invention relates generally to speech recognition, and more particularly to speaker dependent speech recognition systems, and is particularly directed toward providing a time-efficient and accurate method of template selection based on keyword recognition.
  • A speech recognition control system, responsive to the human voice, is highly desirable in automotive applications.
  • Most mobile radio transceiver functions (e.g., on/off, transmit/receive, volume, squelch, changing channels) and mobile radio telephone control functions (e.g., push-button dialing, speech recognizer training, telephone call answering) lend themselves to voice control.
  • Speech recognition has the potential to provide a totally hands-free telephone conversation, without ever requiring the automobile driver to remove his or her hands from the steering wheel or eyes from the road.
  • This adds to the safety and convenience of using mobile radio telephones in vehicles.
  • Speaker dependent technology utilizes pre-stored voice templates as references to recognize the voice of a particular individual and to perform specified functions relating to a predetermined set of recognized command words.
  • A template is a time-ordered set of features that characterize the behavior of the speech signal for a particular speaker.
  • Speaker dependent technology requires that the device be programmed (or trained) to recognize each individual operator. Training is commonly understood to be the process by which an individual repeats a predetermined set of command words a sufficient number of times for an acceptable template to be formed.
  • A word recognizer recognizes a word command by extracting features which adequately represent the utterance and deciding whether these features meet some distance criterion to match a particular template out of the set of pre-stored templates. These templates correspond to the set of pre-stored features representing the command words to be recognized.
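  • The decision rule described above can be sketched in Python as follows, assuming templates are stored as per-frame feature arrays and using a simple Euclidean frame distance (all names and the threshold value are illustrative, not from the patent):

```python
import numpy as np

def match_template(features, templates, threshold=10.0):
    """Return the label of the closest pre-stored template, or None if
    no template satisfies the distance criterion (illustrative sketch).

    features:  2-D array of per-frame feature vectors for the utterance.
    templates: dict mapping a command-word label to its template array.
    """
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        # Compare frame-by-frame over the overlapping length; a real
        # recognizer would time-align the sequences instead.
        n = min(len(features), len(template))
        dist = float(np.linalg.norm(features[:n] - template[:n])) / n
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist < threshold else None
```

The threshold rejects utterances that are not close enough to any stored word, so out-of-vocabulary speech is not forced onto a template.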
  • The speaker dependent word recognizer is then designed to recognize the command words of users by comparing the utterance to pre-stored voice templates which contain the voice features of those users.
  • The operator or speaker must first "log in": manipulate or adjust one or more control knobs or buttons to enter an identification code, or otherwise inform the recognizer of who the operator is, so that the recognizer can reference the voice templates generated when the operator initially trained the system to his or her voice.
  • This "logging in" procedure is cumbersome but necessary in prior art systems, since the templates are stored in direct association with each user's identification code.
  • One reason for the logging-in procedure is to preclude an exhaustive search of all the templates to match a particular speaker with the speaker's voice. Another is to improve accuracy by knowing who the speaker is before matching his or her voice command.
  • The logging-in procedure is nevertheless inefficient, since one purpose of voice control of a two-way mobile radio is to alleviate the need to divert a driver's attention from operating the vehicle to manipulate or adjust knobs on the radio.
  • The procedure is also cumbersome: it forces the operator to remember another number, for example, which car the operator is in, what batch number he or she has, and which user the operator is. This detracts from the main purpose of using voice control in the first place, which is to improve usability.
  • A method for recognizing an utterance of a voice command sequence having a keyword spoken at the beginning of the sequence includes storing a plurality of templates, each template uniquely identified with one user. At least one spoken keyword uniquely identified with one of the users is received. The method determines which particular trained user spoke the keyword and selects a subset of the templates uniquely identified with this particular user to provide a set of recognizable commands for subsequent utterances of the voice command sequences.
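  • The overall flow of that method can be sketched as follows, assuming templates are held per user in a dictionary and each user's keyword template is stored under an assumed "_keyword" key (both conventions are illustrative, not from the patent):

```python
def select_user_templates(keyword_features, all_templates, match_fn):
    """Identify the speaker from the spoken keyword, then return only
    that user's command templates for subsequent recognition.

    all_templates: {user: {"_keyword": template, command_word: template, ...}}
    match_fn:      a recognizer mapping (features, {label: template})
                   to the best-matching label, or None.
    """
    # Compare the keyword utterance against every user's keyword template.
    keyword_templates = {u: t["_keyword"] for u, t in all_templates.items()}
    user = match_fn(keyword_features, keyword_templates)
    if user is None:
        return None, {}  # keyword not recognized; no templates selected
    # Select the subset of templates uniquely identified with this user.
    commands = {w: t for w, t in all_templates[user].items() if w != "_keyword"}
    return user, commands
```

Subsequent command utterances are then matched only against the returned subset, which is the efficiency gain the method claims over an exhaustive search.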
  • The determining step comprises comparing the received spoken keyword with a portion of the set of templates.
  • Each template of this portion is uniquely identified with one user, the spoken keyword being a unique word distinct for each of the users.
  • Alternatively, the spoken keyword may be a single word characteristically spoken by each of the users.
  • FIG. 1 is a block diagram of a communication device in accordance with the present invention.
  • FIG. 2 is a flow diagram illustrating the operation of the communication device of FIG. 1 in accordance with the present invention.
  • The invention of selecting a set of templates based on speaker identification from an utterance can be applied in many contexts.
  • The following mobile radio application is just one example of the various applications possible.
  • A communication device 1000 is illustrated in block diagram form.
  • The communication device 1000 may comprise a land mobile two-way radio frequency communication device, such as the SYNTOR X 9000 series radio manufactured by Motorola, Inc., but the present invention need not be limited thereto.
  • The communication device 1000 includes a word recognizer 100 which utilizes the principles of the present invention to determine who a speaker is. A keyword is recognized by matching it to that particular speaker's corresponding template (using distance criteria to find the best match), before word commands (such as "change channel") are similarly processed. For subsequent word commands, the search for a matching template is conducted using only the templates associated with this particular speaker.
  • A voice command sequence, having a keyword at its beginning to represent who the speaker is, followed by the command words "change channel", is uttered by the user.
  • This utterance is received by a microphone 102.
  • The analog representation of the utterance from the microphone 102 is filtered, sampled, and digitized by the Codec 106.
  • The digitized utterance is sent to the digital signal processor (DSP) 120, which performs the speech recognition function, providing recognition results to the controller 160.
  • The DSP 120 also may send digitized audio (synthesized speech or other sounds) to the Codec 106 to provide audible feedback or error messages, as dictated by inputs from the controller 160.
  • The Codec 106 converts the digital representation of the audio back to an analog representation, filters the analog signal, and drives the speaker 170.
  • The controller 160 takes input from a keyboard 110 and recognition results from the DSP 120. Based upon these inputs, it controls the operation of the radio 104 (in this example, changing the radio channel).
  • The controller 160 may be any suitable microprocessor, microcomputer, or microcontroller, and preferably comprises an MC68HC11 (or its functional equivalent), manufactured by Motorola, Inc. Note that the functions of the controller 160 may be incorporated into the DSP 120 if desired. Generally, the controller 160 requires additional RAM and ROM beyond what is included on-chip; block 150 provides this additional memory.
  • The DSP 120 may be of any suitable type, such as the 56000 family of DSPs, manufactured by Motorola, Inc. Generally, the DSP 120 too may require additional RAM and ROM over what is included on the DSP 120; this is also provided in block 150.
  • The Codec 106 may be internal to the DSP 120 and incorporated in a single block. The DSP 120 also uses the RAM for temporary data storage and the ROM for program storage.
  • Block 140 provides electrically erasable programmable read only memory (EEPROM), which is used mostly to store the recognition templates.
  • The speaker dependent word recognizer 100 recognizes words by comparing them to pre-stored reference templates which contain "extracted features" of the recognizable words, spoken by each user.
  • Extracted features are representations of digitized words (or utterances) which are thought to contain the essential characteristics of the speech.
  • The process of feature extraction is well known in the art; examples may be found in G. White, R. Neely, "Speech Recognition Experiments with Linear Prediction, Band Pass Filtering, and Dynamic Programming", IEEE ASSP, Vol. ASSP-23, No. 2, April 1976, which is incorporated herein by reference.
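  • As a rough illustration of the style of features such systems extract (band-pass filter-bank energies are one classic choice; this sketch is not a reproduction of the cited method, and all parameter values are assumptions):

```python
import numpy as np

def filterbank_features(signal, frame_len=200, n_bands=8):
    """Per-frame log energies in n_bands equal-width FFT bands: a crude
    band-pass filter-bank feature vector, for illustration only."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        # Magnitude spectrum of one non-overlapping frame.
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame_len]))
        bands = np.array_split(spectrum, n_bands)
        # Log energy per band; the small constant avoids log(0).
        frames.append([np.log(np.sum(b ** 2) + 1e-10) for b in bands])
    return np.array(frames)
```

Each row of the result is one frame's feature vector, and a template is simply a stored sequence of such rows for a trained word.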
  • Training is known to be the process by which the individual user repeats a predetermined set of reference words or utterances a sufficient number of times, until an acceptable number of their voice features are extracted and stored.
  • The word recognizer 100 comprises a speaker dependent word recognizer and provides two modes of operation: a training mode and a recognition mode.
  • a control panel 110 coupled to the controller 160 includes buttons 114 and 116 for selecting the desired mode.
  • The system 100 must be notified which user is being trained.
  • The control panel 110 also includes buttons 112 for each user to be trained.
  • The extracted features of reference commands or keywords may be stored in an erasable memory means 140, such as any suitable EEPROM.
  • This invention eliminates the use of these user buttons 112 in the recognition mode.
  • The word recognizer 100 is designed such that, in the recognition mode, the user buttons need not be pressed to identify the individual user, since voice recognition of the speaker is done automatically.
  • Step 208 is the initialization step, in which the DSP 120 resets internal variables used in the recognition process.
  • In step 210, the DSP extracts the features from the digitized input utterance provided by the Codec 106. (Note that step 210 need not be completed before the following steps in the flow chart; in fact, prior art systems often extract features from a new utterance while the features of a previous utterance are being recognized.)
  • Steps 220 and 230 are the steps critical to the present invention. If the utterance (a spoken word or phrase) is the first in the command sequence, the user will be uttering the keyword.
  • The system then uses this keyword to determine which user is currently speaking, in step 230.
  • Methods of identifying people from their speech are well known in the art. Examples of this speaker identification may be found in Naik, "Speaker Verification: A Tutorial," IEEE Communications Magazine, pages
  • Step 240 compares the features from the input utterance to those templates attributed to the identified user, determining distances between each template and the input utterance. Since the utterances will often be of different lengths, the distance determination method must include some means of aligning them at equivalent points in time.
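  • One standard way of meeting this alignment requirement (the patent does not mandate a specific method) is dynamic time warping, sketched here:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences
    (rows are frames), normalized by the combined length. DTW aligns
    utterances of different lengths at equivalent points in time."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(a[i - 1]) - np.asarray(b[j - 1]))
            # Extend the cheapest of the three permitted alignments.
            cost[i, j] = d + min(cost[i - 1, j],      # skip a frame of a
                                 cost[i, j - 1],      # skip a frame of b
                                 cost[i - 1, j - 1])  # match frames
    return cost[n, m] / (n + m)
```

The normalization by combined length keeps distances comparable when templates differ in duration.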
  • Step 250 updates the total command sequence distances by adding the distances from the features of the input utterance to the templates identified with words that are included in the command sequence. In this way, entire command sequences may be compared to determine the best recognized sequence.
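  • A minimal sketch of that running-total bookkeeping, assuming candidate sequences are tuples of words and a position counter tracks how many utterances have been scored (the data layout is an assumption for illustration):

```python
def update_sequence_scores(scores, sequences, position, word_dists):
    """Add, to each candidate command sequence's running total, the
    distance of the word that sequence expects at this position. The
    lowest total at the end identifies the best recognized sequence."""
    for seq in sequences:
        if position < len(seq):
            scores[seq] = scores.get(seq, 0.0) + word_dists[seq[position]]
    return scores
```

After the final utterance, `min(scores, key=scores.get)` yields the recognized command sequence.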
  • Step 260 determines whether the command sequence is complete. If it is, the recognizer outputs the recognized sequence of command words in step 270. If not, the system returns to step 210 to examine the next utterance.
  • The present invention involves using the utterance of a keyword to determine who the user or speaker is.
  • Each user has a particular keyword associated with him or her, which was used to train the word recognizer 100 to his or her voice to form the reference templates.
  • Upon initialization, or turning on, of the mobile radio (208), the word recognizer 100 does not know who is about to use the mobile radio.
  • The user may be a previous speaker already recognized, or a new speaker.
  • The word recognizer 100 must therefore initially be responsive to all of the keywords spoken by the different speakers in step 230.
  • When an utterance is detected (210), the recognizer determines (230) whether it is sufficiently close to one of the limited number of keywords it was trained on, and thus who spoke the utterance. Particular keywords may be selected for their distinctive phonetic content to increase the reliability of the recognizer.
  • A second preferred embodiment uses a single keyword that has been trained by all the users to form the reference templates. Upon utterance of this single keyword, the recognizer 100 performs the speaker identification (speaker recognition) of step 230, as previously described, to determine who the speaker is. The advantage of this embodiment is that only one keyword needs to be matched.
  • The word recognizer 100 may determine whether all the users naturally say the particular keyword acoustically differently from one another, so that the single keyword will be discernible as spoken by each speaker.
  • The keyword may ideally be chosen because it accentuates the different acoustic characteristics of different speakers speaking the same word.
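  • Such a discernibility check might be sketched as a pairwise separation test over the users' stored renditions of the shared keyword (the margin and the distance function are assumptions, not taken from the patent):

```python
def keyword_discernible(keyword_templates, dist_fn, margin=1.0):
    """True if every pair of users' templates for the shared keyword
    are at least `margin` apart under dist_fn, i.e. the word is spoken
    acoustically differently enough to tell the speakers apart."""
    users = list(keyword_templates)
    for i in range(len(users)):
        for j in range(i + 1, len(users)):
            d = dist_fn(keyword_templates[users[i]],
                        keyword_templates[users[j]])
            if d < margin:
                return False
    return True
```

If the check fails, the system could fall back to per-user keywords, as in the first embodiment.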
  • The identification of the speaker is automatic (230) and requires no additional effort by the operator. This invention therefore enhances the ease of use that voice control systems strive for.

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A method for recognizing the utterance of a voice command sequence having a keyword spoken at the beginning of the sequence. A plurality of templates is stored, each template being uniquely identified with a single user. At least one spoken keyword, uniquely identified with one of the users, is received (220). The method determines (230) which particular user spoke the keyword and selects a subset of the templates identified with that particular user to provide a set of recognizable commands for subsequent utterances of the voice command sequences (270).
PCT/US1991/004327 1990-07-02 1991-06-17 Keyword-based speaker selector WO1992000586A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US54835990A 1990-07-02 1990-07-02
US548,359 1990-07-02

Publications (1)

Publication Number Publication Date
WO1992000586A1 (fr) 1992-01-09

Family

ID=24188525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/004327 WO1992000586A1 (fr) 1990-07-02 1991-06-17 Keyword-based speaker selector

Country Status (1)

Country Link
WO (1) WO1992000586A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2778782A1 (fr) * 1998-05-18 1999-11-19 Henri Benmussa "Multi-single-speaker" speech recognition system
WO2000045575A1 (fr) * 1999-01-28 2000-08-03 Telia Ab (Publ) Device and method for telecommunication systems
DE102004030054A1 (de) * 2004-06-22 2006-01-12 Bayerische Motoren Werke Ag Method for speaker-dependent speech recognition in a motor vehicle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4363102A (en) * 1981-03-27 1982-12-07 Bell Telephone Laboratories, Incorporated Speaker identification system using word recognition templates
US4590604A (en) * 1983-01-13 1986-05-20 Westinghouse Electric Corp. Voice-recognition elevator security system
US4827520A (en) * 1987-01-16 1989-05-02 Prince Corporation Voice actuated control system for use in a vehicle
US4922538A (en) * 1987-02-10 1990-05-01 British Telecommunications Public Limited Company Multi-user speech recognition system


Similar Documents

Publication Publication Date Title
US6839670B1 (en) Process for automatic control of one or more devices by voice commands or by real-time voice dialog and apparatus for carrying out this process
US6591237B2 (en) Keyword recognition system and method
US6671669B1 (en) combined engine system and method for voice recognition
US8639508B2 (en) User-specific confidence thresholds for speech recognition
KR100901092B1 Combining DTW and HMM in speaker-dependent and speaker-independent modes for speech recognition
EP0311414A2 Voice-controlled dialing device with memories for complete dialing for all users and abbreviated dialing for authorized persons
US20020091522A1 (en) System and method for hybrid voice recognition
EP1739546A2 (fr) Interface de véhicule automobile
US20060215821A1 (en) Voice nametag audio feedback for dialing a telephone call
US20020091515A1 (en) System and method for voice recognition in a distributed voice recognition system
US20070156405A1 (en) Speech recognition system
EP1994529B1 Communication device with speaker-independent speech recognition
EP1159735B1 Rejection scheme for a speech recognition system
EP0877518B1 Method of dialing a telephone number by voice commands and a telecommunication terminal controlled by voice commands
US7110948B1 (en) Method and a system for voice dialling
US20090138264A1 (en) Speech to dtmf generation
WO1992000586A1 (fr) Keyword-based speaker selector
WO2000022609A1 (fr) Speech recognition and control system and telephone
JP2003177788A Spoken dialogue system and method
KR100827074B1 Apparatus and method for automatic dialing in a mobile communication terminal
EP1160767A2 (fr) Reconnaissance de la parole à l'aide de probabilités d'hypothèses contextuelles
KR100395222B1 Speech recognition system for voice mail service (VMS)
KR19990081664A Speech recognition method for a voice recognition telephone
JPH06149287A Speech recognition device
JPH0580794A Speech recognition device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE

NENP Non-entry into the national phase

Ref country code: CA