WO2001015140A1 - Speech recognition system for data entry - Google Patents

Speech recognition system for data entry

Info

Publication number
WO2001015140A1
Authority
WO
WIPO (PCT)
Prior art keywords
textual output
output message
index
user
entry
Application number
PCT/CA2000/000776
Other languages
English (en)
Inventor
Alexei B. Machovikov
Kirill V. Stolyarov
Maxim A. Chernoff
Original Assignee
Telum Canada, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1999-07-01
Filing date
2000-07-04
Publication date
2001-03-01
Application filed by Telum Canada, Inc.
Priority to AU56683/00A (AU5668300A)
Priority to CA002342787A (CA2342787A1)
Publication of WO2001015140A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/226 Validation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • Speech recognition systems also suffer from the disadvantage that they must be trained by each user for the vocabulary to be recognized, which can require significant time and effort. Further, results can fall short of what is desired because of factors such as background noise and poor enunciation by the user.
  • A data entry system comprising: a speech recognition engine operable to receive speech and to recognize a search phrase therein; a database engine in communication with the speech recognition engine, the database engine including an index against which said recognized search phrase is applied to identify a corresponding index entry, each index entry having at least one textual output message defined therefor; a user terminal in communication with the database engine, the user terminal (24) including a display device for displaying said at least one textual output message corresponding to said identified index entry, and a user input device for receiving a user input representing an approval and/or a completion of said displayed textual output message, the database engine (40) being configured for outputting said approved and/or completed textual output message upon receipt of said user input.
  • A method of performing data entry comprising the steps of:
  • Figure 1 shows a schematic representation of the data entry system in accordance with the present invention.
  • System 20 includes a data entry terminal 24 which can be any suitable data entry terminal such as a VT-100 or other "dumb terminal" or a personal computer. As shown, terminal 24 includes a keyboard and a display. Data input by a user of system 20 is passed to a data processing system 28, as discussed in more detail below.
  • Data processing system 28 can be any computer-implemented system requiring data input, such as an order entry system or an inventory control system; in a preferred embodiment of the present invention, it is a wireless paging network.
  • System 20 also includes a microphone 32 which, in a preferred embodiment of the invention, is the mouthpiece of a telephone headset or handset but which can be any suitable microphone or other mechanism for capturing the voice of a user.
  • Microphone 32 is connected to a speech recognition engine 36 which can be any appropriate speech recognition system.
  • Speech recognition engine 36 can employ Hidden Markov Models (HMM) or other known algorithms to recognize speech, and can be implemented in dedicated hardware or as an application running on a general-purpose personal computer with adequate memory and processing capacity.
  • HMM: Hidden Markov Models
  • The output of speech recognition engine 36 is applied to database engine 40, which can be any suitable database, such as those sold by Oracle or a Microsoft Access database. As described below in more detail, database engine 40 maintains at least one table relating predefined recognized phrases to corresponding textual message outputs. Before they are output to data processing system 28, the textual message outputs selected by database engine 40 can be reviewed, approved, amended, or modified from user terminal 24, or alternative textual message outputs can be selected from user terminal 24.
  • A user defines a set of textual output messages of interest. These messages are selected as text strings that the user will commonly use, and can be represented in any desired language or character set, including multi-byte Unicode character sets and/or ideographic character sets.
  • Examples of textual output messages of interest and their corresponding index phrases can include:
    • Index phrase: "Cell number is"; textual output message: "I can be reached at my cellular and the number is"
  • Textual output messages can be added to, amended in, or deleted from database engine 40 by users as desired.
  • Speech recognition engine 36 need not be extremely sophisticated. In fact, it is contemplated that in some circumstances speech recognition engine 36 may not require training for each individual user and yet can provide acceptably accurate recognition of index entries.
  • A paging operator (i.e., a user)
  • Microphone 32 can either be an additional microphone into which the operator can speak when desired, or can be the mouthpiece of an otherwise conventional telephone headset or handset.
  • A switch (not shown) allows the operator to speak either so that the person at the other end of the telephone (the caller) hears the operator, or so that both the caller and speech recognition engine 36 "hear" the operator.
  • Speech recognition engine 36 analyzes the speech it has heard and provides the output of its analysis, as a search input, to database engine 40.
  • Database engine 40 compares the received search input to the index entries in its table or tables and selects the appropriate table entry.
  • In this example, the corresponding textual output string, "For flight arrival information, call 555-1212. Please pick me up at the airport at", is selected by database engine 40 and displayed on user terminal 24 for approval and/or completion by the operator.
  • The operator verifies that the correct textual output message has been identified and completes it by entering the text "5:00PM", representing variable information, in a conventional manner such as with the keyboard.
  • Index entries and output textual messages in database engine 40 can be in different languages.
  • The index entries in database engine 40 can be in English (in any suitable form, such as textual or phonetic) and the corresponding textual output messages can be in Unicode Mandarin Chinese. In this manner, an operator speaking with an English-language caller can create output messages in Mandarin Chinese.
  • If variable completion information is required, it can be selected from a list of appropriate choices displayed to the operator in English; once a selection is made, database engine 40 completes the textual output message with the predefined corresponding Mandarin Chinese text.
  • Database engine 40 can include multiple textual output messages, arranged by language of interest, for each index entry.
  • The textual output messages displayed to the operator on user terminal 24 for approval and/or completion are in a language selected by the operator, who can, once the message is completed and/or approved, indicate in which of the available languages it is to be input to data processing system 28.
  • The present invention provides an efficient real-time data entry system in which user speech is analyzed to extract a search phrase.
  • This search phrase is used to search an index to locate an index entry for which one or more textual output phrases have been defined.
  • A corresponding textual output message is presented to the user for approval and/or completion and is then provided as input to a data processing system, such as a paging system.
  • Where more than one textual output message is defined for an index entry, the user can select the desired textual output message.
  • The corresponding textual output messages can include additional information and defined fields to be completed by the user, and/or can be in a different language from the index entry.
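The database-engine behaviour described above (an index of phrases, one or more textual output messages per index entry, per-language variants, and operator-supplied variable completion) can be illustrated with a minimal in-memory sketch. It is not the patent's implementation: the names `MessageIndex`, `IndexEntry` and `complete` are hypothetical, and the fuzzy fallback using Python's `difflib.get_close_matches` is an assumption, chosen to show how a modest recognizer might still land on the right entry of a small index; the source does not say how the comparison is performed. The "airport pickup" index phrase and the Chinese strings are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import get_close_matches


@dataclass
class IndexEntry:
    """An index phrase plus its textual output messages, keyed by language code."""
    phrase: str
    messages: dict[str, str] = field(default_factory=dict)


class MessageIndex:
    """Hypothetical in-memory stand-in for the table kept by database engine 40."""

    def __init__(self) -> None:
        # Normalized phrase -> entry.
        self._entries: dict[str, IndexEntry] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Case-fold and collapse runs of whitespace before comparing phrases.
        return " ".join(text.casefold().split())

    def add(self, phrase: str, language: str, message: str) -> None:
        """Add an entry, or add/amend one language variant of an existing entry."""
        key = self._normalize(phrase)
        entry = self._entries.setdefault(key, IndexEntry(phrase))
        entry.messages[language] = message

    def delete(self, phrase: str) -> None:
        """Remove an index entry and all of its messages (no error if absent)."""
        self._entries.pop(self._normalize(phrase), None)

    def lookup(self, search_phrase: str, cutoff: float = 0.8) -> IndexEntry | None:
        """Exact match first, else the single closest phrase by difflib ratio."""
        key = self._normalize(search_phrase)
        if key in self._entries:
            return self._entries[key]
        best = get_close_matches(key, list(self._entries), n=1, cutoff=cutoff)
        return self._entries[best[0]] if best else None


def complete(
    entry: IndexEntry,
    language: str,
    choice: str | None = None,
    choices: dict[str, dict[str, str]] | None = None,
) -> str:
    """Return the message in `language`, optionally completed with a variable choice.

    `choices` maps a choice shown to the operator (e.g. "5:00PM") to predefined
    text per output language; unknown choices are inserted verbatim.
    """
    message = entry.messages[language]
    if choice is None:
        return message
    filler = (choices or {}).get(choice, {}).get(language, choice)
    return f"{message} {filler}"


if __name__ == "__main__":
    index = MessageIndex()
    index.add("Cell number is", "en",
              "I can be reached at my cellular and the number is")
    # Hypothetical index phrase and translation for the airport example.
    index.add("airport pickup", "en",
              "For flight arrival information, call 555-1212. "
              "Please pick me up at the airport at")
    index.add("airport pickup", "zh", "请在以下时间到机场接我：")
    times = {"5:00PM": {"en": "5:00PM", "zh": "下午5点"}}

    entry = index.lookup("Airport pick up")  # slightly misrecognized phrase
    if entry is not None:
        print(complete(entry, "en", "5:00PM", times))
        print(complete(entry, "zh", "5:00PM", times))
```

A real deployment would keep these rows in the database engine itself (the source mentions Oracle or Microsoft Access); the dictionary here only mirrors the one-to-many relation between an index entry and its language-specific messages.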

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a speech recognition method and system for data entry comprising a speech recognition engine (36) for examining a user's speech and recognizing a search phrase. The recognized search phrase is then applied to an index in a database engine (40) to locate an appropriate index entry and corresponding textual output messages. A corresponding textual output message is presented to the user via a user terminal (24) for approval and/or completion, and the completed and approved output message is then provided as input to a data processing system (28), such as a paging system. Several corresponding textual messages can be provided for an index entry, in particular in several languages and/or character sets, in which case the user selects the desired available message.
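The flow summarized in the abstract can be sketched as a small pipeline in which nothing reaches data processing system 28 until the operator has approved and/or completed it at the user terminal. This is an illustrative sketch, not the patented implementation: the `data_entry` function, the `Draft` class and the exact-match dictionary index are assumptions, and the four callables are hypothetical stand-ins for speech recognition engine 36, database engine 40, user terminal 24 and data processing system 28. No real speech recognition or paging API is implied.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    """A textual output message shown to the operator for approval/completion."""
    search_phrase: str
    text: str


def data_entry(
    audio: bytes,
    recognize: Callable[[bytes], str],         # stands in for speech recognition engine 36
    index: dict[str, str],                     # stands in for database engine 40 (keys case-folded)
    review: Callable[[Draft], Optional[str]],  # stands in for user terminal 24
    submit: Callable[[str], None],             # stands in for data processing system 28
) -> bool:
    """Process one utterance; return True only if an approved message was submitted."""
    # 1. Recognize a search phrase in the user's speech (normalized for lookup).
    search_phrase = " ".join(recognize(audio).casefold().split())
    # 2. Apply the search phrase to the index to locate an entry.
    message = index.get(search_phrase)
    if message is None:
        return False  # no index entry located: nothing is displayed or submitted
    # 3. Present the message for approval and/or completion; None means rejected.
    approved = review(Draft(search_phrase, message))
    if approved is None:
        return False
    # 4. Only the approved/completed text reaches the data processing system.
    submit(approved)
    return True


if __name__ == "__main__":
    pager_queue: list[str] = []
    sent = data_entry(
        audio=b"",                                      # placeholder audio
        recognize=lambda audio: "Cell number is",       # fake recognizer for the demo
        index={"cell number is": "I can be reached at my cellular and the number is"},
        review=lambda draft: f"{draft.text} 555-0100",  # operator appends a number
        submit=pager_queue.append,
    )
    print(sent, pager_queue)
```

Passing the components in as callables keeps the approval gate explicit: the submit step is only reachable after the review step returns text.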
PCT/CA2000/000776 1999-07-01 2000-07-04 Speech recognition system for data entry WO2001015140A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU56683/00A AU5668300A (en) 1999-07-01 2000-07-04 Speech recognition system for data entry
CA002342787A CA2342787A1 (fr) 1999-07-01 2000-07-04 Speech recognition system for data entry

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14183999P 1999-07-01 1999-07-01
US60/141,839 1999-07-01

Publications (1)

Publication Number Publication Date
WO2001015140A1 (fr) 2001-03-01

Family

ID=22497496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2000/000776 WO2001015140A1 (fr) 1999-07-01 2000-07-04 Speech recognition system for data entry

Country Status (3)

Country Link
AU (1) AU5668300A (fr)
CA (1) CA2342787A1 (fr)
WO (1) WO2001015140A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5500920A (en) * 1993-09-23 1996-03-19 Xerox Corporation Semantic co-occurrence filtering for speech recognition and signal transcription applications
US5724526A (en) * 1994-12-27 1998-03-03 Sharp Kabushiki Kaisha Electronic interpreting machine
US5758318A (en) * 1993-09-20 1998-05-26 Fujitsu Limited Speech recognition apparatus having means for delaying output of recognition result
WO1999003092A2 (fr) * 1997-07-07 1999-01-21 Motorola Inc. Modular voice recognition system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KONDO K ET AL: "SURFIN' THE WORLD WIDE WEB WITH JAPANESE", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP '97), 21 April 1997 (1997-04-21), IEEE COMP. SOC. PRESS, Los Alamitos, US, pages 1151 - 1154, XP000822656, ISBN: 0-8186-7920-4 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002077975A1 (fr) * 2001-03-27 2002-10-03 Koninklijke Philips Electronics N.V. Method for selecting and transmitting alphabetic messages via a mobile
US6934552B2 (en) 2001-03-27 2005-08-23 Koninklijke Philips Electronics, N.V. Method to select and send text messages with a mobile
EP1315098A1 (fr) * 2001-11-27 2003-05-28 Telefonaktiebolaget L M Ericsson (Publ) Searching voice messages
EP1361740A1 (fr) * 2002-05-08 2003-11-12 Sap Ag Method and system for processing the speech information of a dialogue
EP1361736A1 (fr) * 2002-05-08 2003-11-12 Sap Ag Method for recognizing speech information
EP1361737A1 (fr) * 2002-05-08 2003-11-12 Sap Ag Method and system for speech signal processing and dialogue classification
EP1361738A1 (fr) * 2002-05-08 2003-11-12 Sap Ag Method and system for processing a speech signal using speech recognition and frequency analysis
EP1361739A1 (fr) * 2002-05-08 2003-11-12 Sap Ag Method and system for processing a speech signal after language recognition
EP1363271A1 (fr) * 2002-05-08 2003-11-19 Sap Ag Method and system for processing and storing the speech signal of a dialogue
EP2279508A2 (fr) * 2008-04-23 2011-02-02 nVoq Incorporated Methods and systems for measuring user performance with speech-to-text conversion for dictation systems
EP2279508A4 (fr) * 2008-04-23 2012-08-29 Nvoq Inc Methods and systems for measuring user performance with speech-to-text conversion for dictation systems

Also Published As

Publication number Publication date
CA2342787A1 (fr) 2001-03-01
AU5668300A (en) 2001-03-19

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

ENP Entry into the national phase

Ref document number: 2342787

Country of ref document: CA

Kind code of ref document: A

Ref document number: 2342787

Country of ref document: CA

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry into the European phase
NENP Non-entry into the national phase

Ref country code: JP