EP1518221A1 - Method for natural speech recognition based on a generative transformation/phrase structure grammar - Google Patents

Method for natural speech recognition based on a generative transformation/phrase structure grammar

Info

Publication number
EP1518221A1
EP1518221A1 EP03761435A EP03761435A EP1518221A1 EP 1518221 A1 EP1518221 A1 EP 1518221A1 EP 03761435 A EP03761435 A EP 03761435A EP 03761435 A EP03761435 A EP 03761435A EP 1518221 A1 EP1518221 A1 EP 1518221A1
Authority
EP
European Patent Office
Prior art keywords
grammar
recognized
words
phrase
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP03761435A
Other languages
German (de)
English (en)
Inventor
Klaus Dieter Liedtke
Guntbert Markefka
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telekom Deutschland GmbH
Original Assignee
T Mobile Deutschland GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by T Mobile Deutschland GmbH
Publication of EP1518221A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99934 Query formulation, input preparation, or translation

Definitions

  • The invention relates to a method for natural speech recognition based on a generative transformation / phrase structure grammar (GT / PS grammar).
  • NLU: Natural Language Understanding
  • Speech recognition systems with natural speech recognition are able to understand a variety of possible utterances and convert them into complex command structures that cause the speech recognition system, e.g. a computer, to take certain actions. They do this on the basis of predefined, meaningful sample sentences, which are defined by application developers and so-called dialog designers.
  • This collection of sample sentences, also called a "grammar", includes individual command words as well as complicated nested sentences that make sense at a certain point in the dialog. If the user utters such a sentence, the system understands it with great certainty and executes the instruction associated with it.
  • The grammar is an indispensable component. It is generated using a special tool, the so-called Grammar Specification Language (GSL). The GSL is used to reproduce in advance the words to be understood, as well as their links, and to define them for the speech recognizer.
  • GSL: Grammar Specification Language
  • The predefined sentences are formed from combinations of words that are interchangeable (paradigmatic axis) and combinable (syntagmatic axis). An example of this is shown in FIG. 7. The possible utterances result from the syntagmatic connection of the paradigmatic word combinations.
  • The object of the invention is to provide a method for speech recognition on the basis of a generative transformation / phrase structure grammar which, compared to conventional recognition methods, requires fewer system resources and thereby enables reliable and fast recognition of speech while reducing over-generation.
  • A spoken phrase is analyzed for the triphones contained therein, the words contained in the spoken phrase are formed from the recognized triphones with the aid of phonetic word databases, and the spoken phrase is syntactically reconstructed from the recognized words using a grammar.
  • The linking rules of grammatical sentences are not reproduced at the surface; instead, the grammar represents the deep structures that the syntagmatic links of all Indo-European languages follow.
  • Each sentence is described using a syntactic model in the form of so-called structure trees.
  • The GT / PS grammar is not based on the potential utterances of a specific application, but on the deep structure of the syntax (sentence formation rules) of Indo-European languages. It provides a framework that can be filled with different words and depicts the reality of spoken language better than the previously used "mimetic" process.
  • In the GT / PS model, the number of subgrammars can be reduced to, e.g., 30 subgrammars in just two hierarchical levels.
  • The new grammar type depicts natural language expressions in a structured form and is, for example, only around 25% of the size of the previous grammar. Because of its small size, this grammar is easier to maintain, and compilation times decrease sharply. Also because of its small size, recognition reliability (accuracy) increases and recognition delay (latency) decreases. Current computer capacities are better used and the performance of the servers increases. In addition, the new grammar is not tied to a specific application; its basic structures can be used for different applications, which increases the homogeneity of the systems and reduces development times.
  • The universal code of the deep structure enables use in, and added value for, multilingual speech systems to an extent not previously achieved; the standard Western European languages in particular can be processed with comparatively little effort.
  • the new GT / PS grammar is based on current linguistic models that provide natural-language utterances in the context of surface and
  • the GT / PS grammar is much smaller than the previous grammar because it only needs two levels instead of the up to seven subgrammar levels; - The number of grammatically incorrect sentences covered by the grammar
  • Figure 1: a triphone analysis as the first step in the recognition process
  • Figure 2: word recognition from the recognized triphones as the second step in the recognition process
  • Figure 3: a syntactic reconstruction of the recognized words as the third step in the recognition process
  • Figure 4: an example of the structure of the recognized words in
  • Figure 5: a sample program for a possible grammar
  • Figure 6: an overview of the structure of a PSG grammar
  • Figure 7: an example of the formation of word combinations in a grammar according to the prior art
  • Figure 1 shows the first step of speech recognition: the triphone analysis.
  • The continuous flow of speech of a person 1 is picked up, e.g., by the microphone of a telephone and fed to a speech recognizer 2 as an analog signal.
  • The analog voice signal is converted into a digital voice signal 3.
  • The speech signal contains a multitude of triphones, i.e. sound segments, which are compared in the speech recognizer 2 with existing, i.e. predefined, triphone linking rules.
  • The existing triphones are stored in a database which contains one or more dictionaries.
  • The recognized triphones are then present as a triphone chain 4, e.g. "pro", "red", "ote", "tel".
  • Meaningful words are formed from the recognized triphones.
  • The phonetic dictionary 5 can comprise a certain vocabulary from colloquial language as well as a special vocabulary tailored to the respective application.
  • The recognized words 7 are syntactically reconstructed using the grammar 8.
  • The recognized words are assigned to their part of speech, such as noun, verb, adverb, article, adjective, etc., as shown in FIG. 6.
  • The databases 9-15 can contain both the conventional part-of-speech categories mentioned above and special categories, such as a yes / no grammar 9 and telephone numbers 14, 15.
  • Detection of DTMF inputs 16 can also be provided.
  • The described assignment of parts of speech to the recognized words can already take place during the word recognition process.
  • Based on their word categories, the recognized words are assigned to a verbal phrase, i.e. a phrase based on a verb, and a nominal phrase, i.e. a phrase based on a noun, cf. Figure 6.
  • In step 18, the objects for multitasking are linked to the corresponding voice-controlled application.
  • Each object 19 comprises a target sentence stored in the grammar 8, more precisely a sentence model.
  • A sentence model can be defined, e.g., by a word order "subject, verb, object" or "object, verb, subject".
  • Many other sentence structures are stored in this general form in the grammar 8. If the word categories of the recognized words 7 correspond to the order of one of the predefined sentence models, the words are assigned to the associated object, and the sentence is considered recognized. In other words, each sentence model comprises a number of variables assigned to the different word categories, which are filled with the corresponding word categories of the recognized words 7.
  • The method uses the traditional Grammar Specification Language (GSL), but structures the stored sentences in an innovative way. It is based on the rules of phrase structure grammar and the concept of a generative transformation grammar.
  • The GT / PS grammar is therefore based on a theoretical model that is suitable for determining the abstract principles of natural language utterances.
  • It makes it possible for the first time to reverse the abstraction of sentence formation rules and to make them concrete as a prediction of the utterances of application users. This enables a systematic approach to speech recognition grammars, which until now have always been based on the intuitive accumulation of example sentences.
  • A central feature of both conventional and GT / PS grammars is the hierarchical nesting into so-called subgrammars, which combine individual words and variables at the highest level to form an entire sentence.
  • The GT / PS grammar is much smaller and hierarchically much clearer than the previously known grammars.
  • Almost exclusively "meaningful" sentences are stored in the new grammar, so that the degree of overgeneration, i.e. stored sentences that are incorrect in the natural language sense, decreases. This, in turn, is the prerequisite for improved recognition performance, since the application only has to choose between a few stored alternatives.
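
The word-category assignment and sentence-model matching described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the lexicon entries, the category names, the two sentence models, and the helper names `categorize`, `match_sentence_model`, and `covered_sentences` are all hypothetical.

```python
from collections import Counter
from math import prod
from typing import Optional, Sequence, Tuple

# Hypothetical lexicon: word -> word category (a stand-in for the word databases).
LEXICON = {
    "i": "PRONOUN",
    "want": "VERB",
    "need": "VERB",
    "a": "ARTICLE",
    "the": "ARTICLE",
    "red": "ADJECTIVE",
    "telephone": "NOUN",
    "tariff": "NOUN",
}

# Hypothetical sentence models: each is an ordered sequence of word categories.
SENTENCE_MODELS = {
    "plain_object": ("PRONOUN", "VERB", "ARTICLE", "NOUN"),
    "modified_object": ("PRONOUN", "VERB", "ARTICLE", "ADJECTIVE", "NOUN"),
}


def categorize(words: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Return the word-category sequence, or None if any word is unknown."""
    categories = []
    for word in words:
        category = LEXICON.get(word.lower())
        if category is None:
            return None
        categories.append(category)
    return tuple(categories)


def match_sentence_model(words: Sequence[str]) -> Optional[str]:
    """Return the name of the sentence model whose category order matches."""
    categories = categorize(words)
    if categories is None:
        return None
    for name, model in SENTENCE_MODELS.items():
        if model == categories:
            return name
    return None


def covered_sentences(model_name: str) -> int:
    """Count the word sequences one model covers (product of paradigm sizes)."""
    paradigm_sizes = Counter(LEXICON.values())
    return prod(paradigm_sizes[c] for c in SENTENCE_MODELS[model_name])
```

With this toy data, `match_sentence_model(["I", "want", "a", "red", "telephone"])` returns `"modified_object"`, while a scrambled order such as `["telephone", "want", "I"]` matches no model and returns `None`. `covered_sentences` multiplies the number of interchangeable words per slot, which is one way to see how a few category-based models can stand in for a much longer list of enumerated example sentences.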

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method for natural speech recognition based on a generative transformation/phrase structure grammar. According to the invention, a spoken phrase is analyzed to determine the triphones it contains; words contained in the spoken phrase are then formed from the recognized triphones with the aid of phoneme-group databases (dictionaries); and the spoken phrase is syntactically reconstructed from the recognized words by means of a set of grammatical rules (a grammar). This generative transformation/phrase structure grammar constitutes a new method for storing target sentences in the grammar. It uses the GSL (Grammar Specification Language) but structures the sentences to be stored in a novel way. It is based on the rules of phrase structure grammar and on Noam Chomsky's concept of generative transformational grammar.
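
The step in which words are formed from a chain of recognized triphones by dictionary lookup can be illustrated with a small sketch. It assumes a toy dictionary keyed by triphone sequences and a greedy longest-match segmentation. Both are illustrative choices that are not taken from the application, the letter-trigram "triphones" are a simplification of real context-dependent sound segments, and `PHONETIC_DICTIONARY` and `words_from_triphones` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical phonetic dictionary: triphone sequence -> word.
PHONETIC_DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("red",): "red",
    ("tel", "ele", "pho", "one"): "telephone",
    ("tel", "ell"): "tell",
}


def words_from_triphones(
    chain: Sequence[str],
    dictionary: Dict[Tuple[str, ...], str] = PHONETIC_DICTIONARY,
) -> Optional[List[str]]:
    """Segment a triphone chain into words by greedy longest match.

    Returns None as soon as no dictionary entry matches at the current position.
    """
    max_len = max(len(key) for key in dictionary)
    words: List[str] = []
    i = 0
    while i < len(chain):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(chain) - i), 0, -1):
            key = tuple(chain[i:i + n])
            if key in dictionary:
                words.append(dictionary[key])
                i += n
                break
        else:
            return None  # no entry starts at position i
    return words
```

For the chain `["red", "tel", "ele", "pho", "one"]` the sketch yields `["red", "telephone"]`. A production recognizer would typically score competing hypotheses rather than commit greedily, so this should be read only as an outline of the data flow from triphone chain to word list.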
EP03761435A 2002-06-28 2003-06-26 Method for natural speech recognition based on a generative transformation/phrase structure grammar Ceased EP1518221A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
DE10229207A DE10229207B3 (de) 2002-06-28 2002-06-28 Method for natural speech recognition based on a generative transformation/phrase structure grammar
DE10229207 2002-06-28
PCT/DE2003/002135 WO2004003888A1 (fr) 2002-06-28 2003-06-26 Method for natural speech recognition based on a generative transformation/phrase structure grammar

Publications (1)

Publication Number Publication Date
EP1518221A1 (fr) 2005-03-30

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03761435A Ceased EP1518221A1 (fr) 2002-06-28 2003-06-26 Method for natural speech recognition based on a generative transformation/phrase structure grammar

Country Status (10)

Country Link
US (1) US7548857B2 (fr)
EP (1) EP1518221A1 (fr)
JP (1) JP4649207B2 (fr)
CN (1) CN1315109C (fr)
AU (1) AU2003250272A1 (fr)
CA (1) CA2493429C (fr)
DE (1) DE10229207B3 (fr)
IL (1) IL165957A (fr)
PL (1) PL373306A1 (fr)
WO (1) WO2004003888A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7295981B1 (en) * 2004-01-09 2007-11-13 At&T Corp. Method for building a natural language understanding model for a spoken dialog system
GB0517082D0 (en) 2005-08-19 2005-09-28 Univ City Hong Kong Auxiliary winding for improved performance of a planar inductive charging platform
EP2141692A1 (fr) 2008-06-26 2010-01-06 Deutsche Telekom AG Assistance automatisée à commande vocale d'un utilisateur
KR101195812B1 (ko) * 2010-07-08 2012-11-05 뷰모션 (주) 규칙기반 시스템을 이용한 음성인식 시스템 및 그 방법
US9817813B2 (en) * 2014-01-08 2017-11-14 Genesys Telecommunications Laboratories, Inc. Generalized phrases in automatic speech recognition systems
CN110164449B (zh) * 2019-04-26 2021-09-24 安徽美博智能科技有限公司 语音识别的空调机控制方法及装置

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998009228A1 (fr) * 1996-08-29 1998-03-05 Bcl Computers, Inc. Procede de commande vocale en langage naturel

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3786822T2 (de) * 1986-04-25 1994-01-13 Texas Instruments Inc Spracherkennungssystem.
EP0590173A1 (fr) * 1992-09-28 1994-04-06 International Business Machines Corporation Système de reconnaissance de la parole utilisant un ordinateur
JPH0769710B2 (ja) * 1993-03-23 1995-07-31 株式会社エイ・ティ・アール自動翻訳電話研究所 自然言語解析方法
US6070140A (en) * 1995-06-05 2000-05-30 Tran; Bao Q. Speech recognizer
US6182039B1 (en) * 1998-03-24 2001-01-30 Matsushita Electric Industrial Co., Ltd. Method and apparatus using probabilistic language model based on confusable sets for speech recognition
US6163768A (en) * 1998-06-15 2000-12-19 Dragon Systems, Inc. Non-interactive enrollment in speech recognition
JP2950823B1 (ja) * 1998-09-29 1999-09-20 株式会社エイ・ティ・アール音声翻訳通信研究所 音声認識誤り訂正装置
JP3581044B2 (ja) * 1999-05-20 2004-10-27 株式会社東芝 音声対話処理方法、音声対話処理システムおよびプログラムを記憶した記憶媒体
US7120582B1 (en) * 1999-09-07 2006-10-10 Dragon Systems, Inc. Expanding an effective vocabulary of a speech recognition system
US6633846B1 (en) * 1999-11-12 2003-10-14 Phoenix Solutions, Inc. Distributed realtime speech recognition system
DE10032255A1 (de) * 2000-07-03 2002-01-31 Siemens Ag Verfahren zur Sprachanalyse
US7058567B2 (en) * 2001-10-10 2006-06-06 Xerox Corporation Natural language parser

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
FRÉDÉRIC BÉCHET ET AL: "LARGE SPAN STATISTICAL LANGUAGE MODELS : APPLICATION TO HOMOPHONE DISAMBIGUATION FOR LARGE VOCABULARY SPEECH RECOGNITION IN FRENCH", CONFERENCE PROCEEDINGS ON CD-ROM, EUROSPEECH'99, 6TH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY,, vol. 4, 5 September 1999 (1999-09-05) - 9 September 1999 (1999-09-09), Budapest, Hungary, pages 1763, XP007001340, ISSN: 1018-4074 *
I. ZITOUNI, K. SMAÏLI, J.P. HATON: "STATISTICAL LANGUAGE MODEL BASED ON A HIERARCHICAL APPROACH: MC", 7TH EUROPEAN CONFERENCE ON SPEECH COMMUNICATION AND TECHNOLOGY, AALBORG, DENMARK, 2001, 7 September 2001 (2001-09-07), Aalborg, Denmark, 2001, Retrieved from the Internet <URL:http://www.loria.fr/%7Esmaili/Euro01.pdf> [retrieved on 20110204] *
IMED ZITOUNI: "Modélisation du langage pour les systèmes de reconnaissance de la parole destinés aux grands vocabulaires : application à MAUD", DISSERTATION, 31 March 2000 (2000-03-31), UNIVERSITÉ HENRI POINCARÉ - NANCY, XP055005773, Retrieved from the Internet <URL:http://www.afcp-parole.org/doc/theses/theseIZ00.ps.gz> [retrieved on 20110204] *
IMED ZITOUNI: "Modélisation du langage pour les systèmes de reconnaissance de la parole destinés aux grands vocabulaires : application à MAUD", DISSERTATION, 7 September 2001 (2001-09-07), UNIVERSITÉ HENRI POINCARÉ - NANCY, pages 1 - 188, XP055005773 *
IMED ZITOUNI: "Modélisation du langage pour les systèmes de reconnaissance de la parole destinés aux grands vocabulaires : application à MAUD", RESUMES DE THESES, INFORMATION IN COGNITO, vol. 20, 31 December 2001 (2001-12-31), nancy, Retrieved from the Internet <URL:http://www.in-cognito.net/new/images/article/zitouni20.pdf> [retrieved on 20110204] *
M. HASPELMATH: "Word classes and parts of speech", 31 December 2001 (2001-12-31), pages 16538 - 16545, ISBN: 0-08-043076-7, Retrieved from the Internet <URL:www.eva.mpg.de/~haspelmt/2001wcl.pdf> [retrieved on 20110204] *
See also references of WO2004003888A1 *

Also Published As

Publication number Publication date
JP2005539249A (ja) 2005-12-22
US20060161436A1 (en) 2006-07-20
CN1666254A (zh) 2005-09-07
IL165957A (en) 2010-11-30
JP4649207B2 (ja) 2011-03-09
CN1315109C (zh) 2007-05-09
CA2493429C (fr) 2011-09-13
WO2004003888B1 (fr) 2004-03-25
IL165957A0 (en) 2006-01-15
PL373306A1 (en) 2005-08-22
US7548857B2 (en) 2009-06-16
DE10229207B3 (de) 2004-02-05
CA2493429A1 (fr) 2004-01-08
AU2003250272A1 (en) 2004-01-19
WO2004003888A1 (fr) 2004-01-08

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase. Free format text: ORIGINAL CODE: 0009012
17P Request for examination filed. Effective date: 20041216
AK Designated contracting states. Kind code of ref document: A1. Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR
17Q First examination report despatched. Effective date: 20110902
REG Reference to a national code. Ref country code: DE. Ref legal event code: R003
STAA Information on the status of an EP patent application or granted EP patent. Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED
18R Application refused. Effective date: 20130920