WO2003001504A1 - Analyseur stochastique de blocs - Google Patents

Analyseur stochastique de blocs

Info

Publication number
WO2003001504A1
Authority
WO
WIPO (PCT)
Prior art keywords
parser
utterances
parsing
processing system
data structure
Prior art date
Application number
PCT/AU2002/000802
Other languages
English (en)
Inventor
Dominique Estival
Ben Hutchison
Original Assignee
Kaz Group Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kaz Group Limited filed Critical Kaz Group Limited
Publication of WO2003001504A1 publication Critical patent/WO2003001504A1/fr

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197Probabilistic grammars, e.g. word n-grams
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the present invention relates to a stochastic chunk parser and, more particularly, to such a parser suited for use within an automated speech recognition system.
  • Automated speech recognition is a complex task in itself. Automated speech understanding sufficient to provide automated dialogue with a user adds a further layer of complexity.
  • the term automated speech recognition system will refer to automated or substantially automated systems which perform automated speech recognition and also attempt automated speech understanding, at least to predetermined levels sufficient to provide a capability for at least limited automated dialogue with a user.
  • A generalized diagram of a commercial grade automated speech recognition system, as can be used for example in call centres and the like, is illustrated in Fig. 1.
  • parsing techniques are used to process utterances input to the system so as to decide the likelihood of different interpretations being placed on those utterances.
  • Parsing techniques that produce complete parse trees and attribute value matrices have been used. Also, parsing techniques have been used which produce complete parse trees but may have skipped over one or more words. The grammars used in these solutions have been difficult to develop and maintain.
  • a speech recognition system of the type adapted to process utterances from a caller or user by way of components including a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of parsing a data structure derived by a component of said system; said method comprising performing only a partial parse of said data structure.
  • a speech recognition system of the type adapted to process utterances from a caller or user by way of components including a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a stochastic chunk parser for attribute grammars which searches for meaningful words and phrases within said utterances and from these builds attribute value matrices.
  • said parser utilizes structure and location of said phrases to estimate the probability of the utterance in which they are found.
  • said parser receives as input an N-best list derived from said recogniser.
  • said parser attaches respective probabilities to respective attribute value matrices by combining phrase structure probabilities with class-based N-gram language modeling.
  • said method is applied to a data structure.
  • said data structure comprises an N-best list.
  • said data structure comprises a word lattice.
  • Fig. 1 is a generalized block diagram of a prior art automated speech recognition system
  • Fig. 2 is a generalized block diagram of an automated speech recognition system suited for use in conjunction with an embodiment of the present invention
  • Fig. 3 is a more detailed block diagram of the utterance processing and dialogue processing portions of the system of Fig. 2;
  • Fig. 4 is a more detailed block diagram of the system of Fig. 2 incorporating a stochastic chunk parser in accordance with a first preferred embodiment of the present invention.
  • In Fig. 2 there is illustrated a generalized block diagram of an automated speech recognition system 10 adapted to receive human speech derived from user 11, and to process that speech with a view to recognizing and understanding the speech to a sufficient level of accuracy that a response 12 can be returned to user 11 by system 10.
  • the response 12 can take the form of an auditory communication, a written or visual communication or any other form of communication intelligible to user 11 or a combination thereof.
  • input from user 11 is in the form of a plurality of utterances 13 which are received by transducer 14 (for example a microphone) and converted into an electronic representation 15 of the utterances 13
  • the electronic representation 15 comprises a digital representation of the utterances 13 in .WAV format.
  • Each electronic representation 15 represents an entire utterance 13.
  • the electronic representations 15 are processed through front end processor 16 to produce a stream of vectors 17, one vector for example for each 10ms segment of utterance 13.
  • the vectors 17 are matched against knowledge base vectors 18 derived from knowledge base 19 by back end processor 20 so as to produce ranked results 1-N in the form of N-best results 21.
  • the results can comprise for example subwords, words or phrases but will depend on the application.
  • N can vary from 1 to very high values, again depending on the application.
  • An utterance processing system 26 receives the N best results 21 and begins the task of assembling the results into a meaning representation 25 for example based on the data contained in language knowledge database 31.
  • the utterance processing system 26 orders the resulting tokens or words 23 contained in N-best results 21 into a meaning representation 25 of token or word candidates which are passed to the dialogue processing system 27 where sufficient understanding is attained so as to permit functional utilization of speech input 15 from user 11 for the task to be performed by the automated speech recognition system 10.
  • the functionality includes attaining of sufficient understanding to permit at least a limited dialogue to be entered into with user/caller 11 by means of response 12 in the form of prompts so as to elicit further speech input from the user 11.
  • the functionality for example can include a sufficient understanding to permit interaction with extended databases for data identification.
  • Fig. 3 illustrates further detail of the system of Fig. 2 including listing of further functional components which make up the utterance processing system 26 and the dialogue processing system 27 and their interaction. Like components are numbered as for the arrangement of Fig. 2.
  • the utterance processing system 26 and the dialogue processing system 27 together form a natural language processing system.
  • the utterance processing system 26 is event-driven and processes each of the utterances 13 of caller/user 11 individually.
  • the dialogue processing system 27 puts any given utterance 13 of caller/user 11 into the context of the current conversation (usually in the context of a telephone conversation). Broadly, in a telephone answering context, it will try to resolve the query from the caller and decide on an appropriate answer to be provided by way of response 12.
  • the utterance processing system 26 takes as input the output of the acoustic or speech recogniser 30 and produces a meaning representation 25 for passing to dialogue processing system 27.
  • the meaning representation 25 can take the form of value pairs.
  • the utterance "I want to go from Melbourne to Sydney on Wednesday" may be presented to the dialogue processing system 27 in the form of three value pairs, comprising:
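  • The three value pairs are not reproduced in this text, but a slot/value meaning representation of the example utterance can be sketched as follows; the slot names origin, destination and date are illustrative assumptions, not taken from the specification:

```python
# Hypothetical sketch of a slot/value meaning representation for the
# example utterance; slot names are assumptions for illustration.
def to_value_pairs(chunks):
    """Flatten parsed chunks into (slot, value) pairs for the dialogue system."""
    return [(c["slot"], c["value"]) for c in chunks]

chunks = [
    {"slot": "origin", "value": "Melbourne"},
    {"slot": "destination", "value": "Sydney"},
    {"slot": "date", "value": "Wednesday"},
]
pairs = to_value_pairs(chunks)
```

A representation of this shape is what the dialogue processing system would consume when forming its response.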
  • the recogniser 30 provides as output N-best results 21, usually in the form of tokens or words 23, to the utterance processing system 26 where the output is first disambiguated by language model 32.
  • the language model 32 is based on trigrams with cut-off.
  • Analyser 33 specifies how words derived from language model 32 can be grouped together to form meaningful phrases which are used to interpret utterance 13.
  • the analyzer is based on a series of simple finite state automata which produce robust parses of phrasal chunks, for example noun phrases for entities and concepts and WH-phrases for questions and dates.
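  • A minimal sketch of this style of finite-state chunking, using regular expressions in place of hand-built automata; the patterns and labels below are invented for illustration and are not the grammars actually used by the system:

```python
import re

# Each pattern plays the role of one simple finite-state automaton that
# recognises one kind of phrasal chunk. Patterns/labels are assumptions.
CHUNK_PATTERNS = [
    ("DATE", re.compile(r"\b(on\s+)?(monday|tuesday|wednesday|thursday|friday)\b", re.I)),
    ("FROM_LOC", re.compile(r"\bfrom\s+([A-Z][a-z]+)")),
    ("TO_LOC", re.compile(r"\bto\s+([A-Z][a-z]+)")),
]

def chunk(utterance):
    """Return (label, matched text) for every phrasal chunk found."""
    found = []
    for label, pattern in CHUNK_PATTERNS:
        for m in pattern.finditer(utterance):
            found.append((label, m.group(0).strip()))
    return found

result = chunk("I want to go from Melbourne to Sydney on Wednesday")
```

Words that match no automaton ("I want to go") are simply skipped, which is the robustness property the passage describes.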
  • Analyser 33 is driven by grammars such as meta-grammar 34. The grammars themselves must be tailored for each application and can be thought of as data created for a specific customer.
  • the resolver 35 then uses semantic information
  • dialogue processing system 27 receives meaning representation 25 from resolver 35 and processes the dialogue according to the appropriate dialogue models.
  • dialogue models will be specific to different applications but some may be reusable. For example a protocol model may handle greetings, closures, interruptions, errors and the like across a number of different applications.
  • the dialogue flow controller 36 uses the dialogue history to keep track of the interactions.
  • the logic engine 37 creates SQL queries based on the meaning representation 25. Again it will be dependent on the specific application and its domain knowledge base.
  • the generator 38 produces responses 12 (for example speech out).
  • the generator 38 can utilize generic text to speech (TTS) systems to produce a voiced response.
  • TTS generic text to speech
  • Language knowledge database 31 comprises, in the instance of Fig. 3, a lexicon 39 operating in conjunction with database 40.
  • the lexicon 39 and database 40, operating in conjunction with knowledge base mapping tools 41 and, as appropriate, language model 32 and grammars 34, constitute a language knowledge database 31 or knowledge base which deals with domain specific data.
  • the structure and grouping of data is modeled in the knowledge base 31.
  • Database 40 comprises raw data provided by a customer. In one instance this data may comprise names, addresses, places, dates and is usually organised in a way that logically relates to the way it will be used.
  • the database 40 may remain unchanged or it may be updated throughout the lifetime of an application. Functional implementation can be by way of database servers such as MySQL, Oracle or Postgres.
  • the set 1310 takes as its input the output of recogniser 30 and, with the assistance of a phrase structure grammar 1311 it produces a set 1312 of attribute value matrices, each having associated with it a probability 1314.
  • the set 1312 comprises N attribute value matrices
  • the probabilities are calculated by combining phrase structure probabilities with class-based N-gram language modeling, thereby to allow the calculation of probabilities of partial parses.
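  • The combination described above can be sketched as the product of the internal phrase structure probabilities and a class-based bigram probability over the resulting category sequence; all numbers and category names below are invented for illustration and the bigram model is a stand-in for the class-based N-gram of the specification:

```python
import math

def partial_parse_prob(phrase_probs, category_seq, bigram):
    """Probability of a partial parse: product of the internal phrase
    probabilities times a class bigram probability of the category sequence."""
    inside = math.prod(phrase_probs)            # internal structure of each chunk
    outside = 1.0
    seq = ["<s>"] + category_seq
    for prev, cur in zip(seq, seq[1:]):         # class-based bigram over categories
        outside *= bigram.get((prev, cur), 1e-6)
    return inside * outside

# Invented numbers for illustration only.
bigram = {("<s>", "WORD"): 0.5, ("WORD", "ORIGIN"): 0.2, ("ORIGIN", "DEST"): 0.3}
p = partial_parse_prob([0.9, 0.8], ["WORD", "ORIGIN", "DEST"], bigram)
```

The point of the factorisation is that unparsed words contribute only through the category-sequence model, so no full parse is ever required.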
  • the Stochastic Chunk Parser for Attribute Grammars searches for meaningful words and phrases, and from these builds attribute value matrices (AVM). It also uses the structure and location of these phrases to estimate the probability of the utterances they are found in.
  • AVM attribute value matrices
  • the solution differs from previous attempts in that no complete parse of the sentence is performed. This means the parser does not attempt to find structural analyses for repairs, discourse markers, asides, editing terms, and other words which do not contribute to the construction of AVMs.
  • the Stochastic Chunk Parser for Attribute Grammars 1310 takes as its input a scored N-best list 21 created by a speech recogniser 30. From this, the Parser uses partial parsing to create a set of AVMs. The parsing is stochastic, so that probabilities are associated with each AVM. This is used to calculate the most likely AVM from the speech input.
  • the Parser 1310 uses phrase structure grammars 1311, and proceeds from the bottom up. As each phrase structure rule is matched, AVMs 1313 are created representing the meaning of the phrase. At the end of structural analysis there is a sequence of terminal and non-terminal categories; the former represent unparsed words, the latter parsed structures. The probabilities of these sequences are calculated in a novel manner, combining phrase structure probabilities with class-based n-gram language modelling. The internal structure of each phrase is calculated using probabilistic parsing techniques, and the product of these is multiplied with the probability of the sequence of categories, which is estimated using an N-gram language model.
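  • The bottom-up pass described above can be sketched as follows; the two rules, the city list and the AVM contents are assumptions for illustration, showing how matched spans are rewritten as non-terminal categories carrying AVMs while unmatched words remain terminals in the sequence:

```python
CITIES = {"melbourne", "sydney"}

def tag_cities(words):
    """Lexical pass: mark known city names, leave everything else a terminal."""
    return [("CITY", w) if w.lower() in CITIES else ("WORD", w) for w in words]

def apply_rules(tagged):
    """One bottom-up pass: (from CITY) -> ORIGIN, (to CITY) -> DEST,
    each non-terminal carrying an AVM for the phrase it covers."""
    out, i = [], 0
    while i < len(tagged):
        if i + 1 < len(tagged) and tagged[i][1].lower() in ("from", "to") \
                and tagged[i + 1][0] == "CITY":
            cat = "ORIGIN" if tagged[i][1].lower() == "from" else "DEST"
            out.append((cat, {"city": tagged[i + 1][1]}))  # non-terminal + AVM
            i += 2
        else:
            out.append(tagged[i])  # unparsed terminal stays in the sequence
            i += 1
    return out

seq = apply_rules(tag_cities("I want to go from Melbourne to Sydney".split()))
```

The resulting mixed sequence of terminals and non-terminals is exactly the object whose probability the category-sequence language model would then score.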
  • the same principle can be used to do parsing on data structures other than N-best lists, for example word lattices .
  • the advantage of the solution is that it allows the robust stochastic parsing of spoken language.
  • the grammars required by the Parser are considerably simpler than those used by parsers doing complete analyses. This means that application development time is significantly reduced.
  • the principle underlying the solution is that of using a mixture of phrase probabilities and N-gram language modelling to allow the calculation of probabilities of partial parses.
  • Variations that embody the concept include other methods of combining phrase structure probabilities with the probability of a series of tokens, including fixed and variable length language models, word-based language models that back off to class-based models, etc.
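  • One such variation, a word-based bigram that backs off to a class-based bigram, can be sketched as follows; the backoff weight, classes and probabilities are invented for illustration:

```python
def backoff_prob(prev_word, word, word_bigrams, class_bigrams, word2class, alpha=0.4):
    """Use a word bigram when one was observed; otherwise back off to a
    class-based bigram discounted by a fixed weight alpha (a sketch)."""
    if (prev_word, word) in word_bigrams:
        return word_bigrams[(prev_word, word)]
    classes = (word2class.get(prev_word, "OTHER"), word2class.get(word, "OTHER"))
    return alpha * class_bigrams.get(classes, 1e-6)

# Invented counts/classes for illustration.
word_bigrams = {("the", "road"): 0.1}
class_bigrams = {("DET", "NOUN"): 0.5}
word2class = {"the": "DET", "road": "NOUN", "light": "NOUN"}
p_seen = backoff_prob("the", "road", word_bigrams, class_bigrams, word2class)
p_backoff = backoff_prob("the", "light", word_bigrams, class_bigrams, word2class)
```

A production model would estimate alpha and the discounts from data; the fixed value here only illustrates the backoff structure.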
  • Annexure A provides additional background on the concept of probabilistic partial parsing and is incorporated herein by cross reference and so as to form part of this specification.
  • Probabilistic Parsing assigns probabilities to parse trees. This has two possible functions in the field of spoken language understanding:
  • the first function addresses what has been a concern for some time: how can linguistic knowledge be used to improve recognition performance?
  • Language modelling using N-grams is one way to incorporate statistics on language use, but it is unable to sensibly cope with syntactic relations other than the most local.
  • Non-determinism in parsing has an interesting connection with lattices that is worth discussing.
  • non-deterministic parsing essentially treats all grammar rules as optional: a rule may be applied, or it may be skipped in order to allow future rules the possibility of applying. Since this leads to a binary branching of the search space, it may be supposed that this leads to an exponential explosion of the search space.
  • This belief is misguided, however, since the interdependence of rules, both at the same level and across levels, limits the space of parses.
  • considering the grammar as non-deterministic, the following parse for turn off the road would be constructed.
  • a Viterbi-style search algorithm can locate the most probable partial parse for the above example.
  • the input to this need not be a single utterance, but can be a lattice of recognition hypotheses
  • the lattice below shows the results of non-deterministically parsing a recognition lattice which initially contained just the two traversals turn off the road and turn off the light (introduced arcs are shown in dotted line).
  • the Viterbi search should produce one of (turn off) (the light) and (turn) (off (the road)) as the most likely parse.
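  • A Viterbi-style search over such a word lattice can be sketched as follows; the lattice encodes the two traversals turn off the road and turn off the light, and the arc probabilities are invented for illustration:

```python
# Word lattice as a DAG: node -> [(next_node, word, probability)].
# Probabilities are invented; node 0 is the start, node 4 the end.
LATTICE = {
    0: [(1, "turn", 0.9)],
    1: [(2, "off", 0.8)],
    2: [(3, "the", 0.9)],
    3: [(4, "road", 0.4), (4, "light", 0.6)],
    4: [],
}

def best_path(node, end=4):
    """Return (probability, words) of the most probable traversal to `end`."""
    if node == end:
        return 1.0, []
    best_p, best_w = 0.0, []
    for nxt, word, p in LATTICE[node]:
        sub_p, sub_w = best_path(nxt, end)
        if p * sub_p > best_p:
            best_p, best_w = p * sub_p, [word] + sub_w
    return best_p, best_w

prob, words = best_path(0)  # most probable traversal of the lattice
```

A non-deterministic partial parser would run the same search over a lattice whose arcs also carry the chunk non-terminals introduced by parsing.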
  • the outside probabilities can be calculated using the Acoustic group's language model class. This means we must just calculate the inside probabilities, which is quite a simple task of multiplying probabilities of phrase expansions and lexical instantiations. Note also that it will also be easy to do speech repair by doing lattice pruning, and also ALARM processing is a simple case of lattice pruning. This shows we are ready to accept lattices from the acoustics group. What is more, if we decide to proceed with a non-deterministic partial parser then lattice parsing is not just possible but desirable!
  • By a top chunk is meant a rule of the speech recognition grammar that has semantic tags, where all of the rules that reference it do not have semantic tags.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

In a speech recognition system of the type adapted to process utterances from a caller or user by way of components including a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to the utterances, the invention concerns a method of parsing a data structure derived from a component of the system; the method comprising performing only a partial parse of said data structure.
PCT/AU2002/000802 2001-06-21 2002-06-19 Analyseur stochastique de blocs WO2003001504A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPR5851A AUPR585101A0 (en) 2001-06-21 2001-06-21 Stochastic chunk parser
AUPR5851 2001-06-21

Publications (1)

Publication Number Publication Date
WO2003001504A1 true WO2003001504A1 (fr) 2003-01-03

Family

ID=3829822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2002/000802 WO2003001504A1 (fr) 2001-06-21 2002-06-19 Analyseur stochastique de blocs

Country Status (2)

Country Link
AU (1) AUPR585101A0 (fr)
WO (1) WO2003001504A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1450350A1 (fr) * 2003-02-20 2004-08-25 Sony International (Europe) GmbH Méthode de reconnaissance de la parole avec des attributs
US8868409B1 (en) 2014-01-16 2014-10-21 Google Inc. Evaluating transcriptions with a semantic parser

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0977173A2 (fr) * 1998-07-31 2000-02-02 Texas Instruments Incorporated Minimisation d'un reseau de recherche pour la reconnaissance de la parole
WO2000026901A2 (fr) * 1998-11-05 2000-05-11 Dragon Systems, Inc. Realisation d'action enregistrees
WO2000078022A1 (fr) * 1999-06-11 2000-12-21 Telstra New Wave Pty Ltd Procede de developpement d'un systeme interactif
WO2002033583A1 (fr) * 2000-10-17 2002-04-25 Telstra New Wave Pty Ltd Systeme d'extraction d'informations

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0977173A2 (fr) * 1998-07-31 2000-02-02 Texas Instruments Incorporated Minimisation d'un reseau de recherche pour la reconnaissance de la parole
WO2000026901A2 (fr) * 1998-11-05 2000-05-11 Dragon Systems, Inc. Realisation d'action enregistrees
WO2000078022A1 (fr) * 1999-06-11 2000-12-21 Telstra New Wave Pty Ltd Procede de developpement d'un systeme interactif
WO2002033583A1 (fr) * 2000-10-17 2002-04-25 Telstra New Wave Pty Ltd Systeme d'extraction d'informations

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1450350A1 (fr) * 2003-02-20 2004-08-25 Sony International (Europe) GmbH Méthode de reconnaissance de la parole avec des attributs
US8868409B1 (en) 2014-01-16 2014-10-21 Google Inc. Evaluating transcriptions with a semantic parser

Also Published As

Publication number Publication date
AUPR585101A0 (en) 2001-07-12

Similar Documents

Publication Publication Date Title
US6983239B1 (en) Method and apparatus for embedding grammars in a natural language understanding (NLU) statistical parser
Ward Extracting information in spontaneous speech.
US6937983B2 (en) Method and system for semantic speech recognition
Issar Estimation of language models for new spoken language applications
US6910004B2 (en) Method and computer system for part-of-speech tagging of incomplete sentences
US6963831B1 (en) Including statistical NLU models within a statistical parser
US20030191625A1 (en) Method and system for creating a named entity language model
US6349282B1 (en) Compound words in speech recognition systems
US5875426A (en) Recognizing speech having word liaisons by adding a phoneme to reference word models
US10996931B1 (en) Integrated programming framework for speech and text understanding with block and statement structure
JP2005084681A (ja) 意味的言語モデル化および信頼性測定のための方法およびシステム
CA2481080C (fr) Procede et systeme de detection et d'extraction d'entites nommees de communications spontanees
US8401855B2 (en) System and method for generating data for complex statistical modeling for use in dialog systems
Rayner et al. Fast parsing using pruning and grammar specialization
Chow et al. Speech understanding using a unification grammar
Schwartz et al. Hidden understanding models for statistical sentence understanding
Jurcıcek et al. Transformation-based Learning for Semantic parsing
WO2003001504A1 (fr) Analyseur stochastique de blocs
Basu et al. Commodity price retrieval system in bangla: An ivr based application
US10832675B2 (en) Speech recognition system with interactive spelling function
Favre et al. Efficient sentence segmentation using syntactic features
Béchet et al. Robust dependency parsing for spoken language understanding of spontaneous speech.
JP3181465B2 (ja) 言語処理装置
WO2002103672A1 (fr) Module de reconnaissance assiste par langage
Gordon et al. An Evaluation Framework for Natural Language Understanding in Spoken Dialogue Systems.

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP