WO2003001504A1 - Analyseur stochastique de blocs - Google Patents
Analyseur stochastique de blocs Download PDFInfo
- Publication number
- WO2003001504A1 WO2003001504A1 PCT/AU2002/000802 AU0200802W WO03001504A1 WO 2003001504 A1 WO2003001504 A1 WO 2003001504A1 AU 0200802 W AU0200802 W AU 0200802W WO 03001504 A1 WO03001504 A1 WO 03001504A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- parser
- utterances
- parsing
- processing system
- data structure
- Prior art date
Links
- 238000012545 processing Methods 0.000 claims abstract description 27
- 238000000034 method Methods 0.000 claims abstract description 26
- 230000004044 response Effects 0.000 claims abstract description 11
- 230000008569 process Effects 0.000 claims abstract description 8
- 238000004458 analytical method Methods 0.000 description 8
- 238000010586 diagram Methods 0.000 description 7
- 230000003993 interaction Effects 0.000 description 4
- 238000013138 pruning Methods 0.000 description 4
- 239000013598 vector Substances 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 230000008439 repair process Effects 0.000 description 3
- 230000009471 action Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000012916 structural analysis Methods 0.000 description 2
- 230000000295 complement effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Definitions
- the present invention relates to a stochastic chunk parser and, more particularly, to such a parser suited for use within an automated speech recognition system.
- Automated speech recognition is a complex task in itself. Automated speech understanding sufficient to provide automated dialogue with a user adds a further layer of complexity.
- in this specification, "automated speech recognition system" will refer to automated or substantially automated systems which perform automated speech recognition and also attempt automated speech understanding, at least to predetermined levels sufficient to provide a capability for at least limited automated dialogue with a user.
- A generalized diagram of a commercial grade automated speech recognition system, as can be used for example in call centres and the like, is illustrated in Fig. 1.
- parsing techniques are used to help process utterances input to the system so as to decide the likelihood of different interpretations being placed on those utterances.
- Parsing techniques that produce complete parse trees and attribute value matrices have been used. Also, parsing techniques have been used which produce complete parse trees but may have skipped over one or more words. The grammars used in these solutions have been difficult to develop and maintain.
- a speech recognition system of the type adapted to process utterances from a caller or user by way of components including a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of parsing a data structure derived by a component of said system; said method comprising performing only a partial parse of said data structure.
- a speech recognition system of the type adapted to process utterances from a caller or user by way of components including a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a stochastic chunk parser for attribute grammars which searches for meaningful words and phrases within said utterances and from these builds attribute value matrices.
- said parser utilizes structure and location of said phrases to estimate the probability of the utterance in which they are found.
- said parser receives as input an N-best list derived from said recogniser.
- respective probabilities are attached to said attribute value matrices by combining phrase structure probabilities with class-based N-gram language modeling.
- said method is applied to a data structure.
- said data structure comprises an N-best list.
- said data structure comprises a word lattice.
- Fig. 1 is a generalized block diagram of a prior art automated speech recognition system
- Fig. 2 is a generalized block diagram of an automated speech recognition system suited for use in conjunction with an embodiment of the present invention
- Fig. 3 is a more detailed block diagram of the utterance processing and dialogue processing portions of the system of Fig. 2;
- Fig. 4 is a more detailed block diagram of the system of Fig. 2 incorporating a stochastic chunk parser in accordance with a first preferred embodiment of the present invention.
- In Fig. 2 there is illustrated a generalized block diagram of an automated speech recognition system 10 adapted to receive human speech derived from user 11, and to process that speech with a view to recognizing and understanding the speech to a sufficient level of accuracy that a response 12 can be returned to user 11 by system 10.
- the response 12 can take the form of an auditory communication, a written or visual communication or any other form of communication intelligible to user 11 or a combination thereof.
- input from user 11 is in the form of a plurality of utterances 13 which are received by transducer 14 (for example a microphone) and converted into an electronic representation 15 of the utterances 13
- the electronic representation 15 comprises a digital representation of the utterances 13 in .WAV format.
- Each electronic representation 15 represents an entire utterance 13.
- the electronic representations 15 are processed through front end processor 16 to produce a stream of vectors 17, one vector for example for each 10ms segment of utterance 13.
- the vectors 17 are matched against knowledge base vectors 18 derived from knowledge base 19 by back end processor 20 so as to produce ranked results 1-N in the form of N-best results 21.
- the results can comprise for example subwords, words or phrases but will depend on the application.
- N can vary from 1 to very high values, again depending on the application.
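The scored N-best list the recogniser produces can be pictured with a small sketch. The hypothesis words and scores below are invented for illustration; they are not values from the specification.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    tokens: list   # recognised words for one utterance
    score: float   # recogniser confidence, higher is better

# A toy N-best list (N = 3) of the kind passed to the utterance
# processing system; words and scores are illustrative assumptions.
n_best = [
    Hypothesis(["turn", "off", "the", "light"], 0.62),
    Hypothesis(["turn", "off", "the", "road"], 0.31),
    Hypothesis(["turn", "of", "the", "light"], 0.07),
]

# The top-ranked hypothesis is simply the highest-scoring entry.
best = max(n_best, key=lambda h: h.score)
print(best.tokens)  # ['turn', 'off', 'the', 'light']
```

In practice each hypothesis would also carry acoustic and language model scores, but a single combined score suffices for the sketch.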
- An utterance processing system 26 receives the N best results 21 and begins the task of assembling the results into a meaning representation 25 for example based on the data contained in language knowledge database 31.
- the utterance processing system 26 orders the resulting tokens or words 23 contained in N-best results 21 into a meaning representation 25 of token or word candidates which are passed to the dialogue processing system 27 where sufficient understanding is attained so as to permit functional utilization of speech input 15 from user 11 for the task to be performed by the automated speech recognition system 10.
- the functionality includes attaining of sufficient understanding to permit at least a limited dialogue to be entered into with user/caller 11 by means of response 12 in the form of prompts so as to elicit further speech input from the user 11.
- the functionality for example can include a sufficient understanding to permit interaction with extended databases for data identification.
- Fig. 3 illustrates further detail of the system of Fig. 2 including listing of further functional components which make up the utterance processing system 26 and the dialogue processing system 27 and their interaction. Like components are numbered as for the arrangement of Fig. 2.
- the utterance processing system 26 and the dialogue processing system 27 together form a natural language processing system.
- the utterance processing system 26 is event-driven and processes each of the utterances 13 of caller/user 11 individually.
- the dialogue processing system 27 puts any given utterance 13 of caller/user 11 into the context of the current conversation (usually in the context of a telephone conversation) . Broadly, in a telephone answering context, it will try to resolve the query from the caller and decide on an appropriate answer to be provided by way of response 12.
- the utterance processing system 26 takes as input the output of the acoustic or speech recogniser 30 and produces a meaning representation 25 for passing to dialogue processing system 27.
- the meaning representation 25 can take the form of value pairs.
- the utterance "I want to go from Melbourne to Sydney on Wednesday" may be presented to the dialogue processing system 27 in the form of three value pairs, comprising an origin (Melbourne), a destination (Sydney) and a date (Wednesday).
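A minimal sketch of this kind of value-pair extraction follows. The slot names (origin, destination, date) and the regular-expression approach are assumptions for illustration; the actual system uses the analyser and resolver described below rather than regular expressions.

```python
import re

# Toy extraction of three value pairs from the example utterance.
# Slot names and patterns are invented for this sketch.
utterance = "I want to go from Melbourne to Sydney on Wednesday"

patterns = {
    "origin": r"from (\w+)",
    "destination": r"to (\w+)(?: on|$)",  # the 'to' followed by ' on' or end
    "date": r"on (\w+)$",
}

meaning = {}
for slot, pattern in patterns.items():
    m = re.search(pattern, utterance)
    if m:
        meaning[slot] = m.group(1)

print(meaning)
# {'origin': 'Melbourne', 'destination': 'Sydney', 'date': 'Wednesday'}
```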
- the recogniser 30 provides as output N-best results 21, usually in the form of tokens or words 23, to the utterance processing system 26, where the output is first disambiguated by language model 32.
- the language model 32 is based on trigrams with cut-off.
- Analyser 33 specifies how words derived from language model 32 can be grouped together to form meaningful phrases which are used to interpret utterance 13.
- the analyser is based on a series of simple finite state automata which produce robust parses of phrasal chunks - for example noun phrases for entities and concepts, and WH-phrases for questions and dates.
- Analyser 33 is driven by grammars such as meta-grammar 34. The grammars themselves must be tailored for each application and can be thought of as data created for a specific customer.
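As a rough illustration of the finite-state chunking pass the analyser performs, the sketch below groups words into phrasal chunks using hand-written word classes; the categories, word lists and grouping rules are invented, and a real grammar would be tailored per application as the text notes.

```python
# Invented word classes standing in for application-specific grammars.
DETERMINERS = {"the", "a", "an"}
DATE_WORDS = {"monday", "tuesday", "wednesday", "thursday", "friday"}

def chunk(tokens):
    """Return a list of (category, words) items; words not covered by a
    phrase rule come back as ('WORD', [w]) terminals, unparsed."""
    chunks, i = [], 0
    while i < len(tokens):
        w = tokens[i].lower()
        if w in DETERMINERS and i + 1 < len(tokens):
            # Determiner + following word grouped as a noun phrase.
            chunks.append(("NP", tokens[i:i + 2]))
            i += 2
        elif w in DATE_WORDS:
            chunks.append(("DATE", [tokens[i]]))
            i += 1
        else:
            chunks.append(("WORD", [tokens[i]]))
            i += 1
    return chunks

print(chunk(["turn", "off", "the", "light"]))
# [('WORD', ['turn']), ('WORD', ['off']), ('NP', ['the', 'light'])]
```

The output interleaves parsed chunks (NP, DATE) with unparsed terminals, which is exactly the sequence of terminal and non-terminal categories the stochastic scoring later operates on.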
- the resolver 35 then uses semantic information to produce the meaning representation 25 passed to the dialogue processing system 27.
- dialogue processing system 27 receives meaning representation 25 from resolver 35 and processes the dialogue according to the appropriate dialogue models.
- dialogue models will be specific to different applications but some may be reusable. For example a protocol model may handle greetings, closures, interruptions, errors and the like across a number of different applications.
- the dialogue flow controller 36 uses the dialogue history to keep track of the interactions.
- the logic engine 37 creates SQL queries based on the meaning representation 25. Again it will be dependent on the specific application and its domain knowledge base.
- the generator 38 produces responses 12 (for example speech out) .
- the generator 38 can utilize generic text to speech (TTS) systems to produce a voiced response.
- TTS generic text to speech
- Language knowledge database 31 comprises, in the instance of Fig. 3, a lexicon 39 operating in conjunction with database 40.
- the lexicon 39 and database 40, operating in conjunction with knowledge base mapping tools 41 and, as appropriate, language model 32 and grammars 34, constitute a language knowledge database 31 or knowledge base which deals with domain specific data.
- the structure and grouping of data is modeled in the knowledge base 31.
- Database 40 comprises raw data provided by a customer. In one instance this data may comprise names, addresses, places, dates and is usually organised in a way that logically relates to the way it will be used.
- the database 40 may remain unchanged or it may be updated throughout the lifetime of an application. Functional implementation can be by way of database servers such as MySQL, Oracle or Postgres.
- the parser 1310 takes as its input the output of recogniser 30 and, with the assistance of a phrase structure grammar 1311, produces a set 1312 of attribute value matrices, each having associated with it a probability 1314.
- the set 1312 comprises N attribute value matrices
- the probabilities are calculated by combining phrase structure probabilities with class-based N-gram language modeling, thereby to allow the calculation of probabilities of partial parses.
- the Stochastic Chunk Parser for Attribute Grammars searches for meaningful words and phrases, and from these builds attribute value matrices (AVMs). It also uses the structure and location of these phrases to estimate the probability of the utterances they are found in.
- AVM attribute value matrices
- the solution differs from previous attempts in that no complete parse of the sentence is performed. This means the parser does not attempt to find a structural analysis for repairs, discourse markers, asides, editing terms, and other words which do not contribute to the construction of AVMs.
- the Stochastic Chunk Parser for Attribute Grammars 1310 takes as its input a scored N-best list 21 created by a speech recogniser 30. From this, the Parser uses partial parsing to create a set of AVMs. The parsing is stochastic, so that probabilities are associated with each AVM. This is used to calculate the most likely AVM from the speech input.
- the Parser 1310 uses phrase structure grammars 1311, and proceeds from the bottom up. As each phrase structure rule is matched, AVMs 1313 are created representing the meaning of the phrase. At the end of structural analysis there is a sequence of terminal and non-terminal categories; the former represent unparsed words, the latter parsed structures. The probabilities of these sequences are calculated in a novel manner, combining phrase structure probabilities with class-based n-gram language modelling. The internal structure of phrases is calculated using probabilistic parsing techniques, and the product of these is multiplied with the probability of the sequence of categories, which is estimated using an N-gram language model.
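The probability combination described above can be sketched in a few lines: multiply the inside probabilities of the parsed phrases, then multiply by an n-gram probability over the sequence of terminal and non-terminal categories. The bigram table, categories and inside probabilities below are invented for illustration; a real system would train them from data.

```python
import math

def sequence_log_prob(categories, bigram_logp):
    """Log-probability of the category sequence under a bigram model
    with sentence-boundary symbols; unseen bigrams get a crude floor."""
    lp = 0.0
    for prev, cur in zip(["<s>"] + categories, categories + ["</s>"]):
        lp += bigram_logp.get((prev, cur), math.log(1e-6))
    return lp

def partial_parse_log_prob(chunks, bigram_logp):
    """chunks: list of (category, inside_probability) pairs.
    Combines phrase (inside) probabilities with the category n-gram."""
    inside = sum(math.log(p) for _, p in chunks)
    categories = [c for c, _ in chunks]
    return inside + sequence_log_prob(categories, bigram_logp)

# Invented bigram log-probabilities over chunk categories.
bigram_logp = {
    ("<s>", "WORD"): math.log(0.5),
    ("WORD", "WORD"): math.log(0.4),
    ("WORD", "NP"): math.log(0.3),
    ("NP", "</s>"): math.log(0.6),
}

# "turn off (the light)": two unparsed terminals and one parsed NP chunk.
chunks = [("WORD", 1.0), ("WORD", 1.0), ("NP", 0.8)]
score = partial_parse_log_prob(chunks, bigram_logp)
```

Because both factors are probabilities, the combined score lets partial parses with different amounts of parsed structure be compared on a single scale.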
- the same principle can be used to do parsing on data structures other than N-best lists, for example word lattices .
- the advantage of the solution is that it allows the robust stochastic parsing of spoken language.
- the grammars required by the Parser are considerably simpler than those used by parsers doing complete analyses. This means that application development time is significantly reduced.
- the principle underlying the solution is that of using a mixture of phrase probabilities and N-gram language modelling to allow the calculation of probabilities of partial parses.
- Variations that embody the concept include other methods of combining phrase structure probabilities with the probability of a series of tokens, including fixed and variable length language models, word based language models that backoff to class based, etc.
- Annexure A provides additional background on the concept of probabilistic partial parsing and is incorporated herein by cross reference and so as to form part of this specification.
- Probabilistic Parsing assigns probabilities to parse trees. This has two possible functions in the field of spoken language understanding:
- the first function addresses what has been a concern for some time: how can linguistic knowledge be used to improve recognition performance?
- Language modelling using N-grams is one way to incorporate statistics on language use, but it is unable to sensibly cope with syntactic relations other than the most local.
- Non-deterministic parsing has an interesting connection with lattices that is worth discussing.
- non-deterministic parsing essentially treats all grammar rules as optional: a rule may be applied, or it may be skipped in order to allow future rules the possibility of applying. Since this leads to a binary branching of the search space, it may be supposed that this leads to an exponential explosion of the search space.
- This belief is misguided, however, since the interdependence of rules constrains the number of distinct parses.
- considering the grammar as non-deterministic, a parse for "turn off the road" would be constructed.
- a Viterbi-style search algorithm can then locate the most probable partial parse.
- the input to this need not be a single utterance, but can be a lattice of recognition hypotheses.
- the lattice below shows the results of non-deterministically parsing a recognition lattice which initially contained just the two traversals "turn off the road" and "turn off the light" (introduced arcs are shown in dotted line).
- the Viterbi search should produce one of (turn off) (the light) and (turn) (off (the road)) as the most likely parse.
- the outside probabilities can be calculated using the Acoustic group's language model class. This means we must just calculate the inside probabilities, which is quite a simple task of multiplying probabilities of phrase expansions and lexical instantiations. Note also that it will be easy to do speech repair by doing lattice pruning, and ALARM processing is a simple case of lattice pruning. This shows we are ready to accept lattices from the acoustics group; more, if we decide to proceed with a non-deterministic partial parser then lattice parsing is not just possible but desirable!
- by a top chunk is meant a rule of the speech recognition grammar that has semantic tags, but where all of the rules that reference it do not have semantic tags.
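The annexure's Viterbi search over a recognition lattice can be illustrated with a toy example built from the two traversals it mentions, "turn off the road" and "turn off the light". The lattice topology and arc probabilities are invented for this sketch; real lattices would carry acoustic and language model scores.

```python
# Nodes are integers; each arc is (source, destination, word, probability).
# Arcs are listed in topological order so a single pass suffices.
arcs = [
    (0, 1, "turn", 1.0),
    (1, 2, "off", 1.0),
    (2, 3, "the", 1.0),
    (3, 4, "road", 0.4),
    (3, 4, "light", 0.6),
]

def viterbi(arcs, start=0, end=4):
    """Return (probability, words) of the most probable path start->end."""
    best = {start: (1.0, [])}  # node -> (best probability, best word path)
    for src, dst, word, p in arcs:
        if src in best:
            prob = best[src][0] * p
            if dst not in best or prob > best[dst][0]:
                best[dst] = (prob, best[src][1] + [word])
    return best[end]

prob, words = viterbi(arcs)
print(words)  # ['turn', 'off', 'the', 'light']
```

The same dynamic-programming pass extends naturally to the chunk-parsing case: arcs introduced by non-deterministic rule application simply add alternative traversals, and the search still keeps only the best score per node.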
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AUPR5851A AUPR585101A0 (en) | 2001-06-21 | 2001-06-21 | Stochastic chunk parser |
AUPR5851 | 2001-06-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003001504A1 true WO2003001504A1 (fr) | 2003-01-03 |
Family
ID=3829822
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2002/000802 WO2003001504A1 (fr) | 2001-06-21 | 2002-06-19 | Analyseur stochastique de blocs |
Country Status (2)
Country | Link |
---|---|
AU (1) | AUPR585101A0 (fr) |
WO (1) | WO2003001504A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1450350A1 (fr) * | 2003-02-20 | 2004-08-25 | Sony International (Europe) GmbH | Méthode de reconnaissance de la parole avec des attributs |
US8868409B1 (en) | 2014-01-16 | 2014-10-21 | Google Inc. | Evaluating transcriptions with a semantic parser |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0977173A2 (fr) * | 1998-07-31 | 2000-02-02 | Texas Instruments Incorporated | Minimisation d'un reseau de recherche pour la reconnaissance de la parole |
WO2000026901A2 (fr) * | 1998-11-05 | 2000-05-11 | Dragon Systems, Inc. | Realisation d'action enregistrees |
WO2000078022A1 (fr) * | 1999-06-11 | 2000-12-21 | Telstra New Wave Pty Ltd | Procede de developpement d'un systeme interactif |
WO2002033583A1 (fr) * | 2000-10-17 | 2002-04-25 | Telstra New Wave Pty Ltd | Systeme d'extraction d'informations |
-
2001
- 2001-06-21 AU AUPR5851A patent/AUPR585101A0/en not_active Abandoned
-
2002
- 2002-06-19 WO PCT/AU2002/000802 patent/WO2003001504A1/fr not_active Application Discontinuation
Also Published As
Publication number | Publication date |
---|---|
AUPR585101A0 (en) | 2001-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6983239B1 (en) | Method and apparatus for embedding grammars in a natural language understanding (NLU) statistical parser | |
Ward | Extracting information in spontaneous speech. | |
US6937983B2 (en) | Method and system for semantic speech recognition | |
Issar | Estimation of language models for new spoken language applications | |
US6910004B2 (en) | Method and computer system for part-of-speech tagging of incomplete sentences | |
US6963831B1 (en) | Including statistical NLU models within a statistical parser | |
US20030191625A1 (en) | Method and system for creating a named entity language model | |
US6349282B1 (en) | Compound words in speech recognition systems | |
US5875426A (en) | Recognizing speech having word liaisons by adding a phoneme to reference word models | |
US10996931B1 (en) | Integrated programming framework for speech and text understanding with block and statement structure | |
JP2005084681A (ja) | 意味的言語モデル化および信頼性測定のための方法およびシステム | |
CA2481080C (fr) | Procede et systeme de detection et d'extraction d'entites nommees de communications spontanees | |
US8401855B2 (en) | System and method for generating data for complex statistical modeling for use in dialog systems | |
Rayner et al. | Fast parsing using pruning and grammar specialization | |
Chow et al. | Speech understanding using a unification grammar | |
Schwartz et al. | Hidden understanding models for statistical sentence understanding | |
Jurcıcek et al. | Transformation-based Learning for Semantic parsing | |
WO2003001504A1 (fr) | Analyseur stochastique de blocs | |
Basu et al. | Commodity price retrieval system in bangla: An ivr based application | |
US10832675B2 (en) | Speech recognition system with interactive spelling function | |
Favre et al. | Efficient sentence segmentation using syntactic features | |
Béchet et al. | Robust dependency parsing for spoken language understanding of spontaneous speech. | |
JP3181465B2 (ja) | 言語処理装置 | |
WO2002103672A1 (fr) | Module de reconnaissance assiste par langage | |
Gordon et al. | An Evaluation Framework for Natural Language Understanding in Spoken Dialogue Systems. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |