WO2002103672A1 - Language assisted recognition module - Google Patents

Language assisted recognition module

Info

Publication number
WO2002103672A1
WO2002103672A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
recognition module
assisted recognition
dialogue
language model
Prior art date
Application number
PCT/AU2002/000801
Other languages
English (en)
Inventor
Dominique Estival
Ben Hutchison
Original Assignee
Kaz Group Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kaz Group Limited filed Critical Kaz Group Limited
Publication of WO2002103672A1 publication Critical patent/WO2002103672A1/fr

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Definitions

  • the present invention relates to a language assisted recognition module and, more particularly, to such a module suited for use with an automated speech recognition system.
  • Automated speech recognition is a complex task in itself. Automated speech understanding sufficient to provide automated dialogue with a user adds a further layer of complexity.
  • the term "automated speech recognition system" will refer to automated or substantially automated systems which perform automated speech recognition and also attempt automated speech understanding, at least to predetermined levels sufficient to provide a capability for at least limited automated dialogue with a user.
  • A generalized diagram of a commercial grade automated speech recognition system, as can be used for example in call centres and the like, is illustrated in Fig. 1.
  • a language assisted recognition module for an automated speech recognition system, said module incorporating a plurality of sub-modules.
  • said sub-modules process data in series.
  • said sub-modules include one or more modules selected from a duration model; a language repairer; an adaptive language assisted recognition message processor; a culler; a language model/digit sum checker; a confidence model.
  • said language assisted recognition module is in communication with a dialogue processing system.
  • said dialogue processing system includes a dialogue flow controller.
  • said language assisted recognition module incorporates feedback means from said utterance processing system to said language assisted recognition module.
  • said feedback means passes feedback data from said utterance processing system to said language assisted recognition module for adaptive processing within said language assisted recognition module.
  • said feedback data comprises an adaptive language assisted recognition message derived from a dialogue flow controller within said automated speech recognition system.
  • Fig. 1 is a generalized block diagram of a prior art automated speech recognition system
  • Fig. 2 is a generalized block diagram of an automated speech recognition system suited for use in conjunction with an embodiment of the present invention
  • Fig. 3 is a more detailed block diagram of the utterance processing and dialogue processing portions of the system of Fig. 2;
  • Fig. 4 is a generalized block diagram of the system of Fig. 2 incorporating a language assisted recognition module in accordance with a first preferred embodiment of the present invention
  • Fig. 5 is a detailed block diagram of the language assisted recognition module of Fig. 4.
  • Fig. 6 is a block diagram of a class structure of an LAR system in accordance with a first example of the system of Fig. 4.
  • Referring to Fig. 2, there is illustrated a generalized block diagram of an automated speech recognition system 10 adapted to receive human speech derived from user 11, and to process that speech with a view to recognizing and understanding the speech to a sufficient level of accuracy that a response 12 can be returned to user 11 by system 10.
  • the response 12 can take the form of an auditory communication, a written or visual communication or any other form of communication intelligible to user 11 or a combination thereof.
  • input from user 11 is in the form of a plurality of utterances 13 which are received by transducer 14 (for example a microphone) and converted into an electronic representation 15 of the utterances 13.
  • the electronic representation 15 comprises a digital representation of the utterances 13 in .WAV format.
  • Each electronic representation 15 represents an entire utterance 13.
  • the electronic representations 15 are processed through front end processor 16 to produce a stream of vectors 17, one vector for example for each 10ms segment of utterance 13.
  • the vectors 17 are matched against knowledge base vectors 18 derived from knowledge base 19 by back end processor 20 so as to produce ranked results 1-N in the form of N best results 21.
  • the results can comprise, for example, subwords, words or phrases, depending on the application. N can vary from 1 to very high values, again depending on the application.
  • An utterance processing system 26 receives the N best results 21 and begins the task of assembling the results into a meaning representation 25 for example based on the data contained in language knowledge database 31.
  • the utterance processing system 26 orders the resulting tokens or words 23 contained in N-best results 21 into a meaning representation 25 of token or word candidates which are passed to the dialogue processing system 27 where sufficient understanding is attained so as to permit functional utilization of speech input 15 from user 11 for the task to be performed by the automated speech recognition system 10.
  • the functionality includes attaining of sufficient understanding to permit at least a limited dialogue to be entered into with user/caller 11 by means of response 12 in the form of prompts so as to elicit further speech input from the user 11.
  • the functionality for example can include a sufficient understanding to permit interaction with extended databases for data identification.
  • Fig. 3 illustrates further detail of the system of Fig. 2, including a listing of the further functional components which make up the utterance processing system 26 and the dialogue processing system 27 and their interaction. Like components are numbered as for the arrangement of Fig. 2.
  • the utterance processing system 26 and the dialogue processing system 27 together form a natural language processing system.
  • the utterance processing system 26 is event-driven and processes each of the utterances 13 of caller/user 11 individually.
  • the dialogue processing system 27 puts any given utterance 13 of caller/user 11 into the context of the current conversation (usually in the context of a telephone conversation). Broadly, in a telephone answering context, it will try to resolve the query from the caller and decide on an appropriate answer to be provided by way of response 12.
  • the utterance processing system 26 takes as input the output of the acoustic or speech recogniser 30 and produces a meaning representation 25 for passing to dialogue processing system 27.
  • the meaning representation 25 can take the form of value pairs.
  • the utterance "I want to go from Melbourne to Sydney on Wednesday” may be presented to the dialogue processing system 27 in the form of three value pairs, comprising:
  • the recogniser 30 provides as output N best results 21 usually in the form of tokens or words 23 to the utterance processing system 26 where it is first disambiguated by language model 32.
  • the language model 32 is based on trigrams with cut off.
  • Analyser 33 specifies how words derived from language model 32 can be grouped together to form meaningful phrases which are used to interpret utterance 13.
  • the analyser is based on a series of simple finite state automata which produce robust parses of phrasal chunks, for example noun phrases for entities and concepts and WH-phrases for questions and dates.
  • Analyser 33 is driven by grammars such as meta-grammar 34. The grammars themselves must be tailored for each application and can be thought of as data created for a specific customer.
  • the resolver 35 uses semantic information associated with the words of the phrases recognized as relevant by the analyzer 33 to refine the meaning representation 25 into its final form for passing through the dialogue flow controller 36 within dialogue processing system 27.
  • the dialogue processing system 27, in this instance with reference to Fig. 3, receives meaning representation 25 from resolver 35 and processes the dialogue according to the appropriate dialogue models.
  • dialogue models will be specific to different applications but some may be reusable.
  • a protocol model may handle greetings, closures, interruptions, errors and the like across a number of different applications.
  • the dialogue flow controller 36 uses the dialogue history to keep track of the interactions.
  • the logic engine 37 creates SQL queries based on the meaning representation 25. Again it will be dependent on the specific application and its domain knowledge base.
  • the generator 38 produces responses 12 (for example, speech output).
  • the generator 38 can utilize generic text to speech (TTS) systems to produce a voiced response.
  • Language knowledge database 31 comprises, in the instance of Fig. 3, a lexicon 39 operating in conjunction with database 40.
  • the lexicon 39 and database 40 operating in conjunction with knowledge base mapping tools 41 and, as appropriate, language model 32 and grammars 34 constitutes a language knowledge database 31 or knowledge base which deals with domain specific data.
  • the structure and grouping of data is modeled in the knowledge base 31.
  • Database 40 comprises raw data provided by a customer. In one instance this data may comprise names, addresses, places, dates and is usually organised in a way that logically relates to the way it will be used.
  • the database 40 may remain unchanged or it may be updated throughout the lifetime of an application. Functional implementation can be by way of database servers such as MySQL, Oracle or Postgres.
  • a language assisted recognition module 710 and a first example of its operation are described in the context of an automated speech recognition system 10 of the type described with reference to Fig. 2. Like components are numbered as for the arrangement of Fig. 2.
  • the module 710 includes a plurality of sub-modules 711 which, in a preferred form, are linked in series as illustrated in Fig. 4 so as to provide a pipelined processing of the token or word-sequence candidates 23 (in this instance in the form of an N-best list), thereby producing an M-best list of language assisted sequence candidates 721 for input into utterance processing system 26.
  • the utterance processing system 26 includes a dialogue flow controller (DFC) 715 adapted, as will be discussed in further detail below, to provide feedback data 714 in the form of an adaptive language assisted recognition message (ALARM) 716 to at least one of the sub-modules 711.
  • the sub-modules 711 comprise, in this instance, respectively and in series, a duration model 717, a language repairer 718, an ALARM processor 719, a culler 720, a language model/digit sum checker 721, a confidence model 722 and a second culler 723.
  • This modular arrangement, in preferred forms in conjunction with the feedback data 714, provides an exceedingly flexible arrangement for interposing into the processing path of data flowing through the automated speech recognition system 10 of Fig. 4.
  • the solution according to a first preferred embodiment is to create a separate modular component 710, which contains a set of procedures to 1) eliminate some of the results returned by the speech recogniser, 2) rescore those results, and 3) perform some repair on the results.
  • the solution differs from previous attempts in that: 1. the techniques used for rescoring and repairing are not incorporated in the speech recogniser, 2. the linguistic and application domain information added to the recognition grammar for repairing is minimal and does not impact on efficiency. 3. the linguistic and application domain information used by the LAR is not needed by the language processing components, resulting in greater modularity of application design and development.
  • There are four types of linguistic information that the LAR module 710 can use. They are: 1. N-gram frequencies of words; 2. Parts of speech of words; 3. Knowledge of which words do not make sense in the context of the dialogue; 4. Knowledge of disfluencies in the utterance.
  • the LAR module 710 is split into sub-modules 711 which use these different kinds of information.
  • the sub-modules include:
  • Language model (uses 1): uses n-grams to estimate the likelihood of sequences of words and re-score the N-best list.
  • Class based language model (uses 1 and 2): same as above, but also uses part of speech information to overcome the sparse data problem.
  • Adaptive LAR (uses 3) : receives messages from the DFC telling it which concepts have been denied at a previous stage of the dialogue, and uses this information to rule out candidates in the N-best list.
  • Language repairer (uses 4) : removes disfluencies from each utterance (assumes that the recogniser has correctly recognised the disfluencies) .
  • any subset of these sub-modules may be used, and the method of configuring the LAR 710 is flexible enough to allow the sub-modules 711 to be applied in any order.
  • as the linguistic and application domain information used by the LAR 710 is not needed by the language processing components of processing system 26, language processing is also more efficient, but the main advantage is that this architecture results in greater modularity of application design and development.
  • the encapsulation of the procedures in the LAR component allows for greater modularity of the software components.
  • the separation of information for speech recognition, LAR and language processing, which is usually integrated in the architecture of other spoken dialogue systems, allows for greater reusability of the language data used by these components.
  • the data used by the LAR procedures are also more reusable and can be shared more easily across applications.
  • a core concept underlying the LAR is the encapsulation of procedures into a separate module 710, resulting in greater flexibility for development, while enhancing the output of the speech recognition component without negative consequences for efficiency.
  • the modularity of the architecture ensures rapid application development and greater reusability of data. Variations within the scope of embodiments of the invention include the addition of new modules, e.g. keyword spotting, modification of existing modules, e.g. new algorithm for language modeling, as well as applying the procedures to data structures other than an N-best list or a lattice.
  • a Duration Model which rescores the N-best list based upon the duration of words or phrases.
  • An adaptive duration model may also keep track of the durations of words/phrases in previous utterances.
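  • As a rough sketch of the duration-model idea above, the following Python fragment rescores an N-best list against per-word average durations tracked over previous utterances. The data layout, field names and penalty weight are illustrative assumptions, not the patent's implementation.

```python
# Illustrative sketch only: a duration model that rescores N-best
# candidates by how far each word's observed duration falls from a
# running average of durations seen in previous utterances.
from collections import defaultdict

class DurationModel:
    def __init__(self, penalty_weight=1.0):
        self.totals = defaultdict(float)   # summed durations per word
        self.counts = defaultdict(int)     # observation counts per word
        self.penalty_weight = penalty_weight

    def observe(self, word, duration):
        """Adaptive part: track durations from previous utterances."""
        self.totals[word] += duration
        self.counts[word] += 1

    def rescore(self, nbest):
        """nbest: list of (score, [(word, duration), ...]) tuples."""
        rescored = []
        for score, words in nbest:
            penalty = 0.0
            for word, duration in words:
                if self.counts[word]:
                    mean = self.totals[word] / self.counts[word]
                    penalty += abs(duration - mean) / mean
            rescored.append((score - self.penalty_weight * penalty, words))
        return sorted(rescored, key=lambda x: x[0], reverse=True)
```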
  • the Language Repairer requires a specification of the transformations it is allowed to make. These may be transformations on POS sequences (e.g. "GENDER_1 SIL NO GENDER_2" -> "GENDER_2"). Thus it would also require the Lexicon.
  • the DFC will pass to the LAR module a (possibly NULL) restriction on the output of the recogniser. I will call this an Adaptive LAR Message, or ALARM.
  • the DFC may send an ALARM:
  • the LAR Module will have a sub-module, the ALARM Processor, which is responsible for interpreting and executing ALARMs.
  • An ALARM specifies what will be filtered out of the N-best list. It consists of:
  • the ALARMs can specify both which words/phrases are bad, and which are good.
  • the language model used is a trigram model with backoff. It requires unigram, bigram and trigram frequencies, and it first attempts to estimate the probability of a trigram using the trigram frequency. If, however, the frequency of the trigram falls below a certain threshold (denoted by trigger3 below), then the language model reverts to a bigram estimation. If, additionally, the bigram frequency falls below another threshold (called trigger2 below), then the language model reverts to just the unigrams to estimate the probability.
  • this probability is combined with the recogniser's score to produce a new score for the utterance.
  • a scaling function must first be applied. This scaling function is set in the configuration file. For example a typical way to combine the acoustic and LM scores is:
  • New score = acoustic score + 50 x LOG(LM score)
  • This section contains the formulas used in estimating the probability of an utterance. It can safely be skipped unless you are really interested in the nitty-gritty.
  • P_back(C|AB) = P(C|AB) if F(ABC) > trigger3; otherwise P_back(C|AB) = normfactor(AB) x P_back(C|B). Similarly, P_back(C|B) = P(C|B) if F(BC) > trigger2; otherwise the model backs off to the unigram estimate P(C).
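  • A minimal sketch of this back-off estimate, assuming plain dictionaries of n-gram counts and precomputed normalisation factors (the function name, container layout and default thresholds are assumptions):

```python
# Minimal sketch of the trigram back-off estimate described above.
# uni/bi/tri are dicts keyed by word tuples; total is the corpus size;
# trigger3/trigger2 are the back-off thresholds; normfactor is assumed
# precomputed (the text notes it is often exactly 1).
def p_backoff(c, a, b, uni, bi, tri, total, trigger3=1, trigger2=1,
              normfactor=lambda *ctx: 1.0):
    if tri.get((a, b, c), 0) > trigger3:
        return tri[(a, b, c)] / bi[(a, b)]                 # P(C|AB)
    if bi.get((b, c), 0) > trigger2:
        return normfactor(a, b) * bi[(b, c)] / uni[b]      # back off to P(C|B)
    return (normfactor(a, b) * normfactor(b)
            * uni.get(c, 0) / total)                       # back off to P(C)
```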
  • the class based LM estimates the probability of a word class given a context, and then multiplies this by the probability of a word given a word class.
  • the probability of a word given a word class is independent of context. This assumption does not always hold, but class based language modeling still remains an effective tool when large sets of training data are unavailable.
  • the class based LM calculates the probability of an utterance by the following formula:
  • P_c(w_1 w_2 w_3 ... w_n) = P(C_1 C_2 C_3 ... C_n) x P(w_1|C_1) x P(w_2|C_2) x ... x P(w_n|C_n)
  • where C_i is the class of word w_i, and each class trigram probability P(C_k|C_k-2 C_k-1) is calculated using the same trigram backoff formula as the standard language model.
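  • A sketch of the class-based estimate above, assuming a back-off class trigram function and a table of context-independent emission probabilities P(word|class); the names and data layout are illustrative:

```python
# Sketch of the class-based language model: class-sequence probability
# from a trigram back-off model over POS classes, multiplied by
# per-word emission probabilities P(word | class).
import math

def p_class_based(words, classes, class_lm, emission):
    """words: [w1..wn]; classes: [C1..Cn] (their POS tags).
    class_lm(c, c_prev2, c_prev1) -> P(Ck | Ck-2 Ck-1) via back-off.
    emission[(w, c)] -> P(w | c), assumed independent of context."""
    logp = 0.0
    padded = ["<s>", "<s>"] + list(classes)
    for k, (w, c) in enumerate(zip(words, classes)):
        logp += math.log(class_lm(c, padded[k], padded[k + 1]))
        logp += math.log(emission[(w, c)])
    return math.exp(logp)
```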
  • the DFC should be able to constrain the LAR. To see why, suppose our first recognition is "when is mens freestyle swimming”. Our DFC asks for confirmation, and in response gets “no, mens freestyle swimming”. Obviously there has been a misrecognition here. If the DFC tells the LAR module to ignore “mens freestyle swimming”, then the next best candidate in the n-best, say “no, womens freestyle swimming”, will be chosen. We will refer to this as Adaptive LAR.
  • the DFC will pass to the LAR module a (possibly NULL) restriction on the output of the recogniser.
  • This restriction is called an Adaptive LAR Message, or ALARM, and specifies a semantic value which is to be excluded. All candidates containing words with this value will then be penalised.
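  • A sketch of how such an ALARM might be applied, assuming a lookup from words to their semantic values and a fixed default penalty; none of these names come from the source:

```python
# Illustrative ALARM handling: the DFC sends a semantic value to
# exclude; any candidate containing a word carrying that value is
# penalised (with a very large penalty, effectively removed).
DEFAULT_PENALTY = 20.0

def apply_alarm(nbest, alarm_value, semantics, penalty=DEFAULT_PENALTY):
    """nbest: list of (score, words); semantics[word] -> set of values."""
    if alarm_value is None:          # a NULL restriction: nothing to do
        return nbest
    out = []
    for score, words in nbest:
        hit = any(alarm_value in semantics.get(w, ()) for w in words)
        out.append((score - penalty if hit else score, words))
    return sorted(out, key=lambda x: x[0], reverse=True)
```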
  • the language repairer receives candidate utterances which may contain marked-up disfluencies and produces fluent versions of the utterances. For this purpose, three recognition tokens are reserved as meta-tokens.
  • the disfluency meta-tokens are:
  • the actual phonemic spellings of these tokens may be either _ps_ (a short pause), a noise model, or a filled pause ("umm" or "ahh").
  • the Language Repairer takes in an N-best list and edits each utterance by:
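  • The concrete meta-tokens and edit steps are elided above; the following sketch assumes three hypothetical meta-token names and implements one plausible repair pass (drop filled pauses, and on an explicit correction discard the reparandum):

```python
# Hypothetical repair pass. The real meta-token names are not given in
# the source; _FP_, _SIL_ and _NO_ below are stand-ins.
FILLED_PAUSE = "_FP_"      # e.g. "umm", "ahh"        (hypothetical)
SILENCE      = "_SIL_"     # short pause               (hypothetical)
CORRECTION   = "_NO_"      # explicit self-correction  (hypothetical)

def repair(tokens):
    out = []
    for tok in tokens:
        if tok == FILLED_PAUSE:
            continue                      # simply drop filled pauses
        if tok == CORRECTION:
            # "mens _SIL_ _NO_ womens" -> "womens": crudely discard the
            # previous token and any trailing silence marker.
            while out and out[-1] == SILENCE:
                out.pop()
            if out:
                out.pop()
            continue
        out.append(tok)
    return [t for t in out if t != SILENCE]
```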
  • the first thing to do is to create your n-gram files. The method differs depending on which language model you wish to use. If you are using the word based language model, then this just involves running ngramDir.pl to produce the files unigram.bin, bigram.bin and trigram.bin. If you are using the class based language model, however, you will first need to tag your training data. This can be done either with the Brill tagger, or using the tagger.pl script. Once your data is tagged, run ngramclasses.pl to produce classesgram.bin, unigram.bin, bigram.bin and trigram.bin.
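  • As a rough picture of what this n-gram generation step computes (the actual ngramDir.pl output format is not described here, so this sketch just builds in-memory counts rather than .bin files):

```python
# Counts the unigram, bigram and trigram frequencies that the
# language model files are built from. Sentence padding tokens are
# illustrative assumptions.
from collections import Counter

def count_ngrams(sentences):
    uni, bi, tri = Counter(), Counter(), Counter()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(2, len(toks)):
            uni[toks[i]] += 1
            bi[(toks[i - 1], toks[i])] += 1
            tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
    return uni, bi, tri
```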
  • the next step is to configure the language model.
  • the configuration file has the following relevant settings:
  • LM_FILES (unigram.bin, bigram.bin, trigram.bin)
  • CBLM_FILE classesgram.bin
  • LM_FILES takes a list containing the names of the files containing the unigram, bigram and trigram frequencies.
  • CBLM_FILE gives the name of the file containing the frequencies of each word occurring with a particular part of speech.
  • LM_SCALING specifies a function that applies to the language model probability before it gets combined with the acoustic score. Possible values are: LINEAR (identity function), LOG (natural logarithm of the probability) and LOGDIV (logarithm divided by the number of words in the utterance).
  • LM_ACWEIGHT and LM_LAWEIGHT give the multiples of the acoustic and language model scores that will get combined to produce the final score.
  • the final score will be: final score = LM_ACWEIGHT x acoustic score + LM_LAWEIGHT x scaled LM score
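  • The following sketch shows how the settings above could plausibly fit together; the scaling table and function signature are assumptions, not the SYLAN implementation. Note that weights of 1.0 and 50.0 with LOG scaling reproduce the earlier example formula:

```python
# A sketch of score combination driven by the configuration settings:
# LM_SCALING picks the scaling function; LM_ACWEIGHT / LM_LAWEIGHT
# weight the acoustic and (scaled) language model scores.
import math

SCALING = {
    "LINEAR": lambda p, n: p,                 # identity
    "LOG":    lambda p, n: math.log(p),       # natural logarithm
    "LOGDIV": lambda p, n: math.log(p) / n,   # log divided by word count
}

def final_score(acoustic, lm_prob, n_words, config):
    scaled = SCALING[config["LM_SCALING"]](lm_prob, n_words)
    return (config["LM_ACWEIGHT"] * acoustic
            + config["LM_LAWEIGHT"] * scaled)

# example configuration mirroring "acoustic score + 50 x LOG(LM score)"
config = {"LM_SCALING": "LOG", "LM_ACWEIGHT": 1.0, "LM_LAWEIGHT": 50.0}
```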
  • SpellingLanguageModel, POSLanguageModel, LanguageRepairer, and ALARM represent LAR modules that operate on the n-best list. Reject is used in conjunction with ALARM; it represents a rejected attribute-value pair.
  • To use the LAR component, you need to instantiate a UserSaid object with an n-best list of utterances. Also, you need to determine which LAR modules you are going to use and create an instance for each. These objects are then used to operate on the n-best list, as sketched below.
  • the LAR modules can be applied in any order. Most of the time, the language model is applied before the repairer. This is because the repairer removes disfluent parts of utterances and the data used by the language model typically includes the disfluencies. However, if the language model data does not include disfluencies, the repairer should be applied first.
  • each module is optional, with the exception of the LanguageRepairer: if a recognition grammar is used that outputs disfluency meta-tokens for language repair, then the repairer is required.
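  • A hypothetical usage sketch mirroring this description: the class names come from the text, but the constructor arguments and method names are assumptions, so the stubs below only echo the documented interaction:

```python
# Hypothetical usage: UserSaid wraps the n-best list; each chosen LAR
# module then operates on it in turn (language model before repairer,
# as recommended above when the LM data includes disfluencies).
class UserSaid:
    def __init__(self, nbest):
        self.nbest = nbest            # list of (score, tokens)

class SpellingLanguageModel:
    def rescore(self, user_said):
        # placeholder: would combine n-gram and acoustic scores
        user_said.nbest.sort(key=lambda c: c[0], reverse=True)

class LanguageRepairer:
    def repair(self, user_said):
        # placeholder: would strip marked-up disfluencies
        pass

user_said = UserSaid([(0.9, ["no", "womens", "freestyle", "swimming"])])
SpellingLanguageModel().rescore(user_said)   # first: rescore
LanguageRepairer().repair(user_said)         # then: repair
```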
  • the class structure of the LAR system is shown below.
  • the LanguageModel also instantiates each of the SentenceScore subclasses.
  • the SpellingLanguageModel class implements a word based language model.
  • the name SpellingLanguageModel was chosen instead of "word language model" since in the SYLAN code, a Word is a class containing lexical information such as part of speech (POS).
  • the POSLanguageModel class implements a class based language model. It uses the POS rather than the spelling of each word.
  • Candidate represents a scored candidate sentence that is part of an n-best list.
  • a Candidate is an array of Word. It can be in one of two states: either the words in the Candidate are just spellings (i.e., only the spelling fields of the component words have been filled in) or they have been retrieved from the lexicon and contain full lexical details.
  • the Candidate class has an attribute that tracks which of the two states the object is in.
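  • A sketch of the two-state Candidate described above, assuming a callable lexicon that returns lexical detail (e.g. POS) for a spelling; the attribute names are illustrative:

```python
# Sketch of the two-state Candidate: an array of Words that is either
# bare spellings or fully looked-up lexicon entries, with a flag
# tracking which state the object is in.
class Word:
    def __init__(self, spelling, pos=None):
        self.spelling = spelling
        self.pos = pos               # filled in after lexicon lookup

class Candidate:
    def __init__(self, score, spellings):
        self.score = score
        self.words = [Word(s) for s in spellings]
        self.lexicalised = False     # state flag: spellings only so far

    def lexicalise(self, lexicon):
        """Retrieve full lexical details for each word."""
        for w in self.words:
            w.pos = lexicon(w.spelling)
        self.lexicalised = True
```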
  • the number of ABCs with F(ABC) > 0, where F is the frequency, is normally significantly smaller than (vocabulary size)^3, and the number of ABs with F(AB) > 0 is normally significantly smaller than (vocabulary size)^2.
  • in most cases normfactor(AB) = 1, and we do not need to store/calculate normfactor(AB) in that case. This means that the size of the stored normfactor(AB) table is equal to the size of the stored P_back(AB) table.
  • Section 2 discusses the problem of selecting, obtaining and pre-processing corpora.
  • Section 3 discusses the calculation of the N-gram statistics using the programs ngram.pl and ngformat.pl.
  • Section 4 discusses using the Language Model to evaluate N-best lists, and discusses the various configurations of SYLM that can be set.
  • Section 5 gives some examples of SYLM in action.
  • Corpora can come from a variety of sources: transcriptions of dialogues, text downloaded from the web or from other sources, and text generated from a knowledge base.
  • 1271, core, can, MD, , ,
  • 1272, core, can_i, QUERY, QUERY, ,
  • 1273, core, can_ e, QUERY, QUERY, ,
  • the Language model calculates the probabilities of certain sequences of words using uni-, bi- and trigram statistics. It takes in a file containing an N-best list returned from SYCON, which also contains acoustic scores for each entry. The Language Model calculates a probability, weights this against the acoustic score, and returns the best (or perhaps, in the future, the M-best) analyses.
  • the file containing the N-best list should be in a file with the following format:
  • the filename of the N-best list is entered at the command prompt.
  • the configuration file looks like this:
  • LM_WEIGHTS ⁇ 0.1, 0.3, 0.6 ⁇
  • LM_FILES ⁇ unigram.bin, bigram.bin, trigram.bin ⁇
  • LM_LOGFILE I: ⁇ nlp ⁇ public ⁇ sylancomponents30 ⁇ logs ⁇ lm.log
  • the array LM_WEIGHTS assigns the relative importance placed on uni-, bi- and trigrams, respectively.
  • the Language Model currently uses a smoothed N-gram model, with the smoothing parameters set by LM_WEIGHTS.
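  • A sketch of this smoothed estimate, assuming LM_WEIGHTS supplies the unigram, bigram and trigram mixture weights (the container layout is illustrative):

```python
# Sketch of the smoothed (interpolated) N-gram estimate controlled by
# LM_WEIGHTS: a fixed mixture of unigram, bigram and trigram relative
# frequencies, e.g. weights {0.1, 0.3, 0.6} as configured above.
def p_interpolated(c, a, b, uni, bi, tri, total, weights=(0.1, 0.3, 0.6)):
    w1, w2, w3 = weights
    p_uni = uni.get(c, 0) / total
    p_bi = bi.get((b, c), 0) / uni[b] if uni.get(b) else 0.0
    p_tri = tri.get((a, b, c), 0) / bi[(a, b)] if bi.get((a, b)) else 0.0
    return w1 * p_uni + w2 * p_bi + w3 * p_tri
```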
  • LM_PROBABILISTIC allows the user to toggle between statistical and non-statistical modes by setting LM_PROBABILISTIC to 1 or 0, respectively.
  • LM_SCALING allows the user to apply a function to the probability score. The purpose of doing this is to make the language score comparable with the acoustic score, as well as to include factors such as the number of words in each input string. Currently available options for scaling are LINEAR, LOG and LOGDIV.
  • LM_ACWEIGHT and LM_LAWEIGHT specify the weightings of the acoustic score and the language model's score respectively.
  • Language Model can be run to take either files or keyboard input, and can output to either the screen or a log file.
  • the language score is the product of the language function applied to each word in the string successively.
  • the final language model score is a weighting of the acoustic score A and the scaled language score:
  • LM(w_1, w_2, ..., w_n) = b x A(w_1, ..., w_n) + (1 - b) x scaling(L(w_1, ..., w_n))
  • where A is the acoustic score, L the language score, b the acoustic weighting, and scaling is the scaling function specified by the user.
  • the output of the Language Model is then the N-best list candidate with the highest final language model score. We leave open the possibility of returning an M-best list of the highest scoring candidates in the future.

5. Examples
  • LM_PROBABILISTIC 1
  • the DFC would send an ALARM that would specify that we're listening for "mens” and “womens”, so that rival candidates to "the mens” that did not contain a gender term (e.g. "tennis”, "swimming") would be penalised.
  • the ALARM syntax would include "NOT", so that "NOT mens AND NOT womens" would penalise candidates that didn't contain a gender term. If we have already ascertained that the speaker didn't say "breaststroke", the ALARM would be:
  • An ALARM could optionally include a penalty size, e.g. "-20", which would be added to the scores of candidates meeting the ALARM requirements. In the absence of a penalty size, the ALARM Processor would revert to a default. If the penalty size is set extremely high, it will have the effect of killing off candidates for good.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a recognition module for a speech recognition system incorporating sub-modules, wherein the sub-modules include one or more modules selected from a duration model, a language repairer, an adaptive language assisted recognition message processor, a culler, a language model/digit sum checker and a confidence model. The module includes a dialogue processing system, a dialogue flow controller and feedback means.
PCT/AU2002/000801 2001-06-19 2002-06-19 Language assisted recognition module WO2002103672A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPR5788 2001-06-19
AUPR5788A AUPR578801A0 (en) 2001-06-19 2001-06-19 Language assisted recognition module

Publications (1)

Publication Number Publication Date
WO2002103672A1 true WO2002103672A1 (fr) 2002-12-27

Family

ID=3829759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2002/000801 WO2002103672A1 (fr) 2001-06-19 2002-06-19 Language assisted recognition module

Country Status (2)

Country Link
AU (1) AUPR578801A0 (fr)
WO (1) WO2002103672A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2197191A1 (fr) 2002-01-11 2010-06-16 Thomson Licensing Directory provision system and method for a digital subscriber line modem

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5566272A (en) * 1993-10-27 1996-10-15 Lucent Technologies Inc. Automatic speech recognition (ASR) processing using confidence measures
EP1020847A2 (fr) * 1999-01-18 2000-07-19 Nokia Mobile Phones Ltd. Procédé de reconnaissance de la parole à étapes multiples utilisant des mesures de fiabilité
US6125345A (en) * 1997-09-19 2000-09-26 At&T Corporation Method and apparatus for discriminative utterance verification using multiple confidence measures
US6208964B1 (en) * 1998-08-31 2001-03-27 Nortel Networks Limited Method and apparatus for providing unsupervised adaptation of transcriptions

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5566272A (en) * 1993-10-27 1996-10-15 Lucent Technologies Inc. Automatic speech recognition (ASR) processing using confidence measures
US6125345A (en) * 1997-09-19 2000-09-26 At&T Corporation Method and apparatus for discriminative utterance verification using multiple confidence measures
US6208964B1 (en) * 1998-08-31 2001-03-27 Nortel Networks Limited Method and apparatus for providing unsupervised adaptation of transcriptions
EP1020847A2 (fr) * 1999-01-18 2000-07-19 Nokia Mobile Phones Ltd. Procédé de reconnaissance de la parole à étapes multiples utilisant des mesures de fiabilité


Also Published As

Publication number Publication date
AUPR578801A0 (en) 2001-07-12

Similar Documents

Publication Publication Date Title
Jelinek Statistical methods for speech recognition
US6501833B2 (en) Method and apparatus for dynamic adaptation of a large vocabulary speech recognition system and for use of constraints from a database in a large vocabulary speech recognition system
US8612212B2 (en) Method and system for automatically detecting morphemes in a task classification system using lattices
US7249019B2 (en) Method and apparatus for providing an integrated speech recognition and natural language understanding for a dialog system
EP1429313B1 (fr) Modèle de langage utilisé dans la reconnaissance de la parole
US10170107B1 (en) Extendable label recognition of linguistic input
US20180137109A1 (en) Methodology for automatic multilingual speech recognition
Ward Extracting information in spontaneous speech.
US20030191625A1 (en) Method and system for creating a named entity language model
EP1538535A2 (fr) Détermination du sens d'une entrée texte dans un système de compréhension de langage naturel
US5875426A (en) Recognizing speech having word liaisons by adding a phoneme to reference word models
US20020133346A1 (en) Method for processing initially recognized speech in a speech recognition session
US11295730B1 (en) Using phonetic variants in a local context to improve natural language understanding
CA2481080C (fr) Procede et systeme de detection et d'extraction d'entites nommees de communications spontanees
JP2000200273A (ja) 発話意図認識装置
Beaufays et al. Learning name pronunciations in automatic speech recognition systems
López-Cózar et al. Combining language models in the input interface of a spoken dialogue system
KR20050101695A (ko) 인식 결과를 이용한 통계적인 음성 인식 시스템 및 그 방법
US6772116B2 (en) Method of decoding telegraphic speech
KR20050101694A (ko) 문법적 제약을 갖는 통계적인 음성 인식 시스템 및 그 방법
WO2002103672A1 (fr) Language assisted recognition module
Duchateau et al. Handling disfluencies in spontaneous language models
JPH10232693A (ja) 音声認識装置
Ringger A robust loose coupling for speech recognition and natural language understanding
Esteve et al. On the use of linguistic consistency in systems for human-computer dialogues

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP