WO2002103673A1 - Neural network post-processor - Google Patents

Neural network post-processor

Info

Publication number
WO2002103673A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing system
utterances
speech recognition
user
mlp
Prior art date
Application number
PCT/AU2002/000803
Other languages
French (fr)
Inventor
Habib Talhami
Nik Waldron
Original Assignee
Kaz Group Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kaz Group Limited filed Critical Kaz Group Limited
Publication of WO2002103673A1 publication Critical patent/WO2002103673A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/01 - Assessment or evaluation of speech recognition systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

In a speech recognition system of the type adapted to process utterances from a caller or user by way of a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of gauging correctness of a pattern recognition task by mapping an input feature space onto a probability space.

Description

NEURAL NETWORK POST-PROCESSOR
The present invention relates to a neural network post-processor and more particularly to such a processor incorporated within a recogniser portion of an automated speech recognition system.
BACKGROUND
Automated speech recognition is a complex task in itself. Automated speech understanding sufficient to provide automated dialogue with a user adds a further layer of complexity.
In this specification the term "automated speech recognition system" will refer to automated or substantially automated systems which perform automated speech recognition and also attempt automated speech understanding, at least to predetermined levels sufficient to provide a capability for at least limited automated dialogue with a user. A generalized diagram of a commercial grade automated speech recognition system as can be used for example in call centres and the like is illustrated in Fig. 1.
With advances in digital computers and a significant lowering in cost per unit of computing capacity there have been a number of attempts in the commercial marketplace to install such automated speech recognition systems implemented substantially by means of digital computers. However, to date, there remain problems in achieving 100% recognition and/or 100% understanding in real time.
In one particular respect, critical to the success or otherwise of any given recognition schema, there are difficulties in classifying patterns as correctly recognized or as incorrectly recognized/not modeled.
It is an object of the present invention to address or ameliorate one or more of the abovementioned disadvantages.
BRIEF DESCRIPTION OF INVENTION
Accordingly, in one broad form of the invention there is provided in a speech recognition system of the type adapted to process utterances from a caller or user by way of a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of gauging correctness of a pattern recognition task by mapping an input feature space onto a probability space. Preferably said mapping is a non-linear mapping.
Preferably said method is applied to a confidence scoring task.
Preferably said method utilizes a multi-layer perceptron to apply multiple knowledge sources to said confidence scoring task.
In a further broad form of the invention there is provided in a speech recognition system of the type adapted to process utterances from a caller or user by way of a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of obtaining a confidence score by a non-linear mapping onto the real number line.
Preferably said method utilizes an MLP to generate an a posteriori probability for confidence.
Preferably said MLP is trained with a mean squared error. Preferably said MLP is trained utilizing a cross-entropy cost function.
Preferably said MLP is additionally trained with some sigmoidal non-linearity.
In yet a further broad form of the invention there is provided in a speech recognition system of the type adapted to process utterances from a caller or user by way of a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of confidence scoring utilizing a data driven system.
BRIEF DESCRIPTION OF DRAWINGS
Embodiments of the present invention will now be described with reference to the accompanying drawings wherein: Fig. 1 is a generalized block diagram of a prior art automated speech recognition system;
Fig. 2 is a generalized block diagram of an automated speech recognition system suited for use in conjunction with an embodiment of the present invention;
Fig. 3 is a more detailed block diagram of the utterance processing and dialogue processing portions of the system of Fig. 2;
Fig. 4 is a block diagram of the system of Fig. 2 incorporating a neural network post-processor in accordance with a first embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
With reference to Fig. 2 there is illustrated a generalized block diagram of an automated speech recognition system 10 adapted to receive human speech derived from user 11, and to process that speech with a view to recognizing and understanding the speech to a sufficient level of accuracy that a response 12 can be returned to user 11 by system 10. In the context of systems to which embodiments of the present invention are applicable the response 12 can take the form of an auditory communication, a written or visual communication or any other form of communication intelligible to user 11 or a combination thereof. In all cases input from user 11 is in the form of a plurality of utterances 13 which are received by transducer 14 (for example a microphone) and converted into an electronic representation 15 of the utterances 13. In one exemplary form the electronic representation 15 comprises a digital representation of the utterances 13 in .WAV format. Each electronic representation 15 represents an entire utterance 13. The electronic representations 15 are processed through front end processor 16 to produce a stream of vectors 17, one vector for example for each 10ms segment of utterance 13. The vectors 17 are matched against knowledge base vectors 18 derived from knowledge base 19 by back end processor 20 so as to produce ranked results 1-N in the form of N best results 21. The results can comprise for example subwords, words or phrases but will depend on the application. N can vary from 1 to very high values, again depending on the application.
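Purely as an illustration of the pipeline just described (the patent discloses no code), the Python sketch below frames an utterance into one crude "vector" per 10 ms segment and ranks knowledge-base entries to produce N best results. The two-number feature per frame, the knowledge-base record format and the Euclidean distance are all assumptions for this example; a real front end would compute proper acoustic features.

```python
import math

def frame_vectors(samples, rate=16000, frame_ms=10):
    # One feature vector per 10 ms segment of the utterance (cf. vectors 17).
    # [mean, energy] per frame is only a stand-in for real acoustic features.
    step = rate * frame_ms // 1000
    frames = [samples[i:i + step] for i in range(0, len(samples) - step + 1, step)]
    return [[sum(f) / len(f), sum(x * x for x in f) / len(f)] for f in frames]

def n_best(vector, knowledge_base, n=3):
    # Match a query vector against knowledge-base vectors (cf. vectors 18)
    # and return the n closest entries as ranked results (cf. N best results 21).
    def dist(entry):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vector, entry["vector"])))
    return sorted(knowledge_base, key=dist)[:n]
```

At a 16 kHz sample rate, a 10 ms frame is 160 samples, so a 320-sample utterance yields two vectors, each of which can then be matched against the knowledge base.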
An utterance processing system 26 receives the N best results 21 and begins the task of assembling the results into a meaning representation 25 for example based on the data contained in language knowledge database 31.
The utterance processing system 26 orders the resulting tokens or words 23 contained in N-best results 21 into a meaning representation 25 of token or word candidates which are passed to the dialogue processing system 27 where sufficient understanding is attained so as to permit functional utilization of speech input 15 from user 11 for the task to be performed by the automated speech recognition system 10. In this case the functionality includes attaining of sufficient understanding to permit at least a limited dialogue to be entered into with user/caller 11 by means of response 12 in the form of prompts so as to elicit further speech input from the user 11. In the alternative or in addition, the functionality for example can include a sufficient understanding to permit interaction with extended databases for data identification.
Fig. 3 illustrates further detail of the system of Fig. 2 including listing of further functional components which make up the utterance processing system 26 and the dialogue processing system 27 and their interaction. Like components are numbered as for the arrangement of Fig. 2.
The utterance processing system 26 and the dialogue processing system 27 together form a natural language processing system. The utterance processing system 26 is event-driven and processes each of the utterances 13 of caller/user 11 individually. The dialogue processing system 27 puts any given utterance 13 of caller/user 11 into the context of the current conversation (usually in the context of a telephone conversation). Broadly, in a telephone answering context, it will try to resolve the query from the caller and decide on an appropriate answer to be provided by way of response 12.
The utterance processing system 26 takes as input the output of the acoustic or speech recogniser 30 and produces a meaning representation 25 for passing to dialogue processing system 27.
In a typical, but not limiting form, the meaning representation 25 can take the form of value pairs. For example, the utterance "I want to go from Melbourne to Sydney on Wednesday" may be presented to the dialogue processing system 27 in the form of three value pairs, comprising:
1. Start; Melbourne
2. Destination; Sydney
3. Date; Wednesday
where, in this instance, the components Melbourne, Sydney, Wednesday of the value pairs 24 comprise tokens or words 23.
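As an illustrative sketch only (the patent discloses no code), the value-pair extraction just described might look like the following in Python. The trigger words and value vocabulary are invented for this example; they are not the patent's actual grammars or lexicon.

```python
def extract_value_pairs(tokens, triggers, values):
    # Emit a (slot, value) pair whenever a trigger word is directly
    # followed by a token from the known value vocabulary.
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in triggers and i + 1 < len(tokens) and tokens[i + 1] in values:
            pairs.append((triggers[tok], tokens[i + 1]))
    return pairs

# Hypothetical trigger words and value vocabulary for the example utterance.
TRIGGERS = {"from": "Start", "to": "Destination", "on": "Date"}
VALUES = {"melbourne", "sydney", "wednesday"}
```

Applied to the tokens of "I want to go from Melbourne to Sydney on Wednesday", this yields the three value pairs Start/Melbourne, Destination/Sydney and Date/Wednesday; "to go" produces nothing because "go" is not a known value.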
With particular reference to Fig. 3, the recogniser 30 provides as output N best results 21, usually in the form of tokens or words 23, to the utterance processing system 26, where the output is first disambiguated by language model 32. In one form the language model 32 is based on trigrams with cut-off. Analyser 33 specifies how words derived from language model 32 can be grouped together to form meaningful phrases which are used to interpret utterance 13. In one form the analyser is based on a series of simple finite state automata which produce robust parses of phrasal chunks - for example noun phrases for entities and concepts and WH-phrases for questions and dates. Analyser 33 is driven by grammars such as meta-grammar 34. The grammars themselves must be tailored for each application and can be thought of as data created for a specific customer.
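A minimal Python sketch, purely to illustrate the term "trigrams with cut off": trigrams whose training count does not exceed the cut-off are discarded from the model, and the scorer below simply counts retained trigrams (a crude stand-in for a smoothed log-probability). The padding symbols and toy corpus are assumptions for this example.

```python
from collections import Counter

def build_trigram_lm(sentences, cutoff=1):
    # Count all trigrams (with sentence-boundary padding), then apply the
    # cut-off: trigrams seen no more than `cutoff` times are dropped.
    counts = Counter()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(len(toks) - 2):
            counts[tuple(toks[i:i + 3])] += 1
    return {tri: c for tri, c in counts.items() if c > cutoff}

def trigram_hits(lm, sentence):
    # Number of the sentence's trigrams retained by the model.
    toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    return sum(1 for i in range(len(toks) - 2) if tuple(toks[i:i + 3]) in lm)
```

Trained on a corpus where "go to sydney" occurs twice and "go to melbourne" once, a cut-off of 1 keeps the frequent trigrams and discards the singletons, so the frequent sentence scores higher.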
The resolver 35 then uses semantic information associated with the words of the phrases recognized as relevant by the analyzer 33 to refine the meaning representation 25 into its final form for passing through the dialogue flow controller 36 within dialogue processing system 27. The dialogue processing system 27, in this instance with reference to Fig. 3, receives meaning representation 25 from resolver 35 and processes the dialogue according to the appropriate dialogue models. Again, dialogue models will be specific to different applications but some may be reusable. For example a protocol model may handle greetings, closures, interruptions, errors and the like across a number of different applications.
The dialogue flow controller 36 uses the dialogue history to keep track of the interactions. The logic engine 37, in this instance, creates SQL queries based on the meaning representation 25. Again it will be dependent on the specific application and its domain knowledge base.
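A hedged sketch of how such a logic engine might build an SQL query from the meaning representation's value pairs. The table name and the slot-to-column mapping are hypothetical, and values are passed as parameters rather than interpolated into the SQL string (avoiding injection), which the patent does not specify.

```python
def build_sql(table, pairs, columns):
    # Translate (slot, value) pairs into a parameterised SELECT,
    # skipping any slot with no known column mapping.
    clauses = [f"{columns[slot]} = ?" for slot, _ in pairs if slot in columns]
    params = [value for slot, value in pairs if slot in columns]
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For the earlier travel example, the three value pairs would map to a single WHERE clause over three hypothetical columns, ready for execution by a database server such as MySQL.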
The generator 38 produces responses 12 (for example speech out). In the simplest form the generator 38 can utilize generic text to speech (TTS) systems to produce a voiced response.
Language knowledge database 31 comprises, in the instance of Fig. 3, a lexicon 39 operating in conjunction with database 40. The lexicon 39 and database 40, operating in conjunction with knowledge base mapping tools 41 and, as appropriate, language model 32 and grammars 34, constitute a language knowledge database 31 or knowledge base which deals with domain specific data. The structure and grouping of data is modeled in the knowledge base 31. Database 40 comprises raw data provided by a customer. In one instance this data may comprise names, addresses, places and dates, and is usually organised in a way that logically relates to the way it will be used. The database 40 may remain unchanged or it may be updated throughout the lifetime of an application. Functional implementation can be by way of database servers such as MySQL, Oracle or Postgres.
As will be observed particularly with reference to Fig. 3, interaction between a number of components in the system can be quite complex with lexicon 39, in particular, being used by and interacting with multiple components of System 10.
With reference to Fig. 4 there is shown in block diagram form a neural network post-processor 410 in accordance with a first preferred embodiment of the present invention.
In this instance the processor 410 is applied to the output of recogniser 30.
Broadly, neural network post-processor 410 utilises a multi-layer perceptron (MLP) to apply multiple knowledge sources to the problem. The MLP performs a non-linear mapping onto the real number line to give a confidence score.
This differs from previous solutions in several ways:
1. the application of multiple knowledge sources;
2. the use of an MLP to generate an a posteriori probability for confidence.
This solution can be implemented simply using an MLP trained with either a mean squared error or a cross-entropy cost function, and some sigmoidal non-linearity. Our experimental system was trained using conjugate gradient descent (back-propagation).
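A minimal sketch of such a confidence-scoring MLP: one hidden layer, sigmoid units throughout, and a cross-entropy cost. Plain stochastic gradient descent replaces the conjugate-gradient training mentioned above for brevity, and the two input "knowledge source" features and the tiny training set are invented for illustration only.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One-hidden-layer MLP with sigmoid non-linearities, trained on a
    cross-entropy cost. The sigmoid output is read as an a posteriori
    probability that the recognition result is correct."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def confidence(self, x):
        # Confidence score: P(correct | input features).
        return self.forward(x)[1]

    def train(self, data, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, t in data:
                h, y = self.forward(x)
                # Cross-entropy cost with a sigmoid output gives the
                # simple output-layer gradient (y - t).
                d_out = y - t
                for j, hj in enumerate(h):
                    d_h = d_out * self.w2[j] * hj * (1 - hj)
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_h * xi
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out
```

Trained on feature vectors labelled correct (1) or incorrect (0), the network learns a non-linear mapping from the input feature space onto [0, 1], which can then be thresholded or passed on as a confidence score.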
The system 10 incorporating processor 410 is data driven rather than based on some heuristic technique: given a representative corpus, it 'learns' an optimal mapping from input data to a correct/incorrect decision. In effect, in order to gauge the N best results 21 derived from recogniser 30, the neural network post-processor 410 non-linearly maps an input feature space of a pattern recognition task onto the probability space for gauging correctness as applied to confidence scoring.
The above describes only some embodiments of the present invention and modifications, obvious to those skilled in the art, can be made thereto without departing from the scope and spirit of the present invention.

Claims

1. In a speech recognition system of the type adapted to process utterances from a caller or user by way of a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of gauging correctness of a pattern recognition task by mapping an input feature space onto a probability space.
2. The method of Claim 1 wherein said mapping is a non-linear mapping.
3. The method of Claim 1 or Claim 2 applied to a confidence scoring task.
4. The method of any previous claim utilizing a multilayer perceptron to apply multiple knowledge sources to said confidence scoring task.
5. In a speech recognition system of the type adapted to process utterances from a caller or user by way of a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of obtaining a confidence score by a non-linear mapping onto the real number line.
6. The method of any previous claim utilizing an MLP to generate an a posteriori probability for confidence.
7. The method of Claim 6 wherein said MLP is trained with a mean squared error.
8. The method of Claim 6 wherein said MLP is trained utilizing a cross-entropy cost function.
9. The method of Claim 7 and Claim 8 wherein said MLP is additionally trained with some sigmoidal non-linearity.
10. In a speech recognition system of the type adapted to process utterances from a caller or user by way of a recogniser, an utterance processing system and a dialogue processing system so as to produce responses to said utterances, a method of confidence scoring utilizing a data driven system.
11. A speech recognition system operating according to the method of any previous claim.
PCT/AU2002/000803 2001-06-19 2002-06-19 Neural network post-processor WO2002103673A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPR5793A AUPR579301A0 (en) 2001-06-19 2001-06-19 Neural network post-processor
AUPR5793 2001-06-19

Publications (1)

Publication Number Publication Date
WO2002103673A1 (en) 2002-12-27

Family

ID=3829764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2002/000803 WO2002103673A1 (en) 2001-06-19 2002-06-19 Neural network post-processor

Country Status (2)

Country Link
AU (1) AUPR579301A0 (en)
WO (1) WO2002103673A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5465321A (en) * 1993-04-07 1995-11-07 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Hidden markov models for fault detection in dynamic systems
US6026359A (en) * 1996-09-20 2000-02-15 Nippon Telegraph And Telephone Corporation Scheme for model adaptation in pattern recognition based on Taylor expansion
US6125345A (en) * 1997-09-19 2000-09-26 At&T Corporation Method and apparatus for discriminative utterance verification using multiple confidence measures
US6421641B1 (en) * 1999-11-12 2002-07-16 International Business Machines Corporation Methods and apparatus for fast adaptation of a band-quantized speech decoding system


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7653545B1 (en) 1999-06-11 2010-01-26 Telstra Corporation Limited Method of developing an interactive system
US7712031B2 (en) 2002-07-24 2010-05-04 Telstra Corporation Limited System and process for developing a voice application
US8046227B2 (en) 2002-09-06 2011-10-25 Telstra Corporation Limited Development system for a dialog system
WO2004072862A1 (en) * 2003-02-11 2004-08-26 Telstra Corporation Limited System for predicting speech recognition accuracy and development for a dialog system
AU2004211007B2 (en) * 2003-02-11 2010-08-19 Telstra Corporation Limited System for predicting speech recognition accuracy and development for a dialog system
US7917363B2 (en) 2003-02-11 2011-03-29 Telstra Corporation Limited System for predicting speech recognition accuracy and development for a dialog system
US8296129B2 (en) 2003-04-29 2012-10-23 Telstra Corporation Limited System and process for grammatical inference
US11449678B2 (en) 2016-09-30 2022-09-20 Huawei Technologies Co., Ltd. Deep learning based dialog method, apparatus, and device
US11256866B2 (en) 2017-10-25 2022-02-22 Google Llc Natural language processing with an N-gram machine
US11947917B2 (en) 2017-10-25 2024-04-02 Google Llc Natural language processing with an n-gram machine

Also Published As

Publication number Publication date
AUPR579301A0 (en) 2001-07-12

Similar Documents

Publication Publication Date Title
McTear Spoken dialogue technology: enabling the conversational user interface
US20190035385A1 (en) User-provided transcription feedback and correction
US7606714B2 (en) Natural language classification within an automated response system
US20030191625A1 (en) Method and system for creating a named entity language model
US6501833B2 (en) Method and apparatus for dynamic adaptation of a large vocabulary speech recognition system and for use of constraints from a database in a large vocabulary speech recognition system
Newell et al. Speech-understanding systems: Final report of a study group
CN109241258A (en) A kind of deep learning intelligent Answer System using tax field
Riccardi et al. Stochastic language adaptation over time and state in natural spoken dialog systems
JP2005084681A (en) Method and system for semantic language modeling and reliability measurement
CN111833853A (en) Voice processing method and device, electronic equipment and computer readable storage medium
CA2481080C (en) Method and system for detecting and extracting named entities from spontaneous communications
Rabiner et al. Speech recognition: Statistical methods
Noyes et al. Errors and error correction in automatic speech recognition systems
WO2002103673A1 (en) Neural network post-processor
Gallwitz et al. The Erlangen spoken dialogue system EVAR: A state-of-the-art information retrieval system
Griol et al. Bringing together commercial and academic perspectives for the development of intelligent AmI interfaces
López-Cózar et al. Combining language models in the input interface of a spoken dialogue system
JP4220151B2 (en) Spoken dialogue device
Rahim et al. Robust numeric recognition in spoken language dialogue
KR100684160B1 (en) Apparatus and method for dialogue analysis using entity name recognition
McInnes et al. Effects of prompt style on user responses to an automated banking service using word-spotting
WO2002103674A1 (en) On-line environmental and speaker model adaptation
López-Cózar et al. A new technique based on augmented language models to improve the performance of spoken dialogue systems.
WO2002103672A1 (en) Language assisted recognition module
Passonneau et al. Seeing what you said: How wizards use voice search results

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP