WO1996030895A1 - Method, apparatus, and radio for optimizing hidden markov model speech recognition - Google Patents
Method, apparatus, and radio for optimizing hidden markov model speech recognition
- Publication number
- WO1996030895A1 (PCT/US1996/000968)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
Definitions
- the present invention relates generally to speech recognition, and more particularly to speech recognition using Hidden Markov Models.
- HMM Hidden Markov Model
- Prior art HMM speech recognition systems choose a model based on a best state sequence, in the maximum likelihood sense, at a specified time. Noise or inadequate training can cause a maximum likelihood state sequence associated with a model other than the correct model to be chosen.
- FIG. 1 is a flow diagram of one embodiment of steps for a method for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- FIG. 2 is a flow diagram of one embodiment of steps for a method for computing a plurality of current path scores in accordance with the present invention.
- FIG. 3 is a flow diagram of one embodiment of steps for a method for computing a plurality of current hybrid scores in accordance with the present invention.
- FIG. 4 is a block diagram of one embodiment of an apparatus for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- FIG. 5 is a block diagram of one embodiment of a path score determiner in an apparatus for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- FIG. 6 is a block diagram of one embodiment of a hybrid score determiner in an apparatus for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- FIG. 7 is a depiction of one embodiment of a radio comprising an apparatus for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- FIG. 8 is a graphical depiction of examples of normalized maximum likelihood scores for a set of HMM word models with respect to time.
- FIG. 9 is a graphical depiction of the amplitude waveform of an example voice signal with respect to time that relates to the score plots in FIG. 8.
- FIG. 10 is a graphical depiction of the PATH scores for the same set of HMM models as described in FIG. 8 with respect to time.
- FIG. 11 is a graphical depiction of the ML-PATH scores for the same set of HMM models as described in FIG. 8 with respect to time.
- the present invention provides a method, apparatus, and radio for Hidden Markov Model speech recognition that optimizes the model selection, in particular in the presence of noise or inadequate training.
- the advantage of using the ML-PATH metric is that the overall performance of a speech recognizer is significantly improved over that obtained with the standard ML metric, especially under noisy conditions.
- HMM Hidden Markov Model
- Mk is the kth of K HMM models and Ot is the string of speech feature observations.
- prob(Mk | Ot) = prob(Ot | Mk) prob(Mk) / prob(Ot) (Eq. 1)
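As a hedged illustration of Eq. 1 (a generic Bayes-rule sketch, not code from the patent; the function name and dictionary layout are assumptions), the posterior model probability can be computed from per-model likelihoods and priors:

```python
def posterior(likelihood, prior, models):
    """Bayes rule (Eq. 1): prob(Mk | Ot) = prob(Ot | Mk) * prob(Mk) / prob(Ot),
    where prob(Ot) = sum over all k of prob(Ot | Mk) * prob(Mk)."""
    evidence = sum(likelihood[m] * prior[m] for m in models)
    return {m: likelihood[m] * prior[m] / evidence for m in models}
```

Because the evidence term prob(Ot) is identical for every model, ranking models by posterior yields the same winner as ranking by likelihood times prior.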
- prob(Ot | Mk), the probability of observation Ot occurring at time t given model Mk, is easily and directly determined for a given observation sequence by means of a Viterbi decoder, the Forward Search algorithm, or other search methods commonly used in HMM-based speech recognition devices as described in "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Rabiner, L.R., Proceedings of the IEEE, Vol. 77, No. 2, Feb. 1989, pp. 257-285, and "The Viterbi Algorithm", Forney, G.D., Proceedings of the IEEE, Vol. 61, No. 3, Mar. 1973, pp. 268-278.
- Mk, the HMM model, is a first order Markov chain consisting of N independent states defined by a set of transition probabilities and observation probabilities.
- the determination of MAX{prob(Mk | Ot)} by a Viterbi or other type decoder is really a determination of MAX{prob(Ot | Mk)}, since prob(Ot) is common to all models and the model priors prob(Mk) are typically assumed equal.
- the model parameters are optimized to produce the best state sequence given known training data rather than to produce the best inter-model discrimination.
- the ML "best" model decision is usually made at prespecified times in the observation sequence or when the last model state has been occupied for a predetermined amount of time. Noise or inadequate training can produce a maximum likelihood state sequence associated with a model other than the "correct" model, i.e., the model corresponding to the spoken input. This is a problem that the proposed invention mitigates.
- the invention described herein is a method implemented in computer hardware that provides an optimized means for choosing the "correct" HMM model for a given speech feature observation sequence.
- the standard maximum likelihood state sequence score, the ML score, is combined with an additional score, herein called the PATH score, derived from information describing the dynamics of the ML score as a function of time, i.e., its score path.
- This additional PATH score information is derived from the HMM decoding algorithm and is combined with the ML score information in a novel way to form a hybrid metric, herein called the ML-PATH metric, for choosing the correct HMM model.
- One advantage of using the ML-PATH metric is that the overall accuracy of a speech recognizer can be significantly improved over that obtained with the standard ML metric alone, especially under noisy conditions. This has been demonstrated by the inventors through numerous experiments.
- This invention utilizes information already determined in the normal maximum likelihood (ML) calculations of the recognizer search algorithm to derive new information, i.e., the PATH score, and combine the two in novel ways to derive a new metric, the ML-PATH metric, which more accurately determines the correct HMM for a given spoken input utterance.
- ML normal maximum likelihood
- FIG. 1, numeral 100 is a flow diagram of one embodiment of steps for a method for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- the first step is storing, in a memory unit, a plurality of predetermined Hidden Markov Models (102).
- a speech utterance is divided into frames corresponding to frame feature vectors.
- the second step is determining, in a decoder for a frame feature vector, a plurality of current maximum likelihood scores each corresponding to a distinct Hidden Markov Model in the plurality of predetermined Hidden Markov Models (104).
- maximum likelihood scores are computed by a Viterbi decoder.
- the third step is computing, in the decoder for the frame feature vector, a plurality of current path scores each corresponding to a distinct Hidden Markov Model in the plurality of predetermined Hidden Markov Models, wherein a path score quantifies a variation in a maximum likelihood score as a function of time (106).
- the fourth step is computing, in the decoder for the frame feature vector, a plurality of current hybrid scores each corresponding to a distinct Hidden Markov Model in the plurality of predetermined Hidden Markov Models, wherein each hybrid score is a combination of the maximum likelihood score and the path score for each model (108).
- the fifth step is determining whether all frame feature vectors have been processed and repeating the second, third and fourth steps (110).
- the final step is selecting a Hidden Markov Model with a lowest or best current hybrid score (112).
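The steps above can be sketched as follows (an illustrative outline, not the patent's implementation; `ml_score` is a hypothetical stand-in for the per-frame best maximum likelihood score that a Viterbi decoder would produce for one model):

```python
def recognize(models, frames, ml_score):
    # Step 1 (storing the predetermined HMMs) is implicit in `models`.
    path = {m: 0.0 for m in models}      # accumulated PATH scores
    hybrid = {m: 0.0 for m in models}    # accumulated ML-PATH hybrid scores
    prev_ml = {m: 0.0 for m in models}   # ML scores from the previous frame

    for frame in frames:
        ml = {m: ml_score(m, frame) for m in models}       # step 2
        for m in models:                                   # step 3
            path[m] += (ml[m] - prev_ml[m]) ** 2
        best = min(path.values())
        norm_path = {m: path[m] - best for m in models}    # normalization (FIG. 2)
        for m in models:                                   # step 4
            hybrid[m] += (ml[m] * norm_path[m]) ** 2
        prev_ml = ml
    # final step: the model with the lowest (best) hybrid score is selected
    return min(models, key=lambda m: hybrid[m])
```

Whether the normalized or the raw path score feeds the next frame's accumulation is not spelled out at this point in the text; the sketch accumulates the raw score and normalizes a copy each frame.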
- FIG. 2, numeral 106, is a flow diagram of one embodiment of steps for a method for computing a plurality of current path scores in accordance with the present invention.
- a difference between a current maximum likelihood score and an immediately previous maximum likelihood score is computed (202).
- a square of the difference is added to a previous path score to provide a current path score (204).
- the first (202) and second (204) steps are repeated for each Hidden Markov Model to provide a plurality of current path scores (206).
- a minimum path score is selected from the plurality of current path scores (208), and the plurality of current path scores are normalized by subtracting the minimum or best path score from each current path score (210).
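A minimal per-frame sketch of the FIG. 2 steps (variable names are illustrative; `ml` and `prev_ml` hold the current and immediately previous ML score for each model):

```python
def update_path_scores(path, ml, prev_ml):
    """One FIG. 2 iteration: add the squared ML-score difference to each
    model's path score (202-206), then normalize by the minimum (208-210)."""
    for m in path:
        diff = ml[m] - prev_ml[m]                     # step 202
        path[m] += diff * diff                        # step 204
    best = min(path.values())                         # step 208
    return {m: s - best for m, s in path.items()}     # step 210
```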
- FIG. 3, numeral 108 is a flow diagram of one embodiment of steps for a method for computing a plurality of current hybrid scores in accordance with the present invention.
- a current path score is multiplied by a current maximum likelihood score to provide a product (302).
- a square of the product is added to a previous hybrid score to provide a current hybrid score (304).
- Steps 302 and 304 are repeated for each Hidden Markov Model to provide a plurality of current hybrid scores.
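The FIG. 3 update can be sketched in the same style (illustrative names, not the patent's code):

```python
def update_hybrid_scores(hybrid, path, ml):
    """One FIG. 3 iteration: for each model, multiply the path score by the
    ML score (302), then add the square of the product to the previous
    hybrid score (304)."""
    for m in hybrid:
        product = path[m] * ml[m]        # step 302
        hybrid[m] += product * product   # step 304
    return hybrid
```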
- FIG. 4, numeral 400 is a block diagram of one embodiment of an apparatus for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- the apparatus comprises a memory unit (402), a decoder (404), a path score determiner (406), a hybrid score determiner (408), and a model selector (410).
- the memory unit (402) receives and stores a plurality of predetermined Hidden Markov Models (412).
- the decoder (404) receives a plurality of frame feature vectors (414) and determines, for each frame feature vector, a plurality of current maximum likelihood scores (416) each corresponding to a distinct Hidden Markov Model in the plurality of predetermined Hidden Markov models (412) stored in the memory unit (402).
- the path score determiner (406) receives the plurality of current maximum likelihood scores (416) from the decoder (404) and computes, for each frame feature vector, a plurality of current path scores (418) each corresponding to a distinct Hidden Markov Model in the plurality of predetermined Hidden Markov Models (412).
- a path score quantifies a variation in a maximum likelihood score as a function of time.
- the hybrid score determiner (408) receives the plurality of current path scores (418) from the path score determiner (406) and the plurality of current maximum likelihood scores (416) from the decoder (404). Then, the hybrid score determiner (408) computes, for each frame feature vector, a plurality of current hybrid scores (420) each corresponding to a distinct Hidden Markov Model in the plurality of predetermined Hidden Markov Models (412). Each hybrid score is a combination of the maximum likelihood score and the path score for a model.
- the model selector (410) selects a Hidden Markov Model corresponding to a lowest (best) current hybrid score in the plurality of current hybrid scores (420).
- FIG. 5, numeral 406, is a block diagram of one embodiment of a path score determiner in an apparatus for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- the path score determiner (406) comprises a plurality of path subcircuits (502). Each path subcircuit comprises a subtracting circuit (504), a squaring circuit (506), a summing circuit (508), and a normalizing circuit (528).
- the subtracting circuit (504) computes a difference (514) between a current maximum likelihood score (416) and an immediately previous maximum likelihood score (511).
- the subtracting circuit (504) may contain a first delay unit (510) for holding the current maximum likelihood score (416) to provide the immediately previous maximum likelihood score (511) to a subtracter (512) which computes the difference (514).
- the squaring circuit (506) receives the difference (514) and provides a squared difference (516) by inputting the difference (514) to both inputs of a multiplier (518).
- the summing circuit (508) receives the squared difference (516) and uses an adder (520) to add the squared difference (516) to a previous path score (524) to provide a current path score (522).
- the summing circuit (508) may contain a second delay unit (526) for holding the current path score (522) to provide the previous path score (524).
- the normalizing circuit (528) provides a plurality of normalized path scores (530) by subtracting a minimum path score from each current path score (522).
- FIG. 6, numeral 408, is a block diagram of one embodiment of a hybrid score determiner in an apparatus for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- the hybrid score determiner (408) comprises a plurality of hybrid subcircuits (602), one for each Hidden Markov Model.
- a hybrid subcircuit (602) comprises a multiplier (604), a squaring circuit (606), and a summing circuit (608).
- the multiplier (604) multiplies a current path score (418) from the path score determiner (406) by a current maximum likelihood score (416) to provide a product (610).
- the squaring circuit (606) receives the product (610) and provides a squared product (612) by inputting the product (610) to both inputs of a multiplier (614).
- the summing circuit (608) receives the squared product (612) and uses an adder (616) to add the squared product (612) to a previous hybrid score (618) to provide a current hybrid score (620).
- the summing circuit (608) may contain a delay unit (622) for holding the current hybrid score (620) to provide the previous hybrid score (618).
- FIG. 7, numeral 700 is a depiction of one embodiment of a radio (702) comprising an apparatus (704) for optimizing Hidden Markov Model speech recognition in accordance with the present invention.
- the apparatus is depicted in FIG. 4.
- FIG. 8, numeral 800 is a graphical depiction of examples of normalized maximum likelihood, ML, scores for a set of HMM word models with respect to time.
- the examples are graphs of normalized maximum likelihood scores (802) with respect to time (804) up to time Tmax (806). Tmax is the time of the last frame processed.
- the eleven graphs (808, 810, 812, 814, 816, 818, 820, 822, 824, 826, and 828) represent the eleven digit models "zero" through "nine" and "oh".
- the graphs illustrate the fact that choosing the model having the best ML score caused a misrecognition of the input word.
- FIG. 9, numeral 900 is a graphical depiction of the amplitude waveform of an example voice signal with respect to time which relates to the score plots in FIG. 8. This is the waveform for the word "zero" (808).
- the waveform (908) is plotted as amplitude (902) with respect to time (904) up to time Tmax (906).
- FIG. 10 is a graphical depiction of the PATH scores for the same set of HMM models referred to in FIG. 8.
- the PATH scores are determined using Equation 3 which is set forth below.
- the graphs show normalized PATH scores (1002) with respect to time (1004) up to time Tmax (1006).
- the eleven graphs (1008, 1010, 1012, 1014, 1016, 1018, 1020, 1022, 1024, 1026, and 1028) represent the eleven digit models "zero" through "nine" and "oh".
- FIG. 11 is a graphical depiction of the ML-PATH scores for the same set of HMM models referred to in FIG. 8.
- the ML-PATH scores are determined using Equation 4 which is set forth below.
- the graphs show normalized ML-PATH scores (1102) with respect to time (1104) up to time Tmax (1106).
- the eleven graphs (1108, 1110, 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128) represent the eleven digit models "zero" through "nine" and "oh".
- FIG. 11 demonstrates that the ML-PATH metric can be used to identify the correct HMM model when the standard ML metric failed to do so.
- Score paths: Plots of the ML scores of HMM models versus time are referred to herein as "score paths" or "paths", examples of which are shown in FIG. 8. Choosing the model having the best ML score at a particular decision time does not guarantee correct correspondence with the spoken input.
- the "score path" described by the ML score versus time plot of the "correct" HMM model, i.e., the model corresponding to the spoken input, usually exhibits less overall deviance from the "optimum" path than the score paths of the "incorrect" models.
- the "optimum" path is a straight horizontal line from observation time 0 to time Tmax (806), indicating that a given model was the best ML choice for every input observation. All other score paths are necessarily longer.
- the "correct" HMM model approximates the "optimum" score path more closely than all of the other models.
- the "correct" model is the ML choice for the majority of the input observations and has the shortest overall score path from time 0 to time Tmax (806).
- "shortest" refers to a function related to the actual length of the score path. Sequences of acoustic event observations which make the "correct" model less probable for short periods are not given undue weight in choosing the best model.
- the ML-PATH metric is a novel heuristic means of weighting the absolute HMM ML scores at some observation time t in such a way that the past overall behavior, i.e., "best average", of the ML scores has influence over the best model choice.
- FIG. 8 displays the normalized maximum likelihood accumulated log probability scores as a function of time for each of 11 HMM digit models as determined by a typical HMM recognizer. The digit models are the words "zero" through "nine" and "oh".
- the ML scores were generated by a HMM word-based speech recognition system.
- Each Hidden Markov Model consists of a number of states. Each state may be represented by transition probabilities and a number of speech feature observation probabilities.
- model probabilities are typically stored in logarithmic form.
- the log probabilities of the observations and transitions are summed in such a manner as to maximize the total accumulated probability for each HMM.
- the accumulated log probabilities are normalized by the "best", i.e., most positive, score of any model at each time tick, typically a 10-20 msec interval, so that the normalized "best" ML overall score at any time is 0.0, the maximum ordinate for the score plots in FIG. 8.
- ML score path "length" refers to a function of the ML score path values and is not necessarily a geometric distance. HMM models which on average are less likely in the ML sense will have "longer" paths than models which are more likely. In the ideal case, the shortest path is realized when the "correct" model is the maximum likelihood choice at every time tick.
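The per-tick score normalization described above can be sketched as (a hedged illustration; the dictionary representation of per-model scores is an assumption):

```python
def normalize_ml_scores(scores):
    """Shift accumulated log-probability scores so the best (most positive)
    model scores 0.0 at this time tick and all others are <= 0.0."""
    best = max(scores.values())
    return {m: s - best for m, s in scores.items()}
```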
- a measure of the score path "length", herein called the PATH score, is obtained by summing the squared differences of the maximum likelihood scores across each time period, as expressed functionally in Equation 3.
- Equation 3, the PATH score: PATH_k(t) = SUM[i=1..t] (ML_k(i) - ML_k(i-1))^2, where ML_k(i) is the best ML probability score for model k at time i and PATH_k(t) is the relative score path "length", i.e., the PATH score, of model k at time t.
- Equation 4, the ML-PATH hybrid metric: MLPATH_k(t) = SUM[i=1..t] (ML_k(i) * PATH_k(i))^2, where the index of the model with the best ML-PATH score is the k that minimizes MLPATH_k(Tmax).
- the recognition decision is thus made on the basis of a model's relative ML-PATH score, i.e., lowest score, rather than on the ML score alone.
- the square of the product in Equation 4 can be replaced with the absolute value of the product.
- the ML-PATH scores for this example are shown in FIG. 11.
- Model 0 (1108) has the lowest or best score and is chosen as the model which best represents the spoken input word.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002189249A CA2189249C (en) | 1995-03-29 | 1996-01-29 | Method, apparatus, and radio for optimizing hidden markov model speech recognition |
EP96910297A EP0764319A4 (en) | 1995-03-29 | 1996-01-29 | Method, apparatus, and radio for optimizing hidden markov model speech recognition |
AU53531/96A AU681058B2 (en) | 1995-03-29 | 1996-01-29 | Method, apparatus, and radio for optimizing hidden markov model speech recognition |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/413,146 US5617509A (en) | 1995-03-29 | 1995-03-29 | Method, apparatus, and radio optimizing Hidden Markov Model speech recognition |
US08/413,146 | 1995-03-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1996030895A1 true WO1996030895A1 (en) | 1996-10-03 |
Family
ID=23636036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1996/000968 WO1996030895A1 (en) | 1995-03-29 | 1996-01-29 | Method, apparatus, and radio for optimizing hidden markov model speech recognition |
Country Status (6)
Country | Link |
---|---|
US (1) | US5617509A (en) |
EP (1) | EP0764319A4 (en) |
CN (1) | CN1150490A (en) |
AU (1) | AU681058B2 (en) |
CA (1) | CA2189249C (en) |
WO (1) | WO1996030895A1 (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5778341A (en) * | 1996-01-26 | 1998-07-07 | Lucent Technologies Inc. | Method of speech recognition using decoded state sequences having constrained state likelihoods |
US5970446A (en) | 1997-11-25 | 1999-10-19 | At&T Corp | Selective noise/channel/coding models and recognizers for automatic speech recognition |
SE9903553D0 (en) * | 1999-01-27 | 1999-10-01 | Lars Liljeryd | Enhancing conceptual performance of SBR and related coding methods by adaptive noise addition (ANA) and noise substitution limiting (NSL) |
US6542866B1 (en) * | 1999-09-22 | 2003-04-01 | Microsoft Corporation | Speech recognition method and apparatus utilizing multiple feature streams |
US7110947B2 (en) * | 1999-12-10 | 2006-09-19 | At&T Corp. | Frame erasure concealment technique for a bitstream-based feature extractor |
US6629073B1 (en) | 2000-04-27 | 2003-09-30 | Microsoft Corporation | Speech recognition method and apparatus utilizing multi-unit models |
US6662158B1 (en) * | 2000-04-27 | 2003-12-09 | Microsoft Corporation | Temporal pattern recognition method and apparatus utilizing segment and frame-based models |
AU2000276394A1 (en) * | 2000-09-30 | 2002-04-15 | Intel Corporation | Method and system for generating and searching an optimal maximum likelihood decision tree for hidden markov model (hmm) based speech recognition |
GB2370401A (en) * | 2000-12-19 | 2002-06-26 | Nokia Mobile Phones Ltd | Speech recognition |
US20030187813A1 (en) * | 2002-03-26 | 2003-10-02 | Goldman Neal D. | System and method for identifying relationship paths to a target entity |
US7366666B2 (en) * | 2003-10-01 | 2008-04-29 | International Business Machines Corporation | Relative delta computations for determining the meaning of language inputs |
US7970613B2 (en) | 2005-11-12 | 2011-06-28 | Sony Computer Entertainment Inc. | Method and system for Gaussian probability data bit reduction and computation |
US7778831B2 (en) * | 2006-02-21 | 2010-08-17 | Sony Computer Entertainment Inc. | Voice recognition with dynamic filter bank adjustment based on speaker categorization determined from runtime pitch |
US8010358B2 (en) * | 2006-02-21 | 2011-08-30 | Sony Computer Entertainment Inc. | Voice recognition with parallel gender and age normalization |
FI20086260A (en) | 2008-12-31 | 2010-09-02 | Teknillinen Korkeakoulu | A method for finding and identifying a character |
US8442829B2 (en) * | 2009-02-17 | 2013-05-14 | Sony Computer Entertainment Inc. | Automatic computation streaming partition for voice recognition on multiple processors with limited memory |
US8788256B2 (en) * | 2009-02-17 | 2014-07-22 | Sony Computer Entertainment Inc. | Multiple language voice recognition |
US8442833B2 (en) * | 2009-02-17 | 2013-05-14 | Sony Computer Entertainment Inc. | Speech processing with source location estimation using signals from two or more microphones |
US9153235B2 (en) | 2012-04-09 | 2015-10-06 | Sony Computer Entertainment Inc. | Text dependent speaker recognition with long-term feature based on functional data analysis |
US9183830B2 (en) * | 2013-11-01 | 2015-11-10 | Google Inc. | Method and system for non-parametric voice conversion |
US9177549B2 (en) * | 2013-11-01 | 2015-11-03 | Google Inc. | Method and system for cross-lingual voice conversion |
US9311430B2 (en) * | 2013-12-16 | 2016-04-12 | Mitsubishi Electric Research Laboratories, Inc. | Log-linear dialog manager that determines expected rewards and uses hidden states and actions |
WO2016090557A1 (en) * | 2014-12-09 | 2016-06-16 | 华为技术有限公司 | Method for detecting sending sequence, receiver and receiving device |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4348553A (en) * | 1980-07-02 | 1982-09-07 | International Business Machines Corporation | Parallel pattern verifier with dynamic time warping |
US5440662A (en) * | 1992-12-11 | 1995-08-08 | At&T Corp. | Keyword/non-keyword classification in isolated word speech recognition |
Non-Patent Citations (1)
Title |
---|
See also references of EP0764319A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP0764319A1 (en) | 1997-03-26 |
CN1150490A (en) | 1997-05-21 |
CA2189249C (en) | 2001-04-10 |
CA2189249A1 (en) | 1996-10-03 |
AU681058B2 (en) | 1997-08-14 |
EP0764319A4 (en) | 1998-12-30 |
US5617509A (en) | 1997-04-01 |
AU5353196A (en) | 1996-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5617509A (en) | Method, apparatus, and radio optimizing Hidden Markov Model speech recognition | |
US6260013B1 (en) | Speech recognition system employing discriminatively trained models | |
US5963903A (en) | Method and system for dynamically adjusted training for speech recognition | |
US5937384A (en) | Method and system for speech recognition using continuous density hidden Markov models | |
EP2216775B1 (en) | Speaker recognition | |
US6542866B1 (en) | Speech recognition method and apparatus utilizing multiple feature streams | |
US5857169A (en) | Method and system for pattern recognition based on tree organized probability densities | |
US20040186714A1 (en) | Speech recognition improvement through post-processsing | |
JP4531166B2 (en) | Speech recognition method using reliability measure evaluation | |
US6076053A (en) | Methods and apparatus for discriminative training and adaptation of pronunciation networks | |
KR101120765B1 (en) | Method of speech recognition using multimodal variational inference with switching state space models | |
WO1998000834A9 (en) | Method and system for dynamically adjusted training for speech recognition | |
JPH0372998B2 (en) | ||
US9280979B2 (en) | Online maximum-likelihood mean and variance normalization for speech recognition | |
US6253178B1 (en) | Search and rescoring method for a speech recognition system | |
Knill et al. | Fast implementation methods for Viterbi-based word-spotting | |
WO2004072947A2 (en) | Speech recognition with soft pruning | |
US20080140399A1 (en) | Method and system for high-speed speech recognition | |
US20040254790A1 (en) | Method, system and recording medium for automatic speech recognition using a confidence measure driven scalable two-pass recognition strategy for large list grammars | |
Rose et al. | A user-configurable system for voice label recognition | |
Ynoguti et al. | A comparison between HMM and hybrid ANN-HMM-based systems for continuous speech recognition | |
Ney | Philips GmbH Forschungslaboratorien D-52066 Aachen, Germany | |
Nose et al. | N-best vector quantization for isolated word speech recognition | |
GB2463908A (en) | Speech recognition utilising a hybrid combination of probabilities output from a language model and an acoustic model. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 96190239.6 Country of ref document: CN |
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU CA CN |
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
WWE | Wipo information: entry into national phase |
Ref document number: 2189249 Country of ref document: CA |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 1996910297 Country of ref document: EP |
WWP | Wipo information: published in national office |
Ref document number: 1996910297 Country of ref document: EP |
WWW | Wipo information: withdrawn in national office |
Ref document number: 1996910297 Country of ref document: EP |