WO1996013830A1 - Decision tree classifier designed using hidden markov models - Google Patents

Decision tree classifier designed using hidden markov models

Info

Publication number
WO1996013830A1
WO1996013830A1 PCT/US1995/013416 US9513416W WO9613830A1 WO 1996013830 A1 WO1996013830 A1 WO 1996013830A1 US 9513416 W US9513416 W US 9513416W WO 9613830 A1 WO9613830 A1 WO 9613830A1
Authority
WO
WIPO (PCT)
Prior art keywords
decision tree
speech
hidden markov
word
utterances
Prior art date
Application number
PCT/US1995/013416
Other languages
French (fr)
Inventor
Jeffrey S. Sorensen
Original Assignee
Dictaphone Corporation (U.S.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dictaphone Corporation (U.S.) filed Critical Dictaphone Corporation (U.S.)
Priority to JP8514641A priority Critical patent/JPH10509526A/en
Priority to EP95937519A priority patent/EP0789902A4/en
Priority to AU39608/95A priority patent/AU3960895A/en
Publication of WO1996013830A1 publication Critical patent/WO1996013830A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs


Abstract

A decision tree classifier (20) is designed using the hidden Markov model to yield a final classifier that can be implemented without using any mathematical operations. The apparatus of this invention moves all of the mathematical calculations to the construction of the decision tree. Once this is completed, the decision tree can be implemented using only logical operations such as memory addressing and binary comparison operations. This simplicity allows the decision tree to be implemented in a simple hardware form using conventional gates.

Description

DECISION TREE CLASSIFIER DESIGNED USING HIDDEN MARKOV
MODELS
Field Of The Invention
This invention relates to pattern classification, and more particularly, to a method and apparatus for classifying unknown speech utterances into specific word categories.
Background Of The Invention
Speech recognition is the process of analyzing an acoustic speech signal to identify the linguistic message or utterance that was intended so that a machine can correctly respond to spoken commands. Fluent conversation with a machine is difficult because of the intrinsic variability and complexity of speech. Speech recognition difficulty is also a function of vocabulary size, confusability of words, signal bandwidth, noise, frequency distortions, the population of speakers that must be understood, and the form of speech to be processed.
A speech recognition device requires the transformation of the continuous signal into discrete representations which may be assigned proper meanings, and which, when comprehended, may be used to effect responsive behavior. The same word spoken by the same person on two successive occasions may have different characteristics. Pattern classifications have been developed to help in determining which word has been uttered.
Several methods of pattern classification have been utilized in the field of speech recognition. Currently, the hidden Markov model is the most popular statistical model for speech. The hidden Markov model characterizes speech signals as a stochastic state, or random distribution, machine with different characteristic distributions of speech signals associated with each state. Thus, a speech signal can be viewed as a series of acoustic sounds, where each sound has a particular pattern of harmonic features. However, there is not a one-to-one relationship between harmonic patterns and speech units. Rather, there is a random and statistical relationship between a particular speech sound and a particular harmonic pattern. In addition, the duration of the sounds in speech does not unduly affect the recognition of whole words. The hidden Markov model captures both of these aspects of speech: each state has a characteristic distribution of harmonic patterns, and the transitions from state to state describe the durational aspects of each speech sound.
The algorithms for designing hidden Markov models from a collection of sample utterances of words and sounds are widely known and disclosed in a book by Lawrence R. Rabiner and Biing Hwang Juang entitled "Fundamentals of Speech Recognition," Prentice-Hall, Inc., Englewood Cliffs, 1993, which is herein incorporated by reference. A method commonly referred to as Baum-Welch re-estimation allows one to continually refine models of spoken words. Once the statistical parameters of the model have been estimated, a simple formula exists for computing the probability of a given utterance being produced by the trained model. This latter algorithm is used in the design of isolated word speech recognition systems. By designing a collection of models, one for each word, one can then use the generated probability estimates as the basis for deciding the most likely word model to match to a given utterance.
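As an illustration of the probability computation described above, the following sketch shows the standard forward algorithm for a discrete-output hidden Markov model. The array shapes, parameter names, and NumPy usage are illustrative assumptions, not details taken from the disclosure.

```python
import numpy as np

def forward_probability(obs, A, B, pi):
    """Probability of a discrete observation sequence under one word model.

    obs : sequence of vector-quantizer symbol indices
    A   : (N, N) state transition matrix, A[i, j] = P(next state j | state i)
    B   : (N, Q) output distribution matrix, B[i, k] = P(symbol k | state i)
    pi  : (N,) initial state distribution
    """
    alpha = pi * B[:, obs[0]]                 # initialize with the first symbol
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]    # propagate states, absorb next symbol
    return alpha.sum()                        # sum over all possible final states
```

An isolated word recognizer of the kind shown in FIG. 1 would evaluate this quantity once per word model and keep the model with the largest result.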
The decision tree classifier is another pattern classification technique. Decision tree classifiers imply a sequential method of determining which class to assign to a particular observation. Decision tree classifiers are most commonly used in the field of artificial intelligence for medical diagnosis and in botany for designing taxonomic guides. A decision tree can be described as a guide to asking a series of questions, where the later questions asked depend upon the answers to earlier questions. For example, in developing a guide to identifying bird species, a relevant early question is the color of the bird.
Decision tree design has proven to be a complex problem. The optimal decision tree classifier for a particular classification task can only be found by considering all possible relevant questions and all possible orders of asking them. This is a computationally impossible task even for situations where a small number of categories and features exist. Instead, methods of designing near-optimal decision trees have been proposed using the measurements defined by the field of information theory. Earlier attempts at designing decision tree classifiers for speech recognition using conventional methods proved unsuccessful. This is because an unrealistically vast quantity of training utterances would be required for accurate characterization of speech signals. This is due to the fact that the decision tree conditions questions based upon the responses to earlier questions. In any finite training set, and with each question, the number of examples of a particular utterance meeting all of the necessary criteria imposed by earlier questions dwindles rapidly. Thus, as observations accumulate with deeper nodes of the decision tree, the estimates used in designing the decision tree become increasingly inaccurate.
For the application of isolated word recognition, a specific class of hidden Markov models is employed, commonly referred to as discrete output hidden Markov models. For discrete output hidden Markov models, the spoken words are represented by a sequence of symbols from an appropriately defined alphabet of symbols. The method used to transform a set of acoustic patterns into a set of discrete symbols is known in the art as vector quantization.
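The following sketch illustrates one common form of vector quantization, nearest-neighbour assignment against a trained codebook. The codebook size, distance measure, and function names are assumptions for illustration only, not the specific quantizer of the disclosure.

```python
import numpy as np

def quantize(features, codebook):
    """Map each frame's feature vector to the index of its nearest codebook entry.

    features : (T, D) array, one row of features per analysis frame
    codebook : (Q, D) array of centroids estimated from training speech
    returns  : length-T array of integer symbols in {0, ..., Q-1}
    """
    # Squared Euclidean distance from every frame to every centroid
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)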
Prior art implementations of speech recognition systems using hidden Markov models considered the contribution of all parts of the utterance simultaneously. The prior art required that the feature extraction and vector quantization steps be performed on the entire utterance before classification could begin. In addition, hidden Markov model computations require complex mathematical operations in order to estimate the probabilities of particular utterance patterns. These operations include the use of the complex representation of floating point numbers as well as large quantities of multiplications, divisions, and summations.
Summary Of The Invention
This invention overcomes the disadvantage of the prior art by providing a speech recognition system that does not use complex mathematical operations and does not take a great deal of time to classify utterances into word categories. A decision tree classifier, designed using the hidden Markov model, yields a final classifier that can be implemented without using any mathematical operations. This invention moves all of the mathematical calculations to the construction, or design, of the decision tree. Once this is completed, the decision tree can be implemented using only logical operations such as memory addressing and binary comparison operations. This simplicity allows the decision tree to be implemented directly in a simple hardware form using conventional logic gates or in software as a greatly reduced set of computer instructions. This differs greatly from previous algorithms which required that a means of mathematical calculation of probabilities be provided in the classification system.
This invention overcomes the problems of earlier decision tree classifier methods by introducing the hidden Markov modeling technique as an intermediate representation of the training data. The first step is obtaining a collection of examples of speech utterances. From these examples, hidden Markov models corresponding to each word are trained using the Baum-Welch method.
Once the models of the speech utterances have been trained, the training speech is discarded. Only the statistical models that correspond to each word are used in designing the decision tree. The statistical models are considered a smoothed version of the training data. This smoothed representation helps the decision tree design avoid over-specialization. The hidden Markov model imposes a rigid parametric structure on the probability space, preventing variability in the training data from ruining the tree design process. Estimates for unobserved utterances are inferred based on their similarity to other examples in the training data set.
In order to employ the information theoretic method of designing decision trees, it is necessary to derive new methods for estimating the probability values for specific sequences of acoustically labeled sounds. These new methods differ from earlier hidden Markov model algorithms in that they must yield probability values for sequences of sounds that are only partially specified. From a probabilistic perspective, the solution is to sum up the probabilities for all possible sequences of sounds for all sounds not specified. However, this method is computationally prohibitive. Instead, a modification to the hidden Markov model algorithm known as the forward algorithm can be employed to compute the probabilities within a reasonable amount of time. Using the techniques discussed above, it is possible to design the decision tree before the final classification task. This is desirable, as this eliminates (virtually) all computations from the classification task.
Brief Description Of The Drawings
FIG. 1 is a block drawing of a speech recognition system that was utilized by the prior art;
FIG. 2 is a block diagram of the apparatus of this invention;
FIG. 3 is the decision tree;
FIG. 4 is a table of the time index and corresponding vector quantizer symbols for sample utterances; and FIG. 5 is a block diagram of a decision tree design algorithm.
Description Of The Preferred Embodiment
Referring now to the drawings in detail and more particularly to FIG. 1: the reference character 11 represents a prior art speech recognition system for the words "yes", "no" and "help". The words "yes", "no" and "help" have been selected for illustrative purposes. Any words or any number of words may have been selected. Speech recognition system 11 comprises: a feature extractor 12; a vector quantizer 13; a hidden Markov Model (HMM) for the word "yes" 14; a hidden Markov Model (HMM) for the word "no" 15; a hidden Markov Model (HMM) for the word "help" 16; and a choose maximum 17. The words "yes", "no" and "help" may be spoken into speech input device 18. Typically, speech input device 18 will consist of a microphone, an amplifier and an A/D converter (not shown). The output of device 18 will be digitally sampled speech at a rate of approximately 12 kHz and a 16 bit resolution. The output of device 18 is coupled to the input of feature extractor 12. Feature extractor 12 consists of a frame buffer and typical speech recognition preprocessing. Extractor 12 is disclosed in a book by Lawrence R. Rabiner and Biing Hwang Juang entitled "Fundamentals of Speech Recognition," Prentice-Hall, Inc., Englewood Cliffs, 1993, which is herein incorporated by reference. An implementation entails the use of a frame buffer of 45 ms with 30 ms of overlap between frames. Each frame is processed using a pre-emphasis filter, a Hamming windowing operation, an autocorrelation measurement with order 10, and calculation of the Linear Prediction Coefficients (LPC's), followed by calculation of the LPC cepstral parameters. Cepstral parameters provide a complete source-filter characterization of the speech. The cepstral parameters are a representation of the spectral contents of the spoken utterance and contain information that includes the location and bandwidths of the formants of the vocal tract of the speaker. An energy term provides additional information about the signal amplitude. The output of extractor 12 is the aforementioned LPC cepstral parameters of order 10 and a single frame energy term, which are coupled to the input of vector quantizer 13.
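A rough sketch of the per-frame processing chain just described (pre-emphasis, Hamming window, order-10 autocorrelation, LPC analysis via the Levinson-Durbin recursion, the LPC-to-cepstrum recursion, and an energy term) might look as follows. The pre-emphasis constant and other numerical details are illustrative assumptions rather than values taken from the disclosure.

```python
import numpy as np

def lpc_cepstra(frame, order=10):
    """Order-10 LPC cepstral parameters plus a log-energy term for one frame."""
    frame = np.append(frame[0], frame[1:] - 0.95 * frame[:-1])   # pre-emphasis (0.95 assumed)
    frame = frame * np.hamming(len(frame))                        # Hamming window
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    # Levinson-Durbin recursion for the LPC coefficients a[1..order]
    a, err = np.zeros(order + 1), r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= (1.0 - k * k)
    # LPC-to-cepstrum recursion
    c = np.zeros(order + 1)
    for n in range(1, order + 1):
        c[n] = a[n] + sum((m / n) * c[m] * a[n - m] for m in range(1, n))
    return np.append(c[1:], np.log(r[0] + 1e-10))   # 10 cepstra + frame energy term
```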
Vector quantizer 13 utilizes the algorithm disclosed by Rabiner and Juang in "Fundamentals of Speech Recognition" to map each collection of 11 features received from extractor 12 into a single integer. The above vector quantizer 13 is developed using a large quantity of training speech data. The integer sequence that constitutes a discrete version of the speech spectrum is then coupled to the inputs of HMM for the word "yes" 14, HMM for the word "no" 15 and HMM for the word "help" 16. HMM 14, HMM 15 and HMM 16 each individually contains a set of hidden Markov model parameters. When a sequence of integers representing a single spoken word is presented, the mathematical operations necessary to compute the observation sequence probability are performed. In the notation of speech recognition, this algorithm is known as the forward algorithm. HMM 14, HMM 15 and HMM 16 each performs its own computations. These computations are time consuming. The outputs of HMM 14, HMM 15 and HMM 16 are coupled to the input of choose maximum 17. Choose maximum 17 is a conventional recognizer that utilizes a computer program to determine whether HMM 14, HMM 15, or HMM 16 has the highest probability estimate. If HMM 14 had the highest probability estimate, then the system concludes the word "yes" was spoken into device 18, and if HMM 15 had the highest probability estimate, the word "no" was spoken into device 18. The output of choose maximum 17 is utilized to effect some command input to other systems to place functions in these other systems under voice control.
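The "choose maximum" step can be sketched as a simple argmax over the per-word models, reusing the forward_probability function from the earlier sketch. The dictionary layout and names are illustrative assumptions.

```python
def recognize(obs, word_models):
    """Prior-art style recognizer: score every word model, keep the best (sketch).

    obs         : sequence of vector-quantizer symbols for one utterance
    word_models : dict word -> (A, B, pi), e.g. {"yes": ..., "no": ..., "help": ...}
    """
    scores = {word: forward_probability(obs, *params) for word, params in word_models.items()}
    return max(scores, key=scores.get)     # the "choose maximum" step of FIG. 1
```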
FIG. 2 is a block diagram of the apparatus of this invention. The output of speech input device 18 is coupled to the input of feature extractor 12 and the output of feature extractor 12 is coupled to the input of vector quantizer 13. The output of vector quantizer 13 is coupled to the input of decision tree classifier 20.
Decision tree classifier 20 has a buffer that contains the vector quantizer values of the entire spoken utterance, i.e., "yes", "no" or "help". The decision tree classifier 20 may be represented by the data contained in the table shown in FIG. 3 and the associated procedure for accessing the aforementioned table. The table shown in FIG. 3 is generated using the statistical information collected in the hidden Markov models for each of the words in the vocabulary, i.e., "yes", "no" and "help". FIG. 4 contains several examples of spoken utterances for illustrative purposes, and will be used in the discussions that follow. The method used to classify utterances with decision tree classifier 20 involves a series of inspections of the symbols at specific time instances, as guided by the information in FIG. 3. We begin with step S0 for each classification task. This table instructs us to inspect time index 3 and to proceed to other steps based on the symbols found at time index 3, or to announce a final classification when no further inspections are required. We will demonstrate this procedure for three specific utterances chosen from FIG. 4.
We always begin the decision tree shown in FIG. 3 at step S0. This line of the decision tree tells us to look at the vector quantizer symbol contained at time index 3 (FIG. 4) for any spoken word. Let us assume that the word "no" was spoken. The symbol at time index 3 for the first word "no" is 7. The column for symbol 7 (FIG. 3) at time 3 tells us to proceed to step S3. This step tells us to look at time index 6 (FIG. 4). The word "no" has a symbol 7 at time index 6. The symbol 7 column of step S3 (FIG. 3) tells us to proceed to step S8. Step S8 tells us to refer to time index 9 (FIG. 4). At time index 9 for the word "no", we find a symbol value of 8, which informs us to go to column 8 of step S8 (FIG. 3). This tells us to classify the answer as a "no", which is correct.
If the word "yes" was chosen at step SO of FIG 3 This line of the table tells us to look at time index 3 (FIG 4) for our chosen word The symbol at time index 3 for the first sample utterance for the word "yes" is 7 The column for symbol 7 (FIG 3) at time 3 tells us to proceed to step S3 This step tells us to look at time index 6 (FIG 4) The word "yes" has a symbol 3 at time index 6 The symbol 3 column of step S3 (FIG 3) tells us to classify this input correctly, as the word "yes '
If the word "help" was spoken at step SO of FIG 3, this line of the table tells us to look at time index 3 (FIG 4) for our chosen word The symbol at time index 3 for the word ' help" is 6 The column for symbol 6 (FIG 3) at time 3 tells us to proceed to step S2 This step tells us to look at time index 4 (FIG 4) The word "help" has a symbol 6 at time index 4 The symbol 6 column of step S2 (Fig 3) tells us that the word "help should be selected
The "EOW" column of FIG. 3 tells us where to go when the end of a word has been reached at the time index requested for observation. The label "n/a" was included for utterances too short to qualify as any of the words in the training set.
The decision table shown in FIG. 3 can be used to correctly classify all of the examples presented in FIG. 4. However, the above decision tree and examples would not perform well on a large data set, as the tree is far too small to capture the diversity of potential utterances. Real implementations of the algorithm, which are hereinafter described, would generate tables with thousands of lines and at least several dozen columns. The above example is meant to illustrate the output of the hidden Markov model conversion to decision tree algorithm, and the procedure implemented in decision tree classifier 20 that uses the resultant data table.
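The table-walking procedure just described needs only memory addressing and comparisons at classification time. The sketch below shows one possible in-memory encoding of such a table; the entries mirror the toy example above and are not the actual FIG. 3 data, and the time indexing convention is assumed.

```python
# Each step maps to (time index to inspect, {symbol or "EOW": next step or word label}).
# Toy encoding for illustration; a real table would have thousands of lines.
TABLE = {
    "S0": (3, {7: "S3", 6: "S2", "EOW": "n/a"}),
    "S2": (4, {6: "help", "EOW": "n/a"}),
    "S3": (6, {3: "yes", 7: "S8", "EOW": "n/a"}),
    "S8": (9, {8: "no", "EOW": "n/a"}),
}

def classify(symbols, table=TABLE, start="S0"):
    """Walk the decision table over a buffered utterance of vector-quantizer symbols."""
    step = start
    while step in table:                      # internal entries name a further step
        time_index, branches = table[step]
        if time_index >= len(symbols):        # utterance ended before this time index
            step = branches["EOW"]
        else:                                 # otherwise branch on the observed symbol
            step = branches.get(symbols[time_index], "n/a")
    return step                               # remaining values are final word labels
```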
FIG. 5 is a flow chart of a decision tree design algorithm. General decision tree design principles are disclosed by Rodney M. Goodman and Padhraic Smyth in their article entitled "Decision tree design from a communications theory standpoint", IEEE Transactions on Information Theory, Vol. 34, No. 5, pp. 979-997, Sept. 1988, which is herein incorporated by reference. The decision tree is designed using an iterative process. The precise rule applied for the expansion of each branch of the decision tree is specified below. The desired termination condition depends highly on the specific needs of the application, and will be discussed in broader terms.
In the preferred implementation, a greedy algorithm is specified where the termination condition is the total number of nodes in the resulting tree. This is directly related to the amount of memory available for the storage of the classifier data structures. Thus, this termination condition is well suited to most practical implementations.
In order to design the tree, a measure of the mutual information between an arbitrary set of observations and the word category is needed. To find the mutual information, it is necessary to compute the probability of observing a partially specified sequence of vector quantizer labels. This is accomplished using a modified version of the hidden Markov model equations, contained in block 21 for the word "yes", block 22 for the word "no" and block 23 for the word "help", normally referred to as the forward algorithm. This modification involves removing the terms in the forward algorithm associated with unspecified vector quantizer outputs and replacing their probability values with one (1) in the conventional hidden Markov model algorithm.
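The modification just described can be illustrated with the following sketch, which reuses the forward recursion from the earlier example but skips the output-probability factor at any time index whose symbol is not specified, so that it contributes a factor of one. The function and argument names are again only illustrative.

```python
import numpy as np

def partial_forward_probability(specified, length, A, B, pi):
    """P(partially specified observation set | word model) via the modified forward pass.

    specified : dict mapping time index -> observed symbol; other indices are unspecified
    length    : total number of frames assumed for the utterance
    A, B, pi  : discrete-output hidden Markov model parameters, as before
    """
    alpha = pi.copy()
    for t in range(length):
        if t > 0:
            alpha = alpha @ A                   # state-transition step
        if t in specified:
            alpha = alpha * B[:, specified[t]]  # absorb the specified symbol
        # unspecified time indices contribute a factor of one, so nothing is multiplied in
    return alpha.sum()
```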
This algorithm is contained in box 24 and is repeated iteratively, converting the terminal, or leaf, nodes of the decision tree into internal nodes one at a time, and adding new leaf nodes in proportion to the number of entries in the vector quantizer. This continues until the prespecified maximum number of tree nodes has been reached. The resulting data structures can then be organized into the decision tree as has been shown in FIG. 3, and stored in the decision tree classifier 20. The following decision tree design algorithm may be utilized in block 24.
Before the decision tree algorithm can begin, a collection of discrete output hidden Markov models must be trained for each word in the recognition vocabulary using a suitable training data set. The hidden Markov model parameters for a vocabulary of size W will be denoted as
A_w: state transition matrix for word w
B_w: state output distribution matrix for word w
π_w: initial state distribution for word w
for w = 1, 2, ..., W.
We further assume that the number of different outputs from the vector quantizer is denoted by Q, also known as the vector quantizer codebook size. In addition, we will define the notation for a set of vector quantizer outputs at different time indices as
X = {(t_1, s_1), (t_2, s_2), ..., (t_n, s_n)}, where the set X consists of pairs of observed symbols and their associated time indices.
The tree design algorithm proceeds as follows: Two sets of sets are defined. The first set, I, is the collection of observation sets associated with internal tree nodes. The second set, L, is the collection of observation sets associated with leaf nodes of the tree. Initially, the set I is set to the empty set, and L is a set that contains one element, and that element is the empty set. The main loop of the algorithm moves elements from the set L to the set I one at a time until the cardinality, or number of elements, of set I reaches the pre-specified maximum number.
The determination of which set contained in L should be moved to set I at any iteration is made using an information theoretic criterion. Here, the goal is to reduce the total entropy of the tree specified by the collection of leaf nodes in L. The optimal greedy selection for the collection of observation sets is given by
[Equations reproduced only as images in the published application: the optimal greedy selection of a set X from L, followed by the definitions of the entropy and probability terms, summed over the words w, from which it is computed.]
The probability of observing a collection of vector quantizer outputs conditioned on the word w is computed using the hidden Markov model equations. One expression of this calculation is given by
P(X | w) = Σ over all state sequences q of π_w[q_1] · Π_t B_w[q_t, x(t)] · Π_t A_w[q_t, q_(t+1)],
which can be efficiently calculated using a slightly modified version of the hidden Markov model equation commonly referred to as the forward algorithm. With each iteration, the optimal set X contained in L is moved to the set I. In addition, the optimal set X is used to compute the optimal time index for further node expansion. This is done using an information theoretic criterion specified by t* = arg min over time indices t of Σ_s H(w | X ∪ {(t, s)}) · P(X ∪ {(t, s)}).
The time index specified by this relationship is used to expand the collection of leaf nodes to include all possible leaf nodes associated with the set X. Each of these new sets, which are designated by
L ← L ∪ {X ∪ {(t, s)}} for all s ∈ {1, 2, ..., Q},
is added to the set L, and the set X is removed from set L. Once this operation is completed, the algorithm repeats by choosing the best element of L to transfer to I, and so on. This algorithm is expressed in pseudocode in FIG. 6.
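Because the selection equations appear only as images in the published text, the following sketch should be read as one plausible rendering of the design loop rather than the disclosed formulas: at each iteration it moves the leaf whose weighted conditional word entropy is largest from L to I, picks the time index that minimizes the expected remaining entropy, and grows one new leaf per vector-quantizer symbol. It assumes the partial_forward_probability sketch given earlier and illustrative word priors.

```python
import math

def design_tree(models, priors, num_symbols, utterance_length, max_internal_nodes):
    """Greedy decision-tree design from per-word hidden Markov models (illustrative sketch).

    models : dict word -> (A, B, pi) discrete-output HMM parameters
    priors : dict word -> prior probability of the word
    """
    internal, leaves = [], [frozenset()]       # the sets I and L; the root is the empty set

    def p_joint(word, obs_set):
        # P(word) * P(observation set | word), using the modified forward algorithm
        return priors[word] * partial_forward_probability(dict(obs_set), utterance_length,
                                                          *models[word])

    def weighted_entropy(obs_set):
        """P(X) * H(word | X): this leaf's contribution to the total tree entropy."""
        joint = {w: p_joint(w, obs_set) for w in models}
        total = sum(joint.values())
        if total <= 0.0:
            return 0.0
        return -sum(p * math.log(p / total) for p in joint.values() if p > 0.0)

    while len(internal) < max_internal_nodes and leaves:
        # Move the leaf carrying the most entropy from L to I (greedy selection)
        best = max(leaves, key=weighted_entropy)
        leaves.remove(best)
        internal.append(best)
        # Pick the time index whose expansion minimizes the expected remaining entropy
        free_times = [t for t in range(utterance_length) if t not in dict(best)]
        t_star = min(free_times, key=lambda t: sum(weighted_entropy(best | {(t, s)})
                                                   for s in range(num_symbols)))
        # Grow one new leaf per vector-quantizer symbol at the chosen time index
        for s in range(num_symbols):
            leaves.append(best | {(t_star, s)})
    return internal, leaves
```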
The above specification describes a new and improved apparatus and method for classifying utterances into word categories. It is realized that the above description may indicate to those skilled in the art additional ways in which the principles of this invention may be used without departing from the spirit. It is, therefore, intended that this invention be limited only by the scope of the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A method of classifying utterances which comprises the steps of: obtaining a collection of examples of speech utterances; training hidden Markov models to obtain statistical models that correspond to individual words to classify the speech utterances; using the statistical models to design a decision tree that represents a classifier of the speech utterances; using the hidden Markov model to compute the probabilities of sounds not uttered; and determining speech utterances with the decision tree.
2. The method claimed in Claim 1, wherein the statistical models that are used to design the decision tree are implemented in a greedy iterative fashion.
3. The method claimed in Claim 2, wherein the recursive greedy fashion of implementing the decision tree design includes the steps of: iteratively expanding entries of the decision tree table using the mutual information between the observed symbols and the word class; and terminating the decision table when the maximum number of entries is reached.
4. The method claimed in Claim 3, wherein the mutual information is obtained in accordance with the equation
[Mutual information equation reproduced only as an image in the published application, expressed in terms of quantities summed over the words w.]
5. The method claimed in Claim 3 further including the step of: using the maximum expected information gain to determine the optimal time index to expand each entry in the decision tree table.
6. A system for classifying utterances which comprises: means for inputting speech; means coupled to said inputting means for transforming the speech into an alphabet of acoustic patterns; and means coupled to said transforming means for classifying the acoustic patterns into word categories by utilizing a decision tree classifier designed using hidden Markov models.
7. The system claimed in Claim 6, wherein said classifying means comprises: a memory containing classification instructions; means coupled to said memory for sequentially accessing said memory based upon the acoustic patterns; and means coupled to said accessing means for outputting the terminal classification decision.
8. The system claimed in Claim 6, wherein said transforming means comprises: a feature extractor for converting the inputted speech signals into frame based acoustical patterns; and a vector quantizer coupled to said feature extractor for mapping the acoustical patterns to a fixed alphabet of symbols.
PCT/US1995/013416 1994-10-26 1995-10-19 Decision tree classifier designed using hidden markov models WO1996013830A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP8514641A JPH10509526A (en) 1994-10-26 1995-10-19 Decision Tree Classifier Designed Using Hidden Markov Model
EP95937519A EP0789902A4 (en) 1994-10-26 1995-10-19 Decision tree classifier designed using hidden markov models
AU39608/95A AU3960895A (en) 1994-10-26 1995-10-19 Decision tree classifier designed using hidden markov models

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US32939394A 1994-10-26 1994-10-26
US329,393 1994-10-26

Publications (1)

Publication Number Publication Date
WO1996013830A1 true WO1996013830A1 (en) 1996-05-09

Family

ID=23285181

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1995/013416 WO1996013830A1 (en) 1994-10-26 1995-10-19 Decision tree classifier designed using hidden markov models

Country Status (5)

Country Link
EP (1) EP0789902A4 (en)
JP (1) JPH10509526A (en)
AU (1) AU3960895A (en)
CA (1) CA2203649A1 (en)
WO (1) WO1996013830A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997008686A2 (en) * 1995-08-28 1997-03-06 Philips Electronics N.V. Method and system for pattern recognition based on tree organised probability densities
EP0849723A2 (en) * 1996-12-20 1998-06-24 ATR Interpreting Telecommunications Research Laboratories Speech recognition apparatus equipped with means for removing erroneous candidate of speech recognition
WO2002029614A1 (en) * 2000-09-30 2002-04-11 Intel Corporation Method and system to scale down a decision tree-based hidden markov model (hmm) for speech recognition
WO2002029612A1 (en) * 2000-09-30 2002-04-11 Intel Corporation Method and system for generating and searching an optimal maximum likelihood decision tree for hidden markov model (hmm) based speech recognition
US20110238410A1 (en) * 2010-03-26 2011-09-29 Jean-Marie Henri Daniel Larcheveque Semantic Clustering and User Interfaces
US8676565B2 (en) 2010-03-26 2014-03-18 Virtuoz Sa Semantic clustering and conversational agents
US9378202B2 (en) 2010-03-26 2016-06-28 Virtuoz Sa Semantic clustering
US9524291B2 (en) 2010-10-06 2016-12-20 Virtuoz Sa Visual display of semantic information
CN113589191A (en) * 2021-07-07 2021-11-02 江苏毅星新能源科技有限公司 Power failure diagnosis system and method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5455889A (en) * 1993-02-08 1995-10-03 International Business Machines Corporation Labelling speech using context-dependent acoustic prototypes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5033087A (en) * 1989-03-14 1991-07-16 International Business Machines Corp. Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5455889A (en) * 1993-02-08 1995-10-03 International Business Machines Corporation Labelling speech using context-dependent acoustic prototypes

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IEEE INTERNATIONAL WORKSHOP ON INTELLIGENT ROBOTS AND SYSTEMS, 05 November 1991, MOURA-PIRES et al., "Design of a Decision Tree With Action", pages 625-626. *
IEEE TRANSACTIONS ON INFORMATION THEORY, Vol. 34, No. 5, 01 September 1988, GOODMAN et al., "Decision Tree Design from a Communication Theory Standpoint", pages 979-982. *
See also references of EP0789902A4 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997008686A2 (en) * 1995-08-28 1997-03-06 Philips Electronics N.V. Method and system for pattern recognition based on tree organised probability densities
WO1997008686A3 (en) * 1995-08-28 1997-04-03 Philips Electronics Nv Method and system for pattern recognition based on tree organised probability densities
EP0849723A2 (en) * 1996-12-20 1998-06-24 ATR Interpreting Telecommunications Research Laboratories Speech recognition apparatus equipped with means for removing erroneous candidate of speech recognition
EP0849723A3 (en) * 1996-12-20 1998-12-30 ATR Interpreting Telecommunications Research Laboratories Speech recognition apparatus equipped with means for removing erroneous candidate of speech recognition
WO2002029614A1 (en) * 2000-09-30 2002-04-11 Intel Corporation Method and system to scale down a decision tree-based hidden markov model (hmm) for speech recognition
WO2002029612A1 (en) * 2000-09-30 2002-04-11 Intel Corporation Method and system for generating and searching an optimal maximum likelihood decision tree for hidden markov model (hmm) based speech recognition
US7472064B1 (en) 2000-09-30 2008-12-30 Intel Corporation Method and system to scale down a decision tree-based hidden markov model (HMM) for speech recognition
US8676565B2 (en) 2010-03-26 2014-03-18 Virtuoz Sa Semantic clustering and conversational agents
US20110238410A1 (en) * 2010-03-26 2011-09-29 Jean-Marie Henri Daniel Larcheveque Semantic Clustering and User Interfaces
US8694304B2 (en) * 2010-03-26 2014-04-08 Virtuoz Sa Semantic clustering and user interfaces
US9196245B2 (en) 2010-03-26 2015-11-24 Virtuoz Sa Semantic graphs and conversational agents
US9275042B2 (en) 2010-03-26 2016-03-01 Virtuoz Sa Semantic clustering and user interfaces
US9378202B2 (en) 2010-03-26 2016-06-28 Virtuoz Sa Semantic clustering
US10360305B2 (en) 2010-03-26 2019-07-23 Virtuoz Sa Performing linguistic analysis by scoring syntactic graphs
US9524291B2 (en) 2010-10-06 2016-12-20 Virtuoz Sa Visual display of semantic information
CN113589191A (en) * 2021-07-07 2021-11-02 江苏毅星新能源科技有限公司 Power failure diagnosis system and method
CN113589191B (en) * 2021-07-07 2024-03-01 郴州雅晶源电子有限公司 Power failure diagnosis system and method

Also Published As

Publication number Publication date
AU3960895A (en) 1996-05-23
EP0789902A1 (en) 1997-08-20
CA2203649A1 (en) 1996-05-09
EP0789902A4 (en) 1998-12-02
JPH10509526A (en) 1998-09-14

Similar Documents

Publication Publication Date Title
EP0619911B1 (en) Children's speech training aid
KR100826875B1 (en) On-line speaker recognition method and apparatus for thereof
JPH11272291A (en) Phonetic modeling method using acoustic decision tree
KR20010102549A (en) Speaker recognition
CN111798846A (en) Voice command word recognition method and device, conference terminal and conference terminal system
EP0789902A1 (en) Decision tree classifier designed using hidden markov models
GB2335064A (en) Linear trajectory models incorporating preprocessing parameters for speech recognition
JP3920749B2 (en) Acoustic model creation method for speech recognition, apparatus thereof, program thereof and recording medium thereof, speech recognition apparatus using acoustic model
Mohanty et al. Isolated Odia digit recognition using HTK: an implementation view
Saha Development of a bangla speech to text conversion system using deep learning
US20020133343A1 (en) Method for speech recognition, apparatus for the same, and voice controller
JP2982689B2 (en) Standard pattern creation method using information criterion
Huda et al. A variable initialization approach to the EM algorithm for better estimation of the parameters of hidden markov model based acoustic modeling of speech signals
JP3029803B2 (en) Word model generation device for speech recognition and speech recognition device
Li Speech recognition of mandarin monosyllables
Dumitru et al. Vowel, Digit and Continuous Speech Recognition Based on Statistical, Neural and Hybrid Modelling by Using ASRS_RL
JPH10254477A (en) Phonemic boundary detector and speech recognition device
Li et al. A comparative study of speech segmentation and feature extraction on the recognition of different dialects
Ney et al. Acoustic-phonetic modeling in the SPICOS system
CN102034474B (en) Method for identifying all languages by voice and inputting individual characters by voice
Sulaiman et al. Development of a Robust Speech-to-Text Algorithm for Nigerian English Speakers
KR100194581B1 (en) Voice dialing system for departmental automatic guidance
Kato et al. Tree‐based clustering for gaussian mixture HMMs
JP3412501B2 (en) Task adaptation device and speech recognition device
Frikha et al. Hidden Markov models (HMMs) isolated word recognizer with the optimization of acoustical analysis and modeling techniques

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LT LU LV MD MG MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TT UA UZ VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE MW SD SZ UG AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2203649

Country of ref document: CA

Ref country code: CA

Ref document number: 2203649

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1995937519

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1995937519

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1995937519

Country of ref document: EP