EP0789902A1 - Decision tree classifiers designed using hidden Markov models - Google Patents

Decision tree classifiers designed using hidden Markov models

Info

Publication number
EP0789902A1
EP0789902A1 (application EP95937519A)
Authority
EP
European Patent Office
Prior art keywords
decision tree
speech
hidden markov
word
utterances
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP95937519A
Other languages
English (en)
French (fr)
Other versions
EP0789902A4 (de)
Inventor
Jeffrey S. Sorensen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dictaphone Corp
Original Assignee
Dictaphone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dictaphone Corp filed Critical Dictaphone Corp
Publication of EP0789902A1 publication Critical patent/EP0789902A1/de
Publication of EP0789902A4 publication Critical patent/EP0789902A4/de
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs

Definitions

  • This invention relates to pattern classification, and more particularly, to a method and apparatus for classifying unknown speech utterances into specific word categories.
  • Speech recognition is the process of analyzing an acoustic speech signal to identify the linguistic message or utterance that was intended so that a machine can correctly respond to spoken commands. Fluent conversation with a machine is difficult because of the intrinsic variability and complexity of speech. Speech recognition difficulty is also a function of vocabulary size, confusability of words, signal bandwidth, noise, frequency distortions, the population of speakers that must be understood, and the form of speech to be processed.
  • A speech recognition device requires the transformation of the continuous signal into discrete representations which may be assigned proper meanings, and which, when comprehended, may be used to effect responsive behavior.
  • The same word spoken by the same person on two successive occasions may have different characteristics. Pattern classification techniques have been developed to help determine which word has been uttered.
  • The hidden Markov model is the most popular statistical model for speech.
  • The hidden Markov model characterizes speech signals as a stochastic state machine, with a different characteristic distribution of speech signals associated with each state.
  • A speech signal can be viewed as a series of acoustic sounds, where each sound has a particular pattern of harmonic features.
  • There is, however, not a one-to-one relationship between harmonic patterns and speech units.
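As a concrete illustration of such a stochastic state machine, the sketch below builds a small discrete-output HMM and samples a symbol sequence from it. The three-state left-to-right topology, the four-symbol alphabet, and all probability values are hypothetical, chosen only for illustration; they are not taken from the patent.

```python
import numpy as np

# Hypothetical 3-state left-to-right model over a 4-symbol alphabet.
A = np.array([[0.6, 0.4, 0.0],        # state-transition probabilities
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.2, 0.1, 0.0],   # per-state symbol distributions
              [0.1, 0.6, 0.2, 0.1],
              [0.0, 0.1, 0.3, 0.6]])
pi = np.array([1.0, 0.0, 0.0])        # always start in the first state

def sample_utterance(length, rng=None):
    """Draw a sequence of observation symbols from the model: walk the
    hidden state chain and emit one symbol per state visited."""
    if rng is None:
        rng = np.random.default_rng(0)
    state = rng.choice(len(pi), p=pi)
    symbols = []
    for _ in range(length):
        symbols.append(int(rng.choice(B.shape[1], p=B[state])))
        state = rng.choice(len(A), p=A[state])
    return symbols
```

Each row of `A` and `B` is a probability distribution, which is what makes the "different characteristic distributions of speech signals associated with each state" concrete.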
  • The decision tree classifier is another pattern classification technique. Decision tree classifiers imply a sequential method of determining which class to assign to a particular observation. Decision tree classifiers are most commonly used in the field of artificial intelligence for medical diagnosis and in botany for designing taxonomic guides.
  • A decision tree can be described as a guide to asking a series of questions, where the later questions asked depend upon the answers to earlier questions. For example, in developing a guide to identifying bird species, a relevant early question is the color of the bird.
  • For the application of isolated word recognition, a specific class of hidden Markov models is employed, commonly referred to as discrete-output hidden Markov models.
  • The spoken words are represented by a sequence of symbols from an appropriately defined alphabet of symbols.
  • The method used to transform a set of acoustic patterns into a set of discrete symbols is known in the art as vector quantization.
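Vector quantization can be sketched as a nearest-neighbour search against a trained codebook of centroids; each feature vector is replaced by the index of its closest centroid. The tiny two-entry codebook in the usage example is illustrative only.

```python
import numpy as np

def vector_quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook
    centroid (Euclidean distance), producing a discrete symbol sequence."""
    frames = np.atleast_2d(np.asarray(frames, dtype=float))      # (T, d)
    codebook = np.atleast_2d(np.asarray(codebook, dtype=float))  # (K, d)
    # Pairwise distances between every frame and every centroid.
    distances = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return distances.argmin(axis=1)
```

For example, `vector_quantize([[0.9, 0.1], [4.8, 5.2]], [[1, 0], [5, 5]])` yields the symbol sequence `[0, 1]`.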
  • Prior art implementations of speech recognition systems using hidden Markov models considered the contribution of all parts of the utterance simultaneously.
  • The prior art required that the feature extraction and vector quantization steps be performed on the entire utterance before classification could begin.
  • Hidden Markov model computations require complex mathematical operations in order to estimate the probabilities of particular utterance patterns. These operations include the use of floating-point number representations as well as large quantities of multiplications, divisions, and summations.
  • This invention overcomes the disadvantage of the prior art by providing a speech recognition system that does not use complex mathematical operations and does not take a great deal of time to classify utterances into word categories.
  • A decision tree classifier designed using the hidden Markov model yields a final classifier that can be implemented without using any mathematical operations.
  • This invention moves all of the mathematical calculations to the construction, or design, of the decision tree. Once this is completed, the decision tree can be implemented using only logical operations such as memory addressing and binary comparisons. This simplicity allows the decision tree to be implemented directly in simple hardware using conventional logic gates, or in software as a greatly reduced set of computer instructions. This differs greatly from previous algorithms, which required that a means of mathematical calculation of probabilities be provided in the classification system.
  • This invention overcomes the problems of earlier decision tree classifier methods by introducing the hidden Markov modeling technique as an intermediate representation of the training data.
  • The first step is obtaining a collection of examples of speech utterances. From these examples, hidden Markov models corresponding to each word are trained using the Baum-Welch method.
  • The training speech is then discarded. Only the statistical models that correspond to each word are used in designing the decision tree.
  • The statistical models are considered a smoothed version of the training data. This smoothed representation helps the decision tree design prevent over-specialization.
  • The hidden Markov model imposes a rigid parametric structure on the probability space, preventing variability in the training data from ruining the tree design process. Estimates for unobserved utterances are inferred based on their similarity to other examples in the training data set.
  • FIG. 1 is a block diagram of a speech recognition system that was utilized by the prior art;
  • FIG. 2 is a block diagram of the apparatus of this invention;
  • FIG. 3 is the decision tree;
  • FIG. 4 is a table of the time index and corresponding vector quantizer symbols for sample utterances; and
  • FIG. 5 is a block diagram of a decision tree design algorithm.
  • In FIG. 1, the reference character 11 represents a prior art speech recognition system for the words "yes", "no" and "help".
  • The words "yes", "no" and "help" have been selected for illustrative purposes; any words, or any number of words, may have been selected.
  • Speech recognition system 11 comprises: a feature extractor 12; a vector quantizer 13; a hidden Markov Model (HMM) for the word “yes” 14; a hidden Markov Model (HMM) for the word “no” 15; a hidden Markov Model (HMM) for the word “help” 16 and a choose maximum 17.
  • Feature extractor 12 consists of a frame buffer and typical speech recognition preprocessing. Extractor 12 is disclosed in a book by Lawrence R. Rabiner and Biing-Hwang Juang entitled "Fundamentals of Speech Recognition," Prentice-Hall, Inc., Englewood Cliffs, 1993, which is herein incorporated by reference. An implementation entails the use of a frame buffer of 45 ms with 30 ms of overlap between frames.
  • Each frame is processed using a pre-emphasis filter, a Hamming windowing operation, an autocorrelation measurement with order 10, calculation of the Linear Prediction Coefficients (LPC's) followed by calculation of the LPC cepstral parameters.
  • Cepstral parameters provide a complete source-filter characterization of the speech.
  • The cepstral parameters are a representation of the spectral contents of the spoken utterance and contain information that includes the locations and bandwidths of the formants of the vocal tract of the speaker.
  • An energy term provides additional information about the signal amplitude.
  • The outputs of extractor 12 are the aforementioned LPC cepstral parameters of order 10 and a single frame energy term, which are coupled to the input of vector quantizer 13.
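The per-frame analysis described above (autocorrelation of order 10, LPC analysis, then LPC-to-cepstrum conversion) can be sketched as follows. A standard Levinson-Durbin recursion and the usual minimum-phase cepstral recursion are assumed here as the conventional way to realize these steps; the patent itself only names the operations.

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation lags r[0..order] of a (windowed) frame."""
    frame = np.asarray(frame, dtype=float)
    return np.array([frame[: len(frame) - k] @ frame[k:] for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the LPC normal equations via the Levinson-Durbin recursion.
    Returns predictor coefficients a with s[n] ~ sum_k a[k] * s[n-1-k]."""
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        k = (r[i + 1] - a[:i] @ r[i:0:-1]) / err   # reflection coefficient
        a_prev = a[:i].copy()
        a[i] = k
        a[:i] = a_prev - k * a_prev[::-1]          # update lower-order coeffs
        err *= 1.0 - k * k                          # prediction error energy
    return a

def lpc_cepstrum(a, n_ceps):
    """Standard LPC-to-cepstrum recursion for predictor coefficients a:
    c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k]."""
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= len(a) else 0.0
        for k in range(1, n):
            if n - k <= len(a):
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

Appending the frame energy (e.g. `r[0]`) to the 10 cepstral coefficients gives the 11-dimensional feature vector passed to the vector quantizer.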
  • Vector quantizer 13 utilizes the algorithm disclosed by Rabiner and Juang in "Fundamentals of Speech Recognition" to map each collection of 11 features received from extractor 12 into a single integer.
  • The above vector quantizer 13 is developed using a large quantity of training speech data.
  • The integer sequence that constitutes a discrete version of the speech spectrum is then coupled to the inputs of HMM for the word "yes" 14, HMM for the word "no" 15, and HMM for the word "help" 16.
  • HMM 14, HMM 15 and HMM 16 each individually contains a set of hidden Markov model parameters.
  • This algorithm is known as the forward algorithm.
  • HMM 14, HMM 15 and HMM 16 each perform their own computations. These computations are time consuming.
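A minimal sketch of the forward algorithm for a discrete-output HMM is shown below, with per-step scaling, a common device to avoid floating-point underflow on long utterances. The model parameters in the usage example are placeholders, not parameters from the patent.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Forward algorithm: log P(obs | model) for a discrete-output HMM.
    pi: initial state distribution, A: transition matrix, B: emission
    matrix (states x symbols).  Normalizes alpha at each step and
    accumulates the log of the normalizers."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = ((alpha / scale) @ A) * B[:, symbol]
    return log_p + np.log(alpha.sum())
```

The "choose maximum" stage then reduces to picking the word model with the largest log-likelihood, e.g. `max(scores, key=scores.get)` over a per-word score dictionary.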
  • The outputs of HMM 14, HMM 15 and HMM 16 are coupled to the input of choose maximum 17.
  • Choose maximum 17 is a conventional recognizer that utilizes a computer program to determine whether HMM 14, HMM 15, or HMM 16 has the highest probability estimate. If HMM 14 had the highest probability estimate, then the system concludes the word "yes" was spoken into device 18, and if HMM 15 had the highest probability estimate, the word "no" was spoken into device 18.
  • The output of choose maximum 17 is utilized to effect some command input to other systems, to place functions in these other systems under voice control.
  • FIG. 2 is a block diagram of the apparatus of this invention.
  • The output of speech input device 18 is coupled to the input of feature extractor 12, and the output of feature extractor 12 is coupled to the input of vector quantizer 13.
  • The output of vector quantizer 13 is coupled to the input of decision tree classifier 20.
  • Decision tree classifier 20 has a buffer that contains the vector quantizer values of the entire spoken utterance, i.e., "yes", "no" or "help".
  • The decision tree classifier 20 may be represented by the data contained in the table shown in FIG. 3 and the associated procedure for accessing the aforementioned table.
  • The table shown in FIG. 3 is generated using the statistical information collected in the hidden Markov models for each of the words in the vocabulary, i.e., "yes", "no" and "help".
  • FIG. 4 contains several examples of spoken utterances for illustrative purposes, and will be used in the discussions that follow.
  • The method used to classify utterances with decision tree classifier 20 involves a series of inspections of the symbols at specific time instances, as guided by the information in FIG. 3. We begin with step S0 for each classification task. This table instructs us to inspect time index 3 and to proceed to other steps based on the symbols found at time index 3, or to announce a final classification when no further inspections are required. We will demonstrate this procedure for three specific utterances chosen from FIG. 4.
  • Step S0 of the decision tree tells us to look at the vector quantizer symbol contained at time index 3 (FIG. 4) for any spoken word. Let us assume that the word "no" was spoken. The symbol at time index 3 for the first word "no" is 7. The column for symbol 7 (FIG. 3) at time 3 tells us to proceed to step S3. This step tells us to look at time index 6 (FIG. 4).
  • The column for the symbol that the word "no" has at time index 6 in step S3 tells us to proceed to step S8.
  • Step S8 tells us to refer to time index 9 (FIG. 4).
  • At time index 9 for the word "no", we find a symbol value of 8, which informs us to go to the column for symbol 8.
  • This tells us to classify the answer as "no", which is correct.
  • Returning to step S0 of FIG. 3, this line of the table tells us to look at time index 3 (FIG. 4) for our chosen word.
  • The symbol at time index 3 for the first sample utterance for the word "yes" is 7.
  • The column for symbol 7 (FIG. 3) at time 3 tells us to proceed to step S3.
  • This step tells us to look at time index 6 (FIG. 4).
  • The word "yes" has a symbol 3 at time index 6.
  • The symbol 3 column of step S3 (FIG. 3) tells us to classify this input correctly, as the word "yes".
  • Returning again to step S0 of FIG. 3, this line of the table tells us to look at time index 3 (FIG. 4) for our chosen word.
  • The symbol at time index 3 for the word "help" is 6.
  • The column for symbol 6 (FIG. 3) at time 3 tells us to proceed to step S2.
  • This step tells us to look at time index 4 (FIG. 4).
  • The word "help" has a symbol 6 at time index 4.
  • The symbol 6 column of step S2 (FIG. 3) tells us that the word "help" should be selected.
  • The "EOW" column of FIG. 3 tells us where to go when the end of a word has been reached at the time index requested for observation.
  • The label "n/a" was included for utterances too short to qualify as any of the words in the training set.
  • The decision table shown in FIG. 3 can be used to correctly classify all of the examples presented in FIG. 4.
  • The above decision tree and examples would not perform well on a large data set, as the tree is far too small to capture the diversity of potential utterances.
  • Real implementations of the algorithm, which are hereinafter described, would generate tables with thousands of lines and at least several dozen columns.
  • The above example is meant to illustrate the output of the hidden-Markov-model-to-decision-tree conversion algorithm, and the procedure implemented in decision tree classifier 20 that uses the resultant data table.
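The table-driven classification procedure can be sketched as pure lookups and comparisons. The miniature table below only mimics the shape of FIG. 3: its step names follow the walkthrough above, but the symbol values are hypothetical (in particular, the assumption that "no" shows symbol 4 at time index 6 is ours, since that value is not legible in the source).

```python
# Each step names a time index to inspect and maps the symbol found there
# either to another step ("goto") or to a final word class ("class").
TABLE = {
    "S0": {"time": 3, "branch": {7: ("goto", "S3"), 6: ("goto", "S2")}},
    "S2": {"time": 4, "branch": {6: ("class", "help")}},
    "S3": {"time": 6, "branch": {3: ("class", "yes"), 4: ("goto", "S8")}},
    "S8": {"time": 9, "branch": {8: ("class", "no")}},
}

def classify(symbols, table=TABLE, start="S0"):
    """Classify a vector-quantized utterance using only table lookups and
    comparisons -- no arithmetic at classification time."""
    step = start
    while True:
        entry = table[step]
        t = entry["time"]
        # "EOW": the utterance ended before the requested time index.
        key = symbols[t] if t < len(symbols) else "EOW"
        # Unlisted symbols fall through to "n/a" in this sketch; the real
        # table has an explicit column for every symbol and for EOW.
        action, target = entry["branch"].get(key, ("class", "n/a"))
        if action == "class":
            return target
        step = target
```

For example, `classify([0, 0, 0, 7, 0, 0, 3])` walks S0 to S3 and returns `"yes"`, while an utterance shorter than four symbols returns `"n/a"`.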
  • FIG. 5 is a flow chart of a decision tree design algorithm.
  • General decision tree design principles are disclosed by Rodney M. Goodman and Padhraic Smyth in their article entitled "Decision tree design from a communications theory standpoint", IEEE Transactions on Information Theory, Vol. 34, No. 5, pp. 979-997, Sept. 1988, which is herein incorporated by reference.
  • The decision tree is designed using an iterative process.
  • The precise rule applied for the expansion of each branch of the decision tree is specified.
  • The desired termination condition depends highly on the specific needs of the application, and will be discussed in broader terms.
  • The termination condition is the total number of nodes in the resulting tree. This is directly related to the amount of memory available for the storage of the classifier data structures; thus, this termination condition is well suited to most practical implementations.
  • This algorithm is contained in box 24 and is repeated iteratively, converting the terminal, or leaf, nodes of the decision tree into internal nodes one at a time, and adding new leaf nodes in proportion to the number of entries in the vector quantizer. This continues until the prespecified maximum number of tree nodes has been reached.
  • The resulting data structures can then be organized into the decision tree as shown in FIG. 3, and stored in the decision tree classifier 20.
  • The following decision tree design algorithm may be utilized in block 24.
  • The tree design algorithm proceeds as follows: two sets of sets are defined.
  • The first set, I, is the collection of observation sets associated with internal tree nodes.
  • The second set, L, is the collection of observation sets associated with leaf nodes of the tree.
  • Initially, the set I is the empty set, and L is a set that contains one element, and that element is the empty set.
  • The main loop of the algorithm moves elements from the set L to the set I one at a time until the cardinality, or number of elements, of set I reaches the pre-specified maximum number.
  • The determination of which set contained in L should be moved to set I at any iteration is made using an information theoretic criterion.
  • The goal is to reduce the total entropy of the tree specified by the collection of leaf nodes in L.
  • The optimal greedy selection is the observation set in L whose expansion most reduces this total entropy.
  • The optimal set X contained in L is moved to the set I.
  • The optimal set X is used to compute the optimal time index for further node expansion. This is done using an information theoretic criterion of the form t* = arg min_t Σ_s P(X ∩ {(t, s)}) · H(X ∩ {(t, s)}), where (t, s) denotes the event of observing symbol s at time index t.
  • The time index specified by this relationship is used to expand the collection of leaf nodes to include all possible leaf nodes associated with the vector quantizer symbols at that time index.
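The selection of a time index can be sketched empirically: for each candidate time index, partition the labelled training sequences by the symbol observed there, and pick the index minimizing the expected class entropy of the partition. This empirical version uses raw counts in place of the HMM-derived probabilities the patent prescribes, so it is illustrative of the criterion rather than a faithful implementation.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def best_time_index(samples):
    """Greedy split selection: pick the time index whose symbol partition
    minimizes the expected class entropy of the resulting children.
    `samples` is a list of (symbol_sequence, word_label) pairs."""
    best_t, best_h = None, float("inf")
    n = len(samples)
    for t in range(min(len(seq) for seq, _ in samples)):
        groups = defaultdict(Counter)          # symbol at t -> class counts
        for seq, label in samples:
            groups[seq[t]][label] += 1
        # Expected entropy of the children, weighted by their probability.
        h = sum(sum(g.values()) / n * entropy(g) for g in groups.values())
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h
```

A time index whose symbols perfectly separate the word classes drives the expected child entropy to zero, which is exactly the behavior the greedy criterion rewards.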

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
EP95937519A 1994-10-26 1995-10-19 Decision tree classifiers designed using hidden Markov models Withdrawn EP0789902A4 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US32939394A 1994-10-26 1994-10-26
US329393 1994-10-26
PCT/US1995/013416 WO1996013830A1 (en) 1994-10-26 1995-10-19 Decision tree classifier designed using hidden markov models

Publications (2)

Publication Number Publication Date
EP0789902A1 true EP0789902A1 (de) 1997-08-20
EP0789902A4 EP0789902A4 (de) 1998-12-02

Family

ID=23285181

Family Applications (1)

Application Number Title Priority Date Filing Date
EP95937519A Withdrawn EP0789902A4 (de) Decision tree classifiers designed using hidden Markov models

Country Status (5)

Country Link
EP (1) EP0789902A4 (de)
JP (1) JPH10509526A (de)
AU (1) AU3960895A (de)
CA (1) CA2203649A1 (de)
WO (1) WO1996013830A1 (de)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0788649B1 (de) * 1995-08-28 2001-06-13 Koninklijke Philips Electronics N.V. Verfahren und system zur mustererkennung mittels baumstrukturierten wahrscheinlichkeitsdichten
EP0849723A3 (de) * 1996-12-20 1998-12-30 ATR Interpreting Telecommunications Research Laboratories Spracherkennungsapparat mit Mitteln zum Eliminieren von Kandidatenfehlern
WO2002029614A1 (en) * 2000-09-30 2002-04-11 Intel Corporation Method and system to scale down a decision tree-based hidden markov model (hmm) for speech recognition
AU2000276394A1 (en) * 2000-09-30 2002-04-15 Intel Corporation Method and system for generating and searching an optimal maximum likelihood decision tree for hidden markov model (hmm) based speech recognition
US8694304B2 (en) * 2010-03-26 2014-04-08 Virtuoz Sa Semantic clustering and user interfaces
US8676565B2 (en) 2010-03-26 2014-03-18 Virtuoz Sa Semantic clustering and conversational agents
US9378202B2 (en) 2010-03-26 2016-06-28 Virtuoz Sa Semantic clustering
US9524291B2 (en) 2010-10-06 2016-12-20 Virtuoz Sa Visual display of semantic information
CN113589191B (zh) * 2021-07-07 2024-03-01 郴州雅晶源电子有限公司 一种电源故障诊断系统及方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5033087A (en) * 1989-03-14 1991-07-16 International Business Machines Corp. Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system
US5455889A (en) * 1993-02-08 1995-10-03 International Business Machines Corporation Labelling speech using context-dependent acoustic prototypes

Also Published As

Publication number Publication date
AU3960895A (en) 1996-05-23
CA2203649A1 (en) 1996-05-09
WO1996013830A1 (en) 1996-05-09
EP0789902A4 (de) 1998-12-02
JPH10509526A (ja) 1998-09-14

Similar Documents

Publication Publication Date Title
EP0619911B1 Speech training aid for children
US5333236A Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models
JPH11272291A Phonetic modeling method using acoustic decision trees
JPH05216490A Speech coding apparatus and method, and speech recognition apparatus and method
KR100826875B1 Online speaker recognition method and apparatus therefor
EP0779609A2 Speech adaptation system and speech recognizer
JP2000099080A Speech recognition method using evaluation of reliability measures
KR20010102549A Speaker recognition method and apparatus
CN111798846A Voice command word recognition method and device, conference terminal, and conference terminal system
EP0789902A1 Decision tree classifiers designed using hidden Markov models
GB2335064A Linear trajectory models incorporating preprocessing parameters for speech recognition
JP3920749B2 Method for creating acoustic models for speech recognition, apparatus, program, and recording medium therefor, and speech recognition apparatus using the acoustic models
Mohanty et al. Isolated Odia digit recognition using HTK: an implementation view
US20020133343A1 Method for speech recognition, apparatus for the same, and voice controller
JP2982689B2 Standard pattern creation method using an information criterion
Saha Development of a bangla speech to text conversion system using deep learning
CN120071905A Speech recognition and analysis method based on the MFCC and VQ-HMM algorithms
JP3029803B2 Word model generation apparatus for speech recognition and speech recognition apparatus
Li Speech recognition of mandarin monosyllables
Dumitru et al. Vowel, digit and continuous speech recognition based on statistical, neural and hybrid modelling by using ASRS_RL
Li et al. A comparative study of speech segmentation and feature extraction on the recognition of different dialects
Ney et al. Acoustic-phonetic modeling in the SPICOS system
CN102034474B Method for speech recognition of all languages and voice input of single characters
JPH0786758B2 Speech recognition apparatus
KR100194581B1 Voice dialing system for automatic department guidance

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19970425

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LI NL SE

A4 Supplementary search report drawn up and despatched

Effective date: 19981014

AK Designated contracting states

Kind code of ref document: A4

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LI NL SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Withdrawal date: 19981113