JPS62111296A

JPS62111296A - Voice recognition method and apparatus

Info

Publication number: JPS62111296A
Application number: JP61195109A
Authority: JP
Inventors: ギディオン　アブラハム　セネンシェブ
Original assignee: National Research Development Corp UK
Current assignee: National Research Development Corp UK
Priority date: 1985-08-20
Filing date: 1986-08-20
Publication date: 1987-05-22
Also published as: GB8520777D0

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a method and apparatus for recognizing speech based on a predetermined vocabulary, and further to a method and apparatus for training a speech recognition apparatus.

(Prior Art) Research on speech recognition carried out to date has made use of a mathematical model known as the "hidden Markov model".

In this approach, a mathematical model is established for each word to be recognized, and the parameters of each model are determined by a suitable training procedure.

When an unknown word is spoken, a set of observations of it is obtained in some predetermined manner. For each of the predetermined word models, the maximum a posteriori probability that the model would give rise to those observations can then be calculated. The word corresponding to the model that gives the highest probability can be recognized as the most likely word corresponding to the unknown word. This concept can be extended to the recognition of sequences of connected spoken words.

A Markov model is a finite state machine having a set of states; see, for example, the four-state model of FIG. 3. The model can be in only one state at any instant. At the end of each time interval, the model makes a transition from the current state either to another state or back to the current state itself. Knowing which state the model is in is not enough to predict with certainty which state it will occupy next. Instead, the transition probability is known for every possible transition from one state to another state or to the state itself.
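As a minimal sketch of such a model, the Python fragment below steps a four-state machine through a few time intervals. The transition probabilities, the left-to-right layout, and the names `TRANSITIONS` and `run_chain` are illustrative assumptions; they are not taken from FIG. 3.

```python
import random

# Hypothetical transition probabilities for a four-state left-to-right model.
# TRANSITIONS[i][j] is the probability of moving from state i to state j;
# the diagonal entries are transitions back to the current state itself.
TRANSITIONS = [
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.5, 0.4, 0.1],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]


def run_chain(transitions, steps, start=0, seed=0):
    """Return the sequence of states visited over `steps` time intervals."""
    rng = random.Random(seed)
    state = start
    visited = [state]
    for _ in range(steps):
        # The next state cannot be predicted; only its probabilities are known.
        state = rng.choices(range(len(transitions)), weights=transitions[state])[0]
        visited.append(state)
    return visited


path = run_chain(TRANSITIONS, steps=8)
```

Each row of the table sums to one, which is what makes it a set of transition probabilities for the state it describes.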

In each time interval, the model produces an observation, which is usually a multidimensional measurement. In a hidden Markov model, knowledge of the state is not sufficient to predict with certainty what the observation will be. Instead, the state determines the a priori probability distribution of the observations that the model produces while in that state.

(Problems to Be Solved by the Invention) The approach to speech recognition described above has the following drawback. When the recognizer receives samples of the sounds making up the speech to be recognized, a large number of calculations must be performed in a very short time. This problem is particularly serious when continuous speech is to be recognized. Moreover, the need to perform many calculations in a short time makes the recognizer very expensive, which restricts many applications such as personal computers and data entry by voice (for example, by telephone).

(Means for Solving the Problems) According to a first aspect of the present invention, there is provided an apparatus for recognizing words within a predetermined vocabulary, comprising the following elements.

(a) Means for successively sampling sounds to obtain respective sets of signals representative of the characteristics of those sounds.

(b) Means for storing data representing a number of finite state machines, one for each word of the predetermined vocabulary, which form individual models of those words.

The data describes the states forming each finite state machine and assigns a probability function to each state, at least one probability function being assigned to two or more states.

Each probability function describes the probability that the signals representative of the characteristics of a sound would assume any observed values if the models had produced the real sound and any model had been in a state to which that probability function is assigned.

(c) Means for determining, from the sets of signals and the probability functions, the probability that any given set of signals would be produced if the models had produced the real sound and any model had been in any given state.

(d) Means for determining, from the calculated probabilities and the properties of the finite state machines, the maximum likelihoods that sequences of sound samples represent the predetermined words.

(e) Means for providing an output indicating, on the basis of the maximum likelihoods detected, one of the predetermined words as the most likely word spoken.

According to a second aspect of the present invention, there is provided a method of recognizing words within a predetermined vocabulary, comprising the following steps.

(a) Successively sampling sounds to obtain respective sets of signals representative of the characteristics of those sounds.

(b) Storing data representing a number of finite state machines, one for each word of the predetermined vocabulary, which form individual models of those words.

(c) Determining, from the sets of signals and the probability functions, the probability that any given set of signals would be produced if the models had produced the real sound and any model had been in any given state.

(d) Determining, from the calculated probabilities and the properties of the finite state machines, the maximum likelihoods that sequences of sound samples represent the predetermined words.

(e) Providing an output indicating, on the basis of the maximum likelihoods detected, one of the predetermined words as the most likely word spoken.

The main advantage of the present invention is that the number of probability functions can be made very small. For example, whereas a vocabulary of 64 words would otherwise require 1024 probability functions (that is, 64 models of 16 states each), 100 probability functions suffice, because most probability functions are typically assigned to many states and therefore to many finite state machines. The complexity of the calculations, the number of periodic operations required, and the amount of storage used can therefore all be reduced, which lowers cost.
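The arithmetic behind this saving can be sketched in a few lines of Python. The three sizes come from the text; the rule that assigns pool indices to states is only a placeholder (in practice training would choose them), and all names are hypothetical.

```python
VOCABULARY_SIZE = 64     # words in the vocabulary (from the text)
STATES_PER_MODEL = 16    # states per word model (from the text)
SHARED_FUNCTIONS = 100   # size of the shared pool (from the text)

# Without sharing, every state of every model owns its own probability function.
unshared_functions = VOCABULARY_SIZE * STATES_PER_MODEL

# With sharing, each state stores only an index into the common pool.
# Placeholder assignment rule, used here purely for illustration.
models = [
    [(word * STATES_PER_MODEL + state) % SHARED_FUNCTIONS
     for state in range(STATES_PER_MODEL)]
    for word in range(VOCABULARY_SIZE)
]

# Only the functions actually referenced need to be evaluated for each frame.
functions_in_use = {index for model in models for index in model}
evaluations_saved_per_frame = unshared_functions - len(functions_in_use)
```

Because every state holds an index rather than its own function, a function is evaluated once per frame however many states refer to it.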

Finite state machines of this kind are also known as hidden Markov models.

The means for determining the maximum likelihoods may comprise means for carrying out the Viterbi algorithm (see S. E. Levinson, L. R. Rabiner and M. M. Sondhi, "An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition", Bell System Technical Journal, Vol. 62, No. 4, Part 1, April 1983, pp. 1035-1074) or the "forward-backward" algorithm (see Levinson et al., cited above).

According to a third aspect of the present invention, there is provided an apparatus for recognizing words within a predetermined vocabulary, comprising the following elements.

(b) Means for storing data representing a number of finite state machines, one for each word of the predetermined vocabulary, which form individual models of those words.

(c) Means for calculating a distance vector from the probability functions and each set of the signals as the signals occur, and for generating a control signal each time a vector has been calculated. Each vector represents the probabilities that any given set of signals would be produced if the models had produced the real sound and any model had been in any given state.

(d) Purpose-built logic circuit means for receiving one of the control signals and determining, using the current distance vector, a value representing the minimum cumulative distance of each state of each finite state machine.

(e) Means for providing an output indicating, on the basis of the calculated minimum cumulative distances, one of the predetermined words as the most likely word spoken.

By using logic circuit means built in dedicated hardware to calculate the cumulative distances, the invention can reduce both the time required for the calculation and its complexity.

In a hidden Markov model, the finite state machine may have a plurality of states linked by transitions. Each state has a probability function for the corresponding set of signals, and each transition has a transition penalty. For any set of observed signals, a state penalty can then be calculated that represents the reciprocal of the probability of that set of observed signals given the probability function. Each finite state machine represents a word in the vocabulary to be recognized, and the transition penalties are characteristic of that word. The state penalties, however, depend on the probability functions and on the sound currently being received.

The means for storing data may store the minimum cumulative distances for each machine. The logic circuit means may be arranged to determine, for each transition into the current state, a value that depends on the minimum cumulative distance of the state from which that transition originates and on the transition penalty; to determine the minimum of those values; and to determine the minimum cumulative distance of the current state from that minimum and the state penalty for the current state. Each minimum cumulative distance can, in effect, be regarded as indicating the highest probability that a sequence of sounds forming part of a word in the vocabulary has been produced.
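The update performed by the logic circuit means can be sketched in software as follows. This is an illustration only: it assumes that penalties are additive (for example, negative log probabilities, so that the smallest sum corresponds to the largest probability), and the three-state machine, its penalty values, and the function name are hypothetical.

```python
import math


def update_cumulative_distances(previous, incoming, state_penalties):
    """Advance every state's minimum cumulative distance by one frame.

    previous        -- minimum cumulative distance of each state at the last frame
    incoming        -- incoming[s] lists (source_state, transition_penalty) pairs
    state_penalties -- penalty of each state for the frame just received
    """
    updated = []
    for state, arcs in enumerate(incoming):
        # Cheapest way of arriving in this state over any of its transitions.
        best_entry = min(previous[src] + penalty for src, penalty in arcs)
        updated.append(best_entry + state_penalties[state])
    return updated


# Hypothetical three-state left-to-right machine (self-loops and single steps).
INCOMING = [
    [(0, 1.0)],
    [(0, 2.0), (1, 1.0)],
    [(1, 2.0), (2, 1.0)],
]
start = [0.0, math.inf, math.inf]   # only the first state is reachable at first
after_one = update_cumulative_distances(start, INCOMING, [0.5, 0.2, 0.1])
after_two = update_cumulative_distances(after_one, INCOMING, [0.4, 0.3, 0.6])
```

The loop body uses only additions and a minimum, which is the kind of operation that lends itself to a dedicated logic circuit.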

Apart from the logic circuit means, one or more of the other means mentioned above may be formed by a general-purpose computer, preferably one based on a microprocessor and permanently programmed specifically for the operations required.

According to a fourth aspect of the present invention, there is provided an apparatus for analyzing data relating to the existence of one of a number of conditions, comprising the following elements.

(a) Means for receiving sets of signals representing the data to be analyzed.

(b) Means for storing data representing a number of finite state machines that individually model each of the conditions. This data includes a number of functions, at least one of which is assigned to two or more states of the finite state machines. Each function describes the probability that the signals representative of the characteristics of the data would assume any observed values if the models had produced the real signals and any model had been in a state to which that function is assigned.

(c) Means for determining, from the functions and the sets of data as the data occurs, indicators representing the probabilities that any given set of signals would be produced if the models had produced the real signals and any model had been in any given state.

(d) Logic circuit means arranged to determine, using the indicators, values of the minimum cumulative distance of each state of each finite state machine.

The means for successively sampling sounds may include a microphone connected to an analog-to-digital converter by way of an amplifier having a wide dynamic range. In addition, these means may include, for example, a bank of digital filters that analyzes the incoming signal into a frequency spectrum, the intensities of which form the respective sets of signals. Alternatively, the means may include means for obtaining the sets by linear prediction (see J. D. Markel and A. H. Gray, "Linear Prediction of Speech", Springer-Verlag, 1976).

The means for calculating the probabilities preferably includes means for calculating, in each frame (that is, a predetermined number of sample periods), a distance vector having a number of elements, each element being calculated from the probability function of one state and one of the sets of signals.
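A distance vector of this kind might be computed as in the sketch below. It assumes that each probability function is a Gaussian with independent components and that the "distance" is the negative log of the density; the text at this point fixes neither choice. The pool of three functions and all names are hypothetical.

```python
import math


def gaussian_distance(feature, mean, variance):
    """Negative log of a diagonal-covariance Gaussian density at `feature`."""
    return 0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(feature, mean, variance)
    )


def distance_vector(feature, functions):
    """One element per shared probability function, for a single frame."""
    return [gaussian_distance(feature, mean, var) for mean, var in functions]


# Hypothetical pool of shared functions, each given as (mean, variance).
FUNCTIONS = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([0.0, 2.0], [1.0, 4.0]),
]
frame = [0.0, 0.0]
distances = distance_vector(frame, FUNCTIONS)
```

The vector has one element per shared function rather than one per state, so its length does not grow with the number of states that share a function.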

The apparatus and methods according to the first four aspects of the invention normally require word models defined by machine states and transition probabilities, and the states in turn require details of the probability functions. These states and functions are obtained by "training" on the words of the predetermined vocabulary. The vocabulary can be changed by retraining as required.

Accordingly, a fifth aspect of the present invention provides a method of selecting a number of states for finite state machines that represent words within a predetermined vocabulary, and of obtaining data characterizing the states, comprising the following steps.

(a) Successively sampling the sounds forming each word in the vocabulary to obtain sets of signals representative of the characteristics of the sounds forming that word.

(b) Obtaining, from the sets of signals, data defining the states for each word.

(c) Obtaining, from the sets of signals, one probability function for each state.

(d) Merging some of the probability functions to obtain a number of functions smaller than a predetermined number, the merging being carried out according to criteria relating to suitability for merging and to the predetermined number.

(e) Calculating, for each finite state machine, the probability of a transition from each state to every permitted subsequent state.

(f) Entering data identifying each word as it is spoken.

(g) Storing data defining the states that form each machine and data identifying the word represented by that machine.

(h) Storing data defining each merged probability function.

The invention also includes apparatus for carrying out the method of the fifth aspect.

Preferably, the step of obtaining the data defining the states includes merging some of the successive sets of signals for each word, according to criteria relating to suitability for merging and to the maximum and minimum numbers of states for that word, so as to provide the data defining the states for each word.

The suitability of two successive sets of signals for merging may be assessed by calculating the logarithm of the ratio of the likelihood that the sets of signals considered for merging arose from the same distribution to the likelihood that each arose from a separate distribution. If this logarithm exceeds a threshold, the sets are merged. The threshold can be determined beforehand by experiment or obtained using a test statistic. Similarly, the suitability of two probability functions for merging may be assessed by calculating the logarithm of the ratio of the likelihood that the probability functions considered for merging arose from the same probability function to the likelihood that each arose from a separate probability function. If this logarithm exceeds a threshold, obtained in the same way as the other threshold, the functions are merged.

The step of obtaining the sets of signals preferably includes the known process of dynamic time warping.

Apparatus according to the invention may be constructed for training, for recognition, or for both.

Typically, the apparatus is formed by one or more mask-programmed logic arrays and one or more microprocessors, and any such array or processor may form more than one of the means mentioned above.

According to yet another aspect of the present invention, there is provided a speech recognition apparatus comprising the following elements.

(a) Means for sampling sounds.

(b) Means for storing data representing a number of finite state machines corresponding to the words to be recognized.

(c) Means for determining, from the output of the sampling means and the stored data, the likelihoods that sequences of sound samples represent the words.

The invention also includes a method corresponding to this aspect.

It further includes apparatus and methods for determining the stored data for this aspect from known spoken words.

(Embodiments) The present invention will now be described in detail with reference to the accompanying drawings.

First, the way in which the parameters of the hidden Markov model of a word are obtained during training, from examples of that word being spoken, will be described.

In FIG. 1a, in operation 10, one of the words in the vocabulary is entered into the training apparatus.

Briefly, the training apparatus may comprise a microphone, an amplifier with suitable dynamic gain, an analog-to-digital converter, and a bank of digital filters. The word to be entered is spoken into the microphone. Operation 11 is then carried out, in which one feature vector is formed for each frame period. A frame consists of, for example, 128 sampling periods of the analog-to-digital converter. Each feature vector may have F elements (for example, 10).

Each element corresponds to the intensity in a respective frequency band, the bands together spanning an audio frequency range of, for example, 0 to 4.8 kHz. Each feature vector can be obtained using the bank of filters and means for extracting the energy values at the filter outputs. A feature vector can therefore be written as follows.

$[x_1, x_2, x_3, \ldots, x_F]^T$

Here T denotes transposition to the more usual column (vertical) vector. With 10 elements, the vector becomes $[x_1, x_2, x_3, \ldots, x_{10}]^T$.

In general, different utterances of the same word are not identical. An important source of difference is that the time scale of one utterance is distorted non-linearly relative to that of another. For this reason, a process of dynamic time warping is carried out (operation 12). This operation brings the two time scales into agreement by aligning the feature vectors of the most recently spoken utterance of the word with the feature-vector means of a composite template. Except in the case of the first utterance, these means have already been obtained from previous utterances of the word, as described below.

Dynamic time warping is a known process that can be carried out using a dynamic programming algorithm, so it is described only briefly here. In effect, the feature vectors of the most recently spoken utterance are divided into groups of one or more vectors, each group corresponding to one template vector. The correspondence is chosen so that the resulting total spectral distance between the utterance vectors and their corresponding template vectors is a minimum.
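The grouping described here can be sketched with a small dynamic program. This is a simplified illustration, not the procedure of the patent: it assumes a squared Euclidean spectral distance, requires at least as many utterance vectors as template vectors, and lets each utterance vector either stay with the current template vector or advance to the next one. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_to_template(utterance, template):
    """Split utterance indices into one consecutive group per template vector."""
    n, m = len(utterance), len(template)
    if n < m:
        raise ValueError("need at least one utterance vector per template vector")
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    advanced = [[False] * m for _ in range(n)]
    cost[0][0] = squared_distance(utterance[0], template[0])
    for i in range(1, n):
        for j in range(min(i, m - 1) + 1):
            stay = cost[i - 1][j]
            advance = cost[i - 1][j - 1] if j > 0 else inf
            cost[i][j] = min(stay, advance) + squared_distance(utterance[i], template[j])
            advanced[i][j] = advance < stay
    # Trace back from the last utterance vector and the last template vector.
    groups = [[] for _ in range(m)]
    j = m - 1
    for i in range(n - 1, -1, -1):
        groups[j].append(i)
        if advanced[i][j]:
            j -= 1
    for group in groups:
        group.reverse()
    return groups, cost[n - 1][m - 1]


utterance = [[0.0], [0.1], [1.0], [0.9], [1.1], [5.0]]
template = [[0.0], [1.0], [5.0]]
groups, total = align_to_template(utterance, template)
```

In this toy case the six one-element utterance vectors fall into three groups, one for each template vector.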

The template vectors are formed in operation 13 by merging each group of feature vectors into the corresponding template vector, according to the alignment obtained by dynamic time warping. Each template vector has F elements (10 in the example above), each formed as a weighted average of the corresponding element of the template vector and the elements of the feature vectors in the group corresponding to that template vector. A template vector is therefore expressed as a feature-vector mean, as follows.

$[\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_F]^T$

In addition, each template vector includes further elements, calculated by a formula that is not reproduced in this text. In that formula, $i$ takes the values 1 to $F$; $\sigma_i$ is the variance of the set of feature vectors used to form the template vector; the mean-square term is the mean square of that set of feature vectors; and $\sigma_{min}$ is a positive constant.

Each template vector is also associated with the number N of feature vectors that have been merged into it; that is, N is the number of corresponding feature vectors in the groups obtained by time warping, summed over all repetitions of the word. Each time operation 13 is carried out, the mean, the mean square, and the number of feature vectors are updated. When a word is entered for the first time, the composite template is formed directly from the sequence of feature vectors obtained in operation 11: the feature-vector mean of each template vector is set equal to the corresponding feature vector obtained from the spoken word, and the number N of feature vectors is set to 1.
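The bookkeeping for one template vector can be sketched as a running average. The class and method names are hypothetical, and initialising the mean square from the squares of the first feature vector is an assumption; the text states only how the mean and N are initialised.

```python
class TemplateVector:
    """Running mean, mean square and count for one template vector."""

    def __init__(self, first_feature_vector):
        # First utterance: the mean is the feature vector itself and N is 1.
        self.mean = list(first_feature_vector)
        self.mean_square = [x * x for x in first_feature_vector]  # assumption
        self.count = 1

    def absorb(self, group):
        """Merge the feature vectors that time warping assigned to this template."""
        for vector in group:
            self.count += 1
            weight = 1.0 / self.count
            self.mean = [m + weight * (x - m) for m, x in zip(self.mean, vector)]
            self.mean_square = [
                q + weight * (x * x - q) for q, x in zip(self.mean_square, vector)
            ]


template_vector = TemplateVector([1.0, 2.0])
template_vector.absorb([[3.0, 2.0], [5.0, 2.0]])
```

Weighting each new vector by 1/N keeps the stored values equal to the plain average over every feature vector absorbed so far, so earlier utterances never need to be kept.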

Once a predetermined number of utterances (for example, 10) of a word have been entered, decision 14 prevents further repetitions of operation 10. The composite template then provides the data defining the states of the hidden Markov model that models the word. Each template vector can be regarded as representing one model state, although at this stage there are more states than necessary and each state is not adequately characterized by a single template vector.

The next group of operations in FIG. 1a reduces the number of states.

As a first step, the template is segmented. In operation 15, a quantity λ is calculated for each pair of successive template vectors. The pairs overlap in the sense that every template vector except the first and the last appears in two pairs.

Thus λ is calculated for every pair n and n-1, for n = 2 to n = n_m, where n_m is the number of template vectors in the template. The value of λ calculated for the pair of template vectors n and n-1 is denoted λ(n). Let L_s be the likelihood that the observations arose from the probability distribution obtained when the two template vectors of the pair are merged (the observations here being the feature vectors that contribute to the template vectors), and consider also the likelihood that the observations arose from two different most-likely distributions. λ is defined from these two likelihoods (the defining relation is not reproduced in this text) and is thus a measure of how suitable two states are for merging. Assuming multivariate Gaussian distributions with diagonal covariance matrices, an explicit expression for λ is obtained (also not reproduced in this text).

Here, the subscripts 1 and 2 refer to the two successive template vectors; the subscript f refers to the f-th feature of a feature vector; N is the number of feature vectors merged into a template vector; and the subscript y refers to the hypothetical template vector that would be formed if the two successive template vectors were merged into one.
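Since the expression for λ itself has not survived in this text, the sketch below is only one plausible form, not the formula of the patent. It computes the log of the ratio of the "merged" likelihood to the "separate" likelihood for diagonal Gaussians fitted by maximum likelihood, using the same quantities the text names (counts N, means, and mean squares for templates 1 and 2 and for the hypothetical merged template y). The sign convention, the variance floor, and all function names are assumptions.

```python
import math


def merged_statistics(n1, mean1, sq1, n2, mean2, sq2):
    """Statistics of the hypothetical template y formed by merging 1 and 2."""
    n = n1 + n2
    mean = [(n1 * a + n2 * b) / n for a, b in zip(mean1, mean2)]
    sq = [(n1 * a + n2 * b) / n for a, b in zip(sq1, sq2)]
    return n, mean, sq


def variances(mean, mean_square, floor=1e-6):
    """Per-feature variance, floored to stay positive (floor is an assumption)."""
    return [max(q - m * m, floor) for m, q in zip(mean, mean_square)]


def log_likelihood_ratio(n1, mean1, sq1, n2, mean2, sq2):
    """ln(L_merged / L_separate); zero when the templates match, else negative."""
    ny, mean_y, sq_y = merged_statistics(n1, mean1, sq1, n2, mean2, sq2)
    v1, v2, vy = variances(mean1, sq1), variances(mean2, sq2), variances(mean_y, sq_y)
    return 0.5 * sum(
        n1 * math.log(a) + n2 * math.log(b) - ny * math.log(c)
        for a, b, c in zip(v1, v2, vy)
    )


same = log_likelihood_ratio(5, [1.0], [2.0], 5, [1.0], [2.0])
apart = log_likelihood_ratio(5, [1.0], [2.0], 5, [3.0], [10.0])
```

With this convention, values near zero indicate templates that look alike and strongly negative values indicate templates that differ; the patent may well use the opposite sign.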

In operation 15, a modified value λ′(n) is calculated for n = 2 to n = n_m from the following formula.

$\lambda'(n) = 2\lambda(n) - \lambda(n-1) - \lambda(n+1)$

λ(1) and λ(n_m + 1) are set to arbitrarily large values. λ′(n) is a second difference, used to reveal large changes in λ(n) and hence to indicate suitable positions for segmentation, which is described next.
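The second difference is straightforward to compute once the two boundary values are padded in. In the sketch below, the list layout (index 0 holds λ(2)), the size of the "arbitrarily large" boundary value, and the function name are assumptions.

```python
def second_differences(lam, boundary=1.0e6):
    """Return lambda'(n) for n = 2..n_m, where lam[k] holds lambda(k + 2)."""
    # lambda(1) and lambda(n_m + 1) are set to an arbitrarily large value.
    padded = [boundary] + list(lam) + [boundary]
    return [
        2.0 * padded[k] - padded[k - 1] - padded[k + 1]
        for k in range(1, len(padded) - 1)
    ]


lam = [1.0, 5.0, 1.0, 1.5]          # hypothetical lambda(2) .. lambda(5)
lam_prime = second_differences(lam)
best_position = 2 + lam_prime.index(max(lam_prime))   # n with the largest lambda'
```

The padding makes λ′ strongly negative at both ends of the template, so the largest values always fall at interior positions.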

The template vectors for a model are stored in sequence, and a number of segmentation markers are inserted between template vectors to show how the template vectors are to be merged: no marker is placed between template vectors that are to be merged with each other. Once a certain number of markers have been inserted, the number of final states in the model equals the number of markers plus one. Decision 16 therefore determines whether the number of states defined so far is less than the maximum (for example, 8). If it is, decision 17 determines whether the largest value of λ′ not yet used for inserting a segmentation marker (λ′max) is less than a threshold. If it is, no further segmentation is called for on that account. If λ′max is greater than the threshold, operation 18 inserts a new segmentation marker at the position between the template vectors corresponding to that value of λ′.

Decision 16 is then repeated. If, however, decision 17 finds that λ'max is below the threshold, decision 19 is performed to determine whether the number of states defined by the markers is less than or equal to a minimum value. If it is, further division markers must be inserted even though λ'max is below the threshold, and operation 18 is therefore performed. This loop (decisions 16, 17 and 19 and operation 18) continues until the criterion for the minimum number of states (for example 3) is met.

If decision 19 finds that the criterion for the minimum number of states is met, whether before or after markers corresponding to values of λ'max below the threshold have been inserted, the template vectors are merged in operation 20 of Figure 1b.
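The marker-placement loop described above (decisions 16, 17 and 19 and operation 18) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the representation of marker positions as list indices, and the reading that the minimum is satisfied once the state count reaches `min_states` are not taken from the specification.

```python
def place_split_markers(lam_prime, threshold, min_states=3, max_states=8):
    """Choose division-marker positions from second-difference values.

    lam_prime[i] is assumed to be the value of lambda' at the gap between
    template vector i and template vector i + 1 (hypothetical indexing).
    Returns the sorted list of gaps that receive a marker.
    """
    markers = set()
    # Decision 16: keep going only while fewer than max_states states exist.
    while len(markers) + 1 < max_states:
        unused = [i for i in range(len(lam_prime)) if i not in markers]
        if not unused:
            break
        # Largest lambda' not yet used for a marker (lambda'max).
        best = max(unused, key=lambda i: lam_prime[i])
        above_threshold = lam_prime[best] >= threshold   # decision 17
        too_few_states = len(markers) + 1 < min_states   # decision 19 (assumed form)
        if above_threshold or too_few_states:
            markers.add(best)                            # operation 18
        else:
            break
    return sorted(markers)
```

The number of states produced is `len(markers) + 1`, matching the statement that the state count equals the number of markers plus one.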

To merge template vectors, the feature-vector means and mean squares are averaged, taking into account the total number of feature vectors that gave rise to each template vector being merged. Each merged vector formed in this way, together with its standard deviation, defines a probability density function (PDF), on the assumption that the probabilities follow a Gaussian distribution. The PDFs formed in this way are only provisional, however, because in the following steps of Figure 1b each provisional PDF is merged with a PDF already stored for other words (if any) or, where merging is not appropriate, stored as a further PDF. Before any PDFs are stored, the provisional PDFs are stored in operation 20 as a provisional model of the word.
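The count-weighted averaging described above can be illustrated with a short sketch. It assumes that each template vector carries a count of contributing feature vectors, a per-feature mean and a per-feature mean square; the function name and argument layout are hypothetical.

```python
import math

def merge_templates(n1, mean1, msq1, n2, mean2, msq2):
    """Merge two template vectors by count-weighted averaging.

    n1, n2       : numbers of feature vectors behind each template vector
    mean1, mean2 : per-feature means
    msq1, msq2   : per-feature mean squares
    Returns (n, mean, mean_square, std) for the merged vector.
    """
    n = n1 + n2
    mean = [(n1 * a + n2 * b) / n for a, b in zip(mean1, mean2)]
    msq = [(n1 * a + n2 * b) / n for a, b in zip(msq1, msq2)]
    # Standard deviation from mean and mean square; clamp tiny negative
    # values that can arise from rounding.
    std = [math.sqrt(max(q - m * m, 0.0)) for m, q in zip(mean, msq)]
    return n, mean, msq, std
```

Keeping the mean square alongside the mean is what allows the standard deviation of the merged vector to be recovered without revisiting the original feature vectors.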

If the word currently being used for training is not the first word in the predetermined vocabulary, then once the provisional PDFs have been stored, operation 22 calculates a value of λ between each provisional PDF and each stored PDF.

Training then begins on the next word in the predetermined vocabulary.

Since provisional and stored PDFs are held in the same format as template vectors, λ is calculated in the same way as when template vectors are merged.

After operation 22 has been completed, operation 23 selects the minimum value of λ for one PDF, and decision 24 compares it with a threshold that indicates whether merging is desirable.

If the comparison indicates that merging should take place, operation 25 merges the provisional PDF with the stored PDF corresponding to λmax, using the averaging process described above. Operation 26 then calculates new values of λ between the merged PDF (which is now stored) and each remaining provisional PDF.

If decision 24 finds that the maximum value of λ is greater than the threshold, decision 31 determines whether the number of PDFs already stored is less than the maximum allowed (for example 48). If so, operation 32 stores the provisional PDF that has the current maximum value of λ under test. If, however, decision 31 shows that no more PDFs can be stored, the provisional PDF must be merged even though its value of λ exceeds the threshold, and operation 25 is therefore performed.

As each PDF is merged in operation 25, the model for the current word is updated in operation 29 by using a label that indicates the merged PDF in place of the provisional PDF.

Decision 27 then determines whether any further provisional PDFs remain to be considered for merging. If so, the process returns to operation 23.

If no provisional PDFs remain to be merged, operation 28 calculates the probability of moving from one state to the next and the probability of returning to the same state (see arrows 36 and 37 respectively in Figure 3). The probability of a transition from one state to the next is calculated by dividing the number of example utterances used to train the word by the total number of feature vectors merged into the first-mentioned state. The probability of a transition from that state to itself is obtained by subtracting the probability so obtained from 1.
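The transition-probability calculation of operation 28 amounts to one division and one subtraction. The sketch below is a minimal illustration; the function and parameter names are assumptions, and the range check is an added safeguard rather than part of the described procedure.

```python
def transition_probabilities(num_utterances, vectors_in_state):
    """Return (p_next, p_self) for one model state.

    num_utterances   : number of example utterances used to train the word
    vectors_in_state : total number of feature vectors merged into the state
    """
    if vectors_in_state <= 0 or num_utterances > vectors_in_state:
        raise ValueError("expected 0 < num_utterances <= vectors_in_state")
    p_next = num_utterances / vectors_in_state   # leave the state
    p_self = 1.0 - p_next                        # remain in the state
    return p_next, p_self
```

For example, with 10 utterances and 40 feature vectors merged into a state, the state is left once per utterance, so 10 of the 40 frames are exits and the probability of moving on is 0.25.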

Finally, operation 30 stores these transition probabilities together with the stored PDF labels, forming a complete model of the word. The processes of Figures 1a and 1b are repeated for every word, so that models are added to the store and the stored PDFs are modified. Because the stored PDFs are referred to in the models by label, however, no further processing of completed models is required.

The flowcharts of Figures 1a and 1b can be implemented by programming a computer such as a microprocessor. Since such programming is a well-known technique, it is not described here. If the processing required for each utterance of a word must be carried out during training in about the same time as the word takes to speak, it may be necessary to use a computer that is at least partly specially constructed. One example of such a computer is described later in this specification in connection with the recognition system that uses the models obtained in training.

A model is obtained for each word (for example, 64 words) in the predetermined vocabulary to be recognized by repeating the word (for example, ten times) into the microphone of the training apparatus. The training apparatus is combined with a data input device such as a keyboard, so that each word can be typed in as it is repeated during training and stored as a label for its model.

Speech recognition using the hidden Markov models obtained in training will now be described.

The same analog circuit as used in training is employed here, and the output of the analog-to-digital converter is separated into frequency bands by digital filtering operations.

In each frame period, the filter outputs are first formed into a feature vector in operation 42 (Figure 2). Each PDF obtained during training is then taken in turn and used together with the elements of the feature vector to calculate the probability that the speech sample at that time was generated by the state corresponding to that PDF (operation 43). Each such calculation yields one element of a distance vector [d1, d2, d3, ... dm]^T.

The meaning of "distance" is explained below.

Here m is the total number of PDFs, 48 in the example above. The distance vector obtained in this way is stored in operation 44. For a one-dimensional Gaussian distribution, the probability that any model state gives rise to an observation equivalent to a given feature vector is given by the following equation.

Here σ is the standard deviation, x is the observation, and x̄ is the mean observation.

Suppose a distribution has J dimensions. In the speech recognition method of the invention, J corresponds to the number of elements in a feature vector. The probability above then becomes as follows.

In the above expression, Π denotes, following the usual mathematical convention, the product of the terms obtained by varying j. The values of x_j are obtained from the feature vector, and the values of x̄_j and σ_j from the template vector. Taking the natural logarithm of this probability and multiplying by −2 gives the following expression, referred to in this specification as the "distance" of the probability.
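The equations referred to in this passage are not reproduced in this text. Assuming the standard Gaussian density with independent features, which is consistent with the symbols σ, x, x̄, J and Π defined above, they would take the following form. This is a reconstruction, not the printed equations.

```latex
% One-dimensional Gaussian density
p(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\bar{x})^{2}}{2\sigma^{2}}\right)

% J-dimensional case, features treated as independent
p(\mathbf{x}) = \prod_{j=1}^{J} \frac{1}{\sigma_j\sqrt{2\pi}}
       \exp\!\left(-\frac{(x_j-\bar{x}_j)^{2}}{2\sigma_j^{2}}\right)

% Minus twice the natural logarithm
-2\ln p(\mathbf{x}) = \sum_{j=1}^{J}
       \left[\frac{(x_j-\bar{x}_j)^{2}}{\sigma_j^{2}} + 2\ln\sigma_j\right]
       + J\ln(2\pi)
```

The sum on the right is the part that depends on the PDF and the observation; the last term is the same for every PDF.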

Here the term J ln(2π) has been omitted, since it is a constant and is not needed when relative "distances" are compared. Each element of the resulting distance vector is a measure of how improbable the speech sample is given the corresponding PDF, and is calculated so as to be proportional to the logarithm of the reciprocal of the probability of the speech sample given that PDF.

Using "distances" reduces the number of multiplications and divisions, which speeds up recognition and reduces the cost of special-purpose circuitry for implementing the flowchart.
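A minimal sketch of the distance calculation follows, assuming independent Gaussian features and the omission of the constant J ln(2π) as described. The split into a precomputation step and a per-frame step is an illustrative design choice, not something stated in the specification, and all names are hypothetical.

```python
import math

def precompute(std):
    """Per-PDF constants that do not depend on the observation."""
    inv_var = [1.0 / (s * s) for s in std]              # 1 / sigma_j^2
    log_term = sum(2.0 * math.log(s) for s in std)      # sum of 2 ln sigma_j
    return inv_var, log_term

def distance(x, mean, inv_var, log_term):
    """-2 ln p(x) without the constant J ln(2 pi)."""
    return log_term + sum((xj - mj) ** 2 * w
                          for xj, mj, w in zip(x, mean, inv_var))

def distance_vector(x, pdfs):
    """One distance per stored PDF; pdfs is a list of (mean, inv_var, log_term)."""
    return [distance(x, mean, inv_var, log_term)
            for mean, inv_var, log_term in pdfs]
```

With the reciprocal variances and logarithms computed once per PDF, each frame needs only subtractions, multiplications and additions, with no division, exponential or logarithm.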

The distance vector gives, for each possible model state, the probability that the current sound arose from that state. Once it has been obtained, each model is considered in turn to determine how likely it is that the spoken word corresponds to that model. Thus a model is selected in operation 45, and in operation 46 the minimum distance to each state of that model is calculated. This process is described in detail below with reference to Figures 4 to 10.

This process continues until every model has been processed, as indicated by decision 47. Operation 48 is then performed, in which the smallest of the minimum distances over all states of all models is found. This value is used in operation 49 to normalize all the minimum cumulative distances found. (Operations 48 and 49 are also described with reference to Figures 4 and 10.) The normalized distances are then stored in operation 50 as a cumulative distance vector. This vector therefore has sets of elements D1, D2, D3, ... Dn.

Here n is the maximum number of states in a model, and there is one set for each model.

The smallest of the minimum cumulative distances is also stored. Operation 51 is then performed to determine, from the last state of each model, the model with the smallest Dn, and to provide the corresponding word as the most probable word spoken in the current sampling period.
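Operations 48 to 51 can be sketched as follows. The dictionary layout (word mapped to a list of per-state minimum cumulative distances, last state last) and the function name are assumptions made for the illustration.

```python
def normalise_and_decide(cumulative):
    """Normalise cumulative distances and pick the most probable word.

    cumulative maps each word to the list of minimum cumulative distances
    for the states of its model, in left-to-right order.
    Returns (best_word, normalised, smallest).
    """
    # Operation 48: smallest distance over all states of all models.
    smallest = min(min(states) for states in cumulative.values())
    # Operations 49 and 50: subtract it everywhere and keep the result.
    normalised = {word: [d - smallest for d in states]
                  for word, states in cumulative.items()}
    # Operation 51: the word whose final state has the smallest distance.
    best_word = min(normalised, key=lambda w: normalised[w][-1])
    return best_word, normalised, smallest
```

Subtracting the same value from every distance leaves their ordering unchanged, so the normalization does not affect which word is selected.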

Such a method is readily extended to the recognition of connected words, as described in "An algorithm for connected word recognition" by J. S. Bridle, M. D. Brown and R. M. Chamberlain, JSRU Research Report 1010. There, a path is found through the models that gave the minimum cumulative distance, using traceback pointers that indicate this path.

One way in which traceback pointers can be determined is described below. The following two papers also describe methods suitable for recognizing connected words that make use of minimum cumulative distances and traceback pointers:

"Partial traceback and dynamic programming", P. F. Brown, J. C. Spohrer, P. H. Hochschild and J. K. Baker, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1629 to 1632, 1982; and

"The use of a one-stage dynamic programming algorithm for connected word recognition", Hermann Ney, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-32, No. 2, April 1984, pages 263 to 271.

The process of Figure 2 is repeated for each frame period, which usually calls for a fast special-purpose computer. A machine of this kind is described next.

Apparatus that can be used for both training and speech recognition is shown in Figure 3. A microphone 60 is coupled through an amplifier 61 to the input of an analog-to-digital converter 62. The amplifier 61 has automatic gain control and a wide dynamic range, but neither the amplifier nor the microphone 60 is described in detail, since suitable circuits for this part of the apparatus are well known. A processor 63, which may for example be an Intel 8051 microprocessor, supplies an 8 kHz sampling frequency over line 71 to the converter 62 and to a front-end processor 64. Each cycle of the processing shown in Figure 2 is therefore completed in 125 microseconds. The processor 64 performs the digital filtering operation 40 mentioned above, the passbands preferably being spaced approximately arithmetically between 0 kHz and 1 kHz and logarithmically between 1 kHz and 4.6 kHz. Feature vectors thus become available on an 8-bit bus 65, over which they are passed to a logic circuit 66, which may be a mask-programmed array. Handshaking between the processor 64 and the logic circuit 66 is controlled over line 70.

The processor 63, the logic circuit 66 and a random access memory (RAM) 67 are interconnected by address and data buses 68 and 69.

A host computer 72 is connected to the processor 63 by a bus 73 and a control line 74.

Components 62 to 67 are expected to form a unit that is manufactured and sold as a peripheral for use with a host computer. The host computer 72 can then provide the input facility needed during training (when the spoken words must be identified) and a display showing the words to be recognized. Where the apparatus is used for recognition only, the RAM 67 can be partly replaced by read-only memory, and the host computer 72 can be replaced by a simple or intelligent display that shows the recognized words. The logic circuit 66 implements the Viterbi algorithm.

It therefore carries out operations equivalent to operations 45 to 48 and determines the minimum cumulative distances. The remaining operations shown in Figures 1a and 1b are performed by the processor 63. The RAM 67 stores the composite templates of operation 13 and the models of operation 29, including the PDFs.

The RAM 67 also stores the distance vector of operation 44 and the cumulative distance vector of operation 50.

Programming these processors is a known technique and is possible given the algorithms of Figures 1a and 1b, so it is not described further. One way in which operations 46, 48 and 49 can be carried out is, however, described below, followed by an example of a Viterbi engine for performing these operations.

The example described shows how traceback pointers are obtained. As indicated as an option in connection with operation 51 of Figure 2, these pointers need not of course be obtained if the recognition process in use does not employ them.

The finite state machine of Figure 4 represents a single word in the predetermined vocabulary of the recognizer. It has three normal states A, B and C, although in practice there would typically be many more than three.

The finite state machine also has start and end dummy states (SD and ED). The normal model states are separated by the transitions shown by arrows. In speech models, only left-to-right transitions and transitions from a state to itself are used. In the simple example of a word made up of three sounds, the sounds must occur in left-to-right order, although a sound may remain in the same state for the duration of one or more time frames.

The information supplied by the Viterbi engine to the processor 63, so that words can be indicated as recognized, concerns the maximum-likelihood paths through the models and the lengths of those paths. Maximum likelihood is determined by assigning two types of penalty to the models:

Transition penalties tp(s1, s2). There is a separate one for each transition shown by an arrow, whether between states or from a state to itself.

State penalties sp(I, t). These are assigned to a normal state, by way of an index I(s), at a particular iteration t of the Viterbi engine.

The transition penalties and the indices need not remain constant for the models of a particular Viterbi engine, but in this example they are kept constant.

Table 1 in Figure 5 shows how the values associated with the indices change with time. At the first iteration (0), the values associated with the indices I1, I2, ... Ii are formed by the elements of the distance vector for the first frame (0). Subsequent distance vectors form columns 1, 2, ... i of Table 1. Only the current distance vector is held by the Viterbi engine.

The way in which the minimum distance through each model is calculated will now be described. At each iteration the Viterbi engine calculates a minimum cumulative distance for each normal state, starting at the rightmost state and working to the left. Thus for iteration 0 and state C, the minimum cumulative distance over three paths is determined, where for each path the stored cumulative distance for the state at the start of the path is added to the transition penalty for that path.

The minimum so found is added to the penalty for being in state C, which is obtained using the index assigned to state C and the current distance vector (that is, the currently available column of Table 1). The minimum cumulative distance (MCD) obtained in this way for each state is held in a store which, at any given time, holds one row of Table 2 in Figure 6. At the first iteration 0, all the values in row 0 are initialized to a maximum value. When iteration 1 is carried out, the cumulative distances for states A, B and C are therefore available and the minimum cumulative distances are updated. For example, to update the minimum cumulative distance for state C, the previous cumulative distances (from iteration 0) for states A, B and C are available. Once it has been determined, at iteration 1, which of these gives the minimum cumulative distance for state C, that distance is used to update the MCD. At the same time the traceback pointer (TB), which is initialized to the value 0 at iteration 0, is updated by incrementing by one the previous traceback pointer of the state from which the new minimum cumulative distance was calculated. Thus at each iteration t, one minimum cumulative distance MCD(s, t) and one traceback pointer TB(s, t) are obtained for each state s, as follows.
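The two expressions themselves are missing from this text. A reconstruction consistent with the verbal description of the update and with the symbols MCD, TB, tp, DV and I(s) would be the following; it is offered as an assumption rather than as the printed equations.

```latex
\mathrm{MCD}(s,t) = \min_{x}\bigl\{\mathrm{MCD}(x,t-1) + tp(x,s)\bigr\}
                    + \mathrm{DV}\bigl(I(s),t\bigr)

\mathrm{TB}(s,t) = \mathrm{TB}\Bigl(\operatorname*{Argmin}_{x}
                    \bigl\{\mathrm{MCD}(x,t-1) + tp(x,s)\bigr\},\; t-1\Bigr) + 1
```

In words: the best predecessor x of state s is the one minimizing stored distance plus transition penalty, the state penalty for s is then added, and the traceback pointer of that predecessor is carried forward and incremented by one.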

Here t refers to the current iteration. DV(I(s), t) is the element of the distance vector obtained using the index I(s) for state s at iteration t. tp(x, s) is the transition penalty for the transition from state x to state s. MCD(s, t) is the minimum cumulative distance to state s at iteration t.

TB(s, t) is the traceback pointer for state s at iteration t.

min_x{ } denotes the minimum of the expression in braces over all valid values of x. Argmin_x{ } denotes the value of x that minimizes the expression in braces.

As already mentioned, the Viterbi engine obtains a minimum cumulative distance and a traceback pointer for each state in each frame. For each finite state machine, these minimum cumulative distances and the corresponding traceback pointers are passed as data to decision logic that indicates which words have apparently been spoken. In addition to the operations described above, at the end of each complete iteration through all the models the Viterbi engine finds the smallest of the minimum cumulative distances over all the models. In the next iteration this smallest value is subtracted from all the stored minimum cumulative distances. This scaling operation prevents, as far as possible, the stored cumulative distances from growing beyond the size of the storage locations. It can be seen from the expressions above that minimum cumulative distances are formed by addition and therefore inevitably increase; subtracting a minimum value from all the values obtained holds this tendency in check. If, however, a stored minimum cumulative distance reaches the maximum value the store can hold, it is held at that value and not increased further. To avoid storing misleading cumulative distances, the subtraction is not applied to a value once it has reached the maximum. Such maximum values eventually leave the store automatically, because a minimum cumulative distance is stored at each iteration and the maximum values are in the end displaced.
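One iteration of the update for a single word model, together with the scaling and saturation behaviour described above, can be sketched as follows. The saturation value of 255, the data layout and all names are assumptions for illustration; the specification does not give the word length of the store.

```python
MAX_VAL = 255  # assumed largest value a storage location can hold

def viterbi_step(mcd, tb, preds, tp, index, dv):
    """Update one model in place for the current frame.

    mcd, tb : per-state minimum cumulative distances and traceback pointers
    preds   : preds[s] lists the states x with a transition x -> s (s included)
    tp      : tp[(x, s)] is the transition penalty
    index   : index[s] selects the element of the distance vector dv
    """
    # Rightmost state first: in a left-to-right model every predecessor of s
    # lies at or to the left of s, so its previous-frame value is still intact.
    for s in reversed(range(len(mcd))):
        best_x = min(preds[s], key=lambda x: mcd[x] + tp[(x, s)])
        total = mcd[best_x] + tp[(best_x, s)] + dv[index[s]]
        new_tb = tb[best_x] + 1
        mcd[s] = min(total, MAX_VAL)   # hold at the maximum, never wrap
        tb[s] = new_tb
    return mcd, tb

def normalise(all_mcd):
    """Subtract the global minimum from every unsaturated stored distance."""
    smallest = min(min(m) for m in all_mcd)
    for m in all_mcd:
        for s, value in enumerate(m):
            if value < MAX_VAL:        # saturated values are left untouched
                m[s] = value - smallest
    return smallest
```

Working from right to left is what allows a single row of storage to be updated in place, which matches the description of a store holding one row of Table 2 at any time.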

The two dummy states of Figure 5 are used to simplify transitions from one finite state machine (representing one word) to another. In Figure 7, two different words have similar models A, B, C and A', B', C' respectively, although, as already mentioned, the transition penalties and indices differ. The normal transition from the end of the first model to the beginning of the second is shown by arrow 110 from state C to state A'. When part of a word is not pronounced, however, transitions such as those shown by arrows 111 and 112, which omit the last state of the first model or the first state of the second model, often occur. Transitions from state B to state B' also occur frequently. Taking account of every transition that could occur between the 64 words in the vocabulary of a typical recognizer would be extremely complicated if carried out on the basis of Figure 5. The difficulty can be avoided by using the dummy states, as will now be described. When the Viterbi engine has finished updating the normal states of a word, it updates the end dummy state (ED) by finding the minimum cumulative distance for that state. The update is carried out in the same way as for a normal state, except that there is no state penalty for the end dummy state. A traceback pointer is also stored for the end dummy state, and is obtained in the same way as the other traceback pointers.

However, the traceback pointer corresponding to the selected minimum cumulative distance is not incremented before it is stored.

ビテルビ・エンジンがすべてのモデルを処理したときに
、最小累積距離のうちの最も小さいものを有する終了ダ
ミーは、選ばれた単語モデルの開始ダミー状態（ＳＤ）
を更新するのに利用する。その他の開始状態は、ある最
大値へと更新される。When the Viterbi engine has processed all models, the ending dummy with the smallest of the minimum cumulative distances is the starting dummy state (SD) of the chosen word model.
Use to update. Other starting states are updated to some maximum value.

すべての開始ダミーに対するトレースバック・ポインタ
は、ゼロに設定される。開始状態は、最も小ざい最小累
積距離を伴った更新のために、ある文法規則に基づいて
、選ばれる。その文法規則に従えば、認識すべき語彙に
含まれる単語が、それより前の単語に続くことができる
か否かを判断できる。その様な文法が存在しない場合に
は、あるいは文法を無視する場合には、最も小さい最小
累積距離を得ることによって、すべての開始状態が更新
される。この方法では、任意の有限状態機械から別の有
限状態機械への移行に対する最小経路と累積距離は、各
機械に対して記録される。Traceback pointers for all starting dummies are set to zero. The starting state is chosen for update with the smallest minimum cumulative distance based on certain grammar rules. According to the grammar rules, it can be determined whether a word included in the vocabulary to be recognized can follow a previous word. If no such grammar exists, or if the grammar is ignored, all starting states are updated by obtaining the smallest minimum cumulative distance. In this method, the minimum path and cumulative distance for transitioning from any finite state machine to another is recorded for each machine.
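
The hand-over between words through the dummy states can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented circuit: the function name, the dictionary form of the grammar and the value 255 used for "some maximum value" are all hypothetical, and the sketch follows one reading of the text in which only the words the grammar allows to follow the best-scoring word receive the smallest distance.

```python
MAX_DISTANCE = 255  # assumed "maximum value"; the description gives no number


def update_start_dummies(end_dummy, grammar=None):
    """Return {word: (start distance, traceback pointer)} for the next iteration.

    end_dummy: {word: minimum cumulative distance held by that word's ED state}
    grammar:   optional {word: set of words allowed to follow it} (hypothetical form)
    """
    best_word = min(end_dummy, key=end_dummy.get)
    best_distance = end_dummy[best_word]
    start = {}
    for word in end_dummy:
        # Without a grammar every starting dummy receives the smallest distance.
        allowed = grammar is None or word in grammar.get(best_word, set())
        # Traceback pointers of all starting dummies are set to zero.
        start[word] = (best_distance if allowed else MAX_DISTANCE, 0)
    return start


end_dummy = {"one": 12, "two": 7, "three": 30}
no_grammar = update_start_dummies(end_dummy)
with_grammar = update_start_dummies(end_dummy, {"two": {"three"}})
```

With this arrangement each word model needs only one entry point and one exit point, so the number of inter-word connections grows with the number of words rather than with the number of word pairs.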

ビテルビ・エンジンは、汎用コンピュータを使用して構
成することができる。しかし、連続音声認識のためには
、十分短い時間で処理を行うために、ディスクリート集
積回路か、または、特別にメタライズしたゲートもしく
は論理アレイを有するシングル・クリップによって、専
用コンピュータを構成したほうが好ましい。専用コンピ
ュータは、その−例は後に述べるが、例えば、多数のラ
ッチに接続した出力装置を備えた演算装置を有する。ラ
ッチのあるものは、ランダム・アクセス・メモリーのア
ドレス・バスに接続され、その他のラッチは、データ・
バスに接続される。The Viterbi engine can be constructed using a general-purpose computer. However, for continuous speech recognition it is preferable, in order to perform the processing in a sufficiently short time, to construct a dedicated computer from discrete integrated circuits or as a single chip with specially metallized gates or logic arrays. A dedicated computer, an example of which is described later, has, for example, an arithmetic unit whose output is connected to a number of latches. Some of the latches are connected to the address bus of a random access memory, and the others are connected to the data bus.

前の段落で述べたようなコンピュータを、この応用のた
めにフレキシブルにするためには、すなわち、有限状態
機械のモデルに含まれる状態の数や状態間の移行を変更
できないようなハード配線を避けるためには、ＲＡＭ６
７の一部を「スケルトン」として割り当てる。この「ス
ケルトン」は、各モデルに含まれる状態の数と状態間の
移行経路とを決定する。ＲＡＭの様々な部分は、第８図
に示す。これらの部分の一つ１１５が「スケルトン」で
ある。部分１１６〜１１８は、三つのそれぞれのモデル
に対するデータのほとんどを含む。各モデルに対して一
つの同様な部分が存在する。第４図のモデルがＲＡＭ１
１６によって表されるとすると、次のことがわかるであ
ろう。３個の下部記憶場所は、伯の状態からのまたはそ
れ自身からの状ＲＣへの移行に対応する３個のエントリ
移行ペナルティ−１ｐ１〜ｔ、３を含む。第４の記憶場
所はインデックスを含む。このインデックスにより、第
１表から状態ペナルティ−を得ることができる。部分１
１６は、状態Ａ、　Ｅ３．　ＥＥｏに対するその他の記
憶場所に分割されている。状態Ｃは、三つのエントリ移
行があるため、状態Ａ、Ｂよりも記憶場所が一つ多い。To make a computer such as that described in the previous paragraph flexible for this application, that is, to avoid hard wiring that would not allow the number of states in the finite state machine models or the transitions between states to be changed, part of RAM 67 is allocated as a "skeleton". This skeleton determines the number of states in each model and the transition paths between states. Various portions of the RAM are shown in Figure 8. One of these portions, 115, is the skeleton. Portions 116 to 118 contain most of the data for three respective models, and there is one similar portion for each model. If the model of Figure 4 is represented by RAM portion 116, the following can be seen. The three lowest storage locations contain the three entry transition penalties tp1 to tp3, corresponding to transitions into state C from other states or from itself. The fourth storage location contains the index by which the state penalty is obtained from Table 1. Portion 116 is divided into further storage locations for states A, B and ED. State C has one more storage location than states A and B because it has three entry transitions.

そして、状態ＥＤは、この状態に対するペナルティ−が
ないため、状態Ａよりも記憶場所が一つ少ない。状態Ｓ
Ｄは、この状態への移行が無く、関連するペナルティ−
もないため、部分１１６に記憶場所がない。債に述べる
ように、繰り返しを実施するときは、ビテルビ・エンジ
ンは、部分１１６．１１７．１１８をそしてその他のモ
デルのこれらの部分を順番に通って移動するために、ポ
インタを使用する。一つのポインタは、スケルトンＲＡ
Ｍ部分１１５を通って移動するためにも使用される。ス
ケルトンＲＡＭ部分１１５では、各状態に対して、各移
行ペナルティ−に対するオフセットが記憶される。State ED has one less storage location than state A because there is no state penalty for this state. State SD has no storage location in portion 116 because there are no transitions into it and it has no associated penalty. As described later, when an iteration is performed the Viterbi engine uses pointers to move in turn through portions 116, 117 and 118 and the corresponding portions for the other models. A pointer is also used to move through the skeleton RAM portion 115, in which an offset is stored, for each state, for each transition penalty.
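
The storage scheme can be pictured with a small sketch that flattens one word model into a penalty portion (like portion 116) and a skeleton portion (like portion 115). Everything specific here is an assumption made for illustration: the topology of the model, the penalty and index values, the offsets (two addresses per state, because each state has a distance and a traceback pointer), and the `GROUP_END` marker that stands in for whatever closes a state's group of offsets.

```python
GROUP_END = "END"  # hypothetical marker closing one state's group of offsets

# Assumed topology for a three-state word model, listed in processing order.
# Each entry: (state, [(offset, transition penalty), ...], state-penalty index or None)
MODEL = [
    ("C", [(0, 3), (2, 5), (4, 9)], 17),   # entered from C itself, B and A
    ("B", [(0, 2), (2, 6)], 11),           # entered from B itself and A
    ("A", [(0, 4), (2, 1)], 8),            # entered from A itself and SD
    ("ED", [(8, 0), (6, 7)], None),        # ending dummy: no state penalty, so no index
]


def build_tables(model):
    """Flatten a model into a penalty portion and a skeleton portion."""
    penalties, skeleton = [], []
    for _state, transitions, index in model:
        for offset, penalty in transitions:
            skeleton.append(offset)
            penalties.append(penalty)
        if index is not None:
            penalties.append(index)        # the index follows the state's penalties
        skeleton.append(GROUP_END)
    return penalties, skeleton


penalties, skeleton = build_tables(MODEL)
```

In this sketch state C occupies four locations of the penalty portion, states A and B three each and ED two, which matches the relative sizes given in the description; SD contributes nothing.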

ＲＡＭのその他の部分は、各単語に対して一つの部分が
対応するが、離して設定され、各単語の各状態に関連す
る最小累積距離とトレースバック・ポインタとを記憶す
る。これらの部分の例を、第６図の１２０．１２１．１
２２で示す。第４図に示す単語に対しては、ＲＡＭ部分
１２０は、５対の記憶場所に分割され、各状態に対して
その一つが対応し、その状態に対して、累積距離とトレ
ースバック・ポインタとを含む。Other portions of the RAM, one for each word, are set aside to store the minimum cumulative distance and the traceback pointer associated with each state of each word. Examples of these portions are shown at 120, 121 and 122 in Figure 6. For the word shown in Figure 4, RAM portion 120 is divided into five pairs of storage locations, one pair for each state, each containing the cumulative distance and the traceback pointer for that state.

最小累積距離を更新するために、矢印１２３〜１２５で
示され、かつ専用コンピュータにおいてラッチによって
保持された、３個のポインタが使用される。第１のポイ
ンタ１２３は、最初は、更新すべき第１のモデルの最後
の状態の第１の移行ペナルティ−を指す。この例では、
ＲＡＭ部分１１６の最も下の記憶場所によって保持され
る。ビテルビ・エンジンは、状態Ｃに移行するときに、
どの累積距離が最小であるかを決定しなければならない
。それをするために、ビテルビ・エンジンは、状態Ｃへ
の経路のそれぞれに対して、累積距離を見つけなければ
ならない。それから、どれが最小であるかを決定しなけ
ればならない。移行ペナルティ−１が１ｑられた後に、
矢印１２４は、スケルトンＲＡＭ部分１１５におけるオ
フセットを指す。このオフセットは、第１の移行がそこ
から始まるところの状態における累積距離の位置を決定
する。したがって、もし出発状態が現在の状態であるな
らば、オフセットはゼロであり、累積距離は、ポインタ
１２５によって与えられるものである。得られた距離は
記憶され、ポインタ１２３と１−２４は一つだけ増分さ
れる。これにより、オフセット２によって与えられた累
積距離に、第２の移行ペナルティ−が加えられる。オフ
セット２は、この場合、「２」であるかもしれない。そ
の場合、ポインタ１２５のアドレスよりも二つだけ大き
いアドレスを伴った記憶場所に保持されている値が読み
取られる。すなわち、状態Ｂの累積距離である。この方
法で得られた距離は、もしそれが以前に得られたものよ
りも小さければ、−記憶される。ポインタ１２３と１２
４は再び増分されて、第３の移行と、それに加えるべき
累積距離とを与える。それから、得られた距離は、もし
既に記憶した距離よりも小さければ、再び記憶される。Three pointers, indicated by arrows 123 to 125 and held by latches in the dedicated computer, are used to update the minimum cumulative distances. The first pointer 123 initially points to the first transition penalty of the last state of the first model to be updated; in this example it is held in the lowest storage location of RAM portion 116. The Viterbi engine must determine which cumulative distance into state C is the minimum. To do so, it must find the cumulative distance for each of the paths into state C and then decide which is the smallest. After transition penalty 1 has been obtained, arrow 124 points to an offset in skeleton RAM portion 115. This offset determines the position of the cumulative distance of the state from which the first transition originates. Thus, if the originating state is the current state, the offset is zero and the cumulative distance is that given by pointer 125. The distance obtained is stored, and pointers 123 and 124 are incremented by one, so that the second transition penalty is added to the cumulative distance given by offset 2. Offset 2 might be "2" in this case, in which case the value held in the storage location whose address is two greater than that of pointer 125 is read, that is, the cumulative distance of state B. The distance obtained in this way is stored if it is smaller than the one obtained previously. Pointers 123 and 124 are incremented again to give the third transition and the cumulative distance to be added to it. The distance obtained is again stored if it is smaller than the distance already stored.

この方法では、得られた累積距離の最小値が記憶される
。ポインタ１２３と１２４は再び増分され、それから、
ポインタ１２３が、適当な状態ペナルティ−を得るため
のインデックスを与える。インデックス１２４は、状態
Ｃに対する繰り返しが最後に達したことを示す。次に、
適当なトレースバック・ポインタと一緒に得られた最小
累積距離が、ポインタ１２５によって与えられた場所に
記憶される。そして、これらポインタのすべてが増分さ
れる。それから、状態ＢとＣに対する累積距離は、ポイ
ンタ１２３．１２４．１２５が増分されたのと同様の方
法で更新される。しかし、終了状態（ＥＤ）に対する累
積距離とトレースバック・ポインタは、あるモデル内の
すべての状態が更新されかつ上述のようにすべてのモデ
ルが更新されたときくそれらの開始状態を除く）に得ら
れる。それから、任意の終了ダミー状態によって保持さ
れた最小累積距離が、文法規則に基づいて、選ばれた開
始状態（ＳＤＳ）に入力される。In this way the minimum of the cumulative distances found is stored. Pointers 123 and 124 are incremented again, and pointer 123 then gives the index for obtaining the appropriate state penalty, while pointer 124 indicates that the end of the iterations for state C has been reached. Next, the minimum cumulative distance found, together with the appropriate traceback pointer, is stored at the location given by pointer 125, and all of these pointers are incremented. The cumulative distances for states B and C are then updated in the same way, with pointers 123, 124 and 125 being incremented as before. However, the cumulative distance and traceback pointer for the ending dummy state (ED) are obtained once all the states in a model have been updated, and, as described above, once all the models have been updated (apart from their starting states), the minimum cumulative distance held by any ending dummy state is entered into the starting states (SDs) selected on the basis of the grammar rules.
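
The walk of the three pointers over one state can be sketched in software as below. This is a hedged illustration rather than the circuit itself: the flat lists stand in for the RAM portions, `EOS` is a hypothetical encoding of the marker that ends a state's offsets, the state penalties are looked up in a plain dictionary, and the numeric values are invented. The increment of the traceback pointer is taken from the later hardware description.

```python
EOS = -1  # hypothetical encoding of the marker that ends a state's offsets


def update_state(penalties, skeleton, distances, state_penalty,
                 pen_pt, skel_pt, cd_pt):
    """Update the state whose (distance, traceback) pair starts at cd_pt.

    pen_pt, skel_pt and cd_pt play the roles of pointers 123, 124 and 125.
    Returns the three pointers advanced to the next state.
    """
    best, best_tb = None, None
    while skeleton[skel_pt] != EOS:
        origin = cd_pt + skeleton[skel_pt]        # offset 0 is the state itself
        candidate = distances[origin] + penalties[pen_pt]
        if best is None or candidate < best:
            best, best_tb = candidate, distances[origin + 1]
        pen_pt += 1
        skel_pt += 1
    index = penalties[pen_pt]                     # the index follows the penalties
    distances[cd_pt] = best + state_penalty[index]
    distances[cd_pt + 1] = best_tb + 1
    return pen_pt + 1, skel_pt + 1, cd_pt + 2


# Pairs for states C, B and A, in that order: (cumulative distance, traceback).
distances = [10, 3, 4, 5, 20, 1]
pointers = update_state([3, 5, 9, 17], [0, 2, 4, EOS], distances, {17: 6}, 0, 0, 0)
```

Here the three candidates for state C are 13, 9 and 29, so the path from state B wins and its traceback pointer is carried forward.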

「スケルトン」を使用することによって、モデルは三つ
の方法で変更できる。すなわち、第１は、オフセットを
変更することによって、状態間の移行を変更できる。第
２は、もし二つの標識間のオフセットの数を変更すれば
、ある状態への移行の数が変化する。一つの状態の更新
が完了したことを示すのは、この標識でおる。各モデル
における状態の数は、ＲＡＭ部分１１５に記憶されたオ
フセットのグループの数を増加することによって変更で
きる。モデルの数は、もつと多くの部分、例えば１１６
．１１７．１１８や１２０．１２１．１２２）を割り当
てることによって増加できる。By using the skeleton, the models can be changed in three ways. First, the transitions between states can be changed by changing the offsets. Second, the number of transitions into a state changes if the number of offsets between two markers is changed; it is these markers that indicate that the update of one state is complete. Third, the number of states in each model can be changed by increasing the number of groups of offsets stored in RAM portion 115. The number of models can be increased by allocating more portions (such as 116, 117, 118 and 120, 121, 122).

第９図のフローチャートは、ビテルビ・エンジンの一つ
の繰返しを示すものでおるが、その動作のおもな点は、
そのチャートから明らかであろう。The flowchart of Figure 9 shows one iteration of the Viterbi engine, and the main points of its operation should be clear from the chart.

操作１３０では、ＲＡＭ内の、第１表の現在の列を表す
記憶場所に、距離ベクトルが読み込まれる。In operation 130, a distance vector is read into a memory location in RAM that represents the current column of the first table.

それから、ループ１３１に入る。ここでは、第１のモデ
ルの最後の正常モデルの一つの移行に対して累積路１１
ｔｃＤｎが計算される。もしＣＤ、の値が、その状態に
ついての最小累積距離のリセット値（すなわち、以前に
計算した（ｔ！！（ＭＣＤ））よりも小さいならば、そ
のときは、操作１３２で、ＭＣＤをＣＤ、に設定する。Loop 131 is then entered, in which the cumulative distance CDn is calculated for one transition of the last normal state of the first model. If the value of CDn is smaller than the reset value (or the previously calculated value) of the minimum cumulative distance (MCD) for that state, MCD is set to CDn in operation 132.

これと同時に、その状態の出発状態の累積距離について
の対応トレースバック・ポインタを、一つだけ増分する
。判定１３３では、現在の状態への移行がもつと存在す
るかどうかを決定する。もしそうでなければ、そのとき
は、ループ１３４を実施する。ここでは、最も小さい最
小距離を以前の繰り返しから引き算することによって基
準化を実施し、その状態での状態ペナルティ−を加え、
最小ＭＣＤを記憶する。At the same time, the traceback pointer corresponding to the cumulative distance of the originating state is incremented by one. Decision 133 determines whether there are any further transitions into the current state. If there are not, loop 134 is performed, in which normalization is carried out by subtracting the smallest minimum cumulative distance from the previous iteration, the state penalty for the state is added, and the resulting minimum MCD is stored.

ループ１３４の最後では、その状態についてのＭＣＤと
ＴＢを記憶し、そして、次の状態についてループ１３１
を繰り返す。この繰り返しを実施するときは、ＭＣＤは
、新しい状態に対してリセットされ、ＣＤｎ、ｔｐ、Ｃ
ＯＤ、ＴＢは新しい状態に関係する。もし判定１３５に
よって、新しい状態が最後の状態であると決定されたな
らば、ジャンプ１３６によって示すように、ループ１３
１と１３４を繰り返す。しかし、必要に応じて変数は再
びリセットされ、新しいモデルと新しい状態を表す。判
定１３７で示すように、すべてのモデルが処理されたと
きは、−すべでのモデルの開始ダミー状態が更新され、
すべてのモデルのすべての状態に関する様々なＭＣ［）
とＴＢの値が、プロセッサ６３に出力できるように待機
される。プロセッサに出力されると、決定された単語が
、認識されたものとして示される。それから、次の繰り
返しが操作１３０で開始される。At the end of loop 134, the MCD and TB for the state are stored, and loop 131 is repeated for the next state. For this repetition, MCD is reset for the new state, and CDn, tp, COD and TB relate to the new state. If decision 135 determines that the new state is the last state, loops 131 and 134 are repeated as indicated by jump 136, with the variables again reset as necessary to represent a new model and new states. When all the models have been processed, as indicated by decision 137, the starting dummy states of all the models are updated, and the various MCD and TB values for all the states of all the models are held ready for output to processor 63. When they are output to the processor, the word determined is indicated as recognized. The next iteration then begins at operation 130.
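
At the level of the flowchart, one iteration might look like the following sketch. It is an assumption-laden outline, not the flowchart of Figure 9 itself: the dictionary layout, the example word and all numbers are invented, the ending dummy states and the grammar step are left out, and the states are listed last normal state first so that every distance read still belongs to the previous iteration when it is used.

```python
def viterbi_iteration(models, state_penalty, previous_min):
    """Run one iteration over all models and return the smallest MCD stored.

    models: {word: list of states, last normal state first}; each state is a
            dict with 'mcd', 'tb', 'index' and 'entries', where 'entries' is a
            list of (position of originating state in the list, tp).
    """
    smallest = None
    for states in models.values():
        for state in states:
            if not state["entries"]:              # starting dummy: nothing enters it
                continue
            mcd, tb = None, None
            for origin, tp in state["entries"]:   # loop 131
                cd = states[origin]["mcd"] + tp
                if mcd is None or cd < mcd:       # operation 132
                    mcd, tb = cd, states[origin]["tb"] + 1
            # Loop 134: normalize, add the state penalty, store MCD and TB.
            state["mcd"] = mcd - previous_min + state_penalty[state["index"]]
            state["tb"] = tb
            if smallest is None or state["mcd"] < smallest:
                smallest = state["mcd"]
    return smallest


word = [
    {"mcd": 10, "tb": 3, "index": 0, "entries": [(0, 3), (1, 5), (2, 9)]},  # C
    {"mcd": 4, "tb": 5, "index": 1, "entries": [(1, 2), (2, 6)]},           # B
    {"mcd": 20, "tb": 1, "index": 2, "entries": [(2, 4), (3, 1)]},          # A
    {"mcd": 7, "tb": 0, "index": None, "entries": []},                      # SD
]
smallest = viterbi_iteration({"yes": word}, [6, 2, 1], previous_min=4)
```

The value returned would serve as `previous_min` for the following iteration, which keeps the stored distances from growing without bound.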

第１０図に示す回路１４０は、ビテルビ・エンジン（す
なわち、論理回路６６）のために使用することができる
。例えば、回路１４０は、ディスクリート集積回路また
はカスタムゲートアレイから構成することができる。ビ
テルビ・エンジンは、１６ビツトのアドレスバス１４２
と８ビツトのデータパスコ４３とによって、ＲＡＭ６７
に接続される。ＲＡＭ６７は、第８図の領１！！１１５
〜１２０および同様の領域を含む。The circuit 140 shown in Figure 10 can be used for the Viterbi engine (that is, logic circuit 66). For example, circuit 140 can be constructed from discrete integrated circuits or from a custom gate array. The Viterbi engine is connected to RAM 67 by a 16-bit address bus 142 and an 8-bit data bus 143. RAM 67 contains areas 115 to 120 of Figure 8 and similar areas.

ビテルビ・エンジンの繰り返しが実施されるとき、コン
トローラ１４５に接続した端子１４４に、「開始」信号
が送られる。コントローラ１４５は、ある例では、シー
ケンサとメモリ（どちらも図示せず）を有する。「開始
」信号に応答して、そのすべての状態に亘るシーケンサ
・サイクルは、コントローラ・メモリを番地付けする。When an iteration of the Viterbi engine is to be performed, a "start" signal is applied to terminal 144, which is connected to controller 145. In one example, controller 145 comprises a sequencer and a memory (neither shown). In response to the "start" signal, the sequencer cycles through all its states, addressing the controller memory.

コントローラ・メモリは、２進ビツトの連続パターンを
、端子１４６上に出力する。ビテルビ・エンジン１４０
（コントローラ１４５を除く）のすべての回路の使用可
能端子は、それぞれ、コントローラ１４５のそれぞれの
端子に接続される。これにより、ビットのパターンが現
れたとき、異なる回路または回路グループが可能になる
。The controller memory outputs a succession of patterns of binary bits on terminals 146. The enable terminals of all the circuits of the Viterbi engine 140 (other than controller 145) are each connected to a respective terminal of controller 145, so that different circuits or groups of circuits are enabled as the bit patterns appear.

第１の操作は、ラッチによって永久的に保持されたパラ
メータ・ベース・アドレスを使用する。The first operation uses a parameter base address held permanently by a latch.

これにより、ＲＡＭ６７においてパラメータ・アレイを
番地付けする。バッファ１４８は、コントローラ１４５
によって使用可能にされる。そして、ＲＡＭはバス１４
２によって番地付けされる。パラメータ・アレイは、４
個のポインタのアドレスを保持する。４個のポインタと
は、１２３（ペナルティ−’ポインタ、Ｐｅｎｔ）、１
２４　（スケルトン・ポインタ、Ｔｐｔ）、１２５　（
累積距離ポインタ、ＣＤｐｔ）、第１表を読むためのイ
ンデックス・ポインタ（［）ｖｅｃ）である。これらの
ポインタは、それぞれバッフ？１５４とラッチ１５５を
経由して、直列８ビツト・バイトで、ラッチ１５０．１
５１．１５２．１５３にそれぞれ読み込まれる。コント
ローラ１４５は、再び、必要な制御信号を送る（以後、
本明細書では、コントローラの機能は、特別な芸能が必
要とされる場合を除いて、理解されているものとして扱
う）。This addresses a parameter array in RAM 67: buffer 148 is enabled by controller 145, and the RAM is addressed over bus 142. The parameter array holds the addresses of four pointers. The four pointers are 123 (the penalty pointer, Pent), 124 (the skeleton pointer, Tpt), 125 (the cumulative distance pointer, CDpt) and an index pointer for reading Table 1 (Dvec). These pointers are read, as successive 8-bit bytes via buffer 154 and latch 155, into latches 150, 151, 152 and 153 respectively. Controller 145 again supplies the necessary control signals (from here on, the function of the controller is taken as understood except where a special function is required).

累積距離をある量だけ減少すべき場合、パラメータ・ア
レイのざらに別のエレメントを読み込み、その結果をラ
ッチ１５８に入力することによって、その減少量も初期
設定される。初期設定を完了するために、以前の累積距
離の最小値プラス移行ペナルティ−と、現在の繰り返し
についての最小累積距離とを表す２個のラッチ１５６と
１５７を、それぞれ最大に設定する。If the cumulative distances are to be reduced by some amount, that amount is also initialized by reading a further element of the parameter array and entering the result into latch 158. To complete the initialization, two latches 156 and 157, which hold respectively the minimum so far of previous cumulative distance plus transition penalty and the minimum cumulative distance for the current iteration, are each set to the maximum value.

次に、ＲＡＭ６７から第１のオフセットを読み取るため
に、ラッチ１５１におけるスケルトン・ポインタ１２４
を使用する。この第１のオフセットは、バッファ１５４
を経由して、ラッチ１６０内にロードされる。それから
、ポインタ１２４は、バッファ１６２を介して、演算装
置（ＡＬＵ＞１６１の一つの入力に応用することによっ
て、増分される。このことは、・ＡＬＵの別の入力に「
１」を強制するのに、バッファ１６３を使用したときに
、同時に起こる。バッフ１１５９は、増分されたポイン
タ１２４をラッチ１５１にロードする。Next, skeleton pointer 124 in latch 151 is used to read the first offset from RAM 67. This first offset is loaded into latch 160 via buffer 154. Pointer 124 is then incremented by applying it through buffer 162 to one input of the arithmetic unit (ALU) 161 while buffer 163 is used to force a "1" onto the other input of the ALU. Buffer 159 loads the incremented pointer 124 into latch 151.

累積距離ポインタ１２５は、ラッチ１５２からＡＬＵへ
と送られる。ＡＬＵでは、累積距離ポインタ１２５は、
ラッチ１６０に保持された必要なオフセットに加えられ
る。そして、その結果は、仮のポインタ（Ｖｐｔ）を保
持するラッチ１４０へと送られる。Ｖｐｔは、第１の移
行（例えば、第１図のＡからＣへ）が出発する状態の累
積距離（ＲＡＭ６７に保持されている）を読み込むのに
使用される。というのは、オフセットは、領域１２０に
おける、ポインタ１２５からの、この距離の増分アドレ
スであるからである。読み込まれた距離は、ラッチ１６
５内にロードされる。バッファ１６２）ラッチ１６３、
ＡＬＵ、バッフ１１５９を使用することによって、Ｖｐ
ｔは一つだけ増分される。したがって、ｖｐｔは、ラッ
チ１６５によって保持された累積距離に対応するトレー
スバック・ポインタを示すことになる。このポインタは
、ラッチ１６６内に読み込まれる。次に、ペナルティ−
・ポインタ１２３は、ＲＡＭ６７から適切な移行ペナル
ティ−をラッチ１６０内に読み込むのに使用される。そ
して、ＡＬＵは、ラッチ１６０と１６５の内容、すなわ
ち累積距離と移行ペナルティ−１を足し算する。そして
、その結果を、バッファ１６７を経由して、ラッチ１６
０内にロードする。ラッチ１５６は、通常は、ある状態
に対しては、これまでに得られた累積距離の最小値を保
持する。しかし、最初の移行に対しては、既に述べたよ
うに、累積距離は最大に設定されている。したがって、
通常は、ＡＬＵは、ラッチ１５６と１６０の内容を比較
する。そして、もしラッチ１６０の内容が、これまでに
得られた最小値よりも小さければ、制御ライン１６８上
にフラグが設定される。制御ライン・１６８によって、
コントローラは、ラッチ１６０からラッチ１５６内に読
み込むことができ、トレースバック・ポインタをラッチ
１６６からラッチ１６９内に読み込むことができる。し
たがって、最良の累積距離に、これまでに得られた移行
ペナルティ−と、対応トレースバック・ポインタとを足
したものを得ることができた。Cumulative distance pointer 125 is passed from latch 152 to the ALU, where it is added to the required offset held in latch 160, and the result is passed to latch 164, which holds a temporary pointer (Vpt). Vpt is used to read the cumulative distance (held in RAM 67) of the state from which the first transition (for example, from A to C in Figure 1) originates, since the offset is the address increment of this distance from pointer 125 in area 120. The distance read is loaded into latch 165. Vpt is then incremented by one using buffer 162, latch 163, the ALU and buffer 159, so that Vpt now points to the traceback pointer corresponding to the cumulative distance held by latch 165. This pointer is read into latch 166. Next, penalty pointer 123 is used to read the appropriate transition penalty from RAM 67 into latch 160. The ALU adds the contents of latches 160 and 165, that is, the transition penalty and the cumulative distance, and loads the result into latch 160 via buffer 167. Latch 156 normally holds the minimum cumulative distance found so far for a state, but for the first transition it is set to the maximum value, as already mentioned. In general, therefore, the ALU compares the contents of latches 156 and 160, and if the content of latch 160 is smaller than the minimum found so far, a flag is set on control line 168. This causes the controller to read latch 160 into latch 156 and the traceback pointer from latch 166 into latch 169. In this way the best cumulative distance plus transition penalty found so far is obtained, together with the corresponding traceback pointer.

次に、第１の状態への別の移行の研究を始めるために、
スケルトン・ポインタを使用する。そして、累積距離と
得られた移（テペナルティーの和は、その和がラッチ１
５６に保持された値より小さいかどうかを決定するため
に、チェックされる。摸しそうならば、その和は、対応
トレースバック・ポインタと一緒に、ラッチ１５６と１
６９に記憶される。このプロセスは、スケルトン・ポイ
ンタ１２４が第１の標識に違するまで続く。その第１の
標識は、状態の終了（ＥＯ３）の標識として知られてい
るタイプのものである。この様な標識がバッファ１５４
の出力に現れたときは、その標識は検出器１７０で検出
され、コントローラ１４５は、以下に述べる方法で続行
するように指示を受ける。Next, the skeleton pointer is used to begin examining another transition into the first state. The sum of the cumulative distance and the transition penalty obtained is checked to determine whether it is smaller than the value held in latch 156. If it is, the sum is stored in latch 156, and the corresponding traceback pointer in latch 169. This process continues until skeleton pointer 124 reaches the first marker, which is of the type known as an end-of-state (EOS) marker. When such a marker appears at the output of buffer 154, it is detected by detector 170, and controller 145 is instructed to continue in the manner described below.

ＡＬＵでは、ランチ１５６の内容から、最後の繰り返し
で得られて上述のようにラッチ１５８内に保持された最
小累積距離が、引き譚される。そして、その結果はラッ
チ１６５内に書き込まれる。In the ALU, the smallest cumulative distance found in the last iteration, held in latch 158 as described above, is subtracted from the contents of latch 156, and the result is written into latch 165.

ペナルティ−・ポインタ１２３によって示されたアドレ
ス、すなわち第１表でのインデックスは、そのインデッ
クスをラッチ１６０内に書き込むために使用される。そ
して、ＡＬＬＪでは、これらの内容がラッチ１５３の内
容に加えられる。ラッチ１５３は、このとき、上述の初
期設定の結果として、ＲＡＭ６７のその領域のベース・
アドレスを保持する。このＲＡＭ６７は、距離ベクトル
（すなわち、第２表の現在の列）の諸エレメントを含ん
でいる。上述の結果は、ラッチ１６４に」き込まれｖｐ
ｔとなる。このＶｐｔは、距離ベクトルの適切なエレメ
ントをラッチ１６０内に読み込むために使用される。Ａ
ＬＵでは、ラッチ１６０の内容とラッチ１６５（基準化
した最小累積距離を保持する）の内容を加えて、更新し
た累積距離を得る。この更新した累積距離は、ラッチ１
５２に保持されたポインタ１２５によって示された記憶
場所に入力される。次に、ラッチ１６９内のトレースバ
ック・ポインタは、ＡＬＵ内で一つだけ増分され、その
結果がラッチ１６０内に入力される。The address indicated by penalty pointer 123, that is, the address of the index into Table 1, is used to write that index into latch 160, and in the ALU these contents are added to the contents of latch 153. As a result of the initialization described above, latch 153 now holds the base address of the area of RAM 67 that contains the elements of the distance vector (that is, the current column of Table 2). The result is written into latch 164 and becomes Vpt, which is used to read the appropriate element of the distance vector into latch 160. The ALU adds the contents of latch 160 and latch 165 (which holds the normalized minimum cumulative distance) to give the updated cumulative distance, which is entered into the storage location indicated by pointer 125, held in latch 152. The traceback pointer in latch 169 is then incremented by one in the ALU, and the result is entered into latch 160.

ラッチ１５２の内容も、ＡＬＵ内で一つだけ増分される
。そのように形成されたアドレスは、ラッチ１６０内の
トレースバック・ポインタをバッフ？１７１を経由して
ＲＡＭ６７に入力するために使用される。次の状態の処
理に備えるために、このラッチの内容は、再び一つだけ
増分される。The contents of latch 152 are also incremented by one in the ALU. The address so formed is used to write the traceback pointer in latch 160 into RAM 67 via buffer 171. The contents of this latch are then incremented by one again, ready for processing the next state.

一つの状態に対する諸操作はこれで終了する。This completes the operations for one state.

スケルトン・ポインタは、ＲＡＭを読み取るために、再
び使用され、その他の正常状態は、同様な方法で処理さ
れる。最終的に、検出器＠１７０は、最後の正常状態と
終了ダミーとの間の標識を得る。The skeleton pointer is again used to read the RAM, and the other normal states are processed in the same way. Eventually, detector 170 encounters the marker between the last normal state and the ending dummy state.

この標識は、単語の終わり（ＥＯＷ）として知られるタ
イプのものである。次の状態（Ｒ後の状態）が処理され
ると、以下の諸操作が省略される。すなわち、インデッ
クスを使用する操作と、トレースバック・ポインタを増
分する操作。また、終了ダミー処理は、モデルを通って
反対の方向にあり（第４図を参照）、これらの諸操作で
は、ポインタ１２５と読み込まれる累積距離との間でオ
フセットを必要とするので、そのオフセットは、増分の
代わりに引き算される。This marker is of the type known as an end-of-word (EOW) marker. When the next state (the last state) is processed, the following operations are omitted: the operation that uses the index, and the operation that increments the traceback pointer. In addition, because the ending dummy processing runs in the opposite direction through the model (see Figure 4), in those operations that require an offset between pointer 125 and the cumulative distance to be read, the offset is subtracted instead of added.

ＥＯＷ標識の次の標識がスケルトン・ポインタ１２４に
遭遇したときは、ラッチ１５１と１５６の初期設定が実
施される。それから、次の単語モデルが処理される。無
効な移行ペナルティ−は、ＲＡＭ６７の領域１１６の上
にある記憶場所に保持される。そして、詔ゑ内の最後の
単語が処理された後に、コントローラが処理を続けよう
とすると、読み出された第１の移行ペナルティ−が検出
装置１７０によって得られ、この値は無効ペナルティ−
とされる。そして、語嘗の終わり（ＥＯＶ）の制御信号
がコンｌ−ローラに遅し、繰り返しの終了を知らせる。When skeleton pointer 124 encounters the marker that follows the EOW marker, the initialization of latches 151 and 156 is carried out, and the next word model is then processed. An invalid transition penalty is held in the storage location above area 116 of RAM 67. After the last word in the vocabulary has been processed, when the controller attempts to continue, the first transition penalty read out is picked up by detector 170 and recognized as the invalid penalty. An end-of-vocabulary (EOV) control signal then reaches the controller, signaling the end of the iteration.

それから、コントローラは、次の繰り返しで使用するた
めに、この繰り返しで得た最小累積距離＠−直き出す。The controller then writes out the smallest cumulative distance found in this iteration, for use in the next iteration.

そして、「終了」信号を端子１７２に送る。これに接続
した決定論理回路く通常はマイクロプロセッサである）
は、それから、累積距離とＲＡＭ６７に送られたトレー
スバック・ポインタとを使用して、繰り返しを実施する
。決定論理回路または関連するマイクロプロセッサが、
さらに「開始」信号を端子１４４に送るまでは、ビテル
ビ・エンジンでは、それ以上の操作は実施されない。It then applies an "end" signal to terminal 172. The decision logic connected to this terminal (usually a microprocessor) then carries out an iteration using the cumulative distances and traceback pointers that have been passed to RAM 67. No further operation is performed in the Viterbi engine until the decision logic or an associated microprocessor applies a further "start" signal to terminal 144.

第１０図に示すように、ＡＬｔＪ１６１の上部入力は１
６ビツトであり、下部入力は８ビツトである。ラッチ１
５６の、ラッチ１５８．１６５．１６９に対する出力は
、８ビツトであるから、ラッチ１７４が設けられている
。このラッチ１７４は、上述の諸ラッチがＡＬＵに入力
されるときに、ＡＬＵの上部入力の高有効側で８ゼロ・
ビットを強制する。As shown in Figure 10, the upper input of ALU 161 is 16 bits wide and the lower input is 8 bits wide. Since the outputs of latches 156, 158, 165 and 169 are 8 bits wide, latch 174 is provided to force eight zero bits onto the high-order side of the upper input of the ALU when these latches are applied to the ALU.

ＡＬＬＪの出力に検出器１７５が接続されていて、飽和
を検出し、バッファ１７６を制御する。これにより、飽
和が生じた場合はいつでも、バッファ１７６は、バッフ
？１６７に代わって利用可能にされる。バッファ１７６
は最大可能数を含み１．一方、バッフ７１６７はオーバ
ーフローを含むことができる。検出器１７５も、バッフ
ァ１６３からの入力を有する。ＡＬＵが最大値をもっで
ある操作を実施していることを、もしこの入力が示すな
らば、そのときは、飽和もまた検出されたものとみなし
、バッファ１７６の出力（すなわち最大値）は、その操
作の結果として使用される。Detector 175 is connected to the output of the ALU to detect saturation and to control buffer 176, so that whenever saturation occurs buffer 176 is enabled in place of buffer 167. Buffer 176 contains the maximum possible number, whereas buffer 167 could contain an overflow. Detector 175 also has an input from buffer 163. If this input indicates that the ALU is performing an operation on a maximum value, saturation is likewise taken to have been detected, and the output of buffer 176 (that is, the maximum value) is used as the result of the operation.
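
The effect of the saturation detector can be imitated in a few lines. This is only a sketch under assumptions: the maximum of 255 is a guess based on the 8-bit latches, and the function name and its use of the `operator` module are illustrative choices, not part of the described circuit.

```python
import operator

MAX_VALUE = 255  # assumed maximum, suggested by the 8-bit latches


def saturating(op, a, b, maximum=MAX_VALUE):
    """Apply op to a and b, returning maximum instead of an overflow.

    An operand that is already the maximum is treated as saturated, so the
    result is the maximum whatever the operation.
    """
    if a == maximum or b == maximum:
        return maximum
    return min(op(a, b), maximum)


overflowed = saturating(operator.add, 200, 100)
normalized = saturating(operator.sub, 255, 10)
ordinary = saturating(operator.add, 3, 4)
```

The second case shows why the rule about maximum operands matters: a starting state that has been set to the maximum stays at the maximum even after the normalizing subtraction, so it cannot be mistaken for a good path.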

コントローラ１４５によって操作される様々の回路の同
期を取るために、従来のクロック・パルス・システムが
操作される。すなわち、端子１７３に、外部クロック・
パルスが入力される。各ラッチの、右側に向かう矢印は
、クロック端子を表し、左側に向かう円は、利用可能な
端子を表す。A conventional clock pulse system is used to synchronize the various circuits operated by controller 145: external clock pulses are applied to terminal 173. On each latch, the arrow pointing to the right represents the clock terminal, and the circle on the left represents the enable terminal.

上述の装置は、分離された単語の認識に関するものであ
るが、１〜レーニングされた限られた暦党から形成され
た′連続音声の認識へと拡張できる。Although the apparatus described above concerns the recognition of isolated words, it can be extended to the recognition of continuous speech formed from a limited, trained vocabulary.

すなわち、このことは、例えばトレースバック・ポイン
タを使用して、詳しく検討されたさまざまな単語モデル
状態をトラッキングすることによって可能となる（上述
の、Ｐ、Ｆ、Ｂｒｏｗｎらの文献、そしてＮｅＶの文献
を参照されたい）。This is possible by tracking the various word model states that have been examined, for example by using the traceback pointers (see the above-mentioned references by P. F. Brown et al. and by NeV).

本発明は、上述の特定の方法以外のその他の多くの方法
で実施できることは明らかであろう。例えば、最小累積
距離を得るために、その他のハードウェアを特別に設計
することができる。また、距離ベクトルは、線形予測関
数から１ｑることができる。ざらに、ビテルビ・プロセ
スは、フォワード・バックワード・アルゴリズムに置き
換えることができる。It will be apparent that the invention can be put into practice in many ways other than those specifically described above. For example, other hardware can be specially designed to find the minimum cumulative distances. The distance vectors can also be obtained from linear prediction functions. Furthermore, the Viterbi process can be replaced by a forward-backward algorithm.

確率関数は、上述の特定の方法以外のその他の方法で得
ることができる。ただし、確率関数の数は、すべての有
限状態機械において、状態の数よりも小ざく、好ましく
は相当小さく、しなければならない。The probability function can be obtained in other ways than the specific methods mentioned above. However, the number of probability functions must be smaller, preferably significantly smaller, than the number of states in all finite state machines.

[Brief explanation of drawings]

第１ａ図と第１ｂ図は、本発明による方法の、状態を選
択し、ある単語に対して確率密度関数を１ｑるためのフ
ローチャートを形成する。第２図は、本発明による方法の、単語を認識するための
フローチャートである。第３図は、本発明による装置のブロック図である。第４図は、本発明の一実施例を説明するのに使用した有
限状態機械の図である。第５図は、本発明の一実施例で使用したごテルビ・エン
ジンに対する入力値が、どのように生じて、どのように
アクセスされるかを示す表、の概要である。第６図は、ビテルビ・エンジンによって、一度に一行、
計算され記憶される量、を示す表である。第７図は、異なる単語を表す状態間の移行がどのように
生じるかを示す例である。第８図は、本発明の一実施例で使用されるビテルビ・エ
ンジン内の記憶場所の図である。第９図は、本発明の一実施例で使用されるビテルビ・エ
ンジンのフローチャートである。第１０図は、本発明の一実施例で使用されるビテルビ・
エンジンに対する専用回路のブロック図である。６３・・・・・・プロセッサ、６４・・・・・・プロセッサ、６６・・・・・・論理回路、　　“ ６７・・・・・・ランダム・アクセス・メモリー（ＲＡ
Ｍ）、７２・・・・・・ホストコンピュータ、１４５・・・・
・・コントローラ、１６１・・・・・・演算装置（、ＡＬＵ＞。昭和　　年　　月　　日１．事件の表示　　　昭和６１年特許願第１９５１０９
号２）発明の名称　　　音声認識方法およびその装置３
、補正をする者事件との関係　　　出願人４、代理人Figures 1a and 1b form a flowchart of the method according to the invention for selecting states and obtaining probability density functions for a word. Figure 2 is a flowchart of the method according to the invention for recognizing words. Figure 3 is a block diagram of an apparatus according to the invention. Figure 4 is a diagram of a finite state machine used in explaining one embodiment of the invention. Figure 5 is an outline of a table showing how the input values for the Viterbi engine used in one embodiment of the invention arise and how they are accessed. Figure 6 is a table showing the quantities calculated and stored, one row at a time, by the Viterbi engine. Figure 7 is an example showing how transitions occur between states representing different words. Figure 8 is a diagram of storage locations within the Viterbi engine used in one embodiment of the invention. Figure 9 is a flowchart of the Viterbi engine used in one embodiment of the invention. Figure 10 is a block diagram of a dedicated circuit for the Viterbi engine used in one embodiment of the invention. 63: processor; 64: processor; 66: logic circuit; 67: random access memory (RAM); 72: host computer; 145: controller; 161: arithmetic unit (ALU). 1. Indication of the case: Patent Application No. 195109 of 1986 (Showa 61). 2. Title of the invention: Speech recognition method and apparatus. 3. Person making the amendment, relationship to the case: applicant. 4. Agent.

Claims

[Scope of Claims] (1) A device for recognizing words within a predetermined vocabulary, which includes the following components. (b) Means for sequentially sampling sounds to obtain a set of signals each representative of the characteristics of those sounds. (b) means for storing data representative of a number of finite state machines for each word of a predetermined vocabulary and forming individual models of those words, said data representing a number of finite state machines; Describe the state that forms,
A probability function is assigned to each state, where at least one probability function is assigned to more than one state, and each probability function is determined based on whether the model generates the actual sound. Then, a means for describing the probability that a signal representing the characteristics of a sound will pretend to be some observed value if any model is in a state to which the probability function is assigned. (c) The probability that a given set of signals will be generated if the model that generated the actual sound and any model are in any given state is calculated by combining signal probability functions. From, the means to decide. (d) A means of determining, from calculated probabilities and properties of finite state machines, the maximum likelihood that a sample of consecutive sounds will represent given words. (e) Means for providing an output indicating one of the predetermined words as the most likely spoken word based on the maximum probability detected. (2) The device according to claim 1, wherein the means for calculating the probability includes means for calculating a distance vector from each set of the probability function and the signals when the various signals occur; If any finite state machine generates a real sound and occupies a state that is assigned a corresponding probability function, then each element of the distance vector has a corresponding probability function and the current signal represents the reciprocal of the probability of observing a set of having, equipment. (3) In the device according to claim 1 or 2, each finite state machine has a plurality of states connected by transitions, and each state and each transition correspond to each other. states and transition penalties, maintained as part of the data representing the finite state machine, each transition penalty being a property of the finite state machine,
of which it is part, and each state penalty depending on the probability function assigned to the corresponding state and on the current set of said signals.
(4) A method of selecting the number of states of finite state machines that represent words within a predetermined vocabulary and of obtaining data characterizing those states, the method comprising the following steps.
(a) Sequentially sampling the sounds forming each word in the vocabulary to obtain sets of signals representing the characteristics of the sounds forming that word.
(b) Obtaining, from the sets of signals, data defining the states for each word.
(c) Obtaining, from the sets of signals, one probability function for each state.
(d) Merging some of the probability functions to obtain a number of said functions less than a predetermined number, the merging comprising
steps performed according to criteria relating to suitability for merging and to the predetermined number.
(e) Calculating the probability of transition from each state to every permitted subsequent state in each finite state machine.
(f) Entering data identifying each word as it is spoken.
(g) Storing data defining the states that form each machine,
and storing data identifying the word represented by the machine.
(h) Storing data defining each merged probability function.
(5) The method according to claim 4, wherein the data defining the states, from which one probability function is obtained for each state, is obtained by merging some of the sets of signals for each word according to a criterion of suitability for merging, thereby providing the data defining the states for that word.
(6) The method according to claim 5, wherein successive sets of signals are merged, and the suitability of two successive sets for merging is assessed beforehand by calculating the logarithm of a ratio of probabilities, the ratio involving the probability that each set of signals considered for merging arose from a separate set.
(7) The method according to claim 5 or 6, wherein the suitability of two probability functions for merging is assessed beforehand by calculating the logarithm of the ratio of the probability that the probability functions considered for merging arose from the same probability function to the probability that they arose from separate probability functions.
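
By way of illustration only, the recognition scheme of claims (1) to (3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the sound samples are assumed to be already reduced to discrete symbols, each penalty is taken as the logarithm of the reciprocal probability, every model is assumed to start in its first state and end in its last state, and all names (`distance_vector`, `word_penalty`, `recognize`, `PROB_FUNCTIONS`, `MODELS`) are hypothetical.

```python
import math


def distance_vector(prob_functions, symbol):
    # One entry per shared probability function (not per state).
    # Assumption: the penalty is the log of the reciprocal probability, -log p.
    return [-math.log(pf[symbol]) for pf in prob_functions]


def word_penalty(model, distances):
    """Minimum cumulative penalty over all state paths (Viterbi search)."""
    pf_index = model["pf_index"]      # probability function assigned to each state
    trans = model["trans_penalty"]    # trans[i][j]: penalty of transition i -> j
    n = len(pf_index)
    best = [math.inf] * n
    best[0] = distances[0][pf_index[0]]   # assumption: paths start in state 0
    for d in distances[1:]:
        new = [math.inf] * n
        for i in range(n):
            for j, t in trans[i].items():
                # transition penalty plus state penalty for the current sample
                cand = best[i] + t + d[pf_index[j]]
                if cand < new[j]:
                    new[j] = cand
        best = new
    return best[-1]                       # assumption: paths end in the last state


def recognize(models, prob_functions, symbols):
    # The distance vector is computed once per sample and shared by all models.
    distances = [distance_vector(prob_functions, s) for s in symbols]
    scores = {word: word_penalty(m, distances) for word, m in models.items()}
    return min(scores, key=scores.get), scores


# Hypothetical example data: two probability functions over two symbols,
# shared by the states of two word models.
STEP = -math.log(0.5)
PROB_FUNCTIONS = [[0.9, 0.1], [0.1, 0.9]]


def two_state(first_pf, second_pf):
    return {
        "pf_index": [first_pf, second_pf],
        "trans_penalty": [{0: STEP, 1: STEP}, {1: STEP}],
    }


MODELS = {"yes": two_state(0, 1), "no": two_state(1, 0)}

word, scores = recognize(MODELS, PROB_FUNCTIONS, [0, 0, 1, 1])
print(word, scores)
```

Because the states of both models refer to the same two probability functions, the distance vector holds only two elements per sample regardless of how many states the models contain, which is the saving that sharing probability functions between states is meant to provide.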