JPH0449954B2

Info

Publication number: JPH0449954B2
Application number: JP58183361A
Authority: JP
Inventors: Seiichi Nakagawa; Hidekazu Tsuboka
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1983-09-30
Filing date: 1983-09-30
Publication date: 1992-08-12
Also published as: JPS6073698A

Description

[Detailed description of the invention]

産業上の利用分野本発明は連続する音声等のパターンを一連のパ
ターンとして自動的に認識するパターン比較装置
に関する。従来例の構成とその問題点パターンマツチングによる音声認識装置の一般
的な構成は次のようなものである。入力音声信号を、フイルタバンク、周波数分析
LPC分析等によつて特徴ベクトルの系列に変換
する特徴抽出手段と、予め発声され、この特徴抽
出手段により抽出された特徴ベクトルの系列を認
識単語全部について標準パターンとして登録して
おく標準パターン記憶手段と、認識させるべく発
声され、前記特徴抽出手段により抽出された入力
パターンと前記標準パターン記憶手段に記憶され
ている標準パターンの全てと特徴ベクトルの系列
としての類似度あるいは距離を計算するパターン
比較手段と、パターン比較の結果、最も類似度の
高かつた（距離の小さかつた）標準パターンに対
応する単語を認識結果として判定出力する判定手
段からなる。このとき、同一話者が同一の単語を発声しても
発声の都度、その発声時間長が異るので、前記パ
ターン比較手段で標準パターンと入力パターンの
比較を行う際には、両者の時間軸を伸縮させ、両
者のパターン長を揃えて比較する必要がある。そ
の際、発声時間長の変化は、発声単語の各部で一
様に生じているわけではないので、各部を不均一
に伸縮する必要がある。その伸縮は、比較すべき
両者のパターンの類似度が最大になる（距離が最
小になる以下距離で説明するように行われるのが
最も良い結果が得られている。このようなマツチ
ングを効果的に行うのに動的計画法を用いる装置
が一般的である（以下このマツチングをDPマツ
チングと称する。） DPマツチングの方法は格子グラフによつて説
明できる。第１図は格子グラフであつて、横軸は
入力パターンＴ＝a₁，a₂…a_Iに対応するｉ座標、
縦軸は標準パターンRⁿ＝bⁿ ₁，bⁿ ₂…bⁿ _Joに対応する
ｊ座標を表す。入力パターンＴと標準パターンを
時間軸を非線形に伸縮してマツチングすることは
この格子グラフ上において、両パターンの各特徴
ベクトルの対応関係を示す経路(1)を何らかの標価
基準によつて決定し、この径路に関して両パター
ンの距離を評価することである。この径路を決定
する際には音声の性質を考慮して制限条件を設け
る。第２図ａは径路選択の制限条件の一例であ
る。即ち、この例では点（ｉ，ｊ）へ至る径路
は、点（ｉ−２，ｊ−１）から点（ｉ−１，ｊ）
を通る径路(2)か、点（ｉ−１，ｊ−１）から来る
径路(3)か、点（ｉ−１，ｊ−２）から点（ｉ，ｊ
−１）を通る径路(4)かの何れかしか取り得ないと
いうことを意味している。このとき、入力パターンと標準パターンの始端
と終端は必ず対応させるという条件をつければ、
前記マツチングの径路は第１図の斜線の部分に制
限される。この制限は、いかに時間軸が伸縮する
といつても、同一単語に対してはそれ程極端に伸
縮するはずはないという事実からあまり極端な対
応づけが生じないようにするためである。 a_iとbⁿ _Jのベクトル間距離をdⁿ（ｉ，ｊ）とすれ
ば、入力パターンＴと標準パターンRⁿのパター
ン間の前記径路に沿う距離は、その径路に沿うdⁿ
（ｉ，ｊ）の荷重平均として定義される。第２図
の径路上のａ，ｂ，ｃ，ｄはそれに対応する径路
が選ばれたときの荷重であるDPマツチングが適
用できるためにはこの荷重の決め方は、格子グラ
フ上で前記制限条件の下でいかなる径路が選ばれ
ようともその径路に沿う荷重の和が一定になるよ
うに決めれば良い。ａ＝ｃ＝ｅ＝２，ｂ＝ｄ＝１
とすれば、この荷重の和はＩ＋Jⁿ，ａ＝ｂ＝ｃ＝
１，ｄ＝ｅ＝0.5とすれば、この荷重の和はＩ，
ａ＝ｂ＝0.5，ｃ＝ｄ＝ｅ＝１とすれば、この荷
重の和はJⁿとなり径路の選ばれ方によらず一定と
なる。これらは共によく用いられる。また、前記
荷重の和一定という条件の下でこの荷重をｊに関
する関数とすることにより、より重視してマツチ
ングしたい径路上の部分の荷重を重くする等の操
作も可能である。入力パターンＴと標準パターンRⁿの距離は、
前記制限条件の下で、前記荷重平均の最小値とし
て定義される。即ち、次の漸化式を解くことによ
つて前記荷重平均の最小値とその最小値を与える
径路が決定され得る。 gⁿ（ｉ，ｊ）＝mimgⁿ（ｉ−２，ｊ−１）＋adⁿ（
ｉ−１，ｊ）＋bdⁿ（ｉ，ｊ） gⁿ（ｉ−１，ｊ−１）＋cdⁿ（ｉ，ｊ） gⁿ（ｉ−１，ｊ−１）＋cdⁿ（ｉ，ｊ） gⁿ（ｉ−１，ｊ−２）＋edⁿ（ｉ，ｊ−１）＋ddⁿ（ｉ，
ｊ）……(1) 初期条件gⁿ（１，１）＝dⁿ（１，１）Ｄ（Ｔ，Rⁿ）＝gⁿ（Ｉ，Jⁿ）／荷重の和ここにＤ（Ｔ，Rⁿ）は入力パターンＴと標準パ
ターンR_oの距離である。径路選択の条件としては他にも種々考えられる
第２図ｂ〜ｊ等は他の例である。この他にもさら
に種々の変形が考えられ得る。これら径路の選択
条件に伴つて前記漸化式は対応するものに書き換
えられるのは勿論である。孤立して発声された単語を認識する場合は勿論
連続して発声された（単語と単語の間に切れ目な
く発声された）音声を認識する場合もDPマツチ
ングは良好な成績をおさめている。連続単語音声認識の問題は次のように定式化さ
れる。入力パターンのフレーム数をＩ、第ｉフレーム
の特徴ベクトルをa_i、単語ｎの標準パターンのフ
レーム数をJⁿ、第ｊフレームの特徴ベクトルをbⁿ _j
とするとき、単語ｎの標準パターンRⁿは次のよ
うに表わされる。 Rⁿ＝bⁿ ₁bⁿ ₂…bⁿ _j…bⁿ _jｎそこでｘ個の単語列に対応する標準パターンの
結合、Ｒ＝R^q(1)R^q(2)…R^q(x) ＝b^q(1) ₁b^q(1) ₂…b^q(1) _Jq(1)b^q(2) ₁b^q(2) ₂…b^q(x) _Jq(
2) …b^q(x) ₁b^q(x) ₂…b^q(x) _Jq(x) ……(2) と入力パターンＴ＝a₁，a₂…a_i…a_Iとのベクトル
系列間の距離が最小になる単語列ｑ(1)ｑ(2)…ｑ
（ｘ）を求める。以上の計算を前記孤立単語の場合と同様にして
そのままDPマツチングで解こうとすれば、例え
ば１０数字の単語を標準パターンとしてもつてい
るとき、３数字の連続発声された音声を認識する
には10³＝1000種類の標準パターンとマツチング
しなければならない。標準パターンの数が増せば
たちまちその組合せの数は禁止的な量になる。そこで、連続単語の認識にもDPマツチングを
適用するために、マツチングの累積距離の正規化
係数（前記荷重の和のこと）は入力のフレーム数
にのみ依存するように径路の選択の条件を設定す
れば、以下に示すように標準パターンの単語の組
合せにも動的計画法が適用でき計算量を大幅に減
らし得る。径路の選択条件としては一般に第３図ａ〜ｅに
示すものがある。径路上に示した数値はその径路
が選ばれたときの荷重係数である。入力パターンＴの第ｉフレーム（の特徴ベント
ル、以後フレームとのみ称する）a_iとＸ個の標準
パターン連結からなる連続標準パターンＲの第ｊ
フレームb^R _jのフレーム間距離（ベクトル距離）を
d_R（ｉ，ｊ）とし、入力パターンと連続標準パタ
ーンとの対応づけをする時間関数（前記マツチン
グの径路）をｕ（ｉ）として、この時間関数に沿
つて求められる次の累積距離（フレーム間距離の
荷重の和）Ｄ（Ｔ，Ｒ）を最小化するＲ（R^と記
す）が求めるものであるとする。即ち、Ｄ（Ｔ，Ｒ）＝ min ｕ（ｉ）〔_I 〓ⁱ⁼¹ d_R（ｉ，ｕ（ｉ））〕 ……(3) R^＝ argmjn Ｒ〔Ｄ（Ｔ，Ｒ）〕ここで、第３図ａの径路のときは、０＜ｕ（ｉ）−ｕ（ｉ−１）＜２，ｕ(1)＝１，ｕ（Ｉ
）
＝J^R である。また、min〔ｆ（ｚ）〕はｚに関して最小
化されたｆ（ｚ），argmin〔ｆ（ｚ）〕はｆ（ｚ）を
最小にするｚの値を意味する。式(3)は単語数既知の場合、未知の場合、あるい
はオートマトン制御を組み込んだ形で解くことが
でき、その方法については既に種々提案されてお
り、製品化されている例もある。本願はこのうち、オートマトン制御を組み込ん
だ形で式(3)を解くことによつて、連続して発声さ
れた音声を認識する装置に関するものである。次に、オートマトン制御を組み込んだ形で式(3)
を解く従来の方法について説明する。我々が実際に単語を連続して発声する場合は、
それらの順序が決つている場合が多い。従つて、
入力パターン（入力文）は、有限状態オートマト
ンαと等価な正規文法によつて生成された文であ
るとし、オートマトンαで受理されるあらゆる単
語列のうち、式(3)を最小にする単語列（ｑ(1)，ｑ
(2)，…，ｑ（ｘ））を求めるというようにすること
によつて、認識率を向上させることができる。こ
こで、単語列（ｑ(1)，ｑ(2)，…，ｑ（ｘ））が、オ
ートマトンαで受理されるとはΔ（q_p，ｑ(1)）＝q_i
CS，Δ（q_i，ｑ(2)＝q_jCS，…，Δ（q_k，ｑ（ｘ−
１））＝q_lCS，Δ（q_l，ｑ（ｘ））＝q_fCFCSとなるよ
うな状態遷移が存在する場合である。各記号の意
味は、Ｓは状態ｑの有限集合｛ｑ∈q₀，q₁，…，
q|_S|_-1｝，〓は入力単語ｎの有限集合｛ｎ∈１，
２，…，Ｎ｝、Δは状態遷移関数で、Ｓ×〓→Ｓ，
｛Δ（q₁，ｎ）＝q_j｝、q_pは初期状態でq_p∈Ｓ，Ｆは
最終状態の集合FCSである。ここでｑ(1)，…ｑ
（ｘ）∈｛１，２，…Ｎ｝である。通常のオートマトンの認識問題と異る点は、時
間を表わすフレーム番号も変数として入つている
点であり、しかも単に受理、拒否の出力でなく、
受理可能な度合（累積距離）が出力される点であ
る。 D_qj（ｉ）を状態q_jで入力のｉフレームで終端す
ると仮定したあらゆる単語列のうちの最小累積距
離、N_qj（ｉ）をD_qj（ｉ）に対応する単語列の最後
尾単語名、B_qj（ｉ）をN_qj（ｉ）の始点位置マイナ
ス１（N_qj（ｉ）の一つ前の単語の最終フレーム、
バツクポインタと称する）、Q_qj（ｉ）をq_jへの状
態遷移によつてD_qj（ｉ）を満たした状態名即ち Δ（Q_qj（ｉ），N_qj（ｉ））＝q_jとするとき、次の漸化
式を解くことで、オートマトン制御による式(3)の
解が得られる。即ち、初期条件D_qj（ｏ）＝Ｏ，B_qj（ｏ）＝０として D_q（ｉ）＝min 〔D_qj（ｍ）＋Dⁿ（ｍ＋１：ⁱ）〕，ｑ＝Δ（q_k，ｎ） ……(4) をｑ＝q₁，q₂，…，q|_S|_-1について求め、この
式を満たすｎ，ｍ，q_kをｎ，ｍ，q_kとするとき N_q（ｉ）＝n^，B_q（ｉ）＝ｍ，Q_q（ｉ）＝q_k とする。ｉ＝Ｉまでこの計算を行えば、次のよう
にして最後尾の単語から逆順に単語が求まる。即
ちｉ＝Ｉ，ｑ＝a_gming_f D_qf（ｉ），q_f∈Ｆとし
て n^＝N_q（ｉ） B_q（ｉ）≠なら、ｉ＝B_q（ｉ），ｑ＝Q_q（ｉ）と
してへ、B_q（ｉ）＝０なら終了する。第６図はフローチヤートである。なお、Dⁿ（ｍ＋１：ｉ）は次式で定義され、前
記の孤立単語のDPマツチングと同じ方法で求め
られる。 Dⁿ（ｍ＋１：ｉ）＝ min ｕ（ｉ）〔_i 〓^k=m+1 dⁿ（ｋ，ｕ（ｋ））〕
……(5) ここで、第３図ａの径路のときはｏｕ（ｋ）−ｕ（ｋ−１）２，ｕ（ｍ＋１）＝１，
ｕ（ｉ）＝Jⁿ である。式(3)は、予め定められたｍの範囲i₁〜i₂につい
てDⁿ（ｍ＋１：ｉ）を求め、各ｍについて既に求
められているD_qk（ｍ）とDⁿ（ｍ＋１：ｉ）の和が
最小となるｍとq_k∈｛q₀，q₁，…，ｑ｜ｓ｜−
１｝（ただしq_kはｑ＝Δ（q_k，ｎ）を満足する）と
ｎを求め、これをD_q（ｉ）とするのであるが、こ
の計算量を減ずる方法として、ｉ＝ｍ＋１以後は
単語ｎで、次の状態がｑであるとしたときの、
（ｉ，ｊ）＝（１，１）から（ｉ，ｊ）＝（i′，j′）
ま
での累積距離をDⁿ _q（i′，j′）とするとき、Dⁿ _q（ｉ，
Jⁿ）の前記ｎ，q_kについての最小値としてD_q（ｉ）
を求める方法が提案されている。これらの方法の問題点は、オートマトンの構造
によつては、さらに計算量を減じ得るものである
が、従来はその点が考慮されていなかつた点にあ
る。発明の目的オートマトン制御による連続音声認識における
計算量を大幅に削減したパターン比較装置を提案
することを目的とする。発明の構成本発明はオートマトン制御による連続パターン
のDPマツチングにおいて、オートマトンの制御
規則をｑ＝Δ（q_k，ｎ）とするとき、パターンｎ
に対して、状態q_kが複数存在するとき、それぞれ
の状態に対してその状態までの累積距離が、最小
であるものをq^_kとすれば、状態ｑまでの最後尾パ
ターンｎに対する累積距離は直前の状態がq^_kのみ
であるとして計算することにより、従来q_kのすべ
てに対して行つていた累積距離の計算を減らすも
のである。実施例の説明以下に本発明の原理及び実施例を説明する。第５図は本発明の原理を説明するための図で、
第５図ａは、ｂに書かれた単語に対する有限オー
トマトン表現である。ここでの説明は簡単のため
に前記単語の代りに単音節としている。上記従来
例における計算では、すべての状態遷移（第７図
においてその数は22）に対して計算を行なわなけ
ればならなかつた。ところが、第７図の例では、
状態S₁₂には状態S₆，S₇，S₈から同じ“Ｏ”が入
つてきている。このような場合、状態S₆，S₇，S₈
を一時的に縮退化してやれば、状態S₁₂の“Ｏ”
に関しては一回のみ計算で済む。このことは、状
態S₁₄の“YA”についても言える。本発明は、
この原理を利用して計算量の削減をはかつたパタ
ーン比較装置である。第７図の例において、ｑ＝S₁₂，n_p＝n_p（n_pは、
音節“Ｏ”に付された番号）とすれば、ｑ＝S₁₂
に対して式(3)の意味するところは D_S12（ｉ）＝ min qk，ｍ〔D^no _qk（ｍ）＋Dⁿｏ（ｍ＋１：ｉ）〕＝ min ｍminD^no _S6（ｍ） D^no _S7（ｍ） D^no _S8(n)＋Dⁿｏ（ｍ＋１：ｉ） ……(7) となる。換言すれば、第５図において、S₆，S₇，
S₈から“Ｏ”を発してS₁₂に遷移するとき、D_s
（ｉ）を最小にするためには、D_S6（ｍ），D_S7（ｍ），
D_S8（ｍ）のうちの最小の状態から遷移することに
なるということである。これを一般的にかけば、
q_pを初期状態として、 D_qp（ｏ）＝０，B_qp（ｏ） D_q（ｉ）＝ min ｎ，ｍ〔 min qk〔D_qk（ｍ）〕＋Dⁿ（ｍ＋１：
ｉ）〕 ……(8) ただしｑ＝Δ（q_k，ｑ） N_q（ｉ）＝n^，B_q（ｉ）＝m^，Q_q（ｉ）＝argmin〔D_qk
（m^）〕となる。即ち、フレームｉにおいて状態ｑとなる
直前の状態のうち、同じ音節（あるいは単語）ｎ
で連がるものがあるときは、それら状態に到るま
での累積距離が最小である状態から連がるという
ことである。このことを利用すれば、第５図の例の場合は
“Ｏ”と“YA”について以上のことが言えるか
ら式(3)の計算は19回となり、３回減る。Dⁿ（ｍ＋
１：ｉ）を求めて式(3)を直接解く場合は、この量
は殆んど無視できるが、前記高速計算法を用いる
場合には大きな差となる。また、タスクによつて
は大きな計算量の削減が期待される。上記高速計算法の一つに対し、本発明を適用し
た一実施例について説明する。第３図ｂの径路制限条件によれば、Dⁿ（ｍ＋
１：ｉ）は第６図の斜線の内部における径路に沿
う（ｉ，ｊ）＝（ｍ，１）から（ｉ，ｊ）＝（ｉ，
jⁿ）までの累積距離である。ここで(6)は傾き1/2、
(7)は傾き２の直線である。いま、径路(5)が（ｉ，
ｊ）＝（１，１）から（ｉ，ｊ）＝（ｉ，jⁿ）までの
最小の累積距離を与えるものとすれば、動的計画
法の原理に従つて、（ｉ，ｊ）＝（１，１）から径
路(5)上の点（ｉ，ｊ）＝（i′，j′）までの累積距離
を最小にする径路は、径路(5)の（ｉ，ｊ）＝（i′，
j′）までの径路と全く一致する。従つて、Dⁿ（ｍ
＋１：ｉ）は特に求めなくても、Dⁿ _q（ｉ，ｊ）を
漸化式 Dⁿ _q（ｉ，ｊ）＝mimDⁿ _q（ｉ−２，ｊ−１）＋dⁿ
（ｉ−１，ｊ）＋dⁿ（ｉ，ｊ）…(1) Dⁿ _q（ｉ−１，ｊ−１）＋dⁿ（ｉ，ｊ）…(2) Dⁿ _q（ｉ−１，ｊ−２）＋dⁿ（ｉ，ｊ）…(3)……(9) から求め、Dⁿ _q（ｉ）＝Dⁿ _q（ｉ，jⁿ）として求めるこ
とができる。ただし、漸化式(9)の初期値は Dⁿ _q（−１，ｊ）＝∞ （Ｊ＝０，１，…，Jⁿ） Dⁿ _q（０，０）＝０ Dⁿ _q（０，ｊ）＝∞ （Ｊ＝−１，１，２，…，
Jⁿ） Dⁿ _q（ｉ，−１）＝∞ （ｉ＝１，２，…，Ｉ） Dⁿ _q（ｉ，０）＝D_qk（ｉ−１）（q_kはｑ＝Ｕ（q_k，
ｎを満たすq_k）であり、Dⁿ _q（ｉ，ｊ）に対応するバツクポインタ
Bⁿ _q（ｉ，ｊ）は式(6)において Dⁿ _q（ｉ，ｊ）＝のときBⁿ _q（ｉ，ｊ）＝Bⁿ _q（ｉ−
２，ｊ−１） Dⁿ _q（ｉ，ｊ）＝のときBⁿ _q（ｉ，ｊ）＝Bⁿ _q（ｉ−
１，ｊ−１） Dⁿ _q（ｉ，ｊ）＝のときBⁿ _q（ｉ，ｊ）＝Bⁿ _q（ｉ−
１，ｊ−２）となる。また、Bⁿ _q（ｉ，ｊ）の初期値は Bⁿ _q（ｉ，１）＝ｉ−１である。第６図はｉ＝ｉ上におけるDⁿ _q（ｉ，ｊ）がどの
ような意味をもつているかを説明している。８，
１０，１２なる直線は傾き1/2、９，１１，１３
なる直線は傾き２であつて、（ｉ，ｊ）＝（ｉ，jⁿ）
（点２６）へ到る径路は直線８と９で挾まれた領
域に含まれ、点１６を通るマツチング径路は直線
１０と１１で挾まれた領域に含まれ、点１７を通
るマツチング径路は直線１２と１３で挾まれる領
域に含まれる。言い換えれば、点１６を通る径路
は単語ｎに対して、始端は１８〜１９の間にあ
り、終端は２０〜２１の間にあり、点１７を通る
径路は始端は２２〜２３の間にあり、終端は２４
〜２５の間にあるということになる。結局、ｉ＝
ｉ上のDⁿ _q（ｉ，ｊ）は、直線２８の傾きを1/2と
するとき、フレームｉ〜i′におけるi″に対する累
積距離Dⁿ _q（i″）＝Dⁿ _q（i″，Jⁿ）に対する途中の累積
距離を表していることになる。（以後Dⁿ _q（ｉ，ｊ）
を中間累積距離と呼ぶことにする）また、漸化式
(9)から明らかなようにDⁿ _q（ｉ，ｊ）を求めるに
は、フレームｉ−２とｉ−１における中間累積距
離と、フレームｉ−１とｉにおけるフレーム間距
離のみ既知であればよく、それ以前の値は忘れて
しまつても良い。以上のことをまとめて言えば、D_q（ｉ）を求め
るには、フレームｉ毎にｊ＝１，２，…，Jⁿ，ｎ
＝１，２，…，Ｎ，ｑ＝q₁，q₂，…，q|_s|_-1に
対してDⁿ _q（ｉ，ｊ）を求め、 m^，q^_k＝argmin〔Dⁿ _q（ｉ，jⁿ），ｑ＝Δ（q_k，ｎ）とするとき、 D_q（ｉ）＝Dⁿ^_q（ｉ，jⁿ^） B_q（ｉ）＝Bⁿ _q（ｉ，jⁿ^） Q_q（ｉ）＝q^_k N_q（ｉ）＝n^ とすることができる。このようにすれば、各格子
点（ｉ，ｊ）におけるdⁿ（ｉ，ｊ）の計算は単語
ｎ毎に１回で済み、Dⁿ _q（ｉ，ｊ）の計算はｑ，
ｎ，Δ（q_k，ｎ）＝ｑなるq_kについて１回で済み、
毎フレーム第５図に示す斜線内部の格子点につい
てdⁿ（i′，ｊ），Dⁿ（ｍ＋１：ｉ）を計算しなけれ
ばならない式(3)を直接解く方法に比べて大幅に計
算量が減少する。また、この方法は入力パターンのフレームｉ毎
に処理を進めてゆくものであるが、標準パターン
のフレームｊ毎に処理してゆくこともできる。即
ち、漸化式(6)から明らかなように、ｊ方向につい
ても、Dⁿ _q（ｉ，ｊ）を求めるには、フレームｊ−
１とフレームｊのフレーム間距離Dⁿ（ｉ，ｊ）と
フレームｊ−２とフレームｊ−１の中間累積距離
がわかつていればよいから、フレームｊ毎にｊ＝
ｊ上のDⁿ _q（ｉ，ｊ）を求めてゆくこともできる。以上の方法は、オートマトン制御クロツク同期
伝播形DP法（FSACWDP）と呼ばれるものであ
る。このとき、式(8)の考え方を導入するには、漸化
式(9)の計算において、Dⁿ _q（ｉ，ｊ）の初期値を次
のようにすればよい。即ち、ｑ＝Δ（q_k，ｎ）を
満足するq_kに対し、q^_k＝argmin〔D_qk（ｉ−１）〕
とするとき、 Dⁿ _q（ｉ，０）＝D_q^_k（ｉ−１） ……(10) とすればよい。第８図は本発明の一実施例を示すブロツク図で
ある。１００は音声信号の入力端子である。１０
１は特徴抽出部であつて入力音声信号を特徴ベク
トルの系列に変換する。１０２は標準パターン記
憶部であつて、認識すべき単語（音節）のそれぞ
れが同様な特徴ベクトルの系列として記憶されて
いる。１０３はフレーム間距離計算部であつて、特徴
抽出部１０１の出力の特徴ベクトルと標準パター
ン記憶部１０２の特徴ベクトルとの差を計算す
る。１０４は計算されたフレーム間距離を一時的
に記憶するフレーム間距離記憶部である。１０５
はオートマトン制御規則記憶部である。１０６は
累積距離記憶部であつて、累積距離D_q（ｉ）が各
フレーム毎に記憶されている。１０７は初期値計
算部であつて、漸化式(9)の計算における初期値を
式(10)に従つて計算する部分である。１０８は漸化
式計算部であつて、１０７で計算された初期値を
もとに、漸化式(10)を計算する部分である。１０９
は最終値決定部であつて、漸化式計算部１０８の
結果から各フレームにおける最終値として n^＝argmin〔Dⁿ _q（ｉ，Jⁿ） N_q（ｉ）＝n^ D_q（ｉ）＝Dⁿ^_q（ｉ，Jⁿ^） B_q（ｉ）＝Bⁿ^_q（ｉ，jⁿ^） Q_q（ｉ）＝argmin〔D_qk（B_q（ｉ））〕を計算する部分である。１１３，１１４，１１５
はそれぞれ最後尾単語記憶部、直前状態記憶部、
バツクポインタ記憶部であつて、それぞれ、最終
値決定部１０９で求められた、最後尾単語N_q
（ｉ），ｑの直前の状態Q_q（ｉ），バツクポインタ
B_q（ｉ）が各フレームについて記憶される。１１
２は入力音声の最終フレームにおける状態を決定
する部分である。即ち、q_f Ｆとするとき最終フ
レームＩの状態ｑはｑ＝argmin（D_qf（Ｉ）〕と決定される。１１０は音声区間検出部、１１１
はフレーム数係数部である。１１６はバツクトレ
ース制御部であつて、最後尾単語記憶部１１３
と、直前状態記憶部１１４と、バツクポインタ記
憶部１１５の内容から、第４図に示すフローチヤ
ートに従つて逆の順序で、最後尾単語記憶部１１
３から認識結果を出力せしめる。第９図は、以上の実施例の動作をソフトウエア
で実現する場合の一例を示すチヤートである。ステツプ200〜ステツプ204は、全体としての初
期化を行う部分、ステツプ205〜ステツプ214はフ
レームｉにおける処理を行う部分であつて、ステ
ツプ206〜ステツプ212は、各標準パターンｎ（ｎ
＝１，２，…，Ｎ）に対してDⁿ _q（ｉ），Bⁿ _q（ｉ）を
求める部分、ステツプ213〜ステツプ214は、入力
音声がフレームｉで終端すると仮定したときの最
後尾単語（音節）名、それに対応する累積距離、
バツクポインタ、直前の状態を各状態について求
める部分である。ステツプ208はフレーム間距離
計算部１０３、フレーム間距離記憶部１０４の動
作に対応する。ステツプ210は初期値計算部１０
７の動作に対応する。ステツプ211〜ステツプ212
は漸化式計算部１０８の動作に対応する。ステツ
プ213〜ステツプ214は最終値決定部１０９の動作
に対応する。ステツプ215〜ステツプ217は入力音
声の最終フレームから逆の順序で認識結果を決定
していく処理であつて、最終フレーム状態決定部
１１８、最後尾単語記憶部１１３、直前状態記憶
部１１４バツクポインタ記憶部１１５、バツクト
レース制御部１１６の間で行われる動作に対応し
ている。 Dⁿ _q（ｉ，ｊ）を直接求めて解く方法としては他
にオートマトン制御Level.Buildling法
（FSALB）が知られている。これにも、本発明
の考え方を導入することができ、計算量を大幅に
減らし得る。第１０図は、５桁の数字の棒読み音声例えば、
七万三千二百八十六という具合に○万○千○百○
十○と読む場合のオートマトン表現である。これ
に対し、従来のFSACWDP、FSALBを用いた装
置と、本発明による縮退化FSACWDPと縮退化
FSALBを用いた装置に関して、計算量、ワーク
メモリの記憶量を比較したのが第１表である。こ
の表によれば、第１０図に示すタスクの場合は、
計算量が大幅に減り、記憶量も同等以下となつて
おり、実用的に効果の大きいものである。

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a pattern comparison device that automatically recognizes a continuous pattern, such as continuous speech, as a series of patterns.

Conventional configuration and its problems

A speech recognition device based on pattern matching generally has the following configuration: feature extraction means that converts the input speech signal into a sequence of feature vectors by a filter bank, frequency analysis, LPC analysis or the like; standard pattern storage means in which the feature vector sequences extracted by the feature extraction means from utterances made in advance are registered as standard patterns for all the words to be recognized; pattern comparison means that computes the similarity or distance, as feature vector sequences, between the input pattern extracted by the feature extraction means from the utterance to be recognized and every standard pattern stored in the standard pattern storage means; and decision means that outputs, as the recognition result, the word corresponding to the standard pattern with the highest similarity (smallest distance).

Even when the same speaker utters the same word, the duration differs from one utterance to the next. When the pattern comparison means compares a standard pattern with the input pattern, the time axes of the two must therefore be stretched or compressed so that the pattern lengths agree. Because the change in duration is not spread uniformly over the parts of the word, each part has to be stretched or compressed non-uniformly. The best results have been obtained when this warping is chosen so that the similarity between the two patterns is maximized (the distance is minimized; the explanation below uses distance). Devices that use dynamic programming to carry out such matching efficiently are common (this matching is hereinafter called DP matching).

The DP matching method can be explained with a lattice graph. FIG. 1 is a lattice graph in which the horizontal axis is the i coordinate corresponding to the input pattern $T = a_1, a_2, \ldots, a_I$ and the vertical axis is the j coordinate corresponding to the standard pattern $R^n = b^n_1, b^n_2, \ldots, b^n_{J^n}$. Matching the input pattern T against a standard pattern with non-linear warping of the time axis means determining, on this lattice graph and according to some evaluation criterion, a path (1) that gives the correspondence between the feature vectors of the two patterns, and evaluating the distance between the two patterns along this path. When the path is determined, constraints are imposed that reflect the nature of speech. FIG. 2a shows one example of such a path constraint. In this example, the path reaching point (i, j) can only be path (2), which comes from point (i−2, j−1) through point (i−1, j); path (3), which comes from point (i−1, j−1); or path (4), which comes from point (i−1, j−2) through point (i, j−1). If, in addition, the start points and end points of the input pattern and the standard pattern are required to correspond,
the matching path is restricted to the hatched region of FIG. 1. This restriction prevents excessively extreme correspondences, on the grounds that, however much the time axis may stretch or shrink, the same word cannot stretch or shrink that drastically.

Let $d^n(i, j)$ be the distance between the vectors $a_i$ and $b^n_j$. The distance between the input pattern T and the standard pattern $R^n$ along a path is defined as the weighted average of $d^n(i, j)$ along that path. The letters a, b, c, d and e on the paths in FIG. 2 are the weights applied when the corresponding path is selected. For DP matching to be applicable, the weights should be chosen so that the sum of the weights along the path is constant whatever path is selected on the lattice graph under the above constraint. With a = c = e = 2 and b = d = 1 the sum of the weights is $I + J^n$; with a = b = c = 1 and d = e = 0.5 it is $I$; with a = b = 0.5 and c = d = e = 1 it is $J^n$. In each case the sum is constant regardless of the path chosen, and all of these settings are in common use. Under the condition that the sum of the weights is constant, the weights may also be made a function of j, so that, for example, the part of the path that is to be emphasized in the matching is weighted more heavily.

The distance between the input pattern T and the standard pattern $R^n$ is defined as the minimum of the weighted average under the above constraint. The minimum, and the path that attains it, are determined by solving the following recurrence:

$$
g^n(i,j) = \min
\begin{cases}
g^n(i-2,\,j-1) + a\,d^n(i-1,\,j) + b\,d^n(i,\,j) \\
g^n(i-1,\,j-1) + c\,d^n(i,\,j) \\
g^n(i-1,\,j-2) + e\,d^n(i,\,j-1) + d\,d^n(i,\,j)
\end{cases}
\qquad (1)
$$

with the initial condition $g^n(1,1) = d^n(1,1)$ and

$$ D(T, R^n) = g^n(I, J^n) \,/\, (\text{sum of the weights}), $$

where $D(T, R^n)$ is the distance between the input pattern T and the standard pattern $R^n$.

Various other path selection conditions are conceivable; FIGS. 2b to 2j show other examples, and further variations are possible. The recurrence is of course rewritten to match whichever path selection condition is used.

DP matching gives good results not only for words uttered in isolation but also for speech uttered continuously (with no break between words).

The problem of continuous word speech recognition is formulated as follows. Let the number of frames of the input pattern be $I$, the feature vector of its i-th frame be $a_i$, the number of frames of the standard pattern of word n be $J^n$, and the feature vector of its j-th frame be $b^n_j$.
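The isolated-word recurrence (1) above can be sketched in a few lines. This is only an illustration under stated assumptions, not the device's implementation: features are plain numbers, the local distance is an absolute difference, the weights are the symmetric setting a = c = e = 2, b = d = 1, and the result is normalized by I + J as the text describes. The function name `dp_match` is hypothetical.

```python
import math

def dp_match(T, R, a=2.0, b=1.0, c=2.0, d=1.0, e=2.0):
    """Distance between sequences T and R under the three-path constraint of (1).

    Assumptions for this sketch: scalar features, absolute-difference local
    distance, and normalization by I + J (the weight sum given in the text).
    """
    I, J = len(T), len(R)

    def dist(i, j):                      # 1-based indices, as in the text
        return abs(T[i - 1] - R[j - 1])

    INF = math.inf
    g = [[INF] * (J + 1) for _ in range(I + 1)]
    g[1][1] = dist(1, 1)                 # initial condition g(1,1) = d(1,1)
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            if i == 1 and j == 1:
                continue
            candidates = []
            if i >= 3 and j >= 2:        # from (i-2, j-1) through (i-1, j)
                candidates.append(g[i - 2][j - 1] + a * dist(i - 1, j) + b * dist(i, j))
            if i >= 2 and j >= 2:        # from (i-1, j-1)
                candidates.append(g[i - 1][j - 1] + c * dist(i, j))
            if i >= 2 and j >= 3:        # from (i-1, j-2) through (i, j-1)
                candidates.append(g[i - 1][j - 2] + e * dist(i, j - 1) + d * dist(i, j))
            g[i][j] = min(candidates, default=INF)
    return g[I][J] / (I + J)

same = dp_match([1, 2, 3], [1, 2, 3])          # identical patterns
stretched = dp_match([1, 2, 2, 3], [1, 2, 3])  # one frame repeated in the input
different = dp_match([1, 2, 3], [1, 2, 4])     # last frame differs by 1
```

A pair of patterns whose lengths differ by more than the path constraint allows (roughly a factor of two) has no admissible path, and the sketch returns infinity for it.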
Then the standard pattern $R^n$ of word n is expressed as

$$ R^n = b^n_1\, b^n_2 \ldots b^n_j \ldots b^n_{J^n}. $$

The task is to find the word string $q(1)\,q(2) \ldots q(x)$ for which the distance between vector sequences is smallest between the input pattern $T = a_1, a_2, \ldots, a_i, \ldots, a_I$ and the concatenation of the standard patterns corresponding to a string of x words,

$$
R = R^{q(1)} R^{q(2)} \ldots R^{q(x)}
  = b^{q(1)}_1 b^{q(1)}_2 \ldots b^{q(1)}_{J^{q(1)}}\;
    b^{q(2)}_1 b^{q(2)}_2 \ldots b^{q(2)}_{J^{q(2)}}\; \ldots\;
    b^{q(x)}_1 b^{q(x)}_2 \ldots b^{q(x)}_{J^{q(x)}}
\qquad (2)
$$

If this were solved by applying DP matching directly, as in the isolated-word case, then with standard patterns for, say, the ten digits, recognizing three digits uttered continuously would require matching against $10^3 = 1000$ concatenated standard patterns. As the number of standard patterns grows, the number of combinations quickly becomes prohibitive.

To apply DP matching to continuous word recognition, the path selection conditions are therefore set so that the normalization coefficient of the cumulative matching distance (the sum of the weights mentioned above) depends only on the number of input frames. Dynamic programming can then also be applied to the combination of standard-pattern words, as shown below, and the amount of computation is greatly reduced.

Commonly used path selection conditions are shown in FIGS. 3a to 3e. The numbers on the paths are the weight coefficients applied when that path is selected.

Let $d_R(i, j)$ be the inter-frame distance (vector distance) between the i-th frame $a_i$ of the input pattern T (strictly, its feature vector; hereinafter simply "frame") and the j-th frame $b^R_j$ of the continuous standard pattern R formed by concatenating x standard patterns, and let $u(i)$ be the time function that maps the input pattern onto the continuous standard pattern (the matching path). The R that is sought (written $\hat{R}$) is the one that minimizes the cumulative distance $D(T, R)$ (the weighted sum of inter-frame distances) obtained along this time function:

$$ D(T, R) = \min_{u(i)} \left[ \sum_{i=1}^{I} d_R\bigl(i, u(i)\bigr) \right] \qquad (3) $$

$$ \hat{R} = \arg\min_{R} \bigl[ D(T, R) \bigr] $$

For the path of FIG. 3a,

$$ 0 \le u(i) - u(i-1) \le 2, \qquad u(1) = 1, \qquad u(I) = J^R. $$

Here $\min[f(z)]$ denotes $f(z)$ minimized with respect to z, and $\arg\min[f(z)]$ denotes the value of z that minimizes $f(z)$.

Equation (3) can be solved for the case where the number of words is known, for the case where it is unknown, or in a form that incorporates automaton control. Various methods have already been proposed, and some have been commercialized. The present application concerns a device that recognizes continuously uttered speech by solving equation (3) in the form that incorporates automaton control.

The conventional method of solving equation (3) with automaton control is described next.

When words are actually uttered in succession, their order is in many cases fixed. Therefore,
the input pattern (input sentence) is assumed to be a sentence generated by a regular grammar equivalent to a finite-state automaton α, and the recognition rate can be improved by seeking, among all the word strings accepted by the automaton α, the word string $(q(1), q(2), \ldots, q(x))$ that minimizes equation (3). A word string $(q(1), q(2), \ldots, q(x))$ is accepted by the automaton α when there exist state transitions such that

$$ \Delta(q_p, q(1)) = q_i \in S,\quad \Delta(q_i, q(2)) = q_j \in S,\quad \ldots,\quad \Delta(q_k, q(x-1)) = q_l \in S,\quad \Delta(q_l, q(x)) = q_f \in F \subset S. $$

The symbols have the following meanings: $S$ is the finite set of states, $\{q \mid q \in q_0, q_1, \ldots, q_{|S|-1}\}$; $\Sigma$ is the finite set of input words, $\{n \mid n \in 1, 2, \ldots, N\}$; $\Delta$ is the state transition function $S \times \Sigma \to S$, $\{\Delta(q_i, n) = q_j\}$; $q_p$ is the initial state, $q_p \in S$; and $F$ is the set of final states, $F \subset S$. Here $q(1), \ldots, q(x) \in \{1, 2, \ldots, N\}$.

This differs from the ordinary automaton recognition problem in that the frame number representing time also enters as a variable, and in that the output is not merely acceptance or rejection.
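Acceptance in the ordinary sense defined above, before any distances are involved, amounts to following the transition function from the initial state and checking that the last state is final. A minimal sketch follows; the dictionary encoding of Δ and the three-word toy grammar are assumptions made only for illustration.

```python
def accepts(delta, start, finals, words):
    """Return True if `words` drives the automaton from `start` into a final state.

    `delta` maps (state, word) -> next state; a missing key means that no
    transition is defined for that word in that state.
    """
    q = start
    for n in words:
        if (q, n) not in delta:
            return False
        q = delta[(q, n)]
    return q in finals

# Hypothetical toy grammar: "one" then "two", optionally followed by "three".
delta = {("q0", "one"): "q1", ("q1", "two"): "q2", ("q2", "three"): "q3"}
finals = {"q2", "q3"}

ok_short = accepts(delta, "q0", finals, ["one", "two"])
ok_long = accepts(delta, "q0", finals, ["one", "two", "three"])
bad_order = accepts(delta, "q0", finals, ["two", "one"])
incomplete = accepts(delta, "q0", finals, ["one"])
```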
What is output is the degree of acceptability (the cumulative distance).

Let $D_{q_j}(i)$ be the minimum cumulative distance among all word strings assumed to end in state $q_j$ at input frame i; $N_{q_j}(i)$ the name of the last word of the word string corresponding to $D_{q_j}(i)$; $B_{q_j}(i)$ the start position of $N_{q_j}(i)$ minus 1 (the final frame of the word preceding $N_{q_j}(i)$, called the back pointer); and $Q_{q_j}(i)$ the name of the state from which the transition to $q_j$ attained $D_{q_j}(i)$, that is, $\Delta(Q_{q_j}(i), N_{q_j}(i)) = q_j$. The solution of equation (3) under automaton control is then obtained by solving the following recurrence. With the initial conditions $D_{q_p}(0) = 0$, $B_{q_p}(0) = 0$, compute

$$ D_q(i) = \min_{n,\,m,\,q_k} \bigl[ D_{q_k}(m) + D^n(m+1 : i) \bigr], \qquad q = \Delta(q_k, n) \qquad (4) $$

for $q = q_1, q_2, \ldots, q_{|S|-1}$, and, writing $\hat{n}, \hat{m}, \hat{q}_k$ for the $n, m, q_k$ that attain the minimum, set

$$ N_q(i) = \hat{n}, \qquad B_q(i) = \hat{m}, \qquad Q_q(i) = \hat{q}_k. $$

Once this has been carried out up to i = I, the words are obtained in reverse order, starting from the last word, as follows.

Step 1: set i = I and $q = \arg\min_{q_f} D_{q_f}(i)$, $q_f \in F$.
Step 2: $\hat{n} = N_q(i)$.
Step 3: if $B_q(i) \ne 0$, replace $(i, q)$ by $(B_q(i), Q_q(i))$ and return to Step 2; if $B_q(i) = 0$, stop.

FIG. 6 is a flowchart.

$D^n(m+1 : i)$ is defined by the following equation and is obtained by the same method as the DP matching of isolated words described above:

$$ D^n(m+1 : i) = \min_{u(k)} \left[ \sum_{k=m+1}^{i} d^n\bigl(k, u(k)\bigr) \right] \qquad (5) $$

where, for the path of FIG. 3a,

$$ 0 \le u(k) - u(k-1) \le 2, \qquad u(m+1) = 1, \qquad u(i) = J^n. $$

Equation (3) is solved by computing $D^n(m+1 : i)$ for m in a predetermined range $i_1$ to $i_2$, finding the m, the $q_k \in \{q_0, q_1, \ldots, q_{|S|-1}\}$ (where $q_k$ satisfies $q = \Delta(q_k, n)$) and the n for which the sum of the already computed $D_{q_k}(m)$ and $D^n(m+1 : i)$ is smallest, and taking that minimum as $D_q(i)$. As a way of reducing this computation, a method has been proposed in which, with $D^n_q(i', j')$ denoting the cumulative distance from (i, j) = (1, 1) to (i, j) = (i', j') on the assumption that the word from i = m+1 onward is n and the next state is q, $D_q(i)$ is obtained as the minimum of $D^n_q(i, J^n)$ over n and $q_k$.

The problem with these methods is that, depending on the structure of the automaton, the amount of computation could be reduced further, but this point has not been taken into account.

OBJECT OF THE INVENTION

The object of the invention is to propose a pattern comparison device in which the amount of computation in continuous speech recognition under automaton control is greatly reduced.

Structure of the Invention

In DP matching of continuous patterns under automaton control, with the automaton control rule written as $q = \Delta(q_k, n)$, when several states $q_k$ exist for a pattern n, let $\hat{q}_k$ be the one among them whose cumulative distance up to that state is smallest. The cumulative distance up to state q with n as the last pattern is then computed on the assumption that the immediately preceding state is $\hat{q}_k$ alone, which reduces the cumulative distance computations that were conventionally carried out for every $q_k$.

DESCRIPTION OF EMBODIMENTS

The principle and embodiments of the present invention are described below. FIG. 5 is a diagram for explaining the principle of the present invention.
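The principle stated under "Structure of the Invention" can be checked numerically with a small sketch. It assumes the quantities for one target state q at one frame i are already available as tables: `D[qk][m]` for the cumulative distance of each predecessor state at boundary m, and `seg[n][m]` for the word distance $D^n(m+1:i)$. All names and numbers are hypothetical, and the counter only tallies candidate sums; in the device the saving comes from not repeating the word-level DP for each predecessor.

```python
import math

def direct(preds, D, seg):
    """Conventional search: try every (predecessor state, word, boundary m)."""
    best, evaluations = (math.inf, None, None, None), 0
    for qk, n in preds:
        for m, d_prev in enumerate(D[qk]):
            evaluations += 1
            total = d_prev + seg[n][m]
            if total < best[0]:
                best = (total, n, m, qk)
    return best, evaluations

def merged(preds, D, seg):
    """Proposed search: per word, keep only the cheapest predecessor at each m."""
    by_word = {}
    for qk, n in preds:
        by_word.setdefault(n, []).append(qk)
    best, evaluations = (math.inf, None, None, None), 0
    for n, states in by_word.items():
        for m in range(len(seg[n])):
            qk = min(states, key=lambda s: D[s][m])   # cheapest predecessor
            evaluations += 1
            total = D[qk][m] + seg[n][m]
            if total < best[0]:
                best = (total, n, m, qk)
    return best, evaluations

# Three predecessor states reach q by the same word "O" (toy numbers).
preds = [("S6", "O"), ("S7", "O"), ("S8", "O")]
D = {"S6": [5.0, 4.0, 6.0], "S7": [3.0, 7.0, 2.5], "S8": [4.5, 3.5, 9.0]}
seg = {"O": [2.0, 1.0, 4.0]}

result_direct, n_direct = direct(preds, D, seg)
result_merged, n_merged = merged(preds, D, seg)
```

Both searches return the same minimum, last word, boundary and predecessor state; the merged one forms three candidate sums instead of nine.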
FIG. 5a is a finite automaton representation of the words listed in FIG. 5b. For simplicity, monosyllables are used here in place of words. In the conventional computation described above, the calculation had to be carried out for every state transition (there are 22 of them in FIG. 7). In the example of FIG. 7, however, the same "O" enters state $S_{12}$ from states $S_6$, $S_7$ and $S_8$. If states $S_6$, $S_7$ and $S_8$ are temporarily merged (degenerated) in such a case, the calculation for "O" entering state $S_{12}$ need be carried out only once. The same holds for "YA" entering state $S_{14}$. The present invention is a pattern comparison device that uses this principle to reduce the amount of computation.

In the example of FIG. 7, let $q = S_{12}$ and $n = n_o$ (where $n_o$ is the number assigned to the syllable "O"). For $q = S_{12}$, equation (3) then means

$$
D_{S_{12}}(i) = \min_{q_k,\,m} \bigl[ D_{q_k}(m) + D^{n_o}(m+1 : i) \bigr]
             = \min_{m} \Bigl[ \min \bigl\{ D_{S_6}(m),\; D_{S_7}(m),\; D_{S_8}(m) \bigr\} + D^{n_o}(m+1 : i) \Bigr]
\qquad (7)
$$

In other words, in FIG. 5, when "O" is emitted from $S_6$, $S_7$ and $S_8$ to make the transition to $S_{12}$, minimizing $D_{S_{12}}(i)$ requires that the transition be made from whichever state has the smallest of $D_{S_6}(m)$, $D_{S_7}(m)$ and $D_{S_8}(m)$. Written in general form,
with $q_p$ as the initial state and $D_{q_p}(0) = 0$, $B_{q_p}(0) = 0$,

$$ D_q(i) = \min_{n,\,m} \Bigl[ \min_{q_k} \bigl[ D_{q_k}(m) \bigr] + D^n(m+1 : i) \Bigr] \qquad (8) $$

where $q = \Delta(q_k, n)$, and

$$ N_q(i) = \hat{n}, \qquad B_q(i) = \hat{m}, \qquad Q_q(i) = \arg\min_{q_k} \bigl[ D_{q_k}(\hat{m}) \bigr]. $$

That is, when several of the states immediately preceding state q at frame i are connected to q by the same syllable (or word) n, the connection is made from the one whose cumulative distance is smallest.

Using this, in the example of FIG. 5 the above holds for "O" and "YA", so the number of calculations of equation (3) becomes 19, a reduction of 3. When equation (3) is solved directly by computing $D^n(m+1 : i)$, this saving is almost negligible, but with the fast calculation method mentioned above it makes a large difference. Depending on the task, a large reduction in computation can be expected.

An embodiment in which the present invention is applied to one of the fast calculation methods is described next.

Under the path constraint of FIG. 3b, $D^n(m+1 : i)$ is the cumulative distance along a path inside the hatched region of FIG. 6 from (i, j) = (m, 1) to (i, j) = (i, $J^n$). Here, (6) is a straight line of slope 1/2,
and (7) is a straight line of slope 2. Now suppose that path (5) gives the minimum cumulative distance from (i, j) = (1, 1) to (i, j) = (i, $J^n$). By the principle of dynamic programming, the path that minimizes the cumulative distance from (i, j) = (1, 1) to a point (i, j) = (i', j') on path (5) coincides exactly with the part of path (5) up to (i', j'). $D^n(m+1 : i)$ therefore need not be computed explicitly: $D^n_q(i, j)$ is obtained from the recurrence

$$
D^n_q(i,j) = \min
\begin{cases}
D^n_q(i-2,\,j-1) + d^n(i-1,\,j) + d^n(i,\,j) & \text{(i)} \\
D^n_q(i-1,\,j-1) + d^n(i,\,j) & \text{(ii)} \\
D^n_q(i-1,\,j-2) + d^n(i,\,j) & \text{(iii)}
\end{cases}
\qquad (9)
$$

and $D^n_q(i) = D^n_q(i, J^n)$. The initial values of recurrence (9) are

$$
\begin{aligned}
D^n_q(-1, j) &= \infty \quad (j = 0, 1, \ldots, J^n) \\
D^n_q(0, 0) &= 0 \\
D^n_q(0, j) &= \infty \quad (j = -1, 1, 2, \ldots, J^n) \\
D^n_q(i, -1) &= \infty \quad (i = 1, 2, \ldots, I) \\
D^n_q(i, 0) &= D_{q_k}(i-1) \quad (q_k \text{ being the state that satisfies } q = \Delta(q_k, n))
\end{aligned}
$$

The back pointer $B^n_q(i, j)$ corresponding to $D^n_q(i, j)$ follows the case selected in recurrence (9):

$$
B^n_q(i,j) =
\begin{cases}
B^n_q(i-2,\,j-1) & \text{when } D^n_q(i,j) \text{ is given by (i)} \\
B^n_q(i-1,\,j-1) & \text{when } D^n_q(i,j) \text{ is given by (ii)} \\
B^n_q(i-1,\,j-2) & \text{when } D^n_q(i,j) \text{ is given by (iii)}
\end{cases}
$$

and its initial value is $B^n_q(i, 1) = i - 1$.

FIG. 6 explains what $D^n_q(i, j)$ on the line i = i means. Straight lines 8, 10 and 12 have slope 1/2 and straight lines 9, 11 and 13 have slope 2. The path reaching (i, j) = (i, $J^n$) (point 26) lies in the region between lines 8 and 9, a matching path through point 16 lies in the region between lines 10 and 11, and a matching path through point 17 lies in the region between lines 12 and 13. In other words, for word n, a path through point 16 has its start between 18 and 19 and its end between 20 and 21, and a path through point 17 has its start between 22 and 23 and its end between 24 and 25. With line 28 taken to have slope 1/2, $D^n_q(i, j)$ on the line i = i thus represents the cumulative distance part-way toward the cumulative distance $D^n_q(i'') = D^n_q(i'', J^n)$ for i'' in frames i to i'. ($D^n_q(i, j)$ is hereinafter called the intermediate cumulative distance.) As is clear from recurrence (9), computing $D^n_q(i, j)$ requires only the intermediate cumulative distances at frames i−2 and i−1 and the inter-frame distances at frames i−1 and i; all earlier values may be discarded.

To summarize, $D_q(i)$ is obtained by computing, at each frame i, $D^n_q(i, j)$ for j = 1, 2, …, $J^n$, n = 1, 2, …, N and $q = q_1, q_2, \ldots, q_{|S|-1}$, and, with

$$ \hat{n},\, \hat{q}_k = \arg\min_{n,\,q_k} \bigl[ D^n_q(i, J^n) \bigr], \qquad q = \Delta(q_k, n), $$

setting

$$ D_q(i) = D^{\hat{n}}_q(i, J^{\hat{n}}), \qquad B_q(i) = B^{\hat{n}}_q(i, J^{\hat{n}}), \qquad Q_q(i) = \hat{q}_k, \qquad N_q(i) = \hat{n}. $$

In this way, $d^n(i, j)$ at each lattice point (i, j) is computed only once per word n, and $D^n_q(i, j)$ only once for each q, n and $q_k$ with $\Delta(q_k, n) = q$. Compared with solving equation (3) directly, which requires $d^n(i', j)$ and $D^n(m+1 : i)$ to be computed at every frame for the lattice points inside the hatched region shown in FIG. 5, the amount of computation is greatly reduced.

This method advances the processing frame by frame along the input pattern (index i), but the processing can also be advanced frame by frame along the standard pattern (index j). As is clear from recurrence (9), in the j direction too, computing $D^n_q(i, j)$ requires only the inter-frame distances $d^n(i, j)$ at frames j−1 and j and the intermediate cumulative distances at frames j−2 and j−1, so $D^n_q(i, j)$ on the line j = j can be computed one frame j at a time.

The method described above is called the automaton-controlled clock-synchronous propagation DP method (FSACWDP).

To introduce the idea of equation (8) into this method, the initial value of $D^n_q(i, j)$ in the computation of recurrence (9) is set as follows. Among the states $q_k$ that satisfy $q = \Delta(q_k, n)$, let $\hat{q}_k = \arg\min_{q_k} \bigl[ D_{q_k}(i-1) \bigr]$.
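The frame-synchronous computation described above can be sketched for a single word n entering a single state q. The sketch keeps only the columns for frames i−1 and i−2, as the text notes is sufficient, and takes the row-0 entry value of each column from the cheapest predecessor state. Several things here are assumptions of the sketch rather than of the source: scalar features with absolute-difference distance, the names `word_end_costs` and `merged_entry`, and the indexing convention that `entry[i]` is the predecessor cost placed at row 0 of column i (the text's own indexing may differ by one frame).

```python
import math

INF = math.inf

def merged_entry(D_by_state, preds):
    """Row-0 entry value per column: the cheapest predecessor state at each frame."""
    n_frames = len(next(iter(D_by_state.values())))
    return [min(D_by_state[q][i] for q in preds) for i in range(n_frames)]

def word_end_costs(x, ref, entry):
    """Return D(i, J) for i = 1..len(x), keeping only two previous columns."""
    J = len(ref)

    def d(xi, j):                                 # local distance, 1-based j
        return abs(xi - ref[j - 1])

    prev2 = [INF] * (J + 1)                       # column i-2
    prev1 = [entry[0]] + [INF] * J                # column 0: only row 0 reachable
    ends = []
    for i in range(1, len(x) + 1):
        cur = [entry[i] if i < len(entry) else INF] + [INF] * J
        for j in range(1, J + 1):
            dij = d(x[i - 1], j)
            c1 = prev2[j - 1] + d(x[i - 2], j) + dij if i >= 2 else INF  # case (i)
            c2 = prev1[j - 1] + dij                                      # case (ii)
            c3 = prev1[j - 2] + dij if j >= 2 else INF                   # case (iii)
            cur[j] = min(c1, c2, c3)
        ends.append(cur[J])
        prev2, prev1 = prev1, cur                 # older columns are forgotten
    return ends

x, ref = [1, 2, 3], [1, 2, 3]
A = [0, INF, INF]      # hypothetical predecessor state A: cost 0 before frame 1
B = [5, INF, INF]      # hypothetical predecessor state B: cost 5 before frame 1

one_pass = word_end_costs(x, ref, merged_entry({"A": A, "B": B}, ["A", "B"]))
two_pass = [min(a, b) for a, b in zip(word_end_costs(x, ref, A),
                                      word_end_costs(x, ref, B))]
```

Because the recurrence only adds and takes minima, one pass seeded with the per-frame minimum over predecessors gives the same end costs as running the word once per predecessor and taking the minimum afterwards.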
Then set

$$ D^n_q(i, 0) = D_{\hat{q}_k}(i-1) \qquad (10) $$

FIG. 8 is a block diagram of one embodiment of the present invention. 100 is the input terminal for the speech signal. 101 is a feature extraction unit that converts the input speech signal into a sequence of feature vectors. 102 is a standard pattern storage unit in which each word (syllable) to be recognized is stored as a sequence of feature vectors of the same kind. 103 is an inter-frame distance calculation unit that calculates the distance between the feature vector output by the feature extraction unit 101 and a feature vector in the standard pattern storage unit 102. 104 is an inter-frame distance storage unit that temporarily stores the calculated inter-frame distances. 105 is an automaton control rule storage unit. 106 is a cumulative distance storage unit in which the cumulative distance $D_q(i)$ is stored for each frame. 107 is an initial value calculation unit that calculates the initial value used in recurrence (9) according to equation (10). 108 is a recurrence calculation unit that evaluates recurrence (9) starting from the initial value calculated by 107. 109 is a final value determination unit that computes, from the result of the recurrence calculation unit 108, the final values at each frame:

$$ \hat{n} = \arg\min_{n} \bigl[ D^n_q(i, J^n) \bigr], \qquad N_q(i) = \hat{n}, \qquad D_q(i) = D^{\hat{n}}_q(i, J^{\hat{n}}), $$

$$ B_q(i) = B^{\hat{n}}_q(i, J^{\hat{n}}), \qquad Q_q(i) = \arg\min_{q_k} \bigl[ D_{q_k}(B_q(i)) \bigr]. $$

113, 114 and 115 are, respectively, a last-word storage unit, a preceding-state storage unit and a back pointer storage unit, which store for each frame the last word $N_q(i)$, the state $Q_q(i)$ immediately preceding q, and the back pointer $B_q(i)$ determined by the final value determination unit 109. 112 is the unit that determines the state at the final frame of the input speech: with $q_f \in F$, the state q at the final frame I is determined as $q = \arg\min_{q_f} [D_{q_f}(I)]$. 110 is a speech interval detection unit and 111 is a frame counting unit. 116 is a backtrace control unit which, from the contents of the last-word storage unit 113, the preceding-state storage unit 114 and the back pointer storage unit 115, and following the flowchart shown in FIG. 4, causes the recognition results to be output from the last-word storage unit 113 in reverse order.

FIG. 9 is a chart showing an example in which the operation of the above embodiment is realized in software. Steps 200 to 204 perform the overall initialization. Steps 205 to 214 perform the processing for frame i: within them, steps 206 to 212 compute $D^n_q(i)$ and $B^n_q(i)$ for each standard pattern n (n = 1, 2, …, N), and steps 213 to 214 determine, for each state, the name of the last word (syllable), the corresponding cumulative distance, the back pointer and the preceding state on the assumption that the input speech ends at frame i. Step 208 corresponds to the operation of the inter-frame distance calculation unit 103 and the inter-frame distance storage unit 104. Step 210 corresponds to the operation of the initial value calculation unit 107. Steps 211 to 212 correspond to the operation of the recurrence calculation unit 108. Steps 213 to 214 correspond to the operation of the final value determination unit 109. Steps 215 to 217 determine the recognition results in reverse order starting from the final frame of the input speech, and correspond to the operations carried out among the final-frame state determination unit 118, the last-word storage unit 113, the preceding-state storage unit 114, the back pointer storage unit 115 and the backtrace control unit 116.

Another known method that solves the problem by computing $D^n_q(i, j)$ directly is the automaton-controlled level building method (FSALB). The idea of the present invention can be introduced into it as well, and the amount of computation can be greatly reduced.

FIG. 10 shows speech in which a five-digit number is read out, for example,
Seventy-three thousand two hundred and eighty-six, and so on.
This is an automaton expression for reading 10. In contrast, devices using conventional FSACWDP and FSALB, and degenerate FSACWDP and degenerate device according to the present invention.
Table 1 compares the calculation amount and work memory storage amount for devices using FSALB. According to this table, for the task shown in Figure 10,
The amount of calculation is greatly reduced, and the amount of memory is also less than the same amount, so it has a great practical effect.
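The initial-value rule of equation (10) can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `initial_values`, the triple encoding of the control rule q = Δ(q_k, n) and the toy numbers are all assumptions made for illustration. For every pair (q, n) it keeps only the predecessor state with the smallest cumulative distance in frame i-1, which is the value the recurrence for pattern n would then start from.

```python
import math

def initial_values(transitions, d_prev):
    """Pick, for each (q, n), the best predecessor per equation (10).

    transitions: iterable of (q_k, n, q) triples meaning q = Delta(q_k, n)
                 (hypothetical encoding of the automaton control rule).
    d_prev:      dict mapping state q_k -> cumulative distance D_{q_k}(i-1).
    Returns a dict {(q, n): (D^n_q(i, 0), q_hat_k)}.
    """
    best = {}
    for q_k, n, q in transitions:
        cand = d_prev.get(q_k, math.inf)  # unreachable states count as infinity
        if (q, n) not in best or cand < best[(q, n)][0]:
            best[(q, n)] = (cand, q_k)
    return best

# Toy automaton (assumed): states 0 and 1 both lead to state 2 via pattern "a".
transitions = [(0, "a", 2), (1, "a", 2), (1, "b", 3)]
d_prev = {0: 4.0, 1: 2.5}
init = initial_values(transitions, d_prev)
print(init[(2, "a")])  # (2.5, 1): state 1 is the cheaper predecessor
```

Because only one initial value survives per (q, n), a single pass of the recurrence per (q, n) is enough in this sketch, rather than one pass per predecessor state.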

[Table]

Effects of the Invention

As described above, when the control rule of the automaton is $q = \Delta(q_k, n)$ and there are multiple states $q_k$ for a pattern $n$, the conventional approach calculated the cumulative distance for the last pattern $n$ separately for every immediately preceding state $q_k$. The present invention instead exploits the fact that, when several of the states immediately preceding state $q$ connect to $q$ through the same pattern $n$, the path continues from the one whose cumulative distance up to that preceding state is smallest. Depending on the task, this makes it possible to reduce the amount of calculation significantly.
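Where the saving comes from can be made concrete with a small count. The sketch below is illustrative only and does not reproduce the figures of Table 1. It assumes that the conventional method performs one DP pass per frame for every (preceding state, pattern) pair, while the degenerate method performs one pass for every distinct (state, pattern) pair. The function name and the toy automaton are hypothetical.

```python
def dp_passes_per_frame(transitions):
    """Count DP passes per input frame under the stated assumptions.

    transitions: iterable of (q_k, n, q) triples meaning q = Delta(q_k, n).
    Returns (conventional, degenerate).
    """
    triples = list(transitions)
    conventional = len({(q_k, n) for q_k, n, _ in triples})  # one per predecessor
    degenerate = len({(q, n) for _, n, q in triples})        # one per target state
    return conventional, degenerate

# Toy automaton (assumed): three states all reach state 3 via "a" or "b".
toy = [(q_k, n, 3) for q_k in (0, 1, 2) for n in ("a", "b")]
print(dp_passes_per_frame(toy))  # (6, 2)
```

In this toy case the number of passes falls from six to two. The reduction depends entirely on how many transitions share a target state and a pattern, which is why the effect is stated to depend on the task.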

[Brief explanation of the drawing]

FIGS. 1 to 3 are diagrams explaining the basic principle of DP matching; FIG. 4 is a flowchart showing the procedure for segmentation and determination of the recognized words in continuous word speech recognition under automaton control; FIG. 5 is a diagram showing an example of an automaton representation for explaining the principle of the present invention; FIGS. 6 and 7 are diagrams explaining the principle of an embodiment of the present invention; FIG. 8 is a block diagram showing an embodiment of the present invention; FIG. 9 is a chart of the functions of the device of that embodiment when realized in software; and FIG. 10 is a diagram showing the effect of that embodiment.

101 ... feature extraction unit, 102 ... standard pattern storage unit, 103 ... interframe distance calculation unit, 104 ... interframe distance storage unit, 105 ... automaton control rule storage unit, 106 ... cumulative distance storage unit, 107 ... initial value calculation unit, 108 ... recurrence equation calculation unit, 109 ... final value determining unit, 110 ... speech section detection unit, 111 ... frame counting unit, 112 ... final frame state determining unit, 113 ... last word storage unit, 114 ... previous state storage unit, 115 ... back pointer storage unit, 116 ... backtrace control unit.

Claims

[Claims]

1. A pattern comparison device in which an input signal is converted into a sequence of feature vectors $a_1, a_2, \ldots, a_i, \ldots, a_I$, comprising: standard pattern storage means for storing standard patterns $R^n$ ($n = 1, 2, \ldots, N$), each consisting of a sequence of feature vectors $b^n_1, b^n_2, \ldots, b^n_j, \ldots, b^n_{J^n}$; cumulative distance calculation means which, with $d^n(i, j)$ denoting the distance between the feature vectors $a_i$ and $b^n_j$ in the $i$-th frame of the input pattern for $j = 1, 2, \ldots, J^n$ and $n = 1, 2, \ldots, N$, calculates in the $i$-th frame of the input pattern, for $j = 1, 2, \ldots, J^n$ and $n = 1, 2, \ldots, N$, the intermediate cumulative distance $D^n_q(i, j)$ for the case where the state is $q$, taking as its initial value the minimum of the cumulative distances $D_{q_k}(i-1)$ along the paths leading to the states $q_k$ in the $(i-1)$-th frame among the immediately preceding states $q_k$ that connect to $q$ through pattern $n$, and which calculates the intermediate back pointer $B^n_q(i, j)$ along the path that led to that value; final determination means which, on the assumption that the input ends in state $q$ at the $i$-th frame, finds the $\hat{n}$ that minimizes the intermediate cumulative distance $D^n_q(i, J^n)$ of the last pattern $n$, and sets the pattern name to $N_q(i) = \hat{n}$, the cumulative distance to $D_q(i) = D^{\hat{n}}_q(i, J^{\hat{n}})$, the back pointer to $B_q(i) = B^{\hat{n}}_q(i, J^{\hat{n}})$ and, with $\hat{q}_k$ denoting the $q_k$ that minimizes $D_{q_k}(B_q(i))$ among the preceding states, the previous state to $Q_q(i) = \hat{q}_k$; last pattern storage means, back pointer storage means and previous state storage means for storing these values for each frame; and means which, when the input is completed at the last frame $I$ of the input pattern, reads the cumulative distances $D_{q_f}(I)$ for the states $q_f$ allowed as final states from the cumulative distance storage means, takes the $\hat{q}_f$ that gives the minimum value of $D_{q_f}(I)$ as the state $q$ in the final frame, and, from the contents of the back pointer storage means, the previous state storage means and the last word storage means, repeats $n = N_q(i)$, $i = B_q(i)$, $q = Q_q(i)$ until $B_q(i) = 0$ and outputs the recognition results in reverse order.
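The backtrace described in the last clause of the claim can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the tables $N_q(i)$, $B_q(i)$, $Q_q(i)$ and $D_q(i)$ are represented as dictionaries keyed by `(q, i)`, the function name `backtrace` and the toy data are hypothetical, and the loop is read as also emitting the word whose back pointer is 0 (the first word of the utterance).

```python
def backtrace(N, B, Q, D, final_states, I):
    """Recover the word sequence by following the stored pointers.

    N, B, Q, D: dicts keyed by (q, i) holding the last word, back pointer,
                previous state and cumulative distance (assumed layout).
    final_states: states q_f allowed as final states.
    I: index of the last input frame.
    """
    # State in the final frame: the allowed final state with minimum D_{q_f}(I).
    q = min(final_states, key=lambda q_f: D[(q_f, I)])
    i = I
    words = []
    while i > 0:
        words.append(N[(q, i)])          # n = N_q(i)
        q, i = Q[(q, i)], B[(q, i)]      # q = Q_q(i), i = B_q(i)
    words.reverse()                      # results were collected last word first
    return words

# Toy tables (assumed): "two" ends at frame 3 in state 1, "six" at frame 5 in state 2.
N = {(2, 5): "six", (1, 3): "two"}
B = {(2, 5): 3, (1, 3): 0}
Q = {(2, 5): 1, (1, 3): 0}
D = {(2, 5): 1.0, (3, 5): 2.0}
print(backtrace(N, B, Q, D, final_states=[2, 3], I=5))  # ['two', 'six']
```

Both right-hand sides in the tuple assignment are evaluated with the old `(q, i)` before either variable is updated, which matches reading all three tables at the same frame before stepping back.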