JPH09127977A - Voice recognition method

Info

Publication number: JPH09127977A
Application number: JP28031495A
Authority: JP
Inventors: Takashi Miki; 敬三木
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-10-27
Filing date: 1995-10-27
Publication date: 1997-05-16
Anticipated expiration: 2015-10-27
Also published as: JP3251480B2

Abstract

PROBLEM TO BE SOLVED: To sharply reduce the amount of calculation by reading out a stored reference probability to obtain the forward probability of the current frame whenever the distance between the speech feature vector of the current frame and that of the reference frame is at or below a threshold. SOLUTION: While the current frame number t is at or below the end frame number T, a matching section obtains the distance dts between the speech feature vectors xt and xqs of the current frame number t and the reference frame number qs. When the distance dts is at or below the threshold DTS, the output probability Bji(xt) of the current frame number t is approximately equal to the reference probability Bji, so the reference probability Bji is not rewritten. The output probability is not calculated from its equation; instead, each reference probability Bji is read out and the logarithmic forward probability Cit is obtained. The forward probability Cit is thus obtained using the output probability Bji(xqs) of the reference frame number qs.

Description

Detailed Description of the Invention

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition method that uses hidden Markov models as the standard patterns for recognition matching.

【０００２】[0002]

[Prior Art] A hidden Markov model (hereinafter, HMM) can appropriately represent patterns that contain fluctuations, such as speech patterns with temporal variation due to speaking rate, differences between individual speakers, and coarticulation, and is therefore widely used in the field of speech recognition. An HMM used for speech recognition has several states, for example S_0 to S_3, a probability a_ij of a transition from state S_i to state S_j, and an output probability b_ij(x) of the speech feature vector x output at that transition. In general, the output probability b_ij(x) is represented by an uncorrelated mixture normal distribution composed of a plurality of normal distributions.

[0003] In a speech recognition method using HMMs, a speech feature vector x_t is extracted from the speech signal for each frame of a speech section, and the output probability b_ij(x_t) of the speech feature vector x_t is then obtained. Typically, the output probability is calculated as b_ij(x_t) = Σ_m {λ_ijm · b_ijm(x_t)}. Here, λ_ijm is the weight of the m-th normal distribution in the uncorrelated mixture normal distribution, and b_ijm(x_t) is the output probability (without weighting) of the speech feature vector x_t obtained from the m-th normal distribution of that mixture.
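The mixture formula above can be sketched in a few lines of Python. This is only an illustration: it assumes that "uncorrelated" means each normal distribution has a diagonal covariance, and the function and parameter names (diag_gaussian_pdf, output_probability, weights, means, variances) are hypothetical rather than taken from the patent.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density at x of a normal distribution with diagonal covariance.

    Assumption: "uncorrelated" is read as "diagonal covariance", so the
    density factorizes over the vector components.
    """
    log_p = 0.0
    for xk, mk, vk in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vk) + (xk - mk) ** 2 / vk)
    return math.exp(log_p)

def output_probability(x, weights, means, variances):
    """b_ij(x) = sum over m of lambda_ijm * b_ijm(x) for one transition."""
    return sum(
        w * diag_gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )
```

Every call visits every mixture component and every vector dimension, and the call is repeated for every transition and every frame, which is the cost the invention aims to avoid.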

[0004] The likelihood between the HMM and the time series of speech feature vectors x_t extracted from the start frame to the end frame of the speech section is then obtained using the output probability b_ij(x_t) of each speech feature vector x_t. The likelihood is obtained for each HMM prepared as a standard pattern, and the category assigned to the HMM that yields the maximum likelihood is taken as the recognition result.

【０００５】[0005]

[Problems to Be Solved by the Invention] However, obtaining the output probability b_ij(x_t) = Σ_m {λ_ijm · b_ijm(x_t)} of the speech feature vector x_t requires an enormous amount of calculation, so it is difficult to obtain the likelihood between the time series of speech feature vectors x_t and an HMM at high speed.

[0006] It has therefore been desired to obtain the output probability b_ij(x_t) of the speech feature vector x_t more simply while keeping the error small.

【０００７】[0007]

[Means for Solving the Problems] To solve the above problem, each of the speech recognition methods of the inventions of claims 1 to 8 is a speech recognition method that obtains the likelihood ln{P(x_1, x_2, ..., x_T)} between a hidden Markov model and the time series x_1, x_2, ..., x_T of speech feature vectors extracted from the start frame to the end frame of a speech section, and takes the category assigned to the hidden Markov model that yields the maximum likelihood as the recognition result for the speech signal in that speech section, wherein,

【０００８】[0008]

【数４】 (Equation 4)

[0009] where
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
Φ_i: the probability that the initial state of the hidden Markov model is S_i
a_ji: the probability of a transition from state S_j to state S_i in the hidden Markov model
x_t: the speech feature vector extracted in the t-th frame of the speech section (1 ≤ t ≤ T, where the first frame is the start frame and the T-th frame is the end frame of the speech section)
b_ji(x_t): the output probability of the speech feature vector x_t output at a transition from state S_j to state S_i in the hidden Markov model
c_it: the forward probability of starting from the initial state of the hidden Markov model, outputting the time series x_1, x_2, ..., x_t of speech feature vectors, and reaching state S_i
*i: the state number i assigned to the state S_i that is the final state of the hidden Markov model
the method is characterized in that, when the likelihood ln{P(x_1, x_2, ..., x_T)} is obtained using the equations shown in (Equation 4), the following processing is performed.
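The equations of (Equation 4) are not reproduced in this text, so the following Python sketch is an assumption rather than the patent's formula. It uses the standard forward recursion that matches the symbol definitions: c_j0 = Φ_j, c_it = Σ_j c_j,t-1 · a_ji · b_ji(x_t), and likelihood ln c_*i,T. The names forward_log_likelihood, phi, a, emit and final_state are hypothetical, and states are indexed from 0 in the code.

```python
import math

def forward_log_likelihood(phi, a, emit, xs, final_state):
    """Return ln P(x_1..x_T) under an assumed standard forward recursion.

    phi         : phi[j] is the probability that the initial state is S_j
    a           : a[j][i] is the transition probability from S_j to S_i
    emit        : emit(j, i, x) returns the output probability b_ji(x)
    xs          : feature vectors x_1..x_T
    final_state : index of the final state (*i in the text)
    """
    n_states = len(phi)
    c_prev = list(phi)  # assumed starting point: c_j0 = phi_j
    for x in xs:
        # c_it = sum over j of c_{j,t-1} * a_ji * b_ji(x_t)
        c_prev = [
            sum(c_prev[j] * a[j][i] * emit(j, i, x) for j in range(n_states))
            for i in range(n_states)
        ]
    return math.log(c_prev[final_state])
```

In this form, emit is called for every (j, i) pair at every frame; the processing described next replaces most of those calls with reads from a stored table.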

[0010] <Inventions of claims 1 and 2> In the speech recognition method of the invention of claim 1, a storage unit is provided that stores a reference frame number qs and reference probabilities b_ji, and the forward probabilities c_it for t = 1, 2, ..., T are obtained in sequence using the reference probabilities b_ji.

[0011] (1) When t = 1, the method performs process (1A), in which the reference frame number qs is initialized to 1, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i and written as the initial value of the reference probability b_ji, and, after the reference probabilities b_ji have been written, each reference probability b_ji is read out to obtain the forward probability c_it; and process (1B), in which 1 is added to the current frame number t after process (1A) ends.

[0012] (2) When 2 ≤ t ≤ T, the method performs process (1C), in which the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is compared with a threshold DTS; if the comparison gives dts > DTS, the reference frame number qs is rewritten to the current frame number t, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i, the reference probability b_ji is rewritten to that output probability b_ji(x_t), and, after the rewriting is complete, each reference probability b_ji is read out to obtain the forward probability c_it; if the comparison gives dts ≤ DTS, each reference probability b_ji is read out without being rewritten to obtain the forward probability c_it; and process (1D), in which 1 is added to the current frame number t after process (1C) ends.
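Processes (1A) to (1D) can be sketched as a small Python generator. This is an illustrative reading of the steps, not the patent's implementation: compute_b is a hypothetical callback that evaluates the whole table of output probabilities b_ji(x) from the hidden Markov model, and the Euclidean distance (math.dist) is an assumption, since this passage does not say how dts is measured.

```python
import math

def reference_probabilities(xs, compute_b, dts_threshold, distance=math.dist):
    """Yield (t, qs, b_ref) for t = 1..T, following processes (1A) to (1D).

    xs            : feature vectors x_1..x_T (at least one frame)
    compute_b     : hypothetical callback returning the table b_ji(x)
    dts_threshold : the threshold DTS
    distance      : distance between feature vectors (assumed Euclidean)
    """
    qs = 1                     # (1A) reference frame number
    b_ref = compute_b(xs[0])   # (1A) initial reference probabilities
    yield 1, qs, b_ref
    for t in range(2, len(xs) + 1):           # (1B)/(1D) advance t
        dts = distance(xs[t - 1], xs[qs - 1])
        if dts > dts_threshold:               # (1C) too far: recompute
            qs = t
            b_ref = compute_b(xs[t - 1])
        # when dts <= DTS the stored table is reused unchanged
        yield t, qs, b_ref
```

A caller would feed each yielded b_ref into the forward-probability update for frame t; the expensive compute_b runs only on frames where the features have moved more than DTS away from the reference frame.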

[0013] Thus, in the invention of claim 1, the initial value of the reference probability b_ji is the output probability b_ji(x_1) obtained from the hidden Markov model at the start frame, and the initial value of the reference frame number qs is 1, the frame number of the start frame.

[0014] The distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is then compared with the threshold DTS. If dts > DTS, the reference frame number qs and the reference probability b_ji are both rewritten, and the rewritten reference probability b_ji is read out to obtain the forward probability c_it. If dts ≤ DTS, neither the reference frame number qs nor the reference probability b_ji is rewritten, and the unchanged reference probability b_ji is read out to obtain the forward probability c_it.

[0015] The reference probability b_ji stored in the storage unit is therefore the output probability b_ji(x_t) obtained from the hidden Markov model at the frame whose number is the reference frame number qs, that is, b_ji(x_qs).

[0016] When dts > DTS, the distance dts exceeds the threshold DTS, so the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qs of the reference frame number qs before rewriting. The output probability b_ji(x_t) of the current frame number t therefore cannot be approximated by the output probability b_ji(x_qs) of the reference frame number qs before rewriting, that is, by the reference probability b_ji. The reference probability b_ji is accordingly rewritten to the output probability b_ji(x_t) of the current frame number t, and this rewritten reference probability is read out to obtain the forward probability c_it. Because the reference probability b_ji is rewritten to the output probability b_ji(x_t) of the current frame number t, the reference frame number qs is also rewritten to the current frame number t.

[0017] When dts ≤ DTS, the distance dts is at or below the threshold DTS, so the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the reference frame number qs, which is not rewritten. The output probability b_ji(x_t) of the current frame number t is therefore approximately equal to the output probability b_ji(x_qs) of the reference frame number qs, that is, to the reference probability b_ji. The reference probability b_ji is accordingly read out without being rewritten to obtain the forward probability c_it. Because the reference probability b_ji is not rewritten, the reference frame number qs is not rewritten either.

[0018] Thus, when dts > DTS, the forward probability c_it is obtained by reading out the reference probability b_ji after it has been rewritten, that is, after the calculation that obtains the output probability b_ji(x_t) of the current frame number t from the hidden Markov model has been performed. When dts ≤ DTS, the forward probability c_it is obtained by reading out the reference probability b_ji without rewriting it, that is, without performing the calculation that obtains the output probability b_ji(x_t) of the current frame number t from the hidden Markov model. The amount of calculation can therefore be reduced while the error in the forward probability c_it is kept small.

[0019] The error in the forward probability c_it here is the difference between the forward probability c_it obtained without calculating the output probability b_ji(x_t) from the hidden Markov model when dts ≤ DTS and the forward probability c_it obtained without such simplification of the calculation.

[0020] As the threshold DTS is increased, the amount of calculation saved increases, but the error in the forward probability c_it also increases. The value of the threshold DTS must therefore be set so that the forward probability c_it can be obtained within the range of error that is acceptable in practice.
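The cost side of this trade-off can be illustrated with a toy Python function that counts how many frames would trigger a full recomputation of the output probabilities for a given threshold. It is only a sketch under simplifying assumptions: the features are scalars, the distance is the absolute difference, and the name count_recomputations is hypothetical. It does not measure the error in c_it, which depends on the model.

```python
def count_recomputations(xs, dts_threshold):
    """Count frames on which b_ji(x_t) would be recomputed.

    xs            : scalar features x_1..x_T (a simplifying assumption)
    dts_threshold : the threshold DTS
    """
    qs = 0        # index of the reference frame (0-based in this sketch)
    count = 1     # the start frame is always computed
    for t in range(1, len(xs)):
        if abs(xs[t] - xs[qs]) > dts_threshold:
            qs = t
            count += 1
    return count
```

For the sequence [0, 1, 3, 6, 10, 15], thresholds of 0, 4 and 20 give 6, 3 and 1 recomputations respectively, so a larger DTS saves more work but lets the stored probabilities drift further from the true ones.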

[0021] In the speech recognition method of the invention of claim 2, the following processing is performed in the speech recognition method of the invention of claim 1.

[0022] (1) When t = 1, the method performs process (1A), in which the reference frame number qs is initialized to 1 and the skip count skips is initialized to 0, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i and written as the initial value of the reference probability b_ji, and, after the reference probabilities b_ji have been written, each reference probability b_ji is read out to obtain the forward probability c_it; and process (1B), in which 1 is added to the current frame number t after process (1A) ends.

[0023] (2) When 2 ≤ t ≤ T, the method performs process (1C), in which the skip count skips is compared with a threshold NSKIPS and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is compared with the threshold DTS; if the comparison gives skips > NSKIPS or dts > DTS, the skip count skips is initialized to 0, the reference frame number qs is rewritten to the current frame number t, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i, the reference probability b_ji is rewritten to that output probability b_ji(x_t), and, after the rewriting is complete, each reference probability b_ji is read out to obtain the forward probability c_it; if the comparison gives skips ≤ NSKIPS and dts ≤ DTS, 1 is added to the skip count skips and each reference probability b_ji is read out without being rewritten to obtain the forward probability c_it; and process (1D), in which 1 is added to the current frame number t after process (1C) ends.
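The control flow of claim 2 can be sketched as follows. This Python function only tracks which reference frame number qs is in force at each frame; the actual recomputation of b_ji(x_t) would happen wherever qs changes. The Euclidean distance and the name reference_frames_with_skip_limit are assumptions made for the illustration.

```python
import math

def reference_frames_with_skip_limit(xs, dts_threshold, nskips,
                                     distance=math.dist):
    """Return the reference frame number qs used at each frame t = 1..T.

    xs            : feature vectors x_1..x_T (at least one frame)
    dts_threshold : the threshold DTS
    nskips        : the threshold NSKIPS on consecutive reuses
    distance      : distance between feature vectors (assumed Euclidean)
    """
    qs, skips = 1, 0            # (1A) initial values
    refs = [qs]
    for t in range(2, len(xs) + 1):
        dts = distance(xs[t - 1], xs[qs - 1])
        if skips > nskips or dts > dts_threshold:
            skips = 0           # (1C) refresh: reset the counter and
            qs = t              # recompute b_ji(x_t) at this frame
        else:
            skips += 1          # (1C) reuse the stored probabilities
        refs.append(qs)
    return refs
```

With NSKIPS = 1 and seven identical frames, the result is [1, 1, 1, 4, 4, 4, 7]: because the test is skips > NSKIPS, a stored table is reused for at most NSKIPS + 1 frames after the frame that produced it, even when the features never move.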

[0024] Thus, in the invention of claim 2, the initial value of the reference probability b_ji is the output probability b_ji(x_1) obtained from the hidden Markov model at the start frame, the initial value of the reference frame number qs is 1, the frame number of the start frame, and the initial value of the skip count skips is 0.

[0025] The skip count skips is then compared with the threshold NSKIPS, and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is compared with the threshold DTS. If skips > NSKIPS or dts > DTS, the skip count skips is initialized, the reference frame number qs and the reference probability b_ji are rewritten, and the rewritten reference probability b_ji is read out to obtain the forward probability c_it. If skips ≤ NSKIPS and dts ≤ DTS, the skip count skips is incremented, neither the reference frame number qs nor the reference probability b_ji is rewritten, and the unchanged reference probability b_ji is read out to obtain the forward probability c_it.

[0026] The reference probability b_ji stored in the storage unit is therefore the output probability b_ji(x_t) obtained from the hidden Markov model at the frame whose number is the reference frame number qs, that is, b_ji(x_qs).

[0027] When dts > DTS, the distance dts exceeds the threshold DTS, so the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qs of the reference frame number qs before rewriting. The output probability b_ji(x_t) of the current frame number t therefore cannot be approximated by the output probability b_ji(x_qs) of the reference frame number qs before rewriting, that is, by the reference probability b_ji. The reference probability b_ji is accordingly rewritten to the output probability b_ji(x_t) of the current frame number t, and this rewritten reference probability is read out to obtain the forward probability c_it. Because the reference probability b_ji is rewritten to the output probability b_ji(x_t) of the current frame number t, the reference frame number qs is also rewritten to the current frame number t. Because the skip count skips represents the number of times the reference probability b_ji has gone without being rewritten within the range skips ≤ NSKIPS, the skip count skips is initialized.

[0028] When skips > NSKIPS, the number of times skips that the reference probability b_ji has gone without being rewritten exceeds the threshold NSKIPS, so the separation in time between the current frame number t and the reference frame number qs has become large and the error is likely to grow. The reference probability b_ji is therefore rewritten in order to reduce the error. Because the reference probability b_ji is rewritten to the output probability b_ji(x_t) of the current frame number t, the reference frame number qs is rewritten to the current frame number t. Because the skip count skips represents the number of times the reference probability b_ji has gone without being rewritten within the range skips ≤ NSKIPS, the skip count skips is initialized.

[0029] When skips ≤ NSKIPS and dts ≤ DTS, the condition dts ≤ DTS means that the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the reference frame number qs, which is not rewritten, so the output probability b_ji(x_t) of the current frame number t is approximately equal to the output probability b_ji(x_qs) of that reference frame number qs, that is, to the reference probability b_ji. In addition, the condition skips ≤ NSKIPS means that the number of times skips that the reference probability b_ji has gone without being rewritten is at or below the threshold NSKIPS, so the separation in time between the current frame number t and the reference frame number qs is small and the error is unlikely to grow. The reference probability b_ji is therefore read out without being rewritten to obtain the forward probability c_it. Because the reference probability b_ji is not rewritten, the reference frame number qs is not rewritten either. Because the skip count skips represents the number of times the reference probability b_ji has gone without being rewritten within the range skips ≤ NSKIPS, 1 is added to the skip count skips to count it up.

[0030] Thus, when skips > NSKIPS or dts > DTS, the forward probability c_it is obtained by reading out the reference probability b_ji after it has been rewritten, that is, after the calculation that obtains the output probability b_ji(x_t) of the current frame number t from the hidden Markov model has been performed. When skips ≤ NSKIPS and dts ≤ DTS, the forward probability c_it is obtained by reading out the reference probability b_ji without rewriting it, that is, without performing the calculation that obtains the output probability b_ji(x_t) of the current frame number t from the hidden Markov model. The amount of calculation can therefore be reduced while the error in the forward probability c_it is kept small.

[0031] The error in the forward probability c_it here is the difference between the forward probability c_it obtained without calculating the output probability b_ji(x_t) from the hidden Markov model when skips ≤ NSKIPS and dts ≤ DTS and the forward probability c_it obtained without such simplification of the calculation.

[0032] As the threshold DTS is increased, the amount of calculation saved increases, but the error in the forward probability c_it also increases. The value of the threshold DTS must therefore be set so that the forward probability c_it can be obtained within the range of error that is acceptable in practice.

[0033] <Inventions of claims 3 to 6> In the speech recognition method of the invention of claim 3, each state S_j that is a transition source in the hidden Markov model is assigned a type s, which is either the stationary part or the transient part; a storage unit is provided that stores a stationary-part reference frame number qs, a transient-part reference frame number qt, and reference probabilities b_ji; and the forward probabilities c_it for t = 1, 2, ..., T are obtained in sequence using the reference probabilities b_ji.

[0034] (1) When t = 1, the method performs process (2A), in which the stationary-part reference frame number qs and the transient-part reference frame number qt are each initialized to 1, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i and written as the initial value of the reference probability b_ji, and, after the reference probabilities b_ji have been written, each reference probability b_ji is read out to obtain the forward probability c_it; and process (2B), in which 1 is added to the current frame number t after process (2A) ends.

[0035] (2) When 2 ≤ t ≤ T, the method performs the following processes.
Process (2C): the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary-part reference frame number qs is compared with the threshold DTS, and if the comparison gives dts > DTS, the stationary-part reference frame number qs is rewritten to the current frame number t.
Process (2D): the distance dtt between the speech feature vector x_t of the current frame number t and the speech feature vector x_qt of the transient-part reference frame number qt is compared with a threshold DTT, and if the comparison gives dtt > DTT, the transient-part reference frame number qt is rewritten to the current frame number t.
Process (2E): after processes (2C) and (2D) end, for each j = 1, 2, ..., J, the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) is determined.
Process (2F): if the type determined in process (2E) is the stationary part and the comparison in process (2C) gave dts > DTS, then for that j the output probability b_ji(x_t) is obtained from the hidden Markov model for all i and the reference probability b_ji is rewritten to that output probability; if the type is the stationary part and the comparison in process (2C) gave dts ≤ DTS, the reference probability b_ji is not rewritten for that j; if the type is the transient part and the comparison in process (2D) gave dtt > DTT, then for that j the output probability b_ji(x_t) is obtained from the hidden Markov model for all i and the reference probability b_ji is rewritten to that output probability; and if the type is the transient part and the comparison in process (2D) gave dtt ≤ DTT, the reference probability b_ji is not rewritten for that j.
Process (2G): process (2F) is performed for each individual j = 1, 2, ..., J, and once process (2F) has been completed for all j, each reference probability b_ji is read out to obtain the forward probability c_it.
Process (2H): after process (2G) ends, 1 is added to the current frame number t.
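Processes (2C) to (2F) can be sketched in Python as bookkeeping over the source states j. The function below records, for each frame, the frame number at which row j of the reference table b_ji was last recomputed. It is an illustration under assumptions: the type labels "stationary" and "transient" are hypothetical strings, the distance is assumed Euclidean, and the name per_type_reference_frames does not come from the patent.

```python
import math

def per_type_reference_frames(xs, state_types, dts_threshold, dtt_threshold,
                              distance=math.dist):
    """Return, per frame, the frame number that produced each row of b_ref.

    xs            : feature vectors x_1..x_T (at least one frame)
    state_types   : state_types[j] is "stationary" or "transient"
                    (hypothetical labels for the type s of source state S_j)
    dts_threshold : threshold DTS for the stationary part
    dtt_threshold : threshold DTT for the transient part
    """
    qs = qt = 1                          # (2A) both reference frames
    row_frame = [1] * len(state_types)   # (2A) every row computed at t = 1
    history = [list(row_frame)]
    for t in range(2, len(xs) + 1):
        x = xs[t - 1]
        stationary_stale = distance(x, xs[qs - 1]) > dts_threshold  # (2C)
        if stationary_stale:
            qs = t
        transient_stale = distance(x, xs[qt - 1]) > dtt_threshold   # (2D)
        if transient_stale:
            qt = t
        for j, kind in enumerate(state_types):                      # (2E)
            stale = stationary_stale if kind == "stationary" else transient_stale
            if stale:                                               # (2F)
                row_frame[j] = t  # b_ji(x_t) would be recomputed for all i
        history.append(list(row_frame))
    return history
```

Keeping two reference frames lets each group of source states be refreshed on its own schedule: a row is recomputed only when the features have moved past the threshold that applies to its own type.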

[0036] Thus, in the invention of claim 3, for each individual j = 1, 2, ..., J, the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) is determined.

[0037] When the type s is the stationary part and the stationary-part distance dts exceeds the threshold DTS, then for each j of that type the output probability b_ji(x_t) of the current frame number t is obtained from the hidden Markov model, the reference probability b_ji is rewritten to that output probability, and the reference probability b_ji is then read out to obtain the forward probability c_it. When dts is equal to or less than DTS, the reference probability b_ji is not rewritten for each j of that type; that is, b_ji(x_t) is not computed from the hidden Markov model, and the stored reference probability b_ji is read out to obtain c_it. Consequently, for each j whose type s is determined to be the stationary part, the amount of computation can be reduced while the error of the forward probability c_it is kept small.
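
By way of illustration only, the rewrite-or-reuse rule of paragraph [0037] might be sketched in Python as follows. The names refresh_if_far, distance and output_prob are hypothetical and do not appear in the specification; the Euclidean metric is an assumption, since this passage does not fix the distance measure; and a single reference probability stands in for the full set b_ji over all i.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def refresh_if_far(x_t, x_ref, b_ref, output_prob, threshold):
    """Apply the threshold rule to one frame.

    x_t         : feature vector of the current frame
    x_ref       : feature vector of the reference frame (x_qs)
    b_ref       : stored reference probability
    output_prob : stand-in for the costly evaluation of b_ji(x_t)
    threshold   : DTS (or DTT for the transient part)

    Returns (x_ref, b_ref, recomputed).
    """
    if distance(x_t, x_ref) > threshold:
        # Far from the reference: recompute, then rewrite reference frame and probability.
        return x_t, output_prob(x_t), True
    # Close to the reference: reuse the stored probability, no evaluation.
    return x_ref, b_ref, False
```

The forward probability would then be computed from the returned b_ref in either branch, so the saving comes entirely from the frames on which output_prob is not called.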

[0038] Here, the error of the forward probability c_it means, for each j whose type s is determined to be the stationary part, the difference between the forward probability c_it obtained without computing the output probability b_ji(x_t) from the hidden Markov model when dts ≦ DTS and the forward probability c_it obtained without such simplification.

[0039] The reference probability b_ji is rewritten according to the comparison between the stationary-part distance dts and the threshold DTS for the following reason. If dts > DTS, the stationary-part reference frame number qs is rewritten. Because dts > DTS, the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qs of the stationary-part reference frame number qs as it stood before the rewrite; x_t has changed substantially from x_qs, so the output probability b_ji(x_t) of the current frame number t cannot be approximated by the reference probability b_ji. If dts ≦ DTS, the stationary-part reference frame number qs is not rewritten. Because dts ≦ DTS, x_t is approximately equal to the speech feature vector x_qs of the unrewritten reference frame number qs; x_t has changed little from x_qs, so the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji.

[0040] Similarly, when the type s is the transient part and the transient-part distance dtt exceeds the threshold DTT, then for each j of that type the output probability b_ji(x_t) of the current frame number t is obtained from the hidden Markov model, the reference probability b_ji is rewritten to that output probability, and the reference probability b_ji is then read out to obtain the forward probability c_it. When dtt is equal to or less than DTT, the reference probability b_ji is not rewritten for each j of that type; that is, b_ji(x_t) is not computed from the hidden Markov model, and the stored reference probability b_ji is read out to obtain c_it. Consequently, for each j whose type s is determined to be the transient part, the amount of computation can be reduced while the error of the forward probability c_it is kept small.

[0041] Here, the error of the forward probability c_it means, for each j whose type s is determined to be the transient part, the difference between the forward probability c_it obtained without computing the output probability b_ji(x_t) from the hidden Markov model when dtt ≦ DTT and the forward probability c_it obtained without such simplification.

[0042] The reference probability b_ji is rewritten according to the comparison between the transient-part distance dtt and the threshold DTT for the following reason. If dtt > DTT, the transient-part reference frame number qt is rewritten. Because dtt > DTT, the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qt of the transient-part reference frame number qt as it stood before the rewrite; x_t has changed substantially from x_qt, so the output probability b_ji(x_t) of the current frame number t cannot be approximated by the reference probability b_ji. If dtt ≦ DTT, the transient-part reference frame number qt is not rewritten. Because dtt ≦ DTT, x_t is approximately equal to the speech feature vector x_qt of the unrewritten reference frame number qt; x_t has changed little from x_qt, so the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji.

[0043] Further, in the invention of claim 3, the threshold DTS used when the type s is the stationary part and the threshold DTT used when the type s is the transient part are set individually for the following reasons.

[0044] In a transient part of the speech signal, the speech feature vector x_t detected in time sequence changes greatly, so it is desirable to keep the error of the forward probability c_it small by using a small threshold DTT when the type s is the transient part.

[0045] In a stationary part of the speech signal, by contrast, the speech feature vector x_t detected in time sequence changes little, so the error of the forward probability c_it can be kept small even if the threshold DTS used when the type s is the stationary part is made large. A larger threshold DTS contributes to reducing the amount of computation.

[0046] Therefore, by using a small value for the threshold DTT applied when the type s is the transient part and a large value for the threshold DTS applied when the type s is the stationary part, the amount of computation can be reduced more effectively while the error of the forward probability c_it is kept small.
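
The effect of choosing the two thresholds separately can be illustrated with a small Python sketch. Everything here is assumed for illustration: the function name count_recomputations, the one-dimensional feature values, and the numerical values given to DTS and DTT are not taken from the specification.

```python
def count_recomputations(frames, threshold):
    """Count the frames at which the output probability must be recomputed.

    frames    : sequence of scalar feature values (a 1-D stand-in for x_t)
    threshold : DTS or DTT
    """
    ref = frames[0]
    count = 1                      # the first frame is always evaluated
    for x in frames[1:]:
        if abs(x - ref) > threshold:
            ref = x                # rewrite the reference frame
            count += 1
    return count


frames = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 2.0]
DTS, DTT = 0.5, 0.15               # illustrative values only

print(count_recomputations(frames, DTS))   # 3
print(count_recomputations(frames, DTT))   # 5
```

On the same eight frames the larger threshold triggers three evaluations and the smaller one five, which mirrors the trade-off described above: a large DTS saves computation where the signal is steady, while a small DTT tracks the faster changes of a transient part more closely.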

[0047] In the speech recognition method of the invention of claim 4, process (2E) is performed after processes (2C) and (2D) have been completed in the speech recognition method of the invention of claim 3.

[0048] Thus, in the invention of claim 4, process (2C), which rewrites the stationary-part reference frame number qs according to the comparison of dts with DTS, and process (2D), which rewrites the transient-part reference frame number qt according to the comparison of dtt with DTT, are performed first, and only then is process (2E) performed to determine the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) of the current frame number t. The rewriting process (2C) for qs and the rewriting process (2D) for qt are thus completed before the type determination (2E) and are not repeated for each j = 1, 2, ..., J, so the processing amount can be reduced. If the rewriting processes (2C) and (2D) were performed after the type determination (2E), they would have to be performed for each j, and the processing amount would increase.
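
The saving from this ordering can be made concrete with a rough count, sketched here in Python. The function name count_comparisons is hypothetical, and the figure of one comparison per j for the unhoisted order is an assumed lower bound rather than a number stated in the specification.

```python
def count_comparisons(num_frames, num_states, hoisted):
    """Count distance-versus-threshold comparisons over frames t = 2..num_frames.

    hoisted=True : (2C) and (2D) run once per frame, before the loop over j.
    hoisted=False: the comparison for the type of state j runs inside the loop,
                   once per j (an assumed lower bound for the unhoisted order).
    """
    comparisons = 0
    for _t in range(2, num_frames + 1):
        if hoisted:
            comparisons += 2              # one for dts, one for dtt
        else:
            for _j in range(num_states):
                comparisons += 1          # repeated for every transition source
    return comparisons


print(count_comparisons(100, 30, hoisted=True))    # 198
print(count_comparisons(100, 30, hoisted=False))   # 2970
```

With 100 frames and 30 transition sources, the hoisted order needs 198 comparisons against 2970 for the per-j order, so the cost of (2C) and (2D) becomes independent of J.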

[0049] Although the processing amount increases, processes (2C) and (2D) may instead be performed after process (2E) in the invention of claim 3.

[0050] In the speech recognition method of the invention of claim 5, in the speech recognition method according to claim 3, each state S_j that serves as a transition source in the hidden Markov model is assigned a type s, either the stationary part or the transient part; a storage unit is provided that stores the stationary-part reference frame number qs, the transient-part reference frame number qt, and the reference probabilities b_ji; and the forward probability c_it is obtained sequentially for t = 1, 2, ..., T using the reference probabilities b_ji.

[0051] (1) When t = 1, process (2A) initializes the stationary-part skip count skips and the transient-part skip count skipt to 0 and the stationary-part reference frame number qs and the transient-part reference frame number qt to 1, obtains the output probability b_ji(x_t) from the hidden Markov model for all j and i, writes it as the initial value of the reference probability b_ji, and, after the writing is complete, reads out each reference probability b_ji to obtain the forward probability c_it. After process (2A), process (2B) adds 1 to the current frame number t.

[0052] (2) When 2 ≦ t ≦ T, the following processes are performed. Process (2C) compares the stationary-part skip count skips with the threshold NSKIPS and compares the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary-part reference frame number qs with the threshold DTS; if skips > NSKIPS or dts > DTS, it initializes skips to 0 and rewrites qs to the current frame number t, and if skips ≦ NSKIPS and dts ≦ DTS, it adds 1 to skips. Process (2D) compares the transient-part skip count skipt with the threshold NSKIPT and compares the distance dtt between x_t and the speech feature vector x_qt of the transient-part reference frame number qt with the threshold DTT; if skipt > NSKIPT or dtt > DTT, it initializes skipt to 0 and rewrites qt to the current frame number t, and if skipt ≦ NSKIPT and dtt ≦ DTT, it adds 1 to skipt. After processes (2C) and (2D), process (2E) determines, for each j = 1, 2, ..., J, the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t). Process (2F) then acts on the result for that j: when process (2E) finds the stationary part and process (2C) gives skips > NSKIPS or dts > DTS, the output probability b_ji(x_t) is obtained from the hidden Markov model for all i and the reference probability b_ji is rewritten to b_ji(x_t); when process (2E) finds the stationary part and process (2C) gives skips ≦ NSKIPS and dts ≦ DTS, b_ji is not rewritten; when process (2E) finds the transient part and process (2D) gives skipt > NSKIPT or dtt > DTT, b_ji(x_t) is obtained from the hidden Markov model for all i and b_ji is rewritten to b_ji(x_t); and when process (2E) finds the transient part and process (2D) gives skipt ≦ NSKIPT and dtt ≦ DTT, b_ji is not rewritten. Process (2F) is performed for each j = 1, 2, ..., J; once it has been completed for all j, process (2G) reads out each reference probability b_ji to obtain the forward probability c_it, and after process (2G), process (2H) adds 1 to the current frame number t.
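
The flow of processes (2A) and (2C) to (2F) might be sketched in Python as follows; this is an illustrative outline under stated assumptions, not the implementation of the invention. The names PartState, run, dist and output_prob are hypothetical, the Euclidean metric is assumed, the reference probability is stored per j only (the index i is collapsed for brevity), and the forward-probability step (2G) is left as a comment.

```python
import math
from dataclasses import dataclass


def dist(a, b):
    """Euclidean distance between feature vectors (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


@dataclass
class PartState:
    """Bookkeeping for one type (stationary or transient part)."""
    limit_dist: float   # DTS or DTT
    limit_skip: int     # NSKIPS or NSKIPT
    q: int = 1          # reference frame number, set to 1 in (2A)
    skip: int = 0       # skip count, set to 0 in (2A)

    def update(self, t, frames):
        """Process (2C) or (2D); returns True when the reference must be rewritten."""
        stale = (self.skip > self.limit_skip
                 or dist(frames[t], frames[self.q]) > self.limit_dist)
        if stale:
            self.skip, self.q = 0, t
        else:
            self.skip += 1
        return stale


def run(frames, types, output_prob, parts):
    """frames: dict t -> vector for t = 1..T; types: dict j -> key of parts."""
    ref = {j: output_prob(j, frames[1]) for j in types}       # process (2A)
    evaluations = len(types)
    for t in range(2, len(frames) + 1):
        # (2C) and (2D) are run once per frame, before the loop over j.
        stale = {kind: part.update(t, frames) for kind, part in parts.items()}
        for j, kind in types.items():                         # (2E) and (2F)
            if stale[kind]:
                ref[j] = output_prob(j, frames[t])
                evaluations += 1
        # (2G) would read ref here to obtain the forward probabilities.
    return ref, evaluations
```

Keeping the skip count and reference frame number in one small object per type makes it plain that (2C) and (2D) are the same rule applied with different thresholds.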

[0053] Thus, in the invention of claim 5, the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) is determined for each j = 1, 2, ..., J.

[0054] When the type s is determined to be the stationary part and either the stationary-part skip count skips exceeds the threshold NSKIPS or the stationary-part distance dts exceeds the threshold DTS, then for each j of that type the output probability b_ji(x_t) of the current frame number t is obtained from the hidden Markov model, the reference probability b_ji is rewritten to that output probability, and the reference probability b_ji is then read out to obtain the forward probability c_it. When the type s is determined to be the stationary part, skips is equal to or less than NSKIPS, and dts is equal to or less than DTS, the reference probability b_ji is not rewritten for each j of that type; that is, b_ji(x_t) is not computed from the hidden Markov model, and the stored reference probability b_ji is read out to obtain c_it. Consequently, for each j whose type s is determined to be the stationary part, the amount of computation can be reduced while the error of the forward probability c_it is kept small.

[0055] Here, the error of the forward probability c_it means, for each j whose type s is determined to be the stationary part, the difference between the forward probability c_it obtained without computing the output probability b_ji(x_t) from the hidden Markov model when skips ≦ NSKIPS and dts ≦ DTS and the forward probability c_it obtained without such simplification.

[0056] The reference probability b_ji is rewritten according to both the comparison between the stationary-part distance dts and the threshold DTS and the comparison between the skip count skips and the threshold NSKIPS for the following reasons.

[0057] When dts > DTS, the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qs of the reference frame number qs; x_t has changed substantially from x_qs, so the output probability b_ji(x_t) of the current frame number t cannot be approximated by the reference probability b_ji. The reference probability b_ji is therefore rewritten.

[0058] When skips > NSKIPS, the number of times skips that the distance dts has been equal to or less than the threshold DTS exceeds the threshold NSKIPS, so the time gap between the current frame number t and the reference frame number qs has become large and the error is likely to grow. The reference probability b_ji is therefore rewritten in order to reduce the error.

[0059] When skips ≦ NSKIPS and dts ≦ DTS, the condition dts ≦ DTS means that the speech feature vector x_t of the current frame number t is close to the speech feature vector x_qs of the reference frame number qs; x_t has changed little from x_qs, so the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji. In addition, the condition skips ≦ NSKIPS means that the number of times skips that the distance dts has been equal to or less than the threshold DTS does not exceed the threshold NSKIPS, so the time gap between the current frame number t and the reference frame number qs is small and the error is unlikely to grow. The reference probability b_ji is therefore not rewritten, in order to reduce the amount of computation.
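
The bound that the skip count places on the time gap can be checked with a short Python sketch. The function name refresh_frames is hypothetical, and the sketch assumes the extreme case in which dts ≦ DTS holds at every frame, so that only the condition skips > NSKIPS can trigger a rewrite.

```python
def refresh_frames(num_frames, nskips):
    """Frames t >= 2 at which the reference is rewritten when dts <= DTS always holds."""
    skips = 0                      # initialised to 0 at t = 1
    rewritten = []
    for t in range(2, num_frames + 1):
        if skips > nskips:         # skips > NSKIPS: force a rewrite
            skips = 0
            rewritten.append(t)
        else:                      # skips <= NSKIPS: reuse and count up
            skips += 1
    return rewritten


print(refresh_frames(12, 2))       # [5, 9]
```

With NSKIPS = 2 the reference set at frame 1 is reused at frames 2, 3 and 4 and rewritten at frame 5, then again at frame 9. Under this reading of the rule, a reference is therefore never more than NSKIPS + 1 frames old, however steady the signal.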

[0060] Similarly, when the type s is determined to be the transient part and either the transient-part skip count skipt exceeds the threshold NSKIPT or the transient-part distance dtt exceeds the threshold DTT, then for each j of that type the output probability b_ji(x_t) of the current frame number t is obtained from the hidden Markov model, the reference probability b_ji is rewritten to that output probability, and the reference probability b_ji is then read out to obtain the forward probability c_it. When the type s is determined to be the transient part, skipt is equal to or less than NSKIPT, and dtt is equal to or less than DTT, the reference probability b_ji is not rewritten for each j of that type; that is, b_ji(x_t) is not computed from the hidden Markov model, and the stored reference probability b_ji is read out to obtain c_it. Consequently, for each j whose type s is determined to be the transient part, the amount of computation can be reduced while the error of the forward probability c_it is kept small.

[0061] Here, the error of the forward probability c_it means, for each j whose type s is determined to be the transient part, the difference between the forward probability c_it obtained without computing the output probability b_ji(x_t) from the hidden Markov model when skipt ≦ NSKIPT and dtt ≦ DTT and the forward probability c_it obtained without such simplification.

[0062] The reference probability b_ji is rewritten according to both the comparison between the transient-part distance dtt and the threshold DTT and the comparison between the skip count skipt and the threshold NSKIPT for the following reasons.

[0063] When dtt > DTT, the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qt of the reference frame number qt; x_t has changed substantially from x_qt, so the output probability b_ji(x_t) of the current frame number t cannot be approximated by the reference probability b_ji. The reference probability b_ji is therefore rewritten.

[0064] When skipt > NSKIPT, the number of times skipt that the distance dtt has been equal to or less than the threshold DTT exceeds the threshold NSKIPT, so the time gap between the current frame number t and the reference frame number qt has become large and the error is likely to grow. The reference probability b_ji is therefore rewritten in order to reduce the error.

[0065] When skipt ≦ NSKIPT and dtt ≦ DTT, the condition dtt ≦ DTT means that the speech feature vector x_t of the current frame number t is close to the speech feature vector x_qt of the reference frame number qt; x_t has changed little from x_qt, so the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji. In addition, the condition skipt ≦ NSKIPT means that the number of times skipt that the distance dtt has been equal to or less than the threshold DTT does not exceed the threshold NSKIPT, so the time gap between the current frame number t and the reference frame number qt is small and the error is unlikely to grow. The reference probability b_ji is therefore not rewritten, in order to reduce the amount of computation.

[0066] Further, in the invention of claim 5, the thresholds DTS and NSKIPS used when the type s is the stationary part and the thresholds DTT and NSKIPT used when the type s is the transient part are set individually for the following reasons.

[0067] In a transient part of the speech signal, the speech feature vector x_t detected in time sequence changes greatly, so it is desirable to keep the error of the forward probability c_it small by using small thresholds DTT and NSKIPT when the type s is the transient part.

[0068] In a stationary part of the speech signal, by contrast, the speech feature vector x_t detected in time sequence changes little, so the error of the forward probability c_it can be kept small even if the thresholds DTS and NSKIPS used when the type s is the stationary part are made large. Larger thresholds DTS and NSKIPS contribute to reducing the amount of computation.

[0069] Therefore, by using small values for the thresholds DTT and NSKIPT applied when the type s is the transient part and large values for the thresholds DTS and NSKIPS applied when the type s is the stationary part, the amount of computation can be reduced more effectively while the error of the forward probability c_it is kept small.

[0070] In the speech recognition method of the invention of claim 6, process (2E) is performed after processes (2C) and (2D) have been completed in the speech recognition method of the invention of claim 5.

【００７１】Thus, in the invention of claim 6, two bookkeeping processes are performed first. Process (2C) initializes or counts up the stationary-part skip count skips and rewrites the stationary-part reference frame number qs, according to the result of comparing skips with NSKIPS and dts with DTS. Process (2D) initializes or counts up the transient-part skip count skipt and rewrites the transient-part reference frame number qt, according to the result of comparing skipt with NSKIPT and dtt with DTT. Only afterwards is process (2E) performed, which determines the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) of the current frame number t. Because processes (2C) and (2D), which handle the skip counts and reference frame numbers, finish before the type determination (2E), they are not repeated for each individual j = 1, 2, ..., J, and the amount of processing is reduced. If (2C) and (2D) were instead performed after the type determination (2E), they would have to be performed for every individual j, and the amount of processing would increase.
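The ordering argument above can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `RefTracker` class, the `per_frame_bookkeeping` function, and the exact update rule (reset the counter and move the reference frame when the skip count exceeds its limit or the distance exceeds its threshold, otherwise count up) are assumptions made for illustration, since the full definition of processes (2C) and (2D) belongs to claim 5 and is not reproduced in this passage.

```python
class RefTracker:
    """Skip count and reference frame number for one type (stationary or transient)."""

    def __init__(self, nskip, dthr):
        self.nskip = nskip      # NSKIPS or NSKIPT
        self.dthr = dthr        # DTS or DTT
        self.skip = 0           # skips or skipt
        self.q = 0              # qs or qt
        self.calls = 0          # how often the bookkeeping ran (for illustration)

    def update(self, t, dist):
        """Assumed form of (2C)/(2D); returns True if b_ji must be recomputed."""
        self.calls += 1
        if self.skip > self.nskip or dist > self.dthr:
            self.skip, self.q = 0, t
            return True
        self.skip += 1
        return False


def per_frame_bookkeeping(t, dist_s, dist_t, types, stationary, transient):
    # Processes (2C) and (2D): run once per frame, independent of J.
    refresh = {
        "stationary": stationary.update(t, dist_s),
        "transient": transient.update(t, dist_t),
    }
    # Process (2E): per transition j, only a type lookup remains.
    return [refresh[types[j]] for j in range(len(types))]


types = ["stationary", "transient", "stationary"]   # type s of each source state S_j
stationary = RefTracker(nskip=3, dthr=1.0)
transient = RefTracker(nskip=1, dthr=0.2)
decisions = per_frame_bookkeeping(1, 0.5, 0.5, types, stationary, transient)
print(decisions, stationary.calls, transient.calls)
```

With this ordering each tracker is updated once per frame regardless of how many transitions there are; placing the update inside the loop over j would multiply that work by J.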

【００７２】Although the amount of processing increases, in the invention of claim 5 processes (2C) and (2D) may also be performed after process (2E) has finished.

【００７３】<Inventions of claims 7 and 8> In the speech recognition method of claim 7, a storage unit is provided that stores the forward probability reference frame number qc, the output probability reference frame number qs, and the reference probabilities b_ji. Using the reference probabilities b_ji, the forward probabilities c_it are obtained in sequence for t = 1, 2, ..., T.

【００７４】(1) When t = 1, process (3A) initializes the forward probability reference frame number qc and the output probability reference frame number qs to 1, obtains the output probability b_ji(x_t) from the hidden Markov model for all j and i, writes each output probability b_ji(x_t) as the initial value of the reference probability b_ji, and, once the reference probabilities have been written, reads each reference probability b_ji to obtain the forward probability c_it. After process (3A) finishes, process (3B) adds 1 to the current frame number t.

【００７５】(2) When 2 ≤ t ≤ T, the following processes are performed.
(3C) The distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc is compared with the threshold DTC.
(3D) If the comparison in (3C) gives dtc ≤ DTC, the forward probability c_it is taken to be equal to the forward probability c_i(t-1) of the immediately preceding frame, and the calculation of c_it is terminated.
(3E) If the comparison in (3C) gives dtc > DTC, the forward probability reference frame number qc is rewritten to the current frame number t.
(3F) After (3E) finishes, the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs is compared with the threshold DTS. If dts > DTS, qs is rewritten to the current frame number t, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i, each reference probability b_ji is rewritten to that output probability, and after the rewriting each b_ji is read to obtain the forward probability c_it. If dts ≤ DTS, the reference probabilities b_ji are not rewritten, and each b_ji is read as it is to obtain c_it.
(3G) After (3D) or (3F) finishes, 1 is added to the current frame number t.

【００７６】Thus, in the invention of claim 7, the initial value of the reference probability b_ji is the output probability b_ji(x_1) obtained from the hidden Markov model at the start frame. The initial values of the forward probability reference frame number qc and of the output probability reference frame number qs are both set to 1, the frame number of the start frame.

【００７７】The distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc is then compared with the threshold DTC. If dtc ≤ DTC, the forward probability c_it of the current frame number t is taken to be equal to the forward probability c_i(t-1) of the immediately preceding frame, and the calculation of c_it is terminated.

【００７８】If dtc > DTC, the forward probability reference frame number qc is rewritten to the current frame number t, after which the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs is compared with the threshold DTS. If dts > DTS, the reference frame number qs and the reference probabilities b_ji are rewritten, and the rewritten b_ji are read to obtain the forward probability c_it. If dts ≤ DTS, neither the reference frame number qs nor the reference probabilities b_ji are rewritten, and the unchanged b_ji are read to obtain c_it.
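The control flow of processes (3A) to (3G) can be sketched in Python as follows. This is only an illustration under stated assumptions: the Euclidean distance, the 0-based frame indices, and the two callables `output_prob` (which stands in for evaluating all b_ji(x_t) from the hidden Markov model) and `forward_step` (which stands in for the forward recursion from c_i(t-1) and the reference probabilities) are hypothetical names that do not appear in the source.

```python
import math


def euclidean(u, v):
    """Distance between two feature vectors (the metric is an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def forward_claim7(x, output_prob, forward_step, c0, DTC, DTS):
    """Return (forward probabilities per frame, number of HMM evaluations).

    x            : list of feature vectors; x[0] is frame t = 1
    output_prob  : hypothetical callable, feature vector -> table of b_ji
    forward_step : hypothetical callable, (previous c, table of b_ji) -> new c
    c0           : forward probabilities before the first frame
    """
    qc = qs = 0                          # reference frames (0-based here)
    b_ref = output_prob(x[0])            # (3A) initial reference probabilities
    evals = 1
    c = [forward_step(c0, b_ref)]
    for t in range(1, len(x)):           # frames 2..T; the loop itself is (3B)/(3G)
        if euclidean(x[t], x[qc]) <= DTC:
            c.append(c[-1])              # (3D) reuse c_i(t-1)
            continue
        qc = t                           # (3E)
        if euclidean(x[t], x[qs]) > DTS:
            qs = t                       # (3F) recompute branch
            b_ref = output_prob(x[t])
            evals += 1
        c.append(forward_step(c[-1], b_ref))
    return c, evals


# Toy usage: a constant "output probability" and a scalar recursion.
frames = [[0.0], [0.05], [0.5], [0.55], [2.0]]
c, evals = forward_claim7(frames, lambda v: 0.5, lambda cp, b: cp * b,
                          c0=1.0, DTC=0.1, DTS=1.0)
print(c, evals)
```

In the toy run, frames 2 and 4 reuse the previous forward probability, frame 3 recomputes it from the stored reference probabilities, and only frames 1 and 5 evaluate the model.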

【００７９】When dtc ≤ DTC, the distance dtc is at or below the threshold DTC, so the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qc of the forward probability reference frame number qc. Because x_t has changed little from x_qc, the forward probability c_it of the current frame number t can be approximated by the forward probability c_i(t-1) of the immediately preceding frame. The forward probability c_it is therefore taken to be equal to c_i(t-1), and the calculation of c_it is terminated.

【００８０】When dtc > DTC, the distance dtc exceeds the threshold DTC, so the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qc of the forward probability reference frame number qc. Because x_t has changed substantially from x_qc, the forward probability c_it of the current frame number t cannot be approximated by the forward probability c_i(t-1) of the immediately preceding frame. The reference probabilities b_ji are therefore read and the forward probability c_it of the current frame number t is calculated, and so the forward probability reference frame number qc is rewritten to the current frame number t.

【００８１】The reference probability b_ji stored in the storage unit is the output probability b_ji(x_t) obtained from the hidden Markov model at the frame whose number is the output probability reference frame number qs, that is, b_ji(x_qs).

【００８２】When dts > DTS, the distance dts exceeds the threshold DTS, so the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qs of the output probability reference frame number qs before rewriting. Because x_t has changed substantially from x_qs, the output probability b_ji(x_t) of the current frame number t cannot be approximated by the output probability b_ji(x_qs) of the pre-rewrite output probability reference frame number qs, that is, by the reference probability b_ji. The output probability b_ji(x_t) of the current frame number t is therefore obtained from the hidden Markov model, the reference probability b_ji is rewritten to that output probability, and b_ji is then read to obtain the forward probability c_it. Because the reference probability b_ji is rewritten to the output probability b_ji(x_t) of the current frame number t, the output probability reference frame number qs is rewritten to the current frame number t.

【００８３】When dts ≤ DTS, the distance dts is at or below the threshold DTS, so the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the output probability reference frame number qs, which is left unchanged. Because x_t has changed little from x_qs, the output probability b_ji(x_t) of the current frame number t can be approximated by the output probability b_ji(x_qs) of the output probability reference frame number qs, that is, by the reference probability b_ji. The reference probability b_ji is therefore read without being rewritten to obtain the forward probability c_it. Because b_ji is not rewritten, the output probability reference frame number qs is not rewritten either.

【００８４】Thus, when dtc ≤ DTC, the forward probability c_it of the current frame number t is taken to be equal to the forward probability c_i(t-1) of the immediately preceding frame, the calculation of c_it is terminated, and the output probability b_ji(x_t) is not calculated. When dtc > DTC and dts > DTS, the reference probability b_ji is rewritten, which means the output probability b_ji(x_t) of the current frame number t is calculated from the hidden Markov model, and b_ji is then read to obtain c_it. When dtc > DTC and dts ≤ DTS, the reference probability b_ji is not rewritten, which means b_ji(x_t) is not calculated from the hidden Markov model, and b_ji is read as it is to obtain c_it. The amount of calculation can therefore be reduced while the error of the forward probability c_it is kept small.

【００８５】The error of the forward probability c_it here is the difference between the forward probability c_it obtained without calculating the output probability b_ji(x_t) from the hidden Markov model when dtc ≤ DTC or dts ≤ DTS, and the forward probability c_it obtained without any such simplification of the calculation.
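As a compact restatement of this definition, the error can be written as below. The symbols are introduced here only for illustration and do not appear in the source: \tilde{c}_{it} denotes the forward probability obtained with the simplified calculation, c_{it} the one obtained with the full calculation, and e_{it} the error.

```latex
e_{it} = \tilde{c}_{it} - c_{it}, \qquad i = 1, 2, \ldots, \quad t = 1, 2, \ldots, T
```

When neither threshold test ever succeeds, no frame is simplified, so \tilde{c}_{it} = c_{it} and e_{it} = 0 for every frame.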

【００８６】As the thresholds DTC and DTS are made larger, the amount of calculation saved increases, but the error of the forward probability c_it also increases. The values of DTC and DTS must therefore be chosen so that the forward probability c_it is obtained within the error range acceptable in practice.
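The calculation-saving side of this trade-off can be seen in a small Python sketch. It is an illustration only: the function name `count_refreshes`, the scalar "features", and the absolute difference used as the distance are assumptions, and the sketch counts recomputations without measuring the resulting error in c_it.

```python
def count_refreshes(features, DTS):
    """Count the frames at which b_ji would be recomputed from the model."""
    qs = 0                      # output probability reference frame (0-based)
    refreshes = 1               # the start frame is always computed
    for t in range(1, len(features)):
        if abs(features[t] - features[qs]) > DTS:
            qs = t              # reference frame moves to the current frame
            refreshes += 1
    return refreshes


ramp = list(range(10))          # synthetic scalar features 0, 1, ..., 9
for DTS in (0, 2, 20):
    print(DTS, count_refreshes(ramp, DTS))
```

On this steadily changing input, a larger threshold leaves the reference frame in place for longer, so fewer frames trigger a recomputation; the accuracy cost of doing so has to be checked separately on real data.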

【００８７】In the speech recognition method of claim 8, the processing of the speech recognition method of claim 7 is performed as follows.

【００８８】(1) When t = 1, process (3A) initializes the forward probability reference frame number qc and the output probability reference frame number qs to 1 and the forward probability skip count skipc and the output probability skip count skips to 0, obtains the output probability b_ji(x_t) from the hidden Markov model for all j and i, writes each output probability b_ji(x_t) as the initial value of the reference probability b_ji, and, once the reference probabilities have been written, reads each reference probability b_ji to obtain the forward probability c_it. After process (3A) finishes, process (3B) adds 1 to the current frame number t.

【００８９】(2) When 2 ≤ t ≤ T, the following processes are performed.
(3C) The forward probability skip count skipc is compared with the threshold NSKIPC, and the distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc is compared with the threshold DTC.
(3D) If the comparison in (3C) gives skipc ≤ NSKIPC and dtc ≤ DTC, the forward probability c_it is taken to be equal to the forward probability c_i(t-1) of the immediately preceding frame, the calculation of c_it is terminated, and 1 is added to each of the forward probability skip count skipc and the output probability skip count skips.
(3E) If the comparison in (3C) gives skipc > NSKIPC or dtc > DTC, the forward probability skip count skipc is initialized to 0 and the forward probability reference frame number qc is rewritten to the current frame number t.
(3F) After (3E) finishes, the output probability skip count skips is compared with the threshold NSKIPS, and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs is compared with the threshold DTS. If skips > NSKIPS or dts > DTS, skips is initialized to 0, qs is rewritten to the current frame number t, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i, each reference probability b_ji is rewritten to that output probability, and after the rewriting each b_ji is read to obtain the forward probability c_it. If skips ≤ NSKIPS and dts ≤ DTS, 1 is added to skips, and each reference probability b_ji is read without being rewritten to obtain c_it.
(3G) After (3D) or (3F) finishes, 1 is added to the current frame number t.

【００９０】Thus, in the invention of claim 8, the initial value of the reference probability b_ji is the output probability b_ji(x_1) obtained from the hidden Markov model at the start frame. The initial values of the forward probability reference frame number qc and of the output probability reference frame number qs are both set to 1, the frame number of the start frame. The initial values of the forward probability skip count skipc and of the output probability skip count skips are both set to 0.

【００９１】The forward probability skip count skipc is then compared with the threshold NSKIPC, and the distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc is compared with the threshold DTC. If skipc ≤ NSKIPC and dtc ≤ DTC, the forward probability c_it of the current frame number t is taken to be equal to the forward probability c_i(t-1) of the immediately preceding frame, the calculation of c_it is terminated, and both the forward probability skip count skipc and the output probability skip count skips are counted up.

【００９２】If skipc > NSKIPC or dtc > DTC, the forward probability reference frame number qc is rewritten to the current frame number t and the forward probability skip count skipc is initialized. After that, the output probability skip count skips is compared with the threshold NSKIPS, and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs is compared with the threshold DTS. If skips > NSKIPS or dts > DTS, skips is initialized, qs is rewritten, the reference probabilities b_ji are rewritten, and the rewritten b_ji are read to obtain the forward probability c_it. If skips ≤ NSKIPS and dts ≤ DTS, skips is counted up, neither qs nor the reference probabilities b_ji are rewritten, and the unchanged b_ji are read to obtain c_it.
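The claim 8 flow adds the two skip counters to the distance tests. A minimal Python sketch follows; as before, the Euclidean distance, the 0-based frame indices, and the callables `output_prob` and `forward_step` are hypothetical stand-ins for illustration and are not defined in the source.

```python
import math


def euclidean(u, v):
    """Distance between two feature vectors (the metric is an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def forward_claim8(x, output_prob, forward_step, c0, DTC, DTS, NSKIPC, NSKIPS):
    """Return (forward probabilities per frame, number of HMM evaluations)."""
    qc = qs = 0                          # reference frames (0-based here)
    skipc = skips = 0                    # skip counts, initialized in (3A)
    b_ref = output_prob(x[0])            # (3A) initial reference probabilities
    evals = 1
    c = [forward_step(c0, b_ref)]
    for t in range(1, len(x)):
        dtc = euclidean(x[t], x[qc])     # (3C)
        if skipc <= NSKIPC and dtc <= DTC:
            c.append(c[-1])              # (3D) reuse c_i(t-1)
            skipc += 1
            skips += 1
            continue
        skipc, qc = 0, t                 # (3E)
        dts = euclidean(x[t], x[qs])     # (3F)
        if skips > NSKIPS or dts > DTS:
            skips, qs = 0, t
            b_ref = output_prob(x[t])    # recompute from the model
            evals += 1
        else:
            skips += 1                   # keep the stored reference probabilities
        c.append(forward_step(c[-1], b_ref))
    return c, evals


# Toy usage on a constant input: only the skip limits force recomputation.
frames = [[0.0]] * 8
c, evals = forward_claim8(frames, lambda v: 0.5, lambda cp, b: cp * b,
                          c0=1.0, DTC=0.1, DTS=0.1, NSKIPC=2, NSKIPS=100)
print(c, evals)
```

Even though the input never changes, the forward probability is recomputed at the fifth frame because skipc has exceeded NSKIPC; lowering NSKIPS in the same way forces the reference probabilities themselves to be refreshed.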

【００９３】When skipc ≤ NSKIPC and dtc ≤ DTC, the condition dtc ≤ DTC means that the speech feature vector x_t of the current frame number t is close to the speech feature vector x_qc of the forward probability reference frame number qc. Because x_t has changed little from x_qc, the forward probability c_it of the current frame number t can be approximated by the forward probability c_i(t-1) of the immediately preceding frame. Moreover, skipc ≤ NSKIPC means that skipc, the number of times the forward probability c_i(t-1) of the immediately preceding frame has been carried over without rewriting, is at or below the threshold NSKIPC, so the separation in time between the current frame number t and the forward probability reference frame number qc is small. The error is therefore unlikely to grow, and to reduce the amount of calculation the forward probability c_it of the current frame number t is taken to be equal to c_i(t-1) and the calculation of c_it is terminated. Since neither the calculation that reads the reference probabilities b_ji to obtain c_it nor the rewriting of the reference probabilities b_ji is performed, neither the forward probability reference frame number qc nor the output probability reference frame number qs is rewritten. The forward probability skip count skipc represents the number of times that, while skipc ≤ NSKIPC, the calculation of c_it was terminated by approximating c_it with c_i(t-1); 1 is therefore added to skipc to count it up. Likewise, the output probability skip count skips represents the number of times that, while skips ≤ NSKIPS, the reference probabilities b_ji were not rewritten; 1 is therefore added to skips to count it up.

【００９４】When dtc > DTC, the distance dtc exceeds the threshold DTC, so the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qc of the forward probability reference frame number qc. Because x_t has changed substantially from x_qc, the forward probability c_it of the current frame number t cannot be approximated by the forward probability c_i(t-1) of the immediately preceding frame. The reference probabilities b_ji are therefore read and the forward probability c_it of the current frame number t is calculated, and so the forward probability reference frame number qc is rewritten to the current frame number t. Because the forward probability skip count skipc represents the number of times that, while skipc ≤ NSKIPC, the calculation of c_it was terminated by approximating c_it with c_i(t-1), skipc is initialized to 0.

【００９５】When skipc > NSKIPC, the count skipc of frames for which the calculation of c_it was terminated by approximating c_it with the forward probability c_i(t-1) of the immediately preceding frame exceeds the threshold NSKIPC, so the separation in time between the current frame number t and the forward probability reference frame number qc is large. The error is therefore likely to grow, and to reduce it the reference probabilities b_ji are read and the forward probability c_it is calculated. The forward probability reference frame number qc is accordingly rewritten to the current frame number t. Because the forward probability skip count skipc represents the number of times that, while skipc ≤ NSKIPC, the calculation of c_it was terminated by approximating c_it with c_i(t-1), skipc is initialized.
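One consequence of the counter logic can be checked with a short Python sketch: if the distance test dtc ≤ DTC held at every frame, the counter alone would still bound how many consecutive frames reuse c_i(t-1). The function name `reuse_pattern` and the assumption of a permanently satisfied distance test are illustrative only, and the bound of NSKIPC + 1 follows from a literal reading of the comparison skipc ≤ NSKIPC being made before the count-up.

```python
def reuse_pattern(num_frames, NSKIPC):
    """Return a list of flags, True where c_it is actually computed."""
    skipc = 0
    computed = [True]                 # start frame: process (3A)
    for _ in range(1, num_frames):
        if skipc <= NSKIPC:           # dtc <= DTC is assumed to hold at every frame
            computed.append(False)    # process (3D): reuse c_i(t-1)
            skipc += 1
        else:
            computed.append(True)     # process (3E) onward: compute c_it
            skipc = 0
    return computed


pattern = reuse_pattern(9, NSKIPC=2)
print(pattern)
```

With NSKIPC = 2 the pattern alternates between one computed frame and three reused frames, so the separation between the current frame and the reference frame qc never grows without limit.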

【００９６】The reference probability b_ji stored in the storage unit is the output probability b_ji(x_t) obtained from the hidden Markov model at the frame whose number is the output probability reference frame number qs, that is, b_ji(x_qs).

【００９７】When dts > DTS, the distance dts exceeds the threshold DTS, so the speech feature vector x_t of the current frame number t is not close to the speech feature vector x_qs of the output probability reference frame number qs before rewriting. Because x_t has changed substantially from x_qs, the output probability b_ji(x_t) of the current frame number t cannot be approximated by the output probability b_ji(x_qs) of the output probability reference frame number qs, that is, by the reference probability b_ji. The reference probability b_ji is therefore rewritten to the output probability b_ji(x_t) of the current frame number t, and the rewritten b_ji is read to obtain the forward probability c_it. Because b_ji is rewritten to the output probability of the current frame number t, the output probability reference frame number qs is rewritten to the current frame number t. Because the output probability skip count skips represents the number of times that, while skips ≤ NSKIPS, the reference probabilities b_ji were not rewritten, skips is initialized.

【００９８】When skips > NSKIPS, the count skips of frames for which the reference probabilities b_ji were not rewritten exceeds the threshold NSKIPS, so the separation in time between the current frame number t and the output probability reference frame number qs is large. The error is therefore likely to grow, and to reduce it the reference probabilities b_ji are rewritten. Because b_ji is rewritten to the output probability b_ji(x_t) of the current frame number t, the output probability reference frame number qs is rewritten to the current frame number t. Because the output probability skip count skips represents the number of times that, while skips ≤ NSKIPS, the reference probabilities b_ji were not rewritten, skips is initialized.

【００９９】When skips ≤ NSKIPS and dts ≤ DTS, the condition dts ≤ DTS means that the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the output probability reference frame number qs, so the output probability b_ji(x_t) of the current frame number t is approximately equal to the output probability b_ji(x_qs) of the output probability reference frame number qs, that is, to the reference probability b_ji. Moreover, skips ≤ NSKIPS means that skips, the number of times the reference probabilities b_ji have not been rewritten, is at or below the threshold NSKIPS, so the separation in time between the current frame number t and the output probability reference frame number qs is small and the error is unlikely to grow. The reference probability b_ji is therefore read without being rewritten to obtain the forward probability c_it. Because b_ji is not rewritten, the output probability reference frame number qs is not rewritten either. Because the output probability skip count skips represents the number of times that, while skips ≤ NSKIPS, the reference probabilities b_ji were not rewritten, 1 is added to skips to count it up.

【０１００】 Thus, in the case of skipc ≦ NSKIPC and dtc ≦ DTC, the forward probability c_it of the current frame number t is taken to be equal to the forward probability c_i(t-1) of the immediately preceding frame, the computation of c_it ends there, and the output probability b_ji(x_t) is not computed. In the case of skipc > NSKIPC or dtc > DTC, if skips > NSKIPS or dts > DTS, the reference probability b_ji is first rewritten, that is, the output probability b_ji(x_t) of the current frame number t is computed from the Hidden Markov Model, and the reference probability b_ji is then read out to obtain the forward probability c_it. In the case of skipc > NSKIPC or dtc > DTC, if skips ≦ NSKIPS and dts ≦ DTS, the reference probability b_ji is not rewritten, that is, the output probability b_ji(x_t) of the current frame number t is not computed from the Hidden Markov Model, and the reference probability b_ji is read out to obtain the forward probability c_it. The amount of computation can therefore be reduced while the error in the forward probability c_it is kept small.
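The three cases above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the patent's implementation: the function names and string labels are hypothetical, the initial value of skips is assumed to be 0, and the bookkeeping for skipc is left out because this passage does not describe it.

```python
def choose_action(dtc, dts, skipc, skips, DTC, DTS, NSKIPC, NSKIPS):
    """Return which of the three cases applies to the current frame."""
    if skipc <= NSKIPC and dtc <= DTC:
        # c_it is taken equal to c_i(t-1); b_ji(x_t) is not computed.
        return "copy_previous_forward"
    if skips <= NSKIPS and dts <= DTS:
        # The stored reference probability b_ji is read without a rewrite.
        return "reuse_reference"
    # b_ji(x_t) is computed from the HMM, stored as b_ji, then read.
    return "rewrite_reference"


def update_skips(action, skips):
    """Skip-counter bookkeeping for the two reference-probability cases."""
    if action == "rewrite_reference":
        return 0          # re-initialised (initial value 0 is an assumption)
    if action == "reuse_reference":
        return skips + 1  # one more frame served from the stored value
    return skips          # copy case: not covered by this sketch
```

Keeping the decision separate from the counter update makes it easy to check each threshold combination in isolation.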

【０１０１】 The error in the forward probability c_it here is the difference between two values: the forward probability c_it obtained without computing the output probability b_ji(x_t) from the Hidden Markov Model (in the case of skipc ≦ NSKIPC and dtc ≦ DTC, or in the case of skips ≦ NSKIPS and dts ≦ DTS), and the forward probability c_it obtained without any such simplification of the computation.

【０１０２】 As the thresholds DTC and DTS are made larger, more computation is saved, but the error in the forward probability c_it grows. The values of DTC and DTS must therefore be chosen so that the forward probability c_it is obtained within the error range acceptable in practice.

【０１０３】[0103]

BEST MODE FOR CARRYING OUT THE INVENTION

<First Embodiment of the Invention of Claim 1> FIG. 1 is a functional block diagram showing a configuration example of a speech recognition apparatus suitable for carrying out the first embodiment of the invention of claim 1.

【０１０４】 The speech recognition apparatus 10 shown in FIG. 1 comprises a dictionary unit 12, an acoustic processing unit 14, a speech section detection unit 16, a collation unit 18, and a reference information storage unit 20.

【０１０５】 The dictionary unit 12 stores, as standard patterns for recognition matching, a plurality of Hidden Markov Models prepared for each category. The reference information storage unit 20 stores the reference frame number qs and the reference probabilities b_ji.

【０１０６】 The acoustic processing unit 14 extracts a speech feature vector from the input speech signal for each frame of fixed time width. The speech section detection unit 16 detects a speech section from the input speech signal.

【０１０７】 The collation unit 18 carries out the first embodiment of the invention of claim 1. Using equations (1) to (3) below, it obtains the likelihood ln{P(x_1, x_2, ..., x_T)} between the time series x_1, x_2, ..., x_T of speech feature vectors extracted from the start frame to the end frame of the speech section and each Hidden Markov Model. It takes the category assigned to the Hidden Markov Model that yields the maximum likelihood as the recognition result for the speech signal in that speech section.

【０１０８】[0108]

【数５】 (Equation 5)

【０１０９】 where
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
Φ_i: probability that the initial state of the Hidden Markov Model is S_i
a_ji: probability of a transition from state S_j to state S_i in the Hidden Markov Model
x_t: speech feature vector extracted in the t-th frame of the speech section (1 ≦ t ≦ T; the first frame is the start frame of the speech section and the T-th frame is its end frame)
b_ji(x_t): output probability of the speech feature vector x_t output on a transition from state S_j to state S_i in the Hidden Markov Model
c_it: forward probability of starting from the initial state of the Hidden Markov Model, outputting the time series x_1, x_2, ..., x_t of speech feature vectors, and reaching state S_i
*i: state number i assigned to a state S_i that is a final state of the Hidden Markov Model

When the likelihood is obtained, the forward probabilities c_it for t = 1, 2, ..., T are obtained in sequence as follows, using the reference probabilities b_ji stored in the reference information storage unit 20.

【０１１０】 (1) When t = 1, the following are performed: process (1A), in which the reference frame number qs is initialized to 1, the output probability b_ji(x_t) is obtained from the Hidden Markov Model for all j and i and written as the initial value of the reference probability b_ji, and, after the writing of the reference probabilities b_ji is complete, each reference probability b_ji is read out to obtain the forward probability c_it; and process (1B), in which 1 is added to the current frame number t after process (1A) ends.

【０１１１】 (2) When 2 ≦ t ≦ T, the following are performed: process (1C) and process (1D). In process (1C), the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is compared with the threshold DTS. If the comparison gives dts > DTS, the reference frame number qs is rewritten to the current frame number t, the output probability b_ji(x_t) is obtained from the Hidden Markov Model for all j and i, the reference probability b_ji is rewritten to that output probability, and, after the rewriting is complete, each reference probability b_ji is read out to obtain the forward probability c_it. If the comparison gives dts ≦ DTS, each reference probability b_ji is read out, without being rewritten, to obtain the forward probability c_it. In process (1D), 1 is added to the current frame number t after process (1C) ends.

【０１１２】 FIG. 2 is a diagram for explaining the Hidden Markov Model. Each Hidden Markov Model (hereinafter, HMM) stored in the dictionary unit 12 represents the speech signal for one unit of speech recognition. The unit of speech recognition may be a word, a phoneme, or some other unit; here it is a word. A plurality of HMMs are prepared for each category z, and each HMM is stored in the dictionary unit 12 in association with its category z.

【０１１３】 An HMM is defined by a set 1 of states consisting of a total of I states S_1 to S_I, a set 2 of speech feature vectors x, a set 3 of state transition probabilities a_ji, a set 4 of output probabilities b_ji(x), a set 5 of initial state probabilities Φ_i, and a set 6 of final states F, where

【０１１４】[0114]

【数６】 (Equation 6)

【０１１５】
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
a_ji: probability of a transition from state S_j to state S_i
b_ji(x): probability that the speech feature vector x is output on a transition from state S_j to state S_i
Φ_i: probability that the initial state is S_i

For example, in FIG. 2, a_12 is the probability of a transition from state S_1 to state S_2, and b_12(x) is the probability that the speech feature vector x is output on a transition from state S_1 to state S_2. Likewise, a_22 is the probability of a transition from state S_2 to state S_2, and b_22(x) is the probability that the speech feature vector x is output on a transition from state S_2 to state S_2.

【０１１６】 The sets 1 to 6 that define an HMM are obtained individually for each category z by statistical methods. That is, a variety of speech signals corresponding to category z are collected, for example speech signals grouped by age or by sex, or speech signals produced with different manners of utterance, and the sets 1 to 6 that express the statistical properties of these speech signals are obtained.

【０１１７】 The output probability b_ji(x) is expressed using an uncorrelated mixed normal distribution consisting of a plurality of mutually uncorrelated normal distributions, each of which is a function of the speech feature vector x. An uncorrelated mixed normal distribution has the advantages of being mathematically easy to handle and highly expressive.

【０１１８】 Next, the processing flow of the speech recognition method of this embodiment is described in detail, together with the operation of the speech recognition apparatus 10.

【０１１９】 The acoustic processing unit 14 extracts a speech feature vector x_t = (x_t1, x_t2, ..., x_tp) from the input speech signal for each frame. Here p is the order of the speech feature vector x_t, and x_t1 to x_tp are its vector components. t is the number assigned to the frame from which the speech feature vector x_t was extracted. At the later stage of matching against the HMMs, the frames are renumbered in ascending order with the start frame of the speech section as t = 1, but at the acoustic processing stage it is sufficient that frame numbers t are assigned so that each frame can be identified.

【０１２０】 As the vector components of the speech feature vector x_t, one can use, for example, values obtained from the filter outputs when the input speech signal is fed to a filter bank consisting of a plurality of band-pass filters with different center frequencies, power spectrum components obtained by Fourier analysis of the input speech signal, or LPC cepstrum coefficients obtained by linear predictive (LPC) analysis of the input speech signal. Here, an example in which the speech feature vector x_t is extracted using a filter bank is described.

【０１２１】 The acoustic processing unit 14 converts the input speech signal from an analog signal to a digital signal and, via the filter bank, separates the converted signal into signal components in the frequency bands (channels) corresponding to the individual band-pass filters, obtaining a total of p signal components x1 to xp, each in a different frequency band. The acoustic processing unit 14 then rectifies the signal component x1 and obtains, frame by frame, the average value of the rectified signal component x1 (the absolute value of the signal component x1). This average value is obtained by dividing the rectified signal component x1 by the time width of one frame. The average value of the signal component x1 obtained in the t-th frame is extracted as component x_t1 of the speech feature vector x_t. In the same way, components x_t2 to x_tp of the speech feature vector x_t are extracted from the remaining signal components x2 to xp.
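The per-frame averaging described above can be sketched as follows. This is a minimal illustration under assumptions: the band-pass filtering is taken as already done, each channel arrives as a list of digital samples, the frame width is given as a sample count, and the name `frame_features` is hypothetical. The division by the frame width is applied here to the sum of the rectified samples in the frame, which is one reading of the text.

```python
def frame_features(channels, frame_len):
    """Turn p band-pass channel signals into one feature vector per frame.

    channels  -- list of p equal-length sample lists (assumed filter outputs)
    frame_len -- number of samples per frame (assumed fixed frame width)
    """
    n_frames = len(channels[0]) // frame_len
    features = []
    for t in range(n_frames):
        start = t * frame_len
        # Component k is the mean of the rectified (absolute) samples
        # of channel k within frame t.
        vec = [sum(abs(s) for s in ch[start:start + frame_len]) / frame_len
               for ch in channels]
        features.append(vec)
    return features
```

Samples left over after the last full frame are ignored in this sketch.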

【０１２２】 Next, the speech section detection unit 16 detects the start frame and end frame of a speech section on the basis of the speech feature vectors x_t from the acoustic processing unit 14, and generates section information indicating which frames are the start frame and end frame of the speech section. A speech section is a section that contains the speech signal for one unit of speech recognition, here the speech signal for one word.

【０１２３】 The collation unit 18 receives the section information and the speech feature vectors x_t from the speech section detection unit 16 and generates the time series x_1, x_2, ..., x_T of the speech feature vectors x_t extracted from the start frame to the end frame of the speech section. In doing so, it renumbers the frames from the start frame to the end frame in ascending order, with the frame number t of the start frame set to 1.

【０１２４】 The collation unit 18 then obtains the likelihood ln{P(x_1, x_2, ..., x_T)} between the vector time series x_1, x_2, ..., x_T and the HMMs stored in the dictionary unit 12, individually for each HMM in the dictionary unit 12, and outputs as the recognition result the category z assigned to the HMM that yields the maximum likelihood.

【０１２５】 Here, P(x_1, x_2, ..., x_T) given by equation (1) is the probability that the vector time series x_1, x_2, ..., x_T appears in the HMM.

【０１２６】[0126]

【数７】 (Equation 7)

【０１２７】 In equation (1), c_iT is the forward probability of starting from the initial state of the HMM, outputting the vector time series x_1, x_2, ..., x_T, and reaching state S_i, and *i is an i that satisfies S_i ∈ F (a number i assigned to a state S_i belonging to the final states F). Equation (1) therefore takes the largest of the forward probabilities c_iT with i = *i as the appearance probability P(x_1, x_2, ..., x_T).

【０１２８】 The forward probability c_iT is obtained approximately by the Viterbi algorithm, using the recurrence shown in equations (2) and (3).
c_i0 = Φ_i ……(2)

【０１２９】[0129]

【数８】 (Equation 8)

【０１３０】 In an HMM there are one or more state transitions that output the speech feature vector x_t. Hence there are one or more transition paths that start from the initial state, output the vector sequence x_1 to x_t, and reach state S_i; in most cases there are several. As shown in equation (3), the largest of the values c_j(t-1) a_ji b_ji(x_t) computed for the individual transition paths is therefore taken as the forward probability c_it. This method of calculation is called the Viterbi method.
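The equation images for (1) and (3) are not reproduced in this text. Based only on the verbal descriptions in the surrounding paragraphs (the maximum over final states for (1), and the maximum over transition paths of the product for (3)), they can plausibly be written as below. This is a reconstruction under that assumption, not a copy of the original figures.

```latex
\begin{align}
P(x_1, x_2, \ldots, x_T) &= \max_{i = {*i}} \; c_{iT} \tag{1} \\
c_{it} &= \max_{j} \left\{ c_{j(t-1)} \, a_{ji} \, b_{ji}(x_t) \right\} \tag{3}
\end{align}
```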

【０１３１】 The output probability b_ji(x_t) in equation (3) is here defined as in equation (4) below.

【０１３２】[0132]

【数９】 (Equation 9)

【０１３３】 where
m = 1, 2, ..., M
g_jim(x_t): weighted probability of the speech feature vector x_t computed from the m-th normal distribution of the uncorrelated mixed normal distribution consisting of a total of M normal distributions

The weighted probability g_jim(x_t) in equation (4) is expressed using equations (5) to (7) below.

【０１３４】
g_jim(x_t) = λ_jim b_jim(x_t) ……(5)
b_jim(x_t) = (2π)^(-p/2) |ρ_jim|^(-1/2) exp{-D_jimt^2 / 2} ……(6)
D_jimt^2 = (x_t - μ_jim)' ρ_jim^(-1) (x_t - μ_jim) ……(7)

λ_jim: weight of the m-th normal distribution
b_jim(x_t): unweighted probability of the speech feature vector x_t computed from the m-th normal distribution
ρ_jim: variance-covariance matrix of the m-th normal distribution
μ_jim: mean vector of the m-th normal distribution
D_jimt: Mahalanobis generalized distance representing the distance between the speech feature vector x_t and the m-th normal distribution
(x_t - μ_jim)': transpose of (x_t - μ_jim)

Various forms can be used for the output probability b_ji(x_t). Besides the one in equation (4), for example, the one defined by equation (8) below may be used. Equation (8) takes as the output probability b_ji(x_t) the largest of the weighted probabilities g_jim(x_t) computed from the individual normal distributions of the uncorrelated mixed normal distribution consisting of a total of M normal distributions.
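Equations (5) to (7) can be illustrated with a short sketch. Several points are assumptions rather than statements from the text: the covariance matrix ρ_jim is taken to be diagonal (one reading of "uncorrelated"), so only per-component variances are passed in. Equation (4), whose image is not reproduced, is assumed to be the sum of the weighted probabilities g_jim(x_t) over m. The maximum variant follows the description of equation (8). The function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """ln of equation (6) for a diagonal covariance (assumed)."""
    p = len(x)
    log_det = sum(math.log(v) for v in var)            # ln |rho|
    d2 = sum((xi - mi) ** 2 / vi                       # equation (7)
             for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (p * math.log(2 * math.pi) + log_det + d2)


def output_prob(x, weights, means, variances, use_max=False):
    """b_ji(x) for one transition, from M weighted normal distributions."""
    g = [w * math.exp(log_gaussian_diag(x, m, v))      # equation (5)
         for w, m, v in zip(weights, means, variances)]
    # Sum over m is the assumed form of (4); max over m follows (8).
    return max(g) if use_max else sum(g)
```

The maximum form avoids the summation and suits the log-domain computation used later, at the cost of ignoring the smaller mixture terms.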

【０１３５】[0135]

【数１０】 (Equation 10)

【０１３６】 Further, writing the log transition probability as A_ji = ln(a_ji), the log output probability as B_ji(x_t) = ln{b_ji(x_t)}, and the log forward probability as C_it = ln(c_it), equations (1) to (3) can be transformed to give equations (9) to (11) for calculating the likelihood ln{P(x_1, x_2, ..., x_T)}.
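The images for equations (9) to (11) are not reproduced here. Taking logarithms of the recurrence as it is described in the text (maximum over final states, initial value Φ_i, maximum over j of the product c_j(t-1) a_ji b_ji(x_t)) gives the following form. It is offered as a reconstruction under that assumption; the numbering follows the later references to (9) for the final maximum, (10) for the initial value, and (11) for the frame update.

```latex
\begin{align}
\ln P(x_1, x_2, \ldots, x_T) &= \max_{i = {*i}} \; C_{iT} \tag{9} \\
C_{i0} &= \ln \Phi_i \tag{10} \\
C_{it} &= \max_{j} \left\{ C_{j(t-1)} + A_{ji} + B_{ji}(x_t) \right\} \tag{11}
\end{align}
```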

【０１３７】[0137]

【数１１】 [Equation 11]

【０１３８】 Because equations (9) to (11) form a recurrence in t, the log forward probabilities C_it for t = 1, 2, ..., T can be computed in sequence as in equations (12) to (16) below.

【０１３９】[0139]

【数１２】 (Equation 12)

【０１４０】 Having obtained the log forward probabilities C_iT at t = T for all i = 1, 2, ..., I, the HMM collation unit 18 takes the largest C_iT among those with i = *i as the likelihood ln{P(x_1, x_2, ..., x_T)}. It obtains the likelihood ln{P(x_1, x_2, ..., x_T)} for each of the HMMs stored in the dictionary unit 12, and outputs the category z assigned to the HMM that yields the maximum likelihood as the recognition result for the input speech signal from which the time series x_1, x_2, ..., x_T was obtained.

【０１４１】 Next, the flow of processing for obtaining the likelihood between an HMM and the time series x_1, x_2, ..., x_T of speech feature vectors in the first embodiment of the invention of claim 1 is described, focusing on a single HMM. FIGS. 3 and 4 show the flow of processing for this single HMM. In this example, the output probability b_ji(x_t), the forward probability c_it, and the reference probability b_ji are taken to be the log output probability B_ji(x_t), the log forward probability C_it, and the log reference probability B_ji, respectively, and the description assumes i = j = 1, 2, ..., I.

【０１４２】 On receiving the section information and the speech feature vectors x_t from the speech section detection unit 16, the collation unit 18 sets the initial value C_i0 of the log forward probability according to equation (10) for all i = 1, 2, ..., I (S1).

【０１４３】 Next, the collation unit 18 initializes the current frame number t to t = 1 in order to process the start frame of the speech section (S2).

【０１４４】 Next, for all j = 1, 2, ..., J and i = 1, 2, ..., I, the collation unit 18 obtains the log output probability B_ji(x_1) according to equations (4) to (7) (S3), and writes it as the initial value of the log reference probability B_ji (S4).

【０１４５】 The reference information storage unit 32 is provided with a storage area save B_ji for storing the reference probability B_ji individually for each j and i (j = 1, 2, ..., J and i = 1, 2, ..., I). The reference information storage unit 32 therefore has J × I storage areas that individually store B_11, B_12, ..., B_1I, B_21, B_22, ..., B_2I, ..., B_J1, B_J2, ..., B_JI. In the figures, the process of storing the initial value of the reference probability B_ji is accordingly written as save B_ji = B_ji(x_1).

【０１４６】 Next, the collation unit 18 initializes the reference frame number qs to the current frame number 1 (S5), and then obtains the log forward probability C_i1 according to equation (11) for all i = 1, 2, ..., I (S6).

【０１４７】 Next, the collation unit 18 adds 1 to the current frame number t in order to process the next frame of the speech section (S7), and then compares the current frame number t with the frame number T of the end frame to determine whether all frames in the speech section have been processed (S8).

【０１４８】 (1-1A: when t ≦ T at S8) If at S8 the current frame number t is at or below the end frame number T, processing has not yet been completed for all frames of the speech section. The collation unit 18 therefore obtains the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs according to equation (17) below (S9).

【０１４９】[0149]

【数１３】 (Equation 13)

【０１５０】 where
x_tk: vector component of the speech feature vector x_t of the current frame number t
x_qsk: vector component of the speech feature vector x_qs of the reference frame number qs

Next, the collation unit 18 compares the distance dts with the threshold DTS to determine whether the vectors x_t and x_qs are approximately equal (S10).

【０１５１】 If at S10 the distance dts exceeds the threshold DTS, the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs are not close, and the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji, so the reference probability B_ji is rewritten. The reference frame number qs is first rewritten to the current frame number t (S11). Then, for all j = 1, 2, ..., J and i = 1, 2, ..., I, the log output probability B_ji(x_t) is obtained according to equations (4) to (7), and the reference probability B_ji is rewritten to that output probability (S12). After the rewriting of the reference probabilities B_ji is complete, each reference probability B_ji is read out and the forward probability C_it is obtained according to equation (11) for all i = 1, 2, ..., I (S13). Processing then returns to S7 to handle the next frame of the speech section. In the figures, the rewriting of the reference probability B_ji at S12 is written as save B_ji = B_ji(x_t).

【０１５２】 The reference probability B_ji read out at S13 in this case is the output probability B_ji(x_t) of the current frame number t obtained at S12. In this case, S13 therefore obtains the forward probability C_it using the output probability B_ji(x_t) of the current frame number t.

【０１５３】 If at S10 the distance dts is at or below the threshold DTS, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the reference frame number qs, and the output probability B_ji(x_t) of the current frame number t is approximately equal to the reference probability B_ji, so the reference probability B_ji is not rewritten. The output probability B_ji(x_t) is therefore not calculated with equations (4) to (7). Instead, each reference probability B_ji is read out and the log forward probability C_it is obtained according to equation (11) for all i = 1, 2, ..., I (S13). Processing then returns to S7 to handle the next frame of the speech section.

【０１５４】The reference probability B_ji read in S13 in this case is the output probability B_ji(x_qs) obtained for the frame with reference frame number qs. In S13 of this case, therefore, the forward probability C_it is obtained using the output probability B_ji(x_qs) of the reference frame number qs.

【０１５５】(1-1B: when t > T in S8) If the current frame number t is larger than the frame number T of the end frame in S8, the forward probability C_iT has already been obtained for all i = 1, 2, ..., I. According to equation (9), the largest of the forward probabilities C_iT with i = *i is taken as the likelihood ln{P(x₁, x₂, ..., x_T)} between the time series x₁, x₂, ..., x_T of voice feature vectors and the HMM, and the process of obtaining the likelihood for this HMM then ends (END).

【０１５６】The collation unit 18 performs the processes S1 to S13 shown in FIGS. 3 and 4 for each of the HMMs stored in the dictionary unit 12 to obtain its likelihood (forward probability C_iT). It then outputs the category of the HMM with the maximum likelihood to a next-stage device (not shown) as the recognition result for the input voice signal from which the time series x₁, x₂, ..., x_T of voice feature vectors was extracted.
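
The selection step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `recognize` and the representation of the dictionary as (category, log-likelihood) pairs are assumptions made for the example.

```python
def recognize(scored_hmms):
    """Return the category of the HMM with the largest log-likelihood.

    scored_hmms: iterable of (category, log_likelihood) pairs, one per HMM
    in the dictionary (hypothetical representation for this sketch).
    """
    best_category, best_score = None, float("-inf")
    for category, score in scored_hmms:
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

Several HMMs may share one category; the sketch simply reports the category of whichever HMM scores highest.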

【０１５７】As described above, in the process of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT, when the distance dts is less than or equal to the threshold DTS, the forward probability C_it is obtained without computing the output probability B_ji(x_t) from equations (4) to (7), so the amount of calculation can be greatly reduced. Moreover, because this simplification is applied only when the distance dts is less than or equal to the threshold DTS, the error in the forward probability C_it can be kept small even though the calculation is simplified.
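
The distance test that decides whether the stored reference probabilities are reused can be sketched as below. The Euclidean metric, the zero-based frame indexing, and the names `euclidean` and `plan_refresh` are assumptions for illustration; the text only requires some distance dts between x_t and x_qs and a threshold DTS.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors (Euclidean is an assumption)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def plan_refresh(frames, dts_threshold):
    """For each frame, report whether B_ji must be recomputed (True)
    or the stored reference probabilities can be reused (False)."""
    qs = 0                     # index of the reference frame (frame 1 in the text)
    decisions = [True]         # the first frame is always computed
    for t in range(1, len(frames)):
        if euclidean(frames[t], frames[qs]) > dts_threshold:
            qs = t             # frame t becomes the new reference frame
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

Every `False` entry corresponds to a frame for which equations (4) to (7) are not evaluated.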

【０１５８】According to simulations by the inventor of this application, there were many cases in which setting the threshold DTS so that the amount of calculation for the output probability B_ji(x_t) fell to about 1/5 of that without simplification produced no significant difference in recognition accuracy compared with the unsimplified calculation.

【０１５９】<Second Embodiment of the Invention of Claim 1> A voice recognition device suitable for carrying out the second embodiment of the invention of claim 1 is the voice recognition device 10 with the same configuration as described above, except that the collation unit 18 is configured as described below.

【０１６０】That is, when obtaining the likelihood, the collation unit 18 uses the reference probability b_ji stored in the reference information storage unit 20 and obtains the forward probability c_it for each of t = 1, 2, ..., T in sequence as follows.

【０１６１】(1) When t = 1, a process (1A) is performed: the reference frame number qs is initialized to 1 and the skip count skips is initialized to 0, the output probability b_ji(x_t) is obtained from the Hidden Markov Model for all j and i and written as the initial value of the reference probability b_ji, and, after this writing is complete, each reference probability b_ji is read to obtain the forward probability c_it. After process (1A) ends, a process (1B) adds 1 to the current frame number t.

【０１６２】(2) When 2 ≤ t ≤ T, a process (1C) is performed: the skip count skips is compared with a threshold NSKIPS, and the distance dts between the voice feature vector x_t of the current frame number t and the voice feature vector x_qs of the reference frame number qs is compared with the threshold DTS. If skips > NSKIPS or dts > DTS, the skip count skips is reset to 0, the reference frame number qs is rewritten to the current frame number t, the output probability b_ji(x_t) is obtained from the Hidden Markov Model for all j and i, the reference probability b_ji is rewritten to this output probability b_ji(x_t), and, after the rewrite is complete, each reference probability b_ji is read to obtain the forward probability c_it. If skips ≤ NSKIPS and dts ≤ DTS, 1 is added to the skip count skips, and each reference probability b_ji is read without being rewritten to obtain the forward probability c_it. After process (1C) ends, a process (1D) adds 1 to the current frame number t.
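
The two-condition test in process (1C) can be written as a small state update. This is a sketch; the function name `update_reference` and its return convention are hypothetical, while skips, qs, NSKIPS, and DTS follow the text.

```python
def update_reference(t, qs, skips, dts, nskips_max, dts_max):
    """Apply the decision of process (1C) for frame t.

    Returns (recompute, new_qs, new_skips):
      recompute  True if b_ji(x_t) must be computed from the HMM
      new_qs     reference frame number after the decision
      new_skips  skip count after the decision
    """
    if skips > nskips_max or dts > dts_max:
        # Too many consecutive reuses, or x_t has drifted too far from x_qs.
        return True, t, 0
    # Reuse the stored reference probabilities and count the skip.
    return False, qs, skips + 1
```

The skip limit bounds how long a single reference frame can stand in for later frames, even when the distance stays small.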

【０１６３】Next, for the second embodiment of the invention of claim 1, the flow of the process of obtaining the likelihood between an HMM and the time series x₁, x₂, ..., x_T of voice feature vectors is described, focusing on a single HMM. FIGS. 5 and 6 show this flow. In this example, the output probability b_ji(x_t), the forward probability c_it, and the reference probability b_ji are replaced by the logarithmic output probability B_ji(x_t), the logarithmic forward probability C_it, and the logarithmic reference probability B_ji, respectively, and both i and j range over 1, 2, ..., I.

【０１６４】When the section information and the voice feature vectors x_t are input from the voice section detection unit 16, the collation unit 18 sets the initial value C_i0 of the logarithmic forward probability according to equation (10) for all i = 1, 2, ..., I (S1).

【０１６５】Next, the collation unit 18 initializes the current frame number t to t = 1 in order to process the start frame of the voice section (S2).

【０１６６】Next, the collation unit 18 obtains the logarithmic output probability B_ji(x₁) according to equations (4) to (7) for all j = 1, 2, ..., J and i = 1, 2, ..., I (S3), and writes this output probability B_ji(x₁) as the initial value of the logarithmic reference probability B_ji (S4).

【０１６７】The reference information storage unit 32 provides a separate storage area, save B_ji, for the reference probability B_ji of each pair j = 1, 2, ..., J and i = 1, 2, ..., I. It therefore has J × I storage areas that individually store B₁₁, B₁₂, ..., B_1I, B₂₁, B₂₂, ..., B_2I, ..., B_J1, B_J2, ..., B_JI. In the figure, the process of storing the initial value of the reference probability B_ji is accordingly written as save B_ji = B_ji(x₁).

【０１６８】Next, the collation unit 18 initializes the reference frame number qs to the current frame number 1 and initializes the skip count skips to 0 (S5). It then obtains the logarithmic forward probability C_i1 according to equation (11) for all i = 1, 2, ..., I (S6).

【０１６９】Next, the collation unit 18 adds 1 to the current frame number t in order to process the next frame of the voice section (S7), and then compares the current frame number t with the frame number T of the end frame to determine whether all frames in the voice section have been processed (S8).

【０１７０】(1-2A: when t ≤ T in S8) If the current frame number t is less than or equal to the end frame number T in S8, not all frames of the voice section have been processed, so the skip count skips is compared with the threshold NSKIPS (S9).

【０１７１】If the skip count skips exceeds the threshold NSKIPS in S9, the time gap between the current frame number t and the reference frame number qs is large and the error is therefore likely to grow, so the reference probability B_ji is rewritten to reduce the error. The skip count skips is reset to 0 and the reference frame number qs is rewritten to the current frame number t (S10). Then the logarithmic output probability B_ji(x_t) is obtained according to equations (4) to (7) for all j = 1, 2, ..., J and i = 1, 2, ..., I, and the reference probability B_ji is rewritten to this output probability B_ji(x_t) (S11). After the rewrite is complete, each reference probability B_ji is read, and the forward probability C_it is obtained according to equation (11) for all i = 1, 2, ..., I (S12). The process then returns to S7 to handle the next frame of the voice section. In the figure, the rewriting of the reference probability B_ji in S11 is written as save B_ji = B_ji(x_t).

【０１７２】The reference probability B_ji read in S12 in this case is the output probability B_ji(x_t) of the current frame number t obtained in S11. In S12 of this case, therefore, the forward probability C_it is obtained using the output probability B_ji(x_t) of the current frame number t.

【０１７３】If the skip count skips is less than or equal to the threshold NSKIPS in S9, the collation unit 18 obtains the distance dts between the voice feature vector x_t of the current frame number t and the voice feature vector x_qs of the reference frame number qs (S13), and compares the distance dts with the threshold DTS to determine whether the vectors x_t and x_qs are approximately equal (S14).

【０１７４】If the distance dts exceeds the threshold DTS in S14, the voice feature vector x_t of the current frame number t and the voice feature vector x_qs of the reference frame number qs are not close, so the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji, and the reference probability B_ji is rewritten. The processes S10 to S12 are therefore performed, after which the process returns to S7 to handle the next frame of the voice section.

【０１７５】If the distance dts is less than or equal to the threshold DTS in S14, the voice feature vector x_t of the current frame number t is approximately equal to the voice feature vector x_qs of the reference frame number qs, so the output probability B_ji(x_t) of the current frame number t is approximately equal to the reference probability B_ji, and the reference probability B_ji is not rewritten. The skip count skips is incremented by 1 (S15); then, without calculating the output probability B_ji(x_t) with equations (4) to (7), the reference probability B_ji is read, and the logarithmic forward probability C_it is obtained according to equation (11) for all i = 1, 2, ..., I (S12). The process then returns to S7 to handle the next frame of the voice section.

【０１７６】The reference probability B_ji read in S12 in this case is the output probability B_ji(x_qs) obtained for the frame with reference frame number qs. In S12 of this case, therefore, the forward probability C_it is obtained using the output probability B_ji(x_qs) of the reference frame number qs.

【０１７７】(1-2B: when t > T in S8) If the current frame number t is larger than the frame number T of the end frame in S8, the forward probability C_iT has already been obtained for all i = 1, 2, ..., I. According to equation (9), the largest of the forward probabilities C_iT with i = *i is taken as the likelihood ln{P(x₁, x₂, ..., x_T)} between the time series x₁, x₂, ..., x_T of voice feature vectors and the HMM, and the process of obtaining the likelihood for this HMM then ends (END).

【０１７８】The collation unit 18 performs the processes S1 to S15 shown in FIGS. 5 and 6 for each of the HMMs stored in the dictionary unit 12 to obtain its likelihood (forward probability C_iT), and detects the maximum among the likelihoods obtained. It then outputs the category of the HMM with the maximum likelihood to a next-stage device (not shown) as the recognition result for the input voice signal from which the time series x₁, x₂, ..., x_T of voice feature vectors was extracted.

【０１７９】As described above, in the process of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT, when the skip count skips is less than or equal to the threshold NSKIPS and the distance dts is less than or equal to the threshold DTS, the forward probability C_it is obtained without computing the output probability B_ji(x_t) from equations (4) to (7), so the amount of calculation can be greatly reduced. Moreover, because this simplification is applied only when the skip count skips is less than or equal to the threshold NSKIPS and the distance dts is less than or equal to the threshold DTS, the error in the forward probability C_it can be kept small even though the calculation is simplified.
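
The whole flow S1 to S15 for one HMM can be sketched as below. Several parts are assumptions, since equations (9) to (11) are not reproduced in this passage: the recurrence is written here as a log-sum over predecessor states (a max-based recurrence would fit the same loop), the distance is Euclidean, frames are indexed from 0, and all function and parameter names are hypothetical. The emission callback `log_emit(j, i, x)` stands in for equations (4) to (7).

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    peak = max(values)
    if peak == float("-inf"):
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def distance(a, b):
    """Euclidean distance between feature vectors (metric is an assumption)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def log_likelihood(frames, log_init, log_trans, log_emit, final_states,
                   dts_max, nskips_max):
    """Return (log-likelihood, number of emission evaluations) for one HMM."""
    n = len(log_init)
    c = list(log_init)           # S1: initial log forward probabilities C_i0
    saved = None                 # stored reference probabilities B_ji
    qs, skips, n_evals = 0, 0, 0
    for t, x in enumerate(frames):
        # S9, then S13/S14: the skip count is tested before the distance.
        refresh = (t == 0 or skips > nskips_max
                   or distance(x, frames[qs]) > dts_max)
        if refresh:
            # S10/S11: recompute B_ji(x_t) and make frame t the reference.
            saved = [[log_emit(j, i, x) for i in range(n)] for j in range(n)]
            qs, skips = t, 0
            n_evals += 1
        else:
            skips += 1           # S15: reuse the stored B_ji
        # S12: forward update using whatever is stored in `saved`.
        c = [logsumexp([c[j] + log_trans[j][i] + saved[j][i]
                        for j in range(n)])
             for i in range(n)]
    # Largest forward probability over the final states.
    return max(c[i] for i in final_states), n_evals
```

Setting `dts_max` below zero forces a refresh on every frame, which gives an unsimplified baseline against which the approximation error and the saved evaluations can be measured.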

【０１８０】The invention of claim 1 is applicable to any voice recognition device that performs matching on a frame-by-frame basis.

【０１８１】<First Embodiment of the Invention of Claim 3> FIG. 7 is a functional block diagram showing a configuration example of a voice recognition device suitable for carrying out the first embodiment of the invention of claim 3.

【０１８２】The voice recognition device 22 shown in the figure includes a dictionary unit 24, an acoustic processing unit 26, a voice section detection unit 28, a collation unit 30, and a reference information storage unit 32.

【０１８３】The dictionary unit 24 stores a plurality of Hidden Markov Models prepared for each category as standard patterns for recognition and matching. In each Hidden Markov Model, the state S_j that is the source of a state transition giving the output probability b_ji(x) of a voice feature vector x is assigned a type s, which is either stationary part or transient part. The reference information storage unit 32 stores a stationary-part reference frame number qs, a transient-part reference frame number qt, and the reference probability b_ji.

【０１８４】The acoustic processing unit 26 extracts a voice feature vector from the input voice signal for each frame of fixed time width. The voice section detection unit 28 detects a voice section from the input voice signal.

【０１８５】The collation unit 30 carries out the first embodiment of the invention of claim 3. Using the following equations (1) to (3), it obtains the likelihood ln{P(x₁, x₂, ..., x_T)} between a Hidden Markov Model and the time series x₁, x₂, ..., x_T of voice feature vectors extracted from the start frame to the end frame of the voice section, and takes the category assigned to the Hidden Markov Model with the maximum likelihood as the recognition result for the voice signal in that voice section.

【０１８６】[0186]

【数１４】 [Equation 14]

【０１８７】where
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
Φ_i: probability that the initial state is S_i in the Hidden Markov Model
a_ji: probability of a transition from state S_j to state S_i in the Hidden Markov Model
x_t: voice feature vector extracted in the t-th frame of the voice section (1 ≤ t ≤ T, where the first frame is the start frame and the T-th frame is the end frame of the voice section)
b_ji(x_t): output probability of the voice feature vector x_t output on a transition from state S_j to state S_i in the Hidden Markov Model
c_it: forward probability of starting from the initial state, outputting the time series x₁, x₂, ..., x_t of voice feature vectors, and reaching state S_i in the Hidden Markov Model
*i: state number i assigned to a state S_i that is a final state in the Hidden Markov Model
When obtaining the likelihood, the forward probability c_it for each of t = 1, 2, ..., T is obtained in sequence as follows, using the reference probability b_ji stored in the reference information storage unit 32.

【０１８８】(1) When t = 1, a process (2A) is performed: the stationary-part reference frame number qs and the transient-part reference frame number qt are each initialized to 1, the output probability b_ji(x_t) is obtained from the Hidden Markov Model for all j and i and written as the initial value of the reference probability b_ji, and, after this writing is complete, each reference probability b_ji is read to obtain the forward probability c_it. After process (2A) ends, a process (2B) adds 1 to the current frame number t.

【０１８９】(2) When 2 ≤ t ≤ T, the following processes are performed. In process (2C), the distance dts between the voice feature vector x_t of the current frame number t and the voice feature vector x_qs of the stationary-part reference frame number qs is compared with a threshold DTS, and if dts > DTS, the stationary-part reference frame number qs is rewritten to the current frame number t. In process (2D), the distance dtt between the voice feature vector x_t of the current frame number t and the voice feature vector x_qt of the transient-part reference frame number qt is compared with a threshold DTT, and if dtt > DTT, the transient-part reference frame number qt is rewritten to the current frame number t. After processes (2C) and (2D) end, a process (2E) determines, for each j = 1, 2, ..., J, the type s assigned to the source S_j of the state transition giving the output probability b_ji(x_t).

【０１９０】A process (2F) is then performed according to the type determined in process (2E). If the type is stationary part and the comparison result of process (2C) is dts > DTS, then for that j the output probability b_ji(x_t) is obtained from the Hidden Markov Model for all i, and the reference probability b_ji is rewritten to this output probability b_ji(x_t). If the type is stationary part and the comparison result of process (2C) is dts ≤ DTS, the reference probability b_ji is not rewritten for that j. If the type is transient part and the comparison result of process (2D) is dtt > DTT, then for that j the output probability b_ji(x_t) is obtained from the Hidden Markov Model for all i, and the reference probability b_ji is rewritten to this output probability b_ji(x_t). If the type is transient part and the comparison result of process (2D) is dtt ≤ DTT, the reference probability b_ji is not rewritten for that j.

【０１９１】Process (2F) is performed for each individual j = 1, 2, ..., J. Once process (2F) has finished for all j, a process (2G) reads each reference probability b_ji and obtains the forward probability c_it. After process (2G) ends, a process (2H) adds 1 to the current frame number t.
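
Processes (2C) to (2F) amount to keeping one reference frame per type and refreshing only the rows j whose type has drifted. The sketch below is illustrative: the Euclidean metric, zero-based frame indices, the dictionary layout of `refs` and `thresholds`, and the name `select_rows_to_refresh` are all assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between feature vectors (metric is an assumption)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def select_rows_to_refresh(t, frames, refs, thresholds, state_types):
    """Return the source-state indices j whose b_ji must be recomputed at frame t.

    refs         {"stationary": qs, "transient": qt}; updated in place
    thresholds   {"stationary": DTS, "transient": DTT}
    state_types  type assigned to each source state S_j
    """
    exceeded = {}
    for kind in ("stationary", "transient"):      # processes (2C) and (2D)
        d = distance(frames[t], frames[refs[kind]])
        exceeded[kind] = d > thresholds[kind]
        if exceeded[kind]:
            refs[kind] = t                        # frame t becomes the reference
    # Processes (2E)/(2F): refresh row j only if its own type exceeded.
    return [j for j, kind in enumerate(state_types) if exceeded[kind]]
```

Rows not returned keep their stored reference probabilities, so stationary and transient states can be refreshed at different rates.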

【０１９２】FIG. 8 is a diagram for explaining the Hidden Markov Model. Each Hidden Markov Model (hereinafter, HMM) stored in the dictionary unit 24 represents the voice signal for one unit of voice recognition. The unit of voice recognition can be a word, a phoneme, or something else; here it is a word. A plurality of HMMs are prepared for each category z, and each HMM is stored in the dictionary unit 24 in association with its category z.

【０１９３】An HMM is defined by a set 1 of states consisting of a total of I states S₁ to S_I, a set 2 of voice feature vectors x, a set 3 of state transition probabilities a_ji, a set 4 of output probabilities b_ji(x), a set 5 of initial state probabilities Φ_i, and a set 6 of final states F. In the HMM, the source S_j of a state transition giving the output probability b_ji(x) is assigned a type s, which is either stationary part or transient part. Here,

【０１９４】[0194]

【数１５】 (Equation 15)

【０１９５】i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
a_ji: probability of a transition from state S_j to state S_i
b_ji(x): probability that the voice feature vector x is output on a transition from state S_j to state S_i
Φ_i: probability that the initial state is S_i
For example, in the example of FIG. 2, a₁₂ is the probability of a transition from state S₁ to state S₂, and b₁₂(x) is the probability that the voice feature vector x is output on the transition from state S₁ to state S₂; a₂₂ is the probability of a transition from state S₂ to state S₂, and b₂₂(x) is the probability that the voice feature vector x is output on that transition. Further, the source S₁ of the state transition S₁ → S₁ giving the output probability b₁₁(x) is assigned transient part as its type s, and the state transition S₁ → S₂ giving the output probability b₁₂(x) is assigned stationary part as its type s.

【０１９６】The sets 1 to 6 that define an HMM are obtained individually for each category z by statistical methods. That is, a variety of voice signals corresponding to category z are collected, for example by age group or by gender, or with different manners of utterance, and the sets 1 to 6 that express the statistical properties of these voice signals are obtained. At this time, it is also checked whether each state transition giving an output probability b_ji(x) corresponds to a stationary part or a transient part of the voice signal, and the source S_j of that state transition is assigned the corresponding type s.

【０１９７】The output probability b_ji(x) is expressed using an uncorrelated mixture normal distribution consisting of a plurality of mutually uncorrelated normal distributions, each of which is a function of the voice feature vector x. The uncorrelated mixture normal distribution has the advantages of being mathematically easy to handle and having high expressive power.
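
One common reading of an uncorrelated mixture normal distribution is a weighted sum of Gaussians with diagonal covariance. The sketch below evaluates the logarithm of such a mixture; it is a generic textbook form offered as an assumption and is not claimed to match the patent's equations (4) to (7), which are not reproduced in this passage. The names `log_diag_gaussian` and `log_mixture` and the parameter layout are hypothetical.

```python
import math


def log_diag_gaussian(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (per-dimension variances)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_mixture(x, weights, means, variances):
    """Log of sum_k w_k * N(x; mean_k, diag(var_k)), computed stably."""
    terms = [math.log(w) + log_diag_gaussian(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(term - peak) for term in terms))
```

Because the covariance is diagonal, the cost per component grows linearly with the feature dimension p, and evaluating all J × I mixtures per frame is the cost that the reference-probability reuse avoids.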

【０１９８】Next, the processing flow of the voice recognition method of this embodiment is described concretely, together with the operation of the voice recognition device 22.

【０１９９】The acoustic processing unit 26 extracts a voice feature vector x_t = (x_t1, x_t2, ..., x_tp) from the input voice signal for each frame. Here p is the dimension of the voice feature vector x_t, and x_t1 to x_tp are its vector components. t is the number assigned to the frame from which the voice feature vector x_t was extracted. At the stage of matching against the HMMs, described later, the frame numbers are renumbered in ascending order with the start frame of the voice section as t = 1, but at the time of acoustic processing it is sufficient that a frame number t is assigned so that each frame can be identified.

【０２００】As the vector components of the speech feature vector x_t, it is possible to use, for example, values obtained from the outputs of a band filter group consisting of a plurality of band-pass filters with different center frequencies when the input speech signal is fed to it, power spectrum components obtained by Fourier analysis of the input speech signal, or LPC cepstrum coefficients obtained by linear prediction (LPC) analysis of the input speech signal. Here, an example in which the speech feature vector x_t is extracted using a band filter group is described.

【０２０１】The acoustic processing unit 26 converts the input speech signal from an analog signal to a digital signal and, through the band filter group, separates the converted signal into signal components of the frequency bands (channels) corresponding to the individual band-pass filters, obtaining a total of p signal components x1 to xp, each in a different frequency band. The acoustic processing unit 26 then rectifies the signal component x1 and obtains, frame by frame, the average value of the rectified signal component x1 (the absolute value of x1). This average value is obtained by dividing the rectified signal component x1 by the time width of one frame. The average value of the signal component x1 obtained in the t-th frame is extracted as the component x_t1 of the speech feature vector x_t. The components x_t2 to x_tp of the speech feature vector x_t are extracted in the same way from the remaining signal components x2 to xp.
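The per-frame averaging described above can be sketched as follows. This is a minimal illustration that assumes the p band-pass filter outputs are already available as sample sequences; the function name `frame_features` and the parameter `frame_len` (samples per frame) are hypothetical and do not appear in the source.

```python
from typing import List, Sequence


def frame_features(channels: Sequence[Sequence[float]], frame_len: int) -> List[List[float]]:
    """Return one feature vector x_t per frame.

    channels[k] holds the samples of the k-th band-pass filter output
    (assumed to be given). Component k of x_t is the mean of the
    rectified (absolute) samples of channel k within frame t.
    """
    n_frames = len(channels[0]) // frame_len
    features = []
    for t in range(n_frames):
        start = t * frame_len
        x_t = [
            sum(abs(v) for v in ch[start:start + frame_len]) / frame_len
            for ch in channels
        ]
        features.append(x_t)
    return features
```

Each inner list corresponds to one x_t with p components, in frame order.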

【０２０２】Next, based on the speech feature vectors x_t from the acoustic processing unit 26, the speech section detection unit 28 detects the start frame and the end frame of the speech section and generates section information indicating which frames are the start frame and the end frame. The speech section is a section containing the speech signal for one unit of speech recognition, here the speech signal for one word.

【０２０３】The matching unit 30 receives the section information and the speech feature vectors x_t from the speech section detection unit 28 and generates the time series x_1, x_2, ..., x_T of the speech feature vectors x_t extracted from the start frame to the end frame of the speech section. In doing so, it rewrites the frame numbers t from the start frame to the end frame in ascending order, with the frame number t of the start frame set to 1.

【０２０４】The matching unit 30 then obtains the likelihood ln{P(x_1, x_2, ..., x_T)} between the vector time series x_1, x_2, ..., x_T and an HMM stored in the dictionary unit 24, individually for each HMM in the dictionary unit 24, and outputs the category z assigned to the HMM with the maximum likelihood as the recognition result.

【０２０５】Here, P(x_1, x_2, ..., x_T), given by equation (1), is the probability that the vector time series x_1, x_2, ..., x_T appears in the HMM.

【０２０６】[0206]

【数１６】 (Equation 16)

【０２０７】In equation (1), c_iT is the forward probability that the HMM starts its transitions from the initial state, outputs the vector time series x_1, x_2, ..., x_T, and reaches the state S_i. *i is an i that satisfies S_i ∈ F (a number i assigned to a state S_i belonging to the final states F). Equation (1) therefore takes the largest of the forward probabilities c_iT with i = *i as the appearance probability P(x_1, x_2, ..., x_T).

【０２０８】The forward probability c_iT is obtained approximately by the Viterbi algorithm, using the recurrence relations shown in equations (2) and (3).
c_i0 = Φ_i ……(2)

【０２０９】[0209]

【数１７】 [Equation 17]

【０２１０】In the HMM, there are one or more state transitions that output the speech feature vector x_t. Consequently, there are one or more transition paths that start from the initial state, output the vector series x_1 to x_t, and reach the state S_i, and in most cases there are several. Therefore, as shown in equation (3), the largest of the values c_j(t-1) a_ji b_ji(x_t) calculated for the individual transition paths is taken as the forward probability c_it. This calculation method is called the Viterbi method.

【０２１１】The output probability b_ji(x_t) in equation (3) is defined here as in the following equation (4).

【０２１２】[0212]

【数１８】 (Equation 18)

【０２１３】where m = 1, 2, ..., M
g_jim(x_t): weighted probability of the speech feature vector x_t calculated from the m-th normal distribution in the uncorrelated mixed normal distribution consisting of a total of M normal distributions
The weighted probability g_jim(x_t) in equation (4) is expressed using the following equations (5) to (7).

【０２１４】
g_jim(x_t) = λ_jim b_jim(x_t) ……(5)
b_jim(x_t) = (2π)^{-p/2} |ρ_jim|^{-1/2} exp{-D_jimt^2 / 2} ……(6)
D_jimt^2 = (x_t - μ_jim)' ρ_jim^{-1} (x_t - μ_jim) ……(7)
λ_jim: weight of the m-th normal distribution
b_jim(x_t): unweighted probability of the speech feature vector x_t calculated from the m-th normal distribution
ρ_jim: variance-covariance matrix of the m-th normal distribution
μ_jim: mean vector of the m-th normal distribution
D_jimt: Mahalanobis generalized distance representing the distance between the speech feature vector x_t and the m-th normal distribution
(x_t - μ_jim)': transpose of (x_t - μ_jim)
Various forms can be used for the output probability b_ji(x_t). Besides equation (4), one defined as in the following equation (8), for example, may be used. Equation (8) expresses that, in the uncorrelated mixed normal distribution consisting of a total of M normal distributions, the largest of the weighted probabilities g_jim(x_t) calculated from the individual normal distributions is detected as the output probability b_ji(x_t).
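Equations (5) to (7) and the maximum form described for equation (8) can be sketched as below. Two points are assumptions rather than statements from the source: the covariance matrix ρ_jim is taken to be diagonal (so only per-component variances are needed), and equation (4), whose image is not reproduced in this text, is assumed to be the sum of the weighted probabilities over m. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_probs(x: Sequence[float],
                   weights: Sequence[float],
                   means: Sequence[Sequence[float]],
                   variances: Sequence[Sequence[float]]) -> List[float]:
    """Return g_jim(x_t) for m = 1..M for one transition (j, i).

    Assumption: each covariance matrix is diagonal, so variances[m][k]
    is the variance of component k in the m-th normal distribution.
    """
    p = len(x)
    g = []
    for lam, mu, var in zip(weights, means, variances):
        # Equation (7) with a diagonal covariance matrix.
        d2 = sum((xk - mk) ** 2 / vk for xk, mk, vk in zip(x, mu, var))
        det = math.prod(var)  # determinant of a diagonal matrix
        # Equation (6): unweighted normal density.
        b = (2 * math.pi) ** (-p / 2) * det ** -0.5 * math.exp(-d2 / 2)
        g.append(lam * b)  # Equation (5)
    return g


def output_prob_sum(g: Sequence[float]) -> float:
    """Assumed form of equation (4): sum of the weighted probabilities."""
    return sum(g)


def output_prob_max(g: Sequence[float]) -> float:
    """Equation (8) as described: the largest weighted probability."""
    return max(g)
```

The maximum form avoids the summation and keeps only the dominant mixture component, which is the variant the text describes for equation (8).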

【０２１５】[0215]

【数１９】 [Equation 19]

【０２１６】Furthermore, writing the logarithmic transition probability as A_ji = ln(a_ji), the logarithmic output probability as B_ji(x_t) = ln{b_ji(x_t)}, and the logarithmic forward probability as C_it = ln(c_it), equations (1) to (3) can be transformed to give equations (9) to (11) for calculating the likelihood ln{P(x_1, x_2, ..., x_T)}.
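The equation images for (1), (3) and (9) to (11) are not reproduced in this text. Read off the surrounding description, they would presumably take the following form. This is a reconstruction under that assumption, not a transcription of the original equations; equation (2) is quoted from the text.

```latex
\begin{align}
P(x_1,\dots,x_T) &= \max_{i={*i}} c_{iT} \tag{1}\\
c_{i0} &= \Phi_i \tag{2}\\
c_{it} &= \max_{j}\; c_{j(t-1)}\, a_{ji}\, b_{ji}(x_t) \tag{3}\\
\ln P(x_1,\dots,x_T) &= \max_{i={*i}} C_{iT} \tag{9}\\
C_{i0} &= \ln \Phi_i \tag{10}\\
C_{it} &= \max_{j}\,\bigl\{ C_{j(t-1)} + A_{ji} + B_{ji}(x_t) \bigr\} \tag{11}
\end{align}
```

Because the logarithm is monotonic, taking ln of (1) and (3) turns the products into sums without changing which path attains the maximum.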

【０２１７】[0217]

【数２０】 (Equation 20)

【０２１８】Since equations (9) to (11) are recurrence relations in t, the logarithmic forward probabilities C_it for t = 1, 2, ..., T can be calculated sequentially, as in the following equations (12) to (16).
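A minimal sketch of this sequential calculation in the log domain is given below. It assumes that equation (11) has the form C_it = max_j {C_j(t-1) + A_ji + B_ji(x_t)}, which follows from the description of equation (3) but is not reproduced in this text. The function name and the argument layout are hypothetical.

```python
import math
from typing import Iterable, List, Sequence

NEG_INF = float("-inf")  # ln(0), used for impossible transitions


def viterbi_log_likelihood(log_init: Sequence[float],
                           log_trans: Sequence[Sequence[float]],
                           log_out: Sequence[Sequence[Sequence[float]]],
                           final_states: Iterable[int]) -> float:
    """Return the log likelihood of one HMM for one utterance.

    log_init[i]      = ln(Phi_i), the initial value C_i0
    log_trans[j][i]  = A_ji
    log_out[t][j][i] = B_ji(x_t) for frames t = 0 .. T-1
    final_states     = indices i with S_i in F
    """
    n = len(log_init)
    C: List[float] = list(log_init)
    for B_t in log_out:
        # Assumed equation (11): best predecessor j for every state i.
        C = [max(C[j] + log_trans[j][i] + B_t[j][i] for j in range(n))
             for i in range(n)]
    # Largest C_iT over the final states.
    return max(C[i] for i in final_states)
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow over long utterances.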

【０２１９】[0219]

【数２１】 (Equation 21)

【０２２０】When the HMM matching unit 30 has obtained the logarithmic forward probabilities C_iT at t = T for all i = 1, 2, ..., I, it takes the largest C_iT among those with i = *i as the likelihood ln{P(x_1, x_2, ..., x_T)}. For all the HMMs stored in the dictionary unit 24, the likelihood ln{P(x_1, x_2, ..., x_T)} is obtained for each HMM, and the category z assigned to the HMM with the maximum likelihood is output as the recognition result for the input speech signal from which the time series x_1, x_2, ..., x_T was obtained.

【０２２１】Next, for the first embodiment of the invention of claim 3, the flow of processing for obtaining the likelihood between an HMM and the time series x_1, x_2, ..., x_T of speech feature vectors is described, focusing on a single HMM. FIGS. 9 to 11 show the flow of processing for this single HMM. In this example, the output probability b_ji(x_t), the forward probability c_it, and the reference probability b_ji are taken to be the logarithmic output probability B_ji(x_t), the logarithmic forward probability C_it, and the logarithmic reference probability B_ji, respectively, and the description assumes i = j = 1, 2, ..., I.

【０２２２】When the matching unit 30 receives the section information and the speech feature vectors x_t from the speech section detection unit 28, it sets the initial value C_i0 of the logarithmic forward probability according to equation (10) for all i = 1, 2, ..., I (S1).

【０２２３】Next, the matching unit 30 initializes the current frame number t to t = 1 in order to process the start frame of the speech section (S2).

【０２２４】Next, for all j, i with j = 1, 2, ..., J and i = 1, 2, ..., I, the matching unit 30 obtains the logarithmic output probability B_ji(x_1) according to equations (4) to (7) (S3) and writes it as the initial value of the logarithmic reference probability B_ji (S4).

【０２２５】The reference information storage unit 32 is provided with a storage area save B_ji for storing the reference probability B_ji, individually for each j, i with j = 1, 2, ..., J and i = 1, 2, ..., I. The reference information storage unit 32 therefore has J × I storage areas that individually store B_11, B_12, ..., B_1I, B_21, B_22, ..., B_2I, ..., B_J1, B_J2, ..., B_JI. Accordingly, in the figures the process of storing the initial value of the reference probability B_ji is written save B_ji = B_ji(x_1).

【０２２６】Next, the matching unit 30 initializes the stationary-part reference frame number qs and the transient-part reference frame number qt each to the current frame number 1 (S5), and then obtains the logarithmic forward probability C_i1 according to equation (11) for all i = 1, 2, ..., I (S6).

【０２２７】Next, the matching unit 30 adds 1 to the current frame number t in order to process the next frame of the speech section (S7), and then compares the current frame number t with the frame number T of the end frame to determine whether processing has been completed for all frames in the speech section (S8).

【０２２８】(2-1A: when t ≦ T in S8) If in S8 the current frame number t is less than or equal to the frame number T of the end frame, processing has not yet been completed for all frames of the speech section. The matching unit 30 therefore obtains the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary-part reference frame number qs according to the following equation (17) (S9).

【０２２９】[0229]

【数２２】 (Equation 22)

【０２３０】where
x_tk: vector component of the speech feature vector x_t of the current frame number t
x_qsk: vector component of the speech feature vector x_qs of the stationary-part reference frame number qs
Next, the matching unit 30 compares the stationary-part distance dts with the threshold DTS to determine whether the vectors x_t and x_qs are approximately equal (S10).

【０２３１】If the distance dts exceeds the threshold DTS in S10, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed substantially from x_qs. The stationary-part reference frame number qs is therefore rewritten to the current frame number t, and the information TRUE, indicating dts > DTS, is written as the stationary-part comparison result mode s (S11).

【０２３２】If the distance dts is less than or equal to the threshold DTS in S10, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed little from x_qs. The stationary-part reference frame number qs is therefore not rewritten, and the information FALSE, indicating dts ≦ DTS, is written as the stationary-part comparison result mode s (S12).
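Steps S9 to S12 can be sketched as a single check, shown below. Equation (17) is not reproduced in this text, so the Euclidean distance used here is an assumption, and the names `euclidean` and `check_reference` are hypothetical. Frames are indexed from 0 in the sketch, whereas the text numbers them from 1.

```python
import math
from typing import Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed distance measure; the source's equation (17) may differ."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def check_reference(t: int, q: int,
                    frames: Sequence[Sequence[float]],
                    threshold: float) -> Tuple[int, bool]:
    """Compare frame t with reference frame q (S9, S10).

    Returns (new reference frame number, mode), where mode is True when
    the distance exceeds the threshold (S11) and False otherwise (S12).
    The reference frame moves to t only when the threshold is exceeded.
    """
    exceeded = euclidean(frames[t], frames[q]) > threshold
    return (t if exceeded else q), exceeded
```

The same check would serve the transient part by passing the transient-part reference frame number qt and the threshold DTT in place of qs and DTS.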

【０２３３】After completing S11 or S12, the matching unit 30 obtains the distance dtt between the speech feature vector x_t of the current frame number t and the speech feature vector x_qt of the transient-part reference frame number qt according to the following equation (18) (S13).

【０２３４】[0234]

【数２３】 (Equation 23)

【０２３５】where
x_tk: vector component of the speech feature vector x_t of the current frame number t
x_qtk: vector component of the speech feature vector x_qt of the transient-part reference frame number qt
Next, the matching unit 30 compares the transient-part distance dtt with the threshold DTT to determine whether the vectors x_t and x_qt are approximately equal (S14).

【０２３６】If the distance dtt exceeds the threshold DTT in S14, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed substantially from x_qt. The transient-part reference frame number qt is therefore rewritten to the current frame number t, and the information TRUE, indicating dtt > DTT, is written as the transient-part comparison result mode t (S15).

【０２３７】If the distance dtt is less than or equal to the threshold DTT in S14, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed little from x_qt. The transient-part reference frame number qt is therefore not rewritten, and the information FALSE, indicating dtt ≦ DTT, is written as the transient-part comparison result mode t (S16).

【０２３８】After completing S15 or S16, the matching unit 30 sets the number j of the transition source S_j (the number assigned to the transition source S_j of a state transition in the hidden Markov model) to the initial value 1 (S17), and then determines whether the number j of the transition source S_j exceeds the maximum number J (here J = I) (S18).

【０２３９】If j ≦ J in S18, the matching unit 30 next determines whether the type s assigned to the transition source S_j is stationary part or transient part (S19).

【０２４０】If the type determined in S19 is stationary part, the matching unit 30 next refers to the stationary-part comparison result mode s to determine whether the stationary-part distance dts exceeded the threshold DTS (S20).

【０２４１】If in S20 the comparison result mode s is TRUE, indicating that dts > DTS, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed substantially from x_qs, so the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji. The matching unit 30 therefore obtains the logarithmic output probability B_ji(x_t) according to equations (4) to (7) for all j, i with j = 1, 2, ..., J and i = 1, 2, ..., I, and rewrites the reference probability B_ji to that output probability B_ji(x_t) (S21). Next, in order to process the next number j, the matching unit 30 adds 1 to the number j of the transition source S_j (S22) and then performs S18. In the figures, the process of rewriting the reference probability B_ji in S21 is written save B_ji = B_ji(x_t).

【０２４２】If in S20 the comparison result mode s is FALSE, indicating that dts ≦ DTS, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed little from x_qs, so the output probability B_ji(x_t) of the current frame number t can be approximated by the reference probability B_ji. The matching unit 30 therefore skips S21, that is, it neither obtains the output probability B_ji(x_t) according to equations (4) to (7) nor rewrites the reference probability B_ji. In order to process the next number j, it adds 1 to the number j of the transition source S_j (S22) and then performs S18.

【０２４３】If the type determined in S19 is transient part, the matching unit 30 next refers to the transient-part comparison result mode t to determine whether the transient-part distance dtt exceeded the threshold DTT (S23).

【０２４４】If in S23 the comparison result mode t is TRUE, indicating that dtt > DTT, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed substantially from x_qt, so the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji. The matching unit 30 therefore obtains the logarithmic output probability B_ji(x_t) according to equations (4) to (7) for all j, i with j = 1, 2, ..., J and i = 1, 2, ..., I, and rewrites the reference probability B_ji to that output probability B_ji(x_t) (S21). Next, in order to process the next number j, the matching unit 30 adds 1 to the number j of the transition source S_j (S22) and then performs S18.

【０２４５】If in S23 the comparison result mode t is FALSE, indicating that dtt ≦ DTT, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed little from x_qt, so the output probability B_ji(x_t) of the current frame number t can be approximated by the reference probability B_ji. The matching unit 30 therefore skips S21, that is, it neither obtains the output probability B_ji(x_t) according to equations (4) to (7) nor rewrites the reference probability B_ji. In order to process the next number j, it adds 1 to the number j of the transition source S_j (S22) and then performs S18.

【０２４６】When the processing of S19 to S23 has been completed for all j = 1, 2, ..., J, the determination j > J (here J = I) is obtained in S18. If j > J in S18, the matching unit 30 reads out each reference probability B_ji and obtains the forward probability C_it according to equation (11) for all i = 1, 2, ..., I (S24). The process then returns to S7 in order to process the next frame of the speech section.
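The flow S1 to S24 for a single HMM can be sketched as follows. Several points are assumptions: the distance in equations (17) and (18) is taken to be Euclidean, equation (11) is taken to be C_it = max_j {C_j(t-1) + A_ji + B_ji(x_t)}, and S21 is read as recomputing the row of reference probabilities for the current transition source j only. The callback `log_out` stands in for equations (4) to (7), and all names are hypothetical. Frames are indexed from 0.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Vec = Sequence[float]


def dist(x: Vec, y: Vec) -> float:
    """Assumed Euclidean distance for equations (17) and (18)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cached_log_likelihood(xs: Sequence[Vec],
                          log_init: Sequence[float],
                          log_trans: Sequence[Sequence[float]],
                          log_out: Callable[[int, int, Vec], float],
                          is_stationary: Sequence[bool],
                          final_states: Iterable[int],
                          dts_max: float,
                          dtt_max: float) -> Tuple[float, int]:
    """Return (log likelihood, number of log_out evaluations)."""
    n = len(log_init)
    C: List[float] = list(log_init)                              # S1
    save_B = [[log_out(j, i, xs[0]) for i in range(n)]           # S3, S4
              for j in range(n)]
    evals = n * n
    qs = qt = 0                                                  # S5

    def step(prev: Sequence[float]) -> List[float]:
        # Assumed equation (11), using the stored reference probabilities.
        return [max(prev[j] + log_trans[j][i] + save_B[j][i] for j in range(n))
                for i in range(n)]

    C = step(C)                                                  # S6
    for t in range(1, len(xs)):                                  # S7, S8
        mode_s = dist(xs[t], xs[qs]) > dts_max                   # S9 to S12
        if mode_s:
            qs = t
        mode_t = dist(xs[t], xs[qt]) > dtt_max                   # S13 to S16
        if mode_t:
            qt = t
        for j in range(n):                                       # S17 to S23
            exceeded = mode_s if is_stationary[j] else mode_t
            if exceeded:                                         # S21
                save_B[j] = [log_out(j, i, xs[t]) for i in range(n)]
                evals += n
        C = step(C)                                              # S24
    return max(C[i] for i in final_states), evals
```

The second return value counts how often the output probability was actually evaluated, which makes the saving from reusing the reference probabilities visible when the thresholds are varied.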

【０２４７】(2-1B: when t > T in S8) If in S8 the current frame number t is greater than the frame number T of the end frame, the forward probabilities C_iT have been obtained for all i = 1, 2, ..., I. Therefore, according to equation (9), the largest of the forward probabilities C_iT with i = *i is taken as the likelihood ln{P(x_1, x_2, ..., x_T)} between the speech feature vector time series x_1, x_2, ..., x_T and the HMM, and the process of obtaining the likelihood for this HMM then ends (END).

【０２４８】For all the HMMs stored in the dictionary unit 24, the matching unit 30 performs the processing of S1 to S23 shown in FIGS. 9 to 11 for each HMM to obtain the likelihood (forward probability C_iT), and outputs the category of the HMM with the maximum likelihood to the next-stage device (not shown) as the recognition result for the input speech signal from which the time series x_1, x_2, ..., x_T of speech feature vectors was extracted.

【０２４９】As described above, in the process of obtaining the likelihood ln{P(x_1, x_2, ..., x_T)} = C_iT, when the transition source S_j is a stationary part and the distance dts is less than or equal to the threshold DTS, or when the transition source S_j is a transient part and the distance dtt is less than or equal to the threshold DTT, the forward probability C_it is obtained without calculating the output probability B_ji(x_t) from equations (4) to (7), so the amount of computation can be greatly reduced. Moreover, because this simplification is applied only in those cases, the error in the forward probability C_it can be kept small even though the computation is simplified.

【０２５０】Because the speech feature vectors x_t extracted in time sequence from a transient part of the speech signal change greatly, when the type s of the transition source S_j is transient part it is desirable to reduce the error in the forward probability C_it by setting the transient-part threshold DTT small.

【０２５１】In contrast, because the speech feature vectors x_t extracted in time sequence from a stationary part of the speech signal change little, when the type s of the transition source S_j is stationary part the error in the forward probability C_it can be kept small even if the stationary-part threshold DTS is made large.

【０２５２】Accordingly, by using a large value for the stationary-part threshold DTS and a small value for the transient-part threshold DTT, the amount of computation can be reduced while keeping the error in the forward probability C_it as small as possible.

【０２５３】According to simulation results obtained by the inventor of this application, in the example shown in FIGS. 9 to 11, even when the stationary-part threshold DTS and the transient-part threshold DTT were set so that the amount of computation for obtaining the forward probability C_it was about 1/5 of that without the simplification, there was no marked difference in recognition accuracy between this example and the case without the simplification. On the contrary, there were many instances in which recognition accuracy improved.

【０２５４】<Second Embodiment of the Invention of Claim 3> A speech recognition apparatus suitable for carrying out the second embodiment of the invention of claim 3 is the speech recognition apparatus 10 with the same configuration as described above, except that the matching unit 30 is configured as follows.

【０２５５】That is, when obtaining the likelihood, the matching unit 30 uses the reference probabilities b_ji stored in the reference information storage unit 32 to obtain the forward probability c_it for each of t = 1, 2, ..., T in sequence, as follows.

【０２５６】(1) When t = 1, the stationary-part skip count skips and the transient-part skip count skipt are each initialized to 0, and the stationary-part reference frame number qs and the transient-part reference frame number qt are each initialized to 1. In addition, process (2A) is performed: for all j and i, the output probability b_ji(x_t) is obtained from the hidden Markov model and written as the initial value of the reference probability b_ji, and after the reference probabilities b_ji have been written, each reference probability b_ji is read out to obtain the forward probability c_it.

【０２５７】After process (2A) ends, process (2B) is performed, which adds 1 to the current frame number t.

【０２５８】(2) When 2 ≤ t ≤ T, processes (2C) and (2D) are performed. In process (2C), the stationary-part skip count skips is compared with the threshold NSKIPS, and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary-part reference frame number qs is compared with the threshold DTS. If skips > NSKIPS or dts > DTS, skips is initialized to 0 and qs is rewritten to the current frame number t; if skips ≤ NSKIPS and dts ≤ DTS, 1 is added to skips. In process (2D), the transient-part skip count skipt is compared with the threshold NSKIPT, and the distance dtt between x_t and the speech feature vector x_qt of the transient-part reference frame number qt is compared with the threshold DTT. If skipt > NSKIPT or dtt > DTT, skipt is initialized to 0 and qt is rewritten to the current frame number t; if skipt ≤ NSKIPT and dtt ≤ DTT, 1 is added to skipt.
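
Processes (2C) and (2D) apply the same rule to two independent sets of variables. The following is a minimal sketch of that rule, not the patent's implementation; the names `RefState` and `update_reference` are hypothetical, and the distance is assumed to have been computed by the caller.

```python
from dataclasses import dataclass


@dataclass
class RefState:
    """Bookkeeping for one part type (stationary or transient)."""
    skips: int = 0  # frames that reused the cache since the last refresh
    q: int = 1      # reference frame number (qs or qt)


def update_reference(state: RefState, t: int, dist: float,
                     nskip_max: int, dist_max: float) -> bool:
    """Apply process (2C)/(2D) at frame t.

    Returns True when the cached reference probabilities must be
    recomputed (skip count or distance exceeded its threshold).
    """
    if state.skips > nskip_max or dist > dist_max:
        state.skips = 0   # reset the skip count
        state.q = t       # current frame becomes the reference frame
        return True
    state.skips += 1      # reuse the cache for one more frame
    return False
```

One `RefState` would be kept for the stationary part (with NSKIPS and DTS) and another for the transient part (with NSKIPT and DTT).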

【０２５９】After processes (2C) and (2D) end, process (2E) is performed for each j of j = 1, 2, ..., J: the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) is determined.

【０２６０】Process (2F) is then performed according to the type determined in process (2E). When the type is the stationary part and the comparison result of process (2C) is skips > NSKIPS or dts > DTS, then for that j and for all i, the output probability b_ji(x_t) is obtained from the hidden Markov model and the reference probability b_ji is rewritten to that output probability. When the type is the stationary part and the comparison result of process (2C) is skips ≤ NSKIPS and dts ≤ DTS, the reference probability b_ji is not rewritten for that j. When the type is the transient part and the comparison result of process (2D) is skipt > NSKIPT or dtt > DTT, then for that j and for all i, the output probability b_ji(x_t) is obtained from the hidden Markov model and the reference probability b_ji is rewritten to that output probability. When the type is the transient part and the comparison result of process (2D) is skipt ≤ NSKIPT and dtt ≤ DTT, the reference probability b_ji is not rewritten for that j.

【０２６１】Process (2F) is performed for each j of j = 1, 2, ..., J. Once it has been completed for all j, process (2G) is performed, in which each reference probability b_ji is read out to obtain the forward probability c_it.
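
The per-state selection in processes (2E) and (2F) can be sketched as below. This is an illustration under assumptions: `refresh_cache`, the `kinds` list, and the `output_logprob` callback are hypothetical names, indices are 0-based rather than the 1-based indices of the text, and the two boolean flags stand for the comparison results of processes (2C) and (2D).

```python
def refresh_cache(save_B, kinds, x_t, mode_s, mode_t, output_logprob):
    """Rewrite cached output probabilities only where required.

    save_B[j][i]   : cached reference probability for transition j -> i
    kinds[j]       : "stationary" or "transient" (type of source state j)
    mode_s, mode_t : True when the stationary / transient comparison
                     demanded a refresh for this frame
    output_logprob : callable (j, i, x_t) -> value from the HMM
    """
    for j, kind in enumerate(kinds):                 # process (2E)
        refresh = mode_s if kind == "stationary" else mode_t
        if refresh:                                  # process (2F)
            for i in range(len(save_B[j])):
                save_B[j][i] = output_logprob(j, i, x_t)
        # otherwise the row is left untouched and reused as is
    return save_B
```

Only the rows whose part type requested a refresh incur the cost of evaluating the output probability; all other rows are read back unchanged in process (2G).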

【０２６２】After process (2G) ends, process (2H) is performed, which adds 1 to the current frame number t.

【０２６３】Next, for the second embodiment of the invention of claim 3, the flow of processing for obtaining the likelihood between an HMM and the time series of speech feature vectors x₁, x₂, ..., x_T is described, focusing on a single HMM. FIGS. 12 to 14 show this flow of processing for one HMM. In this example, the output probability b_ji(x_t), the forward probability c_it, and the reference probability b_ji are replaced by their logarithms, namely the log output probability B_ji(x_t), the log forward probability C_it, and the log reference probability B_ji, and the description takes i = j = 1, 2, ..., I.

【０２６４】When the matching unit 30 receives the section information and the speech feature vectors x_t from the speech section detection unit 28, it sets the initial value C_i0 of the log forward probability according to equation (10) for all i of i = 1, 2, ..., I (S1).

【０２６５】Next, the matching unit 30 initializes the current frame number t to t = 1 in order to process the start frame of the speech section (S2).

【０２６６】Next, for all j and i of j = 1, 2, ..., J and i = 1, 2, ..., I, the matching unit 30 obtains the log output probability B_ji(x₁) according to equations (4) to (7) (S3) and writes it as the initial value of the log reference probability B_ji (S4).

【０２６７】The reference information storage unit 32 provides a separate storage area save B_ji for the reference probability B_ji for each pair of j = 1, 2, ..., J and i = 1, 2, ..., I. The reference information storage unit 32 therefore has J × I storage areas that individually store B₁₁, B₁₂, ..., B_1I, B₂₁, B₂₂, ..., B_2I, ..., B_J1, B_J2, ..., B_JI. In the figures, the process of storing the initial value of the reference probability B_ji is accordingly written as save B_ji = B_ji(x₁).
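
The J × I storage areas and their initialization in S3 and S4 could be represented as a nested list, as in the sketch below. The function name `init_reference_table` and the `output_logprob` callback are hypothetical, and indices are 0-based here.

```python
def init_reference_table(J, I, x1, output_logprob):
    """Build the J x I table save_B with save_B[j][i] = B_ji(x1).

    output_logprob is a caller-supplied callable (j, i, x) -> float
    standing in for equations (4) to (7), which are not shown here.
    """
    return [[output_logprob(j, i, x1) for i in range(I)]
            for j in range(J)]
```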

【０２６８】Next, the matching unit 30 initializes the stationary-part skip count skips and the transient-part skip count skipt to 0 and initializes the stationary-part reference frame number qs and the transient-part reference frame number qt to the current frame number 1 (S5). It then obtains the log forward probability C_i1 according to equation (11) for all i of i = 1, 2, ..., I (S6).

【０２６９】Next, the matching unit 30 adds 1 to the current frame number t in order to process the next frame of the speech section (S7), and then compares the current frame number t with the frame number T of the end frame to determine whether processing has been completed for all frames in the speech section (S8).

【０２７０】(2-2A: when t ≤ T in S8) If the current frame number t is less than or equal to the end frame number T in S8, processing has not yet been completed for all frames of the speech section, so the stationary-part skip count skips is compared with the threshold NSKIPS (S9).

【０２７１】If skips exceeds the threshold NSKIPS in S9, the number of times skips that the stationary-part distance dts has been less than or equal to the threshold DTS has exceeded NSKIPS. The temporal separation between the current frame number t and the stationary-part reference frame number qs has therefore become large, and the error is likely to grow. Accordingly, skips is initialized to 0, qs is rewritten to the current frame number t, and the value TRUE, indicating that skips > NSKIPS or dts > DTS held, is written as the stationary-part comparison result mode s (S10).

【０２７２】If skips is less than or equal to the threshold NSKIPS in S9, the matching unit 30 next obtains the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary-part reference frame number qs according to equation (17) (S11). It then compares dts with the threshold DTS to determine whether the vectors x_t and x_qs are approximately equal (S12).

【０２７３】If the distance dts exceeds the threshold DTS in S12, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed substantially from x_qs. Accordingly, skips is initialized to 0, qs is rewritten to the current frame number t, and the value TRUE, indicating that skips > NSKIPS or dts > DTS held, is written as the stationary-part comparison result mode s (S10).

【０２７４】If the distance dts is less than or equal to the threshold DTS in S12, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed only slightly from x_qs. Accordingly, 1 is added to skips to count up the stationary-part skip count, and the value FALSE, indicating that skips ≤ NSKIPS and dts ≤ DTS held, is written as the stationary-part comparison result mode s (S13).
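
Steps S9 to S13 evaluate the distance only when the skip count has not already forced a refresh. The sketch below illustrates that ordering. It is not the patent's code: `stationary_step` is a hypothetical name, and the Euclidean distance is an assumption standing in for equation (17), which is not reproduced in this part of the text.

```python
import math


def stationary_step(t, skips, qs, frames, NSKIPS, DTS):
    """Run S9-S13 for frame t; return (skips, qs, mode_s).

    frames maps a frame number to its feature vector (a tuple).
    mode_s is True when the cached probabilities must be refreshed.
    """
    if skips > NSKIPS:                        # S9: cache reused too long
        return 0, t, True                     # S10
    dts = math.dist(frames[t], frames[qs])    # S11 (assumed Euclidean)
    if dts > DTS:                             # S12: vectors differ
        return 0, t, True                     # S10
    return skips + 1, qs, False               # S13: keep the cache
```

Steps S14 to S18 for the transient part follow the same pattern with skipt, qt, NSKIPT, DTT, and mode t.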

【０２７５】After the processing of S10 or S13 ends, the matching unit 30 next compares the transient-part skip count skipt with the threshold NSKIPT (S14).

【０２７６】If skipt exceeds the threshold NSKIPT in S14, the number of times skipt that the transient-part distance dtt has been less than or equal to the threshold DTT has exceeded NSKIPT. The temporal separation between the current frame number t and the transient-part reference frame number qt has therefore become large, and the error is likely to grow. Accordingly, skipt is initialized to 0, qt is rewritten to the current frame number t, and the value TRUE, indicating that skipt > NSKIPT or dtt > DTT held, is written as the transient-part comparison result mode t (S15).

【０２７７】If skipt is less than or equal to the threshold NSKIPT in S14, the matching unit 30 next obtains the distance dtt between the speech feature vector x_t of the current frame number t and the speech feature vector x_qt of the transient-part reference frame number qt according to equation (18) (S16). It then compares dtt with the threshold DTT to determine whether the vectors x_t and x_qt are approximately equal (S17).

【０２７８】If the distance dtt exceeds the threshold DTT in S17, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed substantially from x_qt. Accordingly, skipt is initialized to 0, qt is rewritten to the current frame number t, and the value TRUE, indicating that skipt > NSKIPT or dtt > DTT held, is written as the transient-part comparison result mode t (S15).

【０２７９】If the distance dtt is less than or equal to the threshold DTT in S17, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed only slightly from x_qt. Accordingly, 1 is added to skipt to count up the transient-part skip count, and the value FALSE, indicating that skipt ≤ NSKIPT and dtt ≤ DTT held, is written as the transient-part comparison result mode t (S18).

【０２８０】After the processing of S15 or S18 ends, the matching unit 30 next sets the number j of the transition source S_j (the number assigned to the transition source S_j of a state transition in the hidden Markov model) to its initial value 1 (S19), and then determines whether j exceeds the largest number J (here J = I) (S20).

【０２８１】If j ≤ J in S20, the matching unit 30 next determines whether the type s assigned to the transition source S_j is the stationary part or the transient part (S21).

【０２８２】If the type determined in S21 is the stationary part, the matching unit 30 next refers to the stationary-part comparison result mode s to determine the outcome of comparing the skip count skips with the threshold NSKIPS and the distance dts with the threshold DTS (S22).

【０２８３】If the comparison result mode s in S22 is TRUE, indicating that skips > NSKIPS or dts > DTS held, the matching unit 30 obtains, for all j and i of j = 1, 2, ..., J and i = 1, 2, ..., I, the log output probability B_ji(x_t) according to equations (4) to (7) and rewrites the reference probability B_ji to that output probability (S23). Next, in order to process the next number j, the matching unit 30 adds 1 to the number j of the transition source S_j (S24) and then performs the processing of S20. In the figures, the process of rewriting the reference probability B_ji in S23 is written as save B_ji = B_ji(x_t).

【０２８４】If skips > NSKIPS, the number of times skips that the stationary-part distance dts has been less than or equal to the threshold DTS has exceeded the threshold NSKIPS, so the temporal separation between the current frame number t and the stationary-part reference frame number qs is large and the error is likely to grow. The reference probability B_ji is therefore rewritten to reduce the error.

【０２８５】If dts > DTS, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed substantially from x_qs, so the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji. The reference probability B_ji is therefore rewritten.

【０２８６】If the comparison result mode s in S22 is FALSE, indicating that skips ≤ NSKIPS and dts ≤ DTS held, the matching unit 30 does not perform the processing of S23. That is, it neither obtains the output probability B_ji(x_t) according to equations (4) to (7) nor rewrites the reference probability B_ji. Instead, in order to process the next number j, it adds 1 to the number j of the transition source S_j (S24) and then performs the processing of S20.

【０２８７】If skips ≤ NSKIPS and dts ≤ DTS, then because skips ≤ NSKIPS, the number of times skips that the stationary-part distance dts has been less than or equal to the threshold DTS does not exceed the threshold NSKIPS. The temporal separation between the current frame number t and the stationary-part reference frame number qs is therefore small, and the error is unlikely to grow. Moreover, because dts ≤ DTS, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qs of the stationary-part reference frame number qs; that is, x_t has changed only slightly from x_qs, so the output probability B_ji(x_t) of the current frame number t can be approximated by the reference probability B_ji. The reference probability B_ji is therefore read out without being rewritten and used to obtain the forward probability C_it.

【０２８８】If the type determined in S21 is the transient part, the matching unit 30 next refers to the transient-part comparison result mode t to determine the outcome of comparing the skip count skipt with the threshold NSKIPT and the distance dtt with the threshold DTT (S25).

【０２８９】If the comparison result mode t in S25 is TRUE, indicating that skipt > NSKIPT or dtt > DTT held, the matching unit 30 obtains, for all j and i of j = 1, 2, ..., J and i = 1, 2, ..., I, the log output probability B_ji(x_t) according to equations (4) to (7) and rewrites the reference probability B_ji to that output probability (S23). Next, in order to process the next number j, the matching unit 30 adds 1 to the number j of the transition source S_j (S24) and then performs the processing of S20.

【０２９０】If skipt > NSKIPT, the number of times skipt that the transient-part distance dtt has been less than or equal to the threshold DTT has exceeded the threshold NSKIPT, so the temporal separation between the current frame number t and the transient-part reference frame number qt is large and the error is likely to grow. The reference probability B_ji is therefore rewritten to reduce the error.

【０２９１】If dtt > DTT, the speech feature vector x_t of the current frame number t does not approximate the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed substantially from x_qt, so the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji. The reference probability B_ji is therefore rewritten to reduce the error.

【０２９２】If the comparison result mode t in S25 is FALSE, indicating that skipt ≤ NSKIPT and dtt ≤ DTT held, the matching unit 30 does not perform the processing of S23. That is, it neither obtains the output probability B_ji(x_t) according to equations (4) to (7) nor rewrites the reference probability B_ji. Instead, in order to process the next number j, it adds 1 to the number j of the transition source S_j (S24) and then performs the processing of S20.

【０２９３】If skipt ≤ NSKIPT and dtt ≤ DTT, then because skipt ≤ NSKIPT, the number of times skipt that the transient-part distance dtt has been less than or equal to the threshold DTT does not exceed the threshold NSKIPT. The temporal separation between the current frame number t and the transient-part reference frame number qt is therefore small, and the error is unlikely to grow. Moreover, because dtt ≤ DTT, the speech feature vector x_t of the current frame number t is approximately equal to the speech feature vector x_qt of the transient-part reference frame number qt; that is, x_t has changed only slightly from x_qt, so the output probability B_ji(x_t) of the current frame number t can be approximated by the reference probability B_ji. The reference probability B_ji is therefore not rewritten.

【０２９４】When the processing of S20 to S25 has been completed for all j of j = 1, 2, ..., J, the determination in S20 yields j > J (here J = I). If j > J in S20, the matching unit 30 next reads out each reference probability B_ji and obtains the forward probability C_it according to equation (11) for all i of i = 1, 2, ..., I (S26). It then returns to the processing of S7 in order to process the next frame of the speech section.
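
Step S26 reads the cached table instead of re-evaluating the output probabilities. Equation (11) is not reproduced in this part of the text, so the sketch below assumes a common log-domain form, C_it = max over j of (C_j,t-1 + A_ji + B_ji), where A_ji is the log transition probability. Both this form and the names `forward_step` and `log_A` are assumptions made for illustration.

```python
NEG_INF = float("-inf")  # log of probability 0 (forbidden transition)


def forward_step(C_prev, log_A, save_B):
    """One assumed log-domain forward update using cached probabilities.

    C_prev[j]    : log forward probability of state j at frame t-1
    log_A[j][i]  : log transition probability from state j to state i
    save_B[j][i] : cached log output probability (reference probability)
    Returns the list C_t with one entry per destination state i.
    """
    n = len(C_prev)
    return [max(C_prev[j] + log_A[j][i] + save_B[j][i] for j in range(n))
            for i in range(n)]
```

Whatever the exact form of equation (11), the point illustrated is that the update only reads `save_B`; the cost of equations (4) to (7) is paid solely in S23.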

【０２９５】(2-2B: when t > T in S8) If the current frame number t is greater than the frame number T of the end frame in S8, the forward probability C_iT has been obtained for all i of i = 1, 2, ..., I. Therefore, according to equation (9), the largest of the forward probabilities C_iT for which i = *i is taken as the likelihood ln{P(x₁, x₂, ..., x_T)} between the speech feature vector time series x₁, x₂, ..., x_T and the HMM, and the process of obtaining the likelihood for this HMM then ends (END).

【０２９６】For every HMM stored in the dictionary unit 24, the matching unit 30 performs the processing of S1 to S26 shown in FIGS. 12 to 14 to obtain the likelihood (forward probability C_iT). It then outputs the category of the HMM that gave the largest likelihood to the next-stage device (not shown) as the recognition result for the input speech signal from which the time series of speech feature vectors x₁, x₂, ..., x_T was extracted.
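
The final selection amounts to taking the category with the largest log likelihood. A minimal sketch, assuming the per-HMM likelihoods have already been collected into a dictionary keyed by category (the name `recognize` and the example categories are hypothetical):

```python
def recognize(likelihoods):
    """Return the category whose HMM gave the largest log likelihood.

    likelihoods: dict mapping category name -> log likelihood.
    """
    return max(likelihoods, key=likelihoods.get)
```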

【０２９７】As described above, in the course of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT, when the transition source S_j is a stationary part, the skip count skips is at most the threshold NSKIPS, and the distance dts is at most the threshold DTS, the forward probability C_it is obtained by reading out the reference probability B_ji, without computing the output probability B_ji(x_t) from equations (4) to (7). Likewise, when the transition source S_j is a transient part, the skip count skipt is at most the threshold NSKIPT, and the distance dtt is at most the threshold DTT, the forward probability C_it is obtained without computing the output probability B_ji(x_t) from equations (4) to (7). The amount of computation can therefore be greatly reduced. Moreover, because this simplification is applied only in one of these two cases, the error in the forward probability C_it can be kept small even though the computation is simplified.
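The reuse condition described above can be sketched as a small predicate. This is only an illustration: the function name, the argument names, and the default threshold values are assumptions and do not appear in the source.

```python
def can_reuse_reference(is_stationary, skip_count, distance,
                        nskips=4, dts=2.0, nskipt=1, dtt=0.5):
    """Return True when the stored reference probability B_ji may be reused.

    The default thresholds are placeholder values chosen for illustration.
    """
    if is_stationary:
        # Stationary transition source: compare against NSKIPS and DTS.
        return skip_count <= nskips and distance <= dts
    # Transient transition source: compare against NSKIPT and DTT.
    return skip_count <= nskipt and distance <= dtt
```

When the predicate is false, the output probability would be recomputed from equations (4) to (7) and the reference probability refreshed.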

【０２９８】In a transient part of the speech signal, the speech feature vector x_t extracted frame by frame changes greatly over time. When the type s of the transition source S_j is a transient part, it is therefore desirable to set the thresholds NSKIPT and DTT for transient parts to small values so that the error in the forward probability C_it stays small.

【０２９９】In a stationary part of the speech signal, by contrast, the speech feature vector x_t extracted frame by frame changes little over time. When the type s of the transition source S_j is a stationary part, the error in the forward probability C_it can therefore be kept small even if the thresholds NSKIPS and DTS for stationary parts are made large.

【０３００】Accordingly, by using large values for the thresholds NSKIPS and DTS for stationary parts and small values for the thresholds NSKIPT and DTT for transient parts, the amount of computation can be reduced while the error in the forward probability C_it is kept as small as possible.

【０３０１】The invention of claim 3 can be applied to any speech recognition device that performs matching on a frame-by-frame basis.

【０３０２】The type s (stationary part or transient part) assigned to the transition source S_j can be determined, for example, as described below.

【０３０３】The first example focuses on b_jim(x_t), one of the parameters that determine the output probability b_ji(x_t). As shown in equation (6), b_jim(x_t) = (2π)^(-p/2) |ρ_jim|^(-1/2) exp{−D_jimt²/2}. When the magnitude |ρ_jim| of the variance-covariance matrix in equation (6) exceeds an arbitrarily and suitably chosen threshold THL, the type s of the transition source S_j that gives the output probability b_ji(x_t) is judged to be a transient part. When |ρ_jim| is at most the threshold THL, the type s of that transition source S_j is judged to be a stationary part. In this case the magnitude |ρ_jim| represents the type s, and comparing |ρ_jim| with the threshold THL constitutes the determination of the type s.
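A minimal sketch of this first example follows. It assumes a diagonal covariance matrix, so that the magnitude |ρ_jim| reduces to the product of the per-component variances; the function name and the string labels are illustrative and not taken from the source.

```python
import math


def state_type_from_variances(variances, thl):
    """Classify a transition source from a diagonal covariance matrix.

    variances -- diagonal entries of the covariance matrix (assumed diagonal)
    thl       -- threshold THL
    """
    det = math.prod(variances)  # determinant of a diagonal matrix
    return "transient" if det > thl else "stationary"
```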

【０３０４】In the second example, information is assigned in advance to each transition source. When the state transition that gives the output probability b_ji(x_t) corresponds to a state transition of a vowel, the transition source S_j of that state transition is given information indicating that it is a stationary part. When the state transition corresponds to a state transition of a consonant, the transition source S_j is given information indicating that it is a transient part.

【０３０５】In the third example, information is likewise assigned in advance. When the state transition that gives the output probability b_ji(x_t) corresponds to a state transition of a vowel or of a consonant other than p, t, k, and r, the transition source S_j of that state transition is given information indicating that it is a stationary part. When the state transition corresponds to a state transition of the consonant p, t, k, or r, the transition source S_j is given information indicating that it is a transient part.
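The second and third examples amount to a lookup on the phoneme label of the state transition. The sketch below assumes romanized single-symbol phoneme labels and a five-vowel inventory; both, like the function names, are assumptions made for illustration.

```python
VOWELS = frozenset("aiueo")         # assumed vowel inventory
TRANSIENT_CONSONANTS = frozenset("ptkr")


def type_by_second_example(phoneme):
    """Vowels are stationary; every consonant is transient."""
    return "stationary" if phoneme in VOWELS else "transient"


def type_by_third_example(phoneme):
    """Only p, t, k, r are transient; vowels and other consonants are stationary."""
    return "transient" if phoneme in TRANSIENT_CONSONANTS else "stationary"
```

The two rules differ only for consonants other than p, t, k, and r, which the third example treats as stationary.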

【０３０６】<First Embodiment of the Invention of Claim 7> FIG. 15 is a functional block diagram showing a configuration example of a speech recognition device suitable for carrying out the first embodiment of the invention of claim 7.

【０３０７】The speech recognition device 34 shown in the figure includes a dictionary unit 36, an acoustic processing unit 38, a speech section detection unit 40, a collation unit 42, and a reference information storage unit 44.

【０３０８】The dictionary unit 36 stores a plurality of hidden Markov models prepared for each category as standard patterns for recognition and collation. The reference information storage unit 44 stores the forward probability reference frame number qc, the output probability reference frame number qs, and the reference probabilities b_ji.

【０３０９】The acoustic processing unit 38 extracts a speech feature vector from the input speech signal for each frame of a fixed time width. The speech section detection unit 40 detects a speech section from the input speech signal.

【０３１０】The collation unit 42 carries out the first embodiment of the invention of claim 7. Using the following equations (1) to (3), it obtains the likelihood ln{P(x₁, x₂, ..., x_T)} between each hidden Markov model and the time series x₁, x₂, ..., x_T of speech feature vectors extracted from the start frame to the end frame of the speech section. It takes the category assigned to the hidden Markov model that yields the maximum likelihood as the recognition result for the speech signal in that speech section.

【０３１１】[0311]

【数２４】 (Equation 24)

【０３１２】where
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
Φ_i: probability that the initial state of the hidden Markov model is S_i
a_ji: probability of a transition from state S_j to state S_i in the hidden Markov model
x_t: speech feature vector extracted in the t-th frame of the speech section (1 ≤ t ≤ T, where the first frame is the start frame of the speech section and the T-th frame is its end frame)
b_ji(x_t): output probability of the speech feature vector x_t that is output on a transition from state S_j to state S_i in the hidden Markov model
c_it: forward probability of starting from the initial state of the hidden Markov model, outputting the time series x₁, x₂, ..., x_t of speech feature vectors, and reaching state S_i
*i: state number i assigned to a state S_i that is a final state of the hidden Markov model

When the likelihood is obtained, the forward probabilities c_it for t = 1, 2, ..., T are obtained in sequence as follows, using the reference probabilities b_ji stored in the reference information storage unit 44.

【０３１３】(1) When t = 1, process (3A) is performed: the forward probability reference frame number qc and the output probability reference frame number qs are each initialized to 1; for all j and i, the output probability b_ji(x_t) is obtained from the hidden Markov model and written as the initial value of the reference probability b_ji; and after the reference probabilities b_ji have been written, each reference probability b_ji is read out to obtain the forward probability c_it. After process (3A) ends, process (3B) adds 1 to the current frame number t.

【０３１４】(2) When 2 ≤ t ≤ T, the following processes are performed. Process (3C) compares the distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc with the threshold DTC. When the comparison in process (3C) yields dtc ≤ DTC, process (3D) regards the forward probability c_it as equal to the forward probability c_i(t-1) of the immediately preceding frame and ends the computation of c_it. When the comparison in process (3C) yields dtc > DTC, process (3E) rewrites the forward probability reference frame number qc to the current frame number t.

【０３１５】After process (3E) ends, process (3F) is performed. The distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs is compared with the threshold DTS. When the comparison yields dts > DTS, the output probability reference frame number qs is rewritten to the frame number t, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i, the reference probability b_ji is rewritten to that output probability b_ji(x_t), and after the rewriting is complete each reference probability b_ji is read out to obtain the forward probability c_it. When the comparison yields dts ≤ DTS, each reference probability b_ji is read out without being rewritten to obtain the forward probability c_it.

【０３１６】After process (3D) or (3F) ends, process (3G) adds 1 to the current frame number t.
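Processes (3A) to (3G) can be sketched in the log domain as follows. This is a hedged illustration rather than the patented implementation: the function and parameter names are hypothetical, the number of source and destination states is assumed equal, frames are indexed from 0, and the distance measure and output-probability function are supplied by the caller.

```python
def forward_with_reference(xs, C0, A, log_output, distance, DTC, DTS):
    """Log-domain forward pass that reuses stored reference probabilities.

    xs         -- feature vectors x_1..x_T (xs[0] is frame 1)
    C0[i]      -- initial log forward probability of state i
    A[j][i]    -- log transition probability from state j to state i
    log_output -- callable (j, i, x) -> log output probability
    distance   -- callable (x, y) -> distance between two feature vectors
    Returns (C, evaluations): the final log forward probabilities and the
    number of frames whose output probabilities were actually computed.
    """
    n = len(C0)

    def compute_reference(x):
        return [[log_output(j, i, x) for i in range(n)] for j in range(n)]

    def step(C_prev, B):
        # Viterbi-style update: best predecessor j for each destination i.
        return [max(C_prev[j] + A[j][i] + B[j][i] for j in range(n))
                for i in range(n)]

    qc = qs = 0                               # reference frame indices
    B_ref = compute_reference(xs[0])          # (3A) initial reference values
    C = step(C0, B_ref)
    evaluations = 1
    for t in range(1, len(xs)):               # (3B)/(3G) advance the frame
        if distance(xs[t], xs[qc]) <= DTC:    # (3C)
            continue                          # (3D) keep previous C
        qc = t                                # (3E)
        if distance(xs[t], xs[qs]) > DTS:     # (3F) refresh reference values
            qs = t
            B_ref = compute_reference(xs[t])
            evaluations += 1
        C = step(C, B_ref)                    # (3F) update forward values
    return C, evaluations
```

With thresholds that are never satisfied (for example negative values), the sketch degenerates to an ordinary Viterbi pass that evaluates the output probabilities at every frame.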

【０３１７】FIG. 16 is a diagram for explaining the hidden Markov model. Each hidden Markov model (hereinafter, HMM) stored in the dictionary unit 36 represents the speech signal for one unit of speech recognition. One unit of speech recognition can be a word, a phoneme, or another unit; here it is a word. A plurality of HMMs are prepared for each category z, and each HMM is stored in the dictionary unit 36 in association with its category z.

【０３１８】An HMM is defined by set 1 of states consisting of a total of I states S₁ to S_I, set 2 of speech feature vectors x, set 3 of state transition probabilities a_ji, set 4 of output probabilities b_ji(x), set 5 of initial state probabilities Φ_i, and set 6 of final states F, where

【０３１９】[0319]

【数２５】 (Equation 25)

【０３２０】i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
a_ji: probability of a transition from state S_j to state S_i
b_ji(x): probability that the speech feature vector x is output on a transition from state S_j to state S_i
Φ_i: probability that the initial state is S_i

For example, in the example of FIG. 14, a₁₂ is the probability of a transition from state S₁ to state S₂, and b₁₂(x) is the probability that the speech feature vector x is output on a transition from state S₁ to state S₂. Similarly, a₂₂ is the probability of a transition from state S₂ to state S₂, and b₂₂(x) is the probability that the speech feature vector x is output on a transition from state S₂ to state S₂.

【０３２１】Sets 1 to 6 that define an HMM are obtained individually for each category z by statistical methods. That is, a variety of speech signals corresponding to category z are collected, for example speech signals grouped by age or by sex, or speech signals with different manners of utterance, and sets 1 to 6 expressing the statistical properties of these speech signals are obtained.

【０３２２】The output probability b_ji(x) is expressed using an uncorrelated mixture normal distribution composed of a plurality of mutually uncorrelated normal distributions, each of which is a function of the speech feature vector x. The uncorrelated mixture normal distribution has the advantages of being mathematically easy to handle and having high expressive power.

【０３２３】Next, the flow of processing of the speech recognition method of this embodiment is described concretely, together with the operation of the speech recognition device 34.

【０３２４】The acoustic processing unit 38 extracts a speech feature vector x_t = (x_t1, x_t2, ..., x_tp) from the input speech signal for each frame. Here p is the order of the speech feature vector x_t, and x_t1 to x_tp are its vector components. t is the number assigned to the frame from which the speech feature vector x_t was extracted. At the stage of collation with the HMMs, described later, the frame numbers are rewritten in ascending order with the start frame of the speech section numbered 1, but at the time of acoustic processing it suffices that frame numbers t are assigned so that each frame can be identified.

【０３２５】As the vector components of the speech feature vector x_t, it is possible to use, for example, values obtained from the outputs of a band filter group, consisting of a plurality of band-pass filters with different center frequencies, to which the input speech signal is applied; power spectrum components obtained by Fourier analysis of the input speech signal; or LPC cepstrum coefficients obtained by linear prediction (LPC) analysis of the input speech signal. Here, an example in which the speech feature vector x_t is extracted using a band filter group is described.

【０３２６】The acoustic processing unit 38 converts the input speech signal from an analog signal to a digital signal and passes the converted signal through the band filter group to separate it into signal components of the frequency bands (channels) corresponding to the individual band-pass filters, obtaining a total of p signal components x1 to xp, each in a different frequency band. The acoustic processing unit 38 then rectifies the signal component x1 and obtains, for each frame, the average value of the rectified signal component x1 (the absolute value of the signal component x1). This average value is obtained by dividing the rectified signal component x1 by the time width of one frame. The average value of the signal component x1 obtained in the t-th frame is extracted as the component x_t1 of the speech feature vector x_t. In the same way, the components x_t2 to x_tp of the speech feature vector x_t are extracted from the remaining signal components x2 to xp.
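The per-frame averaging described above can be sketched as follows. The sketch assumes the band-pass filtering has already been done, so each channel is given as a list of samples, and that the frame length is expressed in samples; the function names are illustrative.

```python
def frame_averages(channel_samples, frame_len):
    """Average of the rectified (absolute) samples in each full frame."""
    n_frames = len(channel_samples) // frame_len
    return [
        sum(abs(s) for s in channel_samples[k * frame_len:(k + 1) * frame_len])
        / frame_len
        for k in range(n_frames)
    ]


def feature_vectors(channels, frame_len):
    """Build x_t = (x_t1, ..., x_tp) from p band-filtered channels."""
    per_channel = [frame_averages(c, frame_len) for c in channels]
    # Transpose: one feature vector per frame, one component per channel.
    return [list(frame) for frame in zip(*per_channel)]
```

For two channels and a frame length of 2 samples, `feature_vectors([[1, -1, 2, -2], [0, 4, 0, -4]], 2)` yields one two-component vector per frame.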

【０３２７】Next, the speech section detection unit 40 detects the start frame and the end frame of the speech section based on the speech feature vectors x_t from the acoustic processing unit 38, and generates section information indicating which frames are the start frame and the end frame of the speech section. The speech section is the section that contains the speech signal for one unit of speech recognition, here the speech signal for one word.

【０３２８】The collation unit 42 receives the section information and the speech feature vectors x_t from the speech section detection unit 40 and generates the time series x₁, x₂, ..., x_T of the speech feature vectors x_t extracted from the start frame to the end frame of the speech section. In doing so, it rewrites the frame numbers t from the start frame to the end frame of the speech section in ascending order, with the start frame numbered 1.

【０３２９】The collation unit 42 then obtains the likelihood ln{P(x₁, x₂, ..., x_T)} between the vector time series x₁, x₂, ..., x_T and each HMM stored in the dictionary unit 36, individually for each HMM, and outputs the category z assigned to the HMM that yields the maximum likelihood as the recognition result.

【０３３０】Here, P(x₁, x₂, ..., x_T) given by equation (1) is the probability that the vector time series x₁, x₂, ..., x_T appears in the HMM.

【０３３１】[0331]

【数２６】 (Equation 26)

【０３３２】In equation (1), c_iT is the forward probability of starting from the initial state of the HMM, outputting the vector time series x₁, x₂, ..., x_T, and reaching state S_i, and *i is an i that satisfies S_i ∈ F (a number i assigned to a state S_i belonging to the final states F). Equation (1) therefore takes the largest of the forward probabilities c_iT with i = *i as the appearance probability P(x₁, x₂, ..., x_T).

【０３３３】The forward probability c_iT is obtained approximately by the Viterbi algorithm, using the recurrence formulas shown in equations (2) and (3).
c_i0 = Φ_i ...... (2)

【０３３４】[0334]

【数２７】 [Equation 27]

【０３３５】In an HMM, there are one or more state transitions that output the speech feature vector x_t. Consequently, there are one or more transition paths that start from the initial state, output the vector series x₁ to x_t, and reach state S_i, and in most cases there are several. As shown in equation (3), the largest value of c_j(t-1) a_ji b_ji(x_t) among those computed for the individual transition paths is therefore taken as the forward probability c_it. This calculation method is called the Viterbi method.

【０３３６】The output probability b_ji(x_t) in equation (3) is defined here as in the following equation (4).

【０３３７】[0337]

【数２８】 [Equation 28]

【０３３８】where
m: m = 1, 2, ..., M
g_jim(x_t): weighted probability of the speech feature vector x_t calculated from the m-th normal distribution of an uncorrelated mixture normal distribution consisting of a total of M normal distributions

The weighted probability g_jim(x_t) in equation (4) is expressed using the following equations (5) to (7).

【０３３９】g_jim(x_t) = λ_jim b_jim(x_t) ...... (5)
b_jim(x_t) = (2π)^(-p/2) |ρ_jim|^(-1/2) exp{−D_jimt²/2} ...... (6)
D_jimt² = (x_t − μ_jim)' ρ_jim^(-1) (x_t − μ_jim) ...... (7)
λ_jim: weight of the m-th normal distribution
b_jim(x_t): unweighted probability of the speech feature vector x_t calculated from the m-th normal distribution
ρ_jim: variance-covariance matrix of the m-th normal distribution
μ_jim: mean vector of the m-th normal distribution
D_jimt: Mahalanobis distance between the speech feature vector x_t and the m-th normal distribution
(x_t − μ_jim)': transpose of (x_t − μ_jim)

Various forms can be used for the output probability b_ji(x_t). Besides that of equation (4), for example, the one defined by the following equation (8) may be used. Equation (8) means that, among the weighted probabilities g_jim(x_t) calculated from the individual normal distributions of an uncorrelated mixture normal distribution consisting of a total of M normal distributions, the largest weighted probability g_jim(x_t) is taken as the output probability b_ji(x_t).
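Equations (5) to (7) can be illustrated with a short sketch. It assumes diagonal covariance matrices (consistent with the uncorrelated mixture described in the text) and assumes that equation (4), whose image is not reproduced here, sums the weighted probabilities over m; the `use_max` switch corresponds to the alternative described for equation (8). All names are illustrative.

```python
import math


def component_density(x, mean, variances):
    """Unweighted probability b_jim(x) of equation (6), diagonal covariance."""
    p = len(x)
    det = math.prod(variances)                       # |rho_jim|
    d2 = sum((xk - mk) ** 2 / vk                     # D^2 of equation (7)
             for xk, mk, vk in zip(x, mean, variances))
    return (2 * math.pi) ** (-p / 2) * det ** -0.5 * math.exp(-d2 / 2)


def output_probability(x, weights, means, variances, use_max=False):
    """Combine weighted probabilities g_jim(x) = lambda_jim * b_jim(x)."""
    g = [w * component_density(x, m, v)
         for w, m, v in zip(weights, means, variances)]
    # Sum over components (assumed form of equation (4)) or take the
    # largest component (the alternative described for equation (8)).
    return max(g) if use_max else sum(g)
```

The maximum form avoids the summation and is a lower bound on the summed form, since every weighted probability is non-negative.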

【０３４０】[0340]

【数２９】 (Equation 29)

【０３４１】Further, writing the logarithmic transition probability as A_ji = ln(a_ji), the logarithmic output probability as B_ji(x_t) = ln{b_ji(x_t)}, and the logarithmic forward probability as C_it = ln(c_it), equations (1) to (3) can be transformed to give equations (9) to (11) for calculating the likelihood ln{P(x₁, x₂, ..., x_t)}.
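The equation images for (1) to (3) and (9) to (11) are not reproduced in this text. The following is a reconstruction from the verbal descriptions in the surrounding paragraphs, offered as an assumption rather than as the original typesetting. Because the logarithm is monotonically increasing, it commutes with the maximum and turns products into sums:

```latex
\begin{align}
P(x_1,\dots,x_T) &= \max_{i \,:\, S_i \in F} c_{iT} \tag{1}\\
c_{i0} &= \Phi_i \tag{2}\\
c_{it} &= \max_{j}\bigl\{ c_{j(t-1)}\, a_{ji}\, b_{ji}(x_t) \bigr\} \tag{3}
\end{align}
% Taking logarithms with A_{ji} = \ln a_{ji},
% B_{ji}(x_t) = \ln b_{ji}(x_t), C_{it} = \ln c_{it}:
\begin{align}
\ln P(x_1,\dots,x_T) &= \max_{i \,:\, S_i \in F} C_{iT} \tag{9}\\
C_{i0} &= \ln \Phi_i \tag{10}\\
C_{it} &= \max_{j}\bigl\{ C_{j(t-1)} + A_{ji} + B_{ji}(x_t) \bigr\} \tag{11}
\end{align}
```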

【０３４２】[0342]

【数３０】 [Equation 30]

【０３４３】Since equations (9) to (11) are recurrence formulas in t, the logarithmic forward probabilities C_it for t = 1, 2, ..., T can be computed in sequence as in the following equations (12) to (16).

【０３４４】[0344]

【数３１】 (Equation 31)

【０３４５】Once the collation unit 42 has obtained the logarithmic forward probabilities C_iT at t = T for all i = 1, 2, ..., I, it takes the largest of the logarithmic forward probabilities C_iT with i = *i as the likelihood ln{P(x₁, x₂, ..., x_T)}. It obtains the likelihood ln{P(x₁, x₂, ..., x_T)} for each of the HMMs stored in the dictionary unit 36, and outputs the category z assigned to the HMM that yields the maximum likelihood as the recognition result for the input speech signal from which the time series x₁, x₂, ..., x_T was obtained.
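The final selection can be sketched in two small steps: take the best final-state value within each HMM, then pick the category of the best-scoring HMM. The function names and the data layout (0-based state indices, a list of category and likelihood pairs) are assumptions made for illustration.

```python
def hmm_log_likelihood(C_T, final_states):
    """Largest log forward probability C_iT over the final-state indices."""
    return max(C_T[i] for i in final_states)


def recognize(scores):
    """Return the category of the HMM with the largest log likelihood.

    scores -- list of (category, log_likelihood) pairs, one per HMM
    """
    best_category, _ = max(scores, key=lambda pair: pair[1])
    return best_category
```

Because several HMMs may share a category, the result is the category of the single best HMM, not an aggregate over the HMMs of each category.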

【０３４６】Next, the flow of processing for obtaining the likelihood between an HMM and the time series x₁, x₂, ..., x_T of speech feature vectors in the first embodiment of the invention of claim 7 is described, focusing on a single HMM. FIGS. 17 to 19 show the flow of processing for this single HMM. In this example, the output probability b_ji(x_t), the forward probability c_it, and the reference probability b_ji are taken to be the logarithmic output probability B_ji(x_t), the logarithmic forward probability C_it, and the logarithmic reference probability B_ji, respectively, and i and j both run over 1, 2, ..., I.

【０３４７】On receiving the section information and the speech feature vectors x_t from the speech section detection unit 40, the collation unit 42 sets the initial value C_i0 of the logarithmic forward probability according to equation (10) for all i = 1, 2, ..., I (S1).

【０３４８】Next, the collation unit 42 initializes the current frame number t to t = 1 in order to process the start frame of the speech section (S2).

【０３４９】Next, for all j = 1, 2, ..., J and i = 1, 2, ..., I, the collation unit 42 obtains the logarithmic output probability B_ji(x₁) according to equations (4) to (7) (S3) and writes it as the initial value of the logarithmic reference probability B_ji (S4).

【０３５０】The reference information storage unit 44 has a storage area save B_ji for storing the reference probability B_ji, provided individually for each j and i with j = 1, 2, ..., J and i = 1, 2, ..., I. The reference information storage unit 44 therefore has J × I storage areas that individually store the reference probabilities B₁₁, B₁₂, ..., B_1I, B₂₁, B₂₂, ..., B_2I, ..., B_J1, B_J2, ..., B_JI. In the figure, the process of storing the initial value of the reference probability B_ji is accordingly written as save B_ji = B_ji(x₁).

【０３５１】Next, the collation unit 42 initializes the forward probability reference frame number qc and the output probability reference frame number qs each to the current frame number 1 (S5).

【０３５２】Thereafter, for all i = 1, 2, ..., I, the logarithmic forward probability C_i1 is obtained according to equation (11) (S6).

【０３５３】Next, the collation unit 42 adds 1 to the current frame number t in order to process the next frame of the speech section (S7), and then compares the current frame number t with the frame number T of the end frame to determine whether processing has been completed for all frames in the speech section (S8).

【０３５４】(3-1A: when t ≤ T in S8) When the current frame number t is at most the frame number T of the end frame in S8, processing has not been completed for all frames of the speech section. The collation unit 42 therefore obtains the distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc according to the following equation (19) (S9).

【０３５５】[0355]

【数３２】 (Equation 32)

【０３５６】where
x_tk: vector component of the speech feature vector x_t of the current frame number t
x_qck: vector component of the speech feature vector x_qc of the forward probability reference frame number qc

Next, the collation unit 42 compares the distance dtc with the threshold DTC to determine whether the vectors x_t and x_qc are approximately equal (S10).

【０３５７】When the distance dtc is at most the threshold DTC in S10, the speech feature vector x_t of the current frame number t approximates the speech feature vector x_qc of the forward probability reference frame number qc. In other words, x_t has changed little from x_qc, so the forward probability C_it of the current frame number t can be approximated by the forward probability C_i(t-1) of the immediately preceding frame. The forward probability C_it of the current frame number t is therefore regarded as equal to C_i(t-1), and the computation of C_it ends (S11). Processing then returns to S7 to process the next frame of the speech section.

[0358] If the distance dtc exceeds the threshold DTC at S10, the voice feature vector x_t of the current frame number t does not approximate the voice feature vector x_qc of the forward probability reference frame number qc. In other words, x_t has changed substantially from x_qc, so the forward probability C_it of the current frame number t cannot be approximated by the forward probability C_i(t-1) of the immediately preceding frame. The forward probability reference frame number qc is therefore rewritten to the current frame number t (S12).

[0359] After S12, the collation unit 42 obtains the distance dts between the voice feature vector x_t of the current frame number t and the voice feature vector x_qs of the output probability reference frame number qs according to the following equation (17) (S13).

[0360]

(Equation 33)

[0361] where
x_tk: vector component of the voice feature vector x_t of the current frame number t
x_qsk: vector component of the voice feature vector x_qs of the output probability reference frame number qs

Next, the collation unit 42 compares the distance dts with the threshold DTS to determine whether the vectors x_t and x_qs are approximately equal (S14).

[0362] If the distance dts exceeds the threshold DTS at S14, the voice feature vector x_t of the current frame number t does not approximate the voice feature vector x_qs of the output probability reference frame number qs. In other words, x_t has changed substantially from x_qs, so the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji. The output probability reference frame number qs is therefore rewritten to the current frame number t (S15). Then, for all j and i with j = 1, 2, ..., J and i = 1, 2, ..., I, the logarithmic output probability B_ji(x_t) is obtained according to equations (4) to (7), and the reference probability B_ji is rewritten to that output probability B_ji(x_t) (S16). After the reference probabilities B_ji have been rewritten, each reference probability B_ji is read out and, for all i = 1, 2, ..., I, the forward probability C_it is obtained according to equation (11) (S17). The process then returns to S7 to process the next frame of the voice section. In the figures, the rewriting of the reference probability B_ji at S16 is written as save B_ji = B_ji(x_t).

[0363] The reference probability B_ji read out at S17 in this case is the output probability B_ji(x_t) of the current frame number t obtained at S16. In this case, therefore, S17 obtains the forward probability C_it using the output probability B_ji(x_t) of the current frame number t.

[0364] If the distance dts is at or below the threshold DTS at S14, the voice feature vector x_t of the current frame number t is approximately equal to the voice feature vector x_qs of the output probability reference frame number qs. In other words, x_t has changed little from x_qs, so the output probability B_ji(x_t) of the current frame number t can be approximated by the reference probability B_ji. The output probability B_ji(x_t) is therefore not calculated with equations (4) to (7). Instead, each reference probability B_ji is read out and, for all i = 1, 2, ..., I, the logarithmic forward probability C_it is obtained according to equation (11) (S17). The process then returns to S7 to process the next frame of the voice section.

[0365] The reference probability B_ji read out at S17 in this case is the output probability B_ji(x_qs) obtained at the frame of the output probability reference frame number qs. In this case, therefore, S17 obtains the forward probability C_it using the output probability B_ji(x_qs) of the output probability reference frame number qs.
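The frame loop of branch 3-1A (S7 to S17) can be sketched as follows. This is a minimal illustration, not the patented implementation: the Euclidean distance is an assumed stand-in for equations (17) and (19), the callables `output_logprob` and `forward_update` are hypothetical placeholders for equations (4) to (7) and equation (11), and the initialization before the loop is inferred from the surrounding description.

```python
import math

def euclid(a, b):
    # Assumed stand-in for equations (17) and (19).
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def run_first_embodiment(frames, c0, output_logprob, forward_update, DTC, DTS):
    """Return (final C, output-probability evaluations, forward updates).

    frames: feature vectors x_1..x_T (index 0 holds x_1).
    output_logprob(x): returns the table B_ji(x) (hypothetical callable).
    forward_update(c_prev, b): returns C_it from C_i(t-1) and B_ji
    (hypothetical callable).
    """
    ref_b = output_logprob(frames[0])       # initial reference probabilities
    output_evals = 1
    qc = qs = 0                             # both reference frames start at frame 1
    c = forward_update(c0, ref_b)
    forward_updates = 1
    for t in range(1, len(frames)):         # S7, S8
        x_t = frames[t]
        if euclid(x_t, frames[qc]) <= DTC:  # S9, S10
            continue                        # S11: C_it = C_i(t-1)
        qc = t                              # S12
        if euclid(x_t, frames[qs]) > DTS:   # S13, S14
            qs = t                          # S15
            ref_b = output_logprob(x_t)     # S16: save B_ji = B_ji(x_t)
            output_evals += 1
        c = forward_update(c, ref_b)        # S17
        forward_updates += 1
    return c, output_evals, forward_updates
```

With a toy one-state model, five frames need only two output-probability evaluations and three forward updates:

```python
frames = [[0.0], [0.1], [0.7], [5.0], [5.05]]
result = run_first_embodiment(
    frames, [0.0],
    output_logprob=lambda x: [[-abs(x[0])]],
    forward_update=lambda c, b: [c[0] + b[0][0]],
    DTC=0.5, DTS=1.0,
)
```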

[0366] (3-1B: when t > T at S8) If the current frame number t is greater than the frame number T of the end frame at S8, the forward probabilities C_iT have been obtained for all i = 1, 2, ..., I. In accordance with equation (9), the largest of the forward probabilities C_iT with i = *i is taken as the likelihood ln{P(x₁, x₂, ..., x_T)} between the voice feature vector time series x₁, x₂, ..., x_T and the HMM, and the process of obtaining the likelihood for this HMM then ends (END).

[0367] For every HMM stored in the dictionary unit 36, the collation unit 42 performs the processes S1 to S17 shown in FIGS. 17 to 19 to obtain the likelihood (forward probability C_iT). It then outputs the category of the HMM that gave the largest likelihood to the next-stage device (not shown) as the recognition result for the input voice signal from which the voice feature vector time series x₁, x₂, ..., x_T was extracted.
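The final selection step can be sketched as below: once a log-likelihood has been obtained for each HMM, the category with the largest value is the recognition result. The function name and the dictionary layout (category name mapped to log-likelihood) are hypothetical.

```python
def recognize(likelihood_by_category):
    """Return the category whose HMM gave the largest log-likelihood."""
    if not likelihood_by_category:
        raise ValueError("no HMMs in the dictionary")
    return max(likelihood_by_category, key=likelihood_by_category.get)
```

Because the likelihoods are logarithms, they are typically negative, and the largest value is the one closest to zero.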

[0368] As described above, in the course of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT, when the distance dtc relating to the forward probability C_it is at or below the threshold DTC, neither the calculation of the output probability B_ji(x_t) from equations (4) to (7) nor the calculation of the forward probability C_it from equation (3) or equation (11) is performed. Instead, C_it is taken to be equal to the forward probability C_i(t-1) of the immediately preceding frame, and the calculation of C_it ends. Furthermore, when the distance dts relating to the output probability B_ji(x_t) is at or below the threshold DTS, B_ji(x_t) is not calculated from equations (4) to (7), and C_it is obtained using the reference probability B_ji. The amount of calculation can therefore be greatly reduced. Moreover, because this simplification is applied only when dtc is at or below DTC or dts is at or below DTS, the error in the forward probability C_it can be kept small even though the calculation is simplified.

[0369] According to simulations by the inventor of this application, almost no degradation in voice recognition accuracy was observed even when the threshold DTC relating to the forward probability C_it was set so that the amount of calculation for obtaining C_it was about 1/2 of that without the simplification, and the threshold DTS relating to the output probability B_ji(x_t) was set so that the amount of calculation for obtaining B_ji(x_t) was about 1/5 of that without the simplification.

[0370] <Second Embodiment of the Invention of Claim 7> As a voice recognition device suitable for carrying out the second embodiment of the invention of claim 7, the voice recognition device 34 with the same configuration as described above can be used, except that the collation unit 42 is configured as described next.

[0371] That is, when obtaining the likelihood, the collation unit 42 uses the reference probability b_ji stored in the reference information storage unit 44 to obtain the forward probability c_it for each of t = 1, 2, ..., T in sequence, as follows.

[0372] (1). When t = 1, process (3A) is performed: the forward probability reference frame number qc and the output probability reference frame number qs are each initialized to 1, and the forward probability skip count skipc and the output probability skip count skips are each initialized to 0; for all j and i, the output probability b_ji(x_t) is obtained from the hidden Markov model and written as the initial value of the reference probability b_ji; and, after this writing is complete, each reference probability b_ji is read out to obtain the forward probability c_it. After process (3A), process (3B), which adds 1 to the current frame number t, is performed.

[0373] (2). When 2 ≤ t ≤ T, the following processes are performed. Process (3C) compares the forward probability skip count skipc with the threshold NSKIPC, and compares the distance dtc between the voice feature vector x_t of the current frame number t and the voice feature vector x_qc of the forward probability reference frame number qc with the threshold DTC. When the comparison in process (3C) gives skipc ≤ NSKIPC and dtc ≤ DTC, process (3D) takes the forward probability c_it to be equal to the forward probability c_i(t-1) of the immediately preceding frame, ends the calculation of c_it, and adds 1 to each of skipc and skips. When the comparison in process (3C) gives skipc > NSKIPC or dtc > DTC, process (3E) initializes skipc to 0 and rewrites the forward probability reference frame number qc to the current frame number t.

[0374] After process (3E), process (3F) is performed. The output probability skip count skips is compared with the threshold NSKIPS, and the distance dts between the voice feature vector x_t of the current frame number t and the voice feature vector x_qs of the output probability reference frame number qs is compared with the threshold DTS. When the comparison gives skips > NSKIPS or dts > DTS, skips is initialized to 0, the output probability reference frame number qs is rewritten to the current frame number t, the output probability b_ji(x_t) is obtained from the hidden Markov model for all j and i, the reference probability b_ji is rewritten to that output probability b_ji(x_t), and, after the rewriting is complete, each reference probability b_ji is read out to obtain the forward probability c_it. When the comparison gives skips ≤ NSKIPS and dts ≤ DTS, 1 is added to skips, and each reference probability b_ji is read out, without being rewritten, to obtain the forward probability c_it.

[0375] After process (3D) or (3F), process (3G), which adds 1 to the current frame number t, is performed.
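The branching in processes (3C) to (3F) can be sketched as a small state machine. The class and function names and the returned action labels are hypothetical. As a simplification, the caller is assumed to supply both distances up front, whereas the flow described for a single HMM computes a distance only after the corresponding skip-count test has passed.

```python
from dataclasses import dataclass

@dataclass
class SkipState:
    qc: int = 1      # forward probability reference frame number
    qs: int = 1      # output probability reference frame number
    skipc: int = 0   # forward probability skip count
    skips: int = 0   # output probability skip count

def step(state, t, dtc, dts, NSKIPC, DTC, NSKIPS, DTS):
    """Apply processes (3C) to (3F) for frame t and report what must be computed.

    dtc and dts are the distances from x_t to frames qc and qs.
    Returns "skip_forward", "reuse_reference", or "recompute_output".
    """
    if state.skipc <= NSKIPC and dtc <= DTC:   # (3C) leading to (3D)
        state.skipc += 1
        state.skips += 1
        return "skip_forward"                  # c_it = c_i(t-1)
    state.skipc = 0                            # (3E)
    state.qc = t
    if state.skips > NSKIPS or dts > DTS:      # (3F), rewrite branch
        state.skips = 0
        state.qs = t
        return "recompute_output"              # rewrite b_ji, then compute c_it
    state.skips += 1                           # (3F), reuse branch
    return "reuse_reference"                   # read stored b_ji, compute c_it
```

Even when every frame is identical (both distances zero), the skip counts force periodic refreshes: with NSKIPC = 2 the forward probability is recomputed on every fourth frame, and with NSKIPS = 5 the reference probabilities are eventually rewritten as well.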

[0376] Next, for the second embodiment of the invention of claim 7, the flow of the process of obtaining the likelihood between an HMM and the voice feature vector time series x₁, x₂, ..., x_T is described, focusing on a single HMM. FIGS. 20 to 22 show the flow of processing for this single HMM. In this example, the output probability b_ji(x_t), the forward probability c_it, and the reference probability b_ji are replaced by the logarithmic output probability B_ji(x_t), the logarithmic forward probability C_it, and the logarithmic reference probability B_ji, respectively, and i and j each take the values 1, 2, ..., I.

[0377] When the collation unit 42 receives the section information and the voice feature vectors x_t from the voice section detection unit 40, it sets the initial value C_i0 of the logarithmic forward probability according to equation (10) for all i = 1, 2, ..., I (S1).

[0378] Next, the collation unit 42 initializes the current frame number t to t = 1 in order to process the start frame of the voice section (S2).

[0379] Next, for all j and i with j = 1, 2, ..., J and i = 1, 2, ..., I, the collation unit 42 obtains the logarithmic output probability B_ji(x₁) according to equations (4) to (7) (S3), and writes that output probability B_ji(x₁) as the initial value of the logarithmic reference probability B_ji (S4).

[0380] The reference information storage unit 44 is provided with a storage area save B_ji that stores the reference probability B_ji individually for each j and i (j = 1, 2, ..., J and i = 1, 2, ..., I). The reference information storage unit 44 therefore has J × I storage areas that individually store the reference probabilities B₁₁, B₁₂, ..., B_1I, B₂₁, B₂₂, ..., B_2I, ..., B_J1, B_J2, ..., B_JI. Accordingly, in the figures, the process of storing the initial value of the reference probability B_ji is written as save B_ji = B_ji(x₁).
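The J × I storage areas and the "save B_ji = B_ji(x)" operation can be sketched with a nested list. The function names are hypothetical, and the patent does not specify the storage layout beyond one area per (j, i) pair.

```python
def make_reference_store(J, I):
    """Allocate J x I storage areas, one per (j, i) pair."""
    return [[None] * I for _ in range(J)]

def save_reference(store, output_logprob):
    """save B_ji = B_ji(x): overwrite every area with the new output probability."""
    for j, row in enumerate(output_logprob):
        for i, value in enumerate(row):
            store[j][i] = value
```

Each row is built separately so that writing one area never affects another.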

[0381] Next, the collation unit 42 initializes the forward probability reference frame number qc and the output probability reference frame number qs each to the current frame number 1, and initializes the forward probability skip count skipc and the output probability skip count skips each to 0 (S5). The collation unit 42 then obtains the logarithmic forward probability C_i1 according to equation (11) for all i = 1, 2, ..., I (S6).
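A forward-probability update that reads the stored reference probabilities might look like the sketch below. Equation (11) is not reproduced in this text, so the max-sum (Viterbi-style) recursion over transitions from state j to state i is an assumption, as are the function name and the log transition table `log_a`.

```python
NEG_INF = float("-inf")

def forward_step(c_prev, log_a, ref_b):
    """One log-domain step: C_i = max over j of (C_j_prev + A_ji + B_ji).

    Assumption: a max-sum recursion stands in for equation (11).
    log_a[j][i] and ref_b[j][i] are indexed by source state j and
    destination state i; NEG_INF marks a forbidden transition.
    """
    n = len(c_prev)
    return [
        max(c_prev[j] + log_a[j][i] + ref_b[j][i] for j in range(n))
        for i in range(n)
    ]
```

The function takes the reference table `ref_b` rather than a feature vector, which is what lets the same stored table be reused across frames without recomputing the output probabilities.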

[0382] Next, the collation unit 42 adds 1 to the current frame number t in order to process the next frame of the voice section (S7), and then compares the current frame number t with the frame number T of the end frame to determine whether processing has been completed for all frames in the voice section (S8).

[0383] (3-2A: when t ≤ T at S8) If the current frame number t is at or below the frame number T of the end frame at S8, processing has not yet been completed for all frames of the voice section, so the forward probability skip count skipc is compared with the threshold NSKIPC (S9).

[0384] If the forward probability skip count skipc exceeds the threshold NSKIPC at S9, the number of times skipc that the calculation of C_it has been ended by approximating the forward probability C_it of the current frame number t by the forward probability C_i(t-1) of the immediately preceding frame exceeds NSKIPC. The temporal separation between the current frame number t and the forward probability reference frame number qc has therefore become large, and the error is likely to increase. The reference probability B_ji will consequently be read out to obtain the forward probability C_it, so skipc is initialized to 0 and the forward probability reference frame number qc is rewritten to the current frame number t (S10).

[0385] If the forward probability skip count skipc is at or below the threshold NSKIPC at S9, the collation unit 42 obtains the distance dtc between the voice feature vector x_t of the current frame number t and the voice feature vector x_qc of the forward probability reference frame number qc according to equation (19) (S11), and compares the obtained distance dtc with the threshold DTC to determine whether the vectors x_t and x_qc are approximately equal (S12).

[0386] If the distance dtc exceeds the threshold DTC at S12, the voice feature vector x_t of the current frame number t does not approximate the voice feature vector x_qc of the forward probability reference frame number qc. In other words, x_t has changed substantially from x_qc, so the forward probability C_it of the current frame number t cannot be approximated by the forward probability C_i(t-1) of the immediately preceding frame. The reference probability B_ji will consequently be read out to obtain the forward probability C_it, so skipc is initialized to 0 and the forward probability reference frame number qc is rewritten to the current frame number t (S10).

[0387] If the distance dtc is at or below the threshold DTC at S12, both skipc ≤ NSKIPC and dtc ≤ DTC hold. Because skipc ≤ NSKIPC, the number of times skipc that the calculation of C_it has been ended by approximating the forward probability C_it of the current frame number t by the forward probability C_i(t-1) of the immediately preceding frame has not exceeded NSKIPC. The temporal separation between the current frame number t and the forward probability reference frame number qc is therefore small, and the error is unlikely to increase. Moreover, because dtc ≤ DTC, the voice feature vector x_t of the current frame number t is approximately equal to the voice feature vector x_qc of the forward probability reference frame number qc. In other words, x_t has changed little from x_qc, so C_it can be approximated by C_i(t-1). Accordingly, neither the calculation that reads out the reference probability B_ji to obtain C_it nor the rewriting of the reference probability B_ji is performed. The forward probability C_it of the current frame number t is taken to be equal to C_i(t-1), the calculation of C_it ends, and 1 is added to each of the forward probability skip count skipc and the output probability skip count skips to count them up (S13). The process then returns to S7 to process the next frame of the voice section.

[0388] When S10 has been performed because the forward probability skip count skipc exceeded the threshold NSKIPC or the distance dtc exceeded the threshold DTC, the output probability skip count skips is next compared with the threshold NSKIPS (S14).

[0389] If the output probability skip count skips exceeds the threshold NSKIPS at S14, the number of times skips that the reference probability B_ji has gone without being rewritten exceeds NSKIPS. The temporal separation between the current frame number t and the output probability reference frame number qs has therefore become large, and the error is likely to increase. To reduce the error, the reference probability B_ji is rewritten. First, skips is initialized to 0 and the output probability reference frame number qs is rewritten to the current frame number t (S15). Then, for all j and i with j = 1, 2, ..., J and i = 1, 2, ..., I, the logarithmic output probability B_ji(x_t) is obtained according to equations (4) to (7), and the reference probability B_ji is rewritten to that output probability B_ji(x_t) (S16). After the reference probabilities B_ji have been rewritten, each reference probability B_ji is read out and, for all i = 1, 2, ..., I, the forward probability C_it is obtained according to equation (11) (S17). The process then returns to S7 to process the next frame of the voice section. In the figures, the rewriting of the reference probability B_ji at S16 is written as save B_ji = B_ji(x_t).

[0390] The reference probability B_ji read out at S17 in this case is the output probability B_ji(x_t) of the current frame number t obtained at S16. In this case, therefore, S17 obtains the forward probability C_it using the output probability B_ji(x_t) of the current frame number t.

[0391] If the output probability skip count skips is at or below the threshold NSKIPS at S14, the collation unit 42 obtains the distance dts between the voice feature vector x_t of the current frame number t and the voice feature vector x_qs of the output probability reference frame number qs (S18), and compares the obtained distance dts with the threshold DTS to determine whether the vectors x_t and x_qs are approximately equal (S19).

[0392] If the distance dts exceeds the threshold DTS at S19, the voice feature vector x_t of the current frame number t does not approximate the voice feature vector x_qs of the output probability reference frame number qs. In other words, x_t has changed substantially from x_qs, so the output probability B_ji(x_t) of the current frame number t cannot be approximated by the reference probability B_ji. The reference probability B_ji is therefore rewritten: the processes S15 to S17 are performed, and the process then returns to S7 to process the next frame of the voice section.

[0393] If the distance dts is at or below the threshold DTS at S19, the voice feature vector x_t of the current frame number t is approximately equal to the voice feature vector x_qs of the output probability reference frame number qs. In other words, x_t has changed little from x_qs, so the output probability B_ji(x_t) of the current frame number t can be approximated by the reference probability B_ji, and the reference probability B_ji is not rewritten. Instead, 1 is added to the output probability skip count skips to count it up (S20). Then, without calculating the output probability B_ji(x_t) from equations (4) to (7), each reference probability B_ji is read out and, for all i = 1, 2, ..., I, the logarithmic forward probability C_it is obtained according to equation (11) (S17). The process then returns to S7 to process the next frame of the voice section.

[0394] The reference probability B_ji read out at S17 in this case is the output probability B_ji(x_qs) obtained at the frame of the output probability reference frame number qs. In this case, therefore, S17 obtains the forward probability C_it using the output probability B_ji(x_qs) of the output probability reference frame number qs.

【０３９５】（３−２Ｂ：Ｓ８でｔ＞Ｔの場合）Ｓ８で
現フレーム番号ｔが終端フレームのフレーム番号Ｔより
も大きい場合は、ｉ＝１、２、……、Ｉの全てのｉにつ
いて前向き確率Ｃ_iTを求め終えたので、式（９）に従っ
てｉ＝＊ｉ成る前向き確率Ｃ_iTのうち最大の前向き確率
Ｃ_iTを、音声特徴ベクトルの時系列ｘ₁ 、ｘ₂ 、……、
ｘ_T とＨＭＭとの間の尤度ln｛Ｐ（ｘ₁ 、ｘ₂ 、……、
ｘ_T ）｝として得、然る後、当該ＨＭＭにつき尤度を求
める処理を終了する（終了）。(3-2B: When t > T in S8) If the current frame number t is larger than the frame number T of the end frame in S8, the forward probabilities C_iT have been obtained for all i = 1, 2, ..., I. Therefore, according to equation (9), the largest of the forward probabilities C_iT with i = *i is taken as the likelihood ln{P(x₁, x₂, ..., x_T)} between the time series x₁, x₂, ..., x_T of speech feature vectors and the HMM, and the process of obtaining the likelihood for this HMM then ends (END).

【０３９６】照合部４２は、辞書部３６に格納されてい
る全てのＨＭＭについて、各ＨＭＭ毎に、図２０〜図２
２に示すＳ１〜Ｓ２０の処理を行なって尤度（前向き確
率Ｃ_iT）を求め、求めた尤度のうち最大の尤度を検出す
る。そして最大の尤度を得たＨＭＭのカテゴリを、当該
音声特徴ベクトルの時系列ｘ₁ 、ｘ₂ 、……、ｘ_T を抽
出した入力音声信号に対する認識結果として、次段の装
置（図示せず）へ出力する。For every HMM stored in the dictionary section 36, the collation section 42 performs the processes S1 to S20 shown in FIGS. 20 to 22 to obtain the likelihood (forward probability C_iT), and then detects the largest of the obtained likelihoods. It outputs the category of the HMM that gave the largest likelihood to the next-stage device (not shown) as the recognition result for the input speech signal from which the time series x₁, x₂, ..., x_T of speech feature vectors was extracted.
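
The final selection step can be illustrated with a short sketch. The function names and the dictionary layout (category name mapped to likelihood) are assumptions made for illustration; the specification only states that the largest C_iT over the final states *i is the likelihood and that the category of the HMM with the largest likelihood is output.

```python
from typing import Dict, Iterable, Sequence


def hmm_likelihood(final_forward: Sequence[float], final_states: Iterable[int]) -> float:
    """Likelihood of one HMM: the largest C_iT among its final states *i."""
    return max(final_forward[i] for i in final_states)


def recognize(likelihoods: Dict[str, float]) -> str:
    """Return the category whose HMM gave the largest likelihood."""
    return max(likelihoods, key=likelihoods.get)


# Hypothetical log-domain forward probabilities C_iT for two word models.
likelihoods = {
    "yes": hmm_likelihood([-150.0, -120.5, -131.0], final_states=[1, 2]),
    "no": hmm_likelihood([-140.0, -98.2], final_states=[1]),
}
result = recognize(likelihoods)
```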

【０３９７】上述のように尤度ln｛Ｐ（ｘ₁ 、ｘ₂ 、…
…、ｘ_T ）｝＝Ｃ_iTを求める過程において、前向き確率
Ｃ_itに関わるスキップ数ｓｋｉｐｃが閾値ＮＳＫＩＰＣ
以下となりかつ距離ｄｔｃが閾値ＤＴＣ以下となる場合
に、出力確率Ｂ_ji(x_t)を式（４）〜（７）から求める演
算も前向き確率Ｃ_itを式（３）若しくは式（１１）から
求める演算も行なわずに、前向き確率Ｃ_itは直前フレー
ムの前向き確率Ｃ_i(t-1)に等しいものとして前向き確率
Ｃ_itを求める演算を終了する。また出力確率Ｂ_ji(x_t)に
関わるスキップ数ｓｋｉｐｓが閾値ＮＳＫＩＰＳ以下と
なりかつ距離ｄｔｓが閾値ＤＴＳ以下となる場合に、出
力確率Ｂ_ji(x_t)を式（４）〜（７）から求める演算を行
なわずに、前向き確率Ｃ_itを求めるので、大幅に演算量
を削減できる。しかもこのような演算の簡略化は、前向
き確率Ｃ_itに関わるスキップ数ｓｋｉｐｃが閾値ＮＳＫ
ＩＰＣ以下となりかつ距離ｄｔｃが閾値ＤＴＣ以下とな
る場合か出力確率Ｂ_ji(x_t)に関わるスキップ数ｓｋｉｐ
ｓが閾値ＮＳＫＩＰＳ以下となりかつ距離ｄｔｓが閾値
ＤＴＳ以下となる場合かのいずれかの場合に行なうの
で、演算の簡略化を行なっても、前向き確率Ｃ_itの誤差
を小さくできる。As described above, in the process of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT, when the skip count skipc for the forward probability C_it is less than or equal to the threshold NSKIPC and the distance dtc is less than or equal to the threshold DTC, neither the calculation of the output probability B_ji(x_t) from equations (4) to (7) nor the calculation of the forward probability C_it from equation (3) or equation (11) is performed; the forward probability C_it is taken to be equal to the forward probability C_i(t-1) of the immediately preceding frame, and the calculation of C_it ends. Further, when the skip count skips for the output probability B_ji(x_t) is less than or equal to the threshold NSKIPS and the distance dts is less than or equal to the threshold DTS, the forward probability C_it is obtained without calculating the output probability B_ji(x_t) from equations (4) to (7). The amount of calculation can therefore be greatly reduced. Moreover, because the calculation is simplified only in one of these two cases (skipc ≦ NSKIPC and dtc ≦ DTC, or skips ≦ NSKIPS and dts ≦ DTS), the error of the forward probability C_it remains small even though the calculation is simplified.
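
The two levels of skipping summarized above can be sketched as one frame loop. This is a sketch under stated assumptions, not the procedure of FIGS. 20 to 22: the function name `forward_with_skips`, the Euclidean distance, and the callbacks `compute_b`, `init_forward` and `forward_step` (standing in for equations (4) to (7) and for equation (3) or (11)) are hypothetical, and the way the two counters are reset is assumed by analogy with claim 2.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

Vector = Sequence[float]
B = TypeVar("B")  # container type for the output probabilities B_ji
C = TypeVar("C")  # container type for the forward probabilities C_it


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between feature vectors (assumed metric)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def forward_with_skips(
    frames: Sequence[Vector],
    compute_b: Callable[[Vector], B],      # placeholder for equations (4)-(7)
    init_forward: Callable[[B], C],        # placeholder for the t = 1 case
    forward_step: Callable[[C, B], C],     # placeholder for equation (3)/(11)
    dtc_max: float, dts_max: float,        # thresholds DTC and DTS
    nskipc_max: int, nskips_max: int,      # thresholds NSKIPC and NSKIPS
) -> Tuple[C, Dict[str, int]]:
    """Return the forward probabilities at frame T and counts of each path."""
    x_qc = x_qs = frames[0]                # reference frames qc = qs = 1
    skipc = skips = 0
    ref_b = compute_b(frames[0])           # initial reference probability
    c = init_forward(ref_b)
    stats = {"copied": 0, "reused": 0, "recomputed": 1}
    for x_t in frames[1:]:
        # Level 1: keep C_it equal to C_i(t-1), skipping every calculation.
        if skipc <= nskipc_max and distance(x_t, x_qc) <= dtc_max:
            skipc += 1
            stats["copied"] += 1
            continue
        skipc, x_qc = 0, x_t
        # Level 2: reuse the stored output probabilities, update only C_it.
        if skips <= nskips_max and distance(x_t, x_qs) <= dts_max:
            skips += 1
            stats["reused"] += 1
        else:
            skips, x_qs = 0, x_t
            ref_b = compute_b(x_t)
            stats["recomputed"] += 1
        c = forward_step(c, ref_b)
    return c, stats
```

Passing the model equations as callbacks keeps the sketch independent of their exact form, which is not reproduced in this part of the description.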

【０３９８】請求項７の発明は、フレーム単位でマッチ
ング処理を行なう音声認識装置の全てに適用できる。The invention of claim 7 can be applied to all speech recognition apparatuses that perform matching processing in frame units.

【０３９９】[0399]

【発明の効果】上述した説明からも明らかなように、請
求項１の発明の音声認識方法によれば、現フレーム番号
ｔの音声特徴ベクトルｘ_t と基準フレーム番号ｑｓの音
声特徴ベクトルｘ_qsとの間の距離ｄｔｓが閾値ＤＴＳ以
下（ｄｔｓ≦ＤＴＳ）となる場合は、参照確率ｂ_jiの書
換えを行なわずに従って現フレーム番号ｔの出力確率ｂ
_ji(x_t)をヒドンマルコフモデルから求める演算を行なわ
ずに、参照確率ｂ_jiを読み出して現フレーム番号ｔの前
向き確率ｃ_itを求めるので、演算量を大幅に削減でき
る。As is apparent from the above description, according to the speech recognition method of the invention of claim 1, when the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is less than or equal to the threshold DTS (dts ≦ DTS), the reference probability b_ji is not rewritten. The forward probability c_it of the current frame number t is thus obtained by reading out the reference probability b_ji, without calculating the output probability b_ji(x_t) of the current frame number t from the Hidden Markov Model, so the amount of calculation can be greatly reduced.

【０４００】しかもｄｔｓ≦ＤＴＳとなる場合に、現フ
レーム番号ｔの音声特徴ベクトルｘ_t は基準フレーム番
号ｑｓの音声特徴ベクトルｘ_qsからの変化が小さいの
で、現フレーム番号ｔの出力確率ｂ_ji(x_t)を参照確率ｂ
_jiで近似できる。従ってこのようにｄｔｓ≦ＤＴＳとな
る場合に演算を簡略化して前向き確率ｃ_itを求めても、
前向き確率ｃ_itの誤差を小さくできる。[0400] Moreover, when dts ≦ DTS, the speech feature vector x_t of the current frame number t has changed little from the speech feature vector x_qs of the reference frame number qs, so the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji. Therefore, even if the forward probability c_it is obtained with the calculation simplified in this way when dts ≦ DTS, the error of the forward probability c_it can be kept small.

【０４０１】これがため音声認識を行なう際の、尤度ln
｛Ｐ（ｘ₁ 、ｘ₂ 、……、ｘ_T ）｝＝Ｃ_iTを求める過程
において、前向き確率ｃ_itの誤差を低減しつつ、演算を
簡略化できるので、認識精度の低下を避けつつ高速に音
声認識を行なえる。Consequently, in the process of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT during speech recognition, the calculation can be simplified while the error of the forward probability c_it is kept small, so speech recognition can be performed at high speed while avoiding a drop in recognition accuracy.

【０４０２】さらに請求項３の発明の音声認識方法によ
れば、現フレーム番号ｔの出力確率ｂ_ji(x_t)を与える遷
移元Ｓ_j の種別ｓが定常部である場合に、現フレーム番
号ｔの音声特徴ベクトルｘ_t と定常部基準フレーム番号
ｑｓの音声特徴ベクトルｘ_qsとの間の距離ｄｔｓが閾値
ＤＴＳ以下（ｄｔｓ≦ＤＴＳ）であれば、当該種別ｓを
得たｊに関しては、参照確率ｂ_jiの書換えを行なわずに
従って現フレーム番号ｔの出力確率ｂ_ji(x_t)をヒドンマ
ルコフモデルから求める演算を行なわずに、参照確率ｂ
_jiを読み出して現フレーム番号ｔの前向き確率ｃ_itを求
める。また現フレーム番号ｔの出力確率ｂ_ji(x_t)を与え
る遷移元Ｓ_j の種別ｓが過渡部である場合に、現フレー
ム番号ｔの音声特徴ベクトルｘ_t と過渡部基準フレーム
番号ｑｔの音声特徴ベクトルｘ_qtとの間の距離ｄｔｔが
閾値ＤＴＴ以下（ｄｔｔ≦ＤＴＴ）であれば、当該種別
ｓを得たｊに関しては、参照確率ｂ_jiの書換えを行なわ
ずに従って現フレーム番号ｔの出力確率ｂ_ji(x_t)をヒド
ンマルコフモデルから求める演算を行なわずに、参照確
率ｂ_jiを読み出して現フレーム番号ｔの前向き確率ｃ_it
を求める。このように定常部の場合はｄｔｓ≦ＤＴＳ及
び過渡部の場合はｄｔｔ≦ＤＴＴであれば、参照確率ｂ
_jiの書換えを行なわずに前向き確率ｃ_itを求めるので、
演算量を大幅に低減できる。Further, according to the speech recognition method of the invention of claim 3, when the type s of the transition source S_j that gives the output probability b_ji(x_t) of the current frame number t is the stationary part, and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary part reference frame number qs is less than or equal to the threshold DTS (dts ≦ DTS), the reference probability b_ji is not rewritten for the j that has this type s. The forward probability c_it of the current frame number t is thus obtained by reading out the reference probability b_ji, without calculating the output probability b_ji(x_t) of the current frame number t from the Hidden Markov Model. Likewise, when the type s of the transition source S_j that gives the output probability b_ji(x_t) of the current frame number t is the transient part, and the distance dtt between the speech feature vector x_t of the current frame number t and the speech feature vector x_qt of the transient part reference frame number qt is less than or equal to the threshold DTT (dtt ≦ DTT), the reference probability b_ji is not rewritten for the j that has this type s, and the forward probability c_it is again obtained by reading out the reference probability b_ji without calculating b_ji(x_t) from the Hidden Markov Model. Because the forward probability c_it is obtained without rewriting the reference probability b_ji whenever dts ≦ DTS for the stationary part or dtt ≦ DTT for the transient part, the amount of calculation can be greatly reduced.

【０４０３】ｄｔｓ≦ＤＴＳであれば、現フレーム番号
ｔの音声特徴ベクトルｘ_t は定常部基準フレーム番号ｑ
ｓの音声特徴ベクトルｘ_qsからの変化が小さいので、当
該種別ｓを得たｊに関しては、現フレーム番号ｔの出力
確率ｂ_ji(x_t)を参照確率ｂ_jiで近似できる。またｄｔｔ
≦ＤＴＴであれば、現フレーム番号ｔの音声特徴ベクト
ルｘ_t は過渡部基準フレーム番号ｑｔの音声特徴ベクト
ルｘ_qtからの変化が小さいので、当該種別ｓを得たｊに
関して、現フレーム番号ｔの出力確率ｂ_ji(x_t)を
参照確率ｂ_jiで近似できる。従ってこのようにｄｔｓ
≦ＤＴＳ若しくはｄｔｔ≦ＤＴＴの場合に演算を簡略化
して前向き確率ｃ_itを求めても、前向き確率ｃ_itの誤差
を小さくできる。If dts ≦ DTS, the speech feature vector x_t of the current frame number t has changed little from the speech feature vector x_qs of the stationary part reference frame number qs, so for the j that has this type s the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji. Likewise, if dtt ≦ DTT, the speech feature vector x_t of the current frame number t has changed little from the speech feature vector x_qt of the transient part reference frame number qt, so for the j that has this type s the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji. Therefore, even if the forward probability c_it is obtained with the calculation simplified when dts ≦ DTS or dtt ≦ DTT, the error of the forward probability c_it can be kept small.
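
The type-dependent update of the invention of claim 3 can be sketched as follows. The function name `update_reference_probs`, the dictionary keys "stationary" and "transient", the Euclidean distance, and the callback `compute_row` (standing in for the calculation of b_ji(x_t) for one transition source j and all i) are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance between two feature vectors (the metric is an assumption)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def update_reference_probs(
    x_t: Vector,
    state_types: Sequence[str],            # type s of each transition source S_j
    ref_prob: List[List[float]],           # reference probabilities b_ji, row j
    ref_vectors: Dict[str, Vector],        # x_qs and x_qt, keyed by type
    thresholds: Dict[str, float],          # DTS and DTT, keyed by type
    compute_row: Callable[[int, Vector], List[float]],
) -> List[int]:
    """Rewrite only the rows whose type has drifted; return the rewritten j."""
    stale: Dict[str, bool] = {}
    for kind, limit in thresholds.items():         # distance comparisons
        stale[kind] = euclidean(x_t, ref_vectors[kind]) > limit
        if stale[kind]:
            ref_vectors[kind] = x_t                # frame t becomes the reference
    rewritten: List[int] = []
    for j, kind in enumerate(state_types):         # type check per source state
        if stale[kind]:
            ref_prob[j] = compute_row(j, x_t)      # recalculate b_ji(x_t), all i
            rewritten.append(j)
    return rewritten


# Hypothetical three-state example with made-up thresholds.
ref_prob = [[0.1], [0.2], [0.3]]
ref_vectors = {"stationary": [0.0], "transient": [0.0]}
rewritten = update_reference_probs(
    [0.5],
    ["stationary", "transient", "stationary"],
    ref_prob,
    ref_vectors,
    {"stationary": 1.0, "transient": 0.2},
    lambda j, x: [x[0] + j],
)
```

In this example only the row of the transient source state is recalculated, because the distance 0.5 exceeds the transient threshold but not the stationary one.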

【０４０４】これがため音声認識を行なう際の、尤度ln
｛Ｐ（ｘ₁ 、ｘ₂ 、……、ｘ_T ）｝＝Ｃ_iTを求める過程
において、前向き確率ｃ_itの誤差を低減しつつ、演算を
簡略化できるので、認識精度の低下を避けつつ高速に音
声認識を行なえる。Consequently, in the process of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT during speech recognition, the calculation can be simplified while the error of the forward probability c_it is kept small, so speech recognition can be performed at high speed while avoiding a drop in recognition accuracy.

【０４０５】さらに請求項７の発明の音声認識方法によ
れば、現フレーム番号ｔの音声特徴ベクトルｘ_t と前向
き確率基準フレーム番号ｑｃの音声特徴ベクトルｘ_qcと
の間の距離ｄｔｃが閾値ＤＴＣ以下となる（ｄｔｃ≦Ｄ
ＴＣとなる）場合は、現フレーム番号ｔの前向き確率ｃ
_itは直前フレームの前向き確率ｃ_i(t-1)に等しいものと
して前向き確率ｃ_itを求める演算を終了する。また距離
ｄｔｃが閾値ＤＴＣを越える（ｄｔｃ＞ＤＴＣとなる）
場合に、現フレーム番号ｔの音声特徴ベクトルｘ_t と出
力確率基準フレーム番号ｑｓの音声特徴ベクトルｘ_qsと
の間の距離ｄｔｓが閾値ＤＴＳ以下（ｄｔｓ≦ＤＴＳ）
となれば、参照確率ｂ_jiの書換えを行なわずに従って現
フレーム番号ｔの出力確率ｂ_ji(x_t)をヒドンマルコフモ
デルから求める演算を行なわずに、参照確率ｂ_jiを読み
出して現フレーム番号ｔの前向き確率ｃ_itを求める。こ
のようにｄｔｃ≦ＤＴＣ若しくはｄｔｓ≦ＤＴＳとなる
場合に、参照確率ｂ_jiの書換えを行なわずに前向き確率
ｃ_itを求めるので、演算量を大幅に削減できる。[0405] Further, according to the speech recognition method of the invention of claim 7, when the distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc is less than or equal to the threshold DTC (dtc ≦ DTC), the forward probability c_it of the current frame number t is taken to be equal to the forward probability c_i(t-1) of the immediately preceding frame, and the calculation of the forward probability c_it ends. When the distance dtc exceeds the threshold DTC (dtc > DTC) and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs is less than or equal to the threshold DTS (dts ≦ DTS), the reference probability b_ji is not rewritten; the forward probability c_it of the current frame number t is obtained by reading out the reference probability b_ji, without calculating the output probability b_ji(x_t) of the current frame number t from the Hidden Markov Model. Because the forward probability c_it is obtained without rewriting the reference probability b_ji when dtc ≦ DTC or dts ≦ DTS, the amount of calculation can be greatly reduced.

【０４０６】しかもｄｔｃ≦ＤＴＣとなる場合に、現フ
レーム番号ｔの音声特徴ベクトルｘ_t は前向き確率基準
フレーム番号ｑｃの音声特徴ベクトルｘ_qcからの変化が
小さいので、現フレーム番号ｔの前向き確率ｃ_itを直前
フレームの前向き確率ｃ_i(t-1)で近似できる。またｄｔ
ｓ≦ＤＴＳ以下となる場合に、現フレーム番号ｔの音声
特徴ベクトルｘ_t は基準フレーム番号ｑｓの音声特徴ベ
クトルｘ_qsからの変化が小さいので、現フレーム番号ｔ
の出力確率ｂ_ji(x_t)を参照確率ｂ_jiで近似できる。従っ
てこのようにｄｔｃ≦ＤＴＣ若しくはｄｔｓ≦ＤＴＳの
場合に演算を簡略化して前向き確率ｃ_itを求めても、前
向き確率ｃ_itの誤差を小さくできる。[0406] Moreover, when dtc ≦ DTC, the speech feature vector x_t of the current frame number t has changed little from the speech feature vector x_qc of the forward probability reference frame number qc, so the forward probability c_it of the current frame number t can be approximated by the forward probability c_i(t-1) of the immediately preceding frame. Likewise, when dts ≦ DTS, the speech feature vector x_t of the current frame number t has changed little from the speech feature vector x_qs of the reference frame number qs, so the output probability b_ji(x_t) of the current frame number t can be approximated by the reference probability b_ji. Therefore, even if the forward probability c_it is obtained with the calculation simplified when dtc ≦ DTC or dts ≦ DTS, the error of the forward probability c_it can be kept small.

【０４０７】これがため音声認識を行なう際の、尤度ln
｛Ｐ（ｘ₁ 、ｘ₂ 、……、ｘ_T ）｝＝Ｃ_iTを求める過程
において、前向き確率ｃ_itの誤差を低減しつつ、演算を
簡略化できるので、認識精度の低下を避けつつ高速に音
声認識を行なえる。Consequently, in the process of obtaining the likelihood ln{P(x₁, x₂, ..., x_T)} = C_iT during speech recognition, the calculation can be simplified while the error of the forward probability c_it is kept small, so speech recognition can be performed at high speed while avoiding a drop in recognition accuracy.

[Brief description of the drawings]

【図１】請求項１の発明の実施に用いて好適な装置構成
の一例を示す図である。FIG. 1 is a diagram showing an example of a device configuration suitable for use in carrying out the invention of claim 1;

【図２】ヒドンマルコフモデルの説明に供する図であ
る。FIG. 2 is a diagram for explaining a Hidden Markov model.

【図３】請求項１の発明の第一実施形態の説明に供する
流れ図である。FIG. 3 is a flowchart for explaining the first embodiment of the invention of claim 1;

【図４】請求項１の発明の第一実施形態の説明に供する
流れ図である。FIG. 4 is a flowchart for explaining the first embodiment of the invention of claim 1;

【図５】請求項１の発明の第二実施形態の説明に供する
流れ図である。FIG. 5 is a flowchart for explaining the second embodiment of the invention of claim 1;

【図６】請求項１の発明の第二実施形態の説明に供する
流れ図である。FIG. 6 is a flowchart for explaining the second embodiment of the invention of claim 1;

【図７】請求項３の発明の実施に用いて好適な装置構成
の一例を示す図である。FIG. 7 is a diagram showing an example of a device configuration suitable for use in carrying out the invention of claim 3;

【図８】ヒドンマルコフモデルの説明に供する図であ
る。FIG. 8 is a diagram for explaining a Hidden Markov model.

【図９】請求項３の発明の第一実施形態の説明に供する
流れ図である。FIG. 9 is a flowchart for explaining the first embodiment of the invention of claim 3;

【図１０】請求項３の発明の第一実施形態の説明に供す
る流れ図である。FIG. 10 is a flowchart for explaining the first embodiment of the invention of claim 3;

【図１１】請求項３の発明の第一実施形態の説明に供す
る流れ図である。FIG. 11 is a flowchart for explaining the first embodiment of the invention of claim 3;

【図１２】請求項３の発明の第二実施形態の説明に供す
る流れ図である。FIG. 12 is a flowchart for explaining the second embodiment of the invention of claim 3;

【図１３】請求項３の発明の第二実施形態の説明に供す
る流れ図である。FIG. 13 is a flowchart for explaining the second embodiment of the invention of claim 3;

【図１４】請求項３の発明の第二実施形態の説明に供す
る流れ図である。FIG. 14 is a flowchart for explaining the second embodiment of the invention of claim 3;

【図１５】請求項７の発明の実施に用いて好適な装置構
成の一例を示す図である。FIG. 15 is a diagram showing an example of a device configuration suitable for use in implementing the invention of claim 7;

【図１６】ヒドンマルコフモデルの説明に供する図であ
る。FIG. 16 is a diagram for explaining a Hidden Markov model.

【図１７】請求項７の発明の第一実施形態の説明に供す
る流れ図である。FIG. 17 is a flowchart for explaining the first embodiment of the invention of claim 7;

【図１８】請求項７の発明の第一実施形態の説明に供す
る流れ図である。FIG. 18 is a flowchart for explaining the first embodiment of the invention of claim 7;

【図１９】請求項７の発明の第一実施形態の説明に供す
る流れ図である。FIG. 19 is a flowchart for explaining the first embodiment of the invention of claim 7;

【図２０】請求項７の発明の第二実施形態の説明に供す
る流れ図である。FIG. 20 is a flowchart for explaining the second embodiment of the invention of claim 7;

【図２１】請求項７の発明の第二実施形態の説明に供す
る流れ図である。FIG. 21 is a flowchart for explaining the second embodiment of the invention of claim 7;

【図２２】請求項７の発明の第二実施形態の説明に供す
る流れ図である。FIG. 22 is a flowchart for explaining the second embodiment of the invention of claim 7.

[Explanation of symbols]

１０、２２、３４：音声認識装置
１２、２４、３６：辞書部
１４、２６、３８：音響処理部
１６、２８、４０：音声区間検出部
１８、３０、４２：照合部
２０、３２、４４：参照情報記憶部

10, 22, 34: Speech recognition device
12, 24, 36: Dictionary section
14, 26, 38: Sound processing section
16, 28, 40: Speech section detection section
18, 30, 42: Collation section
20, 32, 44: Reference information storage section

Claims

[Claims]

1. A speech recognition method in which the likelihood ln{P(x₁, x₂, ..., x_T)} between a time series x₁, x₂, ..., x_T of speech feature vectors, extracted from the start frame to the end frame of a speech section, and a Hidden Markov Model is calculated using equations expressed in terms of the symbols defined below, and the category assigned to the Hidden Markov Model that gives the maximum likelihood is taken as the recognition result for the speech signal in the speech section, where
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
Φ_i: probability that the initial state is S_i in the Hidden Markov Model
a_ji: probability of a transition from state S_j to state S_i in the Hidden Markov Model
x_t: speech feature vector extracted in the t-th frame of the speech section (1 ≦ t ≦ T; the first frame is the start frame and the T-th frame is the end frame of the speech section)
b_ji(x_t): output probability that the speech feature vector x_t is output on a transition from state S_j to state S_i in the Hidden Markov Model
c_it: forward probability of starting from the initial state in the Hidden Markov Model, outputting the time series x₁, x₂, ..., x_t of speech feature vectors, and reaching state S_i
*i: state number i assigned to a state S_i that is a final state in the Hidden Markov Model
the method being characterized in that a storage unit for storing a reference frame number qs and reference probabilities b_ji is provided, and the forward probabilities c_it for t = 1, 2, ..., T are obtained in sequence using the reference probabilities b_ji, such that
(1) when t = 1, a process (1A) is performed in which the reference frame number qs is initialized to 1, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all j and i and written as the initial values of the reference probabilities b_ji, and, after this writing is complete, each reference probability b_ji is read out and the forward probability c_it is obtained; and after process (1A) ends, a process (1B) of adding 1 to the current frame number t is performed; and
(2) when 2 ≦ t ≦ T, a process (1C) is performed in which the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is compared with a threshold DTS; if the comparison result is dts > DTS, the reference frame number qs is rewritten to the current frame number t, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all j and i, the reference probabilities b_ji are rewritten to these output probabilities b_ji(x_t), and, after this rewriting is complete, each reference probability b_ji is read out and the forward probability c_it is obtained; and if the comparison result is dts ≦ DTS, each reference probability b_ji is read out without being rewritten and the forward probability c_it is obtained; and after process (1C) ends, a process (1D) of adding 1 to the current frame number t is performed.

2. The speech recognition method according to claim 1, wherein
(1) when t = 1, a process (1A) is performed in which the reference frame number qs is initialized to 1 and a skip count skips is initialized to 0, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all j and i and written as the initial values of the reference probabilities b_ji, and, after this writing is complete, each reference probability b_ji is read out and the forward probability c_it is obtained; and after process (1A) ends, a process (1B) of adding 1 to the current frame number t is performed; and
(2) when 2 ≦ t ≦ T, a process (1C) is performed in which the skip count skips is compared with a threshold NSKIPS and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the reference frame number qs is compared with the threshold DTS; if the comparison result is skips > NSKIPS or dts > DTS, the skip count skips is initialized to 0, the reference frame number qs is rewritten to the current frame number t, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all j and i, the reference probabilities b_ji are rewritten to these output probabilities b_ji(x_t), and, after this rewriting is complete, each reference probability b_ji is read out and the forward probability c_it is obtained; and if the comparison result is skips ≦ NSKIPS and dts ≦ DTS, 1 is added to the skip count skips, and each reference probability b_ji is read out without being rewritten and the forward probability c_it is obtained; and after process (1C) ends, a process (1D) of adding 1 to the current frame number t is performed.

3. A speech recognition method in which the likelihood ln{P(x₁, x₂, ..., x_T)} between a time series x₁, x₂, ..., x_T of speech feature vectors, extracted from the start frame to the end frame of a speech section, and a Hidden Markov Model is calculated using equations expressed in terms of the symbols defined below, and the category assigned to the Hidden Markov Model that gives the maximum likelihood is taken as the recognition result for the speech signal in the speech section, where
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
Φ_i: probability that the initial state is S_i in the Hidden Markov Model
a_ji: probability of a transition from state S_j to state S_i in the Hidden Markov Model
x_t: speech feature vector extracted in the t-th frame of the speech section (1 ≦ t ≦ T; the first frame is the start frame and the T-th frame is the end frame of the speech section)
b_ji(x_t): output probability that the speech feature vector x_t is output on a transition from state S_j to state S_i in the Hidden Markov Model
c_it: forward probability of starting from the initial state in the Hidden Markov Model, outputting the time series x₁, x₂, ..., x_t of speech feature vectors, and reaching state S_i
*i: state number i assigned to a state S_i that is a final state in the Hidden Markov Model
the method being characterized in that each state S_j that is a transition source in the Hidden Markov Model is assigned a type s of stationary part or transient part; a storage unit for storing a stationary part reference frame number qs, a transient part reference frame number qt, and reference probabilities b_ji is provided; and the forward probabilities c_it for t = 1, 2, ..., T are obtained in sequence using the reference probabilities b_ji, such that
(1) when t = 1, a process (2A) is performed in which the stationary part reference frame number qs and the transient part reference frame number qt are each initialized to 1, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all j and i and written as the initial values of the reference probabilities b_ji, and, after this writing is complete, each reference probability b_ji is read out and the forward probability c_it is obtained; and after process (2A) ends, a process (2B) of adding 1 to the current frame number t is performed; and
(2) when 2 ≦ t ≦ T, the following processes are performed:
a process (2C) in which the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary part reference frame number qs is compared with a threshold DTS and, if the comparison result is dts > DTS, the stationary part reference frame number qs is rewritten to the current frame number t;
a process (2D) in which the distance dtt between the speech feature vector x_t of the current frame number t and the speech feature vector x_qt of the transient part reference frame number qt is compared with a threshold DTT and, if the comparison result is dtt > DTT, the transient part reference frame number qt is rewritten to the current frame number t;
a process (2E) in which, after processes (2C) and (2D) end, for each j = 1, 2, ..., J, the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) is determined;
a process (2F) in which, if the type determined in process (2E) is the stationary part and the comparison result of process (2C) is dts > DTS, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all i for the j that gave this determination result and the reference probabilities b_ji are rewritten to these output probabilities b_ji(x_t); if the type determined in process (2E) is the stationary part and the comparison result of process (2C) is dts ≦ DTS, the reference probabilities b_ji are not rewritten for the j that gave this determination result; if the type determined in process (2E) is the transient part and the comparison result of process (2D) is dtt > DTT, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all i for the j that gave this determination result and the reference probabilities b_ji are rewritten to these output probabilities b_ji(x_t); and if the type determined in process (2E) is the transient part and the comparison result of process (2D) is dtt ≦ DTT, the reference probabilities b_ji are not rewritten for the j that gave this determination result;
a process (2G) in which process (2F) is performed for each j = 1, 2, ..., J and, when process (2F) has ended for all j, each reference probability b_ji is read out and the forward probability c_it is obtained; and
a process (2H) of adding 1 to the current frame number t after process (2G) ends.

4. The speech recognition method according to claim 3, wherein the process (2E) is performed after the processes (2C) and (2D) are completed.

5. The speech recognition method according to claim 3, wherein
(1) when t = 1, a process (2A) is performed in which a stationary part skip count skips and a transient part skip count skipt are each initialized to 0, the stationary part reference frame number qs and the transient part reference frame number qt are each initialized to 1, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all j and i and written as the initial values of the reference probabilities b_ji, and, after this writing is complete, each reference probability b_ji is read out and the forward probability c_it is obtained; and after process (2A) ends, a process (2B) of adding 1 to the current frame number t is performed; and
(2) when 2 ≦ t ≦ T, the following processes are performed:
a process (2C) in which the stationary part skip count skips is compared with a threshold NSKIPS and the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the stationary part reference frame number qs is compared with the threshold DTS; if the comparison result is skips > NSKIPS or dts > DTS, the stationary part skip count skips is initialized to 0 and the stationary part reference frame number qs is rewritten to the current frame number t; and if the comparison result is skips ≦ NSKIPS and dts ≦ DTS, 1 is added to the stationary part skip count skips;
a process (2D) in which the transient part skip count skipt is compared with a threshold NSKIPT and the distance dtt between the speech feature vector x_t of the current frame number t and the speech feature vector x_qt of the transient part reference frame number qt is compared with the threshold DTT; if the comparison result is skipt > NSKIPT or dtt > DTT, the transient part skip count skipt is initialized to 0 and the transient part reference frame number qt is rewritten to the current frame number t; and if the comparison result is skipt ≦ NSKIPT and dtt ≦ DTT, 1 is added to the transient part skip count skipt;
a process (2E) in which, after processes (2C) and (2D) end, for each j = 1, 2, ..., J, the type s assigned to the transition source S_j of the state transition that gives the output probability b_ji(x_t) is determined;
a process (2F) in which, if the type determined in process (2E) is the stationary part and the comparison result of process (2C) is skips > NSKIPS or dts > DTS, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all i for the j that gave this determination result and the reference probabilities b_ji are rewritten to these output probabilities b_ji(x_t); if the type determined in process (2E) is the stationary part and the comparison result of process (2C) is skips ≦ NSKIPS and dts ≦ DTS, the reference probabilities b_ji are not rewritten for the j that gave this determination result; if the type determined in process (2E) is the transient part and the comparison result of process (2D) is skipt > NSKIPT or dtt > DTT, the output probabilities b_ji(x_t) are calculated from the Hidden Markov Model for all i for the j that gave this determination result and the reference probabilities b_ji are rewritten to these output probabilities b_ji(x_t); and if the type determined in process (2E) is the transient part and the comparison result of process (2D) is skipt ≦ NSKIPT and dtt ≦ DTT, the reference probabilities b_ji are not rewritten for the j that gave this determination result;
a process (2G) in which process (2F) is performed for each j = 1, 2, ..., J and, when process (2F) has ended for all j, each reference probability b_ji is read out and the forward probability c_it is obtained; and
a process (2H) of adding 1 to the current frame number t after process (2G) ends.

6. The speech recognition method according to claim 5, wherein the process (2E) is performed after the processes (2C) and (2D) are completed.

7. A voice recognition method in which a likelihood ln{P(x₁, x₂, ..., x_T)} between a time series x₁, x₂, ..., x_T of speech feature vectors, extracted from the start frame to the end frame of a speech section, and a Hidden Markov model is calculated, and the category assigned to the Hidden Markov model giving the maximum likelihood is taken as the recognition result for the speech signal in the speech section, the likelihood ln{P(x₁, x₂, ..., x_T)} being obtained using equations expressed in terms of the following quantities:
i: i = 1, 2, ..., I
j: j = 1, 2, ..., J
Φ_i: probability that the initial state of the Hidden Markov model is S_i
a_ji: probability of a transition from state S_j to state S_i in the Hidden Markov model
x_t: speech feature vector extracted in the t-th frame of the speech section (1 ≦ t ≦ T, where the first frame is the start frame and the T-th frame is the end frame of the speech section)
b_ji(x_t): output probability that the speech feature vector x_t is output on a transition from state S_j to state S_i in the Hidden Markov model
c_it: forward probability that the Hidden Markov model starts from the initial state, outputs the time series x₁, x₂, ..., x_t of speech feature vectors, and reaches state S_i
*i: state number i assigned to the final state S_i of the Hidden Markov model
wherein a storage unit is provided for storing a forward probability reference frame number qc, an output probability reference frame number qs, and a reference probability b_ji, and the forward probabilities c_it for t = 1, 2, ..., T are obtained sequentially using the reference probability b_ji, the method being characterized by performing:
(1) when t = 1, a process (3A) of initializing the forward probability reference frame number qc and the output probability reference frame number qs to 1, obtaining the output probabilities b_ji(x_t) for all j and i from the Hidden Markov model, writing b_ji(x_t) as the initial value of the reference probability b_ji, and, after the writing of the reference probability b_ji is completed, reading out each reference probability b_ji to obtain the forward probability c_it; and a process (3B) of adding 1 to the current frame number t after the process (3A) ends; and
(2) when 2 ≦ t ≦ T, a process (3C) of comparing the distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc with a threshold DTC;
a process (3D) of, when the comparison result of the process (3C) is dtc ≦ DTC, regarding the forward probability c_it as equal to the forward probability c_i(t-1) of the immediately preceding frame and ending the calculation of the forward probability c_it;
a process (3E) of, when the comparison result of the process (3C) is dtc > DTC, rewriting the forward probability reference frame number qc to the current frame number t;
a process (3F) of, after the process (3E) ends, comparing the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs with a threshold DTS, and, when the comparison result is dts > DTS, rewriting the output probability reference frame number qs to the current frame number t, obtaining the output probabilities b_ji(x_t) for all j and i from the Hidden Markov model, rewriting the reference probability b_ji to the output probability b_ji(x_t), and, after the rewriting is completed, reading out each reference probability b_ji to obtain the forward probability c_it, or, when the comparison result is dts ≦ DTS, reading out each reference probability b_ji without rewriting it to obtain the forward probability c_it; and
a process (3G) of adding 1 to the current frame number t after the process (3D) or (3F) ends.
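The control flow of processes (3A) to (3G) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the equations of the claim are not reproduced in this text, so the sketch assumes the ordinary forward recursion c_it = Σ_j c_j(t-1) · a_ji · b_ji(x_t) with c_i0 = Φ_i, assumes a Euclidean distance for dtc and dts, and uses 0-based frame indices. The function name `forward_with_references`, the callable `output_prob`, and the toy model values are all hypothetical.

```python
import math

def forward_with_references(xs, phi, a, output_prob, final_state, DTC, DTS):
    """Approximate ln P(x_1..x_T) following the steps (3A)-(3G).

    xs          : list of feature vectors (frame t is xs[t], 0-based)
    phi[i]      : initial-state probability of S_i
    a[j][i]     : transition probability from S_j to S_i
    output_prob : hypothetical callable (j, i, x) -> b_ji(x)
    Returns (log-likelihood, number of full b_ji evaluations).
    """
    n = len(phi)
    c = list(phi)          # assumed starting point: c_i0 = phi_i
    b_ref = None           # stored reference probabilities b_ji
    qc = qs = 0            # reference frame numbers (0-based)
    evals = 0
    for t, x in enumerate(xs):
        if t > 0 and math.dist(x, xs[qc]) <= DTC:
            continue       # (3D): keep c_it equal to c_i(t-1)
        qc = t             # (3E), and the initialisation when t is the first frame
        if t == 0 or math.dist(x, xs[qs]) > DTS:
            qs = t         # rewrite the reference probabilities from the model
            b_ref = [[output_prob(j, i, x) for i in range(n)] for j in range(n)]
            evals += 1
        # (3F): read the stored b_ji and update c (assumed recursion)
        c = [sum(c[j] * a[j][i] * b_ref[j][i] for j in range(n)) for i in range(n)]
    return math.log(c[final_state]), evals

# Toy two-state left-to-right model with 1-D features (illustrative values only).
MEANS = [0.0, 1.0]

def toy_output_prob(j, i, x):
    # Unnormalised Gaussian-like score that depends only on the target state i.
    return math.exp(-(x[0] - MEANS[i]) ** 2)

PHI = [1.0, 0.0]
A = [[0.6, 0.4], [0.0, 1.0]]
XS = [[0.0], [0.0], [1.0], [1.0]]

exact = forward_with_references(XS, PHI, A, toy_output_prob, 1, -1.0, -1.0)
reuse = forward_with_references(XS, PHI, A, toy_output_prob, 1, -1.0, 0.5)
skip = forward_with_references(XS, PHI, A, toy_output_prob, 1, 0.5, 0.5)
print(exact, reuse, skip)
```

Negative thresholds disable both shortcuts, so the first call evaluates b_ji on every frame. Raising DTS alone reuses the stored b_ji on frames close to frame qs, and raising DTC as well also skips the forward update on frames close to frame qc.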

8. The voice recognition method according to claim 7, wherein
(1) when t = 1, the process (3A) initializes the forward probability reference frame number qc and the output probability reference frame number qs to 1, initializes a forward probability skip count skipc and an output probability skip count skips to 0, obtains the output probabilities b_ji(x_t) for all j and i from the Hidden Markov model, writes the output probability b_ji(x_t) as the initial value of the reference probability b_ji, and, after the writing of the reference probability b_ji is completed, reads out each reference probability b_ji to obtain the forward probability c_it, and the process (3B) adds 1 to the current frame number t after the process (3A) ends; and
(2) when 2 ≦ t ≦ T, the process (3C) compares the forward probability skip count skipc with a threshold NSKIPC and compares the distance dtc between the speech feature vector x_t of the current frame number t and the speech feature vector x_qc of the forward probability reference frame number qc with the threshold DTC;
the process (3D), when the comparison result of the process (3C) is skipc ≦ NSKIPC and dtc ≦ DTC, regards the forward probability c_it as equal to the forward probability c_i(t-1) of the immediately preceding frame, ends the calculation of the forward probability c_it, and adds 1 to each of the forward probability skip count skipc and the output probability skip count skips;
the process (3E), when the comparison result of the process (3C) is skipc > NSKIPC or dtc > DTC, initializes the forward probability skip count skipc to 0 and rewrites the forward probability reference frame number qc to the current frame number t;
the process (3F), after the process (3E) ends, compares the output probability skip count skips with a threshold NSKIPS and compares the distance dts between the speech feature vector x_t of the current frame number t and the speech feature vector x_qs of the output probability reference frame number qs with the threshold DTS, and, when the comparison result is skips > NSKIPS or dts > DTS, initializes the output probability skip count skips to 0, rewrites the output probability reference frame number qs to the current frame number t, obtains the output probabilities b_ji(x_t) for all j and i from the Hidden Markov model, rewrites the reference probability b_ji to the output probability b_ji(x_t), and, after the rewriting is completed, reads out each reference probability b_ji to obtain the forward probability c_it, or, when the comparison result is skips ≦ NSKIPS and dts ≦ DTS, adds 1 to the output probability skip count skips and reads out each reference probability b_ji without rewriting it to obtain the forward probability c_it; and
the process (3G) adds 1 to the current frame number t after the process (3D) or (3F) ends.
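The effect of the skip counts skipc and skips can be illustrated with a short Python sketch that records only which branch is taken for each frame. It is an illustration under stated assumptions, not the claimed implementation: the distance is assumed to be Euclidean, frame indices are 0-based, and the function name `plan_frames` and the branch labels are hypothetical.

```python
import math

def plan_frames(xs, DTC, DTS, NSKIPC, NSKIPS):
    """Return, per frame, which branch of the skip-count procedure is taken.

    "recompute"    : b_ji is recomputed from the model, then c_it is updated
    "reuse_output" : stored b_ji is read unchanged, then c_it is updated
    "copy_forward" : c_it is taken as equal to c_i(t-1), process (3D)
    """
    qc = qs = 0              # reference frame numbers (0-based)
    skipc = skips = 0        # skip counts, initialised in process (3A)
    plan = ["recompute"]     # the first frame always computes b_ji and c_it
    for t in range(1, len(xs)):
        dtc = math.dist(xs[t], xs[qc])
        if skipc <= NSKIPC and dtc <= DTC:
            skipc += 1       # (3D): both counts are incremented
            skips += 1
            plan.append("copy_forward")
            continue
        skipc = 0            # (3E)
        qc = t
        dts = math.dist(xs[t], xs[qs])
        if skips > NSKIPS or dts > DTS:
            skips = 0        # (3F), rewrite branch
            qs = t
            plan.append("recompute")
        else:
            skips += 1       # (3F), reuse branch
            plan.append("reuse_output")
    return plan

# Eight identical frames: without the counts every later frame would be copied.
frames = [[0.0]] * 8
print(plan_frames(frames, DTC=1.0, DTS=1.0, NSKIPC=1, NSKIPS=3))
```

With identical frames, the distance tests alone would never trigger a new calculation. In this sketch the counts force a forward probability update once skipc exceeds NSKIPC and a fresh output probability calculation once skips exceeds NSKIPS, which bounds how long a stale reference can be reused.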