JPH06175685A

JPH06175685A - Pattern recognition device and hidden markov model generating device

Info

Publication number: JPH06175685A
Application number: JP4329489A
Authority: JP
Inventors: Junichi Nakabashi; 順一中橋; Hidekazu Tsuboka; 英一坪香
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-12-09
Filing date: 1992-12-09
Publication date: 1994-06-24

Abstract

PURPOSE: To simplify the configuration of the entire device and reduce the number of arithmetic operations by computing the weighting coefficient vector beforehand and eliminating the weighting coefficient vector calculation unit. CONSTITUTION: The device has weighting coefficient vector storage means, stores a weighting coefficient vector before learning of the Hidden Markov Model (HMM), and uses its fixed values during pattern recognition. Specifically, a weighting coefficient vector storage unit 104 stores weighting coefficients u_1, u_2, ..., u_K, corresponding to the 1st to Kth representative vectors, as fixed values independent of the frames of the input speech. The weighting coefficient vector computation that was previously performed sequentially during HMM learning and recognition is thereby eliminated, so speech recognition can be performed with a small storage capacity, a simpler device, and less computation.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a pattern recognition device and a Hidden Markov Model (HMM) creating device (hereinafter simply "HMM creating device"), and more particularly to a device for identifying time-series patterns, as in speech recognition.

[0002] The present invention is applicable to time-series signals in general; for convenience of explanation, however, both the prior art and the present invention are described below using speech recognition as the example.

[0003]

[Prior Art] In general, a speech recognition device converts an unknown speech signal into a sequence of predetermined acoustic feature vectors and then compares it with previously stored acoustic feature vectors that represent identified reference patterns. As a result of the comparison, the unknown speech signal is identified as the reference pattern that best matches it under a predetermined recognition criterion. At present, the reference pattern regarded as giving the best performance is the HMM, which uses a set of states and state transitions based on statistical estimation.

[0004] The HMM is therefore described first. An HMM is used to evaluate an observation sequence O = o_1, o_2, ..., o_T in which each observation is one of a finite number M of symbols. The observation sequence can be modeled as a probabilistic function of an underlying Markov chain whose state transitions are not directly observable. FIG. 5 illustrates such an HMM.

[0005] FIG. 5 takes as an example N = 3 states and M = 4 output symbols. Transitions among states 1, 2, and 3 are represented by the state transition probability matrix A = [a_ij], where a_ij is the probability that the model, when in state i, makes a transition to state j. The output symbol probabilities of the model are represented by the symbol output probability matrix B = [b_j(k)], where b_j(k) is the probability of outputting symbol k when the model has made a transition to state j. One HMM is created for each vocabulary word, and the HMMs can be used to classify an observation sequence on the basis of the probability that each HMM generates the unknown observation sequence.

[0006] Let O = o_1, o_2, ..., o_T be the symbol sequence obtained for an unknown input, and let S = s_1, s_2, ..., s_T be an arbitrary state sequence of length T that the HMM λ can generate. The probability (likelihood) that λ generates the symbol sequence O is then given by (Equation 1).

【０００７】[0007]

【数１】 [Equation 1]
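The image for (Equation 1) is not reproduced in this text. As a minimal sketch, assuming the usual discrete-HMM likelihood (a sum over all state sequences of products of transition and output probabilities) and an initial state distribution `pi` that the text does not mention, the likelihood can be computed with the forward algorithm. The toy values of `pi`, `A`, and `B` are hypothetical and only mirror the N = 3, M = 4 example of FIG. 5.

```python
def forward_likelihood(pi, A, B, obs):
    """Likelihood of a symbol sequence under a discrete HMM (forward algorithm).

    pi  : initial state probabilities (an assumption; not defined in the text)
    A   : A[i][j] = probability of a transition from state i to state j
    B   : B[j][k] = probability of outputting symbol k in state j
    obs : list of symbol indices o_1 .. o_T
    """
    n = len(pi)
    # alpha[j] = probability of the observations so far, ending in state j
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical left-to-right model with N = 3 states and M = 4 symbols.
pi = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
B = [[0.7, 0.1, 0.1, 0.1],
     [0.1, 0.7, 0.1, 0.1],
     [0.1, 0.1, 0.4, 0.4]]
likelihood = forward_likelihood(pi, A, B, [0, 1, 2])
```

The forward recursion gives the same value as summing over every state sequence explicitly, but with cost proportional to T·N² rather than N^T.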

[0008] The above converts each input x_t into a single symbol o_t. There is also a method that converts x_t into K symbols o_t1, o_t2, ..., o_tK (symbol vector q_t = (o_t1, o_t2, ..., o_tK)) and represents it together with weighting coefficients u_t1, u_t2, ..., u_tK for the respective symbols (weighting coefficient vector v_t = (u_t1, u_t2, ..., u_tK)). In that case the unknown input is expressed by the symbol vector sequence Q = q_1, q_2, ..., q_T and the weighting coefficient vector sequence V = v_1, v_2, ..., v_T, and the probability (likelihood) that the HMM λ generates the symbol vector sequence is given by (Equation 2).

【０００９】[0009]

【数２】 [Equation 2]
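The image for (Equation 2) is likewise not reproduced. The list of reference numerals names a "weighted probability sum calculation unit", which suggests that the single output probability b_j(o_t) is replaced by a weighted sum over the K symbols of the frame. The sketch below assumes that form; the exact formula in the patent may differ, and the function name and the matrix `B` are illustrative only.

```python
def weighted_emission(B, j, symbols, weights):
    """Weighted sum of state j's output probabilities over one frame's K symbols.

    symbols : q_t = (o_t1, ..., o_tK), symbol indices ordered by rank
    weights : v_t = (u_t1, ..., u_tK), weighting coefficients for those symbols
    """
    return sum(u * B[j][o] for o, u in zip(symbols, weights))

# Hypothetical output probability matrix (3 states, 4 symbols).
B = [[0.7, 0.1, 0.1, 0.1],
     [0.1, 0.7, 0.1, 0.1],
     [0.1, 0.1, 0.4, 0.4]]

# Frame quantized to symbols 0 and 1 with equal weights, scored in state 0.
score = weighted_emission(B, 0, [0, 1], [0.5, 0.5])
```

With K = 1 and a weight of 1 this reduces to the single-symbol output probability, so the same likelihood recursion can be reused with this value in place of b_j(o_t).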

[0010] Recognition is performed by comparing the likelihoods obtained from the HMMs of the individual vocabulary words.

【００１１】[0011]

【外１】 [Outer 1]

【００１２】[0012]

【数３】 [Equation 3]

[0013] FIG. 6 is a block diagram showing the configuration of a conventional speech recognition device using HMMs. In the figure, 601 is a feature extraction unit, which converts the input speech signal v into a sequence of feature vectors X = x_1, x_2, ..., x_t, ..., x_T at fixed time intervals by a well-known method such as linear predictive coding (LPC) analysis or the Fourier transform. Here, T is the length of the feature vector sequence for the input speech signal v.

[0014] 602 is a so-called codebook, which holds the representative vectors of the finite number M of symbols, as shown in FIG. 7. That is, it consists of M rows, one per codebook entry, each row storing the symbol in its first column and the representative vector in the remaining columns.

[0015] 603 is a vector quantization unit, which replaces the feature vector x_t with the symbols of the 1st to Kth nearest representative vectors in the codebook 602, converting it into the symbol vector q_t = (o_t1, o_t2, ..., o_tk, ..., o_tK), and thereby converts the feature vector sequence into the symbol vector sequence Q = q_1, q_2, ..., q_T.
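The top-K vector quantization step can be sketched as follows. The text does not name the distance measure, so squared Euclidean distance is assumed here; the codebook contents and the function name `top_k_symbols` are hypothetical.

```python
def top_k_symbols(x, codebook, k):
    """Return the symbols of the k representative vectors nearest to x, nearest first.

    codebook : list of (symbol, representative_vector) rows, as in FIG. 7
    """
    def sq_dist(a, b):
        # Squared Euclidean distance (an assumed distance measure).
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    ranked = sorted(codebook, key=lambda row: sq_dist(x, row[1]))
    return [symbol for symbol, _ in ranked[:k]]

# Hypothetical codebook with M = 4 two-dimensional representative vectors.
codebook = [(0, (0.0, 0.0)), (1, (1.0, 0.0)), (2, (0.0, 1.0)), (3, (1.0, 1.0))]
q_t = top_k_symbols((0.9, 0.2), codebook, k=2)
```

Applying the function to every frame x_1, ..., x_T yields the symbol vector sequence Q.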

[0016] 604 is a weighting coefficient vector calculation unit, which calculates, in accordance with (Equation 4), a weighting coefficient for each of the 1st to Kth representative vectors selected by the vector quantization unit 603 for the feature vector x_t, yielding the weighting coefficient vector v_t = (u_t1, u_t2, ..., u_tk, ..., u_tK) and hence the weighting coefficient vector sequence V = v_1, v_2, ..., v_T.

【００１７】[0017]

【数４】 [Equation 4]

【００１８】[0018]

【外２】 [Outside 2]

【００１９】[0019]

【外３】 [Outside 3]

【００２０】[0020]

【数５】 [Equation 5]

【００２１】[0021]

【外４】 [Outside 4]

【００２２】[0022]

【数６】 [Equation 6]

[0023] 608 is a likelihood storage unit, which stores the likelihood of each word calculated by the likelihood calculation unit 607 so that the likelihoods can be compared.

[0024] 609 is a comparison and determination unit, which determines as the recognition result (rec) the vocabulary word corresponding to the HMM that gives the maximum of the likelihoods stored in the likelihood storage unit 608 for the respective HMMs.

[0025] The processing of units 606 to 608 is performed once for the HMM of each vocabulary word, repeating for w = 1 to W, and the results are evaluated by the comparison and determination unit 609.
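The recognition loop over the W vocabulary words can be sketched as below. The `score` callback stands in for the likelihood calculation, and the model values are placeholders; both are illustrative assumptions rather than anything specified in the text.

```python
def recognize(models, score):
    """Score the input against every word's HMM and return the best word.

    models : dict mapping each vocabulary word to its HMM
    score  : function returning the likelihood of the input under one HMM
    """
    stored = {}  # plays the role of the likelihood storage unit
    for word, model in models.items():  # w = 1 .. W
        stored[word] = score(model)
    # Comparison and determination: the word whose HMM gives the maximum likelihood.
    return max(stored, key=stored.get)

# Placeholder "models" that are simply their own likelihoods.
rec = recognize({"tokyo": 0.2, "osaka": 0.5, "kyoto": 0.1}, score=lambda m: m)
```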

[0026] To perform recognition using HMMs as described above, the HMMs must be created in advance. This is called HMM learning (training), and the method is described below.

【００２７】[0027]

【外５】 [Outside 5]

[0028] 802 is a so-called codebook, which holds the representative vectors of the finite number M of symbols; its structure is the same as in FIG. 7.

【００２９】[0029]

【外６】 [Outside 6]

【００３０】[0030]

【外７】 [Outside 7]

【００３１】[0031]

【外８】 [Outside 8]

[0032] 806 is an HMM temporary storage unit, which stores the initial HMM (with A and B set from random numbers, empirical values, or the like) and, during iterative learning, the intermediate HMM obtained before learning has converged. It holds the state transition probability matrix A and the symbol output probability matrix B and updates them each time one learning pass finishes.

【００３３】[0033]

【外９】 [Outside 9]

【００３４】[0034]

【外１０】 [Outside 10]

【００３５】[0035]

【数７】 [Equation 7]

【００３６】[0036]

【外１１】 [Outside 11]

[0037] 810 is a re-estimation unit, which re-estimates the state transition probability a_ij in accordance with (Equation 8) and the symbol output probability b_i(m) in accordance with (Equation 9).

【００３８】[0038]

【数８】 [Equation 8]

【００３９】[0039]

【数９】 [Equation 9]

[0040] 811 is a learning convergence confirmation unit, which determines from the state of the re-estimation unit 810 whether learning has converged, and sends to the re-estimated HMM storage unit 812 a convergence signal y if it has, or a re-estimation command signal n otherwise.

[0041] The re-estimated HMM storage unit 812 temporarily stores the re-estimated HMM. Depending on the signal from the learning convergence confirmation unit 811, it stores the re-estimated HMM in the HMM storage unit 605 of FIG. 6 if the signal is the convergence signal y, or in the HMM temporary storage unit 806 if the signal is the re-estimation command signal n.

[0042] The processing of units 807 to 810 is repeated until the learning convergence confirmation unit 811 produces the convergence signal y.
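The iterate-until-converged control flow can be sketched as follows. The re-estimation formulas (Equations 8 and 9) are not reproduced in this text, so `reestimate` and `score` are caller-supplied placeholders, and the convergence test (change in score below a tolerance) is an assumption; the text does not state the criterion.

```python
def train_until_converged(model, reestimate, score, tol=1e-4, max_iters=100):
    """Repeat re-estimation until the score stops changing by more than tol."""
    prev = score(model)
    for _ in range(max_iters):
        model = reestimate(model)   # updated model goes to temporary storage
        cur = score(model)
        if abs(cur - prev) < tol:   # corresponds to the convergence signal y
            break
        prev = cur                  # corresponds to the re-estimation command n
    return model                    # final model goes to the HMM storage unit

# Toy stand-in: the "model" is a number that each pass moves halfway toward 1.0.
final = train_until_converged(0.0, reestimate=lambda m: (m + 1.0) / 2, score=lambda m: m)
```

The `max_iters` guard is an added safety limit so the loop terminates even if the criterion is never met.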

[0043] The above is the configuration of the conventional speech recognition device using HMMs and of the conventional HMM creating device.

[0044]

[Problem to Be Solved by the Invention] The weighting coefficient vector calculation units 604 and 804 of FIGS. 6 and 8, used in conventional speech recognition and the like as described above, perform a calculation such as (Equation 4), which makes their configuration complex and increases the number of arithmetic operations.

[0045] To solve this problem, the present invention aims to simplify the configuration of the entire device and reduce the number of arithmetic operations by calculating the weighting coefficient vector in advance and thereby eliminating the weighting coefficient vector calculation unit.

[0046]

[Means for Solving the Problem] The invention according to claim 1 is a pattern recognition device characterized by having weighting coefficient vector storage means, storing a weighting coefficient vector in advance before learning of the HMM (Hidden Markov Model), and using that fixed value during pattern recognition.

[0047] The invention according to claim 2 is a Hidden Markov Model creating device characterized by having weighting coefficient vector storage means, storing a weighting coefficient vector in advance before learning of the HMM (Hidden Markov Model), and using that fixed value during learning of the Hidden Markov Model.

[0048]

[Operation] According to the present invention, a weighting coefficient vector calculated before HMM learning is stored in the weighting coefficient vector storage unit. This eliminates the weighting coefficient vector calculation that was previously performed sequentially during HMM learning and recognition, so the device can be simplified and the amount of computation reduced.

[0049]

[Embodiments] The present invention is described below with reference to embodiments.

[0050] FIG. 1 is a block diagram showing the configuration of a speech recognition device using HMMs according to the first embodiment of the present invention. In the figure, 101 is a feature extraction unit, which converts the input speech signal v into a sequence of feature vectors X = x_1, x_2, ..., x_t, ..., x_T at fixed time intervals by a well-known method such as LPC analysis or the Fourier transform. Here, T is the length of the feature vector sequence for the input speech signal.

[0051] 102 is a so-called codebook, which holds the representative vectors of the finite number M of symbols; its structure is the same as in FIG. 7.

[0052] 103 is a vector quantization unit, which replaces the feature vector x_t with the symbols of the 1st to Kth nearest representative vectors in the codebook 102, converting it into the symbol vector q_t = (o_t1, o_t2, ..., o_tk, ..., o_tK), and thereby converts the feature vector sequence into the symbol vector sequence Q = q_1, q_2, ..., q_T.

[0053] 104 is a weighting coefficient vector storage unit, which is a feature of the present invention. It stores the weighting coefficients (u_1, ..., u_k, ..., u_K) for the 1st to Kth representative vectors as fixed values that do not depend on the frame of the input speech. These values may be set, for example, to the reciprocals of the ranks 1 to K, or to suitably chosen values that decrease gradually from rank 1 to rank K.
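The reciprocal-of-rank choice can be written down directly. In this sketch the optional normalization to a sum of 1 is an added assumption (the text only says the values may be the reciprocals of the ranks), and the function name is illustrative.

```python
def reciprocal_rank_weights(k, normalize=True):
    """Fixed weights u_1 .. u_K proportional to 1/rank, independent of the frame."""
    raw = [1.0 / rank for rank in range(1, k + 1)]
    if not normalize:
        return raw
    total = sum(raw)
    return [u / total for u in raw]  # assumed normalization so the weights sum to 1

weights = reciprocal_rank_weights(3)            # proportional to 1, 1/2, 1/3
raw_weights = reciprocal_rank_weights(3, False)
```

Because the weights depend only on K, they can be computed once and stored, and the same K values are then reused for every frame.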

【００５４】[0054]

【外１２】 [Outside 12]

【００５５】[0055]

【外１３】 [Outside 13]

【００５６】[0056]

【数１０】 [Equation 10]

【００５７】[0057]

【外１４】 [Outside 14]

[0058] 108 is a likelihood storage unit, which stores the likelihood of each word calculated by the likelihood calculation unit 107 so that the likelihoods can be compared.

[0059] 109 is a comparison and determination unit, which determines as the recognition result (rec) the vocabulary word corresponding to the HMM that gives the maximum of the likelihoods stored in the likelihood storage unit 108 for the respective HMMs.

[0060] The processing of units 106 to 108 is performed once for the HMM of each vocabulary word, repeating for w = 1 to W, and the results are evaluated by the comparison and determination unit 109.

[0061] As described above, in this embodiment the calculation by the weighting coefficient vector calculation unit, previously performed sequentially during speech recognition, can be eliminated.

【００６２】[0062]

【外１５】 [Outside 15]

[0063] 202 is a so-called codebook, which holds the representative vectors of the finite number M of symbols; its structure is the same as in FIG. 7.

【００６４】[0064]

【外１６】 [Outside 16]

[0065] 204 is a weighting coefficient vector storage unit, which is a feature of the present invention. It stores the weighting coefficients (u_1, u_2, ..., u_k, ..., u_K) for the 1st to Kth representative vectors as fixed values that do not depend on the frame of the input training speech signal v'. These values may be set, for example, to the reciprocals of the ranks 1 to K, or to suitably chosen values that decrease gradually from rank 1 to rank K.

【００６６】[0066]

【外１７】 [Outside 17]

[0067] 206 is an HMM temporary storage unit, which stores the initial HMM (with A and B set from random numbers, empirical values, or the like) and, during iterative learning, the intermediate HMM obtained before learning has converged. It holds the state transition probability matrix A and the symbol output probability matrix B and updates them each time one learning pass finishes.

【００６８】[0068]

【外１８】 [Outside 18]

【００６９】[0069]

【外１９】 [Outside 19]

【００７０】[0070]

【外２０】 [Outside 20]

[0071] 210 is a re-estimation unit, which re-estimates the state transition probability a_ij in accordance with (Equation 8) and the symbol output probability b_i(m) in accordance with (Equation 9).

[0072] 211 is a learning convergence confirmation unit, which determines whether learning has converged and sends to the re-estimated HMM storage unit 212 a convergence signal y if it has, or a re-estimation command signal n otherwise.

[0073] The re-estimated HMM storage unit 212 temporarily stores the re-estimated HMM. Depending on the signal from the learning convergence confirmation unit 211, it stores the re-estimated HMM in the HMM storage unit 105 of FIG. 1 if the signal is the convergence signal y, or in the HMM temporary storage unit 206 if the signal is the re-estimation command signal n.

[0074] The processing of units 207 to 210 is repeated until the learning convergence confirmation unit 211 produces the convergence signal y.

[0075] The above is the configuration of the speech recognition device using HMMs and of the HMM creating device according to the first embodiment of the present invention.

[0076] As the first embodiment shows, the weighting coefficient vector calculation units 604 and 804 of the conventional devices in FIGS. 6 and 8 are eliminated, and the weighting coefficient vector storage units 104 and 204 are provided in their place. The former must be built as computing devices, whereas the latter need only store at most K values, so the configuration is greatly simplified. Moreover, no calculation has to be performed, which reduces the amount of computation.

[0077] An experiment carried out using the above embodiment of the present invention is described next.

[0078] The recognition vocabulary consisted of 100 Japanese place names. The training data for each HMM consisted of 54 utterances per word in total, obtained from 27 male speakers each uttering every word twice. The recognition data consisted of 9,600 utterances in total, obtained from 48 speakers other than the training speakers each uttering each of the 100 words twice. As (Table 1) shows, no degradation in performance relative to the conventional method was observed despite the reduction in computation.
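The data-set sizes quoted for the experiment follow from the speaker and repetition counts, as this short check shows (the variable names are illustrative).

```python
words = 100                       # Japanese place names in the vocabulary
train_speakers, test_speakers = 27, 48
repetitions = 2                   # each speaker utters every word twice

train_utterances_per_word = train_speakers * repetitions
test_utterances_total = words * test_speakers * repetitions

print(train_utterances_per_word, test_utterances_total)  # prints: 54 9600
```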

【００７９】[0079]

【表１】 [Table 1]

[0080] FIG. 3 is a block diagram showing the configuration of a speech recognition device using HMMs according to the second embodiment of the present invention. The second embodiment adds to the configuration of the first embodiment in FIG. 1 a preliminary weighting coefficient vector calculation unit 304, which calculates, before HMM learning, the weighting coefficient values to be stored in the weighting coefficient vector storage unit 305. In the figure, 301 is a feature extraction unit, which converts the input speech signal v into a sequence of feature vectors X = x_1, x_2, ..., x_t, ..., x_T at fixed time intervals by a well-known method such as LPC analysis or the Fourier transform. Here, T is the length of the feature vector sequence for the input speech signal.

[0081] 302 is a so-called codebook, which holds the representative vectors of the finite number M of symbols; its structure is the same as in FIG. 7.

[0082] 303 is a vector quantization unit, which replaces the feature vector x_t with the symbols of the 1st to Kth nearest representative vectors in the codebook 302, converting it into the symbol vector q_t = (o_t1, o_t2, ..., o_tk, ..., o_tK), and thereby converts the feature vector sequence into the symbol vector sequence Q = q_1, q_2, ..., q_t, ..., q_T.

[0083] 304 is a preliminary weighting coefficient vector calculation unit, which is a feature of the present invention. It calculates the weighting coefficients in accordance with (Equation 11) using the HMM training data D and the codebook 302.

【００８４】[0084]

【数１１】 [Equation 11]
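The image for (Equation 11) is not reproduced in this text. Claim 3 describes the stored weights as the average of the weighting coefficient vectors obtained from the HMM training data, so a rough sketch is to compute a per-frame weight vector for every training frame and average by rank. The per-frame formula (Equation 4) is also missing, so `inverse_distance_weights` below is a hypothetical stand-in, as are the squared Euclidean distance and the toy data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (an assumed distance measure)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def inverse_distance_weights(dists, eps=1e-9):
    """Hypothetical per-frame weights: normalized inverse distances."""
    inv = [1.0 / (d + eps) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def average_rank_weights(frames, codebook, k, frame_weights):
    """Average the per-frame weight vectors over all training frames, rank by rank."""
    totals = [0.0] * k
    for x in frames:
        # Distances to the k nearest representative vectors, nearest first.
        dists = sorted(sq_dist(x, vec) for _, vec in codebook)[:k]
        for rank, u in enumerate(frame_weights(dists)):
            totals[rank] += u
    return [t / len(frames) for t in totals]

# Hypothetical codebook and training frames.
codebook = [(0, (0.0, 0.0)), (1, (1.0, 0.0)), (2, (0.0, 1.0)), (3, (1.0, 1.0))]
frames = [(0.9, 0.2), (0.1, 0.1)]
u = average_rank_weights(frames, codebook, k=2, frame_weights=inverse_distance_weights)
```

The result is a single vector of K values that is computed once, before training, and then used unchanged for every frame.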

[0085] 305 is a weighting coefficient vector storage unit, which stores the weighting coefficients (u_1, ..., u_k, ..., u_K) for the 1st to Kth representative vectors, obtained before learning by the preliminary weighting coefficient vector calculation unit 304, as fixed values that do not depend on the frame of the input speech signal v.

【００８６】[0086]

【外２１】 [Outside 21]

【００８７】[0087]

【外２２】 [Outside 22]

【００８８】[0088]

【外２３】 [Outside 23]

[0089] 309 is a likelihood storage unit, which stores the likelihood of each word calculated by the likelihood calculation unit 308 so that the likelihoods can be compared.

[0090] 310 is a comparison and determination unit, which determines as the recognition result (rec) the vocabulary word corresponding to the HMM that gives the maximum of the likelihoods stored in the likelihood storage unit 309 for the respective HMMs.

[0091] The processing of units 307 to 309 is performed once for the HMM of each vocabulary word, repeating for w = 1 to W, and the results are evaluated by the comparison and determination unit 310.

【００９２】[0092]

【外２４】 [Outside 24]

[0093] 402 is a so-called codebook, which holds the representative vectors of the finite number M of symbols; its structure is the same as in FIG. 7.

【００９４】[0094]

【外２５】 [Outside 25]

[0095] 404 is a preliminary weighting coefficient vector calculation unit, which is a feature of the present invention. It calculates the weighting coefficients in accordance with (Equation 11) using the HMM training data D and the codebook 402.

[0096] 405 is a weighting coefficient vector storage unit, which stores the weighting coefficients (u_1, u_2, ..., u_k, ..., u_K) for the 1st to Kth representative vectors, obtained before learning by the preliminary weighting coefficient vector calculation unit 304 of FIG. 3, as fixed values that do not depend on the frame of the input training speech signal v'.

【００９７】[0097]

【外２６】 [Outside 26]

[0098] 407 is an HMM temporary storage unit, which stores the initial HMM (with A and B set from random numbers, empirical values, or the like) and, during iterative learning, the intermediate HMM obtained before learning has converged. It holds the state transition probability matrix A and the symbol output probability matrix B and updates them each time one learning pass finishes.

【００９９】[0099]

【外２７】 [Outside 27]

【０１００】[0100]

【外２８】 [Outside 28]

【０１０１】[0101]

【外２９】 [Outside 29]

[0102] 411 is a re-estimation unit, which re-estimates the state transition probability a_ij in accordance with (Equation 8) and the symbol output probability b_i(m) in accordance with (Equation 9).

[0103] 412 is a learning convergence confirmation unit, which determines whether learning has converged and sends to the re-estimated HMM storage unit 413 a convergence signal y if it has, or a re-estimation command signal n otherwise.

[0104] The re-estimated HMM storage unit 413 temporarily stores the re-estimated HMM. Depending on the signal from the learning convergence confirmation unit 412, it stores the re-estimated HMM in the HMM storage unit 306 of FIG. 3 if the signal is the convergence signal y, or in the HMM temporary storage unit 407 if the signal is the re-estimation command signal n.

[0105] The processing of units 408 to 411 is repeated until the learning convergence confirmation unit 412 produces the convergence signal y.

[0106] The above is the configuration of the speech recognition device using HMMs and of the HMM creating device according to the second embodiment of the present invention.

[0107] As the second embodiment shows, the weighting coefficient vector calculation units 604 and 804 of the conventional devices in FIGS. 6 and 8 are eliminated, and the weighting coefficient vector storage units 305 and 405 are provided in their place. The former must be built as computing devices, whereas the latter need only store at most K values, so the configuration is greatly simplified. Moreover, no calculation has to be performed, which reduces the amount of computation.

[0108] An experiment carried out using the present invention as described above is described next.

[0109] The recognition vocabulary consisted of 100 Japanese place names. The training data for each HMM consisted of 54 utterances per word in total, obtained from 27 male speakers each uttering every word twice. The recognition data consisted of 9,600 utterances in total, obtained from 48 speakers other than the training speakers each uttering each of the 100 words twice. As (Table 2) shows, almost no degradation in performance relative to the conventional method was observed despite the reduction in computation.

【０１１０】[0110]

【表２】 [Table 2]

[0111]

[Effect of the Invention] As described above, by calculating the weighting coefficient vector in advance, the present invention makes it possible to build a pattern recognition device that has a simpler configuration and requires fewer arithmetic operations than conventional devices, with almost no change in recognition rate.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of a speech recognition device using HMMs according to the first embodiment of the present invention.

[FIG. 2] A block diagram showing the configuration of an HMM creating device used for speech recognition according to the first embodiment of the present invention.

[FIG. 3] A block diagram showing the configuration of a speech recognition device using HMMs according to the second embodiment of the present invention.

[FIG. 4] A block diagram showing the configuration of an HMM creating device used for speech recognition according to the second embodiment of the present invention.

[FIG. 5] A diagram for explaining the Hidden Markov Model (HMM).

[FIG. 6] A block diagram showing the configuration of a conventional speech recognition device using HMMs.

[FIG. 7] A diagram showing an example of the structure of the codebook.

[FIG. 8] A block diagram showing the configuration of a conventional HMM creating device used for speech recognition.

[Explanation of symbols]

101, 201, 301, 401: feature extraction unit; 102, 202, 302, 402: codebook; 103, 203, 303, 403: vector quantization unit; 104, 204, 305, 405: weighting coefficient vector storage unit; 106, 207, 307, 408: weighted probability sum calculation unit; 105, 306: HMM storage unit; 107, 308: likelihood calculation unit; 108, 309: likelihood storage unit; 109, 310: comparison and determination unit; 205, 406: HMM training data storage unit; 206, 407: HMM temporary storage unit; 208, 409: path probability calculation unit; 209, 410: path probability storage unit; 210, 411: re-estimation unit; 211, 412: learning convergence confirmation unit; 212, 413: re-estimated HMM storage unit; 304, 404: preliminary weighting coefficient vector calculation unit.

Claims

[Claims]

1. A pattern recognition device comprising weighting coefficient vector storage means, wherein a weighting coefficient vector is stored in advance before learning of a Hidden Markov Model and that fixed value is used during pattern recognition.

2. A Hidden Markov Model creating device comprising weighting coefficient vector storage means, wherein a weighting coefficient vector is stored in advance before learning of a Hidden Markov Model and that fixed value is used during learning of the Hidden Markov Model.

3. The pattern recognition device described above, further comprising preliminary weighting coefficient vector calculation means, wherein, before learning of the Hidden Markov Model, the weighting coefficients are stored in the weighting coefficient vector storage means as the average of the weighting coefficient vectors obtained from the training data of the Hidden Markov Model, and those values are used during pattern recognition.

4. The Hidden Markov Model creating device according to claim 2, further comprising preliminary weighting coefficient vector calculation means, wherein, before learning of the Hidden Markov Model, the weighting coefficients are stored in the weighting coefficient vector storage means as the average of the weighting coefficient vectors obtained from the training data of the Hidden Markov Model, and those values are used during learning of the Hidden Markov Model.