JPH07160288A

JPH07160288A - Voice recognizing device

Info

Publication number: JPH07160288A
Application number: JP5305104A
Authority: JP
Inventors: Junichi Nakabashi; 順一中橋; Hidekazu Tsuboka; 英一坪香
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1993-12-06
Filing date: 1993-12-06
Publication date: 1995-06-23

Abstract

PURPOSE: To improve recognition performance by adapting the code vectors to the input speech. CONSTITUTION: The device comprises a codebook storage means 406; a fuzzy vector quantization means 405 that uses the codebook to convert each vector of a feature vector sequence into a set of membership degrees corresponding to the labels (a membership vector), thereby converting the feature vector sequence into a membership vector sequence; an HMM storage means 407 that stores HMMs in which the occurrence probability of each label is defined for each state; a feature vector sequence occurrence degree calculating means that calculates the degree of occurrence of the feature vector sequence from an HMM using the label occurrence probabilities and the membership vectors; and a codebook correcting means 408 that corrects each code vector.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application]

The present invention relates to pattern recognition, and in particular to an apparatus for correcting a finite number of representative points in a vector space (hereinafter simply called code vectors) and an apparatus for normalizing input feature vectors, in speech recognition that uses Hidden Markov Models (HMMs) as time-series reference models.

[0002] Although the present invention is applicable to time-series signals in general, for convenience of explanation the prior art and the present invention are described below using speech recognition as an example.

[0003]

[Prior Art]

In general, a speech recognition device is configured to convert an unknown speech signal into a sequence of defined feature vectors and to compare those feature vectors with previously stored, identified reference models. As a result of the comparison, the speech signal is identified as the reference model that matches best (gives the maximum likelihood) according to a defined recognition criterion. At present, the reference model considered to give the best performance is the HMM, which uses a set of states and state transitions based on statistical estimation; hereinafter the HMM is used as the representative reference model.

[0004] In HMM-based speech recognition, the likelihood is calculated as follows. First, the unknown speech signal is converted into a sequence of feature vectors (a feature vector sequence) by a well-known method such as linear predictive coding (LPC) analysis. Next, each feature vector is converted into the label (a number or symbol assigned to a code vector) of the code vector closest to it in inter-vector distance, so that the feature vector sequence becomes a sequence of labels (a label sequence). The probability (likelihood) that each previously created and stored HMM generates this label sequence is then calculated, and the HMM with the maximum likelihood is taken as the recognition result.
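As a rough illustration of the conventional (hard) vector quantization step just described, the sketch below maps each feature vector to the label of its nearest code vector. It assumes Euclidean distance (the text only says "inter-vector distance") and a codebook held as a Python dict from label to vector; the function names are hypothetical.

```python
import math

def nearest_label(y, codebook):
    """Label of the code vector closest to y (Euclidean distance assumed)."""
    return min(codebook, key=lambda label: math.dist(y, codebook[label]))

def vector_quantize(Y, codebook):
    """Convert a feature vector sequence Y into a label sequence O."""
    return [nearest_label(y, codebook) for y in Y]

# Hypothetical 2-D codebook: label -> code vector.
codebook = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0)}
Y = [(0.1, 0.2), (0.9, -0.1), (0.2, 0.8)]
print(vector_quantize(Y, codebook))  # [1, 2, 3]
```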

[0005] Here, a code vector is a vector representing one of a finite number of representative points in the multidimensional feature vector space, created in advance by a well-known method such as the LBG algorithm, and stored so that it can be retrieved by its label.
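The LBG algorithm mentioned above can be sketched as repeated splitting of the code vectors followed by Lloyd (k-means) refinement. This is a minimal, unoptimized version under stated assumptions: Euclidean distance, an additive split perturbation `eps`, a fixed number of refinement passes, and a target size that is ideally a power of two (other sizes are simply truncated). `lbg` and its parameters are illustrative names, not taken from the patent.

```python
import math

def _centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))

def lbg(data, size, eps=0.01, iters=20):
    """Sketch of LBG codebook design: split, then refine with Lloyd iterations."""
    codebook = [_centroid(data)]
    while len(codebook) < size:
        # Split every code vector into two copies perturbed by +eps and -eps.
        codebook = [tuple(c + s for c in cv) for cv in codebook for s in (eps, -eps)]
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for y in data:
                m = min(range(len(codebook)), key=lambda i: math.dist(y, codebook[i]))
                clusters[m].append(y)
            # An empty cluster keeps its previous code vector.
            codebook = [_centroid(cl) if cl else cv for cl, cv in zip(clusters, codebook)]
    return codebook[:size]

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(sorted(lbg(data, 2)))  # two code vectors, near (0.033, 0.033) and (5.033, 5.033)
```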

[0006] Next, the HMM is described. An HMM is used to evaluate an observation label sequence O = o₁, o₂, …, o_T in which each observation is one of a finite number M of labels. FIG. 1 is a diagram for explaining such an HMM.

[0007] FIG. 1 takes as an example N = 3 states and M = 4 labels. Transitions among states 1, 2 and 3 are represented by a state transition probability matrix A = [a_ij], where the state transition probability a_ij is the probability of making a transition to state j next when in state i. The probabilities with which the HMM generates labels are represented by a label occurrence probability matrix B = [b_ij(k)], where the label occurrence probability b_ij(k) is the probability of generating label k on the transition from state i to state j. One HMM is created for each vocabulary word, and the HMMs are used to classify a label sequence based on the probability (likelihood) that each HMM generates it.

[0008] Let O = o₁, o₂, …, o_T be the label sequence obtained for the feature vector sequence Y = y₁, y₂, …, y_t, …, y_T of an unknown speech signal, and let S = s₁, s₂, …, s_T be any state sequence of length T that HMM λ can generate. The probability (likelihood) that HMM λ generates the label sequence O is then given by (Equation 1).

[0009]

[Equation 1]
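Equation 1 itself is not reproduced in this text. As a hedged sketch, the standard way to evaluate a likelihood summed over all state sequences S is the forward recursion below, written for the transition-emitting form b_ij(k) defined for FIG. 1. The assumptions that every path starts in the first state and ends in the last state, and the 0-based label indices, are ours.

```python
def forward_likelihood(A, B, O):
    """Sum over all state sequences of prod_t a_{s(t-1)s(t)} * b_{s(t-1)s(t)}(o_t).

    A[i][j]    : transition probability a_ij
    B[i][j][k] : probability b_ij(k) of label k on the transition i -> j
    O          : label sequence as 0-based indices
    Assumption: paths start in state 0 and must end in the last state.
    """
    N = len(A)
    alpha = [1.0] + [0.0] * (N - 1)
    for k in O:
        alpha = [sum(alpha[i] * A[i][j] * B[i][j][k] for i in range(N)) for j in range(N)]
    return alpha[N - 1]

# Hypothetical two-state, two-label left-to-right model.
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[[0.8, 0.2], [0.6, 0.4]],
     [[0.5, 0.5], [0.3, 0.7]]]
print(forward_likelihood(A, B, [0, 1]))  # approximately 0.29 (0.08 via 0->0->1, 0.21 via 0->1->1)
```

The recursion costs O(T·N²) instead of enumerating all N^T state sequences.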

[0010] The above describes the case in which each feature vector y_t is converted into a single label o_t. There is also a method called fuzzy vector quantization. For a feature vector y_t, it uses the set of its K nearest labels o_t1, o_t2, …, o_tK among the M labels (limited to the K labels closest in inter-vector distance), together with the set of membership degrees u_t1, u_t2, …, u_tK of y_t in each code vector retrieved by those labels, and converts y_t into a label vector o_t = (o_t1, o_t2, …, o_tK) and a membership vector u_t = (u_t1, u_t2, …, u_tK). Fuzzy vector quantization thus converts the feature vector sequence Y into a label vector sequence O = o₁, o₂, …, o_T and a membership vector sequence U = u₁, u₂, …, u_T, and the probability (likelihood) that HMM λ generates the feature vector sequence is expressed as (Equation 2), using the occurrence degree ω_i(t) of the feature vector in place of the label occurrence probability b_im.

[0011]

[Equation 2]
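The fuzzy vector quantization step described above can be sketched as follows. The text does not state how the membership degrees are computed, so this example assumes a fuzzy c-means style rule over the K nearest code vectors, u_tk ∝ d_tk^(−2/(F−1)), normalized to sum to 1, with a hypothetical fuzziness parameter `F` (default 2) and Euclidean distance.

```python
import math

def fuzzy_vq(y, codebook, K=3, F=2.0):
    """Return (label vector o_t, membership vector u_t) for one feature vector y.

    Memberships follow an assumed fuzzy c-means style rule over the K nearest
    code vectors (weight d ** (-2 / (F - 1))), normalized to sum to 1.
    """
    nearest = sorted((math.dist(y, cv), label) for label, cv in codebook.items())[:K]
    labels = [label for _, label in nearest]
    if nearest[0][0] == 0.0:
        # y coincides with a code vector: give that label full membership.
        return labels, [1.0] + [0.0] * (len(labels) - 1)
    weights = [d ** (-2.0 / (F - 1.0)) for d, _ in nearest]
    total = sum(weights)
    return labels, [w / total for w in weights]

def fuzzy_vq_sequence(Y, codebook, K=3, F=2.0):
    """Convert Y into a label vector sequence O and a membership vector sequence U."""
    pairs = [fuzzy_vq(y, codebook, K, F) for y in Y]
    return [o for o, _ in pairs], [u for _, u in pairs]

codebook = {1: (0.0, 0.0), 2: (2.0, 0.0), 3: (0.0, 4.0), 4: (10.0, 10.0)}
O, U = fuzzy_vq_sequence([(1.0, 0.0)], codebook, K=2)
print(O, U)  # [[1, 2]] [[0.5, 0.5]]
```

With K = 1 every frame receives a single label with membership 1, which reduces to the single-label case.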

[0012] Here K can take any integer value from 1 to M, but it is restricted to a small value (K = 3 is suitable) to reduce the amount of computation. K = 1 corresponds to the case described above in which a single label is generated.

[0013] Recognition is performed by comparing the likelihoods of the HMMs obtained as described above (the probabilities that the HMM of each vocabulary word generates the feature vector sequence Y) and selecting the one with the maximum likelihood.

[0014] For example, when the number of vocabulary words is W and the likelihood of the HMM λ^w of the w-th word for the unknown feature vector sequence Y is L(Y | λ^w), the recognition result is given by (Equation 3).

[0015]

[Equation 3]

$$\hat{w} = \arg\max_{1 \le w \le W} L(Y \mid \lambda^{w})$$

[0016] The recognition result is therefore obtained by a relative comparison of the likelihoods. FIG. 2 shows an example structure of the codebook: each row stores a label in its first column and the components of the code vector in the following columns, so that a code vector can be retrieved by its label.
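The FIG. 2 layout (label in the first column, code vector components after it) amounts to a lookup table keyed by label. A minimal sketch, assuming a whitespace-separated text form of each row and integer labels (both are illustrative assumptions, as is the function name):

```python
def parse_codebook(rows):
    """Build a label -> code vector table from FIG. 2-style rows.

    Each row holds the label in the first column and the code vector
    components after it (whitespace-separated text is an assumption).
    """
    codebook = {}
    for row in rows:
        fields = row.split()
        label, values = int(fields[0]), tuple(float(v) for v in fields[1:])
        if label in codebook:
            raise ValueError(f"duplicate label {label}")
        codebook[label] = values
    return codebook

table = ["1 0.12 -0.40 0.33", "2 0.50 0.10 -0.25"]
codebook = parse_codebook(table)
print(codebook[2])  # (0.5, 0.1, -0.25)
```

A dict gives constant-time retrieval by label, which is all the codebook storage units need.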

[0017] FIG. 3 is a block diagram of a speech recognition device configured as described above. A feature extraction unit 301 converts an unknown speech signal into feature vectors at fixed time intervals by a well-known method such as LPC analysis, obtaining a sequence of feature vectors Y = y₁, y₂, …, y_t, …, y_T. Here T is the length of the feature vector sequence Y for the unknown speech signal.
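The text only names LPC analysis as one well-known option for the feature extraction unit 301. The sketch below shows one common way to obtain per-frame LPC coefficients: frame the signal at a fixed hop, compute the autocorrelation, and solve with the Levinson-Durbin recursion. Frame length, hop and order are hypothetical parameters; windowing and conversion to cepstral coefficients, which practical systems often add, are omitted.

```python
def autocorrelation(frame, order):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """LPC coefficients a_1..a_p (predictor x[n] ~ sum_k a_k x[n-k]) and residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # perfectly predictable frame: stop the recursion
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def lpc_features(signal, frame_len, hop, order):
    """One LPC vector per frame, frames taken every `hop` samples (no windowing)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        r = autocorrelation(signal[start:start + frame_len], order)
        feats.append(levinson_durbin(r, order)[0] if r[0] > 0 else [0.0] * order)
    return feats

print(levinson_durbin([1.0, 0.5, 0.25], 2))  # ([0.5, 0.0], 0.75)
```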

[0018] A codebook storage unit 302 stores the code vectors in a form retrievable by the labels assigned to them.

[0019] A fuzzy vector quantization unit 303 replaces each feature vector y_t extracted by the feature extraction unit 301 with the K labels of the code vectors stored in the codebook storage unit 302, taken in ascending order of inter-vector distance, together with the membership degrees of y_t in each code vector retrieved by that label set. It thereby converts y_t into a label vector o_t = (o_t1, o_t2, …, o_tK) and a membership vector u_t = (u_t1, u_t2, …, u_tK), and converts the sequence Y of feature vectors y_t into a label vector sequence O = o₁, o₂, …, o_T and a membership vector sequence U = u₁, u₂, …, u_T.

[0020] An HMM storage unit 304 stores, for each of the W vocabulary words to be recognized, the state transition probability matrix A and the label occurrence probability matrix B of the already created HMM λ^w (w = 1 to W). The w-th HMM is thus expressed as λ^w = {A^w, B^w}, w = 1 to W.

[0021] A feature vector sequence occurrence degree calculation unit 305 uses the label vector sequence O and the membership vector sequence U obtained by the fuzzy vector quantization unit 303, together with the label occurrence probability matrix B^w of the w-th vocabulary word stored in the HMM storage unit 304, to calculate the occurrence degree matrix Ω^w = {ω^w_it} of the feature vector sequence for HMM λ^w according to (Equation 4) or (Equation 5).

[0022]

[Equation 4]

[0023]

[Equation 5]

[0024] Here, the label occurrence probability b_i(o_tk) is the probability that the k-th label o_tk, obtained when the feature vector y_t at time t is fuzzy-vector-quantized, is generated from state i of the HMM.
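Equations 4 and 5 are not reproduced here, but the embodiment below refers to the occurrence degree formula as either a product-sum or a power product over the K labels. Under that reading, a plausible sketch of this calculation is the weighted sum ω_i(t) = Σ_k u_tk b_i(o_tk) or the weighted geometric form ω_i(t) = Π_k b_i(o_tk)^(u_tk). Both forms, and the `omega[t][i]` layout, are assumptions for illustration.

```python
import math

def occurrence_degree(B, O, U, mode="sum"):
    """Occurrence degree matrix with omega[t][i] = ω_i(t) (assumed forms).

    B[i][m]    : label occurrence probability b_i(m) of label m in state i
    O[t], U[t] : label vector o_t and membership vector u_t of frame t (length K)
    mode "sum" : ω_i(t) = Σ_k u_tk * b_i(o_tk)    (product-sum)
    mode "prod": ω_i(t) = Π_k b_i(o_tk) ** u_tk   (power product)
    """
    if mode not in ("sum", "prod"):
        raise ValueError("mode must be 'sum' or 'prod'")
    omega = []
    for o_t, u_t in zip(O, U):
        row = []
        for b_i in B:
            if mode == "sum":
                row.append(sum(u * b_i[m] for m, u in zip(o_t, u_t)))
            else:
                row.append(math.prod(b_i[m] ** u for m, u in zip(o_t, u_t)))
        omega.append(row)
    return omega

# Hypothetical 2-state model over 3 labels, one frame with K = 2.
B = [[0.5, 0.25, 0.25], [0.1, 0.1, 0.8]]
O = [[0, 2]]
U = [[0.5, 0.5]]
print(occurrence_degree(B, O, U, "sum"))  # approximately [[0.375, 0.45]]
```

With K = 1 and membership 1, both forms reduce to the ordinary label occurrence probability b_i(o_t).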

[0025] A likelihood calculation unit 306 calculates the likelihood L(Y | λ^w) using the feature vector sequence occurrence degree matrix Ω^w for HMM λ^w, calculated by the feature vector sequence occurrence degree calculation unit 305, and the state transition probability matrix A^w of HMM λ^w stored in the HMM storage unit 304.
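A hedged sketch of the likelihood calculation unit 306 is the forward algorithm with ω_j(t) in place of a discrete emission probability, computed with per-frame scaling and returned as a log-likelihood so that long utterances do not underflow. The start-in-first-state and end-in-last-state constraints are assumptions, not statements from the source.

```python
import math

def log_likelihood(A, omega):
    """log L(Y | λ) from transition matrix A and occurrence degrees omega[t][i].

    Assumes paths start in state 0 and end in the last state. Each frame's
    forward vector is rescaled to sum to 1 and the log scales are accumulated.
    """
    N = len(A)
    log_l = 0.0
    alpha = [omega[0][0]] + [0.0] * (N - 1)  # start in state 0
    for t in range(len(omega)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * omega[t][j] for j in range(N)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_l += math.log(scale)
    return log_l + (math.log(alpha[-1]) if alpha[-1] > 0 else float("-inf"))

A = [[0.6, 0.4], [0.0, 1.0]]
omega = [[0.5, 0.1], [0.2, 0.3]]
print(math.exp(log_likelihood(A, omega)))  # approximately 0.06
```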

[0026] A likelihood storage unit 307 stores, for comparison, the likelihoods L(Y | λ^w) of the word HMMs λ^w for the feature vector sequence Y calculated by the likelihood calculation unit 306.

[0027] A comparison/determination unit 308 determines, as the recognition result, the vocabulary word corresponding to the HMM that gives the maximum of the likelihoods stored in the likelihood storage unit 307.

[0028] The processing of units 305 to 307 is performed once for each vocabulary HMM λ^w and repeated for w = 1 to W, and the results are evaluated by the comparison/determination unit 308.
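The per-word loop and the final decision of unit 308 reduce to scoring every word model and taking the maximum. In this sketch `score` is a hypothetical callable standing in for units 303 to 306, and the dict `likelihoods` plays the role of the likelihood storage unit 307.

```python
def recognize(Y, models, score):
    """Return (best word, all likelihoods) for a feature vector sequence Y.

    models : dict word -> model λ^w
    score  : hypothetical callable computing L(Y | λ^w), or its log, for one model
    """
    likelihoods = {}  # role of the likelihood storage unit 307
    for word, model in models.items():  # units 305-307, once per word
        likelihoods[word] = score(Y, model)
    best = max(likelihoods, key=likelihoods.get)  # comparison/determination unit 308
    return best, likelihoods

# Toy stand-in: each "model" is just a target value for the sum of Y.
def toy_score(Y, target):
    return -abs(sum(Y) - target)

models = {"yes": 1.0, "no": 2.0}
print(recognize([0.5, 1.4], models, toy_score)[0])  # no
```

Because only the relative order matters, `score` may return log-likelihoods, as in the scaled forward pass.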

[0029] The above is the configuration of a speech recognition device using a conventional codebook and HMMs. Speech recognition with such HMMs requires a procedure, called HMM training, for creating the HMMs from a large amount of speech data (hereinafter called HMM training data).

[0030] When the HMM training data come from one particular speaker and the same speaker provides the input at recognition time, this is called speaker-dependent speech recognition. When the HMMs are trained on an unspecified number of speakers and the speaker at recognition time is unknown, this is called speaker-independent speech recognition.

【００３１】[0031]

[Problems to be Solved by the Invention]

In speaker-dependent speech recognition the performance is good, but an enormous amount of HMM training data must be collected from a single speaker, which is impractical.

[0032] In speaker-independent speech recognition, HMM training data are easy to collect, and the more data there are, the higher the statistical reliability of the HMMs. However, the performance never exceeds that of speaker-dependent recognition, and with such HMMs there are atypical speakers for whom recognition performance is extremely poor.

[0033] Furthermore, when the HMMs are stored in units smaller than words, such as syllables or phonemes, performance degrades because of differences in context (the order in which syllables, phonemes, etc. occur) between the training vocabulary and the recognition vocabulary.

[0034] Performance also degrades when the surrounding environment at recognition time differs from the one in which the training data were recorded.

[0035] As described above, conventional speech recognition has the problem that performance degrades because of differences in speaker and in context between training and recognition.

【００３６】[0036]

[Means for Solving the Problems]

(1) To solve this problem, the present invention provides, for the case in which the utterance content of the correction speech is known to the system (it is known which words were uttered), a device comprising: a codebook storage means that stores a finite number of representative points (code vectors) of the feature vector space in a form retrievable by the labels assigned to them; a fuzzy vector quantization means that uses the codebook to convert each vector of a feature vector sequence into a set of membership degrees corresponding to the labels (a membership vector), thereby converting the feature vector sequence into a membership vector sequence; an HMM storage means that stores HMMs in which the occurrence probability of each label (label occurrence probability) is defined for each state; a feature vector sequence occurrence degree calculation means that calculates the degree of occurrence of the feature vector sequence from the HMM using the label occurrence probabilities and the membership vectors; and a codebook correction means that corrects each of the code vectors. The codebook correction means includes a correction vector calculation means that corrects the code vectors so as to maximize the degree to which the feature vector sequence occurs from the HMM, and the device is configured to correct the code vectors in this way.

(2) To solve this problem, the present invention also provides, for the same case in which the utterance content of the correction speech is known to the system, a device comprising the codebook storage means, fuzzy vector quantization means, HMM storage means and feature vector sequence occurrence degree calculation means described in (1), together with a feature vector correction means that corrects the feature vectors. The feature vector correction means includes a correction vector calculation means that corrects the feature vectors so as to maximize the degree to which the feature vector sequence occurs from the HMM, and the device is configured to correct the feature vectors in this way.

【００３７】[0037]

[Action]

(1) An effect of the present invention is that, when the utterance content of the correction speech is known to the system (it is known which words were uttered), correction vectors for the code vectors are calculated so as to maximize the likelihood of the HMM corresponding to that speech, and the code vectors are corrected accordingly. This creates code vectors suited to the correction speech and improves recognition performance.

(2) Another effect of the present invention is that, under the same condition, correction vectors for the feature vectors are calculated so as to maximize the likelihood of the HMM corresponding to that speech, and at recognition time the feature vector sequence of unknown input speech is normalized using these correction vectors, which improves recognition performance.
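Effect (2) applies a correction vector to the input feature vectors at recognition time. The text does not give the normalization formula here; a minimal sketch assuming a single additive correction vector Δ, y'_t = y_t + Δ, is shown below, together with a crude candidate search (`best_shift`, a hypothetical helper) that picks the Δ maximizing a caller-supplied log-likelihood. The search is only a stand-in for the patent's correction vector calculation, not that calculation itself.

```python
def normalize_sequence(Y, delta):
    """y'_t = y_t + delta for every frame (assumed additive correction)."""
    return [tuple(y_d + d_d for y_d, d_d in zip(y, delta)) for y in Y]

def best_shift(Y, candidates, loglik):
    """Hypothetical helper: the candidate delta whose normalized sequence scores highest."""
    return max(candidates, key=lambda delta: loglik(normalize_sequence(Y, delta)))

Y = [(1.0, 2.0), (3.0, 4.0)]
print(normalize_sequence(Y, (-1.0, 0.5)))  # [(0.0, 2.5), (2.0, 4.5)]
```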

【００３８】[0038]

[Embodiments]

Although the present invention is applicable to correcting differences in conditions between HMM training and recognition, it is described below taking as an example the correction of inter-speaker differences when the speaker who uttered the HMM training data differs from the speaker who uses the recognition device.

[0039] The summation range of the product-sum or power product in the feature vector sequence occurrence degree formula can be any integer from 1 up to the codebook size M, but a small value is often used to reduce computation. In this embodiment the range is denoted by the letter K.

[0040] FIG. 4 is a block diagram outlining the codebook correction device according to the present invention. A correction speech storage unit 401 stores R utterances (r = 1 to R) of speech S^r (hereinafter called correction speech) whose content is known, that is, the content of each utterance is known to the codebook correction device in advance. The speech is uttered by the speaker for whom the codebook is to be corrected (hereinafter called the correction speaker), i.e., the speaker who uses the speech recognition system, and it is used in the subsequent correction of the codebook. The correction speech may consist of any words or sentences, as long as the utterance content is known.

[0041] A feature extraction unit 402 uses the same feature extraction technique as the recognition device shown in FIG. 3 to convert the correction speech S^r, at fixed time intervals, into a feature vector sequence Y^r = y₁^r, y₂^r, …, y_t^r, …, y_{T^r}^r. Here T^r is the number of frames obtained when the correction speech S^r is converted into a feature vector sequence.

[0042] A correction feature vector storage unit 403 stores, for r = 1 to R, the feature vector sequences Y^r obtained by applying the feature extraction unit 402 to the correction speech signals S^r stored in the correction speech storage unit 401.

[0043] A data control unit 404 controls the following processing on the basis of which of the R utterances (the r-th) is currently being handled and what the content of that r-th utterance is. Here, word(r) denotes the content of the r-th utterance (the index w of the HMM for that content).

[0044] A fuzzy vector quantization unit 405 uses the utterance number r sent from the data control unit 404 to read the feature vector sequence Y^r from the correction feature vector storage unit 403. For the feature vector y_t^r at each time t, it takes the labels ranked 1st to K-th in ascending order of inter-vector distance to the code vectors C₁ to C_M stored in the codebook storage unit 406 (described later), together with the membership degrees of y_t^r in each code vector retrieved by that label set, and converts y_t^r into a label vector o_t^r = (o_t1, o_t2, …, o_tK) and a membership vector u_t^r = (u_t1, u_t2, …, u_tK). It thereby converts the sequence Y^r into a label vector sequence O^r = o₁^r, o₂^r, …, o_T^r and a membership vector sequence U^r = u₁^r, u₂^r, …, u_T^r.

[0045] A codebook storage unit 406 stores each code vector C_m in a form retrievable by its assigned label m; the codebook is used by the fuzzy vector quantization unit 405 during vector quantization.

[0046] An HMM storage unit 407 stores, as shown in FIG. 1, the state transition probability matrix A and the label occurrence probability matrix B of the already created HMM for each of the W vocabulary words to be recognized. The w-th HMM λ^w is thus expressed as λ^w = {A^w, B^w}.

[0047] A codebook correction unit 408 uses the correction speech and the HMM, stored in the HMM storage unit 407, that corresponds to the utterance content of the correction speech. It corrects the values of the code vectors C_m in the codebook storage unit 406 so that the probability (likelihood) that this HMM generates the correction speech is maximized, and transfers the corrected new code vectors C'_m to the codebook storage unit 406.

[0048] A correction convergence determination unit 409 determines the convergence status when the code vectors are corrected using the correction speech. If a predetermined convergence condition is satisfied, the correction operation ends; otherwise the correction of the code vectors is repeated until the condition is satisfied.
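The repeat-until-converged control of unit 409 can be sketched as a loop around one correction pass. Here `update` is a hypothetical callable standing in for the codebook correction unit 408, returning the corrected codebook and the average log-likelihood L_ave; the relative-change test and the `threshold`/`max_iter` values are assumed convergence conditions, since the text says only "a predetermined convergence condition".

```python
def correct_until_converged(update, codebook, threshold=1e-4, max_iter=50):
    """Repeat codebook correction until the average likelihood stops changing.

    update(codebook) -> (new_codebook, l_ave) is a hypothetical stand-in for unit 408.
    Returns the final codebook and the number of passes performed.
    """
    prev = None
    for it in range(1, max_iter + 1):
        codebook, l_ave = update(codebook)
        if prev is not None and abs(l_ave - prev) <= threshold * abs(prev):
            return codebook, it
        prev = l_ave
    return codebook, max_iter

# Toy update: the "codebook" is a scalar that halves each pass.
def halve(cb):
    new_cb = cb / 2.0
    return new_cb, -1.0 - new_cb

cb, iterations = correct_until_converged(halve, 1.0)
print(iterations)  # 14
```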

[0049] The feature of the present invention lies in the configuration of the codebook correction unit 408: on the condition that the utterance content is known, the code vectors are corrected so as to maximize the probability (likelihood) that the HMM corresponding to the utterance content generates the speech. FIG. 5 is a block diagram showing a specific configuration of the codebook correction unit.

[0050] Terminals 1 to 8 are connected to the units of FIG. 4. Terminals 1 and 6 are connected to the codebook storage unit 406; terminal 1 receives the codebook C, and terminal 6 sends the corrected codebook C'. Terminals 4 and 5 are connected to the HMM storage unit 407; terminal 4 receives the state transition probability matrix A^word(r) of the HMM corresponding to the r-th utterance, and terminal 5 receives the label occurrence probability matrix B^word(r) of the same HMM. Terminals 2 and 3 are connected to the fuzzy vector quantization unit 405 and receive the label vector sequence O^r and the membership vector sequence U^r for the r-th utterance. Terminal 7 is connected to the correction convergence determination unit 409 and sends the average likelihood L_ave used to determine convergence. Terminal 8 is connected to the data control unit 404 and receives information indicating which (r-th) item of correction speech data is current; when r = R, the correction vectors ΔC and the average likelihood L_ave are calculated.

[0051] The codebook correction unit 408, i.e., the configuration of FIG. 5, operates while exchanging the information described above.

[0052] A feature vector sequence occurrence degree calculation unit 501 uses the label occurrence probability matrix, the label vector sequence and the membership vector sequence received through terminals 5, 2 and 3 to calculate, from the membership degrees and label occurrence probabilities, the feature vector occurrence degree ω_i(t) for all times t and all HMM states i, obtaining the feature vector occurrence degree matrix Ω. The correction vector formula described later differs depending on how the formula for the feature vector occurrence degree is defined.

[0053] A path probability calculation unit 502 calculates the path probability γ_i(t) of being in state i of the HMM at time t for all times t and all HMM states i, obtaining the path probability matrix Γ. It also calculates the probability (likelihood) L(O^r, U^r | λ^word(r)) that the HMM corresponding to the r-th utterance content generates that speech, and sends it to the likelihood storage unit described later for convergence determination.
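The path probability γ_i(t) of unit 502 corresponds to the usual state occupancy from a forward-backward pass, γ_i(t) = α_i(t) β_i(t) / L. The sketch below computes Γ and the likelihood L from A and Ω, again assuming the path starts in the first state and ends in the last; it is unscaled, so it suits only short sequences.

```python
def path_probabilities(A, omega):
    """Forward-backward pass returning (Gamma, L) with Gamma[t][i] = γ_i(t).

    Assumes paths start in state 0 and end in the last state. No scaling is
    applied, so long sequences would underflow.
    """
    N, T = len(A), len(omega)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[0.0] * N for _ in range(T)]
    alpha[0][0] = omega[0][0]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * omega[t][j]
    beta[T - 1][N - 1] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * omega[t + 1][j] * beta[t + 1][j] for j in range(N))
    L = alpha[T - 1][N - 1]
    gamma = [[alpha[t][i] * beta[t][i] / L for i in range(N)] for t in range(T)]
    return gamma, L

A = [[0.6, 0.4], [0.0, 1.0]]
omega = [[0.5, 0.1], [0.2, 0.3], [0.4, 0.6]]
gamma, L = path_probabilities(A, omega)
print([round(sum(row), 6) for row in gamma])  # [1.0, 1.0, 1.0]
```

Each row of Γ sums to 1 because every admissible path occupies exactly one state per frame.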

[0054] A correction vector denominator/numerator calculation unit 503 calculates the denominator and numerator of the correction vector formula that corresponds to the formula used for the feature vector occurrence degree ω_i(t) in the feature vector sequence occurrence degree calculation unit 501.

[0055] A correction vector denominator/numerator storage unit 504 stores the denominator and numerator values of the correction vector formula calculated by the correction vector denominator/numerator calculation unit 503, for use by the correction vector calculation unit described later.

[0056] A likelihood storage unit 505 stores the likelihoods L(Y | λ^w) sent from the path probability calculation unit 502 for all R correction utterances.

[0057] After the above operations have been performed for all R items of correction speech (that is, once the utterance number received through terminal 8 reaches R), the operations described below are performed.

[0058] Reference numeral 506 denotes a correction vector calculation unit. It obtains the set of correction vectors ΔC from the denominators and numerators stored in the correction vector denominator/numerator storage unit 504. To do so, it uses the correction vector formula that corresponds to the formula for the feature vector occurrence degree ω_i(t) in unit 501.

[0059] Reference numeral 507 denotes a corrected code vector calculation unit. It takes the code vector values of the uncorrected codebook C received from terminal 1 and the correction vector set ΔC obtained by the correction vector calculation unit 506. From these it calculates the code vector values of the corrected codebook C′ and sends them from terminal 6 to the code vector storage unit 406.

[0060] Reference numeral 508 denotes an average likelihood calculation unit. It averages all the likelihoods to obtain the average likelihood L_ave and sends this value through terminal 7 to the correction convergence determination unit 409.

[0061] This completes the description of the codebook correction unit of the present invention. The configuration admits two broad variants. When the code vectors of the codebook are corrected, the first variant obtains the correction vector between the code vectors before and after correction separately for each cluster. The second variant obtains a single correction vector common to all clusters.

[0062] The first variant, in which a correction vector is obtained separately for each cluster, is described first. The description refers to the flowcharts of FIGS. 6 to 9, which show how the codebook correction is carried out.

[0063] At 601, the device checks whether the correction speech storage unit 401 holds speech S^r of known utterance content from the target speaker (the speaker whose codebook is being corrected). If it does, the process proceeds to the next step. If not, correction speech is uttered and stored, as shown at 602. At 603, the feature extraction unit 402 converts the correction speech S^r into the feature vector sequence Y^r using a well-known feature extraction method. This conversion is executed for r = 1 to R, and the results are stored in the correction feature vector storage unit 403, as shown at 604.

[0064] The following operations are repeated until the correction is judged to have converged. First, at 605, the denominator and numerator buffers for the correction vectors are cleared to zero in preparation for the subsequent steps. At 606 or 607, the feature vector sequence Y^r of the correction speech data is read. At 608, the fuzzy vector quantization unit 405 performs fuzzy vector quantization by a well-known method using the code vector storage unit 406. This yields the membership vector sequence U^r and the label vector sequence O^r.

[0065] Step 609 is shown in detail in FIG. 7. At 707, the feature vector sequence occurrence degree calculation unit 501 calculates the feature vector occurrence degree ω_i(t) for t = 1 to T^r and i = 1 to I according to Equation 4.

[0066] At 610, the path probability calculation unit 502 calculates the path probability γ_i(t) using the well-known forward-backward algorithm.

[0067] Step 611 is shown in detail in FIG. 8. The correction vector denominator/numerator calculation unit 503 processes every code vector C_m, m = 1 to M (condition 809). It does so for each frame of the r-th feature vector sequence Y^r up to the frame length T^r (condition 803) and for each of the I states of the HMM corresponding to the r-th utterance content (condition 806). At 810, it calculates the denominator of the correction vector formula by Equation 6 and the numerator by Equation 7. Equations 6 and 7 are the denominator and numerator of Equation 8, the per-label (label m) correction vector formula that corresponds to Equation 4 for the feature vector occurrence degree.

[0068]

[Equation 6]

[0069]

[Equation 7]

[0070]

[Equation 8]

[0071] Here, ΔC_{m,denom}^r and ΔC_{m,numer}^r denote the denominator and numerator, respectively, of the formula for the correction vector ΔC_m, for the r-th word and the m-th cluster.

[0072] After all T^r frames and all I states have been processed (conditions 806 and 803 are satisfied), the denominator and numerator values of the correction vector formula are normalized by the likelihood at 811.

[0073] Steps 608 to 611 are repeated for all feature vector sequences Y^r (r = 1 to R), that is, until the last sequence Y^R has been selected (condition 612). At that point the denominators and numerators of the correction vector formula have been calculated for every code vector over all correction utterances. Steps 613 to 615, shown in detail in FIG. 9, then calculate the set of correction vectors ΔC = {ΔC_1, ΔC_2, …, ΔC_M} and the corrected codebook C′ = {C′_1, C′_2, …, C′_M}.

[0074] As shown at 903, this is done for every code vector. At 613 (step 904), the correction vector ΔC_m for each cluster is obtained from the denominator and numerator of the correction vector formula according to Equation 9.

[0075]

[Equation 9]

[0076] Once the correction vector set ΔC has been obtained, it is added to the code vectors of the uncorrected codebook C (614, i.e., 905). The corrected codebook C′ then replaces C as the new codebook C (615, i.e., 906).

[0077] At 616, the device determines whether the correction has converged under a predetermined convergence condition. If it has, the process ends, and the codebook at that point becomes the codebook for the target speaker. If it has not, the process returns to 605 and repeats until convergence.

[0078] Next comes the second variant, in which a single correction vector common to all clusters is obtained. The description refers to the flowcharts of FIGS. 6, 7, 10, and 11, which show how the codebook correction is carried out.

[0079] At 601, the device checks whether the correction speech storage unit 401 holds speech S^r of known utterance content from the target speaker. If it does, the process proceeds to the next step. If not, correction speech is uttered and stored, as shown at 602. At 603, the feature extraction unit 402 converts the correction speech S^r into the feature vector sequence Y^r using a well-known feature extraction method. This conversion is executed for r = 1 to R, and the results are stored in the correction feature vector storage unit 403, as shown at 604.

[0080] The following operations are repeated until the correction is judged to have converged. First, at 605, the denominator and numerator buffers for the correction vector are cleared to zero in preparation for the subsequent steps. At 606 or 607, the feature vector sequence Y^r of the correction speech data is read. At 608, the fuzzy vector quantization unit 405 performs fuzzy vector quantization by a well-known method using the code vector storage unit 406. This yields the membership vector sequence U^r and the label vector sequence O^r.

[0081] Step 609 is shown in detail in FIG. 7. At 707, the feature vector sequence occurrence degree calculation unit 501 calculates the feature vector occurrence degree ω_i(t) for t = 1 to T^r and i = 1 to I according to Equation 4.

[0082] At 610, the path probability calculation unit 502 calculates the path probability γ_i(t) using the well-known forward-backward algorithm.

[0083] Step 611 is shown in detail in FIG. 10. The correction vector denominator/numerator calculation unit 503 processes every code vector C_m, m = 1 to M (condition 1009). It does so for each frame of the r-th feature vector sequence Y^r up to the frame length T^r (condition 1003) and for each of the I states of the HMM corresponding to the r-th utterance content (condition 1006). At 1010, it calculates the denominator of the correction vector formula by Equation 10 and the numerator by Equation 11. Equations 10 and 11 are the denominator and numerator of Equation 12, the correction vector formula common to all labels that corresponds to Equation 4 for the feature vector occurrence degree.

[0084]

[Equation 10]

[0085]

[Equation 11]

[0086]

[Equation 12]

[0087] Here, ΔC_denom^r and ΔC_numer^r denote the denominator and numerator, respectively, of the formula for the correction vector ΔC common to all clusters, for the r-th word.

[0088] After all T^r frames and all I states have been processed (conditions 1006 and 1003 are satisfied), the denominator and numerator values of the correction vector formula are normalized by the likelihood at 1011.

[0089] Steps 608 to 611 are repeated for all feature vector sequences Y^r (r = 1 to R), that is, until the last sequence Y^R has been selected (condition 612). At that point the denominator and numerator of the correction vector formula common to all code vectors have been calculated over all correction utterances. Steps 613 to 615, shown in detail in FIG. 11, then calculate the common correction vector ΔC and the corrected codebook C′ = {C′_1, C′_2, …, C′_M}.

[0090] At 613 (step 1101), the correction vector ΔC common to all clusters is obtained from the denominator and numerator of the correction vector formula according to Equation 13.

[0091]

[Equation 13]

[0092] Once the correction vector ΔC has been obtained, it is added to the uncorrected codebook C (614, i.e., 1105). The corrected codebook C′ then replaces C as the new codebook C (615, i.e., 1106).

[0093] At 616, the device determines whether the correction has converged under a predetermined convergence condition. If it has, the process ends, and the codebook at that point becomes the codebook for the target speaker. If it has not, the process returns to 605 and repeats until convergence.

[0094] In the description above, the feature vector occurrence degree ω_i(t) is given by Equation 4. This applies to the feature vector sequence occurrence degree calculation unit 501 of FIG. 5, to step 609 of FIG. 6, and to step 707 of FIG. 7. The correction vector formula is derived under the maximum-likelihood condition using the well-known Baum-Welch algorithm, which is commonly used to derive HMM re-estimation formulas. The result is Equation 8 when a correction vector is obtained for each cluster and Equation 12 when it is common to all clusters.

[0095] Naturally, the Baum-Welch algorithm is not the only way to derive the correction vector formula. Under the maximum-likelihood condition, the steepest descent method can be used instead. The formula can also be derived under a condition of minimum code vector distortion rather than maximum likelihood.

[0096] Next, the feature vector occurrence degree ω_i(t) in unit 501 of FIG. 5 and at 609 of FIG. 6 can be calculated according to Equation 5 instead. This yields a correction device with a different effect while changing only part of the configuration and leaving the overall configuration intact.

[0097] FIG. 12 is a flowchart of the feature vector sequence occurrence degree calculation unit 501 and of step 609 in FIG. 6 for the case where ω_i(t) is calculated by Equation 5. It corresponds to FIG. 7, which covers the case of Equation 4. The basic operation is the same as in FIG. 7; only the formula at 1207 differs.

[0098] In this case as well, the correction vector can either be obtained separately for each code vector or be shared by all code vectors. Under the maximum-likelihood condition, the correction vector formula is Equation 14 when it is obtained for each cluster and Equation 15 when it is common to all clusters.

[0099]

[Equation 14]

[0100]

[Equation 15]

[0101] Because the correction vector formula changes, part of the codebook correction unit 408 must change. Only the correction vector denominator/numerator calculation unit 503 is affected, which corresponds to step 611 in the flowchart of FIG. 6. The change is described separately for the case where the correction vector is obtained for each cluster and the case where it is common to all clusters.

[0102] First consider the case where the correction vector is obtained separately for each cluster. The correction vector denominator/numerator calculation unit 503 then operates according to Equation 14. At 810 of FIG. 8, the denominator of the cluster-specific correction vector formula is calculated by Equation 16 and the numerator by Equation 17.

[0103]

[Equation 16]

[0104]

[Equation 17]

[0105] Once the denominator and numerator of the cluster-specific correction vector formula have been obtained, the cluster-specific correction vector ΔC_m is computed for every code vector at 613 (step 904) according to Equation 11. It is then added to the uncorrected codebook C (614, i.e., 905), and the corrected codebook C′ replaces C as the new codebook C (615, i.e., 906).

[0106] Next consider the case where a correction vector common to all clusters is obtained. The correction vector denominator/numerator calculation unit 503 then operates according to Equation 15. At 1010 of FIG. 10, the denominator of the common correction vector formula is calculated by Equation 18 and the numerator by Equation 19.

[0107]

[Equation 18]

[0108]

[Equation 19]

[0109] Once the denominator and numerator of the common correction vector formula have been obtained, the correction vector ΔC common to all clusters is computed at 613 (step 1101) according to Equation 13. It is then added to the uncorrected codebook C (614, i.e., 1105), and the corrected codebook C′ replaces C as the new codebook C (615, i.e., 1106).

[0110] The embodiment above first obtains the correction vector, that is, the mapping vector between the codebooks before and after correction, and then derives the corrected codebook from it. It is of course equally possible to obtain the code vectors of the corrected codebook directly, again so as to maximize the likelihood of the HMMs for the correction speech.

[0111] Speech recognition can then be performed simply by loading the corrected codebook obtained in the above embodiment into the codebook storage device 302 of a speech recognition device such as the conventional example of FIG. 3, replacing its previous values.

[0112] This is another feature of the present invention. Because the codebook is modified by maximizing the HMM likelihood, its correspondence with the HMM label occurrence probabilities is preserved. If, for example, only the codebook were modified under some other criterion, the correspondence between the codebook and the label occurrence probabilities could break down and lower the recognition rate. The present invention does not have this problem.

[0113] In the above embodiment, the code vectors are corrected by adding the correction vector ΔC to the code vectors C, whether ω_i(t) is given by Equation 4 or by Equation 5. Alternatively, a constant vector ΔH derived from ΔC (hereinafter, the normalization vector) can be subtracted from the feature vectors y_t of the input speaker's speech, as in Equation 20. This removes speaker-dependent differences from the input speech and so normalizes for the speaker.

[0114]

[Equation 20]

[0115] In this case, the code vector values are left unchanged and the normalization vector is stored instead. At recognition time, the normalization vector is subtracted from the feature vector of each frame of the feature vector sequence. FIG. 13 is a block diagram of a device that creates the normalization vector ΔH for this kind of speaker normalization.

[0116] Reference numeral 1301 denotes a correction speech storage unit. It stores R utterances S^r (r = 1 to R; hereinafter, correction speech) spoken by the speaker for whom the normalization vector is to be obtained (hereinafter, the target speaker), that is, the speaker who will use the speech recognition system. The content of these utterances is known in advance to the feature vector normalization device. The stored utterances are used in the subsequent correction.

[0117] Reference numeral 1302 denotes a feature extraction unit. Using the same feature extraction technique as the recognition device of FIG. 3, it converts the correction speech S^r at fixed time intervals into the feature vector sequence Y^r = y_1^r, y_2^r, …, y_t^r, …, y_{T^r}^r. Here, T^r is the number of frames obtained when the correction speech S^r is converted into a feature vector sequence.

[0118] Reference numeral 1303 denotes a correction feature vector storage unit. For r = 1 to R, it stores the feature vector sequences Y^r that the feature extraction unit 1302 extracted from the correction speech signals S^r stored in unit 1301.

[0119] Reference numeral 1304 denotes a data control unit. It controls the following processing based on two pieces of information: which of the R utterances (the r-th) is currently being handled, and what the content of that r-th utterance is. Here, word(r) denotes the content of the r-th utterance, given as the number w of the HMM for that content.

[0120] Reference numeral 1305 denotes a normalization vector storage unit, which stores the normalization vector used to correct the feature vectors.

[0121] Reference numeral 1306 denotes a feature vector normalization unit. It corrects the value y_t of the feature vector at each time t using the normalization vector ΔH stored in the normalization vector storage unit 1305, and so obtains the corrected feature vectors.

[0122] Reference numeral 1307 denotes a fuzzy vector quantization unit. Using the utterance number r sent from the data control unit 1304, it reads the feature vector sequence Y^r from the correction feature vector storage unit 1303. The feature vector normalization unit then corrects the feature vector y_t^r at each time t using the normalization vector stored in the normalization vector storage unit 1305. For each corrected feature vector, the unit determines two things. The first is the labels ranked 1st to K-th in order of increasing distance to the code vectors C_1 to C_M stored in the codebook storage unit 1308, described later. The second is the membership degree of the feature vector in each code vector retrieved by those labels. In this way the corrected sequence Y′^r of feature vectors y′_t^r is converted into the label vector sequence O^r = o_1^r, o_2^r, …, o_T^r, with o_t^r = (o_t1, o_t2, …, o_tK), and the membership vector sequence U^r = u_1^r, u_2^r, …, u_T^r, with u_t^r = (u_t1, u_t2, …, u_tK). Here, K can take any value from 1 to the codebook size M. It limits the number of operations needed to calculate the feature vector occurrence degree described later.

[0123] Reference numeral 1308 denotes a codebook storage unit. It stores the code vectors C_m so that each can be retrieved by its label m. The fuzzy vector quantization unit 1307 uses it during vector quantization.

[0124] Reference numeral 1309 denotes an HMM storage unit. For each of the W vocabulary words to be recognized, it stores an already created HMM consisting of a state transition probability matrix A and a label occurrence probability matrix B, as shown in FIG. 1. The w-th HMM is therefore written λ^w = {A^w, B^w}.

[0125] Reference numeral 1310 denotes a normalization vector adjustment unit. It uses the correction speech and the HMM, stored in the HMM storage unit 1309, that corresponds to the content of the correction speech. With these, it adjusts the normalization vector ΔH held in the normalization vector storage unit 1305 so as to maximize the probability (likelihood) that the HMM generates the correction speech. The adjusted normalization vector ΔH′ is then transferred to the normalization vector storage unit 1305.

[0126] Reference numeral 1311 denotes a correction convergence determination unit, which judges convergence while the normalization vector is being corrected using the correction speech. If a predetermined convergence condition is satisfied, the correction ends. Otherwise, the correction is repeated on the same correction speech, with the normalization vector updated each time, until the condition is satisfied.

[0127] The feature of the present invention lies in the configuration of the normalization vector adjustment unit 1310. Given that the utterance content is known, the unit adjusts the normalization vector to maximize the likelihood of the normalized feature vectors under the HMM corresponding to that content.

[0128] FIG. 14 is a block diagram showing the specific configuration of the normalization vector adjustment unit. Terminals 1 to 9 connect it to the units of FIG. 13, as follows.
- Terminal 1 is connected to the codebook storage unit 1308 and receives the codebook C.
- Terminals 2 and 3 are connected to the fuzzy vector quantization unit 1307 and receive the label vector sequence O^r and the membership vector sequence U^r for the r-th utterance.
- Terminals 4 and 5 are connected to the HMM storage unit 1309. Terminal 4 receives the state transition probability matrix A^{word(r)} of the HMM corresponding to the r-th utterance, and terminal 5 receives that HMM's label occurrence probability matrix B^{word(r)}.
- Terminals 6 and 9 are connected to the normalization vector storage unit 1305. Terminal 6 receives the normalization vector ΔH, and terminal 9 sends the corrected normalization vector ΔH′.
- Terminal 7 is connected to the correction convergence determination unit 1311 and sends the average likelihood L_ave used to decide convergence.
- Terminal 8 is connected to the data control unit 1304 and receives the index r of the current correction speech data. When r = R, the unit calculates the normalization vector ΔH and the average likelihood L_ave.

[0129] The normalization vector adjustment unit 1310, that is, the configuration of FIG. 14, operates while exchanging the information described above.

[0130] Reference numeral 1401 denotes a feature vector series occurrence degree calculation unit. From the label occurrence probability matrix, the label vector series and the membership vector series received at terminals 5, 2 and 3, it calculates the feature vector occurrence degree ω_i(t) from the membership degrees and the label occurrence probabilities for every time t and every HMM state i, yielding the feature vector occurrence degree matrix Ω. The formula used to compute the correction vector, described later, depends on how this occurrence degree is defined.

[0131] Reference numeral 1402 denotes a path probability calculation unit. It calculates the path probability γ_i(t), the probability of being in HMM state i at time t, for every time t and every state i, yielding the path probability matrix Γ. It also calculates the likelihood L(O^r, U^r | λ^{word(r)}), the probability that the HMM corresponding to the r-th utterance content generates that utterance, and sends it to the likelihood storage unit described later for the convergence judgment.
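The quantities computed by unit 1402 match the standard forward-backward quantities of an HMM, with the occurrence degree ω_i(t) taking the place of the per-state output probability. The following Python sketch shows one way to obtain L and Γ under stated assumptions: the initial state distribution `pi` is a hypothetical input (the text above does not specify it), the model may end in any state, and no scaling is applied, so it is only suitable for short sequences.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward_backward(A: Sequence[Sequence[float]],
                     omega: Sequence[Sequence[float]],
                     pi: Sequence[float]) -> Tuple[float, Matrix]:
    """Return the likelihood L and the path probabilities gamma[i][t].

    A[i][j]:     state transition probability a_ij
    omega[i][t]: occurrence degree of frame t in state i (Eq. 4 or Eq. 5)
    pi[i]:       initial state probability (hypothetical input)
    """
    n, T = len(A), len(omega[0])
    alpha = [[0.0] * T for _ in range(n)]
    beta = [[0.0] * T for _ in range(n)]

    # Forward pass: alpha_j(t) = sum_i alpha_i(t-1) * a_ij * omega_j(t)
    for i in range(n):
        alpha[i][0] = pi[i] * omega[i][0]
    for t in range(1, T):
        for j in range(n):
            alpha[j][t] = sum(alpha[i][t - 1] * A[i][j] for i in range(n)) * omega[j][t]

    # Backward pass: beta_i(t) = sum_j a_ij * omega_j(t+1) * beta_j(t+1)
    for i in range(n):
        beta[i][T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[i][t] = sum(A[i][j] * omega[j][t + 1] * beta[j][t + 1] for j in range(n))

    # Likelihood, assuming the model may end in any state.
    L = sum(alpha[i][T - 1] for i in range(n))
    # gamma_i(t): probability of occupying state i at time t given the whole series.
    gamma = [[alpha[i][t] * beta[i][t] / L for t in range(T)] for i in range(n)]
    return L, gamma


# Two-state left-to-right example with three frames.
A = [[0.6, 0.4], [0.0, 1.0]]
omega = [[0.5, 0.2, 0.1], [0.1, 0.4, 0.6]]
L, gamma = forward_backward(A, omega, pi=[1.0, 0.0])
print(round(L, 6))  # 0.066
```

Because Σ_i α_i(t)β_i(t) equals L at every t, each column of Γ sums to 1, which is a convenient sanity check on an implementation.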

[0132] Reference numeral 1403 denotes a correction vector denominator/numerator calculation unit, which calculates the denominator and numerator of the correction vector formula that corresponds to the definition of the feature vector occurrence degree ω_i(t) used in unit 1401.

[0133] Reference numeral 1404 denotes a correction vector denominator/numerator storage unit, which stores the denominator and numerator values computed by unit 1403 for later use by the correction vector calculation unit described below.

[0134] Reference numeral 1405 denotes a likelihood storage unit, which stores the likelihoods L(O^r, U^r | λ^{word(r)}) sent from the path probability calculation unit 1402 for all R correction utterances.

[0135] After the above operations have been performed for all R correction utterances (that is, once the utterance index received at terminal 8 reaches R), the operations described below are carried out.

[0136] Reference numeral 1406 denotes a correction vector calculation unit. Using the correction vector formula that corresponds to the occurrence degree ω_i(t) defined in unit 1401, it obtains the correction vector value ΔC from the denominator and numerator stored in unit 1404 and sends it out through terminal 9 toward the normalization vector storage unit 1305.

[0137] Reference numeral 1407 denotes a normalization vector calculation unit, which obtains the new normalization vector ΔH' from the correction vector ΔC found by the correction vector calculation unit and the pre-correction normalization vector ΔH received at terminal 6.

[0138] Reference numeral 1408 denotes an average likelihood calculation unit, which averages all the stored likelihoods to obtain the average likelihood L_ave and sends it through terminal 7 to the correction convergence determination unit 1311.
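The text does not state the exact convergence criterion applied by unit 1311 to L_ave. A minimal sketch, assuming a hypothetical rule that stops when the relative change in L_ave between successive adjustment passes falls below a tolerance `eps`; `step` is a hypothetical callback that runs one pass over the R correction utterances and returns their likelihoods.

```python
from typing import Callable, List, Optional


def average_likelihood(likelihoods: List[float]) -> float:
    """L_ave: mean of the per-utterance likelihoods of the R correction utterances."""
    return sum(likelihoods) / len(likelihoods)


def has_converged(prev_ave: float, curr_ave: float, eps: float = 1e-3) -> bool:
    """Hypothetical criterion: relative change of L_ave below eps."""
    if prev_ave == 0.0:
        return False
    return abs(curr_ave - prev_ave) / abs(prev_ave) < eps


def run_until_converged(step: Callable[[], List[float]],
                        eps: float = 1e-3, max_iter: int = 50) -> int:
    """Repeat adjustment passes and return the number of passes performed."""
    prev: Optional[float] = None
    for it in range(1, max_iter + 1):
        curr = average_likelihood(step())  # one pass over all R utterances
        if prev is not None and has_converged(prev, curr, eps):
            return it
        prev = curr
    return max_iter
```

In practice likelihoods are usually handled as log-likelihoods to avoid underflow; the same check applies unchanged to averaged log values.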

[0139] In the feature vector normalization device described above, as in the codebook correction device, the occurrence degree ω_i(t) may be defined either by (Equation 4) or by (Equation 5). The correction vector formula is the same as (Equation 12) in the former case and the same as (Equation 15) in the latter case.

[0140] The above describes embodiments of the codebook correction device and the feature vector normalization device of the present invention. In these embodiments the correction speech was assumed to be uttered in advance. What is actually required, however, is only that the utterance content be known: when a recognition result is highly reliable, it can be taken as the utterance content, so a speaker using the speech recognition system need not utter correction speech beforehand.

[0141] In this case, a recognition result can be regarded as highly reliable when its likelihood is large or when the likelihood difference between the first and second candidates is large, and as unreliable otherwise. A suitable threshold is therefore set for each quantity; the codebook is corrected when the threshold is exceeded and left unchanged otherwise. In this way the codebook can be corrected even when the utterance content is unknown, by treating the recognition result as the utterance content.
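The decision rule of this paragraph can be sketched as follows. This is an illustration only: it assumes log-likelihood scores (so the first/second-candidate difference is a log ratio), and the thresholds `min_best` and `min_margin` as well as the function name are hypothetical. Following the wording above, either condition alone marks the result as reliable; `require_both=True` gives a stricter variant.

```python
from typing import Dict, Tuple


def reliable_candidate(log_likelihoods: Dict[str, float],
                       min_best: float, min_margin: float,
                       require_both: bool = False) -> Tuple[str, bool]:
    """Return the first candidate and whether it is reliable enough to adapt on.

    log_likelihoods: log L(Y | lambda^w) for each vocabulary word w
    min_best:        hypothetical threshold on the best score
    min_margin:      hypothetical threshold on (best - second best)
    """
    ranked = sorted(log_likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    best_word, best = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else float("-inf")
    high_score = best >= min_best
    wide_margin = best - second >= min_margin
    ok = (high_score and wide_margin) if require_both else (high_score or wide_margin)
    return best_word, ok


word, adapt = reliable_candidate({"hai": -100.0, "iie": -120.0},
                                 min_best=-90.0, min_margin=10.0)
print(word, adapt)  # hai True  (margin 20 >= 10)
```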

[0142] A block diagram of such a speech recognition device is described with reference to FIG. 15. Reference numeral 1501 denotes a feature extraction unit, which converts an unknown speech signal into feature vectors at regular time intervals using a well-known method such as LPC analysis, yielding the feature vector series Y = y₁, y₂, …, y_t, …, y_T, where T is the length of the feature vector series Y for the unknown speech signal.

[0143] Reference numeral 1502 denotes a codebook storage unit, which stores the code vectors in a form that can be searched by the labels attached to them.

[0144] Reference numeral 1503 denotes a fuzzy vector quantization unit. For each feature vector y_t extracted by the feature extraction unit 1501, it selects the K labels whose code vectors in the codebook storage unit 1502 are closest to y_t (in order of increasing inter-vector distance), together with the degree of membership of y_t in each of the code vectors retrieved by those labels. Each y_t is thus replaced by a label vector o_t = (o_t1, o_t2, …, o_tK) and a membership vector u_t = (u_t1, u_t2, …, u_tK), and the feature vector series Y is converted into the label vector series O = o₁, o₂, …, o_T and the membership vector series U = u₁, u₂, …, u_T.
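The membership formula itself is defined elsewhere in the patent and is not repeated here. The sketch below assumes the widely used fuzzy c-means form restricted to the K nearest code vectors, u_tk = 1 / Σ_j (d_tk / d_tj)^{2/(F-1)} with a fuzziness exponent F > 1, and Euclidean distance; both are assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fuzzy_vq(y: Sequence[float], codebook: Sequence[Sequence[float]],
             K: int, F: float = 2.0) -> Tuple[List[int], List[float]]:
    """Return the K nearest labels o_t and their membership degrees u_t."""
    dists = [math.dist(y, c) for c in codebook]  # Euclidean distance (assumption)
    labels = sorted(range(len(codebook)), key=lambda m: dists[m])[:K]
    near = [dists[m] for m in labels]
    if near[0] == 0.0:  # y coincides with a code vector: crisp assignment
        return labels, [1.0] + [0.0] * (len(labels) - 1)
    p = 2.0 / (F - 1.0)
    memberships = [1.0 / sum((dk / dj) ** p for dj in near) for dk in near]
    return labels, memberships


def quantize_series(Y: Sequence[Sequence[float]],
                    codebook: Sequence[Sequence[float]],
                    K: int, F: float = 2.0) -> Tuple[List[List[int]], List[List[float]]]:
    """Convert a feature vector series Y into the label series O and membership series U."""
    O, U = [], []
    for y in Y:
        o_t, u_t = fuzzy_vq(y, codebook, K, F)
        O.append(o_t)
        U.append(u_t)
    return O, U


codebook = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
O, U = quantize_series([[0.25, 0.0]], codebook, K=2)
print(O, [round(u, 3) for u in U[0]])  # [[0, 1]] [0.9, 0.1]
```

With this form the K membership degrees of each frame sum to 1, and a frame closer to a code vector receives a larger weight for that label.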

[0145] Reference numeral 1504 denotes an HMM storage unit, which stores, for each of the W vocabulary words to be recognized, the state transition probability matrix A and the label occurrence probability matrix B of the previously created HMM λ^w (w = 1, …, W). The w-th HMM is therefore expressed as λ^w = {A^w, B^w}, w = 1, …, W.

[0146] Reference numeral 1505 denotes a feature vector series occurrence degree calculation unit. Using the label vector series O and the membership vector series U obtained by the fuzzy vector quantization unit 1503 and the label occurrence probability matrix B^w of the w-th vocabulary word stored in the HMM storage unit 1504, it calculates the occurrence degree matrix Ω^w = {ω^w_it} of the feature vector series for HMM λ^w according to (Equation 4) or (Equation 5).

[0147] Here, the label occurrence probability b_i(o_tk) is the probability that the k-th label o_tk, obtained by fuzzy vector quantization of the feature vector y_t at time t, is generated from state i of the HMM.
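Claims 4 and 5 describe the two definitions of the occurrence degree referred to as (Equation 4) and (Equation 5): a power product of the label occurrence probabilities with the membership degrees as exponents, and a sum of products of the two. A minimal Python sketch of both, assuming B is stored as rows B[i][label] with strictly positive entries (needed for the logarithm in the first form); the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Rule = Callable[[Sequence[float], Sequence[int], Sequence[float]], float]


def omega_eq4(b_i: Sequence[float], o_t: Sequence[int], u_t: Sequence[float]) -> float:
    """Power product: omega_i(t) = prod_k b_i(o_tk) ** u_tk, evaluated through
    log omega_i(t) = sum_k u_tk * log b_i(o_tk). Requires b_i(o_tk) > 0."""
    return math.exp(sum(u * math.log(b_i[o]) for o, u in zip(o_t, u_t)))


def omega_eq5(b_i: Sequence[float], o_t: Sequence[int], u_t: Sequence[float]) -> float:
    """Sum of products: omega_i(t) = sum_k u_tk * b_i(o_tk)."""
    return sum(u * b_i[o] for o, u in zip(o_t, u_t))


def occurrence_matrix(B: Sequence[Sequence[float]], O: Sequence[Sequence[int]],
                      U: Sequence[Sequence[float]], rule: Rule) -> List[List[float]]:
    """Omega = {omega_i(t)}: rows are states i, columns are times t."""
    return [[rule(B[i], O[t], U[t]) for t in range(len(O))] for i in range(len(B))]


B = [[0.5, 0.25, 0.25], [0.1, 0.1, 0.8]]  # 2 states x 3 labels
O = [[0, 2]]                               # K = 2 labels for a single frame
U = [[0.5, 0.5]]
print([[round(w, 4) for w in row] for row in occurrence_matrix(B, O, U, omega_eq5)])
# [[0.375], [0.45]]
```

When the membership degrees of a frame sum to 1, the power product is a weighted geometric mean and the sum of products a weighted arithmetic mean, so the (Equation 4) value never exceeds the (Equation 5) value.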

[0148] Reference numeral 1506 denotes a likelihood calculation unit, which calculates the likelihood L(Y | λ^w) using the feature vector series occurrence degree matrix Ω^w for HMM λ^w computed by unit 1505 and the state transition probability matrix A^w of HMM λ^w stored in the HMM storage unit 1504.

[0149] Reference numeral 1507 denotes a likelihood storage unit, which stores, for comparison, the likelihood L(Y | λ^w) of each word HMM λ^w for the feature vector series Y calculated by the likelihood calculation unit 1506.

[0150] Reference numeral 1508 denotes a comparison/determination unit, which selects as the recognition candidate the vocabulary word whose HMM gives the largest of the likelihoods stored in the likelihood storage unit 1507.

[0151] Units 1505 to 1507 are executed once for each vocabulary HMM λ^w, repeating for w = 1 to W, and the results are evaluated by the comparison/determination unit 1508.
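The per-word loop of units 1505 to 1508 amounts to scoring every vocabulary HMM and keeping the best one. In this sketch `likelihood` is a hypothetical callback standing in for units 1505 and 1506 for a given word, and the score table is made-up illustration data.

```python
from typing import Callable, Dict, Sequence, Tuple


def recognize(vocabulary: Sequence[str],
              likelihood: Callable[[str], float]) -> Tuple[str, Dict[str, float]]:
    """Score every word HMM lambda^w and return the best word plus all scores."""
    scores = {w: likelihood(w) for w in vocabulary}  # units 1505-1507, w = 1..W
    best = max(scores, key=scores.__getitem__)        # unit 1508: largest likelihood
    return best, scores


# Hypothetical log-likelihood table standing in for units 1505 and 1506.
table = {"ichi": -210.5, "ni": -198.2, "san": -240.0}
best, scores = recognize(list(table), table.__getitem__)
print(best)  # ni
```

Returning all scores, not just the winner, leaves them available for the reliability calculation of unit 1509, which may need the second candidate as well.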

[0152] Reference numeral 1509 denotes a recognition candidate reliability calculation unit, which calculates the reliability of the recognition candidate selected by the comparison/determination unit 1508 using, for example, the candidate's likelihood stored in the likelihood storage unit 1507.

[0153] Reference numeral 1510 denotes a codebook correction execution determination unit. If the reliability of the recognition candidate obtained from the recognition candidate reliability calculation unit 1509 is equal to or greater than a predetermined threshold, it sends a codebook correction signal to the codebook correction unit described below, causing the codebook to be corrected.

[0154] Reference numeral 1511 denotes a codebook correction unit. On receiving the codebook correction signal from the codebook correction execution determination unit, it corrects the codebook using the codebook stored in the codebook storage unit 1502, the label vector series O and membership vector series U obtained by the fuzzy vector quantization unit 1503, and the HMM stored in the HMM storage unit that corresponds to the recognition candidate, and sends the corrected codebook to the codebook storage unit.

[0155] Similarly, by adjusting the normalization vector when the threshold is exceeded and not adjusting it otherwise, the normalization vector can be corrected even when the utterance content is unknown, by treating the recognition result as the utterance content.

[0156] A block diagram of such a speech recognition device is described with reference to FIG. 16. Reference numeral 1601 denotes a feature extraction unit, which converts an unknown speech signal into feature vectors at regular time intervals using a well-known method such as LPC analysis, yielding the feature vector series Y = y₁, y₂, …, y_t, …, y_T, where T is the length of the feature vector series Y for the unknown speech signal.

[0157] Reference numeral 1602 denotes a normalization vector storage unit, which stores the normalization vector used to normalize the feature vectors.

[0158] Reference numeral 1603 denotes a feature vector normalization unit, which normalizes the feature vectors using the normalization vector.
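The text here does not spell out how the normalization vector is applied. A minimal sketch, assuming an additive offset y'_t = y_t + ΔH applied to every frame, and the update ΔH' = ΔH + ΔC for unit 1407 in FIG. 14; both the sign convention and the update rule are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence


def normalize(Y: Sequence[Sequence[float]], dH: Sequence[float]) -> List[List[float]]:
    """Assumed additive normalization: y'_t = y_t + dH for every frame t."""
    return [[y + h for y, h in zip(y_t, dH)] for y_t in Y]


def update_normalization(dH: Sequence[float], dC: Sequence[float]) -> List[float]:
    """Assumed update rule for unit 1407: dH' = dH + dC."""
    return [h + c for h, c in zip(dH, dC)]


Y = [[1.0, 2.0], [3.0, 4.0]]
dH = [0.5, -1.0]
print(normalize(Y, dH))                        # [[1.5, 1.0], [3.5, 3.0]]
print(update_normalization(dH, [0.25, 0.25]))  # [0.75, -0.75]
```

An additive offset of this kind is a common way to compensate a fixed channel difference in cepstral-type feature domains, which is the sort of environment mismatch the device targets.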

[0159] Reference numeral 1604 denotes a codebook storage unit, which stores the code vectors in a form that can be searched by the labels attached to them.

[0160] Reference numeral 1605 denotes a fuzzy vector quantization unit. For each normalized feature vector y'_t produced by the feature vector normalization unit, it selects the K labels whose code vectors in the codebook storage unit 1604 are closest to y'_t (in order of increasing inter-vector distance), together with the degree of membership of y'_t in each of the code vectors retrieved by those labels. Each y'_t is thus replaced by a label vector o_t = (o_t1, o_t2, …, o_tK) and a membership vector u_t = (u_t1, u_t2, …, u_tK), and the normalized feature vector series Y' is converted into the label vector series O = o₁, o₂, …, o_T and the membership vector series U = u₁, u₂, …, u_T.

[0161] Reference numeral 1606 denotes an HMM storage unit, which stores, for each of the W vocabulary words to be recognized, the state transition probability matrix A and the label occurrence probability matrix B of the previously created HMM λ^w (w = 1, …, W). The w-th HMM is therefore expressed as λ^w = {A^w, B^w}, w = 1, …, W.

[0162] Reference numeral 1607 denotes a feature vector series occurrence degree calculation unit. Using the label vector series O and the membership vector series U obtained by the fuzzy vector quantization unit 1605 and the label occurrence probability matrix B^w of the w-th vocabulary word stored in the HMM storage unit 1606, it calculates the occurrence degree matrix Ω^w = {ω^w_it} of the normalized feature vector series for HMM λ^w according to (Equation 4) or (Equation 5).

[0163] Here, the label occurrence probability b_i(o_tk) is the probability that the k-th label o_tk, obtained by fuzzy vector quantization of the normalized feature vector y'_t at time t, is generated from state i of the HMM.

[0164] Reference numeral 1608 denotes a likelihood calculation unit, which calculates the likelihood L(Y' | λ^w) using the feature vector series occurrence degree matrix Ω^w for HMM λ^w computed by unit 1607 and the state transition probability matrix A^w of HMM λ^w stored in the HMM storage unit 1606.

[0165] Reference numeral 1609 denotes a likelihood storage unit, which stores, for comparison, the likelihood L(Y' | λ^w) of each word HMM λ^w for the normalized feature vector series Y' calculated by the likelihood calculation unit 1608.

[0166] Reference numeral 1610 denotes a comparison/determination unit, which selects as the recognition candidate the vocabulary word whose HMM gives the largest of the likelihoods stored in the likelihood storage unit 1609.

[0167] Units 1607 to 1609 are executed once for each vocabulary HMM λ^w, repeating for w = 1 to W, and the results are evaluated by the comparison/determination unit 1610.

[0168] Reference numeral 1611 denotes a recognition candidate reliability calculation unit, which calculates the reliability of the recognition candidate selected by the comparison/determination unit 1610 using, for example, the candidate's likelihood stored in the likelihood storage unit 1609.

[0169] Reference numeral 1612 denotes a normalization vector adjustment execution determination unit. If the reliability of the recognition candidate obtained from the recognition candidate reliability calculation unit 1611 is equal to or greater than a predetermined threshold, it sends a normalization vector adjustment signal to the normalization vector adjustment unit described below, causing the normalization vector to be adjusted.

[0170] Reference numeral 1613 denotes a normalization vector adjustment unit. On receiving the normalization vector adjustment signal from the normalization vector adjustment execution determination unit, it adjusts the normalization vector using the normalization vector stored in the normalization vector storage unit 1602, the label vector series O and membership vector series U obtained by the fuzzy vector quantization unit 1605, and the HMM stored in the HMM storage unit that corresponds to the recognition candidate, and sends the corrected normalization vector to the normalization vector storage unit.

[0171]

[Effects of the Invention]

(1) According to the present invention, the codebook is corrected with a correction vector obtained, using speech whose utterance content is known, so as to maximize the likelihood of the HMM for that speech. This compensates for differences between the environment in which the HMMs were created and the environment at recognition time, making it possible to build a speech recognition device that is robust to environmental changes.

(2) According to the present invention, the feature vectors are corrected with a normalization vector obtained, using speech whose utterance content is known, so as to maximize the likelihood of the HMM for that speech. This likewise compensates for differences between the environment in which the HMMs were created and the environment at recognition time, making it possible to build a speech recognition device that is robust to environmental changes.

[Brief description of drawings]

FIG. 1 is a structural diagram of an HMM for explaining the Hidden Markov Model (HMM).

FIG. 2 is a diagram illustrating the configuration of a codebook.

FIG. 3 is a block diagram illustrating a conventional example of a speech recognition device.

FIG. 4 is a block diagram outlining an embodiment of the codebook correction device of the present invention.

FIG. 5 is a block diagram illustrating an embodiment of the codebook correction unit, which is the main part of the codebook correction device of the present invention.

FIG. 6 is a flowchart illustrating the operation of the present invention.

FIG. 7 is a flowchart illustrating the operation when the occurrence degree in FIG. 6 is calculated by (Equation 4).

FIG. 8 is a flowchart illustrating the calculation of the correction vector denominator and numerator when the correction vector in FIG. 6 is obtained separately for each cluster.

FIG. 9 is a flowchart illustrating the operation when the correction vector in FIG. 6 is obtained separately for each cluster.

FIG. 10 is a flowchart illustrating the calculation of the correction vector denominator and numerator when the correction vector in FIG. 6 is obtained in common for all clusters.

FIG. 11 is a flowchart illustrating the operation when the correction vector in FIG. 6 is obtained in common for all clusters.

FIG. 12 is a flowchart illustrating the operation when the occurrence degree in FIG. 6 is calculated by (Equation 5).

FIG. 13 is a block diagram outlining an embodiment of the feature vector normalization device of the present invention.

FIG. 14 is a block diagram illustrating an embodiment of the correction vector correction unit, which is the main part of the feature vector normalization device of the present invention.

FIG. 15 is a block diagram illustrating an embodiment of a speech recognition device incorporating codebook normalization means.

FIG. 16 is a block diagram illustrating an embodiment of a speech recognition device incorporating normalization vector adjustment means.

Claims

[Claims]

1. A codebook correction device comprising: codebook storage means for storing a finite number of representative points (code vectors) in a feature vector space in a form searchable by labels attached to them; fuzzy vector quantization means for converting each vector of a feature vector series, by means of the codebook, into a set of degrees of membership corresponding to the labels (a membership vector), thereby converting the feature vector series into a membership vector series; HMM storage means for storing HMMs in which the occurrence probability of each label (label occurrence probability) is defined for each state; feature vector series occurrence degree calculation means for calculating the degree of occurrence of the feature vector series from an HMM on the basis of the label occurrence probabilities and the membership vectors; and codebook correction means for correcting each of the code vectors, wherein the codebook correction means includes correction vector calculation means for correcting the code vectors so as to maximize the degree of occurrence of the feature vector series from the HMM, and the code vectors are corrected accordingly.

2. The codebook correction device according to claim 1, wherein the correction vector calculation means individually calculates the value of the correction vector for each code vector of the codebook.

3. The codebook correction device according to claim 1, wherein the correction vector calculation means calculates the value of the correction vector in common to all codevectors of the codebook.

4. The codebook correction device according to claim 1, wherein the degree of occurrence of the feature vector series is defined as a power product of the label occurrence probabilities with the degrees of membership as exponents, or its logarithm is defined as the sum of products of the logarithms of the label occurrence probabilities and the degrees of membership.

5. The codebook correction device according to claim 1, wherein the degree of occurrence of the feature vector series is defined by the sum of products of the label occurrence probability and the degree of membership.

6. A feature vector normalization device comprising: codebook storage means for storing a finite number of representative points (code vectors) in a feature vector space in a form searchable by labels attached to them; fuzzy vector quantization means for converting each vector of a feature vector series, by means of the codebook, into a set of degrees of membership corresponding to the labels (a membership vector), thereby converting the feature vector series into a membership vector series; HMM storage means for storing HMMs in which the occurrence probability of each label (label occurrence probability) is defined for each state; feature vector series occurrence degree calculation means for calculating the degree of occurrence of the feature vector series from an HMM on the basis of the label occurrence probabilities and the membership vectors; feature vector correction means for correcting the feature vectors; and normalization vector adjustment means for calculating a normalization vector for correcting the feature vectors, wherein the normalization vector adjustment means includes correction vector calculation means for correcting the feature vectors so as to maximize the degree of occurrence of the feature vector series from the HMM.

7. The feature vector normalization device according to claim 6, wherein the degree of occurrence of the feature vector series is defined as a power product of the label occurrence probabilities with the degrees of membership as exponents, or its logarithm is defined as the sum of products of the logarithms of the label occurrence probabilities and the degrees of membership.

8. The feature vector normalization apparatus according to claim 6, wherein the occurrence degree of the feature vector series is defined by the sum of products of the label occurrence probability and the membership degree.

9. The codebook correction device according to claim 4 or 5, wherein the range over which the feature vector series occurrence degree calculation means takes the power product or the sum of products is limited.

10. The feature vector normalization device according to claim 7 or 8, wherein the range over which the feature vector series occurrence degree calculation means takes the power product or the sum of products is limited.

11. A speech recognition device comprising: codebook storage means for storing a finite number of representative points (code vectors) in a feature vector space in a form searchable by labels attached to them; fuzzy vector quantization means for converting each vector of a feature vector series, by means of the codebook, into a set of degrees of membership corresponding to the labels (a membership vector), thereby converting the feature vector series into a membership vector series; HMM storage means for storing HMMs in which the occurrence probability of each label (label occurrence probability) is defined for each state; feature vector series occurrence degree calculation means for calculating the degree of occurrence of the feature vector series from an HMM on the basis of the label occurrence probabilities and the membership vectors; likelihood calculation means for calculating the likelihood of the HMM of each vocabulary word for the feature vector series; comparison/determination means for determining the recognition result; recognition candidate reliability calculation means for calculating the reliability of the recognition candidate obtained from the comparison/determination means; codebook correction execution determination means for issuing an instruction to execute a codebook correction if the reliability of the recognition candidate exceeds a certain threshold; and codebook correction means for correcting each code vector, wherein the codebook correction means includes correction vector calculation means for correcting the code vectors so as to maximize the degree of occurrence of the feature vector series from the HMM corresponding to the recognition candidate, and, when the utterance content is not known in advance, the code vectors are corrected by treating the recognition candidate as the utterance content.

12. A speech recognition device comprising: codebook storage means for storing a finite number of representative points (code vectors) in a feature vector space in a form searchable by labels attached to them; fuzzy vector quantization means for converting each vector of a feature vector series, by means of the codebook, into a set of degrees of membership corresponding to the labels (a membership vector), thereby converting the feature vector series into a membership vector series; HMM storage means for storing HMMs in which the occurrence probability of each label (label occurrence probability) is defined for each state; feature vector series occurrence degree calculation means for calculating the degree of occurrence of the feature vector series from an HMM on the basis of the label occurrence probabilities and the membership vectors; likelihood calculation means for calculating the likelihood of the HMM of each vocabulary word for the feature vector series; comparison/determination means for determining the recognition result; recognition candidate reliability calculation means for calculating the reliability of the recognition candidate obtained from the comparison/determination means; codebook correction execution determination means for issuing an instruction to execute a codebook correction if the reliability of the recognition candidate exceeds a certain threshold; feature vector correction means for correcting the feature vectors; and normalization vector adjustment means for calculating a normalization vector for correcting the feature vectors, wherein the normalization vector adjustment means includes correction vector calculation means for correcting the feature vectors so as to maximize the degree of occurrence of the feature vector series from the HMM, and, when the utterance content is not known in advance, the feature vectors are corrected by treating the recognition candidate as the utterance content.