JP3144203B2 - Vector quantizer

Info

Publication number: JP3144203B2
Application number: JP01094494A
Authority: JP
Inventors: 英一坪香; 順一中橋
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1994-02-02
Filing date: 1994-02-02
Publication date: 2001-03-12
Anticipated expiration: 2016-03-12
Also published as: JPH07219599A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Industrial Field of Application] The present invention relates to an apparatus that performs speaker adaptation of a codebook, or speaker normalization of an input signal to be recognized or of a signal to be transmitted, in pattern recognition and communication using vector quantization (VQ).

【０００２】[0002]

[Prior Art] Vector quantization is widely used as a basic technique for high-efficiency coding in the transmission of speech signals and the like, and for pattern recognition such as speech recognition. It is performed as follows.

【０００３】[0003] The vector space under consideration is divided into M subspaces, and each subspace is assigned a label (number) 1, ..., M. A representative vector (code vector) μ_m is determined for the subspace m corresponding to label m (m = 1, ..., M), and a codebook that stores μ_m (m = 1, ..., M) in a form that can be looked up by m is used to convert a vector y into one of the labels 1, ..., M. That is, when the distance between vectors u and v is d(u, v), y is converted to the label

【０００４】[0004]

【数１】 (Equation 1)

【０００５】[0005] The subspaces are determined by clustering a set of training vectors. The well-known LBG algorithm is often used for this clustering. In this case the representative vector μ_m is the center of gravity, or mean vector, of cluster m, and is also called the centroid of cluster m.
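The conversion of a vector to a label can be sketched as follows. This is a minimal illustration and not the patent's implementation: the squared Euclidean distance is only one possible choice for d(u, v), and the function names `sq_dist`, `quantize`, and `centroid` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance, one possible choice of d(u, v)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(y: Vector, codebook: List[Vector]) -> int:
    """Return the label (1..M) of the code vector nearest to y."""
    best = min(range(len(codebook)), key=lambda m: sq_dist(y, codebook[m]))
    return best + 1  # labels in the text start at 1


def centroid(cluster: List[Vector]) -> List[float]:
    """Mean vector of a cluster, i.e. its centroid."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


# Toy codebook with M = 3 two-dimensional code vectors (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
label = quantize([0.9, 1.2], codebook)
```

With this toy codebook the vector (0.9, 1.2) lies closest to the second code vector, so it receives label 2.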

【０００６】[0006] Transmission of a speech signal using vector quantization is performed as follows. On the transmitting side, the PCM speech signal to be transmitted is divided into blocks of n samples, each block is regarded as an n-dimensional vector, and the blocks are converted into a label sequence using the codebook. This is explained with reference to FIG. 1. Reference numerals 2 and 3 denote buffer memories that alternately store successive blocks of n samples. Reference numeral 1 denotes a switch that directs the input blocks of n samples alternately to buffer memories 2 and 3. Reference numeral 4 denotes a switch that alternately selects and outputs the n samples held in buffer memories 2 and 3. Elements 1 to 4 operate so that while one buffer memory is being written, the other is being read. Reference numeral 5 denotes a codebook in which the n-dimensional representative vector of each of the M clusters is stored in a form that can be looked up by label. Reference numeral 6 denotes a comparison unit that compares the n-dimensional vector stored in buffer memory 2 or 3 with the M representative vectors stored in codebook 5. Reference numeral 7 denotes a label selection unit that, based on the comparison, selects the label of the representative vector most similar to the vector in buffer memory 2 or 3. The selected label is transmitted. That is, each successive block of n samples is converted into a label in turn, and this label is transmitted.

【０００７】[0007] On the receiving side, the received label sequence is converted into the corresponding sequence of representative vectors using a codebook of the same configuration, and restored to a time waveform. Reference numeral 8 denotes a code vector reading unit and 9 denotes a codebook. Codebook 9 has the same configuration as codebook 5. Using 8 and 9, the n-dimensional code vector (representative vector) corresponding to each received label is read from codebook 9. Reference numerals 11 and 12 denote buffer memories that alternately store the n-element code vectors read from codebook 9. Reference numeral 10 denotes a switch that distributes the code vectors read from codebook 9 alternately to buffer memories 11 and 12. Reference numeral 13 denotes a switch that alternately reads out and outputs the contents of buffer memories 11 and 12. Buffer memories 11 and 12 thus hold the code-vector approximations of the vectors in buffer memories 2 and 3. Reading these out serially, element by element, therefore yields a decoded signal that approximates the transmitted signal. While one of buffer memories 11 and 12 is being read, the other is being written, and reading alternates between them through switch 13.
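The transmit and receive paths described above can be sketched in software as a label encoder and decoder. This is a simplified illustration under stated assumptions: the double buffering performed by switches and buffer memories is omitted, the squared Euclidean distance stands in for the comparison, and the names `encode` and `decode` are hypothetical.

```python
from typing import List, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def encode(samples: Sequence[float], codebook: List[List[float]], n: int) -> List[int]:
    """Split samples into blocks of n and map each block to a label (1..M).

    Trailing samples that do not fill a whole block are dropped in this sketch.
    """
    labels = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        m = min(range(len(codebook)), key=lambda k: sq_dist(block, codebook[k]))
        labels.append(m + 1)
    return labels


def decode(labels: Sequence[int], codebook: List[List[float]]) -> List[float]:
    """Look up each label and concatenate the code vectors into a waveform."""
    out: List[float] = []
    for m in labels:
        out.extend(codebook[m - 1])
    return out


# Both sides hold the same codebook (made-up values, n = 2).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
pcm = [0.1, -0.1, 0.9, 1.2, -0.8, 1.1]
labels = encode(pcm, codebook, n=2)
decoded = decode(labels, codebook)
```

The decoded sequence is the code-vector approximation of the input, which is the sense in which the receiver recovers the transmitted signal only approximately.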

【０００８】[0008] With this arrangement, when transmitting a speech signal in which each sample is represented by 12 bits, a codebook size of M = 256 and a block length of n = 8 give the following transmission bit rate. The amount transmitted per block is 12 × 8 = 96 bits if the PCM signal is sent as is, whereas with vector quantization only the number of bits needed to distinguish the labels, log₂ 256 = 8 bits, is required, so the transmission bit rate becomes 1/12. In this case the vector y whose components are the n samples stored in each buffer memory is approximated (quantized) by its nearest centroid. Hence the larger the codebook size M, the smaller the quantization error, but the more bits are needed for coding. Moreover, the representative vectors are obtained as described above from a set of vectors prepared for training, and estimating them accurately requires more training vectors as M grows. The codebook size must therefore be chosen to suit the purpose, weighing the quantization error, the transmission bit rate, the estimation accuracy of the representative vectors, and so on.
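The bit-rate arithmetic in this paragraph can be checked with a few lines of code. The helper names below are hypothetical, and the label width is computed as the number of bits needed to distinguish M labels.

```python
def pcm_bits_per_block(bits_per_sample: int, n: int) -> int:
    """Bits needed to send one block of n PCM samples unchanged."""
    return bits_per_sample * n


def vq_bits_per_block(codebook_size: int) -> int:
    """Bits needed to distinguish codebook_size labels (ceil of log2 M)."""
    return (codebook_size - 1).bit_length()


pcm_bits = pcm_bits_per_block(12, 8)   # 12 bits x 8 samples
vq_bits = vq_bits_per_block(256)       # log2(256)
ratio = vq_bits / pcm_bits             # fraction of the PCM bit rate
```

For the figures in the text this gives 96 bits against 8 bits per block, a ratio of 1/12. Doubling the codebook to M = 512 would add one bit per block, which illustrates the trade-off between quantization error and bit rate.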

【０００９】[0009] A speech recognition apparatus converts an unknown speech signal into a sequence of acoustic feature vectors, calculates the likelihood of each reference model, stored in advance for each recognition category, with respect to that feature vector sequence, and identifies the input as the category of the reference model with the maximum likelihood. FIG. 2 is a block diagram of a typical speech recognition apparatus using vector quantization. Reference numeral 20 denotes a feature extraction unit that converts the input speech signal into feature vectors. For example, every 10 msec the signal is converted into an n-dimensional feature vector by a filter bank, LPC analysis, cepstrum analysis, or the like. Reference numeral 21 denotes a codebook: a set of feature vectors obtained in the same way from training speech is clustered in advance by a well-known clustering method, each cluster is labeled, and the centroid of each cluster is stored in a form that can be looked up by its label. Reference numeral 22 denotes a vector quantization unit, which comprises the comparison unit 14 and the label selection unit 15 of FIG. 1. A feature vector obtained by feature extraction unit 20 is thus converted, with reference to codebook 21, into the label of the cluster whose centroid is nearest to it. Reference numeral 23 denotes a reference model storage unit that stores a reference model for each recognition unit. Words, syllables, phonemes, and the like are commonly used as recognition units. Reference numeral 24 denotes a matching unit that calculates the likelihood of each reference model stored in reference model storage unit 23 with respect to the label sequence output by vector quantization unit 22. Reference numeral 25 denotes a decision unit that takes the recognition unit corresponding to the reference model with the maximum likelihood as the recognition result.

【００１０】[0010] Proposed reference models include those that hold the speech of each recognition unit as a label sequence, and those that hold it as a so-called HMM (Hidden Markov Model), which defines states, state transitions, and the degree of occurrence of a feature vector in each state.

【００１１】[0011] The former is known as the SPLIT method. It includes a method that matches the label sequence of the unknown input speech against the label sequences serving as reference models, label sequence to label sequence, and a method that, instead of converting the output vectors of feature extraction unit 20 into labels, converts them into distance vectors (vectors whose elements are the distances from each frame to each centroid) or similarity vectors (vectors whose elements are the similarities of each frame to each centroid) and matches the resulting sequence of distance (similarity) vectors against the reference models.

【００１２】[0012] The latter has recently become the mainstream approach, and although various improvements have been proposed, it rests on the following principle. Let Y = y_1, y_2, ..., y_T be the feature vector sequence of the unknown input to be recognized, X = x_1, x_2, ..., x_T an arbitrary state sequence of length T generated by an HMM λ, a_ij the transition probability from state i to state j, π_i the initial probability of state i, that is, the probability of being in state i at t = 1, and ω_i(y_t) the degree of occurrence of vector y_t in state i. The degree to which λ generates the feature vector sequence Y is then given by (Equation 2) to (Equation 4).

【００１３】[0013]

【数２】 (Equation 2)

【００１４】[0014] Or

【００１５】[0015]

【数３】 (Equation 3)

【００１６】[0016] Or, taking the logarithm of both sides of (Equation 3),

【００１７】[0017]

【数４】 (Equation 4)

【００１８】[0018] The state transition diagram of a commonly used model is shown in FIG. 4, where the superscript w indicates correspondence to recognition unit w. Calling this model HMM w, the reference model storage unit 23 of FIG. 2 stores HMM 1, HMM 2, ..., HMM W as shown in FIG. 3. For the values L_1(Y|λ^w), L_2(Y|λ^w), L_3(Y|λ^w) corresponding to recognition unit w, the recognition result is then

【００１９】[0019]

【数５】 (Equation 5)

【００２０】[0020] Here, in (Equation 5), i = 1 when (Equation 2) is used, i = 2 when (Equation 3) is used, and i = 3 when (Equation 4) is used.
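The three likelihood variants and the final decision can be sketched as follows. The equation images are not reproduced in this text, so the forms below are assumptions based on the surrounding definitions: the first sums over all state sequences, the second takes the maximum over state sequences, the third is the logarithm of the second, and recognition picks the model with the largest value. The table `omega[i][t]` stands for ω_i(y_t), and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def forward_likelihood(pi: Sequence[float], a: Matrix, omega: Matrix) -> float:
    """Sum over all state sequences (assumed form of Equation 2)."""
    n, T = len(pi), len(omega[0])
    alpha = [pi[i] * omega[i][0] for i in range(n)]
    for t in range(1, T):
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * omega[j][t]
                 for j in range(n)]
    return sum(alpha)


def viterbi_likelihood(pi: Sequence[float], a: Matrix, omega: Matrix) -> float:
    """Maximum over state sequences (assumed form of Equation 3)."""
    n, T = len(pi), len(omega[0])
    delta = [pi[i] * omega[i][0] for i in range(n)]
    for t in range(1, T):
        delta = [max(delta[i] * a[i][j] for i in range(n)) * omega[j][t]
                 for j in range(n)]
    return max(delta)


def log_viterbi_likelihood(pi: Sequence[float], a: Matrix, omega: Matrix) -> float:
    """Logarithm of the Viterbi value (assumed form of Equation 4)."""
    value = viterbi_likelihood(pi, a, omega)
    return math.log(value) if value > 0.0 else float("-inf")


def recognize(scores: Sequence[float]) -> int:
    """Return the recognition unit w (1..W) with the largest score."""
    return max(range(len(scores)), key=lambda w: scores[w]) + 1


# Two toy left-to-right models sharing pi and a, with made-up omega tables.
pi = [1.0, 0.0]
a = [[0.5, 0.5], [0.0, 1.0]]
omega_1 = [[0.8, 0.2], [0.1, 0.9]]   # omega_1[i][t] for model 1
omega_2 = [[0.3, 0.3], [0.2, 0.2]]   # omega_2[i][t] for model 2
l1 = [forward_likelihood(pi, a, om) for om in (omega_1, omega_2)]
best = recognize(l1)
```

In practice the logarithmic form is often preferred because products of many small probabilities underflow for long sequences, although this sketch computes the logarithm only at the end for brevity.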

【００２１】[0021] Depending on how the degree of occurrence ω_i(y_t) of a feature vector in state i is defined, there are continuous HMMs, discrete HMMs, FVQ HMMs, and so on. The present invention concerns discrete HMMs and FVQ HMMs.

【００２２】[0022] In a discrete HMM, with b_im denoting the probability of occurrence of label m in state i, the degree of occurrence is defined as

【００２３】[0023]

【数６】 (Equation 6)

【００２４】[0024] An improvement on the discrete HMM is the HMM based on fuzzy vector quantization (FVQ HMM). In ordinary vector quantization, y_t is quantized uniquely to the representative vector of its nearest cluster, whereas in fuzzy vector quantization a membership degree u_tm of y_t in cluster m is defined, with 0 ≤ u_tm ≤ 1 and u_t1 + u_t2 + ... + u_tM = 1, and the degree of occurrence is defined as

【００２５】[0025]

【数７】 (Equation 7)

【００２６】[0026] Or

【００２７】[0027]

【数８】 (Equation 8)

【００２８】[0028] or in a similar way.
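The difference between the discrete and the fuzzy definitions of ω_i(y_t) can be illustrated in code. Since the equation images are missing here, the two fuzzy forms below (a membership-weighted sum of b_im and a membership-weighted geometric mean of b_im) are assumptions about what such definitions typically look like, not a reconstruction of Equations 7 and 8. All function names are hypothetical.

```python
import math
from typing import Sequence


def discrete_emission(b_i: Sequence[float], label: int) -> float:
    """Discrete HMM: occurrence probability of the single label (1..M) of y_t."""
    return b_i[label - 1]


def fvq_emission_sum(b_i: Sequence[float], u_t: Sequence[float]) -> float:
    """Assumed fuzzy form: membership-weighted sum of b_im."""
    return sum(u * b for u, b in zip(u_t, b_i))


def fvq_emission_product(b_i: Sequence[float], u_t: Sequence[float]) -> float:
    """Assumed fuzzy form: product of b_im ** u_tm (requires b_im > 0 where u_tm > 0)."""
    return math.exp(sum(u * math.log(b) for u, b in zip(u_t, b_i) if u > 0.0))


b_i = [0.5, 0.25, 0.25]          # made-up label probabilities in state i
hard = [1.0, 0.0, 0.0]           # one-hot membership: ordinary VQ, label 1
soft = [0.5, 0.5, 0.0]           # fuzzy membership spread over two clusters

w_sum = fvq_emission_sum(b_i, soft)
w_prod = fvq_emission_product(b_i, soft)
```

With a one-hot membership vector both fuzzy forms reduce to the discrete value b_im of the selected label, which matches the statement that ordinary vector quantization is the special case of unique assignment.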

【００２９】[0029]

[Problems to Be Solved by the Invention] A codebook is normally obtained as average values over speech uttered by many speakers reading various sentences, words, and so on. For a speaker who deviates from this average the distortion becomes large, which degrades the quality of the decoded signal in communication and the recognition performance in speech recognition. Creating a codebook for each speaker and switching codebooks according to the speaker improves performance, but requires collecting an enormous amount of training data from each individual speaker and is impractical.

【００３０】[0030]

[Means for Solving the Problems]

1. The apparatus comprises a standard codebook that stores several representative vectors in a feature vector space in a form that can be looked up by their corresponding labels, learning vector storage means that stores several learning vectors, objective function calculation means that calculates an objective function defined as a function of the representative vectors and the learning vectors, movement vector calculation means that calculates movement vectors, and adaptation means that adds the movement vectors to the representative vectors to obtain new representative vectors. When an input vector is encoded, it is converted, using the new representative vectors, into a label or into a membership vector whose elements are the membership degrees of the input vector for each label. The movement vector calculation means calculates the movement vectors so that, for the learning vectors, the new representative vectors bring the objective function closer to an extremum.

2. The apparatus comprises a standard codebook that stores several representative vectors in a feature vector space in a form that can be looked up by their corresponding labels, learning vector storage means that stores several learning vectors, objective function calculation means that calculates an objective function defined as a function of the representative vectors and the learning vectors, movement vector calculation means that calculates a movement vector, and normalization means that adds the movement vector to an input vector. When an input vector is encoded, the normalized input vector is obtained by adding the movement vector to the input vector, and is converted, using the representative vectors, into a label or into a membership vector whose elements are the membership degrees of the input vector for each label. The movement vector calculation means calculates the movement vector so that, with respect to the standard codebook, the objective function approaches an extremum when each learning vector is replaced by the sum of that learning vector and the movement vector.

【００３１】[0031]

[Action]

1. A standard codebook stores several representative vectors in a feature vector space in a form that can be looked up by their corresponding labels, and learning vector storage means stores several learning vectors. Objective function calculation means calculates an objective function defined as a function of the representative vectors and the learning vectors, movement vector calculation means calculates movement vectors, and adaptation means adds the movement vectors to the representative vectors to obtain new representative vectors. Using the new representative vectors, vector quantization means converts an input vector to be encoded into a label or into a membership vector whose elements are the membership degrees of the input vector for each label. The movement vectors are calculated so that, for the learning vectors, the new representative vectors bring the objective function closer to an extremum.

2. A standard codebook stores several representative vectors in a feature vector space in a form that can be looked up by their corresponding labels, and learning vector storage means stores several learning vectors. Objective function calculation means calculates an objective function defined as a function of the representative vectors and the learning vectors, movement vector calculation means calculates a movement vector, and normalization means adds the movement vector to the input vector to be encoded to obtain a normalized input vector. Vector quantization means converts the normalized input vector, using the representative vectors, into a label or into a membership vector whose elements are the membership degrees of the input vector for each label. The movement vector is calculated by the movement vector calculation means so that, with respect to the standard codebook, the objective function approaches an extremum when the learning vectors with the movement vector added are taken as the new learning vectors.

【００３２】[0032]

[Embodiments] The present invention relates to speaker normalization or codebook adaptation. That is, to remedy the drawback described above, it concerns a method of correcting the input vectors according to the speaker, or of correcting the representative vectors of the codebook according to the speaker, and it does so without supervision (the system is not told which words, sentences, etc. the speaker uttered) from a very small amount of speech of the speaker to be recognized.

【００３３】[0033] A codebook is created by clustering a set of feature vectors obtained from the utterances of many speakers. Clustering methods include so-called hard clustering, in which each feature vector belongs to exactly one cluster, and fuzzy clustering, in which each feature vector belongs to every cluster according to its membership degree in that cluster. Hard clustering methods include the algorithm known as the LBG method, and well-known methods such as the fuzzy k-means method are used for fuzzy clustering. The present invention applies to both hard and fuzzy clustering, and hard clustering can be regarded as a special case of fuzzy clustering.

【００３４】[0034] Fuzzy clustering is performed as follows. Number the feature vectors obtained from the utterances of many speakers serially as y_1, y_2, ..., y_n, ..., y_N. With u_nm denoting the membership degree of y_n in cluster m (= 1, ..., M) and μ_m the centroid vector of cluster m, the problem is to minimize the objective function

【００３５】[0035]

【数９】 (Equation 9)

【００３６】[0036] under the condition u_n1 + u_n2 + ... + u_nM = 1 by determining the centroid matrix V = [μ_1, μ_2, ..., μ_M] and the membership matrix U = [u_nm]. This is done by fixing one of V and U, minimizing the objective function J with respect to the other, and repeating this alternately for V and U. Specifically, with V fixed, U' is obtained as the solution of ∂J/∂U = 0 for U; with U fixed, V' is obtained as the solution of ∂J/∂V = 0 for V; and U = U', V = V' are taken as the new U and V. These operations are repeated alternately until convergence. F is called the fuzziness; F > 1, and the ambiguity between clusters increases as F grows.

【００３７】[0037] Fuzzy clustering is carried out by the following steps, where d(y_n, μ_m) = (y_n - μ_m)^T (y_n - μ_m).

(Step 1-1) Let M be the number of clusters, set the iteration count s = 0 and the objective function value J^(0) = ∞, and give the membership matrix U = [u_nm] a suitable initial value U^(0).

(Step 1-2) Set s = s + 1.

(Step 1-3) Obtain the mean vector μ_m^(s) (m = 1, ..., M) of cluster m by the following equation.

【００３８】[0038]

【数１０】 (Equation 10)

【００３９】[0039] (Step 1-4) Calculate the matrix of membership degrees of each point in the clusters by the following equation.

【００４０】[0040]

【数１１】 [Equation 11]

【００４１】[0041] (Step 1-5) Calculation of the objective function

【００４２】[0042]

【数１２】 (Equation 12)

【００４３】[0043] (Step 1-6) Termination condition

【００４４】[0044]

【数１３】 (Equation 13)

【００４５】[0045] If the condition is not satisfied, return to (Step 1-2); if it is satisfied, terminate. Here ε is a suitably small positive number set in advance. The smaller it is, the more accurate the centroid estimates become, but the longer convergence takes.
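Steps 1-1 to 1-6 can be sketched as the following loop. The equation images are not available in this text, so the update formulas are assumptions taken from the standard fuzzy clustering solution for the squared Euclidean distance: centroids are means weighted by u_nm^F, and memberships use the ratio {d(y_n, μ_m)/d(y_n, μ_h)}^(1/(F-1)). The termination test (relative decrease of J at most ε) and the `max_iter` guard are also assumptions, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """d(u, v) = (u - v)^T (u - v)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def memberships(y: Sequence[float], centroids: List[List[float]], F: float) -> List[float]:
    """Membership degrees of y in each cluster; they sum to 1."""
    d = [sq_dist(y, mu) for mu in centroids]
    zeros = [m for m, dm in enumerate(d) if dm == 0.0]
    if zeros:  # y coincides with one or more centroids
        return [1.0 / len(zeros) if m in zeros else 0.0 for m in range(len(d))]
    p = 1.0 / (F - 1.0)
    return [1.0 / sum((d[m] / d[h]) ** p for h in range(len(d)))
            for m in range(len(d))]


def fuzzy_cmeans(data: List[List[float]], U: List[List[float]], F: float = 2.0,
                 eps: float = 1e-6, max_iter: int = 100
                 ) -> Tuple[List[List[float]], List[List[float]], float]:
    """Alternate centroid and membership updates until J stops decreasing."""
    N, M, dim = len(data), len(U[0]), len(data[0])
    centroids: List[List[float]] = []
    J_prev, J = float("inf"), float("inf")           # Step 1-1
    for _ in range(max_iter):                        # Step 1-2
        centroids = []                               # Step 1-3
        for m in range(M):
            w = [U[n][m] ** F for n in range(N)]
            total = sum(w)  # assumed > 0: every cluster has some membership
            centroids.append([sum(w[n] * data[n][k] for n in range(N)) / total
                              for k in range(dim)])
        U = [memberships(y, centroids, F) for y in data]          # Step 1-4
        J = sum(U[n][m] ** F * sq_dist(data[n], centroids[m])     # Step 1-5
                for n in range(N) for m in range(M))
        if J_prev - J <= eps * J:                    # Step 1-6 (assumed test)
            break
        J_prev = J
    return centroids, U, J


data = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
U0 = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]   # arbitrary initial memberships
centroids, U, J = fuzzy_cmeans(data, U0)
```

Values of F very close to 1 make the exponent 1/(F - 1) large, so a practical implementation would need to guard against numerical overflow, which this sketch does not do.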

【００４６】[0046] In the steps above, (Equation 10) is obtained by solving ∂J^(s-1)/∂μ_m^(s-1) = 0 for μ_m^(s-1), and (Equation 11) is obtained, with θ as a Lagrange undetermined multiplier, by solving

【００４７】[0047]

【数１４】 [Equation 14]

【００４８】[0048] for u_nm^(s-1). Furthermore, if the fuzziness F → 1 + 0, then 1/(F - 1) → ∞, and when μ_m^(s-1) is the nearest neighbor of y_n,

d(y_n, μ_m^(s-1)) < d(y_n, μ_h^(s-1)) for h ≠ m
d(y_n, μ_m^(s-1)) = d(y_n, μ_h^(s-1)) for h = m

so that

{d(y_n, μ_m^(s-1)) / d(y_n, μ_h^(s-1))}^(1/(F-1)) → 0 for h ≠ m
{d(y_n, μ_m^(s-1)) / d(y_n, μ_h^(s-1))}^(1/(F-1)) = 1 for h = m

and therefore

【００４９】[0049]

【数１５】 (Equation 15)

【００５０】[0050] that is, hard clustering results. Hard clustering amounts to defining, in fuzzy clustering, u_nm^(s) = δ_L(n),m^(s), where L(n) is the label of the cluster nearest to y_n. Here δ_ij is the Kronecker delta: δ_ij = 1 when i = j and δ_ij = 0 when i ≠ j. For hard clustering, the above procedure therefore becomes the following.

【００５１】[0051] First, the objective function is

【００５２】[0052]

【数１６】 (Equation 16)

【００５３】[0053] In this case the clustering procedure is as follows.

(Step 2-1) Set s = 0 and J^(0) = ∞.

(Step 2-2) Set s = s + 1.

(Step 2-3) Obtain the mean vector μ_m^(s) (m = 1, ..., M) of each cluster by the following equation.

【００５４】[0054]

【数１７】 [Equation 17]

【００５５】[0055] (Step 2-4) Find the nearest centroid of each point and assign each point to a cluster.

【００５６】[0056]

【数１８】 (Equation 18)

【００５７】[0057] (Step 2-5) Calculation of the objective function

【００５８】[0058]

【数１９】 [Equation 19]

【００５９】[0059] (Step 2-6) Termination condition

【００６０】[0060]

【数２０】 (Equation 20)

【００６１】[0061] If the condition is not satisfied, return to (Step 2-2); if it is satisfied, terminate. A codebook is created as described above, and it is adapted to the speech of a speaker A as follows.
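The hard clustering procedure of Steps 2-1 to 2-6 can be sketched as below. The equations themselves are not reproduced in this text, so the sketch assumes the usual forms: each centroid is the plain mean of the points currently assigned to it, each point is reassigned to its nearest centroid, and J is the total squared distance. The initial assignment `labels`, the termination test, and the `max_iter` guard are assumptions, and cluster indices are 0-based here rather than 1..M.

```python
from typing import List, Optional, Sequence, Tuple


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hard_cluster(data: List[List[float]], labels: List[int], M: int,
                 eps: float = 1e-6, max_iter: int = 100
                 ) -> Tuple[List[List[float]], List[int], float]:
    """Alternate mean updates and nearest-centroid assignment.

    The initial labels must place at least one point in every cluster.
    """
    centroids: List[Optional[List[float]]] = [None] * M
    J_prev, J = float("inf"), float("inf")           # Step 2-1
    for _ in range(max_iter):                        # Step 2-2
        for m in range(M):                           # Step 2-3
            members = [y for y, l in zip(data, labels) if l == m]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[m] = [sum(col) / len(members) for col in zip(*members)]
        labels = [min(range(M), key=lambda m: sq_dist(y, centroids[m]))
                  for y in data]                     # Step 2-4
        J = sum(sq_dist(y, centroids[l]) for y, l in zip(data, labels))  # Step 2-5
        if J_prev - J <= eps * J:                    # Step 2-6 (assumed test)
            break
        J_prev = J
    return [list(c) for c in centroids], labels, J


data = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
centroids, labels, J = hard_cluster(data, labels=[0, 1, 0, 1], M=2)
```

Starting from a deliberately poor assignment, the toy data settles after a few iterations into the two obvious groups with centroids (0, 0.5) and (5, 5.5).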

【００６２】[0062] The problem is to transform the centroids μ_m (m = 1, ..., M) into μ_m' that are best suited to the speech of speaker A. The first embodiment of the present invention gives this transformation as μ_m' = μ_m + h_m and carries it out by finding the optimal h_m from speech uttered by speaker A. Specifically, with the feature vectors obtained from the speech uttered by speaker A for codebook adaptation numbered serially as y^A_1, y^A_2, ..., y^A_I, this can be done by finding the h_m that make the objective function

【００６３】[0063]

【数２１】 (Equation 21)

【００６４】[0064] suitably small. With d(y, μ) = (y - μ)^T (y - μ) as in the example above, h_m is obtained by the following steps. S is a value set in advance as the upper limit on the number of iterations.

(Step 3-1) Let M be the number of clusters, set the iteration count s = 0, the objective function value J^(0) = ∞, and h_m^(0) = 0 (m = 1, ..., M), and give the initial value U^(0) of the membership matrix U = [u_nm] by the following equation.

【００６５】[0065]

【数２２】 (Equation 22)

【００６６】[0066] (Step 3-2) Set s = s + 1.

(Step 3-3) Obtain the movement vectors h_m^(s) (m = 1, ..., M) by the following equation.

【００６７】[0067]

【数２３】 (Equation 23)

【００６８】[0068] (Step 3-4) Calculate the matrix of membership degrees of each point (learning vector) in the clusters by the following equation.

【００６９】[0069]

【数２４】 (Equation 24)

【００７０】[0070] (Step 3-5) Calculation of the objective function

【００７１】[0071]

【数２５】 (Equation 25)

【００７２】[0072] (Step 3-6) Termination condition

【００７３】[0073]

【数２６】 (Equation 26)

【００７４】[0074] If the condition is not satisfied, return to (Step 3-2); if it is satisfied, terminate. δ in (Step 3-6) is a suitably small number, chosen according to how closely the centroids of the standard codebook are to be brought toward the speech input used for learning. When δ is small and S is large, the result approaches the codebook that would be obtained by clustering the learning speech alone. When there is little learning speech, it is considered undesirable for the distribution of the centroids to be biased excessively toward it, so δ and S should be chosen appropriately for the amount of learning speech.
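Steps 3-1 to 3-6 can be sketched as follows. Because the equations are not reproduced in this text, the updates are assumptions derived from minimizing the sum over n and m of u_nm^F d(y^A_n, μ_m + h_m) with the squared Euclidean distance: setting the derivative with respect to h_m to zero gives h_m as the u_nm^F-weighted mean of (y^A_n - μ_m). The initial memberships are computed against the unshifted codebook, and the stopping rule (relative decrease of J at most δ, or S iterations) is likewise an assumption. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def memberships(y: Sequence[float], centroids: List[List[float]], F: float) -> List[float]:
    """Fuzzy membership degrees of y in each cluster; they sum to 1."""
    d = [sq_dist(y, mu) for mu in centroids]
    zeros = [m for m, dm in enumerate(d) if dm == 0.0]
    if zeros:
        return [1.0 / len(zeros) if m in zeros else 0.0 for m in range(len(d))]
    p = 1.0 / (F - 1.0)
    return [1.0 / sum((d[m] / d[h]) ** p for h in range(len(d)))
            for m in range(len(d))]


def adapt_codebook(learn: List[List[float]], codebook: List[List[float]],
                   F: float = 2.0, delta: float = 1e-3, S: int = 10
                   ) -> Tuple[List[List[float]], List[List[float]]]:
    """Return per-cluster movement vectors h_m and the adapted centroids."""
    N, M, dim = len(learn), len(codebook), len(codebook[0])
    h = [[0.0] * dim for _ in range(M)]                      # Step 3-1
    adapted = [list(mu) for mu in codebook]
    U = [memberships(y, codebook, F) for y in learn]         # assumed U^(0)
    J_prev = float("inf")
    for _ in range(S):                                       # Step 3-2
        for m in range(M):                                   # Step 3-3
            w = [U[n][m] ** F for n in range(N)]
            total = sum(w)
            if total > 0.0:
                h[m] = [sum(w[n] * (learn[n][k] - codebook[m][k])
                            for n in range(N)) / total for k in range(dim)]
        adapted = [[mu_k + h_k for mu_k, h_k in zip(mu, hm)]
                   for mu, hm in zip(codebook, h)]
        U = [memberships(y, adapted, F) for y in learn]      # Step 3-4
        J = sum(U[n][m] ** F * sq_dist(learn[n], adapted[m]) # Step 3-5
                for n in range(N) for m in range(M))
        if J_prev - J <= delta * J:                          # Step 3-6 (assumed)
            break
        J_prev = J
    return h, adapted


codebook = [[0.0, 0.0], [5.0, 5.0]]                          # standard codebook (toy)
learn = [[1.0, 1.0], [1.0, 2.0], [6.0, 6.0], [6.0, 7.0]]     # speaker A (toy)
h, adapted = adapt_codebook(learn, codebook)
```

Raising δ or lowering S stops the loop earlier, which keeps the adapted centroids closer to the standard codebook, in line with the remark that these two values control how far the codebook is pulled toward the learning speech.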

【００７５】[0075] When there is little learning speech, it is in fact better to make h_m in the objective function (Equation 21) common to m = 1, ..., M. The second embodiment of the present invention is this case: setting h = h_1 = h_2 = ... = h_M, the objective function is taken to be

【００７６】[0076]

【数２７】 [Equation 27]

【００７７】h is obtained by the following steps. (Step 4-1) Let the number of clusters be M, the iteration count s = 0, the value of the objective function J⁽⁰⁾ = ∞, and h⁽⁰⁾ = 0, and give the initial value U⁽⁰⁾ of the membership matrix U = [u_nm] by the following equation.

【００７８】[0078]

【数２８】 [Equation 28]

【００７９】(Step 4-2) Set s = s + 1. (Step 4-3) Obtain the movement vector h^(s) from the following equation.

【００８０】[0080]

【数２９】 (Equation 29)

【００８１】(Step 4-4) Calculate the matrix of the degrees of membership of each point (learning vector) in the clusters from the following equation.

【００８２】[0082]

【数３０】 [Equation 30]

【００８３】(Step 4-5) Calculate the objective function.

【００８４】[0084]

【数３１】 (Equation 31)

【００８５】(Step 4-6) Termination condition.

【００８６】[0086]

【数３２】 (Equation 32)

【００８７】If the above condition is not satisfied, return to (Step 4-2); if it is satisfied, terminate. In this case too, the choice of δ and S adjusts how strongly the speech uttered for learning influences the amount by which the centroids are corrected.
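A minimal sketch of the second embodiment, in which a single movement vector h is shared by all code vectors, is given below. As Equations 27 to 32 are not reproduced here, it assumes a fuzzy-c-means-style objective with squared Euclidean distance, fuzziness F = 2, and a termination test |J^(s−1) − J^(s)| < δ with at most S iterations. All names are hypothetical.

```python
import math

def sq_dist(u, v):
    # Squared Euclidean distance; an assumed choice for d(u, v).
    return sum((a - b) ** 2 for a, b in zip(u, v))

def memberships(ys, centers, F=2.0, eps=1e-12):
    # Fuzzy-c-means-style membership matrix U = [u_nm] (assumed form).
    U = []
    for y in ys:
        d = [sq_dist(y, c) for c in centers]
        if min(d) <= eps:
            w = [1.0 if di <= eps else 0.0 for di in d]
        else:
            w = [di ** (-1.0 / (F - 1.0)) for di in d]
        total = sum(w)
        U.append([x / total for x in w])
    return U

def adapt_shared(ys, mu, delta=1e-6, S=100, F=2.0):
    """Learn one movement vector h common to every code vector."""
    dim = len(mu[0])
    h = [0.0] * dim                            # Step 4-1: h^(0) = 0
    U = memberships(ys, mu, F)                 # Step 4-1: U^(0)
    J_prev = math.inf                          # Step 4-1: J^(0) = infinity
    for _ in range(S):                         # Step 4-2
        num, den = [0.0] * dim, 0.0
        for n, y in enumerate(ys):             # Step 4-3: pool residuals over all (n, m)
            for m, c in enumerate(mu):
                w = U[n][m] ** F
                den += w
                for k in range(dim):
                    num[k] += w * (y[k] - c[k])
        h = [x / den for x in num]
        moved = [[c[k] + h[k] for k in range(dim)] for c in mu]
        U = memberships(ys, moved, F)          # Step 4-4
        J = sum(U[n][m] ** F * sq_dist(y, moved[m])
                for n, y in enumerate(ys) for m in range(len(mu)))  # Step 4-5
        if abs(J_prev - J) < delta:            # Step 4-6: assumed termination test
            break
        J_prev = J
    return h
```

Because every learning vector contributes to the same h, the estimate stays usable even when some clusters receive no learning vectors at all, which is the situation the paragraph on scarce learning speech describes.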

【００８８】FIG. 5 is a block diagram showing the configuration of the first and second embodiments. The first embodiment executes (Step 3-1) to (Step 3-6), and the second embodiment executes (Step 4-1) to (Step 4-6). Reference numeral 50 denotes an input terminal for the learning vectors y^A₁, …, y^A_N used to create the codebook, and 51 denotes a buffer memory that stores y^A₁, …, y^A_N. Reference numeral 54 denotes a standard codebook, in which code vectors created from many speakers are stored in a form that can be retrieved by label. Reference numeral 53 denotes a movement vector storage unit and 55 an adder; the adder 55 adds the contents of the standard codebook 54 and the contents of the movement vector storage unit 53. Reference numeral 52 denotes a movement vector calculation unit which, from the contents of the buffer memory 51 and the output of the adder 55, calculates h_m (m = 1, …, M) according to (Step 3-1) to (Step 3-6) in the first embodiment, and calculates h according to (Step 4-1) to (Step 4-6) in the second embodiment. The calculated movement vectors are stored in the movement vector storage unit 53. When the iterative calculation starts, the contents of the movement vector storage unit 53 are initialized to 0. With this configuration, the contents of the movement vector storage unit 53 are overwritten each time by the updated movement vectors obtained during the calculation. Once the convergence condition of (Step 3-6) or (Step 4-6) is satisfied, the movement vectors adapted to speaker A are finally held in the movement vector storage unit 53. Adding the movement vectors obtained in this way to the output of the standard codebook yields representative vectors suited to speaker A.
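The data path of FIG. 5 (standard codebook 54, movement vector storage unit 53, adder 55) can be illustrated with a small sketch. The class and method names are hypothetical, labels are 0-based here rather than 1, …, M, and squared Euclidean distance with nearest-neighbor labeling is an assumption.

```python
class AdaptiveQuantizer:
    """Standard codebook plus stored movement vectors, combined by an adder."""

    def __init__(self, standard_codebook):
        # Role of standard codebook 54.
        self.standard = [list(c) for c in standard_codebook]
        # Role of movement vector storage unit 53, initialized to 0.
        self.movement = [[0.0] * len(c) for c in standard_codebook]

    def store_movement(self, vectors):
        # Overwrite the stored movement vectors with updated ones.
        self.movement = [list(v) for v in vectors]

    def adder_output(self, label):
        # Role of adder 55: standard code vector + movement vector.
        return [a + b for a, b in zip(self.standard[label], self.movement[label])]

    def quantize(self, y):
        # Label of the adapted representative vector nearest to y.
        def sq_dist(u, v):
            return sum((a - b) ** 2 for a, b in zip(u, v))
        return min(range(len(self.standard)),
                   key=lambda m: sq_dist(y, self.adder_output(m)))
```

Keeping the standard codebook untouched and storing only the movement vectors means a change of speaker requires resetting one small table rather than rebuilding the codebook.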

【００８９】FIG. 6 shows the case in which an adapted codebook 56 is inserted between the adder 55 and the movement vector calculation unit 52. With this configuration, the adapted codebook 56 finally holds a codebook suited to speaker A.

【００９０】FIGS. 7 and 8 show an embodiment of the transmitting side of a communication device that uses the above principle.

【００９１】FIG. 7 shows the case in which the configuration of FIG. 5 is used as the speaker adaptation method. Blocks 1, 2, 3, 4, 6, and 7 operate in the same way as the blocks with the same numbers in FIG. 1. Blocks 51 to 54 in FIG. 7 operate in the same way as the blocks with the same numbers in FIG. 6, and most of them are used only during speaker adaptation. Each time the speaker changes, movement vectors representing that speaker's deviation from the standard codebook are learned as described above and stored in the movement vector storage unit 53. In the system of FIG. 1, the comparison unit 6 compares the output of the switch 4 with the contents of the codebook 5, whereas in FIG. 7 it compares the output of the switch 4 with the output of the adder 55. The output of the adder 55 can be regarded as the standard codebook corrected for the speaker's deviation.

【００９２】FIG. 8 shows the case in which the configuration of FIG. 6 is used as the speaker adaptation method, that is, the case in which the adapted codebook is inserted as described above. Here the comparison unit 6 compares the output of the switch 4 with the output of the adapted codebook, because the adapted codebook stores the representative vectors that result from the correction for the speaker.

【００９３】FIGS. 9 to 12 show embodiments of a receiver that reproduces the original sample sequence from the label sequence sent as described above.

【００９４】In FIG. 9, the movement vectors for the speaker are sent first and stored in advance in the movement vector storage unit. Thereafter, the vector corresponding to each received label is read from the standard codebook, the code vector read out is corrected by the adder 93 using the contents of the movement vector storage unit, and blocks 10 to 13 perform the same processing as described above to obtain the decoded signal.
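The following sketch illustrates the transmit and receive path of FIGS. 7 and 9 under simplifying assumptions: labels are 0-based, the distance is squared Euclidean, labeling is nearest-neighbor, and the movement vectors are assumed to have reached the receiver already. The function names are hypothetical.

```python
def sq_dist(u, v):
    # Squared Euclidean distance; an assumed choice for d(u, v).
    return sum((a - b) ** 2 for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def encode(frames, standard, movement):
    # Transmitter: compare each input vector with the adder output
    # (standard code vector + movement vector) and emit the nearest label.
    adapted = [add(c, h) for c, h in zip(standard, movement)]
    return [min(range(len(adapted)), key=lambda m: sq_dist(y, adapted[m]))
            for y in frames]

def decode(labels, standard, movement):
    # Receiver: read the standard code vector for each label and
    # correct it with the stored movement vector.
    return [add(standard[m], movement[m]) for m in labels]
```

Only the labels travel per frame; the movement vectors are sent once per speaker, so both ends reconstruct the same adapted representative vectors from an identical standard codebook.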

【００９５】FIG. 10 shows the case in which an adapted codebook 101 is provided. The sum, formed by the adder 93, of the contents of the movement vector storage unit 92 and the contents of the standard codebook is calculated in advance for all code vectors and stored in the adapted codebook, and this adapted codebook is used in place of the codebook 9 in FIG. 1.

【００９６】In FIG. 11, the codebook itself, rather than the movement vectors, is transmitted in advance from the transmitting side. The contents of the adapted codebook created by a transmitter such as that of FIG. 8 are transmitted to and stored in the codebook 111. Needless to say, this codebook 81 corresponds to the codebook 9 in FIG. 1.

【００９７】FIGS. 12 and 13 show an embodiment in which the above speaker adaptation method is applied to speech recognition.

【００９８】FIG. 12 shows the case in which the method of FIG. 5 is used; blocks 51 to 55 function as in FIG. 5. After speaker adaptation, therefore, the output of the adder 55 is used in place of the codebook 21 in FIG. 2.

【００９９】FIG. 13 shows the case in which the method of FIG. 6 is used; blocks 51 to 56 function as in FIG. 6. After speaker adaptation, therefore, the adapted codebook 56 is used in place of the codebook 21 in FIG. 2.

【０１００】The above is based on the viewpoint of adapting the codebook to the speaker. Conversely, it is also possible to adapt the speaker to the standard codebook, that is, to perform speaker normalization. Namely, (Equation 21) can be written as

【０１０１】[0101]

【数３３】 [Equation 33]

【０１０２】Hence, subtracting h_m from y^A_i can be regarded as normalizing the speaker to the codebook. (Equation 33) corresponds to FIG. 5 or FIG. 6; correspondingly, if the configurations of FIG. 17(a) and (b) are used, (Equation 34) is obtained as the counterpart of (Equation 33).

【０１０３】[0103]

【数３４】 (Equation 34)

【０１０４】FIG. 14 shows an embodiment of the transmitting side of a communication system based on vector quantization with speaker normalization, which constitutes the third embodiment, for the case in which FIG. 5 or FIG. 6 is used. Blocks 51 to 55 operate as described above. In this case, the movement vector learned as described above is subtracted from the input vector, and the result is vector-quantized using the standard codebook 54. Reference numeral 131 denotes a subtractor that subtracts the movement vector from the input vector.

【０１０５】FIG. 15 shows a receiver for the transmitter described with FIG. 14. It converts the received label sequence into a sequence of code vectors using the standard codebook 91 and adds the movement vector, sent separately from the transmitting side, to each code vector to obtain the decoded vector. Reference numeral 141 denotes the adder that performs this addition. Reference numeral 92 denotes a movement vector storage unit that stores the movement vector to be added by the adder 141; whenever the speaker changes, this movement vector is transmitted from the transmitting side in advance.
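The normalization path of FIGS. 14 and 15 can be sketched as below. For simplicity the sketch assumes a single movement vector h shared by all code vectors (as in the second embodiment), squared Euclidean distance, nearest-neighbor labeling, and 0-based labels; the function names are hypothetical.

```python
def sq_dist(u, v):
    # Squared Euclidean distance; an assumed choice for d(u, v).
    return sum((a - b) ** 2 for a, b in zip(u, v))

def normalize_and_encode(frames, standard, h):
    # Transmitter: subtract the movement vector from each input vector
    # (role of subtractor 131), then quantize with the standard codebook.
    labels = []
    for y in frames:
        z = [a - b for a, b in zip(y, h)]
        labels.append(min(range(len(standard)),
                          key=lambda m: sq_dist(z, standard[m])))
    return labels

def decode(labels, standard, h):
    # Receiver: read the standard code vector and add the movement
    # vector back (role of adder 141).
    return [[a + b for a, b in zip(standard[m], h)] for m in labels]
```

With a distance that depends only on the difference of its arguments, shifting the input by −h and shifting every code vector by +h select the same label, which is why normalization and codebook adaptation are two views of the same correction.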

【０１０６】FIG. 16 shows an embodiment of a speech recognition device based on vector quantization with speaker normalization, which constitutes the third embodiment. Blocks 51 to 55 operate as described above. Here too, the movement vector learned as described above is subtracted from the input vector by the subtractor 131, and the result is vector-quantized using the standard codebook 54.

【０１０７】It is clear that transmitting and receiving devices and a speech recognition device can be realized with almost the same configuration when the vector quantization of FIG. 17(a) and (b) is used; in that case, some of the additions and subtractions are interchanged (not shown).

【０１０８】The above describes the case in which the system is divided into a learning phase and a recognition phase. It is also possible, however, to carry out communication or recognition while repeating the learning successively, using speech uttered in the (immediate) past while the speaker is talking or while recognition is in progress. That is, the buffer memory 51 in FIGS. 5 to 8, FIGS. 12 to 14, FIG. 16, and so on is kept in a state in which it always captures the input signal, and at suitable intervals the movement vectors are recalculated by the method described above from the speech data captured there, so that the codebook can be rewritten or the normalization vector for speaker normalization can be updated. As a result, speaker adaptation proceeds successively without the speaker being particularly aware of a learning phase, and the adaptation or normalization can follow changes in the speaker's characteristics over time.
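The successive re-learning described above can be illustrated with a sliding buffer that triggers a re-estimate at fixed intervals. This is a simplified sketch, not the patented procedure: it uses a shared movement vector, hard nearest-neighbor assignment instead of a membership matrix, and a fixed number of passes instead of the δ test. The class name and the parameters `window`, `period`, and `passes` are hypothetical.

```python
from collections import deque

def nearest(codebook, y):
    # Index of the code vector closest to y (squared Euclidean distance).
    return min(range(len(codebook)),
               key=lambda m: sum((a - b) ** 2 for a, b in zip(y, codebook[m])))

class OnlineShiftEstimator:
    def __init__(self, standard, window=200, period=50, passes=3):
        self.standard = [list(c) for c in standard]
        self.buffer = deque(maxlen=window)   # role of buffer memory 51
        self.period = period                 # re-estimate every `period` frames
        self.passes = passes
        self.count = 0
        self.h = [0.0] * len(standard[0])    # current shared movement vector

    def push(self, y):
        # Capture the frame; re-estimate h when the interval elapses.
        self.buffer.append(list(y))
        self.count += 1
        if self.count % self.period == 0:
            self._reestimate()
        return list(self.h)

    def _reestimate(self):
        dim = len(self.h)
        for _ in range(self.passes):
            shifted = [[c[k] + self.h[k] for k in range(dim)]
                       for c in self.standard]
            total = [0.0] * dim
            for y in self.buffer:
                m = nearest(shifted, y)      # assign to nearest shifted code vector
                for k in range(dim):
                    total[k] += y[k] - self.standard[m][k]
            self.h = [t / len(self.buffer) for t in total]
```

The bounded buffer drops the oldest frames automatically, so the estimate follows gradual changes in the speaker's characteristics rather than averaging over the whole session.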

【０１０９】In this embodiment, the movement vectors are calculated as the h₁, h₂, …, h_M that give an extremum of the objective function, but they can also be obtained by the steepest descent method or other similar methods. This embodiment also gives an example in which the h_i that decrease the objective function are obtained; depending on how the objective function is defined, however, the h_i that increase it may be sought instead. This is naturally the case if, for example, −J is used in place of J in this example. Furthermore, although the terms addition and subtraction are used in this embodiment, attaching a negative sign turns addition into subtraction and subtraction into addition, so either expression is valid as far as wording is concerned.
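As an illustration of the steepest descent alternative, the sketch below minimizes a simplified objective J(h) = Σ_n ‖r_n − h‖², where r_n = y_n − μ_a(n) is the residual of each learning vector from its assigned code vector. The fixed hard assignment, the step size `rate`, and the iteration count `steps` are assumptions made for the example and do not come from the patent.

```python
def steepest_descent_shift(residuals, rate=0.1, steps=200):
    """Minimize J(h) = sum_n ||r_n - h||^2 by gradient steps."""
    N, dim = len(residuals), len(residuals[0])
    h = [0.0] * dim
    for _ in range(steps):
        # Gradient of J with respect to h: -2 * sum_n (r_n - h).
        grad = [-2.0 * sum(r[k] - h[k] for r in residuals) for k in range(dim)]
        # Step against the gradient, scaled by 1/N for a stable step size.
        h = [h[k] - rate * grad[k] / N for k in range(dim)]
    return h
```

For this quadratic objective the iteration converges to the mean residual, which is the same point obtained by setting the derivative of J to zero, so the closed-form extremum and the descent method agree.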

【０１１０】[0110]

[Effects of the Invention] As described above, according to the present invention, a codebook can be adapted to the speech of a specific speaker from a small number of samples, or the speaker's speech can be normalized to fit the standard codebook, so that with little learning effort the speech quality is improved in communication and the recognition accuracy is improved in recognition.

[Brief description of the drawings]

FIG. 1 is a diagram showing the principle of a transmission system based on vector quantization.

FIG. 2 is a diagram showing the general principle of a speech recognition device based on vector quantization.

FIG. 3 is a detailed diagram of the reference model storage unit in FIG. 2.

FIG. 4 is a diagram showing the principle of an HMM (Hidden Markov Model).

FIG. 5 is a diagram showing the principle of one embodiment of the adaptation method according to the present invention.

FIG. 6 is a diagram showing the principle of another embodiment according to the present invention.

FIG. 7 is a block diagram of a signal transmitting device based on vector quantization according to the principle of FIG. 5.

FIG. 8 is a block diagram of a signal transmitting device based on vector quantization according to the principle of FIG. 6.

FIG. 9 shows an embodiment of a receiving device for the transmitting devices of FIGS. 7 and 8.

FIG. 10 shows an embodiment of a receiving device for the transmitting devices of FIGS. 7 and 8.

FIG. 11 shows another embodiment of a receiving device for the transmitting device of FIG. 8.

FIG. 12 is a block diagram of a pattern recognition device based on vector quantization according to the principle of FIG. 5.

FIG. 13 is a block diagram of a pattern recognition device based on vector quantization according to the principle of FIG. 6.

FIG. 14 is a diagram of an embodiment of a transmitting device based on speaker normalization.

FIG. 15 is a diagram of an embodiment of a receiving device based on speaker normalization.

FIG. 16 is a diagram of an embodiment of a recognition device based on speaker normalization.

FIG. 17 is a diagram of another embodiment of the speaker normalization method according to the present invention.

[Explanation of symbols]

1, 4: switch
2, 3: buffer memory
5: codebook
6: comparison unit
7: label selection unit
8: code vector reading unit
9: codebook
52: movement vector calculation unit
53: movement vector storage unit

Continuation of the front page
(56) References: JP-A-3-75700 (JP, A); JP-A-4-122997 (JP, A); JP-A-4-127200 (JP, A); JP-A-5-173588 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/00 - 19/14

Claims

(57) [Claims]

1. A vector quantization device comprising: a standard codebook that stores the representative vector of each cluster, obtained by clustering a feature vector space, in a form retrievable by the label corresponding to each representative vector; learning vector storage means for storing a plurality of learning vectors used to modify the standard codebook so as to adapt it to input vectors to be vector-quantized; movement vector calculation means for calculating the movement vectors, which modify the representative vectors of the standard codebook by being added to them, so as to give an extremum of an objective function defined as a function of the representative vectors of the standard codebook and the learning vectors; and codebook adaptation means for obtaining new representative vectors by adding the movement vectors to the representative vectors, wherein, in encoding an input vector, the input vector is converted by means of the new representative vectors into a label, or into a membership vector whose elements are the degrees of membership of the input vector in the respective labels (clusters).

2. A vector quantization device comprising: a standard codebook that stores the representative vector of each cluster, obtained by clustering a feature vector space, in a form retrievable by the label corresponding to each representative vector; learning vector storage means for storing a plurality of learning vectors used to modify a first input vector to be vector-quantized so as to normalize it to the standard codebook and obtain a second input vector; movement vector calculation means for calculating the movement vector, which modifies the first input vector by being added to it, so as to give an extremum of an objective function defined as a function of the representative vectors of the standard codebook and the learning vectors; and normalizing means for obtaining the second input vector by adding the movement vector to the first input vector, wherein, in encoding the first input vector, the first input vector is converted by means of the second input vector into a label, or into a membership vector whose elements are the degrees of membership of the input vector in the respective labels.

3. The vector quantization device according to claim 1, wherein the movement vector calculation means obtains a movement vector for each of the representative vectors of the standard codebook.

4. The vector quantization device according to claim 1 or claim 2, wherein the movement vector calculation means obtains the movement vector as a vector common to all the representative vectors of the standard codebook.

5. The vector quantization device according to claim 1, further comprising an adder for adding the movement vector to each representative vector of the standard codebook, wherein vector quantization is performed using the output of the adder.

6. The vector quantization device according to claim 1, further comprising an adder for adding the movement vector to each representative vector of the standard codebook and an adapted codebook for storing the output of the adder, wherein vector quantization is performed using the output of the adapted codebook.

7. A signal transmitting device comprising: label transmitting means for transmitting a label encoded by the vector quantization device according to claim 5; and movement vector transmitting means for transmitting the movement vectors.

8. A signal transmitting device comprising: label transmitting means for transmitting a label encoded by the vector quantization device according to claim 6; and adapted codebook transmitting means for transmitting the adapted codebook.

9. A signal receiving device comprising: a movement vector storage unit for storing the movement vectors transmitted from the signal transmitting device according to claim 7; a standard codebook; an adder for adding the representative vector read from the standard codebook for a received label and the movement vector read from the movement vector storage unit for that label; and a decoder that takes the output of the adder as the decoded vector for the label.

10. A signal receiving device comprising: a movement vector storage unit for storing the movement vectors transmitted from the signal transmitting device according to claim 7; a standard codebook; an adapted codebook that stores, in association with each label, the sum of the representative vector of the standard codebook for that label and the corresponding movement vector read from the movement vector storage unit; and a decoder that takes the code vector of the adapted codebook corresponding to a received label as the decoded vector for the label.

11. A signal receiving device comprising: an adapted codebook storage unit for storing the adapted codebook sent from the signal transmitting device according to claim 8; and a decoder that takes the code vector of the adapted codebook corresponding to a received label as the decoded vector for the label.

12. A recognition device comprising: the vector quantization device according to claim 1, which vector-quantizes each vector of an input feature vector sequence into a label and thereby converts the feature vector sequence into a label sequence; HMM storage means for storing, for each recognition unit, a hidden Markov model (HMM) in which the occurrence probability of each label is defined for each state; and likelihood calculation means for calculating the likelihood of each HMM for the label sequence, wherein the recognition unit corresponding to the HMM that gives the maximum likelihood is taken as the recognition result.

13. A recognition device comprising: the vector quantization device according to claim 1, which vector-quantizes each vector of an input feature vector sequence into a membership vector whose elements are the degrees of membership in the respective labels and thereby converts the feature vector sequence into a membership vector sequence; HMM storage means for storing, for each recognition unit, a hidden Markov model (HMM) in which the occurrence probability of each label is defined for each state; and likelihood calculation means for calculating the likelihood of each HMM for the membership vector sequence, wherein the recognition unit corresponding to the HMM that gives the maximum likelihood is taken as the recognition result.

14. A recognition device comprising: the vector quantization device according to claim 1, which vector-quantizes each vector of an input feature vector sequence into a label and thereby converts the feature vector sequence into a label sequence; recognition model storage means for storing, for each recognition unit, a recognition model expressed as a label sequence; and distance calculation means for calculating the distance or similarity between the input label sequence and each recognition model, wherein the recognition unit corresponding to the recognition model that gives the minimum distance or the maximum similarity is taken as the recognition result.

15. A recognition device comprising: the vector quantization device according to claim 1, which vector-quantizes each vector of an input feature vector sequence into a membership vector whose elements are the degrees of membership in the respective labels and thereby converts the feature vector sequence into a membership vector sequence; recognition model storage means for storing, for each recognition unit, a recognition model expressed as a label sequence; and distance calculation means for calculating the distance or similarity of each recognition model with respect to the input membership vector sequence, wherein the recognition unit corresponding to the recognition model that gives the minimum distance or the maximum similarity is taken as the recognition result.

16. The vector quantization device according to claim 2, further comprising an adder for adding the movement vector and the input vector, wherein the output of the adder is vector-quantized.

17. A signal transmitting device comprising: label transmitting means for transmitting a label encoded by the vector quantization device according to claim 16; and movement vector transmitting means for transmitting the movement vector.

18. A signal receiving device comprising: a movement vector storage unit for storing the movement vector transmitted from the signal transmitting device according to claim 17; a standard codebook; a subtractor for subtracting the movement vector read from the movement vector storage unit from the representative vector read from the standard codebook for a received label; and a decoder that takes the output of the subtractor as the decoded vector for the label.

19. A recognition device comprising: the vector quantization device according to claim 2, which vector-quantizes the normalized vector of each vector of an input feature vector sequence into a label and thereby converts the feature vector sequence into a label sequence; HMM storage means for storing, for each recognition unit, a hidden Markov model (HMM) in which the occurrence probability of each label is defined for each state; and likelihood calculation means for calculating the likelihood of each HMM for the membership vector sequence, wherein the recognition unit corresponding to the HMM that gives the maximum likelihood is taken as the recognition result.

20. A recognition device comprising: the vector quantization device according to claim 2, which vector-quantizes the normalized vector of each vector of an input feature vector sequence into a membership vector whose elements are the degrees of membership in the respective labels and thereby converts the feature vector sequence into a membership vector sequence; HMM storage means for storing, for each recognition unit, a hidden Markov model (HMM) in which the occurrence probability of each label is defined for each state; and likelihood calculation means for calculating the likelihood of each HMM for the membership vector sequence, wherein the recognition unit corresponding to the HMM that gives the maximum likelihood is taken as the recognition result.

21. A recognition device comprising: the vector quantization device according to claim 2, which vector-quantizes the normalized vector of each vector of an input feature vector sequence into a label and thereby converts the feature vector sequence into a label sequence; recognition model storage means for storing, for each recognition unit, a recognition model expressed as a label sequence; and distance calculation means for calculating the distance or similarity between the input label sequence and each recognition model, wherein the recognition unit corresponding to the recognition model that gives the minimum distance or the maximum similarity is taken as the recognition result.

22. A recognition device comprising: the vector quantization device according to claim 2, which vector-quantizes the normalized vector of each vector of an input feature vector sequence into a membership vector whose elements are the degrees of membership in the respective labels and thereby converts the feature vector sequence into a membership vector sequence; recognition model storage means for storing, for each recognition unit, a recognition model expressed as a label sequence; and distance calculation means for calculating the distance or similarity of each recognition model with respect to the input membership vector sequence, wherein the recognition unit corresponding to the recognition model that gives the minimum distance or the maximum similarity is taken as the recognition result.

23. The vector quantization device according to claim 1, further comprising temporary storage means for successively storing predetermined fixed signal sections of an input signal, wherein the contents of the temporary storage means are used as the learning vectors and the codebook or the movement vectors are obtained successively for each signal section.

24. The vector quantization device according to claim 2, further comprising temporary storage means for successively storing predetermined fixed signal sections of an input signal, wherein the contents of the temporary storage means are used as the learning vectors and the movement vector for normalizing the input signal is calculated successively for each signal section.