JPS61261799A

JPS61261799A - Code book preparation for unspecified speaker

Info

Publication number: JPS61261799A
Application number: JP60104397A
Authority: JP
Inventors: 沢井　秀文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-05-16
Filing date: 1985-05-16
Publication date: 1986-11-19

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to techniques applicable to waveform coding of speech signals and the like and to linear predictive analysis/synthesis systems, and it can also be applied to clustering methods for pattern recognition of speech and images.

Prior Art

In general, a clustering technique that extracts representative vectors (called code vectors) from feature vectors of speech or images is called vector quantization. It has recently been applied to waveform coding, linear predictive analysis/synthesis systems, code transmission systems, and the like.

An efficient way to create code vectors is to cluster according to the spatial distribution of the training sample data. No definitive method has yet been established, however, and the typical algorithms minimize the average distortion at quantization (the error between code vectors and training samples) through a large number of iterative computations.

Thus, creating code vectors has conventionally required an enormous amount of computation. In particular, because the computation grows in proportion to the number of code vectors and the number of training samples, creation becomes practically impossible even on a large computer.

Object

The present invention has been made in view of the above circumstances. Its object is to create a codebook (a set of code vectors) efficiently by performing fast and accurate processing of the iterative computations over the enormous amount of training sample data used in vector quantization.

Configuration

To achieve the above object, the present invention is a vector quantization method that creates representative vectors (called code vectors; a set of code vectors is called a codebook) from a set of feature vectors of speech, images, or the like (called training samples) on the basis of a clustering technique. It is characterized in that vector quantization is first performed for each of a plurality of speakers to create code vectors, the number of training samples belonging to each of those code vectors is registered, and the code vectors of all the speakers, weighted by those numbers of training samples, are vector quantized again so that speaker-independent (unspecified-speaker) code vectors are created at high speed.

The present invention is described below on the basis of an embodiment.

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention. In the figure, 1 is a signal input terminal, 2 is a feature analysis unit, 3 is a codebook (set of code vectors) storage unit, 4 is a vector quantization unit, 5 is a standard pattern storage unit, 6 is a recognition processing unit, and 7 is a recognition result output terminal.

The signal input may in general be an image signal, a speech signal, or the like, but the description here is limited to speech signals.

The signal entered at input terminal 1 is analyzed in the feature analysis unit 2, for example by a bank of band-pass filters or by LPC analysis, and converted into feature parameters such as a power spectrum or LPC parameters. Using a set of these feature parameters as training samples, representative vectors (called code vectors) are created in advance by the clustering technique described later (vector quantization) and stored in the codebook 3. The set of these code vectors is called a codebook.

A given signal input is vector quantized by the vector quantization unit 4 to one of the code vectors in the codebook 3, and a code (the number of the code vector) is registered for each frame. In the standard pattern storage unit 5, the entries corresponding to dictionary patterns are expressed by codes of the codebook 3. The recognition processing unit 6 then performs pattern matching between the unknown input pattern quantized by the vector quantization unit 4 and the standard patterns in the standard pattern storage unit 5, selects from the standard pattern storage unit 5 the pattern most similar to the unknown input pattern, and outputs it at output terminal 7.
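The recognition path of FIG. 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the squared Euclidean distance, and the frame-by-frame mismatch count used for pattern matching are all assumptions, since the text does not say how the recognition processing unit 6 compares code sequences.

```python
def nearest_code(frame, codebook):
    """Return the index (code) of the code vector closest to `frame`.

    Squared Euclidean distance is an assumption; the text only says the
    frame is quantized to one of the code vectors.
    """
    best_index, best_dist = 0, float("inf")
    for index, code_vector in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(frame, code_vector))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_utterance(frames, codebook):
    """Vector quantization unit 4: one code per frame."""
    return [nearest_code(frame, codebook) for frame in frames]


def recognize(codes, standard_patterns):
    """Toy stand-in for recognition processing unit 6 (hypothetical matching rule).

    Picks the label whose stored code sequence differs from `codes`
    in the fewest positions.
    """
    def mismatches(pattern):
        return sum(a != b for a, b in zip(codes, pattern)) + abs(len(codes) - len(pattern))

    return min(standard_patterns, key=lambda label: mismatches(standard_patterns[label]))


# Illustrative data only.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3)]
codes = quantize_utterance(frames, codebook)
standard_patterns = {"a": [0, 1, 2], "b": [2, 2, 0]}
label = recognize(codes, standard_patterns)
```

Because both the unknown input and the standard patterns are stored as code numbers, matching operates on short integer sequences rather than on the feature vectors themselves.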

FIG. 2 is a flowchart based on a representative vector quantization algorithm. This algorithm minimizes the average distortion arising at quantization and is one of the more powerful algorithms. In the figure, 8 is an algorithm start terminal, 9 is an initialization unit, 10 is a training sample reading unit, 11 is a clustering unit, 12 is an average distortion calculation unit, 13 is a convergence determination unit, 14 is an average distortion replacement unit, 15 is a codebook determination unit, 16 is a codebook storage unit, and 17 is an algorithm end terminal.

First, the initialization unit 9 sets the average distortion D_{-1} used for the convergence test (D_{-1} = ∞; the symbol is illegible in the source, and infinity is inferred from the convergence test described below), the quantization level N, the initial vectors y_1, y_2, ..., y_N, and the convergence threshold ε. For the quantization level N, levels such as 128 or 256 are generally considered suitable for speaker-dependent speech recognizers, and levels such as 256 or 512 for speaker-independent ones. The initial vectors {y_i} (i = 1, 2, ..., N) are usually chosen as N vectors from the training samples x_1, x_2, ..., x_n that differ from one another as much as possible; otherwise clustering may not proceed properly and may converge to a local solution. The number of training samples n should usually be on the order of 10 times the quantization level N. The convergence threshold is chosen as, for example, ε = 0.001.

The reading unit 10 reads the training samples x_1, x_2, ..., x_n into buffer memory, and for each sample x_j the distance ||x_j - y_i|| to each of the initial code vectors y_1, y_2, ..., y_N is computed. The y_i with the smallest distance is selected, and x_j is regarded as belonging to y_i. Clustering is complete once this has been done for all training samples x_1, x_2, ..., x_n.

Let C_i denote the cluster whose code vector is y_i (i = 1, 2, ..., N). The average distortion calculation unit 12 computes the distance between y_i and every training sample x_j belonging to C_i (the quantization distortion) and averages it over all clusters C_i (i = 1, 2, ..., N) to obtain the average distortion D_0. Next, the relative change (D_{-1} - D_0)/D_0 between the previous distortion value D_{-1} (at first, the initial value) and D_0 is computed. If it is smaller than the threshold ε set by the initialization unit 9, the codebook determination unit 15 adopts the current {y_i} (i = 1, 2, ..., N) as the final codebook, registers it in the memory unit 16 together with the number of samples s_i belonging to each cluster C_i, and the procedure ends.

However, even when the initial vectors {y_i} of the initialization unit 9 are chosen appropriately, the computation from the clustering unit 11 through the convergence determination unit 13 must be repeated several times or more before it converges. The replacement unit 14 therefore replaces D_{-1} with the value of D_0, and control returns to the clustering unit 11. Clustering and the calculation of D_0 are repeated until the condition of the convergence determination unit 13 is satisfied.
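The loop of FIG. 2 can be sketched as below. It is an illustration under stated assumptions rather than the patent's implementation: the function names and `max_iterations` are hypothetical, the distortion is averaged per sample, and the centroid update after each pass is assumed (it is the usual step in this family of algorithms, but the text above does not spell it out).

```python
def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def train_codebook(samples, initial_vectors, epsilon=0.001, max_iterations=100):
    """Return (codebook, counts, distortion) following the FIG. 2 flow."""
    codebook = [list(v) for v in initial_vectors]
    previous_distortion = float("inf")  # D_{-1}, set by initialization unit 9
    for _ in range(max_iterations):
        # Clustering unit 11: assign every sample to its nearest code vector.
        members = [[] for _ in codebook]
        total = 0.0
        for x in samples:
            dists = [distance(x, y) for y in codebook]
            i = min(range(len(codebook)), key=dists.__getitem__)
            members[i].append(x)
            total += dists[i]
        # Average distortion unit 12: D_0 (per-sample average, an assumption).
        distortion = total / len(samples)
        counts = [len(m) for m in members]  # s_i, kept for later weighting
        # Convergence unit 13: relative change (D_{-1} - D_0) / D_0 < epsilon.
        if distortion == 0.0 or (previous_distortion - distortion) / distortion < epsilon:
            return codebook, counts, distortion
        previous_distortion = distortion  # replacement unit 14
        # Assumed step: move each code vector to the centroid of its cluster.
        for i, m in enumerate(members):
            if m:
                codebook[i] = [sum(col) / len(m) for col in zip(*m)]
    return codebook, counts, distortion


# Illustrative data only: two well-separated groups, N = 2.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
codebook, counts, distortion = train_codebook(samples, [(0.0, 0.0), (10.0, 10.0)])
```

The per-cluster counts returned here correspond to the sample numbers s_i that the text registers alongside the codebook; they are what the speaker-independent stage later uses as weights.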

FIG. 3 shows the training samples {x_j} (j = 1, 2, ..., n), the clusters C_i, and so on at the point when the codebook has been created. Each vector is in general P-dimensional (P ≥ 2), but the figure shows the case P = 2.

FIG. 4 is a flowchart based on the algorithm of the speaker-independent codebook creation method of the present invention; its basic structure is the same as that of FIG. 2. As in the initialization unit 9 of FIG. 2, the initialization unit 19 sets the average distortion D*_{-1} (the assigned value is illegible in the source) and ε. The quantization level is N*, and the initial vectors are the code vectors {y^1_i} of one particular speaker, speaker 1, as described later with reference to FIG. 5. For speaker-independent use, N* is considered best set to a multiple of the speaker-dependent level (the factor is partly illegible in the source), that is, about N* = 2N.

The reading unit 20 reads the codebook {y^m_i} of each speaker m (m = 1, 2, ..., M, where M is the number of speakers) and its sample counts {s^m_i}, which were created in advance and stored in the codebook 16 by the method described for FIG. 2. The clustering unit 21 performs clustering, and the calculation unit 22 computes the average distortion D*_0.

D*_0 is computed as follows. When the distances between the initial vectors {y^1_i} and the training samples {y^m_j} (j = 1, 2, ..., N; m = 2, 3, ..., M) are calculated, the distortion for each y^m_j is its distance to the nearest y*_i (i = 1, 2, ..., N*) multiplied by the weight s^m_j, and the average is taken over all clusters C*_i (i = 1, 2, ..., N*). The formula itself is not reproduced in this text; it is accompanied by the condition n* = N·M.
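Since the formula is missing from this text, the following is only one form consistent with the verbal description (distance to the nearest speaker-independent code vector, multiplied by the sample count, averaged). The normalization by n* is an assumption inferred from the remark that n* = N·M, and the norm is left unspecified as in the description.

```latex
D^{*}_{0} \;=\; \frac{1}{n^{*}} \sum_{i=1}^{N^{*}} \;\sum_{y^{m}_{j} \in C^{*}_{i}} s^{m}_{j}\, \bigl\lVert y^{m}_{j} - y^{*}_{i} \bigr\rVert ,
\qquad n^{*} = N \cdot M
```

Here y^m_j ranges over the speaker-specific code vectors assigned to cluster C*_i, and s^m_j is the number of training samples that were quantized to y^m_j in the speaker-dependent stage.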

As described above, the convergence determination unit 23 computes the relative distortion. If its change is smaller than ε, the codebook determination unit 25 adopts the codebook at that point as {y*_i} (i = 1, 2, ..., N*), registers it in the memory of the codebook storage unit 26 as the speaker-independent codebook, and the procedure ends. When the test in the convergence determination unit 23 fails, clustering by the clustering unit 21 is repeated, as described for FIG. 2.
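A minimal sketch of the weighted re-quantization of FIG. 4 follows. The assumptions are several and deliberate: the function names and `max_iterations` are hypothetical, the code vectors of all M speakers are pooled as the points to be clustered, the distortion is normalized by the number of pooled code vectors, and each speaker-independent code vector is moved to the count-weighted centroid of its cluster.

```python
def distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def merge_speaker_codebooks(speaker_codebooks, speaker_counts, initial_vectors,
                            epsilon=0.001, max_iterations=100):
    """Cluster speaker-specific code vectors, weighted by their sample counts."""
    # Pool every speaker's code vectors and counts (reading unit 20).
    points = [tuple(v) for cb in speaker_codebooks for v in cb]
    weights = [s for counts in speaker_counts for s in counts]
    codebook = [list(v) for v in initial_vectors]
    previous = float("inf")
    for _ in range(max_iterations):
        members = [[] for _ in codebook]
        total = 0.0
        for p, w in zip(points, weights):
            dists = [distance(p, y) for y in codebook]
            i = min(range(len(codebook)), key=dists.__getitem__)
            members[i].append((p, w))
            total += w * dists[i]  # distance multiplied by the weight s
        distortion = total / len(points)  # assumed normalization by n* = N*M
        counts = [sum(w for _, w in m) for m in members]
        if distortion == 0.0 or (previous - distortion) / distortion < epsilon:
            return codebook, counts, distortion
        previous = distortion
        # Assumed update: count-weighted centroid of each cluster.
        for i, m in enumerate(members):
            if counts[i] > 0:
                dim = len(m[0][0])
                codebook[i] = [sum(w * p[k] for p, w in m) / counts[i] for k in range(dim)]
    return codebook, counts, distortion


# Illustrative data only: two speakers, two code vectors each.
speaker_codebooks = [[(0.0, 0.0), (10.0, 10.0)], [(2.0, 0.0), (10.0, 12.0)]]
speaker_counts = [[3, 1], [1, 3]]
merged, merged_counts, merged_distortion = merge_speaker_codebooks(
    speaker_codebooks, speaker_counts, initial_vectors=speaker_codebooks[0])
```

The loop runs over N·M code vectors instead of the full set of training samples, which is where the speed-up claimed in the text comes from; the counts keep heavily populated code vectors from being outvoted by sparse ones.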

FIG. 5 shows the results of creating the codebooks SP_1, ..., SP_m, ..., SP_M of speakers 1, ..., m, ..., M. In the figure, y^m_1, ..., y^m_N denote the code vectors of speaker m, and s^m_1, ..., s^m_N denote the number of training samples vector quantized to each code vector.

FIG. 6 shows the speaker-independent code vectors created in the speaker-independent codebook 26 of FIG. 4. In the figure, y^m_j and s^m_j denote a code vector of speaker m and the number of training samples belonging to it (its weight), and y*_i and s*_i denote a speaker-independent code vector and the number of training samples belonging to it.

In the initialization unit 19 of FIG. 4, the speaker-dependent code vectors of speaker 1 were used as the initial vectors, but speaker 1 may of course be any one of the M speakers.

In the algorithm of FIG. 4, the training samples of speaker 2 can also be vector quantized by the algorithm described for FIG. 2, starting from the initial vectors of speaker 1, to create code vectors covering two speakers. Repeating this operation in the same way, the code vectors covering speakers 1 to m (1 < m < M) may be created and then used as the initial vectors for vector quantizing the training samples of the next speaker (m + 1).

These operations prevent the speaker-independent code vectors from converging to a local solution and allow them to be created in a well-balanced manner and at high speed.

Next, a statistical processing method that takes into account the variance of the training samples within each cluster is described.

To obtain the variances and covariances of the training samples within each cluster C_i of FIG. 3, the covariance matrix Σ_{C_i} may be defined as

$$\Sigma_{C_i} = (\sigma_{jk}), \quad j, k = 1, 2, \ldots, P.$$

Here, P is the number of parameter dimensions and σ_{jk} is the (j, k) component of the covariance matrix, given by equation (1), whose body is not reproduced in this text. x_{lj} denotes the j-th component of the l-th training sample in cluster C_i, and y_{ij} denotes the j-th component of the code vector of C_i.
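Equation (1) is missing from this text. Given the symbol definitions that survive (x_{lj} as the j-th component of the l-th sample in C_i, y_{ij} as the j-th component of its code vector), one consistent form is the usual sample covariance about the code vector. The 1/s_i normalization, with s_i the number of samples in C_i, is an assumption.

```latex
\sigma_{jk} \;=\; \frac{1}{s_i} \sum_{l=1}^{s_i} \bigl(x_{lj} - y_{ij}\bigr)\bigl(x_{lk} - y_{ik}\bigr) \qquad (1)
```

With j = k this reduces to the variance of component j within the cluster, and the sum over l is what makes the cost grow with the number of training samples.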

The covariance matrix Σ_{C_i} above is often used when computing the Mahalanobis distance

$$d(x_l, y_i) = (x_l - y_i)\, \Sigma_{C_i}^{-1}\, (x_l - y_i)^{T},$$

where -1 denotes the matrix inverse and T the transpose. The amount of computation for Σ_{C_i} or Σ_{C_i}^{-1} grows as the number of training samples increases.
Σ昌 (Intersection 1-Ten) It is often used when calculating T. Here, -1 represents the inverse matrix and T represents the transposition. When calculating Σci or Σε1, the amount of calculation increases as the number of learning samples increases. do.

Using the Mahalanobis distance described above as the distance measure between vectors from the very start of vector quantization would therefore take an enormous amount of time and be impractical. To solve this problem, the absolute-value distance

$$d(x_l, y_i) = \sum_{j=1}^{P} \lvert x_{lj} - y_{ij} \rvert$$

or the Euclidean distance is used until the codebook determination unit 15 of FIG. 2 determines the codebook, and equation (1) is computed only after convergence. The convergence computation can then be carried out efficiently.
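The two-stage idea can be sketched as follows: a cheap distance during the iterations, then a covariance matrix and Mahalanobis distance only once the clusters are fixed. The function names are hypothetical, the 1/s normalization of the covariance is an assumption, and the small Gauss-Jordan inverse is only there to keep the example self-contained.

```python
def city_block(x, y):
    """Absolute-value distance used while the codebook is still converging."""
    return sum(abs(a - b) for a, b in zip(x, y))


def covariance(samples, center):
    """Covariance of `samples` about the code vector `center` (1/s normalization assumed)."""
    p, s = len(center), len(samples)
    return [[sum((x[j] - center[j]) * (x[k] - center[k]) for x in samples) / s
             for k in range(p)] for j in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if singular."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis(x, center, inverse_cov):
    """(x - y) * inverse_cov * (x - y)^T, the quadratic form given in the text."""
    diff = [a - b for a, b in zip(x, center)]
    n = len(diff)
    return sum(diff[j] * inverse_cov[j][k] * diff[k] for j in range(n) for k in range(n))


# Illustrative cluster: wider along the second axis than the first.
cluster = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
center = (0.0, 0.0)
cov = covariance(cluster, center)   # computed once, after convergence
inv = invert(cov)
d_cheap = city_block((1.0, 2.0), center)
d_stat = mahalanobis((1.0, 2.0), center, inv)
```

The covariance and its inverse are computed once per cluster after convergence, so the expensive step is kept out of the inner loop of the iteration.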

Likewise, the covariance matrix Σ_{C*_i} within the cluster C*_i to which a speaker-independent code vector in the speaker-independent codebook 26 of FIG. 4 belongs can be obtained by multiplying each speaker-specific code vector y^m_j by its weight s^m_j, and is defined as

$$\Sigma_{C^{*}_{i}} = (\sigma^{*}_{jk}), \quad j, k = 1, 2, \ldots, P \qquad (2)$$

where the accompanying definition of σ*_{jk} is not reproduced in this text. Defining the covariance matrix as in equation (2) greatly reduces the amount of computation needed to process the speaker-independent training samples statistically, making the processing practical.
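Because the definition of σ*_{jk} is missing, the sketch below shows only one plausible reading of "multiplying each speaker-specific code vector by its weight": a count-weighted covariance of the speaker-specific code vectors about the speaker-independent code vector. The function name and the normalization by the total count are assumptions.

```python
def weighted_covariance(code_vectors, counts, center):
    """Count-weighted covariance of speaker-specific code vectors about `center`.

    code_vectors: the y^m_j assigned to one speaker-independent cluster.
    counts:       the matching sample counts s^m_j, used as weights.
    center:       the speaker-independent code vector of that cluster.
    Normalizing by the total count is an assumption.
    """
    total = sum(counts)
    p = len(center)
    return [[sum(s * (y[j] - center[j]) * (y[k] - center[k])
                 for y, s in zip(code_vectors, counts)) / total
             for k in range(p)] for j in range(p)]


# Illustrative data only: two code vectors standing in for four samples.
code_vectors = [(0.0, 0.0), (2.0, 0.0)]
counts = [3, 1]
center = (0.5, 0.0)
cov_star = weighted_covariance(code_vectors, counts, center)
```

The sum runs over the handful of code vectors in the cluster rather than over every underlying training sample, which matches the reduction in computation the text describes.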

Effects

As is clear from the above description, according to the present invention, when creating a speaker-independent codebook with a high quantization level and an enormous number of training samples, the amount of iterative computation is reduced and the initial vectors are chosen in a balanced way so that the solution does not converge locally. The amount of computation can therefore be reduced substantially compared with conventional methods, and accurate, practical processing becomes possible.

[Brief Description of the Drawings]

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention. FIG. 2 is a flowchart showing the vector quantization algorithm. FIG. 3 is a diagram showing the relationship between training samples and clusters. FIG. 4 is a flowchart showing the algorithm for creating the speaker-independent codebook. FIG. 5 is a diagram showing the results of codebook creation. FIG. 6 is a diagram showing the speaker-independent code vectors.

1: signal input terminal; 2: feature analysis unit; 3: codebook; 4: vector quantization unit; 5: standard pattern; 6: recognition processing unit; 7: result output unit; 8, 18: algorithm start terminal; 9, 19: initialization unit; 10, 20: training sample reading unit; 11, 21: clustering unit; 12, 22: average distortion calculation unit; 13, 23: convergence determination unit; 14, 24: average distortion replacement unit; 15, 25: codebook determination unit; 16, 26: codebook storage unit; 17, 27: algorithm end terminal.

Claims

(1) A method for creating a speaker-independent codebook in vector quantization, in which representative vectors are created from a set of feature vectors of speech, images, or the like on the basis of a clustering technique, characterized in that vector quantization is performed for each of a plurality of speakers to create code vectors, the number of training samples belonging to each of those code vectors is registered, and the code vectors of all the speakers, weighted by the numbers of training samples, are vector quantized again to create speaker-independent code vectors.

(2) The method for creating a speaker-independent codebook according to claim (1), characterized in that the training samples of the other speakers are vector quantized using the code vectors of a specific speaker among the plurality of speakers as initial vectors.

(3) The method for creating a speaker-independent codebook according to claim (1), characterized in that, when creating the codebook for the (m+1)-th and subsequent speakers, vector quantization is performed using the codebook created from speakers 1 to m as initial vectors.

(4) The method for creating a speaker-independent codebook according to claim (1), characterized in that a covariance matrix is computed from the speaker-specific code vectors belonging to each created speaker-independent code vector, the Mahalanobis distance is computed from it as a statistical distance between code vectors, and vector quantization is performed.

(5) The method for creating a speaker-independent codebook according to claim (4), characterized in that, as the distance measure between vectors during vector quantization, the absolute-value distance or the Euclidean distance, which require less computation, is used until the code vectors converge, and the Mahalanobis distance is used after convergence to create the codebook.