JPH02238499A

JPH02238499A - Vector quantizing system

Info

Publication number: JPH02238499A
Application number: JP1057706A
Authority: JP
Inventors: Yoshiaki Asakawa; 淺川　吉章; Hiroshi Ichikawa; 市川　熹
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-03-13
Filing date: 1989-03-13
Publication date: 1990-09-20

Abstract

PURPOSE: To guarantee a reduction in quantization distortion by performing fuzzy vector quantization with code vectors used selectively according to the input vector.

CONSTITUTION: First, the quantization distortion obtained when input vector 106 is vector-quantized with its nearest vector 405 is calculated (406). Next, vectors are added one at a time from the candidate vectors in order of proximity to input vector 106, fuzzy vector quantization is performed (408), and the quantization distortion is calculated. If the distortion increases, the added candidate vector is not used. The procedure is applied to the remaining candidate vectors and repeated until the number of vectors in use reaches a prescribed number or the distortion no longer decreases. Code vectors (the representative vectors in a codebook) are thus used selectively for fuzzy vector quantization, so the quantization distortion can always be kept at or below that of ordinary vector quantization.
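The procedure in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the membership formula (a fuzzy-c-means-style rule) and the reproduction rule (a membership-weighted sum) are assumptions, and all function names and the `max_vectors` parameter are hypothetical.

```python
import math

def distance(x, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))

def memberships(x, vectors, p=1.5):
    """ASSUMED fuzzy-c-means-style memberships; they sum to 1."""
    d = [distance(x, v) for v in vectors]
    if 0.0 in d:  # exact match: membership 1 for that vector, 0 elsewhere
        hit = d.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(d))]
    e = 1.0 / (p - 1.0)
    return [1.0 / sum((di / dj) ** e for dj in d) for di in d]

def reconstruct(u, vectors):
    """ASSUMED reproduction: membership-weighted sum of the code vectors."""
    dim = len(vectors[0])
    return [sum(ui * v[k] for ui, v in zip(u, vectors)) for k in range(dim)]

def fuzzy_distortion(x, vectors, p=1.5):
    """Distance between the input and its fuzzy-VQ reproduction."""
    return distance(x, reconstruct(memberships(x, vectors, p), vectors))

def select_vectors(x, codebook, max_vectors=4, p=1.5):
    """Add candidates nearest-first; keep one only if distortion drops."""
    order = sorted(range(len(codebook)), key=lambda i: distance(x, codebook[i]))
    chosen = [order[0]]
    best = distance(x, codebook[order[0]])  # ordinary VQ distortion
    for i in order[1:]:
        if len(chosen) >= max_vectors:
            break
        trial = fuzzy_distortion(x, [codebook[j] for j in chosen + [i]], p)
        if trial < best:  # reject any candidate that does not help
            chosen.append(i)
            best = trial
    return chosen, best
```

Because a candidate is kept only when it lowers the distortion, the returned value can never exceed the nearest-vector distortion that the loop starts from.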

Description

[Detailed description of the invention] [Industrial application field]

The present invention relates to a high-efficiency speech coding apparatus, and in particular to a speech coding method suited to obtaining high-quality reproduced speech at a high information compression ratio.

[Prior Art]

Various high-efficiency speech coding methods have been proposed. For example, Kazuo Nakata, "Digital Information Compression" (Kosaido Sanpo Publishing, Electronic Science Series 100) gives an accessible account of many of them, covering numerous waveform coding methods and source coding (parametric coding) methods.

[Problem to be solved by the invention]

Of these methods, waveform coding offers good sound quality but makes it difficult to raise compression efficiency, whereas parametric coding compresses efficiently but reaches a ceiling in sound quality even when the amount of information is increased, so sufficient quality cannot be obtained. In particular, the range between the bit rates at which each is strong (around 10 kbps) is a gap that neither serves well. Hybrid methods combining the strengths of both have been proposed in recent years, such as the multipulse method (e.g., B. S. Atal et al., "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. ICASSP 82, S-5.10 (1982)) and the TOR method (A. Ichikawa et al., "A speech coding method using thinned-out residual," Proc. ICASSP 85, 25.7 (1985)), and have been studied in various ways, but they remain unsatisfactory in both sound quality and processing cost.

In general, high-efficiency coding methods exploit the fact that speech information is unevenly distributed, allocating more codes to the regions where information is present. Vector quantization, which pushes this idea further, has attracted attention (e.g., S. Roucos et al., "Segment quantization for very-low-rate speech coding," Proc. ICASSP 82, p. 1563 (1982)). It looks at the uneven distribution of information over combinations of several parameters and, for sets of parameter combinations (called vectors), allocates more codes to the regions where speech information exists. In vector quantization, the table that associates vectors with codes is called a codebook, and a good codebook must be prepared in advance to achieve high-quality speech coding. Building a codebook requires an extremely large amount of speech data, and there are further questions such as how large the codebook should be.

To address the codebook problem, fuzzy vector quantization has been proposed, which interpolates using the membership function of the input speech with respect to every vector in the codebook (e.g., H. P. Tseng et al., "Fuzzy vector quantization applied to hidden Markov modeling," ICASSP 87.4 (1987)). However, it needs a large amount of membership function information, that is, information on the similarity to each vector. Sound quality better than the codebook quality alone would give can therefore be expected, but the method is not used as a transmission technique; its use as preprocessing for speech recognition and the like is under study. To reduce the amount of information, applying the k-nearest-neighbor rule (KNN method), which compares the input speech with all vectors in the codebook and uses only the k nearest, has also been proposed (e.g., Nakamura et al., "Normalization of spectrograms using fuzzy vector quantization," Journal of the Acoustical Society of Japan, Vol. 45, No. 2 (1989-2)). This requires sorting to select the neighboring vectors, and the reduction in transmitted information is limited.

For this problem, the present inventors have already proposed a method in which the neighboring vectors of each vector in the codebook are determined in advance and recorded in the codebook, which eliminates the sorting and also reduces the transmitted information (Japanese Patent Application No. 63-240972).

With any of the above fuzzy vector quantization methods, the average quantization distortion can be reduced compared with ordinary vector quantization, but a close look at individual cases shows that the quantization distortion sometimes increases instead. The object of the present invention is to provide a method that guarantees a reduction of quantization distortion in fuzzy vector quantization.
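The remark that distortion can increase in individual cases can be illustrated with a small worked example. The membership rule and the weighted-sum reproduction used here are assumptions in the style of fuzzy c-means, not formulas taken from this document, and the numbers are hypothetical. Take a one-dimensional input $x = 0$ and two code vectors $V_1 = 1$ and $V_2 = 3$, both on the same side of $x$, with fuzziness $p = 1.5$.

```latex
% Distances from the input x = 0 to the two code vectors
d_1 = |x - V_1| = 1, \qquad d_2 = |x - V_2| = 3

% Assumed membership rule with exponent 1/(p-1) = 2
u_1 = \frac{1}{1 + (d_1/d_2)^{2}} = \frac{1}{1 + 1/9} = 0.9, \qquad
u_2 = \frac{1}{(d_2/d_1)^{2} + 1} = \frac{1}{9 + 1} = 0.1

% Assumed reproduction as a membership-weighted sum
\hat{x} = u_1 V_1 + u_2 V_2 = 0.9 \cdot 1 + 0.1 \cdot 3 = 1.2

% Fuzzy distortion exceeds the nearest-vector distortion
|x - \hat{x}| = 1.2 \;>\; d_1 = 1
```

Under these assumptions the second vector pulls the reproduction away from the input, so fuzzy vector quantization does worse here than ordinary vector quantization with $V_1$ alone.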

[Means to solve the problem]

To achieve the above object, fuzzy vector quantization is performed using the representative vectors in the codebook (hereinafter, code vectors) selectively according to the input vector. Code vector selection comprises means for choosing candidate code vectors, means for evaluating the relationship between each candidate vector and the input vector, and means for deciding which vectors to use based on the evaluation result.

[Operation]

The operation of a representative procedure of the invention is as follows. When the speech to be transmitted is input, the analysis unit extracts a feature vector, which is compared in turn with the code vectors in the codebook, and the nearest vector is selected. When the invention is applied to conventional fuzzy vector quantization, the candidate vectors are, for example, a predetermined number of code vectors chosen in order of proximity to the input vector. When it is applied to the scheme in which neighboring vectors are registered in advance for each code vector (Japanese Patent Application No. 63-240972), the neighboring vectors serve as the candidates, and they are ordered by proximity to the input vector.

When quantization distortion is the evaluation criterion for selecting vectors, the distortion obtained by vector-quantizing the input vector with its nearest vector (ordinary vector quantization) is computed first. Next, vectors are added one at a time from the candidates in order of proximity to the input vector, fuzzy vector quantization is performed, and the quantization distortion is computed. If the distortion decreases, the added candidate is adopted; if it increases, the added candidate is not used. This procedure is applied to the remaining candidates and repeated until the number of vectors in use reaches a predetermined number or the distortion no longer decreases. By using code vectors selectively for fuzzy vector quantization in this way, the quantization distortion can always be kept at or below that of ordinary vector quantization.

[Embodiment]

An embodiment of the invention is described below with reference to the drawings. Fig. 1 is a block diagram of one embodiment. Only one direction, pairing a transmitting side with a receiving side, is shown; the reverse channel is omitted to keep the figure simple. In Fig. 1, input speech 101 passes through analog-to-digital (A/D) converter 102 and enters two-bank buffer memory 103. This memory is provided to adjust the timing of the subsequent processing and to prevent interruptions in the input speech.
Speech from buffer memory 103 is fed to analysis unit 104, which derives pitch information 107, spectrum information 106 and level information 105. Spectrum information 106 is applied to fuzzy vector quantization unit 108, to which the invention is applied, yielding vector code 109 and membership function 110. Vector code 109, membership function 110, pitch information 107 and level information 105 are sent through transmitting unit 111 and transmission line 112 to receiving unit 113. On the receiving side, vector code 109', membership function 110', pitch information 107' and level information 105' received by the receiving unit are passed to fuzzy vector dequantization unit 114, where spectrum information 115 is restored; this is supplied to synthesis unit 116 together with pitch information 107' and level information 105'. Synthesis unit 116 decodes these into a speech waveform, which passes through two-bank output buffer memory 117, is converted to an analog signal by digital-to-analog (D/A) converter 118, and is reproduced as output speech 119.

Each part is described in detail below.

Fig. 2 illustrates analysis unit 104. In this embodiment the analysis unit uses power spectrum envelope (PSE) analysis, which is described in detail in Nakajima et al., "Power spectrum envelope (PSE) speech analysis-synthesis system," Journal of the Acoustical Society of Japan, Vol. 44, No. 11 (1988-11). Only an outline is given here.

In Fig. 2, pitch extraction unit 201 extracts the pitch information (pitch frequency or pitch period) of the input speech. Any known pitch extraction method, such as the correlation method or the AMDF method, may be used. Waveform extraction unit 203 cuts out from the input speech the waveform segment used for spectral analysis, about 20 to 60 ms long. The segment is often of fixed length, but it may instead be made variable, depending on the pitch period, at about three times that period. The extracted waveform is sent to Fourier transform unit 204 and converted to Fourier coefficients. Here the extracted waveform is multiplied by a commonly used window function such as a Hamming window, padded with zeros before and after to make 2048 points, and transformed with the fast Fourier transform (FFT), which yields data quickly and with high frequency resolution. The absolute values of the Fourier coefficients are the frequency components of the extracted waveform, that is, its spectrum. When the extracted waveform is periodic, the spectrum has a line structure formed by the pitch harmonics.

Pitch resampling unit 205 picks out of the spectrum obtained by the FFT only the harmonic components of the pitch frequency (the line spectrum components). The data extracted in this way are treated below as corresponding to the period π of the cosine series expansion described later.

Power spectrum unit 206 squares each spectral component to give the power spectrum, and logarithm unit 207 takes the logarithm of each component to give the log power spectrum. Level normalization unit 208 absorbs level variation due to the loudness of the input speech, although the level may instead be extracted together with the other coefficients in the following cosine transform unit 209.

Cosine transform unit 209 uses the resampled log power spectrum data to form an approximate representation by a cosine series with a finite number of terms. The number of terms $m$ is normally set to about 25. The power spectrum envelope is expressed as

$$Y = A_0 + A_1 \cos\lambda + A_2 \cos 2\lambda + \cdots + A_m \cos m\lambda \tag{1}$$

The coefficients $A_i$ are determined by fitting $Y$ of equation (1) to the resampled power spectrum data.
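The fit of equation (1) can be sketched as an ordinary least-squares problem. The sketch below is only one way to do it: it forms the normal equations and solves them with a small Gaussian elimination routine. The function names and the sample positions in the usage test are hypothetical, and for a series of about 25 terms a numerically sturdier solver would probably be preferable to normal equations.

```python
import math

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_cosine_series(lams, ys, m):
    """Least-squares A_0..A_m for Y = sum_i A_i * cos(i * lambda)."""
    basis = [[math.cos(i * lam) for i in range(m + 1)] for lam in lams]
    # Normal equations: (B^T B) A = B^T y
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(m + 1)]
           for i in range(m + 1)]
    bty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(m + 1)]
    return solve(btb, bty)

def eval_cosine_series(coeffs, lam):
    """Evaluate equation (1) at one value of lambda."""
    return sum(a * math.cos(i * lam) for i, a in enumerate(coeffs))
```

Here `lams` stands for the positions of the pitch harmonics mapped onto the interval up to π, and `ys` for the resampled log power spectrum values at those positions.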
The fit minimizes the squared error. The zeroth coefficient $A_0$ represents the input level and is output as level information 105.
The remaining coefficients $A_1, \ldots, A_m$ are output as spectrum information 106.

Next, the fuzzy vector quantization unit with the vector selection function of the invention is described with reference to Fig. 3. In Fig. 3, codebook 401 stores the element values of each code vector together with its code. When spectrum information (input vector) 106 is input to distance calculation unit 402, each code vector is read from codebook 401, its distance from input vector 106 is computed, and distance values 403 are output. The distance measure here is a Euclidean distance with a weight on each vector element, though any other suitable measure may of course be used. Pitch information 107 and the like can also be used to restrict the range of code vectors for which distances are computed.

Candidate vector selection unit 404 chooses the candidate code vectors to be evaluated as described next. Here, with reference to distance values 403, a predetermined number (C) of vectors are chosen starting from the smallest distance and output as candidate vector codes 405, sorted in ascending order of distance. Alternatively, the candidates may be those whose distance is below a predetermined threshold.
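Candidate selection as described for units 402 and 404 might look like the following sketch. The function and parameter names are hypothetical, and the weights are simply passed in, since the source does not say how they are chosen.

```python
import math

def weighted_distance(x, v, w):
    """Euclidean distance with a weight on each vector element."""
    return math.sqrt(sum(wk * (a - b) ** 2 for a, b, wk in zip(x, v, w)))

def select_candidates(x, codebook, weights, count=None, threshold=None):
    """Return (code, distance) pairs, nearest first.

    count     -- keep at most this many candidates (None: no limit)
    threshold -- keep only distances <= threshold (None: no limit)
    """
    scored = sorted(
        ((code, weighted_distance(x, v, weights))
         for code, v in enumerate(codebook)),
        key=lambda pair: pair[1],
    )
    if threshold is not None:
        scored = [pair for pair in scored if pair[1] <= threshold]
    if count is not None:
        scored = scored[:count]
    return scored
```

Passing only `count` gives the fixed-number rule, passing only `threshold` gives the distance rule, and passing both limits the candidates by number and by distance at once.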
The count limit and the distance threshold may also be combined, taking at most a predetermined number of vectors whose distance is at or below a predetermined threshold. When all code vectors in the codebook are to be considered, this candidate vector selection unit is unnecessary.

Vector selection unit 406 computes and evaluates the quantization distortion for the candidate vectors by the following procedure. The minimum $d_{\min}$ of distance values 403 is the quantization distortion obtained when the input vector is vector-quantized with its nearest vector (ordinary vector quantization), so this serves as the baseline for the evaluation. Next, each candidate other than the nearest vector is combined, one at a time, with the nearest vector, fuzzy vector quantization is performed, and the quantization distortion is computed. Fuzzy vector quantization is described in detail in Nakamura et al., "Normalization of spectrograms using fuzzy vector quantization" (Journal of the Acoustical Society of Japan, Vol. 45, No. 2 (1989)) and the references cited there, so only an outline is given here.

In fuzzy vector quantization, the input vector is represented by its degrees of membership in several code vectors. The degree of membership is quantified by a membership function. One way of obtaining the membership function is as follows. Let $C$ code vectors $(V_1, \ldots, V_C)$ be considered, and let $d_{ik}$ be the distance between input vector $X_k$ and code vector $V_i$. If the input vector coincides with none of the code vectors, the membership function $u_{ik}$ for each code vector is given by equation (2) (the equation itself did not survive in this text), where $p$ is a parameter called the fuzziness, normally set to about 1.5. If the input vector coincides with one of the code vectors, the membership value for that code vector is set to 1 and the others to 0.

Next, reproducing the vector from the membership functions (the dequantization operation) is described. The reproduced vector $X_k'$ is expressed as a linear combination of the code vectors, by equation (3) (likewise missing from this text). The error (distance) between input vector $X_k$ and reproduced vector $X_k'$ is the quantization distortion of fuzzy vector quantization.

The nearest vector is paired in turn with each remaining candidate, fuzzy vector quantization is performed, and the quantization distortion of each pairing is found. If the minimum of these distortions is at or below $d_{\min}$, the candidate giving that minimum is selected, and $d_{\min}$ is reset to this minimum. Next, the remaining candidates are added one at a time to the nearest vector and the vector just selected, and the same procedure is repeated until no candidates remain. This is the case in which every candidate that reduces the quantization distortion is selected. Alternatively, processing may be stopped once the number of selected vectors reaches a predetermined number.

In the above method, all remaining candidates are evaluated each time one vector is selected. As a simplified method, candidates may instead be added one at a time in ascending order of distance from the input vector, and a candidate selected if the resulting distortion is smaller than the distortion before it was added.

The above uses quantization distortion as the evaluation criterion. Alternatively, by providing a local decoder, vectors may be selected using an error measure taken in the same dimension as the signal itself.

Vectors can also be selected according to their positional relationship in the vector space. Fig. 4 is a conceptual diagram of this; for simplicity the vectors are two-dimensional. In the figure, $X_k$ is the input vector and $V_1$ the nearest vector. Let $V_i$ be the vector to be evaluated. The reproduced vector obtained by fuzzy vector quantization with $V_1$ and $V_i$ (call it $X_k'$) lies on the straight line joining $V_1$ and $V_i$. Hence the condition for the distance between $X_k'$ and $X_k$ to be smaller than the distance between $V_1$ and $X_k$ is that $V_i$ lie on the $X_k$ side of the tangent at $V_1$ to the circle of radius $d_{\min}$ centered at $X_k$. Put another way, when the distance between $X_k$ and $V_1$ and the distance between $X_k$ and $V_i$ are known, the decision can be made from the angle formed by the three vectors $V_1$, $X_k$ and $V_i$. Concretely, an inner product of vectors suffices.
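The positional test of Fig. 4 reduces to the sign of one inner product. The sketch below is a minimal reading of that description, with hypothetical function names: the tangent at the nearest vector is perpendicular to the direction from the nearest vector to the input, so a candidate is on the input's side exactly when its offset from the nearest vector has a positive inner product with that direction.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def on_input_side(x, v1, vi):
    """True if vi lies on x's side of the tangent (hyper)plane at v1.

    The plane passes through nearest vector v1 and is perpendicular to
    x - v1, so the test is the sign of (x - v1) . (vi - v1).
    """
    to_input = [a - b for a, b in zip(x, v1)]
    to_candidate = [a - b for a, b in zip(vi, v1)]
    return dot(to_input, to_candidate) > 0
```

A positive result shows only that points on the segment close to the nearest vector are nearer to the input; where the reproduced vector actually falls still depends on the membership values, so this is perhaps best treated as a cheap screen before the distortion is computed.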
Fuzzy vector quantization unit 408 refers to vector codes 407 output by vector selection unit 406 and fuzzy-vector-quantizes the input vector using the selected vectors. Specifically, the membership functions are computed from equation (2) above. The outputs are the selected vector codes 109 and membership functions 110. When the number of selected vectors is variable, that number is also output. Because the membership functions by their nature sum to 1, it is enough to output one fewer value than the number of vectors. When the number of vectors actually selected falls short of a predetermined (fixed) number, the membership values for the remainder may be set to 0.

The above describes applying the vector selection function of the invention to the ordinary fuzzy vector quantization proposed previously. Its application to the type of fuzzy vector quantization in which neighboring vectors are registered in advance for each code vector (Japanese Patent Application No. 63-240972), already proposed by the present inventors, is described briefly next. In this case the candidate vectors are the nearest vector and the neighboring vectors registered for it in advance, so the function of candidate vector selection unit 404 is greatly simplified. Of the outputs of the final-stage fuzzy vector quantization unit 408, the vector code is only that of the nearest vector. Candidates that were not selected are identified by setting their membership values to 0.

Next, the decoding (receiving) side is described.

Fig. 5 illustrates fuzzy vector dequantization unit 114. When vector code 109' is received, the corresponding code vector $V_i$ is read from codebook 701. Using this and the received membership function $u_{ik}$ 110', vector reproduction unit 702 reproduces the vector by equation (3) above. Codebook 701 on the receiving side of course has the same contents as codebook 401 on the transmitting side. Reproduced vector $X_k' = (A_1', A_2', \ldots, A_m')$ is sent to synthesis unit 116 as spectrum information 115.

Next, synthesis unit 116 is described with reference to Fig. 6. In the figure, log power spectrum reproduction unit 801 uses the transmitted level information $A_0'$ 105' and the elements $A_1', A_2', \ldots, A_m'$ of the reproduced vector (spectrum information 115) to obtain log power spectrum $Y'$ 802 according to

$$Y' = A_0' + A_1' \cos\lambda + A_2' \cos 2\lambda + \cdots + A_m' \cos m\lambda \tag{4}$$

The reproduced log power spectrum $Y'$ 802 undergoes the transformation $(1/2)\log^{-1}$ in inverse logarithm unit 803, giving zero-phase spectrum 804, which is sent to inverse Fourier transform unit 805. There an inverse fast Fourier transform (IFFT) yields speech segment 806. In waveform synthesis unit 807, speech segments 806 are shifted successively by the pitch interval according to pitch information 107' and added together, and the result is output as reproduced speech 808.

The effect of the invention in the embodiment is shown in Figs. 7 and 8. They plot quantization distortion against codebook size, with and without the vector selection function of the invention. Fig. 7 is an example applied to ordinary fuzzy vector quantization, and Fig. 8 an example applied to fuzzy vector quantization using a codebook in which neighboring vectors are registered in advance.

[Effects of the Invention]

According to the invention, the quantization distortion can always be made smaller than with conventional fuzzy vector quantization, so higher-quality speech can be transmitted with the same amount of information, or the amount of information can be reduced for the same quality. Applying the invention to fuzzy vector quantization that uses a codebook with pre-registered neighboring vectors reduces the amount of information further.

Although speech is used as the example throughout this description, the invention can of course be applied to any information of similar structure.
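The description notes that one membership value need not be transmitted because the values sum to 1. A rough sketch of that saving, together with receiver-side reproduction, is given below. The function names are hypothetical, and `reproduce` assumes that equation (3) is a plain membership-weighted sum of the code vectors, which the surviving text does not confirm.

```python
def pack_memberships(u):
    """Drop the last membership; the receiver can restore it."""
    return list(u[:-1])

def unpack_memberships(sent):
    """Restore the omitted value from the sum-to-one property."""
    return list(sent) + [1.0 - sum(sent)]

def reproduce(u, vectors):
    """ASSUMED equation (3): membership-weighted sum of code vectors."""
    dim = len(vectors[0])
    return [sum(ui * v[k] for ui, v in zip(u, vectors)) for k in range(dim)]
```

With a fixed number of transmitted vectors, unselected slots would simply carry a membership of 0, which leaves the weighted sum unchanged.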

[Brief explanation of drawings]

Fig. 1 is a block diagram illustrating the system configuration of one embodiment of the invention; Fig. 2 illustrates the analysis unit; Fig. 3 illustrates the fuzzy vector quantization unit; Fig. 4 illustrates vector selection based on the positional relationship of vectors; Fig. 5 illustrates the fuzzy vector dequantization unit; Fig. 6 illustrates the synthesis unit; and Figs. 7 and 8 illustrate the effect of the embodiment.

Explanation of reference numerals: 101: input speech; 103, 117: buffer memory; 104: analysis unit; 106, 115: spectrum information; 107, 107': pitch information; 108: fuzzy vector quantization unit; 109, 109': vector index; 110, 110': membership function; 114: fuzzy vector dequantization unit; 116: synthesis unit; 120: output speech.

Claims

[Scope of Claims]

1. A vector quantization method comprising: means for analyzing the characteristics of an input signal as a set (vector) of a plurality of parameters; a codebook in which a code is assigned to the representative vector of each cluster and the code and the representative vector are stored in association with each other; means for comparing the vector representing the characteristics of the input signal obtained by the analysis means with the representative vector of each cluster in the codebook and determining to which cluster the vector of the input signal belongs; means for selecting a representative vector from the plurality of representative vectors in the codebook based on a predetermined evaluation criterion; and means for quantizing the input vector using information on the representative vector of the cluster to which the input vector belongs and on the selected representative vector.

2. The vector quantization method according to claim 1, wherein the targets of the evaluation are representative vectors whose distance to the input vector is less than a certain value, or whose similarity to the input vector is more than a certain value.

3. The vector quantization method according to claim 1, wherein the targets of the evaluation are a predetermined number of representative vectors taken in ascending order of distance to the input vector or in descending order of similarity to the input vector.

4. The vector quantization method according to claim 1, wherein the targets of the evaluation are representative vectors whose distance to the input vector is less than a certain value, or whose similarity to the input vector is more than a certain value, and whose number is less than a predetermined number.

5. The vector quantization method according to claim 1, wherein the targets of the evaluation are the representative vector of the cluster to which the input vector is determined to belong and a plurality of neighboring vectors registered in advance for that representative vector.

6. The vector quantization method according to claim 1, wherein the evaluation criterion is the quantization error of the input vector.

7. The vector quantization method according to claim 1, wherein the evaluation criterion is the positional relationship in a feature space between the input vector and the representative vectors to be evaluated.

8. The vector quantization method according to claim 1, wherein the evaluation criterion is the error between the input signal and the output signal obtained as a result of local decoding.

9. The vector quantization method according to claim 1, wherein the selection criterion in the representative vector selection means is to evaluate the representative vectors to be evaluated sequentially and to select a representative vector if the evaluation value improves.

10. The vector quantization method according to claim 1, wherein the selection criterion in the representative vector selection means is to evaluate a plurality of representative vectors to be evaluated simultaneously and to select the representative vector whose evaluation value improves the most.

11. The vector quantization method according to claim 1, wherein the number of representative vectors selected by the representative vector selection means is the number of all representative vectors that improve the evaluation value.

12. The vector quantization method according to claim 1, wherein the number of representative vectors selected by the representative vector selection means is within a predetermined number and is the number of representative vectors that improve the evaluation value.