JPH0527799A

JPH0527799A - Vector quantization method and apparatus thereof

Info

Publication number: JPH0527799A
Application number: JP3179211A
Authority: JP
Inventors: Hiroaki Kokubo; 浩明小窪; Yoshiaki Asakawa; 吉章淺川; Hiroshi Ichikawa; 熹市川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-07-19
Filing date: 1991-07-19
Publication date: 1993-02-05
Anticipated expiration: 2015-10-03
Also published as: JP3094522B2

Abstract

(57) [Abstract]
[Object] To suppress the generation of abnormal sounds in a vector quantization method and apparatus by suppressing spectral discontinuities between frames.
[Configuration] The logarithmic power spectrum reproduction unit of a receiving-side system based on vector coding comprises: a spectrum conversion unit 501 that decodes a logarithmic spectrum from a code vector 114; a power comparison unit 502 that compares the power with that of the previous frame and switches the subsequent processing; a frequency discrimination unit 503 that separates the logarithmic spectrum into a component 504 to be made common and the remaining component 505; a weighted averaging unit 506 that averages the component 504 with the frequency component 508 of the previous frame; a frequency mixing unit 510 that recombines the weighted-averaged component 509 with the other separated frequency component 505; and a level reproduction unit 512 that adds level information 105′.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a high-efficiency speech coding apparatus, and more particularly to a vector quantization method and apparatus suited to obtaining high-quality reproduced speech at a high information compression rate.

[0002]

[Prior Art] Various schemes have been proposed for high-efficiency speech coding. For example, Kazuo Nakata, "Digital Information Compression" (Kosaido Sanpo Publishing, Electronic Science Series 100), explains a variety of schemes in an accessible way and presents many schemes belonging to waveform coding and to information-source coding (parameter coding).

[0003]

[Problems to Be Solved by the Invention] Of these schemes, waveform coding gives good sound quality but makes it difficult to raise the information compression efficiency, whereas parameter coding has high compression efficiency but its sound quality saturates even when the amount of information is increased, so that sufficient quality cannot be obtained. In particular, the region between the bit rates at which each approach performs well (around 10 kbps) is a gap that neither covers. Hybrid schemes that combine the strengths of both, such as the multipulse scheme (for example, B. S. Atal et al., "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. ICASSP 82, S-5, 10 (1982)) and the TOR scheme (A. Ichikawa et al., "A speech coding method using thinned-out residual," Proc. ICASSP 85, 25.7 (1985)), have been proposed in recent years and studied in various ways, but they remain unsatisfactory in both sound quality and processing cost. In general, high-efficiency coding schemes exploit the fact that speech information is unevenly distributed and allocate more codes to the regions where information is present. Vector quantization pushes this idea further: it considers the uneven distribution of information over combinations of several parameters and, for a set of parameters (called a vector), allocates more codes to the regions where speech information is present (for example, S. Roucos et al., "Segment quantization for very-low-rate speech coding," Proc. ICASSP 82, p. 1563 (1982)). This approach has been attracting attention.

[0004] Vector quantization uses a codebook that stores code vectors (also called parameters, representative vectors, or codewords) in association with codes (also called indices). It is a quantization process in the sense that it maps an input vector onto one of a finite number of code vectors; because the index carries less information than the original vector, it is used for information compression. The CELP scheme, which applies vector quantization to the prediction residual of speech (for example, B. S. Atal et al., "Stochastic coding of speech signals at very low bit rates," Proc. ICC 84, pp. 1610-1613 (1984)), and its improved version, the VSELP scheme (for example, I. A. Gerson et al., "Vector sum excited linear prediction (VSELP)," Proc. IEEE Workshop on Speech Coding for Telecommunications, pp. 66-68 (1989)), have been proposed. These schemes minimize the error between the synthesized speech and the original speech by means of a feedback loop. The quality of the synthesized speech is therefore good, but the processing load is large and the schemes are difficult to apply at 4.8 kbps or below.

Achieving high-quality speech coding requires preparing a good codebook in advance. Improving quantization performance basically means reducing quantization distortion, and the distortion decreases as the codebook grows. One problem that arises from vector quantization with a codebook of finite size is that abnormal sounds can occur in the reproduced speech when adjacent speech frames contain components whose characteristics are strongly discontinuous. Reducing this inter-frame discontinuity requires increasing the number of code vectors in the codebook so that the distances between code vectors become small, which calls for a considerably large codebook. Creating a large codebook, however, requires an extremely large amount of speech data and an enormous amount of processing. Constraints such as memory capacity and search cost also limit the codebook size.

[0005] An object of the present invention is to improve the quality of reproduced speech in a vector quantization method and apparatus by keeping the change in spectrum between two consecutive frames within a certain bound, thereby suppressing the abnormal sounds that arise when the spectral characteristics change greatly between frames.

[0006]

[Means for Solving the Problems] To achieve the above object, the vector quantization method and apparatus are based on the observation that, between two consecutive frames in which the power does not change steeply, the bands in which the spectrum changes markedly do not reflect the characteristics of the input speech very much. The spectrum within such a band is averaged, with weights, with the spectrum of the previous frame, so that the spectra become common to some extent and the inter-frame change is kept within a certain bound.

[0007]

[Operation] Various modifications of the present invention are conceivable; the operation of a representative means is described here.

[0008] When the speech to be transmitted is input, the analysis unit extracts a feature vector, which is compared in turn with the code vectors in the codebook, and the nearest vector is selected. Unless the same code vector is selected in both frames, the distance between the code vector selected in the previous frame and the one selected in the current frame becomes larger as the codebook becomes smaller.

[0009] Therefore, for portions in which the power does not change abruptly between frames, the spectrum information of the previous frame is first stored. Next, the code vector selected in the current frame is restored to a spectrum, and in the frequency region where the spectral discontinuity is large and abnormal sounds are pronounced, a weighted average is taken with the stored spectrum.

[0010] In the portions subjected to averaging, where the power does not change abruptly, the inherent characteristics of the speech rarely differ much between consecutive frames. In addition, this method averages the spectrum of the selected code vector only in the frequency band where abnormal sounds are pronounced and applies no processing at all to the other bands, so the spectral modification hardly impairs the intelligibility of the reproduced speech. Furthermore, all processing of this method is performed on the receiving side only, so the information sent from the transmitting side is unchanged and the amount of transmitted data does not increase.

[0011]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0012] FIG. 1 is a block diagram for explaining one embodiment of the present invention. Only one direction, pairing a transmitting side with a receiving side, is shown; the communication path in the reverse direction is omitted to keep the figure simple.

[0013] In FIG. 1, input speech 101 passes through an analog-to-digital (A/D) converter 102 and is input to a two-bank (double-buffered) buffer memory 103. This memory is provided to adjust the timing of the subsequent processing and to prevent interruption of the input speech. The speech from the buffer memory 103 is input to the analysis unit 104, which obtains level information 105, spectrum information 106, and pitch information 107. The spectrum information 106 is supplied to the vector quantization unit 108, which yields a vector code 109. The vector code 109, the pitch information 107, and the level information 105 are sent via the transmission unit 110 and the transmission path 111 to the reception unit 112. On the receiving side, the vector code 109′, pitch information 107′, and level information 105′ received by the reception unit 112 are supplied to the vector dequantization unit 113, where the spectrum information 114 is restored and supplied, together with the pitch information 107′ and the level information 105′, to the synthesis unit 115. The synthesis unit 115 decodes them into a speech waveform, which passes through a two-bank output buffer memory 116, is converted to an analog signal by a digital-to-analog (D/A) converter 117, and is reproduced as output speech 118.

[0014] Each part is described in detail below.

[0015] FIG. 2 is a diagram for explaining the analysis unit 104. In this embodiment, the analysis unit uses the power spectrum envelope (PSE) analysis method. The PSE analysis method is described in detail in the paper by Nakajima et al., "Power Spectrum Envelope (PSE) Speech Analysis-Synthesis System," Journal of the Acoustical Society of Japan, Vol. 44, No. 11 (November 1988). An outline is given here.

[0016] In FIG. 2, the pitch extraction unit 201 extracts the pitch information (pitch frequency or pitch period) of the input speech. Any known method, such as the correlation method or the AMDF method, may be used for pitch extraction. The waveform extraction unit 202 cuts out from the input speech a waveform segment for analyzing the spectrum information; the segment is about 20 to 60 ms long. The segment is often of fixed length, but it may also be of variable length, about three times the pitch period. The extracted waveform is sent to the Fourier transform unit 203 and converted to a Fourier series. Here the extracted waveform is multiplied by a commonly used window function such as a Hamming window, zero data are padded before and after it to give 2048 points, and a fast Fourier transform (FFT) is applied, which yields data with high frequency resolution at high speed. The absolute values of the Fourier coefficients are the frequency components of the extracted waveform, that is, its spectrum. When the extracted waveform has a periodic structure, the spectrum has a line-spectrum structure made up of the pitch harmonics.
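The windowing, zero-padding, and FFT steps of paragraph [0016] can be sketched as follows. This is a minimal illustration in plain Python, not the patent's implementation: the function names, the symmetric split of the padding, and the recursive radix-2 FFT are assumptions made for the example.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def magnitude_spectrum(segment, n_fft=2048):
    """Window the segment, zero-pad before and after to n_fft points, return |FFT|."""
    window = hamming(len(segment))
    windowed = [s * w for s, w in zip(segment, window)]
    pad = n_fft - len(windowed)
    front = pad // 2  # assumption: padding is split evenly front and back
    padded = [0.0] * front + windowed + [0.0] * (pad - front)
    return [abs(c) for c in fft(padded)]
```

For instance, with an assumed sampling rate of 8 kHz, a 40 ms segment (320 samples) of a 250 Hz sine should give a spectral peak close to bin 250 × 2048 / 8000 = 64.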

[0017] The pitch resampling unit 204 extracts, from the spectrum information obtained by the FFT, only the harmonic components of the pitch frequency (the line-spectrum components). In what follows, the data extracted in this way are treated as mapped onto the period π of the cosine series expansion described later.

[0018] The power spectrum conversion unit 205 squares each component of the spectrum to convert it to a power spectrum. The logarithm unit 206 then takes the logarithm of each component to obtain a logarithmic power spectrum.
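A rough sketch of the pitch resampling, squaring, and logarithm steps of paragraphs [0017] and [0018] follows. Several details are assumptions for illustration only: the nearest-bin harmonic picking, the mapping of 0 Hz to Nyquist onto λ in [0, π], the natural logarithm, and the small floor that avoids log(0).

```python
import math

def log_power_at_harmonics(magnitudes, pitch_hz, sample_rate_hz, floor=1e-12):
    """Return [(lambda_k, log power)] sampled at the pitch harmonics.

    magnitudes: |FFT| values for all n_fft bins.
    lambda_k maps the harmonic frequency onto [0, pi] (assumed mapping).
    """
    n_fft = len(magnitudes)
    bin_hz = sample_rate_hz / n_fft
    nyquist = sample_rate_hz / 2
    points = []
    k = 1
    while k * pitch_hz <= nyquist:
        b = round(k * pitch_hz / bin_hz)      # nearest FFT bin to the k-th harmonic
        power = magnitudes[b] ** 2            # power spectrum (unit 205)
        lam = math.pi * (k * pitch_hz) / nyquist
        points.append((lam, math.log(max(power, floor))))  # log power (unit 206)
        k += 1
    return points
```

The returned pairs are in a form that a cosine-series fit over λ can consume directly.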

[0019] The level normalization unit 207 absorbs level variations caused by the loudness of the input speech; the level may instead be extracted together with the other coefficients in the following cosine transform unit 208.

[0020] The cosine transform unit 208 uses the resampled logarithmic power spectrum data to represent the envelope approximately by a cosine series with a finite number of terms. The number of terms m is normally set to about 25. The power spectrum envelope is expressed as follows.

[0021]

[Equation 1]
Y = A0 + A0·A1·cos λ + A0·A2·cos 2λ + … + A0·Am·cos mλ

The coefficients A are determined so that the squared error between the resampled power spectrum data and Y given by Equation 1 is minimized. Since the zeroth coefficient A0 represents the input level, it is output as the level information 105, and A1, …, Am are output as the spectrum information 106.
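One way to carry out the least-squares fit of Equation 1 is sketched below. The patent does not state how the minimization is performed; this example assumes the normal equations solved by Gaussian elimination, and it first fits linear coefficients c_k = A0·A_k (with c_0 = A0) before dividing by A0. All function names are hypothetical, and A0 is assumed to be nonzero.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_pse(points, m):
    """Fit Y = A0 (1 + sum_k A_k cos(k*lambda)) to (lambda, value) pairs.

    Returns (A0, [A1, ..., Am]). Needs at least m + 1 distinct points.
    """
    basis = [[math.cos(k * lam) for k in range(m + 1)] for lam, _ in points]
    y = [v for _, v in points]
    # Normal equations: (B^T B) c = B^T y
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(m + 1)]
            for i in range(m + 1)]
    rhs = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(m + 1)]
    c = solve_linear(gram, rhs)
    a0 = c[0]
    return a0, [ck / a0 for ck in c[1:]]
```

Dividing by A0 is what separates the level information (A0) from the level-normalized shape coefficients A1, …, Am that form the vector to be quantized.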

[0022] Next, the vector quantization unit is described with reference to FIG. 3. In FIG. 3, the codebook 301 stores the element values of the code vectors and their codes. When the spectrum information (input vector) 106 is input to the matching unit 302, each code vector is read from the codebook 301 and its distance from the input vector 106 is calculated. The distance measure here is a Euclidean distance with a weight applied to each vector element, but any other suitable measure may of course be used. In a full search, the distance is calculated for every code vector in the codebook, and the code 109 of the code vector with the smallest distance (the nearest-neighbor vector) is output. The range of code vectors to be matched can also be restricted by using, for example, the pitch information 107. A technique such as complementary vector quantization (Japanese Patent Application No. Hei 3-75700) may also be adopted as the vector quantization method.
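The full search with a weighted Euclidean distance described in paragraph [0022] can be sketched as below. The function names and the representation of the codebook as a list of lists are illustrative assumptions; the squared distance is used because it has the same minimizer as the distance itself.

```python
def weighted_sq_distance(x, c, weights):
    """Squared Euclidean distance with a weight on each vector element."""
    return sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, c))

def full_search(x, codebook, weights):
    """Return the code (index) of the code vector nearest to x."""
    best_code, best_dist = 0, float("inf")
    for code, vec in enumerate(codebook):
        d = weighted_sq_distance(x, vec, weights)
        if d < best_dist:
            best_code, best_dist = code, d
    return best_code
```

With the toy codebook `[[0, 0], [1, 1], [2, 0]]` and input `[1.9, 1.0]`, equal weights select code 1, while weights `[1, 0]` (ignoring the second element) select code 2, which shows how the weighting changes the nearest neighbor.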

[0023] The codebook 301 is now described. Training data are processed in the same way as up to the analysis unit of FIG. 1 to obtain training vectors. The training vectors distributed in the feature space of the signal are classified into clusters fewer in number than the training vectors, and a representative vector is created from each cluster and registered in the codebook. One way to create the representative vector is to find the centroid of each cluster and take, as the representative vector of that cluster, the training vector in the cluster whose distance from the centroid is smallest. Another is to take as the representative vector the centroid computed from those training vectors whose distance from the cluster centroid lies within a predetermined range. In this embodiment, about 5000 training vectors were used to create a codebook of size 1024.
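The two ways of deriving a representative vector from a cluster in paragraph [0023] might look like the sketch below. The clustering step itself is not specified in the text, so the example takes one cluster as given; the function names, the plain Euclidean distance, and the fallback when no vector lies within the radius are assumptions.

```python
import math

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_member(cluster):
    """Method 1: the training vector closest to the cluster centroid."""
    g = centroid(cluster)
    return min(cluster, key=lambda v: distance(v, g))

def trimmed_centroid(cluster, radius):
    """Method 2: centroid of the training vectors within `radius` of the centroid."""
    g = centroid(cluster)
    inside = [v for v in cluster if distance(v, g) <= radius]
    return centroid(inside) if inside else g  # fallback is an assumption
```

Both methods reduce the pull of outlying training vectors compared with using the raw centroid: method 1 guarantees that the representative is an actually observed vector, while method 2 recomputes the mean after discarding distant members.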

[0024] Next, the decoding side (receiving side) is described.

[0025] In FIG. 1, when the vector dequantization unit 113 receives the vector code 109′, the corresponding code vector is read from the codebook. Needless to say, the codebook on the receiving side has the same contents as the codebook 301 on the transmitting side. The reproduced code vector y′ = {A1′, A2′, …, Am′} is sent to the synthesis unit 115 as the spectrum information 114.

[0026] Next, the synthesis unit 115 is described with reference to FIG. 4. The logarithmic power spectrum reproduction unit 401 in that figure is described with reference to FIG. 5. The spectrum conversion unit 501 uses the elements A1′, A2′, …, Am′ of the transmitted reproduced vector (spectrum information 114) to obtain the logarithmic power spectrum Y′ according to the following equation.

[0027]

[Equation 2]
Y′ = 1 + A1′·cos λ + A2′·cos 2λ + … + Am′·cos mλ

Next, the power comparison unit 502 compares the power with that of the previous frame. For a frame in which the power changes abruptly, such as a plosive, it passes its output to the level reproduction unit 512; for a frame in which the power does not change abruptly, it switches its output to the frequency discrimination unit 503. The frequency discrimination unit 503 separates the input logarithmic power spectrum 502 into the frequency component Fn′ 504, for which the spectrum is to be made common, and the remaining component 505. In this embodiment, the frequency band at 2 kHz and above is made common. The weighted averaging unit 506 retrieves the common component F(n−1) 508 of the previous frame's spectrum, stored in advance in the data storage unit 507, and takes a weighted average of it and the component Fn′ 504 of the input spectrum according to Equation 3.
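The decoding-side steps of paragraph [0027] up to the frequency split can be sketched as follows. The text does not give the criterion used by the power comparison unit, so the absolute-difference threshold here is purely an assumption, as are the function names, the uniform sampling of λ over [0, π], and the mapping of that range onto 0 Hz to the Nyquist frequency.

```python
import math

def log_spectrum(coeffs, n_points):
    """Evaluate Y'(lambda) = 1 + sum_k A_k' cos(k*lambda) at n_points (>= 2)
    values of lambda spaced uniformly over [0, pi]."""
    out = []
    for i in range(n_points):
        lam = math.pi * i / (n_points - 1)
        out.append(1.0 + sum(a * math.cos((k + 1) * lam)
                             for k, a in enumerate(coeffs)))
    return out

def power_changed_abruptly(level, prev_level, threshold):
    """Assumed criterion: level difference between frames exceeds a threshold."""
    return abs(level - prev_level) > threshold

def split_band(spectrum, sample_rate_hz, cutoff_hz=2000.0):
    """Split into (below cutoff, cutoff and above); needs len(spectrum) >= 2.
    Sample i is assumed to sit at frequency nyquist * i / (n - 1)."""
    n = len(spectrum)
    nyquist = sample_rate_hz / 2
    first_high = next((i for i in range(n)
                       if nyquist * i / (n - 1) >= cutoff_hz), n)
    return spectrum[:first_high], spectrum[first_high:]
```

The upper part returned by `split_band` corresponds to the component to be made common (504) and the lower part to the untouched component (505).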

[0028]

[Equation 3]
Fn = (W1·Fn′ + W2·F(n−1)) / (W1 + W2)

The weights W1 and W2 may also be varied with frequency. The output 509 of the weighted averaging unit 506 updates the data in the data storage unit 507 and is also recombined, in the frequency mixing unit 510, with the remaining frequency component 505 output from the frequency discrimination unit 503. The spectrum 511 output from the frequency mixing unit 510 is given, in the level reproduction unit 512, level information based on the loudness of the input speech by using the level information A0 105′.
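The weighted averaging of Equation 3, together with the storage update and the frequency mixing, can be sketched as a small stateful class. This is an illustration under stated assumptions, not the patent's circuit: the class and method names are hypothetical, the weights are scalars rather than per-frequency values, and the handling of the stored spectrum on the first frame and on frames with an abrupt power change (storing the unmodified upper band) is a choice the text does not specify.

```python
class HighBandSmoother:
    """Averages the upper band with the previous frame's upper band (Equation 3)."""

    def __init__(self, w1=1.0, w2=1.0):
        self.w1 = w1
        self.w2 = w2
        self.prev_high = None  # plays the role of data storage unit 507

    def process(self, low, high, abrupt):
        """Return the recombined spectrum for one frame.

        low, high: lower and upper parts of the log power spectrum.
        abrupt: True if the power changed abruptly (averaging is bypassed).
        """
        if abrupt or self.prev_high is None:
            # Assumption: on bypass, remember the unmodified upper band.
            self.prev_high = list(high)
            return list(low) + list(high)
        smoothed = [(self.w1 * h + self.w2 * p) / (self.w1 + self.w2)
                    for h, p in zip(high, self.prev_high)]
        self.prev_high = smoothed  # output 509 updates the stored data
        return list(low) + smoothed  # frequency mixing (unit 510)
```

Because the stored value is the already averaged output, the smoothing is recursive: with W1 = W2 each frame's upper band is pulled halfway toward the running result of the preceding frames.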

[0029] Returning to FIG. 4, the reproduced logarithmic power spectrum 402 is converted by the exponential conversion unit 403 into a zero-phase spectrum 404, which is sent to the inverse Fourier transform unit 405. The inverse Fourier transform unit 405 obtains a speech segment 406 by an inverse fast Fourier transform (IFFT). In the waveform synthesis unit 407, the speech segments 406 are added together while being shifted successively by the pitch interval according to the pitch information 107′, and the result is output as reproduced speech 408.
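The synthesis steps of paragraph [0029] can be sketched as below. For brevity the example uses a direct inverse DFT sum instead of an IFFT, assumes the log spectrum is a natural logarithm of power (so the amplitude is exp(Y/2)), and assumes all segments have equal length and a pitch period given in samples; the function names are hypothetical.

```python
import math

def zero_phase_segment(log_power_half, n_fft):
    """Build a real, symmetric speech segment from log power at bins 0..n_fft/2."""
    amp = [math.exp(0.5 * y) for y in log_power_half]  # amplitude = sqrt(power)
    full = amp + amp[-2:0:-1]  # mirror to a symmetric, zero-phase spectrum
    seg = [sum(full[k] * math.cos(2 * math.pi * k * n / n_fft)
               for k in range(n_fft)) / n_fft
           for n in range(n_fft)]
    half = n_fft // 2
    return seg[half:] + seg[:half]  # rotate so the peak sits in the middle

def overlap_add(segments, pitch_period):
    """Add equal-length segments, each shifted by pitch_period samples."""
    length = pitch_period * (len(segments) - 1) + len(segments[0])
    out = [0.0] * length
    for i, seg in enumerate(segments):
        for j, v in enumerate(seg):
            out[i * pitch_period + j] += v
    return out
```

A flat log power spectrum of zero gives a unit impulse centered in the segment, which is a convenient sanity check for the zero-phase construction.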

[0030] In this embodiment, the spectrum at frequencies of 2 kHz and above is averaged, with weights, with the spectrum of the previous frame, so abnormal sounds caused by temporal discontinuity in the spectrum of the synthesized speech are suppressed. Frequencies below 2 kHz are not processed at all, so the sound quality was improved without loss of intelligibility. As for the amount of transmitted information, the transmitting side uses the conventional vector quantization system unchanged, so the amount remains the same as before.

[0031]

[Effects of the Invention] According to the present invention, when the same codebook is used, abnormal sounds caused by temporal discontinuity in the spectrum of the synthesized speech are suppressed, and the sound quality can thereby be improved over conventional vector quantization without loss of intelligibility.

[0032] Although speech is used as the example throughout this description, the invention can of course be applied to other signals of similar structure.

[Brief description of drawings]

[FIG. 1] A diagram illustrating the system configuration of one embodiment of the present invention.

[FIG. 2] A diagram illustrating the analysis unit.

[FIG. 3] A diagram illustrating the vector quantization unit.

[FIG. 4] A diagram illustrating the synthesis unit.

[FIG. 5] A diagram illustrating the logarithmic power spectrum reproduction unit.

[Explanation of symbols]

101 ... input speech, 103, 116 ... buffer memory, 104 ... analysis unit, 106, 114 ... spectrum information, 107, 107′ ... pitch information, 108 ... vector quantization unit, 109, 109′ ... vector code, 113 ... vector dequantization unit, 115 ... synthesis unit, 119 ... output speech.

Claims

[Claims]

1. A vector quantization method in which the decoding side has a codebook storing mutually different codes and code vectors that represent the feature space of a signal and correspond to the codes, the code vector corresponding to a code transmitted from the encoding side for each frame is selected, and, when the selected code vector is restored to a feature quantity such as the spectrum of the signal, the feature quantity is modified by making the portion of the feature quantity that does not reflect the characteristics of the signal common with the corresponding portion used in the previous frame.

2. The vector quantization method according to claim 1, wherein the decoding side uses power information transmitted from the encoding side and modifies the feature quantity only for frames in which the change in the power information between frames is small.

3. The vector quantization method according to claim 1, wherein, in modifying the feature quantity, a weighted average is taken between the spectrum component restored from the codebook and the spectrum used in the previous frame.

4. The vector quantization method according to claim 3, wherein, in taking the weighted average between the spectra, the spectrum components to be averaged are limited to a predetermined frequency region.

5. The vector quantization method according to claim 3, wherein, in taking the weighted average between the spectra, the weights used in the weighted average are varied for each frequency component.

6. The vector quantization method according to claim 1, wherein, in creating the codebook, the feature space of the signal is divided into clusters fewer in number than the training vectors by clustering a plurality of training vectors distributed in the feature space of the signal, and, for each cluster, the training vector whose distance from the cluster centroid is smallest is adopted as the representative vector.

7. The vector quantization method according to claim 1, wherein, in creating the codebook, the feature space of the signal is divided into clusters fewer in number than the training vectors by clustering a plurality of training vectors distributed in the feature space of the signal, and, for each cluster, a representative vector is created from the training vectors whose distances from the cluster centroid lie within a predetermined range.

8. A codebook creation method for creating a codebook that stores, on the decoding side, mutually different codes and code vectors that represent the feature space of a signal and correspond to the codes, wherein the feature space of the signal is divided into clusters fewer in number than the training vectors by clustering a plurality of training vectors distributed in the feature space of the signal, and, for each cluster, the training vector whose distance from the cluster centroid is smallest is adopted as the representative vector.

9. A codebook creation method for creating a codebook that stores, on the decoding side, mutually different codes and code vectors that represent the feature space of a signal and correspond to the codes, wherein the feature space of the signal is divided into clusters fewer in number than the training vectors by clustering a plurality of training vectors distributed in the feature space of the signal, and, for each cluster, a representative vector is created from the training vectors whose distances from the cluster centroid lie within a predetermined range.

10. A vector quantization apparatus comprising, on the decoding side: a codebook storing mutually different codes and code vectors that represent the feature space of a signal and correspond to the codes; means for selecting the code vector corresponding to a code transmitted from the encoding side for each frame; and means for modifying, when the selected code vector is restored to a feature quantity such as the spectrum of the signal, the feature quantity by making the portion of the feature quantity that does not reflect the characteristics of the signal common with the corresponding portion used in the previous frame.

11. The vector quantization apparatus according to claim 10, wherein the decoding side uses power information transmitted from the encoding side and modifies the feature quantity only for frames in which the change in the power information between frames is small.

12. The vector quantization apparatus according to claim 10, wherein the means for modifying the feature quantity includes means for taking a weighted average between the spectrum component restored from the codebook and the spectrum used in the previous frame.

13. The vector quantization apparatus according to claim 12, wherein the means for taking the weighted average between the spectra limits the spectrum components to be averaged to a predetermined frequency region.

14. The vector quantization apparatus according to claim 12, wherein the means for taking the weighted average between the spectra varies the weights used in the weighted average for each frequency component.