JP4191503B2

JP4191503B2 - Speech musical sound signal encoding method, decoding method, encoding device, decoding device, encoding program, and decoding program

Info

Publication number: JP4191503B2
Application number: JP2003035256A
Authority: JP
Inventors: 岳至森; 祐介日和▲崎▼; 丈太朗池戸; 徹森永; 大輔徳元
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-02-13
Filing date: 2003-02-13
Publication date: 2008-12-03
Anticipated expiration: 2023-02-13
Also published as: JP2004246038A

Description

【０００１】
【産業上の利用分野】
この発明は音声楽音信号符号化方法、復号化方法、符号化装置、復号化装置、符号化プログラム、および復号化プログラムに関し、特に、音声楽音の周波数帯域を分割して帯域毎に残差係数に対するベクトル長およびビット割り当てを行い音声楽音信号を高能率にディジタル符号に変換符号化して、このディジタル信号を音声楽音信号に復元復号化する音声楽音信号符号化方法、復号化方法、符号化装置、復号化装置、および符号化プログラム、復号化プログラムに関する。
【０００２】
【従来の技術】
音声信号および楽音信号を圧縮する従来方法として、入力信号をＭＤＣＴ（modified discrete cosine transform：変形離散コサイン変換）により一定サンプルの入力毎に時間／周波数変換し、周波数領域の一連の係数に変換してから符号化を行なう変換符号化方法が知られている。一例として、図１に示されるが如きＴｗｉｎＶＱ方式（transfer-domain weighted interleave vector quantization：周波数領域重み付けインタリーブベクトル量子化）（非特許文献１、特許文献１参照）は、ＭＤＣＴにより変換した周波数領域ＭＤＣＴ係数をＬＰＣ（linear predictiv coefficient:線形予測係数）スペクトル包絡、２段階のパワーにより平坦化を行ない、得られる出力信号、即ち、残差信号を重み付けベクトル量子化により量子化する方法である。
【０００３】
ベクトル量子化は、目標ベクトルとの間の距離が最小となるコードベクトルをコードブックから選択し、復号化装置でベクトルを先のコードベクトルから再生する方法である。この符号化方法は、低いビットレートでも高音質な符号化がなされるが、効率的に符号化するには符号化するベクトルを或る程度の長さの、即ち、５〜６次元の長さのベクトルに設定する必要がある。しかし、ベクトル長を長くするとパワー変動が急峻なデータを符号化する際に大きな符号化歪が発生するという問題がある。また、ベクトル長を短くすると、各ベクトルを符号化するに使用することができるビットが減るところから、目標ベクトルに近いコードベクトルがコードブック中に存在しない状態になりやすく、これが大きな符号化歪を発生させる原因となる。
【０００４】
【特許文献１】
特開平０８−０４４３９９号公報
【非特許文献１】
岩上、守谷、三樹、「周波数領域重み付けインターリーブベクトル量子化によるオーディオ符号化」、日本音響学会秋季講演論文、ｐｐ．３３９−３４０、１９９４−１０
【０００５】
【発明が解決しようとする課題】
この発明は、広帯域入力信号のスペクトルを符号化する音声楽音信号の変換符号化方法において帯域毎に異なる形状を持つスペクトルを効率的に符号化する方法およびこの符号化された信号を復号化する帯域分割音声楽音信号符号化方法、復号化方法およびこれらの方法を実行するプログラム記憶媒体を提供するものである。
【０００６】
【課題を解決するための手段】
離散音声楽音信号サンプルを入力としてディジタル符号を出力する音声楽音信号符号化方法において、時間系列の信号を一定数の入力サンプル毎に帯域分割フィルタにより帯域分割し、複数の帯域別時間系列の信号を求め、帯域別時間系列の信号を帯域毎に一定数の入力サンプル毎にＭＤＣＴ変換して周波数領域ＭＤＣＴ係数を求め、帯域別時間系列の信号から帯域毎に線形予測分析を行なってＬＰＣスペクトル包絡およびＬＰＣインデックスを算出し、帯域毎の周波数領域ＭＤＣＴ係数を帯域毎のＬＰＣスペクトル包絡により平坦化して平坦化入力係数を求め、帯域毎の平坦化入力係数を一定数の入力サンプル係数毎にパワーで正規化し、パワー正規化インデックスおよび残差入力係数を求め、帯域毎の残差入力係数をそれぞれベクトル量子化によりベクトル量子化インデックスを求め、このベクトル量子化は、帯域毎にＬＰＣスペクトル包絡、或はパワー正規化インデックス、或いは残差入力係数からベクトル長と量子化ビット割り当ておよびベクトル量子化を行い、必要に応じてベクトル長インデックスと量子化ビット割り当てインデックスを求める音声楽音信号符号化方法を構成した。
【０００７】
そして、ディジタル符号を入力して音声楽音信号を再生する音声楽音信号復号化方法において、帯域毎のパワー正規化インデックスから再生正規化パワーを求め、帯域毎のＬＰＣインデックスから再生ＬＰＣスペクトル包絡を求め、帯域毎にベクトル長と量子化ビット割り当てを行い、ベクトル量子化インデックスにより再生残差係数を求めてベクトル量子化復号化を行い、帯域毎の再生残差係数を再生正規化パワーにより逆平坦化し、再生平坦化係数を求め、帯域毎の再生平坦化係数を、帯域毎の再生ＬＰＣスペクトル包絡により逆平坦化し、再生ＭＤＣＴ係数を求め、帯域毎の再生ＭＤＣＴ係数を逆ＭＤＣＴ変換して複数の帯域別時間系列信号を求め、複数の帯域別時間系列信号から帯域合成フィルタにより時間系列の再生音声楽音信号を求める音声楽音信号復号化方法を構成した。
【０００８】
また、離散音声楽音信号サンプルを入力としてディジタル符号を出力する音声楽音信号符号化装置において、時間系列の信号を一定数の入力サンプル毎に帯域分割し、複数の帯域別時間系列の信号を求める帯域分割フィルタを具備し、帯域別時間系列の信号を帯域毎に一定数の入力サンプル毎にＭＤＣＴ変換して周波数領域ＭＤＣＴ係数を求めるＭＤＣＴ変換部２１０、２１１、２１２を具備し、帯域別時間系列の信号から帯域毎に線形予測分析を行なってＬＰＣスペクトル包絡およびＬＰＣインデックスを算出するＬＰＣ分析部２２０、２２１、２２２を具備し、帯域毎の周波数領域ＭＤＣＴ係数を帯域毎のＬＰＣスペクトル包絡により平坦化して平坦化入力係数を求めるＬＰＣ平坦化部２３０、２３１、２３２を具備し、帯域毎の平坦化入力係数を一定数の入力サンプル係数毎にパワー成分で正規化し、正規化パワーおよびパワー正規化インデックスを求めるパワー正規化部２４０、２４１、２４２を具備し、帯域毎の入力平坦化係数を正規化パワーにより正規化し、残差入力係数を計算する残差計算部２５０、２５１、２５２を具備し、ＬＰＣスペクトル包絡、或いは正規化パワー、或いは残差入力係数から各帯域のベクトル量子化で使用するベクトル長とベクトル量子化に使用するビットの割り当てを計算するベクトル長量子化ビット割り当て部２７０を具備し、帯域毎にベクトル長と量子化ビット割り当ておよびベクトル量子化を行ない、必要に応じてベクトル長インデックスと量子化ビット割り当てインデックスを求めるベクトル量子化部２６０、２６１、２６２を具備する音声楽音信号符号化装置を構成した。
【０００９】
更に、ディジタル符号を入力して音声楽音信号を再生する音声楽音信号復号化装置において、帯域毎のパワー正規化インデックスから再生正規化パワーを求めるパワー逆正規化部４４０、４４１、４４２を具備し、帯域毎のＬＰＣインデックスから再生ＬＰＣスペクトル包絡を求めるＬＰＣ合成部４２０、４２１、４２２を具備し、帯域毎にベクトル長と量子化ビット割り当てを行い、ベクトル量子化インデックスにより再生残差係数を求めてベクトル量子化復号化を行うベクトル量子化復号部４６０、４６１、４６２を具備し、帯域毎の再生残差係数を再生正規化パワーにより逆平坦化し、再生平坦化係数を求める残差逆平坦化部４５０、４５１、４５２を具備し、帯域毎の再生平坦化係数を、帯域毎の再生ＬＰＣスペクトル包絡により逆平坦化し、再生ＭＤＣＴ係数を求めるＬＰＣ逆平坦化部４３０、４３１、４３２を具備し、帯域毎の再生ＭＤＣＴ係数を逆ＭＤＣＴ変換して複数の帯域別時間系列信号を求める逆ＭＤＣＴ変換部４１０、４１１、４１２を具備し、複数の帯域別時間系列信号から時間系列の再生音声楽音信号を求める帯域合成フィルタ４００を具備する音声楽音信号復号化装置を構成した。
【００１０】
ここで、時間系列の信号を一定数の入力サンプル毎に帯域分割フィルタにより帯域分割し、複数の帯域別時間系列の信号を求め、帯域別時間系列の信号を帯域毎に一定数の入力サンプル毎にＭＤＣＴ変換して周波数領域ＭＤＣＴ係数を求め、帯域別時間系列の信号から帯域毎に線形予測分析を行なってＬＰＣスペクトル包絡およびＬＰＣインデックスを算出し、帯域毎の周波数領域ＭＤＣＴ係数を帯域毎のＬＰＣスペクトル包絡により平坦化して平坦化入力係数を求め、帯域毎の平坦化入力係数を一定数の入力サンプル係数毎にパワーで正規化し、パワー正規化インデックスおよび残差入力係数を求め、帯域毎の残差入力係数をそれぞれベクトル量子化によりベクトル量子化インデックスを求め、このベクトル量子化は、帯域毎にベクトル長と量子化ビット割り当ておよびベクトル量子化を行ない、必要に応じてベクトル長インデックスと量子化ビット割り当てインデックスを求める指令を実行する音声楽音信号符号化プログラムを構成した。
【００１１】
そして、帯域毎のパワー正規化インデックスから再生正規化パワーを求め、帯域毎のＬＰＣインデックスから再生ＬＰＣスペクトル包絡を求め、帯域毎にベクトル長と量子化ビット割り当てを行い、ベクトル量子化インデックスにより再生残差係数を求めてベクトル量子化復号化を行い、帯域毎の再生残差係数を再生正規化パワーにより逆平坦化し、再生平坦化係数を求め、帯域毎の再生平坦化係数を、帯域毎の再生ＬＰＣスペクトル包絡により逆平坦化し、再生ＭＤＣＴ係数を求め、帯域毎の再生ＭＤＣＴ係数を逆ＭＤＣＴ変換して複数の帯域別時間系列信号を求め、複数の帯域別時間系列信号から帯域合成フィルタにより時間系列の再生音声楽音信号を求める指令を実行する音声楽音信号復号化プログラムを構成した。
【００１２】
【発明の実施の形態】
符号器は、入力された信号系列を一定時間毎に帯域分割フィルタにより複数帯域に分割し、それぞれの時間系列信号をＭＤＣＴにより周波数領域のＭＤＣＴ係数に変換し、ＬＰＣスペクトル包絡により正規化を行ない平坦化入力係数を算出した後、パワーにより正規化を行ない、残差入力係数を得る。帯域毎に計算されるＬＰＣスペクトル包絡とパワー正規化係数と残差入力係数より、各帯域においてベクトル量子化に使用するビット数およびベクトル長を計算し、各帯域毎に算出される残差入力係数をベクトル量子化する。
復号器は、ベクトル量子化復号を行ない再生平坦化係数を算出する。この再生平坦化係数を再生正規化パワーおよび再生ＬＰＣスペクトル包絡により逆平坦化を行なった後、周波数／時間変換により帯域別出力時間領域信号を得て、これら出力時間領域信号を帯域合成フィルタにより合成し、出力信号を得る。この方法は、各帯域の重要度に応じた品質の制御を特に低ビットレートの符号化に効果のあるベクトル量子化により実現することができる。
【００１３】
【実施例】
この発明の実施例を図を参照して説明する。
図２および図４はこの発明の第１の実施例を説明する図である。図２に示される第１の実施例における符号化装置は、帯域分割フィルタ部２００とＭＤＣＴ変換部２１０、２１１、２１２と、ＬＰＣ分析部２２０、２２１、２２２と、ＬＰＣ平坦化部２３０、２３１、２３２と、パワー正規化部２４０、２４１、２４２と、残差計算部２５０、２５１、２５２と、ベクトル量子化部２６０、２６１、２６２と、ベクトル長量子化ビット割り当て部２７０より構成される。入力端子２０１から入力した入力信号である音声楽音信号の離散サンプル列は、帯域分割フィルタ部２００に入力される。入力信号は帯域分割フィルタ部２００において帯域別時間系列信号に変換される。この実施例において、入力信号は３帯域に分割している。一例として、０ｋＨｚから１６ｋＨｚに亘る広帯域の音声楽音信号を３２ｋＨｚでサンプリングして入力信号とし、この入力信号をOｋＨｚから４ｋＨｚ迄の帯域、４ｋＨｚから８ｋＨｚ迄の帯域、８ｋＨｚから１６ｋＨｚ迄の帯域の３帯域に分割すると効果が高い。この通りに３分割された帯域別時間系列信号は、各帯域のＭＤＣＴ変換部２１０、２１１、２１２に入力され、ここにおいて変形離散コサイン変換によりそれぞれ周波数領域の入力ＭＤＣＴ係数に変換されて、各帯域のＬＰＣ平坦化部２３０、２３１、２３２に送信される。３分割された帯域別時間系列信号は、また、各帯域のＬＰＣ分析部２２０、２２１、２２２にも入力され、ここにおいてこの入力信号に基づいてＬＰＣスペクトル包絡が算出され、各帯域のＬＰＣ平坦化部２３０、２３１、２３２とベクトル長量子化ビット割り当て部２７０に送信される。ここで、ベクトル長、量子化ビット割り当てには、以下の３通りがある。
（Ａ）ＬＰＣスペクトル包絡より求める。
（Ｂ）パワー正規化インデックスより求める。
（Ｃ）残差入力係数から求める。この場合のみベクトル長インデックスと量子化ビット割り当てインデックスが必要とされる。
【００１４】
入力ＭＤＣＴ係数については、効率的な量子化を行なうために、各帯域のＬＰＣ平坦化部２３０、２３１、２３２においてＬＰＣスペクトル包絡により平坦化された平坦化入力係数が計算され、それぞれ各帯域のパワー正規化部２４０、２４１、２４２と残差計算部２５０、２５１、２５２とに送信される。各帯域のパワー正規化部２４０、２４１、２４２は、平坦化入力係数からのパワー成分を計算し、正規化パワーをそれぞれの残差計算部２５０、２５１、２５２とベクトル長量子化ビット割り当て部２７０に送信する。以上の計算は、一定数サンプルにおける平均パワーの平方根、一定数サンプル内での最大振幅の絶対値を使用すると効果的である。
各帯域の残差計算部２５０、２５１、２５２は、入力平坦化係数を正規化パワーにより正規化し、残差入力係数を計算し、計算結果をそれぞれのベクトル量子化部２６０、２６１、２６２に送信する。ベクトル長量子化ビット割り当て部２７０は、ＬＰＣスペクトル包絡と正規化パワーから各帯域のベクトル量子化で使用するベクトル長とベクトル量子化に使用するビットの割り当てを計算し、計算結果を各帯域のベクトル量子化部２６０、２６１、２６２に送信する。ここで、例えば、ベクトル長の上限をＶｓ−t、下限をＶs−bとし、各帯域の正規化パワーの平均をＰ（ｋ）（ｋは帯域番号）とすると、帯域番号ｋにおけるベクトル量子化部で使用するベクトル長Ｖ（ｋ）は、
【数１】

但し、(ｉｎｔ）は整数化を表す。
により計算することができる。ここで、例えば、Ｖs−t＝６、Ｖs−b＝２とすることにより、効果的なベクトル長を決定することができる。また、ベクトルＶs−t（ｋ、ｎ）（ｋは帯域番号、ｎはベクトル番号)のビット割り当てＢit（Ｖ−t（ｋ、ｎ））は、帯域別に符号化に使用することができるビット数をＢit−total（ｋ）とし、ＬＰＣスペクトルをＬＰＣ（ｋ、ｓ）(ｋは帯域番号、ｓはサンプル番号)とすると、
【数２】

により計算することができる。各帯域のベクトル量子化部２６０、２６１、２６２は、残差入力係数を、ベクトル長、ビット割り当て情報を用いてベクトル量子化して、ベクトル量子化インデックスを計算する。
ベクトル長を、その帯域の正規化残差パワーの全帯域パワーの合計に対する比に基づいて決められることにより、パワーが大きくなる程ベクトル長を短く設定することができる。そして、式（２）および後で説明される式（４）の内のｓ∈Ｖ_t（ｋ、ｎ）とは、Ｖ_ｔ（ｋ、ｎ）(帯域ｋにおけるｎ番目のベクトル)に含まれるベクトル長個分あるベクトル要素サンプルｓを示し、式（２）（４）においてベクトル要素サンプルｓに関するＬＰＣスペクトル和を計算することにより、ベクトルのパワーが大きい程多くの情報量が割り当てられる。以上のことから、パワーが大きな帯域ほどベクトル長を短くし、ビット割り当てを多く設定することにより、帯域内パワー変動が激しくとも符号化品質劣化を抑制することができる。ここで、ベクトル長は帯域毎に決定され、帯域別で且つビット割り当て応じた符号帳を使用する。符号帳にはビット割り当てに応じた個数の符号ベクトルが符号と対応付けて記憶されている。この個数は、一般に、２のビット割り当て個数乗個であり、ビット割り当てが多いほどベクトル個数が大きい。復号化においても、ベクトル長、ビット割り当て決定後にこの様な符号帳を使用し、入力符号に対するベクトルを再生する。
【００１５】
図４に示される第１の実施例における復号化装置は、帯域合成フィルタ４００と、逆ＭＤＣＴ変換部４１０、４１１、４１２と、ＬＰＣ合成部４２０、４２１、４２２と、ＬＰＣ逆平坦化部４３０、４３１、４３２と、パワー逆正規化部４４０、４４１、４４２と、残差逆平坦化部４５０、４５１、４５２と、ベクトル量子化復号部４６０、４６１、４６２と、ベクトル長量子化ビット割り当て部４７０から構成される。ＬＰＣ合成部４２０、４２１、４２２、パワー逆正規化部４４０、４４１、４４２、ベクトル量子化復号部４６０、４６１、４６２から入力された符号ビット列を復号し、時間領域の離散サンプル列である音声楽音信号を出力端子４０１から出力する。即ち、各帯域のＬＰＣ合成部４２０、４２１、４２２は入力されたＬＰＣインデックスから再生ＬＰＣスペクトル包絡を算出し、ＬＰＣ逆平坦化部４３０、４３１、４３２とベクトル長量子化ビット割り当て部４７０に送信する。また、パワー逆正規化部４４０、４４１、４４２は入力されたパワー正規化インデックスから正規化パワーを計算して、残差逆平坦化部４５０、４５１、４５２とベクトル長量子化ビット割り当て部４７０に送信する。ベクトル長量子化ビット割り当て部４７０は、図２に示されるベクトル長量子化ビット割り当て部２７０と同様の計算により、各帯域でベクトル量子化に使用するベクトル長およびビット割り当てを計算し、計算結果を各帯域のベクトル量子化復号部４６０、４６１、４６２に送信する。各帯域のベクトル量子化復号部４６０、４６１、４６２は、ベクトル量子化インデックスと以上において計算されたベクトル長、量子化ビット割り当てを使ってベクトル量子化復号を行ない、再生残差係数を計算し、計算結果をそれぞれの残差逆平坦化部４５０、４５１、４５２に送信する。各帯域の残差逆平坦化部４５０、４５１、４５２は、再生残差係数を正規化パワーにより逆正規化し、再生平坦化係数を算出し、計算結果をそれぞれのＬＰＣ逆平坦化部４３０、４３１、４３２に送信する。各帯域のＬＰＣ逆平坦化部４３０、４３１、４３２は、再生平坦化係数を再生ＬＰＣスペクトル包絡により逆平坦化し、再生ＭＤＣＴ係数を計算して、それぞれの逆ＭＤＣＴ変換部４１０、４１１、４１２に送信する。各帯域の逆ＭＤＣＴ変換部４１０、４１１、４１２は、再生ＭＤＣＴ係数を逆ＭＤＣＴ計算することで、帯域別時間系列信号を計算し、計算結果を帯域合成フィルタ部４００に送信する。帯域合成フィルタ部４００は、各帯域から出力された時間系列信号を合成し、時間領域の出力サンプル系列に変換し、復号結果として出力端子４０１から出力される。
以上の図２および図４による第１の実施例の場合は、ベクトル長、ビット割り当てを、帯域毎に、符号化の対象となるパワー正規化係数、包絡成分から算出するので、ベクトル長、ビット割り当てに関する情報を符号化出力とする必要はない。従って、復号化においては、伝送したパワー正規化係数、包絡成分に基づいてベクトル長、ビット割り当てを行った上で伝送符号からの残差成分の再生を行っている。
【００１６】
図３および図５はこの発明の第２の実施例を説明する図である。
図３に示される符号化装置は、帯域分割フィルタ部３００と、ＭＤＣＴ変換部３１０、３１１、３１２と、ＬＰＣ分析部３２０、３２１、３２２と、ＬＰＣ平坦化部３３０、３３１、３３２と、パワー正規化部３４０、３４１、３４２と、残差計算部３５０、３５１、３５２と、ベクトル量子化部３６０、３６１、３６２と、ベクトル長量子化ビット割り当て部３７０より構成され、音声楽音信号の離散サンプル列を端子３０１に入力し、符号化したビット列をＬＰＣ分析部３２０、３２１、３２２、パワー正規化部３４０、３４１、３４２、ベクトル量子化部３６０、３６１、３６２より出力する。即ち、入力信号は入力端子３０１を介して帯域分割フィルタ部３００に入力され、ここにおいて帯域別時間系列信号に変換される。この実施例においては、入力信号は３帯域に分割される。一例として、０ｋＨｚから１６ｋＨｚに亘る広帯域の音声楽音信号を３２ｋＨｚでサンプリングして入力信号とし、この入力信号を０ｋＨｚから４ｋＨｚ迄の帯域、４ｋＨｚから８ｋＨｚ迄の帯域、８ｋＨｚから１６ｋＨｚ迄の帯域の３帯域に分割すると効果が高い。この通りに３分割された帯域別時間系列信号は、各帯域のＭＤＣＴ変換部３１０、３１１、３１２において、変形離散コサイン変換によりそれぞれ周波数領域の入力ＭＤＣＴ係数に変換され、変換結果を対応するＬＰＣ平坦化部３３０、３３１、３３２に送信する。３分割された帯域別時間系列信号は、また、対応するＬＰＣ分析部３２０、３２１、３２２にも送信され、ここにおいてＬＰＣスペクトル包絡が入力信号より算出され、ＬＰＣ平坦化部３３０、３３１、３３２に送信される。入力ＭＤＣＴ係数について、効率的な量子化を行なうために、対応するＬＰＣ平坦化部３３０、３３１、３３２においてＬＰＣスペクトル包絡により平坦化されて平坦化入力係数が計算され、対応する残差計算部３５０、３５１、３５２とパワー正規化部３４０、３４１、３４２に送信される。各帯域のパワー正規化部３４０、３４１、３４２は平坦化入力係数からのパワー成分を計算し、残差計算部３５０、３５１、３５２に送信する。以上の計算は、一定数サンプルにおける平均パワーの平方根、一定数サンプル内での最大振幅の絶対値を使用すると効果的である。各帯域の残差計算部３５０、３５１、３５２は、入力平坦化係数を正規化パワーにより正規化し、残差入力係数を計算し、対応するベクトル量子化部２６０、２６１、２６２とベクトル長量子化ビット割り当て部３７０に送信する。ベクトル長量子化ビット割り当て部３７０は、残差入力係数から各帯域のベクトル量子化で使用するベクトル長とベクトル量子化に使用するビットの割り当てを計算し、ベクトル長および量子化ビット割り当てを対応するベクトル量子化部３６０、３６１、３６２に送信する。そして、これらベクトル量子化部３６０、３６１、３６２はベクトル量子化インデックスを計算する。ここで、例えば、ベクトル長の上限をＶs−t、下限をＶs−bとし、各帯域の残差入力信号のパワーをＰ（ｋ、ｓ）（ｋは帯域番号、ｓはサンプル番号）とすると、帯域番号ｋのベクトル量子化におけるベクトル長Ｖ（ｋ）は
【数３】

但し、（ｉｎｔ）は整数化を表す。
により計算することができる。ここで、例えばＶs−t＝６、Ｖs−b＝２とすることにより、効果的なベクトル長を決定することができる。また、各帯域におけるベクトルＶ−t（ｋ、ｎ）（ｋは帯域番号、ｎはベクトル番号）のビット割り当てＢit（Ｖ−t（ｋ、ｎ））は、帯域別に符号化に使用することができるビット数をＢit−total（ｋ）としたとき、
【数４】

により計算することができる。各帯域のベクトル量子化部３６０、３６１、３６２は、残差入力係数を、ベクトル長、ビット割り当て情報を用いてベクトル量子化し、ベクトル量子化インデックスを計算する。
図５に示される第２の実施例における復号化装置は各帯域の帯域合成フィルタ５００と、逆ＭＤＣＴ変換部５１０、５１１、５１２と、ＬＰＣ合成部５２０、５２１、５２２と、ＬＰＣ逆平坦化部５３０、５３１、５３２と、パワー逆正規化部５４０、５４１、５４２と、残差逆平坦化部５５０、５５１、５５２と、ベクトル量子化復号部５６０、５６１、５６２から構成されて、ＬＰＣ合成部５２０、５２１、５２２、パワー逆正規化部５４０、５４１、５４２、ベクトル量子化復号部５６０、５６１、５６２から入力された符号ビット列を復号し、時間領域の離散サンプル列である音声楽音信号を出力端子５０１から出力する。即ち、各帯域のＬＰＣ合成部５２０、５２１、５２２は、入力されたＬＰＣインデックスから再生ＬＰＣスペクトル包絡を算出し、対応するＬＰＣ逆平坦化部５３０、５３１、５３２に送信する。また、各帯域のパワー逆正規化部５４０、５４１、５４２は、パワー正規化インデックスから正規化パワーを計算し、残差逆平坦化部５５０、５５１,５５２に送信する。各帯域のベクトル量子化復号部５６０、５６１、５６２は、ベクトル量子化インデックスとベクトル長および量子化ビット割り当てからベクトル量子化復号を行ない、再生残差係数を計算し、残差逆平坦化部５５０、５５１、５５２に送信する。各帯域の残差逆平坦化部５５０、５５１、５５２は、再生残差係数を正規化パワーにより逆正規化し、再生平坦化係数を算出し、ＬＰＣ逆平坦化部５３０、５３１、５３２に送信する。ＬＰＣ逆平坦化部５３０、５３１、５３２は、再生平坦化係数を再生ＬＰＣスペクトル包絡により逆平坦化し、再生ＭＤＣＴ係数を計算して、逆ＭＤＣＴ変換部５１０、５１１、５１２に送信する。逆ＭＤＣＴ変換部５１０、５１１、５１２は再生ＭＤＣＴ係数を逆ＭＤＣＴ計算することで、帯域別時間系列信号を計算し、帯域合成フィルタ部５００に送信する。帯域合成フィルタ部５００は、各帯域から出力された時間系列信号を合成し、時間領域の出力サンプル系列に変換し、復号結果として出力端子５０１に出力する。
以上の図３および図５による第２の実施例の場合は、残差成分に対するベクトル長、ビット割り当てを量子化前の値に基づいて行うので、ベクトル長、ビット割り当て情報を符号化し、復号においてはこれらを再生してベクトル長、ビット割り当てを決めてから残差成分を再生するところが、第１の実施例と異なるところである。
【００１７】
図６はこの発明による符号化方法および復号化方法をコンピュータで実施する場合の構成を示す。コンピュータ６００は、バス６８０を介して互いに接続されたＣＰＵ６１０、ＲＡＭ６２０、ＲＯＭ６３０、入出力インタフェース６４０、ハードディスク６５０を含んでいる。ＲＯＭ６３０にはコンピュータ６００を動作させる基本プログラムが格納されており、ハードディスク６５０は前述したこの発明による符号化方法および復号化方法を実行するプログラムが予め格納されている。
符号化時には、ＣＰＵ６１０はハードディスク６５０から符号化プログラムをＲＡＭ６２０にロードし、インタフェース６４０から入力されたオーディオ信号サンプルを符号化プログラムに従って処理することにより符号化し、インタフェース６４０から出力する。復号時には、復号プログラムをハードディスク６５０からＲＡＭ６２０にロードし、入力信号を復号プログラムに従って処理してオーディオ信号サンプルを出力する。
【００１８】
この発明による符号化方法および復号化方法を実行するプログラムは、内部バス６８０に駆動装置６６０を介して接続された外部ディスク装置６７０に記録されたものを使用しても良い。或いは、インタフェース６４０を介して外部ネットワークからプログラムをダウンロードしてハードディスク６５０に格納したものでも良い。この発明による符号化、復号化方法を実行するプログラムが記録された記憶媒体としては、磁気記録媒体、ＩＣメモリ、コンパクトディスクなどの形態の記憶媒体であっても良い。
【００１９】
【発明の効果】
上述した通りであって、この発明は、帯域毎に異なる形状を持つ広帯域音声楽音信号を効率的に符号化、復号化することができる。音声楽音信号は、これを周波数を横軸にとり、パワーを縦軸にとって一例として帯域に分割して示した場合、これら３帯域はそれぞれ各別の異なる形状のスペクトルを持つ。ベクトル量子化に使用する全体として一定量のビット数およびベクトル長を、各帯域別の一定値により固定的に割り当てることをしないで、各帯域の重要度、必要性を勘案しこれに対応した適正な分配、割り当てを行うことにより、広帯域の音声楽音信号を効率的に符号化する。
【図面の簡単な説明】
【図１】ベクトル量子化利用の変換符号化方法の一例を説明する図。
【図２】第１の実施例における符号化器を説明する図。
【図３】第２の実施例における符号化器を説明する図。
【図４】第１の実施例における復号化器を説明する図。
【図５】第２の実施例における復号化器を説明する図。
【図６】符号化、復号化方法を実施するコンピュータを示す図。
【符号の説明】
２０１帯域分割フィルタ
２１０、２１１、２１２ＭＤＣＴ変換部
２２０、２２１、２２２ＬＰＣ分析部
２３０、２３１、２３２ＬＰＣ平坦化部
２４０、２４１、２４２パワー正規化部
２５０、２５１、２５２残差計算部
２６０、２６１、２６２ベクトル量子化部
２７０ベクトル長量子化ビット割り当て部
４００帯域合成フィルタ
４１０、４１１、４１２逆ＭＤＣＴ変換部
４２０、４２１、４２２ＬＰＣ合成部
４３０、４３１、４３２ＬＰＣ逆平坦化部
４４０、４４１、４４２パワー逆正規化部
４５０、４５１、４５２残差逆平坦化部
４６０、４６１、４６２ベクトル量子化復号部

[0001]
[Field of Industrial Application]
The present invention relates to a speech and musical tone signal encoding method, decoding method, encoding device, decoding device, encoding program, and decoding program. More particularly, it relates to methods, devices, and programs that divide the frequency band of a speech and musical tone signal into bands, determine the vector length and bit allocation for the residual coefficients of each band, transform-code the signal into a digital code with high efficiency, and decode that digital code to restore the speech and musical tone signal.
[0002]
[Prior art]
As a conventional method for compressing speech signals and musical tone signals, transform coding is known, in which the input signal is time/frequency-transformed by the MDCT (modified discrete cosine transform) for every fixed number of input samples, converted into a series of frequency-domain coefficients, and then encoded. One example is the TwinVQ scheme (transform-domain weighted interleave vector quantization) shown in FIG. 1 (see Non-Patent Document 1 and Patent Document 1). In TwinVQ, the frequency-domain MDCT coefficients obtained by the MDCT are flattened by an LPC (linear prediction coefficient) spectrum envelope and by power in two stages, and the resulting output signal, that is, the residual signal, is quantized by weighted vector quantization.
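The frame-wise time/frequency transform mentioned above can be illustrated with a minimal sketch of a direct (O(N²)) MDCT and inverse MDCT using 50% overlap-add. The sine window, the 2/N scaling on the inverse, and all function names are assumptions chosen for illustration; the text does not specify a window or a normalization.

```python
import math


def sine_window(n_coef):
    # Assumed window: satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef)) for n in range(2 * n_coef)]


def mdct(frame, n_coef):
    # 2N windowed time samples -> N frequency-domain coefficients.
    w = sine_window(n_coef)
    return [
        sum(
            w[n] * frame[n]
            * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(2 * n_coef)
        )
        for k in range(n_coef)
    ]


def imdct(coefs, n_coef):
    # N coefficients -> 2N windowed, time-aliased samples (aliasing cancels on overlap-add).
    w = sine_window(n_coef)
    return [
        2.0 / n_coef * w[n]
        * sum(
            coefs[k] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for k in range(n_coef)
        )
        for n in range(2 * n_coef)
    ]


def analyze_synthesize(signal, n_coef):
    # Transform every n_coef new samples, invert, and overlap-add the halves.
    if len(signal) % n_coef != 0:
        raise ValueError("signal length must be a multiple of n_coef")
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        rec = imdct(mdct(padded[start:start + 2 * n_coef], n_coef), n_coef)
        for i, value in enumerate(rec):
            out[start + i] += value
    return out[n_coef:n_coef + len(signal)]
```

Without quantization between `mdct` and `imdct`, the overlap-add output matches the input up to rounding error, which is why the coefficients can be coded frame by frame.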
[0003]
Vector quantization is a method in which the code vector whose distance from the target vector is smallest is selected from a codebook, and the decoding device reproduces the vector from that code vector. This coding method achieves high sound quality even at low bit rates, but for efficient coding the vectors to be coded must be set to a certain length, namely about 5 to 6 dimensions. However, a long vector length causes large coding distortion when data with steep power fluctuations is coded. Conversely, a short vector length reduces the number of bits available for coding each vector, so the codebook is less likely to contain a code vector close to the target vector, which also causes large coding distortion.
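The codebook search described above can be sketched as follows. This is a minimal illustration that assumes an unweighted squared Euclidean distance and a tiny hand-written codebook; TwinVQ itself uses a weighted distance, and the names `nearest_code_index` and `decode` are hypothetical.

```python
def nearest_code_index(target, codebook):
    """Return the index of the code vector closest to target (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((t - c) ** 2 for t, c in zip(target, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def decode(index, codebook):
    """The decoder holds the same codebook and simply looks the vector up."""
    return list(codebook[index])


# Illustrative 2-bit codebook: 2**2 = 4 code vectors of length 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
index = nearest_code_index([0.9, -0.8], codebook)
reproduced = decode(index, codebook)
```

Only `index` is transmitted, so the coding distortion is the distance between the target and `reproduced`; it grows when no code vector lies near the target.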
[0004]
[Patent Document 1]
Japanese Patent Laid-Open No. 08-044399
[Non-Patent Document 1]
Iwakami, Moriya, Miki, “Audio coding by frequency domain weighted interleaved vector quantization”, Acoustical Society of Japan Autumn Lecture, pp. 339-340, 1994-10
[0005]
[Problems to be solved by the invention]
For transform coding of speech and musical tone signals in which the spectrum of a wideband input signal is coded, the present invention provides a method of efficiently coding a spectrum whose shape differs from band to band and a method of decoding the coded signal; that is, a band-division speech and musical tone signal encoding method, a decoding method, and a program storage medium for executing these methods.
[0006]
[Means for Solving the Problems]
A speech and musical tone signal encoding method that receives discrete speech and musical tone signal samples and outputs a digital code is configured as follows: the time-series signal is divided into bands by a band division filter for every fixed number of input samples to obtain a plurality of band-specific time-series signals; the band-specific time-series signals are MDCT-transformed in each band for every fixed number of input samples to obtain frequency-domain MDCT coefficients; linear prediction analysis is performed on the band-specific time-series signals in each band to calculate an LPC spectrum envelope and an LPC index; the frequency-domain MDCT coefficients of each band are flattened by the LPC spectrum envelope of that band to obtain flattened input coefficients; the flattened input coefficients of each band are normalized by power for every fixed number of input sample coefficients to obtain a power normalization index and residual input coefficients; and the residual input coefficients of each band are vector-quantized to obtain vector quantization indices. In this vector quantization, the vector length and quantization bit allocation are determined for each band from the LPC spectrum envelope, the power normalization index, or the residual input coefficients, vector quantization is performed, and a vector length index and a quantization bit allocation index are obtained as necessary.
[0007]
A speech and musical tone signal decoding method that receives a digital code and reproduces a speech and musical tone signal is configured as follows: reproduced normalized power is obtained from the power normalization index of each band; a reproduced LPC spectrum envelope is obtained from the LPC index of each band; the vector length and quantization bit allocation are determined for each band, and vector quantization decoding is performed to obtain reproduced residual coefficients from the vector quantization indices; the reproduced residual coefficients of each band are inversely flattened by the reproduced normalized power to obtain reproduced flattened coefficients; the reproduced flattened coefficients of each band are inversely flattened by the reproduced LPC spectrum envelope of that band to obtain reproduced MDCT coefficients; the reproduced MDCT coefficients of each band are inverse-MDCT-transformed to obtain a plurality of band-specific time-series signals; and a time-series reproduced speech and musical tone signal is obtained from the plurality of band-specific time-series signals by a band synthesis filter.
[0008]
In addition, a speech and musical tone signal encoding device that receives discrete speech and musical tone signal samples and outputs a digital code is configured to comprise: a band division filter that divides the time-series signal into bands for every fixed number of input samples to obtain a plurality of band-specific time-series signals; MDCT conversion units 210, 211, and 212 that MDCT-transform the band-specific time-series signals in each band for every fixed number of input samples to obtain frequency-domain MDCT coefficients; LPC analysis units 220, 221, and 222 that perform linear prediction analysis on the band-specific time-series signals in each band to calculate an LPC spectrum envelope and an LPC index; LPC flattening units 230, 231, and 232 that flatten the frequency-domain MDCT coefficients of each band by the LPC spectrum envelope of that band to obtain flattened input coefficients; power normalization units 240, 241, and 242 that normalize the flattened input coefficients of each band by their power component for every fixed number of input sample coefficients to obtain normalized power and a power normalization index; residual calculation units 250, 251, and 252 that normalize the flattened input coefficients of each band by the normalized power to calculate residual input coefficients; a vector length quantization bit allocation unit 270 that calculates, from the LPC spectrum envelope, the normalized power, or the residual input coefficients, the vector length and the bit allocation used for vector quantization in each band; and vector quantization units 260, 261, and 262 that determine the vector length and quantization bit allocation for each band, perform vector quantization, and obtain a vector length index and a quantization bit allocation index as necessary.
[0009]
Furthermore, a speech and musical tone signal decoding device that receives a digital code and reproduces a speech and musical tone signal is configured to comprise: power denormalization units 440, 441, and 442 that obtain reproduced normalized power from the power normalization index of each band; LPC synthesis units 420, 421, and 422 that obtain a reproduced LPC spectrum envelope from the LPC index of each band; vector quantization decoding units 460, 461, and 462 that determine the vector length and quantization bit allocation for each band and perform vector quantization decoding to obtain reproduced residual coefficients from the vector quantization indices; residual inverse flattening units 450, 451, and 452 that inversely flatten the reproduced residual coefficients of each band by the reproduced normalized power to obtain reproduced flattened coefficients; LPC inverse flattening units 430, 431, and 432 that inversely flatten the reproduced flattened coefficients of each band by the reproduced LPC spectrum envelope of that band to obtain reproduced MDCT coefficients; inverse MDCT conversion units 410, 411, and 412 that inverse-MDCT-transform the reproduced MDCT coefficients of each band to obtain a plurality of band-specific time-series signals; and a band synthesis filter 400 that obtains a time-series reproduced speech and musical tone signal from the plurality of band-specific time-series signals.
[0010]
A speech and musical tone signal encoding program is configured to execute instructions that: divide the time-series signal into bands with a band division filter for every fixed number of input samples to obtain a plurality of band-specific time-series signals; MDCT-transform the band-specific time-series signals in each band for every fixed number of input samples to obtain frequency-domain MDCT coefficients; perform linear prediction analysis on the band-specific time-series signals in each band to calculate an LPC spectrum envelope and an LPC index; flatten the frequency-domain MDCT coefficients of each band by the LPC spectrum envelope of that band to obtain flattened input coefficients; normalize the flattened input coefficients of each band by power for every fixed number of input sample coefficients to obtain a power normalization index and residual input coefficients; and vector-quantize the residual input coefficients of each band to obtain vector quantization indices, where in this vector quantization the vector length and quantization bit allocation are determined for each band, vector quantization is performed, and a vector length index and a quantization bit allocation index are obtained as necessary.
[0011]
A speech and musical tone signal decoding program is configured to execute instructions that: obtain reproduced normalized power from the power normalization index of each band; obtain a reproduced LPC spectrum envelope from the LPC index of each band; determine the vector length and quantization bit allocation for each band and perform vector quantization decoding to obtain reproduced residual coefficients from the vector quantization indices; inversely flatten the reproduced residual coefficients of each band by the reproduced normalized power to obtain reproduced flattened coefficients; inversely flatten the reproduced flattened coefficients of each band by the reproduced LPC spectrum envelope of that band to obtain reproduced MDCT coefficients; inverse-MDCT-transform the reproduced MDCT coefficients of each band to obtain a plurality of band-specific time-series signals; and obtain a time-series reproduced speech and musical tone signal from the plurality of band-specific time-series signals with a band synthesis filter.
[0012]
[Embodiments of the Invention]
The encoder divides the input signal sequence into a plurality of bands with a band division filter at fixed time intervals, converts each band's time-series signal into frequency-domain MDCT coefficients by the MDCT, normalizes them by the LPC spectrum envelope to calculate flattened input coefficients, and then normalizes by power to obtain residual input coefficients. From the LPC spectrum envelope, the power normalization coefficients, and the residual input coefficients calculated for each band, the encoder calculates the number of bits and the vector length to use for vector quantization in each band, and vector-quantizes the residual input coefficients calculated for each band.
The decoder performs vector quantization decoding and calculates reproduced flattened coefficients. These reproduced flattened coefficients are inversely flattened by the reproduced normalized power and the reproduced LPC spectrum envelope, band-specific output time-domain signals are obtained by frequency/time conversion, and these output time-domain signals are combined by the band synthesis filter to obtain the output signal. With this method, quality control according to the importance of each band can be realized by vector quantization, which is particularly effective for low-bit-rate coding.
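Both the encoder and the decoder rely on an LPC spectrum envelope per band. The text does not say how the envelope is derived, so the following is only a sketch under stated assumptions: autocorrelation analysis, the Levinson-Durbin recursion, and evaluation of sqrt(err)/|A(e^{jω})| at the MDCT bin centres ω_k = π(k + 0.5)/N. All function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    # r[lag] = sum over n of x[n] * x[n - lag]
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order; returns (a, prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate input: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_envelope(a, err, n_bins):
    """Evaluate the all-pole envelope at n_bins assumed MDCT bin centres."""
    env = []
    for k in range(n_bins):
        w = math.pi * (k + 0.5) / n_bins
        re = sum(a[j] * math.cos(j * w) for j in range(len(a)))
        im = -sum(a[j] * math.sin(j * w) for j in range(len(a)))
        env.append(math.sqrt(err) / math.hypot(re, im))
    return env
```

In a complete codec the coefficients would be quantized into the LPC index, and both sides would compute the envelope from the quantized values so that encoder and decoder stay in step.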
[0013]
[Examples]
An embodiment of the present invention will be described with reference to the drawings.
FIGS. 2 and 4 illustrate a first embodiment of the present invention. The encoding device of the first embodiment shown in FIG. 2 comprises a band division filter unit 200, MDCT conversion units 210, 211, and 212, LPC analysis units 220, 221, and 222, LPC flattening units 230, 231, and 232, power normalization units 240, 241, and 242, residual calculation units 250, 251, and 252, vector quantization units 260, 261, and 262, and a vector length quantization bit allocation unit 270. A discrete sample sequence of the speech and musical tone signal, which is the input signal received at input terminal 201, is input to the band division filter unit 200, where it is converted into band-specific time-series signals. In this embodiment the input signal is divided into three bands. As an example, it is highly effective to sample a wideband speech and musical tone signal spanning 0 kHz to 16 kHz at 32 kHz as the input signal and to divide it into three bands: 0 kHz to 4 kHz, 4 kHz to 8 kHz, and 8 kHz to 16 kHz. The three band-specific time-series signals are input to the MDCT conversion units 210, 211, and 212 of the respective bands, where each is converted by the modified discrete cosine transform into frequency-domain input MDCT coefficients, which are sent to the LPC flattening units 230, 231, and 232 of the respective bands. The three band-specific time-series signals are also input to the LPC analysis units 220, 221, and 222 of the respective bands, where the LPC spectrum envelope is calculated from the input signal and sent to the LPC flattening units 230, 231, and 232 of the respective bands and to the vector length quantization bit allocation unit 270. The vector length and quantization bit allocation can be determined in the following three ways.
(A) Obtain them from the LPC spectrum envelope.
(B) Obtain them from the power normalization index.
(C) Obtain them from the residual input coefficients. Only in this case are a vector length index and a quantization bit allocation index required.
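The three-band division of this embodiment (0 to 4 kHz, 4 to 8 kHz, and 8 to 16 kHz at a 32 kHz sampling rate) can be pictured as a two-stage tree of two-band splits. The text does not specify the band division filter, so the sketch below assumes the simplest critically sampled pair, the Haar filters, purely to show the structure and the exact reconstruction property; a practical codec would use filters with much sharper frequency selectivity. All names are hypothetical.

```python
import math

_SQRT2 = math.sqrt(2.0)


def split2(x):
    # Two-band analysis with decimation by 2 (assumed Haar pair).
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / _SQRT2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / _SQRT2 for i in range(half)]
    return low, high


def merge2(low, high):
    # Matching two-band synthesis: interleaves the reconstructed samples.
    x = []
    for lo, hi in zip(low, high):
        x.append((lo + hi) / _SQRT2)
        x.append((lo - hi) / _SQRT2)
    return x


def split3(x):
    # Stage 1: lower half (0 to 8 kHz) and upper half (8 to 16 kHz), each at half the rate.
    lower_half, band_8_16 = split2(x)
    # Stage 2: split the lower half again into 0 to 4 kHz and 4 to 8 kHz.
    band_0_4, band_4_8 = split2(lower_half)
    return band_0_4, band_4_8, band_8_16


def merge3(band_0_4, band_4_8, band_8_16):
    # Band synthesis mirrors the analysis tree in reverse order.
    return merge2(merge2(band_0_4, band_4_8), band_8_16)
```

With this tree the two lower bands each carry a quarter of the samples and the top band carries half, so the total number of samples per frame is unchanged. The input length is assumed to be a multiple of 4.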
[0014]
To quantize the input MDCT coefficients efficiently, the LPC flattening units 230, 231, and 232 of each band calculate flattened input coefficients, flattened by the LPC spectrum envelope, and send them to the power normalization units 240, 241, and 242 and the residual calculation units 250, 251, and 252 of the respective bands. The power normalization units 240, 241, and 242 of each band calculate the power component from the flattened input coefficients and send the normalized power to the respective residual calculation units 250, 251, and 252 and to the vector length quantization bit allocation unit 270. For this calculation it is effective to use the square root of the average power over a fixed number of samples, or the absolute value of the maximum amplitude within a fixed number of samples.
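The flattening and power normalization steps, together with their inverses on the decoder side, can be sketched as follows. This is an illustration under simplifying assumptions: the whole frame is treated as one normalization block, flattening is taken to be element-wise division by the envelope, and the gain is passed unquantized (the actual scheme transmits it as a power normalization index). The function names are hypothetical.

```python
import math


def rms_gain(coefs):
    # Square root of the average power over the block.
    return math.sqrt(sum(c * c for c in coefs) / len(coefs))


def peak_gain(coefs):
    # Absolute value of the maximum amplitude within the block.
    return max(abs(c) for c in coefs)


def to_residual(mdct_coefs, envelope, gain_fn=rms_gain):
    """Encoder side: flatten by the envelope, then normalize by the block gain."""
    flattened = [c / e for c, e in zip(mdct_coefs, envelope)]
    gain = gain_fn(flattened)
    if gain == 0.0:  # silent block: avoid dividing by zero
        gain = 1.0
    residual = [f / gain for f in flattened]
    return residual, gain


def from_residual(residual, gain, envelope):
    """Decoder side: undo the power normalization, then undo the flattening."""
    return [r * gain * e for r, e in zip(residual, envelope)]
```

After `to_residual` with `rms_gain` the residual has unit RMS, and with `peak_gain` its largest magnitude is 1, so the vector quantizer always sees inputs in a predictable range.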
The residual calculation units 250, 251, and 252 of each band normalize the flattened input coefficients by the normalized power to calculate residual input coefficients, and send the results to the respective vector quantization units 260, 261, and 262. The vector length quantization bit allocation unit 270 calculates, from the LPC spectrum envelope and the normalized power, the vector length and the bit allocation to be used for vector quantization in each band, and sends the results to the vector quantization units 260, 261, and 262 of each band. Here, for example, letting the upper limit of the vector length be Vs-t, the lower limit be Vs-b, and the average normalized power of each band be P(k) (where k is the band number), the vector length V(k) used by the vector quantization unit for band number k is given by
[Expression 1]

where (int) denotes conversion to an integer. For example, setting Vs-t = 6 and Vs-b = 2 yields an effective vector length. Further, letting Bit-total(k) be the number of bits available for coding in each band and LPC(k, s) be the LPC spectrum (where k is the band number and s is the sample number), the bit allocation Bit(V_t(k, n)) of vector V_t(k, n) (where k is the band number and n is the vector number) is given by
[Expression 2]

The vector quantization units 260, 261, and 262 of each band vector-quantize the residual input coefficients using the vector length and bit allocation information, and calculate the vector quantization indices.
Because the vector length is determined from the ratio of the band's normalized residual power to the total power of all bands, the vector length can be set shorter as the power increases. In expression (2) and in expression (4) described later, s ∈ V_t(k, n) denotes the vector element samples s, as many as the vector length, contained in V_t(k, n) (the n-th vector in band k); by computing the sum of the LPC spectrum over the vector element samples s in expressions (2) and (4), more information is allocated to vectors with larger power. Thus, by shortening the vector length and increasing the bit allocation for bands with larger power, degradation of coding quality can be suppressed even when the power fluctuates sharply within a band. The vector length is determined for each band, and a codebook specific to the band and to the bit allocation is used. The codebook stores code vectors, in a number that depends on the bit allocation, in association with codes. This number is generally 2 raised to the power of the number of allocated bits, so the larger the bit allocation, the larger the number of vectors. In decoding as well, such a codebook is used after the vector length and bit allocation have been determined, and the vector corresponding to the input code is reproduced.
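Expressions (1) and (2) appear only as placeholders in this text, so the code below is not the patent's formula. It is one hypothetical rule that matches the verbal description: the vector length shrinks from the upper limit toward the lower limit as a band's share of the total power grows, and each vector receives bits in proportion to the sum of the LPC spectrum over its samples. The linear interpolation, the rounding, and every name here are assumptions.

```python
def vector_length(band_powers, k, v_top=6, v_bottom=2):
    """Hypothetical stand-in for expression (1): higher power share -> shorter vectors."""
    total = sum(band_powers)
    if total == 0.0:
        return v_top
    share = band_powers[k] / total
    length = int(v_top - (v_top - v_bottom) * share)  # (int): truncate to an integer
    return max(v_bottom, min(v_top, length))


def bit_allocation(lpc_spectrum, vec_len, bits_total):
    """Hypothetical stand-in for expression (2): bits proportional to each vector's LPC sum."""
    vectors = [lpc_spectrum[i:i + vec_len] for i in range(0, len(lpc_spectrum), vec_len)]
    total = sum(lpc_spectrum)
    return [int(bits_total * sum(v) / total) for v in vectors]


def codebook_size(bits):
    # As stated in the text: the number of code vectors is 2 to the power of the allocated bits.
    return 2 ** bits
```

Because both inputs (normalized power and LPC spectrum) are also available to the decoder in the first embodiment, the decoder can repeat the same calculation instead of receiving the vector length and bit allocation as side information. Truncation can leave a few bits unassigned; a real allocator would redistribute them.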
[0015]
The decoding apparatus of the first embodiment shown in FIG. 4 consists of a band synthesis filter unit 400, inverse MDCT conversion units 410, 411, and 412, LPC synthesis units 420, 421, and 422, LPC inverse flattening units 430, 431, and 432, power denormalization units 440, 441, and 442, residual inverse flattening units 450, 451, and 452, vector quantization decoding units 460, 461, and 462, and a vector length quantization bit allocation unit 470. The code bit sequence input to the LPC synthesis units 420, 421, and 422, the power denormalization units 440, 441, and 442, and the vector quantization decoding units 460, 461, and 462 is decoded, and a voice musical sound signal, which is a discrete sample sequence in the time domain, is output from the output terminal 401. That is, the LPC synthesis units 420, 421, and 422 of the respective bands calculate a reproduction LPC spectrum envelope from the input LPC index and transmit it to the LPC inverse flattening units 430, 431, and 432 and to the vector length quantization bit allocation unit 470. The power denormalization units 440, 441, and 442 calculate the normalized power from the input power normalization index and transmit it to the residual inverse flattening units 450, 451, and 452 and to the vector length quantization bit allocation unit 470. The vector length quantization bit allocation unit 470 calculates the vector length and bit allocation used for vector quantization in each band by the same calculation as the vector length quantization bit allocation unit 270 shown in FIG. 2, and transmits them to the vector quantization decoding units 460, 461, and 462 of the respective bands. The vector quantization decoding units 460, 461, and 462 of the respective bands perform vector quantization decoding using the vector quantization index and the vector length and quantization bit allocation calculated above, calculate reproduction residual coefficients, and transmit the results to the respective residual inverse flattening units 450, 451, and 452. The residual inverse flattening units 450, 451, and 452 of the respective bands denormalize the reproduction residual coefficients with the normalized power, calculate reproduction flattening coefficients, and transmit the results to the respective LPC inverse flattening units 430, 431, and 432. The LPC inverse flattening units 430, 431, and 432 of the respective bands inversely flatten the reproduction flattening coefficients using the reproduction LPC spectrum envelope, calculate reproduction MDCT coefficients, and transmit them to the respective inverse MDCT conversion units 410, 411, and 412. The inverse MDCT conversion units 410, 411, and 412 of the respective bands calculate the time-series signal for each band by performing an inverse MDCT on the reproduction MDCT coefficients, and transmit the results to the band synthesis filter unit 400. The band synthesis filter unit 400 synthesizes the time-series signals output from the respective bands, converts them into a time-domain output sample series, and outputs it from the output terminal 401 as the decoding result.
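The two inverse steps in this chain (residual inverse flattening followed by LPC inverse flattening) simply undo the encoder's flattening and power normalization. The following is a minimal single-band sketch under stated assumptions: flattening is modelled as element-wise division by the envelope, the normalized power is the square root of the average power, and quantization is omitted so that the round trip is exact. The function names are hypothetical.

```python
import math

def rms_power(coeffs):
    """Square root of the average power over the given coefficients."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))

def encode_side(mdct_coeffs, envelope):
    """Flatten by the envelope, then normalize by power (assumed to be division)."""
    flattened = [c / e for c, e in zip(mdct_coeffs, envelope)]
    power = rms_power(flattened)  # assumed nonzero in this sketch
    residual = [f / power for f in flattened]
    return residual, power

def decode_side(residual, power, envelope):
    """Residual inverse flattening, then LPC inverse flattening."""
    flattened = [r * power for r in residual]
    return [f * e for f, e in zip(flattened, envelope)]

mdct_coeffs = [4.0, -2.0, 1.0, -0.5]
envelope = [2.0, 1.5, 1.0, 0.5]   # stand-in for an LPC spectrum envelope
residual, power = encode_side(mdct_coeffs, envelope)
restored = decode_side(residual, power, envelope)
```

In the actual codec the residual, power, and envelope are all quantized, so the restored coefficients only approximate the originals.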
In the first embodiment shown in FIGS. 2 and 4, the vector length and the bit allocation are calculated from the power normalization coefficient and the envelope component that are encoded for each band, so information on the vector length and bit allocation need not be included in the encoded output. Accordingly, in decoding, the vector length and bit allocation are determined from the transmitted power normalization coefficient and envelope component, and the residual component is then reproduced from the transmitted code.
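Why no side information is needed can be shown with a small sketch: if the encoder and decoder apply the same deterministic rule to the same dequantized band powers, they arrive at the same vector lengths. The rule below is a hypothetical linear mapping chosen only to satisfy the stated property (a higher power ratio gives a shorter vector); it is not the patent's expression. The defaults 6 and 2 follow the example upper and lower limits given in the description.

```python
def vector_lengths(band_powers, v_top=6, v_bottom=2):
    """Hypothetical rule: shorter vectors for bands with a larger share of total power."""
    total = sum(band_powers)
    if total == 0:
        return [v_top] * len(band_powers)
    lengths = []
    for p in band_powers:
        ratio = p / total
        v = int(v_top - (v_top - v_bottom) * ratio)  # (int) truncates toward zero
        lengths.append(max(v_bottom, min(v_top, v)))
    return lengths

# Both sides use the dequantized powers carried by the power normalization index.
decoded_powers = [8.0, 1.5, 0.5]
encoder_lengths = vector_lengths(decoded_powers)
decoder_lengths = vector_lengths(decoded_powers)
```

With these inputs the dominant band receives the shortest vector length and both sides agree without any extra bits being transmitted.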
[0016]
FIGS. 3 and 5 are views for explaining a second embodiment of the present invention.
The encoding apparatus of the second embodiment shown in FIG. 3 consists of a band division filter unit 300, MDCT conversion units 310, 311, and 312, LPC analysis units 320, 321, and 322, LPC flattening units 330, 331, and 332, power normalization units 340, 341, and 342, residual calculation units 350, 351, and 352, vector quantization units 360, 361, and 362, and a vector length quantization bit allocation unit 370. The input signal is input to the input terminal 301, and the encoded bit string is output from the LPC analysis units 320, 321, and 322, the power normalization units 340, 341, and 342, and the vector quantization units 360, 361, and 362. That is, the input signal is input to the band division filter unit 300 via the input terminal 301 and is converted into a time-series signal for each band. In this embodiment, the input signal is divided into three bands. As an example, a wideband voice musical sound signal ranging from 0 kHz to 16 kHz, sampled at 32 kHz, is used as the input signal. Dividing into two is highly effective. The time-series signals divided into three bands in this way are converted into frequency-domain input MDCT coefficients by the modified discrete cosine transform in the MDCT conversion units 310, 311, and 312 of the respective bands, and the conversion results are transmitted to the corresponding LPC flattening units 330, 331, and 332. The time-series signals divided into three bands are also input to the corresponding LPC analysis units 320, 321, and 322, where the LPC spectrum envelope is calculated from the input signal and transmitted to the LPC flattening units 330, 331, and 332. To quantize the input MDCT coefficients efficiently, the corresponding LPC flattening units 330, 331, and 332 flatten them with the LPC spectrum envelope to calculate flattened input coefficients, and transmit these to the corresponding residual calculation units 350, 351, and 352 and power normalization units 340, 341, and 342. The power normalization units 340, 341, and 342 of the respective bands calculate power components from the flattened input coefficients and transmit them to the residual calculation units 350, 351, and 352. For this calculation, it is effective to use a quantity such as the square root of the average power over a fixed number of samples or the absolute value of the maximum amplitude over the fixed number of samples. The residual calculation units 350, 351, and 352 of the respective bands normalize the flattened input coefficients with the normalized power, calculate the residual input coefficients, and transmit them to the corresponding vector quantization units 360, 361, and 362 and to the vector length quantization bit allocation unit 370. The vector length quantization bit allocation unit 370 calculates, from the residual input coefficients, the vector length and the bit allocation used in the vector quantization of each band, and transmits the vector length and quantization bit allocation to the corresponding vector quantization units 360, 361, and 362. These vector quantization units 360, 361, and 362 then calculate a vector quantization index. Here, for example, if the upper limit of the vector length is Vs−t, the lower limit is Vs−b, and the power of the residual input signal in each band is P(k, s) (k is the band number, s is the sample number), the vector length V(k) in the vector quantization of band number k is given by
[Equation 3]

where (int) denotes conversion to an integer. Here, for example, setting Vs−t = 6 and Vs−b = 2 yields an effective vector length. In addition, when the number of bits available for encoding in band k is Bit-total(k), the bit allocation Bit(Vt(k, n)) of the vector Vt(k, n) (k is the band number, n is the vector number) in each band is given by
[Expression 4]

The vector quantization units 360, 361, and 362 of the respective bands perform vector quantization on the residual input coefficients using the vector length and bit allocation information, and calculate a vector quantization index.
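Since the allocation expression itself is not reproduced in this text, the following is only a hypothetical sketch of how a per-band budget Bit-total(k) might be split among the vectors of a band so that vectors with more power receive more bits. It assumes a proportional split with largest-remainder rounding; the function name `allocate_bits` and the rounding scheme are illustrative assumptions, not the patent's formula.

```python
def allocate_bits(vector_powers, bit_total):
    """Split bit_total across vectors in proportion to their power (hypothetical rule)."""
    count = len(vector_powers)
    total = sum(vector_powers)
    if total == 0:
        shares = [bit_total / count] * count
    else:
        shares = [bit_total * p / total for p in vector_powers]
    bits = [int(s) for s in shares]          # integer part of each share
    leftover = bit_total - sum(bits)         # bits lost to truncation
    # Hand the leftover bits to the vectors with the largest fractional parts.
    order = sorted(range(count), key=lambda i: shares[i] - bits[i], reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits

# Four vectors in one band sharing a 20-bit budget.
bits = allocate_bits([4.0, 2.0, 1.0, 1.0], bit_total=20)
```

Whatever rule is used, the allocations within a band must sum to the band's budget so that the encoder and decoder parse the bit string identically.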
The decoding apparatus of the second embodiment shown in FIG. 5 consists of a band synthesis filter unit 500, inverse MDCT conversion units 510, 511, and 512, LPC synthesis units 520, 521, and 522, LPC inverse flattening units 530, 531, and 532, power denormalization units 540, 541, and 542, residual inverse flattening units 550, 551, and 552, and vector quantization decoding units 560, 561, and 562. The code bit sequence input to the LPC synthesis units 520, 521, and 522, the power denormalization units 540, 541, and 542, and the vector quantization decoding units 560, 561, and 562 is decoded, and a voice musical sound signal, which is a discrete sample sequence in the time domain, is output from the output terminal 501. That is, the LPC synthesis units 520, 521, and 522 of the respective bands calculate a reproduction LPC spectrum envelope from the input LPC index and transmit it to the corresponding LPC inverse flattening units 530, 531, and 532. The power denormalization units 540, 541, and 542 of the respective bands calculate the normalized power from the power normalization index and transmit it to the residual inverse flattening units 550, 551, and 552. The vector quantization decoding units 560, 561, and 562 of the respective bands perform vector quantization decoding from the vector quantization index, the vector length, and the quantization bit allocation, calculate reproduction residual coefficients, and transmit them to the residual inverse flattening units 550, 551, and 552. The residual inverse flattening units 550, 551, and 552 of the respective bands denormalize the reproduction residual coefficients with the normalized power, calculate reproduction flattening coefficients, and transmit them to the LPC inverse flattening units 530, 531, and 532. The LPC inverse flattening units 530, 531, and 532 inversely flatten the reproduction flattening coefficients using the reproduction LPC spectrum envelope, calculate reproduction MDCT coefficients, and transmit them to the inverse MDCT conversion units 510, 511, and 512. The inverse MDCT conversion units 510, 511, and 512 calculate the time-series signal for each band by performing an inverse MDCT on the reproduction MDCT coefficients, and transmit it to the band synthesis filter unit 500. The band synthesis filter unit 500 synthesizes the time-series signals output from the respective bands, converts them into a time-domain output sample series, and outputs it from the output terminal 501 as the decoding result.
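The MDCT and inverse MDCT stages used in each band can be illustrated with a direct (non-fast) sketch. It assumes a sine window and 50% overlap-add, neither of which is specified in this description, and it skips all quantization so that the overlapped region reconstructs the input exactly. The function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame, window):
    """Map 2N windowed samples to N coefficients (direct O(N^2) form)."""
    n = len(frame) // 2
    return [sum(frame[t] * window[t]
                * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs, window):
    """Map N coefficients back to 2N windowed, time-aliased samples."""
    n = len(coeffs)
    return [window[t] * (2.0 / n)
            * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                  for k in range(n))
            for t in range(2 * n)]

def analyze_synthesize(signal, n):
    """Run MDCT then inverse MDCT on frames of 2N samples hopped by N, with overlap-add."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = signal[start:start + 2 * n]
        for t, value in enumerate(imdct(mdct(frame, window), window)):
            out[start + t] += value
    return out

signal = [math.sin(0.3 * i) for i in range(32)]
restored = analyze_synthesize(signal, n=4)
```

Only samples covered by two overlapping frames are reconstructed; the first and last N samples of the sequence remain aliased in this sketch, which is why practical codecs pad or prime the first frame.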
The second embodiment shown in FIGS. 3 and 5 differs from the first embodiment in the following respect: because the vector length and bit allocation for the residual component are determined from values before quantization, the vector length and bit allocation information is itself encoded, and in decoding this information is reproduced first to determine the vector length and bit allocation, after which the residual component is reproduced.
[0017]
FIG. 6 shows a configuration in which the encoding method and the decoding method according to the present invention are implemented by a computer. The computer 600 includes a CPU 610, a RAM 620, a ROM 630, an input/output interface 640, and a hard disk 650 that are connected to each other via a bus 680. The ROM 630 stores a basic program for operating the computer 600, and the hard disk 650 stores in advance a program for executing the above-described encoding method and decoding method according to the present invention.
At the time of encoding, the CPU 610 loads an encoding program from the hard disk 650 into the RAM 620, encodes the audio signal sample input from the interface 640 by processing according to the encoding program, and outputs it from the interface 640. At the time of decoding, a decoding program is loaded from the hard disk 650 to the RAM 620, an input signal is processed according to the decoding program, and audio signal samples are output.
[0018]
As the program for executing the encoding method and the decoding method according to the present invention, a program recorded in the external disk device 670, which is connected to the internal bus 680 via the drive device 660, may be used. Alternatively, a program downloaded from an external network via the interface 640 and stored in the hard disk 650 may be used. The storage medium on which the program for executing the encoding/decoding method according to the present invention is recorded may be a magnetic recording medium, an IC memory, a compact disk, or the like.
[0019]
[Effect of the invention]
As described above, the present invention can efficiently encode and decode a wideband voice musical sound signal whose spectrum has a different shape in each band. For example, when a voice musical sound signal is plotted with frequency on the horizontal axis and power on the vertical axis and divided into three bands, each of the three bands has a spectrum of a different shape. Rather than assigning the same fixed number of bits and the same fixed vector length to every band for vector quantization, the importance and needs of each band are taken into account and the bits and vector lengths are distributed and allocated appropriately, so that a wideband voice musical sound signal is encoded efficiently.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining an example of a transform coding method using vector quantization.
FIG. 2 is a diagram for explaining an encoder in the first embodiment.
FIG. 3 is a diagram for explaining an encoder in a second embodiment.
FIG. 4 is a diagram illustrating a decoder in the first embodiment.
FIG. 5 is a diagram for explaining a decoder according to a second embodiment.
FIG. 6 is a diagram illustrating a computer that performs an encoding / decoding method.
[Explanation of symbols]
201 Band-splitting filter
210, 211, 212 MDCT converter
220, 221, 222 LPC analysis section
230, 231, 232 LPC flattening section
240, 241, 242 Power normalization unit
250, 251, 252 Residual calculator
260, 261, 262 Vector quantization unit
270 Vector length quantization bit allocation unit
400 Band synthesis filter
410, 411, 412 Inverse MDCT converter
420, 421, 422 LPC synthesis unit
430, 431, 432 LPC reverse flattening section
440, 441, 442 Power denormalization unit
450, 451, 452 Residual inverse flattening section
460, 461, 462 Vector quantization decoding unit

Claims

In a voice tone signal encoding method for outputting a digital code by using a discrete voice tone signal sample as an input,
A time-series signal is band-divided by a band division filter for every certain number of input samples to obtain a plurality of band-specific time-series signals,
The band-specific time-series signal is converted into the frequency domain for every certain number of input samples for each band to obtain frequency domain coefficients,
Linear prediction analysis is performed for each band on the band-specific time-series signal to calculate an LPC spectrum envelope and an LPC index,
The frequency domain coefficients of each band are flattened by the LPC spectrum envelope of that band to obtain flattened input coefficients,
The flattened input coefficients of each band are normalized by power for every certain number of input sample coefficients to obtain a power normalization index and residual input coefficients,
For each band, the vector length to be used in vector quantization is determined so that the vector length becomes shorter as the ratio of the power of the band to the power of all bands becomes higher, and longer as that ratio becomes lower,
The residual input coefficients of each band are vector-quantized using the determined vector length to obtain a vector quantization index,
A voice musical sound signal encoding method characterized by the above.

In a voice music signal decoding method for inputting a digital code and reproducing a voice music signal,
A reproduction normalization power is obtained from the power normalization index for each band,
A reproduction LPC spectrum envelope is obtained from the LPC index for each band,
For each band, the vector length to be used in vector quantization decoding is determined so that the vector length becomes shorter as the ratio of the reproduction normalization power of the band to the reproduction normalization power of all bands becomes higher, and longer as that ratio becomes lower,
Vector quantization decoding is performed on the vector quantization index using a codebook having the determined vector length to obtain reproduction residual coefficients,
The reproduction residual coefficients of each band are inversely flattened by the reproduction normalization power to obtain reproduction flattening coefficients,
The reproduction flattening coefficients of each band are inversely flattened by the reproduction LPC spectrum envelope of that band to obtain reproduction frequency domain coefficients,
The reproduction frequency domain coefficients of each band are converted into the time domain to obtain a plurality of band-specific time-series signals,
A time-series reproduced voice musical sound signal is obtained from the plurality of band-specific time-series signals by a band synthesis filter,
A voice musical sound signal decoding method characterized by the above.

In a voice musical sound signal encoding apparatus for outputting a digital code with a discrete voice musical sound signal sample as an input,
A band division filter that band-divides a time-series signal for every certain number of input samples to obtain a plurality of band-specific time-series signals,
A frequency domain conversion unit that converts the band-specific time-series signal into the frequency domain for every certain number of input samples for each band to obtain frequency domain coefficients,
An LPC analysis unit that performs linear prediction analysis for each band on the band-specific time-series signal to calculate an LPC spectrum envelope and an LPC index,
An LPC flattening unit that flattens the frequency domain coefficients of each band by the LPC spectrum envelope of that band to obtain flattened input coefficients,
A power normalization unit that normalizes the flattened input coefficients of each band by a power component for every certain number of input sample coefficients to obtain a normalized power and a power normalization index,
A residual calculation unit that normalizes the flattened input coefficients of each band by the normalized power to calculate residual input coefficients,
A vector length quantization bit allocation unit that calculates, for each band, the vector length to be used in vector quantization so that the vector length becomes shorter as the ratio of the power of the band to the power of all bands becomes higher, and longer as that ratio becomes lower,
A vector quantization unit that vector-quantizes the residual input coefficients of each band using the determined vector length to obtain a vector quantization index;
A voice musical sound signal encoding device characterized by the above.

In a voice music signal decoding apparatus for inputting a digital code and reproducing a voice music signal,
A power denormalization unit for obtaining a reproduction normalization power from a power normalization index for each band;
An LPC synthesis unit for obtaining a reproduction LPC spectrum envelope from an LPC index for each band;
For each band, the higher the ratio of the reproduction normalization power of the band to the reproduction normalization power of the entire band, the shorter the vector length, and the lower the ratio of the reproduction normalization power of the band to the reproduction normalization power of the entire band, the lower the vector length. A vector length quantization bit allocation unit for calculating a vector length to be used in vector quantization decoding so that the vector length becomes longer;
A vector quantization decoding unit that performs vector quantization decoding using the vector length calculated for each band and the vector quantization index to obtain reproduction residual coefficients;
A residual inverse flattening unit that inversely flattens the reproduction residual coefficients of each band by the reproduction normalization power to obtain reproduction flattening coefficients;
An LPC inverse flattening unit that inversely flattens the reproduction flattening coefficients of each band by the reproduction LPC spectrum envelope of that band to obtain reproduction frequency domain coefficients;
A time domain conversion unit that converts the reproduction frequency domain coefficients of each band into the time domain to obtain a plurality of band-specific time-series signals;
A band synthesizing filter for obtaining a time-series reproduced voice musical sound signal from a plurality of time-series signals by band;
A voice musical tone signal decoding apparatus characterized by the above.

A voice musical sound signal encoding program that executes instructions for: band-dividing a time-series signal by a band division filter for every certain number of input samples to obtain a plurality of band-specific time-series signals; converting the band-specific time-series signal into the frequency domain for every certain number of input samples for each band to obtain frequency domain coefficients; performing linear prediction analysis for each band on the band-specific time-series signal to calculate an LPC spectrum envelope and an LPC index; flattening the frequency domain coefficients of each band by the LPC spectrum envelope of that band to obtain flattened input coefficients; normalizing the flattened input coefficients of each band by power for every certain number of input sample coefficients to obtain a power normalization index and residual input coefficients; determining, for each band, the vector length to be used in vector quantization so that the vector length becomes shorter as the ratio of the power of the band to the power of all bands becomes higher, and longer as that ratio becomes lower; and vector-quantizing the residual input coefficients of each band using the determined vector length to obtain a vector quantization index.

A voice musical sound signal decoding program that executes instructions for: obtaining a reproduction normalization power from the power normalization index for each band; obtaining a reproduction LPC spectrum envelope from the LPC index for each band; determining, for each band, the vector length to be used in vector quantization decoding so that the vector length becomes shorter as the ratio of the reproduction normalization power of the band to the reproduction normalization power of all bands becomes higher, and longer as that ratio becomes lower; performing vector quantization decoding using the determined vector length and the vector quantization index to obtain reproduction residual coefficients; inversely flattening the reproduction residual coefficients of each band by the reproduction normalization power to obtain reproduction flattening coefficients; inversely flattening the reproduction flattening coefficients of each band by the reproduction LPC spectrum envelope of that band to obtain reproduction frequency domain coefficients; converting the reproduction frequency domain coefficients of each band into the time domain to obtain a plurality of band-specific time-series signals; and obtaining a time-series reproduced voice musical sound signal from the plurality of band-specific time-series signals by a band synthesis filter.