JP2000242299A

JP2000242299A - Weight codebook and method of generating it, method of setting initial values of MA prediction coefficients for training in codebook design, encoding method and decoding method for audio signals, computer-readable storage medium storing an encoding program, and computer-readable storage medium storing a decoding program

Info

Publication number: JP2000242299A
Application number: JP11039195A
Authority: JP
Inventors: Noboru Harada; 登原田; Naka Omuro; 仲大室; Kenichi Furuya; 賢一古家
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-02-17
Filing date: 1999-02-17
Publication date: 2000-09-08
Anticipated expiration: 2019-02-17
Also published as: JP3578933B2

Abstract

PROBLEM TO BE SOLVED: To suppress degradation of the sound quality of relatively low-sound-pressure portions of a decoded audio signal by setting each element of a weight codebook using a distance measure that takes human sound-pressure perception characteristics into account, and by encoding the audio signal with the weight codebook thus generated. SOLUTION: To calculate the distortion (d) of a reproduced speech candidate (y) with perceptual weighting taken into account, a distortion calculation unit 1-6 passes the reproduced speech candidate (y) output from a synthesis filter through a perceptual weighting filter 2-2 before inputting it to a distance calculation unit 2-4, and likewise passes the input speech (x) through a perceptual weighting filter 2-3 before inputting it to the distance calculation unit 2-4. The perceptual weighting filter 2-2 uses the linear prediction parameters (a) as its coefficients, converts the reproduced speech candidate (y) into a perceptually weighted reproduced speech candidate yw, and outputs it to the distance calculation unit 2-4. The perceptual weighting filter 2-3 is a filter with fixed coefficients; it converts the input speech (x) into perceptually weighted input speech xw and outputs it to the distance calculation unit 2-4.

Description

DETAILED DESCRIPTION OF THE INVENTION

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a weight codebook and a method of generating it, a method of setting the initial values of MA prediction coefficients for training during codebook design, a method of encoding an audio signal and a method of decoding it, a computer-readable storage medium storing an encoding program, and a computer-readable storage medium storing a decoding program, and more particularly to a technique for encoding or decoding an audio signal.

2. Description of the Related Art

[0002] In digital mobile communication, high-efficiency coding schemes that minimize the distortion of the reproduced audio signal under a predetermined distance measure are adopted in order to use communication lines and storage media efficiently. One such scheme divides the audio signal in the time domain into fixed-length segments of about 5 to 50 ms called frames or subframes, separates each frame into a signal representing the envelope of the frequency spectrum (the short-term prediction signal) and an excitation signal representing the prediction residual, and encodes the short-term prediction signal and the excitation signal separately.

[0003] A known method of encoding the excitation signal is code-excited linear prediction (CELP), which encodes the signal after separating it into a frequency component corresponding to the pitch period (fundamental frequency) of the speech and the remaining (aperiodic) component. CELP is described in detail in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates", IEEE Proc. ICASSP-85, pp. 937-940, 1985.

[0004] In audio signal coding, for example in CELP, vector quantization using MA prediction (MA predictive vector quantization) is used. Predictive vector quantization comes in two forms, AR predictive vector quantization, which uses AR prediction, and MA predictive vector quantization; both use prediction to remove the correlation between frames. AR predictive vector quantization is the more common way of whitening a correlated signal sequence by removing its predictable component, but audio signal coding uses MA predictive vector quantization because it is more robust against transmission errors.

[0005] In MA predictive vector quantization, a predetermined number of representative vectors are stored in advance as the elements of a codebook. A representative vector is selected for each frame or subframe according to the excitation signal, and the selected vector is input to an IIR filter with predetermined filter coefficients (the MA prediction coefficients) to generate a signal corresponding to the excitation signal. At decoding time, the signal is reproduced with an FIR filter that has the same MA prediction coefficients as the IIR filter.
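The encoder/decoder relationship described above can be sketched as follows. This is a minimal illustration and not the scheme of the cited standards: it quantizes scalars instead of vectors, and the function names, the codebook values, and the sign convention of the prediction are all assumptions made for the example. The encoder feeds its own past selections back into the prediction (the recursive, IIR-like side), while the decoder only forms a finite weighted sum of past codebook entries (the FIR side).

```python
def nearest(codebook, target):
    """Return the index of the codebook entry closest to target."""
    return min(range(len(codebook)), key=lambda i: (codebook[i] - target) ** 2)


def ma_encode(signal, codebook, ma_coeffs):
    """Quantize each sample after subtracting an MA prediction (assumed form)."""
    history = [0.0] * len(ma_coeffs)  # most recent selected entries first
    indices = []
    for x in signal:
        predicted = sum(m * c for m, c in zip(ma_coeffs, history))
        idx = nearest(codebook, x - predicted)
        indices.append(idx)
        history = [codebook[idx]] + history[:-1]
    return indices


def ma_decode(indices, codebook, ma_coeffs):
    """Rebuild each sample as the current entry plus an FIR sum of past entries."""
    history = [0.0] * len(ma_coeffs)
    out = []
    for idx in indices:
        c = codebook[idx]
        out.append(c + sum(m * h for m, h in zip(ma_coeffs, history)))
        history = [c] + history[:-1]
    return out
```

Because the decoder output depends on only a finite number of past indices, a corrupted index affects at most `len(ma_coeffs) + 1` decoded values, which is the error-robustness property the text attributes to MA prediction.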

[0006] MA predictive vector quantization requires the codebook and the MA prediction coefficients to be determined appropriately. Conventionally, the codebook elements and the MA prediction coefficients are determined by the recursive training method shown in FIG. 9. In this method, an initial codebook is designed in the initial codebook design step (9-1) using the LBG algorithm. The LBG algorithm designs the initial codebook so that the total distortion is locally minimal when training data (audio signals for training), whose distribution represents well the distribution of the vector space to be quantized, are quantized. It starts from a single initial representative vector and alternates between finding the optimal representative vectors and doubling the number of representative vectors until the target number is reached.
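The split-and-refine procedure just described can be sketched for scalar training data as follows. This is an illustrative sketch only: the function names, the fixed iteration count, and the perturbation `epsilon` are assumptions, and the target size is assumed to be a power of two.

```python
def lloyd(data, codebook, iterations=20):
    """Alternate nearest-neighbour assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for x in data:
            i = min(range(len(codebook)), key=lambda k: (x - codebook[k]) ** 2)
            clusters[i].append(x)
        # Keep the old entry when a cluster receives no training data.
        codebook = [sum(c) / len(c) if c else old
                    for c, old in zip(clusters, codebook)]
    return codebook


def lbg(data, target_size, epsilon=0.01):
    """Grow a codebook from one entry to target_size by repeated splitting."""
    codebook = [sum(data) / len(data)]  # single initial representative value
    while len(codebook) < target_size:
        # Double the codebook by perturbing every entry in both directions.
        codebook = [c + s for c in codebook for s in (-epsilon, epsilon)]
        codebook = lloyd(data, codebook)
    return sorted(codebook)
```

For example, `lbg([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], 2)` separates the two groups of values and returns entries close to 1.0 and 5.0.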

[0007] The LBG algorithm amounts to dividing the set of training vectors into several subsets; each subset is called a cluster, and the division process is called clustering. The LBG algorithm is described in detail in Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Trans. Commun., COM-28, pp. 84-95, 1980. Predictive vector quantization is described in Allen Gersho and Robert M. Gray, "Vector Quantization and Signal Compression", Kluwer Academic Publishers, 1992.

[0008] Next, in the MA prediction coefficient determination step (9-2), the MA prediction coefficients are provisionally determined using the training data and the initial codebook so that the total distortion of the reproduced signal is minimized. The training data are then encoded in step (9-3), after which steps (9-4) to (9-9) are repeated until the optimal representative vectors and MA prediction coefficients that minimize the total distortion D of the reproduced signal are finally determined.

[0009] That is, the assignment of each training datum to a representative vector is updated recursively, and the representative vectors are updated so that the total distortion is locally minimal for that assignment (9-4). Alternating the assignment update and the representative-vector update until convergence moves the representative vectors progressively closer to the optimum. This procedure is known as the Lloyd-Max or k-means algorithm.

[0010] MA predictive vector quantization of a weight codebook is described in detail in ITU-T COM 15-152-E, "G.729 - Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)", Jul. 1995. Inter-frame predictive vector quantization using MA prediction is likewise described in, for example, Naka Omuro, Takehiro Moriya, Kazunori Mano, and Satoshi Miki, "Vector Quantization of LSP Parameters Using Moving-Average Interframe Prediction", Transactions of the Institute of Electronics, Information and Communication Engineers A, Vol. J77-A, No. 3, pp. 303-313, 1994.

Problems to Be Solved by the Invention

[0011] In audio signal coding, the number of bits available to represent the audio signal digitally is limited, so the number of elements in the codebook used for CELP coding is limited as well. The elements of this codebook are determined in advance from a number of training audio signals, and the codebook is then used to encode the audio signals that are actually transmitted.

[0012] Conventionally, however, the codebook elements are determined by training so that the total waveform distortion of the reproduced audio signal over the whole set of training audio signals is minimized, that is, so that the S/N ratio of the reproduced audio signal is maximized. When the codebook elements are determined with a distance measure that minimizes waveform distortion over all the training audio signals, the resulting codebook is biased toward the portions where the input audio signal has high power.

[0013] In human sound-pressure perception, even a slight change in power is clearly perceptible for low-power sounds, whereas sensitivity is coarser at high power. Consequently, when an audio signal to be transmitted is encoded with a codebook obtained by conventional training, too few codebook elements correspond to the low-power portions of the signal, and the degradation of the reproduced audio signal is clearly perceptible when the input level is relatively low. As a result, sound reproduced with such a codebook gives an unstable impression.

[0014] Meanwhile, obtaining the highest possible quality of reproduced audio with few bits requires improving the conventional codebook training method so that the input audio signal is quantized as efficiently as possible. MA predictive vector quantization, which uses an FIR filter at decoding time, has been used to quantize the codebook elements because of its robustness against transmission errors. In general, however, MA prediction needs a very high order (the order of the IIR and FIR filters) to match the prediction performance of AR prediction. In practice a low order of about four is often used to limit the recovery time after a transmission error, and it has been extremely difficult to determine MA prediction coefficients of such a low order stably in an open loop.

[0015] As described above, conventional MA predictive vector quantization is subject to a circular constraint: the MA prediction coefficients cannot be determined until the initial codebook has been determined, and the initial codebook cannot be determined until the MA prediction coefficients have been determined. Once an initial codebook has been determined, a recursive training method that alternately trains the codebook and the MA prediction coefficients makes each value converge asymptotically to a locally optimal solution. Conventionally, either an initial codebook designed without MA prediction is used, or the initial values of the MA prediction coefficients are chosen empirically. Depending on the initial codebook, training may therefore fall into a local solution, and arbitrarily chosen MA prediction coefficients may not be stable. Determining an optimal initial codebook is thus difficult, and the codebook obtained by recursive training is only locally optimal.

[0016] The present invention has been made in view of the problems described above and has the following objects.
(1) To suppress the degradation of the sound quality of the reproduced audio signal when the input audio signal level is relatively low.
(2) To obtain a high-quality reproduced audio signal even when coding with a small number of bits.

Means for Solving the Problems

[0017] To achieve the above objects, the present invention adopts the following as a first means relating to the weight codebook and its generation method. The means applies to a method of generating the weight codebook for an encoding method that: separates an input audio signal into a short-term prediction signal, which represents the predicted envelope of its short-term frequency spectrum, and an excitation signal, which represents the prediction residual; selects a normalized vector based on the excitation signal and selects a weight that specifies the magnitude of the normalized vector from a weight codebook whose elements are such weights, thereby generating an excitation vector candidate; generates the excitation signal so that the distortion, relative to the input audio signal, of the reproduced speech candidate generated from the excitation vector candidate and the short-term prediction signal is minimized under a predetermined distance measure; and encodes the excitation signal thus generated together with the short-term prediction signal. In this generation method, each element of the weight codebook is set using a distance measure that takes human sound-pressure perception characteristics into account.

[0018] As a second means relating to the weight codebook and its generation method, the first means uses a distance measure that incorporates human sound-pressure perception characteristics as a function of the power of the input audio signal.

[0019] As a third means relating to the weight codebook and its generation method, in the second means the function is given by (|x|²)^p / |x|², where x is the input audio signal and p is a value in the range 0 < p < 1.
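The effect of the power function (|x|²)^p / |x|² can be seen in a small numerical sketch. The text here does not state how the function enters the distance measure, so multiplying the squared error by it is an assumption made for illustration, as are the function names and the choice p = 0.5.

```python
def power_weight(x, p=0.5):
    """(|x|^2)^p / |x|^2 for one frame x (a list of samples), with 0 < p < 1."""
    power = sum(s * s for s in x)
    if power == 0.0:
        raise ValueError("the weight is undefined for an all-zero frame")
    return power ** p / power


def weighted_distortion(x, y, p=0.5):
    """Squared error between x and y scaled by the power weight (assumed form)."""
    return power_weight(x, p) * sum((a - b) ** 2 for a, b in zip(x, y))
```

Under this assumed form the weighted error equals (|x|²)^p times the relative error |x - y|² / |x|². As p approaches 1 it tends to the plain squared error, and as p approaches 0 it tends to the purely relative error, so values in between reduce the dominance of high-power frames without ignoring power altogether. With p = 0.5, a 10 % error on a frame of amplitude 10 costs 100 times as much as a 10 % error on a frame of amplitude 0.1, compared with 10,000 times for the unweighted squared error.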

[0020] As a first means relating to the method of setting the initial values of the MA prediction coefficients, in a method of setting the initial values of the MA prediction coefficients for MA predictive vector quantization, the initial values are calculated by approximating, with an MA prediction, the AR prediction coefficients calculated from a plurality of training data.

[0021] As a second means relating to the method of setting the initial values of the MA prediction coefficients, the method of setting the initial values of the MA prediction coefficients for MA predictive vector quantization comprises the steps of: calculating the mean of a plurality of training data; calculating first AR prediction coefficients by the covariance method from the training data with the mean subtracted; obtaining the impulse response of a filter whose coefficients are the first AR prediction coefficients; calculating, by the autocorrelation method with the impulse response as input, second AR prediction coefficients of the same order as the MA prediction coefficients; and calculating the initial values of the MA prediction coefficients by obtaining the inverse filter of a filter whose coefficients are the second AR prediction coefficients.
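The two middle steps of this procedure (impulse response, then a lower-order fit by the autocorrelation method) can be sketched as follows. Several things are assumptions here rather than statements from the text: the filter is taken to be the all-pole filter 1 / (1 - Σ a_k z^-k), the predictor sign convention is x[n] ≈ Σ a_k x[n-k], the impulse response is truncated at 200 samples, and all function names are hypothetical. The covariance-method step and the final inverse-filter step are omitted because their conventions are not specified in this passage.

```python
def impulse_response(ar, length):
    """First samples of the impulse response of 1 / (1 - sum_k ar[k-1] z^-k)."""
    h = []
    for n in range(length):
        y = 1.0 if n == 0 else 0.0
        for k, a in enumerate(ar, start=1):
            if n >= k:
                y += a * h[n - k]
        h.append(y)
    return h


def autocorrelation(x, max_lag):
    """r[lag] = sum_n x[n] * x[n - lag] for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Autocorrelation method: predictor a with x[n] ~ sum_k a[k-1] * x[n-k]."""
    a = []
    err = r[0]
    for i in range(order):
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a


def low_order_ar(first_ar, order, length=200):
    """Impulse response of the first AR filter, then an AR fit of lower order."""
    h = impulse_response(first_ar, length)
    return levinson_durbin(autocorrelation(h, order), order)
```

When the requested order equals the order of a stable first filter, the fit simply recovers the original coefficients; the interesting case in the procedure is a requested order (about four, per the text) well below the first order.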

[0022] As a first means relating to the weight codebook for MA predictive vector quantization and its generation method, the invention comprises the steps of: calculating the initial values of the MA prediction coefficients using any of the above means for setting them; creating an initial codebook for the weight codebook based on the LBG algorithm; and, in MA predictive vector quantization provided with the initial values of the MA prediction coefficients and the initial codebook, successively updating each element of the weight codebook and the MA prediction coefficients by recursive training so that the total distortion of the reproduced signal calculated over the training data is minimized, thereby finally determining each element.

[0023] As a second means relating to the weight codebook for MA predictive vector quantization and its generation method, in the first means the distortion of the reproduced signal is calculated using a distance measure that takes human sound-pressure perception characteristics into account.

[0024] As a third means relating to the weight codebook for MA predictive vector quantization and its generation method, in the first means the distortion of the reproduced signal is calculated using a distance measure that incorporates a function of the power of the input audio signal.

[0025] As a fourth means relating to the weight codebook for MA predictive vector quantization and its generation method, in the first means the distortion of the reproduced signal is calculated using a distance measure that incorporates the function W of the power of the input audio signal x given by equation (1) below, where p is a value in the range 0 < p < 1.

W = (|x|²)^p / |x|²   (1)

[0026] Further, the present invention adopts the following as a first means relating to the audio signal encoding method. The means applies to an audio signal encoding method that: separates an input audio signal into a short-term prediction signal, which represents the predicted envelope of its short-term frequency spectrum, and an excitation signal, which represents the prediction residual; selects a normalized vector based on the excitation signal and selects a weight that specifies the magnitude of the normalized vector from a weight codebook whose elements are such weights, thereby generating an excitation vector candidate; generates the excitation signal so that the distortion, relative to the input audio signal, of the reproduced speech candidate generated from the excitation vector candidate and the short-term prediction signal is minimized under a predetermined distance measure; and encodes the excitation signal thus generated together with the short-term prediction signal. In this encoding method, each element of the weight codebook is set using a distance measure that takes human sound-pressure perception characteristics into account.

[0027] As a second means relating to the audio signal encoding method, the first means uses a distance measure that incorporates human sound-pressure perception characteristics as a function of the power of the input audio signal.

[0028] As a third means relating to the audio signal encoding method, in the second means the function is given by (|x|²)^p / |x|², where x is the input audio signal and p is a value in the range 0 < p < 1.

[0029] As a fourth means relating to the audio signal encoding method, in any of the first to third means, when the excitation vector candidate is generated from the excitation signal using MA predictive vector quantization, the weight codebook and the MA prediction coefficients obtained by the first means relating to the method of generating the weight codebook for MA predictive vector quantization are used.

[0030] Further, as a first means relating to the audio signal decoding method, in a decoding method that decodes a code generated by any of the first to third means relating to the audio signal encoding method, the code is decoded using the weight codebook used in that encoding method.

[0031] As a second means relating to the audio signal decoding method, in a decoding method that decodes a code generated by the fourth means relating to the audio signal encoding method, the code is decoded using the weight codebook and the MA prediction coefficients used in that encoding method.

[0032] The present invention also provides an encoding program, stored on a computer-readable storage medium, that implements the processing according to each of the above means relating to the audio signal encoding method.

[0033] Likewise, it provides a decoding program, stored on a computer-readable storage medium, that implements the processing according to each of the above means relating to the audio signal decoding method.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0034] An embodiment of the weight codebook and its generation method, the method of setting the initial values of the MA prediction coefficients for training during codebook design, the audio signal encoding method and decoding method, the computer-readable storage medium storing the encoding program, and the computer-readable storage medium storing the decoding program according to the present invention will now be described with reference to the drawings.

[0035] FIG. 1 is a block diagram showing the functional configuration of the encoder according to this embodiment. As shown in the figure, the input speech (input audio signal) x is input to a linear prediction analysis unit 1-1 and a distortion calculation unit 1-6. The linear prediction analysis unit 1-1 calculates linear prediction parameters a representing the frequency spectrum envelope of the input speech x and outputs them to a linear prediction parameter encoding unit 1-2 and the distortion calculation unit 1-6.

[0036] The input speech x is supplied to the linear prediction analysis unit 1-1 as frames divided at intervals of, for example, 10 ms, so the linear prediction parameters a output from the linear prediction analysis unit 1-1 represent the frequency spectrum envelope of each frame.

[0037] The linear prediction parameter encoding unit 1-2 encodes the linear prediction parameters a frame by frame and outputs the result as a linear prediction parameter code af to a linear prediction parameter decoding unit 1-3 and a code transmission unit 1-9. The linear prediction parameter decoding unit 1-3 reproduces synthesis filter coefficients b from the linear prediction parameter code af received from the linear prediction parameter encoding unit 1-2 and sends them to a synthesis filter 1-5.

[0038] The synthesis filter 1-5 is a linear filter whose characteristics are defined by the synthesis filter coefficients b. Using these coefficients and the excitation vector candidate c input from an excitation vector generation unit 1-4, it generates and outputs a reproduced speech candidate y. The order of the synthesis filter 1-5, that is, the order of the linear prediction analysis in the linear prediction analysis unit 1-1, is, for example, about 10 to 16.

[0039] The linear prediction analysis unit 1-1, the linear prediction parameter encoding unit 1-2, the linear prediction parameter decoding unit 1-3, and the synthesis filter 1-5 may be replaced with nonlinear counterparts. The details of linear prediction analysis and the encoding of linear prediction parameters are well known and are described, for example, in Sadaoki Furui, "Digital Speech Processing" (Tokai University Press).

[0040] The distortion calculation unit 1-6 calculates, for each frame, the distortion d of the reproduced speech candidate y with respect to the input speech x, based on the input speech x, the linear prediction parameter a, and the reproduced speech candidate y input from the synthesis filter 1-5, and outputs the distortion d to the codebook search control unit 1-8.

[0041] FIG. 2 is a block diagram showing the detailed configuration of the distortion calculation unit 1-6. As shown in the figure, in order to calculate the distortion d of the reproduced speech candidate y with auditory weighting taken into account, the distortion calculation unit 1-6 is configured so that the reproduced speech candidate y output from the synthesis filter 1-5 is passed through the auditory weighting filter 2-2 before being input to the distance calculation unit 2-4, and the input speech x is likewise passed through the auditory weighting filter 2-3 before being input to the distance calculation unit 2-4.

[0042] The auditory weighting filter 2-2 uses the linear prediction parameter a as its coefficients; it converts the reproduced speech candidate y into an auditory-weighted reproduced speech candidate yw and outputs it to the distance calculation unit 2-4. The auditory weighting filter 2-3 is a filter with fixed coefficients; it converts the input speech x into an auditory-weighted input speech xw and outputs it to the distance calculation unit 2-4. Inserting these auditory weighting filters 2-2 and 2-3 as a single filter after the distance calculation unit 2-4 would be equivalent, but from the viewpoint of processing load they are provided separately in front of the distance calculation unit 2-4.

[0043] Meanwhile, the codebook search control unit 1-8 selects, for each frame, the driving excitation code e that minimizes the sum of the distortions d of the reproduced speech candidate y, and outputs it to the driving excitation vector generation unit 1-4. In this embodiment, as described below with reference to FIGS. 3 and 4, the driving excitation vector generation unit 1-4 is configured to calculate the driving excitation vector candidate c using the adaptive codebook 3-1, the fixed codebook 3-2, and the weight codebook 4-1.

[0044] In this connection, the codebook search control unit 1-8 is configured to select the period code for the adaptive codebook 3-1, the fixed (noise) code for the fixed codebook 3-2, and the weight code for the weight codebook 4-1 so that the sum of the distortions d of the reproduced speech candidate y is minimized for each frame, and to output them as the driving excitation code e to the driving excitation vector generation unit 1-4.

[0045] The code transmission unit 1-9 outputs the driving excitation code e and the linear prediction parameter code af to a storage device or a transmission path, depending on the form of use. The driving excitation vector generation unit 1-4 generates a driving excitation vector candidate c with a length of one frame of the input speech x and outputs it to the synthesis filter 1-5.

[0046] FIG. 3 is a block diagram showing the detailed configuration of the driving excitation vector generation unit 1-4. In this figure, the adaptive codebook 3-1 generates, for each frame, a time-series vector candidate Va corresponding to the periodic component of the input speech x, based on the driving excitation vector c(t-1) of the previous (past) frame stored in a buffer unit (not shown) and the period code included in the driving excitation code e.

[0047] The adaptive codebook 3-1 uses the period code to cut out a segment of the driving excitation vector c(t-1) whose length corresponds to a predetermined period, and repeats the cut-out vector until it reaches the frame length, thereby selecting and outputting a period code vector candidate Va (a normalized vector) corresponding to the periodic component of the input speech x. The adaptive codebook 3-1 stores, as its elements, a predetermined number of normalized vectors corresponding to the period codes.
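The cut-and-repeat operation of the adaptive codebook 3-1 can be sketched in Python as follows. The sketch makes two assumptions that the text does not state: the segment is taken from the most recent samples of the past excitation, and "normalized" means unit Euclidean norm. The function name `adaptive_codebook_vector` is illustrative.

```python
import math


def adaptive_codebook_vector(past_excitation, period, frame_len):
    """Build a one-frame periodic vector from the previous excitation.

    Assumption: the last `period` samples of c(t-1) are cut out and
    repeated until `frame_len` samples are filled, then scaled to unit
    Euclidean norm.
    """
    if not 0 < period <= len(past_excitation):
        raise ValueError("period must be between 1 and len(past_excitation)")
    segment = past_excitation[-period:]
    repeated = [segment[i % period] for i in range(frame_len)]
    norm = math.sqrt(sum(v * v for v in repeated))
    return [v / norm for v in repeated] if norm > 0.0 else repeated


if __name__ == "__main__":
    va = adaptive_codebook_vector([0.0, 0.0, 3.0, 4.0], period=2, frame_len=4)
    print(va)  # the two-sample segment appears twice
```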

[0048] The predetermined period is selected so that the distortion d of the reproduced speech candidate y becomes small; in general it corresponds to the pitch period of the input speech x, and this embodiment follows that practice. The fixed codebook 3-2 selects and outputs a fixed code vector candidate (a normalized vector), one frame long, corresponding to the aperiodic component of the speech. These fixed code vector candidates are stored in the fixed codebook 3-2, independently of the input speech x, as a number of candidate vectors specified in advance according to the number of bits used for encoding.

[0049] The periodization unit 3-3 outputs a time-series vector candidate Vf obtained by periodizing the fixed code vector candidate output from the fixed codebook 3-2 at the predetermined period (corresponding to the pitch period) specified by the period code. This periodization is performed either by applying a comb filter having a tap at the specified period position or, as in the adaptive codebook 3-1, by repeating a vector cut out from the beginning of the frame with a length corresponding to the specified period. Such a periodization unit 3-3 is often used to improve coding efficiency. When the input speech x itself has little or no pitch component, as in a consonant section, the periodization unit 3-3 has no effect.
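The two periodization options named above can be sketched as follows. This is an illustration under assumptions: the comb filter is taken to be a single-tap feedback comb, y[n] = v[n] + g * y[n - period], and the gain `gain` is a hypothetical parameter not given in the text.

```python
def periodize_by_repetition(vector, period):
    """Repeat the first `period` samples of the frame over the whole frame."""
    return [vector[i % period] for i in range(len(vector))]


def periodize_by_comb(vector, period, gain=1.0):
    """Apply an assumed single-tap feedback comb filter at lag `period`."""
    out = list(vector)
    for n in range(period, len(out)):
        out[n] += gain * out[n - period]
    return out


if __name__ == "__main__":
    print(periodize_by_repetition([1, 2, 3, 4, 5], 2))
    print(periodize_by_comb([1.0, 0.0, 0.0, 0.0, 0.0], 2, gain=0.5))
```

When the period is at least as long as the frame, both functions return the input unchanged, which is one way to read the remark that the unit has no effect when no pitch component is present.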

[0050] The weight code generation unit 3-7 further divides the frame-unit (10 ms) weight code included in the driving excitation code e into two subframes of 5 ms each, and thereby generates and outputs, for each subframe, the weight ga for the period code vector candidate Va (a normalized vector) of the adaptive codebook 3-1 and the weight gf for the aperiodic time-series vector candidate Vf (a normalized vector) output from the periodization unit 3-3. The weight code generation unit 3-7 is described in detail below.

[0051] The multiplication unit 3-4 multiplies the period code vector candidate Va input from the adaptive codebook 3-1 by the weight ga input from the weight code generation unit 3-7 and outputs the product to the addition unit 3-6 as a vector candidate ca. The multiplication unit 3-5 multiplies the time-series vector candidate Vf input from the periodization unit 3-3 by the weight gf input from the weight code generation unit 3-7 and outputs the product to the addition unit 3-6 as a vector candidate cf. The addition unit 3-6 adds the vector candidates ca and cf and outputs the sum to the synthesis filter 1-5 as the driving excitation vector candidate c.

[0052] FIG. 4 shows the detailed configuration of the weight code generation unit 3-7; (a) is a block diagram and (b) is a specific configuration example of the weight codebook 4-1. As shown in FIG. 4(a), the weight code generation unit 3-7 applies MA predictive vector quantization and consists of the weight codebook 4-1 and the MA prediction unit 4-2.

[0053] The weight codebook 4-1 holds a predetermined number of two-dimensional vectors (element vectors), each having as its elements the weight coefficient ga for the normalized vectors registered in the adaptive codebook 3-1 and the MA prediction difference xf of the weight for the normalized vectors registered in the fixed codebook 3-2. For example, when each element vector is represented as 6-bit data, the weight codebook 4-1 consists of 64 (= 2^6) element vectors, as shown in FIG. 4(b).

[0054] During actual encoding of the input speech x, the weight codebook 4-1 selects an element vector based on the weight code in the driving excitation code e, outputs the weight coefficient ga of the selected element vector to the multiplication unit 3-4, and outputs the MA prediction difference xf paired with that weight coefficient ga to the MA prediction unit 4-2.

[0055] The MA prediction unit 4-2 consists of (m-1) serially connected buffer units 4b1, 4b2, ..., 4b(m-1), each of which delays the MA prediction difference xf by one subframe; vector multiplication units 4k1, 4k2, 4k3, ..., 4km, which multiply the MA prediction difference xf output from the weight codebook 4-1 and the outputs of the buffer units 4b1, 4b2, ..., 4b(m-1) by the MA prediction coefficients a1, a2, a3, ..., am; and vector addition units 4a1, 4a2, 4a3, ..., which successively add the outputs of the vector multiplication units 4k1, 4k2, 4k3, ..., 4km.

[0056] The MA prediction unit 4-2 configured in this way calculates an MA prediction residual predicted from m time-sequential MA prediction differences xf(n), xf(n-1), xf(n-2), ..., xf(n-m) and the MA prediction coefficients a1, a2, a3, ..., am, and outputs this MA prediction residual to the multiplication unit 3-5 as the weight coefficient gf for the fixed codebook 3-2.
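The tapped-delay-line structure of the MA prediction unit 4-2 can be sketched as below. The sketch assumes that coefficient a1 multiplies the current value xf(n) and that ak multiplies the output of buffer 4b(k-1), which follows the block description with (m-1) buffers and m multipliers; the class name `MAPredictor` and the zero initial buffer contents are illustrative assumptions.

```python
from collections import deque


class MAPredictor:
    """Weighted sum of the current and the (m-1) previous xf values."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)  # a1 ... am
        depth = len(self.coeffs) - 1
        # Buffers 4b1 ... 4b(m-1); assumed to start at zero.
        self.history = deque([0.0] * depth, maxlen=depth)

    def step(self, xf):
        """Consume one subframe value xf(n) and return the weight gf."""
        taps = [xf] + list(self.history)  # xf(n), xf(n-1), ...
        gf = sum(a * x for a, x in zip(self.coeffs, taps))
        self.history.appendleft(xf)  # shift the delay line by one subframe
        return gf


if __name__ == "__main__":
    predictor = MAPredictor([1.0, 0.5])
    print([predictor.step(v) for v in (2.0, 4.0, 0.0)])
```

Because the output depends only on a finite number of past codebook entries, an error in one subframe stops affecting the decoder after m subframes.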

[0057] The element vectors of the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am are determined by training performed before the actual encoding of the input speech x. In this training, the element vectors of the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am are determined so that the distortion d of the reproduced speech candidate y is minimized. The method of setting the element vectors of the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am is described in detail later.

[0058] Next, FIG. 5 is a block diagram showing the functional configuration of the decoder corresponding to the encoder. The code receiving unit 5-1 receives a code from a transmission path or a storage medium and outputs the linear prediction parameter code to the linear prediction parameter decoding unit 5-2 and the driving excitation code e to the driving excitation vector generation unit 5-3. The linear prediction parameter decoding unit 5-2 decodes the linear prediction parameter code to reconstruct the synthesis filter coefficient b and outputs this coefficient b to the synthesis filter 5-4 and the post-processing unit 5-5.

[0059] The driving excitation vector generation unit 5-3 generates the excitation vector corresponding to the driving excitation code e and outputs it to the synthesis filter 5-4. Its configuration corresponds to that of the driving excitation vector generation unit 1-4 in the encoder described above. The synthesis filter 5-4 reproduces the input speech x based on the driving excitation vector and the synthesis filter coefficient b and outputs the reproduced speech to the post-processing unit 5-5. The post-processing unit 5-5 performs post-filtering that perceptually reduces noise in the reproduced speech and outputs the result externally as output speech. The post-processing unit 5-5 is sometimes omitted, for example to reduce the processing load.

[0060] Next, the method of determining the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am applied to the encoder and decoder configured as described above is explained.

[0061] First, when determining the weight codebook 4-1 applied to the encoder and the MA prediction coefficients a1, a2, a3, ..., am applied to the encoder and decoder, the distance calculation unit 2-4 calculates the distortion d as follows. Namely, taking into account the auditory characteristics of human sound pressure perception, a value W given as a function of the power of the input speech x is calculated based on equation (1) below. Here p is a value in the range 0 < p < 1, preferably about 0.3.

W = (|x|^2)^p / |x|^2    (1)
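Equation (1) can be evaluated with a few lines of Python. The sketch assumes that |x|^2 denotes the sum of squared samples of the segment being scored; the function name `sound_pressure_weight` is illustrative. Since W equals the power raised to (p - 1), it grows as the segment becomes quieter, which is how low-level portions gain influence during training.

```python
def sound_pressure_weight(segment, p=0.3):
    """Return W = (|x|^2)^p / |x|^2 for one segment of input samples."""
    power = sum(s * s for s in segment)  # assumed meaning of |x|^2
    if power == 0.0:
        raise ValueError("W is undefined for an all-zero segment")
    return power ** p / power


if __name__ == "__main__":
    for level in (10.0, 1.0, 0.1):
        print(level, sound_pressure_weight([level]))
```

At p = 1 the weight would be constant, giving the plain squared error; as p approaches 0 the error would be measured relative to the input power. The value of about 0.3 in the text lies between these extremes.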

[0062] In addition, the distortion d for each subframe is calculated based on the following distortion equation (2), which uses the value W as a coefficient. Here, H is a lower triangular matrix representing the characteristics of the synthesis filter 1-5: the zeroth-order component of the impulse response of the synthesis filter 1-5 lies on the main diagonal, and the first-order, second-order, ... components lie on the successive lower diagonals. T is the coefficient of the auditory weighting filters 2-2 and 2-3 (the auditory weighting coefficient).

d = W · T |x - ga·H·Va - gf·H·Vf|^2    (2)
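The structure of H and the evaluation of equation (2) can be sketched as follows. This is a simplified illustration: the auditory weighting T is taken to be the identity (that is, omitted), and the names `synthesis_matrix`, `matvec`, and `distortion` are assumptions introduced here.

```python
def synthesis_matrix(impulse_response):
    """Lower-triangular Toeplitz matrix: h[0] on the main diagonal,
    h[1], h[2], ... on the successive lower diagonals."""
    n = len(impulse_response)
    return [[impulse_response[i - j] if i >= j else 0.0 for j in range(n)]
            for i in range(n)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def distortion(x, h, va, vf, ga, gf, w=1.0):
    """d = W * |x - ga*H*Va - gf*H*Vf|^2, with T assumed to be identity."""
    big_h = synthesis_matrix(h)
    h_va, h_vf = matvec(big_h, va), matvec(big_h, vf)
    err = [xi - ga * a - gf * f for xi, a, f in zip(x, h_va, h_vf)]
    return w * sum(e * e for e in err)


if __name__ == "__main__":
    h = [1.0, 0.5]
    print(synthesis_matrix(h))
    print(distortion([2.0, 2.0], h, [1.0, 0.0], [0.0, 1.0], ga=2.0, gf=0.0, w=2.0))
```

Multiplying by H is equivalent to filtering each candidate through the synthesis filter from a zero state, truncated to the subframe length.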

[0063] Then, using an encoder whose distance calculation unit 2-4 is set up in this way, the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am are determined by carrying out processing according to the flowchart shown in FIG. 6. The method of determining the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am in this embodiment is described in detail below with reference to the flowcharts shown in FIGS. 6 and 7.

[0064] The difference between this determination method and the conventional one (see FIG. 9) is that, as shown in FIG. 6, the initial codebook of the weight codebook 4-1 is determined (6-2) after the initial values of the MA prediction coefficients a1, a2, a3, ..., am have been determined (6-1). The subsequent processing (6-1) to (6-9) is the same as in the conventional method.

[0065] In this embodiment, the initial values (m-th order initial values) of the MA prediction coefficients a1, a2, a3, ..., am are calculated by open-loop processing as shown in FIG. 7. That is, in the DC component removal processing (7-1), the mean of the training data is obtained and subtracted from the training data. In the AR prediction coefficient calculation processing 1 (7-2), first AR prediction coefficients are obtained by the covariance method from the data obtained by subtracting the mean from the training data. In the impulse response generation processing (7-3), a unit impulse is input to the filter formed by the first AR prediction coefficients to obtain an impulse response of sufficient length.

[0066] Then, in the AR prediction coefficient calculation processing 2 (7-4), taking the impulse response as input, second AR prediction coefficients (m-th order) having the same order as the MA prediction coefficients are obtained by the autocorrelation method. In the MA prediction coefficient calculation processing (7-5), the m-th order initial values of the MA prediction coefficients a1, a2, a3, ..., am are determined by obtaining the inverse filter of the second AR prediction coefficients. This open-loop processing yields initial values of the MA prediction coefficients a1, a2, a3, ..., am with stable filter characteristics.
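Steps (7-1) and (7-3) to (7-5) can be sketched in Python as below. This is one possible reading, not the procedure of the specification: the covariance-method step (7-2) is not implemented and the first AR coefficients are supplied by the caller; the polynomial convention A(z) = 1 + a1 z^-1 + ... is assumed; the autocorrelation method is realized with the Levinson-Durbin recursion; and the "inverse filter" of the second AR model 1/A2(z) is read as the FIR filter A2(z), whose taps are returned. All function names are illustrative.

```python
def remove_dc(data):
    """Step 7-1: subtract the mean from the training data."""
    mean = sum(data) / len(data)
    return [v - mean for v in data]


def impulse_response(a, length):
    """Step 7-3: impulse response of the all-pole filter 1/A(z), a[0] == 1."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * h[n - k]
        h.append(acc)
    return h


def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0] .. r[max_lag]."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Step 7-4: solve the autocorrelation normal equations for A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def ma_initial_values(first_ar, m, response_len=64):
    """Steps 7-3 to 7-5 under the assumptions stated in the lead-in."""
    h = impulse_response(first_ar, response_len)
    r = autocorrelation(h, m)
    return levinson_durbin(r, m)  # taps of the assumed inverse filter A2(z)


if __name__ == "__main__":
    print(remove_dc([1.0, 2.0, 3.0]))
    print(ma_initial_values([1.0, -0.5], m=2))
```

The autocorrelation method is known to yield a minimum-phase polynomial, which is consistent with the remark that the resulting initial values have stable filter characteristics.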

[0067] One might simply suppose that initial values of the MA prediction coefficients could be obtained by truncating the impulse response of the first AR prediction coefficients at the target MA prediction order m. However, approximating the AR prediction coefficients directly by MA prediction requires a very high MA prediction order, and at the MA prediction orders used in audio signal coding the truncation error results in an unstable filter.

[0068] Therefore, according to this embodiment, the open-loop processing makes it possible to determine better initial values of the MA prediction coefficients a1, a2, a3, ..., am than before. Conventionally, the initial values of the MA prediction coefficients a1, a2, a3, ..., am were determined empirically; in this embodiment, combining the AR prediction technique and the MA prediction technique through the open-loop processing makes it possible to determine more suitable initial values.

[0069] Once the initial values of the MA prediction coefficients a1, a2, a3, ..., am have been determined as described above, the initial codebook of the weight codebook 4-1 is determined from those initial values by the LBG algorithm, as in the conventional method. Because more suitable initial values of the MA prediction coefficients a1, a2, a3, ..., am have already been determined, the initial codebook can be determined efficiently.

[0070] When the initial values of the MA prediction coefficients a1, a2, a3, ..., am and the initial codebook of the weight codebook 4-1 have been determined in this way, the MA prediction coefficients a1, a2, a3, ..., am and the weight codebook 4-1 are finally determined by the recursive training processing (6-1) to (6-9) based on them. That is, the elements of the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am are finally determined so that the sum D of the distortions d calculated by distortion equation (2) over all the training data is minimized.

[0071] The weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am finally determined in this way are applied to the encoder that actually encodes the input signal x to be transmitted or stored in a storage medium and to the decoder that decodes and reproduces this input signal x (see FIGS. 1 to 5). In this case, the distance calculation unit 2-4 encodes the input speech x so that the distortion d for each subframe is minimized, based on the same distortion equation as in the conventional method, that is, distortion equation (2) with W = 1.

[0072] An encoder and a decoder using the weight codebook 4-1 and the MA prediction coefficients a1, a2, a3, ..., am according to this embodiment were implemented in software, for example on a computer, and compared with the conventional method in a subjective evaluation test. In this experiment, when the weight codebook 4-1 was designed at the same bit rate, this embodiment was rated higher than the conventional method, confirming the effectiveness of the present invention.

[0073] As shown in FIG. 8, with slight modification the processing of FIG. 6 can also be applied to determining the LSP codebook of an LSP quantizer. After the initial values of the MA prediction coefficients have been determined (8-1) as described above, the initial codebook of the LSP codebook is determined (8-2) and the training data is encoded (8-3); the following processing (8-4) to (8-10) is then repeated.

[0074] That is, processing (8-4) updates the first stage of the LSP codebook, processing (8-5) encodes the training data, processing (8-6) updates the first stage of the LSP codebook, processing (8-7) updates the MA prediction coefficients, processing (8-8) encodes the training data, and processing (8-9) calculates the distortion D (the sum of the distortions d). The LSP codebook can be determined by repeating this processing.

[0075]

[Effects of the Invention] As described above, the weight codebook and its creation method, the method of setting initial values of MA prediction coefficients in training during codebook design, the audio signal encoding method and decoding method, the computer-readable storage medium storing the encoding program, and the computer-readable storage medium storing the decoding program according to the present invention provide the following effects.

[0076] (1) The creation method applies to the weight codebook of an encoding method that: separates an input audio signal into a short-term prediction signal indicating the prediction result of the envelope characteristic of its short-term frequency spectrum and a driving excitation signal indicating the prediction residual; selects a normalized vector based on the driving excitation signal and selects a weight defining the magnitude of the normalized vector from a weight codebook whose elements are such weights, thereby generating a driving excitation vector candidate; generates the driving excitation signal so that the distortion, with respect to the input audio signal, of the reproduced speech candidate generated from the driving excitation vector candidate and the short-term prediction signal is minimized under a predetermined distance measure; and encodes the driving excitation signal thus generated and the short-term prediction signal. In this creation method, each element of the weight codebook is set using a distance measure that takes human sound pressure perception characteristics into account, so the weight codebook contains more elements corresponding to portions of relatively low sound pressure level than before. Therefore, when an audio signal is encoded using a weight codebook created in this way, degradation of sound quality in portions of the decoded audio signal with relatively low sound pressure level can be suppressed.

[0077] (2) In the method of setting initial values of MA prediction coefficients in MA predictive vector quantization, the initial values are calculated by approximating, using the MA prediction technique, AR prediction coefficients calculated from a plurality of training data. More accurate initial values of the MA prediction coefficients can therefore be obtained than with the conventional method of setting them. Using such initial values allows the weight codebook and the MA prediction coefficients used in MA predictive vector quantization to be set more optimally.

[0078] (3) Using the initial values of the MA prediction coefficients obtained according to the present invention, an initial codebook of the weight codebook in MA predictive vector quantization is created by the LBG algorithm, and the elements of the weight codebook and the MA prediction coefficients are successively updated by recursive training so that the sum of the distortions of the reproduced signals calculated for the training data is minimized, finally determining the elements of the weight codebook and the MA prediction coefficients. The weight codebook and the MA prediction coefficients can thereby be set more optimally. Therefore, a high-quality reproduced audio signal can be obtained even when the input audio signal is encoded with a small number of bits.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of an encoder according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the functional configuration of the distortion calculation unit in the encoder according to an embodiment of the present invention.

FIG. 3 is a block diagram showing the functional configuration of the driving excitation vector generation unit in the encoder according to an embodiment of the present invention.

FIG. 4 is a block diagram showing the functional configuration of the weight code generation unit in the encoder according to an embodiment of the present invention, together with a configuration example of the weight codebook.

FIG. 5 is a block diagram showing the functional configuration of a decoder according to an embodiment of the present invention.

FIG. 6 is a flowchart showing the method of determining the elements of the weight codebook and the MA prediction coefficients according to an embodiment of the present invention.

FIG. 7 is a flowchart showing the method of determining the initial values of the MA prediction coefficients according to an embodiment of the present invention.

FIG. 8 is a flowchart showing the method of determining the LSP codebook of an LSP quantizer based on the present invention.

FIG. 9 is a flowchart showing the conventional method of determining the elements of a codebook and the MA prediction coefficients.

[Explanation of symbols]

1-1: Linear prediction analysis unit
1-2: Linear prediction parameter encoding unit
1-3: Linear prediction parameter decoding unit
1-4: Driving excitation vector generation unit
1-5: Synthesis filter
1-6: Distortion calculation unit
1-8: Codebook search control unit
1-9: Code transmission unit
2-2, 2-3: Auditory weighting filters
2-4: Distance calculation unit
3-1: Adaptive codebook
3-2: Fixed codebook
3-3: Periodization unit
3-4, 3-5: Multiplication units
3-6: Addition unit
3-7: Weight code generation unit
4-1: Weight codebook
4-2: MA prediction unit
4b1, 4b2, ..., 4b(m-1): Buffer units
4k1, 4k2, 4k3, ..., 4km: Vector multiplication units
4a1, 4a2, 4a3, ...: Vector addition units
5-1: Code receiving unit
5-2: Linear prediction parameter decoding unit
5-3: Driving excitation vector generation unit
5-4: Synthesis filter
5-5: Post-processing unit

Continued from the front page

(72) Inventor: Kenichi Furuya, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo

F-terms (reference): 5D045 CA01 CC07 DA11; 5J064 AA01 BA13 BB01 BB03 BB14 BC12 BC17 BC27 BC29 BD02 BD03 BD04; 9A001 CZ05 EE04 FF05 HH15 KZ56

(54) [Title of the Invention] Weight codebook and method of creating the same, method of setting the initial value of an MA prediction coefficient during learning in codebook design, audio signal encoding method and decoding method therefor, computer-readable storage medium storing an encoding program, and computer-readable storage medium storing a decoding program

Claims

[Claims]

1. A method for creating a weight codebook for use in an encoding method in which: an input audio signal is separated into a short-term prediction signal indicating a prediction result of the envelope characteristic of the short-term frequency spectrum and a driving excitation signal indicating the prediction residual thereof; a driving excitation vector candidate is generated, based on the driving excitation signal, by selecting a normalized vector and by selecting a weight from a weight codebook whose elements are weights that define the magnitude of the normalized vector; the driving excitation signal is generated such that the distortion, relative to the input audio signal, of a reproduced sound candidate generated from the driving excitation vector candidate and the short-term prediction signal is minimized under a predetermined distance scale; and the driving excitation signal thus generated and the short-term prediction signal are encoded, wherein each element of the weight codebook is set using a distance scale that takes human sound pressure perception characteristics into account.

2. The method for creating a weight codebook according to claim 1, wherein a distance scale is used that incorporates human sound pressure perception characteristics as a function of the power of the input audio signal.

3. The method for creating a weight codebook according to claim 2, wherein, when the input audio signal is x, the function is given by $(|x|^2)^p / |x|^2$ for a value p in the range 0 < p < 1.

4. A weight codebook obtained by the method for creating a weight codebook according to claim 1.

5. A method for setting an initial value of an MA prediction coefficient in MA prediction vector quantization, wherein the initial value of the MA prediction coefficient is calculated by approximating, with MA prediction, an AR prediction coefficient calculated from a plurality of learning data.

6. A method for setting an initial value of an MA prediction coefficient in MA prediction vector quantization, comprising: a step of calculating the average value of a plurality of learning data; a step of calculating a first AR prediction coefficient based on the covariance method, using the values obtained by subtracting the average value from each learning datum; a step of obtaining the impulse response of a filter that uses the first AR prediction coefficient as its filter coefficient; a step of calculating, based on the method, a second AR prediction coefficient of the same order as the MA prediction coefficient; and a step of calculating the initial value of the MA prediction coefficient by obtaining the inverse filter of a filter that uses the second AR prediction coefficient as its filter coefficient.

7. A method for creating a weight codebook for MA prediction vector quantization, comprising: a step of calculating the initial value of the MA prediction coefficient using the method for setting an initial value of an MA prediction coefficient according to claim 5; a step of creating an initial codebook of the weight codebook based on the LBG algorithm; and a step of, in MA prediction vector quantization configured with the initial value of the MA prediction coefficient and the initial codebook, sequentially updating each element of the weight codebook and the MA prediction coefficient by recursive learning so as to minimize the sum of the distortions of the reproduced signal calculated for each learning datum, and finally determining each of them.

8. The method for creating a weight codebook for MA prediction vector quantization according to claim 7, wherein the distortion of the reproduced signal is calculated using a distance scale that takes human sound pressure perception characteristics into account.

9. The method for creating a weight codebook for MA prediction vector quantization according to claim 7, wherein the distortion of the reproduced signal is calculated using a distance scale that takes into account a function of the power of the input audio signal.

10. The method for creating a weight codebook for MA prediction vector quantization according to claim 7, wherein the distortion of the reproduced signal is calculated using a distance scale that takes into account a function W of the power of the input audio signal x, the function W being given by the following equation (1) for a value p in the range 0 < p < 1:

$W = (|x|^2)^p / |x|^2$ (1)

11. A weight codebook for MA prediction vector quantization obtained by the method for creating a weight codebook for MA prediction vector quantization according to any one of claims 7 to 10.

12. An audio signal encoding method in which: an input audio signal is separated into a short-term prediction signal indicating a prediction result of the envelope characteristic of the short-term frequency spectrum and a driving excitation signal indicating the prediction residual thereof; a driving excitation vector candidate is generated, based on the driving excitation signal, by selecting a normalized vector and by selecting a weight from a weight codebook whose elements are weights that define the magnitude of the normalized vector; the driving excitation signal is generated such that the distortion, relative to the input audio signal, of a reproduced sound candidate generated from the driving excitation vector candidate and the short-term prediction signal is minimized under a predetermined distance scale; and the driving excitation signal thus generated and the short-term prediction signal are encoded, wherein each element of the weight codebook is set using a distance scale that takes human sound pressure perception characteristics into account.

13. The audio signal encoding method according to claim 12, wherein a distance scale is used that incorporates human sound pressure perception characteristics as a function of the power of the input audio signal.

14. The audio signal encoding method according to claim 13, wherein, when the input audio signal is x, the function is given by $(|x|^2)^p / |x|^2$ for a value p in the range 0 < p < 1.

15. The audio signal encoding method according to any one of claims 12 to 14, wherein, when the driving excitation vector candidate is generated from the driving excitation signal using MA prediction vector quantization, a weight codebook and an MA prediction coefficient obtained by a method for creating a weight codebook for MA prediction vector quantization are used.

16. An audio signal decoding method for decoding a code generated by the audio signal encoding method according to claim 12, wherein the code is decoded using the weight codebook used in that audio signal encoding method.

17. An audio signal decoding method for decoding a code generated by the audio signal encoding method according to claim 15, wherein the code is decoded using the weight codebook and the MA prediction coefficient used in that audio signal encoding method.

18. A computer-readable storage medium storing an encoding program for processing according to the audio signal encoding method according to claim 12.

19. A computer-readable storage medium storing a decoding program for processing according to the audio signal decoding method according to claim 16.
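
As an illustrative sketch of the weighting function recited in claims 3, 10, and 14 (not the implementation disclosed in this publication), the following Python computes W = (|x|^2)^p / |x|^2 for one frame and applies it to a squared-error distortion. The names power_weight and weighted_distortion, the default p = 0.5, the small floor eps that guards against all-zero frames, and the choice to multiply the plain squared error by W are assumptions made for illustration; the claims state only that the distance scale takes W into account.

```python
def power_weight(x, p=0.5, eps=1e-12):
    """Return W = (|x|^2)^p / |x|^2 for a frame x, with 0 < p < 1.

    |x|^2 is taken here as the sum of squared samples of the frame.
    eps is a hypothetical floor (not from the claims) that avoids
    division by zero when the frame is silent.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    power = max(sum(v * v for v in x), eps)
    return power ** p / power


def weighted_distortion(x, y, p=0.5):
    """Squared error between input x and candidate y, scaled by W(x).

    Multiplying the squared error by W is an assumed way of taking W
    into account in the distance scale.
    """
    squared_error = sum((a - b) ** 2 for a, b in zip(x, y))
    return power_weight(x, p) * squared_error


if __name__ == "__main__":
    quiet, loud = [0.1, 0.1], [10.0, 10.0]
    # Each candidate has the same 10 % relative error.
    quiet_candidate, loud_candidate = [0.09, 0.09], [9.0, 9.0]
    print("weight (quiet frame):", power_weight(quiet))
    print("weight (loud frame): ", power_weight(loud))
    print("weighted distortion (quiet):", weighted_distortion(quiet, quiet_candidate))
    print("weighted distortion (loud): ", weighted_distortion(loud, loud_candidate))
```

Under these assumptions, the weighted distortion of a frame with a fixed relative error grows with the p-th power of the frame power instead of the power itself. For the two frames above, which differ in power by a factor of 10^4, the ratio of loud-frame to quiet-frame distortion falls from about 10^4 without weighting to about 10^2 with p = 0.5, so low-sound-pressure frames carry more influence when the codebook elements are trained.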