JP2947788B1

JP2947788B1 - High-speed encoding method and apparatus for speech and audio signals and recording medium

Info

Publication number: JP2947788B1
Application number: JP10129236A
Authority: JP
Inventors: 仲大室; 一則間野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-05-12
Filing date: 1998-05-12
Publication date: 1999-09-13
Anticipated expiration: 2018-05-12
Also published as: JPH11327599A

Abstract

[Abstract]
[Problem] To provide speech and audio signal coding that yields high-quality reproduced speech at a low bit rate, with little memory and little computation.
[Solution] In the process of calculating the distortion between the target audio signal and the reproduced audio signal: the periodization gain is set to 1; the period used for periodization is approximated by an integer number of samples; the impulse response of the synthesis filter, or of the synthesis filter with perceptual weighting, is obtained; that impulse response is truncated to a length of at most one half of the periodization period; the correlation matrix of the impulse response matrix is calculated from the truncated impulse response; and, on the premise that periodization is replaced by the integer-sample approximation, an approximate value of the power of the reproduced audio signal is calculated using the correlation matrix obtained from the truncated impulse response.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to a high-efficiency speech coding method, apparatus, and recording medium that digitally encode a speech signal sequence with a small amount of information by predictive coding, in which speech is synthesized by driving a filter that represents the spectral envelope characteristics of a speech, music, or other audio signal with an excitation vector.

【０００２】[0002]

[Prior Art] High-efficiency speech coding methods are used to make efficient use of radio spectrum in digital mobile communications, and of communication lines and storage media in speech or music storage services. One approach currently proposed for high-efficiency speech coding divides the original speech into fixed-length segments of about 5 to 50 ms called frames or subframes (hereinafter collectively called frames), separates the speech of each frame into two pieces of information, namely the characteristics of a linear filter representing the envelope of the frequency spectrum and the excitation signal that drives that filter, and encodes each of them.

[0003] In this approach, a known way to encode the excitation signal is to separate it into a periodic component, considered to correspond to the pitch period (fundamental frequency) of the speech, and the remaining components, and to encode them separately. One example of such excitation coding is Code-Excited Linear Prediction (CELP). Details of this technique are given in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates", IEEE Proc. ICASSP-85, pp. 937-940, 1985.

[0004] FIG. 4 shows a configuration example of the above coding method. For the speech supplied to the input terminal, the linear prediction analysis unit 1-1 calculates linear prediction parameters representing the frequency spectrum envelope of the input speech. The resulting linear prediction parameters are encoded by the linear prediction parameter encoding unit 1-2 and sent to the linear prediction parameter decoding unit 1-3. When the distortion calculation uses spectral information of the input speech, for example to take auditory characteristics into account, the linear prediction parameters are also sent to the distortion calculation unit 1-6.

[0005] The linear prediction parameter decoding unit 1-3 reconstructs the synthesis filter coefficients from the received code and sends them to the synthesis filter 1-5. When auditory characteristics are taken into account in the distortion calculation, the distortion calculation unit 1-6 may use these decoded linear prediction parameters instead of the unquantized ones. Details of linear prediction analysis and examples of encoding linear prediction parameters are given in, for example, Sadaoki Furui, "Digital Speech Processing" (Tokai University Press). The linear prediction analysis unit 1-1, the linear prediction parameter encoding unit 1-2, the linear prediction parameter decoding unit 1-3, and the synthesis filter 1-5 may be replaced with nonlinear counterparts.

[0006] The excitation vector generation unit 1-4 generates excitation vector candidates one frame long and sends them to the synthesis filter 1-5. It consists broadly of a part corresponding to the pitch period (fundamental frequency) of the speech and a part corresponding to the remaining aperiodic components. The adaptive codebook 1-10 is the part corresponding to the pitch period. It cuts a segment whose length corresponds to a certain period out of the immediately preceding excitation vector c(t-1) stored in a buffer (the already quantized excitation of the preceding one to several frames) and repeats that segment until it reaches the frame length, thereby outputting a candidate time-series vector corresponding to the periodic component of the speech. The "certain period" is chosen so that the distortion in the distortion calculation unit 1-6 becomes small; in many cases the chosen period corresponds to the pitch period of the speech.
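The cut-and-repeat operation of the adaptive codebook can be illustrated with a minimal Python sketch. It assumes an integer lag and a plain list as the excitation buffer; the function name `adaptive_codebook_vector` and the toy values are hypothetical and are not taken from the patent.

```python
def adaptive_codebook_vector(past_excitation, lag, frame_len):
    """Cut the last `lag` samples of the past excitation and repeat them
    until the vector is `frame_len` samples long (integer lag only)."""
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must be between 1 and the buffer length")
    segment = past_excitation[-lag:]
    return [segment[i % lag] for i in range(frame_len)]


past = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # toy buffer, most recent sample last
print(adaptive_codebook_vector(past, lag=3, frame_len=8))
# → [3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 3.0, 4.0]
```

In an encoder this function would be evaluated for each candidate lag, and the lag giving the smallest distortion would be kept.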

[0007] The fixed codebook 1-11 is the part corresponding to components other than the fundamental period of the speech. It stores, independently of the input speech, a number of candidate vectors fixed in advance according to the number of bits available for coding, and outputs candidate time-series code vectors one frame long. The fixed code vector candidates output from the fixed codebook 1-11 are periodized as needed in the periodization unit 1-16 with the period specified by the periodic code (which, as noted above, generally corresponds to the pitch period). Periodization means either applying a comb filter with taps at the specified period positions or, as in the adaptive codebook, repeating a segment cut from the head of the vector with a length corresponding to the specified period. When the period is a non-integer number of samples, periodization with a non-integer period is realized by convolution with a sampling function. The periodization unit 1-16 may not be applied when the speech itself has little or no pitch component, as in consonant or non-speech segments.
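The two forms of periodization described above can be sketched as follows for an integer period. This is an illustrative sketch only: the recursive form of the comb filter, the `gain` parameter, and the function names are assumptions, and the non-integer case (sampling-function convolution) is not covered.

```python
def periodize_comb(v, period, gain=1.0):
    """Recursive comb filter: y[n] = v[n] + gain * y[n - period]."""
    y = list(v)
    for n in range(period, len(y)):
        y[n] += gain * y[n - period]
    return y


def periodize_repeat(v, period):
    """Repeat the first `period` samples of v over the whole frame."""
    return [v[n % period] for n in range(len(v))]


v = [1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]
print(periodize_comb(v, period=3, gain=0.5))
# → [1.0, 0.0, 0.0, 0.5, 0.0, 2.0, 0.25, 0.0]
print(periodize_repeat(v, period=3))
# → [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

The comb filter keeps samples that lie beyond the first period (the pulse at index 5), whereas the repetition form discards them, so the two variants differ whenever the code vector has content after the first period.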

[0008] The time-series vector candidates output from the adaptive codebook 1-10 and the periodization unit 1-16 are multiplied in the multipliers 1-13 and 1-14, respectively, by the weights created in the weight creation unit 1-12, and are added in the adder 1-15 to form the excitation vector candidate c. A configuration with only the fixed codebook 1-11, without the adaptive codebook 1-10, is also possible. In particular, when coding signals with little pitch periodicity, such as consonants or non-speech segments, the adaptive codebook 1-10 is often omitted to save bits.

[0009] The synthesis filter 1-5 is a linear filter whose coefficients are the output of the linear prediction parameter decoding unit 1-3. It takes the excitation vector candidate c as input and outputs the reproduced speech candidate y. The order of the synthesis filter 1-5, that is, the order of the linear prediction analysis, is typically about 10 to 16. As already noted, the synthesis filter 1-5 may be a nonlinear filter.
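As an illustration of a linear synthesis filter, the sketch below implements an all-pole filter 1/A(z) in direct form. The sign convention A(z) = 1 + Σ a_k z^-k, the optional `state` argument, and the first-order toy coefficients are assumptions made for the example; a real coder would use about 10 to 16 coefficients.

```python
def synthesis_filter(excitation, lpc, state=None):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z^-k.

    `state` holds past outputs, most recent first; it defaults to zeros."""
    p = len(lpc)
    hist = list(state) if state is not None else [0.0] * p
    out = []
    for c in excitation:
        y = c - sum(a * h for a, h in zip(lpc, hist))
        out.append(y)
        hist = [y] + hist[:-1]  # shift the output history by one sample
    return out


# First-order toy filter: the impulse response decays as 0.5 ** n.
print(synthesis_filter([1.0, 0.0, 0.0, 0.0], lpc=[-0.5]))
# → [1.0, 0.5, 0.25, 0.125]
```

Because the filter is recursive (IIR), its impulse response never becomes exactly zero, which is why an equivalent FIR representation over one frame needs as many taps as the frame length.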

[0010] The distortion calculation unit 1-6 calculates the distortion between the reproduced speech candidate y output by the synthesis filter 1-5 and the input speech x. This calculation often takes the synthesis filter coefficients or the unquantized linear prediction coefficients into account, for example through perceptual weighting.

[0011] The codebook search control unit 1-8 selects the periodic code, fixed code, and weight code that minimize the distortion between each reproduced speech candidate y and the input speech x, and thereby determines the excitation vector for the frame. Ideally, the combination of periodic code, fixed code, and weight code that jointly minimizes the distortion should be selected, but this requires an enormous amount of processing and is difficult to do in a realistic time. In practice, therefore, the codes are often determined either in the order periodic code → adaptive codebook component g_a of the weight code → fixed code → fixed codebook component g_r of the weight code, or in the order periodic code → provisional value of g_a → fixed code → weight code (both g_r and g_a), and the result of this sequential decision is regarded as the optimal combination.

[0012] When the codes are determined sequentially in this way, the search for the optimal periodic code and adaptive codebook component of the weight code is performed as if there were no fixed codebook (that is, assuming that the fixed codebook component is zero). FIG. 5 shows a configuration example for determining the fixed code and the weight code after the periodic code and the adaptive codebook component of the weight code have been determined. To simplify the drawing, FIG. 5 omits some of the linear-prediction-related units shown in FIG. 4.

[0013] The previously determined adaptive codebook output v_a is multiplied in the multiplier 2-13 by the previously determined weight, or by its provisional value, to form the adaptive code vector c_a. The vector obtained by passing c_a through the synthesis filter 2-18 is subtracted from the input speech x. The result is called the reference speech x_r for the fixed codebook component. When the fixed code and the weight code are determined, the adaptive codebook component is treated as absent, and the search looks for the values that minimize the distortion between the reproduced speech generated from the fixed codebook component and the reference speech x_r. In a configuration without an adaptive codebook, the input speech and the reference speech are identical.
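The construction of the reference speech x_r can be sketched as below. For simplicity the synthesis filter is represented by its impulse response `h` and applied from zero state, so filter memory carried over from earlier frames is ignored; that simplification, the function names, and the toy values are assumptions for illustration.

```python
def convolve_truncated(h, c):
    """Zero-state filtering of c by impulse response h, truncated to len(c)."""
    return [
        sum(h[k] * c[i - k] for k in range(min(i + 1, len(h))))
        for i in range(len(c))
    ]


def fixed_codebook_target(x, v_a, g_a, h):
    """x_r = x - synth(g_a * v_a): the target for the fixed codebook search."""
    c_a = [g_a * s for s in v_a]          # adaptive code vector c_a
    y_a = convolve_truncated(h, c_a)      # its synthesized contribution
    return [xi - yi for xi, yi in zip(x, y_a)]


x = [1.0, 2.0, 3.0, 4.0]
v_a = [1.0, 0.0, 1.0, 0.0]
h = [1.0, 0.5]
print(fixed_codebook_target(x, v_a, g_a=2.0, h=h))
# → [-1.0, 1.0, 1.0, 3.0]
```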

[0014] FIG. 5 shows the search method for a configuration with two codebooks, an adaptive codebook and a fixed codebook, but it generalizes to three or more codebooks, for example when the fixed codebook has a two-stage structure. Put briefly, a reference speech is created by subtracting from the input speech the reproduced speech component synthesized from the codebook outputs that have already been determined, either finally or provisionally. Then, treating every codebook other than the one about to be searched as absent, the code that minimizes the distortion with respect to that reference speech is searched for. In this sequential procedure, instead of fixing each code to a single value in turn, several candidates may be retained at intermediate stages and the best combination chosen at the end; this is called delayed decision.

[0015] The excitation codes (periodic code, noise code, weight code) determined by the codebook search control unit 1-8 in FIG. 4 and the linear prediction parameter code output by the linear prediction parameter encoding unit 1-2 are sent to the code output unit 1-9 and, depending on the application, are either stored in a storage device or sent to the receiving side over a communication channel.

[0016] A problem with this CELP scheme is that the distortion calculation for selecting excitation vector candidates, particularly during the fixed codebook search, requires a very large amount of computation. To address this, pulse-excited CELP and CELP with a sparse codebook have been proposed. Both use fixed code vectors that are nonzero at only some elements (sample points) and zero at all other elements.

[0017] As an example of the former, pulse-excited CELP does not store the fixed codebook as frame-length vector patterns. Instead it forms a fixed code vector by placing a few pulses of height 1 at suitable positions in the frame, for example four pulses in an 80-sample frame. With such a pulse-type excitation and a suitably arranged order of operations in the distortion calculation, the computation can be reduced compared with conventional CELP. An example of the distortion calculation in the pulse-excited scheme is described below with reference to the drawings.
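A pulse-type fixed code vector of the kind described here can be sketched in a few lines. The function name and the pulse positions are hypothetical; only the frame length of 80 samples, the four pulses, and the pulse height of 1 come from the text.

```python
def pulse_code_vector(frame_len, positions):
    """Build a fixed code vector with unit-height pulses at `positions`."""
    v = [0.0] * frame_len
    for p in positions:
        v[p] = 1.0
    return v


v = pulse_code_vector(80, [3, 21, 40, 67])  # four pulses in an 80-sample frame
print(sum(1 for s in v if s != 0.0), len(v))
# → 4 80
```

Only the pulse positions need to be encoded, so no frame-length vector patterns have to be stored.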

[0018] FIG. 6 shows a configuration example in which distortion is calculated with perceptual weighting in the fixed codebook search configuration of FIG. 5. When the fixed codebook (the shape of the fixed code vector) is searched in FIG. 6, the weight g_r is generally allowed to take any value, and either the optimal fixed code vector is determined or the fixed code vector candidates are narrowed to a small number before the weight code is searched. The weight codebook is therefore omitted from the figure and the weight is written simply as g, which may take an arbitrary value for each fixed code vector. Perceptual weighting is implemented as a perceptual weighting filter that uses the unquantized linear prediction parameters or the quantized synthesis filter coefficients.

[0019] The reproduced speech candidate y_r output from the synthesis filter 3-1 is passed through the perceptual weighting filter 3-2, and the distortion is calculated against the reference speech x_r, which is likewise passed through the perceptual weighting filter 3-3. Because the perceptual weighting filters 3-2 and 3-3 normally use the same filter coefficients, inserting them as a single filter after the distance calculation unit 3-4 would be equivalent. For reasons of processing load, however, they are usually placed at two points ahead of the distance calculation unit 3-4, as shown in FIG. 6.

[0020] The distance calculation unit 3-4 measures the distance between the perceptually weighted reference speech x_w and the perceptually weighted reproduced speech candidate y_w. As the distance measure, for example,

[Equation 1]
A distance measure of this kind may be used. The excitation vector that minimizes the distance measure of equation (1) is selected. The perceptual weighting filters 3-2 and 3-3 exploit human auditory characteristics so that the distortion calculation reduces the perceived noisiness of the reproduced speech; they need not always be used.
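Equation (1) itself is not reproduced in this text. A plausible form, assuming the usual squared-error measure between the weighted reference x_w and the weighted candidate y_w (with the weight g already applied to the candidate), is sketched below; the symbol d and the sample index n are notation introduced here, not taken from the patent.

```latex
d = \lVert x_w - y_w \rVert^{2} = \sum_{n=0}^{N-1} \bigl( x_w(n) - y_w(n) \bigr)^{2} \qquad (1)
```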

[0021] In the configuration of FIG. 6, the operation of passing excitation vector candidates through the synthesis filter 3-1 and the perceptual weighting filter 3-2 can be performed quickly by combining the two filters into a single perceptually weighted synthesis filter with equivalent characteristics. One way to do this is an FIR filter whose coefficients are the impulse response from the input of the synthesis filter 3-1 to the output of the perceptual weighting filter 3-2. Since the synthesis filter is an IIR filter, an FIR filter with as many taps as the frame length is needed to realize equivalent characteristics.

[0022] Let N be the frame length (number of samples, or vector length) and let h_0, h_1, h_2, ..., h_{N-1} be the above impulse response (the FIR filter coefficient sequence). Using this impulse response, the impulse response matrix H is defined as follows.

[Equation 2]
The function of the periodization unit 3-7 can also be expressed as a matrix; this matrix is denoted P.
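Equation (2) is not reproduced in this text. Assuming the standard convolution form, in which multiplying a frame-length vector by H is equivalent to filtering it with h_0, ..., h_{N-1} from zero state, H would be the lower-triangular Toeplitz matrix sketched below. This is a reconstruction from the surrounding definitions, not a copy of the original equation.

```latex
H =
\begin{pmatrix}
h_0     & 0       & \cdots & 0      \\
h_1     & h_0     & \cdots & 0      \\
\vdots  & \vdots  & \ddots & \vdots \\
h_{N-1} & h_{N-2} & \cdots & h_0
\end{pmatrix}
\qquad (2)
```

With this form, the perceptually weighted reproduced speech candidate for a fixed code vector v_r and weight g is g H P v_r.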

[0023] With the matrices H and P, and on the premise that the weight g can take any value, minimizing equation (1) is equivalent to maximizing equation (3) below.

[Equation 3]
The codebook is searched by maximizing equation (3) because (HP)^t(HP) and x_w^t(HP) need to be calculated only once per frame, at the start of the search, which allows a fast search when a pulse-excited or sparse codebook is used. In the equation above, the symbol t denotes the transpose of a vector or matrix. The denominator of equation (3) corresponds to the power (sum of squares of the vector) of the synthesized signal.
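Equation (3) is not reproduced in this text, but its form follows from the surrounding statements. The derivation below is a sketch under the assumption that the distance measure is the squared error between x_w and g H P v_r, where v_r is the fixed code vector; d and g^* are notation introduced here.

```latex
\begin{aligned}
d(g) &= \lVert x_w - g\,HPv_r \rVert^{2}
      = x_w^{t}x_w - 2g\,x_w^{t}HPv_r + g^{2}\,v_r^{t}(HP)^{t}(HP)\,v_r ,\\
\frac{\partial d}{\partial g} = 0 \;&\Rightarrow\;
g^{*} = \frac{x_w^{t}HPv_r}{v_r^{t}(HP)^{t}(HP)\,v_r},\\
d(g^{*}) &= x_w^{t}x_w - \frac{\bigl(x_w^{t}HPv_r\bigr)^{2}}{v_r^{t}(HP)^{t}(HP)\,v_r}.
\end{aligned}
```

Since x_w^t x_w does not depend on the code vector, minimizing the distance amounts to maximizing the second term, (x_w^t H P v_r)^2 / (v_r^t (HP)^t (HP) v_r), whose denominator is the power of the synthesized signal.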

[0024] FIG. 7 shows a configuration example of a method for obtaining the value of equation (3) quickly. The perceptually weighted synthesis filter impulse response calculation unit 4-1 calculates the impulse response of the perceptually weighted synthesis filter unit 3-9 in FIG. 6. In FIG. 6 the periodization unit 3-7 lies between the fixed codebook and the synthesis filter. In the configuration of FIG. 7, because HP is calculated first when HPv_r is calculated, the periodization unit 4-2 is placed immediately after the impulse response calculation unit 4-1 and periodizes the impulse response h.

[0025] The correlation matrix calculation unit 4-5 calculates the correlation matrix (HP)^t(HP) from the periodized impulse response output by the periodization unit 4-2. The convolution unit 4-4 calculates x_w^t(HP) from the same periodized impulse response and the perceptually weighted reference speech x_w. The final distance measure calculation unit 4-7 calculates the value of equation (3) from the output matrix of the correlation matrix calculation unit 4-5, the output vector of the convolution unit 4-4, and the fixed code vector output by the fixed codebook 4-6.
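The benefit of precomputing (HP)^t(HP) and x_w^t(HP) once per frame can be illustrated with the following Python sketch. It assumes unit-height pulses, a toy 4-sample frame, and P equal to the identity so that A = HP = H; the names `precompute`, `criterion`, `phi`, and `d` are hypothetical.

```python
def precompute(a, x_w):
    """Return Phi = A^t A and d = A^t x_w for A = HP (done once per frame)."""
    n = len(a)
    phi = [[sum(a[k][i] * a[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    d = [sum(a[k][i] * x_w[k] for k in range(n)) for i in range(n)]
    return phi, d


def criterion(phi, d, positions):
    """(d^t v)^2 / (v^t Phi v) for a vector v of unit pulses at `positions`."""
    num = sum(d[p] for p in positions) ** 2
    den = sum(phi[p][q] for p in positions for q in positions)
    return num / den


# Toy frame: A is a lower-triangular Toeplitz H (P taken as the identity).
h = [1.0, 0.5, 0.25, 0.125]
a = [[h[i - j] if i >= j else 0.0 for j in range(4)] for i in range(4)]
x_w = [1.0, 0.0, 2.0, 0.0]
phi, d = precompute(a, x_w)

# Exhaustive search over all two-pulse vectors using only table look-ups.
best = max(((i, j) for i in range(4) for j in range(i + 1, 4)),
           key=lambda pos: criterion(phi, d, pos))
print(best)
```

For each candidate the search touches only a few entries of `phi` and `d`, so no filtering is needed inside the loop; the cost that remains is building and storing the N × N matrix.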

【００２６】[0026]

[Problems to Be Solved by the Invention] The distortion calculation with the configuration of FIG. 7 becomes a problem when the frame is long. For example, with a sampling frequency of 8 kHz and a frame length of 10 ms, the frame contains 80 samples and the correlation matrix (HP)^t(HP) is 80 × 80. A matrix with 6400 elements must therefore be calculated, which requires a great deal of memory and computation. Because high-efficiency speech coding at a low bit rate requires longer frames, it has not been possible to realize low-bit-rate speech coding with a small amount of computation using the method of FIG. 7.

[0027] One way to solve this problem is the "audio signal coding method" for which the present inventors have already filed an application (Japanese Patent Application No. Hei 9-040404). FIG. 8 shows a configuration example of this scheme. In this method, the impulse response of the perceptually weighted synthesis filter used in the distortion calculation is first truncated partway to give a finite-length FIR filter. The reference speech is passed through the inverse of the synthesis filter and then through the truncated finite-length FIR filter, so that the truncation distortion of the impulse response is imposed on the reference speech as well. In the configurations of FIGS. 6 and 7 the excitation vector is pitch-periodized, whereas in the configuration of FIG. 8 a pitch inverse filter is inserted on the reference speech side.

[0028] With this method the correlation matrix H_f^t H_f can be made very small. For example, if the impulse response is truncated at 5 taps, only a 5 × 5 correlation matrix needs to be calculated. The method achieves coding with very little processing even for long frames. In addition, because the distortion is calculated between a reference speech carrying the truncation distortion and a synthesized speech carrying the same truncation distortion, the distortion components due to truncation cancel between the two, so that degradation of sound quality stays small even when the impulse response is truncated to a few taps.

[0029] With the method of FIG. 8, however, some quality degradation, although small, is unavoidable as the price of fast computation. The main cause is that the pitch periodization that belongs on the excitation vector side is instead applied to the reference speech side as an inverse pitch periodization filter. Truncation of the impulse response itself has almost no effect on quality. It is therefore desirable to keep pitch periodization on the excitation vector side while still obtaining fast computation through truncation of the impulse response. If the pitch periodization unit is simply moved back to the excitation vector side, however, then even when the impulse response h is truncated to a finite length, the pitch-periodized impulse response Ph is not of finite length, and fast computation cannot be achieved.
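The last point can be checked numerically. The sketch below assumes periodization by a recursive comb filter with gain 1 and an integer period; the function name and the toy values (3 taps, period 5, 16-sample frame) are assumptions chosen for illustration.

```python
def periodize(h, period, frame_len, gain=1.0):
    """Zero-pad h to the frame length and apply y[n] += gain * y[n - period]."""
    y = list(h) + [0.0] * (frame_len - len(h))
    for n in range(period, frame_len):
        y[n] += gain * y[n - period]
    return y


h_trunc = [1.0, 0.5, 0.25]  # impulse response truncated to 3 taps
ph = periodize(h_trunc, period=5, frame_len=16)
last_nonzero = max(i for i, s in enumerate(ph) if s != 0.0)
print(last_nonzero)
# → 15
```

Although `h_trunc` has only 3 taps, its periodized version has nonzero samples up to the last index of the frame, so a correlation matrix built from it is again frame-sized. In this toy case the truncation length is shorter than the period, so the repeated copies do not overlap one another.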

[0030] The object of the present invention is to provide a method, an apparatus, and a recording medium for digitally encoding speech, music, or other audio signals so that high-quality reproduced speech is obtained at a low bit rate, with memory and computation small enough to fit within what an inexpensive processor allows.

【００３１】[0031]

[Means for Solving the Problems] The present invention concerns coding in which a synthesis filter is driven by an excitation vector, created from a vector obtained by periodizing a time-series vector taken from a codebook with a period corresponding to the fundamental period of the speech, to reproduce an audio signal, and in which the excitation vector is determined so that the distortion between the target audio signal and the reproduced audio signal is minimal or nearly minimal. The invention is characterized in that, in the process of calculating the distortion:
the periodization gain is set to 1;
the period used for periodization is approximated by an integer number of samples;
the impulse response of the synthesis filter, or of the synthesis filter with perceptual weighting, is obtained;
the impulse response is truncated to a length of at most one half of the periodization period;
the correlation matrix of the impulse response matrix is calculated from the truncated impulse response; and
on the premise that periodization is replaced by the integer-sample approximation, an approximate value of the power of the reproduced audio signal is calculated using the correlation matrix of the impulse response matrix obtained from the truncated impulse response.

[0032]

[Embodiments of the Invention] §1. Overview. In the present invention, either both pitch periodization matrices $P$ in the numerator and denominator of equation (3), or only the one in the denominator, are approximated using an integer-sample pitch, and the truncation order of the impulse response is set within a fixed range that takes into account the range over which the pitch period indicated by the period code can vary. This allows a fast distortion calculation using small matrices while the pitch periodization unit is kept on the excitation vector side.

[0033] §2. Embodiments. An embodiment of the invention is described below with reference to figures, tables, and equations. In this embodiment, the encoding method according to the invention is executed on a personal computer. That is, the encoding method described below is stored, as a control program for controlling the CPU (central processing unit) of the personal computer, in semiconductor memory (ROM, RAM, etc.) or on another recording medium (magnetic disk, etc.), and the CPU encodes speech according to this control program.

[0034] First, the periodization gain of the pitch periodization is set to 1. Next, in equation (3), the pitch periodization matrix $P$ in the denominator is replaced by an approximation matrix $P_a$. The pitch periodization matrix $P$ in the numerator may or may not be replaced by $P_a$, but leaving it unreplaced gives better sound quality. Here, the approximation matrix $P_a$ is the pitch periodization matrix constructed by approximating a non-integer pitch period, for example 46.25 samples (about 173 Hz at 8 kHz sampling), by an integer value, for example 46 samples.

[0035] In general, pitch periodization with a non-integer sample period requires convolving a sampling function with the past signal sequence. With an integer sample period and a periodization gain of 1, however, it suffices to simply repeat, or simply add, past sample values. An example of a pitch periodization matrix with the periodization gain set to 1 and the period approximated by an integer number of samples is shown below.

[Equation 4: example pitch periodization matrix $P_a$; the matrix is not reproduced in this text]

Furthermore, the impulse response $h$ is truncated at a finite length, in the same way as described in the aforementioned "Acoustic Signal Coding Method" (Japanese Patent Application No. 9-040404); that is, the impulse response beyond the truncation point is regarded as 0.
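
The periodization with gain 1 and the truncation described above can be sketched as follows. This is a minimal Python illustration, not the specification's own implementation: the function names are hypothetical, the matrix is assumed to be lower triangular with ones wherever the row index exceeds the column index by a whole number of periods (its layout may differ from the example matrix of the specification), and the short period of 4 is chosen only to keep the demonstration small.

```python
def periodization_matrix(frame_len, period):
    """P_a[r][c] = 1 when r - c is 0, period, 2*period, ...; gain fixed at 1."""
    return [[1 if r >= c and (r - c) % period == 0 else 0
             for c in range(frame_len)]
            for r in range(frame_len)]


def periodize(v, period):
    """Same result as P_a applied to v: add the sample one period back."""
    out = list(v)
    for n in range(period, len(out)):
        out[n] += out[n - period]
    return out


def truncate(h, taps):
    """Keep the first `taps` samples; everything after is treated as zero."""
    return list(h[:taps])


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


period = round(46.25)            # non-integer pitch approximated by 46 samples
v = [1, 0, 0, 2, 0, 0, 0, 0, 0, 0]
P_a = periodization_matrix(len(v), 4)   # short period, demo only
print(period, matvec(P_a, v), periodize(v, 4))
```

With gain 1 and an integer period, the matrix product and the simple "add the sample one period back" loop give the same vector, which is why no sampling-function convolution is needed in this case.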

[0036] The impulse response matrix formed from this truncated impulse response is denoted $H_f$ and is substituted for $H$ in both the numerator and the denominator. Equation (3) is then expressed as the following equation (4).

[Equation 5: equation (4); the equation is not reproduced in this text]

Equation (4) can be regarded as an approximation of equation (3): instead of searching for the code vector $v_r$ that maximizes equation (3), the code vector that maximizes equation (4) is searched for. The denominator of equation (4), $v_r^t P_a^t H_f^t H_f P_a v_r$, corresponds to the power of the signal synthesized with the pitch periodization period approximated by an integer number of samples.

[0037] In equation (4), shortening the truncation order of the impulse response makes the correlation matrix $H_f^t H_f$ very small. For example, if the impulse response is truncated at 5 taps, only a 5×5 correlation matrix has to be calculated. After the correlation matrix $H_f^t H_f$ has been calculated, $P_a^t H_f^t H_f P_a$ is calculated. Under the condition that the truncation order of the impulse response is no more than half the pitch period, $P_a^t H_f^t H_f P_a$ is obtained from $H_f^t H_f$ by a very simple procedure. This condition holds, for example, when the pitch period is 20 samples (400 Hz at 8 kHz sampling) and the truncation order of the impulse response is 10 (10 taps) or less.
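
The small correlation matrix and the half-period condition can be sketched as below. This is an illustrative Python sketch under assumptions: the function names and the 5-tap response are made up, and the matrix is indexed counting back from the end of the frame (the convention the specification adopts for FIG. 2), so that the element at index (0, 0) sees only the first tap and the element at index (4, 4) is the full energy of the truncated response.

```python
def truncated_correlation(h_f):
    """L x L matrix m[i][j] = sum of h_f[i-k] * h_f[j-k] for k = 0..min(i, j).

    Indices count back from the end of the frame, so m[0][0] sees only
    h_f[0] and m[L-1][L-1] is the full energy of h_f.
    """
    L = len(h_f)
    return [[sum(h_f[i - k] * h_f[j - k] for k in range(min(i, j) + 1))
             for j in range(L)]
            for i in range(L)]


def truncation_ok(taps, pitch_period):
    """Condition of the text: truncation order <= half the pitch period."""
    return 2 * taps <= pitch_period


h_f = [4, 3, 2, 1, 1]                 # made-up 5-tap truncated response
m = truncated_correlation(h_f)
shortest_period = 8000 // 400         # 400 Hz at 8 kHz sampling
print(m[0][0], m[4][4], truncation_ok(len(h_f), shortest_period))
```

Only 25 values (15 distinct ones, by symmetry) are computed here, regardless of the frame length.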

[0038] Human pitch frequency is said to be at most 400 Hz, so truncating the impulse response at, for example, 5 taps always satisfies this condition. Shortening the truncation order down to about 3 taps has little effect on the quality of the reproduced sound, so, in view of both reproduced sound quality and computational load, the order is best set to about 3 to 10 taps.

[0039] FIG. 1 is a schematic diagram showing that $P_a^t H_f^t H_f P_a$ can be calculated from $H_f^t H_f$ by simple processing. The figure shows an example of the matrix $P_a^t H_f^t H_f P_a$ for a frame length of 20 samples, a pitch period of 12 samples, and an impulse response truncation order of 5. In practice, the invention is most effective when the frame is long, for example 80 samples, and, since human pitch frequency does not exceed 400 Hz as noted above, a pitch period of 12 samples does not occur in real speech. The conditions of FIG. 1 therefore differ from those of an actual application of the invention; they were chosen to keep the figure simple and the explanation easy to follow. Because the matrix $P_a^t H_f^t H_f P_a$ is always symmetric, the upper-right triangular part is omitted: the value in row i, column j equals the value in row j, column i. Elements labeled with the same symbol have the same value; for example, all elements labeled $m_{34}$ take the same value. The features of this matrix are listed below.

[0040]
Feature 1: The nonzero elements form bands running from upper left to lower right, and the elements between the bands are 0.
Feature 2: Shifting a band left or right by the pitch period gives exactly the same values.
Feature 3: In the part labeled $m_{ij}$, only the values with 0 ≤ i ≤ 4, i ≤ j ≤ 4 are independent; the rest are simply the values $m_{i4}$ (0 ≤ i ≤ 4) repeated along the diagonals. The same holds for the part labeled $k_{ij}$.
Feature 4: $m_{ij}$ and $k_{ij}$ are closely related: $k_{ij} = m_{ij} + m_{4-(j-i),4}$ (0 ≤ i ≤ 4, i ≤ j ≤ 4).
Using features 1 to 4, once the correlation matrix $H_f^t H_f$ of the truncated impulse response, that is, the 5×5 matrix $m_{ij}$ (0 ≤ i ≤ 4, i ≤ j ≤ 4), has been calculated, $P_a^t H_f^t H_f P_a$ is obtained from $H_f^t H_f$ by a simple conversion.
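
Features 1, 2 and 4 can be checked numerically on the FIG. 1 dimensions. The sketch below is an assumption-laden illustration in Python: it builds the full matrix by brute force from a made-up 5-tap response, indexes it counting back from the end of the frame (the convention stated for FIG. 2, assumed here to apply to FIG. 1 as well), and checks feature 2 only on the drawn triangle i ≤ j. The function names are hypothetical.

```python
def periodized_response(h_f, frame_len, period, pos):
    """H_f P_a applied to a unit pulse at forward position `pos`."""
    exc = [0] * frame_len
    for n in range(pos, frame_len, period):      # pulse repeated with gain 1
        exc[n] = 1
    return [sum(h_f[k] * exc[n - k] for k in range(len(h_f)) if n - k >= 0)
            for n in range(frame_len)]


def full_matrix(h_f, frame_len, period):
    """Brute-force P_a^t H_f^t H_f P_a; index i counts back from the frame end."""
    cols = [periodized_response(h_f, frame_len, period, frame_len - 1 - i)
            for i in range(frame_len)]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


N, T, L = 20, 12, 5                      # frame, pitch period, taps as in FIG. 1
phi = full_matrix([4, 3, 2, 1, 1], N, T)     # made-up impulse response
m = [row[:L] for row in phi[:L]]         # the 5 x 5 corner, i.e. m_ij

# Feature 1: zeros between the bands (lags L .. T-L).
feature1 = all(phi[i][j] == 0
               for i in range(N) for j in range(N) if L <= abs(i - j) <= T - L)
# Feature 2 (on the drawn triangle i <= j): shift by one pitch period.
feature2 = all(phi[i][j] == phi[i][j + T]
               for i in range(N) for j in range(i, N - T))
# Feature 4: second-period block k_ij = m_ij + m_{4-(j-i),4}.
feature4 = all(phi[T + i][T + j] == m[i][j] + m[L - 1 - (j - i)][L - 1]
               for i in range(L) for j in range(i, L))
print(feature1, feature2, feature4)
```

The brute-force construction costs on the order of the square of the frame length; it serves only as a reference against which the cheap conversion from the 5×5 matrix can be compared.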

[0041] FIG. 2 schematically shows a somewhat larger matrix, for the case where the frame length is greater than twice the pitch period. It assumes a frame length of 40 samples, a pitch period of 18 samples, and truncation of the impulse response at 5 taps. As in FIG. 1, the matrix is symmetric, so the upper-right triangular part is omitted. Matrix indices are normally written so that they increase from left to right and from top to bottom; here, to make the properties of the matrix easier to explain, they are written so that they increase from the lower-right corner upward and to the left. For convenience, the element in row i, column j of this matrix is denoted $\phi_{ij}$.

[0042] Features 1 to 4 also apply to the example of FIG. 2. Because the frame length in this example is more than twice and less than three times the pitch period, the matrix has three bands, and the areas between the bands, drawn with vertical stripes, have the value 0. Shifting a band left or right by an integer multiple of the pitch period gives exactly the same values; for example, $\phi_{4,20} = \phi_{4,38}$. The regions drawn with a diagonal lattice, for example $\phi_{ij}$ (18 ≤ i ≤ 22, i ≤ j ≤ 22) and $\phi_{ij}$ (36 ≤ i ≤ 39, i ≤ j ≤ 39), are easily calculated from the submatrix $\phi_{ij}$ (0 ≤ i ≤ 4, i ≤ j ≤ 4), drawn with right-slanting hatching, using the rule of feature 4 and its extension. For example:
$\phi_{18,18} = \phi_{0,0} + \phi_{4,4}$
$\phi_{20,21} = \phi_{2,3} + \phi_{3,4}$
$\phi_{36,38} = \phi_{0,2} + \phi_{20,22} = \phi_{0,2} + 2 \times \phi_{2,4}$
Note that in the region belonging to the n-th pitch period, here $\phi_{ij}$ (36 ≤ i ≤ 39, i ≤ j ≤ 39), which corresponds to the third period, the second term is multiplied by n−1 (here 3−1 = 2). The same factor of n−1 applies when a fourth or later period is reached, for example with a long frame and a short pitch period.
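
The rule of feature 4 and its n−1 extension can be written as a lookup that uses only the 5×5 matrix. The Python sketch below is one possible reading, not the specification's own procedure: the function names, the closed-form index arithmetic and the 5-tap response are assumptions, indices count back from the end of the frame as in FIG. 2, and the truncation order is assumed to be at most half the pitch period. A brute-force reference is included only to cross-check the lookup.

```python
def truncated_correlation(h_f):
    """L x L corner matrix; indices count back from the end of the frame."""
    L = len(h_f)
    return [[sum(h_f[i - k] * h_f[j - k] for k in range(min(i, j) + 1))
             for j in range(L)] for i in range(L)]


def phi_from_small(m, period, i, j):
    """Element (i, j) of P_a^t H_f^t H_f P_a using only the L x L matrix m."""
    L = len(m)
    if i > j:
        i, j = j, i                              # symmetric matrix
    q = (j - i + period // 2) // period          # nearest whole number of periods
    lag = abs(j - i - q * period)
    lo = min(i, j - q * period)
    if lag >= L or lo < 0:
        return 0                                 # between bands
    full_periods, edge = divmod(lo, period)
    row = min(edge, L - 1 - lag)
    # edge term + (n - 1) copies of the saturated term m_{L-1-lag, L-1}
    return m[row][row + lag] + full_periods * m[L - 1 - lag][L - 1]


def brute_force(h_f, frame_len, period):
    """Reference: correlate every periodized, filtered unit pulse."""
    cols = []
    for i in range(frame_len):
        exc = [0] * frame_len
        for n in range(frame_len - 1 - i, frame_len, period):
            exc[n] = 1
        cols.append([sum(h_f[k] * exc[n - k]
                         for k in range(len(h_f)) if n - k >= 0)
                     for n in range(frame_len)])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


h_f, N, T = [4, 3, 2, 1, 1], 40, 18              # FIG. 2 sizes, made-up h_f
m = truncated_correlation(h_f)
ref = brute_force(h_f, N, T)
matches = all(phi_from_small(m, T, i, j) == ref[i][j]
              for i in range(N) for j in range(N))
print(matches, phi_from_small(m, T, 36, 38) == m[0][2] + 2 * m[2][4])
```

Each lookup costs a handful of integer operations and two table reads, so the 40×40 matrix never has to be formed.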

[0043] Using these features, the value of equation (4) for the distance calculation can be obtained with very little computation. Where, for example, a frame length of 80 samples conventionally required an 80×80 matrix calculation, a 5×5 matrix calculation plus a simple matrix conversion now suffices. FIGS. 1 and 2 show all the matrix elements for ease of explanation, but in practice only, for example, the 5×5 matrix need be stored in memory: if the denominator of equation (4) is expanded and expressed using only the elements of the 5×5 matrix, the memory area needed to store the matrix is also greatly reduced.

[0044] However, when the denominator is expanded, processing that in the conventional method needs a single table (matrix) lookup acquires more terms. Thus, although the computation for the correlation matrix is greatly reduced, the number of operations for calculating the denominator increases slightly. If the memory area does not have to be cut to an extreme such as 5×5, reserving, for example, a 5 × (frame length) memory area and filling it with the matrix values achieves both a moderately small memory footprint and a very small amount of computation. For example, in FIG. 2, only the values $\phi_{ij}$ (0 ≤ i ≤ 39, 36 ≤ j ≤ 40) are stored. The part beyond the frame (j = 40) is calculated by treating the frame as extended. The other elements of the matrix (for example, 0 ≤ i ≤ 39, 0 ≤ j ≤ 35) can be obtained by referring to the stored memory area instead.

[0045] FIG. 3 shows a configuration example of the fast distortion calculation method according to the invention. First, the reference speech $x_r$ passes through the inverse filter of the synthesis filter (synthesis inverse filter) 8-3, which uses the quantized (decoded) synthesis filter coefficients, and is converted into the ideal (unquantized) excitation vector $e$. The finite-tap-length FIR perceptually weighted synthesis filter coefficient calculation unit 8-1 obtains the impulse response of the perceptually weighted synthesis filter, which combines the synthesis filter and the perceptual weighting filter, and truncates the impulse response at a finite tap length no greater than half the pitch period, for example 5 taps. The ideal excitation vector $e$ is passed through an FIR synthesis filter that uses the filter coefficients obtained in this way, yielding the target speech $x_f$ on which the truncation distortion is superimposed.
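
The path from the reference speech to the target with truncation distortion can be sketched as follows. This Python sketch is illustrative only and rests on several assumptions: the synthesis filter is taken to be an all-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..., perceptual weighting and filter memory from previous frames are left out, and the coefficient, the signal values and the function names are made up.

```python
def inverse_filter(x, a):
    """e[n] = x[n] + sum of a[k] * x[n-1-k]: the signal filtered by A(z)."""
    return [x[n] + sum(a[k] * x[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]


def synthesis_impulse_response(a, taps):
    """First `taps` samples of the impulse response of 1 / A(z)."""
    h = []
    for n in range(taps):
        past = sum(a[k] * h[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        h.append((1.0 if n == 0 else 0.0) - past)
    return h


def fir(e, h_f):
    """Convolve e with the truncated response h_f, keeping len(e) samples."""
    return [sum(h_f[k] * e[n - k] for k in range(len(h_f)) if n - k >= 0)
            for n in range(len(e))]


a = [-0.9]                                   # A(z) = 1 - 0.9 z^-1, made up
x_r = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75, 0.3]
e = inverse_filter(x_r, a)                   # ideal excitation
h_f = synthesis_impulse_response(a, 3)       # truncated at 3 taps
x_f = fir(e, h_f)                            # target with truncation distortion
print([round(t, 4) for t in x_f])
```

With an untruncated response the FIR stage would return the reference signal itself; with truncation, the first samples still agree and the later ones carry the truncation distortion, which is the property that lets the search compare like with like.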

[0046] The truncated impulse response is periodized by the pitch periodization filter 8-2 and sent to the convolution unit 8-4. This pitch periodization may be performed with a non-integer sample period using a sampling function, or may be approximated at integer sample points, but keeping the non-integer sample period gives better reproduced speech quality. The convolution unit 8-4 and the distance measure numerator calculation unit 8-9 calculate the numerator of equation (4).

[0047] Meanwhile, the truncated impulse response is sent to the correlation matrix calculation unit 8-5, which calculates the correlation matrix $H_f^t H_f$. If the impulse response is truncated at, for example, 5 taps, only a 5×5 matrix has to be calculated. The pitch-periodized correlation matrix conversion unit 8-7 approximates the pitch period specified by the period code by an integer sample period and calculates $P_a^t H_f^t H_f P_a$ by the conversion method described above. $P_a^t H_f^t H_f P_a$ is in principle a (frame length) × (frame length) matrix, for example 80×80. Because of the features described above, however, storing only part of it in memory, for example an 80×5 submatrix, and referring to the stored part gives the same result as referring to the full (frame length) × (frame length) matrix, which substantially reduces both computation and memory.

[0048] As noted above, if the denominator of equation (4) is expanded, referring to $H_f^t H_f$ alone gives the same result as referring to $P_a^t H_f^t H_f P_a$, so the pitch-periodized correlation matrix conversion unit 8-7 can also be omitted. The reproduced signal power (denominator) calculation unit 8-8 uses the matrix $P_a^t H_f^t H_f P_a$ or $H_f^t H_f$ to calculate the denominator of equation (4), $v_r^t P_a^t H_f^t H_f P_a v_r$, that is, the power of the reproduced signal with the pitch periodization period approximated by an integer number of samples. When time-series pulses of amplitude 1 are used as the fixed codebook, the denominator can be calculated by the simple procedure of referring to elements of the matrix $P_a^t H_f^t H_f P_a$ or $H_f^t H_f$ and adding the values read out.
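
For amplitude-1 pulses, the denominator reduces to a sum of table entries. The Python sketch below illustrates this under assumptions: the table is built here by brute force purely so that the example is self-contained, indices are ordinary forward sample positions, and the pulse positions, the 4-tap response and the function names are made up.

```python
def fir(x, h_f):
    return [sum(h_f[k] * x[n - k] for k in range(len(h_f)) if n - k >= 0)
            for n in range(len(x))]


def periodize(v, period):
    out = list(v)
    for n in range(period, len(out)):
        out[n] += out[n - period]            # gain 1, integer period
    return out


def power_table(h_f, frame_len, period):
    """P_a^t H_f^t H_f P_a with forward indices (reference construction)."""
    cols = []
    for p in range(frame_len):
        unit = [1 if n == p else 0 for n in range(frame_len)]
        cols.append(fir(periodize(unit, period), h_f))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def pulse_power(phi, positions):
    """v^t Phi v when v has a unit pulse at each (distinct) position."""
    diag = sum(phi[p][p] for p in positions)
    cross = sum(phi[p][q]
                for n, p in enumerate(positions) for q in positions[n + 1:])
    return diag + 2 * cross                  # symmetry covers both triangles


h_f, N, T = [4, 3, 2, 1], 20, 8              # made-up response and sizes
phi = power_table(h_f, N, T)
positions = [1, 6, 15]                       # hypothetical pulse positions
v = [1 if n in positions else 0 for n in range(N)]
y = fir(periodize(v, T), h_f)                # direct synthesis for comparison
direct = sum(s * s for s in y)
print(pulse_power(phi, positions), direct)
```

For three pulses the lookup reads six table entries and performs no multiplications by signal samples, whereas the direct route periodizes and filters the whole frame.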

[0049] Even when the pulses take real-valued amplitudes other than 1, the method of the invention gives a very fast distance calculation for pulse-type excitation, in which the fixed code vector $v_r$ is 0 at most sample positions and nonzero at only a few. The distance calculation unit 8-10 calculates the value of equation (4) from the calculated numerator and denominator and obtains the distortion. Because the pitch periodization matrix in the denominator is approximated with an integer sample period, this distortion is an approximation of the true distortion. When only the denominator is approximated with an integer sample period (and the numerator is not), however, the approximation causes almost no degradation in reproduced sound quality.

[0050] The embodiment of the invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to this embodiment; design changes and the like that do not depart from the gist of the invention are also included in the invention.

[0051] A 4 kbit/s speech coder applying the invention was designed and implemented as a computer program. The sampling frequency was 8 kHz, the linear prediction frame length 20 ms (160 samples), the excitation vector frame length 10 ms (80 samples), the lower limit of the pitch period search range 18 samples, and the truncation order of the impulse response 6. A pulse train of amplitude 1 was used for the fixed codebook. Non-integer pitch periods were represented with a resolution of 1/3 sample. The coder was implemented as a computer program in order to confirm the effect, but it may also be implemented on a signal processing processor or as dedicated hardware.

[0052] Measurement of the processing time required for encoding showed that real-time processing was comfortably achievable on a consumer personal computer. An evaluation of subjective quality by listening to the reproduced sound showed quality equivalent to ITU-T G.726 (32 kbit/s ADPCM). Compared with G.726, equivalent quality was therefore achieved at 1/8 of the bit rate. Compared with the PSI-CELP method used in mobile telephones, which has a bit rate of 3.45 kbit/s, higher quality was achieved with no more than 1/4 of the processing. It was thus confirmed that the invention provides high-quality speech coding with very little processing and little memory, and that the quality degradation due to the approximate calculation is very small.

[0053] The invention provides an approximate distortion calculation that has very little effect on reproduced sound quality. When the non-approximate distortion is wanted, the method of the invention can be used to narrow the candidates down to a few in increasing order of approximate distortion (decreasing order of the value of equation (4)); the non-approximate distortion is then calculated for each of these candidates and one code vector is finally selected. The first stage of such a two-stage selection is called preselection. The invention is also effective as a preselection method.
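
The two-stage selection can be sketched generically. In the Python sketch below, the function name, the `keep` parameter and the two toy score functions are assumptions made for illustration; the toy scores merely stand in for the value of equation (4) (approximate) and of equation (3) (non-approximate) and are not those equations.

```python
def two_stage_search(candidates, approx_score, exact_score, keep):
    """Preselect `keep` candidates by the approximate score, then pick the
    best of those by the exact score (larger score = smaller distortion)."""
    shortlist = sorted(candidates, key=approx_score, reverse=True)[:keep]
    return max(shortlist, key=exact_score)


# Toy stand-ins: these scores are invented, not equation (3) or (4).
candidates = list(range(10))
approx = lambda c: -(c - 5) ** 2       # cheap, slightly off
exact = lambda c: -(c - 6) ** 2        # expensive, accurate
print(two_stage_search(candidates, approx, exact, keep=3))
```

The exact score is evaluated only `keep` times, so the cost of the second stage is bounded independently of the codebook size, while a larger `keep` lowers the chance that the true optimum is discarded in preselection.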

[0054]

[Effects of the Invention] According to the present invention, high-quality reproduced speech can be obtained at a low bit rate, with a small amount of memory and a small amount of computation.

[Brief description of the drawings]

FIG. 1 is a schematic diagram illustrating the characteristics of the pitch-periodized correlation matrix $P_a^t H_f^t H_f P_a$ used in the present invention.

FIG. 2 is a schematic diagram illustrating the characteristics of the pitch-periodized correlation matrix $P_a^t H_f^t H_f P_a$ used in the present invention, as a more general version of FIG. 1.

FIG. 3 is a block diagram illustrating a configuration example of a device that performs fast distortion calculation using the present invention.

FIG. 4 is a block diagram illustrating a configuration example of a conventional CELP-type speech encoding device.

FIG. 5 is a block diagram illustrating a configuration example of a fixed codebook search device in a CELP-type speech encoding device.

FIG. 6 is a block diagram illustrating a configuration example of a device that calculates the synthesis distortion.

FIG. 7 is a block diagram illustrating a configuration example of a device that performs distortion calculation using an impulse response.

FIG. 8 is a block diagram illustrating a configuration example of a fast distortion calculation device that uses impulse response truncation and a pitch inverse filter, according to a method for which the present inventors have already filed an application.

[Explanation of symbols]

8-1 ... Finite-tap-length FIR perceptually weighted synthesis filter coefficient calculation unit
8-2 ... Pitch periodization filter
8-3 ... Synthesis inverse filter
8-4 ... Convolution unit
8-5 ... Correlation matrix calculation unit
8-6 ... Fixed codebook
8-7 ... Pitch-periodized correlation matrix conversion unit
8-8 ... Reproduced signal power (denominator) calculation unit
8-9 ... Distance measure numerator calculation unit
8-10 ... Distance measure calculation unit
8-11 ... FIR synthesis filter

Continued from the front page. (58) Fields searched (Int. Cl.⁶, DB name): G10L 3/00 - 9/20; H03M 7/30; JICST file (JOIS)

Claims

(57) [Claims]

1. An acoustic signal encoding method in which a synthesis filter is driven by an excitation vector to reproduce an acoustic signal, the excitation vector being created using a vector obtained by periodizing a time-series vector taken from a codebook with a period corresponding to the fundamental period of speech, and in which the excitation vector is determined so that the distortion between a target acoustic signal and the reproduced acoustic signal is minimal or nearly minimal, wherein, in the process of calculating the distortion: the periodization gain is set to 1; the periodization period is approximated by an integer number of samples; the impulse response of the synthesis filter, or of the synthesis filter with perceptual weighting, is obtained; the impulse response is truncated at a length no greater than half the periodization period; the correlation matrix of the impulse response matrix is calculated from the truncated impulse response; and, on the premise that the periodization is replaced by the integer-sample approximation, an approximate value of the power of the reproduced acoustic signal is calculated using the correlation matrix of the impulse response matrix calculated from the truncated impulse response.

2. The acoustic signal encoding method according to claim 1, wherein the signal obtained by passing the target acoustic signal through the inverse filter of the synthesis filter is passed through an FIR filter whose coefficients are the truncated impulse response, thereby generating a target acoustic signal on which the truncation distortion of the impulse response is superimposed, and the excitation vector is determined by regarding the target acoustic signal on which the truncation distortion is superimposed as the new target acoustic signal.

3. The acoustic signal encoding method according to claim 2, wherein the truncation order of the impulse response is set short, at about 3 to 10 taps.

4. The acoustic signal encoding method according to claim 1, wherein, using the feature that the pitch-periodized correlation matrix can be calculated solely from the correlation matrix of the impulse response matrix calculated from the truncated impulse response, the expression for the reproduced signal power is written using only elements of that correlation matrix, and the power of the reproduced signal is calculated by referring to elements of that correlation matrix.

5. The acoustic signal encoding method according to claim 1, wherein, using the feature that elements of the pitch-periodized correlation matrix take the same value at positions shifted by the periodization period, that is, that values coincide at positions shifted by the period in the row or column direction, a matrix of size (frame or subframe length) × (truncation order of the impulse response) is stored in memory, and references to the pitch-periodized correlation matrix are made by referring to that memory.

6. An acoustic signal encoding apparatus in which a synthesis filter is driven by an excitation vector to reproduce an acoustic signal, the excitation vector being created using a vector obtained by periodizing a time-series vector taken from a codebook with a period corresponding to the fundamental period of speech, and in which the excitation vector is determined so that the distortion between a target acoustic signal and the reproduced acoustic signal is minimal or nearly minimal, wherein the distortion calculation means for calculating the distortion comprises: setting means for setting the periodization gain to 1; period approximation means for approximating the periodization period by an integer number of samples; response means for obtaining the impulse response of the synthesis filter, or of the synthesis filter with perceptual weighting; truncation means for truncating the impulse response at a length no greater than half the periodization period; correlation matrix calculation means for calculating the correlation matrix of the impulse response matrix from the truncated impulse response; and acoustic signal approximation means for calculating, on the premise that the periodization is replaced by the integer-sample approximation, an approximate value of the power of the reproduced acoustic signal using the correlation matrix of the impulse response matrix calculated from the truncated impulse response.

7. The acoustic signal encoding apparatus according to claim 6, further comprising excitation vector determination means for passing the signal obtained by passing the target acoustic signal through the inverse filter of the synthesis filter through an FIR filter whose coefficients are the truncated impulse response, thereby generating a target acoustic signal on which the truncation distortion of the impulse response is superimposed, and for determining the excitation vector by regarding the target acoustic signal on which the truncation distortion is superimposed as the new target acoustic signal.

8. The acoustic signal encoding apparatus according to claim 7, wherein the truncation order of the impulse response is set short, at about 3 to 10 taps.

9. The acoustic signal encoding apparatus according to claim 6, wherein the acoustic signal approximation means, using the feature that the pitch-periodized correlation matrix can be calculated solely from the correlation matrix of the impulse response matrix calculated from the truncated impulse response, writes the expression for the reproduced signal power using only elements of that correlation matrix, and calculates the power of the reproduced signal by referring to elements of that correlation matrix.

10. The audio signal encoding apparatus according to claim 6, wherein the audio signal approximating means, using the property that elements of the pitch-periodized correlation matrix take the same value at positions shifted by the period of the periodization, that is, that values match at positions whose row or column is shifted by the period, stores in a memory a matrix of size (frame or subframe length) × (truncation order of the impulse response), and refers to the pitch-periodized correlation matrix by referring to the memory.

11. A recording medium on which is recorded a program for causing a computer to function as an encoding apparatus in which a time-series vector extracted from a codebook is periodized at a period corresponding to the fundamental period of speech, a driving excitation vector is created using the periodized vector, a synthesis filter is driven by the driving excitation vector to reproduce an audio signal, and the driving excitation vector is determined such that the distortion between a target audio signal and the reproduced audio signal is minimal or close to minimal, the program causing the computer to function, in the process of calculating the distortion, as: setting means for setting the periodization gain of the periodization to 1; period approximation means for approximating the period for the periodization by an integer sample value; response means for obtaining the impulse response of a synthesis filter or of a perceptually weighted synthesis filter; truncating means for truncating the impulse response to a length equal to or less than half the period of the periodization; correlation matrix calculating means for calculating the correlation matrix of an impulse response matrix using the truncated impulse response; and audio signal approximating means for calculating, on the premise that the periodization is replaced by the approximation process using the integer sample value, an approximate value of the power of the reproduced audio signal using the correlation matrix of the impulse response matrix calculated from the truncated impulse response.

12. The recording medium according to claim 11, on which is further recorded a program for causing the computer to function as driving excitation vector determining means that applies an FIR filter whose coefficients are the truncated impulse response to a signal obtained by passing the target audio signal through the inverse filter of the synthesis filter, thereby generating a target audio signal on which the truncation distortion of the impulse response is superimposed, and that determines the driving excitation vector by regarding the target audio signal with the superimposed truncation distortion as a new target audio signal.

13. The recording medium according to claim 12, wherein the truncation order of the impulse response is set as short as about 3 to 10 taps.

14. The recording medium according to claim 11, wherein the audio signal approximating means, using the property that the pitch-periodized correlation matrix can be calculated solely from the correlation matrix of the impulse response matrix calculated from the truncated impulse response, expresses the power of the reproduced signal using only elements of that correlation matrix, and calculates the power of the reproduced signal by referring to the elements of that correlation matrix.

15. The recording medium according to claim 11, wherein the audio signal approximating means, using the property that elements of the pitch-periodized correlation matrix take the same value at positions shifted by the period of the periodization, that is, that values match at positions whose row or column is shifted by the period, stores in a memory a matrix of size (frame or subframe length) × (truncation order of the impulse response), and refers to the pitch-periodized correlation matrix by referring to the memory.
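
For illustration only, the distortion-calculation steps recited in claims 6 and 11 can be sketched in Python as follows. This is one possible reading under stated assumptions, not the patented implementation. The function names, the frame length N, the period T, the example impulse response, and the choice to periodize by repeating the first T samples of the code vector are all hypothetical. The sketch truncates the impulse response to at most half the integer period, builds the correlation matrix of the impulse response matrix, folds it into a pitch-periodized correlation matrix, and checks that the power obtained from that matrix matches the power obtained by filtering the periodized vector directly with the same truncated response.

```python
def truncate(h, period):
    """Keep at most period // 2 taps of the impulse response (assumed rule)."""
    return h[: max(1, period // 2)]

def periodize(c, period, n):
    """Repeat the first `period` samples of c over n samples, with gain 1."""
    return [c[i % period] for i in range(n)]

def fir(x, h):
    """Zero-state FIR filtering: y[i] = sum_k h[k] * x[i - k]."""
    return [sum(h[k] * x[i - k] for k in range(min(len(h), i + 1)))
            for i in range(len(x))]

def correlation_matrix(h, n):
    """Phi = H^T H, where H is the n x n lower-triangular Toeplitz matrix of h."""
    def H(r, c):
        d = r - c
        return h[d] if 0 <= d < len(h) else 0.0
    return [[sum(H(r, i) * H(r, j) for r in range(n)) for j in range(n)]
            for i in range(n)]

def periodized_correlation(phi, period):
    """Fold Phi so that Phi_p[i][j] sums Phi[a][b] with a % T == i, b % T == j."""
    n = len(phi)
    out = [[0.0] * period for _ in range(period)]
    for a in range(n):
        for b in range(n):
            out[a % period][b % period] += phi[a][b]
    return out

def power_direct(c, h, period, n):
    """Power of the synthesized signal, computed by explicit filtering."""
    y = fir(periodize(c, period, n), h)
    return sum(v * v for v in y)

def power_from_correlation(c, phi_p):
    """Power computed as c^T Phi_p c, using only correlation elements."""
    t = len(phi_p)
    return sum(c[i] * phi_p[i][j] * c[j] for i in range(t) for j in range(t))

# Hypothetical example values (not taken from the patent).
N = 16                                   # frame or subframe length
T = 6                                    # integer-approximated pitch period
h_full = [0.8 ** k for k in range(N)]    # stand-in for a synthesis filter response
h = truncate(h_full, T)                  # 3 taps, i.e. T // 2
c = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0]   # first T samples of a code vector

phi = correlation_matrix(h, N)
phi_p = periodized_correlation(phi, T)
p_direct = power_direct(c, h, T, N)
p_corr = power_from_correlation(c, phi_p)
print(p_direct, p_corr)
```

With a truncated response of L taps, every element of Phi with |row - column| >= L is zero, so only a narrow band of the matrix carries information. This may be one reason the storage recited in claims 10 and 15 can be limited to (frame or subframe length) × (truncation order) entries, though that connection is an interpretation rather than something the claims state.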