JP2943983B1

JP2943983B1 - Audio signal encoding method and decoding method, program recording medium therefor, and codebook used therefor

Info

Publication number: JP2943983B1
Application number: JP10101062A
Authority: JP
Inventors: 仲大室; 一則間野; 伸二林; 祥子栗原
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-04-13
Filing date: 1998-04-13
Publication date: 1999-08-30
Anticipated expiration: 2018-04-13
Also published as: JPH11296195A

Abstract

Problem: To achieve a high compression ratio with ACELP (Algebraic CELP) while reducing the amount of memory and computation and maintaining high quality.

Solution: For the pulse position candidates of each position codebook i (i = 1, 2, ...), the polarity (positive or negative) that each candidate must take is predetermined in the corresponding sign codebook i (see, for example, FIG. 3). In the example of FIG. 3, even-numbered sample points (position candidates) take positive polarity and odd-numbered sample points take negative polarity, so that whenever pulses occur at adjacent sample points, their polarities are opposite. Position codebook i and sign codebook i are read out with the same position code i.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a high-efficiency coding method that digitally encodes the signal sequence of an acoustic signal such as speech or music with a small amount of information. It uses predictive coding, in which speech is synthesized by driving a filter that represents the spectral envelope characteristics with an excitation vector. The invention also relates to a decoding method for such signals, to a codebook used in both methods, and to a program recording medium therefor.

[0002]

2. Description of the Related Art

High-efficiency speech coding methods are used in two main settings: in digital mobile communication to make efficient use of radio spectrum, and in speech or music storage services to make efficient use of communication lines and storage media.

A widely proposed approach to efficient speech coding divides the original speech into sections called frames or subframes, each of constant length on the order of 5 to 50 ms. The speech of each frame is separated into two pieces of information: the characteristics of a linear filter representing the envelope of the frequency spectrum, and an excitation signal that drives that filter. Each piece is then encoded separately.

One known way to encode the excitation signal is to split it into a periodic component, considered to correspond to the pitch period (fundamental frequency) of the speech, and the remaining components, and to encode each. Code-Excited Linear Prediction (CELP) is an example of such an excitation coding method. For details, see M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. IEEE ICASSP-85, pp. 937-940, 1985.

[0003] FIG. 8 shows an example of the functional configuration of the above encoding method.

The linear prediction analysis unit 1-1 calculates, from the speech applied to the input terminal, linear prediction parameters representing the frequency spectrum envelope of the input speech. The linear prediction parameter encoding unit 1-2 encodes these parameters and sends them to the linear prediction parameter decoding unit 1-3. If the distortion calculation uses spectral information of the input speech, for example to take auditory characteristics into account, the linear prediction parameters are also sent to the distortion calculation unit 1-6.

The linear prediction parameter decoding unit 1-3 reconstructs the synthesis filter coefficients from the received code and sends them to the synthesis filter 1-5. When auditory characteristics are considered in the distortion calculation, the distortion calculation unit 1-6 may use these decoded linear prediction parameters instead of the unquantized ones.

Details of linear prediction analysis and examples of linear prediction parameter encoding are given, for example, in Sadaoki Furui, "Digital Speech Processing" (Tokai University Press). The linear prediction analysis unit 1-1, the linear prediction parameter encoding unit 1-2, the linear prediction parameter decoding unit 1-3, and the synthesis filter 1-5 may also be replaced with nonlinear ones.

[0004] The excitation vector generation unit 1-4 generates excitation vector candidates, each one frame long, and sends them to the synthesis filter 1-5. FIG. 9 shows an example of the functional configuration of the excitation vector generation unit 1-4.

Adaptive codebook. The adaptive codebook 2-1 stores in its buffer the most recent past excitation vector c(t-1), which is the already-quantized excitation of the preceding one to several frames. It cuts out a segment whose length corresponds to a certain period and repeats that segment until it reaches the frame length. The result is a candidate time-series vector corresponding to the periodic component of the speech. The "certain period" is chosen so that the distortion d in the distortion calculation unit 1-6 becomes small, and the selected period usually corresponds to the pitch period of the speech.

Fixed codebook. The fixed codebook 2-2 outputs candidate time-series code vectors, each one frame long, corresponding to the aperiodic component of the speech. These candidates are stored in advance, independently of the input speech. Their number is fixed beforehand according to the number of bits allotted for encoding.

Periodization. The periodization unit 2-8 periodizes the fixed code vector candidates from the fixed codebook 2-2 as needed, using the period specified by the period code. As noted above, this period generally corresponds to the pitch period. Periodization means one of two operations. The first is to apply a comb filter with taps at the specified period positions. The second, as in the adaptive codebook, is to cut a segment from the head of the vector with a length corresponding to the specified period and repeat it. The periodization unit 2-8 is often used because it improves coding efficiency, but it is sometimes omitted. When the speech itself has little or no pitch component, as in consonant sections, the periodization unit may also have no effect.

Weighting and summation. The multipliers 2-4 and 2-5 multiply the candidate vectors from the adaptive codebook 2-1 and the periodization unit 2-8 by weights generated by the weight generation unit 2-3. The adder 2-6 sums the weighted vectors to form the excitation vector candidate c.

The configuration of FIG. 9 may also omit the adaptive codebook 2-1 and use only the fixed codebook 2-2. When encoding signals with little pitch periodicity, such as consonants or background noise, the adaptive codebook 2-1 is often omitted to save bits.
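The adaptive-codebook and periodization steps described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It makes three assumptions: the segment is cut from the most recent `period` samples of the past excitation, periodization uses the repetition method rather than the comb filter, and the function names and the two gain arguments (standing in for the weight generation unit 2-3) are hypothetical.

```python
from typing import List


def adaptive_codebook_vector(past_excitation: List[float], period: int,
                             frame_len: int) -> List[float]:
    """Cut the last `period` samples of the past excitation c(t-1) and
    repeat them until the frame is filled (assumed cutting convention)."""
    if not 0 < period <= len(past_excitation):
        raise ValueError("period must be in 1..len(past_excitation)")
    segment = past_excitation[-period:]
    return [segment[n % period] for n in range(frame_len)]


def periodize(fixed_vector: List[float], period: int) -> List[float]:
    """Repetition-style periodization: repeat the first `period` samples
    of the fixed code vector over the whole frame."""
    if period <= 0:
        raise ValueError("period must be positive")
    if period >= len(fixed_vector):
        return list(fixed_vector)  # nothing to repeat within one frame
    return [fixed_vector[n % period] for n in range(len(fixed_vector))]


def excitation_candidate(adaptive: List[float], fixed: List[float],
                         g_adaptive: float, g_fixed: float) -> List[float]:
    """Weighted sum of both contributions (multipliers 2-4, 2-5 and adder 2-6)."""
    return [g_adaptive * a + g_fixed * f for a, f in zip(adaptive, fixed)]


past = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
adaptive = adaptive_codebook_vector(past, period=3, frame_len=8)
fixed = periodize([5.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], period=2)
print(adaptive)  # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
print(fixed)     # [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0]
```

When the period exceeds the frame length, the adaptive vector is simply the first `frame_len` samples of the cut segment, so no repetition occurs.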

[0005] The synthesis filter 1-5 in FIG. 8 is a linear filter whose coefficients are the output of the linear prediction parameter decoding unit 1-3. It takes an excitation vector candidate c as input and outputs a reproduced speech candidate y. The order of the synthesis filter 1-5, which is the order of the linear prediction analysis, is generally about 10 to 16. As already mentioned, the synthesis filter 1-5 may also be a nonlinear filter.
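A direct-form all-pole filter is the usual linear realization of the synthesis filter 1-5. The sketch below assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so the filter computes y[n] = c[n] - sum_k a_k y[n-k]. The `memory` argument, holding the last p outputs of the previous frame, is an assumption about how filter state is carried across frames.

```python
from typing import List, Optional


def synthesis_filter(excitation: List[float], a: List[float],
                     memory: Optional[List[float]] = None) -> List[float]:
    """All-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.

    `memory` holds the previous p output samples, oldest first."""
    p = len(a)
    history = list(memory) if memory is not None else [0.0] * p
    if len(history) != p:
        raise ValueError("memory must contain exactly p samples")
    output: List[float] = []
    for c in excitation:
        # history[-1 - k] is the past output y[n - 1 - k]
        y = c - sum(a[k] * history[-1 - k] for k in range(p))
        output.append(y)
        history.append(y)
    return output


# First-order example: 1 / (1 - 0.5 z^-1) applied to a unit impulse.
print(synthesis_filter([1.0, 0.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

In an actual coder, `a` would have 10 to 16 coefficients, as stated above, and would come from the linear prediction parameter decoding unit 1-3.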

[0006] The distortion calculation unit 1-6 calculates the distortion d between the input speech x and the reproduced speech candidate y output by the synthesis filter 1-5. This calculation often takes into account the synthesis filter coefficients or the unquantized linear prediction coefficients, for example through perceptual (auditory) weighting.

FIG. 10 shows an example of a functional configuration that calculates the distortion with perceptual weighting. The weighting is implemented as perceptual weighting filters 4-2 and 4-3, which use either the unquantized linear prediction parameters or the quantized synthesis filter coefficients. The reproduced speech candidate y from the synthesis filter 4-1 passes through the perceptual weighting filter 4-2. The input speech passes through the perceptual weighting filter 4-3. The distortion d is then calculated between the two weighted signals.

The perceptual weighting filters 4-2 and 4-3 normally use the same filter coefficients. It would therefore be equivalent to insert them as a single filter after the distance calculation unit 4-4, that is, to weight the difference between the two signals. For reasons of processing load, however, they are usually placed separately at two points before the distance calculation unit 4-4, as shown in FIG. 10.
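The equivalence claimed for the two weighting filters follows from linearity. With identical coefficients and zero initial state, W(y) - W(x) = W(y - x), so weighting both signals before the distance gives the same d as weighting their difference once. The check below makes two assumptions, since the text does not specify either: a short FIR filter stands in for the perceptual weighting filter, and squared error is the distance measure.

```python
from typing import List


def fir_filter(signal: List[float], b: List[float]) -> List[float]:
    """y[n] = sum_k b[k] * x[n - k], starting from zero state."""
    return [sum(b[k] * signal[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(signal))]


def distortion_two_filters(x: List[float], y: List[float], w: List[float]) -> float:
    """FIG. 10 arrangement: weight y and x separately, then compare."""
    wy, wx = fir_filter(y, w), fir_filter(x, w)
    return sum((a - b) ** 2 for a, b in zip(wy, wx))


def distortion_one_filter(x: List[float], y: List[float], w: List[float]) -> float:
    """Single weighting filter applied to the difference signal."""
    diff = [yi - xi for xi, yi in zip(x, y)]
    return sum(e * e for e in fir_filter(diff, w))


x = [1.0, -2.0, 0.5, 3.0]   # input speech (toy values)
y = [0.5, -1.0, 1.0, 2.0]   # reproduced speech candidate (toy values)
w = [1.0, -0.9]             # hypothetical weighting filter taps
print(distortion_two_filters(x, y, w), distortion_one_filter(x, y, w))
```

The two results agree up to floating-point rounding. One plausible reading of the processing-load remark is that the weighted input W(x) in the two-filter arrangement is computed only once per frame, regardless of how many candidates y are tried.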

[0007] The codebook search control unit 1-8 in FIG. 8 selects the excitation code that minimizes the distortion d between each reproduced speech candidate y and the input speech x, and thereby determines the excitation vector for that frame. When the adaptive codebook 2-1, the fixed codebook 2-2, and the weight codebook 2-3 of FIG. 9 are used, the unit selects the period code, the fixed code, and the weight code for these codebooks. Together these three codes form the excitation code.
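The selection rule of the codebook search control unit 1-8 amounts to an argmin over candidate excitations. The brute-force sketch below is illustrative only and rests on three assumptions. The candidates are a hypothetical dictionary mapping excitation code to excitation vector. `synthesize` stands in for the synthesis filter. Plain squared error replaces the perceptually weighted distortion. Practical coders typically avoid such an exhaustive loop, because it is the source of the heavy computation discussed in [0009].

```python
from typing import Callable, Dict, List, Tuple


def squared_error(x: List[float], y: List[float]) -> float:
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def search_excitation(target: List[float],
                      candidates: Dict[int, List[float]],
                      synthesize: Callable[[List[float]], List[float]]
                      ) -> Tuple[int, float]:
    """Return (code, distortion) of the candidate whose synthesized output
    is closest to the target speech."""
    if not candidates:
        raise ValueError("empty codebook")
    best_code, best_d = -1, float("inf")
    for code, excitation in candidates.items():
        d = squared_error(target, synthesize(excitation))
        if d < best_d:
            best_code, best_d = code, d
    return best_code, best_d


def one_pole(c: List[float]) -> List[float]:
    """Toy synthesis filter y[n] = c[n] + 0.5 * y[n - 1]."""
    out, prev = [], 0.0
    for v in c:
        prev = v + 0.5 * prev
        out.append(prev)
    return out


target = one_pole([1.0, 0.0, -1.0])
codebook = {0: [0.0, 0.0, 0.0], 1: [1.0, 0.0, -1.0], 2: [1.0, 1.0, 1.0]}
print(search_excitation(target, codebook, one_pole))  # (1, 0.0)
```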

[0008] The codebook search control unit 1-8 determines the excitation code, consisting of the period code, the fixed (noise) code, and the weight code. This code, together with the linear prediction parameter code output by the linear prediction parameter encoding unit 1-2, is sent to the code output unit 1-9. Depending on the application, the codes are either stored in a storage device or sent to the receiving side over a communication channel.

FIG. 11 shows an example of the functional configuration of a decoding method corresponding to the above encoding method. Among the codes received from the transmission channel or storage medium, the linear prediction parameter code is decoded into synthesis filter coefficients by the linear prediction parameter decoding unit 3-2. These coefficients are sent to the synthesis filter 3-4 and, if necessary, to the post-processing unit 3-5. The received excitation code is sent to the excitation vector generation unit 3-3, which generates the excitation vector corresponding to the code. The configuration of the excitation vector generation unit 3-3 corresponds to that of the excitation vector generation unit 1-4 in the encoding method of FIG. 8.

The synthesis filter 3-4 takes the excitation vector as input and reproduces the speech. The post-processing unit 3-5 perceptually reduces the noisiness of the reproduced speech, a step also called post-filtering. It is often omitted to reduce the processing load.

[0009]

Problems to be Solved by the Invention

The CELP method has two problems. First, the distortion calculation used to select excitation vector candidates requires a very large amount of computation. Second, a very large amount of memory is needed to store the code vectors in the fixed codebook 2-2.

[0010] To address these problems, a method called Algebraic Code-Excited Linear Prediction (ACELP) has been proposed. This method does not store the fixed codebook as vector patterns one frame long. Instead, it forms a fixed code vector by placing a few pulses of height 1 at suitable positions within the frame, for example four pulses in a frame or subframe of 80 samples. By adopting this excitation scheme and rearranging the order of operations in the distortion calculation, ACELP reduces the required computation and memory compared with the conventional method. Details of ACELP are given, for example, in R. Salami, C. Laflamme, and J-P. Adoul, "8 kbit/s ACELP Coding of Speech with 10 ms Speech-Frame: a Candidate for CCITT Standardization," Proc. IEEE ICASSP-94, pp. II-97.

[0011] A concrete example of ACELP is now described with reference to the drawings. FIG. 12 shows an example of the functional configuration of the fixed codebook in ACELP. To place N pulses, the codebook has N position codebooks 1 to N (5-11 to 5-1N), N pulse generation units 1 to N (5-21 to 5-2N), and N sign multiplication units 1 to N (5-31 to 5-3N).

Each position codebook 5-1 holds a table of candidate positions at which the corresponding pulse generation unit 5-2 may place its pulse, and maps position codes to actual pulse positions. Each pulse generation unit 5-2 outputs a time-series vector one subframe (or one frame) long, containing a single pulse at the sample point specified by the position codebook 5-1. Each sign multiplication unit 5-3 multiplies the pulse from its pulse generation unit 5-2 by a sign of +1 or -1. The signed pulses are then added together to form a vector of N pulses.
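In the conventional fixed codebook of FIG. 12, each pulse contributes a table lookup for its position plus one explicit sign bit. In the sketch below, the caller supplies the position tables. Pulse 1's table (0, 5, ..., 75) is the one given for FIG. 13. The track used for pulse 2 and the mapping of sign bit 1 to -1 are assumptions.

```python
from typing import List, Sequence


def acelp_fixed_vector(position_tables: Sequence[Sequence[int]],
                       position_codes: Sequence[int],
                       sign_bits: Sequence[int],
                       length: int) -> List[int]:
    """Conventional ACELP fixed code vector (FIG. 12 style)."""
    vector = [0] * length
    for table, code, sign_bit in zip(position_tables, position_codes, sign_bits):
        position = table[code]             # position codebook 5-1i
        sign = -1 if sign_bit else 1       # sign multiplier 5-3i, 1 bit per pulse
        vector[position] += sign           # pulse generator 5-2i, then summation
    return vector


pulse1_positions = list(range(0, 80, 5))   # 0, 5, ..., 75
pulse2_positions = list(range(1, 80, 5))   # hypothetical track for pulse 2
v = acelp_fixed_vector([pulse1_positions, pulse2_positions], [2, 3], [0, 1], 80)
print([(n, a) for n, a in enumerate(v) if a != 0])  # [(10, 1), (16, -1)]
```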

[0012] FIG. 13 schematically shows how the position codebooks and pulse generation units produce pulses when the subframe is 80 samples long and there are four pulses. The pulse position codebooks (position candidate tables) in this example are designed so that no two pulses can fall on the same sample point. Pulse 1, for example, is placed at one of the candidate positions (sample points) 0, 5, 10, ..., 75. Similarly, pulses 2, 3, and 4 are each placed at one of their own candidate positions (one of the broken lines).

Because the sign multiplication unit 5-3 multiplies each pulse by +1 or -1, a pulse at a given position may point in either the positive or the negative direction. The output is thus a train of four signed pulses. The distortion calculation unit 1-6 and the codebook search control unit 1-8 in FIG. 8 determine which pulse positions and signs give the smallest distortion and select the optimal ones. The position codes and sign codes are then sent to the decoder through a communication channel or storage medium.

The number of pulses may also be smaller than the specified number, four in the example of FIG. 13. For example, the coder may choose not to place the third pulse, so that only three or fewer pulses are used.

[0013] The problem with ACELP is that its fixed code vector model is simple. As a result, reducing the number of pulses per unit time to lower the bit rate causes marked degradation of sound quality. In the example of FIG. 13, the positions require 4 bits each for pulses 1, 2, and 4 and 5 bits for pulse 3, and the signs require 1 bit for each of the four pulses, for a total of 21 bits. Reducing the number of pulses from four to three saves bits and brings the total down to 17, but the sound quality then degrades markedly.

[0014] Conventional CELP, on the other hand, has an advantage: the shapes of the code vectors in the codebook can be designed in advance by analyzing a large amount of speech data, so that vectors of useful shapes are stored efficiently. Shapes that rarely occur in real speech need not be represented as code vectors. The freedom of the vector shapes can therefore be constrained to reflect the properties of real speech, which enables efficient coding with few bits. ACELP, however, simply arranges pulses of height 1, so the statistical tendencies of real speech cannot be reflected in the codebook design. A designer has therefore had to choose between conventional CELP, for a higher compression ratio, and an ACELP-type method, for low computation and memory.

SUMMARY OF THE INVENTION

[0015] An object of the present invention is to provide a method of digitally encoding acoustic signals such as speech or music. The method should yield high-quality reproduced speech at a high compression ratio (low bit rate) on an inexpensive processor, with acceptably small memory and computation. Further objects are a decoding method for such signals, devices implementing both methods, and a program recording medium therefor.

[0016]

Means for Solving the Problems

The present invention constrains the arrangement of pulses in an ACELP-type pulse coding scheme so that the arrangement reflects the properties of real speech. A sign codebook is paired with each position codebook, and each pulse position candidate in that position codebook is assigned a positive or negative polarity in advance. The same position code also reads out the sign codebook. The polarities are arranged so that pulses placed at adjacent sample points always have opposite polarities. Patterns that occur infrequently in real speech are thus never generated. Compared with conventional ACELP-type coding, this reduces the bit rate for the same quality, or gives higher quality at the same bit rate.

[0017]

Embodiments of the Invention

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows an example of the functional configuration of a fixed codebook according to the present invention.

It differs from the conventional method of FIG. 12 in that the sign of each pulse is predetermined by the pulse position. The conventional method specifies the sign of each pulse with 1 bit per pulse. In the present invention, the position code alone determines both the position and the polarity of a pulse. Specifically, the position codebooks 5-11 to 5-1N are paired with the sign codebooks 7-11 to 7-1N respectively. For example, the sign for each pulse position that position codebook 5-11 can specify is predetermined in sign codebook 7-11.

The sign multiplication units 5-31 to 5-3N multiply each output pulse of the pulse generation units 5-21 to 5-2N by the sign from the corresponding sign codebook 7-11 to 7-1N. If necessary, the N pulses obtained in this way are combined, and the overall sign multiplication unit 7-2 multiplies them by a single common sign.
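The key change in FIG. 1 is that position codebook i and sign codebook i are indexed by the same code. No per-pulse sign bit is transmitted; only the single common sign used by unit 7-2 remains. The sketch below illustrates this with pulse 1's candidate positions 0, 5, ..., 75 and the even-positive/odd-negative polarities of FIG. 3. The function name, the table layout, and the mapping of the common sign bit are assumptions.

```python
from typing import List, Sequence


def proposed_fixed_vector(position_tables: Sequence[Sequence[int]],
                          sign_tables: Sequence[Sequence[int]],
                          position_codes: Sequence[int],
                          common_sign_bit: int,
                          length: int) -> List[int]:
    """Fixed code vector in the style of FIG. 1: the position code alone
    selects both the pulse position and its (relative) polarity."""
    vector = [0] * length
    for positions, signs, code in zip(position_tables, sign_tables, position_codes):
        vector[positions[code]] += signs[code]   # same code i reads both codebooks
    common_sign = -1 if common_sign_bit else 1   # overall sign multiplier 7-2
    return [common_sign * a for a in vector]


pulse1_positions = list(range(0, 80, 5))
pulse1_signs = [1 if n % 2 == 0 else -1 for n in pulse1_positions]  # FIG. 3 rule
print(proposed_fixed_vector([pulse1_positions], [pulse1_signs], [1], 0, 80)[5])  # -1
print(proposed_fixed_vector([pulse1_positions], [pulse1_signs], [1], 1, 80)[5])  # 1
```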

[0018] As a result, with N pulses this codebook configuration requires N-1 fewer bits than the conventional method of FIG. 12. The configuration of FIG. 1 may equivalently be modified as shown in FIG. 2. In FIG. 2, the entire pulse train is not multiplied by a plus or minus sign at the end. Instead, the sign inversion units 1 to N (8-11 to 8-1N) process each sign read from the sign codebooks 1 to N (7-11 to 7-1N), inverting its polarity when the sign is minus and leaving it unchanged when it is plus. The results are then input to the sign multiplication units 1 to N (5-31 to 5-3N).
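The bit counts quoted in [0013], [0018], [0020] and [0023] can be checked with simple arithmetic. The helper below assumes that a track with M candidate positions costs ceil(log2 M) bits. The track sizes 16, 16, 32 and 16 correspond to the 4, 4, 5 and 4 position bits stated for FIG. 13. The allocation of four equal tracks of 16 for FIG. 5 is an assumption, chosen because it is consistent with that figure's stated 17 bits.

```python
import math
from typing import Sequence


def position_bits(track_sizes: Sequence[int]) -> int:
    return sum(math.ceil(math.log2(size)) for size in track_sizes)


def conventional_bits(track_sizes: Sequence[int]) -> int:
    """FIG. 12 / FIG. 13: one explicit sign bit per pulse."""
    return position_bits(track_sizes) + len(track_sizes)


def proposed_bits(track_sizes: Sequence[int]) -> int:
    """FIG. 1: signs implied by positions, plus one common sign bit."""
    return position_bits(track_sizes) + 1


fig13_tracks = [16, 16, 32, 16]
print(conventional_bits(fig13_tracks))  # 21
print(proposed_bits(fig13_tracks))      # 18, i.e. N - 1 = 3 bits fewer
print(proposed_bits([16, 16, 16, 16]))  # 17 (assumed FIG. 5 allocation)
```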

[0019] When the configurations of FIGS. 1 and 2 are used, an important design question is how to choose the pairs of pulse positions and polarities stored in the position codebooks 5-11 to 5-1N and the sign codebooks 7-11 to 7-1N. Any choice saves the same N-1 bits. The choice must, however, keep the quality degradation after encoding and decoding speech as small as possible. As already noted, minimizing the degradation means keeping the combinations that occur frequently when real speech is encoded and eliminating those that occur infrequently. A reduction method that does not reflect the properties of real speech degrades speech quality severely.

[0020] FIG. 3 shows an example of pairs of pulse positions and pulse polarities when the present invention is applied with the same possible positions (sample points) as the conventional example of FIG. 13. This case requires 18 bits, 3 bits fewer than the case of FIG. 13. An upward broken line marks a position where the pulse polarity is specified as positive only. A downward broken line marks a position where it is specified as negative only. However, the four pulses are finally multiplied together by a common plus or minus sign. These polarities are therefore not absolute signs, only the polarities of the four pulses relative to one another. In the example of FIG. 3, pulse 1 is specified to have positive polarity at even sample positions and negative polarity at odd sample positions. Alternatively, even sample points may have negative polarity and odd sample points positive polarity.

[0021] At first glance, a designation such as that of FIG. 3 may not seem to help preserve sound quality. It is, however, an effective designation that accurately reflects the characteristics of real speech. The excitation vectors obtained when real speech is encoded often have pulses of opposite polarity at adjacent sample points. Pulses of the same polarity rarely line up at adjacent sample points. The designation of FIG. 3 accurately reflects this fact. Under it, any two pulses at adjacent sample points always have opposite polarities, and pulses of the same polarity never stand next to each other. The rule is an algebraic one based on even and odd sample points, yet it reflects the properties of speech well.
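The property argued for here is that adjacent pulses always have opposite polarity under the FIG. 3 rule. It can be verified exhaustively over an 80-sample subframe. The sketch assumes the variant with positive polarity at even sample points, and checks the swapped variant in the same way. The names are illustrative.

```python
from typing import Callable, List

SUBFRAME_LEN = 80


def fig3_polarity(n: int) -> int:
    """Relative polarity at sample n: +1 at even points, -1 at odd points."""
    return 1 if n % 2 == 0 else -1


def adjacent_opposite(polarity: Callable[[int], int],
                      length: int = SUBFRAME_LEN) -> bool:
    """True if every pair of adjacent sample points gets opposite polarities."""
    return all(polarity(n) == -polarity(n + 1) for n in range(length - 1))


def pulse_train(positions: List[int], polarity: Callable[[int], int],
                length: int = SUBFRAME_LEN) -> List[int]:
    """Place a unit pulse with the rule's polarity at each given position."""
    train = [0] * length
    for p in positions:
        train[p] = polarity(p)
    return train


print(adjacent_opposite(fig3_polarity))                 # True
print(pulse_train([10, 11, 42], fig3_polarity)[10:12])  # [1, -1]
```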

[0022] FIG. 4 shows candidate example 2, a modification of the candidate example of FIG. 3. It differs from FIG. 3 in that the pulse amplitude is 0 at sample points 15, 31, 47, and 63, marked with ×, meaning that no pulse is placed there. Offering the option of three or fewer pulses gives better coding quality than always placing four pulses in each subframe. The distortion calculation unit 1-6 and the codebook search control unit 1-8 decide between four pulses and three or fewer by choosing whichever gives the smaller distortion. As FIG. 4 shows, the coded sound quality is better when the no-pulse sample points are spread across the subframe. The choice of sample points 15, 31, 47, and 63 is a representative example of a good choice.
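Under FIG. 4, a position code for pulse 1 still indexes 16 entries (4 bits), but one entry now means "no pulse". The sketch below builds pulse 1's combined position/amplitude table from three ingredients: the candidates 0, 5, ..., 75, the FIG. 3 polarity rule, and the ×-marked points 15, 31, 47 and 63. Of those four points, only 15 lies on pulse 1's track. The names are hypothetical.

```python
from typing import List, Tuple

PULSE1_POSITIONS = list(range(0, 80, 5))   # 0, 5, ..., 75
NO_PULSE_POINTS = {15, 31, 47, 63}          # marked with x in FIG. 4


def fig4_amplitude(n: int) -> int:
    """0 at the x-marked points, otherwise the FIG. 3 relative polarity."""
    if n in NO_PULSE_POINTS:
        return 0
    return 1 if n % 2 == 0 else -1


PULSE1_TABLE: List[Tuple[int, int]] = [(n, fig4_amplitude(n)) for n in PULSE1_POSITIONS]


def decode_pulse1(code: int) -> Tuple[int, int]:
    """Map a 4-bit position code to (sample point, amplitude)."""
    if not 0 <= code < len(PULSE1_TABLE):
        raise ValueError("pulse 1 uses 4-bit codes (0..15)")
    return PULSE1_TABLE[code]


print(len(PULSE1_TABLE))   # 16
print(decode_pulse1(3))    # (15, 0): no pulse is placed
print(decode_pulse1(1))    # (5, -1)
```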

[0023] FIG. 5 shows an example of position and polarity candidates for the case where one more bit per subframe is to be saved, giving 17 bits. No pulse is placed at the sample positions that carry no sample point number in the figure, namely 4, 9, 14, 19, ..., 74, 79. No pulse is placed at points 31 and 47, marked with crosses, either. The coded sound quality is lower than with the configuration of FIG. 4. It is still a good example from the viewpoint stated above, that sample points where no pulse can be placed are better dispersed within the subframe. FIG. 4 covers the 18-bit case and FIG. 5 the 17-bit case. Different configurations may be mixed within a single encoding. For example, the first subframe may allocate 17 bits to the fixed codebook and use the configuration of FIG. 5, while the second subframe allocates 18 bits and uses the configuration of FIG. 4.
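As a consistency check on the 17-bit figure, the sketch below counts the positions left after removing 4, 9, ..., 79 from an 80-sample subframe. The following are assumptions, not stated in the text:

* The remaining positions are split evenly among the four pulses.
* One common sign bit is sent per subframe.
* The crosses at 31 and 47 are codes that simply produce no pulse, as in FIG. 4, so they do not change the code count.

```python
import math

SUBFRAME_LEN = 80
N_PULSES = 4

# Positions 4, 9, 14, ..., 79 carry no pulse in FIG. 5.
usable = [n for n in range(SUBFRAME_LEN) if n % 5 != 4]
print(len(usable))  # 64

# Assumption: 64 positions shared evenly by four pulses, plus one common sign bit.
per_pulse = len(usable) // N_PULSES
bits = N_PULSES * int(math.log2(per_pulse)) + 1
print(per_pulse, bits)  # 16 17

# Mixing configurations within one frame, as allowed above:
# FIG. 5 (17 bits) in subframe 1 and FIG. 4 (18 bits) in subframe 2.
fixed_codebook_bits_per_frame = 17 + 18
print(fixed_codebook_bits_per_frame)  # 35
```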

[0024] FIG. 6 shows a modification of FIG. 5. In FIGS. 3 to 5, the relative polarity of a pulse is determined by whether its sample point is even or odd. In FIG. 6 it is not. As in FIGS. 3 to 5, however, whenever pulses stand at adjacent sample points the two pulses always have opposite polarities, and pulses of the same polarity never stand next to each other. The pulse arrangement of FIG. 6 is therefore also an effective designation method that accurately reflects the characteristics of actual speech. This method has a further advantage. Pulse 1 has the same polarity at all of its candidate positions, and likewise each of pulses 2 to 4 keeps one polarity at all of its own candidate positions. This simplifies the search for the optimal pulse positions. FIG. 6 may also include sample positions where no pulse is placed, like those marked with a cross in FIG. 5.
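The text does not give the exact layout of FIG. 6, so the following is a hypothetical layout that has the stated properties:

* Each of the four pulses keeps one polarity over all its candidate positions.
* Positions 4, 9, ..., 79 stay empty, as in FIG. 5.
* Adjacent candidate positions always have opposite polarity.
* Polarity is no longer a function of parity.

The names `TRACKS` and `TRACK_SIGN` are illustrative.

```python
N_PULSES = 4
# Hypothetical FIG. 6-style layout: pulse k (0-based) may sit at 5*m + k, m = 0..15;
# positions 5*m + 4 are left empty, as in FIG. 5.
TRACKS = [[5 * m + k for m in range(16)] for k in range(N_PULSES)]
# One fixed polarity per pulse, independent of the candidate position.
TRACK_SIGN = [+1, -1, +1, -1]

polarity = {}
for k, track in enumerate(TRACKS):
    for p in track:
        polarity[p] = TRACK_SIGN[k]


def neighbours_opposite(pol):
    """Check that any two adjacent candidate positions have opposite polarity."""
    return all(pol[p] != pol[p + 1] for p in pol if p + 1 in pol)


print(neighbours_opposite(polarity))  # True
# Polarity is not set by parity here: samples 0 and 5 are both positive.
print(polarity[0], polarity[5])       # 1 1
```

`TRACK_SIGN[k]` does not depend on the position. A search can therefore apply each pulse's sign to the target correlation once per track rather than once per candidate. That is one plausible reading of the simplification mentioned above.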

[0025] FIG. 7 shows another functional configuration example of the fixed codebook according to the present invention. In the configuration examples so far, each pulse was +1 or -1. Here, amplitude codebooks 1 to N (9-11 to 9-1N) are used instead of positive/negative sign codebooks 1 to N (7-11 to 7-1N). Both the polarity and the height of the pulse are fixed in advance for each pulse position specified by the position code. The amplitude codebooks 9-11 to 9-1N output, from the position code, the pulse amplitude value at the corresponding sample point. For example, a pulse placed at sample point 10 has height +0.6, and a pulse placed at sample point 57 has height -0.2. The configuration example of FIG. 2 corresponds to the case in FIG. 7 where the amplitude codebooks 9-11 to 9-1N hold only the values +1 or -1.

The amplitude codebooks 1 to N (9-11 to 9-1N) must be designed so that the coded sound is of as high a quality as possible. In general, they should contain the patterns that appear frequently when actual speech is coded. One approach is to observe the driving excitation obtained from actual speech data, for example by displaying its waveform, and collect the frequently occurring patterns. Alternatively, the generalized Lloyd algorithm can be applied to a large amount of speech data to minimize the total distortion. This algorithm starts from an initial amplitude codebook and updates it so that the total distortion over all the speech data is minimized. Encoding and codebook updating are repeated alternately, so that an optimal codebook gradually forms. When the generalized Lloyd algorithm is applied to the present invention, a good representative choice of initial codebook is the positions and amplitudes (+1 or -1) shown in FIGS. 3 to 6.
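The alternation of encoding and codebook updating can be illustrated with a toy generalized Lloyd loop. It uses these simplifications, which are assumptions rather than the patent's training setup:

* No synthesis filter and no gain.
* Two disjoint hypothetical tracks in an 8-sample subframe.
* Random training vectors.
* An initial codebook of +1/-1 by the FIG. 3 even/odd rule.

Under these assumptions the distortion separates by position. The centroid update (the mean target value at each chosen position) is then the exact minimizer, so the total distortion cannot increase from pass to pass.

```python
import numpy as np

LENGTH = 8
# Hypothetical toy tracks: pulse 1 on even samples, pulse 2 on odd samples.
TRACKS = [np.array([0, 2, 4, 6]), np.array([1, 3, 5, 7])]


def encode(x, amp):
    """Choose one position per track minimising ||x - c||^2 (no filter, no gain)."""
    combo = []
    for track in TRACKS:
        # Selecting position p changes the error by amp[p]**2 - 2*x[p]*amp[p].
        cost = amp[track] ** 2 - 2.0 * x[track] * amp[track]
        combo.append(int(track[np.argmin(cost)]))
    c = np.zeros(LENGTH)
    c[combo] = amp[combo]
    return combo, float(np.sum((x - c) ** 2))


def generalized_lloyd(train, amp, iterations=10):
    """Alternate encoding of all training vectors with centroid updates."""
    amp = np.array(amp, dtype=float)
    history = []
    for _ in range(iterations):
        sums = np.zeros(LENGTH)
        counts = np.zeros(LENGTH)
        total = 0.0
        for x in train:
            combo, d = encode(x, amp)
            total += d
            sums[combo] += x[combo]
            counts[combo] += 1
        history.append(total)
        used = counts > 0
        # Centroid step: the mean target value is the error-minimising amplitude.
        amp[used] = sums[used] / counts[used]
    return amp, history


rng = np.random.default_rng(0)
train = rng.normal(scale=0.5, size=(200, LENGTH))
amp0 = np.where(np.arange(LENGTH) % 2 == 0, 1.0, -1.0)  # FIG. 3-style start: +1 / -1
amp, history = generalized_lloyd(train, amp0)
print([round(h, 2) for h in history[:3]])
```

In a real coder the encoding step would run the full analysis-by-synthesis search, and the update would have to account for the filter and gain. The sketch only shows the alternating structure.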

[0026] In FIG. 7, the sign inverters 8-11 to 8-1N may be omitted. Instead, as shown in FIG. 11, the outputs of the amplitude codebooks 9-11 to 9-1N are multiplied by multipliers 5-31 to 5-3N, respectively, and the results are summed. The sum is then multiplied by the common positive/negative sign in multiplier 7-2. The encoding and decoding processes described above can also be carried out by a computer that reads and executes a program.
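The variant in [0026] relies on a simple fact: multiplying every pulse by the same sign gives the same vector whether it happens before or after summation. The hypothetical sketch below compares the two orderings using the amplitude values quoted in [0025] (+0.6 at sample 10, -0.2 at sample 57). The function names are invented for illustration.

```python
import numpy as np


def excitation_fig7(positions, amps, common_sign, length):
    """FIG. 7 style: invert each amplitude by the common sign, then place and sum."""
    vec = np.zeros(length)
    for p, a in zip(positions, amps):
        vec[p] += common_sign * a        # role of sign inverters 8-11 ... 8-1N
    return vec


def excitation_sign_after_sum(positions, amps, common_sign, length):
    """[0026] variant: place and sum the pulses, then apply the sign once (7-2)."""
    vec = np.zeros(length)
    for p, a in zip(positions, amps):
        vec[p] += a                      # multipliers 5-31 ... 5-3N, then the adder
    return common_sign * vec


pos, amps = [10, 57], [0.6, -0.2]       # values quoted in [0025]
a = excitation_fig7(pos, amps, -1, 80)
b = excitation_sign_after_sum(pos, amps, -1, 80)
print(np.array_equal(a, b))  # True
```

Applying the sign after the sum needs one multiplication instead of N sign inversions.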

[0027]

[Effects of the Invention] A computer program for encoding and decoding actual speech was written to confirm the effects of the present invention. The invention may be implemented in hardware or as a computer program. A computer program was used here because it allows the effects to be checked easily. The test conditions were as follows:

* Sampling frequency: 8 kHz.
* Subframe length: 10 ms (80 samples), with two subframes per frame.
* Codebooks: FIG. 5 in the first subframe and FIG. 4 in the second.
* Bit rate: 4 kbit/s.

Actual speech data were encoded and decoded with this program, and the quality of the reproduced sound was examined. The quality was almost equal to that of the ITU-T G.726 standard. Since G.726 runs at 32 kbit/s, equivalent quality was obtained at 1/8 of that bit rate. The CPU time required for encoding was 1/4 or less of that of the conventional CELP method, that is, 4 or more times faster. This confirms that high-quality speech coding can be achieved with a small amount of computation.
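The parameters quoted above fit together as follows. The short calculation derives the frame budget from them. The 35-bit fixed-codebook share per frame assumes the 17-bit and 18-bit sizes given in [0023]. How the remaining bits are divided among the other parameters is not stated in this section.

```python
FS_HZ = 8000            # sampling frequency
SUBFRAME_MS = 10
SUBFRAMES_PER_FRAME = 2
BITRATE = 4000          # bit/s, the proposed coder
G726_BITRATE = 32000    # bit/s, ITU-T G.726

frame_ms = SUBFRAME_MS * SUBFRAMES_PER_FRAME
samples_per_subframe = FS_HZ * SUBFRAME_MS // 1000
bits_per_frame = BITRATE * frame_ms // 1000
fixed_cb_bits = 17 + 18  # FIG. 5 codebook in subframe 1, FIG. 4 in subframe 2

print(samples_per_subframe)      # 80
print(bits_per_frame)            # 80
print(fixed_cb_bits)             # 35
print(G726_BITRATE // BITRATE)   # 8
```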

[Brief description of the drawings]

FIG. 1 is a diagram showing a functional configuration example of a fixed codebook according to the present invention.

FIG. 2 is a diagram showing a modification of the functional configuration of the fixed codebook according to the present invention.

FIG. 3 is a diagram showing an example of pulse position and polarity candidates according to the present invention.

FIG. 4 is a diagram showing another example of pulse position and polarity candidates according to the present invention.

FIG. 5 is a diagram showing still another example of pulse position and polarity candidates according to the present invention.

FIG. 6 is a diagram showing still another example of pulse position and polarity candidates according to the present invention.

FIG. 7 is a diagram showing still another example of the functional configuration of the fixed codebook according to the present invention.

FIG. 8 is a diagram showing a functional configuration example of a conventional speech coding method.

FIG. 9 is a diagram showing a functional configuration example of the driving excitation vector generator 1-4 in FIG. 8.

FIG. 10 is a diagram showing a functional configuration example of the synthesis distortion calculator 1-7 in FIG. 8.

FIG. 11 is a diagram showing a functional configuration example of a CELP-type speech decoding method.

FIG. 12 is a diagram showing a functional configuration example of an ACELP fixed codebook.

FIG. 13 is a diagram showing a conventional example of pulse position candidates.

Continued from the front page

(72) Inventor: Shoko Kurihara, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo

(56) References cited: JP-A-2-282799 (JP, A); JP-A-3-177900 (JP, A); JP-A-3-245197 (JP, A); JP-A-8-54898 (JP, A); JP-A-10-97294 (JP, A)

(58) Fields searched (Int.Cl.⁶, DB name): G10L 9/14, 9/18; H03M 7/30

Claims

(57) [Claims]

1. An acoustic signal encoding method in which:

- an acoustic signal is reproduced by driving a synthesis filter with a driving excitation vector obtained from time-series vectors extracted from a codebook;
- two or more pulses are generated as the driving excitation vector within an excitation-vector quantization unit called a frame or subframe; and
- the position, positive/negative polarity, and amplitude of each pulse are searched for and determined so that the distortion between the input acoustic signal and the reproduced acoustic signal is minimized or nearly minimized;

wherein the relative polarities of the plurality of pulses placed in a frame or subframe are determined in advance according to the sample positions at which the pulses are placed, so that the relative polarity of each pulse is uniquely determined at the same time as its position.

2. The acoustic signal encoding method according to claim 1, wherein the position and the relative polarity of each pulse in the frame or subframe are determined, and the absolute polarity of each pulse is determined by designating a single positive/negative sign collectively for all the pulses in the frame or subframe.

3. The acoustic signal encoding method according to claim 1, wherein a codebook of positive/negative signs depending on the pulse position is used so that pulses of the same polarity do not stand side by side at adjacent sample points.

4. The acoustic signal encoding method according to claim 1, wherein a codebook of positive/negative signs depending on the pulse position is used so that the polarity of a pulse is determined by whether the sample position at which it is placed is an even or an odd sample point, one being positive and the other negative.

5. The method for encoding an acoustic signal according to claim 1, wherein the absolute value of the pulse amplitude is 1.

6. The acoustic signal encoding method according to claim 1, wherein real values of the polarity and amplitude of the pulse at each pulse position are determined in advance and stored in a codebook, so that the polarity and amplitude of a pulse are uniquely determined at the same time as its position.

7. An acoustic signal decoding method in which:

- a linear prediction parameter code and a driving excitation code are input;
- the linear prediction parameter code is decoded into synthesis filter coefficients, which are set in a synthesis filter;
- the driving excitation code is decoded into a driving excitation vector; and
- the synthesis filter is driven by the driving excitation vector to reproduce the acoustic signal;

wherein a position code is decoded from the driving excitation code, a position codebook and a positive/negative codebook are read with the position code, and a pulse of uniquely determined polarity is generated at the position corresponding to the position code, thereby generating the driving excitation vector.
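The decoding steps of claim 7 can be illustrated with a hypothetical NumPy sketch. Everything concrete here is an assumption made for illustration:

* Four 16-entry position codebooks (positions 5m + k, echoing the FIG. 5 count).
* A position code packed at 4 bits per pulse.
* An even/odd positive/negative codebook.
* Filter coefficients used directly in place of a decoded linear prediction parameter code.

Gain decoding is omitted.

```python
import numpy as np

SUBFRAME_LEN = 80
N_TRACKS, BITS = 4, 4
# Hypothetical position codebooks: track k holds sample points 5*m + k, m = 0..15.
POSITION_CB = [[5 * m + k for m in range(16)] for k in range(N_TRACKS)]
# Positive/negative codebooks, read with the same index (even/odd rule of FIG. 3).
SIGN_CB = [[1 if p % 2 == 0 else -1 for p in track] for track in POSITION_CB]


def unpack(position_code):
    """Assumed packing: BITS bits per track, track 0 in the least significant bits."""
    mask = (1 << BITS) - 1
    return [(position_code >> (BITS * k)) & mask for k in range(N_TRACKS)]


def decode_excitation(position_code, common_sign=1):
    """Read both codebooks with each index and place a signed unit pulse."""
    vec = np.zeros(SUBFRAME_LEN)
    for k, idx in enumerate(unpack(position_code)):
        vec[POSITION_CB[k][idx]] += common_sign * SIGN_CB[k][idx]
    return vec


def synthesis_filter(excitation, a, mem=None):
    """All-pole filter 1/A(z) with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.

    mem[i] holds y[n-1-i]; the updated memory is returned for the next subframe.
    """
    a = np.asarray(a, dtype=float)
    mem = np.zeros(len(a)) if mem is None else np.asarray(mem, dtype=float)
    out = np.empty(len(excitation))
    for n, e in enumerate(excitation):
        y = e - a @ mem
        out[n] = y
        mem = np.concatenate(([y], mem[:-1]))
    return out, mem


exc = decode_excitation(1)                   # index 1 on track 0, index 0 elsewhere
speech, mem = synthesis_filter(exc, [-0.9])  # hypothetical first-order filter
```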

8. A codebook for encoding/decoding an acoustic signal, comprising:

- first to N-th position codebooks, each having a plurality of mutually different sample points as pulse position candidates and outputting a pulse position candidate according to a given position code;
- first to N-th positive/negative codebooks, provided in correspondence with the first to N-th position codebooks, each holding a predetermined positive or negative polarity for each pulse position candidate of the corresponding position codebook and outputting that polarity according to the given position code;
- first to N-th pulse generation means, each receiving the pulse position candidate output from the corresponding one of the first to N-th position codebooks and generating a pulse at that position;
- first to N-th sign multiplication means for multiplying the pulses output from the first to N-th pulse generation means by the positive or negative polarities output from the first to N-th positive/negative codebooks, respectively; and
- addition means for combining all the outputs of the first to N-th sign multiplication means.

9. The codebook for encoding/decoding an acoustic signal according to claim 8, further comprising overall sign multiplication means for multiplying the output of the addition means by a given positive or negative sign.

10. A codebook for encoding/decoding an acoustic signal, comprising:

- first to N-th position codebooks, each having a plurality of mutually different sample points as pulse position candidates and outputting a pulse position candidate according to a given position code;
- first to N-th positive/negative codebooks, provided in correspondence with the first to N-th position codebooks, each holding a predetermined positive or negative pulse polarity for each pulse position candidate of the corresponding position codebook and outputting that polarity according to the given position code;
- first to N-th sign inversion means for outputting the polarities output from the first to N-th positive/negative codebooks either with or without sign inversion, according to a given common sign;
- first to N-th pulse generation means, each receiving the pulse position candidate output from the corresponding one of the first to N-th position codebooks and generating a pulse at that position;
- first to N-th sign multiplication means for multiplying the pulses output from the first to N-th pulse generation means by the polarities output from the first to N-th sign inversion means, respectively; and
- addition means for combining all the outputs of the first to N-th sign multiplication means.

11. A codebook for encoding/decoding an acoustic signal, comprising:

- first to N-th position codebooks, each having a plurality of mutually different sample points as pulse position candidates and outputting a position candidate according to a given position code;
- first to N-th amplitude codebooks, provided in correspondence with the first to N-th position codebooks, each holding, for each pulse position candidate of the corresponding position codebook, an amplitude value that includes a predetermined positive or negative polarity, and outputting that amplitude value according to the given position code;
- first to N-th pulse generation means, each receiving the pulse position candidate output from the corresponding one of the first to N-th position codebooks and generating a pulse at that position; and
- means for multiplying the pulses output from the first to N-th pulse generation means by the polarity-bearing amplitude values output from the first to N-th amplitude codebooks, respectively, and by a positive/negative sign common to all the pulses, and for combining and outputting all the multiplication results.

12. A recording medium on which is recorded a program for causing a computer to execute:

- a process of obtaining linear prediction parameters representing the frequency spectrum envelope characteristics of an input acoustic signal;
- a process of encoding the linear prediction parameters to obtain a linear prediction parameter code;
- a process of decoding the linear prediction parameter code to obtain synthesis filter coefficients and setting them in a synthesis filter;
- a process of reading a position codebook and a positive/negative codebook with a position code to generate a plurality of driving excitation pulses;
- a process of driving the synthesis filter with the driving excitation pulses to synthesize an acoustic signal;
- a process of obtaining the distortion of the synthesized acoustic signal with respect to the input acoustic signal;
- a process of searching for the position code that minimizes the distortion and determining a driving excitation code; and
- a process of outputting the linear prediction parameter code and the driving excitation code as the encoded output.
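A toy version of the encoding loop in claim 12 is sketched below under stated assumptions:

* The linear prediction analysis and quantization steps are skipped; the filter coefficients `a` are given directly.
* The 16-sample subframe, the track layout, and the 2-bit index packing are hypothetical.
* An exhaustive search with the optimal gain stands in for whatever search the actual coder uses.

```python
import numpy as np
from itertools import product

LENGTH = 16
# Hypothetical toy layout: four interleaved tracks of four positions each.
TRACKS = [[k + 4 * m for m in range(4)] for k in range(4)]


def sign(p):
    """Positive/negative codebook entry for position p (FIG. 3 even/odd rule)."""
    return 1.0 if p % 2 == 0 else -1.0


def synthesize(excitation, a):
    """Zero-state all-pole filter 1/A(z), A(z) = 1 + sum_i a[i] z^-(i+1)."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for i, ai in enumerate(a):
            if n - 1 - i >= 0:
                acc -= ai * y[n - 1 - i]
        y[n] = acc
    return y


def build(combo):
    """Driving excitation pulses read from the position and sign codebooks."""
    c = np.zeros(LENGTH)
    for p in combo:
        c[p] = sign(p)
    return c


def search(target, a):
    """Exhaustive analysis-by-synthesis search with the optimal gain per candidate."""
    best = None
    for combo in product(*TRACKS):
        y = synthesize(build(combo), a)
        xy, yy = target @ y, y @ y
        d = target @ target - xy * xy / yy  # distortion after optimal gain xy / yy
        if best is None or d < best[0]:
            best = (d, combo, xy / yy)
    return best


def excitation_code(combo):
    """Pack one 2-bit index per track (hypothetical packing)."""
    return sum(TRACKS[k].index(p) << (2 * k) for k, p in enumerate(combo))


a = [-1.2, 0.5]                                   # hypothetical stable 2nd-order filter
target = 0.7 * synthesize(build((4, 9, 2, 15)), a)
d, combo, gain = search(target, a)
print(combo, excitation_code(combo), round(gain, 3))  # (4, 9, 2, 15) 201 0.7
```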

13. A recording medium on which is recorded a program for causing a computer to execute:

- a process of decoding an input linear prediction parameter code to obtain synthesis filter coefficients and setting them in a synthesis filter;
- a process of decoding a driving excitation code to obtain a position code;
- a process of reading a position codebook and a positive/negative codebook with the position code to generate a plurality of driving excitation pulses; and
- a process of driving the synthesis filter with the driving excitation pulses to synthesize an acoustic signal.