JP3490325B2

JP3490325B2 - Audio signal encoding method and decoding method, and encoder and decoder thereof

Info

Publication number: JP3490325B2
Application number: JP03811299A
Authority: JP
Inventors: 登原田; 仲大室
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-02-17
Filing date: 1999-02-17
Publication date: 2004-01-26
Anticipated expiration: 2019-02-17
Also published as: JP2000235399A

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a high-efficiency speech coding method that takes a speech signal as input and digitally encodes it with a small amount of information by minimizing, under a predetermined distance measure, the distortion between the input speech signal and the synthesized reproduced signal. It also relates to a decoding method for that coding method, and to an encoder and a decoder for them.

[0002]

2. Description of the Related Art
High-efficiency speech signal coding methods are used to make efficient use of radio spectrum in digital mobile communications, and of communication lines and storage media in speech or music storage services. Two classes of speech coding methods are in common use: methods for telephone-band speech, whose frequency band is limited to 3.4 kHz or below, and methods for speech that includes frequencies up to the 7 kHz band. Examples include the ITU-T standards G.723.1, G.729, and G.722.

[0003] Of these, coding methods for the 7 kHz band give high naturalness but require relatively high bit rates. Many telephone-band coding methods have relatively low bit rates but cannot match the naturalness of 7 kHz band coding. In practice, one of these methods is usually chosen according to the requirements of the application.

[0004] A method called Code-Excited Linear Prediction (CELP) is often used to code speech at relatively low bit rates. Details of this technique are given in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," IEEE Proc. ICASSP-85, pp. 937-940, 1985.

[0005] FIG. 10 shows the functional configuration of this coding method. Linear prediction analysis unit 1-2 takes the acoustic signal (input speech) applied to the input terminal and computes linear prediction parameters that represent the spectral envelope of the input speech. Linear prediction parameter coding unit 1-3 encodes these parameters and sends them to linear prediction parameter decoding unit 1-4.

The distortion calculation may use spectral information of the input speech, for example to account for auditory characteristics. In that case, the linear prediction parameters are also sent to distortion calculation unit 1-7. Linear prediction parameter decoding unit 1-4 reconstructs the synthesis filter coefficients from the received code and sends them to synthesis filter 1-6. When auditory characteristics are taken into account, the decoded linear prediction parameters may also be used in the distortion calculation.

Details of linear prediction analysis, and examples of linear prediction parameter coding, are given in, for example, Sadaoki Furui, "Digital Speech Processing" (Tokai University Press). Linear prediction analysis unit 1-2, linear prediction parameter coding unit 1-3, linear prediction parameter decoding unit 1-4, and synthesis filter 1-6 may be replaced with nonlinear counterparts.

[0006] Excitation vector generation unit 1-5 generates excitation vector candidates one frame long and sends them to synthesis filter 1-6. FIG. 11 shows an example functional configuration of excitation vector generation unit 1-5.

Adaptive codebook 2-1 stores in its buffer the immediately preceding past excitation vector c(t−1), that is, the already-quantized excitation of the preceding one to several frames. It cuts out a segment whose length corresponds to a certain period and repeats that segment until it fills the frame length. The result is a set of candidate time-series vectors for the periodic component of the speech. The "certain period" is chosen so that the distortion d computed in distortion calculation unit 1-7 becomes small, and the chosen period usually corresponds to the pitch period of the speech.
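The repetition step described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation. It assumes NumPy, integer periods only (practical coders also interpolate fractional periods), and a hypothetical function name `adaptive_codebook_vector`.

```python
import numpy as np


def adaptive_codebook_vector(past_excitation, period, frame_len):
    """Build one adaptive-codebook candidate by periodic repetition.

    Takes the last `period` samples of the past (already quantized) excitation
    and repeats them until `frame_len` samples have been produced.
    Integer periods only; fractional-period interpolation is omitted.
    """
    past = np.asarray(past_excitation, dtype=float)
    if not 1 <= period <= len(past):
        raise ValueError("period must be between 1 and len(past_excitation)")
    segment = past[-period:]
    repeats = -(-frame_len // period)  # ceiling division
    return np.tile(segment, repeats)[:frame_len]


# Example: a 3-sample period taken from a 10-sample history, extended to 7 samples.
print(adaptive_codebook_vector(np.arange(10.0), period=3, frame_len=7))
```

In a search, this function would be called once per candidate period. The candidate that gives the smallest distortion d would then be kept.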

[0007] Fixed codebook 2-2 outputs candidate time-series code vectors one frame long that correspond to the aperiodic component of speech. These candidates are stored independently of the input speech signal. Their number is fixed in advance by the number of bits available for coding.

Periodization unit 2-3 periodizes the fixed code vector candidates as needed. It uses the period specified by the periodic code, which, as noted above, generally corresponds to the pitch period. Periodization means one of two things: applying a comb filter with taps at the specified period positions, or, as in the adaptive codebook, cutting out a segment from the start of the vector with a length equal to the specified period and repeating it.

Periodization unit 2-3 is often used because it improves coding efficiency, but it is sometimes omitted. In segments such as consonants, where the speech itself has little or no pitch component, the periodization unit may have no effect.
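Both forms of periodization mentioned above can be sketched as follows. This is a hedged illustration: the recursive comb form and the gain `beta` are assumptions (the text gives no filter gain), and the function names are hypothetical.

```python
import numpy as np


def periodize_comb(code, period, beta=0.8):
    """Recursive comb filter y[n] = x[n] + beta * y[n - period].

    The impulse response has taps at 0, period, 2*period, ...; `beta` is an
    assumed gain, not a value given in the text.
    """
    y = np.asarray(code, dtype=float).copy()
    for n in range(period, len(y)):
        y[n] += beta * y[n - period]
    return y


def periodize_repeat(code, period):
    """Repeat the first `period` samples of the code vector over its whole length."""
    x = np.asarray(code, dtype=float)
    repeats = -(-len(x) // period)  # ceiling division
    return np.tile(x[:period], repeats)[: len(x)]
```

The comb form keeps every pulse of the original code vector and adds decaying echoes one period later. The repetition form discards everything after the first period.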

[0008] Multipliers 2-4 and 2-5 multiply the candidate time-series vectors from adaptive codebook 2-1 and periodization unit 2-3 by the weights g_a and g_f, respectively, which are generated by weight generation unit 2-7. Adder 2-6 sums the results to form the excitation vector candidate c.

In the configuration of FIG. 11, adaptive codebook 2-1 may be omitted so that only fixed codebook 2-2 is used. When coding signals with little pitch periodicity, such as consonants or background noise, adaptive codebook 2-1 is often left out to save bits.
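The weighting and summation above amount to c = g_a·v_a + g_f·v_f. Here is a minimal NumPy sketch with a hypothetical function name. In a real coder the gains would be taken from the weight codebook rather than passed in freely.

```python
import numpy as np


def excitation_candidate(v_fixed, g_f, v_adaptive=None, g_a=0.0):
    """Excitation candidate c = g_a * v_a + g_f * v_f.

    Pass v_adaptive=None for the fixed-codebook-only configuration
    (e.g. for consonants or background noise).
    """
    c = g_f * np.asarray(v_fixed, dtype=float)
    if v_adaptive is not None:
        c = c + g_a * np.asarray(v_adaptive, dtype=float)
    return c
```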

[0009] Synthesis filter 1-6 in FIG. 10 is a linear filter whose coefficients are the output of linear prediction parameter decoding unit 1-4. It takes the excitation vector candidate c as input and outputs a reproduced-speech candidate y. The order of synthesis filter 1-6, that is, the order of the linear prediction analysis, is commonly about 10 to 16. As already noted, synthesis filter 1-6 may also be a nonlinear filter.
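A linear synthesis filter of this kind is usually the all-pole filter 1/A(z). The sketch below assumes the sign convention A(z) = 1 + Σ a_i z^-i, which the text does not state, and carries the filter memory between frames. The function name is hypothetical.

```python
import numpy as np


def synthesis_filter(excitation, a, memory=None):
    """All-pole LP synthesis filter 1/A(z).

    Convention assumed here: A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, so
    y[n] = c[n] - sum_{i=1..p} a[i-1] * y[n-i].
    `memory` holds the last p output samples (oldest first) so that the filter
    state can be carried from one frame to the next.
    Returns (y, new_memory).
    """
    p = len(a)
    history = list(memory) if memory is not None else [0.0] * p
    out = []
    for c in excitation:
        acc = float(c)
        for i in range(1, p + 1):
            acc -= a[i - 1] * history[-i]
        history.append(acc)
        out.append(acc)
    return np.array(out), np.array(history[-p:])
```

Passing the returned memory into the next call makes consecutive frames filter exactly as one continuous signal would.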

[0010] Distortion calculation unit 1-7 computes the distortion d between the input speech x and the reproduced-speech candidate y output by synthesis filter 1-6. This calculation often takes into account the coefficients of synthesis filter 1-6 or the unquantized linear prediction coefficients, for example through perceptual (auditory) weighting.

FIG. 12 shows an example functional configuration that computes the distortion with perceptual weighting. The weighting is implemented as perceptual weighting filters 3-2 and 3-3, which use either the unquantized linear prediction parameters or the quantized linear prediction filter coefficients. The reproduced-speech candidate y from synthesis filter 3-1 passes through perceptual weighting filter 3-2. The input speech passes through perceptual weighting filter 3-3. The distortion d is then computed between the two weighted signals.

Filters 3-2 and 3-3 are equivalent to a single filter placed after distance calculation unit 3-4. To reduce processing, however, they are often split into two filters placed before distance calculation unit 3-4, as shown in FIG. 12.
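The text does not give the form of the perceptual weighting filter. A common choice in CELP coders is W(z) = A(z/γ1)/A(z/γ2), and the sketch below assumes it with γ1 = 0.9 and γ2 = 0.6. It also assumes the same A(z) = 1 + Σ a_i z^-i convention as a standard LP synthesis filter. All names are hypothetical.

```python
import numpy as np


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i (leading 1 excluded)."""
    return [ai * gamma ** (i + 1) for i, ai in enumerate(a)]


def pole_zero_filter(x, num, den):
    """y[n] = x[n] + sum_i num[i-1] x[n-i] - sum_i den[i-1] y[n-i], zero initial state."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = x[n]
        for i, b in enumerate(num, start=1):
            if n - i >= 0:
                acc += b * x[n - i]
        for i, c in enumerate(den, start=1):
            if n - i >= 0:
                acc -= c * y[n - i]
        y[n] = acc
    return y


def weighted_distortion(x, y, a, gamma1=0.9, gamma2=0.6):
    """Squared error between perceptually weighted input x and candidate y."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    xw = pole_zero_filter(x, num, den)  # role of filter 3-3
    yw = pole_zero_filter(y, num, den)  # role of filter 3-2
    return float(np.sum((xw - yw) ** 2))
```

Because the weighting filter here is linear and starts from a zero state, weighting x and y separately gives the same distortion as weighting the difference x − y once. This is the equivalence the paragraph relies on, and it is easy to check numerically.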

[0011] Codebook search control unit 1-9 in FIG. 10 selects the excitation code that minimizes the distortion d between each reproduced-speech candidate y and the input speech x. This determines the excitation vector for that frame. When adaptive codebook 2-1, fixed codebook 2-2, and weight codebook 2-3 shown in FIG. 11 are used, the corresponding periodic code, fixed code, and weight code are selected, and together they form the excitation.
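The selection step can be illustrated as an exhaustive analysis-by-synthesis search. In this minimal sketch each candidate is filtered by the synthesis filter's impulse response (zero state) and given its least-squares optimal gain. The candidate with the smallest residual error is kept. Several things are assumed rather than taken from the text: the target is assumed to be already weighted with the zero-input response removed, and the function name is hypothetical.

```python
import numpy as np


def search_codebook(target, codebook, h):
    """Exhaustive analysis-by-synthesis search with an optimal gain per candidate.

    target: signal to match (assumed already perceptually weighted, with the
            filter's zero-input response removed).
    codebook: iterable of candidate excitation vectors.
    h: impulse response of the (weighted) synthesis filter.
    Returns (best_index, best_gain, best_error).
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    target_energy = float(np.dot(target, target))
    best = (None, 0.0, np.inf)
    for k, code in enumerate(codebook):
        y = np.convolve(code, h)[:n]  # zero-state filtering of the candidate
        energy = float(np.dot(y, y))
        if energy == 0.0:
            continue
        corr = float(np.dot(target, y))
        gain = corr / energy  # least-squares optimal gain
        err = target_energy - corr * corr / energy
        if err < best[2]:
            best = (k, gain, err)
    return best
```

In an actual coder the adaptive and fixed codebooks are usually searched one after the other, and the gains are then quantized with the weight codebook. This sketch shows only the core criterion.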

[0012] Codebook search control unit 1-9 determines the excitation codes: the periodic code, the fixed (noise) code, and the weight code. These codes, together with the linear prediction parameter code output by linear prediction parameter coding unit 1-3, are sent to code output unit 1-10. Depending on the application, they are either stored in a storage device or transmitted to the receiving side over a communication channel.

In other words, the codes represent the speech as follows:
- the short-term prediction component of each frame is coded as the linear prediction parameter code;
- the periodic component longer than a frame, found in the prediction residual of that short-term prediction, is coded as the periodic code;
- the remaining component is coded as the fixed (noise) code;
- the amplitudes of the periodic component and the remaining component are coded as the weight code.

[0013] FIG. 13 shows an example functional configuration of a decoding method corresponding to the above coding method. The codes are received from the transmission channel or storage medium. Linear prediction parameter decoding unit 4-2 decodes the linear prediction parameter code into synthesis filter coefficients and sends them to synthesis filter 4-4 and, if needed, to post-processing unit 4-5. The received excitation code is sent to excitation vector generation unit 4-3, which generates the corresponding excitation vector. Excitation vector generation unit 4-3 has the same configuration as excitation vector generation unit 1-5 of the coding method in FIG. 10.

Synthesis filter 4-4 takes the excitation vector as input and reproduces speech. Post-processing unit 4-5 applies processing that perceptually reduces the noisiness of the reproduced speech (also called post-filtering). It is often omitted, for example to reduce the amount of processing.

[0014] Algebraic Code-Excited Linear Prediction (ACELP) has been proposed as one excitation vector search method for CELP. It does not store the fixed codebook as frame-length vector patterns. Instead, it forms a fixed code vector by placing a few pulses of height 1 at suitable positions within a frame, for example four pulses in a frame or subframe of 80 samples. This excitation scheme, combined with a careful ordering of operations in the distortion calculation, needs less computation and memory than conventional methods.

Details of ACELP are given in, for example, R. Salami, C. Laflamme, and J-P. Adoul, "8 kbit/s ACELP Coding of Speech with 10 ms Speech-Frame: a Candidate for CCITT Standardization," IEEE Proc. ICASSP-94, pp. II-97.
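As a rough illustration of the algebraic codebook idea, the sketch below builds the four-pulse, 80-sample example mentioned above. The interleaved-track layout (one pulse per track, positions t, t+4, t+8, ...) and the optional signs are assumptions borrowed from common ACELP designs, not details given in this text. All names are hypothetical.

```python
import numpy as np

SUBFRAME = 80  # subframe length from the example above
N_TRACKS = 4   # hypothetical: one pulse per interleaved track


def track_positions(track, subframe=SUBFRAME, n_tracks=N_TRACKS):
    """Positions available to a pulse on `track`: track, track + n_tracks, ..."""
    return list(range(track, subframe, n_tracks))


def pulse_positions(indices, subframe=SUBFRAME, n_tracks=N_TRACKS):
    """Map one index per track to absolute sample positions."""
    return [track_positions(t, subframe, n_tracks)[i] for t, i in enumerate(indices)]


def algebraic_code_vector(indices, signs=None, subframe=SUBFRAME, n_tracks=N_TRACKS):
    """Fixed code vector with unit-height pulses (signs default to +1)."""
    positions = pulse_positions(indices, subframe, n_tracks)
    if signs is None:
        signs = [1] * len(positions)
    c = np.zeros(subframe)
    for p, s in zip(positions, signs):
        c[p] += s
    return c


def correlation_with_target(d, indices, signs):
    """<d, c> computed from the pulse positions only, without building c."""
    return sum(s * d[p] for p, s in zip(pulse_positions(indices), signs))
```

With 20 positions per track, each pulse index fits in 5 bits. The last function shows where the savings come from: correlations with a sparse code vector need only a handful of lookups instead of a full 80-sample inner product.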

[0015]

Problems to Be Solved by the Invention
The frequency components of human speech are generally concentrated below 7 kHz. Limiting the band to 3.4 kHz or below reduces the amount of information and makes prediction easier, which improves compression efficiency. However, part of the naturalness and speaker-individuality information is lost. With 3.4 kHz band speech as input, a relatively high S/N can therefore be achieved even at bit rates around 8 kbit/s. Still, the naturalness is already degraded at the level of the source signal compared with 7 kHz band speech, so speech with good naturalness cannot be reproduced.

[0016] Conversely, speech captured in the 7 kHz band is itself highly natural. It carries much more information, however, so high-quality speech is very difficult to achieve at low bit rates such as about 4 to 6 kbit/s.

Capturing 3.4 kHz band speech requires a sampling rate of about 8 kHz; with 16-bit quantization this gives 16 × 8000 = 128,000 bit/s. Capturing 7 kHz band speech requires a sampling rate of about 16 kHz; with 16-bit quantization this gives 16 × 16000 = 256,000 bit/s.

Because speech power is concentrated at low frequencies, the 3.4 to 7 kHz band can be represented with less information than the 0 to 3.4 kHz band. However, 7 kHz band speech contains high-frequency components that are hard to predict with both short-term (spectral) prediction and long-term (pitch) prediction. Reducing its bit rate to about the level used for 3.4 kHz band speech is therefore very difficult.
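The raw-rate arithmetic above can be checked with a short sketch. It also adds the 11.025 kHz sampling rate used later in the embodiment and the corresponding Nyquist frequency, assuming the same 16-bit quantization. The function names are illustrative only.

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample=16):
    """Raw PCM bit rate in bit/s."""
    return sampling_rate_hz * bits_per_sample


def nyquist_hz(sampling_rate_hz):
    """Highest frequency representable at a given sampling rate."""
    return sampling_rate_hz / 2


for fs in (8000, 11025, 16000):
    print(f"fs={fs} Hz: {pcm_bit_rate(fs)} bit/s, Nyquist {nyquist_hz(fs)} Hz")
```

A 5 kHz band fits below the 5512.5 Hz Nyquist limit of 11.025 kHz sampling. The raw input rate then falls between those of the telephone band and the 7 kHz band.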

[0017] Low-bit-rate speech coders with a variable bit rate can also run into problems when the bit rate is controlled frame by frame, because of the trade-off between coding efficiency and delay. For example, coders that support several bit-rate modes often impose, for quality reasons, the constraint that a lower bit rate uses a longer frame. In such schemes, changing the bit rate frame by frame during playback can cause dropouts in the sound because of the delay constraints. Alternatively, the overall delay can be matched to the mode with the longest delay. In that case, however, the modes with shorter delay cannot perform at their best.

[0018] The present invention was made in view of these drawbacks of the conventional methods. Its object is to provide a speech signal coding method, a decoding method, and an encoder and a decoder for them, with two properties. Compared with 7 kHz band speech coding, they achieve a far lower bit rate while limiting quality degradation. Compared with 3.4 kHz band speech coding, they reproduce high-quality speech that is far more natural at a comparable bit rate. A further object is to provide a speech signal coding method and an encoder that can change the bit rate continuously, frame by frame, without dropouts in the sound.

[0019]

Means for Solving the Problems
In the present invention, the frequency band of the input speech is specialized to about 5 kHz, that is, the 4.5 kHz to 5.5 kHz band. This balances the amount of information contained in the speech against coding efficiency. The speech can then be coded at a far lower bit rate than with conventional coding methods for 7 kHz band input. It can also be reproduced far more naturally, at a comparably low bit rate, than with coding methods for 3.4 kHz band input.

[0020] For example, when speech sampled at 11 kHz is input, the amount of information grows by a factor of 11/8 compared with 8 kHz sampling. One would therefore expect the number of fixed codebook pulses to need to grow by about 11/8 as well.

The speech model used here has three parts: the short-term prediction component, the long-period component in its prediction residual, and the remaining component. Of the information added by 11 kHz sampling, the present invention uses only the information that fits within this model. This raises the quantization efficiency of both the linear prediction parameters and the adaptive codebook. By exploiting the same property, the candidate pulse positions of the fixed codebook are made very sparse and the number of pulses is limited.

[0021] As a result, very high-quality speech can be reproduced with about the same number of pulses as for 8 kHz sampling. When the bit rate is lowered, the frame length is often increased to improve information efficiency, because fewer bits then go to sending linear prediction parameters. The delay then changes with the frame length, so the delay must be matched to the mode with the longest delay, and look-ahead must be performed accordingly.

[0022] In the present invention, the frame length and the delay are the same in all modes, and every mode makes full use of the information in the look-ahead portion. The loss of efficiency caused by supporting bit-rate switching is therefore kept to a minimum. In addition, the number of subframes and the fixed codebook can be switched while the adaptive codebook continues to be used without re-initialization. Previously transmitted information is thus used effectively, and the bit-rate mode can be switched frame by frame without dropouts in the sound.

Operation
7 kHz band speech contains much high-frequency information that does not fit the speech model used in CELP. 3.4 kHz band speech, on the other hand, loses through band limitation some of the information that does fit that model.

[0023] Speech in the 5 kHz band, by contrast, contains the necessary and sufficient information within the range of the above speech model and very little other information. It is therefore very well suited to coding with that model. Its bit rate can be made far lower than that of 7 kHz band speech. Compared with 3.4 kHz band coding at a comparable bit rate, it reproduces high-quality speech that is far more natural, thanks to the wider band.

[0024] Likewise, when a variable bit rate is implemented, taking 5 kHz band speech as input achieves higher quality than variable-bit-rate speech coding methods for 3.4 kHz band input.

[0025]

BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows an example functional configuration of a speech encoder according to the present invention. It differs from the conventional method in three respects:
- the input is speech with a bandwidth of about 5 kHz, sampled at 11.025 kHz;
- under bit-rate switching control, the subframe length and the fixed codebook are switched while the frame length stays the same;
- both the encoder and the decoder keep using the adaptive codebook without re-initialization.

FIG. 2 shows an example functional configuration of a decoder according to the present invention.

<Embodiment 1> As an example of the coding method according to the invention, this embodiment describes a coding scheme designed with the invention that has a frame length of 10 ms and a bit rate of 7.8 kbit/s.

[0026] Filter unit 5-1 low-pass filters the input speech signal and limits its band to 5 kHz or below. The analysis frame is 110 samples long, which corresponds to about 10 ms. Each frame has two subframes, and the look-ahead is one subframe (5 ms). FIG. 3 shows an example bit allocation.

Linear prediction analysis unit 5-2 performs 14th-order linear prediction analysis. The 14th-order LSP coefficients are then computed from the resulting linear prediction coefficients using the Levinson-Durbin algorithm. The order is set to 14 here to keep computation low, but a linear prediction order of about 14 to 20 is also possible.
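The Levinson-Durbin recursion is the standard way to get linear prediction coefficients from a frame's autocorrelation. The following is a minimal NumPy sketch of that step for the 14th-order, 110-sample, 11.025 kHz setting above. Windowing, lag windowing, and the conversion from LP coefficients to LSPs are omitted. The sign convention A(z) = 1 + Σ a_i z^-i and the function names are assumptions.

```python
import numpy as np

FS = 11025   # sampling rate (Hz) used in Embodiment 1
FRAME = 110  # analysis frame length (samples), about 10 ms
ORDER = 14   # linear prediction order


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order from autocorrelation r.

    Returns (a, err, reflection), where a[0] = 1 and err is the final
    prediction-error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    reflection = []
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # i-th reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        reflection.append(k)
    return a, err, reflection


print(f"frame duration: {FRAME / FS * 1000:.3f} ms")
```

All reflection coefficients having magnitude below 1 indicates a stable synthesis filter. This property is one reason LSPs, which preserve stability under quantization, are used for coding.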

[0027] Linear prediction parameter coding unit 5-3 encodes the linear prediction parameters by two-stage vector quantization with moving-average (MA) prediction. The bits are used as follows:
- 1 bit to switch the MA prediction mode;
- 7 bits for the first stage;
- for the second stage, which is split into a lower-order part and a higher-order part, 5 bits each.

For 3.4 kHz band speech, an analysis order of about 10 is typically used. For 5 kHz band speech, the linear prediction order is raised to 14, so the amount of information increases. Even so, the number of LSP vector quantization bits is kept about the same. This exploits a property of 5 kHz band speech signals: the linear prediction coefficients are correlated up to about 5 kHz, so they can be quantized with about the same number of bits as 3.4 kHz band speech. With 7 kHz band speech as input, by contrast, the high band contains uncorrelated components, and a similar number of bits is hard to achieve.

Because the LSP order is increased, a faithful representation would normally call for more code vectors in the LSP codebook, and hence more coding bits. In the present invention, however, the number of code vectors is kept about the same as in the LSP codebook used for 3.4 kHz band coding, so the number of coding bits is the same. For the same number of bits, the quantization distortion is therefore larger than in the 3.4 kHz band case. But because the linear prediction coefficients are correlated up to about 5 kHz, the increase in quantization distortion is relatively small. The quality gained from the wider band, that is, from the higher LSP order, outweighs that increase, has a large effect on the reproduced speech, and makes it sound natural.
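The quantizer structure above (1 mode bit, a 7-bit first stage, and a second stage split into two 5-bit halves) can be sketched as follows. This is a simplified illustration, not the patent's quantizer. The MA prediction here is a plain weighted sum of past quantized residuals. The codebooks in the usage test are random placeholders rather than trained tables. The predictor layout and all names are hypothetical.

```python
import numpy as np

ORDER = 14
BITS = {"mode": 1, "stage1": 7, "stage2_low": 5, "stage2_high": 5}  # 18 bits total


def nearest(codebook, target):
    """Index and entry of the codebook vector closest to target (squared error)."""
    idx = int(np.argmin(np.sum((codebook - target) ** 2, axis=1)))
    return idx, codebook[idx]


def encode_lsp(lsp, past_q_residuals, ma_coeffs, cb1, cb2_low, cb2_high):
    """Two-stage split VQ of an LSP vector with switched MA prediction (sketch).

    past_q_residuals: (K, ORDER) previously quantized residuals, newest first.
    ma_coeffs: (2, K, ORDER) MA predictor coefficients, one set per mode.
    Returns ((mode, i1, i2_low, i2_high), quantized_lsp, quantized_residual).
    """
    half = ORDER // 2
    best = None
    for mode in range(2):  # the 1-bit mode switch: try both predictors
        pred = np.sum(ma_coeffs[mode] * past_q_residuals, axis=0)
        target = lsp - pred
        i1, v1 = nearest(cb1, target)  # stage 1 (7 bits)
        r = target - v1
        i2l, v2l = nearest(cb2_low, r[:half])  # stage 2, lower orders (5 bits)
        i2h, v2h = nearest(cb2_high, r[half:])  # stage 2, higher orders (5 bits)
        q_res = v1 + np.concatenate([v2l, v2h])
        err = float(np.sum((target - q_res) ** 2))
        if best is None or err < best[0]:
            best = (err, (mode, i1, i2l, i2h), pred + q_res, q_res)
    _, indices, lsp_q, q_res = best
    return indices, lsp_q, q_res
```

The caller would push the returned quantized residual into its history so that the next frame's prediction can use it.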

[0028] In the excitation vector generation units 5-12 to 5-14 (Embodiment 1 uses only one of them), the excitation vector is generated by weighting the adaptive codebook vector and the fixed codebook vector and adding them. The adaptive codebook is searched by varying the point, counted from the start of the time-series data stored in it, up to which that data is cut out. The cut-out position is normally searched in two ways: an integer-precision search in units of one sample, and a 1/3-sample (triple-precision) search at each of the points that divide the interval between adjacent samples into three equal parts. With 11.025 kHz sampling, if 8 bits are assigned to the adaptive codebook index, for example, the adaptive codebook can be searched with 1/3-sample precision from 27 + 1/3 to 85 + 1/3 samples relative to the reference position, and with integer precision from 86 to 165 samples. For the second subframe, with int(T₁) denoting the integer part of the adaptive prediction value (pitch lag) T₁ obtained in the previous subframe, the adaptive prediction value is searched with 1/3-sample precision over [int(T₁) − 5 + 2/3, int(T₁) + 4 + 2/3] and represented with 5 bits. Alternatively, if 9 bits are assigned to the adaptive codebook index, the search may use 1/3-sample precision from 26 + 1/3 to 185 + 2/3 samples and integer precision from 186 to 220 samples.
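The lag ranges described above can be enumerated to check that they fit the stated index sizes. The Python sketch below is illustrative only: it assumes both endpoints of every range are included, and it reads the second-subframe window "int(T₁) − 5 + 2/3" as int(T₁) − (5 + 2/3), the convention used in G.729. Under those assumptions the 8-bit first-subframe grid has 255 candidates and the relative window has exactly 32 = 2^5. The function names are hypothetical.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)


def frange(start, stop, step):
    """Inclusive list of exact fractions from start to stop."""
    values = []
    x = start
    while x <= stop:
        values.append(x)
        x += step
    return values


def first_subframe_lags():
    # 1/3-sample precision from 27 + 1/3 to 85 + 1/3 samples (endpoints assumed inclusive)
    fractional = frange(27 + THIRD, 85 + THIRD, THIRD)
    # integer precision from 86 to 165 samples
    integer = [Fraction(n) for n in range(86, 166)]
    return fractional + integer


def relative_lags(t1_int):
    # Assumed window [int(T1) - (5 + 2/3), int(T1) + (4 + 2/3)] at 1/3-sample steps
    return frange(t1_int - 5 - 2 * THIRD, t1_int + 4 + 2 * THIRD, THIRD)


def index_bits(n_candidates):
    """Smallest number of bits that can address n_candidates entries."""
    return (n_candidates - 1).bit_length()


lags = first_subframe_lags()
print(len(lags), index_bits(len(lags)))      # 255 8
window = relative_lags(100)
print(len(window), index_bits(len(window)))  # 32 5
```

Exact fractions are used so that the 1/3-sample steps do not accumulate floating-point error at the range endpoints.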

[0029] With the adaptive codebook search range set in this way and the same number of bits assigned, the interval searched with 1/3-sample precision is shorter for 5 kHz band speech than for 3.4 kHz band input. However, by exploiting the property that the pitch periodicity of 5 kHz band speech resembles that of 3.4 kHz band speech, equal or higher quality can be obtained with the same number of bits.

[0030] Speech in the 3.4 kHz band is sampled at 8 kHz, whereas speech in the 5 kHz band is sampled at 11.025 kHz. Because the sampling interval is shorter at 11.025 kHz than at 8 kHz, the time resolution of the adaptive codebook search improves: 1/3-sample precision in the 5 kHz band is effectively equivalent to 4.13-times precision in the 3.4 kHz band (3 × 11.025 / 8 ≈ 4.13). This is another reason why quality improves when the input is in the 5 kHz band.
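The 4.13 figure follows directly from the ratio of the two sampling rates. A short Python check (the names are illustrative, not from the patent):

```python
FS_NARROW = 8000.0  # Hz, sampling rate for 3.4 kHz band speech
FS_WIDE = 11025.0   # Hz, sampling rate for 5 kHz band speech


def lag_step_us(fs, precision):
    """Time step of the lag grid in microseconds for a given sub-sample precision."""
    return 1e6 / (fs * precision)


def equivalent_precision(precision_wide):
    """Precision at 8 kHz giving the same time step as precision_wide at 11.025 kHz."""
    return precision_wide * FS_WIDE / FS_NARROW


print(round(lag_step_us(FS_NARROW, 3), 1))  # 41.7
print(round(lag_step_us(FS_WIDE, 3), 1))    # 30.2
print(round(equivalent_precision(3), 2))    # 4.13
```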

[0031] As shown above, taking speech in the 4.5 kHz to 5.5 kHz band as the input markedly improves the performance of the LSP codebook and the adaptive codebook, even with a bit allocation comparable to that of coding schemes for 3.4 kHz band speech. As shown in FIG. 3, the bit allocation of Embodiment 1 differs from that of G.729 (8 kbit/s), a 3.4 kHz band coder, only in that the weights (gains) use 7 bits and 7 bits, respectively. Embodiment 1 thus uses 2 bits fewer than G.729, yet, as shown later, its reproduced speech quality is higher.

[0032] This bit allocation also shows that the adaptive codebook uses 8 bits for the first subframe in both Embodiment 1 and G.729, but that subframe contains 55 samples in Embodiment 1 versus 40 samples in G.729. The number of bits allocated per sample is therefore 8/55 in Embodiment 1 and 8/40 in G.729, so Embodiment 1 uses fewer bits per sample. Allocating fewer bits per sample than in the 3.4 kHz band in this way is one feature of the present invention.
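The per-sample comparison can be written out exactly. This is a small illustrative Python check; the 55-sample figure corresponds to 5 ms at 11.025 kHz (55.125 samples, rounded down as in the text):

```python
from fractions import Fraction


def bits_per_sample(index_bits, subframe_samples):
    """Adaptive codebook index bits divided by subframe length."""
    return Fraction(index_bits, subframe_samples)


embodiment1 = bits_per_sample(8, 55)  # 5 ms subframe at 11.025 kHz
g729 = bits_per_sample(8, 40)         # 5 ms subframe at 8 kHz
print(round(float(embodiment1), 3), float(g729))  # 0.145 0.2
```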

[0033] Likewise, in the fixed codebook search, as with the adaptive codebook, the shorter sampling interval of 11.025 kHz sampling compared with 8 kHz sampling improves the time resolution of the search. Exploiting these properties, the pulses placed by the fixed codebook per 10 ms can be made very sparse, and the bit allocation of the fixed codebook can be reduced to about the same level as in the 3.4 kHz band. Even with this reduced fixed codebook allocation, the performance gains of the LSP codebook and the adaptive codebook far exceed the quality loss caused by using fewer fixed codebook pulses.

[0034] FIG. 4 shows an example pulse arrangement in which four pulses are allocated to a 5 ms subframe using 17 bits. Row i0 in FIG. 4 lists the positions at which pulse 0 can be placed, and the following rows likewise list the positions for pulses 1, 2, and 3. Four pulses, numbered 0 to 3, are placed in each subframe.

<Embodiment 2> Embodiment 2 of the encoding method of the present invention is described next. Embodiment 2 uses a 20 ms frame and a coding scheme whose bit rate can be switched among three levels: 3.95 kbit/s, 5.75 kbit/s, and 7.75 kbit/s.

[0035] The input speech signal is low-pass filtered in filter unit 5-1 and band-limited to 5 kHz or below. The analysis frame length is 220 samples, which corresponds to about 20 ms. The number of subframes is 2, 3, and 4 in the 3.95 kbit/s, 5.75 kbit/s, and 7.75 kbit/s modes, respectively. The look-ahead is about 7 ms, roughly one third of the frame length.
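The patent does not specify the design of filter unit 5-1. As a purely hypothetical illustration, a 5 kHz band limit at 11.025 kHz sampling could be realized with a windowed-sinc FIR low-pass filter; the tap count (101) and Hamming window below are assumptions chosen only for the sketch, and the cutoff sits close to the 5.5125 kHz Nyquist frequency, so a real design would need care with the transition band.

```python
import numpy as np

FS = 11025.0     # sampling rate for the 5 kHz band (Hz)
CUTOFF = 5000.0  # band limit applied by the filter unit (Hz)


def design_lowpass(num_taps=101, cutoff=CUTOFF, fs=FS):
    """Windowed-sinc FIR low-pass filter (Hamming window), normalized to unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    ideal = 2.0 * cutoff / fs * np.sinc(2.0 * cutoff / fs * n)
    taps = ideal * np.hamming(num_taps)
    return taps / np.sum(taps)


def magnitude_response(taps, freq_hz, fs=FS):
    """|H(f)| of the FIR filter evaluated directly from its DTFT."""
    n = np.arange(len(taps))
    return abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz * n / fs)))


def band_limit(x, taps):
    """Filter a signal, keeping its original length."""
    return np.convolve(x, taps, mode="same")


taps = design_lowpass()
print(magnitude_response(taps, 1000.0), magnitude_response(taps, 5450.0))
```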

[0036] FIG. 1 shows an example of a mode-switchable encoder, and FIG. 5 shows an example of the bit allocation in each bit-rate mode. The bit-rate mode can be set and switched independently for each frame. To tell the decoder which bit-rate mode a frame uses, 1 or 2 bits per frame are used, for example, in addition to the bits shown in FIG. 5: 3.95 kbit/s, 5.75 kbit/s, and 7.75 kbit/s are represented by 0, 10, and 11, respectively. The total bit rates are then 4 kbit/s, 5.85 kbit/s, and 7.85 kbit/s. When IP packets or the like are used for transmission, the bit-rate mode of a frame can be inferred indirectly from the packet size, so no mode information needs to be sent. The bit-rate mode is switched by the user, or the layer above the encoder specifies which mode to use, for example according to the congestion state of the network.
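The rate arithmetic and the prefix-free mode flags above can be sketched in Python. This assumes a nominal 20 ms frame, so each payload is simply rate × 20 ms; the dictionary and function names are hypothetical.

```python
FRAME_MS = 20                                    # nominal frame length in Embodiment 2
MODE_CODE = {3950: "0", 5750: "10", 7750: "11"}  # rate in bit/s -> mode flag


def payload_bits(rate_bps):
    """Coded bits per frame, excluding the mode flag."""
    return rate_bps * FRAME_MS // 1000


def total_rate_bps(rate_bps):
    """Bit rate including the 1- or 2-bit mode flag."""
    bits = payload_bits(rate_bps) + len(MODE_CODE[rate_bps])
    return bits * 1000 // FRAME_MS


def read_mode(bitstring):
    """Decode the mode flag at the start of a frame; returns (rate, remaining bits)."""
    for rate, code in MODE_CODE.items():  # codes are prefix-free, so order is irrelevant
        if bitstring.startswith(code):
            return rate, bitstring[len(code):]
    raise ValueError("frame is empty")


print([total_rate_bps(r) for r in (3950, 5750, 7750)])  # [4000, 5850, 7850]
```

Because "0", "10", and "11" form a prefix code, the decoder can always tell where the flag ends without any separator.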

[0037] FIGS. 6, 7, and 8 show the fixed codebook pulse arrangements in the individual bit-rate modes. Linear prediction analysis unit 5-2 performs 14th-order linear prediction analysis, and 14th-order LSP coefficients are computed from the linear prediction coefficients obtained with the Levinson-Durbin algorithm. The 14th order was chosen here as a trade-off against computational load, but linear prediction orders from about 14 to 20 can be used.
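For reference, a textbook Levinson-Durbin recursion at order 14 is sketched below in Python. It is a minimal sketch, not the patent's implementation: it uses a plain autocorrelation without the analysis window or lag window a real codec would apply, and it stops at the LP coefficients; the subsequent LP-to-LSP conversion is not shown.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of one analysis frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; returns (a, prediction error)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient of stage i
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k             # residual energy shrinks at each stage
    return a, err


frame = np.random.default_rng(0).normal(size=220)  # stand-in for one 220-sample frame
a, err = levinson_durbin(autocorrelation(frame, 14), 14)
```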

[0038] Linear prediction parameter coding unit 5-3 encodes the linear prediction parameters by two-stage vector quantization with moving-average (MA) prediction. Here, 1 bit is used to switch the MA prediction mode and 8 bits for the first stage; the second stage is split into a lower-order part and a higher-order part, each using 6 bits. For 3.4 kHz band speech, an analysis order of about 10 is typically used. For 5 kHz band speech, by contrast, the linear prediction analysis order is raised to 14, so the amount of information increases. Even so, the number of LSP vector quantization bits is kept at about the same level. This exploits a property of 5 kHz band speech signals: the linear prediction coefficients remain correlated up to about 5 kHz, so they can be quantized with about the same number of bits as 3.4 kHz band speech. When 7 kHz band speech is input, on the other hand, the high band contains uncorrelated components, which makes it difficult to keep the number of bits at the same level.
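The bit layout described above (1 + 8 + 6 + 6 = 21 bits) can be illustrated with a toy two-stage split vector quantizer in Python. Everything here beyond the bit counts is an assumption: the codebooks are random stand-ins for trained ones, the 7/7 split of the second stage is hypothetical, and the target is assumed to be the LSP vector after mean removal and MA prediction, which are not implemented; the MA mode bit is only carried in the packed word.

```python
import numpy as np

ORDER = 14  # LSP order used for the 5 kHz band
SPLIT = 7   # hypothetical split between the low-order and high-order parts

_rng = np.random.default_rng(0)
# Random stand-ins for trained codebooks (hypothetical values).
STAGE1 = _rng.normal(0.0, 0.1, size=(256, ORDER))               # 8 bits
STAGE2_LOW = _rng.normal(0.0, 0.03, size=(64, SPLIT))           # 6 bits
STAGE2_HIGH = _rng.normal(0.0, 0.03, size=(64, ORDER - SPLIT))  # 6 bits
STAGE2_LOW[0] = 0.0   # a zero entry guarantees stage 2 never increases the error
STAGE2_HIGH[0] = 0.0


def nearest(codebook, target):
    """Index of the codebook row closest to target in squared error."""
    return int(np.argmin(np.sum((codebook - target) ** 2, axis=1)))


def encode(target, ma_mode):
    """Quantize a 14-dim target and pack 1 + 8 + 6 + 6 = 21 bits into one integer."""
    i1 = nearest(STAGE1, target)
    residual = target - STAGE1[i1]
    i2_low = nearest(STAGE2_LOW, residual[:SPLIT])
    i2_high = nearest(STAGE2_HIGH, residual[SPLIT:])
    return (ma_mode << 20) | (i1 << 12) | (i2_low << 6) | i2_high


def decode(word):
    """Rebuild the quantized target from the packed indices (MA mode bit ignored)."""
    i1 = (word >> 12) & 0xFF
    i2_low = (word >> 6) & 0x3F
    i2_high = word & 0x3F
    return STAGE1[i1] + np.concatenate([STAGE2_LOW[i2_low], STAGE2_HIGH[i2_high]])
```

Searching the stages sequentially rather than jointly keeps the cost at 256 + 64 + 64 distance computations per frame, at some loss in quantization accuracy.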

[0039] In excitation vector control unit 5-5, the adaptive codebooks of all the excitation vector generation units 1 to n (5-12 to 5-14) used in the various modes store the most recent excitation vectors in common, and the past adaptive codebook vectors are used as they are, without reinitialization, even when the bit-rate mode is switched. This makes effective use of the adaptive codebook information transmitted in the past. Here, a separate excitation vector generation unit is provided for each of the three bit-rate modes, 3.95 kbit/s, 5.75 kbit/s, and 7.75 kbit/s, with 2, 3, and 4 subframes, respectively.

[0040] In the 7.75 kbit/s mode, 9 bits are assigned to the adaptive codebook index for the first and third subframes. In the second and fourth subframes, the adaptive prediction value is searched with 1/3-sample precision over [int(T₁) − 5 + 2/3, int(T₁) + 4 + 2/3], where int(T₁) is the integer part of the adaptive prediction value obtained in the previous subframe, and is represented with 5 bits. In the other bit-rate modes, 9 bits are assigned to the first subframe and 5 bits to each remaining subframe.
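Totaling the adaptive codebook bits per 20 ms frame for each mode makes the allocation concrete. A small Python tally (the helper name is illustrative):

```python
def adaptive_codebook_bits(num_subframes, full_search_subframes):
    """9 bits for full-range subframes, 5 bits for relative-search subframes."""
    return sum(9 if i in full_search_subframes else 5 for i in range(num_subframes))


ADAPTIVE_BITS = {
    7750: adaptive_codebook_bits(4, {0, 2}),  # 1st and 3rd subframes use 9 bits
    5750: adaptive_codebook_bits(3, {0}),
    3950: adaptive_codebook_bits(2, {0}),
}
print(ADAPTIVE_BITS)  # {7750: 28, 5750: 19, 3950: 14}
```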

[0041] Because this exploits the property that pitch prediction of speech is relatively easy up to about 5 kHz, the adaptive codebook component can be represented with about the same number of bits as for 3.4 kHz band speech. A further reason for the quality improvement is that, because the input signal is sampled at 11.025 kHz, the time resolution of adaptive prediction is higher than with an 8 kHz-sampled signal.

[0042] Here too, as in Embodiment 1, taking speech in the 4.5 kHz to 5.5 kHz band as the input markedly improves the performance of the LSP codebook and the adaptive codebook, even with a bit allocation comparable to that of coding schemes for 3.4 kHz band speech. In the fixed codebook search as well, as with the adaptive codebook, the shorter sampling interval of 11.025 kHz sampling compared with 8 kHz sampling improves the time resolution of the search.

[0043] Exploiting these properties, the pulses placed by the fixed codebook per subframe can be made very sparse, and the bit allocation of the fixed codebook can be reduced to about the same level as in the 3.4 kHz band. In excitation vector control unit 5-5, excitation vector switching unit 5-11 changes the bit rate by switching which excitation vector is used. None of the excitation vector generation units 1 to n (5-12 to 5-14) reinitializes its excitation vector buffer when the bit-rate mode is switched; each searches the adaptive codebook using the excitation vector from the previous subframe. As a result, even when the bit-rate mode changes from moment to moment, the bit rate can be switched frame by frame without interrupting the sound.
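The shared, never-reset excitation history can be sketched in Python as follows. This is a simplified illustration under several assumptions: only integer lags are handled (no fractional interpolation), the fixed codebook vector is a toy single pulse, and the split of the 220-sample frame into 74/73/73 samples for the 3-subframe mode is hypothetical, since 220 is not divisible by 3. All class and function names are hypothetical.

```python
import numpy as np

SUBFRAMES = {3950: 2, 5750: 3, 7750: 4}  # subframes per 220-sample frame
FRAME_SAMPLES = 220


class SharedExcitationHistory:
    """Past excitation shared by every rate mode; never cleared on a mode switch."""

    def __init__(self, length=221):
        self.buffer = np.zeros(length)

    def adaptive_vector(self, lag, size):
        """Integer-lag adaptive codebook vector: past excitation repeated with period lag."""
        out = np.empty(size)
        start = len(self.buffer) - lag
        for n in range(size):
            out[n] = self.buffer[start + n] if n < lag else out[n - lag]
        return out

    def push(self, excitation):
        """Append the new excitation and drop the oldest samples."""
        self.buffer = np.concatenate([self.buffer[len(excitation):], excitation])


def subframe_sizes(rate_bps):
    # Hypothetical split; np.array_split gives 74/73/73 for the 3-subframe mode.
    return [len(p) for p in np.array_split(np.arange(FRAME_SAMPLES), SUBFRAMES[rate_bps])]


def excite_frame(history, rate_bps, lag, g_pitch, g_fixed):
    """Build one frame of excitation u = g_pitch * v + g_fixed * c, subframe by subframe."""
    frame = []
    for size in subframe_sizes(rate_bps):
        v = history.adaptive_vector(lag, size)
        c = np.zeros(size)
        c[0] = 1.0  # toy single-pulse fixed codebook vector
        u = g_pitch * v + g_fixed * c
        history.push(u)
        frame.append(u)
    return np.concatenate(frame)


history = SharedExcitationHistory()
excite_frame(history, 7750, lag=55, g_pitch=0.5, g_fixed=1.0)
excite_frame(history, 3950, lag=55, g_pitch=0.5, g_fixed=1.0)  # switch: history kept
```

Because every mode reads from and writes to the same buffer, the first subframe after a switch already has a valid pitch history, which is what lets the bit rate change without an audible gap.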

[0044] When the bit-rate mode is switched, the same LSP codebook is used in all bit-rate modes. For the weight codebook, which represents the gains applied to the fixed and adaptive codebooks, only the MA (moving-average) prediction coefficients for the fixed codebook and the weight bias (mean value) are switched; the weight codebook itself is the same in every mode. Alternatively, a separate weight codebook can be designed for each bit-rate mode and switched. The weight codebook can be shared by the excitation vector generation units 5-12 to 5-14 regardless of the subframe length, that is, regardless of bit-rate mode switching, because the magnitude ratios between the adaptive codebook code vector, the fixed codebook code vector, and the excitation vector candidate stay similar even when the subframe length changes.
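A sketch of a shared gain codebook with mode-dependent MA coefficients and bias might look like the Python below. It assumes a G.729-style scheme in which the fixed codebook gain is predicted in the log-energy domain from past prediction errors; all coefficient, bias, and codebook values are hypothetical placeholders, not values from the patent.

```python
import numpy as np

# Hypothetical per-mode MA coefficients and bias (dB); only these change with the mode.
MODE_GAIN_PARAMS = {
    3950: (np.array([0.55, 0.45, 0.30, 0.15]), 30.0),
    5750: (np.array([0.60, 0.50, 0.32, 0.17]), 31.0),
    7750: (np.array([0.68, 0.58, 0.34, 0.19]), 32.0),
}
# One codebook of (adaptive gain, fixed-gain correction) pairs shared by all modes.
SHARED_GAIN_CODEBOOK = np.array([[0.2, 0.8], [0.5, 1.0], [0.8, 1.2], [1.0, 0.6]])


def predicted_fixed_gain(rate_bps, past_errors_db, code_vector):
    """MA-predicted fixed codebook gain from per-sample code vector energy."""
    coeffs, bias_db = MODE_GAIN_PARAMS[rate_bps]
    energy_db = 10.0 * np.log10(np.mean(code_vector ** 2))
    predicted_db = bias_db + np.dot(coeffs, past_errors_db)
    return 10.0 ** ((predicted_db - energy_db) / 20.0)


def decode_gains(rate_bps, index, past_errors_db, code_vector):
    """Look up the shared codebook entry and apply the mode-specific prediction."""
    g_pitch, correction = SHARED_GAIN_CODEBOOK[index]
    return g_pitch, correction * predicted_fixed_gain(rate_bps, past_errors_db, code_vector)
```

Normalizing by the mean (per-sample) energy rather than the total energy is what keeps the predicted gain independent of the subframe length, in line with the reason given above for sharing the weight codebook across modes.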

[0045] With the bit-rate switching method of the present invention, bit-rate modes can be built from fixed codebooks of various types, provided that the adaptive codebook can be shared. For example, bit-rate modes may be built from the fixed codebooks of Dual-Pulse CS-CELP, PSI-CELP, MP (Multi-Pulse) CELP, or other CELP-based schemes. In that case too, previously transmitted information can be used effectively, and the bit-rate mode can be switched without interrupting the sound.

[0046] In Embodiment 2, the look-ahead is about 7 ms, about one third of the frame length, regardless of the bit-rate mode. Consequently, although switching the bit-rate mode changes the subframe length to about 5 ms, about 7 ms, or about 10 ms, the delay until the code is obtained is always a constant 1 frame + 7 ms. The LSP coefficients for the intervening subframes are interpolated from the analysis results for the first subframe of adjacent frames, so the LSP quantization code for each frame would normally become available only after a delay of 1 frame + 1 subframe. Instead, the LSP analysis result obtained with the 7 ms look-ahead is always used as the analysis result for the look-ahead subframe when modes are switched. The delay until the coded result is obtained is therefore constant and independent of the mode, so communication is not temporarily interrupted, and the delay does not need to be increased.
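The fixed-delay argument can be checked with the nominal numbers of Embodiment 2 (220-sample frames at 11.025 kHz, a 7 ms look-ahead, and 2, 3, or 4 subframes). This Python sketch counts only frame buffering plus look-ahead and ignores processing time; the names are illustrative.

```python
FS = 11025.0
FRAME_SAMPLES = 220
LOOKAHEAD_MS = 7.0                       # fixed look-ahead in Embodiment 2
SUBFRAMES = {3950: 2, 5750: 3, 7750: 4}


def frame_ms():
    return 1000.0 * FRAME_SAMPLES / FS


def subframe_ms(rate_bps):
    return frame_ms() / SUBFRAMES[rate_bps]


def delay_ms(rate_bps, fixed_lookahead=True):
    """One frame plus either the fixed 7 ms look-ahead or one subframe of look-ahead."""
    lookahead = LOOKAHEAD_MS if fixed_lookahead else subframe_ms(rate_bps)
    return frame_ms() + lookahead


print({m: round(delay_ms(m), 1) for m in SUBFRAMES})         # {3950: 27.0, 5750: 27.0, 7750: 27.0}
print({m: round(delay_ms(m, False), 1) for m in SUBFRAMES})  # {3950: 29.9, 5750: 26.6, 7750: 24.9}
```

With a one-subframe look-ahead the delay would vary by about 5 ms from mode to mode; the fixed 7 ms look-ahead keeps it at one value, shorter than in the 2-subframe mode and slightly longer than in the 4-subframe mode.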

[0047] As shown in FIG. 2, the functional configuration of the decoding method of the present invention is almost the same as that of the conventional method shown in FIG. 13. When the bit-rate mode is switched, however, a plurality of excitation vector generation units 5-12 to 5-14, one for each mode, are configured identically to those of the encoder. The codebook used to decode the code (index) of the short-term prediction component, which is the LSP codebook in the embodiments, has a larger quantization distortion than the codebook used to decode a code (index) with the same number of bits in the 3.4 kHz band. In other words, for a short-term prediction index with the same number of bits, the LSP order of this 5 kHz band LSP codebook is higher than that of a 3.4 kHz band LSP codebook.

[0048] Further, in the present invention, the adaptive codebook used to decode the code (index) of the periodic component is made periodic with about the same time precision as the adaptive codebook used to decode a periodic component code (index) with the same number of bits in the 3.4 kHz band. In other words, as described in the embodiments of the encoding method, for the same number of index bits, the number of bits in the periodic component code (index) divided by the number of samples in one adaptive codebook code vector is smaller in the 5 kHz band of the present invention than in the 3.4 kHz band.

[0049] The encoding method of the present invention limits the speech band to about 5 kHz. If the band is made narrower than 4.5 kHz, the naturalness of the original sound degrades to a level not much different from the 3.4 kHz band, and reproduced speech with excellent naturalness cannot be obtained. If the band is made wider than 5.5 kHz, components other than periodicity, which is the characteristic feature of speech, are included, and performance degrades markedly unless the number of bits for both the LSP codebook and the adaptive codebook is increased considerably. In short, a band of 4.5 kHz to 5.5 kHz makes the naturalness of the original sound markedly higher than in the 3.4 kHz band, while the periodicity characteristic of speech allows the short-term prediction component and the periodic component each to be encoded at high quality with a bit allocation comparable to that of the 3.4 kHz band.

【００５０】[0050]

[Effects of the Invention] To demonstrate the effects of the present invention, a MOS evaluation test was conducted using the coding schemes described in the embodiments. The following conditions were evaluated. For speech signals with bandwidths of 7 kHz, 5 kHz, and 3.4 kHz: the original speech, and MNRU (speech with amplitude-correlated noise added) at 40 dB, 30 dB, 20 dB, and 10 dB.

[0051] As 3.4 kHz band speech coders: G.723.1 (6.3 kbit/s, 5.3 kbit/s) and G.729 (8 kbit/s). As 7 kHz band coders: G.722 (64 kbit/s, 56 kbit/s, 48 kbit/s). As reference values for feeding 11 kHz-sampled speech to existing 3.4 kHz band coders: G.723.1Base11kHz at 7.29 kbit/s and 8.68 kbit/s, obtained by feeding 11.025 kHz-sampled speech to the 5.3 kbit/s and 6.3 kbit/s modes of G.723.1.

[0052] As 5 kHz band coders according to the present invention: the 7.8 kbit/s coder of Embodiment 1 and the 5.75 kbit/s coder of Embodiment 2. For each of the above, a MOS test was conducted with 14 evaluation utterances from male and female speakers combined and 16 listeners, using a five-grade absolute rating. The test results are shown in FIG. 9.

[0053] The results show that G.723.1 at 5.3 kbit/s and 6.3 kbit/s, with 8 kHz-sampled 3.4 kHz band speech as input, obtained MOS values of 2.7589 and 2.8884, respectively, and that G.729 obtained 3.2054. G.723.1Base11kHz, in which 11 kHz-sampled speech was fed to a coder designed for 8 kHz sampling, obtained 2.9464 at 7.29 kbit/s and 3.0402 at 8.68 kbit/s.

[0054] In contrast, the 7.8 kbit/s coder of Embodiment 1, which uses the present invention, obtained a MOS value of 3.442, and the 5.75 kbit/s mode of Embodiment 2 obtained 3.2902. These results show that the present invention, whose band is specialized to about 5 kHz, achieves significantly higher MOS values than the existing 3.4 kHz band coders, and than existing 3.4 kHz band coders simply fed an 11.025 kHz-sampled signal.

[0055] By specializing the input speech band to about 5 kHz, the present invention can encode at an overwhelmingly lower bit rate than conventional coding schemes for 7 kHz band input speech, and, compared with coding schemes for 3.4 kHz band input speech, can reproduce overwhelmingly more natural speech at a comparably low bit rate. Moreover, because the input speech band is set to about 5 kHz, wider than the 3.4 kHz band, and the performance of the linear prediction quantization codebook and the adaptive codebook is improved as described above, the bit rate can be made variable while maintaining relatively high quality.

[Brief description of drawings]

FIG. 1 is a diagram showing an example functional configuration of the encoding method according to the present invention.

FIG. 2 is a diagram showing an example functional configuration of the decoding method according to the present invention.

FIG. 3 is a diagram showing an example of the bit allocation in the 7.8 kbit/s encoder of Embodiment 1.

FIG. 4 is a diagram showing an example of the pulse arrangement in the 7.8 kbit/s encoder of Embodiment 1.

FIG. 5 is a diagram showing an example of the bit allocation in each bit-rate mode of the encoder of Embodiment 2.

FIG. 6 is a diagram showing an example of the pulse arrangement in the 7.75 kbit/s mode of the encoder of Embodiment 2.

FIG. 7 is a diagram showing an example of the pulse arrangement in the 5.75 kbit/s mode of the encoder of Embodiment 2.

FIG. 8 is a diagram showing an example of the pulse arrangement in the 3.95 kbit/s mode of the encoder of Embodiment 2.

FIG. 9 is a diagram showing the results of the MOS evaluation test.

FIG. 10 is a diagram showing the functional configuration of a conventional encoder.

FIG. 11 is a diagram showing a conventional excitation vector generation unit.

FIG. 12 is a diagram showing the configuration of a conventional synthesis distortion calculation method.

FIG. 13 is a diagram showing the functional configuration of a conventional decoder.

Continuation of the front page. (56) References cited: JP-A-7-36495 (JP, A); JP-A-6-153119 (JP, A); JP-A-5-199071 (JP, A); JP-A-6-124100 (JP, A). (58) Fields searched (Int.Cl.⁷, DB name): G10L 19/00, G10L 19/04.

Claims

(57) [Claims]

1. A speech signal encoding method that divides an input speech signal, for each frame, into a short-term prediction component and its prediction residual component, and encodes the prediction residual component using an adaptive codebook and a fixed codebook, characterized in that the input speech signal is limited to a band of about 5 kHz; the short-term prediction component, when encoded with the same bit allocation, has a larger quantization distortion than in encoding in the 3.4 kHz band; and the encoding using the adaptive codebook allocates fewer bits per sample than encoding in the 3.4 kHz band.

2. The speech signal encoding method according to claim 1, characterized in that the bit rate is made variable by switching the fixed codebook.

3. The speech signal encoding method according to claim 2, characterized in that, when the bit rate is switched, the adaptive codebook is not reinitialized and the adaptive codebook from immediately before the switch is used.

4. The speech signal encoding method according to claim 2 or 3, characterized in that the frame length is constant at all bit rates and the number of subframes per frame is changed according to the bit rate.

5. A speech signal decoding method for decoding a code produced by a method that divides a speech signal, for each frame, into a short-term prediction component and its prediction residual component and encodes the prediction residual component separately as a periodic component longer than a frame and the remaining component, characterized in that the code of the short-term prediction component is decoded using a codebook with a larger quantization distortion than the codebook used to decode a code with the same number of bits in the 3.4 kHz band, and the code of the periodic component is decoded using an adaptive codebook made periodic with about the same time precision as that used for a code with the same number of bits in the 3.4 kHz band.

6. A speech signal encoder that divides an input speech signal, for each frame, into a short-term prediction component and its prediction residual component and encodes the prediction residual component using an adaptive codebook and a fixed codebook, the encoder comprising a filter unit that limits the input speech signal to a band of about 5 kHz, characterized in that, when the same number of bits is allocated, the linear prediction analysis order of the short-term prediction component is higher than in encoding in the 3.4 kHz band, and the encoding using the adaptive codebook allocates fewer bits per sample than encoding in the 3.4 kHz band.

7. The speech signal encoder according to claim 6, characterized by comprising a plurality of fixed codebooks that store different numbers of fixed code vectors, and a vector switching unit that switches among these fixed codebooks for encoding according to a bit rate control signal.

8. The speech signal encoder according to claim 7, characterized in that, when the fixed codebook is switched by the bit rate control signal, the adaptive codebook is not reinitialized and the adaptive codebook from immediately before the switch is used.

9. The speech signal encoder according to claim 7, characterized in that the frame length is constant for all bit rates and the number of subframes per frame increases as the bit rate increases.

10. A speech signal decoder for a code produced by a method that divides a speech signal, for each frame, into a short-term prediction component and its prediction residual component and encodes the prediction residual component separately as a periodic component longer than a frame and the remaining component, characterized in that the code of the short-term prediction component is decoded using a codebook with a higher linear prediction order than the codebook used to decode a code with the same number of bits in the 3.4 kHz band, and the code of the periodic component is decoded using an adaptive codebook made periodic with about the same time precision as that used to decode a code with the same number of bits in the 3.4 kHz band.