JP2002366195A

JP2002366195A - Method and device for encoding voice and parameter

Info

Publication number: JP2002366195A
Application number: JP2001167913A
Authority: JP
Inventors: Teruo Fumoto; 照夫麓; Seiji Sasaki; 佐々木誠司
Original assignee: YRP Advanced Mobile Communication Systems Research Laboratories Co Ltd
Current assignee: YRP Advanced Mobile Communication Systems Research Laboratories Co Ltd
Priority date: 2001-06-04
Filing date: 2001-06-04
Publication date: 2002-12-20
Anticipated expiration: 2021-06-04
Also published as: JP3472279B2

Abstract

PROBLEM TO BE SOLVED: To encode the speech pitch, the voiced/unvoiced information, and the harmonic spectrum amplitudes, which are the coding parameters of an analysis-synthesis speech coder, with a small number of bits. SOLUTION: The speech pitch is encoded by switching between two kinds of frame. In the first kind, the logarithmic pitch is quantized by either uniform or differential quantization, whichever is selected. In the second kind, it is quantized by the index of an inter-frame interpolated pitch candidate. The voiced/unvoiced information is encoded by switching between frames quantized by the index of a pattern selected from a set of representative voiced/unvoiced patterns and frames in which no voiced/unvoiced information is transmitted. The harmonic spectrum amplitude sequence is modeled by an autoregressive linear prediction model fitted to the spectrum. The LSP coefficients are encoded by switching between vector-quantized frames and frames quantized by an interpolation candidate index. The LPC gain is first corrected for the change in speech power caused by LSP quantization. It is then quantized by switching between frames in which the logarithmic gain is uniformly quantized and frames in which it is quantized either differentially or by an interpolation candidate index.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a speech coding parameter encoding method and apparatus. A speech signal is digitized, speech coding parameters representing its characteristics are obtained at predetermined time intervals, and the obtained parameters are encoded. The encoded parameters are then transmitted or stored. When needed, they are restored at the transmission destination or from storage, and speech is synthesized from the restored parameters. The invention is well suited to digital cellular phones, digital voice storage devices, and similar equipment that convey speech in this way.

[0002]

[Description of the Related Art] Digitized speech signals lend themselves to many kinds of digital signal processing, such as data compression, error handling, and multiplexing. For this reason they are widely used in fixed and mobile telephony and also in multimedia systems that handle speech.

Digitizing an analog speech signal generally requires sampling at a rate of at least twice the input speech bandwidth. It also requires quantizing with a step size fine enough to be inaudible. As a result, the digital signal needs a wider transmission bandwidth than the analog one. Digitized speech is therefore compressed by various coding and modulation schemes, chosen according to the required speech quality.

To obtain high compression ratios, two approaches have been studied together: analysis-synthesis speech coding schemes that actively exploit the characteristics of speech, and methods for efficiently quantizing the speech coding parameters they produce.
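The bandwidth argument above can be made concrete with a little arithmetic. The text states only the "at least twice the bandwidth" rule. The 4 kHz bandwidth and 16-bit sample size below are assumed, typical telephone-band figures, and the 4.15 kbps comparison rate is the IMBE figure quoted in [0003].

```python
# Hypothetical telephone-band figures; the text states only the "at least twice" rule.
BANDWIDTH_HZ = 4000                   # assumed input speech bandwidth
SAMPLE_RATE_HZ = 2 * BANDWIDTH_HZ     # minimum rate under the rule: 8000 samples/s
BITS_PER_SAMPLE = 16                  # assumed linear PCM word length

pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # uncompressed bit rate
imbe_bps = 4150                             # IMBE coding rate quoted in [0003]
compression = pcm_bps / imbe_bps

print(pcm_bps, round(compression, 1))  # 128000 30.8
```

Under these assumptions, plain PCM needs about 31 times the bit rate of the analysis-synthesis coder.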

[0003] For example, the MBE (Multi-Band Excitation) and IMBE (Improved Multi-Band Excitation) schemes, used in some satellite mobile phones, are analysis-synthesis speech coding schemes of this kind. Every 20 ms, a segment of predetermined length is taken from the speech to form a frame.

For each frame, the speech coding parameters are:
- the speech pitch (or its reciprocal, the fundamental frequency);
- the harmonic spectrum amplitude sequence obtained from the frame's frequency spectrum;
- voiced/unvoiced information (V/UV information) for each of the frequency bands into which the spectrum is divided.

These parameters are quantized as follows in each frame:
- The pitch is uniformly quantized with 8 bits.
- The V/UV information v[k] of each band (k is the band number) is quantized as a binary 0/1 value, using K bits in total. K is the number of bands, so this field has a variable length of up to 12 bits.
- The harmonic amplitude sequence is coded by arranging its inter-frame prediction residuals in two dimensions, applying a DCT (discrete cosine transform), and quantizing the DCT coefficients with (75 − K) bits.

This yields a speech coding rate of 4.15 kbps.
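The 4.15 kbps figure follows directly from this allocation. The (75 − K) amplitude bits absorb whatever the K V/UV bits use, so every frame carries 8 + K + (75 − K) = 83 bits. At 83 bits every 20 ms, the rate is 4.15 kbps. The sketch below only checks that arithmetic, and its function names are illustrative.

```python
FRAME_PERIOD_MS = 20  # frame update period stated in the text

def frame_bits(num_bands: int) -> int:
    """Bits per frame under the allocation described above (K = num_bands)."""
    if not 1 <= num_bands <= 12:
        raise ValueError("K must be between 1 and 12")
    pitch_bits = 8                   # uniform pitch quantizer
    vuv_bits = num_bands             # one voiced/unvoiced bit per band
    amplitude_bits = 75 - num_bands  # DCT coefficients of the prediction residuals
    return pitch_bits + vuv_bits + amplitude_bits

def bit_rate_bps(num_bands: int) -> float:
    return frame_bits(num_bands) * 1000 / FRAME_PERIOD_MS

for k in (6, 9, 12):
    print(k, frame_bits(k), bit_rate_bps(k))  # 83 bits and 4150.0 bit/s for every K
```

The total is independent of K: the number of bands changes only how the 83 bits are split between V/UV flags and amplitude data.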

[0004] FIG. 8 shows the configuration of a general speech coding transmission apparatus.

On the encoding side, a sampled and quantized digital speech signal enters at the speech input terminal 301. The speech coding parameter extraction unit 302 takes segments of a predetermined length at a predetermined period to form frames, and extracts speech coding parameters for each frame. Which parameters are extracted depends on the speech coding scheme. In the MBE scheme described above, for example, they are the speech pitch, the harmonic spectrum amplitude sequence, and the V/UV information of each frequency band. The parameter encoding unit 303 encodes the extracted parameters efficiently to reduce the amount of code, and sends them to the transmission path 305 via the transmission unit 304.

On the decoding side, the parameter decoding unit 307 restores the speech coding parameters from the signal received by the reception unit 306. The speech synthesis unit 308 then generates synthesized speech by the inverse of the operation performed by the extraction unit 302, and outputs a digital speech signal at the speech output terminal 309.

[0005] FIG. 9 is a block diagram of the speech coding parameter extraction unit 302 for the MBE scheme. The units operate as follows:
- Fundamental frequency estimation unit 401: receives the digital input speech signal from the input terminal 301 and estimates the fundamental frequency of the speech. The estimate is the reciprocal of the lag at which the time-lag autocorrelation function is maximized.
- Frequency spectrum calculation unit 402: cuts a finite-length speech signal out of the frame with a window function such as a Hamming window, and frequency-analyzes it to obtain the speech frequency spectrum.
- Fundamental frequency correction unit 403: uses an A-b-S (Analysis-by-Synthesis) method to obtain the corrected fundamental frequency ωo and the harmonic spectrum amplitude sequence at the same time. The criterion is to minimize the error between the speech frequency spectrum and the spectrum synthesized from the estimated fundamental frequency and the window function.
- Voicing strength calculation unit 404: divides the frequency range into multiple frequency bands k (k = 1, 2, ..., K) based on ωo. For each band it computes the error between the synthesized spectrum and the speech frequency spectrum, and outputs the V/UV information v[k] by a threshold decision.
- Spectral envelope calculation unit 405: outputs the spectral envelope |A(ω)| according to v[k]. In voiced bands it uses the harmonic amplitudes obtained by the A-b-S method. In unvoiced bands it uses the root-mean-square value of the frequency spectrum within each harmonic's frequency band.
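The first step of the extraction unit, estimating the fundamental frequency as the reciprocal of the autocorrelation-maximizing lag, can be sketched as follows. This is a minimal illustration only. The search limits `f0_min` and `f0_max` are assumptions, and the subsequent A-b-S refinement of unit 403 is not shown.

```python
import math

def estimate_f0(x, fs, f0_min=60.0, f0_max=400.0):
    """Estimate f0 as fs / lag, where lag maximizes the time-lag autocorrelation.

    f0_min and f0_max are assumed search limits; the text does not give them.
    """
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(x) - 1)
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        # Plain (biased) autocorrelation: sum of products over the overlap
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag
```

The unnormalized sum shrinks as the overlap shortens, which favors the shortest period over its multiples. This is why a refinement stage such as unit 403 is still needed for real speech.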

[0006] FIG. 10 is a block diagram of the parameter encoding unit 303 for the IMBE scheme. Each parameter is handled as follows:
- The fundamental frequency ωo enters at the input terminal 501. The fundamental frequency quantization unit 502 uniformly quantizes it to 8 bits over a predetermined range with a predetermined step size, and outputs the quantized value B0 at the output terminal 503.
- The V/UV information v[k] enters at the input terminal 504. The V/UV information quantization unit 505 converts it into a binary value B1 and outputs it at the output terminal 506. For example, when the number of frequency bands K is 12, B1 is a 12-bit value made of twelve 0/1 flags.
- The spectral envelope |A(ω)| enters at the input terminal 507 as the discrete harmonic spectrum amplitude sequence |A(ωi)| (i = 1, 2, 3, ..., N, where N is the number of harmonics).
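The two simplest quantizers in this paragraph can be sketched as below: an 8-bit uniform quantizer and the packing of per-band V/UV flags into one binary word. The text says only that the range and step are predetermined. The 50-400 Hz range in the usage example and the band-1-first bit order are therefore hypothetical choices.

```python
def uniform_quantize(value, lo, hi, bits):
    """Index 0 .. 2**bits - 1 of the cell of width (hi - lo) / 2**bits containing value."""
    levels = 1 << bits
    step = (hi - lo) / levels
    index = int((value - lo) // step)
    return min(max(index, 0), levels - 1)   # clip values outside [lo, hi]

def uniform_dequantize(index, lo, hi, bits):
    """Reconstruct at the center of the cell."""
    step = (hi - lo) / (1 << bits)
    return lo + (index + 0.5) * step

def pack_vuv(flags):
    """Pack per-band V/UV decisions (1 = voiced) into one integer, band 1 first (MSB)."""
    code = 0
    for voiced in flags:
        code = (code << 1) | (1 if voiced else 0)
    return code

# Usage with a hypothetical 50-400 Hz range:
b0 = uniform_quantize(200.0, 50.0, 400.0, 8)
b1 = pack_vuv([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1])  # 12 bands -> 12-bit word
```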

[0007] The harmonic amplitudes are coded in a closed prediction loop.

On the forward path, the logarithmic conversion unit 508 first takes the logarithm of the amplitudes. The subtractor 509 then subtracts the predicted harmonic spectrum amplitude sequence 521, which is predicted from the harmonic amplitude sequence of the previous frame. This difference is called the "prediction residual sequence" and is passed to the block conversion unit 510. The block conversion unit 510 sorts the prediction residual sequence into six blocks in order of harmonic number, forming two-dimensional data. The DCT (discrete cosine transform) unit 511 computes the DCT coefficients of this data and passes them to the quantization unit 512. The quantization unit 512 encodes the coefficients with a combination of uniform quantization and vector quantization, chosen in advance according to the order of each DCT coefficient. It outputs the encoded data B2 of the prediction residual sequence at the output terminal 513.

On the feedback path, the dequantization unit 514, the inverse DCT unit 515, and the block restoration unit 516 reconstruct the quantized prediction residual sequence. The adder 517 adds it to the predicted harmonic amplitude sequence 521, restoring the quantized spectral envelope of the current frame. The frame delay unit 518 delays this quantized envelope by one frame. The spectral envelope prediction unit 519 then predicts the spectral envelope of the next frame from the newly input fundamental frequency ωo of the next frame and the fundamental frequency of the previous frame. It feeds this prediction to the subtractor 509 in preparation for quantizing the next frame's spectral envelope.
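The essential property of this loop is that the encoder predicts from its own *quantized* reconstruction, exactly as the decoder does, so quantization errors do not accumulate. The sketch below shows the closed loop in a simplified form, which is not the IMBE procedure itself. The block grouping and DCT of units 510-512 are replaced by a plain scalar quantizer. The prediction coefficient `RHO`, the step `STEP`, and the linear resampling of the previous frame's harmonics are hypothetical choices.

```python
import math

RHO = 0.7    # hypothetical prediction coefficient
STEP = 0.25  # hypothetical scalar quantizer step for the log-domain residual

def predict(prev_log_q, w_prev, w_cur, n_cur):
    """Predict current log amplitudes from the previous frame's quantized ones.

    Harmonic i of the current frame lies at i * w_cur, i.e. at fractional
    harmonic number i * w_cur / w_prev of the previous frame; interpolate there.
    """
    n_prev = len(prev_log_q)
    pred = []
    for i in range(1, n_cur + 1):
        pos = min(max(i * w_cur / w_prev, 1.0), float(n_prev))  # clamp to known harmonics
        lo = int(pos)
        hi = min(lo + 1, n_prev)
        frac = pos - lo
        val = (1.0 - frac) * prev_log_q[lo - 1] + frac * prev_log_q[hi - 1]
        pred.append(RHO * val)
    return pred

def encode_frame(amps, w_cur, state):
    prev_log_q, w_prev = state
    pred = predict(prev_log_q, w_prev, w_cur, len(amps))
    idx = [round((math.log(a) - p) / STEP) for a, p in zip(amps, pred)]
    recon = [p + k * STEP for p, k in zip(pred, idx)]  # exactly what the decoder rebuilds
    return idx, (recon, w_cur)

def decode_frame(idx, w_cur, state):
    prev_log_q, w_prev = state
    pred = predict(prev_log_q, w_prev, w_cur, len(idx))
    recon = [p + k * STEP for p, k in zip(pred, idx)]
    return [math.exp(r) for r in recon], (recon, w_cur)
```

Encoder and decoder start from the same state (for example `([0.0], 1.0)`) and then stay in lockstep. As a result, each log amplitude is reproduced to within half a step, however many frames have passed.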

[0008] The principle of the MBE scheme is described in D. W. Griffin and J. S. Lim, "Multi-band Excitation Vocoder," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 8, pp. 1223-1235, August 1988. The encoder configuration, in the form of the IMBE speech coding procedure, is disclosed in more detail in USP-5491722 (Methods for speech transmission, Feb. 13, 1996).

[0009] To make quantization of the harmonic spectrum amplitude sequence still more efficient, another method has been devised. It models the harmonic amplitude sequence with a linear prediction (LPC) model and quantizes the model coefficients, namely the linear prediction coefficients and the gain (A. M. Kondoz, "Digital Speech," John Wiley & Sons, Ltd., 1995, pp. 256-261). The LPC model is given by Equation (1).

[Equation 1]

The harmonic spectrum amplitude sequence is modeled as the output spectrum of a J-th order LPC synthesis filter H(ω). The filter is excited by a pulse train of amplitude G repeating at the pitch period. In this way the variable-length harmonic amplitude sequence is represented by an amplitude value (gain) G and J LPC coefficients ak, for example J = 10.

It is also well known that LPC coefficients have good interpolation properties when converted to line spectrum pairs (LSP). Combining this modeling with frame interpolation of the LPC coefficients has been proposed as a way to lower the bit rate further, but degradation of sound quality has also been reported.

In summary, analysis-synthesis speech coding schemes, which extract and encode speech coding parameters based on a speech synthesis model, have been proposed as a means of low-bit-rate coding of digitized speech. Some of them are in practical use.
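Equation (1) itself is not reproduced in this text. From the description, the model has the all-pole form |A(ωi)| ≈ G / |1 − Σ_{k=1..J} ak·e^(−jkωi)|, although the sign convention of the original equation is an assumption here. The sketch below shows one common way to fit such a model to harmonic amplitudes. It computes autocorrelations from the harmonic powers, r[m] = Σi Ai²·cos(m·ωi), and then runs the Levinson-Durbin recursion. This fitting procedure, and choosing G so that the model's harmonic power matches the measured power, are assumptions rather than the method of this patent or of the cited book.

```python
import cmath
import math

def lpc_from_harmonics(amps, w0, order):
    """All-pole fit to harmonic amplitudes A_i located at w_i = i * w0 (rad/sample).

    Assumed method: r[m] = sum_i A_i**2 * cos(m * w_i), then Levinson-Durbin.
    Returns coefficients a_1..a_J of 1 - sum_k a_k z**-k and the final error.
    """
    r = [sum(a * a * math.cos(m * i * w0) for i, a in enumerate(amps, start=1))
         for m in range(order + 1)]
    a = [0.0] * (order + 1)          # a[0] unused
    err = r[0]
    for p in range(1, order + 1):
        k = (r[p] - sum(a[j] * r[p - j] for j in range(1, p))) / err
        prev = a[:]
        a[p] = k
        for j in range(1, p):
            a[j] = prev[j] - k * prev[p - j]
        err *= 1.0 - k * k
    return a[1:], err

def envelope(coefs, w):
    """1 / |1 - sum_k a_k e^{-jkw}|: the all-pole magnitude response at w."""
    d = 1 - sum(c * cmath.exp(-1j * k * w) for k, c in enumerate(coefs, start=1))
    return 1.0 / abs(d)

def fit_gain(amps, coefs, w0):
    """Choose G so the model's harmonic power equals the measured power (an assumption)."""
    model_power = sum(envelope(coefs, i * w0) ** 2 for i in range(1, len(amps) + 1))
    return math.sqrt(sum(a * a for a in amps) / model_power)

def model_amplitudes(coefs, gain, w0, n):
    return [gain * envelope(coefs, i * w0) for i in range(1, n + 1)]
```

Whatever the number N of harmonics in a frame, the model is described by the fixed set of J coefficients plus G. This fixed size is what makes the representation attractive for quantization.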

[0010]

[Problem to Be Solved by the Invention] The analysis-synthesis speech coding schemes described above are effective for low-bit-rate coding. Depending on the analysis conditions, however, they are prone to the kind of sound quality degradation that is characteristic of analysis-synthesis coding.

One way to improve the synthesized speech quality is to shorten the speech frame update period. This reduces how much the speech parameters change within a frame, and so raises quality while keeping the analysis-synthesis approach. The improvement obtained with a short frame update period is reported in Fumoto et al., "Study of Voice Coding Schemes for Commercial Mobile Communications," IEICE General Conference, D-14-2, p. 171, March 2000.

Shortening the frame update period, however, increases the number of coded bits. The coding parameters must therefore be reproduced by interpolation or similar means to remove parameter redundancy. Yet reducing the number of parameter bits by simple interpolation causes severe degradation.

Achieving both a low bit rate and the quality gain of a shorter frame update period in an analysis-synthesis scheme therefore requires an efficient quantization method for the speech coding parameters. In particular, when the bit rate is lowered by reproducing coding parameters through interpolation, the quantization method must be designed with careful attention to the balance between the interpolation method and sound quality degradation.

[0011] Accordingly, an object of the present invention is to provide a speech coding parameter encoding method and apparatus for analysis-synthesis speech coding that achieve both a lower bit rate and the improvement in speech quality gained by shortening the frame update period.

[0012]

[Means for Solving the Problem] To achieve the above object, the speech coding parameter encoding method of the present invention encodes speech coding parameters obtained from a speech signal that has been digitized and divided into frames of a predetermined length. The method comprises the following steps:
- a step of encoding the speech pitch, as a speech coding parameter, by combining two kinds of frame: frames in which a quantized pitch is obtained by whichever of a differential quantization method and a uniform quantization method is selected, and frames quantized by the index of an interpolated pitch selected from a plurality of interpolated pitch candidates calculated from the quantized pitches of the preceding and following frames;
- a step of encoding the voiced/unvoiced information, as a speech coding parameter, by the index of a representative voiced/unvoiced pattern selected from a limited number of representative patterns;
- a step of separating the harmonic spectrum amplitude sequence, as a speech coding parameter, into a gain and the linear prediction coefficients of a linear prediction model (or the line spectrum pairs derived from them), and encoding the linear prediction coefficients or line spectrum pairs by combining frames quantized by a quantizer such as a vector quantizer with frames that are interpolation-quantized by the index of a candidate point, the candidate being selected from a plurality of candidate points that a linear interpolator derives from the quantized linear prediction coefficients or line spectrum pairs of the preceding and following frames;
- a step of obtaining a corrected gain by correcting the gain according to the change in a frame's harmonic spectral power caused by quantizing the linear prediction coefficients; and
- a step of taking the logarithm of the corrected gain and encoding it by combining frames uniformly quantized, as they are, by a first gain quantizer with frames quantized by a second gain quantizer. The second gain quantizer chooses whichever of two outputs has the smaller error: (i) the output of a differential quantizer referenced to the quantized gain of the first gain quantizer in the previous frame, or (ii) the output of an interpolation quantizer obtained by selecting among a plurality of interpolation candidates between the outputs of the first gain quantizer in the preceding and following frames.

In the step of obtaining the corrected gain, the gain is multiplied by the ratio of two harmonic spectral powers. The numerator is the power given by the sum of squares of the harmonic amplitude sequence before linear prediction modeling. The denominator is the power given by the sum of squares of the harmonic amplitudes that the linear prediction model produces from the quantized linear prediction coefficients and the unquantized gain.
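The gain-correction step can be sketched as follows. The sketch compares the harmonic power before LPC modeling with the power of the model rebuilt from the *quantized* coefficients and the unquantized gain. The text multiplies the gain by this power ratio. This sketch instead treats G as an amplitude gain, so it applies the square root of the ratio, which makes the corrected model's power equal to the original. If G is defined in the power domain, the ratio applies directly, as the text states. The function names and the coefficient values in the usage line are hypothetical.

```python
import cmath
import math

def model_amplitudes(lpc, gain, w0, n):
    """All-pole model G / |1 - sum_k a_k e^{-jk w_i}| at harmonics w_i = i * w0."""
    out = []
    for i in range(1, n + 1):
        w = i * w0
        d = 1 - sum(a * cmath.exp(-1j * k * w) for k, a in enumerate(lpc, start=1))
        out.append(gain / abs(d))
    return out

def corrected_gain(harmonic_amps, quantized_lpc, gain, w0):
    """Rescale G so the model built from the quantized LPC keeps the original power."""
    p_orig = sum(a * a for a in harmonic_amps)            # before LPC modeling
    p_model = sum(m * m for m in
                  model_amplitudes(quantized_lpc, gain, w0, len(harmonic_amps)))
    ratio = p_orig / p_model
    # Assumption: G scales amplitudes, so sqrt(ratio) restores the power exactly.
    # With a power-domain gain, multiply by `ratio` itself, as the text states.
    return gain * math.sqrt(ratio)

# Usage with hypothetical quantized coefficients:
g = corrected_gain([1.0 / i for i in range(1, 21)], [0.5, -0.2], 1.0, math.pi / 21)
```

The correction is applied before the gain is quantized. As a result, the frame-power error introduced by LSP quantization or interpolation is not left for the coarse gain quantizers to absorb.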

[0013] Furthermore, the speech coding parameter encoding apparatus of the present invention encodes speech coding parameters obtained from a speech signal that has been digitized and divided into frames of a predetermined length. The apparatus comprises:
- means for encoding the speech pitch, as a speech coding parameter, by combining frames quantized by whichever of a differential quantization method and a uniform quantization method is selected with frames quantized by the index of an interpolated pitch selected from a plurality of interpolated pitch candidates calculated from the quantized pitches of the preceding and following frames;
- means for encoding the voiced/unvoiced information, as a speech coding parameter, by the index of a representative voiced/unvoiced pattern selected from a limited number of representative patterns;
- means for separating the harmonic spectrum amplitude sequence, as a speech coding parameter, into a gain and the linear prediction coefficients of a linear prediction model (or the line spectrum pairs derived from them), and for encoding the linear prediction coefficients or line spectrum pairs by combining frames quantized by a quantizer such as a vector quantizer with frames that are interpolation-quantized by the index of a candidate point, the candidate being selected from a plurality of candidate points that a linear interpolator derives from the quantized linear prediction coefficients or line spectrum pairs of the preceding and following frames;
- means for obtaining a corrected gain by correcting the gain according to the change in a frame's harmonic spectral power caused by quantizing the linear prediction coefficients; and
- means for taking the logarithm of the corrected gain and encoding it by combining frames uniformly quantized, as they are, by a first gain quantizer with frames quantized by a second gain quantizer. The second gain quantizer chooses whichever of two outputs has the smaller error: the output of a differential quantizer referenced to the quantized gain of the first gain quantizer in the previous frame, or the output of an interpolation quantizer obtained by selecting among a plurality of interpolation candidates between the outputs of the first quantizer in the preceding and following frames.
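The second gain quantizer can be sketched as a choice between two coarse quantizers, keeping whichever reconstruction lies closer to the input log gain. The step size, the number of differential levels, and the placement of interpolation candidates (evenly spaced strictly between the neighboring frames' values) are hypothetical, since the text does not specify them. Because the interpolation candidates depend on the *following* frame's quantized gain, a decoder using this mode needs at least one frame of look-ahead.

```python
def second_gain_quantizer(log_gain, prev_q, next_q,
                          diff_step=0.5, diff_levels=8, n_interp=4):
    """Quantize a log gain using the better of two coarse quantizers.

    prev_q, next_q: log gains already quantized by the first (uniform) quantizer
    in the preceding and following frames. Returns (mode, index, value).
    """
    # (i) Differential quantizer referenced to the previous frame's quantized gain
    half = diff_levels // 2
    d_idx = max(-half, min(half - 1, round((log_gain - prev_q) / diff_step)))
    d_val = prev_q + d_idx * diff_step
    # (ii) Interpolation quantizer: candidates evenly spaced between prev_q and next_q
    cands = [prev_q + (next_q - prev_q) * (c + 1) / (n_interp + 1)
             for c in range(n_interp)]
    i_idx = min(range(n_interp), key=lambda c: abs(cands[c] - log_gain))
    i_val = cands[i_idx]
    # Keep whichever reconstruction is closer to the input
    if abs(d_val - log_gain) <= abs(i_val - log_gain):
        return "diff", d_idx, d_val
    return "interp", i_idx, i_val
```

For example, with neighbors at 0.0 and 4.0, an input of 2.4 falls exactly on the third interpolation candidate. An input of 0.5 is reached exactly by one differential step.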

[0014] Thus, the present invention reduces the number of coding bits for each parameter as follows.

Speech pitch: the pitch is encoded in the log-pitch domain, alternating between two kinds of frame in a repeating pattern. In the first kind, the pitch is quantized by whichever of differential and uniform quantization gives the smaller error relative to the input pitch. In the second kind, it is quantized by the number of the candidate closest to the input among a plurality of interpolation points between the pitches of neighboring frames.

Voiced/unvoiced information (V/UV information): two measures are used. First, V/UV patterns and their frequencies of occurrence are collected in advance over a large number of speech frames, and a fixed number of representative V/UV patterns is chosen from them. Each frame is then encoded by the number (index) of the representative pattern most similar to its V/UV information. Second, frames in which no V/UV information is transmitted are inserted as appropriate. Such a frame is decoded using the V/UV information of whichever of the preceding and following frames has the greater speech energy.

Harmonic spectrum amplitude sequence: the sequence is modeled by an autoregressive J-th order linear prediction model (AR model) and represented by linear prediction coefficients (LPC) and a gain.
- The LPC coefficients are converted to LSPs (line spectrum pairs). They are then quantized by alternating between frames in which the J LSPs of the frame are vector-quantized and frames quantized by the number of the candidate closest to the input among a plurality of interpolation points between the quantized LSPs of neighboring frames.
- The gain is first corrected for a change in frame power. This change arises from harmonic amplitude errors when the amplitudes are reconstructed by the linear prediction model from LPC coefficients quantized by LSP vector quantization and interpolation. The logarithm of the corrected gain is then quantized by alternating between frames in which it is uniformly quantized and frames in which it is quantized by the better of two alternatives: quantizing the difference from the previous frame, or selecting an interpolated value between the preceding and following frames.

Quantizing the harmonic spectrum amplitude sequence with this two-stage structure allows it to be encoded with few bits, while suppressing the change in frame power caused by quantization errors in the linear prediction coefficients.
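The two V/UV measures can be sketched as below. The text says only that a fixed number of representative patterns is chosen in advance from V/UV statistics gathered over many frames. Several details here are therefore hypothetical:
- picking the most frequent patterns as the representatives;
- measuring "most similar" by the number of mismatching bands (Hamming distance);
- resolving equal energies toward the preceding frame.

The function names are illustrative.

```python
from collections import Counter

def build_vuv_codebook(training_patterns, size):
    """Pick the `size` most frequent V/UV patterns (tuples of 0/1) as representatives."""
    counts = Counter(tuple(p) for p in training_patterns)
    return [pattern for pattern, _ in counts.most_common(size)]

def encode_vuv(pattern, codebook):
    """Index of the representative with the fewest mismatching bands (Hamming distance)."""
    def distance(rep):
        return sum(a != b for a, b in zip(pattern, rep))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))

def decode_missing_vuv(prev_pattern, prev_energy, next_pattern, next_energy):
    """For a frame sent without V/UV data, reuse the neighbor with more speech energy."""
    return prev_pattern if prev_energy >= next_energy else next_pattern
```

With a codebook of, say, 16 entries, each coded frame needs only 4 bits, compared with up to 12 bits for the raw per-band flags of [0003]. Frames sent without V/UV data need none.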

[0015]

[Embodiments of the Invention] An embodiment of the speech coding parameter encoding method of the present invention is described below, together with a speech coding parameter encoding apparatus to which the method is applied. The example applies them to the MBE or IMBE method, which is an analysis-synthesis speech coding method.

This apparatus corresponds to the parameter encoding unit 303 shown in FIG. 8. It efficiently encodes the speech coding parameters extracted by the speech coding parameter extraction unit 302, namely:
- the speech pitch (or its reciprocal, the fundamental frequency ωo);
- the harmonic spectrum amplitude sequence;
- the voiced/unvoiced information (V/UV information) of each frequency band.

[0016] FIG. 1 is a block diagram of an example configuration of a speech coding parameter encoding apparatus to which the speech coding parameter encoding method of the present invention is applied. The speech pitch (or fundamental frequency ωo) is obtained, for example, by the speech coding parameter extraction unit shown in FIG. 9. It is input at the input terminal 101, and the logarithmic conversion unit 102 converts it to the logarithmic pitch P[n] (n is the frame number).

The log conversion is convenient because it allows a uniform quantization step size. As described in Thomas Eriksson and Hong-Goo Kang, "Pitch Quantization in Low Bit-Rate Speech Coding," ICASSP '99, pp. 489-492, 1999, the human detection threshold for a change in log pitch is known to depend little on the log-pitch value itself.

【００１７】 The input switching unit 103 routes the logarithmic pitch P[n] alternately to one of two outputs, 104 or 116. It switches frame by frame, or subframe by subframe when subframes are used.

A pitch routed to output 104 goes to the uniform quantization unit 105 and to the subtraction unit 112:
- The uniform quantization unit 105 quantizes it uniformly with a fixed quantization step. The resulting quantized logarithmic pitch P1'[n] (106) is input to the pitch comparison unit 108.
- The subtraction unit 112 forms a differential logarithmic pitch from the input logarithmic pitch and the quantized logarithmic pitch of the previous frame, which it receives from the delay unit 111. It inputs this difference to the differential quantization unit 113. The delay unit 111 hands the quantized logarithmic pitch of the immediately preceding frame over to the current frame.
- The differential quantization unit 113 quantizes the difference with one of two step types. It uses either a uniform differential step, or a non-uniform step that is referenced to a difference of zero and widens as the magnitude of the input difference grows.
- The addition unit 114 adds the result to the quantized logarithmic pitch of the previous frame, which served as the reference for the difference. It outputs the differentially quantized logarithmic pitch P2'[n] to 107.

The pitch comparison unit 108 compares P1'[n] and P2'[n] and selects whichever has the smaller error with respect to the unquantized logarithmic pitch P[n]. It outputs the index produced by the selected quantizer as the pitch code to one input of the pitch code switching unit 110. That index is either the uniform quantization index N1 or the differential quantization index N2. The indices (selection numbers) of N1 and N2 are assigned so that they do not overlap, so the output index number itself identifies which quantization method was selected. The output terminal 109 outputs the quantized logarithmic pitch P'[n] from the quantizer selected by the pitch comparison unit 108.
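A minimal Python sketch of the even-frame path just described (units 105, 112 to 114 and 108) follows. The pitch range, the 9 uniform levels and the 7-entry non-uniform difference table are hypothetical values, chosen only so that the two disjoint index sets together fill a 4-bit code; the source fixes none of them.

```python
import math

# Hypothetical quantizer design (not given in the source): 9 uniform levels plus
# 7 differential levels = 16 indices, i.e. a 4-bit pitch code.
LOG_P_MIN = math.log(20.0)    # assumed shortest pitch period, in samples
LOG_P_MAX = math.log(160.0)   # assumed longest pitch period, in samples
N_UNIFORM = 9                 # uniform indices N1 = 0..8
STEP = (LOG_P_MAX - LOG_P_MIN) / (N_UNIFORM - 1)
# Non-uniform differential levels: the step widens as |difference| grows from zero.
DIFF_LEVELS = [-0.12, -0.05, -0.02, 0.0, 0.02, 0.05, 0.12]
N_DIFF = len(DIFF_LEVELS)     # differential indices N2 = 9..15


def uniform_quantize(p):
    """Uniform quantization unit 105: return (P1'[n], N1)."""
    i = min(max(round((p - LOG_P_MIN) / STEP), 0), N_UNIFORM - 1)
    return LOG_P_MIN + i * STEP, i


def diff_quantize(p, p_prev):
    """Units 112-114: quantize p - p_prev, add p_prev back; return (P2'[n], N2)."""
    j = min(range(N_DIFF), key=lambda k: abs(p - p_prev - DIFF_LEVELS[k]))
    return p_prev + DIFF_LEVELS[j], N_UNIFORM + j   # offset keeps N2 disjoint from N1


def quantize_pitch(p, p_prev):
    """Pitch comparison unit 108: keep the result closer to the unquantized P[n]."""
    q1, n1 = uniform_quantize(p)
    q2, n2 = diff_quantize(p, p_prev)
    return (q1, n1) if abs(q1 - p) < abs(q2 - p) else (q2, n2)


def decode_pitch(index, p_prev):
    """The index range alone tells the decoder which quantizer was used."""
    if index < N_UNIFORM:
        return LOG_P_MIN + index * STEP
    return p_prev + DIFF_LEVELS[index - N_UNIFORM]
```

Offsetting the differential indices by `N_UNIFORM` is one simple way to realize the non-overlapping index arrangement. With it, no separate selection flag has to be transmitted.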

【００１８】 The logarithmic pitch P[n] that appears at the other output 116 of the input switching unit 103 goes to the interpolation point comparison unit 119. There it is compared with a plurality of interpolated pitch candidates produced by the interpolation pitch candidate creation unit 117. Unit 117 builds these candidates from the quantized logarithmic pitches of the current and previous frames, which it takes from the input and output ends of the delay unit 111. The interpolation point index N3 of the candidate closest to the logarithmic pitch 116 of the current frame is output as the pitch interpolation code. It goes to the other input of the pitch code switching unit 110. The pitch code switching unit 110 switches between the two pitch codes in step with the input switching unit 103 and outputs the selected code.

【００１９】 FIG. 2 illustrates the operation of the interpolation pitch candidate creation unit 117 of FIG. 1. In the example of FIG. 2 there are four interpolation point candidates, so the selection number (index) quantizes the pitch with 2 bits.

Let P'[n+1] be the quantized logarithmic pitch of the frame following the current frame, and P'[n-1] that of the frame preceding it. The four points that divide the straight line joining them into equal parts are marked with crosses. Of these four interpolated pitch candidates, the one closest to the input logarithmic pitch P[n] is selected as the interpolated quantized pitch P'[n]. In this example, index 2 is the index that yields it.

P[n] is the pitch of the frame lying between P'[n-1] and P'[n+1]. For example, when a frame is divided into two subframes:
- P[n] corresponds to the first subframe of the current frame;
- P'[n+1] corresponds to the quantized pitch of the second subframe of the current frame;
- P'[n-1] corresponds to the quantized pitch of the second subframe of the previous frame.

In the candidate layout of FIG. 2, the end points P'[n+1] and P'[n-1] are not among the interpolated pitch candidates. The candidates may, however, also be defined to include them, in which case only two candidates remain besides P'[n+1] and P'[n-1]. If the candidates exclude both end points, as in FIG. 2, even a single bit can select between two interior points. The layout of FIG. 2 is therefore effective when only a small number of bits, such as 1 or 2, is allotted to the interpolation point.
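The candidate layout of FIG. 2 can be sketched as follows. Two points are assumptions, since the figure itself is not reproduced here. First, "equal division" is read as `n_points` interior points spaced 1/(n_points + 1) apart. Second, index 0 is taken as the candidate nearest P'[n-1].

```python
def interpolation_candidates(p_prev, p_next, n_points=4, include_ends=False):
    """Candidate log pitches on the straight line between two quantized pitches.

    include_ends=False: n_points interior points splitting [p_prev, p_next] into
    n_points + 1 equal intervals (assumed reading of FIG. 2).
    include_ends=True: the two end points are themselves among the candidates.
    Index 0 is assumed to be the candidate nearest p_prev.
    """
    if include_ends:
        return [p_prev + (p_next - p_prev) * i / (n_points - 1) for i in range(n_points)]
    return [p_prev + (p_next - p_prev) * (i + 1) / (n_points + 1) for i in range(n_points)]


def interpolate_pitch(p, p_prev, p_next, n_points=4):
    """Interpolation point comparison unit 119: return (N3, P'[n]) nearest to p."""
    cands = interpolation_candidates(p_prev, p_next, n_points)
    idx = min(range(n_points), key=lambda i: abs(cands[i] - p))
    return idx, cands[idx]
```

With `n_points=2` the sketch reproduces the 1-bit case mentioned above: both codes select interior points at one third and two thirds of the way. No code is spent on the end points, which the decoder already knows.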

【００２０】 Returning to FIG. 1, the voiced/unvoiced information (V/UV information) is input from the input terminal 131. The frame thinning unit 133 then thins it out frame by frame (or subframe by subframe). For example, V/UV information is output only once every two frames (or two subframes) and is input to the voiced/unvoiced comparison unit 134.

The representative voiced/unvoiced information codebook 132 stores a limited number of the most frequently occurring patterns. These were selected in advance, by the method described later, from V/UV information obtained from a large number of speech frames. The voiced/unvoiced comparison unit 134 selects the representative V/UV information value b1' closest to the currently input V/UV information value b1. It outputs the index of that representative as the voiced/unvoiced code 135.

The V/UV information value b1 (or b1') is a binary number. The speech frequency spectrum is divided into frequency bands at intervals of, for example, three times the speech fundamental frequency. The V/UV values v[k], k = 1, 2, ..., K (each v[k] is 0 or 1), of these bands are assigned to the bits of b1.

(Equation 2)

The distance between two V/UV information values is defined as the absolute value of the difference between b1 and b1'. The representative V/UV information is set independently for each number of bands, which is determined by the speech fundamental frequency.
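A minimal sketch of the packing and nearest-representative search follows. Equation 2 is not reproduced in this text, so the bit order used here is an assumption: band k maps to bit k-1. The codebook values are likewise hypothetical.

```python
def pack_vuv(v):
    """Pack band decisions v[1..K] (0 = unvoiced, 1 = voiced) into the integer b1.

    Assumed bit order: band k -> bit k-1 (band 1 is the least significant bit).
    Equation 2 is not reproduced in this text, so the actual order may differ.
    """
    return sum(bit << k for k, bit in enumerate(v))


def nearest_representative(b1, codebook):
    """Voiced/unvoiced comparison unit 134: index of b1' minimising |b1 - b1'|."""
    return min(range(len(codebook)), key=lambda i: abs(b1 - codebook[i]))


# Hypothetical 2-bit codebooks keyed by the band count K (values are made up).
CODEBOOKS = {5: [0, 7, 24, 31]}
```

The distance is taken on the packed integer. A mismatch in a high-order bit therefore costs more than one in a low-order bit, and which bands land in the high-order bits depends on the bit order defined by Equation 2.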

【００２１】 The representative V/UV information can be selected and created as follows. First, a large number of V/UV information values are obtained from many speech frames and classified by the number of bands K. For each band count, the frequency of occurrence of each value b1_i in the classified set {b1_i} is tallied. For example, with five bands, b1 takes integer values from 0 to 31, and a frequency of occurrence is counted for each of these values.

To quantize the V/UV information with 2 bits, four representative V/UV values must be chosen. One obvious approach is to take the four most frequent values. However, this can pick representative values that are adjacent to one another, which makes them poor representatives. The problem is especially likely when the number of bands K is large, because frequently occurring V/UV values then tend to lie next to each other.

【００２２】 A more effective approach is therefore to delete entries one at a time. At each step, the V/UV information with the lowest frequency of occurrence is removed, and its frequency is distributed to the adjacent V/UV information. This continues until the number of entries has been reduced to the target number of representative V/UV values.

FIG. 3 illustrates how the least frequent V/UV pattern is deleted at each step of this reduction of the V/UV information (V/UV pattern) candidates. Let q[n] be the frequency of occurrence of the least frequent V/UV pattern. q[n] is split into q1 and q2, which are added to the frequencies of occurrence q[n-l₁] and q[n+l₂] of the adjacent V/UV patterns. The shares q1 and q2 are determined by the following equation from two factors: the magnitudes of the adjacent frequencies q[n-l₁] and q[n+l₂], and the closeness of the adjacent patterns, that is, the distances l₁ and l₂.

(Equation 3)
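The reduction procedure of FIG. 3 can be sketched as below. Equation 3 is not reproduced in this text, so the share rule is a hypothetical choice. Here each neighbour receives a share proportional to its own count divided by its distance: a more frequent and closer neighbour gets more, as the text requires.

```python
def reduce_patterns(freq, target):
    """Shrink a histogram {b1 value: count} to `target` representative values.

    The least frequent surviving value is removed and its count is shared between
    its nearest surviving neighbours below and above. The share rule
    (weight = neighbour count / distance) is an assumption standing in for Equation 3.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    freq = dict(freq)
    while len(freq) > target:
        keys = sorted(freq)
        n = min(keys, key=lambda k: (freq[k], k))   # least frequent; ties -> smaller b1
        pos = keys.index(n)
        q = freq.pop(n)
        lower = keys[pos - 1] if pos > 0 else None
        upper = keys[pos + 1] if pos + 1 < len(keys) else None
        if lower is None or upper is None:          # edge value: the only neighbour takes all
            freq[lower if upper is None else upper] += q
            continue
        w1 = freq[lower] / (n - lower)              # q[n-l1] / l1
        w2 = freq[upper] / (upper - n)              # q[n+l2] / l2
        if w1 + w2 == 0:                            # both neighbours empty: split by closeness
            w1, w2 = 1 / (n - lower), 1 / (upper - n)
        freq[lower] += q * w1 / (w1 + w2)           # q1
        freq[upper] += q * w2 / (w1 + w2)           # q2
    return sorted(freq)
```

Because each removed count is fully redistributed, the total count is conserved at every step. The surviving values therefore still reflect the whole training set.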

【００２３】 Returning again to FIG. 1, the coding of the spectral envelope will now be described. The spectral envelope arrives at the input terminal 140 as a discrete spectrum, namely the harmonic spectrum amplitude sequence. It is passed to the spectrum correction unit 141, whose operation is described later.

The linear prediction modeling unit 142 models the corrected spectrum sequence with an autoregressive linear prediction model. This converts the sequence into a gain G and high-order linear prediction coefficients: LPC coefficients a[j], j = 1, 2, 3, ..., J, where J is the prediction order, for example J = 10. The LPC coefficients a[j] are further converted into line spectrum pairs (LSP) F[j]. LSPs take values between 0 and π and suffer little perceptual degradation under linear interpolation. For this reason they are widely used as coefficients for modeling spectral envelopes.

The LSP quantization unit 143 vector-quantizes the LSP-converted LPC coefficients using the LSP codebook 144. It outputs the codebook index to the LSP code switching unit 146. Unit 143 reduces the number of frames it quantizes, for example by quantizing only every other frame (or subframe).

The quantized LSPs also enter the LSP interpolation unit 145. This unit computes a plurality of LSP interpolation candidates between the quantized LSPs immediately before and immediately after the current frame. It selects the candidate closest to the LSP of the current frame and outputs that candidate's LSP interpolation candidate number to the LSP code switching unit 146. Frames not quantized by the LSP quantization unit 143 are quantized by this LSP interpolation instead, which keeps the code amount from growing.

The LSP code switching unit 146 switches, frame by frame (or subframe by subframe), between the quantization indices from the LSP quantization unit 143 and from the LSP interpolation unit 145. It outputs the result as the LSP code to the output terminal 154.
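A minimal sketch of the LSP interpolation unit 145 follows. Three details are not specified by the source and are assumptions here:
- four candidates;
- candidate spacing borrowed from the interior pitch candidates of FIG. 2;
- squared Euclidean distance as the closeness measure.

```python
def lsp_interpolation_candidates(f_prev, f_next, n_candidates=4):
    """Linearly interpolated LSP vectors strictly between two quantized frames.

    Candidate count and interior spacing are assumptions modelled on FIG. 2.
    """
    cands = []
    for i in range(1, n_candidates + 1):
        w = i / (n_candidates + 1)
        cands.append([a + w * (b - a) for a, b in zip(f_prev, f_next)])
    return cands


def select_lsp_candidate(f, f_prev, f_next, n_candidates=4):
    """Return (candidate number, candidate) closest to the unquantized LSP vector f.

    Squared Euclidean distance is an assumed measure; the source only says "closest".
    """
    cands = lsp_interpolation_candidates(f_prev, f_next, n_candidates)
    dist = [sum((c - x) ** 2 for c, x in zip(cand, f)) for cand in cands]
    i = min(range(len(cands)), key=dist.__getitem__)
    return i, cands[i]
```

A convex combination of two strictly increasing LSP vectors is itself strictly increasing. Every interpolated candidate therefore remains a valid, ordered LSP set, which is one reason LSPs suit interpolation.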

【００２４】 The function and purpose of the spectrum correction unit 141 are as follows. When a harmonic spectrum amplitude sequence is modeled with a linear prediction model, more spectral points are generally better. However, any spectral component that does not fit the model distorts the modeling and increases the spectral error after modeling. The 0th-order harmonic spectrum amplitude of speech (the DC component) is generally lower than the other harmonic spectrum amplitudes, so it is prone to modeling error. Moreover, this component is not needed when decoding speech, since speech signals contain almost no DC component. It can therefore be adjusted freely to a level that is easy to model.

For these reasons, the spectrum correction unit 141 modifies the 0th-order harmonic spectrum amplitude so that the sequence is easier to model with the linear prediction model. Specifically, the 0th-order harmonic amplitude is replaced by the second harmonic amplitude H₂ scaled by a coefficient α, according to the following equation.

(Equation 4)

Here, H₀' is the replacement 0th-order harmonic spectrum amplitude, and H₁, H₂ and H₃ are the first-, second- and third-order harmonic spectrum amplitudes, respectively. Depending on the pitch frequency, the harmonic spectrum amplitude sequence may contain only a few samples. Additional points are therefore generated by interpolating between the discrete spectral lines of the input sequence. Specifically, the logarithm of the harmonic spectrum amplitude is interpolated linearly with respect to frequency to create the interpolated spectral data.
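The two preparation steps can be sketched as follows. Equation 4 is not reproduced in this text, so α is simply passed in rather than derived from H₁ to H₃. The upsampling factor is also a hypothetical parameter, since the source does not state how many points are inserted.

```python
import math


def correct_dc(h, alpha):
    """Spectrum correction unit 141: replace H0 with alpha * H2.

    alpha is supplied by the caller; Equation 4, which derives it from H1..H3,
    is not reproduced here.
    """
    out = list(h)
    out[0] = alpha * h[2]
    return out


def log_linear_upsample(h, factor):
    """Insert factor - 1 points between neighbouring harmonics by interpolating
    log amplitudes linearly in frequency (harmonics are equally spaced).
    All amplitudes must be positive; `factor` is a hypothetical parameter."""
    logs = [math.log(a) for a in h]
    out = []
    for l in range(len(logs) - 1):
        for s in range(factor):
            t = s / factor
            out.append(math.exp((1 - t) * logs[l] + t * logs[l + 1]))
    out.append(h[-1])
    return out
```

Interpolating in the log domain makes the inserted point between amplitudes 1 and 4 come out as their geometric mean, 2, rather than the arithmetic mean, 2.5. This follows the roughly logarithmic way spectral level differences are perceived.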

【００２５】 The harmonic restoration unit 153 computes the spectral amplitudes at the harmonics of the pitch frequency from the LPC modeling formula (1). It uses the LPC coefficients a_k restored from the quantized LSP code together with the unquantized LPC gain G. The result is output to the error calculation unit 152 as the restored harmonic spectrum amplitudes.

The error calculation unit 152 computes the spectral power error relative to the original input harmonic spectrum amplitude sequence A[l] and passes it to the gain correction unit 147. Based on this output, the gain correction unit 147 corrects the power error in the gain G from the linear prediction modeling unit 142. The corrected gain is input to the gain quantization unit 148, where the corrected logarithmic gain is uniformly quantized. The gain code of the uniformly quantized gain is output to one input of the gain code switching unit 151. The gain quantization unit 148 reduces the number of frames it quantizes, for example by quantizing only every other frame (or subframe).

【００２６】 Frames that the gain quantization unit 148 does not quantize are quantized by the gain difference quantization unit 149 or by the gain interpolation quantization unit 150. This keeps the code amount from growing.
- The gain difference quantization unit 149 differentially quantizes the logarithmic gain of such a frame. Its reference is the gain most recently quantized by the gain quantization unit 148.
- The gain interpolation quantization unit 150 takes the logarithmic gain of the same frame and performs interpolation quantization. It forms interpolated gain candidates by linear interpolation between the quantized gains of the immediately preceding and following frames, and selects the candidate with the smallest error.

Of the outputs of units 149 and 150, the gain code with the smaller quantization error is output to the other input of the gain code switching unit 151. The gain codes of units 149 and 150 are assigned so that they do not overlap, so the code itself shows which quantization method was selected and no separate selection code needs to be sent. The gain code switching unit 151 switches, frame by frame (or subframe by subframe), between two inputs: the gain code of the gain quantization unit 148, and the gain code of unit 149 or unit 150. It outputs the result as the LSP gain code to the output terminal 155.
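The odd-frame gain path (units 149 and 150, and the choice between them) can be sketched as follows. Several details are hypothetical:
- the 3-bit code is split into four differential codes (0 to 3) and four interpolation codes (4 to 7);
- the differential levels are made-up values;
- the interpolation points follow the interior spacing of FIG. 2.

```python
# Hypothetical 3-bit odd-frame gain code: codes 0..3 are differential,
# codes 4..7 are interpolated. The split and the level values are assumptions.
DIFF_GAIN_LEVELS = [-1.0, -0.3, 0.3, 1.0]   # offsets from the previous quantized log gain
N_INTERP = 4


def quantize_odd_gain(g, g_prev, g_next):
    """Return (code, quantized log gain) for a frame skipped by unit 148.

    g      : corrected log gain of the frame
    g_prev : quantized log gain of the preceding (even) frame
    g_next : quantized log gain of the following (even) frame
    """
    # Gain difference quantization (unit 149): codes 0..3.
    options = [(j, g_prev + d) for j, d in enumerate(DIFF_GAIN_LEVELS)]
    # Gain interpolation quantization (unit 150): codes 4..7.
    for i in range(N_INTERP):
        w = (i + 1) / (N_INTERP + 1)
        options.append((len(DIFF_GAIN_LEVELS) + i, g_prev + w * (g_next - g_prev)))
    # Keep whichever option has the smallest quantization error.
    return min(options, key=lambda opt: abs(opt[1] - g))
```

Differential coding handles onsets and offsets where the gain jumps away from the neighbouring even frames. Interpolation covers smooth gain trajectories. Sharing one code space lets each frame use whichever fits better, at no signalling cost.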

【００２７】 LSP quantization changes the harmonic spectrum amplitudes, and this causes a frame power error. The gain is corrected for this error before it is quantized. This suppresses noise caused by excessive errors in the speech amplitude of a frame (or subframe). Such errors are especially likely in transient states, for example at the onset or end of speech.

【００２８】 Next, the processing flow of the speech coding parameter coding unit of FIG. 1 according to the present invention will be described. FIG. 4 is a flowchart of the quantization of the speech fundamental frequency ωo.
- Processing starts at 701.
- At 702, the speech fundamental frequency ωo to be quantized and the frame number m are set.
- At 703, the logarithmic pitch P is calculated.
- At 704, it is determined whether m is even or odd.
- If m is even (EVEN), uniform quantization is performed at 705 and differential quantization at 706. These give the uniformly quantized pitch P_u with its uniform quantization index Index_u, and the differentially quantized pitch P_d with its differential quantization index Index_d.

【００２９】 Next, at 708, the differential quantization error |P_d − P| is evaluated against a threshold Th:
- If the error is smaller than Th, processing goes to 710. There the pitch P[m] of the m-th frame, written P[2n] to indicate an even frame, is set to P_d. Its index Index[m], likewise written Index[2n], is set to Index_d.
- If |P_d − P| is Th or greater, the uniform quantization error |P_u − P| and the differential quantization error |P_d − P| are compared at 707. If the uniform quantization error is smaller, P[2n] is set to P_u and Index[2n] to Index_u at 709. Otherwise, P[2n] is set to P_d and Index[2n] to Index_d at 710.

The pitch P[2n] of the m-th frame is also delayed by two frames at 711 to become P[2n-2]. This serves as the reference logarithmic pitch for differential quantization 706 when the next even frame is differentially quantized.

【００３０】 If m is odd at 704, the pitch is selected from pitch interpolation candidates at 713. The candidates are calculated at 712 by the method already shown in FIG. 2. They form the set {Pinpol[2n-1]_i} (i = 0, 1, 2, 3, ..., N-1, where N is the number of interpolation candidate points), which divides the interval between the quantized logarithmic pitches of the preceding and following frames into equal parts. The candidate point with the smallest absolute error from the logarithmic pitch P[2n-1] of frame m = 2n-1 is selected. Its index number Index[2n-1] is set at 714 as the quantized pitch interpolation code of the odd frame.

The argmin_i(x) function in 713 (written with i beneath argmin) evaluates x over the parameter i and returns the i at which x is smallest. For an even frame, the pitch code Index[2n] is likewise set at 714. It is the index, Index_d or Index_u, of the quantization method selected at 709 or 710. The even-frame pitch code and the odd-frame pitch interpolation code set in this way are output at 715.

With the speech fundamental frequency coding method of the present invention shown in FIG. 4, the speech fundamental frequency of two frames can be encoded well. For example, 4 bits can be used for the pitch code and 1 bit for the pitch interpolation code.
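One pass of the FIG. 4 flow for a frame pair (2n-1, 2n) can be sketched as below. The even frame is decided first with the threshold rule of steps 707 to 710. The odd frame is then interpolated between P'[2n-2] and P'[2n]. The concrete quantizers are passed in as functions, and the two interior candidates (1 bit) follow FIG. 2. All of these interfaces are hypothetical and are not taken from the source.

```python
def encode_pitch_pair(p_odd, p_even, p_ref, uniform_q, diff_q, th, n_interp=2):
    """One pass of FIG. 4 for the frame pair (2n-1, 2n).

    p_odd, p_even : unquantized log pitches P[2n-1] and P[2n]
    p_ref         : quantized log pitch P'[2n-2] of the previous even frame (step 711)
    uniform_q(p), diff_q(p, ref) : hypothetical quantizers returning (value, index)
    Returns (Index[2n-1], Index[2n], P'[2n]).
    """
    pu, iu = uniform_q(p_even)               # step 705
    pd, i_d = diff_q(p_even, p_ref)          # step 706
    err_u, err_d = abs(pu - p_even), abs(pd - p_even)
    if err_d < th or err_d <= err_u:         # step 708, then step 707
        q_even, idx_even = pd, i_d           # step 710: differential
    else:
        q_even, idx_even = pu, iu            # step 709: uniform
    # Step 712: interior candidates between P'[2n-2] and P'[2n] (FIG. 2 layout).
    cands = [p_ref + (q_even - p_ref) * (i + 1) / (n_interp + 1) for i in range(n_interp)]
    # Step 713: argmin over i of |Pinpol[2n-1]_i - P[2n-1]|.
    idx_odd = min(range(n_interp), key=lambda i: abs(cands[i] - p_odd))
    return idx_odd, idx_even, q_even
```

The odd frame needs P'[2n], so an encoder following this flow must quantize frame 2n before emitting the code for frame 2n-1. This adds a one-frame look-ahead delay.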

【００３１】 FIG. 5 is a flowchart of the coding of V/UV information. Processing starts at 801. At 802, the speech fundamental frequency is set to ωo, the V/UV information value of the frame to v, and the frame number to m. At 803, the number of bands K corresponding to ωo is obtained.

(Equation 5)

Here, floor(x) is the function giving the largest integer not exceeding x. B is the number of harmonics contained in each band; it is fixed before encoding, and a value of about 3 is used.

At 804, it is determined whether m is even or odd:
- If m is even, step 805 searches the data 806, which holds representative V/UV information values selected in advance for each number of bands K. It selects the index i of the representative value VCB[K][i] closest to the input V/UV value v and takes it as IndexV/UV[2n]. At 807 this index is set as the voiced/unvoiced code.
- If the frame being processed is odd, the voiced/unvoiced code is thinned out and not output.

With the V/UV information coding method of the present invention shown in FIG. 5, the V/UV information of two frames can be encoded well, for example using 2 bits for the voiced/unvoiced code.

【００３２】 FIG. 6 is a flowchart of the coding of the harmonic spectrum amplitude sequence.
- Processing starts at 901.
- At 902, the harmonic spectrum amplitudes are set to A[l] and the frame number to m.
- At 903, the 0th-order harmonic spectrum amplitude A[0] is replaced by the second-order harmonic spectrum amplitude A[2] multiplied by α. The correction coefficient α is determined, for example, by the method given in equation (4) above.
- At 904, the logarithms of the corrected harmonic spectrum amplitude sequence are linearly interpolated to increase the number of spectral lines.
- At 905, the enlarged spectral sequence is modeled as values lying on the modeling curve of the linear prediction model given by equation (1), and the LPC coefficients a[j] and the LPC gain G are calculated. The LPC coefficients a[j] are then converted into line spectrum pairs (LSP: Line Spectrum Pair) F[j], which are convenient for the subsequent processing.

【００３３】 At 906, it is determined whether the current frame number is even or odd.
- If it is even, F[j] is vector-quantized at 907 using the vector quantization table 908. This yields the quantized LSP vector F'[j] and its index LSPindex0, which is output to 914. The quantized LSP vector F'[j], together with the quantized LSP vector delayed at 910, is also used at 911 to calculate LSP interpolation candidates by linear interpolation.
- If it is odd, the candidate F'[j] closest to the unquantized LSP vector F[j] of the current frame is chosen from the LSP interpolation candidates calculated at 911. The index of the chosen candidate is output to 914 as LSPindex1.

At 914, LSPindex0 is output as the LSP code if the frame number is even, and LSPindex1 if it is odd.

【００３４】 After the LSP code is output at 914, the quantized LSPs of both even and odd frames are converted at 909 from the quantized LSP coefficients F'[j] into quantized LPC coefficients a'[j]. At 912, the harmonic amplitude sequence A'[l] is restored from a'[j] and the unquantized LPC gain G. This is done by evaluating the modeling formula (1) at the harmonics of the quantized speech fundamental frequency. At 913, the gain change rate dg caused by LSP quantization is calculated. dg is the ratio of the sum of squares of the unquantized harmonic amplitude sequence A[l] to the sum of squares of the restored sequence A'[l].

【００３５】 Meanwhile, at 915, the corrected logarithmic gain G' = log(dg × G) is calculated. It applies the gain change rate dg obtained at 913 to the LPC gain G modeled at 905. At 916, it is determined whether the frame number is even (EVEN) or odd (ODD). For an even frame, the logarithmic gain G' is uniformly quantized at 917, giving the uniformly quantized gain Gu', and its index Gindex0 is output to 922. Gu' is also output to the interpolation quantization 920, together with the frame-delayed value from 918.
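Steps 912 to 915 can be sketched as follows. Equation (1) is not reproduced in this section, so its form here is an assumption: an all-pole model A(ω) = G / |1 + Σ a[j] e^(-ijω)|, evaluated at the harmonics lω₀ for l = 1 to L, with A[0] left out. The sketch follows the source in forming log(dg × G). If G scales amplitude rather than power, matching the powers exactly would instead call for the square root of dg.

```python
import cmath
import math


def all_pole_harmonics(a, gain, w0, n_harm):
    """Amplitudes at harmonics l*w0 (l = 1..n_harm) of an all-pole model.

    Assumed form of equation (1): gain / |1 + sum_j a[j] * exp(-1j*j*w)|.
    The sign convention of the source's equation (1) may differ.
    """
    out = []
    for l in range(1, n_harm + 1):
        w = l * w0
        den = 1 + sum(aj * cmath.exp(-1j * (j + 1) * w) for j, aj in enumerate(a))
        out.append(gain / abs(den))
    return out


def corrected_log_gain(a_in, a_quant, gain, w0):
    """Steps 912-915: restore A'[l] from quantized LPC coefficients and the
    unquantized gain G, form dg = sum(A^2) / sum(A'^2), return (log(dg * G), dg).
    a_in holds the input harmonic amplitudes for l = 1..L (an assumption)."""
    restored = all_pole_harmonics(a_quant, gain, w0, len(a_in))
    dg = sum(x * x for x in a_in) / sum(x * x for x in restored)
    return math.log(dg * gain), dg
```

Because dg is measured after LSP quantization, the gain absorbs the level shift that the quantized envelope introduces. This is the effect paragraph 【００２７】 credits with reducing noise at speech onsets and offsets.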

[0036] At 919, differential quantization is performed between the logarithmic LPC gain of the current frame and the quantized logarithmic gain of the immediately preceding (even) frame, which was uniformly quantized. This yields the quantized gain Gd′ and its index Gindex_d. At 920, interpolation gain candidates are generated from the uniformly quantized gains of the preceding and succeeding frames, for example by linear interpolation. The candidate closest to G′ of the current (odd) frame is selected, which yields the quantized gain Gi′ and its index Gindex_i. At 921, the odd frame's two quantized LPC gains Gd′ and Gi′ are compared with the unquantized LPC gain G′. The index of whichever is closer is selected and output to 922 as Gindex1. At 922, Gindex0 is output as the LPC gain code Gindex for even frames and Gindex1 for odd frames. The processing ends at 923.
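The odd-frame branch (steps 919 to 921) can be sketched as below. The text allots 3 bits to differential and interpolation quantization together. The sketch therefore assumes a hypothetical split of the 8 codes: codes 0 to 3 are differential offsets and codes 4 to 7 are interpolation weights. The tables `DIFF_STEPS` and `INTERP_WEIGHTS` are illustrative values, not values from the patent.

```python
from typing import Tuple

# Hypothetical split of the 3-bit odd-frame code: 0-3 differential, 4-7 interpolation.
DIFF_STEPS = (-0.6, -0.2, 0.2, 0.6)     # hypothetical log-gain offsets (step 919)
INTERP_WEIGHTS = (0.2, 0.4, 0.6, 0.8)   # hypothetical weights toward the next frame (step 920)


def quantize_odd_gain(g_cur: float, gu_prev: float,
                      gu_next: float) -> Tuple[int, float]:
    """Return (Gindex1, quantized log gain) for an odd frame."""
    # Step 919: differential quantization against the previous even frame's Gu'.
    diff = [gu_prev + d for d in DIFF_STEPS]
    idx_d = min(range(len(diff)), key=lambda k: abs(g_cur - diff[k]))
    # Step 920: interpolation candidates between the neighbouring even frames.
    interp = [(1.0 - w) * gu_prev + w * gu_next for w in INTERP_WEIGHTS]
    idx_i = min(range(len(interp)), key=lambda k: abs(g_cur - interp[k]))
    gd, gi = diff[idx_d], interp[idx_i]
    # Step 921: keep whichever quantized gain is closer to G'.
    if abs(g_cur - gd) <= abs(g_cur - gi):
        return idx_d, gd
    return len(DIFF_STEPS) + idx_i, gi
```

Take Gu′ values of 5.0 and 6.0 for the neighbouring even frames. An odd-frame gain of 5.75 is then coded by interpolation (code 7, value 5.8). A gain of 4.45, which lies outside the interpolation range, is coded differentially (code 0, value 4.4).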

[0037] With the harmonic spectrum amplitude sequence encoding method of the present invention shown in FIG. 6, the harmonic spectrum amplitude sequences of two frames (or two subframes) can be encoded with, for example, 25 bits in total. Of these, 17 bits go to the LSP code and 8 bits to the LSP gain code (5 bits for uniform quantization plus 3 bits for differential and interpolation quantization). The speech coding parameter encoding procedures shown in FIGS. 4, 5, and 6 quantize the speech coding parameters of two frames (or two subframes) efficiently by combining the direct, differential, and interpolation methods. Together they provide a low-bit quantization method for speech coding parameters that also suppresses the change in frame power caused by quantization. For example, one 20 msec speech frame (equivalent to two 10 msec subframes) can be encoded with 32 bits, which realizes a low-bit-rate speech coding method of 1.6 kbps.
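The bit budget in this paragraph can be checked with a few lines of arithmetic. The remaining 7 bits are an inference that assumes the 32 bits cover every parameter of the 20 ms frame. The text does not say how those bits are divided between pitch and V/UV information.

```python
FRAME_MS = 20             # one frame = two 10 ms subframes
BITS_PER_FRAME = 32       # all parameters for the 20 ms frame
LSP_BITS = 17             # LSP code for two subframes
GAIN_BITS = 5 + 3         # uniform + differential/interpolation gain bits

bitrate_bps = BITS_PER_FRAME * 1000 // FRAME_MS      # 1600 bit/s = 1.6 kbps
harmonic_bits = LSP_BITS + GAIN_BITS                 # 25 bits
remaining_bits = BITS_PER_FRAME - harmonic_bits      # 7 bits

print(bitrate_bps, harmonic_bits, remaining_bits)    # 1600 25 7
```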

[0038] Speech encoded with the speech coding parameter encoding method of the present invention is decoded as follows. The parameter decoding unit 307 in FIG. 8 performs the inverse of the processing described above to restore the speech fundamental frequency (or speech pitch), the voiced/unvoiced information, and the harmonic spectrum amplitude sequence. The speech synthesis unit 308 then uses them to generate synthesized speech. An example of such a speech decoding unit and speech synthesis unit is described below.

[0039] FIG. 7 is a block diagram showing an example configuration of a speech decoding unit and a speech synthesis unit. These decode speech encoded by the speech coding parameter encoding apparatus shown in FIG. 1. The speech coding parameters arrive via a receiving unit (not shown). The voiced/unvoiced information, pitch information, LSP code, and LSP gain code are input to terminals 1001, 1004, 1006, and 1009, respectively. The voiced/unvoiced code 1001 is input to the voiced/unvoiced code decoding unit 1003. This unit restores the voiced/unvoiced information using the representative voiced/unvoiced information codebook 1002, whose contents are identical to those of the representative voiced/unvoiced information codebook 132 in the speech coding parameter encoder of FIG. 1. For a frame whose voiced/unvoiced information was not transmitted, the decoder substitutes the voiced/unvoiced information of whichever of the preceding and succeeding frames has the larger frame power.
The pitch code 1004 is input to the pitch decoding unit 1005. This unit determines from the code value whether uniform or differential quantization was used and restores the pitch by the corresponding inverse quantization. For a frame whose pitch was encoded by interpolation, the pitch code is treated as a pitch interpolation code. The pitch is then restored from the pitches of the preceding and succeeding frames by the inverse of the interpolation performed during encoding.
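The substitution rule for untransmitted V/UV information can be sketched as follows. The sketch represents each V/UV pattern as a list of per-band 0/1 flags and marks a missing frame with `None`. It assumes the decoder already knows each frame's power, for example from the decoded gain. The function name is hypothetical.

```python
from typing import List, Optional, Sequence

VUV = Sequence[int]   # one 0/1 (unvoiced/voiced) flag per band


def fill_missing_vuv(vuv: Sequence[Optional[VUV]],
                     power: Sequence[float]) -> List[VUV]:
    """Replace each untransmitted V/UV pattern (None) with the pattern of
    whichever neighbouring frame has the larger frame power."""
    out: List[VUV] = []
    for n, v in enumerate(vuv):
        if v is not None:
            out.append(v)
            continue
        neighbours = [m for m in (n - 1, n + 1)
                      if 0 <= m < len(vuv) and vuv[m] is not None]
        if not neighbours:
            raise ValueError(f"frame {n}: no transmitted neighbour")
        best = max(neighbours, key=lambda m: power[m])
        out.append(vuv[best])
    return out
```

Preferring the louder neighbour seems reasonable because the louder frame usually dominates the perceived voicing around the gap.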

[0040] The LSP decoding unit 1008 restores the LSP from the LSP code 1006 using the LSP codebook 1007. This codebook has the same contents as the LSP codebook 144 of the speech coding parameter encoder in FIG. 1. For a frame whose LSP was encoded by interpolation, the LSP is restored from the LSPs of the preceding and succeeding frames by the inverse of the encoding operation. The gain decoding unit 1010 decodes the LSP gain from the LSP gain code 1009 by the inverse of the operation of the gain quantization unit 148 in FIG. 1. For a frame whose LSP gain was differentially or interpolation quantized, the decoder determines the corresponding inverse quantization method from the LSP code value and decodes the LSP gain accordingly. The harmonic amplitude calculation unit 1011 then takes the LSP output by the LSP decoding unit 1008 and the LSP gain output by the gain decoding unit 1010. From these, it restores the harmonic amplitude sequence using the LPC synthesis method of equation (1).
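The gain decoding can be sketched as the inverse of the gain quantizer. The sketch uses hypothetical tables, which a real decoder would have to share exactly with its encoder. In the sketch, the code value alone selects the inverse method: codes 0 to 3 are differential and codes 4 to 7 are interpolation. This code layout, the log-gain range, and the function names are all assumptions.

```python
import math

# These tables must match the encoder; all values here are hypothetical.
LEVELS = 32
LOG_MIN, LOG_MAX = 0.0, 12.0
STEP = (LOG_MAX - LOG_MIN) / (LEVELS - 1)
DIFF_STEPS = (-0.6, -0.2, 0.2, 0.6)      # codes 0-3
INTERP_WEIGHTS = (0.2, 0.4, 0.6, 0.8)    # codes 4-7


def decode_even_gain(gindex0: int) -> float:
    """Inverse of the 5-bit uniform quantizer: return Gu'."""
    return LOG_MIN + gindex0 * STEP


def decode_odd_gain(gindex1: int, gu_prev: float, gu_next: float) -> float:
    """The code value selects the inverse method (differential or interpolation)."""
    if gindex1 < len(DIFF_STEPS):
        return gu_prev + DIFF_STEPS[gindex1]
    w = INTERP_WEIGHTS[gindex1 - len(DIFF_STEPS)]
    return (1.0 - w) * gu_prev + w * gu_next


def to_linear_gain(log_gain: float) -> float:
    """Undo the natural logarithm assumed at the encoder."""
    return math.exp(log_gain)
```

Interpolation needs Gu′ of the following even frame. A decoder built this way would therefore have to buffer one frame before it can output an odd frame.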

[0041] The unvoiced gain setting unit 1012 treats the output of the noise generation unit 1013 as a random noise spectrum. For each frequency band whose voiced/unvoiced signal indicates unvoiced, it adjusts the level of the noise spectrum band by band. The goal is for the noise spectrum power to match the power of the harmonic amplitude sequence in the corresponding band. The inverse FFT unit 1014 extends the level-adjusted noise spectrum to negative frequencies. The real part is mirrored symmetrically about zero frequency, and the imaginary part is mirrored with inverted polarity. The unit then performs an inverse FFT and keeps only the real part of the result as the time-domain speech signal. The frame interpolation unit 1015 interpolates and synthesizes these time-domain signals across frames to obtain the synthesized speech signal of the unvoiced part.
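The band-level adjustment and the symmetric extension can be sketched as follows. Mirroring the real part evenly and the imaginary part oddly is the same as setting X[N−k] = conj(X[k]), which makes the inverse transform real. The sketch uses a naive inverse DFT in place of an FFT to stay dependency-free. Gaussian noise stands in for unit 1013 as an assumption, and the function names are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence


def random_noise(n: int, rng: random.Random) -> List[complex]:
    """Complex Gaussian noise bins (an assumed model for unit 1013)."""
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def scale_to_power(bins: Sequence[complex], target_power: float) -> List[complex]:
    """Unit 1012: scale one band's noise so its power equals target_power,
    e.g. the sum of squared harmonic amplitudes in that band."""
    p = sum(abs(x) ** 2 for x in bins)
    s = math.sqrt(target_power / p) if p > 0 else 0.0
    return [s * x for x in bins]


def hermitian_ifft(half: Sequence[complex]) -> List[float]:
    """Unit 1014: extend bins 0..N/2 to negative frequencies with
    X[N-k] = conj(X[k]) (real part even, imaginary part odd), take a
    naive inverse DFT and keep the real part."""
    n_half = len(half) - 1
    size = 2 * n_half
    spec = [complex(x) for x in half]
    spec[0] = complex(spec[0].real, 0.0)            # DC bin must be real
    spec[n_half] = complex(spec[n_half].real, 0.0)  # Nyquist bin must be real
    spec += [spec[k].conjugate() for k in range(n_half - 1, 0, -1)]
    out = []
    for t in range(size):
        acc = sum(spec[k] * cmath.exp(2j * math.pi * k * t / size)
                  for k in range(size))
        out.append((acc / size).real)
    return out
```

For example, a single unit bin at k = 1 with N = 8 produces 0.25·cos(2πt/8), which is a purely real cosine.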

[0042] Meanwhile, the voiced gain setting unit 1016 handles the frequency bands whose voiced/unvoiced signal indicates voiced. For these bands, it sets the harmonic amplitude values to the values of the harmonic amplitude sequence in the corresponding band, as obtained from the harmonic amplitude calculation unit. It sets the harmonic amplitude values in all other bands to zero. The harmonic synthesis unit 1017 generates one sine wave per harmonic. Each wave takes its initial phase from the phase reproduction unit 1018 and its amplitude from the harmonic amplitude value set by the voiced gain setting unit 1016. The unit sums these waves to obtain the synthesized speech signal of the voiced bands. The phase reproduction unit 1018 sets the initial phase of each harmonic sine wave so that each harmonic's phase stays continuous across frames. At the same time, it perturbs that continuity to prevent the buzzy sound that results from simple sine-wave synthesis. The frame interpolation unit 1019 smooths amplitude changes between frames and so prevents abrupt level changes. The adder 1020 adds the synthesized signals of the voiced and unvoiced bands. The post-filter unit 1021 then applies filtering that improves perceived sound quality, and the final synthesized speech signal is output to terminal 1022.
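Units 1016 to 1018 can be sketched as below. The sketch assumes that the pitch is constant within a frame, that harmonic l lies at (l + 1)·w0 radians per sample, and that the phase disturbance is a small uniform random offset. Under these assumptions, the next frame's initial phase is the current phase advanced by the frame length, so each harmonic continues without a jump. A real coder would also interpolate pitch and amplitude across frames (unit 1019). The function names and the `jitter` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def voiced_gains(amps: Sequence[float], band_of: Sequence[int],
                 vuv: Sequence[int]) -> List[float]:
    """Unit 1016: keep a harmonic's amplitude only if its band is voiced."""
    return [a if vuv[band_of[l]] else 0.0 for l, a in enumerate(amps)]


def synthesize_voiced(amps: Sequence[float], w0: float, n_samples: int,
                      phases: Sequence[float],
                      rng: Optional[random.Random] = None,
                      jitter: float = 0.0) -> Tuple[List[float], List[float]]:
    """Unit 1017: sum of cosines, harmonic l at (l + 1) * w0 rad/sample.
    Returns the frame and the initial phases for the next frame (unit 1018)."""
    out = [0.0] * n_samples
    for l, (a, ph) in enumerate(zip(amps, phases)):
        w = (l + 1) * w0
        for t in range(n_samples):
            out[t] += a * math.cos(w * t + ph)
    next_phases = []
    for l, ph in enumerate(phases):
        nxt = ph + (l + 1) * w0 * n_samples      # continue the phase across the boundary
        if rng is not None and jitter > 0.0:
            nxt += rng.uniform(-jitter, jitter)  # small disturbance against buzziness
        next_phases.append(math.remainder(nxt, 2.0 * math.pi))
    return out, next_phases
```

With `jitter=0.0`, consecutive frames join seamlessly. The first sample of the next frame equals the sample the current frame would have produced one step past its end.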

[0043] In the above description, the harmonic spectrum amplitude sequence is quantized as line spectrum pairs (LSP) and a gain. It may instead be quantized as linear prediction coefficients (LPC coefficients) and a gain. Also, for ease of understanding, several units in FIG. 1 have been described as switching on a frame-by-frame basis: the input switching unit 103, pitch code switching unit 110, frame thinning unit 133, LSP code switching unit 146, gain code switching unit 151, and so on. Switching is not limited to every frame, however. Engineers and researchers in the field can easily change the necessary parts to switch at a different period.

【００４４】[0044]

[Effects of the Invention] As described above, the speech coding parameter encoding method and apparatus of the present invention apply to analysis-synthesis speech coding. In such coding, each speech frame is represented by speech coding parameters consisting of the speech pitch, the V/UV information of each spectral band, and the harmonic spectrum amplitude sequence. The speech pitch is encoded by switching between two kinds of frames: frames in which the logarithmic pitch is differentially or uniformly quantized, and frames quantized by an inter-frame interpolation index. This greatly reduces the number of coded bits.
In addition, the V/UV information is encoded as the index number of a representative V/UV value prepared in advance for each number of bands, where the number of bands is determined by the range of the speech fundamental frequency. This reduces the number of V/UV coded bits in a rational way. Furthermore, the V/UV information is thinned out frame by frame, and the V/UV information of frames in which it is not transmitted is inferred from the preceding and succeeding frames. This reduces the number of V/UV coded bits still further.

[0045] The 0th harmonic amplitude value of the speech harmonic spectrum amplitude sequence is first corrected to a value that is easy to model. The sequence is then modeled by an autoregressive linear prediction model and represented by the model coefficients, namely the linear prediction coefficients and the gain.
The linear prediction coefficients are encoded with a small number of bits by combining two kinds of frames. In the first kind, they are vector-quantized and transmitted. In the second kind, they are quantized by the index number of the interpolation candidate with the smallest error, where the candidates are obtained by inter-frame interpolation from already quantized linear prediction coefficients.
For the gain, the harmonic spectrum amplitude sequence is first restored from the quantized linear prediction coefficients. The gain is then corrected by the rate of change in frame power caused by modeling and by quantization of the linear prediction coefficients. Its logarithm is quantized by combining two kinds of frames. In the first kind, the log gain is uniformly quantized. In the second kind, it is quantized by whichever of two methods gives the smaller error: differential quantization from the previous frame, or interpolation quantization from the preceding and succeeding frames. Together, these measures allow the speech harmonic spectrum amplitude sequence to be encoded with few bits while suppressing the change in each frame's speech level caused by quantizing the sequence.
As described above, the present invention provides an analysis-synthesis speech encoding method and apparatus that greatly reduce the encoding bit rate. It also provides a high-quality analysis-synthesis speech encoding method and apparatus that shorten the frame update period of speech encoding to improve coded speech quality while preventing an increase in the number of coded bits.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech coding parameter encoding apparatus to which the speech coding parameter encoding method of the present invention is applied.

FIG. 2 is a diagram for explaining the creation of interpolation pitch candidates.

FIG. 3 is a diagram for explaining a method of deleting voiced/unvoiced information that occurs infrequently.

FIG. 4 is a flowchart of the speech pitch encoding process.

FIG. 5 is a flowchart of the voiced/unvoiced information encoding process.

FIG. 6 is a flowchart of the harmonic spectrum amplitude sequence encoding process.

FIG. 7 is a block diagram showing an example configuration of a speech decoding unit and a speech synthesis unit that decode speech encoded according to the present invention.

FIG. 8 is a configuration diagram of a general speech coding transmission apparatus.

FIG. 9 is a block diagram of a speech coding parameter extraction unit.

FIG. 10 is a block diagram of a parameter encoding unit in a conventional speech encoding apparatus.

[Explanation of symbols]

102 logarithmic conversion unit, 103 input switching unit, 105 uniform quantization unit, 108 pitch comparison unit, 110 pitch code switching unit, 111 delay unit, 112 subtraction unit, 113 differential quantization unit, 114 addition unit, 117 interpolation pitch candidate creation unit, 119 interpolation point comparison unit, 132 representative voiced/unvoiced information codebook, 133 frame thinning unit, 134 voiced/unvoiced comparison unit, 141 spectrum correction unit, 142 linear prediction modeling unit, 143 LSP quantization unit, 144 LSP codebook, 145 LSP interpolation unit, 146 LSP code switching unit, 147 gain correction unit, 148 gain quantization unit, 149 gain differential quantization unit, 150 gain interpolation quantization unit, 151 gain code switching unit, 152 error calculation unit, 153 harmonic restoration unit

Continuation of front page. (72) Inventor: Seiji Sasaki, c/o YRP Advanced Mobile Communication Systems Research Laboratories Co., Ltd., 3-2 Hikarinooka, Yokosuka-shi, Kanagawa. F-terms (reference): 5D045 CC07 DA06 DA11; 5J064 AA01 BA13 BA16 BB01 BB03 BB04 BB12 BC11 BC16 BD02

Claims

[Claims]

1. A speech coding parameter encoding method for encoding speech coding parameters obtained from a speech signal that has been digitized and divided into frames of a predetermined time length, the method comprising the steps of:
encoding a speech pitch, as one of the speech coding parameters, by a combination of frames whose quantized pitch is obtained by one of a differential quantization method and a uniform quantization method, selected between the two, and frames quantized by the index of an interpolation pitch selected from a plurality of interpolation pitch candidates calculated from the quantized pitches of the preceding and succeeding frames;
encoding voiced/unvoiced information, as one of the speech coding parameters, by the index of representative voiced/unvoiced information selected from a limited number of representative voiced/unvoiced information entries;
separating a harmonic spectrum amplitude sequence, as one of the speech coding parameters, into linear prediction coefficients obtained by a linear prediction model, or line spectrum pairs derived from them, and a gain, and encoding the linear prediction coefficients or line spectrum pairs by a combination of frames quantized by a quantizer such as a vector quantizer and frames interpolation-quantized by the index of a candidate point selected from a plurality of candidate points obtained by a linear interpolator from the quantized linear prediction coefficients or line spectrum pairs of the preceding and succeeding frames;
correcting the gain according to the change in the harmonic spectral power of the frame caused by quantization of the linear prediction coefficients, to obtain a correction gain; and
encoding the correction gain, after taking its logarithm, by a combination of frames uniformly quantized as-is by a first gain quantizer and frames quantized by a second gain quantizer, the second gain quantizer quantizing with whichever of the following has the smaller error: the output value of a differential quantizer based on the quantized gain of the first gain quantizer in the previous frame, or the output value of an interpolation quantizer obtained by selecting among a plurality of interpolation candidates derived from the outputs of the first gain quantizer in the preceding and succeeding frames.

2. The speech coding parameter encoding method according to claim 1, wherein the step of obtaining the correction gain calculates the correction gain by multiplying the gain by a ratio of two harmonic spectral powers: the harmonic spectral power obtained as the sum of squares of the harmonic amplitude sequence before linear prediction modeling, to the harmonic spectral power obtained as the sum of squares of the harmonic spectrum amplitude values given by the linear prediction model using the quantized linear prediction coefficients and the unquantized gain.

3. A speech coding parameter encoding apparatus for encoding speech coding parameters obtained from a speech signal that has been digitized and divided into frames of a predetermined time length, the apparatus comprising:
means for encoding a speech pitch, as one of the speech coding parameters, by a combination of frames quantized by one of a differential quantization method and a uniform quantization method, selected between the two, and frames quantized by the index of an interpolation pitch selected from a plurality of interpolation pitch candidates calculated from the quantized pitches of the preceding and succeeding frames;
means for encoding voiced/unvoiced information, as one of the speech coding parameters, by the index of representative voiced/unvoiced information selected from a limited number of representative voiced/unvoiced information entries;
means for separating a harmonic spectrum amplitude sequence, as one of the speech coding parameters, into linear prediction coefficients obtained by a linear prediction model, or line spectrum pairs derived from them, and a gain, and means for encoding the linear prediction coefficients or line spectrum pairs by a combination of frames quantized by a quantizer such as a vector quantizer and frames interpolation-quantized by the index of a candidate point selected from a plurality of candidate points obtained by a linear interpolator from the quantized linear prediction coefficients or line spectrum pairs of the preceding and succeeding frames;
means for correcting the gain according to the change in the harmonic spectral power of the frame caused by quantization of the linear prediction coefficients, to obtain a correction gain; and
means for encoding the correction gain, after taking its logarithm, by a combination of frames uniformly quantized as-is by a first gain quantizer and frames quantized by a second gain quantizer, the second gain quantizer quantizing with whichever of the following has the smaller error: the output value of a differential quantizer based on the quantized gain of the first gain quantizer in the previous frame, or the output value of an interpolation quantizer obtained by selecting among a plurality of interpolation candidates derived from the outputs of the first gain quantizer in the preceding and succeeding frames.