JP3453116B2

JP3453116B2 - Audio encoding method and apparatus

Info

Publication number: JP3453116B2
Application number: JP2000292666A
Authority: JP
Inventors: 照夫麓; 佐々木誠司
Original assignee: Matsushita Communication Industrial Co Ltd
Current assignee: Panasonic Mobile Communications Co Ltd
Priority date: 2000-09-26
Filing date: 2000-09-26
Publication date: 2003-10-06
Anticipated expiration: 2020-09-26
Also published as: JP2002099300A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech encoding method and apparatus that digitize a speech signal and encode, at predetermined time intervals, speech coding parameters representing its characteristics. The invention is suited to digital mobile telephones, digital voice storage devices, and similar equipment that transmit or store the encoded speech coding parameters, restore them at the transmission or storage destination when needed, and synthesize a speech signal from the restored parameters to convey speech.

[0002]

[Prior Art] Because a digitized speech signal can be subjected to a wide range of digital signal processing, such as data compression, error handling, and multiplexing, digitized speech has been widely adopted not only in fixed and mobile telephones but also in multimedia systems that use speech. Digitizing an analog speech signal generally requires sampling at a frequency at least twice the input speech bandwidth and quantizing with a step small enough to be inaudible, so the digital signal needs a wider transmission bandwidth than the analog signal. For this reason, once digitized, the speech signal is compressed by various coding and modulation schemes according to the required speech quality.

[0003] As ways of obtaining a high compression ratio for speech data, analysis-synthesis speech coding schemes that actively exploit the characteristics of speech, and methods for efficiently quantizing the speech coding parameters they produce, have been devised. For example, the MBE (Multi-Band Excitation) scheme and the IMBE (Improved Multi-Band Excitation) scheme, which are used in some satellite mobile telephones, are analysis-synthesis speech coding schemes of this kind. Speech is divided into segments at a predetermined time interval (20 msec) to form frames, and for each frame the following are taken as speech coding parameters: the speech pitch (or its reciprocal, the speech fundamental frequency); the sequence of speech harmonic spectral amplitudes obtained from the frequency spectrum of the frame; and voiced/unvoiced information (V/UV information) for each frequency band obtained by dividing the spectrum into suitable frequency regions. For each frame, the speech pitch is uniformly quantized with 8 bits; the per-band V/UV information v[k] (k is the band number) is quantized with K bits as a binary value made up of 0/1 flags (K is the number of bands, a variable length of at most 12 bits); and the harmonic amplitude sequence is quantized with 75-K bits by arranging the inter-frame prediction differences in two dimensions and quantizing their DCT (discrete cosine transform) coefficients. This gives a speech coding rate of 4.15 kbps.
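The bit allocation in the preceding paragraph can be checked with simple arithmetic. The following is a minimal sketch in Python; the function and constant names are illustrative and do not come from the patent. It shows that 8 pitch bits, K V/UV bits, and 75-K amplitude bits always total 83 bits per 20 msec frame, which corresponds to 4.15 kbps.

```python
FRAME_SEC = 0.020   # 20 msec frame length, as stated in the text
PITCH_BITS = 8      # uniform quantization of the speech pitch


def bits_per_frame(k_bands: int) -> int:
    """Total parameter bits per frame for K frequency bands (1 <= K <= 12)."""
    if not 1 <= k_bands <= 12:
        raise ValueError("K must be between 1 and 12")
    vuv_bits = k_bands             # one 0/1 flag per band
    amplitude_bits = 75 - k_bands  # remaining bits for the DCT coefficients
    return PITCH_BITS + vuv_bits + amplitude_bits


def bit_rate_bps(frame_bits: int, frame_sec: float = FRAME_SEC) -> float:
    """Coding rate in bits per second for a given frame size."""
    return frame_bits / frame_sec


if __name__ == "__main__":
    print(bits_per_frame(12), round(bit_rate_bps(bits_per_frame(12))))
```

Because the V/UV and amplitude allocations trade off against each other, the total is independent of K.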

[0004] FIG. 7 shows the configuration of a typical speech coding transmission apparatus. A sampled and quantized digital speech signal input from speech input terminal 301 is divided by speech coding parameter extraction unit 302 into segments of a predetermined time interval to form frames, and speech coding parameters are extracted for each frame. The parameters extracted depend on the speech coding scheme; in the MBE scheme described above, for example, they are the speech pitch, the speech harmonic spectral amplitude sequence, and the V/UV information of each frequency band. Parameter encoding unit 303 encodes the extracted parameters efficiently to reduce the code size and sends them to transmission path 305 via transmitting unit 304. From the signal received by receiving unit 306, parameter decoding unit 307 restores the speech coding parameters. Speech synthesis unit 308 then produces synthesized speech by the inverse of the operation of speech coding parameter extraction unit 302 and outputs a digital speech signal from speech output terminal 309.

[0005] FIG. 8 is a block diagram of speech coding parameter extraction unit 302 for the MBE scheme. The digital input speech signal is fed from input terminal 301 to fundamental frequency estimation unit 401, which estimates the fundamental frequency of the speech. The estimate is calculated as the reciprocal of the time lag at which the autocorrelation function is maximized. Frequency spectrum calculation unit 402 performs frequency analysis on the finite-length speech signal cut out of the frame by a window function such as a Hamming window and obtains the speech frequency spectrum. Fundamental frequency refinement unit 403 uses the A-b-S (Analysis-by-Synthesis) method to determine simultaneously the refined speech fundamental frequency ωo and the harmonic spectral amplitude sequence, under the condition that the error between the speech frequency spectrum and the spectrum synthesized from the estimated fundamental frequency and the window function is minimized. Voicing strength calculation unit 404 divides the frequency range into a plurality of frequency bands k (k = 1, 2, ..., K) based on the refined fundamental frequency ωo, calculates for each band the error between the synthesized spectrum and the speech frequency spectrum, and outputs the V/UV information v[k] by threshold decision. Based on v[k], spectral envelope calculation unit 405 outputs the spectral envelope magnitude |A(ω)|: in voiced bands, the harmonic spectral amplitudes obtained by the A-b-S method; in unvoiced bands, the root mean square of the frequency spectrum over the frequency range of each harmonic.
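The initial fundamental frequency estimate described above (the reciprocal of the lag that maximizes the autocorrelation) could be sketched as follows. This is an illustration only: the search range of 60 to 400 Hz, the fixed summation window, and all names are assumptions, and the A-b-S refinement performed by unit 403 is not shown.

```python
import math


def estimate_f0(samples, sample_rate, window, f_min=60.0, f_max=400.0):
    """Return sample_rate / lag for the lag with the largest autocorrelation.

    The same number of products (`window`) is summed for every lag so that
    short and long lags are compared on an equal footing.
    """
    lag_min = int(sample_rate // f_max)
    lag_max = int(sample_rate // f_min)
    if len(samples) < window + lag_max:
        raise ValueError("need at least window + lag_max samples")

    def autocorr(lag):
        return sum(samples[i] * samples[i + lag] for i in range(window))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return sample_rate / best_lag


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
    print(estimate_f0(tone, fs, window=240))
```

A real speech frame is not a pure tone, which is one reason the scheme follows this coarse estimate with a spectral refinement step.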

[0006] FIG. 9 is a block diagram of parameter encoding unit 303 for the IMBE scheme. The speech fundamental frequency ωo input to input terminal 501 is uniformly quantized to 8 bits by fundamental frequency quantization unit 502 using a predetermined quantization range and step, and the quantized value B0 is output to output terminal 503. The V/UV information v[k] input to input terminal 504 is converted by V/UV information quantization unit 505 into a binary value; when the number of frequency bands K is 12, for example, the twelve 0/1 flags form a 12-bit binary value B1, which is output to output terminal 506. The spectral envelope |A(ω)| input to input terminal 507 arrives as a discrete harmonic spectral amplitude sequence |A(ωi)| (i = 1, 2, 3, ..., N; N is the number of harmonics). It is first log-transformed by logarithmic conversion unit 508. Subtractor 509 then calculates its difference from predicted harmonic spectral amplitude sequence 521, which is predicted from the harmonic spectral amplitude sequence of the previous frame (this difference is called the "prediction difference sequence" below), and passes it to block conversion unit 510. Block conversion unit 510 sorts the prediction difference sequence by harmonic order into six groups to form two-dimensional data and passes it to DCT (discrete cosine transform) unit 511. The DCT coefficients calculated there are passed to quantization unit 512, which quantizes them with a combination of uniform quantization and vector quantization chosen in advance according to the order of each DCT coefficient, and the quantized data B2 of the prediction difference sequence is output to output terminal 513. Dequantization unit 514, inverse DCT unit 515, and block restoration unit 516 restore the quantized prediction difference sequence, and adder 517 adds it to predicted harmonic spectral amplitude sequence 521 to restore the quantized spectral envelope of the current frame. Frame delay unit 518 delays this quantized spectral envelope by one frame. Spectral envelope prediction unit 519 then predicts the spectral envelope of the next frame from the newly input fundamental frequency ωo of the next frame and the fundamental frequency of the previous frame, and feeds the prediction to subtractor 509 in preparation for quantizing the spectral envelope of the next frame.
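The two simplest quantizers in FIG. 9, the 8-bit uniform quantizer for ωo and the packing of the V/UV flags into a binary value, might look like the sketch below. The quantization range, the mid-point reconstruction, and the choice of the lowest band as the most significant bit are assumptions made for illustration; the patent text does not specify them here.

```python
def uniform_quantize(value, lo, hi, bits):
    """Uniformly quantize `value` over [lo, hi) with 2**bits levels.

    Returns (index, reconstructed value). Out-of-range inputs are clamped.
    Reconstruction at the cell mid-point is an assumption.
    """
    levels = 1 << bits
    step = (hi - lo) / levels
    index = int((value - lo) / step)
    index = min(max(index, 0), levels - 1)
    return index, lo + (index + 0.5) * step


def pack_vuv(flags):
    """Pack per-band 0/1 voicing flags into one integer (B1 in the text).

    Assumption: the first flag (lowest band) becomes the most significant bit.
    """
    value = 0
    for flag in flags:
        value = (value << 1) | (1 if flag else 0)
    return value


if __name__ == "__main__":
    print(uniform_quantize(0.3, 0.0, 1.0, 8))
    print(bin(pack_vuv([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])))
```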

[0007] The principle of the MBE scheme is described in D. W. Griffin and J. S. Lim, "Multi-band Excitation Vocoder", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 8, August 1988, pp. 1223-1235. The construction of the encoder is disclosed in detail, as the IMBE speech coding procedure, in USP-5491722 (Methods for speech transmission, Feb. 13, 1996). Thus, as a way of digitizing speech and achieving low-bit-rate speech coding, analysis-synthesis speech coding schemes that extract and encode speech coding parameters based on a speech synthesis model have been proposed, and some are in practical use.

[0008]

[Problems to be Solved by the Invention] The analysis-synthesis speech coding schemes described above are effective for low-bit-rate speech coding. However, because they analyze and synthesize speech using only the speech synthesis parameters of a particular analysis-synthesis model, the output tends to sound synthetic, depending on how the coding scheme is configured. One conceivable remedy is to shorten the speech frame update period so that the speech parameters change less within a frame, which improves sound quality while the scheme remains of the analysis-synthesis type. The improvement in coded speech quality obtained with a shorter frame update period is reported in Fumoto (麓) et al., "A Study of Speech Coding Schemes for Professional Mobile Communications" (in Japanese), IEICE 2000 National Convention, D14-2, p. 171, Mar. 2000.

[0009] However, shortening the speech frame update period runs counter to achieving high compression (a low bit rate). In the report by Fumoto et al. cited above, for example, each speech frame is split into two subframes from which the speech coding parameters are extracted, so encoding the parameters by the conventional method would require twice the bit rate. A parameter encoding method is therefore desired in which the number of coded bits does not grow excessively even when the speech segment length is shortened. In short, improving the speech quality of low-bit-rate analysis-synthesis speech coding requires an efficient method of encoding the analysis-synthesis speech coding parameters, and in particular a way of preventing the coding bit rate from increasing when the frame update period is shortened.

[0010] An object of the present invention is therefore to provide an analysis-synthesis speech encoding method and apparatus that can greatly reduce the coding bit rate. A further object is to provide an analysis-synthesis speech encoding method and apparatus with good sound quality, in which coded speech quality is improved by shortening the frame update period of speech coding while an increase in the number of coded bits is prevented.

[0011]

[Means for Solving the Problems] To solve the above problems, the speech encoding method of the present invention acquires and encodes speech coding parameters from a speech signal that has been digitized and divided into frames of a predetermined length, and includes: a step of encoding the speech pitch, as one of the speech coding parameters, by a combination of frames quantized by whichever of a differential quantization method and a uniform quantization method is selected, and frames quantized by the index number of one candidate selected from a plurality of interpolated speech pitch candidates calculated from the quantized speech pitches of the preceding and following frames; a step of encoding the voiced/unvoiced information, as one of the speech coding parameters, by the index number of the closest entry among a limited number of representative voiced/unvoiced patterns; and a step of encoding the harmonic spectral amplitude sequence, as one of the speech coding parameters, by a combination of frames quantized by the linear prediction coefficients of a linear prediction model (or the line spectrum pairs derived from them) and a gain, and frames quantized by the index number of one combination selected from the combinations of a plurality of interpolated linear prediction coefficient (or line spectrum pair) candidates, calculated from the linear prediction coefficients (or line spectrum pairs) of the preceding and following frames, with a plurality of interpolated gain candidates, calculated from the gains of the preceding and following frames.

[0012] In quantizing the speech pitch, the choice between the differential and uniform quantization methods is made as follows: the differential method is selected when the differential quantization error is at or below a certain threshold; otherwise, whichever method gives the smaller quantization error is selected. The interpolated speech pitch candidate is selected by choosing, from the plurality of candidates calculated from the speech pitches of the preceding and following frames, the one closest to the speech pitch of the current frame. The speech pitch is quantized using the logarithmic pitch obtained by log-transforming the speech pitch. In the differential quantization of the speech pitch, the quantization step is made larger as the difference value becomes larger.
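The selection rule in the preceding paragraph can be written down directly. The sketch below is illustrative only: the function name, the string labels, and the threshold value used in the example are assumptions, and the two reconstructed values are taken as given rather than computed.

```python
def select_pitch_quantizer(p, p_uniform, p_diff, threshold):
    """Choose between differential and uniform quantization of log pitch p.

    p_uniform and p_diff are the values each quantizer would reconstruct.
    The differential result is accepted outright when its error is at or
    below `threshold`; otherwise the method with the smaller error wins.
    """
    err_uniform = abs(p - p_uniform)
    err_diff = abs(p - p_diff)
    if err_diff <= threshold:
        return "differential"
    return "differential" if err_diff < err_uniform else "uniform"


if __name__ == "__main__":
    # Differential error 0.004 is within the (hypothetical) threshold 0.005.
    print(select_pitch_quantizer(6.00, 6.001, 6.004, 0.005))
    # Differential error 0.5 exceeds the threshold and the uniform error.
    print(select_pitch_quantizer(6.00, 6.01, 6.50, 0.005))
```

Note that under this rule the differential method can be chosen even when the uniform quantizer happens to be slightly more accurate, as long as the differential error stays within the threshold.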

[0013] The limited number of representative voiced/unvoiced patterns is created from a large number of voiced/unvoiced patterns collected in advance. Patterns are deleted in ascending order of occurrence frequency, and the occurrence frequency of each deleted pattern is distributed to and merged into the adjacent patterns according to the distance to them and their magnitude, until the set has been reduced to the desired limited number. The distance between voiced/unvoiced patterns is defined by representing voiced/unvoiced in each speech spectrum band as 0 or 1, assigning these values to the bits of a binary number in order of band frequency, and taking the difference between the resulting binary values. The voiced/unvoiced information is composed of a combination of frames in which it is sent and frames in which it is omitted. A frame in which the voiced/unvoiced information is omitted is decoded using the voiced/unvoiced information of whichever of the preceding and following frames has the larger frame energy or mean amplitude. When the harmonic spectral amplitude sequence is modeled by a linear prediction model, the zeroth-order (DC component) harmonic spectral amplitude is corrected before the linear prediction modeling is performed. The plurality of interpolated gain candidates includes gains that divide the interval between the gains of the preceding and following frames evenly or unevenly, a gain at or above the larger of the two frame gains, and a gain at or below the smaller.
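The V/UV encoding described above, where a pattern is replaced by the index of the nearest representative pattern and distance is the difference of the binary values, could be sketched as follows. The four-entry codebook is made up for illustration, and the tie-breaking choices (first minimum wins, preceding frame wins on equal energy) are assumptions.

```python
# Hypothetical representative V/UV patterns for K = 12 bands.
REPRESENTATIVE_VUV = [
    0b000000000000,  # all bands unvoiced
    0b111000000000,  # lowest three bands voiced
    0b111111000000,  # lowest six bands voiced
    0b111111111111,  # all bands voiced
]


def vuv_distance(a, b):
    """Distance between two V/UV patterns: difference of their binary values."""
    return abs(a - b)


def encode_vuv(pattern, codebook=REPRESENTATIVE_VUV):
    """Return the index of the representative pattern closest to `pattern`."""
    return min(range(len(codebook)),
               key=lambda i: vuv_distance(codebook[i], pattern))


def fill_omitted_vuv(prev_vuv, prev_energy, next_vuv, next_energy):
    """V/UV for a frame whose V/UV was not sent: copy the stronger neighbour."""
    return prev_vuv if prev_energy >= next_energy else next_vuv


if __name__ == "__main__":
    print(encode_vuv(0b111000100000))
    print(bin(fill_omitted_vuv(0b111000000000, 2.0, 0b111111000000, 5.0)))
```

With this distance, bits assigned to the more significant positions dominate, so the bit ordering determines which bands matter most when matching patterns.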

[0014] The speech encoding apparatus of the present invention acquires and encodes speech coding parameters from a speech signal that has been digitized and divided into frames of a predetermined length, and includes: means for encoding the speech pitch, as one of the speech coding parameters, by a combination of frames quantized by whichever of a differential quantization method and a uniform quantization method is selected, and frames quantized by the index number of one candidate selected from a plurality of interpolated speech pitch candidates calculated from the quantized speech pitches of the preceding and following frames; means for encoding the voiced/unvoiced information, as one of the speech coding parameters, by the index number of the closest entry among a limited number of representative voiced/unvoiced patterns; and means for encoding the harmonic spectral amplitude sequence, as one of the speech coding parameters, by a combination of frames quantized by the linear prediction coefficients of a linear prediction model (or the line spectrum pairs derived from them) and a gain, and frames quantized by the index number of one combination selected from the combinations of a plurality of interpolated linear prediction coefficient (or line spectrum pair) candidates, calculated from the linear prediction coefficients (or line spectrum pairs) of the preceding and following frames, with a plurality of interpolated gain candidates, calculated from the gains of the preceding and following frames.

[0015] Thus, in the present invention, the speech pitch is encoded on the log-transformed pitch by alternating, from frame to frame, between two cases. In one, differential and uniform quantization are switched and whichever gives the smaller error from the input speech pitch is selected. In the other, the number of the interpolation point candidate closest to the input is selected from a plurality of interpolation points between the frame pitches, and that selection number is used as the quantized value. This reduces the number of bits used to quantize the speech pitch. For the voiced/unvoiced information (V/UV information), V/UV patterns and their occurrence frequencies are collected in advance over many speech frames, a fixed number of representative V/UV patterns is selected from them, and the V/UV information of each frame is encoded as the number (index) of the representative pattern most similar to it. In addition, frames in which no V/UV information is transmitted are inserted as appropriate, and such a frame is decoded using the V/UV information of whichever of the preceding and following frames has the larger speech energy. For the harmonic spectral amplitude sequence, the sequence is modeled by a high-order all-pole model, that is, an autoregressive linear prediction model (AR model), and either the linear prediction coefficients (LPC coefficients) and gain, or the LSPs (line spectrum pairs) derived from the LPC coefficients and the gain, are quantized. The number of bits used to quantize the harmonic spectral amplitude sequence is further reduced by also using, for the LPC coefficients or LSPs and the gain of a frame, quantization by the selection number of the closest candidate among a plurality of interpolation points between the quantized LPC coefficients or LSPs and gains of the neighboring frames.

[0016]

[Embodiments of the Invention] An embodiment of the speech encoding method and apparatus of the present invention is described below, taking as an example its application to the MBE or IMBE speech coding method, the analysis-synthesis speech coding method described above. This speech encoding apparatus corresponds to parameter encoding unit 303 in FIG. 7. It efficiently encodes the speech coding parameters extracted by speech coding parameter extraction unit 302, namely the speech pitch (or its reciprocal, the speech fundamental frequency ωo), the speech harmonic spectral amplitude sequence, and the voiced/unvoiced information (V/UV information) of each frequency band.

[0017] FIG. 1 is a block diagram showing an example configuration of a speech encoding apparatus to which the speech encoding method of the present invention is applied. The speech pitch (or speech fundamental frequency ωo) obtained, for example, by the speech coding parameter extraction unit of FIG. 8 is input to input terminal 101 and log-transformed by logarithmic conversion unit 102 to give the logarithmic speech pitch P[n] (n is the frame number). As described in the literature (Thomas Eriksson and Hong-Goo Kang, "Pitch Quantization in low Bit-Rate Speech Coding", ICASSP '99, pp. 489-492, 1999), the human detection threshold for a change in logarithmic pitch is known to depend little on the value of the logarithmic pitch itself. The logarithm is therefore a convenient transform, because it allows a uniform quantization step size.
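One way to see why a uniform step suits the logarithmic domain is the short derivation below. It is a sketch that assumes a natural logarithm and a rounding quantizer with step Δ; the patent text fixes neither the base nor the reconstruction rule, and the symbols p, Δ, and the hats are introduced here for illustration.

```latex
% p[n]: speech pitch, P[n] = \ln p[n]: logarithmic pitch,
% \Delta: uniform step in the log domain (assumed symbols).
\begin{align*}
  \hat{P}[n] &= \Delta \cdot \operatorname{round}\!\left(\frac{P[n]}{\Delta}\right),
  \qquad \bigl|\hat{P}[n] - P[n]\bigr| \le \frac{\Delta}{2}, \\
  \frac{\hat{p}[n]}{p[n]} &= e^{\hat{P}[n] - P[n]}
  \in \left[e^{-\Delta/2},\, e^{\Delta/2}\right]
  \quad\Longrightarrow\quad
  \frac{\bigl|\hat{p}[n] - p[n]\bigr|}{p[n]} \le e^{\Delta/2} - 1 .
\end{align*}
```

The bound on the relative pitch error is the same for every pitch value, which matches a detection threshold that depends little on the pitch itself.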

[0018] Input switching unit 103 switches the logarithmic pitch P[n] alternately, frame by frame (or subframe by subframe when subframes are used), to one of two output terminals, 104 or 116. When output to 104, it is routed to uniform quantization unit 105 and subtraction unit 112. Uniform quantization unit 105 quantizes it uniformly with a fixed quantization step, and the resulting quantized logarithmic pitch P1'[n] is input to pitch comparison unit 108. Subtraction unit 112, meanwhile, forms the differential logarithmic pitch from the input logarithmic pitch and the quantized logarithmic pitch of the previous frame received from delay unit 111, and inputs it to differential quantization unit 113. Delay unit 111 passes the quantized logarithmic pitch of the immediately preceding frame to the current frame. Differential quantization unit 113 performs differential quantization either with a fixed step or with non-uniform steps that are symmetric about zero input and widen as the magnitude of the differential input increases. Addition unit 114 adds the result to the quantized logarithmic pitch of the previous frame used as the reference, and the differentially quantized logarithmic pitch P2'[n] is output to 107.

【００１９】[0019]

The pitch comparison unit 108 compares P1'[n] and P2'[n], selects whichever quantized logarithmic pitch has the smaller error relative to the unquantized logarithmic pitch P[n], and outputs it to the output terminal 109 as the quantized logarithmic pitch P'[n] of this frame. Of the uniform quantization index N1 and the differential quantization index N2, the index produced by the quantizer that the pitch comparison unit 108 selected is output to the output terminal 110 as the pitch code. Because the N1 and N2 indices are assigned so that their numbers do not overlap, the output index number itself shows which quantization method was selected.
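As an illustration of the selection between the two quantizers and of the non-overlapping index layout, the following is a minimal Python sketch. The numeric values (lower bound, step size, 12 uniform levels, a four-entry difference table) and all function names are assumptions chosen for the example and do not come from the specification; they are picked only so that the combined code fits in 16 values.

```python
import math

def uniform_quantize(p, p_min, step, levels):
    """Uniform quantizer: returns (index, reconstructed log pitch)."""
    idx = round((p - p_min) / step)
    idx = max(0, min(levels - 1, idx))  # clamp to the valid index range
    return idx, p_min + idx * step

def differential_quantize(p, prev_q, table):
    """Quantize p - prev_q to the nearest entry of a difference table."""
    diff = p - prev_q
    idx = min(range(len(table)), key=lambda i: abs(table[i] - diff))
    return idx, prev_q + table[idx]

def encode_pitch(p, prev_q, p_min=math.log(20.0), step=0.18, n_uniform=12,
                 diff_table=(-0.04, -0.01, 0.01, 0.04)):
    """Return (pitch code, quantized log pitch) for one frame.

    Codes 0 .. n_uniform-1 mean the uniform quantizer was chosen;
    codes n_uniform and above mean the differential quantizer was chosen,
    so the code alone identifies the quantization method.
    All default values are illustrative assumptions.
    """
    n1, p1 = uniform_quantize(p, p_min, step, n_uniform)
    n2, p2 = differential_quantize(p, prev_q, diff_table)
    if abs(p1 - p) <= abs(p2 - p):
        return n1, p1
    return n_uniform + n2, p2
```

A decoder under the same assumptions would test whether the received code is below `n_uniform` to decide which reconstruction rule to apply.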

【００２０】[0020]

The logarithmic pitch P[n] appearing at the other output 116 of the input switching unit 103 is compared, in the interpolation point comparison unit 119, with a plurality of interpolation pitch candidates that the interpolation pitch candidate creation unit 117 generates from the quantized logarithmic pitches of the current frame and the previous frame, obtained from the input and output ends of the delay unit 111. The interpolation point index (selection number) N3 of the candidate closest to the logarithmic pitch at output 116 is output to 120 as the pitch interpolation code.

【００２１】[0021]

FIG. 2 illustrates the operation of the interpolation pitch candidate creation unit 117 of FIG. 1. In the example of FIG. 2, the number of interpolation point candidates is four, and the selection number is quantized with 2 bits. Let P'[n+1] be the quantized logarithmic pitch of the frame following the current frame and P'[n-1] that of the frame preceding it. The four points that divide the straight line joining them into equal parts are marked with crosses. Of these four interpolation pitch candidates, the one closest to the input logarithmic pitch P[n] is selected as the interpolated quantized pitch P'[n]; in the example of FIG. 2, index 2 is selected as the index giving this interpolated quantized pitch. P[n] is the pitch of the frame lying between P'[n+1] and P'[n-1]. For example, when a frame is divided into two subframes, P[n] corresponds to the first subframe of the current frame, P'[n+1] to the quantized pitch of the second subframe of the current frame, and P'[n-1] to the quantized pitch of the second subframe of the previous frame.

【００２２】[0022]

In the arrangement of interpolation pitch candidates in FIG. 2, P'[n+1] and P'[n-1] are not included among the candidates, but the candidates can also be set so as to include both endpoints P'[n+1] and P'[n-1]. In that case, there are two interpolation pitch candidates other than P'[n+1] and P'[n-1]. When the candidate positions are set excluding both endpoints, as in the example of FIG. 2, even 1 bit suffices to select between two points other than the endpoints. The interpolation point arrangement of FIG. 2 is therefore effective when few bits, such as 1 or 2, are allotted to the interpolation point.
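The two candidate arrangements discussed here (endpoints excluded, as in FIG. 2, or endpoints included) can be sketched in Python as follows. The function names and the zero-based index numbering are assumptions made for the example; the specification does not state how the indices are numbered.

```python
def interpolation_candidates(prev_q, next_q, n=4, include_ends=False):
    """Return n log-pitch candidates on the line from prev_q to next_q.

    include_ends=False: n points dividing the interval into n+1 equal
    parts, with both endpoints excluded (the FIG. 2 arrangement).
    include_ends=True: n equally spaced points including both endpoints
    (requires n >= 2).
    """
    if include_ends:
        fracs = [i / (n - 1) for i in range(n)]
    else:
        fracs = [(i + 1) / (n + 1) for i in range(n)]
    return [prev_q + f * (next_q - prev_q) for f in fracs]

def select_interpolation_index(p, prev_q, next_q, n=4, include_ends=False):
    """Index (zero-based, by assumption) of the candidate closest to p."""
    cands = interpolation_candidates(prev_q, next_q, n, include_ends)
    return min(range(n), key=lambda i: abs(cands[i] - p))
```

With `n=2` and `include_ends=False`, a single bit selects between the points one third and two thirds of the way along the line, which corresponds to the 1-bit case mentioned above.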

【００２３】[0023]

Returning to FIG. 1, the voiced/unvoiced information (V/UV information) is input from the input terminal 131, and the frame thinning unit 133 thins out the V/UV information by frame. For example, the V/UV information is output only once every two frames and is input to the voiced/unvoiced comparison unit 134. On the receiving (decoding) side, a frame for which the voiced/unvoiced determination information was omitted is decoded using the voiced/unvoiced information of whichever of the preceding and following frames has the larger frame energy or average amplitude.
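The decoder-side rule for a frame whose V/UV information was omitted can be sketched as below. The function name, the representation of V/UV information as an integer, and the tie-breaking choice (the preceding frame wins when the energies are equal) are assumptions for illustration only.

```python
def fill_omitted_vuv(prev_vuv, prev_energy, next_vuv, next_energy):
    """Borrow V/UV information from the neighbouring frame with more energy.

    prev_* and next_* describe the frames before and after the frame whose
    V/UV information was not transmitted. On equal energy the preceding
    frame is used (an assumption; the text does not cover ties).
    """
    return prev_vuv if prev_energy >= next_energy else next_vuv
```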

【００２４】[0024]

The representative voiced/unvoiced information codebook 132 stores a limited number of frequently occurring V/UV information values, selected in advance from the V/UV information of many speech frames by the method of the present invention described later. The voiced/unvoiced comparison unit 134 selects the representative V/UV information value b1' closest to the currently input V/UV information value b1 and outputs the index of that representative V/UV information as the voiced/unvoiced code 135. The V/UV information value b1 (or b1') is the binary number obtained by assigning to its bits the per-band V/UV information values v[k], k = 1, 2, ..., K (each v[k] is 0 or 1), where the bands divide the speech frequency spectrum at intervals of, for example, three times the speech fundamental frequency.

【数１】[Equation 1]

The distance between V/UV information values is the absolute value of the difference between b1 and b1'. The representative V/UV information is set independently for each number of bands, which is determined by the speech fundamental frequency.
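A minimal Python sketch of packing the per-band flags into a binary value and finding the nearest representative value is given below. The bit order (first band in the most significant bit) is an assumption, since the equation defining b1 is not reproduced in this text, and the function names and example codebook are hypothetical.

```python
def pack_vuv(v):
    """Pack per-band flags v[1..K] (0 or 1) into one integer.

    The first band is placed in the most significant bit; this bit order
    is an assumption made for the example.
    """
    b = 0
    for flag in v:
        b = (b << 1) | (1 if flag else 0)
    return b

def nearest_representative(b1, codebook):
    """Index of the codebook entry minimising |b1 - b1'|."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - b1))
```

For instance, with five bands and the hypothetical 2-bit codebook `[0, 16, 28, 31]`, the flags `[1, 1, 0, 0, 0]` pack to 24, whose nearest entry is 28 at index 2.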

【００２５】[0025]

The representative V/UV information can be selected and created as follows. A large number of V/UV information values are collected in advance from many speech frames and classified by the number of bands K. For each number of bands, the occurrence frequency of each b1_i in the classified set {b1_i} is tallied. For example, with five bands b1 takes integer values from 0 to 31, and an occurrence frequency is tallied for each value. To quantize the V/UV information with 2 bits, for example, four representative V/UV information values must be selected from among these. One conceivable approach is to select the four values with the highest occurrence frequency, but this can select adjacent values, which are then unsuitable as representatives. Especially when the number of bands K is large, frequently occurring V/UV information values are likely to lie next to one another.

【００２６】[0026]

For this reason, the present invention removes the V/UV information value with the lowest occurrence frequency, distributing its occurrence frequency to the neighboring V/UV information values, and repeats this until the number of values is reduced to the target number of representative V/UV information values. FIG. 3 illustrates how the least frequent V/UV pattern is removed when the number of V/UV information (V/UV pattern) candidates is reduced one at a time. Let q[n] be the occurrence frequency of the least frequent V/UV pattern. q[n] is split into q1 and q2, which are added to the occurrence frequencies q[n-l₁] and q[n+l₂] of the adjacent V/UV patterns, respectively. The shares q1 and q2 are determined by the following equation according to the magnitudes of the adjacent patterns' occurrence frequencies q[n-l₁] and q[n+l₂] and the closeness of the distances l₁ and l₂ to those patterns.

【数２】 [Equation 2]
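The pruning procedure can be sketched in Python as follows. Because Equation 2 is not reproduced in this text, the weighting used here (each neighbour's share proportional to its occurrence frequency divided by its distance) is only an assumed stand-in that respects the stated dependence on frequency and closeness; it is not the formula of the specification. The function name is hypothetical.

```python
def prune_patterns(freq, target):
    """Reduce a {pattern_value: count} table to `target` entries (target >= 1).

    Repeatedly removes the least frequent pattern and distributes its count
    to the nearest remaining lower and upper patterns. The split is
    weighted by count / distance, which is an ASSUMED form of Equation 2.
    """
    q = {k: float(c) for k, c in freq.items()}
    while len(q) > target:
        keys = sorted(q)
        n = min(keys, key=lambda k: q[k])      # least frequent pattern
        i = keys.index(n)
        lower = keys[i - 1] if i > 0 else None
        upper = keys[i + 1] if i + 1 < len(keys) else None
        qn = q.pop(n)
        if lower is None:                      # only an upper neighbour
            q[upper] += qn
        elif upper is None:                    # only a lower neighbour
            q[lower] += qn
        else:
            w1 = q[lower] / (n - lower)        # larger count, smaller distance
            w2 = q[upper] / (upper - n)        # -> larger share
            share = w1 / (w1 + w2) if (w1 + w2) > 0 else 0.5
            q[lower] += qn * share
            q[upper] += qn * (1.0 - share)
    return q
```

The keys that remain serve as the representative V/UV values; the total count is preserved, so the relative weights of the surviving patterns still reflect the training data.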

【００２７】[0027]

Returning again to FIG. 1, the encoding of the spectral envelope is described next. The spectral envelope is input to the input terminal 140 as a discrete spectrum in the form of the harmonic spectrum amplitude sequence A[l]. The input switching unit 141 switches the path for each frame, and one path leads to the spectrum correction unit 142, whose operation is described later. The spectrum interpolation unit 143 interpolates between the discrete spectral points of the input harmonic spectrum amplitude sequence, producing the larger number of spectral amplitude values that makes modeling in the linear prediction modeling unit 144 effective. This interpolation converts the harmonic spectrum amplitudes to logarithms and interpolates linearly, so that the interpolation is in effect nonlinear. Other nonlinear interpolation methods may of course be used; an interpolation method well matched to the linear prediction model is desirable.
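A minimal Python sketch of linear interpolation in the logarithmic domain is shown below. The function name and the `factor` parameter (the number of output intervals per original harmonic interval) are assumptions for illustration, and all amplitudes are assumed to be strictly positive.

```python
import math

def interpolate_log_spectrum(amplitudes, factor):
    """Insert factor-1 points between adjacent harmonics.

    Interpolation is linear in log amplitude, so in the amplitude domain
    the inserted points follow a geometric (nonlinear) progression.
    Amplitudes must be > 0.
    """
    logs = [math.log(a) for a in amplitudes]
    out = []
    for a, b in zip(logs, logs[1:]):
        for j in range(factor):
            out.append(math.exp(a + (b - a) * j / factor))
    out.append(amplitudes[-1])
    return out
```

With `factor=2`, the point inserted between amplitudes 1 and 4 is their geometric mean 2 rather than the arithmetic mean 2.5, which shows why the result is nonlinear in amplitude.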

【００２８】[0028]

The spectral amplitude sequence, whose number of samples has been increased by interpolation, is modeled by the linear prediction modeling unit 144 with an autoregressive linear prediction model and converted into a gain g and high-order (for example, up to 10th order) linear prediction coefficients (LPC coefficients a[j], j = 1, 2, 3, ..., J, where J is the prediction order). The gain quantization unit 145 converts the gain g to a logarithmic gain and quantizes it uniformly, and the resulting index is output to the spectral gain code output terminal 155. The LPC coefficients a[j], meanwhile, are converted into line spectrum pairs (LSP) F[j] by the modeling coefficient conversion unit 147. LSPs take values from 0 to π, so their range of variation is bounded, and linear interpolation of LSPs causes little perceptual degradation; they are therefore widely used as coefficients for modeling the spectral envelope. The LPC coefficients converted to LSPs are vector-quantized by the LSP quantization unit 148 using the LSP codebook 149, and the codebook index is output to the spectral envelope code output terminal 156 as the spectral envelope code.

【００２９】[0029]

The function and purpose of the spectrum correction unit 142 are as follows. When a harmonic spectrum amplitude sequence is modeled with a linear prediction model, it is better to have many spectral points, but any spectral value that does not fit the model distorts the modeling and increases the spectral error after modeling. In general, the 0th-order harmonic spectrum amplitude (DC component) of speech is lower than the other harmonic spectrum amplitudes and readily causes modeling error. Moreover, because the 0th-order harmonic spectrum amplitude is not needed when decoding speech (a speech signal contains almost no DC component), it may be adjusted to a level that is easy to model. For these reasons, the spectrum correction unit 142 corrects the 0th-order harmonic spectrum amplitude so that the sequence is easy to model with the linear prediction model. Specifically, the 0th-order harmonic amplitude is replaced with the value calculated by the following equation.

【数３】[Equation 3]

Here, H'₀ is the corrected 0th-order harmonic spectrum amplitude, and H₁, H₂, and H₃ are the 1st-, 2nd-, and 3rd-order harmonic spectrum amplitudes, respectively. An equation other than this one may also be used to replace the 0th-order harmonic amplitude.

【００３０】[0030]

The other input harmonic spectrum amplitude sequence A[l], routed by the input switching unit 141 on alternate frames, is input to the spectrum interpolation point comparison unit 154. The gain interpolation calculation unit 150 computes interpolated gain candidates by interpolating between the quantized gains of the preceding and following frames, taken from the input and output ends of the gain delay unit 146. Likewise, the LSP interpolation calculation unit 152 computes interpolated LSP coefficient candidates by interpolating between the LSP coefficients of the preceding and following frames, taken from the input and output ends of the LSP delay unit 151. From each combination of these candidates, the spectrum restoration unit 153 reconstructs a harmonic spectrum amplitude sequence A'_i[l] (l is the harmonic number; i is the interpolation candidate number, or index). The spectrum interpolation point comparison unit 154 compares each A'_i[l] with the input harmonic spectrum amplitude sequence A[l], selects by the following equation the index i of the interpolated gain and interpolated LSP coefficients giving the smallest spectral error, and outputs it to the spectral interpolation code output terminal 157 as the spectral interpolation code.

【数４】[Equation 4]

Here, the argmin_i(x) function (written in the equation above as argmin with i beneath it) evaluates x with i as the parameter and returns the i for which x is smallest.
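The argmin selection over reconstructed spectra can be sketched in Python as below. The error measure used (sum of squared differences over the harmonics) is an assumption, since Equation 4 is not reproduced in this text, and the function name is hypothetical.

```python
def select_spectrum_candidate(target, candidates):
    """Return the index i whose reconstructed spectrum is closest to target.

    target: input amplitude sequence A[l].
    candidates: list of reconstructed sequences A'_i[l].
    The squared-error measure is an ASSUMED stand-in for Equation 4.
    """
    def error(candidate):
        return sum((a - b) ** 2 for a, b in zip(target, candidate))
    return min(range(len(candidates)), key=lambda i: error(candidates[i]))
```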

【００３１】[0031]

Restricting the interpolated gain candidates to values between the quantized gains of the preceding and following frames can cause problems. For example, where the gain has a sharp local minimum, the coded speech cannot drop to a sufficiently low level, and unpleasant noise results. This problem is mitigated by adding to the interpolated gain candidates values outside that range, namely candidates above the larger and below the smaller of the quantized gains of the preceding and following frames. Specifically, values such as the maximum (the larger of the quantized gains of the preceding and following frames) +5 dB and +10 dB, and the minimum (the smaller of those quantized gains) -5 dB and -10 dB, are added to the interpolated gain candidates obtained by interpolating between the quantized gains of the preceding and following frames.

【００３２】[0032]

Next, the processing flow of the encoding unit of FIG. 1 according to the present invention is described. FIG. 4 is a flowchart of the encoding of the speech fundamental frequency ωo. Processing starts at 701. At 702, the speech fundamental frequency ωo to be quantized and the frame number m are set. At 703, the logarithmic pitch P is calculated, and at 704 the parity of m is tested. If m is even, uniform quantization and differential quantization are performed at 705 and 706, respectively, yielding the uniformly quantized pitch P_u with its index Index_u and the differentially quantized pitch P_d with its index Index_d. At 708, the differential quantization error |P_d - P| is tested. If it is smaller than a threshold Th, then at 710 the pitch P[m] of the m-th frame (written P[2n] to indicate an even frame) is set to P_d and its index Index[m] (likewise written Index[2n]) is set to Index_d. If instead |P_d - P| is found at 708 to be Th or greater, the uniform quantization error |P_u - P| and the differential quantization error |P_d - P| are compared at 707. If the uniform quantization error is smaller, then at 709 the pitch P[2n] of the m-th frame is set to P_u and its index Index[2n] to Index_u; otherwise, at 710, P[2n] is set to P_d and Index[2n] to Index_d. The pitch P[2n] of the m-th frame is also delayed by two frame periods at 711 to become P[2n-2], which serves as the reference logarithmic pitch for differential quantization 706 when the next even frame is quantized.
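The decision made at steps 708, 707, 709, and 710 for an even frame can be sketched in Python as below. The function name and argument layout are assumptions; the quantized values and indices are taken as already computed by the two quantizers.

```python
def choose_even_frame_pitch(p, p_u, index_u, p_d, index_d, th):
    """Select the quantized pitch for an even frame.

    Step 708: if the differential error is below th, keep the
    differential result even if the uniform result is closer.
    Step 707: otherwise keep whichever result has the smaller error,
    falling back to the differential result when the uniform error is
    not strictly smaller.
    Returns (quantized pitch, index).
    """
    err_d = abs(p_d - p)
    if err_d < th:
        return p_d, index_d      # step 710
    if abs(p_u - p) < err_d:
        return p_u, index_u      # step 709
    return p_d, index_d          # step 710
```

Compared with a plain minimum-error choice, the threshold biases the encoder toward the differential quantizer whenever that quantizer is already accurate enough.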

【００３３】[0033]

If m is odd at 704, a pitch interpolation candidate is selected at 713. The pitch interpolation candidates are calculated at 712, by the same technique as shown in FIG. 2, as the set of interpolation pitch candidates {Pinpol[2n-1]_i} (i = 0, 1, 2, 3, ..., N-1, where N is the number of interpolation candidate points) obtained by dividing the interval between the quantized logarithmic pitches of the preceding and following frames into equal parts. The candidate with the smallest absolute error relative to the logarithmic pitch P[2n-1] of frame m = 2n-1 is selected, and its index number Index[2n-1] is set at 714 as the quantized pitch code of the odd frame. The pitch code of the even frame, which is the index Index_d or Index_u of the quantization method selected at 709 or 710, is likewise set at 714 as Index[2n]. The pitch codes for the two frames, one even and one odd, set in this way are output at 715. With the speech fundamental frequency encoding method of the present invention shown in FIG. 4, the speech fundamental frequency of two frames can be encoded well using, for example, 4 bits for the pitch code and 1 bit for the pitch interpolation code.

【００３４】[0034]

FIG. 5 is a flowchart of the encoding of the V/UV information. Processing starts at 801. At 802, the speech fundamental frequency is set in ωo, the V/UV information value of the frame in v, and the frame number in m. At 803, the number of bands K for ωo is obtained:

K = (int)((π - ωo/2) / (ωo × B))

Here, B is the number of harmonics contained in each band; it is fixed before encoding, and a value of about 3 is used. At 804, the parity of m is tested. If m is even, then at 805 the representative V/UV information value closest to the input V/UV value v is searched for in the group {VCB[K][i]} for K bands, taken from the data 806 of representative V/UV information values selected in advance for each number of bands; its index i is taken as IndexV/UV[2n] and set as the voiced/unvoiced code at 807. If the frame is odd, the voiced/unvoiced code is thinned out and not output. With the V/UV information encoding method of the present invention shown in FIG. 5, the V/UV information of two frames can be encoded well using, for example, 2 bits for the voiced/unvoiced code.
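The band-count formula of step 803 translates directly into Python as sketched below. The function name is hypothetical, and ωo is taken to be the normalized fundamental frequency in radians per sample, which is an assumption consistent with the use of π as the upper band edge.

```python
import math

def band_count(omega0, harmonics_per_band=3):
    """Number of V/UV bands K = (int)((pi - omega0/2) / (omega0 * B)).

    omega0: fundamental frequency in radians per sample (assumed unit).
    harmonics_per_band: B, the number of harmonics in each band.
    int() truncates toward zero, like the (int) cast in the text.
    """
    return int((math.pi - omega0 / 2) / (omega0 * harmonics_per_band))
```

For a pitch period of 40 samples (ωo = 2π/40) the expression evaluates to 6.5, giving K = 6; for a period of 80 samples it gives K = 13, so lower-pitched speech is described by more bands.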

【００３５】[0035]

FIG. 6 is a flowchart of the encoding of the harmonic spectrum amplitude sequence. Processing starts at 901. At 902, the harmonic spectrum amplitudes are set in A[l] and the frame number in m. At 903, the parity of m is tested. For an even frame, at 904 the 0th-order harmonic spectrum amplitude A[0] is corrected to α times the 2nd-order harmonic spectrum amplitude A[2]; the correction coefficient α is determined by the method already described. At 905, the corrected harmonic spectrum amplitude sequence is linearly interpolated in the logarithmic domain to increase the number of spectral points. At 906, the interpolated, enlarged spectral sequence is modeled as values on the modeling curve of a linear prediction model. The gain g, one of the modeling coefficients of the linear prediction model, is uniformly quantized as a logarithmic gain at 908, yielding the quantized gain g' and its index. The quantization step and quantization range are predetermined at 907. The other modeling coefficients, the LPC coefficients a[j], are converted to line spectrum pairs (LSP) F[j] and vector-quantized at 909 using the vector quantization table 910, yielding the quantized vector F'[j] and its index.

【００３６】[0036]

The quantized gain g' from 908 is delayed by two frame periods at 912, and at 911 a plurality (M1) of interpolated gain candidates between the input and output of the delay are computed by linear interpolation of the logarithmic gain and passed to 915. Likewise, the quantized LSP F'[j] from 909 is delayed by two frame periods at 913, and at 914 a plurality (M2) of interpolated LSP candidates between the input and output of the delay are computed by linear interpolation, and the sequence of LSP interpolation candidates is passed to 915. At 915, an interpolated harmonic spectrum A'[l] is reconstructed from each combination (at most M1 × M2) of the input LSP interpolation candidates and gain interpolation candidates. At 916, the set of reconstructed interpolated harmonic spectrum sequences is compared one by one with the harmonic spectrum sequence of the odd frame input from 903, and the number of the combination giving the interpolated harmonic spectrum sequence with the smallest spectral error is chosen as the spectral interpolation number. The spectral gain code shown in FIG. 1 is obtained from 908 and the spectral envelope code from 909, and processing ends at 917. With the harmonic spectrum amplitude sequence encoding method of the present invention shown in FIG. 6, the spectral gain codes and spectral envelope codes of two spectrum-corrected frames can be encoded well as the corrected spectral gain code and spectral envelope code of one frame plus a spectral interpolation code, which improves speech quality and reduces the number of coded bits.

【００３７】[0037]

In the above description, for ease of understanding, the input switching units 103 and 141 of FIG. 1 were described as switching every frame. The invention is not limited to alternating on every frame, however; operation with a different switching period can readily be realized as needed. This entails slight changes to the flowcharts of FIGS. 4, 5, and 6, which engineers and researchers in the field can readily make. Also, in the above embodiment, frames for which no voiced/unvoiced determination information is sent were provided in order to reduce the number of coded bits further, but voiced/unvoiced information may instead be sent for every frame. Furthermore, although the harmonic spectrum amplitude sequence was quantized above as a gain and line spectrum pairs (LSP), it may instead be quantized as a gain and linear prediction coefficients (LPC coefficients).

【００３８】[0038]

【発明の効果】[Effects of the Invention]

As described above, the speech encoding method and apparatus of the present invention apply to analysis-synthesis speech coding in which each speech frame is represented by speech coding parameters consisting of the speech pitch (or speech fundamental frequency), the V/UV information of each spectral band, and the harmonic spectrum amplitude sequence. The speech pitch is encoded by switching between frames in which the logarithmic pitch is differentially or uniformly quantized and frames quantized by an inter-frame interpolation index, which greatly reduces the number of coded bits. The V/UV information is encoded, for each number of bands determined by the range of the speech fundamental frequency, as the index number of a representative V/UV value obtained in advance by the method of the present invention, which reduces the number of V/UV coded bits in a rational way. In addition, the V/UV information is thinned out by frame, and the V/UV information of a frame for which none is transmitted is inferred from the preceding and following frames by the method described herein, which reduces the number of V/UV coded bits further.

For the speech harmonic spectrum amplitude sequence, the encoder switches between two kinds of frame. In one, the 0th-order harmonic amplitude is corrected by the method described herein, the sequence is modeled by an autoregressive linear prediction model, and the model coefficients are quantized and transmitted. In the other, the model coefficients themselves are not quantized; instead, a plurality of interpolated modeling coefficients are derived by inter-frame interpolation from already quantized model coefficients, and the frame is quantized as the index number of the interpolation candidate giving the best interpolated modeling coefficients. This greatly reduces the number of coded bits for the harmonic spectrum amplitude sequence while improving coded speech quality.

Thus the present invention provides an analysis-synthesis speech encoding method and apparatus that greatly reduce the coding bit rate. It also makes it possible to provide an analysis-synthesis speech encoding method and apparatus with good sound quality, in which coded speech quality is improved by shortening the frame update period of the speech coding while an increase in the number of coded bits is prevented.

[Brief description of drawings]

【図１】FIG. 1 is a block diagram showing an embodiment of a speech encoding apparatus to which the speech encoding method of the present invention is applied.

【図２】FIG. 2 is a diagram for explaining the interpolation pitch candidate creation process in the interpolation pitch candidate creation unit of the present invention.

【図３】FIG. 3 is a diagram for explaining the method of removing voiced/unvoiced information with a low occurrence frequency in the present invention.

【図４】FIG. 4 is a flowchart of the encoding of the speech pitch in the speech encoding method of the present invention.

【図５】FIG. 5 is a flowchart of the encoding of the voiced/unvoiced information in the speech encoding method of the present invention.

【図６】FIG. 6 is a flowchart of the encoding of the harmonic spectrum amplitude sequence in the speech encoding method of the present invention.

【図７】FIG. 7 is a configuration diagram of a speech encoding and transmission apparatus.

【図８】FIG. 8 is a block diagram of a speech coding parameter extraction unit.

【図９】FIG. 9 is a block diagram of the parameter encoding unit in a conventional speech encoding scheme.

[Explanation of symbols]

102 logarithmic conversion unit
103 input switching unit
105 uniform quantization unit
108 pitch comparison unit
111 delay unit
112 subtraction unit
113 difference quantization unit
114 addition unit
117 interpolation pitch candidate creation unit
119 interpolation point comparison unit
132 representative voiced/unvoiced information codebook
133 frame thinning unit
134 voiced/unvoiced comparison unit
141 input switching unit
142 spectrum correction unit
143 spectrum interpolation unit
144 linear prediction modeling unit
145 gain quantization unit
146 gain delay unit
147 modeling coefficient conversion unit
148 LSP quantization unit
149 LSP codebook
150 gain interpolation calculation unit
151 LSP delay unit
152 LSP interpolation calculation unit
153 spectrum restoration unit
154 spectrum interpolation point comparison unit

Continuation of the front page

(56) References
JP-A-6-130999 (JP, A)
JP-A-9-172413 (JP, A)
JP-A-61-150000 (JP, A)
JP-A-62-38500 (JP, A)
Thomas Eriksson, Hong-Goo Kang, Pitch Quantization in Low Bit-Rate Speech Coding, ICASSP'99, 1999, pp. 489-492
Seiji Sasaki, et al., Low bit rate speech codec for commercial mobile communication using mixed excitation linear predictive coding, IEICE Transactions D-II, April 2001, Vol. J84-D-II, No. 4, pp. 629-640
Teruo Fumoto, et al., Low bit rate speech codec for commercial mobile communication using multi-band excitation harmonic linear predictive coding, IEICE Transactions D-II, January 2003, Vol. J86-D-II, No. 1, pp. 32-41

(58) Fields investigated (Int.Cl.⁷, DB name)
G10L 11/06, G10L 13/00, G10L 19/02, G10L 19/04
JICST file (JOIS)

Claims

(57) [Claims]

1. A voice encoding method for obtaining and encoding voice encoding parameters from a voice signal that is digitized and divided into frames of a predetermined time length, the method comprising: a step of encoding the voice pitch, as one of the voice encoding parameters, by a combination of frames quantized by a quantization method selected from a difference quantization method and a uniform quantization method, and frames quantized by the index number of one candidate selected from a plurality of interpolated voice pitch candidates calculated using the quantized voice pitches of the preceding and following frames; a step of encoding the voiced/unvoiced information, as one of the voice encoding parameters, by the index number of the closest item among a limited number of representative voiced/unvoiced information items; and a step of encoding the harmonics spectrum amplitude sequence, as one of the voice encoding parameters, by a combination of frames quantized by linear prediction coefficients (or line spectrum pairs derived from them) and a gain, and frames quantized by the index number of one combination selected from a plurality of combinations of interpolated linear prediction coefficient (or line spectrum pair) candidates calculated using the linear prediction coefficients (or line spectrum pairs) of the preceding and following frames and interpolation gain candidates calculated using the gains of the preceding and following frames.

2. The voice encoding method according to claim 1, wherein the selection between the difference quantization method and the uniform quantization method in the quantization of the voice pitch is a process of selecting the difference quantization method when the difference quantization error is less than or equal to a threshold value and, when the difference quantization error exceeds the threshold value, selecting whichever quantization method has the smaller quantization error, and wherein the selection of the interpolated voice pitch candidate is a process of selecting, from among a plurality of interpolated voice pitch candidates calculated from the voice pitches of the preceding and following frames, the candidate closest to the voice pitch of the current frame.
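
The selection logic of claim 2 can be illustrated with a minimal Python sketch. Everything numeric here is an assumption made for illustration only: the step size, index range, quantizer range, bit count, threshold and interpolation weights are hypothetical, and the function names are not taken from the patent.

```python
def difference_quantize(value, prev_quantized, step, max_index):
    """Quantize value as a clamped multiple of step added to the previous quantized value."""
    index = round((value - prev_quantized) / step)
    index = max(-max_index, min(max_index, index))
    return index, prev_quantized + index * step


def uniform_quantize(value, lo, hi, bits):
    """Quantize value onto 2**bits evenly spaced levels covering [lo, hi]."""
    top = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    index = round((clipped - lo) / (hi - lo) * top)
    return index, lo + index * (hi - lo) / top


def choose_pitch_quantizer(value, prev_quantized, threshold,
                           step=0.05, max_index=3, lo=4.0, hi=8.0, bits=7):
    """Return (method, index, reconstructed value) for one frame.

    All default parameter values are hypothetical.
    """
    d_index, d_value = difference_quantize(value, prev_quantized, step, max_index)
    d_error = abs(value - d_value)
    # Difference quantization is kept whenever its error is within the threshold.
    if d_error <= threshold:
        return "difference", d_index, d_value
    # Otherwise the method with the smaller quantization error wins.
    u_index, u_value = uniform_quantize(value, lo, hi, bits)
    if abs(value - u_value) < d_error:
        return "uniform", u_index, u_value
    return "difference", d_index, d_value


def select_interpolated_pitch(value, prev_quantized, next_quantized,
                              weights=(0.25, 0.5, 0.75)):
    """Return (index, candidate) of the interpolated candidate closest to value."""
    candidates = [(1 - w) * prev_quantized + w * next_quantized for w in weights]
    index = min(range(len(candidates)), key=lambda i: abs(candidates[i] - value))
    return index, candidates[index]
```

With these assumed parameters, a small frame-to-frame change stays on the difference quantizer, while a large jump that exceeds the clamped difference range falls back to the uniform quantizer.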

3. The voice encoding method according to claim 1, wherein the quantization of the voice pitch is performed using a logarithmic pitch obtained by logarithmically converting the voice pitch.

4. The voice encoding method according to claim 1, wherein the difference quantization of the voice pitch is performed by setting a larger quantization step as the difference value increases.
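
As a rough illustration of claims 3 and 4, the following Python sketch quantizes the difference of the logarithmic pitch against a table whose spacing widens as the difference grows. The base-2 logarithm, the table values and the function name are assumptions chosen for the example, not values given in the patent.

```python
import math

# Hypothetical reconstruction levels for the log-pitch difference.
# The spacing between neighbouring levels grows away from zero.
POSITIVE_LEVELS = [0.02, 0.04, 0.08, 0.16, 0.32]
DIFF_LEVELS = [-v for v in reversed(POSITIVE_LEVELS)] + [0.0] + POSITIVE_LEVELS


def quantize_log_pitch_difference(pitch, prev_quantized_log_pitch):
    """Return (index, reconstructed log pitch) for one frame.

    pitch is the pitch of the current frame in linear units (for example samples);
    prev_quantized_log_pitch is the reconstructed log2 pitch of the previous frame.
    """
    diff = math.log2(pitch) - prev_quantized_log_pitch
    # Pick the table entry nearest to the actual difference.
    index = min(range(len(DIFF_LEVELS)), key=lambda i: abs(DIFF_LEVELS[i] - diff))
    return index, prev_quantized_log_pitch + DIFF_LEVELS[index]
```

Small pitch changes are therefore resolved finely, while large changes are still representable at coarser resolution with the same number of index bits.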

5. The voice encoding method according to claim 1, wherein the limited number of representative voiced/unvoiced information items are created by deleting items from a large number of voiced/unvoiced information items acquired in advance, in ascending order of occurrence frequency, and by allocating and merging the occurrence frequency of each deleted item into the adjacent voiced/unvoiced information items according to the distance to and the size of the adjacent voiced/unvoiced information items, thereby reducing the items to the desired limited number.

6. The voice encoding method according to claim 1, wherein the distance between voiced/unvoiced information items is obtained by representing voiced or unvoiced in each voice spectrum band by 0 or 1, assigning the 0 or 1 value of each voice spectrum band to the bits of a binary number in order of the frequency of the voice spectrum band, and expressing the distance as the difference between the resulting binary values.
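
The codebook reduction of claim 5 and the distance of claim 6 can be sketched as follows in Python. This is only one possible reading: the bit order, the choice of the nearest lower and upper entries as the "adjacent" items, and the allocation weight (neighbour count divided by distance) are all assumptions, since the claims do not fix them.

```python
def pattern_to_value(bands):
    """Pack per-band voiced/unvoiced flags (0 or 1) into one binary value.

    Which end of the band order maps to the most significant bit is an assumption.
    """
    value = 0
    for flag in bands:
        value = (value << 1) | flag
    return value


def vuv_distance(a, b):
    """Distance between two patterns: difference of their binary values."""
    return abs(pattern_to_value(a) - pattern_to_value(b))


def reduce_codebook(histogram, target_size):
    """Shrink {binary value: occurrence count} to target_size entries.

    Counts are assumed to be positive. The least frequent entry is removed
    first and its count is shared between its nearest lower and upper
    neighbours, weighted by neighbour count / distance (hypothetical rule).
    """
    hist = dict(histogram)
    while len(hist) > target_size:
        victim = min(hist, key=lambda v: (hist[v], v))
        count = hist.pop(victim)
        lower = [v for v in hist if v < victim]
        upper = [v for v in hist if v > victim]
        neighbours = ([max(lower)] if lower else []) + ([min(upper)] if upper else [])
        weights = [hist[n] / abs(n - victim) for n in neighbours]
        total = sum(weights)
        for n, w in zip(neighbours, weights):
            hist[n] += count * w / total
    return hist
```

The surviving keys form the representative codebook, and an encoder would then send the index of the entry with the smallest `vuv_distance` to the observed pattern.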

7. The voice encoding method according to claim 1, wherein the voiced/unvoiced information is composed of a combination of frames in which the voiced/unvoiced information is transmitted and frames in which it is omitted and not transmitted.

8. The voice encoding method according to claim 7, wherein a frame in which the voiced/unvoiced information is omitted is decoded using the voiced/unvoiced information of whichever of the preceding and following frames has the larger energy or average amplitude value.
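
The decoding rule of claim 8 amounts to a simple comparison, sketched below in Python. The function name, the use of a single energy figure per frame, and the tie-breaking in favour of the preceding frame are assumptions made for the example.

```python
def infer_omitted_vuv(prev_vuv, prev_energy, next_vuv, next_energy):
    """Reconstruct the V/UV flags of a frame whose V/UV information was not sent.

    The flags are copied from the neighbouring frame with the larger energy
    (an average amplitude could be passed instead). On a tie the preceding
    frame is used, which is an arbitrary choice for this sketch.
    """
    if next_energy > prev_energy:
        return list(next_vuv)
    return list(prev_vuv)
```

For example, if the following frame is the louder one, the omitted frame takes that frame's flags.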

9. The voice encoding method according to claim 1, wherein, when the harmonics spectrum amplitude sequence is modeled with a linear prediction model, the 0th-order (DC component) harmonics spectrum amplitude value is corrected before the linear prediction modeling of the harmonics spectrum amplitudes is performed.

10. The voice encoding method according to claim 1, wherein the plurality of interpolation gain candidates include gains obtained by dividing the interval between the gains of the preceding and following frames equally or unequally, the larger of the gains of the preceding and following frames, and the smaller of those gains.
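
The candidate set of claim 10 can be sketched in Python as follows. The division weights and function names are hypothetical, and the sketch selects on gain alone, whereas claim 1 selects one combination of a gain candidate and a linear prediction coefficient (or line spectrum pair) candidate.

```python
def interpolation_gain_candidates(prev_gain, next_gain, weights=(0.25, 0.5, 0.75)):
    """Build the list of interpolation gain candidates for an interpolated frame.

    weights are hypothetical division points between the two neighbouring gains
    (0.5 is the equal division, the others are unequal divisions).
    """
    candidates = [(1 - w) * prev_gain + w * next_gain for w in weights]
    candidates.append(max(prev_gain, next_gain))  # the larger neighbouring gain
    candidates.append(min(prev_gain, next_gain))  # the smaller neighbouring gain
    return candidates


def select_gain_index(target_gain, candidates):
    """Return the index of the candidate closest to the gain of the current frame."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - target_gain))
```

Only the selected index needs to be transmitted for such a frame, because the decoder can rebuild the same candidate list from the gains of the neighbouring frames.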

11. A voice encoding device for obtaining and encoding voice encoding parameters from a voice signal that is digitized and divided into frames of a predetermined time length, the device comprising: means for encoding the voice pitch, as one of the voice encoding parameters, by a combination of frames quantized by a quantization method selected from a difference quantization method and a uniform quantization method, and frames quantized by the index number of one candidate selected from a plurality of interpolated voice pitch candidates calculated using the quantized voice pitches of the preceding and following frames; means for encoding the voiced/unvoiced information, as one of the voice encoding parameters, by the index number of the closest item among a limited number of representative voiced/unvoiced information items; and means for encoding the harmonics spectrum amplitude sequence, as one of the voice encoding parameters, by a combination of frames quantized by linear prediction coefficients (or line spectrum pairs derived from them) and a gain, and frames quantized by the index number of one combination selected from a plurality of combinations of interpolated linear prediction coefficient (or line spectrum pair) candidates calculated using the linear prediction coefficients (or line spectrum pairs) of the preceding and following frames and interpolation gain candidates calculated using the gains of the preceding and following frames.