JP2002099300A - Method and device for coding voice

Info

Publication number: JP2002099300A
Application number: JP2000292666A
Authority: JP
Inventors: Teruo Fumoto; 照夫麓; Seiji Sasaki; 佐々木誠司
Original assignee: YRP KOKINO IDOTAI TSUSHIN KENK; YRP Advanced Mobile Communication Systems Research Laboratories Co Ltd
Current assignee: YRP KOKINO IDOTAI TSUSHIN KENK; YRP Advanced Mobile Communication Systems Research Laboratories Co Ltd
Priority date: 2000-09-26
Filing date: 2000-09-26
Publication date: 2002-04-05
Anticipated expiration: 2020-09-26
Also published as: JP3453116B2

Abstract

PROBLEM TO BE SOLVED: To encode, with a small number of bits, the voice pitch, the voiced/unvoiced information, and the harmonics spectrum amplitudes that serve as the speech coding parameters of an analysis-synthesis coder. SOLUTION: For the voice pitch, an input switching part 103 alternates between frames in which the logarithmic pitch is quantized by a selected one of uniform and differential quantization, and frames that are quantized by the index of an interpolated pitch candidate between frames. For the voiced/unvoiced information, a frame thinning part 133 alternates between frames quantized by the index of an entry selected from representative voiced/unvoiced information 132 and frames in which no voiced/unvoiced information is transmitted. For the harmonics spectrum amplitudes, an input switching part 141 alternates between frames modeled by a spectrum-adjusted autoregressive linear prediction model and frames quantized by the index of an interpolation candidate of the model coefficients. As a result, the speech coding parameters can be quantized well at a low bit rate.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech coding method and apparatus that digitize a speech signal and encode, at predetermined time intervals, speech coding parameters representing its characteristics. It is suitable for use in digital mobile phones, digital voice storage devices, and similar equipment that transmit or store the encoded speech coding parameters, restore them at the destination or from storage when needed, and synthesize a speech signal from the restored parameters to convey speech.

[0002]

2. Description of the Related Art Because a digitized speech signal can undergo various kinds of digital signal processing such as data compression, error handling, and multiplexing, digitized speech is widely used not only in fixed and mobile telephones but also in multimedia systems that use speech. Digitizing an analog speech signal generally requires sampling at a frequency at least twice the input speech bandwidth and quantizing with a step fine enough that the ear cannot distinguish it, so a wider transmission bandwidth is needed than for the analog signal. For this reason, once digitized, the speech signal is compressed by various coding and modulation schemes according to the required speech quality.

[0003] As ways of obtaining a high compression ratio for speech data, analysis-synthesis speech coding schemes, which actively exploit the characteristics of speech, and methods for efficiently quantizing the speech coding parameters they produce have been studied. For example, the MBE (Multi-Band Excitation) scheme and the IMBE (Improved Multi-Band Excitation) scheme, used in some satellite mobile phones, are analysis-synthesis speech coding schemes of this kind. Speech is divided into segments at a predetermined time interval (20 msec) to form frames. For each frame, the speech coding parameters are the voice pitch (or its reciprocal, the fundamental frequency), the harmonics spectrum amplitude sequence obtained from the frequency spectrum of the speech in the frame, and the voiced/unvoiced information (Voiced/Unvoiced information, or V/UV information) for each frequency band obtained by dividing the frequency spectrum into suitable frequency regions. For each frame, the voice pitch is uniformly quantized with 8 bits; the per-band V/UV information v[k] (k is the band number) is quantized with K bits as a binary value made of 0/1 flags (K is the number of bands, a variable length of at most 12 bits); and the harmonics amplitude sequence is quantized with 75 - K bits by applying a two-dimensional transform to the inter-frame prediction differences and quantizing the resulting DCT (discrete cosine transform) coefficients. This gives a speech coding rate of 4.15 kbps.
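
The bit allocation just described can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the cited scheme's specification: the function names are hypothetical, and the lower bound of one band is an assumption made only for input checking. It shows that the per-frame total is independent of K and that the total over a 20 msec frame matches the stated 4.15 kbps.

```python
# Per-frame bit budget of the IMBE-style allocation described in the text.
FRAME_MS = 20      # frame length stated in the text
PITCH_BITS = 8     # uniform quantization of the voice pitch


def frame_bits(num_bands):
    """Total bits per frame for K = num_bands (the text gives a maximum of 12)."""
    if not 1 <= num_bands <= 12:  # lower bound of 1 is an assumption
        raise ValueError("K must be between 1 and 12")
    vuv_bits = num_bands              # one 0/1 flag per frequency band
    amplitude_bits = 75 - num_bands   # remainder goes to the DCT coefficients
    return PITCH_BITS + vuv_bits + amplitude_bits


def bit_rate_bps(num_bands):
    """Bit rate in bits per second for one frame every FRAME_MS milliseconds."""
    return frame_bits(num_bands) * 1000 / FRAME_MS
```

Because the V/UV bits and the amplitude bits always sum to 75, every frame carries 83 bits regardless of K.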

[0004] FIG. 7 shows the configuration of a typical speech coding and transmission apparatus. A sampled and quantized digital speech signal input from the speech input terminal 301 is divided by the speech coding parameter extraction unit 302 into segments of a predetermined time interval to form frames, and speech coding parameters are extracted for each frame. The parameters extracted depend on the speech coding scheme; in the MBE scheme described above, for example, they are the voice pitch, the harmonics spectrum amplitude sequence, and the V/UV information of each frequency band. The parameter encoding unit 303 encodes the extracted parameters efficiently to reduce the amount of code and sends them to the transmission path 305 via the transmitting unit 304. From the signal received by the receiving unit 306, the parameter decoding unit 307 restores the speech coding parameters. The speech synthesis unit 308 then creates synthesized speech by the reverse of the operation of the speech coding parameter extraction unit 302 and outputs a digital speech signal from the speech output terminal 309.

[0005] FIG. 8 is a block diagram of the speech coding parameter extraction unit 302 for the MBE scheme. The digital input speech signal is fed from the input terminal 301 to the fundamental frequency estimation unit 401, which estimates the fundamental frequency of the speech. The estimate is calculated as the reciprocal of the time lag at which the autocorrelation function is maximized. The frequency spectrum calculation unit 402 obtains the speech frequency spectrum by frequency analysis of a finite-length speech signal cut out of the frame with a window function such as a Hamming window. The fundamental frequency correction unit 403 uses the A-b-S (Analysis-by-Synthesis) technique to obtain the corrected fundamental frequency ωo and the harmonics spectrum amplitude sequence simultaneously, under the condition that the error between the speech frequency spectrum and the spectrum synthesized from the estimated fundamental frequency and the window function is minimized. Based on the corrected fundamental frequency ωo, the voicing strength calculation unit 404 divides the frequency range into a plurality of frequency bands k (k = 1, 2, ..., K), calculates for each band the error between the synthesized spectrum and the speech frequency spectrum, and outputs the V/UV information v[k] by threshold decision. According to the V/UV information v[k], the spectral envelope calculation unit 405 outputs as the spectral envelope magnitude |A(ω)| the harmonics spectrum amplitudes obtained by the A-b-S technique in voiced bands, and the root-mean-square value of the frequency spectrum within the frequency range of each harmonic in unvoiced bands.
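
The initial estimate made by unit 401 (the reciprocal of the lag that maximizes the autocorrelation) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the estimator of the cited MBE work: the function name, the search range of 60 to 400 Hz, and the fixed-length correlation window are all hypothetical choices.

```python
def estimate_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Return the frequency (Hz) whose lag maximizes the autocorrelation.

    frame: list of samples; fs: sampling rate in Hz.
    f_min and f_max bound the search range (assumed values).
    """
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)
    # Use the same number of products for every lag so that short lags
    # are not favored merely because they overlap more samples.
    window = len(frame) - max_lag
    if window <= 0:
        raise ValueError("frame too short for the requested f_min")

    def autocorr(lag):
        return sum(frame[n] * frame[n + lag] for n in range(window))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return fs / best_lag


# Usage: a pulse train with an 80-sample period sampled at 8 kHz.
fs = 8000
frame = [1.0 if n % 80 == 0 else 0.0 for n in range(320)]
f0 = estimate_f0(frame, fs)  # lag 80 is the only nonzero peak, so f0 is 100 Hz
```

In the scheme described above this value is only a starting point; unit 403 then refines it by analysis-by-synthesis.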

[0006] FIG. 9 is a block diagram of the parameter encoding unit 303 for the IMBE scheme. The fundamental frequency ωo input to the input terminal 501 is uniformly quantized to 8 bits by the fundamental frequency quantization unit 502, using a predetermined quantization range and quantization step, and the quantized value B0 is output to the output terminal 503. The V/UV information v[k] input to the input terminal 504 is output by the V/UV information quantization unit 505 to the output terminal 506; when the number of frequency bands K is 12, for example, it is output as a 12-bit binary value B1 made up of twelve 0 or 1 flags. The spectral envelope |A(ω)| input to the input terminal 507 arrives as a discrete harmonics spectrum amplitude sequence |A(ωi)| (i = 1, 2, 3, ..., N, where N is the number of harmonics). It is first converted to logarithms by the logarithmic conversion unit 508. The subtractor 509 then calculates its difference from the predicted harmonics spectrum amplitude sequence 521, which is predicted from the harmonics spectrum amplitude sequence of the previous frame (this difference is called the "prediction difference sequence"), and passes it to the block conversion unit 510. The block conversion unit 510 sorts the prediction difference sequence in order into six groups according to harmonic order to form two-dimensional data and passes it to the DCT (discrete cosine transform) unit 511. The DCT coefficients are calculated and passed to the quantization unit 512, where a combination of uniform quantization and vector quantization, selected in advance according to the order of each DCT coefficient, produces the quantized data B2 of the prediction difference sequence at the output terminal 513. The quantization restoration unit 514, the inverse DCT unit 515, and the block restoration unit 516 restore the quantized prediction difference sequence, and the adder 517 adds it to the predicted harmonics spectrum amplitude sequence 521 to restore the quantized spectral envelope of the current frame. The quantized spectral envelope is delayed by one frame in the frame delay unit 518. The spectral envelope prediction unit 519 then predicts the spectral envelope of the next frame from the newly input fundamental frequency ωo of the next frame and the fundamental frequency of the previous frame, and feeds the prediction to the subtractor 509 in preparation for quantizing the spectral envelope of the next frame.
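
The two simplest quantizers in this conventional encoder, the 8-bit uniform quantizer for ωo and the packing of the V/UV flags into the binary value B1, can be sketched as follows. This is a minimal sketch under stated assumptions: the text only says the quantization range is "predetermined", so `lo` and `hi` are hypothetical parameters, and placing the lowest band in the most significant bit is an assumed bit order.

```python
def quantize_uniform(value, lo, hi, bits=8):
    """Uniformly quantize value in [lo, hi] to an integer index of `bits` bits."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize_uniform(index, lo, hi, bits=8):
    """Map an index produced by quantize_uniform back to a value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + index * (hi - lo) / levels


def pack_vuv(flags):
    """Pack per-band 0/1 flags into one integer.

    The lowest band goes into the most significant bit (assumed order).
    """
    value = 0
    for flag in flags:
        value = (value << 1) | (1 if flag else 0)
    return value
```

With K = 12 and the three lowest bands voiced, `pack_vuv` yields the 12-bit value 0b111000000000.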

[0007] The principle of the MBE scheme is described in D. W. Griffin and J. S. Lim, "Multi-band Excitation Vocoder", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, No. 8, August 1988, pp. 1223-1235. The construction of the encoder is disclosed in more detail, as the speech coding procedure of the IMBE scheme, in USP-5491722 (Methods for speech transmission, Feb. 13, 1996). Thus, as a way of digitizing speech and achieving low-bit-rate speech coding, analysis-synthesis speech coding schemes that extract and encode speech coding parameters based on a speech synthesis model have been proposed, and some are in practical use.

[0008]

PROBLEMS TO BE SOLVED BY THE INVENTION The analysis-synthesis speech coding schemes described above are effective for low-bit-rate speech coding. However, because they analyze and synthesize speech using only the synthesis parameters of a particular speech analysis-synthesis model, the result tends to sound synthetic, depending on how the coding scheme is configured. One conceivable remedy is to shorten the frame update period, which reduces the change in speech parameters within a frame and raises sound quality while retaining the analysis-synthesis approach. The improvement in coded speech quality obtained by shortening the frame update period is reported in Fumoto et al., "Study on speech coding for commercial mobile communications", IEICE 2000 National Convention, D14-2, p. 171, Mar. 2000.

[0009] Shortening the frame update period, however, runs counter to achieving a high compression ratio (a low bit rate). In the report by Fumoto et al. cited above, for example, each speech frame is split in two to form a subframe structure from which the speech coding parameters are extracted, so encoding those parameters by the conventional method requires twice the bit rate. A parameter coding method is therefore desired in which the number of coding bits does not grow excessively even when the speech segment length is shortened. In short, improving the speech quality of low-bit-rate analysis-synthesis coding requires an efficient way of encoding the analysis-synthesis speech coding parameters, and in particular a way of preventing the coding bit rate from increasing when the frame update period is shortened.

[0010] Accordingly, an object of the present invention is to provide an analysis-synthesis speech coding method and apparatus that can greatly reduce the coding bit rate. A further object is to provide an analysis-synthesis speech coding method and apparatus with good sound quality, in which the coded speech quality is improved by shortening the frame update period of the speech coding while an increase in the number of coding bits is prevented.

[0011]

MEANS FOR SOLVING THE PROBLEMS To solve the above problems, the speech coding method of the present invention obtains speech coding parameters from a speech signal that has been digitized and divided into frames of a predetermined length, and encodes them. The method includes: a step of encoding the voice pitch, as one of the speech coding parameters, by combining frames that are quantized by whichever of differential quantization and uniform quantization is selected, with frames that are quantized by the index number of one candidate selected from a plurality of interpolated voice pitch candidates calculated from the quantized voice pitches of the preceding and following frames; a step of encoding the voiced/unvoiced information, as one of the speech coding parameters, by the index number of the closest entry among a limited number of representative voiced/unvoiced information entries; and a step of encoding the harmonics spectrum amplitude sequence, as one of the speech coding parameters, by combining frames that are quantized by the linear prediction coefficients of a linear prediction model (or the line spectrum pairs derived from them) and a gain, with frames that are quantized by the index number of one combination selected from the combinations of a plurality of interpolated linear prediction coefficient (or line spectrum pair) candidates, calculated from the linear prediction coefficients (or line spectrum pairs) of the preceding and following frames, and a plurality of interpolated gain candidates, calculated from the gains of the preceding and following frames.

[0012] In the quantization of the voice pitch, the choice between differential quantization and uniform quantization is made as follows: when the differential quantization error is at or below a certain threshold, differential quantization is selected; otherwise the method with the smaller quantization error is selected. The interpolated voice pitch candidate is chosen as the one, among the plurality of candidates calculated from the voice pitches of the preceding and following frames, that is closest to the voice pitch of the current frame. The voice pitch is quantized as a logarithmic pitch, obtained by taking the logarithm of the voice pitch. In the differential quantization of the voice pitch, the quantization step is made larger as the difference value becomes larger.

[0013] The limited number of representative voiced/unvoiced information entries is created from a large set of voiced/unvoiced information collected in advance. Entries are deleted in ascending order of occurrence frequency, and the occurrence frequency of each deleted entry is distributed to and merged into the neighboring entries according to their distance and magnitude, until the set has been reduced to the desired limited number of entries. The distance between voiced/unvoiced information entries is defined by representing voiced or unvoiced in each speech spectrum band as 0 or 1, assigning these values to the bits of a binary number in order of band frequency, and taking the difference between the resulting binary values. The voiced/unvoiced information is carried by a combination of frames in which it is transmitted and frames in which it is omitted. In a frame whose voiced/unvoiced information is omitted, decoding uses the voiced/unvoiced information of whichever of the preceding and following frames has the larger frame energy or mean amplitude. When the harmonics spectrum amplitude sequence is modeled by a linear prediction model, the zeroth-order (DC component) harmonics spectrum amplitude is corrected before the linear prediction modeling is carried out. The plurality of interpolated gain candidates includes gains obtained by dividing the interval between the gains of the preceding and following frames equally or unequally, a gain at or above the larger of the two gains, and a gain at or below the smaller of the two gains.
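
The V/UV distance measure, the nearest-representative search, and the decoder rule for omitted frames can be sketched as follows. This is a minimal sketch, assuming the V/UV flags have already been packed into an integer in band-frequency order; the codebook contents in the usage lines are invented for illustration, and the tie-breaking rules (lowest index, previous frame on equal energy) are assumptions the text does not specify.

```python
def vuv_distance(a, b):
    """Distance between two packed V/UV patterns: difference of their binary values."""
    return abs(a - b)


def nearest_representative(vuv, codebook):
    """Index of the representative pattern closest to vuv (lowest index on ties)."""
    return min(range(len(codebook)), key=lambda i: vuv_distance(vuv, codebook[i]))


def fill_omitted(prev_vuv, prev_energy, next_vuv, next_energy):
    """V/UV pattern used by the decoder for a frame sent without V/UV information.

    Takes the pattern of the neighboring frame with the larger energy;
    the previous frame wins on a tie (assumed rule).
    """
    return prev_vuv if prev_energy >= next_energy else next_vuv


# Usage with a hypothetical 4-entry (2-bit) codebook of 12-band patterns.
codebook = [0b000000000000, 0b111000000000, 0b111111000000, 0b111111111111]
index = nearest_representative(0b111100000000, codebook)
```

Because the distance is a difference of binary values, a disagreement in a bit of higher significance costs more than one in a bit of lower significance, which follows from the definition given in the text.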

[0014] The speech coding apparatus of the present invention obtains speech coding parameters from a speech signal that has been digitized and divided into frames of a predetermined length, and encodes them. The apparatus includes: means for encoding the voice pitch, as one of the speech coding parameters, by combining frames that are quantized by whichever of differential quantization and uniform quantization is selected, with frames that are quantized by the index number of one candidate selected from a plurality of interpolated voice pitch candidates calculated from the quantized voice pitches of the preceding and following frames; means for encoding the voiced/unvoiced information, as one of the speech coding parameters, by the index number of the closest entry among a limited number of representative voiced/unvoiced information entries; and means for encoding the harmonics spectrum amplitude sequence, as one of the speech coding parameters, by combining frames that are quantized by the linear prediction coefficients of a linear prediction model (or the line spectrum pairs derived from them) and a gain, with frames that are quantized by the index number of one combination selected from the combinations of a plurality of interpolated linear prediction coefficient (or line spectrum pair) candidates, calculated from the linear prediction coefficients (or line spectrum pairs) of the preceding and following frames, and a plurality of interpolated gain candidates, calculated from the gains of the preceding and following frames.

[0015] Thus, in the present invention, the number of quantization bits for the voice pitch is reduced by alternating, from frame to frame, between two ways of quantizing the logarithmically converted pitch. In one, differential quantization and uniform quantization are both tried and the one with the smaller error relative to the input voice pitch is selected. In the other, the number of the closest candidate among a plurality of interpolation points between the voice pitches of neighboring frames is selected and used as the quantized value. For the voiced/unvoiced information (V/UV information), V/UV information and its occurrence frequency are collected in advance over many speech frames, a fixed number of representative V/UV information entries is selected from them, and each frame is encoded by the number (index) of the representative entry most similar to its V/UV information. In addition, frames in which no V/UV information is transmitted are inserted as appropriate; such a frame is decoded using the V/UV information of whichever of the preceding and following frames has the larger speech energy. For the harmonics spectrum amplitude sequence, the sequence is modeled by a high-order all-pole model, that is, an autoregressive linear prediction model (AR model), and either its linear prediction coefficients (LPC coefficients) and gain, or the LSPs (line spectrum pairs) obtained by transforming the LPC coefficients and the gain, are quantized. The number of quantization bits for the harmonics spectrum amplitude sequence is further reduced by also using a second method for the LPC coefficients or LSPs and the gain: the number of the closest candidate among a plurality of interpolation points between the quantized LPC coefficients or LSPs and gains of neighboring frames is selected and used as the quantized value.

[0016]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the speech coding method and apparatus of the present invention is described below, taking as an example its application to the MBE or IMBE speech coding method, which are analysis-synthesis speech coding methods as described above. The speech coding apparatus corresponds to the parameter encoding unit 303 shown in FIG. 7. It efficiently encodes the speech coding parameters extracted by the speech coding parameter extraction unit 302, namely the voice pitch (or its reciprocal, the fundamental frequency ωo), the harmonics spectrum amplitude sequence, and the voiced/unvoiced information (V/UV information) of each frequency band.

[0017] FIG. 1 is a block diagram showing an example configuration of a speech coding apparatus to which the speech coding method of the present invention is applied. The voice pitch (or fundamental frequency ωo) obtained, for example, by the speech coding parameter extraction unit shown in FIG. 8 is input to the input terminal 101, and the logarithmic conversion unit 102 converts it to the logarithmic voice pitch P[n] (n is the frame number). As described in the literature (Thomas Eriksson and Hong-Goo Kang, "Pitch Quantization in Low Bit-Rate Speech Coding", ICASSP '99, pp. 489-492, 1999), the human detection threshold for a change in logarithmic pitch is known to depend little on the value of the logarithmic pitch itself. The logarithm is therefore a convenient transformation, because it allows a uniform quantization step width to be used.

[0018] The input switching unit 103 switches the logarithmic pitch P[n] alternately, frame by frame (or subframe by subframe when subframes are used), to one of the two output terminals 104 and 116. When output to 104, the pitch is fed to the uniform quantization unit 105 and the subtraction unit 112. The uniform quantization unit 105 quantizes it uniformly with a fixed quantization step, and the resulting quantized logarithmic pitch P1'[n] is input to the pitch comparison unit 108. The subtraction unit 112, meanwhile, obtains the differential logarithmic pitch from the input logarithmic pitch and the quantized logarithmic pitch of the previous frame received from the delay unit 111, and inputs it to the differential quantization unit 113. The delay unit 111 serves to pass the quantized logarithmic pitch of the immediately preceding frame to the current frame. The differential quantization unit 113 quantizes the difference either with a fixed step or with non-uniform steps that are symmetric about zero input and grow as the magnitude of the differential input increases. The addition unit 114 adds the result to the quantized logarithmic pitch of the previous frame, which served as the reference, and outputs the differentially quantized logarithmic pitch P2'[n] to 107.

【００１９】The pitch comparison unit 108 compares P1'[n] and P2'[n], selects whichever quantized logarithmic pitch has the smaller error relative to the unquantized logarithmic pitch P[n], and outputs it to output terminal 109 as the quantized logarithmic pitch P'[n] of the frame. Of the uniform quantization index N1 and the differential quantization index N2, the index produced by the quantizer that the pitch comparison unit 108 selected is output to output terminal 110 as the pitch code. Because the N1 and N2 indices are assigned so that their numbers do not overlap, the output index number itself reveals which quantization method was selected.
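The two-quantizer selection of paragraphs [0018] and [0019] can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the patent: the function names, the difference table, and the layout of the code space (uniform indices first, differential indices placed after them so the two ranges never overlap) are all hypothetical.

```python
def uniform_quantize(p, p_min, step, levels):
    """Uniformly quantize log pitch p; return (index, reconstructed value)."""
    idx = round((p - p_min) / step)
    idx = max(0, min(levels - 1, idx))          # clamp to the valid range
    return idx, p_min + idx * step


def differential_quantize(p, p_prev_q, diff_table):
    """Quantize p - p_prev_q against a table of allowed differences.

    A table with wider spacing far from zero gives the non-uniform
    variant mentioned in the text (assumed table, not from the patent).
    """
    d = p - p_prev_q
    idx = min(range(len(diff_table)), key=lambda i: abs(diff_table[i] - d))
    return idx, p_prev_q + diff_table[idx]


def encode_pitch(p, p_prev_q, p_min, step, levels, diff_table):
    """Return (pitch_code, quantized_pitch) from the smaller-error quantizer."""
    n1, p1 = uniform_quantize(p, p_min, step, levels)
    n2, p2 = differential_quantize(p, p_prev_q, diff_table)
    if abs(p1 - p) <= abs(p2 - p):
        return n1, p1                 # codes 0 .. levels-1 mean "uniform"
    return levels + n2, p2            # codes levels .. mean "differential"
```

With, say, 12 uniform levels and a 4-entry difference table, the combined code fits in 4 bits, and a decoder can tell the method from the code value alone (code < 12 means uniform).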

【００２０】The logarithmic pitch P[n] appearing at the other output 116 of the input switching unit 103 is compared, in the interpolation point comparison unit 119, with a plurality of interpolation pitch candidates that the interpolation pitch candidate creation unit 117 creates from the quantized logarithmic pitches of the current and previous frames obtained at the input and output of the delay unit 111. The interpolation point index (selection number) N3 of the candidate closest to the logarithmic pitch at output 116 is output to 120 as the pitch interpolation code.

【００２１】FIG. 2 illustrates the operation of the interpolation pitch candidate creation unit 117 of FIG. 1. In the example of FIG. 2 the number of interpolation point candidates is four, so the selection number is quantized with 2 bits. Let P'[n+1] be the quantized logarithmic pitch of the frame following the current frame and P'[n-1] that of the frame preceding it. The four points that evenly divide the straight line joining them are marked with crosses (×). Of these four interpolation pitch candidates, the one closest to the input logarithmic pitch P[n] is selected as the interpolated quantized pitch P'[n]; in the example of FIG. 2, index 2 is selected as the index giving this pitch. P[n] is the pitch of the frame lying between P'[n+1] and P'[n-1]. For example, when a frame is divided into two subframes, P[n] corresponds to the first subframe of the current frame, P'[n+1] to the quantized pitch of the second subframe of the current frame, and P'[n-1] to the quantized pitch of the second subframe of the previous frame.

【００２２】In the candidate arrangement of FIG. 2, P'[n+1] and P'[n-1] are not themselves included among the interpolation pitch candidates, but the candidates can also be set so as to include both endpoints P'[n+1] and P'[n-1]. In that case only two candidates remain besides P'[n+1] and P'[n-1]. When the candidate positions exclude the endpoints, as in FIG. 2, even a single bit can select between two interior points, so the arrangement of FIG. 2 is effective when few bits, such as 1 or 2, are allotted to the interpolation point.
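The candidate placement discussed for FIG. 2 can be illustrated with a short sketch. The function names and the index order (index 0 nearest P'[n-1]) are assumptions; the text only states that the candidates divide the line between the two quantized pitches evenly and that the endpoints may or may not be included.

```python
def interpolation_candidates(p_prev_q, p_next_q, count, include_ends=False):
    """Evenly spaced log-pitch candidates on the line from p_prev_q to p_next_q.

    include_ends=False places `count` interior points (the FIG. 2 layout);
    include_ends=True makes the two endpoints the first and last candidates
    and requires count >= 2.
    """
    if include_ends:
        fractions = [i / (count - 1) for i in range(count)]
    else:
        fractions = [(i + 1) / (count + 1) for i in range(count)]
    return [p_prev_q + f * (p_next_q - p_prev_q) for f in fractions]


def encode_interpolated_pitch(p, p_prev_q, p_next_q, count=4, include_ends=False):
    """Return (index, value) of the candidate closest to the input log pitch p."""
    cands = interpolation_candidates(p_prev_q, p_next_q, count, include_ends)
    idx = min(range(count), key=lambda i: abs(cands[i] - p))
    return idx, cands[idx]
```

With `count=2` and the endpoints excluded, a single bit chooses between the points one third and two thirds of the way along the line, which is the low-bit case the paragraph describes.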

【００２３】Returning to FIG. 1, the voiced/unvoiced information (V/UV information) is input from input terminal 131, and the frame thinning unit 133 thins it out across frames. For example, the V/UV information is output only once every two frames and is input to the voiced/unvoiced comparison unit 134. On the receiving (decoding) side, a frame whose voiced/unvoiced decision information was omitted is decoded using the voiced/unvoiced information of whichever neighboring frame, preceding or following, has the larger frame energy or mean amplitude.
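The decoder-side rule for a frame whose V/UV information was not transmitted can be written in a few lines. The function name and the tie-breaking choice (prefer the preceding frame when the energies are equal) are assumptions; the text specifies only that the neighbor with the larger energy or mean amplitude supplies the information.

```python
def fill_omitted_vuv(vuv_prev, energy_prev, vuv_next, energy_next):
    """Reuse the V/UV pattern of the neighboring frame with the larger energy.

    Ties go to the preceding frame (an arbitrary choice for this sketch).
    """
    return vuv_prev if energy_prev >= energy_next else vuv_next
```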

【００２４】The representative voiced/unvoiced information codebook 132 stores a limited number of frequently occurring entries, chosen from V/UV information collected in advance from many speech frames by the method of the present invention described later. The voiced/unvoiced comparison unit 134 selects the representative V/UV information value b1' closest in distance to the currently input V/UV information value b1, and outputs the index of that representative as the voiced/unvoiced code 135. The V/UV information value b1 (or b1') is a binary number whose bits are the per-band V/UV values v[k], k = 1, 2, ..., K (each v[k] is 0 or 1), where the bands are obtained by dividing the speech frequency spectrum into intervals of, for example, three times the speech fundamental frequency.

【数１】 (Equation 1)

The distance between V/UV information values is the absolute value of the difference between b1 and b1'. The representative V/UV information is set independently for each number of bands, which is determined by the speech fundamental frequency.
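As a rough illustration of the packed V/UV value and the nearest-representative search, consider the sketch below. Equation 1 is not reproduced in this text, so the bit order used here (first band in the most significant bit) is an assumption, as are the function names and the example codebook.

```python
def pack_vuv(v):
    """Pack per-band flags v[0..K-1] (each 0 or 1) into one integer.

    The first band goes into the most significant bit (assumed order).
    """
    b = 0
    for flag in v:
        b = (b << 1) | (flag & 1)
    return b


def nearest_representative(b1, codebook):
    """Return (index, value) of the codebook entry minimizing |b1 - b1'|."""
    idx = min(range(len(codebook)), key=lambda i: abs(codebook[i] - b1))
    return idx, codebook[idx]
```

For five bands, `pack_vuv([1, 1, 0, 1, 1])` gives 27, and against a hypothetical 2-bit codebook `[0, 24, 28, 31]` the nearest entry is 28 at index 2.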

【００２５】The representative V/UV information can be selected as follows. A large number of V/UV information values are first collected from many speech frames and classified by the number of bands K. For the set {b1_i} of V/UV information values in each band-count class, the frequency of occurrence of each b1_i is tallied. For example, with five bands b1 takes integer values from 0 to 31, and a frequency of occurrence is tallied for each value. To quantize the V/UV information with, say, 2 bits, four representative V/UV information values must be chosen from among them. One option is simply to take the four most frequent values, but this can select adjacent values, which makes a poor set of representatives. Highly frequent values are especially likely to lie next to one another when the number of bands K is large.

【００２６】The present invention therefore removes the least frequent V/UV information one entry at a time, distributing its frequency of occurrence to the adjacent V/UV information, until only the target number of representatives remains. FIG. 3 illustrates how the least frequent V/UV pattern is removed as the number of candidate V/UV patterns is reduced step by step. Let q[n] be the frequency of occurrence of the least frequent V/UV pattern. q[n] is split into q1 and q2, which are added to the frequencies q[n-l₁] and q[n+l₂] of the adjacent V/UV patterns on either side. The shares q1 and q2 are determined by the following equation according to the magnitudes of q[n-l₁] and q[n+l₂] and to how close the adjacent patterns are, that is, the distances l₁ and l₂.

【数２】 (Equation 2)
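The pruning procedure of paragraph [0026] might look like the sketch below. Equation 2 is not reproduced in this text, so the weighting used here (each neighbor's count divided by its distance from the removed pattern) is purely an assumed stand-in that merely grows with neighbor frequency and shrinks with distance; the function name is hypothetical as well.

```python
def prune_patterns(freq, target):
    """Reduce a histogram {pattern value: count} to `target` entries.

    Repeatedly removes the least frequent pattern and splits its count
    between the nearest surviving patterns on either side.
    """
    q = dict(freq)
    while len(q) > target:
        keys = sorted(q)
        n = min(keys, key=lambda k: q[k])            # least frequent pattern
        pos = keys.index(n)
        left = keys[pos - 1] if pos > 0 else None
        right = keys[pos + 1] if pos + 1 < len(keys) else None
        # Assumed weights: neighbor count / distance (NOT the patent's Equation 2).
        w_left = q[left] / (n - left) if left is not None else 0.0
        w_right = q[right] / (right - n) if right is not None else 0.0
        total = w_left + w_right
        share = q.pop(n)
        if total > 0:
            if left is not None:
                q[left] += share * w_left / total    # q1
            if right is not None:
                q[right] += share * w_right / total  # q2
    return q
```

The surviving keys are the representative V/UV values for one band count; running the function once per band count K would fill a codebook such as 132.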

【００２７】Returning again to FIG. 1, the coding of the spectral envelope is described. The spectral envelope is supplied to input terminal 140 as a discrete spectrum, the harmonics spectrum amplitude sequence A[l]. The input switching unit 141 switches the path frame by frame, and one path leads to the spectrum correction unit 142, whose operation is described later. The spectrum interpolation unit 143 interpolates between the discrete harmonics spectrum amplitudes, producing the larger number of spectrum amplitude points that the linear prediction modeling unit 144 needs for effective modeling. The interpolation is linear in the logarithm of the harmonics spectrum amplitude and is therefore nonlinear in the amplitude itself. Other nonlinear interpolation may of course be used; an interpolation method well matched to the linear prediction model is desirable.
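Linear interpolation in the log-amplitude domain can be sketched as follows. The function name and the `factor` parameter (how many output points per harmonic interval) are assumptions for illustration; amplitudes are assumed to be strictly positive so that the logarithm is defined.

```python
import math


def interpolate_log_amplitudes(amps, factor):
    """Insert factor-1 points between neighboring harmonics.

    Interpolation is linear in log amplitude, hence geometric in amplitude.
    """
    out = []
    for a, b in zip(amps, amps[1:]):
        la, lb = math.log(a), math.log(b)
        for k in range(factor):
            out.append(math.exp(la + (lb - la) * k / factor))
    out.append(amps[-1])                 # keep the last harmonic itself
    return out
```

For amplitudes 1 and 4 with `factor=2`, the inserted midpoint is 2 (the geometric mean), not 2.5 as plain linear interpolation would give.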

【００２８】The spectrum amplitude sequence whose sample count has been increased by interpolation is modeled with an autoregressive linear prediction model in the linear prediction modeling unit 144 and converted into a gain g and high-order (for example, up to 10th-order) linear prediction coefficients (LPC coefficients a[j], j = 1, 2, 3, ..., J, where J is the prediction order). The gain quantization unit 145 converts the gain g to a logarithmic gain and quantizes it uniformly, and the resulting index is output to the spectrum gain code output terminal 155. The LPC coefficients a[j], meanwhile, are converted into line spectrum pairs (LSP) F[j] by the modeling coefficient conversion unit 147. LSPs take values from 0 to π, so their range is bounded; linear interpolation of them causes little audible degradation; and they are widely used as coefficients for modeling the spectral envelope. The LSPs are vector-quantized by the LSP quantization unit 148 using the LSP codebook 149, and the codebook index is output to the spectrum envelope code output terminal 156 as the spectrum envelope code.

【００２９】The function and purpose of the spectrum correction unit 142 are as follows. When a harmonics spectrum amplitude sequence is modeled with a linear prediction model, having many spectral points helps, but any spectral value that does not fit the model distorts the modeling and increases the spectral error afterwards. In general, the 0th-order harmonics spectrum amplitude (the DC component) of speech is lower than the other harmonics spectrum amplitudes and tends to cause modeling error. Moreover, the 0th-order amplitude is not needed when the speech is decoded (a speech signal contains almost no DC component), so it may be adjusted to a level that is easy to model. For these reasons the spectrum correction unit 142 modifies the 0th-order harmonics spectrum amplitude so that the sequence is easier to model with the linear prediction model. Specifically, the 0th-order harmonics amplitude is replaced by the value calculated with the following equation.

【数３】 (Equation 3)

Here H'₀ is the corrected 0th-order harmonics spectrum amplitude, and H₁, H₂, and H₃ are the 1st-, 2nd-, and 3rd-order harmonics spectrum amplitudes, respectively. An equation other than this one may also be used to replace the 0th-order harmonics amplitude.

【００３０】The other input harmonics spectrum amplitude sequence A[l], routed by the input switching unit 141 on alternate frames, is input to the spectrum interpolation point comparison unit 154. The gain interpolation calculation unit 150 computes interpolated gain candidates by interpolating between the quantized gains of the preceding and following frames, taken from the input and output of the gain delay unit 146. The LSP interpolation calculation unit 152 likewise computes interpolated LSP coefficient candidates from the LSP coefficients of the preceding and following frames, taken from the input and output of the LSP delay unit 151. From combinations of these candidates the spectrum restoration unit 153 restores harmonics spectrum amplitudes A'_i[l] (l is the harmonic number; i is the interpolation candidate number, or index). The spectrum interpolation point comparison unit 154 compares each A'_i[l] with the input sequence A[l], selects by the following equation the index i whose interpolated gain and interpolated LSP coefficients give the smallest spectral error, and outputs it to the spectrum interpolation code output terminal 157 as the spectrum interpolation code.

【数４】 (Equation 4)

Here argmin_i(x), written in the equation as argmin with i beneath it, is the function that evaluates x with i as the parameter and returns the i for which x is smallest.
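The argmin selection can be illustrated as below. Equation 4 itself is not reproduced in this text, so the error measure used here (sum of squared amplitude differences over the harmonics) is an assumption, and the function name is hypothetical.

```python
def select_interpolation_index(target, candidates):
    """Return the index i of the restored spectrum closest to the target.

    target:     input harmonics amplitudes A[l]
    candidates: list of restored sequences A'_i[l], one per candidate i
    """
    def error(restored):
        # Assumed measure: squared error summed over the harmonics.
        return sum((a - b) ** 2 for a, b in zip(target, restored))

    return min(range(len(candidates)), key=lambda i: error(candidates[i]))
```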

【００３１】Restricting the interpolated gain candidates to values between the quantized gains of the preceding and following frames can cause problems. For example, when the gain has a sharp local minimum, the coded speech cannot drop to a sufficiently low level and an unpleasant noise results. The problem is reduced by adding candidates below the minimum and above the maximum of the quantized gains of the preceding and following frames. Specifically, values such as the maximum (the larger of the two quantized gains) +5 dB and +10 dB, and the minimum (the smaller of the two) -5 dB and -10 dB, are added to the candidates obtained by interpolating between the two quantized gains.
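A candidate list of this kind could be assembled as in the following sketch. The ±5 dB and ±10 dB offsets come from the text; the function name, the number of interior points, and the ordering of the list are assumptions.

```python
def gain_candidates_db(g_prev_db, g_next_db, n_interp=2, offsets_db=(5.0, 10.0)):
    """Build interpolated gain candidates (in dB) plus out-of-range extras.

    Interior points divide the interval between the two quantized gains
    evenly; the extras lie above the larger gain and below the smaller one.
    """
    lo, hi = min(g_prev_db, g_next_db), max(g_prev_db, g_next_db)
    interior = [
        g_prev_db + (g_next_db - g_prev_db) * (i + 1) / (n_interp + 1)
        for i in range(n_interp)
    ]
    above = [hi + d for d in offsets_db]
    below = [lo - d for d in offsets_db]
    return interior + above + below
```

For neighboring gains of 30 dB and 36 dB this yields 32, 34, 41, 46, 25, and 20 dB, so a frame sitting in a sharp dip can still be assigned a gain well below both neighbors.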

【００３２】Next, the processing flow of the encoding unit shown in FIG. 1 is described. FIG. 4 is a flowchart of the coding of the speech fundamental frequency ωo. Processing starts at 701. At 702 the speech fundamental frequency ωo to be quantized and the frame number m are set. At 703 the logarithmic pitch P is computed, and at 704 the parity of m is tested. If m is even, uniform quantization and differential quantization are performed at 705 and 706, respectively, yielding the uniformly quantized pitch P_u with its index Index_u and the differentially quantized pitch P_d with its index Index_d. At 708 the differential quantization error |P_d - P| is tested. If it is smaller than a threshold Th, then at 710 the pitch of frame m, P[m] (written P[2n] to indicate an even frame), is set to P_d and its index Index[m] (written Index[2n]) is set to Index_d. If instead |P_d - P| is found at 708 to be Th or greater, the uniform quantization error |P_u - P| and the differential quantization error |P_d - P| are compared at 707. When the uniform quantization error is smaller, the pitch P[2n] of frame m is set to P_u and its index Index[2n] to Index_u at 709; otherwise P[2n] is set to P_d and Index[2n] to Index_d at 710. The pitch P[2n] of frame m is also delayed by two frame times at 711 to become P[2n-2], which serves as the reference logarithmic pitch for the differential quantization 706 of the next even frame.
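The decision made in steps 708, 707, 709, and 710 of FIG. 4 can be condensed into one function. This is a sketch under assumptions: the function and parameter names are hypothetical, and the two quantized candidates are taken as inputs rather than computed here.

```python
def choose_even_frame_pitch(p, p_u, index_u, p_d, index_d, th):
    """Pick the quantized pitch and index for an even frame.

    The differential result is accepted outright when its error is below
    the threshold th (step 708); otherwise the quantizer with the smaller
    error wins (step 707), with ties going to the differential result.
    """
    err_d = abs(p_d - p)
    if err_d < th:
        return p_d, index_d              # step 710
    err_u = abs(p_u - p)
    if err_u < err_d:
        return p_u, index_u              # step 709
    return p_d, index_d                  # step 710
```

Note that the threshold lets the differential quantizer win even when the uniform one is slightly more accurate, for example with errors of 0.02 and 0.01 and `th=0.03`.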

【００３３】When m is odd at 704, a pitch interpolation candidate is selected at 713. The candidates are computed at 712, in the same manner as shown in FIG. 2, as the set {Pinpol[2n-1]_i} (i = 0, 1, 2, 3, ..., N-1, where N is the number of interpolation candidate points) of interpolation pitch candidates that evenly divide the interval between the quantized logarithmic pitches of the preceding and following frames. The candidate with the smallest absolute error relative to the logarithmic pitch P[2n-1] of frame m = 2n-1 is selected, and its index number Index[2n-1] is set at 714 as the quantized pitch code of the odd frame. For an even frame, the index Index_d or Index_u of the quantization method selected at 709 or 710 is likewise set at 714 as Index[2n]. The pitch codes for the two frames, one even and one odd, are output at 715. With the speech fundamental frequency coding method of the present invention shown in FIG. 4, the speech fundamental frequency of two frames can be coded satisfactorily using, for example, 4 bits for the pitch code and 1 bit for the pitch interpolation code.

【００３４】FIG. 5 is a flowchart of the coding of the V/UV information. Processing starts at 801. At 802 the speech fundamental frequency is set to ωo, the V/UV information value of the frame to v, and the frame number to m. At 803 the number of bands K for ωo is obtained:

K = (int)((π - ωo/2) / (ωo × B))

Here B is the number of harmonics contained in each band; it is fixed before coding, and a value of about 3 is used. At 804 the parity of m is tested. If m is even, then at 805 the group {VCB[K][i]} for K bands is taken from the data 806 of representative V/UV information values chosen in advance for each number of bands, the index i of the representative value closest to the input V/UV value v is selected as IndexV/UV[2n], and at 807 it is set as the voiced/unvoiced code. If the frame is odd, the voiced/unvoiced code is thinned out and not output. With the V/UV information coding method of the present invention shown in FIG. 5, the V/UV information of two frames can be coded satisfactorily using, for example, 2 bits for the voiced/unvoiced code.
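The band-count formula of step 803 translates directly into code. The function and parameter names are hypothetical; ωo is taken to be the normalized fundamental frequency in radians per sample, which is an assumption consistent with π appearing as the upper band edge.

```python
import math


def band_count(omega0, harmonics_per_band=3):
    """K = (int)((pi - omega0/2) / (omega0 * B)), truncating toward zero."""
    return int((math.pi - omega0 / 2) / (omega0 * harmonics_per_band))
```

For a pitch period of 40 samples (ωo = 2π/40) the expression evaluates to 6.5, giving K = 6; for a period of 80 samples it gives K = 13, so lower-pitched frames use the codebook group for more bands.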

【００３５】FIG. 6 is a flowchart of the coding of the harmonics spectrum amplitude sequence. Processing starts at 901. At 902 the harmonics spectrum amplitudes are set in A[l] and the frame number in m. At 903 the parity of m is tested. For an even frame, at 904 the 0th-order harmonics spectrum amplitude A[0] is corrected to α times the 2nd-order harmonics spectrum amplitude A[2]; the correction coefficient α is determined by the method already described. At 905 the corrected harmonics spectrum amplitude sequence is linearly interpolated in the logarithmic domain to increase the number of spectral points. At 906 the interpolated, denser spectrum sequence is modeled as values on the modeling curve of a linear prediction model. The gain g, one modeling coefficient of the linear prediction model, is uniformly quantized as a logarithmic gain at 908, giving the quantized gain g' and its index; the quantization step and range are fixed in advance at 907. The other modeling coefficients, the LPC coefficients a[j], are converted into line spectrum pairs (LSP) F[j] and vector-quantized at 909 using the vector quantization table 910, giving the quantized vector F'[j] and its index.

【００３６】The quantized gain g' from 908 is delayed by two frame times at 912, and at 911 a plurality (M1) of interpolated gain candidates between the input and output of the delay are computed by linear interpolation of the logarithmic gain and passed to 915. Similarly, the quantized LSP F'[j] from 909 is delayed by two frame times at 913, a plurality (M2) of interpolated LSP candidates between the input and output of the delay are computed by linear interpolation at 914, and the sequence of LSP interpolation candidates is passed to 915. At 915 interpolated harmonics spectra A'[l] are restored from the combinations (at most M1 × M2) of the LSP and gain interpolation candidates. At 916 the set of restored interpolated harmonics spectrum sequences is compared in turn with the harmonics spectrum sequence of the odd frame supplied from 903, and the combination number giving the interpolated sequence with the smallest spectral error is chosen as the spectrum interpolation number. The spectrum gain code shown in FIG. 1 is obtained from 908 and the spectrum envelope code from 909, and processing ends at 917. With the harmonics spectrum amplitude sequence coding method of the present invention shown in FIG. 6, the spectrum-corrected spectrum gain codes and spectrum envelope codes of two frames can be coded satisfactorily as the corrected spectrum gain code and spectrum envelope code of one frame plus a spectrum interpolation code, which improves speech quality and reduces the number of coded bits.

【００３７】For clarity, the description above has the input switching units 103 and 141 of FIG. 1 switch on every frame, but the invention is not limited to alternating frame by frame; operation with a different switching period is easily realized as needed. This entails minor changes to the flowcharts of FIGS. 4, 5, and 6, which engineers and researchers in the field can readily make. In the embodiment above, frames that carry no voiced/unvoiced decision information were provided to reduce the number of coded bits further, but voiced/unvoiced information may instead be sent for every frame. Furthermore, although the harmonics spectrum amplitude sequence was quantized above as a gain and line spectrum pairs (LSP), it may instead be quantized as a gain and linear prediction coefficients (LPC coefficients).

【００３８】[0038]

【発明の効果】[Effects of the Invention] As described above, the speech coding method and apparatus of the present invention apply to an analysis-synthesis type speech coding method whose parameters for each speech frame are the speech pitch (or speech fundamental frequency), the V/UV information of each spectral band, and the harmonics spectrum amplitude sequence. The speech pitch is coded by switching between frames in which the logarithmic pitch is differentially or uniformly quantized and frames quantized by an inter-frame interpolation index, which greatly reduces the number of coded bits. Coding the V/UV information as the index number of a representative V/UV value, obtained in advance by the method of the present invention for each number of bands determined by the range of the speech fundamental frequency, reduces the number of V/UV coding bits in a rational way. Thinning the V/UV information across frames, and inferring the V/UV information of frames that do not transmit it from the preceding and following frames as described, reduces the number of V/UV coding bits further. In addition, the invention switches between two kinds of frames for the harmonics spectrum amplitude sequence. In one, the 0th-order harmonics amplitude is corrected by the described method, the sequence is modeled with an autoregressive linear prediction model, and the model coefficients are quantized and transmitted. In the other, the model coefficients themselves are not quantized; instead, several interpolated modeling coefficients are derived by inter-frame interpolation from already quantized coefficients, and the frame is quantized as the index number of the interpolation candidate that gives the best interpolated modeling coefficients. This greatly reduces the number of bits used to code the harmonics spectrum amplitude sequence while improving coded speech quality. The present invention thus provides an analysis-synthesis type speech coding method and apparatus with a greatly reduced coding bit rate. It also makes it possible to provide an analysis-synthesis type speech coding method and apparatus with good sound quality, in which coded speech quality is improved by shortening the frame update period of the speech coding while an increase in the number of coded bits is prevented.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating an embodiment of a speech encoding apparatus to which the speech encoding method of the present invention is applied.

FIG. 2 is a diagram for explaining the interpolation pitch candidate creation process in the interpolation pitch candidate creation unit of the present invention.

FIG. 3 is a diagram for explaining the method of deleting voiced/unvoiced information having a low frequency of occurrence in the present invention.

FIG. 4 is a flowchart of the encoding of the speech pitch in the speech encoding method of the present invention.

FIG. 5 is a flowchart of the encoding of the voiced/unvoiced information in the speech encoding method of the present invention.

FIG. 6 is a flowchart of the encoding of the harmonics spectrum amplitude sequence in the speech encoding method of the present invention.

FIG. 7 is a configuration diagram of a speech coding transmission device.

FIG. 8 is a block diagram of a speech coding parameter extraction unit.

FIG. 9 is a block diagram of a parameter encoding unit in a conventional speech encoding scheme.

[Explanation of symbols]

102 logarithmic conversion unit, 103 input switching unit, 105 uniform quantization unit, 108 pitch comparison unit, 111 delay unit, 112 subtraction unit, 113 difference quantization unit, 114 addition unit, 117 interpolation pitch candidate creation unit, 119 interpolation point comparison unit, 132 representative voiced/unvoiced information codebook, 133 frame thinning unit, 134 voiced/unvoiced comparison unit, 141 input switching unit, 142 spectrum correction unit, 143 spectrum interpolation unit, 144 linear prediction modeling unit, 145 gain quantization unit, 146 gain delay unit, 147 modeling coefficient conversion unit, 148 LSP quantization unit, 149 LSP codebook, 150 gain interpolation calculation unit, 151 LSP delay unit, 152 LSP interpolation calculation unit, 153 spectrum restoration unit, 154 spectrum interpolation point comparison unit

Continued from the front page. (72) Inventor: Seiji Sasaki, c/o YRP Advanced Mobile Communication Systems Research Laboratories Co., Ltd., 3-2 Hikarinooka, Yokosuka City, Kanagawa Prefecture. F-terms (reference): 5D045 CC07 DA11 DA20; 5J064 AA02 BA01 BB03 BB12 BC14 BC16 BC26 BD01; 5K041 AA00 CC01 DD01 EE22 EE24 EE31 HH23 HH43

Claims

[Claims]

1. A speech encoding method for acquiring and encoding speech encoding parameters from a speech signal that is digitized and divided into frames of a predetermined time length, the method comprising: a step of encoding a speech pitch as one of the speech encoding parameters by a combination of frames in which the speech pitch is quantized by a quantization method selected from a difference quantization method and a uniform quantization method, and frames in which one of a plurality of interpolation speech pitch candidates calculated using the quantized speech pitches of the preceding and succeeding frames is selected and quantized by its index number; a step of encoding voiced/unvoiced information as one of the speech encoding parameters by the index number of the closest one among a limited number of representative voiced/unvoiced information; and a step of encoding a harmonics spectrum amplitude sequence as one of the speech encoding parameters by a combination of frames that are quantized by linear prediction coefficients (or line spectrum pairs derived therefrom) and a gain, and frames in which one of the combinations of a plurality of interpolation linear prediction coefficient (or line spectrum pair) candidates calculated using the linear prediction coefficients (or line spectrum pairs) of the preceding and succeeding frames and a plurality of interpolation gain candidates calculated using the gains of the preceding and succeeding frames is selected and quantized by its index number.

2. The speech encoding method according to claim 1, wherein the selection between the difference quantization method and the uniform quantization method in the quantization of the speech pitch is a process of selecting the difference quantization method when the difference quantization error is equal to or smaller than a certain threshold value and, when the difference quantization error is equal to or larger than the threshold value, selecting the quantization method with the smaller quantization error, and wherein the selection of the interpolation speech pitch candidate is a process of selecting, from among the plurality of interpolation speech pitch candidates calculated from the speech pitches of the preceding and succeeding frames, the candidate closest to the speech pitch of the current frame.
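
The selection logic recited in claim 2 can be illustrated with a minimal Python sketch. The function names, the threshold, the step sizes, the quantizer ranges, and the interpolation weights below are hypothetical values chosen for illustration only; the claim does not specify them.

```python
def quantize_uniform(value, lo, hi, levels):
    """Uniformly quantize value on [lo, hi]; returns (index, reconstructed value)."""
    step = (hi - lo) / (levels - 1)
    clamped = min(max(value, lo), hi)
    index = round((clamped - lo) / step)
    return index, lo + index * step


def select_pitch_quantizer(log_pitch, prev_q, threshold=0.01,
                           diff_step=0.02, diff_levels=15,
                           lo=4.0, hi=7.0, uni_levels=128):
    """Choose difference or uniform quantization for one frame (all defaults are assumed)."""
    # Difference quantization relative to the previous quantized log pitch.
    half = (diff_levels - 1) // 2
    d_idx = max(-half, min(half, round((log_pitch - prev_q) / diff_step)))
    d_val = prev_q + d_idx * diff_step
    d_err = abs(log_pitch - d_val)
    # Small difference error: keep difference quantization.
    if d_err <= threshold:
        return "difference", d_idx, d_val
    # Otherwise pick whichever method gives the smaller error.
    u_idx, u_val = quantize_uniform(log_pitch, lo, hi, uni_levels)
    if d_err <= abs(log_pitch - u_val):
        return "difference", d_idx, d_val
    return "uniform", u_idx, u_val


def select_interpolated_pitch(log_pitch, prev_q, next_q, weights=(0.25, 0.5, 0.75)):
    """Pick the interpolation candidate closest to the current frame's log pitch."""
    candidates = [(1 - w) * prev_q + w * next_q for w in weights]
    index = min(range(len(candidates)), key=lambda i: abs(candidates[i] - log_pitch))
    return index, candidates[index]
```

In this sketch only the index (and, for the first kind of frame, the chosen method) would need to be transmitted; how the method flag is signalled is not addressed here.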

3. The speech encoding method according to claim 1, wherein the quantization of the speech pitch is performed using a logarithmic pitch obtained by logarithmically converting the speech pitch.

4. The speech encoding method according to claim 1, wherein the difference quantization of the speech pitch is performed by setting a quantization step larger as the difference value increases.
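
Claims 3 and 4 together describe difference quantization of a logarithmic pitch with a step that widens as the difference grows. The following Python sketch shows one way this could look; the table of reconstruction offsets and the function name are assumptions made for illustration, not values taken from the specification.

```python
import math

# Hypothetical reconstruction offsets in log-pitch units.
# The spacing between neighbouring levels doubles as the magnitude grows.
POSITIVE_LEVELS = [0.01, 0.03, 0.07, 0.15]
LEVELS = [-v for v in reversed(POSITIVE_LEVELS)] + [0.0] + POSITIVE_LEVELS


def quantize_log_pitch_difference(pitch, prev_quantized_log_pitch):
    """Quantize log(pitch) as an offset from the previous quantized log pitch."""
    diff = math.log(pitch) - prev_quantized_log_pitch
    # Nearest-level search over the non-uniform table.
    index = min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - diff))
    return index, prev_quantized_log_pitch + LEVELS[index]
```

With a table of this kind, small pitch changes are represented finely while large jumps are still reachable with the same number of index bits.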

5. The speech encoding method according to claim 1, wherein the limited number of representative voiced/unvoiced information is created by deleting voiced/unvoiced information from a large number of previously acquired voiced/unvoiced information in ascending order of frequency of occurrence, and by allocating and integrating the occurrence frequency of each deleted voiced/unvoiced information into adjacent voiced/unvoiced information according to the distance to and the size of that information, until the voiced/unvoiced information has been reduced to the desired limited number.
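
A rough Python sketch of the reduction described in claim 5 is given below. It assumes that each voiced/unvoiced pattern is already represented as an integer, that "adjacent" means the nearest remaining pattern on either side, and that the freed occurrence count is split in inverse proportion to distance; the claim's weighting by "size" is not modeled. All names are hypothetical.

```python
def reduce_vuv_patterns(counts, target_size):
    """Shrink a histogram of V/UV patterns to target_size entries (target_size >= 1).

    counts maps a pattern (as an int) to its occurrence count.
    """
    counts = {k: float(v) for k, v in counts.items()}
    while len(counts) > target_size:
        # Delete the least frequent pattern first (ties broken by pattern value).
        victim = min(counts, key=lambda k: (counts[k], k))
        freed = counts.pop(victim)
        lower = max((k for k in counts if k < victim), default=None)
        upper = min((k for k in counts if k > victim), default=None)
        if lower is None:
            counts[upper] += freed
        elif upper is None:
            counts[lower] += freed
        else:
            # The closer neighbour receives the larger share (assumed weighting).
            d_lo, d_up = victim - lower, upper - victim
            counts[lower] += freed * d_up / (d_lo + d_up)
            counts[upper] += freed * d_lo / (d_lo + d_up)
    return counts
```

The total occurrence count is preserved by construction, so the surviving patterns can still be ranked by how much of the training data they stand for.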

6. The speech encoding method according to claim 1, wherein the distance between voiced/unvoiced information is expressed by representing voiced or unvoiced for each speech spectrum band as 0 or 1, assigning the 0 or 1 value of each speech spectrum band to the bits of a binary number in the order of the frequency of the speech spectrum band, and taking the difference between the resulting binary values.
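
The distance measure of claim 6, together with the nearest-representative search of claim 1, can be sketched in Python as follows. The bit order (lowest band in the most significant bit), the use of 1 for voiced, and the function names are assumptions for illustration.

```python
def vuv_to_int(bands):
    """Pack per-band 0/1 flags into one integer, lowest band first (assumed MSB)."""
    value = 0
    for flag in bands:
        value = (value << 1) | flag
    return value


def vuv_distance(a, b):
    """Distance between two V/UV patterns as the difference of their binary values."""
    return abs(vuv_to_int(a) - vuv_to_int(b))


def nearest_representative(bands, codebook):
    """Index of the codebook entry with the smallest distance to bands."""
    return min(range(len(codebook)), key=lambda i: vuv_distance(bands, codebook[i]))
```

For example, with a three-entry codebook `[[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]`, the pattern `[1, 1, 0, 1]` maps to index 1, because its binary value 13 is closest to 12.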

7. The speech encoding method according to claim 1, wherein the voiced/unvoiced information is encoded by a combination of frames in which the voiced/unvoiced information is transmitted and frames in which it is omitted and not transmitted.

8. The speech encoding method according to claim 7, wherein a frame in which the voiced/unvoiced information is omitted is decoded using the voiced/unvoiced information of whichever of the preceding and succeeding frames has the larger energy or the larger average amplitude.
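
The decoding rule of claim 8 can be illustrated by a short Python sketch. The frame representation (a dict with `energy` and `vuv` keys) and the tie-breaking toward the preceding frame are assumptions; the claim states only that the neighbour with the larger energy or average amplitude is used.

```python
def infer_omitted_vuv(prev_frame, next_frame):
    """Return V/UV flags for a frame whose V/UV information was not transmitted.

    Each frame is a dict with 'energy' (float) and 'vuv' (list of 0/1 per band).
    """
    # Borrow the flags from the more energetic neighbour; ties go to the previous frame.
    donor = prev_frame if prev_frame["energy"] >= next_frame["energy"] else next_frame
    return list(donor["vuv"])
```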

9. The speech encoding method according to claim 1, wherein, when the harmonics spectrum amplitude sequence is modeled by a linear prediction model, the linear prediction modeling of the harmonics spectrum amplitude is performed after correcting the zero-order (DC component) harmonics spectrum amplitude value.

10. The speech encoding method according to claim 1, wherein the plurality of interpolation gain candidates include gains obtained by equally or unequally dividing the gains of the preceding and succeeding frames, the larger of the gains of the preceding and succeeding frames, and the smaller of the gains of the preceding and succeeding frames.
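
The candidate set of claim 10 can be sketched in Python as below. The division weights and function names are hypothetical, and selecting the candidate nearest to the current gain is a simplification made for this example; the claims select a combination of line spectrum pair and gain candidates, and the selection criterion is not restated here.

```python
def interpolation_gain_candidates(prev_gain, next_gain, weights=(0.25, 0.5, 0.75)):
    """Build gain candidates from the gains of the preceding and succeeding frames."""
    # Gains dividing the interval between the two frame gains (weights are assumed).
    divided = [(1 - w) * prev_gain + w * next_gain for w in weights]
    # Plus the larger and the smaller of the two frame gains.
    return divided + [max(prev_gain, next_gain), min(prev_gain, next_gain)]


def select_gain_candidate(gain, prev_gain, next_gain):
    """Pick the candidate closest to the current frame's gain (simplified criterion)."""
    candidates = interpolation_gain_candidates(prev_gain, next_gain)
    index = min(range(len(candidates)), key=lambda i: abs(candidates[i] - gain))
    return index, candidates[index]
```

Using weights other than 0.5 corresponds to the unequal division mentioned in the claim, while the max and min entries cover frames whose gain matches one neighbour rather than a blend of both.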

11. A speech encoding apparatus for acquiring and encoding speech encoding parameters from a speech signal that is digitized and divided into frames of a predetermined time length, the apparatus comprising: means for encoding a speech pitch as one of the speech encoding parameters by a combination of frames in which the speech pitch is quantized by a quantization method selected from a difference quantization method and a uniform quantization method, and frames in which one of a plurality of interpolation speech pitch candidates calculated using the quantized speech pitches of the preceding and succeeding frames is selected and quantized by its index number; means for encoding voiced/unvoiced information as one of the speech encoding parameters by the index number of the closest one among a limited number of representative voiced/unvoiced information; and means for encoding a harmonics spectrum amplitude sequence as one of the speech encoding parameters by a combination of frames that are quantized by linear prediction coefficients (or line spectrum pairs derived therefrom) and a gain, and frames in which one of the combinations of a plurality of interpolation linear prediction coefficient (or line spectrum pair) candidates calculated using the linear prediction coefficients (or line spectrum pairs) of the preceding and succeeding frames and a plurality of interpolation gain candidates calculated using the gains of the preceding and succeeding frames is selected and quantized by its index number.