JP3089967B2

JP3089967B2 - Audio coding device

Info

Publication number: JP3089967B2
Application number: JP07004921A
Authority: JP
Inventors: 真一田海; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1995-01-17
Filing date: 1995-01-17
Publication date: 2000-09-18
Anticipated expiration: 2015-09-18
Also published as: JPH08194499A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a speech coding apparatus and, in particular, to a speech coding apparatus that encodes a speech signal with high quality in short frames of, for example, 5 ms to 10 ms or less.

【０００２】[0002]

[Prior Art] Known methods of encoding a speech signal include, for example, the paper by K. Ozawa et al. entitled "M-LCELP Speech Coding at 4 kb/s with Multi-Mode and Multi-Codebook" (IEICE Trans. Commun., vol. E77-B, no. 9, pp. 1114-1121, 1994). An outline of the coding scheme described in that paper is given below.

[0003] In this technique, linear prediction (LPC) analysis is applied to the speech signal in every frame (for example, 40 ms) to extract spectrum parameters that represent the spectral characteristics of the speech signal. A feature quantity is calculated from the frame signal, or from the frame signal after perceptual weighting, and is used for mode discrimination, for example discrimination between vowel and consonant segments. Encoding is then performed by switching the algorithm or the codebook according to the mode discrimination result.

[0004] The coding section further divides each frame into subframes (for example, 8 ms). For each subframe, it extracts the adaptive codebook parameters (a delay parameter corresponding to the pitch period, and a gain parameter) from the past excitation signal and uses the adaptive codebook to pitch-predict the speech signal of the subframe. For the residual signal obtained by pitch prediction, it selects the optimal excitation code vector from an excitation codebook (a vector quantization codebook) consisting of predetermined types of noise signals and calculates the optimal gain, thereby quantizing the excitation signal.

[0005] The excitation code vector is selected so as to minimize the error power between the signal synthesized from the selected noise signal and the residual signal described above. The index identifying the selected code vector and its gain, together with the spectrum parameters and the adaptive codebook parameters, are combined by a multiplexer and transmitted.

【０００６】[0006]

[Problems to Be Solved by the Invention] In the conventional speech coding method described above, the frame length must be shortened to reduce the processing delay. When the frame length is set to 5 ms or less, for example, the feature quantity varies more strongly over time, so mode switching becomes unstable and erroneous, and sound quality degrades.

[0007] Pitch extraction is performed by finding the T that minimizes the following expression. With a frame length of 5 ms, N = 40. In pitch extraction using the following expression, the interval over which E_T is calculated is therefore short, so the extracted pitch varies greatly over time, and sound quality again degrades.

【０００８】[0008]

【数１】 (Equation 1)
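Equation 1 is not reproduced in this text. As a minimal sketch only, the Python below assumes a commonly used open-loop form, E_T = Σ x(n)² − (Σ x(n)·x(n−T))² / Σ x(n−T)², that is, the residual energy left after the best one-tap pitch predictor at lag T. This form, the function names, the lag range and the test signal are all assumptions for illustration and are not taken from the patent. N = 40 corresponds to a 5 ms frame at an 8 kHz sampling rate, which is also an assumption implied by the stated value of N.

```python
import math

def pitch_error(x, start, n_len, lag):
    """Assumed form of E_T: energy left after the best one-tap predictor at `lag`."""
    energy = cross = past = 0.0
    for n in range(start, start + n_len):
        energy += x[n] * x[n]
        cross += x[n] * x[n - lag]
        past += x[n - lag] * x[n - lag]
    if past == 0.0:
        return energy  # no past energy: the predictor cannot remove anything
    return energy - cross * cross / past

def extract_pitch(x, start, n_len, min_lag, max_lag):
    """Return the lag T in [min_lag, max_lag] that minimizes the assumed E_T."""
    return min(range(min_lag, max_lag + 1),
               key=lambda lag: pitch_error(x, start, n_len, lag))

# Hypothetical test signal: periodic with a period of 50 samples.
signal = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(4 * math.pi * n / 50)
          for n in range(400)]

# N = 40 samples is the short analysis interval discussed in the text.
best_lag = extract_pitch(signal, start=200, n_len=40, min_lag=20, max_lag=80)
```

On a clean periodic signal the short interval still finds the period. The problem described in the text arises on real speech, where only 40 samples contribute to each estimate and the selected lag can jump from frame to frame.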

[0009] An object of the present invention is therefore to provide a speech coding apparatus in which sound-quality degradation caused by erroneous mode discrimination is unlikely to occur.

[0010] Another object of the present invention is to provide a speech coding apparatus in which sound-quality degradation caused by erroneous pitch extraction is unlikely to occur.

【００１１】[0011]

[Means for Solving the Problems] In the invention of claim 1, the speech coding apparatus comprises: (a) dividing means for dividing a speech signal into frame signals of a predetermined unit; (b) mode discrimination means for discriminating the mode of the speech signal by using a weighted sum of a feature quantity obtained from the signal of the current frame divided by the dividing means and a feature quantity obtained from the signal of a frame that the dividing means divided at least one frame in the past; and (c) coding means for coding the speech signal, frame by frame as divided by the dividing means, by using an algorithm specified according to the mode discriminated by the mode discrimination means.

【００１２】[0012]

【００１３】[0013]

[0014] That is, in the invention of claim 1, the speech coding apparatus is configured to discriminate the mode of the speech signal by using a weighted sum of a feature quantity (for example, the pitch prediction gain) obtained from the signal of the current frame and a feature quantity obtained from the signal of a frame that the dividing means divided at least one frame in the past, so that erroneous mode discrimination is avoided.

【００１５】[0015]

【００１６】[0016]

【００１７】[0017]

【００１８】[0018]

【００１９】[0019]

[Embodiments] The present invention is described in detail below with reference to an embodiment.

【００２０】[0020]

[0021] FIG. 1 shows the schematic configuration of a speech coding apparatus according to an embodiment of the present invention. The operation of the speech coding apparatus of the embodiment is described below with reference to this figure.

[0022] The frame division circuit 11 divides the speech signal input from the input terminal 50 into frames (for example, 5 ms). The subframe division circuit 12 divides each frame output by the frame division circuit 11 into still shorter subframes (for example, 2.5 ms).
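The two-stage division can be sketched as follows. The 8 kHz sampling rate is an assumption (it is consistent with N = 40 for a 5 ms frame), and the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumption: not stated explicitly in the text

def ms_to_samples(duration_ms):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(SAMPLE_RATE_HZ * duration_ms / 1000)

def split(samples, length):
    """Cut `samples` into consecutive, non-overlapping blocks of `length` samples."""
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, length)]

frame_len = ms_to_samples(5)       # frame division circuit 11: 5 ms frames
subframe_len = ms_to_samples(2.5)  # subframe division circuit 12: 2.5 ms subframes

speech = list(range(200))          # stand-in for 25 ms of input samples
frames = split(speech, frame_len)
subframes = [split(frame, subframe_len) for frame in frames]
```

Each frame thus yields two subframes, which matches the "first" and "second" subframes referred to in the following paragraphs.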

[0023] The spectrum parameter calculation circuit 13 applies a window longer than the subframe length (for example, 24 ms) to the speech signal of at least one subframe, cuts out the speech, and calculates spectrum parameters up to a predetermined order (for example, P = 10). In this embodiment, the spectrum parameter calculation circuit 13 calculates the spectrum parameters by Burg analysis. Details of Burg analysis are given on pages 82 to 88 of "Signal Analysis and System Identification" by Nakamizo (Corona Publishing, 1988), so their description is omitted here. A circuit that performs the calculation by another method, such as LPC analysis, may also be used as the spectrum parameter calculation circuit.
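For orientation, a textbook form of the Burg recursion can be sketched as below. It is not taken from the cited reference or from the patent. The sign convention is an assumption: the function returns the coefficients a_i of A(z) = 1 + Σ a_i z^-i, so predictor coefficients in the α_i sense would presumably be -a_i. The test signals and the 8 kHz rate are hypothetical.

```python
import random

def burg(x, order):
    """Textbook Burg recursion.

    Returns (a, ks) where a = [1, a_1, ..., a_order] are the coefficients of
    A(z) = 1 + sum_i a_i z^-i and ks are the reflection coefficients.
    """
    n = len(x)
    f = list(x)  # forward prediction error
    b = list(x)  # backward prediction error
    a = [1.0]
    ks = []
    for m in range(order):
        # Reflection coefficient that minimizes forward plus backward error power.
        num = -2.0 * sum(f[i] * b[i - 1] for i in range(m + 1, n))
        den = sum(f[i] ** 2 + b[i - 1] ** 2 for i in range(m + 1, n))
        k = num / den if den > 0.0 else 0.0
        ks.append(k)
        # Levinson-style update of the prediction polynomial.
        a_ext = a + [0.0]
        a = [a_ext[i] + k * a_ext[m + 1 - i] for i in range(m + 2)]
        # Update both error sequences from the previous stage's values.
        new_f, new_b = f[:], b[:]
        for i in range(m + 1, n):
            new_f[i] = f[i] + k * b[i - 1]
            new_b[i] = b[i - 1] + k * f[i]
        f, b = new_f, new_b
    return a, ks

# First-order check on a decaying exponential.
decay = [0.9 ** i for i in range(50)]
a1, k1 = burg(decay, 1)

# Order P = 10 on a 24 ms window (192 samples at an assumed 8 kHz).
rng = random.Random(0)
window = [rng.uniform(-1.0, 1.0) for _ in range(192)]
a10, k10 = burg(window, 10)
```

A property of this recursion is that every reflection coefficient has magnitude at most 1, which keeps the resulting synthesis filter stable even for short analysis windows.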

[0024] The spectrum parameter calculation circuit 13 also converts the linear prediction coefficients α_i (i = 1, ..., 10) calculated by the Burg method into LSP parameters, which are suited to quantization and interpolation. The conversion from linear prediction coefficients to LSPs in the spectrum parameter calculation circuit 13 of the embodiment follows the paper by Sugamura et al. entitled "Speech Information Compression by the Line Spectrum Pair (LSP) Speech Analysis-Synthesis Method" (Transactions of the Institute of Electronics and Communication Engineers of Japan, J64-A, pp. 599-606, 1981).

[0025] That is, the spectrum parameter calculation circuit 13 converts the linear prediction coefficients obtained by the Burg method for the second subframe into LSP parameters, obtains the LSP of the first subframe by linear interpolation, and converts the first-subframe LSP back into linear prediction coefficients. It outputs the linear prediction coefficients α_il (i = 1, ..., 10; l = 1, ..., 5) of the first and second subframes to the perceptual weighting circuit 17, and outputs the LSPs of the first and second subframes to the spectrum parameter quantization circuit 14.

[0026] The spectrum parameter quantization circuit 14 quantizes the LSP parameters of a predetermined subframe. In this embodiment, it uses vector quantization and quantizes the LSP parameters of the second subframe. For the specific procedure of vector quantization of LSP parameters, see Japanese Unexamined Patent Publication No. H4-171500 (Japanese Patent Application No. H2-297600), Japanese Unexamined Patent Publication No. H4-363000 (Japanese Patent Application No. H3-261925), Japanese Unexamined Patent Publication No. H5-6199 (Japanese Patent Application No. H3-155049), and the paper by T. Nomura et al. entitled "LSP Coding Using VQ-SVQ With Interpolation in 4.075 kbps M-LCELP Speech Coder" (Proc. Mobile Multimedia Communications, pp. B.2.5, 1993).

[0027] The spectrum parameter quantization circuit 14 then restores the LSP parameters of the first and second subframes from the LSP parameters quantized for the second subframe. In this embodiment, the LSPs of the first and second subframes are restored by linear interpolation between the quantized LSP parameters of the second subframe of the current frame and the quantized LSP parameters of the second subframe of the immediately preceding frame.
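The restoration by linear interpolation can be sketched as follows. The interpolation weights are assumptions: the text does not give them, so the sketch places the first subframe midway between the two quantized LSP vectors and takes the second subframe as the current quantized vector itself. The LSP values and function name are hypothetical.

```python
def interpolate_lsp(prev_lsp, curr_lsp, weight):
    """Linear interpolation between two LSP vectors.

    weight = 0.0 returns prev_lsp, weight = 1.0 returns curr_lsp.
    """
    return [(1.0 - weight) * p + weight * c for p, c in zip(prev_lsp, curr_lsp)]

# Hypothetical quantized LSPs (order P = 10) of the second subframe of the
# previous frame and of the current frame.
prev_q = [0.1 * i for i in range(1, 11)]
curr_q = [0.1 * i + 0.04 for i in range(1, 11)]

lsp_sub1 = interpolate_lsp(prev_q, curr_q, 0.5)  # assumed midpoint weight
lsp_sub2 = interpolate_lsp(prev_q, curr_q, 1.0)  # the current quantized LSP
```

Because interpolation is a convex combination, the interpolated vector stays in ascending order whenever both endpoint vectors are ordered, which is what makes LSPs convenient for this step.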

[0028] Here, the LSPs of the first to fourth subframes can be restored by selecting one code vector that minimizes the error power between the LSP before quantization and the LSP after quantization and then performing linear interpolation. To improve performance further, several candidate code vectors that minimize the error power may be selected, the cumulative distortion evaluated for each candidate, and the combination of candidate and interpolated LSP that minimizes the cumulative distortion selected.

[0029] Instead of linear interpolation, LSP interpolation patterns may be prepared for a predetermined number of bits (for example, 2 bits), and, for each of these patterns, the combination of code vector and interpolation pattern that minimizes the cumulative distortion over the first and second subframes may be selected. This increases the transmitted information by the number of bits of the interpolation pattern, but the temporal change of the LSP within the frame can be represented more precisely.

[0030] The interpolation patterns may be created by training on LSP data, or predetermined patterns may be stored. In the latter case, for example, the patterns described in the paper by T. Taniguchi et al. entitled "Improved CELP speech coding at 4 kb/s and below" (Proc. ICSLP, pp. 41-44, 1992) can be used. To improve performance further, after the interpolation pattern has been selected, the error signal between the true LSP value and the interpolated LSP value may be obtained for a predetermined subframe, and that error signal may in turn be represented by an error codebook.

[0031] The spectrum parameter quantization circuit 14 converts the LSPs of the first and second subframes restored as described above, and the quantized LSP of the second subframe, into linear prediction coefficients α_il′ (i = 1, ..., 10; l = 1, ..., 5) for each subframe and outputs them to the impulse response calculation circuit 16. It also outputs the index representing the code vector of the quantized LSP of the second subframe to the multiplexer 28.

[0032] The perceptual weighting circuit 17 receives, for each subframe, the unquantized linear prediction coefficients α_il (i = 1, ..., 10; l = 1, ..., 5) from the spectrum parameter calculation circuit 13, applies perceptual weighting to the speech signal of the subframe, and outputs a perceptually weighted signal. The proposed mode discrimination circuit 20A receives the perceptually weighted signal from the perceptual weighting circuit 17 frame by frame and performs mode discrimination based on the feature quantity of the current frame and the feature quantity of one past frame.

[0033] FIG. 2 shows the configuration of the proposed mode discrimination circuit. As illustrated, the proposed mode discrimination circuit 20A consists of a feature quantity calculation circuit 31, a frame delay unit (D) 32, a weighted sum calculation circuit 33, and a mode discrimination circuit 34. The perceptually weighted signal is input frame by frame from the input terminal 52.

[0034] The feature quantity calculation circuit 31 calculates and outputs the pitch prediction gain G as the feature quantity, based on the input information. The weighted sum calculation circuit 33 obtains the weighted sum G_AV of the output of the feature quantity calculation circuit 31 and the feature quantity of the immediately preceding (past) frame stored in the frame delay unit 32, according to equation (1), and outputs it. In equation (1), ν_i is a weighting coefficient.

【００３５】[0035]

【数２】 (Equation 2)

[0036] The mode discrimination circuit 34 compares the weighted sum G_AV with a plurality of predetermined thresholds to discriminate the mode, and outputs the mode discrimination result. For example, when the signal is classified into four modes, three thresholds are set in the mode discrimination circuit 34. As shown in FIG. 1, the mode discrimination result output by the mode discrimination circuit 34 in the proposed mode discrimination circuit 20A is supplied to the adaptive codebook circuit 22, the excitation quantization circuit 24, and the multiplexer 28.
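The chain formed by the frame delay unit 32, the weighted sum calculation circuit 33 and the mode discrimination circuit 34 can be sketched as below. Equation (1) is not reproduced in this text, so the sketch assumes the form G_AV = Σ ν_i · G(t − i) over the current frame and one past frame. The weights, thresholds, gain values and class name are hypothetical.

```python
from bisect import bisect_right

class ModeDiscriminator:
    """Smooth a per-frame feature and map the smoothed value to a mode index."""

    def __init__(self, weights, thresholds):
        self.weights = list(weights)          # nu_0 (current frame), nu_1 (previous), ...
        self.thresholds = sorted(thresholds)  # three thresholds give four modes
        self.past = None                      # contents of the frame delay unit

    def update(self, gain):
        if self.past is None:
            # First frame: assume the past features equal the current one.
            self.past = [gain] * (len(self.weights) - 1)
        features = [gain] + self.past
        g_av = sum(w * g for w, g in zip(self.weights, features))
        self.past = features[:-1]             # shift the delay line by one frame
        mode = bisect_right(self.thresholds, g_av)
        return mode, g_av

thresholds = [1.0, 4.0, 7.0]                  # hypothetical values
gains = [0.5, 0.6, 9.0, 0.4, 0.5]             # one outlier frame in a low-gain run

disc = ModeDiscriminator(weights=(0.5, 0.5), thresholds=thresholds)
smoothed_modes = [disc.update(g)[0] for g in gains]
raw_modes = [bisect_right(thresholds, g) for g in gains]
```

In this example the unsmoothed decision jumps from mode 0 to mode 3 and back within a single frame, while the weighted sum limits the excursion to mode 2 and spreads it over two frames. The cost is that a genuine mode change is also recognised more gradually.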

[0037] Based on the linear prediction coefficients α_il from the spectrum parameter calculation circuit 13 and the linear prediction coefficients α_il′ from the spectrum parameter quantization circuit 14, the response signal calculation circuit 18 calculates, for each subframe, one subframe of the response signal obtained with the input signal set to zero, d(n) = 0, using the stored filter memory values, and outputs it to the subtractor 21. The response signal x_z(n) output by the response signal calculation circuit 18 is given by equation (2), where γ is a weighting coefficient that controls the amount of perceptual weighting.

【００３８】[0038]

【数３】 (Equation 3)

[0039] The subtractor 21, which receives the output of the response signal calculation circuit 18, subtracts one subframe of the response signal from the perceptually weighted signal according to equation (3) and outputs the result to the adaptive codebook circuit 22.

【００４０】[0040]

【数４】 (Equation 4)

[0041] The impulse response calculation circuit 16 calculates a predetermined number of points L of the impulse response h_w(n) of the weighting filter whose z-transform is given by equation (4), and outputs it to the adaptive codebook circuit 22 and the excitation quantization circuit 24.

【００４２】[0042]

【数５】 (Equation 5)

[0043] The adaptive codebook circuit 22 obtains the pitch parameters, performs pitch prediction according to equation (5), and outputs the adaptive codebook prediction residual signal Z(n). In equation (5), b(n) is the adaptive codebook pitch prediction signal. With β and T denoting the gain and the delay of the adaptive codebook, V(n) the adaptive code vector, and the symbol * the convolution operator, the adaptive codebook pitch prediction signal is given by equation (6).

【００４４】[0044]

【数６】 (Equation 6)

[0045] The non-uniform pulse number type sparse excitation codebook 25 is a sparse codebook in which the number of nonzero components differs from vector to vector. The excitation quantization circuit 24 selects the excitation code vector c_j(n) from all or some of the excitation code vectors stored in the non-uniform pulse number type sparse excitation codebook 25 so as to minimize equation (7).

【００４６】[0046]

【数７】 (Equation 7)
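Equation 7 is not reproduced in this text. The sketch below assumes the usual CELP criterion, D_j = Σ [z(n) − γ_j · (c_j * h_w)(n)]², with the gain γ_j chosen optimally for each candidate. Under that assumption, minimizing D_j amounts to maximizing (Σ z·s_j)² / Σ s_j², where s_j is the code vector filtered by h_w. The codebook, impulse response and target values are hypothetical.

```python
def filter_codevector(c, h, n_len):
    """Convolve sparse code vector c with impulse response h, truncated to n_len."""
    out = [0.0] * n_len
    for i, ci in enumerate(c):
        if ci == 0.0:
            continue  # sparse codebook: zero components cost nothing
        for j, hj in enumerate(h):
            if i + j < n_len:
                out[i + j] += ci * hj
    return out

def search_codebook(z, codebook, h):
    """Return (index, gain, distortion) of the code vector minimizing the assumed D_j."""
    target_energy = sum(v * v for v in z)
    best = None
    for j, c in enumerate(codebook):
        s = filter_codevector(c, h, len(z))
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue
        cross = sum(a * b for a, b in zip(z, s))
        gain = cross / energy                          # optimal gain for candidate j
        dist = target_energy - cross * cross / energy  # D_j at that gain
        if best is None or dist < best[2]:
            best = (j, gain, dist)
    return best

h_w = [1.0, 0.5, 0.25]                 # hypothetical weighted impulse response
codebook = [
    [1, 0, 0, 0, 0, 0, 0, 0],          # one pulse
    [0, 0, 1, 0, 0, -1, 0, 0],         # two pulses
    [1, 0, 1, 0, 1, 0, 0, 0],          # three pulses
]
# Target built from entry 1 with gain 2, so the search should recover both.
target = [2.0 * v for v in filter_codevector(codebook[1], h_w, 8)]
best_index, best_gain, best_dist = search_codebook(target, codebook, h_w)
```

Skipping zero components in the convolution is where a sparse codebook saves computation, and vectors with fewer pulses are correspondingly cheaper to evaluate.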

[0047] The excitation quantization circuit 24 selects two or more code vectors, and the single best code vector is determined during the gain quantization described below. Alternatively, the code vector may be narrowed down to one at this selection stage. When equation (7) is applied to only some of the excitation code vectors, a plurality of excitation code vectors may be preselected and equation (7) applied to the preselected excitation code vectors.

[0048] The gain quantization circuit 26 reads gain code vectors from the gain codebook 27 and, for the excitation code vectors selected by the excitation quantization circuit 24, selects the combination of excitation code vector and gain code vector that minimizes equation (8). It outputs the indices representing the selected excitation code vector and gain code vector to the multiplexer 28. In equation (8), β_k′ and γ_k′ are the components of the k-th code vector in the two-dimensional gain codebook stored in the gain codebook 27.

【００４９】[0049]

【数８】 (Equation 8)

[0050] Based on the output parameters of the spectrum parameter calculation circuit 13 and the respective indices, the weighting signal calculation circuit 19 reads the code vector corresponding to each index and first obtains the driving excitation signal v(n) according to equation (9).

【００５１】[0051]

【数９】 (Equation 9)

[0052] Next, using the output parameters of the spectrum parameter calculation circuit 13 and of the spectrum parameter quantization circuit 14, it calculates the weighted signal s_w(n) for each subframe according to equation (10) and outputs the calculated weighted signal to the response signal calculation circuit 18.

【００５３】[0053]

【数１０】 (Equation 10)

[0054] Thus, in the speech coding apparatus of the first embodiment, mode information averaged over a time span longer than the frame length is output, so sound-quality degradation caused by erroneous mode discrimination can be suppressed.

【００５５】[0055]

【００５６】[0056]

【００５７】[0057]

【００５８】[0058]

【００５９】[0059]

【００６０】[0060]

【００６１】[0061]

【００６２】[0062]

【００６３】[0063]

【００６４】[0064]

【００６５】[0065]

【００６６】[0066]

【００６７】[0067]

【００６８】[0068]

【００６９】[0069]

【００７０】[0070]

【００７１】[0071]

【００７２】[0072]

【００７３】[0073]

【００７４】[0074]

【００７５】[0075]

【００７６】[0076]

【００７７】[0077]

【００７８】[0078]

【００７９】[0079]

【００８０】[0080]

【００８１】[0081]

【００８２】[0082]

【００８３】[0083]

【００８４】[0084]

【００８５】[0085]

【００８６】[0086]

【００８７】[0087]

【００８８】[0088]

【００８９】[0089]

【００９０】[0090]

【００９１】[0091]

【００９２】[0092]

【００９３】[0093]

【００９４】[0094]

【００９５】[0095]

【００９６】[0096]

【００９７】[0097]

【００９８】[0098]

【００９９】[0099]

【０１００】[0100]

【０１０１】[0101]

【０１０２】[0102]

【０１０３】[0103]

【０１０４】[0104]

[Effects of the Invention] As described in detail above, according to the inventions of claims 1 and 2, even when the frame length is shortened to 5 to 10 ms to achieve low delay, sound quality is not degraded by temporal fluctuation of the mode discrimination, so good sound quality can be maintained.

【０１０５】[0105]

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech coding apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the proposed mode discrimination circuit used in the speech coding apparatus of the embodiment of the present invention.

[Explanation of symbols]

11 Frame division circuit
12 Subframe division circuit
13 Spectrum parameter calculation circuit
14 Spectrum parameter quantization circuit
15 LSP codebook
16 Impulse response calculation circuit
17 Perceptual weighting circuit
18 Response signal calculation circuit
19 Weighting signal calculation circuit
20 Proposed mode discrimination circuit
21 Subtractor
22 Adaptive codebook circuit
23 Pattern storage circuit
24 Excitation quantization circuit
25 Non-uniform pulse number type sparse excitation codebook
26 Gain quantization circuit
27 Gain codebook
28 Multiplexer
31 Feature quantity calculation circuit
32 Frame delay unit
33 Weighted sum calculation circuit
34 Mode discrimination circuit

Continuation of front page
(56) References cited: JP-A-2-139600 (JP, A); JP-A-6-4099 (JP, A); JP-A-6-222797 (JP, A); JP-A-7-225599 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 11/00 - 21/06; H03M 7/30; H04B 14/04; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech coding apparatus comprising: dividing means for dividing a speech signal into frame signals of a predetermined unit; mode discrimination means for discriminating the mode of the speech signal by using a weighted sum of a feature quantity obtained from the signal of the current frame divided by the dividing means and a feature quantity obtained from the signal of a frame that the dividing means divided at least one frame in the past; and coding means for coding the speech signal, frame by frame as divided by the dividing means, by using an algorithm specified according to the mode discriminated by the mode discrimination means.

2. The speech coding apparatus according to claim 1, wherein a pitch prediction gain is used as the feature quantity.