JPH05273999A

JPH05273999A - Voice encoding method

Info

Publication number: JPH05273999A
Application number: JP4073683A
Authority: JP
Inventors: Yoshiaki Asakawa; 吉章淺川; Hidetoshi Sekine; 英敏関根; Yasuko Shinada; 康子品田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-03-30
Filing date: 1992-03-30
Publication date: 1993-10-22

Abstract

PURPOSE: To provide a voice encoding method that can obtain high-quality synthetic voice even at a bit rate lower than 4 kbps. CONSTITUTION: The voice encoding part is provided with a pulse component adaptive code book 34, a noise component adaptive code book 31, a pulse generator 42, a first noise code book 46, a second noise code book 50 and a sound source selector 54, and the sound source code book is switched by weighted error evaluation. Thus, the reproducibility of the periodic component of voice is improved, and high-quality voice can be obtained even at a low bit rate.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech coding method suitable for obtaining high-quality synthesized speech at a low bit rate, and more particularly to a speech coding method capable of reducing the bit rate to 4 kbps or less.

[0002]

[Prior Art] Speech coding methods that incorporate the "analysis-by-synthesis" technique, in which the weighted error between the synthesized speech and the original speech is evaluated and the coding parameters are determined so as to minimize that error, have recently been proposed and have succeeded in obtaining relatively good speech quality even at low bit rates. A typical example is the code-excited linear prediction (CELP) method (for example, M. R. Schroeder and B. S. Atal: "Code-excited linear prediction (CELP)", Proc. ICASSP 85 (1985.3)), which achieves practical speech quality at 4.8 kbps. Many improvements on the CELP method have also been proposed; for example, the vector sum excited linear prediction (VSELP) method (for example, I. A. Gerson and M. A. Jasiuk: "Vector sum excited linear prediction (VSELP) speech coding at 8kbps", Proc. ICASSP 90 (1990.4)) is excellent in terms of processing load, memory capacity, and bit error robustness.

[0003] Meanwhile, the digitization of wireless communication has begun in earnest, and from the viewpoint of effective use of frequencies, the development of speech coding methods with lower bit rates (4 kbps or less) is desired. Simply lowering the bit rate of CELP or VSELP causes large quality degradation, so there is a limit to this approach. This is thought to be because lowering the bit rate reduces the reproducibility of the periodic components of speech. Methods that employ sound sources designed to improve the reproducibility of the periodic components have therefore been proposed.

[0004] Such methods include the "MPC-CELP" method, which uses multi-pulse for voiced sounds and CELP for unvoiced sounds (Ozawa and Kumagai: "3.2 kb/s speech coding method using multi-pulse and CELP", IEICE Spring National Convention (1990.3)), and the "SPE-CELP" method, which uses a single pulse with controlled phase and amplitude for voiced sounds and CELP for unvoiced sounds (W. Granzow and B. S. Atal: "High-quality digital speech at 4 kb/s", Proc. GLOBECOM 90 (1990.12)). Also proposed are the "principal-axis adaptive VXC" method, which uses as sound sources a pulse and noise orthogonalized to it (Tanaka and three others: "CELP coding by multiple vector synthesis", Proceedings of the Acoustical Society of Japan 1-3-5 (1989.10)), and the "pulse/noise selection type CELP" method, which switches between periodic pulses and noise (Yoshida and two others: "Application of pulse source search to low bit rate CELP coding", IEICE Technical Report SP91-68 (1991.10)).

[0005]

[Problems to be Solved by the Invention] The methods proposed above have the following problems. When essentially different coding methods (for example, multi-pulse and CELP) are switched, the sound quality tends to become unnatural, for example through changes in timbre. When switching between pulse and noise, the sound quality while the pulse is in use tends to become pulsive or buzzer-like. Furthermore, methods that use pulse and noise together have the problem that the bit rate cannot be reduced sufficiently.

[0006] An object of the present invention is to provide a coding method in which the periodic components of speech are reproduced well, and changes in timbre are not noticeable, even when the bit rate is lowered.

[0007]

[Means for Solving the Problems] To achieve the above object, the present invention has the following means: (1) a pulse generator, (2) a first noise codebook with a small codebook size, (3) a second noise codebook with a large codebook size, and (4) a weighted error evaluation unit. Another embodiment of the present invention has (5) an acoustic classifier, (6) a pitch extractor, (7) an adaptive codebook for the pulse component, and (8) an adaptive codebook for the noise component.

[0008]

[Operation] The operation of a typical configuration of the present invention is described. Speech input to the encoder is first divided into frames and subframes. The short-term prediction analysis unit extracts and quantizes the spectral parameters (short-term prediction coefficients) for each frame. Next, in preparation for evaluating the perceptually weighted error, the input speech is perceptually weighted. In addition, a zero signal is input to the weighted synthesis filter to obtain the zero-input response, which is subtracted from the weighted input signal. This removes the influence of the past, which depends on the internal state of the synthesis filter.

[0009] Next, the long-term prediction analysis unit obtains the optimum long-term prediction lag and gains from the adaptive codebook for each subframe. The adaptive codebook is divided into one for the pulse component and one for the noise component, and the long-term prediction lag is searched for on the long-term prediction filter component obtained as the sum of the two weighted by their optimum gains.

[0010] The pulse generator takes the long-term prediction lag obtained by the long-term prediction analyzer as the pulse interval, generates pulse trains with the pulse position shifted one sample at a time, and weights them by convolution with the impulse response of the weighted synthesis filter. After these are orthogonalized with respect to the long-term prediction vector, the pulse sound source at the position that minimizes the weighted error is searched for, and its position and gain are determined.

[0011] The first codebook search unit weights the code vectors in the first noise codebook in the same way as the pulse sound source. They are orthogonalized with respect to the long-term prediction vector and the pulse sound source vector, and the code and gain of the code vector that minimizes the weighted error are determined.

[0012] The search of the second noise codebook can be performed in parallel with the pulse sound source search and the first noise codebook search. The second codebook search unit weights the code vectors in the second noise codebook in the same way as the pulse sound source. They are orthogonalized with respect to the long-term prediction vector, and the code and gain of the code vector that minimizes the weighted error are determined.

[0013] The selector evaluates the weighted errors for the case where the pulse sound source and the first noise sound source are used and the case where only the second noise sound source is used, and selects the one with the smaller error as the final sound source.
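The selection rule in this paragraph can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the names `weighted_error` and `select_source`, the example vectors, and the tie-breaking rule are all assumptions introduced here.

```python
def weighted_error(target, synth):
    """Sum of squared differences between the weighted target and a weighted synthesis."""
    return sum((t - s) ** 2 for t, s in zip(target, synth))


def select_source(target, pulse_plus_noise1, noise2_only):
    """Return (flag, error).

    flag 0: pulse sound source + first noise sound source
    flag 1: second noise sound source only
    Ties go to flag 0 (an assumption; the text does not say how ties are resolved).
    """
    e0 = weighted_error(target, pulse_plus_noise1)
    e1 = weighted_error(target, noise2_only)
    return (0, e0) if e0 <= e1 else (1, e1)


# Hypothetical weighted vectors for one subframe
target = [1.0, 0.0, -1.0, 0.5]
flag, err = select_source(target, [0.9, 0.1, -0.8, 0.4], [0.2, 0.3, -0.1, 0.0])
```

Only the one-bit flag and the parameters of the chosen branch would then need to be transmitted, which is where the bit-rate saving of switching between the two branches comes from.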

[0014] The gain quantization unit simultaneously optimizes and quantizes the gains of the sound sources selected by the selector.

[0015] The spectral parameters, the gain quantization code, the long-term prediction lag, the position of the selected pulse sound source, and the noise code vector index obtained as described above are transmitted to the decoder as transmission parameters.

[0016] In the decoder, the driving sound source is calculated from these transmission parameters and input to a synthesis filter whose filter coefficients are the short-term prediction coefficients, which yields the decoded speech.

[0017] In another configuration of the present invention, the weighted error evaluation in the selector can also be controlled by performing acoustic classification. It is also possible to perform pitch extraction and use the extracted pitch period as the pulse interval of the pulse sound source.

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0019] The present invention is based on the code-excited linear prediction (CELP) speech coding method, so the principle of the CELP method is outlined first. In CELP coding, the driving sound source is the weighted sum obtained by multiplying two vectors by their respective gains and adding them: the long-term prediction vector, which is the output of the adaptive codebook and represents the periodicity of the sound source, and the code vector, which is the output of the noise codebook (also called a stochastic codebook) and represents the non-periodic (random, or noise-like) component.

[0020] The codebook search for obtaining the optimum driving sound source is performed as follows. Ideally, one would obtain a driving sound source for which the synthesized speech obtained by inputting it to the synthesis filter matches the original speech (input speech), but in practice some error (quantization distortion) is involved. The driving sound source should therefore be determined so as to minimize this error; however, it is known that, given the characteristics of human hearing, the amount of error does not necessarily correspond to the subjective quality of the speech. It is therefore common to use an error weighted so as to correspond better to auditory characteristics. Perceptual weighting is described, for example, in B. S. Atal and J. R. Remde: "A new model of LPC excitation for producing natural-sounding speech at low bit rates", Proc. ICASSP 82 (1982.5).

[0021] To evaluate this perceptually weighted error, the driving sound source is input to the weighted synthesis filter to obtain weighted synthesized speech. The input speech is likewise passed through the weighting filter to obtain weighted input speech, and the difference from the weighted synthesized speech gives the weighted error waveform. The sum of squares of the weighted error waveform is computed over the error evaluation interval to give the weighted squared error. As described above, the driving sound source is the weighted sum of the long-term prediction vector and the noise code vector, so determining the driving sound source reduces to determining the code vector indices that specify which code vector is selected from each codebook. That is, the long-term prediction lag and the code vector index are varied in turn, the weighted squared error is calculated for each, and the combination with the smallest weighted error is selected. This way of determining the driving sound source is called the "analysis-by-synthesis" method. Carrying out this procedure faithfully, that is, optimizing the long-term prediction lag and the noise code vector index simultaneously while evaluating the weighted error every time, requires an enormous amount of processing, so in practice techniques such as sequential optimization are used.
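The analysis-by-synthesis search over a single codebook can be sketched as below. This is a simplified illustration under stated assumptions: the function names, the toy impulse response `h`, and the three-entry codebook are hypothetical, and the gain is taken as the unquantized least-squares optimum for each candidate.

```python
def convolve_truncated(h, c):
    """Zero-state response: convolve c with impulse response h, keeping len(c) samples."""
    return [sum(h[k] * c[i - k] for k in range(min(i + 1, len(h))))
            for i in range(len(c))]


def search_codebook(target, codebook, h):
    """Return (index, gain, error) of the code vector with the smallest weighted squared error.

    For each candidate y (the weighted synthesis of a code vector) the optimum gain is
    <target, y> / <y, y>, and the resulting error is <target, target> - <target, y>^2 / <y, y>.
    """
    energy = sum(t * t for t in target)
    best = None
    for idx, c in enumerate(codebook):
        y = convolve_truncated(h, c)
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        xy = sum(t * v for t, v in zip(target, y))
        err = energy - xy * xy / yy
        if best is None or err < best[2]:
            best = (idx, xy / yy, err)
    return best


# Hypothetical data: the target is exactly 2 x (weighted synthesis of entry 1)
h = [1.0, 0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
target = [0.0, 2.0, 1.0, 0.0]
best_index, best_gain, best_err = search_codebook(target, codebook, h)
```

In a sequential search of the kind the paragraph mentions, a routine like this would be run once per codebook, each time against the residual left by the previously chosen contributions, instead of over every combination of indices at once.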

[0022] FIG. 1 shows a block diagram of the encoding unit of the first embodiment of the present invention, and FIG. 2 shows a block diagram of the decoding unit. The operation of the first embodiment is outlined below.

[0023] A digital speech signal 11, A/D converted at a predetermined sampling frequency (usually 8 kHz), is input to the speech encoding unit. The acoustic classifier 12 classifies the input speech into several categories, such as vowel-like or fricative, based on its acoustic features. The classification result is output as the acoustic classification flag 13.

[0024] The short-term prediction analyzer (LPC analyzer) 17 reads speech data 11 of one analysis frame length and outputs the short-term prediction coefficients 18. The frame length is, for example, about 40 ms (320 samples).

[0025] The short-term prediction coefficients 18 are quantized by the short-term prediction coefficient quantizer 19. The quantization code is output as a transmission parameter, the short-term prediction coefficient quantization index 21. The quantized values 20 of the short-term prediction coefficients are referred to in subsequent processing.

[0026] Further, the input speech is weighted by the perceptual weighting unit 22 to obtain weighted speech 23. Meanwhile, a signal whose value is 0 for one frame length (zero input) 25 is input to the weighted synthesis filter 24 to obtain the zero-input response 26. Subtracting this from the weighted input speech 23 gives the weighted input speech 27, from which the influence of the past internal state of the weighted synthesis filter has been removed.

[0027] Long-term prediction analysis is performed for each subframe by searching the adaptive codebook, so it is referred to below as the adaptive codebook search. The subframe length is, for example, about 8 ms (64 samples). The present invention has two adaptive codebooks, one for the pulse component and one for the noise component, shown in the drawing as adaptive codebook searchers 31 and 34. As described later, the states of the two adaptive codebooks are combined, the long-term prediction lag 37, a parameter representing the periodicity of the speech, is extracted, and the long-term prediction lag index 38 and the long-term prediction vector 41 are output.
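As a quick check of the example figures given so far (8 kHz sampling, a 40 ms frame, an 8 ms subframe), the arithmetic can be written out as below. The helper names are hypothetical, and the count of five subframes per frame is simply what these example values imply, not a figure stated in the text.

```python
SAMPLE_RATE_HZ = 8000  # example sampling frequency
FRAME_MS = 40          # example frame length
SUBFRAME_MS = 8        # example subframe length


def samples(duration_ms, rate_hz=SAMPLE_RATE_HZ):
    """Number of samples in duration_ms at rate_hz."""
    return duration_ms * rate_hz // 1000


def split_frame(frame, subframe_len):
    """Cut one frame into consecutive subframes of subframe_len samples."""
    return [frame[i:i + subframe_len] for i in range(0, len(frame), subframe_len)]


frame_len = samples(FRAME_MS)        # 320 samples
subframe_len = samples(SUBFRAME_MS)  # 64 samples
subframes = split_frame(list(range(frame_len)), subframe_len)
```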

[0028] The pulse generator 42 takes the long-term prediction lag 37 obtained by the long-term prediction analysis as the pulse interval, generates pulse trains with the pulse position shifted one sample at a time, and weights them by convolution with the impulse response of the weighted synthesis filter. After these are orthogonalized with respect to the long-term prediction vector 41, the pulse sound source at the position that minimizes the weighted error is searched for, and its position 44 and gain 43 are determined. The weighted pulse sound source vector 45 of the selected pulse component is output to the first noise codebook search unit 46.
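The orthogonalize-then-search step can be sketched as follows. This is an illustrative sketch only: the candidates are assumed to be already weighted (convolved with the filter's impulse response), the orthogonalization is written as a plain Gram-Schmidt projection, and all names and vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(v, basis):
    """Remove from v its projection onto each vector in basis.

    The basis vectors are assumed to be mutually orthogonal; with a single
    basis vector (the long-term prediction vector) that holds trivially.
    """
    out = list(v)
    for b in basis:
        bb = dot(b, b)
        if bb > 0.0:
            coef = dot(out, b) / bb
            out = [o - coef * x for o, x in zip(out, b)]
    return out


def best_pulse_position(target, weighted_pulses, ltp_vector):
    """Return (position, gain) of the candidate that minimizes the weighted error.

    Minimizing the error is equivalent to maximizing <target, y>^2 / <y, y>
    over the orthogonalized candidates y.
    """
    best_pos, best_gain, best_score = None, 0.0, -1.0
    for pos, y in enumerate(weighted_pulses):
        y_orth = orthogonalize(y, [ltp_vector])
        yy = dot(y_orth, y_orth)
        if yy <= 1e-12:
            continue  # candidate lies entirely along the long-term prediction vector
        xy = dot(target, y_orth)
        score = xy * xy / yy
        if score > best_score:
            best_pos, best_gain, best_score = pos, xy / yy, score
    return best_pos, best_gain


# Hypothetical weighted vectors
ltp = [1.0, 0.0, 0.0, 0.0]
candidates = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
position, gain = best_pulse_position([0.0, 0.0, 3.0, 0.0], candidates, ltp)
```

Because each candidate is orthogonal to the long-term prediction vector after this step, adding the pulse contribution does not disturb the fit already achieved by the long-term predictor, which is one common reason for orthogonalizing in sequential searches.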

[0029] The first codebook search unit 46 weights the code vectors in the first noise codebook in the same way as the pulse sound source. They are orthogonalized with respect to the long-term prediction vector 41 and the pulse sound source vector 45, and the code 48 and gain 47 of the code vector that minimizes the weighted error are determined.

[0030] The search of the second noise codebook can be performed in parallel with the pulse sound source search and the first noise codebook search. The second codebook search unit 56 weights the code vectors in the second noise codebook in the same way as the pulse sound source. They are orthogonalized with respect to the long-term prediction vector 41, and the code 52 and gain 51 of the code vector that minimizes the weighted error are determined.

[0031] The selector 54 evaluates the weighted errors for the case where the pulse sound source and the first noise sound source are used and the case where only the second noise sound source is used, selects the one with the smaller error as the final sound source, and outputs the choice as the sound source selection flag 55. Here, the weighted error is modified according to the acoustic classification flag 13 so that the sound source giving better subjective quality is selected preferentially.

[0032] The gain codebook searcher 56 simultaneously optimizes the gains of the sound sources selected by the selector 54 by searching the gain codebook, and outputs the resulting quantization code 57.

[0033] The quantization codes 21 and 57 of the short-term prediction coefficients and the gains, the long-term prediction lag index 38, the position 44 of the selected pulse sound source, the noise code vector indices 48 and 52, and the sound source selection flag 55 obtained as described above are transmitted to the decoder as transmission parameters.

[0034] In the speech decoding unit of FIG. 2, the codebook indices 38', 44', 48', and 52' are used to read the code vectors 62, 66, 78, and 82 from the codebooks 61, 65, 77, and 81, and the pulse sound source 78 is generated by the pulse generator 73. The gains 63, 67, 75, 79, and 83 are reproduced from the gain codebook using the gain codebook index 57'. The driving sound source vector 89 is generated by multiplying each code vector by its gain. However, based on the sound source switching flag 55', the switch 87 selects either the pulse sound source together with the first noise sound source, or the second noise sound source.

[0035] The driving sound source 89 is input to the synthesis filter 93, whose filter coefficients are the short-term prediction coefficients 21', to obtain synthesized speech 94. Finally, to improve subjective sound quality, the synthesized speech 94 is input to the adaptive postfilter 95, and the final decoded speech 96 is obtained.

[0036] The decoded speech (digital signal) 96 is D/A converted into analog speech and output.

[0037] With the outline given above, the functions of the main parts of the first embodiment are now described in detail.

[0038] The acoustic classifier 12 calculates physical parameters from speech data 11 of one frame or one subframe length and classifies the speech in that interval into several categories by logical decisions on the parameter values. Acoustic classification is itself a known technique; an example is disclosed in Ozawa: "4.8 kb/s multi-pulse speech coding method using various sound sources", Proceedings of the Acoustical Society of Japan (1989.3). An example configured as an acoustic classifier is disclosed in the patent with (Chuken) reception number 319200448. The physical parameters used include, for example, energy, rate of energy change, maximum correlation value, prediction gain, and log area ratio. The speech categories may be, for example, vowel, nasal, plosive/transient, and fricative, or vowel/nasal, onset, and offset. Acoustic classification is performed per frame or per subframe. The rate of energy change, for example, when calculated per frame, can be taken as the difference between the frame energy of the previous frame and that of the current frame, or as the change in energy from subframe to subframe. When calculated per subframe, it can be taken as the energy difference between adjacent subframes, or the subframe can be divided into a first and a second half and the difference between their energies detected.
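The energy-change measures described in this paragraph can be sketched as below. The function names are hypothetical, energy is taken as a plain sum of squares, and the sign convention (later minus earlier) is an assumption; the text only speaks of a "difference".

```python
def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def frame_energy_change(prev_frame, cur_frame):
    """Frame-level measure: current frame energy minus previous frame energy."""
    return energy(cur_frame) - energy(prev_frame)


def subframe_energy_changes(frame, subframe_len):
    """Energy differences between adjacent subframes of one frame."""
    subs = [frame[i:i + subframe_len] for i in range(0, len(frame), subframe_len)]
    e = [energy(s) for s in subs]
    return [b - a for a, b in zip(e, e[1:])]


def half_split_change(subframe):
    """Subframe-level measure: second-half energy minus first-half energy."""
    mid = len(subframe) // 2
    return energy(subframe[mid:]) - energy(subframe[:mid])
```

A classifier along the lines described would compare values like these, together with the other physical parameters, against thresholds to decide the category; the thresholds themselves are not given in this text.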

[0039] The short-term prediction analyzer (LPC analyzer) 17 extracts, for each frame, the short-term prediction coefficients 18 representing the spectral envelope of the speech from the speech data 11. The short-term prediction coefficients 18 are most commonly linear prediction coefficients, but these are easily converted into equivalent parameters derived from them, such as partial autocorrelation coefficients (PARCOR coefficients, reflection coefficients) or line spectrum pairs (LSP parameters).

[0040] The Durbin-Levinson iterative method (introduced in Saito and Nakata, "Fundamentals of Speech Information Processing", Ohmsha, 1981) is commonly used to derive the linear prediction coefficients. For deriving the reflection coefficients, other methods have also been proposed, such as the FLAT algorithm (disclosed in "Digital Automobile Telephone System Standard RCR STD-27", established by the Research & Development Center for Radio Systems; hereinafter the "RCR standard") and the Le Roux method (described in the book by Saito and Nakata cited above). The method of converting linear prediction coefficients into LSP parameters is also described in that book.
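A textbook form of the Durbin-Levinson recursion is sketched below for orientation. It is not taken from the cited book or from the RCR standard; the function names are hypothetical, and the sign conventions (predictor x̂(n) = Σ a_i x(n-i), reflection coefficient equal to the last predictor coefficient at each order) are assumptions, since conventions differ between references.

```python
def autocorrelation(x, order):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, k, err):
      a   predictor coefficients a_1..a_order, with x_hat(n) = sum_i a_i * x(n - i)
      k   reflection coefficients, one per order
      err residual prediction error energy after the final order
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive (silent input is not handled here)")
    a = [0.0] * (order + 1)  # a[0] is unused
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err
        k.append(ki)
        new_a = a[:]
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
        err *= 1.0 - ki * ki
    return a[1:], k, err


# Hypothetical autocorrelation values
coeffs, refl, residual = levinson_durbin([1.0, 0.5, 0.1], 2)
```

The recursion produces the reflection coefficients as a by-product, which is why they are described in the text as equivalent parameters that can be obtained alongside the linear prediction coefficients.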

[0041] In this embodiment, the linear prediction coefficients 18 are converted into LSP parameters and then quantized by the quantizer 19 using two-stage vector quantization to give the quantized values 20. LSP parameters are known to have better quantization characteristics than direct quantization of the linear prediction coefficients (less spectral distortion for the same number of bits). Depending on the number of bits allowed, scalar quantization, vector quantization, or vector-scalar quantization may be used as the quantization method. The quantization index 21 is output as a transmission parameter.

[0042] Next, the preprocessing for calculating the perceptually weighted error is described. To calculate the weighted error, the input speech 11 is first weighted in the perceptual weighting unit 22 to obtain weighted speech 23. The weighting filter is constructed from the quantized values 20 of the short-term prediction coefficients (or equivalent parameters), and its specific form is as follows.

[0043]

[Equation 1]

[0044] Here α_i are the filter coefficients (linear prediction coefficients), N_p is the filter order, for example N_p = 10, and λ is the weighting parameter, typically λ = 0.8.

[0045] In general the output of the synthesis filter is affected by its past state; here, to reduce the amount of computation, the influence of the past state of the synthesis filter is removed from the weighted speech 23 in advance. That is, data whose value is 0 for one frame length (zero input 25) is input to the weighted synthesis filter 24, the zero-input response 26 is calculated and subtracted from the weighted speech 23, and the weighted speech 27 with the influence of the past removed is obtained. The transfer function of the weighted synthesis filter 24 used here is as follows.

[0046]

[Equation 2]

[0047] This synthesis filter 24 differs from the synthesis filter on the decoding side in that it includes the weighting parameter λ.
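Equations 1 and 2 are not reproduced in this text, so the sketch below assumes the form commonly used in CELP coders: A(z) = 1 - Σ α_i z^(-i), weighting filter W(z) = A(z) / A(z/λ), and weighted synthesis filter H(z) = 1 / A(z/λ). Both the form and the sign convention are assumptions, as are all function names. Under that assumption, the zero-input response of the weighted synthesis filter can be computed like this:

```python
def bandwidth_expand(alpha, lam):
    """Replace alpha_i by alpha_i * lam**i for i = 1..Np (coefficients of A(z/lam))."""
    return [a * lam ** (i + 1) for i, a in enumerate(alpha)]


def all_pole_filter(x, coeffs, state):
    """y(n) = x(n) + sum_i coeffs[i-1] * y(n-i).

    state holds the past outputs, most recent first, and must have len(coeffs) entries.
    Returns (output, new_state).
    """
    hist = list(state)
    out = []
    for v in x:
        y = v + sum(c * h for c, h in zip(coeffs, hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out, hist


def zero_input_response(coeffs, state, n):
    """Output for n zero-valued input samples, driven only by the stored filter state."""
    out, _ = all_pole_filter([0.0] * n, coeffs, state)
    return out


# Hypothetical first-order example with lambda = 0.8
weighted_coeffs = bandwidth_expand([0.9], 0.8)
zir = zero_input_response(weighted_coeffs, [1.0], 3)
```

Because the filter is linear, its output for any input equals the zero-state response plus this zero-input response, so subtracting the zero-input response from the weighted speech once lets every codebook candidate be evaluated with a zero-state convolution only.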

[0048] As explained at the outset, long-term prediction analysis is regarded as a search of the adaptive codebook, and the long-term prediction lag (the adaptive codebook index) is selected by minimizing the perceptually weighted error between the synthesized waveform and the original speech. The case described here is one in which this search is performed sequentially with the noise codebook search. That is, the optimum long-term prediction vector 41 is determined assuming that the output of the noise codebook is 0.

[0049] Because the present invention has two adaptive codebooks, one for the pulse component and one for the noise component, the codebook search proceeds as follows. The weighted squared error is defined by the following equation.

[0050]

[Equation 3]

[0051] Here,
b_LP(n): output of the pulse component codebook a_P(n) for lag L
b_LN(n): output of the noise component codebook a_N(n) for lag L
b'_LP(n): weighted synthesized speech of b_LP(n)
b'_LN(n): weighted synthesized speech of b_LN(n)
β_P: gain of the pulse component
β_N: gain of the noise component
p(n): weighted input speech with the influence of the past removed.
The weighted synthesis is realized by convolving the codebook output with the impulse response of the weighted synthesis filter. The synthesized output obtained in this way does not depend on the past state of the synthesis filter and is therefore called the zero-state response. Partially differentiating (Equation 3) with respect to β_P and β_N gives the optimum gains as

【００５２】[0052]

【数４】 [Equation 4]

【００５３】となり、この時の２乗誤差は、The squared error at this time is

【００５４】[0054]

【数５】 [Equation 5]

【００５５】となる。ただし、where

【００５６】[0056]

【数６】 [Equation 6]

【００５７】である。よって、最適なラグＬは、（数
５）の右辺第２項を最大化するようなラグを求めれば良
い。ただし、β_P、β_Nが正となるもののみを対象とす
る。長期予測ベクトル４１は、ｂ'_L(n)＝β_Pｂ'_LP(n)＋β_Nｂ'_LN(n) となる。ただし、Ｌは最適なラグである。Therefore, the optimum lag L is found as the lag that maximizes the second term on the right-hand side of (Equation 5). Only lags for which both β_P and β_N are positive are considered. The long-term prediction vector 41 is then b'_L(n) = β_P b'_LP(n) + β_N b'_LN(n), where L is the optimum lag.
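
The joint gain solution and lag search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the equation images are missing, so the sketch assumes the squared error has the least-squares form sum((p - β_P b'_LP - β_N b'_LN)^2), solves the resulting 2x2 normal equations, and takes β_P C_P + β_N C_N as the term to be maximized. The function names and the dictionary layout are hypothetical.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def joint_gains(p, bp, bn):
    """Least-squares gains for p ~ beta_p*bp + beta_n*bn.

    Returns (beta_p, beta_n, reduction), where reduction is the amount by
    which the squared error drops below sum(p^2), or None if the two
    synthesized vectors are (nearly) linearly dependent.
    """
    r_pp, r_nn, r_pn = dot(bp, bp), dot(bn, bn), dot(bp, bn)
    c_p, c_n = dot(p, bp), dot(p, bn)
    det = r_pp * r_nn - r_pn * r_pn
    if abs(det) < 1e-12:
        return None
    beta_p = (c_p * r_nn - c_n * r_pn) / det
    beta_n = (c_n * r_pp - c_p * r_pn) / det
    return beta_p, beta_n, beta_p * c_p + beta_n * c_n


def search_lag(p, synth_by_lag):
    """Pick the lag with the largest error reduction among lags whose
    gains are both positive. synth_by_lag maps lag -> (b'_LP, b'_LN)."""
    best = None
    for lag, (bp, bn) in synth_by_lag.items():
        result = joint_gains(p, bp, bn)
        if result is None:
            continue
        beta_p, beta_n, reduction = result
        if beta_p <= 0 or beta_n <= 0:
            continue  # the text restricts the search to positive gains
        if best is None or reduction > best[3]:
            best = (lag, beta_p, beta_n, reduction)
    return best


# Toy data: three candidate lags with 4-sample vectors.
target = [1.0, 2.0, 0.0, 0.0]
candidates = {
    20: ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]),
    21: ([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]),
    22: ([0.0, 1.0, 0.0, 0.0], [0.0, -1.0, 1.0, 0.0]),
}
best = search_lag(target, candidates)
```

In this toy case lags 21 and 22 both yield a noise gain of zero and are skipped, so lag 20 is chosen.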

【００５８】次に、パルス発生器４２におけるパルス音
源の生成について説明する。パルス発生器４２において
は、最適なラグＬをパルス間隔とするパルス列ｃ_P(n)を
生成する。サブフレームにおける先頭のパルスの位置を
指標とすることによって、パルス列を一意に決定でき
る。サブフレームにおけるパルスの関係は、図３に示す
ように、ラグＬとサブフレーム長Ｎとの関係によって、
二つのタイプに分けることができる。Next, generation of the pulse sound source in the pulse generator 42 will be described. The pulse generator 42 generates a pulse train c_P(n) whose pulse interval is the optimum lag L. The pulse train is uniquely determined by using the position of the first pulse in the subframe as the index. As shown in FIG. 3, the arrangement of pulses within a subframe falls into two types, depending on the relationship between the lag L and the subframe length N.

【００５９】タイプ１は、Ｌ_min≦Ｌ≦Ｎ−１の場合
で、サブフレーム内に複数本のパルスが存在する場合で
あり、Ｌ通りの配置がある。タイプ２は、Ｎ≦Ｌ≦Ｌ
_maxの場合で、サブフレーム内には１本のパルスしかな
い。この時はＮ通りの配置がある。ただし、Ｌ_minとＬ
_maxはラグの検索範囲の最小値と最大値である。たとえ
ば、Ｌ_min＝２０、Ｌ_max＝１４６、Ｎ＝６４とすると、
Ｌの値に応じて５ビットあるいは６ビットで全ての配置
を表すことができる。Type 1 is the case L_min ≤ L ≤ N−1, in which a subframe contains multiple pulses and there are L possible arrangements. Type 2 is the case N ≤ L ≤ L_max, in which a subframe contains only one pulse and there are N possible arrangements. Here L_min and L_max are the minimum and maximum of the lag search range. For example, with L_min = 20, L_max = 146, and N = 64, every arrangement can be represented with 5 or 6 bits, depending on the value of L.
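
The two pulse-train types and the index size can be illustrated with a short sketch. The function names are hypothetical, and the index is assumed to be the zero-based position of the first pulse in the subframe.

```python
def num_arrangements(lag, subframe_len):
    """Type 1 (lag < N): lag arrangements. Type 2 (lag >= N): N arrangements."""
    return lag if lag < subframe_len else subframe_len


def index_bits(lag, subframe_len):
    """Bits needed to index every first-pulse position for this lag."""
    return (num_arrangements(lag, subframe_len) - 1).bit_length()


def pulse_positions(lag, first_pos, subframe_len):
    """Positions of all pulses in the subframe, spaced by the lag."""
    if not 0 <= first_pos < num_arrangements(lag, subframe_len):
        raise ValueError("first pulse position out of range for this lag")
    return list(range(first_pos, subframe_len, lag))


N = 64
type1 = pulse_positions(20, 3, N)    # several pulses in the subframe
type2 = pulse_positions(100, 10, N)  # a single pulse in the subframe
bits = {lag: index_bits(lag, N) for lag in (20, 32, 33, 63, 64, 146)}
```

With N = 64, lags from 20 to 32 need 5 bits and all larger lags need 6 bits, which matches the 5-or-6-bit statement in the text.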

【００６０】次にパルス音源の検索について説明する。
生成したパルス音源ｃ_P(n)の重み付け合成音声をｆ_P(n)
とすると、ｆ_P(n)をｂ'_L(n)に対して直交化したｆ'_P(n)
についてNext, the search for the pulse sound source will be described. Let f_P(n) be the weighted synthesized speech of the generated pulse sound source c_P(n), and let f'_P(n) be f_P(n) orthogonalized with respect to b'_L(n). The following quantity is evaluated for f'_P(n):

【００６１】[0061]

【数７】 [Equation 7]

【００６２】を最小化するようなｆ_P(n)を求める。な
お、直交化にはグラム・シュミットの直交化法等が用い
られる。The f_P(n) that minimizes (Equation 7) is then found. For the orthogonalization, the Gram-Schmidt orthogonalization method or the like is used.
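
A sketch of an orthogonalized search of this kind follows. Because the image for (Equation 7) is not available, the criterion here is an assumption: the squared error between the target and the optimally scaled orthogonalized vector is minimized, which is equivalent to maximizing (p·f')^2 / (f'·f'). All names are hypothetical. The same routine would apply to the noise codebook searches by passing a longer list of already selected vectors.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def orthogonalize(v, basis):
    """Gram-Schmidt step: remove from v its component along each basis
    vector. The basis vectors are assumed to be mutually orthogonal."""
    out = list(v)
    for u in basis:
        uu = dot(u, u)
        if uu > 0.0:
            k = dot(out, u) / uu
            out = [a - k * b for a, b in zip(out, u)]
    return out


def search_best(target, candidates, basis):
    """Return (index, score) of the candidate whose orthogonalized version
    best matches the target under the assumed criterion."""
    best_idx, best_score = -1, -1.0
    for idx, f in enumerate(candidates):
        f_orth = orthogonalize(f, basis)
        energy = dot(f_orth, f_orth)
        if energy <= 1e-12:
            continue  # candidate lies entirely along the basis
        score = dot(target, f_orth) ** 2 / energy
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


# Toy data: one already selected vector and two candidates.
selected = [[1.0, 0.0, 0.0, 0.0]]
cands = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
idx, score = search_best([0.0, 0.1, 0.0, 2.0], cands, selected)
```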

【００６３】次に、第１の雑音コードブック検索器４６
における雑音音源（１）の検索について説明する。パル
ス音源と雑音音源（１）に割り当てられるビット数を合
計１０ビットと仮定すると、ラグＬの値に応じて、雑音
音源には５ビットまたは４ビット割り当てることができ
る。これはコードブックサイズが、５ビットまたは４ビ
ットの雑音コードブックの検索を行うことである。Next, the search for the noise sound source (1) in the first noise codebook searcher 46 will be described. Assuming that a total of 10 bits is allocated to the pulse sound source and the noise sound source (1), 5 or 4 bits can be allocated to the noise sound source, depending on the value of the lag L. This amounts to searching a noise codebook whose size is 5 bits or 4 bits.

【００６４】ｆ_N1(n)を雑音コードベクトルｃ_N1(n)の重
み付け合成音声とする。ｆ_N1(n)をｂ'_L(n)とｆ'_P(n)に
直交化させたベクトルをｆ'_N1(n)とすると、ｆ'_N1(n)に
ついてLet f_N1(n) be the weighted synthesized speech of the noise code vector c_N1(n), and let f'_N1(n) be the vector obtained by orthogonalizing f_N1(n) with respect to b'_L(n) and f'_P(n). The following quantity is evaluated for f'_N1(n):

【００６５】[0065]

【数８】 [Equation 8]

【００６６】を最小化するようなｆ_N1(n)を求める。The f_N1(n) that minimizes (Equation 8) is then found.

【００６７】次に、第２の雑音コードブック検索器５０
における雑音音源（２）の検索について説明する。パル
ス音源と雑音音源（１）に割り当てられるビット数の合
計に等しい１０ビットの雑音コードブックの検索を行う
ことになる。Next, the search for the noise sound source (2) in the second noise codebook searcher 50 will be described. Here a 10-bit noise codebook is searched, 10 bits being equal to the total number of bits allocated to the pulse sound source and the noise sound source (1).

【００６８】ｆ_N2(n)を雑音コードベクトルｃ_N2(n)の重
み付け合成音声とする。ｆ_N2(n)をｂ'_L(n)に直交化させ
たベクトルをｆ'_N2(n)とすると、ｆ'_N2(n)についてLet f_N2(n) be the weighted synthesized speech of the noise code vector c_N2(n), and let f'_N2(n) be the vector obtained by orthogonalizing f_N2(n) with respect to b'_L(n). The following quantity is evaluated for f'_N2(n):

【００６９】[0069]

【数９】 [Equation 9]

【００７０】を最小化するようなｆ_N2(n)を求める。な
お、雑音コードブックはＶＳＥＬＰ型の構造にしても良
い。このようにすることにより、通常のＣＥＬＰに比べ
て、処理量を格段に低減することができる。ＶＳＥＬＰ
については、前出のＲＣＲ規格書に詳細に述べられてい
る。The f_N2(n) that minimizes (Equation 9) is then found. The noise codebook may have a VSELP-type structure, which greatly reduces the amount of processing compared with ordinary CELP. VSELP is described in detail in the RCR standard cited earlier.
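
For orientation, a VSELP-type codebook as generally described in the literature builds each code vector as a signed sum of a small set of basis vectors, with the sign of each basis vector taken from one bit of the index. The sketch below assumes that structure; it is not taken from this patent or from the RCR standard, and the function name is hypothetical.

```python
def vselp_codevector(basis, index):
    """Code vector for `index`: sum over m of (+1 or -1) * basis[m],
    where bit m of the index chooses the sign (1 -> +1, 0 -> -1)."""
    length = len(basis[0])
    out = [0.0] * length
    for m, vec in enumerate(basis):
        sign = 1.0 if (index >> m) & 1 else -1.0
        for k in range(length):
            out[k] += sign * vec[k]
    return out


# Two basis vectors give a 2-bit codebook of four code vectors.
basis = [[1.0, 0.0], [0.0, 2.0]]
codebook = [vselp_codevector(basis, i) for i in range(4)]
```

Only the basis vectors need to be passed through the weighted synthesis filter, and complementary indices give code vectors that differ only in sign, which is where the saving in processing comes from.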

【００７１】次に、選択器５４における音源の選択につ
いて説明する。基本的には、パルス音源と雑音音源
（１）を用いた場合の重み付け誤差と、雑音音源（２）
のみを用いた場合の重み付け誤差を比較し、誤差が小さ
くなる方の音源を選択することになる。選択器５４の出
力は、音源選択フラッグ５５である。Next, the selection of the sound source in the selector 54 will be described. Basically, the weighted error obtained when the pulse sound source and the noise sound source (1) are used is compared with the weighted error obtained when only the noise sound source (2) is used, and the sound source giving the smaller error is selected. The output of the selector 54 is the sound source selection flag 55.

【００７２】パルス音源と雑音音源（１）を用いた場合
の重み付け誤差は、次式の通りである。The weighted error when the pulse sound source and the noise sound source (1) are used is given by the following equation.

【００７３】[0073]

【数１０】 [Equation 10]

【００７４】同様に、雑音音源（２）のみを用いた場合
の重み付け誤差は、次式の通りである。Similarly, the weighted error when only the noise sound source (2) is used is given by the following equation.

【００７５】[0075]

【数１１】 [Equation 11]

【００７６】ここに、β_P、β_N、γ_P、γ_N1、γ_N2は利
得である。実際には、利得コードブック検索器５６で利
得コードブックを検索し、その結果得られた量子化され
た利得を用いて（数１０）と（数１１）を評価すること
になる。Here, β_P, β_N, γ_P, γ_N1, and γ_N2 are gains. In practice, the gain codebook searcher 56 searches the gain codebook, and (Equation 10) and (Equation 11) are evaluated using the resulting quantized gains.

【００７７】ここで、音響分類結果の寄与について簡単
に触れておく。（数１０）と（数１１）の重み付け誤差
を評価した場合、利得の量子化等の影響で、選択される
音源が頻繁に切り替わることがありうる。このような場
合、主観品質が必ずしも良くならないことがある。例え
ば、母音の定常部などでは、多少重み付け誤差が悪くて
も一方に固定しておいた方がよい。このため、音響分類
結果に基づいて（数１０）または（数１１）の重み付け
誤差にバイアスを与えている。Here, the contribution of the acoustic classification result is briefly described. When the weighted errors of (Equation 10) and (Equation 11) are evaluated, the selected sound source may switch frequently because of effects such as gain quantization. In such cases the subjective quality does not necessarily improve. For example, in the steady-state portion of a vowel, it is better to keep the selection fixed on one sound source even if the weighted error is somewhat worse. For this reason, a bias is applied to the weighted error of (Equation 10) or (Equation 11) based on the acoustic classification result.
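
The selection with bias can be sketched as follows. The images for (Equation 10) and (Equation 11) are missing, so the error is assumed here to be the squared difference between the weighted target and the sum of gain-scaled weighted synthesized components. The text does not say how the bias is applied; the multiplicative factors below are a hypothetical choice, as are the function names.

```python
def weighted_error(target, contributions):
    """Squared error between the target and the sum of gain * vector pairs.
    contributions is a list of (gain, vector) tuples."""
    err = 0.0
    for n, t in enumerate(target):
        approx = sum(g * v[n] for g, v in contributions)
        err += (t - approx) ** 2
    return err


def select_source(e1, e2, bias1=1.0, bias2=1.0):
    """Return 1 (pulse + noise source 1) or 2 (noise source 2 only).
    A bias below 1.0 favours the corresponding source."""
    return 1 if bias1 * e1 <= bias2 * e2 else 2


e1 = weighted_error([1.0, 1.0], [(1.0, [1.0, 0.0])])   # -> 1.0
e2 = weighted_error([1.0, 1.0], [(0.7, [1.0, 1.0])])   # -> about 0.18
plain = select_source(e1, e2)
held = select_source(e1, e2, bias1=0.1)  # e.g. a steady vowel holding source 1
```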

【００７８】次に、適応コードブックの更新について説
明する。ここでは先ず、復号器におけるのと同じ駆動音
源を作成する。パルス音源と雑音音源（１）が選択され
た場合、駆動音源は次のようになる。Next, updating of the adaptive codebook will be described. Here, first, the same driving sound source as in the decoder is created. When the pulse sound source and the noise sound source (1) are selected, the driving sound source is as follows.

【００７９】ｅｘ₁(n)＝ｅｘ_P1(n)＋ｅｘ_N1(n) ただし、ｅｘ_P1(n)＝β_Pｂ_LP(n)＋γ_Pｃ_P(n) ｅｘ_N1(n)＝β_Nｂ_LN(n)＋γ_N1ｃ_N1(n) である。一方、雑音音源（２）が選択された場合の駆動
音源は、ｅｘ₂(n)＝ｅｘ_P2(n)＋ｅｘ_N2(n) ただし、ｅｘ_P2(n)＝β_Pｂ_LP(n) ｅｘ_N2(n)＝β_Nｂ_LN(n)＋γ_N2ｃ_N2(n) である。
ex_1(n) = ex_P1(n) + ex_N1(n)
where
ex_P1(n) = β_P b_LP(n) + γ_P c_P(n)
ex_N1(n) = β_N b_LN(n) + γ_N1 c_N1(n)
On the other hand, when the noise sound source (2) is selected, the driving sound source is
ex_2(n) = ex_P2(n) + ex_N2(n)
where
ex_P2(n) = β_P b_LP(n)
ex_N2(n) = β_N b_LN(n) + γ_N2 c_N2(n)
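
The two excitation cases translate directly into code. The sketch below follows the formulas in this paragraph; the function names and the dictionary keys are hypothetical.

```python
def scale_add(g1, v1, g2=0.0, v2=None):
    """Return g1*v1, or g1*v1 + g2*v2 when a second vector is given."""
    if v2 is None:
        return [g1 * a for a in v1]
    return [g1 * a + g2 * b for a, b in zip(v1, v2)]


def build_excitation(flag, gains, vecs):
    """Return (ex_P, ex_N, ex) for source selection flag 1 or 2."""
    if flag == 1:
        # pulse sound source and noise sound source (1)
        ex_p = scale_add(gains["beta_P"], vecs["b_LP"],
                         gains["gamma_P"], vecs["c_P"])
        ex_n = scale_add(gains["beta_N"], vecs["b_LN"],
                         gains["gamma_N1"], vecs["c_N1"])
    else:
        # noise sound source (2) only
        ex_p = scale_add(gains["beta_P"], vecs["b_LP"])
        ex_n = scale_add(gains["beta_N"], vecs["b_LN"],
                         gains["gamma_N2"], vecs["c_N2"])
    ex = [a + b for a, b in zip(ex_p, ex_n)]
    return ex_p, ex_n, ex


gains = {"beta_P": 0.5, "beta_N": 0.25,
         "gamma_P": 2.0, "gamma_N1": 1.0, "gamma_N2": 3.0}
vecs = {"b_LP": [2.0, 0.0], "b_LN": [0.0, 4.0], "c_P": [1.0, 0.0],
        "c_N1": [0.0, 1.0], "c_N2": [1.0, 1.0]}
case1 = build_excitation(1, gains, vecs)
case2 = build_excitation(2, gains, vecs)
```

Keeping ex_P and ex_N separate matters because each one feeds its own adaptive codebook in the update step.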

【００８０】パルス成分用適応コードブックの更新は、
次の通りである。ａ_P(n)＝ａ_P(n＋Ｌ) ：−Ｌ_max≦ｎ＜−Ｎａ_P(n)＝ｅｘ_Pi(n＋Ｌ)：−Ｎ≦ｎ＜０ただし、ｉ＝１のとき、パルス音源と雑音音源（１）が選択され
た場合ｉ＝２のとき、雑音音源（２）が選択された場合である。The adaptive codebook for the pulse component is updated as follows.
a_P(n) = a_P(n+L) : −L_max ≤ n < −N
a_P(n) = ex_Pi(n+L) : −N ≤ n < 0
where i = 1 when the pulse sound source and the noise sound source (1) are selected, and i = 2 when the noise sound source (2) is selected.

【００８１】一方、雑音成分適応コードブックの更新
は、次の通りである。ａ_N(n)＝ａ_N(n＋Ｌ) ：−Ｌ_max≦ｎ＜−Ｎａ_N(n)＝ｅｘ_Ni(n＋Ｌ)：−Ｎ≦ｎ＜０ただし、ｉ＝１のとき、パルス音源と雑音音源（１）が選択され
た場合ｉ＝２のとき、雑音音源（２）が選択された場合である。On the other hand, the adaptive codebook for the noise component is updated as follows.
a_N(n) = a_N(n+L) : −L_max ≤ n < −N
a_N(n) = ex_Ni(n+L) : −N ≤ n < 0
where i = 1 when the pulse sound source and the noise sound source (1) are selected, and i = 2 when the noise sound source (2) is selected.
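
An adaptive codebook update of this kind is usually implemented as a delay line: old samples are shifted out and the newest excitation is appended. The text writes the shift as n+L. The sketch below instead shifts by the subframe length N, which is the conventional reading and is an assumption here, not something the source confirms. The function name is hypothetical.

```python
def update_adaptive_codebook(codebook, new_excitation):
    """codebook holds a(-L_max) .. a(-1), oldest sample first.
    Returns a buffer of the same length with the new subframe appended
    and the oldest samples dropped (shift by the subframe length)."""
    combined = list(codebook) + list(new_excitation)
    return combined[-len(codebook):]


# The same routine serves both codebooks, fed with ex_Pi and ex_Ni.
a_p = update_adaptive_codebook([1.0, 2.0, 3.0, 4.0, 5.0], [9.0, 8.0])
a_n = update_adaptive_codebook([0.0, 0.0, 0.0], [7.0, 6.0])
```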

【００８２】駆動音源の生成と、適応コードブックの更
新は、符号器と復号器でまったく同一である。The generation of the driving excitation and the updating of the adaptive codebook are exactly the same in the encoder and the decoder.

【００８３】次に図２に戻り、本実施例の復号化部につ
いて説明する。復号器は、概要で説明した通り、各指標
に基づいてコードブックを検索して得られたコードベク
トル、あるいはパルス音源に対し、復号された利得を乗
して加算することによって、駆動音源を求める。これを
線形予測係数をフィルタ係数とする合成フィルタに入力
することによって、合成音を得る。Next, returning to FIG. 2, the decoding unit of this embodiment will be described. As described in the overview, the decoder obtains the driving sound source by multiplying each code vector obtained by looking up a codebook with its index, or the pulse sound source, by the decoded gain and summing the results. Synthesized speech is obtained by feeding this driving sound source into a synthesis filter whose coefficients are the linear prediction coefficients.
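
The final decoding step can be pictured with a minimal all-pole synthesis filter. The sign convention, s(n) = ex(n) - sum_k a_k s(n-k), and the function name are assumptions; the patent does not spell out the filter form in this passage.

```python
def synthesize(excitation, lpc, state=None):
    """All-pole filter: s(n) = ex(n) - sum_k lpc[k-1] * s(n-k).
    state holds previous outputs, most recent first."""
    order = len(lpc)
    hist = list(state) if state is not None else [0.0] * order
    out = []
    for x in excitation:
        s = x - sum(a * h for a, h in zip(lpc, hist))
        out.append(s)
        hist = ([s] + hist)[:order]  # keep only the last `order` outputs
    return out


# First-order example: A(z) = 1 - 0.5 z^-1, driven by a unit impulse.
speech = synthesize([1.0, 0.0, 0.0], [-0.5])
```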

【００８４】実際の音源は、音源選択フラッグに基づい
て切り替えられる。詳細は、符号器の説明を参照された
い。The actual sound source is switched based on the sound source selection flag. For details, refer to the description of the encoder.

【００８５】本実施例によれば、低ビットレートのＣＥ
ＬＰ符号器においても周期成分の再現性が向上し、高品
質化が図れる。According to this embodiment, the reproducibility of periodic components is improved even in a low-bit-rate CELP encoder, and higher quality can be achieved.

【００８６】次に、本発明の第２の実施例について説明
する。図４に本実施例の音声符号器のブロック図を示
す。本実施例は、音響分類器を用いない場合を示してお
り、選択器５４では、（数１０）と（数１１）の重み付
け誤差の比較のみで音源を選択する。音声復号器は、第
１の実施例のものと同一である。本実施例では、音響分
類を行わないので、第１の実施例に比較して、処理量を
低減できる。Next, a second embodiment of the present invention will be described. FIG. 4 shows a block diagram of the speech coder of this embodiment. This embodiment shows a case where the acoustic classifier is not used, and the selector 54 selects a sound source only by comparing the weighting errors of (Equation 10) and (Equation 11). The speech decoder is the same as that of the first embodiment. Since sound classification is not performed in this embodiment, the processing amount can be reduced compared to the first embodiment.

【００８７】次に、本発明の第３の実施例について説明
する。図５に本実施例の音声符号器のブロック図を示
す。本実施例ではピッチ抽出器１４を具備し、入力音声
１１のピッチ周期を抽出し、その値１５と符号１６を出
力する。ピッチ周期１５は、パルス発生器４２におい
て、生成するパルス間隔を規定する。すなわち、第１の
実施例では、パルス間隔は長期予測ラグに一致するよう
に決定されていたが、本実施例ではパルス間隔はピッチ
周期に一致する。Next, a third embodiment of the present invention will be described. FIG. 5 shows a block diagram of the speech coder of this embodiment. In the present embodiment, the pitch extractor 14 is provided to extract the pitch period of the input speech 11 and output the value 15 and the code 16. The pitch period 15 defines the pulse interval generated by the pulse generator 42. That is, in the first embodiment, the pulse interval is determined to match the long-term prediction lag, but in the present embodiment, the pulse interval matches the pitch period.

【００８８】ピッチ周期の符号１６は、伝送パラメータ
として復号器へ伝送される。The code 16 of the pitch period is transmitted to the decoder as a transmission parameter.

【００８９】図６に本実施例の復号器のブロック図を示
す。第１の実施例の復号器との違いは、ピッチ周期の符
号からパルス音源のパルス間隔を決定する点である。本
実施例では、ピッチ周期という音声の物理現象を反映し
たパルスが生成されることで、音声品質が一層向上す
る。FIG. 6 shows a block diagram of the decoder of this embodiment. The difference from the decoder of the first embodiment is that the pulse interval of the pulse sound source is determined from the code of the pitch period. In the present embodiment, the voice quality is further improved by generating the pulse reflecting the physical phenomenon of the voice called the pitch period.

【００９０】次に、本発明の第４の実施例について説明
する。図７に本実施例の音声符号器のブロック図を示
す。本実施例では、第３の実施例と同様に、ピッチ抽出
器１４を具備するが、ピッチ周期１５をパルス発生器４
２におけるパルス間隔に直接反映させないことに特徴が
ある。本実施例では、ピッチ周期１５は長期予測ラグの
選択に影響を与える。すなわち、第１の実施例では長期
予測ラグは（数５）のＥを最小化するものとして決定さ
れたが、本実施例では、検索されるラグＬがピッチ周期
１５に近い場合にはＥの値を一定の割合で下げるように
バイアスをかける。このようにすると、通常はピッチ周
期の整数倍の値をランダムにとりやすかった長期予測ラ
グが、ピッチ周期と一致する比率が高くなり、長期予測
ラグの連続性が向上する。Next, a fourth embodiment of the present invention will be described. FIG. 7 shows a block diagram of the speech coder of this embodiment. As in the third embodiment, this embodiment includes the pitch extractor 14, but it is characterized in that the pitch period 15 is not reflected directly in the pulse interval of the pulse generator 42. Instead, the pitch period 15 influences the selection of the long-term prediction lag. That is, in the first embodiment the long-term prediction lag was determined as the one that minimizes E of (Equation 5), whereas in this embodiment, when the lag L being searched is close to the pitch period 15, a bias is applied that lowers the value of E by a fixed ratio. As a result, the long-term prediction lag, which otherwise tends to take integer multiples of the pitch period at random, coincides with the pitch period more often, and the continuity of the long-term prediction lag improves.
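
The pitch-biased lag selection can be sketched as below. The text only says that E is lowered by a fixed ratio when the lag is close to the pitch period; the tolerance of 2 samples and the factor 0.9 are hypothetical values chosen for illustration, as is the function name.

```python
def biased_lag_search(errors, pitch_period, tolerance=2, factor=0.9):
    """errors maps lag -> weighted error E. Lags within `tolerance`
    samples of the pitch period have E scaled by `factor` before the
    minimum is taken. Returns the selected lag."""
    best_lag, best_err = None, float("inf")
    for lag, err in sorted(errors.items()):
        if abs(lag - pitch_period) <= tolerance:
            err *= factor  # favour lags near the extracted pitch period
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag


# The double-pitch lag 80 has a slightly smaller raw error than lag 40.
errors = {40: 1.00, 80: 0.95}
unbiased = biased_lag_search(errors, pitch_period=40, factor=1.0)
biased = biased_lag_search(errors, pitch_period=40)
```

Without the bias the search jumps to the pitch multiple; with it the lag stays at the pitch period, which is the continuity effect the paragraph describes.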

【００９１】さらに、パルス発生器４２におけるパルス
間隔は、第１の実施例同様、長期予測ラグに一致するの
で、ピッチ周期の指標を伝送する必要がない。なお、音
声復号器は、第１の実施例の復号器と同一である。Further, since the pulse interval in the pulse generator 42 matches the long-term prediction lag as in the first embodiment, it is not necessary to transmit the pitch period index. The audio decoder is the same as the decoder of the first embodiment.

【００９２】本実施例によれば、長期予測ラグの連続性
が改善され、同時にパルス音源のパルス間隔も音声の物
理的特徴を反映するようになるので、音声品質の改善が
図れる。According to this embodiment, the continuity of the long-term prediction lag is improved, and at the same time, the pulse interval of the pulse sound source also reflects the physical characteristics of the voice, so that the voice quality can be improved.

【００９３】次に、本発明の第５の実施例について説明
する。図８は本実施例の音声符号器のブロック図であ
る。これまでの実施例との違いは、適応コードブック３
９がパルス成分用と雑音成分用に分離されておらず、一
つだけであるという点である。これに対応して、図９に
示した音声復号器にも適応コードブック６９も一つだけ
である。このようにすることで、長期予測ラグの検索
や、更新の処理量の低減が図れる。低処理量での実現が
望まれるときに、本実施例は有効である。Next, a fifth embodiment of the present invention will be described. FIG. 8 is a block diagram of the speech coder of this embodiment. The difference from the preceding embodiments is that the adaptive codebook 39 is not separated into one for the pulse component and one for the noise component; there is only one. Correspondingly, the speech decoder shown in FIG. 9 also has only one adaptive codebook 69. This reduces the processing required to search for the long-term prediction lag and to update the codebook. This embodiment is effective when a low-complexity implementation is desired.

【００９４】[0094]

【発明の効果】本発明によれば、ＣＥＬＰ符号器を低ビ
ットレート化したときに問題となる周期成分の再現性が
改善されるため、４ｋｂｐｓ以下のビットレートでも良
好な音声品質の音声符号器を提供できる。According to the present invention, the reproducibility of periodic components, which becomes a problem when the bit rate of a CELP encoder is lowered, is improved, so that a speech encoder with good speech quality can be provided even at bit rates of 4 kbps or less.

【００９５】[0095]

[Brief description of drawings]

【図１】本発明の第１の実施例の符号化部のブロック図
である。FIG. 1 is a block diagram of an encoding unit according to a first embodiment of this invention.

【図２】本発明の第１の実施例の復号化部のブロック図
である。FIG. 2 is a block diagram of a decoding unit according to the first embodiment of this invention.

【図３】パルス音源発生の原理説明図である。FIG. 3 is an explanatory diagram of the principle of pulse sound source generation.

【図４】本発明の第２の実施例の符号化部のブロック図
である。FIG. 4 is a block diagram of an encoding unit according to a second embodiment of the present invention.

【図５】本発明の第３の実施例の符号化部のブロック図
である。FIG. 5 is a block diagram of an encoding unit according to a third embodiment of the present invention.

【図６】本発明の第３の実施例の復号化部のブロック図
である。FIG. 6 is a block diagram of a decoding unit according to a third embodiment of the present invention.

【図７】本発明の第４の実施例の符号化部のブロック図
である。FIG. 7 is a block diagram of an encoding unit according to a fourth embodiment of the present invention.

【図８】本発明の第５の実施例の符号化部のブロック図
である。FIG. 8 is a block diagram of an encoding unit according to a fifth embodiment of the present invention.

【図９】本発明の第５の実施例の復号化部のブロック図
である。FIG. 9 is a block diagram of a decoding unit according to a fifth embodiment of the present invention.

[Explanation of symbols]

１２…音響分類器、１４…ピッチ抽出器、１７…聴覚重み付け器、２４…重み付け合成フィルタ、３１，６１…雑音成分適応コードブック検索器、３４，６５…パルス成分適応コードブック検索器、３９…適応コードブック検索器、４２…パルス発生器、４６…第１の雑音コードブック検索器、５０…第２の雑音コードブック検索器、５４…音源選択器、５６…利得コードブック検索器、５８…多重化装置、５９…伝送路、６０…多重分離装置、６９…適応コードブック、７３…パルス発生器、７７…第１の雑音コードブック、８１…第２の雑音コードブック、８７…音源切り替え器，９３…合成フィルタ、９５…適応ポストフィルタ、９７…利得コードブック。 12 ... Acoustic classifier, 14 ... Pitch extractor, 17 ... Auditory weighter, 24 ... Weighting synthesis filter, 31, 61 ... Noise component adaptive codebook searcher, 34, 65 ... Pulse component adaptive codebook searcher, 39 ... Adaptive codebook searcher, 42 ... Pulse generator, 46 ... First noise codebook searcher, 50 ... Second noise codebook searcher, 54 ... Sound source selector, 56 ... Gain codebook searcher, 58 ... Multiplexing device, 59 ... Transmission line, 60 ... Demultiplexing device, 69 ... Adaptive codebook, 73 ... Pulse generator, 77 ... First noise codebook, 81 ... Second noise codebook, 87 ... Sound source switching device , 93 ... Synthesis filter, 95 ... Adaptive post filter, 97 ... Gain codebook.

Claims

[Claims]

1. A code-excited linear prediction (CELP) coding method in which a plurality of sound sources are switched and used, wherein at least one of the sound sources is a combination of a pulse component and a noise component, and the pulse position and the gain of the pulse component are set independently of the noise vector of the noise component.

2. The speech coding method according to claim 1, wherein the switching of the sound sources is determined by evaluating a weighting error in a coding target section.

3. The speech coding method according to claim 2, further comprising an acoustic classifier for classifying input speech based on acoustic characteristics of the speech, wherein the classification result of the acoustic classifier is reflected in the evaluation of the weighting error.

4. The speech coding method according to claim 1, wherein the pulse interval of the pulse component is made to coincide with a long-term prediction lag within the coding target section, and the position of the first pulse of the pulse component is defined as a position from the beginning of the coding target section and is used as an index.

5. The speech coding method according to claim 1, further comprising a pitch extractor for extracting a pitch period of input speech, wherein the pulse interval of the pulse component is made to coincide with the pitch period extracted by the pitch extractor, and the position of the first pulse of the pulse component is defined as a position from the beginning of the coding target section and is used as an index.

6. The speech coding method according to claim 4, further comprising a pitch extractor for extracting the pitch period of the input speech, wherein the output of the pitch extractor is reflected in the determination of the long-term prediction lag.

7. The speech coding method according to claim 6, comprising an adaptive codebook for pulse components and an adaptive codebook for noise components, wherein the search for the long-term prediction lag is performed on a long-term prediction filter component synthesized by optimizing the gain of the adaptive codebook for pulse components and the gain of the adaptive codebook for noise components.