JP3063087B2

JP3063087B2 - Audio encoding / decoding device, audio encoding device, and audio decoding device

Info

Publication number: JP3063087B2
Application number: JP63123148A
Authority: JP
Inventors: 英輔花田; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-05-20
Filing date: 1988-05-20
Publication date: 2000-07-12
Anticipated expiration: 2015-07-12
Also published as: JPH01293399A

Description

[Detailed Description of the Invention] (Industrial Field of Application) The present invention relates to a speech encoding/decoding system, and apparatus therefor, for efficiently encoding and decoding a speech signal at a low bit rate.

(Prior Art) Multi-pulse coding and similar methods are known for transmitting a speech signal at a low bit rate, for example about 16 kb/s or less. These methods represent the excitation signal as a combination of a plurality of pulses (a multi-pulse train) and the vocal-tract characteristics as a digital filter. They determine and transmit the excitation pulse information and the filter coefficients for each fixed time interval (frame). Details of such a method are given, for example, in Araseki, Ozawa, Ono and Ochiai, "Multi-pulse Excited Speech Coder Based on Maximum Cross-correlation Search Algorithm," GLOBECOM '83, IEEE Global Telecommunications Conference, 23.3, 1983 (Reference 1). The method represents the vocal-tract information and the excitation signal separately, and it uses a combination of pulses (a multi-pulse train) to represent the excitation. As a result, it outputs a good speech signal after decoding.

The basic procedure for obtaining the multi-pulse train that represents the excitation signal is explained with reference to FIG. 3. A frame-segmented speech signal is input from input terminal 800. The spectral parameters obtained from the speech signal of the current frame are supplied to synthesis filter circuit 820. Excitation calculation circuit 810 generates an initial excitation multi-pulse train. Feeding this train to synthesis filter 820 produces a synthesized speech waveform at the output. Subtractor 840 subtracts the synthesized speech waveform from the input signal. The difference is input to weighting circuit 850, whose output is the weighted error power for the current frame. Excitation calculation circuit 810 then determines the amplitudes and positions of the excitation multi-pulse train so as to minimize this weighted error power.
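The following Python is a minimal sketch of the closed loop in FIG. 3, not the method of Reference 1 itself. It evaluates the weighted error power for one candidate multi-pulse excitation. It makes two assumptions. First, the synthesis filter is the all-pole filter 1/A(z) with predictor coefficients a_k, so that y(n) = x(n) + Σ a_k y(n−k). Second, weighting circuit 850 is modelled as an FIR filter given by a hypothetical impulse response `weight_ir`. All function names are illustrative.

```python
from typing import List, Sequence


def build_excitation(positions: Sequence[int], amplitudes: Sequence[float], length: int) -> List[float]:
    """Place pulses of the given amplitudes at the given sample positions."""
    exc = [0.0] * length
    for pos, amp in zip(positions, amplitudes):
        exc[pos] += amp
    return exc


def all_pole_filter(x: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Synthesis filter 1/A(z): y(n) = x(n) + sum_k lpc[k-1] * y(n-k), zero initial state."""
    y: List[float] = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Convolution with impulse response h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def weighted_error_power(speech, positions, amplitudes, lpc, weight_ir) -> float:
    """Analysis-by-synthesis cost: power of the weighted (speech - synthesized speech)."""
    excitation = build_excitation(positions, amplitudes, len(speech))
    synthesized = all_pole_filter(excitation, lpc)          # synthesis filter 820
    error = [s - y for s, y in zip(speech, synthesized)]    # subtractor 840
    weighted = fir_filter(error, weight_ir)                 # weighting circuit 850
    return sum(e * e for e in weighted)
```

A pulse search adjusts `positions` and `amplitudes` so that this value becomes as small as possible.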

A pitch-prediction multi-pulse method improves the sound quality of the method of Reference 1. It performs pitch prediction with a pitch parameter that represents the fine structure of the pitch. The method is described in the specification of Japanese Patent Application No. 58-139022 (Reference 2), so its description is omitted here.

(Problems to Be Solved by the Invention) With the conventional method of Reference 1, sound quality is good when the bit rate is high enough to allow a sufficient number of excitation pulses. However, quality degrades as the bit rate is lowered. In particular, the conventional method degrades the reproduced speech for input signals with a high pitch frequency, such as female voices. The reason is that a high pitch frequency places many pitch periods within each pulse-calculation frame, and reproducing these waveforms well requires more pulses than for a speaker with a low pitch frequency. It has therefore been difficult to lower the transmission bit rate substantially, that is, to reduce the number of pulses per frame substantially, without degrading sound quality.

The conventional method of Reference 2 performs pitch prediction with a pitch parameter based on the pitch-period correlation. However, it represents the excitation signal with multi-pulses and pitch prediction regardless of whether the excitation has a large or a small amplitude. Large-amplitude excitation components are considered to have a high pitch-period correlation, whereas small-amplitude components are considered to have a low one. To improve the sound quality of this method further, the small-amplitude pulses of the multi-pulse train, or the small-amplitude excitation signal, become increasingly important, particularly for consonant speech segments. The conventional method determines and transmits only a preset number of pulses, in descending order of amplitude. Because the amount of information has a preset upper limit, the method cannot obtain enough small-amplitude pulses, and the excitation signal is not approximated closely enough. This caps the quality of the reproduced speech and prevents further improvement. The problem is particularly marked at low bit rates.

An object of the present invention is to provide a speech encoding/decoding system and apparatus that reproduce speech better than conventional methods with a relatively small amount of computation. This improvement is intended to hold both at high bit rates and as the bit rate is lowered.

(Means for Solving the Problems) The speech encoding/decoding system of the present invention operates as follows:
- It receives a discrete speech signal.
- It obtains a pitch parameter representing the fine structure of the pitch of the speech signal, and spectral parameters representing its spectrum.
- It classifies the speech signal into predetermined classes.
- Depending on the class, it represents and transmits the excitation signal in one of two ways: as a multi-pulse train obtained by pitch prediction together with a codebook, or as a multi-pulse train obtained by pitch prediction alone.
- Depending on the class, it restores the excitation signal either from the codebook, the multi-pulse train and the pitch parameter, or from the multi-pulse train and the pitch parameter.
- Using the spectral parameters, it outputs a synthesized speech signal that represents the speech signal well.

The speech encoding apparatus of the present invention comprises the following circuits:
- a pitch parameter calculation circuit that obtains and encodes, from an input discrete speech signal sequence, a pitch parameter representing the fine structure of the pitch;
- a spectral parameter calculation circuit that obtains and encodes spectral parameters representing short-time spectral characteristics;
- a discrimination circuit that classifies the speech signal into predetermined classes and outputs a discrimination code;
- an excitation signal calculation circuit that, using the pitch parameter and the spectral parameters, represents the excitation signal of the speech signal, depending on the interval, by a multi-pulse train obtained by pitch prediction and by one codebook entry selected from a plurality of prepared entries;
- a multiplexer circuit that combines and outputs the pitch parameter, the spectral parameters, the discrimination code, the multi-pulse train, and the selected codebook entry.

The speech decoding apparatus of the present invention comprises the following circuits:
- a demultiplexer circuit that receives, separates and decodes a code for the excitation multi-pulse train, a code for the pitch parameter, a code for the spectral parameters, a code for the excitation signal, and a discrimination code;
- a pitch reproduction circuit that restores the multi-pulse train from the decoded excitation codes and uses the decoded pitch parameter to obtain an excitation signal with the pitch reproduced;
- an excitation signal restoration circuit that selects one of a plurality of codebook entries according to the discrimination code and restores a driving excitation signal using the output of the pitch reproduction circuit;
- a spectral envelope filter circuit that synthesizes a speech signal from the decoded spectral parameters and either the restored driving excitation signal or the pitch-reproduced excitation signal.

(Operation) The present invention modifies the pitch-prediction multi-pulse coding method of Reference 2 so that it represents the excitation signal more effectively than the conventional method while transmitting little information. The speech signal of each frame is classified into predetermined classes, for example vowel and consonant portions, or voiced and unvoiced portions. The excitation of vowel or voiced portions is represented by the pitch-prediction multi-pulse method. The excitation of consonant or unvoiced portions is represented by the pitch-prediction multi-pulse method together with a codebook.

Vowel and consonant portions can be discriminated by well-known methods. These use parameters such as the power of the current frame, the power difference from the previous frame, the spectral change from the previous frame, and the degree of periodicity. Voiced and unvoiced portions can be discriminated using parameters such as the pitch gain.
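A minimal Python sketch of such a discrimination follows. It uses frame power, power change and periodicity. Several choices here are assumptions, not values from the patent: the normalized correlation at the best lag stands in for "pitch gain", and the thresholds (20 dB power floor, 6 dB maximum power jump, 0.6 correlation) and the lag range are hypothetical.

```python
import math
from typing import Sequence, Tuple


def frame_power_db(frame: Sequence[float]) -> float:
    """Mean power of the frame in dB (a small floor avoids log of zero)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def pitch_correlation(frame: Sequence[float], min_lag: int = 20, max_lag: int = 147) -> float:
    """Largest normalized correlation between the frame and a lagged copy of itself.

    Lags are limited to half the frame so every correlation uses enough samples.
    """
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) // 2) + 1):
        cur, past = frame[lag:], frame[:len(frame) - lag]
        num = sum(a * b for a, b in zip(cur, past))
        den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past))
        if den > 0.0:
            best = max(best, num / den)
    return best


def classify_frame(frame, prev_power_db, power_floor_db=20.0, max_jump_db=6.0,
                   voicing=0.6) -> Tuple[str, float]:
    """Return ('vowel' or 'consonant', power_db) using hypothetical thresholds."""
    power_db = frame_power_db(frame)
    steady = abs(power_db - prev_power_db) < max_jump_db
    periodic = pitch_correlation(frame) >= voicing
    label = "vowel" if (power_db >= power_floor_db and steady and periodic) else "consonant"
    return label, power_db
```

The returned power can be passed as `prev_power_db` when the next frame is classified.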

The operation of the present invention is described with reference to FIG. 2. The upper part of FIG. 2 is a block diagram of a circuit that obtains, by pitch prediction, the large-amplitude multi-pulses representing the excitation signal.

Inputs:
- A frame-segmented speech signal enters at input terminal 300.
- A code indicating whether the current frame is a vowel interval or a consonant interval enters at code input terminal 305.
- The pitch parameter obtained from the speech signal of the current frame is supplied to pitch reproduction filter 315.
- The spectral parameters obtained from the same frame are supplied to spectral envelope filter circuit 320.

Pulse search: Multi-pulse excitation source 310 generates an initial excitation multi-pulse train. This train is passed through pitch reproduction filter 315 to obtain a driving excitation signal. Feeding the driving excitation signal to spectral envelope filter circuit 320 produces a synthesized speech waveform. Subtractor 340 subtracts the synthesized speech waveform from the input signal. The result is input to weighting circuit 350, whose output is the weighted error power in the current frame. Multi-pulse excitation source 310 then determines the amplitudes and positions of the excitation multi-pulse train so as to minimize this weighted error power.

Switch 380: The switch is always set to the upper position while the multi-pulse train is being determined. After the train has been obtained, the setting depends on the code from code input terminal 305:
- Consonant frame: switch 380 is set to the lower position. The synthesized speech waveform x̂(n), obtained from the multi-pulse train through pitch reproduction filter 315 and spectral envelope filter circuit 320, is output to subtractor 220.
- Vowel frame: the switch remains in the upper position, and the multi-pulse train is output as the excitation signal.

The above describes the case where the excitation is represented by a pitch-predicted multi-pulse train alone. When the excitation is represented by a pitch-predicted multi-pulse train together with a codebook, the following processing is performed in addition.

Small-amplitude excitation calculation circuit 230 further divides the frame into subintervals of, for example, about 5 ms. For each subinterval, it computes a small-amplitude excitation signal that represents the small-amplitude component of the excitation. This small-amplitude excitation has an almost random phase characteristic and is considered to be close to a noise signal. Such a signal can be encoded very efficiently by vector quantization, in which a codebook of small-amplitude excitation signals is prepared in advance and used for encoding. Vector quantization is described in detail in, for example, R. M. Gray, "Vector quantization for speech coding and recognition," J. Acoustical Society of America, vol. 80, Suppl. 1, Q1, 1986, so its description is omitted here.

Small-amplitude excitation calculation circuit 230 operates as follows. Subtractor 220 subtracts the reproduced signal x̂(n) from the original speech waveform x(n). Time division circuit 240 divides the resulting residual signal e(n) uniformly into subintervals shorter than a frame. Codebook 250 holds a plurality of entries prepared in advance. For each subinterval produced by time division circuit 240, the following steps are applied:
1. One codebook entry is taken as input.
2. Gain circuit 255 adjusts the gain of that entry.
3. Spectral envelope filter 260, which uses the same spectral parameters as spectral envelope filter 320, produces a synthesized residual signal ê(n).

Subtractor 270 subtracts the synthesized residual signal ê(n) from the input residual signal e(n). The result is input to weighting circuit 280, whose output is the weighted error power. Weighting circuit 280 operates in the same way as weighting circuit 350 in the upper part of FIG. 2. The codebook entry that minimizes the weighted error power is selected from codebook 250, and its index is output.
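The exhaustive search just described is sketched below in Python. For each entry, the code filters it through the spectral envelope filter and the weighting filter, applies the least-squares gain, and keeps the entry with the smallest weighted error power. The sketch rests on several assumptions:
- Filters are all-pole/FIR as in the earlier sketch.
- Each subinterval is filtered from a zero initial state. This simplifies real coders, which carry filter memory across subintervals.
- 40-sample subintervals correspond to 5 ms at an 8 kHz sampling rate.
- All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def all_pole_filter(x: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Spectral envelope filter 1/A(z) with zero initial state."""
    y: List[float] = []
    for n, xn in enumerate(x):
        y.append(xn + sum(a * y[n - k] for k, a in enumerate(lpc, start=1) if n - k >= 0))
    return y


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Weighting filter given by its impulse response w(n), truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def search_codebook(target, codebook, lpc, weight_ir) -> Tuple[int, float, float]:
    """Return (index, gain, weighted error power) of the best codebook entry."""
    e_w = fir_filter(target, weight_ir)                            # weighted residual e_W(n)
    best = (-1, 0.0, float("inf"))
    for index, code in enumerate(codebook):
        y_w = fir_filter(all_pole_filter(code, lpc), weight_ir)    # weighted synthesized residual
        energy = sum(v * v for v in y_w)
        if energy == 0.0:
            continue
        gain = sum(a * b for a, b in zip(e_w, y_w)) / energy       # least-squares gain g
        error = sum((a - gain * b) ** 2 for a, b in zip(e_w, y_w))  # weighted error power E
        if error < best[2]:
            best = (index, gain, error)
    return best


def encode_residual(residual, codebook, lpc, weight_ir, sub_len=40) -> List[Tuple[int, float]]:
    """Split e(n) into subintervals and return (index, gain) for each one."""
    result = []
    for start in range(0, len(residual), sub_len):
        index, gain, _ = search_codebook(residual[start:start + sub_len], codebook, lpc, weight_ir)
        result.append((index, gain))
    return result
```

Every codebook entry is assumed to be exactly `sub_len` samples long.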

Next, equations are used to describe the practical method of representing the small-amplitude excitation with the codebook and selecting the codebook entry. The codebook entry is selected so as to minimize the error power E defined by Eq. (1).
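The formula for Eq. (1) can be reconstructed from the definitions given next: e(n), g, ê(n) and w(n), with * denoting convolution. The following form is consistent with those definitions but is not a verbatim copy of the patent's formula:

$$E = \sum_{n} \Big\{ \big[ e(n) - g\,\hat{e}(n) \big] * w(n) \Big\}^{2} \qquad (1)$$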

Here, e(n) is the input error signal in FIG. 2, g is the gain applied by gain circuit 255, and ê(n) is the residual signal reproduced from the selected codebook entry by the spectral envelope filter. w(n) is the impulse response of the perceptual weighting filter, which is identical to weighting circuit 280 in FIG. 2. Minimizing Eq. (1) with respect to g gives Eq. (2).
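Setting ∂E/∂g = 0 in Eq. (1) gives a least-squares gain whose structure matches the description that follows: the numerator is a cross-correlation and the denominator is an autocorrelation. This is a reconstruction, written with the weighted signals e_W(n) and ê_W(n) of Eqs. (3a) and (3b):

$$g = \frac{\sum_{n} e_W(n)\,\hat{e}_W(n)}{\sum_{n} \hat{e}_W(n)^{2}} \qquad (2)$$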

Here,

$$\hat{e}_W(n) = \hat{e}(n) * w(n) = n(n) * h(n) * w(n) \qquad (3a)$$

$$e_W(n) = e(n) * w(n) \qquad (3b)$$

and the symbol * denotes convolution. The denominator of Eq. (2) is the autocorrelation of ê_W(n); strictly speaking it is the covariance. The numerator is the cross-correlation of ê_W(n) and e_W(n). In Eq. (3a), n(n) is the signal represented by the selected code in the codebook.

h(n) is the impulse response of spectral envelope filter circuit 260.

The error power E can then be written as Eq. (4). The codebook entry that minimizes E is therefore the one that maximizes the second term of Eq. (4). This term equals g·Σ e_W(n)ê_W(n), with g given by Eq. (2).
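Substituting the gain of Eq. (2) back into Eq. (1) gives the reconstructed form of Eq. (4) shown below. The first term does not depend on the codebook entry, so only the second term affects the selection:

$$E = \sum_{n} e_W(n)^{2} - \frac{\Big[\sum_{n} e_W(n)\,\hat{e}_W(n)\Big]^{2}}{\sum_{n} \hat{e}_W(n)^{2}} \qquad (4)$$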

The following configuration reduces the computation required for codebook selection much further. The multi-pulse train representing the excitation is searched for using cross-correlations. This procedure is described in detail in References 1 and 2 and is omitted here. Reusing the cross-correlation function Φ'_xh that remains after the pitch-prediction multi-pulse train has been determined allows the codebook entry to be selected with far less computation than the method above.

In the method below, the signal ê_W(n) need not be regenerated during codebook selection. The computation is therefore greatly reduced, while the performance stays almost the same as that of the method above. The derivation is as follows. First, Φ'_xh and ê_W(n) can be written as

$$\Phi'_{xh}(m) = \sum_{n} e_W(n)\,h_W(n-m) \qquad (5)$$

$$\hat{e}_W(n) = n(n) * h_W(n) \qquad (6)$$

where h_W(n) = h(n) * w(n) is the impulse response of spectral envelope filter 260 cascaded with weighting circuit 280. Substituting Eq. (6) into Eq. (2) and using Eq. (5) transforms Eq. (2) into Eq. (7).
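Eq. (7) can be reconstructed from the description that follows. Its numerator is the cross-correlation of Φ'_xh with n(n), and its denominator is built from R_hh(0) and R_nn(0).

The numerator follows exactly from Eqs. (5) and (6): Σ_n e_W(n)ê_W(n) = Σ_m n(m)Φ'_xh(m).

The denominator of Eq. (2) equals Σ_k Σ_l n(k)n(l)R_hh(k−l). Keeping only the zero-lag terms gives R_hh(0)R_nn(0). This reading is therefore an approximation, based on the assumption that the off-lag terms are neglected:

$$g \approx \frac{\sum_{m} \Phi'_{xh}(m)\,n(m)}{R_{hh}(0)\,R_{nn}(0)} \qquad (7)$$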

The quantities in Eq. (7) are defined as follows:
- Φ'_xh is the cross-correlation function that remains after the multi-pulse train has been determined by pitch prediction.
- R_hh(0) is the power of the impulse response of the filter formed by cascading spectral envelope filter 260 and weighting circuit 280.
- R_nn(0) is the power of the signal n(n) represented by one code selected from codebook 250.

The numerator of Eq. (7) is the cross-correlation between Φ'_xh and the signal n(n) represented by the selected code. As with Eq. (2), the codebook entry may then be selected on the basis of g in Eq. (7).
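The Python sketch below illustrates this fast selection. Φ'_xh(m) is computed once per subinterval, after which each codebook entry costs only one inner product and one energy. The per-entry filtering required by the direct search is avoided. Entries are ranked by corr²/(R_hh(0)R_nn(0)), the approximate second term of Eq. (4), and the returned gain is the approximate g of Eq. (7). The filter forms and the function names are assumptions.

```python
from typing import List, Sequence, Tuple


def all_pole_filter(x: Sequence[float], lpc: Sequence[float]) -> List[float]:
    y: List[float] = []
    for n, xn in enumerate(x):
        y.append(xn + sum(a * y[n - k] for k, a in enumerate(lpc, start=1) if n - k >= 0))
    return y


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def weighted_impulse_response(lpc, weight_ir, length) -> List[float]:
    """h_W(n): spectral envelope filter followed by the weighting filter."""
    delta = [1.0] + [0.0] * (length - 1)
    return fir_filter(all_pole_filter(delta, lpc), weight_ir)


def cross_correlation(e_w: Sequence[float], h_w: Sequence[float]) -> List[float]:
    """Phi'_xh(m) = sum_n e_W(n) h_W(n - m), as in Eq. (5)."""
    size = len(e_w)
    return [sum(e_w[n] * h_w[n - m] for n in range(m, size)) for m in range(size)]


def fast_select(phi, r_hh0, codebook) -> Tuple[int, float]:
    """Return (index, approximate gain of Eq. (7)) without filtering any codebook entry."""
    best_index, best_score, best_gain = -1, -1.0, 0.0
    for index, code in enumerate(codebook):
        r_nn0 = sum(c * c for c in code)                 # R_nn(0)
        if r_nn0 == 0.0:
            continue
        corr = sum(p * c for p, c in zip(phi, code))     # numerator of Eq. (7)
        gain = corr / (r_hh0 * r_nn0)                    # approximate g
        score = gain * corr                              # corr^2 / (R_hh(0) R_nn(0))
        if score > best_score:
            best_index, best_score, best_gain = index, score, gain
    return best_index, best_gain
```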

Codebook 250 can be created in either of two ways. It may be trained in advance on the excitation residual signals that remain after a predetermined number of large-amplitude pitch-prediction multi-pulses have been obtained. Alternatively, it may be created from a plurality of noise signals, for example signals with Gaussian statistics, whose phase characteristics are varied. For the latter method, see M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proc. ICASSP, vol. 1, paper no. 25.1.1, March 1985.

The transmitting side sends the following information:
- the amplitudes and positions of the pitch-predicted large-amplitude multi-pulses;
- the codebook index and gain of the small-amplitude excitation signal;
- the pitch parameter;
- the spectral parameters.

(Embodiment) FIGS. 1(a) and 1(b) show an embodiment of the present invention. A discrete speech signal x(n) is input from input terminal 500. Time division circuit 510 divides the input speech signal into uniform frames, for example every 20 ms. Pitch parameter calculation circuit 515 calculates a pitch parameter representing the fine structure of the pitch, using a method such as that shown in Reference 2. Quantizer 516 quantizes the pitch parameter, and inverse quantizer 518 dequantizes the quantized result and outputs it. Spectral parameter calculation circuit 520 obtains spectral parameters representing the spectrum of the speech signal in each frame by well-known LPC analysis.
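A minimal Python sketch of the framing and LPC analysis follows. It uses the autocorrelation method and solves the normal equations with the Levinson-Durbin recursion, one common form of the well-known LPC analysis. Several elements are assumptions: the 8 kHz sampling rate, the order of 10, dropping a trailing partial frame, and the sign convention x(n) ≈ Σ a_k x(n−k). Windowing and conversion to the parameter set that is actually quantized are omitted.

```python
from typing import List, Sequence, Tuple


def split_frames(signal: Sequence[float], sample_rate: int = 8000, frame_ms: int = 20) -> List[List[float]]:
    """Uniform frames (160 samples at 8 kHz); a trailing partial frame is dropped."""
    size = sample_rate * frame_ms // 1000
    return [list(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame))) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Return predictor coefficients a_1..a_p and the final prediction-error energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                                   # silent or degenerate frame
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_analysis(frame: Sequence[float], order: int = 10) -> Tuple[List[float], float]:
    return levinson_durbin(autocorrelation(frame, order), order)
```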

Spectral parameter quantizer 525 quantizes the spectral parameters. It may use scalar quantization as shown in Japanese Patent Application No. 59-272435 (Reference 5), or the vector quantization shown in Reference 4. Inverse quantizer 530 dequantizes the quantized result and outputs it. The remaining circuits operate as follows:
- Weighting circuit 540 weights the frame-segmented speech signal using the dequantized spectral parameters. For the weighting method, see weighting circuit 200 of Reference 5.
- Impulse response calculation circuit 550 calculates the impulse response using the dequantized pitch parameter and the dequantized spectral parameters. For the specific method, see Reference 2.
- Autocorrelation function circuit 560 calculates the autocorrelation function of the impulse response and outputs it to excitation pulse calculation circuit 580. For the calculation method, see autocorrelation function calculation circuit 180 of Reference 2.
- Cross-correlation function calculation circuit 570 calculates the cross-correlation function between the weighted signal and the impulse response and outputs it to excitation pulse calculation circuit 580. For the specific method, see Reference 2.
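The Python below sketches how the outputs of circuits 560 and 570 can drive a sequential pulse search in the spirit of the maximum cross-correlation search of Reference 1. Each pulse is placed at the peak of |φ(m)|, its amplitude is taken as φ(m)/R_hh(0), and its contribution is subtracted from φ before the next pulse is chosen. The sketch makes these simplifications:
- Pitch prediction is omitted; the source instead includes the pitch parameter in the impulse response.
- Earlier amplitudes are not re-optimized.
- R(n, m) is approximated by R_hh(|n−m|).
- The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def impulse_autocorrelation(h_w: Sequence[float]) -> List[float]:
    """R_hh(k) = sum_n h_W(n) h_W(n + k), the role of autocorrelation circuit 560."""
    return [sum(h_w[n] * h_w[n + k] for n in range(len(h_w) - k)) for k in range(len(h_w))]


def cross_correlation(x_w: Sequence[float], h_w: Sequence[float]) -> List[float]:
    """phi(m) = sum_n x_W(n) h_W(n - m), the role of cross-correlation circuit 570."""
    return [sum(x_w[n] * h_w[n - m] for n in range(m, len(x_w)) if n - m < len(h_w))
            for m in range(len(x_w))]


def multipulse_search(phi: Sequence[float], r_hh: Sequence[float],
                      num_pulses: int) -> List[Tuple[int, float]]:
    """Sequentially place pulses at the peak of |phi|, removing each pulse's contribution."""
    phi = list(phi)
    pulses = []
    for _ in range(num_pulses):
        m = max(range(len(phi)), key=lambda i: abs(phi[i]))
        amp = phi[m] / r_hh[0]
        pulses.append((m, amp))
        for n in range(len(phi)):
            lag = abs(n - m)
            if lag < len(r_hh):
                phi[n] -= amp * r_hh[lag]  # stationary approximation R(n, m) ~ R_hh(|n - m|)
    return pulses
```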

Discrimination circuit 575 determines whether the current frame is, for example, a vowel interval or a consonant interval. It outputs a discrimination code indicating the result to excitation pulse calculation circuit 580. As noted in the Operation section, well-known parameters such as the spectral change, the power and the power change can be used for this discrimination.

When the output of discrimination circuit 575 indicates a vowel, excitation pulse calculation circuit 580 obtains a predetermined number (L1) of multi-pulses by pitch prediction. For the multi-pulse calculation method, see excitation pulse calculation circuit 210 of Reference 2.

When the output of discrimination circuit 575 indicates a vowel, no small-amplitude excitation signal is calculated, and the excitation calculation ends at this point. In this case, therefore, the following circuits do not operate: quantizer 585, pulse generator 600, pitch reproduction filter 605, spectral envelope filter 610, subtractor 615, and small-amplitude excitation calculation circuit 620.

Quantizer 585 quantizes the excitation multi-pulse train and outputs codes. Inverse quantizer 590 dequantizes these codes, and pulse generator 600 regenerates the multi-pulses. Pitch reproduction filter 605 takes the regenerated multi-pulses and the pitch parameter dequantized by inverse quantizer 518, and outputs an excitation signal with the pitch reproduced. This excitation signal is passed through spectral envelope filter 610, which uses the spectral parameters output from inverse quantizer 530. The result is the synthesized speech signal x̂(n) produced by the excitation pulses.

When the output of discrimination circuit 575 indicates a consonant, the configuration described above obtains a pitch-predicted multi-pulse train of L2 pulses (L2 < L1) and the corresponding synthesized signal x̂(n).

Subtractor 615 then subtracts the synthesized speech signal x̂(n) from the speech signal x(n). It outputs the resulting residual signal e(n) to small-amplitude excitation calculation circuit 620.

Small-amplitude excitation calculation circuit 620 operates as described in the Operation section. For each subinterval shorter than a frame (for example 5 ms), it represents the small-amplitude excitation signal by the best entry among a plurality of codebook entries.

The following codes are input to multiplexer 630:
- the code indicating whether the current frame is a vowel interval or a consonant interval;
- the codebook index and gain representing the small-amplitude excitation;
- the quantized multi-pulse code output from quantizer 585;
- the quantized pitch parameter code output from quantizer 516;
- the quantized spectral parameter code output from quantizer 525.

When the current frame is a vowel interval, the codebook index and gain are not input. Multiplexer 630 combines the codes and outputs them.
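A possible per-frame payload is sketched below in Python. The codebook index and gain are carried only for consonant frames, and a flat integer word sequence stands in for the multiplexed bit stream. The field order, the count prefixes and the names are hypothetical. The sketch does not reproduce the patent's bit allocation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FramePayload:
    """Quantized codes for one frame (hypothetical layout)."""
    is_consonant: bool                            # discrimination code
    pitch_code: int                               # from quantizer 516
    spectrum_codes: List[int]                     # from quantizer 525
    pulse_codes: List[Tuple[int, int]]            # (position, amplitude) codes from quantizer 585
    codebook_indices: Optional[List[int]] = None  # consonant frames only
    codebook_gains: Optional[List[int]] = None    # consonant frames only


def multiplex(p: FramePayload) -> List[int]:
    """Flatten the codes into one word sequence."""
    words = [1 if p.is_consonant else 0, p.pitch_code, len(p.spectrum_codes), *p.spectrum_codes]
    words.append(len(p.pulse_codes))
    for position, amplitude in p.pulse_codes:
        words += [position, amplitude]
    if p.is_consonant:                            # index and gain are sent only for consonants
        words.append(len(p.codebook_indices))
        for index, gain in zip(p.codebook_indices, p.codebook_gains):
            words += [index, gain]
    return words


def demultiplex(words: List[int]) -> FramePayload:
    """Inverse of multiplex()."""
    it = iter(words)
    is_consonant = next(it) == 1
    pitch_code = next(it)
    spectrum_codes = [next(it) for _ in range(next(it))]
    pulse_codes = [(next(it), next(it)) for _ in range(next(it))]
    indices = gains = None
    if is_consonant:
        pairs = [(next(it), next(it)) for _ in range(next(it))]
        indices = [i for i, _ in pairs]
        gains = [g for _, g in pairs]
    return FramePayload(is_consonant, pitch_code, spectrum_codes, pulse_codes, indices, gains)
```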

On the receiving side, the demultiplexer 710 separates and outputs the code of the multi-pulse train, the code of the pitch parameter, the code of the spectral parameters, the discrimination code indicating whether the current frame is a vowel interval or a consonant interval, and, when the current frame is a consonant interval, the index and gain codes representing the small-amplitude excitation signal. The excitation pulse decoder 720 decodes the amplitudes and positions of the multi-pulses. The spectral parameter decoder 750 performs the same function as the inverse quantizer 530 on the transmitting side. The small-amplitude excitation decoder 730 holds the same codebooks as the small-amplitude excitation calculation circuit 620 on the transmitting side; when it receives a code indicating that the current frame is a consonant interval, it uses the received index to select and output the code vector representing the small-amplitude excitation signal. On receiving the same consonant-interval code, the gain circuit 735 uses the received gain code to set the amplitude of the small-amplitude excitation signal. The pitch parameter decoder 745 performs the same function as the inverse quantizer 518 on the transmitting side. The pulse generator 725 generates an excitation signal from the multi-pulse train. The pitch reproduction filter 755 takes this excitation signal and the decoded pitch parameter as inputs and outputs a synthesized excitation signal in which the pitch has been restored. The adder 740 forms the driving excitation signal by adding the pitch-restored excitation signal and, when a consonant-interval code has been received, the output of the gain circuit 735, and uses it to drive the spectral envelope filter circuit 760. The spectral envelope filter circuit 760 computes and outputs the synthesized speech waveform from the driving excitation signal and the decoded spectral parameters.
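The receiver-side adder 740 and spectral envelope filter 760 can be sketched in Python as below. The sketch takes the pitch-restored excitation as an input and assumes the decoded spectral parameters have already been converted to predictor coefficients a_1 ... a_p of an all-pole filter 1/(1 - sum_i a_i z^-i). That conversion, the sign convention, and all names are assumptions for illustration.

```python
from typing import List, Sequence


def decode_frame(pitch_excitation: Sequence[float], codevector: Sequence[float],
                 gain: float, is_consonant: bool, lpc: Sequence[float],
                 state: List[float]) -> List[float]:
    """Adder 740 followed by an assumed all-pole spectral envelope filter 760.

    lpc   : predictor coefficients a_1..a_p (assumed sign convention)
    state : last p output samples, oldest first; updated in place so the
            filter memory carries over to the next frame
    """
    if len(state) != len(lpc):
        raise ValueError("state must hold exactly p past samples")
    if is_consonant:  # add the gain-scaled small-amplitude excitation
        drive = [u + gain * c for u, c in zip(pitch_excitation, codevector)]
    else:             # vowel frame: pitch-restored excitation only
        drive = list(pitch_excitation)
    out = []
    for u in drive:
        # y(n) = u(n) + sum_i a_i * y(n - i); state[-1] is y(n - 1)
        y = u + sum(a * state[-1 - i] for i, a in enumerate(lpc))
        out.append(y)
        state.append(y)
        state.pop(0)
    return out
```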

The configuration described above is only one embodiment of the present invention, and various modifications are possible.

To reduce the computation needed for the small-amplitude excitation signal even further, the codebook can be selected using the cross-correlation function Φ_xh′ obtained after the large-amplitude multi-pulses have been determined by pitch prediction, as explained with Equations (5) to (7) in the operation section. As also noted there, the signal _W(n) then does not have to be regenerated during codebook selection, so the amount of computation is greatly reduced compared with the configuration shown in Fig. 1.
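Equations (5) to (7) are not reproduced in this section. As an assumption about the kind of saving they exploit, the Python sketch below shows the standard backward-filtering identity: the correlation between a target and a filtered code vector Hc equals the correlation between c and the sequence H^T times the target, which can be computed once per subinterval. The numerator of the selection criterion then needs no per-vector filtering. All names are hypothetical.

```python
from typing import List, Sequence


def forward_filter(c: Sequence[float], h: Sequence[float]) -> List[float]:
    """y = H c: zero-state convolution truncated to len(c) samples."""
    return [sum(c[k] * h[n - k] for k in range(n + 1) if n - k < len(h))
            for n in range(len(c))]


def backward_correlation(target: Sequence[float], h: Sequence[float]) -> List[float]:
    """phi(n) = sum_{k>=n} target(k) * h(k - n), i.e. H^T target (computed once)."""
    N = len(target)
    return [sum(target[k] * h[k - n] for k in range(n, N) if k - n < len(h))
            for n in range(N)]


def correlation_direct(target: Sequence[float], c: Sequence[float],
                       h: Sequence[float]) -> float:
    """<target, H c>: filters every code vector (expensive)."""
    return sum(t * y for t, y in zip(target, forward_filter(c, h)))


def correlation_fast(phi: Sequence[float], c: Sequence[float]) -> float:
    """<H^T target, c>: one dot product per code vector, no filtering."""
    return sum(p * x for p, x in zip(phi, c))
```

The energy term of each filtered code vector still depends on c; this sketch only shows how the correlation term avoids regenerating a filtered signal for every candidate.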

For consonant portions, separate codebooks may be prepared in advance according to the character of the consonant (for example, plosive or fricative) and switched between as appropriate.

Besides the method described in Reference 1, various well-known methods can be used to compute the multi-pulses.
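For reference, here is a simplified Python sketch of a sequential pulse search in the spirit of the maximum cross-correlation method of Reference 1. It places one pulse at a time at the position of largest |φ(n)|, sets the amplitude to φ(m)/R(0), and subtracts that pulse's contribution from φ using the impulse-response autocorrelation R. Using the untruncated autocorrelation is an approximation near the end of the frame. This is an illustrative sketch, not the exact procedure of Reference 1.

```python
from typing import List, Sequence, Tuple


def multipulse_search(target: Sequence[float], h: Sequence[float],
                      num_pulses: int) -> Tuple[List[int], List[float]]:
    """Sequential pulse placement using cross-correlation updates (simplified)."""
    N = len(target)
    # phi(n): cross-correlation between the weighted target and h
    phi = [sum(target[k] * h[k - n] for k in range(n, N) if k - n < len(h))
           for n in range(N)]
    # R(j): autocorrelation of h (not truncated at the frame edge)
    R = [sum(h[k] * h[k + j] for k in range(len(h) - j)) for j in range(N)]
    if R[0] <= 0.0:
        raise ValueError("impulse response has zero energy")
    positions: List[int] = []
    amplitudes: List[float] = []
    for _ in range(num_pulses):
        m = max(range(N), key=lambda n: abs(phi[n]))
        g = phi[m] / R[0]
        positions.append(m)
        amplitudes.append(g)
        # Remove this pulse's contribution from the correlation
        for n in range(N):
            phi[n] -= g * R[abs(n - m)]
    return positions, amplitudes
```

In this framework, vowel frames would call the search with L1 pulses and consonant frames with the smaller count L2.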

Other well-known parameters (line spectrum pairs, cepstrum, mel-cepstrum, log area ratios, etc.) may also be used as the spectral parameters. In addition, the spectral parameters may be quantized by vector quantization or other methods instead of scalar quantization; see Reference 3 for vector quantization.
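As one example of converting between spectral parameter sets, the cepstrum of an all-pole model can be computed recursively from the predictor coefficients. The Python sketch assumes H(z) = 1/(1 - sum_{k=1}^{p} a_k z^-k) and omits the gain term c_0. It is a well-known conversion shown for illustration, not a procedure specified in the patent.

```python
from typing import List, Sequence


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum c_1..c_n_ceps of 1 / (1 - sum_k a_k z^-k) (gain term omitted).

    c_n = a_n + sum_{k=max(1, n-p)}^{n-1} (k / n) * c_k * a_{n-k},
    with a_n taken as 0 for n > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)          # c[0] unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]   # a[n-k-1] is a_{n-k}
        c[n] = acc
    return c[1:]
```

For a single coefficient a_1 = a, the recursion reproduces the series log(1/(1 - a z^-1)) = sum_n (a^n / n) z^-n, which is a convenient check.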

The frame length has been assumed fixed, but it may also be variable.

(Effects of the Invention) Compared with the conventional example, the present invention classifies the excitation signal into predetermined types (for example, vowel and consonant portions, or voiced and unvoiced portions). According to this classification, in vowel or voiced portions the excitation is represented by a relatively small number of pitch-predicted multi-pulses, whereas in consonant or unvoiced portions it is represented not only by a pitch-predicted multi-pulse train but also by a small-amplitude excitation signal, drawn from a codebook, that further improves speech quality. The excitation can thus be represented with a very small amount of transmitted information. Consequently, even at the same bit rate as the conventional method, a better reproduced speech signal is obtained not only in vowel portions but also in consonant intervals. This advantage becomes even more pronounced as the bit rate is lowered.

[Brief description of the drawings]

Fig. 1 is a block diagram showing the configuration of an embodiment of the speech encoding/decoding method and apparatus according to the present invention, Fig. 2 is a block diagram illustrating the operation of the present invention, and Fig. 3 is a block diagram showing a conventional example of a multi-pulse train search method. Reference numerals in the figures:

510, 240: time-division circuit
515: pitch parameter calculation circuit
520: spectral parameter calculation circuit
516, 525, 585: quantizer
518, 530, 590: inverse quantizer
540, 350, 280, 850: weighting circuit
550: impulse response calculation circuit
560: autocorrelation function calculation circuit
570: cross-correlation function calculation circuit
575: discrimination circuit
580: excitation pulse calculation circuit
600, 725: pulse generator
755, 315, 605: pitch reproduction filter
610, 760, 320, 260: spectral envelope filter
820: synthesis filter circuit
620, 230: small-amplitude excitation calculation circuit
630: multiplexer
710: demultiplexer
720: excitation pulse decoder
730: small-amplitude excitation decoder
740: adder
745: pitch parameter decoder
750: spectral parameter decoder
310: multi-pulse excitation
810: excitation calculation circuit
615, 340, 220, 270, 840: subtractor
735, 255: gain circuit
380: switch

Claims

(57) [Claims]

1. A speech encoding/decoding device comprising a speech encoding device and a speech decoding device, wherein the speech encoding device comprises pitch parameter calculating means (515), spectral parameter calculating means (520), discriminating means (575), excitation calculating means, and a multiplexer (630); the pitch parameter calculating means (515) obtains from an input discrete speech signal, and encodes, a pitch parameter representing the pitch; the spectral parameter calculating means (520) obtains from the input discrete speech signal, and encodes, spectral parameters representing the short-time spectrum; the discriminating means (575) classifies the input discrete speech signal into predetermined types; the excitation calculating means performs a first process when the discriminating means (575) indicates a predetermined classification and performs the first process and a second process when the discriminating means (575) indicates any other classification, the first process reproducing a first synthesized signal on the basis of the pitch parameter, the spectral parameters, and a multi-pulse train and encoding, as multi-pulse information, the multi-pulse train for which the first synthesized signal best matches the input discrete speech signal, and the second process obtaining a difference signal between the first synthesized signal so determined and the input discrete signal, reproducing a second synthesized signal on the basis of the pitch parameter, the spectral parameters, a codebook, and a gain, and determining and encoding the code vector of the codebook and the gain for which the second synthesized signal best matches the difference signal; the multiplexer (630) combines and outputs the outputs of the pitch parameter calculating means (515), the spectral parameter calculating means (520), the discriminating means (575), and the excitation calculating means; the speech decoding device comprises a demultiplexer (710), small-amplitude excitation signal restoring means (730), pulse generating means (720, 725), pitch reproducing means (755), adding means (740), and a spectral envelope synthesis filter (760); the demultiplexer (710) separates a gain, a discrimination code, an index, multi-pulse train information, spectral parameters, and a pitch parameter from the input signal and outputs them; when the discrimination code is a predetermined one, the small-amplitude excitation signal restoring means (730) applies the gain to the code vector indicated by the index and outputs the result; the pulse generating means (720, 725) generates a multi-pulse train on the basis of the multi-pulse train information; the pitch reproducing means (755) outputs a synthesized excitation signal whose pitch has been reproduced on the basis of the multi-pulse train and the pitch parameter; the adding means (740) adds the outputs of the small-amplitude excitation signal restoring means (730) and the pitch reproducing means (755) and outputs an excitation signal; and the spectral envelope synthesis filter (760) synthesizes a speech signal on the basis of the excitation signal output by the adding means and the spectral parameters.

2. A speech encoding device comprising pitch parameter calculating means (515), spectral parameter calculating means (520), discriminating means (575), excitation calculating means, and a multiplexer (630), wherein the pitch parameter calculating means (515) obtains from an input discrete speech signal, and encodes, a pitch parameter representing the pitch; the spectral parameter calculating means (520) obtains from the input discrete speech signal, and encodes, spectral parameters representing the short-time spectrum; the discriminating means (575) classifies the input discrete speech signal into predetermined types; the excitation calculating means performs a first process when the discriminating means (575) indicates a predetermined classification and performs the first process and a second process when the discriminating means (575) indicates any other classification, the first process reproducing a first synthesized signal on the basis of the pitch parameter, the spectral parameters, and a multi-pulse train and encoding, as multi-pulse information, the multi-pulse train for which the first synthesized signal best matches the input discrete speech signal, and the second process obtaining a difference signal between the first synthesized signal so determined and the input discrete signal, reproducing a second synthesized signal on the basis of the pitch parameter, the spectral parameters, a codebook, and a gain, and determining and encoding the code vector of the codebook and the gain for which the second synthesized signal best matches the difference signal; and the multiplexer (630) combines and outputs the outputs of the pitch parameter calculating means (515), the spectral parameter calculating means (520), the discriminating means (575), and the excitation calculating means.

3. A speech decoding device comprising a demultiplexer (710), small-amplitude excitation signal restoring means (730), pulse generating means (720, 725), pitch reproducing means (755), adding means (740), and a spectral envelope synthesis filter (760), wherein the demultiplexer (710) separates a gain, a discrimination code, an index, multi-pulse train information, spectral parameters, and a pitch parameter from an input signal and outputs them; when the discrimination code is a predetermined one, the small-amplitude excitation signal restoring means (730) applies the gain to the code vector indicated by the index and outputs the result; the pulse generating means (720, 725) generates a multi-pulse train on the basis of the multi-pulse train information; the pitch reproducing means (755) outputs a synthesized excitation signal whose pitch has been reproduced on the basis of the multi-pulse train and the pitch parameter; the adding means (740) adds the outputs of the small-amplitude excitation signal restoring means (730) and the pitch reproducing means (755) and outputs an excitation signal; and the spectral envelope synthesis filter (760) synthesizes a speech signal on the basis of the excitation signal output by the adding means and the spectral parameters.