JPH0425560B2


Info

Publication number: JPH0425560B2
Application number: JP57231603A
Authority: JP
Inventors: Kazunori Ozawa; Taku Arazeki
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1982-12-24
Filing date: 1982-12-24
Publication date: 1992-05-01
Also published as: JPS59116793A

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to a low-bit-rate waveform coding system for speech signals, and in particular to a coding apparatus that reduces the transmitted information rate to 10 kbit/s or less.

A well-known, effective way to encode a speech signal at about 10 kbit/s or less is to search, for every short time interval, for the excitation signal sequence that drives the synthesis of the speech. The search is conditioned on minimizing the error between the input signal and the signal reproduced from that excitation. Depending on the search method, these techniques are called tree coding and vector quantization. Besides these, a scheme has recently been proposed in which the encoder determines, sequentially and for every short time interval, a number of pulses that represent the excitation signal sequence, using the analysis-by-synthesis (A-b-S) technique. The present invention relates to this scheme. It is described in detail in the paper by B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proceedings of ICASSP, 1982, pp. 614-617 (Reference 1), so it is explained only briefly here.

FIG. 1 is a block diagram of the encoder-side processing in the conventional scheme described in Reference 1. In the figure, 100 denotes the input terminal, to which an A/D-converted speech signal sequence x(n) is applied. 110 is a buffer memory circuit that stores the speech signal sequence for one frame (for example, 10 ms, or 80 samples at 8 kHz sampling). The output of 110 is supplied to a subtractor 120 and to a K-parameter calculation circuit 180. Reference 1 uses the term "reflection coefficients" instead of K parameters, but these are the same parameters.

The K-parameter calculation circuit 180 uses the output of 110 to obtain, by the covariance method, 16 K parameters K_i (1 ≤ i ≤ 16) representing the speech spectrum of each frame, and outputs them to a synthesis filter 130. 140 is an excitation pulse generation circuit, which generates a predetermined number of pulses in each frame; this pulse sequence is denoted d(n). FIG. 2 shows an example of the excitation pulse sequence generated by 140. The horizontal axis is discrete time, the vertical axis is amplitude, and the case of eight pulses per frame is shown.

The pulse sequence d(n) generated by 140 drives the synthesis filter 130. The synthesis filter 130 takes d(n) as input, obtains a reproduced signal x̃(n) corresponding to the speech signal x(n), and outputs it to the subtractor 120. To do this, the synthesis filter 130 receives the K parameters K_i, converts them to prediction parameters a_i (1 ≤ i ≤ 16), and uses the a_i to compute x̃(n). Using d(n) and a_i, x̃(n) can be expressed as:

$$\tilde{x}(n) = d(n) + \sum_{i=1}^{p} a_i\,\tilde{x}(n-i) \qquad (1)$$

In equation (1), p is the order of the synthesis filter; here p = 16. The subtractor 120 computes the difference e(n) between the original signal x(n) and the reproduced signal x̃(n) and outputs it to a weighting circuit 190. The weighting circuit 190 takes e(n) and, using a weighting function ω(n), computes the weighted error e_ω(n) according to:

$$e_\omega(n) = \omega(n) * e(n) \qquad (2)$$

In equation (2), the symbol "*" denotes convolution. The weighting function ω(n) shapes the error in the frequency domain. Denoting its Z-transform by W(z), it is expressed in terms of the prediction parameters a_i of the synthesis filter as:

$$W(z) = \frac{1 - \sum_{i=1}^{p} a_i z^{-i}}{1 - \sum_{i=1}^{p} a_i r^{i} z^{-i}} \qquad (3)$$

In equation (3), r is a constant with 0 ≤ r ≤ 1 that determines the frequency characteristic of W(z). With r = 1, W(z) = 1 and the characteristic is flat. With r = 0, W(z) becomes the inverse of the synthesis filter's frequency characteristic. The characteristic of W(z) can therefore be adjusted through the value of r.

W(z) is made to depend on the frequency characteristic of the synthesis filter, as in equation (3), in order to exploit auditory masking. Where the spectral power of the input speech signal is large (for example, near a formant), an error relative to the spectrum of the reproduced signal is hard to hear even if it is somewhat large.

FIG. 3 shows an example of the spectrum of the input speech signal in one frame together with the frequency characteristic of W(z), for r = 0.8. The horizontal axis is frequency (up to 4 kHz) and the vertical axis is logarithmic amplitude (up to 60 dB). The upper curve is the speech spectrum and the lower curve is the frequency characteristic of the weighting function.
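Equations (1) to (3) can be sketched in Python as shown below. This is a minimal sketch that assumes frame-wise processing with zero filter state at the start of each frame; the source does not say how filter memory is carried across frames. The function names `synthesize` and `weight` are illustrative, not taken from the source.

```python
def synthesize(d, a):
    """Synthesis filter of eq. (1): x~(n) = d(n) + sum_{i=1}^{p} a_i * x~(n - i).

    a[i - 1] holds a_i; samples before the start of d are taken as zero.
    """
    x = []
    for n, dn in enumerate(d):
        acc = dn
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * x[n - i]
        x.append(acc)
    return x


def weight(e, a, r):
    """Apply W(z) of eq. (3) to e(n), i.e. e_w(n) = w(n) * e(n) of eq. (2).

    W(z) = (1 - sum a_i z^-i) / (1 - sum a_i r^i z^-i), with zero initial state.
    """
    y = []
    for n, en in enumerate(e):
        acc = en
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * e[n - i]             # numerator: 1 - sum a_i z^-i
                acc += ai * (r ** i) * y[n - i]  # denominator: 1 - sum a_i r^i z^-i
        y.append(acc)
    return y
```

With r = 0.8, as in FIG. 3, `weight(e, a, 0.8)` yields e_ω(n). Two limiting cases check the sketch against the text. With r = 1 the output equals the input, since W(z) = 1. With r = 0, applying `weight` to `synthesize(d, a)` returns d, because W(z) is then the inverse of the synthesis filter.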

Returning to FIG. 1, the weighted error e_ω(n) is fed back to an error minimization circuit 150. The error minimization circuit 150 stores one frame of e_ω(n) values and uses them to compute the weighted squared error:

$$\varepsilon = \sum_{n=1}^{N} e_\omega(n)^2 \qquad (4)$$

Here N is the number of samples over which the squared error is computed. In the scheme of Reference 1 this interval is 5 ms, which corresponds to N = 40 at 8 kHz sampling.

The error minimization circuit 150 then supplies pulse position and amplitude information to the excitation pulse generation circuit 140 so as to reduce the squared error ε of equation (4). Circuit 140 generates an excitation pulse sequence based on this information, and the synthesis filter 130 uses it as the driving source to compute a reproduced signal x̃(n). The subtractor 120 then subtracts this newly obtained x̃(n) from the previously computed error e(n) between the original and reproduced signals, and takes the result as the new error e(n). The weighting circuit 190 computes the weighted error e_ω(n) from e(n) and feeds it back to the error minimization circuit 150. Circuit 150 again computes ε and adjusts the amplitudes and positions of the excitation pulses to reduce it. This sequence of processing, from generating excitation pulses to adjusting them by error minimization, is repeated until the number of pulses reaches a predetermined value.
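The hypothetical Python sketch below makes the cost of the conventional loop concrete with a simplified analysis-by-synthesis search. For each pulse it:

- tries every position in the frame;
- re-runs the synthesis and weighting filters for that candidate;
- picks the least-squares amplitude;
- keeps the candidate with the smallest ε of equation (4).

This exhaustive greedy search is an assumption for illustration and may differ from the exact optimization in Reference 1. The counter `filter_runs` shows that two filtering passes are spent on every candidate position.

```python
def synthesize(d, a):
    """Synthesis filter of eq. (1) with zero initial state."""
    x = []
    for n, dn in enumerate(d):
        x.append(dn + sum(ai * x[n - i] for i, ai in enumerate(a, start=1) if n >= i))
    return x


def weight(e, a, r):
    """Weighting filter W(z) of eq. (3) with zero initial state."""
    y = []
    for n, en in enumerate(e):
        y.append(en + sum(-ai * e[n - i] + ai * r ** i * y[n - i]
                          for i, ai in enumerate(a, start=1) if n >= i))
    return y


def abs_search(x, a, r, num_pulses):
    """Greedy analysis-by-synthesis search (simplified sketch of the conventional loop)."""
    n_len = len(x)
    e = list(x)                      # e(n) = x(n) - x~(n); no pulses yet, so x~(n) = 0
    pulses, filter_runs = [], 0
    for _ in range(num_pulses):
        ew = weight(e, a, r)         # current weighted error e_w(n)
        best = None
        for m in range(n_len):
            unit = [1.0 if n == m else 0.0 for n in range(n_len)]
            y = weight(synthesize(unit, a), a, r)   # weighted reproduction of a unit pulse at m
            filter_runs += 2
            g = sum(u * v for u, v in zip(ew, y)) / sum(v * v for v in y)  # least-squares amplitude
            eps = sum((u - g * v) ** 2 for u, v in zip(ew, y))             # eq. (4)
            if best is None or eps < best[0]:
                best = (eps, m, g)
        _, m, g = best
        pulses.append((m, g))
        pulse = [g if n == m else 0.0 for n in range(n_len)]
        e = [u - v for u, v in zip(e, synthesize(pulse, a))]  # new error e(n)
    return pulses, filter_runs
```

For a frame of N samples and K pulses, this sketch runs 2NK filter passes per frame. That is the kind of repeated synthesis and feedback the text identifies as the source of the large computation.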

This concludes the description of the conventional scheme.

In this scheme, the transmitted information consists of the K parameters K_i (1 ≤ i ≤ 16) of the synthesis filter and the positions and amplitudes of the excitation pulses. Any desired transmission rate can be obtained by choosing the number of pulses per frame. At transmission rates of 10 kbit/s or less the scheme also gives good reproduced speech quality, and it is considered one of the effective schemes.

However, this conventional scheme has the drawback of a very large amount of computation. To determine the position and amplitude of each pulse, the signal reproduced from that pulse is compared with the original signal, the error and squared error are computed, and these are fed back to adjust the pulse position and amplitude. In addition, this entire sequence of processing is repeated until the number of pulses reaches a predetermined value.

An object of the present invention is to provide a high-quality speech coding apparatus that is applicable to transmission rates of 10 kbit/s or less with a relatively small amount of computation.

A speech coding apparatus according to a second invention comprises: means for receiving a discrete speech signal sequence and dividing it into short time intervals to obtain short-time speech signal sequences; means for extracting from each short-time speech signal sequence, and encoding, parameters representing its spectral envelope; means for computing an impulse response sequence from the parameters representing the spectral envelope; means for computing a correlation sequence from the impulse response sequence; means for computing, from the short-time speech signal sequence, a target signal sequence to which a predetermined correction has been applied; means for computing a cross-correlation sequence from the impulse response sequence and the target signal sequence; means for computing and encoding the excitation signal sequence of the short-time speech signal sequence using the correlation sequence and the cross-correlation sequence; and means for combining and outputting the code of the parameters representing the spectral envelope and the code representing the excitation signal sequence.

A speech coding apparatus according to a first invention comprises: means for receiving a discrete speech signal sequence and dividing it into short time intervals to obtain short-time speech signal sequences; means for extracting from each short-time speech signal sequence, and encoding, parameters representing its spectral envelope; means for computing an impulse response sequence from the parameters representing the spectral envelope; means for computing a correlation sequence from the impulse response sequence; means for computing a cross-correlation sequence from the impulse response sequence and the short-time speech signal sequence; means for computing and encoding the excitation signal sequence of the short-time speech signal sequence using that correlation (autocorrelation) sequence and the cross-correlation sequence; and means for combining and outputting the code of the parameters representing the spectral envelope and the code representing the excitation signal sequence.

The speech coding scheme of the present invention is characterized by the algorithm used to compute the excitation pulse sequence, so this algorithm is explained in particular detail below.

First, the excitation pulse sequence d(n) at an arbitrary time n within one frame is expressed as:

$$d(n) = \sum_{k=1}^{K} g_k\,\delta_{n,m_k} \qquad (5)$$

Here δ_{n,m_k} is the Kronecker delta, equal to 1 when n = m_k and 0 when n ≠ m_k, and g_k is the amplitude of the pulse at position m_k. Denote the prediction parameters of the synthesis filter by a_i (1 ≤ i ≤ N_p, where N_p is the order of the synthesis filter). The reproduced signal x̃(n) obtained by feeding d(n) to the synthesis filter can then be written as:

$$\tilde{x}(n) = d(n) + \sum_{i=1}^{N_p} a_i\,\tilde{x}(n-i) \qquad (6)$$

Next, the weighted squared error J between the input speech signal x(n) and the reproduced signal x̃(n) within one frame can be written as:

$$J = \sum_{n=1}^{N} \Big[\big(x(n) - \tilde{x}(n)\big) * \omega(n)\Big]^2 \qquad (7)$$

Here ω(n) is the impulse response of the weighting circuit; it may, for example, have the same characteristic as in the conventional scheme. N is the number of samples in one frame. Equation (7) can be rewritten as:

$$J = \sum_{n=1}^{N} \Big[x(n) * \omega(n) - \tilde{x}(n) * \omega(n)\Big]^2 \qquad (8)$$

The term x̃(n) * ω(n) is transformed as follows.

Let

$$\tilde{x}_\omega(n) = \tilde{x}(n) * \omega(n). \qquad (9)$$

Taking the Z-transform of both sides of equation (9) gives

$$\tilde{X}_\omega(z) = \tilde{X}(z)\,W(z). \qquad (10)$$

X̃(z) can further be written as:

$$\tilde{X}(z) = H(z)\,D(z) \qquad (11)$$

Here D(z) is the Z-transform of the excitation pulse sequence of equation (5), and H(z) is the Z-transform of the impulse response of the synthesis filter. Substituting equation (11) into equation (10) gives

$$\tilde{X}_\omega(z) = D(z)\,H(z)\,W(z). \qquad (12)$$

Set H_ω(z) = H(z)W(z) and denote the inverse Z-transform of H_ω(z) by h_ω(n). Taking the inverse Z-transform of equation (12) then gives:

$$\tilde{x}_\omega(n) = d(n) * h_\omega(n) \qquad (13)$$

Here h_ω(n) is the impulse response of the cascade of the synthesis filter and the weighting circuit. Substituting equation (5) into equation (13) gives:

$$\tilde{x}_\omega(n) = \sum_{i=1}^{K} g_i\,h_\omega(n - m_i) \qquad (14)$$

Here K is the number of pulses placed in one frame. Write x_ω(n) = x(n) * ω(n). Substituting equations (14) and (9) into equation (8) gives

$$J = \sum_{n=1}^{N} \Big(x_\omega(n) - \sum_{i=1}^{K} g_i\,h_\omega(n - m_i)\Big)^2. \qquad (15)$$

Equation (7) can thus be expressed as equation (15). Next, formulas are derived for the amplitudes g_k and positions m_k of the excitation pulse sequence that minimize equation (15). Partially differentiating equation (15) with respect to g_k and setting the result to zero yields equation (16).

Here ψ_xh(·) is the cross-correlation sequence computed from x_ω(n) and h_ω(n), and ψ_hh(·) is the correlation sequence of the impulse response h_ω(n). The covariance function sequence and the autocorrelation function sequence are both known forms of the correlation sequence of an impulse response. The cross-correlation sequence and the covariance function sequence are expressed as follows. In speech signal processing, ψ_hh(·) is often called a covariance function sequence.

$$\psi_{xh}(-m_k) = \sum_{n=1}^{N} x_\omega(n)\,h_\omega(n - m_k) = \psi_{hx}(m_k), \qquad (1 \le m_k \le N) \qquad (17)$$

$$\psi_{hh}(m_i, m_k) = \sum_{n=1}^{N-(m_i - m_k)} h_\omega(n - m_i)\,h_\omega(n - m_k), \qquad (1 \le m_i, m_k \le N) \qquad (18)$$

According to equation (16), the amplitude g_k can be computed with the pulse position m_k as a parameter. The position m_k of each pulse is then chosen as the position at which |g_k| is largest. This can be proven by solving equation (16) for g_i; the proof is omitted here.
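The printed form of equation (16) is not reproduced in this text. The derivation below shows where it comes from, assuming the sums in (17) and (18) run over the same frame samples. Setting the derivative of (15) with respect to g_k to zero gives the normal equations on the first line. The second line is the commonly used sequential form, in which pulses 1 to k−1 are already fixed and later pulses are not yet placed. That form is an assumption consistent with the description of circuit 240, not a transcription of (16).

$$\frac{\partial J}{\partial g_k} = -2\sum_{n=1}^{N} h_\omega(n-m_k)\Big(x_\omega(n) - \sum_{i=1}^{K} g_i\,h_\omega(n-m_i)\Big) = 0 \;\Longrightarrow\; \sum_{i=1}^{K} g_i\,\psi_{hh}(m_i,m_k) = \psi_{xh}(-m_k), \quad k = 1,\dots,K$$

$$g_k = \frac{\psi_{xh}(-m_k) - \sum_{i=1}^{k-1} g_i\,\psi_{hh}(m_i,m_k)}{\psi_{hh}(m_k,m_k)}, \qquad J_k = J_{k-1} - g_k^{2}\,\psi_{hh}(m_k,m_k)$$

The second relation shows that adding pulse k with this amplitude lowers J by g_k²ψ_hh(m_k, m_k). Choosing the position with the largest |g_k| is therefore exactly the error-minimizing choice whenever ψ_hh(m, m) is the same for all m, as in the autocorrelation form.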

This concludes the derivation of the algorithm.
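The Python sketch below covers circuits 210, 220 and 235 under the following assumptions:

- From equation (1), H(z) = 1/(1 − Σ a_i z^{-i}). Combined with W(z) of equation (3), the numerator of W(z) cancels the synthesis filter, so the cascade reduces to H_ω(z) = 1/(1 − Σ a_i r^i z^{-i}). The sketch uses this all-pole form to obtain h_ω(n).
- Positions are 0-based and h_ω(n) is taken as zero for n < 0.
- The covariance is summed over the frame samples only. The upper limit printed in equation (18) differs from this.

The function names are hypothetical.

```python
def cascade_impulse_response(a, r, length):
    """h_w(n) of the cascade 1/A(z) * W(z) = 1/(1 - sum a_i r^i z^-i), n = 0..length-1."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0          # unit impulse input
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * (r ** i) * h[n - i]
        h.append(acc)
    return h


def cross_correlation(xw, h):
    """psi_xh(-m) = sum_n x_w(n) h_w(n - m) for every position m = 0..N-1 (cf. eq. (17)).

    h must hold at least len(xw) samples.
    """
    n_len = len(xw)
    return [sum(xw[n] * h[n - m] for n in range(m, n_len)) for m in range(n_len)]


def covariance(h, n_len):
    """psi_hh(m_i, m_k) = sum_n h_w(n - m_i) h_w(n - m_k) over 0 <= n < n_len (cf. eq. (18))."""
    return [[sum(h[n - mi] * h[n - mk] for n in range(max(mi, mk), n_len))
             for mk in range(n_len)]
            for mi in range(n_len)]
```

In the second invention, x_ω(n) would come from weighting the input frame (circuit 290). In the first invention, where the weighting circuit is absent, the input x(n) itself would be passed instead, and h_ω(n) would be the synthesis filter's impulse response alone (r = 1 in this sketch, since W(z) = 1 then).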

FIG. 4 is a block diagram of an example configuration of the first and second inventions using the excitation pulse calculation algorithm of equation (16). Components bearing the same numbers as in FIG. 1 have the same functions, and their description is omitted. Only the encoder side is shown. The decoder side can be realized with the same configuration as the decoder of the conventional scheme, so its description is also omitted. In FIG. 4, each component performs the following processing for every frame.

The K-parameter calculation circuit 280 receives the speech signal x(n) stored in the buffer memory circuit 110 and computes a predetermined number N_p of K parameters K_i (1 ≤ i ≤ N_p), which are output to a K-parameter encoding circuit 200. The K-parameter encoding circuit 200 encodes K_i, for example with a predetermined number of quantization bits, and outputs the code l_Ki to a multiplexer 260. The K-parameter encoding circuit 200 also decodes l_Ki and outputs the decoded values K_i′ (1 ≤ i ≤ N_p) to an excitation pulse calculation circuit 230. The excitation pulse calculation circuit 230 receives the input speech signal x(n) stored in the buffer memory circuit 110 and the quantized K parameters K_i′. It computes the amplitudes g_k and positions m_k of the excitation pulse sequence within one frame based on equations (16) to (18), and outputs them to an encoding circuit 250.

Next, the configuration of the excitation pulse calculation circuit 230 is described. FIG. 5 is a block diagram of one embodiment of the excitation pulse calculation circuit 230 in the second invention.

The quantized K parameters K_i′ enter at terminal 232 and are supplied to an impulse response calculation circuit 210 and a weighting circuit 290. The impulse response calculation circuit 210 uses K_i′ to compute h_ω(n) of equation (13), the impulse response of the cascade of the synthesis filter and the weighting circuit, for a predetermined number of samples. It outputs h_ω(n) to a covariance function calculation circuit 220 and a cross-correlation calculation circuit 235. The covariance function calculation circuit 220 takes the predetermined number of samples of h_ω(n), computes the covariance function ψ_hh(m_i, m_k) (1 ≤ i, k ≤ N) of h_ω(n) according to equation (18), and outputs it to a pulse sequence calculation circuit 240.

The weighting circuit 290 receives K_i′ from terminal 232 and computes the weighting function ω(n), for example according to equation (3) of the conventional scheme; other frequency-weighting methods may also be used. The weighting circuit 290 also receives x(n) from input terminal 231, convolves x(n) with ω(n), and outputs the resulting x_ω(n) to the cross-correlation calculation circuit 235. Circuit 235 takes x_ω(n) and h_ω(n), computes the cross-correlation ψ_xh(−m_k) (1 ≤ m_k ≤ N) according to equation (17), and outputs it to the pulse sequence calculation circuit 240.

The pulse sequence calculation circuit 240 receives ψ_xh(−m_k) from 235 and ψ_hh(m_i, m_k) (1 ≤ m_i, m_k ≤ N) from 220, and computes the pulse amplitudes g_k using the excitation pulse formula of equation (16). For the first pulse, k = 1 is set in equation (16) and the amplitude g_1 is obtained as a function of the position m_1. The m_1 that maximizes |g_1| is selected, and that m_1 and g_1 become the position and amplitude of the first pulse. The second pulse is obtained by setting k = 2 in equation (16); according to equation (16), the second pulse is found after subtracting the influence of the first pulse. The third and subsequent pulses are computed in the same way. The calculation continues until a predetermined number of pulses is reached, or until the error obtained by substituting the g_k and m_k found so far into equation (16) falls below a predetermined threshold. The amplitudes g_k and positions m_k of the pulse sequence are output from the pulse sequence calculation circuit 240 through output terminal 233. This concludes the description of the excitation pulse calculation circuit 230 of the second invention.

The excitation pulse calculation circuit 230 of the first invention is the circuit of FIG. 5 with the weighting circuit 290 removed. In this case the weighting function ω(n) in equation (7) becomes unity, and with ω(n) = 1, equation (7) becomes:

$$J = \sum_{n=1}^{N} \big(x(n) - \tilde{x}(n)\big)^2$$

In this case, the error between x(n) and x̃(n) is evaluated by the squared Euclidean distance, which is a more general criterion. In equation (7), which uses ω(n), ω(n) acts as a weighting function, so the error between x(n) and x̃(n) is evaluated by a weighted squared distance.
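The processing of the pulse sequence calculation circuit 240 can be sketched in Python as shown below. This hypothetical implementation makes the following assumptions:

- Because the printed equation (16) is not reproduced here, it uses the sequential amplitude formula g_k = (ψ_xh(−m) − Σ_{i<k} g_i ψ_hh(m_i, m)) / ψ_hh(m, m).
- It chooses each position by the largest |g_k|, as the text describes.
- It stops either at a fixed pulse count or when the tracked weighted error reaches a threshold. The error bookkeeping J ← J − g_k²ψ_hh(m_k, m_k) follows from equation (15) under the same assumption.

```python
def pulse_search(psi_xh, psi_hh, max_pulses, xw_energy=None, threshold=0.0):
    """Sequential pulse search from precomputed correlations (hypothetical sketch of circuit 240).

    psi_xh[m]    : psi_xh(-m) for positions m = 0..N-1
    psi_hh[i][k] : psi_hh(m_i, m_k)
    xw_energy    : optional sum of x_w(n)^2; if given, J is tracked and the search
                   stops as soon as J <= threshold
    """
    n_len = len(psi_xh)
    residual = list(psi_xh)      # psi_xh(-m) - sum_{i<k} g_i psi_hh(m_i, m)
    pulses = []
    error = xw_energy
    for _ in range(max_pulses):
        best_m, best_g = None, 0.0
        for m in range(n_len):
            if psi_hh[m][m] <= 0.0:
                continue                        # position cannot carry a pulse
            g = residual[m] / psi_hh[m][m]      # amplitude if the k-th pulse sits at m
            if best_m is None or abs(g) > abs(best_g):
                best_m, best_g = m, g           # keep the largest |g_k|
        if best_m is None:
            break
        pulses.append((best_m, best_g))
        for m in range(n_len):                  # remove this pulse's influence
            residual[m] -= best_g * psi_hh[best_m][m]
        if error is not None:
            error -= best_g * best_g * psi_hh[best_m][best_m]
            if error <= threshold:
                break
    return pulses, error
```

Once ψ_xh and ψ_hh are precomputed for the frame, the per-pulse update uses only multiplications and subtractions, with one division per candidate. No synthesis filtering is repeated inside the loop.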

Returning to FIG. 4, the encoding circuit 250 receives the pulse amplitudes g_k and positions m_k from output terminal 233 of the excitation pulse calculation circuit 230. It encodes them using a normalization coefficient described below, and outputs the codes representing g_k, m_k, and the normalization coefficient to the multiplexer 260.

Various encoding methods are possible. For the amplitudes g_k, well-known methods can be used. For example, the amplitude probability distribution can be assumed Gaussian and the optimum quantizer for a Gaussian distribution used. This is described in detail in J. Max, "Quantizing for Minimum Distortion," IRE Transactions on Information Theory, March 1960, pp. 7-12 (Reference 2), and is not explained here. Another possible method uses the maximum pulse amplitude within one frame as the normalization coefficient, normalizes each pulse amplitude by this value, and then quantizes and encodes it. With the former (Gaussian-quantizer) method, the r.m.s. (root mean square) value within one frame may be used as the normalization coefficient.

Various methods are also possible for encoding the pulse positions. For example, run-length codes, which are well known in facsimile coding, may be used; they represent the length of each run of the symbol "0" by a predetermined code sequence. The normalization coefficient can be encoded with well-known methods such as logarithmic compression coding.
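The Python sketch below shows one possible realization of the encoding circuit 250. It is hypothetical:

- It uses the maximum-amplitude normalization described above with a plain uniform quantizer, not the Max quantizer of Reference 2.
- It represents positions by the lengths of the runs of empty samples between pulses. The mapping of run lengths to actual code words is left out.
- It codes the normalization coefficient on a logarithmic scale.

The parameters `amp_bits` and `log_step` are illustrative choices.

```python
import math


def encode_pulses(pulses, amp_bits=3, log_step=0.5):
    """Hypothetical encoder: pulses is a non-empty list of (m_k, g_k) with some g_k != 0."""
    pulses = sorted(pulses)                         # order by position
    norm = max(abs(g) for _, g in pulses)           # normalization coefficient: max |g_k|
    levels = 2 ** (amp_bits - 1) - 1                # symmetric uniform quantizer, -levels..levels
    amp_codes = [round(g / norm * levels) for _, g in pulses]
    runs, prev = [], -1
    for m, _ in pulses:
        runs.append(m - prev - 1)                   # empty ("0") samples before this pulse
        prev = m
    norm_code = round(math.log2(norm) / log_step)   # logarithmic coding of the coefficient
    return norm_code, amp_codes, runs


def decode_pulses(norm_code, amp_codes, runs, amp_bits=3, log_step=0.5):
    """Inverse of encode_pulses, up to quantization error."""
    norm = 2.0 ** (norm_code * log_step)
    levels = 2 ** (amp_bits - 1) - 1
    pulses, pos = [], -1
    for run, q in zip(runs, amp_codes):
        pos += run + 1
        pulses.append((pos, q / levels * norm))
    return pulses
```

Positions are recovered exactly from the run lengths. Amplitudes and the normalization coefficient are recovered only up to their quantization error.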

The encoding of the pulse sequence is of course not limited to the methods described here; any of the best known methods may be used.

Returning again to FIG. 4, the multiplexer 260 receives the output code of the K-parameter encoding circuit 200 and the output code of the encoding circuit 250, combines them, and outputs the combination from the transmitter output terminal 270 to the transmission channel. This concludes the description of the speech coding apparatus according to the present invention.

According to the configuration of the present invention, the excitation pulse sequence is computed according to equation (16). The conventional scheme of Reference 1 drives the synthesis filter with the pulses, obtains the reproduced signal, computes its error and squared error relative to the original signal, and feeds these back to adjust the pulses. The present configuration has no such path and no need to repeat that sequence of processing. The amount of computation can therefore be greatly reduced while good reproduced speech quality is obtained.

Furthermore, the values of ψ_xh(−m_k) and ψ_hh(m_i, m_k) (1 ≤ m_i, m_k ≤ N) used in equation (16) can be computed in advance for each frame. The evaluation of equation (16) then reduces to very simple multiplications and subtractions, which reduces the computation further. Compared with other conventional schemes that search for an excitation pulse sequence, the method of the present invention also gives better quality for the same amount of transmitted information.

In the embodiment described above, the excitation pulse sequence within one frame is encoded by the encoding circuit 250 in FIG. 4 only after all the pulses have been determined. Alternatively, encoding may be folded into the pulse calculation: each time one pulse is calculated, it is encoded, and only then is the next pulse calculated. With this configuration, the pulse sequence obtained minimizes the error including the coding distortion, so the quality can be improved further.
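A sketch of this per-pulse encoding variant, under the same assumed form of equation (16) as above: each amplitude is quantized as soon as it is found, and the quantized value, not the exact one, is subtracted from the running correlation, so later pulses can compensate for the coding error. The uniform scalar quantizer with step 0.5 and all function names are hypothetical; the specification does not state how the pulses are quantized.

```python
def cross_corr(x, h):
    """phi_xh[m] = sum_n x[n] * h[n - m] over the frame (0-based lags)."""
    N = len(x)
    return [sum(x[n] * h[n - m] for n in range(m, N) if n - m < len(h))
            for m in range(N)]


def covariance(h, N):
    """phi_hh[i][k] = sum_{n=max(i,k)}^{N-1} h[n-i] * h[n-k], limited to the frame."""
    def hv(j):
        return h[j] if 0 <= j < len(h) else 0.0
    return [[sum(hv(n - i) * hv(n - k) for n in range(max(i, k), N))
             for k in range(N)] for i in range(N)]


def quantize(g, step=0.5):
    """Hypothetical uniform scalar quantizer for pulse amplitudes."""
    return step * round(g / step)


def multipulse_search_quantized(x, h, num_pulses, step=0.5):
    """Greedy search that encodes each pulse before searching for the next one."""
    N = len(x)
    phi_hh = covariance(h, N)
    c = cross_corr(x, h)
    positions, coded_gains = [], []
    for _ in range(num_pulses):
        m = max((j for j in range(N) if phi_hh[j][j] > 0.0),
                key=lambda j: c[j] * c[j] / phi_hh[j][j])
        g_hat = quantize(c[m] / phi_hh[m][m], step)  # encode this pulse now
        positions.append(m)
        coded_gains.append(g_hat)
        # subtract the quantized pulse so its coding error is seen by the next pulse
        for j in range(N):
            c[j] -= g_hat * phi_hh[m][j]
    return positions, coded_gains
```

The only change from an encode-afterwards search is which amplitude is fed back into the correlation update; the cost per pulse is unchanged.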

Furthermore, in the above embodiment the pulse sequence is calculated frame by frame, but the frame may instead be divided into several subframes and the pulse sequence calculated for each subframe. With this configuration, for a frame length of N, the amount of computation can be reduced to roughly 1/d of that of the configuration shown in FIG. 4, where d denotes the number of subframes into which the frame is divided. For example, with d = 2 the computation is reduced to about one half, and equivalent performance is still obtained.
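The 1/d figure can be checked with a rough operation count. The model below is an assumption for illustration, not taken from the specification: the greedy search costs one multiply-subtract per candidate position per pulse, the ψhh table has one entry per pair of positions, and the P pulses of a frame are split evenly over the d subframes. The frame length N = 160 and pulse count P = 8 in the usage note are hypothetical.

```python
def greedy_search_ops(n_positions, n_pulses):
    """Multiply-subtract updates: every candidate position is updated once per pulse."""
    return n_positions * n_pulses


def covariance_table_entries(n_positions):
    """Entries of the psi_hh table precomputed for one search block."""
    return n_positions * n_positions


def block_cost(N, P, d):
    """Rough cost when a frame of N samples carrying P pulses is split into d subframes."""
    if N % d or P % d:
        raise ValueError("this simple model assumes N and P divisible by d")
    sub_N, sub_P = N // d, P // d
    return d * (greedy_search_ops(sub_N, sub_P) + covariance_table_entries(sub_N))
```

With N = 160 and P = 8, `block_cost(160, 8, 1)` is 26880 and `block_cost(160, 8, 2)` is 13440, i.e. one half, because both terms shrink by 1/d² per subframe while there are d subframes. The model ignores the inner sum needed for each table entry, which would only make the saving larger.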

Furthermore, in the configuration example described above the frame length is fixed, but it may be made variable; making it variable improves performance. Also, although K parameters are used as the parameters representing the spectral envelope of the short-time speech signal sequence, other well-known parameters (for example, LSP parameters) may be used instead. Note that, as is well known in the field of digital signal processing, the autocorrelation function may be calculated from the power spectrum, and the cross-correlation function may likewise be calculated from the cross-power spectrum.
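The spectral route mentioned above is the correlation theorem: the inverse DFT of |X(k)|² is the autocorrelation of x, and the inverse DFT of X(k)·conj(H(k)) is the cross-correlation Σ x[n] h[n − m]. The sketch below zero-pads to twice the frame length so the circular result equals the linear correlation for lags 0 to N − 1. A naive O(L²) DFT keeps it self-contained; in practice an FFT routine such as `numpy.fft` would replace the hypothetical `dft` helper.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT (forward uses e^{-j...}); the inverse includes the 1/n scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def autocorr_via_power_spectrum(x):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..N-1, obtained from |X|^2."""
    N = len(x)
    L = 2 * N  # zero-padding avoids circular wrap-around for lags < N
    X = dft(list(x) + [0.0] * (L - N))
    r = dft([abs(v) ** 2 for v in X], inverse=True)
    return [v.real for v in r[:N]]


def crosscorr_via_cross_spectrum(x, h):
    """phi[m] = sum_n x[n] * h[n - m] for m = 0..N-1, from X * conj(H)."""
    N = len(x)
    L = 2 * N
    X = dft(list(x) + [0.0] * (L - N))
    H = dft(list(h) + [0.0] * (L - len(h)))
    r = dft([a * b.conjugate() for a, b in zip(X, H)], inverse=True)
    return [v.real for v in r[:N]]
```

For x = [1.0, 2.0, 3.0], the autocorrelation computed this way is [14, 8, 3] up to rounding error, matching the direct sums.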

Furthermore, in the excitation pulse calculation formula (16) according to the present invention, $\psi_{hh}(\cdot)$ is obtained by calculating the covariance function according to equation (18). The device may instead be configured to calculate an autocorrelation sequence such as the following:

$$\psi_{hh}(|m_i - m_k|) = \sum_{n=1}^{N-|m_i-m_k|} h_w(n)\, h_w(n + |m_i - m_k|), \qquad (1 \le |m_i - m_k| \le N) \qquad (19)$$

Adopting this configuration considerably reduces the amount of computation needed to obtain $\psi_{hh}(\cdot)$, and therefore also reduces the total amount of computation.
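A sketch of why the autocorrelation form is cheaper: it depends only on the lag |m_i − m_k|, so N values replace the N × N covariance table, and ψhh(m_i, m_k) is looked up as r[|m_i − m_k|]. The code uses 0-based indices, r[d] = Σ_{n=0}^{N−1−d} h[n]·h[n+d], which is equation (19) as reconstructed above shifted by one; the function names are hypothetical. Where the covariance window still covers the whole impulse response the two forms agree exactly, and they differ only near the end of the frame.

```python
def covariance(h, N):
    """Frame-limited covariance: phi[i][k] = sum_{n=max(i,k)}^{N-1} h[n-i] * h[n-k]."""
    def hv(j):
        return h[j] if 0 <= j < len(h) else 0.0
    return [[sum(hv(n - i) * hv(n - k) for n in range(max(i, k), N))
             for k in range(N)] for i in range(N)]


def autocorr_sequence(h, N):
    """Eq. (19)-style sequence (0-based): r[d] = sum_{n=0}^{N-1-d} h[n] * h[n+d]."""
    def hv(j):
        return h[j] if 0 <= j < len(h) else 0.0
    return [sum(hv(n) * hv(n + d) for n in range(N - d)) for d in range(N)]


def psi_hh_approx(r, mi, mk):
    """Approximate psi_hh(mi, mk) by the lag-only value r[|mi - mk|]."""
    return r[abs(mi - mk)]
```

With h = [1.0, 0.5, 0.25] and N = 8, r begins [1.3125, 0.625, 0.25], and `psi_hh_approx` matches the covariance table for all positions up to 5; at position 7 the covariance diagonal is only 1.0 because the frame cuts off the impulse response.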

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of a conventional system; Fig. 2 shows an example of an excitation pulse sequence; Fig. 3 shows an example of the frequency characteristic of an input speech signal sequence and the frequency characteristic of the weighting circuit shown in Fig. 1; Fig. 4 is a block diagram showing an embodiment of the speech encoding device according to the present invention; and Fig. 5 is a block diagram showing an example configuration of the excitation pulse calculation circuit 230 shown in Fig. 4. In the figures, 110 denotes a buffer memory circuit; 120, a subtraction circuit; 130, a synthesis filter; 140, an excitation pulse generation circuit; 150, an error minimization circuit; 180 and 280, K-parameter calculation circuits; 190 and 290, weighting circuits; 200, a K-parameter encoding circuit; 230, an excitation pulse calculation circuit; 210, an impulse response calculation circuit; 220, a covariance function calculation circuit; 235, a cross-correlation function calculation circuit; 240, a pulse calculation circuit; 250, an encoding circuit; and 260, a multiplexer.

Claims

[Claims]

1. A speech encoding device comprising: means for receiving a discrete speech signal sequence and dividing it into short-time intervals to obtain a short-time speech signal sequence; means for extracting and encoding, from the short-time speech signal sequence, parameters representing its spectral envelope; means for calculating an impulse response sequence based on the parameters representing the spectral envelope; means for calculating a correlation sequence using the impulse response sequence; means for calculating a cross-correlation sequence using the short-time speech signal sequence; means for calculating and encoding a driving excitation signal sequence of the short-time speech signal sequence using the correlation sequence and the cross-correlation sequence; and means for outputting a combination of the code of the parameters representing the spectral envelope and the code representing the driving excitation signal sequence.

2. A speech encoding device comprising: means for receiving a discrete speech signal sequence and dividing it into short-time intervals to obtain a short-time speech signal sequence; means for extracting and encoding, from the short-time speech signal sequence, parameters representing its spectral envelope; means for calculating an impulse response sequence based on the parameters representing the spectral envelope; means for calculating a correlation sequence using the impulse response sequence; means for calculating, from the short-time speech signal sequence, a target signal sequence to which a predetermined correction has been applied; means for calculating a cross-correlation sequence using the impulse response sequence and the target signal sequence; means for calculating and encoding a driving excitation signal sequence of the short-time speech signal sequence using the correlation sequence and the cross-correlation sequence; and means for outputting a combination of the code of the parameters representing the spectral envelope and the code representing the driving excitation signal sequence.