JPH0291698A

JPH0291698A - Sound encoding and decoding system

Info

Publication number: JPH0291698A
Application number: JP63245077A
Authority: JP
Inventors: Eisuke Hanada; 英輔花田; Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-09-28
Filing date: 1988-09-28
Publication date: 1990-03-30

Abstract

PURPOSE: To reproduce a satisfactory speech signal by representing the sound source signal with both a multi-pulse train obtained by pitch prediction and a codebook that represents the small-amplitude sound source signal. CONSTITUTION: A discrete speech signal is input from an input terminal 500, and a pitch parameter and a spectrum parameter are obtained by a pitch parameter calculating circuit 515 and a spectrum parameter calculating circuit 520. A sound source pulse calculating circuit 580 obtains a pitch-predicted multi-pulse train, and a small-amplitude sound source calculating circuit 620 obtains a pitch-predicted codebook representation. A multiplexer 630 combines and outputs the codes of the multi-pulse train and the codebook. On the receiving side, this signal passes through a demultiplexer 710; the multi-pulse train, the codebook, and the pitch parameter are used to restore the sound source signal, and the spectrum parameter is then used to reproduce a synthesized speech signal. The speech signal is thus satisfactorily reproduced.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a speech encoding/decoding method for efficiently encoding and decoding speech signals at a low bit rate.

(Prior Art) Multi-pulse coding is a known method for transmitting speech signals at low bit rates, for example about 16 kb/s or less. It represents the sound source signal as a combination of several pulses (a multi-pulse), represents the characteristics of the vocal tract with a digital filter, and determines and transmits the sound source pulse information and the filter coefficients for each fixed time interval (frame). Details of this method are described in, for example, the paper by Araseki, Ozawa, Ono, and Ochiai, "Multi-pulse Excited Speech Coder Based on Maximum Cross-correlation Search Algorithm" (IEEE Global Telecommunications Conference, 23.3, 1983) (hereinafter Document 1). By representing the vocal tract information and the sound source signal separately, and by using a combination of pulses (multi-pulse) to represent the sound source signal, this method yields a good speech signal after decoding. The basic idea of obtaining the multi-pulse train that represents the sound source signal is explained with reference to FIG. 3.

A speech signal divided into frames is input from input terminal 800 in the figure. The spectral parameters obtained from the speech signal of the current frame are input to synthesis filter circuit 820. Sound source calculation circuit 810 generates an initial sound source multi-pulse train, which is input to synthesis filter 820 to produce a synthesized speech waveform as output. Subtracter 840 subtracts the synthesized speech waveform from the input signal.

The result is input to weighting circuit 850, whose output is the weighted error power for the current frame. Sound source calculation circuit 810 then determines the amplitudes and positions of the sound source multi-pulse train so as to minimize this weighted error power.
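The search loop of FIG. 3 can be illustrated with a minimal sketch. It is not the procedure of Document 1; it assumes that `target` is the already weighted speech of one frame, that `h` is the weighted impulse response of the synthesis filter with `h[0]` nonzero, and that pulses are placed greedily one at a time. The function name and arguments are hypothetical.

```python
def multipulse_search(target, h, num_pulses):
    """Greedy sketch: place one pulse at a time to reduce the squared error.

    target: weighted speech samples of one frame (assumed given)
    h:      weighted impulse response of the synthesis filter (assumed given)
    Returns the list of (position, amplitude) pairs and the final residual.
    """
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for pos in range(n):
            span = min(len(h), n - pos)  # response is truncated at frame end
            energy = sum(h[j] * h[j] for j in range(span))
            if energy == 0.0:
                continue
            # Cross-correlation between the residual and a unit pulse response.
            corr = sum(residual[pos + j] * h[j] for j in range(span))
            reduction = corr * corr / energy  # error removed by this pulse
            if best is None or reduction > best[0]:
                best = (reduction, pos, corr / energy)
        _, pos, amp = best
        pulses.append((pos, amp))
        # Remove the contribution of the chosen pulse from the residual.
        for j in range(min(len(h), n - pos)):
            residual[pos + j] -= amp * h[j]
    return pulses, residual
```

Each pass picks the position whose optimally scaled pulse response removes the most error energy, which is why the search can be driven by a cross-correlation and an autocorrelation term rather than by repeated full synthesis.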

On the other hand, the pitch-prediction multi-pulse method, which improves the sound quality of the method of Document 1 by performing pitch prediction with pitch parameters that represent the fine structure of the pitch, is described in the specification of Japanese Patent Application No. 58-139022 (Document 2), so its description is omitted here.

(Problems to Be Solved by the Invention) However, with the conventional method of Document 1, sound quality was good when the bit rate was sufficiently high and the number of sound source pulses was sufficient, but it deteriorated as the bit rate was lowered. In particular, the reproduced speech was degraded for input signals with a high pitch frequency, such as a female voice. When the pitch frequency is high, the frame used for pulse calculation contains many pitch waveforms, and reproducing them well requires more pulses than for a speaker with a low pitch frequency. For this reason it was difficult to lower the transmission bit rate substantially, that is, to reduce substantially the number of pulses per frame, without degrading sound quality.

On the other hand, the conventional method of Document 2 performs pitch prediction using pitch parameters based on the pitch-to-pitch correlation, but it represents the sound source signal with multi-pulses and pitch prediction regardless of whether the signal is of large or small amplitude. A large-amplitude sound source signal can be expected to have a high pitch-to-pitch correlation, whereas a small-amplitude one has a low correlation. To improve the sound quality of this method further, the small-amplitude pulses of the multi-pulse train, or the small-amplitude sound source signal, play an even more important role. This is particularly important for consonant-like speech signals. In the conventional method, only a preset number of pulses, taken in descending order of amplitude, were determined and transmitted as the multi-pulse train representing the sound source signal.

Therefore, in the conventional example, the preset upper limit on the amount of information makes it impossible to obtain a sufficient number of small-amplitude pulses, so the sound source signal is not approximated well enough. The quality of the reproduced speech was thus limited and could not be improved further. This was especially noticeable at low bit rates.

An object of the present invention is to provide a speech encoding/decoding method capable of reproducing better speech than before at both high and low bit rates.

(Means for Solving the Problems) The speech encoding/decoding method of the present invention receives a discrete speech signal; obtains a pitch parameter representing the fine structure of the pitch of the speech signal and a spectral parameter representing its spectrum; represents the sound source signal of the speech signal by a multi-pulse train obtained with pitch prediction and a codebook obtained with pitch prediction, and transmits them; restores the sound source signal using the codebook, the multi-pulse train, and the pitch parameter; and uses the spectral parameter to output a synthesized speech signal that represents the speech signal well.

(Operation) The present invention is characterized in that, in the pitch-prediction multi-pulse coding method of Document 2, the sound source signal is represented by pitch-predicted multi-pulses and a codebook, so that it is represented more effectively than in the conventional method with a small amount of transmitted information.

The operation of the present invention is explained with reference to FIG. 2. The upper part of FIG. 2 is a block diagram of a circuit that obtains, by pitch prediction, the multi-pulses representing the sound source signal. A speech signal divided into frames is input from input terminal 300 in the figure.

The pitch parameter obtained from the speech signal of the current frame is input to pitch reproduction filter 315. The spectral parameters obtained from the speech signal of the current frame are input to spectral envelope filter circuit 320. Multi-pulse sound source 310 generates an initial sound source multi-pulse train, which is input to pitch reproduction filter 315 to obtain a driving sound source signal. Inputting the driving sound source signal to spectral envelope filter 320 yields a synthesized speech waveform as output. Subtracter 340 subtracts the synthesized speech waveform from the input signal. The result is input to weighting circuit 350, whose output is the weighted error power for the current frame. Multi-pulse sound source 310 then determines the amplitudes and positions of the sound source multi-pulse train so as to minimize this weighted error power. Switch 380 is always connected to the upper side while the multi-pulse train is being determined. Once the multi-pulse train has been determined, switch 380 is connected to the lower side, and the synthesized speech waveform x̂(n) obtained from the determined multi-pulse train, pitch reproduction filter 315, and spectral envelope filter circuit 320 is output to subtracter 220.
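The two filters in this signal path can be sketched as follows. The patent does not give their structure, so the sketch assumes a single-tap recursive pitch filter driven by a pitch period and a pitch gain, an all-pole spectral envelope filter with coefficients `a`, and zero filter memory at the start of each frame. All names are hypothetical.

```python
def pitch_filter(x, period, gain):
    """Assumed one-tap pitch synthesis: y(n) = x(n) + gain * y(n - period)."""
    y = []
    for n, xn in enumerate(x):
        past = y[n - period] if n >= period else 0.0
        y.append(xn + gain * past)
    return y


def envelope_filter(x, a):
    """Assumed all-pole synthesis: y(n) = x(n) - sum_k a[k] * y(n - 1 - k)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


# A single pulse is repeated at the pitch period, then spectrally shaped.
driving = pitch_filter([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], period=2, gain=0.5)
synthesized = envelope_filter(driving, a=[-0.5])
```

With this structure a single transmitted pulse produces a decaying train of pulses spaced one pitch period apart, which is how pitch prediction lets fewer pulses cover several pitch waveforms in a frame.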

Small-amplitude sound source calculation circuit 230 calculates, for each of the sub-intervals (for example about 5 msec) obtained by further dividing the frame, a small-amplitude sound source signal that represents the characteristics of the small-amplitude component of the sound source signal.

This small-amplitude sound source signal has nearly random phase characteristics and can be regarded as close to a noise signal. To encode such a signal very efficiently, vector quantization can be used, in which a codebook containing a number of small-amplitude sound source signals is prepared in advance and used for encoding. Vector quantization is described in detail in, for example, the paper by R. M. Gray, "Vector quantization for speech coding and recognition" (hereinafter Document 3), so its description is omitted here.

The operation of small-amplitude sound source calculation circuit 230 is described below.

Subtracter 220 subtracts the reproduced signal x̂(n) from the original speech waveform x(n). The resulting residual signal e(n) is divided uniformly in time by time division circuit 240 into sub-intervals shorter than a frame.

Codebook 250 is prepared in advance and contains a plurality of codes. For each sub-interval produced by time division circuit 240, one code from the codebook is taken as input and its gain is adjusted by gain circuit 255. The pitch is then reproduced by pitch reproduction filter 257, which uses the same pitch parameter as pitch reproduction filter 315, and the synthesized residual signal ẽ(n) is produced by spectral envelope filter 260, which uses the same spectral parameters as spectral envelope filter 320.

Subtracter 270 subtracts the synthesized residual signal ẽ(n) from the input residual signal e(n). The result is input to weighting circuit 280, whose output is the weighted error power. Circuit 280 operates in the same way as circuit 350. The code in codebook 250 that minimizes the weighted error power is then selected, and its index is output.

Next, the actual method of representing the small-amplitude sound source signal with a codebook and of selecting the code is explained using equations. The code is selected so as to minimize the error power E defined by the following equation.

E = \sum_n [ (e(n) - g \tilde{e}(n)) * w(n) ]^2    (1)

Here e(n) is the input residual signal of FIG. 2, g is the gain applied by gain circuit 255, and \tilde{e}(n) is the residual signal reproduced from the one selected code via the pitch reproduction filter and the spectral envelope filter. w(n) is the impulse response of the perceptual weighting filter (identical to weighting circuit 280 in FIG. 2). Minimizing equation (1) with respect to g gives equation (2).

g = \frac{\sum_n e_w(n) \tilde{e}_w(n)}{\sum_n \tilde{e}_w(n) \tilde{e}_w(n)}    (2)

where

\tilde{e}_w(n) = \tilde{e}(n) * w(n) = n(n) * p(n) * h(n) * w(n)    (3a)

e_w(n) = e(n) * w(n)    (3b)

The symbol * denotes convolution. The denominator of equation (2) is the autocorrelation (strictly, the covariance) of \tilde{e}_w(n), and the numerator is the cross-correlation between \tilde{e}_w(n) and e_w(n). In equation (3a), n(n) is the signal represented by the one code selected from the codebook, p(n) is the impulse response of pitch reproduction filter circuit 257, and h(n) is the impulse response of spectral envelope filter circuit 260.

The error power E can then be written as

E = \sum_n e_w(n)^2 - g \sum_n e_w(n) \tilde{e}_w(n)    (4)

The code that minimizes E can therefore be chosen so as to maximize the second term of equation (4), that is, to maximize g.
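Equations (1) to (4) can be followed step by step in a minimal sketch. It assumes that the weighted residual `e_w` of one sub-interval and the impulse response `h_total` of the cascade p(n) * h(n) * w(n) are already available, and it truncates every convolution at the end of the sub-interval. Unlike the text, which selects by maximizing g, the sketch evaluates E of equation (4) for every code and keeps the smallest. The function names are hypothetical.

```python
def convolve(x, h):
    """Linear convolution of x with h, truncated to len(x) samples."""
    y = [0.0] * len(x)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            if i + j < len(x):
                y[i + j] += xi * hj
    return y


def search_codebook(e_w, codebook, h_total):
    """Return (index, gain, error) of the code with the smallest error E."""
    energy = sum(v * v for v in e_w)
    best = None
    for idx, code in enumerate(codebook):
        cand = convolve(code, h_total)                 # filtered code, eq. (3a)
        cross = sum(a * b for a, b in zip(e_w, cand))  # numerator of eq. (2)
        auto = sum(b * b for b in cand)                # denominator of eq. (2)
        if auto == 0.0:
            continue
        g = cross / auto                               # optimal gain, eq. (2)
        err = energy - g * cross                       # error power, eq. (4)
        if best is None or err < best[2]:
            best = (idx, g, err)
    return best
```

For example, with `h_total = [1.0, 0.5]` and a weighted residual built from the second of three codes scaled by 2, the search returns index 1 with gain 2.0 and zero error. The cost is one convolution per code, which is what the cross-correlation shortcut described next avoids.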

The following configuration can reduce the amount of computation for code selection still further. The multi-pulse train representing the sound source signal is searched for using a cross-correlation. How this is done is detailed in Documents 1 and 2 and is not repeated here, but by using the cross-correlation function Φ_xh' that remains after the pitch-predicted multi-pulse train has been obtained, the code can be selected with far less computation than in the method described above.

In the method shown below, the signal \tilde{e}_w(n) need not be reproduced during code selection, so the amount of computation is greatly reduced while performance remains nearly the same as in the method described above. The derivation is as follows. First, \Phi_{xh'} and \tilde{e}_w(n) can be written as

\Phi_{xh'}(m) = \sum_n e_w(n) h_w(n - m)    (5)

\tilde{e}_w(n) = n(n) * p(n) * h_w(n)    (6)

Substituting equation (6) into equation (2) and using equation (5) gives

g = \frac{\sum_n \Phi_{xh'}(n) n(n)}{R_{hh}(0) R_{nn}(0)}    (7)

Here \Phi_{xh'} is the cross-correlation function after the multi-pulse train has been obtained by pitch prediction, and R_{hh}(0) is the power of the impulse response of the filter formed by the cascade connection of pitch reproduction filter 257, spectral envelope filter 260, and weighting circuit 280. R_{nn}(0) is the power of the signal n(n) represented by a code when that one code is selected from codebook 250. The numerator of equation (7) is the cross-correlation between \Phi_{xh'} and the signal n(n) represented by the selected code. As with equation (2), the code that maximizes g in equation (7) is selected.
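The shortcut of equations (5) to (7) can be sketched as below. The sketch assumes that `h_cascade` is the impulse response of the whole cascade (pitch reproduction filter, spectral envelope filter, weighting), that correlations are truncated at the end of the sub-interval, and that the code with the largest g is chosen, as the text states. The lag convention and all names are assumptions made for illustration.

```python
def cross_correlation(e_w, h_cascade):
    """phi[m] = sum_n e_w[n] * h_cascade[n - m], truncated to the sub-interval."""
    n = len(e_w)
    phi = []
    for m in range(n):
        stop = min(n, m + len(h_cascade))
        phi.append(sum(e_w[k] * h_cascade[k - m] for k in range(m, stop)))
    return phi


def fast_search(phi, codebook, r_hh0):
    """Pick the code with the largest gain g of eq. (7); returns (index, g)."""
    best = None
    for idx, code in enumerate(codebook):
        r_nn0 = sum(c * c for c in code)  # power of the code signal n(n)
        if r_nn0 == 0.0:
            continue
        g = sum(p * c for p, c in zip(phi, code)) / (r_hh0 * r_nn0)
        if best is None or g > best[1]:
            best = (idx, g)
    return best


h_cascade = [1.0, 0.5]
phi = cross_correlation([0.0, 2.0, 1.0, 0.0], h_cascade)
r_hh0 = sum(v * v for v in h_cascade)  # power of the cascade impulse response
choice = fast_search(phi, [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], r_hh0)
```

The cross-correlation is computed once per sub-interval, after which each candidate code costs a single inner product instead of a filtering pass. Replacing the exact denominator of equation (2) by the product R_hh(0) R_nn(0) is an approximation, which is why the text describes the performance as nearly, rather than exactly, the same.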

Codebook 250 may be created by training in advance on the sound source residual signal that remains after a predetermined number of large-amplitude pitch-predicted multi-pulses have been obtained, or it may be created from a plurality of noise signals that have, for example, Gaussian statistical properties and variously altered phase characteristics. The latter method is described in detail in the paper by M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates" (Proc. I.C.A.S.S.P., vol. 1, paper No. 25.1.1, March 1985) (hereinafter Document 4).
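The second way of building the codebook, from Gaussian noise, might look like the following sketch. The codebook size, the code length, and the use of a fixed seed so that encoder and decoder generate identical tables are assumptions made for illustration; the patent does not specify them.

```python
import random


def make_gaussian_codebook(num_codes, length, seed=0):
    """Build a codebook of zero-mean, unit-variance Gaussian noise vectors.

    A fixed seed (an assumption of this sketch) lets the transmitting and
    receiving sides regenerate exactly the same table.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)]
            for _ in range(num_codes)]


# Hypothetical sizes: 64 codes of 40 samples each.
codebook = make_gaussian_codebook(num_codes=64, length=40)
```

Because the table is fully determined by the seed and the sizes, only the index of the selected code and its gain have to be transmitted.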

The information transmitted by the transmitting side consists of the amplitudes and positions of the pitch-predicted multi-pulses, the codebook index and gain of the small-amplitude sound source signal, the pitch parameter, and the spectral parameters.

(Embodiment) In FIG. 1, which shows one embodiment of the present invention, a discrete speech signal x(n) is input from input terminal 500. Time division circuit 510 divides the input speech signal into frames of uniform duration (for example, every 20 msec). Pitch parameter calculation circuit 515 calculates the pitch parameter representing the fine structure of the pitch, using a method such as that shown in Document 2. Quantizer 516 quantizes the obtained pitch parameter. Inverse quantizer 518 inverse-quantizes the quantized result and outputs it. Spectral parameter calculation circuit 520 obtains the spectral parameters representing the spectrum of the speech signal in each divided interval by the well-known LPC analysis method.
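The text only says that LPC analysis is well known. One common form, the autocorrelation method solved with the Levinson-Durbin recursion, is sketched below as an illustration; it is an assumption that this particular variant matches circuit 520. The sketch also assumes a frame with nonzero energy and omits windowing.

```python
def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] of A(z) = 1 + sum_k a[k] z^-k.

    Returns (coefficients, prediction error power). Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error
    return a[1:], err


# Autocorrelation of a first-order process with coefficient 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the recursion returns a first coefficient of -0.5 and a second coefficient of zero, so the resulting all-pole synthesis filter has the single pole at 0.5 implied by the autocorrelation values.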

The obtained spectral parameters are quantized by spectral parameter quantizer 525. The quantization may be scalar quantization as shown in the specification of Japanese Patent Application No. 59-272435 (Document 5), or vector quantization as shown in Document 4.

Inverse quantizer 530 inverse-quantizes the quantized result and outputs it. Weighting circuit 540 weights the divided speech signal using the inverse-quantized spectral parameters. For the weighting method, see weighting circuit 200 of Document 5. Impulse response calculation circuit 550 calculates the impulse response using the inverse-quantized pitch parameter and the inverse-quantized spectral parameters; see Document 2 for the specific method. Autocorrelation function calculation circuit 560 calculates the autocorrelation function of the impulse response and outputs it to sound source pulse calculation circuit 580. For the calculation method, see autocorrelation function calculation circuit 180 of Document 2.

Cross-correlation function calculation circuit 570 calculates the cross-correlation function between the weighted signal and the impulse response and outputs it to sound source pulse calculation circuit 580. See Document 2 for the specific method.

Sound source pulse calculation circuit 580 obtains a predetermined number (L1) of multi-pulses by pitch prediction. For the method of calculating the multi-pulse train, see sound source pulse calculation circuit 210 of Document 2. Quantizer 585 quantizes the sound source multi-pulse train and outputs its code. This output is inverse-quantized by inverse quantizer 590, and pulse generator 600 regenerates the multi-pulses. Pitch reproduction filter 605 receives the regenerated multi-pulses and the pitch parameter inverse-quantized by inverse quantizer 518, and outputs a sound source signal with the pitch reproduced. Passing this sound source signal through spectral envelope filter 610, which uses the spectral parameters output from inverse quantizer 530, gives the synthesized speech signal x̂(n) produced by the sound source pulses.

Subtracter 615 subtracts the synthesized speech signal x̂(n) from the speech signal x(n) to obtain the residual signal e(n), for which the small-amplitude sound source signal is calculated.

As explained in the Operation section, small-amplitude sound source calculation circuit 620 represents the small-amplitude sound source signal of each sub-interval shorter than a frame (for example 5 msec) using a codebook containing a plurality of codes.

The index and gain of the code representing the small-amplitude sound source signal, the code of the quantized multi-pulse train output from quantizer 585, the code of the quantized pitch parameter output from quantizer 516, and the code of the quantized spectral parameters output from quantizer 525 are each input to multiplexer 630. Multiplexer 630 combines these codes and outputs them.

On the receiving side, demultiplexer 710 separates and outputs the code of the multi-pulse train, the code of the pitch parameter, the code of the spectral parameters, and the codes of the index and gain representing the small-amplitude sound source signal. Sound source pulse decoder 720 decodes the amplitudes and positions of the multi-pulses. Spectral parameter decoder 750 performs the same function as inverse quantizer 530 on the transmitting side. Small-amplitude sound source decoder 730 has the same codebook as small-amplitude sound source calculation circuit 620 on the transmitting side, and uses the received index to select and output the code representing the small-amplitude sound source signal. Gain circuit 735 determines the amplitude of the small-amplitude sound source signal from the received gain code when a code indicating that the current frame is a consonant interval has been received. Pitch parameter decoder 745 performs the same function as inverse quantizer 518 on the transmitting side. Pulse generator 725 generates the sound source signal formed by the multi-pulse train. Pitch reproduction filter 755 receives the generated sound source signal and the decoded pitch parameter and reproduces a sound source signal with the pitch restored. Adder 740 adds the pitch-reproduced sound source signal and the output signal of gain circuit 735 to obtain the driving sound source signal, which drives spectral envelope filter circuit 760. Spectral envelope filter circuit 760 uses the driving sound source signal and the decoded spectral parameters to obtain and output the synthesized speech waveform.
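The receiving-side signal flow can be traced in a short sketch that follows the order of the blocks described above. It assumes a single-tap pitch filter, an all-pole spectral envelope filter, zero filter memory at the start of the frame, and a code vector that already spans the whole frame (in the text the code is chosen per sub-interval). The function and parameter names are hypothetical.

```python
def decode_frame(pulses, code, code_gain, pitch_period, pitch_gain, lpc, frame_len):
    """Rebuild one frame of speech from decoded parameters (sketch only)."""
    # Pulse generator 725: place the decoded pulses in the frame.
    source = [0.0] * frame_len
    for pos, amp in pulses:
        source[pos] += amp
    # Pitch reproduction filter 755 (assumed): y(n) = x(n) + gain * y(n - period).
    for n in range(pitch_period, frame_len):
        source[n] += pitch_gain * source[n - pitch_period]
    # Gain circuit 735 and adder 740: add the scaled small-amplitude code.
    driving = [s + code_gain * c for s, c in zip(source, code)]
    # Spectral envelope filter 760 (assumed all-pole).
    out = []
    for n, d in enumerate(driving):
        acc = d
        for k, ak in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= ak * out[n - 1 - k]
        out.append(acc)
    return out


frame = decode_frame(pulses=[(0, 1.0)], code=[0.0] * 6, code_gain=0.0,
                     pitch_period=2, pitch_gain=0.5, lpc=[], frame_len=6)
```

In this sketch the code vector is added after the pitch filter, matching the description of adder 740, so the small-amplitude component reaches the spectral envelope filter without being repeated at the pitch period.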

The configuration described above is only one configuration of the present invention, and various modifications are possible.

To reduce still further the amount of computation needed to obtain the small-amplitude sound source signal, the code can be selected using the cross-correlation function Φ_xh' that remains after the pitch-predicted multi-pulses have been obtained, as explained with equations (5) to (7) in the Operation section. As noted there, the signal ẽ_w(n) then need not be reproduced during code selection, so the amount of computation is greatly reduced compared with the configuration shown in FIG. 1.

Besides the method shown in Document 1, various well-known methods can be used to calculate the multi-pulses.

Other well-known parameters (line spectrum pairs, cepstrum, mel-cepstrum, log area ratios, etc.) can also be used as the spectral parameters. In addition, vector quantization or the like can be used instead of scalar quantization to quantize the spectral parameters.

For vector quantization, see Document 3.

Although the frame length has been described as fixed, it may be variable.

In small-amplitude sound source calculation circuit 620 and small-amplitude sound source decoder 730, at least one of the pitch period and the pitch gain, which serve as the pitch parameters used when selecting and decoding the optimal code, may be recomputed from the residual signal e(n) and used.

The input signal may also be classified into, for example, vowel and consonant portions (or voiced and unvoiced portions), with the codebook used only for the consonant portions. The vowel/consonant decision can be made by well-known methods using parameters such as the frame power, the power difference from the previous frame, the spectral change from the previous frame, and the degree of pitch periodicity. The voiced/unvoiced decision can use parameters such as the pitch gain.

Further, for the consonant portions, different codebooks may be prepared in advance according to the nature of the consonant (for example, plosive, fricative, etc.) and switched as appropriate.

Furthermore, the effectiveness of using the codebook may be evaluated and the codebook used only for the portions where it is effective; the effectiveness may also be evaluated after the signal has been classified into vowel and consonant portions. Effectiveness can be judged from, for example, the ratio in each interval between the power or RMS of the signal reproduced using the codebook and the power or RMS of the residual signal output by the subtractor 615, or from the magnitude of the codebook gain.
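The RMS-ratio criterion described above might be implemented along the following lines. The function names and the decision threshold `min_ratio` are assumptions for illustration; the specification names the ratio as a usable measure but gives no threshold.

```python
import math


def rms(signal):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def codebook_is_effective(reproduced, residual, min_ratio=0.3):
    """Judge codebook effectiveness for one interval.

    `reproduced` is the signal regenerated from the selected codebook entry,
    `residual` is the residual signal (output of subtractor 615 in the
    patent). The interval counts as effective when the RMS ratio reaches
    `min_ratio` (a hypothetical threshold).
    """
    residual_rms = rms(residual)
    if residual_rms == 0.0:
        return False  # nothing left for the codebook to represent
    return rms(reproduced) / residual_rms >= min_ratio


residual_example = [1.0, -1.0, 1.0, -1.0]
effective = codebook_is_effective([0.5, -0.5, 0.5, -0.5], residual_example)
```

The same structure would work with power instead of RMS by comparing mean squared values, in which case the threshold would be the square of the RMS threshold.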

(Effects of the Invention) According to the present invention, in contrast to the conventional example, the sound source signal is represented not only by a pitch-predicted multi-pulse train but also by a codebook that represents the small-amplitude sound source signal, which further improves sound quality. Using the two together allows the sound source information to be represented efficiently. Therefore, even at the same bit rate as the conventional method, a better reproduced audio signal can be obtained than with the conventional method, not only in vowel portions but also in consonant intervals. Furthermore, this effect becomes more pronounced as the bit rate is lowered.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech encoding/decoding method and apparatus according to the present invention, and FIG. 2 is a block diagram showing the operation of the present invention. FIG. 3 is a block diagram showing a conventional example of the multi-pulse train search method.

In the figures: 240, 510: time division circuit; 515: pitch parameter calculation circuit; 520: spectral parameter calculation circuit; 516, 525, 585: quantizer; 518, 530, 590: inverse quantizer; 280, 350, 540, 850: weighting circuit; 550: impulse response calculation circuit; 560: autocorrelation function calculation circuit; 570: cross-correlation function calculation circuit; 580: sound source pulse calculation circuit; 600, 725: pulse generator; 257, 315, 805, 755: pitch reproduction filter; 260, 320, 610, 760: spectral envelope filter; 820: synthesis filter circuit; 230, 620: small-amplitude sound source calculation circuit; 630: multiplexer; 710: demultiplexer; 720: sound source pulse decoder; 730: small-amplitude sound source decoder; 740: adder; 745: pitch parameter decoder; 750: spectral parameter decoder; 770: output terminal; 310: multi-pulse sound source; 810: sound source calculation circuit; 220, 270, 340, 615, 840: subtractor; 255, 735: gain circuit; 380: switch.

Claims

[Claims]

A speech encoding/decoding method characterized in that: a discrete audio signal is input; a pitch parameter representing the fine pitch structure of the audio signal and a spectral parameter representing the spectrum of the audio signal are obtained; the sound source signal of the audio signal is represented and transmitted using a multi-pulse train obtained by pitch prediction and a codebook obtained by pitch prediction; the sound source signal is restored using the codebook, the multi-pulse train, and the pitch parameter; and a synthesized audio signal that satisfactorily represents the audio signal is output using the spectral parameter.