JP2508002B2

JP2508002B2 - Speech coding method and apparatus thereof

Info

Publication number: JP2508002B2
Application number: JP61148579A
Authority: JP
Inventors: 一範小澤
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1986-06-24
Filing date: 1986-06-24
Publication date: 1996-06-19
Anticipated expiration: 2011-06-19
Also published as: JPS634300A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a speech coding method and an apparatus therefor, and in particular to a speech coding method and apparatus for coding a speech signal with high quality at a low bit rate of about 4.8 kb/s (kilobits per second) with a small amount of computation.

[Conventional technology]

As a method for coding a speech signal with high quality at a low transmission bit rate of about 4.8 kb/s, a method is known, as described for example in Japanese Patent Application No. 59-272435 (Reference 1) and Japanese Patent Application No. 60-178911 (Reference 2), in which the excitation signal of one frame in a voiced section is represented by the pulse train of a single pitch section (the representative section) and this pulse train is transmitted. With this method, good sound quality that does not impair naturalness can be obtained even at a low bit rate of about 4.8 kb/s.

FIG. 3 is a pulse processing diagram showing an example of the conventional method for obtaining the pulse train of the representative section in a voiced section. In FIG. 3, (a) is the speech waveform of one frame, and (b) is an example in which the waveform of (a) is divided into subframes corresponding to the pitch period P'd; the frame is divided into subframe sections using the pitch period P'd and the position of the first pulse g₁. (c) is an example in which a pulse train is obtained for each subframe, and (d) is an example of the representative section found by searching and of the pulse train in that section; one of the subframes in the figure serves as the representative section.

The representative section can be found either by searching for the subframe section that allows good speech to be reproduced over the whole frame, or by selecting the section that contains a pulse with a large absolute value. As excitation information, the position of this representative section (in the case of FIG. 3, the position of the selected subframe) and time information T on the start point of the subframes within the frame are transmitted.

[Problems to be solved by the invention]

However, the conventional method described above requires a large amount of computation to search for the pulse train and for the representative section, and it also requires auxiliary information such as the position of the representative section and the start point of the subframes within the frame. Consequently, when the hardware is to be built from the general-purpose signal processors currently on the market, the method has the drawback that it is difficult to realize with a simple apparatus configuration, mainly because of the computational load.

An object of the present invention is to eliminate the above drawback and to provide a speech coding method and apparatus that can synthesize high-quality speech even at a low transmission bit rate of about 4.8 kb/s, with a small amount of computation and a simple apparatus configuration.

[Means for solving problems]

On the transmitting side, a discrete speech signal is input; a spectral parameter representing the short-time spectral envelope and a pitch parameter representing the pitch are extracted from the speech signal for each frame; the frame section is divided into a plurality of pitch sections whose length equals the period given by the pitch parameter; the excitation signal of one pitch section is represented by a multipulse train or by a combination of noise and a multipulse train; and information representing the excitation signal is output in combination with the pitch parameter and the spectral parameter. On the receiving side, the frame is divided into pitch sections on the basis of the pitch parameter; the excitation signal of one pitch section is restored from the information representing the excitation signal, and from it the driving excitation signal of the whole frame is restored; and the speech signal is synthesized using the spectral parameter.

The apparatus comprises: a parameter calculation circuit that extracts and codes, for each frame of the input speech signal, a spectral parameter representing the short-time spectral envelope and a pitch parameter representing the pitch; a drive signal calculation circuit that divides the frame section into a plurality of pitch sections whose length equals the period given by the pitch parameter, represents the excitation signal of one pitch section by a multipulse train or by a combination of noise and a multipulse train, and codes that excitation signal; and a multiplexer circuit that combines and outputs the output code of the parameter calculation circuit and the output code of the drive signal calculation circuit.

[Example]

The present invention will now be described in detail with reference to the drawings.

FIG. 1(a) is a block diagram showing an embodiment of the transmitting side of the speech coding method and apparatus according to the present invention, and FIG. 1(b) is a block diagram showing an embodiment of the receiving side.

The transmitting side shown in FIG. 1(a) comprises a buffer memory 110, a pitch analysis circuit 130, a K-parameter calculation circuit 140, a pitch coding circuit 150, a K-parameter coding circuit 160, an impulse response calculation circuit 170, an autocorrelation function calculation circuit 180, a weighting circuit 200, a drive signal calculation circuit 220, a noise memory 225, a coding circuit 230, a multiplexer 260, and so on. The receiving side shown in FIG. 1(b) comprises a demultiplexer 290, a decoding circuit 300, a noise memory 310, a pitch decoding circuit 320, a K-parameter decoding circuit 330, a drive signal restoration circuit 340, an interpolation circuit 335, a synthesis filter circuit 350, and so on.

The basic processing performed by the transmitting and receiving sides is as follows.

In voiced sections, the present invention exploits the periodicity of the speech signal: a predetermined pitch section within the frame is taken as the representative section for that frame, and the excitation signal is represented by the pulse train of this pitch section. In unvoiced sections, the excitation signal is represented by a combination of a pulse train and noise. Because the position of the representative section in voiced sections is fixed in advance and a pulse train is obtained only for that section, the amount of computation needed for the pulse search and for the representative-section search is greatly reduced. In addition, auxiliary information such as the position of the representative section and the start point of the subframes within the frame is no longer needed, so the information allotted to the pulse train can be increased accordingly. High-quality speech can thus be reproduced with a small amount of computation, and the hardware can be realized with a compact apparatus configuration.

As a method for obtaining the amplitudes and positions of the pulse train, besides the methods described in References 1 and 2, a method using the analysis-by-synthesis (A-b-S) technique is known. Its details are given in, for example, the paper by B. S. Atal et al. entitled "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates" (Proc. ICASSP, pp. 614-617, 1982) (Reference 3).

The method of obtaining the excitation as a combination of a pulse train and noise in unvoiced sections is described in detail in Reference 2.

In FIG. 1(a), the speech signal x(n) is input through the input terminal 100 and stored in the buffer memory 110 in blocks of a preset number of samples.

The K-parameter calculation circuit 140 reads the speech signal from the buffer memory 110 in blocks of a preset number of samples and calculates the K parameters, which represent the spectral envelope of the speech signal. The K parameters are the PARCOR (partial autocorrelation) coefficients. The autocorrelation method is well known as a way of calculating them; its details are given in, for example, the paper by John Makhoul et al. entitled "Quantization Properties of Transmission Parameters in Linear Predictive Systems" (IEEE Trans. ASSP, pp. 309-321, 1983) (Reference 4). Returning to FIG. 1(a), the description of the embodiment continues.
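The autocorrelation method mentioned above can be sketched with the Levinson-Durbin recursion. This is a minimal illustration, not the circuit of the embodiment: the function names, the absence of windowing, and the sign convention of the coefficients are assumptions.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[m] = sum_n x[n] * x[n + m]."""
    n = len(x)
    return [sum(x[i] * x[i + m] for i in range(n - m))
            for m in range(max_lag + 1)]


def parcor_from_autocorr(r, order):
    """Levinson-Durbin recursion on the autocorrelation sequence r.

    Returns (k, a): the PARCOR (reflection) coefficients k[0..order-1]
    and the predictor coefficients a[0..order-1] of the model
    x[n] ~ sum_i a[i] * x[n - 1 - i].
    """
    a, k = [], []
    err = r[0]                      # prediction error power so far
    for i in range(order):
        if err <= 0.0:              # degenerate frame (e.g. all zeros)
            k.append(0.0)
            a = a + [0.0]
            continue
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        ki = acc / err
        # Order update of the predictor, then append the new coefficient.
        a = [a[j] - ki * a[i - 1 - j] for j in range(i)] + [ki]
        k.append(ki)
        err *= 1.0 - ki * ki
    return k, a
```

For a frame `x`, `parcor_from_autocorr(autocorrelation(x, 10), 10)` would give ten K parameters; the order 10 is only an example value.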

The i-th order K parameter Ki output from the K-parameter calculation circuit 140 is passed to the K-parameter coding circuit 160, coded with a preset number of quantization bits, and output to the multiplexer 260 as the code lKi. The K-parameter coding circuit 160 also decodes lKi once, derives the linear prediction coefficients a'i from the decoded K parameters, and outputs them to the impulse response calculation circuit 170 and the weighting circuit 200.
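Deriving the prediction coefficients from the decoded K parameters, rather than from the unquantized ones, keeps the transmitter's filters identical to those the receiver can rebuild. The sketch below illustrates that round trip. The uniform quantizer is an assumption (the text only says a preset number of bits is used), and all function names are hypothetical.

```python
def quantize_k(k, bits):
    """Uniformly quantize a K parameter in (-1, 1) to an integer code."""
    levels = 1 << bits
    step = 2.0 / levels
    code = int((k + 1.0) / step)
    return min(max(code, 0), levels - 1)


def dequantize_k(code, bits):
    """Return the midpoint of the quantization cell selected by code."""
    levels = 1 << bits
    step = 2.0 / levels
    return -1.0 + (code + 0.5) * step


def k_to_lpc(k):
    """Step-up recursion: K parameters -> predictor coefficients a[i]
    for the model x[n] ~ sum_i a[i] * x[n - 1 - i]."""
    a = []
    for i, ki in enumerate(k):
        a = [a[j] - ki * a[i - 1 - j] for j in range(i)] + [ki]
    return a


def code_and_decode(k, bits):
    """Return (codes, decoded K parameters, LPC from the decoded values)."""
    codes = [quantize_k(v, bits) for v in k]
    k_dec = [dequantize_k(c, bits) for c in codes]
    return codes, k_dec, k_to_lpc(k_dec)
```

Because every decoded value stays strictly inside (-1, 1), the filter built from `k_to_lpc(k_dec)` remains stable.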

The pitch analysis circuit 130 calculates the pitch period P'd from the output of the buffer memory 110. P'd can be calculated, for example, by the method described in the paper by R. V. Cox et al. entitled "Real-Time Implementation of Time Domain Harmonic Scaling of Speech Signals" (IEEE Trans. ASSP, pp. 258-272, 1983) (Reference 5). In this step, whether the speech signal of the frame is voiced or unvoiced is determined using the autocorrelation coefficient at a lag of one pitch period. If the frame is unvoiced, the pitch period is set to 0.
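The voiced/unvoiced decision from the autocorrelation at the pitch lag can be illustrated as follows. This sketch uses a plain normalized-autocorrelation peak search rather than the method of Reference 5, and the lag range and the threshold value 0.3 are assumptions chosen for illustration.

```python
def estimate_pitch(x, min_lag, max_lag, voicing_threshold=0.3):
    """Return the pitch period in samples, or 0 for an unvoiced frame.

    The lag with the largest normalized autocorrelation is taken as the
    pitch period; the frame is declared unvoiced when that correlation
    falls below voicing_threshold.
    """
    r0 = sum(v * v for v in x)
    if r0 <= 0.0:
        return 0                      # silent frame: treat as unvoiced
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, min(max_lag, len(x) - 1) + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag if best_r >= voicing_threshold else 0
```

Returning 0 for unvoiced frames matches the convention in the text, so the pitch code alone tells the receiver which excitation model is in use.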

The pitch coding circuit 150 quantizes and codes the pitch period P'd with a predetermined number of quantization bits and outputs the result to the multiplexer 260 as the code ld. For an unvoiced frame it outputs the code corresponding to 0. It also outputs the decoded P'd to the drive signal calculation circuit 220.

The impulse response calculation circuit 170 receives the prediction coefficients a'i from the K-parameter coding circuit 160 and calculates the impulse response hw(n) that represents the transfer function of the weighted synthesis filter. hw(n) can be calculated, for example, by the same method as the impulse response calculation circuit 210 shown in FIG. 4(a) of Japanese Patent Application No. 59-042305 (Reference 6). The impulse response hw(n) is output to the autocorrelation function calculation circuit 180 and the cross-correlation function calculation circuit 210.
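As an illustration of what such an impulse response looks like, the sketch below assumes the weighted synthesis filter has the all-pole form 1 / (1 - sum_i a'i * gamma^i * z^-i), which is common in the multipulse coding literature. The text itself defers to Reference 6, so this form, the factor `gamma`, and the function name are all assumptions.

```python
def weighted_impulse_response(a, gamma, length):
    """Impulse response of 1 / (1 - sum_i a[i] * gamma**(i+1) * z**-(i+1)).

    a[i] are predictor coefficients for x[n] ~ sum_i a[i] * x[n - 1 - i];
    gamma (0 < gamma <= 1) is a hypothetical perceptual weighting factor.
    """
    aw = [a[i] * gamma ** (i + 1) for i in range(len(a))]
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0          # unit impulse input
        for i, c in enumerate(aw):
            if n - 1 - i >= 0:
                acc += c * h[n - 1 - i]       # recursive (all-pole) part
        h.append(acc)
    return h
```

Truncating the response to `length` samples is what makes the later correlation computations finite; the length also bounds the maximum lag L of Rhh(m).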

The autocorrelation function calculation circuit 180 receives the impulse response hw(n) from the impulse response calculation circuit 170, calculates the autocorrelation function Rhh(m), and outputs it to the drive signal calculation circuit 220. Rhh(m) can be calculated, for example, by the same method as the autocorrelation function calculation circuit 180 described in Reference 6.

The weighting circuit 200 receives one frame's worth of samples of x(n) and the prediction coefficients a'i from the K-parameter coding circuit 160, applies weighting to x(n), and outputs the resulting xw(n). This calculation can use, for example, the same method as the weighting circuit 410 shown in FIG. 4(a) of Reference 6.

The drive signal calculation circuit 220 first uses the pitch period P'd to determine whether the frame is voiced or unvoiced. As the excitation signal representing the speech signal, it then calculates, for a voiced frame, the pulse train in a predetermined pitch section (the representative section). For an unvoiced frame, it calculates an excitation signal consisting of a combination of a pulse train and noise.

FIG. 2 is a pulse processing diagram showing how the representative section and its pulse train are obtained for a voiced frame in FIG. 1(a). (a) shows the speech waveform of one frame.

First, as shown in the subframe division (b), the frame is divided into subframes of one pitch period P'd each. For this division, the last subframe division point Ts of the immediately preceding frame (Nl-1) is stored, and the subframes are obtained by shifting this point in steps of P'd. When the preceding frame is unvoiced, another suitable method can be used, for example dividing into subframes from the left end of the frame, or finding the onset of the signal and dividing from there.

Next, as shown in the representative-section pulse train (c), a predetermined number of pulses is calculated for a predetermined pitch section. Here the subframe section nearest the center of the frame is taken as the representative section (one of the subframe sections in the figure), but other suitable choices are possible. To prevent pulses from being found adjacent to each other at the subframe boundaries, and to find the pulses efficiently, the pulse train in the representative section is calculated as follows. First, the cross-correlation function Rhx(m) is calculated over the (P'd + 2L) samples obtained by adding L samples to each side of this subframe section (the section Nc in (b)). Here L is the maximum lag of the autocorrelation function Rhh(m) obtained by the autocorrelation function calculation circuit 180. For the calculation of Rhx(m) and Rhh(m), see the methods of the cross-correlation function calculation circuit 210 and the autocorrelation function calculation circuit 180 in FIG. 1(a) of Reference 6. The pulse search is then carried out using Rhx(m) and Rhh(m), and a predetermined number of pulses is obtained in the representative section. For the pulse search, see the method described for the drive signal calculation circuit 220 in FIG. 1(a) of Reference 2.

The pulse train of the representative section obtained in this way is shown in FIG. 2(c). The amplitudes and positions of this pulse train are output to the coding circuit 230. Depending on how the representative section is delimited, a large-amplitude pulse may be cut off, so the pulse train may instead be obtained over a section extended by a few samples on each side of the representative section and then output to the coding circuit 230.
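A correlation-based pulse search of the kind referred to above can be sketched as follows. This is a generic greedy multipulse search, not necessarily the exact procedure of Reference 2: each pulse is placed at the peak of |Rhx(m)| with amplitude Rhx(m)/Rhh(0), and its contribution is then subtracted from Rhx. The function names and the `lo`/`hi` arguments, which confine pulse positions to the representative section inside the extended window, are assumptions.

```python
def cross_correlation(xw, h, num_pos):
    """Rhx[m] = sum_n xw[n] * h[n - m] for m = 0..num_pos-1."""
    return [sum(xw[n] * h[n - m]
                for n in range(m, min(len(xw), m + len(h))))
            for m in range(num_pos)]


def autocorr_h(h):
    """Rhh[m] = sum_n h[n] * h[n + m] for m = 0..len(h)-1."""
    return [sum(h[n] * h[n + m] for n in range(len(h) - m))
            for m in range(len(h))]


def search_pulses(rhx, rhh, num_pulses, lo=0, hi=None):
    """Greedy multipulse search; returns a list of (position, amplitude).

    Candidate positions are limited to lo <= m < hi.
    """
    hi = len(rhx) if hi is None else hi
    rhx = list(rhx)                       # work on a copy
    pulses = []
    for _ in range(num_pulses):
        pos = max(range(lo, hi), key=lambda m: abs(rhx[m]))
        amp = rhx[pos] / rhh[0]
        pulses.append((pos, amp))
        # Remove this pulse's contribution from the cross-correlation.
        for m in range(len(rhx)):
            lag = abs(m - pos)
            if lag < len(rhh):
                rhx[m] -= amp * rhh[lag]
    return pulses
```

With `rhx` computed over the P'd + 2L samples of section Nc, calling `search_pulses(rhx, rhh, n, lo=L, hi=L + pitch)` would restrict the pulses to the representative section itself.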

For an unvoiced frame, the excitation signal is obtained as a combination of a pulse train and noise. For this, see the method described in Reference 2 and elsewhere. Alternatively, the frame may be divided into several subframes of fixed length, with a pulse train obtained for only one of them and the excitation signal in the remaining subframe sections represented by noise. The subframe section for which the pulse train is obtained can be chosen as one in which the power of the original speech signal, or of the prediction residual signal obtained by predicting the original speech signal, is large.
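The alternative described above, in which only the most energetic fixed-length subframe receives a pulse train, can be sketched in a few lines. The function name is hypothetical; `signal` may be either the original speech or the prediction residual, as the text allows.

```python
def select_pulse_subframe(signal, subframe_len):
    """Return the index of the fixed-length subframe with the largest energy.

    Samples beyond the last complete subframe are ignored.
    """
    num = len(signal) // subframe_len
    if num == 0:
        raise ValueError("signal is shorter than one subframe")
    energies = [
        sum(v * v for v in signal[i * subframe_len:(i + 1) * subframe_len])
        for i in range(num)
    ]
    return max(range(num), key=lambda i: energies[i])
```

The remaining subframes would then be represented by noise only.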

When a pulse train is input, the coding circuit 230 codes the amplitudes and positions of the pulses and outputs these codes to the multiplexer 260. The pulse train can be coded, for example, by the same method as the coding circuit 230 described in Reference 6.

When pulse train and noise information is input, the pulse train is coded by the same method as above, and the amplitude and phase of the noise are coded with a predetermined number of bits; the codes are output to the multiplexer.

The multiplexer circuit 260 receives the code lKi from the K-parameter coding circuit 160, the code ld from the pitch coding circuit 150, and the code from the coding circuit 230, combines them, and outputs the result from the transmitting-side output terminal 270.

This completes the description of the transmitting side of the speech coding method according to the present invention.

Next, the configuration of the receiving side of the speech coding method according to the present invention is described with reference to FIG. 1(b).

The demultiplexer 290 separates the codes input from the receiving-side input terminal 280 into the code representing the K parameters, the code representing the pitch period, and the code representing the excitation information, and outputs them to the K-parameter decoding circuit 330, the pitch decoding circuit 320, and the decoding circuit 300, respectively.

The K-parameter decoding circuit 330 decodes the K parameters and outputs the decoded values K'i to the interpolation circuit 335.

The pitch decoding circuit 320 decodes the pitch period P'd and outputs it to the drive signal restoration circuit 340 and the interpolation circuit 335.

The decoding circuit 300 decodes the excitation information and outputs it to the drive signal restoration circuit 340.

The drive signal restoration circuit 340 examines the decoded pitch period P'd. If it is nonzero, the circuit judges that the frame is voiced and that a pulse train can be used as the excitation, and it divides the frame into subframes of one pitch period P'd each, using the same method as the drive signal calculation circuit 220 on the transmitting side. For the subframe section given by the predetermined representative-section position within the frame, it then generates a pulse train from the received pulse information. Next, it interpolates between the pulse train of the representative section and the pulse train of the adjacent frame to generate the excitation pulse train for one frame, thereby restoring the driving excitation signal, which it outputs to the synthesis filter circuit 350.

For this interpolation, the same method as the drive signal restoration circuit 340 described in Reference 6 can be used.
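The subframe division shared by both sides, and the placement of the representative pulse train, can be illustrated as below. This sketch deliberately simplifies the restoration step: it repeats the representative pulse train unchanged in every subframe instead of interpolating with the adjacent frame's pulses as Reference 6 does. All function names, and the convention that the stored division point is given relative to the start of the current frame, are assumptions.

```python
def subframe_starts(prev_split, pitch, frame_len):
    """Subframe start points inside the current frame.

    prev_split is the last division point of the previous frame, measured
    from the start of the current frame (so it is <= 0). Division points
    are obtained by shifting it in steps of the pitch period.
    """
    if pitch <= 0:
        raise ValueError("pitch must be positive for a voiced frame")
    t = prev_split
    while t < 0:
        t += pitch
    starts = []
    while t < frame_len:
        starts.append(t)
        t += pitch
    return starts


def representative_start(starts):
    """Start of the subframe taken as representative (roughly central)."""
    return starts[len(starts) // 2]


def restore_excitation(pulses, starts, frame_len):
    """Repeat the representative pulse train in every subframe.

    pulses is a list of (offset within subframe, amplitude). No
    interpolation with the adjacent frame is performed in this sketch.
    """
    exc = [0.0] * frame_len
    for s in starts:
        for offset, amp in pulses:
            n = s + offset
            if 0 <= n < frame_len:
                exc[n] += amp
    return exc
```

Because the division depends only on the decoded pitch period and the stored division point, the receiver can reproduce it without any side information, which is the saving the invention relies on.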

When a combination of a pulse train and noise is used as the excitation, the driving excitation signal is obtained by the same processing as the drive signal restoration circuit 340 described in Reference 2 and is output to the synthesis filter circuit 350.

The interpolation circuit 335 interpolates the decoded K parameters for each pitch period and outputs the interpolated K parameters to the synthesis filter circuit 350.
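A per-pitch-period interpolation of the K parameters might look like the following sketch. Linear interpolation between the previous and the current frame's decoded values is an assumption; the text states only that interpolation is done for each pitch period. The function name is hypothetical.

```python
def interpolate_k(k_prev, k_curr, num_subframes):
    """Linearly interpolate K parameters, one set per pitch-period subframe.

    The weight moves from the previous frame's values toward the current
    frame's values, which are reached at the last subframe.
    """
    out = []
    for s in range(num_subframes):
        w = (s + 1) / num_subframes
        out.append([(1.0 - w) * p + w * c for p, c in zip(k_prev, k_curr)])
    return out
```

Each interpolated value is a convex combination of two values inside (-1, 1), so it also lies inside (-1, 1) and the synthesis filter stays stable without a separate check.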

The synthesis filter circuit 350 receives the driving excitation signal and the interpolated K parameters, operates in the same way as the synthesis filter circuit 250 on the transmitting side to calculate one frame of the synthesized speech signal X(n), and outputs it from the receiving-side output terminal 360.

This completes the description of the receiving side of the speech coding method according to the present invention.

The embodiment described above is only one embodiment of the present invention, and various modifications are possible.

For example, in the drive signal calculation circuit 220, when the excitation is represented by a combination of a pulse train and noise, the number of pulses in that combination may be varied adaptively from 0 (that is, noise only) up to several, in order to represent the various sounds in unvoiced sections well and to achieve good transitions between unvoiced and voiced sections. In this case, information indicating the number of pulses must be transmitted (for example, about 2 bits per frame).

Various methods other than the one described in this embodiment can also be used to calculate the pulse train. For example, a method can be used in which the amplitudes of the previously found pulses are readjusted each time a new pulse is found. For details of this method, see, for example, the paper by Ono et al. entitled "A Study of Excitation Pulse Search Methods in Multipulse-Excited Speech Coding" (Proceedings of the Acoustical Society of Japan, 157, 1983) (Reference 7).

As another way of calculating the noise source, a method is known in which noise signals are generated for each subframe and the noise that minimizes the error power between the signal synthesized from it and the speech signal in the subframe section is selected. For details of this method, see, for example, the paper by B. S. Atal et al. entitled "Stochastic Coding of Speech Signals at Very Low Bit Rates" (Proc. ICC 84, pp. 1610-1613, 1984) (Reference 8). As yet another method, it is known to obtain the amplitude and phase of the noise source from the prediction residual signal obtained by predicting the speech signal. Since this method does not require the speech signal to be synthesized, it reduces the amount of computation, but the quality deteriorates. For details of this method, see the paper by Oyama entitled "A Linear Predictive Analysis-Synthesis Method Using a Driving Excitation in Which the Residual Is Modeled by Noise" (Proceedings of the Acoustical Society of Japan, October 1984, pp. 165-166) (Reference 9).
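The first of these methods, selecting the noise that minimizes the error power after synthesis, can be sketched as a small codebook search. This is a generic illustration rather than the procedure of Reference 8: the synthesis is approximated by convolution with a truncated impulse response `h`, the gain is chosen optimally for each candidate, and all names are hypothetical.

```python
def convolve_truncated(c, h):
    """Convolve c with h and keep only the first len(c) output samples."""
    return [sum(c[j] * h[i - j]
                for j in range(max(0, i - len(h) + 1), i + 1))
            for i in range(len(c))]


def select_noise(target, candidates, h):
    """Return (index, gain, error power) of the best noise candidate.

    For each candidate, the synthesized signal y is scaled by the gain
    g = <target, y> / <y, y>, which minimizes the error power
    |target - g * y|^2 = <target, target> - <target, y>^2 / <y, y>.
    Returns None if every candidate synthesizes to silence.
    """
    tt = sum(t * t for t in target)
    best = None
    for idx, c in enumerate(candidates):
        y = convolve_truncated(c, h)
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue
        ty = sum(t * v for t, v in zip(target, y))
        err = tt - ty * ty / yy
        if best is None or err < best[2]:
            best = (idx, ty / yy, err)
    return best
```

Each candidate has to be passed through the synthesis filter, which is where the extra computation relative to the residual-based method comes from.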

In the embodiment, the K-parameter values were held constant within the frame when the pulse train was obtained (that is, the characteristics of the synthesis filter did not change within the frame), but the pulses may instead be obtained while the K-parameter values are varied smoothly from subframe to subframe. Specifically, the K-parameter values are interpolated for each subframe using the K-parameter values of the preceding and following frames, converted into prediction coefficients, and output to the weighting circuit 200 and the impulse response calculation circuit 170; the cross-correlation and autocorrelation functions are then calculated using the impulse response of the representative section, and the pulse train is obtained. This gives a spectrum that changes more smoothly in time, so that higher-quality speech can be synthesized.

The pulse train and the K parameters may be interpolated in synchronism with the pitch period, taking the representative pitch section as the reference, or the interpolation may take a predetermined pitch section (for example, the pitch section near the center of the frame) as the reference.

The pitch period can also be processed in the same way as the K parameters.

Interpolation methods other than linear interpolation may also be used for these parameters. For example, logarithmic interpolation may be used for the pulse train and the pitch period. For the parameters of the synthesis filter, interpolation may instead be performed on, for example, the linear prediction coefficients (in which case the stability of the filter must be checked), the log area function, the formant parameters, or the autocorrelation function. For the specifics of these methods, reference may be made to, for example, the paper by B. S. Atal et al. entitled "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" (J. Acoust. Soc. Am., pp. 637-655, 1971) (Reference 10).
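Two of the alternatives mentioned here can be sketched in Python, purely as an illustration: logarithmic interpolation of the pitch period, and the stability check needed when linear prediction coefficients are interpolated directly. The function names and the convention A(z) = 1 + sum a[i] z^-i are assumptions of this sketch. The stability check uses the step-down recursion, which is one common way to perform it and is not stated in the text.

```python
import math


def log_interpolate(x0, x1, w):
    """Interpolate two positive values linearly in the log domain (0 <= w <= 1)."""
    return math.exp((1.0 - w) * math.log(x0) + w * math.log(x1))


def prediction_coefficients_to_k(a):
    """Step-down recursion: a[1..p] of A(z) = 1 + sum a[i] z^-i -> K parameters.

    Returns None as soon as a K parameter with magnitude >= 1 is found,
    which means the synthesis filter 1/A(z) is not stable.
    """
    a = list(a)
    k = []
    while a:
        m = len(a)
        km = a[-1]  # k_m equals the highest-order coefficient a_m(m)
        if abs(km) >= 1.0:
            return None
        k.append(km)
        denom = 1.0 - km * km
        # a_i(m-1) = (a_i(m) - k_m * a_(m-i)(m)) / (1 - k_m^2) for i < m.
        a = [(a[i] - km * a[m - 2 - i]) / denom for i in range(m - 1)]
    k.reverse()
    return k


def is_stable(a):
    """True if every K parameter derived from a[1..p] has magnitude below 1."""
    return prediction_coefficients_to_k(a) is not None
```

In this sketch, a set of interpolated prediction coefficients that fails `is_stable` would have to be replaced, for example by the coefficients of the nearest frame. Pulse amplitudes can be negative, so logarithmic interpolation of the pulse train would need separate handling of the sign, which is not shown.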

Other methods may also be used to select the representative pitch section.

In this embodiment, the analysis of the K parameters and the calculation of the excitation pulse train were performed with a fixed frame length, but the frame length may be made variable. In that case, the frame length can be shortened in transitional portions of the speech and lengthened in stationary portions, so the transmission bit rate can be reduced.

As is well known in the field of digital signal processing, the autocorrelation function can also be calculated from the power spectrum, and the cross-correlation function can also be calculated from the cross-power spectrum. These correspondences are described in detail in, for example, Chapter 8 of the book by A. V. Oppenheim et al. entitled "Digital Signal Processing" (Reference 11), so their explanation is omitted here.
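The relationship between the autocorrelation function and the power spectrum can be checked numerically with a short Python sketch. The naive DFT, the function names, and the zero-padding length are choices made for this illustration; a practical implementation would normally use an FFT routine.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def autocorrelation_direct(h, max_lag):
    """Autocorrelation computed in the time domain, for lags 0..max_lag."""
    return [sum(h[t] * h[t + lag] for t in range(len(h) - lag))
            for lag in range(max_lag + 1)]


def autocorrelation_from_power_spectrum(h, max_lag):
    """Autocorrelation obtained as the inverse DFT of the power spectrum."""
    # Zero-pad by max_lag samples so that the circular correlation produced
    # by the DFT equals the linear correlation for lags 0..max_lag.
    padded = list(h) + [0.0] * max_lag
    power = [abs(v) ** 2 for v in dft(padded)]
    r = dft(power, inverse=True)
    return [r[lag].real for lag in range(max_lag + 1)]
```

For a real sequence such as an impulse response, both functions return the same values up to rounding error. The cross-correlation function follows in the same way from the cross-power spectrum, that is, the product of one spectrum and the complex conjugate of the other.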

All of the above can be readily implemented without departing from the spirit of the present invention.

[Effects of the invention]

As described above, according to the present invention, in voiced sections the excitation signal of one frame is represented by a pulse train that is searched for in a predetermined pitch section by exploiting the periodicity of the speech signal, so the amount of computation required for the pulse search and for the search for a representative section can be greatly reduced. Moreover, since there is no need to send auxiliary information indicating the position of the representative section or the starting point of the subframes within the frame, that much more information can be allocated to the pulse train. Consequently, very high-quality speech can be synthesized with a small amount of computation even at a low bit rate, and hardware implementation is extremely easy.

In unvoiced sections, on the other hand, the excitation signal is represented by a combination of a pulse train and noise, so that various consonant waveforms and transient speech waveforms can be represented extremely well. There is also the effect that degradation of sound quality is markedly reduced even when the voiced/unvoiced decision is in error.

[Brief description of drawings]

FIG. 1(a) is a block diagram showing the configuration of an embodiment of the transmitting side of the speech coding method and apparatus according to the present invention. FIG. 1(b) is a block diagram showing the configuration of an embodiment of the receiving side of the speech coding method and apparatus according to the present invention. FIG. 2 is a pulse processing explanatory diagram showing how the representative section and the pulse train within that section are obtained for a voiced frame in FIG. 1(a). FIG. 3 is a pulse processing explanatory diagram showing an example of the conventional method for obtaining the pulse train of a representative section in a voiced section.

110: buffer memory, 130: pitch analysis circuit, 140: K parameter calculation circuit, 150: pitch coding circuit, 160: K parameter coding circuit, 170: impulse response calculation circuit, 180: autocorrelation function calculation circuit, 220: drive signal calculation circuit, 225: noise memory, 230: coding circuit, 260: multiplexer, 290: demultiplexer, 300: decoding circuit, 310: noise memory, 320: pitch decoding circuit, 330: K parameter decoding circuit, 335: interpolation circuit, 340: drive signal restoration circuit, 350: synthesis filter circuit.

Claims

(57) [Claims]

1. A speech coding method characterized in that: on the transmitting side, a discrete speech signal is input, a spectrum parameter representing the short-time spectral envelope and a pitch parameter representing the pitch are extracted from the speech signal for each frame, the frame section is divided into a plurality of pitch sections whose period is equal to the pitch parameter, the excitation signal of one pitch section is represented by a multi-pulse train or by a combination of noise and a multi-pulse train, and information representing the excitation signal is output in combination with the pitch parameter and the spectrum parameter; and on the receiving side, the frame is divided into pitch sections on the basis of the pitch parameter, the excitation signal of one pitch section is restored on the basis of the information representing the excitation signal so as to restore the drive excitation signal of the entire frame, and the speech signal is synthesized using the spectrum parameter.

2. A speech coding apparatus comprising: a parameter calculation circuit that extracts, from an input speech signal and for each frame, a spectrum parameter representing the short-time spectral envelope and a pitch parameter representing the pitch, and encodes them; a drive signal calculation circuit that divides the frame section into a plurality of pitch sections whose period is equal to the pitch parameter, represents the excitation signal of one pitch section by a multi-pulse train or by a combination of noise and a multi-pulse train, and encodes it; and a multiplexer circuit that combines and outputs the output code of the parameter calculation circuit and the output code of the drive signal calculation circuit.