JP2615548B2

JP2615548B2 - Highly efficient speech coding system and its device.

Info

Publication number: JP2615548B2
Application number: JP60178911A
Authority: JP
Inventors: 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1985-08-13
Filing date: 1985-08-13
Publication date: 1997-05-28
Anticipated expiration: 2012-05-28
Also published as: JPS6238500A

Description

(Field of Industrial Application) The present invention relates to a coding method and apparatus for encoding a speech signal with high quality at a low bit rate.

(Prior Art) The vocoder is a well-known method for encoding a speech signal at a low transmission bit rate (for example, about 4.8 kbps). Its principle is described in detail in, for example, M. R. Schroeder, "Vocoders: Analysis and Synthesis of Speech" (Proc. IEEE, pp. 720-734, May 1966) (Reference 1). The LPC vocoder is known as a vocoder that uses linear prediction analysis; it is described in detail in, for example, J. D. Markel et al., "A Linear Prediction Vocoder Based upon the Autocorrelation Method" (IEEE Trans. A.S.S.P., pp. 124-134, April 1974) (Reference 2). The present invention improves the sound source (excitation) section of the vocoder and is closely related to the LPC vocoder, so the LPC vocoder is outlined below with a focus on the configuration of its synthesis section.

FIG. 4 is a block diagram of the synthesis section (receiving section) of the LPC vocoder described in Reference 2. The synthesis section consists of a sound source generator 500 and a synthesis filter 510. The sound source generator 500 comprises an impulse generator 501, a noise generator 502, a voiced/unvoiced switching circuit 503, and a gain circuit 504. In the vocoder, the speech signal is classified as either voiced or unvoiced at short intervals (for example, every 20 msec). For voiced speech, the impulse generator 501 generates a pulse train whose pulses are spaced by the pitch period Pd. For unvoiced speech, the noise generator 502 generates white noise. The switching circuit 503 performs the voiced/unvoiced control. The gain circuit 504 applies a gain G to the signal generated in this way, and the result is output to the synthesis filter 510 as the sound source signal d(n).
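The two-state sound source generator described above can be sketched in a few lines. This is a minimal illustration only, not the circuit of Reference 2: the function name `make_excitation`, the seeded Gaussian noise generator, and the 160-sample frame (20 msec at an assumed 8 kHz sampling rate) are all assumptions introduced here.

```python
import random


def make_excitation(n_samples, voiced, pitch_period, gain, seed=0):
    """Return n_samples of a two-state vocoder sound source signal d(n).

    Hypothetical helper: voiced frames get one unit impulse per pitch period,
    unvoiced frames get white noise; both are scaled by the gain G.
    """
    if voiced:
        # Impulse generator 501: pulses spaced by the pitch period.
        d = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        # Noise generator 502: white Gaussian noise (seeded for repeatability).
        rng = random.Random(seed)
        d = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Gain circuit 504.
    return [gain * v for v in d]


# One voiced frame: 160 samples, pitch period of 40 samples, gain 2.
frame = make_excitation(160, voiced=True, pitch_period=40, gain=2.0)
```

With these example values the voiced frame contains four impulses of amplitude 2, which is exactly the "one impulse per pitch period" source that the invention sets out to improve.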

The synthesis filter 510 synthesizes and outputs the speech x(n) from the sound source signal d(n) and the filter parameters Ki. The pitch period Pd, the voiced/unvoiced switching signal (V/UV), the gain G, and the filter parameters Ki are calculated on the analysis side (transmitting side) at predetermined intervals and transmitted to the receiving side.

(Problems to Be Solved by the Invention) In the LPC vocoder described above, the transmitted information consists of the pitch period, the voiced/unvoiced decision, the gain, and the filter parameters. Because the speech signal can be synthesized from this information alone, the transmission bit rate can be made low (for example, about 4.5 kbps). With this conventional method, however, it is difficult to synthesize speech of good quality. For voiced speech the sound source is represented by a single impulse per pitch period and carries no phase information, so naturalness is considerably impaired and the synthesized speech has a so-called mechanical sound. In addition, because speech is divided into two extreme classes, voiced and unvoiced, and the sound source is switched between an impulse source and a noise source, a voiced/unvoiced decision error causes severe quality degradation. The sound source also cannot be represented well at transitions between unvoiced and voiced speech, which causes degradation. Furthermore, an error in the estimated pitch period causes severe quality degradation.

As a method of improving the sound source, it is known, as described in, for example, the specification of Japanese Patent Application No. 59-272435 (Reference 3), to represent the sound source by a combination of pulse trains and to transmit the pulse train of a representative pitch section. This method alleviates the problems described above and gives good quality in voiced sections where the pitch period is clear. However, in unvoiced sections, where the pitch is not clear and the sound source is noise-like, and in transitions between unvoiced and voiced sections, the sound source cannot be represented well when the transmission bit rate is low, and quality degrades.

An object of the present invention is to provide a high-efficiency speech coding system and apparatus capable of synthesizing high-quality speech with a relatively small amount of computation, even at a low transmission bit rate of about 4.8 kbps.

(Means for Solving the Problems) In the high-efficiency speech coding system of the present invention, the transmitting side receives a discrete speech signal and divides it into predetermined time intervals; extracts from the speech signal a spectral parameter representing the short-time spectral envelope and a pitch parameter representing the pitch period; divides the speech signal of each time interval into subsections according to the pitch period; represents the sound source for the speech signal either by the pulse train of a representative subsection or by a combination of a pulse train and noise; and outputs the information representing the sound source in combination with the pitch parameter and the spectral parameter. The receiving side restores the driving sound source signal from the pitch parameter and the information representing the sound source by applying, to the pulse train of the representative subsection, processing that gives it a temporally smooth change from one pitch period to the next, and synthesizes the speech signal using the spectral parameter.

The encoding apparatus of the present invention comprises: a parameter calculation circuit that divides an input speech signal into predetermined time intervals and extracts and encodes, from the speech signal, a spectral parameter representing the short-time spectral envelope and a pitch parameter representing the pitch period; a drive signal calculation circuit that divides the speech signal of each time interval into subsections according to the pitch period, obtains, as the sound source for representing the speech signal, either the pulse train of a representative subsection or a combination of a pulse train and noise, and encodes the information representing the sound source; and a multiplexer circuit that combines and outputs the output code of the parameter calculation circuit and the output code of the drive signal calculation circuit.

The decoding apparatus of the present invention comprises: a demultiplexer circuit that receives a code sequence in which a code representing a pitch parameter, a code representing a spectral parameter, and a code representing sound source information are combined, and that separates and decodes these codes; a driving sound source signal restoration circuit that restores the driving sound source signal either by applying to the pulse train of the representative section, on the basis of the decoded pitch parameter and the decoded sound source information, processing that gives it a temporally smooth change from one pitch period to the next, or, when a pulse train and noise are used as the sound source, by generating the pulse train and the noise from the sound source information; and a synthesis filter circuit that synthesizes and outputs a speech signal from the driving sound source signal and the decoded spectral parameter.

(Operation) The present invention is characterized in that, of two candidate sound source signals, one represented by the pulse train of a representative one-pitch section that exploits the periodicity of the speech signal as described in Reference 3, and one formed by a combination of pulses and a noise source, it selects the one that represents the speech signal better. The pulse train of the representative one-pitch section can be obtained by the method described in Reference 3. Besides the method of Reference 3, the amplitudes and positions of the pulses can be obtained by, for example, the analysis-by-synthesis (A-b-S) technique, which is described in B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates" (Proc. I.C.A.S.S.P., pp. 614-617, 1982) (Reference 4).

For the sound source that combines pulses and noise, a predetermined number of pulses is first obtained for the whole frame, and the amplitude and phase of the noise source are then calculated.

(Embodiments) Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1(a) is a block diagram showing an embodiment of the transmitting side of the high-efficiency speech coding system according to the present invention, and FIG. 1(b) is a block diagram showing an embodiment of the receiving side.

In FIG. 1(a), the speech signal X(n) is input and a predetermined number of samples is stored in the buffer memory circuit 110. The K parameter calculation circuit 140 then reads a predetermined number of speech samples from the buffer memory circuit 110 and calculates the K parameters, which represent the spectral envelope of the speech signal. The K parameters are identical to the PARCOR coefficients. The autocorrelation method is a well-known way of calculating them; it is described in detail in John Makhoul et al., "Quantization Properties of Transmission Parameters in Linear Predictive Systems" (IEEE Trans. A.S.S.P., pp. 309-321, 1983) (Reference 5), so its description is omitted here. Returning to FIG. 1(a), the K parameters K_i are output to the K parameter encoding circuit 160. The K parameter encoding circuit 160 encodes K_i with a predetermined number of quantization bits and outputs the code l_i to the multiplexer 260. It also decodes l_i to obtain the decoded K parameters K_i′, converts them into prediction coefficients a_i′, and outputs these to the impulse response calculation circuit 170 and the weighting circuit 200. The decoded K parameters K_i′ are also output to the interpolation circuit 255.
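As a rough illustration of the autocorrelation method and of the conversion from K parameters to prediction coefficients, the following sketch uses the textbook Levinson-Durbin recursion and the matching step-up recursion. It is not taken from Reference 5. The function names, the convention A(z) = 1 + Σ a[i] z^-i, and the sign of the reflection coefficients are assumptions; PARCOR sign conventions differ between authors.

```python
def levinson_durbin(r, order):
    """Autocorrelation method (sketch): r[0..order] -> (a, k).

    a holds the coefficients of A(z) = 1 + sum_i a[i] z^-i and k the
    reflection (K / PARCOR) coefficients under the sign convention a[m] = k[m-1].
    """
    a = [1.0] + [0.0] * order
    k = [0.0] * order
    err = r[0]  # prediction error power, assumed to stay positive
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k_m = -acc / err
        k[m - 1] = k_m
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k_m * prev[m - i]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, k


def step_up(k):
    """Convert (decoded) K parameters back to prediction coefficients."""
    a = [1.0]
    for k_m in k:
        a = [1.0] + [a[i] + k_m * a[len(a) - i] for i in range(1, len(a))] + [k_m]
    return a


# Autocorrelation of a first-order process with correlation 0.5 per lag.
a, k = levinson_durbin([1.0, 0.5, 0.25], order=2)
a_from_k = step_up(k)
```

In an encoder of the kind described here, `step_up` would be applied to the quantized-and-decoded K parameters, so that the transmitting side works with the same coefficients the receiver will reconstruct.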

The pitch analysis circuit 130 calculates the pitch period P_d from the output of the buffer memory circuit 110. P_d can be calculated by, for example, the method described in R. V. Cox et al., "Real-Time Implementation of Time Domain Harmonic Scaling of Speech Signals" (IEEE Trans. A.S.S.P., pp. 258-272, 1983) (Reference 6).

The pitch encoding circuit 150 quantizes and encodes the pitch period P_d with a predetermined number of quantization bits and outputs the code l_d to the multiplexer 260. It also outputs the decoded value P_d′ to the drive signal calculation circuit 220.

The impulse response calculation circuit 170 receives the prediction coefficients a_i′ from the K parameter encoding circuit 160 and calculates the impulse response h_w(n), which represents the transfer function of the weighted synthesis filter. h_w(n) can be calculated by, for example, the same method as the impulse response calculation circuit 210 shown in FIG. 4(a) of the specification of Japanese Patent Application No. 59-042305 (Reference 7). The impulse response h_w(n) is output to the autocorrelation function calculation circuit 180 and the cross-correlation function calculation circuit 210.
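The method of Reference 7 is not reproduced in this text. As a hedged sketch, one common choice in multi-pulse coders is to take the weighted synthesis filter as 1 / A(z/γ), that is, the all-pole filter with bandwidth-expanded coefficients a_i·γ^i. The weighting factor `gamma`, the function name, and the convention A(z) = 1 + Σ a[i] z^-i are assumptions made for this illustration.

```python
def weighted_impulse_response(a, gamma, length):
    """Impulse response h_w(n) of 1 / A(z/gamma) (assumed filter form).

    a[0] must be 1.0; a[1:] are the prediction coefficients a_i'.
    """
    order = len(a) - 1
    # Bandwidth expansion: a_i -> a_i * gamma**i.
    aw = [a[i] * gamma ** i for i in range(order + 1)]
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0  # unit impulse input
        for i in range(1, min(n, order) + 1):
            acc -= aw[i] * h[n - i]   # all-pole recursion
        h.append(acc)
    return h


# First-order example: A(z) = 1 - 0.5 z^-1, gamma = 0.8.
h_w = weighted_impulse_response([1.0, -0.5], gamma=0.8, length=4)
```

For this first-order example the response decays geometrically with ratio 0.4, which shows how γ < 1 shortens the effective impulse response that the later correlation computations have to cover.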

The autocorrelation function calculation circuit 180 receives the impulse response h_w(n) from the impulse response calculation circuit 170, calculates the autocorrelation function R_hh(m), and outputs it to the drive signal calculation circuit 220. R_hh(m) can be calculated by, for example, the same method as the autocorrelation function calculation circuit 180 described in Reference 7.

Next, the subtracter 120 subtracts one frame of the output of the synthesis filter circuit 250 from the speech signal X(n) in the buffer memory circuit 110 and outputs the result e(n) to the weighting circuit 200. The weighting circuit 200 receives e(n) and the prediction coefficients a_i′ from the K parameter encoding circuit 160, applies weighting to e(n), and outputs the weighted signal e_w(n). The weighting can be computed by, for example, the same method as the weighting circuit 410 shown in FIG. 4(a) of Reference 7.

The cross-correlation function calculation circuit 210 receives e_w(n) from the weighting circuit 200 and the impulse response h_w(n) from the impulse response calculation circuit 170, calculates the cross-correlation function Ψ_hx(m), and outputs it to the drive signal calculation circuit 220. Ψ_hx(m) can be calculated by, for example, the same method as the cross-correlation function calculation circuit 210 described in Reference 7.
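To show how R_hh(m) and Ψ_hx(m) are typically used together, here is a sketch of a correlation-domain multi-pulse search of the kind commonly paired with these two functions. It is an assumption that the referenced applications use exactly this greedy form; the function names and the truncated-sum definitions of the correlations are hypothetical choices made for this example.

```python
def autocorrelation(h, max_lag):
    """R_hh(m) = sum_n h(n) h(n+m), for m = 0..max_lag."""
    return [sum(h[n] * h[n + m] for n in range(len(h) - m))
            for m in range(max_lag + 1)]


def cross_correlation(e_w, h):
    """Psi_hx(m) = sum_n e_w(n) h(n-m), for every candidate position m."""
    n_len = len(e_w)
    return [sum(e_w[n] * h[n - m] for n in range(m, min(n_len, m + len(h))))
            for m in range(n_len)]


def search_pulses(psi, r_hh, n_pulses):
    """Greedy search: place each pulse where |Psi| is largest, then remove
    that pulse's contribution from Psi using R_hh."""
    psi = list(psi)
    pulses = []
    for _ in range(n_pulses):
        m_best = max(range(len(psi)), key=lambda m: abs(psi[m]))
        g = psi[m_best] / r_hh[0]          # amplitude of the new pulse
        pulses.append((m_best, g))
        for m in range(len(psi)):
            lag = abs(m - m_best)
            if lag < len(r_hh):
                psi[m] -= g * r_hh[lag]
    return pulses


# Target: a single pulse of amplitude 2 at position 3, filtered by h.
h = [1.0, 0.5]
e_w = [0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0]
r_hh = autocorrelation(h, max_lag=1)
pulses = search_pulses(cross_correlation(e_w, h), r_hh, n_pulses=1)
```

Working in the correlation domain avoids re-synthesizing the frame for every candidate pulse position, which is consistent with the stated aim of a relatively small amount of computation.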

Next, the drive signal calculation circuit 220 first calculates, as a sound source signal representing the speech signal, the pulse train of a representative pitch section. It then calculates a sound source signal formed from a pulse train and a noise source, and selects whichever of the two represents the speech signal better. As a simple way of judging whether the pitch is clear, the pitch gain P_g can be used.

The way of obtaining the sound source signals is described below. The pulse train of the representative pitch section can be calculated by, for example, the same method as the drive signal calculation circuit 220 described in Reference 3, so only a brief description is given here.

First, the frame is divided into subframes of one pitch period P_d′ each. This division requires the pitch excitation position, which can be found by obtaining the pulse train that represents the sound source: the position of the first pulse obtained gives the pitch excitation position. The pulse train can be calculated by, for example, the method of equation (21) in the specification of Japanese Patent Application No. 57-231606 (Reference 8). FIG. 2(a) shows the speech waveform of one frame, and FIG. 2(b) shows the first pulse obtained, g1, and the subframes produced by dividing the frame at positions derived from this pulse. Next, a predetermined number of pulses is calculated for each subframe. One way of choosing the representative pitch section is to take a subframe near the center of the frame as the representative pitch section and the pulses it contains as the representative pulses. FIG. 2(c) shows the representative pitch pulses obtained in this way. The amplitudes and positions of the pulse train of the representative pitch section are output to the encoding circuit 230. The subframe phase T and, as the representative pitch position, the subframe number of the representative pitch section (3 in FIG. 2(c)) are encoded with a predetermined number of bits and output to the multiplexer 260.
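The subframe division can be sketched as follows. The text does not define the subframe phase T precisely, so this example assumes it is the offset of the first subframe boundary, taken as the first pulse position modulo the pitch period; the function name, the handling of the partial subframes at the frame edges, and the zero-based subframe index are likewise assumptions.

```python
def split_into_subframes(frame_len, pitch_period, first_pulse_pos):
    """Divide a frame into pitch-period subframes aligned to the first pulse.

    Returns (phase, bounds, rep) where bounds is a list of (start, end)
    sample ranges and rep is the index of the subframe nearest the frame centre.
    """
    # Assumed definition of the subframe phase T.
    phase = first_pulse_pos % pitch_period
    starts = range(phase, frame_len, pitch_period)
    # The last subframe is clipped at the frame end; samples before `phase`
    # are left to the previous frame's final subframe in this sketch.
    bounds = [(s, min(s + pitch_period, frame_len)) for s in starts]
    centre = frame_len / 2.0
    rep = min(range(len(bounds)),
              key=lambda i: abs((bounds[i][0] + bounds[i][1]) / 2.0 - centre))
    return phase, bounds, rep


# 160-sample frame, pitch period 40, first pulse found at sample 95.
phase, bounds, rep = split_into_subframes(160, 40, 95)
```

Only `phase` and `rep` need to be transmitted in addition to the pitch period, since the receiver can rebuild the same boundaries from them.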

Next, the way of obtaining the sound source formed from pulses and noise is described. First, a predetermined number L of pulses is obtained for the whole frame by the method described above. A signal is synthesized from these pulses, and the signal X′(n) is obtained by subtracting this synthesized signal from the original speech signal X(n). The noise source is then chosen so that it represents X′(n) well. The specific calculation is as follows. Let q(n) be the noise source, G its amplitude, and h(n) the impulse response of the synthesis filter. The error power ε between the signal synthesized from the noise source and the signal X′(n) is then expressed by the following equation.

(Equation not reproduced in this text.)

The amplitude G of the noise source can be determined so as to minimize the above expression.
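The error-power expression and the equation later cited as (1b) were typeset as images and are missing from this text. Under the definitions given here (residual X′(n), noise source q(n), amplitude G, impulse response h(n)), a least-squares formulation would plausibly read as follows. The frame length N, the convolution notation q(n) * h(n), and the label (1a) are assumptions and are not taken from the original.

```latex
\begin{align}
\varepsilon &= \sum_{n=1}^{N} \Bigl[ X'(n) - G \,\bigl( q(n) * h(n) \bigr) \Bigr]^{2} \tag{1a} \\
G &= \frac{\displaystyle\sum_{n=1}^{N} X'(n)\,\bigl( q(n) * h(n) \bigr)}
          {\displaystyle\sum_{n=1}^{N} \bigl( q(n) * h(n) \bigr)^{2}} \tag{1b}
\end{align}
```

Setting the derivative of (1a) with respect to G to zero gives (1b), which is the usual closed-form gain for a single fixed excitation shape.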

Specifically, a predetermined number of noise source patterns (for example, B patterns) is stored in advance in the noise memory 225, and the noise sources are read out of the noise memory 225 one at a time. For each one, the optimum amplitude G is obtained from equation (1b) and the resulting error power is calculated. This is repeated for all B noise sources, and the noise source that minimizes the error power is selected. The sound source calculation described above is particularly effective when the characteristics of the sound source change gradually, as in the transition between an unvoiced section and a voiced section.
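The exhaustive search over the B stored noise patterns can be sketched as below. The code assumes a least-squares gain of the form G = Σ X′(n)·y(n) / Σ y(n)², where y(n) is the noise pattern filtered by h(n); this is an assumed reading of equation (1b), which is not reproduced in this text. The function names and the tiny three-entry codebook are hypothetical.

```python
def convolve(x, h, n_out):
    """First n_out samples of the convolution of x with h."""
    return [sum(x[k] * h[n - k]
                for k in range(max(0, n - len(h) + 1), min(n, len(x) - 1) + 1))
            for n in range(n_out)]


def search_noise_codebook(target, codebook, h):
    """Return (index, gain, error) of the noise pattern that best matches
    the residual target X'(n) after filtering by h(n)."""
    best = None
    for index, q in enumerate(codebook):
        y = convolve(q, h, len(target))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # a silent pattern cannot represent anything
        gain = sum(t * v for t, v in zip(target, y)) / energy  # optimum G
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best


h = [1.0, 0.5]
codebook = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [1.0, -1.0, 1.0, -1.0]]
# Residual built from pattern 1 with amplitude 3, so the search should find it.
target = [3.0 * v for v in convolve(codebook[1], h, 4)]
best = search_noise_codebook(target, codebook, h)
```

The cost grows linearly with B, so the number of stored patterns trades bit rate and computation against how closely the noise-like part of the residual can be matched.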

Of the two sound sources obtained as described above, the one that gives the smaller error power between the speech signal and the synthesized signal is selected, and the sound source information representing it is output to the encoding circuit 230.

When a pulse train is input, the encoding circuit 230 encodes the amplitudes and positions of the pulses and outputs the resulting codes to the multiplexer 260. It also outputs the decoded amplitudes and positions g_i′, m_i′ to the drive signal restoration circuit 240. The pulses can be encoded by, for example, the same method as the encoding circuit 250 described in Reference 8.

When information on pulses and a noise source is input, the pulse train is encoded by the same method as above, and for the noise source the amplitude and the type of noise are encoded with a predetermined number of bits; the codes are output to the multiplexer 260. The decoded values are also output to the drive signal restoration circuit 240.

The drive signal restoration circuit 240 uses the decoded values received from the encoding circuit 230 to generate one frame of the sound source signal, and outputs it to the synthesis filter circuit 250 as the driving sound source signal.

When a pulse train is used as the sound source, the interpolation circuit 255 receives the pitch period P_d′, the subframe phase T, and the representative pitch position, and interpolates the K parameters for each subframe of length P_d′. The interpolation is linear and uses the K parameter values of the previous frame and the following frame. FIG. 3 illustrates it: the i-th K parameter of the j-th frame, K_{i,j}, is interpolated for each subframe using the previous-frame value K_{i,j−1} and the following-frame value K_{i,j+1}. The interpolated K parameters are output to the synthesis filter circuit 250.
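FIG. 3 is not available in this text, so the exact interpolation rule is unknown. The sketch below shows one plausible reading: each frame's K parameters are treated as valid at the frame centre, and the value for a subframe is obtained by linear interpolation at the subframe centre. The anchoring at frame centres, the function name, and the argument layout are all assumptions.

```python
def interpolate_k(k_prev, k_cur, k_next, frame_len, subframe_bounds):
    """Linearly interpolate K parameters for each subframe of frame j.

    k_prev, k_cur, k_next: K parameter lists for frames j-1, j, j+1.
    subframe_bounds: list of (start, end) sample ranges within frame j.
    """
    half = frame_len / 2.0
    out = []
    for start, end in subframe_bounds:
        t = (start + end) / 2.0          # subframe centre
        if t < half:
            # Between the centre of frame j-1 (at -half) and frame j (at +half).
            w = (t + half) / frame_len
            lo, hi = k_prev, k_cur
        else:
            # Between the centre of frame j and the centre of frame j+1.
            w = (t - half) / frame_len
            lo, hi = k_cur, k_next
        out.append([(1.0 - w) * a + w * b for a, b in zip(lo, hi)])
    return out


# One K parameter per frame, three subframes in a 160-sample frame.
k_sub = interpolate_k([0.0], [0.8], [0.0], 160, [(0, 40), (60, 100), (120, 160)])
```

Because both encoder and decoder know the subframe boundaries, the same interpolated values can be produced on each side without transmitting anything beyond the per-frame K parameters.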

When pulses and noise are used as the sound source, the interpolation is performed for each predetermined sample section, and the interpolated K parameters are output to the synthesis filter circuit 250. The synthesis filter circuit 250 receives the driving sound source signal and the interpolated K parameters and calculates one frame of the response signal. This calculation can use, for example, the same method as the synthesis filter circuit 320 described in Reference 8.

The multiplexer circuit 260 receives the code l_ki from the K parameter encoding circuit 160, the code l_d from the pitch encoding circuit 150, and the code from the encoding circuit 230; when a pulse train is used, it also receives the subframe phase and the representative pitch position. It combines these and outputs them from the transmitting-side output terminal 270. This completes the description of the transmitting side of the high-efficiency speech coding system according to the present invention.

Next, the configuration of the receiving side of the speech coding system according to the present invention is described with reference to FIG. 1(b).

The demultiplexer 290 separates the codes input from the receiving-side input terminal 280 into the code representing the K parameters, the code representing the pitch period, and the code representing the sound source information, and outputs them to the K parameter decoding circuit 330, the pitch decoding circuit 320, and the decoding circuit 300, respectively.

The K parameter decoding circuit 330 decodes the K parameters and outputs the decoded values K_i′ to the interpolation circuit 335.

The pitch decoding circuit 320 decodes the pitch period P_d′ and outputs it to the drive signal restoration circuit 340 and the interpolation circuit 335.

The decoding circuit 300 decodes the sound source information and outputs it to the drive signal restoration circuit 340.

The drive signal restoration circuit 340 examines the decoded pitch period P_d′. If it is nonzero, the circuit determines that a pulse train is used as the sound source, separates and decodes the codes representing the subframe phase and the representative pitch position from the sound source information, and uses them to divide the frame into subframes of one pitch period P_d′ each. It then generates pulses of amplitude g′ at positions m′ in the subframe indicated by the representative pitch position. Next, it obtains the pulses of each remaining subframe by interpolation, using the representative pitch pulses of the current frame and the representative pulses of the previous and following frames. In this way pulses are generated for the whole frame, and the restored driving sound source signal is output to the synthesis filter circuit 350.

When pulses and noise are used as the sound source, the codes representing the amplitudes and positions of the pulse train and the amplitude and type of the noise source are separated from the sound source information and decoded. For the noise source, the noise memory 310, which stores the same noise as the transmitting-side noise memory circuit 225, is read for a predetermined number of samples, with the decoded type used as the read start position; the noise signal read out is multiplied by the amplitude G to reproduce the sound source. Letting q_i(n) be the sample values of the noise signal, the sound source signal v(n) is given by the following equation.

v(n) = G · q_i(n)   (2)

Here i denotes the type of noise signal stored in the noise memory 310.

The decoded pulse train is added to the sound source signal of the above equation to restore the driving sound source signal, which is output to the synthesis filter circuit 350.
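
The noise read-out of equation (2) and the addition of the decoded pulse train can be sketched as follows. This is only an illustrative sketch: the function name `restore_excitation`, the representation of pulses as (position, amplitude) pairs, and the sample values are assumptions made for the example and are not taken from the specification.

```python
def restore_excitation(noise_memory, start, gain, pulses, length):
    """Return v(n) = G * q_i(n) plus the decoded pulses for one frame.

    All names here are hypothetical; `start` plays the role of the decoded
    noise type i, used as the read start position in the noise memory.
    """
    # Read `length` samples from the noise memory at the decoded position.
    segment = noise_memory[start:start + length]
    if len(segment) < length:
        raise ValueError("noise memory too short for the requested read")
    # Equation (2): scale the noise samples by the decoded amplitude G.
    excitation = [gain * q for q in segment]
    # Add each decoded pulse (position m', amplitude g') onto the noise.
    for position, amplitude in pulses:
        excitation[position] += amplitude
    return excitation


noise_memory = [0.5, -1.0, 0.25, 1.0, -0.5, 0.75, -0.25, 0.0]
frame = restore_excitation(noise_memory, start=2, gain=2.0,
                           pulses=[(1, 3.0)], length=4)
print(frame)  # [0.5, 5.0, -1.0, 1.5]
```

Because the receiving side stores the same noise as the transmitting side, only the read position and the amplitude need to be known to reproduce the noise component.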

The interpolation circuit 335 operates in the same way as the interpolation circuit 255 on the transmitting side: it interpolates the decoded K parameters for each pitch period and outputs the interpolated K parameters to the synthesis filter circuit 350.

The synthesis filter circuit 350 receives the driving sound source signal and the interpolated K parameters, operates in the same way as the synthesis filter circuit 250 on the transmitting side to calculate one frame of the synthesized speech signal X(n), and outputs it from the receiving-side output terminal 360.

This concludes the description of the receiving side of the high-efficiency speech coding system according to the present invention.

In the drive signal calculation circuit 220, when the sound source is represented by pulses and noise, the number of pulses may be varied adaptively from zero to several, in order to represent the various sounds of unvoiced sections well and to achieve good transitions between unvoiced and voiced sections. In this case, information indicating the number of pulses must be transmitted (for example, about 2 bits per frame). As a way of reducing the amount of computation, the pitch coding circuit may, for example, obtain the pitch gain from the value of the autocorrelation function at a lag of one pitch period, and the transmitting side may use the magnitude of the pitch gain to decide between voiced and unvoiced before calculating the sound source signal. For a voiced frame, the pulse train of the representative pitch section is then used as the sound source signal; for an unvoiced frame, a combination of noise and a pulse train is used. Other well-known methods may also be used for the voiced/unvoiced decision.
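
The pitch-gain decision described above can be illustrated with a short sketch. It assumes that the pitch gain is the autocorrelation at a lag of one pitch period normalized by the frame energy, and it uses a threshold of 0.5; both the normalization and the threshold value are assumptions for illustration, since the text does not specify them. The function names are hypothetical.

```python
import math


def pitch_gain(signal, pitch_period):
    """Autocorrelation at a lag of one pitch period, normalized by energy."""
    energy = sum(x * x for x in signal)
    if energy == 0.0 or pitch_period <= 0:
        return 0.0
    lagged = sum(signal[n] * signal[n - pitch_period]
                 for n in range(pitch_period, len(signal)))
    return lagged / energy


def choose_sound_source(signal, pitch_period, threshold=0.5):
    """Pick the sound source model before any pulse search is run."""
    if pitch_gain(signal, pitch_period) >= threshold:
        return "pulse_train"        # treated as voiced
    return "noise_plus_pulses"      # treated as unvoiced


# A strictly periodic frame (period 20) and a frame holding one impulse.
periodic = [math.sin(2.0 * math.pi * n / 20.0) for n in range(160)]
impulse = [1.0] + [0.0] * 159

print(choose_sound_source(periodic, 20))
print(choose_sound_source(impulse, 20))
```

Making the decision before the sound source calculation means that only one of the two sound source models has to be computed per frame, which is where the saving in computation comes from.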

Various methods other than the one described in this embodiment can be used to calculate the pulses in the drive signal calculation circuit 220. For example, a method can be used in which the amplitudes of the previously obtained pulses are readjusted each time a new pulse is obtained. The details of this method are described in, for example, the paper by Ono et al. entitled "Study on Source Pulse Search Method in Multi-pulse Driven Speech Coding" (Proceedings of the Acoustical Society of Japan, 157, 1983) (Reference 9), so the description is omitted here.

In the embodiment, the drive signal calculation circuit 220 obtains the pulse train for each subframe after dividing the frame into subframes. Alternatively, a predetermined number of pulses may be obtained for the whole frame without dividing it into subframes, and those pulses that fall within the subframe may be used.

As another method of calculating the noise source, a known approach generates a random noise signal following a Gaussian distribution for each subframe and selects the noise that minimizes the error power between the signal synthesized from the noise signal and the speech signal of the subframe section. For details of this method, see, for example, the paper by B.S. Atal et al. entitled "Stochastic Coding of Speech Signals at Very Low Bit Rates" (Proc. ICC 84, pp. 1610-1613, 1984) (Reference 10).

In another known method, a single noise source of a predetermined number of samples is prepared, and the amplitude and phase (read position) of the noise source are obtained from the prediction residual signal of the speech signal. Because this method works on the prediction residual, it reduces the amount of computation. For details, see the paper by Oyama entitled "Linear predictive analysis and synthesis using a driving sound source with residual modeled by noise" (Proceedings of the Acoustical Society of Japan, October 1984, pp. 165-166) (Reference 11).

Furthermore, in unvoiced sections where the characteristics of the sound source are almost constant, a fixed noise source may be used and only its amplitude transmitted, as in Reference 2, while the amplitude and type of the noise source are transmitted in transient portions. The noise source may also be fixed at all times in unvoiced sections.
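
The first method above, selecting the noise that minimizes the error power of the synthesized signal, might be sketched as follows. This is a simplified illustration and not the procedure of Reference 10: it assumes a first-order all-pole synthesis filter with a hypothetical coefficient, omits perceptual weighting, and computes the amplitude of each candidate in closed form by least squares. All names are hypothetical.

```python
import random


def synthesize(excitation, predictor):
    """All-pole synthesis: y(n) = e(n) + sum_k a_k * y(n - k)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(predictor, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def select_noise(target, candidates, predictor):
    """Return (index, gain, error_power) of the best noise candidate."""
    best = None
    for index, noise in enumerate(candidates):
        y = synthesize(noise, predictor)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # Least-squares amplitude for this candidate.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


rng = random.Random(0)
candidates = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(8)]
predictor = [0.5]  # assumed first-order filter, for illustration only
# Build a target that candidate 3 reproduces exactly with amplitude 1.5.
target = [1.5 * v for v in synthesize(candidates[3], predictor)]
index, gain, error = select_noise(target, candidates, predictor)
print(index, round(gain, 6))
```

Every candidate has to be passed through the synthesis filter, which is why the residual-domain method of Reference 11 is described as cheaper.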

On the transmitting side of this embodiment, the K parameter values are held constant within a frame (that is, the characteristics of the synthesis filter do not change within the frame) when the pulses are obtained for each subframe of a voiced frame. The pulses may instead be obtained while the K parameter values are varied smoothly from subframe to subframe. Specifically, the K parameter values are interpolated for each subframe using the K parameter values of the preceding and following frames, converted into prediction coefficients, and output to the weighting circuit 200, the impulse response calculation circuit 170, and the synthesis filter circuit 250. The pulses are then calculated using the cross-correlation and autocorrelation functions obtained with the coefficients updated for each subframe. This gives a spectrum that changes more smoothly over time, so that higher-quality speech can be synthesized.

When the pulses and the K parameter values are interpolated, the interpolation may be synchronized with the pitch period and referenced to the representative pitch section, or either or both of the pulses and the K parameters may be interpolated with reference to a predetermined pitch section (for example, a pitch section near the center of the frame). If both use the latter method, the code representing the position of the representative pitch section need not be transmitted, which reduces the transmission bit rate.

The pulses and the K parameters may also be interpolated without synchronization to the pitch period. In this case, the frame is divided into predetermined time intervals (for example, about 2.5 msec) and the interpolation is performed for each interval. Since the subframe phase then need not be transmitted, the transmission bit rate can be reduced. The reference section for the interpolation may be a representative section searched for on the transmitting side, or it may be fixed in advance (for example, near the center of the frame). In the latter case, neither the subframe phase nor the representative pitch position needs to be transmitted, and the bit rate can be reduced further.
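
As an illustration of interpolating the K parameters section by section and converting them into prediction coefficients, a minimal sketch follows. It assumes linear interpolation between the K parameter sets of two neighboring frames, fixed-length sections, and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p for the step-up recursion. The function names, the weighting by section index, and the example values are all assumptions, not details from the specification.

```python
def interpolate_k(k_prev, k_next, num_sections):
    """Linearly interpolate K parameters across the sections of a frame."""
    sections = []
    for s in range(num_sections):
        t = s / num_sections  # assumed weighting: 0 at the frame start
        sections.append([(1.0 - t) * a + t * b
                         for a, b in zip(k_prev, k_next)])
    return sections


def k_to_predictor(k_params):
    """Step-up recursion from K parameters to prediction coefficients.

    Returns a_1..a_p for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p
    (the sign convention is an assumption of this sketch).
    """
    a = []
    for k in k_params:
        # a_j <- a_j + k * a_(i-j), then append a_i = k.
        a = [a[j] + k * a[len(a) - 1 - j] for j in range(len(a))] + [k]
    return a


sections = interpolate_k([0.0, 0.4], [0.8, 0.0], num_sections=4)
coefficients = [k_to_predictor(k) for k in sections]
for k, a in zip(sections, coefficients):
    print(k, a)
```

One reason for interpolating in the K parameter domain is that a weighted average of values lying strictly between -1 and 1 also lies in that range, so every interpolated set still corresponds to a stable synthesis filter.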

As a way of reducing the amount of computation, the K parameter interpolation may be performed only on the receiving side. The interpolation circuit 255 on the transmitting side can then be omitted.

Other methods may also be used to select the representative pitch section, such as selecting the subframe that contains a pulse of large absolute value. The section that allows the speech to be reproduced well may also be searched for in each frame. In addition, although the pitch period was held constant when the frame was divided into subframes, it too may be interpolated using the pitch periods of the preceding and following frames. The pitch period then changes more smoothly over time, and better speech is obtained.

Methods other than linear interpolation may also be used to interpolate the pulses, the synthesis filter parameters, and the pitch period. For example, logarithmic interpolation may be used for the pulses and the pitch period. Also, although the K parameters are interpolated in this embodiment, the synthesis filter parameters may instead be interpolated in the form of, for example, prediction coefficients (in which case the stability of the filter must be checked), log area functions, formant parameters, or autocorrelation functions. These methods are described in, for example, the paper by B.S. Atal et al. entitled "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" (J. Acoust. Soc. Am., pp. 637-655, 1971) (Reference 12), so the description is omitted.
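
To show how logarithmic interpolation differs from linear interpolation, the following sketch interpolates a pitch period between two frames. The exact form of the logarithmic interpolation is not given in the text; the sketch assumes that the logarithm of the value is interpolated linearly, which requires positive values. The function names and the example pitch periods are hypothetical.

```python
import math


def linear_interp(p0, p1, t):
    """Linear interpolation between p0 (t = 0) and p1 (t = 1)."""
    return (1.0 - t) * p0 + t * p1


def log_interp(p0, p1, t):
    """Interpolate log(p) linearly; p0 and p1 must be positive."""
    if p0 <= 0.0 or p1 <= 0.0:
        raise ValueError("logarithmic interpolation needs positive values")
    return math.exp((1.0 - t) * math.log(p0) + t * math.log(p1))


# Pitch period (in samples) moving from 40 to 160 across a frame.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, linear_interp(40.0, 160.0, t), round(log_interp(40.0, 160.0, t), 3))
```

At the midpoint the linear rule gives 100 samples and the logarithmic rule gives 80 samples, the geometric mean, so the logarithmic rule changes the value by equal ratios rather than equal differences. Signed pulse amplitudes would need separate handling of the sign, which this sketch does not attempt.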

In this embodiment, the K parameter analysis and the sound source pulse train calculation are performed with a fixed frame length, but the frame length may be variable. The frame can then be shortened where the speech is changing and lengthened where it is stationary, which reduces the transmission bit rate. Furthermore, if the frame length is determined according to the pitch period (for example, as an integer multiple of the pitch period), the subframe phase described in this embodiment need not be transmitted either, which reduces the transmission bit rate further.

As another configuration of the present invention, the drive signal restoring circuit 240, the synthesis filter circuit 250, the interpolation circuit 255, and the subtraction circuit 120 in FIG. 1(a) may be omitted. In that case the speech signal need not be synthesized on the transmitting side, and the apparatus configuration can be simplified.

As is well known in the field of digital signal processing, the autocorrelation function can also be calculated from the power spectrum, and the cross-correlation function can also be calculated from the cross-power spectrum. These relationships are explained in detail in, for example, Chapter 8 of the book by A.V. Oppenheim et al. entitled "Digital Signal Processing" (Reference 13), so the description is omitted here.
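
The relationship between the autocorrelation function and the power spectrum can be checked numerically with a small sketch. It uses a direct (slow) discrete Fourier transform written out in plain Python so that it is self-contained; a practical implementation would use an FFT. The zero-padding to twice the signal length, which keeps the circular correlation from wrapping around, and all function names are choices made for this example.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def autocorr_via_spectrum(signal):
    """Autocorrelation as the inverse transform of the power spectrum."""
    n = len(signal)
    padded = list(signal) + [0.0] * n   # zero-pad to avoid wrap-around
    power = [abs(v) ** 2 for v in dft(padded)]
    r = dft(power, inverse=True)
    return [r[k].real for k in range(n)]


def autocorr_direct(signal):
    """Autocorrelation computed directly in the time domain."""
    n = len(signal)
    return [sum(signal[m] * signal[m + k] for m in range(n - k))
            for k in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]
print(autocorr_direct(signal))  # [30.0, 20.0, 11.0, 4.0]
print([round(v, 6) for v in autocorr_via_spectrum(signal)])
```

The same construction with the product of one spectrum and the complex conjugate of another yields the cross-correlation function.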

(Effects of the Invention) As described above, according to the present invention, the sound source signal is selected as whichever of two sound sources reproduces the speech signal better: a sound source consisting of the pulse train of one representative pitch section, which exploits the periodicity of the speech signal, or a sound source consisting of a combination of pulses and noise. As a result, high-quality speech can be synthesized even at a low transmission bit rate, whether in voiced sections, unvoiced sections, or transitions between unvoiced and voiced sections.

[Brief description of the drawings]

FIGS. 1(a) and 1(b) are block diagrams showing an embodiment of the high-efficiency speech coding system according to the present invention; FIG. 2 is a diagram showing an example of the processing in the drive signal calculation circuit 220; FIG. 3 is a diagram showing an example of the processing in the interpolation circuit 255; and FIG. 4 is a block diagram showing the configuration of the synthesis side of a conventional system.

In the figures, 110 denotes a buffer memory circuit; 120, a subtraction circuit; 250 and 350, synthesis filter circuits; 200, a weighting circuit; 170, an impulse response calculation circuit; 180, an autocorrelation function calculation circuit; 210, a cross-correlation function calculation circuit; 220, a drive signal calculation circuit; 240 and 340, drive signal restoring circuits; 130, a pitch analysis circuit; 140, a K parameter calculation circuit; 150, a pitch coding circuit; 160, a K parameter coding circuit; 225 and 310, noise memories; 230, a coding circuit; 255 and 335, interpolation circuits; 260, a multiplexer; 290, a demultiplexer; 300, a restoration circuit; 320, a pitch decoding circuit; and 330, a K parameter decoding circuit.

Claims

(57) [Claims]

1. A high-efficiency speech coding method, characterized in that: on a transmitting side, a discrete speech signal is input and divided into predetermined time intervals; a spectrum parameter representing a short-time spectrum envelope and a pitch parameter representing a pitch period are extracted from the speech signal; the speech signal of each time interval is divided into small sections corresponding to the pitch period; a sound source for representing the speech signal is represented by the pulse train of a representative one of the small sections, or by a combination of a pulse train and noise; and the information representing the sound source, the pitch parameter, and the spectrum parameter are combined and output; and on a receiving side, a driving sound source signal is restored by applying, based on the pitch parameter and the information representing the sound source, processing that gives the pulse train of the representative section a temporally smooth change for each pitch period, and the speech signal is synthesized using the spectrum parameter.

2. A high-efficiency speech coding apparatus, comprising: a parameter calculation circuit that divides an input speech signal into predetermined time intervals, extracts from the speech signal a spectrum parameter representing a short-time spectrum envelope and a pitch parameter representing a pitch period, and encodes them; a drive signal calculation circuit that divides the speech signal of each time interval into small sections corresponding to the pitch period and obtains, as a sound source for representing the speech signal, the pulse train of a representative one of the small sections or a combination of a pulse train and noise; and a multiplexer circuit that combines and outputs the output code of the parameter calculation circuit and the output code of the drive signal calculation circuit.

3. A high-efficiency speech coding apparatus, comprising: a demultiplexer circuit that receives a code sequence in which a code representing a pitch parameter, a code representing a spectrum parameter, and a code representing sound source information are combined, and that separates and decodes the code representing the pitch parameter, the code representing the spectrum parameter, and the code representing the sound source information; a driving sound source signal restoring circuit that restores a driving sound source signal by applying, based on the decoded pitch parameter and the decoded sound source information, processing that gives the pulse train of the representative section a temporally smooth change for each pitch period, and, when a pulse train and noise are used as the sound source, by generating the pulse train and the noise based on the sound source information; and a synthesis filter circuit that synthesizes and outputs a speech signal based on the driving sound source signal and the decoded spectrum parameter.