JP2946525B2 - Audio coding method

Info

Publication number: JP2946525B2
Application number: JP1103410A
Authority: JP
Inventors: 一範小澤
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1989-04-25
Filing date: 1989-04-25
Publication date: 1999-09-06
Anticipated expiration: 2014-09-06
Also published as: JPH02282800A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial application]

The present invention relates to a speech coding system for coding a speech signal with high quality at a low bit rate, in particular at about 4.8 kb/s, with a relatively small amount of computation.

[Conventional technology]

As a method of coding a speech signal at a low bit rate of about 4.8 kb/s, the pitch-interpolation multipulse method described in, for example, Japanese Patent Application No. 59-272435 (Reference 1) and Japanese Patent Application No. 60-178911 (Reference 2) is known. In this method, the transmitting side extracts, from the speech signal of each frame, a spectral parameter representing the spectral characteristics of the speech signal and a pitch parameter representing the pitch. In a voiced section, the frame is divided into a plurality of pitch sections, one per pitch period, and the excitation signal of the frame is represented by a small number of multipulses placed in one of those pitch sections (the representative section). The amplitudes and phases of the multipulses in the representative section are transmitted together with the spectral and pitch parameters. In an unvoiced section, the excitation of one frame is represented by a small number of multipulses and a noise signal, and the amplitudes and phases of the multipulses and the gain and index of the noise signal are transmitted.

On the receiving side, in a voiced section, the amplitudes and phases of the multipulses are interpolated between the multipulses of the representative section of the current frame and those of the representative section of an adjacent frame. This restores the multipulses of the pitch sections other than the representative section, and thereby the excitation signal of the frame. In an unvoiced section, the excitation signal of the frame is restored from the multipulses and the index and gain of the noise signal. The restored excitation signal is then input to a synthesis filter that uses the spectral parameters, and a synthesized speech signal is output.

[Problems to be solved by the invention]

In the conventional method described above, the excitation signal in a voiced section is represented by interpolating between a small number of multipulses placed in the representative section and the multipulses in the representative section of an adjacent frame. However, two kinds of transmission parameters are required, namely the amplitude and the phase of each multipulse, and coding them takes about 10 bits per pulse in total. Therefore, to apply the method at a bit rate of about 4.8 kb/s, the number of multipulses placed in the representative section must be reduced to about four when the frame length is 20 ms, as described in, for example, the paper by Ozawa and Araseki, "Multi-pulse speech coding with natural speech quality" (ICASSP, pp. 457-460, 1986) (Reference 3). With so few pulses, the excitation signal of the representative section is not approximated well enough, and sound quality degrades, particularly for male speakers with a long pitch period.
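The bit budget behind this constraint can be worked out from the figures quoted above. The following is a minimal sketch: the function name and its parameters are illustrative, and the 10 bits per pulse is the approximate figure given in the text rather than an exact allocation.

```python
def frame_bit_budget(bit_rate_bps, frame_ms, pulses, bits_per_pulse):
    """Return (bits per frame, bits spent on pulses, bits left for everything else)."""
    bits_per_frame = bit_rate_bps * frame_ms // 1000
    pulse_bits = pulses * bits_per_pulse
    return bits_per_frame, pulse_bits, bits_per_frame - pulse_bits


# 4.8 kb/s, 20 ms frames, four pulses at about 10 bits each
print(frame_bit_budget(4800, 20, 4, 10))  # (96, 40, 56)
```

At 96 bits per frame, four pulses already take 40 bits, and the remainder has to carry the spectral parameters, the pitch period, and the other side information.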

Furthermore, in the conventional method, the coefficients of the synthesis filter representing the spectral envelope of the speech signal are calculated by linear prediction (LPC) analysis. For female voices with a short pitch period, however, LPC analysis is affected by the pitch, so the synthesis filter approximates the spectral envelope of the speech less accurately, and speech synthesized with such a filter has degraded quality. This was pronounced in the region where the bit rate is low and the number of pulses is small, particularly at 4.8 kb/s or below.

An object of the present invention is to solve the problems described above and to provide a speech coding system that gives good sound quality at about 4.8 kb/s with a relatively small amount of computation.

[Means for solving the problem]

The speech coding system according to the first invention is characterized by: obtaining, from an input discrete speech signal, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch for each frame of a predetermined time length; dividing the speech signal of the frame into pitch sections according to the pitch period obtained from the pitch parameter; representing the excitation signal of one of the pitch sections by a pulse and a codebook representing the spectral envelope characteristics of the excitation signal; obtaining the amplitude and phase of the pulse and selecting one codeword from the codebook so as to reduce the error between the speech signal and a synthesized signal obtained from the spectral parameter and a restored excitation signal obtained from the pulse and the codebook; correcting the spectral parameter on the basis of the restored excitation signal; and outputting the pitch parameter, the spectral parameter, the amplitude and phase of the pulse, and information representing the codeword.

The speech coding system according to the second invention is characterized by: obtaining, from an input discrete speech signal, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch for each frame of a predetermined time length; dividing the speech signal of the frame into pitch sections according to the pitch period obtained from the pitch parameter; representing the excitation signal of one of the pitch sections by a pulse and a codebook representing the spectral envelope characteristics of the excitation signal; obtaining, for the pitch sections other than that pitch section, correction coefficients for correcting the amplitude and phase of the pulse; obtaining the amplitude and phase of the pulse and selecting one codeword from the codebook so as to reduce the error between the speech signal and synthesized speech obtained from the spectral parameter and a restored excitation signal obtained from the pulse, the correction coefficients, and the codebook; correcting the spectral parameter on the basis of the restored excitation signal; and outputting the pitch parameter, the spectral parameter, the amplitude and phase of the pulse, the correction coefficients, and information representing the codeword.

[Action]

The first feature of the speech coding system according to the present invention is that, in a voiced section, the excitation signal of a pitch section within a frame (usually about 20 ms) is represented, as shown in the block diagram of FIG. 3, by three elements: a pulse generator 700 that generates a small number of pulses giving the amplitude and phase; a codebook 720 of filter coefficients representing the spectral envelope of the excitation signal, or of impulse responses of such a filter; and an excitation signal forming unit 710 that selects one codeword from the codebook 720 and forms the excitation signal. A synthesis filter 730 is driven by the excitation signal represented in this way to obtain synthesized speech.

The second feature is that the spectral parameters of the synthesis filter 730 (hereinafter, the filter coefficients) are recalculated using the excitation signal represented as described above.

As an example, let the number of pulses generated by the pulse generator 700 be one, and let the codebook consist of a set of impulse responses of filters representing the spectral envelope of the excitation signal, denoted h_j(n) (j = 1 to 2^M). These impulse responses can be obtained in various ways. For example, a predetermined number of samples per frame of the prediction residual signal obtained by LPC analysis of the speech signal is transformed by FFT (fast Fourier transform) to obtain the magnitude spectrum, and the inverse FFT of that spectrum gives the impulse response. Alternatively, filter coefficients are obtained by applying well-known LPC analysis to the prediction residual signal, and the impulse response of that filter is calculated. Other well-known methods can also be used. The codebook is created in advance by training on a large amount of speech data.
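The first of these methods (magnitude spectrum of the residual, then inverse transform) can be sketched as follows. This is an illustration only: it uses a direct DFT from the Python standard library in place of an FFT, the function names are hypothetical, and the windowing, segment length, and codebook training steps are left out.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(N^2), adequate for a sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def zero_phase_impulse_response(residual):
    """Keep only the magnitude spectrum of the residual and transform it back.

    Discarding the phase leaves a real, symmetric impulse response that carries
    the spectral envelope information of the residual segment.
    """
    magnitude = [abs(c) for c in dft(residual)]
    return [c.real for c in idft(magnitude)]
```

Because the phase is discarded, a delayed copy of the same residual segment gives the same impulse response, which is why the pulse position has to be carried separately by the phase m.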

The pulse amplitude g, the pulse phase (time position) m, and the codeword h_j(n) from the codebook are selected as follows. FIG. 4(a) shows the speech waveform of a frame. The frame is divided into pitch sections, one for each pitch period T of the pitch parameter obtained from the speech signal, and one pitch section (the representative section) is considered (FIG. 4(b)). Let the speech signal in this section be x_k(n). The pulse amplitude g, the phase m, and the optimum codeword in this section are selected so as to minimize the weighted error power given below. The weighted error power E_k in the representative section is

$$E_k = \sum_n \left[ \left( x_k(n) - \hat{x}_k(n-m) \right) * w(n) \right]^2 \qquad (1)$$

where

$$\hat{x}_k(n-m) = g \cdot h_j(n-m) * h_s(n) \qquad (2)$$

Here w(n) is the impulse response of the perceptual weighting filter. For a specific configuration, see Atal et al., "A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates", Proc. ICASSP, pp. 614-617, 1982 (Reference 4). This filter may be omitted. x̂_k(n) is the reproduced speech obtained by representing the excitation signal with the pulse and the j-th codeword selected from the codebook and passing it through the synthesis filter. h_s(n) is the impulse response of the synthesis filter used to synthesize the speech, and the symbol * denotes convolution. Substituting equation (2) into equation (1), partially differentiating with respect to g, and setting the result to zero gives

$$g = \frac{\sum_n x_{wk}(n)\, x'_{wk}(n-m)}{\sum_n x'_{wk}(n-m)\, x'_{wk}(n-m)} \qquad (3)$$

Here,

$$x_{wk}(n) = x_k(n) * w(n), \qquad x'_{wk}(n-m) = h_j(n-m) * h_s(n) * w(n) \qquad (4)$$

The optimum set of g, m, and h_j that minimizes equation (1) is found as follows. Equation (3) is first evaluated with one codeword as the impulse response sequence h_j, and g and m are chosen so as to minimize equation (1). This amounts to finding the g and m that maximize

$$\frac{\left[ \sum_n x_{wk}(n)\, x'_{wk}(n-m) \right]^2}{\sum_n x'_{wk}(n-m)\, x'_{wk}(n-m)}$$

This processing is performed for every j, and the set of g, m, and j giving the largest value of this expression is the one sought.
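The search over g, m, and j can be sketched as an exhaustive loop. This is a minimal illustration, not the circuit-level procedure of the embodiment: the function names are hypothetical, the weighted target `x_w` and the weighted synthesis-filter response `h_sw` (standing for h_s(n) * w(n)) are assumed to be given, and the convolution is truncated to the section length.

```python
def convolve(a, b, length):
    """Linear convolution of a and b, truncated to `length` samples."""
    out = [0.0] * length
    for i, av in enumerate(a):
        for k, bv in enumerate(b):
            if i + k < length:
                out[i + k] += av * bv
    return out


def search_pulse_and_codeword(x_w, codebook, h_sw):
    """Return (g, m, j) maximizing cross^2 / energy, with g from the closed form."""
    n = len(x_w)
    best = None
    for j, h_j in enumerate(codebook):
        base = convolve(h_j, h_sw, n)              # x'_wk for m = 0
        for m in range(n):
            shifted = [0.0] * m + base[:n - m]     # x'_wk(n - m)
            energy = sum(v * v for v in shifted)
            if energy == 0.0:
                continue
            cross = sum(a * b for a, b in zip(x_w, shifted))
            score = cross * cross / energy
            if best is None or score > best[0]:
                best = (score, cross / energy, m, j)
    if best is None:
        raise ValueError("codebook produced no non-zero candidate")
    _, g, m, j = best
    return g, m, j
```

For each (j, m) pair the optimum amplitude is available in closed form, so only the discrete parameters are searched. The cost grows with the codebook size times the number of candidate positions.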

This processing gives the amplitude and phase of the pulse and the codeword for the pitch section under consideration. FIG. 4(c) shows the pulse obtained, and FIG. 4(d) shows the synthesized waveform x̂_k(n) obtained by driving the synthesis filter with the excitation signal of the representative section generated from that pulse and the selected codeword. The processing may be performed for every pitch section in the frame or for only one pitch section (the representative section).

Next, the recalculation of the synthesis filter coefficients is described. Let v(n) be the excitation signal of the representative section obtained from the pulse and the codebook as described above:

$$v(n) = g \cdot h_j(n-m) \qquad (5)$$

With a_i denoting the coefficients of the synthesis filter, the speech obtained by passing the excitation signal v(n) through the synthesis filter is expressed by equation (6), in which e(n) denotes the error signal. The coefficients a_i are determined so as to minimize equation (7).

Substituting equation (6) into equation (7), partially differentiating with respect to the coefficients a_i, and setting the result to zero gives equation (8).

The coefficients a_i are therefore obtained by solving equation (8). The first term on the left side of equation (8) is the autocorrelation of x(n), and the second term is the cross-correlation of v(n) and x(n). For methods of solving equation (8), see, for example, Rabiner and Schafer, "Digital Processing of Speech Signals" (Prentice-Hall, 1978) (Reference 5).
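A least-squares recalculation of this kind can be sketched as below. Equations (6) to (8) are not reproduced in this text, so the sketch rests on an assumed model, x(n) = Σ a_i x(n−i) + v(n) + e(n), with an assumed sign convention and zero initial conditions. The function names are hypothetical, and a plain Gaussian elimination stands in for the solution methods of Reference 5.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def reestimate_coefficients(x, v, order):
    """Least-squares a_i for the ASSUMED model x(n) = sum_i a_i x(n-i) + v(n) + e(n)."""
    n_samples = len(x)

    def past(n, lag):
        return x[n - lag] if n - lag >= 0 else 0.0

    # Correlations of the delayed speech with itself (autocorrelation-type terms)
    normal = [[sum(past(n, i) * past(n, l) for n in range(n_samples))
               for i in range(1, order + 1)]
              for l in range(1, order + 1)]
    # Correlation of x(n) - v(n) with delayed speech: autocorrelation of x
    # minus cross-correlation of v and x
    rhs = [sum((x[n] - v[n]) * past(n, l) for n in range(n_samples))
           for l in range(1, order + 1)]
    return solve_linear(normal, rhs)
```

Under this assumed model the right-hand side combines an autocorrelation of x(n) with a cross-correlation of v(n) and x(n), which matches the description of the terms in equation (8). The exact form in the patent may differ.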

[Example]

FIG. 1 shows a speech coding apparatus that implements the speech coding system according to the first invention.

In FIG. 1, a speech signal is input from an input terminal 100, and one frame (for example, 20 ms) of the speech signal x(n) is stored in a buffer memory 110.

A spectral parameter calculation circuit 140 calculates linear prediction coefficients a_i up to a predetermined order M by well-known LPC analysis of the speech signal of the frame, as spectral parameters representing the spectral characteristics of that signal, and outputs them to an impulse response calculation circuit 170 and a weighting circuit 200.

A pitch calculation circuit 130 calculates the average pitch period T as the pitch parameter from the speech signal of the frame. A method based on autocorrelation, for example, is known for this purpose; for details, see the pitch extraction circuits in References 1 and 2. Other well-known methods (for example, the cepstrum method, the SIFT method, and the modified correlation method) can also be used.

A pitch encoding circuit 150 quantizes the average pitch period T with a predetermined number of bits and outputs the resulting code to a multiplexer 260. It also decodes the code and outputs the decoded pitch period T′ to a pitch division circuit 205 and an excitation signal calculation circuit 220.

A codebook 175 stores 2^M impulse response sequences h_j(n) (n = 1 to L) of filters representing the spectral envelope of the excitation signal. The codebook is created in advance by training on impulse response data of filters representing the spectral envelope of the residual signal, obtained by analyzing the prediction residual signals of a large amount of speech. Training methods for vector quantization are known for this purpose; see, for example, Makhoul et al., "Vector Quantization in Speech Coding" (Proc. IEEE, vol. 73, no. 11, pp. 1551-1588, 1985) (Reference 6). Various well-known methods can be used to obtain the characteristics of the filter representing the spectral envelope of the residual signal, for example LPC analysis, covariance analysis, or improved cepstrum analysis of the residual signal. For LPC analysis and covariance analysis, see Reference 5. For improved cepstrum analysis, see Imai et al., "Spectral envelope extraction by the improved cepstrum method" (Transactions of the Institute of Electronics and Communication Engineers of Japan, J62-A, pp. 217-233, 1979) (Reference 7). The codebook 175 takes out the 2^M impulse response sequences h_j(n) (j = 1 to 2^M) one at a time, from j = 1 through j = 2^M, and outputs them to the impulse response calculation circuit 170.

The impulse response calculation circuit 170 uses the linear prediction coefficients a_i from the spectral parameter calculation circuit 140 to calculate the impulse response h_w(n) of the perceptually weighted synthesis filter. It then convolves h_w(n) with the output h_j(n) of the codebook 175 according to equation (4) and outputs the resulting impulse response x′_wk(n−m) to an autocorrelation function calculation circuit 180.

The autocorrelation function calculation circuit 180 calculates the autocorrelation function R_hh(n) of the impulse response x′_wk(n−m) up to a predetermined delay and outputs it. For the operation of the autocorrelation function calculation circuit 180, see References 1 and 2.

A subtractor 190 subtracts one frame of the output of a synthesis filter 281 from the speech signal x(n) of the frame and outputs the result to the weighting circuit 200.

The weighting circuit 200 passes the subtraction result through a perceptual weighting filter whose impulse response is w(n) and outputs the resulting weighted signal x_w(n). For the weighting method, see References 1 and 2.

The pitch division circuit 205 divides the speech signal of the frame into sections of length T′ using the decoded pitch period T′.
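The division step can be sketched in a few lines. This is an illustration under stated assumptions: the 160-sample frame corresponds to 20 ms only at an 8 kHz sampling rate, which the text does not specify, and keeping the trailing partial section as a shorter section is a choice made here, not something the text prescribes. The function name is hypothetical.

```python
def split_into_pitch_sections(frame, pitch_period):
    """Cut a frame into consecutive sections of `pitch_period` samples each."""
    if pitch_period <= 0:
        raise ValueError("pitch period must be positive")
    return [frame[start:start + pitch_period]
            for start in range(0, len(frame), pitch_period)]


frame = list(range(160))               # 20 ms at an assumed 8 kHz sampling rate
sections = split_into_pitch_sections(frame, 50)
print([len(s) for s in sections])      # [50, 50, 50, 10]
```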

A cross-correlation function calculation circuit 210 receives the weighted signal x_w(n) and the impulse response x′_wk(n−m), calculates the cross-correlation function φ_xh up to a predetermined delay, and outputs it. For the calculation method, see References 1 and 2.

The excitation signal calculation circuit 220 obtains, for one representative pitch section in the frame (the representative section), the codeword and the amplitude g and phase m of the pulse, so that the excitation signal is represented by a codeword h_j(n) of the codebook and one pulse. Equation (3) is used to calculate g and m. Then, as described in the Action section, the 2^M codewords h_j(n) are output in turn from the codebook 175 and the above processing is repeated, and the set of g, m, and h_j(n) that minimizes the error power of equation (1) is found by the method described in the Action section. The code indicating the index of the selected codeword is output to the multiplexer 260, and g and m are output to an encoder 230.

The encoder 230 encodes the amplitude g and phase m of the pulse in the representative section with a predetermined number of bits and outputs the codes. It also encodes the information P₁ indicating the subframe position of the representative section with a predetermined number of bits and outputs it to the multiplexer 260. In addition, it decodes these codes and outputs the decoded values to a drive signal restoration circuit 283 and a parameter correction circuit 178.

The parameter correction circuit 178 generates the excitation signal v(n) in the representative section from the pulse amplitude and phase obtained for the representative section and the selected codeword. It then recalculates the linear prediction coefficients a_i from the speech signal x(n) according to equation (8), converts them into K parameters, and outputs these to a parameter encoding circuit 160.

The parameter encoding circuit 160 encodes the K parameters and outputs the code l_k to the multiplexer 260. It also converts the decoded values into linear prediction coefficients a_i′ and outputs them to the synthesis filter 281.

The drive signal restoration circuit 283 generates the excitation signal in the representative section from the pulse amplitude and phase obtained for the representative section and the selected codeword. For the other pitch sections, it obtains the pulses by linear interpolation between the pulse amplitudes in the representative sections of the preceding and following frames. Likewise, it linearly interpolates between the codewords of the representative sections to obtain the impulse responses representing the spectral envelope of the excitation signal in the other pitch sections. The excitation signal of the frame is restored by this processing.
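The interpolation between representative sections can be sketched as follows. The text does not specify how the interpolation weights are assigned to the intermediate pitch sections, so equal spacing is assumed here, and both function names are hypothetical.

```python
def interpolate_amplitudes(g_prev, g_curr, steps):
    """Amplitudes for `steps` pitch sections lying between two representative sections.

    Equal spacing between the two representative sections is an assumption.
    """
    return [g_prev + (g_curr - g_prev) * k / (steps + 1) for k in range(1, steps + 1)]


def interpolate_codewords(h_prev, h_curr, steps):
    """Sample-by-sample linear interpolation between two codeword impulse responses."""
    return [[a + (b - a) * k / (steps + 1) for a, b in zip(h_prev, h_curr)]
            for k in range(1, steps + 1)]


print(interpolate_amplitudes(1.0, 2.0, 3))   # [1.25, 1.5, 1.75]
```

Only the representative-section values are transmitted. The decoder regenerates the intermediate sections from them, which is what keeps the bit rate low.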

The synthesis filter 281 receives the restored excitation signal and the linear prediction coefficients a_i′ from the parameter encoding circuit 160, calculates one frame of the synthesized speech signal, also calculates one frame of the influence signal carried into the next frame, and outputs it to the subtractor 190. For the calculation of the influence signal, see Japanese Patent Application No. 57-231605 (Reference 8).

The multiplexer 260 combines and outputs the codes representing the amplitude and phase of the pulse in the representative section, the code representing the position of the representative section, the code representing the K parameters, the code representing the pitch period, and the code representing the selected codeword.

Next, an embodiment of the second invention is described.

FIG. 2 shows a speech coding apparatus that implements the speech coding system according to the second invention. Components in FIG. 2 bearing the same reference numerals as in FIG. 1 operate in the same way as in FIG. 1, so their description is omitted.

In FIG. 2, reference numeral 225 denotes an amplitude/phase correction calculation circuit. For each pitch section other than the representative section in the same frame, the circuit 225 calculates correction coefficients for correcting the amplitude and phase of the pulse of the representative section. Specifically, they are obtained as follows. Let x_i(n), c_i, and d_i be the input speech, the amplitude correction coefficient, and the phase correction coefficient in the i-th pitch section, respectively.

In the i-th pitch section, the excitation signal restored from the pulse amplitude and phase and the codeword of the representative section is corrected in amplitude and phase and passed through the synthesis filter to give the reproduced signal x̂_i(n). The perceptually weighted error power between x̂_i(n) and the input speech signal x_i(n) can be written as

$$E_i = \sum_n \left[ \left( x_i(n) - c_i\, \hat{x}_i(n-T'-d_i) \right) * w(n) \right]^2 \qquad (9)$$

Here,

$$\hat{x}_i(n-T'-d_i) = g \cdot h_j(n-m-T'-d_i) * h_s(n) \qquad (10)$$

The amplitude and phase correction coefficients c_i and d_i can be obtained so as to minimize the weighted error power above. Partially differentiating that error power with respect to the amplitude correction coefficient c_i and setting the result to zero gives the following equation.

$$c_i = \frac{\sum_n x_{wi}(n)\, \hat{x}_{wi}(n-T'-d_i)}{\sum_n \hat{x}_{wi}(n-T'-d_i)\, \hat{x}_{wi}(n-T'-d_i)} \qquad (11)$$

Equation (11) is calculated for various values of the phase correction coefficient d_i, and the pair c_i, d_i that maximizes equation (11) is found. This processing is performed for every pitch section in the frame other than the representative section, and the amplitude and phase correction coefficients of each section are output to the encoder 230.

A drive signal restoration circuit 285 generates the excitation signal v(n) in the representative section of the frame from the pulse amplitude and phase and the selected codeword. In the i-th pitch section other than the representative section in the same frame, it corrects the excitation signal v(n) of the representative section with the amplitude and phase correction coefficients c_i and d_i according to the following equation to generate the excitation signal d_i(n) of the i-th pitch section.

$$d_i(n) = c_i \cdot v(n-T'-d_i) \qquad (12)$$

where

$$v(n) = g \cdot h_j(n-m) \qquad (13)$$

Here h_j(n), g, and m are the codeword of the codebook, the pulse amplitude, and the pulse phase, respectively.
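The correction step of the second embodiment can be sketched as below. Several points are assumptions: the signals are held in one buffer long enough to cover both the representative section and the i-th section, the offset d_i is searched over a small symmetric range, and the selection criterion is the normalized squared cross-correlation (equivalent to minimizing the squared error for the best c_i), which is one reading of the criterion stated in the text. All function names are hypothetical.

```python
def shift(signal, delay):
    """Delay `signal` by `delay` samples, zero-filled, keeping the same length."""
    n = len(signal)
    return [signal[t - delay] if 0 <= t - delay < n else 0.0 for t in range(n)]


def correction_coefficients(x_w, ref_w, pitch_period, max_offset):
    """Search d in [-max_offset, max_offset]; c follows in closed form as in (11)."""
    best = None
    for d in range(-max_offset, max_offset + 1):
        candidate = shift(ref_w, pitch_period + d)
        energy = sum(v * v for v in candidate)
        if energy == 0.0:
            continue
        cross = sum(a * b for a, b in zip(x_w, candidate))
        score = cross * cross / energy
        if best is None or score > best[0]:
            best = (score, cross / energy, d)
    if best is None:
        raise ValueError("no non-zero candidate in the search range")
    return best[1], best[2]


def corrected_excitation(v, c, d, pitch_period):
    """Equation (12): c * v(n - T' - d)."""
    return [c * s for s in shift(v, pitch_period + d)]
```

Here `x_w` stands for the weighted input speech and `ref_w` for the weighted synthesized response of the representative section. Only c_i and d_i need to be transmitted for each non-representative section.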

The embodiments of the present invention have been described above, but they are merely examples of the invention, and various modifications are conceivable.

For example, the excitation signal calculation circuit 220 may recalculate the pulse of the representative section using the linear prediction coefficients a_i recalculated by the parameter correction circuit 178. To do so, the recalculated linear prediction coefficients are passed to the impulse response calculation circuit 170 to recalculate the impulse response, the autocorrelation function calculation circuit 180 and the cross-correlation function calculation circuit 210 recalculate the autocorrelation and cross-correlation, and these are output to the excitation signal calculation circuit 220 to obtain the pulse again. The sequence of pulse calculation, linear prediction coefficient correction, and pulse recalculation may also be repeated a predetermined number of times. Such a configuration increases the amount of computation but improves performance.

The calculation of the pulse amplitude and phase and the selection of the codeword may also be performed in every pitch section of the frame rather than in the representative section only. Such a configuration increases the amount of information needed to transmit the sound source information but improves the characteristics.

The representative section may be fixed within the frame, for example at the center of the frame, or may be found by searching for the pitch section that minimizes the error between the synthesized speech and the input speech. For a specific method for the latter, see Reference 1.

The number of pulses in the representative section may be two or more. This improves the characteristics but increases the amount of transmitted information.

The codeword may or may not be linearly interpolated in the pitch sections other than the representative section.

In the embodiments, the codebook consists of impulse responses of a filter representing the spectral envelope of the prediction residual signal of the speech signal, but it may instead consist of filter coefficients. In that case the filter coefficients must be converted into an impulse response. Specifically, well-known coefficients such as linear prediction coefficients, K parameters, log area ratios, cepstrum, and mel cepstrum can be used.
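
As an illustration of the conversion from filter coefficients to an impulse response, the following Python sketch handles the linear prediction coefficient case only. It assumes an all-pole filter H(z) = 1 / (1 − Σ a_k z^−k); the sign convention, the function name `lpc_to_impulse_response`, and the truncation length are assumptions, and the other coefficient types listed above would first need their own conversion.

```python
def lpc_to_impulse_response(a, length):
    """Impulse response of H(z) = 1 / (1 - sum_k a[k-1] z^-k).

    a      : prediction coefficients a_1 .. a_p (sign convention assumed)
    length : number of impulse-response samples to return
    """
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0        # unit impulse input
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc += a_k * h[n - k]       # recursive all-pole part
        h.append(acc)
    return h
```

For a single coefficient of 0.5, the response decays geometrically (1, 0.5, 0.25, ...), which is the expected behaviour of a first-order all-pole filter.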

In the embodiments, linear prediction coefficients are encoded as the spectral parameters and LPC analysis is used as the analysis method. However, other well-known parameters such as LSP, LPC cepstrum, cepstrum, improved cepstrum, generalized cepstrum, and mel cepstrum may also be used as the spectral parameters, together with the analysis method best suited to each parameter.

To reduce the amount of computation, the calculation of the influence signal may be omitted. The drive signal restoration circuit 283, the synthesis filter 281, and the subtractor 190 then become unnecessary and the amount of computation is reduced, but the sound quality is degraded.

As is well known in the field of digital signal processing, the autocorrelation function corresponds to the power spectrum on the frequency axis and the cross-correlation function corresponds to the cross-power spectrum, so both can also be calculated from these spectra. For the calculation methods, see Oppenheim et al., "Digital Signal Processing" (Prentice-Hall, 1975) (Reference 9).
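
The correspondence between the autocorrelation function and the power spectrum can be sketched as follows. This is a minimal illustration, not the procedure of the embodiments: the function names are hypothetical, a naive O(N²) transform is used in place of a fast algorithm, and the input is zero-padded to twice its length so that the result is the ordinary (non-circular) autocorrelation.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (for illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def autocorr_via_power_spectrum(x, max_lag):
    """Autocorrelation r(0..max_lag) obtained from the power spectrum."""
    if max_lag > len(x):
        raise ValueError("max_lag must not exceed len(x)")
    padded = list(x) + [0.0] * len(x)        # zero-pad to avoid wrap-around
    spectrum = dft(padded)
    power = [abs(s) ** 2 for s in spectrum]  # power spectrum |X(k)|^2
    r = dft(power, inverse=True)             # inverse transform gives r(lag)
    return [r[lag].real for lag in range(max_lag + 1)]
```

The cross-correlation could be obtained the same way by replacing the power spectrum with the product of one spectrum and the complex conjugate of the other.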

[Effect of the invention]

As described above, according to the present invention, the sound source signal of one pitch section (the representative section) is represented by a small number of pulses, which give the amplitude and phase, and a codebook representing the characteristics of the sound source signal, and the spectral parameters are then recalculated using this sound source signal. As a result, at a bit rate of about 4.8 kb/s, the sound source signal is approximated more closely than in the conventional method, and good synthesized speech can be obtained.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech encoding device for explaining an embodiment of the speech coding system according to the first invention. FIG. 2 is a block diagram of a speech encoding device for explaining an embodiment of the speech coding system according to the second invention. FIGS. 3 and 4 are diagrams for explaining the principle of the present invention.

110: buffer memory
130: pitch calculation circuit
140: spectrum parameter calculation circuit
150: pitch coding circuit
160: parameter coding circuit
170: impulse response calculation circuit
178: parameter correction circuit
175, 350, 720: codebook
180: autocorrelation function calculation circuit
205: pitch division circuit
210: cross-correlation function calculation circuit
220: sound source signal calculation circuit
225: amplitude/phase correction calculation circuit
230: encoder
260: multiplexer
281, 360, 730: synthesis filter
283: drive signal restoration circuit

Continued from the front page
(58) Fields searched (Int. Cl.⁶, DB name): G10L 3/00 - 9/20; H03M 7/30; H04B 14/04; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech encoding method in which a spectrum parameter representing a spectrum envelope and a pitch parameter representing a pitch are obtained from an input discrete speech signal for each frame of a predetermined time length; the speech signal of the frame is divided into pitch sections according to the pitch period obtained from the pitch parameter; the excitation signal of one of the pitch sections is represented by a pulse and a codebook representing a spectral envelope characteristic of the excitation signal; a restored excitation signal is generated from the pulse and the codeword; a synthesized waveform is generated from the restored excitation signal and the spectrum parameter; the amplitude and phase of the pulse are determined and one codeword is selected from the codebook so as to reduce the error between the synthesized waveform and the speech signal; the spectrum parameter is corrected based on the restored excitation signal; and the pitch parameter, the spectrum parameter, the amplitude and phase of the pulse, and information representing the codeword are output.

2. A spectrum parameter representing a spectrum envelope and a pitch parameter representing a pitch are obtained from an input discrete speech signal for each frame of a predetermined time length; the speech signal of the frame is divided into pitch sections according to the pitch period obtained from the pitch parameter; the excitation signal of one of the pitch sections is represented by a pulse and a codebook representing the spectral envelope characteristic of the excitation signal; a correction coefficient for correcting the amplitude and phase of the pulse is obtained in the pitch sections other than that pitch section; a restored excitation signal is generated from the pulse, the correction coefficient, and the codeword; a synthesized waveform is generated from the restored excitation signal and the spectrum parameter; the amplitude and phase of the pulse are determined and one codeword is selected from the codebook so as to reduce the error between the synthesized waveform and the speech signal; the spectrum parameter is corrected based on the restored excitation signal; and the pitch parameter, the spectrum parameter, the amplitude and phase of the pulse, the correction coefficient, and information representing the codeword