JPH02280200A - Voice coding and decoding system

Info

Publication number: JPH02280200A
Application number: JP1100113A
Authority: JP
Inventors: Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1989-04-21
Filing date: 1989-04-21
Publication date: 1990-11-16
Anticipated expiration: 2014-10-04
Also published as: JP2956068B2

Abstract

PURPOSE: To obtain good synthesized speech that closely approximates the sound source signal at a bit rate of about 4.8 kb/s by representing the sound source signal of one pitch section (the representative section) with a small number of pulses, which give an amplitude and phase, and a codebook that represents the characteristics of the sound source signal. CONSTITUTION: The speech signal of a frame is divided into pitch sections corresponding to the pitch period obtained from a pitch parameter. The sound source signal of one of these pitch sections is represented by a small number of pulses giving the amplitude and phase and by a codebook 175 representing the characteristics of the sound source signal, and the amplitude and phase of the pulses are determined so as to reduce the error between the speech signal and the synthesized signal obtained from the pulses and the codebook 175. One codeword is then selected from the codebook 175, and the pitch parameter, the spectrum parameter, the amplitude and phase of the pulses, and the index indicating the codeword are output. In this way, speech signals can be encoded and decoded with high quality and a small amount of computation at a low bit rate, in particular about 4.8 kb/s.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech encoding/decoding system for encoding and decoding speech signals with high quality at a low bit rate, in particular about 4.8 kb/s, with a relatively small amount of computation.

[Conventional technology]

A known method for encoding speech signals at a low bit rate of about 4.8 kb/s is the pitch interpolation multipulse method described in, for example, Japanese Patent Application No. 59-272435 (Reference 1) and Japanese Patent Application No. 60-178911 (Reference 2). In this method, the transmitting side extracts from each frame of the speech signal a spectrum parameter representing the spectral characteristics of the speech signal and a pitch parameter representing the pitch. In voiced sections, the frame is divided into a plurality of pitch sections, the sound source signal of the frame is represented by multipulses for one of these pitch sections (the representative section), and the amplitudes and phases of the multipulses in the representative section are transmitted together with the spectrum and pitch parameters. In unvoiced sections, the sound source of one frame is represented by a small number of multipulses and a noise signal, and the amplitudes and phases of the multipulses and the gain and index of the noise signal are transmitted.

On the receiving side, in voiced sections, the amplitudes and phases of the multipulses are interpolated between the multipulses of the representative section of the current frame and those of the representative section of the adjacent frame. This restores the multipulses of the pitch sections other than the representative section, and hence the driving sound source signal of the frame. In unvoiced sections, the driving sound source signal of the frame is restored from the multipulses and the index and gain of the noise signal. The restored driving sound source signal is then input to a synthesis filter built from the spectrum parameter, which outputs the synthesized speech signal.

[Problems to be Solved by the Invention]

In the conventional method described above, the sound source signal in voiced sections is represented by interpolating between a small number of multipulses placed in the representative section and the multipulses in the representative section of the adjacent frame.

However, encoding the amplitude and phase of a multipulse requires a total of about 10 bits per pulse. To operate at a bit rate of about 4.8 kb/s, the number of multipulses in the representative section therefore has to be kept small, usually to about four. With so few pulses the sound source signal is not approximated well enough, and sound quality deteriorates, particularly for male speakers with long pitch periods. Furthermore, in the conventional method the sound source signal in pitch sections other than the representative section is restored by linearly interpolating between the pulses of the representative sections. A pulse has two kinds of parameters, amplitude and phase, which are interrelated, so interpolating them independently degrades performance.
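The bit budget behind this limit can be checked with a little arithmetic. The sketch below assumes a 20 ms frame (the example value used in the embodiment) and the figure of about 10 bits per pulse quoted above; the function names are illustrative only and do not come from the patent.

```python
def bits_per_frame(bit_rate_bps: int, frame_ms: int) -> int:
    """Total number of bits available for one frame."""
    return bit_rate_bps * frame_ms // 1000


def pulse_bits(num_pulses: int, bits_per_pulse: int = 10) -> int:
    """Bits consumed by the amplitudes and phases of the pulses."""
    return num_pulses * bits_per_pulse


total = bits_per_frame(4800, 20)   # assumed 20 ms frame at 4.8 kb/s
used = pulse_bits(4)               # four pulses in the representative section
print(total, used, total - used)
```

Under these assumptions a frame carries 96 bits, of which four pulses already take 40. The rest has to cover the spectrum parameter, the pitch period, and the other side information, which is why the pulse count cannot simply be raised.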

An object of the present invention is to solve the problems described above and to provide a speech encoding/decoding system that achieves good sound quality at about 4.8 kb/s with a relatively small amount of computation.

[Means to solve the problem]

In the speech encoding/decoding system of the first invention, the transmitting side obtains, from an input discrete speech signal, a spectrum parameter representing the spectral envelope and a pitch parameter representing the pitch for each frame of a predetermined length; divides the speech signal of the frame into pitch sections according to the pitch period obtained from the pitch parameter; represents the sound source signal of one of the pitch sections by a small number of pulses and a codebook representing the characteristics of the sound source signal; obtains the amplitude and phase of the pulses so as to reduce the error between the speech signal and the synthesized signal obtained from the pulses and the codebook; selects one codeword from the codebook; and outputs the pitch parameter, the spectrum parameter, the amplitude and phase of the pulses, and an index representing the codeword. The receiving side generates the sound source signal of that pitch section using the amplitude and phase of the pulses and the index, restores the sound source signals of the other pitch sections by interpolation, and drives a synthesis filter constructed from the spectrum parameter to obtain and output synthesized speech.

In the speech encoding/decoding system of the second invention, the transmitting side obtains, from an input discrete speech signal, a spectrum parameter representing the spectral envelope and a pitch parameter representing the pitch for each frame of a predetermined length; divides the speech signal of the frame into pitch sections according to the pitch period obtained from the pitch parameter; represents the sound source signal of one of the pitch sections by a small number of pulses and a codebook representing the characteristics of the sound source signal; obtains, for the pitch sections other than that pitch section, a correction coefficient that corrects at least one of the amplitude and phase of the pulses; and outputs the pitch parameter, the spectrum parameter, the amplitude and phase of the pulses, an index representing the codeword selected from the codebook, and the correction coefficient. The receiving side generates the sound source signal of that pitch section using the amplitude and phase of the pulses and the index of the codeword; restores the sound source signal in the other pitch sections using the sound source signal of that pitch section and the correction coefficient; and drives a synthesis filter constructed from the spectrum parameter with the restored sound source signal to obtain and output synthesized speech.

[Effect]

In the speech encoding/decoding system according to the present invention, in voiced sections, as shown in the block diagram of FIG. 3, the sound source signal of a pitch section within a frame (usually about 20 ms) is represented by a pulse generating section 700, which generates a small number of pulses that give the amplitude and phase, and a sound source signal forming section 710, which forms the sound source signal by selecting one entry from a codebook 720 of filter impulse responses, or of filter coefficients, representing the spectral envelope of the sound source signal. The sound source signal represented in this way drives a synthesis filter 730 to produce the reproduced speech.

As an example, let the number of pulses be 1.

The codebook used here consists of a set of impulse responses of filters representing the spectral envelope of the sound source signal. Let these be $h_j(n)$ ($j = 1$ to $2^M$). The impulse responses can be obtained in various ways. For example, a predetermined number of samples per frame of the prediction residual signal obtained by LPC analysis of the speech signal is transformed by FFT (fast Fourier transform) to obtain the magnitude spectrum, and an inverse FFT of that spectrum gives the impulse response. Alternatively, the prediction residual signal is subjected to the well-known LPC analysis to obtain the coefficients of a synthesis filter, and the impulse response of that filter is computed. Other well-known methods may also be used.
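The first method (magnitude spectrum followed by an inverse transform) can be sketched as below. This is only an illustration under stated assumptions: a naive DFT stands in for the FFT so that the example needs nothing beyond the standard library, the function names are hypothetical, and any further steps the patent may intend (windowing, smoothing, truncation of the response) are not shown because the text does not specify them.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def envelope_impulse_response(residual):
    """Discard the phase of the residual spectrum and transform back."""
    magnitude = [abs(c) for c in dft(residual)]
    return [c.real for c in dft(magnitude, inverse=True)]


h = envelope_impulse_response([1.0, 0.5, 0.0, 0.0])   # toy residual
```

Because only the magnitude is kept, the result is a zero-phase response that is symmetric around sample 0 in the circular sense.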

The amplitude $g$ and phase $m$ of the pulse and the codeword are obtained as follows. FIG. 4(a) shows the speech waveform of a frame. The frame is divided into pitch sections of pitch period $T$, and one pitch section (the representative section) is considered (FIG. 4(b)). Let $x_k(n)$ be the speech signal in this section. The amplitude $g$ and phase $m$ of the pulse in this section and the optimal codeword from the codebook are chosen so as to minimize the weighted error power $E_k$ given by

$$E_k = \sum_n \left[ \left( x_k(n) - \hat{x}_k(n-m) \right) * w(n) \right]^2 \qquad (1)$$

where

$$\hat{x}_k(n-m) = g \cdot h_j(n-m) * h_s(n) \qquad (2)$$

Here $*$ denotes convolution and $w(n)$ is the impulse response of the perceptual weighting filter. For a specific configuration, see Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proc. ICASSP, pp. 614-617, 1982 (Reference 3). This filter may, however, be omitted. $\hat{x}_k(n)$ is the reproduced speech obtained by representing the sound source signal with the pulse and the $j$-th codeword selected from the codebook and passing it through the synthesis filter, and $h_s(n)$ is the impulse response of the synthesis filter used to synthesize speech. Substituting equation (2) into equation (1), taking the partial derivative with respect to $g$, and setting it to 0 gives

$$g = \frac{\sum_n x_{wk}(n)\, x'_{wk}(n-m)}{\sum_n x'_{wk}(n-m)\, x'_{wk}(n-m)} \qquad (3)$$

where

$$x_{wk}(n) = x_k(n) * w(n), \qquad x'_{wk}(n-m) = h_j(n-m) * h_s(n) * w(n) \qquad (4)$$

The optimal combination of $g$, $m$, and $h_j$ that minimizes equation (1) is found as follows. First, equation (3) is evaluated with one codeword as the impulse response sequence $h_j$, and $g$ and $m$ are determined so as to minimize equation (1).

This is done by finding the $g$ and $m$ that maximize

$$g \sum_n x_{wk}(n)\, x'_{wk}(n-m) = \frac{\left[ \sum_n x_{wk}(n)\, x'_{wk}(n-m) \right]^2}{\sum_n x'_{wk}(n-m)\, x'_{wk}(n-m)}$$

which is the amount by which the error power of equation (1) is reduced when $g$ takes the value given by equation (3). This processing is carried out for every $j$, and the combination of $g$, $m$, and $j$ that gives the largest value of this quantity is the one sought.
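The search over codewords and pulse phases described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: `search_pulse`, `convolve`, and `h_sw` (standing for the combined impulse response of the synthesis and weighting filters) are hypothetical names, indices are 0-based, and the toy codebook has only two entries.

```python
def convolve(a, b, length):
    """Linear convolution of a and b, truncated to `length` samples."""
    out = [0.0] * length
    for i, av in enumerate(a):
        for k, bv in enumerate(b):
            if i + k < length:
                out[i + k] += av * bv
    return out


def search_pulse(x_w, codebook, h_sw, max_phase):
    """Return (g, m, j) maximizing cross**2 / energy over phases and codewords."""
    n = len(x_w)
    best_score, best = -1.0, None
    for j, h_j in enumerate(codebook):
        basis = convolve(h_j, h_sw, n)              # weighted response for m = 0
        for m in range(max_phase):
            shifted = [0.0] * m + basis[: n - m]    # same response delayed by m
            cross = sum(x * y for x, y in zip(x_w, shifted))
            energy = sum(y * y for y in shifted)
            if energy == 0.0:
                continue
            score = cross * cross / energy          # error reduction at optimal g
            if score > best_score:
                best_score, best = score, (cross / energy, m, j)
    return best


# Toy data: the target is codeword 1, delayed by 3 samples and scaled by 2.
codebook = [[1.0, 0.5], [1.0, -0.5]]
h_sw = [1.0, 0.8, 0.64]
target = [2.0 * v for v in [0.0] * 3 + convolve(codebook[1], h_sw, 9)]
g, m, j = search_pulse(target, codebook, h_sw, max_phase=8)
```

The gain is computed as the ratio of cross-correlation to energy, as in equation (3), and candidates are ranked by the squared cross-correlation divided by the energy, so the amplitude never has to be searched separately.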

The processing above yields the amplitude and phase of the pulse and the codeword for the pitch section under consideration.

FIGS. 4(c) and 4(d) show, respectively, the pulse obtained and the synthesized waveform produced by driving the synthesis filter with the sound source signal of the representative section generated from that pulse and the selected codeword. The processing above may be carried out for every pitch section in the frame or only for one pitch section (the representative section).

In unvoiced sections, on the other hand, the sound source signal of the whole frame can be represented using conventional multipulses, a random-number codebook signal, or the like. For the latter method, see, for example, Schroeder and Atal, "Code-excited linear prediction (CELP): High quality speech at very low bit rates," ICASSP, pp. 937-940, 1985 (Reference 4).

[Example]

FIGS. 1(a) and 1(b) show, respectively, a speech encoding device and a speech decoding device that implement the speech encoding/decoding system according to the first invention.

The speech encoding on the transmitting side is described first. In FIG. 1(a), a speech signal is input from an input terminal 100, and one frame (for example, 20 ms) of the speech signal $x(n)$ is stored in a buffer memory 110.

A K-parameter calculation circuit 140 performs the well-known LPC analysis on the speech signal of the frame and calculates K parameters, up to a predetermined order $M$, as the spectrum parameter representing the spectral characteristics of the frame's speech signal. For the specific calculation method, see the K-parameter calculation circuits of References 1 and 2. The K parameters are identical to PARCOR coefficients.
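The text only says that a well-known LPC analysis is used. One common choice, shown here purely as an assumption, is the autocorrelation method with the Levinson-Durbin recursion, whose reflection coefficients are the K (PARCOR) parameters. The function names are illustrative, and the sign convention for the coefficients differs between textbooks.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def k_parameters(r, order):
    """Levinson-Durbin recursion; returns the reflection (K) coefficients."""
    a = [1.0] + [0.0] * order   # error filter A(z) = 1 + sum a_i z^-i
    err = r[0]                  # prediction error energy (must be non-zero)
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return ks


frame = [0.0, 1.0, 0.5, -0.3, -0.8, -0.2, 0.6, 0.9]   # toy data, not real speech
ks = k_parameters(autocorrelation(frame, 2), order=2)
```

With autocorrelation values computed from an actual frame, every coefficient has magnitude below 1, which is what makes these parameters convenient to quantize.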

A K-parameter encoding circuit 160 quantizes the K parameters with a predetermined number of quantization bits and outputs the resulting code $l_k$ to a multiplexer 260. It also decodes this code, converts the result into linear prediction coefficients $a_i'$ ($i = 1$ to $M$), and outputs them to a weighting circuit 200 and an impulse response calculation circuit 170. For methods of encoding the K parameters and converting them into linear prediction coefficients, see, for example, the publication "Linear Prediction of Speech" by J. Makhoul et al. (Reference 5) and References 1 and 2.
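A rough sketch of this quantize, decode, and convert chain is given below. The uniform quantizer is an assumption made only for illustration (the patent defers the actual coding method to the cited references), and the conversion uses the standard step-up recursion with the convention A(z) = 1 + sum of a_i z^-i. All function names are hypothetical.

```python
def quantize_k(k, bits):
    """Uniformly quantize a K parameter in (-1, 1) to an integer code."""
    levels = 1 << bits
    step = 2.0 / levels
    code = int((k + 1.0) / step)
    return min(max(code, 0), levels - 1)


def dequantize_k(code, bits):
    """Decode to the midpoint of the quantization cell."""
    step = 2.0 / (1 << bits)
    return -1.0 + (code + 0.5) * step


def k_to_lpc(ks):
    """Step-up recursion: K parameters -> [1, a_1, ..., a_M]."""
    a = [1.0]
    for k in ks:
        prev = a + [0.0]
        a = [prev[j] + k * prev[len(prev) - 1 - j] for j in range(len(prev))]
    return a


decoded = [dequantize_k(quantize_k(k, 4), 4) for k in (0.3, -0.6)]
lpc = k_to_lpc(decoded)
```

Using the decoded values rather than the unquantized ones on the transmitting side keeps the encoder's filters identical to those the decoder will rebuild.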

A pitch calculation circuit 130 calculates the average pitch period $T$ from the speech signal of the frame. A method based on autocorrelation, for example, is known for this purpose; for details, see the pitch extraction circuits of References 1 and 2. Other well-known methods (for example, the cepstrum method, the SIFT method, or the modified correlation method) may also be used.
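A bare-bones version of the autocorrelation approach might look like the following. It is an illustrative sketch only: the lag range, the absence of normalization, and the function name are all assumptions, and practical pitch extractors add steps such as low-pass filtering or center clipping.

```python
def estimate_pitch_period(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Toy signal: an impulse train with a period of 8 samples.
signal = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
period = estimate_pitch_period(signal, min_lag=4, max_lag=20)
```

Restricting the lag range matters because multiples of the true period also produce correlation peaks; the strict comparison keeps the shortest of equally strong lags.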

A pitch encoding circuit 150 quantizes the average pitch period $T$ with a predetermined number of bits and outputs the resulting code to the multiplexer 260. It also decodes this code and outputs the decoded pitch period $T'$ to a pitch dividing circuit 205 and a sound source signal calculation circuit 220.

A codebook 175 stores $2^M$ impulse response sequences $h_j(n)$ ($n = 1$ to $L$) of filters representing the spectral envelope of the residual signal. The codebook is created in advance by training on impulse response data of filters representing the spectral envelope of the residual signal, analyzed from the prediction residual signals of a large amount of speech. Vector quantization training is a known method for this; see, for example, Makhoul et al., "Vector Quantization in Speech Coding," Proc. IEEE, vol. 73, pp. 1551-1588, 1985 (Reference 6). Various well-known methods can be used to obtain the characteristics of the filter representing the spectral envelope of the residual signal; for example, LPC analysis, covariance analysis, or improved cepstrum analysis can be applied to the residual signal. For LPC analysis and covariance analysis, see Rabiner and Schafer, "Digital Processing of Speech Signals" (Prentice-Hall, 1978; Reference 7). For improved cepstrum analysis, see, for example, Imai et al., "Extraction of spectral envelope by the improved cepstral method" (IEICE Transactions, J62-A, pp. 21?-233, 1979; Reference 8). The codebook 175 outputs the $2^M$ impulse response sequences $h_j(n)$ ($j = 1$ to $2^M$) to the impulse response calculation circuit 170 one at a time, in order from $j = 1$ to $j = 2^M$.

The impulse response calculation circuit 170 uses the linear prediction coefficients $a_i'$ from the K-parameter encoding circuit 160 to calculate the impulse response $h_w(n)$ of the perceptually weighted synthesis filter, convolves it with the output $h_j(n)$ of the codebook 175 according to equation (4), and outputs the resulting impulse response $x'_{wk}(n-m)$ to an autocorrelation function calculation circuit 180.

The autocorrelation function calculation circuit 180 calculates the autocorrelation function $R_{hh}(n)$ of the impulse response $x'_{wk}(n-m)$ up to a predetermined lag and outputs it. For the operation of the autocorrelation function calculation circuit 180, see References 1 and 2.

A subtracter 190 subtracts one frame of the output of a synthesis filter 281 from the speech signal $x(n)$ of the frame and outputs the result to the weighting circuit 200.

The weighting circuit 200 passes the subtraction result through a perceptual weighting filter whose impulse response is $w(n)$ and outputs the resulting weighted signal $x_w(n)$. For the weighting method, see References 1 and 2.

The pitch dividing circuit 205 divides the speech signal of the frame into sections of length $T'$ using the decoded pitch period $T'$.
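The division into pitch sections is simple to illustrate. In this sketch the frame is cut at fixed multiples of the decoded period, and a shorter final section is kept as is. How the patent aligns the sections or treats the remainder is not stated, so both choices are assumptions, as are the function name and the 8 kHz sampling rate behind the 160-sample frame.

```python
def split_into_pitch_sections(frame, period):
    """Cut a frame into consecutive sections of `period` samples each."""
    if period <= 0:
        raise ValueError("pitch period must be positive")
    return [frame[i:i + period] for i in range(0, len(frame), period)]


frame = list(range(160))   # 20 ms at an assumed 8 kHz sampling rate
sections = split_into_pitch_sections(frame, 50)
```

With a decoded period of 50 samples this gives three full sections and a 10-sample remainder.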

A cross-correlation function calculation circuit 210 receives the weighted signal $x_w(n)$ and the impulse response $x'_{wk}(n-m)$, calculates the cross-correlation function $\phi$ up to a predetermined lag, and outputs it. For this calculation, see References 1 and 2.

For one representative pitch section in the frame (the representative section), the sound source signal calculation circuit 220 obtains the amplitude $g$ and phase $m$ of a single pulse when the codebook output $h_j(n)$ is used, calculating $g$ and $m$ with equation (3). Then, as described in the [Effect] section above, this processing is repeated for all $2^M$ sequences $h_j(n)$ output from the codebook 175, and the combination of $g$, $m$, and $h_j(n)$ that minimizes the error power of equation (1) is obtained. A code indicating the index of the selected codeword is output to the multiplexer 260, and $g$ and $m$ are output to an encoder 230.

The encoder 230 encodes the amplitude $g$ and phase $m$ of the pulse in the representative section with a predetermined number of bits and outputs the result. It also encodes information $P$ indicating the subframe position of the representative section with a predetermined number of bits and outputs it to the multiplexer 260. In addition, it decodes these values and outputs them to a drive signal restoration circuit 283.

The drive signal restoration circuit 283 generates the sound source signal in the representative section using the amplitude and phase of the pulse obtained for the representative section and the selected codeword.

In the other pitch sections, the pulses are obtained by linearly interpolating between the amplitudes of the pulses in the representative sections of the preceding and following frames.
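As a small worked example of this interpolation, the sketch below places the intermediate pitch sections at equal steps between two representative sections. The equal spacing and the function name are assumptions made for illustration; the patent does not give the interpolation weights explicitly.

```python
def interpolate_amplitudes(g_prev, g_next, num_sections):
    """Amplitudes for the sections lying between two representative sections."""
    return [
        g_prev + (g_next - g_prev) * (i + 1) / (num_sections + 1)
        for i in range(num_sections)
    ]


amplitudes = interpolate_amplitudes(1.0, 2.0, 3)
print(amplitudes)
```

For representative amplitudes of 1.0 and 2.0 with three sections in between, this yields 1.25, 1.5, and 1.75.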

For the selected codeword, likewise, the codewords of the representative sections are linearly interpolated to obtain the impulse responses representing the spectral envelope of the residual signal in the other pitch sections. The sound source signal of the frame is restored and generated by this processing.

The synthesis filter 281 receives the restored sound source signal and the linear prediction coefficients $a_i'$ from the K-parameter encoding circuit 160 via the weighting circuit 200, obtains one frame of the synthesized speech signal, calculates one frame of the influence signal carried over into the next frame, and outputs it to the subtracter 190. For the calculation of the influence signal, see Japanese Patent Application No. 57-231605 (Reference 9).

The multiplexer 260 combines and outputs the code representing the amplitude and phase of the pulse in the representative section, the code representing the position of the representative section, the code representing the K parameters, the code representing the pitch period, and the code representing the selected codeword.

The speech decoding on the receiving side is described next. In FIG. 1(b), a demultiplexer 300 separates the received signal into its component codes and outputs them.

A decoder 310 decodes and outputs the amplitude and phase of the pulse in the representative section.

A parameter decoder 320 decodes the code representing the K parameters, converts the decoded K parameters into linear prediction coefficients $a_i'$, and outputs them. It also decodes the code representing the pitch period and outputs the decoded pitch period $T'$.

A drive signal restoration circuit 340 operates in the same way as the drive signal restoration circuit 283 on the transmitting side and restores the sound source signal of the frame. A codebook 350 stores the same codewords as the codebook 175 on the transmitting side.

A synthesis filter 360 receives the sound source signal of the frame, obtains the synthesized speech, and outputs it from a terminal 380.

As described above, according to this embodiment, the transmitting side obtains, from an input discrete speech signal, a spectrum parameter representing the spectral envelope and a pitch parameter representing the pitch for each frame of a predetermined length; divides the speech signal of the frame into pitch sections according to the pitch period obtained from the pitch parameter; represents the sound source signal of one of the pitch sections by a small number of pulses and a codebook representing the characteristics of the sound source signal; obtains the amplitude and phase of the pulses so as to reduce the error between the speech signal and the synthesized signal obtained from the pulses and the codebook; selects one codeword from the codebook; and outputs the pitch parameter, the spectrum parameter, the amplitude and phase of the pulses, and an index representing the codeword. The receiving side generates the sound source signal of that pitch section using the amplitude and phase of the pulses and the index, restores the sound source signals of the other pitch sections by interpolation, and drives a synthesis filter constructed from the spectrum parameter to obtain and output synthesized speech.

FIGS. 2(a) and 2(b) show, respectively, a speech encoding device and a speech decoding device that implement the speech encoding/decoding system according to the second invention. Components given the same reference numerals as in FIGS. 1(a) and 1(b) operate in the same way as in those figures, and their description is omitted.

In FIG. 2(a), 225 is an amplitude/phase correction calculation circuit. For each pitch section in the same frame other than the representative section, the amplitude/phase correction calculation circuit 225 calculates correction coefficients for correcting the amplitude and phase of the pulse of the representative section.

Specifically, they are obtained as follows. Let the input speech, the amplitude correction coefficient, and the phase correction coefficient in the $i$-th pitch section be $x_i(n)$, $c_i$, and $d_i$, respectively.

The perceptually weighted error power between the input speech in the $i$-th pitch section and the reproduced signal $\hat{x}_i(n)$, obtained by correcting the amplitude and phase of the pulse of the representative section and passing the result through the synthesis filter, can be written as follows.

$$E_i = \sum_n \left[ \left( x_i(n) - c_i\,\tilde{x}(n - T' - d_i) \right) * w(n) \right]^2 \qquad (5)$$

The amplitude and phase correction coefficients $c_i$ and $d_i$ can be obtained so as to minimize the above equation. Partially differentiating the above equation with respect to the amplitude correction coefficient $c_i$ and setting the result to 0 gives the following equation.

$$c_i = \frac{\sum_n x_{wi}(n)\,\tilde{x}_w(n - T' - d_i)}{\sum_n \tilde{x}_w(n - T' - d_i)\,\tilde{x}_w(n - T' - d_i)} \qquad (6)$$

Here the subscript $w$ denotes a signal weighted by $w(n)$. The above equation is calculated for various values of the phase correction coefficient $d_i$, and the pair $c_i$, $d_i$ that maximizes the above equation is obtained.
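The search described by Equations (5) and (6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of the embodiment: both signals are assumed to be already perceptually weighted, samples outside a signal are treated as zero, the candidate set for the phase correction is chosen by the caller, and the function names are hypothetical. The sketch selects the candidate that gives the smallest Equation (5) error, which is the stated goal of the minimization.

```python
def shifted(signal, shift, length):
    """Delay `signal` by `shift` samples, zero-filled, returning `length` samples."""
    return [signal[n - shift] if 0 <= n - shift < len(signal) else 0.0
            for n in range(length)]


def correction_coefficients(x_w, ref_w, pitch, d_candidates):
    """Return (c, d, error) for the candidate d with the smallest Eq. (5) error.

    x_w          : weighted input speech of the i-th pitch section (assumed)
    ref_w        : weighted signal reproduced from the representative section (assumed)
    pitch        : the shift T' in samples
    d_candidates : phase correction values to try (hypothetical search range)
    """
    best = None
    for d in d_candidates:
        y = shifted(ref_w, pitch + d, len(x_w))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # Eq. (6) is undefined when the shifted reference is all zero
        cross = sum(a * b for a, b in zip(x_w, y))
        c = cross / energy  # amplitude correction coefficient, Eq. (6)
        error = sum((a - c * b) ** 2 for a, b in zip(x_w, y))  # Eq. (5)
        if best is None or error < best[2]:
            best = (c, d, error)
    return best


if __name__ == "__main__":
    ref = [1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
    target = [0.8 * v for v in shifted(ref, 3, 8)]
    print(correction_coefficients(target, ref, 2, range(-2, 3)))
```

For a fixed phase correction, substituting the Equation (6) gain into Equation (5) leaves an error that shrinks as the squared cross-correlation divided by the energy grows, so the same choice could also be made by maximizing that ratio.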

The above processing is performed for all pitch sections in the frame other than the representative section, and the amplitude and phase correction coefficients of each section are output to the encoder 230.

The drive signal restoration circuit 285 generates the sound source signal in the representative section of the frame using the amplitude and phase of the pulse and the selected codeword. In the i-th pitch section other than the representative section within the same frame, it generates the sound source signal $d_i(n)$ by correcting the sound source signal $v(n)$ of the representative section with the amplitude and phase correction coefficients $c_i$ and $d_i$ according to the following equation.

$$d_i(n) = c_i \cdot v(n - T' - d_i) \qquad (7)$$

where

$$v(n) = g \cdot h_j(n - m) \qquad (8)$$

Here $h_j(n)$, $g$, and $m$ are the codeword of the codebook, the amplitude of the pulse, and the phase of the pulse, respectively.
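Equations (7) and (8) can be illustrated with a short sketch. It assumes a single pulse, a common sample index n for the representative section and the section being restored, and zero values outside the codeword; the function names and the example numbers are hypothetical and are not taken from the embodiment.

```python
def representative_excitation(codeword, gain, phase, length):
    """Eq. (8): v(n) = g * h_j(n - m), taken as zero outside the codeword."""
    return [gain * codeword[n - phase] if 0 <= n - phase < len(codeword) else 0.0
            for n in range(length)]


def corrected_excitation(v, c, pitch, d, length):
    """Eq. (7): d_i(n) = c_i * v(n - T' - d_i), taken as zero outside v."""
    shift = pitch + d
    return [c * v[n - shift] if 0 <= n - shift < len(v) else 0.0
            for n in range(length)]


if __name__ == "__main__":
    h_j = [1.0, 0.5, 0.25]            # illustrative codeword (impulse response)
    v = representative_excitation(h_j, gain=2.0, phase=1, length=10)
    print(v)
    print(corrected_excitation(v, c=0.5, pitch=4, d=1, length=10))
```

Because only the two correction coefficients are needed for each non-representative section, the decoder can rebuild those sections from the representative excitation without receiving a further pulse or codeword for them.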

In the receiving-side decoding device shown in FIG. 2(b), the drive signal restoration circuit 342 performs the same operation as the drive signal restoration circuit 285 on the transmitting side.

As described above, according to this embodiment, the transmitting side obtains, from the input discrete speech signal, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch for each frame of a predetermined time length; divides the speech signal of the frame into pitch sections according to the pitch period obtained from the pitch parameter; represents the sound source signal of one of the pitch sections by a small number of pulses and a codebook representing the characteristics of the sound source signal; obtains, for the pitch sections other than that pitch section, correction coefficients for correcting the amplitude and phase of the pulse; and outputs the pitch parameter, the spectral parameter, the amplitude and phase of the pulse, an index representing the selected codeword of the codebook, and the correction coefficients. The receiving side generates the sound source signal of that pitch section using the amplitude and phase of the pulse and the index of the codeword, restores the sound source signals of the other pitch sections using the sound source signal of that pitch section and the correction coefficients, and drives a synthesis filter configured from the spectral parameter with the restored sound source signal to obtain and output synthesized speech.

The embodiments described above are merely examples of the present invention, and various modifications are possible.

For example, the calculation of the amplitude and phase of the pulse and the selection of the codeword may be performed not only in the representative section but in all pitch sections within the frame.

With such a configuration, the amount of information required to transmit the sound source information increases, but the characteristics improve.

The representative section may be fixed within the frame, for example at the center of the frame, or may be found by searching for the best section. For a specific method of the latter, see Reference 1 cited above.

The number of pulses in the representative section may be two or more, although this increases the amount of transmitted information.

The codeword may or may not be linearly interpolated in the pitch sections other than the representative section.

In the embodiments, the codebook consists of impulse responses of a filter representing the spectral envelope of the prediction residual signal of the speech signal, but it may instead consist of filter coefficients.

With such a configuration, the filter coefficients must be converted into an impulse response. Specifically, well-known coefficients such as linear prediction coefficients, K parameters, log area ratios, cepstrum, and mel-cepstrum can be used.
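As an illustration of the conversion from filter coefficients to an impulse response, the following sketch handles the linear prediction case only. It assumes an all-pole filter written as H(z) = 1 / (1 - Σ a_k z^-k); this sign convention, the function name, and the truncation length are assumptions for illustration and are not specified in the text.

```python
def impulse_response(a, length):
    """First `length` samples of the impulse response of 1 / (1 - sum_k a[k-1] z^-k)."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0       # unit impulse input
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                value += a_k * h[n - k]      # recursive (all-pole) part
        h.append(value)
    return h


if __name__ == "__main__":
    print(impulse_response([0.5], 4))        # first-order example
    print(impulse_response([0.0, 0.25], 5))  # second-order example
```

Other coefficient types named above, such as K parameters or log area ratios, would first have to be converted to linear prediction coefficients before a recursion of this kind applies.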

In the embodiments, K parameters are encoded as the spectral parameters and LPC analysis is used as the analysis method, but other well-known parameters, such as LSP, LPC cepstrum, cepstrum, improved cepstrum, generalized cepstrum, and mel-cepstrum, can also be used as the spectral parameters. The analysis method best suited to each parameter can be used.

To reduce the amount of computation, the calculation of the influence signal can also be omitted. The drive signal restoration circuit 283, the synthesis filter, and the subtractor 190 then become unnecessary, which reduces the amount of computation but lowers the sound quality.

As is well known in the field of digital signal processing, the autocorrelation function corresponds to the power spectrum on the frequency axis, and the cross-correlation function corresponds to the cross-power spectrum, so they can also be calculated from these spectra. For these calculation methods, see the publication by Oppenheim et al. entitled "Digital Signal Processing" (Prentice-Hall, 1975).
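The correspondence between the autocorrelation function and the power spectrum can be checked numerically with the sketch below. It is an illustration only: a plain O(N^2) DFT is used for clarity where a practical implementation would likely use an FFT, the zero-padding length is one possible choice, and all function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Plain discrete Fourier transform; the inverse is scaled by 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def autocorrelation_direct(x, max_lag):
    """Time-domain autocorrelation r(lag) = sum_n x(n) x(n + lag)."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def autocorrelation_from_spectrum(x, max_lag):
    """Autocorrelation obtained as the inverse DFT of the power spectrum."""
    padded = list(x) + [0.0] * len(x)   # zero-padding avoids circular wrap-around
    power = [abs(v) ** 2 for v in dft(padded)]
    r = dft(power, inverse=True)
    return [r[lag].real for lag in range(max_lag + 1)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(autocorrelation_direct(x, 3))
    print(autocorrelation_from_spectrum(x, 3))
```

The two functions agree up to rounding error for lags smaller than the signal length. A cross-correlation could be obtained in the same way from the cross-power spectrum of two signals.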

[Effects of the Invention]

As described above, according to the present invention, the sound source signal of one pitch section (the representative section) is represented using a small number of pulses that give the amplitude and phase and a codebook representing the characteristics of the sound source signal. This has the significant effect that, at a bit rate of about 4.8 kb/s, the sound source signal is approximated more closely than with conventional methods and good synthesized speech can be obtained.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a speech encoding device and a speech decoding device for explaining an embodiment of the speech encoding/decoding method according to the first invention. FIG. 2 is a block diagram of a speech encoding device and a speech decoding device for explaining an embodiment of the speech encoding/decoding method according to the second invention. FIGS. 3 and 4 are diagrams for explaining the present invention in detail.

110: buffer memory
130: pitch calculation circuit
140: K-parameter calculation circuit
150: pitch encoding circuit
160: K-parameter encoding circuit
170: impulse response calculation circuit
175, 350: codebook
180: autocorrelation function calculation circuit
205: pitch division circuit
210: cross-correlation function calculation circuit
220: sound source signal calculation circuit
225: amplitude/phase correction calculation circuit
230: encoder
260: multiplexer
281, 360, 730: synthesis filter
283, 285, 340, 342: drive signal restoration circuit
300: demultiplexer

Claims

[Claims]

(1) A speech encoding/decoding method in which, on the transmitting side, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch are obtained from an input discrete speech signal for each frame of a predetermined time length; the speech signal of the frame is divided into pitch sections according to the pitch period obtained from the pitch parameter; the sound source signal of one of the pitch sections is represented by a small number of pulses and a codebook representing the characteristics of the sound source signal; the amplitude and phase of the pulse are determined so as to reduce the error between the speech signal and the synthesized signal obtained from the pulse and the codebook; one codeword is selected from the codebook; and the pitch parameter, the spectral parameter, the amplitude and phase of the pulse, and an index representing the codeword are output; and
on the receiving side, the sound source signal of said pitch section is generated using the amplitude and phase of the pulse and the index; the sound source signals of the other pitch sections are restored by interpolation; and a synthesis filter configured from the spectral parameter is driven to obtain and output synthesized speech.

(2) A speech encoding/decoding method in which, on the transmitting side, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch are obtained from an input discrete speech signal for each frame of a predetermined time length; the speech signal of the frame is divided into pitch sections according to the pitch period obtained from the pitch parameter; the sound source signal of one of the pitch sections is represented by a small number of pulses and a codebook representing the characteristics of the sound source signal; for the pitch sections other than said pitch section, a correction coefficient for correcting at least one of the amplitude and phase of the pulse is obtained; and the pitch parameter, the spectral parameter, the amplitude and phase of the pulse, an index representing the selected codeword of the codebook, and the correction coefficient are output; and on the receiving side, the sound source signal of said pitch section is generated using the amplitude and phase of the pulse and the index of the codeword; in the other pitch sections, the sound source signal is restored using the sound source signal of said pitch section and the correction coefficient; and a synthesis filter configured from the spectral parameter is driven by the restored sound source signal to obtain and output synthesized speech.