JPH077277B2

JPH077277B2 - Speech coding method and apparatus thereof

Info

Publication number: JPH077277B2
Application number: JP59156117A
Authority: JP
Inventors: 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1984-07-26
Filing date: 1984-07-26
Publication date: 1995-01-30
Anticipated expiration: 2010-01-30
Also published as: JPS6163900A

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a low-bit-rate waveform coding method for speech signals and to an apparatus therefor.

(Prior Art and Its Problems) As an effective way to encode a speech signal at a transmission rate of, for example, about 16 kbit/s or less, it is well known to search, at short time intervals, for the excitation signal sequence of the speech signal under the condition that the error power between the input signal and the signal reproduced from that excitation is minimized. Depending on the search procedure, these methods are called tree coding or vector quantization. Apart from these methods, a scheme has recently been proposed in which a sequence of several pulses representing the excitation signal is determined sequentially on the encoder side, at short time intervals, using analysis-by-synthesis (A-b-S). The present invention relates to this scheme. The scheme is described in detail in the paper by B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proceedings of I.C.A.S.S.P., 1982, pp. 614-617 (Reference 1), so only a brief explanation is given here.

FIG. 1 shows the concept of the encoder-side processing in the conventional scheme described in Reference 1. In the figure, 100 denotes the encoder input terminal, to which the A/D-converted speech signal sequence x(n) is input. 110 is a buffer memory circuit that stores one frame of the speech signal sequence (for example, 80 samples for a 10 ms frame at 8 kHz sampling). The output of 110 is supplied to a subtractor 120 and to a K-parameter calculation circuit 180. Reference 1 uses the term "reflection coefficients" rather than "K parameters", but the two are the same parameters. The K-parameter calculation circuit 180 uses the output of 110 to compute, by the covariance method, 16 K parameters K_i (1 ≤ i ≤ 16) representing the speech spectrum of each frame, and outputs them to a synthesis filter 130. 140 is an excitation pulse generation circuit, which generates a predetermined number of pulses per frame. This pulse sequence is denoted d(n). FIG. 2 shows an example of an excitation pulse sequence generated by circuit 140; the horizontal axis is discrete time and the vertical axis is amplitude. The example shows eight pulses in one frame. The pulse sequence d(n) generated by circuit 140 drives the synthesis filter 130. The synthesis filter 130 receives d(n), computes the reproduced signal x̂(n) corresponding to the speech signal x(n), and outputs it to the subtractor 120. To do so, the synthesis filter 130 receives the K parameters K_i, converts them into prediction parameters a_i (1 ≤ i ≤ 16), and computes x̂(n) using a_i. x̂(n) can be expressed in terms of d(n) and a_i by the following equation.
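Equation (1) is not reproduced in this text. A form consistent with the definitions above and with the usual all-pole LPC synthesis recursion, offered as an assumption rather than as the original equation, would be:

```latex
\hat{x}(n) = d(n) + \sum_{i=1}^{P} a_i \, \hat{x}(n-i) \qquad (1)
```

Under this sign convention the synthesis filter has the transfer function 1 / (1 - Σ a_i z^{-i}); a source using the opposite sign for a_i would flip the sign of the sum.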

In the above equation, P is the order of the synthesis filter; Reference 1 uses P = 16. The subtractor 120 computes the difference e(n) between the original signal x(n) and the reproduced signal x̂(n) and outputs it to a weighting circuit 190. Circuit 190 receives e(n) and, using a weighting function w(n), computes the weighted error ew(n) according to the following equation.

ew(n) = w(n) * e(n)   (2)

In the above equation, the symbol "*" denotes convolution. The weighting function w(n) performs weighting on the frequency axis; denoting its z-transform by W(z), it is expressed in terms of the prediction parameters a_i of the synthesis filter by the following equation.
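Equation (3) is missing from this text. The form below is a reconstruction, assuming the synthesis filter is written as 1 / (1 - Σ a_i z^{-i}); it is chosen because it reproduces the limiting behaviour that the text describes for r = 1 and r = 0:

```latex
W(z) = \frac{1 - \sum_{i=1}^{P} a_i z^{-i}}{1 - \sum_{i=1}^{P} a_i r^{i} z^{-i}} \qquad (3)
```

With r = 1 the numerator and denominator coincide, so W(z) = 1; with r = 0 the denominator becomes 1 and W(z) reduces to the inverse of the synthesis filter.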

In the above equation, r is a constant with 0 ≤ r ≤ 1 that determines the frequency characteristic of W(z). With r = 1, W(z) = 1 and the frequency characteristic is flat. With r = 0, W(z) becomes the inverse of the frequency characteristic of the synthesis filter. The characteristic of W(z) can therefore be varied through the value of r. W(z) is made to depend on the frequency characteristic of the synthesis filter, as in equation (3), in order to exploit auditory masking: where the spectral power of the input speech is large (for example, near the formants), a somewhat larger error relative to the spectrum of the reproduced signal is hard to hear. FIG. 3 shows the spectrum of the input speech signal in one frame together with an example of the frequency characteristic of W(z), here with r = 0.8. The horizontal axis is frequency (up to 4 kHz) and the vertical axis is log amplitude (up to 60 dB). The upper curve is the speech spectrum and the lower curve is the frequency characteristic of the weighting function.
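The effect of r can be checked numerically. The sketch below assumes that W(z) has the pole-zero form (1 - Σ a_i z^{-i}) / (1 - Σ a_i r^i z^{-i}), which matches the two limiting cases described above. The function names and the second-order coefficients are illustrative and do not come from the source.

```python
import cmath

def weighting_response(a, r, omega):
    """|W(e^{j*omega})| for the assumed form W(z) = A(z) / A(z/r).

    A(z) = 1 - sum_i a_i z^-i; the denominator uses a_i * r**i instead of a_i.
    """
    num = 1.0
    den = 1.0
    for i, ai in enumerate(a, start=1):
        z_inv = cmath.exp(-1j * omega * i)
        num -= ai * z_inv
        den -= ai * (r ** i) * z_inv
    return abs(num / den)

def inverse_filter_response(a, omega):
    """|A(e^{j*omega})|, the inverse of the synthesis filter response."""
    return abs(1.0 - sum(ai * cmath.exp(-1j * omega * i)
                         for i, ai in enumerate(a, start=1)))

# Hypothetical stable second-order predictor (complex pole pair, radius 0.9).
a = [1.2, -0.81]
for omega in (0.1, 0.84, 2.0):
    flat = weighting_response(a, 1.0, omega)   # r = 1: W(z) = 1
    inv = weighting_response(a, 0.0, omega)    # r = 0: W(z) = A(z)
    mid = weighting_response(a, 0.8, omega)    # the value used for FIG. 3
    print(f"omega={omega:.2f}  r=1: {flat:.3f}  r=0: {inv:.3f}  r=0.8: {mid:.3f}")
```

Evaluating the magnitude on a grid of frequencies for several values of r gives a family of curves between the flat response and the inverse-filter response.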

Returning to FIG. 1, the weighted error ew(n) is fed back to an error minimization circuit 150. Circuit 150 stores one frame of ew(n) values and uses them to compute the weighted squared error ε according to the following equation.
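Equation (4) is not reproduced in this text. Given that ε is described as the weighted squared error over N samples, it is presumably the sum of squares:

```latex
\varepsilon = \sum_{n=1}^{N} e_w(n)^{2} \qquad (4)
```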

Here, N is the number of samples over which the squared error is computed. In the scheme of Reference 1 this interval is 5 ms, which corresponds to N = 40 at 8 kHz sampling. Next, the error minimization circuit 150 determines pulse position and amplitude information so as to reduce the squared error ε computed by equation (4), and supplies this information to the excitation pulse generation circuit 140. Circuit 140 generates an excitation pulse sequence from this information. The synthesis filter 130 computes the reproduced signal x̂(n) with this pulse sequence as its excitation. The subtractor 120 then subtracts the newly obtained reproduced signal x̂(n) from the previously computed error e(n) between the original and reproduced signals, and takes the result as the new error e(n). The weighting circuit 190 receives e(n), computes the weighted error ew(n), and feeds it back to the error minimization circuit 150. Circuit 150 again computes the squared error and adjusts the amplitudes and positions of the excitation pulses so as to reduce it. This series of steps, from generating the excitation pulse sequence to adjusting it by error minimization, is repeated until the number of pulses reaches a predetermined number, at which point the excitation pulse sequence is determined.

This concludes the description of the conventional scheme.

In the conventional scheme, the information to be transmitted consists of the K parameters K_i (1 ≤ i ≤ 16) of the synthesis filter and the positions and amplitudes of the excitation pulses, and any desired transmission rate can be realized through the number of pulses placed in one frame. Moreover, the scheme gives good reproduced speech quality at transmission rates of 16 kbps and below, and is considered one of the effective schemes.

However, the conventional scheme has the drawback of requiring a very large amount of computation. This is because the excitation pulse computation loop contains A-b-S processing: to compute the positions and amplitudes of the pulses, a signal is first reproduced from the pulses, the error and squared error between this reproduced signal and the original signal are computed, and these are fed back to adjust the pulse positions and amplitudes so as to reduce the squared error. In addition, at bit rates of about 16 kbps or less, the conventional scheme suffers degraded reproduction quality for input signals with a high pitch frequency, for example a female voice. When the pitch frequency is high, the pulse computation frame contains many pitch waveforms, and reproducing them well requires more excitation pulses than for a speaker with a low pitch frequency. For this reason it has been difficult to lower the transmission bit rate substantially, that is, to reduce substantially the number of pulses per frame.

(Object of the Invention) An object of the present invention is to provide a speech coding method and apparatus capable of reproducing high-quality speech even at a low transmission bit rate.

(Constitution of the Invention) According to the present invention, there is obtained a speech coding method characterized in that: on the transmitting side, a discrete speech signal sequence is input and a plurality of parameter sequences representing the short-time spectral characteristics of the speech signal sequence are obtained; when a pulse sequence for representing the speech signal sequence is obtained for each frame, either the plurality of parameter sequences are switched within the frame to obtain the pulse sequence, which is then encoded, or a pulse sequence is obtained for each of the parameters, after which the best parameter is selected and the pulse sequence is encoded; the code representing the pulse sequence and the code representing the plurality of parameter sequences are combined and output; and on the receiving side, the combined code is input, the code representing the pulse sequence and the code representing the plurality of parameter sequences are separated and decoded, and, when the speech signal sequence is reproduced from the decoded pulse sequence, either the decoded plurality of parameter sequences are switched or the selected parameter is used to reproduce the speech signal sequence.

According to the present invention, there is also obtained a speech coding apparatus characterized by comprising: a parameter calculation circuit that receives a discrete speech signal sequence and obtains a plurality of parameter sequences representing the short-time spectral characteristics of the speech signal sequence; a pulse calculation circuit that, when obtaining for each frame a pulse sequence for representing the speech signal sequence, either switches the plurality of parameter sequences within the frame to obtain the pulse sequence and encodes it, or obtains a pulse sequence for each of the parameters, then selects the best parameter and encodes the pulse sequence; and a multiplexer circuit that combines and outputs the code sequence of the plurality of parameter sequences and the code sequence of the pulse sequence.

Further, according to the present invention, there is obtained a speech decoding apparatus characterized by comprising: a demultiplexer circuit that receives a code sequence in which a code representing a plurality of parameter sequences representing the short-time spectral characteristics of a speech signal sequence is combined with a code representing a pulse sequence for representing the speech signal, and separates the individual codes; a pulse decoding circuit that receives and decodes the separated code representing the pulse sequence; a parameter decoding circuit that receives and decodes the separated code sequence representing the plurality of parameter sequences; and a synthesis filter circuit that, when reproducing the speech signal sequence from the decoded pulse sequence and the decoded plurality of parameter sequences, either switches the decoded plurality of parameter sequences or uses the selected parameter, and reproduces and outputs the speech signal sequence.

(Principle of the Invention) The present invention is characterized by its method of computing the excitation pulses: the parameters representing the short-time spectral characteristics of the original signal are switched during the computation of the excitation pulses. Below, taking the impulse response as an example of such a parameter, the excitation pulse computation is described for the case where two impulse responses are switched during the computation.

Let the two impulse responses be h₁(n) and h₂(n). They can be chosen in several ways; here h₁(n) is taken to be an impulse response representing the short-time spectrum of the speech signal including the pitch fine structure, and h₂(n) an impulse response representing the short-time spectral envelope of the speech signal. With this choice, h₁(n) is obtained from the impulse response of a synthesis filter consisting of a pitch prediction filter and a spectral envelope prediction filter in cascade, and h₂(n) from the impulse response of the spectral envelope prediction filter alone. The pitch prediction filter may be first-order or higher-order; to simplify the description, a first-order pitch prediction filter is considered here. Various methods are known for computing the tap coefficient β and the pitch period M of the pitch prediction filter. A simple and well-known one is to extract the peak amplitude of the autocorrelation sequence of the input speech signal and the position of that peak. This method is described in detail in the paper by B. S. Atal and M. R. Schroeder, "Adaptive Predictive Coding of Speech Signals," Bell System Technical Journal, October 1970, pp. 1973-1986 (Reference 2), so its description is omitted here.
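As an illustration of the autocorrelation-peak method mentioned above, the following sketch estimates M as the lag of the largest autocorrelation value within a search range, and β as that value normalized by the zero-lag autocorrelation. The function name, the lag range, and the normalization are assumptions made for the example; Reference 2 should be consulted for the actual procedure.

```python
import math

def estimate_pitch(x, min_lag, max_lag):
    """Return (M, beta): lag of the autocorrelation peak and its normalized height."""
    def autocorr(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

    r0 = autocorr(0)
    if r0 == 0.0:
        return min_lag, 0.0  # silent frame: no meaningful pitch
    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return best_lag, autocorr(best_lag) / r0

# Synthetic test signal: a sinusoid with a 50-sample period (160 Hz at 8 kHz).
frame = [math.sin(2 * math.pi * n / 50) for n in range(400)]
M, beta = estimate_pitch(frame, min_lag=20, max_lag=80)
print(M, round(beta, 3))
```

The lag range is what keeps the search away from the trivial peak at lag 0; for real speech it would be set from the expected range of pitch periods.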

Let d(n) be the excitation pulse sequence in one frame. The reproduced signal x̂(n) obtained using the impulse response h₁(n) can then be written as follows.

x̂(n) = d(n) * h₁(n)   (5)

Here the symbol "*" denotes convolution. d(n) can be expressed as follows.
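Equation (6) is not reproduced in this text. From the definitions given in the next paragraph (Kronecker delta, amplitudes g_i, positions m_i, pulse count K₁), it is presumably:

```latex
d(n) = \sum_{i=1}^{K_1} g_i \, \delta(n, m_i) \qquad (6)
```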

Here, δ(n, m_i) is the Kronecker delta, equal to 1 when n = m_i and 0 when n ≠ m_i. g_i and m_i are the amplitude and position of the i-th pulse, respectively. K₁ is the number of pulses per frame determined using the impulse response h₁(n).

Next, the weighted error power between the input speech signal x(n) and the reproduced signal x̂(n) can be written as follows.
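Equation (7) is not reproduced in this text. A reconstruction consistent with equations (5) and (8) and with the pulse definitions above, given as an assumption, is:

```latex
E = \sum_{n=1}^{N} \Bigl[ \bigl( x(n) - \hat{x}(n) \bigr) * w(n) \Bigr]^{2}
  = \sum_{n=1}^{N} \Bigl[ x_w(n) - \sum_{i=1}^{K_1} g_i \, h_{1w}(n - m_i) \Bigr]^{2} \qquad (7)
```

The second form follows by substituting the pulse sum for d(n) and moving the weighting w(n) inside the convolution.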

Here,

xw(n) = x(n) * w(n)   (8)
h₁w(n) = h₁(n) * w(n)

where w(n) is the impulse response of the weighting circuit. For example, w(n) is given the same characteristic as the weighting circuit of the conventional scheme shown in FIG. 1.

The excitation pulse sequence that minimizes equation (7) is computed from the following equation, obtained by partially differentiating equation (7) with respect to the pulse amplitude g_i and setting the result to zero.
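Equation (9) is not reproduced in this text. Setting the derivative of a squared error of the form Σ [xw(n) - Σ g_i h₁w(n - m_i)]² to zero, and assuming the pulses are determined one at a time, gives an expression of the following shape. The symbols φ_xh and φ_hh and their index conventions are assumptions here.

```latex
g_i(m_i) = \frac{\phi_{xh}(m_i) - \sum_{l=1}^{i-1} g_l \, \phi_{hh}(m_l, m_i)}{\phi_{hh}(m_i, m_i)} \qquad (9)

\phi_{xh}(m) = \sum_{n} x_w(n) \, h_{1w}(n - m), \qquad
\phi_{hh}(m_l, m_i) = \sum_{n} h_{1w}(n - m_l) \, h_{1w}(n - m_i)
```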

Here, φxh(·) denotes the cross-correlation sequence computed from xw(n) and h₁w(n), and φhh(·) denotes the autocorrelation sequence computed from h₁w(n). In the field of speech signal processing, φhh(·) is often called the covariance sequence.

According to equation (9), the position of a pulse is found as the position that maximizes the absolute value of the right-hand side of equation (9). The amplitude of the pulse is the value of g_i at that position.

Using this relationship, equation (9) is modified as follows.
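Equation (10) is not reproduced in this text. Assuming the modification replaces the covariance of h₁w(n) by an autocorrelation Rhh that depends only on the distance between pulse positions, the expression would take the following form, where φ_xh denotes the cross-correlation of xw(n) and h₁w(n):

```latex
g_i(m_i) = \frac{\phi_{xh}(m_i) - \sum_{l=1}^{i-1} g_l \, R_{hh}(|m_l - m_i|)}{R_{hh}(0)} \qquad (10)
```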

Here, Rhh(·) can be expressed as follows.
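Equation (11) is not reproduced in this text. Assuming Rhh is the ordinary autocorrelation of the weighted impulse response h₁w(n) over N samples, it would read:

```latex
R_{hh}(|m_l - m_i|) = \sum_{n=1}^{N - |m_l - m_i|} h_{1w}(n) \, h_{1w}(n + |m_l - m_i|) \qquad (11)
```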

Accordingly, when h₁(n) is used as the impulse response, K₁ excitation pulses can be computed from equation (10).
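The sequential search described for equations (9) and (10) can be sketched as follows, assuming the covariance is approximated by an autocorrelation that depends only on the lag. The function names and the toy impulse response are illustrative and are not taken from the source.

```python
def cross_correlation(xw, hw):
    """phi(m) = sum_n xw[n] * hw[n - m] for every candidate position m."""
    n_samples = len(xw)
    return [sum(xw[n] * hw[n - m]
                for n in range(m, min(n_samples, m + len(hw))))
            for m in range(n_samples)]

def autocorrelation(hw, max_lag):
    """R(tau) = sum_n hw[n] * hw[n + tau] for tau = 0 .. max_lag - 1."""
    return [sum(hw[n] * hw[n + tau] for n in range(len(hw) - tau))
            for tau in range(max_lag)]

def search_pulses(phi, r, num_pulses):
    """Pick pulses one at a time without resynthesizing the signal.

    Each step takes the position where |phi| is largest, sets the amplitude
    to phi / R(0), and removes that pulse's contribution from phi.
    """
    phi = list(phi)  # work on a copy
    pulses = []
    for _ in range(num_pulses):
        m = max(range(len(phi)), key=lambda k: abs(phi[k]))
        g = phi[m] / r[0]
        pulses.append((m, g))
        for k in range(len(phi)):
            lag = abs(k - m)
            if lag < len(r):
                phi[k] -= g * r[lag]
    return pulses

# Toy check: a target built from two known pulses through a short response.
hw = [1.0, 0.5, 0.25]
frame_len = 20
true_pulses = [(3, 2.0), (12, -1.0)]
xw = [0.0] * frame_len
for m, g in true_pulses:
    for j, h in enumerate(hw):
        xw[m + j] += g * h

phi = cross_correlation(xw, hw)
r = autocorrelation(hw, frame_len)
found = search_pulses(phi, r, num_pulses=2)
print(found)
```

Because the correlations are computed once per frame and only updated afterwards, no synthesis filtering is needed inside the pulse loop, which is the saving over the A-b-S loop of the conventional scheme.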

Next, the method is described for computing K₂ excitation pulses after the K₁ pulses have been computed, with the impulse response switched to h₂(n).

Let g_si and m_si be the amplitude and position of these excitation pulses. The weighted error power E′ when K₂ such pulses are placed can be expressed as follows.
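Equation (12) is not reproduced in this text. Assuming the K₂ new pulses are fitted to the weighted residual that remains after the pulses found with h₁(n), a consistent form is:

```latex
E' = \sum_{n=1}^{N} \Bigl[ \bigl( x(n) - \hat{x}(n) \bigr) * w(n)
     - \sum_{i=1}^{K_2} g_{si} \, h_{2w}(n - m_{si}) \Bigr]^{2} \qquad (12)
```

Here h_{2w}(n) = h₂(n) * w(n), by analogy with the definition of h₁w(n) in equation (8).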

Here, x̂(n) is obtained from equation (5) using the excitation pulse train determined with equation (10).

The g_si that minimizes equation (12) is found from the following equation, obtained by substituting equation (5) into equation (12), partially differentiating equation (12) with respect to g_si, and setting the result to zero.
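Equation (13) is not reproduced in this text. The following paragraph states that it has the same structure as equation (10) once the influence of the pulses found with h₁(n) has been subtracted from the cross-correlation. A reconstruction along those lines is shown below. The cross term φ_{h1h2} and the symbol φ_sxh are introduced here for illustration and are not notation confirmed by the source.

```latex
g_{si}(m_{si}) = \frac{\phi_{sxh}(m_{si})
   - \sum_{l=1}^{K_1} g_l \, \phi_{h_1 h_2}(m_l, m_{si})
   - \sum_{l=1}^{i-1} g_{sl} \, R_{shh}(|m_{sl} - m_{si}|)}{R_{shh}(0)} \qquad (13)

\phi_{h_1 h_2}(m_l, m) = \sum_{n} h_{1w}(n - m_l) \, h_{2w}(n - m)
```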

Here, φsxh denotes the cross-correlation between the input signal x(n) and the impulse response h₂w, and Rshh denotes the autocorrelation of the impulse response h₂w. Equation (13) shows that, once the influence of the pulse train d(n) computed with the impulse response h₁(n) has been subtracted from the cross-correlation, the pulse train g_si for the switched impulse response h₂(n) is computed by the same procedure as equation (10).
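The two-stage procedure can be sketched as below: K₁ pulses are found with h₁w(n), their contribution is removed from the cross-correlation with h₂w(n), and K₂ further pulses are found with h₂w(n) by the same greedy step. This is a minimal illustration under the autocorrelation approximation. All function names and the toy data are hypothetical, and frame-edge truncation is ignored.

```python
def correlate(sig, hw, length):
    """c(m) = sum_n sig[n] * hw[n - m] for m = 0 .. length - 1."""
    return [sum(sig[n] * hw[n - m]
                for n in range(m, min(len(sig), m + len(hw))))
            for m in range(length)]

def autocorr(hw, max_lag):
    """R(t) = sum_n hw[n] * hw[n + t]; zero beyond the response length."""
    return [sum(hw[n] * hw[n + t] for n in range(len(hw) - t))
            for t in range(max_lag)]

def greedy(phi, r, count):
    """Sequential pulse search on a correlation sequence (no resynthesis)."""
    phi, pulses = list(phi), []
    for _ in range(count):
        m = max(range(len(phi)), key=lambda k: abs(phi[k]))
        g = phi[m] / r[0]
        pulses.append((m, g))
        for k in range(len(phi)):
            if abs(k - m) < len(r):
                phi[k] -= g * r[abs(k - m)]
    return pulses

def two_stage_search(xw, h1w, h2w, k1, k2):
    n = len(xw)
    # Stage 1: K1 pulses using the first impulse response.
    stage1 = greedy(correlate(xw, h1w, n), autocorr(h1w, n), k1)
    # Stage 2: start from the cross-correlation with h2w and subtract the
    # influence of the stage-1 pulses through the h1w/h2w cross term.
    phi2 = correlate(xw, h2w, n)
    for m1, g1 in stage1:
        for m in range(n):
            shift = m1 - m  # h1w[j] lines up with h2w[j + shift]
            cross = sum(h1w[j] * h2w[j + shift]
                        for j in range(len(h1w))
                        if 0 <= j + shift < len(h2w))
            phi2[m] -= g1 * cross
    stage2 = greedy(phi2, autocorr(h2w, n), k2)
    return stage1, stage2

# Toy data: h1w carries a pitch-like echo of the envelope response h2w.
h1w = [1.0, 0.5, 0.0, 0.0, 0.5, 0.25]
h2w = [1.0, 0.5]
xw = [0.0] * 24
for j, h in enumerate(h1w):
    xw[2 + j] += 2.0 * h      # one pulse best explained by h1w
for j, h in enumerate(h2w):
    xw[14 + j] += -1.0 * h    # one pulse best explained by h2w

stage1, stage2 = two_stage_search(xw, h1w, h2w, k1=1, k2=1)
print(stage1, stage2)
```

The second stage never reconstructs the signal produced by the first-stage pulses; it only corrects the correlation sequence, which is the computational advantage the text attributes to equation (13).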

As an alternative to equation (13), the pulse train g_si can also be computed from the equation obtained by partially differentiating equation (12) with respect to g_si and setting the result to zero. With this method, however, the signal x̂(n) in equation (12) must first be reproduced, so the amount of computation is larger than with the method of equation (13).

This concludes the description of the excitation pulse computation method.

(Embodiment) An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 4(a) is a block diagram showing an embodiment of the transmitting side of the speech coding scheme according to the present invention, and FIG. 4(b) is a block diagram showing an embodiment of the receiving side. In FIG. 4(a), a discrete speech signal sequence x(n) is input from an input terminal 195, divided into blocks of a predetermined number of samples, and stored in a buffer memory circuit 340.

Next, a K-parameter calculation circuit 280 receives a sequence of predetermined length from the speech signal sequence stored in the buffer memory circuit 340 and uses it to compute LPC parameters of a predetermined order P by a well-known method (for example, linear predictive analysis). Various LPC parameters are possible; the description below assumes the K parameters K_i (1 ≤ i ≤ P). The K parameters are the same as the PARCOR coefficients. The K parameters K_i are output to a K-parameter encoding circuit 200. The K-parameter encoding circuit 200 encodes K_i, for example with a predetermined number of quantization bits, and outputs the code l_ki to a multiplexer 450. The K-parameter encoding circuit 200 also outputs K_i′, obtained by decoding l_ki, to an impulse response calculation circuit 210, a weighting circuit 410, and a synthesis filter circuit 400.

Next, a pitch analysis circuit 370 receives the output sequence of the buffer memory circuit 340 and computes the coefficients of a pitch reproduction filter of a predetermined order Q. The transfer function H_P(z) of the pitch reproduction filter can be expressed in z-transform notation as follows.

In this embodiment, to simplify the description, the pitch reproduction filter is assumed to be first-order. In this case, methods for computing the pitch gain β and the pitch period M are described in detail in, for example, Reference 2 cited above. Other well-known methods may also be used. The pitch gain β and the pitch period M are output to a pitch encoding circuit 380. The pitch encoding circuit 380 encodes the pitch period M and the pitch gain β with predetermined numbers of bits and outputs the resulting codes l_M and l_β to the multiplexer 450. The pitch encoding circuit 380 also outputs M′ and β′, obtained by decoding l_M and l_β, to the impulse response calculation circuit 210, a pulse calculation circuit 390, and a pulse generation circuit 420.
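The transfer function of the pitch reproduction filter is not reproduced in this text. For the first-order case assumed in this embodiment, with pitch gain β and pitch period M, the usual form of such a filter, given here as an assumption, is:

```latex
H_P(z) = \frac{1}{1 - \beta z^{-M}}
```

In the time domain this corresponds to y(n) = u(n) + β y(n - M), so each output sample is reinforced by the output one pitch period earlier.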

Next, the impulse response calculation circuit 210 receives the decoded K parameters K_i′ from the K-parameter encoding circuit 200, and the decoded pitch period M′ and pitch gain β′ from the pitch encoding circuit 380. The impulse response calculation circuit 210 computes two impulse responses. For the first, it uses K_i′, the pitch gain β′, and the pitch period M′ to compute, over a predetermined number of samples, the weighted impulse response h₁w(n) of a synthesis filter consisting of the pitch reproduction filter and the spectral envelope synthesis filter. The transfer function of this synthesis filter can be written in z-transform notation as follows.
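This transfer function is not reproduced in the text. Assuming a first-order pitch reproduction filter in cascade with an all-pole spectral envelope filter, both followed by the weighting W(z), a consistent form is:

```latex
H_{1w}(z) = \frac{W(z)}{\bigl(1 - \beta' z^{-M'}\bigr)\bigl(1 - \sum_{i=1}^{P} a_i z^{-i}\bigr)}
```

The name H_{1w}(z) is chosen here to match h₁w(n); the sign convention of a_i is an assumption.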

Here, a_i are the prediction coefficients obtained by converting K_i′ by a well-known method. W(z) is the z-transform of the weighting function, for which equation (3) above can be used. For the second impulse response, the decoded K parameters K_i′ are used to compute the weighted impulse response sequence h₂w(n) of the spectral envelope synthesis filter. The transfer function of the weighted spectral envelope synthesis filter can be written in z-transform notation as follows.
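This transfer function is also missing from the text. Under the same assumed sign convention for a_i, the weighted spectral envelope synthesis filter would be:

```latex
H_{2w}(z) = \frac{W(z)}{1 - \sum_{i=1}^{P} a_i z^{-i}}
```

It differs from the filter used for h₁w(n) only by the absence of the pitch term.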

The impulse responses h₁(n) and h₂(n) obtained in this way are output to a switch circuit 365.
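The two weighted impulse responses can be computed by direct recursion. The sketch below makes two assumptions that the text does not confirm: W(z) has the form (1 - Σ a_i z^{-i}) / (1 - Σ a_i r^i z^{-i}), so that W(z) divided by the envelope polynomial reduces to an all-pole filter with coefficients a_i r^i, and the pitch filter is first-order. Function names and the example values are illustrative.

```python
def all_pole_response(coeffs, excitation):
    """y[n] = excitation[n] + sum_i coeffs[i-1] * y[n-i]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, c in enumerate(coeffs, start=1):
            if n - i >= 0:
                acc += c * y[n - i]
        y.append(acc)
    return y

def pitch_response(beta, period, excitation):
    """y[n] = excitation[n] + beta * y[n - period] (first-order pitch filter)."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + (beta * y[n - period] if n >= period else 0.0))
    return y

def weighted_impulse_responses(a, beta, period, r, length):
    """Return (h1w, h2w) over `length` samples under the assumptions above."""
    impulse = [1.0] + [0.0] * (length - 1)
    # W(z) / A(z) = 1 / A(z/r): an all-pole filter with coefficients a_i * r**i.
    weighted = [ai * r ** i for i, ai in enumerate(a, start=1)]
    h2w = all_pole_response(weighted, impulse)
    # Cascading the pitch filter adds the pitch fine structure.
    h1w = pitch_response(beta, period, h2w)
    return h1w, h2w

h1w, h2w = weighted_impulse_responses(a=[0.5], beta=0.5, period=2, r=0.8, length=4)
print(h1w, h2w)
```

Because both filters are linear and time-invariant, feeding h2w through the pitch filter gives the impulse response of the cascade regardless of the order in which the two stages are applied.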

The switch circuit 365 receives the two impulse responses h₁(n) and h₂(n), switches between them, and outputs the selected one to a pulse determination circuit 340.

Next, a subtractor 285 receives the speech signal sequence x(n) stored in the buffer memory circuit 340, subtracts one frame of the output sequence x̂(n) of the synthesis filter circuit 400 from x(n), and outputs the result to the weighting circuit 410. The weighting circuit 410 receives the decoded K parameters K_i′ from the K-parameter encoding circuit 200 and computes the weighting function w(n) such that its z-transform is given by, for example, equation (3). Other frequency weighting methods may also be used. The weighting circuit 410 further receives the output of the subtractor 285, convolves it with the weighting function w(n), and outputs the resulting xw(n) to the pulse determination circuit 340.

The pulse determination circuit 340 receives xw(n) and the two impulse responses h₁(n) and h₂(n) and, switching between h₁(n) and h₂(n), computes the excitation pulses according to equations (10) and (13) above. The pulse determination circuit 340 consists of an autocorrelation calculation circuit 360, a cross-correlation calculation circuit 350, and the pulse calculation circuit 390. The operation of these circuits is described below. The autocorrelation calculation circuit 360 receives the two impulse responses h₁w(n) and h₂w(n) in turn and computes the autocorrelations R₁hh(τ) and R₂hh(τ) of the respective impulse responses according to the following equation.

In the above equation, τ is the lag in samples and M is a predetermined number of samples. R₁hh(τ) and R₂hh(τ) are output to pulse calculation circuit 390.
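As an illustration of the autocorrelation calculation, the sketch below assumes the conventional definition R_hh(τ) = Σ h(n)·h(n+τ), summed over the M available samples of the impulse response; the exact form of the equation is not reproduced in this text, so this definition and the function name `autocorrelation` are assumptions.

```python
def autocorrelation(h, max_lag):
    """Autocorrelation of impulse response h for lags 0..max_lag.

    Assumes R(tau) = sum over n of h[n] * h[n + tau], with M = len(h).
    """
    m = len(h)
    return [
        sum(h[n] * h[n + tau] for n in range(m - tau))
        for tau in range(max_lag + 1)
    ]


r_hh = autocorrelation([1.0, 0.5, 0.25], 2)
```

The same routine would be applied to each of the two impulse responses in turn.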

Cross-correlation function calculation circuit 350 receives xw(n) and the impulse responses h₁(n) and h₂(n), calculates the cross-correlation function xh between xw(n) and h₁(n) and the cross-correlation function sxh between xw(n) and h₂(n), and outputs them to pulse calculation circuit 390.
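A minimal sketch of the cross-correlation step, assuming the usual definition in which the value at lag m is the sum of xw(n)·h(n−m) over the frame. The function name `cross_correlation` and the truncation at the frame end are assumptions made for illustration.

```python
def cross_correlation(xw, h):
    """Cross-correlation between the weighted signal xw and impulse response h.

    phi[m] = sum over n of xw[n] * h[n - m], restricted to samples in the frame.
    """
    frame_len = len(xw)
    phi = []
    for m in range(frame_len):
        last = min(frame_len, m + len(h))
        phi.append(sum(xw[n] * h[n - m] for n in range(m, last)))
    return phi


# xw is the impulse response [1.0, 0.5] placed at position 2.
phi = cross_correlation([0.0, 0.0, 1.0, 0.5, 0.0], [1.0, 0.5])
```

The peak of the cross-correlation falls at the position where the impulse response best matches the weighted signal, which is what the pulse search exploits.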

Pulse calculation circuit 390 receives the two cross-correlations xh and sxh and the two autocorrelations R₁hh and R₂hh, and obtains K₁ excitation pulses based on the impulse response h₁(n) according to equation (10). It then obtains K₂ excitation pulses based on the impulse response h₂(n) according to equation (13). The excitation pulses obtained are output to encoding circuit 470. This completes the description of pulse determination circuit 340.
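The sketch below shows one common form of a one-pulse-at-a-time search in the correlation domain. It is not a transcription of equations (10) and (13), which do not appear in this part of the text: the update rule, the function name `search_pulses`, and the use of a single impulse response are all assumptions. Each pulse is placed where the remaining cross-correlation is largest in magnitude, and its contribution is then removed using the autocorrelation.

```python
def search_pulses(phi_xh, r_hh, num_pulses):
    """Greedy pulse search (assumed form, single impulse response).

    phi_xh     -- cross-correlation between xw(n) and the impulse response
    r_hh       -- autocorrelation of the impulse response, lags 0..len-1
    num_pulses -- number of pulses to place
    Returns a list of (position, amplitude) pairs.
    """
    residual = list(phi_xh)
    pulses = []
    for _ in range(num_pulses):
        # Position of the largest remaining correlation magnitude.
        pos = max(range(len(residual)), key=lambda m: abs(residual[m]))
        amp = residual[pos] / r_hh[0]
        pulses.append((pos, amp))
        # Remove the influence of this pulse from the remaining correlation.
        for m in range(len(residual)):
            lag = abs(m - pos)
            if lag < len(r_hh):
                residual[m] -= amp * r_hh[lag]
    return pulses


pulses = search_pulses([0.0, 0.0, 1.0, 2.5, 1.0, 0.0, 0.0, 0.0], [1.25, 0.5], 1)
```

In the scheme described above, a search of this kind would be run for K₁ pulses with the correlations of h₁(n) and then for K₂ pulses with those of h₂(n), after removing the influence of the first K₁ pulses.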

Next, encoding circuit 470 receives the amplitudes and positions of the excitation pulse train from pulse calculation circuit 390 and encodes them using a normalization coefficient described later. The normalization coefficient is also encoded, and the codes representing the normalization coefficient and the amplitudes and positions of the excitation pulse train are output to multiplexer 450. The decoded values g_i′ and m_i′ of the amplitudes and positions of the excitation pulse train are output to excitation pulse generation circuit 420. The operation of encoding circuit 470 is described in detail as encoding circuit 250 in the specification of Japanese Patent Application No. 57-231605, so its description is omitted here.

Various methods are also conceivable for encoding the pulse positions. For example, run-length coding, which is well known in the field of facsimile signal coding, may be used. Run-length coding represents the length of a run of "0" or "1" symbols with a predetermined code sequence. When this code sequence is used to encode the spacing between pulses, the following method may be used. Let K be the total number of pulses placed in one frame, K₁ the number of pulses calculated using the impulse response h₁(n), and K₂ the number of pulses calculated using the impulse response h₂(n). Codes of a predetermined length are assigned to the positions of the K₁ pulses, and predetermined codes are assigned to the positions of the K₂ pulses. Alternatively, a one-bit discrimination code may be appended to the code representing each pulse position, so that the discrimination code indicates which of h₁(n) and h₂(n) was used. The former method requires less total position information than the latter.
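To make the run-length idea concrete, the following sketch converts pulse positions into the lengths of the empty runs between pulses and back. The mapping from run lengths to actual code words is not specified here, and the function names are hypothetical.

```python
def positions_to_runs(positions):
    """Convert pulse positions into run lengths of empty samples between pulses."""
    runs = []
    previous = -1
    for p in sorted(positions):
        runs.append(p - previous - 1)  # empty samples before this pulse
        previous = p
    return runs


def runs_to_positions(runs):
    """Inverse of positions_to_runs."""
    positions = []
    previous = -1
    for run in runs:
        previous = previous + run + 1
        positions.append(previous)
    return positions


runs = positions_to_runs([3, 4, 10])
restored = runs_to_positions(runs)
```

If the first K₁ run lengths are understood to belong to h₁(n) and the remaining K₂ to h₂(n), no per-pulse flag is needed, whereas the discrimination-code alternative adds one bit per pulse, K bits per frame in total.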

It goes without saying that the encoding of the pulse sequence is not limited to the method described here, and that the best of the known methods may be used.

Returning to Fig. 4(a), pulse sequence generation circuit 420 uses the input g_i′ and m_i′ to calculate, over one frame length N, an excitation pulse sequence having amplitude g_i′ at position m_i′, and outputs it as the drive signal to synthesis filter circuit 400. The drive signal is formed as follows. For the K₁ pulses calculated using the impulse response h₁(n), the pulse train d₁(n) is generated from the amplitudes g_i′, the positions m_i′, and the pitch information β′, M′ according to the following equation.

$$d_1(n) = d_I(n) + \beta' \cdot d_1(n - M') \tag{18}$$

For the K₂ pulses calculated using the impulse response h₂(n), on the other hand, a pulse train is generated according to the following equation.

The sum of d₁(n) and d₂(n) is output to synthesis filter circuit 400 as the drive signal d(n).
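The drive signal generation can be sketched as below. Equation (18) is applied as a recursion over the frame; d_I(n) is assumed to be the plain pulse train with amplitudes g_i′ at positions m_i′, d₂(n) is assumed to be a plain pulse train as well, and samples of d₁(n) before the start of the frame are assumed to be zero. All function and parameter names are hypothetical.

```python
def pulse_train(pulses, frame_len):
    """Place each (position, amplitude) pulse in an otherwise zero frame."""
    d = [0.0] * frame_len
    for pos, amp in pulses:
        d[pos] += amp
    return d


def drive_signal(pitch_pulses, plain_pulses, beta, pitch_lag, frame_len):
    """d(n) = d1(n) + d2(n), with d1(n) = dI(n) + beta * d1(n - pitch_lag)."""
    d_i = pulse_train(pitch_pulses, frame_len)
    d1 = [0.0] * frame_len
    for n in range(frame_len):
        # Assumption: d1 is zero before the frame starts.
        past = d1[n - pitch_lag] if n >= pitch_lag else 0.0
        d1[n] = d_i[n] + beta * past
    d2 = pulse_train(plain_pulses, frame_len)
    return [a + b for a, b in zip(d1, d2)]


d = drive_signal([(0, 1.0)], [(1, 2.0)], beta=0.5, pitch_lag=2, frame_len=6)
```

A single pitch-based pulse thus repeats every M′ samples with its amplitude scaled by β′ each time, while the pulses found without pitch appear only once.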

Synthesis filter circuit 400 receives the drive signal d(n) from pulse generation circuit 420 and the decoded K parameter values K_i′ from K parameter encoding circuit 200, obtains the response signal sequence for two frames (1 ≤ n ≤ 2N), and outputs the values for the second frame (N+1 ≤ n ≤ 2N) to subtractor 285. The operation of synthesis filter circuit 400 is described in detail as synthesis filter 320 in the aforementioned specification of Japanese Patent Application No. 57-231605, so its description is omitted here.

Next, multiplexer 450 receives the output code of encoding circuit 470, the output code of K parameter encoding circuit 200, and the output code of pitch encoding circuit 380, combines them, and outputs the result from transmitting-side output terminal 480 to the communication channel. This completes the description of the encoder side of the speech coding method according to the present invention.

Next, the receiving side of the speech coding method according to the present invention is described with reference to Fig. 4(b).

Demultiplexer 500 receives the code from receiving-side input terminal 490. It separates the input code into the code sequence representing the K parameters, the code sequence representing the pitch information, and the code sequence representing the excitation pulse train. It outputs the code sequence lki representing the K parameters to K parameter decoding circuit 520, the code sequences l_M and l_β representing the pitch information to pitch decoding circuit 510, and the code sequence representing the excitation pulse train to excitation pulse decoding circuit 530. K parameter decoding circuit 520 decodes its input code sequence and outputs the result to synthesis filter circuit 550. Pitch decoding circuit 510 decodes its input code sequence and outputs the result to pulse generation circuit 540.

Excitation pulse decoding circuit 530 receives the code sequence representing the excitation pulse train, decodes it, and outputs the result to pulse generation circuit 540 as the amplitude and position information of the excitation pulse train.

Pulse generation circuit 540 receives the amplitude and position information of the excitation pulse train together with the pitch information, performs the same operation as pulse generation circuit 420 on the transmitting side, and generates the drive signal d(n). d(n) is output to synthesis filter circuit 550.

Synthesis filter circuit 550 receives the drive signal d(n) from pulse generation circuit 540 and the decoded K parameter values K_i′ from K parameter decoding circuit 520. K_i′ is converted into prediction parameters a_i′ (1 ≤ i ≤ N_P) by a known method. Using a_i′ and the drive signal d(n), synthesis filter circuit 550 calculates one frame of the reproduced signal according to the following equation, and the reproduced signal is output through receiving-side output terminal 560.
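The decoder-side synthesis can be sketched as follows. Two assumptions are made and should be checked against the actual equations: the K parameters are converted to prediction parameters with the step-up recursion under one common sign convention, and the synthesis filter is the all-pole recursion in which each output sample is d(n) plus the sum of a_i′ times the output i samples earlier, with zero filter memory at the start of the frame. The function names are hypothetical.

```python
def k_to_predictor(k):
    """Step-up recursion from K (PARCOR) parameters to predictor coefficients.

    Assumed convention: a_i(i) = k_i and a_j(i) = a_j(i-1) - k_i * a_(i-j)(i-1).
    Sign conventions differ between references.
    """
    a = []
    for i, ki in enumerate(k, start=1):
        prev = a
        a = [prev[j] - ki * prev[i - 2 - j] for j in range(i - 1)] + [ki]
    return a


def synthesize(d, a):
    """All-pole synthesis: y(n) = d(n) + sum_i a[i-1] * y(n - i)."""
    y = []
    for n in range(len(d)):
        acc = d[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:  # assumption: zero filter memory before the frame
                acc += ai * y[n - i]
        y.append(acc)
    return y


a = k_to_predictor([0.5])
y = synthesize([1.0, 0.0, 0.0, 0.0], a)
```

A practical decoder would carry the filter memory over from the previous frame rather than resetting it to zero.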

This completes the description of the decoder side of the speech coding method according to the present invention.

In this embodiment, pulse calculation circuit 390 obtains the first K₁ pulses using the impulse response h₁(n), which incorporates pitch information, and the remaining K₂ pulses using the impulse response h₂(n), which does not. The following alternative may be used instead: the pitch gain β′ is used to decide whether pitch information is used in the pulse calculation. That is, if β′ exceeds a predetermined threshold, the first K₁ pulses are calculated with the pitch-based impulse response h₁(n); if β′ is at or below the threshold, all K pulses are calculated with the impulse response h₂(n), which does not use pitch information. As a further alternative, two pulse sets are calculated: one using pitch information (in which case pitch information is used for K₁ pulses but not for the remaining K₂ pulses) and one calculated without pitch information for all K pulses. The error power is obtained for each set, and the set with the smaller error power is used. The error power E for a pulse set may be calculated, for example, according to the following equation.

In the above equation, R(0) denotes the power of the output sequence xw(n) of weighting circuit 410. When pitch is not used, the error power is calculated with equation (21) as it stands. When pitch is used, the error power is calculated with the following equation, which is a modified form of equation (21).

In the above equation, gi and mi denote the amplitude and position of a pulse calculated using pitch, and gsi and msi denote the amplitude and position of a pulse calculated without pitch. xh is identical to the cross-correlation in equation (9) above, and sxh is identical to the cross-correlation in equation (13). This method gives a more accurate decision than a decision based on the pitch gain β′.
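The selection between the two pulse sets can be illustrated by computing the error power directly in the time domain, which sidesteps the correlation-domain expressions of equation (21) and its modified form (neither is reproduced in this text). The sketch below is an assumption-laden illustration: each candidate is a list of (pulses, impulse response) pairs, and all names are hypothetical.

```python
def synth_from_pulses(pulses, h, frame_len):
    """Sum of impulse responses h scaled and shifted by each (position, amplitude)."""
    y = [0.0] * frame_len
    for pos, amp in pulses:
        for k, hk in enumerate(h):
            if pos + k < frame_len:
                y[pos + k] += amp * hk
    return y


def error_power(xw, parts):
    """Squared error between xw(n) and the signal rebuilt from all parts."""
    approx = [0.0] * len(xw)
    for pulses, h in parts:
        contribution = synth_from_pulses(pulses, h, len(xw))
        approx = [a + c for a, c in zip(approx, contribution)]
    return sum((x - a) ** 2 for x, a in zip(xw, approx))


def choose_candidate(xw, with_pitch, without_pitch):
    """Return the label and error power of the candidate with the smaller error."""
    e_pitch = error_power(xw, with_pitch)
    e_plain = error_power(xw, without_pitch)
    return ("pitch", e_pitch) if e_pitch < e_plain else ("plain", e_plain)


h_plain = [1.0, 0.5]
h_pitch = [1.0, 0.5, 0.0, 1.0, 0.5]  # illustrative response with a repeat at lag 3
xw = [1.0, 0.5, 0.0, 1.0, 0.5, 0.0]
choice = choose_candidate(xw, [([(0, 1.0)], h_pitch)], [([(0, 1.0)], h_plain)])
```

In this toy case the periodic target is matched exactly by the pitch-based response, so the pitch candidate is selected.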

In this embodiment, equation (13) is used when pulse calculation circuit 390 calculates the K₂ pulses without pitch after calculating the K₁ pulses with pitch. In other words, the K₂ pulses are calculated without pitch after the influence of the K₁ pitch-based pulses has been removed in the correlation domain. Alternatively, a signal may first be reproduced from the K₁ pitch-based pulses and subtracted from the original signal x(n), as shown in equation (12) above, and the K₂ pulses may then be calculated without pitch from the resulting signal.

This embodiment described an example in which, in the course of calculating the excitation pulses, the impulse response of the synthesis filter that uses pitch and the impulse response of the synthesis filter that does not use pitch are switched. When there are several candidate parameters to switch among, those parameters may be switched. Parameters representing frequency-selected short-time spectral characteristics may also be switched among one another.

This embodiment described an example of switching between two impulse responses, but the configuration may instead switch among three or more impulse responses, or among other parameters representing the short-time spectral characteristics of the speech signal. Furthermore, instead of switching among impulse responses calculated for each frame, the following configuration may be adopted. A sufficient variety of impulse responses representing the various short-time spectral characteristics of speech signals is calculated in advance, and a code book is prepared in which a code is assigned to every one of them. The same code book is provided on the transmitting side and the receiving side. In actual coding, whenever the transmitting side uses an impulse response for a frame, it selects from the code book the impulse response that best approximates the short-time spectral characteristics of the original signal and transmits the corresponding code to the receiving side. The receiving side uses the impulse response corresponding to the received code.
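A minimal sketch of the code book selection follows. The text does not state the criterion for "best approximates", so the squared distance between impulse responses used here is an assumption, as are the function name and the toy code book.

```python
def select_codebook_entry(target_h, codebook):
    """Return the index of the code book entry closest to target_h.

    Assumed criterion: smallest squared distance between impulse responses.
    """
    def distance(candidate):
        return sum((t - c) ** 2 for t, c in zip(target_h, candidate))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


# The same code book is held by the transmitting and receiving sides.
codebook = [
    [1.0, 0.9, 0.81],
    [1.0, 0.5, 0.25],
    [1.0, -0.5, 0.25],
]
index = select_codebook_entry([1.0, 0.45, 0.2], codebook)  # transmitted code
received_h = codebook[index]                               # receiver lookup
```

Only the index needs to be transmitted, since the receiving side recovers the impulse response by looking it up in its own copy of the code book.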

The code book may also hold, as parameters representing short-time spectral characteristics, parameters other than impulse responses, such as the prediction coefficients mentioned above or PARCOR coefficients.

In the excitation pulse calculation method of equations (10) and (13), quasi-optimal pulses are calculated one at a time. As an alternative pulse calculation method, the amplitudes of several previously obtained pulses may be readjusted each time the next pulse is calculated. This method is effective when the pulses cannot be treated as independent, for example when pulses are found very close to one another. Various other excitation pulse calculation methods are also conceivable. For example, the amplitudes of all pulses may be readjusted after all the pulses in one frame have been obtained.
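One way to readjust all amplitudes once the positions are fixed is to solve the normal equations built from the correlations, so that the amplitudes are jointly optimal in the least-squares sense. This is a sketch of that general technique and not necessarily the procedure intended here; the function names are hypothetical, and the autocorrelation is assumed to depend only on the distance between pulse positions.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def readjust_amplitudes(positions, phi_xh, r_hh):
    """Jointly re-solve the amplitudes of pulses at fixed positions."""
    def r(lag):
        return r_hh[lag] if lag < len(r_hh) else 0.0

    matrix = [[r(abs(p - q)) for q in positions] for p in positions]
    rhs = [phi_xh[p] for p in positions]
    return solve_linear(matrix, rhs)


# Two adjacent pulses (amplitudes 1.0 and -2.0) with impulse response [1.0, 0.5].
amps = readjust_amplitudes([2, 3], [0.0, 0.5, 0.25, -2.0, -1.0, 0.0], [1.25, 0.5])
```

For these two adjacent pulses, a one-at-a-time search would assign the first pulse found an amplitude of -1.6, while the joint solution recovers 1.0 and -2.0, which illustrates why readjustment helps when pulses lie close together.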

In this embodiment, when the autocorrelation function sequence of the impulse response sequence representing the short-time spectral structure is calculated, impulse response calculation circuit 210 first calculates the impulse response sequence from the decoded K parameter values and the pitch information, and the autocorrelation function sequence is then calculated from this impulse response sequence. As is well known in the field of digital signal processing, the autocorrelation function sequence of an impulse response sequence corresponds to the power spectrum of the short-time spectrum. The configuration may therefore obtain the power spectrum of the short-time spectrum from the decoded K parameter values and the pitch information and then calculate the autocorrelation function sequence from it. Similarly, when the cross-correlation function sequence between the speech signal sequence and the impulse response sequence representing the short-time spectral envelope is calculated, this embodiment uses the signal sequence xw(n) output by weighting circuit 410 and the impulse response sequence obtained by impulse response calculation circuit 210 to calculate the cross-correlation function in cross-correlation function calculation circuit 350. As is well known, the cross-correlation function corresponds to the cross-power spectrum. Using this relationship, the configuration may obtain the cross-power spectrum from the speech signal sequence, the decoded K parameter values, and the pitch information, and then calculate the cross-correlation function sequence. The correspondence between the power spectrum and the autocorrelation function sequence, and between the cross-power spectrum and the cross-correlation function sequence, is explained in detail in Chapter 8 of the book "Digital Signal Processing" by A. V. Oppenheim et al. (Reference 4), so the explanation is omitted here.
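The correspondence between the autocorrelation sequence and the power spectrum can be checked numerically. The sketch below uses a plain discrete Fourier transform from the standard library for clarity instead of a fast transform; the function names are hypothetical, and the impulse response is zero-padded to twice its length so that the circular correlation equals the linear one for the lags of interest.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def autocorrelation_via_spectrum(h, max_lag):
    """Autocorrelation of h as the inverse transform of its power spectrum.

    Valid for max_lag < len(h) because of the zero-padding used here.
    """
    padded = list(h) + [0.0] * len(h)
    power = [abs(v) ** 2 for v in dft(padded)]
    r = dft(power, inverse=True)
    return [r[tau].real for tau in range(max_lag + 1)]


r_hh = autocorrelation_via_spectrum([1.0, 0.5, 0.25], 2)
```

Up to rounding error, the result matches the direct sum of products h(n)·h(n+τ).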

In the embodiment described above, the excitation pulse sequence within one frame is encoded by encoding circuit 470 of Fig. 4(a) after the entire pulse sequence has been obtained. Alternatively, the encoding may be incorporated into the pulse sequence calculation, so that each pulse is encoded as soon as it is calculated and the next pulse is then calculated.

In the embodiment described above, K parameters are used as the parameters representing the spectral envelope of the short-time speech signal sequence, but other well-known parameters (for example, LSP parameters) may be used. Furthermore, the weighting function w(n) in equation (8) above may be omitted.

In this embodiment, to prevent quality degradation caused by discontinuities in the reproduced waveform at frame boundaries, the response signal sequence arising from the excitation pulses of the frame preceding the current frame is calculated and subtracted from the input speech of the current frame before the excitation pulses are calculated. Alternatively, as shown in Fig. 5, the data used for the excitation pulse calculation may include both the data of the frame in which pulses are transmitted and data preceding it. In Fig. 5, N_T denotes the frame in which pulses are transmitted, and N denotes the frame over which excitation pulses are calculated. This configuration removes the need to calculate the response signal sequence arising from the preceding frame's excitation pulses and so simplifies the circuit configuration.

(Effect of the Invention) As described in detail above, according to the present invention, the parameters representing the short-time spectral characteristics of the input speech signal are switched during the pulse calculation process when the excitation pulse train of the input speech signal is calculated. Better parameters can therefore be used for the pulse calculation, and reproduced speech of higher quality than with the conventional method can be obtained even at a low transmission bit rate.

[Brief description of drawings]

Fig. 1 is a block diagram showing the configuration of the conventional method; Fig. 2 shows an example of an excitation pulse sequence; Fig. 3 shows an example of the frequency characteristics of an input speech signal sequence and of the weighting circuit shown in Fig. 1; Figs. 4(a) and 4(b) are block diagrams showing an embodiment of the speech coding method according to the present invention; and Fig. 5 illustrates the positional relationship between the pulse transmission frame and the excitation pulse calculation frame. In the figures, 110, 340 denote buffer memory circuits; 120, 285 subtraction circuits; 130, 400, 550 synthesis filter circuits; 140, 420, 540 pulse generation circuits; 150 an error minimization circuit; 180, 200 K parameter calculation circuits; 190, 410 weighting circuits; 200 a K parameter encoding circuit; 210 an impulse response calculation circuit; 350 a cross-correlation function calculation circuit; 360 an autocorrelation function calculation circuit; 390 a pulse calculation circuit; 470 an encoding circuit; 450 a multiplexer; 500 a demultiplexer; 520 a K parameter decoding circuit; and 530 a pulse decoding circuit.

Claims

1. A speech coding method characterized in that: on the transmitting side, a discrete speech signal sequence is input; a plurality of parameter sequences representing short-time spectral characteristics of the speech signal sequence are obtained; when a pulse sequence for representing the speech signal sequence is obtained for each frame, either the pulse sequence is obtained and encoded while switching among the plurality of parameter sequences within the frame, or a pulse sequence is obtained for each parameter, after which the best parameter is selected and the pulse sequence is encoded; and the code representing the pulse sequence and the code representing the plurality of parameter sequences are combined and output; and on the receiving side, the combined code is input; the code representing the pulse sequence and the code representing the plurality of parameter sequences are separated and decoded; and, when the speech signal sequence is reproduced on the basis of the decoded pulse sequence, the speech signal sequence is reproduced either by switching among the decoded plurality of parameter sequences or by using the selected parameter.

2. A speech coding apparatus comprising: a parameter calculation circuit that receives a discrete speech signal sequence and obtains a plurality of parameter sequences representing short-time spectral characteristics of the speech signal sequence; a pulse calculation circuit that, when obtaining a pulse sequence for representing the speech signal sequence for each frame, either obtains and encodes the pulse sequence while switching among the plurality of parameter sequences within the frame, or obtains a pulse sequence for each parameter and then selects the best parameter and encodes the pulse sequence; and a multiplexer circuit that combines and outputs the code sequence of the plurality of parameter sequences and the code sequence of the pulse sequence.

3. An apparatus comprising: a demultiplexer circuit that receives a code sequence in which a code representing a plurality of parameter sequences representing short-time spectral characteristics of a speech signal sequence is combined with a code representing a pulse sequence for representing the speech signal, and separates the respective codes; a pulse decoding circuit that receives and decodes the separated code representing the pulse sequence; a parameter decoding circuit that receives and decodes the separated code sequence representing the plurality of parameter sequences; and a synthesis filter circuit that, when reproducing the speech signal sequence using the decoded pulse sequence and the decoded plurality of parameter sequences, reproduces and outputs the speech signal sequence either by switching among the decoded plurality of parameter sequences or by using the selected parameter.