JP3102017B2

JP3102017B2 - Audio coding method

Info

Publication number: JP3102017B2
Application number: JP02184234A
Authority: JP
Inventors: Kazunori Ozawa (小澤一範)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-07-13
Filing date: 1990-07-13
Publication date: 2000-10-23
Anticipated expiration: 2015-10-23
Also published as: JPH0473700A

Description

[Detailed Description of the Invention]
[Field of Industrial Application]
The present invention relates to a speech coding method for coding speech signals with high quality at low bit rates, in particular at about 8 to 4.8 kb/s.

[Conventional technology]

CELP (Code Excited LPC Coding) is a known method of coding speech signals at low bit rates of about 8 to 4.8 kb/s. It is described, for example, in the paper by M. Schroeder and B. Atal, "Code-excited linear prediction: High quality speech at very low bit rates" (Proc. ICASSP, pp. 937-940, 1985) (Reference 1). In this method, the transmitting side extracts, for each frame (for example, 20 ms), spectral parameters representing the spectral characteristics of the speech signal, and further divides the frame into shorter subframes (for example, 5 ms). For each subframe, it extracts from the past excitation signal the pitch parameters of an adaptive codebook, which represent the long-term (pitch) correlation, and uses them to perform long-term prediction of the subframe speech signal. For the residual signal obtained by long-term prediction, it selects one noise signal from a codebook consisting of predetermined kinds of noise signals so as to minimize the error power between the speech signal and the signal synthesized from the selected signal, and computes the optimal gain. It then transmits the index representing the kind of noise signal selected and the gain, together with the spectral parameters and the pitch parameters.
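The codebook search described above (analysis by synthesis with an optimal gain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Reference 1: `target` is assumed to be the weighted residual left after long-term prediction, `h` an impulse response of the synthesis filter, and the function names are hypothetical. For each codeword the optimal gain is corr/energy, and the remaining error power is Σtarget² − corr²/energy.

```python
def convolve_trunc(x, h):
    """Linear convolution x*h, truncated to len(x) samples (zero initial state)."""
    return [sum(x[k] * h[m - k] for k in range(m + 1) if m - k < len(h))
            for m in range(len(x))]


def search_codebook(target, codebook, h):
    """Return (index, gain, error) of the codeword whose filtered, optimally
    scaled version is closest to `target` in squared error."""
    target_energy = sum(t * t for t in target)
    best = None
    for idx, c in enumerate(codebook):
        y = convolve_trunc(c, h)          # codeword passed through the filter
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue                      # an all-zero response cannot reduce the error
        gain = corr / energy              # least-squares optimal gain
        error = target_energy - corr * corr / energy
        if best is None or error < best[2]:
            best = (idx, gain, error)
    return best
```

Every codeword must be filtered and correlated, which is why the search cost grows directly with the codebook size (2^bits candidates).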

[Problems to be solved by the invention]

In the conventional method of Reference 1, high speech quality generally requires a very large codebook of noise signals, 10 bits or more. An enormous amount of computation is therefore needed to search the codebook for the optimal noise signal (codeword). Furthermore, because the codebook basically consists of noise signals, speech reproduced from the excitation signal selected from the codebook sounds noisy. Finally, if the codebook size is reduced to lower the bit rate further, the speech quality deteriorates rapidly.

It is an object of the present invention to solve the above problems and to provide a speech coding method that achieves good speech quality at about 8 to 4.8 kb/s with a relatively small amount of computation and memory.

[Means for solving the problem]

The first invention is a speech coding method with the following steps. An input discrete speech signal is divided into frames of a predetermined time length. Spectral parameters representing the spectral envelope of the speech signal are obtained and output. Each frame is divided into subintervals of a predetermined time length. Pitch parameters are obtained so that a signal reproduced from the past excitation signal approximates the speech signal. The excitation signal of the speech signal is represented by a linear combination of a signal selected from a first codebook and a signal selected from a second codebook. The method is characterized in that the first codebook is modified on the basis of the signal selected from the second codebook.

The second invention is a speech coding method with the following steps. An input discrete speech signal is divided into frames of a predetermined time length. Spectral parameters representing the spectral envelope of the speech signal are obtained and output. Each frame is divided into subintervals of a predetermined time length. Pitch parameters are obtained so that a signal reproduced from the past excitation signal approximates the speech signal. The excitation signal of the speech signal is represented by a linear combination of a signal selected from a first codebook and a signal selected from a second codebook. The method is characterized in that the second codebook is modified on the basis of the signal selected from the first codebook.

The third invention is a speech coding method with the following steps. An input discrete speech signal is divided into frames of a predetermined time length. Spectral parameters representing the spectral envelope of the speech signal are obtained and output. Each frame is divided into subintervals of a predetermined time length. Pitch parameters are obtained for each subinterval to obtain a pitch-prediction excitation signal. The excitation signal of the speech signal is represented by the pitch-prediction excitation signal and a signal selected from a codebook. The method is characterized in that either the codebook is modified on the basis of the pitch-prediction excitation signal, or the pitch-prediction excitation signal is modified by the signal selected from the codebook.

[Operation]

The operation of the speech coding method according to the present invention is described below.

In the first invention, the excitation signal is determined for each subframe (obtained by dividing a frame) so as to minimize the following equation.

Here β and M are the pitch parameters (gain and delay) of the pitch prediction based on long-term correlation (the adaptive codebook), and v(n) is the past excitation signal. h(n) is the impulse response of the synthesis filter constructed from the spectral parameters, and w(n) is the impulse response of the perceptual weighting filter. The symbol * denotes convolution. For details of w(n), see Reference 1.

d(n) denotes the excitation signal represented by the codebooks. As the following equation shows, it is a linear combination of the code vector c_1j(n) selected from the first codebook and the code vector c_2i(n) selected from the second codebook.

$$d(n) = \gamma_1\cdot c_{1j}(n) + \gamma_2\cdot c_{2i}(n) \qquad (2)$$

Here γ₁ and γ₂ are the gains of the selected codewords c_1j(n) and c_2i(n). Because the excitation signal is decomposed over two codebooks, each codebook needs only half the bits of the codebook as a whole. For example, if the whole codebook has 10 bits, the first and second codebooks may each have 5 bits. This greatly reduces the computation needed for the codebook search.
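The split into two smaller codebooks can be illustrated with a sequential two-stage search. The first codebook is searched against the target. Its optimally scaled contribution is then subtracted, as the subtractor 255 does in the embodiment, and the second codebook is searched against the remainder. The excitation is d(n) = γ₁·c_1j(n) + γ₂·c_2i(n), as in equation (2). This is a sketch under assumptions (a single filter `h` for both stages, hypothetical function names), not the patent's exact search procedure. With 5 + 5 bits, only 2^5 + 2^5 = 64 codewords are examined instead of 2^10 = 1024.

```python
def convolve_trunc(x, h):
    """Linear convolution x*h, truncated to len(x) samples."""
    return [sum(x[k] * h[m - k] for k in range(m + 1) if m - k < len(h))
            for m in range(len(x))]


def best_match(target, codebook, h):
    """Return (index, optimal gain, filtered codeword) maximizing corr^2/energy."""
    best = None
    for idx, c in enumerate(codebook):
        y = convolve_trunc(c, h)
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        score = corr * corr / energy      # error reduction achieved by this codeword
        if best is None or score > best[0]:
            best = (score, idx, corr / energy, y)
    return best[1], best[2], best[3]


def two_stage_search(target, cb1, cb2, h):
    """Search cb1, subtract its contribution, then search cb2 on the remainder."""
    j, g1, y1 = best_match(target, cb1, h)
    remainder = [t - g1 * y for t, y in zip(target, y1)]
    i, g2, _ = best_match(remainder, cb2, h)
    # d(n) = gamma1 * c_1j(n) + gamma2 * c_2i(n)
    excitation = [g1 * a + g2 * b for a, b in zip(cb1[j], cb2[i])]
    return j, i, g1, g2, excitation


def codewords_examined(bits1, bits2):
    """Number of candidates evaluated by the sequential two-stage search."""
    return 2 ** bits1 + 2 ** bits2
```

The sequential (greedy) search is what makes the cost additive rather than multiplicative. The price is that the chosen pair may not be the jointly optimal pair.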

If noise codebooks such as those of Reference 1 are used as both codebooks and the excitation is split as in equation (2), performance degrades compared with a single 10-bit codebook, reaching only the equivalent of 7 to 8 bits overall.

Therefore, to obtain high performance, the first codebook is constructed by training it in advance on training data. A known method of constructing a codebook by training is described, for example, in the paper by Linde et al., "An Algorithm for Vector Quantizer Design" (IEEE Trans. COM-28, pp. 84-95, 1980) (Reference 2).

The squared distance (Euclidean distance) is normally used as the distance measure during training. The present invention instead uses the perceptually weighted distance measure of the following equation, which performs better than the squared distance.

Here t_j(n) is the j-th training vector and c_l(n) is the codeword of cluster l. The centroid (representative codeword) of cluster l is obtained from the training data in cluster l so as to minimize equation (4) or equation (5).

In equation (5), g denotes the optimal gain.
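Equations (4) and (5) are not reproduced in this text. For illustration only, the sketch below assumes that the weighted distance of equation (4) is the energy of the error t_j(n) − c_l(n) after filtering by a perceptually weighted impulse response h_j(n) associated with each training vector. Under that assumption, the centroid of a cluster solves the linear system (Σ_j H_jᵀH_j) c = Σ_j H_jᵀH_j t_j, where H_j is the convolution matrix of h_j. The gain-normalized form of equation (5) is not covered, and all names are hypothetical.

```python
def conv_matrix(h, n):
    """n x n lower-triangular Toeplitz matrix H with (H x)[r] = sum_c h[r-c] x[c]."""
    return [[h[r - c] if 0 <= r - c < len(h) else 0.0 for c in range(n)]
            for r in range(n)]


def weighted_distance(t, c, h):
    """Assumed form of Eq. (4): energy of (t - c) filtered by h (first n samples)."""
    n = len(t)
    H = conv_matrix(h, n)
    e = [a - b for a, b in zip(t, c)]
    y = [sum(H[r][k] * e[k] for k in range(n)) for r in range(n)]
    return sum(v * v for v in y)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def weighted_centroid(vectors, responses):
    """Codeword c minimizing sum_j ||H_j (t_j - c)||^2 over a cluster, i.e. the
    solution of (sum_j H_j^T H_j) c = sum_j H_j^T H_j t_j."""
    n = len(vectors[0])
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for t, h in zip(vectors, responses):
        H = conv_matrix(h, n)
        G = [[sum(H[k][r] * H[k][c] for k in range(n)) for c in range(n)]
             for r in range(n)]                      # G = H^T H
        for r in range(n):
            b[r] += sum(G[r][c] * t[c] for c in range(n))
            for c in range(n):
                A[r][c] += G[r][c]
    return solve(A, b)
```

In an LBG-style training loop, each training vector would be assigned to the codeword with the smallest `weighted_distance`, and each codeword would be replaced by the `weighted_centroid` of its cluster. The loop repeats until the total distortion stops decreasing. When all H_j are equal and invertible, the centroid reduces to the plain mean.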

Next, the second codebook relieves the dependence on training data introduced by the first codebook. It is applied to the signal that remains after the contribution of the first codebook is subtracted from the excitation signal. It may be a codebook trained by the method of Reference 2, a codebook of noise or random-number signals whose statistical properties are fixed in advance (such as the Gaussian noise signals of Reference 1), or a codebook with other properties. Training the noise codebook under a suitable distance measure improves performance further. For details, see the paper by T. Moriya et al., "Transform Coding of Speech using a Weighted Vector Quantizer" (IEEE J. Sel. Areas Commun., pp. 425-431, 1988) (Reference 3).

The first invention is characterized in that the code vector selected from the first codebook is modified using the code vector selected from the second codebook. This adapts the first codebook to the short-time characteristics of the input signal, so that better performance is obtained with a small codebook size. The first codebook is modified according to the following equation.

$$c'_{1j}(n) = c_{1j}(n) + \mathrm{sign}(\gamma_1)\cdot\mathrm{sign}(\gamma_2)\cdot B\cdot c_{2i}(n) \qquad (6)$$

Here B is a small positive quantity that determines convergence, and sign(γ) denotes the sign of γ. The first codebook is modified according to equation (6) on both the transmitting side and the receiving side.

Alternatively, the following equation can be used instead of equation (6). It improves robustness against transmission-channel errors and prevents the effects of erroneous modification.

$$c'_{1j}(n) = (1-\delta)\cdot c_{1j}(n) + A\cdot\delta\cdot c_{2i}(n) \qquad (7)$$

Here δ is a small positive quantity (for example, $10^{-2}$ to $10^{-3}$) that prevents erroneous modification and reduces the effect of transmission errors. A is a convergence coefficient given by the following equation.

$$A = \gamma_2/\gamma_1 \qquad (8)$$

If the error between the input vector and the vector selected in the first stage becomes small, the gain ratio γ₂/γ₁ between the second and first stages also becomes small. The modification then proceeds more slowly.
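The leaky update of equations (7) and (8) can be sketched the same way. The usage below shows one useful property. If the encoder and decoder copies of a code vector ever differ, for example after a channel error, repeated identical updates shrink the difference by a factor of (1 − δ) each time. Skipping the update when γ₁ = 0 is an assumption, because equation (8) is then undefined. The function name is hypothetical.

```python
def modify_first_codebook_leaky(codebook1, j, c2, gamma1, gamma2, delta):
    """Eqs. (7)-(8): c'_1j(n) = (1 - delta) c_1j(n) + A delta c_2i(n), A = g2 / g1."""
    if gamma1 == 0:
        return                        # A is undefined; leave the codeword unchanged
    A = gamma2 / gamma1               # Eq. (8): small when the first stage already fits well
    codebook1[j] = [(1 - delta) * a + A * delta * b
                    for a, b in zip(codebook1[j], c2)]


# Encoder and decoder copies that disagree in one sample.
enc = [[1.0, 1.0]]
dec = [[0.0, 1.0]]
for _ in range(10):
    for cb in (enc, dec):
        modify_first_codebook_leaky(cb, 0, [0.5, -0.5], 1.0, 0.2, 0.01)
mismatch = enc[0][0] - dec[0][0]      # equals (1 - 0.01) ** 10 up to rounding
```

With δ between 10^-2 and 10^-3, a mismatch decays slowly but steadily. This is the trade-off between adaptation speed and robustness that the text describes.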

The simplified equations (9) and (10) can also be used.

$$c'_{1j}(n) = (1-\delta)\cdot c_{1j}(n) + A\cdot\delta \qquad (9)$$

$$c'_{1j}(n) = (1-\delta)\cdot c_{1j}(n) + \mathrm{sign}(\gamma_1)\cdot\mathrm{sign}(\gamma_2)\cdot\delta \qquad (10)$$

Next, in the second invention, the second codebook is modified using the code vector selected from the first codebook. Equation (11) or equation (12) is used for the modification.

$$c'_{2i}(n) = \mathrm{sign}(\gamma_1)\cdot\mathrm{sign}(\gamma_2)\cdot B\cdot c_{1j}(n) + c_{2i}(n) \qquad (11)$$

$$c'_{2i}(n) = A\cdot\delta\cdot c_{1j}(n) + (1-\delta)\cdot c_{2i}(n) \qquad (12)$$

The simplified equations (13) and (14) can also be used.
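In the second invention the roles of the two codebooks are swapped. The selected second-codebook vector c_2i(n) is adjusted using the selected first-codebook vector c_1j(n). A sketch of equations (11) and (12) follows. Reusing A = γ₂/γ₁ from equation (8) in equation (12) is an assumption, as are the function names.

```python
def sign(x):
    """Sign function; sign(0) is taken as +1 (an assumption)."""
    return 1.0 if x >= 0 else -1.0


def modify_second_eq11(c2, c1, gamma1, gamma2, B):
    """Eq. (11): c'_2i(n) = sign(g1) sign(g2) B c_1j(n) + c_2i(n)."""
    s = sign(gamma1) * sign(gamma2)
    return [s * B * a + b for a, b in zip(c1, c2)]


def modify_second_eq12(c2, c1, gamma1, gamma2, delta):
    """Eq. (12): c'_2i(n) = A delta c_1j(n) + (1 - delta) c_2i(n), with A assumed = g2/g1."""
    A = gamma2 / gamma1
    return [A * delta * a + (1 - delta) * b for a, b in zip(c1, c2)]
```

Both functions return the modified vector. Writing it back into the second codebook at index i is left to the caller, and the transmitter and receiver must do so identically.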

$$c'_{2i}(n) = A\cdot\delta + (1-\delta)\cdot c_{2i}(n) \qquad (13)$$

$$c'_{2i}(n) = \mathrm{sign}(\gamma_1)\cdot\mathrm{sign}(\gamma_2)\cdot\delta + (1-\delta)\cdot c_{2i}(n) \qquad (14)$$

In the third invention, the excitation signal of the speech signal is represented by an adaptive codebook and a codebook, as in Reference 1. The adaptive codebook produces a pitch-prediction excitation signal from the past excitation signal, and this signal is used to modify the code vector selected from the codebook. The modification follows equation (15) or equation (16).

$$c'_{j}(n) = \mathrm{sign}(\beta)\cdot\mathrm{sign}(\gamma_1)\cdot B\cdot v(n-M) + c_{j}(n) \qquad (15)$$

$$c'_{j}(n) = A\cdot\delta\cdot v(n-M) + (1-\delta)\cdot c_{j}(n) \qquad (16)$$

Here v(n−M) is the pitch-prediction excitation signal obtained with the adaptive codebook, and c_j(n) is the j-th code vector selected from the codebook.
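For the third invention, the correction needs the pitch-prediction excitation v(n−M) for the current subframe. When the delay M is shorter than the subframe, the sketch below extends the segment periodically with period M. This is a common adaptive-codebook convention assumed here, not something stated in the text. The coefficient A of equation (16) is passed in as a parameter because the text does not define it separately for the third invention. All names are hypothetical.

```python
def sign(x):
    """Sign function; sign(0) is taken as +1 (an assumption)."""
    return 1.0 if x >= 0 else -1.0


def pitch_excitation(past, M, length):
    """v(n - M) for n = 0..length-1; `past` holds earlier excitation samples,
    most recent last. Requires 1 <= M <= len(past)."""
    out = []
    for n in range(length):
        k = n - M
        # k < 0: sample lies in the past buffer; otherwise repeat with period M
        out.append(past[len(past) + k] if k < 0 else out[k])
    return out


def modify_eq15(c, v, beta, gamma1, B):
    """Eq. (15): c'_j(n) = sign(beta) sign(g1) B v(n-M) + c_j(n)."""
    s = sign(beta) * sign(gamma1)
    return [s * B * a + b for a, b in zip(v, c)]


def modify_eq16(c, v, A, delta):
    """Eq. (16): c'_j(n) = A delta v(n-M) + (1 - delta) c_j(n)."""
    return [A * delta * a + (1 - delta) * b for a, b in zip(v, c)]
```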

The simplified equations (17) and (18) can also be used.

$$c'_{j}(n) = A\cdot\delta + (1-\delta)\cdot c_{j}(n) \qquad (17)$$

$$c'_{j}(n) = \mathrm{sign}(\beta)\cdot\mathrm{sign}(\gamma_1)\cdot\delta + (1-\delta)\cdot c_{j}(n) \qquad (18)$$

Alternatively, the pitch-prediction excitation signal may be modified using the code vector c_j(n) selected from the codebook.

[Embodiments]

FIG. 1 is a block diagram of a speech coding apparatus that implements the speech coding method according to the first invention.

On the transmitting side, a speech signal is input from the input terminal 110, and one frame (for example, 20 ms) of the speech signal is stored in the buffer memory 120.

The LPC calculation circuit 130 performs well-known LPC analysis on the speech signal of the frame. From it, the circuit computes LSP parameters of a predetermined order L, which represent the spectral characteristics of the frame's speech signal. For the specific calculation method, see Reference 1.
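The well-known LPC analysis referred to here is commonly carried out with the autocorrelation method and the Levinson-Durbin recursion. The sketch below computes predictor coefficients a_1 to a_p such that x(n) is approximated by Σ a_i·x(n−i). Windowing and the subsequent conversion to LSP parameters (see Reference 4) are omitted. The sign convention and all names are assumptions.

```python
def autocorrelation(x, order):
    """r[k] = sum_n x[n] x[n-k] for k = 0..order (no window applied)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p with x(n) ~ sum_i a_i x(n-i), plus the
    final prediction-error power."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err               # reflection coefficient of stage i
        new = a[:]
        new[i] = k_i
        for k in range(1, i):
            new[k] = a[k] - k_i * a[i - k]
        a = new
        err *= 1.0 - k_i * k_i        # prediction-error power after stage i
    return a[1:], err
```

For an autocorrelation sequence of a first-order process, such as r = [1.0, 0.9, 0.81], the recursion returns a first coefficient of about 0.9 and a second coefficient of about zero.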

Next, the LSP quantization circuit 140 quantizes the LSP parameters with a predetermined number of quantization bits and outputs the resulting code l_k to the multiplexer 260. It also decodes the code and converts the result into linear prediction coefficients a_i′ (i = 1 to L). These are output to the weighting circuit 200, the impulse response calculation circuit 170, and the synthesis filter 281. For the coding of LSP parameters and their conversion into linear prediction coefficients, see the paper by Sugamura et al., "Quantizer design in LSP speech analysis-synthesis" (IEEE J. Sel. Areas Commun., pp. 432-440, 1988) (Reference 4).

The subframe division circuit 150 divides the speech signal of the frame into subframes. Here, for example, the frame length is 20 ms and the subframe length is 5 ms.

The weighting circuit 200 applies well-known perceptual weighting to the subtracted signal (the output of the subtractor 190). For details of the perceptual weighting function, see Reference 1.

The subtractor 190 subtracts the output of the synthesis filter 281 from the input signal divided into subframes and outputs the result.

The adaptive codebook 210 receives three inputs: the input signal v(n) of the synthesis filter 281 via the delay circuit 206, the weighted impulse response h_w(n) from the impulse response calculation circuit 170, and the weighted signal from the weighting circuit 200. It performs pitch prediction based on long-term correlation and calculates the delay M and the gain β as the pitch parameters. In the following description the prediction order of the adaptive codebook is 1, but it may be 2 or higher. The calculation of the delay M and the gain β for a first-order adaptive codebook is described, for example, in the paper by Kleijn et al., "Improved speech quality and efficient vector quantization in SELP" (Proc. ICASSP, pp. 155-158, 1988) (Reference 5). A gain quantizer then quantizes and decodes the gain β with a predetermined number of quantization bits to obtain the gain β′. Using β′, the adaptive codebook computes the prediction signal x̂_w(n) according to the following equation and outputs it to the subtractor 205. The delay M is output to the multiplexer 260.

$$\hat{x}_w(n) = \beta'\cdot v(n-M) * h_w(n) \qquad (19)$$

In the above equation, v(n−M) is the past excitation signal, which is the input signal of the synthesis filter 281. h_w(n) is the weighted impulse response obtained by the impulse response calculation circuit 170.
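The delay and gain search of a first-order adaptive codebook, followed by equation (19), can be sketched as an exhaustive search over candidate delays. For each M, the past excitation segment v(n−M) is filtered by h_w(n). The M that maximizes corr²/energy is kept, with β = corr/energy. This is an illustrative sketch in the spirit of Reference 5, not its exact procedure. The periodic extension for M shorter than the subframe, the delay range, and all names are assumptions. Gain quantization is left out, so β′ is passed in directly.

```python
def convolve_trunc(x, h):
    """Linear convolution x*h, truncated to len(x) samples."""
    return [sum(x[k] * h[m - k] for k in range(m + 1) if m - k < len(h))
            for m in range(len(x))]


def pitch_excitation(past, M, length):
    """v(n - M) for the current subframe, repeating with period M when M < length."""
    out = []
    for n in range(length):
        k = n - M
        out.append(past[len(past) + k] if k < 0 else out[k])
    return out


def adaptive_codebook_search(target, past, h_w, min_lag, max_lag):
    """Return (M, beta) maximizing corr^2 / energy over the delay range."""
    best = None
    for M in range(min_lag, max_lag + 1):
        y = convolve_trunc(pitch_excitation(past, M, len(target)), h_w)
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, M, corr / energy)
    return best[1], best[2]


def prediction_signal(past, M, beta_q, h_w, length):
    """Eq. (19): beta' times (v(n - M) convolved with h_w(n))."""
    return [beta_q * y for y in convolve_trunc(pitch_excitation(past, M, length), h_w)]
```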

The delay circuit 206 delays the synthesis filter input signal v(n) by one subframe and outputs it to the adaptive codebook 210.

The subtractor 205 subtracts the output of the adaptive codebook 210 from the output signal of the weighting circuit 200 according to the following equation. It outputs the residual signal e_w(n) to the first codebook search circuit 230.

$$e_w(n) = x_w(n) - \hat{x}_w(n) \qquad (20)$$

The impulse response calculation circuit 170 calculates the impulse response h_w(n) of the perceptually weighted synthesis filter for a predetermined number L of samples. For the specific calculation method, see Reference 1 and others.

The first codebook search circuit 230 uses the first codebook 235 to search for the optimal codeword c_1j(n). As described in the Operation section, the first codebook is trained in advance with a training signal. For the method of searching for the optimal code vector c_1j(n), see, for example, the specification of Japanese Patent Application No. Hei 2-42956 (Reference 6). The circuit then obtains the optimal gain γ₁. Using γ₁ and c_1j(n), it computes and outputs the weighted reproduced signal y_w(n) by the method of Reference 6.

The subtractor 255 subtracts y_w(n) from e_w(n) and outputs the result to the second codebook search circuit 270.

The second codebook search circuit 270 computes the optimal codeword from the second codebook 275. Its configuration can be basically the same as that of the first codebook search circuit, and the codeword can be searched for with the same method used for the first codebook 235. As described in the Operation section, the second codebook consists of random-number sequences. This relieves the dependence on training data while keeping the high efficiency of the trained codebook. For the construction of a codebook of random-number sequences, see Reference 1.

To reduce the computation required for the codebook search, an overlapped random-number codebook may be used as the second codebook 275. For the construction of an overlapped random-number codebook and the corresponding codeword search method, see Reference 5 and others. The second codebook can also be constructed by training in advance, in the same way as the first codebook.

The gain quantizer 286 vector-quantizes the gains γ₁ and γ₂ by the method described in the Operation section. It uses the gain codebook 287, which is created in advance by training with equations (12) and (13). For details, see Reference 6 and others.
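Vector quantization of the gain pair (γ₁, γ₂) amounts to picking the entry of the gain codebook 287 that minimizes the error between the target and the gain-scaled filtered code vectors. The sketch assumes a plain squared-error criterion on the weighted signals y1 = c_1j∗h_w and y2 = c_2i∗h_w. Training of the gain codebook is not shown, the codebook entries in the test are placeholders rather than trained values, and the function name is hypothetical.

```python
def gain_vq(target, y1, y2, gain_codebook):
    """Return (index, error) of the (g1, g2) pair minimizing
    sum_n (target(n) - g1*y1(n) - g2*y2(n))^2."""
    best_idx, best_err = None, float("inf")
    for idx, (g1, g2) in enumerate(gain_codebook):
        err = sum((t - g1 * a - g2 * b) ** 2 for t, a, b in zip(target, y1, y2))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err
```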

The correction circuit 280 modifies the code vector c_1j(n) selected by the first codebook search circuit 230, using equations (6) to (10) described in the Operation section.

The adder 290 adds the predicted excitation signal of the adaptive codebook 210, the output excitation signal of the first codebook search circuit 230, and the output excitation signal of the second codebook search circuit 270. It outputs the sum to the synthesis filter 281.

The synthesis filter 281 receives the output v(n) of the adder 290 and computes one frame of synthesized speech according to the following equation. It then feeds a sequence of zeros for one more frame into the filter to obtain a response signal sequence, and outputs this one-frame response signal sequence to the subtractor 190.
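The synthesis-filter step can be sketched with an all-pole filter 1/A(z), including the zero-input (ringing) response that is passed to the subtractor 190. The sign convention s(n) = e(n) + Σ a_i·s(n−i) and the representation of the filter memory as a list of past outputs are assumptions. The patent's own synthesis-filter equation is not reproduced in this text, and the function names are hypothetical.

```python
def synthesize(excitation, a, state):
    """All-pole synthesis s(n) = e(n) + sum_i a[i-1] * s(n - i).

    `state` holds the last len(a) outputs (most recent last); returns the
    output samples and the updated state."""
    p = len(a)
    hist = list(state)
    out = []
    for e in excitation:
        s = e + sum(a[i] * hist[-1 - i] for i in range(p))
        out.append(s)
        hist.append(s)
    return out, hist[len(hist) - p:]


def zero_input_response(a, state, length):
    """Response of the filter to an all-zero input, i.e. the ringing of its memory."""
    out, _ = synthesize([0.0] * length, a, state)
    return out
```

Subtracting this ringing from the input means that the following codebook searches only have to account for the current subframe's excitation.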

The multiplexer 260 combines and outputs the code sequences from the LSP quantization circuit 140, the adaptive codebook 210, the first codebook search circuit 230, the second codebook search circuit 270, and the gain quantizer 286.

This concludes the description of the embodiment of the first invention.

FIG. 2 is a block diagram of a speech coding apparatus that implements the speech coding method according to the second invention. Components with the same reference numerals as in FIG. 1 operate in the same way as in FIG. 1, so their description is omitted.

The correction circuit 380 modifies the code vector c_2i(n) selected by the second codebook search circuit 270. It uses the code vector c_1j(n) selected by the first codebook search circuit 230 and applies equations (11) to (14) described in the Operation section.

This concludes the description of the second invention.

FIG. 3 is a block diagram of a speech coding apparatus that implements the speech coding method according to the third invention. Components with the same reference numerals as in FIG. 1 operate in the same way as in FIG. 1, so their description is omitted.

The excitation codebook search circuit 430 operates in the same way as the first codebook search circuit 230 and selects the optimal excitation signal from the excitation codebook 435.

The correction circuit 480 modifies the excitation signal c_1j(n) selected by the excitation codebook search circuit 430. It uses the pitch-prediction excitation signal v(n−M) obtained by the adaptive codebook 210 and applies equations (15) to (18).

This concludes the description of the embodiment of the third invention.

Besides the methods described in the embodiments, the following modification method can also be used in the correction circuit. Taking the first invention as an example:

$$c'_{1j}(n) = (1-\delta)\cdot c_{1j}(n) + \mathrm{sign}\cdot\left|\gamma_2/\gamma_1\right|\cdot\delta\cdot c_{2i}(n) \qquad (23)$$

Here sign denotes a positive or negative sign. It is chosen so as to minimize the following expression.

Here e_w(n) is the output signal of the subtractor 205. To minimize the above expression, it is partially differentiated with respect to γ₁ and the derivative is set to zero. Substituting the result gives the following expression.

Accordingly, equation (25) is calculated for both the positive and the negative sign in equation (23). The sign for which (25) takes the larger value is selected and transmitted with 1 bit.
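The one-bit sign decision can be sketched as follows. Equations (24) and (25) are not reproduced in this text, so the sketch assumes the usual form. Equation (24) is taken to be the squared error between e_w(n) and γ₁ times the weighted, filtered version of the modified vector from equation (23). Equation (25) is taken to be the term (Σ e_w·y)² / Σ y² that remains after the optimal γ₁ is substituted, so the sign giving the larger value is chosen. All names are hypothetical.

```python
def convolve_trunc(x, h):
    """Linear convolution x*h, truncated to len(x) samples."""
    return [sum(x[k] * h[m - k] for k in range(m + 1) if m - k < len(h))
            for m in range(len(x))]


def choose_sign(e_w, c1, c2, gamma1, gamma2, delta, h_w):
    """Return +1.0 or -1.0 for the sign in Eq. (23), picking the larger Eq. (25) value."""
    ratio = abs(gamma2 / gamma1)
    best_sign, best_score = None, None
    for s in (1.0, -1.0):
        c_mod = [(1 - delta) * a + s * ratio * delta * b for a, b in zip(c1, c2)]  # Eq. (23)
        y = convolve_trunc(c_mod, h_w)
        corr = sum(e * v for e, v in zip(e_w, y))
        energy = sum(v * v for v in y)
        score = corr * corr / energy if energy > 0 else 0.0   # assumed form of Eq. (25)
        if best_score is None or score > best_score:
            best_sign, best_score = s, score
    return best_sign
```

The decoder receives the chosen sign as the 1-bit side information, so it can reproduce the same modified vector.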

The second and third inventions can be configured in the same way.

The above embodiments did not jointly optimize the gains of the adaptive codebook and the first and second codebooks, or the gains of the adaptive codebook and the excitation codebook. Performance can be improved further by jointly optimizing the gains of the adaptive codebook, the first codebook, and the second codebook. This joint optimization is performed when the code vectors of the first and second codebooks are determined. To reduce computation, gain optimization may be performed only during the first codebook search and omitted during the second codebook search.

To reduce the computation further, gain optimization may be omitted during the code vector search. In that case, the gains of the adaptive codebook and the first codebook are optimized jointly first, and then the gains of the adaptive codebook and the first and second codebooks are optimized jointly. For details, see Reference 5 and others.

To reduce the computation still further, the three gains (the adaptive codebook gain β and the first and second codebook gains γ₁ and γ₂) may be optimized jointly after the code vectors of the first and second codebooks have been selected. For details, see Reference 6 and others.
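Joint optimization of β, γ₁ and γ₂ after the code vectors are fixed can be sketched as an ordinary least-squares problem. With filtered contributions y_a (adaptive codebook), y_1 and y_2 (first and second codebooks), the gains solve the 3×3 normal equations. This is a generic least-squares sketch, assumed rather than taken from Reference 6. It ignores gain quantization and uses hypothetical names.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def joint_gains(target, components):
    """Gains g_k minimizing sum_n (target(n) - sum_k g_k * y_k(n))^2."""
    K = len(components)
    gram = [[sum(p * q for p, q in zip(components[r], components[c])) for c in range(K)]
            for r in range(K)]
    rhs = [sum(t * y for t, y in zip(target, components[r])) for r in range(K)]
    return solve(gram, rhs)
```

Passing [y_a, y_1] instead of three components gives the two-gain joint optimization mentioned in the preceding paragraphs.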

Other well-known methods besides that of the embodiment can also be used to search the first codebook. For example, the method described in Reference 1 can be used. Alternatively, the orthogonal transform C₁(k) of each codeword c_1j(n) of the codebook can be computed and stored in advance. Then, for each subframe, the orthogonal transform H_w(k) of the weighted impulse response h_w(n) and the orthogonal transform E_w(k) of the residual signal e_w(n) are computed for a predetermined number of points, and the search is performed on the frequency axis. For details, see Reference 5 and others.

To search the second codebook, the method of the embodiment, the methods shown above, the method described in Reference 6, or other well-known effective methods can be used.

The second codebook can also be constructed by methods other than the one described in the embodiment. For example, a very large set of random sequences can be prepared in advance as a provisional codebook and searched against training data; the sequences selected most frequently are then registered as codewords to form the second codebook. This construction method can also be applied to the first codebook.
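A minimal sketch of this selection-frequency construction, with hypothetical function names. For brevity it matches each training target directly by normalized correlation, omitting the weighted synthesis filter that an actual codebook search would include; the Gaussian pool generator is likewise an illustrative assumption.

```python
import random
from collections import Counter


def best_candidate(target, candidates):
    """Index of the candidate c maximizing (x . c)^2 / (c . c) for target x."""
    best_j, best_score = -1, float("-inf")
    for j, c in enumerate(candidates):
        energy = sum(v * v for v in c)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, c))
        score = corr * corr / energy
        if score > best_score:
            best_j, best_score = j, score
    return best_j


def train_codebook(candidates, training_targets, size):
    """Keep the `size` candidates selected most often over the training data.

    Returns fewer than `size` codewords if fewer candidates were ever chosen.
    """
    counts = Counter(best_candidate(t, candidates) for t in training_targets)
    return [candidates[j] for j, _ in counts.most_common(size)]


def random_candidates(count, dim, seed=0):
    """A large pool of Gaussian random sequences (hypothetical provisional codebook)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]
```

For example, `train_codebook(random_candidates(4096, 40), targets, 256)` would keep the 256 most frequently chosen of 4096 random sequences, where `targets` stands for hypothetical training vectors; the sizes are illustrative only.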

In the above embodiment, the adaptive codebook gain and the first and second codebook gains were vector-quantized separately. Alternatively, the three gains β, γ₁, and γ₂, or the pair β and γ₁, can be vector-quantized together. For details, see Reference 5 cited above and I. Gerson et al., "Vector sum excited linear prediction (VSELP) speech coding at 8 kbps," Proc. ICASSP, pp. 461-464, 1990 (Reference 7).
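A sketch of a joint gain vector quantizer search, assuming a trained gain codebook whose entries are tuples (β, γ₁, γ₂) or (β, γ₁), and filtered vectors y_adaptive, y_first, y_second (hypothetical inputs and function name). Expanding the squared error lets the correlations be computed once per subframe, so each codebook entry then costs only O(n²) operations for n gains.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def search_gain_codebook(target, filtered_vectors, gain_codebook):
    """Return (index, error) of the gain vector minimizing
    ||target - sum_i g[i] * filtered_vectors[i]||^2.

    filtered_vectors: [y_adaptive, y_first, y_second] for (beta, gamma1, gamma2),
    or [y_adaptive, y_first] for (beta, gamma1).
    """
    n = len(filtered_vectors)
    # Correlations computed once per subframe.
    c = [dot(target, y) for y in filtered_vectors]
    r = [[dot(filtered_vectors[i], filtered_vectors[j]) for j in range(n)]
         for i in range(n)]
    xx = dot(target, target)
    best_index, best_error = -1, float("inf")
    for idx, g in enumerate(gain_codebook):
        # Expanded error: x.x - 2 sum_i g_i c_i + sum_ij g_i g_j r_ij
        error = xx - 2.0 * sum(g[i] * c[i] for i in range(n))
        error += sum(g[i] * g[j] * r[i][j] for i in range(n) for j in range(n))
        if error < best_error:
            best_index, best_error = idx, error
    return best_index, best_error
```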

In the above embodiment, the adaptive codebook was first order, but it can also be of second or higher order. Alternatively, the order can be kept at one while the delay takes a fractional rather than an integer value. For details, see, for example, P. Kroon et al., "Pitch predictors with high temporal resolution," Proc. ICASSP, pp. 661-664, 1990 (Reference 8). These configurations improve performance, at the cost of a slight increase in the information needed to transmit the gains or the delay.
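The fractional-delay option can be sketched as below. Linear interpolation between neighboring excitation samples is assumed purely for brevity; this is not necessarily the interpolation used in Reference 8, and practical coders typically use longer interpolation filters. When the delay is shorter than the subframe, already generated samples are reused, so the past excitation is extended periodically. The function name is hypothetical.

```python
import math


def adaptive_codebook_vector(past_excitation, delay, length):
    """Adaptive-codebook vector v(n) = u(n - delay), n = 0 .. length-1.

    `delay` may be fractional; values between samples are obtained by
    linear interpolation (an illustrative assumption). Samples of v that
    have already been generated are reused when delay < length.
    """
    if not 1.0 <= delay <= len(past_excitation):
        raise ValueError("delay must lie in [1, len(past_excitation)]")
    ext = list(past_excitation)  # grows as new samples are produced
    base = len(past_excitation)
    out = []
    for n in range(length):
        pos = base + n - delay  # possibly fractional index into ext
        i0 = math.floor(pos)
        frac = pos - i0
        if frac == 0.0:
            sample = ext[i0]
        else:
            sample = (1.0 - frac) * ext[i0] + frac * ext[i0 + 1]
        out.append(sample)
        ext.append(sample)
    return out
```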

In the above embodiment, K parameters (reflection coefficients) and LSP parameters were encoded as the spectral parameters, and LPC analysis was used to obtain them. Other well-known spectral parameters can also be used, such as the LPC cepstrum, cepstrum, improved cepstrum, generalized cepstrum, or mel-cepstrum, each obtained with the analysis method best suited to it.
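As one example of such a parameter, the LPC cepstrum can be computed directly from the linear prediction coefficients with a well-known recursion. The sketch assumes the convention 1/A(z) with A(z) = 1 - Σ a_k z^-k and omits the gain term c_0; the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstrum c_1 .. c_{n_ceps} of 1 / (1 - sum_k a[k-1] z^-k).

    Recursion (gain term c_0 omitted):
      c_n = a_n + sum_{k=1}^{n-1} (k / n) c_k a_{n-k},  with a_m = 0 for m > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a single pole, `lpc_to_cepstrum([0.5], 3)` reproduces the closed form c_n = 0.5ⁿ / n.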

The LPC coefficients obtained for each frame may also be interpolated for each subframe, either in the LSP domain or in the linear prediction coefficient domain, and the interpolated coefficients used to search the adaptive codebook and the first and second codebooks. This configuration further improves speech quality.
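Per-subframe interpolation in the LSP domain can be sketched as follows. The linear weights (i + 1)/N, which make the last subframe coincide with the current frame, are an illustrative assumption rather than the embodiment's values, and the function name is hypothetical. Interpolating LSPs has the advantage that a convex combination of two properly ordered LSP sets is itself ordered, so the corresponding synthesis filters remain stable; converting each interpolated LSP vector back to linear prediction coefficients is not shown.

```python
def interpolate_lsp(prev_lsp, curr_lsp, num_subframes):
    """Per-subframe LSP vectors, linearly interpolated between two frames.

    Subframe i (0-based) puts weight w = (i + 1) / num_subframes on the
    current frame, so the last subframe uses curr_lsp.
    """
    result = []
    for i in range(num_subframes):
        w = (i + 1) / num_subframes
        result.append([(1.0 - w) * p + w * c for p, c in zip(prev_lsp, curr_lsp)])
    return result
```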

The LSP coefficients can be encoded even more efficiently by vector quantization or vector-scalar quantization using well-known methods. For vector-scalar quantization, see, for example, Reference 3 cited above.
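A generic two-stage illustration of vector-scalar quantization follows: the LSP vector is first vector-quantized, and each component of the remaining error is then scalar-quantized with a uniform step. The codebook, the step size, and the function names are hypothetical, and this is not claimed to be the specific scheme of Reference 3.

```python
def vector_scalar_quantize(lsp, vq_codebook, step):
    """Stage 1: nearest codevector (squared error). Stage 2: uniform scalar
    quantization of each residual component with step size `step`.
    Returns (vq_index, residual_indices).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    vq_index = min(range(len(vq_codebook)),
                   key=lambda i: sq_dist(lsp, vq_codebook[i]))
    residual = [x - y for x, y in zip(lsp, vq_codebook[vq_index])]
    residual_indices = [round(r / step) for r in residual]
    return vq_index, residual_indices


def vector_scalar_decode(vq_index, residual_indices, vq_codebook, step):
    """Reconstruct: codevector plus dequantized residual."""
    return [y + q * step for y, q in zip(vq_codebook[vq_index], residual_indices)]
```

The second stage bounds the per-component reconstruction error by step/2, at the cost of transmitting the residual indices in addition to the vector index.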

On the receiving side, an adaptive postfilter operating on at least one of the pitch and the spectral envelope may also be added, shaping the quantization noise so that it is perceptually less audible. For the configuration of the adaptive postfilter, see, for example, P. Kroon et al., "A class of analysis-by-synthesis predictive coders for high quality speech coding at rates between 4.8 and 16 kb/s," IEEE JSAC, vol. 6, no. 2, pp. 353-363, 1988 (Reference 9).
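For the spectral-envelope part, one commonly used form is the short-term postfilter A(z/γ_n)/A(z/γ_d) with 0 < γ_n < γ_d < 1, which emphasizes formant peaks relative to the spectral valleys where quantization noise is most perceptible. The sketch below assumes that form and the convention A(z) = 1 - Σ a_k z^-k. The default γ values are illustrative, and the pitch postfilter, spectral-tilt compensation, and gain control that practical postfilters add are omitted. The function name is hypothetical.

```python
def short_term_postfilter(x, a, gamma_num=0.5, gamma_den=0.8):
    """Filter x through A(z/gamma_num) / A(z/gamma_den), A(z) = 1 - sum a_k z^-k.

    Difference equation (zero initial memory):
      y(n) = x(n) - sum_k a_k gamma_num^k x(n-k) + sum_k a_k gamma_den^k y(n-k)
    """
    p = len(a)
    b_num = [a[k] * gamma_num ** (k + 1) for k in range(p)]
    b_den = [a[k] * gamma_den ** (k + 1) for k in range(p)]
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, p + 1):
            if n - k < 0:
                break
            acc -= b_num[k - 1] * x[n - k]  # zeros of A(z/gamma_num)
            acc += b_den[k - 1] * y[n - k]  # poles of 1/A(z/gamma_den)
        y.append(acc)
    return y
```

Setting gamma_num equal to gamma_den makes the filter an identity, which is a convenient sanity check.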

[Effects of the Invention]

As described above, according to the present invention, the code vector selected from the first (or second) codebook is modified based on the code vector selected from the second (or first) codebook; or the code vector selected from the excitation codebook is modified based on the pitch prediction excitation signal selected by the adaptive codebook; or the pitch prediction excitation signal is modified by that code vector. The characteristics of the codebook can therefore be adapted to those of the input signal, with the significant effect that, even when the codebook size is reduced at low bit rates, better performance is obtained than with conventional methods.

[Brief description of the drawings]

Fig. 1 is a block diagram of a speech coding apparatus implementing the speech coding method according to the first invention; Fig. 2 is a block diagram of a speech coding apparatus implementing the speech coding method according to the second invention; and Fig. 3 is a block diagram of a speech coding apparatus implementing the speech coding method according to the third invention.

110: buffer memory
130: LPC calculation circuit
140: quantization circuit
150: subframe division circuit
170: impulse response calculation circuit
190, 205, 255: subtractors
200: weighting circuit
206: delay circuit
210: adaptive codebook
230: first codebook search circuit
235: first codebook
281: synthesis filter
270: second codebook search circuit
275: second codebook
280, 380, 480: correction circuits
286: gain quantizer
287: gain codebook
430: excitation codebook search circuit
435: excitation codebook

Continuation of the front page
(56) References cited: Japanese Examined Patent Publication No. H08-32033 (JP, B2); Proceedings of the 1990 IEICE Spring National Convention, Vol. 1, SA-5-4, "Improvement of 8 kb/s CELP using a learned codebook (LCELP)," pp. 1-427 to 1-428 (published March 5, 1990).
(58) Fields searched (Int. Cl.⁷, DB name): G10L 11/00-13/08; G10L 19/00-21/06; H03M 7/30; H04B 14/04; INSPEC (DIALOG); JICST file (JOIS); WPI (DIALOG)

Claims

(57) [Claims]

1. A speech coding method in which an input discrete speech signal is divided into frames of a predetermined length, and a spectral parameter representing the spectral envelope of the speech signal is obtained and output; each frame is divided into subframes of a predetermined length, a pitch parameter is obtained for each subframe to produce a pitch prediction excitation signal, and the excitation signal of the speech signal is represented by the pitch prediction excitation signal and a signal selected from a codebook; wherein the codebook is modified based on the pitch prediction excitation signal, or the pitch prediction excitation signal is modified by the signal selected from the codebook.