JPH0481199B2

Info

Publication number: JPH0481199B2
Application number: JP58139022A
Authority: JP
Inventors: Kazunori Ozawa
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-07-29
Filing date: 1983-07-29
Publication date: 1992-12-22
Also published as: JPS6051900A

Description

【発明の詳細な説明】本発明は音声信号の低ビツトレイト波形符号化
方式、特に伝送情報量を10kビツト／秒以下とす
るような符号化方式と装置に関する。音声信号を10kビツト／秒程度以下の伝送情報
量で符号化するための効果的な方法としては、音
声信号の駆動音源信号系列を、それを用いて再生
した信号と入力信号との誤差最小を条件として、
短時間毎に探索する方法が、よく知られている。
これらの方法はその探索方法によつて木符号化
（TREE CODING）、ベクトル量子化
（VECTOR QUANTIZATION）と呼ばれてい
る。また、これらの方法以外に、駆動音源信号系
列を表わす複数個のパルス系列を、短時間毎に、
符号器側で、Ａ−ｂ−Ｓ（ＡNALYSIS−ＢＹ−
ＳYNTHESIS）の手法を用いて逐次的に求めよ
うとする方式が最近、提案されている。本発明
は、この方式に関係するものである。この方式の
詳細については、ビー．エス．アタール（B.S.
ATAL）氏らによるアイ．シー．エー．エス．
エス．ピー（I.C.A.S.S.P）の予稿集、1982年614
〜617頁に掲載の「ア．ニユー．モデル．オブ．
エル．ピー．シー．エクサイテイシヨン．フオ
ー．プロデユーシング．ナチユラル．サウンデイ
ング．スピーチ．アツト．ロウ．ビツト．レイ
ツ」（“Ａ NEW MODEL OF LPC
EXCITATION FOR PRODUCING
NATURAL−SOUNDING SPEECH AT
LOW BIT RATES”）と題した論文（文献１）
に説明されているので、ここでは簡単に説明を行
なう。第１図は、前記文献１、に記載された従来方式
における符号器側の処理を示すブロツク図であ
る。図において、１００は符号器入力端子を示
し、Ａ／Ｄ変換された音声信号系列ｘ(n)が入力さ
れる。１１０はバツフアメモリ回路であり、音声
信号系列を１フレーム（例えば8KHZサンプリン
グの場合でフレーム長10ｍsecとすると80サンプ
ル）分、蓄積する。１１０の出力値は減算器１２
０と、Ｋパラメータ計算回路１８０とに出力され
る。但し、文献１、によればＫパラメータのかわ
りにレフレクシヨン．コエフイシエンツ
（REFLECTION COEFFICIENTS）と記載され
ているが、これはＫパラメータと同一のパラメー
タである。Ｋパラメータ計算回路１８０は、１１
０の出力値を用い、共分散法に従つて、フレーム
毎の音声信号スペクトルを表わすＫパラメータK_i
を16次分（１≦ｉ≦16）求め、これらを合成フイ
ルタ１３０へ出力する。１４０は、音源パルス発
生回路であり、１フレームにあらかじめ定められ
た個数のパルス系列を発生させる。ここでは、こ
のパルス系列をｄ(n)と記する。音源パルス発生回
路１４０によつて発生された音源パルス系列の一
例を第２図に示す。第２図で横軸は離散的な時刻
を、縦軸は振幅をそれぞれ示す。ここでは、１フ
レーム内に８個のパルスを発生させる場合につい
て示してある。音源パルス発生回路１４０によつ
て発生されたパルス系列ｄ(n)は、合成フイルタ１
３０を駆動する。合成フイルタ１３０は、ｄ(n)を
入力し、音声信号ｘ(n)に対応する再生信号ｘ〜(n)を
求め、これを減算器１２０へ出力する。ここで、
合成フイルタ１３０は、ＫパラメータK_iを入力
し、これらを予測パラメータa_i（１≦ｉ≦16）へ
変換し、a_iを用いて再生信号ｘ〜(n)を計算する。ｘ〜
(n)は、ｄ(n)とa_iを用いて下式のように表わすこと
ができる。
$$\tilde{x}(n) = d(n) + \sum_{i=1}^{P} a_i\,\tilde{x}(n-i) \tag{1}$$
上式でＰは合成フイルタの次数を示し、ここで
はＰ＝16としている。減算器１２０は、原信号ｘ(n)
と再生信号ｘ〜(n)との差ｅ(n)を計算し、重み付け
回路１９０へ出力する。１９０は、ｅ(n)を入力
し、重み付け関数ｗ(n)を用い、次式に従つて重み
付け誤差e_w(n)を計算する。
$$e_w(n) = w(n) * e(n) \tag{2}$$
上式で、記号“＊”はたたみこみ積分を表わ
す。また、重み付け関数ｗ(n)は、周波数軸上で重
み付けを行なうものであり、そのＺ変換値をＷ(Z)
とすると、合成フイルタの予測パラメータa_iを用
いて、次式により表わされる。
$$W(Z) = \frac{1 - \sum_{i=1}^{P} a_i Z^{-i}}{1 - \sum_{i=1}^{P} a_i r^{i} Z^{-i}} \tag{3}$$
上式でｒは０≦ｒ≦１の定数であり、Ｗ(Z)の周
波数特性を決定する。つまり、ｒ＝１とすると、
Ｗ(Z)＝１となり、その周波数特性は平坦となる。
一方、ｒ＝０とすると、Ｗ(Z)は合成フイルタの周
波数特性の逆特性となる。従つて、ｒの値によつ
てＷ(Z)の特性を変えることができる。また、(3)式
で示したようにＷ(Z)を合成フイルタの周波数特性
に依存させて決めているのは、聴感的なマスク効
果を利用しているためである。つまり、入力音声
信号のスペクトルのパワが大きな箇所では（例え
ばフオルマントの近傍）、再生信号のスペクトル
との誤差が少々大きくても、その誤差は耳につき
難いという聴感的な性質による。第３図に、ある
フレームにおける入力音声信号スペクトルと、Ｗ
(Z)の周波数特性の一例とを示した。ここではｒ＝
0.8とした。図において、横軸は周波数（最大4K
Hz）を、縦軸は対数振幅（最大60dB）をそれぞ
れ示す。また、上部の曲線は音声信号のスペクト
ルを、下部の曲線は重み付け関数の周波数特性を
表わしている。第１図へ戻つて、重み付け誤差e_w(n)は、誤差最
小化回路１５０へフイードバツクされる。誤差最
小化回路１５０は、e_w(n)の値を１フレーム分記憶
し、これを用いて次式に従い、重み付け２乗誤差
εを計算する。
$$\varepsilon = \sum_{n=1}^{N} e_w(n)^2 \tag{4}$$
ここで、Ｎは２乗誤差を計算するサンプル数を
示す。文献１、の方式では、この時間長を５ｍsec
としており、これは8KHzサンプリングの場合に
はＮ＝40に相当する。次に、誤差最小化回路１５
０は、前記(4)式で計算した２乗誤差εを小さくす
るように音源パルス発生回路１４０に対し、パル
ス位置及び振幅情報を与える。１４０は、この情
報に基づいて音源パルス系列を発生させる。合成
フイルタ１３０は、この音源パルス系列を駆動源
として再生信号ｘ〜(n)を計算する。次に減算器１２
０では、先に計算した原信号と再生信号との誤差
ｅ(n)から現在求まつた再生信号ｘ〜(n)を減算して、
これを新たな誤差ｅ(n)とする。重み付け回路１９
０はｅ(n)を入力し重み付け誤差e_w(n)を計算し、こ
れを誤差最小化回路１５０へフイードバツクす
る。誤差最小化回路１５０は、再び２乗誤差を計
算し、これを小さくするように音源パルス系列の
振幅と位置を調整する。こうした音源パルス系列
の発生から誤差最小化による音源パルス系列の調
整までの一連の処理は、音源パルス系列のパルス
数があらかじめ定められた数に達するまでくり返
され、音源パルス系列が決定される。以上で従来方式の説明を終了する。この方式の場合に、伝送すべき情報は、合成フ
イルタのＫパラメータK_i（１≦ｉ≦16）と、音源
パルス系列のパルス位置及び振幅であり、１フレ
ーム内にたてるパルスの数によつて任意の伝送レ
イトを実現できる。さらに、伝送レイトを
10Kbps以下とする領域に対しては、良好な再生
音質が得られ有効な方式の一つと考えられる。しかしながら、この従来方式は、演算量が非常
に多いという欠点がある。これは音源パルス系列
におけるパルスの位置と振幅を計算する際に、そ
のパルスに基づいて再生した信号と原信号との誤
差及び２乗誤差を計算し、それらをフイードバツ
クさせて、２乗誤差を小さくするようにパルス位
置と振幅を調整していることに起因している。更
には、パルスの数があらかじめ定められた値に達
するまでこの処理をくり返すことに起因してい
る。更に、この従来方式によれば、10Kbps程度以下
のビツトレイトでは、ピツチ周波数の高い入力信
号の場合、例えば女性の声を入力した場合には、
再生品質が劣化するという欠点があつた。これは
ピツチ周波数が高い場合には、パルス計算のフレ
ーム内に多くのピツチ波形が含まれることにな
り、このピツチ波形を良好に再生するためには、
ピツチ周波数が低い話者の場合と比べて、より多
くの個数の音源パルスを必要とするという理由に
よる。従つてこの理由から、伝送ビツトレイトを
大幅に下げる、すなわち１フレーム内のパルス数
を大幅に減少させることが困難であつた。本発明の目的は、比較的少ない演算量で、
10Kbps以下のビツトレイトに適用し得る高品質
な音声符号方式とその装置を提供することにあ
る。本発明によれば、送信側では離散的音声信号系
列を入力しピツチの微細構造を含む短時間スペク
トルを表わすパラメータを抽出して符号化し、前
記パラメータをもとに前記短時間スペクトルに応
じたインパルス応答系列の自己相関々数を計算
し、前記音声信号系列と前記インパルス応答系列
とに応じた相互相関々数を計算し、前記自己相
関々数と前記相互相関々数とを用いて前記音声信
号系列に対する駆動音源信号系列を求めて符号化
し、前記駆動音源信号系列を表わす符号と前記パ
ラメータを表わす符号とを組み合わせて出力し、
受信側では前記符号系列を入力し前記駆動音源信
号系列を表わす符号系列と前記ピツチの微細構造
を含む短時間スペクトルを表わすパラメータの符
号系列とを分離して復号し、前記復号化された駆
動音源信号系列と前記復号されたパラメータとを
用いて前記音声信号系列を再生するようにしたこ
とを特徴とする音声符号化方法が得られる。また、本発明によれば、離散的音声信号系列を
入力し前記音声信号系列からピツチの微細構造を
表わすピツチパラメータと短時間スペクトル包絡
を表わすスペクトルパラメータとを抽出し符号化
するパラメータ計算回路と、前記パラメータ計算
回路の出力系列を入力し前記音声信号系列のピツ
チ構造を含んだ短時間スペクトルに応じたインパ
ルス応答系列の自己相関々数を計算する自己相関
関数計算回路と、前記音声信号系列と前記パラメ
ータ計算回路の出力系列を入力し前記音声信号系
列と前記短時間スペクトルに応じたインパルス応
答系列とで表わされる相互相関々数を計算する相
互相関々数計算回路と、前記自己相関々数計算回
路の出力系列と前記相互相関々数計算回路の出力
系列とを入力し前記音声信号系列に対する駆動音
源信号系列を求めて符号化する駆動音源信号系列
計算回路と、前記パラメータ計算回路の出力符号
系列と前記駆動音源信号系列計算回路の出力符号
系列とを組み合わせて出力するマルチプレクサ回
路とを有するようにしたことを特徴とする音声符
号化装置が得られる。更に本発明によれば、離散的音声信号系列を入
力し前記音声信号系列からピツチの微細構造を表
わすピツチパラメータと短時間スペクトル包絡を
表わすスペクトルパラメータとを抽出し符号化す
るパラメータ計算回路と、前記パラメータ計算回
路の出力系列を入力し前記音声信号系列のピツチ
構造を含んだ短時間スペクトルに応じたインパル
ス応答系列の自己相関々数を計算する自己相関々
数計算回路と、前記音声信号系列と前記パラメー
タ計算回路の出力系列を入力し前記音声信号系列
と前記短時間スペクトルに応じたインパルス応答
系列とで表わされる相互相関々数を計算する相互
相関々数計算回路と、前記自己相関々数計算回路
の出力系列と前記相互相関々数計算回路の出力系
列とを入力し前記音声信号系列に対する駆動音源
信号系列を求めて符号化する駆動音源信号系列計
算回路、前記パラメータ計算回路の出力符号系列
と前記駆動音源信号系列計算回路の出力符号系列
とを組み合わせて出力するマルチプレクサ回路
と、前記組合わせることにより得られる符号系列が
入力され前記駆動音源信号系列を表わす符号系列
と前記ピツチパラメータを表わす符号系列と前記
スペクトルパラメータを表わす符号系列とを分離
するデマルチプレクサ回路と、分離して得られた
前記駆動音源信号系列を表わす符号系列を入力し
て復号する駆動音源復号回路と、分離して得られ
た前記ピツチパラメータを表わす符号系列と前記
スペクトルパラメータを表わす符号系列とを入力
し復号するパラメータ復号回路と、前記駆動音源
復号回路の出力系列と前記パラメータ復号回路の
出力とを用い音声信号系列を再生し出力する合成
フイルタ回路とを有することを特徴とする音声符
号化復号化装置が得られる。まず本発明による音源パルス計算アルゴリズム
を詳細に説明することにする。１フレーム内の任意の時刻ｎにおける音源パル
ス系列ｄ(n)を次式で表わす。
$$d(n) = \sum_{i=1}^{K} g_i\,\delta_{n,m_i} \tag{5}$$
ここで、δ_{n,m_i}はクロネツカーのデルタを表わ
し、ｎ＝m_iの場合に１で、ｎ≠m_iの場合は０で
ある。Ｋは１フレーム内にたてるパルス数を示
す。g_iはｉ番目のパルスの振幅を示し、m_iはｉ番
目のパルスの位置を示す。次に、合成フイルタとしてピツチの微細構造も
含めた音声信号のスペクトル構造を表わし得るフ
イルタを考える。このフイルタはピツチ予測フイ
ルタとスペクトル包絡予測フイルタとの縦続接続
で表わすことができる。ブロツク図を第４図に示
す。図において、１９１はピツチ予測フイルタを
示し、１９２はスペクトル包絡予測フイルタを示
している。ピツチ予測フイルタとしては、１次の
場合と高次の場合とが考えられるが、ここでは説
明の簡略化のために１次のピツチ予測フイルタを
用いた場合について考える。音源パルス列ｄ(n)に
よつて、ピツチ予測フイルタとスペクトル包絡予
測フイルタとの縦続接続からなる、合成フイルタ
を駆動して得られる再生信号ｘ〜(n)は、次式のよう
に書き表わすことができる。
$$\tilde{x}(n) = d(n) + \beta\,x_d(n - M_d) + \sum_{i=1}^{P} a_i\,\tilde{x}(n-i) \tag{6}$$
ここでβはピツチ予測フイルタのタツプ係数を
示し、M_dは入力信号のピツチ周期を示す。x_d(n)
はピツチ予測フイルタ出力信号を示す。また、Ｐ
はスペクトル予測器の包絡予測次数であり、a_i
（１≦ｉ≦Ｐ）は、スペクトル包絡予測器の予測
係数を示す。ピツチ予測器のタツプ係数β及びピ
ツチ周期M_dの算出法は種々知られているが、簡
便な方法としては、例えば入力音声信号の自己
相関々数列のピーク振幅及びその位置を抽出する
方法がよく知られている。この方法の詳細につい
ては、ビーエス・アタール（B.S.ATAL）、エ
ム・アール・シユレーダー（M.R.
SCHROEDER）氏によるベル・システム・テク
ニカル・ジヤーナル（BELL SYSTEM
TECHNICAL JOURNAL）誌、1970年10月号、
1973〜1986頁に掲載の「アダプテイブ・プリデイ
クテイブ・コーデイング・オブ・スピーチ・シグ
ナルズ」（“ADAPTIVE PREDICTIVE CODING
OF SPEECH SIGNALS”）と題した論文（文献
２）に詳細に説明されているのでここでは説明を
省略する。今、ピツチ予測フイルタとスペクトル包絡予測
フイルタとからなる合成フイルタのインパルス応
答をｈ(i)（０≦ｉ≦Ｍ−１；ここでＭはインパル
ス応答の継続サンプル数を示す。）とすると、再
生信号ｘ〜(n)は次式のようにも書くことができる。
$$\tilde{x}(n) = d(n) * h(n) \tag{7}$$
次に、入力音声信号ｘ(n)と再生信号ｘ〜(n)との１
フレーム内の重み付け２乗誤差Ｊは、次のように表
わすことができる。
$$J = \sum_{n=1}^{N} \left[ \{ x(n) - \tilde{x}(n) \} * w(n) \right]^2 \tag{8}$$
ここでｗ(n)は重み付け回路のインパルス応答で
あり、例えば第１図に示した従来方式の重み付け
回路と同一の特性とする。又、Ｎは例えば１フレ
ームのサンプル数を示す。 (8)式で示した重み付けられた２乗誤差Ｊを最小
化する音源パルス列を計算するためのアルゴリズ
ムを、次に導出する。まず(7)式を(8)式に代入して次式を得る。
$$J = \sum_{n=1}^{N} \left[ \{ x(n) - d(n) * h(n) \} * w(n) \right]^2 \tag{9}$$
ここで上式右辺の各項を次式のように表わし、
$$x_w(n) = x(n) * w(n) \tag{10}$$
$$h_w(n) = h(n) * w(n) \tag{11}$$
(5)式、(10)式、(11)式を(9)式に代入して次式を得る。
$$J = \sum_{n=1}^{N} \left[ x_w(n) - \sum_{i=1}^{K} g_i\,h_w(n - m_i) \right]^2 \tag{12}$$
(12)式を最小化する音源パルス系列は、(12)式を音
源パルス系列の振幅g_iで偏微分して０とおくこと
によつて得た次式から計算される。ここで、φ_xh（・）はx_w(n)とh_w(n)から計算した
相互相関々数列を、φ_hh（・）はh_w(n)から計算した
自己相関々数列をそれぞれ表わし、次式のように
書ける。尚、φ_hh（・）は音声信号処理の分野では
共分散数列と呼ばれることが多い。 (13)式によれば、音源パルス系列の振幅g_iは、そ
の位置m_iの関数となつており、位置m_iにパルス
をたてる場合に最適な振幅g_iを計算することがで
きる。また、音源パルスの位置m_iは、(13)式を(12)
式に代入して求めた２乗誤差
$$J = R_{xx}(0) - \sum_{i=1}^{K} g_i\,\phi_{xh}(-m_i) \tag{15}$$
を最小化する、つまり、右辺第２項を最大化する
位置を選べばよい。また、近似的な方法として
は、｜g_i｜を最大とするような位置を選んでもよ
い。(15)式でR_xx（０）は重み付け信号x_w(n)の電力
を示す。今、フレームの端の影響を無視すれば、(15)式で
示した共分散関数φ_hh（m_l，m_i）は、次式のよう
に時間差（｜m_l−m_i｜）に依存した自己相関々
数列R_hh（｜m_l−m_i｜）に等しいとおける。
$$\phi_{hh}(m_l, m_i) = R_{hh}(|m_l - m_i|) \tag{16}$$
ここでR_hh（・）は、次式のように表わせる。
$$R_{hh}(|m_l - m_i|) = \sum_{n=1}^{N - |m_l - m_i|} h_w(n - m_l)\,h_w(n - m_i), \quad (1 \le m_l, m_i \le N) \tag{17}$$
従つて(13)式は(16)、(17)式を用いて次式のように修正
される。 R_hh（・）の計算は、φ_hh（・，・）の計算に比べ約
１／Ｎの演算量ですむ。従つて、音源パルス系列
の計算に(18)式を用いることによつて(13)式と比較し
て演算量を１／Ｎに低減できる。しかしながら(18)
式に従つて音源パルス列を計算する場合に、相互
相関々数φ_xh（・）を計算するデータサンプル数が
パルスを伝送するフレームのサンプル数よりも大
きくないとフレームの端近傍のパルスに誤差を生
ずる。従つて、相互相関々数φ_xh（・）を計算する
データサンプル数を、フレームのサンプル数より
も大きく選んでおくことによつて、この問題は回
避できる。以上で音源パルス計算アルゴリズムの
導出及びその特徴に関する説明を終える。次に本発明による音源パルス計算アルゴリズム
を用いた音声符号化方式を、第５図を参照して詳
細に説明する。第５図ａは、本発明による音声符号化方式の送
信側の一実施例を示すブロツク図であり、第５図
ｂは受信側の一実施例を示すブロツク図である。
第５図ａにおいて、離散的な音声信号系列ｘ(n)は
入力端子１９５から入力され、あらかじめ定めら
れたサンプル数だけ区切られてバツフアメモリ回
路３４０に蓄積される。ここで入力音声信号系列
を区切る際に、あらかじめ定められたサンプル数
だけの重なりをもつて区切るようにする。これは
前述のように、音源パルス計算に用いるデータサ
ンプル数をフレームのサンプル数よりも大きくす
るためである。次に、Ｋパラメータ計算回路２８
０は、バツフアメモリ回路３４０に蓄積されてい
る音声信号系列のうち、あらかじめ定められた長
さの系列を入力し、これを用いてあらかじめ定め
られた次数Ｐ個のLPCパラメータを、衆知の方
法（例えば線形予測分析法）に従い計算する。
LPCパラメータとしては、種々のものが考えら
れるが、以下ではＫパラメータK_i（１≦ｉ≦Ｐ）
を用いるものとして説明を進める。Ｋパラメータ
はパーコール係数と同一のパラメータである。Ｋ
パラメータK_iは、Ｋパラメータ符号化回路２００
に出力される。Ｋパラメータ符号化回路２００
は、例えばあらかじめ定められた量子化ビツト数
に基づいて、K_iを符号化し、符号l_kiをゲート回路
４６０へ出力する。また、Ｋパラメータ符号化回
路２００は、l_kiを復号化して得たk_i′をインパル
ス応答計算回路２１０と合成フイルタ回路４００
へ出力する。次に、ピツチ分析回路３７０は、バツフアメモ
リ回路３４０の出力系列を入力し、例えば前述の
文献(2)に記載の方法に従つて、ピツチ周期M_d及
びピツチゲインβを計算し、ピツチ符号化回路３
８０へ出力する。ピツチ符号化回路３８０は、あ
らかじめ定められたビツト数でピツチ周期M_d及
びピツチゲインβを符号化して得たl_d及びl_βをゲ
ート回路４６０へ出力する。また、ピツチ符号化
回路３８０は、l_d及びl_βを復号化して得たM_d′及
びβ′をインパルス応答計算回路２１０と合成フイル
タ回路４００へ出力する。次に、インパルス応答計算回路２１０は、Ｋパ
ラメータ復号値K_i′をＫパラメータ符号化回路２
００から入力し、また、ピツチ周期及びピツチゲ
インの復号化値M_d′及びβ′をピツチ符号化回路３
８０から入力する。インパルス応答計算回路２１
０は、２種類のインパルス応答を計算する。まず
初めに、Ｋパラメータ復号値K_i′のみを用いて第
４図に示した合成フイルタのうち、スペクトル包
絡予測フイルタ１９２のみの場合の重み付けられ
たインパルス応答h_1wを、あらかじめ定められた
サンプル数だけ計算し、求まつたh_1w(n)を自己相
関々数計算回路３６０と、相互相関々数計算回路
３５０とへ出力する。続いてＫパラメータ復号値
K_i′とピツチ情報（ピツチゲインβ′及びピツチ周
期M_d′）とを用いて、ピツチ予測フイルタとスペ
クトル包絡予測フイルタからなる合成フイルタの
重み付けられたインパルス応答h_2w(n)をあらかじ
め定められたサンプル数だけ計算し、h_1w(n)によ
る相関演算処理が終了した後に、自己相関々数計
算回路３６０と相互相関々数計算回路３５０とへ
出力する。次に、自己相関々数計算回路３６０は、インパ
ルス応答h_1w(n)を入力し、前述の(17)式に従つてh_1w
(n)の自己相関々数列R_hh1（｜m_l−m_i｜）を計算
し、これをパルス計算回路３９０へ出力する。続
いて、前述の計算が終了した後に、インパルス応
答h_2w(n)を入力し、h_2w(n)の自己相関々数列R_hh2
（｜m_l−m_i｜）を計算し、パルス計算回路３９０
へ出力する。次に、減算器２８５は、バツフアメモリ回路３
４０に蓄積された音声信号系列ｘ(n)を入力し、Ｘ
(n)から合成フイルタ回路４００の出力系列を１フ
レームサンプル分減算し、減算結果を重み付け回
路４１０へ出力する。重み付け回路４１０は、Ｋ
パラメータ符号化回路２００から、Ｋパラメータ
復号値K_i′を入力し、重み付け関数Ｗ(n)を、その
ｚ変換値を(3)式とするように計算する。これは他
の周波数重み付け方法を用いて計算してもよい。
更に、重み付け回路４１０は、減算器２８５の減
算結果を入力し、これと重み付け関数ｗ(n)とのた
たみこみ演算を行ない、得られたx_w(n)を相互相
関々数計算回路３５０へ出力する。相互相関々数計算回路３５０は、インパルス応
答計算回路２１０からインパルス応答h_1w(n)を入
力し、h_1w(n)と前述のx_w(n)とを用いて、第１の相
互相関々数φ_xh1（−m_i）（１≦m_i≦Ｎ）を計算し、
これをパルス計算回路３９０へ出力する。続い
て、インパルス応答h_2w(n)を入力し、h_2w(n)とx_w(n)
とを用いて、第２の相互相関々数φ_xh2（−m_i）（１
≦m_i≦Ｎ）を計算し、これをパルス計算回路３
９０へ出力する。パルス計算回路３９０は、相互相関々数と自己
相関々数とを同期して入力する。つまり、あらかじ
め定められたフレーム周期内で初めに、第１の相
互相関々数φ_xh1（−m_i）と自己相関々数R_hh1（｜m_l
−m_i｜）とを同期して入力し、前述の音源パル
ス計算式(18)を用いて、第１の音源パルス列の振幅
g_iと位置m_iとを、あらかじめ定められた個数だけ
計算する。次に、前述の処理が終了した後に、第
２の相互相関々数φ_xh2（−m_i）と自己相関々数
R_hh2（｜m_l−m_i｜）とを同期して入力し、前述の
(18)式に従つて、第２の音源パルス列の振幅と位置
とを計算する。更に、パルス計算回路３９０は、
入力信号と再生信号との間の誤差信号のパワー
を、第１の音源パルス列と第２の音源パルス列の
各々に対して、次式に従つて計算する。
$$J = \sum_{n=1}^{N} \left[ \{ x(n) - \tilde{x}(n) \} * w(n) \right]^2 = R_{xx}(0) - \sum_{i=1}^{K} g_i\,\phi_{xh}(-m_i) \tag{19}$$
上式は、(9)式に(18)式を代入することによつて求
められる。ここで、R_xx（０）は重み付け回路４
１０の出力値x_w(n)のパワーを示す。第１の音源
パルス列と第２の音源パルス列について計算され
た誤差信号パワーは、判別回路４３０へ出力され
る。尚、(19)式のかわりに、音源パルス列を計算し
た結果、得られる相互相関々数の残差（相互相
関々数からパルスと自己相関々数によつて求まる
値をパルス毎に減算して求めた最終値）を用い
て、誤差信号のパワーを次式に従い近似的に計算
することもできる。
$$J = \sum_{m=1}^{N} \phi_{xh}^{2}(-m) / R_{hh}(0) \tag{20}$$
次に、判別回路４３０は、第１の音源パルス列
と第２の音源パルス列のうちで、入力信号をより
忠実に表わし得る音源パルス列を選択する目的を
持つている。従つて判別回路４３０は、パルス計
算回路３９０から入力した各々の音源パルス列に
対する誤差信号のパワーを比較する。もし第１の
音源パルスに対する値が第２の音源パルスに対す
る値よりも小さい場合は、第１の音源パルス列を
用いた方が第２の音源パルス列を用いるよりも特
性がよいと判断する。第１の音源パルス列は、ピ
ツチ情報を用いない（つまりβ′＝０，M_d′＝０と
した）第１のインパルス応答から計算したもので
あるため、判別回路４３０は、切り換え回路４４
０に対して、第１の音源パルス列を符号化回路４
７０へ出力させるようにする。また、判別回路４
３０は、ゲート回路４６０に対して、Ｋパラメー
タ符号l_kiをマルチプレクサ４５０へ出力させる。
また、ピツチ情報を表わす符号（l_d及びl_β）は、
ゲート回路４６０において、あらかじめ定められ
た符号がセツトされ、マルチプレクサ４５０へ出
力される。逆の場合には、第２の音源パルス列を
用いた方が第１の音源パルス列を用いるよりも特
性がよいと判断する。第２の音源パルス列は、ス
ペクトル包絡情報とピツチ情報とを用いた第２の
インパルス応答から計算したものであるため、判
別回路４３０は、切り換え回路４４０に対し、第
２の音源パルス列を符号化回路４７０へ出力させ
るようにする。また、判別回路４３０は、ゲート
回路４６０に対して、Ｋパラメータ符号l_ki及びピ
ツチ情報を表わす符号l_d，l_βをマルチプレクサ４
５０へ出力させる。次に、符号化回路４７０は、切り換え回路４４
０から、音源パルス列の振幅及び位置を入力し、
これらを後述の正規化係数を用いて符号化する。
また正規化係数にも符号化を施し、正規化係数、
音源パルス列の振幅、位置を表わす符号を、マル
チプレクサ４５０へ出力する。また、音源パルス
列の振幅、位置の復号化値g_i′，m_i′を音源パルス
発生回路４２０へ出力する。ここで符号化回路４
７０の符号化の方法は種々考えられる。一つは、
パルス列の振幅、位置を別々に符号化する方法で
あり、また、一つは、振幅、位置を一緒にして符
号化する方法である。前者の方法について一例を
説明する。まず、音源パルスの振幅の符号化法と
しては、フレーム内のパルス系列の振幅の最大値
を正規化係数として、この値を用いて各パルスの
振幅を正規化した後に、量子化、符号化する方法
が考えられる。また、他の方法としては、振幅の
確率分布を正規型と仮定して、正規型の場合の最
適量子化器を用いる方法が考えられる。これにつ
いては、ジエー・マツクス（Ｊ・MAX）氏によ
るアイ・アール・イー・トランザクシヨンズ・オ
ン・インフオメーシヨン・セオリー（IRE
TRANSACTIONS ON INFORMATION
THEORY）の1960年３月号、７〜12頁に掲載の
「クオンタイジング・フオー・ミニマム・デイス
トーシヨン」（“QUANTIZING FOR
MINIMUM DISTORTION”）と題した論文（文
献３）等に詳述されているので、ここでは説明を
省略する。次に、パルス位置の符号化についても
種々の方法が考えられる。例えば、フアクシミリ
信号符号化の分野でよく知られているランレング
ス符号等を用いてもよい。これは符号“０”また
は“１”の続く長さをあらかじめ定められた符号
系列を用いて表わすものである。また、正規化係
数の符号化には、従来よく知られている対数圧縮
符号化等を用いることができる。尚、パルス系列の符号化に関しては、ここで説
明した符号化方法に限らず、衆知の最良の方法を
用いることができることは勿論である。第５図に戻つて、パルス系列発生回路４２０は
入力したg_i′，m_i′を用いて、m_i′の位置に振幅
g_i′をもつ音源パルス系列を１フレーム長Ｎにわ
たつて計算し、これを駆動信号として、合成フイ
ルタ回路４００へ出力する。合成フイルタ回路４
００は、Ｋパラメータ符号化回路２００から、Ｋ
パラメータ復号値K_i′を入力する。また、ピツチ
符号化回路３８０から、ピツチ情報（ピツチ周期
復号値M_d′及びピツチゲイン復号値β′）を入力す
る。Ｋパラメータ復号値K_i′を予測パラメータa_i
（１≦ｉ≦N_p）に、衆知の方法を用いて変換して
おく。また、判別回路４３０から、判別情報を入
力する。前述の第１の音源パルス列を用いる場合
には、ピツチ情報は０とする。次に合成フイルタ
回路４００は、音源パルス発生回路４２０から、
１フレーム分の駆動音源信号を入力して、この１
フレーム分の信号に、更に１フレーム分、零を付
加し、この２フレームの信号に対する応答信号系
列ｘ〜(n)を求める。次式にこのことを示す。
$$\tilde{x}(n) = d(n) + \beta\,x_d(n - M_d) + \sum_{i=1}^{P} a_i\,\tilde{x}(n-i) \tag{21}$$
ここで駆動音源信号ｄ(n)は、１≦ｎ≦Ｎでは、
パルス発生回路４２０から出力されたパルス系列
を表わし、Ｎ＋１≦ｎ≦2Nでは、全て０の系列
を表わす。また、Ｎ＋１≦ｎ≦2N時刻において、
(21)式で用いるa_i、M_d、βは現フレーム時刻に求
めた値を使つてもよいし、次のフレーム時刻で求
まる値を使つてもよい。(21)式に従つて求めたｘ〜
(n)のうち、第２フレーム目のｘ〜(n)（Ｎ＋１≦ｎ≦
2N）の値が減算器２８５へ出力される。次に、マルチプレクサ４５０は、符号化回路４
７０の出力符号と、ゲート回路４６０の出力符号
とを入力し、これらを組み合わせて、送信側出力
端子４８０から通信路へ出力する。以上で本発明
による音声符号化方式の符号器側の説明を終え
る。次に、本発明による音声符号化方式の受信側に
ついて第５図ｂを参照して説明する。デマルチプレクサ５００は、受信側入力端子４
９０から、符号を入力する。デマルチプレクサ５
００は、入力符号のうち、Ｋパラメータを表わす
符号系列とピツチ情報を表わす符号系列と、音源
パルス列を表わす符号系列とを分離し、Ｋパラメ
ータを表わす符号系列をＫパラメータ復号回路５
２０へ出力し、ピツチ情報を表わす符号系列を、
ピツチ復号回路５１０へ出力し、音源パルス列を
表わす符号系列を、音源パルス復号回路５３０へ
出力する。Ｋパラメータ復号回路５２０及びピツ
チ復号回路５１０は、入力した符号系列を復号
し、合成フイルタ回路５５０へ出力する。音源パルス復号回路５３０は、音源パルス列を
表わす符号系列を入力し、復号化して音源パルス
列の振幅、位置情報としてパルス発生回路５４０
へ出力する。パルス発生回路５４０は、音源パル
ス列の振幅、位置情報を入力し、音源パルス列を
発生させ、これを合成フイルタ回路５５０へ出力
する。合成フイルタ回路５５０は、第４図に示したよ
うに、ピツチ予測フイルタとスペクトル予測フイ
ルタとの縦続接続になつている。合成フイルタ回
路５５０は、ピツチ復号回路５１０及びＫパラメ
ータ復号回路５２０から、ピツチ情報及びＫパラ
メータ復号値を入力する。もしピツチ情報があら
かじめ定められた符号であつた場合は、スペクト
ル予測フイルタのみを用いて（つまり、ピツチ情
報は０として）信号を再生する。合成フイルタ回路
５５０は、パルス発生回路５４０の出力パルス列
を駆動源として信号ｘ〜(n)を再生し、受信側出力端
子５６０から出力する。以上で本発明による復号
器側の説明を終える。本発明によれば、音源パルス系列の計算を(18)式
に従つているので、文献１、の従来方式に見られ
たように、パルスにより合成フイルタを駆動し、
再生信号を求め、原信号との誤差及び２乗誤差を
フイードバツクしてパルスを調整するという径路
がなく、またその処理をくり返す必要もないの
で、演算量を大幅に減らすことが可能で、良好な
再生音質が得られるという大きな効果がある。更
に、(18)式の演算において、φ_xh（−m_i）とR_hh（｜
m_l−m_i｜）（１≦｜m_l−m_i｜≦Ｎ）の値は、１フ
レーム毎に、前もつて計算しておくことによつ
て、(18)式の計算は音源パルスを求める毎に相関演
算を行なう必要はなくなり、更に演算量を減らす
ことができるという効果がある。また、音源パル
ス列を探索する他の従来方式と比べても、本発明
による方法は、同一の伝送情報量の場合に、より
良好な品質を得ることができるという効果があ
る。また本発明によれば、入力音声信号の周期性つ
まり音源パルス系列の周期性を利用し、入力信号
のピツチ構造も含めたスペクトル構造を再現でき
るパラメータを用いて音源パルス系列を計算して
おり、音源パルス上で、ピツチ周期だけ離れた音
源パルスを予測することができるので、従来方式
と比較して、同一の特性を得るのに必要な音源パ
ルス数をきわめて削減できるという効果がある。
従つて伝送情報量の低減にきわめて有効である。
このことは、従来方式と同一の伝送情報量とした
場合に、再生品質が向上するという効果にもな
る。特に、従来方式において問題であつたピツチ
周波数の高い女性話者に対しては、10Kbps以下
の伝送情報量でも良好な再生品質を得ることがで
きる。また本発明によれば、符号器側において、スペ
クトルパラメータのみを用いて計算した音源パル
ス列とピツチパラメータをも用いて計算した音源
パルス列とを比較し、入力信号をより忠実に再現
できるパルス列を伝送し、これを受信側での再生
に用いる構成としているので、入力音声信号の過
渡部で周期性がないフレームや、ピツチパラメー
タの抽出誤りに起因する劣化を防止することがで
きるという効果がある。尚、より簡便な方法とし
て、ピツチパラメータのピツチゲインβを用いて
判別するような構成とすることも考えられる。例
えば、ピツチゲインβを計算した後に、βをあら
かじめ定められたしきい値と比較して、βがしき
い値以下であればβを強制的に０とする。この場
合には、スペクトルパラメータのみを用いて音源
パルスを計算することになる。このような構成と
することによつて、音源パルス系列を比較判別す
るための判別回路及び前述の(19)式あるいは(20)式の
演算が不要となり、演算量を低減することができ
る。また、(18)式に示した音源パルス計算法において
は、準最適なパルスを一つずつ計算していた。こ
の方法においては、次のパルスを計算する際に、
これより過去に求まつた複数個のパルスの振幅を
再調整するような方法を用いることもできる。こ
の方法によれば、各パルスに独立性が成立しない
場合、つまり、各パルスの位置が非常に接近して
求まる場合に効果的である。更に、他の音源パル
ス計算法として、種々のものが考えられる。例え
ば、１フレーム内の全てのパルスが求まつた後
に、全てのパルスの振幅を再調整するような方法
を用いることもできる。更に、本発明によれば、フレーム境界での波形
の不連続に起因したフレーム境界近傍での再生信
号の劣化がほとんどないという大きな効果があ
る。この効果は、符号器側において、現フレーム
の音源パルス系列を計算する際に、１フレーム過
去の音源パルス系列によつて合成フイルタを駆動
して得られた応答信号系列を、現フレームにまで
伸ばして求め、これを入力音声信号系列から減算
した結果に対して現フレームの音源パルス系列を
計算するという構成にしたことに起因している。
また、本実施例ではフレーム長を一定とした場合
について説明したが、フレーム長を時間的に変化
させる可変長フレームとしても勿論同様の効果は
得られる。また、１フレーム過去の音源パルス系列に由来
した応答信号系列の求め方として、本発明の実施
例の構成によれば、応答信号計算回路のフイルタ
パラメータとしては、１フレーム過去に入力され
たピツチ情報とＫパラメータ値をそのまま用いた
が、過去のフレームの音源パルスに由来した応答
信号系列を計算する際には、現フレーム時刻に入
力されたピツチ情報とＫパラメータ値を用いる構
成としてもよい。また、本発明によれば、送信側の合成フイルタ
回路４００において、１フレーム過去の音源パル
スに由来した応答信号系列を求める際に、判別回
路４３０の判別結果に従つてピツチ情報を用いる
か用いないかを切り換えていたが、ピツチ情報は
常に０として応答信号系列を計算するような構成
としてもよい。また、符号器側において、合成フイルタ回路４
００にて過去のフレームの音源パルスに由来した
応答信号系列を計算する場合に、Ｋパラメータの
みを用いて計算した応答信号系列と、Ｋパラメー
タとピツチ情報とを用いて計算した応答信号系列
との２種の応答信号系列を計算しておき、次のフ
レームで、どちらの応答信号系列を用いた方がよ
いか、例えば入力信号と各々の応答信号系列との
重み付けされた誤差のパワーを計算し、この誤差
パワーの小さい方を選択するという構成にすれ
ば、特性はより改善される。但し、このような構
成とした場合には、復号器側で２種の応答信号系
列のうち１種を選択するために必要な選択情報
を、フレーム毎に１ビツト余分に伝送しなくてはな
らない。この場合の伝送情報量の増加は、フレー
ム長を20ｍsecとすると、50ビツト／秒となり、
非常に少ない量で済む。また、本発明の実施例の構成によれば、送信側
の重み付け回路４１０において、従来方式に用い
られている(3)式に従つた重み付けを行なつた。こ
の重み付けはスペクトル包絡に関する重み付けで
あり、ピツチ構造を利用した重み付けは含まれて
いない。従つて、次式に示すスペクトル包絡とピ
ツチ構造の両方を利用した重み付け関数W_p(n)を
用いることによつて、より効果的な重み付けがさ
れ、より効果的な重み付けができる。ここでW_p(z)は、重み付け関数wp(n)のＺ変換表
現であり、ｒ及びr′は重み付け係数であり、０＜
ｒ，r′＜１の値が選ばれる。また本発明によれば、符号器側の判別回路４３
０において、２種の音源パルス列のうち、どちら
のパルス列を用いれば特性が良好かを判別する場
合に、(19)、(20)式で求めた重み付けられた誤差信号
のパワーを判断の基準にした。判断基準として
は、他の最良な方法を用いることができる。例え
ば、ピツチ予測を行なつた場合の予測ゲインを計
算し、それらの値とあらかじめ定められたしきい
値とを比較して判断基準とするような構成にして
もよい。また本発明においては、短時間スペクトル構造
を表わすインパルス応答系列の自己相関々数列を
計算する際に、インパルス応答計算回路２１０に
よつて、Ｋパラメータ復号値及びピツチ情報とを
用いて、インパルス応答系列を計算したのちに、
このインパルス応答系列を用いて自己相関々数列
を計算していた。デイジタル信号処理の分野でよ
く知られているように、インパルス応答系列の自
己相関々数列は、短時間スペクトルのパワスペク
トルと対応関係にある。従つて、Ｋパラメータ復
号値及びピツチ情報を用いて、短時間スペクトル
のパワスペクトルを求め自己相関々数列を計算す
るような構成としてもよい。一方、音声信号系列
と短時間スペクトル包絡を表わすインパルス応答
系列との相互相関々数列を計算する際に、本実施
例の構成では、重み付け回路４１０の出力値であ
る信号系列x_w(n)と、インパルス応答計算回路２
１０で求めたインパルス応答系列とを用いて、相
互相関々数計算回路３５０にて相互相関々数を計
算していた。よく知られているように、相互相
関々数は、クロス・パワスペクトルと対応関係に
ある。この関係を用いて音声信号系列とＫパラメ
ータ復号値及びピツチ情報とを用いてクロス・パ
ワスペクトルを求めて相互相関々数列を計算する
ような構成としてもよい。尚、パワスペクトルと
自己相関々数列との対応関係、及びクロス・パワ
スペクトルと相互相関々数列との対応関係につい
ては、エー・ブイ・オツペンハイム（A.V.
OPPENHEIM）氏らによる「デイジタル信号処
理」（“DIGITAL SIGNAL PROCESSING”）
と題した単行本（文献４）の第８章にて詳細に説
明されているので、ここでは説明を省略する。また、前述の本発明の実施例においては、１フ
レーム内の音源パルス系列の符号化は、パルス系
列が全て求まつた後に、第５図の符号化回路４７
０によつて符号化を施したが、符号化をパルス系
列の計算に含めて、パルスを１つ計算する毎に、
符号化を行ない、次のパルスを計算するという構
成にしてもよい。このような構成をとることによ
つて、符号化の歪をも含めた誤差を最小とするよ
うなパルス系列が求まるので、更に品質を向上さ
せることができる。また、符号化回路４７０における符号割り当て
に関しては、本発明の構成では等長符号割り当て
よりも可変長符号割り当てを行なつた方が符号化
効率がかなり向上する。なぜならば、ピツチ情報
を用いて音源パルス列を求めることによつて、音
源パルス列の振幅分布により一層のかたよりが生
ずるためである。また、以上説明した実施例においては、短時間
音声信号系列のスペクトル包絡を表わすパラメー
タとしてはＫパラメータを用いたが、これはよく
知られている他のパラメータ（例えばLSPパラメ
ータ等）を用いてもよい。更に、前述の(8)式にお
いて重み付け関数ｗ(n)はなくてもよい。また、本実施例においては、フレーム境界での
再生波形の不連続に起因する品質劣化を防ぐため
に、現フレームより１フレーム過去の音源パルス
に由来した応答信号系列を計算し、現フレームの
入力音声からこの応答信号を減算した後に、駆動
音源パルスを計算したが、第６図に示すように、
音源パルス計算に用いるデータとして、パルスを
伝送するフレームのデータ及びそれよりも過去の
データを含むような構成にしてもよい。図６で、
N_Tはパルスを伝送するフレームを示し、Ｎは音
源パルスを計算するフレームを示す。このような
構成とすることによつて、１フレーム過去の音源
パルスに由来した応答信号系列を計算する必要が
なくなるという効果がある。

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a low-bit-rate waveform coding method for speech signals, and in particular to a coding method and apparatus that hold the transmitted information rate to 10 kbit/s or less.

A well-known and effective way to encode a speech signal at about 10 kbit/s or less is to search, at short time intervals, for the driving excitation signal sequence of the speech signal under the condition that the error between the input signal and the signal reproduced from that sequence is minimized.
Depending on the search procedure, these methods are called tree coding or vector quantization. In addition to these, a method has recently been proposed in which a sequence of several pulses representing the driving excitation signal is obtained successively, at short time intervals, on the encoder side by an analysis-by-synthesis (A-b-S) technique. The present invention relates to this method. The method is described in detail in the paper by B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proceedings of ICASSP, 1982, pp. 614-617 (Reference 1), so only a brief explanation is given here.

FIG. 1 is a block diagram of the encoder-side processing of the conventional method described in Reference 1. In the figure, 100 denotes the encoder input terminal, to which the A/D-converted speech signal sequence x(n) is input. 110 is a buffer memory circuit that stores one frame of the speech signal sequence (for example, 80 samples for 8 kHz sampling and a frame length of 10 msec). The output of 110 is supplied to the subtractor 120 and to the K-parameter calculation circuit 180. Reference 1 uses the term "reflection coefficients" instead of K parameters, but these are the same parameters. Using the output of 110, the K-parameter calculation circuit 180 computes, by the covariance method, 16 K parameters K_i (1 ≤ i ≤ 16) representing the speech spectrum of each frame and outputs them to the synthesis filter 130. 140 is an excitation pulse generation circuit that generates a predetermined number of pulses per frame. This pulse sequence is denoted d(n). FIG. 2 shows an example of an excitation pulse sequence generated by circuit 140; the horizontal axis is discrete time and the vertical axis is amplitude. The example shows eight pulses generated within one frame. The pulse sequence d(n) generated by circuit 140 drives the synthesis filter 130. The synthesis filter 130 receives d(n), computes the reproduced signal x~(n) corresponding to the speech signal x(n), and outputs it to the subtractor 120. The synthesis filter 130 takes the K parameters K_i, converts them into prediction parameters a_i (1 ≤ i ≤ 16), and uses the a_i to compute the reproduced signal x~(n), which can be expressed in terms of d(n) and a_i as

$$\tilde{x}(n) = d(n) + \sum_{i=1}^{P} a_i\,\tilde{x}(n-i) \tag{1}$$

where P is the order of the synthesis filter; here P = 16. The subtractor 120 computes the difference e(n) between the original signal x(n) and the reproduced signal x~(n) and outputs it to the weighting circuit 190. Circuit 190 receives e(n) and, using a weighting function w(n), computes the weighted error e_w(n) as

$$e_w(n) = w(n) * e(n) \tag{2}$$

where the symbol "*" denotes convolution. The weighting function w(n) performs weighting on the frequency axis; let W(Z) denote its Z-transform.
Using the prediction parameters a_i of the synthesis filter, W(Z) is given by

$$W(Z) = \frac{1 - \sum_{i=1}^{P} a_i Z^{-i}}{1 - \sum_{i=1}^{P} a_i r^{i} Z^{-i}} \tag{3}$$

where r is a constant with 0 ≤ r ≤ 1 that determines the frequency characteristic of W(Z). That is, when r = 1, W(Z) = 1 and the frequency characteristic is flat.
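As a rough illustration of equation (3), the coefficients of W(Z) can be formed directly from the prediction parameters and applied as a recursive filter. The following is a minimal Python sketch under stated assumptions, not the circuit of Reference 1: the function names, the two-coefficient predictor `a`, and the direct-form filter loop are all hypothetical choices made for illustration.

```python
def weighting_filter_coeffs(a, r):
    """Coefficients of W(Z) = (1 - sum a_i Z^-i) / (1 - sum a_i r^i Z^-i).

    a : prediction parameters a_1..a_P (list index 0 holds a_1)
    r : weighting constant, 0 <= r <= 1
    """
    num = [1.0] + [-ai for ai in a]
    den = [1.0] + [-ai * r ** (i + 1) for i, ai in enumerate(a)]
    return num, den


def iir_filter(num, den, x):
    """Direct-form filter with den[0] == 1:
    y(n) = sum_k num[k] x(n-k) - sum_{k>=1} den[k] y(n-k)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


# Illustrative second-order predictor (values are assumptions, not from the source).
a = [1.2, -0.5]
impulse = [1.0] + [0.0] * 7

num1, den1 = weighting_filter_coeffs(a, 1.0)
w_flat = iir_filter(num1, den1, impulse)   # impulse response of W(Z) for r = 1

num08, den08 = weighting_filter_coeffs(a, 0.8)
w_08 = iir_filter(num08, den08, impulse)   # impulse response of W(Z) for r = 0.8
```

With r = 1 the numerator and denominator coincide, so the impulse response collapses to a unit impulse, which matches the flat characteristic stated for that case.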
On the other hand, when r = 0, W(Z) is the inverse of the frequency characteristic of the synthesis filter. The characteristic of W(Z) can therefore be changed through the value of r. W(Z) is made to depend on the frequency characteristic of the synthesis filter, as in equation (3), in order to exploit the auditory masking effect: where the power of the input speech spectrum is large (for example, near a formant), an error relative to the spectrum of the reproduced signal is hard to hear even if it is somewhat large. FIG. 3 shows an example of the input speech spectrum in one frame and the frequency characteristic of W(Z), with r = 0.8. The horizontal axis is frequency (up to 4 kHz) and the vertical axis is logarithmic amplitude (up to 60 dB). The upper curve is the spectrum of the speech signal and the lower curve is the frequency characteristic of the weighting function.

Returning to FIG. 1, the weighted error e_w(n) is fed back to the error minimization circuit 150. Circuit 150 stores one frame of e_w(n) values and uses them to compute the weighted squared error ε as

$$\varepsilon = \sum_{n=1}^{N} e_w(n)^2 \tag{4}$$

where N is the number of samples over which the squared error is computed. In the method of Reference 1 this interval is set to 5 msec.
This corresponds to N = 40 for 8 kHz sampling. Next, the error minimization circuit 150 supplies pulse position and amplitude information to the excitation pulse generation circuit 140 so as to reduce the squared error ε computed by equation (4). Circuit 140 generates an excitation pulse sequence based on this information. The synthesis filter 130 computes the reproduced signal x~(n) with this pulse sequence as its driving source. The subtractor 120 then subtracts the newly obtained reproduced signal x~(n) from the previously computed error e(n) between the original and reproduced signals, and takes the result as the new error e(n). The weighting circuit 190 receives e(n), computes the weighted error e_w(n), and feeds it back to the error minimization circuit 150. Circuit 150 computes the squared error again and adjusts the amplitudes and positions of the excitation pulses so as to reduce it. This series of steps, from generating the excitation pulse sequence to adjusting it by error minimization, is repeated until the number of pulses reaches a predetermined number, at which point the excitation pulse sequence is determined. This concludes the description of the conventional method.

In this method the information to be transmitted consists of the K parameters K_i (1 ≤ i ≤ 16) of the synthesis filter and the positions and amplitudes of the excitation pulses, and any transmission rate can be realized through the number of pulses placed in one frame. Moreover, for transmission rates of 10 kbps or less it yields good reproduced speech quality and is considered one of the effective methods.

However, this conventional method has the drawback that the amount of computation is very large. This is because, when the position and amplitude of a pulse are computed, the error and squared error between the original signal and the signal reproduced from that pulse are calculated and fed back, and the pulse position and amplitude are adjusted to reduce the squared error. It is also because this processing is repeated until the number of pulses reaches a predetermined value.

Furthermore, at bit rates of about 10 kbps or less, the conventional method has the drawback that reproduction quality degrades for input signals with a high pitch frequency, for example a female voice. When the pitch frequency is high, the frame used for pulse calculation contains many pitch waveforms, and reproducing them well requires more excitation pulses than for a speaker with a low pitch frequency. For this reason it has been difficult to lower the transmission bit rate substantially, that is, to substantially reduce the number of pulses per frame.

An object of the present invention is to provide a high-quality speech coding method, and an apparatus for it, that requires a comparatively small amount of computation and is applicable to bit rates of 10 kbps or less.

According to the present invention, there is obtained a speech coding method characterized in that, on the transmitting side, a discrete speech signal sequence is input; parameters representing the short-time spectrum including the pitch fine structure are extracted and encoded; the autocorrelation function of the impulse response sequence corresponding to the short-time spectrum is calculated from these parameters; the cross-correlation function between the speech signal sequence and the impulse response sequence is calculated; a driving excitation signal sequence for the speech signal sequence is obtained from the autocorrelation and cross-correlation functions and encoded; and the code representing the driving excitation signal sequence is combined with the code representing the parameters and output; and in that, on the receiving side, the code sequence is input; the code sequence representing the driving excitation signal sequence and the code sequence of the parameters representing the short-time spectrum including the pitch fine structure are separated and decoded; and the speech signal sequence is reproduced using the decoded driving excitation signal sequence and the decoded parameters.

Further, according to the present invention, there is obtained a speech coding apparatus characterized by comprising: a parameter calculation circuit that receives a discrete speech signal sequence and extracts and encodes, from it, pitch parameters representing the pitch fine structure and spectral parameters representing the short-time spectral envelope; an autocorrelation function calculation circuit that receives the output sequence of the parameter calculation circuit and calculates the autocorrelation function of the impulse response sequence corresponding to the short-time spectrum including the pitch structure of the speech signal sequence; a cross-correlation function calculation circuit that receives the speech signal sequence and the output sequence of the parameter calculation circuit and calculates the cross-correlation function between the speech signal sequence and the impulse response sequence corresponding to the short-time spectrum; a driving excitation signal sequence calculation circuit that receives the output sequences of the autocorrelation and cross-correlation function calculation circuits and obtains and encodes a driving excitation signal sequence for the speech signal sequence; and a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit and the output code sequence of the driving excitation signal sequence calculation circuit.
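The transmit-side steps summarized above (a weighted impulse response, its autocorrelation, the cross-correlation with the weighted speech, and pulses determined from the two) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the passage above says only that the excitation is obtained from the autocorrelation and cross-correlation functions, so the greedy update used here (place a pulse where the cross-correlation magnitude is largest, set its amplitude to that value divided by the zero-lag autocorrelation, then subtract its contribution) is an assumed correlation-domain procedure, and all function names, 0-based positions, and test data are hypothetical.

```python
def cross_correlation(x_w, h_w, N):
    """phi[m] = sum_n x_w[n] * h_w[n - m] for candidate positions m = 0..N-1."""
    return [
        sum(x_w[n] * h_w[n - m] for n in range(m, min(len(x_w), m + len(h_w))))
        for m in range(N)
    ]


def autocorrelation(h_w, N):
    """R[k] = sum_n h_w[n] * h_w[n + k] for lags k = 0..N-1 (zero beyond len(h_w))."""
    return [
        sum((h_w[n] * h_w[n + k] for n in range(len(h_w) - k)), 0.0)
        for k in range(N)
    ]


def find_pulses(x_w, h_w, N, K):
    """Greedy, correlation-domain pulse search (assumed procedure, see lead-in).

    Returns K (position, amplitude) pairs without re-synthesizing the signal.
    """
    phi = cross_correlation(x_w, h_w, N)
    R = autocorrelation(h_w, N)
    pulses = []
    for _ in range(K):
        m = max(range(N), key=lambda j: abs(phi[j]))   # best position
        g = phi[m] / R[0]                              # amplitude at that position
        pulses.append((m, g))
        for j in range(N):                             # remove this pulse's effect
            phi[j] -= g * R[abs(j - m)]
    return pulses


# Synthetic check: build a weighted signal from two known pulses, then recover them.
h_w = [1.0, 0.5, 0.25]            # illustrative weighted impulse response
N = 16                            # candidate pulse positions per frame
true_pulses = [(3, 2.0), (10, -1.0)]
x_w = [0.0] * (N + len(h_w) - 1)  # longer than the frame so pulse tails are kept
for m, g in true_pulses:
    for k, hv in enumerate(h_w):
        x_w[m + k] += g * hv

found = find_pulses(x_w, h_w, N, K=2)
```

The point of working in the correlation domain is that both correlation sequences are computed once per frame; each additional pulse then costs only a maximum search and one vector update, with no synthesis filter in the loop.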
Further, according to the present invention, there is obtained a speech encoding/decoding device characterized by comprising: a parameter calculation circuit that receives a discrete speech signal sequence and extracts and encodes, from that sequence, a pitch parameter representing the pitch fine structure and a spectral parameter representing the short-time spectral envelope; an autocorrelation calculation circuit that receives the output sequence of the parameter calculation circuit and calculates the autocorrelation of the impulse response sequence corresponding to the short-time spectrum, including the pitch structure, of the speech signal sequence; a cross-correlation calculation circuit that receives the output sequence of the parameter calculation circuit and calculates the cross-correlation between the speech signal sequence and the impulse response sequence corresponding to the short-time spectrum; a driving excitation signal sequence calculation circuit that receives the output sequences of the autocorrelation calculation circuit and the cross-correlation calculation circuit and calculates and encodes a driving excitation signal sequence for the speech signal sequence; a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit and the output code sequence of the driving excitation signal sequence calculation circuit; a demultiplexer circuit that receives the combined code sequence and separates it into the code sequence representing the driving excitation signal sequence, the code sequence representing the pitch parameter, and the code sequence representing the spectral parameter; a driving excitation decoding circuit that receives and decodes the separated code sequence representing the driving excitation signal sequence; a parameter decoding circuit that receives and decodes the separated code sequences representing the pitch parameter and the spectral parameter; and a synthesis filter circuit that reproduces and outputs the speech signal sequence using the output sequence of the driving excitation decoding circuit and the output of the parameter decoding circuit.

First, the excitation pulse calculation algorithm according to the present invention will be explained in detail. The excitation pulse sequence d(n) at an arbitrary time n within one frame is expressed by the following equation.

$$d(n) = \sum_{i=1}^{K} g_i\,\delta_{n,m_i} \qquad (5)$$

Here, δ_{n,m_i} is the Kronecker delta, which is 1 when n = m_i and 0 when n ≠ m_i. K is the number of pulses generated within one frame, g_i is the amplitude of the i-th pulse, and m_i is the position of the i-th pulse.

Next, consider as the synthesis filter a filter that can represent the spectral structure of the speech signal including the pitch fine structure. This filter can be represented by a cascade connection of a pitch prediction filter and a spectral envelope prediction filter. A block diagram is shown in Fig. 4. In the figure, 191 denotes the pitch prediction filter and 192 denotes the spectral envelope prediction filter. The pitch prediction filter may be either a first-order filter or a higher-order filter. Here, for simplicity, the case where a first-order pitch prediction filter is used is considered.
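As an illustration of equation (5), the following minimal Python sketch builds d(n) from a list of pulse amplitudes and positions. The function name `pulse_train` and the example values are hypothetical and are not taken from the specification; positions are 1-based as in the text.

```python
def pulse_train(amplitudes, positions, frame_len):
    """Build d(n) for n = 1..frame_len from K pulses, following eq. (5).

    positions are 1-based (1 <= m_i <= frame_len); list index 0 holds n = 1.
    """
    d = [0.0] * frame_len
    for g, m in zip(amplitudes, positions):
        if not 1 <= m <= frame_len:
            raise ValueError("pulse position outside the frame")
        # Kronecker delta: pulse i contributes only at n == m_i.
        d[m - 1] += g
    return d


# Hypothetical example: K = 3 pulses in a 10-sample frame.
d = pulse_train([0.8, -0.5, 0.3], [2, 5, 9], 10)
```

All samples other than the K pulse positions are zero, which is why only the pairs (g_i, m_i) need to be transmitted.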
The reproduced signal x~(n), obtained by driving the synthesis filter consisting of the cascade connection of the pitch prediction filter and the spectral envelope prediction filter with the excitation pulse sequence d(n), can be expressed by the following equation.

$$\tilde{x}(n) = d(n) + \beta\, x_d(n - M_d) + \sum_{i=1}^{P} a_i\, \tilde{x}(n - i) \qquad (6)$$

Here, β is the tap coefficient of the pitch prediction filter, M_d is the pitch period of the input signal, and x_d(n) is the output signal of the pitch prediction filter. P is the prediction order of the spectral envelope predictor, and a_i (1 ≤ i ≤ P) are the prediction coefficients of the spectral envelope predictor. Various methods are known for calculating the tap coefficient β and the pitch period M_d of the pitch predictor. A simple and well-known one is to extract the peak amplitude and its position from the autocorrelation sequence of the input speech signal. This method is described in the paper by B. S. Atal and M. R. Schroeder entitled "Adaptive Predictive Coding of Speech Signals," Bell System Technical Journal, October 1970, pp. 1973-1986 (Reference 2), so the explanation is omitted here.

Now, let h(i) (0 ≤ i ≤ M−1, where M is the duration of the impulse response in samples) denote the impulse response of the composite filter consisting of the pitch prediction filter and the spectral envelope prediction filter. The reproduced signal x~(n) can then also be written as follows, where * denotes convolution.

$$\tilde{x}(n) = d(n) * h(n) \qquad (7)$$

Next, the weighted squared error J between the input speech signal x(n) and the reproduced signal x~(n) within one frame can be expressed as follows.

$$J = \sum_{n=1}^{N} \left[ \{x(n) - \tilde{x}(n)\} * w(n) \right]^2 \qquad (8)$$

Here, w(n) is the impulse response of the weighting circuit and has, for example, the same characteristics as the weighting circuit of the conventional method shown in Fig. 1. N is, for example, the number of samples in one frame.

Next, an algorithm for calculating the excitation pulse sequence that minimizes the weighted squared error J of equation (8) is derived. First, substituting equation (7) into equation (8) gives the following equation.

$$J = \sum_{n=1}^{N} \left[ \{x(n) - d(n) * h(n)\} * w(n) \right]^2 \qquad (9)$$

The terms on the right-hand side are written as

$$x_w(n) = x(n) * w(n) \qquad (10)$$

$$h_w(n) = h(n) * w(n) \qquad (11)$$

Substituting equations (5), (10), and (11) into equation (9) gives the following equation.

$$J = \sum_{n=1}^{N} \left[ x_w(n) - \sum_{i=1}^{K} g_i\, h_w(n - m_i) \right]^2 \qquad (12)$$

The excitation pulse sequence that minimizes equation (12) is calculated from the following equation, which is obtained by partially differentiating equation (12) with respect to the amplitude g_i of the excitation pulse sequence and setting the result to 0.

$$g_i(m_i) = \frac{\varphi_{xh}(-m_i) - \sum_{l=1}^{i-1} g_l\, \varphi_{hh}(m_l, m_i)}{\varphi_{hh}(m_i, m_i)} \qquad (13)$$

Here, φ_xh(·) is the cross-correlation sequence calculated from x_w(n) and h_w(n), and φ_hh(·) is the autocorrelation sequence calculated from h_w(n). They can be written as follows.

$$\varphi_{xh}(-m_i) = \sum_{n=1}^{N} x_w(n)\, h_w(n - m_i), \qquad \varphi_{hh}(m_l, m_i) = \sum_{n=1}^{N} h_w(n - m_l)\, h_w(n - m_i) \qquad (14)$$

Note that φ_hh(·) is often called a covariance sequence in the field of speech signal processing. According to equation (13), the amplitude g_i of an excitation pulse is a function of its position m_i, so the optimum amplitude g_i for a pulse placed at position m_i can be calculated. The position m_i of the excitation pulse is chosen so as to minimize the squared error obtained by substituting equation (13) into equation (12),

$$J = R_{xx}(0) - \sum_{i=1}^{K} g_i\, \varphi_{xh}(-m_i) \qquad (15)$$

that is, so as to maximize the second term on the right-hand side. As an approximate method, the position that maximizes |g_i| may be selected. In equation (15), R_xx(0) is the power of the weighted signal x_w(n).

If the influence of the frame edges is ignored, the covariance function φ_hh(m_l, m_i) appearing in equation (13) can be assumed equal to the autocorrelation sequence R_hh(|m_l − m_i|), which depends only on the time difference |m_l − m_i|, as shown in the following equation.

$$\varphi_{hh}(m_l, m_i) = R_{hh}(|m_l - m_i|) \qquad (16)$$

Here, R_hh(·) can be expressed as follows.

$$R_{hh}(|m_l - m_i|) = \sum_{n=1}^{N - |m_l - m_i|} h_w(n - m_l)\, h_w(n - m_i), \qquad (1 \le m_l, m_i \le N) \qquad (17)$$

Using equations (16) and (17), equation (13) is therefore modified as follows.

$$g_i(m_i) = \frac{\varphi_{xh}(-m_i) - \sum_{l=1}^{i-1} g_l\, R_{hh}(|m_l - m_i|)}{R_{hh}(0)} \qquad (18)$$

Calculating R_hh(·) requires approximately 1/N of the computation needed to calculate φ_hh(·,·). Therefore, by using equation (18) to calculate the excitation pulse sequence, the amount of computation can be reduced to about 1/N of that required by equation (13). However, when the excitation pulse sequence is calculated according to equation (18), errors occur in the pulses near the frame edges unless the number of data samples used to calculate the cross-correlation φ_xh(·) is larger than the number of samples in the frame for which pulses are transmitted. This problem can be avoided by choosing the number of data samples used to calculate φ_xh(·) to be larger than the number of samples in the frame.

This concludes the derivation of the excitation pulse calculation algorithm and the explanation of its characteristics. Next, a speech encoding method using the excitation pulse calculation algorithm according to the present invention will be explained in detail with reference to Fig. 5. Fig. 5(a) is a block diagram showing an embodiment of the transmitting side of the speech encoding system according to the present invention, and Fig. 5(b) is a block diagram showing an embodiment of the receiving side.
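The sequential search derived above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each amplitude is taken as the residual cross-correlation divided by R_hh(0), the position is chosen by the approximate rule of maximizing |g_i|, positions are 0-based, and the names `convolve`, `cross_correlation`, `autocorrelation`, and `search_pulses` are hypothetical. The autocorrelation is computed over the whole impulse response rather than with the frame-limited sum of equation (17).

```python
def convolve(x, h):
    """Plain linear convolution of two lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def cross_correlation(xw, hw, num_positions):
    """phi_xh(-m) = sum_n xw[n] * hw[n - m] for candidate positions m (0-based)."""
    return [sum(xw[n] * hw[n - m] for n in range(m, min(len(xw), m + len(hw))))
            for m in range(num_positions)]


def autocorrelation(hw, max_lag):
    """R_hh(k) = sum_n hw[n] * hw[n + k] for k = 0..max_lag (0 beyond the response)."""
    return [float(sum(hw[n] * hw[n + k] for n in range(len(hw) - k)))
            for k in range(max_lag + 1)]


def search_pulses(phi_xh, r_hh, num_pulses):
    """Find pulses one at a time from precomputed correlations only."""
    residual = list(phi_xh)          # numerator of the amplitude formula
    pulses = []
    for _ in range(num_pulses):
        # Approximate position rule: maximise |g_i| (R_hh(0) is a constant).
        m = max(range(len(residual)), key=lambda i: abs(residual[i]))
        g = residual[m] / r_hh[0]
        pulses.append((m, g))
        # Remove this pulse's contribution from the cross-correlation.
        for i in range(len(residual)):
            residual[i] -= g * r_hh[abs(i - m)]
    return pulses


# Hypothetical test signal: two well-separated pulses through a short response.
N = 16
hw = [1.0, 0.5, 0.25]
d = [0.0] * N
d[3], d[10] = 2.0, -1.0
xw = convolve(d, hw)                 # N + 2 samples: longer than the frame
pulses = search_pulses(cross_correlation(xw, hw, N), autocorrelation(hw, N - 1), 2)
```

Because `xw` extends past the frame, the cross-correlation at positions near the frame end still sees the full impulse response, which corresponds to the overlap requirement stated in the text. No synthesis filter is run inside the search loop.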
In Fig. 5(a), a discrete speech signal sequence x(n) is input from input terminal 195, divided into blocks of a predetermined number of samples, and stored in buffer memory circuit 340. The input speech signal sequence is divided so that consecutive blocks overlap by a predetermined number of samples. This is because, as described above, the number of data samples used for the excitation pulse calculation is made larger than the number of samples in a frame. Next, K parameter calculation circuit 280 receives a sequence of predetermined length from the speech signal sequence stored in buffer memory circuit 340 and uses it to calculate LPC parameters of a predetermined order P by a well-known method (for example, linear predictive analysis).
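The specification leaves the analysis method open. As one possibility, the following sketch uses the autocorrelation method with the Levinson-Durbin recursion, which yields reflection coefficients (K parameters) and predictor coefficients together. The function name `k_parameters` is hypothetical, no analysis window is applied, and the sign convention chosen for the reflection coefficients is an assumption, since conventions differ between texts.

```python
def k_parameters(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (k, a): reflection coefficients k_1..k_P and predictor
    coefficients a_1..a_P for the convention x(n) ~ sum_i a_i * x(n - i).
    """
    n = len(frame)
    r = [sum(frame[t] * frame[t + lag] for t in range(n - lag))
         for lag in range(order + 1)]
    a = []            # a[j] holds a_{j+1} of the current order
    k = []
    err = r[0]        # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("frame has no energy or is perfectly predictable")
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        ki = acc / err
        # Order update: a_j <- a_j - k_i * a_{i-j}, then append a_i = k_i.
        a = [a[j] - ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
        k.append(ki)
        err *= 1.0 - ki * ki
    return k, a


# Hypothetical 4-sample frame, order 2.
frame = [1.0, 0.5, 0.25, 0.125]
k, a = k_parameters(frame, 2)
```

With this method every reflection coefficient has magnitude below 1, which makes the K parameters convenient to quantize and guarantees a stable synthesis filter.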
Various kinds of LPC parameters are conceivable, but the following explanation assumes that the K parameters K_i (1 ≤ i ≤ P) are used. The K parameters are identical to the PARCOR coefficients. The K parameters K_i are output to K parameter encoding circuit 200. K parameter encoding circuit 200 encodes K_i with, for example, a predetermined number of quantization bits and outputs the codes l_ki to gate circuit 460. K parameter encoding circuit 200 also decodes l_ki and outputs the decoded values K_i' to impulse response calculation circuit 210 and synthesis filter circuit 400.
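The point of decoding l_ki inside the encoder is that all later stages work with exactly the values the receiver will have. A minimal sketch, assuming a uniform quantizer over the interval from -1 to 1 (the specification only says a predetermined number of bits; the quantizer shape and the names `encode_k` and `decode_k` are hypothetical):

```python
def encode_k(k, bits):
    """Map a K parameter in [-1, 1] to an integer code of the given width."""
    levels = 1 << bits
    step = 2.0 / levels
    # Clamp so that k = 1.0 falls into the top cell.
    return min(levels - 1, max(0, int((k + 1.0) / step)))


def decode_k(code, bits):
    """Return the cell midpoint, i.e. the value K' used on both sides."""
    step = 2.0 / (1 << bits)
    return -1.0 + (code + 0.5) * step


# Hypothetical example with 4 bits per coefficient.
code = encode_k(0.3, 4)
k_decoded = decode_k(code, 4)
```

The encoder would pass `k_decoded`, not the unquantized value, to the impulse response calculation and to its local synthesis filter.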
Next, pitch analysis circuit 370 receives the output sequence of buffer memory circuit 340, calculates the pitch period M_d and the pitch gain β according to the method described in Reference 2, and outputs them to pitch encoding circuit 380. Pitch encoding circuit 380 encodes the pitch period M_d and the pitch gain β with a predetermined number of bits and outputs the codes l_d and l_β to gate circuit 460. Pitch encoding circuit 380 also outputs M_d' and β', obtained by decoding l_d and l_β, to impulse response calculation circuit 210 and synthesis filter circuit 400.

Next, impulse response calculation circuit 210 receives the K parameter decoded values K_i' from K parameter encoding circuit 200 and the decoded pitch period M_d' and pitch gain β' from pitch encoding circuit 380, and calculates two kinds of impulse response. First, using only the K parameter decoded values K_i', it calculates, for a predetermined number of samples, the weighted impulse response h_1w(n) for the case where only spectral envelope prediction filter 192 of the synthesis filter shown in Fig. 4 is used, and outputs h_1w(n) to autocorrelation calculation circuit 360 and cross-correlation calculation circuit 350. Next, using K_i' and the pitch information (pitch gain β' and pitch period M_d'), it calculates, for a predetermined number of samples, the weighted impulse response h_2w(n) of the composite filter consisting of the pitch prediction filter and the spectral envelope prediction filter. After the correlation calculations using h_1w(n) have been completed, it outputs h_2w(n) to autocorrelation calculation circuit 360 and cross-correlation calculation circuit 350.

Next, autocorrelation calculation circuit 360 receives the impulse response h_1w(n), calculates the autocorrelation sequence R_hh1(|m_l − m_i|) of h_1w(n) according to equation (17), and outputs it to pulse calculation circuit 390. After completing this calculation, it receives the impulse response h_2w(n), calculates the autocorrelation sequence R_hh2(|m_l − m_i|) of h_2w(n), and outputs it to pulse calculation circuit 390.

Next, subtracter 285 receives the speech signal sequence x(n) stored in buffer memory circuit 340, subtracts the output sequence of synthesis filter circuit 400 from x(n) over one frame of samples, and outputs the subtraction result to weighting circuit 410. Weighting circuit 410 receives the K parameter decoded values K_i' from K parameter encoding circuit 200 and calculates the weighting function w(n) such that its z-transform W(z) is expressed by equation (3). Other frequency weighting methods may also be used for this calculation.
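The two impulse responses computed by impulse response calculation circuit 210 can be illustrated with a direct implementation of the cascade of Fig. 4. The sketch below is hedged in two ways: it omits the convolution with the weighting function w(n), so it produces the unweighted responses, and the function name `synthesize` and the numeric values are hypothetical.

```python
def synthesize(d, a, beta=0.0, pitch_period=0):
    """Drive the cascade of Fig. 4 with d: pitch predictor, then envelope predictor.

    x_d(n) = d(n) + beta * x_d(n - M_d)
    x(n)   = x_d(n) + sum_i a_i * x(n - i)
    Samples before the start of d are taken as zero.
    """
    x_d = []   # pitch prediction filter output
    x = []     # reproduced signal
    for n, dn in enumerate(d):
        v = dn
        if beta != 0.0 and pitch_period > 0 and n >= pitch_period:
            v += beta * x_d[n - pitch_period]
        x_d.append(v)
        y = v + sum(a[i] * x[n - 1 - i] for i in range(min(len(a), n)))
        x.append(y)
    return x


# Hypothetical parameters: first-order envelope predictor, pitch period 3.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
h1 = synthesize(impulse, [0.5])                              # envelope filter only
h2 = synthesize(impulse, [0.5], beta=0.5, pitch_period=3)    # with pitch filter
```

Setting `beta` to 0 reduces the cascade to the spectral envelope filter alone, which is the sense in which the first pulse sequence is calculated "without pitch information". In `h2` the response repeats, scaled by β, one pitch period later.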
Weighting circuit 410 further receives the subtraction result from subtracter 285, convolves it with the weighting function w(n), and outputs the resulting x_w(n) to cross-correlation calculation circuit 350. Cross-correlation calculation circuit 350 receives the impulse response h_1w(n) from impulse response calculation circuit 210, calculates the first cross-correlation φ_xh1(−m_i) (1 ≤ m_i ≤ N) from h_1w(n) and x_w(n), and outputs it to pulse calculation circuit 390. It then receives the impulse response h_2w(n), calculates the second cross-correlation φ_xh2(−m_i) (1 ≤ m_i ≤ N) from h_2w(n) and x_w(n), and outputs it to pulse calculation circuit 390.

Pulse calculation circuit 390 receives the cross-correlation and the autocorrelation in synchronization. That is, at the beginning of a predetermined frame period it receives the first cross-correlation φ_xh1(−m_i) and the autocorrelation R_hh1(|m_l − m_i|) and, using the excitation pulse calculation formula (18), calculates a predetermined number of amplitudes g_i and positions m_i of the first excitation pulse sequence. After this processing is completed, it receives the second cross-correlation φ_xh2(−m_i) and the autocorrelation R_hh2(|m_l − m_i|) in synchronization and calculates the amplitudes and positions of the second excitation pulse sequence according to equation (18). Pulse calculation circuit 390 also calculates, for each of the first and second excitation pulse sequences, the power of the error signal between the input signal and the reproduced signal according to the following equation.

$$J = \sum_{n=1}^{N} \left[ \{x(n) - \tilde{x}(n)\} * w(n) \right]^2 = R_{xx}(0) - \sum_{i=1}^{K} g_i\, \varphi_{xh}(-m_i) \qquad (19)$$

This equation is obtained by substituting equation (18) into equation (9). Here, R_xx(0) is the power of the output x_w(n) of weighting circuit 410. The error signal powers calculated for the first and second excitation pulse sequences are output to discrimination circuit 430. Instead of equation (19), the power of the error signal can also be calculated approximately from the residual of the cross-correlation that remains after the excitation pulse sequence has been calculated (the final value obtained by subtracting from the cross-correlation, for each pulse, the contribution determined by that pulse and the autocorrelation), according to the following equation.

$$J = \sum_{m=1}^{N} \varphi_{xh}^{2}(-m) \,/\, R_{hh}(0) \qquad (20)$$

Next, discrimination circuit 430 selects whichever of the first and second excitation pulse sequences represents the input signal more faithfully. To do so, it compares the error signal powers for the two excitation pulse sequences input from pulse calculation circuit 390. If the value for the first excitation pulse sequence is smaller than the value for the second, it judges that using the first excitation pulse sequence gives better characteristics than using the second. Since the first excitation pulse sequence is calculated from the first impulse response without using pitch information (in other words, with β' = 0 and M_d' = 0), discrimination circuit 430 causes switching circuit 440 to output the first excitation pulse sequence to encoding circuit 470. Discrimination circuit 430 also causes gate circuit 460 to output the K parameter codes l_ki to multiplexer 450.
As for the codes representing the pitch information (l_d and l_β), a predetermined code is set in gate circuit 460 and output to multiplexer 450. In the opposite case, it is judged that using the second excitation pulse sequence gives better characteristics than using the first. Since the second excitation pulse sequence is calculated from the second impulse response, which uses both the spectral envelope information and the pitch information, discrimination circuit 430 causes switching circuit 440 to output the second excitation pulse sequence to encoding circuit 470. Discrimination circuit 430 also causes gate circuit 460 to output the K parameter codes l_ki and the codes l_d and l_β representing the pitch information to multiplexer 450. Next, encoding circuit 470 receives the amplitudes and positions of the excitation pulse sequence from switching circuit 440 and encodes them using a normalization coefficient described later.
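The selection made by discrimination circuit 430 and the substitution of the pitch codes by gate circuit 460, as described above, can be sketched as follows. The error power follows the right-hand side of equation (19). The names `error_power`, `select_train`, and `NO_PITCH_CODE`, the concrete value of the predetermined code, and the numbers in the example are all hypothetical.

```python
NO_PITCH_CODE = (0, 0)   # hypothetical stand-in for the "predetermined code"


def error_power(r_xx0, phi_xh, pulses):
    """J = R_xx(0) - sum_i g_i * phi_xh(-m_i), with pulses given as (m_i, g_i)."""
    return r_xx0 - sum(g * phi_xh[m] for m, g in pulses)


def select_train(r_xx0, phi1, pulses1, phi2, pulses2, pitch_code):
    """Pick the pulse sequence with the smaller error power.

    Returns (pulses, pitch code to transmit, error power).
    """
    j1 = error_power(r_xx0, phi1, pulses1)
    j2 = error_power(r_xx0, phi2, pulses2)
    if j1 < j2:
        # Envelope-only sequence wins: signal "no pitch" to the receiver.
        return pulses1, NO_PITCH_CODE, j1
    return pulses2, pitch_code, j2


# Hypothetical correlations and pulses for one frame.
chosen, code, j = select_train(
    10.0,
    [0.0, 2.0, 0.0, 1.0], [(1, 2.0), (3, 1.0)],   # first sequence (no pitch)
    [0.0, 3.0, 0.0, 0.0], [(1, 2.5)],             # second sequence (with pitch)
    (40, 5),
)
```

Because the decision is carried by the pitch code itself, no separate selection flag has to be transmitted in this configuration.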
The normalization coefficient is also encoded, and the codes representing the normalization coefficient and the amplitudes and positions of the excitation pulse sequence are output to multiplexer 450. Encoding circuit 470 also outputs the decoded values g_i' and m_i' of the amplitudes and positions of the excitation pulse sequence to excitation pulse generation circuit 420.

Various encoding methods can be used in encoding circuit 470. One is to encode the amplitudes and positions of the pulse sequence separately, and another is to encode them jointly. An example of the former is explained here. First, as a method of encoding the amplitudes of the excitation pulses, the maximum amplitude of the pulse sequence in the frame can be used as the normalization coefficient; each pulse amplitude is normalized by this value and then quantized and encoded. Another possible method is to assume that the probability distribution of the amplitudes is normal (Gaussian) and to use the optimum quantizer for that distribution. This is described in the paper by J. Max entitled "Quantizing for Minimum Distortion," IRE Transactions on Information Theory, March 1960, pp. 7-12 (Reference 3), so the explanation is omitted here. Next, various methods can be considered for encoding the pulse positions. For example, run-length coding, which is well known in the field of facsimile signal coding, may be used. The normalization coefficient can be encoded by conventionally well-known logarithmic companding or the like. Note that the encoding method for the pulse sequence is not limited to those described here, and other well-known suitable methods may be used.

Returning to Fig. 5(a), excitation pulse generation circuit 420 uses the input g_i' and m_i' to generate, over one frame length N, an excitation pulse sequence having amplitude g_i' at position m_i', and outputs it to synthesis filter circuit 400 as the driving signal. Synthesis filter circuit 400 receives the K parameter decoded values K_i' from K parameter encoding circuit 200 and the pitch information (pitch period decoded value M_d' and pitch gain decoded value β') from pitch encoding circuit 380. The K parameter decoded values K_i' are converted into the prediction parameters a_i (1 ≤ i ≤ N_p) by a well-known method. Synthesis filter circuit 400 also receives discrimination information from discrimination circuit 430. When the first excitation pulse sequence described above is used, the pitch information is set to 0. Next, synthesis filter circuit 400 receives one frame of the driving excitation signal from excitation pulse generation circuit 420, appends one frame of zeros to it, and obtains the response signal sequence x~(n) for the resulting two-frame signal, as shown in the following equation.

$$\tilde{x}(n) = d(n) + \beta\, x_d(n - M_d) + \sum_{i=1}^{P} a_i\, \tilde{x}(n - i) \qquad (21)$$

Here, the driving excitation signal d(n) is the pulse sequence output from excitation pulse generation circuit 420 for 1 ≤ n ≤ N and is a sequence of all zeros for N+1 ≤ n ≤ 2N. For N+1 ≤ n ≤ 2N, the a_i, M_d, and β used in equation (21) may be either the values obtained at the current frame time or the values obtained at the next frame time. Of the x~(n) calculated according to equation (21), the second-frame portion (N+1 ≤ n ≤ 2N) is output to subtracter 285.

Next, multiplexer 450 receives the output code of encoding circuit 470 and the output code of gate circuit 460, combines them, and outputs the result from transmitting-side output terminal 480 to the transmission channel. This completes the explanation of the encoder side of the speech encoding system according to the present invention.

Next, the receiving side of the speech encoding system according to the present invention will be explained with reference to Fig. 5(b). Demultiplexer 500 receives the code from receiving-side input terminal 490. Demultiplexer 500 separates the input code into the code sequence representing the K parameters, the code sequence representing the pitch information, and the code sequence representing the excitation pulse sequence. It outputs the code sequence representing the K parameters to K parameter decoding circuit 520, the code sequence representing the pitch information to pitch decoding circuit 510, and the code sequence representing the excitation pulse sequence to excitation pulse decoding circuit 530. K parameter decoding circuit 520 and pitch decoding circuit 510 decode their input code sequences and output the results to synthesis filter circuit 550. Excitation pulse decoding circuit 530 receives and decodes the code sequence representing the excitation pulse sequence and outputs the amplitude and position information of the excitation pulse sequence to pulse generation circuit 540.
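The amplitude coding by maximum-value normalization mentioned for encoding circuit 470, together with the matching step in excitation pulse decoding circuit 530, might look like the following sketch. The uniform signed quantizer, the bit width, and the names `encode_amplitudes` and `decode_amplitudes` are assumptions for illustration; the coding of the normalization coefficient itself (for example by logarithmic companding) is left out.

```python
def encode_amplitudes(amps, bits):
    """Normalise by the frame's peak amplitude, then quantise uniformly.

    Returns (normalisation coefficient, list of signed integer codes).
    """
    norm = max(abs(g) for g in amps)
    levels = (1 << (bits - 1)) - 1        # symmetric range -levels..+levels
    if norm == 0.0:
        return 0.0, [0] * len(amps)
    return norm, [round(g / norm * levels) for g in amps]


def decode_amplitudes(norm, codes, bits):
    """Inverse mapping used identically by encoder and decoder."""
    levels = (1 << (bits - 1)) - 1
    return [norm * c / levels for c in codes]


# Hypothetical frame with three pulses and 4-bit amplitude codes.
amps = [2.0, -1.2, 0.5]
norm, codes = encode_amplitudes(amps, 4)
decoded = decode_amplitudes(norm, codes, 4)
```

Normalizing by the peak keeps the quantizer range matched to each frame, so the same small code width serves both loud and quiet frames.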
Pulse generation circuit 540 receives the amplitude and position information of the excitation pulse sequence, generates the excitation pulse sequence, and outputs it to synthesis filter circuit 550. Synthesis filter circuit 550 is a cascade connection of a pitch prediction filter and a spectral envelope prediction filter, as shown in Fig. 4. It receives the pitch information from pitch decoding circuit 510 and the K parameter decoded values from K parameter decoding circuit 520. If the pitch information carries the predetermined code, synthesis filter circuit 550 reproduces the signal using only the spectral envelope prediction filter (that is, with the pitch information set to 0). Using the output pulse sequence of pulse generation circuit 540 as the driving source, synthesis filter circuit 550 reproduces the signal x~(n) and outputs it from receiving-side output terminal 560. This completes the explanation of the decoder side according to the present invention.

According to the present invention, since the excitation pulse sequence is calculated according to equation (18), there is no path, as in the conventional method of Reference 1, in which the synthesis filter is driven by pulses to obtain a reproduced signal and the error and squared error with respect to the original signal are fed back to adjust the pulses, and that process need not be repeated. The amount of computation can therefore be greatly reduced while good reproduction quality is obtained. Furthermore, by calculating the values of φ_xh(−m_i) and R_hh(|m_l − m_i|) (1 ≤ |m_l − m_i| ≤ N) in advance for each frame, the correlation calculations need not be repeated each time equation (18) is evaluated. Furthermore, compared with other conventional methods of searching for an excitation pulse sequence, the method according to the present invention gives better quality for the same amount of transmitted information.

Further, according to the present invention, the excitation pulse sequence is calculated using parameters that can reproduce the spectral structure, including the pitch structure, of the input signal, so that the periodicity of the input speech signal, that is, the periodicity of the excitation pulse sequence, is exploited. Because excitation pulses separated by the pitch period can thus be predicted, the number of excitation pulses needed to obtain the same characteristics is greatly reduced compared with the conventional method.
Therefore, it is extremely effective in reducing the amount of transmitted information.
It also improves reproduction quality when the amount of transmitted information is the same as in the conventional method. In particular, for female speakers with high pitch frequencies, which were a problem in the conventional method, good reproduction quality can be obtained even with a transmitted information amount of 10 kbit/s or less.

Further, according to the present invention, the encoder compares the excitation pulse sequence calculated using only the spectral parameters with the excitation pulse sequence calculated using the pitch parameters as well, and transmits whichever can reproduce the input signal more faithfully, for use in reproduction on the receiving side. This prevents the degradation caused by non-periodic frames in transient portions of the input speech signal or by errors in pitch parameter extraction. As a simpler method, the determination may be made using the pitch gain β of the pitch parameters. For example, after the pitch gain β is calculated, it is compared with a predetermined threshold, and if β is below the threshold it is forcibly set to 0. In that case the excitation pulses are calculated using only the spectral parameters. With such a configuration, the discrimination circuit for comparing the excitation pulse sequences and the calculation of equation (19) or (20) become unnecessary, and the amount of computation can be reduced.

Furthermore, in the excitation pulse calculation method of equation (18), the pulses are calculated one at a time, each being suboptimal. When the next pulse is calculated, it is also possible to readjust the amplitudes of several previously determined pulses. This is effective when the pulses are not independent, that is, when pulse positions are determined very close to each other. Various other excitation pulse calculation methods are also conceivable. For example, the amplitudes of all pulses may be readjusted after all pulses within one frame have been determined.

Further, according to the present invention, there is almost no degradation of the reproduced signal near frame boundaries due to waveform discontinuity at the boundary. This effect results from a configuration in which, when the encoder calculates the excitation pulse sequence of the current frame, the response signal sequence obtained by driving the synthesis filter with the excitation pulse sequence of the preceding frame is extended into the current frame and subtracted from the input speech signal sequence, and the excitation pulse sequence of the current frame is calculated from the result.
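The readjustment of all amplitudes after the positions are fixed can be read as solving the normal equations sum_l g_l R_hh(|m_l - m_i|) = phi_xh(-m_i) jointly for all pulses. The specification does not state how the readjustment is carried out, so the sketch below is one possible reading; the function name `readjust_amplitudes`, the use of Gaussian elimination, and the example values are hypothetical.

```python
def readjust_amplitudes(positions, phi_xh, r_hh):
    """Jointly re-optimise amplitudes for fixed pulse positions.

    Solves sum_l g_l * R_hh(|m_l - m_i|) = phi_xh(-m_i) for i = 1..K
    by Gaussian elimination with partial pivoting.
    """
    k = len(positions)
    # Augmented matrix [R | phi].
    mat = [[r_hh[abs(mi - ml)] for ml in positions] + [phi_xh[mi]]
           for mi in positions]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(mat[r][col]))
        if abs(mat[pivot][col]) < 1e-12:
            raise ValueError("singular system (duplicate pulse positions?)")
        mat[col], mat[pivot] = mat[pivot], mat[col]
        for row in range(col + 1, k):
            f = mat[row][col] / mat[col][col]
            for c in range(col, k + 1):
                mat[row][c] -= f * mat[col][c]
    g = [0.0] * k
    for row in range(k - 1, -1, -1):
        s = mat[row][k] - sum(mat[row][c] * g[c] for c in range(row + 1, k))
        g[row] = s / mat[row][row]
    return g


# Hypothetical case: two adjacent pulses whose responses overlap.
r_hh = [1.3125, 0.625, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
phi = [2.0 * r_hh[abs(m - 3)] - 1.0 * r_hh[abs(m - 4)] for m in range(8)]
g = readjust_amplitudes([3, 4], phi, r_hh)
```

In this example the one-at-a-time rule would first assign the pulse at position 3 the amplitude phi[3] / R_hh(0), which is about 1.52 rather than 2, because the neighbouring pulse leaks into the correlation. The joint solution recovers both amplitudes, which illustrates why readjustment helps when pulses lie close together.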
Further, in this embodiment, the case where the frame length is constant has been described, but of course the same effect can be obtained by using a variable length frame in which the frame length is changed over time. Furthermore, according to the configuration of the embodiment of the present invention, as a method of obtaining a response signal sequence derived from a sound source pulse sequence of one frame past, pitch information input one frame past is used as a filter parameter of the response signal calculation circuit. Although the K parameter values are used as they are, pitch information and K parameter values input at the current frame time may be used when calculating the response signal sequence derived from the sound source pulse of the past frame. Further, according to the present invention, in the synthesis filter circuit 400 on the transmitting side, pitch information is used or not in accordance with the determination result of the determination circuit 430 when determining a response signal sequence derived from a sound source pulse of one frame past. However, the pitch information may be always set to 0 when calculating the response signal sequence. Also, on the encoder side, a synthesis filter circuit 4
When calculating the response signal sequence derived from the sound source pulse of the past frame in 00, the response signal sequence calculated using only the K parameter and the response signal sequence calculated using the K parameter and pitch information are different. Calculate two types of response signal sequences, and decide which response signal sequence should be used in the next frame. For example, calculate the power of the weighted error between the input signal and each response signal sequence. , the characteristics are further improved by selecting the one with smaller error power. However, with such a configuration, the selection information necessary for selecting one of the two types of response signal sequences on the decoder side must be transmitted in extra bits for each frame. In this case, the increase in the amount of transmitted information is 50 bits/second when the frame length is 20 msec.
A very small amount is required. Further, according to the configuration of the embodiment of the present invention, the weighting circuit 410 on the transmitting side performs weighting according to equation (3) used in the conventional system. This weighting is related to the spectral envelope and does not include weighting using the pitch structure. Therefore, by using the weighting function W _p (n) that utilizes both the spectral envelope and pitch structure shown in the following equation, more effective weighting can be performed. Here, W _p (z) is the Z-transformed representation of the weighting function w p (n), r and r' are weighting coefficients, and 0 <
A value of r, r′<1 is chosen. Further, according to the present invention, the determination circuit 43 on the encoder side
0, when determining which of the two sound source pulse trains should be used for better characteristics, the power of the weighted error signal obtained by equations (19) and (20) is used as the criterion for judgment. did. Other best methods can be used as criteria. For example, a configuration may be adopted in which prediction gains are calculated when pitch prediction is performed, and these values are compared with a predetermined threshold value to be used as a judgment criterion. Further, in the present invention, when calculating the autocorrelation sequence of the impulse response sequence representing the short-time spectral structure, the impulse response calculation circuit 210 uses the K parameter decoded value and the pitch information to calculate the impulse response sequence. After calculating,
the autocorrelation sequence is calculated from this impulse response sequence. As is well known in the field of digital signal processing, the autocorrelation sequence of the impulse response sequence corresponds to the power spectrum of the short-time spectrum. Therefore, a configuration may be adopted in which the power spectrum of the short-time spectrum is first obtained from the decoded K parameters and the pitch information, and the autocorrelation sequence is then calculated from it.

Similarly, when calculating the cross-correlation sequence between the audio signal sequence and the impulse response sequence representing the short-time spectral envelope, in this embodiment the cross-correlation calculation circuit 350 calculates the cross-correlation from the signal sequence x_w(n) output by the weighting circuit 410 and the impulse response sequence obtained by the impulse response calculation circuit 210. As is well known, the cross-correlation sequence corresponds to the cross-power spectrum. Using this relationship, the cross-power spectrum may instead be obtained from the audio signal sequence, the decoded K parameters, and the pitch information, and the cross-correlation sequence calculated from it. The correspondence between the power spectrum and the autocorrelation sequence, and between the cross-power spectrum and the cross-correlation sequence, is explained in detail in Chapter 8 of the monograph "DIGITAL SIGNAL PROCESSING" by A. V. Oppenheim et al. (Reference 4), so the explanation is omitted here.

In addition, in the embodiment described above, the sound source pulse sequence within one frame is encoded by the encoding circuit 470 in Figure 5 after all the pulses have been determined. Alternatively, the encoding may be incorporated into the pulse calculation, so that each time one pulse is calculated it is encoded before the next pulse is calculated. With such a configuration, a pulse sequence that minimizes the error including the encoding distortion can be found, so quality can be further improved.

Regarding code assignment in the encoding circuit 470, in the configuration of the present invention variable-length code assignment improves the coding efficiency considerably compared with fixed-length code assignment. This is because determining the sound source pulse sequence using pitch information makes the amplitude distribution of the pulses even more skewed.

In addition, in the embodiments described above, the K parameters were used as the parameters representing the spectral envelope of the short-time audio signal sequence, but other well-known parameters (for example, LSP parameters) may also be used. Furthermore, the weighting function w(n) may be omitted from equation (8) above.

In addition, in this embodiment, to prevent quality degradation caused by discontinuity of the reproduced waveform at frame boundaries, the response signal sequence derived from the sound source pulses of the frame immediately preceding the current frame is calculated and subtracted from the input audio signal of the current frame before the drive sound source pulses are calculated.
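The frame-boundary correction described above can be illustrated with a minimal Python sketch. It assumes the plain all-pole synthesis filter of equation (1), without the weighting or the pitch filter, and a zero filter state before the previous frame. The function names `synthesize` and `remove_previous_frame_response`, the first-order predictor, and the sample values are hypothetical and do not appear in the original text.

```python
def synthesize(excitation, a):
    """All-pole synthesis: y(n) = d(n) + sum_i a[i-1] * y(n - i)."""
    y = []
    for n, d in enumerate(excitation):
        acc = d
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


def remove_previous_frame_response(current_frame, previous_pulses, a):
    """Subtract the tail of the previous frame's pulse response from the current frame."""
    frame_len = len(current_frame)
    # Drive the filter with the previous pulses followed by one silent frame.
    # The output during the silent frame is the response that spills over
    # into the current frame (assumption: zero filter state before that).
    driven = list(previous_pulses) + [0.0] * frame_len
    tail = synthesize(driven, a)[len(previous_pulses):]
    # Target signal for the pulse search of the current frame.
    target = [x - r for x, r in zip(current_frame, tail)]
    return target, tail


# Toy values (hypothetical): first-order predictor, 4-sample frames.
a = [0.5]
previous_pulses = [0.0, 0.0, 0.0, 1.0]
current_frame = [1.0, 1.0, 1.0, 1.0]
target, tail = remove_previous_frame_response(current_frame, previous_pulses, a)
print(tail)    # response carried over from the previous frame
print(target)  # signal actually passed to the pulse search
```

In this sketch the pulse search for the current frame would operate on `target` rather than on the raw input, so the pulses of the current frame only have to account for what the previous frame's pulses do not already reproduce.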
Alternatively, as shown in Figure 6, the data used for the sound source pulse calculation may include both the data of the frame whose pulses are transmitted and past data. In Figure 6, N_T denotes the frame for which pulses are transmitted, and N denotes the frame over which the sound source pulses are calculated. Such a configuration has the advantage that the response signal sequence derived from the sound source pulses of the preceding frame need not be calculated.
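The correspondence between the autocorrelation sequence of an impulse response and its power spectrum, noted in the description above, can be checked numerically. The following Python sketch is an illustration only: the function names and the five-sample impulse response are hypothetical, and a naive DFT stands in for whatever spectral computation an actual implementation would use.

```python
import cmath


def autocorr_direct(h):
    """Linear autocorrelation R(k) = sum_n h[n] * h[n + k], k = 0..len(h)-1."""
    L = len(h)
    return [sum(h[n] * h[n + k] for n in range(L - k)) for k in range(L)]


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, kept simple on purpose."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]
    return [v / N for v in out] if inverse else out


def autocorr_via_power_spectrum(h):
    """Autocorrelation obtained as the inverse DFT of the power spectrum."""
    L = len(h)
    N = 2 * L - 1  # zero-pad so circular correlation equals linear correlation
    H = dft(list(h) + [0.0] * (N - L))
    power = [abs(v) ** 2 for v in H]  # sampled power spectrum |H|^2
    r = dft(power, inverse=True)
    return [r[k].real for k in range(L)]


# Hypothetical truncated impulse response, for illustration only.
h = [1.0, 0.9, 0.5, 0.2, -0.1]
direct = autocorr_direct(h)
spectral = autocorr_via_power_spectrum(h)
for k, (d, s) in enumerate(zip(direct, spectral)):
    print(k, round(d, 6), round(s, 6))
```

The zero-padding to 2L - 1 points matters: without it, the inverse transform yields a circular autocorrelation in which the lags wrap around and no longer match the direct computation.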

[Brief description of the drawings]

Figure 1 is a block diagram showing the configuration of the conventional system; Figure 2 shows an example of a sound source pulse sequence; Figure 3 shows an example of the frequency characteristics of the input audio signal sequence and of the weighting circuit shown in Figure 1; Figure 4 shows an example of a synthesis filter used to explain the sound source pulse calculation algorithm according to the present invention; Figure 5 is a block diagram showing an embodiment of the speech encoding system according to the present invention; and Figure 6 illustrates the positional relationship between the pulse transmission frame and the sound source pulse calculation frame.

In the figures: 110, 340 ... buffer memory circuit; 120, 285 ... subtraction circuit; 130, 400, 550 ... synthesis filter circuit; 140, 420, 540 ... sound source pulse generation circuit; 150 ... error minimization circuit; 180, 280 ... K parameter calculation circuit; 190, 410 ... weighting circuit; 200 ... K parameter encoding circuit; 191 ... pitch prediction filter; 192 ... spectral envelope prediction filter; 210 ... impulse response calculation circuit; 350 ... cross-correlation calculation circuit; 360 ... autocorrelation calculation circuit; 370 ... pitch analysis circuit; 380 ... pitch encoding circuit; 390 ... pulse calculation circuit; 430 ... decision circuit; 440 ... switching circuit; 470 ... encoding circuit; 450 ... multiplexer; 460 ... gate circuit; 500 ... demultiplexer; 510 ... pitch decoding circuit; 520 ... K parameter decoding circuit; 530 ... sound source pulse decoding circuit.

Claims

[Claims]

1. A speech encoding method characterized in that: on the transmitting side, a discrete audio signal sequence is input; parameters representing a short-time spectrum including pitch fine structure are extracted and encoded; based on the parameters, the autocorrelation of an impulse response sequence corresponding to the short-time spectrum is calculated; the cross-correlation based on the audio signal sequence and the impulse response sequence is calculated; a drive excitation signal sequence for the audio signal sequence is determined using the autocorrelation and the cross-correlation and is encoded; and a code representing the drive excitation signal sequence and a code representing the parameters are combined and output; and, on the receiving side, the code sequence is input; the code sequence representing the drive excitation signal sequence and the code sequence of the parameters representing the short-time spectrum including the pitch fine structure are separated and decoded; and the audio signal sequence is reproduced using the decoded drive excitation signal sequence and the decoded parameters.

2. A speech encoding device comprising: a parameter calculation circuit that receives a discrete audio signal sequence and extracts and encodes, from the audio signal sequence, a pitch parameter representing the pitch fine structure and a spectral parameter representing the short-time spectral envelope; an autocorrelation calculation circuit that receives the output sequence of the parameter calculation circuit and calculates the autocorrelation coefficients of an impulse response sequence corresponding to the short-time spectrum including the pitch structure of the audio signal sequence; a cross-correlation calculation circuit that receives the audio signal sequence and the output sequence of the parameter calculation circuit and calculates the cross-correlation coefficients based on the audio signal sequence and the impulse response sequence corresponding to the short-time spectrum; a drive excitation signal sequence calculation circuit that receives the output sequence of the autocorrelation calculation circuit and the output sequence of the cross-correlation calculation circuit and obtains and encodes a drive excitation signal sequence for the audio signal sequence; and a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit and the output code sequence of the drive excitation signal sequence calculation circuit.

3. A speech encoding/decoding device comprising: a parameter calculation circuit that receives a discrete audio signal sequence and extracts and encodes, from the audio signal sequence, a pitch parameter representing the pitch fine structure and a spectral parameter representing the short-time spectral envelope; an autocorrelation calculation circuit that receives the output sequence of the parameter calculation circuit and calculates the autocorrelation coefficients of an impulse response sequence corresponding to the short-time spectrum including the pitch structure of the audio signal sequence; a cross-correlation calculation circuit that receives the audio signal sequence and the output sequence of the parameter calculation circuit and calculates the cross-correlation coefficients based on the audio signal sequence and the impulse response sequence corresponding to the short-time spectrum; a drive excitation signal sequence calculation circuit that receives the output sequence of the autocorrelation calculation circuit and the output sequence of the cross-correlation calculation circuit and obtains and encodes a drive excitation signal sequence for the audio signal sequence; a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit and the output code sequence of the drive excitation signal sequence calculation circuit; a demultiplexer circuit that receives the code sequence obtained by the combination and separates it into the code sequence representing the drive excitation signal sequence, the code sequence representing the pitch parameter, and the code sequence representing the spectral parameter; a drive excitation decoding circuit that receives and decodes the code sequence representing the drive excitation signal sequence obtained by the separation; a parameter decoding circuit that receives and decodes the code sequence representing the pitch parameter and the code sequence representing the spectral parameter obtained by the separation; and a synthesis filter circuit that reproduces and outputs an audio signal sequence using the output sequence of the drive excitation decoding circuit and the output of the parameter decoding circuit.