JPH0426119B2

Info

Publication number: JPH0426119B2
Application number: JP57231605A
Authority: JP
Inventors: Kazunori Ozawa
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1982-12-24
Filing date: 1982-12-24
Publication date: 1992-05-06
Also published as: JPS59116794A

Description

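DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a low-bit-rate waveform coding system for speech signals, and in particular to a coding and decoding apparatus that keeps the transmitted information rate at or below 10 kbit/s.

A well-known and effective way to code a speech signal at a rate of about 10 kbit/s or less is to search, over short time intervals, for the excitation signal sequence of the speech signal, under the condition that the error between the input signal and the signal reproduced from that excitation is minimized.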
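Depending on the search technique, these methods are known as tree coding or vector quantization. Apart from these, a method has recently been proposed in which a set of pulses representing the excitation signal sequence is determined sequentially, over short time intervals, on the encoder side by analysis-by-synthesis (A-b-S). The present invention relates to this method. Its details are given in the paper by B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proceedings of ICASSP, 1982, pp. 614-617 (Reference 1), so only a brief explanation follows.

FIG. 1 is a block diagram of the encoder-side processing in the conventional method described in Reference 1. In the figure, 100 denotes the encoder input terminal, to which the A/D-converted speech signal sequence x(n) is input. 110 is a buffer memory circuit that stores one frame of the speech signal sequence (for example 10 ms, which is 80 samples at 8 kHz sampling). The output of 110 is supplied to a subtracter 120 and to a K-parameter calculation circuit 180. Reference 1 speaks of reflection coefficients rather than K parameters, but these are the same parameters. Using the output of 110, the K-parameter calculation circuit 180 computes by the covariance method sixteen K parameters K_i (1 ≤ i ≤ 16) representing the speech spectrum of each frame, and outputs them to a synthesis filter 130.

140 is an excitation pulse generation circuit that generates a predetermined number of pulses per frame. This pulse sequence is denoted d(n). FIG. 2 shows an example of an excitation pulse sequence generated by 140; the horizontal axis is discrete time and the vertical axis is amplitude. The example shows eight pulses generated within one frame. The pulse sequence d(n) generated by 140 drives the synthesis filter 130. The synthesis filter 130 receives d(n), computes the reproduced signal x̂(n) corresponding to the speech signal x(n), and outputs it to the subtracter 120. To do this, the synthesis filter 130 receives the K parameters K_i, converts them to prediction parameters a_i (1 ≤ i ≤ 16), and uses the a_i to compute x̂(n). In terms of d(n) and a_i, x̂(n) can be written as

$$\hat{x}(n) = d(n) + \sum_{i=1}^{P} a_i\,\hat{x}(n-i) \tag{1}$$

where P is the order of the synthesis filter, here P = 16.

The subtracter 120 computes the difference e(n) between the original signal x(n) and the reproduced signal x̂(n) and outputs it to a weighting circuit 190. The circuit 190 receives e(n) and, using a weighting function w(n), computes the weighted error e_w(n) as

$$e_w(n) = w(n) * e(n) \tag{2}$$

where the symbol "*" denotes convolution. The weighting function w(n) applies a weighting along the frequency axis. Writing its z-transform as W(z), it is expressed in terms of the prediction parameters a_i of the synthesis filter as

$$W(z) = \frac{1 - \sum_{i=1}^{P} a_i z^{-i}}{1 - \sum_{i=1}^{P} a_i r^{i} z^{-i}} \tag{3}$$

where r is a constant with 0 ≤ r ≤ 1 that determines the frequency characteristic of W(z). With r = 1, W(z) = 1 and the frequency characteristic is flat. With r = 0, W(z) is the inverse of the frequency characteristic of the synthesis filter. The characteristic of W(z) can therefore be varied through the value of r.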
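As an illustration of equations (1) and (3), the following minimal Python sketch builds the taps of W(z) from a set of prediction parameters and applies the filters sample by sample. The function names (`weighting_coeffs`, `iir_filter`, `synthesize`) and the example coefficients are assumptions made for this sketch and do not come from the patent; indices are 0-based, and the signal is taken to be zero before the first sample.

```python
def weighting_coeffs(a, r):
    """Return (numerator, denominator) taps of W(z) as in equation (3).

    a : prediction parameters a_1..a_P (hypothetical example values)
    r : weighting constant, 0 <= r <= 1
    """
    num = [1.0] + [-ai for ai in a]
    den = [1.0] + [-ai * r ** (i + 1) for i, ai in enumerate(a)]
    return num, den


def iir_filter(num, den, x):
    """Direct-form recursion:
    den[0]*y(n) = sum_k num[k]*x(n-k) - sum_{k>=1} den[k]*y(n-k),
    with x and y assumed zero before n = 0."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc / den[0])
    return y


def synthesize(a, d):
    """Equation (1): x_hat(n) = d(n) + sum_i a_i * x_hat(n - i)."""
    return iir_filter([1.0], [1.0] + [-ai for ai in a], d)
```

With r = 1 the numerator and denominator taps coincide, so W(z) = 1; with r = 0 the denominator reduces to 1 and W(z) undoes the synthesis filter, which matches the two limiting cases described in the text.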
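W(z) is made to depend on the frequency characteristic of the synthesis filter, as in equation (3), in order to exploit auditory masking. That is, where the spectral power of the input speech signal is large (for example near a formant), an error relative to the spectrum of the reproduced signal is hard to hear even if it is somewhat large. FIG. 3 shows an example of the spectrum of the input speech signal in one frame together with the frequency characteristic of W(z), here with r = 0.8. In the figure, the horizontal axis is frequency (up to 4 kHz) and the vertical axis is log amplitude (up to 60 dB). The upper curve is the speech spectrum and the lower curve is the frequency characteristic of the weighting function.

Returning to FIG. 1, the weighted error e_w(n) is fed back to an error minimization circuit 150. The circuit 150 stores one frame of e_w(n) values and uses them to compute the weighted squared error ε as

$$\varepsilon = \sum_{n=1}^{N} e_w(n)^2 \tag{4}$$

where N is the number of samples over which the squared error is computed. In the method of Reference 1 this interval is 5 ms, which corresponds to N = 40 at 8 kHz sampling. Next, the error minimization circuit 150 supplies pulse position and amplitude information to the excitation pulse generation circuit 140 so as to reduce the squared error ε of equation (4). The circuit 140 generates an excitation pulse sequence from this information. The synthesis filter 130 computes the reproduced signal x̂(n) with this pulse sequence as its excitation. The subtracter 120 then subtracts the newly obtained x̂(n) from the previously computed error e(n) between the original and reproduced signals, and takes the result as the new error e(n). The weighting circuit 190 receives e(n), computes the weighted error e_w(n), and feeds it back to the error minimization circuit 150. The circuit 150 again computes the squared error ε and adjusts the amplitudes and positions of the excitation pulses to reduce it. This series of steps, from generating the excitation pulse sequence to adjusting it by error minimization, is repeated until the number of pulses reaches a predetermined value, at which point the excitation pulse sequence is determined. This concludes the description of the conventional method.

In this method the information to be transmitted consists of the K parameters K_i (1 ≤ i ≤ 16) of the synthesis filter and the positions and amplitudes of the excitation pulses, and any transmission rate can be realized by choosing the number of pulses placed in one frame. For transmission rates of 10 kbit/s or less, the method yields good reproduced speech quality and is considered one of the effective approaches.

However, the conventional method has the drawback of requiring a very large amount of computation. This is because, when the positions and amplitudes of the pulses are calculated, the error and the squared error between the original signal and the signal reproduced from those pulses are computed and fed back in order to adjust the pulse positions and amplitudes, and because this process is repeated until the number of pulses reaches the predetermined value. In addition, the conventional method uses a fixed analysis frame length. When a frame boundary falls in a part of the input speech where the power is large, degradation caused by waveform discontinuity arises near the frame boundary in the reproduced signal and seriously impairs the reproduced speech quality.

An object of the present invention is to provide a high-quality speech coding and decoding apparatus that requires a relatively small amount of computation, shows almost no quality degradation near frame boundaries, and can be applied to transmission rates of 10 kbit/s or less.

The speech coding and decoding apparatus of the present invention comprises, on the transmitting side: a subtraction circuit that receives a discrete speech signal sequence and subtracts from it a response signal sequence derived from an excitation signal sequence obtained in the past; a parameter calculation circuit that extracts and codes parameters representing the short-time spectral envelope of the speech signal sequence or of the subtraction result; an impulse response sequence calculation circuit that calculates an impulse response sequence from the parameters representing the spectral envelope; a correlation function sequence calculation circuit that receives the output of the impulse response sequence calculation circuit and calculates a correlation function sequence; a cross-correlation function sequence calculation circuit that forms a target signal sequence from the subtraction result and calculates the cross-correlation function sequence between the target signal sequence and the impulse response sequence; an excitation signal sequence calculation circuit that receives the correlation function sequence and the cross-correlation function sequence and calculates and codes the excitation signal sequence of the speech signal sequence; a response signal sequence calculation circuit that calculates the response signal sequence derived from the excitation signal sequence; and a multiplexer circuit that combines and outputs the code sequence from the parameter calculation circuit and the code sequence of the excitation signal sequence. On the receiving side it comprises: a demultiplexer circuit that receives the code sequence and separates the code sequence of the excitation signal sequence from the code sequence of the parameters representing the spectral envelope; an excitation pulse sequence generation circuit that decodes the excitation signal sequence from the separated code sequence and generates an excitation pulse sequence; a decoding circuit that decodes the separated code sequence of the parameters representing the spectral envelope; and a synthesis filter circuit that reproduces and outputs the speech signal sequence using the decoded parameters representing the spectral envelope.

The speech coding apparatus of the present invention comprises: a subtraction circuit that receives a discrete speech signal sequence and subtracts a response signal sequence from it; a parameter calculation circuit that extracts and codes parameters representing the short-time spectral envelope of the speech signal sequence or of the output sequence of the subtraction circuit; an impulse response sequence calculation circuit that calculates an impulse response sequence from the parameters representing the spectral envelope; a correlation function sequence calculation circuit that receives the output of the impulse response sequence calculation circuit and calculates a correlation function sequence; a cross-correlation function sequence calculation circuit that calculates the cross-correlation function sequence between the impulse response sequence and either the output sequence of the subtraction circuit or a signal obtained by applying a predetermined correction to that output sequence; an excitation signal sequence calculation circuit that receives the correlation function sequence and the cross-correlation function sequence and calculates and codes the excitation signal sequence of the speech signal sequence; a response signal sequence calculation circuit that receives the excitation signal sequence and calculates the response signal sequence derived from it; and a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit and the code sequence of the excitation signal sequence.

The speech decoding apparatus of the present invention receives a code sequence produced as follows: a response signal sequence derived from an excitation signal sequence obtained in the past is subtracted from a discrete speech signal sequence; parameters representing the short-time spectral envelope of the speech signal sequence or of the subtraction result are extracted and coded; an excitation signal sequence is searched for and coded using a cross-correlation function sequence calculated from the subtraction result and an impulse response sequence obtained from those parameters, together with a correlation function sequence calculated from the impulse response sequence; and the result is combined with the code sequence of the parameters representing the spectral envelope. The decoding apparatus comprises: a demultiplexer circuit that separates the code sequence representing the excitation signal sequence from the code sequence of the parameters representing the spectral envelope; an excitation pulse sequence generation circuit that decodes the separated code sequence representing the excitation signal sequence and generates an excitation pulse sequence; a decoding circuit that decodes the separated code sequence of the parameters representing the spectral envelope; and a synthesis filter circuit that receives the output of the excitation pulse sequence generation circuit and reproduces and outputs the speech signal sequence using the output parameters of the decoding circuit.

One feature of the speech coding apparatus according to the present invention is the algorithm used to calculate the excitation pulse sequence.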
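This algorithm is therefore explained in detail first.

The excitation pulse sequence d(n) at an arbitrary time n within one frame is written as

$$d(n) = \sum_{k=1}^{K} g_k\,\delta_{n,m_k} \tag{5}$$

where δ_{n,m_k} is the Kronecker delta, equal to 1 when n = m_k and 0 when n ≠ m_k, and g_k is the amplitude of the pulse at position m_k. With the prediction parameters of the synthesis filter written as a_i (1 ≤ i ≤ N_p, where N_p is the order of the synthesis filter), the reproduced signal x̂(n) obtained by feeding d(n) to the synthesis filter is

$$\hat{x}(n) = d(n) + \sum_{i=1}^{N_p} a_i\,\hat{x}(n-i) \tag{6}$$

The weighted squared error J between the input speech signal x(n) and the reproduced signal x̂(n) over one frame can then be written as

$$J = \sum_{n=1}^{N} \bigl( (x(n) - \hat{x}(n)) * w(n) \bigr)^2 \tag{7}$$

where w(n) is the impulse response of the weighting circuit, which may for example have the same characteristic as in the conventional method.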
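N is the number of samples in one frame. Equation (7) can be rewritten as

$$J = \sum_{n=1}^{N} \bigl( x(n) * w(n) - \hat{x}(n) * w(n) \bigr)^2 \tag{8}$$

The term x̂(n) * w(n) is transformed as follows. Let

$$\hat{x}_w(n) = \hat{x}(n) * w(n) \tag{9}$$

Taking the z-transform of both sides of equation (9) gives

$$\hat{X}_w(z) = \hat{X}(z)\,W(z) \tag{10}$$

and X̂(z) can in turn be written as

$$\hat{X}(z) = H(z)\,D(z) \tag{11}$$

where D(z) is the z-transform of the excitation pulse sequence of equation (5) and H(z) is the z-transform of the impulse response of the synthesis filter. Substituting equation (11) into equation (10) gives

$$\hat{X}_w(z) = D(z)\,H(z)\,W(z) \tag{12}$$

Setting H_w(z) = H(z) W(z), taking the inverse z-transform of equation (12), and writing h_w(n) for the inverse z-transform of H_w(z) yields

$$\hat{x}_w(n) = d(n) * h_w(n) \tag{13}$$

where h_w(n) is the impulse response of the cascade of the synthesis filter and the weighting circuit. Substituting equation (5) into equation (13) gives

$$\hat{x}_w(n) = \sum_{i=1}^{K} g_i\,h_w(n - m_i) \tag{14}$$

where K is the number of pulses placed in one frame. Substituting equations (14) and (9) into equation (8), and writing x_w(n) = x(n) * w(n), gives

$$J = \sum_{n=1}^{N} \Bigl( x_w(n) - \sum_{i=1}^{K} g_i\,h_w(n - m_i) \Bigr)^2 \tag{15}$$

Equation (7) can thus be expressed in the form of equation (15).

Next, formulas are derived for the amplitudes g_k and positions m_k of the excitation pulses that minimize equation (15). Setting the partial derivative of equation (15) with respect to g_k to zero leads to equation (16), in which ψ_xh(·) denotes the cross-correlation function sequence calculated from x_w(n) and h_w(n), and ψ_hh(·) denotes the correlation function sequence of the impulse response h_w(n). A covariance function sequence or an autocorrelation function sequence is known as the correlation function sequence of an impulse response. The cross-correlation function sequence and the covariance function sequence are given by the equations below; in speech signal processing, ψ_hh(·) is often called the covariance function.

$$\psi_{xh}(-m_k) = \sum_{n=1}^{N} x_w(n)\,h_w(n - m_k) = \psi_{hx}(m_k), \quad 1 \le m_k \le N \tag{17}$$

$$\psi_{hh}(m_i, m_k) = \sum_{n=1}^{N-(m_i-m_k)} h_w(n - m_i)\,h_w(n - m_k), \quad 1 \le m_k \le N \tag{18}$$

According to equation (16), the amplitude g_k corresponding to a position m_k can be calculated with the pulse position m_k as a parameter.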
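Equation (16) itself is not reproduced in this text. A form consistent with equations (15), (17) and (18), and with the later remark that each new pulse is found after subtracting the influence of the earlier ones, would be g_k(m_k) = [ψ_xh(−m_k) − Σ_{i<k} g_i ψ_hh(m_i, m_k)] / ψ_hh(m_k, m_k). The Python sketch below assumes that form and pairs it with the selection rule the description states next (take the position with the largest |g_k|). It is an illustrative reconstruction, not the patent's own listing: the function names, the 0-based indexing, the choice to sum the covariance over the whole frame, and the rule of skipping positions that already hold a pulse are all assumptions.

```python
def cross_correlation(xw, hw):
    """psi_xh(-m) = sum_n x_w(n) * h_w(n - m), cf. equation (17); 0-based m."""
    N = len(xw)
    return [sum(xw[n] * hw[n - m] for n in range(m, N) if n - m < len(hw))
            for m in range(N)]


def covariance(hw, N):
    """psi_hh(mi, mk) = sum_n h_w(n - mi) * h_w(n - mk), summed over the frame
    (an assumed reading of equation (18))."""
    def h(n):
        return hw[n] if 0 <= n < len(hw) else 0.0
    return [[sum(h(n - mi) * h(n - mk) for n in range(N)) for mk in range(N)]
            for mi in range(N)]


def search_pulses(xw, hw, num_pulses):
    """Greedy pulse search using the assumed form of equation (16).

    xw : weighted target signal for one frame
    hw : weighted impulse response h_w(n), with hw[0] != 0
    Returns a list of (position, amplitude) pairs in the order found.
    """
    N = len(xw)
    psi_xh = cross_correlation(xw, hw)   # computed once per frame
    psi_hh = covariance(hw, N)           # computed once per frame
    pulses = []
    for _ in range(num_pulses):
        best = None
        for m in range(N):
            if any(m == p for p, _ in pulses):
                continue  # assumption: one pulse per position
            # Remove the contribution of pulses found so far, then normalise.
            num = psi_xh[m] - sum(g * psi_hh[p][m] for p, g in pulses)
            g = num / psi_hh[m][m]
            if best is None or abs(g) > abs(best[1]):
                best = (m, g)
        pulses.append(best)
    return pulses
```

Because both correlation tables are computed once per frame, each candidate amplitude costs only a few multiplications and subtractions plus one division, with no synthesis filtering inside the loop.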
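For each pulse, the position m_k is chosen as the one that maximizes |g_k|. This can be proved by solving equation (16) for g_i, but the proof is omitted here. This concludes the derivation of the algorithm.

Another feature of the speech coding apparatus according to the present invention is that there is almost no quality degradation near frame boundaries. This is explained next using an embodiment. FIG. 4 is a block diagram showing an example configuration of an encoder that uses the excitation pulse calculation algorithm of equation (16).

In the figure, components bearing the same numbers as in FIG. 1 operate in the same way as in FIG. 1, and their description is omitted. Each component in FIG. 4 performs the following processing once per frame, where N is the number of samples in one frame. The K-parameter calculation circuit 280 receives the speech signal sequence x(n) stored in the buffer memory circuit 110 and calculates N_p K parameters K_i (1 ≤ i ≤ N_p), N_p being a predetermined order. The K_i are output to a K-parameter coding circuit 200. The K-parameter coding circuit 200 codes the K_i, for example with a predetermined number of quantization bits, and outputs the codes l_Ki to a multiplexer 260. The K-parameter coding circuit 200 also decodes l_Ki and outputs the decoded values K_i′ (1 ≤ i ≤ N_p) to an impulse response calculation circuit 210, a weighting circuit 290 and a synthesis filter circuit 320. The impulse response calculation circuit 210 receives the K_i′, calculates a predetermined number of samples of h_w(n) in equation (13) (the impulse response of the cascade of the synthesis filter and the weighting circuit), and outputs h_w(n) to a covariance function calculation circuit 220 and a cross-correlation function calculation circuit 235.

The covariance function calculation circuit 220 receives the predetermined number of samples of h_w(n), calculates the covariance ψ_hh(m_i, m_k) (1 ≤ m_i, m_k ≤ N) of h_w(n) according to equation (18), and outputs it to a pulse sequence calculation circuit 240. Next, a subtracter 285 subtracts one frame of the output sequence of the synthesis filter circuit 320 from the speech signal sequence x(n) stored in the buffer memory circuit 110, and outputs the result to the weighting circuit 290. As described later, the synthesis filter circuit 320 holds one frame of a signal sequence obtained by computing the response to the excitation pulse sequence of the frame one frame before the current frame and then extending that response into the current frame with the drive signal set to zero. This rests on the following idea: if the number of significant samples in the impulse response of the synthesis filter is at most about two frames, the speech signal sequence of the current frame can be expressed as the sum of (a) the synthesis filter output driven by the excitation pulses of the previous frame and extended into the current frame with zero drive, and (b) the synthesis filter output driven by the excitation pulse sequence of the current frame.

The weighting circuit 290 receives the K_i′ from the K-parameter coding circuit 200 and calculates the weighting function w(n), for example according to equation (3) of the conventional method. Other frequency weighting methods may be used instead. The weighting circuit 290 also receives the output of the subtracter 285, convolves it with w(n), and outputs the resulting x_w(n) to the cross-correlation function calculation circuit 235. The cross-correlation function calculation circuit 235 receives x_w(n) and h_w(n), calculates the cross-correlation function ψ_xh(−m_k) (1 ≤ m_k ≤ N) according to equation (17), and outputs it to the pulse sequence calculation circuit 240.

The pulse sequence calculation circuit 240 receives ψ_xh(−m_k) from the cross-correlation function calculation circuit 235 and ψ_hh(m_i, m_k) (1 ≤ m_i, m_k ≤ N) from the covariance function calculation circuit 220, and calculates the pulse amplitudes g_k using the excitation pulse formula, equation (16). For the first pulse, for example, k = 1 is set in equation (16) and the amplitude g_1 is obtained as a function of the position m_1. The m_1 that maximizes |g_1| is then selected, and that m_1 and the corresponding g_1 are taken as the position and amplitude of the first pulse. The second pulse is obtained by setting k = 2 in equation (16). Equation (16) means that the second pulse is found after the influence of the first pulse has been subtracted. The third and later pulses are calculated in the same way. The calculation continues until a predetermined number of pulses is reached, or until the error obtained by substituting the g_k and m_k found so far into equation (15) falls to or below a predetermined threshold. The amplitudes g_k and positions m_k of the pulse sequence are output to a coding circuit 250.

The coding circuit 250 receives the amplitudes g_k and positions m_k of the excitation pulse sequence from the pulse sequence calculation circuit 240, codes them using a normalization coefficient described below, and outputs the codes representing g_k, m_k and the normalization coefficient to the multiplexer 260. It also decodes these codes and outputs the decoded values g_k′ and m_k′ to a pulse sequence generation circuit 300. Various coding methods are possible, and well-known methods can be used for the amplitudes g_k. One example is to assume that the amplitudes follow a normal distribution and to use the optimum quantizer for that distribution. This is described in detail in the paper by J. Max, "Quantizing for Minimum Distortion," IRE Transactions on Information Theory, March 1960, pp. 7-12 (Reference 2), and elsewhere, so the description is omitted here.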
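Another possible method is to take the maximum pulse amplitude within one frame as the normalization coefficient, normalize each pulse amplitude by this value, and then quantize and code the result. With the former method, the r.m.s. (root mean square) value within one frame can serve as the normalization coefficient. Various methods are also possible for coding the pulse positions. For example, run-length coding, well known in facsimile signal coding, may be used; it represents the length of each run of "0" symbols by a predetermined code sequence. The normalization coefficient can be coded by a well-known method such as logarithmic companding.

The coding of the pulse sequence is of course not limited to the methods described here; the best known method may be used.

Returning to FIG. 4, the pulse sequence generation circuit 300 uses the received g_k′ and m_k′ to build one frame of an excitation pulse sequence with amplitude g_k′ at position m_k′, and outputs it as the drive signal to the synthesis filter circuit 320. The synthesis filter circuit 320 receives the quantized K parameters K_i′ (1 ≤ i ≤ N_p) from the K-parameter coding circuit 200 and converts them to prediction parameters a_i (1 ≤ i ≤ N_p) by a well-known method. The synthesis filter circuit 320 then receives one frame of the excitation signal from the pulse sequence generation circuit 300, appends one frame of zeros to it, and computes the response signal sequence x̂′(n) to this two-frame signal. When computing the response to the zero signal of the second frame, the synthesis filter circuit 320 receives new K_i′ (1 ≤ i ≤ N_p) from the K-parameter coding circuit 200 and uses them. This is expressed by equation (19), in which the excitation signal d(n) is the pulse sequence output by the pulse sequence generation circuit 300 for 1 ≤ n ≤ N and is identically zero for N+1 ≤ n ≤ 2N. In equation (19), a_i^j denotes the prediction parameters calculated from the K_i′ (1 ≤ i ≤ N_p) of the current frame time j, and a_i^{j-1} denotes those calculated from the K_i′ of frame time j−1, one frame earlier. Of the x̂′(n) obtained from equation (19), the second-frame portion x̂′(n) (N+1 ≤ n ≤ 2N) is output to the subtracter 285.

The multiplexer 260 receives the output codes of the K-parameter coding circuit 200 and of the coding circuit 250, combines them, and outputs them to the channel through a transmitter output terminal 270. This concludes the description of the speech coding apparatus according to the present invention.

The speech decoding apparatus according to the present invention is described next. FIG. 5 shows an example configuration of the speech decoding apparatus. In the figure, a code sequence is input frame by frame through a decoder input terminal 350. A demultiplexer 360 separates this code sequence into the K-parameter code sequence, the code sequence representing the amplitudes and positions of the excitation pulses, and the code representing the normalization coefficient. It outputs the K-parameter code sequence to a K-parameter decoding circuit 380 and the remaining codes to a decoding circuit 370. The decoding circuit 370 first decodes the code representing the normalization coefficient, uses it to decode the code sequence of the excitation pulses, and outputs the pulse amplitudes g_k′ and positions m_k′ to a pulse sequence generation circuit 420. The pulse sequence generation circuit 420 operates in the same way as the pulse sequence generation circuit 300 on the encoder side in FIG. 4: it generates the pulse sequence for one frame and outputs it to a synthesis filter circuit 440. The synthesis filter circuit 440 receives the N_p decoded K parameters K_i′ (1 ≤ i ≤ N_p) from the K-parameter decoding circuit 380 and converts them to prediction parameters a_i (1 ≤ i ≤ N_p). It then receives one frame of the excitation signal from the pulse sequence generation circuit 420 and uses it to reproduce one frame of the speech signal sequence.

Inside the synthesis filter 440, the response signal sequence obtained from the excitation pulse sequence of the previous frame is added to the reproduced signal sequence obtained from the excitation pulse sequence of the current frame, and the speech signal sequence is thereby reproduced. The reproduced speech signal sequence x̂(n) is output to a buffer memory circuit 470. The buffer memory circuit 470 stores one frame of x̂(n) and then outputs it through a decoder output terminal 410.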
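The frame-boundary handling described above can be sketched in a few lines of Python. The sketch assumes that the previous frame's filter starts from rest (in line with the stated assumption that the impulse response lasts at most about two frames) and that the zero-input extension into the current frame uses the current frame's prediction parameters, which is how the text describes equation (19). The function names and the example values are hypothetical.

```python
def synth_with_state(a, d, state):
    """Run x_hat(n) = d(n) + sum_i a_i * x_hat(n - i).

    state holds the last len(a) outputs, most recent first.
    Returns (output samples, updated state).
    """
    hist = list(state)
    out = []
    for dn in d:
        y = dn + sum(ai * hi for ai, hi in zip(a, hist))
        out.append(y)
        hist = [y] + hist[:-1]
    return out, hist


def reconstruct_frame(a_prev, a_cur, d_prev, d_cur):
    """Current-frame output as the sum of two parts:
    (a) ringing of the previous frame's pulses, extended with zero input,
    (b) response to the current frame's pulses starting from rest.
    a_prev and a_cur are assumed to have the same length.
    """
    rest = [0.0] * len(a_cur)
    # Previous frame: drive the filter with its pulses, keep only the final state.
    _, state = synth_with_state(a_prev, d_prev, rest)
    # (a) Zero-input extension into the current frame, using current parameters.
    ringing, _ = synth_with_state(a_cur, [0.0] * len(d_cur), state)
    # (b) Current pulses through the current filter, from rest.
    fresh, _ = synth_with_state(a_cur, d_cur, rest)
    return [r + f for r, f in zip(ringing, fresh)]
```

When the prediction parameters do not change between frames, this sum equals what a single filter run continuously across both frames would produce, by linearity. That is why subtracting the ringing at the encoder and adding it back at the decoder avoids a waveform discontinuity at the boundary.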
以上で本発明による音声復号化装置の説明を終了
する。本発明の構成によれば、音源パルス系列の計算
を(16)式に従つているので、文献１の従来方式に
見られたパルスにより合成フイルタを駆動し、再
生信号を求め、原信号との誤差及び２乗誤差をフ
イードバツクしてパルスを調整するという径路が
なく、またその処理をくり返す必要もないので、
演算量を大幅に減らすことが可能で、良好な再生
音質が得られるという大きな効果がある。更に(1
６）式の演算において、ψ_xh（−m_k）とψ_hh（m_i，m_k）
（１≦m_i，m_k≦Ｎ）の値を、１フレーム毎に、前
もつて計算しておくことによつて、(16)式の計算
は掛け算と引き算という非常に簡略化された演算
となり、更に演算量を減らすことができるという
効果がある。また、音源パルス系列を探索する他
の従来方式と比べても、本発明による方法は、同
一の伝送情報量の場合に、より良好な品質を得る
ことができるという効果がある。更に、本発明の構成によれば分析フレーム長が
一定でない場合は勿論のこと、分析フレーム長を
一定にした場合でも、波形の不連続に起因したフ
レームの境界近傍での再生信号の劣化がほとんど
ないという大きな効果がある。この効果は符号器
側において、現フレームの音源パルス系列を計算
する際に、１フレーム過去のフレームの音源パル
ス系列によつて合成フイルタを駆動して得た応答
信号系列を現フレームにまで伸ばして求め、これ
を入力音声信号系列から減算した結果を目標信号
系列として現フレームの音源パルス系列を計算す
るという構成にしたことによる。またこの効果は
復号器側において、受信し復号化して得た音源パ
ルスを駆動源として再生した信号系列と、１フレ
ーム過去の音源パルス系列に由来した応答信号系
列とを用いて音声信号系列を再生するという構成
にしたことによる。尚、前述の本発明の実施例においては、１フレ
ーム内の音源パルス系列の符号化は、パルス系列
が全て求まつた後に、第４図の構成要素２５０に
よつて符号化を施したが、符号化をパルス系列の
計算に含めて、パルスを１つ計算する毎に、符号
化を行ない、次のパルスを計算するという構成に
してもよい。このような構成をとることによつ
て、符号化の歪をも含めた誤差を最小とするよう
なパルス系列が求まるので、更に品質を向上させ
ることができる。また、前述の実施例においては、パルス系列の
計算はフレーム単位で行なつたが、フレームをい
くつかのサブフレームに分割し、そのサブフレー
ム毎にパルス系列を計算するような構成にしても
よい。この構成によれば、フレーム長をＮとすれ
ば、第４図に示した構成と比べて演算量を大略
１／ｄ倍にすることができる。ここでｄはフレーム分割数を示す。例えばｄ＝２とすれば、演算量は
約1/2にできる。勿論、同等の特性は得られる。更に本発明の構成によれば、第４図に示した符
号器側の実施例において、１フレーム過去の音源
パルス系列によつて合成フイルタ回路３２０を駆
動した後に、１フレーム全て０の音源パルス系列
を入力し、応答信号系列を現フレームにまで伸ば
して求めた。この場合に、１フレーム過去の音源
パルス系列によつて合成フイルタを駆動した際に
は１フレーム過去に入力されたＫパラメータ値を
そのまま用いたが、１フレーム全て０の音源パル
ス系列を入力した際には、現フレーム時刻に入力
されたＫパラメータ値を用いる構成とした。ここ
で、１フレーム全て０の音源パルス系列を入力し
た際にも、合成フイルタ回路３２０のＫパラメー
タ値としては１フレーム過去に入力されたＫパラ
メータ値をそのまま用いるような構成としてもよ
い。このような構成とした場合には、第５図復号
器側の構成も符号器側と同一の変更を必要とす
る。また、以上説明した構成例においては、短時間
音声信号系列のスペクトル包絡を表わすパラメー
タとしてはＫパラメータを用いたが、これはよく
知られている他のパラメータ（例えばLSPパラメ
ータ等）を用いてもよい。更に、前述の(7)式にお
ける重み付け関数ｗ(n)はなくてもよい。尚、デイジタル信号処理の分野でよく知られて
いるように、自己相関関数はパワスペクトルから
計算してもよい。また相互相関関数はクロス・パ
ワススペクトルから計算してもよい。また、本発明による音源パルス計算式(16)式に
おいては、ψ_hh（゜）として(18)式に従つて共分散
関数を計算したが、これは下式のような自己相
関々数列を計算するような構成にしてもよい。 Ψ_hh（｜m_i−m_k｜）＝Ｎ−｜mi−mk｜〓ｎ＝１h_w(n) h_w（ｎ−｜m_i−m_k｜，（１≦｜m_i−m_k｜≦Ｎ）
(20) このような構成をとることによつてψ_hh（゜）の
計算に要する演算量を大幅に低減させることが可
能となり、全体の演算量も低減できるという効果
がある。また、第４図に示した本発明の構成による符号
器の一実施例においては、バツフアメモリ回路１
１０の後ろに減算回路２８５をおく構成とした
が、減算回路２８５をバツフアメモリ回路１１０
の前におく構成としてもよい。更には、第４図に
おいてはＫパラメータ計算回路２８０は減算回路
２８５の前に接続されており、バツフアメモリ回
路１１０の出力系列を分析するような構成とした
が、Ｋパラメータ計算回路２８０を減算回路２８
５の後ろに接続して、２８５の出力系列を分析す
るような構成としてもよい。 DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a low bit rate waveform encoding system for audio signals, and in particular to an encoding/decoding apparatus that reduces the amount of transmitted information to 10 kbit/sec or less. An effective method for encoding an audio signal with a transmission information amount of about 10k bits/second or less is to minimize the error between the input signal and the signal reproduced using the driving sound source signal sequence of the audio signal. As a condition,
A method of searching at short intervals is well known.
These methods are called tree coding (TREE CODING) and vector quantization (VECTORQUANTIZATION) depending on their search method. In addition to these methods, a plurality of pulse sequences representing the driving sound source signal sequence are transmitted at short intervals.
On the encoder side, A-b-S( A NALYSIS- B Y-
Recently, a method has been proposed that attempts to obtain the information sequentially using the SYNTHESIS method. The present invention relates to this method. For details on this method, please refer to B.S.A.T.A.R.
Eye by S.ATAL) et al. C. A. S. S. Proceedings of ICASSP, 1982
“A. New Model of L.P.C. Excitement for Producing Natural Sounding Speech at Low Bit Rates” (“ A NEW MODEL OF LPC
EXCITATION FOR PRODUCING
NATURAL−SOUNDING SPEECH ATLOW
BIT RATES") (Reference 1), so we will briefly explain it here. Figure 1 shows the processing on the encoder side in the conventional method described in Reference 1. 1 is a block diagram. In the figure, 100 indicates an encoder input terminal, into which an A/D converted audio signal sequence x(n) is input. 110 is a buffer memory circuit;
1 frame of audio signal sequence (e.g. 10msec, 8K
In the case of Hz sampling, 80 samples) are accumulated. The output value of 110 is output to a subtracter 120 and a K parameter calculation circuit 180. However, according to Reference 1, the reflection.. is used instead of the K parameter. COE FICCIENTS (REFLECTION
COEFFICIENTS), which is the same parameter as the K parameter. Using the output value of 110, the K parameter calculation circuit 180 divides the K parameter Ki representing the audio signal spectrum for each frame into 16th order (1
≦i≦16) and outputs these to the synthesis filter 130. 140 is a sound source pulse generation circuit, which generates a predetermined number of pulse sequences in one frame. Here, this pulse sequence is denoted as d(n). An example of a sound source pulse sequence generated by 140 is shown in FIG. In FIG. 2, the horizontal axis represents discrete time, and the vertical axis represents amplitude. Here, a case is shown in which eight pulses are generated within one frame. The pulse sequence d(n) generated by 140 drives the synthesis filter 130. Synthesis filter 130
inputs d(n), obtains the reproduced signal x〓(n) corresponding to the audio signal x(n), and sends this to the subtracter 120.
Output to. Here, the synthesis filter 130 is K
Input the parameters Ki, convert them to prediction parameters ai (1≦i=16), and use ai to calculate x〓(n)
Calculate. x〓(n) can be expressed as shown below using d(n) and ai. x〓(n)=d(n)+ _P〓i ⁼¹ ai·x〓(n−i)−(1) In the above equation, P indicates the order of the synthesis filter, and here P=16. The subtracter 120 converts the original signal x
(n) and the reproduced signal x(n) is calculated and output to the weighting circuit 190. 190 inputs e(n), uses the weighting function w(n), and calculates the weighting error ew(n) according to the following equation. ew(n)=w(n) ^* e(n) −(2) In the above equation, the symbol “*” represents a convolution integral. Furthermore, the weighting function w(n) performs weighting on the frequency axis, and its Z-transformed value is
(Z), the prediction parameter ai of the synthesis filter is
It is expressed by the following formula using . W(Z)=(1- _P 〓 ⁱ⁼¹ aiZ _-i )/(1- _P 〓 ⁱ⁼¹ ai・r _i・Z _-i ) −(3) In the above equation, r is 0≦r≦1. It is a constant and determines the frequency characteristics of W(Z). That is, when r=1, W(Z)=1, and the frequency characteristics are flat. On the other hand, when r=0, W(Z) has a frequency characteristic opposite to that of the synthesis filter. Therefore, the characteristics of W(Z) can be changed depending on the value of r.
Furthermore, as shown in equation (3), W(Z) is determined depending on the frequency characteristics of the synthesis filter because an auditory masking effect is utilized. In other words, this is due to the perceptual property that even if the error with the spectrum of the reproduced signal is a little large, the error is hard to notice at a location where the input audio signal has a large spectral power (for example, near a formant). FIG. 3 shows an example of the spectrum of the input audio signal and the frequency characteristics of W(Z) in a certain frame. Here, r=0.8. In the figure, the horizontal axis shows frequency (maximum 4KHz) and the vertical axis shows logarithmic amplitude (maximum 60dB). Further, the upper curve represents the spectrum of the audio signal, and the lower curve represents the frequency characteristic of the weighting function. Returning to FIG. 1, the weighted error ew(n) is fed back to the error minimization circuit 150. The error minimization circuit 150 stores the values of ew(n) for one frame, and uses them to perform weighting 2 according to the following formula.
Calculate the multiplicative error ε. ε= _N 〓 ⁿ⁼¹ ew(n) ² −(4) Here, N indicates the number of samples for calculating the squared error. In the method of Reference 1, this time length is set to 5 msec.
This corresponds to N=40 in the case of 8KHz sampling. Next, the error minimization circuit 15
0 provides pulse position and amplitude information to the sound source pulse generation circuit 140 so as to reduce the squared error ε calculated by the above equation (4). 140 generates a sound source pulse sequence based on this information. The synthesis filter 130 uses this sound source pulse sequence as a driving source to calculate a reproduced signal x〓(n). Next, subtractor 12
0, the currently determined reproduced signal x〓(n) is subtracted from the previously calculated error e(n) between the original signal and the reproduced signal,
Let this be the new error e(n). Weighting circuit 1
90 inputs e(n), calculates a weighted error ew(n), and feeds it back to the error minimization circuit 150. 150 again calculates the squared error and
The amplitude and position of the sound source pulse sequence are adjusted to reduce this. In this way, a series of processes from generation of the sound source pulse sequence to adjustment of the sound source pulse sequence by error minimization are repeated until the number of pulses in the sound source pulse sequence reaches a predetermined number.
A sound source pulse sequence is determined. This concludes the explanation of the conventional method. In the case of this method, the information to be transmitted is the K parameter Ki (1≦i≦16) of the synthesis filter and the pulse position and amplitude of the sound source pulse sequence. Any transmission rate can be achieved. Furthermore, the transmission rate
It is considered to be one of the effective methods since good playback quality can be obtained for the range of 10Kbps or less. However, this conventional method has the disadvantage that the amount of calculation is extremely large. When calculating the amplitude of a pulse position in a sound source pulse sequence, this calculates the error and squared error between the reproduced signal and the original signal based on the pulse, and feeds them back to adjust the pulse position and amplitude. This is due to what you are doing. Furthermore, this is caused by repeating this process until the number of pulses reaches a predetermined value. Furthermore, according to this conventional method, the length of the analysis frame is fixed, and when frames are switched at a portion where the power of the input audio signal sequence is large, waveform irregularities occur near the frame boundaries in the reproduced signal sequence. There is a drawback that deterioration due to continuity occurs, which greatly impairs the quality of reproduced audio. The purpose of the present invention is to use a relatively small amount of calculations, almost no quality deterioration near frame boundaries,
The object of the present invention is to provide a high-quality audio encoding/decoding device that can be applied to transmission rates of 10 Kbps or less. The audio encoding/decoding device of the present invention includes, on the transmission side, a subtraction circuit that inputs a discrete audio signal sequence and subtracts a response signal sequence derived from a drive sound source signal sequence obtained in the past from the audio signal sequence. , a parameter calculation circuit that extracts and encodes a parameter representing the short-time spectral envelope of the audio signal sequence or the subtraction result; and an impulse response sequence calculation circuit that calculates an impulse response sequence based on the parameter representing the spectral envelope. a correlation sequence calculation circuit that inputs the output sequence of the impulse response sequence calculation circuit and calculates a correlation sequence, and creates a target signal sequence based on the subtraction result, and generates a target signal sequence and the impulse response sequence. a cross-correlation sequence calculation circuit that calculates a cross-correlation sequence with
a driving excitation signal sequence calculation circuit that inputs the correlation sequence and the cross-correlation sequence and calculates and encodes the driving excitation signal sequence of the audio signal sequence; A response signal sequence calculation circuit that calculates the response signal sequence, and a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit representing the spectral envelope and the code sequence of the drive excitation signal sequence, and on the receiving side, the code sequence a demultiplexer circuit that inputs a code sequence of the driving excitation signal sequence and separates a code sequence of the parameter representing the spectrum envelope, and a demultiplexer circuit that decodes the driving excitation signal sequence from the separated code sequence to generate an excitation pulse sequence. A sound source pulse sequence generation circuit to generate a sound source pulse sequence, a decoding circuit to decode a code sequence of parameters representing the separated and obtained spectral envelope, and a reproduction and output of an audio signal sequence using the parameters representing the decoded spectral envelope. The invention is characterized in that it has a synthesis filter circuit. The speech encoding device of the present invention includes a subtraction circuit that inputs a discrete speech signal series and subtracts a response signal series from the speech signal series, and a short-time spectral envelope of the speech signal series or the output system of the subtraction circuit. 
a parameter calculation circuit that extracts and encodes parameters representing the spectral envelope; an impulse response sequence calculation circuit that calculates an impulse response sequence based on the parameters representing the spectral envelope; a correlation sequence calculation circuit for calculating a number sequence; and a cross-correlation for calculating a cross-correlation sequence between the output sequence of the subtraction circuit or a signal obtained by applying a predetermined correction to the output sequence of the subtraction circuit and the impulse response sequence. a driving excitation signal sequence calculation circuit that inputs the correlated sequence and the cross-correlated sequence and calculates and encodes a driving excitation signal sequence of the audio signal sequence; and a driving excitation signal sequence calculation circuit that inputs the driving excitation signal sequence. a response signal sequence calculation circuit that calculates the response signal sequence derived from the driving excitation signal sequence; and a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit and the code sequence of the driving excitation signal sequence. It is characterized by having the following. The audio decoding device of the present invention subtracts a response signal sequence derived from a drive sound source signal sequence obtained in the past from a discrete audio signal sequence, and extracts a parameter representing the short-time spectral envelope of the audio signal sequence or the subtraction result. and encode it as
searching for and encoding a driving excitation signal sequence using a cross-correlation sequence calculated using the impulse response sequence obtained from the parameters and the subtraction result, and a correlation sequence calculated using the impulse response sequence; a demultiplexer circuit inputting a code sequence output by combining the code sequence of the parameter representing the spectral envelope and separating the code sequence representing the drive excitation signal sequence and the code sequence of the parameter representing the spectral envelope; an excitation pulse sequence generation circuit that generates an excitation pulse sequence by decoding a code sequence representing the driving excitation signal sequence obtained by the above-described method; and a decoding circuit that decodes a code sequence of parameters representing the spectral envelope obtained by separation. and a synthesis filter circuit which inputs the output sequence of the sound source pulse sequence generation circuit and reproduces and outputs the audio signal sequence using the output parameters of the decoding circuit. One of the features of the speech encoding device according to the present invention is the algorithm for calculating the sound source pulse sequence.
Therefore, in the following, this algorithm will first be explained in detail. First, the sound source pulse sequence d(n) at an arbitrary time n within one frame is expressed by the following equation. d(n)= _k 〓 ^k=1 g _k・δ _o , mk −(5) Here, δn, mk represent Kronetzker's delta, which is 1 when n=m _k and when n≠m _k is 0
It is. Furthermore, g _k represents the amplitude of the pulse at position m _k . The reproduced signal x〓(n) obtained by inputting d(n) to the synthesis filter is the predicted parameter of the synthesis filter a _i (1≦i≦N _p ; Here, N _p indicates the order of the synthesis filter) Then, it can be written as the following formula. x〓(n)=d(n)+ _Np 〓 ⁱ⁼¹ aix〓(n-i) −(6) Next, within one frame of the input audio signal x(n) and the playback signal x〓(n) The weighted squared error J can be written as follows. J= _N 〓 ⁿ⁼¹ ((x(n)−x〓(n))＊w(n)) ² −(7) Here, w(n) is the impulse response of the weighting circuit, and for example, in the conventional example They may have the same characteristics.
Further, N indicates the number of samples in one frame. Equation (7) can be further transformed as follows. J= _N 〓 ⁿ⁼¹ (x(n)＊w(n) −x〓(n)＊w(n)) ² −(8) Here, the term x〓(n)＊w(n) is as follows It is transformed according to the formula. Let x〓(w)(n)=x〓(n)＊w(n) −(9). When both sides of equation (9) are Z-transformed, it is multiplied by X〓(Z)=X〓(Z)・W(Z) −(10). X〓(Z) is further multiplied as follows. X〓(Z)=H(Z)・D(Z) −(11) Here, D(Z) is the Z transformation of the sound source pulse sequence equation (5), and H(Z) is the impulse response of the synthesis filter. Indicates the Z-transform value. Substituting equation (11) into equation (10), we get X _w (Z)=D(Z). H(Z). W(Z) −(12), and H _w (Z)=H(Z). Letting W(Z) be the inverse Z-transform of equation (12) and the inverse Z-transformed value of Hw(Z) to be hw(n), the following equation is obtained. x _w (n)=d(n)*h _w (n) −(13) Here, h _w (n) represents the impulse response of the cascade-connected filter of the synthesis filter and the weighting circuit. (1
Substitute equation (5) into equation 3) to obtain the following equation. X〓 _w (n)= _K 〓 ⁱ⁼¹ g _i h _w (n−mi ₎ −(14) Here, K indicates the number of pulses generated in one frame. Substituting equations (14) and (9) into equation (8), J= _N 〓 ⁿ⁼¹ (x _w (n)− _K 〓 ⁱ⁼¹ g _i h _w (n−m _i )) ² Multiply by −(15). Therefore, equation (7) can be expressed as equation (15). The amplitude of the sound source pulse sequence that minimizes equation (15)
The calculation formulas for g _k and position m _k are derived as follows. By partially differentiating equation (15) with respect to g _k and setting it to 0, the following equation is derived. Here, ψxh (·) represents the cross-correlation sequence calculated from Xw(n) and h _w (n). Also, ψhh(・) is
represents the correlation sequence of the impulse response hω(n). A covariance function sequence or an autocorrelation function sequence is known as a correlation sequence of an impulse response. The cross-correlation sequence and covariance function sequence can be expressed as follows. Note that ψ _hh (·) is often called a covariance function in the field of audio signal processing. ψ _xh (−m _k )= _N 〓 ⁿ⁼¹ x _w (n)h _w (n−m _k ) = ψ _hx (m _k ), (1≦m _k ≦N) −(17) ψ (m _i , m _k )= _N-(ni-nk) 〓 ^n-1 h _w (n-m _i ) h _w (n-m _k ), (1≦m _k ≦N) −(18) In equation (16) According to the above, the amplitude g _k corresponding to the position m _k can be calculated using the pulse position m _k as a parameter.
As for the pulse position m_k, it is sufficient to select, for each pulse, the m_k at which |g_k| is maximum. This can be proved by solving equation (16) for g_i, but the proof is omitted here. This concludes the explanation of the derivation of the algorithm.

Another feature of the speech encoding device according to the present invention is that there is almost no quality degradation near frame boundaries. This is explained next using an embodiment. FIG. 4 is a block diagram showing an example of the configuration of an encoder that uses the sound source pulse calculation algorithm of equation (16). In the figure, components given the same numbers as in FIG. 1 have the same functions as in FIG. 1, so their explanation is omitted here. In FIG. 4, each component performs the following processing for each frame. Let N be the number of samples in one frame.

The K parameter calculation circuit 280 receives the audio signal sequence x(n) stored in the buffer memory circuit 110, calculates the K parameters K_i (1≦i≦N_p), and outputs K_i to the K parameter encoding circuit 200. The K parameter encoding circuit 200 encodes K_i based on, for example, a predetermined number of quantization bits, and outputs the code l_ki to the multiplexer 260. The K parameter encoding circuit 200 also decodes l_ki and outputs the resulting decoded values K_i′ (1≦i≦N_p) to the impulse response calculation circuit 210, the weighting circuit 290 and the synthesis filter circuit 320.

The impulse response calculation circuit 210 receives K_i′ and calculates h_w(n) of equation (13), that is, the impulse response of the cascade of the synthesis filter and the weighting circuit, for a predetermined number of samples. The resulting h_w(n) is output to the covariance function calculation circuit 220 and the cross-correlation calculation circuit 235. The covariance function calculation circuit 220 receives the predetermined number of samples of h_w(n), calculates the covariance ψ_hh(m_i, m_k) (1≦m_i, m_k≦N) of h_w(n) according to equation (18), and outputs it to the pulse sequence calculation circuit 240.

Next, the subtracter 285 subtracts one frame of the output sequence of the synthesis filter circuit 320 from the audio signal sequence x(n) stored in the buffer memory circuit 110, and outputs the subtraction result to the weighting circuit 290. As described later, the synthesis filter circuit 320 holds one frame of a response signal sequence that is obtained by driving the filter with the sound source pulse sequence of the frame one frame in the past and then extending the response into the current frame with the drive signal set to 0. This rests on the following idea: if the impulse response of the synthesis filter has at most two frames of meaningful samples, the audio signal sequence of the current frame can be expressed as the sum of two signal sequences. The first is the synthesis filter output driven by the sound source pulses of one frame past and extended into the current frame with the drive signal set to 0. The second is the synthesis filter output driven by the sound source pulse sequence of the current frame.

The weighting circuit 290
inputs K_i′ from the K parameter encoding circuit 200 and calculates the weighting function w(n), for example according to equation (3) of the conventional method. Other frequency weighting methods may also be used. The weighting circuit 290 also receives the subtraction result of the subtracter 285, convolves it with w(n), and outputs the resulting x_w(n) to the cross-correlation calculation circuit 235. The cross-correlation calculation circuit 235 receives x_w(n) and h_w(n), calculates the cross-correlation function ψ_xh(−m_k) (1≦m_k≦N) according to equation (17), and outputs it to the pulse sequence calculation circuit 240.

Next, the pulse sequence calculation circuit 240 receives ψ_xh(−m_k) from the cross-correlation calculation circuit 235 and ψ_hh(m_i, m_k) (1≦m_i, m_k≦N) from the covariance function calculation circuit 220, and calculates the pulse amplitudes g_k using the sound source pulse calculation formula (16). For the first pulse, setting k=1 in equation (16) gives the amplitude g_1 as a function of the position m_1. The m_1 that maximizes |g_1| is selected, and this m_1 and the corresponding g_1 become the position and amplitude of the first pulse. The second pulse is then found by setting k=2 in equation (16). According to equation (16), this means that the second pulse is found after the influence of the first pulse has been subtracted. The third and subsequent pulses are calculated in the same way. Pulses continue to be calculated until either a predetermined number of pulses is reached or the error value obtained by substituting the g_k and m_k of the pulses found so far into equation (15) falls below a predetermined threshold. The g_k and m_k representing the amplitudes and positions of the pulse sequence are output to the encoding circuit 250.

The encoding circuit 250 receives the amplitudes g_k and positions m_k of the sound source pulse sequence from the pulse sequence calculation circuit 240, encodes them using a normalization coefficient described later, and outputs codes representing g_k, m_k and the normalization coefficient to the multiplexer 260. It also decodes these codes and outputs the decoded values g_k′ and m_k′ of g_k and m_k to the pulse sequence generation circuit 300. Various encoding methods are possible here. The amplitude g_k can be encoded with a conventionally well-known method. For example, the amplitude probability distribution may be assumed to be normal, and an optimal quantizer for the normal distribution may be used. This is explained in detail in the paper by J. Max entitled "Quantizing for Minimum Distortion", IRE Transactions on Information Theory, March 1960, pp. 7-12 (Reference 2), so the explanation is omitted here.
Another possible method is to use the maximum amplitude of the pulse sequence within one frame as the normalization coefficient, normalize each pulse amplitude by this value, and then quantize and encode it. In the former method, the rms (root mean square) value within one frame may be used as the normalization coefficient.

Various methods are also possible for encoding the pulse positions. For example, a run-length code, well known in the field of facsimile signal encoding, may be used. This represents the length of each run of "0" codes with a predetermined code sequence. The normalization coefficient can be encoded with conventionally well-known logarithmic compression encoding or the like. The encoding of the pulse sequence is not limited to the methods described here, and it goes without saying that the best known method can be used.

Returning to FIG. 4, the pulse sequence generation circuit 300 uses the input g_k′ and m_k′ to generate one frame of a sound source pulse sequence having amplitude g_k′ at position m_k′, and outputs it as the drive signal to the synthesis filter circuit 320. The synthesis filter circuit 320 receives the quantized K parameter values K_i′ (1≦i≦N_p) from the K parameter encoding circuit 200 and converts them into prediction parameters a_i (1≦i≦N_p) using a well-known method. Next, the synthesis filter circuit 320 receives one frame of the drive sound source signal from the pulse sequence generation circuit 300, appends one frame of zeros to it, and obtains the response signal sequence x̃′(n) for the resulting two frames of signal. When calculating the response to the zero sequence of the second frame, the synthesis filter circuit 320 takes in the new K_i′ (1≦i≦N_p) and uses them. This is expressed by equation (19). Here, the drive sound source signal d(n) is the output pulse sequence of the pulse sequence generation circuit 300 for 1≦n≦N and is all zeros for N+1≦n≦2N. In equation (19), a_i^j denotes the prediction parameters calculated from the K_i′ (1≦i≦N_p) of the current frame time j, and a_i^{j−1} denotes the prediction parameters calculated from the K_i′ of frame time j−1, one frame in the past. Of the x̃′(n) obtained by equation (19), the second frame x̃′(n) (N+1≦n≦2N) is output to the subtracter 285.

Next, the multiplexer 260 receives the output code of the K parameter encoding circuit 200 and the output code of the encoding circuit 250, combines them, and outputs them from the transmission side output terminal 270 to the communication path. This concludes the description of the speech encoding device according to the present invention.

Next, a speech decoding device according to the present invention is explained. FIG. 5 shows an example of the configuration of a speech decoding device according to the present invention. In the figure, a code sequence is input frame by frame from the decoder input terminal 350. The demultiplexer 360 separates this code sequence into the K parameter code sequence, the code sequence representing the amplitudes and positions of the sound source pulse sequence, and the code representing the normalization coefficient. The K parameter code sequence is output to the K parameter decoding circuit 380, and the remaining code sequences are output to the decoding circuit 370. The decoding circuit 370 first decodes the code representing the normalization coefficient, uses it to decode the code sequence of the sound source pulse sequence, and outputs the pulse amplitudes g_k′ and positions m_k′ to the pulse sequence generation circuit 420. The pulse sequence generation circuit 420 performs the same operation as the pulse sequence generation circuit 300 on the encoder side in FIG. 4: it generates the pulse sequence for one frame and outputs it to the synthesis filter circuit 440.

The synthesis filter circuit 440 receives the N_p decoded K parameter values K_i′ (1≦i≦N_p) from the K parameter decoding circuit 380 and converts them into prediction parameters a_i (1≦i≦N_p). It then receives one frame of the drive sound source signal from the pulse sequence generation circuit 420 and uses it to reproduce one frame of the audio signal sequence. Inside the synthesis filter circuit 440, the response signal sequence derived from the sound source pulse sequence of one frame past is added to the reproduced signal sequence derived from the sound source pulse sequence of the current frame, and the audio signal sequence is thereby reproduced. The reproduced audio signal sequence x̃(n) is output to the buffer memory circuit 470. After accumulating one frame of x̃(n), the buffer memory circuit 470 outputs it through the decoder side output terminal 410.
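The decoder-side reproduction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, positions are 0-based, the synthesis filter is taken to have the all-pole form of equation (1), x̃(n) = d(n) + Σ a_i x̃(n−i), and a single fixed set of prediction parameters is used for all frames, so the frame-by-frame update of K_i′ is ignored.

```python
def pulses_to_excitation(positions, amplitudes, frame_len):
    """Place decoded amplitudes g_k' at decoded positions m_k' (0-based here)."""
    d = [0.0] * frame_len
    for m, g in zip(positions, amplitudes):
        d[m] += g
    return d


def synthesize(d, a):
    """All-pole synthesis from rest: x(n) = d(n) + sum_i a[i-1] * x(n - i)."""
    x = []
    for n, d_n in enumerate(d):
        acc = d_n
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * x[n - i]
        x.append(acc)
    return x


def decode_frames(frames, a, frame_len):
    """Reproduce the signal frame by frame.

    Each frame's pulses drive the filter for one frame followed by one frame
    of zeros; the second half of that response is added into the next frame.
    """
    tail = [0.0] * frame_len  # response carried over from the previous frame
    output = []
    for positions, amplitudes in frames:
        d = pulses_to_excitation(positions, amplitudes, frame_len)
        response = synthesize(d + [0.0] * frame_len, a)  # two-frame response
        output.extend(c + t for c, t in zip(response[:frame_len], tail))
        tail = response[frame_len:]
    return output


a = [0.5]  # toy first-order predictor (assumed value)
frames = [([1, 6], [2.0, -1.0]), ([3], [1.5])]
speech = decode_frames(frames, a, frame_len=8)
```

Because the filter is linear, adding the carried-over response to the current frame's response gives the same waveform as running the filter continuously across the frame boundary, as long as the response of a frame has died out within two frames. That is the assumption stated for the encoder as well.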
This concludes the description of the speech decoding device according to the present invention. According to the configuration of the present invention, the sound source pulse sequence is calculated according to equation (16). There is therefore no path, as seen in the conventional method of Reference 1, in which the synthesis filter is driven by the pulses, the reproduced signal is obtained, and the error and squared error against the original signal are fed back to adjust the pulses, and there is no need for iterative processing.
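The non-iterative pulse search can be sketched roughly as follows. Equation (16) is not reproduced in this section, so the sketch assumes the form suggested by the description: the amplitude at a candidate position is the cross-correlation at that position, minus the contributions of the pulses already found, divided by ψ_hh(m_k, m_k). The names `correlations` and `find_pulses` are hypothetical, indices are 0-based, only the fixed pulse count is used as the stopping rule, and the error threshold of equation (15) is left out.

```python
def correlations(x_w, h_w):
    """Build psi_xh (cf. equation (17)) and psi_hh (cf. equation (18)) for one frame."""
    N = len(x_w)

    def h(n):
        return h_w[n] if 0 <= n < len(h_w) else 0.0

    psi_xh = [sum(x_w[n] * h(n - m) for n in range(N)) for m in range(N)]
    psi_hh = [[sum(h(n - mi) * h(n - mk) for n in range(N)) for mk in range(N)]
              for mi in range(N)]
    return psi_xh, psi_hh


def find_pulses(psi_xh, psi_hh, num_pulses):
    """Greedy pulse search using the assumed form of equation (16).

    For each new pulse, every candidate position m gets the amplitude
        g(m) = (psi_xh[m] - sum_i g_i * psi_hh[m_i][m]) / psi_hh[m][m]
    and the position with the largest |g(m)| is kept.
    """
    pulses = []  # (position, amplitude) pairs found so far
    for _ in range(num_pulses):
        best = None
        for m in range(len(psi_xh)):
            if psi_hh[m][m] == 0.0:
                continue  # no energy at this position; avoid division by zero
            residual = psi_xh[m] - sum(g * psi_hh[mi][m] for mi, g in pulses)
            g_m = residual / psi_hh[m][m]
            if best is None or abs(g_m) > abs(best[1]):
                best = (m, g_m)
        if best is None:
            break
        pulses.append(best)
    return pulses


# Toy target (assumed data): 2 * h_w at position 1 plus -1 * h_w at position 5.
h_w = [1.0, 0.5, 0.25]
x_w = [0.0, 2.0, 1.0, 0.5, 0.0, -1.0, -0.5, -0.25]
pulses = find_pulses(*correlations(x_w, h_w), num_pulses=2)
print(pulses)  # [(1, 2.0), (5, -1.0)]
```

Each pulse costs one pass over the precomputed correlation tables, and the synthesis filter is never re-run inside the loop.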
This greatly reduces the amount of calculation while providing good reproduction quality. Furthermore, if the values of ψ_xh(−m_k) and ψ_hh(m_i, m_k) (1≦m_i, m_k≦N) are calculated in advance for each frame, the calculation of equation (16) reduces to very simple multiplications and subtractions, which further reduces the amount of calculation. In addition, compared with other conventional methods of searching for the sound source pulse sequence, the method according to the present invention obtains better quality for the same amount of transmitted information.

Furthermore, according to the configuration of the present invention, the reproduced signal suffers almost no degradation near frame boundaries due to waveform discontinuity, whether or not the analysis frame length is constant. On the encoder side, this is because, when the sound source pulse sequence of the current frame is calculated, the response signal sequence obtained by driving the synthesis filter with the sound source pulse sequence of one frame past is extended into the current frame and subtracted from the input audio signal sequence, and the result is used as the target signal sequence for calculating the sound source pulse sequence of the current frame. On the decoder side, it is because the audio signal sequence is reproduced from both the signal sequence synthesized with the received and decoded sound source pulses as the driving source and the response signal sequence derived from the sound source pulse sequence of one frame past.

In the embodiment described above, the sound source pulse sequence within one frame is encoded by the encoding circuit 250 in FIG. 4 after all pulses have been determined.
It may also be configured so that encoding is included in the pulse sequence calculation: each time one pulse is calculated, it is encoded before the next pulse is calculated. With this configuration, a pulse sequence that minimizes the error including encoding distortion can be found, so quality can be further improved.

In the embodiment described above, the pulse sequence was calculated frame by frame, but the frame may be divided into several subframes and the pulse sequence calculated for each subframe. With this configuration, for a frame length of N, the amount of calculation is approximately 1/d of that of the frame-by-frame configuration described above, where d is the number of subframes into which the frame is divided. For example, with d=2 the amount of calculation is reduced to about 1/2, while equivalent characteristics are obtained.

Furthermore, in the encoder-side embodiment described above, the response signal sequence was extended into the current frame by inputting one frame of an all-zero sound source pulse sequence to the synthesis filter circuit 320. In that case, while the synthesis filter is driven by the sound source pulse sequence of one frame past, the K parameter values input one frame in the past are used as they are, whereas while the all-zero sound source pulse sequence is input for one frame, the K parameter values input at the current frame time are used. Alternatively, even while the all-zero sequence is input, the K parameter values input one frame in the past may continue to be used as the K parameter values of the synthesis filter circuit 320. With that configuration, the decoder side shown in FIG. 5 requires the same change as the encoder side.
In the configuration example described above, the K parameters were used as the parameters representing the spectral envelope of the short-time audio signal sequence, but other well-known parameters (for example, LSP parameters) may be used instead. The weighting function w(n) in equation (7) may also be omitted. As is well known in the field of digital signal processing, the autocorrelation function may be calculated from the power spectrum, and the cross-correlation function may likewise be calculated from the cross-power spectrum. In addition, in the sound source pulse calculation formula (16) according to the present invention, ψ_hh(·) was calculated as a covariance function according to equation (18), but it may instead be calculated as an autocorrelation sequence as in the following equation.

$$\psi_{hh}(|m_i-m_k|) = \sum_{n=1}^{N-|m_i-m_k|} h_w(n)\, h_w(n-|m_i-m_k|), \quad (1 \le |m_i-m_k| \le N) \tag{20}$$

With this configuration, the amount of calculation required for ψ_hh(·) can be greatly reduced, and the overall amount of calculation is reduced as well.

Further, in the encoder embodiment shown in FIG. 4, the subtraction circuit 285 is placed after the buffer memory circuit 110, but it may instead be placed before the buffer memory circuit 110. Likewise, in FIG. 4 the K parameter calculation circuit 280 is connected before the subtraction circuit 285 and analyzes the output sequence of the buffer memory circuit 110, but it may instead be connected after the subtraction circuit 285 and analyze the output sequence of the subtraction circuit 285.
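The saving offered by the autocorrelation variant of equation (20) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions: the names `autocorrelation` and `psi_hh` are hypothetical, indices are 0-based, and the sum runs over every sample pair of the stored impulse response rather than following the exact limits printed in equation (20).

```python
def autocorrelation(h_w, N):
    """psi[lag] = sum over n of h_w(n) * h_w(n + lag), for lag = 0 .. N-1.

    Summing h_w(n) * h_w(n - lag) over all valid n gives the same value,
    so the sign convention of the lag does not matter here.
    """
    L = len(h_w)
    return [sum(h_w[n] * h_w[n + lag] for n in range(L - lag)) if lag < L else 0.0
            for lag in range(N)]


def psi_hh(psi, m_i, m_k):
    """Look up the correlation of two pulse positions from their distance only."""
    return psi[abs(m_i - m_k)]


h_w = [1.0, 0.5, 0.25]  # toy impulse response (assumed data)
psi = autocorrelation(h_w, N=8)
```

Only N values are computed and stored per frame instead of an N-by-N table, because the result depends only on the distance between the two positions. Unlike the covariance of equation (18), this ignores the truncation of the shifted responses at the end of the frame, so the two variants give different values for positions near the frame boundary.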

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of the conventional method. FIG. 2 is a diagram showing an example of a sound source pulse sequence. FIG. 3 is a diagram showing an example of the frequency characteristics of the input audio signal sequence and of the weighting circuit shown in FIG. 1. FIG. 4 is a block diagram showing an embodiment of the speech encoding device according to the configuration of the present invention. FIG. 5 is a block diagram showing an embodiment of the speech decoding device according to the configuration of the present invention.

In the figures, 110, 470 ... buffer memory circuit; 120, 285 ... subtraction circuit; 130, 320, 440 ... synthesis filter circuit; 140, 300, 420 ... sound source pulse generation circuit; 150 ... error minimization circuit; 180, 280 ... K parameter calculation circuit; 190, 290 ... weighting circuit; 200 ... K parameter encoding circuit; 240 ... sound source pulse calculation circuit; 210 ... impulse response calculation circuit; 220 ... covariance function calculation circuit; 235 ... cross-correlation function calculation circuit; 250 ... encoding circuit.

Claims

[Scope of Claims]

1. A speech encoding and decoding device comprising: on the transmission side, a subtraction circuit that receives a discrete audio signal sequence and subtracts from the audio signal sequence a response signal sequence derived from a driving sound source signal sequence obtained in the past; a parameter calculation circuit that extracts and encodes a parameter representing the short-time spectral envelope of the audio signal sequence or of the subtraction result; an impulse response sequence calculation circuit that calculates an impulse response sequence based on the parameter representing the spectral envelope; a correlation sequence calculation circuit that receives the output sequence of the impulse response sequence calculation circuit and calculates a correlation sequence; a cross-correlation sequence calculation circuit that generates a target signal sequence based on the subtraction result and calculates a cross-correlation sequence between the target signal sequence and the impulse response sequence; a driving sound source signal sequence calculation circuit that receives the correlation sequence and the cross-correlation sequence and calculates and encodes a driving sound source signal sequence of the audio signal sequence; a response signal sequence calculation circuit that calculates the response signal sequence derived from the driving sound source signal sequence; and a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit representing the spectral envelope and the code sequence of the driving sound source signal sequence; and, on the receiving side, a demultiplexer circuit that receives the code sequence and separates the code sequence of the driving sound source signal sequence from the code sequence of the parameter representing the spectral envelope; a sound source pulse sequence generation circuit that decodes the separated driving sound source signal sequence to generate a sound source pulse sequence; a decoding circuit that decodes the separated code sequence of the parameter representing the spectral envelope; and a synthesis filter circuit that reproduces and outputs an audio signal sequence from the sound source pulse sequence using the decoded parameter representing the spectral envelope.

2. A speech encoding device comprising: a subtraction circuit that receives a discrete audio signal sequence and subtracts a response signal sequence from the audio signal sequence; a parameter calculation circuit that extracts and encodes a parameter representing the short-time spectral envelope of the audio signal sequence or of the output sequence of the subtraction circuit; an impulse response sequence calculation circuit that calculates an impulse response sequence based on the parameter representing the spectral envelope; a correlation sequence calculation circuit that receives the output sequence of the impulse response sequence calculation circuit and calculates a correlation sequence; a cross-correlation sequence calculation circuit that calculates a cross-correlation sequence between the impulse response sequence and either the output sequence of the subtraction circuit or a signal obtained by applying a predetermined correction to that output sequence; a driving sound source signal sequence calculation circuit that receives the correlation sequence and the cross-correlation sequence and calculates and encodes a driving sound source signal sequence of the audio signal sequence; and a multiplexer circuit that combines and outputs the output code sequence of the parameter calculation circuit and the code sequence of the driving sound source signal sequence.