JP3628268B2

JP3628268B2 - Acoustic signal encoding method, decoding method and apparatus, program, and recording medium

Info

Publication number: JP3628268B2
Application number: JP2001069894A
Authority: JP
Inventors: 茂明佐々木; 一則間野; 丈太郎池戸; 祐介日和▲崎▼
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-03-13
Filing date: 2001-03-13
Publication date: 2005-03-09
Anticipated expiration: 2021-03-13
Also published as: JP2002268696A

Description

【０００１】
【発明の属する技術分野】
この発明は、入力信号と符号化による合成信号との誤差が最小となるように符号を決定する音響信号符号化・復号化において、復号化すべきフレームの符号化符号を復号化器で受信できなかった場合の出力信号のフレーム消失補償を行う音響信号符号化・復号化法及び装置並びに音響信号符号化・復号化プログラム及び記録媒体するに関する。
【０００２】
【従来の技術】
従来において、音響信号を線形予測符号化により低ビットレートに符号化する方法の典型としてＣＥＬＰ（ＣｏｄｅＥｘｃｉｔｅｄＬｉｎｅａｒＰｒｅｄｉｃｔｉｏｎ：符号励振線形予測）があげられる。
図１に従来のＣＥＬＰ符号化器の構成を示す。
入力端子１１からの入力音響信号は５〜２０ｍｓ程度のフレーム毎に線形予測分析手段１２で線形予測分析されｐ次の線形予測係数α_ｉ＾，ｉ＝１，・・・，ｐが求められ、この線形予測係数α_ｉ＾は量子化手段１３で量子化され、この量子化線形予測係数α_ｉは線形予測合成フィルタ１４にフィルタ係数として設定される。
【０００３】
線形予測合成フィルタ１４の伝達関数は以下の式で表される。
【０００４】
【数１】

線形予測合成フィルタ１４の励振信号が適応符号帳２０に格納され、制御手段１７からの符号に応じたピッチ周期に基づいて励振信号（ベクトル）が適応符号帳２０から切り出され、これをフレーム長もしくはサブフレーム（フレームを分割したもの）長の分だけ繰り返し、利得付与手段１８、利得制御手段２２により利得が付与され、加算手段２５を通じて励振信号として線形予測合成フィルタ１４へ供給される。
【０００５】
減算手段１５で入力信号から線形予測合成フィルタ１４よりの合成信号が差し引かれ、その差信号は聴覚重み付けフィルタ１６で聴覚特性のマスキング特性と対応した重み付けがなされ、制御手段１７によりこの重み付けされた差信号のエネルギーが最小となるように適応符号帳２０からの符号（つまりピッチ周期）が探索される。
その後、制御手段１７により固定符号帳２１から励振ベクトルが順次取り出され、利得付与手段１８、利得制御手段２３で利得が付与された後、先に選択された適応符号帳２０からの励振ベクトルに加算手段２５により加算されて励振信号として線形予測合成フィルタ１４へ供給され、先の場合と同様で聴覚重み付けフィルタ１６よりの差信号のエネルギーが最小となる励振ベクトルが選択され、これに対応付けられる符号帳符号が決定される。
【０００６】
最後に、これら選択された適応符号帳２０及び固定符号帳２１からの各励振ベクトルに対して、それぞれ利得付与手段１８で付与する各利得が最適となるように、前述と同様に聴覚重み付けフィルタ１６の出力信号のエネルギーが最小となる利得が選択され、この利得に対応付けられる利得符号が決定される。
また、生成された励振信号は過去の励振信号をバッファリングしている適応符号帳２０に格納される。
各々得られた符号、すなわち、ピッチ符号（適応符号帳符号）、固定符号帳符号、ピッチ符号利得、固定符号利得及び線形予測係数はフレーム単位にまとめられ復号化器に送信される。
【０００７】
図２にこのＣＥＬＰ符号化に対する復号化器の構成を示す。
入力端子３１からの入力符号中の線形予測係数符号が復号化手段３２で復号化され、線形予測合成フィルタ３３にフィルタ係数として設定される。
入力符号中のピッチ符号により適応符号帳４０から励振ベクトルが切り出され、また固定符号帳符号により固定符号帳４１から励振ベクトルが選択され、これら符号帳４０，４１からの各励振ベクトルは利得付与手段３６、利得制御手段４２，４３で入力符号中の利得（ピッチ符号利得、固定符号利得）符号に応じてそれぞれ利得が付与された後、加算手段４５で加算されて線形予測合成フィルタ３３に励振信号として与えられる。また、励振信号は過去の励振信号をバッファリングしている適応符号帳４０に格納される。
【０００８】
線形予測合成フィルタ３３からの合成信号はポストフィルタ３８で、量子化雑音が聴覚特性を考慮して小さくなるように処理され、出力端子３９より音響信号が出力される。
復号化器において入力端子３１からの入力符号がフレーム単位で受信されなかった場合（以降フレーム消失）、従来技術では復号化器において、過去の合成信号を分析もしくはすでに受信されている過去の情報から線形予測係数、ピッチ周期等を推定し、これらの情報から擬似的に線形予測合成フィルタ３３を構成し、これに入力されるべき励振信号を求め、擬似出力信号を合成する。この手法では過去の受信情報のみから出力信号を補償するため、フレーム間でピッチ周期が変動（有声無声の変動も含む）した場合、ピッチ間隔の不一致による励振信号の不連続性が生じる。過去の励振信号を符号化器の適応符号帳２０及び復号化器の適応符号帳４０に格納し励振ベクトルとして用いるＣＥＬＰにおいては、消失フレーム以降の受信が回復したフレームにおいても符号化器の適応符号帳２０と復号化器の適応符号帳４０から出力される励振ベクトルが一致しないため、複数フレームにわたって波形の不連続性が持続し、聴感上大きな劣化をもたらす。
【０００９】
【発明が解決しようとする課題】
この発明では従来のＣＥＬＰ等の音響信号符号化・復号化において、フレーム消失の際に品質劣化が顕著となるピッチ周期が大きく変動する区間（有声無声の変動も含む）において、その劣化を抑えるフレーム消失補償手段を提供することを課題とする。
【００１０】
【課題を解決するための手段】
上記課題を解決するために、この発明は、符号化において、符号化対象の入力信号を含む現在のフレームに対して次フレーム以降の入力信号をバッファに格納し、このバッファに格納されている信号を分析して得られた周期性情報を現在のフレームで決定された符号化符号と併せて送信する。復号化において、復号化対象となるフレームの復号化符号が受信できなかった場合、直前フレームの符号化符号と、併せて受信されている周期性情報を用いて、出力信号を補償する。
【００１１】
【発明の実施の形態】
（実施例１）
図３に本発明の実施例１の構成を示す。
この実施例は図１に示した従来の符号化器に、符号化対象の入力信号を含む現在のフレームに対して次フレーム以降の入力信号をバッファに格納し、このバッファに格納されている信号を分析して得られた周期性情報を現在のフレームで決定された符号化符号と併せて送信する手段を加えたものである。
【００１２】
入力端子１１からの入力信号を、現在の符号化すべき対象のフレーム分以外に、次フレーム以降の符号化対象となるべき未来の入力信号をバッファ３５に格納しておく。バッファの長さは、フレームよりも短い長さから数フレーム分まであってもかまわない。上述の図１に示した従来の符号化方式で現在の符号化すべき対象のフレームについて各符号化符号が決定された後、出力信号を合成するために線形予測合成フィルタ１４の入力として用いられた励振信号は符号帳３７に格納される。符号帳３７に格納された信号系列に対して、制御手段４６からの符号に応じた位置から、バッファ３５の長さ分信号を切り出す。この際、切り出した信号がバッファ３５の長さに足りなければバッファ３５の長さになるまで切り出した信号を繰り返す。得られた信号系列を励振信号として線形予測合成フィルタ４４に入力し、量子化手段１３からの現在のフレームの線形予測係数あるいはこの線形予測係数から次フレーム線形予測係数推定器５０により得られた次のフレームの線形予測係数を用いて線形予測合成フィルタ４４を構成し、合成信号を得る。減算手段４５で、バッファ３５に格納されている信号と得られた合成信号との差信号を求め、その差信号は聴覚重み付けフィルタ４７で聴覚特性とマスキング特性と対応した重み付けがなされ、制御手段４６によりこの重み付けされた差信号のエネルギーが最小となるように符号帳３７から切り出し位置を探索し（バッファ３５内の信号のピッチ周期に相当する）、この切り出し位置に対応付けられる符号が決定される。
【００１３】
符号帳３７からの切り出し位置に対応付けられた符号を用いることにより少ない情報量により品質劣化の少ない復号を行うことができる。
この符号を周期性情報として、現在の符号化対象のフレームの符号化符号と併せて復号化器に送信する。
線形予測合成フィルタ４４は図１中の合成フィルタと同様に前述した式で表され、線形予測分析手段１２と量子化手段１３で得られた量子化線形予測係数α_ｉ、もしくはこの係数α_ｉをＬＳＰ（ＬｉｎｅＳｐｅｃｔｒｕｍＰａｉｒ）や偏自己相関係数に変換したものから次フレーム線形予測係数推定器５０で次フレーム以降のフィルタ係数を推定して設定する。また、バッファ３５の信号を線形予測分析し、量子化した係数から設定してもよいが、量子化係数に対応する符号も併せて復号化器に送信する必要がある。
【００１４】
また、現フレームの線形予測係数を用いて線形予測合成フィルタを設定することもできる。この場合には、次フレーム線形予測係数推定器は不要である。
（実施例２）
図４に本発明の実施例２の構成を示す。
この実施例は図３に示した実施例１と比較して、符号帳３７から切り出された信号系列に対して、利得付与手段４８、利得制御手段４９で利得が付与され、励振信号として線形予測合成フィルタ４４に入力される点が異なる。したがって、付与された利得に対応付けられる符号と、符号帳３７からの切り出し位置に対応付けられる符号とを併せて周期性情報とし、現在の符号化対象のフレームの符号化符号に併せて復号化器に送信する。
（実施例３）
図５に本発明の実施例３の構成を示す。
【００１５】
これは実施例１もしくは実施例２に示した符号化器に対応する復号化器として、図２に示した復号化器に復号化対象となるフレームの復号化符号が受信できなかった場合、直前フレームの符号化符号と併せて受信されている現在のフレームの周期性情報を用いて、出力信号を補償するフレーム消失補償手段を加えたものである。
入力端子３１からフレーム単位で与えられるべき入力符号が復号化器で受信されなかったとフレーム消失検出手段５１で判定された場合、切換スイッチ５５を線形予測合成フィルタ５４側へ切換え、直前のフレームの符号化符号と併せてすでに受信されている周期性情報、つまり、直前のフレームまでに励振信号として線形予測合成フィルタ３３に入力された信号系列を格納した符号帳５２から、入力符号を受信できなかった現在のフレームの励振信号を切り出す位置に対応する符号もしくはこの位置符号とこの位置に対応する利得符号を取り出す。符号帳５２から位置符号の示す切り出し位置よりフレーム長さ分の信号系列を切り出し、もしくは切り出した信号系列に利得付与手段５３から利得符号に対応付けられる利得を利得制御手段５６で付与し、励振信号とする。この際、切り出した信号系列がフレーム長さより短い場合、この信号系列をフレーム長さ分繰り返し励振信号とする。得られた励振信号と次フレーム線形予測係数推定器５８で推定された現在のフレームの線形予測係数、あるいは直前のフレームの線形予測係数により線形予測合成フィルタ５４を用いて合成信号を得る。
【００１６】
線形予測合成フィルタ５４は図２中の線形予測合成フィルタ３３と同様に復号化手段３２で得られた量子化線形予測係数α_ｉ、もしくはこの係数α_ｉをＬＳＰ（ＬｉｎｅＳｐｅｓｔｒｕｍＰａｉｒ）や偏自己相関係数に変換したものから次フレーム以降のフィルタ係数として次フレーム線形予測係数推定器５８で推定して設定する。また、これらとは別に線形予測合成フィルタ係数が符号化され、周期性情報と併せて受信されている場合、復号化手段３２で復号化して線形予測合成フィルタ係数として用いる。合成フィルタ係数を設定する際、上記いずれの手法を用いるにしても、符号化器において周期性情報を決定する際に用いられた合成フィルタ係数と等しい係数を用いるのが望ましい。また、生成された励振信号を過去の励振信号をバッファリングしている適応符号帳４０と符号帳５２に格納する。
（実施例４）
本発明の実施例４を説明する。
【００１７】
実施例４は、実施例１，２に示した符号化器において、バッファ３５内の信号特性を分析した特性情報（周期的・非周期的情報）を周期性情報に含める。
実施例１もしくは実施例２では、現在の符号化対象フレームの次フレーム以降の未来の入力信号の周期性情報として、過去の励振信号を格納した符号帳から励振信号を切り出す位置符号、もしくはこれと利得符号の組み合わせを用いるが、さらにバッファ３５内の信号特性を分析し、この特性情報も周期性情報に含めて送信する。具体的には、この実施例における信号特性の分析には、バッファ３５内信号系列のパワー｜｜ｓ｜｜^２、実施例１もしくは実施例２で符号帳の切り出し位置情報を決定した際に計算された聴覚重み付け誤差｜｜Ｗｄ｜｜^２（聴覚重み付けフィルタ４７の出力）を用いる。パワー｜｜ｓ｜｜^２が音声が無いもしくは非常に小さいと判定される閾値ｐ_０よりも大きい場合に、▲１▼線形予測合成フィルタ４４で合成された信号とバッファ３５の信号との信号対重み付け誤差比｜｜Ｗｄ｜｜^２／｜｜ｓ｜｜^２が、バッファ３５信号が周期性が高いと判定される閾値ｅ_ｈｉｇｈよりも大きいか、▲２▼もしくは直前フレームでの同様の処理において｜｜Ｗｄ｜｜^２／｜｜ｓ｜｜^２がｅ_ｈｉｇｈよりも大きく、かつ現在のフレームでも｜｜Ｗｄ｜｜^２／｜｜ｓ｜｜^２が閾値ｅ_ｌｏｗよりも大きければ、バッファ３５の信号は周期的と判定し、符号帳の切り出し位置符号は制御手段４３で求められた値とする。また、前述した条件を満たさない場合はバッファ３５の信号は非周期的と判定し、符号帳３７とは別の符号帳（白色雑音系列）の中に切り出し位置に対応付けられるあらかじめ非周期的であることを表す一つもしくは複数の符号を用意し、この非周期的であることを示す符号を選択して送信する。
（実施例５）
図６に本発明の実施例５の構成を示す。
【００１８】
これは実施例４に示した符号化器に対応する復号化器である。
入力端子３１からフレーム単位で与えられるべき入力符号が復号化器で受信されなかったとフレーム消失検出手段５１で判定された場合、切換スイッチ５５を線形予測合成フィルタ５４側へ切り換え、直前のフレームの符号化符号と併せてすでに受信されている周期性情報、つまり符号帳５２から励振信号を切り出す位置に対応する符号もしくはこの位置符号とこの位置に対応する利得符号を取り出す。切り出し位置符号が実際には符号帳５２から切り出す位置に対応する符号ではなく、符号化器で送信された非周期性を表す符号を検出した場合、すなわち、これを非周期性符号検出手段６２で検出し、切換スイッチ６３を白色雑音系列符号帳６１側に切換えて、符号帳５２から切り出す代わりに、白色雑音系列もしくは非周期性を示す信号系を格納した白色雑音系列符号帳６１から励振信号を取り出し、利得制御手段５６で利得を付与し、線形予測合成フィルタ５４に入力して合成信号を合成する。白色雑音系列符号帳６１は１または複数の白色雑音符号系列を備え、複数の白色雑音符号系列を用いる場合には符号化器でこの選択信号を送信する。
【００１９】
また、この発明の符号化器、復号化器をＣＰＵやメモリ等を有するコンピュータと、アクセス主体となるユーザが利用する利用者端末と、記録媒体から構成することができる。
記録媒体は、ＣＤ−ＲＯＭ、磁気ディスク、半導体メモリ等の機械読み取り可能な記録媒体であって、ここに記録された音響信号符号化・復号化プログラムは、コンピュータに読み取られ、コンピュータの動作を制御し、コンピュータ上に前述した各構成要素、すなわち、線形予測分析手段、量子化手段、線形予測フィルタ等を実現する。
【００２０】
図７に、復号化器が１フレーム分の符号化符号を受信できなかった場合に、本手法と従来手法でフレーム補償を行った音声波形を示す。従来手法では復号化器において過去に受信された符号化符号もしくはその符号化符号から合成された合成信号を分析して得られた情報のみを用いて、励振信号を推定するため、音声のピッチ周期がフレーム間で急激に変動し、そのフレームの符号化符号が欠落した場合においては周期性変動が正しく表現されず、また、それ以降フレーム情報が正しく受信されても適応符号帳内の信号系列が符号化器と復号化器で一致しないため周期の不連続性が持続する。これに対して、本手法は、あらかじめ符号化器において、現在の符号化すべきフレームの符号化符号と併せて、次フレーム以降の入力信号の周期性情報を分析して符号化し、復号化器に送信することで、ピッチ周期が変動するフレーム情報が欠落しても周期変動を復元することができ、また、符号化器と復号化器で起こる適応符号帳の不一致も従来手法よりも急速に改善される。
【００２１】
【発明の効果】
以上説明したようにこの発明によれば、従来のＣＥＬＰ符号化・復号化方式において、フレーム消失補償を行うことによりフレーム単位での符号化符号の欠落による品質劣化を抑えることができる。
【図面の簡単な説明】
【図１】従来のＣＥＬＰ符号化器の構成を示すブロック図。
【図２】従来のＣＥＬＰ復号化器の構成を示すブロック図。
【図３】実施例１の符号化器の構成を示すブロック図。
【図４】実施例２の符号化器の構成を示すブロック図。
【図５】実施例３の復号化器の構成を示すブロック図。
【図６】実施例５の復号化器の構成を示すブロック図。
【図７】従来手法と本発明手法による音声波形補償を比較するための図。
【符号の説明】
１１，３１入力端子
１２線形予測分析手段
１３量子化手段
１４，３３，４４，５４線形予測合成フィルタ
１５，４５減算手段
１６，４７聴覚重み付けフィルタ
１７，４６制御手段
１８，３６，４８，５３利得付与手段
２０，４０適応符号帳
２１，４１固定符号帳
２２，２３，４２，４３，４９，５６利得制御手段
２５，４５加算手段
３２復号化手段
３５バッファ
３７符号帳
３８ポストフィルタ
３９出力端子
５０，５８次フレーム線形予測係数推定器
５５，６３切換スイッチ
６１白色雑音系列符号帳
６２非周期性符号検出手段
[0001]
[Technical field of the invention]
The present invention relates to acoustic signal encoding/decoding in which codes are determined so as to minimize the error between the input signal and the signal synthesized from the codes. More specifically, it relates to an acoustic signal encoding/decoding method and apparatus, an acoustic signal encoding/decoding program, and a recording medium that perform frame erasure compensation of the output signal when the decoder fails to receive the encoded code of a frame to be decoded.
[0002]
[Prior art]
Conventionally, CELP (Code Excited Linear Prediction) is a typical method for encoding an acoustic signal at a low bit rate by linear predictive coding.
FIG. 1 shows the configuration of a conventional CELP encoder.
The input acoustic signal from the input terminal 11 is subjected to linear prediction analysis by the linear prediction analysis means 12 for each frame of about 5 to 20 ms to obtain p-th order linear prediction coefficients α_i^, i = 1, ..., p. The linear prediction coefficients α_i^ are quantized by the quantizing means 13, and the quantized linear prediction coefficients α_i are set as filter coefficients in the linear prediction synthesis filter 14.
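The text does not say which algorithm the linear prediction analysis means 12 uses. As one common possibility, the sketch below applies the autocorrelation method with the Levinson-Durbin recursion. The function names, the absence of windowing, and the sign convention (prediction error filter 1 + Σ a_i z^-i) are assumptions made for illustration only.

```python
def autocorrelation(frame, order):
    """Return r[0..order] for one frame (no window applied; an assumption)."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order].

    Assumed convention: the prediction error filter is
    A(z) = 1 + sum_i a[i] * z^-i.
    Returns (coefficients, residual_energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient for stage m
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

For example, `levinson_durbin(autocorrelation(frame, 10), 10)` would give tenth-order coefficients for one frame; quantization by the quantizing means 13 is not modelled here.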
[0003]
The transfer function of the linear prediction synthesis filter 14 is expressed by the following equation.
[0004]
[Expression 1]
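The equation image for Expression 1 is not reproduced in this text. For orientation, a p-th order all-pole synthesis filter built from the quantized coefficients α_i is commonly written in the form below. The sign in the denominator is an assumption, since some texts write 1 − Σ instead.

```latex
H(z) = \frac{1}{1 + \sum_{i=1}^{p} \alpha_i z^{-i}}
```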

The excitation signal of the linear prediction synthesis filter 14 is stored in the adaptive codebook 20. An excitation signal (vector) is cut out of the adaptive codebook 20 on the basis of the pitch period corresponding to the code from the control means 17 and is repeated to fill the frame length or the subframe length (a subframe being a division of a frame). A gain is applied by the gain applying means 18 and the gain control means 22, and the result is supplied to the linear prediction synthesis filter 14 as the excitation signal through the adding means 25.
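The cut-out and repetition step can be sketched as follows. This is a minimal illustration that assumes the pitch period is a whole number of samples counted back from the newest sample in the codebook; the function name and the list representation of the codebook are hypothetical.

```python
def cut_out_excitation(history, lag, length):
    """Cut the newest `lag` samples out of `history` and repeat them to `length`.

    `history` stands in for the adaptive codebook (past excitation samples,
    oldest first); `lag` is the pitch period in samples (an assumption).
    """
    if not 1 <= lag <= len(history):
        raise ValueError("lag must be between 1 and len(history)")
    segment = history[-lag:]
    return [segment[n % lag] for n in range(length)]
```

With `history = [1, 2, 3, 4, 5]`, a lag of 2 and a subframe length of 5, the result is `[4, 5, 4, 5, 4]`.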
[0005]
The subtracting means 15 subtracts the synthesized signal output by the linear prediction synthesis filter 14 from the input signal. The perceptual weighting filter 16 weights the resulting difference signal according to the masking characteristics of hearing, and the control means 17 searches the adaptive codebook 20 for the code (that is, the pitch period) that minimizes the energy of the weighted difference signal.
Next, the control means 17 takes excitation vectors out of the fixed codebook 21 one after another. Each is given a gain by the gain applying means 18 and the gain control means 23, added by the adding means 25 to the excitation vector previously selected from the adaptive codebook 20, and supplied to the linear prediction synthesis filter 14 as the excitation signal. As before, the excitation vector that minimizes the energy of the difference signal from the perceptual weighting filter 16 is selected, and the fixed codebook code associated with it is determined.
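The closed-loop adaptive codebook search described in this paragraph can be sketched as an exhaustive loop over candidate pitch lags. This is a simplified illustration, not the patented procedure: the perceptual weighting filter is taken to be the identity, no gain is applied, the filter starts from zero state, and all names are hypothetical.

```python
def synthesize(excitation, alpha):
    """All-pole synthesis y[n] = x[n] - sum_i alpha[i-1] * y[n-i], zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out


def search_pitch_lag(target, history, alpha, min_lag, max_lag):
    """Return (lag, error_energy) minimizing the squared synthesis error.

    The weighting filter is omitted (assumed identity) to keep the sketch short.
    """
    best_lag, best_err = None, float("inf")
    for lag in range(min_lag, min(max_lag, len(history)) + 1):
        segment = history[-lag:]
        excitation = [segment[n % lag] for n in range(len(target))]
        synth = synthesize(excitation, alpha)
        err = sum((t - s) ** 2 for t, s in zip(target, synth))
        if err < best_err:  # strict comparison keeps the shortest lag on ties
            best_lag, best_err = lag, err
    return best_lag, best_err
```

A practical coder would search the fixed codebook and the gains in the same analysis-by-synthesis manner, using the weighted error instead of the plain squared error.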
[0006]
Finally, the gains applied by the gain applying means 18 to the excitation vectors selected from the adaptive codebook 20 and the fixed codebook 21 are optimized. As before, the gains that minimize the energy of the output signal of the perceptual weighting filter 16 are selected, and the gain code associated with these gains is determined.
The generated excitation signal is also stored in the adaptive codebook 20, which buffers past excitation signals.
The codes obtained in this way, namely the pitch code (adaptive codebook code), the fixed codebook code, the pitch gain, the fixed codebook gain, and the linear prediction coefficients, are collected frame by frame and transmitted to the decoder.
[0007]
FIG. 2 shows the configuration of a decoder for this CELP coding.
The linear prediction coefficient code in the input code from the input terminal 31 is decoded by the decoding means 32 and set in the linear prediction synthesis filter 33 as a filter coefficient.
An excitation vector is cut out of the adaptive codebook 40 according to the pitch code in the input code, and an excitation vector is selected from the fixed codebook 41 according to the fixed codebook code. Each excitation vector from these codebooks 40 and 41 is given a gain by the gain applying means 36 and the gain control means 42 and 43 according to the gain codes (pitch gain, fixed codebook gain) in the input code. The vectors are then added by the adding means 45 and supplied to the linear prediction synthesis filter 33 as the excitation signal. The excitation signal is also stored in the adaptive codebook 40, which buffers past excitation signals.
[0008]
The synthesized signal from the linear predictive synthesis filter 33 is processed by the post filter 38 so that the quantization noise is reduced in consideration of auditory characteristics, and an acoustic signal is output from the output terminal 39.
When the decoder fails to receive the input code from the input terminal 31 for a whole frame (hereinafter, frame erasure), a conventional decoder either analyzes the past synthesized signal or uses information already received to estimate the linear prediction coefficients, the pitch period, and so on. From these estimates it configures a substitute for the linear prediction synthesis filter 33, derives the excitation signal to be input to it, and synthesizes a pseudo output signal. Because this technique compensates the output signal from past received information only, a change in pitch period between frames (including a change between voiced and unvoiced speech) causes a mismatch in pitch intervals and hence a discontinuity in the excitation signal. In CELP, past excitation signals are stored in the adaptive codebook 20 of the encoder and the adaptive codebook 40 of the decoder and reused as excitation vectors. Even in frames received correctly after the lost frame, therefore, the excitation vectors output from the encoder's adaptive codebook 20 and the decoder's adaptive codebook 40 do not match, so the waveform discontinuity persists over several frames and causes severe audible degradation.
[0009]
[Problems to be solved by the invention]
An object of the present invention is to provide frame erasure compensation means for conventional acoustic signal encoding/decoding such as CELP that suppresses degradation in sections where the pitch period changes greatly (including changes between voiced and unvoiced speech), which is where quality degradation on frame erasure is most noticeable.
[0010]
[Means for Solving the Problems]
To solve the above problem, the present invention operates as follows. In encoding, the input signal of the frames following the current frame, which contains the input signal being encoded, is stored in a buffer, and periodicity information obtained by analyzing the signal stored in this buffer is transmitted together with the encoded code determined for the current frame. In decoding, when the encoded code of the frame to be decoded cannot be received, the output signal is compensated using the encoded code of the immediately preceding frame and the periodicity information received together with it.
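One way to picture this arrangement is a packet that carries the codes of frame n together with the periodicity information for frame n + 1. The sketch below is illustrative only; the class, its field names, and the dictionary keyed by frame index are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FramePacket:
    """Hypothetical packet layout for frame n."""
    frame_index: int
    celp_codes: dict                      # codes of frame n (pitch, fixed codebook, gains, LPC)
    next_position_code: int               # periodicity information for frame n + 1
    next_gain_code: Optional[int] = None  # present only when a gain code is also sent


def concealment_info(received, lost_index):
    """Return the periodicity information for a lost frame, if the previous packet arrived.

    `received` maps frame index to FramePacket for every packet that arrived.
    """
    previous = received.get(lost_index - 1)
    if previous is None:
        return None  # two losses in a row: nothing was sent ahead for this frame
    return previous.next_position_code, previous.next_gain_code
```

If frame 1 is lost but the packet of frame 0 arrived, `concealment_info(received, 1)` returns the position code (and gain code, if any) that the encoder computed from its look-ahead buffer.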
[0011]
DETAILED DESCRIPTION OF THE INVENTION
(Example 1)
FIG. 3 shows the configuration of the first embodiment of the present invention.
This embodiment adds to the conventional encoder shown in FIG. 1 a means that stores in a buffer the input signal of the frames following the current frame, which contains the input signal being encoded, and transmits periodicity information obtained by analyzing the signal stored in this buffer together with the encoded code determined for the current frame.
[0012]
In addition to the current frame to be encoded, the future input signal from the input terminal 11 that will be encoded in the next and subsequent frames is stored in the buffer 35. The buffer may be anywhere from shorter than one frame to several frames long. After the codes for the current frame have been determined by the conventional encoding method shown in FIG. 1, the excitation signal that was input to the linear prediction synthesis filter 14 to synthesize the output signal is stored in the codebook 37. From the signal sequence stored in the codebook 37, a signal as long as the buffer 35 is cut out starting at the position corresponding to the code from the control means 46. If the cut-out signal is shorter than the buffer 35, it is repeated until it reaches the length of the buffer 35. The resulting signal sequence is input as the excitation signal to the linear prediction synthesis filter 44, which is configured using either the linear prediction coefficients of the current frame from the quantizing means 13 or the linear prediction coefficients of the next frame obtained from them by the next frame linear prediction coefficient estimator 50, and a synthesized signal is obtained. The subtracting means 45 computes the difference between the signal stored in the buffer 35 and the synthesized signal, and the perceptual weighting filter 47 weights this difference signal according to the masking characteristics of hearing. The control means 46 searches the codebook 37 for the cut-out position that minimizes the energy of the weighted difference signal (this position corresponds to the pitch period of the signal in the buffer 35), and the code associated with this cut-out position is determined.
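The look-ahead search of this embodiment can be sketched as follows. The sketch assumes that the position code is simply a start index into the codebook 37, that the perceptual weighting filter is the identity, and that the synthesis filter starts from zero state; these simplifications and all function names are assumptions for illustration.

```python
def synthesize(excitation, alpha):
    """All-pole synthesis y[n] = x[n] - sum_i alpha[i-1] * y[n-i], zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out


def cut_from_position(codebook, position, length):
    """Cut `length` samples starting at `position`; repeat the piece if it is too short."""
    if not 0 <= position < len(codebook):
        raise ValueError("position outside codebook")
    segment = codebook[position:position + length]
    return [segment[n % len(segment)] for n in range(length)]


def search_cut_position(lookahead, codebook, alpha):
    """Return (position, error_energy) that best reproduces the look-ahead buffer."""
    best_pos, best_err = None, float("inf")
    for pos in range(len(codebook)):
        excitation = cut_from_position(codebook, pos, len(lookahead))
        synth = synthesize(excitation, alpha)
        err = sum((s - y) ** 2 for s, y in zip(lookahead, synth))  # unweighted (assumption)
        if err < best_err:
            best_pos, best_err = pos, err
    return best_pos, best_err
```

The returned position would be sent as the periodicity information alongside the codes of the current frame.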
[0013]
Using the code associated with the cut-out position in the codebook 37 makes it possible to decode with little quality degradation using only a small amount of information.
This code is transmitted to the decoder as periodicity information together with the encoded code of the current frame being encoded.
The linear prediction synthesis filter 44 is expressed by the equation given above, like the synthesis filter in FIG. 1. Its coefficients for the next and subsequent frames are estimated and set by the next frame linear prediction coefficient estimator 50 from the quantized linear prediction coefficients α_i obtained by the linear prediction analysis means 12 and the quantizing means 13, or from these coefficients α_i converted to LSP (Line Spectrum Pair) parameters or partial autocorrelation coefficients. Alternatively, the coefficients may be set by performing linear prediction analysis on the signal in the buffer 35 and quantizing the result, but in that case the code corresponding to the quantized coefficients must also be transmitted to the decoder.
[0014]
A linear prediction synthesis filter can also be set using the linear prediction coefficient of the current frame. In this case, the next frame linear prediction coefficient estimator is unnecessary.
(Example 2)
FIG. 4 shows the configuration of the second embodiment of the present invention.
This embodiment differs from the first embodiment shown in FIG. 3 in that a gain is applied by the gain applying means 48 and the gain control means 49 to the signal sequence cut out of the codebook 37 before it is input to the linear prediction synthesis filter 44 as the excitation signal. Accordingly, the code associated with the applied gain and the code associated with the cut-out position in the codebook 37 together form the periodicity information, which is transmitted to the decoder together with the encoded code of the current frame being encoded.
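The text does not state how the gain is chosen or quantized. A minimal sketch, assuming a least-squares gain for a fixed cut-out position followed by nearest-neighbour quantization against a hypothetical gain table, might look like this:

```python
def optimal_gain(target, synth):
    """Least-squares gain g minimizing sum((target - g * synth)^2) (an assumed criterion)."""
    energy = sum(y * y for y in synth)
    if energy == 0.0:
        return 0.0
    return sum(t * y for t, y in zip(target, synth)) / energy


def quantize_gain(gain, table):
    """Return (gain_code, quantized_gain) for the table entry nearest to `gain`.

    `table` is a hypothetical list of allowed gain values; the index is the code.
    """
    code = min(range(len(table)), key=lambda k: abs(table[k] - gain))
    return code, table[code]
```

Here `target` would be the (weighted) look-ahead signal and `synth` the unit-gain output of the synthesis filter for the chosen cut-out position.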
(Example 3)
FIG. 5 shows the configuration of the third embodiment of the present invention.
[0015]
This is a decoder corresponding to the encoder of the first or second embodiment. It adds to the decoder shown in FIG. 2 a frame erasure compensation means that, when the encoded code of the frame to be decoded cannot be received, compensates the output signal using the periodicity information of the current frame, which was received together with the encoded code of the immediately preceding frame.
When the frame erasure detection means 51 determines that the input code that should arrive from the input terminal 31 frame by frame has not been received by the decoder, the changeover switch 55 is switched to the linear prediction synthesis filter 54 side, and the periodicity information already received together with the encoded code of the immediately preceding frame is retrieved. This information is the code corresponding to the position at which the excitation signal of the current frame, whose input code could not be received, is cut out of the codebook 52, or this position code together with the gain code corresponding to that position. The codebook 52 stores the signal sequence input as the excitation signal to the linear prediction synthesis filter 33 up to the immediately preceding frame. A signal sequence one frame long is cut out of the codebook 52 starting at the cut-out position indicated by the position code and is used as the excitation signal, either as it is or after the gain associated with the gain code has been applied to it by the gain applying means 53 and the gain control means 56. If the cut-out signal sequence is shorter than the frame length, it is repeated to fill the frame length. A synthesized signal is obtained by the linear prediction synthesis filter 54 from this excitation signal and either the linear prediction coefficients of the current frame estimated by the next frame linear prediction coefficient estimator 58 or the linear prediction coefficients of the immediately preceding frame.
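The concealment path of this decoder can be sketched as a single function. The sketch assumes the position code is a start index into the codebook 52, that the synthesis filter starts from zero state (a real decoder would carry its state over from the previous frame), and that the regenerated excitation is appended to the codebook afterwards; the function and parameter names are hypothetical.

```python
def conceal_frame(codebook, position, frame_len, alpha, gain=1.0):
    """Regenerate one lost frame from previously received periodicity information.

    codebook : past excitation samples, oldest first (stands in for codebook 52)
    position : cut-out position decoded from the position code (assumed to be an index)
    alpha    : synthesis filter coefficients, estimated or taken from the previous frame
    gain     : decoded gain, or 1.0 when no gain code was sent
    Returns (output_samples, excitation) and appends the excitation to `codebook`.
    """
    if not 0 <= position < len(codebook):
        raise ValueError("position outside codebook")
    segment = codebook[position:position + frame_len]
    # Repeat the cut-out piece when it is shorter than one frame.
    excitation = [gain * segment[n % len(segment)] for n in range(frame_len)]
    output = []
    for n, x in enumerate(excitation):
        acc = x
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * output[n - i]  # y[n] = x[n] - sum_i alpha_i * y[n-i]
        output.append(acc)
    codebook.extend(excitation)  # keep the excitation history up to date
    return output, excitation
```

Appending the regenerated excitation is what lets later, correctly received frames cut their adaptive codebook vectors from a history close to the encoder's.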
[0016]
Like the linear prediction synthesis filter 33 in FIG. 2, the linear prediction synthesis filter 54 has its coefficients for the next and subsequent frames estimated and set by the next frame linear prediction coefficient estimator 58 from the quantized linear prediction coefficients α_i obtained by the decoding means 32, or from these coefficients α_i converted to LSP (Line Spectrum Pair) parameters or partial autocorrelation coefficients. Alternatively, when linear prediction synthesis filter coefficients have been encoded separately and received together with the periodicity information, they are decoded by the decoding means 32 and used as the linear prediction synthesis filter coefficients. Whichever method is used to set the synthesis filter coefficients, it is desirable to use coefficients equal to those used in the encoder when the periodicity information was determined. The generated excitation signal is stored in the adaptive codebook 40, which buffers past excitation signals, and in the codebook 52.
(Example 4)
Embodiment 4 of the present invention will be described.
[0017]
In the fourth embodiment, in the encoders shown in the first and second embodiments, characteristic information (periodic / non-periodic information) obtained by analyzing signal characteristics in the buffer 35 is included in the periodicity information.
In the first or second embodiment, the periodicity information for the future input signal in the frames following the current frame is the position code for cutting the excitation signal out of the codebook storing past excitation signals, or this code combined with a gain code. In this embodiment, the signal characteristics in the buffer 35 are additionally analyzed, and this characteristic information is also included in the periodicity information and transmitted. Specifically, the analysis uses the power ||s||² of the signal sequence in the buffer 35 and the perceptually weighted error ||Wd||² (the output of the perceptual weighting filter 47) computed when the codebook cut-out position was determined in the first or second embodiment. The signal in the buffer 35 is judged to be periodic when the power ||s||² is larger than the threshold p_0, below which speech is judged to be absent or very weak, and either of the following holds: (1) the signal-to-weighted-error ratio ||Wd||²/||s||² between the signal synthesized by the linear prediction synthesis filter 44 and the signal in the buffer 35 is larger than the threshold e_high, above which the signal in the buffer 35 is judged to be highly periodic; or (2) ||Wd||²/||s||² was larger than e_high in the same processing for the immediately preceding frame and is larger than the threshold e_low in the current frame. In that case the codebook cut-out position code is the value obtained by the control means 43. When these conditions are not satisfied, the signal in the buffer 35 is judged to be aperiodic. For this case, one or more codes indicating aperiodicity are prepared in advance in a codebook separate from the codebook 37 (a white noise sequence codebook), in association with the cut-out position, and one of these aperiodicity codes is selected and transmitted.
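The decision rule above, including the hysteresis between e_high and e_low, can be sketched as follows. The sketch follows the conditions exactly as the text states them, with the ratio computed as ||Wd||²/||s||². The function name, the returned tuple, and the choice to reset the stored ratio to 0.0 for a low-power buffer are assumptions.

```python
def classify_buffer(power, weighted_error, prev_ratio, p0, e_high, e_low):
    """Decide whether the look-ahead buffer is treated as periodic.

    power          : ||s||^2 of the buffer signal
    weighted_error : ||Wd||^2 from the cut-out position search
    prev_ratio     : ratio computed for the immediately preceding frame
    Returns (is_periodic, ratio); `ratio` is kept for the next frame's decision.
    """
    if power <= p0 or power <= 0.0:
        # No speech or very weak signal: aperiodic. Resetting the ratio to 0.0
        # is an assumption; the text does not say what is carried over here.
        return False, 0.0
    ratio = weighted_error / power
    is_periodic = ratio > e_high or (prev_ratio > e_high and ratio > e_low)
    return is_periodic, ratio
```

When the result is periodic the encoder sends the searched position code; otherwise it sends one of the reserved aperiodicity codes.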
(Example 5)
FIG. 6 shows the configuration of the fifth embodiment of the present invention.
[0018]
This is a decoder corresponding to the encoder shown in the fourth embodiment.
When the frame erasure detection means 51 determines that the input code that should arrive from the input terminal 31 frame by frame has not been received by the decoder, the changeover switch 55 is switched to the linear prediction synthesis filter 54 side, and the periodicity information already received together with the encoded code of the immediately preceding frame is retrieved, namely the code corresponding to the position at which the excitation signal is cut out of the codebook 52, or this position code together with the gain code corresponding to that position. When the aperiodic code detection means 62 detects that the cut-out position code is not a code corresponding to a cut-out position in the codebook 52 but a code indicating aperiodicity transmitted by the encoder, the changeover switch 63 is switched to the white noise sequence codebook 61 side. Instead of being cut out of the codebook 52, the excitation signal is then taken from the white noise sequence codebook 61, which stores white noise sequences or signal sequences exhibiting aperiodicity. A gain is applied to it by the gain control means 56, and it is input to the linear prediction synthesis filter 54 to produce the synthesized signal. The white noise sequence codebook 61 holds one or more white noise code sequences; when it holds more than one, the encoder transmits a signal selecting among them.
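The switching between the codebook 52 and the white noise sequence codebook 61 can be sketched as below. The reserved code values 254 and 255, the seeded Gaussian generator used to build the noise sequences, and the function names are all hypothetical; the patent only says that one or more aperiodicity codes are reserved.

```python
import random

# Hypothetical reserved position codes mapped to noise sequence indices.
APERIODIC_CODES = {254: 0, 255: 1}


def make_noise_codebook(count, length, seed=0):
    """Build `count` fixed white noise sequences; encoder and decoder must share them."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(count)]


def select_excitation(position_code, codebook, noise_codebook, frame_len, gain=1.0):
    """Pick the excitation source for a lost frame, as the changeover switch 63 would."""
    if position_code in APERIODIC_CODES:
        source, start = noise_codebook[APERIODIC_CODES[position_code]], 0
    else:
        source, start = codebook, position_code
    segment = source[start:start + frame_len]
    if not segment:
        raise ValueError("position code points outside the codebook")
    return [gain * segment[n % len(segment)] for n in range(frame_len)]
```

The returned excitation would then be fed to the synthesis filter in the same way as in the periodic case.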
[0019]
Further, the encoder and decoder of the present invention can be composed of a computer having a CPU, a memory, etc., a user terminal used by a user who is an access subject, and a recording medium.
The recording medium is a machine-readable recording medium such as a CD-ROM, a magnetic disk, or a semiconductor memory, and the acoustic signal encoding / decoding program recorded therein is read by the computer to control the operation of the computer. Then, the above-described components, that is, linear prediction analysis means, quantization means, linear prediction filter, and the like are realized on the computer.
[0020]
FIG. 7 shows speech waveforms obtained with frame compensation by the present method and by the conventional method when the decoder fails to receive the encoded code of one frame. The conventional method estimates the excitation signal using only the encoded codes received in the past by the decoder, or information obtained by analyzing the signal synthesized from those codes. Consequently, when the pitch period of the speech changes abruptly between frames and the encoded code of such a frame is lost, the change in periodicity is not reproduced correctly. Moreover, even if subsequent frames are received correctly, the signal sequences in the adaptive codebooks of the encoder and decoder do not match, so the discontinuity in the period persists. In the present method, by contrast, the encoder analyzes and encodes the periodicity information of the input signal in the next and subsequent frames in advance and transmits it to the decoder together with the encoded code of the current frame. The change in period can therefore be restored even if the information of a frame in which the pitch period changes is lost, and the adaptive codebook mismatch between encoder and decoder is also resolved more quickly than with the conventional method.
[0021]
[Effect of the invention]
As described above, according to the present invention, in the conventional CELP encoding / decoding method, it is possible to suppress quality deterioration due to missing encoded codes in units of frames by performing frame erasure compensation.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a conventional CELP encoder.
FIG. 2 is a block diagram showing a configuration of a conventional CELP decoder.
FIG. 3 is a block diagram illustrating a configuration of an encoder according to the first embodiment.
FIG. 4 is a block diagram illustrating a configuration of an encoder according to a second embodiment.
FIG. 5 is a block diagram illustrating a configuration of a decoder according to a third embodiment.
FIG. 6 is a block diagram illustrating a configuration of a decoder according to a fifth embodiment.
FIG. 7 is a diagram for comparing speech waveform compensation according to the conventional technique and the technique of the present invention.
[Explanation of symbols]
11, 31 Input terminal
12 Linear prediction analysis means
13 Quantization means
14, 33, 44, 54 Linear prediction synthesis filter
15, 45 Subtraction means
16, 47 Perceptual weighting filter
17, 46 Control means
18, 36, 48, 53 Gain applying means
20, 40 Adaptive codebook
21, 41 Fixed codebook
22, 23, 42, 43, 49, 56 Gain control means
25, 45 Adding means
32 Decoding means
35 Buffer
37 Codebook
38 Post filter
39 Output terminal
50, 58 Next frame linear prediction coefficient estimator
55, 63 Changeover switch
61 White noise sequence codebook
62 Aperiodic code detection means

Claims

An acoustic signal encoding method in which, for each frame of an input acoustic signal, spectral envelope information is analyzed as linear prediction coefficients by linear prediction analysis of the acoustic signal, including the frames before and after that frame, or of a past synthesized signal, and in which encoded codes for configuring a synthesis filter, constructed from the obtained linear prediction coefficients, and a drive excitation signal are determined so as to minimize the perceptually weighted error between the input acoustic signal and the synthesized signal obtained by inputting the drive excitation signal to the synthesis filter, the method comprising:
after determining the encoded code of the current frame to be encoded,
storing in a buffer the acoustic signal to be encoded in the next and subsequent frames, and analyzing the signal sequence stored in the buffer to generate periodicity information including pitch period information; and
transmitting the encoded code determined for the current frame together with the periodicity information including the pitch period information of the next frame.

An acoustic signal decoding method comprising: receiving, together, encoded codes for configuring the synthesis filter and the drive excitation signal of the immediately preceding frame, generated by encoding an acoustic signal frame by frame, and periodicity information including pitch period information of the current frame; and
when loss of the encoded code of the current frame to be decoded is detected, generating an output signal of the current frame with a synthesis filter, thereby compensating for the loss, on the basis of the encoded code of the immediately preceding frame and the periodicity information, including the pitch period information of the current frame, that was received together with that encoded code.

The acoustic signal encoding method according to claim 1,
wherein the periodicity information is
information indicating the position at which the drive excitation signal is cut out from a codebook that stores the signal sequences used as past drive excitation signals, including the one used in the current frame, the position being determined, after the encoding code of the current frame to be encoded has been determined, so as to minimize the perceptually weighted error between the synthesized signal obtained by cutting out a drive excitation signal from the codebook and inputting it to the synthesis filter, and the signal sequence stored in the buffer as the acoustic signal from the next frame onward.
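
The search described in this claim can be sketched as an exhaustive loop over candidate cut-out positions. The sketch makes simplifying assumptions that are not in the claim: plain squared error replaces the perceptually weighted error, the filter starts from a zero state, and the filter convention is y[n] = x[n] + Σ a_i·y[n−i]. All function names are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis with zero initial state (assumed convention)."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                y += a * out[n - i]
        out.append(y)
    return out


def cut_out(codebook, lag, length):
    """Cut an excitation of the given length from the codebook, repeating every `lag` samples."""
    history = list(codebook)
    for _ in range(length):
        history.append(history[-lag])
    return history[len(codebook):]


def search_cut_position(codebook, target, lpc, min_lag, max_lag):
    """Return (lag, error) minimizing the squared error against the buffered target.

    Squared error is used here in place of the perceptually weighted error.
    """
    best_lag, best_err = None, float("inf")
    for lag in range(min_lag, min(max_lag, len(codebook)) + 1):
        synth = synthesize(cut_out(codebook, lag, len(target)), lpc)
        err = sum((t - s) ** 2 for t, s in zip(target, synth))
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag, best_err
```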

An acoustic signal decoding method, characterized in that: an encoding code of the immediately preceding frame, generated by encoding the acoustic signal frame by frame and used for configuring the synthesis filter and the drive excitation signal, is received together with periodicity information including information for configuring the drive excitation signal of the current frame that is cut out from a codebook; and
when loss of the encoding code of the current frame to be decoded is detected, an output signal of the current frame is generated using the synthesis filter, based on the encoding code of the immediately preceding frame and on a drive excitation signal that is cut out, according to the periodicity information received together with that encoding code, from a codebook storing the signal sequences used as past drive excitation signals, including the one used in the immediately preceding frame, thereby compensating for the loss.

The acoustic signal encoding method according to claim 1,
wherein the periodicity information is
the position at which the drive excitation signal is cut out from a codebook that stores the signal sequences used as past drive excitation signals, including the one used in the current frame, and gain information corresponding to that position, both being determined, after the encoding code of the current frame to be encoded has been determined, so as to minimize the perceptually weighted error between the synthesized signal obtained by cutting out a drive excitation signal from the codebook, applying a gain to it, and inputting it to the synthesis filter, and the signal sequence stored in the buffer as the input signal from the next frame onward.
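
One way to picture the joint choice of position and gain is sketched below. For a fixed candidate, the gain that minimizes the squared error is the projection of the target onto the synthesized signal; that gain is then mapped to the nearest entry of a gain table. The table, the function names, and the use of plain squared error instead of the perceptually weighted error are assumptions made for illustration.

```python
def optimal_gain(target, synth):
    """Gain g minimizing sum((t - g*s)^2): g = <t, s> / <s, s>."""
    energy = sum(s * s for s in synth)
    if energy == 0.0:
        return 0.0
    return sum(t * s for t, s in zip(target, synth)) / energy


def quantize_gain(gain, gain_table):
    """Return the index of the table entry closest to `gain` (hypothetical gain code)."""
    return min(range(len(gain_table)), key=lambda i: abs(gain_table[i] - gain))


def search_position_and_gain(candidates, target, gain_table):
    """Pick the (position, gain code, error) with the smallest squared error.

    candidates: dict mapping a cut-out position to the signal synthesized from it.
    """
    best = None
    for position, synth in candidates.items():
        code = quantize_gain(optimal_gain(target, synth), gain_table)
        g = gain_table[code]
        err = sum((t - g * s) ** 2 for t, s in zip(target, synth))
        if best is None or err < best[2]:
            best = (position, code, err)
    return best
```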

An acoustic signal decoding method, characterized in that: an encoding code of the immediately preceding frame, generated by encoding the acoustic signal frame by frame and used for configuring the synthesis filter and the drive excitation signal, is received together with periodicity information including the position at which the drive excitation signal of the current frame is cut out from the codebook and a gain code corresponding to that position; and
when loss of the encoding code of the current frame to be decoded is detected, an output signal of the current frame is generated using the synthesis filter, based on the encoding code of the immediately preceding frame and on the periodicity information received together with that encoding code, which includes the position at which the drive excitation signal of the current frame is cut out from the codebook and the gain code corresponding to that position, thereby compensating for the loss.

The acoustic signal encoding method according to claim 3 or 5,
wherein, after the encoding code of the current frame to be encoded has been determined, the perceptually weighted error between the synthesized signal, obtained by inputting to the synthesis filter a drive excitation signal cut out from the codebook storing past drive excitation signals including the one used in the current frame, and the signal sequence stored in the buffer as the input signal from the next frame onward is analyzed; input signal characteristic information indicating whether the input signal from the next frame onward is periodic or aperiodic is generated; and periodicity information including this input signal characteristic information is transmitted together with the encoding code of the current frame to be encoded.
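
The claim states that the error is analyzed to decide between periodic and aperiodic, but gives no decision rule. The sketch below assumes one simple rule: the signal is treated as periodic when the residual error energy is small relative to the target energy. The threshold value, the function name, and the use of unweighted squared error are hypothetical.

```python
def is_periodic(target, synth, threshold=0.5):
    """Return True when the best codebook match explains most of the target energy.

    Assumed rule: error energy / target energy below `threshold` means periodic.
    """
    energy = sum(t * t for t in target)
    if energy == 0.0:
        return False  # a silent target carries no periodicity evidence
    err = sum((t - s) ** 2 for t, s in zip(target, synth))
    return err / energy < threshold
```

The resulting flag would be sent alongside the cut-out position so that the decoder can choose its substitute excitation accordingly.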

An acoustic signal decoding method, characterized in that: an encoding code of the immediately preceding frame, generated by encoding the acoustic signal frame by frame and used for configuring the synthesis filter and the drive excitation signal, is received together with periodicity information including information on whether the input signal characteristic of the next frame is periodic or aperiodic; and
when loss of the encoding code of the current frame to be decoded is detected, either a signal sequence cut out from a codebook storing past drive excitation signals or a white noise sequence is selected by switching, as the drive excitation signal of the current frame whose code could not be received, using the encoding code of the immediately preceding frame and the periodicity information received together with that encoding code, which includes the information on whether the input signal characteristic of the current frame is periodic or aperiodic, and an output signal of the current frame is generated using the synthesis filter, thereby compensating for the loss.
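
The switching described in this claim can be sketched as a single selection function. The periodic branch repeats the stored excitation from the received cut-out position; the aperiodic branch draws white noise. Scaling the noise to the RMS level of the stored excitation is an assumption of this sketch, as are the function name and the seeded generator used in place of a white noise sequence codebook.

```python
import math
import random


def select_concealment_excitation(past_excitation, cut_position, periodic, frame_len, rng=None):
    """Choose the substitute excitation for a lost frame.

    periodic=True  -> repeat the stored excitation with period `cut_position`
    periodic=False -> white noise scaled to the RMS of the stored excitation (assumption)
    """
    if frame_len <= 0 or not past_excitation:
        raise ValueError("need a positive frame length and a non-empty excitation history")

    if periodic:
        history = list(past_excitation)
        for _ in range(frame_len):
            history.append(history[-cut_position])
        return history[len(past_excitation):]

    rng = rng or random.Random(0)  # seeded stand-in for a white noise codebook
    noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    target_rms = math.sqrt(sum(x * x for x in past_excitation) / len(past_excitation))
    noise_rms = math.sqrt(sum(x * x for x in noise) / frame_len)
    scale = target_rms / noise_rms if noise_rms > 0.0 else 0.0
    return [scale * x for x in noise]
```

The returned excitation would then be fed to the synthesis filter configured from the previous frame's code.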

An acoustic signal encoding apparatus comprising: linear prediction analysis means for obtaining, for each frame of an input acoustic signal, linear prediction coefficients representing spectral envelope information by linear prediction analysis of the acoustic signal including the frames before and after that frame, or of a past synthesized signal; a synthesis filter that is configured based on the linear prediction coefficients, receives a drive excitation signal, and outputs a synthesized signal; control means for determining an encoding code for configuring the synthesis filter and the drive excitation signal so as to minimize the perceptually weighted error between the input acoustic signal and the synthesized signal; and means for outputting the encoding code,
the apparatus further comprising:
a buffer for storing, after the control means has determined the encoding code of the current frame to be encoded, the acoustic signal to be encoded from the next frame onward;
periodicity information generating means for analyzing the signal stored in the buffer and generating periodicity information including pitch period information; and
means for transmitting the encoding code of the current frame together with the periodicity information including the pitch period information of the next frame.

An acoustic signal decoding apparatus comprising:
means for receiving an encoding code of the immediately preceding frame, generated by encoding the acoustic signal frame by frame and used for configuring the synthesis filter and the drive excitation signal, together with periodicity information including pitch period information from the current frame onward;
frame loss detection means for detecting that the encoding code of the current frame to be decoded has been lost; and
frame erasure compensation means for generating, when a frame loss is detected, an output signal of the current frame using the synthesis filter, based on the encoding code of the immediately preceding frame and on the periodicity information, including the pitch period information of the current frame, received together with the encoding code of the immediately preceding frame.

An acoustic signal encoding program for causing a computer to execute:
a process of obtaining, for each frame of an input acoustic signal, linear prediction coefficients representing spectral envelope information by linear prediction analysis of the acoustic signal including the frames before and after that frame, or of a past synthesized signal;
a process of configuring a synthesis filter based on the obtained linear prediction coefficients;
a process of determining an encoding code for configuring the synthesis filter and the drive excitation signal so as to minimize the perceptually weighted error between the input acoustic signal and the synthesized signal obtained by inputting the drive excitation signal to the synthesis filter;
a process of storing in a buffer, after the encoding code of the current frame to be encoded has been determined, the acoustic signal to be encoded from the next frame onward, and analyzing the signal stored in this buffer to generate periodicity information including pitch period information; and
a process of transmitting the encoding code determined for the current frame together with the periodicity information including the pitch period information of the next frame.

A recording medium on which is recorded an acoustic signal encoding program for causing a computer to execute:
a process of obtaining, for each frame of an input acoustic signal, linear prediction coefficients representing spectral envelope information by linear prediction analysis of the acoustic signal including the frames before and after that frame, or of a past synthesized signal;
a process of configuring a synthesis filter based on the obtained linear prediction coefficients;
a process of determining an encoding code for configuring the synthesis filter and the drive excitation signal so as to minimize the perceptually weighted error between the input acoustic signal and the synthesized signal obtained by inputting the drive excitation signal to the synthesis filter;
a process of storing in a buffer, after the encoding code of the current frame to be encoded has been determined, the acoustic signal to be encoded from the next frame onward, and analyzing the signal stored in this buffer to generate periodicity information including pitch period information; and
a process of transmitting the encoding code determined for the current frame together with the periodicity information including the pitch period information of the next frame.

An acoustic signal decoding program for causing a computer to execute:
a process of receiving an encoding code of the immediately preceding frame, generated by encoding the acoustic signal frame by frame and used for configuring the synthesis filter and the drive excitation signal, together with periodicity information including the pitch information of the current frame;
a process of detecting loss of the encoding code of the current frame to be decoded; and
a process of generating, when loss of the encoding code of the current frame is detected, an output signal of the current frame using the synthesis filter, based on the encoding code of the immediately preceding frame and on the periodicity information, including the pitch information of the current frame, received together with the encoding code of the immediately preceding frame, thereby compensating for the loss.

A recording medium on which is recorded an acoustic signal decoding program for causing a computer to execute:
a process of receiving an encoding code of the immediately preceding frame, generated by encoding the acoustic signal frame by frame and used for configuring the synthesis filter and the drive excitation signal, together with periodicity information including the pitch information of the current frame;
a process of detecting loss of the encoding code of the current frame to be decoded; and
a process of generating, when loss of the encoding code of the current frame is detected, an output signal of the current frame using the synthesis filter, based on the encoding code of the immediately preceding frame and on the periodicity information, including the pitch information of the current frame, received together with the encoding code of the immediately preceding frame, thereby compensating for the loss.