JP2004229184A

JP2004229184A - Method and device for encoding time-series signal

Info

Publication number: JP2004229184A
Application number: JP2003017224A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2003-01-27
Filing date: 2003-01-27
Publication date: 2004-08-12
Anticipated expiration: 2023-01-27
Also published as: JP4139697B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and a device for encoding a time-series signal that achieve more efficient compression by selecting the optimum method as needed in accordance with fluctuations of the signal characteristics within the time-series signal.

SOLUTION: For each sample of a time-series signal consisting of a time-ordered sample sequence, estimated errors are calculated with a plurality of estimation formulas prepared beforehand (S1). The estimated error to be encoded is selected by using the accumulated error of each estimation formula (S2). After the accumulated errors are updated (S3), they are reset whenever the number of samples exceeds a prescribed number (S4, S5). Repeating this processing yields the optimum estimated error for each sample.

COPYRIGHT: (C)2004, JPO & NCIPI

Description

【０００１】
【産業上の利用分野】
本発明は、音楽制作、音響データの素材保管、ロケ素材の中継など音楽制作分野、特にＣＤよりも品質の高い高精細オーディオ制作を行う分野、ＣＤ、ＤＶＤ等のデジタル記録媒体を用いたオーディオ記録再生分野、遠隔医療における生体信号の伝送等、データの改変が嫌われる分野等において好適なデータの可逆圧縮技術に関する。
【０００２】
【従来の技術】
従来より、音響信号の圧縮には様々な手法が用いられている。音響信号を圧縮して符号化する手法として、ＭＰ３（ＭＰＥＧ−１／Ｌａｙｅｒ３）、ＡＡＣ（ＭＰＥＧ−２／Ｌａｙｅｒ３）などが実用化されている。このような圧縮符号化方式により、音響信号を小さいデータとして扱うことが可能となり、データの記録・伝送の効率化に貢献している。
【０００３】
上述のようなＭＰ３、ＡＡＣ等はいずれもロッシー符号化方式といわれるものであり、効率的な圧縮が可能であるが、復号化にあたって、少なからず品質の劣化を伴い、原信号を完全に再現することはできない。そのため、音楽制作、素材保管、ロケ素材の中継など音楽制作分野では、これらの符号化方式を適用できず、非効率ではあるが、非圧縮で保存・伝送する方式がとられている。特に最近は高精細オーディオを扱うプロダクションが増え、素材容量が膨大になり、ワークディスクを管理する上で問題になってきていた。
【０００４】
最近では、上記問題を解決するため、音響信号を可逆圧縮符号化する方法として、予測符号化を利用し、予測誤差のデータを出現頻度に応じた符号化処理と組み合わせたものも提案されている（例えば、特許文献１参照）。
【０００５】
また、本出願人も、音響信号のサンプル列に対してチャンネル間、フレーム間の差分演算を行って、各サンプルの値を小さくした後、予測符号化を利用してデータの圧縮を行う技術について提案している。（特許文献２参照）。
【０００６】
【特許文献１】
特開２００２−２７８６００号公報
【特許文献２】
特願２００２−２３１１５０号
【０００７】
【発明が解決しようとする課題】
しかしながら、上記特許文献１、特許文献２で提案した技術では、予測の方式が１つに限定されている。例えば、特許文献２に記載の技術では、過去２サンプルに基づく予測符号化を提案している。過去２サンプルに基づく予測符号化が適用できるコンテンツのジャンルは最も広いが、変化が緩やかなコンテンツは過去１サンプル、変化が激しいコンテンツは過去３サンプル以上に基づいて予測符号化を行った方が予測誤差が小さくなることが確認されている。そこで、複数の計算式を準備しておき、コンテンツに合わせて適切なものを選択する方式もとれるが、コンテンツ全体が均一な性質をもっていないと圧縮効果が得られず、特にヴォーカルやミックスされた音楽では、圧縮効率が低下するという問題がある。
【０００８】
また、特許文献２で提案した内容では、あらかじめ設定した圧縮困難な固定ビット長の量子化雑音成分を可変長符号化対象から分離する手法をとっていた。このときの固定ビット長は、Ａ／Ｄ変換器のビット精度や、オーディオ収録時の条件などに依存してコンテンツごとに異なるため、随時最適な値を設定する必要がある。さらに、コンテンツ内においても信号振幅の変化により最適な固定ビット長が変化することが判明し、一律に処理すると圧縮効果が低下するという問題がある。
【０００９】
上記のような点に鑑み、本発明は、時系列信号内の信号特性の変化に伴って、随時最適な方式を選択していくことにより、より効率の高い圧縮を行うことが可能であると共に、復号時には、元の時系列信号を完全に復号することが可能な時系列信号の符号化方法および装置を提供することを課題とする。
【００１０】
【課題を解決するための手段】
上記課題を解決するため、本発明では、時系列のサンプル列で構成される時系列信号に対して、前記全てのサンプル列を再現できるように情報量を圧縮する符号化方法として、前記サンプル列の各サンプルに対して、時間的に過去のサンプル列から、複数の予測計算式に基づいて複数の予測誤差値を算出する予測誤差算出段階、過去のサンプル列の予測誤差値の各予測計算式別の累積値である累積誤差に基づいて、前記予測誤差算出段階で算出された複数の予測誤差値の中から、符号化対象の予測誤差値として１つを選択する符号化対象誤差選択段階、前記符号化対象として選択された予測誤差値を有する誤差サンプルに対して可変長符号で符号化を行う可変長符号化段階を有し、前記予測誤差算出段階、符号化対象誤差選別段階、可変長符号化段階を全サンプルについて実行するようにしたことを特徴とする。
【００１１】
本発明によれば、各サンプルについて、複数の予測計算式に基づいて複数の予測誤差値を算出し、過去のサンプル列に対する各計算式別の累積予測誤差が最小となる計算式の予測誤差を、そのサンプルの予測誤差として選出するようにしたので、時間の変化に伴う信号波形の変化に応じて、最適な計算式に基づいて予測誤差を算出するようにしたので、より効率の高い圧縮を行うことが可能となる。
【００１２】
【発明の実施の形態】
以下、本発明の実施形態について図面を参照して詳細に説明する。
（時系列信号の構成）
まず、本発明で符号化対象とする時系列信号について説明する。ここでは、時系列信号として複数のチャンネルを有する音響信号の場合を例にとって説明する。まず、時系列信号であるアナログの音響信号をデジタル化する。これは、従来の一般的なＰＣＭの手法を用い、所定のサンプリング周波数でこのアナログ音響信号をサンプリングし、振幅を所定の量子化ビット数を用いてデジタルデータに変換する処理を行えば良い。本実施形態では、サンプリング周波数４４．１ＫＨｚ、量子化ビット数１６ビットで正負の符号を記録した場合を想定して以降説明する。サンプリング周波数４４．１ＫＨｚでサンプリングすると、１秒あたり４４１００個のサンプルにより構成されるサンプル列ができることになる。図１に、先頭からの所定数のサンプルを模式的に示す。図１において横軸は時刻ｔ、縦軸は振幅ｘ（ｔ）を示す。図１において、ｔ−４〜ｔ＋５の数字は各サンプルのサンプル番号を示しており、上述のように４４．１ＫＨｚでサンプリングすると、この１０サンプルは、１／４４１０秒に相当する。なお、ここでは、サンプル番号と時刻を同義で用いている。これは、サンプル番号が時刻の変化に伴って昇順に付与されており、時刻にサンプリング周波数を乗じた値がサンプル番号となるためである。また、図１において各サンプル番号から延びる線分は振幅値を示しているが、この振幅値は、上述のように量子化ビット数１６ビットで正負の符号を記録した場合、−３２７６８〜３２７６７の値をとることになる。
【００１３】
（処理概要）
続いて、上記のような時系列信号に対する符号化処理について説明していく。まず、最初に、適応型線形予測符号化を行う。適応型線形予測符号化の処理概要を図２のフローチャートに示す。図２において、まず、あらかじめ準備された複数の予測計算式を用いて、各予測計算式に対応した線形予測誤差を算出する（ステップＳ１）。具体的には、サンプル番号ｔの予測誤差を算出する予測計算式として、以下の〔数式１〕〜〔数式４〕を用意している。
【００１４】
〔数式１〕
ｅ１（ｔ）＝ｘ（ｔ）−ｘ（ｔ−１）−ｅ１（ｔ−１）／２
【００１５】
〔数式２〕
ｅ２（ｔ）＝ｘ（ｔ）−２×ｘ（ｔ−１）＋ｘ（ｔ−２）−ｅ２（ｔ−１）／２
【００１６】
〔数式３〕
ｅ３（ｔ）＝ｘ（ｔ）−３×ｘ（ｔ−１）＋３×ｘ（ｔ−２）−ｘ（ｔ−３）−ｅ３（ｔ−１）／２
【００１７】
〔数式４〕
ｅ４（ｔ）＝ｘ（ｔ）−４×ｘ（ｔ−１）＋６×ｘ（ｔ−２）−４×ｘ（ｔ−３）＋ｘ（ｔ−４）−ｅ４（ｔ−１）／２
【００１８】
上記〔数式１〕〜〔数式４〕において、ｅ１（ｔ）〜ｅ４（ｔ）は各予測計算式による時刻ｔのサンプルにおける予測誤差であり、ｘ（ｔ）〜ｘ（ｔ−４）は時刻ｔ〜ｔ−４における振幅値である。
【００１９】
上記〔数式２〕における「２×ｘ（ｔ−１）−ｘ（ｔ−２）」、上記〔数式３〕における「３×ｘ（ｔ−１）−３×ｘ（ｔ−２）＋ｘ（ｔ−３）」、上記〔数式４〕における「４×ｘ（ｔ−１）−６×ｘ（ｔ−２）＋４×ｘ（ｔ−３）−ｘ（ｔ−４）」は過去の２〜４個のサンプルに基づく線形予測成分である。この線形予測成分、および、直前のサンプルにおいて算出された予測誤差「ｅ１（ｔ−１）／２」〜「ｅ４（ｔ−１）／２」（誤差フィードバック成分）を用いて時刻ｔにおける予測誤差ｅ１（ｔ）〜ｅ４（ｔ）を算出する。
【００２０】
続いて、上記各予測計算式別の予測誤差値の絶対値の累積である累積誤差が最小となる線形予測誤差をそのサンプルの予測誤差として選出する（ステップＳ２）。ここでは、累積誤差という考え方を用いている。具体的には、各予測計算式〔数式１〕〜〔数式４〕により算出された予測誤差の過去のサンプルについての累積値をＲ１〜Ｒ４として設定する。そして、この累積誤差Ｒ１〜Ｒ４のうち、最小となるものに対応する予測誤差を選出する。例えば、Ｒ１〜Ｒ４のうち、Ｒ２が最小であったとする。この場合、〔数式２〕で算出された予測誤差ｅ２（ｔ）を符号化対象とする予測誤差ｅ（ｔ）として選出することになる。選出された予測誤差ｅ（ｔ）はサンプルの元の値ｘ（ｔ）と置き換えられて以降処理が行われることになる。元のサンプルと区別するために予測誤差ｅ（ｔ）を記録したサンプルを誤差サンプルと呼ぶことにする。
【００２１】
続いて、累積誤差Ｒ１〜Ｒ４に各予測誤差ｅ１（ｔ）〜ｅ４（ｔ）の絶対値を加算する（ステップＳ３）。具体的には、以下の〔数式５〕に示すように、累積誤差値となる変数Ｒ１〜Ｒ４を更新していく。同時に、各サンプルの処理を行う度に、カウンタを１つづつ加算していく処理を行う。
【００２２】
〔数式５〕
Ｒ１←Ｒ１＋｜ｅ１（ｔ）｜
Ｒ２←Ｒ２＋｜ｅ２（ｔ）｜
Ｒ３←Ｒ３＋｜ｅ３（ｔ）｜
Ｒ４←Ｒ４＋｜ｅ４（ｔ）｜
【００２３】
続いて、カウンタが所定回数を超えたかどうかの判定を行う（ステップＳ４）。本実施形態では、この所定回数を１００回として設定している。すなわち、カウンタが１００を超えたかどうかの判定を行う。
【００２４】
この結果、カウンタが１００を超えていたら、累積誤差を半分にする（ステップＳ５）。具体的には、以下の〔数式６〕に示すように、累積誤差となる変数Ｒ１〜Ｒ４を２で除算する。同時に、カウンタを０にリセットする。すなわち、ここでのＲ１〜Ｒ４は純粋な意味での累積誤差ではなく、累積誤差の移動平均となっている。本実施形態では、直前の最大１００サンプルまでは累積されるが、それ以前のものは半分になるように処理する。これにより、時間的に離れたサンプルの影響が小さくなるようにしている。
【００２５】
〔数式６〕
Ｒ１←（Ｒ１）／２
Ｒ２←（Ｒ２）／２
Ｒ３←（Ｒ３）／２
Ｒ４←（Ｒ４）／２
【００２６】
上記ステップＳ１〜ステップＳ５の処理を時系列信号中の全時刻全サンプルに渡って実行することにより、全サンプルの値が元の振幅値ｘ（ｔ）から対象誤差ｅ（ｔ）に置き換えられることになる。
【００２７】
続いて、各誤差サンプルの正負極性処理を行う（ステップＳ６）。上記ステップＳ１〜ステップＳ５の処理により各サンプルの値は、振幅値から予測誤差に置き換えられたが、各サンプルのビット形式は、当初のままである。通常、コンピュータ等の計算機で演算される場合は、各データは３２ビット単位で処理され、２の補数表現を用いて表現されている。これを、正負の符号付き絶対値表現に変換し、なおかつ、その絶対値部分を上位に１ビット移動させ、正負の符号ビットをＬＳＢ（最下位ビット）に移動させる。ステップＳ６によるビット構成の変換の様子を模式的に示すと図３のようになる。図３（ａ）は処理前のビット構成であり、図３（ｂ）は処理後のビット構成である。このように正負の符号ビットをＬＳＢに移動させるのは、後述するステップＳ７以降（可変長符号化）の処理で、各誤差サンプルのビット長を検出し易くするためである。
【００２８】
ここからは、各誤差サンプルを可変長に変換する処理を行っていく。本実施形態におけるビット長変換は、一般にゴロム符号化と呼ばれる方式を採用している。具体的には、１サンプルを構成するビット成分を上位ビット成分と下位ビット成分に分け、下位ビット成分は変更を加えずそのままとし、上位ビット成分は、上位ビットだけを十進数変換した数値分のビット「０」を並べ、最後にセパレータビット「１」を加えた配列とする。例えば、８ビットのビット成分「００１０１０００」を考えてみる。このとき、下位ビット成分を４ビットとすると、下位ビット成分は「１０００」となる。上位ビットは「００１０」であるため、これを十進数変換した「２」個分の「０」を配列して最後に「１」を加えた「００１」に変換される。この結果、８ビットのビット列「００１０１０００」は、７ビットのビット列「００１１０００」に変換されることになる。本実施形態では、変換の前後でビット成分を不変とする下位ビット成分のビット長を各サンプルで可変とするようにしている。
【００２９】
以下、具体的に説明していく。図４は可変長符号化の概要を示すフローチャートである。まず、過去のサンプルのビット長の移動平均である平均ビット長Ｂｆを算出する（ステップＳ７）。平均ビット長Ｂｆは、過去のビット長の累積値である累積ビット長ＲＢを、過去のサンプル数を基にしたカウンタＣで除算することにより求められる。すなわち、Ｂｆ＝ＲＢ／Ｃで算出される。累積ビット長ＲＢは、初期状態では０であるので、ｔ＝１のサンプルを処理する場合には、ｔ＝１のサンプルのビット長Ｂｄ（ｔ）を初期値として設定しておく。また、初期のカウンタＣ＝１と設定する。
【００３０】
続いて、時刻ｔにおけるサンプルのビット長Ｂｄ（ｔ）を算出する（ステップＳ８）。ｔ＝２以降のサンプルについては、平均ビット長Ｂｆの算出後、サンプルのビット長Ｂｄ（ｔ）を算出する。このビット長Ｂｄ（ｔ）は、上記ステップＳ６のようにビット構成の変換を行ったことにより算出し易くなっている。図３（ｂ）に示したようなビット構成に変換したことにより、各サンプルのビット構成において先頭にビット「１」が出現したところからがビット長となる。次に、変更部のビット長Ｂｖを算出する（ステップＳ９）。これは、上記誤差サンプルのビット長Ｂｄ（ｔ）から平均ビット長Ｂｆを減じることにより算出される。続いて、データの符号出力を行う（ステップＳ１０）。具体的には、上位Ｂｖビットを十進数変換した数値分だけ「０」を出力した後、セパレータビット「１」を出力し、下位Ｂｆビットを不変部として出力する。符号出力は、ハードディスク、ＣＤ−Ｒ等の外部記憶装置への記録として行われることになる。次に、累積ビット長ＲＢにビット長Ｂｄ（ｔ）を加算する（ステップＳ１１）。同時に、各誤差サンプルの処理を行う度に、カウンタＣを１つづつ加算していく処理を行う。続いて、カウンタＣが所定の数を超えたかどうかを判定する（ステップＳ１２）。所定の数としては、ここでも１００程度を設定している。そのため、カウンタが１００を超えたかどうかを判断することになる。この結果、カウンタが１００を超えていたら、累積ビット長ＲＢを半分にする（ステップＳ１３）。具体的には、累積ビット長となる変数ＲＢを２で除算する。同時に、カウンタＣを半分に１／２にする。
【００３１】
上記のようにして、各サンプルについて可変ビット長での符号化が行われて行く。符号化により得られた可変長誤差サンプルは、符号データとして目的とする記録媒体に記録されることになる。なお、ステップＳ１〜ステップＳ１３の処理について、説明の便宜上、図２の適応型線形予測と図４の可変長符号化に分けて説明したが、実際は、各サンプルについてステップＳ１〜ステップＳ１３までの処理を並行して行うようにしている。すなわち、図１に示したようにステップＳ４、ステップＳ５の処理後、ステップＳ１に戻るのではなく、ステップＳ６〜ステップＳ１３の処理を行った後、ステップＳ１に戻るようにしている。
【００３２】
（他の圧縮方式との組み合わせ）
本発明は、上記説明の内容のみであっても十分に圧縮効果をあげることが可能であるが、他の圧縮方式と組み合わせることで、より高い効果を得ることができる。以下に、好ましい組み合わせについて説明する。ここでは、図５に示すようなステレオ音響信号に対して処理を行う場合を想定して説明する。図５（ａ）は、２チャンネルのステレオ音響信号を示しており、Ｃｈ１にＬ（左）信号、Ｃｈ２にＲ（右）信号が記録されている。また、図５（ａ）から（ｄ）においては、左端が開始時刻であり、右端が終端時刻である。図１と比較すると、横軸の時間間隔を凝縮した形式で示している。図６は、本発明に他の方式を組み合わせた場合の全体の処理概要を示すフローチャートである。ここからは、図６に従って説明していく。
【００３３】
（信号平坦部処理方式）
まず、デジタル音響信号であるサンプル列に対して、信号平坦部の処理を行う（ステップＳ２１）。信号平坦部とは、同一の信号レベルが連続する部分のことをいう。特に信号レベルが「０」の無音部、および信号レベルの絶対値が最大の飽和部に現れることが多い。無音部は実際に無音であるか、音が非常に小さく記録されなかった場合に生じるが、飽和部は、信号の録音およびＡ／Ｄ変換の過程において生じる。無音部、飽和部またはそれ以外の同一信号レベルが連続する場合のいずれであっても、信号平坦部は、同一の信号レベルが所定の時間（所定のサンプル数）連続して記録される。このため、この部分は圧縮し易いデータになっている。具体的には、信号平坦部の先頭時刻位置と、同一信号レベルが続くサンプルの個数と、信号レベル（サンプル値）の３つの値を信号平坦部データとして各チャンネルのサンプル列と分離して記録する。各チャンネルのサンプル列からは、信号平坦部が削除される。これを模式的に示すと図５（ｂ）（ｃ）に示すようになる。図５（ｂ）は、信号平坦部処理前のサンプル列である。図５（ｂ）において、網掛けで示した部分は信号平坦部を示す。ステップＳ２１の処理により、信号平坦部は元のサンプル列からは削除され、図５（ｃ）に示すようになる。ただし、復号時に元通りに復元するために、分離された信号平坦部は、図５（ｅ）に示すような形式で記録しておく。
【００３４】
信号平坦部データは、上述のように、信号平坦部ごとに、その先頭時刻（サンプル番号）、サンプル数、サンプル値の３属性で記録する。ここで、先頭時刻とは、信号の開始位置からの時刻であり、図５（ｅ）の例では、先頭からのサンプル番号で記録している。上述のように、サンプル番号をサンプリング周波数で除算すれば、時刻に変換されることになる。サンプル数は、そのサンプル値がどの程度連続して続くかを示す情報である。なお、サンプル数の代わりに信号平坦部の終了時刻を記録するようにしても良い。サンプル値は、デジタル化された信号レベルを示している。ここでは、１６ビットで量子化しているので、最大値は「３２７６７」、最小値は「−３２７６８」となる。すなわち、「０」は無音部、「３２７６７」および「−３２７６８」は飽和部を示している。ただし、信号平坦部を無条件には処理しない。ここでは、データの圧縮を目的としているため、サンプル列の削減分よりも信号平坦部データが大きくなると意味がないからである。したがって、信号平坦部となるサンプルが所定数以上連続する場合に限り信号平坦部データを作成して各チャンネルのサンプル列から分離するのである。
【００３５】
続いて、各サンプルに対して、元のサンプル値から予測誤差への変換処理を行う（ステップＳ１〜ステップＳ５）。これは、図２のフローチャートに示した処理を実行することにより各サンプルの値を予測誤差に変換する。複数チャンネルある場合は、各チャンネルのサンプル列に対して処理を行う。
【００３６】
（チャンネル間演算方式）
次に、予測誤差値が記録された各チャンネルの誤差サンプル列に対して、チャンネル間の差分演算を行う（ステップＳ２２）。これは、同一時刻における誤差サンプル値の差分を単純にとることにより行われる。差分演算の結果は、一方のチャンネルの誤差サンプル列として与え、他方のチャンネルの誤差サンプル列の値は、元のままとしておく。具体的には、図５（ｃ）に示すような２チャンネルのステレオ音響信号の場合Ｃｈ１にはＬ信号の値をそのまま記録しておき、Ｃｈ２にはＲ−Ｌの差分値を与える。一般に、ステレオ音響信号では、同一時刻におけるそれぞれのデータには相関があり、各時刻における両データの差分値は元の値に比べて小さな値となる。これは線形予測により予測符号化した場合も同じである。そのため、図５（ｄ）の例では、Ｃｈ２における各誤差サンプルの値が小さくなり、後に圧縮できる余地が大きくなる。
【００３７】
（フレーム間演算方式）
続いて、チャンネル間演算が行われた各チャンネルの誤差サンプル列に対して、所定の区間長をもつフレームを設定して、設定されたフレーム間の演算を行う（ステップＳ２３）。各フレームを構成する誤差サンプル列の類似度を求め、類似しているフレームを選別する。ここでは、フレーム長を誤差サンプル列の開始時刻から終了時刻までの全区間に渡って固定長としている。具体的には、１フレームを２５６サンプルとしている。サンプル列の先頭から２５６サンプルずつを１フレームとして抽出し、各フレームの類似度を求めていくことになる。フレーム同士の類似度とは、両信号の相関を求めることになるので、相関計算を行うための種々の手法を用いることができるが、ここでは、各フレームにおいて対応する２５６サンプルに対して差分を計算し、各々の絶対値の最大値を算出する。ここでは、基本フレームに対して後続する１００フレームについて各々最大になる差分絶対値を算出し、最大値が所定値以下となるフレームを相関フレームとして選別し、前記基本フレームと１つのグループを形成することになる。この処理は誤差サンプル列の全区間に渡って行われる。ここで、ステップＳ２３の処理による誤差サンプル列の変化の様子を図７（ａ）〜（ｃ）に示す。なお、図７においては、図５と異なり１チャンネルしか示していないが、他のチャンネルについても同様に処理される。まず、図７（ａ）に示したように、固定長にフレーム化された誤差サンプル列は、フレームＦ１、Ｆ２、Ｆ３、…Ｆｍ、…Ｆｎ、…に分割される。
【００３８】
続いて、１つの基本フレームに対して後続する複数のフレームについて、差分を算出する。まず先頭のフレームＦ１と次のフレームＦ２内の各サンプルごとに差分を算出していく。この例では、２５６個の差分値が各サンプル時刻に対して得られることになる。得られた差分値の絶対値の最大値をＦ２フレームにおけるＦ１フレームとの相関を示す指標値として記録しておく。同様に、Ｆ３フレームに対してもＦ１フレームとの差分絶対値の最大値を求め、最大値が最も小さくなるフレームを相関フレーム候補として選別する。例えば、フレームＦ１を基本フレームとしたとき、フレームＦ３の差分絶対値の最大値が最も小さいため、フレームＦ３が相関フレーム候補となる。そして、差分をとる前のフレームＦ３の各サンプル値の絶対値の最大値に比べ、前記差分絶対値の最大値が、所定の割合以下に減少している場合、フレームＦ３を相関フレームに決定し、基本フレームであるフレームＦ１とグループＡを形成する。この時、フレームＦ１はそのままであるが、フレームＦ３の各サンプルには、フレームＦ１との差分値に更新されることになる。差分値であることを示すために、処理後のフレームをフレーム「Ｆ３−Ｆ１」で表現することにする。さらに、後続するフレームに対しても同様の処理が行われる。例えば、基本フレームＦｍに対してフレームＦｎが相関フレームとして決定され、グループＧが構成されるとともに、フレームＦｎについても差分処理を行い、フレーム「Ｆｎ−Ｆｍ」が得られる。結局、グループ内の基本フレームは、そのままとなり、グループ内の相関フレームには、基本フレームとの差分が記録されることになる。
【００３９】
ステップＳ２３においては、上記差分演算処理と並列してフレーム間の関係であるフレーム構造データを記録していく。具体的には、どのフレームがグループ化されたかの情報を記録していくことになる。フレームの記録は、各フレームのフレーム番号を記録することにより行う。ここで、フレーム構造データの一例を図７（ｄ）に示す。図７（ｄ）に示すようにフレーム構造データには、グループ番号とそのグループに属する基本フレームと相関フレームの各々のＩＤ番号により記録している。このフレーム構造データは、復号時に元の信号を忠実に復元するために必要となる。ステップＳ２３では、類似しているフレームを選別して各グループの相関フレームは基本フレームとの差分で記録するようにした。類似しているフレームの差分値は、値が小さくなるので、後述する処理で記録するビット数を変化させたときに、少ないビット数で表現することが可能となる。
【００４０】
この後、図３に示したような正負極性の処理を行う（ステップＳ６）。これは、図３に示したように、２の補数表現を符号付絶対値表現に変換し、正負の符号を最下位ビットに移動させるものである。さらにこの後、可変長符号化を行う（ステップＳ７〜ステップＳ１３）。これは、図４のフローチャートに示した処理を実行することにより各サンプルのビット長を可変長で符号化していく。
【００４１】
（予測誤差の一例）
次に、上記ステップＳ１で行われる予測誤差の算出について説明しておく。ここでは、代表して過去の２サンプルにより予測する〔数式２〕を用いた場合について説明する。例えば、サンプル値ｘ（ｔ）が図８（ａ）に示すような状態である場合を考えてみる。図８（ａ）において、横軸は時刻（サンプル番号）、縦軸はサンプル値ｘ（ｔ）である。また、各時刻における線分は、各時刻におけるサンプル値ｘ（ｔ）の大きさを示している。〔数式２〕による算出の手順を図８を用いて説明すると、まず、誤差フィードバック成分を加えない状態で各予測誤差ｅｏ（ｔ）を算出する。図８（ｂ）に示すように、時刻ｔの予測誤差ｅｏ（ｔ）を算出する場合、直前の時刻ｔ−１におけるサンプル値ｘ（ｔ−１）および２つ前の時刻ｔ−２におけるサンプル値ｘ（ｔ−２）を結ぶ予測線が時刻ｔでとる値と、時刻ｔにおけるサンプル値ｘ（ｔ）の差分（図中太点線で示す）に基づいて予測誤差ｅｏ（ｔ）が算出される。時刻ｔ＋１以降も同様に行って予測誤差ｅｏ（ｔ＋１）を算出する。算出された予測誤差ｅｏ（ｔ）は、図８（ｃ）に示すようになる。図８（ａ）と図８（ｃ）を比較するとわかるように値が変動する範囲が大きく狭まり、データ圧縮に都合が良くなる。
【００４２】
続いて、〔数式２〕に基づいて予測誤差ｅｏ（ｔ）に対して直前の時刻ｔ−１における補正が加わった予測誤差ｅ２（ｔ−１）の５０％を減算させて、誤差フィードバック処理を加えた結果が図８（ｄ）である。図８（ｃ）と比べると、時刻ｔ＋１およびｔ＋２における予測誤差の低減が顕著である。逆に時刻ｔ＋３およびｔ＋４では予測誤差が増大しているが、平均的には予測誤差が低減し、図８（ａ）と比較すると値が変動する範囲が更に狭まり、データ圧縮効果が向上する。
【００４３】
（実現のための装置構成）
上記符号化方法は、現実には専用のソフトウェアを搭載したコンピュータにより実現されることは当然である。コンピュータに上記処理を実行させるプログラムを搭載することにより、コンピュータがデジタル音響信号等の時系列信号を読んだ後、上記ステップＳ１〜ステップＳ１３の処理を実行することにより圧縮された符号データが得られる。
【００４４】
【発明の効果】
以上、説明したように本発明によれば、時系列のサンプル列で構成される時系列信号の可逆圧縮を行うにあたり、サンプル列の各サンプルに対して、時間的に過去のサンプル列から、複数の予測計算式に基づいて複数の予測誤差値を算出し、過去のサンプル列の予測誤差値の累積値である累積誤差に基づいて、算出された複数の予測誤差値の中から、符号化対象の予測誤差値として１つを選択し、符号化対象として選択された予測誤差値に対して可変長符号で符号化を行うことにより各サンプルを符号化し、予測誤差の算出、符号化対象誤差の選択、可変長符号化を全サンプルについて実行するようにしたので、より効率の高い圧縮を行うことが可能となるという効果を奏する。
【図面の簡単な説明】
【図１】本発明において処理対象とするデジタル時系列信号のサンプル列を模式的に示した図である。
【図２】本発明における適応型線形予測処理の概要を示すフローチャートである。
【図３】ステップＳ６の正負極性処理の様子を示す図である。
【図４】本発明における可変長符号化の概要を示すフローチャートである。
【図５】信号平坦部処理およびチャンネル間演算処理の様子を示す図である。
【図６】本発明に好適な他の圧縮処理を組み合わせた場合の処理概要を示すフローチャートである。
【図７】フレーム間演算処理の様子を示す図である。
【図８】〔数式２〕を用いた予測誤差算出処理の様子を示す図である。
【符号の説明】
Ｒ１〜Ｒ４・・・累積誤差
Ｂｖ・・・変更部ビット長
Ｂｄ（ｔ）・・・誤差サンプルビット長
Ｂｆ・・・平均ビット長
ＲＢ・・・累積ビット長
[0001]
[Industrial applications]
The present invention relates to a lossless (reversible) data compression technique suitable for the music production field, including music production, archiving of audio material, and relay of location recordings, and in particular for the production of high-definition audio whose quality exceeds that of CDs; for audio recording and playback using digital recording media such as CDs and DVDs; and for fields in which modification of the data is unacceptable, such as the transmission of biological signals in telemedicine.
[0002]
[Prior art]
Conventionally, various methods have been used for compressing an acoustic signal. MP3 (MPEG-1 / Layer3), AAC (MPEG-2 / Layer3) and the like have been put to practical use as a technique for compressing and encoding an audio signal. With such a compression encoding method, it is possible to treat an audio signal as small data, which contributes to the efficiency of data recording and transmission.
[0003]
MP3, AAC, and the like described above are all so-called lossy coding schemes. They compress efficiently, but decoding always entails some loss of quality, and the original signal cannot be reproduced exactly. For this reason, these coding schemes cannot be applied in the music production field (music production, material storage, relay of location recordings, and so on), where material is instead stored and transmitted uncompressed despite the inefficiency. In particular, the number of productions handling high-definition audio has recently increased and the volume of material has become enormous, which has become a problem for managing work disks.
[0004]
Recently, to solve the above problem, a method of losslessly compressing and encoding an audio signal has been proposed that uses predictive coding in combination with an encoding process that codes the prediction-error data according to their frequency of occurrence (see, for example, Patent Document 1).
[0005]
The present applicant has also proposed a technique that performs difference operations between channels and between frames on the sample sequence of an audio signal to reduce the value of each sample, and then compresses the data using predictive coding (see Patent Document 2).
[0006]
[Patent Document 1]
JP 2002-278600 A
[Patent Document 2]
Japanese Patent Application No. 2002-231150
[0007]
[Problems to be solved by the invention]
However, in the techniques proposed in Patent Documents 1 and 2, the prediction method is limited to a single one. For example, the technique described in Patent Document 2 proposes predictive coding based on the past two samples. Predictive coding based on the past two samples applies to the widest range of content genres, but it has been confirmed that the prediction error is smaller when slowly changing content is predicted from the past one sample and rapidly changing content is predicted from the past three or more samples. One could therefore prepare several calculation formulas and select the appropriate one for each piece of content, but this yields no compression benefit unless the whole content has uniform characteristics, and compression efficiency drops in particular for vocals and mixed music.
[0008]
Further, in the content proposed in Patent Document 2, a method is employed in which a quantization noise component having a fixed bit length which is set in advance and is difficult to compress is separated from a variable length coding target. At this time, the fixed bit length differs for each content depending on the bit precision of the A / D converter, the conditions at the time of audio recording, and the like, and therefore, it is necessary to set an optimal value as needed. Further, it has been found that the optimum fixed bit length changes due to a change in the signal amplitude even in the content, and there is a problem that the compression effect is reduced if the processing is performed uniformly.
[0009]
In view of the above points, an object of the present invention is to provide a time-series signal encoding method and apparatus that achieve more efficient compression by selecting the optimal method as needed in response to changes in signal characteristics within the time-series signal, and that allow the original time-series signal to be restored completely on decoding.
[0010]
[Means for Solving the Problems]
To solve the above problem, the present invention provides, as an encoding method that compresses the amount of information of a time-series signal composed of a time-ordered sample sequence in such a way that the entire sample sequence can be reproduced, a method comprising: a prediction error calculation step of calculating, for each sample of the sample sequence, a plurality of prediction error values from the temporally past samples on the basis of a plurality of prediction formulas; an encoding target error selection step of selecting one of the plurality of prediction error values calculated in the prediction error calculation step as the prediction error value to be encoded, on the basis of cumulative errors, each being the accumulated value, kept separately for each prediction formula, of the prediction error values of past samples; and a variable-length coding step of encoding, with a variable-length code, the error sample holding the prediction error value selected for encoding; wherein the prediction error calculation step, the encoding target error selection step, and the variable-length coding step are executed for all samples.
[0011]
According to the present invention, a plurality of prediction error values are calculated for each sample on the basis of a plurality of prediction formulas, and the prediction error of the formula whose cumulative prediction error over the past samples is smallest is selected as the prediction error of that sample. The prediction error is thus calculated with the formula best suited to the changes of the signal waveform over time, which makes more efficient compression possible.
[0012]
[Best mode for carrying out the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
(Configuration of time series signal)
First, the time-series signal to be encoded in the present invention will be described. Here, an audio signal having a plurality of channels is taken as an example of the time-series signal. First, the analog audio signal, which is a time-series signal, is digitized. This can be done with the conventional, general PCM technique: the analog audio signal is sampled at a predetermined sampling frequency, and its amplitude is converted into digital data using a predetermined number of quantization bits. The following description assumes a sampling frequency of 44.1 kHz and signed 16-bit quantization. Sampling at 44.1 kHz yields a sample sequence of 44100 samples per second. FIG. 1 schematically shows a predetermined number of samples from the beginning. In FIG. 1, the horizontal axis represents time t, and the vertical axis represents amplitude x(t). The labels t−4 to t+5 in FIG. 1 are the sample numbers of the respective samples; at 44.1 kHz, these 10 samples correspond to 1/4410 second. Sample number and time are used synonymously here, because sample numbers are assigned in ascending order as time advances, and the sample number equals the time multiplied by the sampling frequency. The line segment extending from each sample number in FIG. 1 indicates the amplitude value, which, with signed 16-bit quantization as described above, takes a value from −32768 to 32767.
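As a quick check of the figures in this paragraph, the following minimal Python sketch converts a sample count to a duration and computes the signed 16-bit range. The constant and function names are illustrative assumptions and do not appear in the source.

```python
from fractions import Fraction

SAMPLING_RATE_HZ = 44_100   # sampling frequency assumed in this embodiment
BITS_PER_SAMPLE = 16        # signed 16-bit quantization

def samples_to_seconds(n_samples: int) -> Fraction:
    """Duration covered by n_samples at the sampling rate, as an exact fraction."""
    return Fraction(n_samples, SAMPLING_RATE_HZ)

def signed_range(bits: int) -> tuple[int, int]:
    """Minimum and maximum of a two's-complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

ten_sample_span = samples_to_seconds(10)          # the 10 samples t-4 .. t+5
amplitude_range = signed_range(BITS_PER_SAMPLE)   # range of x(t)
```

Using an exact fraction avoids floating-point rounding when sample numbers are converted back to times.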
[0013]
(Outline of processing)
Next, the encoding process for the time-series signal as described above will be described. First, adaptive linear prediction coding is performed. The processing outline of the adaptive linear prediction coding is shown in the flowchart of FIG. In FIG. 2, first, a linear prediction error corresponding to each prediction formula is calculated using a plurality of prediction formulas prepared in advance (step S1). Specifically, the following [Equation 1] to [Equation 4] are prepared as prediction calculation equations for calculating the prediction error of the sample number t.
[0014]
[Equation 1]
e1(t) = x(t) − x(t−1) − e1(t−1) / 2
[0015]
[Equation 2]
e2(t) = x(t) − 2 × x(t−1) + x(t−2) − e2(t−1) / 2
[0016]
[Equation 3]
e3(t) = x(t) − 3 × x(t−1) + 3 × x(t−2) − x(t−3) − e3(t−1) / 2
[0017]
[Equation 4]
e4(t) = x(t) − 4 × x(t−1) + 6 × x(t−2) − 4 × x(t−3) + x(t−4) − e4(t−1) / 2
[0018]
In [Equation 1] to [Equation 4] above, e1(t) to e4(t) are the prediction errors of the sample at time t given by the respective prediction formulas, and x(t) to x(t−4) are the amplitude values at times t to t−4.
[0019]
“2 × x(t−1) − x(t−2)” in [Equation 2] above, “3 × x(t−1) − 3 × x(t−2) + x(t−3)” in [Equation 3], and “4 × x(t−1) − 6 × x(t−2) + 4 × x(t−3) − x(t−4)” in [Equation 4] are the linear prediction components based on the past two to four samples. The prediction errors e1(t) to e4(t) at time t are calculated from these linear prediction components and from the error feedback components “e1(t−1) / 2” to “e4(t−1) / 2”, that is, half of the prediction errors calculated for the immediately preceding sample.
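The four prediction formulas can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's implementation: the source only writes "/ 2" for the feedback term, so the rounding rule (truncation toward zero) is assumed here, as is treating samples before the start of the signal as 0. The names `PREDICTOR_WEIGHTS`, `half`, and `prediction_errors` are hypothetical.

```python
# Weights applied to x(t-1), x(t-2), ..., x(t-k) in [Equation 1] to [Equation 4].
PREDICTOR_WEIGHTS = [
    [1],
    [2, -1],
    [3, -3, 1],
    [4, -6, 4, -1],
]

def half(v: int) -> int:
    """Halve an integer, truncating toward zero (assumed rounding rule)."""
    return v // 2 if v >= 0 else -((-v) // 2)

def prediction_errors(x, t, prev_errors):
    """Return [e1(t), e2(t), e3(t), e4(t)] for sample index t.

    x           -- sequence of integer samples
    prev_errors -- [e1(t-1), ..., e4(t-1)] from the previous sample
    Samples before the start of the signal count as 0 (assumption).
    """
    errors = []
    for weights, e_prev in zip(PREDICTOR_WEIGHTS, prev_errors):
        linear_prediction = sum(
            w * x[t - 1 - i]
            for i, w in enumerate(weights)
            if t - 1 - i >= 0
        )
        # Prediction error = actual value - linear prediction - error feedback.
        errors.append(x[t] - linear_prediction - half(e_prev))
    return errors
```

On a straight ramp such as 10, 12, 14, 16, 18 the higher-order formulas predict the next value exactly, while the first-order formula leaves the constant step as its error.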
[0020]
Next, the linear prediction error of the formula whose cumulative error, that is, the accumulated absolute values of its prediction errors, is smallest is selected as the prediction error of the sample (step S2). This relies on the notion of a cumulative error. Specifically, the accumulated values over past samples of the prediction errors calculated by [Equation 1] to [Equation 4] are held as R1 to R4, and the prediction error corresponding to the smallest of the cumulative errors R1 to R4 is selected. For example, suppose R2 is the smallest of R1 to R4. In this case, the prediction error e2(t) calculated by [Equation 2] is selected as the prediction error e(t) to be encoded. The selected prediction error e(t) replaces the original value x(t) of the sample, and subsequent processing operates on it. To distinguish it from the original sample, a sample that holds the prediction error e(t) is called an error sample.
[0021]
Subsequently, the absolute values of the prediction errors e1(t) to e4(t) are added to the cumulative errors R1 to R4 (step S3). Specifically, the variables R1 to R4 holding the cumulative errors are updated as shown in [Equation 5] below. At the same time, a counter is incremented by one each time a sample is processed.
[0022]
[Equation 5]
R1 ← R1 + |e1(t)|
R2 ← R2 + |e2(t)|
R3 ← R3 + |e3(t)|
R4 ← R4 + |e4(t)|
[0023]
Subsequently, it is determined whether the counter has exceeded a predetermined number of times (step S4). In the present embodiment, the predetermined number is set to 100 times. That is, it is determined whether the counter has exceeded 100.
[0024]
If the counter exceeds 100, the cumulative errors are halved (step S5). Specifically, the variables R1 to R4 holding the cumulative errors are divided by two as shown in [Equation 6] below, and at the same time the counter is reset to 0. R1 to R4 are therefore not cumulative errors in the strict sense but a kind of moving average of the cumulative error. In the present embodiment, the errors of up to the most recent 100 samples are accumulated in full, while the contribution of earlier samples is halved. This reduces the influence of samples that are distant in time.
[0025]
[Equation 6]
R1 ← (R1) / 2
R2 ← (R2) / 2
R3 ← (R3) / 2
R4 ← (R4) / 2
[0026]
By executing steps S1 to S5 for every sample at every time in the time-series signal, the value of each sample is replaced from the original amplitude value x(t) with the prediction error e(t) selected for encoding.
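Steps S1 to S5 can be sketched as the following Python encoder, paired with a decoder to show why the scheme stays reversible: the formula is chosen from the cumulative errors of past samples only, so a decoder can repeat the same choice. This is a sketch under assumptions not stated in the source: integer halving for both the feedback term and the cumulative errors, ties resolved in favor of the lowest-numbered formula, and missing past samples treated as 0. All names are hypothetical.

```python
WEIGHTS = [[1], [2, -1], [3, -3, 1], [4, -6, 4, -1]]  # [Equation 1]..[Equation 4]
RESET_INTERVAL = 100  # the "predetermined number" of this embodiment

def half(v):
    """Halve with truncation toward zero (assumed rounding rule)."""
    return v // 2 if v >= 0 else -((-v) // 2)

def _predictions(x, t, prev_errors):
    """Linear prediction plus error feedback for each formula at index t."""
    preds = []
    for weights, e_prev in zip(WEIGHTS, prev_errors):
        linear = sum(w * x[t - 1 - i]
                     for i, w in enumerate(weights) if t - 1 - i >= 0)
        preds.append(linear + half(e_prev))
    return preds

def _new_state():
    return {"R": [0, 0, 0, 0], "prev": [0, 0, 0, 0], "count": 0}

def _update(state, errors):
    """Steps S3 to S5: accumulate |e_k(t)|, count, and halve R_k periodically."""
    state["R"] = [r + abs(e) for r, e in zip(state["R"], errors)]
    state["prev"] = errors
    state["count"] += 1
    if state["count"] > RESET_INTERVAL:
        state["R"] = [r // 2 for r in state["R"]]  # integer halving (assumption)
        state["count"] = 0

def encode(samples):
    """Replace each sample x(t) by the selected prediction error e(t)."""
    state, out = _new_state(), []
    for t, x in enumerate(samples):
        preds = _predictions(samples, t, state["prev"])   # step S1
        errors = [x - p for p in preds]
        best = state["R"].index(min(state["R"]))          # step S2
        out.append(errors[best])
        _update(state, errors)
    return out

def decode(error_samples):
    """Rebuild the samples by repeating the encoder's selection."""
    state, samples = _new_state(), []
    for t, e in enumerate(error_samples):
        preds = _predictions(samples, t, state["prev"])
        best = state["R"].index(min(state["R"]))
        x = e + preds[best]               # invert the selected formula
        samples.append(x)
        _update(state, [x - p for p in preds])
    return samples
```

Because the selection for sample t depends only on R1 to R4 as they stood before sample t was processed, no side information about the chosen formula needs to be stored in this sketch.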
[0027]
Subsequently, sign (polarity) processing is performed on each error sample (step S6). Steps S1 to S5 replaced the value of each sample, an amplitude value, with a prediction error, but the bit format of each sample is unchanged. When data are processed on a computer, each value is normally handled in 32-bit units in two's complement representation. This is converted into sign-magnitude representation, the magnitude part is shifted up by one bit, and the sign bit is moved to the LSB (least significant bit). FIG. 3 schematically shows the conversion of the bit layout in step S6: FIG. 3A shows the bit layout before processing and FIG. 3B the layout after processing. The sign bit is moved to the LSB to make the bit length of each error sample easier to detect in step S7 and later (variable-length coding), described below.
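The sign processing of step S6 can be sketched as below. Python integers are not fixed-width, so the sketch works on the value rather than on a 32-bit word; the function names are illustrative assumptions.

```python
def fold_sign(e: int) -> int:
    """Sign-magnitude form with the magnitude shifted up one bit and the sign in the LSB."""
    return (abs(e) << 1) | (1 if e < 0 else 0)

def unfold_sign(v: int) -> int:
    """Inverse of fold_sign: recover the signed prediction error."""
    magnitude = v >> 1
    return -magnitude if v & 1 else magnitude

def folded_bit_length(e: int) -> int:
    """Number of significant bits after folding, counted from the leading 1 bit."""
    return fold_sign(e).bit_length()
```

After folding, small errors of either sign become small non-negative integers, so the position of the leading "1" bit directly gives the bit length used in the later steps.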
[0028]
Next, each error sample is converted to a variable length. The bit-length conversion in the present embodiment uses a scheme generally called Golomb coding. Specifically, the bits of one sample are divided into an upper bit component and a lower bit component. The lower bit component is left unchanged, and the upper bit component is replaced by as many “0” bits as the decimal value of the upper bits, followed by a separator bit “1”. Consider, for example, the 8-bit value “00101000”. If the lower bit component is 4 bits long, it is “1000”. The upper bits are “0010”, whose decimal value is 2, so they are converted into “001”: two “0” bits followed by “1”. As a result, the 8-bit string “00101000” is converted into the 7-bit string “0011000”. In the present embodiment, the length of the lower bit component, which is left unchanged by the conversion, varies from sample to sample.
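The worked example in this paragraph can be reproduced with a small Python sketch that operates on bit strings. The helper names and the string representation are illustrative assumptions.

```python
def split_code(bits: str, low_len: int) -> str:
    """Unary-code the upper bits (zeros then a '1' separator) and copy the lower low_len bits."""
    cut = len(bits) - low_len
    upper, lower = bits[:cut], bits[cut:]
    zeros = int(upper, 2) if upper else 0
    return "0" * zeros + "1" + lower

def split_decode(code: str, low_len: int, width: int) -> str:
    """Inverse of split_code for a single code word of known original width."""
    zeros = code.index("1")                      # count of leading zeros = upper value
    lower = code[zeros + 1 : zeros + 1 + low_len]
    upper_width = width - low_len
    upper = format(zeros, f"0{upper_width}b") if upper_width else ""
    return upper + lower

example = split_code("00101000", 4)   # the example from the text
```

The code word is shorter than the input only when the upper bits hold a small value, which is why the choice of the lower bit length matters.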
[0029]
Hereinafter, a specific description will be given. FIG. 4 is a flowchart showing an outline of the variable length coding. First, an average bit length Bf, which is a moving average of the bit lengths of past samples, is calculated (step S7). The average bit length Bf is obtained by dividing the accumulated bit length RB, which is the accumulated value of the past bit length, by a counter C based on the number of past samples. That is, it is calculated by Bf = RB / C. Since the accumulated bit length RB is 0 in the initial state, when processing a sample at t = 1, the bit length Bd (t) of the sample at t = 1 is set as an initial value. Also, the initial counter C is set to 1.
[0030]
Subsequently, the bit length Bd(t) of the sample at time t is calculated (step S8). For samples from t = 2 onward, the bit length Bd(t) of the sample is calculated after the average bit length Bf. The bit length Bd(t) is easy to obtain because of the bit-layout conversion performed in step S6: in the layout shown in FIG. 3B, the bit length of each sample is counted from the position of the first “1” bit. Next, the bit length Bv of the variable part is calculated (step S9) by subtracting the average bit length Bf from the bit length Bd(t) of the error sample. The code is then output (step S10). Specifically, as many “0” bits as the decimal value of the upper Bv bits are output, followed by the separator bit “1”, and then the lower Bf bits are output unchanged as the invariant part. The code is output by recording it on an external storage device such as a hard disk or CD-R. Next, the bit length Bd(t) is added to the accumulated bit length RB (step S11), and at the same time the counter C is incremented by one for each error sample processed. It is then determined whether the counter C has exceeded a predetermined number (step S12). This number is again set to about 100, so the test is whether the counter has exceeded 100. If it has, the accumulated bit length RB is halved (step S13); specifically, the variable RB is divided by two. At the same time, the counter C is also halved.
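Steps S7 to S13 can be sketched in Python as follows. The source leaves several details open, so this sketch makes labeled assumptions: Bf is taken as the integer part of RB / C, RB and C are halved with integer division, and the decoder is handed the bit length of the first sample as a parameter, since the text does not say how a decoder would obtain it. The inputs are the non-negative values produced by the sign processing of step S6. All names are hypothetical.

```python
HALVING_INTERVAL = 100  # "about 100" in this embodiment

def encode_variable_length(values):
    """Encode non-negative integers as a string of '0'/'1' characters."""
    out = []
    rb = values[0].bit_length() if values else 0  # RB starts at Bd of the first sample
    c = 1                                         # counter C
    for v in values:
        bf = rb // c                      # step S7: average bit length Bf (integer part, assumed)
        bd = v.bit_length()               # step S8: bit length Bd(t)
        upper = v >> bf                   # value held in the upper Bv = Bd(t) - Bf bits (S9)
        lower = v & ((1 << bf) - 1)       # lower Bf bits, output unchanged
        lower_bits = format(lower, f"0{bf}b") if bf else ""
        out.append("0" * upper + "1" + lower_bits)   # step S10
        rb += bd                          # step S11
        c += 1
        if c > HALVING_INTERVAL:          # steps S12 and S13
            rb //= 2
            c //= 2
    return "".join(out)

def decode_variable_length(bitstring, n_values, first_bit_length):
    """Inverse of encode_variable_length; first_bit_length is assumed to be known."""
    values, pos = [], 0
    rb, c = first_bit_length, 1
    for _ in range(n_values):
        bf = rb // c
        upper = 0
        while bitstring[pos] == "0":      # count zeros up to the separator
            upper += 1
            pos += 1
        pos += 1                          # skip the separator bit '1'
        lower = int(bitstring[pos:pos + bf], 2) if bf else 0
        pos += bf
        v = (upper << bf) | lower
        values.append(v)
        rb += v.bit_length()
        c += 1
        if c > HALVING_INTERVAL:
            rb //= 2
            c //= 2
    return values
```

Because Bf is derived only from samples already processed, the decoder in this sketch can track the same Bf as the encoder without extra side information beyond the initial value.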
[0031]
In this way, each sample is encoded with a variable bit length. The variable-length error samples obtained by the encoding are recorded as code data on the target recording medium. For convenience, steps S1 to S13 have been described separately as the adaptive linear prediction of FIG. 2 and the variable-length coding of FIG. 4, but in practice steps S1 to S13 are performed in parallel for each sample. That is, instead of returning to step S1 after steps S4 and S5 as shown in FIG. 1, the process performs steps S6 to S13 and then returns to step S1.
[0032]
(Combination with other compression methods)
The present invention achieves a sufficient compression effect with the above alone, but an even greater effect can be obtained by combining it with other compression methods. Preferred combinations are described below, assuming processing of a stereo audio signal as shown in FIG. 5. FIG. 5A shows a two-channel stereo audio signal, in which the L (left) signal is recorded in Ch1 and the R (right) signal in Ch2. In FIGS. 5A to 5D, the left end is the start time and the right end is the end time; compared with FIG. 1, the time axis is compressed. FIG. 6 is a flowchart outlining the overall processing when other methods are combined with the present invention. The following description follows FIG. 6.
[0033]
(Signal flat part processing method)
First, signal flat portion processing is performed on the sample sequence, which is a digital acoustic signal (step S21). A signal flat portion is a portion where the same signal level continues. In particular, it often occurs in a silent part, where the signal level is "0", and in a saturated part, where the absolute value of the signal level is at its maximum. Silence occurs when there is actually no sound or when the sound is too faint to be recorded, while saturation occurs during signal recording and A/D conversion. Whether it is a silent part, a saturated part, or any other run of the same signal level, a signal flat portion records the same signal level continuously for a certain time (a certain number of samples). For this reason, this part is data that can be compressed easily. Specifically, three values, namely the head time position of the signal flat portion, the number of samples over which the same signal level continues, and the signal level (sample value), are recorded as signal flat portion data separately from the sample sequence of each channel. The signal flat portion is then deleted from the sample sequence of each channel. This is schematically shown in FIGS. 5B and 5C. FIG. 5B shows a sample sequence before the signal flat portion processing; the shaded portions indicate signal flat portions. By the processing in step S21, the signal flat portion is deleted from the original sample sequence, and the result is as shown in FIG. However, in order to restore the original state at the time of decoding, the separated signal flat portion is recorded in a format as shown in FIG.
[0034]
As described above, the signal flat portion data is recorded for each signal flat portion with three attributes: the head time (sample number), the number of samples, and the sample value. Here, the head time is the time from the start position of the signal, and in the example of FIG. 5E it is recorded as the sample number counted from the head. As described above, dividing the sample number by the sampling frequency converts it to time. The number of samples indicates how long the same sample value continues. The end time of the signal flat portion may be recorded instead of the number of samples. The sample value indicates the digitized signal level. Here, since quantization is performed with 16 bits, the maximum value is "32767" and the minimum value is "-32768". That is, "0" indicates a silent part, and "32767" and "-32768" indicate a saturated part. However, signal flat portions are not processed unconditionally. Since the purpose is to compress data, the processing is pointless if the signal flat portion data is larger than the amount by which the sample sequence is reduced. Therefore, the signal flat portion data is created and separated from the sample sequence of each channel only when the samples forming the signal flat portion continue for a predetermined number or more.
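The separation of signal flat portions in step S21 and their restoration at decoding can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, and the threshold min_run stands in for the "predetermined number", whose actual value is not given in the text.

```python
def split_flat_runs(samples, min_run=8):
    """Separate runs of identical values that are at least min_run long.

    Returns (kept, flats): kept is the sample sequence with the flat runs
    deleted, and flats is a list of (head sample number, count, value).
    """
    kept, flats = [], []
    i, n = 0, len(samples)
    while i < n:
        j = i
        while j < n and samples[j] == samples[i]:
            j += 1                           # extend the run of equal values
        if j - i >= min_run:
            flats.append((i, j - i, samples[i]))
        else:
            kept.extend(samples[i:j])        # short runs stay in the sequence
        i = j
    return kept, flats


def merge_flat_runs(kept, flats):
    """Rebuild the original sequence from kept samples and flat portion data."""
    out, k = [], 0
    for head, count, value in flats:         # flats are ordered by head position
        need = head - len(out)               # kept samples preceding this run
        out.extend(kept[k:k + need])
        k += need
        out.extend([value] * count)
    out.extend(kept[k:])
    return out
```

For example, with min_run=4 the sequence [3, 0, 0, 0, 0, 5, 5, 7] is split into the kept samples [3, 5, 5, 7] and one flat portion record (1, 4, 0); the run of two 5s is too short to be separated.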
[0035]
Subsequently, a conversion process from the original sample value to the prediction error is performed on each sample (Steps S1 to S5). This converts the value of each sample into a prediction error by executing the processing shown in the flowchart of FIG. When there are a plurality of channels, processing is performed on the sample sequence of each channel.
[0036]
(Calculation method between channels)
Next, a difference calculation between channels is performed on the error sample sequence of each channel, in which the prediction error values are recorded (step S22). This is done simply by taking the difference between the error sample values at the same time. The result of the difference operation replaces the error sample sequence of one channel, and the error sample sequence of the other channel is left as it is. Specifically, in the case of a two-channel stereo sound signal as shown in FIG. 5C, the value of the L signal is recorded as it is for Ch1, and the R-L difference value is given to Ch2. Generally, in a stereophonic signal the data of the two channels at the same time are correlated, so the difference between them at each time is smaller than the original values. The same holds after predictive coding by linear prediction. For this reason, in the example of FIG. 5D, the value of each error sample in Ch2 becomes small, leaving more room for subsequent compression.
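The inter-channel operation of step S22 and its inverse can be sketched as below. The function names are hypothetical, and the sketch assumes that Ch2 stores R minus L, as the description of the two-channel example suggests.

```python
def channel_difference(err_l, err_r):
    """Keep Ch1 (L) as it is and replace Ch2 (R) with the difference R - L."""
    ch1 = list(err_l)
    ch2 = [r - l for l, r in zip(err_l, err_r)]
    return ch1, ch2


def channel_restore(ch1, ch2):
    """Inverse operation used at decoding: R = (R - L) + L."""
    err_l = list(ch1)
    err_r = [d + l for l, d in zip(ch1, ch2)]
    return err_l, err_r
```

Since only integer subtraction and addition are used, the operation is exactly reversible.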
[0037]
(Inter-frame operation method)
Subsequently, frames of a predetermined section length are set on the error sample sequence of each channel after the inter-channel operation, and an operation between the set frames is performed (step S23). The similarity between the error sample sequences constituting the frames is obtained, and similar frames are selected. Here, the frame length is fixed over the entire section from the start time to the end time of the error sample sequence; specifically, one frame consists of 256 samples. The sample sequence is divided from its beginning into frames of 256 samples each, and the similarity between frames is determined. Since determining the similarity between frames amounts to determining the correlation between the two signals, various correlation calculation methods can be used. Here, the differences between the 256 corresponding samples of two frames are calculated, and the maximum of their absolute values is obtained. This maximum absolute difference is calculated between a basic frame and each of the 100 frames that follow it; a frame whose maximum value is equal to or less than a predetermined value is selected as a correlation frame and forms one group with the basic frame. This process is performed over the entire section of the error sample sequence. FIGS. 7A to 7C show how the error sample sequence changes in the process of step S23. Although only one channel is shown in FIG. 7, unlike FIG. 5, the other channels are processed similarly. First, as shown in FIG. 7A, the error sample sequence is divided into fixed-length frames F1, F2, F3, ..., Fm, ....
[0038]
Subsequently, differences are calculated for a plurality of frames following one basic frame. First, the difference is calculated sample by sample between the first frame F1 and the next frame F2. In this example, 256 difference values are obtained, one for each sample time. The maximum absolute value of the obtained difference values is recorded for the frame F2 as an index value indicating its correlation with the frame F1. Similarly, the maximum absolute difference between the frame F3 and the frame F1 is determined, and likewise for the subsequent frames; the frame with the smallest maximum value is selected as a correlation frame candidate. For example, when the frame F1 is the basic frame and the maximum absolute difference of the frame F3 is the smallest, the frame F3 becomes the correlation frame candidate. Then, when this maximum absolute difference is less than or equal to the maximum absolute value of the sample values of the frame F3 before the difference is taken, the frame F3 is determined to be a correlation frame and forms a group A with the basic frame F1. At this time, the frame F1 remains unchanged, but each sample of the frame F3 is updated to its difference from the frame F1. To indicate that it holds difference values, the processed frame is denoted as frame "F3-F1". The same processing is then performed on the subsequent frames. For example, the frame Fn is determined to be a correlation frame with respect to the basic frame Fm, a group G is formed, and difference processing is performed on the frame Fn to obtain a frame "Fn-Fm". In short, the basic frame in a group remains as it is, and the correlation frame in the group records its difference from the basic frame.
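The search for a correlation frame for a single basic frame can be sketched in Python as follows. This is a simplified illustration, not the full procedure of step S23: the function names are hypothetical, only the acceptance test described in the paragraph above (maximum absolute difference not exceeding the candidate's own maximum absolute value) is applied, and how later basic frames are chosen is left open.

```python
def max_abs(frame):
    """Largest absolute sample value in a frame."""
    return max(abs(v) for v in frame)


def max_abs_diff(a, b):
    """Largest absolute sample-by-sample difference between two frames."""
    return max(abs(x - y) for x, y in zip(a, b))


def find_correlation_frame(frames, base, lookahead=100):
    """Return the index of the correlation frame for frames[base], or None."""
    best, best_score = None, None
    last = min(base + 1 + lookahead, len(frames))
    for idx in range(base + 1, last):
        score = max_abs_diff(frames[idx], frames[base])
        if best_score is None or score < best_score:
            best, best_score = idx, score    # candidate with smallest maximum
    # Accept only if differencing does not enlarge the candidate's value range.
    if best is not None and best_score <= max_abs(frames[best]):
        return best
    return None


def apply_difference(frames, base, corr):
    """Replace frames[corr] by its difference from frames[base], in place."""
    frames[corr] = [x - y for x, y in zip(frames[corr], frames[base])]
```

With the three short frames [1, 2, 3, 4], [9, -9, 9, -9] and [1, 2, 3, 5] and the first as the basic frame, the third frame is selected and becomes [0, 0, 0, 1] after the difference is applied. Real frames would hold 256 samples each.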
[0039]
In step S23, frame structure data, which is a relationship between frames, is recorded in parallel with the above-described difference calculation processing. Specifically, information on which frames have been grouped is recorded. The recording of a frame is performed by recording the frame number of each frame. Here, an example of the frame structure data is shown in FIG. As shown in FIG. 7D, the frame structure data is recorded by a group number and each ID number of a basic frame and a correlation frame belonging to the group. This frame structure data is required to faithfully restore the original signal during decoding. In step S23, similar frames are selected, and the correlation frame of each group is recorded by the difference from the basic frame. Since the difference value between similar frames is small, it is possible to represent the difference value with a small number of bits when changing the number of bits to be recorded in the processing described later.
[0040]
Thereafter, the positive/negative polarity processing shown in FIG. 3 is performed (step S6). As shown in FIG. 3, the two's complement representation is converted to a signed absolute value representation, and the sign bit is moved to the least significant bit. After that, variable length coding is performed (steps S7 to S13). In this case, each sample is coded with a variable bit length by executing the processing shown in the flowchart of FIG.
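The polarity processing of step S6 can be sketched as a pair of small Python functions. The function names are hypothetical, and the choice that a sign bit of 1 marks a negative value is an assumption; the text only states that the sign is moved to the least significant bit.

```python
def sign_to_lsb(value: int) -> int:
    """Shift the absolute value up by one bit and put the sign in the LSB."""
    sign = 1 if value < 0 else 0             # assumption: 1 means negative
    return (abs(value) << 1) | sign


def lsb_to_sign(mapped: int) -> int:
    """Inverse mapping used at decoding."""
    magnitude = mapped >> 1
    return -magnitude if mapped & 1 else magnitude
```

After this mapping every sample is a non-negative integer whose bit length grows with its magnitude, so small prediction errors of either sign have short bit lengths.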
[0041]
(Example of prediction error)
Next, the calculation of the prediction error performed in step S1 will be described. Here, the case of using [Equation 2], which predicts from two past samples, is described as a representative example. For example, consider a case where the sample values x(t) are as shown in FIG. In FIG. 8A, the horizontal axis represents time (sample number), and the vertical axis represents the sample value x(t). The line segment at each time indicates the magnitude of the sample value x(t) at that time. The calculation procedure using [Equation 2] will be described with reference to FIG. 8. First, each prediction error eo(t) is calculated without adding the error feedback component. As shown in FIG. 8B, the prediction error eo(t) at time t is calculated as the difference between the value that the prediction line connecting the sample value x(t-1) at the immediately preceding time t-1 and the sample value x(t-2) at the time t-2 before it takes at time t, and the actual sample value x(t) at time t (shown by a thick dotted line in the figure). The same operation is performed at time t+1 and thereafter to calculate the prediction error eo(t+1) and so on. The calculated prediction errors eo(t) are as shown in FIG. As can be seen from a comparison between FIG. 8A and FIG. 8C, the range over which the values fluctuate is greatly narrowed, which is more favorable for data compression.
[0042]
Subsequently, based on [Equation 2], error feedback processing is performed by subtracting from the prediction error eo(t) 50% of the corrected prediction error e2(t-1) obtained at the immediately preceding time t-1. FIG. 8D shows the result. Compared with FIG. 8C, the reduction of the prediction error at times t+1 and t+2 is remarkable. Conversely, the prediction error increases at times t+3 and t+4, but it is reduced on average; the range over which the values fluctuate is further narrowed compared with FIG. 8A, and the data compression effect is improved.
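The two-sample prediction with error feedback described above can be sketched in Python as follows. The sketch makes several assumptions that the text does not confirm: the prediction error is taken as the actual value minus the predicted value, the prediction line through x(t-2) and x(t-1) is extended to time t as 2x(t-1) - x(t-2), the 50% feedback term is rounded down with integer division so that the process stays lossless, and samples before the start are treated as 0. The function names are hypothetical.

```python
def predict_errors(samples):
    """Convert samples to corrected prediction errors e2(t)."""
    errors = []
    x1 = x2 = 0                              # x(t-1), x(t-2); assumed 0 at start
    e_prev = 0                               # e2(t-1)
    for x in samples:
        predicted = 2 * x1 - x2              # linear extrapolation to time t
        eo = x - predicted                   # error without feedback
        e2 = eo - e_prev // 2                # subtract 50% of previous error
        errors.append(e2)
        x2, x1, e_prev = x1, x, e2
    return errors


def reconstruct_samples(errors):
    """Inverse process: recover the samples from the errors e2(t)."""
    samples = []
    x1 = x2 = 0
    e_prev = 0
    for e2 in errors:
        x = e2 + e_prev // 2 + (2 * x1 - x2)
        samples.append(x)
        x2, x1, e_prev = x1, x, e2
    return samples
```

The decoder can undo the feedback because e2(t-1), x(t-1) and x(t-2) are all known by the time e2(t) is read, which is what keeps the scheme reversible.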
[0043]
(Device configuration for realization)
In practice, the above encoding method is realized by a computer equipped with dedicated software. With a program installed that causes the computer to execute the above-described processing, the computer reads a time-series signal such as a digital acoustic signal and then executes the processing of steps S1 to S13 to obtain compressed code data.
[0044]
[Effect of the invention]
As described above, according to the present invention, in performing lossless compression of a time-series signal composed of a time-series sample sequence, a plurality of prediction error values are calculated for each sample in the sample sequence from the temporally past sample sequence based on a plurality of prediction calculation formulas; one of the calculated prediction error values is selected as the prediction error value to be encoded, based on the cumulative error, which is the cumulative value of the prediction error values of the past sample sequence for each prediction calculation formula; and the prediction error value selected as the encoding target is encoded with a variable length code. Since the calculation of the prediction error, the selection of the encoding target error, and the variable length coding are performed for all the samples, compression with higher efficiency becomes possible.
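The per-sample selection summarized above (steps S1 to S5) can be sketched in Python as follows. The four predictors are hypothetical stand-ins that differ in the number of past samples they refer to; they are not the prediction calculation formulas of the specification. Resetting the cumulative errors to zero once the sample count exceeds reset_after is likewise an assumption, as are the function names and the treatment of samples before the start as 0.

```python
# Hypothetical predictors using 0, 1, 2 and 3 past samples (h[-1] is x(t-1)).
PREDICTORS = (
    lambda h: 0,
    lambda h: h[-1],
    lambda h: 2 * h[-1] - h[-2],
    lambda h: 3 * h[-1] - 3 * h[-2] + h[-3],
)


def _best(cumulative):
    """Index of the formula with the smallest cumulative error so far."""
    return min(range(len(cumulative)), key=lambda k: cumulative[k])


def _update(cumulative, errors, count, reset_after):
    """Step S3, then steps S4 and S5: accumulate, reset when count is exceeded."""
    cumulative = [r + abs(e) for r, e in zip(cumulative, errors)]
    count += 1
    if count > reset_after:
        cumulative, count = [0] * len(cumulative), 0
    return cumulative, count


def select_errors(samples, reset_after=100):
    cumulative, count = [0] * len(PREDICTORS), 0
    history = [0, 0, 0]                      # assumed zero before the start
    selected = []
    for x in samples:
        errors = [x - p(history) for p in PREDICTORS]        # step S1
        selected.append(errors[_best(cumulative)])           # step S2
        cumulative, count = _update(cumulative, errors, count, reset_after)
        history = history[1:] + [x]
    return selected


def restore_samples(selected, reset_after=100):
    cumulative, count = [0] * len(PREDICTORS), 0
    history = [0, 0, 0]
    samples = []
    for e in selected:
        x = e + PREDICTORS[_best(cumulative)](history)       # same choice as encoder
        errors = [x - p(history) for p in PREDICTORS]
        cumulative, count = _update(cumulative, errors, count, reset_after)
        samples.append(x)
        history = history[1:] + [x]
    return samples
```

Because the selection depends only on errors of past samples, the decoder can repeat the same selection without any side information about which formula was used.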
[Brief description of the drawings]
FIG. 1 is a diagram schematically showing a sample sequence of a digital time-series signal to be processed in the present invention.
FIG. 2 is a flowchart illustrating an outline of an adaptive linear prediction process according to the present invention.
FIG. 3 is a diagram showing a state of positive / negative polarity processing in step S6.
FIG. 4 is a flowchart showing an outline of variable-length coding according to the present invention.
FIG. 5 is a diagram illustrating a state of a signal flat portion process and an inter-channel calculation process.
FIG. 6 is a flowchart showing an outline of a process when another compression process suitable for the present invention is combined.
FIG. 7 is a diagram illustrating a state of an inter-frame calculation process.
FIG. 8 is a diagram showing a state of a prediction error calculation process using [Equation 2].
[Explanation of symbols]
R1 to R4: cumulative error
Bv: changing unit bit length
Bd(t): error sample bit length
Bf: average bit length
RB: cumulative bit length

Claims

A method for encoding a time-series signal composed of a time-series sample sequence, the method compressing the amount of information such that the entire sample sequence can be reproduced, the method comprising:
a prediction error calculation step of calculating, for each sample of the sample sequence, a plurality of prediction error values from the temporally past sample sequence based on a plurality of prediction calculation formulas;
an encoding target error selecting step of selecting, from among the plurality of prediction error values calculated in the prediction error calculation step, one as the prediction error value to be encoded, based on a cumulative error that is the cumulative value of the prediction error values of the past sample sequence for each prediction calculation formula; and
a variable-length encoding step of encoding, with a variable-length code, the error sample having the prediction error value selected as the encoding target,
wherein all samples of the time-series signal are variable-length encoded by performing the prediction error calculation step, the encoding target error selecting step, and the variable-length encoding step on all the samples.

In claim 1,
A method for encoding a time-series signal, wherein the plurality of prediction calculation formulas in the prediction error calculation step differ in the number of past samples to which they refer.

In claim 1 or claim 2,
A method for encoding a time-series signal, wherein, in the variable-length encoding step, the lower bit components of each error sample are encoded as the same bit components, and the remaining upper bit components are encoded with their bit components changed.

In claim 1 or claim 2,
A method for encoding a time-series signal, wherein the variable-length encoding step determines an invariable bit length based on a moving average of the bit lengths of past error samples, encodes the lower bits of each error sample, up to the invariable bit length, as the same bit components, and encodes the remaining upper bit components with their bit components changed.

In any one of claims 1 to 4,
A method for encoding a time-series signal, wherein a polarity processing step of shifting the whole absolute value of the prediction error value selected in the encoding target error selecting step up by one bit and inserting a positive/negative sign bit into the least significant bit is provided as a stage preceding the variable-length encoding step.

In any one of claims 1 to 5,
The encoding method of a time-series signal, wherein the accumulated error is calculated by using the accumulation of absolute values of a predetermined number of prediction error values in the past.

In any one of claims 1 to 6,
A method for encoding a time-series signal, wherein a signal flat portion processing step is performed as a stage preceding the prediction error calculation step, the signal flat portion processing step extracting from the sample sequence a signal flat portion in which the sample value is continuously the same, separating it from the sample sequence, and encoding three values, namely the head time position of the separated samples, the number of samples, and the sample value, as signal flat portion data.

In any one of claims 1 to 6,
A method for encoding a time-series signal, wherein, when the sample sequence consists of a plurality of channels having a plurality of values at the same time, an inter-channel operation step of performing a predetermined operation on the error sample sequences between the channels and updating the error sample sequence of one of the channels is performed between the encoding target error selecting step and the variable-length encoding step.

In any one of claims 1 to 6,
A method for encoding a time-series signal, wherein an inter-frame operation step is performed between the encoding target error selecting step and the variable-length encoding step, the inter-frame operation step extracting from the error sample sequence a plurality of frames each consisting of a predetermined number of error samples, performing a predetermined operation between the extracted frames, and updating each prediction error value of one of the frames to the calculated value.

An encoding device for a time-series signal composed of a time-series sample sequence, the device compressing the amount of information such that the entire sample sequence can be reproduced, the device comprising:
prediction error calculating means for calculating, for each sample of the sample sequence, a plurality of prediction error values from the temporally past sample sequence based on a plurality of prediction calculation formulas;
cumulative error calculating means for calculating a plurality of cumulative prediction error values corresponding to the plurality of prediction error values;
encoding target error selecting means for selecting, from among the plurality of prediction error values calculated by the prediction error calculating means, one as the prediction error value to be encoded, based on the cumulative prediction error values; and
variable-length encoding means for encoding, with a variable-length code, the error sample having the prediction error value selected as the encoding target.

A recording medium on which code data encoded by the encoding method according to claim 1 is recorded.

An encoding program for a time-series signal composed of a time-series sample sequence, the program compressing the amount of information such that the entire sample sequence can be reproduced,
the program causing a computer to execute:
a prediction error calculation step of calculating, for each sample of the sample sequence, a plurality of prediction error values from the temporally past sample sequence based on a plurality of prediction calculation formulas;
an encoding target error selecting step of selecting, from among the plurality of prediction error values calculated in the prediction error calculation step, one as the prediction error value to be encoded, based on a cumulative error that is the cumulative value of the prediction error values of the past sample sequence for each prediction calculation formula; and
a variable-length encoding step of encoding, with a variable-length code, the prediction error value selected as the encoding target,
and causing the prediction error calculation step, the encoding target error selecting step, and the variable-length encoding step to be performed on all samples.