JP2004266587A

JP2004266587A - Time-sequential signal encoding apparatus and recording medium

Info

Publication number: JP2004266587A
Application number: JP2003055114A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2003-03-03
Filing date: 2003-03-03
Publication date: 2004-09-24
Anticipated expiration: 2023-03-03
Also published as: JP4170795B2

Abstract

PROBLEM TO BE SOLVED: To provide a time-sequential signal encoding apparatus and recording medium in which digitally edited high-definition audio work data can be efficiently reversibly compressed.
SOLUTION: Time-sequential sample streams are divided into odd-numbered and even-numbered streams, a differential value between an odd-numbered sample and an average of both neighboring even-numbered samples and a differential value between an even-numbered sample and an average of both neighboring odd-numbered samples are calculated, the stream with continuous smaller differential values is relocated as a main sample stream and the other stream is relocated as a sub sample stream. With respect to the relocated main sample stream and sub sample stream, a predictive error value based on a linear predictive error is obtained, and variable length encoding is performed.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【産業上の利用分野】
本発明は、音楽制作、音響データの素材保管、ロケ素材の中継など音楽制作分野、ＣＤ・ＤＶＤ等のデジタル記録媒体を用いたオーディオ記録再生、遠隔医療における生体信号の解析・診断等の分野において好適なデータの圧縮符号化技術に関する。
【０００２】
【従来の技術】
従来より、音響信号の圧縮には様々な手法が用いられている。音響信号を圧縮して符号化する手法として、ＭＰ３（ＭＰＥＧ−１／Ｌａｙｅｒ３）、ＡＡＣ（ＭＰＥＧ−２／Ｌａｙｅｒ３）などが実用化されている。このような圧縮符号化方式により、音響信号を小さいデータとして扱うことが可能となり、データの記録・伝送の効率化に貢献している。
【０００３】
最近では、上述のようなＭＰ３、ＡＡＣ等のロッシー符号化方式だけでなく、完全に復元することが可能なロスレス符号化方式も開発されており、音響素材の管理に用いられている（例えば、特許文献１参照）。
【０００４】
【特許文献１】
特表２０００−８２１１９９号公報
【０００５】
【発明が解決しようとする課題】
高精細オーディオを扱う場合は、通常のオーディオ（音楽ＣＤ品質：サンプリング周波数４４．１ｋＨｚ、量子化ビット数１６ビット）に比べ、サンプリング周波数または量子化ビット数を高く設定してサンプリングしている。音楽編集分野では、さまざまな音をミックスして編集する作業がしばしば行われるが、この際、部分的に通常のオーディオがミックスされる場合があり、この通常のオーディオについては、高精細オーディオに合わせるべく、サンプルの補間、量子化ビット数の引き伸ばし等が行われる。このような補間処理を施された音響信号は、原理的に原音がもつ情報量程度に圧縮を行うことが可能であるが、従来のロスレス符号化方式で符号化を行っても、その程度の圧縮効果が得られない。逆に、そのまま線形予測符号化を中心とした圧縮を行うと、補間された箇所の線形予測誤差が増大して圧縮率が劣化する場合もあり、冗長箇所を圧縮するのとは逆効果にもなり得る。例えば、サンプリング周波数を２倍に拡大したオーディオデータは、理論上は５０％以下に圧縮可能なはずであるが、現実には５０％を超える圧縮率になってしまう。
【０００６】
そこで、このような問題を解決するため、本発明は、デジタル編集途上の高精細オーディオのワークデータを効率的に可逆圧縮することが可能な時系列信号の符号化装置および記録媒体を提供することを課題とする。
【０００７】
【課題を解決するための手段】
上記課題を解決するため、本発明では、時系列のサンプル列で構成される時系列信号に対して、前記全てのサンプル列を再現できるように情報量を圧縮する符号化装置として、前記サンプル列に対して、録音により作成された主サンプル列と、主サンプル列を補間処理することにより得られた副サンプル列を分離すると共に、前記副サンプル列中の各副サンプルの値を、近傍の主サンプルの平均値と当該副サンプルの値との差分値に変換するサンプル列再配置手段と、前記主サンプル列、前記変換された副サンプル列それぞれに対して、線形予測誤差を算出し、前記主サンプル列および前記副サンプル列の値をそれぞれ予測誤差値に変換する予測誤差変換手段と、前記予測誤差値に変換された主サンプル列、副サンプル列を可変長で符号化する可変長符号化手段を有する構成としたことを特徴とする。
【０００８】
本発明によれば、時系列信号を構成するサンプル列を、録音により得られた主サンプル列と、主サンプル列を補間することにより得られた副サンプル列に再配置し、主サンプル列と副サンプル列それぞれに対して予測符号化し、可変長符号化するようにしたので、デジタル編集途上の高精細オーディオのワークデータを効率的に可逆圧縮することが可能となる。
【０００９】
【発明の実施の形態】
以下、本発明の実施形態について図面を参照して詳細に説明する。
（第１の実施形態）
図１は、本発明に係る時系列信号の符号化装置の第１の実施形態を示す構成図である。図１において、１０は時系列信号入力手段、２０はサンプル列再配置手段、３０は下位固定ビット削除手段、４０は予測誤差変換手段、５０はチャンネル間演算手段、６０は極性処理手段、７０は可変長符号化手段である。
【００１０】
図１において、時系列信号入力手段１０はデジタル音響信号等のデジタル化された音響信号を入力する機能を有している。サンプル列再配置手段２０は、入力された時系列信号であるサンプル列を録音を基に得られたサンプル列である主サンプル列と主サンプル列を補間することにより得られた副サンプル列とに分離する機能を有している。下位固定ビット削除手段３０は、量子化雑音成分である下位の所定数のビットを削除する機能を有している。予測誤差変換手段４０は、線形予測誤差の手法を用いて、各サンプルの値を予測誤差値に変換する機能を有する。チャンネル間演算手段５０は、複数のチャンネルからなるサンプル列の各チャンネル間の差分演算を行う機能を有する。極性処理手段６０は、正負の値を補数表現により表した各サンプルのビット列を、正負の極性を表す１ビットと他のビット列に分ける処理を行う機能を有する。可変長符号化手段７０は、各サンプルの値を可変ビット長で符号化する機能を有している。図１に示した装置は、実際には、コンピュータおよびコンピュータにインストールされた専用のソフトウェアプログラムにより実現される。
【００１１】
次に、図１に示した時系列信号の符号化装置の処理動作について説明する。本発明では、時系列信号として複数の音響信号をミックスしたワークデータを扱う場合を例にとって説明する。このような音響信号は、上述のように、サンプリング周波数４８ｋＨｚ、量子化ビット数１６ビットの通常の音響信号や、サンプリング周波数９６ｋＨｚ、量子化ビット数２４ビットの高精細の音響信号が混在している。このようにサンプリング周波数の異なる音響信号を混在させることにより得られる音響信号は、高精細の音響信号にサンプリング周波数を統一させて扱うことになる。この場合、サンプリング周波数４８ｋＨｚの音響信号は、サンプリング周波数９６ｋＨｚの音響信号にサンプル数を合わせるべく隣接するサンプルの平均値で間を補間していく。
【００１２】
このような音響信号を模式的に示すと図２（ａ）のようになる。図２（ａ）において括弧内の数字は、１から昇順に付されたサンプル番号であり、ｘは、そのサンプルの値を示している。このようなサンプル列である時系列信号が時系列信号入力手段１０から入力されると、サンプル列再配置手段２０は、奇数番目のサンプルについて、その両隣の偶数番目のサンプルの平均値との差分を演算すると共に、偶数番目のサンプルについて、その両隣の奇数番目のサンプルの平均値との差分を演算する。このときのサンプル列を模式的に示すと、それぞれ図２（ｂ）（ｃ）に示すようになる。なお、図２（ｂ）の例では、演算を行わない偶数番目のサンプルを、図２（ｃ）の例では、奇数番目のサンプルを、それぞれ時間的に過去に移動させた状態で示している。
【００１３】
この差分演算の結果、差分値が小さいものが多い方を副サンプル列とし、少ない方を主サンプル列とする。図２の例では、図２（ｂ）の配列の後半分の各値と、
図２（ｃ）の配列の後半分の各値とを比較することになる。例えば、奇数番目のサンプルが補間によって得られたものである場合、図２（ｂ）に示した配列の後半の値が０になる。一方、偶数番目のサンプルが補間によって得られたものである場合、図２（ｃ）に示した配列の後半の値が０に近くなる。例えば、図２（ｂ）に示す配列の後半に０近辺の値が多い場合、偶数番目のサンプルの集合を主サンプル列、奇数番目のサンプルの集合を副サンプル列として分離する。また、サンプル列再配置手段２０の処理においては、図２（ｂ）（ｃ）に示したように主サンプルを時間的に過去に移動し、副サンプルを時間的に未来に移動させるようにしても良いが、主サンプルと副サンプルを分離して扱うようにしても良い。例えば、奇数番目が副サンプルの場合には、図２（ｄ）に示すように主サンプルと副サンプルを分離する。本発明においては、本来のサンプルを利用して補間することにより得られたサンプルを含んだサンプル列に対して線形予測を行うことにより、逆にデータ量が増えてしまうことを防ぐために、主サンプル列と副サンプル列を区別している。そのため、主サンプル列と副サンプル列に対して、別々に線形予測を行うことができれば、図２（ｂ）（ｃ）に示したような１つのサンプル列であっても、図２（ｄ）に示したような２つのサンプル列であっても良い。
【００１４】
次に、下位固定ビット削除手段３０が、主サンプル列、副サンプル列の各サンプルの下位の所定数のビットを分離する。これは、量子化ビット数が１６ビットのデータを高精細の音響信号と合わせるために２４ビットに変換している場合に、冗長な下位ビット成分を削除するために行い、この処理を行わないと、符号化された情報量は３／２倍に増大することになる。一方、ミックスする基になった素材の音響信号が全て高精細の２４ビットで量子化されている場合は、下位固定ビット削除手段３０による処理を行う必要はないが、同様に削除を行い、削除された下位ビットデータ配列を出力符号データの一部として別途記録する手段もとれ、その方が後段の予測誤差変換手段以降の処理負荷が軽減する。下位固定ビット削除手段３０については、動作させるかどうかをあらかじめ設定しておくことができる。
【００１５】
続いて、主サンプル列、副サンプル列の各サンプルの値を、予測誤差変換手段４０が予測誤差値に変換する。あるサンプルにおける予測誤差値の算出は、時間的に過去に位置する直前の１つもしくは複数のサンプルの値を利用して行われる。本実施形態では、利用する直前のサンプル数を動的に変化させる手法を用いている。以下に、このような適応型線形予測符号化について説明する。予測誤差変換手段４０により行われる適応型線形予測符号化の処理概要を図３のフローチャートに示す。まず、あらかじめ準備された複数の予測計算式を用いて、各予測計算式に対応した線形予測誤差を算出する（ステップＳ１）。具体的には、サンプル番号ｔの予測誤差を算出する予測計算式として、以下の〔数式１〕〜〔数式６〕を用意している。
【００１６】
〔数式１〕
ｅ０（ｔ）＝ｘ（ｔ）−ｅ０（ｔ−１）／２
【００１７】
〔数式２〕
ｅ１（ｔ）＝ｘ（ｔ）−ａ_１１・ｘ（ｔ−１）−ｅ１（ｔ−１）／２
【００１８】
〔数式３〕
ｅ２（ｔ）＝ｘ（ｔ）−ａ_２１・ｘ（ｔ−１）−ａ_２２・ｘ（ｔ−２）−ｅ２（ｔ−１）／２
【００１９】
〔数式４〕
ｅ３（ｔ）＝ｘ（ｔ）−ａ_３１・ｘ（ｔ−１）−ａ_３２・ｘ（ｔ−２）−ａ_３３・ｘ（ｔ−３）−ｅ３（ｔ−１）／２
【００２０】
〔数式５〕
ｅ４（ｔ）＝ｘ（ｔ）−ａ_４１・ｘ（ｔ−１）−ａ_４２・ｘ（ｔ−２）−ａ_４３・ｘ（ｔ−３）−ａ_４４・ｘ（ｔ−４）−ｅ４（ｔ−１）／２
【００２１】
〔数式６〕
ｅ５（ｔ）＝ｘ（ｔ）−ａ_５１・ｘ（ｔ−１）−ａ_５２・ｘ（ｔ−２）−ａ_５３・ｘ（ｔ−３）−ａ_５４・ｘ（ｔ−４）−ａ_５５・ｘ（ｔ−５）−ｅ５（ｔ−１）／２
【００２２】
上記〔数式１〕〜〔数式６〕において、ｅ０（ｔ）〜ｅ５（ｔ）は各予測計算式による時刻ｔのサンプルにおける予測誤差であり、ｘ（ｔ）〜ｘ（ｔ−５）は時刻ｔ〜ｔ−５におけるサンプル値である。
【００２３】
上記〔数式３〕における「ａ_２１・ｘ（ｔ−１）＋ａ_２２・ｘ（ｔ−２）」、上記〔数式４〕における「ａ_３１・ｘ（ｔ−１）＋ａ_３２・ｘ（ｔ−２）＋ａ_３３・ｘ（ｔ−３）」、上記〔数式５〕における「ａ_４１・ｘ（ｔ−１）＋ａ_４２・ｘ（ｔ−２）＋ａ_４３・ｘ（ｔ−３）＋ａ_４４・ｘ（ｔ−４）」、上記〔数式６〕における「ａ_５１・ｘ（ｔ−１）＋ａ_５２・ｘ（ｔ−２）＋ａ_５３・ｘ（ｔ−３）＋ａ_５４・ｘ（ｔ−４）＋ａ_５５・ｘ（ｔ−５）」は過去の２〜５個のサンプルに基づく線形予測成分である。この線形予測成分、および、直前のサンプルにおいて算出された予測誤差「ｅ１（ｔ−１）／２」〜「ｅ５（ｔ−１）／２」（誤差フィードバック成分）を用いて時刻ｔにおける予測誤差ｅ０（ｔ）〜ｅ５（ｔ）を算出する。
【００２４】
上記の係数ａ_１１〜ａ_５５には初期値として、ａ_１１＝１、ａ_２１＝２、ａ_２２＝−１、ａ_３１＝３、ａ_３２＝−３、ａ_３３＝１、ａ_４１＝４、ａ_４２＝−６、ａ_４３＝４、ａ_４４＝−１、ａ_５１＝５、ａ_５２＝−１０、ａ_５３＝１０、ａ_５４＝−５、ａ_５５＝１なる値が各々設定されているが、本実施形態では、これらの係数を動的に変化させる。具体的には、Ｌｅｖｉｎｓｏｎ−Ｄｕｒｖｉｎのアルゴリズムを利用した以下の〔数式７〕を用いて係数ａ_１１〜ａ_５５を決定する。
【００２５】
〔数式７〕
φ（ｋ）＝１／（Ｎ−Ｋ）・Σ_{ｊ＝１，Ｎ−Ｋ}ｘ（ｊ）・ｘ（ｊ＋ｋ）
ｋ_ｉ＝−｛φ（ｉ）＋Σ_{ｊ＝１，ｉ−１}ａ_ｊ（ｉ−１）・φ（ｉ−ｊ）｝／Ｅ（ｉ−１）
ａ_ｉ（ｉ）＝ｋ_ｉ
ａ_ｊ（ｉ）＝ａ_ｊ（ｉ−１）＋ｋ_ｉ・ａ_ｉ−ｊ（ｉ−１）ただし、１≦ｊ≦ｉ−１
Ｅ（ｉ）＝（１−ｋ_ｉ ^２）Ｅ（ｉ−１）
【００２６】
上記〔数式７〕において、φ（ｋ）は、Ｎ個のサンプルｘ（ｊ）（ｊ＝１，…，Ｎ）において、最大値Ｋ（上記例では５）の範囲でｋサンプルシフトさせたサンプル列との自己相関値である。なお、ＮはＫに対して十分大きな数値をとっている（例えばＫ＝５の場合、Ｎ＝３２７６８）。〔数式７〕は、ｉ＝１からｉ＝Ｋまで再帰的に繰り返し、最終的に得られたａ_ｊ（Ｋ）が過去Ｋ個のサンプルに対応する係数になるとともに、各フェーズにおいて得られた中間結果であるａ_ｊ（ｉ）が係数ａ_ｉｊとなる。ステップＳ１においては、上記〔数式７〕により決定した係数を用いて、〔数式１〕〜〔数式６〕の各計算式で計算を行うことになる。〔数式７〕による計算は、実際には後述するステップＳ７において行われるものである。また、係数を決定するには、過去の数サンプル分の値を必要とするので、初めのＮ−１サンプルについては、前述した初期係数で〔数式１〕〜〔数式６〕の計算を行うことになる。
【００２７】
続いて、上記各予測計算式別の予測誤差値の絶対値の累積である累積誤差が最小となる線形予測誤差をそのサンプルの予測誤差として選出する（ステップＳ２）。ここでは、累積誤差という考え方を用いている。具体的には、各予測計算式〔数式１〕〜〔数式６〕により算出された予測誤差の過去のサンプルについての累積値をＡ０〜Ａ５として設定する。そして、この累積誤差Ａ０〜Ａ５のうち、最小となるものに対応する予測誤差を選出する。例えば、Ａ０〜Ａ５のうち、Ａ２が最小であったとする。この場合、〔数式３〕で算出された予測誤差ｅ２（ｔ）を符号化対象とする予測誤差ｅ（ｔ）として選出することになる。選出された予測誤差ｅ（ｔ）はサンプルの元の値ｘ（ｔ）と置き換えられて以降処理が行われることになる。
【００２８】
続いて、累積誤差Ａ０〜Ａ５に各予測誤差ｅ０（ｔ）〜ｅ５（ｔ）の絶対値を加算する（ステップＳ３）。具体的には、以下の〔数式８〕に示すように、累積誤差値となる変数Ａ０〜Ａ５を更新していく。同時に、各サンプルの処理を行う度に、カウンタＣ１、Ｃ２を１つづつ加算していく処理を行う。
【００２９】
〔数式８〕
Ａ０←Ａ０＋｜ｅ０（ｔ）｜
Ａ１←Ａ１＋｜ｅ１（ｔ）｜
Ａ２←Ａ２＋｜ｅ２（ｔ）｜
Ａ３←Ａ３＋｜ｅ３（ｔ）｜
Ａ４←Ａ４＋｜ｅ４（ｔ）｜
Ａ５←Ａ５＋｜ｅ５（ｔ）｜
【００３０】
続いて、カウンタＣ１が所定回数を超えたかどうかの判定を行う（ステップＳ４）。本実施形態では、この所定回数を１００回として設定している。すなわち、カウンタＣ１が１００を超えたかどうかの判定を行う。
【００３１】
この結果、カウンタが１００を超えていたら、累積誤差を半分にする（ステップＳ５）。具体的には、以下の〔数式９〕に示すように、累積誤差となる変数Ａ０〜Ａ５を２で除算する。同時に、カウンタＣ１を０にリセットする。すなわち、ここでのＡ０〜Ａ５は純粋な意味での累積誤差ではなく、累積誤差の移動平均となっている。本実施形態では、直前の最大１００サンプルまでは累積されるが、それ以前のものは半分になるように処理する。これにより、時間的に離れたサンプルの影響が小さくなるようにしている。
【００３２】
〔数式９〕
Ａ０←（Ａ０）／２
Ａ１←（Ａ１）／２
Ａ２←（Ａ２）／２
Ａ３←（Ａ３）／２
Ａ４←（Ａ４）／２
Ａ５←（Ａ５）／２
【００３３】
続いて、カウンタＣ２が所定回数を超えたかどうかの判定を行う（ステップＳ６）。本実施形態では、この所定回数を３２７６８回として設定している。すなわち、カウンタＣ２が３２７６８を超えたかどうかの判定を行う。
【００３４】
この結果、カウンタＣ２が３２７６８を超えていたら、係数ａ_１１〜ａ_５５の再計算を行う（ステップＳ７）。具体的には、上記〔数式７〕を用いて、係数ａ_１１〜ａ_５５を計算し直すことになる。同時に、カウンタＣ２を０にリセットする。
【００３５】
上記ステップＳ１〜ステップＳ７の処理を主サンプル列および副サンプル列のサンプルに渡って実行することにより、全サンプルの値が元の振幅値ｘ（ｔ）から対象誤差ｅ（ｔ）に置き換えられることになる。本実施形態では、特に、複数の予測式の係数を動的に変化させることにより、より精度の高い予測誤差を算出することが可能になる。
【００３６】
次に、チャンネル間演算手段５０が、予測誤差値が記録された各チャンネルの主サンプル、副サンプルに対して、チャンネル間の差分演算を行う。このチャンネル間差分演算の処理概要を図４のフローチャートに示す。まず、主サンプルを読み込む（ステップＳ１１）。読み込んだ主サンプルがＬチャンネルのものであれば、主サンプルの値をメモリに格納すると共に、次の極性処理手段６０に出力する（ステップＳ１２）。一方、読み込んだ主サンプルがＲチャンネルのものであれば、ステップＳ１３以降の処理を行う。なお、主サンプルは、各チャンネルのものが交互に読み込まれるので、判断を行う必要はない。例えば、サンプル番号ｔ＝１のＬチャンネルのサンプルを読み込んだら、次はサンプル番号ｔ＝１のＲチャンネルのサンプル、その次は、サンプル番号ｔ＝２のＬチャンネルのサンプルというように順番が決まっているので、交互にステップＳ１２の処理とステップＳ１３以降の処理とを切替えるようにすれば良い。
【００３７】
Ｒチャンネルのサンプルの場合は、変数ＡｏとＡｄの比較を行う（ステップＳ１３）。ここで、ＡｏはＲチャンネルのサンプル値の絶対値の累積であり、ＡｄはＲチャンネルとＬチャンネルのサンプル値の差分の絶対値の累積である。変数Ａｏ、Ａｄ共に初期値は０である。ステップＳ１３において、Ａｄ≧Ａｏであれば、Ｒチャンネルのサンプルをそのままの値で、次の極性処理手段６０に出力する（ステップＳ１４）。さらに、累積値Ａｏを以下の〔数式１０〕の第１式に示すように更新する。具体的には、Ｒチャンネルのサンプル値ｅ_Ｒの絶対値を累積値Ａｏに加えることになる。
【００３８】
ステップＳ１３において、Ａｄ＜Ａｏであれば、上記ステップＳ１２においてメモリに格納したＬチャンネルのサンプルとの差分を算出し、差分値を次の極性処理手段６０に出力する（ステップＳ１５）。さらに、累積値Ａｄを以下の〔数式１０〕の第２式に示すように更新する。具体的には、ＲチャンネルとＬチャンネルのサンプルの差分値ｅ_Ｒ−ｅ_Ｌの絶対値を累積値Ａｄに加えることになる。
【００３９】
〔数式１０〕
Ａｏ←Ａｏ＋｜ｅ_Ｒ｜
Ａｄ←Ａｄ＋｜ｅ_Ｒ−ｅ_Ｌ｜
【００４０】
続いて、Ｌ、Ｒの１対のサンプルを処理したことを示すカウンタＣ３を１つ加算する（ステップＳ１６）。
【００４１】
続いて、カウンタが所定回数を超えたかどうかの判定を行う（ステップＳ１７）。本実施形態では、この所定回数を１００回として設定している。すなわち、カウンタＣ３が１００を超えたかどうかの判定を行う。
【００４２】
この結果、カウンタＣ３が１００を超えていたら、累積値Ａｏ、Ａｄを半分にする（ステップＳ１８）。具体的には、以下の〔数式１１〕に示すように、累積値となる変数Ａｏ、Ａｄを２で除算する。同時に、カウンタＣも半分にリセットする。すなわち、ここでの変数Ａｏ、Ａｄも上記〔数式８〕におけるＡ０〜Ａ５と同様、純粋な意味での累積値ではなく、累積値の移動平均となっている。本実施形態では、直前の最大１００サンプルまでは累積されるが、それ以前のものは半分になるように処理する。これにより、時間的に離れたサンプルの影響が小さくなるようにしている。
【００４３】
〔数式１１〕
Ａｏ←Ａｏ／２
Ａｄ←Ａｄ／２
Ｃ３←Ｃ３／２
【００４４】
上記ステップＳ１１〜ステップＳ１８の処理を主サンプル列中の全主サンプルに渡って実行することにより、Ｒチャンネルの全サンプルの値が、Ｌチャンネルとの差分値に置き換えられることになる。ただし、上述の処理から明らかなように、累積値Ａｏ、Ａｄの大小関係によっては、Ｒチャンネルのサンプル値がそのまま記録されるサンプルも存在する。なお、チャンネル間演算手段５０では、Ｌチャンネルのサンプルは、全てそのままの値で出力されることになる。主サンプル列中の各サンプルに対して処理を終えたら、副サンプルに対しても同様に処理を行う。なお、チャンネルが１つだけのモノラルの音響信号に対しては、チャンネル間演算手段５０による処理は省略される。
【００４５】
続いて、極性処理手段６０が、各サンプルの正負極性処理を行う。上記予測誤差変換手段４０およびチャンネル間演算手段５０の処理により各サンプルの値は、振幅値から予測誤差に置き換えられると共に、Ｒチャンネルの値は、Ｌチャンネルとの差分に置き換えられたが、各サンプルのビット形式は、当初のままである。通常、コンピュータ等の計算機で演算される場合は、各データは３２ビット単位で処理され、２の補数表現を用いて表現されている。これを、正負の符号付き絶対値表現に変換し、なおかつ、その絶対値部分を上位に１ビット移動させ、正負の符号ビットをＬＳＢ（最下位ビット）に移動させる。極性処理手段６０によるビット構成の変換の様子を模式的に示すと図５のようになる。図５（ａ）は処理前のビット構成であり、図５（ｂ）は処理後のビット構成である。このように正負の符号ビットをＬＳＢに移動させるのは、後の可変長符号化手段７０の処理で、各サンプルのビット長を検出し易くするためである。
【００４６】
次に、可変長符号化手段７０が、各サンプルを可変長に変換する処理を行っていく。本実施形態における可変長符号化は、一般にゴロム符号化と呼ばれる方式を採用している。具体的には、１サンプルを構成するビット成分を上位ビット成分と下位ビット成分に分け、下位ビット成分は変更を加えずそのままとし、上位ビット成分は、上位ビットだけを十進数変換した数値分のビット「０」を並べ、最後にセパレータビット「１」を加えた配列とする。例えば、８ビットのビット成分「００１０１０００」を考えてみる。このとき、下位ビット成分を４ビットとすると、下位ビット成分は「１０００」となる。上位ビットは「００１０」であるため、これを十進数変換した「２」個分の「０」を配列して最後に「１」を加えた「００１」に変換される。この結果、８ビットのビット列「００１０１０００」は、７ビットのビット列「００１１０００」に変換されることになる。本実施形態では、変換の前後でビット成分を不変とする下位ビット成分のビット長を各サンプルで可変とするようにしている。
【００４７】
以下、可変長符号化手段７０が行う処理を具体的に説明していく。図６は可変長符号化の概要を示すフローチャートである。まず、過去のサンプルのビット長の移動平均である平均ビット長Ｂｆを算出する（ステップＳ２１）。平均ビット長Ｂｆは、過去のビット長の累積値である累積ビット長ＲＢを、過去のサンプル数を基にしたカウンタＣ４で除算することにより求められる。すなわち、Ｂｆ＝ＲＢ／Ｃ４で算出される。累積ビット長ＲＢは、初期状態では０であるので、ｔ＝１のサンプルを処理する場合には、ｔ＝１のサンプルのビット長Ｂｄ（ｔ）を初期値として設定しておく。また、初期のカウンタＣ４＝１と設定する。
【００４８】
続いて、時刻ｔにおけるサンプルのビット長Ｂｄ（ｔ）を算出する（ステップＳ２２）。ｔ＝２以降のサンプルについては、平均ビット長Ｂｆの算出後、サンプルのビット長Ｂｄ（ｔ）を算出する。このビット長Ｂｄ（ｔ）は、上記極性処理手段６０によりビット構成の変換を行ったことにより算出し易くなっている。図５（ｂ）に示したようなビット構成に変換したことにより、各サンプルのビット構成において先頭にビット「１」が出現したところからがビット長となる。次に、変更部のビット長Ｂｖを算出する（ステップＳ２３）。これは、上記サンプルのビット長Ｂｄ（ｔ）から平均ビット長Ｂｆを減じることにより算出される。続いて、データの符号出力を行う（ステップＳ２４）。具体的には、上位Ｂｖビットを十進数変換した数値分だけ「０」を出力した後、セパレータビット「１」を出力し、下位Ｂｆビットを不変部として出力する。符号出力は、ハードディスク、ＣＤ−Ｒ等の外部記憶装置への記録として行われることになる。次に、累積ビット長ＲＢにビット長Ｂｄ（ｔ）を加算する（ステップＳ２５）。同時に、各サンプルの処理を行う度に、カウンタＣ４を１つずつ加算していく処理を行う。続いて、カウンタＣ４が所定の数を超えたかどうかを判定する（ステップＳ２６）。所定の数としては、ここでも１００程度を設定している。そのため、カウンタ４が１００を超えたかどうかを判断することになる。この結果、カウンタが１００を超えていたら、累積ビット長ＲＢを半分にする（ステップＳ２７）。具体的には、累積ビット長となる変数ＲＢを２で除算する。同時に、カウンタＣ４を１／２にする。
【００４９】
上記のようにして、各サンプルについて可変ビット長での符号化が行われて行く。符号化により得られた可変長サンプルは、符号データとして出力される。なお、可変長符号化手段７０には、上記のような処理を行うに先立ち量子化雑音成分を分離する機能を持たせておいても良い。具体的には、極性処理手段６０による処理後の各サンプルの下位の所定数のビットを量子化雑音成分とみなして分離する。例えば各サンプルが１６ビットで表現されている場合、本実施形態では、上位ビット１２ビットと、下位ビット４ビットに分離する。この分離は、基本的に、Ａ／Ｄ変換機等、音響信号をデジタル化する際に用いる回路の熱雑音を分離するために行う。そのため、熱雑音であると考えられる下位ビットを分離するのである。下位ビットとして、どの程度分離するかは、音源や利用した回路の特性によっても変化するが、通常量子化ビット数の１／４程度とすることが望ましい。したがって、ここでは、１６ビットの１／４にあたる４ビットを下位ビットとして分離しているのである。
【００５０】
ここで、上位ビットと下位ビットのデータ分離の様子を図７に模式的に示す。図７において、Ｈは上位ビットもしくは上位サンプルデータを示し、Ｌは下位ビットもしくは下位サンプルデータを示す。図７（ａ）は分離前のサンプルデータである。可変長符号化手段７０により、サンプルデータは、図７（ｂ）に示す上位サンプルデータと図７（ｃ）に示す下位サンプルデータに分離された後処理されることになる。なお、上位ビットに含まれる符号ビットは、そのまま上位サンプルデータに含まれて分離される。このように、量子化雑音成分の分離を行った場合には、残りの上位ビットに対して、上記図６に示したフローチャートに従った可変長符号化が行われ、下位ビットについては、そのまま固定長で符号化が行われる。
【００５１】
以上のようにして得られた符号データは、コンピュータに接続されたハードディスク等の記憶装置等に随時記憶され、その後、必要な記憶媒体に対応するフォーマットで記憶される。
【００５２】
（第２の実施形態）
続いて、本発明第２の実施形態に係る符号化装置について説明する。図８は、本発明第２の実施形態に係る符号化装置の機能ブロック図である。図８において、図１に示した構成と同様の機能を有するものについては、同一符号を付して説明を省略する。第１の実施形態と異なる点は、信号平坦部処理手段８０と相関フレーム検出手段９０が加わったことである。図２において、信号平坦部処理手段８０は、各チャンネルごとのサンプル列に対して、信号の値が一定である平坦部を検出し、効率的に符号化する機能を有する。相関フレーム検出手段９０は、各サンプル列に対して、所定の区間をフレームとして設定した後、フレーム間で対応する全てのサンプル値が同一になっている相関フレームを検出し、時間的に後方（未来）に位置する相関フレームを削除する機能を有する。図８に示した装置は、実際には、コンピュータおよびコンピュータにインストールされた専用のソフトウェアプログラムにより実現される。
【００５３】
続いて、図８に示した符号化装置の処理動作について説明する。まず、時系列信号入力手段１０より上記のようなミックスされた音響信号を入力する。すると、チャンネル間演算手段５０が上記図に示した手順に従って、チャンネル間の差分演算処理を行う。続いて、サンプル列再配置手段２０が、上記図２に示したような処理で主サンプル列と副サンプル列に再配置する。その後、下位固定ビット削除手段３０が各サンプルの下位ビットを削除する。
【００５４】
次に、信号平坦部処理手段８０が、信号平坦部の処理を行う。信号平坦部とは、同一の信号レベルが連続する部分のことをいう。特に信号レベルが「０」の無音部、および信号レベルの絶対値が最大の飽和部に現れることが多い。無音部は実際に無音であるか、音が非常に小さく記録されなかった場合に生じるが、飽和部は、信号の録音およびＡ／Ｄ変換の過程において生じる。無音部、飽和部またはそれ以外の同一信号レベルが連続する場合のいずれであっても、信号平坦部は、同一の信号レベルが所定の時間（所定のサンプル数）連続して記録される。このため、この部分は圧縮し易いデータになっている。具体的には、信号平坦部の先頭時刻位置と、同一信号レベルが続くサンプルの個数と、信号レベル（サンプル値）の３つの値を信号平坦部データとして各チャンネルのサンプル列と分離して記録する。各チャンネルのサンプル列からは、信号平坦部が削除される。これを模式的に示すと図９（ａ）（ｂ）に示すようになる。図９（ａ）は、信号平坦部処理前のサンプル列である。図９（ａ）において、網掛けで示した部分は信号平坦部を示す。信号平坦部処理手段８０の処理により、信号平坦部は元のサンプル列からは削除され、図９（ｂ）に示すようになる。ただし、復号時に元通りに復元するために、分離された信号平坦部は、信号平坦部データとして図９（ｃ）に示すような形式で記録しておく。
【００５５】
信号平坦部データは、上述のように、信号平坦部ごとに、その先頭時刻（サンプル番号）、サンプル数、サンプル値の３属性で記録する。ここで、先頭時刻とは、信号の開始位置からの時刻であり、図９（ｃ）の例では、先頭からのサンプル番号で記録している。このサンプル番号をサンプリング周波数で除算すれば、時刻に変換されることになる。サンプル数は、そのサンプル値がどの程度連続して続くかを示す情報である。なお、サンプル数の代わりに信号平坦部の終了時刻を記録するようにしても良い。サンプル値は、デジタル化された信号レベルを示している。符号付き１６ビットで表現した場合は、最大値は「３２７６７」、最小値は「−３２７６８」となる。すなわち、「０」は無音部、「３２７６７」および「−３２７６８」は飽和部を示している。ただし、信号平坦部処理手段８０は、信号平坦部を無条件には処理しない。本発明は、データの圧縮を目的としているため、サンプル列の削減分よりも信号平坦部データが大きくなると意味がないからである。したがって、信号平坦部となるサンプルが所定数以上連続する場合に限り信号平坦部データを作成して各チャンネルのサンプル列から分離するのである。
【００５６】
続いて、各チャンネルのサンプル列に対して、相関フレーム検出手段９０が、所定の区間長をもつフレームを設定して、設定されたフレーム間の比較を行う。本実施形態では、フレーム長をサンプル列の開始時刻から終了時刻までの全区間に渡って固定長としている。具体的には、１フレームを５１２サンプルとしている。相関フレーム検出手段９０は、各チャンネルのサンプル列の先頭から５１２サンプルずつを１フレームとして設定し、フレーム間で全サンプルが一致する相関フレームを求めていくことになる。具体的な手順を図１０のフローチャートに従って説明する。
【００５７】
まず、相関フレーム検出手段９０は、所定のサンプル数単位でフレーム化を行う（ステップＳ３１）。本実施形態では、上述のようにフレーム長をサンプル列の開始時刻から終了時刻までの全区間に渡って固定長５１２サンプルとしている。相関フレーム検出手段９０は、図１１（ａ）に示すように、サンプル列の先頭から５１２サンプルずつを１フレームとして設定していくことになる。
【００５８】
次に、各フレームに対して構成するサンプル値が全て一致するフレームを探索する。具体的には、図１１（ｂ）に示すように、まず、設定されたフレームのうち、時間的に最後尾のフレームを、相関フレームを探すための対象フレームとする。次に、所定の探索範囲内において、対象フレームの先頭サンプルの値と同一の値をもつサンプルを、時間的に遡りながら探索していく（ステップＳ３２）。例えば、図１２（ａ）に示すように、対象フレームがｍＴ〜ｍＴ＋５１１の５１２個のサンプルで構成されているとする。この場合、まず、対象フレームの先頭サンプルｍＴのサンプル値ｅ（ｍＴ）と同一となるサンプルを探索していく。サンプルｍＴ−１、サンプルｍＴ−２と順に探索していく。なお、図１２において、ｍは先頭からｍ番目のフレームであることを示し、Ｔはフレーム長（本実施形態では５１２サンプル）を示している。
【００５９】
一致するサンプルｔが見つかったら（ステップＳ３３）、次に、そのサンプルｔの次のサンプルｔ＋１と対象フレームの２番目のサンプルｍＴ＋１が一致するかどうかを比較する。このようにしてサンプルの値が一致する限り後続するサンプル同士の比較を行っていく（ステップＳ３４）。ステップＳ３４においては、ｅ（ｔ＋ｐ）とｅ（ｍＴ＋ｐ）の値が一致する限り、処理を繰り返していく。例えば、図１２（ｂ）に示す例では、ｅ（ｔ）〜ｅ（ｔ＋８）がｅ（ｍＴ）〜ｅ（ｍＴ＋８）と一致しているので、さらにｐ＝９として、ステップＳ３４の処理が続けられることになる。ｐ＝０〜ｐ＝５１１までの全てのｅ（ｔ＋ｐ）とｅ（ｍＴ＋ｐ）が一致した場合（ステップＳ３５）、そのサンプル列を対象フレームに対する相関フレームとし、相関フレームの先頭のサンプル番号と対象フレームの先頭のサンプル番号とを対応付けてフレーム相関データとして記録し、対象フレームを元のサンプル列から削除する（ステップＳ３６）。対象フレームの全サンプルと一致しなければ、さらに対象フレームの先頭サンプルと値が一致するサンプルが存在するかどうかを時間的に遡りながら探索していく。所定のサンプル数分遡っても一致する相関フレームが存在しない場合は、その対象フレームに関する相関フレームの探索を中止し、対象フレームの直前のフレームを新たな対象フレームとして相関フレームの探索を行う。１つの対象フレームに対しての処理が終わったら、ステップＳ３２に戻って、１つ直前のフレームを新たな対象フレームとして処理を続けていく（ステップＳ３７）。このようにして、時系列信号の先頭時刻近辺に位置するフレームを除く全フレームを対象フレームとして相関フレームの検出処理を行う。
【００６０】
サンプル列全体でみると、図１１（ｃ）に示すように対象フレームに対応する相関フレームが検出されたとすると、図１１（ｄ）に示すように対象フレームが削除されることになる。このとき、復号時に完全に復元できるように図１１（ｅ）に示すようなフレーム相関データが記録される。図１１（ｅ）に示すように、フレーム相関データには対象フレームの先頭のサンプル番号と相関フレームの先頭のサンプル番号が対応づけて記録される。
【００６１】
以上、本発明の好適な実施形態について説明したが、本発明は上記実施形態に限定されず、種々の変形が可能である。例えば、上記実施形態では、主サンプルと副サンプルの出現比率が１対１のものについて説明したが、これ以外の比率のものでも良い。上記実施形態では、最初に４８ｋＨｚでサンプリングしたものを９６ｋＨｚに補間して作成した音響信号を扱ったので、主サンプルと副サンプルの出現比率が１対１となったが、例えば、最初に２４ｋＨｚでサンプリングしたものを９６ｋＨｚに補間すると、主サンプルと副サンプルの出現比率は１対３となる。このような音響信号を扱う場合には、サンプル列再配置手段２０が、各サンプルを順番に１つの主サンプル列と３つの副サンプル列に順に振り分けるようにすれば良い。
【００６２】
【発明の効果】
以上、説明したように本発明によれば、時系列信号を構成するサンプル列に対して、録音により作成された主サンプル列と、主サンプル列を補間処理することにより得られた副サンプル列を分離すると共に、前記副サンプル列中の各副サンプルの値を、近傍の主サンプルの平均値と当該副サンプルの値との差分値に変換するサンプル再配置を行い、主サンプル列、変換された副サンプル列それぞれに対して、線形予測誤差を算出し、主サンプル列および副サンプル列の値をそれぞれ予測誤差値に変換し、予測誤差値に変換された主サンプル列、副サンプル列を可変長で符号化するようにしたので、デジタル編集された高精細オーディオのワークデータを効率的に可逆圧縮することが可能となるという効果を奏する。
【図面の簡単な説明】
【図１】本発明第１の実施形態に係る時系列信号の符号化装置を示す機能ブロック図である。
【図２】サンプル列再配置手段２０によるサンプルの再配置の様子を示す図である。
【図３】予測誤差変換手段４０による処理を示すフローチャートである。
【図４】チャンネル間演算手段５０による処理を示すフローチャートである。
【図５】極性処理手段６０によるビット構成の変換の様子を示す図である。
【図６】可変長符号化手段７０による処理を示すフローチャートである。
【図７】上位ビットと下位ビットのデータ分離の様子を示す図である。
【図８】本発明第２の実施形態に係る時系列信号の符号化装置を示す機能ブロック図である。
【図９】信号平坦部処理手段８０による処理の様子を示す図である。
【図１０】相関フレーム検出手段９０による処理を示すフローチャートである。
【図１１】相関フレーム検出手段９０の処理による時系列信号全体の様子を示す図である。
【図１２】相関フレーム検出手段９０の処理により比較されるサンプルの様子を示す図である。
【符号の説明】
１０・・・時系列信号入力手段
２０・・・サンプル列再配置手段
３０・・・下位固定ビット削除手段
４０・・・予測誤差変換手段
５０・・・チャンネル間演算手段
６０・・・極性処理手段
７０・・・可変長符号化手段
８０・・・信号平坦部処理手段
９０・・・相関フレーム検出手段
[0001]
[Industrial applications]
The present invention relates to a data compression and encoding technique suitable for fields such as music production (including the creation of music, the archiving of acoustic source material, and the relaying of location recordings), audio recording and reproduction using digital recording media such as CDs and DVDs, and the analysis and diagnosis of biological signals in telemedicine.
[0002]
[Prior art]
Conventionally, various methods have been used for compressing an acoustic signal. MP3 (MPEG-1 / Layer3), AAC (MPEG-2 / Layer3) and the like have been put to practical use as a technique for compressing and encoding an audio signal. With such a compression encoding method, it is possible to treat an audio signal as small data, which contributes to the efficiency of data recording and transmission.
[0003]
Recently, in addition to lossy coding schemes such as MP3 and AAC described above, lossless coding schemes that allow the original signal to be completely restored have been developed and are used for managing acoustic source material (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
JP 2000-821199 A
[0005]
[Problems to be solved by the invention]
When handling high-definition audio, sampling is performed with a higher sampling frequency or a larger number of quantization bits than for normal audio (music CD quality: sampling frequency 44.1 kHz, 16 quantization bits). In music editing, various sounds are often mixed and edited, and normal audio is sometimes partially mixed in. To match the high-definition audio, this normal audio undergoes sample interpolation, extension of the number of quantization bits, and similar processing. An acoustic signal that has been interpolated in this way can in principle be compressed to about the amount of information contained in the original sound, but encoding it with a conventional lossless coding scheme does not achieve that degree of compression. On the contrary, if compression centered on linear predictive coding is applied as is, the linear prediction error at the interpolated locations may increase and degrade the compression ratio, which is the opposite of compressing the redundant portions. For example, audio data whose sampling frequency has been doubled should theoretically be compressible to 50% or less, but in practice the compression ratio exceeds 50%.
[0006]
Therefore, to solve this problem, an object of the present invention is to provide a time-series signal encoding device and a recording medium capable of efficiently and reversibly compressing high-definition audio work data in the course of digital editing.
[0007]
[Means for Solving the Problems]
To solve the above problem, the present invention provides an encoding device that compresses the amount of information of a time-series signal composed of a time-series sample sequence in such a way that the entire sample sequence can be reproduced. The device comprises: sample sequence rearrangement means that separates the sample sequence into a main sample sequence created by recording and a sub-sample sequence obtained by interpolating the main sample sequence, and converts the value of each sub-sample in the sub-sample sequence into the difference between the average of the neighboring main samples and the value of that sub-sample; prediction error conversion means that calculates a linear prediction error for each of the main sample sequence and the converted sub-sample sequence, and converts the values of the main sample sequence and the sub-sample sequence into prediction error values; and variable-length encoding means that encodes, with a variable length, the main sample sequence and the sub-sample sequence that have been converted into prediction error values.
[0008]
According to the present invention, the sample sequence constituting the time-series signal is rearranged into a main sample sequence obtained by recording and a sub-sample sequence obtained by interpolating the main sample sequence, and each of the main and sub-sample sequences is predictively coded and then variable-length coded. As a result, high-definition audio work data in the course of digital editing can be efficiently and reversibly compressed.
[0009]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
(First Embodiment)
FIG. 1 is a configuration diagram showing a first embodiment of a time-series signal encoding device according to the present invention. In FIG. 1, reference numeral 10 denotes time-series signal input means, 20 denotes sample sequence rearrangement means, 30 denotes lower fixed bit deletion means, 40 denotes prediction error conversion means, 50 denotes inter-channel calculation means, 60 denotes polarity processing means, and 70 denotes variable-length encoding means.
[0010]
In FIG. 1, the time-series signal input means 10 has a function of inputting a digitized acoustic signal such as a digital acoustic signal. The sample sequence rearrangement means 20 has a function of separating the sample sequence, which is the input time-series signal, into a main sample sequence obtained from a recording and a sub-sample sequence obtained by interpolating the main sample sequence. The lower fixed bit deletion means 30 has a function of deleting a predetermined number of lower bits, which are quantization noise components. The prediction error conversion means 40 has a function of converting the value of each sample into a prediction error value using linear prediction. The inter-channel calculation means 50 has a function of performing a difference calculation between the channels of a sample sequence consisting of a plurality of channels. The polarity processing means 60 has a function of dividing the bit string of each sample, in which positive and negative values are expressed in complement representation, into one bit representing the polarity and the remaining bit string. The variable-length encoding means 70 has a function of encoding the value of each sample with a variable bit length. In practice, the device shown in FIG. 1 is realized by a computer and a dedicated software program installed on that computer.
[0011]
Next, the processing operation of the time-series signal encoding device shown in FIG. 1 will be described. Here, the case of handling work data in which a plurality of acoustic signals are mixed is taken as an example of the time-series signal. As described above, such work data contains a mixture of normal acoustic signals with a sampling frequency of 48 kHz and 16 quantization bits and high-definition acoustic signals with a sampling frequency of 96 kHz and 24 quantization bits. An acoustic signal obtained by mixing signals with different sampling frequencies in this way is handled with the sampling frequency unified to that of the high-definition signal. In this case, to match the number of samples of the 96 kHz signal, the 48 kHz signal is interpolated by inserting the average of each pair of adjacent samples between them.
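As an illustration of the interpolation just described, the following Python sketch doubles the sample count by inserting the average of each adjacent pair. The function name and the use of floor division for the average are assumptions made for this example; the description does not state how the average is rounded.

```python
def upsample_2x_by_averaging(samples):
    """Insert the average of each adjacent pair between the two samples.

    Assumption: integer samples, average rounded down (floor division).
    """
    out = []
    for i, s in enumerate(samples):
        out.append(s)                      # original (recorded) sample
        if i + 1 < len(samples):
            out.append((s + samples[i + 1]) // 2)  # interpolated sample
    return out
```

For example, a 48 kHz sequence of three samples becomes five samples, with the recorded values at every other position.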
[0012]
FIG. 2A schematically shows such an acoustic signal. In FIG. 2A, the numbers in parentheses are sample numbers assigned in ascending order from 1, and x denotes the value of the sample. When a time-series signal consisting of such a sample sequence is input from the time-series signal input means 10, the sample sequence rearrangement means 20 calculates, for each odd-numbered sample, the difference from the average of the two adjacent even-numbered samples, and, for each even-numbered sample, the difference from the average of the two adjacent odd-numbered samples. The resulting sample sequences are shown schematically in FIGS. 2B and 2C, respectively. Note that FIG. 2B shows the even-numbered samples, which are not subjected to the calculation, moved toward the past in time, and FIG. 2C likewise shows the odd-numbered samples moved toward the past.
[0013]
As a result of this difference calculation, whichever set contains more small difference values is taken as the sub-sample sequence, and the other set as the main sample sequence. In the example of FIG. 2, the values in the latter half of the array of FIG. 2B are compared with the values in the latter half of the array of FIG. 2C. For example, when the odd-numbered samples were obtained by interpolation, the values in the latter half of the array shown in FIG. 2B are 0. Conversely, when the even-numbered samples were obtained by interpolation, the values in the latter half of the array shown in FIG. 2C are close to 0. Thus, when the latter half of the array shown in FIG. 2B contains many values near 0, the set of even-numbered samples is separated as the main sample sequence and the set of odd-numbered samples as the sub-sample sequence. In the processing of the sample sequence rearrangement means 20, the main samples may be moved toward the past and the sub-samples toward the future, as shown in FIGS. 2B and 2C, or the main samples and sub-samples may be handled separately. For example, when the odd-numbered samples are the sub-samples, the main samples and sub-samples are separated as shown in FIG. 2D. The present invention distinguishes the main sample sequence from the sub-sample sequence in order to prevent the data amount from actually increasing, as it can when linear prediction is applied to a sample sequence that contains samples obtained by interpolating the original samples. Therefore, as long as linear prediction can be performed separately on the main sample sequence and the sub-sample sequence, the result may be either a single sample sequence as shown in FIGS. 2B and 2C or two sample sequences as shown in FIG. 2D.
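The rearrangement described above can be sketched roughly as follows in Python. This is only an illustrative reading of the text, not the patented implementation: the function names, the `near_zero` threshold used to decide what counts as a "small" difference, the floor-rounded average, and the handling of boundary samples (kept as raw values) are all assumptions. Indices are 0-based, so phase 0 corresponds to the odd-numbered samples of the description and phase 1 to the even-numbered ones.

```python
def diffs_for_phase(x, phase):
    """x[i] minus the floor-average of its two neighbours, for interior i of one phase."""
    return {i: x[i] - (x[i - 1] + x[i + 1]) // 2
            for i in range(1, len(x) - 1) if i % 2 == phase}

def rearrange(x, near_zero=1):
    """Pick the phase with more near-zero differences as the sub-sample sequence."""
    candidates = {}
    for phase in (0, 1):
        d = diffs_for_phase(x, phase)
        count_small = sum(1 for v in d.values() if abs(v) <= near_zero)
        candidates[phase] = (count_small, d)
    sub_phase = max(candidates, key=lambda p: candidates[p][0])
    diffs = candidates[sub_phase][1]
    main = [x[i] for i in range(len(x)) if i % 2 != sub_phase]
    # Assumption: boundary sub-samples without two neighbours keep their raw value.
    sub = [diffs.get(i, x[i]) for i in range(len(x)) if i % 2 == sub_phase]
    return sub_phase, main, sub
```

With an input whose every second sample is the average of its neighbours, the sub-sample sequence comes out as all zeros, which is what makes it cheap to encode afterwards.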
[0014]
Next, the lower fixed bit deletion means 30 separates a predetermined number of lower bits from each sample of the main sample sequence and the sub-sample sequence. This is done to remove redundant lower-bit components when data quantized with 16 bits has been converted to 24 bits to match a high-definition acoustic signal; without this processing, the amount of encoded information would increase by a factor of 3/2. On the other hand, when all of the source acoustic signals being mixed are quantized with the high-definition 24 bits, the processing by the lower fixed bit deletion means 30 is not necessary. Even so, the lower bits may be deleted in the same way and the deleted lower-bit data array recorded separately as part of the output code data, which reduces the processing load of the prediction error conversion means and the subsequent stages. Whether the lower fixed bit deletion means 30 operates can be set in advance.
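A minimal sketch of separating and restoring the lower bits is shown below. The function names and the default of 8 bits (the difference between 24 and 16 quantization bits) are assumptions for illustration; the description only speaks of "a predetermined number" of lower bits. The sketch also assumes the 16-bit material was extended by shifting it into the upper bits, so that the lower bits are zero.

```python
def strip_lower_bits(samples, n_bits=8):
    """Split each integer sample into its upper part and its lowest n_bits."""
    mask = (1 << n_bits) - 1
    upper = [s >> n_bits for s in samples]   # arithmetic shift keeps the sign
    lower = [s & mask for s in samples]      # always in the range 0..mask
    return upper, lower

def restore_lower_bits(upper, lower, n_bits=8):
    """Inverse of strip_lower_bits: reassemble the original samples."""
    return [(u << n_bits) | l for u, l in zip(upper, lower)]
```

Keeping the `lower` array alongside the encoded upper part is what allows the exact original samples to be reassembled when the lower bits are not all zero.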
[0015]
Subsequently, the prediction error conversion means 40 converts the value of each sample of the main sample sequence and the sub-sample sequence into a prediction error value. The prediction error value for a given sample is calculated using the values of one or more immediately preceding samples. In the present embodiment, the number of preceding samples used is changed dynamically. This adaptive linear predictive coding is described below. FIG. 3 is a flowchart outlining the adaptive linear predictive coding performed by the prediction error conversion means 40. First, using a plurality of prediction formulas prepared in advance, the linear prediction error corresponding to each formula is calculated (step S1). Specifically, the following [Equation 1] to [Equation 6] are provided as prediction formulas for calculating the prediction error of sample number t.
[0016]
[Equation 1]
$e_0(t) = x(t) - e_0(t-1)/2$
[0017]
[Equation 2]
$e_1(t) = x(t) - a_{11}\,x(t-1) - e_1(t-1)/2$
[0018]
[Equation 3]
$e_2(t) = x(t) - a_{21}\,x(t-1) - a_{22}\,x(t-2) - e_2(t-1)/2$
[0019]
[Equation 4]
$e_3(t) = x(t) - a_{31}\,x(t-1) - a_{32}\,x(t-2) - a_{33}\,x(t-3) - e_3(t-1)/2$
[0020]
[Equation 5]
$e_4(t) = x(t) - a_{41}\,x(t-1) - a_{42}\,x(t-2) - a_{43}\,x(t-3) - a_{44}\,x(t-4) - e_4(t-1)/2$
[0021]
[Equation 6]
$e_5(t) = x(t) - a_{51}\,x(t-1) - a_{52}\,x(t-2) - a_{53}\,x(t-3) - a_{54}\,x(t-4) - a_{55}\,x(t-5) - e_5(t-1)/2$
[0022]
In [Equation 1] to [Equation 6] above, $e_0(t)$ to $e_5(t)$ are the prediction errors for the sample at time t given by the respective prediction formulas, and $x(t)$ to $x(t-5)$ are the sample values at times t to t-5.
[0023]
The expressions $a_{21}\,x(t-1) + a_{22}\,x(t-2)$ in [Equation 3], $a_{31}\,x(t-1) + a_{32}\,x(t-2) + a_{33}\,x(t-3)$ in [Equation 4], $a_{41}\,x(t-1) + a_{42}\,x(t-2) + a_{43}\,x(t-3) + a_{44}\,x(t-4)$ in [Equation 5], and $a_{51}\,x(t-1) + a_{52}\,x(t-2) + a_{53}\,x(t-3) + a_{54}\,x(t-4) + a_{55}\,x(t-5)$ in [Equation 6] are linear prediction components based on the past 2 to 5 samples. The prediction errors $e_0(t)$ to $e_5(t)$ at time t are calculated using these linear prediction components together with the prediction errors $e_1(t-1)/2$ to $e_5(t-1)/2$ calculated for the immediately preceding sample (the error feedback components).
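To make the six prediction formulas concrete, the following Python sketch evaluates all of them over a sample sequence, using the fixed initial coefficient values listed in the description. Several details are assumptions for illustration only: the names `INITIAL_COEFFS` and `prediction_errors`, treating samples before the start of the sequence as 0, and using ordinary (non-integer) division for the error feedback term, since the text does not say how the division by 2 is rounded.

```python
# Initial coefficients a_k1..a_kk for formulas k = 0..5 (values from the description).
INITIAL_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
    5: [5, -10, 10, -5, 1],
}

def prediction_errors(x, coeffs=INITIAL_COEFFS):
    """Return {k: [e_k(0), e_k(1), ...]} for every prediction formula k."""
    errors = {k: [] for k in coeffs}
    for t in range(len(x)):
        for k, a in coeffs.items():
            # Linear prediction component; missing past samples count as 0 (assumption).
            pred = sum(a[j] * x[t - 1 - j] for j in range(len(a)) if t - 1 - j >= 0)
            # Error feedback component e_k(t-1)/2; zero for the first sample (assumption).
            feedback = errors[k][t - 1] / 2 if t > 0 else 0
            errors[k].append(x[t] - pred - feedback)
    return errors
```

On a linear ramp such as 1, 2, 3, 4, the second-order formula leaves only a small decaying residual after the first sample, whereas the zeroth-order formula tracks the growing sample values.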
[0024]
The coefficients $a_{11}$ to $a_{55}$ are given the initial values $a_{11}=1$, $a_{21}=2$, $a_{22}=-1$, $a_{31}=3$, $a_{32}=-3$, $a_{33}=1$, $a_{41}=4$, $a_{42}=-6$, $a_{43}=4$, $a_{44}=-1$, $a_{51}=5$, $a_{52}=-10$, $a_{53}=10$, $a_{54}=-5$, and $a_{55}=1$. In the present embodiment, however, these coefficients are changed dynamically. Specifically, the coefficients $a_{11}$ to $a_{55}$ are determined using the following [Equation 7], which is based on the Levinson-Durbin algorithm.
[0025]
[Equation 7]
$\phi(k) = \frac{1}{N-K} \sum_{j=1}^{N-K} x(j)\,x(j+k)$
$k_i = -\left\{ \phi(i) + \sum_{j=1}^{i-1} a_j(i-1)\,\phi(i-j) \right\} / E(i-1)$
$a_i(i) = k_i$
$a_j(i) = a_j(i-1) + k_i\,a_{i-j}(i-1)$, where $1 \le j \le i-1$
$E(i) = (1 - k_i^2)\,E(i-1)$
[0026]
In [Equation 7] above, $\phi(k)$ is the autocorrelation between the N samples $x(j)$ ($j = 1, \dots, N$) and the same sequence shifted by k samples, where k ranges up to the maximum value K (5 in the above example). N is chosen to be sufficiently large relative to K (for example, N = 32768 when K = 5). [Equation 7] is repeated recursively from i = 1 to i = K; the finally obtained $a_j(K)$ are the coefficients corresponding to the past K samples, and the intermediate results $a_j(i)$ obtained in each phase become the coefficients $a_{ij}$. In step S1, the formulas [Equation 1] to [Equation 6] are evaluated using the coefficients determined by [Equation 7]. The calculation of [Equation 7] itself is actually performed in step S7, described later. Because determining the coefficients requires the values of a number of past samples, [Equation 1] to [Equation 6] are evaluated with the initial coefficients given above for the first N-1 samples.
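The recursion of [Equation 7] can be sketched in Python as below. This follows the equations as written, with two assumptions that the description does not state: the starting value E(0) is taken to be $\phi(0)$, as in the usual Levinson-Durbin formulation, and the function names are hypothetical. Note also that, with the sign of $k_i$ as written, the recursion yields coefficients for which a good prediction of $x(t)$ is $-\sum_j a_j\,x(t-j)$; whether the coefficients are negated before being used in [Equation 2] to [Equation 6] is not stated in the text and is left open here.

```python
def autocorr(x, K):
    """phi(k) for k = 0..K, averaged over the first N-K products (0-based indexing)."""
    n = len(x) - K
    return [sum(x[j] * x[j + k] for j in range(n)) / n for k in range(K + 1)]

def levinson_durbin(phi, K):
    """Run the recursion for i = 1..K; return the coefficients of every order and E(K)."""
    a = [0.0] * (K + 1)      # a[j] holds a_j(i) for the current order; a[0] is unused
    E = phi[0]               # assumption: E(0) = phi(0)
    per_order = []
    for i in range(1, K + 1):
        acc = phi[i] + sum(a[j] * phi[i - j] for j in range(1, i))
        k = -acc / E
        new_a = a[:]
        new_a[i] = k                         # a_i(i) = k_i
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]   # a_j(i) = a_j(i-1) + k_i * a_{i-j}(i-1)
        a = new_a
        E *= 1.0 - k * k                     # E(i) = (1 - k_i^2) E(i-1)
        per_order.append(a[1:i + 1])         # intermediate result for order i
    return per_order, E
```

The list `per_order` mirrors the statement that the intermediate results of each phase supply the coefficients for the lower-order formulas, so a single run with K = 5 provides all five coefficient sets.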
[0027]
Next, among the prediction errors given by the above prediction formulas, the one whose cumulative error (the accumulated absolute values of its past prediction errors) is smallest is selected as the prediction error of the sample (step S2). Specifically, let A0 to A5 be the accumulated values of the prediction errors calculated for past samples by the prediction formulas [Equation 1] to [Equation 6]. The prediction error corresponding to the smallest of the accumulated errors A0 to A5 is selected. For example, if A2 is the smallest among A0 to A5, the prediction error e2(t) calculated by [Equation 3] is selected as the prediction error e(t) to be encoded. The original sample value x(t) is replaced with the selected prediction error e(t), and the subsequent processing is performed.
[0028]
Subsequently, the absolute values of the prediction errors e0(t) to e5(t) are added to the accumulated errors A0 to A5 (step S3). Specifically, the variables A0 to A5 holding the accumulated error values are updated as shown in the following [Equation 8]. In addition, each time a sample is processed, the counters C1 and C2 are each incremented by one.
[0029]
[Equation 8]
A0 ← A0 + | e0 (t) |
A1 ← A1 + | e1 (t) |
A2 ← A2 + | e2 (t) |
A3 ← A3 + | e3 (t) |
A4 ← A4 + | e4 (t) |
A5 ← A5 + | e5 (t) |
[0030]
Subsequently, it is determined whether or not the counter C1 has exceeded a predetermined number (step S4). In the present embodiment, the predetermined number is set to 100 times. That is, it is determined whether the counter C1 has exceeded 100.
[0031]
If, as a result, the counter C1 exceeds 100, the accumulated errors are halved (step S5). Specifically, the variables A0 to A5 are divided by two as shown in the following [Equation 9], and the counter C1 is reset to 0. In other words, A0 to A5 are not accumulated errors in the strict sense but moving averages of the accumulated errors. In the present embodiment, the errors of at most the 100 most recent samples are accumulated at full weight, while the contribution of earlier samples is halved each time. This reduces the influence of samples that are distant in time.
[0032]
[Equation 9]
A0 ← (A0) / 2
A1 ← (A1) / 2
A2 ← (A2) / 2
A3 ← (A3) / 2
A4 ← (A4) / 2
A5 ← (A5) / 2
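Steps S2 to S5 can be sketched as a small state machine in Python. The class name `PredictorSelector` is hypothetical, the candidate errors e0(t) to e5(t) are taken as given inputs, ties are resolved in favour of the lowest index, and the halving uses floating-point division; the text specifies none of these details, so they are assumptions of this sketch.

```python
HALVE_INTERVAL = 100  # "predetermined number" for counter C1 in this embodiment

class PredictorSelector:
    """Tracks the accumulated errors A0..A5 and picks the prediction error to encode."""

    def __init__(self, num_predictors=6):
        self.acc = [0.0] * num_predictors  # A0..A5
        self.c1 = 0                        # counter C1

    def step(self, candidates):
        """candidates holds e0(t)..e5(t); returns (selected index, selected error)."""
        # Step S2: choose the formula whose accumulated past error is smallest
        # (assumption: the lowest index wins a tie).
        best = min(range(len(self.acc)), key=lambda i: self.acc[i])
        chosen = candidates[best]
        # Step S3: add the absolute errors of every formula ([Equation 8]).
        for i, e in enumerate(candidates):
            self.acc[i] += abs(e)
        self.c1 += 1
        # Steps S4 and S5: halve the accumulated errors once C1 exceeds 100 ([Equation 9]).
        if self.c1 > HALVE_INTERVAL:
            self.acc = [a / 2 for a in self.acc]
            self.c1 = 0
        return best, chosen
```

Because the selection depends only on errors accumulated from earlier samples, a decoder that runs the same bookkeeping can tell which formula was used without any side information.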
[0033]
Subsequently, it is determined whether or not the counter C2 has exceeded a predetermined number (step S6). In the present embodiment, the predetermined number is set to 32768 times. That is, it is determined whether the counter C2 has exceeded 32768.
[0034]
If, as a result, the counter C2 exceeds 32768, the coefficients a₁₁ to a₅₅ are calculated again (step S7). Specifically, the coefficients a₁₁ to a₅₅ are recalculated using the above [Equation 7]. At the same time, the counter C2 is reset to zero.
[0035]
By executing steps S1 to S7 for every sample of the main sample sequence and the sub-sample sequence, the value of every sample is replaced from its original amplitude value x(t) with the prediction error e(t) to be encoded. In the present embodiment in particular, dynamically changing the coefficients of the plurality of prediction formulas makes it possible to calculate more accurate prediction errors.
[0036]
Next, the inter-channel calculation means 50 calculates the difference between channels for the main samples and sub-samples of each channel, in which the prediction error values are now recorded. FIG. 4 is a flowchart outlining this inter-channel difference calculation. First, a main sample is read (step S11). If the read main sample belongs to the L channel, its value is stored in memory and output to the subsequent polarity processing means 60 (step S12). If the read main sample belongs to the R channel, the processing from step S13 onward is performed. Because the main samples are read alternately for each channel, no explicit determination is actually needed. For example, the L-channel sample with sample number t = 1 is read first, followed by the R-channel sample with t = 1, and then the L-channel sample with t = 2. The processing of step S12 and the processing from step S13 onward can therefore simply be alternated.
[0037]
For an R-channel sample, the variables Ao and Ad are compared (step S13). Here, Ao is the accumulated absolute value of the R-channel sample values, and Ad is the accumulated absolute value of the differences between the R-channel and L-channel sample values. Both variables are initially 0. If Ad ≧ Ao in step S13, the R-channel sample is output unchanged to the subsequent polarity processing means 60 (step S14). The accumulated value Ao is then updated as shown in the first expression of the following [Equation 10]. Specifically, the absolute value of the R-channel sample value e_R is added to the accumulated value Ao.
[0038]
If Ad < Ao in step S13, the difference from the L-channel sample stored in memory in step S12 is calculated, and this difference value is output to the subsequent polarity processing means 60 (step S15). The accumulated value Ad is then updated as shown in the second expression of the following [Equation 10]. Specifically, the absolute value of the difference e_R − e_L between the R-channel and L-channel samples is added to the accumulated value Ad.
[0039]
[Equation 10]
Ao ← Ao + | e_R|
Ad ← Ad + | e_R − e_L |
[0040]
Subsequently, the counter C3 that indicates that a pair of L and R samples has been processed is incremented by one (step S16).
[0041]
Subsequently, it is determined whether the counter C3 has exceeded a predetermined number (step S17). In the present embodiment, the predetermined number is set to 100. That is, it is determined whether or not the counter C3 has exceeded 100.
[0042]
If, as a result, the counter C3 exceeds 100, the accumulated values Ao and Ad are halved (step S18). Specifically, the variables Ao and Ad are divided by two as shown in the following [Equation 11], and the counter C3 is likewise halved. In other words, like A0 to A5 in [Equation 8], the variables Ao and Ad are not accumulated values in the strict sense but moving averages of the accumulated values. In the present embodiment, the values of at most the 100 most recent samples are accumulated at full weight, while the contribution of earlier samples is halved each time. This reduces the influence of samples that are distant in time.
[0043]
[Equation 11]
Ao ← Ao / 2
Ad ← Ad / 2
C3 ← C3 / 2
[0044]
By executing steps S11 to S18 for all the main samples in the main sample sequence, the values of the R-channel samples are replaced with their differences from the L channel. As is apparent from the processing described above, however, some R-channel samples are recorded with their values unchanged, depending on the relationship between the accumulated values Ao and Ad. All L-channel samples are output unchanged by the inter-channel calculation means 50. When every sample in the main sample sequence has been processed, the same processing is performed on the sub-samples. The processing by the inter-channel calculation means 50 is omitted for a monaural sound signal, which has only one channel.
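Steps S13 to S18 for the R channel can be sketched in Python as below, together with a mirror-image restore function that shows why the choice between "sample" and "difference" needs no side information. The function names are hypothetical, and the rounding used when halving (floating-point for Ao and Ad, integer for C3) is an assumption, since the text only says that the values are halved.

```python
def inter_channel_encode(left, right, halve_interval=100):
    """Return the R-channel output: either e_R itself or the difference e_R - e_L."""
    ao = ad = 0.0  # accumulated values Ao and Ad, both initially 0
    c3 = 0         # counter C3
    out = []
    for e_l, e_r in zip(left, right):
        if ad >= ao:              # step S14: output the R sample unchanged
            value = e_r
            ao += abs(value)
        else:                     # step S15: output the difference from the L sample
            value = e_r - e_l
            ad += abs(value)
        out.append(value)
        c3 += 1                   # step S16
        if c3 > halve_interval:   # steps S17 and S18 ([Equation 11])
            ao, ad, c3 = ao / 2, ad / 2, c3 // 2
    return out

def inter_channel_decode(left, coded, halve_interval=100):
    """Rebuild the R channel by repeating the same Ao/Ad bookkeeping."""
    ao = ad = 0.0
    c3 = 0
    right = []
    for e_l, value in zip(left, coded):
        if ad >= ao:
            right.append(value)
            ao += abs(value)
        else:
            right.append(value + e_l)
            ad += abs(value)
        c3 += 1
        if c3 > halve_interval:
            ao, ad, c3 = ao / 2, ad / 2, c3 // 2
    return right
```

Both accumulators are updated only from values that also appear in the output, so the decoder reaches the same branch decision at every sample.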
[0045]
Subsequently, the polarity processing means 60 performs sign processing on each sample. Through the processing of the prediction error conversion means 40 and the inter-channel calculation means 50, the value of each sample has been replaced from an amplitude value with a prediction error, and the R-channel values have been replaced with differences from the L channel, but the data format is unchanged from the original. Normally, when processed by a computer, each value is handled in units of 32 bits and expressed in two's complement. The polarity processing means 60 converts this into a sign-and-absolute-value representation in which the absolute value is shifted up by one bit and the sign bit is moved to the LSB (least significant bit). FIG. 5 schematically shows this conversion of the bit configuration: FIG. 5A shows the bit configuration before processing, and FIG. 5B shows the bit configuration after processing. The sign bit is moved to the LSB so that the bit length of each sample can be detected easily in the subsequent processing of the variable length coding means 70.
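The sign handling described above can be sketched in Python as follows. The function names are hypothetical, and the convention that a sign bit of 1 marks a negative value is an assumption, since the text does not say which bit value denotes which sign.

```python
def to_sign_lsb(value):
    """Shift the absolute value up by one bit and put the sign in the LSB."""
    sign = 1 if value < 0 else 0  # assumption: 1 marks a negative value
    return (abs(value) << 1) | sign

def from_sign_lsb(code):
    """Inverse conversion: recover the signed value from the sign-in-LSB form."""
    magnitude = code >> 1
    return -magnitude if code & 1 else magnitude

def sample_bit_length(code):
    """Number of bits from the first "1" bit down to the LSB."""
    return code.bit_length()
```

In two's complement a small negative number has many leading "1" bits, whereas in this form small magnitudes of either sign have short bit lengths, which is what the later variable length coding relies on.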
[0046]
Next, the variable length coding means 70 converts each sample to a variable length. The variable length coding of the present embodiment uses a method generally called Golomb coding. More specifically, the bits forming one sample are divided into an upper bit component and a lower bit component. The lower bit component is left unchanged, while the upper bit component is replaced by as many "0" bits as the decimal value of the upper bits, followed by a separator bit "1". For example, consider the 8-bit pattern "00101000". If the lower bit component is 4 bits long, the lower bit component is "1000". The upper bits are "0010", whose decimal value is 2, so two "0" bits are arranged and a "1" is appended, giving "001". As a result, the 8-bit string "00101000" is converted into the 7-bit string "0011000". In the present embodiment, the bit length of the lower bit component, which is left unchanged by the conversion, is varied for each sample.
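The worked example in this paragraph can be reproduced with a short Python sketch. The function names `split_encode` and `split_decode` are hypothetical, and bit strings are used instead of packed bits purely for readability.

```python
def split_encode(bits, low_len):
    """Encode a bit string: unary code for the upper part, lower part unchanged."""
    split = len(bits) - low_len
    upper, lower = bits[:split], bits[split:]
    count = int(upper, 2) if upper else 0  # decimal value of the upper bits
    return "0" * count + "1" + lower       # count zeros, separator "1", lower bits

def split_decode(code, low_len, total_len):
    """Rebuild the original fixed-width bit string from the variable-length code."""
    count = code.index("1")                # number of leading zeros
    lower = code[count + 1:count + 1 + low_len]
    upper_len = total_len - low_len
    upper = format(count, "b").zfill(upper_len) if upper_len else ""
    return upper + lower
```

For the pattern "00101000" with a 4-bit lower component, `split_encode` yields two zeros, the separator and "1000", that is, 7 bits in total.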
[0047]
Hereinafter, the processing performed by the variable length coding unit 70 will be specifically described. FIG. 6 is a flowchart showing an outline of the variable length coding. First, an average bit length Bf which is a moving average of bit lengths of past samples is calculated (step S21). The average bit length Bf is obtained by dividing the cumulative bit length RB, which is the cumulative value of the past bit length, by a counter C4 based on the number of past samples. That is, it is calculated by Bf = RB / C4. Since the accumulated bit length RB is 0 in the initial state, when processing a sample at t = 1, the bit length Bd (t) of the sample at t = 1 is set as an initial value. Also, the initial counter C4 is set to 1.
[0048]
Subsequently, the bit length Bd(t) of the sample at time t is calculated (step S22). For the samples from t = 2 onward, the bit length Bd(t) of the sample is calculated after the average bit length Bf. The bit length Bd(t) is easy to obtain because the polarity processing means 60 has already converted the bit configuration: in the configuration shown in FIG. 5B, the bit length is counted from the first "1" bit that appears from the head of each sample. Next, the bit length Bv of the variable part is calculated (step S23) by subtracting the average bit length Bf from the bit length Bd(t) of the sample. The code is then output (step S24). Specifically, as many "0" bits as the decimal value of the upper Bv bits are output, followed by a separator bit "1", and then the lower Bf bits are output as the invariable part. The code is output by recording it on an external storage device such as a hard disk or a CD-R. Next, the bit length Bd(t) is added to the accumulated bit length RB (step S25), and each time a sample is processed the counter C4 is incremented by one. It is then determined whether the counter C4 has exceeded a predetermined number (step S26). Here too the predetermined number is set to about 100, so it is determined whether the counter C4 has exceeded 100. If it has, the accumulated bit length RB is halved (step S27): the variable RB is divided by two, and the counter C4 is halved at the same time.
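Steps S21 to S27 can be put together in a Python sketch like the one below. The function name `encode_adaptive` is hypothetical, the inputs are assumed to be the non-negative integers produced by the polarity processing, and integer division is assumed both for Bf = RB / C4 and for the halving, because the text does not specify the rounding. How a decoder learns the bit length of the very first sample is also not covered here.

```python
def encode_adaptive(codes, halve_interval=100):
    """Encode non-negative integers with a lower-part length that follows the average."""
    rb = 0  # accumulated bit length RB
    c4 = 0  # counter C4
    out = []
    for code in codes:
        bd = code.bit_length()      # step S22: bit length Bd(t)
        if c4 == 0:                 # first sample: RB starts at Bd(1), C4 at 1
            rb, c4 = bd, 1
        bf = rb // c4               # step S21: average bit length Bf (rounding assumed)
        upper = code >> bf          # value of the upper Bv = Bd - Bf bits, 0 if Bd <= Bf
        lower = code & ((1 << bf) - 1)
        lower_bits = format(lower, "b").zfill(bf) if bf else ""
        out.append("0" * upper + "1" + lower_bits)  # step S24: zeros, separator, lower Bf bits
        rb += bd                    # step S25
        c4 += 1
        if c4 > halve_interval:     # steps S26 and S27
            rb //= 2
            c4 //= 2
    return out
```

A sample whose bit length is close to the running average costs about Bf + 1 bits, while a sample that is much larger than average pays for the excess with a longer run of zeros.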
[0049]
As described above, each sample is encoded with a variable bit length, and the variable-length samples obtained by this encoding are output as code data. The variable length coding means 70 may also have a function of separating the quantization noise component before performing the above processing. Specifically, a predetermined number of lower bits of each sample processed by the polarity processing means 60 are regarded as the quantization noise component and separated. For example, when each sample is represented by 16 bits, the present embodiment separates it into 12 upper bits and 4 lower bits. This separation is intended mainly to isolate the thermal noise of circuits used for digitizing an acoustic signal, such as an A/D converter, so the lower bits considered to be thermal noise are separated. How many lower bits to separate depends on the characteristics of the sound source and the circuit used, but about 1/4 of the number of quantization bits is usually desirable. In this case, therefore, 4 bits, corresponding to 1/4 of 16 bits, are separated as the lower bits.
[0050]
FIG. 7 schematically shows this separation of the data into upper bits and lower bits. In FIG. 7, H indicates upper bits or upper sample data, and L indicates lower bits or lower sample data. FIG. 7A shows the sample data before separation. The variable length coding means 70 separates the sample data into the upper sample data shown in FIG. 7B and the lower sample data. The sign bit contained in the upper bits is carried over unchanged into the upper sample data. When the quantization noise component is separated in this way, the remaining upper bits are variable-length coded according to the flowchart shown in FIG. 6, and the lower bits are encoded as they are with a fixed length.
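The separation into upper and lower sample data can be illustrated with a minimal Python sketch. The function names are hypothetical, the sample is treated simply as a non-negative bit pattern, and the lower part is fixed at one quarter of the quantization bits as in the 16-bit example; how the sign bit is positioned within the pattern is left aside here.

```python
def split_noise_bits(value, total_bits=16):
    """Split a bit pattern into (upper, lower), the lower part being total_bits // 4 bits."""
    noise_bits = total_bits // 4     # 4 bits for a 16-bit sample
    mask = (1 << noise_bits) - 1
    return value >> noise_bits, value & mask

def merge_noise_bits(upper, lower, total_bits=16):
    """Recombine the two parts into the original bit pattern."""
    noise_bits = total_bits // 4
    return (upper << noise_bits) | lower
```

Only the upper part would then go through the variable length coding, while the lower part is stored with its fixed length, so the split itself loses no information.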
[0051]
The code data obtained as described above is stored as needed in a storage device such as a hard disk connected to a computer, and then stored in a format corresponding to a necessary storage medium.
[0052]
(Second embodiment)
Subsequently, an encoding device according to the second embodiment of the present invention will be described. FIG. 8 is a functional block diagram of the encoding device according to the second embodiment. In FIG. 8, components having the same functions as those in the configuration shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted. The second embodiment differs from the first in that a signal flat part processing means 80 and a correlation frame detecting means 90 are added. In FIG. 8, the signal flat part processing means 80 has a function of detecting, in the sample sequence of each channel, flat parts where the signal value is constant, and coding them efficiently. The correlation frame detecting means 90 sets a predetermined section of each sample sequence as a frame, detects correlation frames, that is, frames whose corresponding sample values are all identical, and deletes the correlation frame located later in time (in the future). The apparatus shown in FIG. 8 is in practice realized by a computer and a dedicated software program installed on it.
[0053]
Subsequently, the processing operation of the encoding device shown in FIG. 8 will be described. First, a mixed sound signal as described above is input from the time-series signal input means 10. The inter-channel calculation means 50 then calculates the difference between the channels according to the procedure described above. Next, the sample sequence rearranging means 20 rearranges the sample sequence into the main sample sequence and the sub-sample sequence. After that, the lower fixed bit deleting means 30 deletes the lower bits of each sample.
[0054]
Next, the signal flat part processing means 80 processes the signal flat parts. A signal flat part is a portion in which the same signal level continues. It appears most often in silent parts, where the signal level is "0", and in saturated parts, where the absolute value of the signal level is at its maximum. A silent part occurs when there is actually no sound or when the sound is too quiet to be recorded, whereas a saturated part arises during signal recording and A/D conversion. Whether it is a silent part, a saturated part, or a run of some other constant level, a signal flat part records the same signal level continuously for a certain time (a certain number of samples), and is therefore easy to compress. Specifically, three values, namely the head time position of the signal flat part, the number of samples over which the same signal level continues, and the signal level (sample value), are recorded as signal flat part data separately from the sample sequence of each channel, and the signal flat part is deleted from the sample sequence of each channel. This is shown schematically in FIGS. 9A and 9B. FIG. 9A shows a sample sequence before the signal flat part processing, with the signal flat parts shaded. The processing of the signal flat part processing means 80 deletes the signal flat parts from the original sample sequence, as shown in FIG. 9B. So that the original state can be restored at decoding, however, the separated signal flat parts are recorded as signal flat part data in the format shown in FIG. 9C.
[0055]
As described above, the signal flat part data records three attributes for each signal flat part: the head time (sample number), the number of samples, and the sample value. The head time is the time measured from the start position of the signal; in the example of FIG. 9C it is recorded as the sample number counted from the head. Dividing this sample number by the sampling frequency converts it to a time. The number of samples indicates how long the same sample value continues. The end time of the signal flat part may be recorded instead of the number of samples. The sample value is the digitized signal level. When it is represented as a signed 16-bit value, the maximum is "32767" and the minimum is "−32768"; that is, "0" indicates a silent part, and "32767" and "−32768" indicate saturated parts. However, the signal flat part processing means 80 does not process signal flat parts unconditionally. The purpose of the present invention is to compress data, so the processing is pointless if the signal flat part data is larger than the amount by which the sample sequence is reduced. Signal flat part data is therefore created, and the samples separated from the sample sequence of each channel, only when a predetermined number or more of identical samples continue.
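The separation of signal flat parts and their restoration at decoding can be sketched in Python as follows. The function names are hypothetical, the threshold `min_run` is an illustrative value (the text only speaks of a predetermined number), and the head position is recorded as a 0-based sample index, which is also an assumption.

```python
def extract_flat_parts(samples, min_run=16):
    """Return (remaining, flat_data); flat_data holds (head index, count, value) triples."""
    remaining, flat_data = [], []
    i, n = 0, len(samples)
    while i < n:
        j = i
        while j < n and samples[j] == samples[i]:
            j += 1                       # extend the run of identical values
        if j - i >= min_run:             # long enough to be worth separating
            flat_data.append((i, j - i, samples[i]))
        else:
            remaining.extend(samples[i:j])
        i = j
    return remaining, flat_data

def restore_flat_parts(remaining, flat_data):
    """Re-insert every flat part at its recorded head position."""
    out, pos = [], 0
    for head, count, value in flat_data:
        gap = head - len(out)            # ordinary samples that precede this flat part
        out.extend(remaining[pos:pos + gap])
        pos += gap
        out.extend([value] * count)
    out.extend(remaining[pos:])
    return out
```

Each flat part costs three numbers regardless of its length, which is why short runs below the threshold are left in the sample sequence.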
[0056]
Subsequently, the correlation frame detection unit 90 sets a frame having a predetermined section length for the sample sequence of each channel, and performs comparison between the set frames. In the present embodiment, the frame length is fixed over the entire section from the start time to the end time of the sample sequence. Specifically, one frame has 512 samples. The correlation frame detection means 90 sets 512 samples from the head of the sample sequence of each channel as one frame, and obtains a correlation frame in which all samples match between frames. The specific procedure will be described with reference to the flowchart of FIG.
[0057]
First, the correlation frame detection unit 90 performs framing in units of a predetermined number of samples (step S31). In this embodiment, as described above, the frame length is set to 512 samples of the fixed length over the entire section from the start time to the end time of the sample sequence. As shown in FIG. 11A, the correlation frame detection means 90 sets 512 samples from the head of the sample sequence as one frame.
[0058]
Next, a search is made for a frame whose sample values all match those of the target frame. Specifically, as shown in FIG. 11B, the temporally last of the set frames is first taken as the target frame for which a correlation frame is sought. Then, within a predetermined search range, a sample having the same value as the first sample of the target frame is searched for while going back in time (step S32). For example, as shown in FIG. 12A, suppose that the target frame consists of the 512 samples mT to mT + 511. In this case, a sample with the same value as e(mT), the value of the first sample mT of the target frame, is searched for first, examining sample mT − 1, sample mT − 2, and so on in order. In FIG. 12, m denotes the m-th frame from the beginning, and T denotes the frame length (512 samples in this embodiment).
[0059]
When a matching sample t is found (step S33), the sample t + 1 that follows it is compared with the second sample mT + 1 of the target frame. Subsequent samples are compared in this way for as long as their values match (step S34); that is, step S34 is repeated as long as e(t + p) and e(mT + p) match. In the example shown in FIG. 12B, e(t) to e(t + 8) match e(mT) to e(mT + 8), so step S34 continues with p = 9. If e(t + p) and e(mT + p) match for all p from 0 to 511 (step S35), that sample sequence is taken as the correlation frame for the target frame, the first sample number of the correlation frame and the first sample number of the target frame are recorded in association with each other as frame correlation data, and the target frame is deleted from the original sample sequence (step S36). If not all samples of the target frame match, the search continues further back in time for another sample whose value matches the first sample of the target frame. If no matching correlation frame is found even after going back by the predetermined number of samples, the search for a correlation frame for that target frame is abandoned. When the processing for one target frame is finished, the process returns to step S32 and continues with the immediately preceding frame as the new target frame (step S37). In this way, correlation frame detection is performed with every frame as the target frame, except the frame located near the head of the time-series signal.
[0060]
Looking at the sample sequence as a whole, when a correlation frame corresponding to the target frame is detected as shown in FIG. 11C, the target frame is deleted as shown in FIG. 11D. At this time, frame correlation data as shown in FIG. 11E is recorded so that the sequence can be completely restored at decoding. As shown in FIG. 11E, the frame correlation data records the head sample number of the target frame and the head sample number of the correlation frame in association with each other.
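The search of steps S31 to S37 and the corresponding restoration can be sketched in Python as below. The function names and the `search_range` default are hypothetical, the sample-by-sample comparison of steps S33 to S35 is written as a slice comparison, and positions are recorded as 0-based indices into the original sequence; these are assumptions of the sketch.

```python
def detect_correlation_frames(samples, frame_len=512, search_range=4096):
    """Return (kept_samples, correlations); correlations holds (target head, match head)."""
    num_frames = len(samples) // frame_len         # step S31: fixed-length framing
    correlations, deleted = [], set()
    for m in range(num_frames - 1, 0, -1):         # last frame first, going backward
        target = m * frame_len
        frame = samples[target:target + frame_len]
        lowest = max(0, target - search_range)
        for t in range(target - 1, lowest - 1, -1):     # step S32: go back in time
            if samples[t:t + frame_len] == frame:       # steps S33 to S35: all samples match
                correlations.append((target, t))        # step S36: record and delete
                deleted.add(m)
                break
    kept = [s for i, s in enumerate(samples) if i // frame_len not in deleted]
    return kept, correlations

def restore_frames(kept, correlations, frame_len, total_len):
    """Rebuild the original sequence, copying each deleted frame from earlier output."""
    match_of = dict(correlations)
    out, pos = [], 0
    while len(out) < total_len:
        head = len(out)
        if head in match_of:
            src = match_of[head]
            for p in range(frame_len):
                out.append(out[src + p])   # sample-by-sample copy also handles overlap
        else:
            out.append(kept[pos])
            pos += 1
    return out
```

Restoring from the head of the signal toward the end means every frame referred to by the frame correlation data has already been rebuilt when it is needed.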
[0061]
The preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications are possible. For example, the above embodiments describe the case where the ratio of main samples to sub-samples is 1:1, but other ratios may be used. The above embodiments use a sound signal created by interpolating a signal originally sampled at 48 kHz up to 96 kHz, which is why the ratio of main samples to sub-samples is 1:1. If instead a signal sampled at a lower rate is interpolated to 96 kHz, the ratio of main samples to sub-samples can become 1:3. To handle such an acoustic signal, the sample sequence rearranging means 20 may distribute the samples in order into one main sample sequence and three sub-sample sequences.
[0062]
[Effects of the Invention]
As described above, according to the present invention, the sample sequence constituting a time-series signal is separated into a main sample sequence created by recording and a sub-sample sequence obtained by interpolating the main sample sequence; the samples are rearranged so that the value of each sub-sample in the sub-sample sequence is converted into the difference between the average value of the neighboring main samples and the value of that sub-sample; a linear prediction error is calculated for each of the main sample sequence and the converted sub-sample sequence, and their values are converted into prediction error values; and the main sample sequence and the sub-sample sequence converted into prediction error values are encoded with a variable length. Digitally edited high-definition audio work data can therefore be compressed reversibly and efficiently.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing an apparatus for encoding a time-series signal according to a first embodiment of the present invention.
FIG. 2 is a diagram showing a state of sample rearrangement by a sample sequence rearrangement unit 20;
FIG. 3 is a flowchart showing processing by a prediction error conversion unit 40;
FIG. 4 is a flowchart showing processing by an inter-channel calculation means 50;
FIG. 5 is a diagram showing a state of conversion of a bit configuration by a polarity processing unit 60;
FIG. 6 is a flowchart showing a process performed by a variable-length encoding unit 70;
FIG. 7 is a diagram showing a state of data separation of upper bits and lower bits.
FIG. 8 is a functional block diagram illustrating a time-series signal encoding device according to a second embodiment of the present invention.
FIG. 9 is a diagram showing a state of processing by a signal flat part processing means 80;
FIG. 10 is a flowchart illustrating a process performed by a correlation frame detection unit 90;
FIG. 11 is a diagram showing a state of an entire time-series signal by processing of a correlation frame detecting means 90.
FIG. 12 is a diagram showing a state of samples compared by the processing of the correlation frame detection means 90.
[Explanation of symbols]
10 ... time-series signal input means
20 ... sample sequence rearrangement means
30 ... lower fixed bit deleting means
40 ... prediction error conversion means
50 ... inter-channel calculation means
60 ... polarity processing means
70 ... variable length coding means
80 ... signal flat part processing means
90 ... correlation frame detecting means

Claims

A time-series signal encoding apparatus that compresses the amount of information of a time-series signal consisting of a time-series sample sequence in such a way that the entire sample sequence can be reproduced, comprising:
sample sequence rearrangement means for separating the sample sequence into a main sample sequence obtained by sampling a recorded signal and a sub-sample sequence obtained by interpolating the main sample sequence, and for converting the value of each sub-sample in the sub-sample sequence into the difference between the average value of the neighboring main samples and the value of that sub-sample;
prediction error conversion means for calculating a linear prediction error for each of the main sample sequence and the converted sub-sample sequence, and converting the values of the main sample sequence and the sub-sample sequence into prediction error values; and
variable length encoding means for encoding, with a variable length, the main sample sequence and the sub-sample sequence converted into the prediction error values.

In claim 1,
A time-series signal encoding apparatus, wherein the sample sequence rearrangement means, when the values of the even-numbered samples of the sample sequence are close to the average values of the preceding and following odd-numbered samples, separates the odd-numbered samples into the main sample sequence and the even-numbered samples into the sub-sample sequence, and
when the values of the odd-numbered samples of the sample sequence are close to the average values of the preceding and following even-numbered samples, separates the even-numbered samples into the main sample sequence and the odd-numbered samples into the sub-sample sequence.

In claim 1,
A time-series signal encoding apparatus, further comprising lower fixed bit deleting means for deleting a predetermined number of lower bit components from the main sample sequence and the sub-sample sequence processed by the sample sequence rearrangement means, wherein the prediction error conversion means performs its processing on the sample sequences from which the lower bit components have been deleted.

In claim 1,
A time-series signal encoding apparatus, further comprising inter-channel calculation means for, when the sample sequence consists of a plurality of channels having a plurality of values at the same time, calculating the difference between the samples of the channels at the same time and updating the sample sequence of one of the channels with the calculated difference values.

In claim 1,
A time-series signal encoding apparatus, further comprising, at a stage preceding the prediction error conversion means, signal flat part processing means for extracting from the sample sequence, and deleting from it, samples whose values are continuously the same, and for encoding three values: the start time position of the deleted samples, the number of samples, and the sample value.

In claim 1,
A time-series signal encoding apparatus, further comprising, at a stage preceding the prediction error conversion means, correlation frame detecting means for, when a frame with the same content as a frame consisting of a predetermined number of samples exists earlier in the sample sequence, deleting the later frame and encoding the start time positions of both frames and the number of samples constituting a frame.

In claim 1,
A time-series signal encoding apparatus, wherein the prediction error conversion means calculates, for the main sample sequence and the sub-sample sequence, a plurality of prediction error value candidates from temporally past samples on the basis of a plurality of prediction formulas, and selects from among these candidates the prediction error value to be encoded.

In claim 7,
An apparatus for encoding a time-series signal, wherein the linear coefficients of the plurality of prediction formulas are updated every predetermined number of samples.

In claim 1,
A time-series signal encoding apparatus, further comprising, at a stage preceding the variable length encoding means, polarity processing means for shifting the entire absolute value of each sample value converted into a prediction error one bit higher and inserting the sign into the least significant bit.

In claim 1,
A time-series signal encoding apparatus, wherein the variable length encoding means encodes the lower bit components of each sample converted into the prediction error value as they are, and encodes the remaining upper bit components with a changed bit configuration.

A recording medium on which code data output by the time-series signal encoding apparatus according to any one of claims 1 to 10 for a given time-series signal is recorded.