JP2005018606A

JP2005018606A - Linear prediction analysis method and device of floating-point type signal sequence, program, and its storage medium

Info

Publication number: JP2005018606A
Application number: JP2003185191A
Authority: JP
Inventors: Takehiro Moriya; 健弘守谷; Dai Yan; ダイヤン
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-06-27
Filing date: 2003-06-27
Publication date: 2005-01-20
Anticipated expiration: 2023-06-27
Also published as: JP4098679B2

Abstract

PROBLEM TO BE SOLVED: To increase the probability that the exponent part E of a floating-point input value $x_i$ matches that of its predicted value $\hat{x}_i$.

SOLUTION: Only the mantissa M of each input value $x_i$ in the sequence is replaced with the midpoint of the values the mantissa can take, e.g., "100 0000 0000 0000 0000 0000", and the prediction coefficients $\alpha_1, \ldots, \alpha_p$ are determined by linear prediction analysis so that the error $d = \sum_{i=p}^{N-1} (y_i - \hat{x}_i)^2$ with respect to the predicted value $\hat{x}_i$ is minimized, taking the transformed floating-point value $y_i$ as the target. Alternatively, the upper digits of the mantissa, e.g., the top 7 digits, may be kept equal to those of the input value $x_i$ and only the 8th and subsequent digits replaced, giving "xxx xxxx 1000 0000 0000 0000".

COPYRIGHT: (C)2005, JPO & NCIPI

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention can be used, for example, in an encoding method that converts a floating-point digital signal such as voice, music, or an image into a losslessly compressed code with a smaller amount of information. It relates to a method for linear predictive analysis of the sample values of a floating-point digital signal sample value sequence, and to an apparatus, a program, and a recording medium therefor.
[0002]
[Prior art]
Lossless (reversible) encoding methods, which permit no distortion, are one way to compress information such as audio and images. Lossless compression at a high compression ratio becomes possible by combining lossy encoding at a high compression ratio with lossless compression of the error between the reproduced signal and the original signal. This combined compression method is proposed in Patent Document 1. Various other lossless encoding methods for information such as audio and images are also known. For music, Non-Patent Document 1, for example, describes lossless compression encoding based on predictive coding. All of these conventional methods compress a waveform represented directly as a PCM signal.
[0003]
However, music recording studios sometimes record and store waveforms in floating-point format. A floating-point value is separated into a polarity (sign), an exponent part, and a mantissa part. For example, the floating-point format standardized as IEEE-754 and shown in FIG. 3 has 32 bits, consisting, from the most significant bit, of 1 polarity bit, 8 exponent bits, and 23 mantissa bits. Letting S be the polarity, E the decimal value represented by the 8 exponent bits, and M the binary digits of the mantissa part, the value in this floating-point format is, in sign-magnitude binary notation,
$$(-1)^S \times 2^{E - E_0} \times 1.M, \qquad E_0 = 2^7 - 1 = 127.$$
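The field layout described above can be checked with a short sketch. It assumes normalized values only (E between 1 and 254); the helper names `split_float32` and `join_float32` are illustrative and do not come from the patent.

```python
import struct

E0 = 127  # exponent bias, 2**7 - 1


def split_float32(x):
    """Return (S, E, M): polarity bit, 8-bit biased exponent, 23-bit mantissa field."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def join_float32(s, e, m):
    """Rebuild the float32 value from its three bit fields."""
    bits = (s << 31) | (e << 23) | m
    return struct.unpack(">f", struct.pack(">I", bits))[0]


s, e, m = split_float32(-6.5)
# -6.5 = (-1)^1 * 2^(129 - 127) * 1.625, and 0.625 * 2**23 = 0x500000
value = (-1) ** s * 2.0 ** (e - E0) * (1 + m / 2 ** 23)
```

Here `value` evaluates the formula $(-1)^S \times 2^{E-E_0} \times 1.M$ directly from the extracted fields.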
[0004]
[Patent Document 1]
JP 2001-44847 A
[Non-Patent Document 1]
Mat Hans and Ronald W. Schafer, "Lossless Compression of Digital Audio," IEEE Signal Processing Magazine, July 2001, pp. 21-32.
[0005]
[Problems to be solved by the invention]
Voice, music, and image information is sometimes recorded and stored as a digital signal sequence in floating-point format. Because each value is then separated into a polarity, an exponent part, and a mantissa part, the sequence can hardly be compressed as it is. In the lossless predictive coding method of Non-Patent Document 1, recursive linear prediction is performed, the prediction coefficients are quantized as auxiliary information, and the prediction error is compressed. When an integer-format digital signal sample sequence is compressed, a prediction that reduces the error energy under the minimum squared error criterion approximately optimizes the compression ratio of the error sequence.
[0006]
For a floating-point digital signal sample sequence, if the error is generated, or the original signal is regenerated from the error, by subtracting or adding the values as ordinary numbers, the exponent parts are aligned and the precision of the mantissa part is no longer guaranteed. The exponent part, the mantissa part, and the polarity are therefore subtracted and added separately. This preserves reversibility, but when the exponent parts differ, the mantissa part loses its meaning as a magnitude. For this reason there is no guarantee that the resulting error values can be compressed even when prediction is applied.
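The loss of reversibility can be illustrated with a small sketch that compares ordinary float32 subtraction against a field-wise difference. The helper names are hypothetical, and representing the polarity difference as an XOR is an assumption made here; the text only states that the three fields are handled separately.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def fields(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def from_fields(s, e, m):
    return struct.unpack(">f", struct.pack(">I", (s << 31) | (e << 23) | m))[0]


def field_diff(x, x_hat):
    """Difference taken separately on polarity, exponent and mantissa."""
    (sx, ex, mx), (sp, ep, mp) = fields(x), fields(x_hat)
    return sx ^ sp, ex - ep, mx - mp


def field_add(x_hat, diff):
    """Invert field_diff: recover x from the prediction and the difference."""
    sp, ep, mp = fields(x_hat)
    ds, de, dm = diff
    return from_fields(sp ^ ds, ep + de, mp + dm)


x, x_hat = 1.0, float(2 ** 25)
e = to_f32(x - x_hat)        # ordinary float32 subtraction rounds the result
lossy = to_f32(x_hat + e)    # so adding the error back does not return x
exact = field_add(x_hat, field_diff(x, x_hat))
```

In this example the exponents differ by 25, so the field-wise difference is exact but carries a large exponent difference, which is the situation the invention tries to make rare.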
[0007]
In view of the above, an object of the present invention is to provide a linear predictive analysis method, an apparatus, a program, and a recording medium with which the exponent part of the predicted floating-point value $\hat{x}_i$ is highly likely to match that of the input floating-point value $x_i$.
[0008]
[Means for Solving the Problems]
According to one aspect of the present invention, for each sample value of a digital signal sample value sequence in floating-point format, a floating-point target value is generated by replacing the mantissa part with the median of the range that the mantissa part can take, or a value close to it, and prediction coefficients are obtained by linear prediction analysis so as to minimize the prediction error, or a weighted prediction error, between the floating-point target value and the floating-point predicted value.
[0009]
According to another aspect of the present invention, for each sample value of a digital signal sample value sequence in floating-point format, a floating-point target value is generated by replacing the lower one or more digits of the mantissa part with the median of the range of values that the mantissa part can take given its remaining upper digits, or a value close to it, and prediction coefficients are obtained by linear prediction analysis so as to minimize the prediction error, or a weighted prediction error, between the floating-point target value and the floating-point predicted value.
[0010]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment will be described in which the present invention is applied to predictive compression encoding of information such as speech, music, or images given as a digital signal sample value sequence in the 32-bit IEEE-754 floating-point format.
FIG. 1 shows the functional configuration of an embodiment of the present invention. A floating-point digital signal sample value sequence X is input from the signal source 11 to the numerical transformation unit 12. In the numerical transformation unit 12, the mantissa part M of each sample value $x_i$ of the sequence X is transformed into the median of the range that the mantissa part can take, or a value close to it, to generate a floating-point target value $y_i$. For example, the floating-point target value $y_i$ is obtained by changing only the 23 bits of the mantissa part of the floating-point sample value $x_i$ to the median of the values that the mantissa part can take,
"100 0000 0000 0000 0000 0000" (1)
that is, to 0.5, while the exponent part E is kept equal to that of $x_i$.
[0011]
Alternatively, the upper τ digits of the 23-bit mantissa part of the floating-point sample value $x_i$ (τ is a predetermined integer from 1 to 22) are used unchanged as the upper digits of the mantissa part of $y_i$, and the remaining digits are replaced with the median of the range that the mantissa part can take given those upper digits, or a value close to it. For example, when τ = 7, the upper 7 digits of the mantissa part of $x_i$ are used as they are, and the 8th and subsequent digits are replaced with, for example, the median of the range of values determined by those 7 digits, as shown below, to give the mantissa part of $y_i$.
[0012]
"xxx xxxx 1000 0000 0000 0000" (2)
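Both ways of forming the target value can be sketched with one bit-masking routine. The function name `make_target` is hypothetical, and using `tau=0` to select the full replacement of Expression (1) is a convention adopted here for illustration (the text itself restricts τ to 1 through 22 for the partial variant).

```python
import struct


def make_target(x, tau=0):
    """Keep polarity, exponent and the top `tau` mantissa bits of float32 x;
    replace the remaining 23 - tau bits by 100...0 (midpoint of their range)."""
    if not 0 <= tau <= 22:
        raise ValueError("tau must be between 0 and 22")
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    low = 23 - tau                     # number of mantissa bits to replace
    kept = bits & ~((1 << low) - 1)    # polarity, exponent, top tau mantissa bits
    y_bits = kept | (1 << (low - 1))   # replaced bits become 100...0
    return struct.unpack(">f", struct.pack(">I", y_bits))[0]


y_full = make_target(3.999)            # mantissa fraction forced to 0.5, exponent kept
y_partial = make_target(1.0, tau=7)    # top 7 mantissa bits kept, 8th bit set
```

With the whole mantissa replaced, every target has the form ±1.5 × 2^(E−127), so it sits in the middle of the interval of values sharing the input's exponent.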
The floating-point target values $y_i$ generated in this way and the input floating-point digital signal sample value sequence X are input to the prediction coefficient calculation unit 13, which obtains prediction coefficients $\alpha_j$ ($j = 1, \ldots, p$) by linear prediction analysis so as to minimize the prediction error between the target value $y_i$ and the floating-point predicted value $\hat{x}_i$ of the input sample value $x_i$.
Ordinary $p$-th order linear prediction minimizes the following prediction error $e$.
[0013]
$$\hat{x}_i = \sum_{j=1}^{p} \alpha_j x_{i-j}$$
$$e = \sum_{i=p}^{N-1} (x_i - \hat{x}_i)^2$$
Here $N$ is the number of samples in each frame into which the input sequence X is divided, for example 1024.
In the present invention, however, the prediction error $d$
$$d = \sum_{i=p}^{N-1} (y_i - \hat{x}_i)^2 \tag{3}$$
is minimized. The normal equation for the solution (the prediction coefficients) that minimizes this prediction error $d$ is as follows. Let $D$ be the sum of the errors $d_j$ from $j = p$ to $j = N-1$.
[0014]
$$D = (UA - Y)^T (UA - Y) \tag{4}$$
where
$$D = (d_p, d_{p+1}, \ldots, d_{N-1})^T$$
$$A = (\alpha_1, \ldots, \alpha_p)^T \tag{5}$$
$$Y = (y_p, y_{p+1}, \ldots, y_{N-1})^T$$
$(\cdot)^T$ denotes the transpose of $(\cdot)$, and $U$ is the following matrix with $(N-p)$ rows and $p$ columns.
[0015]
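[Expression 1]
$$U = \begin{pmatrix} x_{p-1} & x_{p-2} & \cdots & x_0 \\ x_p & x_{p-1} & \cdots & x_1 \\ \vdots & \vdots & & \vdots \\ x_{N-2} & x_{N-3} & \cdots & x_{N-1-p} \end{pmatrix} \tag{6}$$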
[0016]
The prediction coefficient vector $A$ that minimizes the error $D$ is the following least-squares solution.
$$(U^T U) A = U^T Y \tag{7}$$
$$A = (U^T U)^{-1} U^T Y \tag{8}$$
The simultaneous equations (7) can be solved directly, but an approximate fast method is preferable for reducing the amount of computation. In Equation (8), $U^T U$ can be approximated by a Toeplitz matrix, that is, a symmetric matrix whose elements along each line parallel to the diagonal are equal, but the elements of the right-hand-side vector $U^T Y$ differ from those of $U^T U$. The fastest method, the Durbin method, therefore cannot be used, and the solution is obtained with the next fastest, the Levinson method (see, for example, Takehiro Moriya, "Speech Coding," Institute of Electronics, Information and Communication Engineers, October 1998, pp. 15-16; hereinafter Reference 1).
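As a minimal sketch of the direct route, the normal equations $(U^T U) A = U^T Y$ can be formed and solved as they stand. Plain Gaussian elimination is used here in place of the Levinson method that the text recommends for speed; all function names and the test signal are illustrative assumptions.

```python
import math
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def make_target(x):
    """Float32 value with x's polarity and exponent and mantissa field 100...0."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFF800000) | 0x400000))[0]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def lpc_to_target(x, y, p):
    """Coefficients minimising sum_{i>=p} (y_i - sum_j alpha_j x_{i-j})^2."""
    n = len(x)
    u = [[x[i - j] for j in range(1, p + 1)] for i in range(p, n)]   # matrix U
    utu = [[sum(r[a] * r[b] for r in u) for b in range(p)] for a in range(p)]
    uty = [sum(r[a] * y[i] for r, i in zip(u, range(p, n))) for a in range(p)]
    return solve(utu, uty)


# Illustrative input: a two-component signal rounded to float32.
x = [to_f32(math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i)) for i in range(64)]
y = [make_target(v) for v in x]       # targets with mantissa at the midpoint
alpha = lpc_to_target(x, y, p=2)
```

The only difference from conventional least-squares linear prediction is the right-hand side, which is built from the targets $y_i$ rather than from the inputs $x_i$.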
[0017]
In the above, the correlation coefficients used to obtain the prediction coefficients are computed from the matrix $U$ of input sample values $x$. The following matrix $V$ of target values $y$ can be used instead.
[0018]
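[Expression 2]
$$V = \begin{pmatrix} y_{p-1} & y_{p-2} & \cdots & y_0 \\ y_p & y_{p-1} & \cdots & y_1 \\ \vdots & \vdots & & \vdots \\ y_{N-2} & y_{N-3} & \cdots & y_{N-1-p} \end{pmatrix} \tag{9}$$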
[0019]
In this case, $V^T V$ on the left side of the normal equation (10) below is a Toeplitz matrix, and the right-hand-side vector $V^T Y$ can be approximated by the same correlation coefficients; that is, the elements of $V^T Y$ can be approximated by the corresponding elements of the first column of $V^T V$. The solution (11) can therefore be obtained quickly by the Durbin method (see, for example, pages 16 to 17 of Reference 1), which is widely used in speech coding.
$$(V^T V) A = V^T Y \tag{10}$$
$$A = (V^T V)^{-1} V^T Y \tag{11}$$
When the matrix $V$ is used, the input digital signal sample value sequence X need not be input to the prediction coefficient calculation unit 13.
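A sketch of the fast route follows, assuming the Toeplitz approximation is made by replacing $V^T V$ and $V^T Y$ with the autocorrelation values $r_k = \sum_i y_i y_{i-k}$ of the target sequence. The recursion is the standard Durbin recursion; the function names and the sample target values are illustrative.

```python
def autocorr(y, p):
    """r[k] = sum_i y[i] * y[i - k] for k = 0..p."""
    n = len(y)
    return [sum(y[i] * y[i - k] for i in range(k, n)) for k in range(p + 1)]


def durbin(r):
    """Solve sum_j alpha_j r[|k - j|] = r[k], k = 1..p, by the Durbin recursion."""
    p = len(r) - 1
    alpha = []          # alpha[j] holds alpha_{j+1} of the current order
    err = r[0]          # prediction error energy of the current order
    for m in range(1, p + 1):
        acc = r[m] - sum(alpha[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err   # reflection coefficient of order m
        alpha = [alpha[j] - k * alpha[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return alpha


# Illustrative targets: every value is +/-1.5 * 2^n, i.e. mantissa at the midpoint.
y = [0.75, 1.5, 1.5, 0.75, -0.75, -1.5, -1.5, -0.75] * 4
r = autocorr(y, 3)
alpha = durbin(r)
```

The recursion needs on the order of $p^2$ operations instead of the $p^3$ of a general solver, which is why the text prefers it when the Toeplitz approximation is acceptable.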
[0020]
In the above, the prediction coefficients are obtained by linear prediction analysis so as to minimize the prediction error $d$, but they may instead be obtained so as to minimize a weighted prediction error. For this, the technique described under "Robust analysis" on pages 12 to 13 of Reference 1, for example, can be used, although the weighting differs.
To suit linear prediction analysis in floating-point format, the prediction error between the target value $y_i$ and the predicted value $\hat{x}_i$ may be weighted inversely to the magnitude of the input value $x_i$, as in Equation (12) or (13) below, which amounts to applying a diagonal weight $W$ to the distance measure. This also increases the likelihood that the exponent parts of the predicted value $\hat{x}_i$ and the input value $x_i$ match.
[0021]
$$d_W = \sum_{i=p}^{N-1} w_i (y_i - \hat{x}_i)^2 \tag{12}$$
$$e_W = \sum_{i=p}^{N-1} w_i (x_i - \hat{x}_i)^2 \tag{13}$$
For Equation (12), the normal equation is given by Equation (15), using the $(N-p) \times (N-p)$ diagonal matrix $W$.
[0022]
[Equation 3]
$$W = \mathrm{diag}(w_p, w_{p+1}, \ldots, w_{N-1}) \tag{14}$$
[0023]
$$D_W = (VA - Y)^T W (VA - Y) \tag{15}$$
The least-squares solution is
$$(V^T W V) A = V^T W Y \tag{16}$$
$$A = (V^T W V)^{-1} V^T W Y \tag{17}$$
The weight is a function $f(x_i)$ that decreases as the input value $x_i$ increases, for example
$$f(x_i) = x_i^{-2} \tag{18}$$
This increases the likelihood that the exponent part of the error between the predicted value and the target value is 0, which improves the information compression efficiency.
[0024]
It is also possible to increase the weight of the samples for which the exponent part of the prediction error is not 0 and obtain the prediction coefficients again, repeating this one or more times, until the exponent part of the prediction error becomes 0, or to select the coefficients that give the highest compression efficiency for the input signal sequence.
$V^T W V$ is not a Toeplitz matrix, but it can be approximated by a symmetric matrix, so the Cholesky method (see, for example, pages 13 to 14 of Reference 1) can be used. When the diagonal matrix $W$ is used, the matrix $U$ may be used instead of the matrix $V$.
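The weighted solution of Equations (16) and (17) can be sketched with a Cholesky factorization, using the weight $w_i = x_i^{-2}$ of Equation (18). The function names are hypothetical, and the small floor `eps` on $x_i^2$ is an assumption added here so that a zero-valued sample does not produce an infinite weight.

```python
import math


def cholesky_solve(a, b):
    """Solve a z = b for symmetric positive-definite a via a = L L^T."""
    n = len(b)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    t = [0.0] * n                     # forward substitution: L t = b
    for i in range(n):
        t[i] = (b[i] - sum(l[i][k] * t[k] for k in range(i))) / l[i][i]
    z = [0.0] * n                     # back substitution: L^T z = t
    for i in reversed(range(n)):
        z[i] = (t[i] - sum(l[k][i] * z[k] for k in range(i + 1, n))) / l[i][i]
    return z


def weighted_lpc(x, y, p, eps=1e-12):
    """Minimise sum_i w_i (y_i - sum_j alpha_j y_{i-j})^2 with w_i = 1 / x_i^2."""
    n = len(y)
    rows = [[y[i - j] for j in range(1, p + 1)] for i in range(p, n)]   # matrix V
    w = [1.0 / max(x[i] * x[i], eps) for i in range(p, n)]             # diagonal of W
    vwv = [[sum(wi * r[a] * r[b] for wi, r in zip(w, rows)) for b in range(p)]
           for a in range(p)]
    vwy = [sum(wi * r[a] * y[i] for wi, r, i in zip(w, rows, range(p, n)))
           for a in range(p)]
    return cholesky_solve(vwv, vwy)
```

The iterative refinement described above could be layered on top by raising individual entries of `w` and calling the solver again; that loop is not shown.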
[0025]
As described above, the present invention obtains the prediction coefficients $A = (\alpha_1, \ldots, \alpha_p)^T$ by taking as the target value not the input value $x_i$ itself but a value $y_i$ whose mantissa part is an intermediate value of the range the mantissa part can take, for example as in Expression (1) or (2), and performing linear prediction analysis so that the error between this target value $y_i$ and the predicted value $\hat{x}_i$ is minimized. When, as in the conventional approach, the distance (error) $e$ between the predicted value $\hat{x}_i$ and the input value $x_i$ is minimized, the exponent parts E of the two are likely to differ whenever the mantissa part of $x_i$ is small or close to 1, even if the floating-point predicted value $\hat{x}_i$ is numerically close to the floating-point input value $x_i$. In the present invention, the error $d$ between the target value $y_i$ and the predicted value $\hat{x}_i$ is minimized, so the exponent parts of $\hat{x}_i$ and $x_i$ are more likely to match. When the exponent parts are the same, the mantissa parts retain their meaning as magnitudes, the exponent part and the mantissa part can be subtracted and added separately, and the resulting prediction error values can be compressed. The exponent parts of the input value $x_i$ and the predicted value $\hat{x}_i$ are compared, and if they do not match, the input value $x_i$ is handled separately, for example by outputting it as it is.
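A small numerical illustration of this argument, with values chosen here purely as an example: an input whose mantissa is near its minimum loses the exponent match to a prediction that is off by only 0.01, while a prediction aimed at the mid-mantissa target can be off by 0.2 and still share the exponent.

```python
import struct


def exponent(x):
    """8-bit biased exponent field of x as a float32."""
    return (struct.unpack(">I", struct.pack(">f", x))[0] >> 23) & 0xFF


x = 1.001              # mantissa close to its smallest value, exponent field 127
near_x = x - 0.01      # prediction only 0.01 away from x, but below 1.0
near_y = 1.5 - 0.2     # prediction 0.2 away from the target y = 1.5

match_when_aiming_at_x = exponent(near_x) == exponent(x)
match_when_aiming_at_y = exponent(near_y) == exponent(x)
```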
[0026]
Next, predictive coding of the input floating-point digital signal sample value sequence X using the prediction coefficients $A = (\alpha_1, \ldots, \alpha_p)^T$ obtained as above is described. The prediction coefficients $A$ obtained by the prediction coefficient calculation unit 13 are quantized, for example vector-quantized, by the quantization unit 14 and output as an auxiliary code $b$. The auxiliary code $b$ is dequantized by the inverse quantization unit 15, and the dequantized floating-point prediction coefficients (denoted, for convenience, by the same symbol $A = (\alpha_1, \ldots, \alpha_p)^T$ as before quantization) are input to the prediction unit 16. The input sample value sequence X is also input to the prediction unit 16, which calculates the floating-point predicted value $\hat{x}_i = \sum_{j=1}^{p} \alpha_j x_{i-j}$ from the $p$ floating-point input digital signal sample values (input values) $x_{i-p}, \ldots, x_{i-1}$ and the $p$ floating-point prediction coefficients $\alpha_1, \ldots, \alpha_p$.
[0027]
The subtraction unit 17 obtains the floating-point prediction error $e_i$ between the floating-point predicted value $\hat{x}_i$ and the floating-point input value $x_i$. The resulting floating-point prediction error sequence $e_i$ is separated into an exponent part E and a mantissa part M by the exponent-mantissa decomposition unit 18; these are losslessly compression-encoded by the compression units $19_E$ and $19_M$, respectively, and output as code sequences $a_E$ and $a_M$. The exponent part E may be losslessly encoded by, for example, predictive coding, and the mantissa part M by, for example, entropy coding. These encodings may be performed for each frame of a predetermined number of samples. The polarity S may be losslessly compression-encoded on its own or, for example, after being attached as the most significant bit of the mantissa part M.
[0028]
In the above, the mantissa part of the target value $y_i$ is set to the median of the range that the mantissa part can take, but the median is not necessarily optimal. It is also possible to calculate the prediction error using the median and several modified values close to it, and to select the prediction coefficients that compress best. For example, as shown by the broken line in FIG. 1, a selection unit 10 is provided. Under its control, the numerical transformation unit 12 generates a plurality of target values $y_i$ whose mantissa parts are the median or values close to it; prediction coefficients are generated and predictive coding is performed for each, the amount of encoded data is obtained for each target value $y_i$, and the target value giving the smallest amount of encoded data is adopted as the final target value $y_i$. The selection unit 10 can also be used to change the weights of the weighted prediction error in the prediction coefficient calculation unit 13.
[0029]
As shown in FIG. 2, the decoding apparatus corresponding to the encoding apparatus of FIG. 1 has the same functional configuration as a conventional decoder for predictive codes. The code sequences $a_E$ and $a_M$ are subjected, in the decompression units $21_E$ and $21_M$ respectively, to lossless decompression decoding corresponding to the compression encoding of the compression units $19_E$ and $19_M$ in FIG. 1, yielding the exponent part E, the mantissa part M, and the polarity S, which the exponent-mantissa integration unit 22 combines into the floating-point error sequence $e_i$. The code sequence $b$ is dequantized by the inverse quantization unit 23 to obtain the floating-point prediction coefficients $A$. These coefficients and the already reproduced floating-point digital signal sample value sequence X are input to the prediction unit 24, which calculates the floating-point predicted value $\hat{x}_i$ by the same processing as the prediction unit 16 in FIG. 1. The adder 25 adds the floating-point predicted value $\hat{x}_i$ and the floating-point error $e_i$ from the exponent-mantissa integration unit 22 to reproduce the floating-point digital signal sample value $x_i$.
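The encoder and decoder loops can be sketched end to end as follows. Several simplifications are assumptions made here and are not stated in the text: the first $p$ samples are passed through verbatim, the residual is kept as uncompressed (polarity, exponent, mantissa) differences instead of being fed to the compression units, coefficient quantization is omitted, and the coefficients in `alpha` are arbitrary illustrative values.

```python
import math
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def fields(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def from_fields(s, e, m):
    return struct.unpack(">f", struct.pack(">I", (s << 31) | (e << 23) | m))[0]


def predict(history, alpha):
    """x_hat_i = sum_j alpha_j x_{i-j}, rounded to float32; history[-1] is x_{i-1}."""
    return to_f32(sum(a * history[-j] for j, a in enumerate(alpha, start=1)))


def encode(x, alpha):
    """Return the warm-up samples and the field-wise prediction residual."""
    p = len(alpha)
    residual = []
    for i in range(p, len(x)):
        (s, e, m), (sp, ep, mp) = fields(x[i]), fields(predict(x[:i], alpha))
        residual.append((s ^ sp, e - ep, m - mp))
    return x[:p], residual


def decode(head, residual, alpha):
    """Rebuild x from the warm-up samples, the residual and the coefficients."""
    x = list(head)
    for ds, de, dm in residual:
        sp, ep, mp = fields(predict(x, alpha))   # same prediction as the encoder
        x.append(from_fields(sp ^ ds, ep + de, mp + dm))
    return x


alpha = [1.9, -0.95]      # hypothetical coefficients, for illustration only
x = [to_f32(math.sin(0.05 * i)) for i in range(200)]
head, residual = encode(x, alpha)
restored = decode(head, residual, alpha)
```

Because the decoder forms its prediction from already reconstructed samples with the same arithmetic as the encoder, the two predictions are bit-identical and the field-wise addition restores every sample exactly.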
[0030]
In FIG. 1, the floating-point digital signal sample value sequence X does not necessarily have to be losslessly compression-encoded. In that case, the inverse quantization unit 15 can be omitted. It is also possible to ignore the mantissa part and apply predictive coding to the exponent part only, or to apply predictive coding to the exponent values treated directly as integers.
In determining the prediction coefficients as above, the predicted values $\hat{x}_i$ and the errors $e_i$ may first be obtained, samples unsuitable for compression may then be searched for in the error sequence $e_i$, and the prediction coefficients may be readjusted to improve the compression of those samples. Alternatively, a plurality of candidate sets of prediction coefficients may be generated, and the desirable one, that is, the one giving the highest compression efficiency, may be selected.
[0031]
As described above, the present invention is not limited to predictive coding and can be applied to linear predictive analysis of floating-point digital signal sample value sequences in general. A computer may be made to function as the linear prediction analysis portion shown in FIG. 1 (or as the entire configuration of FIG. 1). In that case, a program for causing the computer to function as that portion is installed from a recording medium such as a CD-ROM or a magnetic disk, or downloaded via a communication line, and executed by the computer.
[0032]
[Effects of the invention]
As described above, according to the present invention, using target values obtained by transforming the floating-point digital signal sample value sequence, and evaluating the error with weights where necessary, makes the exponent parts of the predicted value and the input value less likely to differ. The exponent part and the mantissa part can accordingly be subtracted and added separately in more cases, which is highly efficient.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a functional configuration example of an encoding apparatus to which the present invention is applied.
FIG. 2 is a diagram showing the functional configuration of a decoding apparatus corresponding to the encoding apparatus in FIG. 1.
FIG. 3 is a diagram showing a 32-bit floating point format of IEEE-754.

Claims

1. A linear prediction analysis method for a floating-point format signal sequence, being a method for linear predictive analysis of sample values from a digital signal sample value sequence in floating-point format,
wherein a floating-point target value is generated by replacing the mantissa part of each floating-point input value with the median of the range that the mantissa part can take or a value close to it, and prediction coefficients are obtained by linear prediction analysis so as to minimize the prediction error or weighted prediction error between the target value and the predicted value.

2. A linear prediction analysis method for a floating-point format signal sequence, being a method for linear predictive analysis of sample values from a digital signal sample value sequence in floating-point format,
wherein a floating-point target value is generated by replacing the lower one or more digits of the mantissa part of each floating-point input value with the median of the range of values that the mantissa part can take given the remaining upper digits of the mantissa part, or a value close to it, and prediction coefficients are obtained by linear prediction analysis so as to minimize the prediction error or weighted prediction error between the target value and the predicted value.

3. The linear prediction analysis method for a floating-point format signal sequence according to claim 1, wherein a weight inversely related to the magnitude of the input value is used as the weight of the weighted prediction error between the predicted value and the target value.

4. A linear prediction analysis apparatus for a floating-point format signal sequence, being an apparatus for linear predictive analysis of sample values from a digital signal sample value sequence in floating-point format, comprising:
a numerical transformation unit that receives the digital signal sample value sequence in floating-point format and generates a floating-point target value by replacing the mantissa part of each sample value with the median of the range that the mantissa part can take or a value close to it; and
a prediction coefficient calculation unit that receives the floating-point target value, or the floating-point target value and the digital signal sample value sequence in floating-point format, and obtains prediction coefficients by linear prediction analysis so as to minimize the prediction error or weighted prediction error between the predicted value of the input sample value and the floating-point target value.

5. A linear prediction analysis apparatus for a floating-point format signal sequence, being an apparatus for linear predictive analysis of sample values from a digital signal sample value sequence in floating-point format, comprising:
a numerical transformation unit that receives the digital signal sample value sequence in floating-point format and generates, for each sample value, a floating-point target value in which the lower one or more digits of the mantissa part are replaced with the median of the range that the mantissa part can take given the remaining upper digits of the mantissa part, or a value close to it; and
a prediction coefficient calculation unit that receives the floating-point target value, or the floating-point target value and the digital signal sample value sequence in floating-point format, and obtains prediction coefficients by linear prediction analysis so as to minimize the prediction error or weighted prediction error between the predicted value of the input sample value and the floating-point target value.

6. A program for causing a computer to execute each step of the linear prediction analysis method for a floating-point format signal sequence according to any one of claims 1 to 3.

7. A computer-readable recording medium on which the program according to claim 6 is recorded.