JP2004194224A - Device and method for transforming wavelet

Info

Publication number: JP2004194224A
Application number: JP2002362726A
Authority: JP
Inventors: Yusuke Mizuno; 雄介水野
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2002-12-13
Filing date: 2002-12-13
Publication date: 2004-07-08
Anticipated expiration: 2022-12-13
Also published as: JP4223795B2

Abstract

PROBLEM TO BE SOLVED: To perform a wavelet transform based on a lifting scheme efficiently and in a short time.
SOLUTION: Two intermediate data points D¹_{n+1} and S¹_{n+2} within a target region A1 are used to calculate the second-stage temporary data (S²_{n+2}) on the series that starts from the even-numbered input data Y(2n+4). One clock cycle earlier, the input data Y(2n+4) within a target region N1 is normalized to calculate the intermediate data S¹_{n+2}. When the operation of target region A1 is applied repeatedly to a plurality of data series, the operation of target region N1 is applied to the next data series in parallel with the operation of target region A1 on the current series.
COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【Technical Field of the Invention】
The present invention relates to compression and decompression techniques that use the wavelet transform.
【０００２】
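【Prior Art】
As a high-efficiency coding scheme for image data, image compression and decompression based on the discrete wavelet transform (hereinafter "DWT") is known, and it is adopted in the JPEG2000 (Joint Photographic Experts Group 2000) scheme specified by ISO (the International Organization for Standardization). Two ways of computing the DWT are known: the convolution method and the method based on the lifting scheme. Both produce the same result, but the lifting-based method has advantages over the convolution method, such as allowing fast computation with less memory and being well suited to lossless (reversible) compression.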
【０００３】
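In general, the DWT can be built from a filter bank that splits the original signal into a high-band (high-frequency) component and a low-band (low-frequency) component. Its inverse (the inverse DWT) can be built from a filter bank that synthesizes the band-split high-band and low-band components.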
【０００４】
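Fig. 35 schematically shows the filter banks 200S and 200A used for the DWT and its inverse (inverse DWT). The analysis-side filter bank 200S, which decomposes an input signal x(n) into two bands, a low-band component and a high-band component, consists of a low-pass filter 201L that passes the low-band component, a high-pass filter 201H that passes the high-band component, and first and second downsamplers 202 and 203. The low-pass filter 201L and high-pass filter 201H are FIR filters that perform convolution. The first and second downsamplers 202 and 203 each discard every other sample of the signal received from filters 201L and 201H, respectively, and output a signal of half the length. In the JPEG2000 standard, the first downsampler 202 discards the odd-numbered samples and outputs the even-numbered samples (low-band component), and the second downsampler 203 discards the even-numbered samples and outputs the odd-numbered samples (high-band component).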
【０００５】
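The synthesis-side filter bank 200A, which synthesizes the input signals (low-band and high-band components), consists of first and second upsamplers 204 and 205, a low-pass filter 206L, a high-pass filter 206H, and an adder 207. The low-pass filter 206L and high-pass filter 206H are FIR filters that perform convolution, and in general the synthesis filters 206L and 206H and the analysis filters 201L and 201H are designed to satisfy the perfect reconstruction condition. The first and second upsamplers 204 and 205 insert a zero between adjacent samples and output a signal of double the length. The adder 207 adds the signals output from the synthesis filters 206L and 206H and outputs the synthesized signal x'(n). When the perfect reconstruction condition is satisfied, x(n) = x'(n) holds.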
【０００６】
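A two-dimensional DWT can be performed by repeatedly applying the analysis-side filter bank 200S to two-dimensional image data, first in the vertical direction and then in the horizontal direction. Fig. 36 is a band-division diagram that schematically shows two-dimensional image data 210 after a DWT at decomposition level 3. Each block in the image data 210 represents a subband (band component). For example, subband HH1 consists of the vertical high-band component (H) and the horizontal high-band component (H) at decomposition level 1, and subband LH2 consists of the vertical high-band component (H) and the horizontal low-band component (L) at decomposition level 2. In general, subband XYn (X and Y are each "H" or "L", and n is the decomposition level) consists of the vertical component Y and the horizontal component X at decomposition level n.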
【０００７】
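The DWT at decomposition level 3 proceeds as follows. First, applying the analysis-side filter bank 200S twice to the whole two-dimensional image generates the level-1 subbands HH1, HL1, LH1, and LL1 (LL1 not shown). Next, applying the filter bank 200S twice to the lowest-band level-1 subband LL1 generates the level-2 subbands HH2, HL2, LH2, and LL2 (LL2 not shown). Finally, applying the filter bank 200S twice to the lowest-band level-2 subband LL2 generates the level-3 subbands HH3, HL3, LH3, and LL3.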
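The recursive structure of this decomposition can be illustrated with a small sketch that lists the subband sizes left after each level. The function name `subband_shapes` and the rounding rule for odd sizes (low band rounds up, high band rounds down) are assumptions made for illustration; the text itself only describes the order in which the filter bank is applied. The naming follows the convention stated earlier: in XYn, X is the horizontal component and Y the vertical one.

```python
def subband_shapes(width, height, levels):
    """Return {subband name: (width, height)} after `levels` dyadic splits.

    Assumption: on an odd length the low band gets the extra sample.
    """
    shapes = {}
    w, h = width, height
    for level in range(1, levels + 1):
        low_w, high_w = (w + 1) // 2, w // 2
        low_h, high_h = (h + 1) // 2, h // 2
        # X = horizontal component, Y = vertical component (XYn convention).
        shapes[f"HL{level}"] = (high_w, low_h)
        shapes[f"LH{level}"] = (low_w, high_h)
        shapes[f"HH{level}"] = (high_w, high_h)
        # Only the lowest band is decomposed again at the next level.
        w, h = low_w, low_h
    shapes[f"LL{levels}"] = (w, h)
    return shapes


if __name__ == "__main__":
    for name, shape in subband_shapes(64, 48, 3).items():
        print(name, shape)
```

Three levels leave ten subbands in total: three detail subbands per level plus the final LL3.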
【０００８】
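Conversely, the inverse DWT that synthesizes the level-3 subbands proceeds as follows. First, applying the synthesis-side filter bank 200A twice to the subbands HH3, HL3, LH3, and LL3 generates the lowest-band level-2 subband LL2. Next, applying the filter bank 200A twice to the level-2 subbands HH2, HL2, LH2, and LL2 generates the lowest-band level-1 subband LL1. Finally, applying the filter bank 200A twice to the level-1 subbands HH1, HL1, LH1, and LL1 generates the two-dimensional image.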
【０００９】
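The above is an example with three decomposition levels; the JPEG2000 scheme generally uses from three to eight or more decomposition levels. Also, in this example the DWT was applied to an entire still image at once, but in practice, because of constraints such as the amount of installed memory, a still image is often divided into multiple rectangular regions called "tiles" and the DWT is performed tile by tile.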
【００１０】
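The DWT and the inverse DWT can also be realized with a lifting scheme. Because the present invention concerns the synthesis-side processing, the description from here on covers the inverse DWT. For the well-known Daubechies 9×7-tap filter, the relationship between the input data Y(2n), Y(2n+1), Y(2n+2), and so on (n: integer) and the output data X(2n), X(2n+1) can be expressed by the lifting scheme defined by equation (1) below. Because the synthesis-side processing is the inverse DWT, the rest of this description uses Y for input data and X for output data.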
【００１１】
【数１】
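The body of equation (1) is not present in this text. The following is a reconstruction inferred from the surrounding description (the normalization by κ and 1/κ, the stated form of [step 3], and the per-cycle operations described for the lattice diagrams), not a copy of the original figure. The labels [step 5] and [step 6] in particular are assumptions.

```latex
\[
\begin{aligned}
\text{[step 1]}\quad & S^{1}_{n} = \kappa \, Y(2n) \\
\text{[step 2]}\quad & D^{1}_{n} = \tfrac{1}{\kappa} \, Y(2n+1) \\
\text{[step 3]}\quad & S^{2}_{n} = S^{1}_{n} - \delta \left( D^{1}_{n-1} + D^{1}_{n} \right) \\
\text{[step 4]}\quad & D^{2}_{n} = D^{1}_{n} - \gamma \left( S^{2}_{n} + S^{2}_{n+1} \right) \\
\text{[step 5]}\quad & X(2n) = S^{2}_{n} - \beta \left( D^{2}_{n-1} + D^{2}_{n} \right) \\
\text{[step 6]}\quad & X(2n+1) = D^{2}_{n} - \alpha \left( X(2n) + X(2n+2) \right)
\end{aligned}
\]
```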

【００１２】
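In equation (1), the odd-numbered input data Y(2n+1) are the high-band data obtained by the decomposition, and the even-numbered input data Y(2n) are the low-band data obtained by the decomposition. The output data X(2n) and X(2n+1) are the data in which the high-band and low-band components have been synthesized. The coefficients α, β, γ, and δ are called lifting coefficients, and the coefficients κ and 1/κ are called normalization coefficients; α, β, γ, δ, κ, and 1/κ are uniquely derived from the filter coefficients of the Daubechies 9×7-tap filter.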
【００１３】
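The lifting scheme defined by equation (1) can be represented by the lattice structure shown in Fig. 37. The lattice points in the leftmost vertical column of Fig. 37 represent the input data ..., Y(2n-1), Y(2n), ..., Y(2n+9), Y(2n+10), ... (n: integer), that is, the low-band and high-band data produced by the DWT arranged alternately. The lattice points at the right ends of the line segments extending horizontally to the right from these input data represent the output data ..., X(2n-1), X(2n), ..., X(2n+9), X(2n+10), ....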
【００１４】
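The lattice points on the line segment running from the lattice point of each input datum Y(k) (k: integer) to the lattice point of the output datum X(k) represent one series of intermediate data. For example, on the segment between input Y(2n) and output X(2n) lie the lattice points of the intermediate data S¹_n and S²_n generated starting from Y(2n).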
【００１５】
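Computation on this lattice follows rules (A) to (C). (A) The data at a lattice point moves along the line segments extending rightward from that point. (B) Data moving along a segment is multiplied by the coefficient attached to that segment (coefficient multiplication). (C) At each lattice point, the data arriving from the left along the segments are added together (addition). For example, the intermediate data S²_n on the segment between Y(2n) and X(2n) is calculated as S²_n = 1×S¹_n - δ×D¹_{n-1} - δ×D¹_n. This expression corresponds to [step 3] in equation (1).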
【００１６】
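As Fig. 37 shows, the intermediate data S²_n, for example, is the sum of the data arriving from the three lattice points D¹_{n-1}, S¹_n, and D¹_n to its left. Every intermediate datum is likewise calculated by adding three data that arrive from three lattice points to its left. The JPEG2000 scheme recommends computing each intermediate datum in two steps (see Mathias Larsson Carlander, Media Lab, Ericsson Research, Sweden, "JPEG2000 Verification Model 9.1 (Technical description)", WG1 N2165, 28 June 2001). Fig. 38 schematically shows the computation recommended by JPEG2000. Lattice points x₁, x₂, x₃, and y represent data, and α, β, and γ are the coefficients attached to the segments joining them. As illustrated, y is calculated in step b after the temporary data z has been calculated in step a.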
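A minimal sketch of the two-step split follows. Fig. 38 is not reproduced in this text, so which pair of inputs step a combines is an assumption here; the sketch follows the later step-by-step description, where step a combines the series' own value with the weighted preceding neighbour and step b adds the weighted following neighbour. The function names are hypothetical.

```python
def step_a(own, previous_neighbour, coef):
    """Step a: temporary data from the series' own value and the earlier neighbour."""
    return own + coef * previous_neighbour


def step_b(temporary, next_neighbour, coef):
    """Step b: finish the lattice point once the later neighbour is available."""
    return temporary + coef * next_neighbour


def three_point(own, previous_neighbour, next_neighbour, coef):
    """Reference: the same lattice point computed in a single three-point sum."""
    return own + coef * (previous_neighbour + next_neighbour)
```

Splitting the sum this way lets step a run before the later neighbour exists, at the cost of storing the temporary value until step b.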
【００１７】
【Non-Patent Document 1】
Mathias Larsson Carlander, Media Lab, Ericsson Research, Sweden, "JPEG2000 Verification Model 9.1 (Technical description)", WG1 N2165, 28 June 2001.
【００１８】
【Problem to Be Solved by the Invention】
However, the lifting computation recommended by JPEG2000 described above takes a long time to calculate one output datum, as explained below.
【００１９】
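Figs. 39 to 48 are lattice diagrams illustrating an example procedure for the inverse DWT based on the lifting scheme. Although not drawn, every segment joining lattice points is assumed to carry the coefficient shown in Fig. 37. In Figs. 39 to 48, a fully filled lattice point is a datum that has been input or computed, a lattice point with only its upper half filled is temporary data for which only step a has finished, and an empty lattice point has undergone neither step a nor step b. Each of the processes shown in these figures is executed within one clock cycle.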
【００２０】
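In the Nth process (N: integer) shown in Fig. 39, the input data Y(2n+4) in target region N1 is normalized, giving the first-stage intermediate data S¹_{n+2} on the series that starts from the even-numbered input Y(2n+4).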
【００２１】
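Step a is executed in every one of the (N+1)th to (N+4)th processes shown in Figs. 40 to 43. The (N+1)th process (Fig. 40) uses the two intermediate data S¹_{n+2} and D¹_{n+1} in target region A1 to calculate the second-stage temporary data (S²_{n+2}) on the series that starts from the even-numbered input Y(2n+4). (Temporary data are written in parentheses in this way to distinguish them.) The (N+2)th process (Fig. 41) uses the two intermediate data D¹_{n+1} and S²_{n+1} in target region A2 to calculate the second-stage temporary data (D²_{n+1}) on the series that starts from the odd-numbered input Y(2n+3). The (N+3)th process (Fig. 42) uses the two intermediate data S²_{n+1} and D²_n in target region A3 to calculate the output temporary data (X(2n+2)) on the series that starts from the even-numbered input Y(2n+2). The (N+4)th process (Fig. 43) uses the two data D²_n and X(2n) in target region A4 to calculate the output temporary data (X(2n+1)) on the series that starts from the odd-numbered input Y(2n+1).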
【００２２】
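In the next (N+5)th process (Fig. 44), the input data Y(2n+5) in target region N2 is normalized, giving the first-stage intermediate data D¹_{n+2} on the series that starts from the odd-numbered input Y(2n+5).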
【００２３】
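Step b is then executed in every one of the (N+6)th to (N+9)th processes shown in Figs. 45 to 48. The (N+6)th process (Fig. 45) uses the intermediate data D¹_{n+2} in target region B1 and the temporary data (S²_{n+2}) calculated in the (N+1)th process to calculate the intermediate data S²_{n+2}. The (N+7)th process (Fig. 46) uses, in target region B2, the intermediate data S²_{n+2} calculated in the (N+6)th process and the temporary data (D²_{n+1}) calculated in the (N+2)th process to calculate the intermediate data D²_{n+1}. The (N+8)th process (Fig. 47) uses, in target region B3, the intermediate data D²_{n+1} calculated in the (N+7)th process and the output temporary data (X(2n+2)) calculated in the (N+3)th process to calculate the output data X(2n+2). The (N+9)th process (Fig. 48) uses, in target region B4, the output data X(2n+2) calculated in the (N+8)th process and the output temporary data (X(2n+1)) calculated in the (N+4)th process to calculate the output data X(2n+1).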
【００２４】
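In the (N+10)th process (not shown), normalization is performed on the input data Y(2n+6) in the same way as in the Nth process, and processing like that of the (N+1)th to (N+9)th processes is then repeated.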
【００２５】
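Thus, calculating the synthesized output data X(2n+2) and X(2n+1) from the inputs Y(2n+4) and Y(2n+5), in which high-band and low-band components alternate, requires the ten clock cycles from the Nth to the (N+9)th process. Calculating one output datum therefore takes five clock cycles on average. A processing method is needed that shortens these five clock cycles so that the inverse DWT can be executed faster.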
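The cycle count above is simple arithmetic, sketched here for concreteness. The function names are hypothetical, and the estimate ignores start-up and boundary handling, which the text does not quantify.

```python
def average_cycles(cycles_per_round, outputs_per_round=2):
    """Average clock cycles per output datum for a repeating round."""
    return cycles_per_round / outputs_per_round


def total_cycles(num_outputs, cycles_per_round, outputs_per_round=2):
    """Rough cycle estimate for a line of `num_outputs` samples (no start-up cost)."""
    rounds = -(-num_outputs // outputs_per_round)  # ceiling division
    return rounds * cycles_per_round


if __name__ == "__main__":
    # Conventional procedure: one normalization plus four two-point operations,
    # performed twice per round, i.e. 10 cycles for 2 outputs.
    print(average_cycles(10))
    print(total_cycles(512, 10))
```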
【００２６】
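In view of these and other problems, the object of the present invention is to provide a wavelet transform device and a wavelet transform method that can execute a wavelet transform based on a lifting scheme efficiently and in a short time.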
【００２７】
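【Means for Solving the Problem】
To solve the above problem, the invention of claim 1 is a wavelet transform device that synthesizes band-split high-band data and low-band data on the basis of a lifting scheme, comprising: a control unit; and a filtering unit that takes in an input data sequence, in which a first data sequence consisting of one of the high-band and low-band components and a second data sequence consisting of the other are arranged alternately pixel by pixel, and calculates a synthesized output data sequence. The filtering unit includes: normalization means that executes one or more normalization processes, each converting an input datum into first-stage intermediate data within one clock cycle per point by multiplying it by a predetermined normalization coefficient; and intermediate data conversion means that executes one or more conversion processes, each converting first-stage intermediate data normalized by the normalization means into a series of intermediate data spanning one or more stages within one clock cycle per point, or converting final-stage intermediate data into output data within one clock cycle per point. The control unit causes the normalization means and the intermediate data conversion means to repeat the one or more normalization processes and the one or more conversion processes until the output data for all points have been calculated, and causes at least two of those repeated processes to be executed in parallel within one clock cycle.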
【００２８】
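The invention of claim 2 is the wavelet transform device of claim 1, wherein the normalization means and the intermediate data conversion means execute the normalization process and the conversion process in parallel.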
【００２９】
請求項３記載の発明は、請求項１または請求項２記載のウェーブレット変換装置であって、前記規格化手段は、各入力データに前記規格化係数を乗算する規格化係数乗算器と、前記規格化係数乗算器から出力されたデータを遅延させる遅延器と、を含み、前記中間データ変換手段は、２点の中間データの一方に所定のリフティング係数を乗算するリフティング係数乗算器と、該リフティング係数乗算器から出力されたデータと前記２点の中間データの他方とを加算する加算器とからなる２点演算部と、前記２点演算部から出力されたデータを取り込んで前記制御部から指定された出力先に出力する出力先選択部と、を含み、前記ウェーブレット変換装置は、さらに、メモリ管理部と、前記メモリ管理部の制御によりデータを一時記憶するメモリと、を備え、前記メモリ管理部は、前記出力先選択部から出力された前記データを前記メモリに転送し記憶させるように制御する、ことを特徴とする。
【００３０】
請求項４記載の発明は、請求項３記載のウェーブレット変換装置であって、前記制御部は、前記変換処理として、「前記第２データ列に属する入力データを起点とする系列」（以下、第２系列と呼ぶ。）上の第１段階の中間データと、その中間データに対して１点前の「前記第１データ列に属する入力データを起点とする系列」（以下、第１系列と呼ぶ。）上の第１段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第２段階の一時データを１点当たり１クロック周期内に算出する第１の変換処理と、前記第１の変換処理で算出され前記メモリに記憶された前記一時データと、その一時データの系列に対して１点後の第１系列上の第１段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第２段階の中間データを１点当たり１クロック周期内に算出する第２の変換処理と、第１系列上の第１段階の中間データと、その中間データに対して１点前の第２系列上の第２段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第２段階の一時データを１点当たり１クロック周期内に算出する第３の変換処理と、前記第３の変換処理で算出され前記メモリに記憶された前記一時データと、その一時データの系列に対して１点後の第２系列上の第２段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第２段階の前記中間データを１点当たり１クロック周期内に算出する第４の変換処理と、第２系列上の第Ｍ段階（段階数Ｍは１以上の整数）の中間データと、その中間データの系列に対して１点前の第１系列上の第Ｍ段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第Ｍ＋１段階の一時データを１点当たり１クロック周期内に算出する第５の変換処理と、前記第５の変換処理で算出され前記メモリに記憶された前記一時データと、その一時データの系列に対して１点後の第１系列上の第Ｍ段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第Ｍ＋１段階の中間データを１点当たり１クロック周期内に算出する第６の変換処理と、第１系列上の第Ｌ段階（段階数Ｌは１以上の整数）の中間データと、その中間データの系列に対して１点前の第２系列上の第Ｌ＋１段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第Ｌ＋１段階の一時データを１点当たり１クロック周期内に算出する第７の変換処理と、前記第７の変換処理で算出され前記メモリに記憶された前記一時データと、その一時データの系列に対して１点後の第２系列上の第Ｌ＋１段階の中間データに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第Ｌ＋１段階の前記中間データを１点当たり１クロック周期内に算出する第８の変換処理と、を全ての点の前記出力データが算出されるまで前記２点演算部に繰り返し実行させるように制御する。
【００３１】
請求項５記載の発明は、請求項４記載のウェーブレット変換装置であって、前記制御部は、前記第１の変換処理および前記第３の変換処理を実行した後に、前記第５の変換処理および前記第７の変換処理を、前記最終段階の前記一時データが算出されるまで前記２点演算部に実行させ、その後、前記第２の変換処理および前記第４の変換処理を実行した後に、前記第６の変換処理および前記第８の変換処理を、前記出力データが算出されるまで前記２点演算部に実行させるように制御する。
【００３２】
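The invention of claim 6 is the wavelet transform device of claim 4, comprising four of the two-point operation units operating independently of one another, wherein the control unit causes the two-point operation units to execute in parallel, as the conversion processes, four processes: the second conversion process, which calculates the second-stage intermediate data on the series starting from the P-th input datum of the input data sequence (P: integer data number), that datum belonging to the second data sequence; the third conversion process, which calculates the second-stage temporary data on the series starting from the (P-1)-th input datum; the sixth conversion process, which calculates the (M+1)-th-stage intermediate data on the series starting from the (P-4)-th input datum; and the seventh conversion process, which calculates the (L+1)-th-stage temporary data on the series starting from the (P-5)-th input datum. The control unit also causes the two-point operation units to execute in parallel four further processes: the first conversion process, which calculates the second-stage temporary data on the series starting from the (P+2)-th input datum; the fourth conversion process, which calculates the second-stage intermediate data on the series starting from the (P-1)-th input datum; the fifth conversion process, which calculates the (M+1)-th-stage temporary data on the series starting from the (P-2)-th input datum; and the eighth conversion process, which calculates the (L+1)-th-stage intermediate data on the series starting from the (P-5)-th input datum.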
【００３３】
請求項７記載の発明は、請求項１または請求項２記載のウェーブレット変換装置であって、前記規格化手段は、各入力データに前記規格化係数を乗算する規格化係数乗算器と、前記規格化係数乗算器から出力されたデータを遅延させる遅延器と、を含み、前記中間データ変換手段は、取り込まれた３点の入力データの中で第１および第２の入力データを加算する第１加算器と、該第１加算器から出力されたデータに所定のリフティング係数を乗算するリフティング係数乗算器と、該リフティング係数乗算器から出力されたデータと第３の入力データとを加算することで中間データを算出する第２加算器とからなる３点演算部と、前記３点演算部から出力された中間データを取り込んで前記制御部から指定された出力先に出力する出力先選択部と、を含み、前記メモリ管理部は、前記出力先選択部から出力された中間データを前記メモリに転送し記憶させるように制御する。
【００３４】
請求項８記載の発明は、請求項７記載のウェーブレット変換装置であって、前記制御部は、前記変換処理として、「前記第２データ列に属する入力データを始点とする系列」（以下、第２系列と呼ぶ。）上の第１段階の中間データと、その中間データの系列に対して１点前後する「前記第１データ列に属する入力データを始点とする系列」（以下、第１系列と呼ぶ。）上の２点の第１段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第２段階の中間データを１点当たり１クロック周期内に算出する第１の変換処理と、第１系列上の第１段階の中間データと、その第１段階の中間データの系列に対して１点前後する第２系列上の２点の第２段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第２段階の中間データを１点当たり１クロック周期内に算出する第２の変換処理と、第２系列上の第Ｍ段階（段階数Ｍは１以上の整数）の中間データと、その第Ｍ段階の中間データの系列に対して１点前後する第１系列上の２点の第Ｍ段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第Ｍ＋１段階の中間データを１点当たり１クロック周期内に算出する第３の変換処理と、第１系列上の第Ｌ段階（段階数Ｌは１以上の整数）の中間データと、その第Ｌ段階の中間データの系列に対して１点前後する第２系列上の２点の第Ｌ＋１段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第Ｌ＋１段階の中間データを１点当たり１クロック周期内に算出する第４の変換処理と、を全ての点の前記出力データが算出されるまで前記３点演算部に繰り返し実行させるように制御する。
【００３５】
請求項９記載の発明は、請求項８記載のウェーブレット変換装置であって、互いに独立に動作する２個の前記３点演算部を備え、前記制御部は、前記第１データ列に属し且つ前記入力データ列の中でＰ番目（データ番号Ｐは整数）の入力データを始点とする系列上の前記中間データを算出する前記第２の変換処理と、Ｐ−４番目の入力データを始点とする系列上の第Ｌ＋１段階の前記中間データを算出する前記第４の変換処理と、の２個の処理をそれぞれ前記各３点演算部に並列に実行させるように制御する。
【００３６】
請求項１０記載の発明は、請求項８または請求項９記載のウェーブレット変換装置であって、前記制御部は、前記入力データ列の中でＰ＋３番目（データ番号Ｐは整数）の入力データを始点とする系列上の前記中間データを算出する前記第１の変換処理と、Ｐ−１番目の入力データを始点とする系列上の第Ｍ＋１段階の中間データを算出する前記第３の変換処理と、の２個の処理をそれぞれ前記各３点演算部に並列に実行させるように制御する。
【００３７】
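The invention of claim 11 is the wavelet transform device of claim 8, wherein the control unit performs control so that the first to fourth conversion processes are executed in parallel.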
【００３８】
請求項１２記載の発明は、請求項１〜請求項１１の何れか１項に記載のウェーブレット変換装置であって、前記フィルタリング部は、直列に接続される第１フィルタリング部と第２フィルタリング部とから構成されており、前記第１フィルタリング部は、水平方向および垂直方向のうちの一方向に帯域分割されている前記高域成分および前記低域成分のデータを入力し、これらのデータを合成してライン単位で算出し、前記第２フィルタリング部は、前記第１フィルタリング部で算出された合成データに対して処理を実行することで、前記水平方向および前記垂直方向のうちの他方向の合成データを算出する。
【００３９】
請求項１３記載の発明は、リフティング構成に基づいて、帯域分割された高域成分のデータと低域成分のデータとを合成するウェーブレット変換方法であって、（ａ）高域成分および低域成分の一方からなる第１データ列と、その他方からなる第２データ列とが画素単位で交互に配列されて構成される入力データ列から、入力データを選択的に取り込む工程と、（ｂ）前記工程（ａ）で取り込まれた前記入力データの各々に規格化係数を乗算することで第１段階の中間データへ１点当たり１クロック周期内に変換する工程と、（ｃ）第ｍ段階（ｍは１以上の整数）の中間データを第ｍ＋１段階の中間データへ１点当たり１クロック周期内に算出する工程（第ｍ段階の中間データが最終段階の中間データである場合を含む。この場合、第ｍ＋１段階の中間データは出力データである。）と、を備え、前記工程（ｂ）および工程（ｃ）を、全ての点の前記出力データが算出されるまで繰り返し実行し、且つ、繰り返し実行される前記工程（ｂ）および工程（ｃ）を１クロック周期内に並列に実行することを特徴とする。
【００４０】
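The invention of claim 14 is the wavelet transform method of claim 13, wherein step (c) comprises the following steps, each executed within one clock cycle per point:
(c-1) calculating second-stage temporary data on a "series starting from input data belonging to the second data sequence" (hereinafter, the second series) by adding first-stage intermediate data on that second series to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data one point before it on a "series starting from input data belonging to the first data sequence" (hereinafter, the first series);
(c-2) calculating second-stage intermediate data on the second series by adding the temporary data calculated in step (c-1) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point after the series of that temporary data;
(c-3) calculating second-stage temporary data on the first series by adding first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point before it;
(c-4) calculating second-stage intermediate data on the first series by adding the temporary data calculated in step (c-3) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point after the series of that temporary data;
(c-5) calculating (M+1)-th-stage temporary data on the second series by adding M-th-stage intermediate data on the second series (M: integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point before that series;
(c-6) calculating (M+1)-th-stage intermediate data on the second series by adding the temporary data calculated in step (c-5) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point after the series of that temporary data;
(c-7) calculating (L+1)-th-stage temporary data on the first series by adding L-th-stage intermediate data on the first series (L: integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point before that series; and
(c-8) calculating (L+1)-th-stage intermediate data on the first series by adding the temporary data calculated in step (c-7) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point after the series of that temporary data.
Steps (c-1) to (c-8) are repeated under control until the output data for all points have been calculated.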
【００４１】
請求項１５記載の発明は、請求項１４記載のウェーブレット変換方法であって、前記工程（ｃ−１）および前記工程（ｃ−３）を実行した後に、前記工程（ｃ−５）および前記工程（ｃ−７）を、前記出力データの一時データが算出されるまで実行し、その後、前記工程（ｃ−２）および前記工程（ｃ−４）を実行した後に、前記工程（ｃ−６）および前記工程（ｃ−８）を、前記出力データが算出されるまで実行する。
【００４２】
請求項１６記載の発明は、請求項１４記載のウェーブレット変換方法であって、前記第２データ列に属し且つ前記入力データ列の中でＰ番目（データ番号Ｐは整数）の入力データを始点とする系列上の第２段階の中間データを算出する前記工程（ｃ−２）と、Ｐ−１番目の入力データを始点とする系列上の第２段階の一時データを算出する前記工程（ｃ−３）と、Ｐ−４番目の入力データを始点とする系列上の第Ｍ＋１段階の中間データを算出する前記工程（ｃ−６）と、Ｐ−５番目の入力データを始点とする系列上の第Ｌ＋１段階の一時データを算出する前記工程（ｃ−７）と、の４工程を前記各２点演算部に並列に実行させると共に、Ｐ＋２番目の入力データを始点とする系列上の第２段階の一時データを算出する前記工程（ｃ−１）と、前記Ｐ−１番目の入力データを始点とする系列上の第２段階の前記中間データを算出する前記工程（ｃ−４）と、Ｐ−２番目の入力データを始点とする系列上の第Ｍ＋１段階の一時データを算出する前記工程（ｃ−５）と、前記Ｐ−５番目の入力データを始点とする系列上の第Ｌ＋１段階の中間データを算出する前記工程（ｃ−８）と、の４個の処理をそれぞれ並列に実行させるように制御する。
【００４３】
請求項１７記載の発明は、請求項１３に記載のウェーブレット変換方法であって、前記工程（ｃ）は、（ｃ−１）「前記第２データ列に属する入力データを始点とする系列」（以下、第２系列と呼ぶ。）上の第１段階の中間データと、その中間データの系列に対して１点前後する「前記第１データ列に属する入力データを始点とする系列」（以下、第１系列と呼ぶ。）上の２点の第１段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第２段階の中間データを１点当たり１クロック周期内に算出する工程と、（ｃ−２）第１系列上の第１段階の中間データと、その中間データの系列に対して１点前後する第２系列上の２点の第２段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第２段階の中間データを１点当たり１クロック周期内に算出する工程と、（ｃ−３）第２系列上の第Ｍ段階（段階数Ｍは１以上の整数）の中間データと、その第Ｍ段階の中間データの系列に対して１点前後する第１系列上の２点の第Ｍ段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第２系列上の第Ｍ＋１段階の中間データを１点当たり１クロック周期内に算出する工程と、（ｃ−４）第１系列上の第Ｌ段階（段階数Ｌは１以上の整数）の中間データと、その第Ｌ段階の中間データの系列に対して１点前後する第２系列上の２点の第Ｌ＋１段階の中間データを加算したデータに所定のリフティング係数を乗算して得たデータとを加算することで、当該第１系列上の第Ｌ＋１段階の中間データを１点当たり１クロック周期内に算出する工程と、を備え、前記工程（ｃ−１）〜工程（ｃ−４）を、全ての点の前記出力データが算出されるまで繰り返し実行する。
【００４４】
請求項１８記載の発明は、請求項１７記載のウェーブレット変換方法であって、前記第１データ列に属し且つ前記入力データ列の中でＰ番目（データ番号Ｐは整数）の入力データを始点とする系列上の第２段階の中間データを算出する前記工程（ｃ−２）と、Ｐ−４番目の入力データを始点とする系列上の第Ｌ＋１段階の前記中間データを算出する前記工程（ｃ−４）と、の２個の処理をそれぞれ並列に実行させるように制御する。
【００４５】
請求項１９記載の発明は、請求項１７または請求項１８記載のウェーブレット変換方法であって、前記入力データ列の中でＰ＋３番目（データ番号Ｐは整数）の入力データを始点とする系列上の中間データを算出する前記工程（ｃ−１）と、Ｐ−１番目の入力データを始点とする系列上の第Ｍ＋１段階の中間データを算出する前記工程（ｃ−３）と、の２個の処理をそれぞれ並列に実行させるように制御する。
【００４６】
請求項２０記載の発明は、請求項１７記載のウェーブレット変換方法であって、前記工程（ｃ−１）〜工程（ｃ−４）を並列に実行する。
【００４７】
請求項２１記載の発明は、請求項１３〜請求項２０の何れか１項に記載のウェーブレット変換方法であって、低域成分と高域成分に帯域分割された２次元画像データに対して、当該２次元画像データの水平方向および垂直方向のうちの一方向にライン単位で前記工程（ａ）〜工程（ｃ）を適用することによって合成データ列を算出し、この算出された合成データ列に対して、前記水平方向および前記垂直方向のうちの他方向に前記工程（ａ）〜工程（ｃ）を適用する、ウェーブレット変換方法。
【００４８】
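【Embodiments of the Invention】
<First Embodiment>
A wavelet transform device and a wavelet transform method according to the first embodiment of the invention are described below. Fig. 1 shows the schematic configuration of the wavelet transform device 1 of the first embodiment. The device 1 comprises a buffer 8 that temporarily holds the data of the high-band or low-band subbands produced by wavelet decomposition, an MMU (memory management unit) 2 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 3A, a horizontal filtering unit 4A, a line buffer circuit 5, a second ring memory 3B, and a vertical filtering unit 4B. The first ring memory 3A, horizontal filtering unit 4A, line buffer circuit 5, second ring memory 3B, and vertical filtering unit 4B operate in synchronization with an externally supplied pixel clock signal PCLK.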
【００４９】
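In this embodiment the MMU 2, the horizontal filtering unit 4A, and the vertical filtering unit 4B are implemented in hardware, but they may instead be implemented as a computer program containing a set of instructions executed by a microprocessor.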
【００５０】
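Subband data input to the wavelet transform device 1 is temporarily stored in the buffer 8. The device 1 has the function of applying one pass of a line-based two-dimensional inverse DWT to the subband data. The horizontal filtering unit 4A and the vertical filtering unit 4B are connected in series through the line buffer circuit 5 and the second ring memory 3B. As described later, the subband data is filtered horizontally by the horizontal filtering unit 4A and then vertically by the vertical filtering unit 4B. To perform a two-dimensional inverse DWT on data with two or more decomposition levels, the device 1 is simply used repeatedly, two or more times.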
【００５１】
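The MMU 2 controls data input and output among the buffer 8, the first ring memory 3A, and the second ring memory 3B, and can transfer input data read from the buffer 8 to the first ring memory 3A for storage. The horizontal filtering unit 4A filters the data received from the first ring memory 3A in the horizontal direction and, in eight cycles of the pixel clock signal PCLK, calculates two points of output data in which the horizontal high-band and low-band components are synthesized, outputting them to the line buffer circuit 5. The average period needed to calculate one output datum is therefore four clock cycles.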
【００５２】
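Data output from the line buffer circuit 5 is stored in the second ring memory 3B. The MMU 2 feeds input data from the second ring memory 3B to the vertical filtering unit 4B. The vertical filtering unit 4B filters the input data in the vertical direction and, in eight cycles of the pixel clock signal PCLK, calculates and outputs two points of output data in which the vertical high-band and low-band components are synthesized.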
【００５３】
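The horizontal filtering unit 4A and the vertical filtering unit 4B have the same configuration. Fig. 2 shows the schematic configuration of a filtering unit 4 (either the horizontal filtering unit 4A or the vertical filtering unit 4B). The ring memory 3 shown in Fig. 2 represents either the first ring memory 3A or the second ring memory 3B of Fig. 1.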
【００５４】
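The filtering unit 4 comprises a first data selector 11 that selectively takes in input data, a first coefficient multiplier 12, a delay register 16, a second data selector 17, a second coefficient multiplier 18, an adder 22, an output destination selector (DMUX) 23, and a control unit 24. The control unit 24 operates in synchronization with the pixel clock signal PCLK. According to the values of the selection control signals SEL0 and SEL1 supplied from the control unit 24, the first and second data selectors 11 and 17 output the input data taken in from the ring memory 3 and the data held in the delay register 16 to their first terminal S0 and second terminal S1, respectively.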
【００５５】
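According to a control signal C0 supplied from the control unit 24, the first coefficient multiplier 12 multiplies the data output from the first terminal S0 of the first data selector 11 by either normalization coefficient κ or 1/κ and outputs the result (normalization process). The data output from the first coefficient multiplier 12 is delayed by one cycle of the pixel clock signal PCLK in the delay register 16 and then input to the second data selector 17. The first coefficient multiplier 12 and the delay register 16 together constitute the normalization means of the invention.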
【００５６】
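According to a control signal C1 supplied from the control unit 24, the second coefficient multiplier 18 multiplies the data output from the first terminal S0 of the second data selector 17 by one of the lifting coefficients -α, -β, -γ, or -δ and outputs the result (coefficient multiplication). The adder 22 adds the data output from the second coefficient multiplier 18 and the data output from the second terminal S1 of the second data selector 17, and outputs the sum to the output destination selector 23 (addition). Data normalized by 1/κ is also output from the third terminal S2 of the second data selector 17 to the external MMU 2, which can transfer the data output from that terminal to the ring memory 3 and store it there.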
【００５７】
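According to the value of a selection control signal SEL2 supplied from the control unit 24, the output destination selector 23 outputs the data received from the adder 22 from one of its first terminal K0 to third terminal K2. The coefficient multiplication in the second coefficient multiplier 18 and the addition in the adder 22 are executed within one clock cycle per point. Multiplying one input datum by a lifting coefficient and adding it therefore takes one cycle of the pixel clock signal PCLK.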
【００５８】
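The second coefficient multiplier 18 and the adder 22 constitute a two-point operation unit that takes two input data from the data selector 17 and operates on them. This two-point operation unit and the output destination selector 23 together constitute the intermediate data conversion means.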
【００５９】
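Data output from the first terminal K0 and the second terminal K1 of the output destination selector 23 is output externally as output data in which the low-band and high-band input data have been synthesized.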
【００６０】
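Data output from the second terminal K1 of the output destination selector 23 is also branched to the external MMU 2, and data output from the third terminal K2 is output to the external MMU 2. The MMU 2 can transfer the data output from the second terminal K1 and the third terminal K2 to the ring memory 3 and store it there.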
【００６１】
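A representative example of the lifting computation using the filtering unit 4 is described next with reference to Figs. 3 to 10, which are lattice diagrams schematically showing the lifting scheme of the Daubechies 9×7-tap filter. Computation on these lattices proceeds as in Fig. 37. For convenience, Figs. 3 to 10 do not show the lifting coefficients -α, -β, -γ, -δ or the normalization coefficients κ, 1/κ attached to the segments joining the lattice points.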
【００６２】
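As Figs. 3 to 10 show, the input data ..., Y(2n-1), Y(2n), ..., Y(2n+9), ... are each converted through a series of lattice points (intermediate data) spanning several stages and output as the output data ..., X(2n-1), X(2n), ..., X(2n+9), .... For example, the input Y(2n) passes through two stages of intermediate data (lattice points) and is then output as X(2n). In what follows, the process of normalizing input data to generate intermediate data is called the normalization process (steps 1 and 2 of equation (1)), and the process of calculating the other intermediate data is called the conversion process (steps 3 and 4 above). In this embodiment and the other embodiments described later, two stages of intermediate data are calculated on each series, but the invention is not limited to this; a lifting scheme that calculates only one stage of intermediate data is also possible. In fact, for 5×3-tap and 13×7-tap filters, a lifting scheme with only one stage of intermediate data is possible.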
【００６３】
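Figs. 3 to 10 show the contents of the Nth (N: integer) to (N+7)th processes in this embodiment. In the Nth process (Fig. 3), the two-point operation of step a (Fig. 38) is executed within one cycle of the pixel clock signal PCLK (one clock cycle) on the two intermediate data S¹_{n+2} and D¹_{n+1} in target region A1, giving the second-stage temporary data (S²_{n+2}) on the series that starts from the even-numbered input Y(2n+4). That is, step a is executed using the even-numbered intermediate data S¹_{n+2} and the odd-numbered intermediate data D¹_{n+1} on the series one point before it.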
【００６４】
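In the cycle one clock before the operation on target region A1, a normalization process that multiplies the input data Y(2n+4) by the normalization coefficient κ has been executed in target region N1, so that the first-stage intermediate data S¹_{n+2} on the series of Y(2n+4) has already been calculated.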
【００６５】
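The concrete details of this Nth process are as follows. The ring memory 3 shown in Fig. 2 has storage areas for five lines (series) that hold input data, intermediate data, and temporary data, and is structured so that new data overwrite, in order, the storage areas holding old data that have already been referenced.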
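The overwrite behaviour of such a ring memory can be sketched as follows. This is an illustrative model only: the class name `RingMemory`, the addressing by series number modulo the slot count, and the ownership check are assumptions, since the text does not describe how the MMU maps series to storage areas.

```python
class RingMemory:
    """Fixed number of storage areas; a newer series reuses the oldest area."""

    def __init__(self, slots=5):
        self.size = slots
        self.data = [None] * slots
        self.owner = [None] * slots   # which series currently occupies each area

    def write(self, series, value):
        """Store (or overwrite in place) the datum for the given series number."""
        slot = series % self.size
        self.data[slot] = value
        self.owner[slot] = series

    def read(self, series):
        """Return the datum for a series, or fail if it has been overwritten."""
        slot = series % self.size
        if self.owner[slot] != series:
            raise KeyError(f"series {series} is no longer stored")
        return self.data[slot]


if __name__ == "__main__":
    ring = RingMemory()
    for k in range(5):
        ring.write(k, f"Y({k})")
    ring.write(2, "temporary data for series 2")  # in-place update of one series
    ring.write(5, "Y(5)")                          # reuses the area of series 0
    print(ring.read(2), "|", ring.read(5))
```

Writing to the same series number again models the in-place replacement of input data by temporary data and then by intermediate data that the following paragraphs describe.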
【００６６】
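The processing in target region N1, executed one clock cycle earlier, is described first. The MMU 2 has the ring memory 3 output the temporarily stored input data Y(2n+4) to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that Y(2n+4) is output to the first coefficient multiplier 12. The first coefficient multiplier 12 supplies the normalization coefficient κ, selected according to the control signal C0 from the control unit 24, to its multiplier 14, and the multiplier 14 multiplies Y(2n+4) by κ and outputs the result κ×Y(2n+4) = S¹_{n+2} to the delay register 16. This coefficient multiplication in the first coefficient multiplier 12 is executed within one clock cycle.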
【００６７】
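One clock cycle later, the intermediate data S¹_{n+2} held in the delay register 16 is output to the second data selector 17. The MMU 2 also has the ring memory 3 output the temporarily stored intermediate data D¹_{n+1} to the first data selector 11, which outputs it from its second terminal S1 according to the selection control signal SEL0 from the control unit 24. That data enters the second data selector 17, which, according to the selection control signal SEL1 from the control unit 24, outputs D¹_{n+1} from its first terminal S0 to the second coefficient multiplier 18 and S¹_{n+2} from its second terminal S1 to the adder 22.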
【００６８】
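The second coefficient multiplier 18 supplies the lifting coefficient δ, selected according to the control signal C1 from the control unit 24, to its multiplier 20, which multiplies D¹_{n+1} by δ and outputs δ×D¹_{n+1} to the two's-complement circuit 21. The two's-complement circuit 21 inverts the sign of its input and outputs -δ×D¹_{n+1} to the adder 22. The adder 22 adds the two data -δ×D¹_{n+1} and S¹_{n+2} to obtain the temporary data (S²_{n+2}) and outputs it to the output destination selector 23. This calculation of (S²_{n+2}) is executed within one clock cycle.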
【００６９】
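The output destination selector 23 outputs the temporary data (S²_{n+2}) to the external MMU 2 from the third terminal K2, selected according to the value of the selection control signal SEL2 from the control unit 24. The MMU 2 transfers (S²_{n+2}) to the ring memory 3, where it overwrites the already-referenced storage area of the input data Y(2n+4).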
【００７０】
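In the next (N+1)th process (Fig. 4), the two-point operation of step a is executed within one clock cycle on the two intermediate data D¹_{n+1} and S²_{n+1} in target region A2, giving the second-stage temporary data (D²_{n+1}) on the series that starts from the odd-numbered input Y(2n+3). The intermediate data S²_{n+1} is the second-stage data on the series that starts from Y(2n+2), the input one point before Y(2n+3). Specifically, the MMU 2 has the ring memory 3 output the already computed intermediate data D¹_{n+1} and S²_{n+1} to the first data selector 11. Under control of the control unit 24, the first data selector 11 outputs D¹_{n+1} and S²_{n+1} from its second and third terminals S1 and S2, respectively, to the second data selector 17. Then, under control of the control unit 24, the second data selector 17 outputs S²_{n+1} from its first terminal S0 to the second coefficient multiplier 18 and D¹_{n+1} from its second terminal S1 to the adder 22.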
【００７１】
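The second coefficient multiplier 18 supplies the lifting coefficient γ, selected according to the control signal C1 from the control unit 24, to the multiplier 20, which multiplies S²_{n+1} by γ and outputs γ×S²_{n+1} to the two's-complement circuit 21. The adder 22 adds -γ×S²_{n+1}, the output of the two's-complement circuit 21, and D¹_{n+1}, the output of the second data selector 17, to obtain the temporary data (D²_{n+1}) and outputs it to the output destination selector 23. Under control of the control unit 24, the output destination selector 23 outputs (D²_{n+1}) from its third terminal K2 to the external MMU 2, which transfers it to the ring memory 3, where it overwrites the already-referenced storage area of the intermediate data D¹_{n+1}.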
【００７２】
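In the next (N+2)th process (Fig. 5), the two-point operation of step a is executed within one clock cycle on the two intermediate data S²_{n+1} and D²_n in target region A3, giving the output temporary data (X(2n+2)) on the series that starts from the even-numbered input Y(2n+2). The intermediate data D²_n is the second-stage intermediate data on the series that starts from Y(2n+1), the input one point before Y(2n+2). Specifically, the MMU 2 has the ring memory 3 output the already computed intermediate data S²_{n+1} and D²_n to the first data selector 11. Under control of the control unit 24, the first data selector 11 outputs S²_{n+1} and D²_n from its second and third terminals S1 and S2, respectively, to the second data selector 17. Then, under control of the control unit 24, the second data selector 17 outputs D²_n from its first terminal S0 to the second coefficient multiplier 18 and S²_{n+1} from its second terminal S1 to the adder 22.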
【００７３】
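The second coefficient multiplier 18 multiplies D²_n by the lifting coefficient β and outputs the weighted value β×D²_n to the two's-complement circuit 21. The adder 22 adds -β×D²_n, the output of the two's-complement circuit 21, and S²_{n+1}, the output of the second data selector 17, to obtain the output temporary data (X(2n+2)) and outputs it to the output destination selector 23. Under control of the control unit 24, the output destination selector 23 outputs (X(2n+2)) from its third terminal K2 to the external MMU 2, which transfers it to the ring memory 3, where it overwrites the already-referenced storage area of the intermediate data S²_{n+1}.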
【００７４】
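In the next (N+3)th process (Fig. 6), the two-point operation of step a is executed within one clock cycle on the intermediate data D²_n and the output data X(2n) in target region A4, giving the output temporary data (X(2n+1)) on the series that starts from the odd-numbered input Y(2n+1). Specifically, the MMU 2 has the ring memory 3 output the already computed intermediate data D²_n and output data X(2n) to the first data selector 11. Under control of the control unit 24, the first data selector 11 outputs D²_n and X(2n) from its second and third terminals S1 and S2, respectively, to the second data selector 17. Then, under control of the control unit 24, the second data selector 17 outputs X(2n) from its first terminal S0 to the second coefficient multiplier 18 and D²_n from its second terminal S1 to the adder 22.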
【００７５】
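The second coefficient multiplier 18 multiplies X(2n) by the lifting coefficient α and outputs the weighted value α×X(2n) to the two's-complement circuit 21. The adder 22 adds -α×X(2n), the output of the two's-complement circuit 21, and D²_n, the output of the second data selector 17, to obtain the output temporary data (X(2n+1)) and outputs it to the output destination selector 23. Under control of the control unit 24, the output destination selector 23 outputs (X(2n+1)) from its third terminal K2 to the external MMU 2, which transfers it to the ring memory 3, where it overwrites the already-referenced storage area of the intermediate data D²_n.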
【００７６】
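In the next (N+4)th process (Fig. 7), the two-point operation of step b (Fig. 38) is executed within one clock cycle on the temporary data (S²_{n+2}) calculated in the Nth process (Fig. 3) and the intermediate data D¹_{n+2} in target region B1, giving the second-stage intermediate data S²_{n+2} on the series that starts from the even-numbered input Y(2n+4). The intermediate data D¹_{n+2} lies on the series one point after the series of the temporary data (S²_{n+2}).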
【００７７】
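In the cycle one clock before the operation on target region B1, a normalization process that multiplies the input data Y(2n+5) by the normalization coefficient 1/κ is executed in target region N2, so that the first-stage intermediate data D¹_{n+2} on the series of Y(2n+5) has already been calculated.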
【００７８】
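The concrete processing is described starting from the preceding clock cycle. In the cycle one clock before the operation on target region B1, the MMU 2 has the ring memory 3 output the temporarily stored input data Y(2n+5) to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that Y(2n+5) is output to the first coefficient multiplier 12. Under control of the control unit 24, the first coefficient multiplier 12 multiplies Y(2n+5) by the normalization coefficient 1/κ and outputs the result (1/κ)×Y(2n+5) = D¹_{n+2} to the delay register 16. This coefficient multiplication in the first coefficient multiplier 12 is executed within one clock cycle.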
【００７９】
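One clock cycle later, the intermediate data D¹_{n+2} held in the delay register 16 is output to the second data selector 17. The MMU 2 also has the ring memory 3 output the temporarily stored temporary data (S²_{n+2}) to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that (S²_{n+2}) is output to the second data selector 17.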
【００８０】
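The control unit 24 then supplies the selection control signal SEL1 to the second data selector 17 so that D¹_{n+2} is output from its first terminal S0 to the second coefficient multiplier 18 and (S²_{n+2}) is output from its second terminal S1 to the adder 22. In addition, under control of the control unit 24, the second data selector 17 outputs D¹_{n+2} from its third terminal S2 to the external MMU 2, which transfers it to the ring memory 3, where it overwrites the already-referenced storage area of the input data Y(2n+5).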
【００８１】
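The second coefficient multiplier 18 supplies the lifting coefficient δ, selected according to the control signal C1 from the control unit 24, to the multiplier 20, which multiplies D¹_{n+2} by δ and outputs δ×D¹_{n+2} to the two's-complement circuit 21; the two's-complement circuit 21 outputs -δ×D¹_{n+2} to the adder 22. The adder 22 adds the two data -δ×D¹_{n+2} and (S²_{n+2}) to obtain the intermediate data S²_{n+2} and outputs it to the output destination selector 23. This calculation of S²_{n+2} is executed within one clock cycle.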
【００８２】
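The output destination selector 23 outputs the intermediate data S²_{n+2} to the external MMU 2 from the third terminal K2, selected according to the value of the selection control signal SEL2 from the control unit 24. The MMU 2 transfers S²_{n+2} to the ring memory 3, where it overwrites the already-referenced storage area of the temporary data (S²_{n+2}).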
【００８３】
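In the next (N+5)th process (Fig. 8), the two-point operation of step b is executed within one clock cycle on the temporary data (D²_{n+1}) calculated in the (N+1)th process (Fig. 4) and the intermediate data S²_{n+2} of target region B1 calculated in the (N+4)th process (Fig. 7), giving the second-stage intermediate data D²_{n+1} on the series that starts from the odd-numbered input Y(2n+3). The intermediate data S²_{n+2} is the second-stage data on the series one point after the series of the temporary data (D²_{n+1}).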
【００８４】
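Specifically, the MMU 2 has the ring memory 3 output the temporary data (D²_{n+1}) and the intermediate data S²_{n+2} to the first data selector 11. Under control of the control unit 24, the first data selector 11 outputs (D²_{n+1}) and S²_{n+2} from its second and third terminals S1 and S2 to the second data selector 17. Then, under control of the control unit 24, the second data selector 17 outputs S²_{n+2} from its first terminal S0 to the second coefficient multiplier 18 and (D²_{n+1}) from its second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies S²_{n+2} by the lifting coefficient γ, and the two's-complement circuit 21 inverts the sign. The adder 22 adds the weighted data -γ×S²_{n+2} and the temporary data (D²_{n+1}) to obtain the intermediate data D²_{n+1} and outputs it to the output destination selector 23. Under control of the control unit 24, the output destination selector 23 outputs D²_{n+1} from its third terminal K2 to the external MMU 2, which transfers it to the ring memory 3, where it overwrites the already-referenced storage area of the temporary data (D²_{n+1}).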
【００８５】
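In the next (N+6)th process (Fig. 9), the two-point operation of step b is executed within one clock cycle on the output temporary data (X(2n+2)) calculated in the (N+2)th process (Fig. 5) and the intermediate data D²_{n+1} of target region B2 calculated in the (N+5)th process (Fig. 8), giving the output data X(2n+2) on the series that starts from the even-numbered input Y(2n+2). The intermediate data D²_{n+1} is the second-stage intermediate data on the series one point after the series of the output temporary data (X(2n+2)).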
【００８６】
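Specifically, the MMU 2 has the ring memory 3 output the temporary data (X(2n+2)) and the intermediate data D²_{n+1} to the first data selector 11. Under control of the control unit 24, the first data selector 11 outputs (X(2n+2)) and D²_{n+1} from its second and third terminals S1 and S2 to the second data selector 17. Then, under control of the control unit 24, the second data selector 17 outputs D²_{n+1} from its first terminal S0 to the second coefficient multiplier 18 and (X(2n+2)) from its second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies D²_{n+1} by the lifting coefficient β, and the two's-complement circuit 21 inverts the sign. The adder 22 adds the weighted data -β×D²_{n+1} and the temporary data (X(2n+2)) to obtain the output data X(2n+2) and outputs it to the output destination selector 23. Under control of the control unit 24, the output destination selector 23 outputs X(2n+2) from its second terminal K1 both externally and to the external MMU 2, which transfers it to the ring memory 3, where it overwrites the already-referenced storage area of the temporary data (X(2n+2)).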
【００８７】
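In the next (N+7)th process (Fig. 10), the two-point operation of step b is executed within one clock cycle on the output temporary data (X(2n+1)) calculated in the (N+3)th process (Fig. 6) and the output data X(2n+2) of target region B4 calculated in the (N+6)th process (Fig. 9), giving the output data X(2n+1) on the series that starts from the odd-numbered input Y(2n+1). The output data X(2n+2) is the output data on the series one point after the series of the output temporary data (X(2n+1)).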
【００８８】
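Specifically, the MMU 2 has the ring memory 3 output the temporary data (X(2n+1)) and the output data X(2n+2) to the first data selector 11. Under control of the control unit 24, the first data selector 11 outputs (X(2n+1)) and X(2n+2) from its second and third terminals S1 and S2 to the second data selector 17. Then, under control of the control unit 24, the second data selector 17 outputs X(2n+2) from its first terminal S0 to the second coefficient multiplier 18 and (X(2n+1)) from its second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies X(2n+2) by the lifting coefficient α, and the two's-complement circuit 21 inverts the sign. The adder 22 adds the weighted data -α×X(2n+2) and the temporary data (X(2n+1)) to obtain the output data X(2n+1) and outputs it to the output destination selector 23. Under control of the control unit 24, the output destination selector 23 outputs X(2n+1) externally from its first terminal K0.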
【００８９】
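In the next (N+8)th process (not shown), the same processing as in the Nth process (Fig. 3) is performed except for the target region, and processing corresponding to the (N+1)th to (N+7)th processes then repeats. In this way, processing like that of the Nth process (Fig. 3) to the (N+7)th process (Fig. 10) is carried out, with the target regions moving along, until all the output data ..., X(2n-1), X(2n), ... have been calculated.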
【００９０】
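In this embodiment, as the Nth to (N+3)th processes show, the two-point operation of step a is executed until the final-stage output temporary data (X(2n+1)) has been calculated; after that, as the (N+4)th to (N+7)th processes show, the two-point operation of step b converts all the temporary data calculated in the Nth to (N+3)th processes into intermediate data or output data.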
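The eight-cycle round described above can be sketched as a pure-Python model and checked against the direct three-point formulas. This is an illustration of the data dependencies only, not of the hardware: the names `Coefs`, `State`, `run_round`, and `reference_lattice` are hypothetical, the coefficient values used when calling it are up to the caller, and boundary handling is left out, so the reference is defined only in the interior of the signal.

```python
from collections import namedtuple

Coefs = namedtuple("Coefs", "alpha beta gamma delta kappa")
# Values carried from one round to the next: D1[n+1], S2[n+1], D2[n], X(2n).
State = namedtuple("State", "d1 s2 d2 x_even")


def run_round(state, y_even, y_odd, c):
    """One round for y_even = Y(2n+4), y_odd = Y(2n+5).

    Returns (next_state, X(2n+1), X(2n+2)).
    """
    s1 = c.kappa * y_even                 # region N1, one clock before cycle N
    # Cycles N .. N+3: step a (temporary data)
    t_s2 = s1 - c.delta * state.d1        # (S2[n+2])
    t_d2 = state.d1 - c.gamma * state.s2  # (D2[n+1])
    t_xe = state.s2 - c.beta * state.d2   # (X(2n+2))
    t_xo = state.d2 - c.alpha * state.x_even  # (X(2n+1))
    d1_new = y_odd / c.kappa              # region N2, overlapping cycle N+3
    # Cycles N+4 .. N+7: step b (finish each temporary value)
    s2_new = t_s2 - c.delta * d1_new      # S2[n+2]
    d2_new = t_d2 - c.gamma * s2_new      # D2[n+1]
    x_even_new = t_xe - c.beta * d2_new   # X(2n+2)
    x_odd = t_xo - c.alpha * x_even_new   # X(2n+1)
    return State(d1_new, s2_new, d2_new, x_even_new), x_odd, x_even_new


def reference_lattice(y, c):
    """Direct three-point evaluation, interior points only (no boundary rule)."""
    m = len(y) // 2
    s1 = {n: c.kappa * y[2 * n] for n in range(m)}
    d1 = {n: y[2 * n + 1] / c.kappa for n in range(m)}
    s2 = {n: s1[n] - c.delta * (d1[n - 1] + d1[n]) for n in range(1, m)}
    d2 = {n: d1[n] - c.gamma * (s2[n] + s2[n + 1]) for n in range(1, m - 1)}
    xe = {n: s2[n] - c.beta * (d2[n - 1] + d2[n]) for n in range(2, m - 1)}
    xo = {n: d2[n] - c.alpha * (xe[n] + xe[n + 1]) for n in range(2, m - 2)}
    return s1, d1, s2, d2, xe, xo
```

Because each normalization overlaps a two-point operation, the round takes eight two-point cycles and yields two outputs, which matches the four-cycle average stated for this embodiment.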
【００９１】
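As described above, in the wavelet transform method of this embodiment, the process of normalizing the input data ..., Y(2n), Y(2n+1), ... and the conversion process that converts normalized intermediate data into other intermediate data are executed simultaneously, in parallel, within one clock cycle. The average period needed to calculate one output datum can therefore be reduced to four clock cycles, shortening the output calculation period.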
【００９２】
The line-based two-dimensional inverse DWT using the wavelet transform device 1 is described next.
【００９３】
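As Fig. 11 shows, the subbands (band components) input to the horizontal filtering unit 4A are subbands 23LL and 23HL, or subbands 23LH and 23HH. Subband 23LL consists of the horizontal low-band component (L) and the vertical low-band component (L); subband 23HL consists of the horizontal high-band component (H) and the vertical low-band component (L); subband 23LH consists of the horizontal low-band component (L) and the vertical high-band component (H); and subband 23HH consists of the horizontal high-band component (H) and the vertical high-band component (H).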
【００９４】
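When the subbands input to the horizontal filtering unit 4A are 23LL and 23HL, or 23LH and 23HH, the input data ..., Y(n-1), Y(n), Y(n+1), ... of Figs. 3 to 10 are the horizontal data of subbands 23LL and 23HL arranged alternately, or the horizontal data of subbands 23LH and 23HH arranged alternately. Horizontal filtering of the input data made up of subbands 23LL and 23HL performs the horizontal synthesis and outputs subband 23L. Horizontal filtering of the input data made up of subbands 23LH and 23HH performs the horizontal synthesis and outputs subband 23H. The output data ..., X(n-1), X(n), X(n+1), ... of Figs. 3 to 10 represent one horizontal line of subband 23L or subband 23H.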
【００９５】
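As Fig. 12 shows, the subbands input to the vertical filtering unit 4B are subband 23L and subband 23H. In this case the input data ..., Y(n-1), Y(n), Y(n+1), ... of Figs. 3 to 10 are the vertical data of subbands 23L and 23H arranged alternately. Vertical filtering of the input data made up of subbands 23L and 23H performs the vertical synthesis and outputs the image data 23. The output data ..., X(n-1), X(n), X(n+1), ... of Figs. 3 to 10 represent one vertical line of the image data 23. The image data 23 is rectangular data with W horizontal pixels and H vertical pixels.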
【００９６】
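Subbands 23LL, 23HL, 23LH, and 23HH are rectangular data with H/2 vertical pixels and W/2 horizontal pixels. As Fig. 12 schematically shows, they are input to the horizontal filtering unit 4A as data sequences ..., Y_i(2n), Y_i(2n+1), Y_i(2n+2), ... arranged in the vertical direction, either with subband 23LL (even rows, even columns) and subband 23HL (even rows, odd columns) as one set, or with subband 23LH (odd rows, even columns) and subband 23HH (odd rows, odd columns) as one set. The subscript i of input data Y_i(k) is the number of the pixel column to which Y_i(k) belongs, and takes the values i = 0, 1, ..., W-1 (W: number of horizontal pixels). In the figure, the data are divided into two areas, an even-row storage area 24L holding subbands 23LL and 23HL as one set and an odd-row storage area 24H holding subbands 23LH and 23HH as one set, but the arrangement of the data in memory is not limited to this.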
【００９７】
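Specifically, the first ring memory 3A and the horizontal filtering unit 4A repeat each round of processing, including the Nth process (Fig. 3) to the (N+7)th process (Fig. 10), pixel by pixel while switching alternately between the low-band side (storage area 24L) and the high-band side (storage area 24H).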
【００９８】
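For example, the Nth process (Fig. 3) is executed once on the first pixel row on the storage area 24L side, then the (N+1)th process (Fig. 4) is executed once, then the (N+2)th process (Fig. 5) is executed once, and so on. The same is then done for the first pixel row on the storage area 24H side, then for the second pixel row on the 24L side followed by the second pixel row on the 24H side, then for the third pixel row on the 24L side followed by the third pixel row on the 24H side, and so on, until finally the (H/2)th pixel row on the 24L side has been processed, followed by the (H/2)th pixel row on the 24H side.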
【００９９】
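As Fig. 13 schematically shows, the first ring memory 3A has a storage area 26 that holds five points (five pixels) of data corresponding to the input data ..., X_j(k), X_{j+1}(k), ..., and can hold the temporary data and intermediate data described above.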
【０１００】
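As a result, the horizontal filtering unit 4A outputs, alternately and continuously, the horizontal lines of subband 23L (height H/2), in which subbands 23LL and 23HL are synthesized, and the horizontal lines of subband 23H (height H/2), in which subbands 23LH and 23HH are synthesized. The horizontal lines of subband 23L are buffered in the L line buffer 5L in the line buffer circuit 5, and the horizontal lines of subband 23H are buffered in the H line buffer 5H in the line buffer circuit 5.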
【０１０１】
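For example, when the (N+6)th process (Fig. 9) has been executed continuously on each of the 1st to Wth pixels, the one line of data X_0(2n+2), X_1(2n+2), ..., X_j(2n+2), ..., X_{W-1}(2n+2), in which the (2n+2)th horizontal components are synthesized, is output continuously and buffered in the L line buffer 5L. Next, when the (N+7)th process (Fig. 10) has been executed continuously on each of the 1st to Wth pixels, the one line of data X_0(2n+1), X_1(2n+1), ..., X_j(2n+1), ..., X_{W-1}(2n+1), in which the (2n+1)th horizontal components are synthesized, is output continuously and buffered in the H line buffer 5H.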
【０１０２】
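Under control of the MMU 2, the line buffer circuit 5 supplies one horizontal line from the L line buffer 5L and one horizontal line from the H line buffer 5H alternately, one line at a time, to the second ring memory 3B. The data output to the second ring memory 3B is processed by the vertical filtering unit 4B.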
【０１０３】
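Specifically, the second ring memory 3B and the vertical filtering unit 4B repeat the processing, including the Nth process (Fig. 3) to the (N+7)th process (Fig. 10), for each pixel column in units of horizontal lines. For example, the Nth process (Fig. 3) is executed on pixel column 0, then on pixel column 1, then on pixel column 2, and so on, and finally on pixel column W-1. Next, the (N+1)th process (Fig. 4) is executed on pixel column 0, then on pixel column 1, then on pixel column 2, and so on, and finally on pixel column W-1. In this way each process is executed in turn for every pixel column. As Fig. 12 schematically shows, the second ring memory 3B has a storage area 24 that holds 5×W points (five lines) of data corresponding to the input data sequence, and can hold the temporary data and intermediate data described above.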
【０１０４】
As a result, the vertical filtering unit 4B outputs the image data 23 from the data rows it receives one horizontal line at a time.
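The horizontal-then-vertical flow can be sketched as below. This is a whole-image, non-streaming illustration: the function names are hypothetical, the ring memories and line buffers are replaced by ordinary lists, and the one-dimensional synthesis is passed in as `synth_1d` so that any inverse lifting routine can be plugged in. The `toy_synth` used in the demonstration is a simple sum/difference stand-in, not the 9×7 filter.

```python
def interleave(first, second):
    """Alternate items of two equal-length sequences: f0, s0, f1, s1, ..."""
    out = []
    for a, b in zip(first, second):
        out.extend([a, b])
    return out


def synthesize_2d(ll, hl, lh, hh, synth_1d):
    """Combine four subbands (lists of rows) into one image.

    synth_1d takes an interleaved low/high sequence and returns the
    synthesized sequence of the same length.
    """
    # Horizontal pass: LL+HL rows give subband L, LH+HH rows give subband H.
    l_rows = [synth_1d(interleave(a, b)) for a, b in zip(ll, hl)]
    h_rows = [synth_1d(interleave(a, b)) for a, b in zip(lh, hh)]
    # The line buffer hands L and H lines alternately to the vertical stage.
    rows = interleave(l_rows, h_rows)
    # Vertical pass: run the same 1-D synthesis down every pixel column.
    width = len(rows[0])
    columns = [synth_1d([row[i] for row in rows]) for i in range(width)]
    # Transpose the columns back into image rows.
    return [[columns[i][j] for i in range(width)] for j in range(len(rows))]


def toy_synth(seq):
    """Stand-in 1-D synthesis: each (low, high) pair -> (low+high, low-high)."""
    out = []
    for i in range(0, len(seq), 2):
        out.extend([seq[i] + seq[i + 1], seq[i] - seq[i + 1]])
    return out


if __name__ == "__main__":
    print(synthesize_2d([[1]], [[2]], [[3]], [[4]], toy_synth))
```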
【０１０５】
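By executing the above processing recursively, band components of any decomposition level can be synthesized and the image data restored. That is, by recursively inputting the subbands LL(k+1), HL(k+1), LH(k+1), and HH(k+1) of decomposition level k+1 (k is an integer) into wavelet transform device 1, the level-k subband LL(k) can be obtained.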
【０１０６】
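As described above, wavelet transform device 1 of this embodiment includes horizontal filtering unit 4A and vertical filtering unit 4B, each having the configuration shown in FIG. 2, so the calculation period of the output data can be shortened. Line-based two-dimensional wavelet transformation can therefore be performed in a short time.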
【０１０７】
＜第２の実施形態＞
次に、本発明の第２の実施形態に係るウェーブレット変換装置およびウェーブレット変換方法について説明する。図１４は、第２の実施形態に係るウェーブレット変換装置３０の概略構成を示す図である。このウェーブレット変換装置３０は、サブバンドの２次元画像データを一時的に保持するバッファ３４、外部供給のクロック信号ＣＬＫと同期して動作するＭＭＵ（メモリ管理部）３１、第１リングメモリ３２Ａ、水平フィルタリング部３３Ａ、第２リングメモリ３２Ｂおよび垂直フィルタリング部３３Ｂを備えて構成されている。ここで、第１リングメモリ３２Ａ、水平フィルタリング部３３Ａ、第２リングメモリ３２Ｂおよび垂直フィルタリング部３３Ｂは、外部供給の画素クロック信号ＰＣＬＫと同期して動作する。
【０１０８】
なお、図中、第１および第２リングメモリ３２Ａ，３２Ｂの画素数あるいはライン数が８ｏｒ９となっているが、この第２の実施形態においては、第１リングメモリ３２Ａは、９点のリングメモリであり、第２リングメモリ３２Ｂは、９ラインのリングメモリである。
【０１０９】
本実施形態では、ＭＭＵ３１、水平フィルタリング部３３Ａおよび垂直フィルタリング部３３Ｂはハードウェアで構成されるが、この代わりに、マイクロプロセッサで実行する命令群を含むコンピュータ・プログラムで構成されてもよい。
【０１１０】
このウェーブレット変換装置３０に入力したサブバンドの２次元画像データはバッファ３４に一時的に記憶される。ウェーブレット変換装置３０は、２次元画像データにラインベースの２次元逆ＤＷＴを１回施す機能を有し、ｋ＋１次レベルのサブバンド２３ＬＬ，２３ＨＬ，２３ＬＨ，２３ＨＨを合成して、ｋ次のサブバンド２３ＬＬを生成する。水平フィルタリング部３３Ａと垂直フィルタリング部３３Ｂとは、第２リングメモリ３２Ｂを介して直列に接続されている。サブバンドのデータは、水平フィルタリング部３３Ａで水平方向にフィルタリングされた後に、垂直フィルタリング部３３Ｂで垂直方向にフィルタリングされる。２次以上の分解レベルのサブバンドを合成する２次元逆ＤＷＴを実行する場合、このウェーブレット変換装置３０を２回以上繰り返し利用すればよい。
【０１１１】
ＭＭＵ３１は、バッファ３４と第１リングメモリ３２Ａと第２リングメモリ３２Ｂとのデータ入出力を制御する機能を有しており、バッファ３４から読出したサブバンドのデータを第１リングメモリ３２Ａに転送し記憶させることができる。詳しくは、サブバンド２３ＬＬおよび２３ＨＬの水平方向のデータが画素単位で交互に配列されたデータ、および、サブバンド２３ＬＨおよび２３ＨＨの水平方向のデータが画素単位で交互に配列されたデータとが、第１リングメモリ３２Ａに記憶される。水平フィルタリング部３３Ａは、第１リングメモリ３２Ａから入力したデータに対して水平方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、その高域成分と低域成分とを合成したデータを１点ずつ算出して第２リングメモリ３２Ｂに出力できる。詳しくは、サブバンド２３ＬＬおよび２３ＨＬが合成されたデータと、サブバンド２３ＬＨおよび２３ＨＨが合成されたデータとが交互に出力されて第２リングメモリ３２Ｂに記憶される。
【０１１２】
次に、ＭＭＵ３１は、この第２リングメモリ３２Ｂから垂直フィルタリング部３３Ｂにデータを入力させる。垂直フィルタリング部３３Ｂは、入力したデータに対して垂直方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、高域成分と低域成分とを合成したデータを１点ずつ算出し、出力する。
【０１１３】
水平フィルタリング部３３Ａの構成と垂直フィルタリング部３３Ｂの構成とは互いに同一である。図１５に、フィルタリング部３３（水平フィルタリング部３３Ａまたは垂直フィルタリング部３３Ｂ）の概略構成を示す。図１５に示すリングメモリ３２は、図１４に示した第１リングメモリ３２Ａと第２リングメモリ３２Ｂとの何れか一方を表すものとする。
【０１１４】
このフィルタリング部３３は、入力データを選択的に取り込む第１データ・セレクタ３５、第１係数乗算器３６、遅延レジスタ４０、第２データ・セレクタ４１、第３データ・セレクタ４２、加算器４３，４８，４９，５４、第２係数乗算器４４、第３係数乗算器５０、出力先選択部（ＤＭＵＸ）５５、および制御部５６を備えて構成される。これら構成要素のうち、２個の加算器４３，４８と第２係数乗算器４４からなる組は、３点のデータを１クロック周期内に処理する３点演算部を構成する。また、２個の加算器４９，５４と第３係数乗算器５０からなる組も同様に３点演算部を構成する。また、これら２組の３点演算部と出力先選択部５５とで中間データ算出手段が構成される。
【０１１５】
制御部５６は、画素クロック信号ＰＣＬＫと同期して動作する。第１データ・セレクタ３５は、この制御部５６から供給される選択制御信号ＳＥＬ０の値に応じて、リングメモリ３２で取り込んだデータを第１端子Ｓ０〜第７端子Ｓ６の何れかから選択的に出力する。
【０１１６】
第１データ・セレクタ３５の第１端子Ｓ０から出力されたデータは、第１係数乗算器３６に入力される。第１係数乗算器３６では、係数レジスタ３７は、制御部５６から供給される制御信号Ｃ０の値に応じて、規格化係数１／κ，κの何れか一方を乗算器３８に出力し、乗算器３８は、入力データにその規格化係数を乗算する規格化処理を１クロック周期内に実行する。第１係数乗算器３６から出力されたデータは、遅延レジスタ４０で画素クロック信号ＰＣＬＫの１クロック周期遅延した後に第２データ・セレクタ４１に入力される。なお、第１係数乗算器３６と遅延レジスタ４０とで本発明の規格化手段が構成される。
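The normalization path described above (a coefficient multiplier followed by a one-clock delay register) can be sketched as follows. This is only an illustrative model, not the circuit itself: the class name `NormalizeStage`, the method `tick`, and the numeric value of `KAPPA` are assumptions that do not appear in the source.

```python
# Assumed numeric value for the normalization coefficient kappa
# (a commonly quoted 9/7 filter value; the source gives no number).
KAPPA = 1.230174105


class NormalizeStage:
    """Hypothetical model of first coefficient multiplier 36 plus delay register 40."""

    def __init__(self):
        self.delay_register = None  # value latched in the previous clock

    def tick(self, value, use_kappa):
        """Advance one clock: return last clock's product, latch the new one."""
        delayed = self.delay_register
        coef = KAPPA if use_kappa else 1.0 / KAPPA  # role of control signal C0
        self.delay_register = None if value is None else coef * value
        return delayed
```

Each call to `tick` stands for one PCLK cycle, so a value multiplied in one cycle becomes visible to the downstream selector in the next cycle, which matches the one-clock lead of the normalization regions in the text.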
【０１１７】
３点演算部においては、加算器４３は、第３データ・セレクタ４２の第１端子Ｓ０と第２端子Ｓ１とから出力された２点のデータを加算して第２係数乗算器４４に出力する。第２係数乗算器４４では、係数レジスタ４５は、制御部５６から供給される制御信号Ｃ１の値に応じて、リフティング係数δ，αの何れか一方を入力データに乗算し、２の補数演算回路４７で符号が反転された後、加算器４８に出力される。そして、加算器４８は、第３データ・セレクタ４２の第３端子Ｓ２から入力したデータと、第２係数乗算器４４から入力したデータとを加算して出力先選択部５５に出力する。
【０１１８】
また、加算器４９は、第３データ・セレクタ４２の第４端子Ｓ３と第５端子Ｓ４とから出力された２点のデータを加算して第３係数乗算器５０に出力する。第３係数乗算器５０では、係数レジスタ５１は、制御部５６から供給される制御信号Ｃ２の値に応じて、リフティング係数β，γの何れか一方を入力データに乗算し、２の補数演算回路５３で符号が反転した後、加算器５４に出力される。加算器５４は、第３データ・セレクタ４２の第６端子Ｓ５から入力したデータと、第３係数乗算器５０から入力したデータとを加算して出力先選択部５５に出力する。
【０１１９】
出力先選択部５５は、制御部５６から供給される選択制御信号ＳＥＬ３の値に応じて、加算器４８，５４から並列に入力する２点のデータを第１端子Ｋ０から第３端子Ｋ２のいずれかから出力する。
【０１２０】
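The data output from second terminal K1 of output destination selector 55 is also branched to the external MMU 31, and the data output from third terminal K2 is output to the external MMU 31. MMU 31 can transfer the data output from second terminal K1 and third terminal K2 to ring memory 32 and store it there.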
【０１２１】
次に、以上のフィルタリング部３３を用いたリフティング演算の代表例を、図１６〜図１９を参照しつつ以下に説明する。図１６〜図１９は、９×７タップのDaubechiesフィルタのリフティング構成を模式的に示す格子図である。この格子図の演算は、図３７の場合と同様に行われる。なお、図１６〜図１９は、説明の便宜上、各格子点間を結ぶ線分に対応するリフティング係数−α，−β，−γ，−δと規格化係数κ，１／κとを表示していない。
【０１２２】
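FIGS. 16 to 19 schematically show the N-th (N is an integer) through (N+3)-th processing of this embodiment. In the N-th processing (FIG. 16), the two conversion processes of target regions C1 and C2 are executed in parallel within one clock cycle. In target region C2, a three-point operation is executed: the two output data X(2n) and X(2n+2) are added, the sum is multiplied by lifting coefficient −α, and the product is added to intermediate data D²_n. This yields output data X(2n+1) on the series starting from the odd-numbered input data Y(2n+1). The two output data X(2n) and X(2n+2) lie on the two series immediately before and after the series of D²_n. In target region C1, a three-point operation is executed: the two intermediate data S²_{n+2} and S²_{n+3} are added, the sum is multiplied by lifting coefficient −γ, and the product is added to intermediate data D¹_{n+2}. This yields second-stage intermediate data D²_{n+2} on the series starting from the odd-numbered input data Y(2n+5). Here, S²_{n+2} and S²_{n+3} lie on the two series immediately before and after the series of D¹_{n+2}.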
【０１２３】
また、上記対象領域Ｃ１およびＣ２における演算の１クロック前の周期に、対象領域Ｎ１において、入力データＹ（２ｎ＋８）に規格化係数κを乗算する規格化処理が実行され、入力データＹ（２ｎ＋８）の系列上の第１段階の中間データであるＳ¹ _n+4が算出される。
【０１２４】
このＮ回目の具体的な処理の内容は次の通りである。図１５に示すリングメモリ３２は、入力データや中間データや一時データを格納する９ライン（系列）の記憶領域を備えており、参照済みの古いデータを格納する記憶領域に新たなデータを順番に上書きする構造を持つ。
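The overwrite behaviour of the ring memory can be pictured with the small sketch below. It assumes that a series index is mapped to a storage line by a modulo operation; the source only states that new data overwrites already-referenced storage areas in order, so the mapping, the class name `RingMemory`, and its methods are hypothetical.

```python
class RingMemory:
    """Hypothetical 9-line ring memory: old series are overwritten by new ones."""

    def __init__(self, size=9):
        self.size = size
        self.slots = [None] * size

    def write(self, series, value):
        # A new series reuses the line of the series that is `size` positions older.
        self.slots[series % self.size] = value

    def read(self, series):
        return self.slots[series % self.size]
```

With nine lines, writing series 9 lands on the line that previously held series 0, which is the situation in which an intermediate value replaces data that is no longer referenced.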
【０１２５】
ＭＭＵ３１は、このリングメモリ３２に一時記憶された入力データＹ（２ｎ＋８）を第１データ・セレクタ３５に出力させる。制御部５６は、選択制御信号ＳＥＬ０を第１データ・セレクタ３５に供給して、入力データＹ（２ｎ＋８）を第１係数乗算器３６に出力させる。第１係数乗算器３６は、制御部５６から供給された制御信号Ｃ０に従って２個の規格化係数κ，１／κのうち後半の係数κを選択して乗算器３８に供給し、乗算器３８は、入力データと規格化係数κとを乗算した乗算値（＝κ×Ｙ（２ｎ＋８）＝Ｓ¹ _n+4）を遅延レジスタ４０に出力する。この第１係数乗算器３６での係数乗算処理は１クロック周期内に実行される。
【０１２６】
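One clock cycle after this coefficient multiplication, intermediate data S¹_{n+4} stored in delay register 40 is output to second data selector 41. In accordance with selection control signal SEL1 supplied from control unit 56, second data selector 41 outputs S¹_{n+4} from second terminal S1 to MMU 31, and MMU 31 transfers S¹_{n+4} to ring memory 32 and overwrites the already-referenced storage area of input data Y(2n+8). In the same clock cycle in which S¹_{n+4} is output to MMU 31, MMU 31 causes ring memory 32 to output the six temporarily stored data X(2n), D²_n, X(2n+2), S²_{n+2}, D¹_{n+2}, S²_{n+3} to first data selector 35. According to the value of selection control signal SEL0 supplied from control unit 56, first data selector 35 outputs these six data from second terminal S1 through seventh terminal S6. This output is then input to third data selector 42. According to the value of selection control signal SEL2 supplied from control unit 56, third data selector 42 selects the three data X(2n), X(2n+2), D²_n in target region C2 and outputs them from first terminal S0 through third terminal S2, respectively, and selects the three data S²_{n+2}, S²_{n+3}, D¹_{n+2} in target region C1 and outputs them from fourth terminal S3 through sixth terminal S5, respectively.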
【０１２７】
上方の加算器４３は、第３データ・セレクタ４２の第１端子Ｓ０と第２端子Ｓ１から入力した２点のデータＸ（２ｎ），Ｘ（２ｎ＋２）を加算したデータを第２係数乗算器４４に出力する。第２係数乗算器４４において、係数レジスタ４５は、制御部５６から供給される制御信号Ｃ１に従って、２個のリフティング係数α，δのうち後半の係数αを選択して乗算器４６に供給し、乗算器４６は、入力データとリフティング係数αとを乗算した乗算値（＝α×（Ｘ（２ｎ）＋Ｘ（２ｎ＋２）））を２の補数演算回路４７に出力する。２の補数演算回路４７において、符号が反転されたデータは、加算器４８に出力される。そして、加算器４８は、第２係数乗算器４４から入力する乗算値と、第３データ・セレクタ４２の第３端子Ｓ２から入力した中間データＤ² _nとを加算することで、対象領域Ｃ２内の出力データＸ（２ｎ＋１）を算出し、出力先選択部５５に出力する。この出力データＸ（２ｎ＋１）の算出処理は１クロック周期内に実行される。
【０１２８】
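Meanwhile, lower adder 49 adds the two intermediate data S²_{n+2} and S²_{n+3} input from fourth terminal S3 and fifth terminal S4 of third data selector 42 and outputs the sum to third coefficient multiplier 50. In third coefficient multiplier 50, coefficient register 51 selects coefficient γ of the two lifting coefficients β and γ in accordance with control signal C2 supplied from control unit 56 and supplies it to multiplier 52, and multiplier 52 outputs the product of the input data and γ (= γ × (S²_{n+2} + S²_{n+3})) to two's-complement circuit 53. The data whose sign has been inverted in two's-complement circuit 53 is output to adder 54. Adder 54 then adds the value input from third coefficient multiplier 50 to intermediate data D¹_{n+2} input from sixth terminal S5 of third data selector 42, thereby calculating intermediate data D²_{n+2} in target region C1, and outputs it to output destination selector 55. The calculation of D²_{n+2} is executed within one clock cycle.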
【０１２９】
出力先選択部５５は、制御部５６から供給された選択制御信号ＳＥＬ３の値に従って、加算器４８から入力した出力データＸ（２ｎ＋１）を第１端子Ｋ０から出力し、他方の加算器５４から入力した中間データＤ² _n+2を第３端子Ｋ２から外部のＭＭＵ３１に出力し、ＭＭＵ３１は、その中間データＤ² _n+2をリングメモリ３２に転送し、参照済みの記憶領域中間データＤ¹ _n+2に上書きさせる。
【０１３０】
次に、Ｎ＋１回目処理（図１７）における対象領域Ｃ３，Ｃ４の変換処理が行なわれる。対象領域Ｃ３では、２点の中間データＤ¹ _n+3，Ｄ¹ _n+4を加算したデータにリフティング係数−δを乗算することで乗算値を算出した後、この乗算値と中間データＳ¹ _n+4とを加算するという３点演算が実行される。この結果、偶数番目の入力データＹ（２ｎ＋８）を始点とする系列上の第２段階の中間データＳ² _n+4が算出される。ここで、２点の中間データＤ¹ _n+3，Ｄ¹ _n+4は、中間データＳ¹ _n+4に対して１点前後するデータである。また、対象領域Ｃ４では、２点の中間データＤ² _n+1，Ｄ² _n+2を加算したデータにリフティング係数−βを乗算することで乗算値を算出した後、この乗算値と中間データＳ² _n+2とを加算するという３点演算が実行される。この結果、偶数番目の入力データＹ（２ｎ＋４）を始点とする系列上の出力データＸ（２ｎ＋４）が算出される。ここで、２点の中間データＤ² _n+1，Ｄ² _n+2は、中間データＳ² _n+2の系列に対して１点前後する２系列上のデータである。
【０１３１】
また、対象領域Ｃ３，Ｃ４における演算処理が実行される１クロック前の周期において、対象領域Ｎ２における処理が実行される。対象領域Ｎ２においては、入力データＹ（２ｎ＋９）に規格化係数１／κを乗算する規格化処理が実行され、中間データＤ¹ _n+4が出力される。
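The normalization and three-point operations named in target regions N1, N2, and C1 to C4 follow one pattern per stage. Collecting them with a generic series index m (an index introduced here for illustration; the source writes each case with n+2, n+4, and so on), the relations read:

```latex
\begin{align}
S^{1}_{m} &= \kappa \, Y(2m), &
D^{1}_{m} &= \tfrac{1}{\kappa} \, Y(2m+1), \\
S^{2}_{m} &= S^{1}_{m} - \delta \left( D^{1}_{m-1} + D^{1}_{m} \right), &
D^{2}_{m} &= D^{1}_{m} - \gamma \left( S^{2}_{m} + S^{2}_{m+1} \right), \\
X(2m) &= S^{2}_{m} - \beta \left( D^{2}_{m-1} + D^{2}_{m} \right), &
X(2m+1) &= D^{2}_{m} - \alpha \left( X(2m) + X(2m+2) \right).
\end{align}
```

For example, m = n+4 in the first row gives the N1 and N2 results S¹_{n+4} = κ × Y(2n+8) and D¹_{n+4} = (1/κ) × Y(2n+9), and m = n+2 in the D² relation gives the C1 result D²_{n+2}.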
【０１３２】
このＮ＋１回目の具体的な処理内容は次の通りである。まず、１クロック前の周期に実行される対象領域Ｎ２の処理から説明する。ＭＭＵ３１は、このリングメモリ３２に一時記憶された入力データＹ（２ｎ＋９）を第１データ・セレクタ３５に出力させる。制御部５６は、選択制御信号ＳＥＬ０を第１データ・セレクタ３５に供給して、入力データＹ（２ｎ＋９）を第１係数乗算器３６に出力させる。第１係数乗算器３６は、制御部５６から供給された制御信号Ｃ０に従って２個の規格化係数κ，１／κのうち前半の係数１／κを選択して乗算器３８に供給し、乗算器３８は、入力データと規格化係数１／κとを乗算した乗算値（＝１／κ×Ｙ（２ｎ＋９）＝Ｄ¹ _n+4）を遅延レジスタ４０に出力する。この第１係数乗算器３６での係数乗算処理は１クロック周期内に実行される。
【０１３３】
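One clock cycle after this coefficient multiplication, intermediate data D¹_{n+4} stored in delay register 40 is output to second data selector 41. In accordance with selection control signal SEL1 supplied from control unit 56, second data selector 41 outputs D¹_{n+4} from first terminal S0 to third data selector 42 and also outputs it from second terminal S1 to MMU 31; MMU 31 transfers D¹_{n+4} to ring memory 32 and overwrites the already-referenced storage area of input data Y(2n+9). In the same clock cycle in which D¹_{n+4} is output to third data selector 42, MMU 31 causes ring memory 32 to output the five temporarily stored data D²_{n+1}, S²_{n+2}, D²_{n+2}, D¹_{n+3}, S¹_{n+4} to first data selector 35. According to the value of selection control signal SEL0 supplied from control unit 56, first data selector 35 outputs these five data from second terminal S1 through sixth terminal S5. This output is then input to third data selector 42. Third data selector 42 selects, from the five data, the three data D²_{n+1}, D²_{n+2}, S²_{n+2} in target region C4 and outputs them from fourth terminal S3 through sixth terminal S5, and selects the two data of target region C3 among the five together with the data input from second data selector 41, namely D¹_{n+3}, D¹_{n+4}, S¹_{n+4}, and outputs them from first terminal S0 through third terminal S2.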
【０１３４】
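Upper adder 43 adds the two data D¹_{n+3} and D¹_{n+4} input from first terminal S0 and second terminal S1 of third data selector 42 and outputs the sum to second coefficient multiplier 44. In second coefficient multiplier 44, coefficient register 45 selects coefficient δ of the two lifting coefficients α and δ in accordance with control signal C1 supplied from control unit 56 and supplies it to multiplier 46, and multiplier 46 outputs the product of the input data and δ (= δ × (D¹_{n+3} + D¹_{n+4})) to two's-complement circuit 47. The data whose sign has been inverted in two's-complement circuit 47 is output to adder 48. Adder 48 then adds the value input from second coefficient multiplier 44 to intermediate data S¹_{n+4} input from third terminal S2 of third data selector 42, thereby calculating intermediate data S²_{n+4} in target region C3, and outputs it to output destination selector 55. The calculation of S²_{n+4} is executed within one clock cycle.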
【０１３５】
一方、下方の加算器４９は、第３データ・セレクタ４２の第４端子Ｓ３と第５端子Ｓ４とから入力した２点の中間データＤ² _n+1，Ｄ² _n+2を加算したデータを第３係数乗算器５０に出力する。第３係数乗算器５０では、係数レジスタ５１は、制御部５６から供給される制御信号Ｃ２に従って、２個のリフティング係数β，γのうち前半の係数βを選択して乗算器５２に供給し、乗算器５２は、入力データとリフティング係数βとを乗算した乗算値（＝β×（Ｄ² _n+1＋Ｄ² _n+2））を２の補数演算回路５３に出力する。２の補数演算回路５３において、符号が反転されたデータは、加算器５４に出力される。そして、加算器５４は、第３係数乗算器５０から入力する乗算値と、第３データ・セレクタ４２の第６端子Ｓ５から入力した中間データＳ² _n+2とを加算することで、対象領域Ｃ４内の出力データＸ（２ｎ＋４）を算出し、出力先選択部５５に出力する。この出力データＸ（２ｎ＋４）の算出処理は１クロック周期内に実行される。
【０１３６】
出力先選択部５５は、制御部５６から供給された選択制御信号ＳＥＬ３の値に従って、加算器５４から入力した出力データＸ（２ｎ＋４）を第２端子Ｋ１から出力し、他方の加算器４８から入力した中間データＳ² _n+4を第３端子Ｋ２から外部のＭＭＵ３１に出力し、ＭＭＵ３１は、その中間データＳ² _n+4をリングメモリ３２に転送し、参照済みの記憶領域中間データＳ¹ _n+4に上書きさせる。また、第２端子Ｋ１から出力された出力データＸ（２ｎ＋４）は分岐して外部のＭＭＵ３１にも出力され、ＭＭＵ３１は、その出力データＸ（２ｎ＋４）をリングメモリ３２に転送し、参照済みの記憶領域中間データＳ² _n+2に上書きさせる。
【０１３７】
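Next, the conversion processes of target regions C5 and C6 in the (N+2)-th processing (FIG. 18) are executed. In the cycle one clock before the operations in target regions C5 and C6, the normalization of target region N3 is executed. Target regions C5, C6, and N3 are the regions obtained by shifting target regions C1, C2, and N1 of the N-th processing (FIG. 16) backward by two series (two points), and the same processing as in C1, C2, and N1 is executed in them. Accordingly, in target region N3, the even-numbered input data Y(2n+10) is multiplied by normalization coefficient κ to calculate intermediate data S¹_{n+5}. In target region C5, a three-point operation is executed: the two intermediate data S²_{n+3} and S²_{n+4} are added, the sum is multiplied by lifting coefficient −γ, and the product is added to intermediate data D¹_{n+3}. This yields second-stage intermediate data D²_{n+3} on the series starting from the odd-numbered input data Y(2n+7). In target region C6, a three-point operation is executed: the two output data X(2n+2) and X(2n+4) are added, the sum is multiplied by lifting coefficient −α, and the product is added to intermediate data D²_{n+1}. This yields output data X(2n+3) on the series starting from the odd-numbered input data Y(2n+3).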
【０１３８】
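Next, the conversion processes of target regions C7 and C8 in the (N+3)-th processing (FIG. 19) are executed. In the cycle one clock before the operations in target regions C7 and C8, the normalization of target region N4 is executed. Target regions C7, C8, and N4 are the regions obtained by shifting target regions C3, C4, and N2 of the (N+1)-th processing (FIG. 17) backward by two series (two points), and the same processing as in C3, C4, and N2 is executed in them. Accordingly, in target region N4, input data Y(2n+11) is multiplied by normalization coefficient 1/κ to calculate intermediate data D¹_{n+5}. In target region C7, a three-point operation is executed: the two odd-numbered intermediate data D¹_{n+4} and D¹_{n+5} are added, the sum is multiplied by lifting coefficient −δ, and the product is added to the even-numbered intermediate data S¹_{n+5}. This yields second-stage intermediate data S²_{n+5} on the series starting from the even-numbered input data Y(2n+10). In target region C8, a three-point operation is executed: the two intermediate data D²_{n+2} and D²_{n+3} are added, the sum is multiplied by lifting coefficient −β, and the product is added to intermediate data S²_{n+3}. This yields output data X(2n+6) on the series starting from the even-numbered input data Y(2n+6).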
【０１３９】
以上のように、上記Ｎ回目処理（図１６）およびＮ＋１回目処理（図１７）と同様の処理が、全ての点の出力データが算出されるまで対象領域を移動させつつ繰り返し実行される。これにより、偶数番目或いは奇数番目の１点の出力データを算出するのに要する平均周期を１クロック周期とすることができ、出力データの算出周期を大幅に短縮化できる。
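As a software cross-check of the arithmetic only (not of the one-output-per-clock scheduling), the lifting stages described for FIGS. 16 to 19 can be sketched as a plain loop. Several things here are assumptions rather than content of the source: the numeric coefficient values (commonly quoted 9/7 lifting values), the symmetric boundary extension in `mirror`, and the helper `forward_lift`, which exists only so that the inverse can be checked by a round trip.

```python
# Assumed coefficient values; the source names alpha..delta and kappa without numbers.
ALPHA, BETA = -1.586134342, -0.052980118
GAMMA, DELTA = 0.882911075, 0.443506852
KAPPA = 1.230174105


def three_point(before, after, target, coef):
    """One three-point operation: target - coef * (before + after)."""
    return target - coef * (before + after)


def mirror(i, n):
    """Assumed whole-sample symmetric extension at the signal boundaries."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def inverse_lift(y):
    """Normalize, then apply the delta, gamma, beta, alpha stages in that order."""
    n = len(y)
    v = [KAPPA * t if i % 2 == 0 else t / KAPPA for i, t in enumerate(y)]
    for coef, parity in ((DELTA, 0), (GAMMA, 1), (BETA, 0), (ALPHA, 1)):
        for i in range(parity, n, 2):
            v[i] = three_point(v[mirror(i - 1, n)], v[mirror(i + 1, n)], v[i], coef)
    return v


def forward_lift(x):
    """Hypothetical mirror-image forward transform, used only for the round trip."""
    n = len(x)
    v = list(x)
    for coef, parity in ((ALPHA, 1), (BETA, 0), (GAMMA, 1), (DELTA, 0)):
        for i in range(parity, n, 2):
            v[i] = v[i] + coef * (v[mirror(i - 1, n)] + v[mirror(i + 1, n)])
    return [t / KAPPA if i % 2 == 0 else t * KAPPA for i, t in enumerate(v)]
```

Because every stage updates samples of one parity while reading only samples of the other parity, `inverse_lift(forward_lift(x))` returns `x` up to floating-point rounding. The hardware described in the text produces the same values but interleaves the stages so that two three-point operations complete in each clock cycle.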
【０１４０】
次に、上記ウェーブレット変換装置３０を用いたラインベースの２次元逆ＤＷＴ処理を以下に説明する。
【０１４１】
水平フィルタリング部３３Ａに入力するサブバンド（帯域成分）は、図１１に示すように、サブバンド２３ＬＬおよび２３ＨＬ、あるいは、サブバンド２３ＬＨおよび２３ＨＨである。
【０１４２】
図１６〜図１９で示した入力データ・・・，Ｙ（ｎ−１），Ｙ（ｎ），Ｙ（ｎ＋１），・・・は、サブバンド２３ＬＬと２３ＨＬの水平方向のデータを交互に配列したデータ、あるいは、サブバンド２３ＬＨと２３ＨＨの水平方向のデータを交互に配列したデータである。そして、サブバンド２３ＬＬと２３ＨＬとからなる入力データに対して水平フィルタリングを施すことにより、サブバンド２３Ｌが出力され、サブバンド２３ＬＨと２３ＨＨとからなる入力データに対して水平フィルタリングを施すことによりサブバンド２３Ｈが出力される。図１６〜図１９で示した出力データ・・・，Ｘ（ｎ−１），Ｘ（ｎ），Ｘ（ｎ＋１），・・・は、サブバンド２３Ｌあるいはサブバンド２３Ｈの水平方向の１ラインのデータ列を示している。
【０１４３】
次に、垂直フィルタリング部３３Ｂが入力するサブバンドは、図１１に示すように、サブバンド２３Ｌおよびサブバンド２３Ｈである。この場合には、図１６〜図１９で示した入力データ・・・，Ｙ（ｎ−１），Ｙ（ｎ），Ｙ（ｎ＋１），・・・は、サブバンド２３Ｌと２３Ｈの垂直方向のデータを交互に配列したデータである。そして、サブバンド２３Ｌと２３Ｈとからなる入力データに対して垂直フィルタリングを施すことにより、画像データ２３が出力される。図１６〜図１９で示した出力データ・・・，Ｘ（ｎ−１），Ｘ（ｎ），Ｘ（ｎ＋１），・・・は、画像データ２３の垂直方向の１ラインのデータ列を示している。画像データ２３は、水平画素数Ｗ、垂直画素数Ｈを有する矩形状のデータである。
【０１４４】
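Subbands 23LL, 23HL, 23LH, and 23HH are rectangular data sets with H/2 vertical pixels and W/2 horizontal pixels. As shown schematically in FIG. 20, they are input to horizontal filtering unit 33A as data sequences …, Y_i(2n), Y_i(2n+1), Y_i(2n+2), … arranged in the vertical direction, either as the pair of subband 23LL (even rows, even columns) and subband 23HL (even rows, odd columns), or as the pair of subband 23LH (odd rows, even columns) and subband 23HH (odd rows, odd columns). In other words, each pixel row (a horizontal data sequence in the figure) in storage area 58L is a data sequence in which the pixels of each horizontal line of subbands 23LL and 23HL are arranged alternately, and each pixel row (a horizontal data sequence in the figure) in storage area 58H is a data sequence in which the pixels of each horizontal line of subbands 23LH and 23HH are arranged alternately. The subscript i of input data Y_i(k) denotes the number of the pixel column to which Y_i(k) belongs, and takes the values i = 0, 1, …, W−1 (W: number of horizontal pixels). In the figure, the storage is divided into two areas, an even-row storage area 58L holding the pair of subbands 23LL and 23HL and an odd-row storage area 58H holding the pair of subbands 23LH and 23HH, but the arrangement of the data in memory is not limited to this.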
【０１４５】
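Specifically, first ring memory 32A and horizontal filtering unit 33A repeat each round of processing, including the N-th processing (FIG. 16) and the (N+1)-th processing (FIG. 17), pixel by pixel while alternately switching between the low-band side (storage area 58L) and the high-band side (storage area 58H).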
【０１４６】
例えば、上記Ｎ回目処理（図１６）が、メモリ領域５８Ｌ側の１番目の画素行に対して１回実行された後に、上記Ｎ＋１回目処理（図１７）が１回実行され、更に、上記Ｎ＋２回目処理（図１８）が１回実行され、・・・といった処理が行われる。同様に、記憶領域５８Ｈ側の１番目の画素行に対して実行され、次に、記憶領域５８Ｌ側の２番目の画素行に対して実行された後に、記憶領域５８Ｈ側の２番目の画素行に対して実行され、次に、記憶領域５８Ｌ側の３番目の画素行に対して実行された後に、記憶領域５８Ｈ側の３番目の画素行に対して実行され、・・・、最終的に、記憶領域５８Ｌ側のＨ／２番目の画素行に対して実行された後に、記憶領域５８Ｈ側のＨ／２番目の画素行に対して実行される。
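The row order described above, alternating between the low-band area and the high-band area, can be written out as a short generator. The function name `row_schedule` and the string labels are illustrative assumptions; only the ordering comes from the text.

```python
def row_schedule(half_height):
    """Yield (storage_area, row_number) in the order the horizontal filter visits rows."""
    for row in range(1, half_height + 1):
        yield ("58L", row)  # row from the 23LL/23HL pair
        yield ("58H", row)  # row from the 23LH/23HH pair
```

For H/2 = 2 this yields 58L row 1, 58H row 1, 58L row 2, 58H row 2, so lines of subband 23L and subband 23H leave the horizontal filter alternately.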
【０１４７】
なお、第１リングメモリ３２Ａは、図２１に模式的に示すように、入力データ…，Ｘ_j（ｋ），Ｘ_j+1（ｋ），…に対応する９点（９画素）のデータを保持する記憶領域５９を有しており、上記一時データや中間データを保持することができる。
【０１４８】
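As a result, horizontal filtering unit 33A outputs, alternately and continuously, the horizontal lines of subband 23L (height H/2), synthesized from subbands 23LL and 23HL, and the horizontal lines of subband 23H (height H/2), synthesized from subbands 23LH and 23HH.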
【０１４９】
そして、サブバンド２３Ｌの水平ラインとサブバンド２３Ｈの水平ラインとが、交互に配列されたデータが、垂直ラインのデータとして、そのまま第２リングメモリ３２Ｂに出力され垂直フィルタリング部３３Ｂで処理される。
【０１５０】
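Specifically, second ring memory 32B and vertical filtering unit 33B repeat the processing, including the N-th processing (FIG. 16) and the (N+1)-th processing (FIG. 17), for each pixel column in units of horizontal lines. For example, the N-th processing (FIG. 16) is executed for the 0th pixel column, then for the 1st pixel column, then for the 2nd pixel column, and so on, and finally for the (W−1)-th pixel column. Next, the (N+1)-th processing (FIG. 17) is executed for the 0th pixel column, then for the 1st pixel column, then for the 2nd pixel column, and so on, and finally for the (W−1)-th pixel column. In this way, each round of processing is executed in turn for all pixel columns. As shown schematically in FIG. 20, second ring memory 32B has a storage area 58 that holds 9 × W points (9 lines) of data corresponding to the input data sequences and can hold the temporary data and intermediate data described above.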
【０１５１】
この結果、垂直フィルタリング部３３Ｂは、水平ライン単位で入力するデータ行から画像データ２３を出力するのである。
【０１５２】
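By executing the above processing recursively, band components of any decomposition level can be synthesized and the image data restored. That is, by recursively inputting the subbands LL(k+1), HL(k+1), LH(k+1), and HH(k+1) of decomposition level k+1 (k is an integer) into wavelet transform device 30, the level-k subband LL(k) can be obtained.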
【０１５３】
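As described above, wavelet transform device 30 of this embodiment includes horizontal filtering unit 33A and vertical filtering unit 33B, each having the configuration shown in FIG. 15, so the calculation period of the output data can be shortened. Line-based two-dimensional wavelet transformation can therefore be performed in a short time.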
【０１５４】
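Furthermore, the second embodiment does not need the buffer for storing the output of the horizontal filtering unit that was required in the first embodiment. In the first embodiment, horizontal filtering unit 4A outputs one pixel every four clocks and vertical filtering unit 4B inputs one pixel every four clocks. However, horizontal filtering unit 4A outputs vertical lines consecutively in the (N+6)-th processing (FIG. 9) and the (N+7)-th processing (FIG. 10), whereas vertical filtering unit 4B, after inputting a vertical line in the N-th processing (FIG. 3), does not input another vertical line until the (N+4)-th processing (FIG. 7). A buffer was therefore necessary. In the second embodiment, by contrast, horizontal filtering unit 33A outputs a vertical line in every round of processing and vertical filtering unit 33B inputs a vertical line in every round of processing, so no buffer is needed.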
【０１５５】
＜第３の実施形態＞
次に、本発明の第３の実施形態に係るウェーブレット変換装置およびウェーブレット変換方法について説明する。本実施形態に係るウェーブレット変換装置は、水平フィルタリング部と垂直フィルタリング部を除いて、上記第２の実施形態に係るウェーブレット変換装置３０（図１４）の構成と同じ構成を有する。ただし、第２の実施形態においては第１，第２リングメモリ３２Ａ，３２Ｂは、それぞれ９点、９ラインのリングメモリであったが、この実施の形態においては、第１，第２リングメモリ３２Ａ，３２Ｂは、それぞれ８点、８ラインのリングメモリである。
【０１５６】
図２２は、第３の実施形態に係るフィルタリング部３３ｓの概略構成を示す図である。このフィルタリング部３３ｓは、水平フィルタリング部または垂直フィルタリング部を示し、また、リングメモリ３２ｓは、図１４に示した第１リングメモリ３２Ａまたは第２リングメモリ３２Ｂの何れかを示すものとする。
【０１５７】
このフィルタリング部３３ｓは、リングメモリ３２ｓから入力データを選択的に取り込む第１，第２データ・セレクタ６０，６５、遅延レジスタ６４、第１〜第５係数乗算器６１，６６，７１，７６，８１、加算器７０，７５，８０，８５、出力先選択部（ＤＭＵＸ）８６、および制御部８７を備えて構成される。これら構成要素のうち、第２係数乗算器６６と加算器７０の組は、２点のデータを上記ステップａ或いはステップｂ（図３８）の方法で処理する２点演算部を構成する。その他、第３係数乗算器７１と加算器７５の組、第４係数乗算器７６と加算器８０の組、および第５係数乗算器８１と加算器８５の組も同様に２点演算部を構成している。また、これら２点演算部と出力先選択部８６とで中間データ算出手段が構成される。
【０１５８】
制御部８７は、画素クロック信号ＰＣＬＫと同期して動作する。第１データ・セレクタ６０は、この制御部８７から供給される選択制御信号ＳＥＬ０の値に応じて、リングメモリ３２ｓから取り込んだデータを第１端子Ｓ０〜第８端子Ｓ７の何れかから選択的に出力する。
【０１５９】
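The data output from first terminal S0 of first data selector 60 is input to first coefficient multiplier 61. In first coefficient multiplier 61, coefficient register 62 outputs either normalization coefficient κ or 1/κ to multiplier 63 according to the value of control signal C0 supplied from control unit 87, and multiplier 63 multiplies the input data by that normalization coefficient. The output data of multiplier 63 is input to delay register 64. This normalization in first coefficient multiplier 61 is executed within one clock cycle. First coefficient multiplier 61 and delay register 64 constitute the normalization means. The output of delay register 64 is input to second data selector 65 and is also branched to MMU 31.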
【０１６０】
第２データ・セレクタ６５は、制御部８７から供給される選択制御信号ＳＥＬ１の値に応じて、遅延レジスタ６４および第１データ・セレクタ６０から取り込んだデータを第１端子Ｓ０〜第８端子Ｓ７の何れかから選択的に出力する。第２〜第５係数乗算器６６，７１，７６，８１は、それぞれ、制御信号Ｃ１〜Ｃ４に従って入力データにリフティング係数−α，−β，−γ，−δを乗算する回路である。係数レジスタ６７，７２，７７，８２は、制御信号Ｃ１〜Ｃ４を受けて、リフティング係数α，β，γ，δをそれぞれ乗算器６８，７３，７８，８３に出力する。乗算器６８，７３，７８，８３は、それぞれ、第２データ・セレクタ６５の出力端子Ｓ０，Ｓ２，Ｓ４，Ｓ６から入力するデータにリフティング係数α，β，γ，δを乗算して出力する。２の補数演算回路６９，７４，７９，８４は、それぞれ乗算器６８，７３，７８，８３からの出力データの符号を反転させる。加算器７０，７５，８０，８５は、それぞれ、第２〜第５係数乗算器６６，７１，７６，８１から入力したデータと、第２データ・セレクタ６５の出力端子Ｓ１，Ｓ３，Ｓ５，Ｓ７から入力したデータとを加算して出力先選択部８６に出力する。
【０１６１】
出力先選択部８６は、制御部８７から供給される選択制御信号ＳＥＬ２の値に応じて、加算器７０，７５，８０，８５から並列に入力する４点のデータを第１端子Ｋ０〜第５端子Ｋ４から出力する。第１端子Ｋ０および第２端子Ｋ１から出力されたデータは合成データとして外部に出力される。また、第２端子Ｋ１から分岐されたデータおよび第３端子Ｋ２〜第５端子Ｋ４から出力されたデータは、ＭＭＵ３１に入力される。ＭＭＵ３１は、これら第２端子Ｋ１〜第５端子Ｋ４からＭＭＵ３１へ出力されたデータをリングメモリ３２ｓに転送し記憶させることができる。
【０１６２】
次に、図２２に示すフィルタリング部３３ｓを用いたリフティング演算の代表例を、図２３〜図２５を参照しつつ以下に説明する。この格子図の演算は、図３７の場合と同様に行われる。なお、図２３〜図２５では、説明の便宜上、各格子点間を結ぶ線分に対応するリフティング係数−α，−β，−γ，−δと規格化係数κ，１／κとを表示していない。
【０１６３】
図２３は、Ｎ回目処理（Ｎ：整数）が終了した時点の格子図を示し、図２４、図２５は、それぞれＮ＋１回目、Ｎ＋２回目の処理を模式的に示している。Ｎ回目処理（図２３）では、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２の４個の変換処理が１クロック周期内に並列に同時実行される。対象領域Ａ１では、２点の中間データD¹ _n+2，Ｓ² _n+2を用いた上記ステップａ（図３８）の２点演算を実行して、奇数番目の入力データＹ（２ｎ＋５）を始点とする系列上の第２段階の一時データ（Ｄ² _n+2）を算出する。ここで、中間データＳ² _n+2は、中間データＤ¹ _n+2の系列に対して１点前の系列上のデータである。また、対象領域Ａ２では、２点のデータＤ² _n，Ｘ（２ｎ）を用いた上記ステップａの２点演算を実行して、奇数番目の入力データＹ（２ｎ＋１）を始点とする系列上の出力一時データ（Ｘ（２ｎ＋１））を算出する。また、対象領域Ｂ１では、一時データ（Ｓ² _n+3）と１クロック周期前の演算処理で算出された中間データＤ¹ _n+3とを用いた上記ステップｂ（図３８）の２点演算を実行して、偶数番目の入力データＹ（２ｎ＋６）を始点とする系列上の第２段階の中間データＳ² _n+3を算出する。ここで、中間データＤ¹ _n+3は、一時データ（Ｓ² _n+3）の系列に対して１点後の系列上のデータである。また、対象領域Ｂ２では、出力一時データ（Ｘ（２ｎ＋２））と中間データＤ² _n+1とを用いた上記ステップｂの２点演算を実行して、偶数番目の入力データＹ（２ｎ＋２）を始点とする系列上の出力データＸ（２ｎ＋２）を算出する。
【０１６４】
また、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２における上記並列処理の１クロック前の周期において、対象領域Ｎ１の規格化処理が行なわれる。対象領域Ｎ１においては、入力データＹ（２ｎ＋７）に規格化係数１／κを乗算する規格化処理が実行される。
【０１６５】
このＮ回目の具体的な処理の内容は次の通りである。リングメモリ３２ｓは８ライン（系列）の記憶領域を備えている。
Ｎ回目処理においては、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２内の演算処理が１クロック周期内に行なわれるが、この演算処理の１クロック周期前において、対象領域Ｎ１内の演算処理が行なわれる。この１クロック前の周期における処理から説明する。ＭＭＵ３１は、リングメモリ３２ｓに一時記憶された入力データＹ（２ｎ＋７）を第１データ・セレクタ６０に出力する。第１データ・セレクタ６０は、制御部８７からの選択制御信号ＳＥＬ０の値に応じて、入力データＹ（２ｎ＋７）を第１端子Ｓ０から出力する。
【０１６６】
第１端子Ｓ０から出力された入力データＹ（２ｎ＋７）は、第１係数乗算器６１に入力される。第１係数乗算器６１において、係数レジスタ６２は、制御部８７から供給された制御信号Ｃ０に従って、２個の規格化係数κ，１／κのうち規格化係数１／κを乗算器６３に出力し、乗算器６３は入力データＹ（２ｎ＋７）に規格化係数１／κを乗算する。この結果、第１係数乗算器６１は、データＤ¹ _n+3（＝（１／κ）×Ｙ（２ｎ＋７））を算出する。乗算器６３の出力は、遅延レジスタ６４に入力される。以上の処理が、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２内の演算処理が行なわれる１クロック前の周期において実行される。
【０１６７】
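In the next clock cycle, MMU 31 causes ring memory 32s to output the seven temporarily stored data X(2n), D²_n, (X(2n+2)), D²_{n+1}, S²_{n+2}, D¹_{n+2}, (S²_{n+3}) to first data selector 60. According to the value of selection control signal SEL0 supplied from control unit 87, first data selector 60 outputs these seven data to second data selector 65. In addition, data D¹_{n+3} stored in delay register 64 is output to second data selector 65. The intermediate data D¹_{n+3} output from delay register 64 is also branched to the external MMU 31, and MMU 31 transfers D¹_{n+3} to ring memory 32s and overwrites the already-referenced storage area of input data Y(2n+7).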
【０１６８】
第２データ・セレクタ６５は、制御部８７から供給される選択制御信号ＳＥＬ１の値に応じて、８点のデータのうち対象領域Ａ２内の２点の出力データＸ（２ｎ），Ｄ² _nを選択して第１端子Ｓ０と第２端子Ｓ１とに出力し、対象領域Ｂ２内の中間データＤ² _n+1と一時データ（Ｘ（２ｎ＋２））とを第３端子Ｓ２と第４端子Ｓ３とから出力し、対象領域Ａ１内の中間データＳ² _n+2とＤ¹ _n+2とを第５端子Ｓ４と第６端子Ｓ５とから出力し、対象領域Ｂ１内の中間データＤ¹ _n+3と一時データ（Ｓ² _n+3）とを第７端子Ｓ６と第８端子Ｓ７とから出力する。
【０１６９】
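In second coefficient multiplier 66, coefficient register 67 outputs lifting coefficient α to multiplier 68 in response to control signal C1 supplied from control unit 87, and multiplier 68 multiplies data X(2n), input from first terminal S0, by α and outputs the resulting data α × X(2n). The output data of multiplier 68 has its sign inverted in two's-complement circuit 69 and is output to adder 70. Adder 70 adds the data −α × X(2n) output from second coefficient multiplier 66 to data D²_n input from second terminal S1 of second data selector 65, thereby calculating temporary data (X(2n+1)) in target region A2, and outputs it to output destination selector 86.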
【０１７０】
また、第３係数乗算器７１では、係数レジスタ７２は、制御部８７から供給された制御信号Ｃ２に応じてリフティング係数βを乗算器７３に出力し、乗算器７３は、第３端子Ｓ２から入力した中間データＤ² _n+1にリフティング係数βを乗算して得たデータβ×Ｄ² _n+1を出力する。乗算器７３の出力は、２の補数演算回路７４において符号が反転された後、加算器７５に出力される。加算器７５は、第３係数乗算器７１から出力されたデータ−β×Ｄ² _n+1と、第２データ・セレクタ６５の第４端子Ｓ３から入力した出力一時データ（Ｘ（２ｎ＋２））とを加算することで、対象領域Ｂ２内の出力データＸ（２ｎ＋２）を算出し、出力先選択部８６に出力する。
【０１７１】
また、第４係数乗算器７６では、係数レジスタ７７は、制御部８７から供給された制御信号Ｃ３に応じてリフティング係数γを乗算器７８に出力し、乗算器７８は、第５端子Ｓ４から入力した中間データＳ² _n+2にリフティング係数γを乗算して得たデータγ×Ｓ² _n+2を出力する。乗算器７８の出力は、２の補数演算回路７９において符号が反転された後、加算器８０に出力される。加算器８０は、第４係数乗算器７６から出力されたデータ−γ×Ｓ² _n+2と、第２データ・セレクタ６５の第６端子Ｓ５から入力したデータＤ¹ _n+2とを加算することで、対象領域Ａ１内の一時データ（Ｄ² _n+2）を算出し、出力先選択部８６に出力する。
【０１７２】
また、第５係数乗算器８１では、係数レジスタ８２は、制御部８７から供給された制御信号Ｃ４に応じてリフティング係数δを乗算器８３に出力し、乗算器８３は、第７端子Ｓ６から入力した中間データＤ¹ _n+3にリフティング係数δを乗算して得たデータδ×Ｄ¹ _n+3を出力する。乗算器８３の出力は、２の補数演算回路８４において符号が反転された後、加算器８５に出力される。加算器８５は、第５係数乗算器８１から出力されたデータ−δ×Ｄ¹ _n+3と、第２データ・セレクタ６５の第８端子Ｓ７から入力した一時データ（Ｓ² _n+3）とを加算することで、対象領域Ｂ１内の第２段階の中間データＳ² _n+3を算出し、出力先選択部８６に出力する。
【０１７３】
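According to the value of selection control signal SEL2 supplied from control unit 87, output destination selector 86 outputs output data X(2n+2), input from adder 75, to the outside from second terminal K1. Output data X(2n+2) is also output to MMU 31. In accordance with selection control signal SEL2, output destination selector 86 also outputs the three data (X(2n+1)), (D²_{n+2}), S²_{n+3}, input from adders 70, 80, and 85, from third terminal K2 through fifth terminal K4 to MMU 31. MMU 31 transfers the four data (X(2n+1)), X(2n+2), (D²_{n+2}), S²_{n+3} output from filtering unit 33s to ring memory 32s and overwrites the already-referenced storage areas of D²_n, (X(2n+2)), D¹_{n+2}, (S²_{n+3}).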
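The two-point operations of steps a and b each fold in one neighbour, and applied in sequence they give the same value as a single three-point operation. The sketch below illustrates this equivalence; the function names are hypothetical, and the assignment of step a to the preceding neighbour and step b to the following neighbour follows the A2 and B4 examples in the text.

```python
def two_point(neighbor, target, coef):
    """One two-point operation: target - coef * neighbor."""
    return target - coef * neighbor


def three_point(before, after, target, coef):
    """Reference three-point operation: target - coef * (before + after)."""
    return target - coef * (before + after)


def split_three_point(before, after, target, coef):
    """Step a produces temporary data; step b completes it one round later."""
    temporary = two_point(before, target, coef)  # e.g. (X(2n+1)) = D2_n - alpha * X(2n)
    return two_point(after, temporary, coef)     # e.g. X(2n+1) = (X(2n+1)) - alpha * X(2n+2)
```

Splitting the operation this way means a series can start being processed as soon as its preceding neighbour is available, at the cost of storing the parenthesized temporary value until the following neighbour arrives.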
【０１７４】
次に、Ｎ＋１回目処理（図２４）における対象領域Ａ３，Ａ４，Ｂ３，Ｂ４における変換処理が並列に同時実行される。対象領域Ａ３では、１クロック周期前の演算処理で算出された中間データＳ¹ _n+4と中間データＤ¹ _n+3を用いた上記ステップａ（図３８）の２点演算を実行して、偶数番目の入力データＹ（２ｎ＋８）を始点とする系列上の第２段階の一時データ（Ｓ² _n+4）を算出する。ここで、中間データＤ¹ _n+3は、中間データＳ¹ _n+4の系列に対して１点前の系列上のデータである。また、対象領域Ａ４では、２点のデータＳ² _n+2，Ｄ² _n+1を用いた上記ステップａの２点演算を実行して、偶数番目の入力データＹ（２ｎ＋４）を始点とする系列上の出力一時データ（Ｘ（２ｎ＋４））を算出する。また、対象領域Ｂ３では、一時データ（Ｄ² _n+2）と中間データＳ² _n+3とを用いた上記ステップｂ（図３８）の２点演算を実行して、奇数番目の入力データＹ（２ｎ＋５）を始点とする系列上の第２段階の中間データＤ² _n+2を算出する。ここで、中間データＳ² _n+3は、一時データ（Ｄ² _n+2）の系列に対して１点後の系列上のデータである。また、対象領域Ｂ４では、出力一時データ（Ｘ（２ｎ＋１））と出力データＸ（２ｎ＋２）とを用いた上記ステップｂの２点演算を実行して、奇数番目の入力データＹ（２ｎ＋１）を始点とする系列上の出力データＸ（２ｎ＋１）を算出する。
【０１７５】
また、対象領域Ａ３，Ａ４，Ｂ３，Ｂ４における上記並列処理の１クロック前の周期において、対象領域Ｎ２の規格化処理が行なわれる。対象領域Ｎ２では、入力データＹ（２ｎ＋８）に規格化係数κを乗算する規格化処理が実行される。
【０１７６】
次に、Ｎ＋１回目の具体的な処理の内容は次の通りである。１クロック前の周期の対象領域Ｎ２における処理から説明する。ＭＭＵ３１は、リングメモリ３２ｓに一時記憶された入力データＹ（２ｎ＋８）を第１データ・セレクタ６０に出力する。第１データ・セレクタ６０は、制御部８７からの選択制御信号ＳＥＬ０の値に応じて、入力データＹ（２ｎ＋８）を第１端子Ｓ０から出力する。
【０１７７】
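Input data Y(2n+8) output from first terminal S0 is input to first coefficient multiplier 61. In first coefficient multiplier 61, coefficient register 62 outputs normalization coefficient κ, of the two normalization coefficients κ and 1/κ, to multiplier 63 in accordance with control signal C0 supplied from control unit 87, and multiplier 63 multiplies input data Y(2n+8) by κ. As a result, first coefficient multiplier 61 calculates data S¹_{n+4} (= κ × Y(2n+8)). The output of multiplier 63 is input to delay register 64. The above processing is executed in the cycle one clock before the operations in target regions A3, A4, B3, and B4.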
【０１７８】
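In the next clock cycle, MMU 31 causes ring memory 32s to output the seven temporarily stored data (X(2n+1)), X(2n+2), D²_{n+1}, S²_{n+2}, (D²_{n+2}), S²_{n+3}, D¹_{n+3} to first data selector 60. According to the value of selection control signal SEL0 supplied from control unit 87, first data selector 60 outputs these seven data to second data selector 65. In addition, intermediate data S¹_{n+4} stored in delay register 64 is output to second data selector 65. The intermediate data S¹_{n+4} output from delay register 64 is also branched to the external MMU 31, and MMU 31 transfers S¹_{n+4} to ring memory 32s and overwrites the already-referenced storage area of input data Y(2n+8).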
【０１７９】
第２データ・セレクタ６５は、制御部８７から供給される選択制御信号ＳＥＬ１の値に応じて、８点のデータのうち対象領域Ｂ４内の２点の入力データＸ（２ｎ＋２），（Ｘ（２ｎ＋１））を選択して第１端子Ｓ０と第２端子Ｓ１とに出力し、対象領域Ａ４内の中間データＤ² _n+1，Ｓ² _n+2とを第３端子Ｓ２と第４端子Ｓ３とから出力し、対象領域Ｂ３内の中間データＳ² _n+3と一時データ（Ｄ² _n+2）とを第５端子Ｓ４と第６端子Ｓ５とから出力し、対象領域Ａ３内の中間データＤ¹ _n+3とＳ¹ _n+4とを第７端子Ｓ６と第８端子Ｓ７とから出力する。
【０１８０】
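In second coefficient multiplier 66, coefficient register 67 outputs lifting coefficient α to multiplier 68 in response to control signal C1 supplied from control unit 87, and multiplier 68 multiplies data X(2n+2), input from first terminal S0, by α and outputs the resulting data α × X(2n+2). The output data of multiplier 68 has its sign inverted in two's-complement circuit 69 and is output to adder 70. Adder 70 adds the data −α × X(2n+2) output from second coefficient multiplier 66 to temporary data (X(2n+1)) input from second terminal S1 of second data selector 65, thereby calculating output data X(2n+1) in target region B4, and outputs it to output destination selector 86.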
【０１８１】
また、第３係数乗算器７１では、係数レジスタ７２は、制御部８７から供給された制御信号Ｃ２に応じてリフティング係数βを乗算器７３に出力し、乗算器７３は、第３端子Ｓ２から入力した中間データＤ² _n+1にリフティング係数βを乗算して得たデータβ×Ｄ² _n+1を出力する。乗算器７３の出力は、２の補数演算回路７４において符号が反転された後、加算器７５に出力される。加算器７５は、第３係数乗算器７１から出力されたデータ−β×Ｄ² _n+1と、第２データ・セレクタ６５の第４端子Ｓ３から入力した中間データＳ² _n+2とを加算することで、対象領域Ａ４内の出力一時データ（Ｘ（２ｎ＋４））を算出し、出力先選択部８６に出力する。
【０１８２】
また、第４係数乗算器７６では、係数レジスタ７７は、制御部８７から供給された制御信号Ｃ３に応じてリフティング係数γを乗算器７８に出力し、乗算器７８は、第５端子Ｓ４から入力した中間データＳ² _n+3にリフティング係数γを乗算して得たデータγ×Ｓ² _n+3を出力する。乗算器７８の出力は、２の補数演算回路７９において符号が反転された後、加算器８０に出力される。加算器８０は、第４係数乗算器７６から出力されたデータ−γ×Ｓ² _n+3と、第２データ・セレクタ６５の第６端子Ｓ５から入力した一時データ（Ｄ² _n+2）とを加算することで、対象領域Ｂ３内の中間データＤ² _n+2を算出し、出力先選択部８６に出力する。
【０１８３】
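In fifth coefficient multiplier 81, coefficient register 82 outputs lifting coefficient δ to multiplier 83 in response to control signal C4 supplied from control unit 87, and multiplier 83 multiplies intermediate data D¹_{n+3}, input from seventh terminal S6, by δ and outputs the resulting data δ × D¹_{n+3}. The output of multiplier 83 has its sign inverted in two's-complement circuit 84 and is then output to adder 85. Adder 85 adds the data −δ × D¹_{n+3} output from fifth coefficient multiplier 81 to intermediate data S¹_{n+4} input from eighth terminal S7 of second data selector 65, thereby calculating the second-stage temporary data (S²_{n+4}) in target region A3, and outputs it to output destination selector 86.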
【０１８４】
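According to the value of selection control signal SEL2 supplied from control unit 87, output destination selector 86 outputs output data X(2n+1), input from adder 70, to the outside from first terminal K0. In accordance with selection control signal SEL2, output destination selector 86 also outputs the three data (X(2n+4)), D²_{n+2}, (S²_{n+4}), input from adders 75, 80, and 85, from third terminal K2 through fifth terminal K4 to MMU 31. MMU 31 transfers these three data (X(2n+4)), D²_{n+2}, (S²_{n+4}) output from filtering unit 33s to ring memory 32s and overwrites the already-referenced storage areas of S²_{n+2}, (D²_{n+2}), S¹_{n+4}.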
【０１８５】
次に、Ｎ＋２回目処理（図２５）における対象領域Ａ５，Ａ６，Ｂ５，Ｂ６の４個の変換処理が１クロック周期内に並列に同時実行される。また、対象領域Ａ５，Ａ６，Ｂ５，Ｂ６における上記並列処理の１クロック前の周期において、対象領域Ｎ３の規格化処理が行なわれる。
【０１８６】
対象領域Ａ６，Ｂ６，Ａ５，Ｂ５，Ｎ３は、それぞれ、上記Ｎ回目処理（図２３）の対象領域Ａ２，Ｂ２，Ａ１，Ｂ１，Ｎ１を２系列（２点）後方に移動した領域である。これら対象領域Ａ６，Ｂ６，Ａ５，Ｂ５，Ｎ３では、それぞれ、対象領域Ａ２，Ｂ２，Ａ１，Ｂ１，Ｎ１における処理と同様の処理が実行される。この結果として、対象領域Ａ６では一時データ（Ｘ（２ｎ＋３））が、対象領域Ｂ６では出力データＸ（２ｎ＋４）が、対象領域Ａ５では一時データ（Ｄ² _n+3）が、対象領域Ｂ５では中間データＳ² _n+4が、対象領域Ｎ３では中間データＤ¹ _n+4がそれぞれ算出される。
【０１８７】
次に、Ｎ＋３回目処理（図示せず）においては、上記Ｎ＋１回目処理（図２４）の対象領域Ｂ４，Ａ４，Ｂ３，Ａ３，Ｎ２を２系列（２点）後方に移動した領域において、Ｎ＋１回目処理と同様の処理が行なわれる。
【０１８８】
以上のように、上記Ｎ回目処理（図２３）および上記Ｎ＋１回目処理（図２４）と同様の処理が、全ての出力データが算出されるまで対象領域を移動させつつ繰り返し実行される。これにより、偶数番目或いは奇数番目の１点の出力データを算出するのに要する平均周期を１クロック周期とすることができ、出力データの算出周期を大幅に短縮化できる。
【０１８９】
本実施形態に係るウェーブレット変換装置は、図２２に示す構成を有する水平フィルタリング部と垂直フィルタリング部とを備えるため、上記第２の実施形態の場合と同じラインベースの２次元逆ＤＷＴ処理を実行することが可能である。したがって、ウェーブレット変換を極めて短時間で高速に行うことが可能である。
【０１９０】
また、第３の実施形態においても、第２の実施形態で説明したように水平フィルタリング部３３ｓが各回処理において水平ラインを出力し、垂直フィルタリング３３ｓが各回処理において画素列を入力するので、上記第１の実施形態に係るウェーブレット変換装置１のようにラインバッファ回路５を必要としない。したがって、小回路規模で、低消費電力で動作する廉価なウェーブレット変換装置の実現が可能である。
【０１９１】
＜変形例＞
図２６は、上記した第２および第３の実施形態の変形例に係る２次元ウェーブレット変換装置３０ａの概略構成を示す図である。このウェーブレット変換装置３０ａは、サブバンドの２次元画像データを一時的に保持するバッファ８８、外部供給のクロック信号ＣＬＫと同期して動作するＭＭＵ（メモリ管理部）８９、第１リングメモリ３２または３２ｓ、水平フィルタリング部３３または３３ｓ、第２リングメモリ３、垂直フィルタリング部４を備えて構成されている。
【０１９２】
ここで、第２リングメモリ３と垂直フィルタリング部４は、上記第１の実施形態に係るリングメモリ３とフィルタリング部４と同じ構成を有する。よって、本変形例の第２リングメモリ３Ｂと垂直フィルタリング部４Ｂは４ライン周期で１ラインの出力データを算出できる。
【０１９３】
また、第１リングメモリ３２または３２ｓと水平フィルタリング部３３または３３ｓとは、上記第２の実施形態に係るリングメモリ３２とフィルタリング部３３と、若しくは上記第３の実施形態に係るリングメモリ３２ｓとフィルタリング部３３ｓと同じ構成を有する。よって、本変形例の第１リングメモリ３２または３２ｓと水平フィルタリング部３３または３３ｓは１クロック周期で１点の出力データを算出できる。
【０１９４】
したがって、この変形例においては、水平フィルタリング部３３または３３ｓは、第１リングメモリ３２から４クロック周期間隔で入力データを取り込むように処理する。これにより、上記第１の実施形態に係るウェーブレット変換装置１（図１）のようにラインバッファ回路５を必要としない。したがって、メモリ使用量が少ない、小回路規模で低廉なウェーブレット変換装置の実現が可能となる。
【０１９５】
なお、本変形例では、第２リングメモリと垂直フィルタリング部として第１の実施形態に係る第２リングメモリ３Ｂと垂直フィルタリング部４Ｂを採用したが、この代わりに、第２リングメモリと垂直フィルタリング部として従来技術で説明したような平均５クロック周期で１点の出力データを算出する構成を採用してもよい。この場合には、水平フィルタリング部３３または３３ｓは、第１リングメモリ３２から５ライン周期間隔で入力データを取り込むように処理する。これにより、ラインバッファ回路５を必要としない構成とすることができる。
【０１９６】
＜第４の実施形態＞
次に、本発明の第４の実施形態に係るウェーブレット変換装置およびウェーブレット変換方法について説明する。図２７は、第４の実施形態に係るウェーブレット変換装置９０の概略構成を示す図である。このウェーブレット変換装置９０は、サブバンドの２次元画像データを一時的に保持するバッファ９１、外部供給のクロック信号ＣＬＫと同期して動作するＭＭＵ（メモリ管理部）９２、第１リングメモリ３２Ｈ、第１水平フィルタリング部３３Ｈ、第２リングメモリ３２Ｌ、第２水平フィルタリング部３３Ｌ、第３リングメモリ９３および垂直フィルタリング部９４を備えて構成されている。ここで、第１リングメモリ３２Ｈ、第１水平フィルタリング部３３Ｈ、第２リングメモリ３２Ｌ、第２水平フィルタリング部３３Ｌ、第３リングメモリ９３および垂直フィルタリング部９４は、外部供給の画素クロック信号ＰＣＬＫと同期して動作する。
【０１９７】
本実施形態では、ＭＭＵ９２、第１水平フィルタリング部３３Ｈ、第２水平フィルタリング部３３Ｌおよび垂直フィルタリング部９４、はハードウェアで構成されるが、この代わりに、マイクロプロセッサで実行する命令群を含むコンピュータ・プログラムで構成されてもよい。
【０１９８】
このウェーブレット変換装置９０は、２次元画像データにラインベースの２次元逆ＤＷＴを１回施す機能を有している。第１および第２水平フィルタリング部３３Ｈ，３３Ｌと垂直フィルタリング部９４とは、それぞれ第３リングメモリ９３を介して接続されている。
【０１９９】
ＭＭＵ９２は、バッファ９１、第１リングメモリ３２Ｈ、第２リングメモリ３２Ｌおよび第３リングメモリ９３のデータ入出力を制御する機能を有しており、バッファ９１から読出したサブバンドの２次元画像データを第１リングメモリ３２Ｈおよび第２リングメモリ３２Ｌに転送し記憶させることができる。
【０２００】
ここで、バッファ９１には、図１１で示した４つのサブバンドのデータ２３ＬＬ，２３ＨＬ，２３ＬＨ，２３ＨＨが入力され、第１リングメモリ３２Ｈには、サブバンド２３ＬＨと２３ＨＨの水平方向の画素が交互に配列された水平幅Ｗ、垂直高さＨ／２の画像データが入力され、第２リングメモリ３２Ｌには、サブバンド２３ＬＬと２３ＨＬの水平方向の画素が交互に配列された水平幅Ｗ、垂直高さＨ／２の画像データが入力される。
【０２０１】
第１水平フィルタリング部３３Ｈは、第１リングメモリ３２Ｈから入力したデータに対して２次元画像の水平方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、サブバンド２３ＬＨと２３ＨＨとを合成した画像データであるサブバンド２３Ｈのデータを１点ずつ算出できる。このようにして算出されたサブバンド２３Ｈの画像データＹ_H（ｍ）が第３リングメモリ９３に転送される。
【０２０２】
第２水平フィルタリング部３３Ｌは、第２リングメモリ３２Ｌから入力したデータに対して２次元画像の水平方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、サブバンド２３ＬＬと２３ＨＬとを合成した画像データであるサブバンド２３Ｌのデータを１点ずつ算出できる。このようにして算出されたサブバンド２３Ｌの画像データＹ_L（ｍ）が第３リングメモリ９３に転送される。
【０２０３】
これら第１水平フィルタリング部３３Ｈと第２水平フィルタリング部３３Ｌとしては、上記第２または第３の実施形態に係るフィルタリング部３３または３３ｓと同じ構成を採用すればよい。
【０２０４】
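Meanwhile, vertical filtering unit 94 receives image data Y_L(m) and Y_H(m) of subbands 23L and 23H from third ring memory 93 and filters, pixel column by pixel column in the vertical direction, the data in which the lines of Y_L(m) and Y_H(m) are arranged alternately in the vertical direction. It can thereby calculate the vertical-line data of image data 23 two points at a time within one clock cycle of pixel clock signal PCLK.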
【０２０５】
図２８に、本実施形態に係る垂直フィルタリング部９４の概略構成を示す。この垂直フィルタリング部９４は、入力データを選択的に取り込む第１データ・セレクタ９５、第１および第２係数乗算器９６，１００、遅延レジスタ９９，１０３、第２データ・セレクタ１０４、前段の４つの加算器１０５，１１１，１１７，１２３、第３〜第６係数乗算器１０６，１１２，１１８，１２４、後段の４つの加算器１１０，１１６，１２２，１２８、出力先選択部（ＤＭＵＸ）１２９、および制御部１３０を備えて構成される。これら構成要素のうち、２個の加算器１０５，１１０と第３係数乗算器１０６からなる組は３点のデータを１クロック周期内に処理するため、３点演算部を構成する。また、２個の加算器１１１，１１６と第４係数乗算器１１２からなる組、２個の加算器１１７，１２２と第５係数乗算器１１８からなる組、および２個の加算器１２３，１２８と第６係数乗算器１２４からなる組も、それぞれ、３点のデータを１クロック周期内に処理するため、３点演算部を構成する。また、これら４組の３点演算部と出力先選択部１２９とで中間データ算出手段が構成される。
【０２０６】
制御部１３０は、画素クロック信号ＰＣＬＫと同期して動作する。第１データ・セレクタ９５は、この制御部１３０から供給される選択制御信号ＳＥＬ０の値に応じて、第３リングメモリ９３から取り込んだデータ（Ｙ_L（ｍ）およびＹ_H（ｍ）の垂直方向のラインを交互に配列したデータ）を第１端子Ｓ０〜第１２端子Ｓ１１の何れかから選択的に出力する。
【０２０７】
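The data output from first terminal S0 and second terminal S1 of first data selector 95 are input to first coefficient multiplier 96 and second coefficient multiplier 100, respectively. In first coefficient multiplier 96, coefficient register 97 outputs normalization coefficient κ to multiplier 98 in response to control signal C0 supplied from control unit 130, and multiplier 98 multiplies the input data by κ and outputs the product to delay register 99. In second coefficient multiplier 100, coefficient register 101 outputs normalization coefficient 1/κ to multiplier 102 in response to control signal C1 supplied from control unit 130, and multiplier 102 multiplies the input data by 1/κ and outputs the product to delay register 103. The pair of first coefficient multiplier 96 and delay register 99 and the pair of second coefficient multiplier 100 and delay register 103 each constitute the normalization means of the present invention.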
【０２０８】
遅延レジスタ９９と遅延レジスタ１０３とに入力されたデータは、画素クロック信号ＰＣＬＫの１クロック周期遅延した後に、第２データ・セレクタ１０４に出力される。また、遅延レジスタ１０３に入力されたデータは、分岐してＭＭＵ９２に出力される。
【０２０９】
また、第１データ・セレクタ９５の第３端子Ｓ２〜第１２端子Ｓ１１から出力されたデータは、第２データ・セレクタ１０４に出力され、さらに、第２データ・セレクタ１０４は、制御部１３０から供給される選択制御信号ＳＥＬ１に応じて、各データを４組の３点演算部に出力し、これら３点演算部において並列処理が実行される。
【０２１０】
前段の加算器１０５は、第２データ・セレクタ１０４の第１端子Ｓ０と第２端子Ｓ１とから出力された２点のデータを加算して第３係数乗算器１０６に出力する。第３係数乗算器１０６では、係数レジスタ１０７は、制御部１３０から供給される制御信号Ｃ２に応じて、リフティング係数αを乗算器１０８に出力し、乗算器１０８は、加算器１０５から入力したデータにリフティング係数αを乗算する。その乗算出力は２の補数演算回路１０９において符号が反転されて後段の加算器１１０に出力される。そして、後段の加算器１１０は、第３係数乗算器１０６から入力したデータと、第２データ・セレクタ１０４の第３端子Ｓ２から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１１】
また、前段の加算器１１１は、第２データ・セレクタ１０４の第４端子Ｓ３と第５端子Ｓ４とから出力された２点のデータを加算して第４係数乗算器１１２に出力する。第４係数乗算器１１２では、係数レジスタ１１３は、制御部１３０から供給される制御信号Ｃ３に応じて、リフティング係数βを乗算器１１４に出力し、乗算器１１４は、加算器１１１から入力したデータにリフティング係数βを乗算する。その乗算出力は２の補数演算回路１１５において符号が反転されて後段の加算器１１６に出力される。後段の加算器１１６は、第４係数乗算器１１２から入力したデータと、第２データ・セレクタ１０４の第６端子Ｓ５から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１２】
また、前段の加算器１１７は、第２データ・セレクタ１０４の第７端子Ｓ６と第８端子Ｓ７とから出力された２点のデータを加算して第５係数乗算器１１８に出力する。第５係数乗算器１１８では、係数レジスタ１１９は、制御部１３０から供給される制御信号Ｃ４に応じて、リフティング係数γを乗算器１２０に出力し、乗算器１２０は、加算器１１７から入力したデータにリフティング係数γを乗算する。その乗算出力は２の補数演算回路１２１において符号が反転されて後段の加算器１２２に出力される。後段の加算器１２２は、第５係数乗算器１１８から入力したデータと、第２データ・セレクタ１０４の第９端子Ｓ８から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１３】
また、前段の加算器１２３は、第２データ・セレクタ１０４の第１０端子Ｓ９と第１１端子Ｓ１０とから出力された２点のデータを加算して第６係数乗算器１２４に出力する。第６係数乗算器１２４では、係数レジスタ１２５は、制御部１３０から供給される制御信号Ｃ５に応じて、リフティング係数δを乗算器１２６に出力し、乗算器１２６は、加算器１２３から入力したデータにリフティング係数δを乗算する。その乗算出力は２の補数演算回路１２７において符号が反転されて後段の加算器１２８に出力される。後段の加算器１２８は、第６係数乗算器１２４から入力したデータと、第２データ・セレクタ１０４の第１２端子Ｓ１１から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１４】
出力先選択部１２９は、制御部１３０から供給される選択制御信号ＳＥＬ２の値に応じて、後段の加算器１１０，１１６，１２２，１２８から並列に入力する４点のデータを第１端子Ｋ０〜第４端子Ｋ３の何れかから選択的に出力する。
【０２１５】
出力先選択部１２９は、第１端子Ｋ０と第２端子Ｋ１から出力データＸ（２ｋ）およびＸ（２ｋ＋１）とを出力する。また、出力先選択部１２９の第１端子Ｋ０、第３端子Ｋ２、第４端子Ｋ３から出力されたデータはＭＭＵ９２にも出力される。ＭＭＵ９２は、第１端子Ｋ０、第３端子Ｋ２、第４端子Ｋ３から出力されたデータを第３リングメモリ９３に転送し、参照済みの記憶領域に上書きさせることができる。
【０２１６】
次に、以上の垂直フィルタリング部９４を用いたリフティング演算の代表例を、図２９〜図３１を参照しつつ以下に説明する。図２９〜図３１は、９×７タップのDaubechiesフィルタのリフティング構成を模式的に示す格子図である。この格子図の演算は、図３７の場合と同様に行われる。なお、図２９〜図３１は、説明の便宜上、各格子点間を結ぶ線分に対応するリフティング係数−α，−β，−γ，−δと規格化係数κ，１／κとを表示していない。
【０２１７】
図２９〜図３１は、本実施形態でのＮ回目（Ｎは整数）〜Ｎ＋２回目の処理を模式的に示している。
【０２１８】
Ｎ回目処理（図２９）では、対象領域Ｃ１，Ｃ２，Ｃ３，Ｃ４の４個の変換処理が１クロック周期内に並列に同時実行される。
【０２１９】
対象領域Ｃ１では、２点の中間データＤ¹ _n+4，Ｄ¹ _n+5を加算したデータにリフティング係数−δを乗算することで乗算値を算出した後、この乗算値と中間データＳ¹ _n+5とを加算するという３点演算が実行される。この結果、偶数番目の入力データＹ（２ｎ＋１０）を始点とする系列上の第２段階の中間データＳ² _n+5が算出される。ここで、２点の中間データＤ¹ _n+4，Ｄ¹ _n+5は、中間データＳ¹ _n+5の系列に対して１点前後する系列上のデータである。
【０２２０】
また、対象領域Ｃ２では、２点の中間データＳ² _n+3，Ｓ² _n+4を加算したデータにリフティング係数−γを乗算した後、この乗算値と中間データＤ¹ _n+3とを加算するという３点演算が実行される。この結果、奇数番目の入力データＹ（２ｎ＋７）を始点とする系列上の第２段階の中間データＤ² _n+3が算出される。ここで、２点の中間データＳ² _n+3，Ｓ² _n+4は、中間データＤ¹ _n+3の系列に対して１点前後する系列上のデータである。
【０２２１】
また、対象領域Ｃ３では、２点の中間データＤ² _n+1，Ｄ² _n+2を加算したデータにリフティング係数−βを乗算することで乗算値を算出した後、この乗算値と中間データＳ² _n+2とを加算するという３点演算が実行される。この結果、入力データＹ（２ｎ＋４）を始点とする系列上の出力データＸ（２ｎ＋４）が算出される。ここで、２点の中間データＤ² _n+1，Ｄ² _n+2は、中間データＳ² _n+2の系列に対して１点前後する系列上のデータである。
【０２２２】
また、対象領域Ｃ４では、偶数番目の２点の出力データＸ（２ｎ），Ｘ（２ｎ＋２）を加算したデータにリフティング係数−αを乗算することで乗算値を算出した後、この乗算値と中間データＤ² _nとを加算するという３点演算が実行される。この結果、入力データＹ（２ｎ＋１）を始点とする系列上の出力データＸ（２ｎ＋１）が算出される。ここで、偶数番目の２点の入力データＸ（２ｎ），Ｘ（２ｎ＋２）は、中間データＤ² _nに対して１点前後するデータである。
【０２２３】
また、前記対象領域Ｃ１〜Ｃ４における演算処理が実行される１クロック前の周期において、対象領域Ｎ１およびＮ２における演算処理が並列実行される。対象領域Ｎ１においては、入力データＹ（２ｎ＋１０）に規格化係数κを乗算する規格化処理が実行され中間データＳ¹ _n+5が算出され、対象領域Ｎ２においては、入力データＹ（２ｎ＋１１）に規格化係数１／κを乗算する規格化処理が実行され中間データＤ¹ _n+5が算出される。
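One clock cycle of the fourth embodiment performs four three-point operations at once, two of which are final outputs. The sketch below models a single cycle as a pure function. The dictionary keys and the name `clock_step` are illustrative assumptions, and the coefficients are passed in by the caller so that no numeric values are implied.

```python
def three_point(before, after, target, coef):
    """One three-point operation: target - coef * (before + after)."""
    return target - coef * (before + after)


def clock_step(d, coef):
    """Model the four parallel operations of target regions C1 to C4 (FIG. 29)."""
    return {
        # C1: second-stage S from first-stage data
        "S2[n+5]": three_point(d["D1[n+4]"], d["D1[n+5]"], d["S1[n+5]"], coef["delta"]),
        # C2: second-stage D
        "D2[n+3]": three_point(d["S2[n+3]"], d["S2[n+4]"], d["D1[n+3]"], coef["gamma"]),
        # C3: even output sample
        "X(2n+4)": three_point(d["D2[n+1]"], d["D2[n+2]"], d["S2[n+2]"], coef["beta"]),
        # C4: odd output sample
        "X(2n+1)": three_point(d["X(2n)"], d["X(2n+2)"], d["D2[n]"], coef["alpha"]),
    }
```

The four results depend only on values produced in earlier cycles, which is why they can be computed in parallel; shifting every index by two series gives the inputs of the next cycle.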
【０２２４】
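The specific content of this N-th processing is as follows. In the N-th processing, the operations in target regions C1, C2, C3, and C4 are performed within one clock cycle, and the operations in target regions N1 and N2 are performed one clock cycle earlier. The description starts with the processing in this earlier cycle. MMU 92 causes ring memory 93 to output the temporarily stored input data Y(2n+10) and Y(2n+11) to first data selector 95, and first data selector 95, in response to selection control signal SEL0 supplied from control unit 130, outputs input data Y(2n+10) from first terminal S0 and input data Y(2n+11) from second terminal S1.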
【０２２５】
第１端子Ｓ０から出力された入力データＹ（２ｎ＋１０）は、第１係数乗算器９６に入力される。第１係数乗算器９６において、係数レジスタ９７は、制御部１３０から供給された制御信号Ｃ０に従って規格化係数κを乗算器９８に出力し、乗算器９８は入力データＹ（２ｎ＋１０）に規格化係数κを乗算する。この結果、第１係数乗算器９６は、中間データＳ¹ _n+5（＝κ×Ｙ（２ｎ＋１０））を１クロック周期内に算出する。
【０２２６】
第２端子Ｓ１から出力された入力データＹ（２ｎ＋１１）は、第２係数乗算器１００に入力される。第２係数乗算器１００において、係数レジスタ１０１は、制御部１３０から供給された制御信号Ｃ１に従って規格化係数１／κを乗算器１０２に出力し、乗算器１０２は入力データＹ（２ｎ＋１１）に規格化係数１／κを乗算する。この結果、第２係数乗算器１００は、中間データＤ¹ _n+5（＝１／κ×Ｙ（２ｎ＋１１））を１クロック周期内に算出する。
【０２２７】
第１および第２係数乗算器９６，１００から出力された中間データＳ¹ _n+5，Ｄ¹ _n+5は、それぞれ遅延レジスタ９９，１０３に入力される。遅延レジスタ９９，１０３において、中間データＳ¹ _n+5，Ｄ¹ _n+5は１クロック周期遅延された後、出力される。
【０２２８】
上記対象領域Ｎ１およびＮ２内の演算処理が行なわれた１クロック周期の後において、ＭＭＵ９２は、第３リングメモリ９３に一時記憶された１０点のデータＸ（２ｎ），Ｄ² _n，Ｘ（２ｎ＋２），Ｄ² _n+1，Ｓ² _n+2，Ｄ² _n+2，Ｓ² _n+3，Ｄ¹ _n+3，Ｓ² _n+4，Ｄ¹ _n+4を第１データ・セレクタ９５に出力させる。第１データ・セレクタ９５は、制御部１３０から供給された選択制御信号ＳＥＬ０の値に応じて、前記１０点のデータを第３端子Ｓ２〜第１２端子Ｓ１１から出力する。この出力データは、第２データ・セレクタ１０４に入力される。また、遅延レジスタ９９，１０３に記憶されている中間データＳ¹ _n+5，Ｄ¹ _n+5が第２データ・セレクタ１０４に入力される。遅延レジスタ１０３から出力された中間データＤ¹ _n+5は分岐して外部のＭＭＵ９２にも出力され、ＭＭＵ９２は、その中間データＤ¹ _n+5を第３リングメモリ９３に転送し、参照済みの入力データＹ（２ｎ＋１１）の記憶領域に上書きさせる。
【０２２９】
第２データ・セレクタ１０４は、制御部１３０から供給された選択制御信号ＳＥＬ１に応じて、前記１２点のデータの中から、対象領域Ｃ４内の３点のデータＸ（２ｎ），Ｘ（２ｎ＋２），Ｄ² _nを選択してそれぞれ第１端子Ｓ０〜第３端子Ｓ２から出力し、対象領域Ｃ３内の３点のデータＤ² _n+1，Ｄ² _n+2，Ｓ² _n+2を選択してそれぞれ第４端子Ｓ３〜第６端子Ｓ５から出力し、対象領域Ｃ２内の３点のデータＳ² _n+3，Ｓ² _n+4，Ｄ¹ _n+3を選択してそれぞれ第７端子Ｓ６〜第９端子Ｓ８から出力し、対象領域Ｃ１内の３点のデータＤ¹ _n+4，Ｄ¹ _n+5，Ｓ¹ _n+5を選択してそれぞれ第１０端子Ｓ９〜第１２端子Ｓ１１から出力する。
【０２３０】
前段の加算器１０５は、第２データ・セレクタ１０４の第１端子Ｓ０と第２端子Ｓ１から入力した対象領域Ｃ４内の２点のデータＸ（２ｎ），Ｘ（２ｎ＋２）を加算したデータを第３係数乗算器１０６に出力する。第３係数乗算器１０６では、係数レジスタ１０７は制御信号Ｃ２に従ってリフティング係数αを乗算器１０８に供給し、乗算器１０８は、入力データとリフティング係数αとを乗算した乗算値（＝α×（Ｘ（２ｎ）＋Ｘ（２ｎ＋２）））を出力する。この出力データは、２の補数演算回路１０９において符号が反転された後、後段の加算器１１０に出力される。そして、後段の加算器１１０は、第３係数乗算器１０６から入力する乗算値と、第２データ・セレクタ１０４の第３端子Ｓ２から入力したデータＤ² _nとを加算することで、対象領域Ｃ４内の出力データＸ（２ｎ＋１）を算出し、出力先選択部１２９に出力する。
【０２３１】
また、前段の加算器１１１は、第２データ・セレクタ１０４の第４端子Ｓ３と第５端子Ｓ４から入力した対象領域Ｃ３内の２点のデータＤ² _n+1，Ｄ² _n+2を加算したデータを第４係数乗算器１１２に出力する。第４係数乗算器１１２では、係数レジスタ１１３は制御信号Ｃ３に従ってリフティング係数βを乗算器１１４に供給し、乗算器１１４は、入力データとリフティング係数βとを乗算した乗算値（＝β×（Ｄ² _n+1＋Ｄ² _n+2））を出力する。この出力データは、２の補数演算回路１１５において符号が反転された後、後段の加算器１１６に出力される。そして、後段の加算器１１６は、第４係数乗算器１１２から入力する乗算値と、第２データ・セレクタ１０４の第６端子Ｓ５から入力したデータＳ² _n+2を加算することで、対象領域Ｃ３内の出力データＸ（２ｎ＋４）を算出し、出力先選択部１２９に出力する。
【０２３２】
また、前段の加算器１１７は、第２データ・セレクタ１０４の第７端子Ｓ６と第８端子Ｓ７から入力した対象領域Ｃ２内の２点のデータＳ² _n+3，Ｓ² _n+4を加算したデータを第５係数乗算器１１８に出力する。第５係数乗算器１１８では、係数レジスタ１１９は制御信号Ｃ４に従ってリフティング係数γを乗算器１２０に供給し、乗算器１２０は、入力データとリフティング係数γとを乗算した乗算値（＝γ×（Ｓ² _n+3＋Ｓ² _n+4））を出力する。この出力データは２の補数演算回路１２１において符号が反転された後、後段の加算器１２２に出力される。そして、後段の加算器１２２は、第５係数乗算器１１８から入力する乗算値と、第２データ・セレクタ１０４の第９端子Ｓ８から入力したデータＤ¹ _n+3とを加算することで、対象領域Ｃ２内の中間データＤ² _n+3を算出し、出力先選択部１２９に出力する。
【０２３３】
また、前段の加算器１２３は、第２データ・セレクタ１０４の第１０端子Ｓ９と第１１端子Ｓ１０から入力した対象領域Ｃ１内の２点のデータＤ¹ _n+4，Ｄ¹ _n+5を加算したデータを第６係数乗算器１２４に出力する。第６係数乗算器１２４では、係数レジスタ１２５は制御信号Ｃ５に従ってリフティング係数δを乗算器１２６に供給し、乗算器１２６は、入力データとリフティング係数δとを乗算した乗算値（＝δ×（Ｄ¹ _n+4＋Ｄ¹ _n+5））を出力する。この出力データは、２の補数演算回路１２７において符号が反転された後、後段の加算器１２８に出力される。そして、後段の加算器１２８は、第６係数乗算器１２４から入力する乗算値と、第２データ・セレクタ１０４の第１２端子Ｓ１１から入力した中間データＳ¹ _n+5とを加算することで、対象領域Ｃ１内の中間データＳ² _n+5を算出し、出力先選択部１２９に出力する。
【０２３４】
出力先選択部１２９は、選択制御信号ＳＥＬ２の値に従って、後段の２つの加算器１１０，１１６から入力した２点の出力データを第１端子Ｋ０と第２端子Ｋ１とからそれぞれ出力する。また、出力先選択部１２９は、後段の３つの加算器１１６，１２２，１２８から入力した３点のデータをＭＭＵ９２へ出力する。ＭＭＵ９２は、その３点のデータＸ（２ｎ＋４），Ｄ² _n+3，Ｓ² _n+5を第３リングメモリ９３に転送し、参照済みの記憶領域Ｓ² _n+2，Ｄ¹ _n+3，Ｙ（２ｎ＋１０）に上書きさせる。
【０２３５】
次のＮ＋１回目処理（図３０）では、対象領域Ｃ５，Ｃ６，Ｃ７，Ｃ８の変換処理が行なわれる。また、この対象領域Ｃ５，Ｃ６，Ｃ７，Ｃ８の変換処理より１クロック前の周期において対象領域Ｎ３，Ｎ４の２個の規格化処理が実行される。対象領域Ｃ５，Ｃ６，Ｃ７，Ｃ８，Ｎ３，Ｎ４は、上記Ｎ回目処理（図２９）の対象領域Ｃ１，Ｃ２，Ｃ３，Ｃ４，Ｎ１，Ｎ２を２系列（２点）後方に移動した領域である。これら対象領域Ｃ５，Ｃ６，Ｃ７，Ｃ８，Ｎ３，Ｎ４では、それぞれ、対象領域Ｃ１，Ｃ２，Ｃ３，Ｃ４，Ｎ１，Ｎ２での処理と同様の処理が実行される。したがって、対象領域Ｃ８では、奇数番目の入力データＹ（２ｎ＋３）を始点とする系列上の出力データＸ（２ｎ＋３）が算出され、対象領域Ｃ７では、偶数番目の入力データＹ（２ｎ＋６）を始点とする系列上の出力データＸ（２ｎ＋６）が算出され、対象領域Ｃ６では、奇数番目の入力データＹ（２ｎ＋９）を始点とする系列上の第２段階の中間データＤ² _n+4が算出され、対象領域Ｃ５では、偶数番目の入力データＹ（２ｎ＋１２）を始点とする系列上の第２段階の中間データＳ² _n+6が算出される。また、１クロック前の周期において、対象領域Ｎ３，Ｎ４では、入力データＹ（２ｎ＋１２），Ｙ（２ｎ＋１３）に対する規格化処理が実行される。
【０２３６】
さらに、Ｎ＋２回目処理（図３１）では、対象領域Ｃ９，Ｃ１０，Ｃ１１，Ｃ１２の変換処理が行なわれる。また、この対象領域Ｃ９，Ｃ１０，Ｃ１１，Ｃ１２の変換処理より１クロック前の周期において対象領域Ｎ５，Ｎ６の２個の規格化処理が実行される。
【０２３７】
以上のように、上記Ｎ回目処理（図２９）と同様の処理が、全ての点の出力データが算出されるまで対象領域を移動させつつ繰り返し実行される。これにより、偶数番目および奇数番目の２点の出力データを算出するのに要する平均周期を１クロック周期とすることができ、出力データの算出周期を大幅に短縮化できる。
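The sliding schedule described above (regions N1/N2 one cycle ahead, then regions C1 to C4 in the same cycle) can be checked with a small simulation. The following is a minimal sketch, not the circuit of FIG. 28: the coefficient values are the commonly quoted JPEG 2000 9/7 constants (the text only names the symbols), zero extension at the column ends is an assumption made for brevity, and the function names are hypothetical. It compares the sliding schedule with a plain stage-by-stage inverse lifting pass.

```python
# Assumed 9/7 lifting constants (commonly quoted JPEG 2000 values; the text only names the symbols).
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
KAPPA = 1.230174105


def three_point(a, b, c, coeff):
    """Three-point operation of one target region: c - coeff * (a + b)."""
    return c - coeff * (a + b)


def inverse_stagewise(y):
    """Reference: finish each lifting stage for the whole column before the next one."""
    m = len(y) // 2
    at = lambda seq, i: seq[i] if 0 <= i < m else 0.0  # zero extension (assumption)
    s1 = [KAPPA * y[2 * i] for i in range(m)]
    d1 = [y[2 * i + 1] / KAPPA for i in range(m)]
    s2 = [three_point(at(d1, i - 1), d1[i], s1[i], DELTA) for i in range(m)]
    d2 = [three_point(s2[i], at(s2, i + 1), d1[i], GAMMA) for i in range(m)]
    xe = [three_point(at(d2, i - 1), d2[i], s2[i], BETA) for i in range(m)]
    xo = [three_point(xe[i], at(xe, i + 1), d2[i], ALPHA) for i in range(m)]
    return [v for pair in zip(xe, xo) for v in pair]


def inverse_sliding(y):
    """One loop pass models one N-th process: N1/N2 first, then C1..C4 on stored data."""
    m = len(y) // 2
    s1, d1, s2, d2, xe, xo = {}, {}, {}, {}, {}, {}
    for k in range(m + 5):  # k corresponds to the index n+5 in the text
        if k < m:  # regions N1 and N2 (one clock cycle ahead)
            s1[k] = KAPPA * y[2 * k]
            d1[k] = y[2 * k + 1] / KAPPA
        pending = []  # written back only after all four regions have been evaluated
        if k < m:  # C1 -> S2[k]
            pending.append((s2, k, three_point(d1.get(k - 1, 0.0), d1[k], s1[k], DELTA)))
        if 0 <= k - 2 < m:  # C2 -> D2[k-2]
            pending.append((d2, k - 2, three_point(s2[k - 2], s2.get(k - 1, 0.0), d1[k - 2], GAMMA)))
        if 0 <= k - 3 < m:  # C3 -> X(2(k-3))
            pending.append((xe, k - 3, three_point(d2.get(k - 4, 0.0), d2[k - 3], s2[k - 3], BETA)))
        if 0 <= k - 5 < m:  # C4 -> X(2(k-5)+1)
            pending.append((xo, k - 5, three_point(xe[k - 5], xe.get(k - 4, 0.0), d2[k - 5], ALPHA)))
        for store, idx, value in pending:
            store[idx] = value
    return [v for i in range(m) for v in (xe[i], xo[i])]
```

Each loop pass reads only values stored by earlier passes, which is why the four three-point operations can share one clock cycle; after the pipeline has filled, every pass yields one even and one odd output point.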
【０２３８】
次に、上記ウェーブレット変換装置９０を用いたラインベースの２次元逆ＤＷＴ処理を以下に説明する。
【０２３９】
第１水平フィルタリング部３３Ｈに入力されるデータは、図１１に示したサブバンド２３ＬＨおよび２３ＨＨであり、第２水平フィルタリング部３３Ｌに入力されるデータは、サブバンド２３ＬＬおよび２３ＨＬである。そして、第１および第２水平フィルタリング部３３Ｈ，３３Ｌからは、それぞれサブバンド２３Ｈ（Ｙ_H（ｍ）），２３Ｌ（Ｙ_L（ｍ））が出力される。
【０２４０】
垂直フィルタリング部９４に入力するデータは、第１および第２水平フィルタリング部３３Ｈ，３３Ｌから出力されるデータＹ_H（ｍ），Ｙ_L（ｍ）であり、これらデータＹ_H（ｍ），Ｙ_L（ｍ）の垂直ラインのデータが交互に配列されることによって、水平方向に画素列として入力される。そして、垂直フィルタリング部９４は、２次元画像データ２３を出力する。
【０２４１】
具体的には、第１リングメモリ３２Ｈと第１水平フィルタリング部３３Ｈは、水平ライン単位で入力するデータを１点当たり１クロック周期でフィルタリングすることでサブバンド２３Ｈを出力し、また、第２リングメモリ３２Ｌと第２水平フィルタリング部３３Ｌは、水平ライン単位で入力するデータを１点当たり１クロック周期でフィルタリングすることでサブバンド２３Ｌを出力する。
【０２４２】
なお、第１リングメモリ３２Ｈと第２リングメモリ３２Ｌは、第３の実施例で述べた図２２のリングメモリ３２ｓを用いることができ、図３３に示すように、入力データ…，Ｘ_j（ｋ），Ｘ_j+1（ｋ），…に対応する８点（８画素）のデータを保持する記憶領域１３３を有し、上記一時データや中間データを保持することができる。もしくは、第１リングメモリ３２Ｈと第２リングメモリ３２Ｌは、第２の実施例で述べた図１５のリングメモリ３２を用いることができ、図２１に示すように、入力データ…，Ｘ_j（ｋ），Ｘ_j+1（ｋ），…に対応する９点（９画素）のデータを保持する記憶領域５９を有し、上記一時データや中間データを保持することができる。
【０２４３】
同様に、第１および第２水平フィルタリング部３３Ｈ，３３Ｌは、第３の実施例で述べた図２２のフィルタリング部３３ｓ、もしくは、第２の実施例で述べた図１５のフィルタリング部３３を用いることができる。
【０２４４】
第３リングメモリ９３と垂直フィルタリング部９４は、上記Ｎ回目処理（図２９）と上記Ｎ＋１回目処理（図３０）を含む各回の処理を、各画素列について水平ライン単位で繰り返し実行する。例えば、上記Ｎ回目処理（図２９）が、０番目の画素列に対して実行された後に、１番目の画素列に対して実行され、次に、２番目の画素列に対して実行され、・・・、最終的に、Ｗ−１番目の画素列に対して実行される。その後、上記Ｎ＋１回目処理（図３０）が、０番目の画素列に対して実行された後に、１番目の画素列に対して実行され、更に、２番目の画素列に対して実行され、・・・、最終的に、Ｗ−１番目の画素列に対して実行される。このようにして、各回の処理が全ての画素列について繰り返し実行される。
【０２４５】
この結果、垂直フィルタリング部９４からは、偶数行のデータと奇数行のデータとが各水平ライン単位で並列に出力される。例えば、上記Ｎ回目処理（図２９）を０番目〜Ｗ−１番目の画素列に対して連続的に実行した結果、２ｎ＋１番目の水平ラインの奇数行のデータＸ₀（２ｎ＋１），Ｘ₁（２ｎ＋１），…，Ｘ_j（２ｎ＋１），…，Ｘ_W-1（２ｎ＋１）が連続的に出力される。これと並行して、２ｎ＋４番目の水平ラインの偶数行のデータＸ₀（２ｎ＋４），Ｘ₁（２ｎ＋４），…，Ｘ_j（２ｎ＋４），…，Ｘ_W-1（２ｎ＋４）が連続的に出力される。
【０２４６】
なお、第３リングメモリ９３は、図３２に模式的に示すように、入力データ列に対応する１２×Ｗ点（１２ライン）のデータを保持する記憶領域１３２を有しており、上記一時データや中間データを保持することができる。この記憶領域１３２は、垂直方向に１２点のデータを保持する列領域の集合体である。一つの列領域によって、１回の処理で参照される入力データや中間データが保持される。例えば、Ｎ回目処理（図２９）では、或る列領域において、データ列｛Ｘ（２ｎ），Ｄ² _n，Ｘ（２ｎ＋２），Ｄ² _n+1，Ｓ² _n+2，Ｄ² _n+2，Ｓ² _n+3，Ｄ¹ _n+3，Ｓ² _n+4，Ｄ¹ _n+4，Ｙ（２ｎ＋１０），Ｙ（２ｎ＋１１）｝から、データ列｛Ｘ（２ｎ），Ｄ² _n，Ｘ（２ｎ＋２），Ｄ² _n+1，Ｘ（２ｎ＋４），Ｄ² _n+2，Ｓ² _n+3，Ｄ² _n+3，Ｓ² _n+4，Ｄ¹ _n+4，Ｓ² _n+5，Ｄ¹ _n+5｝へ記憶内容が変化する（データＳ² _n+2，Ｄ¹ _n+3，Ｙ（２ｎ＋１０），Ｙ（２ｎ＋１１）が、それぞれ、データＸ（２ｎ＋４），Ｄ² _n+3，Ｓ² _n+5，Ｄ¹ _n+5で上書きされる）。
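The in-place update of one 12-point column region in the N-th process can be traced symbolically. The sketch below only manipulates labels taken from the paragraph above; the variable names are hypothetical, and no claim is made about the physical address order inside the ring memory.

```python
# Contents of one column region before the N-th process (labels as listed in the text).
before = [
    "X(2n)", "D2[n]", "X(2n+2)", "D2[n+1]", "S2[n+2]", "D2[n+2]",
    "S2[n+3]", "D1[n+3]", "S2[n+4]", "D1[n+4]", "Y(2n+10)", "Y(2n+11)",
]

# Slots whose contents have already been referenced are reused for the new results.
overwrites = {
    "S2[n+2]": "X(2n+4)",    # result of target region C3
    "D1[n+3]": "D2[n+3]",    # result of target region C2
    "Y(2n+10)": "S2[n+5]",   # result of target region C1
    "Y(2n+11)": "D1[n+5]",   # normalized value from target region N2
}

after = [overwrites.get(label, label) for label in before]
changed = [i for i, (old, new) in enumerate(zip(before, after)) if old != new]
```

Only four of the twelve slots change per process, so the column never needs more than twelve entries.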
【０２４７】
以上の処理を再帰的に実行させることで、任意次数の分解レベルのサブバンド（帯域成分）を合成することができる。すなわち、ｋ＋１次（ｋは２以上の整数）の分解レベルにおける４つのサブバンドＬＬ（ｋ＋１），ＨＬ（ｋ＋１），ＬＨ（ｋ＋１），ＨＨ（ｋ＋１）を、ウェーブレット変換装置９０に入力させることで、ｋ次の分解レベルにおけるサブバンドＬＬ（ｋ）を得ることが可能であり、このような処理を再帰的に実行することによって、ｋ次の分解レベルのサブバンドから元の画像データを復元することが可能である。
【０２４８】
このように、本実施形態に係るウェーブレット変換装置９０とウェーブレット変換方法では、４点の中間データを算出する４個の変換処理と２点の中間データを規格化する２個の規格化処理とを１クロック周期内に並列に同時実行するため、出力データの算出周期を大幅に短縮化できる。したがって、ウェーブレット変換を極めて短時間で高速に実行することが可能である。
【０２４９】
また、ウェーブレット変換装置９０は、１クロック周期内に１点のデータを算出する第１および第２水平フィルタリング部３３Ｈ，３３Ｌと、１クロック周期内で２点のデータを算出する垂直フィルタリング部９４とを備えるため、１クロック周期内に２点の合成データを並列に算出できる。したがって、ラインベースの２次元ＤＷＴ演算を極めて高速に実行することが可能である。
【０２５０】
＜変形例＞
図３４は、上記した第４の実施形態の変形例に係る２次元ウェーブレット変換装置１４０の概略構成を示す図である。このウェーブレット変換装置１４０は、サブバンドの２次元画像データを一時的に保持するバッファ９１、外部供給のクロック信号ＣＬＫと同期して動作するＭＭＵ（メモリ管理部）９２Ａ、第１リングメモリ９３Ａ、水平フィルタリング部９４Ａ、ラインバッファ回路１４１、第２リングメモリ９３Ｂおよび垂直フィルタリング部９４Ｂを備えて構成されている。
【０２５１】
ここで、水平フィルタリング部９４Ａと垂直フィルタリング部９４Ｂは、上記第４の実施形態に係る垂直フィルタリング部９４（図２８）の構成と同じ構成を有し、図２９〜図３１で示したリフティング演算を実行するように、データを与えられ且つ制御される。
【０２５２】
水平フィルタリング部９４Ａからは、サブバンド２３Ｈと２３Ｌのデータが交互に各水平ライン単位で出力される。
【０２５３】
ラインバッファ回路１４１においては、第１ラインバッファ１４３と第２ラインバッファ１４４は、それぞれ、水平ライン２本分のバッファを備えている。セレクタ１４２が、入力する２本のデータを第１ラインバッファ１４３と第２ラインバッファ１４４の何れか一方に記憶させる期間、デマルチプレクサ１４５は、その他方に記憶済みの２本のデータを読み出して第２リングメモリ９３Ｂに出力する。
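The alternating use of the two line buffers can be sketched as a simple ping-pong structure. This is an illustration only: the class and method names are hypothetical, and the timing of selector 142 and demultiplexer 145 is reduced to one call per pair of lines.

```python
class PingPongLineBuffer:
    """Two banks of two lines each: one bank is written while the other is read."""

    def __init__(self):
        self.banks = [None, None]  # each bank holds a pair of horizontal lines
        self.write_bank = 0

    def push_pair(self, line_a, line_b):
        """Store two incoming lines and return the pair held in the other bank, if any."""
        read_bank = 1 - self.write_bank
        out = self.banks[read_bank]                      # demultiplexer side
        self.banks[self.write_bank] = (line_a, line_b)   # selector side
        self.write_bank = read_bank                      # swap roles for the next pair
        return out
```

With this arrangement the pair of lines handed to the second ring memory always lags the incoming pair by one step, so writing and reading never touch the same bank.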
【０２５４】
このように本変形例の構成によっても、１クロック周期内に合成データを２点並列に算出できることから、ラインベースの２次元ＤＷＴ演算を極めて高速に実行することが可能である。
【０２５５】
【発明の効果】
以上の如く、本発明に係るウェーブレット変換装置によれば、各入力データを規格化する処理と、各中間データを一系列上の他の中間データや出力データに変換する変換処理とを繰り返し実行し、繰り返し実行される複数の処理のうち少なくとも２個の処理を１クロック周期内に並列に実行するため、出力データの算出周期を短縮化でき、逆ウェーブレット変換を短時間で高速に実行することが可能になる。
【０２５６】
また、本発明に係るウェーブレット変換方法によれば、入力データを規格化して第１段階の中間データに変換する工程（ｂ）と、中間データを一系列上の他の中間データに変換する工程（ｃ）と、最終段階の中間データを出力データに変換する工程（ｄ）とは繰り返し実行されるが、繰り返し実行する複数の工程のうち少なくとも２工程を１クロック周期内に並列に実行するため、入力データ列から出力データを算出する周期を短縮化でき、逆ウェーブレット変換を短時間で高速に行うことが可能になる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態に係るウェーブレット変換装置の概略構成を示す図である。
【図２】第１の実施形態に係るフィルタリング部の概略構成図である。
【図３】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図４】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図５】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図６】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図７】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図８】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図９】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図１０】第１の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図１１】サブバンドから画像を合成する工程を模式的に示す図である。
【図１２】２次元画像データとリングメモリの記憶領域とを模式的に示す図である。
【図１３】リングメモリの記憶領域を模式的に示す図である。
【図１４】本発明の第２の実施形態に係るウェーブレット変換装置の概略構成を示す図である。
【図１５】第２の実施形態に係るフィルタリング部の概略構成図である。
【図１６】第２の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図１７】第２の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図１８】第２の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図１９】第２の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図２０】２次元画像データとリングメモリの記憶領域とを模式的に示す図である。
【図２１】リングメモリの記憶領域を模式的に示す図である。
【図２２】本発明の第３の実施形態に係るフィルタリング部の概略構成を示す図である。
【図２３】第３の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図２４】第３の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図２５】第３の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図２６】第２および第３の実施形態の変形例に係るウェーブレット変換装置の概略構成を示す図である。
【図２７】本発明の第４の実施形態に係るウェーブレット変換装置の概略構成を示す図である。
【図２８】第４の実施形態に係る垂直フィルタリング部の概略構成図である。
【図２９】第４の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図３０】第４の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図３１】第４の実施形態に係るリフティング演算の工程を模式的に示す図である。
【図３２】２次元画像データとリングメモリの記憶領域とを模式的に示す図である。
【図３３】リングメモリの記憶領域を模式的に示す図である。
【図３４】第４の実施形態の変形例に係るウェーブレット変換装置の概略構成を示す図である。
【図３５】ＤＷＴと逆ＤＷＴで用いるフィルタバンクを模式的に示す図である。
【図３６】３次の分解レベルで２次元ＤＷＴを施された画像データを模式的に示す図である。
【図３７】合成側のリフティング構成を模式的に示す格子図である。
【図３８】ＪＰＥＧ２０００方式が推奨する算出方法を模式的に示す図である。
【図３９】リフティング演算の工程を模式的に示す図である。
【図４０】リフティング演算の工程を模式的に示す図である。
【図４１】リフティング演算の工程を模式的に示す図である。
【図４２】リフティング演算の工程を模式的に示す図である。
【図４３】リフティング演算の工程を模式的に示す図である。
【図４４】リフティング演算の工程を模式的に示す図である。
【図４５】リフティング演算の工程を模式的に示す図である。
【図４６】リフティング演算の工程を模式的に示す図である。
【図４７】リフティング演算の工程を模式的に示す図である。
【図４８】リフティング演算の工程を模式的に示す図である。
【符号の説明】
１ウェーブレット変換装置
２ＭＭＵ（メモリ管理部）
３Ａ，３Ｂリングメモリ
４Ａ，４Ｂフィルタリング部
５ラインバッファ回路
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a compression / decompression technique using a wavelet transform.
[0002]
[Prior art]
As a high-efficiency coding method for image data, an image compression/decompression method based on the discrete wavelet transformation (hereinafter "DWT") is known, and it is adopted in the JPEG2000 (Joint Photographic Experts Group 2000) system formulated by the ISO (International Organization for Standardization). Two methods of computing the DWT are known: the convolution method and the method based on the lifting scheme. Both output the same result, but compared with the convolution method, the lifting-based method has advantages such as enabling high-speed operation with a small amount of memory and being suitable for lossless (reversible) compression.
[0003]
In general, a DWT can be configured using a filter bank that divides an original signal into a high-band component (high-frequency component) and a low-band component (low-frequency component). The inverse transform (inverse DWT) can be configured using a filter bank that synthesizes the band-divided high-band and low-band components.
[0004]
FIG. 35 schematically illustrates the filter banks 200S and 200A used in the DWT and its inverse transform (inverse DWT). The decomposition-side filter bank 200S, which decomposes the input signal x(n) into two bands, a low-frequency component and a high-frequency component, comprises a low-pass filter 201L that passes the low-frequency component, a high-pass filter 201H that passes the high-frequency component, and first and second downsamplers 202 and 203. The low-pass filter 201L and the high-pass filter 201H are each configured as an FIR filter that performs a convolution operation. The first and second downsamplers 202 and 203 thin out the signals input from the filters 201L and 201H, respectively, at every other point and output them at half the signal length. According to the JPEG2000 standard, the first downsampler 202 thins out the odd-numbered signals and outputs the even-numbered signals (low-frequency component), and the second downsampler 203 thins out the even-numbered signals and outputs the odd-numbered signals (high-frequency component).
[0005]
On the other hand, the synthesis-side filter bank 200A, which synthesizes the input signals (low-frequency and high-frequency components), comprises first and second upsamplers 204 and 205, a low-pass filter 206L, a high-pass filter 206H, and an adder 207. The low-pass filter 206L and the high-pass filter 206H are configured as FIR filters that perform a convolution operation. In general, the synthesis-side filters 206L and 206H and the decomposition-side filters 201L and 201H are configured to satisfy the perfect reconstruction condition. The first and second upsamplers 204 and 205 insert a zero value between each pair of points, doubling the signal length for output. The adder 207 then adds the signals output from the synthesis-side filters 206L and 206H and outputs a synthesized signal x'(n). If the perfect reconstruction condition is satisfied, x(n) = x'(n) holds.
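The structure of the two filter banks (FIR filter and downsampler on the decomposition side; upsampler, FIR filter and adder on the synthesis side) can be illustrated with a toy example. The sketch below substitutes two-tap Haar filters for the 9×7 filters to keep it short; the filters, the sampling phase (odd-indexed samples are kept in both branches) and the function names are assumptions for illustration and are not taken from the patent or from JPEG2000.

```python
import math

A = 1 / math.sqrt(2)  # Haar filter tap (assumed stand-in for the 9x7 filters)


def fir(x, h):
    """Causal FIR filter: y[k] = sum_i h[i] * x[k - i], with x[k] = 0 for k < 0."""
    return [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0) for k in range(len(x))]


def analyze(x):
    """Decomposition side: filter, then keep every other sample."""
    low = fir(x, [A, A])[1::2]
    high = fir(x, [-A, A])[1::2]
    return low, high


def upsample(v):
    """Insert a zero after every sample, doubling the length."""
    out = []
    for s in v:
        out += [s, 0.0]
    return out


def synthesize(low, high):
    """Synthesis side: upsample, filter, and add the two branches."""
    lo = fir(upsample(low), [A, A])
    hi = fir(upsample(high), [A, -A])
    return [p + q for p, q in zip(lo, hi)]
```

For an even-length input, `synthesize(*analyze(x))` returns the original samples up to rounding error, which is the perfect reconstruction property x(n) = x'(n) for this particular filter pair.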
[0006]
The two-dimensional DWT can be executed by repeatedly applying the decomposition-side filter bank 200S to two-dimensional image data in its vertical and horizontal directions in turn. FIG. 36 is a band division diagram schematically showing two-dimensional image data 210 to which the DWT has been applied at the third decomposition level. Each block in the two-dimensional image data 210 represents a subband (band component). For example, the subband HH1 consists of the vertical high-frequency component (H) and the horizontal high-frequency component (H) at decomposition level 1, and the subband LH2 consists of the vertical high-frequency component (H) and the horizontal low-frequency component (L) at decomposition level 2. In general, a subband XYn (X and Y are each "H" or "L", and n is the order of the decomposition level) consists of the vertical component Y and the horizontal component X at decomposition level n.
[0007]
The processing procedure of the DWT at decomposition level 3 is as follows. First, the subbands HH1, HL1, LH1, and LL1 (not shown) of decomposition level 1 are generated by applying the decomposition-side filter bank 200S twice to the entire two-dimensional image. Next, the subbands HH2, HL2, LH2, and LL2 (not shown) of decomposition level 2 are generated by applying the decomposition-side filter bank 200S twice to the lowest-frequency subband LL1 of decomposition level 1. Then, the subbands HH3, HL3, LH3, and LL3 of decomposition level 3 are generated by applying the decomposition-side filter bank 200S twice to the lowest-frequency subband LL2 of decomposition level 2.
[0008]
Conversely, the processing procedure of the inverse DWT for synthesizing the subband of the decomposition level 3 is as follows. First, by applying the synthesis-side filter bank 200A twice to the subbands HH3, HL3, LH3, and LL3, the lowest subband LL2 of the decomposition level 2 is generated. Next, by applying the synthesis-side filter bank 200A twice to the subbands HH2, HL2, LH2, and LL2 at the decomposition level 2, the lowest subband LL1 at the decomposition level 1 is generated. Then, a two-dimensional image is generated by applying the synthesis-side filter bank 200A twice to the subbands HH1, HL1, LH1, and LL1 of the decomposition level 1.
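The order in which subbands are synthesized can be written down as a short loop. The sketch below only enumerates the steps described above by name; the function name and the tuple layout are hypothetical.

```python
def synthesis_order(levels):
    """List, from the deepest level upward, which four subbands yield which result."""
    steps = []
    for k in range(levels, 0, -1):
        inputs = (f"HH{k}", f"HL{k}", f"LH{k}", f"LL{k}")
        target = f"LL{k - 1}" if k > 1 else "image"
        steps.append((inputs, target))
    return steps
```

For `levels=3` this gives the three synthesis passes of the example: level 3 to LL2, level 2 to LL1, and level 1 to the two-dimensional image.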
[0009]
The above describes an example with a third-order decomposition level; the JPEG2000 system generally employs decomposition levels of roughly the third to the eighth order. In this example the DWT is applied to an entire still image at once, but in practice, owing to constraints such as the capacity of the installed memory, a still image is often divided into a plurality of rectangular regions called "tiles" and the DWT is executed for each tile.
[0010]
On the other hand, the DWT and the inverse DWT can also be realized with a lifting configuration. Since the present invention relates to processing on the synthesis side, the inverse DWT is described below. For the known 9×7-tap Daubechies filter, the relationship between the input data Y(2n), Y(2n+1), Y(2n+2) (n: integer) and so on and the output data X(2n), X(2n+1) can be expressed by the lifting configuration defined by the following equation (1). Because the synthesis-side processing is the inverse DWT, Y denotes input data and X denotes output data throughout the following description.
[0011]
(Equation 1)

[0012]
In the above equation (1), the odd-numbered input data Y(2n+1) are the high-frequency component data obtained by the decomposition processing, and the even-numbered input data Y(2n) are the low-frequency component data obtained by the decomposition processing. The output data X(2n) and X(2n+1) are the data obtained by synthesizing the high-frequency and low-frequency components. The coefficients α, β, γ, δ are called lifting coefficients, and the coefficients κ, 1/κ are called normalization coefficients. These coefficients α, β, γ, δ, κ, 1/κ are uniquely derived from the filter coefficients of the 9×7-tap Daubechies filter.
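Equation (1) itself is not reproduced in this text. As a reading aid, the relations implied by the lattice rules and by the expression quoted for [step 3] in the description can be written as follows. This is a reconstruction, not the published equation: the intermediate symbols S¹, D¹, S², D² follow the description, and the step labels other than [step 3] are an assumption.

```latex
\begin{aligned}
\text{[step 1]}\quad & S^{1}_{n} = \kappa \, Y(2n) \\
\text{[step 2]}\quad & D^{1}_{n} = \tfrac{1}{\kappa} \, Y(2n+1) \\
\text{[step 3]}\quad & S^{2}_{n} = S^{1}_{n} - \delta \,\bigl(D^{1}_{n-1} + D^{1}_{n}\bigr) \\
\text{[step 4]}\quad & D^{2}_{n} = D^{1}_{n} - \gamma \,\bigl(S^{2}_{n} + S^{2}_{n+1}\bigr) \\
\text{[step 5]}\quad & X(2n) = S^{2}_{n} - \beta \,\bigl(D^{2}_{n-1} + D^{2}_{n}\bigr) \\
\text{[step 6]}\quad & X(2n+1) = D^{2}_{n} - \alpha \,\bigl(X(2n) + X(2n+2)\bigr)
\end{aligned}
```

Each lifting line has the same shape: the value on the point's own series minus a lifting coefficient times the sum of its two neighbours on the adjacent series.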
[0013]
The lifting configuration defined by the above equation (1) can be expressed by the lattice structure shown in FIG. 37. The grid points arranged in a vertical line at the left end of FIG. 37 represent the input data ..., Y(2n−1), Y(2n), ..., Y(2n+9), Y(2n+10), .... That is, they are data in which the low-frequency component data and the high-frequency component data decomposed by the DWT are arranged alternately. The grid points at the right ends of the line segments extending horizontally rightward from the input data represent the output data ..., X(2n−1), X(2n), ..., X(2n+9), X(2n+10), ..., respectively.
[0014]
The grid points on the line segment extending from the grid point of each input data Y(k) (k: integer) to the grid point of the output data X(k) represent a series of intermediate data. For example, on the line segment between the input data Y(2n) and the output data X(2n) lie the intermediate data S¹_n and S²_n generated with the input data Y(2n) as the starting point.
[0015]
The calculation based on this lattice structure follows rules (A) to (C) below. (A) The data at a grid point moves along the line segments extending rightward from that grid point. (B) Data moving along a line segment is multiplied by the coefficient assigned to that line segment (coefficient multiplication). (C) At each grid point, the data arriving from the left along the line segments are added together (addition). For example, the intermediate data S²_n on the line segment between the input data Y(2n) and the output data X(2n) is calculated as S²_n = 1 × S¹_n − δ × D¹_{n−1} − δ × D¹_n. This expression corresponds to [step 3] of the above equation (1).
[0016]
As shown in FIG. 37, the intermediate data S²_n, for example, is the sum of the data that have moved from the three grid points D¹_{n−1}, S¹_n, D¹_n to its left in the drawing. In the same way, every item of intermediate data is calculated by adding the three data that have moved from the three grid points to its left. The JPEG2000 system recommends that the calculation of one point of intermediate data be performed in two steps (Mathias Larsson Carlander, Media Lab, Ericsson Research, Sweden, "JPEG2000 Verification Model 9.1 (Technical description)," WG1 N2165, 28 June 2001). FIG. 38 schematically shows the calculation method recommended by the JPEG2000 system. The grid points x₁, x₂, x₃, y represent data, and α, β, γ represent the coefficients assigned to the line segments connecting the grid points. As shown, the data y is calculated in step b after the temporary data z has been calculated in step a.
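The two-step calculation can be sketched as two small functions. Which two of the three points are combined in step a is not stated in this passage, so the pairing below (two points first, the third afterwards) is an assumption, and the function names are hypothetical.

```python
def step_a(x1, x2, c1, c2):
    """Step a: combine two weighted points into the temporary data z."""
    return c1 * x1 + c2 * x2


def step_b(z, x3, c3):
    """Step b: add the remaining weighted point to z to obtain the data y."""
    return z + c3 * x3
```

For example, with coefficients (1, −2, −2) and data (2, 3, 5), step a gives z = −4 and step b gives y = −14, the same value as the single three-term sum 1×2 − 2×3 − 2×5.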
[0017]
[Non-patent document 1]
Mathias Larsson Carlander, Media Lab, Ericsson Research, Sweden, JPEG2000 Verification Model 9.1 (Technical description), WG1 N2165, June 28, 2001.
[0018]
[Problems to be solved by the invention]
However, the lifting operation recommended by the JPEG2000 method described above has a problem that the processing time required to calculate one point of output data is long, as described below.
[0019]
FIGS. 39 to 48 are lattice diagrams for explaining an example of the processing procedure of the inverse DWT using the lifting configuration. Although not shown, the coefficients shown in FIG. 37 are assumed to be assigned to all the line segments connecting the grid points. In FIGS. 39 to 48, grid points filled in black represent data that have been input or calculated, grid points with only the upper half filled represent temporary data for which only step a has been completed, and blank grid points represent data not yet calculated, for which neither step a nor step b has been performed. Each of the processes shown in these figures is executed within one clock cycle.
[0020]
In the N-th (N: integer) process shown in FIG. 39, the input data Y(2n+4) in the target area N1 is normalized, whereby the first-stage intermediate data S¹_{n+2} on the series starting from the even-numbered input data Y(2n+4) is calculated.
[0021]
In the (N+1)-th to (N+4)-th processes shown in FIGS. 40 to 43, the above step a is executed. In the (N+1)-th process (FIG. 40), the second-stage temporary data (S²_{n+2}) on the series starting from the even-numbered input data Y(2n+4) is calculated using the two points of intermediate data S¹_{n+2}, D¹_{n+1} in the target area A1 (temporary data are written in parentheses in this way to distinguish them). In the next (N+2)-th process (FIG. 41), the second-stage temporary data (D²_{n+1}) on the series starting from the odd-numbered input data Y(2n+3) is calculated using the two points of intermediate data D¹_{n+1}, S²_{n+1} in the target area A2. In the next (N+3)-th process (FIG. 42), the output temporary data (X(2n+2)) on the series starting from the even-numbered input data Y(2n+2) is calculated using the two points of intermediate data S²_{n+1}, D²_n in the target area A3. Then, in the (N+4)-th process (FIG. 43), the output temporary data (X(2n+1)) on the series starting from the odd-numbered input data Y(2n+1) is calculated using the two points of data D²_n, X(2n) in the target area A4.
[0022]
In the next (N+5)-th process (FIG. 44), the input data Y(2n+5) in the target area N2 is normalized, whereby the first-stage intermediate data D¹_{n+2} on the series starting from the odd-numbered input data Y(2n+5) is calculated.
[0023]
Next, in the (N+6)-th to (N+9)-th processes shown in FIGS. 45 to 48, the above step b is executed. In the (N+6)-th process (FIG. 45), the intermediate data S²_{n+2} is calculated using the intermediate data D¹_{n+2} and the temporary data (S²_{n+2}) in the target area B1. In the next (N+7)-th process (FIG. 46), the intermediate data D²_{n+1} is calculated using the intermediate data S²_{n+2} calculated in the (N+6)-th process and the temporary data (D²_{n+1}) in the target area B2. In the next (N+8)-th process (FIG. 47), the output data X(2n+2) is calculated using the intermediate data D²_{n+1} calculated in the (N+7)-th process and the output temporary data (X(2n+2)) calculated in the (N+3)-th process, in the target area B3. In the (N+9)-th process (FIG. 48), the output data X(2n+1) is calculated using the output data X(2n+2) calculated in the (N+8)-th process and the output temporary data (X(2n+1)) calculated in the (N+4)-th process, in the target area B4.
[0024]
Next, in the (N+10)-th process (not shown), normalization of the input data Y(2n+6) is performed in the same manner as in the N-th process, and thereafter the same processes as the (N+1)-th to (N+9)-th processes are repeated.
[0025]
Thus, calculating the output data X(2n+2) and X(2n+1), which are the synthesis results, from the input data Y(2n+4) and Y(2n+5), in which high-frequency and low-frequency components are arranged alternately, requires the ten clock cycles of the N-th to (N+9)-th processes. Therefore, five clock cycles are required on average to calculate one point of output data. There is a demand for a processing method that shortens these five clock cycles so that the inverse DWT operation can be executed at high speed.
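The cycle count of this conventional procedure can be tallied directly from the ten processes described above. The sketch below lists one period of the schedule, with labels following the text and temporary data written in parentheses; the data layout and the function name are hypothetical.

```python
# One period of the conventional schedule: (process offset from N, kind, data produced).
PERIOD = [
    (0, "normalize", "S1[n+2]"),
    (1, "step a", "(S2[n+2])"),
    (2, "step a", "(D2[n+1])"),
    (3, "step a", "(X(2n+2))"),
    (4, "step a", "(X(2n+1))"),
    (5, "normalize", "D1[n+2]"),
    (6, "step b", "S2[n+2]"),
    (7, "step b", "D2[n+1]"),
    (8, "step b", "X(2n+2)"),
    (9, "step b", "X(2n+1)"),
]


def cycles_per_output_point(period):
    """Each process takes one clock cycle; only finished X(...) entries are outputs."""
    outputs = [name for _, _, name in period if name.startswith("X(")]
    return len(period) / len(outputs)
```

Ten one-cycle processes yield two finished output points, which gives the average of five clock cycles per output point stated above.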
[0026]
In view of the above problems, it is an object of the present invention to provide a wavelet transform apparatus and a wavelet transform method that can efficiently perform a wavelet transform based on a lifting configuration in a short time.
[0027]
[Means for Solving the Problems]
In order to solve the above problem, the invention according to claim 1 is a wavelet transform device that synthesizes band-divided high-frequency component data and low-frequency component data based on a lifting configuration, the device comprising: a filtering unit that takes in an input data sequence, formed by alternately arranging in pixel units a first data sequence consisting of one of the high-frequency and low-frequency components and a second data sequence consisting of the other, and calculates an output data sequence; and a control unit. The filtering unit includes: normalizing means for executing one or more normalization processes, each of which multiplies an item of the input data sequence by a predetermined normalization coefficient to convert that input data into first-stage intermediate data within one clock cycle per point; and intermediate data conversion means for executing one or more conversion processes, each of which converts the first-stage intermediate data normalized by the normalizing means into intermediate data on one series over one or more stages within one clock cycle per point, or converts final-stage intermediate data into output data within one clock cycle per point. The control unit controls the normalizing means and the intermediate data conversion means so that the one or more normalization processes and the one or more conversion processes are repeated until the output data of all points have been calculated, and so that at least two of the repeatedly executed normalization and conversion processes are executed in parallel within one clock cycle.
[0028]
The invention according to claim 2 is the wavelet transform device according to claim 1, wherein the normalizing means and the intermediate data converting means execute the normalizing processing and the converting processing in parallel.
[0029]
According to a third aspect of the present invention, in the wavelet transform device according to the first or second aspect, the normalizing means includes a normalization coefficient multiplier that multiplies each input data by the normalization coefficient, and a delay unit that delays the data output from the normalization coefficient multiplier. The intermediate data conversion means includes a two-point operation unit, consisting of a lifting coefficient multiplier that multiplies one of two points of intermediate data by a predetermined lifting coefficient and an adder that adds the data output from the lifting coefficient multiplier to the other of the two points of intermediate data, and an output destination selection unit that takes in the data output from the two-point operation unit and outputs it to the output destination designated by the control unit. The wavelet transform device further includes a memory management unit and a memory that temporarily stores data under the control of the memory management unit, and the memory management unit performs control so that the data output from the output destination selection unit is transferred to and stored in the memory.
[0030]
According to a fourth aspect of the present invention, in the wavelet transform device according to the third aspect, the control unit performs control so that the two-point operation unit repeatedly executes the following conversion processes until the output data of all points have been calculated: a first conversion process that adds the first-stage intermediate data on a series starting from input data belonging to the second data sequence (hereinafter, a "second series") to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on a series starting from input data belonging to the first data sequence (hereinafter, a "first series") one point before the series of that intermediate data, thereby calculating second-stage temporary data on the second series within one clock cycle per point; a second conversion process that adds the temporary data calculated in the first conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point after the series of that temporary data, thereby calculating second-stage intermediate data on the second series within one clock cycle per point; a third conversion process that adds the first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point before the series of that intermediate data, thereby calculating second-stage temporary data on the first series within one clock cycle per point; a fourth conversion process that adds the temporary data calculated in the third conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point after the series of that temporary data, thereby calculating second-stage intermediate data on the first series within one clock cycle per point; a fifth conversion process that adds the M-th-stage intermediate data on the second series (the stage number M is an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point before the series of that intermediate data, thereby calculating (M+1)-th-stage temporary data on the second series within one clock cycle per point; a sixth conversion process that adds the temporary data calculated in the fifth conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point after the series of that temporary data, thereby calculating (M+1)-th-stage intermediate data on the second series within one clock cycle per point; a seventh conversion process that adds the L-th-stage intermediate data on the first series (the stage number L is an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point before the series of that intermediate data, thereby calculating (L+1)-th-stage temporary data on the first series within one clock cycle per point; and an eighth conversion process that adds the temporary data calculated in the seventh conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point after the series of that temporary data, thereby calculating (L+1)-th-stage intermediate data on the first series within one clock cycle per point.
[0031]
The invention according to claim 5 is the wavelet transform device according to claim 4, wherein the control unit performs control such that, after the first conversion process and the third conversion process are executed, the fifth conversion process and the seventh conversion process are executed by the two-point operation unit until the temporary data of the final stage is calculated, and then, after the second conversion process and the fourth conversion process are executed, the sixth conversion process and the eighth conversion process are executed by the two-point operation unit until the output data is calculated.
[0032]
The invention according to claim 6 is the wavelet transform device according to claim 4, further comprising four of the two-point operation units that operate independently of each other, wherein the control unit performs control such that the following are executed in parallel by the respective two-point operation units: the second conversion process of calculating the second-stage intermediate data on the series that belongs to the second data string and starts from the P-th (data number P is an integer) input data in the input data string; the third conversion process of calculating the second-stage temporary data on the series starting from the (P−1)-th input data; the sixth conversion process of calculating the (M+1)-th stage intermediate data on the series starting from the (P−4)-th input data; and the seventh conversion process of calculating the (L+1)-th stage temporary data on the series starting from the (P−5)-th input data; and such that the following are likewise executed in parallel by the respective two-point operation units: the first conversion process of calculating the second-stage temporary data on the series starting from the (P+2)-th input data; the fourth conversion process of calculating the second-stage intermediate data on the series starting from the (P−1)-th input data; the fifth conversion process of calculating the (M+1)-th stage temporary data on the series starting from the (P−2)-th input data; and the eighth conversion process of calculating the (L+1)-th stage intermediate data on the series starting from the (P−5)-th input data.
[0033]
According to a seventh aspect of the present invention, in the wavelet transform device according to the first or second aspect, the normalizing means includes a normalization coefficient multiplier that multiplies each input data by the normalization coefficient, and a delay unit that delays the data output from the normalization coefficient multiplier; the intermediate data conversion means includes a three-point operation unit and an output destination selection unit, the three-point operation unit including a first adder that adds the first and second input data among three points of input data, a lifting coefficient multiplier that multiplies the data output from the first adder by a predetermined lifting coefficient, and a second adder that calculates intermediate data by adding the data output from the lifting coefficient multiplier and the third input data, and the output destination selection unit receiving the intermediate data output from the three-point operation unit and outputting it to an output destination specified by the control unit; and the memory management unit performs control so that the intermediate data output from the output destination selection unit is transferred to and stored in the memory.
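For illustration only, the three-point operation unit of this aspect can be modeled in a few lines of Python. The function name `three_point_unit` and its argument names are hypothetical; only the order adder, multiplier, adder is taken from the text, and the numeric values are arbitrary.

```python
def three_point_unit(first, second, third, lifting_coeff):
    """Hypothetical software model of the three-point operation unit."""
    summed = first + second            # first adder: first and second input data
    scaled = lifting_coeff * summed    # lifting coefficient multiplier
    return scaled + third              # second adder: add the third input data


# Arbitrary example values, chosen so the arithmetic is exact in floating point.
result = three_point_unit(2.0, 4.0, 10.0, -0.5)
```

In this model one call corresponds to one complete lifting update of a single point, which is why the three-point configuration needs no temporary data.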
[0034]
The invention according to claim 8 is the wavelet transform device according to claim 7, wherein the control unit performs control such that the three-point operation unit repeatedly executes the following conversion processes until the output data of all points is calculated: a first conversion process of adding the first-stage intermediate data on a "series starting from input data belonging to the second data string" (hereinafter referred to as a second series) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of first-stage intermediate data on the "series starting from input data belonging to the first data string" (hereinafter referred to as a first series) one point before and one point after that series, thereby calculating the second-stage intermediate data on the second series within one clock cycle per point; a second conversion process of adding the first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of second-stage intermediate data on the second series one point before and one point after that series, thereby calculating the second-stage intermediate data on the first series within one clock cycle per point; a third conversion process of adding the M-th stage (the number of stages M is an integer of 1 or more) intermediate data on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of M-th stage intermediate data on the first series one point before and one point after that series, thereby calculating the (M+1)-th stage intermediate data on the second series within one clock cycle per point; and a fourth conversion process of adding the L-th stage (the number of stages L is an integer of 1 or more) intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of (L+1)-th stage intermediate data on the second series one point before and one point after that series, thereby calculating the (L+1)-th stage intermediate data on the first series within one clock cycle per point.
[0035]
The invention according to claim 9 is the wavelet transform device according to claim 8, comprising two of the three-point operation units that operate independently of each other, wherein the control unit performs control such that the second conversion process of calculating the intermediate data on the series that belongs to the first data string and starts from the P-th (data number P is an integer) input data in the input data string, and the fourth conversion process of calculating the (L+1)-th stage intermediate data on the series starting from the (P−4)-th input data, are executed in parallel by the respective three-point operation units.
[0036]
According to a tenth aspect of the present invention, in the wavelet transform device according to the eighth or ninth aspect, the control unit performs control such that the first conversion process of calculating the intermediate data on the series starting from the (P+3)-th (data number P is an integer) input data in the input data string, and the third conversion process of calculating the (M+1)-th stage intermediate data on the series starting from the (P−1)-th input data, are executed in parallel by the respective three-point operation units.
[0037]
According to an eleventh aspect of the present invention, in the wavelet transform device according to the eighth aspect, the control unit controls the first to fourth conversion processes in parallel.
[0038]
The invention according to claim 12 is the wavelet transform device according to any one of claims 1 to 11, wherein the filtering unit includes a first filtering unit and a second filtering unit connected in series. The first filtering unit receives the data of the high-frequency component and the low-frequency component band-divided in one of the horizontal direction and the vertical direction, and synthesizes these data to calculate combined data. The second filtering unit processes the combined data calculated by the first filtering unit in the other of the horizontal direction and the vertical direction, thereby calculating the combined data in that other direction.
[0039]
According to a thirteenth aspect of the present invention, there is provided a wavelet transform method for synthesizing band-divided high-frequency component data and low-frequency component data based on a lifting configuration, the method comprising the steps of:
(a) selectively taking in input data from an input data string configured by alternately arranging, in pixel units, a first data string composed of one of the high-frequency component and the low-frequency component and a second data string composed of the other;
(b) multiplying each input data taken in the step (a) by a normalization coefficient to convert it into first-stage intermediate data within one clock cycle per point; and
(c) converting the m-th stage (m is an integer of 1 or more) intermediate data into the (m+1)-th stage intermediate data within one clock cycle per point (including the case where the m-th stage intermediate data is the final-stage intermediate data, in which case the (m+1)-th stage intermediate data is the output data),
wherein the steps (b) and (c) are repeatedly executed until the output data of all points is calculated, and the repeatedly executed steps (b) and (c) are executed in parallel within one clock cycle.
[0040]
According to a fourteenth aspect of the present invention, in the wavelet transform method according to the thirteenth aspect, the step (c) comprises the steps of:
(c-1) adding the first-stage intermediate data on a "series starting from input data belonging to the second data string" (hereinafter referred to as a second series) to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the "series starting from input data belonging to the first data string" (hereinafter referred to as a first series) one point before that series, thereby calculating the temporary data of the second stage on the second series within one clock cycle per point;
(c-2) adding the temporary data calculated in the step (c-1) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point after the series of the temporary data, thereby calculating the second-stage intermediate data on the second series within one clock cycle per point;
(c-3) adding the first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point before that series, thereby calculating the temporary data of the second stage on the first series within one clock cycle per point;
(c-4) adding the temporary data calculated in the step (c-3) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point after the series of the temporary data, thereby calculating the second-stage intermediate data on the first series within one clock cycle per point;
(c-5) adding the M-th stage (the number of stages M is an integer of 1 or more) intermediate data on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the M-th stage intermediate data on the first series one point before the series of that intermediate data, thereby calculating the temporary data of the (M+1)-th stage on the second series within one clock cycle per point;
(c-6) adding the temporary data calculated in the step (c-5) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the M-th stage intermediate data on the first series one point after the series of the temporary data, thereby calculating the (M+1)-th stage intermediate data on the second series within one clock cycle per point;
(c-7) adding the L-th stage (the number of stages L is an integer of 1 or more) intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th stage intermediate data on the second series one point before the series of that intermediate data, thereby calculating the temporary data of the (L+1)-th stage on the first series within one clock cycle per point; and
(c-8) adding the temporary data calculated in the step (c-7) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th stage intermediate data on the second series one point after the series of the temporary data, thereby calculating the (L+1)-th stage intermediate data on the first series within one clock cycle per point,
wherein the steps (c-1) to (c-8) are repeatedly executed until the output data of all points is calculated.
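The pairing of a temporary-data step with its completing step, as in steps (c-1) and (c-2), can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the `memory` dictionary, and the numeric values are all assumptions chosen for readability.

```python
def temporary_step(center, neighbor_before, lifting_coeff):
    """Model of steps (c-1), (c-3), (c-5), (c-7): produce temporary data."""
    return center + lifting_coeff * neighbor_before


def completing_step(temporary, neighbor_after, lifting_coeff):
    """Model of steps (c-2), (c-4), (c-6), (c-8): finish the intermediate data."""
    return temporary + lifting_coeff * neighbor_after


memory = {}                       # stands in for the memory holding temporary data
coeff = -0.5                      # illustrative value, not a coefficient from the text
memory["temp"] = temporary_step(10.0, 2.0, coeff)          # neighbor one point before
intermediate = completing_step(memory["temp"], 4.0, coeff)  # neighbor one point after
```

Each call uses only two data points, so the pair together yields the same value as a single update that adds the coefficient times the sum of both neighbors, at the cost of storing the temporary data between the two calls.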
[0041]
The invention according to claim 15 is the wavelet transform method according to claim 14, wherein, after the steps (c-1) and (c-3) are executed, the steps (c-5) and (c-7) are executed until the temporary data of the output data is calculated, and then, after the steps (c-2) and (c-4) are executed, the steps (c-6) and (c-8) are executed until the output data is calculated.
[0042]
According to a sixteenth aspect of the present invention, in the wavelet transform method according to the fourteenth aspect, the following four processes are executed in parallel by the respective two-point arithmetic units: the step (c-2) of calculating the second-stage intermediate data on the series that belongs to the second data string and starts from the P-th (data number P is an integer) input data in the input data string; the step (c-3) of calculating the second-stage temporary data on the series starting from the (P−1)-th input data; the step (c-6) of calculating the (M+1)-th stage intermediate data on the series starting from the (P−4)-th input data; and the step (c-7) of calculating the (L+1)-th stage temporary data on the series starting from the (P−5)-th input data. Likewise, the following four processes are controlled so as to be executed in parallel: the step (c-1) of calculating the second-stage temporary data on the series starting from the (P+2)-th input data; the step (c-4) of calculating the second-stage intermediate data on the series starting from the (P−1)-th input data; the step (c-5) of calculating the (M+1)-th stage temporary data on the series starting from the (P−2)-th input data; and the step (c-8) of calculating the (L+1)-th stage intermediate data on the series starting from the (P−5)-th input data.
[0043]
According to a seventeenth aspect of the present invention, in the wavelet transform method according to the thirteenth aspect, the step (c) comprises the steps of:
(c-1) adding the first-stage intermediate data on a "series starting from input data belonging to the second data string" (hereinafter referred to as a second series) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of first-stage intermediate data on the "series starting from input data belonging to the first data string" (hereinafter referred to as a first series) one point before and one point after that series, thereby calculating the second-stage intermediate data on the second series within one clock cycle per point;
(c-2) adding the first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of second-stage intermediate data on the second series one point before and one point after that series, thereby calculating the second-stage intermediate data on the first series within one clock cycle per point;
(c-3) adding the M-th stage (the number of stages M is an integer of 1 or more) intermediate data on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of M-th stage intermediate data on the first series one point before and one point after that series, thereby calculating the (M+1)-th stage intermediate data on the second series within one clock cycle per point; and
(c-4) adding the L-th stage (the number of stages L is an integer of 1 or more) intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of (L+1)-th stage intermediate data on the second series one point before and one point after that series, thereby calculating the (L+1)-th stage intermediate data on the first series within one clock cycle per point,
wherein the steps (c-1) to (c-4) are repeatedly executed until the output data of all points is calculated.
[0044]
The invention according to claim 18 is the wavelet transform method according to claim 17, wherein the step (c-2) of calculating the second-stage intermediate data on the series that belongs to the first data string and starts from the P-th (data number P is an integer) input data in the input data string, and the step (c-4) of calculating the (L+1)-th stage intermediate data on the series starting from the (P−4)-th input data, are controlled so that the two processes are executed in parallel.
[0045]
The invention according to claim 19 is the wavelet transform method according to claim 17 or claim 18, wherein the step (c-1) of calculating the intermediate data on the series starting from the (P+3)-th (data number P is an integer) input data in the input data string, and the step (c-3) of calculating the (M+1)-th stage intermediate data on the series starting from the (P−1)-th input data, are controlled so that the two processes are executed in parallel.
[0046]
The invention according to claim 20 is the wavelet transform method according to claim 17, wherein the steps (c-1) to (c-4) are executed in parallel.
[0047]
The invention according to claim 21 is the wavelet transform method according to any one of claims 13 to 20, wherein a combined data string is calculated by applying the steps (a) to (c), on a line basis, in one of the horizontal direction and the vertical direction to two-dimensional image data band-divided into a low-frequency component and a high-frequency component, and the steps (a) to (c) are then applied to the combined data string in the other of the horizontal direction and the vertical direction.
[0048]
BEST MODE FOR CARRYING OUT THE INVENTION
<First embodiment>
Hereinafter, a wavelet transform device and a wavelet transform method according to the first embodiment of the present invention will be described. FIG. 1 is a diagram illustrating a schematic configuration of a wavelet transform device 1 according to the first embodiment. The wavelet transform device 1 includes a buffer 8 for temporarily holding subband data of a high-frequency component or a low-frequency component decomposed by wavelet transform, an MMU (memory management unit) 2 operating in synchronization with an externally supplied clock signal CLK, a first ring memory 3A, a horizontal filtering unit 4A, a line buffer circuit 5, a second ring memory 3B, and a vertical filtering unit 4B. Here, the first ring memory 3A, the horizontal filtering unit 4A, the line buffer circuit 5, the second ring memory 3B, and the vertical filtering unit 4B operate in synchronization with an externally supplied pixel clock signal PCLK.
[0049]
In the present embodiment, the MMU 2, the horizontal filtering unit 4A, and the vertical filtering unit 4B are configured by hardware, but may be configured by a computer program including an instruction group executed by a microprocessor instead.
[0050]
The sub-band data input to the wavelet transform device 1 is temporarily stored in the buffer 8. The wavelet transform device 1 has a function of applying a line-based two-dimensional inverse DWT to subband data once. The horizontal filtering unit 4A and the vertical filtering unit 4B are connected in series via a line buffer circuit 5 and a second ring memory 3B. As described later, the sub-band data is filtered in the horizontal direction by the horizontal filtering unit 4A, and then filtered in the vertical direction by the vertical filtering unit 4B. When performing a two-dimensional inverse DWT on data at a decomposition level of second order or higher, the wavelet transform device 1 may be repeatedly used two or more times.
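The serial arrangement of the two filtering units (horizontal first, then vertical) corresponds to a separable two-pass scheme, which can be sketched as below. This is only an outline under stated assumptions: `apply_separable` and `toy_line_filter` are hypothetical names, the whole array is held in memory rather than streamed line by line through a line buffer, and the placeholder 1-D operation is a running sum, not the synthesis filter of the embodiment.

```python
def apply_separable(subbands, synthesize_line):
    """Apply a 1-D operation to every row, then to every column of the result."""
    rows = [synthesize_line(list(row)) for row in subbands]        # horizontal pass
    columns = [synthesize_line(list(col)) for col in zip(*rows)]   # vertical pass
    return [list(row) for row in zip(*columns)]                    # back to row-major


def toy_line_filter(line):
    """Placeholder 1-D operation (running sum); NOT the 9x7 synthesis filter."""
    out, total = [], 0
    for value in line:
        total += value
        out.append(total)
    return out


image = apply_separable([[1, 2], [3, 4]], toy_line_filter)
```

For data at a decomposition level of two or more, the same two-pass call would simply be repeated once per level, mirroring the repeated use of the device described above.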
[0051]
The MMU 2 has a function of controlling data input/output among the buffer 8, the first ring memory 3A, and the second ring memory 3B, and can transfer input data read from the buffer 8 to the first ring memory 3A and store it there. The horizontal filtering unit 4A performs filtering in the horizontal direction on the data input from the first ring memory 3A, so that, in eight clock cycles of the pixel clock signal PCLK, it can calculate two points of output data obtained by synthesizing the high-frequency component in the horizontal direction and the low-frequency component in the same direction, and output them to the line buffer circuit 5. Therefore, the average time required to calculate one point of output data is four clock cycles.
[0052]
The data output from the line buffer circuit 5 is stored in the second ring memory 3B. The MMU 2 causes the input data to be input from the second ring memory 3B to the vertical filtering unit 4B. The vertical filtering unit 4B performs filtering on the input data in the vertical direction, so that two points of output data, obtained by combining the high-frequency component in the vertical direction and the low-frequency component in the same direction, are calculated and output in eight cycles of the pixel clock signal PCLK.
[0053]
The configuration of the horizontal filtering unit 4A and the configuration of the vertical filtering unit 4B are the same. FIG. 2 shows a schematic configuration of the filtering unit 4 (the horizontal filtering unit 4A or the vertical filtering unit 4B). The ring memory 3 shown in FIG. 2 represents one of the first ring memory 3A and the second ring memory 3B shown in FIG.
[0054]
The filtering unit 4 includes a first data selector 11 that selectively takes in input data, a first coefficient multiplier 12, a delay register 16, a second data selector 17, a second coefficient multiplier 18, an adder 22, an output destination selection unit (DMUX) 23, and a control unit 24. The control unit 24 operates in synchronization with the pixel clock signal PCLK. According to the values of the selection control signals SEL0 and SEL1, respectively, supplied from the control unit 24, the first and second data selectors 11 and 17 route the input data fetched from the ring memory 3 and the data held in the delay register 16 to the first terminal S0 and the second terminal S1.
[0055]
Further, the first coefficient multiplier 12 multiplies the data output from the first terminal S0 of the first data selector 11 by one of the normalization coefficients κ and 1/κ according to the control signal C0 supplied from the control unit 24, and outputs the result (normalization processing). The data output from the first coefficient multiplier 12 is input to the second data selector 17 after being delayed by one clock cycle of the pixel clock signal PCLK in the delay register 16. It should be noted that the first coefficient multiplier 12 and the delay register 16 constitute the normalizing means of the present invention.
[0056]
Further, the second coefficient multiplier 18 multiplies the data output from the first terminal S0 of the second data selector 17 by one of the lifting coefficients −α, −β, −γ, and −δ according to the control signal C1 supplied from the control unit 24, and outputs the result (coefficient multiplication processing). The adder 22 adds the data output from the second coefficient multiplier 18 and the data output from the second terminal S1 of the second data selector 17 and outputs the result to the output destination selection unit 23 (addition processing). The data normalized by 1/κ is also output from the third terminal S2 of the second data selector 17 to the external MMU 2. The MMU 2 can transfer the data output from the third terminal S2 of the second data selector 17 to the ring memory 3 and store it there.
[0057]
Further, the output destination selecting unit 23 outputs data input from the adder 22 from any one of the first terminal K0 to the third terminal K2 according to the value of the selection control signal SEL2 supplied from the control unit 24. The coefficient multiplication processing and the addition processing in the second coefficient multiplier 18 and the adder 22 are executed within one clock cycle per point. Therefore, the period required to multiply and add one point of input data by the lifting coefficient is one cycle of the pixel clock signal PCLK.
[0058]
Note that the coefficient register 19 and the adder 22 constitute a two-point operation unit that fetches two points of input data from the data selector 17 and performs an operation. The two-point calculation unit and the output destination selection unit 23 constitute an intermediate data calculation unit.
[0059]
The data output from the first terminal K0 and the second terminal K1 of the output destination selection unit 23 is output to the outside as output data in which the input data of the low-frequency component and the high-frequency component is synthesized.
[0060]
The data output from the second terminal K1 of the output destination selection unit 23 is branched and output to the external MMU2. The data output from the third terminal K2 is output to the external MMU2. The MMU 2 can transfer data output from the second terminal K1 and the third terminal K2 to the outside to the ring memory 3 and store the data.
[0061]
Next, a representative example of the lifting operation using the filtering unit 4 will be described below with reference to FIGS. 3 to 10. FIGS. 3 to 10 are lattice diagrams schematically showing a lifting configuration of a 9 × 7 tap Daubechies filter. The calculation of this grid diagram is performed in the same manner as in the case of FIG. In FIGS. 3 to 10, for convenience of description, the lifting coefficients −α, −β, −γ, −δ and the normalization coefficients κ, 1/κ corresponding to the line segments connecting the respective grid points are not displayed.
[0062]
As shown in FIGS. 3 to 10, the input data ..., Y(2n−1), Y(2n), ..., Y(2n+9) each pass through a series of grid points (intermediate data) and are output as output data ..., X(2n−1), X(2n), ..., X(2n+9). For example, the input data Y(2n) is output as output data X(2n) after passing through two stages of intermediate data (lattice points). Hereinafter, a process of normalizing input data to generate intermediate data is referred to as a normalization process (corresponding to steps 1 and 2 in the above formula (1)), and a process of calculating other intermediate data is referred to as a conversion process (corresponding to steps 3 and 4 described above). In the present embodiment and the other embodiments described later, two stages of intermediate data are calculated for each series. However, the present invention is not limited to this, and a lifting configuration that calculates only one stage of intermediate data is also possible. In fact, with a 5 × 3 tap filter or a 13 × 7 tap filter, a lifting configuration that calculates only one stage of intermediate data is possible.
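As a non-pipelined reference for the lattice described above, the normalization process followed by the conversion processes can be written as a short Python sketch. Several points are assumptions rather than statements of the embodiment: the numeric constants are the 9-7 irreversible filter parameters commonly quoted from JPEG 2000 Part 1, the order of the −β and −α steps follows that standard, the edges are handled by mirroring, and `forward_97` is included only so that the inverse can be checked by a round trip.

```python
ALPHA = -1.586134342   # assumed JPEG 2000 9-7 lifting parameters
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852
KAPPA = 1.230174105    # assumed normalization coefficient


def _lift(x, start, coeff):
    """In place: x[i] += coeff * (left + right) for i = start, start+2, ..."""
    n = len(x)
    for i in range(start, n, 2):
        left = x[i - 1] if i - 1 >= 0 else x[i + 1]    # mirror at the left edge
        right = x[i + 1] if i + 1 < n else x[i - 1]    # mirror at the right edge
        x[i] += coeff * (left + right)


def inverse_97(y):
    """Synthesis: even samples are low-frequency, odd samples high-frequency."""
    # Normalization process: first-stage intermediate data S^1 and D^1.
    x = [v * KAPPA if i % 2 == 0 else v / KAPPA for i, v in enumerate(y)]
    _lift(x, 0, -DELTA)   # second-stage data on the even series (S^2)
    _lift(x, 1, -GAMMA)   # second-stage data on the odd series (D^2)
    _lift(x, 0, -BETA)    # output data X(2n)
    _lift(x, 1, -ALPHA)   # output data X(2n+1)
    return x


def forward_97(x):
    """Analysis counterpart, used here only to test the inverse."""
    y = list(x)
    _lift(y, 1, ALPHA)
    _lift(y, 0, BETA)
    _lift(y, 1, GAMMA)
    _lift(y, 0, DELTA)
    return [v / KAPPA if i % 2 == 0 else v * KAPPA for i, v in enumerate(y)]


signal = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0]
restored = inverse_97(forward_97(signal))
```

Each call to `_lift` updates one parity using only neighbors of the other parity, which is what allows the update to be done in place and, in hardware, split into per-point operations.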
[0063]
FIGS. 3 to 10 show the contents of the N-th (N is an integer) to (N+7)-th processing in the present embodiment. In the N-th processing (FIG. 3), an operation on the two points of intermediate data S^1_{n+2} and D^1_{n+1} in the target area A1 is performed within one cycle (one clock cycle) of the pixel clock signal PCLK using the above-mentioned step a (FIG. 38), and the temporary data (S^2_{n+2}) of the second stage on the series starting from the even-numbered input data Y(2n+4) is calculated. That is, the processing of step a is executed using the even-numbered intermediate data S^1_{n+2} and the odd-numbered intermediate data D^1_{n+1} on the series one point before this intermediate data S^1_{n+2}.
[0064]
Further, in the cycle one clock before the arithmetic processing in the target area A1, a normalization process of multiplying the input data Y(2n+4) by the normalization coefficient κ is executed in the target area N1, and the first-stage intermediate data S^1_{n+2} on the series of the input data Y(2n+4) is calculated.
[0065]
The specific contents of the N-th processing are as follows. The ring memory 3 shown in FIG. 2 includes a storage area of 5 lines (series) for storing input data, intermediate data, and temporary data, and has a structure in which new data sequentially overwrites the storage area holding old data that has already been referenced.
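The overwrite behavior of the ring memory can be pictured with the following sketch. The class name `RingMemorySketch` and, in particular, the modulo addressing (series k stored in line k mod 5) are assumptions made for illustration; the text only states that there are 5 lines and that new data overwrites old data that has already been referenced.

```python
class RingMemorySketch:
    """Hypothetical model: a fixed number of lines reused in rotation."""

    def __init__(self, lines=5):
        self.lines = [None] * lines

    def slot(self, series_index):
        # Assumed addressing: series k lives in line k modulo the line count.
        return series_index % len(self.lines)

    def write(self, series_index, value):
        self.lines[self.slot(series_index)] = value   # overwrites the old content

    def read(self, series_index):
        return self.lines[self.slot(series_index)]


ring = RingMemorySketch()
for k in range(5):
    ring.write(k, f"Y({k})")      # input data for five consecutive series
ring.write(2, "temp(2)")          # temporary data replaces the referenced input
ring.write(5, "Y(5)")             # a new series reuses the line of series 0
```

The point of the structure is that the storage requirement stays fixed at 5 lines regardless of the length of the input data string.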
[0066]
First, the processing in the target area N1 executed in the cycle one clock before will be described. The MMU 2 causes the input data Y(2n+4) temporarily stored in the ring memory 3 to be output to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 and causes it to output the input data Y(2n+4) to the first coefficient multiplier 12. The first coefficient multiplier 12 outputs the normalization coefficient κ selected according to the control signal C0 supplied from the control unit 24 to the multiplier 14, and the multiplier 14 outputs the data κ × Y(2n+4) = S^1_{n+2}, obtained by multiplying the input data Y(2n+4) by the normalization coefficient κ, to the delay register 16. The coefficient multiplication processing in the first coefficient multiplier 12 is executed within one clock cycle.
[0067]
One clock cycle later, the intermediate data S^1_{n+2} held in the delay register 16 is output to the second data selector 17. The MMU 2 causes the intermediate data D^1_{n+1} temporarily stored in the ring memory 3 to be output to the first data selector 11. The first data selector 11 outputs the intermediate data D^1_{n+1} from the second terminal S1 in response to the selection control signal SEL0 supplied from the control unit 24. The output data is input to the second data selector 17. In response to the selection control signal SEL1 supplied from the control unit 24, the second data selector 17 outputs the intermediate data D^1_{n+1} from the first terminal S0 to the second coefficient multiplier 18, and outputs the intermediate data S^1_{n+2} from the second terminal S1 to the adder 22.
[0068]
The second coefficient multiplier 18 outputs the lifting coefficient δ selected according to the control signal C1 supplied from the control unit 24 to the multiplier 20, and the multiplier 20 multiplies the intermediate data D^1_{n+1} by the lifting coefficient δ and outputs the data δ × D^1_{n+1} to the two's complement arithmetic circuit 21. The two's complement arithmetic circuit 21 is an arithmetic circuit that inverts the sign of the input data, and outputs −δ × D^1_{n+1} to the adder 22. Then, the adder 22 adds the two points of data −δ × D^1_{n+1} and S^1_{n+2} to calculate the temporary data (S^2_{n+2}), and outputs it to the output destination selection unit 23. This temporary data (S^2_{n+2}) is calculated within one clock cycle.
[0069]
The output destination selection unit 23 outputs the temporary data (S^2_{n+2}) to the external MMU 2 from the third terminal K2 selected according to the value of the selection control signal SEL2 supplied from the control unit 24. The MMU 2 transfers the temporary data (S^2_{n+2}) to the ring memory 3 and overwrites the storage area of the input data Y(2n+4), which has already been referenced.
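The two-cycle flow of the N-th processing (normalization in one cycle, multiply-add in the next) can be sketched as a small clocked model. The class and parameter names are hypothetical, the coefficient values in the usage lines are arbitrary round numbers rather than the actual κ and δ, and the selectors and the MMU are omitted.

```python
class NormalizeThenLift:
    """Hypothetical two-stage pipeline: normalizer -> delay register -> multiply-add."""

    def __init__(self):
        self.delay_register = None   # value normalized in the previous cycle

    def clock(self, raw_input, norm_coeff, neighbor, lifting_coeff):
        """Advance one clock cycle; return temporary data, or None while the pipe fills."""
        delayed = self.delay_register
        temporary = None
        if delayed is not None:
            # Multiply, invert the sign, then add: delayed + (-(coeff * neighbor)).
            temporary = delayed + (-(lifting_coeff * neighbor))
        self.delay_register = norm_coeff * raw_input   # normalization for the next cycle
        return temporary


unit = NormalizeThenLift()
first = unit.clock(raw_input=3.0, norm_coeff=2.0, neighbor=0.0, lifting_coeff=0.5)
second = unit.clock(raw_input=5.0, norm_coeff=2.0, neighbor=4.0, lifting_coeff=0.5)
```

In the second call the unit both emits the temporary data for the value normalized one cycle earlier and normalizes the next input, which is the overlap between the target areas N1 and A1 described above.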
[0070]
In the next (N+1)-th processing (FIG. 4), the two-point operation using the two intermediate data points D^1_{n+1} and S^2_{n+1} in the target area A2 is executed within one clock cycle to calculate the second-stage temporary data (D^2_{n+1}) on the series starting from the odd-numbered input data Y(2n+3). The intermediate data S^2_{n+1} is the second-stage data on the series starting from the input data Y(2n+2), one point before the input data Y(2n+3). Specifically, the MMU 2 causes the already calculated intermediate data D^1_{n+1} and S^2_{n+1} to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the intermediate data D^1_{n+1} and S^2_{n+1} from the second and third terminals S1 and S2, respectively, to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data S^2_{n+1} from the first terminal S0 to the second coefficient multiplier 18 and outputs the intermediate data D^1_{n+1} from the second terminal S1 to the adder 22.
[0071]
In the second coefficient multiplier 18, the lifting coefficient γ selected according to the control signal C1 supplied from the control unit 24 is output to the multiplier 20, and the multiplier 20 multiplies the intermediate data S^2_{n+1} by the lifting coefficient γ and outputs the data γ × S^2_{n+1} to the two's complement arithmetic circuit 21. The adder 22 adds −γ × S^2_{n+1}, which is the output of the two's complement arithmetic circuit 21, and D^1_{n+1}, which is the output of the second data selector 17, to calculate the temporary data (D^2_{n+1}), which it outputs to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary data (D^2_{n+1}) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the temporary data (D^2_{n+1}) to the ring memory 3, where it overwrites the storage area of the intermediate data D^1_{n+1} that was referred to.
[0072]
In the next (N+2)-th processing (FIG. 5), the two-point operation using the two intermediate data points S^2_{n+1} and D^2_n in the target area A3 is executed within one clock cycle to calculate the output temporary data (X(2n+2)) on the series starting from the even-numbered input data Y(2n+2). The intermediate data D^2_n is the second-stage intermediate data on the series starting from the input data Y(2n+1), one point before the input data Y(2n+2). Specifically, the MMU 2 causes the already calculated intermediate data S^2_{n+1} and D^2_n to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the intermediate data S^2_{n+1} and D^2_n from the second and third terminals S1 and S2, respectively, to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data D^2_n from the first terminal S0 to the second coefficient multiplier 18 and outputs the intermediate data S^2_{n+1} from the second terminal S1 to the adder 22.
[0073]
The second coefficient multiplier 18 multiplies the intermediate data D^2_n by the lifting coefficient β and outputs the weighted data β × D^2_n to the two's complement arithmetic circuit 21. The adder 22 adds −β × D^2_n, which is the output of the two's complement arithmetic circuit 21, and S^2_{n+1}, which is the output of the second data selector 17, to calculate the output temporary data (X(2n+2)), which it outputs to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary data (X(2n+2)) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the temporary data (X(2n+2)) to the ring memory 3, where it overwrites the storage area of the intermediate data S^2_{n+1} that was referred to.
[0074]
In the next (N+3)-th processing (FIG. 6), the two-point operation of step a using the intermediate data D^2_n and the output data X(2n) in the target area A4 is executed within one clock cycle to calculate the output temporary data (X(2n+1)) on the series starting from the odd-numbered input data Y(2n+1). Specifically, the MMU 2 causes the already calculated intermediate data D^2_n and the output data X(2n) to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the intermediate data D^2_n and the output data X(2n) from the second and third terminals S1 and S2, respectively, to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs X(2n) from the first terminal S0 to the second coefficient multiplier 18 and outputs the intermediate data D^2_n from the second terminal S1 to the adder 22.
[0075]
The second coefficient multiplier 18 multiplies X(2n) by the lifting coefficient α and outputs the weighted data α × X(2n) to the two's complement arithmetic circuit 21. The adder 22 adds −α × X(2n), which is the output of the two's complement arithmetic circuit 21, and D^2_n, which is the output of the second data selector 17, to calculate the output temporary data (X(2n+1)), which it outputs to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary data (X(2n+1)) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the temporary data (X(2n+1)) to the ring memory 3, where it overwrites the storage area of the intermediate data D^2_n that was referred to.
[0076]
In the next (N+4)-th processing (FIG. 7), the two-point operation using the temporary data (S^2_{n+2}) and the intermediate data D^1_{n+2} in the target area B1 is executed within one clock cycle to calculate the second-stage intermediate data S^2_{n+2} on the series starting from the even-numbered input data Y(2n+4). The intermediate data D^1_{n+2} is data on the series one point after the series of the temporary data (S^2_{n+2}).
[0077]
Further, in the cycle one clock before the arithmetic processing in the target area B1 is executed, the normalization processing of multiplying the input data Y(2n+5) by the normalization coefficient 1/κ is executed in the target area N2. The first-stage intermediate data D^1_{n+2} on the series of the input data Y(2n+5) is thereby calculated.
[0078]
The specific processing is described starting from the cycle one clock before. In the cycle one clock before the operation on the target area B1 is performed, the MMU 2 causes the input data Y(2n+5) temporarily stored in the ring memory 3 to be output to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that the input data Y(2n+5) is output to the first coefficient multiplier 12. Under the control of the control unit 24, the first coefficient multiplier 12 multiplies the input data Y(2n+5) by the normalization coefficient 1/κ and outputs the resulting data (1/κ) × Y(2n+5) = D^1_{n+2} to the delay register 16. The coefficient multiplication in the first coefficient multiplier 12 is executed within one clock cycle.
[0079]
One clock cycle later, the intermediate data D^1_{n+2} held in the delay register 16 is output to the second data selector 17. The MMU 2 also causes the temporary data (S^2_{n+2}) to be output to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that the temporary data (S^2_{n+2}) is output to the second data selector 17.
[0080]
Then, the control unit 24 supplies the selection control signal SEL1 to the second data selector 17, so that the intermediate data D^1_{n+2} is output from the first terminal S0 to the second coefficient multiplier 18 and the temporary data (S^2_{n+2}) is output from the second terminal S1 to the adder 22. Further, the second data selector 17 outputs the intermediate data D^1_{n+2} from the third terminal S2 to the external MMU 2, and the MMU 2 transfers the intermediate data D^1_{n+2} to the ring memory 3, where it overwrites the storage area of the input data Y(2n+5) that was referred to.
[0081]
In the second coefficient multiplier 18, the lifting coefficient δ selected according to the control signal C1 supplied from the control unit 24 is output to the multiplier 20, and the multiplier 20 multiplies the intermediate data D^1_{n+2} by the lifting coefficient δ and outputs the result to the two's complement arithmetic circuit 21. The two's complement arithmetic circuit 21 outputs the data −δ × D^1_{n+2} to the adder 22. The adder 22 then adds the two data points −δ × D^1_{n+2} and the temporary data (S^2_{n+2}) to calculate the intermediate data S^2_{n+2}, which it outputs to the output destination selection unit 23. This calculation of the intermediate data S^2_{n+2} is executed within one clock cycle.
[0082]
The output destination selection unit 23 outputs the intermediate data S^2_{n+2} to the external MMU 2 from the third terminal K2, which is selected according to the value of the selection control signal SEL2 supplied from the control unit 24. The MMU 2 transfers the intermediate data S^2_{n+2} to the ring memory 3, where it overwrites the storage area of the temporary data (S^2_{n+2}) that was referred to.
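The one-clock offset between the normalization in the target area N2 and the operation in the target area B1 can be illustrated with a small Python model. This is a sketch under stated assumptions, not the circuit itself: the class `DelayRegister`, the function `run_n_plus_4`, and the numeric values of κ and δ (those commonly quoted for the JPEG 2000 9/7 filter) are introduced here for illustration only.

```python
KAPPA = 1.230174105   # assumed normalization constant κ
DELTA = 0.443506852   # assumed lifting coefficient δ


class DelayRegister:
    """One-clock delay: a value written in one cycle is read in the next."""

    def __init__(self):
        self._held = None

    def clock(self, new_value):
        previous, self._held = self._held, new_value
        return previous


def run_n_plus_4(y_odd, temp_s2):
    reg = DelayRegister()
    # Cycle t-1 (target area N2): normalize Y(2n+5) into D^1_{n+2}.
    reg.clock(y_odd / KAPPA)
    # Cycle t (target area B1): the delayed D^1_{n+2} reaches the operation.
    d1 = reg.clock(None)
    s2 = temp_s2 - DELTA * d1   # (S^2_{n+2}) − δ × D^1_{n+2}
    return d1, s2


d1_n2, s2_n2 = run_n_plus_4(2.0, 5.0)  # hypothetical Y(2n+5) and (S^2_{n+2})
```

In continuous operation the second `clock` call would already carry the next normalized sample, which is why the normalization adds no extra cycle to the schedule.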
[0083]
In the next (N+5)-th processing (FIG. 8), the two-point operation using the temporary data (D^2_{n+1}) and the intermediate data S^2_{n+2} in the target area B1 calculated in the (N+4)-th processing (FIG. 7) is executed within one clock cycle to calculate the second-stage intermediate data D^2_{n+1} on the series starting from the odd-numbered input data Y(2n+3). The intermediate data S^2_{n+2} is the second-stage data on the series one point after the series of the temporary data (D^2_{n+1}).
[0084]
Specifically, the MMU 2 causes the temporary data (D^2_{n+1}) and the intermediate data S^2_{n+2} to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the temporary data (D^2_{n+1}) and the intermediate data S^2_{n+2} from the second and third terminals S1 and S2 to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data S^2_{n+2} from the first terminal S0 to the second coefficient multiplier 18 and outputs the temporary data (D^2_{n+1}) from the second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies the intermediate data S^2_{n+2} by the lifting coefficient γ, and the sign of the result is inverted in the two's complement arithmetic circuit 21. The adder 22 adds the weighted intermediate data −γ × S^2_{n+2} and the temporary data (D^2_{n+1}) to calculate the intermediate data D^2_{n+1}, which it outputs to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the intermediate data D^2_{n+1} from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the intermediate data D^2_{n+1} to the ring memory 3, where it overwrites the storage area of the temporary data (D^2_{n+1}) that was referred to.
[0085]
In the next (N+6)-th processing (FIG. 9), the two-point operation using the output temporary data (X(2n+2)) calculated in the (N+2)-th processing (FIG. 5) and the intermediate data D^2_{n+1} in the target area B2 calculated in the (N+5)-th processing (FIG. 8) is executed within one clock cycle to calculate the output data X(2n+2) on the series starting from the even-numbered input data Y(2n+2). The intermediate data D^2_{n+1} is the second-stage intermediate data on the series one point after the series of the output temporary data (X(2n+2)).
[0086]
Specifically, the MMU 2 causes the temporary data (X(2n+2)) and the intermediate data D^2_{n+1} to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the temporary data (X(2n+2)) and the intermediate data D^2_{n+1} from the second and third terminals S1 and S2 to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data D^2_{n+1} from the first terminal S0 to the second coefficient multiplier 18 and outputs the temporary data (X(2n+2)) from the second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies the intermediate data D^2_{n+1} by the lifting coefficient β, and the sign of the result is inverted in the two's complement arithmetic circuit 21. The adder 22 adds the weighted intermediate data −β × D^2_{n+1} and the temporary data (X(2n+2)) to calculate the output data X(2n+2), which it outputs to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the output data X(2n+2) from the second terminal K1 both to the outside and to the external MMU 2, and the MMU 2 transfers the output data X(2n+2) to the ring memory 3, where it overwrites the storage area of the temporary data (X(2n+2)) that was referred to.
[0087]
In the next (N+7)-th processing (FIG. 10), the two-point operation of step b using the output temporary data (X(2n+1)) calculated in the (N+3)-th processing (FIG. 6) and the output data X(2n+2) in the target area B4 calculated in the (N+6)-th processing (FIG. 9) is executed within one clock cycle to calculate the output data X(2n+1) on the series starting from the odd-numbered input data Y(2n+1). Note that the output data X(2n+2) is output data on the series one point after the series of the output temporary data (X(2n+1)).
[0088]
Specifically, the MMU 2 causes the temporary data (X(2n+1)) and the output data X(2n+2) to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the temporary data (X(2n+1)) and the output data X(2n+2) from the second and third terminals S1 and S2 to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the output data X(2n+2) from the first terminal S0 to the second coefficient multiplier 18 and outputs the temporary data (X(2n+1)) from the second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies the output data X(2n+2) by the lifting coefficient α, and the sign of the result is inverted in the two's complement arithmetic circuit 21. The adder 22 adds the weighted data −α × X(2n+2) and the temporary data (X(2n+1)) to calculate the output data X(2n+1), which it outputs to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the output data X(2n+1) from the first terminal K0 to the outside.
[0089]
In the next (N+8)-th processing (not shown), processing similar to the N-th processing (FIG. 3) is performed, differing only in the target area. Thereafter, processing similar to the (N+1)-th to (N+7)-th processing is repeated. In this way, the same processing as the N-th processing (FIG. 3) to the (N+7)-th processing (FIG. 10) is executed while moving the target area until the output data ..., X(2n−1), X(2n), ... of all points have been calculated.
[0090]
Further, in the present embodiment, as shown in the N-th to (N+3)-th processing, the two-point operation of step a is executed until the output temporary data (X(2n+1)) of the final stage is calculated. Then, as shown in the (N+4)-th to (N+7)-th processing, the two-point operation of step b is executed to convert all of the temporary data calculated in the N-th to (N+3)-th processing into intermediate data or output data.
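As a check on this division into step a and step b, the following Python sketch compares the two-pass form with a direct three-point lifting operation on interior samples. It is a sketch under stated assumptions: the function names, the test sequence, the numeric coefficients (those commonly quoted for the JPEG 2000 9/7 filter), and the scaling of even-numbered samples by κ are not taken from this description, and boundary handling is omitted.

```python
ALPHA, BETA = -1.586134342, -0.052980118   # assumed 9/7 lifting coefficients
GAMMA, DELTA = 0.882911075, 0.443506852
KAPPA = 1.230174105                        # assumed normalization constant


def three_point(center, before, after, coeff):
    return center - coeff * (before + after)


def two_pass(center, before, after, coeff):
    temporary = center - coeff * before   # step a: uses the point before
    return temporary - coeff * after      # step b: uses the point after


def inverse_lift(y, combine):
    """Interior samples only; edges are simply left out of each stage."""
    half = len(y) // 2
    s1 = {n: KAPPA * y[2 * n] for n in range(half)}      # assumed even scaling
    d1 = {n: y[2 * n + 1] / KAPPA for n in range(half)}  # (1/κ) × Y(2n+1)
    s2 = {n: combine(s1[n], d1[n - 1], d1[n], DELTA) for n in range(1, half)}
    d2 = {n: combine(d1[n], s2[n], s2[n + 1], GAMMA) for n in range(1, half - 1)}
    x_even = {n: combine(s2[n], d2[n - 1], d2[n], BETA) for n in range(2, half - 1)}
    x_odd = {n: combine(d2[n], x_even[n], x_even[n + 1], ALPHA)
             for n in range(2, half - 2)}
    return x_even, x_odd


samples = [float((7 * k) % 11) for k in range(16)]  # arbitrary test input
direct = inverse_lift(samples, three_point)
split = inverse_lift(samples, two_pass)
```

Both forms should agree up to floating-point rounding, since step b only adds the contribution of the neighbor that step a left out.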
[0091]
As described above, in the wavelet transform method according to the present embodiment, the process of normalizing the input data ..., Y(2n), Y(2n+1), ... and the process of converting normalized intermediate data into other intermediate data are executed in parallel, each within one clock cycle. The average number of cycles required to calculate one point of output data can therefore be set to four clock cycles, and the calculation cycle of the output data can be shortened.
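The four-cycle average follows from counting one period of the schedule described above: eight processings of one clock cycle each, of which only the (N+6)-th and (N+7)-th emit final output data. The tabulation below is merely an illustrative tally; the list layout and names are assumptions.

```python
# (processing, result, is_final_output) for one period of the first embodiment
PERIOD = [
    ("N",   "(S2[n+2])", False),
    ("N+1", "(D2[n+1])", False),
    ("N+2", "(X(2n+2))", False),
    ("N+3", "(X(2n+1))", False),
    ("N+4", "S2[n+2]",   False),
    ("N+5", "D2[n+1]",   False),
    ("N+6", "X(2n+2)",   True),
    ("N+7", "X(2n+1)",   True),
]

clock_cycles = len(PERIOD)  # each processing takes one clock cycle
outputs = sum(1 for _, _, final in PERIOD if final)
average_cycles_per_point = clock_cycles / outputs
```

The normalization steps do not appear in the tally because they overlap with the preceding processing.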
[0092]
Next, a line-based two-dimensional inverse DWT process using the wavelet transform device 1 will be described below.
[0093]
The subbands (band components) input to the horizontal filtering unit 4A are the subbands 23LL and 23HL, or the subbands 23LH and 23HH, as shown in FIG. Here, the subband 23LL consists of a horizontal low-frequency component (L) and a vertical low-frequency component (L), and the subband 23HL consists of a horizontal high-frequency component (H) and a vertical low-frequency component (L). The subband 23LH consists of a horizontal low-frequency component (L) and a vertical high-frequency component (H), and the subband 23HH consists of a horizontal high-frequency component (H) and a vertical high-frequency component (H).
[0094]
When the subbands (band components) input to the horizontal filtering unit 4A are the subbands 23LL and 23HL, or the subbands 23LH and 23HH, the input data ..., Y(n−1), Y(n), Y(n+1), ... shown in FIGS. 3 to 10 represent data in which the horizontal data of the subbands 23LL and 23HL are alternately arranged, or data in which the horizontal data of the subbands 23LH and 23HH are alternately arranged. Horizontal filtering of the input data composed of the subbands 23LL and 23HL performs synthesis in the horizontal direction, and the subband 23L is output. Likewise, horizontal filtering of the input data composed of the subbands 23LH and 23HH performs synthesis in the horizontal direction, and the subband 23H is output. The output data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 3 to 10 represent the horizontal data of the subband 23L or 23H.
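The alternating arrangement of two subband lines into one input sequence can be sketched as below. The helper `interleave`, the sample values, and the choice of placing the low-frequency subband on the even-numbered positions Y(2n) are assumptions made for illustration.

```python
def interleave(first, second):
    """Alternate samples: first[n] -> Y(2n), second[n] -> Y(2n+1)."""
    if len(first) != len(second):
        raise ValueError("both subband lines must have the same length")
    line = []
    for a, b in zip(first, second):
        line.append(a)
        line.append(b)
    return line


row_23ll = [1, 2, 3]      # hypothetical horizontal data of subband 23LL
row_23hl = [10, 20, 30]   # hypothetical horizontal data of subband 23HL
y = interleave(row_23ll, row_23hl)
```

The same helper would apply to a pair of lines taken from the subbands 23LH and 23HH.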
[0095]
Next, the subbands input to the vertical filtering unit 4B are the subband 23L and the subband 23H, as shown in FIG. In this case, the input data ..., Y(n−1), Y(n), Y(n+1), ... are data in which the data of the subbands 23L and 23H are alternately arranged. Vertical filtering of the input data composed of the subbands 23L and 23H performs synthesis in the vertical direction, and the image data 23 is output. The output data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 3 to 10 represent one line of the image data 23 in the vertical direction. The image data 23 is rectangular data having W horizontal pixels and H vertical pixels.
[0096]
The subbands 23LL, 23HL, 23LH, and 23HH are each rectangular data having H/2 vertical pixels and W/2 horizontal pixels. As schematically shown in FIG., data strings ..., Y_i(2n), Y_i(2n+1), Y_i(2n+2), ... arranged vertically, formed either from the set of the subband 23LL and the even-row, odd-column subband 23HL or from the set of the odd-row, even-column subband 23LH and the subband 23HH, are input to the horizontal filtering unit 4A. The subscript i of the input data Y_i(k) indicates the number of the pixel column to which Y_i(k) belongs. The pixel column number i takes the values i = 0, 1, ..., W−1 (W: the number of horizontal pixels). In the figure, the data are shown divided into two areas, an even-row storage area 24L holding the set of subbands 23LL and 23HL and an odd-row storage area 24H holding the set of subbands 23LH and 23HH, but the data arrangement in memory is not limited to this.
[0097]
Specifically, the first ring memory 3A and the horizontal filtering unit 4A repeatedly execute, pixel by pixel, each of the N-th processing (FIG. 3) to the (N+7)-th processing (FIG. 10), while alternately switching between the low-frequency side (storage area 24L side) and the high-frequency side (storage area 24H side).
[0098]
For example, the N-th processing (FIG. 3) is executed once for the first pixel row on the storage area 24L side, then the (N+1)-th processing (FIG. 4) is executed once, then the (N+2)-th processing (FIG. 5) is executed once, and so on. The same processing is then performed on the first pixel row on the storage area 24H side, then on the second pixel row on the storage area 24L side, then on the second pixel row on the storage area 24H side, then on the third pixel row on the storage area 24L side, then on the third pixel row on the storage area 24H side, and so on, until it is finally performed on the H/2-th pixel row on the storage area 24L side and then on the H/2-th pixel row on the storage area 24H side.
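The row order described in this example can be written as a short generator. This is only a sketch of the visiting order; the function name `horizontal_row_order` and the string labels are assumptions.

```python
def horizontal_row_order(height):
    """Yield (storage_area, row) pairs; rows are numbered from 1 as in the text."""
    for row in range(1, height // 2 + 1):
        yield "24L", row   # low-frequency side (subbands 23LL and 23HL)
        yield "24H", row   # high-frequency side (subbands 23LH and 23HH)


order = list(horizontal_row_order(6))  # hypothetical image height H = 6
```

For H = 6 this visits rows 1 to 3 of each storage area, always taking the 24L row before the 24H row of the same number.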
[0099]
Note that, as schematically shown in FIG. 13, the first ring memory 3A has a storage area 26 that holds five points (five pixels) of data corresponding to the input data ..., X_j(k), X_{j+1}(k), ..., the temporary data, and the intermediate data.
[0100]
As a result, the horizontal filtering unit 4A alternately and continuously outputs, in units of horizontal lines, the subband 23L (height H/2) obtained by synthesizing the subbands 23LL and 23HL and the subband 23H (height H/2) obtained by synthesizing the subbands 23LH and 23HH. The horizontal lines of the subband 23L are buffered in the L line buffer 5L of the line buffer circuit 5, and the horizontal lines of the subband 23H are buffered in the H line buffer 5H of the line buffer circuit 5.
[0101]
For example, as a result of the (N+6)-th processing (FIG. 9) being executed continuously for each of the first to W-th pixels, one line of data X_0(2n+2), X_2(2n+2), ..., X_j(2n+2), ..., X_{W−1}(2n+2), in which the (2n+2)-th horizontal components are synthesized, is output continuously and buffered in the L line buffer 5L. Next, as a result of the (N+7)-th processing (FIG. 10) being executed continuously for each of the first to W-th pixels, one line of data X_0(2n+1), X_2(2n+1), ..., X_j(2n+1), ..., X_{W−1}(2n+1), in which the (2n+1)-th horizontal components are synthesized, is output continuously and buffered in the H line buffer 5H.
[0102]
Under the control of the MMU 2, the line buffer circuit 5 supplies the components of one horizontal line in the L line buffer 5L and the components of one horizontal line in the H line buffer 5H alternately, line by line, to the second ring memory 3B. The data output to the second ring memory 3B is processed by the vertical filtering unit 4B.
[0103]
Specifically, the second ring memory 3B and the vertical filtering unit 4B repeatedly execute, in units of horizontal lines, each of the N-th processing (FIG. 3) to the (N+7)-th processing (FIG. 10). For example, the N-th processing (FIG. 3) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, until it is finally performed on the (W−1)-th pixel column. Next, the (N+1)-th processing (FIG. 4) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, until it is finally performed on the (W−1)-th pixel column. In this way, each processing is executed in turn for all the pixel columns. As schematically shown in FIG. 12, the second ring memory 3B has a storage area 24 that holds 5 × W points (five lines) of data corresponding to the input data string and the intermediate data.
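In contrast to the horizontal case, the vertical filtering described here applies one processing to every pixel column before moving to the next processing. A minimal Python sketch of that ordering follows; the function name `vertical_schedule` and the labels are assumptions.

```python
def vertical_schedule(width, processes):
    """Yield (process, column): every column is visited before the next process."""
    for process in processes:
        for column in range(width):
            yield process, column


order = list(vertical_schedule(3, ["N", "N+1"]))  # hypothetical width W = 3
```

With this ordering each processing consumes and produces whole horizontal lines, which matches a ring memory organized in lines rather than in single points.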
[0104]
As a result, the vertical filtering unit 4B outputs the image data 23 from the data line input in units of horizontal lines.
[0105]
By performing the above processing recursively, band components of a decomposition level of any order can be synthesized to restore the image data. That is, by recursively inputting the subbands LL(k−1), HL(k−1), LH(k−1), and HH(k−1) of the (k−1)-th decomposition level (k is an integer of 2 or more) to the wavelet transform device 1, the k-th subband LL(k) can be obtained.
[0106]
As described above, since the wavelet transform device 1 according to the present embodiment includes the horizontal filtering unit 4A and the vertical filtering unit 4B having the configuration shown in FIG. 2, the calculation cycle of the output data can be shortened. Therefore, it is possible to perform the line-based two-dimensional wavelet transform in a short time and at a high speed.
[0107]
<Second embodiment>
Next, a wavelet transform device and a wavelet transform method according to a second embodiment of the present invention will be described. FIG. 14 is a diagram illustrating a schematic configuration of a wavelet transform device 30 according to the second embodiment. The wavelet transform device 30 includes a buffer 34 for temporarily holding subband two-dimensional image data, an MMU (memory management unit) 31 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 32A, a horizontal filtering unit 33A, a second ring memory 32B, and a vertical filtering unit 33B. Here, the first ring memory 32A, the horizontal filtering unit 33A, the second ring memory 32B, and the vertical filtering unit 33B operate in synchronization with an externally supplied pixel clock signal PCLK.
[0108]
In the figure, the number of pixels or the number of lines of the first and second ring memories 32A and 32B is shown as 8 or 9; in the second embodiment, the first ring memory 32A is a nine-point ring memory and the second ring memory 32B is a nine-line ring memory.
[0109]
In the present embodiment, the MMU 31, the horizontal filtering unit 33A, and the vertical filtering unit 33B are configured by hardware, but may be configured by a computer program including a group of instructions executed by a microprocessor instead.
[0110]
The subband two-dimensional image data input to the wavelet transform device 30 is temporarily stored in the buffer 34. The wavelet transform device 30 has a function of applying a line-based two-dimensional inverse DWT once to the two-dimensional image data, and synthesizes the subbands 23LL, 23HL, 23LH, and 23HH of the (k+1)-th level to generate the k-th subband 23LL. The horizontal filtering unit 33A and the vertical filtering unit 33B are connected in series via the second ring memory 32B. The subband data is filtered in the horizontal direction by the horizontal filtering unit 33A and then filtered in the vertical direction by the vertical filtering unit 33B. When performing a two-dimensional inverse DWT that synthesizes subbands of the second or a higher decomposition level, the wavelet transform device 30 may be used repeatedly, two or more times.
[0111]
The MMU 31 has a function of controlling data input and output among the buffer 34, the first ring memory 32A, and the second ring memory 32B, and can transfer the subband data read from the buffer 34 to the first ring memory 32A and store it there. Specifically, data in which the horizontal data of the subbands 23LL and 23HL are alternately arranged in pixel units, and data in which the horizontal data of the subbands 23LH and 23HH are alternately arranged in pixel units, are stored in the first ring memory 32A. By filtering the data input from the first ring memory 32A in the horizontal direction, the horizontal filtering unit 33A can calculate data in which the high-frequency component and the low-frequency component are synthesized, one point per clock cycle of the pixel clock signal PCLK, and output it to the second ring memory 32B. More specifically, data obtained by synthesizing the subbands 23LL and 23HL and data obtained by synthesizing the subbands 23LH and 23HH are output alternately and stored in the second ring memory 32B.
[0112]
Next, the MMU 31 inputs data from the second ring memory 32B to the vertical filtering unit 33B. The vertical filtering unit 33B filters the input data in the vertical direction, and calculates and outputs data in which the high-frequency component and the low-frequency component are synthesized, one point per clock cycle of the pixel clock signal PCLK.
[0113]
The configuration of the horizontal filtering unit 33A and the configuration of the vertical filtering unit 33B are the same as each other. FIG. 15 shows a schematic configuration of the filtering unit 33 (the horizontal filtering unit 33A or the vertical filtering unit 33B). The ring memory 32 shown in FIG. 15 represents one of the first ring memory 32A and the second ring memory 32B shown in FIG.
[0114]
The filtering unit 33 includes a first data selector 35 that selectively takes in input data, a first coefficient multiplier 36, a delay register 40, a second data selector 41, a third data selector 42, adders 43, 48, 49, and 54, a second coefficient multiplier 44, a third coefficient multiplier 50, an output destination selection unit (DMUX) 55, and a control unit 56. Among these constituent elements, the set consisting of the two adders 43 and 48 and the second coefficient multiplier 44 constitutes a three-point operation unit that processes three points of data within one clock cycle. The set consisting of the two adders 49 and 54 and the third coefficient multiplier 50 similarly constitutes a three-point operation unit. The two three-point operation units and the output destination selection unit 55 constitute an intermediate data calculation unit.
[0115]
The control unit 56 operates in synchronization with the pixel clock signal PCLK. The first data selector 35 selectively outputs the data taken in from the ring memory 32 from any of the first terminal S0 to the seventh terminal S6 according to the value of the selection control signal SEL0 supplied from the control unit 56.
[0116]
The data output from the first terminal S0 of the first data selector 35 is input to the first coefficient multiplier 36. In the first coefficient multiplier 36, the coefficient register 37 outputs one of the normalization coefficients 1/κ and κ to the multiplier 38 according to the value of the control signal C0 supplied from the control unit 56, and the multiplier 38 executes, within one clock cycle, the normalization processing of multiplying the input data by the normalization coefficient. The data output from the first coefficient multiplier 36 is delayed by one clock cycle of the pixel clock signal PCLK in the delay register 40 and then input to the second data selector 41. Note that the first coefficient multiplier 36 and the delay register 40 constitute the normalizing means of the present invention.
[0117]
In the three-point operation unit, the adder 43 adds the two data points output from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the result to the second coefficient multiplier 44. In the second coefficient multiplier 44, the input data is multiplied by one of the lifting coefficients δ and α, which the coefficient register 45 selects according to the value of the control signal C1 supplied from the control unit 56; the sign of the product is inverted in the two's complement arithmetic circuit 47, and the result is output to the adder 48. The adder 48 then adds the data input from the third terminal S2 of the third data selector 42 and the data input from the second coefficient multiplier 44 and outputs the result to the output destination selection unit 55.
[0118]
The adder 49 adds the two points of data output from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the sum to the third coefficient multiplier 50. In the third coefficient multiplier 50, the input data is multiplied by one of the lifting coefficients β and γ, which the coefficient register 51 selects according to the value of the control signal C2 supplied from the control unit 56; the sign of the product is then inverted by the two's complement arithmetic circuit 53, and the result is output to the adder 54. The adder 54 adds the data input from the sixth terminal S5 of the third data selector 42 and the data input from the third coefficient multiplier 50, and outputs the result to the output destination selection unit 55.
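The data path of each three-point operation unit (adder, coefficient multiplier, sign inversion, adder) reduces to a single arithmetic expression. The following sketch spells it out; the function name `three_point` and its parameter names are hypothetical and chosen only for this illustration.

```python
def three_point(a, b, c, coeff):
    """Return c - coeff * (a + b), following the stages of one operation unit."""
    summed = a + b            # adder 43 (or 49): add the two neighbouring points
    product = coeff * summed  # multiplier inside coefficient multiplier 44 (or 50)
    negated = -product        # two's complement arithmetic circuit 47 (or 53)
    return c + negated        # adder 48 (or 54): add the centre point


result = three_point(1.0, 3.0, 10.0, 0.5)  # 10.0 - 0.5 * (1.0 + 3.0)
```

Here `a` and `b` play the role of terminals S0 and S1 (or S3 and S4), and `c` the role of terminal S2 (or S5).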
[0119]
The output destination selection unit 55 outputs the two points of data input in parallel from the adders 48 and 54 from any of the first terminal K0 to the third terminal K2, according to the value of the selection control signal SEL3 supplied from the control unit 56.
[0120]
The data output from the second terminal K1 of the output destination selection unit 55 is branched and also output to the external MMU 31, and the data output from the third terminal K2 is output to the external MMU 31. The MMU 31 can transfer the data output from the second terminal K1 and the third terminal K2 to the ring memory 32 and store it there.
[0121]
Next, a typical example of the lifting operation using the filtering unit 33 will be described below with reference to FIGS. 16 to 19. FIGS. 16 to 19 are lattice diagrams schematically showing a lifting configuration of a 9 × 7 tap Daubechies filter. The calculation of this lattice diagram is performed in the same manner as in the case of FIG. For convenience of explanation, FIGS. 16 to 19 do not show the lifting coefficients −α, −β, −γ, −δ and the normalization coefficients κ, 1/κ corresponding to the line segments connecting the lattice points.
[0122]
FIGS. 16 to 19 schematically show the N-th (N is an integer) to (N+3)-th processing in the present embodiment. In the N-th processing (FIG. 16), the two conversion processes of the target areas C1 and C2 are executed simultaneously in parallel within one clock cycle. In the target area C2, a three-point operation is performed in which the sum of the two points of output data X(2n) and X(2n+2) is multiplied by the lifting coefficient −α and the product is added to the intermediate data D^2_n. As a result, the output data X(2n+1) on the series starting from the odd-numbered input data Y(2n+1) is calculated. Here, the two points of output data X(2n) and X(2n+2) are data on the two series one point before and after the series of the intermediate data D^2_n. In the target area C1, a three-point operation is performed in which the sum of the two points of intermediate data S^2_{n+2} and S^2_{n+3} is multiplied by the lifting coefficient −γ and the product is added to the intermediate data D^1_{n+2}. As a result, the second-stage intermediate data D^2_{n+2} on the series starting from the odd-numbered input data Y(2n+5) is calculated. Here, the two points of intermediate data S^2_{n+2} and S^2_{n+3} are data on the two series one point before and after the series of the intermediate data D^1_{n+2}.
[0123]
Further, in the cycle one clock before the operations in the target areas C1 and C2, a normalization process of multiplying the input data Y(2n+8) by the normalization coefficient κ is executed in the target area N1, and the first-stage intermediate data S^1_{n+4} on the series of the input data Y(2n+8) is calculated.
[0124]
The specific contents of the N-th processing are as follows. The ring memory 32 shown in FIG. 15 has a 9-line (series) storage area for storing input data, intermediate data, and temporary data, and is structured so that new data sequentially overwrites the storage areas holding old data that has already been referenced.
[0125]
The MMU 31 outputs the input data Y(2n+8) temporarily stored in the ring memory 32 to the first data selector 35. The control unit 56 supplies the selection control signal SEL0 to the first data selector 35 so that the input data Y(2n+8) is output to the first coefficient multiplier 36. The first coefficient multiplier 36 selects κ from the two normalization coefficients κ and 1/κ according to the control signal C0 supplied from the control unit 56 and supplies it to the multiplier 38, and the multiplier 38 outputs the product of the input data and the normalization coefficient κ (= κ × Y(2n+8) = S^1_{n+4}) to the delay register 40. The coefficient multiplication process in the first coefficient multiplier 36 is executed within one clock cycle.
[0126]
One clock cycle after the coefficient multiplication process, the intermediate data S^1_{n+4} stored in the delay register 40 is output to the second data selector 41. In accordance with the selection control signal SEL1 supplied from the control unit 56, the second data selector 41 outputs the intermediate data S^1_{n+4} from the second terminal S1 to the MMU 31, and the MMU 31 transfers the intermediate data S^1_{n+4} to the ring memory 32 to overwrite the input data Y(2n+8) in the referenced storage area. In the same clock cycle in which the intermediate data S^1_{n+4} is output to the MMU 31, the MMU 31 outputs the six points of data X(2n), D^2_n, X(2n+2), S^2_{n+2}, D^1_{n+2}, S^2_{n+3} temporarily stored in the ring memory 32 to the first data selector 35. The first data selector 35 outputs these six points of data from the second terminal S1 to the seventh terminal S6 according to the value of the selection control signal SEL0 supplied from the control unit 56. This output is then input to the third data selector 42. According to the value of the selection control signal SEL2 supplied from the control unit 56, the third data selector 42 selects, from the input data, the data X(2n), X(2n+2), D^2_n in the target area C2 and outputs them from the first terminal S0 to the third terminal S2, respectively, and selects the data S^2_{n+2}, S^2_{n+3}, D^1_{n+2} and outputs them from the fourth terminal S3 to the sixth terminal S5, respectively.
[0127]
The upper adder 43 adds the two points of data X(2n) and X(2n+2) input from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the sum to the second coefficient multiplier 44. In the second coefficient multiplier 44, the coefficient register 45 selects α from the two lifting coefficients α and δ according to the control signal C1 supplied from the control unit 56 and supplies it to the multiplier 46. The multiplier 46 outputs the product of the input data and the lifting coefficient α (= α × (X(2n) + X(2n+2))) to the two's complement arithmetic circuit 47, which inverts its sign and outputs the result to the adder 48. The adder 48 then adds the value input from the second coefficient multiplier 44 and the intermediate data D^2_n input from the third terminal S2 of the third data selector 42 to calculate the output data X(2n+1) in the target area C2, and outputs it to the output destination selection unit 55. The calculation of the output data X(2n+1) is executed within one clock cycle.
[0128]
On the other hand, the lower adder 49 adds the two points of intermediate data S^2_{n+2} and S^2_{n+3} input from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the sum to the third coefficient multiplier 50. In the third coefficient multiplier 50, the coefficient register 51 selects γ from the two lifting coefficients β and γ according to the control signal C2 supplied from the control unit 56 and supplies it to the multiplier 52. The multiplier 52 outputs the product of the input data and the lifting coefficient γ (= γ × (S^2_{n+2} + S^2_{n+3})) to the two's complement arithmetic circuit 53, which inverts its sign and outputs the result to the adder 54. The adder 54 then adds the value input from the third coefficient multiplier 50 and the intermediate data D^1_{n+2} input from the sixth terminal S5 of the third data selector 42 to calculate the intermediate data D^2_{n+2} in the target area C1, and outputs it to the output destination selection unit 55. The calculation of this intermediate data D^2_{n+2} is executed within one clock cycle.
[0129]
According to the value of the selection control signal SEL3 supplied from the control unit 56, the output destination selection unit 55 outputs the output data X(2n+1) input from the adder 48 from the first terminal K0, and outputs the intermediate data D^2_{n+2} input from the other adder 54 from the third terminal K2 to the external MMU 31. The MMU 31 transfers the intermediate data D^2_{n+2} to the ring memory 32 and overwrites the intermediate data D^1_{n+2} in the referenced storage area.
[0130]
Next, the conversion processes of the target areas C3 and C4 in the (N+1)-th processing (FIG. 17) are performed. In the target area C3, a three-point operation is performed in which the sum of the two points of intermediate data D^1_{n+3} and D^1_{n+4} is multiplied by the lifting coefficient −δ and the product is added to the intermediate data S^1_{n+4}. As a result, the second-stage intermediate data S^2_{n+4} on the series starting from the even-numbered input data Y(2n+8) is calculated. Here, the two points of intermediate data D^1_{n+3} and D^1_{n+4} are data on the two series one point before and after the series of the intermediate data S^1_{n+4}. In the target area C4, a three-point operation is performed in which the sum of the two points of intermediate data D^2_{n+1} and D^2_{n+2} is multiplied by the lifting coefficient −β and the product is added to the intermediate data S^2_{n+2}. As a result, the output data X(2n+4) on the series starting from the even-numbered input data Y(2n+4) is calculated. Here, the two points of intermediate data D^2_{n+1} and D^2_{n+2} are data on the two series one point before and after the series of the intermediate data S^2_{n+2}.
[0131]
Further, the processing in the target area N2 is executed in the cycle one clock before the arithmetic processing in the target areas C3 and C4. In the target area N2, a normalization process of multiplying the input data Y(2n+9) by the normalization coefficient 1/κ is executed, and the intermediate data D^1_{n+4} is output.
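Collected in one place, the operations described for the target areas N1, N2 and C1 to C4 of the N-th and (N+1)-th processing can be written as the following relations. They are restated here from the prose in LaTeX form as a reading aid; the labels in parentheses are the target area names used in the text.

```latex
\begin{aligned}
S^1_{n+4} &= \kappa \, Y(2n+8) && \text{(N1)}\\
D^1_{n+4} &= \tfrac{1}{\kappa} \, Y(2n+9) && \text{(N2)}\\
S^2_{n+4} &= S^1_{n+4} - \delta \,\bigl(D^1_{n+3} + D^1_{n+4}\bigr) && \text{(C3)}\\
D^2_{n+2} &= D^1_{n+2} - \gamma \,\bigl(S^2_{n+2} + S^2_{n+3}\bigr) && \text{(C1)}\\
X(2n+4) &= S^2_{n+2} - \beta \,\bigl(D^2_{n+1} + D^2_{n+2}\bigr) && \text{(C4)}\\
X(2n+1) &= D^2_{n} - \alpha \,\bigl(X(2n) + X(2n+2)\bigr) && \text{(C2)}
\end{aligned}
```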
[0132]
The specific contents of the (N+1)-th processing are as follows. First, the processing of the target area N2, which is executed in the cycle one clock earlier, will be described. The MMU 31 outputs the input data Y(2n+9) temporarily stored in the ring memory 32 to the first data selector 35. The control unit 56 supplies the selection control signal SEL0 to the first data selector 35 so that the input data Y(2n+9) is output to the first coefficient multiplier 36. The first coefficient multiplier 36 selects 1/κ from the two normalization coefficients κ and 1/κ according to the control signal C0 supplied from the control unit 56 and supplies it to the multiplier 38, and the multiplier 38 outputs the product of the input data and the normalization coefficient 1/κ (= 1/κ × Y(2n+9) = D^1_{n+4}) to the delay register 40. The coefficient multiplication process in the first coefficient multiplier 36 is executed within one clock cycle.
[0133]
One clock cycle after the coefficient multiplication process, the intermediate data D^1_{n+4} stored in the delay register 40 is output to the second data selector 41. In accordance with the selection control signal SEL1 supplied from the control unit 56, the second data selector 41 outputs the intermediate data D^1_{n+4} from the first terminal S0 to the third data selector 42, and also outputs it from the second terminal S1 to the MMU 31, which transfers the intermediate data D^1_{n+4} to the ring memory 32 to overwrite the input data Y(2n+9) in the referenced storage area. In the same clock cycle in which the intermediate data D^1_{n+4} is output to the third data selector 42, the MMU 31 outputs the five points of data D^2_{n+1}, S^2_{n+2}, D^2_{n+2}, D^1_{n+3}, S^1_{n+4} temporarily stored in the ring memory 32 to the first data selector 35. The first data selector 35 outputs these five points of data from the second terminal S1 to the sixth terminal S5 according to the value of the selection control signal SEL0 supplied from the control unit 56. This output is then input to the third data selector 42. From the five points of data, the third data selector 42 selects the three points of data D^2_{n+1}, D^2_{n+2}, S^2_{n+2} in the target area C4 and outputs them from the fourth terminal S3 to the sixth terminal S5. It also selects the two points of data in the target area C3 from the five points together with the data input from the second data selector 41, and outputs the resulting data D^1_{n+3}, D^1_{n+4}, S^1_{n+4} from the first terminal S0 to the third terminal S2.
[0134]
The upper adder 43 adds the two points of data D^1_{n+3} and D^1_{n+4} input from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the sum to the second coefficient multiplier 44. In the second coefficient multiplier 44, the coefficient register 45 selects δ from the two lifting coefficients α and δ according to the control signal C1 supplied from the control unit 56 and supplies it to the multiplier 46. The multiplier 46 outputs the product of the input data and the lifting coefficient δ (= δ × (D^1_{n+3} + D^1_{n+4})) to the two's complement arithmetic circuit 47, which inverts its sign and outputs the result to the adder 48. The adder 48 then adds the value input from the second coefficient multiplier 44 and the intermediate data S^1_{n+4} input from the third terminal S2 of the third data selector 42 to calculate the intermediate data S^2_{n+4} in the target area C3, and outputs it to the output destination selection unit 55. The calculation of this intermediate data S^2_{n+4} is executed within one clock cycle.
[0135]
On the other hand, the lower adder 49 adds the two points of intermediate data D^2_{n+1} and D^2_{n+2} input from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the sum to the third coefficient multiplier 50. In the third coefficient multiplier 50, the coefficient register 51 selects β from the two lifting coefficients β and γ according to the control signal C2 supplied from the control unit 56 and supplies it to the multiplier 52. The multiplier 52 outputs the product of the input data and the lifting coefficient β (= β × (D^2_{n+1} + D^2_{n+2})) to the two's complement arithmetic circuit 53, which inverts its sign and outputs the result to the adder 54. The adder 54 then adds the value input from the third coefficient multiplier 50 and the intermediate data S^2_{n+2} input from the sixth terminal S5 of the third data selector 42 to calculate the output data X(2n+4) in the target area C4, and outputs it to the output destination selection unit 55. The calculation of the output data X(2n+4) is executed within one clock cycle.
[0136]
According to the value of the selection control signal SEL3 supplied from the control unit 56, the output destination selection unit 55 outputs the output data X(2n+4) input from the adder 54 from the second terminal K1, and outputs the intermediate data S^2_{n+4} input from the other adder 48 from the third terminal K2 to the external MMU 31. The MMU 31 transfers the intermediate data S^2_{n+4} to the ring memory 32 and overwrites the intermediate data S^1_{n+4} in the referenced storage area. The output data X(2n+4) output from the second terminal K1 is branched and also output to the external MMU 31. The MMU 31 transfers the output data X(2n+4) to the ring memory 32 and overwrites the intermediate data S^2_{n+2} in the referenced storage area.
[0137]
Next, the conversion processes of the target areas C5 and C6 in the (N+2)-th processing (FIG. 18) are executed. The normalization process of the target area N3 is executed in the cycle one clock before the arithmetic processing in the target areas C5 and C6. Here, the target areas C5, C6, and N3 are the areas obtained by moving the target areas C1, C2, and N1 of the N-th processing (FIG. 16) backward by two series (two points), and processing similar to that in the target areas C1, C2, and N1 is performed in them. Accordingly, in the target area N3, a normalization process of multiplying the even-numbered input data Y(2n+10) by the normalization coefficient κ is executed, and the intermediate data S^1_{n+5} is calculated. In the target area C5, a three-point operation is performed in which the sum of the two points of intermediate data S^2_{n+3} and S^2_{n+4} is multiplied by the lifting coefficient −γ and the product is added to the intermediate data D^1_{n+3}. As a result, the second-stage intermediate data D^2_{n+3} on the series starting from the odd-numbered input data Y(2n+7) is calculated. In the target area C6, a three-point operation is performed in which the sum of the two points of output data X(2n+2) and X(2n+4) is multiplied by the lifting coefficient −α and the product is added to the intermediate data D^2_{n+1}. As a result, the output data X(2n+3) on the series starting from the odd-numbered input data Y(2n+3) is calculated.
[0138]
Next, the conversion processes of the target areas C7 and C8 in the (N+3)-th processing (FIG. 19) are executed. The normalization process of the target area N4 is executed in the cycle one clock before the arithmetic processing in the target areas C7 and C8. Here, the target areas C7, C8, and N4 are the areas obtained by moving the target areas C3, C4, and N2 of the (N+1)-th processing (FIG. 17) backward by two series (two points), and processing similar to that in the target areas C3, C4, and N2 is performed in them. Accordingly, in the target area N4, a normalization process of multiplying the input data Y(2n+11) by the normalization coefficient 1/κ is executed, and the intermediate data D^1_{n+5} is calculated. In the target area C7, a three-point operation is performed in which the sum of the two odd-numbered points of intermediate data D^1_{n+4} and D^1_{n+5} is multiplied by the lifting coefficient −δ and the product is added to the even-numbered intermediate data S^1_{n+5}. As a result, the second-stage intermediate data S^2_{n+5} on the series starting from the even-numbered input data Y(2n+10) is calculated. In the target area C8, a three-point operation is performed in which the sum of the two points of intermediate data D^2_{n+2} and D^2_{n+3} is multiplied by the lifting coefficient −β and the product is added to the intermediate data S^2_{n+3}. As a result, the output data X(2n+6) on the series starting from the even-numbered input data Y(2n+6) is calculated.
[0139]
As described above, processes similar to the N-th processing (FIG. 16) and the (N+1)-th processing (FIG. 17) are executed repeatedly while moving the target areas until the output data of all points has been calculated. As a result, the average number of cycles required to calculate one point of even-numbered or odd-numbered output data is one clock cycle, and the calculation cycle of the output data can be greatly reduced.
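One way to convince oneself that the interleaved order of the N-th and (N+1)-th processing gives the same numbers as finishing each lifting stage over the whole line first is to run both orders on the same data. The sketch below does this in software. Several things in it are assumptions not stated in this part of the text: the numeric constants (the commonly cited JPEG 2000 9/7 values), the mirror rule used at the ends of the line, the single in-place list standing in for the ring memory, and all function names. The normalization is also folded into the same loop pass rather than modelled one clock earlier.

```python
# Assumed constants (commonly cited JPEG 2000 9/7 values); the text gives none.
ALPHA, BETA = -1.586134342, -0.052980118
GAMMA, DELTA = 0.882911075, 0.443506852
KAPPA = 1.230174105


def mirror(i, length):
    """Reflect an out-of-range index back into the line (assumed boundary rule)."""
    if i < 0:
        return -i
    if i >= length:
        return 2 * (length - 1) - i
    return i


def scale(v, i, coeff):
    """Normalization in place; silently skipped when i lies outside the line."""
    if 0 <= i < len(v):
        v[i] = coeff * v[i]


def lift(v, i, coeff):
    """Three-point operation in place: v[i] <- v[i] - coeff * (left + right)."""
    n = len(v)
    if 0 <= i < n:
        v[i] = v[i] - coeff * (v[mirror(i - 1, n)] + v[mirror(i + 1, n)])


def inverse_stagewise(y):
    """Reference order: complete each stage over the whole line in turn."""
    v = list(y)
    n = len(v)
    for i in range(0, n, 2):
        scale(v, i, KAPPA)        # S1
    for i in range(1, n, 2):
        scale(v, i, 1.0 / KAPPA)  # D1
    for i in range(0, n, 2):
        lift(v, i, DELTA)         # S2
    for i in range(1, n, 2):
        lift(v, i, GAMMA)         # D2
    for i in range(0, n, 2):
        lift(v, i, BETA)          # X(even)
    for i in range(1, n, 2):
        lift(v, i, ALPHA)         # X(odd)
    return v


def inverse_interleaved(y):
    """Same arithmetic, issued in the order of the N-th / (N+1)-th processing."""
    v = list(y)
    for n in range(-4, len(v) // 2):
        # N-th type pass: N1, then C1 and C2 (independent of each other).
        scale(v, 2 * n + 8, KAPPA)        # N1
        lift(v, 2 * n + 5, GAMMA)         # C1
        lift(v, 2 * n + 1, ALPHA)         # C2
        # (N+1)-th type pass: N2, then C3 and C4 (independent of each other).
        scale(v, 2 * n + 9, 1.0 / KAPPA)  # N2
        lift(v, 2 * n + 8, DELTA)         # C3
        lift(v, 2 * n + 4, BETA)          # C4
    return v


sample = [float((7 * k) % 11) for k in range(16)]
reference = inverse_stagewise(sample)
streamed = inverse_interleaved(sample)
```

Each pass through the loop in `inverse_interleaved` stands for two clock cycles and completes two output samples, X(2n+1) and X(2n+4), which is where the average of one clock cycle per output point comes from once the pipeline is full.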
[0140]
Next, a line-based two-dimensional inverse DWT process using the wavelet transform device 30 will be described below.
[0141]
The subbands (band components) input to the horizontal filtering unit 33A are the subbands 23LL and 23HL, or the subbands 23LH and 23HH, as shown in FIG.
[0142]
In this case, the input data ..., Y(n−1), Y(n), Y(n+1), ... shown in FIGS. 16 to 19 is data in which the horizontal data of the subbands 23LL and 23HL are alternately arranged, or data in which the horizontal data of the subbands 23LH and 23HH are alternately arranged. The subband 23L is output by subjecting the input data composed of the subbands 23LL and 23HL to horizontal filtering, and the subband 23H is output by subjecting the input data composed of the subbands 23LH and 23HH to horizontal filtering. The data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 16 to 19 represents a data string of one horizontal line of the subband 23L or 23H.
[0143]
Next, the subbands input to the vertical filtering unit 33B are the subband 23L and the subband 23H, as shown in FIG. In this case, the input data ..., Y(n−1), Y(n), Y(n+1), ... shown in FIGS. 16 to 19 is data in which the vertical data of the subbands 23L and 23H are alternately arranged. The image data 23 is output by subjecting the input data composed of the subbands 23L and 23H to vertical filtering. The data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 16 to 19 represents a data string of one vertical line of the image data 23. The image data 23 is rectangular data having W horizontal pixels and H vertical pixels.
[0144]
The subbands 23LL, 23HL, 23LH, and 23HH are each rectangular data having H/2 vertical pixels and W/2 horizontal pixels. As schematically shown in FIG., data strings Y_i(2n), Y_i(2n+1), Y_i(2n+2), ... arranged vertically as a set of the subband 23LL and the subband 23HL of the even-numbered rows and odd-numbered columns, or as a set of the subband 23LH of the odd-numbered rows and even-numbered columns and the subband 23HH of the odd-numbered rows and odd-numbered columns, are input to the horizontal filtering unit 33. In other words, each pixel row (horizontal data string in the figure) in the storage area 58L is a data string in which the pixels on each horizontal line of the subbands 23LL and 23HL are alternately arranged, and each pixel row (horizontal data string in the figure) in the storage area 58H is a data string in which the pixels on each horizontal line of the subbands 23LH and 23HH are alternately arranged. The subscript i of the input data Y_i(k) indicates the number of the pixel column to which the input data Y_i(k) belongs. The pixel column number i is i = 0, 1, ..., W−1 (W: number of horizontal pixels). In the figure, the storage area 58L for the even-numbered rows, which holds the set of subbands 23LL and 23HL, and the storage area 58H for the odd-numbered rows, which holds the set of subbands 23LH and 23HH, are shown as two separate areas, but the data arrangement in memory is not limited to this.
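The alternating arrangement of two subband lines into one input row can be sketched as follows. The helper name `interleave` and the placeholder sample labels are hypothetical; the choice of putting the low-band sample at the even position follows the description that even-numbered input data are the ones multiplied by κ.

```python
def interleave(low_row, high_row):
    """Build one input row: low[0], high[0], low[1], high[1], ..."""
    if len(low_row) != len(high_row):
        raise ValueError("both subband rows must have the same length")
    row = []
    for low, high in zip(low_row, high_row):
        row.extend((low, high))  # low band at even index, high band at odd index
    return row


# One row of storage area 58L built from one line each of 23LL and 23HL.
row_58l = interleave(["LL0", "LL1", "LL2"], ["HL0", "HL1", "HL2"])
```

A row of storage area 58H would be built the same way from one line each of the subbands 23LH and 23HH.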
[0145]
Specifically, the first ring memory 32A and the horizontal filtering unit 33A repeat each processing, including the N-th processing (FIG. 16) to the (N+2)-th processing (FIG. 17), for each pixel row while alternately switching between the low frequency side (storage area 58L side) and the high frequency side (storage area 58H side).
[0146]
For example, for the first pixel row on the storage area 58L side, the N-th processing (FIG. 16) is performed once, then the (N+1)-th processing (FIG. 17) is performed once, then the (N+2)-th processing (FIG. 18) is performed once, and so on. The same processing is then performed on the first pixel row on the storage area 58H side, then on the second pixel row on the storage area 58L side, then on the second pixel row on the storage area 58H side, then on the third pixel row on the storage area 58L side, then on the third pixel row on the storage area 58H side, and so on, until it is finally performed on the H/2-th pixel row on the storage area 58L side and then on the H/2-th pixel row on the storage area 58H side.
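The visiting order of pixel rows described here is simple enough to state as a generator. This is a sketch only; the function name `horizontal_row_order` and the use of the strings "58L" and "58H" as labels are choices made for this illustration.

```python
def horizontal_row_order(height):
    """Yield (storage_area, row_number) in the order the rows are filtered.

    height -- number of vertical pixels H of the image; each storage area
              holds H/2 pixel rows, numbered from 1 as in the text.
    """
    for row in range(1, height // 2 + 1):
        yield ("58L", row)  # row interleaving subbands 23LL and 23HL
        yield ("58H", row)  # row interleaving subbands 23LH and 23HH


order = list(horizontal_row_order(6))
```

For H = 6 this gives rows 1 to 3, each visited first on the 58L side and then on the 58H side.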
[0147]
Note that, as schematically shown in FIG. 21, the first ring memory 32A has a storage area 59 for holding nine points (9 pixels) of data corresponding to the input data X_j(k), X_{j+1}(k), ..., the temporary data, and the intermediate data.
[0148]
As a result, the horizontal filtering unit 33A alternately and continuously outputs the horizontal lines (H/2 in height) of the subband 23L, in which the subbands 23LL and 23HL are synthesized, and the horizontal lines (H/2 in height) of the subband 23H, in which the subbands 23LH and 23HH are synthesized.
[0149]
Then, data in which the horizontal lines of the sub-band 23L and the horizontal lines of the sub-band 23H are alternately arranged is output as it is to the second ring memory 32B as vertical line data, and is processed by the vertical filtering unit 33B.
[0150]
Specifically, the second ring memory 32B and the vertical filtering unit 33B repeatedly execute, in units of horizontal lines, the processing for each pixel row including the N-th processing (FIG. 16) to the (N+1)-th processing (FIG. 17). For example, the N-th processing (FIG. 16) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, until it is finally performed on the (W−1)-th pixel column. Next, the (N+1)-th processing (FIG. 17) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, until it is finally performed on the (W−1)-th pixel column. In this way, each processing is executed sequentially for all the pixel columns. As schematically shown in FIG. 20, the second ring memory 32B has a storage area 58 for holding 9 × W points (9 lines) of data corresponding to the input data strings and the intermediate data.
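Because the vertical filtering unit works through all W pixel columns for one processing step before moving to the next step, a single step can be viewed as a three-point operation applied column by column to three stored lines. A minimal sketch, in which `lift_line` and its parameter names are hypothetical:

```python
def lift_line(above, centre, below, coeff):
    """Apply centre - coeff * (above + below) independently in every pixel column.

    above, centre, below -- three lines of equal width W held in the line memory
    coeff                -- the lifting coefficient for this step
    """
    if not (len(above) == len(centre) == len(below)):
        raise ValueError("all three lines must have the same width")
    return [c - coeff * (a + b) for a, c, b in zip(above, centre, below)]


new_line = lift_line([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [3.0, 2.0, 1.0], 0.5)
```

In the arrangement described in the text, the returned line would overwrite the stored `centre` line, since that line is no longer needed in its old form.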
[0151]
As a result, the vertical filtering unit 33B outputs the image data 23 from the data line input in units of horizontal lines.
[0152]
By performing the above processing recursively, it is possible to combine band components of decomposition levels of any order and restore the image data. That is, by recursively inputting the subbands LL(k+1), HL(k+1), LH(k+1), and HH(k+1) at the (k+1)-th order (k is an integer) decomposition level to the wavelet transform device 30, the subband LL(k) at the k-th order decomposition level can be obtained.
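The recursion over decomposition levels amounts to a loop that feeds each synthesized LL band back in together with the next set of detail subbands. The sketch below shows only that control flow. The name `reconstruct`, the argument layout, and the `synthesize` callback (a stand-in for one pass through the wavelet transform device) are all assumptions, and the arithmetic in the demonstration is a placeholder rather than a wavelet synthesis.

```python
def reconstruct(ll_coarsest, detail_levels, synthesize):
    """Combine subbands level by level, deepest decomposition level first.

    ll_coarsest   -- LL subband at the deepest decomposition level
    detail_levels -- list of (HL, LH, HH) tuples, deepest level first
    synthesize    -- callable(LL, HL, LH, HH) returning the LL subband of the
                     next shallower level (hypothetical stand-in for the device)
    """
    ll = ll_coarsest
    for hl, lh, hh in detail_levels:
        ll = synthesize(ll, hl, lh, hh)  # LL(k+1), HL(k+1), LH(k+1), HH(k+1) -> LL(k)
    return ll


# Placeholder synthesis that just adds numbers, to show the order of calls.
demo = reconstruct(1, [(2, 3, 4), (5, 6, 7)], lambda ll, hl, lh, hh: ll + hl + lh + hh)
```

With real subband arrays, `synthesize` would perform the horizontal and vertical filtering described above, and the final return value would be the restored image data.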
[0153]
As described above, since the wavelet transform device 30 according to the present embodiment includes the horizontal filtering unit 33A and the vertical filtering unit 33B having the configuration illustrated in FIG. 15, the calculation cycle of the output data can be shortened. The line-based two-dimensional wavelet transform can therefore be performed at high speed in a short time.
[0154]
In the second embodiment, the buffer for storing the output of the horizontal filtering unit, which is required in the first embodiment, is unnecessary. In the first embodiment, the horizontal filtering unit 4A outputs one pixel in four clocks, and the vertical filtering unit 4B inputs one pixel in four clocks. However, the horizontal filtering unit 4A outputs vertical lines consecutively in the (N+6)-th processing (FIG. 9) and the (N+7)-th processing (FIG. 10), whereas the vertical filtering unit 4B, after inputting a vertical line in the N-th processing (FIG. 3), does not input another vertical line until the (N+4)-th processing (FIG. 7). This is why a buffer was required. In the second embodiment, by contrast, the horizontal filtering unit 33A outputs a vertical line in each processing and the vertical filtering unit 33B inputs a vertical line in each processing, so no buffer is required.
[0155]
<Third embodiment>
Next, a wavelet transform device and a wavelet transform method according to a third embodiment of the present invention will be described. The wavelet transform device according to the present embodiment has the same configuration as the wavelet transform device 30 (FIG. 14) according to the second embodiment, except for the horizontal filtering unit and the vertical filtering unit. In addition, whereas the first and second ring memories 32A and 32B of the second embodiment are ring memories of nine points and nine lines, respectively, the first and second ring memories 32A and 32B of the present embodiment are ring memories of eight points and eight lines, respectively.
[0156]
FIG. 22 is a diagram illustrating a schematic configuration of a filtering unit 33s according to the third embodiment. The filtering unit 33s indicates a horizontal filtering unit or a vertical filtering unit, and the ring memory 32s indicates one of the first ring memory 32A and the second ring memory 32B shown in FIG.
[0157]
The filtering unit 33s includes first and second data selectors 60 and 65 that selectively take in input data from the ring memory 32s, a delay register 64, first to fifth coefficient multipliers 61, 66, 71, 76, 81, adders 70, 75, 80, 85, an output destination selection unit (DMUX) 86, and a control unit 87. Among these components, the pair of the second coefficient multiplier 66 and the adder 70 constitutes a two-point operation unit that processes two points of data by the method of step a or step b (FIG. 38). Likewise, the pair of the third coefficient multiplier 71 and the adder 75, the pair of the fourth coefficient multiplier 76 and the adder 80, and the pair of the fifth coefficient multiplier 81 and the adder 85 each constitute a two-point operation unit. These two-point operation units and the output destination selection unit 86 constitute an intermediate data calculation unit.
[0158]
The control unit 87 operates in synchronization with the pixel clock signal PCLK. The first data selector 60 selectively outputs the data fetched from the ring memory 32s from any of the first terminal S0 to the eighth terminal S7, according to the value of the selection control signal SEL0 supplied from the control unit 87.
[0159]
The data output from the first terminal S0 of the first data selector 60 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, one of the normalization coefficients κ and 1/κ is output to the multiplier 63 in accordance with the value of the control signal C0 supplied from the control unit 87, and the multiplier 63 multiplies the input data by that normalization coefficient. The output data from the multiplier 63 is input to the delay register 64. The normalization process in the first coefficient multiplier 61 is executed within one clock cycle. Note that the first coefficient multiplier 61 and the delay register 64 constitute a normalizing means. The output of the delay register 64 is input to the second data selector 65, and is also branched and input to the MMU 31.
[0160]
The second data selector 65 selectively outputs the data fetched from the delay register 64 and the first data selector 60 from any of the first terminal S0 to the eighth terminal S7 according to the value of the selection control signal SEL1 supplied from the control unit 87. The second to fifth coefficient multipliers 66, 71, 76, and 81 are circuits that multiply input data by the lifting coefficients −α, −β, −γ, and −δ according to the control signals C1 to C4, respectively. The coefficient registers 67, 72, 77, and 82 receive the control signals C1 to C4 and output the lifting coefficients α, β, γ, and δ to the multipliers 68, 73, 78, and 83, respectively. The multipliers 68, 73, 78, and 83 multiply the data input from the output terminals S0, S2, S4, and S6 of the second data selector 65 by the lifting coefficients α, β, γ, and δ, respectively, and output the results. The two's complement arithmetic circuits 69, 74, 79, and 84 invert the signs of the output data from the multipliers 68, 73, 78, and 83, respectively. The adders 70, 75, 80, and 85 add the data input from the second to fifth coefficient multipliers 66, 71, 76, and 81 to the data input from the output terminals S1, S3, S5, and S7 of the second data selector 65, respectively, and output the results to the output destination selection unit 86.
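The behavior of one two-point operation unit described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit itself: the function name `two_point_unit`, its parameter names, and the sample values are hypothetical, and floating-point numbers stand in for whatever fixed-point word format the hardware uses.

```python
def two_point_unit(mult_in: float, add_in: float, coeff: float) -> float:
    """Model one coefficient multiplier and adder pair (e.g. 66 and 70)."""
    product = coeff * mult_in   # multiplier 68, 73, 78 or 83
    negated = -product          # two's complement circuit 69, 74, 79 or 84
    return add_in + negated     # adder 70, 75, 80 or 85


# Hypothetical values chosen only so the arithmetic is easy to follow.
result = two_point_unit(mult_in=4.0, add_in=10.0, coeff=0.5)
print(result)  # 10.0 - 0.5 * 4.0
```

The data on the even-numbered selector terminal (S0, S2, S4, S6) plays the role of `mult_in`, and the data on the following odd-numbered terminal (S1, S3, S5, S7) plays the role of `add_in`.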
[0161]
The output destination selection unit 86 outputs the four points of data input in parallel from the adders 70, 75, 80, and 85 from any of the first terminal K0 to the fifth terminal K4 in accordance with the value of the selection control signal SEL2 supplied from the control unit 87. Data output from the first terminal K0 and the second terminal K1 is output to the outside as synthesized data. The data branched from the second terminal K1 and the data output from the third terminal K2 to the fifth terminal K4 are input to the MMU 31. The MMU 31 can transfer the data output from the second terminal K1 to the fifth terminal K4 to the ring memory 32s and store the data there.
[0162]
Next, a representative example of the lifting operation using the filtering unit 33s illustrated in FIG. 22 will be described below with reference to lattice diagrams. The calculations in these lattice diagrams are performed in the same manner as in the lattice diagram described earlier. In FIGS. 23 to 25, the lifting coefficients −α, −β, −γ, −δ and the normalization coefficients κ, 1/κ corresponding to the line segments connecting the respective lattice points are not shown, for convenience of description.
[0163]
FIG. 23 is a lattice diagram at the time when the N-th processing (N: integer) is completed, and FIGS. 24 and 25 schematically show the (N+1)-th processing and the (N+2)-th processing, respectively. In the N-th processing (FIG. 23), the four conversion processes of the target areas A1, A2, B1, and B2 are executed simultaneously in parallel within one clock cycle. In the target area A1, a two-point operation using the two points of intermediate data D^1_{n+2} and S^2_{n+2} is performed, and the second-stage temporary data (D^2_{n+2}) on the series starting from the odd-numbered input data Y(2n+5) is calculated. Here, the intermediate data S^2_{n+2} is data on the series one point before the series of the intermediate data D^1_{n+2}. In the target area A2, a two-point operation using the two points of data D^2_n and X(2n) is performed, and the output temporary data (X(2n+1)) on the series starting from the odd-numbered input data Y(2n+1) is calculated. In the target area B1, the two-point operation of step b (FIG. 38) is performed using the temporary data (S^2_{n+3}) and the intermediate data D^1_{n+3} calculated one clock cycle earlier, and the second-stage intermediate data S^2_{n+3} on the series starting from the even-numbered input data Y(2n+6) is calculated. Here, the intermediate data D^1_{n+3} is data on the series one point after the series of the temporary data (S^2_{n+3}). In the target area B2, a two-point operation using the output temporary data (X(2n+2)) and the intermediate data D^2_{n+1} is performed, and the output data X(2n+2) on the series starting from the even-numbered input data Y(2n+2) is calculated.
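For reference, the four two-point operations of the N-th processing can be written out as equations. They are reconstructed here from the adder descriptions given for this processing (adders 80, 70, 85, and 75 in that order), and parenthesized symbols denote temporary data, following the convention of the text.

```latex
\begin{aligned}
\text{A1:}\quad & \bigl(D^{2}_{n+2}\bigr) = D^{1}_{n+2} - \gamma\, S^{2}_{n+2} \\
\text{A2:}\quad & \bigl(X(2n+1)\bigr) = D^{2}_{n} - \alpha\, X(2n) \\
\text{B1:}\quad & S^{2}_{n+3} = \bigl(S^{2}_{n+3}\bigr) - \delta\, D^{1}_{n+3} \\
\text{B2:}\quad & X(2n+2) = \bigl(X(2n+2)\bigr) - \beta\, D^{2}_{n+1}
\end{aligned}
```

In A1 and A2 the result is temporary because only the neighbor on the preceding series has been applied; in B1 and B2 the neighbor on the following series is applied to an existing temporary value, which completes it.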
[0164]
Further, the normalization processing of the target area N1 is performed in a cycle one clock before the parallel processing in the target areas A1, A2, B1, and B2. In the target area N1, a normalization process of multiplying the input data Y (2n + 7) by a normalization coefficient 1 / κ is executed.
[0165]
The specific contents of the N-th processing are as follows. The ring memory 32s has a storage area of 8 lines (series).
In the N-th processing, the arithmetic processing in the target areas A1, A2, B1, and B2 is performed within one clock cycle, but the arithmetic processing in the target area N1 is performed one clock cycle before this arithmetic processing. The processing in the cycle one clock before will be described. The MMU 31 outputs the input data Y (2n + 7) temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs the input data Y (2n + 7) from the first terminal S0 according to the value of the selection control signal SEL0 from the control unit 87.
[0166]
The input data Y(2n+7) output from the first terminal S0 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, the coefficient register 62 outputs the normalization coefficient 1/κ, selected from the two normalization coefficients κ and 1/κ, to the multiplier 63 according to the control signal C0 supplied from the control unit 87. The multiplier 63 then multiplies the input data Y(2n+7) by the normalization coefficient 1/κ. As a result, the first coefficient multiplier 61 calculates the data D^1_{n+3} (= (1/κ) × Y(2n+7)). The output of the multiplier 63 is input to the delay register 64. The above processing is executed in the cycle one clock before the arithmetic processing in the target areas A1, A2, B1, and B2 is performed.
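The one-clock offset between normalization and the lifting operations can be illustrated with a small model of the coefficient multiplier followed by the delay register. This is a hedged sketch: the class name `NormalizingStage`, the `clock` method, and the value of κ are hypothetical, and the boolean `is_odd` stands in for the control signal C0.

```python
class NormalizingStage:
    """Coefficient multiplier followed by a one-clock delay register."""

    def __init__(self, kappa: float) -> None:
        self.kappa = kappa
        self.delay_register = None  # value latched in the previous clock

    def clock(self, sample: float, is_odd: bool):
        """Advance one clock: emit last cycle's product, latch the new one."""
        # Per the text, odd-numbered inputs are multiplied by 1/kappa and
        # even-numbered inputs by kappa.
        coeff = 1.0 / self.kappa if is_odd else self.kappa
        previous = self.delay_register
        self.delay_register = coeff * sample
        return previous


stage = NormalizingStage(kappa=2.0)      # hypothetical kappa
first = stage.clock(8.0, is_odd=True)    # nothing has been latched yet
second = stage.clock(3.0, is_odd=False)  # 8.0 * (1 / 2.0) appears one clock later
```

The value returned in a given clock is the one normalized in the preceding clock, which is why the normalized data is available to the two-point operation units exactly one cycle after the input data is fetched.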
[0167]
In the next clock cycle, the MMU 31 outputs the seven points of data X(2n), D^2_n, (X(2n+2)), D^2_{n+1}, S^2_{n+2}, D^1_{n+2}, and (S^2_{n+3}) temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs these seven points of data to the second data selector 65 according to the value of the selection control signal SEL0 supplied from the control unit 87. The data D^1_{n+3} stored in the delay register 64 is also output to the second data selector 65. The intermediate data D^1_{n+3} output from the delay register 64 is also branched and output to the external MMU 31, and the MMU 31 transfers the intermediate data D^1_{n+3} to the ring memory 32s and overwrites the storage area of the input data Y(2n+7), which has already been referenced.
[0168]
In accordance with the value of the selection control signal SEL1 supplied from the control unit 87, the second data selector 65 outputs, among the eight points of data, the two points of data X(2n) and D^2_n in the target area A2 from the first terminal S0 and the second terminal S1, the intermediate data D^2_{n+1} and the temporary data (X(2n+2)) in the target area B2 from the third terminal S2 and the fourth terminal S3, the intermediate data S^2_{n+2} and D^1_{n+2} in the target area A1 from the fifth terminal S4 and the sixth terminal S5, and the intermediate data D^1_{n+3} and the temporary data (S^2_{n+3}) in the target area B1 from the seventh terminal S6 and the eighth terminal S7.
[0169]
In the second coefficient multiplier 66, the coefficient register 67 outputs the lifting coefficient α to the multiplier 68 according to the control signal C1 supplied from the control unit 87, and the multiplier 68 outputs the data α × X(2n) obtained by multiplying the data X(2n) input from the first terminal S0 by the lifting coefficient α. The sign of the output data from the multiplier 68 is inverted in the two's complement arithmetic circuit 69, and the result is output to the adder 70. The adder 70 adds the data −α × X(2n) output from the second coefficient multiplier 66 to the data D^2_n input from the second terminal S1 of the second data selector 65 to calculate the temporary data (X(2n+1)) in the target area A2, and outputs it to the output destination selection unit 86.
[0170]
Further, in the third coefficient multiplier 71, the coefficient register 72 outputs the lifting coefficient β to the multiplier 73 according to the control signal C2 supplied from the control unit 87, and the multiplier 73 outputs the data β × D^2_{n+1} obtained by multiplying the intermediate data D^2_{n+1} input from the third terminal S2 by the lifting coefficient β. The output of the multiplier 73 is output to the adder 75 after its sign is inverted in the two's complement arithmetic circuit 74. The adder 75 adds the data −β × D^2_{n+1} output from the third coefficient multiplier 71 to the output temporary data (X(2n+2)) input from the fourth terminal S3 of the second data selector 65 to calculate the output data X(2n+2) in the target area B2, and outputs it to the output destination selection unit 86.
[0171]
In the fourth coefficient multiplier 76, the coefficient register 77 outputs the lifting coefficient γ to the multiplier 78 according to the control signal C3 supplied from the control unit 87, and the multiplier 78 outputs the data γ × S^2_{n+2} obtained by multiplying the intermediate data S^2_{n+2} input from the fifth terminal S4 by the lifting coefficient γ. The output of the multiplier 78 is output to the adder 80 after its sign is inverted in the two's complement arithmetic circuit 79. The adder 80 adds the data −γ × S^2_{n+2} output from the fourth coefficient multiplier 76 to the data D^1_{n+2} input from the sixth terminal S5 of the second data selector 65 to calculate the temporary data (D^2_{n+2}) in the target area A1, and outputs it to the output destination selection unit 86.
[0172]
In the fifth coefficient multiplier 81, the coefficient register 82 outputs the lifting coefficient δ to the multiplier 83 according to the control signal C4 supplied from the control unit 87, and the multiplier 83 outputs the data δ × D^1_{n+3} obtained by multiplying the intermediate data D^1_{n+3} input from the seventh terminal S6 by the lifting coefficient δ. The output of the multiplier 83 is output to the adder 85 after its sign is inverted in the two's complement arithmetic circuit 84. The adder 85 adds the data −δ × D^1_{n+3} output from the fifth coefficient multiplier 81 to the temporary data (S^2_{n+3}) input from the eighth terminal S7 of the second data selector 65 to calculate the second-stage intermediate data S^2_{n+3} in the target area B1, and outputs it to the output destination selection unit 86.
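The four additions performed in this clock cycle can be summarized in a short Python sketch. It is an illustration under stated assumptions: the function name `nth_cycle`, the dictionary keys used as data labels, and all numeric values (including the coefficients) are hypothetical, and the four results are computed sequentially here although the hardware produces them in parallel.

```python
from typing import Dict


def nth_cycle(mem: Dict[str, float], alpha: float, beta: float,
              gamma: float, delta: float) -> Dict[str, float]:
    """Four two-point operations of the N-th processing, one per adder."""
    return {
        "(X(2n+1))": mem["D2_n"] - alpha * mem["X(2n)"],        # area A2, adder 70
        "X(2n+2)": mem["(X(2n+2))"] - beta * mem["D2_n+1"],     # area B2, adder 75
        "(D2_n+2)": mem["D1_n+2"] - gamma * mem["S2_n+2"],      # area A1, adder 80
        "S2_n+3": mem["(S2_n+3)"] - delta * mem["D1_n+3"],      # area B1, adder 85
    }


# Hypothetical operand values; parenthesized labels denote temporary data.
operands = {
    "X(2n)": 2.0, "D2_n": 5.0,
    "(X(2n+2))": 3.0, "D2_n+1": 4.0,
    "S2_n+2": 8.0, "D1_n+2": 7.0,
    "(S2_n+3)": 6.0, "D1_n+3": 2.0,
}
results = nth_cycle(operands, alpha=0.5, beta=0.25, gamma=0.5, delta=0.5)
```

Each entry depends only on two operands that were already available at the start of the cycle, which is what allows the four two-point operation units to run side by side.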
[0173]
The output destination selection unit 86 outputs the output data X(2n+2) input from the adder 75 to the outside from the second terminal K1, according to the value of the selection control signal SEL2 supplied from the control unit 87. The output data X(2n+2) is also output to the MMU 31. Further, according to the selection control signal SEL2, the output destination selection unit 86 outputs the three points of data (X(2n+1)), (D^2_{n+2}), and S^2_{n+3} input from the adders 70, 80, and 85 to the MMU 31 from the third terminal K2 to the fifth terminal K4. The MMU 31 transfers the four points of data (X(2n+1)), X(2n+2), (D^2_{n+2}), and S^2_{n+3} to the ring memory 32s and overwrites the already-referenced storage areas of D^2_n, (X(2n+2)), D^1_{n+2}, and (S^2_{n+3}).
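The overwrite behavior described above, in which each result replaces an operand on the same series, can be modeled with a small ring buffer. This is a hedged sketch only: the class `RingMemory`, the modulo mapping from series number to line, and the sample values are assumptions made for illustration; the text states only that the ring memory 32s has 8 lines and that referenced storage areas are overwritten.

```python
class RingMemory:
    """Fixed number of lines; a series number wraps modulo the line count."""

    def __init__(self, lines: int = 8) -> None:
        self.slots = [None] * lines

    def write(self, series: int, label: str, value: float) -> None:
        """Overwrite the line that belongs to this series."""
        self.slots[series % len(self.slots)] = (label, value)

    def read(self, series: int):
        return self.slots[series % len(self.slots)]


ring = RingMemory(lines=8)
n = 0                                      # illustrative position in the signal
ring.write(2 * n + 1, "D2_n", 5.0)         # operand referenced in area A2
ring.write(2 * n + 1, "(X(2n+1))", 4.0)    # result overwrites the same line
after_overwrite = ring.read(2 * n + 1)
ring.write(2 * n + 9, "Y(2n+9)", 1.5)      # eight series later the line is reused
after_wrap = ring.read(2 * n + 1)
```

Because a value is needed only until the operation that consumes it has run, eight lines are enough to hold every series that is live at any one time in this model.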
[0174]
Next, the four conversion processes in the target areas A3, A4, B3, and B4 of the (N+1)-th processing (FIG. 24) are executed simultaneously in parallel. In the target area A3, the two-point operation of step a (FIG. 38) is performed using the intermediate data S^1_{n+4} calculated one clock cycle earlier and the intermediate data D^1_{n+3}, and the second-stage temporary data (S^2_{n+4}) on the series starting from the even-numbered input data Y(2n+8) is calculated. Here, the intermediate data D^1_{n+3} is data on the series one point before the series of the intermediate data S^1_{n+4}. In the target area A4, a two-point operation using the two points of data S^2_{n+2} and D^2_{n+1} is performed, and the output temporary data (X(2n+4)) on the series starting from the even-numbered input data Y(2n+4) is calculated. In the target area B3, the two-point operation of step b (FIG. 38) is performed using the temporary data (D^2_{n+2}) and the intermediate data S^2_{n+3}, and the second-stage intermediate data D^2_{n+2} on the series starting from the odd-numbered input data Y(2n+5) is calculated. Here, the intermediate data S^2_{n+3} is data on the series one point after the series of the temporary data (D^2_{n+2}). Further, in the target area B4, the two-point operation of step b is performed using the output temporary data (X(2n+1)) and the output data X(2n+2), and the output data X(2n+1) on the series starting from the odd-numbered input data Y(2n+1) is calculated.
[0175]
Further, the normalization process of the target area N2 is performed in a cycle one clock before the parallel processing in the target areas A3, A4, B3, and B4. In the target area N2, a normalization process of multiplying the input data Y (2n + 8) by a normalization coefficient κ is executed.
[0176]
Next, the specific contents of the (N+1)-th processing are as follows. The description starts with the processing in the target area N2, which is performed in the cycle one clock before. The MMU 31 outputs the input data Y(2n+8) temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs the input data Y(2n+8) from the first terminal S0 according to the value of the selection control signal SEL0 from the control unit 87.
[0177]
The input data Y(2n+8) output from the first terminal S0 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, the coefficient register 62 outputs the normalization coefficient κ, selected from the two normalization coefficients κ and 1/κ, to the multiplier 63 according to the control signal C0 supplied from the control unit 87, and the multiplier 63 multiplies the input data Y(2n+8) by the normalization coefficient κ. As a result, the first coefficient multiplier 61 calculates the data S^1_{n+4} (= κ × Y(2n+8)). The output of the multiplier 63 is input to the delay register 64. The above processing is executed in the cycle one clock before the arithmetic processing in the target areas A3, A4, B3, and B4 is performed.
[0178]
In the next clock cycle, the MMU 31 outputs the seven points of data (X(2n+1)), X(2n+2), D^2_{n+1}, S^2_{n+2}, (D^2_{n+2}), S^2_{n+3}, and D^1_{n+3} temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs these seven points of data to the second data selector 65 according to the value of the selection control signal SEL0 supplied from the control unit 87. Further, the intermediate data S^1_{n+4} stored in the delay register 64 is output to the second data selector 65. The intermediate data S^1_{n+4} output from the delay register 64 is also branched and output to the external MMU 31, and the MMU 31 transfers the intermediate data S^1_{n+4} to the ring memory 32s and overwrites the storage area of the input data Y(2n+8), which has already been referenced.
[0179]
In accordance with the value of the selection control signal SEL1 supplied from the control unit 87, the second data selector 65 outputs, among the eight points of data, the two points of data X(2n+2) and (X(2n+1)) in the target area B4 from the first terminal S0 and the second terminal S1, the intermediate data D^2_{n+1} and S^2_{n+2} in the target area A4 from the third terminal S2 and the fourth terminal S3, the intermediate data S^2_{n+3} and the temporary data (D^2_{n+2}) in the target area B3 from the fifth terminal S4 and the sixth terminal S5, and the intermediate data D^1_{n+3} and S^1_{n+4} in the target area A3 from the seventh terminal S6 and the eighth terminal S7.
[0180]
In the second coefficient multiplier 66, the coefficient register 67 outputs the lifting coefficient α to the multiplier 68 in accordance with the control signal C1 supplied from the control unit 87, and the multiplier 68 outputs the data α × X(2n+2) obtained by multiplying the data X(2n+2) input from the first terminal S0 by the lifting coefficient α. The sign of the output data from the multiplier 68 is inverted in the two's complement arithmetic circuit 69, and the result is output to the adder 70. The adder 70 adds the data −α × X(2n+2) output from the second coefficient multiplier 66 to the temporary data (X(2n+1)) input from the second terminal S1 of the second data selector 65 to calculate the output data X(2n+1) in the target area B4, and outputs it to the output destination selection unit 86.
[0181]
Further, in the third coefficient multiplier 71, the coefficient register 72 outputs the lifting coefficient β to the multiplier 73 according to the control signal C2 supplied from the control unit 87, and the multiplier 73 outputs the data β × D^2_{n+1} obtained by multiplying the intermediate data D^2_{n+1} input from the third terminal S2 by the lifting coefficient β. The output of the multiplier 73 is output to the adder 75 after its sign is inverted in the two's complement arithmetic circuit 74. The adder 75 adds the data −β × D^2_{n+1} output from the third coefficient multiplier 71 to the intermediate data S^2_{n+2} input from the fourth terminal S3 of the second data selector 65 to calculate the output temporary data (X(2n+4)) in the target area A4, and outputs it to the output destination selection unit 86.
[0182]
In the fourth coefficient multiplier 76, the coefficient register 77 outputs the lifting coefficient γ to the multiplier 78 according to the control signal C3 supplied from the control unit 87, and the multiplier 78 outputs the data γ × S^2_{n+3} obtained by multiplying the intermediate data S^2_{n+3} input from the fifth terminal S4 by the lifting coefficient γ. The output of the multiplier 78 is output to the adder 80 after its sign is inverted in the two's complement arithmetic circuit 79. The adder 80 adds the data −γ × S^2_{n+3} output from the fourth coefficient multiplier 76 to the temporary data (D^2_{n+2}) input from the sixth terminal S5 of the second data selector 65 to calculate the intermediate data D^2_{n+2} in the target area B3, and outputs it to the output destination selection unit 86.
[0183]
In the fifth coefficient multiplier 81, the coefficient register 82 outputs the lifting coefficient δ to the multiplier 83 according to the control signal C4 supplied from the control unit 87, and the multiplier 83 outputs the data δ × D^1_{n+3} obtained by multiplying the intermediate data D^1_{n+3} input from the seventh terminal S6 by the lifting coefficient δ. The output of the multiplier 83 is output to the adder 85 after its sign is inverted in the two's complement arithmetic circuit 84. The adder 85 adds the data −δ × D^1_{n+3} output from the fifth coefficient multiplier 81 to the intermediate data S^1_{n+4} input from the eighth terminal S7 of the second data selector 65 to calculate the second-stage temporary data (S^2_{n+4}) in the target area A3, and outputs it to the output destination selection unit 86.
[0184]
The output destination selection unit 86 outputs the output data X(2n+1) input from the adder 70 to the outside from the first terminal K0 according to the value of the selection control signal SEL2 supplied from the control unit 87. Further, according to the selection control signal SEL2, the output destination selection unit 86 outputs the three points of data (X(2n+4)), D^2_{n+2}, and (S^2_{n+4}) input from the adders 75, 80, and 85 to the MMU 31 from the third terminal K2 to the fifth terminal K4. The MMU 31 transfers the three points of data (X(2n+4)), D^2_{n+2}, and (S^2_{n+4}) to the ring memory 32s and overwrites the already-referenced storage areas of S^2_{n+2}, (D^2_{n+2}), and S^1_{n+4}.
[0185]
Next, four conversion processes of the target areas A5, A6, B5, and B6 in the (N + 2) th process (FIG. 25) are simultaneously executed in parallel within one clock cycle. Further, the normalization processing of the target area N3 is performed in a cycle one clock before the parallel processing in the target areas A5, A6, B5, and B6.
[0186]
The target areas A6, B6, A5, B5, and N3 are the areas obtained by moving the target areas A2, B2, A1, B1, and N1 of the N-th processing (FIG. 23) backward by two series (two points). In the target areas A6, B6, A5, B5, and N3, the same processing as that in the target areas A2, B2, A1, B1, and N1 is performed. As a result, the temporary data (X(2n+3)) is calculated in the target area A6, the output data X(2n+4) in the target area B6, the temporary data (D^2_{n+3}) in the target area A5, the intermediate data S^2_{n+4} in the target area B5, and the intermediate data D^1_{n+4} in the target area N3.
[0187]
Next, in the (N+3)-th processing (not shown), the same processing as the (N+1)-th processing is performed in the areas obtained by moving the target areas B4, A4, B3, A3, and N2 of the (N+1)-th processing (FIG. 24) backward by two series (two points).
[0188]
As described above, processes similar to the N-th process (FIG. 23) and the (N + 1) -th process (FIG. 24) are repeatedly executed while moving the target area until all output data is calculated. As a result, the average cycle required to calculate the output data of the even-numbered or odd-numbered one point can be set to one clock cycle, and the calculation cycle of the output data can be greatly reduced.
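The overall arithmetic that this repeated processing carries out can be sketched in software as a sequence of normalization followed by four lifting stages, each stage being split into a step a (preceding neighbor, giving temporary data) and a step b (following neighbor, giving the final value). The sketch below is not the pipelined schedule of FIGS. 23 to 25; it processes a whole array stage by stage. The coefficient values are the commonly cited 9/7 lifting constants and are an assumption here, as are the symmetric boundary rule and all function names. `forward_lifting` is included only as a self-check that the stages are invertible.

```python
# Commonly cited 9/7 lifting constants; that the device uses exactly these
# values is an assumption made for this sketch.
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852
KAPPA = 1.230174105


def mirror(i, n):
    """Index with whole-sample symmetric extension (an assumed boundary rule)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift(data, parity, coeff):
    """Update every sample of the given parity with two two-point operations."""
    n = len(data)
    out = list(data)
    for i in range(parity, n, 2):
        temporary = data[i] - coeff * data[mirror(i - 1, n)]  # step a
        out[i] = temporary - coeff * data[mirror(i + 1, n)]   # step b
    return out


def inverse_lifting(y):
    """Normalize, then apply the -delta, -gamma, -beta, -alpha stages in order."""
    data = [v * KAPPA if i % 2 == 0 else v / KAPPA for i, v in enumerate(y)]
    for parity, coeff in ((0, DELTA), (1, GAMMA), (0, BETA), (1, ALPHA)):
        data = lift(data, parity, coeff)
    return data


def forward_lifting(x):
    """Algebraic inverse of inverse_lifting, included only as a self-check."""
    data = list(x)
    for parity, coeff in ((1, ALPHA), (0, BETA), (1, GAMMA), (0, DELTA)):
        data = lift(data, parity, -coeff)
    return [v / KAPPA if i % 2 == 0 else v * KAPPA for i, v in enumerate(data)]


coefficients = [4.0, -1.0, 3.5, 0.25, 2.0, -0.5, 1.0, 0.75]  # arbitrary test input
reconstructed = inverse_lifting(coefficients)
round_trip = forward_lifting(reconstructed)
max_error = max(abs(a - b) for a, b in zip(coefficients, round_trip))
```

Each output point needs eight two-point operations in total (two per stage over four stages), and the hardware performs four of them per clock, which may help explain how the schedule described above alternates between completing an even-numbered and an odd-numbered output.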
[0189]
Since the wavelet transform device according to the present embodiment includes the horizontal filtering unit and the vertical filtering unit having the configuration shown in FIG. 22, it can perform the same line-based two-dimensional inverse DWT processing as in the second embodiment. Therefore, the wavelet transform can be performed at high speed in a very short time.
[0190]
Also in the third embodiment, as described in the second embodiment, the horizontal filtering unit 33s outputs a horizontal line in each processing and the vertical filtering unit 33s inputs a pixel row in each processing, so that, unlike the wavelet transform device 1 according to the first embodiment, the line buffer circuit 5 is not required. Therefore, it is possible to realize an inexpensive wavelet transform device that has a small circuit scale and operates with low power consumption.
[0191]
<Modification>
FIG. 26 is a diagram illustrating a schematic configuration of a two-dimensional wavelet transform device 30a according to a modification of the second and third embodiments. The wavelet transform device 30a includes a buffer 88 for temporarily storing the two-dimensional image data of the subbands, an MMU (memory management unit) 89 operating in synchronization with an externally supplied clock signal CLK, a first ring memory 32 or 32s, a horizontal filtering unit 33 or 33s, a second ring memory 3, and a vertical filtering unit 4.
[0192]
Here, the second ring memory 3 and the vertical filtering unit 4 have the same configuration as the ring memory 3 and the filtering unit 4 according to the first embodiment. Therefore, the second ring memory 3B and the vertical filtering unit 4B of this modification can calculate one line of output data at a cycle of four lines.
[0193]
Further, the first ring memory 32 or 32s and the horizontal filtering unit 33 or 33s have the same configuration as the ring memory 32 and the filtering unit 33 according to the second embodiment, or as the ring memory 32s and the filtering unit 33s according to the third embodiment. Therefore, the first ring memory 32 or 32s and the horizontal filtering unit 33 or 33s of this modification can calculate one point of output data in one clock cycle.
[0194]
Therefore, in this modification, the horizontal filtering unit 33 or 33s performs processing so as to fetch input data from the first ring memory 32 at intervals of four clock cycles. This eliminates the need for the line buffer circuit 5, unlike the wavelet transform device 1 (FIG. 1) according to the first embodiment. Therefore, it is possible to realize a low-cost wavelet transform device with a small circuit scale.
[0195]
In this modification, the second ring memory 3B and the vertical filtering unit 4B according to the first embodiment are employed as the second ring memory and the vertical filtering unit. Instead, a configuration in which one point of output data is calculated in an average of five clock cycles, as described in the related art, may be adopted for the second ring memory and the vertical filtering unit. In this case, the horizontal filtering unit 33 or 33s performs processing so as to fetch input data from the first ring memory 32 at five-cycle intervals. Thus, a configuration that does not require the line buffer circuit 5 can be achieved.
[0196]
<Fourth embodiment>
Next, a wavelet transform device and a wavelet transform method according to a fourth embodiment of the present invention will be described. FIG. 27 is a diagram illustrating a schematic configuration of a wavelet transform device 90 according to the fourth embodiment. The wavelet transform device 90 includes a buffer 91 for temporarily storing the two-dimensional image data of the subbands, an MMU (memory management unit) 92 operating in synchronization with an externally supplied clock signal CLK, a first ring memory 32H, a first horizontal filtering unit 33H, a second ring memory 32L, a second horizontal filtering unit 33L, a third ring memory 93, and a vertical filtering unit 94. Here, the first ring memory 32H, the first horizontal filtering unit 33H, the second ring memory 32L, the second horizontal filtering unit 33L, the third ring memory 93, and the vertical filtering unit 94 operate in synchronization with the externally supplied pixel clock signal PCLK.
[0197]
In the present embodiment, the MMU 92, the first horizontal filtering unit 33H, the second horizontal filtering unit 33L, and the vertical filtering unit 94 are configured by hardware, but they may instead be constituted by a computer program including a group of instructions to be executed by a microprocessor.
[0198]
The wavelet transform device 90 has a function of applying a line-based two-dimensional inverse DWT to two-dimensional image data once. The first and second horizontal filtering units 33H and 33L and the vertical filtering unit 94 are connected via a third ring memory 93, respectively.
[0199]
The MMU 92 has a function of controlling data input / output of the buffer 91, the first ring memory 32H, the second ring memory 32L, and the third ring memory 93, and stores the two-dimensional image data of the sub-band read from the buffer 91. The data can be transferred to and stored in the first ring memory 32H and the second ring memory 32L.
[0200]
Here, the data 23LL, 23HL, 23LH, and 23HH of the four sub-bands shown in FIG. 11 are input to the buffer 91. Data in which the horizontal pixels of the sub-bands 23LH and 23HH are alternately arranged is input to the first ring memory 32H, and image data of horizontal width W and vertical height H/2 in which the horizontal pixels of the sub-bands 23LL and 23HL are alternately arranged is input to the second ring memory 32L.
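The alternate arrangement of horizontal pixels from two sub-bands can be sketched as a simple column interleave. This is an illustration only: the function name `interleave_columns`, the use of nested lists for image data, and the choice of which sub-band occupies the even-numbered positions are assumptions not stated in the text.

```python
def interleave_columns(first_band, second_band):
    """Alternate the pixels of two equally sized sub-bands along each row."""
    rows = []
    for first_row, second_row in zip(first_band, second_band):
        row = []
        for first_px, second_px in zip(first_row, second_row):
            row.extend([first_px, second_px])  # assumed order: first band on even positions
        rows.append(row)
    return rows


# Hypothetical 2x2 sub-bands; the result has twice the width and the same height.
band_a = [[1, 2], [3, 4]]
band_b = [[10, 20], [30, 40]]
merged = interleave_columns(band_a, band_b)
print(merged)
```

Two sub-bands of width W/2 thus yield rows of width W, which matches the stated horizontal width W and height H/2 of the data supplied to the ring memory.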
[0201]
The first horizontal filtering unit 33H filters the data input from the first ring memory 32H in the horizontal direction of the two-dimensional image, and can thereby calculate output data from the sub-bands 23LH and 23HH one point at a time in one clock cycle of the pixel clock signal PCLK. The image data Y_H(m) of the sub-band 23H calculated in this manner is transferred to the third ring memory 93.
[0202]
The second horizontal filtering unit 33L filters the data input from the second ring memory 32L in the horizontal direction of the two-dimensional image, and can thereby calculate output data from the sub-bands 23LL and 23HL one point at a time in one clock cycle of the pixel clock signal PCLK. The image data Y_L(m) of the sub-band 23L calculated in this manner is transferred to the third ring memory 93.
[0203]
The first horizontal filtering unit 33H and the second horizontal filtering unit 33L may have the same configuration as the filtering unit 33 or 33s according to the second or third embodiment.
[0204]
On the other hand, the vertical filtering unit 94 fetches the image data Y_L(m) and Y_H(m) of the subbands 23L and 23H from the third ring memory 93. By performing filtering in the horizontal direction for each pixel column on the data in which the vertical lines of the image data Y_L(m) and Y_H(m) are alternately arranged, it can calculate the vertical-line data of the image data 23 two points at a time in the horizontal direction in one clock cycle of the pixel clock signal PCLK.
[0205]
FIG. 28 shows a schematic configuration of the vertical filtering unit 94 according to the present embodiment. The vertical filtering unit 94 includes a first data selector 95 for selectively taking in input data, first and second coefficient multipliers 96 and 100, delay registers 99 and 103, a second data selector 104, four preceding-stage adders 105, 111, 117, and 123, third to sixth coefficient multipliers 106, 112, 118, and 124, four subsequent-stage adders 110, 116, 122, and 128, an output destination selection unit (DMUX) 129, and a control unit 130. Among these components, the set consisting of the two adders 105 and 110 and the third coefficient multiplier 106 constitutes a three-point operation unit that processes three points of data within one clock cycle. Likewise, the set consisting of the two adders 111 and 116 and the fourth coefficient multiplier 112, the set consisting of the two adders 117 and 122 and the fifth coefficient multiplier 118, and the set consisting of the two adders 123 and 128 and the sixth coefficient multiplier 124 each constitute a three-point operation unit that processes three points of data within one clock cycle. The four three-point operation units and the output destination selection unit 129 constitute an intermediate data calculation unit.
[0206]
The control unit 130 operates in synchronization with the pixel clock signal PCLK. The first data selector 95 selectively outputs the data fetched from the third ring memory 93 (data in which the vertical lines of Y_L(m) and Y_H(m) are alternately arranged) from any of the first terminal S0 to the twelfth terminal S11 according to the value of the selection control signal SEL0 supplied from the control unit 130.
[0207]
The data output from the first terminal S0 or the second terminal S1 of the first data selector 95 is input to the first coefficient multiplier 96 and the second coefficient multiplier 100. In the first coefficient multiplier 96, the coefficient register 97 outputs the normalization coefficient κ to the multiplier 98 according to the control signal C0 supplied from the control unit 130, and the multiplier 98 multiplies the input data by the normalization coefficient κ and outputs the product to the delay register 99. In the second coefficient multiplier 100, the coefficient register 101 outputs the normalization coefficient 1/κ to the multiplier 102 according to the control signal C1 supplied from the control unit 130, and the multiplier 102 multiplies the input data by the normalization coefficient 1/κ and outputs the product to the delay register 103. Note that the set of the first coefficient multiplier 96 and the delay register 99 and the set of the second coefficient multiplier 100 and the delay register 103 constitute the normalizing means of the present invention.
[0208]
The data input to the delay registers 99 and 103 are output to the second data selector 104 after being delayed by one clock cycle of the pixel clock signal PCLK. The data input to the delay register 103 is branched and output to the MMU 92.
[0209]
The data output from the third terminal S2 to the twelfth terminal S11 of the first data selector 95 are output to the second data selector 104. In response to the selection control signal SEL1 supplied from the control unit 130, the second data selector 104 distributes the data to the four three-point operation units, which process them in parallel.
[0210]
The preceding adder 105 adds the two points of data output from the first terminal S0 and the second terminal S1 of the second data selector 104 and outputs the sum to the third coefficient multiplier 106. In the third coefficient multiplier 106, the coefficient register 107 outputs the lifting coefficient α to the multiplier 108 according to the control signal C2 supplied from the control unit 130, and the multiplier 108 multiplies the data input from the adder 105 by the lifting coefficient α. The sign of the product is inverted in the two's complement arithmetic circuit 109, and the result is output to the subsequent adder 110. The subsequent adder 110 then adds the data input from the third coefficient multiplier 106 and the data input from the third terminal S2 of the second data selector 104, and outputs the result to the output destination selection unit 129.
[0211]
Further, the preceding adder 111 adds the two points of data output from the fourth terminal S3 and the fifth terminal S4 of the second data selector 104 and outputs the sum to the fourth coefficient multiplier 112. In the fourth coefficient multiplier 112, the coefficient register 113 outputs the lifting coefficient β to the multiplier 114 according to the control signal C3 supplied from the control unit 130, and the multiplier 114 multiplies the data input from the adder 111 by the lifting coefficient β. The sign of the product is inverted in the two's complement arithmetic circuit 115, and the result is output to the subsequent adder 116. The subsequent adder 116 adds the data input from the fourth coefficient multiplier 112 and the data input from the sixth terminal S5 of the second data selector 104, and outputs the result to the output destination selection unit 129.
[0212]
Further, the preceding adder 117 adds the two points of data output from the seventh terminal S6 and the eighth terminal S7 of the second data selector 104 and outputs the sum to the fifth coefficient multiplier 118. In the fifth coefficient multiplier 118, the coefficient register 119 outputs the lifting coefficient γ to the multiplier 120 according to the control signal C4 supplied from the control unit 130, and the multiplier 120 multiplies the data input from the adder 117 by the lifting coefficient γ. The sign of the product is inverted in the two's complement arithmetic circuit 121, and the result is output to the subsequent adder 122. The subsequent adder 122 adds the data input from the fifth coefficient multiplier 118 and the data input from the ninth terminal S8 of the second data selector 104, and outputs the result to the output destination selection unit 129.
[0213]
The preceding adder 123 adds the two points of data output from the tenth terminal S9 and the eleventh terminal S10 of the second data selector 104 and outputs the sum to the sixth coefficient multiplier 124. In the sixth coefficient multiplier 124, the coefficient register 125 outputs the lifting coefficient δ to the multiplier 126 according to the control signal C5 supplied from the control unit 130, and the multiplier 126 multiplies the data input from the adder 123 by the lifting coefficient δ. The sign of the product is inverted in the two's complement arithmetic circuit 127, and the result is output to the subsequent adder 128. The subsequent adder 128 adds the data input from the sixth coefficient multiplier 124 and the data input from the twelfth terminal S11 of the second data selector 104, and outputs the result to the output destination selection unit 129.
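The data path shared by the four three-point operation units (add two points, multiply by a lifting coefficient, invert the sign, add the third point) can be modeled roughly as below. The function names are hypothetical, the grouping of terminals S0 to S11 into consecutive triples follows the description above, and floating-point negation stands in for the two's complement circuit.

```python
def three_point_unit(a, b, c, coeff):
    """Model of one three-point operation unit: c - coeff * (a + b)."""
    product = coeff * (a + b)   # preceding adder, then coefficient multiplier
    negated = -product          # sign inversion (two's complement circuit in hardware)
    return c + negated          # subsequent adder


def intermediate_data_unit(s, alpha, beta, gamma, delta):
    """Apply the four units to the twelve values on terminals S0..S11.

    Terminals (S0, S1, S2) feed the alpha unit, (S3, S4, S5) the beta unit,
    (S6, S7, S8) the gamma unit and (S9, S10, S11) the delta unit.
    """
    coeffs = (alpha, beta, gamma, delta)
    return tuple(
        three_point_unit(s[3 * i], s[3 * i + 1], s[3 * i + 2], coeffs[i])
        for i in range(4)
    )


# Usage with arbitrary illustrative numbers (not real wavelet data).
outputs = intermediate_data_unit(list(range(12)), 1, 1, 1, 1)
```

Because the four units read disjoint terminals, nothing in this model forces an evaluation order, which mirrors the parallel execution within one clock cycle.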
[0214]
According to the value of the selection control signal SEL2 supplied from the control unit 130, the output destination selection unit 129 selectively outputs the four points of data input in parallel from the subsequent adders 110, 116, 122, and 128 from any of the first terminal K0 to the fourth terminal K3.
[0215]
The output destination selection unit 129 outputs output data X (2k) and X (2k + 1) from the first terminal K0 and the second terminal K1. The data output from the first terminal K0, the third terminal K2, and the fourth terminal K3 of the output destination selection unit 129 are also output to the MMU 92. The MMU 92 can transfer the data output from the first terminal K0, the third terminal K2, and the fourth terminal K3 to the third ring memory 93 and overwrite the data in the storage area that has been referred to.
[0216]
Next, a representative example of the lifting operation using the vertical filtering unit 94 will be described below with reference to FIGS. 29 to 31, which are lattice diagrams schematically showing the lifting configuration of a 9 × 7 tap Daubechies filter. The calculation in these lattice diagrams is performed in the same manner as in the lattice diagrams described above. For convenience of explanation, FIGS. 29 to 31 do not show the lifting coefficients −α, −β, −γ, −δ and the normalization coefficients κ, 1/κ corresponding to the line segments connecting the grid points.
[0217]
FIGS. 29 to 31 schematically show the N-th (N is an integer) to (N+2)-th processing in this embodiment.
[0218]
In the N-th process (FIG. 29), four conversion processes of the target areas C1, C2, C3, and C4 are simultaneously executed in parallel within one clock cycle.
[0219]
In the target area C1, the sum of the two intermediate data D^1_{n+4} and D^1_{n+5} is multiplied by the lifting coefficient −δ, and the product is added to the intermediate data S^1_{n+5}, which constitutes a three-point operation. As a result, the second-stage intermediate data S^2_{n+5} on the series starting from the even-numbered input data Y(2n+10) is calculated. Here, the two intermediate data D^1_{n+4} and D^1_{n+5} lie on the series one point before and one point after the series of the intermediate data S^1_{n+5}.
[0220]
In the target area C2, the sum of the two intermediate data S^2_{n+3} and S^2_{n+4} is multiplied by the lifting coefficient −γ, and the product is added to the intermediate data D^1_{n+3}, which constitutes a three-point operation. As a result, the second-stage intermediate data D^2_{n+3} on the series starting from the odd-numbered input data Y(2n+7) is calculated. Here, the two intermediate data S^2_{n+3} and S^2_{n+4} lie on the series one point before and one point after the series of the intermediate data D^1_{n+3}.
[0221]
In the target area C3, the sum of the two intermediate data D^2_{n+1} and D^2_{n+2} is multiplied by the lifting coefficient −β, and the product is added to the intermediate data S^2_{n+2}, which constitutes a three-point operation. As a result, the output data X(2n+4) on the series starting from the input data Y(2n+4) is calculated. Here, the two intermediate data D^2_{n+1} and D^2_{n+2} lie on the series one point before and one point after the series of the intermediate data S^2_{n+2}.
[0222]
Further, in the target area C4, the sum of the two even-numbered output data X(2n) and X(2n+2) is multiplied by the lifting coefficient −α, and the product is added to the intermediate data D^2_n, which constitutes a three-point operation. As a result, the output data X(2n+1) on the series starting from the input data Y(2n+1) is calculated. Here, the two even-numbered output data X(2n) and X(2n+2) lie on the series one point before and one point after the series of the intermediate data D^2_n.
[0223]
Further, in the cycle one clock before the arithmetic processing in the target areas C1 to C4, the arithmetic processing in the target areas N1 and N2 is executed in parallel. In the target area N1, a normalization process of multiplying the input data Y(2n+10) by the normalization coefficient κ is executed, and the intermediate data S^1_{n+5} is calculated. In the target area N2, a normalization process of multiplying the input data Y(2n+11) by the normalization coefficient 1/κ is executed, and the intermediate data D^1_{n+5} is calculated.
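For reference, the two normalization processes and the four three-point operations of the N-th processing can be collected in formula form. The block below only restates the relations described for the target areas N1, N2 and C1 to C4; the notation S^{1}, D^{1}, S^{2}, D^{2} for first-stage and second-stage intermediate data follows the text.

```latex
\begin{aligned}
\text{N1:}\quad & S^{1}_{n+5} = \kappa \, Y(2n+10) \\
\text{N2:}\quad & D^{1}_{n+5} = \tfrac{1}{\kappa} \, Y(2n+11) \\
\text{C1:}\quad & S^{2}_{n+5} = S^{1}_{n+5} - \delta \left( D^{1}_{n+4} + D^{1}_{n+5} \right) \\
\text{C2:}\quad & D^{2}_{n+3} = D^{1}_{n+3} - \gamma \left( S^{2}_{n+3} + S^{2}_{n+4} \right) \\
\text{C3:}\quad & X(2n+4) = S^{2}_{n+2} - \beta \left( D^{2}_{n+1} + D^{2}_{n+2} \right) \\
\text{C4:}\quad & X(2n+1) = D^{2}_{n} - \alpha \left( X(2n) + X(2n+2) \right)
\end{aligned}
```

None of the right-hand sides of C1 to C4 uses a value produced by another of these four operations in the same step, which is why they can be evaluated in parallel.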
[0224]
The specific contents of the N-th processing are as follows. In the N-th processing, the arithmetic processing in the target areas C1, C2, C3, and C4 is performed within one clock cycle, whereas the arithmetic processing in the target areas N1 and N2 is performed one clock cycle earlier. The processing in that earlier cycle is described first. The MMU 92 reads the input data Y(2n+10) and Y(2n+11) temporarily stored in the third ring memory 93 and supplies them to the first data selector 95, which, in response to the selection control signal SEL0 supplied from the control unit 130, outputs the input data Y(2n+10) from the first terminal S0 and the input data Y(2n+11) from the second terminal S1.
[0225]
The input data Y(2n+10) output from the first terminal S0 is input to the first coefficient multiplier 96. In the first coefficient multiplier 96, the coefficient register 97 outputs the normalization coefficient κ to the multiplier 98 in accordance with the control signal C0 supplied from the control unit 130, and the multiplier 98 multiplies the input data Y(2n+10) by the normalization coefficient κ. As a result, the first coefficient multiplier 96 calculates the intermediate data S^1_{n+5} (= κ × Y(2n+10)) within one clock cycle.
[0226]
The input data Y(2n+11) output from the second terminal S1 is input to the second coefficient multiplier 100. In the second coefficient multiplier 100, the coefficient register 101 outputs the normalization coefficient 1/κ to the multiplier 102 according to the control signal C1 supplied from the control unit 130, and the multiplier 102 multiplies the input data Y(2n+11) by the normalization coefficient 1/κ. As a result, the second coefficient multiplier 100 calculates the intermediate data D^1_{n+5} (= 1/κ × Y(2n+11)) within one clock cycle.
[0227]
The intermediate data S^1_{n+5} and D^1_{n+5} output from the first and second coefficient multipliers 96 and 100 are input to the delay registers 99 and 103, respectively. In the delay registers 99 and 103, the intermediate data S^1_{n+5} and D^1_{n+5} are delayed by one clock cycle and then output.
[0228]
One clock cycle after the arithmetic processing in the target areas N1 and N2, the MMU 92 outputs the ten points of data X(2n), D^2_n, X(2n+2), D^2_{n+1}, S^2_{n+2}, D^2_{n+2}, S^2_{n+3}, D^1_{n+3}, S^2_{n+4}, and D^1_{n+4} temporarily stored in the third ring memory 93 to the first data selector 95. The first data selector 95 outputs these ten points of data from the third terminal S2 to the twelfth terminal S11 according to the value of the selection control signal SEL0 supplied from the control unit 130, and the output data are input to the second data selector 104. The intermediate data S^1_{n+5} and D^1_{n+5} held in the delay registers 99 and 103 are also input to the second data selector 104. The intermediate data D^1_{n+5} output from the delay register 103 is additionally branched and output to the external MMU 92, which transfers it to the third ring memory 93 and overwrites the already-referenced storage area that held the input data Y(2n+11).
[0229]
In response to the selection control signal SEL1 supplied from the control unit 130, the second data selector 104 selects the following from the twelve points of data: the three points X(2n), X(2n+2), and D^2_n in the target area C4, which it outputs from the first terminal S0 to the third terminal S2, respectively; the three points D^2_{n+1}, D^2_{n+2}, and S^2_{n+2} in the target area C3, which it outputs from the fourth terminal S3 to the sixth terminal S5, respectively; the three points S^2_{n+3}, S^2_{n+4}, and D^1_{n+3} in the target area C2, which it outputs from the seventh terminal S6 to the ninth terminal S8, respectively; and the three points D^1_{n+4}, D^1_{n+5}, and S^1_{n+5} in the target area C1, which it outputs from the tenth terminal S9 to the twelfth terminal S11, respectively.
[0230]
The preceding adder 105 adds the two points of data X(2n) and X(2n+2) in the target area C4 input from the first terminal S0 and the second terminal S1 of the second data selector 104, and outputs the sum to the third coefficient multiplier 106. In the third coefficient multiplier 106, the coefficient register 107 supplies the lifting coefficient α to the multiplier 108 according to the control signal C2, and the multiplier 108 multiplies the input data by the lifting coefficient α and outputs the product (= α × (X(2n) + X(2n+2))). The sign of this output is inverted by the two's complement arithmetic circuit 109, and the result is output to the subsequent adder 110. The subsequent adder 110 then adds the value input from the third coefficient multiplier 106 and the data D^2_n input from the third terminal S2 of the second data selector 104, thereby calculating the output data X(2n+1) in the target area C4, and outputs it to the output destination selection unit 129.
[0231]
The preceding adder 111 adds the two points of data D^2_{n+1} and D^2_{n+2} in the target area C3 input from the fourth terminal S3 and the fifth terminal S4 of the second data selector 104, and outputs the sum to the fourth coefficient multiplier 112. In the fourth coefficient multiplier 112, the coefficient register 113 supplies the lifting coefficient β to the multiplier 114 in accordance with the control signal C3, and the multiplier 114 multiplies the input data by the lifting coefficient β and outputs the product (= β × (D^2_{n+1} + D^2_{n+2})). The sign of this output is inverted in the two's complement arithmetic circuit 115, and the result is output to the subsequent adder 116. The subsequent adder 116 then adds the value input from the fourth coefficient multiplier 112 and the data S^2_{n+2} input from the sixth terminal S5 of the second data selector 104, thereby calculating the output data X(2n+4) in the target area C3, and outputs it to the output destination selection unit 129.
[0232]
The preceding adder 117 adds the two points of data S^2_{n+3} and S^2_{n+4} in the target area C2 input from the seventh terminal S6 and the eighth terminal S7 of the second data selector 104, and outputs the sum to the fifth coefficient multiplier 118. In the fifth coefficient multiplier 118, the coefficient register 119 supplies the lifting coefficient γ to the multiplier 120 according to the control signal C4, and the multiplier 120 multiplies the input data by the lifting coefficient γ and outputs the product (= γ × (S^2_{n+3} + S^2_{n+4})). The sign of this output is inverted in the two's complement arithmetic circuit 121, and the result is output to the subsequent adder 122. The subsequent adder 122 then adds the value input from the fifth coefficient multiplier 118 and the data D^1_{n+3} input from the ninth terminal S8 of the second data selector 104, thereby calculating the intermediate data D^2_{n+3} in the target area C2, and outputs it to the output destination selection unit 129.
[0233]
The preceding adder 123 adds the two points of data D^1_{n+4} and D^1_{n+5} in the target area C1 input from the tenth terminal S9 and the eleventh terminal S10 of the second data selector 104, and outputs the sum to the sixth coefficient multiplier 124. In the sixth coefficient multiplier 124, the coefficient register 125 supplies the lifting coefficient δ to the multiplier 126 according to the control signal C5, and the multiplier 126 multiplies the input data by the lifting coefficient δ and outputs the product (= δ × (D^1_{n+4} + D^1_{n+5})). The sign of this output is inverted by the two's complement arithmetic circuit 127, and the result is output to the subsequent adder 128. The subsequent adder 128 then adds the value input from the sixth coefficient multiplier 124 and the intermediate data S^1_{n+5} input from the twelfth terminal S11 of the second data selector 104, thereby calculating the intermediate data S^2_{n+5} in the target area C1, and outputs it to the output destination selection unit 129.
[0234]
According to the value of the selection control signal SEL2, the output destination selection unit 129 outputs the two points of output data input from the two subsequent adders 110 and 116 from the first terminal K0 and the second terminal K1. The output destination selection unit 129 also outputs the three points of data input from the three subsequent adders 116, 122, and 128 to the MMU 92. The MMU 92 transfers these three points of data X(2n+4), D^2_{n+3}, and S^2_{n+5} to the third ring memory 93 and overwrites the already-referenced storage areas that held S^2_{n+2}, D^1_{n+3}, and Y(2n+10), respectively.
[0235]
In the next (N+1)-th processing (FIG. 30), conversion processing of the target areas C5, C6, C7, and C8 is performed. In addition, the two normalization processes of the target areas N3 and N4 are executed in the cycle one clock before the conversion processing of the target areas C5, C6, C7, and C8. The target areas C5, C6, C7, C8, N3, and N4 are obtained by shifting the target areas C1, C2, C3, C4, N1, and N2 of the N-th processing (FIG. 29) backward by two series (two points). In the target areas C5, C6, C7, C8, N3, and N4, processing similar to that in the target areas C1, C2, C3, C4, N1, and N2 is performed, respectively. Therefore, in the target area C8, the output data X(2n+3) on the series starting from the odd-numbered input data Y(2n+3) is calculated; in the target area C7, the output data X(2n+6) on the series starting from the even-numbered input data Y(2n+6) is calculated; in the target area C6, the second-stage intermediate data D^2_{n+4} on the series starting from the odd-numbered input data Y(2n+9) is calculated; and in the target area C5, the second-stage intermediate data S^2_{n+6} on the series starting from the even-numbered input data Y(2n+12) is calculated. In the cycle one clock earlier, normalization processing is performed on the input data Y(2n+12) and Y(2n+13) in the target areas N3 and N4.
[0236]
Furthermore, in the (N + 2) -th processing (FIG. 31), conversion processing of the target areas C9, C10, C11, and C12 is performed. Further, two normalization processes of the target regions N5 and N6 are executed in a cycle one clock before the conversion process of the target regions C9, C10, C11 and C12.
[0237]
As described above, the same processing as the N-th processing (FIG. 29) is repeatedly performed while moving the target area until output data of all points is calculated. As a result, the average period required to calculate the output data of the even-numbered and odd-numbered points can be set to one clock period, and the calculation period of the output data can be greatly reduced.
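The repeated, shifting schedule can be checked against an ordinary stage-by-stage inverse lifting in a small simulation. The sketch below rests on several assumptions that are not taken from the specification: the function names are hypothetical, boundary extension is omitted (only interior points are produced), the normalization is folded into the same loop iteration rather than being issued one clock ahead, and the coefficient values are the commonly quoted JPEG 2000 9/7 values.

```python
# Assumed 9/7 coefficient values (not listed in this section of the specification).
ALPHA, BETA = -1.586134342059924, -0.052980118572961
GAMMA, DELTA = 0.882911075530934, 0.443506852043971
KAPPA = 1.230174104914001


def reference(Y, a, b, g, d, k):
    """Stage-by-stage inverse lifting on interior points (no boundary extension)."""
    M = len(Y) // 2
    S1 = {n: k * Y[2 * n] for n in range(M)}
    D1 = {n: Y[2 * n + 1] / k for n in range(M)}
    S2 = {n: S1[n] - d * (D1[n - 1] + D1[n]) for n in range(1, M)}
    D2 = {n: D1[n] - g * (S2[n] + S2[n + 1]) for n in range(1, M - 1)}
    Xe = {n: S2[n] - b * (D2[n - 1] + D2[n]) for n in range(2, M - 1)}  # X(2n)
    Xo = {n: D2[n] - a * (Xe[n] + Xe[n + 1]) for n in range(2, M - 2)}  # X(2n+1)
    return Xe, Xo


def sliding(Y, a, b, g, d, k):
    """Each loop iteration models one step: N1/N2, then C1..C4 shifted by one index."""
    M = len(Y) // 2
    S1, D1, S2, D2, Xe, Xo = {}, {}, {}, {}, {}, {}
    for n in range(-5, M):
        m = n + 5
        if 0 <= m < M:                              # N1, N2: normalize the newest pair
            S1[m] = k * Y[2 * m]
            D1[m] = Y[2 * m + 1] / k
        if m - 1 in D1 and m in D1:                 # C1: S2[n+5]
            S2[m] = S1[m] - d * (D1[m - 1] + D1[m])
        if n + 3 in S2 and n + 4 in S2:             # C2: D2[n+3]
            D2[n + 3] = D1[n + 3] - g * (S2[n + 3] + S2[n + 4])
        if n + 1 in D2 and n + 2 in D2:             # C3: X(2n+4), stored as Xe[n+2]
            Xe[n + 2] = S2[n + 2] - b * (D2[n + 1] + D2[n + 2])
        if n in Xe and n + 1 in Xe:                 # C4: X(2n+1), stored as Xo[n]
            Xo[n] = D2[n] - a * (Xe[n] + Xe[n + 1])
    return Xe, Xo


Y = [float((i * i) % 7) for i in range(24)]
coeffs = (ALPHA, BETA, GAMMA, DELTA, KAPPA)
same = sliding(Y, *coeffs) == reference(Y, *coeffs)
```

In steady state each iteration yields one even-numbered and one odd-numbered output point, which corresponds to the average of one clock cycle per output point stated above.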
[0238]
Next, a line-based two-dimensional inverse DWT process using the wavelet transform device 90 will be described below.
[0239]
The data input to the first horizontal filtering unit 33H are the subbands 23LH and 23HH shown in FIG. 11, and the data input to the second horizontal filtering unit 33L are the subbands 23LL and 23HL. The first horizontal filtering unit 33H outputs the subband 23H (Y_H(m)), and the second horizontal filtering unit 33L outputs the subband 23L (Y_L(m)).
[0240]
The data input to the vertical filtering unit 94 are the data Y_H(m) and Y_L(m) output from the first and second horizontal filtering units 33H and 33L. The vertical lines of Y_H(m) and Y_L(m) are arranged alternately, and the resulting data are input as pixel rows in the horizontal direction. The vertical filtering unit 94 then outputs the two-dimensional image data 23.
[0241]
Specifically, the first ring memory 32H and the first horizontal filtering unit 33H filter the data input in units of horizontal lines at one clock cycle per point and output the subband 23H, and the second ring memory 32L and the second horizontal filtering unit 33L filter the data input in units of horizontal lines at one clock cycle per point and output the subband 23L.
[0242]
The ring memory 32s of FIG. 22 described in the third embodiment can be used as the first ring memory 32H and the second ring memory 32L. In that case, each ring memory has a storage area 133 that holds eight points (eight pixels) of data corresponding to the input data X_j(k), X_{j+1}(k), ..., and stores the temporary data and the intermediate data. Alternatively, the ring memory 32 of FIG. 15 described in the second embodiment can be used as the first ring memory 32H and the second ring memory 32L. In that case, each ring memory has a storage area 59 that holds nine points (nine pixels) of data corresponding to the input data X_j(k), X_{j+1}(k), ..., and can store the temporary data and the intermediate data.
[0243]
Similarly, the filtering unit 33s of FIG. 22 described in the third embodiment or the filtering unit 33 of FIG. 15 described in the second embodiment can be used as the first and second horizontal filtering units 33H and 33L.
[0244]
The third ring memory 93 and the vertical filtering unit 94 repeatedly execute each processing including the N-th processing (FIG. 29) and the (N + 1) -th processing (FIG. 30) for each pixel row in units of horizontal lines. For example, after the N-th process (FIG. 29) is performed on the 0th pixel column, it is performed on the first pixel column, then on the second pixel column, ... Finally, the processing is performed on the (W-1) -th pixel column. After that, the above-described N + 1-th process (FIG. 30) is performed on the 0th pixel column, then on the first pixel column, and further on the second pixel column. ... Finally, the processing is performed on the (W−1) th pixel column. In this way, each process is repeatedly executed for all the pixel columns.
[0245]
As a result, the data of the even-numbered rows and the data of the odd-numbered rows are output in parallel for each horizontal line from the vertical filtering unit 94. For example, as a result of continuously executing the N-th process (FIG. 29) on the 0th to (W-1)-th pixel columns, the data X_0(2n+1), X_1(2n+1), ..., X_j(2n+1), ..., X_{W-1}(2n+1) of the odd-numbered row on the (2n+1)-th horizontal line are output in succession. In parallel with this, the data X_0(2n+4), X_1(2n+4), ..., X_j(2n+4), ..., X_{W-1}(2n+4) of the even-numbered row on the (2n+4)-th horizontal line are output in succession.
[0246]
As schematically shown in FIG. 32, the third ring memory 93 has a storage area 132 that holds 12 × W points (12 lines) of data corresponding to the input data sequence and the intermediate data. This storage area 132 is a collection of column areas, each of which holds 12 points of data in the vertical direction. One column area holds the input data and intermediate data referred to in one process. For example, in the N-th processing (FIG. 29), the contents of a given column area change from the data sequence {X(2n), D^2_n, X(2n+2), D^2_{n+1}, S^2_{n+2}, D^2_{n+2}, S^2_{n+3}, D^1_{n+3}, S^2_{n+4}, D^1_{n+4}, Y(2n+10), Y(2n+11)} to the data sequence {X(2n), D^2_n, X(2n+2), D^2_{n+1}, X(2n+4), D^2_{n+2}, S^2_{n+3}, D^2_{n+3}, S^2_{n+4}, D^1_{n+4}, S^2_{n+5}, D^1_{n+5}} (the data S^2_{n+2}, D^1_{n+3}, Y(2n+10), and Y(2n+11) are overwritten by the data X(2n+4), D^2_{n+3}, S^2_{n+5}, and D^1_{n+5}, respectively).
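The in-place update of one 12-point column area during the N-th processing can be illustrated as follows. This is a sketch under assumptions: the function name `nth_process` and the list layout are hypothetical, the column is ordered exactly as in the data sequence given above, and plain Python numbers replace the hardware number format.

```python
def nth_process(col, alpha, beta, gamma, delta, kappa):
    """Return the updated 12-point column and the two output points.

    col = [X(2n), D2_n, X(2n+2), D2_{n+1}, S2_{n+2}, D2_{n+2},
           S2_{n+3}, D1_{n+3}, S2_{n+4}, D1_{n+4}, Y(2n+10), Y(2n+11)]
    """
    x0, d2_0, x2, d2_1, s2_2, d2_2, s2_3, d1_3, s2_4, d1_4, y10, y11 = col

    # N1, N2 (one clock earlier in the hardware): normalize the newest inputs.
    s1_5 = kappa * y10
    d1_5 = y11 / kappa

    # C1..C4: every operation reads only values from before this step.
    s2_5 = s1_5 - delta * (d1_4 + d1_5)   # C1
    d2_3 = d1_3 - gamma * (s2_3 + s2_4)   # C2
    x4 = s2_2 - beta * (d2_1 + d2_2)      # C3 -> X(2n+4)
    x1 = d2_0 - alpha * (x0 + x2)         # C4 -> X(2n+1)

    # Overwrite the four slots that are no longer needed.
    new_col = [x0, d2_0, x2, d2_1, x4, d2_2, s2_3, d2_3, s2_4, d1_4, s2_5, d1_5]
    return new_col, (x1, x4)


# Usage with arbitrary illustrative numbers (unit coefficients, kappa = 2).
new_col, outputs = nth_process(list(range(1, 13)), 1, 1, 1, 1, 2)
```

Only four of the twelve slots change per step, so the column needs no extra working storage beyond the delay registers that hold the freshly normalized pair.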
[0247]
By performing the above processing recursively, it is possible to synthesize subbands (band components) of any decomposition level. That is, by inputting the four subbands LL(k+1), HL(k+1), LH(k+1), and HH(k+1) at the (k+1)-th (k is an integer of 2 or more) decomposition level to the wavelet transform device 90, the subband LL(k) at the k-th decomposition level can be obtained. By performing such processing recursively, the original image data can be restored from the subbands at the k-th decomposition level.
[0248]
As described above, in the wavelet transform device 90 and the wavelet transform method according to the present embodiment, four transform processes for calculating four points of intermediate data and two normalization processes for normalizing two points of data are executed in parallel within one clock cycle, so that the calculation cycle of the output data can be greatly reduced. Therefore, the wavelet transform can be executed in a very short time and at high speed.
[0249]
In addition, since the wavelet transform device 90 includes the first and second horizontal filtering units 33H and 33L, each of which calculates one point of data within one clock cycle, and the vertical filtering unit 94, which calculates two points of data within one clock cycle, two points of synthesized data can be calculated in parallel within one clock cycle. Therefore, the line-based two-dimensional DWT calculation can be executed at an extremely high speed.
[0250]
<Modification>
FIG. 34 is a diagram illustrating a schematic configuration of a two-dimensional wavelet transform device 140 according to a modification of the above-described fourth embodiment. The wavelet transform device 140 includes a buffer 91 for temporarily storing the two-dimensional image data of the subbands, an MMU (memory management unit) 92A operating in synchronization with an externally supplied clock signal CLK, a first ring memory 93A, a horizontal filtering unit 94A, a line buffer circuit 141, a second ring memory 93B, and a vertical filtering unit 94B.
[0251]
Here, the horizontal filtering unit 94A and the vertical filtering unit 94B have the same configuration as the vertical filtering unit 94 (FIG. 28) according to the fourth embodiment, and are supplied with data and controlled so as to execute the lifting operation described above.
[0252]
From the horizontal filtering unit 94A, the data of the sub-bands 23H and 23L are alternately output for each horizontal line.
[0253]
In the line buffer circuit 141, each of the first line buffer 143 and the second line buffer 144 has a capacity of two horizontal lines. While the selector 142 stores two lines of input data in one of the first line buffer 143 and the second line buffer 144, the demultiplexer 145 reads out the two lines of data stored in the other and outputs them to the second ring memory 93B.
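The alternating use of the two line buffers is a double-buffering (ping-pong) arrangement, which can be sketched as below. The class and method names are hypothetical, and the sketch collapses the concurrent write and read of the hardware into a single method call per two-line period.

```python
class PingPongLineBuffer:
    """Hypothetical model of two line buffers, each holding two horizontal lines."""

    def __init__(self):
        self.banks = [[], []]   # models line buffers 143 and 144
        self.write_bank = 0     # bank currently being filled (selector side)

    def swap_and_store(self, two_lines):
        """Store two lines in the write bank and return the other bank's contents."""
        read_bank = 1 - self.write_bank
        out = self.banks[read_bank]              # read side (demultiplexer)
        self.banks[self.write_bank] = list(two_lines)
        self.write_bank = read_bank              # swap roles for the next period
        return out


# Usage: each call returns the pair of lines stored during the previous call.
buf = PingPongLineBuffer()
r0 = buf.swap_and_store(["H0", "L0"])   # nothing stored yet
r1 = buf.swap_and_store(["H1", "L1"])   # returns the first pair
r2 = buf.swap_and_store(["H2", "L2"])   # returns the second pair
```

Because writing and reading always target different banks, the horizontal stage can keep producing lines while the vertical stage consumes the previous pair.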
[0254]
As described above, according to the configuration of the present modification, two points of composite data can be calculated in parallel within one clock cycle, so that a line-based two-dimensional DWT calculation can be performed at an extremely high speed.
[0255]
[Effect of the invention]
As described above, according to the wavelet transform device of the present invention, the process of normalizing each input data and the transform process of transforming each intermediate data into other intermediate data or output data on one series are repeatedly executed, and at least two of the repeatedly executed processes are executed in parallel within one clock cycle. Therefore, the calculation cycle of the output data can be shortened, and the inverse wavelet transform can be executed in a short time and at high speed.
[0256]
According to the wavelet transform method of the present invention, a step (b) of normalizing input data and converting it into first-stage intermediate data, a step (c) of converting intermediate data into other intermediate data on a series, and a step (d) of converting final-stage intermediate data into output data are repeatedly executed, and at least two of the repeatedly executed steps are executed in parallel within one clock cycle. Therefore, the cycle of calculating output data from the input data sequence can be shortened, and the inverse wavelet transform can be performed in a short time and at high speed.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a schematic configuration of a wavelet transform device according to a first embodiment of the present invention.
FIG. 2 is a schematic configuration diagram of a filtering unit according to the first embodiment.
FIG. 3 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 4 is a diagram schematically showing steps of a lifting operation according to the first embodiment.
FIG. 5 is a diagram schematically showing a step of a lifting calculation according to the first embodiment.
FIG. 6 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 7 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 8 is a diagram schematically illustrating a step of a lifting operation according to the first embodiment.
FIG. 9 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 10 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 11 is a diagram schematically showing a process of synthesizing an image from subbands.
FIG. 12 is a diagram schematically showing two-dimensional image data and a storage area of a ring memory.
FIG. 13 is a diagram schematically showing a storage area of a ring memory.
FIG. 14 is a diagram illustrating a schematic configuration of a wavelet transform device according to a second embodiment of the present invention.
FIG. 15 is a schematic configuration diagram of a filtering unit according to a second embodiment.
FIG. 16 is a diagram schematically showing a lifting calculation process according to the second embodiment.
FIG. 17 is a diagram schematically showing a lifting calculation process according to the second embodiment.
FIG. 18 is a diagram schematically showing a lifting calculation process according to the second embodiment.
FIG. 19 is a diagram schematically illustrating a lifting calculation process according to the second embodiment.
FIG. 20 is a diagram schematically showing two-dimensional image data and a storage area of a ring memory.
FIG. 21 is a diagram schematically showing a storage area of a ring memory.
FIG. 22 is a diagram illustrating a schematic configuration of a filtering unit according to a third embodiment of the present invention.
FIG. 23 is a view schematically showing a step of a lifting operation according to the third embodiment.
FIG. 24 is a diagram schematically showing a lifting calculation process according to the third embodiment.
FIG. 25 is a diagram schematically showing a lifting calculation process according to the third embodiment.
FIG. 26 is a diagram illustrating a schematic configuration of a wavelet transform device according to a modification of the second and third embodiments.
FIG. 27 is a diagram illustrating a schematic configuration of a wavelet transform device according to a fourth embodiment of the present invention.
FIG. 28 is a schematic configuration diagram of a vertical filtering unit according to the fourth embodiment.
FIG. 29 is a diagram schematically showing a lifting calculation process according to the fourth embodiment.
FIG. 30 is a diagram schematically showing a lifting calculation process according to the fourth embodiment.
FIG. 31 is a diagram schematically showing a lifting calculation process according to the fourth embodiment.
FIG. 32 is a diagram schematically showing two-dimensional image data and a storage area of a ring memory.
FIG. 33 is a diagram schematically showing a storage area of a ring memory.
FIG. 34 is a diagram illustrating a schematic configuration of a wavelet transform device according to a modification of the fourth embodiment.
FIG. 35 is a diagram schematically showing a filter bank used in DWT and inverse DWT.
FIG. 36 is a diagram schematically illustrating image data subjected to two-dimensional DWT at a third-order decomposition level.
FIG. 37 is a lattice diagram schematically showing a lifting configuration on the combining side.
FIG. 38 is a diagram schematically illustrating a calculation method recommended by the JPEG2000 system.
FIG. 39 is a diagram schematically showing a step of a lifting calculation.
FIG. 40 is a view schematically showing a step of a lifting operation.
FIG. 41 is a diagram schematically showing a step of a lifting calculation.
FIG. 42 is a view schematically showing a step of a lifting operation.
FIG. 43 is a view schematically showing a step of a lifting operation.
FIG. 44 is a view schematically showing a step of a lifting operation.
FIG. 45 is a diagram schematically showing a lifting calculation process.
FIG. 46 is a view schematically showing a step of a lifting operation.
FIG. 47 is a view schematically showing a step of a lifting operation.
FIG. 48 is a view schematically showing a step of a lifting operation.
[Explanation of symbols]
1 Wavelet transform device
2 MMU (memory management unit)
3A, 3B ring memory
4A, 4B filtering unit
5 Line buffer circuit

Claims

A wavelet transform device that synthesizes band-divided high-frequency component data and low-frequency component data based on a lifting configuration, comprising:
A control unit; and
A filtering unit that takes in an input data sequence, formed by alternately arranging in pixel units a first data sequence composed of one of a high-frequency component and a low-frequency component and a second data sequence composed of the other, and calculates a synthesized output data sequence,
Wherein the filtering unit includes:
Normalization means for executing one or more normalization processes, each of which converts input data into first-stage intermediate data within one clock cycle per point by multiplying each item of the input data sequence by a predetermined normalization coefficient; and
Intermediate data conversion means for executing one or more conversion processes, each of which converts the first-stage intermediate data normalized by the normalization means into intermediate data of one or more further stages within one clock cycle per point, or converts the final-stage intermediate data into output data within one clock cycle per point,
And wherein the control unit:
Causes the normalization means and the intermediate data conversion means to repeatedly execute the one or more normalization processes and the one or more conversion processes until the output data of all points has been calculated, and controls at least two of the one or more normalization processes and the one or more conversion processes so that they are executed in parallel within one clock cycle.
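
The data flow recited in this claim (normalization into first-stage intermediate data, followed by staged conversion into output data) can be sketched in software as below. This is only an illustrative sketch, not the claimed hardware: it runs the stages one after another rather than in parallel clock cycles. The coefficient values (the commonly quoted 9/7 lifting constants), the assignment of K and 1/K to even and odd points, the mirrored boundary handling, and all function names are assumptions that do not appear in the claim, which only speaks of "predetermined" coefficients.

```python
# Assumed coefficients (commonly quoted 9/7 lifting values); not taken from the claim.
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
K = 1.230174105


def mirror(i, n):
    """Reflect an out-of-range index back into 0..n-1 (assumed boundary rule, n >= 2)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift(x, parity, coeff):
    """In place: x[i] += coeff * (x[i-1] + x[i+1]) for every index i of the given parity."""
    n = len(x)
    for i in range(parity, n, 2):
        x[i] += coeff * (x[mirror(i - 1, n)] + x[mirror(i + 1, n)])


def analyze(signal):
    """Forward transform, included only to generate input for the synthesis sketch."""
    x = list(signal)
    lift(x, 1, ALPHA)
    lift(x, 0, BETA)
    lift(x, 1, GAMMA)
    lift(x, 0, DELTA)
    return [v / K if i % 2 == 0 else v * K for i, v in enumerate(x)]


def synthesize(interleaved):
    """Normalization followed by four conversion stages on an interleaved sequence."""
    # Normalization process: input data -> first-stage intermediate data.
    x = [v * K if i % 2 == 0 else v / K for i, v in enumerate(interleaved)]
    # Conversion processes: each stage undoes one forward lifting step.
    lift(x, 0, -DELTA)
    lift(x, 1, -GAMMA)
    lift(x, 0, -BETA)
    lift(x, 1, -ALPHA)
    return x
```

Under these assumptions, `synthesize(analyze(signal))` returns the original samples up to floating-point rounding, because each conversion stage reads only points of the opposite parity and therefore exactly reverses the corresponding forward step.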

2. The wavelet transform device according to claim 1, wherein said normalization means and said intermediate data conversion means execute said normalization processing and said conversion processing in parallel.
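
The parallel execution of normalization and conversion can be pictured with a small cycle-by-cycle simulation. The sketch below is a deliberately simplified model under stated assumptions: the register names `norm_reg` and `delay_reg`, the function `run_pipeline`, the choice of a single two-point conversion on even-numbered points, and the omission of any boundary handling for the first point are all hypothetical and are not taken from the claims.

```python
def run_pipeline(inputs, k_even, k_odd, coeff):
    """Simulate clock cycles in which normalization and conversion overlap.

    Returns the temporary data produced for even-numbered points and a log of
    which operations were active in each cycle.
    """
    norm_reg = None    # normalized value latched in the previous cycle
    delay_reg = None   # normalized value latched one cycle before norm_reg
    temps = []
    log = []
    for cycle in range(len(inputs) + 1):
        ops = []
        held_index = cycle - 1  # index of the point currently held in norm_reg
        new_temp = None
        # Conversion: works on values latched in earlier cycles.
        if norm_reg is not None and delay_reg is not None and held_index % 2 == 0:
            new_temp = norm_reg + coeff * delay_reg
            ops.append("convert")
        # Normalization: works on the input arriving in this cycle.
        new_norm = None
        if cycle < len(inputs):
            k = k_even if cycle % 2 == 0 else k_odd
            new_norm = inputs[cycle] * k
            ops.append("normalize")
        # Clock edge: all registers update together.
        delay_reg, norm_reg = norm_reg, new_norm
        if new_temp is not None:
            temps.append(new_temp)
        log.append(ops)
    return temps, log
```

In this model, a cycle whose log entry contains both `"convert"` and `"normalize"` corresponds to the two kinds of processing running in parallel: the conversion consumes a point normalized one clock earlier while the next input is being normalized.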

The wavelet transform device according to claim 1 or 2, wherein
The normalizing means,
A normalization coefficient multiplier for multiplying each input data by the normalization coefficient,
A delay unit for delaying data output from the normalization coefficient multiplier,
Including
The intermediate data conversion means,
A two-point calculation unit comprising a lifting coefficient multiplier for multiplying one of two intermediate data by a predetermined lifting coefficient, and an adder for adding the data output from the lifting coefficient multiplier and the other of the two intermediate data,
An output destination selection unit that captures data output from the two-point calculation unit and outputs the data to an output destination specified by the control unit;
Including
The wavelet transform device further includes:
A memory management unit,
A memory for temporarily storing data under the control of the memory management unit;
With
The memory management unit,
Controlling the data output from the output destination selection unit to be transferred to and stored in the memory,
A wavelet transform device characterized by the above-mentioned.

The wavelet transform device according to claim 3, wherein
The control unit performs control so that the two-point calculation unit repeatedly executes, as the conversion processes, the following processes until the output data of all points has been calculated:
A first conversion process of calculating second-stage temporary data on the second sequence within one clock cycle per point, by adding first-stage intermediate data on a "sequence starting from input data belonging to the second data sequence" (hereinafter referred to as the second sequence) and data obtained by multiplying, by a predetermined lifting coefficient, first-stage intermediate data on a "sequence starting from input data belonging to the first data sequence" (hereinafter referred to as the first sequence) located one point before the sequence of that intermediate data;
A second conversion process of calculating second-stage intermediate data on the second sequence within one clock cycle per point, by adding the temporary data calculated in the first conversion process and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, first-stage intermediate data on the first sequence located one point after the sequence of the temporary data;
A third conversion process of calculating second-stage temporary data on the first sequence within one clock cycle per point, by adding first-stage intermediate data on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, second-stage intermediate data on the second sequence located one point before the sequence of that intermediate data;
A fourth conversion process of calculating second-stage intermediate data on the first sequence within one clock cycle per point, by adding the temporary data calculated in the third conversion process and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, second-stage intermediate data on the second sequence located one point after the sequence of the temporary data;
A fifth conversion process of calculating (M + 1)-th stage temporary data on the second sequence within one clock cycle per point, by adding M-th stage intermediate data (the number of stages M is an integer of 1 or more) on the second sequence and data obtained by multiplying, by a predetermined lifting coefficient, M-th stage intermediate data on the first sequence located one point before the sequence of that intermediate data;
A sixth conversion process of calculating (M + 1)-th stage intermediate data on the second sequence within one clock cycle per point, by adding the temporary data calculated in the fifth conversion process and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, M-th stage intermediate data on the first sequence located one point after the sequence of the temporary data;
A seventh conversion process of calculating (L + 1)-th stage temporary data on the first sequence within one clock cycle per point, by adding L-th stage intermediate data (the number of stages L is an integer of 1 or more) on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, (L + 1)-th stage intermediate data on the second sequence located one point before the sequence of that intermediate data; and
An eighth conversion process of calculating (L + 1)-th stage intermediate data on the first sequence within one clock cycle per point, by adding the temporary data calculated in the seventh conversion process and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, (L + 1)-th stage intermediate data on the second sequence located one point after the sequence of the temporary data.
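
The pairing of a "temporary data" process with a following "intermediate data" process can be illustrated as follows. The sketch shows one lifting stage computed with two two-point operations per point, with the temporary value held in memory between them. The function names, the dictionary used as memory, and the mirrored treatment of the sequence ends are assumptions made for illustration; the claim does not specify boundary handling.

```python
def two_point(a, b, coeff):
    """One two-point operation: one multiplication by a lifting coefficient and one addition."""
    return a + coeff * b


def stage_with_two_point_ops(x, parity, coeff):
    """Update every point of the given parity using two two-point operations each."""
    n = len(x)  # assumed n >= 2
    out = list(x)
    memory = {}  # stands in for the memory that holds temporary data
    # First pass: temporary data from the point itself and its neighbour one point before.
    for i in range(parity, n, 2):
        before = x[i - 1] if i - 1 >= 0 else x[i + 1]  # mirrored at the edge (assumption)
        memory[i] = two_point(x[i], before, coeff)
    # Second pass: intermediate data from the stored temporary data and the neighbour one point after.
    for i in range(parity, n, 2):
        after = x[i + 1] if i + 1 < n else x[i - 1]  # mirrored at the edge (assumption)
        out[i] = two_point(memory[i], after, coeff)
    return out
```

For example, `stage_with_two_point_ops([1.0, 2.0, 3.0, 4.0, 5.0], 0, 0.5)` gives `[3.0, 2.0, 6.0, 4.0, 9.0]`, the same result as adding `coeff` times the sum of both neighbours in a single step. Splitting the step in two lets each operation use only one multiplier and one adder.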

The wavelet transform device according to claim 4, wherein
The control unit, after causing the two-point calculation unit to execute the first conversion process and the third conversion process, causes the two-point calculation unit to execute the fifth conversion process and the seventh conversion process until the final-stage temporary data is calculated, and then, after causing the two-point calculation unit to execute the second conversion process and the fourth conversion process, causes the two-point calculation unit to execute the sixth conversion process and the eighth conversion process until the output data is calculated.
Wavelet transform device.

The wavelet transform device according to claim 4, wherein
The device comprises four of the two-point calculation units operating independently of each other,
The control unit controls the two-point calculation units so that the following four conversion processes are executed in parallel by the respective two-point calculation units:
The second conversion process of calculating second-stage intermediate data on a sequence belonging to the second data sequence and starting from the P-th input data (data number P is an integer) in the input data sequence;
The third conversion process of calculating second-stage temporary data on a sequence starting from the (P - 1)-th input data;
The sixth conversion process of calculating (M + 1)-th stage intermediate data on a sequence starting from the (P - 4)-th input data; and
The seventh conversion process of calculating (L + 1)-th stage temporary data on a sequence starting from the (P - 5)-th input data,
And so that the following four conversion processes are likewise executed in parallel by the respective two-point calculation units:
The first conversion process of calculating second-stage temporary data on a sequence starting from the (P + 2)-th input data;
The fourth conversion process of calculating second-stage intermediate data on a sequence starting from the (P - 1)-th input data;
The fifth conversion process of calculating (M + 1)-th stage temporary data on a sequence starting from the (P - 2)-th input data; and
The eighth conversion process of calculating (L + 1)-th stage intermediate data on a sequence starting from the (P - 5)-th input data.

The wavelet transform device according to claim 1 or 2, wherein
The normalizing means,
A normalization coefficient multiplier for multiplying each input data by the normalization coefficient,
A delay unit for delaying data output from the normalization coefficient multiplier,
Including
The intermediate data conversion means,
A three-point calculation unit comprising a first adder for adding the first and second of three input data taken in, a lifting coefficient multiplier for multiplying the data output from the first adder by a predetermined lifting coefficient, and a second adder that calculates intermediate data by adding the data output from the lifting coefficient multiplier and the third input data,
An output destination selection unit that takes in the intermediate data output from the three-point calculation unit and outputs the intermediate data to an output destination specified by the control unit;
Including
The memory management unit,
Controlling the intermediate data output from the output destination selection unit to be transferred to and stored in the memory;
Wavelet transform device.

The wavelet transform device according to claim 7, wherein
The control unit performs control so that the three-point calculation unit repeatedly executes, as the conversion processes, the following processes until the output data of all points has been calculated:
A first conversion process of calculating second-stage intermediate data on the second sequence within one clock cycle per point, by adding first-stage intermediate data on a "sequence starting from input data belonging to the second data sequence" (hereinafter referred to as the second sequence) and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of first-stage intermediate data on "sequences starting from input data belonging to the first data sequence" (hereinafter referred to as the first sequence) located one point before and one point after the sequence of that intermediate data;
A second conversion process of calculating second-stage intermediate data on the first sequence within one clock cycle per point, by adding first-stage intermediate data on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of second-stage intermediate data on the second sequence located one point before and one point after the sequence of that first-stage intermediate data;
A third conversion process of calculating (M + 1)-th stage intermediate data on the second sequence within one clock cycle per point, by adding M-th stage intermediate data (the number of stages M is an integer of 1 or more) on the second sequence and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of M-th stage intermediate data on the first sequence located one point before and one point after the sequence of that M-th stage intermediate data; and
A fourth conversion process of calculating (L + 1)-th stage intermediate data on the first sequence within one clock cycle per point, by adding L-th stage intermediate data (the number of stages L is an integer of 1 or more) on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of (L + 1)-th stage intermediate data on the second sequence located one point before and one point after the sequence of that L-th stage intermediate data.
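
A three-point calculation unit of the kind recited in claim 7, and one conversion stage built from it, might be sketched as follows. The function names and the mirrored handling of the sequence ends are illustrative assumptions; only the adder, multiplier, adder arrangement is taken from the claims.

```python
def three_point(first, second, third, coeff):
    """First adder, lifting coefficient multiplier, second adder, in that order."""
    summed = first + second      # first adder
    scaled = coeff * summed      # lifting coefficient multiplier
    return scaled + third        # second adder


def stage_with_three_point_ops(x, parity, coeff):
    """Update every point of the given parity with a single three-point operation."""
    n = len(x)  # assumed n >= 2
    out = list(x)
    for i in range(parity, n, 2):
        before = x[i - 1] if i - 1 >= 0 else x[i + 1]  # mirrored at the edge (assumption)
        after = x[i + 1] if i + 1 < n else x[i - 1]    # mirrored at the edge (assumption)
        out[i] = three_point(before, after, x[i], coeff)
    return out
```

Compared with a unit that handles two points at a time, this arrangement needs no temporary data between operations, at the cost of a second adder and of having both neighbouring points available in the same cycle.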

The wavelet transform device according to claim 8, wherein
The device comprises two of the three-point calculation units operating independently of each other,
The control unit controls the two three-point calculation units so that the following two conversion processes are executed in parallel:
The second conversion process of calculating second-stage intermediate data on a sequence belonging to the first data sequence and starting from the P-th input data (data number P is an integer) in the input data sequence; and
The fourth conversion process of calculating (L + 1)-th stage intermediate data on a sequence starting from the (P - 4)-th input data.

The wavelet transform device according to claim 8 or 9, wherein:
The control unit controls the two three-point calculation units so that the following two conversion processes are executed in parallel:
The first conversion process of calculating intermediate data on a sequence starting from the (P + 3)-th input data (data number P is an integer) in the input data sequence; and
The third conversion process of calculating (M + 1)-th stage intermediate data on a sequence starting from the (P - 1)-th input data.

The wavelet transform device according to claim 8, wherein the control unit performs control so that the first to fourth conversion processes are executed in parallel.

The wavelet transform device according to any one of claims 1 to 11, wherein
The filtering unit includes a first filtering unit and a second filtering unit connected in series,
The first filtering unit takes in the data of the high-frequency component and the low-frequency component that have been band-divided in one of the horizontal direction and the vertical direction, and synthesizes these data to calculate synthesized data in line units,
The second filtering unit calculates synthesized data in the other of the horizontal direction and the vertical direction by processing the synthesized data calculated by the first filtering unit.
Wavelet transform device.

A wavelet transform method for synthesizing band-divided high-frequency component data and low-frequency component data based on a lifting configuration, comprising:
(A) a step of taking in input data from an input data sequence in which a first data sequence composed of one of a high-frequency component and a low-frequency component and a second data sequence composed of the other are alternately arranged in pixel units;
(B) a step of converting each item of the input data taken in the step (a) into first-stage intermediate data within one clock cycle per point by multiplying it by a normalization coefficient; and
(C) a step of converting m-th stage intermediate data (m is an integer of 1 or more) into (m + 1)-th stage intermediate data within one clock cycle per point (where, if the m-th stage intermediate data is the final-stage intermediate data, the (m + 1)-th stage intermediate data is the output data),
Wherein the steps (b) and (c) are repeatedly executed until the output data of all points has been calculated, and the repeatedly executed steps (b) and (c) are performed in parallel within one clock cycle.

The wavelet transform method according to claim 13, wherein
The step (c) comprises:
(C-1) a step of calculating second-stage temporary data on the second sequence within one clock cycle per point, by adding first-stage intermediate data on a "sequence starting from input data belonging to the second data sequence" (hereinafter referred to as the second sequence) and data obtained by multiplying, by a predetermined lifting coefficient, first-stage intermediate data on a "sequence starting from input data belonging to the first data sequence" (hereinafter referred to as the first sequence) located one point before the sequence of that intermediate data;
(C-2) a step of calculating second-stage intermediate data on the second sequence within one clock cycle per point, by adding the temporary data calculated in the step (c-1) and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, first-stage intermediate data on the first sequence located one point after the sequence of the temporary data;
(C-3) a step of calculating second-stage temporary data on the first sequence within one clock cycle per point, by adding first-stage intermediate data on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, second-stage intermediate data on the second sequence located one point before the sequence of that intermediate data;
(C-4) a step of calculating second-stage intermediate data on the first sequence within one clock cycle per point, by adding the temporary data calculated in the step (c-3) and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, second-stage intermediate data on the second sequence located one point after the sequence of the temporary data;
(C-5) a step of calculating (M + 1)-th stage temporary data on the second sequence within one clock cycle per point, by adding M-th stage intermediate data (the number of stages M is an integer of 1 or more) on the second sequence and data obtained by multiplying, by a predetermined lifting coefficient, M-th stage intermediate data on the first sequence located one point before the sequence of that intermediate data;
(C-6) a step of calculating (M + 1)-th stage intermediate data on the second sequence within one clock cycle per point, by adding the temporary data calculated in the step (c-5) and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, M-th stage intermediate data on the first sequence located one point after the sequence of the temporary data;
(C-7) a step of calculating (L + 1)-th stage temporary data on the first sequence within one clock cycle per point, by adding L-th stage intermediate data (the number of stages L is an integer of 1 or more) on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, (L + 1)-th stage intermediate data on the second sequence located one point before the sequence of that intermediate data; and
(C-8) a step of calculating (L + 1)-th stage intermediate data on the first sequence within one clock cycle per point, by adding the temporary data calculated in the step (c-7) and stored in the memory and data obtained by multiplying, by a predetermined lifting coefficient, (L + 1)-th stage intermediate data on the second sequence located one point after the sequence of the temporary data,
Wherein the steps (c-1) to (c-8) are controlled so as to be repeatedly executed until the output data of all points has been calculated.

The wavelet transform method according to claim 14,
Wherein, after the steps (c-1) and (c-3) are executed, the steps (c-5) and (c-7) are executed until the temporary data of the output data is calculated,
And then, after the steps (c-2) and (c-4) are executed, the steps (c-6) and (c-8) are executed until the output data is calculated.

The wavelet transform method according to claim 14,
Wherein control is performed so that the following four steps are executed in parallel by respective two-point calculation units:
The step (c-2) of calculating second-stage intermediate data on a sequence belonging to the second data sequence and starting from the P-th input data (data number P is an integer) in the input data sequence;
The step (c-3) of calculating second-stage temporary data on a sequence starting from the (P - 1)-th input data;
The step (c-6) of calculating (M + 1)-th stage intermediate data on a sequence starting from the (P - 4)-th input data; and
The step (c-7) of calculating (L + 1)-th stage temporary data on a sequence starting from the (P - 5)-th input data,
And so that the following four steps are likewise executed in parallel:
The step (c-1) of calculating second-stage temporary data on a sequence starting from the (P + 2)-th input data;
The step (c-4) of calculating second-stage intermediate data on a sequence starting from the (P - 1)-th input data;
The step (c-5) of calculating (M + 1)-th stage temporary data on a sequence starting from the (P - 2)-th input data; and
The step (c-8) of calculating (L + 1)-th stage intermediate data on a sequence starting from the (P - 5)-th input data.

The wavelet transform method according to claim 13, wherein
The step (c) comprises:
(C-1) a step of calculating second-stage intermediate data on the second sequence within one clock cycle per point, by adding first-stage intermediate data on a "sequence starting from input data belonging to the second data sequence" (hereinafter referred to as the second sequence) and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of first-stage intermediate data on "sequences starting from input data belonging to the first data sequence" (hereinafter referred to as the first sequence) located one point before and one point after the sequence of that intermediate data;
(C-2) a step of calculating second-stage intermediate data on the first sequence within one clock cycle per point, by adding first-stage intermediate data on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of second-stage intermediate data on the second sequence located one point before and one point after the sequence of that intermediate data;
(C-3) a step of calculating (M + 1)-th stage intermediate data on the second sequence within one clock cycle per point, by adding M-th stage intermediate data (the number of stages M is an integer of 1 or more) on the second sequence and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of M-th stage intermediate data on the first sequence located one point before and one point after the sequence of that M-th stage intermediate data; and
(C-4) a step of calculating (L + 1)-th stage intermediate data on the first sequence within one clock cycle per point, by adding L-th stage intermediate data (the number of stages L is an integer of 1 or more) on the first sequence and data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of (L + 1)-th stage intermediate data on the second sequence located one point before and one point after the sequence of that L-th stage intermediate data,
Wherein the steps (c-1) to (c-4) are repeatedly executed until the output data of all points has been calculated.

The wavelet transform method according to claim 17, wherein
Wherein control is performed so that the following two steps are executed in parallel:
The step (c-2) of calculating second-stage intermediate data on a sequence belonging to the first data sequence and starting from the P-th input data (data number P is an integer) in the input data sequence; and
The step (c-4) of calculating (L + 1)-th stage intermediate data on a sequence starting from the (P - 4)-th input data.

A wavelet transform method according to claim 17 or claim 18, wherein
Wherein control is performed so that the following two steps are executed in parallel:
The step (c-1) of calculating intermediate data on a sequence starting from the (P + 3)-th input data (data number P is an integer) in the input data sequence; and
The step (c-3) of calculating (M + 1)-th stage intermediate data on a sequence starting from the (P - 1)-th input data.

The wavelet transform method according to claim 17, wherein the steps (c-1) to (c-4) are performed in parallel.

The wavelet transform method according to any one of claims 13 to 20, wherein
For two-dimensional image data band-divided into a low-frequency component and a high-frequency component, the steps (a) to (c) are applied in line units in one of the horizontal direction and the vertical direction of the two-dimensional image data to calculate a synthesized data sequence, and the steps (a) to (c) are then applied to the calculated synthesized data sequence in the other of the horizontal direction and the vertical direction.
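
The two-pass application to two-dimensional data can be sketched as below. The function `synthesize_2d` and its parameter `synthesize_1d` are hypothetical names; `synthesize_1d` stands for any one-dimensional routine implementing the steps (a) to (c), and processing the horizontal direction first is an assumption, since the claim allows either order.

```python
def synthesize_2d(rows, synthesize_1d):
    """Apply a one-dimensional synthesis along each row, then along each column."""
    # First direction: one line (row) at a time.
    horizontal = [synthesize_1d(list(row)) for row in rows]
    # Second direction: transpose so that each column becomes a line, then process it.
    columns = [synthesize_1d(list(col)) for col in zip(*horizontal)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*columns)]
```

With a toy stand-in such as `lambda line: [line[0]] + [line[i] + line[i - 1] for i in range(1, len(line))]`, the input `[[1, 2], [3, 4]]` becomes `[[1, 3], [4, 10]]`, which shows the second pass operating on the output of the first.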