JP2004302510A

JP2004302510A - Data processing device

Info

Publication number: JP2004302510A
Application number: JP2003091251A
Authority: JP
Inventors: Yukio Kadowaki; 幸男門脇
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2003-03-28
Filing date: 2003-03-28
Publication date: 2004-10-28

Abstract

PROBLEM TO BE SOLVED: To provide a data processing device, in particular one used for two-dimensional discrete wavelet transformation, that reduces the number of computing elements used for the wavelet lifting computation while keeping the total processing time as short as possible.

SOLUTION: The data processing device comprises one or more computing elements that, within a period in which computational data can be read from a memory once and computation result data can be written to the memory once, feed back the computation result data obtained from the input computational data, perform the lifting computation, and output the computation result data.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、リフティング方式による演算を行うデータ処理装置、特にＪＰＥＧ２０００で用いられているウェーブレット変換を実行するデータ処理装置に関する。
【０００２】
【従来の技術】
近年、高精細画像の圧縮伸張方式として、ＪＰＥＧ２０００が注目されている。当該ＪＰＥＧ２０００では、画像データに対して２次元離散ウェーブレット変換を行うが、当該ウェーブレット変換は、ＪＰＥＧ２０００の標準で定められているリフティング方式による演算（ウェーブレットリフティング演算という）を行うと演算量が節約される。
【０００３】
上記ウェーブレットリフティング演算を行うデータ処理装置は、例えば、以下の特許文献１に開示されている。
【０００４】
【特許文献１】
特開２００２−１９７０７５号公報
【０００５】
図５は、ウェーブレットリフティング演算を行う従来のデータ処理装置２００の回路構成を示す図である。データ処理装置２００は、ＩＲ９：７フィルタを用いたウェーブレット変換、即ち、低域ウェーブレット変換係数の演算に９つの連続する画素の画像データ、高域ウェーブレット変換係数の演算に７つの連続する画素の画像データを用いてウェーブレットリフティング演算を行う。
【０００６】
演算器２１０、２２０、２３０及び２４０は、それぞれ同様の構成で成り、それぞれ注目画素の画像データに、前後して入力される画素の画像データを規定数倍（重み付け）したデータを加算したデータを出力する。
【０００７】
演算器２１０は、２つの加算器２１０ａ及び２１０ｃと１つの乗算器２１０ｂで構成される。演算器２１０の端子Ａには、図示しないメモリより読み出したＸ（２ｎ＋２）番目の画素の画像データが入力され、端子Ｂには、レジスタ２１１及び２１２により一時的に記録されていたＸ（２ｎ）番目の画素の画像データが入力され、端子Ｃには、レジスタ２１１により一時的に記録されていたＸ（２ｎ＋１）番目の画素の画像データが入力される。演算器２１０内部では、加算器２１０ａが端子Ａ及び端子Ｂに入力されたＸ（２ｎ＋２）番目及びＸ（２ｎ）番目の画素の画像データの和を求め、乗算器２１０ｂが上記加算器２１０ａの出力をα倍し、更に、加算器２１０ｃが上記乗算器２１０ｂがα倍した値に端子Ｃに入力されたＸ（２ｎ＋１）番目の注目画素の画像データを加算し、当該加算した値をＹ（２ｎ＋１）番目のステップＳ１のウェーブレットリフティング演算結果のデータとして出力する。
【０００８】
演算器２２０の端子Ａには、上記演算器２１０において求めたＹ（２ｎ＋１）番目のステップＳ１のウェーブレットリフティング検算結果のデータが入力され、端子Ｂには、レジスタ２２１により一時的に記録されていたＹ（２ｎ−１）番目のデータが入力され、端子Ｃには、前段の演算器２１０の端子Ｂに入力されたのと同じＸ（２ｎ）番目の画素の画像データが入力される。演算器２２０は、端子Ａ及び端子Ｂに入力されたＹ（２ｎ＋１）番目及びＹ（２ｎ−１）番目のデータの和をβ倍し、更に、当該β倍した値に端子Ｃに入力されたＸ（２ｎ）番目の注目画素の画像データを加算した値をＹ（２ｎ）番目のステップＳ２のウェーブレットリフティング演算結果のデータとして出力する。
【０００９】
演算器２３０の端子Ａには、上記演算器２２０において求めたＹ（２ｎ）番目のステップＳ２のウェーブレットリフティング検算結果のデータが入力され、端子Ｂには、レジスタ２３１により一時的に記録されていたＹ（２ｎ−２）番目のデータが入力され、端子Ｃには、前段の演算器２１０の端子Ｂに入力されたのと同じＹ（２ｎ−１）番目のデータが入力される。演算器２３０は、端子Ａ及び端子Ｂに入力されたＹ（２ｎ）番目及びＹ（２ｎ−２）番目のデータの和をγ倍し、更に、端子Ｃに入力されたＹ（２ｎ−１）番目の注目データを加算した値をＹ（２ｎ−１）番目のステップＳ３のウェーブレットリフティング演算結果のデータとして出力する。
【００１０】
演算器２４０の端子Ａには、上記演算器２３０において求めたＹ（２ｎ−１）番目のステップＳ３のウェーブレットリフティング検算結果のデータが入力され、端子Ｂには、レジスタ２４１により一時的に記録されていたＹ（２ｎ−３）番目のデータが入力され、端子Ｃには、前段の演算器２３０の端子Ｂに入力されたのと同じＹ（２ｎ−２）番目のデータが入力される。演算器２４０は、端子Ａ及び端子Ｂに入力されたＹ（２ｎ−１）番目及びＹ（２ｎ−３）番目のデータの和をσ倍し、更に、端子Ｃに入力されたＹ（２ｎ−２）番目の注目データを加算した値をステップＳ４のＹ（２ｎ−２）番目のウェーブレットリフティング演算結果のデータとして出力する。
【００１１】
上記演算器２３０より出力されるＹ（２ｎ−１）番目のステップＳ３のウェーブレットリフティング演算結果のデータを乗算器２３２により−Ｋ倍した値をウェーブレット係数のハイパス成分（ステップＳ５のウェーブレットリフティング演算結果）−Ｋ・Ｙ（２ｎ−１）として図示しないメモリに書き込む。
【００１２】
また、上記演算器２４０より出力されるＹ（２ｎ−２）番目のステップＳ４のウェーブレットリフティング演算結果のデータを乗算器２４２により１／Ｋ倍した値をウェーブレット係数のローパス成分（ステップＳ６のウェーブレットリフティング演算結果）１／Ｋ・Ｙ（２ｎ−２）として図示しないメモリに書き込む出力する。
【００１３】
上記構成のデータ処理装置２００では、例えば、Ｘ１番目の画素の画像データについてステップＳ１の演算が完了し、当該ステップＳ１の演算により求めたデータを用いてステップＳ２の演算を行う際、次のＸ２番目の画素の画像データについてステップＳ１の演算を実行することができる。同様のことが、他のステップＳ２，Ｓ３，Ｓ４においていえる。
【００１４】
図６（ａ）は、データ処理装置２００の最高のスループットを示す図である。上部に記すｔ１〜ｔ６は、駆動クロック信号の１サイクル期間を表す。図示するように、上記構成のデータ処理装置２００では、４サイクルでステップＳ１のウェーブレットリフティング演算結果から最終のステップＳ６のウェーブレットリフティング演算結果を求めることができる。また、一旦、演算を開始した後は、間を置かずに次々と次の画素の画像データについての最終のステップＳ６のウェーブレットリフティング演算結果を求めることができる。
【００１５】
【発明が解決しようとする課題】
メモリからの演算用の画像データの読み出しには駆動クロック信号１サイクル分の時間を要するが、演算結果のデータのメモリへの書き込みにも１サイクル分の時間を要する。このため、画像データは１サイクル置きにしか読み出すことができない。このため、実際には図６（ｂ）に示すように、１サイクル置きにしか次の画素の画像データについての最終ステップＳ６のウェーブレットリフティング演算結果を求めることしかできない。
【００１６】
即ち、従来のデータ処理装置２００では、より高速な処理ができるにもかかわらず、上記演算用の画像データのメモリからの読み出し及び演算結果のデータのメモリへの書き込みに関する制約により、半分の処理効率で動作していた。
【００１７】
また、従来のデータ処理装置２００では、各ステップの演算を専用の４個の演算器で実行するため、規模が大きいという問題があった。１個の演算器を時分割処理により共用すれば、使用する演算器の個数を減らすことはできるが、この場合、処理時間が４倍以上に延びてしまう。
【００１８】
本発明は、上記２次元離散ウェーブレット変換を行う際に用いるデータ処理装置であって、ウェーブレットリフティング演算を行うのに使用する演算器の数を少なくすると共に、全体の処理時間をできるだけ短くできるデータ処理装置を提供することを目的とする。
【００１９】
【課題を解決するための手段】
本発明の第１のデータ処理装置は、メモリからの演算用のデータの読み出し及びメモリへの演算結果のデータの書き込みが各１回できる期間に、入力された演算用のデータに対して行った演算結果のデータを帰還させてリフティング方式の演算を行い、当該演算結果のデータを出力する演算器を、１個以上備える事を特徴とする。
【００２０】
本発明の第２のデータ処理装置は、上記第１のデータ処理装置において、上記演算器を２個以上備える場合において、前段に位置する演算器の出力端子と後段に位置する演算器の信号入力端子の間に、上記メモリへからのデータの読み出し及びメモリへのデータの書き込みが各１回できる期間の内の一定時間だけ開くゲートを備えることを特徴とする。
【００２１】
本発明の第３のデータ処理装置は、上記第２のデータ処理装置において、上記演算器を３個以上備え、各演算器の間に上記ゲートを２個以上備える場合において、信号入力側から見て奇数番目に位置するゲートと偶数番目に位置するゲートの開閉のタイミングが反対に設定されていることを特徴とする。
【００２２】
本発明の第４のデータ処理装置は、上記第２のデータ処理装置において、ＪＰＥＧ２０００の準拠したウェーブレット変換を実行することを特徴とする。
【００２３】
本発明の第５のデータ処理装置は、上記第４のデータ処理装置において、画像データの他、１以上の各レベルのＬＬのサブバンドのウェーブレット係数のデータが混在したデータを演算用のデータとして処理することを特徴とする。
【００２４】
本発明の第６のデータ処理装置は、上記第４のデータ処理装置において、画像データを色変換して得られる３つの色成分のデータに変換したデータであって、各色成分のデータが混在したデータを演算用のデータとして処理することを特徴とする。
【００２５】
【発明の実施の形態】
図１は、ウェーブレットリフティング演算を実行するデータ処理装置１００の回路図である。当該データ処理装置１００は、駆動クロック信号の２サイクルに一度、画像データＸを読み込む通常のメモリシステムに対して、僅か２個の演算器（従来は４個、図５を参照）により最高のスループット、具体的には、上記従来技術の欄で説明した図６（ｂ）に示すスループットが得られるように設計したものである。
【００２６】
以下、データ処理装置１００の構成について説明した後、実際の動作について駆動クロック信号２個分のタイミング（タイミングＡ及びタイミングＢ）に分けて説明する。タイミングＡは、駆動クロック信号の奇数サイクルの期間をいう。また、タイミングＢは、駆動クロック信号の偶数サイクルの期間をいう。
【００２７】
図示しないメモリから読み出されて、データ処理装置１００に入力される演算用の画像データは、マルチプレクサ（図中、ＭＵＸと記す）１１１の信号入力端子１１１ｂに入力される他、レジスタ１１２を介してマルチプレクサ１１５の信号入力端子１１５ｂに入力され、更にレジスタ１１３を介してマルチプレクサ１１４の信号入力端子１１４ｂに入力される。
【００２８】
マルチプレクサ１１１のもう一方の信号入力端子１１１ａには、演算器１１０の出力端子Ｄより帰還してきた信号が入力される。マルチプレクサ１１４のもう一方の信号入力端子１１４ａには、演算器１１０の出力端子Ｄより帰還してきた信号がレジスタ１１７により一時的に記録された後に入力される。マルチプレクサ１１５のもう一方の信号入力端子１１５ａには、マルチプレクサ１１４の出力信号が入力される。
【００２９】
マルチプレクサ１１１，１１４及び１１５は、タイミングＡで図中、下側の入力端子１１１ｂ、１１４ｂ及び１１５ｂに入力される信号を通過させ、タイミングＢで上側の入力端子１１１ａ、１１４ａ及び１１５ａに入力される信号を通過させる。
【００３０】
演算器１１０は、２つの加算器１１０ａ及び１１０ｃと駆動クロック信号の１サイクル毎にα又はβに倍率が切り換る１つの乗算器１１０ｂで構成される。演算器１１０内で、加算器１１０ａ及び乗算器１１０ｂは、端子Ａ及び端子Ｂに入力される注目画素の両隣の画像データの和をタイミングＡでα倍し、タイミングＢでβ倍する。加算器１１０ｃは、上記α倍又はβ倍された値に端子Ｃより入力される注目画素の画像データを加算して出力する。倍率の切換は駆動クロック信号に応じて行う。
【００３１】
演算器１１０の出力端子Ｄは、タイミングＡで閉じ、タイミングＢで開くゲート１２１に接続される。ゲート１２１の出力は、入力された信号を駆動クロック信号１サイクル分だけ遅延して出力する遅延回路１２２に接続されている。遅延回路１２２の出力は、マルチプレクサ１２４の信号入力端子１２３ｂに接続される他、レジスタ１２４を介してマルチプレクサ１２５の信号入力端子１２５ｂに接続される。マルチプレクサ１２３の残りの信号入力端子１２３ａは、演算器１２０の出力端子Ｄより帰還してきた信号が入力される。また、マルチプレクサ１２５のもう一方の信号入力端子１２５ａは、演算器１２０の出力端子Ｄより帰還してきた信号がレジスタ１２８を介して入力される。また、マルチプレクサ１２７の信号入力端子１２７ａには、上記マルチプレクサ１２５の出力信号が入力され、もう一方の信号入力端子１２７ｂには、マルチプレクサ１１４より出力された信号が遅延回路１２６において駆動クロック信号１サイクル分だけ遅延された状態で入力される。
【００３２】
マルチプレクサ１２３、１２５及び１２７は、タイミングＡで図中、上側の入力端子１２３ａ、１２５ａ及び１２７ａに入力される信号を通過させ、タイミングＢで下側の入力端子１２３ｂ、１２５ｂ及び１２７ｂに入力される信号を通過させる。
【００３３】
演算器１２０は、演算器１１０と同様の構成で成り、端子Ａ及び端子Ｂに入力される注目データの前後のデータの和をタイミングＡでσ倍し、タイミングＢでγ倍した値に端子Ｃより入力される注目データの値を加算して出力する。
【００３４】
演算器１２０の出力端子Ｄは、マルチプレクサ１２９の信号入力端子１２９ａに接続される。マルチプレクサ１２９は、タイミングＡで出力端子１２９ｃから乗算器１３１に信号を出力し、タイミングＢで出力端子１２９ｂから乗算器１３０に信号を出力する。
【００３５】
以下、上記構成のデータ処理装置１００がタイミングＡにおいて実行する処理内容について説明する。図２は、図１に示したデータ処理装置１００の回路図から、タイミングＡにおいて有効に働いている構成要素のみを抽出して表示したものである。
【００３６】
タイミングＡでは、前段の演算器１１０においてステップＳ１のウェーブレットリフティング演算が実行されると共に、後段の演算器１２０においてステップＳ４及びステップＳ６のウェーブレットリフティング演算が実行される。
【００３７】
タイミングＡにおいて、演算器１１０の信号入力端子Ｃには、Ｘ（２ｎ＋１）番目の注目画素の画像データが入力され、信号入力端子Ａには、注目画素の前に位置するＸ（２ｎ＋２）番目の画素の画像データが入力され、信号入力端子Ｂには、注目画素の後に位置するＸ（２ｎ）番目の画素の画像データが入力される。演算器１１０は、Ｙ（２ｎ＋１）番目のデータのステップＳ１のウェーブレットリフティング演算のデータとしてＹ（２ｎ＋１）＝Ｘ（２ｎ＋１）＋α（Ｘ（２ｎ）＋Ｘ（２ｎ＋２））を求め、求めたデータを出力する。
【００３８】
一方で、ゲート１２１により演算器１１０からのデータ入力が遮断されている後段の演算器１２０では、信号入力端子Ｃに注目データとしてＹ（２ｎ−２）番目のデータが入力され、信号入力端子Ａには、注目データの前に位置するＹ（２ｎ−１）番目のデータが入力され、信号入力端子Ｂには、注目データの後に位置するＹ（２ｎ−３）番目のデータが入力される。なお、上記信号入力端子Ａに入力されるデータは、後に説明するタイミングＢにおいて当該演算器１２０から出力されたステップＳ３のウェーブレットリフティング演算結果のデータである。演算器１２０は、Ｙ（２ｎ−２）番目のステップＳ４のウェーブレットリフティング演算のデータとしてＹ（２ｎ−２）＝Ｙ（２ｎ−２）＋σ（Ｙ（２ｎ−１）＋Ｙ（２ｎ−３））を求め、求めたデータを出力する。
【００３９】
更に、演算器１２０より出力されたデータは、ステップＳ６のウェーブレットリフティング演算として乗算器１３１により１／Ｋ倍され、ウェーブレット係数のハイパス成分のデータ（演算結果のデータ）として図示しないメモリに書き込まれる。
【００４０】
引き続き、データ処理装置１００がタイミングＢにおいて実行する処理内容について説明する。図３は、図１に示したデータ処理装置１００の回路図から、タイミングＢにおいて有効に働いている構成要素のみを抽出して表示したものである。
【００４１】
タイミングＢでは、前段の演算器１１０においてステップＳ２のウェーブレットリフティング演算が実行されると共に、後段の演算器１２０においてステップＳ３及びステップＳ５のウェーブレットリフティング演算が実行される。
【００４２】
タイミングＢにおいて、演算器１１０の信号入力端子Ｃには、直前のタイミングＡで注目画素の後に位置していたＸ（２ｎ）番目の画素の画像データが注目データとして入力され、信号入力端子Ａには、注目データの前に位置するＹ（２ｎ＋１）番目のデータが入力され、信号入力端子Ｂには、注目データの後に位置するＹ（２ｎ−１）番目のデータが入力される。演算器１１０は、Ｙ（２ｎ）番目のステップＳ２のウェーブレットリフティング演算のデータとしてＹ（２ｎ）＝Ｘ（２ｎ）＋β（Ｙ（２ｎ−１）＋Ｙ（２ｎ＋１））を求め、求めたデータを出力する。
【００４３】
タイミングＢではゲート１２１は開いており、演算器１１０より出力されたＹ（２ｎ）番目のデータは、遅延回路１２２により駆動クロック信号１サイクル分だけ遅延された後に、注目データの１つ前のデータとして演算器１２０の信号入力端子Ａに入力される。信号入力端子Ｂには、注目データの１つ後のデータとしてＹ（２ｎ−２）番目のデータが入力される。信号入力端子Ｃには、注目データとしてＹ（２ｎ−１）番目のデータが入力される。演算器１２０は、Ｙ（２ｎ−１）番目のステップＳ３のウェーブレットリフティング演算のデータとしてＹ（２ｎ−１）＝Ｙ（２ｎ−１）＋γ（Ｙ（２ｎ−２）＋Ｙ（２ｎ））を求め、求めたデータを出力する。
【００４４】
演算器１２０より出力されたデータは、更に、乗算器１３０により−Ｋ倍され、ステップＳ５のウェーブレットリフティング演算により得られるウェーブレット係数のローパス成分のデータ（演算結果のデータ）として図示しないメモリに書き込まれる。
【００４５】
図４は、上記構成のデータ処理装置１００において、実現される最高のスループットを示す図である。上部に記すｔ１〜ｔ６は、駆動クロック信号の１サイクルを表す。図示するように、データ処理装置１００では、４サイクルでステップＳ１のウェーブレットリフティング演算結果から最終のステップＳ６のウェーブレットリフティング演算結果を求めることができる。また、一旦、演算を開始した後は、駆動クロック信号１サイクル置きに次の画素の画像データについての最終のステップＳ６のウェーブレットリフティング演算結果を求めることができる。
【００４６】
上記図４は、図５を参照しつつ従来技術の欄において説明したデータ処理装置２００の実現し得るスループット（図６（ｂ）を参照）と同じである。即ち、データ処理装置１００では、従来のデータ処理装置２００と比べて演算器の数を半分の２個にして回路規模を小さくしたにも拘わらず、全く同じスループットを得ることができるといった効果を奏することができるのである。これは、ウェーブレットリフティング演算結果のデータを図示しないメモリに書き込んでいる間に、２つの演算器を休ませることなく、次のステップのウェーブレットリフティング演算を実行させることで、処理速度を２倍に高めているためである。
【００４７】
なお、データ処理装置１００では、ある画素の画像データに対するステップＳ１及びＳ２の演算終了後、上記処理した画素の画像データと全く関係ない別の画素の画像データを入力することができる。従って、全画像データをデータ処理装置１００で処理してレベル１のウェーブレット係数に変換した後に、１ＬＬのサブバンドの全データをデータ処理装置１００で処理する以外に、画像データの処理がある程度進んだ段階で、１ＬＬのサブバンドのウェーブレット係数の処理を行いレベル２のウェーブレット係数を求めても良い。更に高いレベルについても同様である。このように、データ処理装置１００では、画像データの他、１以上の各レベルのＬＬのサブバンドのウェーブレット係数のデータが混在したデータを演算用のデータとして処理することができ、これにより、所望のレベルのウェーブレット係数がより迅速に得られるといった利点を有する。
【００４８】
また、画像データをＹ，Ｃｒ，Ｃｂの３つの色成分に変換した後、各色成分のデータを一塊としてデータ処理装置１００において処理しても良いし、Ｙ成分のデータ、Ｃｒ成分のデータ、Ｃｂ成分のデータの順に並列に処理しても良い。このように、データ処理装置１００では、画像データを色変換して得られる３つの色成分のデータに変換したデータであって、各色成分のデータが混在したデータを演算用のデータとして処理することができ、これにより、所望の色成分のウェーブレット係数がより迅速に得られるといった利点を有する。
【００４９】
以上、ＩＲ９：７フィルタ処理を行うデータ処理装置１００について説明したが、メモリからの演算用のデータの読み出し及びメモリへの演算結果のデータの書き込みが各１回できる期間に、入力された演算用のデータに対して行った演算結果のデータを帰還させてリフティング方式の演算を行い、当該演算結果のデータを出力する演算器を、１個以上備える事を基本概念とする本発明のデータ処理装置の構成を適用すれば、少ない演算器で迅速な処理の可能なＩＲ５：３フィルタや、デコーダを構成することもできる。
【００５０】
【発明の効果】
本発明のデータ処理装置は、メモリから演算用のデータを読み出し、及び、演算結果のデータをメモリに書き込むのに要する時間内に、最初の演算結果のデータを帰還させてリフティング方式による演算を行うことにより、駆動クロック信号１サイクル毎に１つの演算を連続して行うタイプのデータ処理回路に比べて半分の演算器の数で同じスループットを実現することができる。
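[Brief description of the drawings]
FIG. 1 is a circuit diagram of the data processing device.
FIG. 2 shows the components of the data processing device that operate at timing A.
FIG. 3 shows the components of the data processing device that operate at timing B.
FIG. 4 shows the throughput of the data processing device.
FIG. 5 is a circuit diagram of a conventional data processing device.
FIG. 6(a) shows the theoretical maximum throughput of the conventional data processing device, and FIG. 6(b) shows the throughput actually obtained under the constraints on the timing of memory reads and writes.
[Explanation of reference numerals]
100: data processing device; 110, 120: arithmetic units; 111, 114, 115, 123, 125, 127: multiplexers; 112, 113, 117, 124, 128: registers; 122, 126: delay circuits
[0001]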
[Technical field of the invention]
The present invention relates to a data processing device that performs an operation according to a lifting method, and particularly to a data processing device that performs a wavelet transform used in JPEG2000.
[0002]
[Prior art]
In recent years, JPEG2000 has attracted attention as a compression and decompression method for high-definition images. JPEG2000 applies a two-dimensional discrete wavelet transform to the image data, and the amount of computation is reduced when this wavelet transform is carried out by the lifting-based operation defined in the JPEG2000 standard (referred to as the wavelet lifting operation).
[0003]
A data processing device that performs the wavelet lifting operation is disclosed in, for example, Patent Document 1 below.
[0004]
[Patent Document 1]
JP-A-2002-197075
[0005]
FIG. 5 shows the circuit configuration of a conventional data processing device 200 that performs the wavelet lifting operation. The data processing device 200 performs a wavelet transform using an IR9:7 filter; that is, it performs the wavelet lifting operation using the image data of nine consecutive pixels to calculate each low-pass wavelet transform coefficient and the image data of seven consecutive pixels to calculate each high-pass wavelet transform coefficient.
[0006]
The arithmetic units 210, 220, 230 and 240 all have the same configuration. Each outputs the image data of the pixel of interest plus the image data of the pixels input immediately before and after it, multiplied by a prescribed factor (weighting).
[0007]
The arithmetic unit 210 consists of two adders 210a and 210c and one multiplier 210b. The image data of the X(2n+2)-th pixel, read from a memory (not shown), is input to terminal A of the arithmetic unit 210; the image data of the X(2n)-th pixel, temporarily held by the registers 211 and 212, is input to terminal B; and the image data of the X(2n+1)-th pixel, temporarily held by the register 211, is input to terminal C. Inside the arithmetic unit 210, the adder 210a computes the sum of the image data of the X(2n+2)-th and X(2n)-th pixels input to terminals A and B, the multiplier 210b multiplies the output of the adder 210a by α, and the adder 210c adds the image data of the X(2n+1)-th pixel of interest input to terminal C to the α-multiplied value. The result is output as the Y(2n+1)-th data, the wavelet lifting operation result of step S1.
[0008]
The Y(2n+1)-th data, the wavelet lifting operation result of step S1 obtained by the arithmetic unit 210, is input to terminal A of the arithmetic unit 220; the Y(2n-1)-th data, temporarily held by the register 221, is input to terminal B; and the same image data of the X(2n)-th pixel that was input to terminal B of the preceding arithmetic unit 210 is input to terminal C. The arithmetic unit 220 multiplies the sum of the Y(2n+1)-th and Y(2n-1)-th data input to terminals A and B by β, adds the image data of the X(2n)-th pixel of interest input to terminal C, and outputs the result as the Y(2n)-th data, the wavelet lifting operation result of step S2.
[0009]
The Y(2n)-th data, the wavelet lifting operation result of step S2 obtained by the arithmetic unit 220, is input to terminal A of the arithmetic unit 230; the Y(2n-2)-th data, temporarily held by the register 231, is input to terminal B; and the same Y(2n-1)-th data that was input to terminal B of the preceding arithmetic unit 220 is input to terminal C. The arithmetic unit 230 multiplies the sum of the Y(2n)-th and Y(2n-2)-th data input to terminals A and B by γ, adds the Y(2n-1)-th data of interest input to terminal C, and outputs the result as the Y(2n-1)-th data, the wavelet lifting operation result of step S3.
[0010]
The Y(2n-1)-th data, the wavelet lifting operation result of step S3 obtained by the arithmetic unit 230, is input to terminal A of the arithmetic unit 240; the Y(2n-3)-th data, temporarily held by the register 241, is input to terminal B; and the same Y(2n-2)-th data that was input to terminal B of the preceding arithmetic unit 230 is input to terminal C. The arithmetic unit 240 multiplies the sum of the Y(2n-1)-th and Y(2n-3)-th data input to terminals A and B by σ, adds the Y(2n-2)-th data of interest input to terminal C, and outputs the result as the Y(2n-2)-th data, the wavelet lifting operation result of step S4.
[0011]
The Y(2n-1)-th data, the wavelet lifting operation result of step S3 output from the arithmetic unit 230, is multiplied by -K by the multiplier 232, and the resulting value -K·Y(2n-1) is written to a memory (not shown) as the high-pass component of the wavelet coefficients (the wavelet lifting operation result of step S5).
[0012]
Likewise, the Y(2n-2)-th data, the wavelet lifting operation result of step S4 output from the arithmetic unit 240, is multiplied by 1/K by the multiplier 242, and the resulting value (1/K)·Y(2n-2) is written to a memory (not shown) as the low-pass component of the wavelet coefficients (the wavelet lifting operation result of step S6).
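The six steps S1 to S6 described above can be sketched in software as follows. This is only an illustrative sketch, not the circuit of FIG. 5: the function names are hypothetical, the numeric coefficients are the values commonly quoted for the JPEG 2000 irreversible 9/7 filter (the text above names α, β, γ, σ and K but gives no values), and the mirrored handling of the signal boundaries is an assumption, since the text does not discuss boundaries.

```python
# Coefficients of the JPEG 2000 irreversible 9/7 filter (assumed; the patent
# text names alpha, beta, gamma, sigma and K but does not give their values).
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
SIGMA = 0.443506852
K = 1.230174105


def reflect(i, n):
    """Mirror an out-of-range index back into 0..n-1 (assumed boundary rule)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift(y, start, coef):
    """One lifting step: y[i] += coef * (y[i-1] + y[i+1]) for i = start, start+2, ..."""
    n = len(y)
    for i in range(start, n, 2):
        y[i] = y[i] + coef * (y[reflect(i - 1, n)] + y[reflect(i + 1, n)])


def forward_97(x, alpha=ALPHA, beta=BETA, gamma=GAMMA, sigma=SIGMA, k=K):
    """Apply steps S1 to S6 to a signal of at least two samples."""
    y = [float(v) for v in x]
    lift(y, 1, alpha)   # S1: odd samples, weighted sum of even neighbours
    lift(y, 0, beta)    # S2: even samples, uses the S1 results
    lift(y, 1, gamma)   # S3: odd samples, uses the S2 results
    lift(y, 0, sigma)   # S4: even samples, uses the S3 results
    high = [-k * y[i] for i in range(1, len(y), 2)]   # S5: high-pass component
    low = [y[i] / k for i in range(0, len(y), 2)]     # S6: low-pass component
    return low, high
```

Each call to `lift` plays the role of one arithmetic unit (two additions and one multiplication per sample), which is why the conventional device needs four such units when every step has its own hardware.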
[0013]
In the data processing device 200 configured as above, once the operation of step S1 has been completed for the image data of, for example, the X1-th pixel, the operation of step S1 can be executed for the image data of the next, X2-th pixel while the operation of step S2 is being performed using the data obtained in step S1. The same holds for the other steps S2, S3 and S4.
[0014]
FIG. 6(a) shows the maximum throughput of the data processing device 200. The labels t1 to t6 at the top each denote one cycle period of the drive clock signal. As shown, the data processing device 200 configured as above can obtain everything from the wavelet lifting operation result of step S1 to that of the final step S6 in four cycles. Moreover, once operation has started, the final step S6 results for the image data of successive pixels can be obtained one after another without any gap.
[0015]
[Problems to be solved by the invention]
Reading image data for the operation from the memory takes one cycle of the drive clock signal, and writing operation result data to the memory also takes one cycle. Image data can therefore be read only every other cycle. As a result, as shown in FIG. 6(b), the final step S6 result for the image data of the next pixel can in practice be obtained only every other cycle.
[0016]
In other words, although the conventional data processing device 200 is capable of faster processing, it operates at half its processing efficiency because of the constraints on reading image data for the operation from the memory and writing operation result data to the memory.
[0017]
In addition, the conventional data processing device 200 has the problem of a large circuit scale, because the operation of each step is executed by one of four dedicated arithmetic units. Sharing a single arithmetic unit by time-division processing would reduce the number of arithmetic units used, but the processing time would then increase at least fourfold.
[0018]
An object of the present invention is to provide a data processing device, used for performing the two-dimensional discrete wavelet transform described above, that reduces the number of arithmetic units used for the wavelet lifting operation and also makes the overall processing time as short as possible.
[0019]
[Means for Solving the Problems]
A first data processing device of the present invention is characterized by comprising one or more arithmetic units each of which, within a period in which data for the operation can be read from a memory once and operation result data can be written to the memory once, feeds back the operation result data obtained from the input data, performs a lifting-based operation, and outputs the resulting operation result data.
[0020]
A second data processing device of the present invention is the first data processing device in which, when two or more of the arithmetic units are provided, a gate is provided between the output terminal of an arithmetic unit in a preceding stage and the signal input terminal of the arithmetic unit in the following stage, the gate being open only for a fixed time within the period in which data can be read from the memory once and written to the memory once.
[0021]
A third data processing device of the present invention is the second data processing device in which, when three or more of the arithmetic units are provided and two or more of the gates are provided between the arithmetic units, the odd-numbered gates and the even-numbered gates, counted from the signal input side, are set to open and close with opposite timing.
[0022]
A fourth data processing device according to the present invention is characterized in that, in the second data processing device, a wavelet transform conforming to JPEG2000 is executed.
[0023]
A fifth data processing device of the present invention is the fourth data processing device, characterized by processing, as the data for the operation, data in which image data is mixed with wavelet coefficient data of the LL subbands of one or more levels.
[0024]
A sixth data processing device of the present invention is the fourth data processing device, characterized by processing, as the data for the operation, data obtained by color-converting image data into three color components, in which the data of the color components are mixed.
[0025]
[Embodiments of the invention]
FIG. 1 is a circuit diagram of a data processing device 100 that executes the wavelet lifting operation. The data processing device 100 is designed for an ordinary memory system in which image data X is read once every two cycles of the drive clock signal, and achieves the maximum throughput, specifically the throughput shown in FIG. 6(b) described in the prior art section, with only two arithmetic units (conventionally four; see FIG. 5).
[0026]
In the following, the configuration of the data processing device 100 is described first, and its actual operation is then described separately for the two drive clock cycles (timing A and timing B). Timing A refers to the odd-numbered cycles of the drive clock signal, and timing B refers to the even-numbered cycles.
[0027]
Image data for the operation, read from a memory (not shown) and input to the data processing device 100, is input to the signal input terminal 111b of a multiplexer (denoted MUX in the figure) 111. It is also input via a register 112 to the signal input terminal 115b of a multiplexer 115, and further via a register 113 to the signal input terminal 114b of a multiplexer 114.
[0028]
The signal fed back from the output terminal D of the arithmetic unit 110 is input to the other signal input terminal 111a of the multiplexer 111. The signal fed back from the output terminal D of the arithmetic unit 110 is input to the other signal input terminal 114a of the multiplexer 114 after being temporarily recorded by the register 117. The output signal of the multiplexer 114 is input to the other signal input terminal 115a of the multiplexer 115.
[0029]
At timing A, the multiplexers 111, 114 and 115 pass the signals input to the lower input terminals 111b, 114b and 115b in the figure; at timing B, they pass the signals input to the upper input terminals 111a, 114a and 115a.
[0030]
The arithmetic unit 110 consists of two adders 110a and 110c and one multiplier 110b whose multiplication factor switches between α and β every cycle of the drive clock signal. Within the arithmetic unit 110, the adder 110a and the multiplier 110b multiply the sum of the image data of the two pixels adjacent to the pixel of interest, input to terminals A and B, by α at timing A and by β at timing B. The adder 110c adds the image data of the pixel of interest, input from terminal C, to the α- or β-multiplied value and outputs the result. The multiplication factor is switched in accordance with the drive clock signal.
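The behaviour of such a shared arithmetic unit can be illustrated with a small sketch. The function name `make_unit` and the argument names are hypothetical; the only things taken from the text are the structure (add terminals A and B, multiply, add terminal C) and the rule that odd cycles (timing A) and even cycles (timing B) select different multiplication factors.

```python
def make_unit(coef_timing_a, coef_timing_b):
    """Return a model of one arithmetic unit with a cycle-dependent multiplier."""

    def unit(cycle, a, b, c):
        # Odd cycles correspond to timing A, even cycles to timing B.
        coef = coef_timing_a if cycle % 2 == 1 else coef_timing_b
        # Adder (a + b), multiplier (* coef), adder (+ c), as in the unit described above.
        return c + coef * (a + b)

    return unit
```

For example, `make_unit(alpha, beta)` would model the arithmetic unit 110, and `make_unit(sigma, gamma)` would model the arithmetic unit 120 described further on.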
[0031]
The output terminal D of the arithmetic unit 110 is connected to a gate 121 that is closed at timing A and open at timing B. The output of the gate 121 is connected to a delay circuit 122 that outputs its input signal delayed by one cycle of the drive clock signal. The output of the delay circuit 122 is connected to the signal input terminal 123b of a multiplexer 123 and, via a register 124, to the signal input terminal 125b of a multiplexer 125. The signal fed back from the output terminal D of the arithmetic unit 120 is input to the other signal input terminal 123a of the multiplexer 123, and is also input via a register 128 to the other signal input terminal 125a of the multiplexer 125. The output signal of the multiplexer 125 is input to the signal input terminal 127a of a multiplexer 127, and the signal output from the multiplexer 114, delayed by one cycle of the drive clock signal in a delay circuit 126, is input to the other signal input terminal 127b.
[0032]
At timing A, the multiplexers 123, 125 and 127 pass the signals input to the upper input terminals 123a, 125a and 127a in the figure; at timing B, they pass the signals input to the lower input terminals 123b, 125b and 127b.
[0033]
The arithmetic unit 120 has the same configuration as the arithmetic unit 110. It multiplies the sum of the data preceding and following the data of interest, input to terminals A and B, by σ at timing A and by γ at timing B, adds the value of the data of interest input from terminal C, and outputs the result.
[0034]
The output terminal D of the arithmetic unit 120 is connected to the signal input terminal 129a of the multiplexer 129. The multiplexer 129 outputs a signal from the output terminal 129c to the multiplier 131 at the timing A, and outputs a signal from the output terminal 129b to the multiplier 130 at the timing B.
[0035]
The processing that the data processing device 100 configured as above performs at timing A is described next. FIG. 2 shows only those components of the circuit diagram of the data processing device 100 in FIG. 1 that are active at timing A.
[0036]
At timing A, the wavelet lifting operation of step S1 is executed in the preceding-stage arithmetic unit 110, and the wavelet lifting operations of steps S4 and S6 are executed in the following-stage arithmetic unit 120.
[0037]
At timing A, the image data of the X(2n+1)-th pixel of interest is input to the signal input terminal C of the arithmetic unit 110, the image data of the X(2n+2)-th pixel, located before the pixel of interest, is input to the signal input terminal A, and the image data of the X(2n)-th pixel, located after the pixel of interest, is input to the signal input terminal B. The arithmetic unit 110 computes Y(2n+1) = X(2n+1) + α(X(2n) + X(2n+2)) as the Y(2n+1)-th data of the step S1 wavelet lifting operation and outputs it.
[0038]
Meanwhile, in the following-stage arithmetic unit 120, whose data input from the arithmetic unit 110 is cut off by the gate 121, the Y(2n-2)-th data is input to the signal input terminal C as the data of interest, the Y(2n-1)-th data, located before the data of interest, is input to the signal input terminal A, and the Y(2n-3)-th data, located after the data of interest, is input to the signal input terminal B. The data input to the signal input terminal A is the step S3 wavelet lifting operation result output from the arithmetic unit 120 itself at timing B, described later. The arithmetic unit 120 computes Y(2n-2) = Y(2n-2) + σ(Y(2n-1) + Y(2n-3)) as the Y(2n-2)-th data of the step S4 wavelet lifting operation and outputs it.
[0039]
The data output from the arithmetic unit 120 is then multiplied by 1/K by the multiplier 131 as the wavelet lifting operation of step S6, and written to a memory (not shown) as the high-pass component data of the wavelet coefficients (operation result data).
[0040]
Next, the processing that the data processing device 100 performs at timing B is described. FIG. 3 shows only those components of the circuit diagram of the data processing device 100 in FIG. 1 that are active at timing B.
[0041]
At timing B, the wavelet lifting operation of step S2 is executed in the preceding-stage arithmetic unit 110, and the wavelet lifting operations of steps S3 and S5 are executed in the following-stage arithmetic unit 120.
[0042]
At timing B, the image data of the X(2n)-th pixel, which was located after the pixel of interest at the immediately preceding timing A, is input to the signal input terminal C of the arithmetic unit 110 as the data of interest; the Y(2n+1)-th data, located before the data of interest, is input to the signal input terminal A; and the Y(2n-1)-th data, located after the data of interest, is input to the signal input terminal B. The arithmetic unit 110 computes Y(2n) = X(2n) + β(Y(2n-1) + Y(2n+1)) as the Y(2n)-th data of the step S2 wavelet lifting operation and outputs it.
[0043]
At timing B the gate 121 is open, and the Y(2n)-th data output from the arithmetic unit 110 is delayed by one cycle of the drive clock signal by the delay circuit 122 and then input to the signal input terminal A of the arithmetic unit 120 as the data immediately before the data of interest. The Y(2n-2)-th data is input to the signal input terminal B as the data immediately after the data of interest, and the Y(2n-1)-th data is input to the signal input terminal C as the data of interest. The arithmetic unit 120 computes Y(2n-1) = Y(2n-1) + γ(Y(2n-2) + Y(2n)) as the Y(2n-1)-th data of the step S3 wavelet lifting operation and outputs it.
[0044]
The data output from the arithmetic unit 120 is further multiplied by -K by the multiplier 130 and written to a memory (not shown) as the low-pass component data of the wavelet coefficients (operation result data) obtained by the wavelet lifting operation of step S5.
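The alternation between timing A and timing B can be illustrated with a behavioural sketch in which two units each perform one lifting operation per cycle. This is not a model of the circuit of FIG. 1: the registers, multiplexers, gate and delay circuit are not represented, the index offsets are simply chosen so that every operand has already been computed in an earlier cycle, and the mirrored boundary handling and all function names are assumptions. The cycle count it reports therefore reflects only this simplified schedule.

```python
def reflect(i, n):
    """Mirror an out-of-range index back into 0..n-1 (assumed boundary rule)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def two_unit_schedule(x, alpha, beta, gamma, sigma, k):
    """Run steps S1 to S6 on an even-length signal using two shared units.

    Odd cycles play the role of timing A, even cycles the role of timing B.
    Returns (interleaved coefficients, number of cycles used).
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0
    s1, s2, s3, s4 = {}, {}, {}, {}   # results of steps S1..S4, keyed by index
    out = [None] * n
    cycles = 0
    p = 0
    while any(v is None for v in out):
        # Timing A: first unit runs S1, second unit runs S4 followed by S6.
        cycles += 1
        i, j = 2 * p + 1, 2 * p - 6
        if i < n:
            s1[i] = x[i] + alpha * (x[i - 1] + x[reflect(i + 1, n)])
        if 0 <= j < n:
            s4[j] = s2[j] + sigma * (s3[reflect(j - 1, n)] + s3[j + 1])
            out[j] = s4[j] / k            # S6
        # Timing B: second unit runs S3 followed by S5, first unit runs S2.
        # S3 is evaluated first so it only sees S2 results from earlier cycles.
        cycles += 1
        i, j = 2 * p, 2 * p - 3
        if 0 <= j < n:
            s3[j] = s1[j] + gamma * (s2[j - 1] + s2[reflect(j + 1, n)])
            out[j] = -k * s3[j]           # S5
        if i < n:
            s2[i] = x[i] + beta * (s1[reflect(i - 1, n)] + s1[i + 1])
        p += 1
    return out, cycles
```

In this sketch each unit is busy in every cycle once the pipeline has filled, and one pair of coefficients (one from S5, one from S6) is completed every two cycles, which is the rate at which the memory can accept them.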
[0045]
FIG. 4 shows the maximum throughput achieved by the data processing device 100 configured as above. The labels t1 to t6 at the top each denote one cycle of the drive clock signal. As shown, the data processing device 100 can obtain everything from the wavelet lifting operation result of step S1 to that of the final step S6 in four cycles. Once operation has started, the final step S6 result for the image data of the next pixel can be obtained every other cycle of the drive clock signal.
[0046]
The throughput in FIG. 4 is the same as the throughput achievable by the data processing device 200 described in the prior art section with reference to FIG. 5 (see FIG. 6(b)). That is, the data processing device 100 obtains exactly the same throughput as the conventional data processing device 200 even though the number of arithmetic units is halved to two and the circuit scale is reduced. This is because, while the wavelet lifting operation result data is being written to the memory (not shown), the two arithmetic units are not left idle but execute the wavelet lifting operation of the next step, which doubles the processing speed.
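The argument can be checked with simple arithmetic. The sketch below assumes one memory period of two clock cycles (one read, one write) in which each of the four lifting steps S1 to S4 is needed once; the function and parameter names are hypothetical.

```python
def unit_utilization(num_units, lifts_per_period=4, cycles_per_period=2):
    """Fraction of the available unit-cycles spent on lifting in one memory period."""
    return lifts_per_period / (num_units * cycles_per_period)


print(unit_utilization(4))  # four dedicated units, as in device 200: 0.5
print(unit_utilization(2))  # two shared units, as in device 100: 1.0
```

With four dedicated units each unit is idle every other cycle, whereas two units that switch coefficients every cycle are fully occupied at the same memory rate.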
[0047]
In the data processing device 100, once the operations of steps S1 and S2 have been completed for the image data of a given pixel, image data of another pixel entirely unrelated to the processed pixel can be input. Therefore, instead of first processing all the image data with the data processing device 100 to convert it into level-1 wavelet coefficients and only then processing all the data of the 1LL subband, the wavelet coefficients of the 1LL subband may be processed to obtain the level-2 wavelet coefficients once the processing of the image data has progressed to some extent. The same applies to higher levels. In this way, the data processing device 100 can process, as the data for the operation, data in which image data is mixed with wavelet coefficient data of the LL subbands of one or more levels, which has the advantage that the wavelet coefficients of the desired level are obtained more quickly.
[0048]
Further, after the image data is converted into the three color components Y, Cr, and Cb, the data of each color component may be processed by the data processing device 100 as one block, or the data may be processed in parallel in the order of Y component data, Cr component data, and Cb component data. As described above, the data processing device 100 can process, as calculation data, data in which the data of the three color components obtained by color conversion of the image data are mixed. This has the advantage that the wavelet coefficients of the desired color component can be obtained more quickly.
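As a rough illustration of a mixed color-component stream, the following Python sketch converts RGB pixels to Y, Cb, and Cr and emits small blocks in Y, Cr, Cb order. The text does not state which color transform is used or at what granularity the components are mixed; the luma weights and chroma scale factors below are the widely used BT.601-style definitions, and the function names and the `block` parameter are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Assumed BT.601-style conversion; the source does not specify the transform."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr


def interleave_components(pixels, block=2):
    """Return a list of (component name, values) blocks in Y, Cr, Cb order."""
    comps = {"Y": [], "Cr": [], "Cb": []}
    for r, g, b in pixels:
        y, cb, cr = rgb_to_ycbcr(r, g, b)
        comps["Y"].append(y)
        comps["Cr"].append(cr)
        comps["Cb"].append(cb)
    stream = []
    for start in range(0, len(pixels), block):
        # Emit one block per component before moving on to the next pixels.
        for name in ("Y", "Cr", "Cb"):
            stream.append((name, comps[name][start:start + block]))
    return stream


stream = interleave_components([(128, 128, 128)] * 4, block=2)
names = [name for name, _ in stream]
```

Each tagged block could then be fed to the same lifting unit in turn, which is one possible reading of processing the components "in parallel" in Y, Cr, Cb order.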
[0049]
The data processing device 100 that performs IR9:7 filter processing has been described above. However, the basic concept of the data processing device according to the present invention is to provide one or more arithmetic units that, within the period in which reading of calculation data from the memory and writing of calculation result data to the memory can each be performed once, feed back the data of the result of an operation performed on the input calculation data to perform a lifting-method operation and output the data of the operation result. By applying this configuration, it is also possible to configure an IR5:3 filter and a decoder that perform fast processing with a small number of arithmetic units.
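To show why the same kind of arithmetic unit could serve a 5:3 filter and a decoder, the following Python sketch implements the standard JPEG2000 5:3 integer lifting steps on a one-dimensional, even-length signal, together with its inverse. The text does not describe the 5:3 circuit, so this is only an assumed software analogue; the function names and the mirror handling at the signal edges are illustrative choices.

```python
def forward_53(x):
    """Forward 5:3 integer lifting on an even-length list of ints."""
    n = len(x) // 2
    d = []
    for i in range(n):
        left = x[2 * i]
        # Mirror the signal at the right edge when no right neighbour exists.
        right = x[2 * i + 2] if 2 * i + 2 < len(x) else x[2 * i]
        d.append(x[2 * i + 1] - (left + right) // 2)       # predict step (high-pass)
    s = []
    for i in range(n):
        prev = d[i - 1] if i > 0 else d[0]                 # mirror at the left edge
        s.append(x[2 * i] + (prev + d[i] + 2) // 4)        # update step (low-pass)
    return s, d


def inverse_53(s, d):
    """Inverse transform: undo the update step, then undo the predict step."""
    n = len(s)
    x = [0] * (2 * n)
    for i in range(n):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (prev + d[i] + 2) // 4
    for i in range(n):
        left = x[2 * i]
        right = x[2 * i + 2] if i + 1 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x


signal = [3, 1, 4, 1, 5, 9, 2, 6]
low, high = forward_53(signal)
restored = inverse_53(low, high)
```

Both directions use the same "sum two neighbours, scale, combine with the sample of interest" pattern as the 9:7 steps, with the decoder applying the steps in reverse order and with opposite sign.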
[0050]
[Effect of the invention]
In the data processing device of the present invention, the arithmetic unit performs the lifting-method operation by feeding back the data of its first operation result within the time required to read the operation data from the memory and write the operation result data to the memory. The same throughput can thus be achieved with half the number of arithmetic units compared with a data processing circuit of the type that performs one operation per cycle of the drive clock signal.
[Brief description of the drawings]
FIG. 1 is a circuit diagram of a data processing device.
FIG. 2 is a diagram showing components operating at a timing A in the data processing device.
FIG. 3 is a diagram showing components operating at a timing B in the data processing device.
FIG. 4 is a diagram illustrating a throughput of the data processing device.
FIG. 5 is a circuit diagram of a conventional data processing device.
FIG. 6A is a diagram showing the theoretical maximum throughput of a conventional data processing device, and FIG. 6B is a diagram showing the throughput actually obtained under the restriction on the timing of reading data from and writing data to a memory.
[Description of symbols]
100 data processing device; 110, 120 arithmetic units; 111, 114, 115, 123, 125, 127 multiplexers; 112, 113, 117, 124, 128 registers; 122, 126 delay circuits

Claims

A data processing device comprising one or more arithmetic units that, within the period in which reading of operation data from a memory and writing of operation result data to the memory can each be performed once, feed back the data of the result of an operation performed on input operation data to perform a lifting-method operation, and output the data of the operation result.

The data processing device according to claim 1,
wherein, when two or more arithmetic units are provided, a gate is provided between the output terminal of the arithmetic unit in the preceding stage and the signal input terminal of the arithmetic unit in the subsequent stage, the gate opening for a fixed time within the period in which reading of data from the memory and writing of data to the memory can each be performed once.

The data processing device according to claim 2,
wherein, when three or more arithmetic units are provided and two or more gates are provided between the arithmetic units, the odd-numbered gates and the even-numbered gates, counted from the signal input side, are set to open and close at opposite timings.

The data processing device according to claim 2, wherein the data processing device executes a wavelet transform based on JPEG2000.

The data processing device according to claim 4,
wherein the data processing device processes, as calculation data, data in which image data is mixed with wavelet coefficient data of one or more LL subbands of the respective levels.

The data processing device according to claim 4,
wherein the data processing device processes, as calculation data, data in which the data of the three color components obtained by color conversion of the image data are mixed.