JP3706656B2

JP3706656B2 - Image processing method and apparatus

Info

Publication number: JP3706656B2
Application number: JP13423395A
Authority: JP
Inventors: 哲臣田中
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-05-31
Filing date: 1995-05-31
Publication date: 2005-10-12
Anticipated expiration: 2020-10-12
Also published as: JPH08329235A

Description

【０００１】
【産業上の利用分野】
本発明は与えられたビットマップの任意の領域の画像に対してプロセッサのソフトウェアにより９０°×ｎ（ｎ＝１，２，３）の回転したビットマップ画像の描画を行なう画像処理方法及びその装置に関するものである。
【０００２】
【従来の技術】
従来、与えられたビットマップの任意の領域の画像に対してプロセッサのソフトウェアにより９０°×ｎ（ｎ＝１，２，３）の回転したビットマップ画像の描画を行なうビットマップ回転方法には次のような方式があった。なお、以下、回転角度は反時計方向への回転角を表わし、ことわりのない限りビット７側が左の画素になるものとする。
【０００３】
図５は従来の９０°回転の主要部分を抜粋である。
【０００４】
回転処理は１バイト×８ライン（８×８画素）の８バイトをブロックを処理単位とする。図において、p_from／p_toは、それぞれ処理するブロックの入出力のポインタであり、添数[n]によりnバイト分バイアスされたアドレスを指し示す。例えばp_from[2]は、p_from[0]よりも２バイト分進んだアドレスを指し示すことになる。row_from／row_toは、それぞれ回転前と回転後の画像の１ラインのバイト数であり、workは回転処理のための作業用変数である。
【０００５】
ステップＳ５１は初期化であり、回転しようとするブロックを指し示すポインタや回転されたブロックを指し示すポインタ等の変数の初期化を行う。
【０００６】
ステップＳ５２は回転処理の本体であり、回転処理した１ブロック（８バイト）のデータを生成する為に、もとになるブロックの８バイトのデータを毎回アクセスして１画素ずつ変数workに移して変数workに回転後のデータを生成し、ポインタp_toのバッファに書き込みを行なう。
【０００７】
ステップＳ５２では処理前ブロック（８行×８列）の各列を処理後ブロックの各行に移し変えている。ステップＳ５２−１では、処理前ブロックの最も右側の列の画素を、処理後ブロックの最も上段の行に移動する。なお、左右上下といった言葉は、８画素×８画素のブロックを視覚的に考えた場合の対応する位置を示している。この際の移動のしかたは、処理前ブロックの最右列の上端の画素を処理後ブロックの最上行の左端に移動し、順次処理前ブロックにおける直前に処理された画素の下の画素を、処理後ブロックにおける直前に処理された画素の右側に配置するように行う。同様にステップＳ５２−２では処理前ブロックの右側から２番目の列を処理後ブロックの２番目の行に移動する。このようにして１つのブロックについて９０°回転を行う。
【０００８】
ステップＳ５２が終わるとステップＳ５１に戻って次のブロックを処理する。
【０００９】
図６に１８０度回転の従来例を示す。図６（ａ）は２５６バイトのルックアップテーブル（ＬＵＴ）を用いた場合であり、図６（ｂ）は９０度回転の従来例と同じ様に１画素単位にコピー／ペーストを繰り返すものである。ＬＵＴを用いた例では、処理前ブロックを行すなわち１バイト単位でＬＵＴ"table"で変換し、それを変換後のブロックとしている。ＬＵＴ"table"は、先頭からのバイト単位のオフセット値を、２進数として１８０°回転させた値を内容として有する。例えば"table"の０ｘ０５バイト目は、それを１８０度回転した値０ｘＡ０となる。このため、ステップＳ６０２においては、ポインタp_toの指し示す処理後ブロックに、ポインタp_fromにより指し示されるバイトを１８０度回転した値が与えられる。１行処理後とにポインタp_fromを１行分進め、ポインタp_toを１行分戻して８行分処理すれば１ブロックの回転がおわる。
【００１０】
また、図６（ｂ）では、ステップＳ６１２において、処理前ブロックの１行（１バイト）を左右反転、すなわち１８０度回転させる。これを回転前ブロックの各行について上段から順次行い、それを回転後ブロックの下段から組み入れれば１８０度回転したブロックが得られる。
【００１１】
ちなみに９０度回転の場合もＬＵＴを用いて処理することも考えられる。
【００１２】
【発明が解決しようとする課題】
しかしながら従来の方式では１ビット単位で処理するため遅く、またビット単位の処理を行うためには対象のビットを含む処理単位、例えばバイト単位でデータをアクセスする必要がある。そのため回転したデータ１バイト生成するために毎回８バイトのデータアクセスを行なうために更に低速になる。それはデータキャッシュを持たないシステムでは更に顕著になる。そして１６bitプロセッサでは１６bit演算、３２bitプロセッサでは３２bit演算ができるにもかかわらず実際に演算に使用されている部分は８bit（１BYTE）の箇所だけであり無駄が多い。
【００１３】
ＬＵＴを用いた場合ではデータのアクセス回数が増えるため高速に処理するためにはデータキャッシュがあるのが望ましい。またＬＵＴを用いて更に高速にするためにテーブルサイズを大きくして例えば１バイト単位で変換するのではなく２バイト単位で変換する様にＬＵＴのアクセス回数を減らした場合、１バイト単位のテーブルの５１２倍となってしまうテーブルサイズの増加量に比べてＬＵＴのアクセス回数が１／２にしかならず極めて非効率である。逆にデータキャッシュを持つシステムではキャッシュヒット率が低下して遅くなることが考えられる。
【００１４】
本発明は上記従来例に鑑みてなされたもので、
・複数バイトにまたがる分割されたブロックのデータを多バイト長の変数にまとめて格納して処理することにより一回の演算で複数の画素を処理することが可能となり全体の演算回数を減らすことができる画像処理方法及びその装置、あるいは、
・回転のための画素の組み替えを小部分から段階的に行なうことにより演算回数を減らすことができる画像処理方法及びその装置、あるいは、
・ブロックの処理順番を回転処理をする元の画像のアドレスが連続する順番で行なうことによりデータキャッシュの容量に対してキャッシュヒット率が高くなり速度が向上させることができる画像処理方法及びその装置を提供することを目的とする。
【００１５】
【課題を解決するための手段】
上記目的を達成するために本発明の画像処理方法は次のような構成からなる。即ち、８ビット×８ラインのイメージデータを９０度回転する画像処理装置であって、８ラインの上半分の連続するイメージデータを第１のメモリに格納し、８ラインの下半分の連続するイメージデータを第２のメモリに格納する第１処理手段と、前記第１のメモリに格納されたイメージデータを７ビット以内でＮビットシフトし、８ビット間隔の第１注目ビット以外をマスクしたイメージデータと、前記第２のメモリに格納されたイメージデータをＮとは４ビット差のＭビットシフトし、前記第１注目ビットに対してＭ−Ｎビットずれた位置の第２注目ビット以外をマスクしたイメージデータとの論理和をとったイメージデータを第３のメモリに格納する第２処理手段と、前記第２処理手段によりマスクされなかったイメージデータを１バイトに加工すべく、前記第３のメモリに格納されたイメージデータと前記第３のメモリに格納されたイメージデータを所定量シフトさせたイメージデータとの論理和をとる第３処理手段と、前記第２処理手段によるシフト量Ｎ及びＭを順次１ずつを変更した処理と、第３処理手段による処理とを８回実行させる手段とを有する。
【００１７】
【第１実施例】
図１４は、本発明の実施例である画像の回転処理を実現するためのシステムの構成図である。図１４において、ＣＰＵ１は、メモリに格納されたプログラムを実行することでシステム全体を制御して画像の回転処理等の処理を行う。ＲＯＭ２は、ＣＰＵ１の制御プログラム（後述するフローチャートに係るプログラムも含む）やデータを格納する。ＲＡＭ３は、ＣＰＵ１の作業用データや回転する画像と回転後の画像を格納する。スキャナ４は画像を入力する。ディスプレイ５は画像を表示する。記憶デバイス６は磁気ディスク装置等で構成され、画像やプログラム、その他のデータを格納する。プリンタ７は画像などの印刷出力を行う。入出力インターフェース（Ｉ／Ｆ）８は外部機器と接続して入出力を行う。これらのブロックのうち、後述する画像回転を行うものとして次のようなものがある。
【００１８】
＜プリンタ＞
まず、入出力インターフェース８などから入力されたページ記述言語のデータをＲＡＭ３に格納する。ＣＰＵ１はそれを解釈しながらＲＡＭ３に描画し、描画されたビットマップデータをプリンタ７に送り紙に印刷する。プリンタ７においては、ビットマップデータを回転させたり、ビットマップフォントを回転させたり、あるいはＡ３プリンタでＡ４の出力を行う場合などに画像の回転処理を行う。
【００１９】
＜ファイリング装置＞
図１４では、記憶デバイス６がファイリング装置に該当することになる。スキャナ４で読み込まれた原稿画像を格納してその文字認識を行うインテリジェントファイリング装置の場合、原稿の向きが正しくない場合には原稿画像を回転させる必要がある。
【００２０】
＜ＣＰＵ＞
スキャナ４や記憶デバイス６から読み込んだ画像をＲＡＭ３に格納し、回転処理してディスプレイ５に表示などを行う。
【００２１】
このように、必要に応じて画像の回転処理が行われる。この回転処理は次のような手順で実現される。
【００２２】
＜回転処理の手順＞
図１は本発明による画像の回転処理を最もよく表した図である。
【００２３】
本実施例における処理は１バイト×８ライン（８×８画素）のブロックを処理単位とする。変数p_from／p_toはそれぞれ処理するブロックの入出力のポインタである。変数row_from／row_toはそれぞれ回転前と回転後の画像の１ラインのバイト数である。例えばp_fromがあるブロックの最上行を指しているとすれば、そのブロックの次の行は、その上の行のアドレスに画像の１ライン分のサイズrow_fromを加えたアドレスとなる。変数work／tmp1／tmp2は回転処理のための作業用変数である。ここでuint32は４バイト長整数であらかじめ定義されているものとする。
【００２４】
図において、ステップＳ１１は初期化であり、処理するブロックの読み出しと書き込みのポインタ等の変数の初期化を行なう。すなわち、変数p_fromには回転前ブロックのアドレスを、変数p_toには回転後ブロックのアドレスを格納する。
【００２５】
ステップＳ１２は処理するブロックの全データを読み出して１バイト長よりも大きな変数にまとめて格納するパッキング処理である。図２（ａ）のように１バイト×８ラインの各ラインのバイトデータを、上４行とした４行とに分けてそれぞれtmp1とtmp2の変数に順次格納する。
【００２６】
ステップＳ１３はその変数tmp1とtmp2とから回転後のデータを順次生成して書き込みを行なう処理である。
【００２７】
図３はステップＳ１３の主要部分であるbitcombine()マクロの処理概要を表したものである。
【００２８】
ステップＳ１２により、変数tmp1とtmp2に１バイト×８ライン（第７〜０ライン）の画素データが４バイトずつパッキングされて格納されている。ステップＳ１３では、この第７ラインから第０ラインのバイトデータの各ＬＳＢ（最下位桁）の画素データを取り出して回転処理した１バイトデータを取り出す。９０度回転した状態では、このＬＳＢが番号順にならんで１行を形成することになる。なお、説明の便宜上、各バイトのＬＳＢにはラインの番号と一致した値を書き込んである。また、疑問符はその値を問わないことを示す。
【００２９】
まず操作１は、状態１に示された変数tmp1とtmp2とからＬＳＢ７からＬＳＢ０の各画素データを取り出して変数workに格納する論理操作である。この操作では、tmp1の内容を４ビット左シフトし、その値と１６進数のマスクデータmask2=0x10101010との論理積をとる。これにより、変数tmp1に格納された値が４ビット左シフトされ、元の４ビットのＬＳＢ７，６，５，４以外は０となっている。また、変数tmep2の内容と１６進数のマスクデータmask1=0x01010101との論理積をとる。このため、tmp2の内容はＬＳＢ以外のビットはすべて０になっている。このようにシフト及びマスクされたtmp1と、マスクされたtmp2との論理和を計算して変数workに得られるのが状態２に示した値である。この操作においてビット７とビット３との位置関係、６と２との位置関係、５と１との位置関係、４と０との位置関係が回転後の位置関係と同じく配置された状態になる。
【００３０】
次に操作２によって、状態２として示された変数workの値と、それを７ビット右シフトした値との論理和を計算する。その結果、変数workは状態３のように再配置され、このうちＬＳＢ７と６と３と２の位置関係と、ＬＳＢ５と４と１と０の位置関係が、回転後の位置関係の通りに配置される。
【００３１】
最後に操作３によって、状態３として示された変数workの値と、それを１４ビット右シフトした値との論理和を計算する。その結果、変数workは状態４のように再配置され、変数workの最下位８ビットに回転後の第１行目の画素データが得られる。
【００３２】
以上の操作で回転前ブロックの最右列を回転後ブロックの最上行におきかえることができる。１ブロックについて行うには、上記説明におけるＬＳＢを１画素ずつ上位桁に移しつつ、同様の操作を行う。画素を移すには、上記操作１における変数tmp1およびtmp2に対するシフト量を減らしていき、対象となる桁が状態２のビット０〜７のようになるようシフト操作すれば良い。こうして得られた状態４の下位１バイトが９０度回転された行となる。
【００３３】
なお、図１のフローチャートをより説明的に示す図が図１５及び図１６である。図１５において、ステップＳ１５１はステップＳ１１に相当し、図２（ａ）の処理を行う。ステップＳ１５２〜Ｓ１５５はステップＳ１３に相当する処理である。ステップＳ１３では各列に対してそれぞれ論理操作を行うように記述しているが、図１５では列の番号をｎとして一般化してるため、ループ処理が加わっている。図１６はステップＳ１５３を詳細に説明するもので、図１でいうところのマクロbitcombine()における処理、すなわち図３に説明された手順を、図３に即して記述したものである。
【００３４】
図１６では、まずパックされた３２ビットのtmp1を、（４−ｎ）ビット左シフトしてその第４，１２，２０，２８ビットを残してすべて０とする。ここでｎは回転しようとする列（桁）を示す。また、ｎ−４が負となった場合は、シフトの方向を反転する。次にステップＳ１６２において、tmp2をｎビット右シフトして、第０，８，１６，２４ビットを残して０とする。ステップＳ１６３ではステップＳ１６１とＳ１６２とで得られた結果の論理和を計算する。ここまでが図３の操作１である。
【００３５】
ステップＳ１６４乃至Ｓ１６６は、それぞれ操作２，３として説明したままの内容である。
【００３６】
このように、ブロックの各ラインをラインごとに扱わず、連続するデータに置き換えてから論理操作を行うことで、画素の組み替えが１回の操作によって１箇所ずつではなく複数の箇所で並列的に処理されるため高速になるのである。
【００３７】
さらに詳しく説明すると、例えば変数tmp1とtmp2にそれぞれ４バイトをパッキングして格納する事によって１バイト×８ラインのデータから必要画素をマスクして取り出すにしても、１回の論理演算（ＡＮＤ演算）によって複数画素（例では４画素）を取り出せる。また変数tmp1とtmp2のデータのパッキング方法も回転後のビットの並びに合わせてあるため、操作２，３に示すような段階的な画素の組み替えができるのである。
【００３８】
変数tmp1とtmp2であるが、図３のようにビット０の場合だけではなく、各桁の回転データを取り出すときにも再利用できるため、この値を保存しておけば、以後、画像データをアクセスすることなしに残りの７バイト（ライン）の回転データを生成できる。このため、データキャッシュを持たないシステムで実行したとしてもレジスタ変数にtmp1とtmp2とを割り付ければ、メモリアクセスなしに回転データを生成できるため非常に高速である。また複数の画像データをパッキングすることにより変数の数が削減でき、レジスタ変数への割り付けが容易であるという効果もある。また、レジスタ変数への割り付けができなくても、データアクセスの回数が１／４（１バイト単位のアクセスが４バイト単位のアクセスとなるため８回が２回となる）になるのでやはり高速である。
【００３９】
詳細に計算すると１バイト×８ラインの１ブロックを回転処理するため、パッキングのための各バイトデータのリードアクセスが各１回、パッキングされたデータから回転後の１バイトデータを生成するための演算回数が各８または９回、生成された回転データのライトアクセスが各１回となる。ちなみに図３の例でいうと操作１が４演算、操作２と３がそれぞれ２演算である。即ち、画像データのアクセス回数が最小限に押さえられ、かつ回転処理が１画素あたり１演算強ですむのである。
【００４０】
ステップＳ１３が終了するとステップＳ１１に戻って次のブロックを処理するのであるが、次のブロックは処理した前のブロックに隣接した左右のブロックか上下のブロックになると考えられる。図４（ａ）のように１バイト×８ラインのブロックａからブロックｙの２５ブロックを１から２５の順に処理した場合と図４（ｂ）の様な順番でブロックａ〜ｙを１から２５の順に処理した場合を考える。データキャッシュを利用する場合、データキャッシュの性質として、１バイトの読み出しであってもデータバスが３２ビットのプロセッサであれば少なくともその１バイトを含む３２ビットデータバス上の４バイトをキャッシュメモリ内に取り込む。そのため例えばブロックｍの中の１バイトを読み込んだ場合には、ブロックｌとブロックｎのデータがキャッシュメモリ内に取り込まれるのである。ブロックａのデータキャッシュの効果が現れるのはブロックｂを処理するときであるため、図４（ａ）のようにブロックａ，ｆ，ｋ，ｐ，ｕ，ｂ…の順番でブロックを処理した場合、間のブロックｆ，ｋ，ｐ，ｕの処理のときにデータキャッシュの内容が更新されてブロックｂの処理の時にはミスヒットする確率が高くなる。そのため図４（ｂ）のような順番で処理したほうがデータキャッシュのヒット率が高い。つまり小容量のデータキャッシュでも効果を発揮するのである。従って図４（ｃ）に示す様にブロックの構成が左上のブロック４１に図示するライン構成であれば、斜線部のブロックを処理した後は左右のどちらかのブロック（丸印の方向）を処理することが望ましい。ブロックの処理順番を回転処理をする元の画像のアドレスが連続する順番で行なうのである。
【００４１】
図１で示したのは１バイト×８ラインの１ブロックのデータが全て揃っている場合であるが、画像の端のようにデータが揃わない場合は以下のようにすればよい。
【００４２】
まず８ラインのデータが揃わない場合はステップＳ１２（あるいはステップＳ１５１）において変数tmp1とtmp2の欠落に該当する部分を空白で処理を行う。また、１バイト中の８画素が足りない場合は、ステップＳ１３において８つのbitcombine（）マクロがあるが欠落部分に該当するbitcombine（）マクロを実行しないように制御すればよい。これはステップＳ１５３において、欠落したｎに対しては処理を実行しないことをに相当する。
［上記実施例のバリエーション］
上記実施例のマクロbitcobine()、すなわち図１６に示した処理手順を以下の様に変更しても同じ結果が得られる。
【００４３】
ステップＳ１６１において３２ビット変数tmp1をシフトした後の第４，１２，２０，２８ビット以外のビットを０にしているが、０ではなく１にする。また、同様にステップＳ１６２において第０，８，１６，２４ビット以外のビットを０にしているがこれも１にする。さらに、ステップＳ１６３では論理和を計算するが、論理積を計算する。この操作の結果、図３の状態２のブランクの桁は、０ではなく１で充填されることになる。後のステップは図１６と同様である。
【００４４】
これを図１の上段のbitcombine()と同様に表記すると次の様になる。
mask1=0xfefefefe;
mask2=0xefefefef;
#define bitcombine(pat1,pat2) \
work=((pat1)|mask2)&((pat2)|mask1); \
work&=work>>14; \
work&=work>>7;
この例では、１ブロックのライン数が８にみたない場合に、足りない部分の画素が１となる。これは黒画素の意味が０の場合と１の場合があるため、余白部分を白又は黒にしたい場合、この例と図１の例とを使い分けると良い。
【００４５】
【第２実施例】
［１８０度回転］
図７は本発明に係る画像処理装置を、画像の１８０度回転に適用した例である。
【００４６】
変数p_from／p_toは処理するブロックの入出力のポインタである。変数row_from／row_toは回転前と回転後の横方向バイト数であり、workは回転処理のための作業用変数である。変数rem1は端処理のための値で画像の端が１バイトの境界にあわなかった場合のシフト量を格納しておく。rem2は（３２−rem1）の固定値である。
【００４７】
ステップＳ７１は初期化であり、処理するブロックの読み出しと書き込みのポインタ等の変数の初期化を行なう。処理は４バイト×１ラインを単位とする。
【００４８】
ステップＳ７２は処理するブロックの全データを読み出して１バイト長よりも大きい変数にまとめて格納するパッキング処理である。パッキングされたデータは変数workに格納される。変数rem1が０でない場合はステップＳ７２の「work＝（work＞＞rem1）｜（（unit32）p_from[0]＜＜rem2）；」の処理を行って端を合わせる処理をする。この式の意味するところは、変数workの内容を桁合わせすべき桁数rem1分右シフトした値と、ポインタp_from[0]の指し示す値を３２ビット化してrem2桁分だけ左シフトした値との論理和を計算するということである。ちなみに同様の処理は従来例の処理でも必要であるが説明を簡単にするため省略した。なお、ここで行うべき右シフト演算は論理シフト演算であり、シフトされて空いた桁には０が充填されるシフト演算である。
【００４９】
ステップＳ７３はその変数から回転後のデータを順次生成して書き込みを行う処理である。ステップＳ７３が終わるとステップＳ７１に戻って次のブロックを処理する。
【００５０】
図８はこの図７のステップＳ７３のbitcombine()マクロを説明するための図である。説明を簡単にするために１バイトデータのビット回転で説明する。まず変数workには状態１のように７から０のビットデータが並んでいる。操作１によって変数workの７と６、５と４、３と２、１と０の各画素データが交換されて変数workは状態２となる。次に操作２によって変数workの内容が再配置されこのうち６−７と４−５，２−３と０−１の画素データが交換され変数workは状態３となる。最後に操作３によって変数workの内容が再配置されこのうち４−７，０−３の画素データが交換され変数workは状態４となり回転処理は終了する。即ち、段階的に画素の組み替えが１回の操作によって複数の箇所を処理するようになるため高速になるのである。ちなみに操作１から３の順番はどのようにしても結果は同じくなる。回転処理の対象となる変数workは４バイトの変数であり、ステップＳ７２によりブロックの上半分又は下半分の４バイトが変数workにパックされているため、ステップＳ７３の処理では図８に示した操作が変数work内の４バイトに対して同時に施される。
【００５１】
この処理時間を詳しく計算すると操作１から３はそれぞれ５演算で１５演算となる。従来例の図６（ｂ）では２１演算であるためこれだけでも高速であるが、更に変数workには４バイト分格納されていて同時に処理が行えるため更に高速となる。１ビットあたりの演算数は１５／３２で約０．５演算となる。加えてこのアルゴリズムはＬＵＴのように余分なデータアクセスをしないためデータキャッシュを持たないシステムで実行しても速度低下が小さい。図８からもわかるように１８０度回転では複数バイトのデータをパッキングしなくても画素の組み替えを小部分から段階的に行えるため必ずしもパッキングが必要というわけではない。
【００５２】
図７で示したのは４バイト×１ラインの１ブロックのデータが全て揃っている場合であるが、画像の端のようにデータが揃わない時は以下のようにすればよい。まず４バイトのデータが揃わない場合はステップＳ７２において変数workの欠落に該当する部分を空白で処理を行なう。次にステップＳ７３においてbitcombine（）マクロにおいて同じように処理し、データの有効部分のみの書き込みを行なうように制御すればよい。
【００５３】
図１７は図７の処理をより説明的に示したものであり、図１８は図７におけるマクロbitcombineを説明するものである。
【００５４】
図１７において、まず、回転しようとするブロックの上半分４バイトを変数workにパックする（Ｓ１７１）。これが半端なブロックであれば、半端な部分を空白で充填処理する（Ｓ１７３）。次にブロック内の第ｎライン目を上位桁と下位桁とを逆順にする（Ｓ１７５）。これは図１８に示したもので、図８の処理そのものである。このステップＳ１７５の処理を変数workにパックされた４ラインについて行い、上４ラインが終了したならした４ラインについても同様に処理をステップＳ１７１からくり返し行って（Ｓ１７９）、１ブロックの１８０度回転が終了する。
【００５５】
図１８においては、まずステップＳ１８１でパックされた４バイトの内、互いに隣り合う奇数桁と偶数桁とを入れ替え（操作１）、次にステップＳ１８２で入れ替えられた２ビットを組として、更に隣接する奇数番目の組と偶数番目の組とを入れ替える（操作２）。ステップＳ１８３では同様にステップＳ１８２で入れ替えられた２ビットずつを組とする４ビットを組にして、互いに隣接する奇数番目の組と偶数番目の組とを入れ替える（操作３）。これでパックされた４バイトについて、上位桁と下位桁の入れ替えが完了する。この、回転前ブロックにおける第ｎ番目のラインを、回転後ブロックにおける（７−ｎ）ラインとして格納すれば、１ラインのの回転が完了する。
【００５６】
【第３実施例】
［ビット０が左の場合の９０度回転］
第１の実施例ではビット７が左の画素となる場合の９０度回転を示したがビット０が左の場合の例を図９に示す。動作は実施例１とほぼ同じ処理であるが、ステップＳ９２における処理では、図２（ｂ）に示すようにデータのパッキング方法が逆になり、また、図１のステップＳ１３においてbitcombine()マクロにより列→行に変換されたラインの順番が逆になるだけである。すなわち、図１の処理においてはＬＳＢが右端の桁であるためそれが第１ラインになるように回転したが、図９においては左端桁がＬＳＢとなるため、それが第１ラインとなるように図３とは逆に左端の桁からまず取り出してラインを形成し、それを第１ラインとする。
【００５７】
このようにすることで、ビット０が左端であっても同様に９０度回転処理ができる。
【００５８】
【第４実施例】
［２７０度回転］
ビット７が左画素となる場合が図９と同じで、ビット０が左画素となる場合は図１と同じくなる。９０度回転と異なる点はステップＳ１１またはステップＳ９１のポインタの初期化で処理するブロックの順番だけである。これは、図１の処理と図９の処理とでは、ビットの並びだけを考えた場合には反対向きの回転となっていることからも明らかである。
【００５９】
［６４ｂｉｔ演算での高速化した例］
実施例１では３２ビット演算を使用した場合であるが６４ビット演算を行えるプロセッサを使用した場合の例を図１０に示す。ここでuint６４は８バイト長整数であらかじめ定義されているものとする。
【００６０】
この処理は、本質的には図３あるいは図１５、図１６で説明した処理と変わるものではないが、６４桁を並列に処理できるため、図３におけるtmp1とtmp2とを１つの変数workにまとめることができる、という点を利用したものである。すなわち、図３の状態１ではブロックの上４バイトとした４バイトを２つの４バイト変数に分けてパッキングしたが、図１０のステップＳ１０２ではこれをまとめて１ブロック分８バイトをパッキングする。すなわち、図１０ではtmp1,tmp2は連続する変数workに相当するため、変数workを３２桁（４バイト分）右シフトした値が変数tmp1に相当する。従って、図３の操作１は、変数workの上位３２ビットを４桁シフトして下位３２ビットと論理和をとることに相当する（対象外の桁にはマスクする処理も含まれる）。
【００６１】
このように、実施例１ではbitcombine（）マクロでの演算数が８または９であったが、この実施例の図１０では、図３の操作１の処理がシフトと論理和の計算できとなるため、更に演算数が減って７または８となる。
【００６２】
このように、従来例のアルゴリズムではプロセッサの並行処理する桁数に関らず８ビット単位での処理を行うため、プロセッサが進化しても高速化の余地はないが、本発明ではプロセッサが進化すればそれに合わせてさらに高速化が行える。
【００６３】
またこの実施例では、実施例１の変数tmp1，tmp2，mask1，mask2が変数tmp，maskに削減される利点が有る。ところでこの図１０のbitcombine（）マクロの「「work|=work＞＞28;work｜=14;work|=work＞＞7;」の部分は図１とは順番が逆であるが処理結果は同じになる。すなわち、この「」内の処理は図３で説明した桁をシフトして重ねあわせるという操作１〜操作３に相当するが、これら操作は逆の順序で処理しても得られる結果は同じであるということである。
【００６４】
【第５実施例】
［多値画像回転］
図１１に１画素２ビットの場合の９０度回転の処理方法を示す。１バイト×４ラインを１ブロックとして処理するのであるが、これは第１の実施例として示した回転処理を、２ビットを１まとまりとして行うものである。すなわち、ステップ１１２では４ライン３２バイトを１つの変数tmpにパッキングする。図１９は図１１のステップＳ１１３における、回転後の第１ライン目を形成する１ラインの回転処理の模式図である。ステップＳ１１２により変数tmpは図１９の状態１のようにパッキングされた状態になる。この後、６桁右シフトして、各バイトの右端２桁以外を０でマスクする（操作１）。ここで６桁シフトするのは上位２ビットを処理の対象としているためで、最下位２ビットならシフトする必要はないし、その上の２ビットならば２桁右シフトするし、更にその上の２ビットならば４桁右シフトすれば良い。こうして状態２となったデータが得られる。この状態は何番目のラインであろうとも同一である。
【００６５】
次に、状態２のデータを右に６桁シフトし、状態２のデータとの論理和を計算して（操作２）、状態３のデータを得る。。更に、状態３のデータを１２桁右シフトして状態３のデータとの論理和を計算し（操作３）、状態４のデータを得る。この最下位１バイトが回転後の１ラインとなる。
【００６６】
以上の操作を４ライン全てについて行い、得られたラインを処理前の桁に応じたラインとして格納すれば、９０度回転した画像データが得られる。図１９では、上位桁が上位ラインとなり、下位桁が下位ラインとなるように回転している。このように、基本的には第１実施例と同じ要領であるが、捜査対象の桁数が１ビットではなく、２ビットである点において異なる。
【００６７】
以上のように本発明は多値画像でも適用できる。
【００６８】
【第６実施例】
［複数ライン処理］
図１２に１画素２ビットの場合の９０度回転の処理方法を示す。図１と同じの変数tmp1,tmp2を使って、ステップＳ１２２では「tmp1｜＝tmp1＜＜１４；tmp2｜＝tmp2＜＜１４；」の処理によりステップＳ１２３において同じ演算を繰り返すのを防いでいる。その結果bitcombine（）マクロにおいて変数workには回転後のデータが２ライン分生成される。
【００６９】
図２０はその様子を示す図である。まず、ステップＳＳ１２２において、状態１のように、回転前ブロックの第１ライン目（記号ａで示す）と第２ライン目（記号ｂ）とが、変数tmp2にセットされる。すなわち、tmp１の下位１６バイトに第２ラインと第１ラインを順に格納し（その他の桁は０にしておく）、その値と、それを１４ビット左シフトした値との論理和を変数tmp2に格納する。変数tmp1も同様にして第３ラインと第４ラインとを格納する。
【００７０】
これら状態１の変数にたいし、まずtmp2の値を４桁右シフトして各バイトの下２ビット以外を０とし、tmp1の値を各バイトの上から３ビット目と４ビット目以外を０とする（操作１）。なお、変数tmp1,tmp2の値はこの後の処理のために変えないようにしておくことが望ましい。
【００７１】
こうして得られる状態２の値に対して、両者の論理和を計算して（操作２）、状態４の値を得、状態３の値と、その値を６桁右シフトした値との論理和を計算すると（操作３）、状態４の値が得られる。
【００７２】
状態４では、回転前のブロックを２ビットずつ９０度回転した後のラインが２つ得られている。即ち、第１ラインが第２バイト目として、第２ラインが第４バイト目として得られる。
【００７３】
以上の操作を、回転後の第３、第４ラインに対しても行う。そのためには、状態２としてあたえる２つの値として、それぞれ図２０よりも４桁ずつ左にずらし、図２０に示したようにビット７，６あるいはビット５，４に代えて、ビット３，２あるいはビット１，０をそれぞれ置き換えれば良い。この結果、操作２，操作３で回転後の第３、第４ラインが得られる。
【００７４】
【第７実施例】
［複数ブロック処理］
図１３に１画素４ビットの場合の９０度回転の処理方法を示す。実施例１に従うと１画素が４ビットの場合は１バイト×２ラインがブロックになるが、このままであると３２ビットの変数にパッキングすると２バイト分が空白となり演算の１／２は無駄になる。そこで２ブロック分の２バイト×２ラインを変数tmpにパッキングして処理を行なう。このパッキングを行うのがステップＳ１３２である。ここでは、２ブロック目の第１ライン、２ブロック目の第２ライン、１ブロック目の第１ライン、１ブロック目の第２ラインの順で４バイト変数tmpにパッキングする。
【００７５】
図２１はこの様子を示している。まず、４ビット１画素の２つのブロックＡ，Ｂについて、各画素ａ１−ａ４，ｂ１−ｂ４を図２１のように考える。これをステップＳ１３２で状態１のようにパッキングする。これを操作１により３２ビットの変数tmpに下位から１画素（４ビット）ずつ格納する。同じくworkにtmpに格納しなかった画素を４桁右シフトして格納する。この結果状態２のようになる。こうして得られたtmpの値と、その値を４桁右シフトした値との論理和を計算し、変数tmpに格納する（操作３）。変数workについても同じ操作を行う。こうして得られた状態３の両変数を、順にworkの第２バイト、tempの第２バイト、workの第４バイト、tmpの第４バイトを回転後の２つのブロックとして格納すれば、ブロックＡとＢとをまとめて９０度回転した画像をえることができる。
【００７６】
これによりトータルの演算回数とループ回数が減るため高速に処理が行える。またパッキングするための変数を複数用意すればさらにループ回数が削減される。
【００７７】
さらに６４ビット演算が可能であるプロセッサを使用するのであればそれだけで処理できるブロック数が倍になる。例えば１画素４ビットの場合だけではなく図１１のような１画素２ビットの場合でも２ブロックを同時に処理するように変更することは容易である。
【００７８】
尚、本発明は、複数の機器から構成されるシステムに適用しても、１つの機器から成る装置に適用しても良い。また、本発明はシステム或は装置にプログラムを供給することによって達成される場合にも適用できることはいうまでもない。
【００７９】
【発明の効果】
以上説明したように、本発明に係る画像処理方法及びその装置によれば、
・複数バイトにまたがるビットマップ上のデータを多バイト長の変数にまとめて格納することにより複数バイトを同時に処理することが可能となり演算回数を減らすことができる。よって複数画素を同時に処理することが可能となるため１画素あたりの演算回数が削減され処理時間を短縮できる。
【００８０】
・従来例では実行するプロセッサを３２ビットから６４ビットに変更しても動作周波数やキャッシュ内容や内部アーキテクチャの向上などの変更がない限り高速にはならない。しかし本発明では演算のビット幅が増えるだけでそれを積極的に利用して高速化がはかれる。そのため従来例との差はさらに大きくすることができる。
【００８１】
・データアクセス回数が少ないのでデータキャッシュが少ないシステムで速度低下が小さい。同様の理由により、データキャッシュを持たないシステムでも速度低下が小さい。
【００８２】
・ブロックの処理順番を回転処理をする元の画像のアドレスが連続する順番で行なうことによりデータキャッシュの容量に対してキャッシュヒット率が高くなり速度が向上する。
【００８３】
・複数のバイトにまたがる画像データを多バイト長の変数にまとめて格納することにより分けて格納するよりも変数のすうを削減でき、それによりコンパイラがその変数をレジスタ変数に割り振ることが容易になるという効果がある。
【００８４】
【図面の簡単な説明】
【図１】本発明の第１の実施例である９０度回転の処理の手順を示す図である。
【図２】第１実施例によるデータパッキングの図である。
【図３】第１実施例の９０度回転処理の手順を示す図である。
【図４】第１実施例のブロック処理の順序を示す図である。
【図５】ブロックの９０度回転の従来例の手順を示す図である。
【図６】ブロックの１８０度回転の従来例の手順を示す図である。
【図７】第２実施例のブロックの１８０度回転の手順を示す図である。
【図８】第２実施例の１８０度回転を説明する図である。
【図９】第４実施例の２７０度回転の手順を示す図である。
【図１０】第１実施例の処理を６４ビットプロセッサで行う場合の処理の手順を示す図である。
【図１１】第５実施例による１画素が２ビットの場合のブロックを９０度回転する処理手順を示す図である。
【図１２】第６実施例による複数ラインを同時処理する場合の処理手順を示す図である。
【図１３】第７実施例による、複数ブロックを同時に処理場合の処理手順を示す図である。
【図１４】実施例の画像処理を行うシステムの構成図である。
【図１５】本発明の第１の実施例である９０度回転の処理の手順を示す図である。
【図１６】本発明の第１の実施例である９０度回転の処理の手順を示す図である。
【図１７】本発明の第２の実施例である１８０度回転の処理の手順を示す図である。
【図１８】本発明の第２の実施例である１８０度回転の処理の手順を示す図である。
【図１９】第５実施例による１画素が２ビットの場合のブロックを９０度回転する様子を示す図である。
【図２０】第５実施例による１画素が２ビットの場合のブロックを９０度回転するもうひとつの様子を示す図である。
【図２１】第７実施例による、複数ブロックを同時に処理する場合のブロックの様子を示す図である。
[0001]
[Industrial application fields]
The present invention relates to an image processing method and apparatus in which processor software renders a bitmap image rotated by 90° × n (n = 1, 2, 3) from the image in an arbitrary region of a given bitmap.
[0002]
[Prior art]
Conventionally, the following methods have been used for bitmap rotation, in which processor software renders a bitmap image rotated by 90° × n (n = 1, 2, 3) from the image in an arbitrary region of a given bitmap. In what follows, rotation angles are measured counterclockwise, and the bit 7 side is the left pixel unless otherwise specified.
[0003]
FIG. 5 is an excerpt of the main part of the conventional 90 ° rotation.
[0004]
The rotation process takes a block of 8 bytes, 1 byte × 8 lines (8 × 8 pixels), as its unit of processing. In the figure, p_from and p_to are the input and output pointers of the block being processed, and an index [n] designates the address offset by n bytes. For example, p_from[2] designates the address 2 bytes past p_from[0]. row_from and row_to are the number of bytes per line of the image before and after rotation, respectively, and work is a working variable for the rotation process.
[0005]
Step S51 is initialization, in which variables such as a pointer indicating the block to be rotated and a pointer indicating the rotated block are initialized.
[0006]
Step S52 is the main body of the rotation process. To generate one block (8 bytes) of rotated data, the 8 bytes of the source block are accessed every time, the pixels are moved one at a time into the variable work so that the rotated data is built up there, and the result is written to the buffer at pointer p_to.
[0007]
In step S52, each column of the pre-processing block (8 rows × 8 columns) is moved to a row of the post-processing block. In step S52-1, the pixels in the rightmost column of the pre-processing block are moved to the top row of the post-processing block. Words such as left, right, up, and down refer to positions in the 8 × 8 pixel block as it would be seen when displayed. The pixel at the top of the rightmost column of the pre-processing block is moved to the left end of the top row of the post-processing block; then, in turn, the pixel below the one just processed in the pre-processing block is placed to the right of the one just placed in the post-processing block. Similarly, in step S52-2, the second column from the right of the pre-processing block is moved to the second row of the post-processing block. In this way, one block is rotated by 90°.
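The per-pixel procedure of step S52 can be sketched in C as follows. FIG. 5 is not reproduced in this text, so this is an illustrative reconstruction from the description above, not the listing in the figure. The function name rotate90_naive and the loop structure are assumptions; p_from, p_to, row_from, row_to, and work follow the names used in the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate one 8x8-pixel block (1 byte x 8 lines) 90 degrees counterclockwise,
 * one pixel at a time. Bit 7 of each byte is the left pixel.
 * rotate90_naive is a hypothetical name used only for this sketch. */
void rotate90_naive(const uint8_t *p_from, size_t row_from,
                    uint8_t *p_to, size_t row_to)
{
    for (size_t out_row = 0; out_row < 8; out_row++) {
        uint8_t work = 0;
        /* Output row k is the k-th column from the right, i.e. bit k of
         * every source byte. All 8 source bytes are read for each output
         * byte, which is the cost the text points out. */
        for (size_t in_row = 0; in_row < 8; in_row++) {
            uint8_t pixel = (uint8_t)((p_from[in_row * row_from] >> out_row) & 1u);
            /* The top source pixel ends up at the left end (bit 7). */
            work = (uint8_t)((work << 1) | pixel);
        }
        p_to[out_row * row_to] = work;
    }
}
```

Each output byte here costs 8 reads and roughly 8 shift/mask/OR steps, so one block needs 64 byte reads.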
[0008]
When step S52 ends, the process returns to step S51 to process the next block.
[0009]
FIG. 6 shows conventional examples of 180-degree rotation. FIG. 6(a) uses a 256-byte lookup table (LUT), and FIG. 6(b) repeats copy and paste one pixel at a time, as in the conventional 90-degree example. In the LUT example, the pre-processing block is converted row by row, that is, one byte at a time, through the LUT "table", and the results form the converted block. Each entry of the LUT "table" holds its own byte offset from the start of the table, taken as a binary number and rotated by 180°. For example, the entry at offset 0x05 of "table" holds 0xA0, which is 0x05 rotated by 180 degrees. Therefore, in step S602, the location in the post-processing block indicated by pointer p_to receives the byte indicated by pointer p_from, rotated by 180 degrees. After each row is processed, pointer p_from is advanced by one row and pointer p_to is moved back by one row; after eight rows, the rotation of one block is complete.
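A minimal sketch of the LUT approach, assuming the table is filled at start-up rather than stored as constant data. The names init_table and rotate180_lut are hypothetical; only the identifier table and the pointer names come from the text, and FIG. 6(a) itself is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* 256-entry table: table[v] is v with its 8 bits in reverse order. */
uint8_t table[256];

/* Hypothetical helper: build the bit-reversal table once. */
void init_table(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t r = 0;
        for (int b = 0; b < 8; b++) {
            if (v & (1 << b)) {
                r |= (uint8_t)(0x80 >> b);   /* bit b moves to bit 7-b */
            }
        }
        table[v] = r;
    }
}

/* Rotate one 1-byte x 8-line block by 180 degrees: each source row is
 * bit-reversed through the table and stored from the bottom row upward. */
void rotate180_lut(const uint8_t *p_from, size_t row_from,
                   uint8_t *p_to, size_t row_to)
{
    for (size_t i = 0; i < 8; i++) {
        p_to[(7 - i) * row_to] = table[p_from[i * row_from]];
    }
}
```

Every converted byte costs one extra memory read for the table lookup, which is why the text notes that this approach benefits from a data cache.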
[0010]
In FIG. 6(b), step S612 mirrors one row (1 byte) of the pre-processing block left to right, that is, rotates it by 180 degrees. Doing this for each row of the pre-rotation block from the top, and placing the results into the post-rotation block from the bottom, yields a block rotated by 180 degrees.
[0011]
Incidentally, it is also conceivable to process using the LUT even in the case of 90-degree rotation.
[0012]
[Problems to be solved by the invention]
However, the conventional methods are slow because they process one bit at a time, and processing a bit requires accessing the data in a unit that contains the target bit, for example a byte. Generating 1 byte of rotated data therefore requires 8 bytes of data access every time, which slows processing further. This is even more pronounced in systems without a data cache. Moreover, although a 16-bit processor can perform 16-bit operations and a 32-bit processor can perform 32-bit operations, only 8 bits (1 byte) are actually used in each operation, which is wasteful.
[0013]
When an LUT is used, the number of data accesses increases, so a data cache is desirable for high-speed processing. Suppose the table is enlarged to gain further speed, for example by converting 2 bytes at a time instead of 1 byte at a time so as to reduce the number of LUT accesses. The table then becomes 512 times the size of the 1-byte table, while the number of LUT accesses is only halved, which is extremely inefficient. In a system with a data cache, the cache hit rate may even drop and make processing slower.
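The factor of 512 follows from counting entries and entry widths, assuming each table entry is as wide as the value it converts:

```latex
\begin{aligned}
\text{1-byte LUT:}\quad & 2^{8}\ \text{entries} \times 1\ \text{byte} = 256\ \text{bytes} \\
\text{2-byte LUT:}\quad & 2^{16}\ \text{entries} \times 2\ \text{bytes} = 131072\ \text{bytes} = 512 \times 256\ \text{bytes} \\
\text{LUT accesses per 2 bytes of image data:}\quad & 2 \rightarrow 1
\end{aligned}
```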
[0014]
The present invention has been made in view of the above conventional examples, and its object is to provide any of the following:
・An image processing method and apparatus that store the data of a block split across multiple bytes together in a multi-byte variable and process it there, so that one operation handles several pixels and the total number of operations is reduced; or
・An image processing method and apparatus that reduce the number of operations by rearranging the pixels for rotation in stages, starting from small parts; or
・An image processing method and apparatus that process the blocks in the order in which the addresses of the source image are contiguous, which raises the cache hit rate for a given data cache capacity and improves speed.
[0015]
[Means for Solving the Problems]
In order to achieve the above object, the image processing method of the present invention has the following configuration. That is, it is an image processing apparatus that rotates image data of 8 bits × 8 lines by 90 degrees, comprising: first processing means for storing the contiguous image data of the upper half of the 8 lines in a first memory and the contiguous image data of the lower half of the 8 lines in a second memory; second processing means for storing in a third memory the logical OR of (a) the image data stored in the first memory, shifted by N bits (N being at most 7) and masked except for first bits of interest spaced 8 bits apart, and (b) the image data stored in the second memory, shifted by M bits (M differing from N by 4) and masked except for second bits of interest located M-N bits away from the first bits of interest; third processing means for taking the logical OR of the image data stored in the third memory and the same image data shifted by a predetermined amount, so as to gather the image data left unmasked by the second processing means into 1 byte; and means for executing eight times the processing by the second processing means, with the shift amounts N and M each changed by 1 in turn, together with the processing by the third processing means.
[0017]
[First embodiment]
FIG. 14 is a configuration diagram of a system that implements the image rotation processing of an embodiment of the present invention. In FIG. 14, the CPU 1 executes a program stored in memory to control the entire system and to perform processing such as image rotation. The ROM 2 stores the control programs of the CPU 1 (including the programs for the flowcharts described later) and data. The RAM 3 stores the work data of the CPU 1, the image to be rotated, and the rotated image. The scanner 4 inputs images. The display 5 displays images. The storage device 6 consists of a magnetic disk device or the like and stores images, programs, and other data. The printer 7 prints images and other output. The input/output interface (I/F) 8 connects to external devices for input and output. Among these blocks, the following make use of the image rotation described later.
[0018]
<Printer>
First, page description language data input from the input/output interface 8 or elsewhere is stored in the RAM 3. The CPU 1 interprets the data while drawing into the RAM 3, and sends the drawn bitmap data to the printer 7 to be printed on paper. For the printer 7, image rotation is performed when bitmap data is rotated, when a bitmap font is rotated, or when A4 output is produced on an A3 printer.
[0019]
<Filing device>
In FIG. 14, the storage device 6 corresponds to the filing device. In an intelligent filing device that stores a document image read by the scanner 4 and performs character recognition on it, the document image must be rotated when the document is not in the correct orientation.
[0020]
<CPU>
An image read from the scanner 4 or the storage device 6 is stored in the RAM 3, rotated, and then, for example, displayed on the display 5.
[0021]
In this way, image rotation processing is performed as necessary. This rotation process is realized by the following procedure.
[0022]
<Rotation procedure>
FIG. 1 is a diagram that best represents image rotation processing according to the present invention.
[0023]
The processing in this embodiment takes a block of 1 byte × 8 lines (8 × 8 pixels) as its unit. The variables p_from and p_to are the input and output pointers of the block being processed. The variables row_from and row_to are the number of bytes per line of the image before and after rotation, respectively. For example, if p_from points to the top row of a block, the next row of that block is at the address obtained by adding row_from, the size of one line of the image, to the address of the row above it. The variables work, tmp1, and tmp2 are working variables for the rotation process. Here, uint32 is assumed to be predefined as a 4-byte integer type.
[0024]
In the figure, step S11 is initialization, in which variables such as the read and write pointers of the block to be processed are initialized. That is, the address of the pre-rotation block is stored in the variable p_from, and the address of the post-rotation block is stored in the variable p_to.
[0025]
Step S12 is a packing process that reads all the data of the block to be processed and stores it together in variables longer than 1 byte. As shown in FIG. 2(a), the byte data of the 1 byte × 8 lines is divided into the upper four rows and the lower four rows, which are stored in order in the variables tmp1 and tmp2, respectively.
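The packing of step S12 might look like the following sketch. FIG. 2(a) is not reproduced in this text, so the byte order is an assumption: the top row is placed in the most significant byte, which is the order consistent with the masks and shifts described for FIG. 3. The function name pack_block and the typedef of uint32 are likewise assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uint32;  /* assumed definition of the 4-byte integer type */

/* Pack a 1-byte x 8-line block into two 32-bit variables.
 * tmp1 receives the upper four rows and tmp2 the lower four rows,
 * each with its topmost row in the most significant byte (assumed order). */
void pack_block(const uint8_t *p_from, size_t row_from,
                uint32 *tmp1, uint32 *tmp2)
{
    *tmp1 = ((uint32)p_from[0 * row_from] << 24)
          | ((uint32)p_from[1 * row_from] << 16)
          | ((uint32)p_from[2 * row_from] << 8)
          |  (uint32)p_from[3 * row_from];
    *tmp2 = ((uint32)p_from[4 * row_from] << 24)
          | ((uint32)p_from[5 * row_from] << 16)
          | ((uint32)p_from[6 * row_from] << 8)
          |  (uint32)p_from[7 * row_from];
}
```

After this step the 8 source bytes have each been read once, and the image memory does not need to be read again for this block.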
[0026]
Step S13 generates the rotated data in sequence from the variables tmp1 and tmp2 and writes it out.
[0027]
FIG. 3 shows an outline of the processing of the bitcombine () macro, which is the main part of step S13.
[0028]
By step S12, the pixel data of 1 byte × 8 lines (lines 7 to 0) has been packed, 4 bytes each, into the variables tmp1 and tmp2. Step S13 takes the pixel in the LSB (least significant bit) of each byte, from line 7 through line 0, and produces 1 byte of rotated data from them. After a 90-degree rotation, these LSBs, arranged in numerical order, form one row. For convenience of explanation, the LSB of each byte is labeled with the number of its line. A question mark indicates a bit whose value does not matter.
[0029]
First, operation 1 is a logical operation that extracts the pixel data LSB7 to LSB0 from the variables tmp1 and tmp2 shown in state 1 and stores them in the variable work. The contents of tmp1 are shifted left by 4 bits and ANDed with the hexadecimal mask mask2 = 0x10101010. The value from tmp1 is thus shifted left by 4 bits, and every bit other than the original LSBs 7, 6, 5, and 4 becomes 0. The contents of tmp2 are ANDed with the hexadecimal mask mask1 = 0x01010101, so every bit of tmp2 other than the LSBs becomes 0. The OR of the shifted and masked tmp1 and the masked tmp2 is stored in work, which gives the value shown in state 2. At this point, the relative positions of bits 7 and 3, of 6 and 2, of 5 and 1, and of 4 and 0 are the same as they will be after rotation.
[0030]
Next, operation 2 ORs the value of work shown in state 2 with the same value shifted right by 7 bits. The variable work is thereby rearranged as in state 3, in which LSBs 7, 6, 3, and 2 are positioned relative to one another, and LSBs 5, 4, 1, and 0 relative to one another, as they will be after rotation.
[0031]
Finally, the operation 3 calculates the logical sum of the value of the variable work indicated as the state 3 and the value obtained by shifting it to the right by 14 bits. As a result, the variable work is rearranged as in state 4, and the pixel data of the first row after rotation is obtained in the least significant 8 bits of the variable work.
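Operations 1 to 3 for the LSB column can be written out as the following sketch. The masks and shift amounts are the ones given in the text; the function name combine_lsb_column is hypothetical, and the packing order (line 7 in the top byte of tmp1, line 0 in the bottom byte of tmp2) is the assumption under which the result comes out in line order.

```c
#include <stdint.h>

/* Gather the LSB of each of the 8 packed bytes into one output byte.
 * tmp1 holds lines 7..4 and tmp2 holds lines 3..0, most significant byte
 * first (assumed packing). combine_lsb_column is a hypothetical name. */
uint8_t combine_lsb_column(uint32_t tmp1, uint32_t tmp2)
{
    const uint32_t mask1 = 0x01010101u;
    const uint32_t mask2 = 0x10101010u;

    /* Operation 1: LSBs of lines 7..4 land on bits 28, 20, 12, 4;
     * LSBs of lines 3..0 stay on bits 24, 16, 8, 0. */
    uint32_t work = ((tmp1 << 4) & mask2) | (tmp2 & mask1);

    /* Operation 2: fold the value onto itself 7 bits lower. */
    work |= work >> 7;

    /* Operation 3: fold again 14 bits lower; the low byte is now complete. */
    work |= work >> 14;

    return (uint8_t)work;   /* bit k of the result is the LSB of line k */
}
```

Counting the shift, two ANDs, and one OR of operation 1, plus a shift and an OR in each of operations 2 and 3, this comes to 8 operations for 8 pixels.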
[0032]
The above operations turn the rightmost column of the pre-rotation block into the top row of the post-rotation block. To rotate the whole block, the same operations are repeated while the bit treated as the LSB in the description above is moved toward the upper bits one pixel at a time. To move to the next pixel, the shift amounts applied to tmp1 and tmp2 in operation 1 are reduced step by step, that is, the data is shifted one bit further to the right each time, so that the target bits land where bits 0 to 7 are in state 2. The lower byte of state 4 obtained in this way is a row rotated by 90 degrees.
[0033]
FIGS. 15 and 16 present the flowchart of FIG. 1 in a more explanatory form. In FIG. 15, step S151 corresponds to step S11 and performs the process of FIG. 2(a). Steps S152 to S155 correspond to step S13. Step S13 is written with a separate logical operation for each column, whereas FIG. 15 generalizes the column number as n and therefore adds a loop. FIG. 16 explains step S153 in detail: it describes the processing of the macro bitcombine() of FIG. 1, that is, the procedure explained for FIG. 3, following FIG. 3 step by step.
[0034]
In FIG. 16, the packed 32-bit tmp1 is first shifted left by (4−n) bits, and all bits other than the 4th, 12th, 20th, and 28th are set to 0 (step S161). Here, n indicates the column (digit) to be rotated. When 4−n is negative, the shift direction is reversed. Next, in step S162, tmp2 is shifted right by n bits, and all bits other than the 0th, 8th, 16th, and 24th are set to 0. In step S163, the logical sum of the results obtained in steps S161 and S162 is calculated. This is the operation 1 in FIG.
[0035]
Steps S164 to S166 are the contents as described in operations 2 and 3, respectively.
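The generalized procedure for column n, applied to all eight columns of a block, might look like the sketch below. The names bitcombine_col and rotate8x8_ccw are hypothetical, and the packing order is the same assumption as before (first line in the most significant byte of tmp1); the rotation is counterclockwise with bit 7 as the left pixel, as stated for this document.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one rotated row from column n (bit n of each line). */
static uint8_t bitcombine_col(uint32_t tmp1, uint32_t tmp2, int n)
{
    assert(n >= 0 && n <= 7);
    /* S161: shift tmp1 left by (4 - n); a negative amount shifts right. */
    uint32_t pat1 = (n <= 4) ? (tmp1 << (4 - n)) : (tmp1 >> (n - 4));
    pat1 &= 0x10101010u;
    /* S162: shift tmp2 right by n and keep bit 0 of each byte. */
    uint32_t pat2 = (tmp2 >> n) & 0x01010101u;
    /* S163: operation 1. */
    uint32_t work = pat1 | pat2;
    /* Operations 2 and 3. */
    work |= work >> 7;
    work |= work >> 14;
    return (uint8_t)work;
}

/* Rotate a 1-byte x 8-line block 90 degrees counterclockwise. */
static void rotate8x8_ccw(const uint8_t src[8], uint8_t dst[8])
{
    /* Pack once; every column is then extracted from registers. */
    uint32_t tmp1 = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
                    ((uint32_t)src[2] << 8) | (uint32_t)src[3];
    uint32_t tmp2 = ((uint32_t)src[4] << 24) | ((uint32_t)src[5] << 16) |
                    ((uint32_t)src[6] << 8) | (uint32_t)src[7];
    for (int n = 0; n < 8; n++)
        dst[n] = bitcombine_col(tmp1, tmp2, n); /* column n -> row n */
}
```

The source bytes are read only during packing; the loop body touches memory only to write the rotated row.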
[0036]
In this way, the lines of the block are not handled one at a time; they are first packed into continuous data and then subjected to logical operations, so that the pixel rearrangement proceeds in parallel at a plurality of places instead of one place at a time, which makes the processing faster.
[0037]
More specifically, because 4 bytes are packed into each of the variables tmp1 and tmp2, masking out the required pixels from the data of 1 byte × 8 lines extracts a plurality of pixels (4 pixels in this example) with a single logical operation (AND operation). In addition, since the data in the variables tmp1 and tmp2 are packed in an arrangement that matches the rotated bits, the pixels can be rearranged stepwise as shown in operations 2 and 3.
[0038]
The variables tmp1 and tmp2 can be reused not only for bit 0 as shown in FIG. 3 but also when extracting the rotated data of every other digit, so the remaining 7 bytes (lines) of rotated data can be generated without further access to the source data. For this reason, even on a system without a data cache, if tmp1 and tmp2 are assigned to register variables, the rotated data can be generated without memory access, which is very fast. In addition, packing a plurality of image data reduces the number of variables, which makes assignment to register variables easier. Even if assignment to register variables is not possible, the number of data accesses falls to 1/4 (8 accesses become 2 because 1-byte accesses become 4-byte accesses), so the processing is still fast.
[0039]
In detail, to rotate one block of 1 byte × 8 lines, each byte of data is read once for packing, 8 or 9 operations are needed to generate each byte of rotated data from the packed data, and each generated byte is written once. In the example of FIG. 3, operation 1 consists of 4 operations, and operations 2 and 3 consist of 2 operations each. That is, the number of accesses to the image data is minimized, and the rotation process requires only about one operation per pixel.
[0040]
When step S13 ends, the process returns to step S11 to process the next block. The next block may be the block adjacent to the left or right of the block just processed, or the block above or below it. Consider the case where the 25 blocks a to y, each 1 byte × 8 lines, are processed in the order 1 to 25 as shown in FIG. 4A, and the case where blocks a to y are processed in the order 1 to 25 as shown in FIG. When a data cache is used, it is a property of the data cache that, on a processor with a 32-bit data bus, even a 1-byte read takes into the cache memory at least the 4 bytes on the 32-bit data bus that include that byte. Therefore, for example, when one byte in block m is read, the data of blocks l and n are also taken into the cache memory. The data cache effect of block a thus appears when block b is processed. Suppose the blocks are processed in the order a, f, k, p, u, b, ... as shown in FIG. Then the contents of the data cache are updated while blocks f, k, p, and u are processed, and the probability of a cache miss when block b is processed increases. For this reason, the data cache hit rate is higher when processing is performed in the order as shown in FIG. In other words, that order is effective even with a small-capacity data cache. Therefore, if a block consists of lines as shown in the upper left block 41 of FIG. 4C, it is desirable, after processing the shaded block, to process the block to its left or right (in the direction of the circles) next. That is, the blocks are processed in the order in which the addresses of the original image to be rotated are consecutive.
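The address-consecutive block order can be illustrated with a whole-image loop. The following is a sketch under stated assumptions: the image width and height are multiples of 8, bit 7 is the left pixel, the rotation is counterclockwise, and the names rotate_block and rotate_image_ccw are hypothetical. The inner loop steps through horizontally adjacent source blocks, whose bytes sit at consecutive addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate one 1-byte x 8-line block 90 degrees counterclockwise.
 * row_from / row_to are the bytes per line before / after rotation. */
static void rotate_block(const uint8_t *p_from, size_t row_from,
                         uint8_t *p_to, size_t row_to)
{
    uint32_t tmp1 = 0, tmp2 = 0;
    for (int i = 0; i < 4; i++) {
        tmp1 = (tmp1 << 8) | p_from[(size_t)i * row_from];
        tmp2 = (tmp2 << 8) | p_from[(size_t)(i + 4) * row_from];
    }
    for (int n = 0; n < 8; n++) {
        uint32_t pat1 = (n <= 4) ? (tmp1 << (4 - n)) : (tmp1 >> (n - 4));
        uint32_t work = (pat1 & 0x10101010u) | ((tmp2 >> n) & 0x01010101u);
        work |= work >> 7;
        work |= work >> 14;
        p_to[(size_t)n * row_to] = (uint8_t)work;
    }
}

/* Rotate a 1-bit-per-pixel image; width and height are in pixels. */
void rotate_image_ccw(const uint8_t *src, size_t width, size_t height,
                      uint8_t *dst)
{
    assert(width % 8 == 0 && height % 8 == 0);
    size_t row_from = width / 8;  /* bytes per source line */
    size_t row_to = height / 8;   /* bytes per destination line */
    for (size_t by = 0; by < height / 8; by++) {
        /* Inner loop: left-to-right, i.e. consecutive source addresses. */
        for (size_t bx = 0; bx < row_from; bx++) {
            const uint8_t *p_from = src + by * 8 * row_from + bx;
            /* Source block column bx becomes destination block row
             * (row_from - 1 - bx); block row by becomes block column by. */
            uint8_t *p_to = dst + (row_from - 1 - bx) * 8 * row_to + by;
            rotate_block(p_from, row_from, p_to, row_to);
        }
    }
}
```

Swapping the two loops would visit the blocks column by column, which is the order the paragraph above expects to cause more cache misses on the source side.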
[0041]
FIG. 1 shows the case where all the data of one block of 1 byte × 8 lines is available. When some of the data is missing, as at the edge of the image, the following may be done.
[0042]
First, when fewer than 8 lines of data are available, the missing portions of the variables tmp1 and tmp2 are filled with blanks in step S12 (or step S151). When fewer than eight pixels are available within the byte, step S13 contains eight bitcombine() macros, but control may be performed so that the macros corresponding to the missing portion are not executed. This corresponds to not executing the process of step S153 for the missing values of n.
[Variation of the above embodiment]
Even if the macro bitcombine() of the above embodiment, that is, the processing procedure shown in FIG. 16, is changed as follows, the same result can be obtained.
[0043]
In step S161, the bits other than the 4th, 12th, 20th, and 28th bits of the shifted 32-bit variable tmp1 were set to 0; in this variation they are set to 1 instead. Similarly, the bits other than the 0th, 8th, 16th, and 24th bits, which were set to 0 in step S162, are also set to 1. Further, where step S163 calculated a logical sum, a logical product is calculated instead. As a result of this operation, the blank digits in state 2 of FIG. 3 are filled with 1 instead of 0. The subsequent steps likewise use logical products in place of logical sums, as in the macro below, and are otherwise the same as in FIG.
[0044]
This can be expressed in the same way as bitcombine () in the upper part of FIG.
mask1 = 0xfefefefe;
mask2 = 0xefefefef;
#define bitcombine(pat1, pat2) \
    work = ((pat1) | mask2) & ((pat2) | mask1); \
    work &= work >> 14; \
    work &= work >> 7;
In this example, when the number of lines in one block is less than eight, the missing pixels become 1. A black pixel may be represented by either 0 or 1. Therefore, depending on whether the margin portion is to be white or black, it is preferable to choose between this example and the example of FIG.
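As a check that the 1-filled variant yields the same row as the 0-filled one, the macro can be wrapped in a function and run on a packed block. This is a sketch: the function name bitcombine_and is hypothetical, the shifts applied before the macro are taken from steps S161 and S162, and tmp1 and tmp2 are assumed to be packed with the earliest line in the most significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: AND-based variant of the column extraction. */
static uint8_t bitcombine_and(uint32_t tmp1, uint32_t tmp2, int n)
{
    const uint32_t mask1 = 0xfefefefeu;
    const uint32_t mask2 = 0xefefefefu;
    assert(n >= 0 && n <= 7);
    /* Same shifts as steps S161 and S162. */
    uint32_t pat1 = (n <= 4) ? (tmp1 << (4 - n)) : (tmp1 >> (n - 4));
    uint32_t pat2 = tmp2 >> n;
    /* Non-target digits are forced to 1, then the values are ANDed. */
    uint32_t work = (pat1 | mask2) & (pat2 | mask1);
    work &= work >> 14;
    work &= work >> 7;
    /* Only the least significant byte is meaningful. */
    return (uint8_t)work;
}
```

Because the unsigned right shifts bring in zeros from the top, the upper bytes of work are not usable after the two AND steps; only the low byte carries the rotated row.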
[0045]
[Second embodiment]
[Rotate 180 degrees]
FIG. 7 shows an example in which the image processing apparatus according to the present invention is applied to 180 degree rotation of an image.
[0046]
The variables p_from and p_to are the input and output pointers of the block to be processed. The variables row_from and row_to are the numbers of bytes per horizontal line before and after rotation, and work is a work variable for the rotation processing. The variable rem1 is a value for edge processing and holds the shift amount used when the edge of the image does not fall on a 1-byte boundary. rem2 is fixed at (32−rem1).
[0047]
Step S71 is initialization, in which variables such as pointers for reading and writing of blocks to be processed are initialized. Processing is in units of 4 bytes × 1 line.
[0048]
Step S72 is a packing process in which all data of the block to be processed is read and stored in a variable longer than 1 byte. The packed data is stored in the variable work. If the variable rem1 is not 0, the process "work = (work >> rem1) | ((uint32) p_from[0] << rem2);" in step S72 is performed to align the edges. This expression calculates the logical sum of the value of the variable work shifted right by rem1 digits and the value pointed to by p_from[0], converted to 32 bits and shifted left by rem2 digits. The same processing is also necessary in the conventional example but was omitted there for simplicity. Note that the right shift performed here is a logical shift, in which the digits vacated by the shift are filled with 0.
[0049]
Step S73 is a process of sequentially generating and writing the rotated data from the variables. When step S73 ends, the process returns to step S71 to process the next block.
[0050]
FIG. 8 is a diagram for explaining the bitcombine() macro in step S73 of FIG. To simplify the explanation, the bit rotation of 1 byte of data is described. First, the bit data 7 to 0 are arranged in the variable work as in state 1. Operation 1 exchanges the pixel data 7 and 6, 5 and 4, 3 and 2, and 1 and 0 of the variable work, which becomes state 2. Next, operation 2 rearranges the contents of the variable work by exchanging the pairs 6-7 and 4-5 and the pairs 2-3 and 0-1, and the variable work becomes state 3. Finally, operation 3 exchanges the groups 4-7 and 0-3, the variable work becomes state 4, and the rotation process ends. In other words, the pixels are rearranged stepwise, with a plurality of locations processed by each single operation. The result is the same regardless of the order in which operations 1 to 3 are performed. The variable work to be rotated is a 4-byte variable, and the upper half or the lower half of the block is packed into it in step S72. Therefore, in the process of step S73, the operation shown in FIG. is applied simultaneously to the 4 bytes in the variable work.
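The three exchange steps can be written as shift-and-mask operations on a 32-bit word, reversing the bit order inside each of its four bytes at once. The sketch below assumes the usual masks for this technique (0x55555555, 0x33333333, 0x0f0f0f0f); the exact expressions of the bitcombine() macro in FIG. 7 are not reproduced here, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the bit order within every byte of work. */
static uint32_t reverse_bits_in_bytes(uint32_t work)
{
    /* Operation 1: exchange neighbouring bits (7<->6, 5<->4, ...). */
    work = ((work >> 1) & 0x55555555u) | ((work & 0x55555555u) << 1);
    /* Operation 2: exchange neighbouring 2-bit pairs. */
    work = ((work >> 2) & 0x33333333u) | ((work & 0x33333333u) << 2);
    /* Operation 3: exchange the upper and lower 4 bits of each byte. */
    work = ((work >> 4) & 0x0f0f0f0fu) | ((work & 0x0f0f0f0fu) << 4);
    return work;
}
```

Each step is five operations (two shifts, two ANDs, one OR), and the byte positions themselves are left unchanged; only the bits inside each byte are mirrored.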
[0051]
Calculating this processing time in detail, operations 1 to 3 consist of 5 operations each, for a total of 15 operations. The conventional example of FIG. 6B requires 21 operations, so this alone is faster. Moreover, since the variable work holds 4 bytes that are processed at the same time, the speed is increased further. The number of operations per bit is 15/32, or about 0.5 operations. In addition, since this algorithm does not access extra data as the LUT method does, the loss of speed is small even on a system without a data cache. As can be seen from FIG. 8, the 180° rotation does not necessarily require packing, because the pixels can be rearranged stepwise without packing a plurality of bytes of data.
[0052]
FIG. 7 shows the case where all the data of one block of 4 bytes × 1 line is available. When some of the data is missing, as at the edge of the image, the following may be done. First, when the 4 bytes of data are not complete, the missing portion of the variable work is filled with blanks in step S72. In step S73, the bitcombine() macro performs the same processing as usual, and control is performed so that only the valid portion of the data is written.
[0053]
FIG. 17 illustrates the processing of FIG. 7 in a more explanatory manner, and FIG. 18 illustrates the macro bitcombine in FIG.
[0054]
In FIG. 17, first, the upper 4 bytes of the block to be rotated are packed into the variable work (S171). If the block is incomplete, the missing portion is filled with blanks (S173). Next, for each n-th line in the block, the upper and lower digits are reversed (S175). This is what is shown in FIG. 18 and is the process itself of FIG. The process of step S175 is performed for the four lines packed in the variable work, and when the upper four lines are finished, the process is repeated from step S171 in the same manner for the lower four lines (S179), after which the processing ends.
[0055]
In FIG. 18, first, in step S181, adjacent odd and even digits within the packed 4 bytes are exchanged (operation 1). Then, in step S182, the two bits exchanged in the previous step are treated as a pair, and adjacent odd-numbered and even-numbered pairs are exchanged (operation 2). Similarly, in step S183, each 4 bits consisting of two pairs exchanged in step S182 are treated as a set, and adjacent odd-numbered and even-numbered sets are exchanged (operation 3). This completes the exchange of the upper and lower digits for the packed 4 bytes. If the n-th line in the pre-rotation block is then stored as the (7−n)-th line in the post-rotation block, the rotation of that line is completed.
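The flow of FIGS. 17 and 18 for a complete 1-byte × 8-line block can be sketched as follows. The function names are hypothetical, the bit-exchange masks are the usual ones for this technique rather than values taken from the figures, and the handling of incomplete blocks (S173) is left out.

```c
#include <stdint.h>

/* Steps S181 to S183: mirror the bits inside each packed byte. */
static uint32_t reverse_bits_in_bytes(uint32_t w)
{
    w = ((w >> 1) & 0x55555555u) | ((w & 0x55555555u) << 1);
    w = ((w >> 2) & 0x33333333u) | ((w & 0x33333333u) << 2);
    w = ((w >> 4) & 0x0f0f0f0fu) | ((w & 0x0f0f0f0fu) << 4);
    return w;
}

/* Rotate a 1-byte x 8-line block by 180 degrees. */
static void rotate8x8_180(const uint8_t src[8], uint8_t dst[8])
{
    for (int half = 0; half < 2; half++) {
        /* S171: pack four lines, earliest line in the top byte. */
        uint32_t work = 0;
        for (int i = 0; i < 4; i++)
            work = (work << 8) | src[half * 4 + i];
        /* S175: reverse upper and lower digits of all four lines. */
        work = reverse_bits_in_bytes(work);
        /* Store line n of the source as line (7 - n) of the result. */
        for (int i = 0; i < 4; i++) {
            int n = half * 4 + i;
            dst[7 - n] = (uint8_t)(work >> (8 * (3 - i)));
        }
    }
}
```

The two passes of the outer loop correspond to the upper and lower four lines of the block.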
[0056]
[Third embodiment]
[90 degree rotation when bit 0 is left]
In the first embodiment, 90° rotation was shown for the case where bit 7 is the left pixel; FIG. 9 shows an example where bit 0 is the left pixel. The operation is almost the same as in the first embodiment. However, in step S92 the data packing order is reversed as shown in FIG. 2B, and, relative to step S13 in FIG. 1, only the order in which the bitcombine() macro converts columns into rows is reversed. That is, in the process of FIG. 1 the LSB is the rightmost digit, so the rotation makes the LSB column the first line. In FIG. 9, however, the leftmost digit is the LSB, so, contrary to FIG. 3, the first line is formed from the MSB, which is now the rightmost digit.
[0057]
By doing in this way, even if bit 0 is the left end, 90 degree rotation processing can be performed similarly.
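A sketch of the bit-0-left case is given below. It assumes that "reversed packing" means the eighth line goes into the most significant byte of tmp1 and the fourth line into the most significant byte of tmp2, and that the rows are produced from bit 7 down to bit 0; both points are inferred from what a counterclockwise rotation requires, not read from FIG. 9 or FIG. 2B. The function name is hypothetical.

```c
#include <stdint.h>

/* Rotate a 1-byte x 8-line block 90 degrees counterclockwise when
 * bit 0 is the LEFT pixel of each byte. */
static void rotate8x8_ccw_bit0_left(const uint8_t src[8], uint8_t dst[8])
{
    /* Assumed reversed packing: last line in the top byte of tmp1. */
    uint32_t tmp1 = ((uint32_t)src[7] << 24) | ((uint32_t)src[6] << 16) |
                    ((uint32_t)src[5] << 8) | (uint32_t)src[4];
    uint32_t tmp2 = ((uint32_t)src[3] << 24) | ((uint32_t)src[2] << 16) |
                    ((uint32_t)src[1] << 8) | (uint32_t)src[0];
    for (int n = 0; n < 8; n++) {
        uint32_t pat1 = (n <= 4) ? (tmp1 << (4 - n)) : (tmp1 >> (n - 4));
        uint32_t work = (pat1 & 0x10101010u) | ((tmp2 >> n) & 0x01010101u);
        work |= work >> 7;
        work |= work >> 14;
        /* Reversed row order: bit 7 (rightmost column) is the first row. */
        dst[7 - n] = (uint8_t)work;
    }
}
```

The shift-and-OR core is unchanged from the bit-7-left case; only the packing order and the destination row index differ.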
[0058]
[Fourth embodiment]
[270 degree rotation]
The case where bit 7 is the left pixel is the same as in FIG. 9, and the case where bit 0 is the left pixel is the same as in FIG. The only difference from the 90-degree rotation is the order of blocks to be processed by the pointer initialization in step S11 or step S91. This is clear from the fact that the processing in FIG. 1 and the processing in FIG. 9 are rotating in opposite directions when only the bit arrangement is considered.
[0059]
[Example of speeding up with 64-bit calculation]
Whereas the first embodiment uses 32-bit operations, FIG. 10 shows an example using a processor capable of 64-bit operations. Here, uint64 is an 8-byte integer type and is defined in advance.
[0060]
This process is essentially the same as the process described with reference to FIG. 3, FIG. 15, or FIG. 16. The difference is that, since 64 digits can be processed in parallel, tmp1 and tmp2 of FIG. 3 can be combined into a single variable work. That is, in state 1 of FIG. 3 the block is divided between two 4-byte variables, whereas in step S102 of FIG. 10 the 8 bytes of one block are packed together. In FIG. 10, tmp1 and tmp2 thus correspond to consecutive parts of the variable work, and the value obtained by shifting the variable work right by 32 digits (4 bytes) corresponds to the variable tmp1. Operation 1 in FIG. 3 therefore corresponds to shifting the upper 32 bits of the variable work by 4 digits and taking the logical sum with the lower 32 bits (including a masking process for non-target digits).
[0061]
As described above, in the first embodiment the number of operations in the bitcombine() macro is 8 or 9, but in FIG. 10 of this embodiment the shift and logical sum of operation 1 in FIG. 3 are applied to a single packed variable, so the number of operations is further reduced to 7 or 8.
[0062]
In this way, the conventional algorithm performs its processing in units of 8 bits regardless of the number of digits the processor can process in parallel, so it leaves no room for speeding up even as processors evolve. With the present method, by contrast, the speed can be increased further as the number of digits processed in parallel grows.
[0063]
In this embodiment, there is the advantage that the variables tmp1, tmp2, mask1, and mask2 of the first embodiment are reduced to the variables tmp and mask. In the bitcombine() macro of FIG. 10, the part "work |= work >> 28; work |= work >> 14; work |= work >> 7;" is in the reverse order to FIG., but the result is the same. That is, the processing in quotation marks corresponds to operations 1 to 3 described with reference to FIG. 3, in which the digits are shifted and overlaid, and the same result is obtained even when these operations are performed in the reverse order.
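With a 64-bit word the whole block fits in one variable, and the three folding shifts quoted above do the rest. The sketch below is an illustration under assumptions: it uses the standard uint64_t type in place of the document's uint64, assumes the first line is packed into the most significant byte, and uses a hypothetical function name.

```c
#include <stdint.h>

/* Rotate a 1-byte x 8-line block 90 degrees counterclockwise
 * (bit 7 = left pixel) using 64-bit operations. */
static void rotate8x8_ccw_64(const uint8_t src[8], uint8_t dst[8])
{
    const uint64_t mask = UINT64_C(0x0101010101010101);
    uint64_t tmp = 0;
    /* Pack all eight lines; src[0] ends up in the top byte. */
    for (int i = 0; i < 8; i++)
        tmp = (tmp << 8) | src[i];
    for (int n = 0; n < 8; n++) {
        /* Bring column n to bit 0 of each byte and isolate it. */
        uint64_t work = (tmp >> n) & mask;
        /* Fold the upper 32 bits onto the lower ones, 4 digits apart. */
        work |= work >> 28;
        work |= work >> 14;
        work |= work >> 7;
        dst[n] = (uint8_t)work;
    }
}
```

Per row this is one shift, one AND, and three shift/OR pairs, which matches the count of 8 operations (7 when n is 0).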
[0064]
[Fifth embodiment]
[Multi-valued image rotation]
FIG. 11 shows a 90-degree rotation processing method for the case of 2 bits per pixel. One byte × 4 lines are processed as one block, and the rotation processing of the first embodiment is performed with 2 bits as one unit. That is, in step S112, the 4 lines (32 bits) are packed into one variable tmp. FIG. 19 is a schematic diagram of the rotation process that forms the first line after rotation in step S113 of FIG. In step S112, the variable tmp is packed as in state 1 in FIG. Thereafter, the data is shifted right by 6 digits, and all but the 2 rightmost digits of each byte are masked with 0 (operation 1). The 6-digit shift is used here because the upper 2 bits are the target of processing. For the lowest 2 bits no shift is needed, and for bits 5 and 4 a shift of four digits to the right suffices. In this way, the data of state 2 is obtained. This state is the same regardless of the number of lines.
[0065]
Next, the state 2 data is shifted right by 6 digits, and the logical sum with the state 2 data is calculated (operation 2) to obtain the state 3 data. Further, the state 3 data is shifted right by 12 digits, and the logical sum with the state 3 data is calculated (operation 3) to obtain the state 4 data. The least significant byte of this data is one line after rotation.
[0066]
If the above operation is performed for all four lines and each line obtained is stored as the line corresponding to its digit before processing, image data rotated by 90 degrees is obtained. In FIG. 19, the rotation is performed so that the upper digits become the upper line and the lower digits become the lower line. As described above, the procedure is basically the same as that of the first embodiment, except that the unit handled is 2 bits instead of 1 bit.
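Operations 1 to 3 for 2-bit pixels can be sketched as a function that gathers one pixel column from the packed word. The name gather_2bpp and the parameter col (0 for bits 1 and 0, 3 for bits 7 and 6) are hypothetical. The sketch makes no assumption about which line sits in which byte of tmp, since that order is given by FIG. 19 and determines the direction of rotation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: collect the 2-bit pixel `col` of each of the four
 * packed bytes into one byte.  Pixel k of the result (bits 2k+1..2k)
 * comes from byte k of tmp, counting bytes from the least significant. */
static uint8_t gather_2bpp(uint32_t tmp, int col)
{
    assert(col >= 0 && col <= 3);
    /* Operation 1: move the target pixel to the 2 lowest bits of each
     * byte (shift by 6 for the upper 2 bits) and mask the rest. */
    uint32_t work = (tmp >> (2 * col)) & 0x03030303u;
    /* Operation 2 */
    work |= work >> 6;
    /* Operation 3 */
    work |= work >> 12;
    return (uint8_t)work;
}
```

Calling it for col = 3, 2, 1, 0 yields the four lines of the rotated block, in the order the upper digits come first.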
[0067]
As described above, the present invention can also be applied to multi-value images.
[0068]
[Sixth embodiment]
[Multi-line processing]
FIG. 12 shows a 90° rotation processing method for the case of 2 bits per pixel. Using the same variables tmp1 and tmp2 as in FIG. 1, the process "tmp1 |= tmp1 << 14; tmp2 |= tmp2 << 14;" in step S122 avoids repeating the same operation in step S123. As a result, the bitcombine() macro generates two lines of rotated data in the variable work at once.
[0069]
FIG. 20 is a diagram showing this state. First, in step S122, the first line (indicated by symbol a) and the second line (symbol b) of the pre-rotation block are set in the variable tmp2 as in state 1. That is, the second line and the first line are stored in that order in the lower 16 bits of tmp2 (the other digits being set to 0), and the logical sum of this value and the value obtained by shifting it left by 14 bits is stored in the variable tmp2. Similarly, the variable tmp1 stores the third line and the fourth line.
[0070]
For these state 1 variables, first, the value of tmp2 is shifted right by 4 digits and everything other than the lower 2 bits of each byte is set to 0, and the value of tmp1 has everything other than the 3rd and 4th bits from the top of each byte set to 0 (operation 1). It is desirable to keep the values of the variables tmp1 and tmp2 themselves unchanged for subsequent processing.
[0071]
For the two values of state 2 obtained in this way, their logical sum is calculated (operation 2) to obtain the value of state 3, and the logical sum of the value of state 3 and the value obtained by shifting it right by 6 digits is calculated (operation 3) to obtain the value of state 4.
[0072]
In state 4, two lines of the pre-rotation block rotated by 90 degrees in units of 2 bits are obtained. That is, the first line is obtained as the second byte, and the second line as the fourth byte.
[0073]
The above operation is also performed for the third and fourth lines after rotation. For that purpose, the two values given as state 2 are each shifted left by 4 digits relative to FIG. 20, so that bits 3 and 2 and bits 1 and 0 are processed in place of bits 7 and 6 and bits 5 and 4 as shown in FIG. As a result, the third and fourth lines after rotation are obtained by operations 2 and 3.
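One possible reading of this two-lines-at-once procedure is sketched below. It is a reconstruction from the description, not the code of FIG. 12: the function name gather_two_lines and the parameter pair are hypothetical, lines a and b are assumed to go into tmp2 and lines c and d into tmp1 with the earlier line in the lowest byte, and the result is described only as a gather of pixel columns, without asserting a rotation direction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for 2 bits per pixel, lines a..d of one block.
 * pair = 1 handles the pixel columns at bits 7-6 and 5-4,
 * pair = 0 the columns at bits 3-2 and 1-0.
 * In the result, byte 2 (bits 23..16) holds the higher column and
 * byte 0 (bits 7..0) the lower one, each as d,c,b,a from the top bits.
 * Bytes 1 and 3 hold leftovers and must be ignored. */
static uint32_t gather_two_lines(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                                 int pair)
{
    assert(pair == 0 || pair == 1);
    /* Step S122: pack two lines, then duplicate them 14 bits higher. */
    uint32_t tmp2 = ((uint32_t)b << 8) | a;
    uint32_t tmp1 = ((uint32_t)d << 8) | c;
    tmp2 |= tmp2 << 14;
    tmp1 |= tmp1 << 14;
    /* Operation 1: select the target pixels of both columns at once. */
    uint32_t lo = (pair ? (tmp2 >> 4) : tmp2) & 0x03030303u;
    uint32_t hi = (pair ? tmp1 : (tmp1 << 4)) & 0x30303030u;
    /* Operation 2: combine; operation 3: fold neighbouring bytes. */
    uint32_t work = lo | hi;
    work |= work >> 6;
    return work;
}
```

The 14-bit duplication places the next pixel column exactly two bytes above the current one, which is why a single mask picks up both columns.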
[0074]
[Seventh embodiment]
[Multiple block processing]
FIG. 13 shows a 90-degree rotation processing method for the case of 4 bits per pixel. According to the first embodiment, when one pixel is 4 bits, 1 byte × 2 lines form a block; but if such a block is packed as it is into a 32-bit variable, 2 bytes remain blank and half of each operation is wasted. Therefore, 2 bytes × 2 lines, that is, 2 blocks, are packed into the variable tmp and processed together. This packing is performed in step S132. Here, the data is packed into the 4-byte variable tmp in the order: first line of the second block, second line of the second block, first line of the first block, second line of the first block.
[0075]
FIG. 21 shows this state. First, consider two blocks A and B with 4 bits per pixel, having pixels a1 to a4 and b1 to b4 as shown in FIG. These are packed as in state 1 in step S132. By operation 1, every other pixel (4 bits), starting from the lowest, is kept in the 32-bit variable tmp. Similarly, the pixels not kept in tmp are shifted four digits to the right and stored in work. This results in state 2. The logical sum of the value of tmp thus obtained and the value obtained by shifting it right by 4 digits is calculated and stored in the variable tmp (operation 3). The same is done for the variable work. If the second byte of work, the second byte of tmp, the fourth byte of work, and the fourth byte of tmp in state 3 obtained in this way are stored as the two blocks after rotation, images in which blocks A and B are both rotated by 90° are obtained.
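The two-block procedure can be sketched as follows. The function name is hypothetical, and the sketch assumes that the upper 4 bits of a byte are the left pixel and that the rotation is counterclockwise; the pixel order actually drawn in FIG. 21 may differ, in which case the roles of tmp and work in the final stores would be swapped.

```c
#include <stdint.h>

/* Rotate two horizontally adjacent blocks A and B, each 1 byte x 2 lines
 * at 4 bits per pixel (2 x 2 pixels), 90 degrees counterclockwise. */
static void rotate_two_blocks_4bpp(const uint8_t a[2], const uint8_t b[2],
                                   uint8_t a_rot[2], uint8_t b_rot[2])
{
    /* S132: pack B line 1, B line 2, A line 1, A line 2 from the top. */
    uint32_t tmp = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                   ((uint32_t)a[0] << 8) | (uint32_t)a[1];
    /* Left pixels (upper 4 bits), moved down four digits. */
    uint32_t work = (tmp >> 4) & 0x0f0f0f0fu;
    /* Right pixels (lower 4 bits) stay in tmp. */
    tmp &= 0x0f0f0f0fu;
    /* Overlay neighbouring bytes so that two pixels share one byte. */
    tmp |= tmp >> 4;
    work |= work >> 4;
    /* Right column becomes the upper row, left column the lower row. */
    a_rot[0] = (uint8_t)tmp;
    a_rot[1] = (uint8_t)work;
    b_rot[0] = (uint8_t)(tmp >> 16);
    b_rot[1] = (uint8_t)(work >> 16);
}
```

For example, a block with pixels 1 2 on the first line and 3 4 on the second (bytes 0x12, 0x34) comes out as 2 4 over 1 3 (bytes 0x24, 0x13).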
[0076]
As a result, the total number of calculations and the number of loops are reduced, so that processing can be performed at high speed. If a plurality of variables for packing are prepared, the number of loops can be further reduced.
[0077]
Furthermore, if a processor capable of 64-bit operation is used, the number of blocks that can be processed at once is doubled. For example, not only in the case of 4 bits per pixel but also in the case of 2 bits per pixel as shown in FIG. 11, it is easy to change the processing so that two blocks are processed simultaneously.
[0078]
The present invention may be applied to a system composed of a plurality of devices or an apparatus composed of a single device. Needless to say, the present invention can also be applied to a case where the present invention is achieved by supplying a program to a system or apparatus.
[0079]
【The invention's effect】
As described above, according to the image processing method and apparatus according to the present invention,
- By storing bitmap data that spans multiple bytes in a multi-byte variable, multiple bytes can be processed simultaneously and the number of operations can be reduced. Since a plurality of pixels are thus processed at the same time, the number of operations per pixel is reduced and the processing time is shortened.
[0080]
In the conventional example, even if the processor is changed from 32 bits to 64 bits, the speed does not increase unless the operating frequency, cache contents, internal architecture, or the like also change. In the present invention, by contrast, an increase in the bit width of the operations alone is actively exploited to increase the speed, so the advantage over the conventional example grows further.
[0081]
- Since the number of data accesses is small, the loss of speed is small on a system with a small data cache. For the same reason, the loss of speed is small even on a system without a data cache.
[0082]
- By processing the blocks in the order in which the addresses of the original image to be rotated are consecutive, the cache hit rate is increased relative to the capacity of the data cache, and the speed is improved.
[0083]
- By storing image data that spans multiple bytes together in a multi-byte variable, fewer variables are needed than when the bytes are stored separately, which makes it easier for the compiler to assign the variables to registers.
[0084]
[Brief description of the drawings]
FIG. 1 is a diagram showing a procedure of 90-degree rotation processing according to a first embodiment of the present invention.
FIG. 2 is a diagram of data packing according to the first embodiment.
FIG. 3 is a diagram illustrating a procedure of a 90-degree rotation process according to the first embodiment.
FIG. 4 is a diagram illustrating an order of block processing according to the first embodiment.
FIG. 5 is a diagram showing a procedure of a conventional example of 90-degree rotation of a block.
FIG. 6 is a diagram showing a procedure of a conventional example of 180 degree rotation of a block.
FIG. 7 is a diagram illustrating a procedure of 180 degree rotation of a block according to a second embodiment.
FIG. 8 is a diagram for explaining 180-degree rotation in the second embodiment.
FIG. 9 is a diagram showing a procedure of 270 degree rotation in the fourth embodiment.
FIG. 10 is a diagram showing a processing procedure when the processing of the first embodiment is performed by a 64-bit processor.
FIG. 11 is a diagram showing a processing procedure for rotating a block by 90 degrees when one pixel is 2 bits according to the fifth embodiment;
FIG. 12 is a diagram showing a processing procedure for simultaneously processing a plurality of lines according to the sixth embodiment.
FIG. 13 is a diagram showing a processing procedure when processing a plurality of blocks simultaneously according to the seventh embodiment;
FIG. 14 is a configuration diagram of a system that performs image processing according to an embodiment.
FIG. 15 is a diagram showing a procedure of 90-degree rotation processing according to the first embodiment of the present invention.
FIG. 16 is a diagram showing a procedure of 90-degree rotation processing according to the first embodiment of the present invention.
FIG. 17 is a diagram showing a procedure of 180 degree rotation processing according to the second embodiment of the present invention.
FIG. 18 is a diagram showing a procedure of 180 degree rotation processing according to the second embodiment of the present invention.
FIG. 19 is a diagram illustrating a state in which a block is rotated 90 degrees when one pixel is 2 bits according to the fifth embodiment.
FIG. 20 is a diagram illustrating another state in which a block is rotated 90 degrees when one pixel is 2 bits according to the fifth embodiment.
FIG. 21 is a diagram showing a block state when a plurality of blocks are processed simultaneously according to the seventh embodiment;

Claims

An image processing apparatus that rotates image data of 8 bits × 8 lines by 90 degrees,
First processing means for storing continuous image data of the upper half of the eight lines in the first memory, and storing continuous image data of the lower half of the eight lines in the second memory;
Second processing means for storing, in a third memory, image data obtained by taking the logical sum of image data obtained by shifting the image data stored in the first memory by N bits, N being within 7 bits, and masking all bits other than first bits of interest at intervals of 8 bits, and image data obtained by shifting the image data stored in the second memory by M bits, M differing from N by 4 bits, and masking all bits other than second bits of interest at positions shifted by M-N bits with respect to the first bits of interest;
An image obtained by shifting the image data stored in the third memory and the image data stored in the third memory by a predetermined amount in order to process the image data not masked by the second processing means into 1 byte. Third processing means for performing a logical OR with data;
An image processing apparatus comprising: processing for sequentially changing the shift amounts N and M by one by the second processing means, and means for executing the processing by the third processing means eight times.

The second processing means shifts the image data stored in the first memory to the left by N bits and masks the image data other than the target bit and the image data stored in the second memory to the right by M bits. The image data obtained by performing a logical sum with the image data that has been shifted and masked other than the bit of interest is stored in the third memory,
The third processing means calculates the logical sum of the image data stored in the third memory and the image data obtained by shifting the image data stored in the third memory to the right by 7 bits and stores the logical sum in the third memory. The logical sum of the image data stored in the third memory and the image data obtained by shifting the image data stored in the third memory to the right by 14 bits is stored in the third memory. The image processing apparatus according to claim 1.

The image processing apparatus according to claim 1, wherein the processing by the third processing means includes taking the logical sum of the image data stored in the third memory and the image data obtained by shifting the image data stored in the third memory to the right by 7 bits, and further taking the logical sum of that result and the image data obtained by shifting that result to the right by 14 bits.

4. The image processing apparatus according to claim 1, further comprising output means for outputting an image based on the image data.

The image processing apparatus according to claim 4, wherein the output unit is a printing unit.

An image processing method of rotating image data of 8 bits × 8 lines by 90 degrees,
A first processing step of storing continuous image data of the upper half of the eight lines in the first memory, and storing continuous image data of the lower half of the eight lines in the second memory;
A second processing step of storing, in a third memory, image data obtained by taking the logical sum of image data obtained by shifting the image data stored in the first memory by N bits, N being within 7 bits, and masking all bits other than first bits of interest at intervals of 8 bits, and image data obtained by shifting the image data stored in the second memory by M bits, M differing from N by 4 bits, and masking all bits other than second bits of interest at positions shifted by M-N bits with respect to the first bits of interest;
An image obtained by shifting the image data stored in the third memory and the image data stored in the third memory by a predetermined amount in order to process the image data not masked in the second processing step into 1 byte. A third processing step for performing a logical OR with data;
An image processing method comprising: a process in which the shift amounts N and M in the second processing step are sequentially changed by 1; and a step in which the processing in the third processing step is executed eight times.

In the second processing step, the image data stored in the first memory is shifted N bits to the left, and the image data masked except for the bit of interest and the image data stored in the second memory are shifted to the right by M bits. The image data obtained by performing a logical sum with the image data that has been shifted and masked other than the bit of interest is stored in the third memory,
In the third processing step, the logical sum of the image data stored in the third memory and the image data obtained by right shifting the image data stored in the third memory by 7 bits is stored in the third memory. The logical sum of the image data stored in the third memory and the image data obtained by shifting the image data stored in the third memory to the right by 14 bits is stored in the third memory. The image processing method according to claim 6 .

The image processing method according to claim 6, wherein the processing in the third processing step includes taking the logical sum of the image data stored in the third memory and the image data obtained by shifting the image data stored in the third memory to the right by 7 bits, and further taking the logical sum of that result and the image data obtained by shifting that result to the right by 14 bits.

The image processing method according to claim 6, further comprising an output step of outputting an image based on the image data.

The image processing method according to claim 9, wherein the output step is a printing step.