JP2004242075A

JP2004242075A - Image processing apparatus and method therefor

Info

Publication number: JP2004242075A
Application number: JP2003029583A
Authority: JP
Inventors: Tomotoshi Kanatsu; 知俊金津; Keiko Nakanishi; 恵子中西
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-02-06
Filing date: 2003-02-06
Publication date: 2004-08-26

Abstract

PROBLEM TO BE SOLVED: To accurately extract character colors and character area information from a multi-valued image by feeding the color extraction result back to the character cutting process.

SOLUTION: The image processing apparatus comprises: a character area image generating means for generating a binary image of a character area from a color image; a character cutting means for creating character rectangles for the binary image of the character area; a single-color determination means for performing single-color determination on each character rectangle; a determining means for determining that a non-single-color character rectangle is a set of characters of a plurality of single colors; and a detailed character cutting means for cutting characters inside the rectangle so determined.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は画像処理装置および方法に関するものである。
【０００２】
【従来の技術】
近年、スキャナの普及により文書の電子化が進んでいる。電子化された紙文書をフルカラービットマップ形式の状態では、Ａ４サイズの場合３００ｄｐｉで約２４Ｍバイトにもなる。このような大容量のデータは、メイルに添付して送信するのに適したサイズとはいえない。そこで、フルカラー画像を圧縮することが通常行われており、その圧縮方式としてＪＰＥＧが知られている。ＪＰＥＧは写真などの自然画像を圧縮するには非常に効果も高く、画質も良い。しかし一方で、文字部などの高周波部分をＪＰＥＧ圧縮すると、モスキートノイズと呼ばれる画像劣化が発生し、圧縮率も悪い。そこで、領域分割を行い、文字領域を抜いた下地部分に自然画向きの圧縮方式、単色あるいは小数色の文字領域部分には可逆圧縮方式を施す方法があった。
【０００３】
圧縮の際に、領域分割により分けられた文字領域部分は減色してＭＭＲやＺＩＰ圧縮を施すとともに色情報を保持し、文字部を抜いた下地部分はＪＰＥＧ圧縮を施す。展開時には下地画像の上に、文字領域の画像を色情報に従って描画することで、高画質と高圧縮率を両立する画像処理装置を提供している（例えば、特許文献１参照）。
【０００４】
上記のような装置においては、文字領域の色情報抽出精度が、画質と圧縮率双方の性能に大きく影響する。そのため、二値画像の文字切り処理を利用して文字色抽出の性能を向上させている（例えば、特許文献２参照）。
【０００５】
【特許文献１】
特開２００２−０７７６３１号公報
【特許文献２】
特開２００３−００８９０９号公報
【０００６】
【発明が解決しようとしている問題】
しかし、特殊なレイアウトや領域分割の誤り、あるいはノイズや傾きなどの影響で文字切り処理が正しく行われなかった領域に、異なる色の文字が混在する場合、色抽出が正しくおこなわれず、該領域が非文字と認識され画質劣化、圧縮率低下の原因となっていた。
【０００７】
たとえば、図１３のように画像が傾いており、かつ色の異なる２行が接近している場合、射影を用いた行の分割ができないため文字の切り出しに失敗し、個々の文字に対応する正しい色を抽出できない。その結果この領域は非文字の図形と判断されてしまう。
【０００８】
また、図１４のように、見出しなど大きな文字と小さな文字が接近している場合、領域分割はこれをひとつの文字列と看倣してしまうことがある。このときも文字切り失敗により、文字毎の単色抽出がおこなえず、非文字の図形とされてしまう。
【０００９】
なおこれらの例では、文字が複数種の単色ではなく、すべて同色からなる領域であれば、たとえ文字切りは失敗しても色抽出は自体は正しくおこなわれるので、文字領域内の処理に問題は生じない。そのうえ、文字切り処理自体を例外対応により複雑化していくことには限界があることをふまえると、既存の文字切り処理を用いたうえで、複数色の文字領域に関する例外事象に対処することがより望ましい。
【００１０】
本発明は上記従来技術の課題を解決するために成されたものであり、文字切り処理に色抽出の結果をフィードバックして、多値画像の文字色および文字領域情報の抽出を精度良く行う画像処理装置および方法を提供することを目的とする。
【００１１】
【課題を解決するための手段】
この発明は下記の構成を備えることにより上記課題を解決できるものである。
【００１２】
（１）スキャンされたカラー画像に対し、前記カラー画像より文字領域の二値画像を生成する文字領域画像作成手段と、前記文字領域の二値画像に対し文字矩形を作成する文字切り手段と、前記文字矩形の単色判定をおこなう単色判定手段と、非単色の文字矩形が複数種の単色文字集合であることを判定する手段と、上記判定矩形内を文字切りする詳細文字切り手段と、を有する画像処理装置。
【００１３】
（２）前記（１）記載の文字矩形が複数種の単色文字集合であることを判定する手段は、非単色矩形およびそれに隣接する矩形を含めて複数種の単色文字集合であることを判定することを特徴とする画像処理装置。
【００１４】
（３）前記（２）記載の文字矩形が複数種の単色文字集合であることを判定する手段は、行間位置が一致する隣接複数矩形の総領域が複数種の単色文字集合であると判定することを特徴とする画像処理装置。
【００１５】
（４）前記（２）記載の文字矩形が複数種の単色文字集合であることを判定する手段は、色分布一致する隣接複数矩形の総領域が複数種の単色文字集合であると判定することを特徴とする画像処理装置。
【００１６】
（５）スキャンされたカラー画像に対し、前記カラー画像より文字領域の二値画像を生成する文字領域画像作成手段と、前記文字領域の二値画像に対し文字矩形を作成する文字切り手段と、前記文字矩形の単色判定をおこなう単色判定手段と、非単色の文字矩形が複数種の単色文字集合であることを判定する手段と、
上記判定矩形内を文字切りする詳細文字切り手段と、を有する画像処理方法。
【００１７】
（６）前記（５）記載の文字矩形が複数種の単色文字集合であることを判定する手段は、非単色矩形およびそれに隣接する矩形を含めて複数種の単色文字集合であることを判定することを特徴とする画像処理方法。
【００１８】
（７）前記（６）記載の文字矩形が複数種の単色文字集合であることを判定する手段は、行間位置が一致する隣接複数矩形の総領域が複数種の単色文字集合であると判定することを特徴とする画像処理方法。
【００１９】
（８）前記（６）記載の文字矩形が複数種の単色文字集合であることを判定する手段は、色分布一致する隣接複数矩形の総領域が複数種の単色文字集合であると判定することを特徴とする画像処理方法。
【００２０】
【発明の実施の形態】
図１に本発明の第一の実施例のブロック図を示す。
【００２１】
１００は多値画像より全面二値画像を作成する縮小・二値化部である。１０１は画像中の文字領域を検出して複数の文字領域座標（１０９）を作成する文字領域検出部である。１０２は上記文字領域座標と原画像より、複数の文字領域部分の画像（１０７）を作成する文字領域画像作成部である。１０３は上記文字領域画像の黒部分と原画像を参照しながら黒部分の代表色（１１０）を算出する文字色抽出部である。１０４は上記文字領域画像の黒画素に対応する原画像上の画素を、周辺の色で塗りつぶし下地多値画像（１０８）を作成する文字部塗りつぶし部である。１０５は複数の文字領域画像を圧縮して複数の圧縮コードＡ（１１１）を作成する文字領域画像圧縮部である。１０６は下地多値画像（１０８）を圧縮して圧縮コードＢ（１１２）を作成する下地画像圧縮部である。
【００２２】
次に、図１の構成を用いて、スキャナなどで入力した画像データを圧縮する際の処理を、図３のフローチャートを用いて説明する。
【００２３】
ステップＳ３０１では、縮小・二値化部１００にて、入力多値画像に対し二値化処理を行う。その内容を以下簡単に説明する。
【００２４】
ＲＧＢ多値画像に次式のような輝度変換を行って、輝度画像Ｊを作成する。
【００２５】
Ｙ＝０．２９９Ｒ＋０．５８７Ｇ＋０．１１４Ｂ
このとき、入力画像の解像度に応じて解像度変換をおこなってもよい。例えば原画像が３００ｄｐｉのとき、縦方向、横方向とも４画素ごとに上式の演算を行い、新しい画像Ｊを作成すると画像ＪはＹ８ビット７５ｄｐｉの画像となる。次に、輝度画像Ｊのヒストグラムを取り、二値化閾値Ｔを算出する。輝度画像ＪをＴにて二値化し、全面二値画像Ｋを作成する。なお、ヒストグラムより閾値Ｔを算出する際には公知の方法を用いるとする。
【００２６】
ステップＳ３０２では、文字領域検出部１０１にて、二値画像に対して領域分割処理をおこない、その結果から文字領域のみを抽出して文字領域座標１０９を作成する。
【００２７】
上記領域分割処理は、ＵＳＰ５６８０４７８“ＭｅｔｈｏｄａｎｄＡｐｐａｒａｔｕｓｆｏｒｃｈａｒａｃｔｅｒｒｅｃｏｇｎｉｔｉｏｎ”（Ｓｈｉｎ−ＹｗａｎＷａｎｇら／ＣａｎｏｎＫ．Ｋ．）などを用いる。簡単に説明すると、二値画像中の黒画素を輪郭線追跡して得られた塊を抽出し、その形状、大きさ、文字、絵や図、線、表、を分類するとともに、文字と判定される塊の集合から、文字列をなす文字領域を抽出する処理となる。
【００２８】
Ｓ３０１およびＳ３０２の処理例を示す。例えば図２に示すカラー原稿を入力し、間引いて輝度変換したもののヒストグラムを取ると図１２のようになる。このヒストグラムから平均、分散、などのデータを利用して閾値Ｔ＝１９９を算出し、二値化した画像は図６のようになる。図６を領域分割処理すると、図７に示すような１５個の文字領域が検出される。これらの座標データが図１の１０９に格納される。
【００２９】
ステップＳ３０３では、文字領域画像作成部１０２が、文字領域座標１０９に基き、それぞれ文字領域ごとに領域内の文字部を黒、背景を白とする二値画像を作成する。この二値画像は、二値化部で得た閾値で多値画像を全面二値化し、それらから切りとって作成してもよいし、文字領域内で多値画像から輝度ヒストグラムを取りなおし、領域毎に最適な二値化閾値を算出して得た二値画像を用いてもよい。
【００３０】
ステップＳ３０４では、文字色抽出部１０３が、各文字領域内の代表色を抽出する。ここで代表色数は１に限定してもよいし、領域内に複数色の文字が混在する場合には任意の最大代表色数を選ぶようにすればよい。以下に、ある文字領域に対する文字色抽出処理の詳細を、図４のフローチャートを用いて説明する。
【００３１】
Ｓ４０１では、文字領域の二値画像Ｒより、文字行および個別文字に対応する矩形を抽出する、いわゆる文字切り処理をおこなう。以下に文字切り処理の概要を説明する。
【００３２】
まず水平方向の射影と垂直方向の射影をとり、その分散の高いほうを文字列方向とし、その射影の切れ目で行を分割する。さらに個々の行内で先の射影方向と垂直の射影をとりなおし、その切れめより文字行を分割して文字矩形を作成する。以降の処理は文字切りされた矩形毎におこなわれる。図９に文字切り処理の例を示す。
【００３３】
Ｓ４０２では文字矩形内の二値画像を細線化した二値画像Ｐを作成する。これはスキャナ特性やプリント時のアンチエイリアシングにより乱された、文字周辺付近よりの色抽出を回避するためである。図１１に細線化の例を示す。
【００３４】
Ｓ４０３では、細線化された二値画像Ｐの各画素に対応する色情報を、元のカラー画像を参照して求め、ＲＧＢそれぞれヒストグラムを作成する。もちろん、ＲＧＢのかわりにＹＵＶなど他の色空間を用いてもよい。
【００３５】
Ｓ４０４では、文字矩形の単色判定を行う。具体的には、ＲＧＢ各ヒストグラムの分散を求め、あらかじめ定めた閾値と比較し、ＲＧＢいずれの分散値も閾値以内の場合は、単色と判定しＳ４０５に進む。閾値を超える分散値があった場合は複数色を所持するとしてＳ４０６へ進む。
【００３６】
Ｓ４０５では、ＲＧＢ各ヒストグラムから注目文字矩形の代表色を決定する。これは各ヒストグラムのピーク値からなる色をとってもよいし、平均値を用いてもよい。
【００３７】
図１０に、Ｓ４０３〜Ｓ４０５の処理例を示す。図１０（ａ）の「イ」は黒文字（ｂ）の「ン」は赤文字として、それぞれ代表色のＲＧＢ値を得るが、図１０（ｃ）の花模様はヒストグラムの分散が大きく、かつ文字切り不能なため、非文字と判定されている。
【００３８】
次に、Ｓ４０６では注目の複数色を有す矩形が、非文字であるか、あるいは複数の単色文字の集合であるかを判定する。
【００３９】
このＳ４０６内の判定処理について、図５のフローチャートを用いて説明する。なお、ここでは水平方向の文字列からなる文字領域の場合について説明するが、垂直方向の文字列の場合は９０度方向を入れ換えることで同様の処理が可能である。
【００４０】
Ｓ５０１では、入力矩形Ｃに対し、その高さを閾値Ｈと比較する。この閾値Ｈは複数の文字が縦に並んでいる可能性を持つ高さの最小値であり、画像解像度に応じてあらかじめ定める数値である。Ｈより小さい場合は、矩形Ｃを非文字と判断して終了する。Ｈより大きければＳ５０２に進む。
【００４１】
Ｓ５０２では、矩形Ｃ内のみを対象にＳ４０１と同様の文字切り処理をおこなう。ただし文字列方向についてはあらたに判定せず、処理中の領域と同じ方向とみなす。
【００４２】
Ｓ５０３では、Ｓ５０２の文字切り処理で得た行数が２未満、あるいは行高の閾値ｔ未満の高さの行を含む場合、矩形Ｃを非文字と判断して終了する。それ以外の場合はＳ５０４に進む。
【００４３】
Ｓ５０４では、Ｓ５０２の文字切り処理で得た文字行と文字矩形に関して、行毎の文字数の最大値ｐと最小値ｑを求める。ただしあらかじめ定めた幅および高さの範囲にない文字矩形は文字数としてカウントしない。ここで、ｑが０の場合矩形Ｃは非文字と判断して終了し、ｐ≧２かつｑ≧１の場合は、矩形Ｃを複数の単色文字集合と判断して終了する。それ以外の場合、すなわちｐ＝ｑ＝１の場合Ｓ５０５へ進む。
【００４４】
Ｓ５０５では、矩形Ｃの右に隣接する矩形に対し、Ｃと同等の行間の隙間を持つ矩形が連続する数Ｎをカウントする。具体的には、矩形Ｃの水平方向の射影の切れ目と、注目矩形の水平方向射影の切れ目を比較して、一致するものをカウントする。また、右隣の矩形と行間が一致した場合、さらにその右隣と比較する、というように行間が一致しなくなるまで範囲を広げてゆく。このグループ化によれば、文字切り処理回数を減らし効率を上げる効果があるが、数Ｎを文字としての確信度に用いてもよい。たとえば、Ｎ≧１でなければ矩形Ｃを非文字と判定して終了してもよい。
【００４５】
Ｓ５０６では、Ｓ５０２の文字切り結果は破棄し、Ｓ５０５で得た矩形集合に外接する領域内に対し、詳細文字切り処理をおこなう。そして、矩形Ｃおよび、その右に続くＮ−１個の矩形のすべてを含む領域が、複数種の単色文字の集合であると判定し終了する。
【００４６】
図１５はＳ５０６の詳細文字切り処理に至る例を説明する図である。Ｓ４０１で得られた文字切り矩形Ｃ１〜Ｃ５において、Ｃ１、Ｃ２はの単色一文字だが、Ｃ３〜Ｃ５は異なる色の複数文字を含む矩形になっている。図４処理ではＣ１、Ｃ２を単色と判定した後、Ｃ３は単色ではないためＳ４０６の処理にはいる。図５に移り、Ｓ５０１〜Ｓ５０４を経てＳ５０５により、同一行間を持つ右側の矩形Ｃ４とＣ５を含めた領域で、Ｓ５０６詳細文字切りが行われている。最終的な文字矩形はすべて単色となっており、文字切りの精度が向上されるのがわかる。
【００４７】
図４に戻り、Ｓ４０６で非文字と判定された場合はＳ４０７に進み、複数種の単色文字と判定された場合はＳ４０８に進む。
【００４８】
Ｓ４０７では、二値画像Ｒにおいて、非文字と判定された該文字矩形部分内の黒画素をすべて消去する。これは、該当部分を二値画像から消去することで、後の穴うめ処理の対象外とし、非文字の複雑な色情報を下地側に残して保存するための処理である。消去後、矩形は破棄してＳ４０９に進む。
【００４９】
Ｓ４０８では、矩形Ｃを破棄するとともに、その内側に存在する文字矩形をそれぞれを図４処理で未処理の矩形として追加する。これらはＳ５０２あるいはＳ５０６で作成された文字矩形群に相当し、なお、文字矩形群がＳ５０６で作成された場合は、それらのもとになったＮ−１個の矩形もＣ同様破棄する。そしてＳ４０９に進む。
【００５０】
Ｓ４０９にて、未処理の文字矩形が残っていれば、Ｓ４０３に戻って繰り返す。なければＳ４１０に進む。
【００５１】
Ｓ４１０では、各文字矩形に対応して文字数ぶんの色の集合が作成されているので、これに対して減色処理を施し、合計ｎ色以下にまとめる。本処理はスキャナ処理などの影響で生じた色のばらつきをまとめる為の処理である。具体的減色方法としては、ヒストグラムをとって閾値以上のピークを１個以上抽出し、それらの色は近接ピークと統合する、などの方法があるが、他の種々のクラスタリング手法を用いてもよい。またｎの値は任意であるが、圧縮率に影響するので、たかだかｎ＝４程度を妥当とする。
【００５２】
図３に戻り、ステップＳ３０５では，文字部塗りつぶし部１０４が、文字領域画像内の黒画素に対応する原画像上の各画素を、周辺の色で塗りつぶし下地多値画像（１０８）を作成する。本処理の一例を図８を用いて説明する。
【００５３】
グラデーション画像を背景とし、「イン」という青色の文字が中央付近に描かれた、図８（ａ）のような画像を原画像とする。この原画像から（ｂ）のような１つの文字領域の二値画像を得たとする。本実施の形態では、例えば全画像を３２×３２の領域（以下、パーツ）に分割し、パーツごとの処理をおこなう。図８（ｃ）にパーツ分けの様子を示す。この図では簡単に説明するため、４×３のパーツに分割した状態を示している。各領域の左上の数字はパーツ番号を示す。このとき、パーツ００〜０３，１０，１３，２０〜２３内には文字がないので、処理は行われない。パーツ１１に対しては、対応する二値画像中の画素より、各パーツ内の白部分に対応するカラー画像のＲＧＢ値（またはＹＵＶ等でも良い）の平均値ａｖｅ＿ｃｏｌｏｒ１１を算出する。そして、原画像上で、二値画像の黒部分にあたる画素このａｖｅ＿ｃｏｌｏｒ１１で塗りつぶす。パーツ２２に対しても同様である。
【００５４】
このようにすれば、文字の存在する部分の周りの画素の平均値をもって、文字の存在する画素を塗りつぶすことができ、見掛け上自然に文字のみが取り除かれた、下地画像１０８が生成される。
【００５５】
図３に戻り、ステップＳ３０６では、文字領域画像圧縮部１０７にて、文字領域画像にあたる部分二値画像の集合１０８を圧縮して圧縮コードＡを作成する。この際、単色の文字領域はＭＭＲ圧縮を施すが、Ｓ３０４にて複数の代表色が抽出された文字領域は、領域内の色情報を保存できる最低必要ｂｉｔ数に変換し、ＺＩＰ圧縮をするか、あるいは同一領域を色別に異なる二値画像に分解する形で、それぞれをＭＭＲ圧縮を施すようにしてもよい。
【００５６】
ステップＳ３０７では、下地画像圧縮処理部１０６にて、下地画像１０８に対しＪＰＥＧ圧縮を行い圧縮コードＢを作成する。ＪＰＥＧ圧縮処理は一般的なものを用いる。簡単に説明すると、画像をＹＵＶの各成分に分割し、それぞれを小領域（たとえば８ｘ８ｐｉｘｅｌ）ごとにＤＣＴ変換し、得られた変換係数を量子化し、符号化することで圧縮コードを得る。なお、文字の取り去られた下地は一般に高い解像度を必要としないため、ＪＰＥＧ圧縮をおこなう前に解像度変換をおこなってもよい。
【００５７】
最後に、ステップＳ３０８では、文字領域座標（１０９）、文字領域色情報（１１０）、圧縮コードＣ（１１１）、圧縮コードＤ（１１２）の４つをまとめて最終的な画像データとして出力する。また、これらはＰＤＦやＸＭＬのように一般的に共有されるフォーマットで出力してもよい。
【００５８】
以上説明したように、本発明によれば、複数種の文字色が混在する文字領域において、文字切り処理が失敗した場合でも、文字色抽出の結果を利用して失敗を検出し、該当領域に詳細文字切り処理をおこなって、正しい文字切り結果を得ることができるので、多値画像の文字色および文字領域情報抽出の精度を向上できる。
【００５９】
この効果により、上記実施例のように、色情報付きの二値画像の文字領域と下地画像を分離し、双方を異なる圧縮方法で圧縮して、高画質かつ高圧縮のデータを生成する処理においては、ノイズやレイアウトの例外などに対して性能の劣化しない処理が可能になる。
【００６０】
【第二の実施例】
本発明第一の実施例では、複数色を有す文字矩形が、非文字であるか、あるいは複数種の単色文字の集合であるかを判定する処理の過程において、注目矩形の右側にあり、かつ射影を利用して求める行間位置が一致することを条件に、詳細文字切りの範囲を拡大したが、行間の一致ではなく、矩形内のＲＧＢヒストグラムの分布の一致を利用して、詳細文字切りの範囲を拡大するようにしてもよい。
【００６１】
特に、接近した色の異なる文字行があり、その一部がノイズの影響で二値画像上で接触している場合など、射影より行間を求めることが困難な場合でも、隣接する矩形で詳細文字切りが必要であるもののグループに含めて正しい文字切り結果を得ることができ、文字色および文字領域情報抽出の精度を向上させることができる。
【００６２】
このように、本発明第二の実施例においても、複数種の文字色が混在する文字領域において、文字切り処理が失敗した場合、文字色抽出の結果を利用して失敗を検出し、該当領域に詳細文字切り処理をおこなって正しい文字切り結果を得ることができるので、多値画像の文字色および文字領域情報抽出の精度向上が可能になる。
【００６３】
【発明の効果】
本発明によれば、多値画像の文字色および文字領域情報の抽出を精度良く行うことができる。
【図面の簡単な説明】
【図１】本発明の第１の実施例に係る圧縮装置のブロック図である。
【図２】本発明の第１の実施例の文字領域検出処理を説明するための原画像の例を示す図である。
【図３】本発明の第１の実施例のスキャン画像の圧縮処理を説明するためのフローチャートである。
【図４】本発明の第１の実施例の文字色抽出部の処理を説明するためのフローチャートである。
【図５】本発明の第１の実施例の文字判定部の処理を説明するためのフローチャートである。
【図６】本発明の第１の実施例の文字領域検出処理を説明するための二値画像の例を示す図である。
【図７】本発明の第１の実施例の文字領域検出処理を説明するための文字領域の例を示す図である。
【図８】本発明の第１の実施例の文字部塗りつぶし処理を説明するための図である。
【図９】本発明の第１の実施例の文字切り処理を説明するための図である。
【図１０】本発明の第１の実施例の文字色抽出処理を説明するための図である。
【図１１】本発明の第１の実施例の細線化処理を説明するための図である。
【図１２】本発明の第１の実施例の文字領域検出処理を説明するためのヒストグラムを示す図である。
【図１３】傾斜した、複数色を有する文字領域に対する文字切り誤りの例である。
【図１４】大小、複数色の混在する文字領域に対する文字切り誤りの例である。
【図１５】本発明の第１の実施例の詳細文字切り処理の例である。
【符号の説明】
１００縮小・二値化部
１０１文字領域検出部
１０２文字領域画像作成部
１０３文字色抽出部
１０４文字部塗りつぶし部
１０５文字領域画像圧縮部
１０６下地画像圧縮部
１０７画像
１０８下地多値画像
１０９文字領域座標
１１０文字領域代表色
１１１圧縮コードＡ
１１２圧縮コードＢ

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an image processing device and an image processing method.
[0002]
[Prior art]
In recent years, the spread of scanners has advanced the digitization of documents. A paper document digitized as a full-color bitmap amounts to about 24 Mbytes for an A4 page at 300 dpi. Data of this size is not suitable for sending as an e-mail attachment. Full-color images are therefore usually compressed, and JPEG is a well-known compression method. JPEG is very effective for compressing natural images such as photographs and gives good image quality. On the other hand, when high-frequency portions such as text are JPEG-compressed, image degradation called mosquito noise occurs and the compression ratio is poor. For this reason, there has been a method that performs area division, applies a compression method suited to natural images to the background portion from which the character areas have been removed, and applies lossless compression to character areas of a single color or a small number of colors.
[0003]
At the time of compression, the character areas obtained by area division are reduced in color and compressed with MMR or ZIP while their color information is retained, and the background portion with the characters removed is JPEG-compressed. At decompression, the character area images are drawn on top of the background image according to the color information. This gives an image processing apparatus that achieves both high image quality and a high compression ratio (for example, see Patent Document 1).
[0004]
In such an apparatus as described above, the accuracy of extracting color information in a character area greatly affects the performance of both the image quality and the compression ratio. For this reason, the performance of character color extraction is improved by utilizing character segmentation processing of a binary image (for example, see Patent Document 2).
[0005]
[Patent Document 1]
Japanese Patent Application Laid-Open No. 2002-077631
[Patent Document 2]
Japanese Patent Application Laid-Open No. 2003-008909
[0006]
[Problems to be solved by the invention]
However, when characters of different colors are mixed in an area where character segmentation was not performed correctly, because of a special layout, an area division error, noise, skew, or the like, color extraction is not performed correctly. The area is then recognized as non-character, which degrades image quality and lowers the compression ratio.
[0007]
For example, when the image is skewed and two lines of different colors are close together, as shown in FIG. 13, the lines cannot be divided using projection, so character segmentation fails and the correct color for each character cannot be extracted. As a result, this area is judged to be a non-character figure.
[0008]
Further, as shown in FIG. 14, when large characters such as a heading are close to small characters, area division may regard them as a single character string. In this case too, the failed character segmentation prevents single-color extraction for each character, and the area is treated as a non-character figure.
[0009]
Note that in these examples, if the area consists of characters that are all the same color rather than of several single colors, color extraction itself is performed correctly even if character segmentation fails, so no problem arises in processing the character area. Moreover, given that there is a limit to how far the character segmentation process itself can be complicated by exception handling, it is preferable to keep the existing character segmentation process and deal separately with the exceptional cases that concern multi-color character areas.
[0010]
The present invention has been made to solve the above problems of the related art. Its object is to provide an image processing apparatus and method that accurately extract the character colors and character area information of a multi-valued image by feeding the result of color extraction back to the character segmentation process.
[0011]
[Means for Solving the Problems]
The present invention can solve the above problem by providing the following configuration.
[0012]
(1) An image processing apparatus comprising: character area image creating means for generating, from a scanned color image, a binary image of a character area; character cutting means for creating character rectangles for the binary image of the character area; single-color determination means for performing single-color determination on each character rectangle; means for determining that a non-single-color character rectangle is a set of characters of a plurality of single colors; and detailed character cutting means for cutting characters within the rectangle so determined.
[0013]
(2) The image processing apparatus according to (1), wherein the means for determining that a character rectangle is a set of characters of a plurality of single colors makes the determination for the non-single-color rectangle together with the rectangles adjacent to it.
[0014]
(3) The image processing apparatus according to (2), wherein the means for determining that a character rectangle is a set of characters of a plurality of single colors determines that the total area of a plurality of adjacent rectangles whose line-spacing positions coincide is a set of characters of a plurality of single colors.
[0015]
(4) The image processing apparatus according to (2), wherein the means for determining that a character rectangle is a set of characters of a plurality of single colors determines that the total area of a plurality of adjacent rectangles whose color distributions coincide is a set of characters of a plurality of single colors.
[0016]
(5) An image processing method comprising: character area image creating means for generating, from a scanned color image, a binary image of a character area; character cutting means for creating character rectangles for the binary image of the character area; single-color determination means for performing single-color determination on each character rectangle; means for determining that a non-single-color character rectangle is a set of characters of a plurality of single colors; and detailed character cutting means for cutting characters within the rectangle so determined.
[0017]
(6) The image processing method according to (5), wherein the means for determining that a character rectangle is a set of characters of a plurality of single colors makes the determination for the non-single-color rectangle together with the rectangles adjacent to it.
[0018]
(7) The image processing method according to (6), wherein the means for determining that a character rectangle is a set of characters of a plurality of single colors determines that the total area of a plurality of adjacent rectangles whose line-spacing positions coincide is a set of characters of a plurality of single colors.
[0019]
(8) The image processing method according to (6), wherein the means for determining that a character rectangle is a set of characters of a plurality of single colors determines that the total area of a plurality of adjacent rectangles whose color distributions coincide is a set of characters of a plurality of single colors.
[0020]
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 shows a block diagram of a first embodiment of the present invention.
[0021]
Reference numeral 100 denotes a reduction / binarization unit that creates an entire binary image from a multi-valued image. Reference numeral 101 denotes a character area detection unit that detects the character areas in an image and creates a plurality of character area coordinates (109). Reference numeral 102 denotes a character area image creating unit that creates images (107) of the plurality of character area portions from the character area coordinates and the original image. Reference numeral 103 denotes a character color extracting unit that calculates a representative color (110) of the black portions, referring to the black portions of the character area images and to the original image. Reference numeral 104 denotes a character part filling unit that fills the pixels of the original image corresponding to black pixels of the character area images with surrounding colors to create a background multi-valued image (108). Reference numeral 105 denotes a character area image compression unit that compresses the plurality of character area images to create a plurality of compression codes A (111). Reference numeral 106 denotes a background image compression unit that compresses the background multi-valued image (108) to create a compression code B (112).
[0022]
Next, a process for compressing image data input by a scanner or the like using the configuration of FIG. 1 will be described with reference to a flowchart of FIG.
[0023]
In step S301, the reduction / binarization unit 100 performs a binarization process on the input multivalued image. The contents will be described briefly below.
[0024]
The luminance conversion is performed on the RGB multi-valued image as in the following equation to create a luminance image J.
[0025]
Y = 0.299R + 0.587G + 0.114B
At this time, resolution conversion may be performed according to the resolution of the input image. For example, when the original image is 300 dpi, applying the above formula to every fourth pixel in both the vertical and horizontal directions yields a new image J that is an 8-bit luminance (Y) image at 75 dpi. Next, a histogram of the luminance image J is taken and a binarization threshold T is calculated. The luminance image J is binarized with T to create an entire binary image K. A known method is used to calculate the threshold T from the histogram.
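The luminance conversion, subsampling, and binarization of step S301 can be sketched as follows. This is a minimal illustration, not the apparatus's implementation: the function names are hypothetical, the threshold T is passed in rather than derived from the histogram (the text only says a known method is used), and treating pixels darker than T as black is an assumption.

```python
def luminance(r, g, b):
    """Luminance Y from the formula in the text, rounded to an 8-bit integer."""
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def reduce_to_luminance(rgb_rows, step=4):
    """Take every `step`-th pixel in both directions and convert it to Y.

    rgb_rows is a list of rows, each row a list of (r, g, b) tuples.
    With step=4 a 300 dpi input becomes a 75 dpi luminance image J.
    """
    return [[luminance(*row[x]) for x in range(0, len(row), step)]
            for row in rgb_rows[::step]]


def binarize(lum_rows, threshold):
    """Binary image K: 1 (black) where Y is below the threshold, else 0 (white).

    Assumption: dark pixels are the candidates for text.
    """
    return [[1 if y < threshold else 0 for y in row] for row in lum_rows]
```

For instance, `binarize(reduce_to_luminance(image), 199)` would use the threshold value quoted for the example of FIG. 12.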
[0026]
In step S302, the character area detection unit 101 performs area division processing on the binary image, and extracts only the character area from the result to create character area coordinates 109.
[0027]
The above area division processing uses, for example, the method of U.S. Pat. No. 5,680,478, "Method and Apparatus for character recognition" (Shin-Ywan Wang et al. / Canon K.K.). Briefly, clusters of black pixels are extracted from the binary image by contour tracing and classified as characters, pictures or figures, lines, tables, and so on according to their shape and size, and character areas that form character strings are extracted from the sets of clusters judged to be characters.
[0028]
An example of the processing in S301 and S302 follows. When the color original shown in FIG. 2 is input, subsampled, and converted to luminance, its histogram is as shown in FIG. 12. A threshold T = 199 is calculated from this histogram using data such as the mean and variance, and the binarized image is as shown in FIG. 6. Applying area division processing to FIG. 6 detects 15 character areas, as shown in FIG. 7. Their coordinate data are stored in 109 of FIG. 1.
[0029]
In step S303, the character area image creating unit 102 creates, for each character area and based on the character area coordinates 109, a binary image in which the character portions in the area are black and the background is white. This binary image may be cut out of the result of binarizing the entire multi-valued image with the threshold obtained by the binarization unit. Alternatively, the luminance histogram may be retaken from the multi-valued image within each character area and an optimal binarization threshold calculated for each area to obtain the binary image.
[0030]
In step S304, the character color extraction unit 103 extracts a representative color in each character region. Here, the number of representative colors may be limited to one, or when characters of a plurality of colors are mixed in an area, an arbitrary maximum number of representative colors may be selected. Hereinafter, the details of the character color extraction processing for a certain character area will be described with reference to the flowchart of FIG.
[0031]
In S401, a so-called character segmentation process is performed to extract a rectangle corresponding to a character line and an individual character from the binary image R of the character region. The outline of the character segmentation process will be described below.
[0032]
First, a horizontal projection and a vertical projection are taken, the one with the higher variance is taken as the character string direction, and the area is divided into lines at the breaks in that projection. Then, within each line, a projection perpendicular to the first one is taken, and the line is divided at its breaks to create character rectangles. Subsequent processing is performed for each rectangle obtained by character segmentation. FIG. 9 shows an example of the character segmentation process.
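A minimal sketch of projection-based character segmentation is given below. It assumes horizontal text, so the variance comparison that selects the string direction is omitted, and it represents the binary image as a list of rows with 1 for black. The function names and the rectangle format are illustrative choices, not taken from the patent.

```python
def runs(profile):
    """Return (start, end) pairs, end-exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_horizontal_text(img):
    """Split a binary image into lines, then each line into character rectangles.

    Returns rectangles as (top, bottom, left, right), end-exclusive.
    Each rectangle spans the full height of its line.
    """
    rects = []
    width = len(img[0])
    row_profile = [sum(row) for row in img]            # projection onto the vertical axis
    for top, bottom in runs(row_profile):              # breaks separate the lines
        col_profile = [sum(img[y][x] for y in range(top, bottom))
                       for x in range(width)]          # perpendicular projection
        for left, right in runs(col_profile):          # breaks separate the characters
            rects.append((top, bottom, left, right))
    return rects
```

A skewed image or two touching lines produce no break in the first projection, which is the failure case the invention addresses.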
[0033]
In S402, a binary image P is created by thinning the binary image in the character rectangle. This avoids extracting color from the vicinity of character edges, where it is disturbed by scanner characteristics or by anti-aliasing at printing. FIG. 11 shows an example of thinning.
[0034]
In step S403, color information corresponding to each pixel of the thinned binary image P is obtained with reference to the original color image, and a histogram is created for each of RGB. Of course, other color spaces such as YUV may be used instead of RGB.
[0035]
In S404, single-color determination of the character rectangle is performed. More specifically, the variance of each of the RGB histograms is obtained and compared with a predetermined threshold. If all of the RGB variances are within the threshold, the rectangle is determined to be single-color and the process proceeds to S405. If any variance exceeds the threshold, the rectangle is regarded as having a plurality of colors and the process proceeds to S406.
[0036]
In S405, the representative color of the character rectangle of interest is determined from the RGB histograms. This may be the color formed by the peak value of each histogram, or the average values may be used.
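The single-color test of S404 and the representative color of S405 can be illustrated with the following sketch. It works directly on the list of colors sampled under the thinned strokes instead of building explicit histograms, uses the per-channel mean as the representative color (one of the two options the text allows), and the threshold value of 100 is a hypothetical placeholder.

```python
from statistics import mean, pvariance


def classify_rectangle(pixels, var_threshold=100.0):
    """Decide whether the sampled stroke colors form a single color.

    pixels: list of (r, g, b) tuples taken from the original color image
            at the black pixels of the thinned binary image P.
    Returns ("single", representative_color) or ("multi", None).
    """
    channels = list(zip(*pixels))                      # separate R, G and B values
    if all(pvariance(ch) <= var_threshold for ch in channels):
        rep = tuple(int(round(mean(ch))) for ch in channels)
        return "single", rep
    return "multi", None                               # goes on to S406
```

A rectangle returned as "multi" is not discarded at this point; it is passed to the S406 determination described next.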
[0037]
FIG. 10 shows a processing example of S403 to S405. The character “イ” in FIG. 10(a) is treated as a black character and the character “ン” in FIG. 10(b) as a red character, and the RGB values of their representative colors are obtained. The flower pattern in FIG. 10(c), however, has a large histogram variance and cannot be segmented into characters, so it is determined to be non-character.
[0038]
Next, in S406, it is determined whether the rectangle of interest, which has a plurality of colors, is a non-character or a set of single-color characters.
[0039]
The determination process in S406 will be described with reference to the flowchart in FIG. 5. The case of a character area composed of horizontal character strings is described here, but vertical character strings can be processed in the same way by rotating the directions by 90 degrees.
[0040]
In S501, the height of the input rectangle C is compared with a threshold value H. The threshold value H is the minimum height at which a plurality of characters may be stacked vertically, and is a numerical value determined in advance according to the image resolution. If the height is less than H, the rectangle C is determined to be a non-character and the process ends. If it is larger than H, the process proceeds to S502.
[0041]
In S502, the same character segmentation processing as in S401 is performed for only the inside of the rectangle C. However, the direction of the character string is not newly determined, and is regarded as the same direction as the area being processed.
[0042]
In step S503, if the number of lines obtained in the character segmentation process in step S502 is less than 2 or includes a line having a height less than the line height threshold value t, the rectangle C is determined to be a non-character, and the process ends. Otherwise, the process proceeds to S504.
[0043]
In S504, the maximum value p and the minimum value q of the number of characters per line are obtained for the character lines and character rectangles produced by the character segmentation of S502. Character rectangles outside the predetermined width and height ranges are not counted as characters. If q is 0, the rectangle C is determined to be a non-character and the process ends. If p ≧ 2 and q ≧ 1, the rectangle C is determined to be a set of characters of a plurality of single colors and the process ends. Otherwise, that is, when p = q = 1, the process proceeds to S505.
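The decision logic of S501, S503, and S504 can be summarized in the sketch below. The function and its return labels are hypothetical. It takes the result of the S502 re-segmentation as arguments instead of running it, and the caller is assumed to have already excluded character rectangles outside the allowed width and height ranges from the per-line counts.

```python
def judge_multicolor_rectangle(rect_height, line_heights, chars_per_line,
                               min_height_h, min_line_height_t):
    """Classify a non-single-color rectangle C.

    rect_height:       height of rectangle C
    line_heights:      heights of the lines found inside C by S502
    chars_per_line:    number of valid character rectangles in each line
    min_height_h:      threshold H of S501
    min_line_height_t: line-height threshold t of S503
    Returns "non-character", "single-color set", or "check neighbors" (S505).
    """
    # S501: C is too short to hold several stacked lines.
    if rect_height < min_height_h:
        return "non-character"
    # S503: fewer than two lines, or a line that is too thin.
    if len(line_heights) < 2 or any(h < min_line_height_t for h in line_heights):
        return "non-character"
    # S504: compare the largest (p) and smallest (q) character counts per line.
    p, q = max(chars_per_line), min(chars_per_line)
    if q == 0:
        return "non-character"
    if p >= 2:                      # q >= 1 is guaranteed here
        return "single-color set"
    return "check neighbors"        # p == q == 1
```

The text does not say what happens when the height equals H exactly; the sketch lets that case continue to the later checks.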
[0044]
In S505, starting from the rectangle adjacent to the right of rectangle C, the number N of consecutive rectangles having the same inter-line gaps as C is counted. Specifically, the breaks in the horizontal projection of rectangle C are compared with the breaks in the horizontal projection of the rectangle under consideration, and the rectangles whose breaks match are counted. If the line spacing of the right neighbor matches, the comparison continues with the next rectangle to the right, extending the range until the line spacing no longer matches. This grouping reduces the number of character segmentation passes and improves efficiency, and the number N may also be used as a measure of confidence that the rectangle is text. For example, the rectangle C may be determined to be a non-character and the process ended unless N ≧ 1.
[0045]
In step S506, the result of the character segmentation in step S502 is discarded, and a detailed character segmentation process is performed on the area circumscribing the set of rectangles obtained in step S505. Then, the area including the rectangle C and all of the N-1 rectangles that follow it on the right is determined to be a set of characters of a plurality of single colors, and the process ends.
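The neighbor comparison of S505 might look like the following sketch. It assumes that C and its right-hand neighbors come from the same text line and therefore share the same vertical extent, so row indices can be compared directly. The function names and the `tolerance` parameter are illustrative additions, and the function simply returns the number of matching neighbors without fixing how that number relates to N in the text.

```python
def line_gaps(rect_img):
    """Return (start, end) row ranges, end-exclusive, that contain no black
    pixels and lie between the first and last inked rows of the rectangle."""
    profile = [sum(row) for row in rect_img]           # horizontal projection
    inked = [i for i, v in enumerate(profile) if v]
    gaps, start = [], None
    if not inked:
        return gaps
    for i in range(inked[0], inked[-1] + 1):
        if profile[i] == 0 and start is None:
            start = i
        elif profile[i] != 0 and start is not None:
            gaps.append((start, i))
            start = None
    return gaps


def count_matching_right_neighbors(c_img, right_neighbors, tolerance=0):
    """Count consecutive right-hand neighbors whose line gaps match those of C.

    right_neighbors is ordered from the nearest rectangle outward; counting
    stops at the first rectangle whose gaps differ.
    """
    ref = line_gaps(c_img)
    count = 0
    for img in right_neighbors:
        gaps = line_gaps(img)
        same = len(gaps) == len(ref) and all(
            abs(a - c) <= tolerance and abs(b - d) <= tolerance
            for (a, b), (c, d) in zip(gaps, ref))
        if not same:
            break
        count += 1
    return count
```

The detailed character segmentation of S506 would then run once on the area circumscribing C and the matching neighbors, rather than once per rectangle.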
[0046]
FIG. 15 illustrates an example that leads to the detailed character segmentation of S506. Among the character rectangles C1 to C5 obtained in S401, C1 and C2 each contain one single-color character, but C3 to C5 each contain several characters of different colors. In the process of FIG. 4, C1 and C2 are determined to be single-color, and C3, not being single-color, enters the process of S406. Moving to FIG. 5, the process passes through S501 to S504, and S505 extends the range to the rectangles C4 and C5 on the right, which have the same line spacing, so that the detailed character segmentation of S506 is performed on the area including them. The final character rectangles are all single-color, which shows that the accuracy of character segmentation is improved.
[0047]
Returning to FIG. 4, if it is determined in S406 that the character is a non-character, the process proceeds to S407. If it is determined that the character is a plurality of types of monochrome characters, the process proceeds to S408.
[0048]
In S407, all black pixels within the character rectangle determined to be non-character are erased from the binary image R. Erasing this portion from the binary image excludes it from the later filling process, so that the complex color information of the non-character content is preserved on the background side. After the erasure, the rectangle is discarded and the process proceeds to S409.
[0049]
In step S408, the rectangle C is discarded, and each of the character rectangles inside it is added as an unprocessed rectangle for the process of FIG. 4. These are the character rectangles created in S502 or S506. When they were created in S506, the N-1 rectangles from which they were derived are also discarded, as C is. The process then proceeds to S409.
[0050]
If an unprocessed character rectangle remains in S409, the process returns to S403 and repeats. If not, proceed to S410.
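The loop formed by S403 to S409 behaves like a work queue in which a rejected rectangle can be replaced by finer rectangles that are examined in turn. The sketch below shows only this control flow. The callbacks `classify` and `judge` are hypothetical stand-ins for S403 to S405 and for S406, and the removal of the neighboring rectangles absorbed in S506 is not modeled.

```python
from collections import deque


def extract_colors(initial_rects, classify, judge):
    """Run the S403-S409 loop over character rectangles.

    classify(rect) -> representative color, or None if rect is not single-color
    judge(rect)    -> list of finer rectangles, or None if rect is non-character
    Returns (colors, erased): a dict rect -> color, and the list of rectangles
    whose black pixels would be erased in S407.
    """
    pending = deque(initial_rects)
    colors, erased = {}, []
    while pending:                          # S409: repeat while rectangles remain
        rect = pending.popleft()
        color = classify(rect)              # S403-S405
        if color is not None:
            colors[rect] = color
            continue
        finer = judge(rect)                 # S406
        if finer is None:
            erased.append(rect)             # S407: treat as non-character
        else:
            pending.extend(finer)           # S408: queue the detailed rectangles
    return colors, erased
```

The loop terminates only if `judge` returns rectangles strictly smaller than its input, which the caller has to guarantee.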
[0051]
In S410, a set of colors, one per character, has been created for the character rectangles, so color reduction is applied to it to bring the total down to n colors or fewer. This process consolidates the color variation introduced by scanning and similar effects. One specific color reduction method is to take a histogram, extract one or more peaks at or above a threshold, and merge nearby peaks, but various other clustering methods may be used. The value of n is arbitrary, but since it affects the compression ratio, about n = 4 at most is considered appropriate.
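Since the text leaves the clustering method open, the sketch below uses a simple greedy scheme instead of the histogram-peak method it mentions: colors are visited from most to least frequent, and each one either joins the nearest existing representative or becomes a new representative while fewer than n exist. The function name and the `merge_dist` value are assumptions made for illustration.

```python
from collections import Counter


def reduce_colors(colors, n=4, merge_dist=48):
    """Map per-character colors onto at most n representative colors.

    colors: list of (r, g, b) tuples, one per character rectangle.
    Returns (representatives, mapping) where mapping[color] is the
    representative chosen for that color.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    reps, mapping = [], {}
    for color, _count in Counter(colors).most_common():
        nearest = min(reps, key=lambda r: dist2(r, color), default=None)
        close = nearest is not None and dist2(nearest, color) <= merge_dist ** 2
        if close or len(reps) >= n:
            mapping[color] = nearest        # merge into an existing representative
        else:
            reps.append(color)              # start a new representative
            mapping[color] = color
    return reps, mapping
```

With n = 4 this keeps the number of colors per character area within the bound the text suggests.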
[0052]
Returning to FIG. 3, in step S305, the character part filling unit 104 creates the background multi-valued image (108) by filling each pixel of the original image that corresponds to a black pixel in a character area image with a surrounding color. An example of this processing will be described with reference to FIG. 8.
[0053]
Assume that the original image is an image such as that shown in FIG. 8A, with a gradation image as the background and a blue character “in” drawn near the center. It is also assumed that a binary image of one character area, as shown in FIG., has been obtained. In the present embodiment, for example, the entire image is divided into 32 × 32 areas (hereinafter, parts), and processing is performed for each part. FIG. 8C shows how the image is divided into parts; for simplicity of explanation, the figure shows a division into 4 × 3 parts. The number at the upper left of each area is its part number. Since parts 00 to 03, 10, 13, and 20 to 23 contain no characters, no processing is performed on them. For part 11, the pixels that are white in the corresponding binary image are identified within the part, and the average value ave_color11 of the RGB values (or YUV values or the like) of the corresponding color-image pixels is calculated. Then, on the original image, the pixels corresponding to the black portion of the binary image are filled with ave_color11. The same applies to part 12.
[0054]
In this way, the pixels occupied by a character are filled with the average value of the surrounding pixels in the same part, and a background image 108 from which only the characters appear to have been removed naturally is generated.
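The part-wise filling described above can be illustrated with a short sketch. The function name `fill_characters`, the list-of-rows image representation, and the reading of a part as a square block of `part` × `part` pixels are assumptions made for illustration only.

```python
def fill_characters(image, mask, part=32):
    """Fill character pixels with the average background color of their part.

    image: rows of (r, g, b) tuples (the original color image).
    mask:  rows of 0/1 values of the same size (1 = black character pixel).
    part:  assumed edge length of one square part, in pixels.
    Returns a new image; the input image is left unchanged.
    """
    height, width = len(mask), len(mask[0])
    out = [list(row) for row in image]

    for top in range(0, height, part):
        for left in range(0, width, part):
            cells = [(y, x)
                     for y in range(top, min(top + part, height))
                     for x in range(left, min(left + part, width))]
            black = [(y, x) for y, x in cells if mask[y][x]]
            white = [(y, x) for y, x in cells if not mask[y][x]]
            if not black or not white:
                # No character in this part, or no background to average.
                continue
            # Average color of the non-character pixels in this part.
            ave = tuple(sum(image[y][x][c] for y, x in white) // len(white)
                        for c in range(3))
            for y, x in black:
                out[y][x] = ave
    return out
```

Because the average is taken per part rather than over the whole image, a gradation background is followed locally, which is why the filled result looks natural.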
[0055]
Returning to FIG. 3, in step S306, the character area image compression unit 107 compresses the set of partial binary images 108 corresponding to the character area images to create a compression code A. A monochromatic character area is subjected to MMR compression, whereas a character area from which a plurality of representative colors were extracted in S304 is converted to the minimum number of bits per pixel capable of storing the color information in the area and is then subjected to ZIP compression. Alternatively, such an area may be separated into a different binary image for each color, and each of these binary images may be subjected to MMR compression.
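For a character area with several representative colors, the conversion to the minimum number of bits followed by ZIP compression might look like the sketch below. The name `pack_color_indices` and the convention that index 0 means background and 1..num_colors select a representative color are assumptions; Python's `zlib` module (DEFLATE) is used here as a stand-in for ZIP compression.

```python
import zlib


def pack_color_indices(indices, num_colors):
    """Pack per-pixel color indices into the fewest bits and deflate them.

    indices:    flat list of values, 0 = background, 1..num_colors = color.
    num_colors: number of representative colors in the character area.
    Returns (bits_per_pixel, compressed_bytes).
    """
    # Smallest bit width that can hold every value in 0..num_colors.
    bits = max(1, num_colors.bit_length())

    acc, nbits, packed = 0, 0, bytearray()
    for value in indices:
        acc = (acc << bits) | value
        nbits += bits
        while nbits >= 8:
            nbits -= 8
            packed.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1          # keep only the unwritten bits
    if nbits:
        packed.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bits, zlib.compress(bytes(packed))
```

With two representative colors each pixel needs 2 bits, and with four it needs 3 bits, which is one reason the number of colors n affects the compression ratio.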
[0056]
In step S307, the background image compression processing unit 106 performs JPEG compression on the background image 108 to create a compression code B. A general JPEG compression process is used. In brief, an image is divided into YUV components, each is subjected to DCT for each small area (for example, 8 × 8 pixels), and the obtained transform coefficients are quantized and encoded to obtain a compressed code. Since the background from which the characters have been removed generally does not require a high resolution, resolution conversion may be performed before JPEG compression.
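The DCT and quantization steps of the JPEG outline above can be sketched for a single 8 × 8 block of one component. This is a simplified illustration: the names `dct_8x8` and `quantize` are hypothetical, a single uniform quantization step is assumed in place of a real JPEG quantization table, and entropy coding is omitted.

```python
import math

N = 8  # block edge length used in the example of the text


def dct_8x8(block):
    """2-D DCT-II of an 8x8 block of samples (already level-shifted by -128)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def quantize(coeffs, step=16):
    """Uniform quantization (assumed); real JPEG uses a per-coefficient table."""
    return [[round(c / step) for c in row] for row in coeffs]
```

A flat block, such as a background area after the characters have been filled in, yields a single nonzero DC coefficient, which is why the filled background compresses well.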
[0057]
Finally, in step S308, the character area coordinates (109), the character area color information (110), the compression code A (111), and the compression code B (112) are collectively output as the final image data. These may also be output in a widely used format such as PDF or XML.
[0058]
As described above, according to the present invention, even if the character segmentation process fails in a character region in which a plurality of types of character colors are mixed, the failure is detected using the result of the character color extraction, and detailed character segmentation processing is performed on the corresponding region. Since a correct character segmentation result can thus be obtained, the accuracy of extracting the character color and character area information of a multi-valued image can be improved.
[0059]
As a result, in processing such as that of the above embodiment, in which the character areas (binary images with color information) and the base image are separated and compressed with different compression methods to generate high-quality, highly compressed data, the processing can be performed without performance degradation caused by noise, layout exceptions, and the like.
[0060]
[Second embodiment]
In the first embodiment of the present invention, in the process of determining whether a character rectangle having a plurality of colors is a non-character or a set of single-color characters of a plurality of types, the range of detailed character segmentation is expanded to rectangles that lie on the right side of the rectangle of interest and whose line spacing positions, obtained using projection, match. Alternatively, the range of detailed character segmentation may be expanded on the condition that the distributions of the RGB histograms in the rectangles match, rather than that the lines match.
[0061]
In particular, when character lines of different colors lie close to each other and some of them touch on the binary image due to the effect of noise, so that it is difficult to determine the line spacing from the projection, the rectangles that need segmentation can still be included in the range. A correct character segmentation result can thus be obtained, and the accuracy of character color and character area information extraction is improved.
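One way to test whether the RGB histogram distributions of two rectangles match is histogram intersection over coarse color bins. The text does not specify a comparison measure, so the measure, the bin count, the threshold, and the names `rgb_histogram` and `histograms_match` in this sketch are all assumptions.

```python
def rgb_histogram(pixels, bins=4):
    """Normalized coarse RGB histogram of a list of (r, g, b) pixels."""
    width = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        index = (r // width) * bins * bins + (g // width) * bins + (b // width)
        hist[index] += 1
    total = len(pixels) or 1   # avoid division by zero for an empty rectangle
    return [h / total for h in hist]


def histograms_match(pixels_a, pixels_b, threshold=0.7):
    """Histogram intersection in [0, 1]; 1 means identical distributions."""
    hist_a = rgb_histogram(pixels_a)
    hist_b = rgb_histogram(pixels_b)
    overlap = sum(min(a, b) for a, b in zip(hist_a, hist_b))
    return overlap >= threshold
```

Here `pixels_a` and `pixels_b` would be the colors of the character pixels in two adjacent rectangles; rectangles that both mix red and blue characters would match, while a rectangle of plain black text would not.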
[0062]
As described above, in the second embodiment of the present invention as well, when the character segmentation process fails in a character region in which a plurality of types of character colors are mixed, the failure is detected using the result of character color extraction, and detailed character segmentation processing is performed on the corresponding region. A correct character segmentation result can thus be obtained, so that the accuracy of extracting the character colors and character region information of a multi-valued image can be improved.
[0063]
[Effect of the invention]
According to the present invention, it is possible to accurately extract a character color and character area information of a multi-valued image.
[Brief description of the drawings]
FIG. 1 is a block diagram of a compression device according to a first embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of an original image for explaining a character area detection process according to the first embodiment of the present invention.
FIG. 3 is a flowchart illustrating a scan image compression process according to the first embodiment of the present invention.
FIG. 4 is a flowchart illustrating a process performed by a character color extracting unit according to the first embodiment of this invention.
FIG. 5 is a flowchart illustrating a process performed by a character determination unit according to the first embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of a binary image for describing a character area detection process according to the first embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of a character area for describing a character area detection process according to the first embodiment of this invention.
FIG. 8 is a diagram for explaining a character portion filling process according to the first embodiment of the present invention.
FIG. 9 is a diagram illustrating a character segmentation process according to the first embodiment of the present invention.
FIG. 10 is a diagram for explaining a character color extraction process according to the first embodiment of this invention.
FIG. 11 is a diagram illustrating a thinning process according to the first embodiment of the present invention.
FIG. 12 is a diagram illustrating a histogram for explaining a character area detection process according to the first embodiment of this invention.
FIG. 13 is an example of a character segmentation error with respect to an inclined character area having a plurality of colors.
FIG. 14 is an example of a character segmentation error in a character area in which large and small characters are mixed.
FIG. 15 is an example of a detailed character segmentation process according to the first embodiment of this invention.
[Explanation of symbols]
100 Reduction / binarization unit
101 Character region detection unit
102 Character region image creation unit
103 Character color extraction unit
104 Character part painting unit
105 Character region image compression unit
106 Base image compression unit
107 Image
108 Background multi-value image
109 Character region coordinates
110 Character area representative color
111 Compression code A
112 Compression code B

Claims

An image processing apparatus for a scanned color image, comprising:
character region image creating means for generating a binary image of a character region from the color image;
character cutting means for creating character rectangles for the binary image of the character region;
monochrome determination means for determining whether each character rectangle is monochromatic;
determining means for determining that a non-monochromatic character rectangle is a set of monochromatic characters of a plurality of types; and
detailed character cutting means for cutting characters inside the rectangle so determined.

The image processing apparatus according to claim 1, wherein the means for determining that a character rectangle is a set of monochromatic characters of a plurality of types determines that a region including the non-monochromatic rectangle and a rectangle adjacent thereto is a set of monochromatic characters of a plurality of types.

The image processing apparatus according to claim 2, wherein the means for determining that a character rectangle is a set of monochromatic characters of a plurality of types determines that the combined region of a plurality of adjacent rectangles whose line spacing positions match is a set of monochromatic characters of a plurality of types.

The image processing apparatus according to claim 2, wherein the means for determining that a character rectangle is a set of monochromatic characters of a plurality of types determines that the combined region of a plurality of adjacent rectangles whose color distributions match is a set of monochromatic characters of a plurality of types.

An image processing method for a scanned color image, comprising:
character region image creating means for generating a binary image of a character region from the color image;
character cutting means for creating character rectangles for the binary image of the character region;
monochrome determination means for determining whether each character rectangle is monochromatic;
determining means for determining that a non-monochromatic character rectangle is a set of monochromatic characters of a plurality of types; and
detailed character cutting means for cutting characters inside the rectangle so determined.

The image processing method according to claim 5, wherein the means for determining that a character rectangle is a set of monochromatic characters of a plurality of types determines that a region including the non-monochromatic rectangle and a rectangle adjacent thereto is a set of monochromatic characters of a plurality of types.

The image processing method according to claim 6, wherein the means for determining that a character rectangle is a set of monochromatic characters of a plurality of types determines that the combined region of a plurality of adjacent rectangles whose line spacing positions match is a set of monochromatic characters of a plurality of types.

The image processing method according to claim 6, wherein the means for determining that a character rectangle is a set of monochromatic characters of a plurality of types determines that the combined region of a plurality of adjacent rectangles whose color distributions match is a set of monochromatic characters of a plurality of types.