JP3923243B2 - Character extraction method from color document image

Info

Publication number: JP3923243B2
Application number: JP2000222063A
Authority: JP
Inventors: 正行岡本
Original assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Priority date: 2000-07-24
Filing date: 2000-07-24
Publication date: 2007-05-30
Anticipated expiration: 2020-07-24
Also published as: JP2002042055A

Description

【０００１】
【発明の属する技術分野】
本発明は、複雑で多様な背景を持ったカラー文書画像から文字色部分のみを抜き出すようにした、カラー文書画像からの文字抽出方法に関するものである。
【０００２】
【従来の技術】
近年、多くの光学的文字読み取り装置（ＯＣＲ）が研究されてきた。その結果、印刷物からの文字認識では、非常に高い精度の認識が行なわれるようになってきている。
【０００３】
しかしながら、それらのシステムはすべて２値（例えば白黒）の印刷物を対象にしたものであり、カラー文書を読み取るためのシステムは存在しない。
【０００４】
〔１〕Ｍ．Ｃｅｌｅｎｋ，“ＡＣｏｌｏｒＣｌｕｓｔｅｒｉｎｇＴｅｃｈｎｉｑｕｅｆｏｒＩｍａｇｅＳｅｇｍｅｎｔａｔｉｏｎ”，Ｃｏｍｐｕｔ．ＶｉｓｉｏｎＧｒａｐｈｉｃｓＩｍａｇｅＰｒｏｃｅｓｓ．５２，ｐｐ．１４５−１７０（１９９０）
〔２〕Ｊ．ＬｉｕａｎｄＹ．Ｈ．Ｙａｎｇ，“ＭｕｌｔｉｒｅｓｏｌｕｔｉｏｎＣｏｌｏｒＩｍａｇｅＳｅｇｍｅｎｔａｔｉｏｎ”，ＩＥＥＥＴｒａｎｓ．ＰａｔｔｅｒｎＡｎａｌ．Ｍａｃｈ．Ｉｎｔｅｌｌ．１６，７，ｐｐ．６８９−７００（１９９４）
〔３〕Ｓ．Ｈ．Ｐａｒｋ，Ｉ．Ｄ．ＹｕｎａｎｄＳ．Ｕ．Ｌｅｅ，“ＣｏｌｏｒＩｍａｇｅＳｅｇｍｅｎｔａｔｉｏｎｂａｓｅｄｏｎ３−ＤＣｌｕｓｔｅｒｉｎｇ：ＭｏｒｐｈｏｌｏｇｉｃａｌＡｐｐｒｏａｃｈ”，ＰａｔｔｅｒｎＲｅｃｏｇｎｉｔｉｏｎ３１，８，ｐｐ．１０６１−１０７６（１９９８）
〔４〕Ｊ．Ｏｈｙａ，Ａ．ＳｈｉｏａｎｄＳ．Ａｋａｍａｔｓｕ，“ＲｅｃｏｇｎｉｔｉｏｎＣｈａｒａｃｔｅｒｓｉｎＳｃｅｎｅＩｍａｇｅｓ”，ＩＥＥＥＴｒａｎｓ．ＰａｔｔｅｒｎＡｎａｌ．Ｍａｃｈ．Ｉｎｔｅｌｌ．１６，２，ｐｐ．２１４−２２０（１９９４）
〔５〕松尾賢一、梅田三千雄、“濃淡及び色情報による情景画像からの文字列抽出”電子情報通信学会技術研究会報告ＰＲＵ９２−１２１（１９９２）
〔６〕仙田修司、美濃導彦、池田克夫、“文字列の単色性に着目したカラー画像からの文字パタン抽出法”電子情報通信学会技術研究報告ＰＲＵ９４−２９（１９９４）
〔７〕Ｙ．Ｚｈｏｎｇ，Ｋ．ＫａｒｕａｎｄＡ．Ｋ．Ｊａｉｎ，“ＬｏｃａｔｉｎｇＴｅｘｔｉｎＣｏｍｐｌｅｘＣｏｌｏｒＩｍａｇｅｓ”，ＰａｔｔｅｒｎＲｅｃｏｇｎｉｔｉｏｎ２８，１０，ｐｐ．１５２３−１５３５（１９９５）
〔８〕Ｊ．Ｃ．Ｂｅｚｄｅｋ，“ＰａｔｔｅｒｎＲｅｃｏｇｎｉｔｉｏｎｗｉｔｈＦｕｚｚｙＯｂｊｅｃｔｉｖｅＦｕｎｃｔｉｏｎＡｌｇｏｒｉｔｈｍｓ”ＰｌｅｎｕｍＰｒｅｓｓ（１９８１）
【０００５】
【発明が解決しようとする課題】
現在出版されている印刷物の中には、多彩な色を持ち、複雑な背景の上に書かれた、雑誌のような出版物も存在する。そのような雑誌等のカラー印刷物から文字認識を行なう場合、白地に黒で印刷されているだけの印刷物と違い、背景の中から文字色を見つけ出す作業が必要となる。
【０００６】
ところで、カラー画像の領域分割に関しては、過去に上記したいくつかの研究がある（上記文献〔１〕，〔２〕，〔３〕）。また、情景画像からの文字抽出に関しても、上記文献〔４〕，〔５〕のような研究がある。しかし、上記文献〔４〕はカラー画像の色の情報を利用しておらず、上記文献〔５〕は限定された状況（画像中に文字列が１列のみ）の研究である。
【０００７】
そこで、文字色を推定し、それを利用して文字を抽出する方法としては、上記文献〔６〕，〔７〕が存在する。
【０００８】
上記文献〔６〕は、色ヒストグラムからＫ平均法（Ｋ−ｍｅａｎｓ法）によるクラスタリングによって色を分解しており、本発明の方法に近いアルゴリズムになっている。
【０００９】
しかしながら、本発明の方法はファジイクラスタリングを用いた方法であり、要素の所属が２値的（所属するか、しないか）に決まってしまうハードクラスタリングと異なり、所属の程度まで判別できるクラスタリングを使用している。
【００１０】
このファジイクラスタリングによって得られる帰属度を用いることによって、２値的な分類だけでは判別の難しかった画像についての微妙な色の識別ができるようになる。そして、色ごとに分解された画像（２値画像）から、文字パターンを抽出する方法についても述べる。
【００１１】
本発明では、２値化された色分解画像をラベリングし、そのラベルの外接矩形から文字の並びと思われる特徴を見つけ出している。本発明の方法は、同じ文字列内の各文字の色が同一で単色であれば、複数行で複数色の文字列があっても抽出可能である。また、様々なサイズの文字が混じっていても抽出が可能である。
【００１２】
本発明は、上記状況に鑑みて、文字列が単色で書かれていることを前提として、色情報をクラスタリングすることによって、画像の背景色と文字色を分離し、複雑で多様な背景からでも文字を抽出できるカラー文書画像からの文字抽出方法を提供することを目的としている。
【００１３】
【課題を解決するための手段】
本発明は、上記目的を達成するために、
〔１〕複雑で多様な背景を持ったカラー文書画像から文字色部分のみを抜き出すようにしたカラー文書画像からの文字抽出方法であって、（ａ）前記カラー文書画像からスムージングによるディザの除去を行い、（ｂ）このディザの除去を行った画像の色値のＲＧＢからＬ^*ｕ^*ｖ^*ヘの変換とヒストグラム作成を行い、（ｃ）次に、前記Ｌ^*ｕ^*ｖ^*の３次元だけでなく、画像の垂直方向の座標（ｙ座標）も加えた４次元の空間に対して、クラスタ内の要素の分散による分割、統合、所属要素数による消滅を含むファジイクラスタリングを行うことにより、各色がそのクラスタにどの程度属しているかを示す帰属度を求め、（ｄ）次に、前記帰属度を基に閾値ｔ_iは、そのクラスタに所属する要素の数ρ_i （重みつき帰属度合計）に応じて、

つまり、所属する要素の数が多いクラスタは画像の広い範囲で使われる背景の色であるので、なるべく文字部分を含まない２値画像を作成するために閾値を高くし、逆に、所属する要素の数が少ないクラスタは文字色である可能性が高く、色のにじみも含めて抽出するために閾値は低くするように設定する色分解画像である２値画像の作成を行い、（ｅ）次に、前記２値画像のノイズを除去し、（ｆ）次に、黒画素および白画素のラベリングを行い、（ｇ）次に、文字抽出に適した２値画像の選択を行い、（ｈ）次に、文字行の抽出を行うことを特徴とする。
【００１５】
【発明の実施の形態】
以下、本発明の実施の形態を図を参照しながら説明する。
【００１６】
まず、本発明に係るカラー文書画像からの文字抽出のための処理の概要について説明する。
【００１７】
大まかな流れとしては、色分解画像を作成し、その２値化したデータをラベリングし、そのラベルを囲む外接矩形より文字列を抽出する、といった流れになっている。本発明の方法の特徴は、色分解画像を作成するにあたり、入力画像の色情報に対してファジイクラスタリングを用いて色分割を行なっていることである。
【００１８】
細かな処理の流れは、図１に示すように、以下の通りである。
【００１９】
（１）まず、カラー文書画像からのスムージングによるディザの除去を行う（ステップＳ１）。
【００２０】
（２）次に、画素の色値のＲＧＢからＬ^*ｕ^*ｖ^*ヘの変換とヒストグラム作成を行う（ステップＳ２）。
【００２１】
（３）次に、色情報のファジイクラスタリングを行う（ステップＳ３）。
【００２２】
（４）次に、帰属度を基に色分解画像（２値画像）作成を行う（ステップＳ４）。
【００２３】
（５）次に、２値画像のノイズ除去を行う（ステップＳ５）。
【００２４】
（６）次に、黒画素および白画素のラベリングを行う（ステップＳ６）。
【００２５】
（７）次に、文字抽出に適した２値画像の選択を行う（ステップＳ７）。
【００２６】
（８）次に、文字行の抽出を行う（ステップＳ８）。
【００２７】
次に、それぞれの処理における細かなアルゴリズムについて説明する。
【００２８】
▲１▼上記（１）の「スムージングによるディザの除去」
３色もしくは４色の色を周期的に配置することで、見た目にはそれらの色を混ぜ合わせたような効果を期待するのがディザ法である。ディザ法によって印刷されている文字があると、図２（ａ）に示すように文字の単色性が満たされ難い。上記文献〔６〕では、スムージングを行なうことによって、ディザの影響を軽減する方法が提案されている。そこでは、文字の輪郭がぼやけてしまわないように、画像のエッジの弱い部分のみを平均化する方法を提案している。なお、図２において、１はディザ法によって印刷されている文字、２はディザが除かれた文字である。
【００２９】
本発明の方法では、エッジであるかどうかの判定に、次の式を用いている。
【００３０】
ｄ_iｍａｘ＝ｍａｘ（‖ｘ_j−ｘ_k‖） …（１）
ｊ，ｋ∈Ｎ＝｛座標ｉとその８近傍の座標｝
なお、ｘ_iは座標ｉの色値を表す
式（１）は、目標画素とその８近傍の画素の計９つの画素の色値のうち、最も離れた２つの色値の距離である。ｄ_iｍａｘが一定値以内ならば、エッジではないと考えて、式（２）で表されるフィルタで８近傍の画素との平均化を行なう。
【００３１】
【数１】

【００３２】
ｄ_iｍａｘが一定値以上ならば、エッジを含んでいると考えて平均化を行なわない。
【００３３】
このようなスムージング処理を行なうことによって、図２（ａ）が図２（ｂ）のようになる。
▲２▼上記（２）の「色値のＲＧＢからＬ^*ｕ^*ｖ^*への変換とヒストグラム作成」ＣＩＥのＬ^*ｕ^*ｖ^*色空間は、人間の色感覚に近いとされる均等知覚色空間である。そこで、ＲＧＢで表される色値をＬ^*ｕ^*ｖ^*に変換する。その変換式は以下の通りである。
【００３４】
Ｘ＝０．４７８Ｒ＋０．２９９Ｇ＋０．１７５Ｂ …（３）
Ｙ＝０．２６３Ｒ＋０．６５５Ｇ＋０．０５１Ｂ …（４）
Ｚ＝０．０２０Ｒ＋０．１６０Ｇ＋０．９０８Ｂ …（５）
Ｌ^*＝２５（１００Ｙ／Ｙ₀）^1/3−１６ …（６）
ｕ^*＝１３Ｌ（ｕ′−ｕ′₀） …（７）
ｖ^*＝１３Ｌ（ｖ′−ｖ′₀） …（８）
ｕ′＝４Ｘ／（Ｘ＋１５Ｙ＋３Ｚ） …（９）
ｖ′＝９Ｙ／（Ｘ＋１５Ｙ＋３Ｚ） …（１０）
なお、ＲＧＢは多くの場合０〜２５５の値を持つデータだが、それを０〜１の値に変換したものを上記の式に適用する。
【００３５】
また、Ｙ₀＝１，ｕ₀＝０．２０１，ｖ₀＝０．４６１となっている。
【００３６】
ＲＧＢからＸＹＺへの変換式についてはいくつかのバリエーションが存在するが、本発明では上記の式を用いている。なお、このＬ^*ｕ^*ｖ^*色空間は立方体をしておらず、Ｌ^*，ｕ^*，ｖ^*のとる値の範囲もそれぞれ違っている。イメージとしては図３（ｂ）に示すような形状をしている。ちなみにＬ^*は０〜１１６の値をとり、ｕ^*は−９７〜１７１の値をとり、ｖ^*は−１２８〜１０９の値をとる。なお、図３（ａ）はＲＧＢ色空間を示している。
【００３７】
このようにして得られた色値からヒストグラムを作成し、クラスタリングを行なうが、本システムでは文字がすべて横書きで書かれているものとして考え、その位置情報も利用してクラスタリングを行なう。そこで、Ｌ^*ｕ^*ｖ^*の３次元だけでなく、画像の垂直方向の座標（ｙ座標）も加えた４次元の空間に対してクラスタリングを行なう。この４次元の空間は１７×４５×４０×１０（Ｌ^*が１７，ｕ^*が４５，ｖ^*が４０，ｙ座標が１０）に分けられ、ヒストグラムが作成される。
【００３８】
▲３▼上記（３）の「色情報のファジイクラスタリング」
入力された画像の色情報に対してファジイクラスタリングを行ない、色を分解する。ファジイクラスタリングが、ハードクラスタリングといわれる通常のクラスタリングと異なる点は、要素が複数のクラスタに少しずつ所属する事を認めている点である。ハードクラスタリングにおける、要素のクラスタヘの所属の有無を１と０で表すと、図４に示すようになる。これがファジイクラスタリングでは、各クラスタ中心までの距離の比で決まる０〜１の値の帰属度という数値で表される（図５参照）。帰属度は、要素にそのクラスタがどれだけ強い影響を与えているかを示している。
【００３９】
これにより、あるクラスタに所属するかしないかといった２値的な判別ではなく、どのクラスタにどの程度所属しているかといった程度の違いまで判断できるようになる。
【００４０】
▲３▼−１：ＦＣＭについて説明する。
【００４１】
代表的なファジイクラスタリングとしては、Ｂｅｚｄｅｋらによるファジイｃ−ｍｅａｎｓ法（ＦＣＭ）、つまり、上記文献〔８〕に示されるアルゴリズムが存在する。これは、Ｋ−ｍｅａｎｓ法に帰属度の考えを付け加えて拡張したアルゴリズムである。ここで、Ｋ−ｍｅａｎｓ法について説明すると、Ｋ−ｍｅａｎｓ法は予めクラスタ数（Ｋ個）が定まっている場合のクラスタリング手法である。ここでは、まず、初期クラスタ中心を適当に与え、各要素を最も近いクラスタ中心に所属させる。その後、各クラスタに対して平均値により新たなクラスタ中心を計算し、再度新しいクラスタを求める。これらの操作をクラスタ中心が変化しなくなるまで繰り返し、最終的なクラスタを得る。
【００４２】
このＦＣＭのアルゴリズムを簡単に述べると次のようになっている。
【００４３】
ステップ１：ｃ個の初期クラスタ中心ｖ_i、（ｉ＝１，２，…，ｃ）を適当に決める。
【００４４】
ステップ２：すべての要素ｘ_k、（ｋ＝１，…，ｎ）の帰属度
【００４５】
【数２】

【００４６】
を求める
ステップ３：新たなクラスタ中心
【００４７】
【数３】

【００４８】
ここでのｍは帰属度に対する重み値で、１＜ｍ＜∞の値をとり、ｍが大きくなればなるほど、帰属度の大きな要素ｘ_kのクラスタ中心ｖ_iに対する影響が大きくなる。なお、式（１１），（１２）で、ｍ→１，ｕ_ik∈｛０，１｝と置けば、このアルゴリズムは通常のＫ−ｍｅａｎｓ法と同じものとなる。
【００４９】
▲３▼−２：自己収束型ファジイクラスタリング
本発明では、最も一般的なファジイクラスタリングであるＦＣＭにクラスタの分割、統合、消滅といった処理を加えた、自己収束型ファジイクラスタリングを使用している。
【００５０】
この自己収束型ファジイクラスタリングのアルゴリズムは以下のようになっている。
【００５１】
ステップ１：クラスタ数の初期値ｃ、帰属度の重みｍ、収束判定値ε、最大繰り返し数Ｉ、クラスタ分割条件の閾値θ_S、クラスタ統合の閾値θ_d、クラスタ消滅条件の閾値θ_Cを決める。
【００５２】
ステップ２：初期クラスタ中心ｖ_i，（ｉ＝１，２，…，ｃ）を決める。
【００５３】
ステップ３：要素ｘ_k＝（ｘ_kL，ｘ_ku，ｘ_kv，ｘ_ky）、（ｋ＝１，…，ｎ）の帰属度
【００５４】
【数４】

【００５５】
を求める。
【００５６】
ステップ４：各クラスタの分散
【００５７】
【数５】

【００５８】
を求め、ソートする。
【００５９】
ステップ５：σ_iがθ_S以上のクラスタを分割する。なお、分割後の新しいクラスタ中心は、
【００６０】
【数６】

【００６１】
（Ｓ₁＝｛ｘ_k｜ｘ_kq＞ｖ_iq｝、Ｓ₂＝｛ｘ_k｜ｘ_kq ＜ｖ_iq｝、ｑはＬ，ｕ，ｖ，ｙのうち最も分散の大きいもの）
ステップ６：クラスタ間の距離がθ_d以下のクラスタを統合する。なお、統合後の新たなクラスタ中心は、２つのクラスタ中心の中間点である。クラスタを統合したならば、すべての要素の帰属度を求め直す。
【００６２】
ステップ７：各クラスタの所属要素数（重みつき帰属度合計）
【００６３】
【数７】

【００６４】
がθ_C以下のクラスタを消滅させ、再度すべての要素の帰属度を計算する。
【００６５】
ステップ８：新たなクラスタ中心
【００６６】
【数８】

【００６７】
を計算する。
【００６８】
【数９】

【００６９】
さもなくば、ステップ４に戻る。
【００７０】
クラスタ数が増減し、自己収束するアルゴリズムは、Ｋ−ｍｅａｎｓ法を基にしたものではいくつも存在する。本発明の方法は、ハードクラスタリングで使われる自己収束化方法を、ファジイクラスタリングに拡張したものである。本来のファジイクラスタリングと異なる点は、ステップ４、ステップ５のクラスタ分割処理、ステップ６のクラスタ統合処理、そして、ステップ７のクラスタ消滅処理である。
【００７１】
クラスタ分割処理は、クラスタの分散の大きさを見て行われる。分散の大きなクラスタは２つに分割される。次に、クラスタ同士が接近していた場合（クラスタ中心間の距離が小さかった場合）、その２つのクラスタを１つにする。そして、このクラスタ数の変化などによって、所属する要素の数（重みつき帰属度合計）が少なくなったクラスタは消滅させる。
【００７２】
ただし、このようなクラスタ数決定アルゴリズムは、解が振動してしまい、収束しないことがある。そこで、解の振動を抑えるために、繰り返しが進むにつれて閾値や収束条件を緩めることが考えられる。
【００７３】
このような処理を加えた、適切なクラスタ数で自己収束するファジイクラスタリングを用いて、各クラスタおよび帰属度が計算される。
【００７４】
なお、実験結果のところで示してあるデータは以下の初期値、閾値で計算された結果である。
【００７５】
ｃ＝１
ｍ＝１．５
Ｉ＝１０
ε＝１．０
θ_S＝８．０
θ_d＝３．０
θ_C＝全画素数／１００
なお、閾値に関しては、解が振動することが多いことから、繰り返しが進むにつれて閾値を緩める方法を用いた。この値は、あくまで実験的に求めた値である。
【００７６】
▲４▼上記（４）の「帰属度を基に色分解画像（２値画像）作成」
ファジイクラスタリングによって得られる各色の帰属度から、そのクラスタヘの所属の程度が分かる。その帰属度を基に、２値画像を作成する。その際の閾値ｔ_iは、そのクラスタに所属する要素の数ρ_i〔式（１６）〕に応じて次の式で決定する。
【００７７】
【数１０】

【００７８】
所属する要素の数が多いクラスタは画像の広い範囲で使われる背景の色であると考えられ、なるべく文字部分を含まない２値画像を作成するために閾値を高くする。逆に、所属する要素の数が少ないクラスタは文字色である可能性が高く、色のにじみも含めて抽出するために閾値は低くする。
【００７９】
なお、この２値化処理の際には、ｙ座標（画素の垂直方向の座標）が違うのみでＬ^*，ｕ^*，ｖ^*の値がほぼ同じクラスタに対しては、それぞれの帰属度を足して１つのものとして閾値処理を行っている。画像によっては、背景色のように広い範囲で使われている色が、ｙ座標で２つに分けられることがある。このような場合、２つのクラスタを統合して処理しても、文字色と背景色の分離性にはほとんど影響しないので、２値画像の数を減らし、処理の効率化を図るために、クラスタを統合処理する必要がある。
【００８０】
このようにして、図６の画像からは図７中に示してあるような２値画像が作成される。
【００８１】
▲５▼上記（５）の「２値画像のノイズ除去」
作成された２値画像には、スムージング処理で吸収しきれなかったディザや、背景上の孤立点等がノイズとして出現する。具体的には、画素の連結数が少ないものである。画像の解像度によって、いくつの連結数のものまでをノイズとするかは変わる。これらは、次のラベリング処理やそれに続く文字行の抽出で、使用メモリ量の減少や処理速度の高速化のために除去する。
【００８２】
▲６▼上記（６）の「黒画素および白画素のラベリング」
ラベリングとは、同じ連結成分に属するすべての画素に同じラベル（番号）を割り当て、異なった連結成分には異なったラベルを割り当てる操作である。２値化された画像の黒画素、白画素両方に対してラベリングを行ない、各連結成分に対して外接矩形を求める。図８に示すような２値画像には、図９（ａ），図９（ｂ）に示すようにラベルが付けられる。ここでは４連結でラベリングを行なっている。そして、同一のラベルをつけられた連結成分を囲う矩形（外接矩形）を元に文字の抽出を行なう。
【００８３】
図１０（ａ），図１０（ｂ）が図９の黒画素、白画素のラベルに対してのそれぞれの外接矩形となる。この時、大き過ぎる矩形（入力画像の４分の１以上の大きさ）、あるいは小さ過ぎる矩形（幅と高さの両方が３画素以下）、縦横比が大きく違う矩形（幅／高さが１５以上、または１５分の１以下）は、文字矩形ではないとして除外する。
【００８４】
図６の画像から得られる外接矩形は、図７に示すようになっている。
【００８５】
▲７▼上記（７）の「文字抽出に適した２値画像の選択」
以後の処理の高速化とメモリの節約のため文字抽出に適した画像を選ぶ。その際には次の４点を考慮する。
【００８６】
・矩形の分散
・矩形数
・画素密度
・平均矩形サイズ
各文字行は同じくらいの大きさの矩形からできているはずなので、各２値画像中の矩形の幅と高さの分散を計算し、その値が大き過ぎる画像については文字行を含んでいないと考え、以後の処理を行なわない。また、矩形数が多過ぎるものも、文字色を含んだ画像である可能性が低いので棄却する。各矩形ごとの黒及び白の画素密度が高過ぎたり低過ぎたりするものも、文字ではない可能性が高いので棄却する。なお、画素密度（ｄｅｎｓｉｔｙ）とは
ｄｅｎｓｉｔｙ＝Ｎ／（ｗ＋ｈ） …（１９）
ここで、Ｎ：ラベルのついた画素数
ｗ：矩形の幅
ｈ：矩形の高さ
で表される値である。また、平均矩形サイズが小さ過ぎるものも文字を含んでいないとして棄却する。
【００８７】
▲８▼上記（８）の「文字行の抽出」
文字矩形が含まれていると思われる２値画像が求まったら、文字列としての特徴を持った矩形を抽出するために、各矩形ごとに隣接する（対象矩形から一定距離内にある）矩形を求め、それをもとに矩形の連結（文字行として推定されるもの）を求める。この隣接矩形は対象矩形から見て次の条件を満たすものである。
【００８８】
＊対象矩形をその高さ（もしくは幅のどちらか大きい方）の０．７倍、上下左右に拡大した範囲に、一部、または全部含まれている（図１１参照）。
【００８９】
＊対象矩形の中に完全に含まれてしまっていない。つまり、一部は含まれていても良い）〔図１２（ａ）参照〕。
【００９０】
＊対象矩形と交差していない〔図１２（ｂ）参照〕。
【００９１】
この条件を満たす矩形を対象矩形の隣接矩形とし、それらを次々と連結させて矩形の連なり、連結矩形を求める。この隣接矩形の連結条件は以下の通りである。
【００９２】
＊矩形の大きさが同じくらいである（幅及び高さが対象矩形の４分の１以上４倍以下、かつ幅もしくは高さのどちらかが対象矩形の２分の１以上２倍以下）。
【００９３】
＊それぞれの矩形の代表色が近い色である。
【００９４】
＊矩形がほぼ水平にならんでいる（矩形の左下の座標同士をみて、その角度θが水平から１０°以内（図１３参照）。
【００９５】
上記の条件を満たす矩形を次々に連結させて、連結矩形を求める。
【００９６】
このようにして求めた連結矩形の連結数が４以上であれば、その連結矩形は文字行の一部であるとみなす。そして、その連結矩形から文字行の幅と高さを推定する。
【００９７】
推定される文字行の高さは、連結矩形の最上部と最下部を、もっとも高さの大きい矩形の高さの２分の１だけ広げた範囲である（図１４参照）。幅は、先に求めた高さの範囲内で、連結矩形と同じくらいの大きさで近い色の矩形を求め、そのような矩形の存在する終端を図１４に示すように延長した範囲とする。推定された文字行の範囲内に矩形の中心が存在し、その矩形の色が連結矩形の色に近いならば、それを文字の要素として抽出する。
【００９８】
上記した通りに、いくつかのカラー文書画像に対して実験を行なった結果、その多くでほとんどの文字を抽出できた。図１５は図６から文字を抽出した結果である。他にもいくつか実験結果を示す。図１６、図１７では、すべての文字が抽出されている。
【００９９】
特に、ファジイクラスタリングを用いることにより、ハードクラスタリングを用いた時よりもいくつかの点で効果的に文字を抽出できるようになった。まず、全画素中で少数しか使われていない文字色は、ハード、ファジイの両クラスタリングでも背景との分離が困難である。この時、ハードクラスタリングではその文字を抽出することはまず不可能であるが、ファジイクラスタリングでは帰属度の違いにより背景より分離することが可能になることが多い。
【０１００】
また、色空間の広い範囲に渡って色が使われている画像では、ハードクラスタリングは必要以上に色空間を分割してしまい、処理効率の悪化や背景パターンの誤抽出が起こりやすくなっていたが、ファジイクラスタリングではそれらは改善されている。逆に、使われている色数が少なく偏っていた場合には、ファジイクラスタリングの方が多くクラスタを生成する傾向がある。しかし、ファジイクラスタリングでは極端にクラスタ数が多くなることはなかったので、処理効率の点でも十分有用性があると言える。
【０１０１】
本発明の方法による文字抽出は、背景に複雑で多様な色が使われている時に特に有効である。シンプルな背景（模様がなく、単一色）の場合には、ハードクラスタリングや、従来の他の方法を用いても文字抽出が容易であり、メモリや処理速度の効率の点で本方法よりすぐれているものも多い。
【０１０２】
本発明の方法では、色ヒストグラムと画像の垂直座標を元にクラスタリングを行ない色分解画像を作成したが、本発明の方法だけでは、背景と文字色を完全に分離することは困難な場合もある。
【０１０３】
また、今後は横書きの文字だけでなく、縦書きの文字にも対応させていくことが考えられるので、その場合には本発明の方法のような垂直座標だけではうまく処理できない。そのような場合には、画像中の位置情報を現在以上にうまく利用したクラスタリングを考えることが必要である。
【０１０４】
なお、本発明は上記実施例に限定されるものではなく、本発明の趣旨に基づいて種々の変形が可能であり、それらを本発明の範囲から排除するものではない。
【０１０５】
【発明の効果】
以上、詳細に説明したように、本発明によれば、以下のような効果を奏することができる。
【０１０６】
（Ａ）カラー文書画像からの文字抽出にあたり、ファジイクラスタリングを利用し、類似色をまとめることによって、複雑で多様な背景を持ったカラー文書画像から、文字色部分のみを抜き出すことができる。ファジイクラスタリングは、要素のクラスタヘの所属の程度を表す帰属度という値を持つクラスタリングアルゴリズムであり、これにより、微妙な色（背景と文字の中間色など）の所属の程度が分かるようになり、２値的な分類をするハードクラスタリングでは判断の難しい低画質の画像や、色彩の豊富な画像についてもそれなりに良好な結果が得られるようになった。
【０１０７】
（Ｂ）画素数の少ない色の分離が有効にできるようになったり、色彩の豊富な画像で必要以上にクラスタを生成しなくなった。本発明は、特に、複雑な背景で多彩な色を持つ画像に対して有効性が認められた。
【図面の簡単な説明】
【図１】本発明にかかるカラー文書画像からの文字抽出フローチャートである。
【図２】本発明にかかるスムージング処理例の説明図である。
【図３】本発明にかかる色空間の形状例を示す図である。
【図４】従来のハードクラスタリング例の結果を示す図である。
【図５】本発明にかかるファジイクラスタリング例を示す図である。
【図６】本発明にかかる入力画像例を示す図である。
【図７】本発明にかかる図６の入力画像の色分解画像を示す図である。
【図８】２値画像例を示す図である。
【図９】図８の２値画像のラベリングの説明図である。
【図１０】図８の２値画像の外接矩形の説明図である。
【図１１】本発明にかかる隣接矩形の説明図である。
【図１２】本発明にかかる隣接矩形条件の説明図である。
【図１３】本発明にかかる矩形同士の角度の説明図である。
【図１４】本発明にかかる文字行の推定範囲の説明図である。
【図１５】本発明にかかる図６の入力画像の文字抽出結果を示す図である。
【図１６】本発明の実験結果（その１）を示す図である。
【図１７】本発明の実験結果（その２）を示す図である。
【符号の説明】
１ディザ法によって印刷されている文字
２ディザが除かれた文字
[0001]
[Technical field of the invention]
The present invention relates to a method for extracting characters from a color document image, in which only the character-colored portions are extracted from a color document image that has a complex and varied background.
[0002]
[Prior art]
In recent years, many optical character readers (OCR) have been studied. As a result, character recognition from printed matter has been performed with extremely high accuracy.
[0003]
However, these systems are all intended for binary (for example, black and white) printed matter, and there is no system for reading a color document.
[0004]
[1] M. Celenk, “A Color Clustering Technique for Image Segmentation”, Comput. Vision Graphics Image Process. 52, pp. 145-170 (1990)
[2] J. Liu and Y. H. Yang, “Multiresolution Color Image Segmentation”, IEEE Trans. Pattern Anal. Mach. Intell. 16, 7, pp. 689-700 (1994)
[3] S. H. Park, I. D. Yun and S. U. Lee, “Color Image Segmentation based on 3-D Clustering: Morphological Approach”, Pattern Recognition 31, 8, pp. 1061-1076 (1998)
[4] J. Ohya, A. Shio and S. Akamatsu, “Recognition Characters in Scene Images”, IEEE Trans. Pattern Anal. Mach. Intell. 16, 2, pp. 214-220 (1994)
[5] Kenichi Matsuo, Michio Umeda, “Character string extraction from scene images using grayscale and color information”, IEICE Technical Report PRU92-121 (1992)
[6] Shuji Senda, Michihiko Minoh, Katsuo Ikeda, “Character pattern extraction method from color images focusing on the monochromaticity of character strings”, IEICE Technical Report PRU94-29 (1994)
[7] Y. Zhong, K. Karu and A. K. Jain, “Locating Text in Complex Color Images”, Pattern Recognition 28, 10, pp. 1523-1535 (1995)
[8] J. C. Bezdek, “Pattern Recognition with Fuzzy Objective Function Algorithms”, Plenum Press (1981)
[0005]
[Problems to be solved by the invention]
Among printed matter published today there are publications, such as magazines, that use a wide variety of colors and print text over complex backgrounds. When character recognition is performed on such color printed matter, the character color must first be found against the background, unlike printed matter that is simply printed in black on a white background.
[0006]
By the way, there have been several studies on color image area division in the past (the above documents [1], [2], [3]). There are also studies on character extraction from scene images as described in the above references [4] and [5]. However, the document [4] does not use the color information of the color image, and the document [5] is a study of a limited situation (only one character string in the image).
[0007]
Therefore, the above-mentioned documents [6] and [7] exist as methods for estimating the character color and extracting the character using the estimated character color.
[0008]
In document [6] above, colors are separated by clustering a color histogram with the K-means method, which makes it an algorithm close to the method of the present invention.
[0009]
However, the method of the present invention uses fuzzy clustering. Unlike hard clustering, in which the membership of an element is decided in a binary fashion (it either belongs or does not), fuzzy clustering can also determine the degree to which an element belongs.
[0010]
By using the degree of membership obtained from this fuzzy clustering, subtle colors can be distinguished in images that were difficult to handle with binary classification alone. A method for extracting character patterns from the images decomposed by color (binary images) is also described.
[0011]
In the present invention, a binarized color separation image is labeled, and a feature that seems to be an arrangement of characters is found from a circumscribed rectangle of the label. According to the method of the present invention, as long as each character in the same character string has the same color and is a single color, it can be extracted even if there are character strings of a plurality of colors in a plurality of lines. In addition, extraction is possible even if characters of various sizes are mixed.
[0012]
In view of the above situation, an object of the present invention is to provide a method for extracting characters from a color document image that, on the premise that each character string is written in a single color, separates the background color and the character color of the image by clustering color information, and can thereby extract characters even from complex and varied backgrounds.
[0013]
[Means for Solving the Problems]
In order to achieve the above object, the present invention provides
[1] A method for extracting characters from a color document image, in which only the character-colored portions are extracted from a color document image having a complex and varied background, characterized by: (a) removing dither from the color document image by smoothing; (b) converting the color values of the dither-removed image from RGB to L*u*v* and creating a histogram; (c) next, performing fuzzy clustering, which includes splitting and merging clusters according to the variance of their elements and eliminating clusters according to their number of member elements, on a four-dimensional space consisting of the three L*u*v* dimensions plus the vertical coordinate (y coordinate) of the image, thereby obtaining degrees of membership that indicate how strongly each color belongs to each cluster; (d) next, creating binary images, which are the color separation images, from the degrees of membership, with the threshold t_i set according to the number of elements ρ_i belonging to the cluster (the weighted sum of memberships),

that is, because a cluster with many member elements is a background color used over a wide area of the image, the threshold is raised so as to create a binary image that contains as little of the character portion as possible, and conversely, because a cluster with few member elements is likely to be a character color, the threshold is lowered so as to extract it together with any color bleeding; (e) next, removing noise from the binary images; (f) next, labeling black pixels and white pixels; (g) next, selecting the binary images suitable for character extraction; and (h) next, extracting character lines.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0016]
First, an outline of processing for character extraction from a color document image according to the present invention will be described.
[0017]
As a general flow, a color separation image is created, the binarized data is labeled, and a character string is extracted from a circumscribed rectangle surrounding the label. A feature of the method of the present invention is that, when creating a color separation image, color division is performed on color information of an input image using fuzzy clustering.
[0018]
The detailed processing flow is as follows, as shown in FIG. 1.
[0019]
(1) First, dither is removed from the color document image by smoothing (step S1).
[0020]
(2) Next, the pixel color values are converted from RGB to L*u*v*, and a histogram is created (step S2).
[0021]
(3) Next, fuzzy clustering of color information is performed (step S3).
[0022]
(4) Next, a color separation image (binary image) is created based on the degree of attribution (step S4).
[0023]
(5) Next, noise is removed from the binary image (step S5).
[0024]
(6) Next, black pixels and white pixels are labeled (step S6).
[0025]
(7) Next, a binary image suitable for character extraction is selected (step S7).
[0026]
(8) Next, a character line is extracted (step S8).
[0027]
Next, a detailed algorithm in each process will be described.
[0028]
(1) “Removing dither by smoothing” in (1) above
The dither method arranges three or four colors periodically so that, to the eye, they appear to be mixed. When characters are printed by the dither method, the assumption that a character has a single color is hard to satisfy, as shown in FIG. 2(a). Document [6] above proposes reducing the influence of dither by smoothing. To keep the outlines of characters from blurring, it averages only those parts of the image where edges are weak. In FIG. 2, 1 denotes a character printed by the dither method, and 2 denotes a character from which the dither has been removed.
[0029]
In the method of the present invention, the following equation is used to determine whether an edge is present.
[0030]
$$d_i^{\max} = \max_{j,k \in N} \lVert x_j - x_k \rVert \tag{1}$$
N = {coordinate i and its eight neighboring coordinates}
Here x_i denotes the color value at coordinate i.
Expression (1) is the distance between the two most widely separated color values among the nine pixels consisting of the target pixel and its eight neighbors. If d_i^max does not exceed a certain value, the pixel is considered not to lie on an edge and is averaged with its eight neighboring pixels by the filter given in expression (2).
[0031]
[Expression 1]

[0032]
If d_i^max is greater than or equal to that value, the neighborhood is assumed to contain an edge and no averaging is performed.
[0033]
This smoothing process turns FIG. 2(a) into FIG. 2(b).
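The edge test of expression (1) and the conditional averaging can be sketched as follows. This is a minimal illustration, not the patented filter: expression (2) is not reproduced in this text, so a plain mean over the 3×3 window is assumed, and the names `smooth_dither` and `edge_threshold` are hypothetical.

```python
import math

def color_distance(a, b):
    """Euclidean distance between two color tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def smooth_dither(image, edge_threshold):
    """Average each pixel with its 8 neighbors unless the window contains an edge.

    image: list of rows, each row a list of color tuples.
    edge_threshold: assumed cut-off for d_i^max of expression (1).
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # border pixels are left unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            # d_i^max: largest distance between any two colors in the window
            d_max = max(color_distance(p, q) for p in window for q in window)
            if d_max < edge_threshold:
                # Assumption: uniform mean of the nine pixels stands in for (2)
                result[y][x] = tuple(sum(c) / 9.0 for c in zip(*window))
    return result
```

A window that contains a strong color jump is left untouched, which is what keeps character outlines sharp.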
(2) “Conversion of color values from RGB to L*u*v* and histogram creation” in (2) above
The CIE L*u*v* color space is a perceptually uniform color space that is considered close to human color perception. The color values expressed in RGB are therefore converted to L*u*v*. The conversion formulas are as follows.
[0034]
$$X = 0.478R + 0.299G + 0.175B \tag{3}$$
$$Y = 0.263R + 0.655G + 0.051B \tag{4}$$
$$Z = 0.020R + 0.160G + 0.908B \tag{5}$$
$$L^* = 25\left(\frac{100Y}{Y_0}\right)^{1/3} - 16 \tag{6}$$
$$u^* = 13L^*(u' - u'_0) \tag{7}$$
$$v^* = 13L^*(v' - v'_0) \tag{8}$$
$$u' = \frac{4X}{X + 15Y + 3Z} \tag{9}$$
$$v' = \frac{9Y}{X + 15Y + 3Z} \tag{10}$$
In many cases, RGB is data having a value of 0 to 255, but the data converted into a value of 0 to 1 is applied to the above formula.
[0035]
Here Y₀ = 1, u′₀ = 0.201, and v′₀ = 0.461.
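Expressions (3) to (10) can be written directly as a small function. The sketch below uses the coefficients and reference values stated in the text; the handling of pure black (where the denominator of u′ and v′ is zero) and the clamping of L* at zero are assumptions added for the illustration, and the name `rgb_to_luv` is hypothetical.

```python
def rgb_to_luv(r, g, b):
    """Convert 8-bit RGB to (L*, u*, v*) following expressions (3)-(10)."""
    # Scale 0..255 to 0..1 before applying the formulas
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    x = 0.478 * r + 0.299 * g + 0.175 * b
    y = 0.263 * r + 0.655 * g + 0.051 * b
    z = 0.020 * r + 0.160 * g + 0.908 * b
    y0, u0, v0 = 1.0, 0.201, 0.461
    denom = x + 15.0 * y + 3.0 * z
    if denom == 0.0:
        # Assumption: pure black has no defined chromaticity, return the origin
        return (0.0, 0.0, 0.0)
    # Assumption: clamp L* at 0, since (6) goes negative for very dark colors
    lightness = max(0.0, 25.0 * (100.0 * y / y0) ** (1.0 / 3.0) - 16.0)
    u_prime = 4.0 * x / denom
    v_prime = 9.0 * y / denom
    return (lightness,
            13.0 * lightness * (u_prime - u0),
            13.0 * lightness * (v_prime - v0))
```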
[0036]
There are several variants of the conversion from RGB to XYZ, and the present invention uses the formulas above. The L*u*v* color space is not a cube, and L*, u*, and v* each take a different range of values. Its shape is roughly as shown in FIG. 3(b). L* takes values from 0 to 116, u* from -97 to 171, and v* from -128 to 109. FIG. 3(a) shows the RGB color space.
[0037]
A histogram is created from the color values obtained in this way, and clustering is performed on it. This system assumes that all characters are written horizontally and also uses this positional information in the clustering. Clustering is therefore performed not only on the three L*u*v* dimensions but on a four-dimensional space that also includes the vertical coordinate (y coordinate) of the image. This four-dimensional space is divided into 17 × 45 × 40 × 10 bins (17 for L*, 45 for u*, 40 for v*, and 10 for the y coordinate), and the histogram is created over these bins.
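A minimal sketch of the four-dimensional histogram follows. The bin counts and value ranges are those stated in the text; equal-width bins, the sparse `Counter` representation, and the names `bin_index` and `build_histogram` are assumptions made for the illustration.

```python
from collections import Counter

BINS = (17, 45, 40, 10)  # L*, u*, v*, y
RANGES = ((0.0, 116.0), (-97.0, 171.0), (-128.0, 109.0))  # L*, u*, v*

def bin_index(value, low, high, bins):
    """Map a value in [low, high] to an integer bin in [0, bins - 1]."""
    idx = int((value - low) * bins / (high - low))
    return min(max(idx, 0), bins - 1)

def build_histogram(luv_rows):
    """luv_rows: list of image rows, each a list of (L*, u*, v*) tuples.

    Returns a Counter keyed by (L bin, u bin, v bin, y bin).
    """
    height = len(luv_rows)
    hist = Counter()
    for y, row in enumerate(luv_rows):
        y_bin = bin_index(y, 0, height, BINS[3])
        for luv in row:
            color_bins = tuple(bin_index(c, lo, hi, n)
                               for c, (lo, hi), n in zip(luv, RANGES, BINS))
            hist[color_bins + (y_bin,)] += 1
    return hist
```

Keeping only the occupied bins avoids allocating all 17 × 45 × 40 × 10 = 306,000 cells, most of which stay empty for a typical image.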
[0038]
(3) “Fuzzy clustering of color information” in (3) above
Fuzzy clustering is performed on the color information of the input image to separate the colors. Fuzzy clustering differs from ordinary clustering, known as hard clustering, in that an element is allowed to belong partially to several clusters. In hard clustering, whether an element belongs to a cluster is expressed as 1 or 0, as shown in FIG. 4. In fuzzy clustering, this is instead expressed by a degree of membership, a value between 0 and 1 determined by the ratio of the distances to the cluster centers (see FIG. 5). The degree of membership indicates how strongly each cluster influences the element.
[0039]
As a result, it becomes possible to judge not only the binary question of whether an element belongs to a given cluster, but also the degree to which it belongs to each cluster.
[0040]
(3) -1: FCM will be described.
[0041]
As a typical fuzzy clustering, there is a fuzzy c-means method (FCM) by Bezdek et al., That is, the algorithm shown in the above document [8]. This is an algorithm extended by adding the idea of the degree of attribution to the K-means method. Here, the K-means method will be described. The K-means method is a clustering method when the number of clusters (K) is determined in advance. Here, first, an initial cluster center is appropriately given, and each element is assigned to the nearest cluster center. Thereafter, a new cluster center is calculated from the average value for each cluster, and a new cluster is obtained again. These operations are repeated until the cluster center does not change to obtain a final cluster.
[0042]
The FCM algorithm is briefly described as follows.
[0043]
Step 1: Choose c initial cluster centers v_i (i = 1, 2, ..., c) appropriately.
[0044]
Step 2: The degree of membership of every element x_k (k = 1, ..., n)
[0045]
[Expression 2]

[0046]
is computed.
Step 3: New cluster center
[0047]
[Equation 3]

[0048]
Here, m is a weight on the degree of membership and takes a value in the range 1 < m < ∞. The larger m becomes, the greater the influence that elements x_k with a high degree of membership have on the cluster center v_i. If m → 1 and u_ik ∈ {0, 1} are substituted into equations (11) and (12), the algorithm becomes identical to the ordinary K-means method.
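The FCM steps above can be sketched as follows. Expressions 2 and 3 are not reproduced in this text, so the sketch assumes they are the standard FCM updates from [8]: membership u_ik = 1 / Σ_j (‖x_k − v_i‖ / ‖x_k − v_j‖)^(2/(m−1)) and center v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m. The function names and the stopping rule (largest center shift below `epsilon`) are likewise assumptions.

```python
import math

def memberships(points, centers, m):
    """Return u[i][k], the membership of point k in cluster i (assumed standard FCM)."""
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [math.dist(x, v) for v in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point coincides with a center: give it full membership there
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return u

def update_centers(points, u, m):
    """Membership-weighted mean of the points for each cluster."""
    dim = len(points[0])
    centers = []
    for row in u:
        weights = [w ** m for w in row]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return centers

def fuzzy_c_means(points, centers, m=1.5, epsilon=1e-6, max_iter=100):
    """Alternate the two updates until the centers stop moving."""
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < epsilon:
            break
    return centers, memberships(points, centers, m)
```

For each element the memberships over all clusters sum to 1, which is what lets a color lying between a background cluster and a character cluster be shared between them instead of being forced into one.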
[0049]
(3) -2: Self-convergence type fuzzy clustering
In the present invention, self-convergence type fuzzy clustering in which processing such as cluster division, integration, and annihilation is added to FCM, which is the most common fuzzy clustering, is used.
[0050]
The self-convergence type fuzzy clustering algorithm is as follows.
[0051]
Step 1: Choose the initial number of clusters c, the membership weight m, the convergence threshold ε, the maximum number of iterations I, the cluster-splitting threshold θ_S, the cluster-merging threshold θ_d, and the cluster-elimination threshold θ_C.
[0052]
Step 2: Choose the initial cluster centers v_i (i = 1, 2, ..., c).
[0053]
Step 3: The degree of membership of each element x_k = (x_kL, x_ku, x_kv, x_ky) (k = 1, ..., n)
[0054]
[Expression 4]

[0055]
is computed.
[0056]
Step 4: The variance of each cluster
[0057]
[Equation 5]

[0058]
is computed, and the clusters are sorted by it.
[0059]
Step 5: Split every cluster whose σ_i is greater than or equal to θ_S. The new cluster centers after the split are
[0060]
[Formula 6]

[0061]
(S₁ = {x_k | x_kq > v_iq}, S₂ = {x_k | x_kq < v_iq}, where q is whichever of L, u, v, and y has the largest variance)
Step 6: Merge clusters whose inter-cluster distance is less than or equal to θ_d. The new cluster center after merging is the midpoint of the two cluster centers. Once clusters have been merged, recompute the degrees of membership of all elements.
[0062]
Step 7: Clusters whose number of member elements (weighted sum of memberships)
[0063]
[Expression 7]

[0064]
is less than or equal to θ_C are eliminated, and the degrees of membership of all elements are computed again.
[0065]
Step 8: The new cluster centers
[0066]
[Equation 8]

[0067]
are computed.
[0068]
[Equation 9]

[0069]
Otherwise, return to step 4.
[0070]
There are a number of algorithms based on the K-means method that increase or decrease the number of clusters and self-converge. The method of the present invention is an extension of the self-convergence method used in hard clustering to fuzzy clustering. Differences from the original fuzzy clustering are the cluster division process in step 4 and step 5, the cluster integration process in step 6, and the cluster disappearance process in step 7.
[0071]
The cluster division process is performed by checking the size of cluster dispersion. A cluster with a large variance is divided into two. Next, when the clusters are close to each other (when the distance between the cluster centers is small), the two clusters are made one. Then, a cluster whose number of elements to which it belongs (total weighted membership) is reduced due to the change in the number of clusters or the like is eliminated.
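The merging step (Step 6) can be sketched as follows. The text states only that clusters closer than θ_d are merged and that the new center is the midpoint, so merging the closest pair first and repeating until no pair is within θ_d is an assumption, as is the name `merge_close_clusters`. Recomputing the memberships afterwards is left to the caller.

```python
import math

def merge_close_clusters(centers, theta_d):
    """Merge the closest pair of centers while their distance is <= theta_d.

    Each merged pair is replaced by the midpoint of its two centers.
    """
    centers = [tuple(c) for c in centers]
    while len(centers) > 1:
        pairs = [(math.dist(centers[i], centers[j]), i, j)
                 for i in range(len(centers))
                 for j in range(i + 1, len(centers))]
        dist, i, j = min(pairs)
        if dist > theta_d:
            break  # no remaining pair is close enough to merge
        midpoint = tuple((a + b) / 2.0 for a, b in zip(centers[i], centers[j]))
        centers = [c for n, c in enumerate(centers) if n not in (i, j)]
        centers.append(midpoint)
    return centers
```

With θ_d = 3.0, as in the experimental settings, two centers that differ by 2 along L* collapse into one, while a center 50 units away is left alone.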
[0072]
However, with this kind of algorithm for determining the number of clusters, the solution may oscillate and fail to converge. To suppress this oscillation, the thresholds and the convergence condition can be relaxed as the iterations proceed.
[0073]
Each cluster and the degree of membership are calculated using fuzzy clustering that self-converges with an appropriate number of clusters, with such processing added.
[0074]
The data shown in the experimental results are the results calculated with the following initial values and threshold values.
[0075]
c = 1
m = 1.5
I = 10
ε = 1.0
θ_S= 8.0
θ_d= 3.0
θ_C= Total number of pixels / 100
Because the solution often oscillates, the thresholds were relaxed as the iterations proceeded. These values were determined purely by experiment.
[0076]
(4) “Creation of color separation image (binary image) based on attribution” in (4) above
The degree of membership of each color obtained by fuzzy clustering shows how strongly that color belongs to each cluster. Binary images are created from these degrees of membership. The threshold t_i used for this is determined from the number of elements ρ_i belonging to the cluster [expression (16)] by the following expression.
[0077]
[Expression 10]

[0078]
A cluster having a large number of belonging elements is considered to be a background color used in a wide range of the image, and the threshold value is increased to create a binary image that does not include a character portion as much as possible. Conversely, a cluster with a small number of elements belonging to it is highly likely to be a character color, and the threshold is lowered in order to extract it including the color blur.
[0079]
In this binarization, clusters that differ only in the y coordinate (the vertical pixel coordinate) and have almost the same L*, u*, and v* values are thresholded as a single cluster by adding their degrees of membership. In some images, a color used over a wide area, such as a background color, is split in two along the y coordinate. In such cases, processing the two clusters together has almost no effect on the separability of the character and background colors, so the clusters need to be merged in order to reduce the number of binary images and make processing more efficient.
[0080]
In this way, binary images such as those shown in FIG. 7 are created from the image of FIG. 6.
[0081]
(5) “Noise removal from the binary image” in (5) above
In the binary images that are created, dither that the smoothing process could not absorb, isolated points on the background, and the like appear as noise. Specifically, noise consists of components with a small number of connected pixels. How many connected pixels still count as noise depends on the resolution of the image. These components are removed to reduce memory use and speed up the subsequent labeling and character line extraction.
[0082]
(6) “Labeling of black pixels and white pixels” in (6) above
Labeling is an operation that assigns the same label (number) to all pixels belonging to the same connected component and different labels to different connected components. Labeling is performed on both the black pixels and the white pixels of the binarized image, and a circumscribed rectangle is obtained for each connected component. A binary image such as the one in FIG. 8 is labeled as shown in FIGS. 9(a) and 9(b). Labeling here uses 4-connectivity. Characters are then extracted on the basis of the rectangle (circumscribed rectangle) that encloses each connected component with the same label.
[0083]
FIGS. 10(a) and 10(b) show the circumscribed rectangles for the black-pixel and white-pixel labels, respectively. At this point, rectangles that are too large (more than a quarter of the input image), rectangles that are too small (both width and height of 3 pixels or less), and rectangles with an extreme aspect ratio (width/height of 15 or more, or 1/15 or less) are excluded because they are not character rectangles.
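The exclusion rules for circumscribed rectangles can be written as a small predicate. This is a sketch only: the name `is_character_candidate` is hypothetical, "a quarter of the input image" is interpreted here as a quarter of the image area, and the handling of the exact boundary values 15 and 1/15 is likewise an assumption.

```python
def is_character_candidate(box, image_width, image_height):
    """Return False for rectangles the text excludes as non-characters.

    box: (x0, y0, x1, y1) with inclusive corners.
    """
    x0, y0, x1, y1 = box
    w = x1 - x0 + 1
    h = y1 - y0 + 1
    # Too large: more than a quarter of the input image
    # (interpreted as area, which is an assumption).
    if w * h > image_width * image_height / 4:
        return False
    # Too small: both width and height are 3 pixels or less.
    if w <= 3 and h <= 3:
        return False
    # Extreme aspect ratio (boundary handling is an assumption).
    ratio = w / h
    if ratio >= 15 or ratio <= 1 / 15:
        return False
    return True
```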
[0084]
The circumscribed rectangle obtained from the image of FIG. 6 is as shown in FIG.
[0085]
(7) “Selecting a binary image suitable for character extraction” in (7) above
Select an image suitable for character extraction to speed up subsequent processing and save memory. In that case, the following four points are considered.
[0086]
・ Variance of rectangle sizes
・ Number of rectangles
・ Pixel density
・ Average rectangle size
Since each character line should be made up of rectangles of about the same size, the variance of the widths and heights of the rectangles in each binary image is calculated, and an image whose variance is too large is regarded as containing no character lines and is excluded from subsequent processing. An image having too many rectangles is also rejected because it is unlikely to contain a character color. Any rectangle whose black or white pixel density is too high or too low is rejected because it is very likely not a character. The pixel density is the value given by
density = N / (w + h) (19)
where N is the number of labeled pixels,
w is the width of the rectangle, and
h is the height of the rectangle.
In addition, an image whose average rectangle size is too small is rejected as not containing characters.
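The four selection measures can be computed as in the sketch below. The text gives no numerical limits for "too large", "too many", or "too small", so this sketch only computes the quantities and leaves the thresholds to the caller. The function names are hypothetical, equation (19) is used exactly as written in the text, and measuring the average rectangle size by area is an assumption.

```python
from statistics import mean, pvariance

def pixel_density(n_pixels, width, height):
    """Pixel density following equation (19) as given: N / (w + h)."""
    return n_pixels / (width + height)

def rectangle_stats(rects):
    """Summarize one binary image.

    rects: list of (width, height, n_pixels), one per labeled component.
    Returns the quantities used to decide whether the image is suitable
    for character extraction.
    """
    widths = [w for w, _, _ in rects]
    heights = [h for _, h, _ in rects]
    return {
        "count": len(rects),                     # number of rectangles
        "var_width": pvariance(widths),          # variance of widths
        "var_height": pvariance(heights),        # variance of heights
        "mean_area": mean(w * h for w, h, _ in rects),  # size as area (assumed)
        "densities": [pixel_density(n, w, h) for w, h, n in rects],
    }
```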
[0087]
(8) “Character line extraction” in (8) above
When binary images that appear to contain character rectangles have been obtained, the rectangles adjacent to each rectangle (those within a certain distance of the target rectangle) are found in order to extract rectangles that have the characteristics of a character string, and chains of rectangles (presumed to be character lines) are found based on them. An adjacent rectangle satisfies the following conditions when viewed from the target rectangle.
[0088]
* It is partially or wholly included in the range obtained by enlarging the target rectangle vertically and horizontally by 0.7 times its height (or width, whichever is greater) (see FIG. 11).
[0089]
* It is not completely contained in the target rectangle (that is, it may be partially contained) [see FIG. 12(a)].
[0090]
* It does not intersect with the target rectangle [see FIG. 12 (b)].
[0091]
A rectangle satisfying this condition is set as an adjacent rectangle of the target rectangle, and the rectangles are connected one after another to obtain a connected rectangle. The conditions for connecting the adjacent rectangles are as follows.
[0092]
* The rectangles are about the same size (the width and height are each between 1/4 and 4 times those of the target rectangle, and either the width or the height is between 1/2 and 2 times that of the target rectangle).
[0093]
* The representative colors of each rectangle are close to each other.
[0094]
* The rectangles are almost horizontally aligned (the angle θ between the lower-left coordinates of the rectangles is within 10° of the horizontal) (see FIG. 13).
[0095]
The rectangles satisfying the above conditions are connected one after another to obtain a connected rectangle.
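The three connection conditions can be checked with a predicate such as the one below. This is a sketch under stated assumptions: the name `can_connect`, the dictionary layout, and the color threshold `color_tol` are hypothetical (the text only says the representative colors are "close"), and color closeness is measured here as Euclidean distance in L*u*v*, which the text does not specify.

```python
import math

def can_connect(a, b, color_tol=20.0):
    """Check whether adjacent rectangle b may be connected to target a.

    a, b: dicts with keys x0, y0, x1, y1 (inclusive corners, y grows
    downward) and color, a representative (L, u, v) tuple.
    color_tol: hypothetical threshold, not a value from the text.
    """
    wa, ha = a["x1"] - a["x0"] + 1, a["y1"] - a["y0"] + 1
    wb, hb = b["x1"] - b["x0"] + 1, b["y1"] - b["y0"] + 1
    rw, rh = wb / wa, hb / ha
    # Size: both ratios within [1/4, 4] ...
    if not (0.25 <= rw <= 4 and 0.25 <= rh <= 4):
        return False
    # ... and at least one of them within [1/2, 2].
    if not (0.5 <= rw <= 2 or 0.5 <= rh <= 2):
        return False
    # Color: representative colors must be close (metric is assumed).
    if math.dist(a["color"], b["color"]) > color_tol:
        return False
    # Angle between the lower-left corners, measured from the horizontal.
    dx = b["x0"] - a["x0"]
    dy = b["y1"] - a["y1"]
    angle = abs(math.degrees(math.atan2(dy, dx)))
    angle = min(angle, 180 - angle)  # treat left and right neighbors alike
    return angle <= 10
```

Repeatedly applying such a check to adjacent rectangles yields the chains from which connected rectangles are formed.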
[0096]
If the number of connected rectangles obtained in this way is 4 or more, the connected rectangle is regarded as a part of a character line. Then, the width and height of the character line are estimated from the connected rectangle.
[0097]
The estimated height of the character line is the range obtained by expanding the uppermost and lowermost parts of the connected rectangle by one half of the height of the largest rectangle (see FIG. 14). For the width, rectangles of the same color as the connected rectangle are searched for within the height range obtained above, and the line is extended to the end at which such rectangles exist, as shown in FIG. 14. If the center of a rectangle lies within the estimated range of the character line and its color is close to the color of the connected rectangle, it is extracted as a character element.
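The vertical range estimate can be sketched as follows. The function names are hypothetical, and "the largest rectangle" is taken here to mean the rectangle with the greatest height, which is an assumption. The color comparison and the horizontal extension are omitted from this sketch.

```python
def estimate_line_range(chain):
    """Estimate the vertical range of a character line.

    chain: list of (x0, y0, x1, y1) boxes with inclusive corners,
           y growing downward, forming one connected rectangle.
    Returns (top, bottom) after expanding by half the largest height.
    """
    top = min(b[1] for b in chain)
    bottom = max(b[3] for b in chain)
    # Half the height of the tallest rectangle (interpretation assumed).
    margin = max(b[3] - b[1] + 1 for b in chain) / 2
    return top - margin, bottom + margin

def center_in_range(box, y_range):
    """True if the vertical center of box lies inside the estimated range."""
    cy = (box[1] + box[3]) / 2
    return y_range[0] <= cy <= y_range[1]
```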
[0098]
As described above, in experiments on several color document images, most of the characters could be extracted. FIG. 15 shows the result of extracting characters from FIG. 6. Several other experimental results are shown in FIG. 16 and FIG. 17, in which all characters are extracted.
[0099]
In particular, using fuzzy clustering allows characters to be extracted more effectively in several respects than using hard clustering. First, a character color used by only a small number of pixels is difficult to separate from the background in both hard and fuzzy clustering. In that case, the characters cannot be extracted with hard clustering, but with fuzzy clustering they can often be separated from the background through differences in the degree of attribution.
[0100]
In addition, in an image in which colors are used over a wide range of color space, hard clustering divides the color space more than necessary, which tends to cause deterioration in processing efficiency and erroneous background pattern extraction. In fuzzy clustering, they are improved. Conversely, when the number of colors used is small and biased, fuzzy clustering tends to generate more clusters. However, in fuzzy clustering, the number of clusters did not increase extremely, so it can be said that it is sufficiently useful in terms of processing efficiency.
[0101]
Character extraction by the method of the present invention is particularly effective when complex and diverse colors are used in the background. In the case of a simple background (no pattern, single color), characters are easily extracted using hard clustering or other conventional methods, many of which are superior to this method in memory use and processing speed.
[0102]
In the method of the present invention, a color separation image is created by performing clustering based on the color histogram and the vertical coordinates of the image. However, it may be difficult to completely separate the background and the character color by the method of the present invention alone.
[0103]
Further, in the future, it is conceivable to support not only horizontally written characters but also vertically written characters. In such a case, processing cannot be performed well only by the vertical coordinates as in the method of the present invention. In such a case, it is necessary to consider clustering that uses position information in an image more effectively than at present.
[0104]
The present invention is not limited to the above embodiment; various modifications are possible based on the spirit of the present invention, and they are not excluded from the scope of the present invention.
[0105]
[Effects of the invention]
As described above in detail, according to the present invention, the following effects can be obtained.
[0106]
(A) When extracting characters from a color document image, fuzzy clustering is used to group similar colors, so that only the character color portion can be extracted from a color document image having a complex and diverse background. Fuzzy clustering is a clustering algorithm that holds, for each element, a value called the degree of membership that represents how strongly the element belongs to each cluster. This makes it possible to determine the degree of membership of subtle colors (such as intermediate colors between the background and the characters) and to binarize accordingly. As a result, good results can be obtained even for low-quality images that are hard to handle with hard clustering and for images with abundant colors.
[0107]
(B) Colors used by only a small number of pixels can be separated effectively, and no more clusters than necessary are generated for an image rich in colors. The present invention has been found to be particularly effective for images having various colors with complex backgrounds.
[Brief description of the drawings]
FIG. 1 is a flowchart of character extraction from a color document image according to the present invention.
FIG. 2 is an explanatory diagram of an example of smoothing processing according to the present invention.
FIG. 3 is a diagram illustrating a shape example of a color space according to the present invention.
FIG. 4 is a diagram illustrating a result of a conventional hard clustering example.
FIG. 5 is a diagram showing an example of fuzzy clustering according to the present invention.
FIG. 6 is a diagram showing an example of an input image according to the present invention.
FIG. 7 is a diagram showing a color separation image of the input image of FIG. 6 according to the present invention.
FIG. 8 is a diagram illustrating an example of a binary image.
FIG. 9 is an explanatory diagram of labeling of the binary image in FIG. 8.
FIG. 10 is an explanatory diagram of a circumscribed rectangle of the binary image in FIG. 8.
FIG. 11 is an explanatory diagram of adjacent rectangles according to the present invention.
FIG. 12 is an explanatory diagram of adjacent rectangle conditions according to the present invention.
FIG. 13 is an explanatory diagram of angles between rectangles according to the present invention.
FIG. 14 is an explanatory diagram of an estimated range of character lines according to the present invention.
FIG. 15 is a diagram showing a character extraction result of the input image of FIG. 6 according to the present invention.
FIG. 16 is a diagram showing an experimental result (No. 1) of the present invention.
FIG. 17 is a diagram showing an experimental result (No. 2) of the present invention.
[Explanation of symbols]
1 Characters printed by the dither method
2 Characters without dither

Claims

A method for extracting characters from a color document image in which only a character color portion is extracted from a color document image having a complicated and diverse background,
(A) performing dither removal by smoothing from the color document image;
(B) the color values of the image from which the dither has been removed are converted from RGB to L*u*v*, and a histogram is created;
(C) next, fuzzy clustering, including division based on the dispersion of elements within a cluster and integration and elimination based on the number of belonging elements, is performed on a four-dimensional space comprising not only the three dimensions of L*u*v* but also the vertical coordinate (y coordinate) of the image, whereby the degree of belonging indicating how much each color belongs to each cluster is obtained,
(D) next, based on the degree of attribution, with the threshold value t_i depending on the number of elements belonging to the cluster (weighted total of the degrees of attribution) ρ_i,

that is, since a cluster with a large number of belonging elements is a background color used over a wide range of the image, the threshold is raised so as to create a binary image that contains as little of the character portion as possible, and conversely, since a cluster with a small number of belonging elements is likely to be a character color, the threshold is lowered so as to extract it including its color blur, whereby binary images serving as color separation images are created,
(E) Next, noise of the binary image is removed,
(F) Next, black pixels and white pixels are labeled,
(G) Next, a binary image suitable for character extraction is selected,
(H) next, character lines are extracted, the method for extracting characters from a color document image being characterized by the above steps.