JP4032735B2

JP4032735B2 - Image processing apparatus and image processing method

Info

Publication number: JP4032735B2
Application number: JP2001389763A
Authority: JP
Inventors: 葉子藤原; 勉山崎; 芳則田中; 昌裕小澤
Original assignee: Konica Minolta Business Technologies Inc
Current assignee: Konica Minolta Business Technologies Inc
Priority date: 2001-12-21
Filing date: 2001-12-21
Publication date: 2008-01-16
Anticipated expiration: 2021-12-21
Also published as: JP2003189095A

Description

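[0001]
[Technical Field of the Invention]
The present invention relates to an image processing apparatus and an image processing method, and more particularly to an image processing apparatus and an image processing method that perform region separation processing for separating photographic regions, graphic regions, and character regions from image data to be processed.
[0002]
[Prior Art]
An image processing technique is known in which photographic regions, graphic regions, and character regions are discriminated in image data obtained by reading a document, processing suited to each region is applied, and image data combining the regions is then output (see, for example, Japanese Patent Application Laid-Open No. H5-342408).
[0003]
[Problems to Be Solved by the Invention]
However, with the above prior art it can be difficult to discriminate the type of each region in the image data without error. For example, when multiple regions of different types are arranged in a complicated layout, or overlap one another, the type of a region is likely to be misjudged.
[0004]
For example, if part of a photographic region is misjudged as a character region, binarization suited to character regions may be applied to that part in a later step. Likewise, if part of a photographic region is misjudged as a graphic region, color reduction suited to graphic regions may be applied to that part in a later step, filling it with a single color. As a result, the quality of the photographic image degrades markedly.
[0005]
Thus, even when one wishes to give priority to extracting the photographic regions containing photographic images from the image data and to reproduce them with high image quality, the photographic regions cannot be extracted reliably, so inappropriate processing may be applied to them and degrade the image.
[0006]
The present invention has been made in view of the above problems of the prior art. Its object is to reliably extract photographic regions from read image data so that the photographic regions in that image data can be reproduced with high image quality.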
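[0007]
[Means for Solving the Problems]
The above object of the present invention is achieved by the following means.
[0008]
(1) An image processing apparatus having region separation means for separating photographic regions, graphic regions, and character regions from image data to be processed, wherein the region separation means comprises photographic-region-first extraction means for identifying and extracting photographic regions from the image data before graphic regions and character regions, and identifies and separates graphic regions and character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data; and wherein the photographic-region-first extraction means has first photographic region extraction means for applying to the image data a first region division process capable of dividing it into a plurality of predetermined regions and extracting those divided regions identified as photographic regions, and second photographic region extraction means for applying, to the data remaining after the regions identified as photographic regions by the first photographic region extraction means have been extracted from the image data, a second region division process capable of dividing it into a plurality of regions smaller than the predetermined regions and extracting those divided regions identified as photographic regions.
[0009]
(2) The image processing apparatus according to (1) above, wherein the region separation means further comprises graphic-region-preceding extraction means for identifying and extracting graphic regions before character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data.
[0010]
(3) The image processing apparatus according to (1) above, wherein the region separation means further comprises character-region-preceding extraction means for identifying and extracting character regions before graphic regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data.
[0012]
(4) The image processing apparatus according to (1) above, wherein the first region division process is capable of division into the plurality of predetermined regions by detecting edges of a binary image that distinguishes the photographic, graphic, or character region portions of the image data from the remaining background portion, and the second region division process is capable of division into the plurality of regions smaller than the predetermined regions by detecting edges in the data remaining after the regions identified as photographic regions have been extracted from the image data.
(5) The image processing apparatus according to any one of (1) to (4) above, further comprising reading means for obtaining image data by reading a document, wherein the image data to be processed is obtained by the reading means.
(6) An image processing method having a region separation step of separating photographic regions, graphic regions, and character regions from image data to be processed, wherein the region separation step comprises a photographic-region-first extraction step of identifying and extracting photographic regions from the image data before graphic regions and character regions, and is a step of identifying and separating graphic regions and character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data; and wherein the photographic-region-first extraction step has a first photographic region extraction step of applying to the image data a first region division process capable of dividing it into a plurality of predetermined regions and extracting those divided regions identified as photographic regions, and a second photographic region extraction step of applying, to the data remaining after the regions identified as photographic regions in the first photographic region extraction step have been extracted from the image data, a second region division process capable of dividing it into a plurality of regions smaller than the predetermined regions and extracting those divided regions identified as photographic regions.
(7) An image processing program for causing an image processing apparatus to execute a region separation procedure of separating photographic regions, graphic regions, and character regions from image data to be processed, wherein the region separation procedure comprises a photographic-region-first extraction procedure of identifying and extracting photographic regions from the image data before graphic regions and character regions, and is a procedure of identifying and separating graphic regions and character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data; and wherein the photographic-region-first extraction procedure has a first photographic region extraction procedure of applying to the image data a first region division process capable of dividing it into a plurality of predetermined regions and extracting those divided regions identified as photographic regions, and a second photographic region extraction procedure of applying, to the data remaining after the regions identified as photographic regions in the first photographic region extraction procedure have been extracted from the image data, a second region division process capable of dividing it into a plurality of regions smaller than the predetermined regions and extracting those divided regions identified as photographic regions.
(8) A computer-readable recording medium on which the image processing program according to (7) above is recorded.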
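[0013]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the drawings.
[0014]
FIG. 1 is a block diagram showing the overall configuration of an image processing system that includes an image processing apparatus according to an embodiment of the present invention. The image processing system comprises an image processing apparatus 1, a scanner 2, and a file server 3, which are connected so that they can communicate with one another via a computer network 4. The types and number of devices connected to the computer network are not limited to the example shown in FIG. 1.
[0015]
FIG. 2 is a block diagram showing the configuration of the image processing apparatus 1 according to this embodiment. In FIG. 2, the image processing apparatus 1 comprises a control unit 101, a storage unit 102, an operation unit 103, an input interface unit 104, an output interface unit 105, a region separation unit 106, an image processing unit 108, a document file creation unit 109, and a file format conversion unit 110, which are interconnected via a bus 111 for exchanging signals.
[0016]
The control unit 101 is a CPU that controls the above units and performs various arithmetic operations in accordance with programs. The storage unit 102 consists of a ROM that holds various programs and parameters in advance, a RAM that temporarily stores programs and data as a work area, and a hard disk or the like that stores various programs and parameters or temporarily holds image data and other results of image processing.
[0017]
The operation unit 103 consists of keys, an operation panel, and the like for setting various items or instructing the start of operation. As shown in FIG. 3, the items that can be set include the destination of the image data, the output file format, the document mode, the scan conditions, and the post-scan processing.
[0018]
The input interface unit 104 is an interface for receiving data such as image data, commands, and the like. The output interface unit 105 is an interface for transmitting data such as output files, commands, and the like.
[0019]
The region separation unit 106 separates photographic regions, graphic regions, and character regions from the image data. The image processing unit 108 consists of a photographic region processing unit 108a, a graphic region processing unit 108b, and a character region processing unit 108c. The region processing units 108a to 108c apply image processing appropriate to the region type to the photographic regions, graphic regions, and character regions, respectively, extracted by the region separation unit 106.
[0020]
The document file creation unit 109 combines the regions containing the processed images sent from the photographic region processing unit 108a, the graphic region processing unit 108b, and the character region processing unit 108c, and creates a document file in an internal file format. The file format conversion unit 110 converts the document file created in the internal file format into the output file format that has been set. Output file formats include the document formats of various word-processing software and general-purpose formats such as PostScript (registered trademark), PDF, JPEG, and TIFF.
[0021]
The scanner 2 reads a document to acquire image data and transmits the acquired image data to the image processing apparatus.
[0022]
The file server 3 is a computer that stores files received via the computer network 4 and, in response to a transfer request, transfers a stored file to another device on the computer network.
[0023]
The computer network 4 consists of a LAN in which computers, peripherals, network devices, and the like are connected according to a standard such as Ethernet (registered trademark), Token Ring, or FDDI, a WAN in which LANs are connected to one another by dedicated lines, or the like.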
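[0024]
Next, the processing procedure in the image processing apparatus 1 of this embodiment is described with reference to FIG. 4. The algorithm shown in the flowchart of FIG. 4 is stored as a program in the storage unit 102 of the image processing apparatus 1 and is executed by the control unit 101.
[0025]
First, in step S101, various items are set: the destination of the image data, the output file format, the document mode, the scan conditions, and the post-scan processing. The destination is set by entering the IP address, host name, e-mail address, or the like of the image output destination device. The output file format is set by selecting the file format of the output file to be sent to the image output destination device.
[0026]
The document mode is set by selecting which region's image is to be given priority among photographic regions containing photographic images, graphic regions containing graphic images, and character regions containing character images. Here, a photographic image is an image with continuously varying gradation, such as a photograph or a picture. A graphic image is an image such as lines or solid fills, created on a personal computer, for example.
[0027]
As shown in FIG. 3, in this embodiment the user can select, through the operation unit 103 and according to the content of the document, the region to be processed with the highest priority (first priority region) and the region to be processed with the second-highest priority (second priority region). If the user does not select the first and/or second priority region, the priority order of the regions is determined by predetermined default values.
[0028]
The scan conditions are set by specifying the scan area, scan resolution, color or monochrome, and so on. The post-scan processing is set by specifying character recognition, vector conversion, the image compression method, the color reduction method, the output resolution, and so on.
[0029]
In step S102, the apparatus waits for a command to start image processing. The start command is issued when the user operates, for example, the start key of the operation unit 103.
[0030]
In step S103, a document read command is sent to the scanner 2 via the output interface unit 105. On receiving the document read command from the image processing apparatus 1, the scanner 2 reads the document set at a predetermined position to acquire image data and transmits the acquired image data to the image processing apparatus 1.
[0031]
In step S104, the apparatus waits until image data is received from the scanner 2 via the input interface unit 104. When image data is received from the scanner 2, the received image data (RGB image data) is saved in the storage unit 102.
[0032]
FIG. 5 schematically shows an example of received image data. The image data shown in FIG. 5 consists of photographic images PI1 to PI4, graphic images GI1 to GI3, character images CI1 to CI6, and a background U. As illustrated, graphic image GI3 and character images CI1 and CI3 are placed inside photographic image PI1, and photographic images PI3 and PI4 and character images CI5 and CI6 are placed inside graphic image GI2. Here, the background is the part of the image data other than the photographic, graphic, and character regions, for example the part corresponding to areas of the original document where the unprinted paper color remains.
[0033]
The command to start image processing may also be input from another device on the computer network 4 or from the scanner 2.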
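[0034]
In step S105, the region separation unit 106 separates photographic, graphic, and character regions from the image data received from the scanner 2, that is, the image data to be processed. In this embodiment, in accordance with the document mode set by the user's selection through the operation unit 103, the region separation unit 106 extracts the first priority region from the image data first and then separates the remaining two region types from the data left after that extraction. Further, when a second priority region has been selected, the remaining two region types are separated by extracting the second priority region first from the data left after the first priority region has been extracted. An image is created for each of the three region types, and each region is extracted as a region containing its image. The region separation procedure is described in detail later.
[0035]
In step S106, processing suited to photographic regions is applied to the photographic regions separated in step S105. Specifically, the photographic region processing unit 108a, for example, converts the resolution of the photographic image in each photographic region, applies lossy compression for color images, and saves the result in the storage unit 102 together with position information.
[0036]
In step S107, processing suited to graphic regions is applied to the graphic regions separated in step S105. Specifically, the graphic region processing unit 108b, for example, applies smoothing, color reduction, and the like to the graphic image in each graphic region, then applies lossless compression for color images, and saves the result in the storage unit 102 together with position information.
[0037]
In step S108, processing suited to character regions is applied to the character regions separated in step S105. Specifically, the character region processing unit 108c, for example, binarizes the character image in each character region, applies lossless compression for 1-bit data, and saves the result in the storage unit 102 together with color information and position information.
[0038]
In step S109, the document file creation unit 109 combines the three regions containing the processed images sent from the photographic region processing unit 108a, the graphic region processing unit 108b, and the character region processing unit 108c to create a document file.
[0039]
The regions are combined by outputting the photographic region 600, the graphic region 700, and the character region 800 onto memory, as shown for example in FIG. 6(A). Here, as shown in FIG. 6(B), the part of photographic region 600 other than photographic image 600a, the part of graphic region 700 other than graphic image 700a, and the part of character region 800 other than character image 800a are designated as mask portions 600b to 800b, respectively. Each of regions 600 to 800 is given as the circumscribed rectangle of the corresponding image 600a to 800a. A mask portion is a portion subjected to mask processing so that the information already stored in memory remains valid. First, photographic region 600 and graphic region 700 are mask-processed and placed, and then character region 800 is mask-processed and placed. Mask processing is applied to all regions because the image in a region need not be rectangular; shapes into which images of other regions intrude are also allowed. In this way, as shown in FIG. 6(A), the images in the three regions are output onto memory without any of them being lost, and the combination of regions is complete.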
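The masked placement described above can be illustrated with a minimal sketch. The function name `composite`, the use of `None` to stand for a mask pixel, and the character-grid canvas are all assumptions made for illustration; the patent does not specify a data layout.

```python
def composite(canvas, regions):
    """Paste rectangular regions onto the canvas in the given order.

    Each region is (top, left, pixels). A pixel value of None marks the
    mask portion, so whatever is already on the canvas stays valid there.
    """
    for top, left, pixels in regions:
        for dy, row in enumerate(pixels):
            for dx, value in enumerate(row):
                if value is not None:
                    canvas[top + dy][left + dx] = value
    return canvas


# A 3x4 canvas; the photographic region is placed first, the character region last.
canvas = [["." for _ in range(4)] for _ in range(3)]
photo = (0, 0, [["P", "P", None], ["P", "P", "P"]])
text = (1, 1, [["T", None]])
result = composite(canvas, [photo, text])
```

Placing the character region last mirrors the order given in the text; its masked pixel leaves the photographic pixel underneath untouched.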
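[0040]
The character region is placed after the other regions because, as described later, after a character region is extracted during region separation, interpolation is performed that fills the part where the character image was with the background of the original image data. If the character region were placed before the other regions, the character image would be hidden by the interpolated part of a region placed later; placing it last prevents this. When interpolation is also performed on the part where a graphic image was after the graphic region is extracted, the regions are output onto memory in the order photographic region, graphic region, character region.
[0041]
In step S110, the file format conversion unit 110 converts the document file created in step S109 into the output file format that has been set.
[0042]
In step S111, the output file obtained in step S110 is transmitted to the file server 3 via the output interface unit 105 and the computer network 4.
[0043]
In this embodiment, on receiving the output file from the image processing apparatus 1 via the computer network 4, the file server 3 expands the character images and graphic images from the received file, applies character recognition to the character images to convert them into character code data, applies vector conversion to the graphic images to convert them into vector data, recombines the converted data with the photographic images, converts the result into a predetermined file format, and stores the resulting document file in a predetermined directory of a storage device such as a hard disk. When another device on the computer network 4 requests transfer of the file, the file server 3 transfers the stored file to that device via the computer network 4.
[0044]
Next, the procedure of the region separation processing in step S105 of FIG. 4, which is the characteristic feature of the image processing in the image processing apparatus 1 of this embodiment, is described in more detail.
[0045]
FIGS. 7 to 12 are flowcharts showing the region separation procedure for each document mode. Six document modes in total can be set in step S101 of FIG. 4. The first and second priority regions are, respectively, the photographic and graphic regions in the first mode (see FIG. 7), the photographic and character regions in the second mode (see FIG. 8), the graphic and photographic regions in the third mode (see FIG. 9), the graphic and character regions in the fourth mode (see FIG. 10), the character and photographic regions in the fifth mode (see FIG. 11), and the character and graphic regions in the sixth mode (see FIG. 12).
[0046]
Each of the region separation processes in FIGS. 7 to 12 contains processing blocks whose content is common to all modes: region division by binarization, first photographic/graphic region extraction, region division by edges, second photographic/graphic region extraction, photographic/graphic region extraction, and character region extraction. What differs between the processes is which region the document mode prioritizes, and hence the order in which regions are extracted. In the first mode, for example, the priority runs from photographic region to graphic region to character region, and the extraction order is the same.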
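As a small illustration of how the six document modes map to an extraction order, the sketch below derives the third region from the first and second priority regions. The names `MODE_PRIORITIES` and `extraction_order` and the string labels are hypothetical; only the mode-to-priority pairs come from the text.

```python
REGIONS = ("photo", "graphic", "text")

# (first priority, second priority) for document modes 1 to 6, as listed in the text.
MODE_PRIORITIES = {
    1: ("photo", "graphic"),
    2: ("photo", "text"),
    3: ("graphic", "photo"),
    4: ("graphic", "text"),
    5: ("text", "photo"),
    6: ("text", "graphic"),
}


def extraction_order(first, second):
    """Return the full extraction order; the unselected region comes last."""
    if first == second or first not in REGIONS or second not in REGIONS:
        raise ValueError("first and second must be two different region types")
    third = next(r for r in REGIONS if r not in (first, second))
    return (first, second, third)


order_mode_4 = extraction_order(*MODE_PRIORITIES[4])
```

For the fourth mode this gives graphic, then character, then photographic regions, which matches the order shown for FIG. 32.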
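[0047]
The content of each processing block in the region separation processing is described in detail below, following the processing order of the first mode in FIG. 7 as an example.
[0048]
(Region division by binarization)
First, the procedure for region division by binarization is described with reference to FIG. 13.
[0049]
From the received image data shown, for example, in FIG. 5, an image consisting of lightness values, that is, a lightness image, is created (step S301). Next, the background is removed from the lightness image (step S302), and smoothing with a smoothing filter is performed (step S303). The lightness image, with background and noise removed, is binarized at the lightness level of the background (hereinafter "background level") (step S304). This yields a binary image in which everything other than the background is filled with black, as shown in FIG. 14. Edge detection is then performed on this binary image using, for example, a Laplacian filter, which is a second-derivative filter (step S305). Next, closing (dilation followed by erosion), a type of morphological processing, is executed to interpolate the edges, giving an image made up of edges, that is, an edge image (step S306, see FIG. 15). The edge image of FIG. 15 corresponds to the outlines of the binary image of FIG. 14.
[0050]
By detecting the outlines of the binary image obtained by binarizing at the background level in this way, regions are divided such that multiple regions touching one another are merged into one. In the first to fourth modes, four large regions and the character regions are obtained, as shown in FIGS. 14 and 15. In the fifth and sixth modes, where character regions have priority, character regions are extracted before region division by binarization, so only the four large regions are obtained.
[0051]
(First photographic/graphic region extraction)
Next, the procedure for first photographic/graphic region extraction is described with reference to FIGS. 16 and 17.
[0052]
The processing for first photographic/graphic region extraction is executed for each region delimited by the outlines of the binary image obtained by binarizing at the background level (each region enclosed by a closed curve in FIG. 15). First, based on the edge positions in the edge image of FIG. 15, the positions of the inter-edge line segments in a first direction, for example the main scanning direction, are detected (step S401). Then, using the pixels of the lightness image that correspond to the position of the inter-edge line segment of interest, a histogram (first histogram) such as that in FIG. 18 is created (step S402). A second histogram such as that in FIG. 19 is created by smoothing the first histogram with an averaging filter (|1|0|1|) (step S403). Next, according to the following equation, the difference between the first histogram Hist1 and the second histogram Hist2 is calculated for each gradation level, and the sum of the differences is taken as the feature value (step S404). Lightness is represented as 8-bit data and has 256 gradation levels.
[0053]
[Equation 1]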
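The equation image itself is missing from this text. Based only on the description in step S404 (a per-level difference between Hist1 and Hist2, summed over the 256 gradation levels), one plausible form is the following. Taking the absolute value of each difference is an assumption; the original equation may differ.

```latex
\[
\text{feature value} \;=\; \sum_{i=0}^{255} \bigl|\, \mathrm{Hist1}[i] - \mathrm{Hist2}[i] \,\bigr|
\]
```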

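[0054]
Next, the ratio R1 (= N1/T1) of the total number of pixels N1 on the inter-edge line segment of interest to a predetermined constant T1 is calculated (step S405). The constant T1 is a first parameter for separating photographic regions from graphic regions. The feature value is then compared with the ratio R1, which serves as the threshold (step S406). If the feature value is judged to be greater than R1 (step S406: NO), all pixels on the inter-edge line segment in the first direction are regarded as belonging to a graphic region, labeling (the process of assigning numbers) is performed, and labeling data is generated (step S407). That is, each pixel is labeled with the result of the region judgment; specifically, the result is stored in association with the pixel position. If the feature value is judged to be R1 or less (step S406: YES), all pixels on the inter-edge line segment in the first direction are regarded as belonging to a photographic region, and labeling data is generated (step S408). It is then determined whether the inter-edge line segment of interest is the last one in the first direction (step S409). If it is not (step S409: NO), processing returns to step S402 and the above is repeated.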
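Steps S402 to S408 can be sketched for a single inter-edge line segment as follows. This is a minimal illustration, not the patented implementation: the absolute difference, the replicated histogram borders, the function names, and the value chosen for T1 in the usage lines are all assumptions.

```python
def histogram(values, levels=256):
    """First histogram: count of pixels at each lightness level (S402)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    return hist


def smooth(hist):
    """Second histogram: averaging filter |1|0|1| (S403).

    Each bin becomes the mean of its two neighbours; the border bins reuse
    their own value for the missing neighbour (an assumption).
    """
    n = len(hist)
    return [(hist[max(i - 1, 0)] + hist[min(i + 1, n - 1)]) / 2 for i in range(n)]


def feature(values):
    """Sum over all levels of |Hist1 - Hist2| (S404, absolute value assumed)."""
    h1 = histogram(values)
    h2 = smooth(h1)
    return sum(abs(a - b) for a, b in zip(h1, h2))


def classify_segment(values, t1):
    """Compare the feature value with R1 = N1 / T1 (S405 to S408)."""
    r1 = len(values) / t1
    return "graphic" if feature(values) > r1 else "photo"


flat = [100] * 64             # one solid lightness: a sharply peaked histogram
ramp = list(range(64, 128))   # smooth gradient: a spread-out histogram
T1 = 2.0                      # hypothetical parameter value
```

With these inputs the solid segment is labeled as graphic and the gradient as photographic, in line with the later remark that graphic regions have a fairly uniform lightness distribution while photographic regions are dispersed.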
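[0055]
Next, based on the edge image of FIG. 15, the positions of the inter-edge line segments in a second direction orthogonal to the first direction, for example the sub-scanning direction, are detected (step S410). Then, based on the labeling data created in steps S407 and S408, the ratio R2 (= N3/N2) of the number of pixels N3 belonging to a photographic region to the total number of pixels N2 on the inter-edge line segment of interest is calculated (step S411). The ratio R2 is then compared with a predetermined constant T2, which serves as the threshold (step S412). The constant T2 is a second parameter for separating photographic regions from graphic regions. If R2 is judged to be smaller than T2 (step S412: NO), all pixels on the inter-edge line segment of interest in the second direction are regarded as belonging to a graphic region and are relabeled (step S413). If R2 is judged to be T2 or greater (step S412: YES), all pixels on the inter-edge line segment of interest in the second direction are regarded as belonging to a photographic region and are relabeled (step S414). It is then determined whether the inter-edge line segment of interest is the last one in the second direction (step S415). If it is not (step S415: NO), processing returns to step S411 and the above is repeated.
[0056]
Next, within one region obtained by region division by binarization, the number N5 of pixels labeled as belonging to a photographic region is compared with the number N4 of pixels labeled as belonging to a graphic region (step S416). If N5 < N4 (step S416: NO), the region is judged to be a graphic region (step S417). If N5 ≥ N4 (step S416: YES), the region is judged to be a photographic region (step S418). The judgment is then finalized for whichever of the photographic or graphic region has the higher priority, all pixels in that region are regarded as belonging to the higher-priority region type and relabeled, and the region is extracted from the image data (step S419). Here, the circumscribed rectangle of the region is calculated by tracing the region's outline based on position data, and a rectangular region containing the pixels that correspond to coordinates inside the circumscribed rectangle and are labeled as belonging to the region is extracted from the image data.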
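The two voting stages in steps S411 to S418 can be sketched as below. The function names and string labels are hypothetical, and the value of T2 in the usage lines is an arbitrary example.

```python
def relabel_segment(labels, t2):
    """Second-direction pass (S411 to S414).

    R2 = N3 / N2 is the share of pixels on the segment already labeled as
    photographic; the whole segment is relabeled by comparing R2 with T2.
    """
    r2 = labels.count("photo") / len(labels)
    return ["photo" if r2 >= t2 else "graphic"] * len(labels)


def classify_region(labels):
    """Region-level decision (S416 to S418): photographic when N5 >= N4."""
    n5 = labels.count("photo")
    n4 = labels.count("graphic")
    return "photo" if n5 >= n4 else "graphic"


column = ["photo", "photo", "graphic"]
relabeled = relabel_segment(column, t2=0.5)   # T2 value is hypothetical
decision = classify_region(["photo", "graphic"])
```

Note that a tie at the region level resolves to a photographic region, because the text states the condition as N5 ≥ N4.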
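[0057]
In this way, in the first photographic/graphic region extraction, for each region obtained by region division by binarization, an image feature value between the edges that bound the region is calculated to judge whether the region is a photographic region or a graphic region. When photographic regions have higher priority than graphic regions (first, second, and fifth modes), photographic regions are finalized and extracted in rectangular form. When graphic regions have higher priority than photographic regions (third, fourth, and sixth modes), graphic regions are finalized and extracted in rectangular form.
[0058]
In this embodiment, as described above, the judgment between photographic and graphic regions rests on the characteristic that the lightness distribution of a graphic region is fairly uniform whereas that of a photographic region is dispersed. The judgment method is not limited to this; for example, frequency components extracted from the lightness image may be used as the feature value.
[0059]
(Region division by edges)
Next, the procedure for region division by edges is described with reference to FIG. 20.
[0060]
Region division by edges is executed on the data remaining after the higher-priority of the photographic or graphic regions has been extracted by the first photographic/graphic region extraction.
[0061]
First, edge detection is performed on each of the R, G, and B component images of the image data using an edge detection filter such as a Laplacian filter (steps S501 to S503). Next, OR processing is performed to obtain the union of the edges detected in the R, G, and B component images (step S504), and closing is performed to eliminate breaks in the edges (step S505). In the first to fourth modes, character regions have not yet been extracted at this stage, so the edges of character images are also detected. However, a region is excluded from region division by edges if, for example, the size (height and width) of its circumscribed rectangle is below a predetermined threshold and the ratio of the number of valid pixels inside the region to the size of the circumscribed rectangle is below a predetermined threshold. As a result, the edges of character images are not treated as region boundaries.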
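Steps S501 to S504 can be sketched in plain Python as follows. This is an illustration under assumptions: the 4-neighbour Laplacian kernel, the threshold value, leaving border pixels unmarked, and the function names are not taken from the patent, and the closing step (S505) is omitted.

```python
def laplacian_edges(channel, threshold):
    """Mark interior pixels whose 4-neighbour Laplacian magnitude >= threshold."""
    h, w = len(channel), len(channel[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (channel[y - 1][x] + channel[y + 1][x]
                   + channel[y][x - 1] + channel[y][x + 1]
                   - 4 * channel[y][x])
            edges[y][x] = abs(lap) >= threshold
    return edges


def combine_edges(r, g, b, threshold=64):
    """OR the per-channel edge maps so an edge in any channel counts (S504)."""
    er, eg, eb = (laplacian_edges(c, threshold) for c in (r, g, b))
    h, w = len(r), len(r[0])
    return [[er[y][x] or eg[y][x] or eb[y][x] for x in range(w)] for y in range(h)]


# Only the green channel contains a step; red and blue are flat.
flat = [[0, 0, 0, 0] for _ in range(3)]
step = [[0, 0, 255, 255] for _ in range(3)]
combined = combine_edges(flat, step, flat)
```

The usage lines show why the union matters: a boundary visible in only one colour component would be missed by a single-channel detector but survives the OR.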
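[0062]
FIG. 21 schematically shows an example of regions obtained by region division by edges; (A) shows the case of the first, second, or fifth mode, and (B) shows the case of the third, fourth, or sixth mode. In FIG. 21(A), photographic regions have already been extracted ahead of graphic regions by the first photographic/graphic region extraction. That is, region P1 (including region G3) and region P2 shown in FIG. 21(B) have already been extracted as photographic regions in FIG. 21(A). In FIG. 21(B), graphic regions have already been extracted ahead of photographic regions by the first photographic/graphic region extraction. That is, region G1 and region G2 (including regions P3 and P4) shown in FIG. 21(A) have already been extracted as graphic regions in FIG. 21(B).
[0063]
Region division by edges is thus performed to further extract the higher-priority region when, in the data remaining after the first photographic/graphic region extraction, a higher-priority region still remains overlapping a lower-priority region or contained inside a lower-priority region. In other words, detecting the above edges yields a finer region division.
[0064]
(Second photographic/graphic region extraction)
Next, the procedure for second photographic/graphic region extraction is described. In the second photographic/graphic region extraction, the same processing as the first photographic/graphic region extraction described above is performed again on the regions obtained by region division by edges shown in FIG. 21. As a result, in the first, second, and fifth modes, photographic regions inside graphic regions that were not extracted as photographic regions by the first extraction are extracted. In FIG. 21(A), for example, regions P3 and P4 are additionally extracted as photographic regions. In the third, fourth, and sixth modes, graphic regions inside photographic regions that were not extracted as graphic regions by the first extraction are extracted. In FIG. 21(B), for example, region G3 is additionally extracted as a graphic region.
[0065]
In the photographic/graphic region extraction performed after the second photographic/graphic region extraction in FIGS. 7 to 12, the regions obtained by region division by edges (FIG. 21) that were left unextracted by the second photographic/graphic region extraction are extracted.
[0066]
(Character region extraction)
Next, the procedure for character region extraction is described with reference to FIG. 22.
[0067]
To simplify the explanation, the case where character region extraction is applied to the image data shown, for example, in FIG. 23 is described here.
[0068]
First, region integration is performed on the image data (step S601). This process makes it possible to extract character regions that contain, for example, character images on a background image or character images with differing pixel values. Specifically, the lightness image of the image data is first smoothed and then binarized with a variable threshold to create an edge image. Binarization with a variable threshold is, for example as shown in FIG. 24, a process that binarizes the pixel of interest using as the threshold the maximum gradation value of the pixels at the four corners of a 5×5 block minus an offset value. Next, the spacing between black pixels in the main scanning direction of the edge image is measured, and all white pixels between black pixels whose spacing is at most a predetermined interval are replaced with black pixels, producing a connected edge image in which black pixels are connected in the main scanning direction. The same processing is then repeated in the sub-scanning direction of the connected edge image, giving a connected edge image in which black pixels are connected in both the main and sub-scanning directions. By connecting neighboring black pixels in this way and integrating individual character images isolated in the image data into one region, the image processing apparatus 1 can extract each reasonably coherent character string as a single region.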
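The region integration in step S601 can be sketched as two small routines: variable-threshold binarization and gap closing along rows and then columns. Several details are assumptions made for illustration: a pixel is treated as black when it is darker than the threshold, coordinates are clamped at the image border, the 5×5 block is centred on the pixel of interest, and all names and parameter values are hypothetical.

```python
def variable_threshold(lum, offset):
    """Mark a pixel black (True) when it is darker than
    max(four corners of its 5x5 block) - offset."""
    h, w = len(lum), len(lum[0])

    def at(y, x):
        # Clamp coordinates at the image border (an assumption).
        return lum[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            corners = max(at(y - 2, x - 2), at(y - 2, x + 2),
                          at(y + 2, x - 2), at(y + 2, x + 2))
            row.append(lum[y][x] < corners - offset)
        out.append(row)
    return out


def close_gaps(line, max_gap):
    """Fill white runs of length <= max_gap that lie between two black pixels."""
    out = list(line)
    last_black = None
    for x, black in enumerate(line):
        if black:
            if last_black is not None and x - last_black - 1 <= max_gap:
                for k in range(last_black + 1, x):
                    out[k] = True
            last_black = x
    return out


def connect(edges, max_gap):
    """Connect black pixels along rows, then along columns."""
    rows = [close_gaps(r, max_gap) for r in edges]
    cols = [close_gaps(list(c), max_gap) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


binary = variable_threshold([[200, 200, 50, 200, 200]], offset=30)
joined = connect([[True, False, True],
                  [False, False, False],
                  [True, False, False]], max_gap=1)
```

Running the row pass before the column pass follows the order in the text (main scanning direction first, then sub-scanning direction).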
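[0069]
Next, region extraction is performed (step S602). This process extracts each group of connected black pixels separately as one region. Specifically, each group of connected black pixels in the connected edge image is first labeled. At the same time as labeling, position information (width, height, and coordinates) of the circumscribed rectangle of each group of connected black pixels with the same label is detected, and a labeling image is created. Then, based on the circumscribed rectangles and label numbers detected during labeling, the region enclosed by each circumscribed rectangle is extracted from the labeling image as a local region. By extracting a circumscribed rectangle containing only pixels with the same label number, images in layouts where circumscribed rectangles overlap can also be separated and extracted. FIG. 25 shows the connected edge image obtained by variable-threshold binarization and black-pixel connection, together with the circumscribed rectangles found for each group of connected black pixels with the same label in the labeling image obtained from the connected edge image data.
[0070]
Next, the diagonal edge components of the image belonging to each local region extracted in step S602 are extracted as a feature value (S603), and local regions whose content ratio of diagonal edge components is within a predetermined range are discriminated as character regions (S604). Compared with other regions such as graphics, photographs, and ruled lines, character regions contain many diagonal edge components within a small area. Therefore, by extracting diagonal edge components as the frequency component characteristic of character regions and finding their content ratio in a local region, it can be judged whether the local region is a character region. This extraction of diagonal edge components is equivalent to extracting the high-frequency components from the frequency components obtained by a 2×2 DCT (discrete cosine transform). That is, applying a DCT with a 2×2 matrix to the image in the local region, setting the high-frequency components of the resulting frequency components to "0", and performing the inverse DCT gives a restored image from which the high-frequency components have been removed. Taking the difference between the original image and the restored image then extracts only the high-frequency components of the original image. Applying the filter processing shown in FIG. 26 allows this to be done quickly. FIG. 27 shows an example of a diagonal edge component image obtained by binarizing the extracted high-frequency components. A local region mostly corresponds to a single word. Therefore, when a local region is a character region, the content ratio of diagonal edge components in it, that is, the ratio of the total number of black pixels of FIG. 27 belonging to the local region to the area of the local region, falls within a predetermined range (about 0.2% to 20%). Local regions whose ratio is within this range are therefore discriminated as character regions.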
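The equivalence between the 2×2 DCT round trip and a simple filter can be made concrete. For a block [[a, b], [c, d]] under the orthonormal 2×2 DCT, the highest-frequency coefficient is (a − b − c + d)/2, and zeroing it changes each pixel by ±(a − b − c + d)/4. The sketch below uses that closed form; whether FIG. 26 uses exactly this filter is not confirmed by the text, and the non-overlapping block tiling, the binarization threshold, and the function names are assumptions. The 0.2% to 20% range is taken from the text.

```python
def diagonal_residual(block):
    """Original minus restored image for one 2x2 block, where the restored
    image is the inverse DCT with the highest-frequency coefficient zeroed."""
    (a, b), (c, d) = block
    f11 = (a - b - c + d) / 2   # diagonal (highest-frequency) DCT coefficient
    r = f11 / 2                 # its contribution to each pixel
    return [[r, -r], [-r, r]]


def diagonal_edge_ratio(image, threshold):
    """Share of pixels lying in 2x2 blocks whose residual magnitude reaches
    the threshold (non-overlapping blocks are an assumption)."""
    h, w = len(image), len(image[0])
    hits = 0
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            r = abs(image[y][x] - image[y][x + 1]
                    - image[y + 1][x] + image[y + 1][x + 1]) / 4
            if r >= threshold:
                hits += 4
    return hits / (h * w)


def looks_like_text(image, threshold=16, low=0.002, high=0.20):
    """Character region when the ratio is within about 0.2% to 20%."""
    return low <= diagonal_edge_ratio(image, threshold) <= high


checker = [[255, 0], [0, 255]]
sparse = [[255, 0, 0, 0, 0, 0, 0, 0],
          [0, 255, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0]]
```

A purely horizontal or vertical edge inside a block gives a zero residual, which is why this component responds to diagonal structure only.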
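[0071]
Next, character image creation is performed (step S605). That is, the original image data (the image data received from the scanner 2) within each local region discriminated as a character region in step S604 is binarized to distinguish the character part from its background part, and a character image consisting only of the character part is created. The threshold used for binarization is set for each character region. For example, the following method can be used to set it. First, for each character region, a lightness histogram such as that in FIG. 28(A) is created using the lightness image of the image data in the region. Next, the lightness histogram is converted to percentages of the number of pixels in the character region and differentiated twice; outputting "1" where the second derivative is at least a predetermined value and "0" otherwise creates a peak detection histogram such as that in FIG. 28(B), from which peaks are detected. The threshold is then determined as follows: when two or more peaks are detected, the center value of the peaks at both ends; when one peak is detected, the average of that peak and the left and right rise values of the lightness histogram (the "Left" and "Right" values in FIG. 28(A)); when no peak is detected, the center value of the left and right rise values of the lightness histogram. Because a different binarization threshold is used depending on the number of peaks in the lightness histogram of the character region, character images on a background image, inverted character images, and the like can be binarized without losing parts of the image.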
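The three-way threshold rule at the end of step S605 can be sketched as below. Peak detection itself is left out and the peak positions are passed in directly. For the one-peak case the text is ambiguous, so the plain mean of the three values (peak, Left, Right) is used here as an assumption; the function name is hypothetical.

```python
def binarization_threshold(peaks, left, right):
    """Choose the threshold for one character region from its lightness peaks.

    peaks: lightness levels of the detected histogram peaks
    left, right: the left and right rise values of the lightness histogram
    """
    if len(peaks) >= 2:
        # Midpoint of the two outermost peaks.
        return (min(peaks) + max(peaks)) / 2
    if len(peaks) == 1:
        # Mean of the peak and both rise values (one reading of the text).
        return (peaks[0] + left + right) / 3
    # No peak: midpoint of the rise values.
    return (left + right) / 2


two_peaks = binarization_threshold([40, 200], left=10, right=250)
one_peak = binarization_threshold([90], left=30, right=240)
no_peak = binarization_threshold([], left=50, right=150)
```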
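[0072]
Next, image interpolation is performed (step S606). That is, the character image consisting only of the character part is removed from the original image data, and the removed part is interpolated with the background pixels of that character image. The background pixels of the character image can be identified from the image obtained by binarizing each character region in step S605. The value of the background pixels used for interpolation is given by calculating the per-channel RGB average of the pixels corresponding to the background of the character image in the original RGB image data.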
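The interpolation in step S606 amounts to replacing character pixels with the mean background colour of the same character region. A minimal sketch follows; the function name, the boolean mask representation, and rounding the mean to integers are assumptions.

```python
def fill_text_with_background(pixels, text_mask):
    """Replace character pixels with the per-channel mean of background pixels.

    pixels: rows of (r, g, b) tuples for one character region
    text_mask: rows of booleans, True where the pixel is part of a character
    """
    background = [p for row, mrow in zip(pixels, text_mask)
                  for p, m in zip(row, mrow) if not m]
    if not background:
        # Nothing to average; return an unchanged copy.
        return [list(row) for row in pixels]
    mean = tuple(round(sum(p[c] for p in background) / len(background))
                 for c in range(3))
    return [[mean if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(pixels, text_mask)]


region = [[(10, 20, 30), (0, 0, 0), (30, 40, 50)]]
mask = [[False, True, False]]
filled = fill_text_with_background(region, mask)
```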
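[0073]
In this way, the image processing apparatus 1 integrates adjacent regions by connecting neighboring black pixels, extracts the integrated regions, calculates a feature value expressing character likeness, uses it to discriminate whether each extracted region is a character region, and then creates a character image consisting only of the character part from the image data in each region discriminated as a character region. The part left after removing that character image is then interpolated with background pixels.
[0074]
With this character region extraction, character regions can be reliably extracted even when a character image overlaps a photographic or graphic image. However, when a document mode that prioritizes photographic or graphic regions over character regions is set, a character image overlapping a photographic or graphic image is extracted first as part of the photographic or graphic region.
[0075]
As described above, photographic, graphic, and character regions are separated from the image data received from the scanner in the extraction order of FIGS. 7 to 12 that corresponds to the document mode that has been set.
[0076]
According to this embodiment, the extraction order of the regions can be set when separating photographic, graphic, and character regions from image data, which makes it possible to control which region is extracted preferentially. Consequently, a high-priority region is extracted preferentially, including any other regions it contains, and is also extracted preferentially even when it lies inside another region. Because a high-priority region is extracted before the others, it is not misjudged and extracted as another region type, and degradation caused by inappropriate processing of that region is prevented.
[0077]
FIGS. 29 to 34 show the photographic, graphic, and character regions separated from the image data of FIG. 5 by the region separation processing of the first to sixth modes, respectively; (A) shows the regions extracted first, (B) the regions extracted second, and (C) the regions extracted third.
[0078]
When the first priority region is the photographic region (first and second modes), as shown in FIGS. 29(A) and 30(A), photographic regions are extracted first from the received image data and graphic and character regions are then separated from the remaining data, so photographic regions are no longer extracted as an adjunct to another region under the influence of that region's separation processing. Photographic regions can thus be extracted reliably without being misjudged as other regions. Therefore, when the main aim is to reproduce photographic regions with high image quality, more photographic regions can be reliably extracted and processed appropriately. In other words, it is possible to prevent, for example, part of a photographic region from being misjudged as a character region and binarized in a later step. It is likewise possible to prevent part of a photographic region from being misjudged as a graphic region and filled with a single color by color reduction in a later step. That is, degradation of photographic images through inappropriate processing is prevented. Extracting photographic regions with priority also has the advantage of preserving the content of the original image data: even if a graphic or character region is misjudged as a photographic region, it can still be reproduced as an image, so its content is preserved.
[0079]
Further, when graphic regions are extracted before character regions from the image data remaining after photographic region extraction (first mode), it is possible to prevent part of a graphic region from being misjudged as a character region and processed accordingly in a later step. Degradation of photographic and graphic images is therefore reduced. When character regions are extracted before graphic regions from the image data remaining after photographic region extraction (second mode), character images inside graphic regions can be extracted. Degradation of photographic and character images is therefore reduced.
[0080]
When the first priority region is the graphic region (third and fourth modes), as shown in FIGS. 31(A) and 32(A), graphic regions are extracted first from the received image data and photographic and character regions are then separated from the remaining data, so graphic regions are no longer extracted as an adjunct to another region under the influence of that region's separation processing. Graphic regions can thus be extracted reliably without being misjudged as other regions. Therefore, when the main aim is to apply processing suited to graphic regions, such as vector conversion, more graphic regions can be reliably extracted and processed appropriately. In other words, when a graphic region overlaps a photographic region, for example, it is possible to prevent the whole from being misjudged as a photographic region and JPEG-compressed, which would introduce noise. It is also possible to prevent a region containing a graphic image easily mistaken for a character image from being misjudged as a character region and subjected to binarization, and further to character recognition, which are suited to character regions. That is, degradation of graphic images through inappropriate processing is prevented.
[0081]
Further, when photographic regions are extracted before character regions from the image data remaining after graphic region extraction (third mode), it is possible to prevent part of a photographic region from being misjudged as a character region and processed accordingly in a later step. Degradation of graphic and photographic images is therefore reduced. When character regions are extracted before photographic regions from the image data remaining after graphic region extraction (fourth mode), character images inside photographic regions can be extracted. Lossy compression can therefore be applied to photographic images without degrading graphic and character images.
[0082]
When the first priority region is the character region (fifth and sixth modes), as shown in FIGS. 33(A) and 34(A), character regions are extracted first from the received image data and photographic and graphic regions are then separated from the remaining data, so character regions are no longer extracted as an adjunct to another region under the influence of that region's separation processing. Character regions can thus be extracted reliably without being misjudged as other regions. Therefore, when the main aim is to apply processing suited to character regions, such as character recognition, more character regions can be reliably extracted and processed appropriately. In other words, when a character image overlaps a photographic or graphic image in the image data, for example, it is possible to prevent the whole from being discriminated and extracted as a photographic or graphic region without the character region being recognized, which would not only keep character recognition from being performed adequately but also subject the character image to inappropriate processing.
[0083]
Further, when photographic regions are extracted before graphic regions from the image data remaining after character region extraction (fifth mode), photographic regions contained in graphic regions, for example, can be extracted. Degradation of character and photographic images is therefore reduced. When graphic regions are extracted before photographic regions from the image data remaining after character region extraction (sixth mode), graphic regions contained in photographic regions, for example, can be extracted. Lossy compression can therefore be applied to photographic images without degrading character and graphic images.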
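[0084]
The present invention is not limited to the embodiment described above and can be modified in various ways within the scope of the claims.
[0085]
Besides the form shown in the above embodiment, the image processing apparatus of the present invention can also be applied to devices such as scanners, computers such as personal computers, workstations, and servers, digital copiers, facsimile machines, and MFPs (multi-function peripherals).
[0086]
In the above embodiment, the file server 3 expands the character images and graphic images from the file received from the image processing apparatus 1 and applies character recognition and vector conversion to them, respectively, but these processes may instead be performed by the image processing apparatus 1. The content of the individual processing blocks in the region separation processes of FIGS. 7 to 12 can also be changed as appropriate.
[0087]
In the above embodiment, the image processing apparatus 1 is configured so that the region to be extracted preferentially among photographic, graphic, and character regions is set according to the content of the image data, but the present invention is not limited to this. In an image processing apparatus according to the present invention, the region extracted first may be fixed in advance; for example, the photographic region may be fixed as the region extracted first. The extraction order of the regions may also be fixed in advance; for example, the order photographic region, graphic region, character region, or the order photographic region, character region, graphic region may be fixed as the extraction order.
[0088]
The image processing apparatus and image processing method according to the present invention can be realized either by a dedicated hardware circuit for executing the above procedures or by a CPU executing a predetermined program in which the above procedures are described. In the latter case, the predetermined program that operates the image processing apparatus may be provided on a computer-readable recording medium such as a flexible disk or CD-ROM, or may be provided online via a network such as the Internet. In this case, the program recorded on the computer-readable recording medium is normally transferred to and stored on a hard disk or the like. The program may be provided, for example, as standalone application software, or may be built into the software of the image processing apparatus as one of its functions.
[0094]
[Effects of the Invention]
As described above, according to the image processing apparatus of the present invention, photographic regions can be extracted first from the image data to be processed, after which graphic and character regions are separated from the remaining data. Photographic regions are therefore no longer extracted as an adjunct to another region under the influence of that region's separation processing. Photographic regions can thus be extracted reliably without being misjudged as other regions. Therefore, when the main aim is to reproduce photographic regions with high image quality, more photographic regions can be reliably extracted and processed appropriately.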
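[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the overall configuration of an image processing system that includes an image processing apparatus according to an embodiment of the present invention.
[FIG. 2] A block diagram showing an example of the configuration of the image processing apparatus.
[FIG. 3] A diagram showing an example of the document mode setting screen of the operation unit.
[FIG. 4] A flowchart showing the processing procedure in the image processing apparatus.
[FIG. 5] A diagram schematically showing an example of image data received from the scanner.
[FIG. 6] Diagrams for explaining the combination of regions; (A) shows the state after combination and (B) the state before combination.
[FIG. 7] A flowchart showing the region separation procedure in the first mode.
[FIG. 8] A flowchart showing the region separation procedure in the second mode.
[FIG. 9] A flowchart showing the region separation procedure in the third mode.
[FIG. 10] A flowchart showing the region separation procedure in the fourth mode.
[FIG. 11] A flowchart showing the region separation procedure in the fifth mode.
[FIG. 12] A flowchart showing the region separation procedure in the sixth mode.
[FIG. 13] A flowchart showing the procedure for region division by binarization.
[FIG. 14] A diagram showing a binary image in which everything other than the background in FIG. 5 is filled with black.
[FIG. 15] A diagram showing an image made up of the edges of FIG. 14.
[FIG. 16] A flowchart showing the procedure for first photographic/graphic region extraction.
[FIG. 17] A flowchart, continued from FIG. 16, showing the procedure for first photographic/graphic region extraction.
[FIG. 18] A diagram showing the first histogram.
[FIG. 19] A diagram showing the second histogram.
[FIG. 20] A flowchart showing the procedure for region division by edges.
[FIG. 21] Diagrams schematically showing an example of regions obtained by region division by edges; (A) shows the case of the first, second, or fifth mode and (B) the case of the third, fourth, or sixth mode.
[FIG. 22] A flowchart showing the procedure for character region extraction.
[FIG. 23] A diagram showing image data used to explain character region extraction.
[FIG. 24] A diagram for explaining binarization with a variable threshold.
[FIG. 25] A diagram showing the connected edge image obtained by variable-threshold binarization and black-pixel connection, together with the circumscribed rectangles found for each group of connected black pixels with the same label in the labeling image obtained from the connected edge image data.
[FIG. 26] A diagram for explaining the filter processing used when removing high-frequency components from the characteristic frequency components of image data.
[FIG. 27] A diagram showing an example of a diagonal edge component image obtained by binarizing the extracted high-frequency components.
[FIG. 28] Diagrams showing an example of (A) a lightness histogram created from the lightness image of the image data in a character region and (B) a peak detection histogram.
[FIG. 29] Diagrams showing, for the region separation processing of the first mode applied to the image data of FIG. 5, (A) the photographic regions extracted first, (B) the graphic regions extracted second, and (C) the character regions extracted third.
[FIG. 30] Diagrams showing, for the region separation processing of the second mode applied to the image data of FIG. 5, (A) the photographic regions extracted first, (B) the character regions extracted second, and (C) the graphic regions extracted third.
[FIG. 31] Diagrams showing, for the region separation processing of the third mode applied to the image data of FIG. 5, (A) the graphic regions extracted first, (B) the photographic regions extracted second, and (C) the character regions extracted third.
[FIG. 32] Diagrams showing, for the region separation processing of the fourth mode applied to the image data of FIG. 5, (A) the graphic regions extracted first, (B) the character regions extracted second, and (C) the photographic regions extracted third.
[FIG. 33] Diagrams showing, for the region separation processing of the fifth mode applied to the image data of FIG. 5, (A) the character regions extracted first, (B) the photographic regions extracted second, and (C) the graphic regions extracted third.
[FIG. 34] Diagrams showing, for the region separation processing of the sixth mode applied to the image data of FIG. 5, (A) the character regions extracted first, (B) the graphic regions extracted second, and (C) the photographic regions extracted third.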
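[Explanation of Reference Signs]
1: image processing apparatus
101: control unit
102: storage unit
103: operation unit
104: input interface unit
105: output interface unit
106: region separation unit
108: image processing unit
108a: photographic region processing unit
108b: graphic region processing unit
108c: character region processing unit
109: document file creation unit
110: file format conversion unit
111: bus
2: scanner
3: file server
4: computer network
[0001]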
BACKGROUND OF THE INVENTION
The present invention relates to an image processing device and an image processing method, and more particularly to an image processing device and an image processing method for performing region separation processing for separating a photographic region, a graphic region, and a character region from image data to be processed.
[0002]
[Prior art]
An image processing technique is known that distinguishes photographic regions, graphic regions, and character regions in image data obtained by reading a document, performs processing adapted to each region, and then outputs image data that combines the regions (for example, see JP-A-5-342408).
[0003]
[Problems to be solved by the invention]
However, in the above prior art, it may be difficult to determine the type of each area in the image data without error. For example, when a plurality of different types of areas are arranged in a complicated layout in the image data, or are arranged so as to overlap each other, there is a high possibility that the type of the area is erroneously determined.
[0004]
For example, if there is a part in the photo area that is misidentified as a character area, the binarization process adapted to the character area may be performed on the part in a later step. Further, if there is a portion that is erroneously determined as a graphic region in the photographic region, there is a possibility that a color-reduction process adapted to the graphic region is performed on the portion in a subsequent process and the portion is filled with a single color. As a result, the picture quality of the photographic image is significantly reduced.
[0005]
In this way, even if it is desired to extract a photographic area including a photographic image from image data and to reproduce the extracted photographic area with high image quality, the photographic area cannot be reliably extracted. In addition, improper processing may be performed on the photographic area, which may cause image degradation.
[0006]
The present invention has been made in view of the above-described problems of the prior art, and an object of the present invention is to reliably extract the photographic area from read image data so that the photographic area in that image data can be reproduced with high image quality.
[0007]
[Means for Solving the Problems]
The above object of the present invention is achieved by the following means.
[0008]
(1) An image processing apparatus having region separation means for separating photographic regions, graphic regions, and character regions from image data to be processed, wherein the region separation means comprises photographic-region-first extraction means for identifying and extracting photographic regions from the image data before graphic regions and character regions, and identifies and separates graphic regions and character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data; and wherein the photographic-region-first extraction means has first photographic region extraction means for applying to the image data a first region division process capable of dividing it into a plurality of predetermined regions and extracting those divided regions identified as photographic regions, and second photographic region extraction means for applying, to the data remaining after the regions identified as photographic regions by the first photographic region extraction means have been extracted from the image data, a second region division process capable of dividing it into a plurality of regions smaller than the predetermined regions and extracting those divided regions identified as photographic regions.
[0009]
(2) The image processing apparatus according to (1) above, wherein the region separation means further comprises graphic-region-preceding extraction means for identifying and extracting graphic regions before character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data.
[0010]
(3) The image processing apparatus according to (1) above, wherein the region separation means further comprises character-region-preceding extraction means for identifying and extracting character regions before graphic regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data.
[0012]
(4) The image processing apparatus according to (1) above, wherein the first region division process is capable of division into the plurality of predetermined regions by detecting edges of a binary image that distinguishes the photographic, graphic, or character region portions of the image data from the remaining background portion, and the second region division process is capable of division into the plurality of regions smaller than the predetermined regions by detecting edges in the data remaining after the regions identified as photographic regions have been extracted from the image data.
(5) The image processing apparatus according to any one of (1) to (4) above, further comprising reading means for obtaining image data by reading a document, wherein the image data to be processed is obtained by the reading means.
(6) An image processing method having a region separation step of separating photographic regions, graphic regions, and character regions from image data to be processed, wherein the region separation step comprises a photographic-region-first extraction step of identifying and extracting photographic regions from the image data before graphic regions and character regions, and is a step of identifying and separating graphic regions and character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data; and wherein the photographic-region-first extraction step has a first photographic region extraction step of applying to the image data a first region division process capable of dividing it into a plurality of predetermined regions and extracting those divided regions identified as photographic regions, and a second photographic region extraction step of applying, to the data remaining after the regions identified as photographic regions in the first photographic region extraction step have been extracted from the image data, a second region division process capable of dividing it into a plurality of regions smaller than the predetermined regions and extracting those divided regions identified as photographic regions.
(7) An image processing program for causing an image processing apparatus to execute a region separation procedure of separating photographic regions, graphic regions, and character regions from image data to be processed, wherein the region separation procedure comprises a photographic-region-first extraction procedure of identifying and extracting photographic regions from the image data before graphic regions and character regions, and is a procedure of identifying and separating graphic regions and character regions from the data, with regions not yet identified, that remains after the photographic regions have been identified and extracted from the image data; and wherein the photographic-region-first extraction procedure has a first photographic region extraction procedure of applying to the image data a first region division process capable of dividing it into a plurality of predetermined regions and extracting those divided regions identified as photographic regions, and a second photographic region extraction procedure of applying, to the data remaining after the regions identified as photographic regions in the first photographic region extraction procedure have been extracted from the image data, a second region division process capable of dividing it into a plurality of regions smaller than the predetermined regions and extracting those divided regions identified as photographic regions.
(8) A computer-readable recording medium on which the image processing program according to (7) above is recorded.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0014]
FIG. 1 is a block diagram showing the overall configuration of an image processing system including an image processing apparatus according to an embodiment of the present invention. The image processing system includes an image processing apparatus 1, a scanner 2, and a file server 3, which are connected to each other via a computer network 4 so as to communicate with each other. The type and number of devices connected to the computer network are not limited to the example shown in FIG. 1.
[0015]
FIG. 2 is a block diagram illustrating a configuration of the image processing apparatus 1 according to the present embodiment. In FIG. 2, the image processing apparatus 1 includes a control unit 101, a storage unit 102, an operation unit 103, an input interface unit 104, an output interface unit 105, an area separation unit 106, an image processing unit 108, a document file creation unit 109, and a file format conversion unit 110, which are connected to each other via a bus 111 for exchanging signals.
[0016]
The control unit 101 is a CPU, and controls the above-described units and performs various arithmetic processes according to a program. The storage unit 102 consists of a ROM that stores various programs and parameters in advance, a RAM that temporarily stores programs and data as a work area, and a hard disk or the like that is used to store various programs and parameters or to temporarily store image data obtained by image processing.
[0017]
The operation unit 103 includes keys, an operation panel, and the like for setting various items or instructing operation start. As shown in FIG. 3, items that can be set include a destination of image data, an output file format, a document mode, scan conditions, post-scan processing, and the like.
[0018]
The input interface unit 104 is an interface for receiving data such as image data and instructions, and the output interface unit 105 is an interface for transmitting data such as output files and instructions.
[0019]
The area separation unit 106 separates a photographic area, a graphic area, and a character area from the image data. The image processing unit 108 includes a photographic region processing unit 108a, a graphic region processing unit 108b, and a character region processing unit 108c. The region processing units 108a to 108c apply image processing suited to the type of region to the photo area, the graphic area, and the character area extracted by the area separation unit 106, respectively.
[0020]
The document file creation unit 109 combines the regions containing the processed images sent from the photo region processing unit 108a, the graphic region processing unit 108b, and the character region processing unit 108c, and creates a document file in the internal file format. The file format conversion unit 110 converts the document file created in the internal file format into the set output file format. Examples of the output file format include the document formats of various document creation software and general-purpose formats such as Postscript (registered trademark), PDF, JPEG, and TIFF.
[0021]
The scanner 2 reads a document, acquires image data, and transmits the obtained image data to the image processing apparatus.
[0022]
The file server 3 is a computer, which stores a file received via the computer network 4 and transfers the stored file to another device on the computer network in response to a transfer request.
[0023]
The computer network 4 includes a LAN in which computers, peripheral devices, network devices, and the like are connected according to standards such as Ethernet (registered trademark), token ring, and FDDI, and a WAN in which LANs are connected by a dedicated line.
[0024]
Next, a processing procedure in the image processing apparatus 1 of the present embodiment will be described with reference to FIG. Note that the algorithm shown in the flowchart of FIG. 4 is stored as a program in the storage unit 102 of the image processing apparatus 1 and executed by the control unit 101.
[0025]
First, in step S101, various items are set. That is, settings are made for the destination of image data, the output file format, the document mode, the scan conditions, and the post-scan processing. Here, the transmission destination of the image data is set by inputting the IP address, host name, mail address, etc. of the image output destination device. The output file format is set by selecting the file format of the output file to be transmitted to the image output destination device.
[0026]
The document mode is set by selecting which area of the image is to be given priority among a photo area including a photographic image, a graphic area including a graphic image, and a character area including a character image. Here, a photographic image refers to an image whose gradation changes continuously, such as a photograph or a picture. A graphic image refers to an image created on a personal computer or the like, such as a line drawing or a solid-filled figure.
[0027]
As shown in FIG. 3, in this embodiment the user can select, through the operation unit 103 and according to the content of the document, the area to be processed with the highest priority (first priority area) and the area to be processed with the second highest priority (second priority area). When the user does not select the first priority area and/or the second priority area, the priority order of the areas is determined according to a predetermined default value.
[0028]
The scan condition is set by specifying a scan area, scan resolution, color / monochrome, and the like. Setting of post-scan processing is performed by designating character recognition processing, vector conversion processing, image compression method, color reduction method, output resolution, and the like.
[0029]
In step S102, the process waits for an image processing start command. The start command is issued when the user operates, for example, a start key of the operation unit 103.
[0030]
In step S103, a document reading command is transmitted to the scanner 2 via the output interface unit 105. Here, when receiving a document reading command from the image processing apparatus 1, the scanner 2 reads the document set at a predetermined position to acquire image data, and transmits the obtained image data to the image processing apparatus 1.
[0031]
In step S104, the process waits until image data is received from the scanner 2 via the input interface unit 104. Here, when image data is received from the scanner 2, the received image data (RGB image data) is stored in the storage unit 102.
[0032]
FIG. 5 is a diagram schematically illustrating an example of received image data. The image data shown in FIG. 5 includes photographic images PI1 to PI4, graphic images GI1 to GI3, character images CI1 to CI6, and background U. As shown in the figure, the graphic image GI3 and the character images CI1 and CI3 are arranged in the photographic image PI1, and the photographic images PI3 and PI4 and the character images CI5 and CI6 are arranged in the graphic image GI2. Here, the background means the portion of the image data other than the photo areas, graphic areas, and character areas. For example, it is the portion of the image data corresponding to the part of the document where nothing is printed and the original color of the paper remains.
[0033]
Note that an image processing start command may be input from another device on the computer network 4 or from the scanner 2.
[0034]
In step S105, the region separation unit 106 separates the photo region, the graphic region, and the character region from the image data received from the scanner 2, that is, the image data to be processed. In the present embodiment, the area separation unit 106 first extracts the first priority area from the image data according to the document mode selected by the user through the operation unit 103, and then separates the remaining two areas from the data that remains after the first priority area has been extracted. Further, when a second priority area has been selected, the remaining two areas are separated by extracting the second priority area first from the data remaining after the first priority area has been extracted from the image data. Here, an image is created for each of the three types of regions, and each region is extracted as a region including its image. Details of the procedure of the region separation process will be described later.
[0035]
In step S106, processing adapted to the photographic area is performed on the photographic area separated in step S105. That is, the photographic image in the photographic area is subjected to, for example, resolution conversion and then irreversible compression processing of the color image by the photographic area processing unit 108a, and is stored in the storage unit 102 together with the position information.
[0036]
In step S107, processing adapted to the graphic area is performed on the graphic area separated in step S105. That is, the graphic image in the graphic area is subjected to, for example, smoothing processing, color reduction processing, and the like by the graphic region processing unit 108b, then undergoes reversible compression processing of the color image, and is stored in the storage unit 102 together with the position information.
[0037]
In step S108, processing adapted to the character area is performed on the character area separated in step S105. That is, the character image in the character area is, for example, binarized by the character region processing unit 108c, then subjected to lossless compression processing of 1-bit data, and stored in the storage unit 102 together with the color information and the position information.
[0038]
In step S109, the document file creation unit 109 combines the above three regions containing the processed images sent from the photo region processing unit 108a, the graphic region processing unit 108b, and the character region processing unit 108c, respectively, and creates a document file.
[0039]
For example, as shown in FIG. 6A, the areas are combined by outputting a photo area 600, a graphic area 700, and a character area 800 onto the memory. Here, as shown in FIG. 6B, the portion other than the photographic image 600a in the photographic region 600, the portion other than the graphic image 700a in the graphic region 700, and the portion other than the character image 800a in the character region 800 are designated as mask portions 600b to 800b, respectively. Each of the regions 600 to 800 is given by the circumscribed rectangle of the corresponding image 600a to 800a. A mask portion is a portion subjected to mask processing that keeps valid the information previously stored in the memory. First, the photographic area 600 and the graphic area 700 are masked and arranged, and then the character area 800 is masked and arranged. Mask processing is performed for all the regions because the image in each region is not necessarily rectangular, and the image of another region may have a shape that extends into it. In this way, as shown in FIG. 6A, the images in the three regions are output onto the memory without any loss, and the composition of the regions is completed.
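The mask-based composition described above can be illustrated with a minimal Python sketch. The function name `place_region`, the use of NumPy arrays, and the convention that `True` in the mask marks the mask portion are assumptions made for illustration only; the embodiment does not specify a data layout.

```python
import numpy as np

def place_region(canvas, region, mask, top, left):
    """Write a rectangular region onto the canvas (the output memory).

    mask is True in the mask portion, where whatever is already stored
    in the canvas must stay valid, and False where the region's own
    image pixels are written. (Hypothetical helper, not from the source.)
    """
    h, w = mask.shape
    target = canvas[top:top + h, left:left + w]  # view into the canvas
    target[~mask] = region[~mask]                # mask portion is left untouched
    return canvas
```

Calling such a helper first for the photo and graphic regions and last for the character region reproduces the arrangement order given in the text.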
[0040]
The character area is arranged after the other areas because, as will be described later, after the character area is extracted during the area separation process, interpolation processing is performed that fills the part of the original image data where the character image existed with its background pixels. That is, if the character area were arranged before the other areas, the character image could be hidden by the interpolated portion of an area arranged later; arranging the character area last prevents this. In addition, when interpolation processing is also performed on the part where the graphic image existed after the graphic area is extracted, the areas are output onto the memory in the order of the photographic area, the graphic area, and the character area.
[0041]
In step S110, the file format conversion unit 110 converts the document file created in step S109 into the set output file format.
[0042]
In step S111, the output file obtained in step S110 is transmitted to the file server 3 via the output interface unit 105 and the computer network 4.
[0043]
In this embodiment, when the file server 3 receives the output file from the image processing apparatus 1 via the computer network 4, it expands the character image and the graphic image from the received file, converts the character image into character code data by character recognition processing, converts the graphic image into vector data by vector conversion processing, recombines the converted data with the photographic image, and converts the result into a predetermined file format. The document file thus obtained is stored in a predetermined directory of a storage device such as a hard disk. When there is a transfer request for the file from another device on the computer network 4, the stored file is transferred to the other device via the computer network 4.
[0044]
Next, the procedure of the region separation process in step S105 shown in FIG. 4 which is a feature of the image processing of the image processing apparatus 1 in the present embodiment will be described in more detail.
[0045]
FIGS. 7 to 12 are flowcharts showing the procedure of the region separation process corresponding to each document mode. There are a total of six document modes that can be set in step S101. That is, the document modes are a first mode (see FIG. 7) in which the first priority area and the second priority area are a photographic area and a graphic area, respectively, a second mode (see FIG. 8) in which they are a photographic area and a character area, respectively, a third mode (see FIG. 9) in which they are a graphic area and a photographic area, respectively, a fourth mode (see FIG. 10) in which they are a graphic area and a character area, respectively, a fifth mode (see FIG. 11) in which they are a character area and a photographic area, respectively, and a sixth mode (see FIG. 12) in which they are a character area and a graphic area, respectively.
[0046]
Each of the region separation processes of FIGS. 7 to 12 is built from processing blocks whose contents are common to all modes: region division by binarization, first photo/graphic region extraction, region division by edge, second photo/graphic region extraction, photo/graphic region extraction, and character region extraction. In other words, the region separation processes differ only in which regions are prioritized according to the document mode, and the extraction order of the regions follows the priority of the regions. For example, in the first mode, the priority of the areas is, from highest to lowest, the photographic area, the graphic area, and the character area, and the areas are extracted in that same order.
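The relationship between the six document modes and the extraction order can be written as a small lookup table. The following Python sketch is illustrative only; the names `DOCUMENT_MODES` and `extraction_order` are hypothetical, and it assumes that the area not chosen as first or second priority is simply extracted last.

```python
PHOTO, GRAPHIC, CHARACTER = "photo", "graphic", "character"

# mode number -> (first priority area, second priority area), per FIGS. 7 to 12
DOCUMENT_MODES = {
    1: (PHOTO, GRAPHIC),
    2: (PHOTO, CHARACTER),
    3: (GRAPHIC, PHOTO),
    4: (GRAPHIC, CHARACTER),
    5: (CHARACTER, PHOTO),
    6: (CHARACTER, GRAPHIC),
}

def extraction_order(mode):
    """Return the three area types in the order they are extracted."""
    first, second = DOCUMENT_MODES[mode]
    # The remaining area type is extracted last (assumption stated above).
    third = ({PHOTO, GRAPHIC, CHARACTER} - {first, second}).pop()
    return [first, second, third]
```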
[0047]
Hereinafter, the contents of each processing block in the region separation processing will be described in detail. Here, as an example, the content of each processing block will be specifically described in the same order as the processing order of the first mode in FIG.
[0048]
(Division by binarization)
First, with reference to FIG. 13, the procedure of area division by binarization will be described.
[0049]
For example, based on the received image data shown in FIG. 5, an image composed of brightness values, that is, a brightness image, is created (step S301). Next, the background is removed from the brightness image (step S302), and a smoothing process using a smoothing filter is performed (step S303). The brightness image from which the background and noise have been removed is binarized at the background brightness level (hereinafter referred to as "background level") (step S304). As a result, as shown in FIG. 14, a binary image in which the area other than the background is blacked out is obtained. Edge detection is performed on the binary image by using, for example, a Laplacian filter, which is a second-order differential filter (step S305). Subsequently, a closing (dilation followed by erosion) process, which is a type of morphological processing, is performed to interpolate the edges, and an image composed of the edges, that is, an edge image, is obtained (step S306; see FIG. 15). The edge image in FIG. 15 corresponds to the contour line of this binary image.
[0050]
By detecting the contour line of the binary image obtained by binarization at the background level in this way, region division is performed such that, when there are a plurality of regions in contact with each other, these regions are combined into one. In the first to fourth modes, four large areas and character areas as shown in FIGS. 14 and 15 are obtained. However, in the fifth and sixth modes, in which the character area is prioritized, the character area is extracted before the area division by binarization, so only the four large areas are obtained.
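As a rough illustration of steps S301, S304, and S305, the following Python sketch builds a brightness image, binarizes it at a background level, and marks contour pixels with a 4-neighbour Laplacian. It is a sketch under stated assumptions, not the embodiment's implementation: brightness is approximated by the mean of R, G, and B, the background is assumed to be brighter than the content, and background removal, smoothing, and closing (steps S302, S303, S306) are omitted. All function names are hypothetical.

```python
import numpy as np

def brightness_image(rgb):
    # Assumption: brightness approximated as the mean of the R, G, B values.
    return rgb.astype(np.float64).mean(axis=2)

def binarize_at_background(brightness, background_level):
    # True ("black") where the pixel is darker than the background level.
    return brightness < background_level

def laplacian_edges(binary):
    # 4-neighbour Laplacian on the binary image; non-zero response = contour.
    b = np.pad(binary.astype(np.int32), 1, mode="edge")
    lap = (b[:-2, 1:-1] + b[2:, 1:-1] + b[1:-1, :-2] + b[1:-1, 2:]
           - 4 * b[1:-1, 1:-1])
    return lap != 0
```

Because touching regions form a single black blob after binarization at the background level, the contour found this way encloses them as one area, which matches the coarse division described above.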
[0051]
(First photo / graphic region extraction)
Next, the procedure for extracting the first photograph / graphic area will be described with reference to FIGS.
[0052]
The processing relating to the extraction of the first photograph/graphic region is executed for each region partitioned by the contour line of the binary image obtained by binarization at the background level (each region surrounded by a closed curve in FIG. 15). First, based on the positions of the edges in the edge image of FIG. 15, the positions of the line segments between edges in a first direction, for example, the main scanning direction, are detected (step S401). Then, using the pixels of the brightness image corresponding to the position of the line segment between the edges of interest, a histogram (first histogram) such as that shown in FIG. 18 is created (step S402). Further, by performing a smoothing process using the average value filter (| 1 | 0 | 1 |) on the first histogram, a second histogram such as that shown in FIG. 19 is created (step S403). Subsequently, according to the following formula, the difference between the first histogram Hist1 and the second histogram Hist2 is calculated for each gradation, and the sum is used as the feature amount (step S404). The lightness is represented by 8-bit data and has 256 gradations.
[0053]
[Expression 1]
$$\text{feature amount} = \sum_{i=0}^{255} \left| \mathrm{Hist1}[i] - \mathrm{Hist2}[i] \right|$$
[0054]
Next, a ratio R1 (= N1 / T1) between the total number N1 of pixels located in the line segment between the edges of interest and a predetermined constant T1 is calculated (step S405). The constant T1 is a first parameter for separating a photographic area and a graphic area. Subsequently, the ratio R1, used as a threshold value, is compared with the feature amount (step S406). When it is determined that the feature amount is larger than the ratio R1 (step S406: NO), all of the pixels located in the line segment between the edges in the first direction are regarded as belonging to the graphic region, a labeling process that assigns a number is executed, and labeling data is generated (step S407). That is, the result of area determination is labeled for each pixel. Specifically, the result of area determination is stored in correspondence with the pixel position. On the other hand, when it is determined that the feature amount is equal to or less than the ratio R1 (step S406: YES), all of the pixels located in the line segment between the edges in the first direction are regarded as belonging to the photographic region, and labeling data is generated (step S408). Subsequently, it is determined whether or not the line segment between the edges of interest is the final line segment between edges in the first direction (step S409). If it is determined that the line segment between the edges of interest is not the final line segment (step S409: NO), the process returns to step S402 and the above process is repeated.
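Steps S402 to S408 for a single line segment between edges can be sketched in Python as follows. This is a minimal illustration with several assumptions: the (| 1 | 0 | 1 |) filter is taken to average the two neighbouring bins (normalization by 2), histogram ends are handled by edge replication, the per-gradation difference is taken as an absolute value, and the value of T1 is not given in the text, so any value passed as `t1` is hypothetical.

```python
import numpy as np

def histogram_feature(pixels):
    """Sum over 256 gradations of |Hist1 - Hist2| for one line segment."""
    hist1 = np.bincount(np.asarray(pixels, dtype=np.int64),
                        minlength=256).astype(np.float64)
    padded = np.pad(hist1, 1, mode="edge")
    # (1, 0, 1) average filter: mean of the left and right neighbour bins.
    hist2 = (padded[:-2] + padded[2:]) / 2.0
    return float(np.abs(hist1 - hist2).sum())

def classify_segment(pixels, t1):
    """Return "graphic" if the feature exceeds R1 = N1 / T1, else "photo"."""
    r1 = len(pixels) / t1
    return "graphic" if histogram_feature(pixels) > r1 else "photo"
```

A segment of uniform brightness produces a single sharp histogram peak that smoothing flattens, giving a large feature amount (graphic), whereas a smooth gradation gives a histogram that is almost unchanged by smoothing (photo).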
[0055]
Next, based on the edge image of FIG. 15, the positions of the line segments between edges in a second direction orthogonal to the first direction, for example, the sub-scanning direction, are detected (step S410). Based on the labeling data created in steps S407 and S408, the ratio R2 (= N3 / N2) of the number N3 of pixels belonging to the photo area to the total number N2 of pixels located in the line segment between the edges of interest is calculated (step S411). Subsequently, the ratio R2 is compared with a predetermined constant T2 that serves as a threshold value (step S412). The constant T2 is a second parameter for separating the photographic area and the graphic area. When it is determined that the ratio R2 is smaller than the constant T2 (step S412: NO), all the pixels located in the line segment between the edges of interest in the second direction are regarded as belonging to the graphic area and are relabeled (step S413). On the other hand, when it is determined that the ratio R2 is equal to or greater than the constant T2 (step S412: YES), all the pixels located in the line segment between the edges of interest in the second direction are regarded as belonging to the photo area and are relabeled (step S414). Subsequently, it is determined whether or not the line segment between the edges of interest is the final line segment between edges in the second direction (step S415). When it is determined that the line segment between the edges of interest is not the final line segment (step S415: NO), the process returns to step S411 and the above process is repeated.
[0056]
Next, within one area obtained by area division by binarization, the number N5 of pixels labeled as belonging to the photographic area and the number N4 of pixels labeled as belonging to the graphic area are compared (step S416). When it is determined that N5 < N4 (step S416: NO), the area is determined to be a graphic area (step S417). On the other hand, when it is determined that N5 ≧ N4 (step S416: YES), the area is determined to be a photograph area (step S418). Then, only a determination as the higher priority region, whether the photo region or the graphic region, is confirmed; all the pixels in such a region are regarded as belonging to the higher priority region and are relabeled, and the region is extracted from the image data (step S419). Here, the circumscribed rectangle of the region is calculated by tracking the outline of the region based on the position data, and a rectangular region including the pixels labeled as belonging to the region, at the corresponding coordinate positions within the circumscribed rectangle, is extracted from the image data.
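The second-direction correction (steps S411 to S414) and the per-region decision (steps S416 to S418) amount to two simple votes. A minimal Python sketch follows; the function names and the string labels are hypothetical, and the value of T2 is not given in the text.

```python
def relabel_segment(labels, t2):
    """Relabel every pixel of one second-direction segment.

    labels holds the first-direction result ("photo" or "graphic") for
    each pixel on the segment. R2 = N3 / N2 is compared with T2.
    """
    n2 = len(labels)
    n3 = sum(1 for label in labels if label == "photo")
    r2 = n3 / n2
    return ["photo" if r2 >= t2 else "graphic"] * n2

def decide_region(labels):
    """Decide the type of a whole region from its pixel labels (N5 vs N4)."""
    n5 = sum(1 for label in labels if label == "photo")
    n4 = sum(1 for label in labels if label == "graphic")
    return "photo" if n5 >= n4 else "graphic"
```

Voting along two orthogonal directions and then over the whole region smooths out isolated misjudgements on individual line segments.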
[0057]
As described above, in the first photo/graphic region extraction, for each region obtained by region division by binarization, the image feature amount between the edges that form the boundary of the region is calculated to determine whether the region is a photographic area or a graphic area. If the priority of the photo area is higher than that of the graphic area (first, second, and fifth modes), the determination as a photo area is confirmed and the area is extracted in the form of a rectangular area. On the other hand, when the priority of the graphic area is higher than that of the photo area (third, fourth, and sixth modes), the determination as a graphic area is confirmed and the area is extracted in the form of a rectangular area.
[0058]
In this embodiment, as described above, whether a region is a photographic area or a graphic area is determined based on the feature that the brightness distribution of a graphic area is uniform to some extent, whereas the brightness distribution of a photo area is dispersed. However, the region determination method is not limited to this. For example, the region may be determined using a frequency component extracted from the brightness image as a feature amount.
[0059]
(Area division by edge)
Next, with reference to FIG. 20, the procedure of area division by edges will be described.
[0060]
The area division based on the edge is performed on the data remaining after the higher priority area of the photo area or the graphic area is extracted by the first photo / graphic area extraction.
[0061]
First, edge detection is performed on each of the R component image, the G component image, and the B component image in the image data by using an edge detection filter such as a Laplacian filter (steps S501 to S503). Subsequently, an OR process is performed to obtain the union of the edges detected in the R component image, the G component image, and the B component image (step S504), and a closing process is performed in order to eliminate breaks in the edges (step S505). Here, in the first to fourth modes, since the character area has not yet been extracted, the edges of the character images are also detected. However, for example, when the size (vertical and horizontal dimensions) of the circumscribed rectangle of a region is smaller than a predetermined threshold and the ratio of the number of effective pixels inside the region to the size of the circumscribed rectangle is smaller than a predetermined threshold, the region is excluded from the targets of area division by edge. As a result, the edges of character images are not regarded as region boundaries.
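Steps S501 to S504 can be sketched in Python as below. The sketch assumes a 4-neighbour Laplacian kernel and a hypothetical response threshold, neither of which is specified in the text, and it omits the closing process of step S505 and the exclusion of small character-like regions.

```python
import numpy as np

def laplacian(channel):
    # 4-neighbour Laplacian with edge replication at the image border.
    c = np.pad(channel.astype(np.int32), 1, mode="edge")
    return (c[:-2, 1:-1] + c[2:, 1:-1] + c[1:-1, :-2] + c[1:-1, 2:]
            - 4 * c[1:-1, 1:-1])

def rgb_edge_union(rgb, threshold):
    """OR together the edges detected separately in the R, G and B planes."""
    edges = [np.abs(laplacian(rgb[:, :, k])) > threshold for k in range(3)]
    return edges[0] | edges[1] | edges[2]
```

Detecting edges per colour component and taking their union catches boundaries between regions that differ in only one component, which a single brightness image could miss.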
[0062]
FIG. 21 is a diagram schematically illustrating an example of the regions obtained by region division using edges, where (A) shows the case of the first, second, or fifth mode, and (B) shows the case of the third, fourth, or sixth mode. Here, in FIG. 21A, the photo area has already been extracted before the graphic area by the first photo/graphic area extraction. That is, in that case the region P1 (including the region G3) and the region P2 shown in FIG. 21B have already been extracted as photographic regions. In FIG. 21B, the graphic area has already been extracted before the photo area by the first photo/graphic area extraction. That is, in that case the region G1 and the region G2 (including the regions P3 and P4) shown in FIG. 21A have already been extracted as graphic regions.
[0063]
As described above, region division by edge is performed so that higher priority regions can be extracted further. Even after the higher priority region (photographic or graphic) has been extracted by the first photo/graphic area extraction, a higher priority region may remain in the leftover data, either overlapping a lower priority region or contained within a lower priority region. That is, by detecting edges, a finer region division is performed.
[0064]
(Second photo / graphic region extraction)
Next, the procedure for extracting the second photograph / graphic area will be described. In the second photo / graphic region extraction, the same process as the first photo / graphic region extraction described above is performed again on the region obtained by the region division by the edge shown in FIG. As a result, in the first, second, and fifth modes, a photo area in the graphic area that is not extracted as the photo area by the first photo / graphic area extraction is extracted. For example, in FIG. 21A, regions P3 and P4 are additionally extracted as photo regions. In the third, fourth, and sixth modes, graphic areas in the photo area that were not extracted as graphic areas by the first photo / graphic area extraction are extracted. For example, in FIG. 21B, a region G3 is additionally extracted as a graphic region.
[0065]
In the photo/graphic region extraction that is performed after the second photo/graphic region extraction in FIGS. 7 to 12, the regions that remain without having been extracted by the second photo/graphic region extraction are extracted from among the regions obtained by the region division by edge.
[0066]
(Character area extraction)
Next, a procedure for extracting a character area will be described with reference to FIG.
[0067]
Here, for the sake of simplicity, a case will be described in which processing relating to character area extraction is performed on the image data shown in FIG. 23, for example.
[0068]
First, region integration processing is performed on the image data (step S601). This processing is intended to make it possible to extract character images even when, for example, a character image lies on a background image or character images have different pixel values. Specifically, a smoothing process is first performed on the lightness image of the image data, and an edge image is then created by binarization using a variable threshold. The binarization using a variable threshold is, for example, as shown in FIG. 24, a process that binarizes the pixel of interest using as the threshold a value obtained by subtracting an offset value from the maximum of the gradation values of the pixels located at the four corners of a 5 × 5 block. Subsequently, the intervals between black pixels in the main scanning direction of the obtained edge image are measured, and all white pixels lying between black pixels separated by no more than a predetermined interval are replaced with black pixels, creating a connected edge image in which black pixels are connected in the main scanning direction. Further, the same processing is repeated in the sub-scanning direction of the obtained connected edge image, and a connected edge image in which black pixels are connected in both the main scanning and sub-scanning directions is obtained. In this way, the image processing apparatus 1 connects neighboring black pixels and integrates individual character images that are isolated in the image data into one region, so that a character string can be extracted as a single region of a certain size.
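The variable-threshold binarization and the black pixel connection of step S601 can be sketched in Python as follows. This is an illustration under assumptions: a pixel is treated as black when it is darker than the threshold, the 5 × 5 block is centred on the pixel of interest with edge replication at the border, and the offset and the maximum gap are hypothetical parameters. The smoothing that precedes binarization is omitted.

```python
import numpy as np

def variable_threshold_binarize(light, offset):
    """Black where pixel < max(four corners of its 5x5 block) - offset."""
    h, w = light.shape
    p = np.pad(light.astype(np.int32), 2, mode="edge")
    corners = np.stack([p[0:h, 0:w], p[0:h, 4:4 + w],
                        p[4:4 + h, 0:w], p[4:4 + h, 4:4 + w]])
    return light.astype(np.int32) < corners.max(axis=0) - offset

def connect_runs(row, max_gap):
    """Fill white gaps of at most max_gap pixels between black pixels."""
    out = row.copy()
    black = np.flatnonzero(row)
    for a, b in zip(black[:-1], black[1:]):
        if b - a - 1 <= max_gap:
            out[a:b] = True
    return out

def connect_black_pixels(binary, max_gap):
    """Connect along the main scanning direction, then the sub-scanning one."""
    horizontal = np.array([connect_runs(r, max_gap) for r in binary])
    return np.array([connect_runs(c, max_gap) for c in horizontal.T]).T
```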
[0069]
Next, a region extraction process is performed (step S602). This process separately extracts each group of connected black pixels as one area. Specifically, the obtained connected edge image is first labeled for each group of connected black pixels. Simultaneously with the labeling, the position information (width, height, and coordinates) of the circumscribed rectangle of each group of connected black pixels with the same label is detected, and a labeling image is created. Subsequently, based on the circumscribed rectangle and the label number detected at the time of labeling, the region surrounded by the circumscribed rectangle is extracted from the labeling image as a local region. Here, by extracting, within each circumscribed rectangle, only the pixels having the same label number, images can be separated and extracted even in a layout in which circumscribed rectangles overlap each other. FIG. 25 is a diagram showing the connected edge image obtained by the variable-threshold binarization and the black pixel connection process, and the circumscribed rectangles obtained, in the labeling image created from the connected edge image data, for each group of connected black pixels with the same label.
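The labeling of step S602, together with the detection of each circumscribed rectangle, can be sketched in plain Python with a breadth-first search. The sketch assumes 4-connectivity, which the text does not specify, and the function name `label_components` is hypothetical.

```python
from collections import deque

def label_components(binary):
    """Label connected black pixels and record each circumscribed rectangle.

    Returns (labels, boxes) where boxes maps a label number to
    (x, y, width, height) of the circumscribed rectangle.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    boxes = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or labels[y][x]:
                continue
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            y0 = y1 = y
            x0 = x1 = x
            while queue:
                cy, cx = queue.popleft()
                y0, y1 = min(y0, cy), max(y1, cy)
                x0, x1 = min(x0, cx), max(x1, cx)
                # Visit the 4-connected neighbours (assumed connectivity).
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            boxes[next_label] = (x0, y0, x1 - x0 + 1, y1 - y0 + 1)
    return labels, boxes
```

Keeping the label image alongside the rectangles is what allows only the pixels with a given label number to be taken from a rectangle that overlaps another one.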
[0070]
Next, the oblique edge component of the image belonging to each local area extracted in step S602 is extracted as a feature amount (step S603), and a local area in which the content rate of the oblique edge component is within a predetermined range is determined to be a character area (step S604). A character area contains a large amount of oblique edge components within a small area compared to other areas such as graphics, photographs, and ruled lines. Therefore, whether or not a local region is a character region can be determined by extracting the oblique edge component as a frequency component peculiar to character regions and obtaining its content rate in the local region. The extraction of the oblique edge component is equivalent to extracting the high frequency component from the frequency components obtained by a 2 × 2 DCT (discrete cosine transform). That is, by performing a DCT using a 2 × 2 matrix on the image in the local region and performing an inverse DCT with the high frequency component of the obtained frequency components set to "0", a restored image from which the high frequency component has been removed is obtained. Then, by taking the difference between the original image and the restored image, only the high frequency component of the original image can be extracted. Here, high-speed processing is possible by performing the filter processing shown in FIG. FIG. 27 is a diagram illustrating an example of an oblique edge component image obtained by binarizing the extracted high frequency component. A local region is generally in units of words. Therefore, when the local region is a character region, the content rate of the oblique edge component in the local region, that is, the ratio of the total number of black pixels in FIG. 27 belonging to the local region to the area of the local region, falls within a predetermined range (approximately 0.2 to 20%). Therefore, a local region whose ratio is within the above range is determined to be a character region.
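For a 2 × 2 block with pixels a, b (top row) and c, d (bottom row), the highest-frequency coefficient of an orthonormal 2 × 2 DCT is (a − b − c + d) / 2, and the difference between the block and its reconstruction without that coefficient has magnitude |a − b − c + d| / 4 at every pixel. The Python sketch below applies this as a sliding filter. The actual filter of the embodiment is shown only in a figure, so the sliding-window form, the binarization threshold, and the function names are assumptions; the 0.2% to 20% range is taken from the text.

```python
import numpy as np

def oblique_edge_component(light):
    """Magnitude of the 2x2 DCT high-frequency residual for each 2x2 window."""
    p = light.astype(np.float64)
    return np.abs(p[:-1, :-1] - p[:-1, 1:] - p[1:, :-1] + p[1:, 1:]) / 4.0

def is_character_region(light, edge_threshold, low=0.002, high=0.20):
    """True if the content rate of oblique edge pixels is within [low, high]."""
    oblique = oblique_edge_component(light) > edge_threshold
    rate = oblique.sum() / light.size
    return bool(low <= rate <= high)
```

Purely horizontal or vertical edges give a − b − c + d = 0, which is why ruled lines contribute nothing to this feature while the corners and slanted strokes of characters do.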
[0071]
Next, a character image creation process is performed (step S605). That is, by binarizing the original image data (the image data received from the scanner 2) in each local area determined to be a character area in step S604, the character portion and its background portion are distinguished, and a character image consisting only of the character portion is created. The threshold value used for binarization is set for each character area. As a method for setting the threshold value for each character area, for example, the following method can be used. First, for each character region, a brightness histogram as shown in FIG. 28A is created using the brightness image of the image data in the character region. Subsequently, the brightness histogram is converted into percentages of the number of pixels in the character area, and second-order differentiation is performed. If the result of the second-order differentiation is equal to or greater than a predetermined value, "1" is output; otherwise "0" is output. In this way, a peak detection histogram as shown in FIG. 28B is created and the peaks are detected. The threshold value is then determined as follows: when the number of detected peaks is two or more, the center value between the peaks at both ends; when the number of peaks is one, the average of that peak and the left and right rising values of the brightness histogram ("Left" and "Right" in FIG. 28A); and when the number of peaks is zero, the center value between the left and right rising values of the brightness histogram. Because a binarization threshold that depends on the number of peaks in the brightness histogram of the character area is used in this way, character images on a background image, reversed character images, and the like can be binarized without loss of the image.
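The rule for choosing the binarization threshold from the detected peaks can be summarized in a short Python sketch. The function name and argument names are hypothetical, and the one-peak case is read here as the mean of three values (the peak and the left and right rising values), which is an interpretation of the text rather than a confirmed detail. Peak detection itself is not shown.

```python
def binarization_threshold(peaks, left, right):
    """Choose a threshold from brightness-histogram peak positions.

    peaks: brightness values at which peaks were detected.
    left, right: left and right rising values of the brightness histogram.
    """
    if len(peaks) >= 2:
        # Center value between the peaks at both ends.
        return (min(peaks) + max(peaks)) / 2
    if len(peaks) == 1:
        # Assumed reading: mean of the peak and both rising values.
        return (peaks[0] + left + right) / 3
    # No peak: center value between the rising values.
    return (left + right) / 2
```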
[0072]
Next, an image interpolation process is performed (step S606). That is, the character image consisting only of the character portion is removed from the original image data, and the removed portion is interpolated with the background pixels of the character image. Here, the background pixel of the character image can be specified from the image obtained by binarization for each character region in step S605. The value of the background pixel used for the interpolation is given by calculating the average value for each RGB of the pixels corresponding to the background of the character image in the original RGB image data.
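As an illustration of step S606, the following sketch fills the character pixels of one character region with the per-channel average of that region's background pixels. It assumes the region is given as rows of (R, G, B) tuples and that `char_mask` marks the character pixels found by the binarization of step S605; both names are hypothetical, and integer averaging is a simplification.

```python
def interpolate_character_pixels(rgb, char_mask):
    """Replace character pixels with the mean RGB of the background pixels."""
    height, width = len(rgb), len(rgb[0])
    background = [rgb[y][x]
                  for y in range(height)
                  for x in range(width)
                  if not char_mask[y][x]]
    if not background:
        # No background pixel to average; return an unchanged copy.
        return [row[:] for row in rgb]
    mean = tuple(sum(pixel[c] for pixel in background) // len(background)
                 for c in range(3))
    return [[mean if char_mask[y][x] else rgb[y][x] for x in range(width)]
            for y in range(height)]
```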
[0073]
As described above, the image processing apparatus 1 integrates adjacent regions by connecting neighboring black pixels, extracts the integrated regions, calculates a feature amount representing character-likeness, and uses the calculated feature amount to determine whether or not each extracted region is a character region. It then creates a character image consisting only of the character portion from the image data in each region determined to be a character region, and interpolates the portion left after removing that character image with the background pixels.
[0074]
In this character region extraction, the character region can be reliably extracted even when the character image is superimposed on a photo image or a graphic image. However, if a document mode is set in which the photo area or the graphic area is prioritized over the character area, a character image superimposed on a photo image or graphic image will be extracted first as part of the photo area or graphic area.
[0075]
As described above, the photographic area, the graphic area, and the character area are separated from the image data received from the scanner in the area extraction order corresponding to the set document mode, as shown in FIGS. 7 to 12.
[0076]
According to this embodiment, when a photo area, a graphic area, and a character area are separated from image data, the extraction order of the areas can be set, which makes it possible to control which area is preferentially extracted. A high priority area is therefore extracted preferentially: if it contains another area, it is extracted together with that area, and if it is contained in another area, it is still extracted first. Because the high priority area is extracted before the other areas, it is not mistakenly extracted as another area, and image deterioration caused by inappropriate processing of the area can be prevented.
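The effect of the extraction order can be sketched as follows. The mode table follows the first to sixth modes of FIGS. 29 to 34. The `separate_regions` function, the representation of the image as a list of blocks, and the per-type `detectors` are hypothetical stand-ins for the actual separation processes, used here only to show that an ambiguous block goes to whichever area type is extracted first.

```python
EXTRACTION_ORDER = {
    1: ("photo", "graphic", "character"),
    2: ("photo", "character", "graphic"),
    3: ("graphic", "photo", "character"),
    4: ("graphic", "character", "photo"),
    5: ("character", "photo", "graphic"),
    6: ("character", "graphic", "photo"),
}


def separate_regions(blocks, mode, detectors):
    """Extract each area type in mode order from the data not yet assigned."""
    remaining = list(blocks)
    result = {}
    for kind in EXTRACTION_ORDER[mode]:
        matched, rest = [], []
        for block in remaining:
            (matched if detectors[kind](block) else rest).append(block)
        result[kind] = matched
        remaining = rest  # later area types only see what is left
    return result
```

With a block that both the photo detector and the character detector accept, the first mode assigns it to the photo area and the fifth mode assigns it to the character area.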
[0077]
FIGS. 29 to 34 are diagrams showing the photographic region, graphic region, and character region separated from the image data of FIG. 5 by the region separation processing in the first to sixth modes, respectively. In each figure, (A) indicates the area extracted first, (B) the area extracted second, and (C) the area extracted third.
[0078]
When the first priority area is a photograph area (first and second modes), as shown in FIGS. 29A and 30A, the photograph area is extracted first from the received image data, and the graphic area and the character area are then separated from the remaining data. The photograph area is therefore not affected by the separation processing of the other areas and is not extracted along with them. As a result, the photograph area can be reliably extracted without being erroneously determined as another area. Therefore, when the main purpose is to reproduce the photographic area with high image quality, more photographic areas can be reliably extracted and appropriate processing can be performed on them. For example, it is possible to prevent a part of the photographic area from being erroneously determined as a character area and subjected in a later process to binarization adapted to character areas. It is likewise possible to prevent a part of the photographic area from being erroneously determined as a graphic area, subjected in a later process to color reduction adapted to graphic areas, and painted over with a single color. That is, deterioration of the photographic image due to inappropriate processing of the photographic area is prevented. Moreover, extracting the photo area with priority has the advantage that the content of the original image data is maintained. That is, even if a graphic region or a character region is misidentified as a photographic region, it can still be reproduced as an image, so the content of the region is maintained.
[0079]
Furthermore, when the graphic area is extracted before the character area from the image data remaining after extraction of the photographic area (first mode), it is possible to prevent a part of the graphic area from being misidentified as a character area and subjected in a later process to processing adapted to character areas. Accordingly, deterioration of the photographic image and the graphic image is reduced. When the character area is extracted before the graphic area from the image data remaining after extraction of the photographic area (second mode), a character image within a graphic area can be extracted. Accordingly, deterioration of the photographic image and the character image is reduced.
[0080]
When the first priority area is a graphic area (third and fourth modes), as shown in FIGS. 31A and 32A, the graphic area is extracted first from the received image data, and the photo area and the character area are then separated from the remaining data. The graphic area is therefore not affected by the separation processing of the other areas and is not extracted along with them. As a result, the graphic area can be reliably extracted without being misidentified as another area. Therefore, when the main purpose is to apply processing adapted to graphic areas, such as vector conversion processing, more graphic areas can be reliably extracted and appropriate processing can be executed on them. For example, when a graphic area is arranged so as to overlap a photographic area, it is possible to prevent the whole from being erroneously determined as a photographic area and compressed by JPEG, which would generate noise. It is also possible to prevent an area including a graphic image that is easily mistaken for a character image from being misidentified as a character area and subjected to binarization processing and character recognition processing, which are adapted to character areas. That is, deterioration of the graphic image due to inappropriate processing of the graphic region can be prevented.
[0081]
Furthermore, when the photo area is extracted before the character area from the image data remaining after extraction of the graphic area (third mode), it is possible to prevent a part of the photo area from being erroneously determined as a character area and subjected in a later process to processing adapted to character areas. Accordingly, deterioration of the graphic image and the photographic image is reduced. When the character area is extracted before the photographic area from the image data remaining after extraction of the graphic area (fourth mode), a character image within a photographic area can be extracted. Therefore, irreversible compression processing can be performed on the photographic image without degrading the graphic image and the character image.
[0082]
When the first priority area is a character area (fifth and sixth modes), as shown in FIGS. 33A and 34A, the character area is extracted first from the received image data, and the photo area and the graphic area are then separated from the remaining data. The character area is therefore not affected by the separation processing of the other areas and is not extracted along with them. As a result, the character area can be reliably extracted without being misidentified as another area. Therefore, when the main purpose is to perform processing adapted to character regions, such as character recognition processing, more character regions can be reliably extracted and appropriate processing can be executed on them. For example, when a character image is superimposed on a photographic image or graphic image in the image data, it is possible to prevent the character region from going unidentified and the whole from being determined and extracted as a photographic region or graphic region, which would not only prevent character recognition processing from being adequately executed on the character image but also cause inappropriate processing to be applied to it.
[0083]
Further, when the photographic area is extracted before the graphic area from the image data remaining after extraction of the character area (fifth mode), a photographic area included in a graphic area, for example, can be extracted. Therefore, deterioration of the character image and the photographic image is reduced. When the graphic area is extracted before the photographic area from the image data remaining after extraction of the character area (sixth mode), a graphic area included in a photographic area, for example, can be extracted. Therefore, irreversible compression processing can be performed on the photographic image without degrading the character image and the graphic image.
[0084]
The present invention is not limited to the above-described embodiments, and various modifications can be made within the scope of the claims.
[0085]
In addition to the aspects described in the above embodiments, the image processing apparatus of the present invention can be applied to devices such as scanners, computers such as personal computers, workstations, and servers, digital copiers, facsimile apparatuses, and MFPs (multi-function peripherals).
[0086]
In the above-described embodiment, the file server 3 is configured to expand the character image and the graphic image from the file received from the image processing apparatus 1 and perform character recognition processing and vector conversion processing on them, respectively. However, these processes may instead be performed by the image processing apparatus 1. In addition, the contents of the individual processing blocks in each region separation process of FIGS. 7 to 12 can be changed as appropriate.
[0087]
In the above-described embodiment, the image processing apparatus 1 is configured so that the area to be preferentially extracted from among the photo area, the graphic area, and the character area can be set according to the content of the image data. However, the present invention is not limited to this. In the image processing apparatus according to the present invention, the area extracted first may be fixed in advance; for example, the photograph area may be fixed as the area extracted first. The extraction order of the areas may also be fixed in advance; for example, the order of photographic area, graphic area, and character area, or the order of photographic area, character area, and graphic area, may be fixed as the extraction order.
[0088]
The image processing apparatus and the image processing method according to the present invention can be realized either by a dedicated hardware circuit for executing each of the above procedures or by a CPU executing a predetermined program that describes each of the above procedures. In the latter case, the predetermined program for operating the image processing apparatus may be provided on a computer-readable recording medium such as a flexible disk or a CD-ROM, or may be provided online via a network such as the Internet. In this case, the program recorded on the computer-readable recording medium is usually transferred to and stored on a hard disk or the like. Further, this program may be provided, for example, as stand-alone application software, or may be incorporated into the software of the apparatus as one function of the image processing apparatus.
[0094]
[Effect of the invention]
As described above, according to the image processing apparatus of the present invention, after the photographic area is first extracted from the image data to be processed, the graphic area and the character area can be separated from the remaining data. For this reason, the photographic area is not extracted along with the other area due to the influence of the separation process of the other area. As a result, the photograph area can be reliably extracted without being erroneously determined as another area. Therefore, when the main purpose is to reproduce a photographic region with high image quality, it is possible to reliably extract more photographic regions and execute appropriate processing on the region.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of an image processing system including an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an example of a configuration of an image processing apparatus.
FIG. 3 is a diagram illustrating an example of a document mode setting screen in an operation unit.
FIG. 4 is a flowchart illustrating a processing procedure in the image processing apparatus.
FIG. 5 is a diagram schematically illustrating an example of image data received from a scanner.
FIG. 6 is a diagram for explaining composition of regions, in which (A) shows a state after composition and (B) shows a state before composition.
FIG. 7 is a flowchart showing a procedure of region separation processing in a first mode.
FIG. 8 is a flowchart showing a procedure of region separation processing in a second mode.
FIG. 9 is a flowchart showing a procedure of region separation processing in a third mode.
FIG. 10 is a flowchart illustrating a procedure of region separation processing in a fourth mode.
FIG. 11 is a flowchart illustrating a procedure of region separation processing in a fifth mode.
FIG. 12 is a flowchart illustrating a procedure of region separation processing in a sixth mode.
FIG. 13 is a flowchart showing a procedure of region division by binarization.
FIG. 14 is a diagram illustrating a binary image in which the areas other than the background in FIG. 5 are painted black.
FIG. 15 is a diagram showing an image composed of the edges of FIG. 14.
FIG. 16 is a flowchart showing a procedure for extracting a first photograph/graphic area.
FIG. 17 is a flowchart, continued from FIG. 16, showing the procedure for extracting the first photograph/graphic area.
FIG. 18 is a diagram showing a first histogram.
FIG. 19 is a diagram showing a second histogram.
FIG. 20 is a flowchart showing a procedure for dividing an area by edges.
FIG. 21 is a diagram schematically showing an example of regions obtained by region division by edges, where (A) shows the case of the first, second, or fifth mode, and (B) shows the case of the third, fourth, or sixth mode.
FIG. 22 is a flowchart showing a procedure for extracting a character area.
FIG. 23 is a diagram showing image data used for explaining character region extraction.
FIG. 24 is a diagram for explaining binarization processing based on a variation threshold.
FIG. 25 is a diagram showing a connected edge image obtained by performing binarization processing using a variation threshold and black pixel connection processing, and the circumscribed rectangles obtained for each group of connected black pixels having the same label in a labeling image obtained from the connected edge image data.
FIG. 26 is a diagram for explaining a filter process used when a high frequency component is removed from a characteristic frequency component of image data.
FIG. 27 is a diagram illustrating an example of an oblique direction edge component image obtained by binarizing an extracted high frequency component.
FIG. 28 is a diagram illustrating an example of (A) a brightness histogram created from a brightness image of image data in a character region, and (B) a peak detection histogram.
FIG. 29 is a diagram showing (A) the photograph area extracted first, (B) the graphic area extracted second, and (C) the character area extracted third from the image data of FIG. 5 in the region separation processing in the first mode.
FIG. 30 is a diagram showing (A) the photograph area extracted first, (B) the character area extracted second, and (C) the graphic area extracted third from the image data of FIG. 5 in the region separation processing in the second mode.
FIG. 31 is a diagram showing (A) the graphic area extracted first, (B) the photograph area extracted second, and (C) the character area extracted third from the image data of FIG. 5 in the region separation processing in the third mode.
FIG. 32 is a diagram showing (A) the graphic area extracted first, (B) the character area extracted second, and (C) the photograph area extracted third from the image data of FIG. 5 in the region separation processing in the fourth mode.
FIG. 33 is a diagram showing (A) the character area extracted first, (B) the photograph area extracted second, and (C) the graphic area extracted third from the image data of FIG. 5 in the region separation processing in the fifth mode.
FIG. 34 is a diagram showing (A) the character area extracted first, (B) the graphic area extracted second, and (C) the photograph area extracted third from the image data of FIG. 5 in the region separation processing in the sixth mode.
[Explanation of symbols]
1 ... Image processing apparatus,
101 ... Control unit,
102 ... Storage unit,
103 ... Operation unit,
104 ... Input interface unit,
105 ... Output interface unit,
106 ... Area separation unit,
108 ... Image processing unit,
108a ... Photo area processing unit,
108b ... Graphic area processing unit,
108c ... Character area processing unit,
109 ... Document file creation unit,
110 ... File format conversion unit,
111 ... Bus,
2 ... Scanner,
3 ... File server,
4 ... Computer network.

Claims

An image processing apparatus having area separating means for separating a photograph area, a graphic area, and a character area from image data to be processed, wherein
the area separating means
comprises photograph area first extracting means for identifying and extracting a photograph area from the image data before the graphic area and the character area, and identifies and separates the graphic area and the character area from the data that remains after the photograph area has been identified and extracted from the image data and in which no area has yet been identified, and
the photograph area first extracting means includes:
first photograph area extracting means for applying, to the image data, a first area dividing process capable of dividing the image data into a plurality of predetermined areas, and extracting, from among the divided areas, an area identified as a photograph area; and
second photograph area extracting means for applying, to the data that remains after the area identified as a photograph area by the first photograph area extracting means has been extracted from the image data, a second area dividing process capable of dividing the data into a plurality of areas smaller than the predetermined areas, and extracting, from among the divided areas, an area identified as a photograph area.

The image processing apparatus according to claim 1, wherein
the area separating means further comprises graphic area preceding extraction means for identifying and extracting a graphic area before the character area from the data that remains after the photograph area has been identified and extracted from the image data and in which no area has yet been identified.

The image processing apparatus according to claim 1, wherein
the area separating means further comprises character area preceding extraction means for identifying and extracting a character area before the graphic area from the data that remains after the photograph area has been identified and extracted from the image data and in which no area has yet been identified.

The image processing apparatus according to claim 1, wherein the first area dividing process is capable of dividing the image data into a plurality of predetermined areas by detecting edges of a binary image in which the photograph area, graphic area, or character area portions of the image data are distinguished from the remaining background portion, and the second area dividing process is capable of dividing the data that remains after the areas identified as photograph areas have been extracted from the image data into a plurality of areas smaller than the predetermined areas by detecting edges of that remaining data.

The image processing apparatus according to any one of claims 1 to 4, further comprising reading means for obtaining image data by reading an original, wherein the image data to be processed is obtained by the reading means.

An image processing method having a region separation step of separating a photograph region, a graphic region, and a character region from image data to be processed, wherein
the region separation step
comprises a photograph region first extraction step of identifying and extracting a photograph region from the image data before the graphic region and the character region, and is a step of identifying and separating the graphic region and the character region from the data that remains after the photograph region has been identified and extracted from the image data and in which no region has yet been identified, and
the photograph region first extraction step includes:
a first photograph region extraction step of applying, to the image data, a first region dividing process capable of dividing the image data into a plurality of predetermined regions, and extracting, from among the divided regions, a region identified as a photograph region; and
a second photograph region extraction step of applying, to the data that remains after the region identified as a photograph region in the first photograph region extraction step has been extracted from the image data, a second region dividing process capable of dividing the data into a plurality of regions smaller than the predetermined regions, and extracting, from among the divided regions, a region identified as a photograph region.

An image processing program for causing an image processing apparatus to execute a region separation procedure of separating a photograph region, a graphic region, and a character region from image data to be processed, wherein
the region separation procedure
comprises a photograph region first extraction procedure of identifying and extracting a photograph region from the image data before the graphic region and the character region, and is a procedure of identifying and separating the graphic region and the character region from the data that remains after the photograph region has been identified and extracted from the image data and in which no region has yet been identified, and
the photograph region first extraction procedure includes:
a first photograph region extraction procedure of applying, to the image data, a first region dividing process capable of dividing the image data into a plurality of predetermined regions, and extracting, from among the divided regions, a region identified as a photograph region; and
a second photograph region extraction procedure of applying, to the data that remains after the region identified as a photograph region in the first photograph region extraction procedure has been extracted from the image data, a second region dividing process capable of dividing the data into a plurality of regions smaller than the predetermined regions, and extracting, from among the divided regions, a region identified as a photograph region.

A computer-readable recording medium on which the image processing program according to claim 7 is recorded.