JP4261988B2

JP4261988B2 - Image processing apparatus and method

Info

Publication number: JP4261988B2
Application number: JP2003158105A
Authority: JP
Inventors: 朋紀工藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-06-03
Filing date: 2003-06-03
Publication date: 2009-05-13
Anticipated expiration: 2023-06-03
Also published as: JP2004363786A

Description

【０００１】
【発明の属する技術分野】
本願発明は、スキャナ等の入力装置で読み取られた画像と類似する画像データを、データベースから検索して出力する画像処理装置に関するものである。
【０００２】
【従来の技術】
近年、バインダー等で蓄積された紙文書や配付資料等をスキャナで読み取り、オリジナルの電子データを検索するような画像処理装置が提案されている。特許文献１はデータベース内の電子文書をラスター画像に展開してスキャン画像と比較して検索結果を絞り込み、類似度の最も高い文書と予め定められた基準値と比較して、基準値を超えていたら該文書を表示部に出力し、その後印刷や送信を行うものである。
【０００３】
【特許文献１】
特開２００１−２５６２５６
【０００４】
【発明が解決しようとする課題】
特許文献１では、オリジナル文書を検索して印刷したい場合に、類似度が十分大きくても一度検索結果を表示部に表示し、印刷や送信を選択する構成のため、余計な手間がかかっていた。
【０００５】
【課題を解決するための手段】
上記課題を解決するために、本発明の請求項１に記載の画像処理装置は、入力される文書画像に類似する画像を登録データから検索する画像処理装置において、文書画像を入力する入力手段と、前記入力手段によって入力された文書画像と登録データの類似度を算出する類似度算出手段と、前記類似度算出手段による算出の結果、前記文書画像との類似度が最も高い登録データの類似度と次に類似度が高い登録データの類似度との差が所定の値より大きいと判定された登録データのアドレスを自動的に通知する通知手段と、前記通知手段によってアドレスが通知された登録データを印刷するよう制御する印刷制御手段と、
を有することを特徴とする。
【０００６】
また、上記課題を解決するために、本発明の請求項４に記載の画像処理方法は、入力される文書画像に類似する画像を登録データから検索する画像処理方法において、文書画像を入力手段によって入力する入力ステップと、前記入力手段によって入力された文書画像と登録データの類似度を類似度算出手段が算出する類似度算出ステップと、前記類似度算出ステップによる算出の結果、前記文書画像との類似度が最も高い登録データの類似度と次に類似度が高い登録データの類似度との差が所定の値より大きいと判定された登録データのアドレスを自動的に通知手段が通知する通知ステップと、前記通知ステップでアドレスが通知された登録データを印刷するよう印刷制御手段が制御する印刷制御ステップと、を有することを特徴とする。
【００１０】
【発明の実施の形態】
本願発明の実施の形態について説明する。図１は本願発明にかかる画像処理装置の構成例を示すブロック図である。本実施例では、オフィス１０とオフィス２０とがインターネット１０４で接続された環境をあげる。オフィス１０内に構築されたＬＡＮ１０７には、ＭＦＰ１００、ＭＦＰ１００を制御するマネージメントＰＣ１０１、クライアントＰＣ（外部記憶手段）１０２文書管理サーバ１０６、そのデータベース１０５、およびプロキシサーバ１０３が接続されている。ＬＡＮ１０７及びオフィス２０内のＬＡＮ１０８はプロキシサーバ１３を介してインターネット１０４に接続される。ＭＦＰ１００は本発明において紙文書の画像読み取り部と読み取った画像信号に対する画像処理の一部を担当し、画像信号はＬＡＮ１０９を用いてマネージメントＰＣ１０１に入力する。マネージメントＰＣは通常のＰＣであり、内部に画像記憶手段、画像処理手段、表示手段、入力手段を有するが、その一部をＭＦＰ１００に一体化して構成されている。
【００１１】
図２はＭＦＰ１００の構成図である。図２においてオートドキュメントフィーダー（以降ＡＤＦと記す）を含む画像読み取り部１１０は束状の或いは１枚の原稿画像を図示しない光源で照射し、原稿反射像をレンズで固体撮像素子上に結像し、固体撮像素子からラスター状の画像読み取り信号をイメージ情報として得る。通常の複写機能はこの画像信号をデータ処理部１１５で記録信号へ画像処理し、複数毎複写の場合は記録部１１１に一旦一ページ分の記録データを記憶保持した後、記録部１１２に順次出力して紙上に画像を形成する。
【００１２】
一方クライアントＰＣ１０２から出力されるプリントデータはＬＡＮ１０７からネットワークＩＦ１１４を経てデータ処理部１１５で記録可能なラスターデータに変換した後、前記記録部で紙上に記録画像として形成される。
【００１３】
ＭＦＰ１００への操作者の指示はＭＦＰに装備されたキー操作部とマネージメントＰＣに入力されるキーボード及びマウスからなる入力部１１３から行われ、これら一連の動作はデータ処理部１１５内の図示しない制御部で制御される。
【００１４】
一方、操作入力の状態表示及び処理中の画像データの表示は表示部１１６で行われる。なお記憶部１１１はマネージメントＰＣからも制御され、これらＭＦＰとマネージメントＰＣとのデータの授受及び制御はネットワークＩＦ１１７および直結したＬＡＮ１０９を用いて行われる。
【００１５】
〔処理概要〕
次に本発明による画像処理の概要を、図５を用いて説明する。
【００１６】
原稿を入力する原稿入力処理（２００１）ではＭＦＰ１００の画像読み取り部１１０を動作させ１枚の原稿をラスター状に走査し画像信号を得る。次にあらかじめ処理設定で設定された処理を判定する判定処理（２００２）で図６のようなユーザインタフェースで設定された設定を判定する。原稿出力が設定されていた場合、２００１で入力した画像をそのまま、画像の印刷／編集／蓄積／伝達／記録に出力する（２００４）。また、原本を検索する原本出力が設定された場合、原本処理（２００３）を行い、画像の印刷／編集／蓄積／伝達／記録に出力する（２００４）。
【００１７】
〔原本処理概要〕
次に本発明による画像処理の原本処理概要を、図３を用いて説明する。
【００１８】
原稿入力処理で入力した画像信号をデータ処理部１１５で前処理を施し記憶部１１１に１ページ分の画像データとして保存する。マネージメントＰＣ１０１のＣＰＵは該格納された画像信号から先ず、文字／線画部分とハーフトーンの画像部分とに領域を分離し、文字部は更に段落で塊として纏まっているブロック毎に、或いは、線で構成された表、図形に分離し各々セグメント化する。一方ハーフトーンで表現される画像部分は、矩形に分離されたブロックの画像部分、背景部等、所謂ブロック毎に独立したオブジェクトに分割する（ステップ１２１）。
【００１９】
このとき原稿画像中に付加情報として記録された２次元バーコード、或いはＵＲＬに該当するオブジェクトを検出しＵＲＬはＯＣＲで文字認識し、或いは２次元バーコードなら該マークを解読して（ステップ１２２）該原稿のオリジナル電子ファイルが格納されている記憶部内のポインター情報を検出する（ステップ１２３）。なお、ポインター情報を付加する手段は他に文字と文字の間隔に情報を埋め込む方法、ハーフトーンの画像に埋め込む方法等直接可視化されない所謂電子透かしによる方法も有り、それに対応できる構成としてもよい。
【００２０】
ステップ１２４でポインター情報が検出された場合、ステップ１２５に分岐し、ポインターで示されたアドレスから元の電子ファイルを検索する。電子ファイルとはスキャンして登録された文書や、アプリケーションで作成された文書等であり、図１におけるクライアントＰＣ内のハードディスク内、或いはオフィス１０或いは２０のＬＡＮに接続された文書管理サーバ１０５内のデータベース１０５内、或いはＭＦＰ１００自体が有する記憶部１１１のいずれかに格納されている。ステップ１２５で電子ファイルが見つからなかった場合、見つかったがＰＤＦあるいはＴＩＦＦに代表される所謂イメージファイルであった場合、或いはステップ１２４でポインター情報自体が存在しなかった場合はステップ１２６に分岐する。
【００２１】
ステップ１２６ではデータベース上のオリジナル電子ファイルを検索するため、まず入力画像をベクトルデータへ変換する。先ず、ステップ１２２でＯＣＲされた文字ブロックに対しては、更に文字のサイズ、スタイル、字体を認識し、原稿を走査して得られた文字に可視的に忠実なフォントデータに変換する。一方線で構成される表、図形ブロックに対してはアウトライン化し、表など図形形状が認識できるものは、その形状を認識する。画像ブロックに対してはイメージデータとして個別の画像ファイルとして処理する。これらのベクトル化処理はオブジェクト毎に行う。データベース上のファイルベクトルデータへ変換されたイメージは、ステップ１２７でデータベース上の各ファイルと類似度を調べ、オリジナルを検索する。本実施例では、ステップ１２６により変換されたベクトルデータを用いて忠実にオリジナルファイルを検索する。オブジェクト毎に類似度を求め、オブジェクト毎の類似度をそのオブジェクトのファイル内占有率に応じてファイル全体の類似度へ反映させる。ファイル内で占めている割合の大きいオブジェクトの類似度が、ファイル全体の類似度へより大きく反映されるため、いかなるフォーマットのファイルにも適応的に対応することが可能である。
【００２２】
ステップ１２８で類似度と閾値を比較した結果、候補が１ファイルの場合はそのファイルの類似度を、候補が複数の場合は類似度の１番高いファイルの類似度を予め定められた閾値と比較し、閾値より高い場合は、自動的にステップ１３４に分岐し、格納アドレスを通知する。なお、この分岐判定は閾値との比較をするのではなく、１番高い類似度と２番目に高い類似度の差が予め定められたある設定値以上であれば、１３４に分岐する分岐条件としてもよいし、分岐を設定しないで無条件に類似度の１番高いファイルを選択してステップ１３４に進むよう構成することもできる。このようにスキャンしてから印刷などの出力を受けるまでの間にユーザの選択操作を挟まないことで、操作性を大幅に向上させることが可能となる。
【００２３】
ステップ１２８で類似度が閾値を超えているファイルがない場合、図７に示すようにサムネイル等を類似度順に表示（ステップ１２９）し、操作者の選択が必要なら操作者の入力操作よって複数のファイルの中からファイルの特定を行う。ステップ１３０ではステップ１２９で表示したファイル中にユーザ所望の電子ファイルがあり、それが選択された場合にステップ１３４に分岐して該ファイルの格納アドレスを通知し、選択されなかった場合は、ステップ１３１に分岐する。
【００２４】
ステップ１３１では入力されたデータを登録するために、ベクトル化処理を行う。ベクトル化処理はオブジェクト毎に行い、更に各オブジェクトのレイアウト情報を保存して例えば、ｒｔｆに変換（ステップ１３１）して電子ファイルとして記憶部１１１に格納（ステップ１３２）する。
【００２５】
今ベクトル化した原稿画像は以降同様の処理を行う際に直接電子ファイルとして検索出来るように、先ずステップ１３３において検索の為のインデックス情報を生成して検索用インデックスファイルに追加する。ステップ１３４では記憶部に格納した際の格納アドレスを操作者に通知する。
【００２６】
以上本発明によって得られた電子ファイル自体を用いて、例えば文書の印刷、伝送、加工、蓄積、記録をステップ１３５で行う事が可能になる。なお、上記実施例では操作者に格納アドレスを通知する構成としているが、通知せずに文書の印刷、伝送、加工、蓄積、記録をする構成としても構わない。
【００２７】
以下、各処理ブロックに対して詳細に説明する。
【００２８】
先ずステップ１２１で示すブロックセレクション処理について説明する。
【００２９】
〔ブロックセレクション処理〕
ブロックセレクション処理とは、図４に示すように、文書画像をオブジェクト毎の塊として認識し、該ブロック各々を文字／図画／写真／線／表等の属性に判定し、異なる属性を持つ領域に分割する処理である。
【００３０】
ブロックセレクション処理の実施例を以下に説明する。
【００３１】
先ず、入力画像を白黒に二値化し、輪郭線追跡をおこなって黒画素輪郭で囲まれる画素の塊を抽出する。面積の大きい黒画素の塊については、内部にある白画素に対しても輪郭線追跡をおこない白画素の塊を抽出、さらに一定面積以上の白画素の塊の内部からは再帰的に黒画素の塊を抽出する。
【００３２】
このようにして得られた黒画素の塊を、大きさおよび形状で分類し、異なる属性を持つ領域へ分類していく。たとえば、縦横比が１に近く、大きさが一定の範囲のものを文字相当の画素塊とし、さらに近接する文字が整列良くグループ化可能な部分を文字領域、扁平な画素塊を線領域、一定大きさ以上でかつ四角系の白画素塊を整列よく内包する黒画素塊の占める範囲を表領域、不定形の画素塊が散在している領域を写真領域、それ以外の任意形状の画素塊を図画領域、などとする。
【００３３】
ブロックセレクション処理で得られた各ブロックに対するブロック情報を図４に示す。
【００３４】
これらのブロック毎の情報は以降に説明するベクトル化、或いは検索の為の情報として用いる。
【００３５】
〔文字認識〕
文字認識部では、文字単位で切り出された画像に対し、パターンマッチの一手法を用いて認識を行い、対応する文字コードを得る。この認識処理は、文字画像から得られる特徴を数十次元の数値列に変換した観測特徴ベクトルと、あらかじめ字種毎に求められている辞書特徴ベクトルと比較し、最も距離の近い字種を認識結果とする処理である。特徴ベクトルの抽出には種々の公知手法があり、たとえば、文字をメッシュ状に分割し、各メッシュ内の文字線を方向別に線素としてカウントしたメッシュ数次元ベクトルを特徴とする方法がある。
【００３６】
ブロックセレクション（ステップ１２１）で抽出された文字領域に対して文字認識を行う場合は、まず該当領域に対し横書き、縦書きの判定をおこない、各々対応する方向に行を切り出し、その後文字を切り出して文字画像を得る。横書き、縦書きの判定は、該当領域内で画素値に対する水平／垂直の射影を取り、水平射影の分散が大きい場合は横書き領域、垂直射影の分散が大きい場合は縦書き領域と判断すればよい。文字列および文字への分解は、横書きならば水平方向の射影を利用して行を切り出し、さらに切り出された行に対する垂直方向の射影から、文字を切り出すことでおこなう。縦書きの文字領域に対しては、水平と垂直を逆にすればよい。なお、この時文字のサイズが検出出来る。
【００３７】
〔ファイル検索〕
次に、図３のステップ１２７で示すファイル検索処理の詳細について図１１乃至図１３を使用して説明を行う。
【００３８】
本実施例では、前述したブロックセレクション処理により分割しベクトル化された各ブロック情報を利用し検索を行う。具合的に検索は、各ブロックの属性とファイル中のブロック座標情報との比較、すなわちレイアウトによる比較と、ファイル内の各ブロックの属性により異なる比較方法が適用されるブロック毎の内部情報比較とを複合した複合検索を用いる。
【００３９】
図１１は、図３のステップ１２６でベクトル化されたスキャン画像データ（入力ファイル）の例であり、ブロックＢ’１〜Ｂ’９に分割されかつそれぞれがベクトル化処理されている。
【００４０】
図１２は、入力ファイルを既にベクトル化されデータベース上に保存されてある画像データ（データベースファイル）と順次比較し、類似度を算出するフローチャートである。まず、データベースよりデータベースファイルへアクセスする（ステップ５０１）。入力ファイルの各ブロックとデータベースファイルの各ブロックを比較し、入力ファイルのブロック毎にデータベースファイルのブロックとの類似度を求める（ステップ５０２）。
【００４１】
ここで、ブロック毎に類似度を算出する際、図１３に示すフローチャートに従い、まず入力ファイルの該ブロックとレイアウト上一致すると推定されるデータベースファイルの対象ブロックを選出する。この処理においては、入力ファイルの複数のブロックに対し、データベースファイルの対象ブロックが重複されて選出されてもよい。次に該ブロックと対象ブロックとのレイアウト情報の類似度を求める。ブロックの位置、サイズ、属性を比較し（ステップ５１２、５１３、５１４）、その誤差からレイアウトの類似度を求める。次にブロック内部の比較を行うが、ブロック内部を比較する際は同じ属性として比較するため、属性が異なる場合は片方のブロックを一致する属性へ再ベクトル化するなど前処理を行う。前処理により同じ属性として扱われる入力ファイルのブロックとデータベースファイルの対象ブロックは、ブロックの内部比較を行う（ステップ５１５）。
【００４２】
ブロック内部比較では、ブロックの属性に最適な比較手法をとるため、属性によりその比較手法は異なる。例えば、前述したブロックセレクション処理により、ブロックはテキスト、写真、表、線画などの属性に分割される。テキストブロックを比較する場合は、ベクトル化処理により文字コード、フォントが判別されているため、各文字の一致度からその文章の類似度を算出し、ブロック内部の類似度が算出される。写真画像ブロックでは、画像より抽出される特徴ベクトルを特徴空間上の誤差より類似度が算出される。ここでいう特徴ベクトルとは、色ヒストグラムや色モーメントのような色に関する特徴量、共起行列、コントラスト、エントロピ、Ｇａｂｏｒ変換等で表現されるテクスチャ特徴量、フーリエ記述子等の形状特徴量など複数挙げられ、このような複数の特徴量のうち最適な組み合わせを用いる。また、線画ブロックでは、線画ブロックはベクトル化処理によりアウトライン線、もしくは罫線、曲線の集合として表現されるため、各線の始点、終点の位置、曲率などの誤差を算出することにより線画の類似度が算出される。また、表ブロックでは、表の格子数、各枠子のサイズ、各格子内のテキスト類似度などを算出することにより、表ブロック全体の類似度が算出できる。
【００４３】
以上より、ブロック位置、サイズ、属性、ブロック内部の類似度を算出し、各類似度を合計することで入力ファイルの該ブロックに対しその類似度を算出することが可能であり、該ブロック類似度を記録する。入力ファイルのブロック全てについて、一連の処理を繰り返す。求められたブロック類似度は、全て統合することで、入力ファイルの類似度を求める（ステップ５０３）。統合処理について説明する。図１１の入力ファイルのブロックＢ１’〜Ｂ９’に対し、ブロック毎の類似度がｎ１〜ｎ９と算出されたとする。このときファイル全体の総合類似度Ｎは、以下の式で表現される。
Ｎ＝ｗ１＊ｎ１＋ｗ２＊ｎ２＋ｗ３＊ｎ３＋…．＋ｗ９＊ｎ９＋γ ・・・（１）
【００４４】
ここで、ｗ１〜ｗ９は、各ブロックの類似度を評価する重み係数である。γは補正項であり、例えば、データベースファイルの入力ファイルに対する対象ブロックとして選出されなかったブロックの評価値などとする。重み係数ｗ１〜ｗ９は、ブロックのファイル内占有率により求める。例えばブロック１〜９のサイズをＳ１〜Ｓ９とすると、ブロック１の占有率ｗ１は、
ｗ１＝Ｓ１／（Ｓ１＋Ｓ２＋…．＋Ｓ９）・・・（２）
として算出できる。このような占有率を用いた重み付け処理により、ファイル内で大きな領域を占めるブロックの類似度がよりファイル全体の類似度に反映されるようになる。
【００４５】
〔ファイル検索におけるテキスト検索の類似度算出〕
文書は登録される段階で、登録文書に含まれる単語を取得する。次に、文書内に出現する単語から基本ベクトル辞書を用いて算出される。図９は基本ベクトル辞書の構成を示したものである。基本ベクトル辞書は単語毎にベクトル表現時のそれぞれの次元（Ｄｉｍ．）に応対した特徴量が格納されている。次元はその単語本来の意味によって分類された基準や、その単語の使用分野に応じて分類された基準等が採用される。単語１のＤｉｍ．１の特徴量は０であり、Ｄｉｍ．２の特徴量は２３であることがわかる。このように辞書から一つの単語におけるそれぞれの次元（Ｄｉｍ．）の特徴量を得ることが可能となる。特徴量はその単語が使用されることにより、その文書がその分類基準（＝次元）をどれぐらい特徴付ける可能性があるかを示す値と解釈することが可能である。文書を構成するすべての単語から得られた分類基準別（次元別）の特徴量から、文書全体の特徴量を分類基準を次元とするベクトルで表現する。得られたベクトルをノルム＝１で正規化した値を文書ベクトルとして格納する。文書ベクトルを図１０のようなインデックスに格納する。文書ＩＤ＝６９４７の文書ベクトルのＤｉｍ．１の特徴量は０．１８３であり、Ｄｉｍ．２の特徴量は０．２１４であることがわかる。
【００４６】
〔アプリデータへの変換処理〕
ところで、一頁分のイメージデータをブロックセレクション処理（ステップ１２１）し、ベクトル化処理（ステップ１２６）した結果は図１４に示す様な中間データ形式のファイルとして変換されているが、このようなデータ形式はドキュメント・アナリシス・アウトプット・フォーマット（ＤＡＯＦ）と呼ばれる。
【００４７】
図１４はＤＡＯＦのデータ構造を示す図である。
【００４８】
図１４において、７９１はＨｅａｄｅｒであり、処理対象の文書画像データに関する情報が保持される。レイアウト記述データ部７９２では、文書画像データ中のＴＥＸＴ（文字）、ＴＩＴＬＥ（タイトル）、ＣＡＰＴＩＯＮ（キャプション）、ＬＩＮＥＡＲＴ（線画）、ＥＰＩＣＴＵＲＥ（自然画）、ＦＲＡＭＥ（枠）、ＴＡＢＬＥ（表）等の属性毎に認識された各ブロックの属性情報とその矩形アドレス情報を保持する。文字認識記述データ部７９３では、ＴＥＸＴ、ＴＩＴＬＥ、ＣＡＰＴＩＯＮ等のＴＥＸＴブロックを文字認識して得られる文字認識結果を保持する。表記述データ部７９４では、ＴＡＢＬＥブロックの構造の詳細を格納する。画像記述データ部７９５は、ＰＩＣＴＵＲＥやＬＩＮＥＡＲＴ等のブロックのイメージデータを文書画像データから切り出して保持する。
【００４９】
このようなＤＡＯＦは、中間データとしてのみならず、それ自体がファイル化されて保存される場合もあるが、このファイルの状態では、所謂一般の文書作成アプリケーションで個々のオブジェクトを再利用する事は出来ない。そこで次に、このＤＡＯＦからアプリデータに変換する処理（ステップ１３１）について詳説する。
【００５０】
図１５は、アプリデータ変換の概略フローである。
８０００は、ＤＡＯＦデータの入力を行う。
８００２は、アプリデータの元となる文書構造ツリー生成を行う。
８００４は、文書構造ツリーを元に、ＤＡＯＦ内の実データを流し込み、実際のアプリデータを生成する。
【００５１】
図１６は、８００２文書構造ツリー生成部の詳細フロー、図１７は、文書構造ツリーの説明図である。全体制御の基本ルールとして、処理の流れはミクロブロック（単一ブロック）からマクロブロック（ブロックの集合体）へ移行する。
【００５２】
以後ブロックとは、ミクロブロック、及びマクロブロック全体を指す。
【００５３】
８１００は、ブロック単位で縦方向の関連性を元に再グループ化する。スタート直後はミクロブロック単位での判定となる。
【００５４】
ここで、関連性とは、距離が近い、ブロック幅（横方向の場合は高さ）がほぼ同一であることなどで定義することができる。
【００５５】
また、距離、幅、高さなどの情報はＤＡＯＦを参照し、抽出する。
【００５６】
図１７（ａ）は実際のページ構成、（ｂ）はその文書構造ツリーである。８１００の結果、Ｔ３、Ｔ４、Ｔ５が一つのグループＶ１、Ｔ６、Ｔ７が一つのグループＶ２が同じ階層のグループとしてまず生成される。
【００５７】
８１０２は、縦方向のセパレータの有無をチェックする。セパレータは、例えば物理的にはＤＡＯＦ中でライン属性を持つオブジェクトである。また論理的な意味としては、アプリ中で明示的にブロックを分割する要素である。ここでセパレータを検出した場合は、同じ階層で再分割する。
【００５８】
８１０４は、分割がこれ以上存在し得ないか否かをグループ長を利用して判定する。
【００５９】
ここで、縦方向のグループ長がページ高さとなっている場合は、文書構造ツリー生成は終了する。
【００６０】
図１７の場合は、セパレータもなく、グループ高さはページ高さではないので、８１０６に進む。
【００６１】
８１０６は、ブロック単位で横方向の関連性を元に再グループ化する。ここもスタート直後の第一回目はミクロブロック単位で判定を行うことになる。
【００６２】
関連性、及びその判定情報の定義は、縦方向の場合と同じである。
【００６３】
図１７の場合は、Ｔ１，Ｔ２でＨ１、Ｖ１，Ｖ２でＨ２、がＶ１，Ｖ２の１つ上の同じ階層のグループとして生成される。
【００６４】
８１０８は、横方向セパレータの有無をチェックする。
【００６５】
図１７では、Ｓ１があるので、これをツリーに登録し、Ｈ１、Ｓ１、Ｈ２という階層が生成される。
【００６６】
８１１０は、分割がこれ以上存在し得ないか否かをグループ長を利用して判定する。
【００６７】
ここで、横方向のグループ長がページ幅となっている場合は、文書構造ツリー生成は終了する。
【００６８】
そうでない場合は、８１０２に戻り、再びもう一段上の階層で、縦方向の関連性チェックから繰り返す。
【００６９】
図１７の場合は、分割幅がページ幅になっているので、ここで終了し、最後にページ全体を表す最上位階層のＶ０が文書構造ツリーに付加される。
【００７０】
文書構造ツリーが完成した後、その情報を元に８００６においてアプリデータの生成を行う。
【００７１】
図１７の場合は、具体的には、以下のようになる。
【００７２】
すなわち、Ｈ１は横方向に２つのブロックＴ１とＴ２があるので、２カラムとし、Ｔ１の内部情報（ＤＡＯＦを参照、文字認識結果の文章、画像など）を出力後、カラムを変え、Ｔ２の内部情報出力、その後Ｓ１を出力となる。
【００７３】
Ｈ２は横方向に２つのブロックＶ１とＶ２があるので、２カラムとして出力、Ｖ１はＴ３、Ｔ４、Ｔ５の順にその内部情報を出力、その後カラムを変え、Ｖ２のＴ６、Ｔ７の内部情報を出力する。
【００７４】
以上によりアプリデータへの変換処理が行える。
【００７５】
〔ファイル検索における別実施例１〕
上記の実施例では、ファイル検索において、入力ファイルとデータベースファイルを比較する際、全ての入力ファイルの全てのブロックについて、レイアウト情報とブロックの内部情報の比較を行った。しかし、ブロック内部情報の比較を行わずともレイアウトの情報を比較した段階である程度ファイルを選別することが可能である。すなわち、入力ファイルとレイアウトが全く異なるデータベースファイルはブロック内部情報の比較処理を省くことが可能である。図１９にレイアウト情報によるファイル選別を実施した際のフローチャートである。まず、入力ファイルの全てのブロックに対し、位置、サイズ、属性の比較を行い、その類似度を求め、ファイル全体のレイアウト類似度を求める（ステップ５２２）。レイアウト類似度が閾値より低い場合は、ブロック内部情報比較は行わない（ステップ５２３）。閾値より高い場合、つまりレイアウトが似ている場合のみ、ブロック内部情報の比較（ステップ５２４）を行い、先に求めたレイアウト類似度とブロック内部の類似度より、ファイル全体の総合類似度が求まる（ステップ５２５）。ブロック毎の類似度からの総合類似度の求める手法は、図１２のステップ５０３と同様の処理であり、説明を省略する。該類似度が閾値以上のファイルに関しては候補として保存する。ブロック内部情報の類似度を求める処理は特に写真ブロックの一致を調べるときなど、一般的に重い処理となる。よって、レイアウトである程度ファイルを絞り込むことで、検索処理量の軽減、処理の高速化が行え、効率よく所望のファイルを検索できる。
【００７６】
〔ファイル検索における別実施例２〕
上記の実施例は全て、ファイル検索時、ユーザが何も指定せずに検索を施した場合の検索処理実施例である。しかし、ユーザに文書内の特徴となる部分（ブロックセレクションより求められるブロック）を指定させる、もしくは無駄なブロックを省く、または文書内の特徴を指定させることで、ファイル検索をより最適化することが可能になる。
【００７７】
図８は検索時、ユーザによる検索オプション指定のユーザインタフェース画面（１００１）の例である。入力ファイルはブロックセレクション処理により、複数のブロックに分割されており、入力画面にはファイル上のテキスト、写真、表、線画など各ブロックがサムネイルとなり表示される（１０１１〜１０１７）。ユーザは表示されたブロック中から、特徴となるブロックを選択する。このとき選択するブロックは複数であってもよい。例として、ブロック１０１４を選択したとする。ブロック１０１４が選択された状態で、ボタン「優先」（１００３）を押したとき、よりブロック１０１４を重視した検索処理を行うようにする。重視した検索とは、例えば、ブロック毎の類似度からファイル全体の類似度を求める演算式（１）の指定されたブロック１０１４の重み係数を大きくし、選択外のブロックの重み係数を小さくするようにするということで実現できる。複数回「優先」ボタン（１００４）を押せば、選択されたブロックの重み係数を大きくし、よりブロックを重視した検索が行える。また、除外ボタン（１００４）を押せば、選択されたブロック１０１４を省いた状態で検索処理を施す。ブロックが誤って認識された場合などには、無駄な検索処理を省略し、かつ誤った検索処理を防止できる。また、詳細設定（１００５）によりブロックの属性の変更を実現可能とし、ブロックセレクション（ステップ１２１）での誤って属性を認識した場合でもユーザが修正することで、正確な検索できる。また、詳細設定１００５では、ユーザにより、ブロックの検索優先する重みを細かく調節可能とする。このように、検索する際、ユーザが特徴となるブロックを指定、設定させることで、検索の最適化が行える。
【００７８】
一方、ファイルによっては、レイアウトが特殊な場合も考えられる。このようなファイルに関しては、図８のレイアウト優先ボタン（１００５）を選択することにより、レイアウトを重視したファイル検索を可能とする。この場合、レイアウトの類似度の結果をより重視するように、重み付けすることで実現する。また、テキスト優先ボタン（１００６）では、テキストブロックのみの検索を実行し、処理の軽減を図れる。
【００７９】
このように、ユーザに画像の特徴を選択させることで、ファイルの特徴を重視した検索が行える。また、ユーザという人為的手段を信頼する、すなわちユーザ指定により重みを変更する際に、それに伴い変更された重みが閾値以下になる選択外ブロックを検索処理しないなどの制限を加えれば、ユーザの簡単な操作で、無駄なブロックの検索処理を大幅に削減できることも可能である。
【００８０】
（他の実施例）
上記実施例では、図６に示すように原本出力、原稿出力から処理を選択して実行していたが、本発明はこれに限られるものではない。図２０に示すように、原本出力、原本登録、原稿出力（原本登録しない）、原稿出力（原本登録する）から処理を選択してもよい。原本登録が選択された場合は画像入力後、図３で示すステップ１３１から処理が始まり、画像の印刷は行わない。原稿出力（原本登録しない）が選択された場合は画像入力後、ステップ１３５にとび、画像の印刷が行われる。原稿出力（原本登録する）が選択された場合は画像入力後、ステップ１３１から処理が始まり、登録するとともに画像印刷が行われる。
【００８１】
また、上記実施例では、ステップ１２８で比較する閾値や設定値は予め定められたものとしていたが、これを設定する手段を備えても構わない。その場合例えば、図１８に示すようなインタフェースで設定するよう構成すればよい。
【００８２】
【発明の効果】
以上詳述したように本発明によれば、画像処理装置において、入力画像と登録データの類似度が大きい登録データを、ユーザの選択操作を介さずに印刷することにより、ユーザの操作性を大幅に向上させることが可能になる。
【図面の簡単な説明】
【図１】本発明の実施形態に係るシステムの構成を示すブロック図である。
【図２】本発明の実施形態に係るＭＦＰの構成を示すブロック図である。
【図３】本発明の実施形態に係る原本処理手順を示すフローチャートである。
【図４】本発明の実施形態に係るブロックセレクション処理の実施例である。
【図５】概略処理手順を示すフローチャートである。
【図６】ユーザインタフェース画面の例を示す図である。
【図７】一覧選択ユーザインタフェース画面の例を示す図である。
【図８】ユーザインタフェース画面の例を示す図である。
【図９】テキスト検索の基本ベクトル辞書の例である。
【図１０】テキストの文書ベクトルインデックスの例である。
【図１１】ブロック例を示す図である。
【図１２】ファイル検索処理の処理手順を示すフローチャートである。
【図１３】ファイル検索処理のブロック比較処理手順を示すフローチャートである。
【図１４】ＤＡＯＦ例を示す図である。
【図１５】アプリデータ変換処理手順を示すフローチャートである。
【図１６】文書構造ツリー生成処理手順を示すフローチャートである。
【図１７】文書構造ツリー説明図である。
【図１８】閾値設定ユーザインタフェース画面の例を示す図である。
【図１９】レイアウト情報によるファイル選別処理手順を示すフローチャートである。
【図２０】ユーザインタフェース画面の例を示す図である。
[0001]
[Technical Field of the Invention]
The present invention relates to an image processing apparatus that searches a database for image data similar to an image read by an input device such as a scanner, and outputs the retrieved data.
[0002]
[Prior art]
In recent years, image processing apparatuses have been proposed that read paper documents stored in binders, handouts, and the like with a scanner and search for the original electronic data. Patent Document 1 expands the electronic documents in a database into raster images, compares them with the scanned image to narrow down the search results, and compares the similarity of the most similar document with a predetermined reference value. If the reference value is exceeded, the document is output to a display unit, after which it is printed or transmitted.
[0003]
[Patent Document 1]
JP2001-256256
[0004]
[Problems to be solved by the invention]
In Patent Document 1, when a user wants to search for an original document and print it, the search result is first displayed on the display unit and the user must then select printing or transmission, even when the similarity is sufficiently high. This requires extra effort.
[0005]
[Means for Solving the Problems]
To solve the above problem, the image processing apparatus according to claim 1 of the present invention is an image processing apparatus that searches registered data for an image similar to an input document image, comprising: input means for inputting a document image; similarity calculation means for calculating the similarity between the document image input by the input means and registered data; notification means for automatically notifying the address of registered data for which, as a result of the calculation by the similarity calculation means, the difference between the similarity of the registered data most similar to the document image and the similarity of the registered data with the next highest similarity is determined to be greater than a predetermined value; and print control means for controlling printing of the registered data whose address has been notified by the notification means.
[0006]
Also to solve the above problem, the image processing method according to claim 4 of the present invention is an image processing method that searches registered data for an image similar to an input document image, comprising: an input step in which input means inputs a document image; a similarity calculation step in which similarity calculation means calculates the similarity between the document image input by the input means and registered data; a notification step in which notification means automatically notifies the address of registered data for which, as a result of the calculation in the similarity calculation step, the difference between the similarity of the registered data most similar to the document image and the similarity of the registered data with the next highest similarity is determined to be greater than a predetermined value; and a print control step in which print control means controls printing of the registered data whose address was notified in the notification step.
[0010]
[Embodiments of the Invention]
Embodiments of the present invention will be described. FIG. 1 is a block diagram showing a configuration example of an image processing apparatus according to the present invention. This embodiment assumes an environment in which an office 10 and an office 20 are connected via the Internet 104. An MFP 100, a management PC 101 that controls the MFP 100, a client PC (external storage means) 102, a document management server 106, its database 105, and a proxy server 103 are connected to a LAN 107 built in the office 10. The LAN 107 and a LAN 108 in the office 20 are connected to the Internet 104 via the proxy server 103. In the present invention, the MFP 100 is responsible for reading images of paper documents and for part of the image processing applied to the read image signal; the image signal is input to the management PC 101 via a LAN 109. The management PC is an ordinary PC that contains image storage means, image processing means, display means, and input means, but part of it is integrated into the MFP 100.
[0011]
FIG. 2 is a configuration diagram of the MFP 100. In FIG. 2, an image reading unit 110 including an auto document feeder (hereinafter ADF) illuminates a bundle of documents or a single document with a light source (not shown), forms the reflected document image on a solid-state image sensor through a lens, and obtains a raster image reading signal from the sensor as image information. In the normal copying function, the data processing unit 115 processes this image signal into a recording signal. When multiple copies are made, one page of recording data is first stored in the storage unit 111 and then output sequentially to the recording unit 112, which forms the image on paper.
[0012]
Print data output from the client PC 102, on the other hand, passes from the LAN 107 through the network IF 114, is converted by the data processing unit 115 into recordable raster data, and is then formed as a recorded image on paper by the recording unit.
[0013]
The operator gives instructions to the MFP 100 through a key operation unit provided on the MFP and through an input unit 113, consisting of a keyboard and a mouse, that feeds the management PC. This series of operations is controlled by a control unit (not shown) in the data processing unit 115.
[0014]
On the other hand, the status display of the operation input and the display of the image data being processed are performed on the display unit 116. The storage unit 111 is also controlled by the management PC, and data exchange and control between the MFP and the management PC are performed using the network IF 117 and the directly connected LAN 109.
[0015]
[Outline of processing]
Next, an outline of image processing according to the present invention will be described with reference to FIG. 5.
[0016]
In the document input process (2001), the image reading unit 110 of the MFP 100 is operated to raster-scan one document and obtain an image signal. Next, the determination process (2002) determines which process was selected in advance through a user interface such as the one shown in FIG. 6. If document output is set, the image input in 2001 is passed as it is to printing, editing, storage, transmission, or recording of the image (2004). If original output, which searches for the original, is set, the original process (2003) is performed and its result is passed to printing, editing, storage, transmission, or recording of the image (2004).
[0017]
[Outline of original processing]
Next, an outline of the original processing within the image processing according to the present invention will be described with reference to FIG. 3.
[0018]
The image signal input in the document input process is preprocessed by the data processing unit 115 and stored in the storage unit 111 as one page of image data. The CPU of the management PC 101 first separates the stored image signal into character/line-art portions and halftone image portions. Character portions are further separated into blocks, each a cluster forming a paragraph, or into tables and figures composed of lines, and each is segmented. Halftone image portions are divided into independent objects, one per block, such as rectangular image blocks and background portions (step 121).
[0019]
At this time, an object corresponding to a two-dimensional barcode or a URL recorded in the document image as additional information is detected. A URL is recognized by OCR and a two-dimensional barcode is decoded (step 122), and pointer information indicating where in storage the original electronic file of the document is kept is detected (step 123). Pointer information may also be added by other means, such as so-called digital watermarking that is not directly visible, for example by embedding information in the spacing between characters or in a halftone image, and the apparatus may be configured to support these methods as well.
[0020]
If pointer information is detected in step 124, the process branches to step 125 and the original electronic file is retrieved from the address indicated by the pointer. An electronic file is, for example, a document registered by scanning or a document created by an application. It is stored on the hard disk of the client PC in FIG. 1, in the database 105 of the document management server 106 connected to the LAN of office 10 or 20, or in the storage unit 111 of the MFP 100 itself. The process branches to step 126 if the electronic file is not found in step 125, if it is found but is a so-called image file such as PDF or TIFF, or if no pointer information exists in step 124.
[0021]
In step 126, to search for the original electronic file in the database, the input image is first converted into vector data. For character blocks that were processed by OCR in step 122, the size, style, and typeface of the characters are additionally recognized, and the characters are converted into font data that is visually faithful to the characters obtained by scanning the document. Table and figure blocks composed of lines are converted into outlines, and where a figure shape such as a table can be recognized, that shape is recognized. Image blocks are handled as individual image files containing image data. This vectorization is performed object by object. In step 127, the image converted into vector data is compared with each file in the database to determine similarity, and the original is searched for. In this embodiment, the vector data produced in step 126 is used to search for the original file faithfully. A similarity is obtained for each object and is reflected in the similarity of the whole file according to the share of the file that the object occupies. Because objects that occupy a larger share of the file contribute more to the file's overall similarity, the method can adapt to files of any format.
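The occupancy weighting described above corresponds to equations (1) and (2) of the description: the file similarity is the sum of per-block similarities weighted by each block's share of the total block size, plus a correction term. The following Python is a minimal sketch of that calculation only. The function names, the choice of area as the "size", and the sample values are assumptions made for illustration and are not taken from the patent.

```python
from typing import Sequence


def occupancy_weights(sizes: Sequence[float]) -> list[float]:
    """Weight of each object = its size divided by the total size (equation (2))."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total object size must be positive")
    return [s / total for s in sizes]


def file_similarity(similarities: Sequence[float],
                    sizes: Sequence[float],
                    correction: float = 0.0) -> float:
    """Occupancy-weighted sum of per-object similarities plus a correction term
    (equation (1)). How the correction term is computed is left open here."""
    if len(similarities) != len(sizes):
        raise ValueError("one size is required per similarity")
    weights = occupancy_weights(sizes)
    return sum(w * n for w, n in zip(weights, similarities)) + correction


# Hypothetical page: a large text block that matches well (0.9)
# and a small photograph that matches poorly (0.2).
score = file_similarity(similarities=[0.9, 0.2], sizes=[300.0, 100.0])
```

With these sample values the large block carries three quarters of the weight, so the poor match on the small photograph lowers the overall score only modestly.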
[0022]
In step 128 the similarity is compared with a threshold. When there is one candidate file, its similarity is compared with a predetermined threshold; when there are several candidates, the similarity of the most similar file is compared. If the similarity is higher than the threshold, the process automatically branches to step 134 and the storage address is notified. Instead of a threshold comparison, the branch condition may be that the difference between the highest and the second-highest similarity is equal to or greater than a predetermined set value, in which case the process branches to step 134. Alternatively, no branch may be set, and the file with the highest similarity is selected unconditionally before proceeding to step 134. Because no user selection is required between scanning and receiving output such as a print, operability is greatly improved.
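The three branch conditions for step 128 can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the function name `auto_select` and its parameters are hypothetical, and the handling of a single candidate under the gap rule (falling back to the operator) is an assumption, since the description does not cover that case.

```python
from typing import Optional, Sequence


def auto_select(similarities: Sequence[float],
                threshold: Optional[float] = None,
                min_gap: Optional[float] = None) -> Optional[int]:
    """Return the index of the file to output automatically (step 134),
    or None when the candidates should be shown to the operator (step 129)."""
    if not similarities:
        return None
    ranked = sorted(range(len(similarities)),
                    key=lambda i: similarities[i], reverse=True)
    best = ranked[0]
    if threshold is not None:
        # Rule 1: the best similarity must be higher than the threshold.
        return best if similarities[best] > threshold else None
    if min_gap is not None:
        # Rule 2: the best must lead the runner-up by at least min_gap.
        if len(ranked) < 2:
            return None  # assumption: no runner-up, so ask the operator
        gap = similarities[best] - similarities[ranked[1]]
        return best if gap >= min_gap else None
    # Rule 3: no branch condition, always take the most similar file.
    return best
```

For example, `auto_select([0.4, 0.95, 0.6], threshold=0.9)` picks the second file, while `auto_select([0.5, 0.9, 0.8], min_gap=0.2)` returns `None` because the two best candidates are too close to distinguish automatically.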
[0023]
If no file has a similarity exceeding the threshold in step 128, thumbnails or the like are displayed in order of similarity as shown in FIG. 7 (step 129), and, if a selection by the operator is required, a file is identified from among the multiple files through the operator's input. In step 130, if the electronic file the user wants is among the files displayed in step 129 and is selected, the process branches to step 134 and the storage address of that file is notified; if no file is selected, the process branches to step 131.
[0024]
In step 131, vectorization processing is performed to register the input data. The vectorization process is performed for each object, and further, layout information of each object is stored, converted into, for example, rtf (step 131), and stored as an electronic file in the storage unit 111 (step 132).
[0025]
First, in step 133, index information for search is generated and added to the search index file so that the vectorized document image can be directly searched as an electronic file when the same processing is performed thereafter. In step 134, the storage address when it is stored in the storage unit is notified to the operator.
[0026]
As described above, using the electronic file itself obtained by the present invention, for example, printing, transmission, processing, storage, and recording of a document can be performed in step 135. In the above embodiment, the storage address is notified to the operator. However, the document may be printed, transmitted, processed, stored, and recorded without notification.
[0027]
Hereinafter, each processing block will be described in detail.
[0028]
First, the block selection process shown in step 121 will be described.
[0029]
[Block selection processing]
As shown in FIG. 4, the block selection process recognizes a document image as clusters, one per object, determines the attribute of each block (character, drawing, photograph, line, table, and so on), and divides the image into regions with different attributes.
[0030]
An example of the block selection process will be described below.
[0031]
First, the input image is binarized into black and white, and contour tracing is performed to extract clusters of pixels enclosed by black-pixel contours. For black-pixel clusters with a large area, contour tracing is also applied to the white pixels inside them to extract white-pixel clusters, and black-pixel clusters are then extracted recursively from inside any white-pixel cluster whose area is at least a certain size.
[0032]
The black-pixel clusters obtained in this way are classified by size and shape into regions with different attributes. For example, a pixel cluster whose aspect ratio is close to 1 and whose size falls within a certain range is treated as a character-equivalent cluster, and a portion where adjacent characters can be grouped in good alignment is a character region. A flat pixel cluster is a line region. The range occupied by a black-pixel cluster of at least a certain size that contains rectangular white-pixel clusters in good alignment is a table region. A region in which irregularly shaped pixel clusters are scattered is a photograph region, and any other pixel cluster of arbitrary shape is a drawing region.
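The per-cluster rules above can be illustrated with a simplified Python sketch. It covers only the character, line, table, and drawing cases; photograph regions depend on how clusters are scattered over an area and are omitted. The `PixelCluster` type and every numeric threshold are hypothetical values chosen for illustration, because the description gives no concrete numbers.

```python
from dataclasses import dataclass


@dataclass
class PixelCluster:
    width: int                    # bounding-box width in pixels
    height: int                   # bounding-box height in pixels
    aligned_white_rects: int = 0  # well-aligned rectangular white clusters inside


def classify(cluster: PixelCluster,
             char_size: tuple[int, int] = (8, 64),  # assumed character size range
             flat_ratio: float = 10.0,              # assumed "flat" aspect ratio
             table_min_size: int = 100,             # assumed minimum table size
             table_min_cells: int = 4) -> str:      # assumed minimum cell count
    """Assign an attribute to one black-pixel cluster from its size and shape."""
    long_side = max(cluster.width, cluster.height)
    short_side = max(1, min(cluster.width, cluster.height))
    aspect = long_side / short_side
    if aspect <= 1.5 and char_size[0] <= long_side <= char_size[1]:
        return "character"  # near-square and within the character size range
    if aspect >= flat_ratio:
        return "line"       # flat cluster
    if long_side >= table_min_size and cluster.aligned_white_rects >= table_min_cells:
        return "table"      # large cluster enclosing aligned white rectangles
    return "drawing"        # any other shape
```

In practice the thresholds would depend on the scan resolution, and neighbouring character clusters would still need to be grouped into character regions in a later pass.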
[0033]
FIG. 4 shows block information for each block obtained by the block selection process.
[0034]
Information for each block is used as information for vectorization or search described below.
[0035]
[Character recognition]
The character recognition unit recognizes an image cut out character by character using a pattern matching method and obtains the corresponding character code. The recognition process compares an observed feature vector, obtained by converting features of the character image into a numerical sequence of several tens of dimensions, with dictionary feature vectors obtained in advance for each character type, and takes the character type at the closest distance as the recognition result. Various known methods exist for extracting feature vectors. One example divides the character into a mesh and uses as the feature a vector, with as many dimensions as there are mesh cells, obtained by counting the character strokes in each cell as line elements by direction.
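The matching step described above, choosing the dictionary entry closest to the observed feature vector, can be sketched in Python as below. The Euclidean distance is an assumption (the description only says "closest distance"), the function name `recognize` is hypothetical, and the four-dimensional toy vectors stand in for the several tens of dimensions a real dictionary would use.

```python
import math
from typing import Mapping, Sequence


def recognize(observed: Sequence[float],
              dictionary: Mapping[str, Sequence[float]]) -> str:
    """Return the character whose dictionary feature vector is nearest
    to the observed feature vector (Euclidean distance, by assumption)."""
    if not dictionary:
        raise ValueError("dictionary must contain at least one character")
    return min(dictionary, key=lambda ch: math.dist(observed, dictionary[ch]))


# Toy dictionary with made-up feature vectors.
toy_dictionary = {
    "A": [1.0, 0.0, 0.0, 1.0],
    "B": [0.0, 1.0, 1.0, 0.0],
}
result = recognize([0.9, 0.1, 0.0, 0.8], toy_dictionary)
```

Feature extraction itself (for example the mesh-based stroke counting mentioned above) is outside this sketch; only the nearest-neighbour lookup is shown.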
[0036]
When character recognition is performed on a character region extracted by block selection (step 121), the region is first judged to be horizontally or vertically written, lines are cut out in the corresponding direction, and characters are then cut out to obtain character images. To judge the writing direction, horizontal and vertical projections of the pixel values in the region are taken; the region is judged to be horizontally written when the variance of the horizontal projection is large, and vertically written when the variance of the vertical projection is large. For horizontal writing, the region is decomposed into character strings and characters by cutting out lines using the horizontal projection and then cutting out characters from the vertical projection of each line. For vertically written regions, horizontal and vertical are swapped. The character size can also be detected at this point.
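The writing-direction test can be sketched in Python using only the standard library. The sketch assumes a binarized region given as rows of 0/1 values (1 for a black pixel), takes the horizontal projection to be the per-row pixel sums and the vertical projection to be the per-column sums, and resolves ties as vertical; the function name and these conventions are assumptions for illustration.

```python
from statistics import pvariance
from typing import Sequence


def writing_direction(region: Sequence[Sequence[int]]) -> str:
    """Compare the variance of the row sums (horizontal projection) with the
    variance of the column sums (vertical projection) of a binary region."""
    row_profile = [sum(row) for row in region]        # one value per pixel row
    col_profile = [sum(col) for col in zip(*region)]  # one value per pixel column
    if pvariance(row_profile) > pvariance(col_profile):
        return "horizontal"
    return "vertical"  # assumption: ties are treated as vertical


# Two horizontal text lines separated by blank rows.
sample = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
direction = writing_direction(sample)
```

In the sample, the row sums alternate between full and empty rows while the column sums are constant, which is the pattern that signals horizontal writing.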
[0037]
[File Search]
Next, details of the file search process shown in step 127 of FIG. 3 will be described with reference to FIGS. 11 to 13.
[0038]
In this embodiment, the search uses the information of each block that was divided by the block selection process described above and then vectorized. Specifically, the search is a compound search that combines a comparison of each block's attribute and coordinate information within the file, that is, a comparison by layout, with a comparison of the internal information of each block, to which a different comparison method is applied depending on the block's attribute.
[0039]
FIG. 11 shows an example of the scanned image data (input file) vectorized at step 126 of FIG. 3, which is divided into blocks B′1 to B′9 and each vectorized.
[0040]
FIG. 12 is a flowchart for calculating the similarity by sequentially comparing the input file with image data (database file) already vectorized and stored on the database. First, the database file is accessed from the database (step 501). Each block of the input file is compared with each block of the database file, and a similarity with the block of the database file is obtained for each block of the input file (step 502).
[0041]
Here, when calculating the similarity for each block, the target block of the database file that is presumed to match the block of the input file in layout is first selected according to the flowchart shown in FIG. In this process, the same target block of the database file may be selected for more than one block of the input file. Next, the similarity of the layout information between the block and the target block is obtained: the block positions, sizes, and attributes are compared (steps 512, 513, and 514), and the layout similarity is obtained from the error. Next, the insides of the blocks are compared. Because this comparison is made between blocks of the same attribute, if the attributes differ, preprocessing such as re-vectorizing one of the blocks to the matching attribute is performed. The block of the input file and the target block of the database file, which the preprocessing allows to be treated as having the same attribute, are then subjected to block internal comparison (step 515).
[0042]
In the block internal comparison, the comparison method best suited to the block attribute is adopted, so the method differs depending on the attribute. For example, the block selection process described above classifies blocks into attributes such as text, photograph, table, and line drawing. When text blocks are compared, the character codes and fonts have been determined by the vectorization process, so the similarity of the text is calculated from the degree of match of each character, and this gives the similarity inside the block. For a photographic image block, the similarity is calculated from the error in feature space between feature vectors extracted from the images. The feature vector here includes color features such as a color histogram and color moments, texture features expressed by a co-occurrence matrix, contrast, entropy, Gabor transform, and the like, and shape features such as Fourier descriptors; the optimum combination of these features is used. For a line drawing block, the block is expressed by the vectorization process as a set of outlines, ruled lines, and curves, so the similarity of the line drawing can be calculated from errors such as the start point and end point positions and the curvature of each line. For a table block, the similarity of the entire block can be calculated from the number of cells in the table, the size of each frame, the text similarity in each cell, and the like.
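For the text-block case, a character-level match score can be sketched as below. The text does not specify how the degree of match is computed, so the use of Python's `difflib.SequenceMatcher` is a stand-in chosen for illustration, and `text_block_similarity` is a hypothetical name.

```python
from difflib import SequenceMatcher

def text_block_similarity(input_text, database_text):
    """Return a similarity in [0, 1] based on matching characters.

    SequenceMatcher.ratio() is used as a stand-in for "degree of match of
    each character"; the actual measure is not given in the source text.
    """
    if not input_text and not database_text:
        return 1.0  # two empty blocks are treated as identical (assumption)
    return SequenceMatcher(None, input_text, database_text).ratio()

same = text_block_similarity("image processing", "image processing")
different = text_block_similarity("abc", "xyz")
partial = text_block_similarity("scanner", "scanned")
```

Photograph, line drawing, and table blocks would each need their own comparison function, selected by block attribute.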
[0043]
From the above, the similarities of block position, size, attribute, and block interior are calculated, and their sum gives the similarity for the block of the input file, which is recorded. This series of processing is repeated for all blocks of the input file. All of the obtained block similarities are then integrated to obtain the similarity of the input file (step 503). The integration process is as follows. Assume that the similarities n1 to n9 have been calculated for the blocks B′1 to B′9 of the input file in FIG. The overall similarity N of the entire file is then expressed by the following equation.
N = w1 * n1 + w2 * n2 + w3 * n3 + ... + w9 * n9 + γ   (1)
[0044]
Here, w1 to w9 are weighting factors for evaluating the similarity of each block. γ is a correction term, for example an evaluation value for the blocks of the database file that were not selected as target blocks for the input file. The weighting factors w1 to w9 are obtained from the occupancy ratio of each block in the file. For example, if the sizes of blocks 1 to 9 are S1 to S9, the occupancy ratio w1 of block 1 can be calculated as
w1 = S1 / (S1 + S2 + ... + S9)   (2)
With such weighting by occupancy ratio, the similarity of blocks that occupy a large area in the file is reflected more strongly in the similarity of the entire file.
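Equations (1) and (2) can be worked through with a short sketch. The function names and the three-block sample values are assumptions for illustration; the correction term γ is simply passed in as `gamma`, since how it is evaluated is not detailed here.

```python
def occupancy_weights(sizes):
    """Equation (2): weight of each block = its size / total size of all blocks."""
    total = sum(sizes)
    return [size / total for size in sizes]

def overall_similarity(block_similarities, sizes, gamma=0.0):
    """Equation (1): weighted sum of per-block similarities plus correction term."""
    weights = occupancy_weights(sizes)
    return sum(w * n for w, n in zip(weights, block_similarities)) + gamma

# Three blocks instead of nine, to keep the arithmetic easy to follow.
sizes = [50, 30, 20]              # block areas S1..S3
similarities = [0.9, 0.5, 0.1]    # per-block similarities n1..n3
weights = occupancy_weights(sizes)
N = overall_similarity(similarities, sizes)
```

With these values the weights are 0.5, 0.3, and 0.2, so N = 0.5 * 0.9 + 0.3 * 0.5 + 0.2 * 0.1 = 0.62; the largest block dominates the result.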
[0045]
[Calculating text search similarity in file search]
When a document is registered, the words contained in the registered document are acquired. Next, a feature vector for the document is calculated from the words appearing in it using a basic vector dictionary. FIG. 9 shows the structure of the basic vector dictionary. The basic vector dictionary stores, for each word, a feature quantity for each dimension (Dim.) of its vector representation. The dimensions are based on criteria such as a classification by the original meaning of the word or a classification by the field in which the word is used. In the example shown, a word has a feature quantity of 0 for Dim. 1 and 23 for Dim. 2. In this way, the feature quantity of each dimension (Dim.) of a word can be obtained from the dictionary. The feature quantity can be interpreted as a value indicating how strongly the use of the word characterizes the document with respect to that classification criterion (= dimension). The feature of the entire document is expressed as a vector whose dimensions are the classification criteria, built from the per-dimension feature quantities obtained for all the words making up the document. The vector normalized to norm = 1 is stored as the document vector. The document vector is stored in an index as shown in FIG. For example, the document vector of document ID = 6947 has a feature quantity of 0.183 for Dim. 1 and 0.214 for Dim. 2.
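The construction of a normalized document vector can be sketched as follows. The dictionary contents and the words in it are invented for the example, and summing the per-word feature quantities per dimension is an assumption about how the per-dimension values are combined.

```python
import math

def document_vector(words, basic_vector_dictionary, dims):
    """Sum per-dimension feature quantities over all words, then normalize to norm 1."""
    totals = [0.0] * dims
    for word in words:
        features = basic_vector_dictionary.get(word)
        if features is None:
            continue  # words missing from the dictionary contribute nothing
        for i, value in enumerate(features):
            totals[i] += value
    norm = math.sqrt(sum(v * v for v in totals))
    if norm == 0:
        return totals  # no known words: leave the zero vector as is
    return [v / norm for v in totals]

# Hypothetical two-dimensional dictionary (word -> feature quantity per Dim.).
basic_vector_dictionary = {"scanner": [0, 23], "printer": [3, 4], "paper": [6, 1]}
vector = document_vector(["printer", "unknown"], basic_vector_dictionary, dims=2)
```

Here only "printer" is found, so the summed vector (3, 4) is divided by its norm 5 and stored as (0.6, 0.8).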
[0046]
[Conversion to application data]
The result of applying the block selection process (step 121) and the vectorization process (step 126) to the image data for one page is converted into a file in an intermediate data format as shown in FIG. This format is called Document Analysis Output Format (DAOF).
[0047]
FIG. 14 shows the data structure of DAOF.
[0048]
In FIG. 14, reference numeral 791 denotes a header, which holds information relating to the document image data to be processed. The layout description data portion 792 holds the attribute information and rectangular address information of each block recognized in the document image data for each attribute, such as TEXT (character), TITLE (title), CAPTION (caption), LINEART (line drawing), PICTURE (natural image), FRAME (frame), and TABLE (table). The character recognition description data portion 793 holds the character recognition results obtained by performing character recognition on character blocks such as TEXT, TITLE, and CAPTION. The table description data portion 794 stores the details of the structure of each TABLE block. The image description data portion 795 holds image data of blocks such as PICTURE and LINEART cut out from the document image data.
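One way to picture the five parts of the DAOF (791 to 795) is as a simple record type. The field names, types, and sample values below are assumptions made for illustration; the actual file layout of DAOF is not given in this text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class LayoutBlock:
    block_id: int
    attribute: str                      # TEXT, TITLE, CAPTION, LINEART, PICTURE, FRAME, TABLE
    rect: Tuple[int, int, int, int]     # rectangular address: x, y, width, height

@dataclass
class Daof:
    header: Dict[str, str]                                                   # 791
    layout: List[LayoutBlock] = field(default_factory=list)                  # 792
    character_recognition: Dict[int, str] = field(default_factory=dict)      # 793
    tables: Dict[int, dict] = field(default_factory=dict)                    # 794
    images: Dict[int, bytes] = field(default_factory=dict)                   # 795

page = Daof(header={"source": "scan-0001"})  # hypothetical header content
page.layout.append(LayoutBlock(1, "TITLE", (40, 30, 500, 60)))
page.layout.append(LayoutBlock(2, "PICTURE", (40, 120, 300, 200)))
page.character_recognition[1] = "Image processing apparatus"
page.images[2] = b"\x00\x01"  # placeholder bytes standing in for cut-out image data

text_blocks = [b.block_id for b in page.layout if b.attribute in ("TEXT", "TITLE", "CAPTION")]
```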
[0049]
Such a DAOF is not only used as intermediate data but may also be stored as a file itself. In this file state, however, individual objects cannot be reused in a general document creation application. Next, the process of converting DAOF into application data (step 131) will be described in detail.
[0050]
FIG. 15 is a schematic flow of application data conversion.
In step 8000, DAOF data is input.
In step 8002, a document structure tree that forms the basis of the application data is generated.
In step 8004, the actual data in the DAOF is poured into the document structure tree to generate the actual application data.
[0051]
FIG. 16 is a detailed flow of the document structure tree generation in step 8002, and FIG. 17 is an explanatory diagram of the document structure tree. As a basic rule of overall control, processing proceeds from micro blocks (single blocks) to macro blocks (aggregates of blocks).
[0052]
Hereinafter, "block" refers to both micro blocks and macro blocks.
[0053]
In step 8100, regrouping is performed block by block based on relevance in the vertical direction. Immediately after the start, the determination is made in units of micro blocks.
[0054]
Here, relevance can be defined by conditions such as the blocks being close to each other and having substantially the same block width (height, in the case of the horizontal direction).
[0055]
Information such as distance, width, and height is extracted with reference to DAOF.
[0056]
FIG. 17A shows an actual page configuration, and FIG. 17B shows its document structure tree. As a result of 8100, T3, T4, and T5 are generated as one group V1, and T6 and T7 are generated as one group V2 in the same hierarchy.
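A simplified sketch of the vertical regrouping in step 8100 is shown below. The block coordinates are invented so that the result mirrors the grouping described for FIG. 17, and the greedy column-by-column pass, the `max_gap` and `width_tolerance` thresholds, and all names are assumptions rather than the procedure of FIG. 16.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    x: int
    y: int
    w: int
    h: int

def group_vertically(blocks, max_gap=10, width_tolerance=5):
    """Greedily merge blocks that are stacked closely and have nearly equal width."""
    groups = []
    for block in sorted(blocks, key=lambda b: (b.x, b.y)):
        if groups:
            prev = groups[-1][-1]
            gap = block.y - (prev.y + prev.h)          # vertical distance to previous block
            same_column = abs(block.x - prev.x) <= width_tolerance
            same_width = abs(block.w - prev.w) <= width_tolerance
            if same_column and same_width and 0 <= gap <= max_gap:
                groups[-1].append(block)
                continue
        groups.append([block])
    return groups

# Hypothetical coordinates loosely modeled on the page of FIG. 17A.
blocks = [
    Block("T1", 0, 0, 60, 50), Block("T2", 70, 0, 150, 50),
    Block("T3", 0, 100, 100, 20), Block("T4", 0, 125, 100, 20), Block("T5", 0, 150, 100, 20),
    Block("T6", 120, 100, 100, 30), Block("T7", 120, 135, 100, 30),
]
groups = group_vertically(blocks)
merged = [[b.name for b in g] for g in groups if len(g) > 1]
```

With these coordinates, T3 to T5 form one group and T6 and T7 another, corresponding to V1 and V2, while T1 and T2 remain single blocks.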
[0057]
In step 8102, the presence or absence of a vertical separator is checked. Physically, a separator is, for example, an object having a line attribute in the DAOF. Logically, it is an element that explicitly divides blocks in the application. If a separator is detected here, the group is subdivided at the same level.
[0058]
In step 8104, the group length is used to determine whether no further division can exist.
[0059]
If the group length in the vertical direction is the page height, the document structure tree generation ends.
[0060]
In the case of FIG. 17, since there is no separator and the group height is not the page height, the process proceeds to 8106.
[0061]
In step 8106, regrouping is performed block by block based on relevance in the horizontal direction. Here too, the first determination immediately after the start is made in units of micro blocks.
[0062]
The definition of the relevance and the determination information is the same as in the vertical direction.
[0063]
In the case of FIG. 17, H1 is generated from T1 and T2, and H2 is generated from V1 and V2, as groups in the same level, one level above V1 and V2.
[0064]
In step 8108, the presence or absence of a horizontal separator is checked.
[0065]
In FIG. 17, since there is S1, this is registered in the tree, and a hierarchy of H1, S1, and H2 is generated.
[0066]
In step 8110, the group length is used to determine whether no further division can exist.
[0067]
If the horizontal group length is the page width, the document structure tree generation ends.
[0068]
If not, the process returns to 8102, and the relevance check in the vertical direction is repeated again at the next higher level.
[0069]
In the case of FIG. 17, since the division width is the page width, the process ends here, and finally V0 of the highest hierarchy representing the entire page is added to the document structure tree.
[0070]
After the document structure tree is completed, application data is generated in 8006 based on the information.
[0071]
Specifically, in the case of FIG.
[0072]
That is, since H1 has two blocks T1 and T2 in the horizontal direction, it is output as two columns: the internal information of T1 (the character recognition result text, images, and so on, with reference to the DAOF) is output, then the column is changed and the internal information of T2 is output, and then S1 is output.
[0073]
Since H2 has two blocks V1 and V2 in the horizontal direction, it is also output as two columns: V1 outputs its internal information in the order T3, T4, T5, then the column is changed and V2 outputs the internal information of T6 and T7.
[0074]
As described above, conversion processing to application data can be performed.
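The output order described for FIG. 17 can be reproduced with a small tree traversal. This is only a sketch: the `Node` type, the `kind` labels, and the `<column break>` marker are hypothetical, and real application data would carry the block contents from the DAOF rather than just block names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    kind: str                                   # "leaf", "separator", "vertical", "horizontal"
    children: List["Node"] = field(default_factory=list)

def emit(node, out):
    """Append block names in output order; horizontal groups become columns."""
    if node.kind in ("leaf", "separator"):
        out.append(node.name)
        return
    for i, child in enumerate(node.children):
        if node.kind == "horizontal" and i > 0:
            out.append("<column break>")        # move to the next column
        emit(child, out)

def leaf(name):
    return Node(name, "leaf")

# Document structure tree of FIG. 17B: V0 -> H1, S1, H2.
tree = Node("V0", "vertical", [
    Node("H1", "horizontal", [leaf("T1"), leaf("T2")]),
    Node("S1", "separator"),
    Node("H2", "horizontal", [
        Node("V1", "vertical", [leaf("T3"), leaf("T4"), leaf("T5")]),
        Node("V2", "vertical", [leaf("T6"), leaf("T7")]),
    ]),
])
order = []
emit(tree, order)
```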
[0075]
[Another Example 1 in File Search]
In the above embodiment, when the input file and a database file are compared in the file search, the layout information and the block internal information are compared for all blocks of the input file. However, files can be screened to some extent at the stage of comparing layout information, without comparing block internal information. In other words, the block internal information comparison can be omitted for a database file whose layout is completely different from that of the input file. FIG. 19 is a flowchart for the case where files are screened based on layout information. First, the positions, sizes, and attributes are compared for all blocks of the input file, their similarities are obtained, and the layout similarity of the entire file is obtained (step 522). If the layout similarity is lower than a threshold, the block internal information is not compared (step 523). Only when it is higher than the threshold, that is, when the layouts are similar, is the block internal information compared (step 524), and the overall similarity of the entire file is obtained from the previously obtained layout similarity and the block internal similarity (step 525). The method for obtaining the overall similarity from the similarity of each block is the same as in step 503 in FIG. A file whose similarity is equal to or higher than a threshold is stored as a candidate. Obtaining the similarity of block internal information is generally heavy processing, particularly when checking whether photo blocks match. Therefore, narrowing down the files to some extent by layout reduces the amount of search processing, speeds up processing, and allows the desired file to be found efficiently.
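The two-stage screening of steps 522 to 525 can be sketched as below. The similarity functions are passed in as parameters because their details are outside this sketch; the function name, the simple average used to combine the two similarities, and the sample scores are all assumptions.

```python
def search_candidates(input_file, database_files, layout_similarity, internal_similarity,
                      layout_threshold, overall_threshold):
    """Skip the costly internal comparison for files whose layout is dissimilar."""
    candidates = []
    for db_file in database_files:
        layout = layout_similarity(input_file, db_file)          # step 522
        if layout < layout_threshold:                             # step 523: screen out
            continue
        internal = internal_similarity(input_file, db_file)      # step 524
        overall = (layout + internal) / 2                         # step 525 (assumed average)
        if overall >= overall_threshold:
            candidates.append((db_file, overall))
    return candidates

# Toy similarity tables standing in for real comparisons.
layout_scores = {"a": 0.9, "b": 0.2, "c": 0.8}
internal_scores = {"a": 0.7, "b": 0.9, "c": 0.1}
internal_calls = []

def layout_sim(_input, db_file):
    return layout_scores[db_file]

def internal_sim(_input, db_file):
    internal_calls.append(db_file)   # record which files reached the heavy step
    return internal_scores[db_file]

result = search_candidates("input", ["a", "b", "c"], layout_sim, internal_sim, 0.5, 0.6)
```

In this run, file "b" never reaches the internal comparison because its layout similarity is below the threshold, which is where the processing saving comes from.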
[0076]
[Another Example 2 in File Search]
All of the above embodiments describe search processing in which the user performs a search without specifying anything. However, the file search can be further optimized by allowing the user to specify a part of the document that is characteristic (a block obtained from block selection), to omit useless blocks, or to specify a feature of the document.
[0077]
FIG. 8 shows an example of a user interface screen (1001) on which the user specifies search options at the time of search. The input file is divided into a plurality of blocks by the block selection process, and each block on the file, such as text, photo, table, and line drawing, is displayed as a thumbnail on the input screen (1011 to 1017). The user selects a characteristic block from the displayed blocks. More than one block may be selected. As an example, assume that block 1014 is selected. When the “priority” button (1003) is pressed with block 1014 selected, a search that places more importance on block 1014 is performed. Such a prioritized search can be achieved, for example, by increasing the weighting coefficient of the designated block 1014 and decreasing the weighting coefficients of the non-selected blocks in equation (1), which obtains the similarity of the entire file from the similarity of each block. If the “priority” button (1003) is pressed several times, the weighting coefficient of the selected block is increased further, and a search with even more emphasis on that block can be performed. If the exclude button (1004) is pressed, the search is performed with the selected block 1014 omitted. When a block has been recognized incorrectly, this omits useless search processing and prevents erroneous search processing. Further, the attribute of a block can be changed in the detailed setting (1005), so that even when the attribute was misrecognized in block selection (step 121), the user can correct it and a correct search can be performed. In the detailed setting (1005), the user can also finely adjust the weight given to a prioritized block. As described above, the user can specify and set characteristic blocks at search time to optimize the search.
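The effect of the priority and exclude buttons on the weighting coefficients of equation (1) can be sketched as follows. The doubling factor per button press, the renormalization so that the weights again sum to 1, and the function name are assumptions; the text only says that selected weights go up and the others go down.

```python
def apply_user_options(weights, priority_presses=None, excluded=(), boost=2.0):
    """Return adjusted weights after priority/exclude selections.

    priority_presses maps a block index to the number of times the priority
    button was pressed for it; excluded lists block indices to omit.
    """
    priority_presses = priority_presses or {}
    adjusted = []
    for i, weight in enumerate(weights):
        if i in excluded:
            adjusted.append(0.0)                                  # omit block from the search
        else:
            adjusted.append(weight * boost ** priority_presses.get(i, 0))
    total = sum(adjusted)
    return [w / total for w in adjusted] if total else adjusted

weights = [0.5, 0.3, 0.2]
prioritized = apply_user_options(weights, priority_presses={2: 1})
prioritized_twice = apply_user_options(weights, priority_presses={2: 2})
without_block_1 = apply_user_options(weights, excluded={1})
```

Renormalizing raises the share of the selected block and lowers the others in one step, and repeated presses raise it further.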
[0078]
On the other hand, some files have special layouts. For such a file, selecting the layout priority button (1005) in FIG. 8 enables a file search that emphasizes layout. In this case, weighting is performed so that the layout similarity result carries more importance. Selecting the text priority button (1006) restricts the search to text blocks and reduces processing.
[0079]
In this way, by allowing the user to select the features of the image, a search that emphasizes the features of the file can be performed. In addition, when the user's manual designation is trusted, that is, when the weights are changed by user designation, imposing the restriction that blocks whose changed weight falls below a threshold are not searched makes it possible to significantly reduce useless block search processing with a simple user operation.
[0080]
(Other examples)
In the above embodiment, as shown in FIG. 6, the processing to be executed is selected from original output and document output, but the present invention is not limited to this. As shown in FIG. 20, the processing may be selected from original output, original registration, document output (without original registration), and document output (with original registration). If original registration is selected, after the image is input the process starts from step 131 shown in FIG. 3, and the image is not printed. If document output (without original registration) is selected, after the image is input the process jumps to step 135 and the image is printed. If document output (with original registration) is selected, the process starts from step 131 after the image is input, and both registration and image printing are performed.
[0081]
In the above embodiment, the threshold value and setting value to be compared in step 128 are set in advance, but means for setting the threshold value may be provided. In that case, for example, the setting may be made with an interface as shown in FIG.
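The claims describe the decision that such a configurable value feeds into: the best match is printed automatically only when its similarity exceeds that of the runner-up by more than a predetermined value, and otherwise a list is shown. A minimal sketch of that decision follows; the function name, the return format, and the handling of a single candidate are assumptions.

```python
def decide(similarities, min_gap):
    """Choose between automatic printing and list display.

    similarities maps a registered-data address to its similarity score.
    Returns ("print", [address]) or ("list", [addresses sorted by similarity]).
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return ("none", [])
    # A single candidate is printed directly (assumption; not stated in the text).
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] > min_gap:
        return ("print", [ranked[0][0]])
    return ("list", [address for address, _ in ranked])

clear_winner = decide({"doc/a": 0.9, "doc/b": 0.5}, min_gap=0.2)
close_call = decide({"doc/a": 0.85, "doc/b": 0.9, "doc/c": 0.3}, min_gap=0.2)
```

Making `min_gap` user-configurable corresponds to providing the setting means mentioned above.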
[0082]
[Effects of the Invention]
As described in detail above, according to the present invention, an image processing apparatus prints registered data having a high degree of similarity to the input image without requiring a selection operation by the user, thereby greatly improving user operability.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of an MFP according to the embodiment of the present invention.
FIG. 3 is a flowchart showing an original processing procedure according to the embodiment of the present invention.
FIG. 4 is an example of block selection processing according to the embodiment of the present invention.
FIG. 5 is a flowchart showing a schematic processing procedure.
FIG. 6 is a diagram illustrating an example of a user interface screen.
FIG. 7 is a diagram illustrating an example of a list selection user interface screen.
FIG. 8 is a diagram illustrating an example of a user interface screen.
FIG. 9 is an example of a basic vector dictionary for text search.
FIG. 10 is an example of a text document vector index.
FIG. 11 is a diagram illustrating a block example.
FIG. 12 is a flowchart showing a processing procedure for file search processing.
FIG. 13 is a flowchart illustrating a block comparison process procedure of a file search process.
FIG. 14 is a diagram illustrating an example of a DAOF.
FIG. 15 is a flowchart showing an application data conversion processing procedure.
FIG. 16 is a flowchart showing a document structure tree generation processing procedure.
FIG. 17 is an explanatory diagram of a document structure tree.
FIG. 18 is a diagram illustrating an example of a threshold setting user interface screen.
FIG. 19 is a flowchart showing a file selection processing procedure based on layout information.
FIG. 20 is a diagram illustrating an example of a user interface screen.

Claims

An image processing apparatus for retrieving an image similar to an input document image from registered data, comprising:
input means for inputting a document image;
similarity calculating means for calculating the similarity between the document image input by the input means and registered data;
notification means for automatically notifying the address of registered data for which, as a result of the calculation by the similarity calculating means, the difference between the similarity of the registered data having the highest similarity to the document image and the similarity of the registered data having the next highest similarity is determined to be larger than a predetermined value; and
print control means for controlling printing of the registered data whose address is notified by the notification means.

The image processing apparatus according to claim 1, further comprising list display means for displaying a list of a plurality of registered data based on the similarity when the difference between the similarity of the registered data having the highest similarity to the document image and the similarity of the registered data having the next highest similarity is not greater than the predetermined value.

The image processing apparatus according to claim 1, further comprising a value setting unit configured to set the predetermined value.

An image processing method for retrieving an image similar to an input document image from registered data, comprising:
an input step of inputting a document image by input means;
a similarity calculating step in which similarity calculating means calculates the similarity between the document image input by the input means and the registered data;
a notification step in which notification means automatically notifies the address of registered data for which, as a result of the calculation in the similarity calculating step, the difference between the similarity of the registered data having the highest similarity to the document image and the similarity of the registered data having the next highest similarity is determined to be larger than a predetermined value; and
a print control step in which print control means controls printing of the registered data whose address is notified in the notification step.

A computer program comprising program code for causing a computer to execute each step described in the image processing method according to claim 4.