JP2004348467A

JP2004348467A - Image retrieval apparatus and its control method, program

Info

Publication number: JP2004348467A
Application number: JP2003145193A
Authority: JP
Inventors: Tomotoshi Kanatsu; 知俊金津
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2003-05-22
Filing date: 2003-05-22
Publication date: 2004-12-09

Abstract

PROBLEM TO BE SOLVED: To provide an image retrieval apparatus, a control method thereof, and a program that enable original electronic data to be accurately retrieved from a paper document.
SOLUTION: A printed matter is read as electronic data and input as a retrieval key image. The retrieval key image is divided into a character area and a non-character area. The similarity between the retrieval key image and registered images is then calculated on the basis of the feature quantities of the characters in the character area, the feature quantities of the image in the non-character area, and the positional relation between the character area and the non-character area. A registered image corresponding to the retrieval key image is retrieved on the basis of the calculation result.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、印刷物を検索キーとして入力して、その印刷物に対応する画像を検索する画像検索装置及びその方法、プログラムに関するものである。
【０００２】
【従来の技術】
近年、オフィスでのペーパーレス化は急速に進み、パーソナルコンピュータ（ＰＣ）等の端末上で作成された書類は、もちろん、旧来よりバインダー等で蓄積されていた過去の紙文書もスキャナで画像データ（電子文書）に変換し、データベース上に蓄積するようになっている。
【０００３】
一方で、会議での配付資料等には依然として紙文書が好まれ、データベースに蓄積された電子文書が、紙へとプリントアウトされてユーザの手に渡る機会も多い。
【０００４】
ここで、紙文書を受けとったユーザが、それを電子的に保管や送信、あるいは内容を抽出して再利用したいと考えた時、その紙文書を再電子化して電子文書を得るよりも、データベースより、その紙文書のオリジナルの電子（画像）データを取得して利用するほうが、紙文書を再電子化して利用するよりも情報の損失が少なく望ましい。
【０００５】
このような要求に答えるため、例えば、特許文献１では、紙文書をスキャナで読み取り、その内容と類似する画像データをデータベースより検索することが可能なシステムが考案されている。
【０００６】
また、検索対象がオフィス等で一般的に利用される文書であると想定すると、文書内には、大きくわけて文字情報と、写真や図等の非文字情報の２種類が存在しており、検索の際にはそれぞれの特性に応じた類似度算出処理を用いることで、より処理精度を高めることができる。
【０００７】
更に、特許文献２で開示される領域識別処理を使えば、紙文書をスキャンした文書画像中の文字領域と非文字領域をそれぞれ分離、抽出することができる。この技術を用いると、文字と非文字が混在する文書に関してもそれぞれの領域を特定し、文字領域からは、文字認識処理によって得られた文字コード列からキーワードを抽出し、登録データの文字コード列と照合した際の一致度を用いた類似度が得られる。一方、非文字領域からは、色、エッジ等の画像特徴量による類似度が得られる。この類似度を適当な配分で加算することで、文字・非文字両方の類似度情報を利用した検索が可能であった。
【０００８】
【特許文献１】
特許第３０１７８５１号
【特許文献２】
米国特許第５６８０４７８号
【０００９】
【発明が解決しようとする課題】
上記のような環境において、特に、オフィス等で登録される文書画像データの中には、類似文書画像データが多く含まれる可能性がある。例えば、図画部品や文面を流用することにより、同じ写真を含みながら文面の違う文書、あるいは文面は同じで図の異なる文書等が多く作成されて登録される。
【００１０】
一方、紙文書画像と登録画像の類似度比較において、従来例のように、文字及び非文字の類似度を各々独立に求めて適当な配分で総和する方式は、その配分が検索精度に大きく影響し、特に、両者の類似度に偏りが大きい場合、常に検索に適当な配分を求めることは困難だった。
【００１１】
また、内容の類似する文書画像群からオリジナル文書の文書画像を特定するために、文字や非文字の領域の位置も特徴量に含めて検索をおこなう方法もあった。
【００１２】
しかしながら、オリジナル文書が電子文書から生成されている場合、その電子文書から作成された紙文書との間でレイアウトが厳密には一致しない場合がある。例えば、ＨＴＭＬ形式の電子文書の場合、プリントアウト時のレイアウトは環境によって大きく異なる。このようなケースでは、その紙文書中の文字や非文字領域の位置情報を用いて、オリジナルの電子文書を検索しようとすると、かえって検索精度が悪化する可能性があった。
【００１３】
本発明は上記の課題を解決するためになされたものであり、紙文書からオリジナルの電子データを精度良く検索することができる画像検索装置及びその制御方法、プログラムを提供することを目的とする。
【００１４】
【課題を解決するための手段】
上記の目的を達成するための本発明による画像検索装置は以下の構成を備える。即ち、
印刷物を検索キー画像として入力して、その検索キー画像に対応する登録画像を検索する画像検索装置であって、
前記検索キー画像を文字領域と非文字領域に領域分割する領域分割手段と、
前記文字領域の文字特徴量と、前記非文字領域の画像特徴量と、該文字領域と該非文字領域の位置関係に基づいて、前記検索キー画像と前記登録画像間の類似度を算出する算出手段と、
前記算出手段による算出結果に基づいて、前記検索キー画像に対応する登録画像を検索する検索手段と
を備える。
【００１５】
また、好ましくは、前記文字特徴量は、前記文字領域を文字認識して得られる文字コードである。
【００１６】
また、好ましくは、前記算出手段は、前記非文字領域の画像特徴量に基づく類似度が閾値以上である場合に、前記文字領域中の所定文字列と該非文字領域との位置関係に基づいて、前記検索キー画像と前記登録画像間の類似度を算出する。
【００１７】
また、好ましくは、前記算出手段は、前記検索キー画像及び前記登録画像双方の処理対象の前記非文字領域の所定範囲内に位置する前記文字領域中の所定文字列同士の相関を前記類似度として算出する。
【００１８】
また、好ましくは、前記算出手段は、前記検索キー画像と前記登録画像双方の処理対象の前記非文字領域と、その所定範囲内に位置する前記文字領域中の所定文字列間の距離同士を比較し、その比較結果に基づく類似度を算出する。
【００１９】
また、好ましくは、前記算出手段は、前記検索キー画像と前記登録画像双方の処理対象の前記非文字領域から、その所定範囲内に位置する前記文字領域中の所定文字列へ向かう角度同士を比較し、その比較結果に基づく類似度を算出する。
【００２０】
上記の目的を達成するための本発明による画像検索方法は以下の構成を備える。即ち、
印刷物を検索キー画像として入力して、その検索キー画像に対応する登録画像を検索する画像検索方法であって、
前記検索キー画像を文字領域と非文字領域に領域分割する領域分割工程と、
前記文字領域の文字特徴量と、前記非文字領域の画像特徴量と、該文字領域と該非文字領域の位置関係に基づいて、前記検索キー画像と記憶媒体に記憶されている複数の登録画像間の類似度を算出する算出工程と、
前記算出工程による算出結果に基づいて、前記検索キー画像に対応する登録画像を前記記憶媒体に記憶されている複数の登録画像から検索する検索工程と
を備える。
【００２１】
上記の目的を達成するための本発明によるプログラムは以下の構成を備える。即ち、
印刷物を検索キー画像として入力して、その検索キー画像に対応する登録画像を検索する画像検索を実現するプログラムであって、
前記検索キー画像を文字領域と非文字領域に領域分割する領域分割工程のプログラムコードと、
前記文字領域の文字特徴量と、前記非文字領域の画像特徴量と、該文字領域と該非文字領域の位置関係に基づいて、前記検索キー画像と記憶媒体に記憶されている複数の登録画像間の類似度を算出する算出工程のプログラムコードと、
前記算出工程による算出結果に基づいて、前記検索キー画像に対応する登録画像を前記記憶媒体に記憶されている複数の登録画像から検索する検索工程のプログラムコードと
を備える。
【００２２】
【発明の実施の形態】
以下、本発明の実施の形態について図面を用いて詳細に説明する。
【００２３】
図１は本発明の実施形態の画像処理システムの構成を示すブロック図である。
【００２４】
この画像処理システムは、オフィス１０とオフィス２０とをインターネット等のネットワーク１０４で接続された環境で実現する。
【００２５】
オフィス１０内に構築されたＬＡＮ１０７には、複数種類の機能を実現する複合機であるＭＦＰ（ＭｕｌｔｉＦｕｎｃｔｉｏｎＰｅｒｉｐｈｅｒａｌ）１００、ＭＦＰ１００を制御するマネージメントＰＣ１０１、文書管理サーバ１０６及びそのデータベース１０５、及びプロキシサーバ１０３が接続されている。
【００２６】
また、オフィス２０内に構築されたＬＡＮ１０８には、例えば、ＭＦＰ１００や文書管理サーバ１０６のユーザとして機能するユーザＰＣ１０２が接続されている。
【００２７】
オフィス１０内のＬＡＮ１０７及びオフィス２０内のＬＡＮ１０８は、双方のオフィスのプロキシサーバ１０３を介してネットワーク１０４に接続されている。
【００２８】
ＭＦＰ１００は、特に、紙文書を電子的に読み取る画像読取部と、画像読取部から得られる画像信号に対する画像処理を実行する画像処理部を有し、この画像信号はＬＡＮ１０９を介してマネージメントＰＣ１０１に送信することができる。
【００２９】
マネージメントＰＣ１０１は、通常のＰＣであり、内部に画像記憶部、画像処理部、表示部、入力部等の各種構成要素を有するが、その構成要素の一部はＭＦＰ１００に一体化して構成されている。
【００３０】
尚、ネットワーク１０４は、典型的にはインターネットやＬＡＮやＷＡＮや電話回線、専用デジタル回線、ＡＴＭやフレームリレー回線、通信衛星回線、ケーブルテレビ回線、データ放送用無線回線等のいずれか、またはこれらの組み合わせにより実現されるいわゆる通信ネットワークであり、データの送受信が可能であれば良い。
【００３１】
また、マネージメントＰＣ１０１、ユーザＰＣ１０２、文書管理サーバ等の各種端末はそれぞれ、汎用コンピュータに搭載される標準的な構成要素（例えば、ＣＰＵ、ＲＡＭ、ＲＯＭ、ハードディスク、外部記憶装置、ネットワークインタフェース、ディスプレイ、キーボード、マウス等）を有している。
【００３２】
更に、図１の画像処理システムの構成は、一例であり、例えば、オフィス１０とオフィス２０は同一ＬＡＮ上で構成されていても良い。
【００３３】
次に、ＭＦＰ１００の詳細構成について、図２を用いて説明する。
【００３４】
図２は本発明の実施形態のＭＦＰの詳細構成を示すブロック図である。
【００３５】
図２において、原稿台とオートドキュメントフィーダ（ＡＤＦ）を含む画像読取部１１０は、束状のあるいは１枚の原稿画像を光源（不図示）で照射し、原稿反射像をレンズで固体撮像素子上に結像し、固体撮像素子からラスタ状の画像読取信号を所定密度（例えば、６００ＤＰＩ）のラスタ画像として得る。
【００３６】
尚、本実施形態では、画像読取部１１０で読み取る印刷物として、紙文書を例に挙げて説明するが、紙以外の記録媒体（例えば、ＯＨＰシート、フィルム等の透過原稿、や布）からなる印刷物を、画像読取部１１０の読取対象としても良い。
【００３７】
また、ＭＦＰ１００は、画像読取信号に対応する画像を印刷部１１２で記録媒体に印刷する複写機能を有し、原稿画像を１つ複写する場合には、この画像読取信号をデータ処理部１１５で画像処理して記録信号を生成し、これを印刷部１１２によって記録媒体上に印刷させる。一方、原稿画像を複数複写する場合には、記憶部１１１に一旦一つ分の記録信号を記憶保持させた後、これを印刷部１１２に順次出力して記録媒体上に印刷させる。
【００３８】
また、ＭＦＰ１００の印刷機能としては、ユーザＰＣ１０２から出力される記録信号は、ＬＡＮ１０７及びネットワークＩＦ１１４を介してデータ処理部１１５が受信し、データ処理部１１５は、その記録信号を印刷部１１２で記録可能なラスタデータに変換した後、印刷部１１２によって記録媒体上に印刷することが可能である。
【００３９】
また、ＭＦＰ１００の送信機能として、ラスタ画像をＴＩＦＦやＪＰＥＧ等の圧縮画像ファイル形式、あるいはＰＤＦ等のネットワークファイル形式の電子データへと変換し、ネットワークＩＦ１１４から出力する。出力された電子データは、ＬＡＮ１０７を介して文書管理サーバ１０６へ送信されたり、更に、ネットワーク１０４経由でユーザＰＣ１０２に転送することが可能である。
【００４０】
ＭＦＰ１００への操作者の指示は、ＭＦＰ１００に装備されたキー操作部とマネージメントＰＣ１０１に接続されたキーボード及びマウスからなる入力部１１３から行われ、これら一連の動作はデータ処理部１１５内の制御部（不図示）で制御される。また、操作入力の状態表示及び処理中の画像データの表示は、表示部１１６で行われる。
【００４１】
記憶部１１１は、マネージメントＰＣ１０１からも制御され、ＭＦＰ１００とマネージメントＰＣ１０１とのデータの送受信及び制御は、ネットワークＩＦ１１７及びＬＡＮ１０９を介して行われる。
【００４２】
尚、ＭＦＰ１００では、後述する各種処理を実行するための各種操作・表示をユーザに提供するユーザインタフェースを、表示部１１６及び入力部１１３によって実現している。
【００４３】
本発明による画像処理システムで実行する処理としては、大きく分けて画像データを登録する登録処理と、所望の画像データを検索する検索処理の２つがある。
【００４４】
尚、本実施形態では、以下に説明する処理を、例えば、ＭＦＰ１００単体で実行するものとするが、画像処理システム全体の処理効率を向上するために、以下に説明する各種処理を、画像処理システムを構成する各種端末に分散させて実行するようにしても良い。
【００４５】
まず、登録処理について説明する。
【００４６】
登録処理による登録対象の画像データの登録方法としては、文書作成アプリケーション等で作成された電子データをオリジナル文書として登録する場合と、紙文書をオリジナル文書として登録する場合とがある。また、この登録処理では、登録されたオリジナル文書データに対して、検索に必要となる特徴量の特徴量抽出処理と、それらとオリジナル文書データを関係づけて、データベースへ登録する。
【００４７】
図３は本発明の実施形態の登録処理を示すフローチャートである。
【００４８】
まず、ステップＳ３０１で、登録対象が紙文書であるか、電子データであるか否かを判定する。
【００４９】
尚、この判定は、例えば、入力部１１３に構成されている紙文書登録ボタン及び電子データ登録ボタンの操作に基づいて判定する。
【００５０】
ステップＳ３０１において、紙文書登録ボタンが操作された場合（ステップＳ３０１でＹＥＳ）、ステップＳ３０２に進み、ＭＦＰ１００の画像読取部１１０で、その紙文書をラスタ状に走査しラスタ画像（ページ画像）を得る。
【００５１】
一方、電子データ登録ボタンが操作された場合（ステップＳ３０１でＮＯ）、ステップＳ３０３に進み、ユーザＰＣ１０２内のハードディスク内、あるいはオフィス１０の文書管理サーバ１０６内のデータベース１０５内、あるいはＭＦＰ１００の記憶部１１１のいずれかに格納されている、登録対象のオリジナル文書の電子データを読み出してネットワークＩＦ１１４を介してデータ処理部１１５に入力し、データ処理部１１５でその電子データをラスタ画像（ページ画像）に変換する。
【００５２】
尚、このラスタ画像への変換は、登録対象の電子データを生成したアプリケーション自身あるいは付加ソフトウェアが有するラスタデータ化機能を利用して実現しても良い。
【００５３】
次に、ステップＳ３０４で、ページ画像に対して領域分割処理を行い、ページ画像から文字領域と非文字領域（例えば、写真、図画等）を抽出する。
【００５４】
ここで、領域分割処理は、例えば、図４（ａ）のページ画像を、図４（ｂ）のように、意味のあるブロック毎の塊として認識し、該ブロック各々の属性（文字（ＴＥＸＴ）／図画（ＰＩＣＴＵＲＥ）／写真（ＰＨＯＴＯ）／線（ＬＩＮＥ）／表（ＴＡＢＬＥ）等）を判定し、異なる属性を持つブロックに分割する処理である。
【００５５】
領域分割処理の実施形態を以下に説明する。
【００５６】
まず、入力画像を白黒に二値化し、輪郭線追跡を行って黒画素輪郭で囲まれる画素の塊を抽出する。面積の大きい黒画素の塊については、内部にある白画素に対しても輪郭線追跡を行って白画素の塊を抽出、さらに一定面積以上の白画素の塊の内部からは再帰的に黒画素の塊を抽出する。
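[0057]
The black pixel clusters obtained in this way are classified by size and shape into blocks having different attributes. For example, a cluster whose aspect ratio is close to 1 and whose size falls within a fixed range is treated as a character-equivalent pixel cluster; a part where adjacent characters can be grouped in good alignment is a text block; a flat pixel cluster is a line block; the range occupied by a black pixel cluster that is at least a certain size and contains well-aligned rectangular white pixel clusters is a table block; a region in which irregularly shaped pixel clusters are scattered is a photo block; and any other arbitrarily shaped pixel cluster is a picture block.
[0058]
In this embodiment, text blocks are treated as character areas, and picture blocks and photo blocks are treated as non-character areas. Table blocks and line blocks may also be included in the non-character areas.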
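The classification rules above can be sketched as a small decision function. This is only an illustration: the `Blob` record, the function name `classify_blob`, and every numeric threshold are assumptions introduced here, since the description gives the criteria only qualitatively. Grouping of character-equivalent clusters into text blocks is not shown.

```python
from dataclasses import dataclass

@dataclass
class Blob:
    """Hypothetical summary of one black pixel cluster."""
    width: int                      # bounding-box width in pixels (> 0)
    height: int                     # bounding-box height in pixels (> 0)
    aligned_white_rects: int = 0    # aligned rectangular white clusters inside
    scattered: bool = False         # True if built from scattered irregular clusters

def classify_blob(b, char_min=8, char_max=64, flat_ratio=10.0, table_min=100):
    """Return an attribute label; all thresholds are illustrative assumptions."""
    ratio = b.width / b.height
    # Aspect ratio near 1 and size in a fixed range: character-equivalent cluster.
    if 0.5 <= ratio <= 2.0 and char_min <= max(b.width, b.height) <= char_max:
        return "CHAR"
    # Very flat (or very tall) cluster: line block.
    if ratio >= flat_ratio or ratio <= 1.0 / flat_ratio:
        return "LINE"
    # Large cluster that contains aligned rectangular white clusters: table block.
    if min(b.width, b.height) >= table_min and b.aligned_white_rects >= 2:
        return "TABLE"
    # Scattered irregular clusters: photo block; anything else: picture block.
    if b.scattered:
        return "PHOTO"
    return "PICTURE"
```

In practice the thresholds would depend on the scan resolution, so they are left as parameters.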
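[0059]
Next, in step S305, each non-character area in the page image is treated as one piece of non-character image information, and its image feature amount is extracted. The image feature amount of each non-character area and its position within the page image are then registered in the database 105 as search data in association with the page image.
[0060]
Any known technique may be used for image feature extraction, and the details are not described here. One example is to divide the image to be processed into a mesh and build a vector whose elements are the average colors of the mesh cells.
[0061]
Besides the average color of the image to be processed, the most frequent color may be extracted as the image feature amount. Furthermore, one or any combination of several kinds of image feature amounts may be used, for example luminance features such as the most frequent luminance and the average luminance; texture features expressed by a co-occurrence matrix, contrast, entropy, the Gabor transform, and the like; and shape features such as edges and Fourier descriptors.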
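The mesh-based average color feature mentioned as an example can be sketched as follows. The function name `mesh_average_color`, the nested-list image representation, and the default mesh size are assumptions made for illustration; the sketch also assumes the image has at least as many rows and columns as the mesh.

```python
def mesh_average_color(pixels, rows=2, cols=2):
    """Build a feature vector from per-cell average colors.

    pixels: list of image rows, each a list of (r, g, b) tuples.
    Returns a flat list [r, g, b, r, g, b, ...], one triple per mesh cell,
    ordered left to right and top to bottom.
    """
    h, w = len(pixels), len(pixels[0])
    feature = []
    for i in range(rows):
        for j in range(cols):
            # Integer cell boundaries that together cover the whole image.
            y0, y1 = i * h // rows, (i + 1) * h // rows
            x0, x1 = j * w // cols, (j + 1) * w // cols
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(cell)
            for ch in range(3):
                feature.append(sum(p[ch] for p in cell) / n)
    return feature
```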
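[0062]
Next, in step S306, character recognition is applied to every character area in the page image to obtain character code strings, which are registered in the database 105 as character feature amounts in association with the page image.
[0063]
Any known technique may be used for character recognition. For example, edge components and the like are obtained from a character image and converted into a feature vector, the similarity to the feature vectors in a recognition dictionary registered in advance for each character type is calculated, and the character code of the most similar character is taken as the recognition result.
[0064]
Although character codes are used as the character feature amount, it is also possible, for example, to match against a word dictionary in advance to extract the part of speech of each word and to use the words that are nouns as the character feature amount.
[0065]
In step S307, words serving as keywords are extracted from the character code strings obtained in step S306, added to the character feature amount together with their positions, and registered in the database 105 in association with the page image.
[0066]
There are various keyword extraction methods. Here, priority is given to character codes that appear frequently and to character codes regarded as proper nouns by part-of-speech analysis, and up to five keywords are extracted. This is only an example; needless to say, other keyword extraction methods and any number of keywords may be used.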
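One possible reading of this keyword selection is sketched below. It assumes the recognized text has already been split into words and that a set of proper nouns is supplied by some part-of-speech analysis that is not shown. The function name `extract_keywords` and the exact priority order (proper nouns first, then frequency, then first appearance) are assumptions, since the description does not say how the two criteria are combined.

```python
from collections import Counter

def extract_keywords(words, proper_nouns=frozenset(), max_keywords=5):
    """Pick up to max_keywords words, preferring proper nouns and frequent words."""
    counts = Counter(words)
    first_seen = {}
    for i, w in enumerate(words):
        first_seen.setdefault(w, i)
    # Sort key: proper nouns first, then higher frequency, then earlier appearance.
    ranked = sorted(
        counts,
        key=lambda w: (w not in proper_nouns, -counts[w], first_seen[w]),
    )
    return ranked[:max_keywords]
```

The position of each keyword, which the registration step also stores, would be carried alongside the word in a fuller implementation.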
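[0067]
Next, the search process will be described.
[0068]
In the search process, feature amounts are extracted from a page image (search key image) obtained by scanning the paper document that serves as the search key, and are compared with the feature amounts of the registered images (page images) registered in the database 105. The registered image with the highest similarity is output as the search result. The user can then execute various kinds of processing on the output registered image, such as printing, distribution, storage, and editing.
[0069]
FIG. 5 is a flowchart showing the search process according to the embodiment of the present invention.
[0070]
First, in step S401, the paper document serving as the search key is input via the image reading unit 110 of the MFP 100. This is the same as step S302, except that the resulting raster image (page image) is only stored temporarily.
[0071]
Next, in step S402, area division processing is performed on the page image (search key image) to extract the character areas and non-character areas in the search key image. This is the same as step S304, except that the position of each resulting area is only stored temporarily.
[0072]
Next, in step S403, each non-character area in the search key image is treated as one piece of non-character image information, and its image feature amount is extracted. The image feature amount of each non-character area and its position in the page image are stored in the storage unit 111 as the image feature amounts of the search key image, in association with the search key image. This is the same as step S305.
[0073]
Next, in step S404, character recognition is applied to every character area in the search key image to obtain character code strings, which are stored in the storage unit 111 as the character feature amounts of the search key image, in association with the search key image. This is the same as step S306.
[0074]
Next, in step S405, words serving as keywords are extracted from the character code strings obtained in step S404, added to the character feature amounts of the search key image together with their positions, and stored in the storage unit 111 in association with the search key image. This is the same as step S307.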
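[0075]
Next, in step S406, the search key image is compared with each registered image (page image) registered in the database 105 by the registration process of FIG. 3. The comparison uses the image feature amounts, the character feature amounts, and the keywords in the character areas that are selected for processing on the basis of the positions of the non-character areas, and a similarity is calculated as the result.
[0076]
This comparison consists of the following three kinds of similarity comparison.
[0077]
The first is the similarity between the image feature amounts of non-character areas. Specifically, it is calculated from how close the image feature amounts of the search key image and the registered image are in the feature vector space. When the comparison target image (a registered image in the database 105) contains several non-character areas, the similarity between the image feature amount of one non-character area in the search key image (the comparison source image) and the image feature amount of each non-character area in the comparison target image is calculated, and the area is paired with the most similar one. In this way a cumulative similarity over the non-character areas is calculated.
[0078]
However, when the comparison target image contains many non-character areas and the number of combinations grows, the combinations of non-character areas to be compared may be restricted according to their positions. The similarity based on image feature amounts calculated in this way is denoted S1.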
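The best-match accumulation behind S1 might look like the sketch below. The mapping from Euclidean distance to similarity, `1 / (1 + d)`, is an assumption chosen only so that closer vectors score higher; the description says only that the similarity is based on closeness in the feature vector space. The function names are likewise hypothetical.

```python
import math

def region_similarity(f1, f2):
    """Similarity of two feature vectors; 1.0 when identical (assumed mapping)."""
    return 1.0 / (1.0 + math.dist(f1, f2))

def image_similarity_s1(key_regions, reg_regions):
    """Sum, over the key image's regions, of the best match in the registered image.

    key_regions, reg_regions: lists of equal-length feature vectors.
    """
    total = 0.0
    for kf in key_regions:
        if reg_regions:
            total += max(region_similarity(kf, rf) for rf in reg_regions)
    return total
```

Restricting candidate pairs by position, as the paragraph above allows, would amount to filtering `reg_regions` before taking the maximum.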
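[0079]
The second is the similarity between the character feature amounts of character areas, which is calculated from how close the character code strings of the search key image and the registered image are. However, character codes obtained by character recognition contain misrecognitions arising from various causes, so the strings cannot be expected to match completely. In this embodiment, therefore, the probability that the keywords extracted from the character code strings of the character areas of the search key image are present in the character code strings of the character areas of the registered image is used as the similarity based on character feature amounts, and is denoted S2.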
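A minimal sketch of S2, under the assumption that "probability" here means the fraction of the search key image's keywords that occur somewhere in the registered image's recognized text. The function name and the plain substring test are illustrative choices.

```python
def text_similarity_s2(key_keywords, registered_text):
    """Fraction of the key image's keywords found in the registered image's text."""
    if not key_keywords:
        return 0.0
    hits = sum(1 for kw in key_keywords if kw in registered_text)
    return hits / len(key_keywords)
```

Because only keywords are checked, a few misrecognized characters elsewhere in the text do not affect the score.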
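[0080]
The third is a similarity between the search key image and the registered image based on the positional relationship between non-character areas and character areas. This similarity calculation will be described in detail with reference to FIG. 6.
[0081]
FIG. 6 is a flowchart showing the similarity calculation process according to the embodiment of the present invention.
[0082]
First, in step S601, one of the non-character areas in the search key image is set as the area A_1 to be processed. If the search key image contains only one non-character area, that area is set as A_1.
[0083]
Next, in step S602, the non-character area A_2 with the highest similarity to the non-character area A_1 is obtained from the non-character areas in the registered image. This may be obtained in the course of the similarity calculation based on image feature amounts described above.
[0084]
Next, in step S603, it is determined whether the similarity is equal to or greater than a threshold, that is, whether the non-character areas A_1 and A_2 are the same (or similar) image. If it is below the threshold (NO in step S603), the areas are judged not to be the same (or similar) image and the process proceeds to step S607. If it is equal to or greater than the threshold (YES in step S603), they are judged to be the same (or similar) image and the process proceeds to step S604.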
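[0085]
Next, in step S604, the character areas within a predetermined range of the non-character area A_1 are referenced, the keywords of the search key image contained in those character areas are selected, and they are set as {N_11, N_12, ...}.
[0086]
A concrete example of step S604 will be described with reference to FIG. 7.
[0087]
FIG. 7 is a diagram for explaining an example of selecting keywords in character areas according to the embodiment of the present invention.
[0088]
In the example of FIG. 7, keywords located within a square with sides of 2×L centered on the centroid of the non-character area 700 are selected from the character areas. FIG. 7 shows the case where the keywords of the search key image are "yyyy" and "xxxx": of the character areas 701 to 703 that lie within the predetermined range of the non-character area 700 (in this embodiment, the 2×L square), the keyword "yyyy" is selected from the character area 701 and the keyword "xxxx" from the character area 702.
[0089]
The value of L may be fixed, or may be varied according to the size of the non-character area being processed or the amount of text in the neighboring character areas.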
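The selection in FIG. 7 can be sketched as a simple range test around the centroid of the non-character area. The representation of a keyword as a `(text, (x, y))` pair holding the keyword's own centroid, the function name, and the inclusive boundary are assumptions for illustration.

```python
def keywords_near(centroid, keywords, half_side):
    """Keywords whose position lies in the 2L x 2L square around centroid.

    centroid:  (x, y) centroid of the non-character area.
    keywords:  list of (text, (x, y)) pairs, position = keyword centroid.
    half_side: L, half the side length of the square.
    """
    cx, cy = centroid
    return [
        text
        for text, (x, y) in keywords
        if abs(x - cx) <= half_side and abs(y - cy) <= half_side
    ]
```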
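[0090]
Next, in step S605, the character areas within the predetermined range of the non-character area A_2 in the registered image are referenced, the keywords of the registered image contained in those character areas are selected, and they are set as {N_21, N_22, ...}.
[0091]
Next, in step S606, a score indicating the correlation is calculated on the basis of the correlation of {N_11, N_12, ...} × {N_21, N_22, ...}.
[0092]
Specifically, if the first keyword N_11 has a matching keyword in the keyword group {N_21, N_22, ...}, one point is added to the score, and the final number of matching keywords is calculated as the score indicating the correlation. Alternatively, for a keyword N_2n that matches the keyword N_11, a larger score may be added when the ratio, between the search key image and the registered image, of the distance from the centroid of the non-character area to the centroid of the keyword is close to 1, or when the angle from the centroid of the non-character area toward the centroid of the keyword being processed agrees closely.
[0093]
Thereafter, the keyword to be processed is changed in the same way, a score is calculated for each keyword, and the sum of the keyword scores is taken as the similarity S3.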
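The scoring for S3 could be sketched as below. A matching keyword earns one point; the extra points for an agreeing distance ratio and an agreeing angle, the tolerance values, and the choice to apply both bonuses together are all assumptions, because the description presents the distance and angle criteria as alternatives and gives no concrete values.

```python
import math

def keyword_score(kw1, kws2, c1, c2, ratio_tol=0.2, angle_tol=math.radians(15)):
    """Score one key-image keyword against the registered image's keywords.

    kw1:  (text, (x, y)) from the search key image.
    kws2: list of (text, (x, y)) from the registered image.
    c1, c2: centroids of the paired non-character areas A_1 and A_2.
    """
    text1, p1 = kw1
    best = 0.0
    for text2, p2 in kws2:
        if text1 != text2:
            continue
        score = 1.0  # the keyword itself matches
        d1, d2 = math.dist(c1, p1), math.dist(c2, p2)
        if d1 > 0 and d2 > 0 and abs(d1 / d2 - 1.0) <= ratio_tol:
            score += 1.0  # distance ratio close to 1
        a1 = math.atan2(p1[1] - c1[1], p1[0] - c1[0])
        a2 = math.atan2(p2[1] - c2[1], p2[0] - c2[0])
        diff = abs(a1 - a2)
        diff = min(diff, 2 * math.pi - diff)  # wrap around the circle
        if diff <= angle_tol:
            score += 1.0  # direction from the area to the keyword agrees
        best = max(best, score)
    return best

def similarity_s3(kws1, kws2, c1, c2):
    """Sum of per-keyword scores over the key image's selected keywords."""
    return sum(keyword_score(k, kws2, c1, c2) for k in kws1)
```

Because distances and angles are measured relative to each image's own non-character area, a uniform shift of the layout between the print and the original does not change the score.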
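[0094]
Then, using coefficients (a, b, c), the overall similarity is calculated from the three similarities S1, S2, and S3 as
overall similarity S = a × S1 + b × S2 + c × S3.
[0095]
The coefficients (a, b, c) may be fixed values, may be varied according to the proportion of character areas and non-character areas in the image being processed, or may be set freely by the user when the search process is executed. Here, a is the coefficient that defines the weight of the similarity comparison by image feature amounts, b the weight of the comparison by character feature amounts, and c the weight of the comparison by keywords.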
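As a small worked example of the weighted sum, the sketch below uses made-up scores for two registered images that share the same picture and much of the same text, so that only the positional keyword score separates them. The function name and all numbers are hypothetical.

```python
def overall_similarity(s1, s2, s3, a=1.0, b=1.0, c=1.0):
    """Overall similarity S = a*S1 + b*S2 + c*S3."""
    return a * s1 + b * s2 + c * s3

# Two hypothetical candidates with identical S1 and S2 but different S3.
doc_a = (0.9, 0.8, 0.0)   # keywords near the picture do not match
doc_b = (0.9, 0.8, 2.0)   # two keywords near the picture match

tie_a = overall_similarity(*doc_a, c=0.0)
tie_b = overall_similarity(*doc_b, c=0.0)
ranked_a = overall_similarity(*doc_a, c=0.5)
ranked_b = overall_similarity(*doc_b, c=0.5)
```

With `c = 0` the two candidates tie, while any positive `c` ranks the second one higher, which is the effect the positional term is meant to provide.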
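[0096]
Then, in step S607, it is determined whether any unprocessed non-character area remains in the search key image. If one remains (YES in step S607), the process returns to step S601 to set it as A_1, and the processing of FIG. 6 is repeated, switching the non-character area being processed, until it has been completed for all non-character areas in the search key image. If none remains (NO in step S607), the processing ends and the flow proceeds to step S407 in FIG. 5.
[0097]
In step S407, the registered images that constitute the search result are displayed on the display unit 116 on the basis of the calculated overall similarity. For example, only the registered image with the highest overall similarity may be displayed, or the registered images whose overall similarity is equal to or greater than a threshold may be listed as search result candidates, leaving the final selection to the user.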
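The two display options described for step S407 can be sketched as one selection function. The dictionary of scores keyed by a document identifier, the function name, and the `threshold=None` convention for "best match only" are assumptions for illustration.

```python
def select_results(scores, threshold=None):
    """Choose which registered images to show.

    scores:    dict mapping a registered image id to its overall similarity.
    threshold: None to return only the best match, otherwise return every
               candidate at or above the threshold, best first.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is None:
        return ranked[:1]
    return [(doc, s) for doc, s in ranked if s >= threshold]
```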
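[0098]
In step S408, printing, distribution, storage, or editing of the registered image obtained as the search result is executed on the basis of user operations made through the user interface implemented by the display unit 116 and the input unit 113.
[0099]
As described above, according to this embodiment, for a database that manages registered images registered from paper documents or electronic documents, a printed material printed from one of those registered images is input as a search key image, and the registered image corresponding to the search key image is searched for on the basis of the character feature amounts and image feature amounts of the search key image and the positional relationship between the keywords in its character areas and its non-character areas.
[0100]
As a result, even when there are several registered images with similar content, the registered image corresponding to the search key image can be retrieved more appropriately.
[0101]
More specifically, in this embodiment, even when many registered images are registered in which character areas and non-character areas are mixed and whose composition is similar, the registered image corresponding to the search key image can be retrieved more accurately by using the positional relationship between the non-character areas and the keywords in the character areas of the registered images.
[0102]
In addition, a flexible search is possible even when the layout of the content does not exactly match between the search key image and the registered image.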
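[0103]
Although an embodiment has been described in detail above, the present invention can take the form of, for example, a system, an apparatus, a method, a program, or a storage medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.
[0104]
The present invention also covers the case where it is achieved by supplying a software program that implements the functions of the above embodiment (in the embodiment, a program corresponding to the flowcharts shown in the drawings) to a system or apparatus, directly or remotely, and having the computer of that system or apparatus read out and execute the supplied program code.
[0105]
Accordingly, the program code itself that is installed in a computer in order to implement the functional processing of the present invention on that computer also realizes the present invention. In other words, the present invention includes the computer program itself for implementing its functional processing.
[0106]
In that case, as long as it has the functions of the program, it may take the form of object code, a program executed by an interpreter, script data supplied to an OS, or the like.
[0107]
Recording media for supplying the program include, for example, a floppy (registered trademark) disk, hard disk, optical disk, magneto-optical disk, MO, CD-ROM, CD-R, CD-RW, magnetic tape, nonvolatile memory card, ROM, and DVD (DVD-ROM, DVD-R).
[0108]
As another way of supplying the program, a browser on a client computer can be used to connect to a web page on the Internet and to download from it, to a recording medium such as a hard disk, either the computer program of the present invention itself or a compressed file that includes an automatic installation function. The program can also be supplied by dividing the program code that constitutes it into a plurality of files and downloading each file from a different web page. In other words, a WWW server that allows a plurality of users to download the program files for implementing the functional processing of the present invention on a computer is also included in the present invention.
[0109]
It is also possible to encrypt the program of the present invention, store it on a storage medium such as a CD-ROM, and distribute it to users. Users who satisfy predetermined conditions are allowed to download key information for decryption from a web page via the Internet, and by using that key information they execute the encrypted program and install it on a computer.
[0110]
The functions of the above embodiment are implemented when a computer executes the program it has read out. They can also be implemented when an OS or the like running on the computer performs part or all of the actual processing on the basis of instructions from the program.
[0111]
Furthermore, the functions of the above embodiment are also implemented when the program read from the recording medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, and a CPU or the like on that board or unit then performs part or all of the actual processing on the basis of instructions from the program.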
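[0112]
[Effects of the invention]
As described above, the present invention can provide an image search apparatus capable of accurately retrieving original electronic data from a paper document, a control method thereof, and a program.
[Brief description of the drawings]
[FIG. 1] A block diagram showing the configuration of the image processing system according to the embodiment of the present invention.
[FIG. 2] A block diagram showing the detailed configuration of the MFP according to the embodiment of the present invention.
[FIG. 3] A flowchart showing the registration process for electronic documents according to the embodiment of the present invention.
[FIG. 4] A diagram showing an example of image block extraction according to the embodiment of the present invention.
[FIG. 5] A flowchart showing the search process according to the embodiment of the present invention.
[FIG. 6] A flowchart showing the similarity calculation process according to the embodiment of the present invention.
[FIG. 7] A diagram for explaining an example of selecting keywords in character areas according to the embodiment of the present invention.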
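[Description of reference numerals]
100 MFP
101 Management PC
102 User PC
103 Proxy server
104 Network
105 Database
106 Document management server
107 LAN
110 Image reading unit
111 Storage unit
112 Printing unit
113 Input unit
114, 117 Network I/F
115 Data processing unit
116 Display unit
[0001]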
[Technical field of the invention]
The present invention relates to an image search apparatus, a method, and a program for inputting a printed material as a search key and searching for an image corresponding to the printed material.
[0002]
[Prior art]
In recent years, offices have rapidly gone paperless. Not only are documents created on terminals such as personal computers (PCs) stored electronically, but past paper documents that had long been kept in binders and the like are also converted into image data (electronic documents) with a scanner and stored in a database.
[0003]
On the other hand, paper documents are still preferred for handouts and the like at conferences, and there are many occasions where electronic documents stored in a database are printed out on paper and handed over to users.
[0004]
Here, when a user who has received a paper document wants to store or transmit it electronically, or to extract and reuse its contents, it is preferable to obtain the original electronic (image) data of that paper document from the database rather than to re-digitize the paper document, because less information is lost.
[0005]
In order to respond to such a request, for example, Patent Document 1 proposes a system capable of reading a paper document with a scanner and searching for image data similar to the content from a database.
[0006]
Further, assuming that a search target is a document generally used in an office or the like, there are roughly two types of information in a document: textual information and non-textual information such as a photograph or a figure. At the time of the search, the processing accuracy can be further improved by using the similarity calculation processing according to each characteristic.
[0007]
Further, if the area identification processing disclosed in Patent Document 2 is used, a character area and a non-character area in a document image obtained by scanning a paper document can be separated and extracted. With this technique, the individual areas can be identified even in a document in which characters and non-characters are mixed. For a character area, keywords are extracted from the character code string obtained by character recognition, and a similarity is obtained from the degree of match when they are collated with the character code strings of the registered data. For a non-character area, a similarity based on image feature amounts such as color and edges is obtained. By adding these similarities with suitable weights, a search using both character and non-character similarity information has been possible.
[0008]
[Patent Document 1]
Japanese Patent No. 3017851
[Patent Document 2]
US Pat. No. 5,680,478
[0009]
[Problems to be solved by the invention]
In such an environment as described above, in particular, document image data registered in an office or the like may include many similar document image data. For example, by using drawing parts and texts, many documents containing the same photograph but different texts, or many documents with the same text but different figures are created and registered.
[0010]
On the other hand, when comparing the similarity between a paper document image and a registered image, the conventional method of obtaining the character and non-character similarities independently and summing them with suitable weights has the problem that the weighting greatly affects search accuracy. In particular, when the two similarities are strongly biased, it has been difficult to always find weights suitable for the search.
[0011]
In addition, in order to specify a document image of an original document from a group of document images having similar contents, there has been a method of performing a search including the position of a character or non-character area in the feature amount.
[0012]
However, when the original document is an electronic document, the layout of a paper document created from it may not exactly match that of the original. For example, in the case of an electronic document in HTML format, the layout at the time of printout differs greatly depending on the environment. In such a case, using the position information of the character and non-character areas in the paper document to search for the original electronic document may actually worsen search accuracy.
[0013]
The present invention has been made to solve the above problems, and its object is to provide an image search apparatus capable of accurately retrieving original electronic data from a paper document, a control method thereof, and a program.
[0014]
[Means for Solving the Problems]
An image search apparatus according to the present invention for achieving the above object has the following configuration. That is,
an image search apparatus that receives a printed material as a search key image and searches for a registered image corresponding to the search key image comprises:
area dividing means for dividing the search key image into a character area and a non-character area;
calculating means for calculating a similarity between the search key image and the registered image on the basis of a character feature amount of the character area, an image feature amount of the non-character area, and the positional relationship between the character area and the non-character area; and
search means for searching for a registered image corresponding to the search key image on the basis of the calculation result of the calculating means.
[0015]
Preferably, the character feature amount is a character code obtained by character recognition of the character region.
[0016]
Preferably, when the similarity based on the image feature amount of the non-character area is equal to or greater than a threshold, the calculating means calculates the similarity between the search key image and the registered image on the basis of the positional relationship between a predetermined character string in the character area and the non-character area.
[0017]
Preferably, the calculating means calculates, as the similarity, the correlation between predetermined character strings in the character areas located within a predetermined range of the non-character areas being processed in both the search key image and the registered image.
[0018]
Preferably, the calculating means compares, between the search key image and the registered image, the distances from the non-character area being processed to a predetermined character string in the character area located within its predetermined range, and calculates a similarity based on the comparison result.
[0019]
Preferably, the calculating means compares, between the search key image and the registered image, the angles from the non-character area being processed toward a predetermined character string in the character area located within its predetermined range, and calculates a similarity based on the comparison result.
[0020]
An image search method according to the present invention for achieving the above object has the following configuration. That is,
an image search method for receiving a printed material as a search key image and searching for a registered image corresponding to the search key image comprises:
an area dividing step of dividing the search key image into a character area and a non-character area;
a calculating step of calculating similarities between the search key image and a plurality of registered images stored in a storage medium, on the basis of a character feature amount of the character area, an image feature amount of the non-character area, and the positional relationship between the character area and the non-character area; and
a search step of searching the plurality of registered images stored in the storage medium for a registered image corresponding to the search key image, on the basis of the calculation result of the calculating step.
[0021]
A program according to the present invention for achieving the above object has the following configuration. That is,
a program for implementing an image search that receives a printed material as a search key image and searches for a registered image corresponding to the search key image comprises:
program code for an area dividing step of dividing the search key image into a character area and a non-character area;
program code for a calculating step of calculating similarities between the search key image and a plurality of registered images stored in a storage medium, on the basis of a character feature amount of the character area, an image feature amount of the non-character area, and the positional relationship between the character area and the non-character area; and
program code for a search step of searching the plurality of registered images stored in the storage medium for a registered image corresponding to the search key image, on the basis of the calculation result of the calculating step.
[0022]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0023]
FIG. 1 is a block diagram illustrating a configuration of an image processing system according to an embodiment of the present invention.
[0024]
This image processing system is realized in an environment where the office 10 and the office 20 are connected by a network 104 such as the Internet.
[0025]
Connected to the LAN 107 built in the office 10 are an MFP (Multi Function Peripheral) 100, which is a multifunction peripheral that implements a plurality of types of functions, a management PC 101 that controls the MFP 100, a document management server 106 and its database 105, and a proxy server 103.
[0026]
A user PC 102 functioning as a user of the MFP 100 or the document management server 106 is connected to the LAN 108 built in the office 20, for example.
[0027]
The LAN 107 in the office 10 and the LAN 108 in the office 20 are connected to the network 104 via the proxy server 103 in both offices.
[0028]
The MFP 100 includes, in particular, an image reading unit that electronically reads a paper document and an image processing unit that performs image processing on the image signal obtained from the image reading unit. This image signal can be transmitted to the management PC 101 via the LAN 109.
[0029]
The management PC 101 is an ordinary PC that internally has various components such as an image storage unit, an image processing unit, a display unit, and an input unit, some of which are integrated into the MFP 100.
[0030]
The network 104 is a so-called communication network, typically realized by any one or a combination of the Internet, a LAN, a WAN, a telephone line, a dedicated digital line, an ATM or frame relay line, a communication satellite line, a cable television line, a wireless line for data broadcasting, and the like. It suffices that data can be transmitted and received.
[0031]
The various terminals, such as the management PC 101, the user PC 102, and the document management server, each have the standard components of a general-purpose computer (for example, a CPU, RAM, ROM, hard disk, external storage device, network interface, display, keyboard, and mouse).
[0032]
Further, the configuration of the image processing system in FIG. 1 is an example, and for example, the office 10 and the office 20 may be configured on the same LAN.
[0033]
Next, the detailed configuration of the MFP 100 will be described with reference to FIG. 2.
[0034]
FIG. 2 is a block diagram showing a detailed configuration of the MFP according to the embodiment of the present invention.
[0035]
In FIG. 2, the image reading unit 110, which includes a document table and an automatic document feeder (ADF), illuminates a bundle of documents or a single document with a light source (not shown), forms the reflected image of the document on a solid-state image sensor through a lens, and obtains from the sensor a raster-scanned image reading signal as a raster image of a predetermined density (for example, 600 DPI).
[0036]
In the present embodiment, a paper document is used as an example of the printed material read by the image reading unit 110, but printed material on a recording medium other than paper (for example, a transparent original such as an OHP sheet or film, or cloth) may also be read by the image reading unit 110.
[0037]
The MFP 100 also has a copy function of printing an image corresponding to the image reading signal on a recording medium with the printing unit 112. When one copy of a document image is made, the data processing unit 115 processes the image reading signal to generate a recording signal, which the printing unit 112 prints on a recording medium. When multiple copies are made, the recording signal for one copy is first stored in the storage unit 111 and then output sequentially to the printing unit 112 for printing on recording media.
[0038]
As the printing function of the MFP 100, a recording signal output from the user PC 102 is received by the data processing unit 115 via the LAN 107 and the network IF 114. The data processing unit 115 converts the recording signal into raster data that the printing unit 112 can record, and the printing unit 112 then prints it on a recording medium.
[0039]
As a transmission function of the MFP 100, the MFP 100 converts a raster image into electronic data in a compressed image file format such as TIFF or JPEG or a network file format such as PDF and outputs the same from the network IF 114. The output electronic data can be transmitted to the document management server 106 via the LAN 107 or further transferred to the user PC 102 via the network 104.
[0040]
Operator instructions to the MFP 100 are given from the input unit 113, which consists of a key operation unit provided on the MFP 100 and a keyboard and mouse connected to the management PC 101, and this series of operations is controlled by a control unit (not shown) in the data processing unit 115. The status of operation input and the image data being processed are displayed on the display unit 116.
[0041]
The storage unit 111 is also controlled by the management PC 101, and data transmission and reception and control between the MFP 100 and the management PC 101 are performed via the network IF 117 and the LAN 109.
[0042]
In the MFP 100, the display unit 116 and the input unit 113 implement a user interface for providing a user with various operations and displays for executing various processes described below.
[0043]
The processing executed by the image processing system according to the present invention is roughly classified into a registration processing for registering image data and a search processing for searching for desired image data.
[0044]
In the present embodiment, the processing described below is executed by, for example, the MFP 100 alone. However, to improve the processing efficiency of the image processing system as a whole, the various kinds of processing described below may be distributed among the terminals that make up the image processing system.
[0045]
First, the registration process will be described.
[0046]
As methods of registering image data in the registration process, there are the case where electronic data created by a document creation application or the like is registered as the original document and the case where a paper document is registered as the original document. In this registration process, the feature amounts needed for searching are extracted from the registered original document data and registered in the database in association with the original document data.
[0047]
FIG. 3 is a flowchart showing the registration processing according to the embodiment of the present invention.
[0048]
First, in step S301, it is determined whether the registration target is a paper document or electronic data.
[0049]
This determination is made, for example, on the basis of the operation of the paper document registration button and the electronic data registration button provided on the input unit 113.
[0050]
In step S301, if the paper document registration button is operated (YES in step S301), the process proceeds to step S302, and the image reading unit 110 of the MFP 100 scans the paper document in raster form to obtain a raster image (page image).
[0051]
On the other hand, if the electronic data registration button has been operated (NO in step S301), the process proceeds to step S303. The electronic data of the original document to be registered, which is stored in the hard disk of the client PC 102, in the database 105 of the document management server 106 in the office 10, or in the storage unit 111 of the MFP 100, is read and input to the data processing unit 115 via the network IF 114, and the data processing unit 115 converts the electronic data into a raster image (page image).
[0052]
The conversion into the raster image may be realized by using the rasterization function of the application that generated the electronic data to be registered, or of additional software.
[0053]
Next, in step S304, a region division process is performed on the page image to extract a character region and a non-character region (for example, a photograph, a drawing, and the like) from the page image.
[0054]
Here, in the area division processing, for example, the page image of FIG. 4A is recognized as a set of meaningful blocks, as shown in FIG. 4B. The attribute of each block (character (TEXT) / picture (PICTURE) / photograph (PHOTO) / line (LINE) / table (TABLE)) is determined, and the page image is divided into blocks having different attributes.
[0055]
An embodiment of the area dividing process will be described below.
[0056]
First, the input image is binarized into black and white, and contour tracing is performed to extract blocks of pixels surrounded by black pixel contours. For a block of black pixels having a large area, contour tracing is also performed on the white pixels inside it to extract blocks of white pixels.
[0057]
The blocks of black pixels obtained in this manner are classified by size and shape into blocks having different attributes. For example, a block whose aspect ratio is close to 1 and whose size falls within a certain range is treated as a pixel block equivalent to a character, and a portion where adjacent characters can be grouped in a well-aligned manner is treated as a character block. A flat pixel block is treated as a line block. The area occupied by a black pixel block that is larger than a certain size and contains well-aligned rectangular white pixel blocks is treated as a table block. An area where irregularly shaped pixel blocks are scattered is treated as a photo block, and a pixel block of any other shape is treated as a picture block.
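The classification rules above can be sketched roughly as follows. This is only an illustration: the function name `classify_block`, every threshold value, and the two boolean flags (which stand in for the white-rectangle and scatter analyses that the text does not detail) are assumptions, as is the choice to label the fall-through case PICTURE.

```python
def classify_block(width, height, *, char_min=8, char_max=40, flat_ratio=10.0,
                   has_aligned_white_rects=False, is_scattered=False):
    """Assign an attribute to one black-pixel block from its bounding box.

    All thresholds are hypothetical; the embodiment gives no numeric values.
    """
    longer, shorter = max(width, height), max(1, min(width, height))
    aspect = longer / shorter
    if aspect >= flat_ratio:
        return "LINE"                      # flat pixel block
    if aspect <= 1.5 and char_min <= longer <= char_max:
        return "CHAR"                      # character-equivalent block
    if longer > char_max and has_aligned_white_rects:
        return "TABLE"                     # large block with aligned white rectangles
    if is_scattered:
        return "PHOTO"                     # irregular blocks scattered over an area
    return "PICTURE"                       # any other shape (assumed label)
```

Grouping adjacent character-equivalent blocks into character (TEXT) blocks would be a separate pass and is not shown here.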
[0058]
In the present embodiment, a character block is treated as a character region, and a picture block and a photo block are treated as non-character regions. A table block or a line block may also be included in the non-character regions.
[0059]
Next, in step S305, each non-character area in the page image is set as one piece of non-character image information, and the image feature amount is extracted. Then, the image feature amount of each non-character area and the position in the page image are registered as search data in the database 105 in association with the page image.
[0060]
The image feature amount may be extracted by a known method, so the details are not described here. One example is a method of dividing the processing target image into meshes and forming a vector whose elements are the average colors of the mesh regions.
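As a minimal sketch of the mesh-based average-color feature mentioned above, the following Python code splits an image into a grid and concatenates the average R, G, and B values of each cell into one vector. The function name `mesh_color_feature`, the representation of the image as nested lists of (R, G, B) tuples, and the grid sizes are illustrative assumptions, not details given in the embodiment.

```python
def mesh_color_feature(image, rows, cols):
    """Return a flat vector of per-cell average (R, G, B) values.

    `image` is a list of pixel rows; each pixel is an (R, G, B) tuple.
    Assumes the image has at least `rows` pixel rows and `cols` pixel columns.
    """
    height, width = len(image), len(image[0])
    feature = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                feature.append(sum(p[channel] for p in pixels) / len(pixels))
    return feature
```

Two regions can then be compared by the distance between their vectors, provided both were extracted with the same grid size.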
[0061]
Besides the average color of the image to be processed, the most frequent color may be extracted as the image feature amount. Furthermore, any one of a plurality of types of image feature amounts, or any combination of them, may be used: for example, luminance feature amounts such as the most frequent luminance and the average luminance, texture feature amounts represented by the co-occurrence matrix, contrast, entropy, and the Gabor transform, and shape feature amounts such as edges and Fourier descriptors.
[0062]
Next, in step S306, character recognition is performed on all character regions in the page image to obtain a character code string, and this is registered in the database 105 as a character feature amount in association with the page image.
[0063]
The character recognition process may use a known method. For example, edge components or the like are acquired from a character image and converted into a feature vector, the similarity between this vector and each feature vector in a recognition dictionary registered in advance for each character type is calculated, and the character code of the character type having the highest similarity is used as the recognition result.
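The dictionary-matching step can be illustrated with the short sketch below. Cosine similarity is used here only as one plausible similarity measure; the embodiment does not name one. The names `cosine_similarity` and `recognize` and the toy dictionary in the usage note are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recognize(feature, dictionary):
    """Return the character code whose dictionary vector is most similar.

    `dictionary` maps a character code to its registered feature vector.
    """
    return max(dictionary, key=lambda code: cosine_similarity(feature, dictionary[code]))
```

For instance, with a two-entry dictionary `{"A": [1, 0, 0], "B": [0, 1, 0]}`, a feature vector close to the first axis is recognized as "A".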
[0064]
Although a character code is used as the character feature amount here, the part of speech of each word may instead be determined in advance by matching against a word dictionary, and words that are nouns may be used as the character feature amount.
[0065]
In step S307, a word serving as a keyword is extracted from the character code string acquired in step S306, added to the character feature amount along with its position, and registered in the database 105 in association with the page image.
[0066]
There are various methods for keyword extraction. Here, based on an analysis of character codes with a high frequency of appearance and on part-of-speech analysis, character codes recognized as proper nouns are given priority, and up to five keywords are extracted. However, this is only an example, and it goes without saying that other keyword extraction methods may be used and that an arbitrary number of keywords may be set.
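One way to read the keyword rule above is sketched below: words are ranked with proper nouns first and then by frequency, and at most five are kept. The input format (a list of `(word, part_of_speech)` pairs produced by some earlier analysis), the tag name `"PROPN"`, and the alphabetical tie-break are assumptions made for the sake of a runnable example.

```python
from collections import Counter

def extract_keywords(tagged_words, max_keywords=5):
    """Pick up to `max_keywords` keywords, proper nouns first, then by frequency."""
    counts = Counter(word for word, _ in tagged_words)
    proper_nouns = {word for word, pos in tagged_words if pos == "PROPN"}
    # Sort key: proper nouns first, higher frequency first, then alphabetical
    # order so that the result is deterministic.
    ranked = sorted(counts, key=lambda w: (w not in proper_nouns, -counts[w], w))
    return ranked[:max_keywords]
```

In the embodiment the position of each keyword is stored along with it; that bookkeeping is omitted here.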
[0067]
Next, the search processing will be described.
[0068]
In the search processing, a feature amount is extracted from a page image (search key image) obtained by scanning a paper document as a search key, and the extracted feature amount is compared with a feature amount group of a registered image (page image) registered in the database 105. A registered image having a high degree of similarity is output as a search result. Further, the user can execute various processes such as printing, distribution, accumulation, and editing on the output registered image.
[0069]
FIG. 5 is a flowchart showing a search process according to the embodiment of the present invention.
[0070]
First, in step S401, a paper document serving as a search key is input via the image reading unit 110 of the MFP 100. This process is the same as the process in step S302. However, the raster image (page image) generated by this process is only temporarily stored.
[0071]
Next, in step S402, a region division process is performed on the page image (search key image) to extract a character region and a non-character region in the search key image. This processing is the same as the processing in step S304. However, the position of each area generated by this processing is only temporarily stored.
[0072]
Next, in step S403, each non-character area in the search key image is set as one piece of non-character image information, and the image feature amount is extracted. Then, the image feature amount of each non-character area and the position in the page image are stored in the storage unit 111 as the image feature amount of the search key image in association with the search key image. This process is the same as the process in step S305.
[0073]
Next, in step S404, character recognition is performed on all character regions in the search key image to obtain a character code string, and this is stored in the storage unit 111 as the character feature amount of the search key image in association with the search key image. This process is the same as the process in step S306.
[0074]
Next, in step S405, a word serving as a keyword is extracted from the character code string acquired in step S404, added to the character feature amount of the search key image along with its position, and stored in the storage unit 111 in association with the search key image. This process is the same as the process in step S307.
[0075]
Next, in step S406, similarity comparisons are performed between the search key image and each registered image (page image) registered in the database 105 by the registration processing of FIG. 3, including a similarity comparison for keywords in the character areas determined as processing targets based on those comparisons, and a similarity is calculated as the result.
[0076]
The following three types of similarity comparison are performed.
[0077]
The first is the similarity between the image feature amounts of the non-character areas. Specifically, it is calculated from how close the image feature amounts of the search key image and the registered image are in the feature vector space. When the comparison destination image (the registered image in the database 105) contains a plurality of non-character areas, the similarity between the image feature amount of one non-character area in the search key image (the comparison source image) and the image feature amount of each non-character area in the comparison destination image is calculated, the area is associated with the non-character area having the highest similarity, and the cumulative similarity over the plurality of non-character areas is calculated.
[0078]
However, when the number of non-character areas in the comparison destination image is large and the number of combinations increases, the combination of non-character areas to be compared may be limited depending on the position. The similarity based on the image feature amount calculated as described above is defined as S1.
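The calculation of S1 can be sketched as follows. The embodiment only says that similarity reflects closeness in feature vector space and that a cumulative value is formed; mapping Euclidean distance to `1 / (1 + d)` and summing the best matches are assumptions of this sketch, as are the function names.

```python
import math

def region_similarity(f1, f2):
    """Map the Euclidean distance between two feature vectors into (0, 1]."""
    return 1.0 / (1.0 + math.dist(f1, f2))

def similarity_s1(key_features, registered_features):
    """Match each key-image region to its most similar registered region.

    Returns the summed best similarities and the (key index, registered index)
    pairs that produced them.
    """
    total, matches = 0.0, []
    if not registered_features:
        return total, matches
    for i, key_vec in enumerate(key_features):
        sims = [region_similarity(key_vec, reg_vec) for reg_vec in registered_features]
        best_j = max(range(len(sims)), key=sims.__getitem__)
        matches.append((i, best_j))
        total += sims[best_j]
    return total, matches
```

The returned pairs can be reused later when the region most similar to a given non-character area is needed, so the comparison does not have to be repeated. Restricting candidates by position, as suggested above, would amount to filtering `registered_features` before the inner comparison.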
[0079]
The second is the similarity between the character feature amounts of the character areas, which is calculated from how close the character code strings of the search key image and the registered image are. However, the character codes obtained by character recognition include recognition errors due to various factors, and therefore cannot be expected to match completely. In the present embodiment, therefore, the probability that the keywords extracted from the character code string of the character area of the search key image are present in the character code string of the character area of the registered image is taken as the similarity based on the character feature amount, and is defined as the similarity S2.
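A simple reading of S2 is the fraction of the key image's keywords that occur somewhere in the registered image's recognized text, which the sketch below computes. Interpreting the "probability" as this fraction, and using exact substring matching, are assumptions; the name `similarity_s2` is hypothetical.

```python
def similarity_s2(key_keywords, registered_text):
    """Fraction of key-image keywords found in the registered character code string."""
    if not key_keywords:
        return 0.0
    hits = sum(1 for keyword in key_keywords if keyword in registered_text)
    return hits / len(key_keywords)
```

Because only a handful of keywords are checked rather than the whole string, a few misrecognized characters elsewhere in the text do not affect the result.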
[0080]
The third is a similarity calculated between the search key image and the registered image based on the positional relationship between the non-character area and the character area. Details of this similarity calculation processing will be described with reference to FIG. 6.
[0081]
FIG. 6 is a flowchart showing the similarity calculation processing according to the embodiment of the present invention.
[0082]
First, in step S601, one of the non-character areas in the search key image is set as the non-character area A1 to be processed. If the search key image contains only one non-character area, that area is set as A1.
[0083]
Next, in step S602, the non-character area in the registered image that has the highest similarity to the non-character area A1 is acquired as the non-character area A2. This area may be obtained during the calculation of the similarity based on the image feature amount described above.
[0084]
Next, in step S603, it is determined whether the similarity is equal to or greater than a threshold, that is, whether the non-character area A1 and the non-character area A2 are the same (or similar) images. If the similarity is less than the threshold (NO in step S603), it is determined that the images are not the same (or similar), and the process proceeds to step S607. On the other hand, if it is equal to or greater than the threshold (YES in step S603), it is determined that the images are the same (or similar), and the process proceeds to step S604.
[0085]
Next, in step S604, referring to the character area group within a predetermined range of the non-character area A1 in the search key image, the keyword group of the search key image included in that character area group is selected, and these keywords are denoted {N11, N12, ...}.
[0086]
Here, a specific example of step S604 will be described with reference to FIG.
[0087]
FIG. 7 is a diagram for explaining an example of selecting a keyword in a character area according to the embodiment of the present invention.
[0088]
The example of FIG. 7 shows a case in which keywords are selected from the character areas lying within a square range that is centered on the center of gravity of the non-character area 700 and has sides of length 2 × L. In FIG. 7, when the keywords of the search key image are "yyyy" and "xxxx", the keyword "yyyy" is selected from the character area 701 and the keyword "xxxx" is selected from the character area 702, out of the character areas 701 to 703 that lie within the predetermined range of the non-character area 700 (in this embodiment, the 2 × L square range).
[0089]
Note that the value of L may be fixed or may be changed according to the size of the non-character area to be processed or the amount of characters in a nearby character area.
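The selection of keywords near a non-character area can be sketched as below. Bounding boxes are given as `(x0, y0, x1, y1)`, and the center of a box is used as a stand-in for the center of gravity. Testing whether each keyword's center falls inside the 2 × L square is an assumption of this sketch; the text could also be read as testing the enclosing character area. All names and coordinates are hypothetical.

```python
def center(box):
    """Center point of an (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def select_keywords(region_box, keywords, L):
    """Return the keywords whose centers lie within the 2L x 2L square
    centered on the non-character region.

    `keywords` is a list of (word, box) pairs.
    """
    cx, cy = center(region_box)
    selected = []
    for word, box in keywords:
        kx, ky = center(box)
        if abs(kx - cx) <= L and abs(ky - cy) <= L:
            selected.append(word)
    return selected
```

Making L depend on the region size, as the text allows, could be done by computing it from `region_box` before the call.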
[0090]
Next, in step S605, referring to the character area group within a predetermined range of the non-character area A2 in the registered image, the keyword group of the registered image included in that character area group is selected, and these keywords are denoted {N21, N22, ...}.
[0091]
Next, in step S606, a score indicating the correlation is calculated based on the correlation {N11, N12, ...} × {N21, N22, ...}.
[0092]
Specifically, if the keyword group {N21, N22, ...} contains a keyword that matches the first keyword N11, 1 is added to the score, and the final number of matching keywords is taken as the score indicating the correlation. Alternatively, for a keyword N2n that matches the keyword N11, a larger value may be added when, between the search key image and the registered image, the ratio of the distances from the center of gravity of the non-character area to the center of gravity of the keyword is closer to 1, or when the angles from the center of gravity of the non-character area to the center of gravity of the keyword agree more closely.
[0093]
The keyword to be processed is then changed in turn, the score for each keyword is calculated in the same way, and the total of the scores of all keywords is set as the similarity S3.
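Both scoring variants can be sketched in one function. Each matching keyword adds 1; with `use_geometry=True`, an extra bonus between 0 and 1 is added that grows as the distance ratio approaches 1 and the angle difference approaches 0. The exact form of that bonus (the product of the two agreement measures) is an assumption, since the text only says such keywords may be weighted more heavily. Names and coordinates are hypothetical.

```python
import math

def keyword_score(key_kws, reg_kws, key_center, reg_center, use_geometry=False):
    """Score the agreement of keywords around a matched pair of regions.

    `key_kws` and `reg_kws` map each keyword to the (x, y) center of the keyword;
    `key_center` and `reg_center` are the centers of the two non-character regions.
    """
    score = 0.0
    for word, (kx, ky) in key_kws.items():
        if word not in reg_kws:
            continue
        score += 1.0                                    # plain match
        if not use_geometry:
            continue
        rx, ry = reg_kws[word]
        d_key = math.hypot(kx - key_center[0], ky - key_center[1])
        d_reg = math.hypot(rx - reg_center[0], ry - reg_center[1])
        if d_key == 0 or d_reg == 0:
            continue
        ratio = min(d_key, d_reg) / max(d_key, d_reg)   # 1.0 when distances agree
        a_key = math.atan2(ky - key_center[1], kx - key_center[0])
        a_reg = math.atan2(ry - reg_center[1], rx - reg_center[0])
        diff = abs(a_key - a_reg)
        diff = min(diff, 2 * math.pi - diff)            # wrap into [0, pi]
        score += ratio * (1.0 - diff / math.pi)         # assumed bonus form
    return score
```

Using relative distance and angle rather than absolute coordinates keeps the score stable when the scanned page is shifted or scaled relative to the registered image.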
[0094]
Then, from the three calculated similarities S1, S2, and S3 and the coefficients (a, b, c), the overall similarity
S = a × S1 + b × S2 + c × S3
is calculated.
[0095]
Note that the coefficients (a, b, c) may be fixed values, may be changed according to the distribution of character areas and non-character areas in the processing target image, or may be set arbitrarily by the user when executing the search processing. Here, a is a coefficient that weights the similarity comparison based on the image feature amount, b is a coefficient that weights the similarity comparison based on the character feature amount, and c is a coefficient that weights the similarity comparison based on the keywords.
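The weighted sum is straightforward to express in code. The default weights of 1.0 and the sample values in the usage note are placeholders chosen for illustration, not values from the embodiment.

```python
def overall_similarity(s1, s2, s3, a=1.0, b=1.0, c=1.0):
    """Overall similarity S = a*S1 + b*S2 + c*S3."""
    return a * s1 + b * s2 + c * s3
```

For example, with S1 = 0.8, S2 = 0.5, S3 = 2 and weights (a, b, c) = (0.5, 0.3, 0.2), the result is 0.4 + 0.15 + 0.4 = 0.95. Since S3 is a count rather than a value bounded by 1, c may need to be scaled accordingly.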
[0096]
Then, in step S607, it is determined whether an unprocessed non-character area remains in the search key image. If one remains (YES in step S607), the process returns to step S601 to set that unprocessed non-character area as A1, and the processing of FIG. 6 is repeated, switching the non-character area to be processed, until all non-character areas in the search key image have been processed. On the other hand, if no unprocessed non-character area remains (NO in step S607), the processing ends, and the flow advances to step S407 in FIG. 5.
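The loop of FIG. 6 can be summarized in the sketch below, with the step numbers noted in comments. The region comparison and the keyword scoring are passed in as callables so that the control flow stands on its own. Accumulating the per-region scores by summation, and the toy data structures used in any call, are assumptions of this sketch.

```python
def similarity_s3(key_regions, reg_regions, region_sim, keyword_score, threshold):
    """Accumulate keyword scores over all non-character regions of the key image.

    `region_sim(a1, a2)` returns an image-feature similarity;
    `keyword_score(a1, a2)` returns the score for keywords near the two regions.
    """
    total = 0.0
    if not reg_regions:
        return total
    for a1 in key_regions:                                       # S601, repeated via S607
        a2 = max(reg_regions, key=lambda r: region_sim(a1, r))   # S602: best match
        if region_sim(a1, a2) < threshold:                       # S603: not the same image
            continue
        total += keyword_score(a1, a2)                           # S604 to S606
    return total
```

Skipping regions that fall below the threshold means keywords are compared only around images that are judged to be the same or similar in both pages.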
[0097]
In step S407, registered images serving as search results are displayed on the display unit 116 based on the calculated overall similarity. For example, only the registered image having the highest overall similarity may be displayed, or the registered images whose overall similarity is equal to or greater than a threshold may be displayed as a list of search result candidates, leaving the final selection to the user.
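The two display policies can be sketched as a single selection function. The mapping from a registered-image identifier to its overall similarity, the function name `select_results`, and the identifiers in the usage note are hypothetical.

```python
def select_results(similarities, threshold=None):
    """Choose which registered images to present.

    With no threshold, return only the image(s) with the highest overall
    similarity; otherwise return all images at or above the threshold,
    ordered from most to least similar.
    """
    if not similarities:
        return []
    if threshold is None:
        best = max(similarities.values())
        return [image_id for image_id, s in similarities.items() if s == best]
    candidates = [(image_id, s) for image_id, s in similarities.items() if s >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [image_id for image_id, _ in candidates]
```

Given similarities of 0.4, 0.9, and 0.7 for three images, the first policy returns only the image scoring 0.9, while a threshold of 0.5 returns the images scoring 0.9 and 0.7 in that order.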
[0098]
In step S408, one of printing, distribution, storage, and editing is performed on the registered image serving as the search result, based on an operation by the user via the user interface realized by the display unit 116 and the input unit 113.
[0099]
As described above, according to the present embodiment, a printed matter printed using a registered image is input as a search key image to a database for managing registered images registered by a paper document or an electronic document. Then, a registered image corresponding to the search key image is searched based on the character feature amount of the search key image, the image feature amount, and the positional relationship between the keyword in the character region of the search key image and the non-character region.
[0100]
As a result, even if there are a plurality of registered images having similar contents, the registered image corresponding to the search key image can be more appropriately searched.
[0101]
More specifically, in the present embodiment, even when a large number of registered images in which character areas and non-character areas are mixed and which have similar configurations are registered, the registered image corresponding to the search key image can be searched for more accurately by using the positional relationship between the non-character area and the keywords in the character area.
[0102]
Moreover, even when the layout of the contents between the search key image and the registered image does not exactly match, a flexible search can be performed.
[0103]
The embodiment has been described in detail above. However, the present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a storage medium. Specifically, it may be applied to a system including a plurality of devices or to an apparatus consisting of a single device.
[0104]
The present invention is also achieved by supplying a software program (in the embodiment, a program corresponding to the flowcharts shown in the drawings) that realizes the functions of the above-described embodiment to a system or an apparatus, directly or remotely, and having a computer of the system or apparatus read and execute the supplied program code.
[0105]
Therefore, since the functions of the present invention are implemented by computer, the program code installed in the computer also implements the present invention. That is, the present invention includes the computer program itself for realizing the functional processing of the present invention.
[0106]
In that case, as long as it has the function of the program, it may be in the form of object code, a program executed by the interpreter, script data supplied to the OS, or the like.
[0107]
Examples of a recording medium for supplying the program include a floppy (registered trademark) disk, hard disk, optical disk, magneto-optical disk, MO, CD-ROM, CD-R, CD-RW, magnetic tape, non-volatile memory card, ROM, and DVD (DVD-ROM, DVD-R).
[0108]
As another method of supplying the program, a client computer may connect to an Internet homepage using a browser and download from the homepage, to a recording medium such as a hard disk, either the computer program of the present invention itself or a compressed file including an automatic installation function. The present invention can also be realized by dividing the program code constituting the program of the present invention into a plurality of files and downloading each file from a different homepage. In other words, the present invention also includes a WWW server that allows a plurality of users to download a program file for realizing the functional processing of the present invention on a computer.
[0109]
It is also possible to encrypt the program of the present invention, store it on a storage medium such as a CD-ROM, and distribute it to users; to allow users who satisfy predetermined conditions to download key information for decryption from a homepage via the Internet; and to have them execute the encrypted program using the key information so that the program is installed on a computer.
[0110]
The functions of the above-described embodiment are realized when the computer executes the read program. They can also be realized when an OS or the like running on the computer performs part or all of the actual processing based on the instructions of the program.
[0111]
Furthermore, after the program read from the recording medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like provided on the function expansion board or in the function expansion unit may perform part or all of the actual processing, and this processing also realizes the functions of the above-described embodiment.
[0112]
[Effects of the invention]
As described above, according to the present invention, it is possible to provide an image search apparatus capable of accurately searching original electronic data from a paper document, a control method thereof, and a program.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an image processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a detailed configuration of the MFP according to the embodiment of the present invention.
FIG. 3 is a flowchart illustrating registration processing of an electronic document according to the embodiment of the present invention.
FIG. 4 is a diagram showing an example of image block extraction according to the embodiment of the present invention.
FIG. 5 is a flowchart illustrating a search process according to the embodiment of the present invention.
FIG. 6 is a flowchart illustrating a similarity calculation process according to the embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of selecting a keyword in a character area according to the embodiment of the present invention.
[Explanation of symbols]
100 MFP
101 Management PC
102 User PC
103 Proxy server
104 Network
105 Database
106 Document management server
107 LAN
110 Image reading unit
111 Storage unit
112 Printing unit
113 Input unit
114, 117 Network I/F
115 Data processing unit 116 Display unit

Claims

An image search device for inputting a printed material as a search key image and searching for a registered image corresponding to the search key image, comprising:
region dividing means for dividing the search key image into a character region and a non-character region;
calculating means for calculating a similarity between the search key image and the registered image based on a character feature amount of the character region, an image feature amount of the non-character region, and a positional relationship between the character region and the non-character region; and
search means for searching for a registered image corresponding to the search key image based on a calculation result by the calculating means.

The apparatus according to claim 1, wherein the character feature amount is a character code obtained by character recognition of the character area.

The image search device according to claim 1, wherein, when the similarity based on the image feature amount of the non-character area is equal to or greater than a threshold, the calculating means calculates the similarity between the search key image and the registered image based on the positional relationship between a predetermined character string in the character area and the non-character area.

The image search device according to claim 1, wherein the calculating means calculates, as the similarity, a correlation between predetermined character strings in the character regions located within a predetermined range of the non-character region to be processed in each of the search key image and the registered image.

The image search device according to claim 1, wherein the calculating means compares, between the search key image and the registered image, the distance from the non-character area to be processed to a predetermined character string in the character area located within the predetermined range, and calculates the similarity based on the comparison result.

The image search device according to claim 1, wherein the calculating means compares, between the search key image and the registered image, the angle from the non-character area to be processed to a predetermined character string in the character area located within the predetermined range, and calculates the similarity based on the comparison result.

An image search method for inputting a printed material as a search key image and searching for a registered image corresponding to the search key image, comprising:
an area dividing step of dividing the search key image into a character area and a non-character area;
a calculation step of calculating a similarity between the search key image and each of a plurality of registered images stored in a storage medium, based on a character feature amount of the character area, an image feature amount of the non-character area, and a positional relationship between the character area and the non-character area; and
a search step of searching the plurality of registered images stored in the storage medium for a registered image corresponding to the search key image, based on a result of the calculation in the calculation step.

A program for implementing an image search that inputs a printed material as a search key image and searches for a registered image corresponding to the search key image, comprising:
program code for an area dividing step of dividing the search key image into a character area and a non-character area;
program code for a calculation step of calculating a similarity between the search key image and each of a plurality of registered images stored in a storage medium, based on a character feature amount of the character area, an image feature amount of the non-character area, and a positional relationship between the character area and the non-character area; and
program code for a search step of searching the plurality of registered images stored in the storage medium for a registered image corresponding to the search key image, based on a result of the calculation in the calculation step.