JP3917349B2

JP3917349B2 - Retrieval device and method for retrieving information using character recognition result

Info

Publication number: JP3917349B2
Application number: JP2000159688A
Authority: JP
Inventors: 裕勝山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2000-05-30
Filing date: 2000-05-30
Publication date: 2007-05-23
Anticipated expiration: 2020-05-30
Also published as: JP2001337993A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像情報に対する文字認識処理の結果を利用して、画像内の情報を検索する検索装置およびその方法に関する。
【０００２】
【従来の技術】
近年、企業等の組織内では、情報の共有化および迅速な情報利用の観点から、文書を電子化して共有する文書管理システムが使用されている。電子化文書としては、作成時から電子化されているワードプロセッサ文書やプレゼンテーション文書等のテキスト検索可能な文書がよく使用されるが、その他にも、そのままではテキスト検索ができない画像情報も使用される。
【０００３】
画像を文書管理システムに登録する方法としては、後の検索処理のために、画像を文字認識して検索可能なテキスト情報を抽出し、その情報を画像とともに保存する方法が一般的である。
【０００４】
しかし、文字認識処理の認識率は１００％ではなく、認識結果には誤認識も含まれる。また、通常の検索方法では、誤認識を検出することができない。このため、文書中にキーワードの文字列があるにもかかわらず、その文字列が誤認識のためにキーワードと一致せず、検索できない場合がある。
【０００５】
これを防ぐために、文字認識の特性を利用した検索用データ作成方法と、文字認識後のデータに対する検索方法の開発が必要とされている。文字認識後の情報を検索する従来の検索技術としては、例えば、以下のような文献が挙げられる。
（＃１）丸川（Marukawa）、藤澤（Fujisawa）、嶋（Shima ）「認識機能の出力あいまい性を許容した情報検索手法の一検討認識誤り特性に着目した検索手法の分析評価」電子情報通信学会論文誌vol.J79-D-II, no.5, pp785-794, 1996
（＃２）太田（Ohta）、高須（Takasu）、安達（Adachi）「認識誤りを含む和文テキストにおける全文検索手法」情処学会論文誌vol.39, no.3, pp625-635, 1998
（＃３）糸乗（Itonori ）、尾崎（Ozaki ）「類似文字による日本語単語抽出」信学技報（Technical Report of IEICE）PRMU98-87, pp25-32, 1998
（＃４）今川（Imagawa ）、松川（Matsukawa ）、近藤（Kondo ）、目片（Mekata）「文字毎に認識信頼度を付与した誤認識を含むテキストからの検索手法」信学技報PRMU99-72, pp63-75, 1999
（＃５）近藤、松川、今川、目片「文字認識誤りを含むテキストからのあいまい検索に関する一検討」信学技報PRMU99-73, pp69-75, 1999
（＃６）遊佐（Yusa）、田中（Tanaka）「日本語文書画像に対する文字列検索機能の実現」情報処理、情報メディア研究報告19-1, pp1-8, 1995
（＃７）中西（Nakanishi ）、大町（Omachi）、阿曽（Aso ）「低品位文書画像に対応した高精度なキーワード検索システム」信学技報PRMU98-232, pp97-104, 1999
（＃８）松川、今川、近藤、目片「形状特徴検索併用による文書画像検索の性能向上」信学技報PRMU99-74, pp77-83, 1999
これらの従来技術は、以下のように分類できる。
ａ）文字認識候補とキーワードとの照合を行う方法
文字認識時に認識候補を生成し、検索時に候補を用いて、複数の可能性を考慮しながら検索する。
文献（＃１）
認識候補の数は固定数ではなく、文字毎に類似度のしきい値処理を行い、認識候補を絞り込んでいる（しきい値：４種類）。
ｂ）類似文字によるキーワードの展開方法
ｂ−１）類似文字テーブルによるキーワード文字列展開
１つのキーワードを類似文字情報により複数個に展開して検索する。
文献（＃１）
あらかじめ作っておいたコンフュージョンマトリクスで、単純にキーワードを展開する。１文字に対する類似文字数は固定数ではなく、あらかじめ文字毎に類似度のしきい値処理を行い、類似文字を絞り込んでいる（しきい値：４種類）。
文献（＃２）
誤認識、誤欠落、誤挿入、誤結合、および誤分割を確率的に表した誤り易さに基づく確率を格納したテーブルをあらかじめ作っておき、そのテーブルを用いて、確率が０以外のすべての組み合わせでキーワードを展開する。次に、展開されたキーワードを使って全文検索を行う。そして、検索で抽出された文字列毎に、確率的に求めた文字列の確信度を計算し、しきい値を用いて適否を決める。
ｂ−２）類似文字テーブルによる検索
文字認識された文字列に対して、類似文字テーブルを用いて、検索毎の複数の対応関係の可能性を考慮して検索する。
文献（＃３）
形状の似た複数の文字を１つのカテゴリとして扱い、言語処理で１つに絞り込んで検索する。
【０００６】
辞書作成）
まず、階層的クラスタリングで類似文字のカテゴリ（クラスタ）を作成する。次に、類似文字クラスタを分割し、分割クラスタを再統合する。
【０００７】
識別）
画像に対して、類似文字識別、形態素解析、および詳細識別の順で処理を行い、単語列を生成する。類似文字識別では、１０００個または２０００個のカテゴリが用いられる。
【０００８】
まず、再統合クラスタ→分割クラスタ→類似文字クラスタの階層的識別で、入力文字に最も近い類似文字カテゴリを求め、類似文字カテゴリでできた文字列を作成する。次に、類似文字カテゴリ用の形態素解析で、類似文字カテゴリ内の文字を使った単語を生成する。このとき、形態素解析で１単語に絞り込めなかった文字だけ、詳細識別で決定する。
文献（＃４）
認識結果の確信度（信頼度）に応じて、類似文字テーブルの展開文字数を文字毎に可変にしている。あらかじめ、正しい文字と認識結果、およびその確信度からなる類似文字テーブルを作り、認識時に文字認識結果とその確信度を出力しておく。検索時には、キーワードの文字と認識結果の文字、およびその確信度から、類似文字テーブルを使って検索文字との照合確率を求める。そして、その値が正である場合に認識結果が照合されたものとみなして、検索を行う。この方法は、照合確率が正になる類似文字に限定した検索と同じである。
文献（＃５）
文字毎に付与された文字認識の信頼度と文字数別特定可能確率を用いる文献（＃４）の検索方法に加えて、ｎ文字の単語のうちｋ文字が一致した場合にその単語が一意に特定できる確率を使って検索する。文字数別特定可能確率とは、比較的長い文字列長をもった単語の一部をワイルドカード扱いにして検索を行っても、別の単語が検索されない確率のことである。
ｃ）画像検索方法
画像照合により、文書画像からキーワード画像を直接検索する。
文献（＃６）
まず、文字矩形を正方形とみなして１６等分し、隣り合った部分同士を縦または横に２つ組み合わせてできる長方形を１単位とする。そして、隣同士の２つの長方形の組み合わせのうち、正方形を作るような組み合わせを抽出する。１文字からは、１８セットの組み合わせができる。次に、１単位の中の黒画素密度を求め、隣同士の単位間における黒画素密度の比を００〜１１の２ビットでコード化したものを特徴量とする。１文字当たり３６ビットの特徴量が得られ、照合時には、８ビットの誤りまで許容される。
文献（＃７）
複数の切り出し候補を出力し、切り出し候補毎に特徴ベクトルを登録する。検索時には、キーワード文字の最初の文字から１文字毎にマッチングを行い、距離値が固定しきい値以下である場合に、マッチしたものとする。このとき、前の文字がマッチしたら次の文字は切り出し候補を絞ることで、処理を高速化する。そして、すべての文字の距離値がしきい値以下で、かつ、距離値の和が別の固定しきい値以下の場合に、キーワードを検出する。
ｄ）類似文字検索と画像検索を組み合わせた方法
文献：（＃８）
認識結果の確信度（信頼度）に応じて、類似文字検索と画像検索を切り替える。ｂ−２）における文献（＃４）の類似文字検索を行い、照合条件に合う文字を抽出する。そして、類似文字検索で検出されず、かつ、信頼度が固定しきい値より低い文字が連続する場合に、形状特徴検索を行う。形状特徴検索では、キーワードの文字から推定した幅になるように、隣接する低信頼度の文字を統合して切り出す。次に、文字の特徴量を求め、それとキーワードの文字の特徴量との間のユークリッド距離を求める。そして、距離値が固定しきい値以下であれば、照合されたものとする。
【０００９】
【発明が解決しようとする課題】
しかしながら、上述した従来の検索技術には、次のような問題がある。
ａ）文字認識候補とキーワードとの照合を行う方法
文献（＃１）
検索対象文書画像のフォントによっては、候補文字の中に正解が入らない場合がある。特に、類似文字が多数ある場合には、正解が候補文字の下位になってしまうと、候補文字数の制限から足切りされて、候補文字の中に正解が入らない。
ｂ）類似文字によるキーワードの展開方法
ｂ−１）
文献（＃１）
展開したキーワードの数が、類似文字テーブルの大きさによって指数関数的に増え、計算量が膨大になる。
文献（＃２）
処理対象が広がると、展開したキーワードの数が爆発的に増える。
ｂ−２）
文献（＃３）
言語処理が入るので、複合語や文字数が少ない単語では精度が悪くなる。
文献（＃４）
あらかじめ用意される類似文字テーブルの作り方によっては、精度が期待できない。実際に、低解像度文字では精度が悪いことが、この文献の著者らによって報告されている。
文献（＃５）
複合語や文字数が少ない単語に弱い。また、文献（＃４）の方法と同様に、あらかじめ用意される類似文字テーブルの作り方によっては、精度が期待できない。
ｃ）画像検索方法
文献（＃６）
この方法で使用されている特徴量では、高精度が期待できない。また、テキストのみを使う検索に比べて、計算量が多い。さらに、辞書作成フォント以外では、高精度にならない。
文献（＃７）
テキストのみを使う検索に比べて、計算量が多い。また、キーワードの最初の文字が検出できないと、対応する文字列を検索できない。さらに、辞書作成フォント以外では、高精度にならない。
ｄ）類似文字検索と画像検索を組み合わせた方法
文献（＃８）
文献（＃４）の方法と同様に、あらかじめ用意される類似文字テーブルの作り方によっては、精度が期待できない。
【００１０】
本発明の課題は、文字認識後の情報の検索処理において、上述した検索技術の良い部分を残し、不完全な部分を改良して、より高精度な検索を行う検索装置およびその方法を提供することである。
【００１１】
【課題を解決するための手段】
図１は、本発明の検索装置の原理図である。図１の検索装置は、入力手段１０１、抽出手段１０２、推定手段１０３、および出力手段１０４を備え、文書画像の認識結果の情報を用いて、その文書画像内の情報を検索する。
【００１２】
入力手段１０１は、キーワードを入力し、抽出手段１０２は、文書画像から、キーワードを構成する任意の文字を探索し、探索された文字をキー文字として、そのキー文字の位置情報を抽出する。推定手段１０３は、抽出されたキー文字の位置情報を基準として、そのキーワードを構成する文字のうちキー文字より先頭側にある文字の数と、文書画像内の文字の代表的なサイズを示す文字サイズ情報および代表的な文字間隔を示す文字間隔情報のうち少なくとも一方とを用いて、文書画像内のキーワードに対応する領域の先端位置を計算し、キーワードを構成する文字のうちキー文字より末尾側にある文字の数と、文字サイズ情報および文字間隔情報のうち少なくとも一方とを用いて、キーワードに対応する領域の末端位置を計算し、得られた先端位置と末端位置の間の範囲をキーワードに対応する領域と推定する。出力手段１０４は、推定された領域のうち、キーワードに対応する文字列を含む領域の情報を、検索結果として出力する。
【００１３】
入力手段１０１は、例えば、ユーザにより指定されたキーワードを検索装置に入力し、抽出手段１０２は、入力されたキーワードに含まれる任意の文字を、文書画像から抽出し、それをキー文字として推定手段１０３に渡す。推定手段１０３は、受け取ったキー文字の位置から、そのキー文字を含み、かつ、キーワードを構成する文字の数に対応するような領域を推定する。そして、出力手段１０４は、推定された領域のうち、キーワードに対応する文字列を含む領域の位置および範囲の情報を出力する。
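[0014]
According to such a search apparatus, a keyword region can be estimated automatically with reference to the position of a pattern that corresponds to any character constituting the keyword, among the character patterns contained in the document image. Therefore, even if the first character of the keyword is not detected, information on the region corresponding to the keyword is output as long as another character is extracted as a key character. This improves the accuracy of the search process.
[0015]
For example, the input means 101 in FIG. 1 corresponds to the input device 113 in FIG. 14 described later, the extraction means 102 and the estimation means 103 in FIG. 1 correspond to the combination of the CPU 111 and the memory 112 in FIG. 14, and the output means 104 in FIG. 1 corresponds to the output device 114 in FIG. 14.
[0016]
[Embodiments of the Invention]
Hereinafter, embodiments of the present invention are described in detail with reference to the drawings.
The search apparatus of this embodiment first performs region identification and character recognition on the input image information and generates search information that includes the recognition results needed for searching. Next, using the search information, it searches the input image for characters constituting the keyword and estimates, from the position of such a character, the extent of the region in which the keyword exists. It then verifies the character string using the image information in that region and, if the keyword is extracted, outputs its position as a search result.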
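[0017]
FIG. 2 is a flowchart of the search information generation process. The search apparatus first inputs a document image to be searched (step S1). At this time, the user optically reads a paper document with an image input device such as a scanner and inputs it to the search apparatus. The document image is thereby registered in the search apparatus. The document image is basically assumed to be a black-and-white binary image, but a color image can be handled in the same way as a binary image by binarizing it as preprocessing.
[0018]
Next, the search apparatus scans the document image, performs region identification using a known technique, and extracts character regions (step S2). Here, for example, the entire image is labeled to extract the circumscribed rectangles of connected black-pixel regions, and rectangles of at least a certain size are extracted as figure/table region candidates. Next, ruled-line extraction is performed inside each rectangle of the figure/table region candidates; if vertical and horizontal ruled lines are extracted, the rectangle is identified as a table region. If no ruled line is extracted, the rectangle is identified as a figure region. The regions other than the figure and table regions are then extracted as character regions.
[0019]
Next, lines (character strings) are extracted from the character regions, character recognition is performed on each line, and search information including the recognition results is generated and output (step S3). At this time, the region of a single character is determined from the circumscribed rectangles of connected black-pixel regions, and a character pattern is cut out from the image information.
[0020]
However, since characters may touch each other, touching characters are split using a touching-character segmentation technique, and the search information is output in units (basic rectangles) of characters or partial patterns of characters. As the touching-character segmentation technique, for example, the technique of Japanese Patent Laid-Open No. 11-338975 ("Character segmentation processing method and recording medium storing a character segmentation processing program") is used.
[0021]
In addition, in order to extract separated characters such as "川" accurately, adjacent basic rectangles are integrated to form integrated rectangles as character candidates, and the same kind of search information is output for those rectangles as well.
[0022]
In the following, the basic rectangles and integrated rectangles that serve as output units of the search information are collectively called character rectangles. The search information is output line by line, and within each line the information on the character rectangles is sorted before output. The search information for each character rectangle includes the following.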
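a) Character recognition result information
This information describes the first through n-th recognition candidates obtained by character recognition (n ≥ 1). The information for each candidate is a character code, a distance value, an evaluation value, or a certainty factor. All character codes are converted to full-width codes in advance for use in searching. The distance value represents the similarity between the character pattern being recognized and a character pattern in the recognition dictionary, and the evaluation value represents an evaluation of the recognition candidate. The certainty factor represents how likely the recognition candidate is to be correct and is calculated, for example, from the distance value of that candidate and the distance values of the other candidates.
b) Character graphic feature information
This information represents the graphical features of a plurality of characters and includes the following.
・The character pattern itself
・Information obtained by image processing of the character pattern (image features)
Information with clear power to discriminate characters is desirable; specifically, feature vectors or the like are used. When the feature vector in the recognition dictionary, looked up with the character code of one character in the keyword as the key, is matched against this feature vector, the required property is that identical characters match reliably and different characters do not match. Examples of such feature vectors are as follows.
1) Feature vector for detailed classification obtained from the character pattern (288 dimensions)
A 288-dimensional feature vector is extracted from the character pattern using a known technique called the multiple compression method (Hai, Kabuyama, Yamamoto, "A method of handwritten kanji recognition: recognition by the multiple compression method and the partial pattern method," IEICE Transactions, vol. J68-D, no. 4, pp. 773-780, 1985). In this method, a directional line element with one of four direction attributes is first obtained for each contour point of the character pattern, the character pattern is nonlinearly normalized, and it is then divided into a number of regions that differs for each direction. The number of pixels of directional line elements is counted in each region, and the counts are arranged in a prescribed order to form a vector. This yields the feature vector for detailed classification.
2) Feature vector for coarse classification obtained from the character pattern (16 dimensions)
Based on the feature vector for detailed classification described above, eigenvalues and eigenvectors are obtained for each category, and 16 axes (eigenvectors) are taken in descending order of eigenvalue. A 16-dimensional vector is then generated whose elements are the projections of the category's representative vector onto each axis. This yields the feature vector for coarse classification.
[0023]
With these feature vectors, when the feature vector in the recognition dictionary corresponding to the character code of one character in the keyword is matched against the feature vector of a character in the search target document, identical characters can be judged to be the same character by applying a distance threshold. Different characters can likewise be judged to differ using the same distance threshold.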
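c) Character information
・Character position
For example, the coordinates of the upper-left and lower-right vertices of the circumscribed rectangle of the character's connected black-pixel region are used.
・Size information
This information represents the size of a representative character rectangle in the line to which the character belongs. For example, the modes of the height and width of the character rectangles in that line are used.
・Representative value of the character spacing in the line
For example, the average distance between the circumscribed rectangles of the characters in the line is used.
・Information representing the arrangement and order of the character rectangles in the line
For example, for a horizontally written line, as shown in FIG. 3, serial numbers 0, 1, 2, ... are assigned from the left to the gaps between basic rectangles. The gap number on the left side of a character rectangle is set as its start number and the gap number on its right side as its end number, which yields the order information of the character rectangle.
[0024]
Among the character rectangles in FIG. 3, rectangles 1 to 9 correspond to basic rectangles, and rectangles 10 to 15 correspond to integrated rectangles formed by integrating multiple basic rectangles. For example, rectangle 10 is generated by integrating rectangles 1 and 2, and rectangle 14 by integrating rectangles 1, 2, and 3. These rectangles represent multiple different segmentation results for one line. The order information of these rectangles is as shown in FIG. 4.
d) Line information
・Line number
The line number is a unique number assigned to each line in the page.
・Line direction
The line direction is either vertical or horizontal.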
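The gap-based order information described under c) can be sketched in a few lines of Python. This is only an illustration: the `CharRect` class and the helper names are hypothetical, and the sketch assumes that gap 0 lies to the left of the first basic rectangle, so that the i-th basic rectangle spans gap i-1 to gap i.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharRect:
    rect_id: int  # rectangle number, as in FIG. 3
    start: int    # number of the gap on the left side
    end: int      # number of the gap on the right side


def basic_rect(rect_id: int) -> CharRect:
    # Assumption: basic rectangle i lies between gap i-1 and gap i.
    return CharRect(rect_id, rect_id - 1, rect_id)


def integrate(rect_id: int, parts: list[CharRect]) -> CharRect:
    # An integrated rectangle spans from the first part's start gap
    # to the last part's end gap; the parts must be adjacent.
    for left, right in zip(parts, parts[1:]):
        if left.end != right.start:
            raise ValueError("parts must be adjacent")
    return CharRect(rect_id, parts[0].start, parts[-1].end)


basics = {i: basic_rect(i) for i in range(1, 10)}
rect10 = integrate(10, [basics[1], basics[2]])             # rectangles 1 and 2
rect14 = integrate(14, [basics[1], basics[2], basics[3]])  # rectangles 1, 2, and 3
print(rect10, rect14)
```

Under this assumption, rectangle 10 receives start number 0 and end number 2, and rectangle 14 receives start number 0 and end number 3.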
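[0025]
FIG. 5 is a flowchart of the full-text search process based on the output search information. First, the user inputs the keyword to be searched for (step S11). Next, the search apparatus extracts characters constituting the keyword (key characters) from the search target file that stores the search information (step S12), and estimates, based on the key characters, the region in which a character string corresponding to the keyword exists (step S13).
[0026]
Next, the image information in that region is compared with the information in the recognition dictionary to verify whether the keyword is contained in the region, and a character string that matches the keyword is extracted (step S14). The position information of the extracted character string is then output as the search result (step S15), and the process ends.
[0027]
Next, a specific example of the full-text search process is described with reference to FIGS. 6 through 13.
FIGS. 6 and 7 are flowcharts of the key character extraction process performed in step S12 of FIG. 5. In this process, key characters are searched for on the basis of the certainty factors of the recognition candidates. When the character code of one character in the input keyword matches the character code of a character in the search target file (including second and lower-ranked recognition candidates) and the certainty factor exceeds a fixed threshold, that character is extracted as a key character.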
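[0028]
The search apparatus first creates an array keycode that holds each character of the keyword (step S21 in FIG. 6) and stores the number of characters in the keyword in the variable keyword_num (step S22). Next, it sets the variable extract_num, which represents the number of extracted key characters, to 0 (step S23) and extracts one recognized character from the search target file (step S24).
[0029]
Next, it sets the variable i, which represents the rank of a recognition candidate, to 1 (step S25), sets the variable j, which is an index into the array keycode, to 1 (step S26), and compares the character code of the i-th candidate with the character code of the j-th character of keycode (step S27).
[0030]
If the two character codes match, the certainty factor of the i-th candidate is then compared with a fixed threshold (step S28). If the certainty factor exceeds the threshold, the character extracted in step S24 is regarded as a key character, and the j-th character code of keycode and the position of the extracted character are recorded (step S29). Then extract_num = extract_num + 1 is set (step S30).
[0031]
Next, j = j + 1 is set (step S31 in FIG. 7), and j is compared with keyword_num (step S32). If j is less than or equal to keyword_num, the processing from step S27 onward is repeated.
[0032]
If the two character codes do not match in step S27, or if the certainty factor is less than or equal to the threshold in step S28, the character extracted in step S24 is judged not to be a key character, and the processing from step S31 onward is performed.
[0033]
When j exceeds keyword_num in step S32, i = i + 1 is set (step S33), and i is compared with the number of recognition candidates (step S34). If i is less than or equal to the number of recognition candidates, the processing from step S26 onward is repeated.
[0034]
When i exceeds the number of recognition candidates, it is checked whether the characters in the search target file have been exhausted (step S35). If unprocessed characters remain, the processing from step S24 onward is repeated for the next character. When all characters have been processed, the process ends.
[0035]
In this process, an evaluation value or a distance value can be used instead of the certainty factor of the recognition candidate. When the evaluation value is used, the evaluation value of the recognition candidate is compared with a fixed threshold in step S28, and if the evaluation value exceeds the threshold, the character in the search target file is regarded as a key character. When the distance value is used, the distance value of the recognition candidate is compared with a fixed threshold in step S28, and if the distance value is below the threshold, the character in the search target file is regarded as a key character.
[0036]
According to the key character extraction process described above, multiple recognition candidates for each character in the search target file are collated with the characters constituting the keyword, so key characters in the document image are detected more easily. In addition, because key characters are extracted on the basis of the certainty factor, evaluation value, or distance value of the recognition candidates, the accuracy of key character extraction improves.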
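The loop structure of steps S24 through S35 can be sketched as follows. This is a minimal illustration rather than the apparatus itself: the `Candidate` and `RecognizedChar` types, the function name, and the sample certainty values are hypothetical, indices are 0-based instead of 1-based, and the sketch also records the index j of the matched keyword character, which the later region estimation needs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str         # character code of the recognition candidate
    certainty: float  # certainty factor of the candidate


@dataclass
class RecognizedChar:
    position: tuple[int, int, int, int]  # (left, top, right, bottom)
    candidates: list[Candidate]          # ordered by rank


def extract_key_chars(keyword: str, chars: list[RecognizedChar],
                      threshold: float) -> list[tuple[int, str, tuple[int, int, int, int]]]:
    hits = []
    for ch in chars:                                 # S24, S35: each recognized character
        for cand in ch.candidates:                   # S25, S33-S34: each candidate rank i
            for j, key_code in enumerate(keyword):   # S26, S31-S32: each keyword character j
                # S27: codes match; S28: certainty exceeds the threshold
                if cand.code == key_code and cand.certainty > threshold:
                    hits.append((j, key_code, ch.position))  # S29-S30
    return hits


# Illustrative data only: the second-ranked candidate of the first character matches.
chars = [
    RecognizedChar((200, 500, 250, 550), [Candidate("道", 0.5), Candidate("通", 0.9)]),
    RecognizedChar((260, 500, 310, 550), [Candidate("士", 0.2)]),
]
key_chars = extract_key_chars("富士通研究所", chars, threshold=0.3)
print(key_chars)
```

With a distance value instead of a certainty factor, the comparison in the innermost test would be reversed, since a smaller distance indicates a better match.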
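[0037]
FIGS. 8 and 9 are flowcharts of the keyword region estimation process performed in step S13 of FIG. 5. In this process, the position of the extracted key character within the keyword is obtained from its character information. Then, from that position within the keyword and character range information such as the number of characters in the keyword and the character width, an estimated keyword region surrounding the key character is obtained. For example, for horizontal writing a keyword region extending left and right from the key character is generated, and for vertical writing a keyword region extending up and down from the key character is generated.
[0038]
The search apparatus first sets the variable i, which points to one of the extracted key characters, to 1 (step S41 in FIG. 8), sets the variable kwnum, which represents the number of keyword regions already extracted, to 0 (step S42), and obtains the character code of the i-th key character and its position in the search target file (step S43).
[0039]
Next, the keyword region in the document image is estimated from character information such as the position of the key character, its size (the width or height of the characters in the line to which the key character belongs), and the character spacing, together with the position of the key character within the keyword (step S44).
[0040]
For example, consider the case where "富士通研究所" is input as the keyword and "通" is extracted as a key character. In this example, as shown in FIG. 10, an xy coordinate system is set in the document image, and the character information of the extracted character "通" records the coordinates (x, y) = (200, 500) of the upper-left vertex and (250, 550) of the lower-right vertex of the character rectangle as its position. It also records that the representative character width is 50 dots and the character spacing is 10 dots.
[0041]
In this case, the y-coordinate 500 of the upper-left vertex and the y-coordinate 550 of the lower-right vertex are adopted as the top and bottom coordinates of the estimated keyword region, respectively, and the left and right coordinates are calculated by the following formulas.
Left coordinate:
(x-coordinate of upper-left vertex) − (character width + character spacing) × (number of characters to the left of the key character)
= 200 − (50 + 10) × 2
= 80
Right coordinate:
(x-coordinate of lower-right vertex) + (character width + character spacing) × (number of characters to the right of the key character)
= 250 + (50 + 10) × 3
= 430
Therefore, expressed with the coordinates of its upper-left and lower-right vertices, the range of the estimated keyword region is (80, 500)-(430, 550).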
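The calculation of step S44 can be written as a short function. The sketch below covers horizontal writing only and reproduces the worked example for "通" in "富士通研究所"; the function and parameter names are hypothetical.

```python
def estimate_keyword_region(key_rect, key_index, keyword_len, char_width, char_gap):
    """Estimate the keyword region for a horizontally written line.

    key_rect    -- (left, top, right, bottom) of the key character
    key_index   -- 0-based position of the key character in the keyword
    keyword_len -- number of characters in the keyword
    """
    left, top, right, bottom = key_rect
    pitch = char_width + char_gap               # width of one character plus one gap
    chars_left = key_index                      # keyword characters before the key character
    chars_right = keyword_len - key_index - 1   # keyword characters after the key character
    return (left - pitch * chars_left, top, right + pitch * chars_right, bottom)


keyword = "富士通研究所"
region = estimate_keyword_region(
    key_rect=(200, 500, 250, 550),
    key_index=keyword.index("通"),
    keyword_len=len(keyword),
    char_width=50,
    char_gap=10,
)
print(region)  # (80, 500, 430, 550)
```

For vertical writing, the same calculation would be applied to the top and bottom coordinates using the character height instead of the width.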
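[0042]
Next, the search apparatus sets the variable j, which points to one of the keyword regions already registered, to 1, sets the variable kwnum2, a flag indicating overlap with an already extracted keyword region, to 0 (step S45), and checks whether the estimated keyword region overlaps the j-th registered keyword region (step S46).
[0043]
If the two keyword regions overlap, the rectangular region enclosing both is obtained as a new keyword region, and the coordinates of the j-th registered keyword region are updated (step S47). The range of the j-th registered keyword region is thereby replaced with the range of the new keyword region. Then kwnum2 = 1 is set.
[0044]
Next, j = j + 1 is set (step S48), and j is compared with kwnum (step S49 in FIG. 9). If j is less than or equal to kwnum, the processing from step S46 onward is repeated; if j exceeds kwnum, the value of kwnum2 is checked next (step S50).
[0045]
If kwnum2 = 0, the estimated keyword region is newly registered, and kwnum = kwnum + 1 is set (step S51). Next, i = i + 1 is set (step S52), and it is checked whether information on an i-th extracted key character exists (step S53). If an i-th extracted key character exists, the processing from step S43 onward is repeated; when the information on extracted key characters is exhausted, the process ends.
[0046]
If the two keyword regions do not overlap in step S46, processing proceeds directly to step S48 and onward; if kwnum2 > 0 in step S50, the processing from step S52 onward is performed.
[0047]
According to this keyword region estimation process, the range of the keyword region containing an extracted key character can be estimated accurately on the basis of the position of that character. Even if the first character of the keyword is not extracted as a key character, the corresponding keyword region can be estimated from the position of another character as long as that character is extracted as a key character. Furthermore, when multiple key characters belong to one keyword region, that keyword region can be extracted correctly.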
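The registration logic of steps S45 through S51 amounts to merging an estimated region into any registered region it overlaps and registering it as new otherwise. The following sketch assumes rectangles given as (left, top, right, bottom) and treats two rectangles as overlapping when their interiors intersect; the text does not define the overlap test, so that choice and all names are assumptions.

```python
Region = tuple[int, int, int, int]  # (left, top, right, bottom)


def overlaps(a: Region, b: Region) -> bool:
    # Assumed definition: the interiors of the two rectangles intersect.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def enclosing(a: Region, b: Region) -> Region:
    # Smallest rectangle that encloses both regions (step S47).
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def register_region(registered: list[Region], estimated: Region) -> None:
    merged = False                               # plays the role of kwnum2
    for j, region in enumerate(registered):      # S46, S48-S49
        if overlaps(estimated, region):
            registered[j] = enclosing(estimated, region)
            merged = True
    if not merged:                               # S50-S51
        registered.append(estimated)


registered: list[Region] = []
register_region(registered, (80, 500, 430, 550))   # estimated from one key character
register_region(registered, (78, 500, 432, 550))   # overlapping estimate from another key character
register_region(registered, (600, 700, 950, 750))  # unrelated region elsewhere on the page
print(registered)
```

In this example the first two estimates collapse into a single registered region, which mirrors how several key characters of the same keyword occurrence end up in one keyword region.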
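[0048]
FIG. 11 is a flowchart of the keyword extraction process performed in step S14 of FIG. 5. In this process, for the character rectangles within the range of the estimated keyword region, the images of the character patterns are matched against the corresponding characters in the recognition dictionary to verify whether the character string in the region is really the keyword.
[0049]
For example, all combinations of non-overlapping rectangles in the keyword region are obtained, and for each combination the image features obtained from the character pattern in each rectangle are matched against the image features of the characters constituting the keyword, obtained from the dictionary. These combinations correspond to different character segmentation results. If there is a combination whose degree of match is at least a fixed threshold (for example, whose total distance value is at most a fixed threshold), the keyword is regarded as detected.
[0050]
For example, when the keyword region contains the character rectangles of FIG. 3, the following combinations of rectangles are conceivable.
・Rectangles 1 to 9
・Rectangle 10 and rectangles 3 to 9
・Rectangle 10, rectangle 13, and rectangles 4 to 9
・Rectangle 10, rectangle 11, and rectangles 5 to 9
・Rectangles 10 to 12 and rectangles 7 to 9
・Rectangle 1, rectangle 2, rectangle 11, and rectangles 5 to 9
・Rectangle 1, rectangle 2, rectangle 11, rectangle 12, and rectangles 7 to 9
・Rectangles 1 to 4, rectangle 12, and rectangles 7 to 9
・Rectangle 1, rectangle 13, and rectangles 4 to 9
・Rectangle 1, rectangle 13, rectangle 4, rectangle 12, and rectangles 7 to 9
・Rectangle 14 and rectangles 4 to 9
・Rectangle 14, rectangle 4, rectangle 12, and rectangles 7 to 9
・Rectangle 14, rectangle 15, and rectangles 6 to 9
・Rectangle 10, rectangle 3, rectangle 15, and rectangles 6 to 9
・Rectangle 1, rectangle 13, rectangle 15, and rectangles 6 to 9
Although FIG. 11 shows the verification process for one estimated keyword region, the search apparatus performs the same process for all estimated keyword regions and outputs only the regions in which the keyword was detected as the search result. The rectangles in each combination within an estimated keyword region are assumed to have been sorted in advance, starting from the one nearest the head of the corresponding line.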
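Because each character rectangle carries a start and an end gap number, the non-overlapping combinations can be enumerated by walking from gap to gap. The sketch below illustrates this on a small hypothetical layout of four basic rectangles and two integrated rectangles; it does not reproduce the layout of FIG. 3, and the function name and data are assumptions.

```python
def enumerate_segmentations(rects: dict[int, tuple[int, int]],
                            first_gap: int, last_gap: int) -> list[list[int]]:
    """List every chain of rectangles that covers first_gap..last_gap exactly.

    rects maps a rectangle number to its (start, end) gap numbers.
    """
    by_start: dict[int, list[tuple[int, int]]] = {}
    for rect_id, (start, end) in rects.items():
        by_start.setdefault(start, []).append((rect_id, end))

    results: list[list[int]] = []

    def walk(gap: int, path: list[int]) -> None:
        if gap == last_gap:
            results.append(list(path))
            return
        # Try every rectangle that starts at the current gap.
        for rect_id, end in sorted(by_start.get(gap, [])):
            if end <= last_gap:
                path.append(rect_id)
                walk(end, path)
                path.pop()

    walk(first_gap, [])
    return results


# Hypothetical layout: basic rectangles 1-4, plus 5 = 1+2 and 6 = 2+3.
layout = {1: (0, 1), 2: (1, 2), 3: (2, 3), 4: (3, 4), 5: (0, 2), 6: (1, 3)}
combos = enumerate_segmentations(layout, first_gap=0, last_gap=4)
print(combos)  # [[1, 2, 3, 4], [1, 6, 4], [5, 3, 4]]
```

Each returned list is already ordered from the head of the line, which matches the sorted order the verification step expects.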
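[0051]
The search apparatus first focuses on one of the combinations of rectangles in the keyword region and selects the first rectangle of that combination as the current rectangle (step S61). Next, it sets the variable num, which represents the number of characters in the region that matched characters in the keyword, to 0, sets the variable k, which points to a character in the keyword, to 1 (step S62), and calls the recursive matching process (step S63). In the recursive matching process, each character pattern in the region is collated with the character patterns of the keyword using image features, and the number of matched characters is set in num as the return value.
[0052]
Next, the return value num is compared with the number of characters in the keyword (step S64); if they do not match, it is checked whether there is another combination in the same keyword region (step S65). If there is another combination, the processing from step S61 onward is repeated for that combination.
[0053]
If num matches the number of keyword characters in step S64, the number of regions in which the keyword was detected is incremented (step S66), and the processing of step S65 is performed. When verification of all combinations in the keyword region is finished in step S65, the process ends.
[0054]
In the recursive matching process of step S63, the character patterns and character information of the rectangles in the keyword region are obtained first. The verification methods used in the subsequent processing include the following.
a) Method that verifies the character patterns of all rectangles
For all rectangles in the region, matching using image information and threshold processing are performed to verify whether each target rectangle is a character constituting the keyword.
[0055]
Specifically, matching based on image information is performed for all rectangles in the region, and a rectangle whose distance is at most a threshold is regarded as matching a character of the keyword. When all characters of the keyword have been matched with rectangles in the region, the keyword is regarded as extracted. With this method, every rectangle is verified using image information, so the accuracy of keyword extraction improves.
b) Method that verifies only the character patterns of rectangles with low certainty
Rectangles whose certainty factor satisfies a predetermined condition are excluded from the rectangles in the region, and matching using image feature information and threshold processing are performed on the remaining rectangles.
[0056]
Specifically, among the rectangles in the region, a rectangle whose certainty factor is higher than a threshold is regarded as already matched with a character of the keyword, and matching based on image information is performed for rectangles whose certainty factor is at most the threshold. A rectangle whose distance is at most a threshold is regarded as matching a character of the keyword, and when all characters of the keyword have been matched with rectangles in the region, the keyword is regarded as extracted. With this method, processing time can be reduced without impairing the accuracy of keyword extraction.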
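[0057]
FIGS. 12 and 13 are flowcharts of the recursive matching process using verification method a). The search apparatus first compares the start number of the current rectangle with the end number of the preceding rectangle (step S71 in FIG. 12). If they do not match, it checks whether the start number of the current rectangle is 0 (step S72). If the start number of the current rectangle is 0, it then checks whether the line number of the current rectangle equals the line number of the preceding line plus 1 (step S73).
[0058]
If the line number of the current rectangle equals the line number of the preceding line plus 1, the image features (feature vector or the like) of the character in the current rectangle are compared with the image features of the k-th character of the keyword (step S74). If the distance between the image features is at most a threshold, the character in the current rectangle is regarded as matching the k-th character of the keyword, and num = num + 1 is set (step S75).
[0059]
Next, k + 1 is compared with the number of characters in the keyword (step S76 in FIG. 13). If k + 1 is at most the number of keyword characters, k = k + 1 is set (step S77), and the rectangle following the current rectangle is set as the new current rectangle (step S78). The rectangle whose start number equals the end number of the current rectangle is selected as the next rectangle. The recursive matching process is then called to obtain the return value num (step S79), num is returned to the caller (step S80), and the process ends.
[0060]
If the start number of the current rectangle matches the end number of the preceding rectangle in step S71, the processing from step S74 onward is performed. If the start number is not 0 in step S72, if the line number is not the line number of the preceding line plus 1 in step S73, or if the character in the current rectangle does not match the k-th character of the keyword in step S74, the processing of step S80 is performed directly. The processing of step S80 is also performed if k + 1 exceeds the number of keyword characters in step S76.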
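The recursion of FIGS. 12 and 13 can be approximated as follows for a combination that lies within a single line. This is a simplified sketch under stated assumptions: the line-wrap checks of steps S72 and S73 are omitted, the Euclidean distance stands in for whichever feature distance is actually used, indices are 0-based, and the `Rect` type, the function name, and the two-dimensional sample features are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Rect:
    start: int                  # start gap number
    end: int                    # end gap number
    feature: tuple[float, ...]  # image feature of the character pattern


def match_recursive(combo: list[Rect], idx: int,
                    keyword_features: list[tuple[float, ...]],
                    k: int, num: int, threshold: float) -> int:
    """Return the number of keyword characters matched, starting at combo[idx]."""
    cur = combo[idx]
    # S71: the current rectangle must continue from the preceding one.
    if idx > 0 and cur.start != combo[idx - 1].end:
        return num
    # S74: compare image features of the rectangle and the k-th keyword character.
    if math.dist(cur.feature, keyword_features[k]) > threshold:
        return num
    num += 1  # S75
    # S76-S79: recurse while keyword characters (and rectangles) remain.
    if k + 1 < len(keyword_features) and idx + 1 < len(combo):
        return match_recursive(combo, idx + 1, keyword_features, k + 1, num, threshold)
    return num  # S80


# Illustrative two-dimensional features for a three-character keyword.
keyword_features = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
good = [Rect(0, 1, (0.1, 0.0)), Rect(1, 2, (1.0, 0.9)), Rect(2, 3, (2.0, 2.1))]
bad = [Rect(0, 1, (0.1, 0.0)), Rect(1, 2, (5.0, 5.0)), Rect(2, 3, (2.0, 2.1))]
matched_good = match_recursive(good, 0, keyword_features, 0, 0, threshold=0.5)
matched_bad = match_recursive(bad, 0, keyword_features, 0, 0, threshold=0.5)
print(matched_good, matched_bad)  # 3 1
```

The caller then compares the returned count with the number of keyword characters, as in step S64, to decide whether the combination verifies the keyword.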
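[0061]
When verification method b) is used, processing similar to that of FIGS. 12 and 13 is performed not on all rectangles in the region but on the rectangles lying between rectangles with high certainty factors. In this case, in step S64 of FIG. 11, the number of characters remaining in the keyword after excluding the characters with high certainty factors (a partial character string) is used as the number of keyword characters.
[0062]
In verification method b), the same processing can also be performed using an evaluation value or a distance value instead of the certainty factor. In this case, rectangles whose evaluation value or distance value satisfies a predetermined condition are excluded from the rectangles in the keyword region, and matching and threshold processing are performed on the remaining rectangles.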
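[0063]
The search apparatus of this embodiment can be configured using, for example, an information processing apparatus (computer) as shown in FIG. 14. The information processing apparatus of FIG. 14 includes a CPU (central processing unit) 111, a memory 112, an input device 113, an output device 114, an external storage device 115, a medium drive device 116, a network connection device 117, and an image input device 118, which are connected to one another by a bus 119.
[0064]
The memory 112 includes, for example, a ROM (read only memory) and a RAM (random access memory), and stores the programs and data used for processing. The CPU 111 performs the necessary processing by executing a program using the memory 112.
[0065]
The input device 113 is, for example, a keyboard, a pointing device, or a touch panel, and is used for inputting instructions and information from the user. The output device 114 is, for example, a display, a printer, or a speaker, and is used for outputting inquiries to the user and processing results.
[0066]
The external storage device 115 is, for example, a magnetic disk device, an optical disk device, or a magneto-optical disk device. The information processing apparatus can store the above-described programs and data in the external storage device 115 and load them into the memory 112 for use as needed. The external storage device 115 is also used as a database that stores search target files.
[0067]
The medium drive device 116 drives a portable recording medium 120 and accesses its recorded contents. Any computer-readable recording medium, such as a memory card, a floppy disk, a CD-ROM (compact disk read only memory), an optical disk, or a magneto-optical disk, is used as the portable recording medium 120. The user can store the above-described programs and data on the portable recording medium 120 and load them into the memory 112 for use as needed.
[0068]
The network connection device 117 is used for connection to an arbitrary communication network such as a LAN (local area network) and performs the data conversion that accompanies communication. The information processing apparatus can receive the above-described programs and data from an external device via the network connection device 117 and load them into the memory 112 for use.
[0069]
The image input device 118 is, for example, an imaging device such as a CCD camera or a scanner, and is used for inputting document images, including color images.
FIG. 15 shows computer-readable recording media that can supply programs and data to the information processing apparatus of FIG. 14. Programs and data stored on the portable recording medium 120 or in an external database 121 are loaded into the memory 112. The CPU 111 then executes the program using the data and performs the necessary processing.
[0070]
In the embodiments described above, the coordinates of a rectangle are used as character position information, but coordinates of any other shape can be used in the present invention. A shape other than a rectangle may also be used for the estimated keyword region.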
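(Supplementary note 1) A search apparatus that searches for information in a document image using information on a recognition result of the document image, comprising:
input means for inputting a keyword;
extraction means for extracting, from the document image, an arbitrary character constituting the keyword as a key character;
estimation means for estimating a region corresponding to the keyword in the document image based on position information of the key character; and
output means for outputting a search result based on information on the estimated region.
(Supplementary note 2) The search apparatus according to supplementary note 1, wherein the extraction means extracts the key character by comparing a plurality of recognition candidates for a character in the document image with an arbitrary character constituting the keyword.
(Supplementary note 3) The search apparatus according to supplementary note 1, wherein the extraction means extracts the key character using one of a certainty factor, an evaluation value, and a distance value of a recognition candidate for a character in the document image.
(Supplementary note 4) The search apparatus according to supplementary note 1, wherein the estimation means calculates the range of the region corresponding to the keyword using the position information of the key character and at least one of size information and character spacing information of characters in the document image.
(Supplementary note 5) The search apparatus according to supplementary note 1, further comprising verification means for verifying, based on character patterns in the estimated region, whether a character string in the estimated region is the keyword, wherein the output means outputs, as the search result, information on a region containing a character string verified to be the keyword.
(Supplementary note 6) The search apparatus according to supplementary note 5, wherein the verification means performs matching using image feature information and threshold processing on all character patterns in the estimated region to verify whether each target character pattern is a character constituting the keyword.
(Supplementary note 7) The search apparatus according to supplementary note 5, wherein the verification means excludes, from the character patterns in the estimated region, character patterns for which one of a certainty factor, an evaluation value, and a distance value satisfies a predetermined condition, and performs matching using image feature information and threshold processing on the remaining character patterns to verify whether each target character pattern is a character constituting the keyword.
(Supplementary note 8) The search apparatus according to supplementary note 5, wherein the verification means performs matching using image feature information and threshold processing on each of a plurality of combinations of character patterns corresponding to a plurality of character segmentation results, to verify whether each combination corresponds to the keyword.
(Supplementary note 9) The search apparatus according to supplementary note 1, further comprising generation means for performing character recognition on the document image in advance and generating search information including the recognition result, and storage means for storing the search information as a search target file, wherein the extraction means and the estimation means perform processing using the search information stored in the storage means.
(Supplementary note 10) A computer-readable recording medium recording a program for a computer that searches for information in a document image using information on a recognition result of the document image,
the program causing the computer to execute processing of:
extracting, from the document image, an arbitrary character constituting a keyword as a key character;
estimating a region corresponding to the keyword in the document image based on position information of the key character; and
outputting a search result based on information on the estimated region.
(Supplementary note 11) A search method for searching for information in a document image using information on a recognition result of the document image, comprising:
designating a keyword;
extracting, from the document image, an arbitrary character constituting the keyword as a key character;
estimating a region corresponding to the keyword in the document image based on position information of the key character; and
presenting a search result based on information on the estimated region.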
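[0071]
[Effects of the Invention]
According to the present invention, character recognition results are used effectively in searching for information in an image, and a highly accurate search can be performed. As a result, full-text keyword search becomes possible even for words in document images that previously could not be found because of misrecognition.
[Brief Description of the Drawings]
FIG. 1 is a principle diagram of the search apparatus of the present invention.
FIG. 2 is a flowchart of the search information generation process.
FIG. 3 is a diagram showing character rectangles.
FIG. 4 is a diagram showing the order information of character rectangles.
FIG. 5 is a flowchart of the full-text search process.
FIG. 6 is a flowchart (part 1) of the key character extraction process.
FIG. 7 is a flowchart (part 2) of the key character extraction process.
FIG. 8 is a flowchart (part 1) of the keyword region estimation process.
FIG. 9 is a flowchart (part 2) of the keyword region estimation process.
FIG. 10 is a diagram showing a keyword region.
FIG. 11 is a flowchart of the keyword extraction process.
FIG. 12 is a flowchart (part 1) of the recursive matching process.
FIG. 13 is a flowchart (part 2) of the recursive matching process.
FIG. 14 is a configuration diagram of the information processing apparatus.
FIG. 15 is a diagram showing recording media.
[Explanation of Reference Numerals]
101 Input means
102 Extraction means
103 Estimation means
104 Output means
111 CPU
112 Memory
113 Input device
114 Output device
115 External storage device
116 Medium drive device
117 Network connection device
118 Image input device
119 Bus
120 Portable recording medium
121 Database
[0001]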
BACKGROUND OF THE INVENTION
The present invention relates to a search apparatus and a method for searching for information in an image by using a result of character recognition processing on image information.
[0002]
[Prior art]
In recent years, organizations such as companies have been using document management systems that digitize and share documents, for the sake of information sharing and rapid use of information. The electronic documents most often used are text-searchable documents that are digital from the time of creation, such as word processor documents and presentation documents; in addition, image information, which cannot be text-searched as is, is also used.
[0003]
A common method of registering an image in a document management system is to perform character recognition on the image to extract searchable text information and to store that information together with the image for later search processing.
[0004]
However, the recognition rate of the character recognition process is not 100%, and the recognition result includes erroneous recognition. Further, the normal search method cannot detect misrecognition. For this reason, even if there is a keyword character string in the document, the character string does not match the keyword due to misrecognition, and search may not be possible.
[0005]
In order to prevent this, it is necessary to develop a search data creation method using character recognition characteristics and a search method for data after character recognition. Examples of conventional search techniques for searching for information after character recognition include the following documents.
(# 1) Marukawa, Fujisawa, Shima, "A study of an information retrieval method tolerating output ambiguity of the recognition function: analysis and evaluation of retrieval methods focusing on recognition error characteristics," IEICE Transactions, vol. J79-D-II, no. 5, pp. 785-794, 1996
(# 2) Ohta, Takasu, Adachi, "Full-text search method for Japanese text including recognition errors," IPSJ Journal, vol. 39, no. 3, pp. 625-635, 1998
(# 3) Itonori, Ozaki, "Japanese word extraction using similar characters," Technical Report of IEICE, PRMU98-87, pp. 25-32, 1998
(# 4) Imagawa, Matsukawa, Kondo, Mekata, "Search method for text including misrecognition, with recognition reliability assigned to each character," Technical Report of IEICE, PRMU99-72, pp. 63-75, 1999
(# 5) Kondo, Matsukawa, Imagawa, Mekata, "A study on fuzzy search of text containing character recognition errors," Technical Report of IEICE, PRMU99-73, pp. 69-75, 1999
(# 6) Yusa, Tanaka, "Realization of a character string search function for Japanese document images," IPSJ SIG Notes, Information Media 19-1, pp. 1-8, 1995
(# 7) Nakanishi, Omachi, Aso, "Highly accurate keyword search system for low-quality document images," Technical Report of IEICE, PRMU98-232, pp. 97-104, 1999
(# 8) Matsukawa, Imagawa, Kondo, Mekata, "Improving document image search performance by combined use of shape feature search," Technical Report of IEICE, PRMU99-74, pp. 77-83, 1999
These conventional techniques can be classified as follows.
a) Method for collating character recognition candidates with keywords
A recognition candidate is generated at the time of character recognition, and the search is performed while considering a plurality of possibilities using the candidate at the time of search.
Reference (# 1)
The number of recognition candidates is not a fixed number, and threshold processing of similarity is performed for each character to narrow down recognition candidates (threshold values: 4 types).
b) Keyword expansion method using similar characters
b-1) Keyword character string expansion by similar character table
One keyword is expanded into multiple keywords using similar-character information, and a search is performed with them.
Reference (# 1)
Keywords are simply expanded using a confusion matrix prepared in advance. The number of similar characters per character is not fixed; similar characters are narrowed down in advance by thresholding the similarity for each character (threshold values: 4 types).
Reference (# 2)
A table is created in advance that stores probabilities based on error likelihood, expressing misrecognition, erroneous omission, erroneous insertion, erroneous merging, and erroneous splitting probabilistically, and the keyword is expanded with every combination whose probability in the table is nonzero. Next, a full-text search is performed using the expanded keywords. Then, for each character string extracted by the search, a probabilistically derived certainty factor of the character string is calculated, and its suitability is decided using a threshold.
b-2) Search by similar character table
A character string obtained by character recognition is searched using a similar character table, taking into account the multiple possible correspondences for each search.
Reference (# 3)
A plurality of characters having similar shapes are handled as one category, and search is performed by narrowing down to one by language processing.
[0006]
Dictionary creation)
First, similar character categories (clusters) are created by hierarchical clustering. Next, the similar character cluster is divided and the divided clusters are reintegrated.
[0007]
Identification)
The image is processed in the order of similar character identification, morphological analysis, and detailed identification to generate a word string. In similar character identification, 1000 or 2000 categories are used.
[0008]
First, a similar character category closest to the input character is obtained by hierarchical identification of reintegrated cluster → divided cluster → similar character cluster, and a character string made of the similar character category is created. Next, words using characters in the similar character category are generated by morphological analysis for the similar character category. At this time, only characters that cannot be narrowed down to one word by morphological analysis are determined by detailed identification.
Reference (# 4)
The number of characters expanded from the similar character table is varied for each character according to the certainty (reliability) of the recognition result. A similar character table consisting of correct characters, recognition results, and their certainty factors is created in advance, and at recognition time the character recognition results and their certainty factors are output. At search time, the matching probability with the search character is obtained from the keyword character, the recognition result character, and its certainty factor, using the similar character table. If that value is positive, the recognition result is regarded as matched, and the search is performed. This method is equivalent to a search limited to similar characters whose matching probability is positive.
Reference (# 5)
In addition to the search method of reference (# 4), which uses the character recognition reliability assigned to each character and the identifiability probability by number of characters, the search uses the probability that a word can be uniquely identified when k characters of an n-character word match. The identifiability probability by number of characters is the probability that no other word is retrieved even when part of a relatively long word is treated as a wildcard in the search.
c) Image search method
A keyword image is directly retrieved from a document image by image matching.
Reference (# 6)
First, the character rectangle is regarded as a square and divided into 16 equal parts, and a rectangle formed by combining two adjacent parts vertically or horizontally is taken as one unit. Then, among the combinations of two adjacent rectangles, those that form a square are extracted. One character yields 18 sets of combinations. Next, the black pixel density within each unit is obtained, and the ratio of black pixel densities between adjacent units, coded in 2 bits from 00 to 11, is used as the feature amount. A 36-bit feature amount is obtained per character, and errors of up to 8 bits are tolerated at matching time.
Reference (# 7)
Multiple segmentation candidates are output, and a feature vector is registered for each segmentation candidate. At search time, matching is performed character by character starting from the first character of the keyword, and a character is regarded as matched if its distance value is less than or equal to a fixed threshold. If the previous character matched, the segmentation candidates for the next character are narrowed down, which speeds up processing. The keyword is detected when the distance values of all characters are less than or equal to the threshold and the sum of the distance values is less than or equal to another fixed threshold.
d) Method combining similar character search and image search
Reference (# 8)
Similar character search and image search are switched according to the certainty (reliability) of the recognition result. The similar character search of reference (# 4) in b-2) is performed, and characters that satisfy the matching condition are extracted. When characters that were not detected by the similar character search and whose reliability is lower than a fixed threshold occur consecutively, a shape feature search is performed. In the shape feature search, adjacent low-reliability characters are integrated and cut out so that the width matches the width estimated from the keyword characters. Next, the feature amount of the character is obtained, and the Euclidean distance between it and the feature amount of the keyword character is computed. If the distance value is less than or equal to a fixed threshold, the character is regarded as matched.
[0009]
[Problems to be solved by the invention]
However, the conventional search technique described above has the following problems.
a) Method for collating character recognition candidates with keywords
Reference (# 1)
Depending on the font of the document image being searched, the correct character may not be among the candidate characters. In particular, when there are many similar characters and the correct character ranks low among the candidates, it is cut off by the limit on the number of candidates and does not appear among the candidate characters.
b) Keyword expansion method using similar characters
b-1)
Reference (# 1)
The number of expanded keywords increases exponentially depending on the size of the similar character table, and the amount of calculation becomes enormous.
Reference (# 2)
As processing targets expand, the number of expanded keywords increases explosively.
b-2)
Reference (# 3)
Since language processing is included, the accuracy is poor for a compound word or a word with a small number of characters.
Reference (# 4)
Accuracy cannot be expected depending on how the similar character table prepared in advance is constructed. In fact, the authors of this reference report that accuracy is poor for low-resolution characters.
Reference (# 5)
The method is weak for compound words and words with few characters. In addition, as with the method of reference (# 4), accuracy cannot be expected depending on how the similar character table prepared in advance is constructed.
c) Image search method
Reference (# 6)
High accuracy cannot be expected from the feature amounts used in this method. The amount of computation is also large compared with a text-only search. Furthermore, accuracy is not high for fonts other than those used to create the dictionary.
Reference (# 7)
The amount of computation is large compared with a text-only search. In addition, if the first character of the keyword cannot be detected, the corresponding character string cannot be found. Furthermore, accuracy is not high for fonts other than those used to create the dictionary.
d) A combination of similar character search and image search
Reference (# 8)
As with the method of reference (# 4), accuracy cannot be expected depending on how the similar character table prepared in advance is constructed.
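The growth in the number of expanded keywords noted under b-1) can be made concrete with a small sketch. The similar-character table below is purely hypothetical, as are the function names; the point is only that the number of expanded keywords is the product of the per-character alternatives, so it grows multiplicatively with keyword length and table size.

```python
from itertools import product
from math import prod


def expand_keyword(keyword: str, similar: dict[str, list[str]]) -> list[str]:
    # For each character, allow the character itself or any listed similar character.
    options = [[ch] + similar.get(ch, []) for ch in keyword]
    return ["".join(chars) for chars in product(*options)]


def expansion_count(keyword: str, similar: dict[str, list[str]]) -> int:
    # Number of expanded keywords without materializing them.
    return prod(1 + len(similar.get(ch, [])) for ch in keyword)


# Hypothetical similar-character table, for illustration only.
similar_table = {"富": ["冨"], "士": ["土"], "通": ["道", "週"]}
expanded = expand_keyword("富士通", similar_table)
print(len(expanded), expansion_count("富士通", similar_table))  # 12 12
```

Even with one or two alternatives per character, a three-character keyword already expands to 12 search strings in this example.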
[0010]
An object of the present invention is to provide a search apparatus and method for performing a more accurate search by leaving the good part of the search technique described above and improving the incomplete part in the search process of information after character recognition. That is.
[0011]
[Means for Solving the Problems]
FIG. 1 is a principle diagram of a search apparatus according to the present invention. The search apparatus of FIG. 1 includes input means 101, extraction means 102, estimation means 103, and output means 104, and searches for information in a document image using information on the recognition result of the document image.
[0012]
The input means 101 inputs a keyword. The extraction means 102 searches the document image for an arbitrary character constituting the keyword and, taking the retrieved character as a key character, extracts position information of the key character. The estimation means 103, using the position information of the extracted key character as a reference, calculates the leading-end position of the area corresponding to the keyword in the document image from the number of keyword characters located on the head side of the key character and at least one of character size information, which indicates the representative size of characters in the document image, and character spacing information, which indicates the representative character spacing. It likewise calculates the trailing-end position of that area from the number of keyword characters located on the tail side of the key character and at least one of the character size information and the character spacing information, and estimates the range between the leading-end position and the trailing-end position as the area corresponding to the keyword. The output means 104 outputs, as a search result, information on those estimated areas that contain a character string corresponding to the keyword.
[0013]
For example, the input means 101 inputs a keyword designated by the user to the search device, and the extraction means 102 extracts an arbitrary character included in the input keyword from the document image and passes it to the estimation means 103 as a key character. The estimation means 103 estimates, from the position of the received key character, an area that includes the key character and corresponds to the number of characters constituting the keyword. Then, the output means 104 outputs the position and range information of those estimated areas that contain a character string corresponding to the keyword.
[0014]
According to such a search device, it is possible to automatically estimate a keyword region on the basis of the position of a pattern corresponding to an arbitrary character constituting a keyword among character patterns included in a document image. Therefore, even if the first character of the keyword is not detected, if other characters are extracted as key characters, information on the area corresponding to the keyword is output. This improves the accuracy of the search process.
[0015]
For example, the input unit 101 in FIG. 1 corresponds to the input device 113 in FIG. 14 described later, and the extraction unit 102 and the estimation unit 103 in FIG. 1 correspond to the combination of the CPU 111 and the memory 112 in FIG. The output means 104 corresponds to the output device 114 in FIG.
[0016]
[Detailed Description of the Invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
The search apparatus according to the present embodiment first performs area identification and character recognition of input image information, and generates search information including a recognition result necessary for the search. Next, using the search information, the characters constituting the keyword are searched in the input image, and the range of the region where the keyword exists is estimated from the position of the characters. Then, the character string is verified using the image information in the area, and if a keyword is extracted, the position is output as a search result.
[0017]
FIG. 2 is a flowchart of the search information generation process. The search device first inputs a document image to be searched (step S1). At this time, the user optically reads a paper document with an image input device such as a scanner and inputs the paper document to the search device. Thereby, the document image is registered in the search device. As a document image, a monochrome binary image is basically assumed. However, even a color image can be handled in the same manner as a binary image by performing binarization as preprocessing.
[0018]
Next, the search device scans the document image, performs region identification by a known technique, and extracts a character region (step S2). Here, for example, the entire image is labeled to extract a circumscribed rectangle of the black pixel connection region, and a rectangle of a certain size or more is extracted as a figure / table region candidate. Next, ruled line extraction is performed for the inside of each rectangle of the figure / table area candidate. If vertical and horizontal ruled lines are extracted, the rectangle is identified as a table area. If no ruled line is extracted, the rectangle is identified as a figure region. Then, an area other than the figure / table area is extracted as a character area.
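The classification rule described above can be sketched as follows. This is a minimal illustration in Python, not the embodiment's implementation: the `Box` type, the `min_size` value of 200 pixels, and the treatment of a large rectangle with only one ruled-line direction as a figure are assumptions made for the example. Labeling and ruled line extraction are taken as already done.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Circumscribed rectangle of a black pixel connection region (hypothetical type)."""
    width: int
    height: int
    has_vertical_rule: bool = False    # result of ruled line extraction
    has_horizontal_rule: bool = False


def classify_region(box: Box, min_size: int = 200) -> str:
    """Return 'table', 'figure', or 'character' for one rectangle.

    min_size stands in for "a certain size"; its value is an assumption.
    """
    # Rectangles below the size limit are not figure/table candidates.
    if box.width < min_size and box.height < min_size:
        return "character"
    # Candidates with both vertical and horizontal ruled lines are tables.
    if box.has_vertical_rule and box.has_horizontal_rule:
        return "table"
    # Assumption: any other candidate is treated as a figure.
    return "figure"
```

Rectangles classified as "character" are the ones passed on to line extraction and character recognition in step S3.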
[0019]
Next, a line (character string) is extracted from the character area, character recognition is performed for each line, and search information including a recognition result is generated and output (step S3). At this time, an area of one character is determined based on a circumscribed rectangle of the black pixel connection area, and a character pattern is cut out from the image information.
[0020]
However, since the characters may be in contact with each other, the contact characters are divided using the contact character cut-out technique, and the search information is output in units of characters or character partial patterns (basic rectangles). As the contact character segmentation technique, for example, the technique of Japanese Patent Application Laid-Open No. 11-338975 (“a recording medium on which a character segmentation processing method and a character segmentation processing program are recorded”) is used.
[0021]
In addition, in order to accurately extract a character made up of separate parts, such as “川” (river), adjacent basic rectangles are integrated to create an integrated rectangle as a character candidate, and the same kind of search information is output for that rectangle.
[0022]
Hereinafter, the basic rectangle that is the output unit of the search information and the integrated rectangle are collectively referred to as a character rectangle. The search information is output line by line, and the information of each character rectangle is sorted and output within one line. The search information for each character rectangle includes the following.
a) Character recognition result information
This information is information of recognition candidates from the first position to the nth position obtained by character recognition (n ≧ 1). The content of each candidate information is a character code, a distance value, an evaluation value, or a certainty factor. All character codes are converted to full-width codes for searching. The distance value represents the similarity between the character pattern to be recognized and the character pattern in the recognition dictionary, and the evaluation value represents the evaluation for the recognition candidate. The certainty factor represents the likelihood of the recognition candidate, and is calculated using, for example, the distance value of the candidate and the distance value of another candidate.
b) Character graphic feature information
This information is information representing the graphic characteristics of a plurality of characters, and includes the following information.
・ Character pattern itself
・ Character pattern information (image features)
Information with strong discriminating power between characters is desirable. Specifically, a feature vector or the like is used. The information should be such that, when the feature vector retrieved from the recognition dictionary using the character code of one keyword character as a key is matched against this feature vector, identical characters reliably match and different characters do not. For example, the following feature vectors can be mentioned.
1) Detailed classification feature vector obtained from character pattern (288 dimensions)
A 288-dimensional feature vector is extracted from the character pattern using a known technique called multi-dimensional compression (Hai, Kabuyama, Yamamoto, "A method of handwritten Kanji recognition", IEICE, vol. J68-D, no. 4, pp. 773-780, 1985). In this method, first, a direction line element with one of four direction attributes is obtained for each contour point of the character pattern, the character pattern is nonlinearly normalized, and the pattern is then divided into a different number of regions for each direction. Then, the number of pixels of the direction line elements is counted for each region, and the count values are arranged in a prescribed order to form a vector. Thereby, a feature vector for detailed classification is obtained.
2) Large classification feature vectors obtained from character patterns (16 dimensions)
Based on the above-described detailed classification feature vector, eigenvalues and eigenvectors are obtained for each category, and 16 axes (eigenvectors) from the larger eigenvalue are obtained. Then, a 16-dimensional vector having a value obtained by projecting the representative vector of the category on each axis is generated. Thereby, a feature vector for large classification is obtained.
[0023]
When these feature vectors are used, the feature vector in the recognition dictionary corresponding to the character code of one keyword character is matched against the feature vector of a character in the search target document, and the resulting distance is compared with a threshold value. If the distance is equal to or less than the threshold, the two are judged to be the same character. The same distance threshold also serves to distinguish different characters from each other.
c) Character information
・ Character position
For example, the coordinate values of the upper left and lower right vertices of the circumscribed rectangle of the black pixel connection area of the character are used.
・ Size information
This information represents the size of a representative character rectangle included in the line to which the character belongs. For example, the mode value of the height and width of the character rectangle included in the line to which the character belongs is used.
・ Representative value of distance between characters in line
For example, the average value of the distance between the circumscribed rectangles of the characters in the line is used.
・ Information indicating the arrangement and order of the character rectangles in the line
For example, in the case of a horizontally written line, as shown in FIG. 3, serial numbers 0, 1, 2, ... are assigned in order to the gaps on either side of the basic rectangles. Then, the gap number on the left side of a character rectangle is set as its start number and the gap number on the right side as its end number, and these together form the order information of the character rectangle.
[0024]
Of the character rectangles in FIG. 3, rectangles 1 to 9 correspond to basic rectangles, and rectangles 10 to 15 correspond to integrated rectangles obtained by integrating a plurality of basic rectangles. For example, the rectangle 10 is generated by integrating the rectangle 1 and the rectangle 2, and the rectangle 14 is generated by integrating the rectangle 1, the rectangle 2, and the rectangle 3. These rectangles represent a plurality of different cutout results for one row. Also, the order information of these rectangles is as shown in FIG.
d) Line information
・ Line number
The line number represents a unique number assigned to each line in the page.
・ Row direction
The row direction represents vertical or horizontal.
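The items a) to d) above can be pictured as one record per character rectangle. The following Python sketch is only an illustration under stated assumptions: the class names `Candidate` and `CharRect`, the field names, and the helper `integrate` are hypothetical and do not appear in the embodiment. The helper shows how the start and end numbers of an integrated rectangle follow from the basic rectangles it spans.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    code: str          # character code (converted to full-width)
    distance: float    # distance to the pattern in the recognition dictionary
    confidence: float  # certainty factor of this candidate


@dataclass
class CharRect:
    candidates: List[Candidate]          # a) 1st to n-th recognition candidates
    feature: Tuple[float, ...]           # b) image feature vector
    box: Tuple[int, int, int, int]       # c) (x1, y1, x2, y2) of the rectangle
    start: int                           # c) gap number on the left side
    end: int                             # c) gap number on the right side
    line_no: int                         # d) line number within the page
    direction: str = "horizontal"        # d) "horizontal" or "vertical"
    char_size: Tuple[int, int] = (0, 0)  # c) representative (width, height) in the line
    char_gap: float = 0.0                # c) representative distance between characters


def integrate(parts: List[CharRect]) -> CharRect:
    """Build an integrated rectangle from adjacent basic rectangles.

    Candidates and features are left empty here; in practice they would
    come from recognizing the merged pattern again (assumption).
    """
    x1s, y1s, x2s, y2s = zip(*(p.box for p in parts))
    first, last = parts[0], parts[-1]
    return CharRect(
        candidates=[], feature=(),
        box=(min(x1s), min(y1s), max(x2s), max(y2s)),
        start=first.start, end=last.end,
        line_no=first.line_no, direction=first.direction,
        char_size=first.char_size, char_gap=first.char_gap,
    )
```

For instance, two basic rectangles with order information (0, 1) and (1, 2) yield an integrated rectangle with order information (0, 2).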
[0025]
FIG. 5 is a flowchart of the full-text search process based on the output search information. First, the user inputs a keyword to be searched for (step S11). Next, the search device extracts characters (key characters) constituting the keyword from the search target file storing the search information (step S12), and estimates, based on the key characters, a region where a character string corresponding to the keyword exists (step S13).
[0026]
Next, the image information in the area is compared with the information in the recognition dictionary to verify whether or not the keyword is included in the area, and a character string that matches the keyword is extracted (step S14). Then, the position information of the extracted character string is output as a search result (step S15), and the process ends.
[0027]
Next, a specific example of the full-text search process will be described with reference to FIGS.
FIGS. 6 and 7 are flowcharts of the key character extraction process performed in step S12 of FIG. In this process, a key character search based on the certainty factor of the recognition candidates is performed. When the character code of one character in the input keyword matches the character code of a character in the search target file (including the second and lower recognition candidates) and the certainty factor exceeds a certain threshold value, the character is extracted as a key character.
[0028]
First, the search device creates an array keycode that holds each character in the keyword (step S21 in FIG. 6), and puts the number of characters in the keyword into a variable keyword_num (step S22). Next, 0 is put into a variable extract_num representing the number of extracted key characters (step S23), and one character as a recognition result is extracted from the search target file (step S24).
[0029]
Next, 1 is put into the variable i representing the rank of the recognition candidates (step S25), 1 is put into the variable j representing the index of the array keycode (step S26), and the character code of the i-th candidate is compared with the character code of the j-th character of keycode (step S27).
[0030]
If the two character codes match, the confidence of the i-th candidate is compared with a certain threshold value (step S28). If the certainty factor exceeds the threshold value, the character extracted in step S24 is regarded as a key character, and the j-th character code of the keycode and the position of the extracted character are recorded (step S29). Then, extract_num = extract_num + 1 is set (step S30).
[0031]
Next, j = j + 1 is set (step S31 in FIG. 7), and j is compared with keyword_num (step S32). If j is less than or equal to keyword_num, the processes in and after step S27 are repeated.
[0032]
If the two character codes do not match in step S27, or if the certainty factor is less than or equal to the threshold value in step S28, it is determined that the character extracted in step S24 is not a key character, and the processing from step S31 onward is performed.
[0033]
If j exceeds keyword_num in step S32, then i = i + 1 is set (step S33), and i is compared with the number of recognition candidates (step S34). If i is equal to or less than the number of recognition candidates, the processes in and after step S26 are repeated.
[0034]
If i exceeds the number of recognition candidates, it is next checked whether or not the characters in the search target file have ended (step S35). If unprocessed characters remain, the processing from step S24 onward is repeated for the next character. Then, when all characters are processed, the process is terminated.
[0035]
In this process, an evaluation value or a distance value can be used instead of the certainty factor of the recognition candidate. When the evaluation value is used, the evaluation value of the recognition candidate is compared with a certain threshold value in step S28, and if the evaluation value exceeds the threshold value, the character of the search target file is regarded as a key character. If the distance value is used, the distance value of the recognition candidate is compared with a certain threshold value in step S28, and if the distance value falls below the threshold value, the character of the search target file is regarded as a key character.
[0036]
According to the key character extraction process described above, since each character in the search target file is collated with a plurality of recognition candidates and the characters constituting the keyword, the key character in the document image is easily detected. Further, since the key character is extracted based on the certainty factor, the evaluation value, or the distance value of the recognition candidate, the key character extraction accuracy is improved.
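The loop structure of steps S21 to S35 can be sketched as follows. This is a minimal Python illustration, assuming hypothetical record types `Candidate`, `RecognizedChar`, and `KeyChar`; indices are 0-based here, whereas the flowchart counts from 1. The 0-based position of the matched character within the keyword is recorded in addition to the character code and position, because the later region estimation needs it.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    code: str          # character code of the recognition candidate
    confidence: float  # certainty factor


class RecognizedChar(NamedTuple):
    position: Tuple[int, int, int, int]  # (x1, y1, x2, y2) of the character rectangle
    candidates: List[Candidate]          # 1st to n-th candidates


class KeyChar(NamedTuple):
    code: str
    index: int                           # 0-based position within the keyword
    position: Tuple[int, int, int, int]


def extract_key_chars(keyword: str, chars: List[RecognizedChar],
                      threshold: float) -> List[KeyChar]:
    keycode = list(keyword)                       # S21, S22
    found: List[KeyChar] = []                     # S23 (extract_num == len(found))
    for ch in chars:                              # S24, S35
        for cand in ch.candidates:                # S25, S33, S34
            for j, key in enumerate(keycode):     # S26, S31, S32
                # S27: codes match, S28: certainty factor exceeds threshold
                if cand.code == key and cand.confidence > threshold:
                    found.append(KeyChar(key, j, ch.position))  # S29, S30
    return found
```

To use a distance value instead of the certainty factor, the comparison in the innermost test would be reversed so that a candidate qualifies when its distance falls below the threshold.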
[0037]
FIGS. 8 and 9 are flowcharts of the keyword region estimation process performed in step S13 of FIG. In this process, the position of the key character within the keyword is obtained from the character information of the extracted key character. Then, an estimated keyword region surrounding the key character is obtained from that position and from character range information such as the number of characters in the keyword and the character width. For example, in the case of horizontal writing, a keyword region extending to the left and right of the key character is generated, and in the case of vertical writing, a keyword region extending above and below the key character is generated.
[0038]
First, the search device puts 1 in a variable i indicating one of the extracted key characters (step S41 in FIG. 8), and puts 0 in a variable kwnum representing the number of extracted keyword regions (step S42). Then, the character code of the i-th key character and its position in the search target file are acquired (step S43).
[0039]
Next, the keyword area in the document image is determined from the character information such as the position and size of the key character (the width or height of the character in the line to which the key character belongs), the character spacing distance, and the position of the key character in the keyword. Estimate (step S44).
[0040]
For example, consider a case where “富士通研究所” (Fujitsu Laboratories) is input as a keyword and “通” is extracted as a key character. In this example, as shown in FIG. 10, an xy coordinate system is set in the document image, and the character information of the extracted character “通” records, as its position, the coordinates (x, y) = (200, 500) of the upper left vertex of the character rectangle and the coordinates (250, 550) of the lower right vertex. It is further recorded that the representative character width is 50 dots and the distance between characters is 10 dots.
[0041]
In this case, the y coordinate 500 of the upper left vertex and the y coordinate 550 of the lower right vertex are adopted as the coordinate values of the upper and lower ends of the estimated keyword area, respectively, and the coordinate values of the left and right ends are calculated by the following equations.
Left coordinate:
(x coordinate of upper left vertex) - (character width + distance between characters) × (number of characters to the left of the key character)
= 200 - (50 + 10) × 2
= 80
Right coordinate:
(x coordinate of lower right vertex) + (character width + distance between characters) × (number of characters to the right of the key character)
= 250 + (50 + 10) × 3
= 430
Therefore, expressed by the coordinates of its upper left and lower right vertices, the range of the estimated keyword area is (80, 500)-(430, 550).
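The calculation above can be reproduced with a short function. This is a sketch for horizontal writing only; the function name `estimate_keyword_region` and its parameters are hypothetical, and the key character is identified by its 0-based index within the keyword. The numbers are those of the example: a six-character keyword whose third character is the key character.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): upper left and lower right vertices


def estimate_keyword_region(key_box: Box, key_index: int, keyword_len: int,
                            char_width: int, char_gap: int) -> Box:
    """Estimate the keyword area around one key character (horizontal line)."""
    x1, y1, x2, y2 = key_box
    left_count = key_index                     # characters on the head side
    right_count = keyword_len - key_index - 1  # characters on the tail side
    pitch = char_width + char_gap              # width of one character plus one gap
    return (x1 - pitch * left_count, y1, x2 + pitch * right_count, y2)


# Key character at (200, 500)-(250, 550), two characters to its left and
# three to its right, character width 50 dots, spacing 10 dots.
region = estimate_keyword_region((200, 500, 250, 550), 2, 6, 50, 10)
print(region)  # (80, 500, 430, 550)
```

For vertical writing the same idea would apply to the y coordinates, with the character height in place of the width.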
[0042]
Next, the search device puts 1 in the variable j indicating one of the already registered keyword areas, puts 0 in the variable kwnum2, a flag indicating overlap with an already registered keyword area (step S45), and checks whether or not the estimated keyword area overlaps with the j-th already registered keyword area (step S46).
[0043]
If the two keyword areas overlap, a rectangular area surrounding the keyword areas is obtained as a new keyword area, and the coordinates of the jth registered keyword area are updated (step S47). As a result, the range of the j-th registered keyword area is replaced with the new keyword area range. Then, kwnum2 = 1 is set.
[0044]
Next, j = j + 1 is set (step S48), and j and kwnum are compared (step S49 in FIG. 9). If j is less than or equal to kwnum, the processing from step S46 is repeated, and if j exceeds kwnum, the value of kwnum2 is checked next (step S50).
[0045]
If kwnum2 = 0, an estimated keyword area is newly registered, and kwnum = kwnum + 1 is set (step S51). Next, i = i + 1 is set (step S52), and it is checked whether there is information on the i-th extracted key character (step S53). If there is an i-th extracted key character, the processing from step S43 is repeated, and if the extracted key character information ends, the processing ends.
[0046]
In step S46, if the two keyword regions do not overlap, the process from step S48 is performed as it is. If kwnum2> 0 in step S50, the process from step S52 is performed.
[0047]
According to such a keyword area estimation process, it is possible to accurately estimate the range of the keyword area including the character based on the position of the extracted key character. Even if the first character of a keyword is not extracted as a key character, if another character is extracted as a key character, the corresponding keyword region can be estimated from the position of the character. Furthermore, when a plurality of key characters constitute one keyword area, the keyword area can be correctly extracted.
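The registration and merging of estimated keyword areas in steps S45 to S51 can be sketched as follows. This Python fragment is an illustration only: the helper names are hypothetical, and treating rectangles that merely touch as overlapping is an assumption, since the overlap test itself is not specified in the text.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any point (touching counts; assumption)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enclosing(a: Box, b: Box) -> Box:
    """Smallest rectangle surrounding both areas (step S47)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def register_regions(estimated: List[Box]) -> List[Box]:
    registered: List[Box] = []          # kwnum == len(registered)
    for region in estimated:            # one estimated area per key character
        merged = False                  # plays the role of kwnum2
        for j, existing in enumerate(registered):       # S45, S48, S49
            if overlaps(region, existing):              # S46
                registered[j] = enclosing(region, existing)  # S47
                merged = True
        if not merged:                  # S50
            registered.append(region)   # S51
    return registered
```

When two key characters of the same keyword occurrence each produce an estimated area, the areas overlap and are registered as a single keyword area.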
[0048]
FIG. 11 is a flowchart of the keyword extraction process performed in step S14 of FIG. In this process, for the character rectangles within the estimated keyword area, the character pattern image is matched against the corresponding character in the recognition dictionary to verify whether the character string in the area is really the keyword.
[0049]
For example, all combinations of non-overlapping rectangles in the keyword area are obtained, and for each combination, the image feature obtained from the character pattern in each rectangle is matched against the image feature of the corresponding keyword character obtained from the dictionary. These combinations correspond to different character cutout results. When there is a combination whose degree of matching is equal to or greater than a certain threshold (for example, whose sum of distance values is equal to or less than a certain threshold), the keyword is considered to have been detected.
[0050]
For example, when the character rectangle shown in FIG. 3 is included in the keyword area, the following combinations of rectangles are conceivable.
・ Rectangle 1 to rectangle 9
・ Rectangle 10 and rectangle 3 to rectangle 9
・ Rectangle 10, rectangle 13, and rectangle 4 to rectangle 9
・ Rectangle 10, rectangle 11, and rectangle 5 to rectangle 9
・ Rectangle 10 to rectangle 12 and rectangle 7 to rectangle 9
・ Rectangle 1, rectangle 2, rectangle 11, and rectangle 5 to rectangle 9
・ Rectangle 1, rectangle 2, rectangle 11, rectangle 12, and rectangle 7 to rectangle 9
・ Rectangle 1 to rectangle 4, rectangle 12, and rectangle 7 to rectangle 9
・ Rectangle 1, rectangle 13, and rectangle 4 to rectangle 9
・ Rectangle 1, rectangle 13, rectangle 4, rectangle 12, and rectangle 7 to rectangle 9
・ Rectangle 14 and rectangle 4 to rectangle 9
・ Rectangle 14, rectangle 4, rectangle 12, and rectangle 7 to rectangle 9
・ Rectangle 14, rectangle 15, and rectangle 6 to rectangle 9
・ Rectangle 10, rectangle 3, rectangle 15, and rectangle 6 to rectangle 9
・ Rectangle 1, rectangle 13, rectangle 15, and rectangle 6 to rectangle 9
FIG. 11 shows the verification process for one estimated keyword area; the search device performs the same process for all estimated keyword areas and outputs only the areas where the keyword is detected as search results. It is assumed that the rectangles included in each combination in the estimated keyword area are sorted in advance in order from the head of the corresponding line.
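One way to obtain such combinations is to treat the start and end numbers of each character rectangle as the ends of a segment and to enumerate every chain of rectangles that runs from the first gap to the last without overlap. The Python sketch below assumes this approach; the function name and the three-character example line are hypothetical and are not the rectangles of FIG. 3.

```python
from typing import Dict, List, Tuple


def enumerate_combinations(rects: Dict[int, Tuple[int, int]],
                           first_gap: int, last_gap: int) -> List[List[int]]:
    """List every chain of rectangle ids covering first_gap..last_gap.

    rects maps a rectangle id to its (start number, end number);
    each end number must be greater than its start number.
    """
    by_start: Dict[int, List[int]] = {}
    for rect_id, (start, _end) in rects.items():
        by_start.setdefault(start, []).append(rect_id)

    results: List[List[int]] = []

    def walk(gap: int, chain: List[int]) -> None:
        if gap == last_gap:              # reached the end of the area
            results.append(list(chain))
            return
        for rect_id in by_start.get(gap, []):
            chain.append(rect_id)
            walk(rects[rect_id][1], chain)   # continue from this rectangle's end number
            chain.pop()

    walk(first_gap, [])
    return results


# Hypothetical line: basic rectangles 1 to 3, integrated rectangles
# 10 (= 1 + 2), 13 (= 2 + 3), and 14 (= 1 + 2 + 3).
rects = {1: (0, 1), 2: (1, 2), 3: (2, 3), 10: (0, 2), 13: (1, 3), 14: (0, 3)}
print(enumerate_combinations(rects, 0, 3))
# [[1, 2, 3], [1, 13], [10, 3], [14]]
```

Because each chain follows the order information from the head of the line, the rectangles in every combination come out already sorted.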
[0051]
First, the search device takes one of the combinations of rectangles in the keyword area and selects the first rectangle of the combination as the current rectangle (step S61). Next, 0 is set in the variable num indicating the number of characters in the area that match characters in the keyword, and 1 is set in the variable k indicating a character position in the keyword (step S62). Then, the recursive matching process is called (step S63). In the recursive matching process, each character pattern in the area is matched against the character pattern of the keyword using image features, and the number of matched characters is returned as num.
[0052]
Next, the return value num is compared with the number of characters of the keyword (step S64), and if they do not match, it is checked whether there is another combination in the same keyword area (step S65). If there is another combination, the processing from step S61 onward is repeated for that combination.
[0053]
In step S64, if the num matches the number of keyword characters, the number of areas in which the keyword is detected is incremented (step S66), and the process of step S65 is performed. In step S65, when the verification of all the combinations in the keyword area is completed, the process is terminated.
[0054]
In the recursive matching process in step S63, first, a rectangular character pattern and character information in the keyword area are acquired. The verification methods used in the subsequent processing include the following methods.
a) Method of verifying all rectangular character patterns
For all rectangles in the region, alignment processing using image information and threshold processing are performed to verify whether the target rectangle is a character constituting a keyword.
[0055]
Specifically, all rectangles in the region are matched based on image information, and rectangles with a distance equal to or less than a threshold value are regarded as being matched with keyword characters. Then, it is considered that the keyword has been extracted when all the characters of the keyword match the rectangle in the area. According to this method, since all the rectangles are verified using the image information, the keyword extraction accuracy is improved.
b) Method for verifying only rectangular character patterns with low confidence
Rectangles in the area whose certainty factor satisfies a predetermined condition are excluded, and matching processing and threshold processing using image feature information are performed on the remaining rectangles.
[0056]
Specifically, out of the rectangles in the region, the rectangle whose confidence level is higher than the threshold value is considered to be already matched with the keyword character, and the rectangle whose confidence level is equal to or lower than the threshold value is determined by the image information. Perform matching. A rectangle with a distance equal to or less than the threshold value is regarded as being matched with the keyword character, and if all the keyword characters are matched with the rectangle in the region, the keyword is regarded as being extracted. According to this method, the processing time can be reduced without impairing the keyword extraction accuracy.
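The saving offered by method b) can be illustrated with a small Python sketch. It makes a simplifying assumption that is not stated in the text: the rectangles of one combination are taken to be already aligned one-to-one with the keyword characters. The function name and the callable `distance_of`, which stands for the image-feature matching of one rectangle, are hypothetical.

```python
from typing import Callable, Sequence


def keyword_matches(confidences: Sequence[float],
                    distance_of: Callable[[int], float],
                    conf_threshold: float,
                    dist_threshold: float) -> bool:
    """Verify a candidate area, matching images only where confidence is low."""
    for k, confidence in enumerate(confidences):
        if confidence > conf_threshold:
            continue                     # regarded as already matched; no image matching
        if distance_of(k) > dist_threshold:
            return False                 # this rectangle does not match keyword character k
    return True
```

Only rectangles whose certainty factor is at or below `conf_threshold` trigger a call to `distance_of`, which is where the reduction in processing time comes from.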
[0057]
FIGS. 12 and 13 are flowcharts of the recursive matching process using the verification method a). First, the search device compares the start number of the current rectangle with the end number of the previous rectangle (step S71 in FIG. 12). If they do not match, it is next checked whether the start number of the current rectangle is 0 (step S72). If the start number of the current rectangle is 0, it is next checked whether or not the line number of the current rectangle is the line number of the previous line + 1 (step S73).
[0058]
If the line number of the current rectangle is the line number of the previous line + 1, the image feature (feature vector, etc.) of the character in the current rectangle is compared with the image feature of the k-th character of the keyword (step S74). If the distance between the image features is equal to or smaller than the threshold value, the character in the current rectangle is considered to match the k-th character of the keyword, and num = num + 1 is set (step S75).
[0059]
Next, k + 1 is compared with the number of characters of the keyword (step S76 in FIG. 13). If k + 1 is equal to or less than the number of keyword characters, k = k + 1 is set (step S77), and the next rectangle after the current rectangle is set as a new current rectangle (step S78). As the next rectangle, a rectangle having the end number of the current rectangle as the start number is selected. Then, the recursive matching process is called to obtain the return value num (step S79), the num is returned to the caller (step S80), and the process ends.
[0060]
In step S71, if the start number of the current rectangle matches the end number of the previous rectangle, the processing from step S74 is performed. If the start number is not 0 in step S72, if the line number is not the line number of the previous line + 1 in step S73, or if the character of the current rectangle does not match the k-th character of the keyword in step S74, the process of step S80 is performed directly. In step S76, the process of step S80 is also performed when k + 1 exceeds the number of keyword characters.
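The flow of steps S71 to S80 can be sketched as a recursive function. The following Python fragment is an illustration under several assumptions: the `Rect` type is hypothetical, the Euclidean distance stands in for the unspecified feature distance, the first rectangle of a combination is accepted without a continuity check, the next rectangle is simply the next element of the sorted combination, and indices are 0-based.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Rect:
    start: int                  # start number (gap on the left side)
    end: int                    # end number (gap on the right side)
    line_no: int
    feature: Tuple[float, ...]  # image feature of the character pattern


def continues(prev: Optional[Rect], cur: Rect) -> bool:
    """S71 to S73: cur follows prev directly, or opens the next line."""
    if prev is None:            # first rectangle of the combination (assumption)
        return True
    if cur.start == prev.end:   # S71
        return True
    return cur.start == 0 and cur.line_no == prev.line_no + 1  # S72, S73


def recursive_match(combo: Sequence[Rect], pos: int, k: int, num: int,
                    keyword_features: Sequence[Tuple[float, ...]],
                    threshold: float) -> int:
    """Return the number of keyword characters matched from combo[pos] onward."""
    cur = combo[pos]
    prev = combo[pos - 1] if pos > 0 else None
    if not continues(prev, cur):
        return num                                                 # S80
    if math.dist(cur.feature, keyword_features[k]) > threshold:   # S74
        return num                                                 # S80
    num += 1                                                       # S75
    # S76: stop when the keyword (or the combination) is exhausted
    if k + 1 < len(keyword_features) and pos + 1 < len(combo):
        return recursive_match(combo, pos + 1, k + 1, num,         # S77 to S79
                               keyword_features, threshold)
    return num                                                     # S80
```

The caller compares the returned count with the number of keyword characters, as in step S64, and treats the keyword as detected when they are equal.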
[0061]
When the verification method of b) is used, the same processing as in FIG. 12 and FIG. 13 is performed on the rectangles between the rectangles with high certainty instead of all the rectangles in the region. In this case, in step S64 of FIG. 11, the number of remaining characters (partial character strings) excluding characters with high certainty among the characters in the keyword is used as the number of keyword characters.
[0062]
Moreover, in the verification method of b), the same process can be performed using an evaluation value or a distance value instead of the certainty factor. In this case, the matching process and the threshold process are performed on the remaining rectangles except for the rectangles in the keyword area whose evaluation value or distance value satisfies a predetermined condition.
[0063]
The pattern extraction apparatus of this embodiment can be configured using, for example, an information processing apparatus (computer) as shown in FIG. 14. The information processing apparatus of FIG. 14 includes a CPU (central processing unit) 111, a memory 112, an input device 113, an output device 114, an external storage device 115, a medium driving device 116, a network connection device 117, and an image input device 118, which are connected to each other by a bus 119.
[0064]
The memory 112 includes, for example, a ROM (read only memory), a RAM (random access memory), and the like, and stores programs and data used for processing. The CPU 111 performs necessary processing by executing a program using the memory 112.
[0065]
The input device 113 is, for example, a keyboard, a pointing device, a touch panel, and the like, and is used for inputting instructions and information from the user. The output device 114 is, for example, a display, a printer, a speaker, or the like, and is used for outputting an inquiry to a user and a processing result.
[0066]
The external storage device 115 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, or the like. The information processing apparatus stores the above-described program and data in the external storage device 115, and can use them by loading them into the memory 112 as necessary. The external storage device 115 is also used as a database for storing search target files.
[0067]
The medium driving device 116 drives the portable recording medium 120 and accesses the recorded contents. As the portable recording medium 120, any computer-readable recording medium such as a memory card, a floppy disk, a CD-ROM (compact disk read only memory), an optical disk, or a magneto-optical disk is used. The user can store the above-described program and data in the portable recording medium 120 and load them into the memory 112 for use as necessary.
[0068]
The network connection device 117 is used for connection to an arbitrary communication network such as a local area network (LAN) and performs the data conversion that accompanies communication. The information processing apparatus can receive the above-described program and data from an external apparatus via the network connection device 117, load them into the memory 112, and use them.
[0069]
The image input device 118 is an imaging device such as a CCD camera or a scanner, and is used for inputting a document image including a color image.
FIG. 15 shows computer-readable recording media that can supply a program and data to the information processing apparatus of FIG. 14. Programs and data stored in the portable recording medium 120 or the external database 121 are loaded into the memory 112. The CPU 111 then executes the program using the data and performs the necessary processing.
[0070]
In the embodiment described above, rectangular coordinates are used as character position information. However, in the present invention, coordinates of any other shape can be used. Likewise, a shape other than a rectangle may be used for the estimated keyword region.
(Supplementary Note 1) A search device for searching for information in a document image using information on a recognition result of the document image, comprising:
input means for inputting a keyword;
extracting means for extracting, from the document image, an arbitrary character constituting the keyword as a key character;
estimating means for estimating an area corresponding to the keyword in the document image based on position information of the key character; and
output means for outputting a search result based on information on the estimated area.
(Supplementary Note 2) The search device according to Supplementary Note 1, wherein the extracting means extracts the key character by comparing a plurality of recognition candidates for characters in the document image with an arbitrary character constituting the keyword.
(Supplementary Note 3) The search device according to Supplementary Note 1, wherein the extracting means extracts the key character using one of a certainty factor, an evaluation value, and a distance value of the recognition candidates for characters in the document image.
(Supplementary Note 4) The search device according to Supplementary Note 1, wherein the estimating means calculates the range of the area corresponding to the keyword using the position information of the key character and at least one of size information of characters in the document image and character spacing information.
(Supplementary Note 5) The search device according to Supplementary Note 1, further comprising verification means for verifying, based on character patterns in the estimated area, whether the character string in the estimated area is the keyword, wherein the output means outputs, as the search result, information on an area including a character string verified to be the keyword.
(Supplementary Note 6) The search device according to Supplementary Note 5, wherein the verification means performs a matching process and a threshold process using image feature information on all character patterns in the estimated area to verify whether each target character pattern is a character constituting the keyword.
(Supplementary Note 7) The search device according to Supplementary Note 5, wherein the verification means excludes, from the character patterns in the estimated area, any character pattern for which one of a certainty factor, an evaluation value, and a distance value satisfies a predetermined condition, and performs a matching process and a threshold process using image feature information on the remaining character patterns to verify whether each target character pattern is a character constituting the keyword.
(Supplementary Note 8) The search device according to Supplementary Note 5, wherein the verification means performs a matching process and a threshold process using image feature information on each of a plurality of combinations of character patterns corresponding to a plurality of character segmentation results, and verifies whether each combination corresponds to the keyword.
(Supplementary Note 9) The search device according to Supplementary Note 1, further comprising generation means for recognizing the document image in advance and generating search information including the recognition result, and storage means for storing the search information as a search target file, wherein the extracting means and the estimating means perform processing using the search information stored in the storage means.
(Supplementary Note 10) A computer-readable recording medium on which is recorded a program for a computer that searches for information in a document image using information on a recognition result of the document image, the program causing the computer to execute processing comprising:
extracting, from the document image, an arbitrary character constituting a keyword as a key character;
estimating an area corresponding to the keyword in the document image based on position information of the key character; and
outputting a search result based on information on the estimated area.
(Supplementary Note 11) A search method for searching for information in a document image using information on a recognition result of the document image, comprising:
specifying a keyword;
extracting, from the document image, an arbitrary character constituting the keyword as a key character;
estimating an area corresponding to the keyword in the document image based on position information of the key character; and
presenting a search result based on information on the estimated area.
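The extraction and estimation steps of Supplementary Notes 1, 2, and 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment. It assumes horizontal writing on a single text line. The `CharBox` record, the function names, and the choice to use both a representative character size and a representative character spacing are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CharBox:
    """One recognized character rectangle (hypothetical record)."""
    candidates: List[str]  # recognition candidates, best first
    left: float            # x coordinate of the left edge
    right: float           # x coordinate of the right edge


def extract_key_characters(boxes: List[CharBox],
                           keyword: str) -> List[Tuple[int, int]]:
    """Return (box index, keyword index) pairs where any recognition
    candidate of the box equals a character of the keyword."""
    hits = []
    for i, box in enumerate(boxes):
        for k, ch in enumerate(keyword):
            if ch in box.candidates:
                hits.append((i, k))
    return hits


def estimate_region(box: CharBox, k: int, keyword_len: int,
                    char_size: float, char_gap: float) -> Tuple[float, float]:
    """Estimate the horizontal extent of the keyword from one key character
    found at keyword index k."""
    pitch = char_size + char_gap   # assumed advance per character
    before = k                     # keyword characters preceding the key character
    after = keyword_len - 1 - k    # keyword characters following the key character
    start = box.left - before * pitch
    end = box.right + after * pitch
    return start, end


# Usage: only the third character of the keyword was recognized correctly.
boxes = [CharBox(["丈"], 0, 10), CharBox(["宇"], 12, 22),
         CharBox(["認", "諾"], 24, 34), CharBox(["織"], 36, 46)]
hits = extract_key_characters(boxes, "文字認識")
region = estimate_region(boxes[2], 2, 4, char_size=10, char_gap=2)
```

In this usage example, a single key character is enough to recover an estimated area that spans all four character rectangles, even though three of the four characters were misrecognized.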
[0071]
[Effects of the Invention]
According to the present invention, a character recognition result is effectively used in a search process for information in an image, and a highly accurate search can be performed. As a result, even for words in document images that could not be searched due to misrecognition in the past, a full-text search using keywords is possible.
[Brief description of the drawings]
FIG. 1 is a principle diagram of a search device according to the present invention.
FIG. 2 is a flowchart of search information generation processing.
FIG. 3 is a diagram illustrating a character rectangle.
FIG. 4 is a diagram illustrating order information of character rectangles.
FIG. 5 is a flowchart of full-text search processing.
FIG. 6 is a flowchart (part 1) of key character extraction processing.
FIG. 7 is a flowchart (part 2) of key character extraction processing.
FIG. 8 is a flowchart (part 1) of keyword region estimation processing.
FIG. 9 is a flowchart (part 2) of keyword region estimation processing.
FIG. 10 is a diagram showing keyword regions.
FIG. 11 is a flowchart of keyword extraction processing.
FIG. 12 is a flowchart (part 1) of recursive matching processing.
FIG. 13 is a flowchart (part 2) of recursive matching processing.
FIG. 14 is a configuration diagram of an information processing apparatus.
FIG. 15 is a diagram illustrating a recording medium.
[Explanation of symbols]
101 Input means
102 Extraction means
103 Estimating means
104 Output means
111 CPU
112 Memory
113 Input device
114 Output device
115 External storage device
116 Medium driving device
117 Network connection device
118 Image input device
119 Bus
120 Portable recording medium
121 Database

Claims

A search device for searching for information in a document image using information on a recognition result of the document image, comprising:
input means for inputting a keyword;
extraction means for searching the document image for an arbitrary character constituting the keyword and, with the found character as a key character, extracting position information of the key character;
estimation means for calculating, with the position information of the key character as a reference, a leading end position of a region corresponding to the keyword in the document image using character count information indicating the number of characters that precede the key character among the characters constituting the keyword and at least one of character size information indicating a representative size of characters in the document image and character spacing information indicating a representative character spacing, calculating a trailing end position of the region corresponding to the keyword using the number of characters that follow the key character among the characters constituting the keyword and at least one of the character size information and the character spacing information, and estimating the range between the obtained leading end position and trailing end position as the region corresponding to the keyword; and
output means for outputting, as a search result, information on a region, among the estimated regions, that includes a character string corresponding to the keyword.

The search device according to claim 1, further comprising verification means for verifying, based on character patterns in the estimated region, whether the character string in the estimated region is the keyword, wherein the output means outputs, as the search result, information on a region including a character string verified to be the keyword.

A computer-readable recording medium storing a program for a computer that searches for information in a document image using information on a recognition result of the document image, the program causing the computer to execute processing comprising:
searching the document image for an arbitrary character constituting a keyword and, with the found character as a key character, extracting position information of the key character;
calculating, with the position information of the key character as a reference, a leading end position of a region corresponding to the keyword in the document image using character count information indicating the number of characters that precede the key character among the characters constituting the keyword and at least one of character size information indicating a representative size of characters in the document image and character spacing information indicating a representative character spacing, calculating a trailing end position of the region corresponding to the keyword using the number of characters that follow the key character among the characters constituting the keyword and at least one of the character size information and the character spacing information, and estimating the range between the obtained leading end position and trailing end position as the region corresponding to the keyword; and
outputting, as a search result, information on a region, among the estimated regions, that includes a character string corresponding to the keyword.

A search method for searching for information in a document image using information on a recognition result of the document image, wherein:
input means inputs a keyword;
extraction means searches the document image for an arbitrary character constituting the keyword and, with the found character as a key character, extracts position information of the key character;
estimation means calculates, with the position information of the key character as a reference, a leading end position of a region corresponding to the keyword in the document image using character count information indicating the number of characters that precede the key character among the characters constituting the keyword and at least one of character size information indicating a representative size of characters in the document image and character spacing information indicating a representative character spacing, calculates a trailing end position of the region corresponding to the keyword using the number of characters that follow the key character among the characters constituting the keyword and at least one of the character size information and the character spacing information, and estimates the range between the obtained leading end position and trailing end position as the region corresponding to the keyword; and
output means outputs, as a search result, information on a region, among the estimated regions, that includes a character string corresponding to the keyword.