JP3673553B2

JP3673553B2 - Filing equipment

Info

Publication number: JP3673553B2
Application number: JP07651395A
Authority: JP
Inventors: 和之齋藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-03-31
Filing date: 1995-03-31
Publication date: 2005-07-20
Anticipated expiration: 2020-07-20
Also published as: JPH08272813A

Description

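[0060]
<Description of the fourth embodiment>
A fourth embodiment will now be described. FIG. 12 is a flowchart showing the processing in the fourth embodiment. FIG. 13 is a diagram for explaining its operation. FIG. 14 shows the stored data, which holds similarity data in addition to the text data and the image data.
[0061]
In the fourth embodiment, similarity is used to limit the collation range. If the similarity of the first candidate is equal to or greater than a predetermined threshold X1, the collation range extends to the second and subsequent candidates whose similarity is at least (similarity of the first candidate - α), where α is a predetermined first collation range. If the similarity of the first candidate is less than X1, the collation range extends to the second and subsequent candidates whose similarity is at least (similarity of the first candidate - β), where β is a predetermined second collation range and α < β. In this way, when the first candidate of the recognition result is highly reliable, erroneous candidates are excluded from the collation range as far as possible; when it is less reliable, the collation range is widened so that the correct candidate is not excluded.
[0062]
In the example of FIG. 13, the threshold similarity X1 is 90, the first collation range α is 10, and the second collation range β is 20. The first candidate "内" (reference numeral 1401) has a similarity of 95, so the collation range covers candidates with a similarity of 85 or more, that is, up to "肉". The first candidate "縦" (reference numeral 1402) has a similarity of 78, so the collation range covers candidates with a similarity of 58 or more, that is, up to "統". As a result, wasted processing is eliminated and the processing time is shortened.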
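The range limitation of the fourth embodiment can be sketched as follows. This is only an illustrative sketch: the function name `collation_range` and the similarity values of every candidate other than the two first candidates (95 and 78) are assumptions made for the example, not values taken from FIG. 13.

```python
def collation_range(candidates, x1=90, alpha=10, beta=20):
    """Return the candidate characters that fall inside the collation range.

    candidates: list of (character, similarity) pairs, best candidate first.
    x1, alpha, beta: threshold X1 and collation ranges (alpha < beta).
    """
    top_similarity = candidates[0][1]
    # A reliable first candidate gets the narrow range, an unreliable one the wide range.
    margin = alpha if top_similarity >= x1 else beta
    lower_bound = top_similarity - margin
    return [ch for ch, sim in candidates if sim >= lower_bound]


# Hypothetical similarity values except for the first candidates (95 and 78).
position_a = [("内", 95), ("肉", 88), ("円", 80), ("丙", 70)]
position_b = [("縦", 78), ("総", 70), ("統", 62), ("続", 50)]

print(collation_range(position_a))  # candidates with similarity >= 85
print(collation_range(position_b))  # candidates with similarity >= 58
```

With these assumed values, the first position keeps the candidates up to "肉" and the second keeps the candidates up to "統", which mirrors the description of the figure.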
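[0063]
In general, character recognition extracts a feature quantity from the character image to be recognized and takes as the first candidate the character whose feature quantity in the recognition dictionary is closest to it. The similarity referred to here is the value used in that recognition process to rank the candidate characters.
[0064]
The above embodiments describe an apparatus that reads an original image optically, but the present invention is not limited to this; an image may be input via a communication line, or an image stored on a storage medium may be input. The invention may be applied to a single apparatus or to a system composed of a plurality of devices. Although the processing programs have been described as being stored in the ROM, they can also be supplied from outside (loaded into the RAM), so the present invention is not limited to the above embodiments.
[0065]
As described above, according to the embodiments, even if character recognition at the time of registering document image data is not 100% correct, one or more recognition candidates for each recognized character are held as text data, the candidates are collated with the search keyword to obtain a degree of coincidence, and a match is declared when the degree of coincidence exceeds a threshold. Image data containing the search keyword can therefore be found with high accuracy.
[0066]
[Effect of the invention]
As described above, according to the present invention, the search rate based on a search keyword can be improved, taking into account that character recognition is not perfect.
[0067]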
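[Brief description of the drawings]
[FIG. 1] A block diagram of the apparatus of the embodiments.
[FIG. 2] A flowchart of the search in the first embodiment.
[FIG. 3] A detailed flowchart of the search in the first embodiment.
[FIG. 4] An example of text data in the first embodiment.
[FIG. 5] An example of text data candidates in the first embodiment.
[FIG. 6] A flowchart of conventional registration.
[FIG. 7] A schematic diagram of conventional registration.
[FIG. 8] The structure of conventional registered data.
[FIG. 9] A flowchart of conventional search.
[FIG. 10] A detailed flowchart of the search in the second embodiment.
[FIG. 11] An example of the document file list display in the third embodiment.
[FIG. 12] A flowchart of the search in the fourth embodiment.
[FIG. 13] An example of recognition-result similarities and search-target limitation in the fourth embodiment.
[FIG. 14] The structure of registered data in the fourth embodiment.
[0001]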
[Industrial application fields]
The present invention relates to a filing apparatus that recognizes and stores characters in a document image and a control method therefor.
[0002]
[Prior art]
Among document filing apparatuses that capture a printed document with a scanner and store it as image data, there are apparatuses that perform character recognition on the text areas in the image data and use the resulting character codes as collation data at search time. The configuration and operation of such an apparatus are described below.
[0003]
FIG. 6 is a flowchart showing the flow of processing at registration. In step S601, the document to be registered is read by a scanner and converted into binary image data. In step S602, region separation is performed on the input image, and only the text regions containing character images are extracted. This extraction is an existing technique: connected components of black pixels are extracted from the binary image data, and only those presumed to be characters are merged. In step S603, character recognition is performed on the text regions to obtain text data, which is used for collation with the search keyword at search time. In step S604, the entire image data is compressed, in this example with a method suited to binary images such as MMR. In step S605, the compressed image data and the text data are combined, a header describing the date, registrant name, data size, and so on is added, and the result is stored in the external storage device.
[0004]
FIG. 7 is a schematic diagram of region separation. In the figure, region separation is applied to the input image 701 to obtain an image 702 containing only the text regions. Character recognition is performed on these text regions to obtain text data. Meanwhile, the compressed original image data is paired with the text data, a header is added, and the result is held as a single data item such as 703.
[0005]
FIG. 8 shows the data format stored. The first data is stored in the area indicated by reference numeral 801, and the second and subsequent data are stored in order from 802.
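The stored record described above (header, text data, compressed image) could be modeled roughly as below. This is a sketch under stated assumptions: the class and field names are hypothetical, and `zlib` is used only as a stand-in for MMR compression, which the Python standard library does not provide.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Header:
    date: str
    registrant: str
    data_size: int  # size in bytes of the compressed image


@dataclass
class Record:
    header: Header
    text: str     # character recognition result used for collation
    image: bytes  # compressed page image


def register(store, raw_image, text, date, registrant):
    """Compress the image, attach a header, and append the record to the store."""
    compressed = zlib.compress(raw_image)  # stand-in for MMR (assumption)
    record = Record(Header(date, registrant, len(compressed)), text, compressed)
    store.append(record)  # records are kept in registration order
    return record


store = []
register(store, b"\x00" * 64, "sample text", "1995-03-31", "user")
print(len(store), store[0].header.data_size)
```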
[0006]
Next, the flow of processing at the time of search will be described using the flowchart of FIG. 9.
[0007]
In step S901, a search keyword is input. In step S902, a counter i that holds the number of the data item being searched is set to 1. Next, in step S903, the text data portion of the i-th stored data item is collated with the input search keyword. This collation is a so-called full-text search, which checks whether the search keyword appears verbatim in the text data portion.
[0008]
In step S904, it is determined whether a search keyword is included. If it is determined that the search keyword is included in this determination, the process proceeds to step S905, where all image data of the found data is decompressed and displayed on the display, and the process proceeds to step S906. Note that detecting that the search keyword is included in the text data portion is referred to as “hit”.
[0009]
On the other hand, if it is determined in step S904 that the search keyword is not included, the process proceeds to step S906.
[0010]
In step S906, it is determined whether the end of the search target has been reached. If it is determined that the final data has not yet been reached, the counter i is incremented by one and the process returns to step S903.
[0011]
As described above, it is possible to search for an image of data containing the search keyword as a character image and display it on the display.
[0012]
[Problems to be solved by the invention]
However, in the above conventional example the search keyword is collated with the text data as it is. If the text data contains wrong characters caused by misrecognition, and such a character falls within the string corresponding to the search keyword, data that should be hit cannot be found.
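The conventional search loop and its weakness can be illustrated with a short sketch. The function name and the two sample records are hypothetical; the second record simulates a recognition error in which "統" was read as "縦".

```python
def exact_search(records, keyword):
    """Return the 1-based numbers of the records whose text contains the keyword verbatim."""
    hits = []
    for i, text in enumerate(records, start=1):  # counter i, as in steps S902 to S906
        if keyword in text:                      # full-text collation (step S903)
            hits.append(i)
    return hits


records = [
    "内部処理統合型の装置",  # recognized correctly
    "内部処理縦合型の装置",  # one misrecognized character
]
print(exact_search(records, "内部処理統合型"))
```

Only the first record is found, even though both pages contain the keyword as a character image.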
[0013]
[Means for solving the problems]
The present invention has been made in view of this problem, and its object is to provide a filing apparatus and a control method therefor that can improve the search rate based on a search keyword, taking into account that character recognition is not perfect.
[0014]
In order to solve this problem, for example, the filing apparatus of the present invention has the following configuration. That is,
A filing device that stores document images and searches for document images based on a search keyword using data obtained by recognizing characters in the document images,
Storage means for storing candidate character groups obtained as a result of recognition of characters in the input document image in association with similarities;
First calculating means for calculating a degree of coincidence between a given search keyword and a first combination formed by combining the candidate characters ranked first in similarity in the recognition result;
Second calculating means for calculating, when the degree of coincidence calculated by the first calculating means is equal to or greater than a predetermined first threshold, a degree of coincidence between the search keyword and a second combination that adds the second and subsequent candidate characters to the candidate character group judged to be equal to or greater than the predetermined first threshold; and
Determining means for determining that the search keyword has been hit when the degree of coincidence calculated by the second calculating means is equal to or greater than a predetermined second threshold.
[0015]
The filing apparatus control method of the present invention includes the following steps. That is,
A method for controlling a filing device that stores document images and uses data obtained by recognizing characters in the document images to search for document images based on a search keyword,
An accumulation step of accumulating candidate character groups obtained as a result of recognition of characters in the input document image in association with similarities;
A first calculation step of calculating a degree of coincidence between a given search keyword and a first combination formed by combining the candidate characters ranked first in similarity in the recognition result;
A second calculation step of calculating, when the degree of coincidence calculated in the first calculation step is equal to or greater than a predetermined first threshold, a degree of coincidence between the search keyword and a second combination that adds the second and subsequent candidate characters to the candidate character group judged to be equal to or greater than the predetermined first threshold; and
A determination step of determining that the search keyword has been hit when the degree of coincidence calculated in the second calculation step is equal to or greater than a predetermined second threshold.
[0016]
It is further desirable to provide display means for displaying a list of the degrees of coincidence and the corresponding data, instruction means for designating desired data in the displayed list, and display means for displaying the designated data. This allows the user to judge the reliability of the retrieved data and to display data based on that judgment.
[0018]
It is also desirable that the second calculating means determine the range of the second and subsequent candidate characters to be used for the second combination based on the similarity of the first candidate character obtained as the recognition result. This reduces the number of second combinations to be collated and speeds up the search.
[0019]
[Embodiments]
Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.
[0020]
<Description of the first embodiment>
FIG. 1 is a block diagram of the filing apparatus according to the first embodiment. In FIG. 1, reference numeral 101 denotes a scanner that irradiates an original with light, reads the reflected light, and converts it into an electrical signal; 102 denotes a scanner interface circuit that converts the electrical signal obtained by the scanner 101 into a binary digital signal and transmits it to the other components of the apparatus; 103 denotes a pointing device (for example, a mouse) for inputting desired coordinates in a window on the display; 104 denotes an interface circuit that receives signals from the pointing device 103 and transmits them to the other components; 105 denotes a CPU that controls the entire apparatus and executes character segmentation and recognition processing; 106 denotes a ROM that stores the control program executed by the CPU 105, various processing programs, font data, and the like; and 107 denotes a RAM used as a work area for expanding character images and for character recognition processing. Reference numeral 108 denotes a display for displaying input images and recognition results, and 109 denotes a display interface circuit. The display 108 shows the image held in the VRAM area allocated at a predetermined address range of the RAM 107. Reference numeral 110 denotes an external storage device such as a hard disk, which stores the registered data as well as the dictionary for character recognition. Reference numeral 111 denotes its interface, and 112 denotes a bus connecting the components.
[0021]
The processing for registering an input document image and the structure of the stored data are substantially the same as described above with reference to FIGS. 6 and 7, so their description is omitted. However, when an input document image is filed, the character codes of up to the fourth candidate obtained by character recognition are registered as text data, rather than only the character code of the first candidate.
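One way to picture the registered text data is as a list of candidate lists, one per character position, truncated to four entries. The sketch below assumes a recognizer that returns candidates ordered best first; the function name and the sample candidates are hypothetical.

```python
MAX_CANDIDATES = 4  # the first embodiment registers up to the fourth candidate


def keep_candidates(ranked_output):
    """Keep at most MAX_CANDIDATES candidates per character position.

    ranked_output: one list per character position, best candidate first.
    """
    return [candidates[:MAX_CANDIDATES] for candidates in ranked_output]


# Hypothetical recognizer output for two character positions.
ocr_output = [
    ["内", "肉", "円", "丙", "両"],
    ["処", "拠"],
]
text_data = keep_candidates(ocr_output)
print(text_data)
```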
[0022]
Next, the flow of processing at the time of search will be described with reference to the flowcharts of FIGS. 2 and 3, and to FIGS. 4 and 5.
[0023]
For example, assume that the character string "内部処理統合型" ("internal processing integrated type") is input as the search keyword.
[0024]
First, in step S201, a keyword is input, and “1” is assigned as an initial value to a counter i indicating the number of search data.
[0025]
Next, the process proceeds to step S203, and matching with the search keyword is performed for all candidates in the text data portion of the stored i-th data.
[0026]
In step S204, it is determined, based on the result of step S203, whether a character string corresponding to the search keyword exists. If the strings are judged to match or nearly match, the process moves to step S205, where the image portion of the corresponding data is decompressed and displayed together with the character string. When the user instructs a search of the next data, the process proceeds to step S206. If the end of the data has been reached, the search ends; otherwise the variable i is incremented in step S207 and the next data is searched.
[0027]
The processing procedure of step S203 in the above processing is shown in FIG. Hereinafter, the contents of the process will be described.
[0028]
Note that the variables n, c, j, and k in the following description are allocated in the RAM 107. The variable n is a counter (pointer) indicating a character position in the text data, c is a counter indicating the number of characters that match the keyword, j is a counter indicating a character position within the keyword string, and k is a counter indicating the candidate rank.
[0029]
In steps S301 to S304, “1” is given to each variable as an initial value.
[0030]
In step S305, it is determined whether the k-th candidate X(n, k) for the n-th character of the text data of the data of interest (the i-th data) matches the j-th character Y(j) of the search keyword, that is, whether X(n, k) and Y(j) are equal.
[0031]
If they are judged not equal, the process proceeds to step S306, where it is determined whether all candidates for the n-th character in the text data have been collated. If an uncollated candidate remains, the variable k is incremented by 1 in step S307 so as to refer to the next candidate at the character position of interest, and step S305 is performed again. As a result, as shown in FIG. 5, the combinations of recognition candidates are collated in turn.
[0032]
On the other hand, if a character equal to the j-th character in the keyword character string is found, the process proceeds to step S308, and a counter c that counts the number of matched characters is incremented.
[0033]
When the process reaches step S309, it is determined whether collation for the last character of the search keyword has been completed. If not, the process proceeds to step S310, and the variable j is incremented. In step S311, it is determined whether a next character exists in the text data. If so, the variable n is incremented in step S312 to advance the collation position to the next character, and the process returns to step S304.
[0034]
On the other hand, if it is determined in step S311 that the text data is terminated, it is determined in step S316 that there is no search keyword in the text data, the process is terminated, and the process returns to the process of FIG.
[0035]
If it is determined in step S309 that collation for the last character of the search keyword has been completed, the process proceeds to step S313, and the degree of matching m is calculated.
[0036]
Here, the calculation of the degree of coincidence m in the embodiment is based on, for example, the following equation.
[0037]
m = (number of matched characters) / (total number of characters) × 100 [%]
That is, the number of characters matched in the character string of the search keyword is shown as a percentage (the larger the value of m, the higher the possibility of matching the search keyword).
[0038]
When the process reaches step S314, the degree of coincidence m calculated as above is compared with a preset value M. If m is equal to or less than M, the character string starting at the n-th character of the text data portion is judged not to match the search keyword, and the process proceeds to step S317. In step S317, it is determined whether the end of the text data portion has been reached. If not, the variable n is advanced by one, the next character position in the text data portion is set as the cut-out position, and the processing from step S302 onward is performed.
[0039]
If it is determined in step S317 that the end of the text data portion has been reached, the process proceeds to step S319, where it is judged that the i-th text data of interest contains no character string corresponding to the search keyword. The processing then ends and returns to the processing of FIG. 2.
[0040]
If it is determined in step S314 that m > M, the process proceeds to step S315, where it is judged that the text data portion of the data of interest contains a character string that matches the search keyword (when m = 100) or is highly likely to match it. The result is returned to the calling routine and the processing ends.
[0041]
In the above processing, suppose for example that "内部処理統合型" is input as the search keyword and that, as shown in FIG. 4, the i-th text data portion has the characters "内", "処", "理", "合", and "型" as first candidates of the recognition result, the character "統" as a third candidate, and the character "部" among none of the candidates. Six of the seven keyword characters are then found among the candidates, so the degree of coincidence m is
6/7 × 100 = 85.7 [%].
[0042]
For example, if the threshold is set to 50%, this degree of coincidence exceeds it, so the text data is judged to contain a portion matching the search keyword, and the image data portion can be decompressed and displayed.
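The worked example can be reproduced with a compact sketch of the first embodiment's collation. It is a simplification rather than a transcription of the flowchart of FIG. 3: it slides a window over the text data, counts the keyword characters found among the candidates at each position, and applies m > M. The function names and all candidate characters other than those named in the text ("内", "処", "理", "合", "型" as first candidates and "統" as third candidate) are assumptions.

```python
def match_degree(text_data, start, keyword):
    """Degree of coincidence m [%] of the keyword against the window starting at start."""
    matched = sum(
        1 for j, ch in enumerate(keyword) if ch in text_data[start + j]
    )
    return matched / len(keyword) * 100


def find_keyword(text_data, keyword, threshold):
    """Return (start, m) for the first window with m > threshold, or None."""
    last_start = len(text_data) - len(keyword)
    for start in range(last_start + 1):
        m = match_degree(text_data, start, keyword)
        if m > threshold:
            return start, m
    return None


# One candidate list per character position, best candidate first.
# The filler candidates are hypothetical; "部" appears in none of them.
text_data = [
    ["内", "肉", "円", "丙"],
    ["郎", "邦", "都", "郡"],
    ["処", "拠", "外", "廻"],
    ["理", "埋", "狸", "裡"],
    ["縦", "総", "統", "続"],
    ["合", "会", "令", "今"],
    ["型", "形", "刑", "塑"],
]

hit = find_keyword(text_data, "内部処理統合型", threshold=50)
print(hit[0], round(hit[1], 1))
```

With a threshold of 50 the window at position 0 is reported with m of about 85.7; raising the threshold to 90 would reject it.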
[0043]
As described above, according to the present embodiment, when a document image is read and character recognition is performed and the result is registered as a database, not only the first candidate for character recognition but also a plurality of candidates are registered. Since the keyword is searched with the combination of candidates, it is possible to increase the rate at which the search with the search keyword is as intended.
[0044]
Furthermore, even if the search keyword itself is not present, that is, even if part of the string differs from the keyword, the string becomes a search hit provided the overall degree of coincidence is reasonably high. This raises the likelihood of retrieval even when a character does not appear among the recognition candidates at all.
[0045]
According to the above description, the more characters the search keyword has, the more reliable the determination result. The determination based on the degree of coincidence may therefore be applied only when the search keyword has n or more characters. The threshold for the degree of coincidence may also be made user-settable. For example, with a high threshold, a keyword of few characters must match completely, whereas a keyword of many characters can be judged present even if several characters do not match.
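The interplay between keyword length and threshold can be made concrete with a small calculation. The sketch assumes the strict comparison m > M used in step S314 and an integer percentage threshold below 100; the function name is hypothetical.

```python
def max_mismatches(keyword_length, threshold_percent):
    """Largest number of unmatched characters that still satisfies m > M.

    m = matched / keyword_length * 100, with M given as an integer percentage.
    """
    if not 0 <= threshold_percent < 100:
        raise ValueError("threshold_percent must be in the range 0..99")
    # Smallest matched count c with c / keyword_length * 100 > threshold_percent.
    min_matches = threshold_percent * keyword_length // 100 + 1
    return keyword_length - min_matches


for length in (3, 10, 20):
    print(length, max_mismatches(length, 80))
```

At a threshold of 80%, a three-character keyword tolerates no mismatch, a ten-character keyword tolerates one, and a twenty-character keyword tolerates three.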
[0046]
<Description of the second embodiment>
Next, a second embodiment will be described. In this embodiment, as a first stage, the individual characters of the search keyword are collated with the first candidates of the text data in the text data portion to be searched, and it is determined whether the degree of coincidence is equal to or greater than a first threshold M1. If so, the character string is judged likely to match the search keyword. The degree of coincidence is then recalculated using the second and lower candidates of the relevant characters (the characters judged not to match), and if it is greater than a second threshold M2, the text data is judged to contain a character string corresponding to the search keyword.
[0047]
Accordingly, the thresholds satisfy M1 < M2. That is, the first stage determines whether there is a character string that could be the search keyword, and only if there is such a possibility is the collation described in the first embodiment performed.
[0048]
Hereinafter, the contents of the operation processing in the second embodiment will be described according to the flowchart of FIG. 10 (corresponding to the flowchart of FIG. 3). The apparatus configuration is the same as that of the first embodiment. Therefore, the program based on FIG. 10 is stored in the ROM 106.
[0049]
First, in steps S1001, S1002, and S1003, the variables are initialized. Next, in steps S1004 to S1006 and steps S1013 to S1014, collation is performed for as many characters as the search keyword contains. If the end of the text data of the data of interest is reached during this collation, it is judged that there is no matching character string, and the processing ends (step S1015).
[0050]
When collation for the number of characters in the search keyword is complete, the first-stage degree of coincidence m1 is calculated in step S1007 and compared with the preset threshold M1 in step S1008.
[0051]
If m1 < M1, the character string starting at the cut-out position (given by the variable n) in the text data portion is unlikely to be the search keyword. The process therefore proceeds to step S1017, where it is determined whether the end of the text data portion has been reached. If not, the cut-out position is advanced by one and the process returns to step S1002. For the determination in step S1017, it suffices to check whether the position equals the number of characters in the text data portion minus the number of characters in the search keyword. Beyond that position, fewer characters remain than the search keyword contains, so the determination in step S1014 is always yes.
[0052]
If the first-stage degree of coincidence m1 and the threshold M1 satisfy m1 > M1, the process proceeds to step S1009, where collation is performed with reference also to the second and subsequent candidates for the characters whose first candidate did not match. This collation is the same as in the first embodiment, so its description is omitted.
[0053]
When the collation process including the second and subsequent candidate characters is completed, the final degree of coincidence m2 is calculated (step S1010), and m2 is compared with the threshold value M2 (step S1011).
[0054]
As a result, if it is determined that m2 > M2, it is determined that the search keyword is present in the text data portion of the data of interest, and the process is terminated (step S1012).
[0055]
On the other hand, if it is determined that m2 ≦ M2, the process proceeds to step S1017.
[0056]
As described above, it is first determined whether or not the search keyword could possibly match, and the collation including the second and subsequent candidates is performed only when a match is judged possible. The search process can therefore be performed faster than in the first embodiment.
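The two-stage flow of FIG. 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the text data portion is modelled as a list of non-empty candidate lists (best candidate first), and the degree of coincidence is assumed to be the fraction of keyword characters matched, because the actual formula belongs to the first embodiment and is not reproduced in this section. The names `first_stage`, `second_stage`, and `two_stage_search` are hypothetical.

```python
from typing import List, Optional

Candidates = List[str]  # candidate characters for one position, best first


def first_stage(window: List[Candidates], keyword: str) -> float:
    """m1 (assumed formula): fraction of keyword characters that equal
    the first candidate at the corresponding position."""
    hits = sum(1 for cands, ch in zip(window, keyword) if cands[0] == ch)
    return hits / len(keyword)


def second_stage(window: List[Candidates], keyword: str) -> float:
    """m2 (assumed formula): fraction of keyword characters found among
    any of the candidates at the corresponding position."""
    hits = sum(1 for cands, ch in zip(window, keyword) if ch in cands)
    return hits / len(keyword)


def two_stage_search(text: List[Candidates], keyword: str,
                     M1: float, M2: float) -> Optional[int]:
    """Return the first cut-out position n judged a hit, or None."""
    k = len(keyword)
    if k == 0:
        return None
    for n in range(len(text) - k + 1):
        window = text[n:n + k]
        if first_stage(window, keyword) < M1:
            continue  # unlikely to be the keyword: skip the costly stage
        if second_stage(window, keyword) > M2:
            return n  # keyword judged present at this position
    return None


# "a" was misrecognized as "o" but survives as the second candidate.
text = [["x"], ["c"], ["o", "a"], ["t"], ["s"]]
hit = two_stage_search(text, "cat", M1=0.5, M2=0.9)  # hit == 1
```

The cheap first stage looks only at first candidates, so positions that clearly cannot match never reach the collation against the full candidate lists.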
[0058]
<Description of the third embodiment>
FIG. 11 shows a list of document files displayed together with their degrees of coincidence, in order of the degree of coincidence, so that the user can select from them. In the present embodiment, “internal processing integrated type” is input as the search keyword, and the search results are displayed together with their degrees of coincidence in descending order. The user selects a desired document file from the list and clicks the “open” button with a pointing device, whereupon the image data portion is decompressed and displayed. This prevents document files from being overlooked, avoids time spent decompressing unneeded image data portions, and improves usability.
[0059]
To realize the third embodiment, when a search result is determined to match, the corresponding data is not read and decompressed at that point. Instead, the search is carried out over all the data, and a file listing the degree of coincidence and the data number of each result is temporarily created; the list shown in the figure is then displayed based on that file. In this case, either the degree of coincidence is not compared with the threshold value, or the threshold value is set to a low value, so that the judgment is left to the user.
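The list construction described here can be sketched as follows. This is a hedged illustration: an in-memory list of (data number, degree of coincidence) pairs stands in for the temporary file, and the function name `build_result_list` and its `min_degree` parameter are hypothetical. Decompression of the image data portion is deliberately not modelled, since it is deferred until the user opens an entry.

```python
from typing import List, Tuple

Result = Tuple[int, float]  # (data number, degree of coincidence)


def build_result_list(results: List[Result],
                      min_degree: float = 0.0) -> List[Result]:
    """Keep results whose degree of coincidence is at least min_degree
    and return them in descending order of degree of coincidence.

    min_degree = 0.0 corresponds to skipping the threshold comparison;
    a small positive value corresponds to a deliberately low threshold.
    """
    kept = [(num, deg) for num, deg in results if deg >= min_degree]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical degrees of coincidence for three registered documents.
ranked = build_result_list([(1, 0.57), (2, 1.0), (3, 0.86)])
```

Only the entry the user selects from `ranked` would then be read and decompressed.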
[0060]
<Description of the fourth embodiment>
A fourth embodiment will be described. FIG. 12 is a flowchart showing the processing contents in the fourth embodiment. FIG. 13 is a diagram for explaining the operation. FIG. 14 shows data to be stored. Similarity data is stored in addition to text data and image data.
[0061]
In the fourth embodiment, the collation range is limited using the similarity. If the similarity of the first candidate is equal to or greater than a predetermined threshold value X1, the second and subsequent candidates are collated down to a similarity of (similarity of the first candidate − α), where α is a predetermined first collation range. If the similarity of the first candidate is less than X1, the second and subsequent candidates are collated down to a similarity of (similarity of the first candidate − β), where β is a predetermined second collation range and α < β. With this collation range limiting method, when the first candidate of the recognition result is likely to be correct, wrong character candidates are excluded from the collation range as far as possible; conversely, when the first candidate is unlikely to be correct, the collation range is widened so that the correct recognition candidate is not excluded.
[0062]
In the example of FIG. 13, the threshold similarity X1 is 90, the first collation range α is 10, and the second collation range β is 20. The similarity of the first candidate “inside” (reference numeral 1401) of the recognition result is 95, which is equal to or greater than X1, so the collation range extends to recognition candidates having a similarity of 95 − 10 = 85 or more, that is, up to “meat”. The similarity of the first candidate “vertical” (reference numeral 1402) of the recognition result is 78, which is less than X1, so the collation range extends to recognition candidates having a similarity of 78 − 20 = 58 or more, that is, up to “integrate”. As a result, the amount of collation processing is reduced and the processing time is shortened.
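The range limitation can be reproduced with a short Python sketch using the values stated above (X1 = 90, α = 10, β = 20, first-candidate similarities 95 and 78). The function names, the tuple representation, the placeholder characters "A" to "F", and the similarities of the second and later candidates are hypothetical, since this section does not give them.

```python
from typing import List, Tuple

Candidate = Tuple[str, int]  # (candidate character, similarity), best first


def lower_bound(first_similarity: int, X1: int, alpha: int, beta: int) -> int:
    """Lowest similarity still inside the collation range."""
    margin = alpha if first_similarity >= X1 else beta
    return first_similarity - margin


def collation_range(cands: List[Candidate], X1: int,
                    alpha: int, beta: int) -> List[Candidate]:
    """Keep the candidates whose similarity is within the range."""
    bound = lower_bound(cands[0][1], X1, alpha, beta)
    return [c for c in cands if c[1] >= bound]


X1, ALPHA, BETA = 90, 10, 20

# First candidate at 95 (>= X1): the narrow range alpha applies, bound 85.
confident = collation_range([("A", 95), ("B", 88), ("C", 80)], X1, ALPHA, BETA)

# First candidate at 78 (< X1): the wide range beta applies, bound 58.
doubtful = collation_range([("D", 78), ("E", 60), ("F", 50)], X1, ALPHA, BETA)
```

Choosing α < β keeps few alternatives when recognition is confident and more when it is not, which is the trade-off the fourth embodiment describes.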
[0063]
In general, in the character recognition process, a feature amount is extracted from the character image to be recognized, and the character whose feature amount stored in the recognition dictionary is closest to the extracted feature amount is taken as the first candidate. Here, the similarity is the value used to determine the rank of the candidate characters obtained by the recognition process.
[0064]
In each of the above embodiments, an apparatus that optically reads a document image has been described. However, the present invention is not limited to this; an image may be input via a communication line, or an image stored on a storage medium may be input. Further, the invention may be applied to a single device or to a system constituted by a plurality of devices. Further, although each processing program has been described as being stored in the ROM, the invention can also be realized by supplying the program from outside (loading it into the RAM). The present invention is therefore not limited by the above embodiments.
[0065]
As described above, according to the embodiments, even when character recognition is not 100% correct at the time document image data is registered, one or more recognition candidates for each recognized character are held as text data. At search time, the recognition candidates in the text data are collated with the search keyword, the degree of coincidence with the search keyword is obtained, and a match is determined when the degree of coincidence exceeds the threshold value. This has the effect that image data containing the search keyword can be retrieved with high accuracy.
[0066]
[Effects of the invention]
As described above, according to the present invention, the hit rate of searches based on a search keyword can be improved, taking into account that character recognition is not perfect.
[0067]
[Brief description of the drawings]
FIG. 1 is a block configuration diagram of an apparatus according to an embodiment.
FIG. 2 is a flowchart of a search according to the first embodiment.
FIG. 3 is a detailed flowchart of a search according to the first embodiment.
FIG. 4 is an example of text data in the first embodiment.
FIG. 5 is an example of text data candidates according to the first embodiment.
FIG. 6 is a flowchart of conventional registration.
FIG. 7 is a schematic diagram of conventional registration.
FIG. 8 is a structure of conventional registration data.
FIG. 9 is a flowchart of a conventional search.
FIG. 10 is a detailed flowchart of a search according to the second embodiment.
FIG. 11 is a display example of a document file list according to the third embodiment.
FIG. 12 is a flowchart of search in the fourth embodiment.
FIG. 13 is an example of recognition result similarity and search target limitation according to the fourth embodiment.
FIG. 14 shows the structure of registration data in the fourth embodiment.

Claims

A filing device that stores document images and searches for document images based on a search keyword using data obtained by recognizing characters in the document images, the filing device comprising:
storage means for storing candidate character groups obtained as a result of recognition of characters in an input document image, in association with similarities;
first calculating means for calculating a degree of coincidence between a given search keyword and a first combination obtained by combining the candidate characters ranked first in similarity in the recognition result;
second calculating means for calculating, when the degree of coincidence calculated by the first calculating means is greater than or equal to a predetermined first threshold, a degree of coincidence between the search keyword and a second combination that includes second-ranked and subsequent candidate characters in the candidate character group for which the degree of coincidence was determined to be greater than or equal to the predetermined first threshold; and
determination means for determining that the search keyword is hit when the degree of coincidence calculated by the second calculating means is greater than or equal to a predetermined second threshold.

The filing device according to claim 1, wherein the second calculating means compares the similarity of the first-ranked candidate character of a character of interest with a predetermined threshold,
Condition: similarity of the first-ranked candidate ≧ predetermined threshold
if the above condition is satisfied, takes as the second combination a combination including those second-ranked and subsequent candidate characters of the character of interest whose similarity is (similarity of the first-ranked candidate − α) or higher,
if the above condition is not satisfied, takes as the second combination a combination including those second-ranked and subsequent candidate characters of the character of interest whose similarity is (similarity of the first-ranked candidate − β) or higher, where α < β,
and calculates the degree of coincidence between the second combination and the search keyword.

A method for controlling a filing device that stores document images and searches for document images based on a search keyword using data obtained by recognizing characters in the document images, the method comprising:
a storage step in which storage means included in the filing device stores candidate character groups obtained as a result of recognition of characters in an input document image, in association with similarities;
a first calculation step in which first calculating means included in the filing device calculates a degree of coincidence between a given search keyword and a first combination obtained by combining the candidate characters ranked first in the recognition result;
a second calculation step in which, when the degree of coincidence calculated in the first calculation step is greater than or equal to a predetermined first threshold, second calculating means included in the filing device calculates a degree of coincidence between the search keyword and a second combination that includes second-ranked and subsequent candidate characters in the candidate character group for which the degree of coincidence was determined to be greater than or equal to the predetermined first threshold; and
a determination step in which determination means included in the filing device determines that the search keyword is hit when the degree of coincidence calculated in the second calculation step is greater than or equal to a predetermined second threshold.