JP2004178181A

JP2004178181A - Movement system for registered contents by multi-database integration of full-text search

Info

Publication number: JP2004178181A
Application number: JP2002342372A
Authority: JP
Inventors: Yoshinobu Mita; 良信三田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-11-26
Filing date: 2002-11-26
Publication date: 2004-06-24

Abstract

PROBLEM TO BE SOLVED: To shorten the waiting time. After a document is registered in a document database and full-text retrieval registration is executed, the user must wait a long time before the document is hit in retrieval processing.

SOLUTION: Full-text retrieval is executed. Each time a hit document is determined, a hit frequency distribution table keyed by document ID is updated. When full-text retrieval registration processing starts, documents that are not hit are erased from the first full-text retrieval database according to the contents of the frequency distribution table, which shortens retrieval time. The erased documents are registered again in a second full-text retrieval database. Actual retrieval is executed on the first full-text retrieval database and then on the second. Documents with a high hit frequency in the second database are registered again in the first.

COPYRIGHT: (C)2004,JPO

Description

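[Brief description of the drawings]
FIG. 1 is a diagram showing the overall configuration of an image search device on which application software including the full-text search process of the first embodiment of the present invention operates.
FIG. 2 is a diagram showing the window display screen during operation of the application of the first embodiment.
FIG. 3 is a diagram showing an example of the pull-down menu of the first embodiment.
FIG. 4 is a diagram showing the dialog box used to perform a search in the first embodiment.
FIG. 5 is a diagram for explaining the hit list window of the first embodiment.
FIG. 6 is a flowchart showing the document data registration process executed in the first embodiment.
FIG. 7 is a flowchart showing the document data transfer registration process executed in the first embodiment.
FIG. 8 is a diagram explaining the search process in the first embodiment.
FIG. 9 is an example of the hit frequency table of the first embodiment.
FIG. 10 shows a full-text search table.
[Description of reference numerals]
1 CPU
2 CD-ROM drive
3 Printer
4 HDD
5 FDD
6 Communication unit
7 Keyboard
8 Mouse
9 Display device
[0001]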
TECHNICAL FIELD OF THE INVENTION
The present invention relates to managing two kinds of documents in separate full-text search databases: documents that are hit frequently in full-text searches and documents that are hit rarely.
[0002]
[Prior art]
Full-text search takes an enormous amount of processing time, because the text of each document must be examined to determine whether a keyword occurs in it. To shorten search time, the usual basic approach is to extract the text from each document in advance and build a table that maps the text to the ID (identification) numbers of the documents containing it. Even with such full-text search techniques, processing still takes a long time when the number of hit documents is large.
[0003]
[Problems to be solved by the invention]
To increase search speed, documents can be registered in multiple full-text search databases so that each individual database can be searched faster. However, this was effective only when the database containing the target document could be identified and the search could be limited to that database.
[0004]
[Means for Solving the Problems]
The present invention provides a plurality of full-text search databases and holds, in a table, the hit frequency of each document hit by a search. It has two periodic means. The first deletes documents with a low hit frequency from a database with a high search priority and re-registers them in a low-priority full-text search database. The second deletes documents with a high hit frequency from a low-priority database and re-registers them in a high-priority full-text search database.
[0005]
At search time, the invention has means for searching the full-text search databases in descending order of priority, so that frequently used documents are retrieved first and quickly.
[0006]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.
[0007]
[Embodiment 1]
FIG. 1 is a diagram showing an overall configuration of a document search device on which application software including a full-text search process according to a first embodiment of the present invention operates.
[0008]
Reference numeral 1 denotes a CPU, which performs the various controls needed to execute the processing of the present embodiment. Reference numeral 2 denotes a CD-ROM drive, which is used to read data from external media. Reference numeral 3 denotes a printer, which prints images and the like displayed on the display device 9. Reference numeral 4 denotes a hard disk drive, which is mainly used as the main storage device. It stores the processing programs, applications, and the like executed in the present embodiment, as well as several types of document (image) data. Reference numeral 5 denotes a floppy (R) disk drive, which is used to read data in from outside and write data out. Reference numeral 6 denotes a communication unit that enables connection to a network or the like. It can exchange data with an external computer, or with a scanner, printer, or disk device on the network. Reference numeral 7 denotes a keyboard, which accepts the operator's interactive input, such as instructions for processes executed in the present embodiment or character strings such as search conditions. Reference numeral 8 denotes a pointing device such as a mouse, which accepts the operator's interactive input together with the keyboard 7. Reference numeral 9 denotes a display device, which includes a display card carrying memory for displaying document images, a monitor, and the like. It displays the windows used to execute the processing of the present embodiment, along with the processing results.
[0009]
(Example 1)
The window display screen during operation of the application according to the first embodiment will be described with reference to FIG. 2.
[0010]
FIG. 2 is a diagram illustrating a window display screen during application operation according to the first embodiment of the present invention. The window display screen is displayed on a monitor of the display device 9, for example.
[0011]
Reference numeral 18a denotes a minimize button that reduces the window 200 to an icon. Reference numeral 18b denotes a maximize button that enlarges the window 200 to fill the monitor screen. Reference numeral 18c denotes a "close button" that closes the window 200. Reference numeral 12 denotes a title bar that displays the title of the window 200. Reference numeral 17 denotes an area that displays the name of the currently running application and the document name of the displayed image. Reference numeral 13 denotes a menu listing various processing functions. Reference numeral 14 denotes tool buttons that offer the same processing functions as the menu 13 in button form. Selecting an item in the menu 13, or any of the tool buttons 14, with the pointing device 8 executes the selected processing function. A toolbar 11 is used to select operation in a specific mode. An area 22 displays the window background outside the frame of the document image displayed as a search result. Reference numeral 15 denotes a document image displayed as a search result.
[0012]
When the "Tool" menu of the menu 13 is selected in FIG. 2, a pull-down menu as shown in FIG. 3 is displayed. The pull-down menu contains two items: "Registration", for registering the character strings contained in document image data, and "Full-text search", for executing a full-text search. In the first embodiment, normal registration is the operation that makes input document data searchable. It is executed automatically, for example at night or while the system is idle.
[0013]
Next, the document data registration process executed in the first embodiment will be described with reference to FIG. 6.
[0014]
FIG. 6 is a flowchart showing a document data registration process executed in the first embodiment of the present invention.
[0015]
A process of registering a document in the full-text search database will be described.
[0016]
In step S12, the search table used for registration in the full-text search database is specified. Management is easier if a search table is provided for each document database that manages documents. In the present invention, a high-priority full-text search table and a low-priority full-text search table are treated as a pair that forms one search table. In normal registration, the low-priority full-text search table is designated. The text of each document newly added to the document database is therefore registered in the full-text search table used for registration and search. The search table is, for example, as shown in FIG. 10: the text is decomposed into units and registered, so that for a string such as "abc" the IDs of the documents containing it are recorded. A search proceeds in three stages. First, the actual keyword is decomposed in the same way. Next, the candidates are narrowed to the document IDs that are common to all decomposed units. Finally, the contents of the remaining documents are checked to confirm that the undecomposed keyword really occurs in them, which determines the hit documents.
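The decompose, intersect and verify scheme described above can be sketched as a small inverted index. This is a minimal illustration only. The source does not state how text is decomposed, so the choice of overlapping character bigrams (`N = 2`) is an assumption, as are the names `ngrams` and `FullTextTable`.

```python
from collections import defaultdict

N = 2  # assumption: text is decomposed into overlapping character bigrams


def ngrams(text: str, n: int = N) -> set[str]:
    """Decompose text into overlapping character n-grams (empty if too short)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class FullTextTable:
    """Hypothetical search table: n-gram -> IDs of documents containing it."""

    def __init__(self) -> None:
        self.index: dict[str, set[int]] = defaultdict(set)
        self.texts: dict[int, str] = {}  # kept so candidates can be verified

    def register(self, doc_id: int, text: str) -> None:
        self.texts[doc_id] = text
        for gram in ngrams(text):
            self.index[gram].add(doc_id)

    def search(self, keyword: str) -> set[int]:
        if not keyword:
            return set()
        if len(keyword) < N:
            candidates = set(self.texts)  # keyword too short to use the index
        else:
            # Narrow down: keep only IDs common to every decomposed unit.
            candidates = set.intersection(
                *(self.index.get(g, set()) for g in ngrams(keyword)))
        # Verify: the units may occur non-contiguously, so check the whole keyword.
        return {d for d in candidates if keyword in self.texts[d]}
```

A document such as "abxbc" contains both "ab" and "bc", so it survives the intersection step for the keyword "abc". The final check then rejects it, which is exactly why the source adds a verification step.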
[0017]
In step S13, the ID of the document data being processed is associated with the text in the document and registered in the full-text search table for registration.
[0018]
In step S14, it is determined whether registration of all the document data to be processed has been completed. If so (YES in step S14), the process ends. If not (NO in step S14), the process returns to step S11 and the registration is repeated.
[0019]
The registration process is also performed when document data is changed. In this case, the ID of the pre-change document data is first deleted from the search table, and the registration process is then performed.
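The registration loop (steps S11 to S14) and the delete-then-re-register rule for changed documents can be sketched as follows. This is a hypothetical illustration: `SearchTable`, `register_pending` and the bigram unit are assumptions. The source also does not say which table a changed document returns to, so the sketch assumes it is re-registered in the low-priority table, like any normal registration.

```python
from collections import defaultdict


class SearchTable:
    """Hypothetical paired search table: high- and low-priority n-gram indexes."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        # tier name -> (n-gram -> set of document IDs)
        self.tiers = {"high": defaultdict(set), "low": defaultdict(set)}

    def _grams(self, text: str) -> set[str]:
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def remove(self, doc_id: int) -> None:
        # Delete every posting of doc_id (the pre-change version) from both tiers.
        for index in self.tiers.values():
            for ids in index.values():
                ids.discard(doc_id)

    def register(self, doc_id: int, text: str, tier: str = "low") -> None:
        # Normal registration designates the low-priority table (step S12).
        for gram in self._grams(text):
            self.tiers[tier][gram].add(doc_id)  # step S13


def register_pending(table: SearchTable, pending: dict[int, str],
                     changed: set[int]) -> None:
    """Steps S11 to S14: register each pending document until none remain."""
    for doc_id, text in pending.items():
        if doc_id in changed:
            table.remove(doc_id)  # assumption: old version dropped from both tiers
        table.register(doc_id, text)
```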
[0020]
When the document data is an image, OCR is the common technique for extracting the text in the document image.
[0021]
Next, a full-text search process executed on the document data registered by the registration process described with reference to FIG. 6 will be described.
[0022]
To perform a full-text search, the "Tool" menu of the menu 13 shown in FIG. 2 is selected. A pull-down menu as shown in FIG. 3 is then displayed, and selecting the "Full-text search" item displays a dialog box as shown in FIG. 4. When the operator enters a character string in the edit box 31 and presses the add button 33, the string is added to the keyword list 32 as a keyword. Selecting one keyword in the keyword list 32 and pressing the delete button 34 deletes that keyword.
[0023]
Check buttons 35 and 36 specify AND or OR logic. To search for document data containing all keywords in the keyword list 32, the operator checks the AND check button 35. To search for document data containing at least one of the keywords, the operator checks the OR check button 36. The search button 37 is enabled when one or more keywords are listed in the keyword list 32. Pressing it executes a search that uses the keywords in the keyword list 32 as search conditions.
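The AND/OR choice amounts to intersecting or uniting the per-keyword hit sets. The sketch below is a minimal, hypothetical illustration. `search_keywords` stands in for the real per-keyword full-text search with a plain substring test, which is an assumption made for brevity.

```python
def combine(per_keyword_hits: list[set[int]], mode: str) -> set[int]:
    """Combine per-keyword hit sets: AND = intersection, OR = union."""
    if not per_keyword_hits:
        return set()
    if mode == "AND":
        return set.intersection(*per_keyword_hits)
    if mode == "OR":
        return set.union(*per_keyword_hits)
    raise ValueError(f"unknown mode: {mode!r}")


def search_keywords(texts: dict[int, str], keywords: list[str], mode: str) -> set[int]:
    """Hypothetical driver: per-keyword substring hits, then AND/OR combination."""
    per_keyword = [{d for d, t in texts.items() if k in t} for k in keywords]
    return combine(per_keyword, mode)
```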
[0024]
When the search button is pressed and there is a hit document, a hit list window shown in FIG. 5 is displayed. FIG. 5 is a diagram for explaining a hit list window according to the first embodiment of the present invention.
[0025]
As shown in FIG. 5, the number of IDs in the ID list is displayed in the display box 45 as the hit count, that is, the number of retrieved document image data. Reference numeral 46 denotes a hit document list display section, in which the size, number of pages, creation date, update date, and the like of each retrieved document image are looked up from the ID list and displayed. The close button 48 closes the hit list window. The search button 49 has the same function as the "Full-text search" item in the pull-down menu of the "Tool" menu of the menu 13. The open button 47 displays the document image data that has focus in the hit document list display section 46 in the document display section 23 of FIG. 2. The text display section 50, described below, gives only a simplified view. Opening a document with the open button 47, by contrast, faithfully reproduces its font sizes, font types, layout, and the like.
[0026]
The text display section 50 displays the text portion extracted from the document image data that has focus in the hit document list display section 46. The keywords used for the search are highlighted, as indicated by the hatched frame 53. Pressing the next hit button 51 or the previous hit button 52 places the cursor on a hatched frame 53 and moves it to the following or preceding hatched frame 53. The text display area scrolls accordingly.
[0027]
The present invention applies equally when the document data is not image data but, for example, text data created by an application.
[0028]
In the case of image data, a text portion obtained by OCR (character recognition) from the document image data is displayed.
[0029]
The above is the overall configuration and operation of the present invention. Hereinafter, the details of the parts unique to the present invention will be described.
[0030]
When the search button 37 shown in FIG. 4 is clicked with the mouse pointer, the search process shown in FIG. 8 is performed. When it is recognized in step 101 that the search button has been pressed, step 102 selects the high-priority full-text search database from among the full-text search tables (databases) in which the documents are registered. In step 103, the actual search is performed on the selected full-text search database. The keyword set for the search is decomposed into characters, the document IDs corresponding to each decomposed character are looked up, and only the IDs common to all decomposed characters are kept. A document with a common ID is not guaranteed to contain the keyword as a contiguous character string. Step 104 therefore checks whether the keyword actually occurs in each document and confirms only the IDs of documents that contain it. In step 105, the hit documents identified from the finally confirmed IDs are listed in the hit list dialog (hit list window) shown in FIG. 5. The operator can already check these hit documents at this stage, so results are obtained quickly.
[0031]
Also in step 105, the hit frequency in the row of the FIG. 9 hit frequency table that corresponds to the ID of each hit document is incremented by 1. If the document ID is not yet in the table, it is newly added. A separate hit frequency table exists for each of the high-priority and low-priority full-text search databases. In step 106, it is determined whether both the high-priority and low-priority databases have been searched. If only the high-priority database has been searched, the process returns to step 102, switches to the low-priority full-text search database, and performs the search in the same way. In step 105, the hit documents from this search are added to the hit list, and the operator can now check all hit documents. Also in step 105, the hit frequency of each hit document ID is incremented by 1 in the FIG. 9 hit frequency table for the low-priority full-text search database.
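The search flow of FIG. 8 can be sketched as follows: search each tier in priority order, show partial results after each tier, and increment the hit frequency table that belongs to the tier where each document was found. This is a hypothetical sketch. `Tier`, `tiered_search` and the `show` callback are illustrative names, and `Tier.search` uses a plain substring test in place of the index lookup and verification of steps 103 and 104.

```python
from collections import Counter
from collections.abc import Callable


class Tier:
    """One full-text search database plus its own hit frequency table (FIG. 9)."""

    def __init__(self, docs: dict[int, str]) -> None:
        self.docs = docs        # doc ID -> text (stand-in for the real index)
        self.hits = Counter()   # doc ID -> hit frequency; missing IDs count as 0

    def search(self, keyword: str) -> list[int]:
        # Stand-in for steps 103-104 (narrow by index, then verify the keyword).
        return sorted(d for d, text in self.docs.items() if keyword in text)


def tiered_search(tiers: list[Tier], keyword: str,
                  show: Callable[[list[int]], None]) -> list[int]:
    """Search tiers in priority order (step 102), listing hits tier by tier."""
    hit_list: list[int] = []
    for tier in tiers:                 # step 106 loops back to step 102
        found = tier.search(keyword)
        for doc_id in found:
            tier.hits[doc_id] += 1     # step 105: adds the ID if it is new
        hit_list.extend(found)
        show(list(hit_list))           # operator sees partial results early
    return hit_list
```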
[0032]
FIG. 7 is a diagram for explaining a transfer registration process which is a process of replacing (moving) registered documents between a high-priority full-text search database and a low-priority full-text search database.
[0033]
This transfer registration process is executed automatically at long intervals, such as once a week or once a month.
[0034]
In step 71, the low-priority full-text search table is specified as the target. In step 72, the IDs of frequently hit documents are identified from the paired hit frequency table of FIG. 9. In step 73, those document IDs are deleted from the low-priority full-text search database. At the same time, the corresponding document IDs are deleted from the hit frequency table of FIG. 9, and their entries are held temporarily. In step 74, processing switches to the high-priority full-text search database. In step 75, the documents deleted from the low-priority full-text search database are registered in the high-priority full-text search table. The document IDs and frequencies held in step 73 are written into the FIG. 9 hit frequency table that corresponds to the high-priority full-text search table. In step 76, it is checked whether any frequently hit documents remain. If so, the processing from step 71 is repeated.
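Steps 71 to 76 can be sketched as a loop that moves one frequently hit document at a time and carries its hit count over to the other tier's table. This is a minimal, hypothetical sketch. Each tier is modeled as a dict of texts plus a `Counter`, and `is_frequent` is a caller-supplied predicate standing in for whichever selection criterion is used.

```python
from collections import Counter
from collections.abc import Callable


def promote(low_docs: dict[int, str], low_hits: Counter,
            high_docs: dict[int, str], high_hits: Counter,
            is_frequent: Callable[[int], bool]) -> list[int]:
    """Steps 71-76: move frequently hit documents from the low to the high tier."""
    moved: list[int] = []
    while True:
        # Step 72: find frequently hit document IDs in the low tier's table.
        candidates = [d for d in low_hits
                      if d in low_docs and is_frequent(low_hits[d])]
        if not candidates:              # step 76: none left, so stop
            return moved
        doc_id = candidates[0]
        text = low_docs.pop(doc_id)     # step 73: delete from the low tier
        count = low_hits.pop(doc_id)    # ... holding its frequency temporarily
        high_docs[doc_id] = text        # step 75: register in the high tier
        high_hits[doc_id] = count       # ... carrying the frequency over
        moved.append(doc_id)
```

Recomputing the candidate list on every pass mirrors the flowchart's loop back to step 71. A production version would more likely select all candidates once.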
[0035]
The documents to be processed may be those whose value in the hit frequency table is N or more. Alternatively, the average A of all hit frequencies in the table may be computed, and the documents with a hit frequency of A × M or more may be targeted. The selection criterion is not limited in any way.
[0036]
Further, among the registered document IDs, a document whose hit frequency is in the top L% may be targeted.
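The three promotion criteria (N or more, A × M or more, top L%) can be written as small selectors over a hit frequency table. This is a hypothetical sketch. One assumption is that percentages are taken over the documents present in the table. Another is that ties at the L% cut-off are broken by `Counter.most_common` order, which the source does not specify.

```python
from collections import Counter


def frequent_by_threshold(hits: Counter, n: int) -> set[int]:
    """Documents whose hit frequency is N or more."""
    return {d for d, f in hits.items() if f >= n}


def frequent_by_mean(hits: Counter, m: float) -> set[int]:
    """Documents whose hit frequency is at least A * M, A being the mean."""
    if not hits:
        return set()
    a = sum(hits.values()) / len(hits)
    return {d for d, f in hits.items() if f >= a * m}


def frequent_by_top_percent(hits: Counter, l_percent: float) -> set[int]:
    """Documents whose hit frequency ranks in the top L percent."""
    k = int(len(hits) * l_percent / 100)
    return {d for d, _ in hits.most_common(k)}
```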
[0037]
In step 77, the high-priority full-text search table is processed, and the IDs of rarely hit documents are identified from the paired hit frequency table of FIG. 9. In step 78, those document IDs are deleted from the high-priority full-text search database. At the same time, the corresponding document IDs are deleted from the associated hit frequency table of FIG. 9, and their entries are held temporarily. In step 79, processing switches to the low-priority full-text search database. In step 80, the documents deleted from the high-priority full-text search database are registered in the low-priority full-text search table. The document IDs and frequencies held in step 78 are written into the FIG. 9 hit frequency table that corresponds to the low-priority full-text search table. In step 81, it is checked whether any rarely hit documents remain. If so, step 82 resets the target to the high-priority full-text search table, and the processing from step 77 is repeated.
[0038]
The documents to be processed may be those whose value in the hit frequency table is LN or less. Alternatively, the average LA of all hit frequencies in the table may be computed, and the documents with a hit frequency of LA × LM or less may be targeted. Again, the criterion is not limited in any way.
[0039]
Alternatively, among the registered document IDs, the documents whose hit frequency falls in the bottom LL% may be targeted. Another option is to let LP be the number of documents deleted before step 77 is reached, and then target the LP documents with the lowest hit frequencies.
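The demotion criteria mirror the promotion ones: LN or less, LA × LM or less, bottom LL%, or exactly LP documents. The LP variant moves as many documents out of the high tier as were just moved in, which keeps the tier size stable. This is a hypothetical sketch. Breaking ties by document ID so that the result is deterministic is an added assumption.

```python
from collections import Counter


def rare_by_threshold(hits: Counter, ln: int) -> set[int]:
    """Documents whose hit frequency is LN or less."""
    return {d for d, f in hits.items() if f <= ln}


def rare_by_mean(hits: Counter, lm: float) -> set[int]:
    """Documents whose hit frequency is at most LA * LM, LA being the mean."""
    if not hits:
        return set()
    la = sum(hits.values()) / len(hits)
    return {d for d, f in hits.items() if f <= la * lm}


def rare_lowest(hits: Counter, k: int) -> set[int]:
    """The k documents with the lowest hit frequency (use k = LP to balance)."""
    ranked = sorted(hits.items(), key=lambda item: (item[1], item[0]))
    return {d for d, _ in ranked[:k]}


def rare_by_bottom_percent(hits: Counter, ll_percent: float) -> set[int]:
    """Documents whose hit frequency ranks in the bottom LL percent."""
    return rare_lowest(hits, int(len(hits) * ll_percent / 100))
```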
[0040]
A fixed threshold can serve three goals at once: making the number of documents processed in steps 71 to 75 equal to the number processed in steps 77 to 80, preventing documents processed in steps 71 to 75 from being processed again in steps 77 to 80, and keeping constant the number of documents registered in the high-priority full-text search table. To this end, the hit frequency for moving documents to the high-priority full-text search database may be set in advance to HP, with documents whose hit frequency is HP or less moved to the low-priority full-text search database.
[0041]
In this case, the number of documents to be moved out of each table can be calculated in advance. Once the number HT of documents to be registered in the high-priority full-text search table is decided, an approximate value for HP can therefore be calculated.
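One way to derive HP from a target size HT is to sort all hit frequencies in descending order and take the (HT+1)-th value. Documents with a frequency above HP then number at most HT. Ties can leave the high tier somewhat smaller, which matches the source's "approximate". This is a hypothetical sketch. Placing documents strictly above HP in the high tier follows from reading "HP or less goes to the low tier" literally.

```python
from collections import Counter


def choose_hp(all_hits: Counter, ht: int) -> int:
    """Pick HP so that at most HT documents have a hit frequency above HP."""
    freqs = sorted(all_hits.values(), reverse=True)
    if ht >= len(freqs):
        return -1        # every document fits in the high tier
    return freqs[ht]     # the (HT+1)-th highest frequency


def split_by_hp(all_hits: Counter, hp: int) -> tuple[set[int], set[int]]:
    """Frequency > HP goes to the high tier; HP or less goes to the low tier."""
    high = {d for d, f in all_hits.items() if f > hp}
    return high, set(all_hits) - high
```

Because the two sets partition the documents, no document can be promoted and demoted in the same pass. This is the double-processing problem the paragraph above sets out to avoid.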
[0042]
(Example 2)
In the first embodiment, documents registered in the full-text search databases are moved according to their hit frequency. It is readily conceivable to use instead the frequency of operations such as opening or editing a document.
[0043]
(Example 3)
When the registered contents of the full-text search databases change significantly, all documents may be deleted from the high-priority full-text search database and re-registered in the low-priority full-text search database. All the hit frequency tables of FIG. 9 may also be cleared at that time.
[0044]
(Example 4)
In the first embodiment, documents are first registered in the low-priority full-text search database and moved to the high-priority full-text search database according to their hit frequency. Needless to say, the reverse is also possible: documents may first be registered in the high-priority full-text search database, and those with a low hit frequency may then be moved to the low-priority full-text search database.
[0045]
(Example 5)
In the first embodiment, documents are displayed in the hit list in two stages during a search: first the results from the high-priority full-text search database, then the results from the low-priority full-text search database. Needless to say, once the results from the high-priority database have been displayed, a message may be shown so that the operator can choose whether to continue or stop searching the low-priority full-text search database. Alternatively, a search stop button may be displayed in the hit list of FIG. 5, or a dialog with a search stop button may be displayed during the search, so that the search can be interrupted.
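The "continue or stop" option maps naturally onto a lazy generator. Each tier is searched only when the caller asks for the next batch, so declining after the high-priority results means the low-priority database is never touched. This is a hypothetical sketch. `continue_search` stands in for the operator's answer to the message or the stop button, and tiers are plain dicts searched by substring.

```python
from collections.abc import Callable, Iterator


def staged_search(tiers: list[dict[int, str]], keyword: str) -> Iterator[list[int]]:
    """Yield the hits of each tier in priority order, one tier at a time."""
    for docs in tiers:
        yield sorted(d for d, text in docs.items() if keyword in text)


def run(tiers: list[dict[int, str]], keyword: str,
        continue_search: Callable[[list[int]], bool]) -> list[int]:
    hit_list: list[int] = []
    for i, found in enumerate(staged_search(tiers, keyword)):
        hit_list.extend(found)
        # After each tier except the last, ask whether to search the next one.
        if i < len(tiers) - 1 and not continue_search(hit_list):
            break  # the generator is not resumed, so lower tiers are skipped
    return hit_list
```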
[0046]
(Example 6)
In the first embodiment, registered documents are moved between two databases, a high-priority and a low-priority full-text search database. A configuration with three or more full-text search databases is also readily conceivable.
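For three or more databases, one simple generalization is to rank documents by hit frequency and fill the tiers in priority order up to a capacity per tier. This is a deliberate simplification. It recomputes tier membership in one pass rather than moving documents step by step between adjacent tiers as FIG. 7 does. The per-tier `capacities` and the tie-break by document ID are hypothetical.

```python
from collections import Counter


def assign_tiers(hits: Counter, capacities: list[int]) -> list[set[int]]:
    """Place documents into len(capacities) + 1 tiers, most-hit first.

    capacities[i] is a hypothetical maximum size for tier i;
    the last (lowest-priority) tier is unbounded.
    """
    ranked = sorted(hits, key=lambda d: (-hits[d], d))  # most hits first
    tiers: list[set[int]] = []
    start = 0
    for cap in capacities:
        tiers.append(set(ranked[start:start + cap]))
        start += cap
    tiers.append(set(ranked[start:]))                   # lowest-priority tier
    return tiers
```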
[0047]
The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a scanner, and a printer). It may also be applied to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine).
[0048]
Needless to say, the object of the present invention can also be achieved by supplying a system or apparatus with a storage medium on which software program code implementing the functions of the above-described embodiments is recorded. A computer (or CPU or MPU) of the system or apparatus then reads and executes the program code stored on the storage medium.
[0049]
In this case, the program code itself, read from the storage medium, implements the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention.
[0050]
As the storage medium for supplying the program code, a floppy (R) disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, ROM or the like can be used.
[0051]
It also goes without saying that the functions of the above-described embodiments are realized not only when the computer executes the read program code, but also when an OS (Operating System) running on the computer performs part or all of the actual processing based on the instructions of the program code and that processing realizes the functions of the above-described embodiments.
[0052]
Further, it goes without saying that the program code read from the storage medium may be written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and that processing realizes the functions of the above-described embodiments.
[0053]
【Effects of the Invention】
As described above, a full-text search is performed on the plurality of full-text search databases in descending order of priority. Documents that are more likely to be needed therefore appear first in the search result list, and search results can be displayed without making the operator wait a long time.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating the overall configuration of an image search device on which application software including the full-text search processing according to the first embodiment of the present invention operates.
FIG. 2 is a diagram illustrating a window display screen while the application is running according to the first embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a pull-down menu according to the first embodiment of the present invention.
FIG. 4 is a diagram illustrating a dialog box for performing a search according to the first embodiment of the present invention.
FIG. 5 is a diagram illustrating a hit list window according to the first embodiment of the present invention.
FIG. 6 is a flowchart illustrating document data registration processing executed in the first embodiment of the present invention.
FIG. 7 is a flowchart illustrating document data transfer registration processing executed in the first embodiment of the present invention.
FIG. 8 is a diagram explaining the search processing according to the first embodiment of the present invention.
FIG. 9 shows an example of a hit frequency table according to the first embodiment of the present invention.
FIG. 10 shows a full-text search table.
[Explanation of symbols]
1 CPU
2 CD-ROM drive
3 Printer
4 HDD
5 FDD
6 Communication unit
7 Keyboard
8 Mouse
9 Display device

Claims

1. A full-text search and registration method that uses a plurality of full-text search databases searched in order of priority, wherein the documents registered in each of the plurality of full-text search databases are changed dynamically.

2. The full-text search and registration method according to claim 1, wherein the documents registered in the plurality of full-text search databases are changed on the basis of the past hit frequency of the documents, and the database in which a document is registered is changed by deleting the document's registration from one database and re-registering the document in another database.

3. The full-text search and registration method according to claim 2, wherein a hit frequency table is provided paired with each full-text search database, and when a document's registration is transferred to another full-text search database, its hit frequency data is moved along with it.