JP4014417B2

JP4014417B2 - Full-text search device

Info

Publication number: JP4014417B2
Application number: JP2002036000A
Authority: JP
Inventors: 卓也平岡; 研策山本; 哲也池田; 泰嗣小川; 一繁浅田; 弘志竹川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-02-13
Filing date: 2002-02-13
Publication date: 2007-11-28
Anticipated expiration: 2022-02-13
Also published as: JP2003242180A

Description

【０００１】
【発明の属する技術分野】
本発明は、全文検索装置に関し、より詳細には、複数の文書データから指定された文字列を含む文書を検索する全文検索装置に関する。本発明は、例えば文書管理システム、電子図書館システム、特許公報検索システムなど、多量の文書データを管理するシステムに適用可能である。
【０００２】
【従来の技術】
近年、情報通信技術の発達により電子化された文書及びその文書に関する情報がインターネットなどを介して大量に流通している。この電子化文書及び情報の流通に際し、所望の文書を精度よく、さらには高速に検索する文書検索装置が提案されている。
【０００３】
そのような文書検索装置においてはキーワード検索手法や全文検索手法が用いられている。全文検索手法を用いた全文検索装置は、任意の検索文字列と検索対象の文書全てとの間で照合を行なって、検索文字列を含む文書を漏れなく抽出する装置であり、キーワード検索手法のように検索対象となる全ての文書に対してキーワードを予め付与するといった多大な人力が必要ない。全文検索装置としては、様々な種類のものが提案されているが、その１種として転置（索引）ファイル方式を採用した装置がある。転置ファイル方式では、検索のための補助ファイルとして、文字／単語／n-gram（n文字連接）などが出現する文書、或いはそれらの文書中の出現位置を記録する転置ファイルを予め構築し、全文検索時には、転置ファイルのみを用いて検索するもので非常に高速な検索を行なうことが可能であり大量文書の高速検索が要求されるシステムに対して有効である。
【０００４】
全文検索方式一般、転置ファイル方式の詳細については、特開平１１−０７３４２９号公報の従来技術や、全文検索システム協議会平成１０年度活動報告(http://www.ftsanet.com/dbtokyo99/Db99.htm)などで述べられており、公知であるのでその説明を省略する。
【０００５】
しかしながら、転置ファイル方式では通常原データの数倍にも及ぶ転置ファイルを構築する必要があり、転置ファイル方式の全文索引は登録されている文書データ量が多くなるにしたがって登録・削除処理に時間を要するようになり、全文検索装置としては利用者側からみた登録・削除処理のレスポンスタイムが長くなる。その登録・削除処理の間、検索処理は待たざるを得ない。
【０００６】
また、特開平７−１４６８８０号公報には、新規文書を登録する際に、主インデックスよりも小さな副インデックスに登録し、登録時間を短くする文書検索装置及び方法が記載されている。しかしながら、同公報に記載の発明では、登録時間が短くなっているとはいえ、新規文書の登録の間、検索処理は行えない。
【０００７】
【発明が解決しようとする課題】
本発明は、上述のごとき実情に鑑みてなされたものであり、利用者側からみた登録及び削除処理のレスポンスタイムを短くし、さらに登録処理及び削除処理が終了しないうちから検索処理を行うことが可能な、全文検索装置を提供することをその目的とする。
【０００８】
【課題を解決するための手段】
請求項１の発明は、複数の文書データから指定された文字列を含む文書を検索する全文検索装置において、登録された文書データを保存する文書データ記憶部と、検索用の全文索引記憶部と、ユーザからのデータを入力する入力手段と、検索結果を出力する出力手段と、文書データに関する登録処理を行う登録処理手段と、文書データに関する削除処理を行う削除処理手段と、検索処理を行う検索処理手段とを有し、登録用の全文索引記憶部と、削除用の全文索引記憶部とを、前記検索用の全文索引記憶部とは別に有し、さらに、前記登録用の全文索引記憶部及び削除用の全文索引記憶部から、前記検索用の全文索引記憶部へデータをマージするマージ手段と、ロック処理を行うロック処理手段とを有し、前記登録用の全文索引記憶部及び削除用の全文索引記憶部から、前記検索用の全文索引記憶部へデータをマージする際に、前記マージ手段が全文索引の構成要素であるトークンの転置リストごとに処理を行い、前記ロック処理手段が前記転置リストのトークンにロックをかけることを特徴としたものである。
【００１０】
請求項２の発明は、請求項１の発明において、前記マージ手段は、前記登録用の全文索引記憶部に登録された文書データ件数が予め指定された件数に達したときに、前記検索用の全文索引記憶部にデータをマージする処理を行うことを特徴としたものである。
【００１１】
請求項３の発明は、請求項１の発明において、前記マージ手段は、前記登録用の全文索引記憶部の容量が予め指定された容量に達したときに、前記検索用の全文索引記憶部にデータをマージする処理を行うことを特徴としたものである。
【００１２】
請求項４の発明は、請求項１乃至３のいずれか１の発明において、前記マージ手段は、前記削除用の全文索引記憶部に登録された文書データ件数が予め指定された件数に達したときに、前記検索用の全文索引記憶部にデータをマージする処理を行うことを特徴としたものである。
【００１３】
請求項５の発明は、請求項１乃至３のいずれか１の発明において、前記マージ手段は、前記削除用の全文索引記憶部の容量が予め指定された容量に達したときに、前記検索用の全文索引記憶部にデータをマージする処理を行うことを特徴としたものである。
【００１４】
【発明の実施の形態】
本出願人は、従来技術による転置ファイル方式における利用者側からみた登録・削除処理のレスポンスタイムの長さを解消するために、特願２００１−２２３６０４号明細書において、小規模の全文索引を登録用及び削除用に別に用意し登録及び削除のレスポンスタイムの悪化を防ぎ、検索処理の際には大規模の全文索引の検索結果に、登録用の小規模全文索引の検索結果を加え、削除用の小規模全文索引の検索結果を除き、利用者に返す検索結果とする全文検索装置を提案した。これは、本出願人による特願２００１−７８０２６号明細書に記載の手法を全文検索装置に適用し、登録及び削除のレスポンスタイムの悪化を防止したものである。特願２００１−２２３６０４号明細書に記載の発明では、全文索引の構成要素である転置リストを転送することにより、データ転送に要する時間を短くしたものであり、より具体的には、小規模な全文索引から大規模な全文索引へのデータ転送手段において、元の文書データを用いるのではなく転置ファイル方式の全文索引を用いることによって、データ転送に要する時間を短くしている。
【００１５】
なお、上述の特願２００１−７８０２６号明細書には、高度な検索要求に高速に応答できる性能を維持しつつ、システム稼働中の更新性能をさらに向上させることができるデータベース管理システム、プログラム、及び記録媒体が記載されており、登録・削除のためのデータ保持手段を検索向けデータ保持手段とは別に用意することによって、登録・削除のスループットを高くすることを特徴としている。しかしながら、上述の特願２００１−７８０２６号明細書に記載の手法では、登録用及び削除用の小規模な全文索引から検索用の大規模な全文索引へのデータ転送手段で小規模索引に登録されている文書データの識別子から元の文書データを取得し、大規模な索引に登録及び削除を行っている。上述のごとく、大規模な全文索引への登録・削除処理には時間がかかるので、データ転送処理の時間が長くなり、一般に全文索引への登録・削除処理の間は検索処理が行えないことから、利用者から見た検索処理のレスポンスタイムが悪くなるという問題があった。
【００１６】
上述の特願２００１−２２３６０４号明細書に記載の発明は、全ての転置リストの転送処理が終了するまで検索処理が行えない。すなわち、登録用及び削除用の小規模な全文索引から検索用の大規模な全文索引へのデータ転送が終了しないと、検索処理が行えない。本発明は、トークンにロック処理を加えることにより、転送処理が終了するのを待つことなく検索処理を行えるようにしたものである。換言すると、本発明では、小規模な全文索引から大規模な全文索引へのデータ転送手段において、転送する転置リストのトークンをロックすることにより、転置リスト転送中も検索を行えるようにしている。
【００１７】
図１は、本発明の一実施形態に係る全文検索装置の機能を説明するためのブロック図、図２は、図１における全文検索装置をスタンドアロンで構成した場合のハードウェア構成例を示す図、図３は、図１における全文検索装置をサーバ／クライアントで構成した場合のハードウェア構成例を示す図である。
本発明に係る全文検索装置は、複数の文書データ（複数の電子化文書）から指定された文字列を含む文書を検索する装置である。図１を参照すると、本実施形態においては、入力手段（入力処理手段）１では、登録処理用のテキストデータ，削除処理用の文書識別子，検索処理用の検索条件などのデータがユーザから入力され、それぞれ、登録処理手段３，削除処理手段４，検索処理手段５に渡す処理が行われる。登録処理手段３では文書データに関する登録処理を行う。登録処理手段３における登録処理は文書データ記憶部７及び登録用全文索引記憶部（しばしば登録用小規模全文索引記憶部と呼ぶ）９に対して行われる。削除処理手段４では文書データに関する削除処理を行う。削除処理手段４における削除処理は、入力手段１で入力された文書識別子に基づいて、文書データ記憶部７に記憶された文書データを読み出し、テキスト分割手段６を用い、登録用小規模全文索引記憶部９に登録された索引である場合にはそれを削除し、登録された索引でない場合には削除用全文索引記憶部（しばしば削除用小規模全文索引記憶部と呼ぶ）１０にその索引を記録する。
【００１８】
テキスト分割手段６では、登録処理手段３，削除処理手段４，検索処理手段５の各々で必要な、登録処理における文書データから部分文字列への分割処理、削除処理における文書データから部分文字列への分割処理、検索処理における検索条件（検索文字列）から部分文字列への分割処理を行う。また、検索処理手段５における検索処理は、検索用全文索引記憶部（しばしば検索用大規模全文索引記憶部と呼ぶ）８，登録用小規模全文索引記憶部９，削除用小規模全文索引記憶部１０に対して実行し、記憶部８及び９の検索結果から記憶部１０における検索結果を差し引いた結果を求め、検索結果として出力手段２で出力する。マージ手段１１においては、検索用大規模全文索引記憶部８，登録用小規模全文索引記憶部９，削除用小規模全文索引記憶部１０間でのデータ転送を行う。本発明の特徴として、ロック処理手段１２では、各手段３，４，５，１１におけるそれぞれの処理において、他の処理を防止するためにロックをかけるロック処理を行う。ロック処理手段１２はロック処理を管理する管理手段ともいえる。
【００１９】
図２に示すスタンドアロンでのハードウェア構成においては、図１における入力手段１は入力装置２１に実現され、出力手段２は表示装置２２に実現される。各種処理手段３〜６，１１，１２は主制御装置（ＣＰＵ，メモリ等）２４に、各種記憶部７〜１０は記憶装置２５に実現される。また、入出力制御装置２３は主制御装置２４の制御信号に従って入力装置２１及び表示装置２２を制御する。
【００２０】
図３に示すサーバ／クライアントでのハードウェア構成においては、図１における入力手段１はクライアント３０の入力装置３１で実現され、出力手段２はクライアント３０の表示装置３２に実現される。各種処理手段３〜６，１１，１２はクライアント３０及びサーバ５０の主制御装置（ＣＰＵ，メモリ等）３４，５２に実現され、各種記憶部７〜１０はサーバ５０の記憶装置５３に実現される。また、クライアント３０，サーバ５０のネットワーク制御装置３５，５１は、ネットワーク４０を介してクライアント３０とサーバ５０の間のデータ伝送等の制御を行う。さらにクライアント３０の入出力制御装置３３は、主制御装置３４の制御信号に従って入力装置２１及び表示装置２２を制御する。
【００２１】
以下に、上述のごとく構成された全文検索装置の動作の一例を詳細に説明する。
図４は、本発明の一実施形態に係る全文検索装置における登録処理を説明するためのフロー図である。
登録処理を実行するには、まず利用者が文書データを作成し、入力手段１からその文書データを登録（入力）する（ステップＳ１）。登録処理手段３において文書データを文書データ記憶部７に保存し（ステップＳ３）、同時にその文書データを示す識別子（文書識別子）を定める（ステップＳ２）。さらに登録処理手段３において、テキスト分割手段６を用いて文書データから部分文字列（トークン）とそのトークンの出現位置情報（転置リスト）を得る（ステップＳ４）。次に、ロック処理手段１２により登録用小規模転置索引記憶部（登録用小規模全文索引記憶部）９にＸロックをかける（ステップＳ５）。Ｘロックに関しては後述する。トークンを終了するまで（ステップＳ６でＮＯの間）、文書識別子と各トークンの出現位置情報を登録用小規模全文索引記憶部９に記録する（ステップＳ７）。すなわち、転置リストを登録用小規模全文索引記憶部９に挿入する。ステップＳ６でＹＥＳの場合、登録用小規模全文索引記憶部９のＸロックをはずし（ステップＳ８）、処理を終了する。なお、テキスト分割手段６で使用される分割手法については、Ｎ文字組をトークンとする手法でもよいし、形態素解析を行い単語をトークンとする手法でもよい。以下の例ではＮ文字組みをトークンとする手法を用いたテキスト分割手段に限って説明するが形態素解析を行った単語をトークンとする手法に対しても同様に適用可能である。
【００２２】
図５は、図１の全文検索装置における処理を説明するための図で、全文索引の一例を示す図である。図５の例を用いて転置ファイル方式の全文索引について詳細に説明する。
登録文書データを文書１，文書２とし、それらの内容（ここではテキスト分割手段６で分割することにより得た内容）がそれぞれ、図５の符号６１，６２で表されるものとする。ここで、各文書の左の数字は文字列の先頭からの文字数を表している。つまり、文書１では、「全文検索」は先頭から１１文字目、「方法」は２０，６０文字目、「全文検索方法」は３１文字目に出現していることを意味する。また文書２では、「探索方法」は先頭から１文字目、「方法」は２４文字目、「全文」は３０，４２文字目に出現していることを意味する。
【００２３】
なお、２文字組を部分文字列とする場合、文書中の全ての部分文字列を抽出し、それらの文書内での出現位置（先頭からの文字数）を部分文字列ごとにまとめて索引に記録する。例えば、文書１からは「全文」が１１，３１の位置、「文検」が１２，３２の位置に出現しているので、索引に記録する。索引では、文書内での出現位置だけでなく、どの文書に出現したかを識別するための文書識別子と出現回数を加えて記録するので、図５の符号６３で示したような形式になる。例えば、「全文」に対する転置リスト｛１，２，（１１，３１）｝及び｛２，２，（３０，４２）｝はそれぞれ、文書１において２回出現してその位置は１１，３１であること、及び文書２において２回出現してその位置は３０，４２であることを意味する。
【００２４】
図６及び図７は、本発明の一実施形態に係る全文検索装置における削除処理を説明するためのフロー図である。
削除処理を実行するには、まず利用者が入力手段１から削除する文書の文書識別子を入力するなどして文書削除要求を入力する（ステップＳ１１）。次に、ロック処理手段１２により文書にＸロックをかけ（ステップＳ１２）、削除処理手段４において文書データ記憶部７から文書識別子に対応する文書データを読み出す（取り出す）（ステップＳ１３）。さらに削除処理手段４において、テキスト分割手段６を用いて文書データから部分文字列（トークン）とそのトークンの出現位置情報を得る（ステップＳ１４）。次に、登録用小規模全文索引記憶部９に削除する文書が存在するかを判断し（ステップＳ１５）、文書識別子が登録用小規模全文索引に登録されていない場合（検索用大規模全文索引に登録されている場合）には（ステップＳ１５でＮＯ）、削除用小規模全文索引記憶部１０にＸロックをかけ（ステップＳ１６）、トークンが終了するまで（ステップＳ１７でＹＥＳ）、転置リストを削除用小規模全文索引記憶部１０に挿入、すなわち文書識別子と各トークンの出現位置情報を削除用小規模全文索引記憶部１０に記録し（ステップＳ１８）、削除用小規模全文索引記憶部１０のＸロックをはずす（ステップＳ１９）。一方、文書識別子が登録用小規模全文索引に登録されている文書識別子である場合には（ステップＳ１５でＹＥＳ）、登録用小規模全文索引記憶部９にＸロックをかけ（ステップＳ２２）、トークンが終了するまで（ステップＳ２３でＹＥＳ）、転置リストを登録用小規模全文索引記憶部９から削除、すなわち各トークンの出現位置情報を登録用小規模全文索引記憶部９から削除し（ステップＳ２４）、登録用小規模全文索引記憶部９のＸロックをはずす（ステップＳ２５）。ステップＳ１９，Ｓ２５に続き、削除処理手段４において文書データ記憶部７から文書識別子に対応する文書データを削除し（ステップＳ２０）、文書のＸロックをはずして（ステップＳ２１）、削除処理を終了する。
【００２５】
図８は、本発明の一実施形態に係る全文検索装置における検索処理を説明するためのフロー図である。
検索処理を実行するには、まず利用者が入力手段１から検索文字列（検索データ）を入力する（ステップＳ３１）。次に、検索処理手段５において、テキスト分割手段６を用いて検索文字列からトークンを得る（ステップＳ３２）。そしてロック処理手段１２により全てのトークンにＳロックをかけ（ステップＳ３３）、検索処理手段５において検索用大規模全文索引記憶部８の検索用大規模全文索引を用いて、検索文字列を含む文書データの文書識別子の集合（Ｒｓ）を得る（ステップＳ３４）とともに、登録用小規模全文索引記憶部９の登録用小規模全文索引を用いて、検索文字列を含む文書データの文書識別子の集合（Ｒｉ）を得る（ステップＳ３５）。さらに、検索処理手段５において削除用小規模全文索引記憶部１０の削除用小規模全文索引を用いて、検索文字列を含む文書データの文書識別子の集合（Ｒｄ）を得る（ステップＳ３６）。そしてロック処理手段１２は全てのトークンのＳロックをはずし（ステップＳ３７）、検索処理手段５は得られた文書識別子の集合（Ｒｓ，Ｒｉ，Ｒｄ）に対して下記の集合演算を行い（ステップＳ３８）、その結果を検索結果（Ｒ）とし、出力手段２を通じて利用者に検索文字列を含む文書データの文書識別子の集合を出力する（ステップＳ３９）。なお、Ｓロックについては後述する。
Ｒ＝Ｒｓ＋Ｒｉ−Ｒｄ
ただし、＋を論理和演算子、−を論理差演算子とする。
【００２６】
図５の全文索引６３を例として検索処理について詳細に説明する。
検索文字列を「全文検索」とすると、テキスト分割手段が「全文」，「文検」，「検索」の３個のトークンを抽出する。次に全文索引６３の対応するトークンの３つの転置リストを調べる。それぞれのトークン出現位置の差が１であるものを探すと文書識別子１の１１文字目と３１文字目に「全文検索」が存在することがわかる。
【００２７】
図９は、本発明の一実施形態に係る全文検索装置におけるマージ処理を説明するためのフロー図である。
転送処理に使用するデータに、転置リストを用いて行うと、元の文書データを用いて登録・削除処理を行う場合に比べて、処理開始時にすでに作成されている転置リストを直接利用するのでテキスト分割処理によるトークンの切り出し及びその転置リスト作成に要する時間が不要となるためデータ転送時間を短くできる。本発明においては転置リスト同士の処理であることからデータ転送処理のことをマージ処理と呼ぶ。
【００２８】
マージ処理を実行するには、まず削除用小規模全文索引記憶部１０にトークンがあるかを判断し（ステップＳ４１）、存在すれば（ステップＳ４１でＹＥＳ）、削除用小規模全文索引の全てのトークンに対して、ステップＳ４３〜Ｓ４５の処理を行う。すなわち、ステップＳ４３ではマージするトークンにＸロックをかける。ステップＳ４４では、全文索引からそのトークンの転置リストを取り出し、検索用大規模全文索引の対応するトークンの転置リストから、取り出した転置リスト中の出現位置情報を削除する（転置リストを検索用大規模全文索引記憶部８から削除する）。ステップＳ４５ではマージするトークンのＸロックをはずし、ステップＳ４１へ戻る。一方、ステップＳ４１でＮＯの場合、削除用小規模全文索引記憶部１０を空にする（ステップＳ４２）。
【００２９】
次に、登録用小規模全文索引記憶部９にトークンがあるかを判断し（ステップＳ４６）、存在すれば（ステップＳ４６でＹＥＳ）、登録用小規模全文索引の全てのトークンに対して、ステップＳ４８〜Ｓ５０の処理を行う。すなわち、ステップＳ４８ではマージするトークンにＸロックをかける。ステップＳ４９では、全文索引からそのトークンの転置リストを取り出し、検索用大規模全文索引の対応するトークンの転置リストの末尾に先の転置リストを加える（転置リストを検索用大規模全文索引記憶部８に登録する）。ステップＳ５０ではマージしたトークンのＸロックをはずし、ステップＳ４６へ戻る。一方、ステップＳ４６でＮＯの場合、登録用小規模全文索引記憶部９を空にする
【００３０】
図１０は、図５における全文索引６３のトークン「全文」の転置リストを例にマージ処理の概要を説明するための図である。
検索用全文索引の転置リスト７１としての、「全文」に対する転置リスト｛１，２，（１１，３１）｝，｛２，２，（３０，４２）｝と、削除用全文索引の転置リスト７２としての、「全文」に対する転置リスト｛１，２，（１１，３１）｝とのマージ処理７３を実行することにより、「全文」に対する転置リスト｛２，２，（３０，４２）｝（７４）が得られる。さらに、この転置リスト７４と、登録用全文索引の転置リスト７６としての、「全文」に対する転置リスト｛５，２，（４，１６）｝，｛８，１，（３）｝とをマージ処理７５することにより、「全文」に対する転置リスト｛２，２，（３０，４２）｝，｛５，２，（４，１６）｝，｛８，１，（３）｝（７７）が得られる。
【００３１】
（マージ処理の形態１）
マージ処理は、登録用小規模全文索引記憶部９における登録用小規模全文索引に登録されている文書識別子の数が予め指定されている数に達したときに登録処理手段３によって起動されるようにしてもよい。
【００３２】
（マージ処理の形態２）
マージ処理は、登録用小規模全文索引記憶部９における記憶容量（大きさ）が予め指定されているサイズになったときに登録処理手段３によって起動されるようにしてもよい。この形態により、利用者から登録される文書データの大きさにばらつきがあるような応用形態として使用される場合に、小さな文書データが連続して登録されたときに登録用小規模全文索引への登録時間が長くなる前にマージ処理が開始されることを防ぐことができる。サイズを起動条件にすることでマージの処理時間を均等にすることができる。さらに、前述のマージ処理（形態１）の場合には件数を起動条件にしており全文索引記憶部の大きさを管理する必要がないので処理が簡単になる利点がある。
形態１，２では、文章を登録することが多い場合にも更新処理全体のスループットを上げることができる。
【００３３】
（マージ処理の形態３）
削除用小規模全文索引のマージ処理は削除処理手段４によって起動される。起動条件は削除用小規模全文索引に登録されている文書識別子の数が予め指定されている数に達したときとしてもよい。
【００３４】
（マージ処理の形態４）
削除用小規模全文索引のマージ処理は削除処理手段４によって起動される。起動条件は削除用小規模全文索引記憶部１０の大きさが予め指定されているサイズに達したときとしてもよい。
形態３，４では、文章を削除すること（削除処理）が多い場合にも更新処理全体のスループットを上げることができる。
【００３５】
上述のごときマージ処理の各形態により、全文検索装置においては登録・削除する文書データの特徴や利用分野の特徴に適した条件で全文索引のマージ処理を開始することが可能となり、マージ処理の発生回数を減らせ、システム全体のスループットを向上させることが可能となる。
【００３６】
図１１は、本発明の一実施形態に係る全文検索装置におけるロック処理を説明するためのフロー図である。
ロックにはＸとＳの２種類のモードがあり、あるオブジェクトにＸロックがかかっていると、他のユーザはそのオブジェクトにロックをかけることはできない。また、あるオブジェクトにＳロックがかかっていると、他のユーザはそのオブジェクトにはＳロックしかかけることはできない。ロック処理手順は、そのような仕組みにより、オブジェクト間の排他制御を行っている。ロック処理手段１２におけるロック処理は、まずロック要求があると（ステップＳ６１でＹＥＳ）、Ｘロックであるかを判断する（ステップＳ６２）。ステップＳ６２でＹＥＳの場合、既にロックされているかを判断し（ステップＳ６３）、ロックされていればロックが解除されるのを待ち（ステップＳ６５）ステップＳ６２へ戻り、ロックされていなければＸロックをかけて（ステップＳ６６）、処理を終了する。一方、ステップＳ６２でＮＯの場合、既にロックされているかを判断し（ステップＳ６４）、ロックされていればステップＳ６５へ進みロックが解除されるのを待ってステップＳ６２へ戻り、ロックされていなければＳロックをかけて（ステップＳ６７）、処理を終了する。
【００３７】
例えば、あるトークンにＸロックがかかっていると、検索するためにそのトークンにＳロックをかけようとしたユーザは、Ｘロックがはずされるまで、待つことになる。また、あるユーザが検索するためにあるトークンにＳロックをかけていると、マージ処理を行うためにそのトークンにＸロックをかけようとしたマージ処理手順は、検索が終了し、そのトークンのＳロックがはずされるまで、待つことになる。
【００３８】
以上、本発明の全文検索装置を中心に各実施形態を説明してきたが、本発明は、これら全文検索装置における処理手順を含んでなる全文検索方法、これら全文検索装置として機能させるためのプログラム、又はその各手段として機能させるためのプログラムとしても、或いは、そのプログラムを記録したコンピュータ読み取り可能な記録媒体としての形態も可能である。
【００３９】
本発明による全文検索の機能を実現するためのプログラムやデータを記憶した記録媒体の実施形態を説明する。記録媒体としては、具体的には、ＣＤ−ＲＯＭ、光磁気ディスク、ＤＶＤ−ＲＯＭ、ＦＤ、フラッシュメモリ、及びその他各種ＲＯＭやＲＡＭ等が想定でき、これら記録媒体に上述した本発明の各実施形態の装置の機能をコンピュータに実行させ、全文検索の機能を実現するためのプログラムを記録して流通させることにより、当該機能の実現を容易にする。そしてコンピュータ等の情報処理装置に上記のごとくの記録媒体を装着して情報処理装置によりプログラムを読み出すか、若しくは情報処理装置が備えている記憶媒体に当該プログラムを記憶させておき、必要に応じて読み出すことにより、本発明に係わる全文検索機能を実行することができる。
【００４０】
【発明の効果】
本発明によれば、全文検索装置における登録・削除処理を小規模な全文索引記憶部に対して行うので、その処理時間は短く抑えることが可能となり、利用者へのレスポンスタイムを短くすることが可能となる。さらに、本発明によれば、検索用全文索引へのデータ登録・削除の際に、トークンにロックをかけながら既に作成されている転置リストを直接利用することができるので、検索用全文索引へのマージ処理の時間を短縮でき、また、同時に検索処理を行うこともできる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る全文検索装置の機能を説明するためのブロック図である。
【図２】図１における全文検索装置をスタンドアロンで構成した場合のハードウェア構成例を示す図である。
【図３】図１における全文検索装置をサーバ／クライアントで構成した場合のハードウェア構成例を示す図である。
【図４】本発明の一実施形態に係る全文検索装置における登録処理を説明するためのフロー図である。
【図５】図１の全文検索装置における処理を説明するための図で、全文索引の一例を示す図である。
【図６】本発明の一実施形態に係る全文検索装置における削除処理を説明するためのフロー図である。
【図７】本発明の一実施形態に係る全文検索装置における削除処理を説明するためのフロー図である。
【図８】本発明の一実施形態に係る全文検索装置における検索処理を説明するためのフロー図である。
【図９】本発明の一実施形態に係る全文検索装置におけるマージ処理を説明するためのフロー図である。
【図１０】図５における全文索引のトークン「全文」の転置リストを例にマージ処理の概要を説明するための図である。
【図１１】本発明の一実施形態に係る全文検索装置におけるロック処理を説明するためのフロー図である。
【符号の説明】
１…入力手段、２…出力手段、３…登録処理手段、４…削除処理手段、５…検索処理手段、６…テキスト分割手段、７…文書データ記憶部、８…検索用大規模全文索引記憶部、９…登録用小規模全文索引記憶部、１０…削除用小規模全文索引記憶部、１１…マージ手段、１２…ロック処理手段、２１，３１…入力装置、２２，３２…表示装置、２３，３３…入出力制御装置、２４，３４，５２…主制御装置（ＣＰＵ・メモリ）、２５，５３…記憶装置、３０…クライアント、３５，５１…ネットワーク制御装置、４０…ネットワーク、５０…サーバ。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a full-text search device, and more particularly to a full-text search device that searches a plurality of document data for documents containing a specified character string. The present invention is applicable to systems that manage large amounts of document data, such as document management systems, electronic library systems, and patent publication search systems.
[0002]
[Prior art]
In recent years, with the development of information and communication technology, large quantities of digitized documents and information about those documents have come to be distributed via the Internet and similar channels. To support this distribution, document search apparatuses have been proposed that retrieve desired documents accurately and at high speed.
[0003]
Such document search apparatuses use a keyword search method or a full-text search method. A full-text search device matches an arbitrary search character string against all of the documents to be searched and extracts every document containing the search character string without omission. Unlike the keyword search method, it does not require the great manual effort of assigning keywords in advance to every document to be searched. Various types of full-text search devices have been proposed, one of which employs the inverted (index) file method. In the inverted file method, an inverted file is built in advance as an auxiliary file for searching; it records the documents in which each character, word, or n-gram (sequence of n consecutive characters) appears, or the positions at which they appear in those documents. At search time only the inverted file is consulted, so searches are very fast, which makes the method effective for systems that require high-speed retrieval over a large number of documents.
[0004]
Full-text search methods in general and the inverted file method in particular are described in detail in the prior-art section of Japanese Patent Application Laid-Open No. 11-073429 and in the fiscal 1998 activity report of the Full-Text Search System Council (http://www.ftsanet.com/dbtokyo99/Db99.htm), among others. Because they are publicly known, their description is omitted here.
[0005]
However, the inverted file method usually requires building an inverted file several times the size of the original data, and registration and deletion in an inverted-file full-text index take longer as the amount of registered document data grows. As a result, the response time of registration and deletion, as seen by the user of the full-text search device, becomes longer. Search processing must also wait while registration or deletion is in progress.
[0006]
Japanese Patent Application Laid-Open No. 7-146880 describes a document search apparatus and method in which a new document is registered in a sub-index smaller than the main index, which shortens the registration time. However, although the registration time is shortened, search processing still cannot be performed while a new document is being registered.
[0007]
[Problems to be solved by the invention]
The present invention has been made in view of the above circumstances. Its object is to provide a full-text search device that shortens the response time of registration and deletion processing as seen by the user, and that can perform search processing before registration and deletion processing have completed.
[0008]
[Means for Solving the Problems]
The invention of claim 1 is a full-text search device that searches a plurality of document data for documents containing a specified character string, comprising: a document data storage unit that stores registered document data; a full-text index storage unit for search; input means for inputting data from a user; output means for outputting search results; registration processing means for performing registration processing on document data; deletion processing means for performing deletion processing on document data; and search processing means for performing search processing. The device has a full-text index storage unit for registration and a full-text index storage unit for deletion, separate from the full-text index storage unit for search. It further has merge means for merging data from the full-text index storage unit for registration and the full-text index storage unit for deletion into the full-text index storage unit for search, and lock processing means for performing lock processing. When data is merged from the full-text index storage units for registration and deletion into the full-text index storage unit for search, the merge means processes the data one inverted list at a time, an inverted list being kept for each token that is a constituent element of the full-text index, and the lock processing means locks the token of that inverted list.
[0010]
The invention of claim 2 is the invention of claim 1, wherein the merge means merges data into the full-text index storage unit for search when the number of document data items registered in the full-text index storage unit for registration reaches a predetermined number.
[0011]
The invention of claim 3 is the invention of claim 1, wherein the merge means merges data into the full-text index storage unit for search when the capacity of the full-text index storage unit for registration reaches a predetermined capacity.
[0012]
The invention of claim 4 is the invention of any one of claims 1 to 3, wherein the merge means merges data into the full-text index storage unit for search when the number of document data items registered in the full-text index storage unit for deletion reaches a predetermined number.
[0013]
The invention of claim 5 is the invention of any one of claims 1 to 3, wherein the merge means merges data into the full-text index storage unit for search when the capacity of the full-text index storage unit for deletion reaches a predetermined capacity.
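The merge triggers of claims 2 to 5 (a document count or a capacity reaching a preset value) can be illustrated with a minimal Python sketch. The class name `SmallIndex`, the function `should_merge`, and the use of the total number of stored positions as the "capacity" measure are assumptions made for illustration only; the specification does not say how capacity is measured.

```python
from dataclasses import dataclass, field


@dataclass
class SmallIndex:
    """Hypothetical small full-text index (registration or deletion side)."""
    postings: dict = field(default_factory=dict)  # token -> {doc_id: [positions]}
    doc_ids: set = field(default_factory=set)     # documents recorded in this index

    def add(self, doc_id, token_positions):
        """Record the inverted-list entries of one document."""
        self.doc_ids.add(doc_id)
        for token, positions in token_positions.items():
            self.postings.setdefault(token, {})[doc_id] = list(positions)

    def size(self):
        """Assumed capacity measure: total number of stored positions."""
        return sum(len(p) for docs in self.postings.values() for p in docs.values())


def should_merge(index, max_docs=None, max_size=None):
    """Return True when a configured threshold has been reached.

    max_docs corresponds to the count-based trigger (claims 2 and 4),
    max_size to the capacity-based trigger (claims 3 and 5).
    """
    if max_docs is not None and len(index.doc_ids) >= max_docs:
        return True
    if max_size is not None and index.size() >= max_size:
        return True
    return False
```

In this sketch the same check would serve both the registration-side and the deletion-side index, with the thresholds chosen per index.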
[0014]
DETAILED DESCRIPTION OF THE INVENTION
To eliminate the long response time of registration and deletion processing, as seen by the user, in the conventional inverted file method, the present applicant proposed a full-text search device in the specification of Japanese Patent Application No. 2001-223604. In that device, small-scale full-text indexes are prepared separately for registration and for deletion so that the response time of registration and deletion does not deteriorate. During search processing, the search result of the small-scale full-text index for registration is added to the search result of the large-scale full-text index, the search result of the small-scale full-text index for deletion is removed, and the outcome is returned to the user as the search result. This applies the technique described in the specification of the applicant's Japanese Patent Application No. 2001-78026 to a full-text search device in order to prevent deterioration of the registration and deletion response times. The invention described in the specification of Japanese Patent Application No. 2001-223604 shortens the time required for data transfer by transferring inverted lists, which are constituent elements of a full-text index. More specifically, the data transfer means that moves data from the small-scale full-text indexes to the large-scale full-text index uses the inverted-file full-text index itself rather than the original document data, which shortens the time required for data transfer.
[0015]
The specification of Japanese Patent Application No. 2001-78026 mentioned above describes a database management system, a program, and a recording medium that can further improve update performance while the system is running, while maintaining the ability to respond quickly to advanced search requests. It is characterized by increasing registration and deletion throughput by providing data holding means for registration and deletion separately from the data holding means for search. However, in the technique described in that specification, the data transfer means that moves data from the small-scale full-text indexes for registration and deletion to the large-scale full-text index for search obtains the original document data from the identifiers of the document data registered in the small-scale indexes, and then performs registration and deletion on the large-scale index. As noted above, registration and deletion on a large-scale full-text index take time, so the data transfer takes longer. Because search processing generally cannot be performed while registration or deletion on a full-text index is in progress, the response time of search processing as seen by the user deteriorates.
[0016]
In the invention described in the specification of Japanese Patent Application No. 2001-223604 mentioned above, search processing cannot be performed until the transfer of all inverted lists has finished. That is, search processing cannot be performed until data transfer from the small-scale full-text indexes for registration and deletion to the large-scale full-text index for search is complete. The present invention adds lock processing on tokens so that search processing can be performed without waiting for the transfer to finish. In other words, in the data transfer means that moves data from the small-scale full-text indexes to the large-scale full-text index, the token of each inverted list being transferred is locked, so that searches can be performed even while inverted lists are being transferred.
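The token-level locking idea can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the names `TokenLock`, `lock_for`, `merge`, and `lookup` are hypothetical, each index is modeled as a dictionary `token -> {doc_id: [positions]}`, and a deletion removes a document's whole entry from a token's inverted list. The lock semantics follow the specification's lock description (an X lock excludes every other lock; an S lock admits only further S locks), and the sample values are those given for the token 「全文」 in the FIG. 10 merge example.

```python
import threading


class TokenLock:
    """Shared (S) / exclusive (X) lock for a single token."""

    def __init__(self):
        self._cond = threading.Condition()
        self._s_holders = 0    # number of S locks currently held
        self._x_held = False   # True while an X lock is held

    def acquire_s(self):
        with self._cond:
            while self._x_held:                      # S waits only for X
                self._cond.wait()
            self._s_holders += 1

    def release_s(self):
        with self._cond:
            self._s_holders -= 1
            self._cond.notify_all()

    def acquire_x(self):
        with self._cond:
            while self._x_held or self._s_holders:   # X waits for every lock
                self._cond.wait()
            self._x_held = True

    def release_x(self):
        with self._cond:
            self._x_held = False
            self._cond.notify_all()


_locks = {}
_locks_guard = threading.Lock()


def lock_for(token):
    """Return the lock of a token, creating it on first use."""
    with _locks_guard:
        if token not in _locks:
            _locks[token] = TokenLock()
        return _locks[token]


def merge(search_idx, reg_idx, del_idx):
    """Merge both small indexes into the search index, one token at a time."""
    for token, docs in list(del_idx.items()):        # deletions first
        lock = lock_for(token)
        lock.acquire_x()                             # only this token is blocked
        try:
            target = search_idx.get(token, {})
            for doc_id in docs:
                target.pop(doc_id, None)
        finally:
            lock.release_x()
    del_idx.clear()
    for token, docs in list(reg_idx.items()):        # then registrations
        lock = lock_for(token)
        lock.acquire_x()
        try:
            search_idx.setdefault(token, {}).update(docs)
        finally:
            lock.release_x()
    reg_idx.clear()


def lookup(search_idx, token):
    """Read one inverted list under an S lock, as a search would."""
    lock = lock_for(token)
    lock.acquire_s()
    try:
        return dict(search_idx.get(token, {}))
    finally:
        lock.release_s()
```

Because the X lock is held per token rather than for the whole merge, a search in this sketch waits only if it needs the token that is being merged at that moment.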
[0017]
FIG. 1 is a block diagram for explaining the functions of a full-text search device according to an embodiment of the present invention. FIG. 2 is a diagram showing an example hardware configuration when the full-text search device of FIG. 1 is configured as a stand-alone system. FIG. 3 is a diagram showing an example hardware configuration when the full-text search device of FIG. 1 is configured as a server/client system.
The full-text search device according to the present invention searches a plurality of document data (a plurality of digitized documents) for documents containing a specified character string. Referring to FIG. 1, in this embodiment the input means (input processing means) 1 receives data from the user, such as text data for registration processing, document identifiers for deletion processing, and search conditions for search processing, and passes them to the registration processing means 3, the deletion processing means 4, and the search processing means 5, respectively. The registration processing means 3 performs registration processing on document data. This registration processing is performed on the document data storage unit 7 and the full-text index storage unit for registration (often called the small-scale full-text index storage unit for registration) 9. The deletion processing means 4 performs deletion processing on document data. Based on the document identifier entered through the input means 1, the deletion processing reads the document data stored in the document data storage unit 7 and applies the text dividing means 6. If the resulting index entries are registered in the small-scale full-text index storage unit for registration 9, they are deleted from it; if they are not, they are recorded in the full-text index storage unit for deletion (often called the small-scale full-text index storage unit for deletion) 10.
[0018]
The text dividing means 6 performs the division required by the registration processing means 3, the deletion processing means 4, and the search processing means 5: it divides document data into partial character strings for registration processing and for deletion processing, and divides the search condition (search character string) into partial character strings for search processing. The search processing means 5 executes its search against the full-text index storage unit for search (often called the large-scale full-text index storage unit for search) 8, the small-scale full-text index storage unit for registration 9, and the small-scale full-text index storage unit for deletion 10. It subtracts the search result of storage unit 10 from the search results of storage units 8 and 9, and the output means 2 outputs the outcome as the search result. The merge means 11 transfers data among storage units 8, 9, and 10. As a feature of the present invention, the lock processing means 12 applies locks during the processing of each of the means 3, 4, 5, and 11 so that other processing is kept out while that processing runs. The lock processing means 12 can also be regarded as management means that manages lock processing.
[0019]
In the stand-alone hardware configuration shown in FIG. 2, the input means 1 in FIG. 1 is realized by the input device 21, and the output means 2 is realized by the display device 22. Various processing means 3 to 6, 11 and 12 are realized in a main control device (CPU, memory, etc.) 24, and various storage units 7 to 10 are realized in a storage device 25. The input / output control device 23 controls the input device 21 and the display device 22 in accordance with a control signal from the main control device 24.
[0020]
In the server/client hardware configuration shown in FIG. 3, the input means 1 in FIG. 1 is realized by the input device 31 of the client 30, and the output means 2 is realized by the display device 32 of the client 30. The various processing means 3 to 6, 11, and 12 are realized in the main control devices (CPU, memory, etc.) 34 and 52 of the client 30 and the server 50, and the various storage units 7 to 10 are realized in the storage device 53 of the server 50. In addition, the network control devices 35 and 51 of the client 30 and the server 50 control data transmission and the like between the client 30 and the server 50 via the network 40. Further, the input/output control device 33 of the client 30 controls the input device 21 and the display device 22 according to the control signal of the main control device 34.
[0021]
An example of the operation of the full-text search device configured as described above is explained in detail below.
FIG. 4 is a flowchart for explaining the registration process in the full-text search apparatus according to the embodiment of the present invention.
To execute registration processing, the user first creates document data and registers (inputs) it through the input means 1 (step S1). The registration processing means 3 saves the document data in the document data storage unit 7 (step S3) and at the same time determines an identifier for the document data (a document identifier) (step S2). The registration processing means 3 then uses the text dividing means 6 to obtain, from the document data, partial character strings (tokens) and the appearance position information of each token (inverted lists) (step S4). Next, the lock processing means 12 applies an X lock to the small-scale inverted index storage unit for registration (the small-scale full-text index storage unit for registration) 9 (step S5). The X lock is described later. Until all tokens have been processed (while step S6 is NO), the document identifier and the appearance position information of each token are recorded in the small-scale full-text index storage unit for registration 9 (step S7); that is, the inverted lists are inserted into storage unit 9. When step S6 is YES, the X lock on storage unit 9 is released (step S8) and the processing ends. The division method used by the text dividing means 6 may take N-character sequences (N-grams) as tokens, or may perform morphological analysis and take words as tokens. The following examples are described only for text dividing means that take N-character sequences as tokens, but they apply equally to a method that takes morphologically analyzed words as tokens.
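The registration flow of steps S1 to S8 can be sketched in Python as below. This is a simplified illustration, not the device's actual implementation: the module-level dictionaries, the function names `tokenize` and `register`, sequential integer identifiers, and the use of a plain `threading.Lock` to stand in for the X lock on storage unit 9 are all assumptions.

```python
import itertools
import threading

doc_store = {}                      # stands in for document data storage unit 7
reg_index = {}                      # small index for registration: token -> {doc_id: [positions]}
reg_index_lock = threading.Lock()   # stands in for the X lock on storage unit 9
_next_id = itertools.count(1)       # assumed: identifiers are sequential integers


def tokenize(text, n=2):
    """Yield (token, position) for every n-character substring; positions are 1-based."""
    for i in range(len(text) - n + 1):
        yield text[i:i + n], i + 1


def register(text):
    """Register one document and return its document identifier."""
    doc_id = next(_next_id)                       # S2: determine the identifier
    doc_store[doc_id] = text                      # S3: save the document data
    entries = {}
    for token, pos in tokenize(text):             # S4: tokens and their positions
        entries.setdefault(token, []).append(pos)
    with reg_index_lock:                          # S5: lock, released on exit (S8)
        for token, positions in entries.items():  # S6/S7: record every token
            reg_index.setdefault(token, {})[doc_id] = positions
    return doc_id
```

Only the small registration index is touched here, which is why the response time seen by the user stays short regardless of how large the search index has grown.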
[0022]
FIG. 5 is a diagram for explaining processing in the full-text search device of FIG. 1 and shows an example of a full-text index. The full-text index of the inverted file method is described in detail below using the example of FIG. 5.
The registered document data are document 1 and document 2, and their contents (here, the contents obtained by division with the text dividing means 6) are represented by reference numerals 61 and 62 in FIG. 5, respectively. The number to the left of each entry is the character count from the beginning of the character string. That is, in document 1, 「全文検索」 ("full-text search") appears at the 11th character from the beginning, 「方法」 ("method") at the 20th and 60th characters, and 「全文検索方法」 ("full-text search method") at the 31st character. In document 2, 「探索方法」 ("search method") appears at the 1st character, 「方法」 at the 24th character, and 「全文」 ("full text") at the 30th and 42nd characters.
[0023]
When two-character sequences are used as partial character strings, all partial character strings in a document are extracted, and their appearance positions in the document (character counts from the beginning) are grouped by partial character string and recorded in the index. For example, in document 1, 「全文」 appears at positions 11 and 31 and 「文検」 at positions 12 and 32, so these are recorded in the index. The index records not only the appearance positions within a document but also a document identifier, which identifies the document in which the token appeared, and the number of appearances, giving the format shown by reference numeral 63 in FIG. 5. For example, the inverted lists {1, 2, (11, 31)} and {2, 2, (30, 42)} for 「全文」 mean, respectively, that the token appears twice in document 1 at positions 11 and 31, and twice in document 2 at positions 30 and 42.
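The index format described above, {document identifier, number of appearances, (positions)} per token, can be reproduced with a short Python sketch. The function name `build_index` is hypothetical, and the sample strings are made up for illustration; they are not the contents of documents 1 and 2 in FIG. 5, which are not reproduced in full in the text.

```python
from collections import defaultdict


def build_index(docs, n=2):
    """Build an n-gram inverted index.

    docs maps a document identifier to its text. The result maps each token
    to a list of (doc_id, count, positions) entries, with 1-based positions.
    """
    index = defaultdict(list)
    for doc_id in sorted(docs):
        text = docs[doc_id]
        positions = defaultdict(list)
        for i in range(len(text) - n + 1):
            positions[text[i:i + n]].append(i + 1)
        for token, pos in positions.items():
            index[token].append((doc_id, len(pos), tuple(pos)))
    return dict(index)


# Made-up sample texts (not the documents of FIG. 5).
sample = {1: "全文検索と全文", 2: "文検"}
index = build_index(sample)
```

With these sample texts, the entry for 「全文」 records that it appears twice in document 1, at the 1st and 6th characters, in the same shape as the {1, 2, (11, 31)} entry discussed above.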
[0024]
FIGS. 6 and 7 are flowcharts for explaining the deletion process in the full-text search apparatus according to the embodiment of the present invention.
To execute the deletion process, the user first inputs a document deletion request by entering, from the input means 1, the document identifier of the document to be deleted (step S11). Next, the lock processing means 12 applies an X lock to the document (step S12), and the deletion processing means 4 reads out the document data corresponding to the document identifier from the document data storage unit 7 (step S13). The deletion processing means 4 then obtains the partial character strings (tokens) and the appearance information of each token from the document data using the text dividing means 6 (step S14). Next, it is determined whether the document to be deleted is in the registration small full-text index storage unit 9 (step S15). If the document identifier is not registered in the registration small full-text index, that is, if the document is registered in the search large-scale full-text index (NO in step S15), an X lock is applied to the deletion small full-text index storage unit 10 (step S16), and, until the tokens are exhausted (YES in step S17), the transposed list of each token is inserted into the deletion small full-text index storage unit 10, that is, the document identifier and the appearance position information of each token are recorded in the deletion small full-text index storage unit 10 (step S18). The X lock is then released (step S19). On the other hand, if the document identifier is registered in the registration small full-text index (YES in step S15), an X lock is applied to the registration small full-text index storage unit 9 (step S22), and, until the tokens are exhausted (YES in step S23), the transposed list of each token is deleted from the registration small full-text index storage unit 9, that is, the appearance position information of each token is deleted from the registration small full-text index storage unit 9 (step S24). The X lock on the registration small full-text index storage unit 9 is then released (step S25).
Subsequent to steps S19 and S25, the deletion processing means 4 deletes the document data corresponding to the document identifier from the document data storage unit 7 (step S20), releases the X lock on the document (step S21), and ends the deletion process.
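The branch at step S15 can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `Store`, `bigrams`, and `delete_document` are hypothetical, the text dividing means 6 is approximated by a two-character tokenizer, and the X locks of steps S12, S16, S19, S21, S22, and S25 are omitted for brevity.

```python
from dataclasses import dataclass, field

# Assumed index layout: token -> {document identifier: [appearance positions]}
Index = dict[str, dict[int, list[int]]]


def bigrams(text: str) -> dict[str, list[int]]:
    """Hypothetical stand-in for text dividing means 6:
    two-character tokens with 1-based appearance positions."""
    out: dict[str, list[int]] = {}
    for i in range(len(text) - 1):
        out.setdefault(text[i:i + 2], []).append(i + 1)
    return out


@dataclass
class Store:
    documents: dict[int, str] = field(default_factory=dict)  # document data storage unit 7
    reg_index: Index = field(default_factory=dict)           # registration small index (unit 9)
    del_index: Index = field(default_factory=dict)           # deletion small index (unit 10)


def delete_document(store: Store, doc_id: int) -> None:
    text = store.documents[doc_id]                            # step S13
    tokens = bigrams(text)                                    # step S14
    # Step S15: is the document still in the registration small index?
    in_reg = any(doc_id in store.reg_index.get(t, {}) for t in tokens)
    if in_reg:
        # Steps S23-S24: remove the document's entries from the registration index.
        for t in tokens:
            postings = store.reg_index.get(t, {})
            postings.pop(doc_id, None)
            if not postings:
                store.reg_index.pop(t, None)
    else:
        # Steps S17-S18: record identifier and positions in the deletion index.
        for t, positions in tokens.items():
            store.del_index.setdefault(t, {})[doc_id] = positions
    del store.documents[doc_id]                               # step S20
```

In this sketch a document that has not yet been merged never reaches the deletion index; it is simply withdrawn from the registration index, which mirrors the YES branch of step S15.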
[0025]
FIG. 8 is a flowchart for explaining the search processing in the full-text search device according to the embodiment of the present invention.
To execute the search process, the user first inputs a search character string (search data) from the input means 1 (step S31). Next, the search processing means 5 obtains tokens from the search character string using the text dividing means 6 (step S32). The lock processing means 12 then applies an S lock to all of the tokens (step S33). The search processing means 5 obtains the set (Rs) of document identifiers of the document data containing the search character string using the search large-scale full-text index in the search large-scale full-text index storage unit 8 (step S34), and obtains the corresponding set (Ri) using the registration small full-text index in the registration small full-text index storage unit 9 (step S35). Further, the search processing means 5 obtains the corresponding set (Rd) using the deletion small full-text index in the deletion small full-text index storage unit 10 (step S36). The lock processing means 12 then releases the locks on all of the tokens (step S37), and the search processing means 5 performs the following set operation on the obtained sets of document identifiers (Rs, Ri, Rd) (step S38), takes the result as the search result (R), and outputs this set of document identifiers of the document data containing the search character string to the user through the output means 2 (step S39). The S lock will be described later.
R = Rs + Ri − Rd
Here, + is the logical sum operator and − is the logical difference operator.
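As a brief illustration, the set operation of step S38 maps directly onto Python's built-in set type. The function name `combine_results` is hypothetical, and the left-to-right grouping (Rs + Ri) − Rd is an assumption about how the expression is read.

```python
def combine_results(rs: set[int], ri: set[int], rd: set[int]) -> set[int]:
    """R = Rs + Ri - Rd, with + as set union and - as set difference."""
    return (rs | ri) - rd


# Documents 1 and 2 found in the search index, 5 and 8 in the registration
# index, and document 1 recorded in the deletion index.
result = combine_results({1, 2}, {5, 8}, {1})
```

Because a document that is still in the registration small full-text index is removed from that index directly rather than recorded in the deletion index, Ri and Rd do not overlap in the flow described above, so the grouping does not change the result.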
[0026]
The search process will be described in detail using the full text index 63 of FIG. 5 as an example.
If the search character string is “full text search”, the text dividing means extracts three tokens, “full text”, “sentence check”, and “search” (overlapping partial character strings of the search character string in the original Japanese). Next, the three transposed lists of the corresponding tokens in the full text index 63 are examined. Looking for occurrences in which the appearance positions of successive tokens differ by 1 shows that “full text search” occurs at the 11th and 31st characters of the document with document identifier 1.
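The position check described above can be sketched in Python as follows. This is only an illustration: the function `phrase_positions` is hypothetical, and since only the transposed list for “full text” is quoted in the text, the lists for the other two tokens are assumed values chosen to be consistent with the stated result.

```python
Index = dict[str, dict[int, list[int]]]  # token -> {document identifier: positions}


def phrase_positions(index: Index, tokens: list[str]) -> dict[int, list[int]]:
    """Return, per document, the start positions at which the k-th token
    appears exactly k characters after the first token."""
    if not tokens:
        return {}
    hits: dict[int, list[int]] = {}
    for doc_id, starts in index.get(tokens[0], {}).items():
        found = [
            p for p in starts
            if all(p + k in index.get(tok, {}).get(doc_id, [])
                   for k, tok in enumerate(tokens))
        ]
        if found:
            hits[doc_id] = found
    return hits


example_index: Index = {
    "full text": {1: [11, 31], 2: [30, 42]},   # list quoted for FIG. 10
    "sentence check": {1: [12, 32]},            # assumed for illustration
    "search": {1: [13, 33], 2: [50]},           # assumed for illustration
}
matches = phrase_positions(example_index, ["full text", "sentence check", "search"])
```

With these assumed lists, `matches` contains only document identifier 1 with start positions 11 and 31; document 2 contains “full text” but is rejected because the following tokens do not appear at the adjacent positions.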
[0027]
FIG. 9 is a flowchart for explaining merge processing in the full-text search apparatus according to the embodiment of the present invention.
If transposed lists are used as the data for the transfer process, the transposed lists that have already been created at the start of processing can be used as they are. Compared with registering / deleting from the original document data, the time required for token extraction by the division processing and for creating the transposed lists is not needed, so the data transfer time can be shortened. In the present invention, this data transfer processing is called merge processing because it is processing between transposed lists.
[0028]
To execute the merge process, it is first determined whether there is a token in the deletion small full-text index storage unit 10 (step S41). If there is (YES in step S41), the processing of steps S43 to S45 is performed on all the tokens of the deletion small full-text index. That is, in step S43, an X lock is applied to the token to be merged. In step S44, the transposed list of the token is extracted from the deletion small full-text index, and the appearance position information in the extracted transposed list is deleted from the transposed list of the corresponding token in the search large-scale full-text index (the transposed list is deleted from the search large-scale full-text index storage unit 8). In step S45, the X lock on the merged token is released, and the process returns to step S41. On the other hand, in the case of NO in step S41, the deletion small full-text index storage unit 10 is emptied (step S42).
[0029]
Next, it is determined whether there is a token in the registration small full-text index storage unit 9 (step S46). If there is (YES in step S46), the processing of steps S48 to S50 is performed on all the tokens of the registration small full-text index. That is, in step S48, an X lock is applied to the token to be merged. In step S49, the transposed list of the token is extracted from the registration small full-text index, and the extracted transposed list is added to the end of the transposed list of the corresponding token in the search large-scale full-text index (the transposed list is registered in the search large-scale full-text index storage unit 8). In step S50, the X lock on the merged token is released, and the process returns to step S46. On the other hand, in the case of NO in step S46, the registration small full-text index storage unit 9 is emptied.
[0030]
FIG. 10 is a diagram for explaining the outline of the merge process, taking the transposed list of the token “full text” in the full text index 63 in FIG. 5 as an example.
Merge process 73 is executed between the transposed list {1, 2, (11, 31)}, {2, 2, (30, 42)} for “full text”, which is the transposed list 71 of the full-text index for search, and the transposed list {1, 2, (11, 31)} for “full text”, which is the transposed list 72 of the full-text index for deletion. As a result, the transposed list {2, 2, (30, 42)} for “full text” (74) is obtained. Further, this transposed list 74 is merged (75) with the transposed list {5, 2, (4, 16)}, {8, 1, (3)} for “full text”, which is the transposed list 76 of the full-text index for registration, and the transposed list {2, 2, (30, 42)}, {5, 2, (4, 16)}, {8, 1, (3)} for “full text” (77) is obtained.
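The two-stage merge of FIG. 9 and FIG. 10 can be sketched in Python under some simplifying assumptions. The function `merge` and the dictionary layout are hypothetical; the per-token X lock is approximated by one `threading.Lock` per token; and each entry {identifier, count, (positions)} is stored as identifier -> positions, the count being the length of the position list.

```python
import threading
from collections import defaultdict

Index = dict[str, dict[int, list[int]]]  # token -> {document identifier: positions}


def merge(search_index: Index, del_index: Index, reg_index: Index,
          token_locks: defaultdict) -> None:
    # Steps S41-S45: apply the deletion small index, one token at a time.
    for token in list(del_index):
        with token_locks[token]:                  # X lock on this token only
            target = search_index.get(token, {})
            for doc_id in del_index[token]:
                # The deletion index holds every position of the document for
                # this token, so dropping the whole entry removes them all.
                target.pop(doc_id, None)
    del_index.clear()                             # step S42

    # Steps S46-S50: append the registration small index, one token at a time.
    for token in list(reg_index):
        with token_locks[token]:
            # dict preserves insertion order, so new entries land at the end.
            search_index.setdefault(token, {}).update(reg_index[token])
    reg_index.clear()


# Data of FIG. 10 for the token "full text".
search = {"full text": {1: [11, 31], 2: [30, 42]}}      # transposed list 71
deletion = {"full text": {1: [11, 31]}}                   # transposed list 72
registration = {"full text": {5: [4, 16], 8: [3]}}        # transposed list 76
merge(search, deletion, registration, defaultdict(threading.Lock))
```

After the call, `search["full text"]` holds the entries for document identifiers 2, 5, and 8 in that order, corresponding to transposed list 77. Because the lock is held per token, a search on any other token is not blocked while this token is being merged.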
[0031]
(Merge processing mode 1)
The merge processing may be started by the registration processing means 3 when the number of document identifiers registered in the registration small full-text index in the registration small full-text index storage unit 9 reaches a number designated in advance.
[0032]
(Merge processing mode 2)
The merge processing may be started by the registration processing means 3 when the storage capacity (size) of the registration small full-text index storage unit 9 reaches a size specified in advance. In an application in which the size of the document data registered by users varies, this mode prevents the merge process from being started, when small document data is registered continuously, before registration to the registration small full-text index has begun to take a long time. Using the size as the start condition also evens out the merge processing time. Merge processing mode 1 described above, by contrast, has the advantage that the processing is simpler, because the number of documents is used as the start condition and the size of the full-text index storage unit need not be managed.
In modes 1 and 2, the throughput of the entire update process can be increased even when documents are frequently registered.
[0033]
(Merge processing mode 3)
The deletion processing means 4 starts the merge process of the deletion small full-text index. The start condition may be that the number of document identifiers registered in the deletion small full-text index reaches a number specified in advance.
[0034]
(Merge processing mode 4)
The deletion processing means 4 starts the merge process of the deletion small full-text index. The start condition may be that the size of the deletion small full-text index storage unit 10 reaches a size specified in advance.
In modes 3 and 4, the throughput of the entire update process can be increased even when documents are frequently deleted.
[0035]
With the merge processing modes described above, the full-text search apparatus can start the full-text index merge processing under conditions suited to the characteristics of the document data to be registered / deleted and to the characteristics of the field of use, so that the number of times merge processing is performed can be reduced and the throughput of the entire system can be improved.
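The start conditions of modes 1 to 4 can be expressed as a small configurable check. The following Python sketch is illustrative only; the class `MergeTrigger`, its fields, and the threshold values are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeTrigger:
    """Start condition for merging one small full-text index."""
    max_documents: Optional[int] = None  # count threshold (modes 1 and 3)
    max_bytes: Optional[int] = None      # size threshold (modes 2 and 4)

    def should_merge(self, document_count: int, index_bytes: int) -> bool:
        if self.max_documents is not None and document_count >= self.max_documents:
            return True
        if self.max_bytes is not None and index_bytes >= self.max_bytes:
            return True
        return False


# Assumed thresholds, for illustration only.
by_count = MergeTrigger(max_documents=1000)      # mode 1 or 3
by_size = MergeTrigger(max_bytes=64 * 1024)      # mode 2 or 4
```

A count-based trigger needs only a counter, whereas a size-based trigger requires tracking the size of the small index; this is the trade-off between simplicity and even merge times noted for modes 1 and 2.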
[0036]
FIG. 11 is a flowchart for explaining the lock processing in the full-text search apparatus according to the embodiment of the present invention.
There are two types of lock modes, X and S. When an X lock is applied to an object, other users cannot lock the object. When an S lock is applied to an object, other users can apply only an S lock to that object. The lock processing procedure performs exclusive control over objects by this mechanism. In the lock processing in the lock processing means 12, when there is a lock request (YES in step S61), it is first determined whether the requested lock is an X lock (step S62). If YES in step S62, it is determined whether the object is already locked (step S63). If it is locked, the procedure waits for the lock to be released (step S65) and returns to step S62; if it is not locked, the X lock is applied (step S66) and the process is terminated. On the other hand, if NO in step S62, it is determined whether the object is already locked (step S64). If it is locked, the procedure proceeds to step S65, waits for the lock to be released, and returns to step S62; if it is not locked, the S lock is applied (step S67) and the process is terminated.
[0037]
For example, when an X lock is applied to a token, a user who tries to apply an S lock to the token in order to search waits until the X lock is released. Conversely, if a user has applied an S lock to a token in order to search, the merge processing procedure that attempts to apply an X lock to that token in order to merge waits until the search ends and the S lock on the token is released.
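The compatibility rule stated above (S locks coexist with each other, an X lock excludes everything) can be sketched with the Python standard library. The class `TokenLock` and its method names are hypothetical, and the optional `timeout` parameter is an addition for illustration; the specification only describes waiting until the lock is released.

```python
import threading
from typing import Optional


class TokenLock:
    """Shared (S) / exclusive (X) lock for one token."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._s_holders = 0     # number of S locks currently held
        self._x_held = False    # True while the X lock is held

    def acquire_s(self, timeout: Optional[float] = None) -> bool:
        with self._cond:
            # An S lock waits only for an X lock to be released.
            if not self._cond.wait_for(lambda: not self._x_held, timeout):
                return False
            self._s_holders += 1
            return True

    def acquire_x(self, timeout: Optional[float] = None) -> bool:
        with self._cond:
            # An X lock waits until no lock of either kind is held.
            free = lambda: not self._x_held and self._s_holders == 0
            if not self._cond.wait_for(free, timeout):
                return False
            self._x_held = True
            return True

    def release_s(self) -> None:
        with self._cond:
            self._s_holders -= 1
            if self._s_holders == 0:
                self._cond.notify_all()

    def release_x(self) -> None:
        with self._cond:
            self._x_held = False
            self._cond.notify_all()
```

In this sketch two searches can hold S locks on the same token at once, while a merge requesting the X lock waits until both have released. Fairness between waiting searches and a waiting merge is not addressed here, as the text does not specify it.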
[0038]
Although the embodiments above have been described centering on the full-text search device of the present invention, the present invention can also be implemented as a full-text search method comprising the processing procedures of the full-text search device, as a program for causing a computer to function as the full-text search device or as each of its means, or as a computer-readable recording medium on which the program is recorded.
[0039]
An embodiment of a recording medium storing a program and data for realizing the full-text search function according to the present invention will be described. Specifically, a CD-ROM, a magneto-optical disk, a DVD-ROM, an FD, a flash memory, and various other ROMs and RAMs can be assumed as the recording medium. Recording on such a medium, and distributing, a program that causes a computer to execute the functions of the device and thereby realizes the full-text search function makes the function easy to realize. The full-text search function according to the present invention can then be executed by mounting the recording medium on an information processing apparatus such as a computer and having the information processing apparatus read the program, or by storing the program in a storage medium provided in the information processing apparatus and reading it out as needed.
[0040]
[Effects of the invention]
According to the present invention, since the registration / deletion process in the full-text search device is performed on a small-scale full-text index storage unit, the processing time can be shortened and the response time to the user can be shortened. Furthermore, according to the present invention, when data is registered in / deleted from the full-text index for search, the transposed lists already created can be used directly while the token is locked, so the merge process time can be shortened and the search process can be performed at the same time.
[Brief description of the drawings]
FIG. 1 is a block diagram for explaining functions of a full-text search apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a hardware configuration example when the full-text search device in FIG. 1 is configured as a stand-alone.
FIG. 3 is a diagram illustrating a hardware configuration example when the full-text search apparatus in FIG. 1 is configured by a server / client.
FIG. 4 is a flowchart for explaining registration processing in the full-text search device according to the embodiment of the present invention.
FIG. 5 is a diagram for explaining processing in the full-text search apparatus of FIG. 1, and is a diagram illustrating an example of a full-text index.
FIG. 6 is a flowchart for explaining deletion processing in the full-text search apparatus according to the embodiment of the present invention.
FIG. 7 is a flowchart for explaining deletion processing in the full-text search device according to the embodiment of the present invention.
FIG. 8 is a flowchart for explaining search processing in the full-text search device according to the embodiment of the present invention.
FIG. 9 is a flowchart for explaining merge processing in the full-text search device according to the embodiment of the present invention.
FIG. 10 is a diagram for explaining the outline of the merge process, taking the transposed list of the token “full text” of the full text index in FIG. 5 as an example.
FIG. 11 is a flowchart for explaining lock processing in the full-text search device according to the embodiment of the present invention.
[Explanation of symbols]
1 ... Input means, 2 ... Output means, 3 ... Registration processing means, 4 ... Deletion processing means, 5 ... Search processing means, 6 ... Text dividing means, 7 ... Document data storage unit, 8 ... Search large-scale full-text index storage unit, 9 ... Registration small full-text index storage unit, 10 ... Deletion small full-text index storage unit, 11 ... Merge means, 12 ... Lock processing means, 21, 31 ... Input device, 22, 32 ... Display unit, 23, 33 ... I/O control device, 24, 34, 52 ... Main control device (CPU / memory), 25, 53 ... Storage device, 30 ... Client, 35, 51 ... Network control device, 40 ... Network, 50 ... Server.

Claims

A full-text search device that searches a plurality of document data for documents containing a specified character string, comprising: a document data storage unit that stores registered document data; a full-text index storage unit for search; input means for inputting data from a user; output means for outputting search results; registration processing means for performing registration processing on document data; deletion processing means for performing deletion processing on document data; and search processing means for performing search processing, the device having a full-text index storage unit for registration and a full-text index storage unit for deletion separately from the full-text index storage unit for search, and further having merge means for merging data from the full-text index storage unit for registration and the full-text index storage unit for deletion into the full-text index storage unit for search, and lock processing means for performing lock processing, wherein, when data is merged from the full-text index storage unit for registration and the full-text index storage unit for deletion into the full-text index storage unit for search, the merge means performs processing for each inverted list of a token, the token being a component of the full-text index, and the lock processing means applies a lock to the token of the inverted list.

The full-text search device according to claim 1, wherein the merge means performs the process of merging data into the full-text index storage unit for search when the number of document data registered in the full-text index storage unit for registration reaches a predetermined number.

The full-text search device according to claim 1, wherein the merge means performs the process of merging data into the full-text index storage unit for search when the capacity of the full-text index storage unit for registration reaches a predetermined capacity.

The full-text search device according to any one of claims 1 to 3, wherein the merge means performs the process of merging data into the full-text index storage unit for search when the number of document data registered in the full-text index storage unit for deletion reaches a predetermined number.

The full-text search device according to any one of claims 1 to 3, wherein the merge means performs the process of merging data into the full-text index storage unit for search when the capacity of the full-text index storage unit for deletion reaches a predetermined capacity.