JP2006185368A

JP2006185368A - Document database update processor, document database retrieval device, document database index preparation method, and document database retrieval method

Info

Publication number: JP2006185368A
Application number: JP2004380955A
Authority: JP
Inventors: Koichi Tamatoshi; 公一玉利; Yuji Sugano; 祐司菅野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-12-28
Filing date: 2004-12-28
Publication date: 2006-07-13

Abstract

PROBLEM TO BE SOLVED: To provide a document database update processor, a document database retrieval device, a document database index preparation method, and a document database retrieval method that suppress storage consumption during update processing of a document database and allow the document DB to be updated and searched efficiently.

SOLUTION: In a generation-managed document DB, when the index is updated from the current generation to the next generation, the document DB update processor 102 prepares update information consisting of a negative index, which records the words to be removed in the new edition together with their appearance position lists, and a positive index, which records the appearance positions of the character strings that were added or changed together with position shift values. The next-generation document DB is represented by this update information together with the existing indexes. The document DB update processor 102 also has an update information merge function that merges the update information of a plurality of generations.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a document database update device, a document database search device, a document database index creation method, and a document database search method, and in particular to such devices and methods that perform update processing and search processing on a document database.

In recent years, information technology has come into wide use as computers and networks have become more powerful and less expensive and as Internet communication environments have been developed and spread. Databases and database management systems are widely used as the foundation for information storage and information retrieval.

Some database management systems introduce the concept of database "generations" and perform generation management. In a generation-managed database management system, update processing creates the next generation of the database without modifying the body of the source database stored in an auxiliary storage device or the like.

One such conventional generation-managed database management system manages the generations of a document database; an example is the document database management system described in Non-Patent Document 1. That system takes the update contents of the document database (hereinafter, document DB) into account and splits it into multiple DBs by data input period, which makes differential updates more efficient.

This document DB management system is described below with reference to FIG. 21. In the document DB management system 2100 shown in FIG. 21, 2101 is a document DB update processing device that creates generation N from the submitted update documents and the index and documents of generation N-1; 2102 is a document input device for inputting documents; 2103 is a document DB holding device that permanently records the bodies of the document DB across multiple generations in an auxiliary storage device 2110 or, to speed up search processing, records the update information of the latest few generations in a main storage device 2109; 2104 is a document DB creation device that creates the initial-generation index from documents; 2105 is a search keyword input unit that has an input interface such as a keyboard and accepts a search keyword and the generation to be searched; 2106 is a document DB search unit that searches the document DB for the entered keyword; 2107 is a search result output unit that displays search results on a display device; and 2108 is a document DB search device comprising the search keyword input unit 2105, the document DB search unit 2106, and the search result output unit 2107.

The operation of the document DB management system 2100 configured as above, and a simple concrete example of updating and searching, are described with reference to FIG. 3.

FIGS. 3(a) to 3(c) show an example of a document DB that stores miscellaneous personal notes such as diary entries.

As shown in example 301 of the pre-update document DB in FIG. 3(a), the document DB is expressed as a plurality of document records delimited by tags (written <docID=X>, where X is the ID number). Each document record has two items, "document ID" and "body", and the records are arranged in ID order; a document ID of X is recorded as the tag element <docID=X>. To allow fast search and update processing, an index is built from the document DB, as shown in index example 401 of the pre-update document DB in FIG. 4, recording each word that appears in the document records, the document IDs in which the word appears, and the character positions at which it appears. Example 301 contains three document records with document IDs 1, 2, and 3; during an update each note is identified by its document ID and the corresponding addition, change, or deletion is applied. FIGS. 3(b) and 3(c) show, respectively, example 302 of an update document for document DB 301 and example 303 of the document DB after the update.
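The index described above (word, document IDs, character positions) can be pictured as a positional inverted index. The following is a minimal sketch only: the function name `build_index`, the sample records, and the whitespace tokenisation are assumptions made for illustration and are not taken from the patent, whose figures are not reproduced here.

```python
from collections import defaultdict

def build_index(records):
    """Map each word to {doc_id: [character offsets]}.

    `records` maps a document ID to its body text. Splitting on
    whitespace is an assumption made for this sketch only.
    """
    index = defaultdict(dict)
    for doc_id, text in sorted(records.items()):
        pos = 0
        for word in text.split():
            pos = text.index(word, pos)      # offset of this occurrence
            index[word].setdefault(doc_id, []).append(pos)
            pos += len(word)                 # continue after the word
    return dict(index)

# Hypothetical three-record document DB (IDs 1, 2, 3).
records = {1: "rain again today", 2: "sunny today", 3: "rain rain"}
index = build_index(records)
```

Looking up `index["rain"]` in this sketch returns the document IDs and offsets at which the word occurs, which is the information the search and update steps rely on.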

This example contains change data for the existing document records with document IDs 1 and 3, a deletion record for the existing note with document ID 2 (a note with an empty body instructs deletion of the existing document with the specified document ID), and a newly added document record with document ID 4. When differential update processing is performed with update document 302 as input, the next-generation document DB (the updated document DB) shown as 303 is created.

On receiving an update document 302 such as that in FIG. 3(b), the document DB update processing device 2101 and the document DB creation device 2104 create a differential index, a data structure for searching and updating the contents of the new update document, as shown in index example 2201 of the updated document DB in FIG. 22, and store it in the document DB holding device 2103.

By operating in this way, the document DB management system 2100 takes the update contents of the document DB into account, splits it into multiple document DBs by data input period, and makes differential updates of the generation-managed document DB more efficient.

Another generation-managed data update device that allows searching by overlaying on a conventional database is the concept dictionary management device described in Patent Document 1. This device creates concept dictionaries tailored to a field or a user without changing the contents of the basic concept dictionary shared by multiple users, and allows them to be overlaid and searched together, so that the basic concept dictionary can be extended or reduced without being destroyed.

Patent Document 1: JP-A-6-075989
Non-Patent Document 1: Narayanan Shivakumar, Hector Garcia-Molina: Wave-Indices: Indexing Evolving Databases. SIGMOD Conference 1997: 381-392

However, in the conventional document DB management system of Non-Patent Document 1, even a small update that changes only a single word of one document in the document DB causes all character strings of the changed document to be extracted and an index to be created for every extracted string. This consumes a large amount of the auxiliary storage area that holds update information, and because update time grows in proportion to the amount of update information, the efficiency of update processing falls.

The conventional concept dictionary management device of Patent Document 1 can update efficiently by combining a small differential concept dictionary covering only the changed words with the original concept dictionary. However, it is only a concept dictionary and has no index for searching documents at high speed, so it cannot implement generation update processing of a document DB.

The present invention has been made in view of these points. Its object is to provide a document database update processing device, a document database search device, a document database index creation method, and a document database search method that suppress storage consumption in update processing of a document database and enable efficient updating and searching of the document database.

A document database update processing device according to a first aspect of the present invention updates a generation-managed document database and comprises: a document database recording unit that extracts character strings record by record from an initial-generation document composed of a plurality of records each having a uniquely identifying ID, and records in the document database the initial-generation document together with an index that pairs each extracted character string with the character positions at which it appears; a document input unit that inputs an update document; and a document database update processing unit. The document database update processing unit consists of an update document determination unit that determines the changed character string portions between the initial-generation document and the update document; a positive index creation unit that, for the character string portions so determined, creates a positive index whose index elements are sets of an extracted character string, its appearance position, and the difference in character string length caused by the change; a negative index creation unit that creates, as a negative index, the initial-generation index elements determined to be deleted; and a deleted record table creation unit that creates a deleted record table holding the document IDs of records determined to have been deleted. The document database update processing unit updates and registers the created positive index, negative index, and deleted record table as the update information of a new generation.

With this configuration, when the index is updated from the initial generation to the next generation, update information is created that comprises a negative index recording the initial-generation index elements to be deleted in the new edition, a positive index recording sets of the appearance position and the length difference of each added or changed character string, and a deleted record table recording the document IDs of deleted records. Representing the next-generation document database by this update information together with the existing index keeps the amount of update information proportional to the changed character string portions only. The storage area for update information is therefore reduced, and the time needed to create the update information is shortened in proportion.
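The three parts of the update information can be illustrated with a word-level diff. This is a rough sketch under stated assumptions, not the patented procedure: the names `tokens` and `make_update_info`, the use of Python's `difflib`, whitespace tokenisation, and the way the length difference is computed (summed word lengths of the changed span) are all choices made for illustration. An update record with an empty body is treated as a deletion, as in the example of FIG. 3.

```python
import difflib
import re

def tokens(text):
    """Return (word, character offset) pairs; whitespace tokens are assumed."""
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]

def make_update_info(old_records, update_records):
    """Build (positive index, negative index, deleted record table)."""
    positive, negative, deleted = [], [], []
    for doc_id, new_text in sorted(update_records.items()):
        if new_text == "":                       # empty body means deletion
            if doc_id in old_records:
                deleted.append(doc_id)
            continue
        old_tok = tokens(old_records.get(doc_id, ""))
        new_tok = tokens(new_text)
        matcher = difflib.SequenceMatcher(
            None, [w for w, _ in old_tok], [w for w, _ in new_tok],
            autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            removed, added = old_tok[i1:i2], new_tok[j1:j2]
            # Length difference of the changed span (assumed definition).
            shift = sum(len(w) for w, _ in added) - sum(len(w) for w, _ in removed)
            negative.extend((w, doc_id, p) for w, p in removed)
            positive.extend((w, doc_id, p, shift) for w, p in added)
    return positive, negative, deleted

# Hypothetical data: change record 1, delete record 2, add record 4.
old_records = {1: "rain again today", 2: "sunny today"}
update_records = {1: "drizzle again today", 2: "", 4: "windy"}
positive, negative, deleted = make_update_info(old_records, update_records)
```

Only the changed words enter the positive and negative lists; the unchanged words "again" and "today" produce no update information, which is the storage saving the paragraph above describes.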

A document database update processing device according to a second aspect of the present invention updates a generation-managed document database and comprises: a document database recording unit that records in the document database the documents from the initial generation to generation N, each composed of a plurality of records having uniquely identifying IDs, and index information consisting of negative indexes, positive indexes, and deleted record tables; a document input unit that inputs an update document for generation N+1; and a document database update processing unit. The document database update processing unit consists of an update document determination unit that determines the changed character string portions from the documents of the initial generation to generation N, the index information consisting of the negative indexes, positive indexes, and deleted record tables, and the generation N+1 update document; a positive index creation unit that, for the character string portions so determined, creates a positive index whose index elements are sets of an extracted character string, its appearance position, and the difference in character string length caused by the change; a negative index creation unit that creates, as a negative index, the initial-generation index elements determined to be deleted; and a deleted record table creation unit that creates a deleted record table holding the document IDs of records determined to have been deleted. The document database update processing unit updates and registers the positive index, negative index, and deleted record table so created as the update information of generation N+1.

With this configuration, when the index is updated from the initial generation to generation N, generation N+1 update information is created that comprises a negative index recording the initial-generation index elements to be deleted in the new edition, a positive index recording sets of the appearance position and the length difference of each added or changed character string, and a deleted record table recording the document IDs of deleted records. Representing the generation N+1 document database by this update information together with the existing index keeps the amount of update information for each generation proportional to the character string portions changed in that generation. The storage area for each generation's update information is therefore reduced, and the time needed to create the update information is shortened in proportion.

A third aspect of the present invention is the document database update processing device according to the second aspect, wherein, in the update processing of generation i+1 (0 < i < N) based on the documents from the initial generation to generation N, the negative index, the positive index, and the update document, the device creates the generation N+1 update information by means of a multiple update information interpretation unit that interprets, from the negative indexes, positive indexes, and deleted record tables of generations i to N, the index elements to be deleted, the index elements added or changed, and the records deleted.

With this configuration, when the index is updated from the initial generation to generation N, the generation N+1 update information can be created by interpreting the elements of the negative indexes, positive indexes, and deleted record tables of generations i to N. Because update information is created only for the character string portions added or changed up to generation N, the update information creation process becomes more efficient.

A fourth aspect of the present invention is the document database update processing device according to the first or second aspect, wherein, when updating from the index of the pre-update generation to the index of the post-update generation, the update document determination unit determines whether the number of character strings changed in a record to be updated exceeds an arbitrary threshold; if it does, the record is treated as a changed record, an index is created for it, and its record number is recorded in the deleted record table.

With this configuration, a record whose number of changed character strings exceeds the predetermined threshold is recorded in the deleted record table as a record to be deleted, which improves processing efficiency compared with always applying update information to perform the update.

A fifth aspect of the present invention is the document database update processing device according to the second aspect, further comprising an update information merge processing unit that consolidates the update information of multiple generations, accumulated through updates over those generations, into a single piece of update information.

With this configuration, consolidating the update information of multiple generations into one improves the processing speed of subsequent generation updates and searches.
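One way to picture the merge is at the level of plain sets. The sketch below is a deliberately simplified model and an assumption throughout: postings are reduced to `(word, doc_id)` pairs with no positions or shift values, and the names `merge_updates` and `apply_update` are hypothetical. Under that model, folding two consecutive generations into one gives the same result as applying them in sequence.

```python
def apply_update(postings, upd):
    """Apply one generation of update information to a set of postings."""
    kept = {
        (w, d) for (w, d) in postings
        if (w, d) not in upd["negative"] and d not in upd["deleted"]
    }
    return kept | upd["positive"]

def merge_updates(older, newer):
    """Fold two consecutive generations of update information into one."""
    # Postings added by the older generation survive only if the newer
    # generation neither removes them nor deletes their record.
    surviving = {
        (w, d) for (w, d) in older["positive"]
        if (w, d) not in newer["negative"] and d not in newer["deleted"]
    }
    return {
        "positive": surviving | newer["positive"],
        "negative": older["negative"] | newer["negative"],
        "deleted": older["deleted"] | newer["deleted"],
    }

# Hypothetical initial postings and two generations of update information.
base = {("rain", 1), ("today", 1), ("sunny", 2), ("today", 2)}
g1 = {"positive": {("snow", 1)}, "negative": {("rain", 1)}, "deleted": {2}}
g2 = {"positive": {("windy", 4), ("hail", 1)}, "negative": {("snow", 1)}, "deleted": set()}

sequential = apply_update(apply_update(base, g1), g2)
merged = merge_updates(g1, g2)
```

After the merge, a search or a later update consults one set of update information instead of two. A full implementation would also have to reconcile appearance positions and position shift values across generations, which this sketch leaves out.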

A sixth aspect of the present invention is the document database update processing device according to the fifth aspect, wherein the update information merge processing unit deletes update information that is no longer needed.

With this configuration, deleting update information that is no longer needed allows the storage area holding the document database to be used efficiently.

A seventh aspect of the present invention is the document database update processing device according to the fourth aspect, wherein the update document determination unit designates, from the update document, a document record to be compared, obtains a difference character string list between that document record and the corresponding document record of the initial document, and determines whether the number of elements in the difference character string list exceeds the threshold.

With this configuration, comparing the number of elements in the difference character string list with the predetermined threshold allows records to be deleted to be identified efficiently.
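The threshold test can be sketched as follows. The names `diff_strings` and `replace_whole_record`, the whitespace tokenisation, and the use of `difflib` to obtain the difference list are assumptions for illustration; the patent does not specify how the difference character string list is computed.

```python
import difflib

def diff_strings(old_text, new_text):
    """List the words that differ between two versions of a record."""
    old_w, new_w = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(None, old_w, new_w, autojunk=False)
    changed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed.extend(old_w[i1:i2])   # words removed from the old record
            changed.extend(new_w[j1:j2])   # words added in the new record
    return changed

def replace_whole_record(old_text, new_text, threshold):
    """True when the record should go in the deleted record table and be re-indexed."""
    return len(diff_strings(old_text, new_text)) > threshold

small_change = replace_whole_record("a b c d", "a x c d", threshold=2)
large_change = replace_whole_record("a b c d", "w x y z", threshold=2)
```

In this sketch a one-word edit stays below the threshold and would be handled through the positive and negative indexes, while a record rewritten almost entirely exceeds it and would be re-indexed as a whole.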

A document database search device according to an eighth aspect of the present invention comprises: a search character string input unit that inputs a character string to be searched for; a document database holding unit that stores update information consisting of the positive indexes, negative indexes, and deleted record tables of multiple generations, together with the document information of each generation; a document database search unit that analyzes the input character string, splits it into character strings, and searches for each of them using the multi-generation update information from the document database holding unit and the initial-generation index and documents; and a search result output unit that outputs the record set obtained by the document database search unit.

With this configuration, the search target character string is searched for in update information consisting of positive indexes, negative indexes, and deleted record tables, so the target can be found from the elements recorded in the indexes, which improves the efficiency of search processing on a generation-managed document database.

A ninth aspect of the present invention is the document database search device according to the eighth aspect, further comprising a negative index / deleted record table interpretation unit that, when generation i (i = 1 to N+1) is searched using the positive index of generation N+1, generation 0, and the update information of generations 1 to N+1, cumulatively interprets the index elements deleted and the records deleted on the basis of the elements of the negative indexes and of the deleted record tables of generations N+1 down to i+1. In the index search of generation N+1, the document database search unit searches the positive index of generation N+1 for each of the split character strings and, if a matching character string exists, makes it a search candidate. In the index search of generation N, if a matching character string exists for a split character string, the document database search unit outputs it to the negative index / deleted record table interpretation unit. If a character string matching the one received from the document database search unit exists in the negative index of generation N+1, the interpretation unit excludes it from the search targets; it also interprets the document data of the record numbers registered in the deleted record table of generation N+1 and, if an element of the received character string is found there, excludes that element from the search targets.

With this configuration, search targets can be found efficiently in a document database created through update processing over multiple generations.
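The role of the negative index and the deleted record table during a search can be shown with a document-level sketch. It is a simplification and an assumption in several respects: postings are `(word, doc_id)` pairs without positions, the function `search` is hypothetical, and it replays the update information from the oldest generation forward, whereas the text above describes starting from the newest positive index and excluding entries found in the negative indexes and deleted record tables.

```python
def search(word, base_index, updates, generation):
    """Return the doc IDs that contain `word` at `generation` (0 = initial)."""
    hits = set(base_index.get(word, ()))
    for upd in updates[:generation]:
        hits -= upd["deleted"]                                   # deleted records
        hits -= {d for (w, d) in upd["negative"] if w == word}   # removed postings
        hits |= {d for (w, d) in upd["positive"] if w == word}   # added postings
    return hits

# Hypothetical initial index and two generations of update information.
base_index = {"rain": {1, 3}, "today": {1, 2}}
updates = [
    {"positive": {("snow", 1)}, "negative": {("rain", 1)}, "deleted": {2}},
    {"positive": {("rain", 4)}, "negative": set(), "deleted": {3}},
]

rain_by_generation = [search("rain", base_index, updates, g) for g in range(3)]
```

The initial index is never rewritten; each generation is reconstructed at query time from the base index plus the small per-generation update information.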

A tenth aspect of the present invention is the document database search device according to the ninth aspect, further comprising a position shift accumulation unit that accumulates the position shift values of the elements from the positive index of generation N+1 to the positive index of generation i+1, wherein the document database search unit adds the position shift value accumulated by the position shift accumulation unit to the appearance position of a character string found in the positive index of generation N+1, determines whether that position is contiguous with the appearance position of the character string found before it, and, if so, makes the found character string a search target.

With this configuration, a search over a document database created through multi-generation update processing can take into account the adjacency between the position-shifted appearance position of one search target and the appearance positions of the others, so search targets are found accurately.
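The adjacency check reduces to simple arithmetic on positions. The sketch below assumes one shift value per generation for the element in question and uses hypothetical names (`accumulated_shift`, `is_contiguous`) and hypothetical numbers; the patent's actual bookkeeping of shift values per index element is not reproduced.

```python
def accumulated_shift(shifts, newest, oldest):
    """Sum the position shift values recorded for generations oldest..newest."""
    return sum(shifts[g] for g in range(oldest, newest + 1))

def is_contiguous(prev_pos, prev_len, pos, shift):
    """True when the shifted position starts exactly where the previous match ends."""
    return prev_pos + prev_len == pos + shift

# Hypothetical case: "data" was matched at position 10 (length 4), and the
# posting for "base" is recorded at position 12, before two later edits
# each moved the text one character to the right.
shifts = {2: 1, 3: 1}
shift = accumulated_shift(shifts, newest=3, oldest=2)
adjacent = is_contiguous(prev_pos=10, prev_len=4, pos=12, shift=shift)
not_adjacent = is_contiguous(prev_pos=10, prev_len=4, pos=12, shift=0)
```

Without the accumulated shift the two strings would appear to be two characters apart and the phrase match would be missed, which is why the shift is applied before the contiguity test.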

An eleventh aspect of the present invention is the document database search device according to the ninth or tenth aspect, wherein the document database search unit repeats, for each of the split character strings, the process of searching the positive index of each of generations 0 to N+1 for the matching character string, performs the adjacency determination for each matching character string found, and thus searches all elements from the positive index of generation N+1 to the positive index of generation i+1 for all of the split character strings.

With this configuration, searching for a character string in a document database created through multi-generation update processing requires only additions and subtractions on the appearance position sets of the index elements, so accurate and efficient search is achieved without reducing search speed.

A document database index creation method according to a twelfth aspect of the present invention creates an index of a generation-managed document database and comprises: a document database recording step of extracting character strings record by record from an initial-generation document composed of a plurality of records each having a uniquely identifying ID, and recording in the document database the initial-generation document together with an index that pairs each extracted character string with the character positions at which it appears; a document input step of inputting an update document; an update document determination step of determining the changed character string portions between the initial-generation document and the update document; a positive index creation step of creating, for the character string portions so determined, a positive index whose index elements are sets of an extracted character string, its appearance position, and the difference in character string length caused by the change; a negative index creation step of creating, as a negative index, the initial-generation index elements determined to be deleted; a deleted record table creation step of creating a deleted record table holding the document IDs of records determined to have been deleted; and an update and registration step of updating and registering the created positive index, negative index, and deleted record table as the update information of a new generation.

According to this method, when the index is updated from the initial generation to the next generation, update information is created that consists of a negative index recording the initial-generation index elements to be deleted in the new version, a positive index recording sets of the occurrence position and length difference of each added or changed character string, and a deleted record table recording the document IDs of deleted records. The next-generation document database is represented by this update information together with the index. Because the amount of update information is proportional only to the changed character string portions, the storage area for update information is reduced, and the time needed to create the update information is shortened in proportion to that reduction.
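One way to picture the three parts of the update information is as a small record type. The sketch below is only an assumed layout: the class and field names (`PositivePosting`, `NegativePosting`, `UpdateInfo`, `length_delta`) are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class PositivePosting:
    doc_id: int        # ID of the record that contains the string
    position: int      # occurrence position of the added or changed string
    length_delta: int  # change in string length caused by the edit


@dataclass(frozen=True)
class NegativePosting:
    doc_id: int    # ID of the record that contained the string
    position: int  # occurrence position in the earlier generation


@dataclass
class UpdateInfo:
    from_generation: int
    to_generation: int
    positive: Dict[str, List[PositivePosting]] = field(default_factory=dict)
    negative: Dict[str, List[NegativePosting]] = field(default_factory=dict)
    deleted_records: Set[int] = field(default_factory=set)

    def size(self) -> int:
        """Number of stored entries; grows only with the changed portions."""
        return (
            sum(len(v) for v in self.positive.values())
            + sum(len(v) for v in self.negative.values())
            + len(self.deleted_records)
        )
```

A next-generation database is then the pair (initial index, list of `UpdateInfo`), rather than a full copy of the index per generation.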

A document database index creation method according to a thirteenth aspect of the present invention is a method for creating an index of a generation-managed document database, comprising: a document database recording step of recording in a document database the documents from the initial generation to generation N, each made up of a plurality of records having a uniquely identifying ID, and the index information consisting of a negative index, a positive index, and a deleted record table; a document input step of inputting an update document for generation N+1; an update document determination step of determining the changed character string portions from the documents of the initial generation to generation N, the index information consisting of the negative index, positive index, and deleted record table, and the generation N+1 update document; a positive index creation step of creating, for the character string portions so determined, a positive index whose index elements are sets of a cut-out character string, its occurrence position, and the difference in character string length caused by the change; a negative index creation step of creating, as a negative index, the initial-generation index elements that the determination shows should be deleted; a deleted record table creation step of creating a deleted record table holding the document IDs of the records that the determination shows have been deleted; and an update/registration step of updating and registering the positive index, negative index, and deleted record table so created as the update information of generation N+1.

According to this method, when the index is updated from the initial generation to generation N, generation N+1 update information is created that consists of a negative index recording the initial-generation index elements to be deleted in the new version, a positive index recording sets of the occurrence position and length difference of each added or changed character string, and a deleted record table recording the document IDs of deleted records. The generation N+1 document database is represented by this update information together with the index. Because the amount of update information in each generation is proportional only to the character string portions changed in that generation, the storage area for each generation's update information is reduced, and the time needed to create the update information is shortened in proportion to that reduction.

A document database search method according to a fourteenth aspect of the present invention comprises: a search string input step of inputting a character string to be searched for; a document database holding step of storing update information for a plurality of generations, consisting of positive indexes, negative indexes, and deleted record tables, together with the document information of each generation; a document database search step of analyzing the input character string, dividing it into character strings, and searching for each of the divided strings using the multi-generation update information from the document database and the index and documents of the initial generation; and a search result output step of outputting the record set obtained by the document database search step.

According to this method, the search string is looked up in update information consisting of positive indexes, negative indexes, and deleted record tables, so the search target can be found from the elements recorded in the indexes. This improves the efficiency of search processing on a generation-managed document database.

According to the present invention, when the index is updated from the initial generation to the next generation, update information is created that consists of a negative index recording the initial-generation index elements to be deleted in the new version, a positive index recording sets of the occurrence position and character string length of each added or changed character string, and a deleted record table recording the document IDs of deleted records. The next-generation document database is represented by this update information together with the index. Because the amount of update information is proportional only to the changed character string portions, the storage area for update information is reduced, and the time needed to create the update information is shortened in proportion to that reduction.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows the overall configuration of a generation-managed document database processing system according to an embodiment of the present invention. In FIG. 1, the document database processing system 100 comprises a document input device 101, a document DB update processing device 102, a document DB holding device 103, a document DB search device 104, main storage devices 108 and 110, and an auxiliary storage device 109.

The document input device 101 has a keyboard or similar means for entering document data and supplies the document data to the document DB update processing device 102.

The document DB update processing device 102 cuts out character strings, record by record, from the initial document (generation 0) input from the document input device 101, creates an initial index, and records the initial document and the initial index in the document DB holding device 103 as the generation 0 document DB 103a.

When a new update document is input from the document input device 101, the document DB update processing device 102 creates, record by record, indexes (the positive index and negative index described later) and a deleted record table (also described later) from the character string differences between the update document and the initial document in the generation 0 document DB 103a. It records these, together with the input update document, in the document DB holding device 103 as the generation 0→1 update information 103b.

For a new update document of generation N−1 input from the document input device 101, the document DB update processing device 102 first interprets the indexes (positive and negative) of the update information recorded for each generation up to the preceding generation (N−2; not shown), which hold the character string differences between the update documents and the initial document. It then creates the indexes (positive and negative) and the deleted record table for the current update, and records them, together with the input update document, in the document DB holding device 103 as the generation N−2→N−1 update information 103c.

When a new update document of generation N is input from the document input device 101, the document DB update processing device 102 likewise creates the indexes (positive and negative) and the deleted record table, and records them, together with the input update document, in the document DB holding device 103 as the generation N−1→N update information 103d.

The document DB update processing device 102 also has an update information merge function that merges the update information of multiple generations recorded in the document DB holding device 103. This function is described later.

The document DB holding device 103 holds the generation 0 document DB 103a permanently and also holds the inter-generation update information 103a to 103d, which the document DB update processing device 102 creates each time an update document is input. In FIG. 1, the generation 0 update information 103a is shown as containing the initial document 1031 and the initial index 401.

The main storage device 108 stores the program for the document DB update processing executed by the document DB update processing device 102 and, during the update information merge processing described later, temporarily stores items such as the index elements (search target character strings) being compared.

The auxiliary storage device 109 stores items such as the index created from the multi-generation update information indexes during the update information merge processing.

The document DB search device 104 comprises a search keyword input unit 105, a document DB search unit 106, and a search result output unit 107.

The search keyword input unit 105 has a keyboard or similar means with which the user enters a search string for searching the document DB, and outputs the entered search string to the document DB search unit 106.

When a search keyword is input from the search keyword input unit 105, the document DB search unit 106 takes as input, for each character string contained in the search string, the multi-generation update information recorded in the document DB holding device 103 and the generation 0 document DB, searches the generation N update information, and outputs the result to the search result output unit 107.

The search result output unit 107 displays the search results received from the document DB search unit 106 on a display or similar device.

The main storage device 110 stores the program for the document DB search processing executed by the document DB search device 104, as well as the search results and other data produced by that processing.

Next, the update information creation processing executed by the document DB update processing device 102 is described with reference to the block diagram of the device in FIG. 2, the document DB update example in FIG. 3, the example index of the pre-update document DB in FIG. 4, the example index of the updated document DB in FIG. 5, the example deleted record table in FIG. 6, and the flowchart in FIG. 7.

The description of this update information creation processing covers the case in which update information is created for the difference strings between the initial-generation document (the generation 0 document), which is the first document input, and the generation 1 update document, which is input next.

FIG. 2 shows the configuration of the document DB update processing device 102, which creates the next-generation (generation 1) document DB from the initial-generation document DB built from the initial document.

In FIG. 2, the document DB update processing device 102 comprises an update document determination unit 201; a document DB update processing unit 204 that includes a positive index creation unit 202, a negative index creation unit 203, and a deleted record table creation unit 205; and an index creation unit 206.

FIGS. 3(a) to 3(c) show an example of updating a document DB that records an individual's miscellaneous impressions. The document DB has the same content as the one used in the description of the conventional art.

As shown in FIG. 3(a), the pre-update document DB 301 is expressed as a sequence of document records delimited by tags (written <docId=X>). Each document record has two items, a document ID and a body, and the records are listed in document ID order. A document ID of X is recorded as the tag element <docId=X>. To allow fast search and update, the words that appear in each document record are recorded together with the document ID and character position at which they appear, as in the index of the pre-update document DB (the initial index) 401 shown in FIG. 4.
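The tagged record layout described above can be read with a few lines of code. This is a sketch under assumptions: the exact on-disk layout is not given, so the parser assumes each record is a <docId=X> tag followed by its body up to the next tag, and the function name `parse_records` is hypothetical.

```python
import re
from typing import Dict

# Assumed tag syntax, following the <docId=X> notation used in the text.
TAG = re.compile(r"<docId=(\d+)>")


def parse_records(text: str) -> Dict[int, str]:
    """Split a tagged document DB into {document ID: body}."""
    # With one capture group, re.split yields
    # [prefix, id1, body1, id2, body2, ...].
    parts = TAG.split(text)
    records: Dict[int, str] = {}
    for i in range(1, len(parts) - 1, 2):
        records[int(parts[i])] = parts[i + 1].strip()
    return records
```

A record whose body is empty is kept with an empty string, which matters because an empty body in an update document is the deletion marker.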

The pre-update document DB 301 in FIG. 3(a) contains three document records, with document IDs 1, 2, and 3. During an update, each document record is identified by its document ID and is added, changed, or deleted accordingly. FIGS. 3(b) and 3(c) show the update document 302 for the pre-update document DB 301 and the updated document DB 303, respectively.

In this example, the update document 302 contains changed data for the existing document records with document IDs 1 and 3, deletion data for the existing document record with document ID 2 (a record with an empty body is defined as an instruction to delete the existing record with that document ID), and a new document record with document ID 4 to be added. When the document DB update processing device 102 performs the differential update, the next generation of the miscellaneous-impressions document DB, that is, the updated document DB 303 shown in FIG. 3(c), is created.
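The change, delete, and add rules just described can be sketched as a function over records keyed by document ID. The function name `apply_update` and the dictionary representation are assumptions for illustration; the record bodies in the usage note are placeholders, not the text of FIG. 3.

```python
from typing import Dict


def apply_update(old: Dict[int, str], update: Dict[int, str]) -> Dict[int, str]:
    """Return the next-generation records after applying an update document.

    An empty body deletes the record with that ID; any other body replaces
    an existing record or adds a new one.
    """
    new = dict(old)
    for doc_id, body in update.items():
        if body == "":
            new.pop(doc_id, None)  # deletion marker
        else:
            new[doc_id] = body     # change or addition
    # Keep records in document ID order, as in the document DB.
    return dict(sorted(new.items()))
```

For instance, applying `{1: "a2", 2: "", 3: "c2", 4: "d"}` to `{1: "a", 2: "b", 3: "c"}` leaves records 1, 3, and 4, mirroring the pattern of changes in the example above.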

The document DB update processing unit 204 in FIG. 2 receives as input either the update document 302 shown in FIG. 3(b) or an initial document (the name used here for document data supplied for the first time rather than as an update). So that a document DB such as the pre-update document DB 301 in FIG. 3(a) can be searched and updated at high speed, it creates, for the document IDs and the character strings cut out of the body text, either an initial index or update information, that is, a differential index relative to the initial index, and records it in the document DB holding device 103 together with the update document or the initial document.

In FIG. 2, the index creation unit 206 reads the initial document 1031 (the document records contained in the pre-update document DB 301 of FIG. 3(a)) from the generation 0 document DB 103a. Using a technique already known from conventional full-text search DB creation, such as N-gram division or dictionary-based division of each document record into words, it cuts character strings out of each document record in the initial document, records their occurrence positions, and creates the index of the pre-update document DB shown in FIG. 4, that is, the initial index 401. The index creation unit 206 then records the initial index 401 in the generation 0 document DB 103a.
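As an illustration of the N-gram option mentioned above, the following sketch builds a positional index from records keyed by document ID. It assumes character bigrams by default and 1-based positions, chosen to match postings such as (1, 1) in FIG. 4; the function name `build_ngram_index` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_ngram_index(
    records: Dict[int, str], n: int = 2
) -> Dict[str, List[Tuple[int, int]]]:
    """Map each character n-gram to its (document ID, position) postings."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(records):
        body = records[doc_id]
        for pos in range(len(body) - n + 1):
            # Positions are stored 1-based (an assumption of this sketch).
            index[body[pos:pos + n]].append((doc_id, pos + 1))
    return dict(index)
```

A dictionary-based word splitter could be substituted for the slicing step without changing the shape of the index.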

Next, the update information creation processing is described with reference to the configuration diagram of FIG. 2 and the flowchart of FIG. 7. This processing creates the generation 0→1 update information 103b when the next-generation (generation 1) update document 302 (see FIG. 3(b)) is input from the document input device 101 and the initial document 1031 in the generation 0 document DB 103a is updated to generation 1. FIGS. 5(a) and 5(b) show examples of the positive index 501 and the negative index 502 created during update processing by the document DB update processing device 102 of FIG. 2.

In FIG. 2, the generation N update document 302 is input to the document DB update processing device 102 (step S701). The update document determination unit 201 determines the total number of records in the update document 302, designates the document record j to be compared (j is the record number, that is, the document ID), and determines whether j is less than or equal to the total number of records (step S702).

If the update document determination unit 201 determines that the designated document record j is less than or equal to the total number of records (step S702: YES), it compares the designated document record j with the corresponding document record j of the initial document 1031 and obtains the difference string list L between the two records (not shown) (step S703).

Next, the update document determination unit 201 determines whether the number of elements in the difference string list L exceeds a preset threshold ε (step S704). If the number of elements does not exceed ε (step S704: NO), it passes the difference string list L to the document DB update processing unit 204.
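Steps S703 and S704 can be sketched with Python's standard difflib. The specification does not say which differencing algorithm is used, so `SequenceMatcher` is a stand-in, and the function names and the way list elements are counted are assumptions of this sketch.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def diff_strings(old: str, new: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) substrings between two record bodies (cf. S703)."""
    removed: List[str] = []
    added: List[str] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.append(old[i1:i2])
        if tag in ("replace", "insert"):
            added.append(new[j1:j2])
    return removed, added


def exceeds_threshold(old: str, new: str, epsilon: int) -> bool:
    """True if the difference list has more than epsilon elements (cf. S704)."""
    removed, added = diff_strings(old, new)
    return len(removed) + len(added) > epsilon
```

When `exceeds_threshold` is true, the record would be handled as a deletion plus a fresh addition rather than as an incremental change.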

On receiving the difference string list L from the update document determination unit 201, the document DB update processing unit 204 cuts the character strings Word(1) to Word(M) out of document record j of the update document 302, based on the strings that are the elements of L, using a conventional index creation method such as N-gram division or maximal word extraction (step S705).

Next, the document DB update processing unit 204 designates the first character string Word(i) (i = 1) among the cut-out strings Word(1) to Word(M) (step S706: YES) and determines whether, at the occurrence position of Word(i), there is a character string to be deleted in the form of an element Word(p) of the generation 0 initial index 401 (step S707).

If the document DB update processing unit 204 determines that no character string Word(p) to be deleted exists at the occurrence position of Word(i) (step S707: NO), it passes the document ID, occurrence position, and length difference of Word(i) to the positive index creation unit 202. The positive index creation unit 202 combines these three values into a position posting and records it in the positive index 501, as shown in FIG. 5(a) (step S708).

If, in step S707, the document DB update processing unit 204 determines that a character string Word(p) to be deleted exists at the occurrence position of Word(i) (step S707: YES), it passes the document ID and occurrence position of Word(p) to the negative index creation unit 203. The negative index creation unit 203 records this pair of document ID and occurrence position in the negative index 502, as shown in FIG. 5(b) (step S709).

The document DB update processing unit 204 then designates each of the cut-out strings Word(1) to Word(M) in turn (i++) (step S706) and repeats steps S707 to S709 for each designated Word(i), creating the positive index 501 and the negative index 502.

Through the above processing, the document DB update processing unit 204 completes, for the one document record identified by the document ID, obtaining the difference strings L against the initial document, cutting character strings out of L, and creating the positive index 501 and negative index 502. It records the positive index 501 and negative index 502 in the generation 0→1 update information 103b shown in FIG. 2.

When every cut-out character string Word(i) has been designated (the condition i = M holds; step S706: NO), the update document determination unit 201 increments j (j++) to designate the next document record j to be compared (step S710).

Next, the update document determination unit 201 designates the document record j to be compared, using the value of j incremented in step S710, and again determines whether j is less than or equal to the total number of records (step S702).

If the update document determination unit 201 determines that the designated document record j is less than or equal to the total number of records (step S702: YES), it compares the designated document record j with the corresponding document record j of the initial document 1031, obtains the difference string list L between the two records (step S703), and repeats the processing from step S704 onward.

If, in step S704, the update document determination unit 201 determines that the number of elements in the difference string list L exceeds the preset threshold ε (step S704: YES), the designated document record j differs too much between the initial document and the update document, so the unit passes the information on record j to the deleted record table creation unit 205. On receiving this information, the deleted record table creation unit 205 records document record j in the deleted record table, as shown in FIG. 6 (step S711).

The update document determination unit 201 also instructs the document DB update processing unit 204 to process document record j of the update document 302 as an added record. On receiving this instruction, the document DB update processing unit 204 creates the index of document record j as a positive index whose position postings are all 0 (step S712).

Steps S711 and S712 are executed for all document records j of the update document 302. This completes the deleted record table 601, which is recorded in the generation 0→1 update information 103b as shown in FIG. 2.

If, in step S702, the update document determination unit 201 determines that the designated document record j exceeds the total number of records (j > total number of records; step S702: NO), all document records j of the update document 302 have been processed, and the update information creation processing ends.

Through the above update information creation processing, the positive index 501 of FIG. 5(a), the negative index 502 of FIG. 5(b), and the deleted record table 601 of FIG. 6 are created from the difference strings between each document record of the initial document 1031 recorded in the pre-update document DB 301 of FIG. 3(a) and each document record of the update document 302 of FIG. 3(b). They are recorded in the document DB holding device 103 of FIG. 1 as the generation 0→1 update information 103b of FIG. 2.
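The flow of steps S701 to S712 can be summarized in a compact sketch. It is a loose simplification under several assumptions: tokens are split on spaces instead of N-grams or dictionary words, difflib's `SequenceMatcher` replaces the unspecified differencing step, the test in S707 is approximated by the diff opcodes, "position postings all 0" in S712 is read as a zero length difference, and every name (`tokens_with_pos`, `make_update_info`) is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def tokens_with_pos(body: str) -> List[Tuple[str, int]]:
    """Split on spaces; return (token, 1-based character position) pairs."""
    out, pos = [], 0
    for tok in body.split(" "):
        if tok:
            out.append((tok, pos + 1))
        pos += len(tok) + 1
    return out


def make_update_info(old_db: Dict[int, str], update_doc: Dict[int, str],
                     epsilon: int = 4) -> dict:
    positive, negative, deleted = [], [], set()
    for doc_id in sorted(update_doc):                 # S702: loop over records j
        new_body = update_doc[doc_id]
        old = tokens_with_pos(old_db.get(doc_id, ""))
        new = tokens_with_pos(new_body)
        sm = SequenceMatcher(None, [t for t, _ in old], [t for t, _ in new],
                             autojunk=False)
        changes = [op for op in sm.get_opcodes() if op[0] != "equal"]  # S703
        n_diff = sum((i2 - i1) + (j2 - j1) for _, i1, i2, j1, j2 in changes)
        is_new = doc_id not in old_db
        if is_new or new_body == "" or n_diff > epsilon:   # S704: too different
            if not is_new:
                deleted.add(doc_id)                        # S711
            positive += [(t, doc_id, p, 0) for t, p in new]  # S712
            continue
        for _, i1, i2, j1, j2 in changes:
            old_len = sum(len(t) for t, _ in old[i1:i2])
            new_len = sum(len(t) for t, _ in new[j1:j2])
            negative += [(t, doc_id, p) for t, p in old[i1:i2]]   # S709
            positive += [(t, doc_id, p, new_len - old_len)
                         for t, p in new[j1:j2]]                  # S708
    return {"positive": positive, "negative": negative, "deleted": deleted}
```

Only changed tokens produce entries, so an update that touches one word of a long record yields one negative and one positive posting, which is the storage property the method aims for.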

The creation of the update information 103b by this process is now described concretely with reference to FIGS. 3(a), 3(b), 4, 5(a), 5(b), and 6.

First, for the pre-update document DB 301 of FIG. 3(a), the index of the pre-update document DB shown in FIG. 4 (hereinafter, the initial index) 401 has already been created. The initial index 401 shows the character strings of four elements cut out from the document record with document ID <docID=1> in the pre-update document DB 301. In each of the pairs (1, 1), (1, 2), (1, 3), and (1, 6) in the figure, the first value is the document ID and the second is the appearance position of the element (character position).

When the update document 302 of FIG. 3(b) is input, the update document determination unit 201 determines that the total number of records in the update document 302 is 4, designates document record 1 (<docID=1>) as the record to compare, and obtains the difference character string list L between this record and document record 1 of the initial document 1031.

In this case, based on the elements of document record 1 recorded in the initial index 401, the difference character strings between document record 1 of the initial document 1031 and document record 1 of the update document 302 are the underlined portions in FIGS. 3(a) and 3(b). That is, the difference character string list L records as elements “だというのに” ("even though it is"), “暑い。” ("hot."), “気温” ("temperature"), and “３０度” ("30 degrees") from document record 1 of the initial document 1031, and “に” ("to"), “近づいている” ("approaching"), “が” ("but"), “未だ” ("still"), “暑く、” ("hot,"), “最高気温” ("maximum temperature"), and “３５度” ("35 degrees") from document record 1 of the update document 302.

Next, the update document determination unit 201 determines whether the number of elements in the difference character string list L exceeds a preset threshold ε (for example, 20). Here, the list L for document record 1 of FIGS. 3(a) and 3(b) has 11 elements, so the count is determined not to exceed ε.
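As a minimal sketch of the threshold check just described, the following Python snippet counts the elements of the difference character string list L for document record 1 and compares the count with ε = 20. The function name `exceeds_threshold` and the variable names are illustrative assumptions, not names from the patent.

```python
def exceeds_threshold(diff_list, epsilon=20):
    """Return True when the difference list has more than epsilon elements."""
    return len(diff_list) > epsilon


# Difference strings of document record 1, as listed in the description.
removed_parts = ["だというのに", "暑い。", "気温", "３０度"]          # from the initial document
added_parts = ["に", "近づいている", "が", "未だ", "暑く、", "最高気温", "３５度"]  # from the update document
diff_list_L = removed_parts + added_parts

# 11 elements do not exceed the threshold of 20, so record 1 is updated incrementally.
print(len(diff_list_L), exceeds_threshold(diff_list_L))
```

A record whose list does exceed ε would instead be routed to the deletion record table, as the description explains for document records 2 and 3.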

The update document determination unit 201 then passes the difference character string list L to the document DB update processing unit 204. On receiving it, the document DB update processing unit 204 uses the elements of L and a conventional index creation method, such as N-gram or maximal word extraction, to cut out from document record 1 of the update document 302 the strings “に”, “近づいている”, “が、”, “未だ”, “暑く、”, “最高気温”, and “３５度”, which correspond to the character strings Word(1) to Word(M) above.

Next, from the extracted strings Word(1) to Word(M), the document DB update processing unit 204 designates “に” of document record 1, which corresponds to the character string Word(i) above, and determines whether a character string Word(p) to be deleted from document record 1 of the initial document 1031 exists at its appearance position 6.

In document record 1 of FIG. 3(a), the string “だというのに” exists at appearance position 6. The document DB update processing unit 204 therefore determines that a string Word(p) to be deleted exists and passes the document ID 1 and appearance position 6 of “だというのに” to the negative index creation unit 203. The negative index creation unit 203 records this pair (1, 6) in the negative index 502 shown in FIG. 5(b).

Because the string “に” is five characters shorter than the string “だというのに” in document record 1 of the initial document 1031, the document DB update processing unit 204 passes the document ID 1, appearance position 6, and string-length difference −5 of “に” to the positive index creation unit 202. The positive index creation unit 202 combines these values into the position posting (1, 6, −5) and records it in the positive index 501 as shown in FIG. 5(a).

The document DB update processing unit 204 then designates the next string, “近づいている”, from the extracted strings Word(1) to Word(M) and determines whether a string Word(p) to be deleted from document record 1 of the initial document 1031 exists at its appearance position 7.

In document record 1 of FIG. 3(a), no string Word(p) to be deleted exists at appearance position 7. The document DB update processing unit 204 therefore passes the document ID 1, appearance position 7, and string length 6 of “近づいている” to the positive index creation unit 202. Since no string is replaced at this position, the string-length difference equals the string length, 6. The positive index creation unit 202 combines these values into the position posting (1, 7, 6) and records it in the positive index 501 as shown in FIG. 5(a).

Executing the same processing for the other difference character strings of document record 1 and for the difference character strings of the other document records 2 to 3 creates the positive index 501 shown in FIG. 5(a).
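The two postings worked through above can be reproduced with a short Python sketch. It assumes that a position posting is a tuple (document ID, appearance position, string-length difference) and that a negative index entry is a pair (document ID, appearance position), as in the example; the helper names `char_len` and `make_postings` and the dictionary layout are illustrative assumptions only.

```python
import unicodedata


def char_len(text):
    """Count characters after NFC normalization so voiced kana count as one character."""
    return len(unicodedata.normalize("NFC", text))


def make_postings(doc_id, position, new_word, replaced_word=None):
    """Return (positive posting, negative entry or None) for one extracted string."""
    if replaced_word is None:
        # Nothing is deleted at this position: the shift is the full string length.
        return (doc_id, position, char_len(new_word)), None
    shift = char_len(new_word) - char_len(replaced_word)
    return (doc_id, position, shift), (doc_id, position)


positive_index = {}  # string -> list of position postings
negative_index = {}  # deleted string -> list of (document ID, position)

pos1, neg1 = make_postings(1, 6, "に", replaced_word="だというのに")
positive_index.setdefault("に", []).append(pos1)
negative_index.setdefault("だというのに", []).append(neg1)

pos2, neg2 = make_postings(1, 7, "近づいている")
positive_index.setdefault("近づいている", []).append(pos2)

print(positive_index)
print(negative_index)
```

The sketch yields the postings (1, 6, −5) and (1, 7, 6) for the positive index and the pair (1, 6) for the negative index, matching the values in the description.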

Next, the update document determination unit 201 designates the next document record 2 (<docID=2>) of the update document 302 and likewise obtains the difference character string list L between it and document record 2 of the initial document 1031. Because document record 2 of the update document 302 contains no text, the number of difference strings exceeds the threshold ε, and the information of document record 2 is passed to the deletion record table creation unit 205.

The update document determination unit 201 also obtains the difference character string list L between document record 3 of the update document 302 and document record 3 of the initial document 1031. The underlined portions in FIGS. 3(a) and 3(b) are the difference strings, and their number exceeds the threshold ε of 20, so the information of document record 3 is passed to the deletion record table creation unit 205.

Based on the information of document records 2 and 3 passed from the update document determination unit 201, the deletion record table creation unit 205 creates the deletion record table 601 shown in FIG. 6 and records it in the generation 0→1 update information 103b of FIG. 2.

As described above, when an update document of the next generation is input for the initial document DB 103a, the document DB update processing device 102 creates a positive index, a negative index, and a deletion record table as update information and records them in the document DB holding device 103 of FIG. 1. The updated document DB 303 shown in FIG. 3(c) is also created.

The positive index thus records, for each document record of the input update document, the strings newly added relative to the corresponding document record of the initial document 1031, each combined with its document ID, appearance position, and position posting.

The negative index records, for each document record of the input update document, the document ID and appearance position of each string in the corresponding document record of the initial document 1031 that is no longer needed.

The deletion record table records, for the input update document, the document records that were deleted from the initial document 1031 or that were changed substantially.

The description above covers creating update information from generation 0 to generation 1. The more general process of creating update information from generation 1 to generation 2 and later is described with reference to the block diagram of the document DB update processing device 102 in FIG. 8, the document DB update example in FIG. 9, the update document examples in FIG. 10, the initial document DB index example in FIG. 11, the update information examples for the updated document DB in FIG. 12, and the flowchart in FIG. 13.

FIG. 8 shows the configuration of the document DB update processing device 102 that creates update information from generation 0 to generation 1 and for generation 2 and later. Components identical to those of the document DB update processing device 102 in FIG. 2 carry the same reference numerals.

In FIG. 8, the document DB update processing device 102 comprises an update document determination unit 201 that includes a multiple update information interpretation unit 801, a document DB update processing unit 204 that includes a positive index creation unit 202 and a negative index creation unit 203, and a deletion record table creation unit 205.

The multiple update information interpretation unit 801 executes the multiple update information interpretation process shown in the flowchart of FIG. 13. When an update document i of generation N is input, it treats as already deleted every record whose document record number is listed for deletion in the deletion record table i+1 contained in each piece of update information for generations 1 to N−1, that is, from generation 0→1 through generation N−2→N−1 (steps S1301 and S1302).
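One way to picture this interpretation step is as a union over the deletion record tables of the earlier generations. The Python sketch below assumes each table is simply a set of document IDs; the function name `already_deleted_ids` is hypothetical, and the table contents are taken from the examples of FIG. 6 (records 2 and 3) and FIG. 12(b) (record 3).

```python
def already_deleted_ids(deletion_tables):
    """Collect every document ID listed in any earlier deletion record table."""
    deleted = set()
    for table in deletion_tables:
        deleted.update(table)
    return deleted


table_0_to_1 = {2, 3}  # deletion record table 601 (generation 0→1)
table_1_to_2 = {3}     # deletion record table 1201c (generation 1→2)

deleted_so_far = already_deleted_ids([table_0_to_1, table_1_to_2])
print(sorted(deleted_so_far))
```

Records whose IDs appear in `deleted_so_far` would then be skipped when the older generations are interpreted.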

FIGS. 9(a) to 9(c) show an example in which a further update is applied to the generation 0 pre-update document DB 301 of FIG. 3(a) and the generation 1 updated document DB 303 of FIG. 3(c) to create the generation 2 updated document DB 901.

FIG. 10(a) shows the generation 1 update document 302 of FIG. 3(b), and FIG. 10(b) shows the generation 2 update document 1001.

FIG. 11 shows the generation 0 document DB 103a, which includes the initial index 401 of FIG. 4 and the initial document 1031 of FIG. 1. FIG. 12(a) shows the generation 0→1 update information 103b, which includes the positive index 501 of FIG. 5(a), the negative index 502 of FIG. 5(b), the deletion record table 601 of FIG. 6, and the update document 302 of FIG. 3(b).

FIG. 12(b) shows the positive index 1201a, the negative index 1201b, the deletion record table 1201c, and the update document 1001 included in the generation 1→2 update information 1201.

The steps from the generation 0 document DB 301 of FIG. 9(a) to the creation of the generation 1 updated document DB 303 of FIG. 9(b) were described in the update information creation process based on FIG. 7. When the update document 1001 of FIG. 10(b) is input, the document record with document ID 3 is deleted and the content of the document record with document ID 4 is changed, so the generation 2 updated document DB is created as 901 in FIG. 9(c). FIG. 12(b) shows the generation 1→2 update information 1201 created by the document DB update processing unit 204 of FIG. 8 for the example of FIG. 9.

In the generation 1→2 update information 1201 of FIG. 12(b), the document record with document ID 3 in the update document 302 of FIG. 10(a) is marked for deletion by the update document 1001 of FIG. 10(b), so it is recorded in the deletion record table 1201c. In the update from generation 1 to generation 2, the update document determination unit 201 compares the document record with document ID 4 in the update document 302 of FIG. 10(a) with the corresponding record in the update document 1001 of FIG. 10(b) and counts the difference strings. In this example it determines that there is almost no change and notifies the document DB update processing unit 204 of the result.

When the document DB update processing unit 204 then executes the normal update process described with FIG. 7, strings are cut out as in the positive index 1201a of FIG. 12(b) and compared with the generation 1 positive index 501, and the strings to be deleted are recorded as in the negative index 1201b.

The multiple update information interpretation unit 801 executes the multiple update information interpretation process shown in the flowchart of FIG. 13, and the deletion record table 1201c shown in FIG. 12(b) is recorded.

As in the examples of FIGS. 9 to 12, the generation 2 updated document DB 901 of FIG. 9(c) can be represented by combining the generation 0→1 update information 103b, the generation 1→2 update information 1201, and the generation 0 initial index 401. The relationship among the indexes across these generations can be expressed conceptually as follows.

(positive index N) = F(positive index N−1) − (negative index N) − (deletion record table N)
where
N: generation number (1 ≤ N)
F(index n): a function that applies the position shift values in the positive index N of generation n

Accordingly, in the update information creation process of this embodiment, when the index of a generation-managed document DB is updated from the current generation to the next, update information is created that comprises three parts: a negative index, which records a list of pairs of each word deleted in the new version and its appearance position list; a positive index, which records the appearance positions and position shift values of the strings added or changed in the updated index; and a deletion record table, which records the document IDs of deleted records. The next-generation document DB is represented by this update information together with the existing indexes. Because the amount of update information is proportional only to the changed string portions, the storage area for update information can be reduced, and the update information creation time is shortened in proportion to that reduction.

In the update information creation process of this embodiment, strings may be extracted by any known technique used when creating conventional full-text search indexes, such as N-gram division or division by dictionary words. The process can be applied to any index that records information as extracted strings and their appearance positions.

In the update information creation process of this embodiment, a document record in which many strings are changed would incur more overhead than the conventional method in both update and search processing. Therefore, when the number of changed strings exceeds the predetermined threshold ε, the record is entered in the deletion record table as a record to be deleted and is indexed in the ordinary way. This gives better processing efficiency than always applying update information.

The threshold may be set from the administrator's experience, or determined empirically from statistics on update processing gathered in advance to suit the nature of the update data. The user may also be allowed to choose the optimal threshold.

Next, merge processing is described. It is a means of making more efficient the update information that the update information creation process of this embodiment creates between generations each time an update document is input, in particular update information that spans multiple generations.

FIG. 14 is a block diagram showing the configuration of the document DB update processing device 102 with two new functions added: an update information merge processing unit 1401, which combines multiple pieces of update information into one, and an update information deletion unit 1402, which erases pieces of update information as appropriate following the merge. FIG. 15 is a flowchart of the update information merge process executed in the document DB update processing device 102 to combine the update information of generation j through generation k into a single piece of update information.

FIG. 16 shows an example in which the two pieces of update information created in the generation 0 to generation 2 example of FIGS. 12(a) and 12(b) (the generation 0→1 update information 103b and the generation 1→2 update information 1201) are combined by the update information merge processing unit 1401 of this embodiment into a single piece of update information, the generation 0→2 update information 1601.

First, the update information merge processing unit 1401 records all elements of the positive index 1201a and the negative index 1201b of the generation 1→2 update information 1201 in the main storage device 108, such as a nonvolatile memory, as a cumulative positive element set (not shown) and a cumulative negative element set (not shown) (step S1501). It then records the elements of the deletion record table 1201c in the main storage device 108 as a cumulative deletion record set (step S1502).

Next, the update information merge processing unit 1401 executes the processing from step S1504 onward while incrementing the temporary variable i by 1 until it reaches j (step S1503). Until the condition “i < j” is satisfied, the update information merge processing unit 1401 prepares and initializes, in the main storage device 108, a storage area for a set of positive index elements, called the temporary positive element set, and records all elements of the positive index 1201a in it (step S1504). However, it does not record in the temporary positive element set any element whose document ID is listed in the deletion record table 1201c, that is, any element of the document record with document ID 3.

Next, the update information merge processing unit 1401 takes out each element MWord(m) of the cumulative negative element set in turn (step S1505) and checks whether MWord(m) exists in the temporary positive element set (not shown) (step S1506). If it does (step S1506: YES), the unit deletes MWord(m) from the temporary positive element set and from the cumulative negative element set (step S1507).

In the example of the generation 1→2 update information 1201 of FIG. 12(b), the cumulative negative element set at this point holds the information recorded in the negative index 1201b. Because “Ｍ電器 (4, 1, 0)” is present in both the cumulative negative element set and the temporary positive element set, it is deleted from the temporary positive element set.
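Steps S1505 to S1507 amount to cancelling every element that appears in both sets. The following Python sketch illustrates this under the assumption that each element is a tuple (string, document ID, appearance position, shift value) held in a Python set; the function name `cancel_matching` and the element with the string "sample" are hypothetical, added only to show that unmatched elements are left alone.

```python
def cancel_matching(cumulative_negative, temp_positive):
    """Remove from both sets every element that appears in both (steps S1505 to S1507)."""
    for element in list(cumulative_negative):  # iterate over a copy while mutating the set
        if element in temp_positive:
            temp_positive.discard(element)
            cumulative_negative.discard(element)


matched = ("Ｍ電器", 4, 1, 0)      # element named in the description
unmatched = ("sample", 9, 1, 0)    # hypothetical element with no counterpart

cumulative_negative = {matched}
temp_positive = {matched, unmatched}

cancel_matching(cumulative_negative, temp_positive)
print(cumulative_negative, temp_positive)
```

After the call, the matched element is gone from both sets and only the hypothetical unmatched element remains in the temporary positive element set.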

When all elements of the cumulative negative element set have been compared (step S1505: NO), the update information merge processing unit 1401 records the elements of the generation 0→1 deletion record table 601 in the main storage device 108 as the cumulative deletion record set (not shown) (step S1508). It then appends the temporary positive element set to the cumulative positive element set (step S1509) and appends the elements of the generation 0→1 negative index 502 to the cumulative negative element set (not shown) (step S1510).

Thereafter, when the comparison in step S1503 no longer holds (step S1503: NO), the update information merge processing unit 1401 records the sets obtained so far in the generation 0→2 update information 1601 shown in FIG. 16: the cumulative negative element set as the negative index (0_2) 1603, the cumulative positive element set as the positive index (0_2) 1602, and the cumulative deletion record set as the deletion record table (0_2) 1604 (step S1511). It then ends the process.

Combining update information that spans multiple generations into a single piece of update information in this way speeds up subsequent generation update processing and search processing.

When merge processing ends, the update information deletion unit 1402 can also erase the generation 0→1 update information 103b and the generation 1→2 update information 1201 from the document DB holding device 103, according to conditions specified in advance by the document database administrator, so that the storage area of the document DB holding device 103 is used effectively. Writing a backup to another storage medium, for example an external storage medium such as a disk, before erasing makes the deletion safer.

In the update information creation process of this embodiment, as updates are repeated and update information accumulates, both update speed and search speed fall in proportion to the amount of update information. Performing update information merge processing on the accumulated update information mitigates this slowdown. Deleting update information that is no longer needed after the merge also lets the storage area that holds the document DB be used efficiently.

The update processing of the document DB has been described so far. The operation of the document DB search device 104, which searches the document DB created by the operations above, is described next.

FIG. 17 is a block diagram showing the configuration of the document DB search device 104. In FIG. 17, the document DB search device 104 comprises a search keyword input unit 105, an index search unit 1701, a position shift accumulation unit 1702, a negative index / deletion record table interpretation unit 1703, and a search result output unit 107.

FIG. 18 is a flowchart of the document DB search process executed in the document DB search device 104. The search process is described with reference to these figures.

The user enters a search string through the search keyword input unit 105 (step S1801). The document DB search device 104 splits the entered search string into keywords based on dictionary words (step S1802).

Next, as initial preparation, the document DB search device 104 initializes three sets (step S1803): a temporary negative set (not shown), which stores elements of the negative indexes and of the deletion record tables; a cumulative result set (not shown), which accumulates across generations the list of leading appearance positions of each string that was obtained by the keyword division and can be found in a positive index; and a position shift accumulation set (not shown), which accumulates, for each element of the cumulative result set, the position shift values in the positive indexes of multiple generations.

Thereafter, the document DB search device 104 processes each extracted character string in turn, indexed by i, until i exceeds M (step S1804). For each character string, the generation N to be searched first is assigned to j, the subscript representing the generation; j is then decremented by 1 each time, and processing continues while j is 0 or more (step S1805).

If j is 0 or more (step S1805: YES), the index search unit 1701 determines whether the extracted character string Word(i) can be found in positive index j (step S1806). If it can (step S1806: YES), processing proceeds to step S1807. The subscript j of positive index j takes the values 0 to N, representing generations. The same applies to negative index j and deleted record table j below.

Also in step S1806, the position shift accumulation unit 1702 calculates the values of the position shift accumulation set. At this time, the negative index / deleted record table interpretation unit 1703 determines whether the character string Word(i) found by the index search unit 1701 has the same character appearance position as a negative index element registered in the temporary negative set, or belongs to a record listed as a deleted record table element.

If either condition holds, the negative index / deleted record table interpretation unit 1703 treats that position as not being a search target and deletes the matching element for Word(i) from the temporary negative set. Depending on whether any searchable positive index element remains as a result, processing then proceeds to either step S1807 or step S1808.

For each element s of the cumulative result set and each element p of the positive index element set obtained by the index search above, the index search unit 1701 determines whether the appearance positions are contiguous. For each contiguous element p, it adds the position shift value to element e(p) of the position shift accumulation set. If j == 0 (the initial index) and no contiguous p exists, the index search unit 1701 deletes element s from the cumulative result set (step S1807).

Next, the position shift accumulation unit 1702 accumulates negative index j and deleted record table j into the temporary negative set (step S1808). If no searchable positive index element exists in step S1806 (step S1806: NO), processing likewise transitions to step S1808, which is performed by the position shift accumulation unit 1702.

When the condition 0 <= j no longer holds in step S1805 (step S1805: NO), the document DB search device 104 determines whether the cumulative result set is empty (step S1809). If it is empty (step S1809: YES), there are no search results, so the device returns the result record set and ends the search processing (step S1810).

If the cumulative result set is not empty (step S1809: NO), the document DB search device 104 performs steps S1804 to S1809 on the next extracted character string in the same way. It repeats steps S1804 to S1809 for all extracted character strings and outputs the resulting cumulative result set to the search result output unit 107 as the result set.

This search processing makes it possible to efficiently search a document DB created by the document DB update processing device 102 through update processing over multiple generations.
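The generation loop of steps S1805 to S1808 can be sketched for a single extracted character string as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `search_word`, the dictionary layout of each generation, and the reading of an index element as (document ID, appearance position, position shift value) are hypothetical, and the contiguity check of step S1807 is omitted.

```python
def search_word(word, generations):
    """Walk generations N down to 0 and collect hits that are not cancelled.

    Assumed layout (hypothetical): generations[j] is a dict with
      "positive": {string: [(doc_id, position, shift), ...]},
      "negative": {(string, doc_id, position), ...},
      "deleted":  {doc_id, ...}
    and generations[0]["positive"] plays the role of the initial index.
    """
    temp_negative = set()   # negative index elements seen in newer generations
    deleted_docs = set()    # deleted record table entries seen in newer generations
    hits = []
    for j in range(len(generations) - 1, -1, -1):   # j = N, N-1, ..., 0
        gen = generations[j]
        for doc_id, pos, shift in gen["positive"].get(word, []):
            key = (word, doc_id, pos)
            if key in temp_negative:
                temp_negative.discard(key)  # cancelled by a newer generation
                continue
            if doc_id in deleted_docs:
                continue                    # whole record deleted later
            hits.append((j, doc_id, pos, shift))
        # Step S1808: accumulate this generation's negative information.
        temp_negative |= gen["negative"]
        deleted_docs |= gen["deleted"]
    return hits


# Data shaped like the FIG. 19 example: the initial index has the entry
# (1, 7, 0) for the string, and the generation 0->1 negative index cancels it.
generations = [
    {"positive": {"H社": [(1, 7, 0)]}, "negative": set(), "deleted": set()},
    {"positive": {}, "negative": {("H社", 1, 7)}, "deleted": set()},
]
print(search_word("H社", generations))  # []
```

Because negative information is accumulated only after the positive index of the same generation has been checked, each generation's hits are filtered solely by deletions recorded in newer generations.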

Next, specific examples of character string search processing are described with reference to FIGS. 19 and 20.

In the example of FIG. 19, for the input search character string 1901 "H社" (Company H), the index search unit 1701 extracts the character string 1903 "H社 (1, 7, 0)" from the initial index 1902a of the generation 0 document DB 1902 (step S1910).

Next, the index search unit 1701 searches the positive index 1904a of the update information (generation 0→1) 1904 and gets no hit for "H社" (step S1911). It then searches the negative index 1904b (step S1912), gets a hit for "H社 (1, 7, 0)", and registers it in the cumulative negative set B 1905 (step S1913).

The character string 1903 "H社" found in the initial index 1902a hits at appearance position 7, but the cumulative negative set B 1905 contains an entry with the same appearance position, so "H社" does not count as a hit at this position. That is, the character string 1903 extracted from the initial index 1902a is deleted from the cumulative negative set 1905 (step S1914), and the empty set φ 1906 is output as the result.

The example of FIG. 20 shows the search processing when the search character string 2001 "MEI社とSO社が" (roughly, "MEI company and SO company" followed by a subject particle) is input against the generation 0 document DB 1902 of FIG. 19.

First, the index search unit 1701 splits the search character string 2001 into the character strings "MEI社", "と", "SO社", and "が".
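The description says only that the split is based on dictionary words; it does not specify the algorithm. As one possible illustration, the sketch below uses greedy longest-match segmentation, which is an assumption of this example, as are the function name `split_keywords` and the sample dictionary.

```python
def split_keywords(text, dictionary):
    """Greedy longest-match split of text into dictionary words.

    Characters that start no dictionary word are emitted one at a time.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in dictionary:
                tokens.append(candidate)
                i += n
                break
        else:
            tokens.append(text[i])  # fallback: single character
            i += 1
    return tokens


dictionary = {"MEI社", "と", "SO社", "が"}
print(split_keywords("MEI社とSO社が", dictionary))
# ['MEI社', 'と', 'SO社', 'が']
```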

Next, as in the example of FIG. 19, the index search unit 1701 searches the positive index 1904a (step S2010). Since "MEI社" and "SO社" appear there, it stores them in the cumulative result set (not shown).

Next, the index search unit 1701 stores all elements of the negative index 1904b in the cumulative negative set (not shown) (step S2011). It then searches the generation 0 initial index 1902a (step S2012), excludes from the search targets any element of the initial index 1902a that has the same character string and the same appearance position as an element registered in the cumulative negative set, and then looks for character strings that hit in generation 0.

In this case, "MEI社" does not exist in the initial index 1902a, so the cumulative result set records the head position "4" of "MEI社 (1, 4, 2)", which hit in the positive index 1904a, and its position shift value "2" is recorded in the position shift accumulation set.

Next, the character string "と" extracted from the search character string is not found in the positive index 1904a and is not recorded in the negative index 1904b, so "と (1, 6, 0)" hits for the first time when the generation 0 initial index 1902a is searched. To determine whether it is contiguous with the single "MEI社" entry stored in the cumulative result set, the position shift value "2" recorded in the position shift accumulation set is added to the appearance position "6" of "と", and contiguity is then determined.

As a result, as indicated by 2002 in the figure, the appearance position of "と" becomes "8". Since the appearance position of "MEI社" is "4" and its character string length is "4", the two are determined to be contiguous in position (4 + 4 = 8), and this position remains in the cumulative result set.
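The arithmetic of this contiguity check can be written out as a short sketch. The function name `is_contiguous` and its parameter names are hypothetical; the numbers are those of the FIG. 20 example.

```python
def is_contiguous(prev_pos, prev_len, next_pos, accumulated_shift):
    """True if the next string starts right where the previous one ends.

    next_pos is a position in the older generation's coordinates, so the
    accumulated position shift is added before comparing.
    """
    return prev_pos + prev_len == next_pos + accumulated_shift


# "MEI社" hits at position 4 with length 4 and position shift value 2;
# "と" hits at position 6 in the generation 0 initial index.
print(is_contiguous(4, len("MEI社"), 6, 2))  # True
```

Without the shift (an accumulated value of 0), the same positions would not be judged contiguous, which is why the shift recorded in the positive index has to be carried along while older generations are searched.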

The extracted character string "が" is searched in the same way: the position shift value is added to its appearance position, and contiguity with "SO社", which hit in the positive index 1904a, is determined, so that position also remains in the cumulative result set.

By repeating the processing for each extracted character string in this way, the search finally hits at appearance character position "4" in the document whose document ID is "1".

Therefore, the search processing by the document DB search device 104 of this embodiment requires processing to interpret the update information in addition to the conventional search processing. However, this interpretation consists only of addition and subtraction on sets of character string appearance positions, so it does not reduce search speed. Unchanged character strings are searched using the conventional index, so search speed is not reduced for them either.

The document database update processing device, document database search device, document database index creation method, and document database search method of the present invention reduce the amount of update information in a generation-managed document database and make character string search more efficient. They are therefore applicable to document database processing systems and the like.

FIG. 1 is a block diagram showing the configuration of a document database processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the document DB update processing device that creates the generation 1 document DB according to this embodiment.
FIG. 3 shows, according to this embodiment, (a) an example of a document DB before update, (b) an example of an update document, and (c) an example of the document DB after update.
FIG. 4 shows an example of the index of the document DB before update according to this embodiment.
FIG. 5 shows, according to this embodiment, (a) an example of the positive index of the updated document DB and (b) an example of the negative index of the updated document DB.
FIG. 6 shows an example of the deleted record table of the updated document DB according to this embodiment.
FIG. 7 is a flowchart showing the update information creation processing according to this embodiment.
FIG. 8 is a block diagram showing the configuration of the document DB update processing device that creates update information from generation 0 to generation 1 and for generation 2 onward according to this embodiment.
FIG. 9 shows, according to this embodiment, (a) an example of the generation 0 document DB, (b) an example of the generation 1 updated document DB, and (c) an example of the generation 2 updated document DB.
FIG. 10 shows, according to this embodiment, (a) an example of an update document and (b) an example of another update document.
FIG. 11 shows an example of the generation 0 document DB according to this embodiment.
FIG. 12 shows, according to this embodiment, (a) an example of the update information for generation 0→1 and (b) an example of the update information for generation 1→2.
FIG. 13 is a flowchart showing the multiple update information interpretation processing according to this embodiment.
FIG. 14 is a block diagram showing the configuration of the document DB update processing device that executes the update information merge processing according to this embodiment.
FIG. 15 is a flowchart showing the update information merge processing according to this embodiment.
FIG. 16 shows an example of the update information for generation 0→2 created by the update information merge processing according to this embodiment.
FIG. 17 is a block diagram showing the configuration of the document DB search device according to this embodiment.
FIG. 18 is a flowchart showing the document DB search processing according to this embodiment.
FIG. 19 shows a specific example of the character string search processing according to this embodiment.
FIG. 20 shows another specific example of the character string search processing according to this embodiment.
FIG. 21 is a block diagram showing the configuration of a conventional document DB management system.
FIG. 22 shows an example of the index of a conventional updated document DB.

Explanation of symbols

100 Document database processing system
101 Document input device
102 Document DB update processing device
103 Document DB holding device
103a Generation 0 document DB
103b Update information for generation 0→1
103c Update information for generation N−2→N−1
103d Update information for generation N−1→N
104 Document DB search device
105 Search keyword input unit
106 Document DB search unit
107 Search result output unit
108, 110 Main storage device
110 Auxiliary storage device
201 Update document determination unit
202 Positive index creation unit
203 Negative index creation unit
204 Document DB update processing unit
205 Deleted record table creation unit
206 Index creation unit
801 Multiple update information interpretation unit
1401 Update information merge processing unit
1402 Update information deletion unit
1701 Index search unit
1702 Position shift accumulation unit
1703 Negative index / deleted record table interpretation unit

Claims

A document database update processing device for updating a generation-managed document database, comprising:
a document database recording unit that extracts character strings, record by record, from an initial generation document composed of a plurality of records each having a uniquely identifying ID, creates an index whose elements pair each extracted character string with its appearance character position, and records the index and the initial generation document in a document database;
a document input unit for inputting an update document;
an update document determination unit that determines the character string portions changed between the initial generation document and the update document;
a positive index creation unit that creates, for the changed character string portions, a positive index whose elements each combine an extracted character string with the character string length difference value produced by the change to that character string;
a negative index creation unit that creates, as a negative index, the initial generation index elements determined to be deleted;
a deleted record table creation unit that creates a deleted record table listing the document IDs of the records determined to be deleted; and
a document database update processing unit that updates and registers the created positive index, negative index, and deleted record table as update information for a new generation.

A document database update processing device for updating a generation-managed document database, comprising:
a document database recording unit that records, in a document database, documents from the initial generation to generation N, each composed of a plurality of records having a uniquely identifying ID, together with index information for each generation consisting of a negative index, a positive index, and a deleted record table;
a document input unit for inputting a generation N+1 update document;
an update document determination unit that determines the changed character string portions from the generation N+1 update document, the documents from the initial generation to generation N, and the index information consisting of the negative index, positive index, and deleted record table;
a positive index creation unit that creates, for the changed character string portions, a positive index whose elements each combine an extracted character string with the character string length difference value produced by the change to that character string;
a negative index creation unit that creates, as a negative index, the initial generation index elements determined to be deleted;
a deleted record table creation unit that creates a deleted record table listing the document IDs of the records determined to be deleted; and
a document database update processing unit that updates and registers the positive index, negative index, and deleted record table thus created as generation N+1 update information of the document database.

The document database update processing device according to claim 2, further comprising a multiple update information interpretation unit that, in update processing for generation i+1 (0 < i < N) based on the documents from the initial generation to generation N, the negative indexes, the positive indexes, and the update document, interprets the index elements to be deleted, the added or changed index elements, and the deleted records on the basis of the negative indexes, positive indexes, and deleted record tables of generation i through generation N, whereby the update information for generation N+1 is created.

The document database update processing device according to claim 1 or 2, wherein, when updating from the index of the pre-update generation to the index of the post-update generation, the update document determination unit determines whether the number of character strings to be updated in a record to be updated is larger than an arbitrary threshold and, if it is larger, regards the record as a changed record, creates an index for it, and records its record number in the deleted record table.

The document database update processing device according to claim 2, further comprising an update information merge processing unit that combines the update information of a plurality of generations, accumulated through updates over those generations, into a single piece of update information.

The document database update processing device according to claim 5, wherein the update information merge processing unit deletes update information that is no longer needed.

The document database update processing device according to claim 4, wherein the update document determination unit identifies a document record to be compared in the update document, obtains a list of difference character strings between that document record and the corresponding document record of the initial document, and determines whether the number of elements in the difference character string list is greater than the threshold.

A document database search device comprising:
a search character string input unit for inputting a character string to be searched for;
a document database holding unit that stores update information consisting of positive indexes, negative indexes, and deleted record tables for a plurality of generations, together with the document information of each generation;
a document database search unit that analyzes the input character string, divides it into character strings, and searches for each divided character string using the update information of the plurality of generations, the initial generation index, and the documents held in the document database holding unit; and
a search result output unit that outputs the record set obtained by the document database search unit.

The document database search device according to claim 8, further comprising a negative index / deleted record table interpretation unit that, when generation i (i = 1 to N+1) is searched using the generation N+1 positive index and the update information from generation 0 and generation 1 through generation N+1, cumulatively interprets the deleted index elements and the deleted records on the basis of the negative index elements and the deleted record table elements from generation N+1 down to generation i+1,
wherein, in the generation N+1 index search, the document database search unit searches the generation N+1 positive index for each divided character string and, if a matching character string exists, treats it as a search candidate, and, in the generation N index search, if a character string matching a divided character string exists, outputs it to the negative index / deleted record table interpretation unit, and
the negative index / deleted record table interpretation unit excludes the character string input from the document database search unit from the search if a matching character string exists in the generation N+1 negative index, and also interprets the document data of the record numbers registered in the generation N+1 deleted record table and excludes any element of the input character string found there from the search.

The document database search device according to claim 9, further comprising a position shift accumulation unit that accumulates the position shift value of each element from the generation N+1 positive index to the generation i+1 positive index,
wherein the document database search unit adds the position shift value accumulated by the position shift accumulation unit to the appearance position of a character string retrieved from the generation N+1 positive index, determines whether that position is contiguous with the appearance position of the character string retrieved before it, and, if it is contiguous, treats the retrieved character string as a search target.

The document database search device according to claim 9 or 10, wherein the document database search unit repeatedly executes, for each divided character string, processing that searches for the matching character string in the positive index of each of generations 0 to N+1, performs the contiguity determination for each retrieved character string, and searches for the search target across all elements from the generation N+1 positive index to the generation i+1 positive index for all of the divided character strings.

A document database index creation method for creating an index of a generation-managed document database, comprising:
a document database recording step of extracting character strings, record by record, from an initial generation document composed of a plurality of records each having a uniquely identifying ID, creating an index whose elements pair each extracted character string with its appearance character position, and recording the index and the initial generation document in a document database;
a document input step of inputting an update document;
an update document determination step of determining the character string portions changed between the initial generation document and the update document;
a positive index creation step of creating, for the changed character string portions, a positive index whose elements are sets of an extracted character string, its appearance position, and the character string length difference value produced by the change;
a negative index creation step of creating, as a negative index, the initial generation index elements determined to be deleted;
a deleted record table creation step of creating a deleted record table listing the document IDs of the records determined to be deleted; and
an update registration step of updating and registering the created positive index, negative index, and deleted record table as update information for a new generation.

A document database index creation method for creating an index of a generation-managed document database, comprising:
a document database recording step of recording, in a document database, documents from the initial generation to generation N, each composed of a plurality of records having a uniquely identifying ID, together with index information for each generation consisting of a negative index, a positive index, and a deleted record table;
a document input step of inputting a generation N+1 update document;
an update document determination step of determining the changed character string portions from the generation N+1 update document, the documents from the initial generation to generation N, and the index information consisting of the negative index, positive index, and deleted record table;
a positive index creation step of creating, for the changed character string portions, a positive index whose elements are sets of an extracted character string, its appearance position, and the character string length difference value produced by the change;
a negative index creation step of creating, as a negative index, the initial generation index elements determined to be deleted;
a deleted record table creation step of creating a deleted record table listing the document IDs of the records determined to be deleted; and
an update registration step of updating and registering the positive index, negative index, and deleted record table thus created as generation N+1 update information.

A document database search method comprising:
a search character string input step of inputting a character string to be searched for;
a document database holding step of storing update information consisting of positive indexes, negative indexes, and deleted record tables for a plurality of generations, together with the document information of each generation;
a document database search step of analyzing the input character string, dividing it into character strings, and searching for each divided character string using the update information of the plurality of generations, the initial generation index, and the documents held in the document database; and
a search result output step of outputting the record set obtained by the document database search step.