JP4521413B2

JP4521413B2 - Database management system and program

Info

Publication number: JP4521413B2
Application number: JP2007030890A
Authority: JP
Inventors: 敦子江口
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-02-09
Filing date: 2007-02-09
Publication date: 2010-08-11
Anticipated expiration: 2027-02-09
Also published as: JP2008197815A

Description

The present invention relates to a database management system that manages a database storing documents (electronic documents) and the indexes used to search them, and in particular to a database management system and program suited to the document re-registration processing that occurs when a document registered in the database is updated.

In recent years, in offices for example, database management systems (DBMS) have come to be built that register and manage the various documents (electronic documents) handled in the office in a searchable database (DB) for the purpose of information retrieval. In such a system, when the content of a document registered in the database is updated, the updated document is re-registered in the database.

Documents handled in an office, for example documents created with word processing software on a personal computer (office documents), are updated in units of documents. For this reason, when a document registered in the database is updated, the registration of information in the database is generally also a re-registration in units of documents.

Incidentally, to enable, for example, full-text search of a document registered in the database, the words contained in the document must be extracted and registered in the database as an index (see, for example, Patent Documents 1 and 2). Therefore, when a document registered in the database is updated, the following series of operations is required:
(1) deletion of the registered document
(2) deletion of the index of the deleted document
(3) extraction of the words used as the index for full-text search of the re-registered document (scanning of the re-registered document)
(4) creation of the index
(5) registration of the document
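The five operations above can be sketched as follows. This is a minimal illustration, not the implementation of any particular system: the function name `reregister_whole_document`, the in-memory dictionaries, and the whitespace tokenizer (a stand-in for real word extraction) are all assumptions introduced here.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: whitespace splitting stands in for real word extraction.
    return text.split()


def reregister_whole_document(documents, index, doc_id, new_text):
    """Conventional re-registration: every step covers the whole document.

    documents: dict mapping doc_id -> text
    index:     defaultdict(set) mapping word -> set of doc_ids
    Returns the number of words scanned for indexing.
    """
    old_text = documents.pop(doc_id, None)        # (1) delete the registered document
    if old_text is not None:
        for word in set(tokenize(old_text)):      # (2) delete the index of the deleted document
            index[word].discard(doc_id)
            if not index[word]:
                del index[word]
    words = tokenize(new_text)                    # (3) scan the re-registered document
    for word in words:                            # (4) create the index
        index[word].add(doc_id)
    documents[doc_id] = new_text                  # (5) register the document
    return len(words)
```

Note that the returned count grows with the size of the document even when a single word has changed, which is the cost the description goes on to address.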

When only part of a document is updated, for example when an update designates only a specific structure of a structured document such as an XML document, only that part is updated (partial update). When a partial update occurs, only the changed portion is re-registered rather than the entire document. Consequently, as partial updates of a document increase, the registered document in the database becomes divided into multiple areas.
Patent Documents 1 and 2: JP 2004-206473 A; JP 2006-172268 A

As described above, re-registration of a document in the database following an update of a registered document is generally performed in units of documents, and it requires the five operations listed above. The most time-consuming part of this re-registration is the processing related to the index (index processing), and the larger the document, the longer the index processing takes. In other words, the larger the document, the longer the re-registration takes.

The present invention has been made in view of the above circumstances. Its object is to provide a database management system and program that reduce the cost of index creation and shorten the time required for re-registration by registering in the database only the data of those portions of the updated document that have changed relative to the original registered document.

According to one aspect of the present invention, there is provided a database management system that manages a database storing documents and the indexes used to search them. The system comprises: difference extraction means for extracting as difference data, when an original document corresponding to a document to be registered is already registered in the database, the data of those portions of the document to be registered that have changed relative to the original document; index update means for updating the index only for the portions where the change occurred; difference registration means for registering in the database only the difference data extracted by the difference extraction means; and linking means for setting, in the original document, a link to the difference data registered in the database by the difference registration means.

According to the present invention, registering in the database only the data of those portions of the updated document that have changed relative to the original registered document reduces the cost of index creation and shortens the time required for re-registration.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a block diagram showing the hardware configuration of a client-server system according to an embodiment of the present invention. The client-server system mainly comprises a database server (database server computer) 10 and a plurality of client terminals, including a client terminal 20. An application (application program) that uses the database server 10 runs on the client terminal 20. The client terminals, including the client terminal 20, are connected to the database server 10 via a network 30 such as a local area network (LAN). Client terminals other than the client terminal 20 are omitted from FIG. 1.

The database server 10 is connected to an external storage device 40 such as a hard disk drive. The external storage device 40 stores a database management program 41 and a database 42.

The database management program 41 is used by the database server 10 to manage the database 42 and to perform search processing based on search requests from client terminals. The database 42 includes a document part 421, which stores the documents to be searched, and an index part 422, which stores the indexes (index data) used to search the documents stored in the document part 421.

In this embodiment, a database management system 50 is implemented by the database server 10. FIG. 2 is a block diagram mainly showing the functional configuration of the database management system 50. The database management system 50 includes a search unit 51, a difference extraction unit 52, an index processing unit 53, a registration unit 54, and a document search unit 55.

When a document registration request is given from the client terminal 20 to the database management system 50 (database server 10), the search unit 51 searches the database 42 (the document part 421 within it) for a registered document corresponding to the requested document. When such a registered document exists in the database 42, that is, in the case of document re-registration, the difference extraction unit 52 compares the requested document with the registered document and extracts the data of the differences (changed portions) as difference data.

The index processing unit 53 performs index creation and update processing on the index part 422 in the database 42. It includes an index update unit 531 and an index creation unit 532. The index update unit 531 updates the index for the difference data extracted by the difference extraction unit 52. When no registered document corresponding to the requested document exists in the database 42, that is, when a new document is being registered, the index creation unit 532 creates the index of the new document in the index part 422 of the database 42.

The registration unit 54 performs registration processing for the requested document. It includes a difference registration unit 541, a linking unit 542, an original text registration unit 543, and a document registration unit 544.
The difference registration unit 541 registers the extracted difference data in the document part 421 of the database 42. The linking unit 542 links the difference data registered in the document part 421 with the document already registered in the document part 421 (the original document). In addition, when the requested document is registered in the document part 421 as an original-text document by the original text registration unit 543 at re-registration, the linking unit 542 sets a link to that original-text document in the original document that has been linked to the difference data.

The original text registration unit 543 registers the document from which difference data relative to the registered document (original document) was extracted, that is, the requested document, in the document part 421 as an original-text document. The document registration unit 544 registers new documents in the document part 421.

When a document search request is given from the client terminal 20 to the database management system 50 (database server 10), the document search unit 55 uses the index part 422 in the database 42 to search for documents that match the requested search conditions. The search unit 51 may be given the function of the document search unit 55.

In this embodiment, the search unit 51, the difference extraction unit 52, the index processing unit 53, the registration unit 54, and the document search unit 55 are realized by the database server 10 of FIG. 1 reading the database management program 41 stored in the external storage device 40 into a memory (not shown) in the server 10 and executing it. The program 41 can be stored in advance on a computer-readable storage medium such as a compact disc or a ROM and distributed. The program 41 may also be downloaded to the database server 10 via the network 30.

Next, the document registration processing applied in the above embodiment is described with reference to the flowchart of FIG. 3.
First, assume that document registration is requested of the database management system 50 (database server 10) by the client terminal 20 via the network 30. Requests given from the client terminal 20 to the database management system 50 are accepted by a request management unit (not shown) in the system 50. A document registration request, as in this example, is passed to the search unit 51.

On receiving the document registration request from the client terminal 20, the search unit 51 searches the documents registered in the document part 421 of the database 42 (step S1). In general, each document registered in the document part 421 is given an ID such as a file name, and the search looks for a document whose ID matches that of the requested document. Based on the search result, the search unit 51 determines whether a document with the same ID as the requested document is already registered in the document part 421 (step S2).

If no document with the same ID as the requested document is registered in the document part 421, the search unit 51 treats the request as a request to register a new document and passes control to the index creation unit 532 in the index processing unit 53. The index creation unit 532 then performs morphological analysis on the requested document (the document to be registered), creates the index needed to search for the document (for example, a word index), and registers it in the index part 422 of the database 42 (step S3). The registered index includes a link (link information representing a link) to the document to be registered, which contains the word (vocabulary item) used to create the index. A word index is assumed here to simplify the description, but an N-gram index suited to full-text search may be created instead.

In step S3, the index creation unit 532 also embeds (sets) in the requested document a link (reverse link) from the document to the registered index. The document registration unit 544 then registers the requested document, with the reverse link to the registered index embedded, in the document part 421 (step S4).
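The new-document path (steps S1 to S4) can be sketched as follows. This is only an illustration under stated assumptions: the class names `Document` and `Database`, the function `register_new`, and the use of whitespace splitting in place of morphological analysis are hypothetical and do not come from the embodiment. Links are modeled as document IDs stored in the index, and reverse links as the set of index words recorded in the document.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    words: list
    # Reverse links: the index words whose entries point at this document.
    reverse_links: set = field(default_factory=set)


@dataclass
class Database:
    document_part: dict = field(default_factory=dict)  # doc_id -> Document
    index_part: dict = field(default_factory=dict)     # word -> set of doc_ids (links)


def register_new(db, doc_id, text):
    # S1/S2: look up the ID; this sketch only handles the "not registered" branch.
    if doc_id in db.document_part:
        raise ValueError("already registered; re-registration is a separate path")
    # Assumption: whitespace splitting stands in for morphological analysis.
    doc = Document(doc_id, text.split())
    # S3: create index entries (links) and embed reverse links in the document.
    for word in doc.words:
        db.index_part.setdefault(word, set()).add(doc_id)
        doc.reverse_links.add(word)
    # S4: register the document, reverse links included.
    db.document_part[doc_id] = doc
    return doc
```

Keeping reverse links inside the document is what later allows the index entries belonging to one changed word to be located without rescanning the whole document.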

FIG. 4 shows an example of the state of the document part 421 and the index part 422 in the database 42 at this point. In the example of FIG. 4, a document 400 with the content "high-speed search DB processing system" is registered in the document part 421. Documents other than the document 400 are also registered in the document part 421 but are omitted from FIG. 4. The index part 422 holds an index for each word, including the words "high-speed", "search", "DB", "processing", and "system"; that is, for each word (character string), an index pointing to the documents that contain that word is registered.

The indexes for the words "high-speed", "search", "DB", "processing", and "system" each have a link (link information representing a link) 401, 402, 403, 404, or 405, respectively, from the index (the hatched rectangular frame in FIG. 4) to the document 400, which contains the word indicated by that index. In FIG. 4, the links 401, 402, 403, 404, and 405 are drawn as arrows, whose base and tip indicate the link source and the link destination, respectively.

In FIG. 4, the tips of the arrows representing the links 401, 402, 403, 404, and 405 point to the positions of the words "high-speed", "search", "DB", "processing", and "system" in the document 400. This depiction is used to make it easy to see that the links 401, 402, 403, 404, and 405 were set on the basis of those words in the document 400. For the document 400 to be found by a search that uses the corresponding words as keywords, it is sufficient for the links to point to the document 400 itself. Thus the links 401, 402, 403, 404, and 405 point to the document 400, although for convenience of description they are sometimes said to point to the words "high-speed", "search", "DB", "processing", and "system" in the document 400.

The document 400, for its part, has reverse links 411, 412, 413, 414, and 415 from the document 400 to the indexes for the words "high-speed", "search", "DB", "processing", and "system" contained in it; these reverse links are also drawn as arrows. The bases of the arrows representing the reverse links 411, 412, 413, 414, and 415 are located at the words "high-speed", "search", "DB", "processing", and "system" in the document 400. As with the links 401, 402, 403, 404, and 405, this depiction is used to make it easy to see that the reverse links were set on the basis of those words in the document 400. It is sufficient for the reverse links 411, 412, 413, 414, and 415 to allow the indexes (links 401, 402, 403, 404, and 405) created from those words to be traced from the document 400.

In FIG. 4, for the indexes of the words "high-speed", "DB", "processing", and "system", the links from the unhatched rectangular frames to documents containing the indicated words (documents other than the document 400), and the corresponding reverse links, are omitted.

Next, assume that document registration is again requested of the database management system 50 (database server 10) by the client terminal 20. At this time, the document part 421 and the index part 422 in the database 42 are in the state shown in FIG. 4. Assume also that the request from the client terminal 20 is a registration request for a document with the content "high-speed search XML processing system", obtained by updating, in units of documents, the document with the content "high-speed search DB processing system" (that is, the document 400).

On receiving the document registration request from the client terminal 20, the search unit 51 of the database management system 50 searches the documents registered in the document part 421 of the database 42 (step S1). Based on the search result, the search unit 51 determines whether a document with the same ID as the requested document is already registered in the document part 421 (step S2).

If a document with the same ID as the requested document is registered in the document part 421, the search unit 51 treats the request as a request to register (re-register) an updated version of the registered document and passes control to the difference extraction unit 52. The difference extraction unit 52 then retrieves the registered document (that is, the document before the update) from the document part 421 (step S5). It extracts the data of the differences (difference data) between the updated document requested by the client terminal 20 (the re-registered document) and the registered document retrieved from the document part 421, for example in units of words (vocabulary items) (step S6).

Here, the document with the content "high-speed search DB processing system" (document 400) has been updated to a document with the content "high-speed search XML processing system". That is, "DB" in "high-speed search DB processing system" has been changed to "XML". In this case, the data of the portion of the updated document that has changed relative to the original registered document (original document) 400, namely "XML", is extracted as the difference data corresponding to "DB" in the document 400.
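Word-level difference extraction (step S6) can be sketched as follows. The embodiment does not specify a comparison algorithm; this sketch assumes Python's standard `difflib.SequenceMatcher` as one possible choice, and the function name `extract_differences` and the tuple layout are hypothetical.

```python
from difflib import SequenceMatcher


def extract_differences(old_words, new_words):
    """Return one tuple per changed span:
    (start in old, end in old, words before the change, words after the change).
    """
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (i1, i2, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, deleted, or inserted spans
    ]


old = "high-speed search DB processing system".split()
new = "high-speed search XML processing system".split()
print(extract_differences(old, new))
# [(2, 3, ['DB'], ['XML'])]
```

Only the changed span is returned, so the later index and registration steps have a single word to handle rather than the whole document.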

The index update unit 531 in the index processing unit 53 deletes from the index part 422 the index pointing to the word "DB" in the document 400 that corresponds to the extracted difference data "XML" (that is, the link 403 to the document 400) (step S7). In step S7, the index update unit 531 also deletes the reverse link 413 to the index of the word "DB" from the document 400 and adds specific information indicating deletion (a deletion mark) to the word "DB" in the document 400. Also in step S7, the index update unit 531 creates an index (index data) for the extracted difference data (the word after the change) "XML" and adds it to the index part 422. The created index has a link to the document (original document) 400, which contains the word at the changed portion (the word before the change) "DB" corresponding to the difference data "XML". In this way, the index update unit 531 updates only the index data for the changed portion from which the difference extraction unit 52 extracted difference data (step S7).
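The index side of step S7 can be sketched as follows. The function name `update_index_for_change` and the dictionary-of-sets index are assumptions made for illustration; the deletion mark on the word itself is not modeled here, and the sketch assumes the replaced word occurs only once in the document.

```python
def update_index_for_change(index, reverse_links, doc_id, old_word, new_word):
    """Touch only the index entries for the changed word.

    index:         dict mapping word -> set of doc_ids (links to documents)
    reverse_links: set of index words recorded in the original document
    """
    # Delete the link from the old word's index entry to the document (link 403).
    postings = index.get(old_word, set())
    postings.discard(doc_id)
    if not postings:
        index.pop(old_word, None)
    # Delete the reverse link from the document to that index entry (reverse link 413).
    reverse_links.discard(old_word)
    # Add an index entry for the new word that links to the ORIGINAL document.
    index.setdefault(new_word, set()).add(doc_id)
```

Every other index entry for the document is left untouched, which is where the saving over whole-document re-indexing comes from.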

After executing step S7, the index update unit 531 passes control to the difference registration unit 541 in the registration unit 54. The difference registration unit 541 then registers only the difference data, that is, the word "XML", in the document part 421 (step S8).

The linking unit 542 sets in the document 400 (the original document) a link (document link) from the word "DB", to which the deletion mark has been added, to the difference data "XML" registered by the difference registration unit 541 (step S9). In step S9, the linking unit 542 also embeds, in the document 400 that is linked by the document link to the difference data "XML" registered in the document part 421, a reverse link to the index data for the difference data "XML".
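Steps S8 and S9 can be sketched as follows. The names `Slot`, `Store`, and `register_difference` are hypothetical, as is the choice to store the original document as a list of word slots and the difference data in a separate dictionary. In the embodiment the deletion mark is added in step S7; it is set here only so that the sketch is self-contained.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    word: str
    deleted: bool = False        # deletion mark
    link: Optional[int] = None   # document link: key of the difference data


@dataclass
class Store:
    slots: list                                       # original document, one slot per word
    differences: dict = field(default_factory=dict)   # diff_id -> word after the change
    reverse_links: set = field(default_factory=set)   # index words recorded in the document


def register_difference(store, position, new_word):
    # S8: register only the difference data, not the whole updated document.
    diff_id = len(store.differences)
    store.differences[diff_id] = new_word
    # S9: set a document link from the deletion-marked word to the difference data.
    slot = store.slots[position]
    slot.deleted = True  # done in S7 in the embodiment; repeated here for self-containment
    slot.link = diff_id
    # S9: embed a reverse link to the index data for the new word.
    store.reverse_links.add(new_word)
    return diff_id
```

The original words stay in place, so the stored document is never rewritten; only one small record and one link are added per changed word.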

Thus, in this embodiment, even when an update is made in units of documents, the difference between the original document already registered in the document part 421 (the registered document) and the updated document (the document to be re-registered) is extracted, only the data of that difference is registered in the document part 421, and the index is created and updated only for the changed portions from which the difference data was extracted. Compared with the prior art, which required index processing over the entire updated document (the document to be re-registered), the portions to be processed are limited, so the time required for index processing is shortened and high-speed document registration (re-registration) can be achieved. For simplicity, FIG. 4 shows an example of a very small document 400; the effect is greater for larger documents.

FIG. 5 shows the state of the document part 421 and the index part 422 immediately after the difference data "XML" has been registered, starting from the state shown in FIG. 4. In the example of FIG. 5, the word "DB" in the document (original document) 400 carries a deletion mark, shown by the symbol ×, and a document link 431 is set from this word "DB" to the difference data "XML". The index for pointing to documents that contain the word "XML" includes a link 406 to the document (original document) 400, in which the document link 431 to the word "XML" is set. A reverse link 416 to the index for the word "XML" is embedded in the document (original document) 400. As a comparison with FIG. 4 makes clear, the link 403 and the reverse link 413 have been deleted.

Assume that the document with the content "high-speed search XML processing system" (that is, the updated document 400) is subsequently updated again in units of documents, and that, as a result, a registration request for a document (updated document) with the content "high-speed search XML management system" is given from the client terminal 20 to the database management system 50 (database server 10). In this case, the word "management" is extracted as the difference data for "processing" in the document 400 (step S6). As a result, only "management" is registered in the document part 421, and index creation and index update are performed only for "management" and "processing", respectively.

FIG. 6 shows the state of the document part 421 and the index part 422 immediately after the difference data "management" has been registered, starting from the state shown in FIG. 5. In the example of FIG. 6, the word "processing" in the document 400 carries a deletion mark "×", and a document link 432 is set from this word "processing" to the difference data "management". The index for pointing to documents that contain the word "management" includes a link 407 to the document 400, in which the document link 432 to the word "management" is set. A reverse link 417 to the index for the word "management" is embedded in the document 400.

Now assume that, in the state shown in FIG. 6, where the document links 431 and 432 are set between the deletion-marked words "DB" and "processing" in the document 400 and the words (difference data) "XML" and "management", respectively, a search is requested for documents containing at least one of the words "high-speed", "search", "XML", "management", and "system". The request management unit passes this request to the document search unit 55. Based on the indexes registered in the index part 422, the document search unit 55 then searches the document part 421 for documents containing at least one of the words "high-speed", "search", "XML", "management", and "system". This search retrieves a group of documents that includes the document 400.
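The "at least one of the words" search described here can be sketched as a union over index entries. The function name `search_any`, the document IDs, and the sample index contents (chosen to mirror the state of FIG. 6, with a made-up second document `doc_other`) are assumptions for illustration.

```python
def search_any(index, query_words):
    """Return the IDs of documents linked from the index entry of any query word."""
    hits = set()
    for word in query_words:
        hits |= index.get(word, set())
    return hits


# Index state modeled on FIG. 6: "DB" and "processing" no longer link to doc400,
# while "XML" and "management" link to the original document doc400.
index = {
    "high-speed": {"doc400"},
    "search": {"doc400"},
    "XML": {"doc400"},
    "management": {"doc400"},
    "system": {"doc400", "doc_other"},
    "DB": {"doc_other"},
}
print(sorted(search_any(index, ["XML", "management"])))
# ['doc400']
```

Because the index entries for the new words link to the original document, the search finds document 400 even though the words "XML" and "management" are stored outside it as difference data.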

Document search in a system such as a document management system usually retrieves whole documents. Therefore, when, for example, the document 400 is found by a search, a document retrieval process that includes the following character-string retrieval is performed.

First, the character string “high-speed search” in the document 400 is retrieved. Next, by following the document link 431 from the word “DB”, which follows the character string “high-speed search” and carries a deletion mark, the updated word “XML” that replaces “DB” is retrieved. Then, by following the document link 432 from the word “processing”, which follows the word “DB” and also carries a deletion mark, the updated word “management” that replaces “processing” is retrieved. Finally, the word “system” following the word “processing” is retrieved. In this way, the content of the updated document, “high-speed search XML management system”, is retrieved. The document retrieval process described above also removes the reverse links to the index that are set in the document 400.
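The link-following retrieval described above can be sketched as follows. This is a minimal illustration under assumed data structures: the `Word` class, its `deleted` flag (the deletion mark), and its `link` field (the document link to difference data) are hypothetical names, and removal of reverse links to the index is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    deleted: bool = False          # deletion mark
    link: Optional["Word"] = None  # document link to the difference data


def retrieve_by_links(words):
    """Rebuild the current content by following document links."""
    parts = []
    for word in words:
        current = word
        # A deletion-marked word is replaced by the word its link points to.
        while current is not None and current.deleted:
            current = current.link
        if current is not None:
            parts.append(current.text)
    return " ".join(parts)


document_400 = [
    Word("high-speed"),
    Word("search"),
    Word("DB", deleted=True, link=Word("XML")),                 # link 431
    Word("processing", deleted=True, link=Word("management")),  # link 432
    Word("system"),
]
content = retrieve_by_links(document_400)
print(content)  # high-speed search XML management system
```

Every deletion-marked word costs one extra link traversal, so the retrieval cost grows with the number of registered differences.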

As is clear from FIG. 6, each time difference data is registered, the number of document links from the document (original document) 400 to difference data increases. In a document search that retrieves a document with many such links, following the document links to fetch the difference data adds cost to every retrieval.

Therefore, the present embodiment applies a mechanism that allows the content of the updated document to be retrieved easily even when difference data has been registered repeatedly. This mechanism is described below with reference to the flowchart of FIG. 3 mentioned earlier.

After executing step S9, the linking unit 542 activates the original-text registration unit 543. The original-text registration unit 543 registers the document requested by the client terminal 20, as is, in the document part 421 of the database 42 as an original-text document (step S10). The linking unit 542 then sets, in the original document that has been linked with the difference data, a link (original-text document link) from that original document to the original-text document (step S11). This completes the document registration process according to the flowchart of FIG. 3.

FIG. 7 shows the state of the document part 421 immediately after the original-text document has been registered and the link from the original document to that original-text document has been set in the state shown in FIG. 6. In the example of FIG. 7, a registration request for a document with the content “high-speed search XML management system” was issued and the document registration process according to the flowchart of FIG. 3 was executed through step S11, so that the document is registered in the document part 421 as an original-text document 430. In addition, a link (original-text document link) 440 to the original-text document 430 (that is, the updated document corresponding to the original document 400) is set in the original document 400. In the present embodiment, when the document (original document) 400 is found by a search, the content of the original-text document 430 can be retrieved at high speed based on the link 440, as described below.

Now assume that a search for documents containing at least one of the words “high-speed”, “search”, “XML”, “management”, and “system” has been requested and that, as a result, a group of documents including the document (original document) 400 has been found by the document search unit 55. When the document (original document) 400 is found, the document search unit 55 retrieves the content of the original-text document 430 based on the link 440 set in the document (original document) 400.

Thus, according to the present embodiment, the content of the document in which the difference data is reflected, that is, the content of the original-text document, can be retrieved at high speed without following the links from the found document (original document) to the difference data. Moreover, unlike the original document, the original-text document has no reverse links to the index, so no reverse links need to be removed when its content is retrieved, which makes document retrieval even faster.
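The retrieval path through the original-text document link can be sketched as below. The dictionary layout, the key names `original_text_link`, `words`, and `content`, and the function `retrieve` are assumptions made for illustration; only the identifiers 400 and 430 come from the description.

```python
def retrieve(document_part, doc_id):
    """Return the current content of a found original document."""
    record = document_part[doc_id]
    link = record.get("original_text_link")
    if link is not None:
        # Fast path: one lookup, no difference links to follow and
        # no reverse links to strip.
        return document_part[link]["content"]
    # Fallback: rebuild the content word by word from the difference data.
    return " ".join(
        new if new is not None else old for old, new in record["words"]
    )


document_part = {
    400: {
        # (registered word, difference data or None if unchanged)
        "words": [
            ("high-speed", None),
            ("search", None),
            ("DB", "XML"),
            ("processing", "management"),
            ("system", None),
        ],
        "original_text_link": 430,  # corresponds to link 440
    },
    430: {"content": "high-speed search XML management system"},
}

fast = retrieve(document_part, 400)
print(fast)  # high-speed search XML management system
```

With the link present, the retrieval cost no longer depends on how many differences have accumulated in the original document.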

For a document that is updated (re-registered) frequently, the updated document is registered in the document part 421 as an original-text document every time re-registration is requested, so the storage area for the document part 421 needs a large capacity. It is therefore advisable to compress the original-text document before registering it in the document part 421.
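A minimal sketch of compressing an original-text document before registration follows. The description does not name a compression method; zlib (DEFLATE) from the Python standard library is used here purely as an assumption, and both function names are hypothetical.

```python
import zlib


def compress_original_text(content: str) -> bytes:
    """Compress an original-text document before storing it."""
    return zlib.compress(content.encode("utf-8"))


def decompress_original_text(blob: bytes) -> str:
    """Restore the original-text document at retrieval time."""
    return zlib.decompress(blob).decode("utf-8")


# A repetitive sample text, chosen only so that compression is visible.
sample = "high-speed search XML management system\n" * 100
stored = compress_original_text(sample)
restored = decompress_original_text(stored)
```

Compression trades storage capacity for decompression work at retrieval time, so the gain depends on document size and content.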

[Modification]
Next, a modification of the above embodiment will be described.
The document registration process applied in the above embodiment assumes that documents are updated on a document basis. However, a registration process similar to that of the above embodiment can also be applied when a document is partially updated.

In the case of a partial update of a document, the difference extraction unit 52 uses the update data of the partially updated portion as the difference data as is. In this modification, the time the difference extraction unit 52 needs for difference extraction is therefore shorter than in the above embodiment. However, unlike a document-basis update, a partial update provides no re-registered document (update document) that could be registered in the document part 421 as an original-text document, so the original-text registration unit 543 needs to perform the following processing.

In the case of a partial update of a document, the original-text registration unit 543 retrieves from the document part 421 the original-text document that is linked, by an original-text document link, to the document (original document) in which the partial update occurred. The original-text registration unit 543 then performs an update process (partial update process) that reflects the content of the partial update (the updated portion) in the retrieved original-text document. For the first partial update of an already registered document, no original-text document linked to the original document exists yet. In that case, the original-text registration unit 543 retrieves the original document from the document part 421 and reflects the updated portion in the retrieved original document.

The original-text registration unit 543 registers the document in which the updated portion is reflected (the original-text document or the original document) in the document part 421 as a new original-text document. When the original document is found by a search, this new original-text document is the target of document retrieval.
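The partial-update handling of the original-text registration unit can be sketched as follows. The record layout, the representation of a partial update as an (old fragment, new fragment) pair, and the function name `register_partial_update` are all assumptions for illustration. Whether a superseded original-text document is deleted is not stated in the description, so this sketch simply leaves it in place.

```python
def register_partial_update(document_part, doc_id, old_fragment, new_fragment, new_id):
    """Apply a partial update and register a new original-text document."""
    record = document_part[doc_id]
    link = record.get("original_text_link")
    if link is not None:
        base = document_part[link]["content"]  # existing original-text document
    else:
        base = record["content"]               # first partial update: use the original document
    updated = base.replace(old_fragment, new_fragment, 1)
    document_part[new_id] = {"content": updated}
    record["original_text_link"] = new_id      # retrieval now targets the new document
    return updated


document_part = {400: {"content": "high-speed search DB processing system"}}
first = register_partial_update(document_part, 400, "DB", "XML", 430)
second = register_partial_update(document_part, 400, "processing", "management", 431)
print(second)  # high-speed search XML management system
```

The second call starts from the linked original-text document rather than from the original document, so earlier partial updates are carried forward.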

Thus, in the modification of the above embodiment, even in the case of a partial update, a document reflecting the content of the partial update (the updated portion) is obtained from the already registered original-text document or original document and is registered in the document part 421 as an original-text document. This prevents an already registered document from being fragmented by partial updates. At retrieval time, retrieving the original-text document in which the updated portion is reflected speeds up document retrieval.

In the above embodiment, differences (changed portions) are detected in units of words (vocabulary items). However, for a document having a structure (a structured document), such as an XML (Extensible Markup Language) document in which a hierarchical structure is described using tags, detecting differences word by word yields a large number of changed portions and complicates the document linking process. Considering that a structured document is in many cases updated (changed) in units of elements described by tags, the registration process for structured documents should preferably detect differences in units of elements.

FIG. 8 shows an example of an XML document, a representative type of structured document. The XML document shown in FIG. 8 includes an element (name element) 81. The name element 81 includes the ＜名称＞ (name) tag and the character string (text) “high-speed search DB processing system”, which is the content (value) of the element 81. If the XML document shown in FIG. 8 is updated so that “DB” in the character string “high-speed search DB processing system” contained in the name element 81 is changed to “XML”, and registration of the resulting XML document is requested, the difference extraction unit 52 only needs to extract the name element 81 as the difference (changed portion).
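Element-level difference detection can be sketched with Python's standard `xml.etree.ElementTree`. The sample documents, the ASCII tag names `system`, `name`, and `vendor`, and the function `changed_elements` are hypothetical stand-ins, since the full content of FIG. 8 is not reproduced here. The sketch also assumes both versions have the same child elements in the same order.

```python
import xml.etree.ElementTree as ET


def changed_elements(old_xml, new_xml):
    """Return the child elements of the new document that differ from the old one.

    Assumption: both roots hold the same children in the same order.
    """
    old_root = ET.fromstring(old_xml)
    new_root = ET.fromstring(new_xml)
    changed = []
    for old_child, new_child in zip(old_root, new_root):
        # Compare serialised elements: any change inside an element marks
        # the whole element as the difference.
        if ET.tostring(old_child) != ET.tostring(new_child):
            changed.append(new_child)
    return changed


old_doc = (
    "<system><name>high-speed search DB processing system</name>"
    "<vendor>Example</vendor></system>"
)
new_doc = (
    "<system><name>high-speed search XML processing system</name>"
    "<vendor>Example</vendor></system>"
)
diff = changed_elements(old_doc, new_doc)
print([element.tag for element in diff])  # ['name']
```

The whole name element becomes a single difference, so only one link is needed however many words inside the element change.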

The present invention is not limited to the above embodiment or its modification as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment or its modification. For example, some constituent elements may be deleted from the full set of constituent elements shown in the embodiment or its modification.

FIG. 1 is a block diagram showing the hardware configuration of a client-server system according to an embodiment of the present invention.
FIG. 2 is a block diagram mainly showing the functional configuration of the database management system shown in FIG. 1.
FIG. 3 is a flowchart showing the procedure of the document registration process applied in the embodiment.
FIG. 4 is a diagram showing an example of the state of the document part and the index part in the database.
FIG. 5 is a diagram showing the state of the document part and the index part immediately after the difference data “XML” is registered in the state shown in FIG. 4.
FIG. 6 is a diagram showing the state of the document part and the index part immediately after the difference data “management” is registered in the state shown in FIG. 5.
FIG. 7 is a diagram showing the state of the document part immediately after the original-text document has been registered and the link from the original document to the original-text document has been set in the state shown in FIG. 6.
FIG. 8 is a diagram showing an example of an XML document for which differences are detected in units of elements.

Explanation of symbols

10: database server; 20: client terminal; 30: network; 40: external storage device; 41: database management program; 42: database; 50: database management system; 51: search unit; 52: difference extraction unit; 53: index processing unit; 54: registration unit; 55: document search unit; 531: index update unit; 532: index creation unit; 541: difference registration unit; 542: linking unit; 543: original-text registration unit; 544: document registration unit.

Claims

A database management system for managing a database that stores structured documents, in each of which a hierarchical structure is described using tags, and indexes used to search the structured documents, the database management system comprising:
difference extraction means for, when registration of a structured document in the database is requested and an original structured document corresponding to the structured document to be registered, which includes a changed portion, is already registered in the database, extracting as difference data the element of the structured document to be registered that includes the portion changed relative to the original structured document;
index update means for updating the index only for the element in which the change has occurred;
difference registration means for registering only the difference data extracted by the difference extraction means in the database;
original-text registration means for registering the structured document to be registered in the database as an original-text structured document; and
linking means for setting, in the original structured document, a link to the difference data registered in the database and a reverse link to the index relating to the difference data, in response to registration of the difference data in the database, and for setting, in the original structured document, a link to the original-text structured document, which is used to access the original-text structured document corresponding to the original structured document when the original structured document is retrieved, in response to registration of the original-text structured document in the database,
wherein, when a partial update occurs in the original structured document registered in the database or in the original-text structured document linked to the original structured document, the difference extraction means uses the update data of the element including the partially updated portion as the difference data as is, and
the original-text registration means retrieves the original structured document or the original-text structured document in which the partial update has occurred, reflects the content of the partial update in the retrieved original structured document or original-text structured document, and registers in the database a new original-text structured document in which the content of the partial update is reflected.

The database management system according to claim 1, wherein the index update means deletes from the database the index that points to the original structured document and was created based on the element in the original structured document corresponding to the extracted difference data, and creates an index pointing to the structured document to be registered and registers it in the database.

A program for causing a database server computer, which manages a database storing structured documents, in each of which a hierarchical structure is described using tags, and indexes used to search the structured documents, to execute:
a step of, when registration of a structured document in the database is requested and an original structured document corresponding to the structured document to be registered, which includes a changed portion, is already registered in the database, extracting as difference data the element of the structured document to be registered that includes the portion changed relative to the original structured document;
a step of updating the index only for the element including the changed portion;
a step of registering only the extracted difference data in the database;
a step of registering the structured document to be registered in the database as an original-text structured document;
a step of setting, in the original structured document, a link to the difference data registered in the database and a reverse link to the index relating to the difference data, in response to registration of the difference data in the database;
a step of setting, in the original structured document, a link to the original-text structured document, which is used to access the original-text structured document registered in the database and corresponding to the original structured document when the original structured document is retrieved, in response to registration of the original-text structured document in the database;
a step of, when a partial update occurs in the original structured document registered in the database or in the original-text structured document linked to the original structured document, using the update data of the element including the partially updated portion as the difference data as is; and
a step of retrieving the original structured document or the original-text structured document in which the partial update has occurred, reflecting the content of the partial update in the retrieved original structured document or original-text structured document, and registering in the database a new original-text structured document in which the content of the partial update is reflected.

The program according to claim 3, wherein, in the updating step, the index that points to the original structured document and was created based on the element in the original structured document corresponding to the extracted difference data is deleted from the database, and an index pointing to the structured document to be registered is created based on the difference data.