JP4623446B2

JP4623446B2 - Data management program and data management system

Info

Publication number: JP4623446B2
Application number: JP2004199927A
Authority: JP
Inventors: 敬史田島; 佳紀福井
Original assignee: 敬史田島
Priority date: 2004-06-08
Filing date: 2004-06-08
Publication date: 2011-02-02
Anticipated expiration: 2024-06-08
Also published as: JP2005353024A

Description

The present invention relates to a program and a system for settings in which part of the data stored in a computer is retrieved in response to a user's request and, optionally, the retrieved data is then transmitted over a network or the like, presented to a user, or saved to a storage medium. By performing retrieval, transmission, presentation, and storage with the redundancy in the data removed, the program and system reduce the cost of retrieval processing, transmission processing, and storage space, and improve the user's browsing convenience.

Consider a data management system that, in response to a user's data retrieval request, retrieves from the data it manages the set of data that answers the request, that is, the solution set. Systems of this kind that take the removal of duplicates from the solution set into account can be classified along the following four axes.

(1) Whether the system considers only duplication in which the same data item appears as a solution more than once, or, when the target data has a nested structure that can be represented as a tree, also considers duplication in which one solution appears as a subtree of another solution.

(2) Whether the system removes duplicates by examining the retrieved data to find them, or by analyzing the given retrieval requests, synthesizing modified requests that retrieve the data in a duplicate-free form, and retrieving the data with those modified requests.

(3) Whether the system considers only duplication within the solution set of a single retrieval request, or, when several retrieval requests are received at the same time, also considers duplication between the solutions of different requests.

(4) Whether the system can, when needed, synthesize the solution set exactly as originally requested, duplicates included, from the data with duplicates removed.

Combining these four axes gives 16 possible cases. Prior art exists for some of the eight cases that, on axis (1), consider only duplication in which the same data item is a solution more than once.

First, for systems that find duplicates by examining the retrieved data and that consider only a single retrieval request, Patent Document 1 and Patent Document 2 below disclose systems that present search results in response to a user's search request, examine the contents of the results before presenting them, and, when the same data item is included more than once, delete all but one copy. Patent Document 1 also mentions handling multiple queries with the same approach of examining the retrieved data to find duplicates.
Patent Document 1: JP 07-295994 A
Patent Document 2: JP 10-143523 A

As an example of a technique that removes duplicates by rewriting the retrieval request and that considers only a single request, the following rewrite is widely known. Suppose a relational database is given an SQL query expressing the request "retrieve the union of the set of employee numbers of employees aged 60 or over and the set of employee numbers of employees with 40 or more years of service":
(SELECT employee_number FROM employee WHERE age >= 60)
UNION
(SELECT employee_number FROM employee WHERE years_of_service >= 40)
This query is rewritten into an SQL query expressing the request "retrieve the set of employee numbers of employees who are aged 60 or over or have 40 or more years of service":
SELECT employee_number FROM employee
WHERE age >= 60 OR years_of_service >= 40
The rewrite eliminates the possibility of retrieving the same employee twice.
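The rewrite above can be checked on a toy table. The following is a minimal sketch, not part of the original disclosure: the table contents are made up, the column names are the English renderings used above, and it assumes that employee_number is a unique key (without that assumption the OR form could return a number more than once, whereas UNION never does). The parentheses around the two SELECTs are dropped because SQLite's compound SELECT syntax does not take them.

```python
import sqlite3

# In-memory toy database; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_number INTEGER PRIMARY KEY, age INTEGER, years_of_service INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, 62, 41), (2, 61, 10), (3, 58, 40), (4, 35, 12)],
)

# Original request: union of two separate retrievals.
union_form = """
SELECT employee_number FROM employee WHERE age >= 60
UNION
SELECT employee_number FROM employee WHERE years_of_service >= 40
"""

# Rewritten request: one retrieval with a disjunctive condition.
or_form = """
SELECT employee_number FROM employee
WHERE age >= 60 OR years_of_service >= 40
"""

union_rows = sorted(row[0] for row in conn.execute(union_form))
or_rows = sorted(row[0] for row in conn.execute(or_form))
print(union_rows, or_rows)
```

Employee 1 satisfies both conditions, so the union form touches that row in both sub-queries, while the OR form visits it once.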

As an example of a technique that removes duplicates by rewriting retrieval requests and that considers multiple requests, Non-Patent Document 1 below discloses a technique referred to here as the "reminder query" (the paper itself uses the term "remainder query"). Suppose an earlier request was "retrieve the set of employee numbers of employees aged 60 or over", and that the request and its result data were saved. If a later request is "retrieve the set of employee numbers of employees aged 50 or over", it is rewritten into the request "retrieve the set of employee numbers of employees aged 50 or over and under 60", which fetches only the difference from the solution set of the earlier request. The union of this result and the saved set of employee numbers of employees aged 60 or over then yields the set of employee numbers of employees aged 50 or over. This avoids fetching again the information on employees aged 60 or over, which has already been retrieved. As the example shows, the reminder query also has the capability described under axis (4): it synthesizes the data that answers the original request from data retrieved with the redundancy removed.
Non-Patent Document 1: S. Dar et al., "Semantic Data Caching and Replacement", Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 330-341, 1996.
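The age example above can be sketched as follows. This is an illustrative sketch only, not the implementation of Non-Patent Document 1: the `EMPLOYEES` table and the `fetch` helper are hypothetical stand-ins for a database and its query interface.

```python
# Hypothetical data: employee number -> age.
EMPLOYEES = {101: 63, 102: 60, 103: 55, 104: 50, 105: 42}

def fetch(min_age, max_age=None):
    """Stand-in for a database query: numbers with min_age <= age < max_age."""
    return {
        number
        for number, age in EMPLOYEES.items()
        if age >= min_age and (max_age is None or age < max_age)
    }

# Earlier request "age >= 60": the result is saved together with its condition.
cached_min_age = 60
cached = fetch(cached_min_age)

# Later request "age >= 50": only the part not covered by the saved result
# (50 <= age < 60) is fetched, then merged with the saved result.
remainder = fetch(50, cached_min_age)
answer = cached | remainder
print(sorted(remainder), sorted(answer))
```

Only two rows are fetched for the second request instead of four, and the merged result equals what the unmodified request would have returned.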

Similarly, Non-Patent Document 2 below reports research that applies the reminder query technique to the management of XML data, which is tree-structured. However, although the data it handles are tree-structured, the technique of Non-Patent Document 2 considers only the case where an element in the solution set of a past request is also a solution of the new request. It does not consider the case where a solution of the new request is a subtree of a solution of a past request, so on axis (1) it falls into the same class as the ordinary reminder query technique.
Non-Patent Document 2: L. Chen, E. A. Rundensteiner, "ACE-XQ: A CachE-aware XQuery Answering System", Proceedings of the International Workshop on the Web and Databases (WebDB), pp. 31-36, 2002.

Separately, Patent Document 3 below discloses a technique for a kind of redundancy that differs from tree-structured data appearing as a subtree of other tree-structured data: when different elements in the solution set of a request share only part of their values, the technique removes the redundancy of the shared part.
Patent Document 3: JP 07-121571 A
In this technique, if the retrieved data are tabular, for example
Sato, General Affairs Department
Tanaka, General Affairs Department
Takeda, Human Resources Department
Ishii, Human Resources Department
the redundancy is removed by rewriting the data, using the sign "〃" to denote "same value", as
Sato, General Affairs Department
Tanaka, 〃
Takeda, Human Resources Department
Ishii, 〃
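The ditto-mark rewriting can be sketched as below. This is only an illustration of the idea as summarized here; the actual procedure of Patent Document 3 is not described in this text, and the `compress` and `expand` helpers are hypothetical. The sketch assumes that no real value equals the ditto sign itself.

```python
DITTO = "〃"  # sign meaning "same value as the row above"

def compress(rows):
    """Replace each value that repeats the value directly above it with DITTO."""
    out, previous = [], None
    for row in rows:
        if previous is None:
            out.append(tuple(row))
        else:
            out.append(tuple(DITTO if v == p else v for v, p in zip(row, previous)))
        previous = row  # always compare against the original (uncompressed) row
    return out

def expand(rows):
    """Inverse of compress: restore each DITTO from the already expanded row above."""
    out = []
    for row in rows:
        if out:
            row = tuple(p if v == DITTO else v for v, p in zip(row, out[-1]))
        out.append(tuple(row))
    return out

table = [
    ("Sato", "General Affairs Department"),
    ("Tanaka", "General Affairs Department"),
    ("Takeda", "Human Resources Department"),
    ("Ishii", "Human Resources Department"),
]
packed = compress(table)
for row in packed:
    print(", ".join(row))
```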

However, regarding axis (1) above, no technique has been disclosed so far that also considers the kind of duplication that arises when managing data representable as a tree, in which one solution also appears as a subtree forming part of another solution. Consequently, retrieving, transmitting, presenting, or storing data that contains such duplication needlessly increases processing cost and required storage capacity and reduces the user's browsing convenience.

Moreover, even when only duplication in which the same data item appears as a solution more than once is considered, the reminder query technique described above cannot always retrieve the data with duplicates removed and then synthesize the original solution set. For example, suppose an earlier request was "retrieve the set of employee numbers of employees aged 40 or over and under 60", its result has been saved, and a new request "retrieve the set of employee numbers of employees aged 30 or over and under 50" is given. The employee numbers of employees aged 40 or over and under 50 are all already contained in the saved data, so retrieving them again is redundant. However, the saved data hold only the employee numbers of employees aged 40 or over and under 60, and from these alone it is impossible to tell which numbers belong to employees under 50. The only option is therefore to retrieve the data again with the full request "retrieve the set of employee numbers of employees aged 30 or over and under 50".

This problem is unavoidable in the situation that the reminder query technique normally assumes, in which, at the time the request with the condition "aged 40 or over and under 60" is given, it cannot be predicted that a request with the condition "aged 30 or over and under 50" will follow. When the two requests are given at the same time, however, they can be split into three requests, "aged 30 or over and under 40", "aged 40 or over and under 50", and "aged 50 or over and under 60", and executed in that form. The data are then obtained without duplicates, and it is easy to generate from them the sets originally requested: the employee numbers for "aged 40 or over and under 60" and those for "aged 30 or over and under 50". Hence, when multiple retrieval requests are given simultaneously, as in this example, simply applying the reminder query technique can fetch duplicate data unnecessarily and increase processing cost, even though a method exists that avoids fetching duplicates and still allows the original solution sets to be synthesized afterwards.
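The splitting of the two simultaneous age-range requests into three disjoint requests can be sketched as follows. This is a minimal sketch under stated assumptions: the employee data, the request names `q1` and `q2`, and the `fetch` helper are hypothetical, and the splitting rule shown (cutting at every range boundary) is one simple way to obtain the three ranges named in the text, not necessarily the procedure of the invention.

```python
# Hypothetical data: employee number -> age.
EMPLOYEES = {1: 33, 2: 38, 3: 41, 4: 47, 5: 52, 6: 59, 7: 64}

def fetch(lo, hi):
    """Stand-in for a database query: numbers with lo <= age < hi."""
    return {number for number, age in EMPLOYEES.items() if lo <= age < hi}

# Two requests given at the same time, as half-open age ranges.
requests = {"q1": (40, 60), "q2": (30, 50)}

# Cut the age axis at every boundary: 30, 40, 50, 60.
points = sorted({bound for rng in requests.values() for bound in rng})

# Fetch each elementary range once and record which requests it belongs to.
pieces = {}
for lo, hi in zip(points, points[1:]):
    owners = frozenset(n for n, (a, b) in requests.items() if a <= lo and hi <= b)
    if owners:
        pieces[(lo, hi)] = (owners, fetch(lo, hi))

def recompose(name):
    """Union of the pieces that belong to the named request."""
    result = set()
    for owners, rows in pieces.values():
        if name in owners:
            result |= rows
    return result

fetched = sum(len(rows) for _, rows in pieces.values())
print(sorted(pieces), fetched)
```

With this data the three pieces hold six rows in total, whereas running the two original requests separately would fetch eight, because the employees aged 40 to 49 would be fetched twice.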

In view of the above problems, an object of the present invention is to provide a data management program and system that, whenever one or more data retrieval requests are given at the same time, remove duplicates in every case, including duplication in which one solution appears as a subtree forming part of another solution in tree-structured data, and that can synthesize the solution sets of the original retrieval requests, when needed, from the data with the duplicates removed. This reduces the processing cost of data retrieval and transmission, saves storage capacity for data storage, and improves browsing convenience when data are presented to the user.

When given a set of data retrieval requests, the data management program and system of the present invention generate a classification of all data that answer any of the requests, based on which requests' solution sets each data item belongs to and which it does not. This removes the redundancy of the same data appearing in the solution sets of several requests and, as described later, allows the solution sets of the original requests to be synthesized from the classes.

The data management program and system of the present invention either generate this classification after retrieving each solution set with the given retrieval requests, or retrieve each class directly using retrieval requests synthesized from the given ones. In the latter case, each class is retrieved with a request corresponding to the difference between two things: the intersection of all requests whose solution sets are to contain the class, and the union of all requests whose solution sets are not to contain it.
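For SQL requests, the synthesized request for a class can be sketched with INTERSECT and EXCEPT. This is an illustrative sketch, not the patented implementation: the `class_query` helper, the request names, and the table contents are hypothetical. Chaining several EXCEPT clauses removes the union of the excluded requests, since each EXCEPT is applied to the running result.

```python
import sqlite3
from itertools import combinations

def class_query(queries, included):
    """SQL for the rows that answer every request in `included` and no other request."""
    inside = [sql for name, sql in queries.items() if name in included]
    outside = [sql for name, sql in queries.items() if name not in included]
    text = " INTERSECT ".join(inside)
    for sql in outside:
        text += " EXCEPT " + sql
    return text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, 33), (2, 41), (3, 47), (4, 52), (5, 64)],
)

queries = {
    "q1": "SELECT employee_number FROM employee WHERE age >= 40 AND age < 60",
    "q2": "SELECT employee_number FROM employee WHERE age >= 30 AND age < 50",
}

# One synthesized request per non-empty subset of the given requests.
classes = {}
names = list(queries)
for size in range(1, len(names) + 1):
    for included in combinations(names, size):
        rows = conn.execute(class_query(queries, included))
        classes[frozenset(included)] = {row[0] for row in rows}
print(classes)
```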

As stated above, the data management program and system of the present invention can synthesize the solution set of an original retrieval request from these classes when needed. The synthesis takes the union of all classes designated as contained in the solution set of that request.

Furthermore, when given a request to retrieve a set of subtrees located at various depths of tree-structured data, if one of the subtrees retrieved by the request also appears as a subtree forming part of another subtree retrieved by the same request, the data management program and system of the present invention remove the former subtree from the solution set of the request. Here, one element being a subtree of another includes the case where the two elements are exactly the same tree. This removes the redundancy, specific to tree-structured data, that is caused by duplication as subtrees within the solution set of the request.

Similarly, when given several requests to retrieve sets of subtrees located at various depths of tree-structured data, if one of the subtrees retrieved by one of the requests also appears as a subtree forming part of a subtree retrieved by another of the requests, the data management program and system of the present invention remove the former subtree from the solution set of the corresponding request. Here too, one element being a subtree of another includes the case where the two elements are exactly the same tree. This removes the redundancy, specific to tree-structured data, that is caused by duplication as subtrees between the solution sets of the requests.

The data management program and system of the present invention remove such duplicate subtrees either after retrieving the subtree sets with the given retrieval requests, or by directly retrieving subtree sets from which duplicates have already been removed, using retrieval requests synthesized from the given ones.

The data management program and system of the present invention also have a function for synthesizing, from the subtree sets with duplicates removed in this way, the subtree sets that answer the original retrieval requests.
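For a single request, the removal of nested solutions and the later synthesis of the original solution set can be sketched as follows. This is a simplified illustration, not the method of the invention: it assumes the request is "every element with a given tag, at any depth", uses a made-up XML document, and the `outermost` and `recompose` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up document in which some <section> solutions are nested in others.
DOC = """
<book>
  <section id="1">
    <section id="1.1"><para>a</para></section>
    <section id="1.2"><para>b</para></section>
  </section>
  <section id="2"><para>c</para></section>
</book>
"""

def solutions(root, tag):
    """Original request: every subtree whose root has the given tag."""
    return list(root.iter(tag))

def outermost(root, tag):
    """Solutions that are not nested inside another solution."""
    kept = []
    def walk(node):
        if node.tag == tag:
            kept.append(node)  # nested solutions travel inside this subtree
            return
        for child in node:
            walk(child)
    walk(root)
    return kept

def recompose(kept, tag):
    """Rebuild the original solution set by re-applying the request inside each kept subtree."""
    result = []
    for subtree in kept:
        result.extend(subtree.iter(tag))  # iter() yields the subtree root first
    return result

root = ET.fromstring(DOC)
full = solutions(root, "section")
kept = outermost(root, "section")
rebuilt = recompose(kept, "section")
print([e.get("id") for e in kept], [e.get("id") for e in rebuilt])
```

Only sections 1 and 2 need to be stored or transmitted; sections 1.1 and 1.2 are recovered from inside section 1 when the original solution set is needed.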

Furthermore, when a request to retrieve a set of subtrees located at various depths of tree-structured data is given as a single path expression, the data management program and system of the present invention generate a classification of all subtrees that are solutions of the path expression and are not subtrees forming part of another solution of the path expression, based on which prefixes of the path expression each subtree matches and which it does not. This removes the redundancy, specific to tree-structured data, that is caused by duplication as subtrees within the solution set of the request and, as described later, allows the solution set of the original request to be synthesized from the classes. Here, a path expression specifies the nodes to be retrieved from tree-structured data by describing the conditions that each node must satisfy on the path that starts at the root node of the tree and ends at the node to be retrieved. The conditions are normally written in order, starting from the node nearest the root.

Similarly, when several requests to retrieve sets of subtrees located at various depths of tree-structured data are given as path expressions, the data management program and system of the present invention generate a classification of all subtrees that are solutions of any of the path expressions and are not subtrees forming part of another solution of any of the path expressions. The classification is based on two points: which path expressions' solution sets each subtree belongs to and which it does not, and which prefixes of which path expressions each subtree matches and which it does not. This removes the redundancy, specific to tree-structured data, that is caused by duplication as subtrees between the solution sets of the requests and, as described later, allows the solution sets of the original requests to be synthesized from the classes.

The data management program and system of the present invention either generate the above classification after retrieving all the data with the given path expressions, or retrieve each class using path expressions synthesized from the given ones. In the latter case, when a single path expression is given, the path expression that retrieves each class corresponds to the difference between the intersection of the given path expression and all prefixes the class is to match, and the union of all prefixes the class is not to match. When several path expressions are given, the path expression that retrieves each class corresponds to the difference between the intersection of all requests whose solution sets are to contain the class and all prefixes the class is to match, and the union of all requests whose solution sets are not to contain the class and all prefixes the class is not to match.

As stated above, the data management method and system of the present invention can synthesize the solution set of an original path expression from these classes when needed. The synthesis takes the union of all of the following: every class designated as contained in the solution set of the path expression, and, for each prefix of the path expression, the set of subtrees retrieved from the subtrees in each class designated as matching that prefix by applying the suffix that pairs with the prefix in the path expression.

When several search expressions whose solutions partly overlap are given at the same time, retrieving the data by the method described above makes it possible to retrieve data without duplicates even in cases that the conventional reminder query technique cannot handle. Compared with conventional methods and systems, this reduces the processing cost of data retrieval and transmission, saves storage capacity for data storage, and improves the browsing convenience of the user to whom the data are presented.

In addition, the kind of duplication that arises when managing tree-structured data, in which one solution also appears as a subtree forming part of another solution, can likewise be removed before the data are retrieved, transmitted, presented, or stored. Compared with conventional methods and systems, this reduces the processing cost of retrieving and transmitting tree-structured data, saves storage capacity for data storage, and improves the browsing convenience of the user to whom the data are presented.

Furthermore, because the data management method or system of the present invention can synthesize the data exactly as originally requested from the data with duplicates removed, it is possible, for example, to transmit the data over a network or the like with duplicates removed and have the receiving side synthesize the data requested by the original retrieval requests. This saves network transmission cost while keeping what is presented to the user exactly as originally requested. Similarly, by saving the data with duplicates removed and synthesizing the data requested by the original retrieval requests when they are needed later, storage capacity for data storage can be saved while keeping what is presented to the user exactly as originally requested.

(First Embodiment)
Next, the first embodiment of the present invention is described in detail. This description covers an embodiment in which the present invention is applied to a relational database that receives data retrieval requests as SQL search expressions, SQL being a JIS standard. In this embodiment, the solution sets of the given search expressions may contain elements in common. The system therefore creates a classification of all data contained in the solution set of any of the given search expressions, based on which search expressions' solutions each data item belongs to and which it does not. This removes the redundancy of the same data appearing in the solution sets of several retrieval requests and, as described later, allows the solutions of the original search expressions to be synthesized from the classes when needed. On the other hand, because the data handled are not tree-structured, this embodiment need not consider the kind of duplication in which an element of a solution set also appears as a subtree of another element in the same or a different solution set.

FIG. 2 shows the overall configuration of the system in this embodiment. The system comprises: a storage device 4 that stores relational data; an input device 1 that receives search expressions written in SQL from a user; a search device 3 that receives the one or more search expressions received by the input device 1 and, for each of them, retrieves from the relational data in the storage device 4 the set of data that answers it; a duplicate removal device 10 that receives the data retrieved by the search device 3 and generates the classes described above from them; and an output device 6 that receives and outputs the data generated by the duplicate removal device 10.

The following describes the processing performed in this embodiment when the input device receives a set of search expressions. As an example, consider the case where three search expressions are given to the input device. Their solution sets may contain elements in common. In general, when three search expressions are given and their solution sets are A1, A2, and A3, the relationship among the three sets is as shown in the Venn diagram of FIG. 3. In this system, therefore, when the duplicate removal device receives A1, A2, and A3, it classifies the data they contain into seven groups corresponding to the seven regions of the Venn diagram in FIG. 3. That is, it generates the seven sets defined by the following expressions:
A1 ∩ A2 ∩ A3
(A1 ∩ A2) - A3
(A1 ∩ A3) - A2
(A2 ∩ A3) - A1
A1 - (A2 ∪ A3)
A2 - (A1 ∪ A3)
A3 - (A1 ∪ A2)
Here ∩ denotes set intersection, ∪ denotes set union, and - denotes set difference. It is clear from the Venn diagram in FIG. 3 that these seven groups share no elements.
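The seven expressions can be evaluated directly on small example sets. The following sketch uses made-up contents for A1, A2, and A3 and checks that the seven groups are pairwise disjoint and together cover every element exactly once.

```python
from itertools import combinations

# Made-up solution sets for three search expressions.
A1 = {1, 2, 3, 4}
A2 = {3, 4, 5, 6}
A3 = {4, 6, 7}

# The seven groups, one per region of the three-set Venn diagram.
regions = [
    A1 & A2 & A3,
    (A1 & A2) - A3,
    (A1 & A3) - A2,
    (A2 & A3) - A1,
    A1 - (A2 | A3),
    A2 - (A1 | A3),
    A3 - (A1 | A2),
]

# No element appears in two groups, and no element is lost.
disjoint = all(not (x & y) for x, y in combinations(regions, 2))
covered = set().union(*regions) == (A1 | A2 | A3)
print(regions, disjoint, covered)
```

A group may be empty, as (A1 ∩ A3) - A2 is with this data. The three solution sets hold 11 entries in total, while the seven groups hold 7.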

More generally, when n search expressions are received and their solution sets are A1, ..., An, it suffices to compute, for every non-empty subset T of S = {A1, ..., An}, the set defined by

$$\Bigl(\bigcap_{A \in T} A\Bigr) - \Bigl(\bigcup_{A \in S - T} A\Bigr)$$

Put another way, this amounts to classifying all the data contained in any of A1, ..., An according to which of A1, ..., An contain it and which do not.
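The classification by membership described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `partition_by_membership`, the 0-based indices standing for A1, A2, A3, and the sample names are all invented here and do not come from the patent.

```python
from collections import defaultdict


def partition_by_membership(solution_sets):
    """Split the union of the solution sets into disjoint classes.

    Each class is keyed by the frozenset of indices of the solution sets
    that contain its elements (index 0 stands for A1, and so on).
    Classes that would be empty are simply absent from the result.
    """
    classes = defaultdict(set)
    for item in set().union(*solution_sets):
        key = frozenset(i for i, s in enumerate(solution_sets) if item in s)
        classes[key].add(item)
    return dict(classes)


# Hypothetical solution sets, invented for this example.
A1 = {"Sato", "Suzuki", "Tanaka", "Ito"}
A2 = {"Suzuki", "Tanaka", "Kato"}
A3 = {"Tanaka", "Ito", "Mori"}

classes = partition_by_membership([A1, A2, A3])
for key, items in sorted(classes.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(key), sorted(items))
```

Because every element receives exactly one key, the classes are disjoint by construction, and each element is stored once no matter how many solution sets contain it.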

When it is enough for the user to be able to browse all the data contained in any of the solution sets of the original search expressions, outputting the duplicate-free classes generated in this way spares the user from seeing the same data more than once, compared with browsing the solutions of the original three search expressions themselves, and so improves browsing convenience.

The solution sets of the original search expressions can also be reconstructed later from the classes generated in this way, as needed. For example, with the solution sets A1, A2, and A3 of the three search expressions above, the solution set A1 of the first search expression is obtained as the union of the following four sets.
A1 ∩ A2 ∩ A3
(A1 ∩ A2) − A3
(A1 ∩ A3) − A2
A1 − (A2 ∪ A3)
The same applies to A2 and A3. This too can easily be confirmed from FIG. 3.

More generally, a solution set Ai can be reconstructed using the following equation.

$$A_i = \bigcup_{T \subseteq S,\; A_i \in T} \Bigl( \bigcap_{A \in T} A - \bigcup_{A \in S - T} A \Bigr)$$

Put another way, this amounts to taking the union of all the classes that were recorded as being contained in Ai. That the equation holds follows from the well-known mathematical properties relating each region of a Venn diagram of several sets to the sets themselves.
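As a rough illustration of this reconstruction step, the sketch below rebuilds a solution set from disjoint classes. The dictionary layout (a frozenset of 0-based set indices mapped to the elements belonging to exactly those sets), the function name `recompose`, and the sample data are assumptions made for this example only.

```python
def recompose(classes, i):
    """Rebuild solution set number i (0-based) from disjoint classes.

    `classes` maps a frozenset of set indices to the elements that belong
    to exactly those sets; this layout is an assumption of the sketch.
    """
    result = set()
    for members, items in classes.items():
        if i in members:  # the class was recorded as part of set i
            result |= items
    return result


# Hypothetical classes for three solution sets A1, A2, A3 (indices 0, 1, 2).
classes = {
    frozenset({0, 1, 2}): {"Tanaka"},
    frozenset({0, 1}): {"Suzuki"},
    frozenset({0, 2}): {"Ito"},
    frozenset({0}): {"Sato"},
    frozenset({1}): {"Kato"},
    frozenset({2}): {"Mori"},
}

A1 = recompose(classes, 0)
print(sorted(A1))
```

Only the class keys are consulted, so the stored elements themselves never need to be compared during reconstruction.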

Accordingly, by saving each duplicate-free class generated by the above method to disk, or transmitting it to another computer over a network or the like, and only afterwards reconstructing the solution sets of the original search expressions and presenting them to the user, the storage area needed to save the data and the communication cost of transmitting it can be reduced without changing what is presented to the user from the solution sets of the original search expressions.

(Second embodiment)
Next, the second embodiment of the present invention is described in detail. As in the first embodiment, this embodiment applies the present invention to a relational database in which retrieval requests are made in SQL. The difference from the first embodiment is that, rather than generating the classes described above after the data has been retrieved, the data corresponding to those classes is retrieved directly by using search expressions synthesized from the given search expressions.

FIG. 4 shows the overall configuration of the system in this embodiment. The system comprises a storage device 4 that stores relational data; an input device 1 that receives search expressions written in SQL from the user; a search expression synthesizer 2 that synthesizes new search expressions from the one or more search expressions received by the input device 1; a search device 3 that receives the one or more search expressions generated by the search expression synthesizer 2 and, for each of them, retrieves from the relational data in the storage device 4 the set of data that is the solution of that expression; a search solution synthesizer 5 that receives the data retrieved by the search device 3 and reconstructs the solutions of the original search expressions; and an output device 6 that receives and outputs the data generated by the search solution synthesizer 5.

These components may all reside on the same computer. Alternatively, as shown in FIG. 4, only the storage device 4 and the search device 3 may reside on a computer 7 acting as a server, with the input device 1, the search expression synthesizer 2, the search solution synthesizer 5, and the output device 6 residing on a client computer 9 connected to the server 7 via a network 8.

The processing performed in this embodiment when the input device receives a set of search expressions is described below. As an example, suppose the following three search expressions are given.
SELECT name FROM employee WHERE age > 50
SELECT name FROM employee WHERE age > 40 AND affiliation = accounting
SELECT name FROM employee WHERE age > 30 AND affiliation = sales
The first search expression requests the names of all employees over the age of 50 from the list of employee data. The second requests the names of all employees who are over 40 and whose affiliation is accounting. Likewise, the third requests the names of all employees who are over 30 and whose affiliation is sales.

In this case, the solution sets of these search expressions may contain elements in common. For example, the name of an accounting employee over the age of 50 appears as an element of both the solution set of the first search expression and that of the second. Likewise, the name of a sales employee over the age of 50 appears as an element of both the solution set of the first search expression and that of the third.

In this system, therefore, the search expression synthesizer synthesizes, from these search expressions, search expressions that retrieve the solution sets expressed by the following expressions and sends them to the storage device, and the data is retrieved with them in a form free of duplicates.
A1 ∩ A2 ∩ A3
(A1 ∩ A2) − A3
(A1 ∩ A3) − A2
(A2 ∩ A3) − A1
A1 − (A2 ∪ A3)
A2 − (A1 ∪ A3)
A3 − (A1 ∪ A2)
Here A1, A2, and A3 denote the solution sets of the three search expressions above, ∩ denotes set intersection, ∪ denotes set union, and − denotes set difference. SQL search expressions that retrieve such solution sets can be written using the INTERSECT, UNION, and DIFFERENCE constructs of SQL. For example, the search expression that retrieves the set (A1 ∩ A2) − A3 is as follows.
(SELECT name FROM employee WHERE age > 50
INTERSECT
SELECT name FROM employee WHERE age > 40 AND affiliation = accounting)
DIFFERENCE
SELECT name FROM employee WHERE age > 30 AND affiliation = sales

Similarly, the search expression that retrieves the set A1 − (A2 ∪ A3) is as follows.
SELECT name FROM employee WHERE age > 50
DIFFERENCE
(SELECT name FROM employee WHERE age > 40 AND affiliation = accounting
UNION
SELECT name FROM employee WHERE age > 30 AND affiliation = sales)
The remaining search expressions are constructed in the same way.
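The two composed search expressions can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under several assumptions: the table contents are invented; the affiliation values are written as quoted string literals; and the set difference is spelled EXCEPT, which is the keyword SQLite and standard SQL use, where the text writes DIFFERENCE. SQLite does not accept a parenthesized compound SELECT at the top level, so the union is wrapped in a subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, age INTEGER, affiliation TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [  # hypothetical rows, invented for this example
        ("Sato", 55, "accounting"),
        ("Suzuki", 45, "accounting"),
        ("Tanaka", 52, "sales"),
        ("Ito", 35, "sales"),
        ("Kato", 60, "planning"),
    ],
)

Q1 = "SELECT name FROM employee WHERE age > 50"
Q2 = "SELECT name FROM employee WHERE age > 40 AND affiliation = 'accounting'"
Q3 = "SELECT name FROM employee WHERE age > 30 AND affiliation = 'sales'"

# (A1 ∩ A2) − A3: SQLite groups compound operators from left to right,
# so the intersection is evaluated before the difference.
in_a1_a2_only = f"{Q1} INTERSECT {Q2} EXCEPT {Q3}"

# A1 − (A2 ∪ A3): the union is wrapped in a subquery so that it is
# evaluated before the difference.
in_a1_only = f"{Q1} EXCEPT SELECT name FROM ({Q2} UNION {Q3})"


def names(sql):
    """Run a query and return the selected names as a set."""
    return {row[0] for row in conn.execute(sql)}


print(names(in_a1_a2_only))
print(names(in_a1_only))
```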

In fact, the search expression for (A1 ∩ A2) − A3 above can be transformed into the following simpler, equivalent SQL.
SELECT name FROM employee WHERE age > 50 AND affiliation = accounting
Moreover, since the affiliation field of a single employee record cannot be both accounting and sales at once, the search expressions for A1 ∩ A2 ∩ A3 and (A2 ∩ A3) − A1 are unnecessary, because their solution sets are always empty. However, techniques for simplifying a given search expression and for detecting search expressions whose solution is always empty are covered by prior art and are independent of the present invention, so they are not described in detail here.

That the solution sets of the seven search expressions created as described above share no elements can be seen intuitively, as noted in the description of the first embodiment, from the fact that these seven search expressions have as their solution sets the sets corresponding to the seven regions of the Venn diagram in FIG. 3, which depicts the relationship among A1, A2, and A3.

On receiving the solutions of the seven search expressions above, the search solution synthesizer uses them to reconstruct the solution sets A1, A2, and A3 of the original three search expressions. For example, the solution set A1 of the first search expression is obtained as the union of the following four solution sets.
A1 ∩ A2 ∩ A3
(A1 ∩ A2) − A3
(A1 ∩ A3) − A2
A1 − (A2 ∪ A3)
The same applies to A2 and A3. This too can easily be confirmed from FIG. 3.

Retrieving data with the search expressions synthesized in this way keeps the search device from fetching the same data from the storage device more than once, which improves the processing efficiency of the search device. For example, if the three given search expressions were simply executed in turn, the name of an employee who is over 50 and belongs to accounting would be fetched twice: once when the first search expression is executed and once when the second is executed. When the data is retrieved with the seven search expressions above, no data is fetched by the search device more than once.

Furthermore, when the search expression synthesizer and the search solution synthesizer reside on a client connected via a network or the like, the server sends the client data from which duplicates have been removed, and the solution sets of the original search expressions, duplicates included, are reconstructed on the client. The communication cost over the network can therefore be reduced.

If it is enough for the user to be able to browse all the data contained in the solution sets of the original search expressions, and the data need not be reassembled into the form of those solution sets, the search solution synthesizer may be switched off and the solutions of the synthesized search expressions sent to the output device and output as they are. The user is then spared from seeing the same data more than once, compared with browsing the solutions of the original three search expressions, and browsing convenience improves.

The description so far has used an example with three search expressions; in general, search expressions are synthesized by the following procedure. Suppose a set of search expressions S = {Q1, ..., Qn} is given. For every non-empty subset T of S, a search expression is created that is the difference between the intersection of all search expressions contained in T (that is, a search expression retrieving the data that satisfies the conditions of all of them) and the union of all search expressions not contained in T (that is, a search expression retrieving the data that satisfies the condition of any of them); in other words, a search expression retrieving the data that satisfies the former but not the latter. That is, for every non-empty subset T of S, the search expression Q(T) defined by

$$Q(T) = \Bigl(\bigcap_{Q \in T} Q\Bigr) - \Bigl(\bigcup_{Q \in S - T} Q\Bigr)$$

is obtained. Here ∩, ∪, and − denote the intersection, union, and difference of search expressions, which, as noted above, can be expressed in SQL with the INTERSECT, UNION, and DIFFERENCE constructs respectively. SQL search expressions are used here as the example of how retrieval requests are described, but the same intersection, union, and difference of retrieval requests can easily be formed in many other retrieval request description languages. Data is then retrieved using Q(T) for every non-empty subset T. The expression above is simply the expression for the set corresponding to each region of a Venn diagram of several sets, rewritten as an expression over search expressions; it is therefore clear that the solution sets of the search expressions it defines contain no duplicates.
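The general procedure for building Q(T) can be sketched as a small query generator and checked against SQLite. Everything specific here is an assumption of the sketch rather than part of the patent: the function name `synthesize`, the use of 0-based index tuples for T, the `SELECT * FROM (...)` wrapping (used because SQLite does not allow parenthesized compound SELECTs at the top level), the EXCEPT keyword in place of the text's DIFFERENCE, and the sample rows.

```python
import sqlite3
from itertools import combinations


def synthesize(queries):
    """Return {T: SQL text} for every non-empty index subset T of the queries."""
    def wrap(i):
        return f"SELECT * FROM ({queries[i]})"

    indices = range(len(queries))
    result = {}
    for size in range(1, len(queries) + 1):
        for T in combinations(indices, size):
            # Intersection of every query in T ...
            sql = " INTERSECT ".join(wrap(i) for i in T)
            rest = [i for i in indices if i not in T]
            if rest:
                # ... minus the union of every query outside T.
                union = " UNION ".join(wrap(i) for i in rest)
                sql += f" EXCEPT SELECT * FROM ({union})"
            result[T] = sql
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, age INTEGER, affiliation TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [  # hypothetical rows, invented for this example
        ("Sato", 55, "accounting"),
        ("Suzuki", 45, "accounting"),
        ("Tanaka", 52, "sales"),
        ("Ito", 35, "sales"),
        ("Kato", 60, "planning"),
    ],
)
queries = [
    "SELECT name FROM employee WHERE age > 50",
    "SELECT name FROM employee WHERE age > 40 AND affiliation = 'accounting'",
    "SELECT name FROM employee WHERE age > 30 AND affiliation = 'sales'",
]

# Solution set A(T) of each synthesized expression Q(T).
parts = {T: {r[0] for r in conn.execute(sql)}
         for T, sql in synthesize(queries).items()}
# Solution sets rebuilt from the parts, and the same sets queried directly.
rebuilt = [set().union(*(rows for T, rows in parts.items() if i in T))
           for i in range(len(queries))]
direct = [{r[0] for r in conn.execute(q)} for q in queries]
```

The number of synthesized expressions is 2^n − 1 for n input expressions, so in practice the pruning of always-empty expressions mentioned in the text matters as n grows.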

To reconstruct the solution of an original search expression Qi from those solutions, it suffices to take the union of the solution sets of all search expressions, among those used for retrieval, that were created from a T containing Qi. That is, writing Ai for the solution set of Qi and A(T) for the solution set of Q(T),

$$A_i = \bigcup_{T \subseteq S,\; Q_i \in T} A(T)$$

That this equation holds likewise follows from the well-known mathematical properties relating each region of a Venn diagram of several sets to the sets themselves.

When search expressions have been simplified or unnecessary ones excluded as described earlier, the search solution synthesizer reconstructs the solution of Qi by taking the union of the solution sets of all search expressions that were created from a T containing Qi, or their simplified forms, and that were not excluded.

(Third embodiment)
Next, the third embodiment of the present invention is described in detail. This description covers an embodiment in which the present invention is applied to an XML database system that stores XML data, a kind of data that can be represented as a tree structure, and that uses a kind of path expression called XPath (for details, see Non-Patent Document 3 below) to describe data retrieval requests against it.
J. Clark, S. DeRose, "XML Path Language XPath Version 2.0 - W3C Working Draft", http://www.w3.org/TR/xpath20

XPath is a language for describing requests that retrieve sets of subtrees rooted at nodes of arbitrary depth in an XML tree. In this embodiment, therefore, in addition to the kind of duplication seen in the first and second embodiments, where the same data appears in the solution sets of several search expressions, a second kind can occur: a subtree retrieved by a retrieval request appears again as part of another subtree retrieved by the same request or by another request given at the same time. This embodiment therefore removes this kind of duplication as well. In this embodiment, these duplicates are found by examining the retrieved data.

The overall configuration of the XML database system in this embodiment is as shown in FIG. 2, as in the first embodiment. The XML database system comprises a storage device 4 that stores the tree structure of a large body of XML data; an input device 1 that receives search expressions written in XPath from the user; a search device 3 that receives the one or more search expressions accepted by the input device 1 and, for each search expression, retrieves from the XML data in the storage device 4 the set of subtrees that is the solution of that expression; a deduplication device 10 that receives the data retrieved by the search device 3, examines it, and removes the duplicates it finds; and an output device 6 that receives and outputs the data generated by the deduplication device 10.

The processing performed in this embodiment when the input device receives a search expression or a set of search expressions is described below. For example, suppose the input device receives the following set of three XPath search expressions.
/a/b
/a/c/*
/a/*/d
The first, /a/b, requests the nodes reached by starting at the root of the stored XML tree, moving to its child nodes with tag name a, and then to their child nodes with tag name b. The * in the second search expression stands for "any tag name": /a/c/* requests the nodes reached by starting at the root, moving to its child nodes with tag name a, then to their child nodes with tag name c, and then to the children of those nodes with any tag name. Likewise, the third, /a/*/d, requests the nodes reached by starting at the root, moving to its child nodes with tag name a, then to their child nodes with any tag name, and then to the children of those nodes with tag name d.

Consequently, the solution sets of these three search expressions may contain overlapping elements. Specifically, every node that matches the search expression /a/c/d appears as an element of the solution sets of both /a/c/* and /a/*/d. In addition, among the elements of the solution set of /a/*/d, any node that matches /a/b/d also appears as a subtree of an element of the solution set of /a/b.

For example, when the three search expressions above are evaluated against XML data represented by the tree shown in FIG. 5, the solution sets are as shown in FIG. 6. Of the two subtrees in the solution set of /a/c/*, the left one also appears in the solution set of /a/*/d. And of the three subtrees in the solution set of /a/*/d, the leftmost also appears as part of the subtree in the solution set of /a/b.

In this case, therefore, the input device sends the three search expressions above to the search device, the search device retrieves their solution sets and sends them to the deduplication device, and the deduplication device examines the solution sets it receives and removes one member of each duplicated pair. It then sends, for example, the result shown in FIG. 7 to the output device, which outputs it.

By removing in this way both the duplication in which the same subtree appears as an element of several solution sets and the duplication in which an element of one solution set appears as a subtree of an element of another, or of the same, solution set, the user no longer sees the same data more than once, and browsing convenience increases.
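The two kinds of duplicate removal described for this embodiment can be sketched with Python's standard `xml.etree.ElementTree` module. The sketch rests on several assumptions: the XML document is invented and is not the tree of FIG. 5; ElementTree supports only a subset of XPath evaluated relative to an element, so /a/b is written as `./b` against the root element a; and the function name `remove_overlaps` is hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_overlaps(solution_sets):
    """Drop repeated nodes and nodes nested inside another retrieved subtree."""
    # Pass 1: keep each node once, even if several solution sets contain it.
    unique, seen = [], set()
    for nodes in solution_sets:
        for node in nodes:
            if id(node) not in seen:
                seen.add(id(node))
                unique.append(node)
    # Pass 2: collect every proper descendant of a retrieved node.
    covered = set()
    for node in unique:
        for descendant in node.iter():
            if descendant is not node:
                covered.add(id(descendant))
    # A node already contained in another retrieved subtree is dropped.
    return [node for node in unique if id(node) not in covered]


# Hypothetical document, invented for this example.
root = ET.fromstring("<a><b><d><e/></d></b><c><d/><f/></c><g><d/></g></a>")
solution_sets = [
    root.findall("./b"),    # stands for /a/b
    root.findall("./c/*"),  # stands for /a/c/*
    root.findall("./*/d"),  # stands for /a/*/d
]
kept = remove_overlaps(solution_sets)
print([node.tag for node in kept])
```

The text only requires that one member of each duplicated pair be removed; this sketch chooses to keep the enclosing subtree and drop the nested one, which is one possible policy.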

The example above showed several search expressions being received and the solution of one search expression appearing again as a solution of another, or as a subtree forming part of one. When a search expression containing the XPath constructs "|" or "//" is given, however, a subtree in the solution set of a single search expression can also appear again as part of another subtree in the same solution set. For example, suppose the following search expression is given.
/a/b | /a/*/d
"|" denotes the union of two search expressions; for the search expression above, the solution set is the union of the solution sets of the two search expressions /a/b and /a/*/d, that is, the union of the leftmost and rightmost sets in FIG. 6. The leftmost subtree of the rightmost set therefore also appears as part of another solution, namely the subtree in the leftmost set. The deduplication device removes this kind of duplication within a single solution set as well.

Similarly, consider the following search expression as an example containing "//".
/a/b//*
Intuitively, "//" means "a descendant at any depth below". The search expression above requests the nodes reached by starting at the root, moving to its child nodes with tag name a, then to their child nodes with tag name b, and then to the descendants of those nodes at any depth, with any tag name. When this search expression is evaluated against the data shown in FIG. 5, the set of subtrees shown in FIG. 8 is the solution set. The solution shown on the right in this solution set also appears as a subtree of the solution shown on the left. The deduplication device removes this kind of duplication as well.

(Fourth embodiment)
Next, the fourth embodiment of the present invention is described in detail. As in the third embodiment, this description covers an embodiment in which the present invention is applied to an XML database system that stores XML data, a kind of data that can be represented as a tree structure, and that uses a kind of path expression called XPath (for details, see Non-Patent Document 3 above) to describe data retrieval requests against it.

As in the third embodiment, in addition to the kind of duplication seen in the first and second embodiments, where the same data appears in the solution sets of several search expressions, a second kind can occur in this embodiment: a subtree retrieved by a retrieval request appears again as part of another subtree retrieved by the same request or by another request given at the same time. This embodiment therefore removes both kinds of duplication. It does so by retrieving data with search expressions synthesized from the given search expressions, so that the data is retrieved with the duplicates already removed.

FIG. 9 shows the overall configuration of the XML database system in this embodiment. The XML database system comprises a storage device 4 that stores the tree structure of a large body of XML data; an input device 1 that receives search expressions written in XPath from the user; a search expression synthesizer 2 that synthesizes new search expressions from the one or more search expressions received by the input device 1; a search device 3 that receives the one or more search expressions generated by the search expression synthesizer 2 and, for each of them, retrieves from the XML data in the storage device 4 the set of subtrees that is the solution of that expression; and an output device 6 that receives and outputs the data retrieved by the search device 3.

The processing performed in this embodiment when the input device receives a search expression or a set of search expressions is described below. As an example, suppose the input device receives the same set of three XPath search expressions as in the third embodiment.
/a/b
/a/c/*
/a/*/d

In this case, the search expression synthesizer first transforms the three search expressions above as follows, to remove data duplicated between the solution sets of different search expressions.
/a/b
/a/c/* − /a/b
/a/*/d − /a/b − /a/c/*
More generally, a suitable order is fixed over all the search expressions, and every search expression that comes earlier in that order is appended to the end of each search expression, joined by "−". The search expressions generated in this way are then transformed further as follows, to remove the duplication in which one solution appears as a subtree of another.
/a/b − /a/b//* − /a/c/*//* − /a/*/d//*
/a/c/* − /a/b − /a/b//* − /a/c/*//* − /a/*/d//*
/a/*/d − /a/b − /a/c/* − /a/b//* − /a/c/*//* − /a/*/d//*
More generally, each search expression is followed by every search expression with //* appended to its end, all joined by "−". Retrieving data with search expressions transformed in this way yields data with the duplicates already removed, which can be expected to lower the cost of retrieval processing and improve browsing convenience for the user.
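The two-step rewriting can be sketched as plain string manipulation. This is only an illustration: the function name `rewrite` is hypothetical, the list order is taken as the fixed order over the search expressions, and the output uses the text's own "−" notation for difference (written here as an ASCII hyphen), which is not claimed to be executable XPath.

```python
def rewrite(paths):
    """Rewrite each path so that its solutions exclude both kinds of duplicate.

    Step 1 subtracts every earlier path (same node retrieved twice).
    Step 2 subtracts every path with //* appended (node nested inside
    another retrieved subtree).
    """
    subtree_terms = [path + "//*" for path in paths]
    rewritten = []
    for position, path in enumerate(paths):
        terms = [path] + paths[:position] + subtree_terms
        rewritten.append(" - ".join(terms))
    return rewritten


for expression in rewrite(["/a/b", "/a/c/*", "/a/*/d"]):
    print(expression)
```

For the three paths of the running example this reproduces the three transformed search expressions listed above.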

The transformed search expressions shown above can be simplified, and those whose solution set is always empty can be removed. As noted earlier, however, these techniques are independent of the present invention and are not described in detail here. Also, although the method of synthesizing search expressions has been described here using XPath as the language for requesting sets of subtrees from tree-structured data, the same synthesis is possible in many other retrieval request description languages.

(Fifth embodiment)
Next, the fifth embodiment of the present invention is described in detail. As in the third and fourth embodiments, this description covers an embodiment in which the invention is applied to an XML database system that stores XML data, a kind of data that can be represented as a tree structure, and that uses XPath, a kind of path expression (see Non-Patent Document 3 above for details), to describe data retrieval requests against that data.

In this embodiment as well, two kinds of duplication can occur. The first, seen in the first and second embodiments, is that the same data appears in the solution sets of several search expressions. The second, seen in the third and fourth embodiments, is that a subtree retrieved by one retrieval request also appears as part of another subtree retrieved by the same request or by another request given at the same time. This embodiment therefore removes both kinds of duplication. It does so by retrieving data with search expressions synthesized from the given search expressions, so that the data is retrieved with these duplicates already removed. In addition, this embodiment can, when needed, synthesize the solution sets of the original search expressions from the deduplicated data.

FIG. 1 shows the overall configuration of the XML database system of this embodiment. The system comprises a storage device 4 that stores the tree structure of large XML data; an input device 1 that receives search expressions written in XPath from the user; a search expression synthesizer 2 that synthesizes new search expressions from the one or more search expressions received by the input device 1 and, at the same time, generates a description specifying how to extract the solutions of the original search expressions from the solutions of the synthesized ones; a search device 3 that receives the one or more search expressions generated by the search expression synthesizer 2 and, for each of them, retrieves from the XML data in the storage device 4 the set of subtrees that are its solutions; a search solution synthesizer 5 that receives the data retrieved by the search device 3 together with the description generated by the search expression synthesizer 2 and synthesizes the solutions of the original search expressions; and an output device 6 that receives and outputs the data generated by the search solution synthesizer 5.

These components may all reside on the same computer. Alternatively, only the storage device 4 and the search device 3 may reside on a computer 7 acting as a server, with the input device 1, the search expression synthesizer 2, the search solution synthesizer 5, and the output device 6 residing on a client computer 9 connected to the server 7 via a network 8.

The following describes the processing performed in this embodiment when the input device receives a search expression or a set of search expressions. For ease of understanding, the description is divided into three cases. The first case is where the received XPath search expressions do not contain the "//" syntax mentioned above and all of them request nodes at the same depth in the tree structure of the XML data. The procedure in this case is the same as the one used when duplication of one solution as a subtree of another need not be considered, for example when the invention is applied to a data management system for data other than tree-structured data, as in the first embodiment. The second case is where the search expressions do not contain "//" but do not all retrieve nodes at the same depth. The third case is where the search expressions contain "//". Although the description is split into three cases, the procedure described last is in fact the most general one and covers all cases; applying it to the special situations of the first and second cases gives final results equivalent to those of the procedures described for them.

When the received XPath search expressions do not contain "//" and all retrieve data at the same depth, the procedure is as follows. Suppose, for example, that the input device receives the following set of three XPath search expressions.
/a/b
/a/*
/a/*[c/d]
The first, /a/b, requests the nodes reached by starting at the root of the stored XML tree, moving to a child with tag name a, and then to a child of that node with tag name b. The second, /a/*, requests the nodes reached by starting at the root, moving to a child with tag name a, and then to a child of that node with any tag name. In the third, [c/d] expresses the condition "has a path c/d beneath it", so /a/*[c/d] requests the nodes that are reached by starting at the root, moving to a child with tag name a, and then to a child of that node with any tag name, and that in addition have a child node c which itself has a child node d.

The solution sets of these three search expressions can therefore share elements. Specifically, every node in the solution set of /a/b or /a/*[c/d] also appears in the solution set of /a/*, and the solution sets of /a/b and /a/*[c/d] can have elements in common. On the other hand, because all three search expressions in this example retrieve nodes at depth 2 of the tree, no element of one solution set can appear again as a subtree of an element of the same or another solution set.
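The overlap among the three example search expressions can be checked with a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document and the `id` attributes are made up for illustration. Because ElementTree's limited XPath support only allows a plain child tag inside a `[tag]` predicate, the condition [c/d] is checked here with a separate `find` call rather than inside the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the root element <a> plays the role of /a.
a = ET.fromstring(
    "<a>"
    "<b id='1'><c><d/></c></b>"
    "<b id='2'/>"
    "<e id='3'><c><d/></c></e>"
    "<e id='4'><c/></e>"
    "</a>"
)

def ids(elements):
    """Collect the id attribute of each element as a set."""
    return {e.get("id") for e in elements}

q1 = ids(a.findall("b"))                                           # /a/b
q2 = ids(a.findall("*"))                                           # /a/*
q3 = ids(e for e in a.findall("*") if e.find("c/d") is not None)   # /a/*[c/d]

print(sorted(q1), sorted(q2), sorted(q3))
```

Every node of `q1` and `q3` also occurs in `q2`, and node 1 occurs in all three sets, which is the kind of duplication the synthesized search expressions are meant to remove.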

In this case, therefore, the system considers every non-empty subset of the given set of search expressions, which here means the seven subsets
{/a/b, /a/*, /a/*[c/d]}
{/a/b, /a/*}
{/a/b, /a/*[c/d]}
{/a/*, /a/*[c/d]}
{/a/b}
{/a/*}
{/a/*[c/d]}
For each subset, the search expression synthesizer forms the difference between the intersection of all search expressions in the subset and the union of all search expressions in the corresponding set of search expressions not contained in the subset, namely
{}
{/a/*[c/d]}
{/a/*}
{/a/b}
{/a/*, /a/*[c/d]}
{/a/b, /a/*[c/d]}
{/a/b, /a/*}
This gives the seven search expressions
/a/b ∩ /a/* ∩ /a/*[c/d] (search expression 1)
(/a/b ∩ /a/*) − /a/*[c/d] (search expression 2)
(/a/b ∩ /a/*[c/d]) − /a/* (search expression 3)
(/a/* ∩ /a/*[c/d]) − /a/b (search expression 4)
/a/b − (/a/* ∪ /a/*[c/d]) (search expression 5)
/a/* − (/a/b ∪ /a/*[c/d]) (search expression 6)
/a/*[c/d] − (/a/b ∪ /a/*) (search expression 7)
which are sent to the search device to perform the search. This is the same procedure as in the second embodiment described above.
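The construction of the seven search expressions amounts to splitting the solutions into the regions of a Venn diagram. The following is a minimal sketch of that idea on plain Python sets; the function name `venn_regions` and the node ids in `solutions` are hypothetical, and a real system would build expressions to be evaluated by the search device rather than operate on already retrieved sets.

```python
from itertools import combinations

def venn_regions(solutions):
    """Map each non-empty subset T of query names to
    (intersection of the queries in T) - (union of the queries not in T)."""
    names = list(solutions)
    regions = {}
    for size in range(len(names), 0, -1):
        for chosen in combinations(names, size):
            inside = set.intersection(*(solutions[n] for n in chosen))
            outside = set().union(
                *(solutions[n] for n in names if n not in chosen))
            regions[chosen] = inside - outside
    return regions

# Hypothetical solution sets (node ids) for the three example expressions.
solutions = {
    "/a/b": {1, 2},
    "/a/*": {1, 2, 3, 4},
    "/a/*[c/d]": {1, 3},
}
regions = venn_regions(solutions)
for subset, nodes in regions.items():
    print(subset, sorted(nodes))
```

With three queries the function yields seven regions in the same order as search expressions 1 to 7, and no node id appears in more than one region.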

Here, ∩ denotes the intersection of two search expressions, that is, a request for the nodes that satisfy the conditions of both. Similarly, ∪ denotes the union of two search expressions, that is, a request for the nodes that satisfy the condition of at least one of them, and − denotes the difference of two search expressions, that is, a request for the nodes that satisfy the condition of the former but not that of the latter. This notation differs from actual XPath syntax (for example, ∪ is written "|" in actual XPath, as mentioned above), but it is used here to keep the explanation clear. That the solution sets of search expressions 1 to 7 share no elements follows in much the same way as in the second embodiment. Retrieving data with search expressions synthesized in this way avoids retrieving duplicate data and avoids sending duplicate data over the network.

The search device receives the synthesized search expressions, searches for their solutions, and sends the seven corresponding solution sets to the search solution synthesizer. The search solution synthesizer produces the solution set of the original search expression /a/b by taking the union of the solution sets of search expressions 1, 2, 3, and 5, which are the ones built from subsets containing /a/b. Likewise, it produces the solutions of /a/* as the union of the solution sets of search expressions 1, 2, 4, and 6, and the solutions of /a/*[c/d] as the union of the solution sets of search expressions 1, 3, 4, and 7.
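The reconstruction step performed by the search solution synthesizer can be sketched as a union over the regions whose subset contains the query. The function name `rebuild` and the node ids below are assumptions made for illustration only.

```python
def rebuild(regions):
    """For each original query, take the union of every region
    whose generating subset T contains that query."""
    rebuilt = {}
    for subset, nodes in regions.items():
        for query in subset:
            rebuilt.setdefault(query, set()).update(nodes)
    return rebuilt

# Hypothetical solution sets of search expressions 1 to 7, keyed by the
# subset T each one was built from (node ids are made up).
B, STAR, PRED = "/a/b", "/a/*", "/a/*[c/d]"
regions = {
    (B, STAR, PRED): {1},   # search expression 1
    (B, STAR): {2},         # search expression 2
    (B, PRED): set(),       # search expression 3
    (STAR, PRED): {3},      # search expression 4
    (B,): set(),            # search expression 5
    (STAR,): {4},           # search expression 6
    (PRED,): set(),         # search expression 7
}
original = rebuild(regions)
print({q: sorted(nodes) for q, nodes in original.items()})
```

Each node id is transferred once, in exactly one region, and the client side restores the three original solution sets purely from the subset labels.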

In this example, search expression 1 is equivalent to the simpler search expression /a/b[c/d], and search expression 3 always has an empty solution set. Some of the other expressions can be simplified in a similar way or always have empty solutions. The search expression synthesizer may therefore simplify such search expressions and discard those with empty solutions. Techniques for simplifying search expressions written as path expressions, and for deciding whether a search expression's solution set is always empty, are the subject of various prior art, including Non-Patent Documents 4 and 5 below; since they are independent of the present invention, they are not described in detail here.
G. Miklau, D. Suciu, "Containment and Equivalence for an XPath Fragment", Proceedings of the ACM Symposium on Principles of Database Systems (PODS), pp. 65-76, 2002
M. Benedikt et al., "Structural Properties of XPath Fragments", Proceedings of the International Conference on Database Theory (ICDT), pp. 79-95, 2003

When search expressions have been simplified or discarded in this way, the search solution synthesizer produces the solutions of /a/b by taking the union of the solution sets of all search expressions that were obtained by simplifying an expression built from a subset containing /a/b and that were not discarded. The same applies to the solutions of the other search expressions, and this too is the same as in the second embodiment.

The procedure has so far been described with a specific example. More generally, as in the second embodiment, for every non-empty subset T of the given set of search expressions, the search expression defined by Formula 3 is generated, and data is retrieved with all of these search expressions. The solutions of the original search expressions are likewise synthesized from the retrieved data according to Formula 4, as in the second embodiment.

Next, the procedure is described for the case where the search expressions do not contain "//" but do not all request nodes at the same depth. In this case, the search expression synthesizer synthesizes search expressions as follows.
(1) First, assign the identifiers Q1, Q2, ... to the given search expressions, and classify all of them into groups by path length.
(2) Perform the following for each group obtained in (1).
(3) Let S be the set of all search expressions in the group, and perform the following for each non-empty subset T of S.
(4) From every search expression longer than those of the group, take the prefix whose length equals that of the current group, let P be the set of these prefixes, and perform the following for each subset R of P.
(5) For every search expression shorter than those of the current group, create a search expression of the same length as the current group by appending /*/.../* to its end, and let E be the union of all of these. Using the above, create the path expression given by
(∩_{q∈T} q − ∪_{q∈S−T} q) ∩ (∩_{r∈R} r − ∪_{r∈P−R} r) − E   (Formula 5)
where an intersection or union over an empty set is simply omitted. Assign the identifiers V1, V2, ... in order to the path expressions generated in this way.
(6) At the same time, for each search expression Qx, Qy, ... in T, generate the terms
Qx←(Va,*)
Qy←(Va,*)
:
:
which state that each subtree that is a solution of the newly created Va is itself to be taken as a solution of Qx, Qy, .... Similarly, for each prefix p in R, let Qx be the search expression from which the prefix was taken and pp the suffix that remains when p is removed from Qx, and generate the terms
Qx←(Va,*/pp)
:
:
which state that the nodes reached from the root of each subtree that is a solution of Va by following the path */pp are to be taken as solutions of Qx.
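Steps (1) to (6) can be sketched as a small generator that works purely on the text of the search expressions. Several things here are assumptions for illustration: the function names `subsets`, `path`, and `synthesize`; the rule that predicates contain no "/", so that splitting on "/" yields the steps; the treatment of P as one entry per longer search expression; and the shape of the combined expression in step (5), which is inferred from the worked example in the text (intersect the members of T and R, subtract the remaining members of S and P, then subtract the padded shorter expressions). ASCII "-" stands in for the difference operator.

```python
from itertools import combinations

def subsets(items, allow_empty):
    """All subsets of items, largest first."""
    smallest = 0 if allow_empty else 1
    for size in range(len(items), smallest - 1, -1):
        yield from combinations(items, size)

def path(steps):
    return "/" + "/".join(steps)

def synthesize(queries):
    """Return ([(Va, expression)], [(Qx, Va, path)]) for '//'-free queries.

    Assumption: predicates contain no '/', so splitting on '/' yields steps.
    """
    named = [(f"Q{i}", q.strip("/").split("/"))
             for i, q in enumerate(queries, 1)]
    exprs, terms = [], []
    for length in sorted({len(s) for _, s in named}):            # steps (1), (2)
        group = [(n, s) for n, s in named if len(s) == length]    # S
        longer = [(n, s) for n, s in named if len(s) > length]    # sources of P
        padded = [path(s + ["*"] * (length - len(s)))             # members of E
                  for _, s in named if len(s) < length]
        for T in subsets(group, allow_empty=False):               # step (3)
            for R in subsets(longer, allow_empty=True):           # step (4)
                keep = [path(s) for _, s in T]
                keep += [path(s[:length]) for _, s in R]
                drop = [path(item[1]) for item in group if item not in T]
                drop += [path(item[1][:length])
                         for item in longer if item not in R]
                drop += padded
                name = f"V{len(exprs) + 1}"                       # step (5)
                exprs.append((name, " - ".join([" ∩ ".join(keep)] + drop)))
                terms += [(n, name, "*") for n, _ in T]           # step (6)
                terms += [(n, name, "*/" + "/".join(s[length:]))
                          for n, s in R]
    return exprs, terms

exprs, terms = synthesize(["/a/b/c", "/a/*/c/d", "/a/b/c[d]/*"])
for name, expr in exprs:
    print(name, expr)
```

The sketch performs no simplification and does not discard expressions whose solution set is always empty; those steps are left out, as in the text.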

As an example, consider the case where the following three search expressions are given.
/a/b/c
/a/*/c/d
/a/b/c[d]/*
The first search expression starts at the root, moves to a child node with tag name a, then to a child of that node with tag name b, and retrieves its children with tag name c. The second starts at the root, moves to a child node with tag name a, then to any child of that node, then to a child with tag name c, and retrieves its children with tag name d. The third starts at the root, moves to a child node with tag name a, then to a child with tag name b, then to a child that has tag name c and itself has a child with tag name d, and retrieves any child of that node.

The solution sets of these three search expressions can contain the following kinds of duplicated data. First, the same element can appear in both the solution set of the second search expression and that of the third. Second, some elements in the solution sets of the second and third search expressions can appear as subtrees of a solution in the solution set of the first. For example, if the XML data currently stored in the system contains a node that matches the search expression /a/b/c/d, this node is a solution of the second search expression, while its parent node is a solution of the first. The node therefore appears as an element of the solution set of the second search expression and, at the same time, as a subtree of an element of the solution set of the first.

The search expression synthesizer therefore synthesizes search expressions as follows. First, it assigns the identifiers Q1, Q2, and Q3 to these search expressions. Next, it classifies them into the group {Q1} of length 3 and the group {Q2, Q3} of length 4. Consider the group of length 3 first. Its only non-empty subset is {Q1}, so this is taken as T and the processing from (4) onward is performed. In (4), the set P of length-3 prefixes taken from all search expressions longer than this group is {/a/*/c, /a/b/c[d]}, so (5) is performed for four cases, with R set to each of its subsets
{/a/*/c, /a/b/c[d]}
{/a/*/c}
{/a/b/c[d]}
{}
In this example no search expression is shorter than the current group, so there is no search expression corresponding to E. When {/a/*/c, /a/b/c[d]} is chosen as R, the following path expression is therefore created.
/a/b/c ∩ /a/*/c ∩ /a/b/c[d]
At the same time, this search expression is assigned the identifier V1. Because {Q1} was used as T in generating it, the term
Q1←(V1,*)
is generated, stating that the solutions of V1 are themselves to be taken as solutions of Q1. Furthermore, because the R used in generating it was the set consisting of /a/*/c, the prefix taken from Q2, and /a/b/c[d], the prefix taken from Q3, two more terms are generated. The term
Q2←(V1,*/d)
states that the elements reached from the root of each subtree that is a solution of V1 by following the path d, the suffix corresponding to /a/*/c, are also to be taken as solutions of Q2. The term
Q3←(V1,*/*)
states that the elements reached from the root of each subtree that is a solution of V1 by following the path *, the suffix corresponding to /a/b/c[d], are also to be taken as solutions of Q3.

V1 can be simplified to /a/b/c[d], and the search expression synthesizer may perform such a simplification. As already noted, however, the specific methods for simplifying search expressions and discarding unnecessary ones are independent of the present invention and are not described in detail here.

Similarly, when (5) is performed for the remaining three choices of R in step (4), namely
R = {/a/*/c}
R = {/a/b/c[d]}
R = {}
the following search expressions V2, V3, and V4 are generated, together with the terms specifying how to extract the solutions of the original search expressions from them.
V2: /a/b/c ∩ /a/*/c − /a/b/c[d]
Q1←(V2,*)
Q2←(V2,*/d)
V3: /a/b/c ∩ /a/b/c[d] − /a/*/c
Q1←(V3,*)
Q3←(V3,*/*)
V4: /a/b/c − /a/*/c − /a/b/c[d]
Q1←(V4,*)

Next, consider the group of search expressions of length 4, {/a/*/c/d, /a/b/c[d]/*}. The non-empty subsets of this group are
{/a/*/c/d, /a/b/c[d]/*}
{/a/*/c/d}
{/a/b/c[d]/*}
so (4) is performed with each of these as T. No search expression is longer than this group, so the prefix set P is empty and its only subset is the empty set. Whichever subset is taken as T, (5) is therefore performed with R as the empty set. As for E, the only search expression shorter than the current group is /a/b/c, of length 3, so /* is appended to bring its length to 4, giving /a/b/c/*. As a result, when {/a/*/c/d, /a/b/c[d]/*} is taken as T, the search expression
V5: /a/*/c/d ∩ /a/b/c[d]/* − /a/b/c/*
is generated, together with the terms
Q2←(V5,*)
Q3←(V5,*)
specifying how to extract the solutions of the original search expressions from V5. Similarly, when {/a/*/c/d} and {/a/b/c[d]/*} are taken as T, the search expressions V6 and V7 are generated, together with the terms specifying how to extract the solutions of the original search expressions from them.
V6: /a/*/c/d − /a/b/c[d]/* − /a/b/c/*
Q2←(V6,*)
V7: /a/b/c[d]/* − /a/*/c/d − /a/b/c/*
Q3←(V7,*)

In this way, seven search expressions are generated from the three search expressions originally given, along with the way to extract the solutions of the original search expressions from their solutions. The solution sets of these seven search expressions share no elements, and no element of one solution set appears again as a subtree of an element of the same or another solution set. At the same time, the seven solution sets contain all the data needed to synthesize the solution sets of the three original search expressions, which can be synthesized from them according to the extraction method shown above.
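These properties can be checked on a toy document with a minimal evaluator. Everything in the sketch is an illustrative assumption: the node set `NODES`, the helpers `step_ok`, `follow`, and `solve`, and the restriction of predicates to a single child tag. V1 to V7 and the extraction terms are copied from the example above.

```python
# Toy document: each node is identified by its tag path from the root.
NODES = {
    ("a",), ("a", "b"), ("a", "b", "c"), ("a", "b", "c", "d"),
    ("a", "b", "c", "e"), ("a", "x"), ("a", "x", "c"), ("a", "x", "c", "d"),
}

def step_ok(step, node):
    """Does the last tag of node satisfy a step such as c, * or c[d]?"""
    name, _, pred = step.rstrip("]").partition("[")
    if name not in ("*", node[-1]):
        return False
    return not pred or node + (pred,) in NODES

def follow(starts, rel):
    """Nodes reached from the start nodes along rel.
    The first step of rel is tested against the start node itself."""
    first, *rest = rel.split("/")
    current = {n for n in starts if step_ok(first, n)}
    for step in rest:
        current = {n for n in NODES if n[:-1] in current and step_ok(step, n)}
    return current

def solve(expr):
    """Evaluate an absolute '//'-free path such as /a/b/c[d]/*."""
    return follow({n for n in NODES if len(n) == 1}, expr.lstrip("/"))

Q1, Q2, Q3 = solve("/a/b/c"), solve("/a/*/c/d"), solve("/a/b/c[d]/*")
p2, p3, E = solve("/a/*/c"), solve("/a/b/c[d]"), solve("/a/b/c/*")

V = {
    1: Q1 & p2 & p3, 2: (Q1 & p2) - p3, 3: (Q1 & p3) - p2, 4: Q1 - p2 - p3,
    5: (Q2 & Q3) - E, 6: Q2 - Q3 - E, 7: Q3 - Q2 - E,
}
TERMS = {  # Qx <- (Va, path), as listed in the text
    "Q1": [(1, "*"), (2, "*"), (3, "*"), (4, "*")],
    "Q2": [(1, "*/d"), (2, "*/d"), (5, "*"), (6, "*")],
    "Q3": [(1, "*/*"), (3, "*/*"), (5, "*"), (7, "*")],
}
rebuilt = {q: set().union(*(follow(V[a], rel) for a, rel in terms))
           for q, terms in TERMS.items()}
retrieved = [n for nodes in V.values() for n in nodes]
nested = [(n, m) for n in retrieved for m in retrieved
          if len(n) > len(m) and n[:len(m)] == m]
print(retrieved, nested)
```

On this data only two subtree roots are retrieved, a/b/c and a/x/c/d, yet all three original solution sets are recovered from them by following the paths in the terms.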

The reason why, for an arbitrary set of search expressions, the search expressions created by the above procedure have these properties can be explained as follows. First, consider why the solutions of these search expressions share no elements. Only search expressions of the same length can have solutions in common. The part of Formula 5 used to generate the search expressions,
∩_{q∈T} q − ∪_{q∈S−T} q   (Formula 6)
classifies the solutions of a group of same-length search expressions into the solution sets corresponding to the regions of a Venn diagram, just as was done in the second embodiment and in the case where all given search expressions have the same length. The search expressions created by Formula 5 therefore contain no duplicated elements in their solution sets.

Next, consider why no element of the solution set of one search expression appears as a subtree of another element in the solution set of the same or another search expression. A solution of one search expression can appear again as a subtree of a solution of another search expression only when the former search expression is longer than the latter. In Formula 5, however, every search expression shorter than the original search expressions in T is brought to the same length by appending /*/.../*, E is the union of the results, and everything contained in E is removed by − E. Anything that appears as a subtree of a solution of a shorter search expression is therefore always removed.

Next, consider why every node that is a solution of an original search expression always appears in the solution set of some synthesized search expression. There are two cases: either some ancestor of the node is a solution of one of the original search expressions, or no ancestor is. If there are such ancestors, the one closest to the root is included in the solution set of one synthesized search expression, so the node in question appears as a subtree forming part of that solution. The synthesized search expression in which that ancestor appears as a solution is the one generated with T as the set of all search expressions matching the ancestor and R as the set of all prefixes, taken from search expressions longer than those in T, that match the ancestor. If there is no such ancestor, the node in question itself appears in the solution set of one synthesized search expression. That synthesized search expression is the one generated with T as the set of all search expressions matching the node and R as the set of all prefixes, taken from search expressions longer than those in T, that match the node.

Finally, consider why the solutions of the original search expressions can be synthesized correctly, by the method described above, from the solutions of the search expressions generated by the method described above. The Formula 6 part of Formula 5 classifies all nodes that are solutions of some original search expression according to which original solution sets contain them and which do not. Similarly, the part of Formula 5
∩_{r∈R} r − ∪_{r∈P−R} r
classifies these nodes according to which of the other, longer search expressions have solutions to be extracted from the subtree rooted at the node, and by which path expression. Formula 5 thus classifies all the necessary nodes according to which search expressions each node is a solution of and which subtrees are to be extracted from it as other solutions, so the solutions of the original search expressions can be extracted correctly.

Next, we describe the processing procedure for search expressions that contain the "//" construct. In XPath, "//" expresses the condition "at any depth below". For example, /a//b requests the nodes with tag name b that are descendants, at any depth, of the child node with tag name a reached from the root of the data. Likewise, //a/b requests the nodes with tag name b that are children of a node with tag name a located at any depth in the data.

To describe the processing procedure for search expressions that contain "//", we first define the i-prefix of a search expression. Suppose the given search expression has the form
/₁p₁ /₂p₂ … /ₘpₘ
where each of /₁, …, /ₘ is either / or //, and p₁, …, pₘ are the parts between them, containing neither / nor //. Then, for each i with 0 ≤ i ≤ m−1, the i-prefix of this search expression is defined as follows.
(1) When i ≠ 0 and /ᵢ₊₁ is /:
/₁p₁ … /ᵢpᵢ
(2) When i ≠ 0 and /ᵢ₊₁ is //:
/₁p₁ … /ᵢpᵢ ∪ /₁p₁ … /ᵢpᵢ//*
(3) When i = 0 and /₁ is /:
φ (a special path expression that matches no node)
(4) When i = 0 and /₁ is //:
//*
For example, for the search expression //a of length 1, an i-prefix is defined only for i with 0 ≤ i ≤ 1−1, that is, for i = 0, and the 0-prefix of this expression is //*. For the search expression /b//c of length 2, i-prefixes are defined for i with 0 ≤ i ≤ 2−1, that is, for i = 0 and i = 1; the 0-prefix of this expression is φ and its 1-prefix is /b ∪ /b//*.
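The four cases of the definition can be made concrete with a short Python sketch. It is an illustration only and not part of the specification: the names `parse_steps` and `i_prefix`, and the representation of an i-prefix as a list of alternatives (a union, with the empty list standing for φ), are assumptions introduced here. Only the plain /, // and name-or-* steps used in the examples are supported.

```python
import re

def parse_steps(expr):
    """Split a path expression such as '/b//c' into [('/', 'b'), ('//', 'c')]."""
    steps = re.findall(r"(//|/)([^/]+)", expr)
    if "".join(sep + name for sep, name in steps) != expr:
        raise ValueError(f"unsupported path expression: {expr!r}")
    return steps

def i_prefix(expr, i):
    """Return the i-prefix as a list of alternatives; [] stands for phi."""
    steps = parse_steps(expr)
    m = len(steps)
    if not 0 <= i <= m - 1:
        raise ValueError("i must satisfy 0 <= i <= m-1")
    next_sep = steps[i][0]  # the separator written /(i+1) in the definition
    head = "".join(sep + name for sep, name in steps[:i])
    if i != 0:
        # cases (1) and (2)
        return [head] if next_sep == "/" else [head, head + "//*"]
    # cases (3) and (4)
    return [] if next_sep == "/" else ["//*"]

print(i_prefix("//a", 0))
print(i_prefix("/b//c", 0))
print(i_prefix("/b//c", 1))
```

The three calls correspond to the two examples in the text: the 0-prefix of //a, and the 0-prefix and 1-prefix of /b//c.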

The definition of the i-prefix above is a natural extension of the usual notion of a path prefix, taking into account that "//" stands for the infinite family of cases "/", "/*/", "/*/*/", and so on.

With the i-prefix defined as above, the processing procedure for search expressions that contain "//" is as follows.
(1) First, assign the identifiers Q1, Q2, … to all the given search expressions.
(2) Let S be the set of all given search expressions, and perform the following processing with T ranging over every non-empty subset of S.
(3) Let P be the set of all i-prefixes defined for the given search expressions, and perform the following processing with R ranging over every subset of P.
(4) For every search expression, create the expression obtained by appending //* to its end, let E be the union of all of these, and create the path expression corresponding to
(⋂_{t∈T} t) ∩ (⋂_{r∈R} r) − (⋃_{s∈S−T} s) − (⋃_{p∈P−R} p) − E
Assign the identifiers V1, V2, … in order to the path expressions created in this way.
(5) At the same time, for each search expression Qx, Qy, … contained in T, generate the terms
Qx←(Va, *)
Qy←(Va, *)
:
:
which state that each subtree that is a solution of the Va just created is to be extracted, as is, as a solution of Qx, Qy, …. Similarly, for each i-prefix p contained in R, let Qx be the search expression from which the prefix was taken and pp the suffix that remains when p is removed from Qx, and generate the terms
Qx←(Va, *pp)
:
:
which state that the nodes reached by following the path *pp from the root of a subtree that is a solution of Va are to be extracted as solutions of Qx.
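Steps (1) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the synthesizer of the specification: the function names, the string rendering of composite expressions, and the `prune` flag are all introduced here. With `prune=True` the sketch skips any combination in which φ is in R or //* is not in R (both give an empty solution set) and omits "∩ //*" and "− φ" from the rendered expression.

```python
import re
from itertools import combinations

PHI = "φ"  # special path expression that matches no node

def parse_steps(expr):
    """'/b//*' -> [('/', 'b'), ('//', '*')]"""
    return re.findall(r"(//|/)([^/]+)", expr)

def join(steps):
    return "".join(sep + name for sep, name in steps)

def prefixes_with_suffixes(expr):
    """Yield (i-prefix, remaining suffix) for i = 0 .. m-1."""
    steps = parse_steps(expr)
    for i, (sep, _) in enumerate(steps):
        head = join(steps[:i])
        if i == 0:
            prefix = PHI if sep == "/" else "//*"
        else:
            prefix = head if sep == "/" else f"({head} ∪ {head}//*)"
        yield prefix, join(steps[i:])

def subsets(items):
    return [c for k in range(len(items) + 1) for c in combinations(items, k)]

def compose(queries, prune=True):
    """Return a list of (composite expression, extraction terms)."""
    ids = {q: f"Q{n}" for n, q in enumerate(queries, 1)}       # step (1)
    pairs = [(p, ids[q], suf) for q in queries
             for p, suf in prefixes_with_suffixes(q)]
    P = list(dict.fromkeys(p for p, _, _ in pairs))            # prefix set P
    E = " ∪ ".join(q + "//*" for q in queries)
    out = []
    for T in subsets(queries)[1:]:                             # step (2)
        for R in subsets(P):                                   # step (3)
            if prune and (PHI in R or ("//*" in P and "//*" not in R)):
                continue  # solution set is trivially empty
            keep_r = [r for r in R if not (prune and r == "//*")]
            drop_p = [p for p in P
                      if p not in R and not (prune and p == PHI)]
            pos = list(T) + keep_r
            neg = [q for q in queries if q not in T] + drop_p
            expr = (" ∩ ".join(pos)                            # step (4)
                    + "".join(f" − {x}" for x in neg) + f" − ({E})")
            terms = [(ids[q], "*") for q in T]                 # step (5)
            terms += [(qid, "*" + suf) for p, qid, suf in pairs if p in R]
            out.append((expr, terms))
    return out

for expr, terms in compose(["//a", "/b//*"]):
    print(expr, terms)
```

Without pruning, the number of composite expressions is (2^|S| − 1) · 2^|P|, which grows quickly; this is one reason the cheap emptiness checks are worth applying during generation.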

The procedure above is a natural extension of the earlier procedure for sets of search expressions without "//", taking into account that a search expression containing "//" has variable length, that is, the depth at which the extracted subtrees appear is variable. Put another way, the procedure takes all subtrees that are a solution of any of the given path expressions and are not part of another solution, and classifies them on two criteria: which path expressions' solution sets contain them and which do not, and which prefixes of which path expressions they match and which they do not. It then generates a path expression that extracts each class.

As an example, consider the case where the following two search expressions are given.
//a
/b//*
The first search expression extracts the subtrees rooted at nodes, at any depth, that have the tag name a. The second starts from the root of the data and extracts the subtrees rooted at nodes with any tag name, at any depth below the child node with tag name b. These two search expressions can produce duplicated data in the following ways. First, a node that matches the search expression /b//a is contained as an element in the solution sets of both expressions. Second, a node that matches /b//*//a appears as an element of the first expression's solution set and, at the same time, as a subtree of some element of the second expression's solution set. Similarly, a node that matches /b//a//* appears as an element of the second expression's solution set and, at the same time, as a subtree of some element of the first expression's solution set.
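The three kinds of overlap can be checked on a small tree with a toy path evaluator. The sketch below is an assumption-laden illustration: the sample tree is hypothetical (it is not the data of FIG. 10), nodes are identified by tuples of child indexes, and `evaluate` supports only /, // and name-or-* steps.

```python
import re

def evaluate(expr, tags):
    """Return the ids of the nodes matched by expr.

    tags maps a node id (tuple of child indexes from the document root)
    to its tag name; the document root itself is the empty tuple.
    """
    current = {()}
    for sep, name in re.findall(r"(//|/)([^/]+)", expr):
        current = {
            nid for nid, tag in tags.items()
            if name in ("*", tag) and (
                nid[:-1] in current if sep == "/"
                # '//' accepts any proper ancestor in the current set
                else any(nid[:k] in current for k in range(len(nid)))
            )
        }
    return current

# Hypothetical sample data:  b
#                            ├─ a ─ c
#                            └─ d ─ a
tags = {(0,): "b", (0, 0): "a", (0, 0, 0): "c", (0, 1): "d", (0, 1, 0): "a"}

q1 = evaluate("//a", tags)
q2 = evaluate("/b//*", tags)

in_both = evaluate("/b//a", tags)         # element of both solution sets
a_inside_q2 = evaluate("/b//*//a", tags)  # Q1 element nested in a Q2 element
q2_inside_a = evaluate("/b//a//*", tags)  # Q2 element nested in a Q1 element

print(sorted(q1), sorted(q2))
print(sorted(in_both), sorted(a_inside_q2), sorted(q2_inside_a))
```

On this sample, every node reported by the three derived expressions is transmitted more than once when //a and /b//* are answered independently.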

The search expression synthesizer therefore composes search expressions by the following procedure. First, it assigns the identifiers Q1 and Q2 to these search expressions. Next, for each of the non-empty subsets of the set {//a, /b//*},
{//a, /b//*}
{//a}
{/b//*}
it takes the subset as T and performs step (3) onward. The set of all i-prefixes defined for the given search expressions is
{//*, φ, /b ∪ /b//*}
so for every subset of this set it takes the subset as R and performs step (4) onward. For example, with T = {//a, /b//*} and R = {//*, φ, /b ∪ /b//*}, it generates the search expression
V: //a ∩ /b//* ∩ //* ∩ φ ∩ (/b ∪ /b//*)
− (//a//* ∪ /b//*//*)
together with the following terms, which state how the solutions of the original search expressions are extracted from V:
Q1←(V, *)
Q2←(V, *)
Q1←(V, *//a)
Q2←(V, */b//*)
Q2←(V, *//*)
Here //a is the suffix for the 0-prefix of Q1, /b//* is the suffix for the 0-prefix of Q2, and //* is the suffix for the 1-prefix of Q2.

However, the V above contains φ, which matches no node, as a factor of the intersection, so its solution set is always empty. Expressions like this, which contain φ as a factor and whose solution set is easily determined to be empty, may be removed by the search expression synthesizer. Likewise, a search expression in which //* appears as a subtracted component is easily determined to always have an empty solution set, so the synthesizer may remove it as well. Furthermore, omitting "∩ //*" and "− φ" clearly does not change the meaning of a search expression, so the synthesizer may simplify expressions by omitting them. Here we assume that the synthesizer performs these particularly easy removals and simplifications. In that case only R = {//*, /b ∪ /b//*} and R = {//*} need to be considered. As a result, the three choices of T combined with the two choices of R yield the following six search expressions.
V11: //a ∩ /b//* ∩ (/b ∪ /b//*)
− (//a//* ∪ /b//*//*)
V12: //a ∩ /b//* − (/b ∪ /b//*)
− (//a//* ∪ /b//*//*)
V13: //a ∩ (/b ∪ /b//*) − /b//*
− (//a//* ∪ /b//*//*)
V14: //a − /b//* − (/b ∪ /b//*)
− (//a//* ∪ /b//*//*)
V15: /b//* ∩ (/b ∪ /b//*) − //a
− (//a//* ∪ /b//*//*)
V16: /b//* − //a − (/b ∪ /b//*)
− (//a//* ∪ /b//*//*)
In addition, the following terms are generated, which state how the solutions of the original Q1 and Q2 are extracted from these:
Q1←(Vi, *) where i = 11, 12, 13, 14
Q2←(Vi, *) where i = 11, 12, 15, 16
Q1←(Vi, *//a) where i = 11, 12, 13, 14, 15, 16
Q2←(Vi, *//*) where i = 11, 13, 15
Here //a is the suffix for //*, the 0-prefix of Q1, and //* is the suffix for /b ∪ /b//*, the 1-prefix of Q2.

In fact, V11 to V16 above can be simplified further as follows.
V11: /b/a
V12: φ
V13: φ
V14: //a − /b//* − //a//*
V15: /b/* − //a − //a//*
V16: φ
The search expression synthesizer may therefore perform such further simplification. As noted earlier, however, the specific techniques for such simplification are not described in detail here.

Retrieving data with the composite search expressions above eliminates the retrieval of duplicate data. We illustrate this with a concrete data example. Executing //a and /b//* on the data shown in FIG. 10 gives the solutions shown in FIG. 11. As FIG. 11 shows, these solutions contain a great deal of duplicated data and are highly redundant. In contrast, executing V11 to V16 on the data of FIG. 10 gives empty solution sets for V12, V13, V14, and V16, and the solutions shown in FIG. 12 for V11 and V15. Extracting the solutions of //a and /b//* from these solutions according to the terms above yields exactly the same solutions as in FIG. 11.
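The whole round trip (compose, retrieve without duplicates, reassemble) can be sketched in Python. This is a minimal illustration under stated assumptions rather than the described apparatus: the sample tree is hypothetical and is not the data of FIG. 10, nodes are identified by tuples of child indexes, each composite expression is evaluated directly as set algebra over node sets, and all function names are introduced here.

```python
import re
from itertools import combinations

def steps(expr):
    return re.findall(r"(//|/)([^/]+)", expr)

def evaluate(expr, tags, start=()):
    """Node ids matched by expr, starting from the context node `start`."""
    current = {start}
    for sep, name in steps(expr):
        current = {
            nid for nid, tag in tags.items()
            if name in ("*", tag) and (
                nid[:-1] in current if sep == "/"
                else any(nid[:k] in current for k in range(len(nid)))
            )
        }
    return current

def prefix_pairs(q):
    """Yield (alternatives of the i-prefix, suffix); () stands for phi."""
    st = steps(q)
    for i, (sep, _) in enumerate(st):
        head = "".join(a + b for a, b in st[:i])
        if i == 0:
            alts = () if sep == "/" else ("//*",)
        else:
            alts = (head,) if sep == "/" else (head, head + "//*")
        yield alts, "".join(a + b for a, b in st[i:])

def powerset(items):
    return [c for k in range(len(items) + 1) for c in combinations(items, k)]

def dedup_retrieve(queries, tags):
    sol = {q: evaluate(q, tags) for q in queries}
    pairs = [(alts, q, suf) for q in queries for alts, suf in prefix_pairs(q)]
    P = list(dict.fromkeys(alts for alts, _, _ in pairs))
    psol = {alts: set().union(*(evaluate(a, tags) for a in alts)) for alts in P}
    E = set().union(*(evaluate(q + "//*", tags) for q in queries))
    retrieved, rebuilt = [], {q: set() for q in queries}
    for T in powerset(queries)[1:]:          # every non-empty T
        for R in powerset(P):                # every R
            v = set(tags) - E                # drop descendants of solutions
            for q in queries:
                v = v & sol[q] if q in T else v - sol[q]
            for alts in P:
                v = v & psol[alts] if alts in R else v - psol[alts]
            retrieved.append(v)
            for node in v:                   # apply the terms Q <- (V, *pp)
                for q in T:
                    rebuilt[q].add(node)
                for alts, q, suf in pairs:
                    if alts in R:
                        rebuilt[q] |= evaluate(suf, tags, node)
    return retrieved, rebuilt

tags = {(0,): "b", (0, 0): "a", (0, 0, 0): "c", (0, 1): "d", (0, 1, 0): "a"}
queries = ["//a", "/b//*"]
retrieved, rebuilt = dedup_retrieve(queries, tags)
roots = sorted(n for v in retrieved for n in v)
print(roots)
print(all(rebuilt[q] == evaluate(q, tags) for q in queries))
```

On this sample only two subtree roots are retrieved, neither nested in the other, and both original solution sets are recovered from them. The sketch enumerates every (T, R) pair without the emptiness pruning discussed in the text, so it is suitable only for a handful of short expressions.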

The same duplication problem also arises when only a single search expression is given, if that expression contains "//". As an example, suppose the search expression
//a/*/*
is given for the data shown in FIG. 13. The solution in this case is as shown in FIG. 14. As the figure shows, however, the subtree rooted at f appears twice in this solution: once as a solution in its own right and once as part of another solution.
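A single expression overlapping with itself can be reproduced on a small chain of nodes. The sketch below is illustrative only: the sample data is hypothetical (it is not FIG. 13, although it also uses a node named f), and `evaluate` and `nested` are names introduced here.

```python
import re

def evaluate(expr, tags):
    """Node ids matched by expr; ids are tuples of child indexes."""
    current = {()}
    for sep, name in re.findall(r"(//|/)([^/]+)", expr):
        current = {
            nid for nid, tag in tags.items()
            if name in ("*", tag) and (
                nid[:-1] in current if sep == "/"
                else any(nid[:k] in current for k in range(len(nid)))
            )
        }
    return current

def nested(solutions):
    """Solutions that lie inside the subtree of another solution."""
    return {n for n in solutions
            if any(n[:k] in solutions for k in range(1, len(n)))}

# Hypothetical chain: a ─ a ─ c ─ f
tags = {(0,): "a", (0, 0): "a", (0, 0, 0): "c", (0, 0, 0, 0): "f"}

solutions = evaluate("//a/*/*", tags)
duplicated = nested(solutions)
print(sorted(solutions))
print([tags[n] for n in sorted(duplicated)])
```

Here the outer a reaches c as its grandchild and the inner a reaches f, so f is returned both as a solution and again inside the subtree rooted at c.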

We therefore consider composing search expressions by the procedure above. First, let the search expression above be Q1. Since there is only one search expression, the only choice of T is T = {Q1}. The set of prefixes is {//*, //a, //a/*}, which has eight subsets. As in the previous example, however, any search expression generated with //* not in R necessarily has an empty solution set. Hence only the following four choices of R need to be considered,
R = {//*, //a, //a/*}
R = {//*, //a}
R = {//*, //a/*}
R = {//*}
and the following four search expressions are generated.
V21: //a/*/* ∩ //* ∩ //a ∩ //a/* − //a/*/*//*
V22: //a/*/* ∩ //* ∩ //a − //a/* − //a/*/*//*
V23: //a/*/* ∩ //* ∩ //a/* − //a − //a/*/*//*
V24: //a/*/* ∩ //* − //a − //a/* − //a/*/*//*
In addition, the following terms are generated, which state how the solutions of the original Q1 are extracted from these:
Q1←(Vi, *) where i = 21, 22, 23, 24
Q1←(Vi, *//a/*/*) where i = 21, 22, 23, 24
Q1←(Vi, */*/*) where i = 21, 22
Q1←(Vi, */*) where i = 21, 23
The //a/*/* in the second term is the suffix corresponding to the 0-prefix //*; likewise, /*/* and /* in the third and fourth terms are the suffixes for the 1-prefix and the 2-prefix.

In fact, the search expressions V21 to V24 can be simplified further as follows. The search expression synthesizer may perform such simplification but, as noted earlier, it is not described in detail here.
V21: //a/a/a − //a/*/*//*
V22: //a/*/a − //a/* − //a/*/*//*
V23: //a/a/* − //a − //a/*/*//*
V24: //a/*/* − //a − //a/* − //a/*/*//*
When these search expressions are executed on the data shown in FIG. 13, the solutions of V21 and V22 are empty, and the solutions of V23 and V24 are as shown in FIG. 15. As FIG. 15 shows, these solution sets contain no duplicated data at all. Executing the terms above on these results correctly yields the solutions of Q1.

In this way, four search expressions, together with the method of extracting the solutions of the original search expression from their solutions, are generated for the single search expression originally given. The solution sets of these four search expressions share no elements, and no element of one solution set appears again as a subtree of an element of the same or another solution set. At the same time, the four solution sets contain all the data needed to assemble the solution set of the original search expression, which can be assembled from them by the extraction method shown above.

The reason why the search expressions created by the above procedure satisfy these properties for an arbitrary set of search expressions can be explained in much the same way as for the earlier procedure without "//".

In the embodiments described in this specification, the processing for applying the present invention to a system that extracts sets of subtrees from tree-structured data has been explained for the case where search expressions are given as path expressions. The invention can, however, be applied on the same principles when retrieval requests are given in various other forms.

(Sixth embodiment)
Next, a sixth embodiment of the present invention is described. This description covers an embodiment in which the invention is applied to a proxy server for an XML data continuous query system published on the Internet.

FIG. 16 shows the overall configuration of the XML data continuous query system and the proxy server in this embodiment. In this embodiment, an XML data continuous query system 11 is assumed to be published on the Internet 12. An XML data continuous query system stores XML data and, on receiving a search expression registration request from a user, registers the search expression persistently. It evaluates the registered search expressions periodically (for example, at fixed intervals) and, each time a search expression is evaluated, sends the result to the user who registered that query. Such systems are used for news notification services, stock price notification systems, and the like. It is further assumed that a plurality of users 14 of the XML data continuous query system 11 exist on a local network 13, such as a company network, connected to the Internet 12.

In this environment, the present embodiment places a proxy server 15 on the local network 13, and each user 14 on the local network 13 registers search expressions with the proxy server 15 instead of registering them directly with the XML data continuous query system 11. From the search expressions registered by the users 14, the proxy server 15 composes new search expressions by the same method as in the fifth embodiment and registers the composed expressions with the XML data continuous query system 11. Each time it periodically receives results from the XML data continuous query system 11, it uses them to assemble the solutions of the search expressions registered by each user 14 and sends these to each user 14.

With this arrangement, when multiple users register many search expressions whose solutions partly overlap, the proxy server 15 replaces them with queries whose solutions contain no duplicates and registers those with the continuous query system 11. The results are sent to the proxy server 15 over the Internet 12. The proxy server 15 assembles from them the solutions of the original search expressions, which do partly overlap, and sends these to each user 14 over the local network 13. Compared with each user 14 registering search expressions directly with the continuous query system 11, this eliminates the transmission of duplicate data over the Internet 12 and reduces communication costs on the Internet 12.

FIG. 1: Overall configuration of the system in the fifth embodiment
FIG. 2: Overall configuration of the system in the first and third embodiments
FIG. 3: Venn diagram showing the relationship among the three solution sets A1, A2, A3
FIG. 4: Overall configuration of the system in the second embodiment
FIG. 5: Example of XML data (part 1)
FIG. 6: Solution sets for /a/b, /a/c/*, /a/*/d
FIG. 7: Data obtained by removing duplicates from the solution sets of FIG. 6
FIG. 8: Solution set for /a/b//*
FIG. 9: Overall configuration of the system in the fourth embodiment
FIG. 10: Example of XML data (part 2)
FIG. 11: Solution sets for //a, /b//*
FIG. 12: Solution sets for V11, V15
FIG. 13: Example of XML data (part 3)
FIG. 14: Solution set for //a/*/*
FIG. 15: Solution sets for V23, V24
FIG. 16: Configuration of the entire system environment in the sixth embodiment

Explanation of reference numerals

1 input device, 2 search expression synthesizer, 3 retrieval device, 4 storage device,
5 search solution synthesizer, 6 output device, 7 server, 8 network, 9 client, 10 duplicate removal device,
11 XML data continuous query system, 12 Internet,
13 local network, 14 user, 15 proxy server

Claims

1. A data management program for operating a computer connected to a storage device that stores one or more items of tree-structured data, an input device that receives data search expressions, and an output device that outputs data, wherein,
when a plurality of search expressions, each retrieving a set of subtrees at various depths in the tree-structured data stored in the storage device, are given to the input device in the form of path expressions,
for every combination of a non-empty subset of the set of all the given path expressions and a subset of the set of all prefixes of the given path expressions, the program synthesizes a path expression corresponding to the difference obtained by taking the intersection of all path expressions contained in the former subset and all prefixes contained in the latter subset,
subtracting from it the union of all path expressions not contained in the former subset and all prefixes not contained in the latter subset,
and further removing all descendants (not including the data itself) of data matching the union of all the given path expressions;
for each of these synthesized path expressions, the program extracts from the storage device the set of subtrees forming its solution set,
thereby generating a group of sets, each corresponding to one class obtained by classifying all subtrees that are a solution of one or more of the given path expressions and are not a subtree forming part of another solution of any of the path expressions, according to which path expressions' solution sets contain them and which do not, and which prefixes of which path expressions they match and which they do not; and
the program outputs this group of sets to the output device.

2. A data management program for operating a computer connected to a storage device that stores one or more items of tree-structured data, an input device that receives data search expressions, and an output device that outputs data, wherein,
when a search expression that retrieves a set of subtrees at various depths in the tree-structured data stored in the storage device is given to the input device in the form of a path expression,
for every subset of the set of all prefixes of the path expression, the program synthesizes a path expression corresponding to the difference obtained by taking the intersection of the given path expression and all prefixes contained in the subset,
subtracting from it the union of all prefixes not contained in the subset,
and further removing all descendants (not including the data itself) of data matching the given path expression;
for each of these synthesized path expressions, the program extracts from the storage device the set of subtrees forming its solution set,
thereby generating a group of sets, each corresponding to one class obtained by classifying all subtrees that are a solution of the given path expression and are not a subtree forming part of another solution of the path expression, according to which prefixes of the path expression they match and which they do not; and
the program outputs this group of sets to the output device.

3. A data management program for operating a computer connected to a storage device that stores one or more items of tree-structured data, an input device that receives data search expressions, and an output device that outputs data, wherein,
when a plurality of search expressions, each retrieving a set of subtrees at various depths in the tree-structured data stored in the storage device, are given to the input device in the form of path expressions,
the program first extracts from the storage device, using each of the plurality of path expressions, the set of subtrees forming its solution set;
then removes, from among the subtrees that are a solution of one or more of the given path expressions, those that also appear as a subtree forming part of another solution of any of the path expressions, examines for every remaining subtree which path expressions' solution sets contain it and which do not, and which prefixes of which path expressions it matches and which it does not, classifies the subtrees on the basis of these two results, and generates a group of sets, each corresponding to one class; and
outputs this group of sets to the output device.

4. A data management program for operating a computer connected to a storage device that stores one or more items of tree-structured data, an input device that receives data search expressions, and an output device that outputs data, wherein,
when a search expression that retrieves a set of subtrees at various depths in the tree-structured data stored in the storage device is given to the input device in the form of a path expression,
the program first extracts from the storage device, using the path expression, the set of subtrees forming its solution set;
then removes, from among the subtrees that are a solution of the path expression, those that also appear as a subtree forming part of another solution of the path expression, examines for every remaining subtree which prefixes of the path expression it matches and which it does not, classifies the subtrees on the basis of the result, and generates a group of sets, each corresponding to one class; and
outputs this group of sets to the output device.

5. A data management program for operating a computer connected to an input device that receives sets of data, wherein
the program receives through the input device the group of subtree sets, each representing one class, output by the data management program according to claim 1 or 3, and assembles the solution set of each of the plurality of path expressions originally given to that data management program by taking the union of all classes designated as contained in the solution set of that path expression and all sets of subtrees extracted, from the subtrees in each class designated as matching a prefix of that path expression, using the suffix paired with that prefix in the path expression.

6. A data management program for operating a computer connected to an input device that receives sets of data, wherein
the program receives through the input device the group of subtree sets, each representing one class, output by the data management program according to claim 2 or 4, and assembles the solution set of the path expression originally given to that data management program by taking the union of all the classes and all sets of subtrees extracted, from the subtrees in each class designated as matching a prefix of the path expression, using the suffix paired with that prefix in the path expression.

A data management program for operating a computer connected to a storage device that stores a plurality of data, an input device that receives a data retrieval formula, and an output device that outputs data, the program being stored in the storage device When a plurality of retrieval expressions for retrieving a subset of existing data are given to the input device,
For all non-empty subsets of the set of all the multiple search expressions,
A search expression is synthesized that corresponds to the difference between the product of all the search expressions included in the subset and the sum of all the search expressions not included in the subset,
For each of these composite search expressions, the data set that is its solution set is extracted from the storage device,
So that all data that are solutions of one or more of the given search expressions are classified according to which search expressions' solution sets contain them and which do not, and a set group corresponding to each classification is generated,
A data management program for operating a computer so as to output these set groups to the output device.
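The composite search expressions of this claim can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: search expressions are modelled as Python predicates, the storage device as an in-memory collection, and the names `composite_expressions` and `classify` are hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Predicate = Callable[[Any], bool]  # assumption: a search expression is a predicate


def composite_expressions(
    exprs: List[Predicate],
) -> Iterator[Tuple[FrozenSet[int], Predicate]]:
    """For each non-empty subset S, build (product of S) minus (sum of the rest)."""
    n = len(exprs)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            inside = frozenset(subset)

            def composite(item: Any, inside: FrozenSet[int] = inside) -> bool:
                in_all = all(exprs[i](item) for i in inside)
                in_other = any(exprs[j](item) for j in range(n) if j not in inside)
                return in_all and not in_other

            yield inside, composite


def classify(data: Iterable[Any], exprs: List[Predicate]) -> Dict[FrozenSet[int], Set[Any]]:
    """Evaluate every composite expression and keep the non-empty groups."""
    items = list(data)
    groups: Dict[FrozenSet[int], Set[Any]] = {}
    for inside, composite in composite_expressions(exprs):
        solution = {item for item in items if composite(item)}
        if solution:
            groups[inside] = solution
    return groups


# Example: "even numbers" and "multiples of three" over 1..12.
exprs = [lambda x: x % 2 == 0, lambda x: x % 3 == 0]
groups = classify(range(1, 13), exprs)
```

Each item lands in exactly one group, so values such as 6 and 12, which satisfy both expressions, are output once rather than twice.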

A data management program for operating a computer connected to an input device for receiving a set of data,
Wherein the set group representing each of the classifications output by the data management program according to claim 7 is received by the input device, and the computer is operated so that the solution set for each of the plurality of search expressions initially given to that data management program is synthesized by taking the union of all the classifications determined to be included in the solution set of the search expression.
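On the receiving side, the synthesis described in this claim amounts to a union over the groups. The sketch below assumes each group is keyed by the set of indices of the search expressions whose solution sets contain it; this keying and the name `reconstruct_solution` are illustrative assumptions.

```python
from typing import Any, Dict, FrozenSet, Set


def reconstruct_solution(groups: Dict[FrozenSet[int], Set[Any]], index: int) -> Set[Any]:
    """Union of every group classified as belonging to expression `index`."""
    result: Set[Any] = set()
    for signature, members in groups.items():
        if index in signature:
            result |= members
    return result


# Example groups for two expressions: 0 = "even", 1 = "multiple of three".
groups = {
    frozenset({0}): {2, 4, 8, 10},
    frozenset({1}): {3, 9},
    frozenset({0, 1}): {6, 12},
}
evens = reconstruct_solution(groups, 0)
threes = reconstruct_solution(groups, 1)
```

The shared group `{6, 12}` is received once and reused when rebuilding both solution sets.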

A data management program for operating a computer connected to a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, and an output device for outputting data,
When a plurality of retrieval formulas for retrieving a set of subtrees at various levels in the tree structure data stored in the storage device are given to the input device,
After extracting from the storage device a subtree set that is a solution set for each of the plurality of search expressions using the plurality of search expressions,
When a subtree that is a solution of any one of the plurality of search expressions also appears as a subtree forming part of another solution of any of the search expressions, a data management program for operating a computer so as to remove from the solution set the subtree that appears as part of another solution, and then output each solution set to the output device.
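The removal of nested solutions can be sketched as below. As an assumption made only for this example, each subtree is identified by the tuple of child positions leading from the root to its root node, so that one subtree lies inside another exactly when its identifier extends the other's; `remove_nested` is a hypothetical name.

```python
from typing import List, Set, Tuple

NodeId = Tuple[int, ...]  # assumption: child positions from the root to the node


def is_proper_descendant(node: NodeId, ancestor: NodeId) -> bool:
    """True when `node` lies strictly inside the subtree rooted at `ancestor`."""
    return len(node) > len(ancestor) and node[:len(ancestor)] == ancestor


def remove_nested(solution_sets: List[Set[NodeId]]) -> List[Set[NodeId]]:
    """Drop every solution that is contained in another solution of any expression."""
    all_solutions: Set[NodeId] = set().union(*solution_sets)
    return [
        {
            node
            for node in solutions
            if not any(is_proper_descendant(node, other) for other in all_solutions)
        }
        for solutions in solution_sets
    ]


# Example: (0, 1) and (0, 1, 2) both lie inside the subtree rooted at (0,).
cleaned = remove_nested([{(0,), (0, 1)}, {(0, 1, 2), (3,)}])
```

The pairwise check is quadratic in the number of solutions; it is chosen here for clarity, not efficiency.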

A data management program for operating a computer connected to a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, and an output device for outputting data,
When a plurality of retrieval formulas for retrieving a set of subtrees at various levels in the tree structure data stored in the storage device are given to the input device,
Determine an appropriate total order for all of the given multiple search expressions,
For each of the multiple search expressions,
From the difference obtained by removing, from the search expression, data that matches any search expression preceding it in the order,
Further, a search expression corresponding to a difference obtained by removing all descendants (not including itself) of data matching any of the given multiple search expressions is synthesized,
For each of these composite search expressions, a subtree set as a solution set is extracted from the storage device,
A data management program characterized in that a computer is operated so that all subtrees that are solutions of one or more of the given search expressions, and that are not subtrees forming part of another solution of any of the search expressions, are extracted without duplication and output to the output device.
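The effect of the total order can be sketched as follows, assuming the list order of the solution sets stands in for the chosen total order and that subtrees are identified by tuples of child positions from the root. The claim synthesizes composite search expressions before extraction; this example instead filters already-extracted sets, which is a simplification, and `extract_without_duplication` is a hypothetical name.

```python
from typing import List, Set, Tuple

NodeId = Tuple[int, ...]  # assumption: child positions from the root to the node


def is_nested(node: NodeId, solutions: Set[NodeId]) -> bool:
    """True when `node` is a proper descendant of some solution."""
    return any(
        len(node) > len(other) and node[:len(other)] == other for other in solutions
    )


def extract_without_duplication(solution_sets: List[Set[NodeId]]) -> List[Set[NodeId]]:
    """Keep each non-nested solution only under the earliest expression matching it."""
    all_solutions: Set[NodeId] = set().union(*solution_sets)
    seen: Set[NodeId] = set()
    result: List[Set[NodeId]] = []
    for solutions in solution_sets:  # list order plays the role of the total order
        kept = {
            node
            for node in solutions
            if node not in seen and not is_nested(node, all_solutions)
        }
        result.append(kept)
        seen |= solutions
    return result


# Example: (2,) matches both expressions; (0, 1) and (0, 1, 5) lie inside (0,).
unique = extract_without_duplication(
    [{(0,), (0, 1), (2,)}, {(2,), (3,), (0, 1, 5)}]
)
```

Any total order yields the same overall union; the order only decides which expression a shared subtree is attributed to.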

A data management system comprising a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, an output device for outputting data, and a computer connected thereto,
When a plurality of search expressions for retrieving a set of subtrees at various depths in the tree structure data stored in the storage device are given to the input device in the form of a path expression,
For every combination of a non-empty subset of the set of all the given path expressions and an arbitrary subset of the set of all prefixes of the given path expressions, the product of all the path expressions contained in the former subset and all the prefixes contained in the latter subset is taken,
The sum of all the path expressions not included in the former subset and all the prefixes not included in the latter subset is taken,
From the difference between this product and this sum,
A path expression is synthesized that corresponds to the further difference obtained by removing all descendants (not including the data itself) of data matching the sum of all the given path expressions,
For each of these composite path expressions, the subtree set that is its solution set is extracted from the storage device,
So that all subtrees that are solutions of one or more of the given path expressions, and that are not subtrees forming part of another solution of any of those path expressions, are classified according to which path expressions' solution sets include them and which do not, and which prefixes of which path expressions they match and which they do not, and a set group corresponding to each classification is generated,
A data management system, wherein a computer operates so as to output these set groups to the output device.
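The combinations enumerated in this claim can be listed with a short sketch. The path expressions and prefixes are represented here only by placeholder strings, and `enumerate_combinations` is a hypothetical name; the point is the number of composite path expressions, which for n path expressions and m prefixes is (2^n - 1) x 2^m.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterator, Sequence, Tuple


def subsets(items: Sequence[str], min_size: int = 0) -> Iterator[Tuple[str, ...]]:
    """All subsets of `items` with at least `min_size` elements."""
    return chain.from_iterable(
        combinations(items, k) for k in range(min_size, len(items) + 1)
    )


def enumerate_combinations(
    paths: Sequence[str], prefixes: Sequence[str]
) -> Iterator[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Pair every non-empty subset of paths with every subset of prefixes."""
    for chosen_paths in subsets(paths, min_size=1):
        for chosen_prefixes in subsets(prefixes):
            yield frozenset(chosen_paths), frozenset(chosen_prefixes)


# Placeholder names: two path expressions and three prefixes.
combos = list(enumerate_combinations(["p1", "p2"], ["q1", "q2", "q3"]))
```

With two path expressions and three prefixes this gives 3 x 8 = 24 combinations, which shows how quickly the number of composite expressions grows.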

A data management system comprising a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, an output device for outputting data, and a computer connected thereto,
When a search expression for retrieving a set of subtrees at various depths in the tree structure data stored in the storage device is given in the form of a path expression to the input device,
For all subsets of the set of all prefixes of the path expression,
The product of the given path expression and all of the prefixes contained in the subset;
The sum of all the prefixes not included in the subset,
From the difference between these products and sums,
Further, a path expression corresponding to the difference obtained by removing all descendants (not including itself) of data matching the given path expression is synthesized,
For each of these composite path expressions, the subtree set that is its solution set is extracted from the storage device,
So that all subtrees that are solutions of the given path expression, and that are not subtrees forming part of another solution of the path expression, are classified according to which prefixes of the path expression they match and which they do not, and a set group corresponding to each classification is generated,
A data management system, wherein a computer operates so as to output these set groups to the output device.

A data management system comprising a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, an output device for outputting data, and a computer connected thereto,
When a plurality of search expressions for retrieving a set of subtrees at various depths in the tree structure data stored in the storage device are given to the input device in the form of a path expression,
After extracting from the storage device, using each of the given path expressions, the subtree set that is its solution set,
Among the subtrees that are solutions of one or more of the given path expressions, those that also appear as subtrees forming part of another solution of any of the path expressions are removed; every remaining subtree is examined to determine which path expressions' solution sets include it and which do not, and which prefixes of which path expressions it matches and which it does not; the subtrees are classified based on these two points, and a set group corresponding to each of these classifications is generated,
A data management system, wherein a computer operates so as to output these set groups to the output device.

A data management system comprising a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, an output device for outputting data, and a computer connected thereto,
When a search expression for retrieving a set of subtrees at various depths in the tree structure data stored in the storage device is given in the form of a path expression to the input device,
After the subtree set that is the solution set is extracted from the storage device using the path expression,
Among the subtrees that are solutions of the path expression, those that also appear as subtrees forming part of other solutions of the path expression are removed; for every remaining subtree, it is determined which prefixes of the path expression it matches and which it does not, the subtrees are classified based on the result, and a set group corresponding to each of these classifications is generated,
A data management system, wherein a computer operates so as to output these set groups to the output device.

A data management system comprising an input device for receiving a set of data and a computer connected to the input device,
Wherein the subtree set group representing each classification output by the data management system according to claim 11 or 13 is received by the input device, and the computer operates so that the solution set for each of the plurality of path expressions initially given to that data management system is synthesized by taking the union of all classifications determined to be included in the solution set of the path expression, together with the sets of subtrees extracted, from the subtrees in each classification determined to match a prefix of the path expression, by using the suffix paired with that prefix in the path expression.

A data management system comprising an input device for receiving a set of data and a computer connected to the input device,
Wherein the subtree set group representing each classification output by the data management system according to claim 12 or 14 is received by the input device, and the computer operates so that the solution set of the path expression initially given to that data management system is synthesized by taking the union of all the classifications, together with the sets of subtrees extracted, from the subtrees in each classification determined to match a prefix of the path expression, by using the suffix paired with that prefix in the path expression.

A data management system comprising a storage device for storing a plurality of data, an input device for receiving a data retrieval formula, an output device for outputting data, and a computer connected thereto,
When a plurality of retrieval formulas for retrieving a subset of data stored in the storage device are given to the input device,
For all non-empty subsets of the set of all the multiple search expressions,
A search expression is synthesized that corresponds to the difference between the product of all the search expressions included in the subset and the sum of all the search expressions not included in the subset,
For each of these composite search expressions, the data set that is its solution set is extracted from the storage device,
So that all data that are solutions of one or more of the given search expressions are classified according to which search expressions' solution sets contain them and which do not, and a set group corresponding to each classification is generated,
A data management system, wherein a computer operates so as to output these set groups to the output device.

A data management system comprising an input device for receiving a set of data and a computer connected to the input device,
Wherein the set group representing each of the classifications output by the data management system according to claim 17 is received by the input device, and the computer operates so that the solution set for each of the plurality of search expressions initially given to that data management system is synthesized by taking the union of all the classifications determined to be included in the solution set of the search expression.

A data management system comprising a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, an output device for outputting data, and a computer connected thereto,
When a plurality of retrieval formulas for retrieving a set of subtrees at various levels in the tree structure data stored in the storage device are given to the input device,
After extracting from the storage device a subtree set that is a solution set for each of the plurality of search expressions using the plurality of search expressions,
When a subtree that is a solution of any one of the plurality of search expressions also appears as a subtree forming part of another solution of any of the search expressions, a data management system wherein a computer operates so as to remove from the solution set the subtree that appears as part of another solution, and then output each solution set to the output device.

A data management system comprising a storage device for storing one or more tree-structured data, an input device for receiving a data retrieval formula, an output device for outputting data, and a computer connected thereto,
When a plurality of retrieval formulas for retrieving a set of subtrees at various levels in the tree structure data stored in the storage device are given to the input device,
Determine an appropriate total order for all of the given multiple search expressions,
For each of the multiple search expressions,
From the difference obtained by removing, from the search expression, data that matches any search expression preceding it in the order,
Further, a search expression corresponding to a difference obtained by removing all descendants (not including itself) of data matching any of the given multiple search expressions is synthesized,
For each of these composite search expressions, a subtree set as a solution set is extracted from the storage device,
A data management system in which a computer operates so that all subtrees that are solutions of one or more of the given search expressions, and that are not subtrees forming part of another solution of any of the search expressions, are extracted without duplication and output to the output device.