JP4550876B2 - Structured document retrieval system and program

Info

Publication number: JP4550876B2
Application number: JP2007258085A
Authority: JP
Inventors: 基起中西
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-10-01
Filing date: 2007-10-01
Publication date: 2010-09-22
Anticipated expiration: 2027-10-01
Also published as: JP2009087162A

Description

The present invention relates to a structured document retrieval system that searches structured documents stored in a database, and more particularly to a structured document retrieval system and program suited to searches that specify a plurality of equivalence conditions.

Many database management systems provide an "equivalence search" function as part of their search functionality. An equivalence search specifies a search keyword together with the character types or character strings to be treated as equivalent to it. Commonly used equivalence conditions include the following.

(1) Alphanumeric equivalence: treats uppercase and lowercase alphanumeric characters as equivalent (denoted case), and treats full-width and half-width characters as equivalent (denoted width).

(2) Kana equivalence: treats hiragana and katakana as equivalent (denoted kana), and treats normal-size and small forms of hiragana and katakana as equivalent (denoted kanacase).

(3) Variant-character equivalence: treats variant forms of kana and kanji as equivalent.

(4) Synonym equivalence: treats synonyms as equivalent.

For example, when the search keyword is "あA", specifying the alphanumeric equivalence condition also matches "あa", "あＡ", and "あａ". Specifying the kana equivalence condition also matches "アA", "ぁA", and "ァA". When several equivalence conditions are specified together, still more terms match.

Thus, an equivalence search matches not only terms that exactly match the search keyword but also terms that satisfy the equivalence conditions. This makes it convenient for searches that must tolerate spelling and notation variants. However, because the terms produced by expanding the keyword according to the equivalence conditions (so-called equivalence expansion) are also searched, the processing cost of the search increases. Patent Documents 1 and 2, for example, propose solving this problem through the design of the index information. Patent Document 1 proposes reducing the processing cost by using separate index information for searches that apply equivalence and searches that do not. Patent Document 2 proposes reducing the processing cost by devising the hash values so that terms treated as equivalent are managed as the same index information.
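The idea of managing equivalent terms under one index entry can be illustrated with a normalization key. The Python sketch below is not the method of Patent Document 2, whose details are not given here. It swaps in Unicode NFKC folding, lowercasing, and two small hand-written kana tables, and the name equivalence_key is hypothetical.

```python
import unicodedata

# Assumed tables for illustration only; a real system would cover more characters.
# Katakana U+30A1..U+30F6 map onto hiragana U+3041..U+3096 by a fixed offset.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")

def equivalence_key(term):
    """Fold width, case, kana type, and kana size into one comparison key."""
    folded = unicodedata.normalize("NFKC", term).lower()  # width and case
    folded = folded.translate(KATAKANA_TO_HIRAGANA)       # katakana -> hiragana
    return folded.translate(SMALL_TO_LARGE)               # small kana -> normal size

# Terms that an equivalence search treats as the same share one key.
print(equivalence_key("ァＡ") == equivalence_key("あA"))
```

With such a key, the index is probed once per keyword instead of once per expanded variant, at the price of losing the ability to switch individual conditions off at query time.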

In recent years, documents with a logical structure (that is, structured documents) have come to be managed by database management systems. In a structured document, the logical structure is indicated by tags written in the document. A structured document whose logical structure is expressed with such tags is well suited to computer processing.

XML (Extensible Markup Language) is widely used as a means of describing data with tags. XML features hierarchical organization of data through semantically meaningful tags and a freely extensible structure. A database called an XML database (XMLDB) is known as a technology that takes advantage of these features. An XML database is controlled by a database management system called an XML database management system (XMLDBMS), which stores XML documents and provides functions for searching XML documents (or specified structures within them). Equivalence search can also be applied in such a database management system to search for XML documents (structured documents) that satisfy the equivalence conditions.

Standardized query languages are commonly used to search XML documents (and the elements within them). The main query languages are XPath and XQuery. XPath is used to search by specifying the location of an element (node) in an XML document.
JP 2004-199282 A; JP 2006-106896 A

In an equivalence search, it is sometimes necessary to switch each equivalence condition on or off individually, but in other cases it is necessary to turn all equivalence conditions on or off together. A Web search service is an example of the latter.

In the prior art, the equivalence conditions must be specified individually even in the latter case. Writing the query therefore takes effort, and the query becomes long and hard to interpret. The user also needs to know the entire query grammar for specifying equivalence conditions.

A concrete example is given using XQuery. For convenience, assume that there are only two equivalence conditions, alphanumeric equivalence and kana equivalence, written as "case/width (in)sensitive" and "kana/kanacase (in)sensitive", respectively.

To specify that all equivalence conditions apply, the query (search expression) is, for example,
/item/name [./text() ftcontains “社員Aさん” case insensitive width insensitive kana insensitive kanacase insensitive]
Here, "ftcontains" denotes the XQuery full-text search operator. The parts "case insensitive", "width insensitive", "kana insensitive", and "kanacase insensitive" specify the equivalence conditions, and these four specifications make the query long.

Meanwhile, in an equivalence search the search keyword is expanded as the equivalence conditions dictate, regardless of whether the expanded terms actually produce hits. The more equivalence conditions are specified together, the larger the number of expansions and the higher the processing cost of the search. For example, for the two-character search keyword "あA", which contains a hiragana character and an alphanumeric character, specifying alphanumeric equivalence alone or kana equivalence alone yields 4 expansions. Specifying both together increases the number to 16.
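The counts of 4 and 16 can be reproduced with a small Python sketch. The variant tables below are assumptions that cover only the two characters of the example keyword, and the function name expand is hypothetical.

```python
from itertools import product

# Assumed variant tables covering only the characters of the example keyword.
ALNUM_VARIANTS = {"A": ["A", "a", "Ａ", "ａ"]}    # case and width
KANA_VARIANTS = {"あ": ["あ", "ア", "ぁ", "ァ"]}   # kana and kanacase

def expand(keyword, tables):
    """Return every variant of keyword under the given equivalence tables."""
    choices = []
    for ch in keyword:
        variants = [ch]  # a character with no table entry stays as it is
        for table in tables:
            if ch in table:
                variants = table[ch]
        choices.append(variants)
    # The Cartesian product of the per-character variants gives all expansions.
    return ["".join(chars) for chars in product(*choices)]

alnum_only = expand("あA", [ALNUM_VARIANTS])
kana_only = expand("あA", [KANA_VARIANTS])
both = expand("あA", [ALNUM_VARIANTS, KANA_VARIANTS])
print(len(alnum_only), len(kana_only), len(both))  # 4 4 16
```

The number of expansions is the product of the per-character variant counts, which is why combining conditions multiplies the cost rather than adding to it.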

Patent Documents 1 and 2 propose methods for reducing the processing cost. Even in those methods, however, the search keyword is expanded according to the equivalence conditions. When none of the expanded terms produces a hit, the processing cost spent on the expansion is wasted.

The present invention has been made in view of the above circumstances. Its object is to provide a structured document retrieval system and program that can perform an equivalence search under a plurality of equivalence conditions based on a search expression, even when that search expression does not spell out those conditions.

According to one aspect of the present invention, a structured document retrieval system having a database that stores a plurality of structured documents is provided. The system comprises: equivalence condition information storage means for storing a plurality of equivalence conditions for treating terms as equivalent to a search keyword; search processing means for analyzing a search expression contained in a search request from a client terminal and performing, based on that search expression, search processing on the structured documents stored in the database; and equivalence expansion means for expanding the search keyword contained in the search expression based on all of the equivalence conditions stored in the equivalence condition information storage means when the search expression contains a collective equivalence condition that specifies all equivalence conditions at once. Based on the search keyword contained in the search expression and the expanded search keywords, the search processing means searches the database for structured documents containing a structure that matches the search condition indicated by the search expression.

According to the present invention, merely including (writing) the collective equivalence condition in a search expression makes it possible to perform an equivalence search under all of the equivalence conditions stored in the equivalence condition information storage means, as if every one of those conditions had been written in the search expression. This makes the search expression easier for the user to write and, because the expression is shorter, easier to interpret. The user can also request, from a client terminal, a search that uses all of those equivalence conditions without knowing each condition that the collective equivalence condition specifies automatically. Furthermore, the search expression need not be changed even when the conditions specified automatically by the collective equivalence condition increase, that is, even when a new equivalence condition is added to the equivalence condition information storage means.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a block diagram showing the hardware configuration of a client-server system that includes a structured document retrieval system 50 according to one embodiment of the present invention. The client-server system mainly consists of a database server 10 and a plurality of client terminals (clients), which include a client terminal 20. An application (application program) that uses the database server 10 runs on the client terminal 20. The client terminals, including the client terminal 20, are connected to the database server 10 via a network 30 such as a local area network (LAN).

Client terminals other than the client terminal 20 are omitted from FIG. 1. The following description therefore assumes that only the client terminal 20 uses the structured document retrieval system 50, but the other client terminals can of course use it in the same way.

The database server 10 is a computer (database server computer) that has a memory 11 such as a main memory. The database server 10 is connected to an external storage device 40 such as a hard disk drive. The external storage device 40 stores a database management program 41 and a database 42. In this embodiment, the structured document retrieval system 50 is implemented by the database server 10 and the external storage device 40 (specifically, the database 42 stored in it).

The database management program 41 is used for the management of the database 42 by the database server 10 and for search processing based on queries from the client terminal 20 written in, for example, XQuery. The database 42 is an XML document database (structured document database) that stores structured documents such as XML documents (electronic documents in XML format).

FIG. 2 is a block diagram mainly showing the functional configuration of the structured document retrieval system 50 shown in FIG. 1. The structured document retrieval system 50 consists of a database management system 51 and the database 42. In addition to the set of XML documents, the database 42 stores the indexes used to search the XML documents stored in it.

The database management system 51 includes a request processing unit 52, an equivalence condition management unit 53, a search processing unit 54, an index management unit 55, a document registration unit 56, and a database operation unit 57.
The request processing unit 52 functions as an input interface: it accepts a request (command) from the client terminal 20, determines the type of the request, and, based on the result, sends the request to the equivalence condition management unit 53, the search processing unit 54, the index management unit 55, or the document registration unit 56. If the request from the client terminal 20 is an equivalence condition setting request (described in detail in the second modification), the request processing unit 52 sends it to the equivalence condition management unit 53. If it is a search request, the request is sent to the search processing unit 54. If it is an index setting request, the request is sent to the index management unit 55, and if it is a document registration request, the request is sent to the document registration unit 56. The request processing unit 52 also functions as an output interface that returns to the client terminal 20 the response that the equivalence condition management unit 53, the search processing unit 54, the index management unit 55, or the document registration unit 56 produces for a request from that client terminal.
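The routing performed by the request processing unit 52 can be sketched in Python as a lookup from request type to handler. The request type strings, the dictionary layout, and the stub handlers are assumptions made for illustration; the source does not define a wire format.

```python
def make_dispatcher(handlers):
    """Return a function that routes a request to the handler for its type."""
    def dispatch(request):
        kind = request["type"]
        if kind not in handlers:
            raise ValueError(f"unknown request type: {kind!r}")
        return handlers[kind](request)
    return dispatch

# Stub handlers that only report which unit would receive the request.
handlers = {
    "set_equivalence_conditions": lambda req: "equivalence condition management unit 53",
    "search": lambda req: "search processing unit 54",
    "set_index": lambda req: "index management unit 55",
    "register_document": lambda req: "document registration unit 56",
}
dispatch = make_dispatcher(handlers)
print(dispatch({"type": "search"}))  # search processing unit 54
```

In this sketch the value returned by a handler stands in for the response that the request processing unit passes back to the client terminal.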

The equivalence condition management unit 53 manages equivalence conditions, for example the equivalence conditions for each session used for communication between the structured document retrieval system 50 (specifically, the database management system 51 within it) and the client terminal 20. The equivalence condition management unit 53 includes an equivalence condition information storage unit 530. The equivalence condition information storage unit 530 stores equivalence condition information 531, which is used to manage, for each such session, whether each equivalence condition is applied or not. The equivalence condition information storage unit 530 is assumed to be implemented using part of the storage area of the memory 11 of the database server 10. It may instead be implemented using part of the storage area of the external storage device 40.

FIG. 3 shows an example of the equivalence condition information 531. For convenience, the following description assumes that there are only two equivalence conditions: alphanumeric equivalence (case/width) and kana equivalence (kana/kanacase).

In this embodiment, the equivalence condition information 531 is managed in table form, as shown in FIG. 3. Its columns are the session number and each equivalence condition. For each session used for communication with a client terminal (in this embodiment, each connected, that is, established, session), the equivalence condition information 531 holds as table values the session number of that session and, for each equivalence condition, whether it is applied or not. The information for one session, consisting of the session number and the values indicating whether each equivalence condition is applied, is called session-specific equivalence condition information.

While a session is not established, the equivalence condition information specific to that session (session-specific equivalence condition information) is empty. When a session is established and the request processing unit 52 notifies the equivalence condition management unit 53 of this, the equivalence condition management unit 53 adds information about that session to the equivalence condition information 531. At this point, every equivalence condition in the information specific to the established session is set to "apply" (treat as equivalent). In the example of FIG. 3, some sessions (session numbers) are deliberately shown with the opposite setting, "do not apply" (do not treat as equivalent), but in this embodiment all conditions are set to "apply" when a session is established. When a session is disconnected and the request processing unit 52 notifies the equivalence condition management unit 53 of this, the equivalence condition management unit 53 deletes the information about that session.
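The session lifecycle described above can be sketched in Python as a table keyed by session number. The class name, method names, and column names are hypothetical, and the layout is only an assumption about what FIG. 3 depicts.

```python
# Assumed column names for the two conditions used in this description.
CONDITIONS = ("case/width", "kana/kanacase")

class EquivalenceConditionTable:
    """In-memory stand-in for the per-session condition table."""

    def __init__(self):
        self._rows = {}  # session number -> {condition name: applied?}

    def on_session_established(self, session_no):
        # Every condition starts as "apply" when the session is established.
        self._rows[session_no] = {name: True for name in CONDITIONS}

    def on_session_closed(self, session_no):
        # The row is deleted when the session is disconnected.
        self._rows.pop(session_no, None)

    def conditions_for(self, session_no):
        # A session that is not established has no information (empty).
        return dict(self._rows.get(session_no, {}))

table = EquivalenceConditionTable()
table.on_session_established(1)
print(table.conditions_for(1))
table.on_session_closed(1)
print(table.conditions_for(1))
```

Returning a copy from conditions_for keeps callers from changing the stored row by accident, which is a design choice of this sketch rather than something the source specifies.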

Referring again to FIG. 2, the search processing unit 54 receives, via the request processing unit 52, a search request (query) sent from the client terminal 20 and performs search processing according to the search expression contained in the query.

The index management unit 55 manages the indexes used in search processing by the search processing unit 54. When the document registration unit 56 registers an XML document, the index management unit 55 creates indexes associated with the nodes (elements) contained in that XML document and registers them in the database 42.

The document registration unit 56 receives a document registration request from the client terminal 20 via the request processing unit 52 and performs document registration processing, which registers (stores) the XML document designated in the request in the database 42. The database operation unit 57 functions as an interface that allows the search processing unit 54, the index management unit 55, and the document registration unit 56 to access the database (database file) 42, and it performs the operations on the database 42.

In this embodiment, the request processing unit 52, the equivalence condition management unit 53, the search processing unit 54, the index management unit 55, the document registration unit 56, and the database operation unit 57 are implemented by the database server 10 of FIG. 1 loading the database management program 41 stored in the external storage device 40 into the memory 11 of the server 10 and executing it. The program 41 can be stored in advance on a computer-readable storage medium, such as a compact disc or a ROM, and distributed. The program 41 may also be downloaded to the database server 10 via the network 30.

Next, the operation of the structured document retrieval system 50 is described.
<Document registration processing>
First, document registration processing in the structured document retrieval system 50 is described with reference to the flowchart of FIG. 4.

Suppose that the user has performed, on the client terminal 20, an operation that designates an XML document to be registered in the database 42 and instructs its registration. The client terminal 20 then sends, via the network 30, a registration request (document registration request) to the structured document retrieval system 50 to have the designated XML document registered in the database 42 (step S1).

On receiving the registration request from the client terminal 20, the request processing unit 52 passes it to the document registration unit 56 to request registration of the XML document (step S2). On receiving the registration request via the request processing unit 52, the document registration unit 56 starts parsing the XML document designated in the request (step S3).

Each time parsing extracts an element (node) from the XML document, the document registration unit 56 processes that node as follows (step S4).
First, the document registration unit 56 passes information about the extracted node (node information), for example information including the structure of the node (a path representing it), to the index management unit 55 and has it perform index registration processing. When the content (value) of the extracted node (element) is text, the node corresponding to that content, that is, the node immediately below the extracted node, is called a text node. In that case, the node information includes the structure (path) of the node and the value of the text node immediately below it.

On receiving the node information from the document registration unit 56, the index management unit 55 performs index registration processing: it creates an index containing the structure (path) in the node information and adds that index to the database 42 (step S5). In this embodiment, when the node information includes a text node value, the created index contains the pair of the structure (path) and the text node value.

Next, the document registration unit 56 performs document storage processing, which stores the extracted node (that is, part of the XML document) in the database 42 (step S6). Within the database 42, the stored XML documents are positioned below the root node (/) of a single virtual XML document, as partial documents that each form part of the structure of that virtual document.

After the document storage processing (step S6), the document registration unit 56 determines whether parsing of the XML document requested by the client terminal 20 has finished, that is, whether all nodes contained in the XML document have been processed (step S7). If unprocessed nodes remain (step S7), the document registration unit 56 returns to step S4 and continues with the next node. If no unprocessed nodes remain, that is, if parsing of the requested XML document has finished (step S7), the document registration unit 56 ends the document registration processing.

Nodes are extracted from the XML document in the order in which they appear in it. The document registration processing described above therefore processes nodes in their order of appearance in the XML document.

Suppose that the XML document whose registration is requested by the client terminal 20 is
<item>
<id>0001</id>
<name>社員Aさん</name>
</item>

FIG. 5 shows an example of the indexes created when the above XML document is registered in the database 42. In the example of FIG. 5, each index contains the pair of the structure (path) of a node extracted from the registered XML document and the value of the text node immediately below it (in the tree structure of the XML document).

For example, the index for the <id> node contains "/item/id", the structure of the <id> node (element), and "0001", the value of the text node that is the content of the <id> node, that is, the text node immediately below it. Similarly, the index for the <name> node contains "/item/name", the structure of the <name> node, and "社員Aさん", the value of the text node immediately below it. For a node with no text node immediately below it, such as the <item> node, the symbol "-", which denotes an empty string, is used in place of a text node value.
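The (path, text value) pairs described for FIG. 5 can be reproduced with a short Python sketch built on the standard xml.etree.ElementTree module. The function name build_index and the list-of-tuples representation are assumptions; the source does not specify how an index is stored.

```python
import xml.etree.ElementTree as ET

def build_index(xml_text):
    """Return (path, text value) pairs for every element, in document order."""
    root = ET.fromstring(xml_text)
    entries = []

    def visit(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        text = (node.text or "").strip()
        # "-" stands for "no text node immediately below this node".
        entries.append((path, text if text else "-"))
        for child in node:
            visit(child, path)

    visit(root, "")
    return entries

doc = "<item><id>0001</id><name>社員Aさん</name></item>"
index = build_index(doc)
for path, value in index:
    print(path, value)
```

The traversal visits a node before its children, so the entries come out in the order in which the nodes appear in the document, matching the processing order described for document registration.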

<Search processing>
Next, search processing in the structured document retrieval system 50 is described with reference to the flowchart of FIG. 6.

Suppose now that the user operates the client terminal 20 and that a search request is thereby sent from the client terminal 20 to the structured document retrieval system 50 via the network 30, using the currently established session (step S11). This search request contains a search expression.

On receiving the search request from the client terminal 20, the request processing unit 52 passes it to the search processing unit 54 and requests a search (step S12). The search processing unit 54 analyzes the search expression contained in the search request (step S13) and determines whether the expression contains, that is, specifies, a "collective identification condition" (step S14). A collective identification condition is a search condition that specifies all of a plurality of preset identification conditions at once. In this embodiment, the collective identification condition is written as the character string “all insensitive”.

If the collective identification condition “all insensitive” is specified in the search expression (step S14), the search processing unit 54 queries the identification condition management unit 53 for the identification condition information of the session (session-specific identification condition information), using the session number of the session over which the search request containing that expression was sent (step S15). The identification condition management unit 53 then extracts, from the identification condition information 531 stored in the identification condition information storage unit 530, only the information specific to the session whose number was supplied by the search processing unit 54, and returns it to the search processing unit 54. In step S15 the search processing unit 54 thus obtains the identification condition information specific to the target session, that is, information indicating the plurality of identification conditions that are associated with that session and are all set to "identify". Alternatively, the search processing unit 54 may obtain this information by referring directly to the identification condition information 531 stored in the identification condition information storage unit 530.

Next, in accordance with the obtained identification condition information, the search processing unit 54 sets the individual identification conditions as search conditions of the analyzed search expression (step S16). This setting is described below with a concrete example, assuming that the search expression (query) is an XQuery expression.

First, suppose that a search request containing the following search expression (query) is executed in the session with session number 1 (hereinafter, session #1):
/item/name [./text() ftcontains “社員Aさん” all insensitive]
Suppose also that the identification condition information 531 stored in the identification condition information storage unit 530 is in the state shown in FIG. 3.

The above query specifies the collective identification condition “all insensitive”. The search processing unit 54 therefore queries the identification condition management unit 53 and obtains the identification condition information specific to session #1 (session-specific identification condition information) (step S15).

As FIG. 3 shows, the identification condition information specific to session #1 contains identification conditions (the alphanumeric and kana identification conditions) in which alphanumeric identification (“case insensitive” and “width insensitive”) and kana identification (“kana insensitive” and “kanacase insensitive”) are both set to "identify". In this case, the search processing unit 54 determines that the collective identification condition “all insensitive” in the query represents a search condition that applies both alphanumeric identification and kana identification. It then automatically replaces “all insensitive” in the query with “case insensitive width insensitive kana insensitive kanacase insensitive”.

The above query thereby becomes equivalent to the following query:
/item/name [./text() ftcontains “社員Aさん” case insensitive width insensitive kana insensitive kanacase insensitive]

In this way, in accordance with the obtained identification condition information, the search processing unit 54 sets, in place of the collective identification condition “all insensitive”, the alphanumeric identification conditions (“case insensitive” and “width insensitive”) and the kana identification conditions (“kana insensitive” and “kanacase insensitive”), which indicate that alphanumeric identification and kana identification are both applied (step S16). In this embodiment, when a query specifies both the collective identification condition and individual identification conditions, the individual identification conditions are ignored (treated as an error).
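The replacement performed in step S16 can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the table SESSION_SETTINGS is a hypothetical stand-in for the identification condition information 531 (the actual content of FIG. 3 is not reproduced here), the function name expand_all_insensitive is invented, and the query is treated as a plain string rather than as a parsed XQuery expression.

```python
# Hypothetical per-session settings: True means "identify" (apply).
SESSION_SETTINGS = {
    1: {"case": True, "width": True, "kana": True, "kanacase": True},
}

# Fixed output order of the individual conditions (an assumption).
ORDER = ("case", "width", "kana", "kanacase")


def expand_all_insensitive(query, session_id, settings=SESSION_SETTINGS):
    """Replace 'all insensitive' with the conditions enabled for the session."""
    marker = "all insensitive"
    if marker not in query:
        return query  # nothing to expand; steps S15 and S16 are skipped
    enabled = [name for name in ORDER if settings[session_id].get(name)]
    expanded = " ".join(f"{name} insensitive" for name in enabled)
    return query.replace(marker, expanded, 1)


query = "/item/name [./text() ftcontains “社員Aさん” all insensitive]"
print(expand_all_insensitive(query, 1))
```

A real implementation would work on the analyzed search expression, so that the marker text appearing inside a search keyword is not rewritten by mistake.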

Once the search processing unit 54 has executed step S16, the index management unit 55 functions as identification expansion means and expands the search keyword contained in the query (search expression) analyzed by the search processing unit 54, in accordance with the individual identification conditions set in step S16 (step S17). The index management unit 55 then functions as index search processing means and, for every expanded search keyword, uses the index stored in the database 42 to obtain, as the search result, a list of nodes that match the search condition (step S18). Because the query specifies “/item/name” as the search target path, the search in step S18 uses the index entries containing the path “/item/name”. The search processing unit 54 returns the result of the index search performed by the index management unit 55 to the client terminal 20 through the request processing unit 52 (step S19).
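One possible reading of the keyword expansion in step S17 is sketched below in Python. The embodiment does not fix an algorithm, so everything here is an assumption made for illustration: each condition is modeled as a per-character toggle, the variants of each character are collected until no new form appears, and the expanded keywords are the combinations of those variants. The names char_forms and expand_keyword are hypothetical.

```python
from itertools import product

_SMALL = "ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ"
_LARGE = "あいうえおつやゆよわアイウエオツヤユヨワ"
_KANACASE = {**dict(zip(_SMALL, _LARGE)), **dict(zip(_LARGE, _SMALL))}


def _toggle_case(c):
    o = ord(c)
    for base in (0x41, 0xFF21):  # 'A' and full-width 'A'
        if base <= o < base + 26:
            return chr(o + 0x20)
        if base + 0x20 <= o < base + 0x20 + 26:
            return chr(o - 0x20)
    return c


def _toggle_width(c):
    o = ord(c)
    if c.isascii() and c.isalnum():
        return chr(o + 0xFEE0)  # half-width to full-width
    if 0xFF01 <= o <= 0xFF5E and chr(o - 0xFEE0).isalnum():
        return chr(o - 0xFEE0)  # full-width to half-width
    return c


def _toggle_kana(c):
    o = ord(c)
    if 0x3041 <= o <= 0x3096:
        return chr(o + 0x60)  # hiragana to katakana
    if 0x30A1 <= o <= 0x30F6:
        return chr(o - 0x60)  # katakana to hiragana
    return c


TOGGLES = {
    "case": _toggle_case,
    "width": _toggle_width,
    "kana": _toggle_kana,
    "kanacase": lambda c: _KANACASE.get(c, c),
}


def char_forms(ch, conditions):
    """All forms of one character reachable under the given conditions."""
    forms, frontier = {ch}, [ch]
    while frontier:
        current = frontier.pop()
        for name in conditions:
            variant = TOGGLES[name](current)
            if variant not in forms:
                forms.add(variant)
                frontier.append(variant)
    return sorted(forms)


def expand_keyword(keyword, conditions):
    """Set of keywords identified with `keyword` under `conditions`."""
    per_char = [char_forms(ch, conditions) for ch in keyword]
    return {"".join(combo) for combo in product(*per_char)}


print(sorted(expand_keyword("あA", ["case", "width"])))
```

The number of expanded keywords is the product of the per-character variant counts, so it grows quickly with keyword length and with the number of conditions applied.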

The index stored in the database 42 may be copied into the memory 11 of the database server 10 when the structured document retrieval system 50 starts, and the index management unit 55 may use this copy for search processing, which speeds up the search. The search processing unit 54 may also be given one or both of the functions of the index management unit 55, namely the function as identification expansion means and the function as index search processing means. The function as identification expansion means may likewise be given to means independent of the search processing unit 54.

If, on the other hand, the search expression contained in the search request does not specify the collective identification condition (step S14), steps S15 and S16 are skipped and step S17 is executed. In this case, the search keyword is expanded in accordance with the identification conditions specified directly in the query contained in the search request from the client terminal 20.

According to this embodiment, individual identification conditions need not be written in the search expression, so the expression is easier for the user to write and, being shorter, easier to interpret. Even a user who does not know the individual identification conditions can have the structured document retrieval system 50 perform a search using all preset identification conditions simply by using the collective identification condition. Furthermore, even when a new identification condition that can be specified collectively is added, the search expression need not be changed, provided that the identify/do-not-identify setting of the new condition is the same as that of the conditions already set.

[First Modification]
In the above embodiment, setting (writing) the collective identification condition in a query specifies all of the plurality of preset identification conditions at once. However, if the conditions specified collectively include one that is not wanted for a given search, the collective identification condition cannot be used.

A first modification of the above embodiment, which allows identification conditions to be excluded from the collective identification condition, is therefore described. In this first modification, the query (search expression) can contain “except XXX”, which names an identification condition to be excluded from the collective specification (“XXX” is a character string denoting an identification condition, such as “case” or “width”). Such a specification of an identification condition to be excluded from the collective identification condition is called an identification exclusion condition.

<Search process>
The operation of the first modification is described below with reference to the flowchart of FIG. 7, taking the search process in the structured document retrieval system 50 as an example and focusing on the differences from the above embodiment. In FIG. 7, steps identical to those in the flowchart of FIG. 6 bear the same reference signs.

First, suppose that the user operates the client terminal 20 and that a search request is thereby sent from the client terminal 20 to the structured document retrieval system 50 via the network 30 (step S11). As in the above embodiment, the search expression in this request contains the collective identification condition. Steps S12 to S16 are then performed as in the above embodiment, and the individual identification conditions covered by the collective specification are set.

The search processing unit 54 then determines whether the query (search expression) analyzed in step S13 specifies an identification exclusion condition (step S21). In the first modification, the query is assumed to specify (contain) an identification exclusion condition in addition to the collective identification condition.

If the analyzed query specifies an identification exclusion condition (step S21), the search processing unit 54 removes only the identification condition named by the exclusion condition from the plurality of identification conditions set in step S16, that is, from the conditions specified collectively (step S22). This removal is described below with a concrete example.

First, suppose that a search request containing the following search expression (query) is executed in session #1:
/item/name [./text() ftcontains “社員Aさん” all insensitive except case]
Suppose also that the identification condition information 531 stored in the identification condition information storage unit 530 is in the state shown in FIG. 3.

The above query specifies the collective identification condition “all insensitive”. As in the above embodiment, “all insensitive” is therefore automatically replaced with “case insensitive width insensitive kana insensitive kanacase insensitive” (step S16).

The query also specifies the identification exclusion condition “except case” (step S21). The search processing unit 54 therefore removes the alphanumeric upper/lower-case identification condition (case insensitive) from “case insensitive width insensitive kana insensitive kanacase insensitive” (step S22). In other words, it automatically replaces that string with “width insensitive kana insensitive kanacase insensitive”, from which the upper/lower-case identification condition has been removed.

The above query thereby becomes equivalent to the following query:
/item/name [./text() ftcontains “社員Aさん” width insensitive kana insensitive kanacase insensitive]
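Steps S16, S21 and S22 taken together might be sketched in Python as follows. This is a hedged illustration only: the function name resolve_conditions and the dictionary of enabled conditions are hypothetical, the option text is parsed with a simple regular expression instead of an XQuery parser, and accepting several “except” clauses in one query is an assumption that the text above neither confirms nor rules out.

```python
import re

ORDER = ("case", "width", "kana", "kanacase")


def resolve_conditions(options, enabled):
    """Turn 'all insensitive [except X]...' into a list of condition names.

    `enabled` maps condition names to True (identify) or False for the
    session; it stands in for the session-specific condition information.
    """
    match = re.fullmatch(r"all insensitive((?:\s+except\s+\w+)*)", options.strip())
    if match is None:
        raise ValueError(f"not a collective specification: {options!r}")
    excluded = set(re.findall(r"except\s+(\w+)", match.group(1)))
    unknown = excluded - set(ORDER)
    if unknown:
        raise ValueError(f"unknown identification condition(s): {sorted(unknown)}")
    # Step S16: conditions enabled for the session; step S22: drop exclusions.
    return [name for name in ORDER if enabled.get(name) and name not in excluded]


session_1 = {"case": True, "width": True, "kana": True, "kanacase": True}
remaining = resolve_conditions("all insensitive except case", session_1)
print(" ".join(f"{name} insensitive" for name in remaining))
```

When no “except” clause is present, the function returns every condition enabled for the session, which corresponds to skipping step S22.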

The subsequent operation is the same as in the above embodiment. The index management unit 55 expands the search keyword contained in the query (search expression) analyzed by the search processing unit 54, in accordance with the individual identification conditions currently set (here, the conditions that remain after the condition named by the identification exclusion condition has been removed from those specified collectively) (step S17). If no identification exclusion condition is specified (step S21), step S22 is skipped and step S17 is executed. After step S17, steps S18 and S19 are executed as in the above embodiment.

According to the first modification, an identification exclusion condition can place chosen identification conditions outside the collective specification, so the collective specification remains usable even for searches in which only a particular identification condition is, or is not, to be applied. The collective specification can therefore be used more effectively. In addition, because the identification exclusion condition is set (written) in the query, the identification conditions can be changed query by query while the collective specification is still used.

[Second Modification]
In the first modification, whenever a search requires a given identification condition to be excluded from the collective specification, the user must specify the identification exclusion condition in the search expression each time.

A second modification of the above embodiment, which allows a desired identification condition to be excluded from the collective specification without an identification exclusion condition in the search expression, is therefore described. The second modification is characterized in that the identify/do-not-identify (apply/do-not-apply) setting can be updated for each of the identification conditions that are covered by the collective specification and are associated with the session requested by the user.

<Identification condition setting process>
The operation of the second modification is described below with reference to the flowchart of FIG. 8, taking the identification condition setting process in the structured document retrieval system 50 as an example.
First, suppose that the user operates the client terminal 20 and that an identification condition setting request is thereby sent from the client terminal 20 to the structured document retrieval system 50 via the network 30, using the currently established session (step S31). This identification condition setting request contains, for each predetermined identification condition, information indicating whether to identify or not (apply/do not apply).

On receiving the identification condition setting request from the client terminal 20, the request processing unit 52 passes it to the identification condition management unit 53 and requests that the identification conditions be set (step S32).

In accordance with the identify/do-not-identify (apply/do-not-apply) information contained in the request, the identification condition management unit 53 then updates, within the identification condition information 531 stored in the identification condition information storage unit 530, the identify/do-not-identify information for each of the identification conditions that are covered by the collective specification and are associated with the relevant session (session-specific identification information) (step S33). The relevant session is the session over which the client terminal 20 sent the identification condition setting request.

If the identification condition setting request specifies that only the alphanumeric upper/lower-case identification condition is not to be identified (not applied), executing step S33 changes only that condition, among the alphanumeric and kana identification conditions, to "do not identify". This is equivalent to specifying the identification exclusion condition “except case” in every query.
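The per-session update of step S33 could be modeled as in the following Python sketch. The class IdentificationConditionStore and its methods are hypothetical names, and the default of every condition being set to "identify" for a new session is an assumption; the embodiment only requires that the settings be kept per session and be updatable on request.

```python
class IdentificationConditionStore:
    """Hypothetical stand-in for the identification condition information."""

    DEFAULTS = {"case": True, "width": True, "kana": True, "kanacase": True}

    def __init__(self):
        self._by_session = {}

    def _row(self, session_id):
        # Create the session's row on first use (assumed default: all apply).
        return self._by_session.setdefault(session_id, dict(self.DEFAULTS))

    def update(self, session_id, changes):
        """Apply a setting request: {condition name: apply or not}."""
        row = self._row(session_id)
        for name, apply in changes.items():
            if name not in row:
                raise KeyError(name)
            row[name] = bool(apply)

    def enabled(self, session_id):
        """Conditions that 'all insensitive' expands to for this session."""
        return [name for name, on in self._row(session_id).items() if on]


store = IdentificationConditionStore()
store.update(1, {"case": False})  # same effect as 'except case' on every query
print(store.enabled(1))
print(store.enabled(2))
```

Because the change is stored against the session rather than the query, later queries in session 1 that use only “all insensitive” are expanded without the case condition, while other sessions are unaffected.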

According to the second modification, the identify/do-not-identify (apply/do-not-apply) setting can be updated for each of the identification conditions that are covered by the collective specification and are associated with the session requested by the user, so the settings remain in effect throughout that session. The user therefore need not set an identification exclusion condition in each query.

[Third Modification]
In the above embodiment (and in its first and second modifications), the search keyword is expanded unconditionally in accordance with the individual identification conditions set as targets of the collective specification. However, when an expanded term yields no hit, the processing cost spent on that expansion is wasted.

A third modification of the above embodiment, which prevents such pointless expansion of search keywords, is therefore described.

FIG. 9 is a block diagram showing the configuration of the index management unit 55 used in the third modification. The index management unit 55 includes a character type information storage unit 550. The character type information storage unit 550 stores character type information 551, which is used to manage, for each level of the tree structure of the XML documents registered in the database 42, that is, for each structure (path) of the XML documents, the character types and character counts of the text nodes immediately below that path.

FIG. 10 shows an example of the character type information 551. In the third modification, the character type information 551 is managed in tabular form, with the path and the character types as its columns. Three character types are assumed here: "alphanumeric", "kana", and "other". For each path, the character type information 551 holds, as table values, the character string representing the path (in XPath notation) and the number of characters of each character type in the text nodes immediately below that path in the XML documents registered in the database 42. The information for one path, consisting of the path string and the per-type character counts, is called path-specific character type information. For simplicity, the character type information 551 in FIG. 10 shows the case where only the XML document given in the above embodiment is registered in the database 42. In that XML document, the value of the text node immediately below the <id> node is “0001”, and the value of the text node immediately below the <name> node is “社員Aさん”. In FIG. 10, therefore, the character counts for the types "alphanumeric, kana, other" associated with the path “/item/id” are "4, 0, 0", and those associated with the path “/item/name” are "1, 4, 0".

When no XML document is registered in the database 42, the character type information 551 is empty. Each time an XML document is registered in the database 42 and introduces a new path (in XPath notation), a row is added to the character type information 551, and the per-type values (character counts) of the text node immediately below that path in the document are recorded. For a path already present in the character type information 551, the recorded count of each character type is increased by the number of characters of that type in the text node immediately below the path in the newly registered document. Likewise, when an XML document is deleted from the database 42, the recorded count of each character type is decreased by the number of characters of that type in the text node immediately below the path in the deleted document. For a path that no longer exists as a result of the deletion, the corresponding row is deleted from the character type information 551.
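The bookkeeping described above might look like the following Python sketch. The class CharTypeTable, the classify function, and the Unicode ranges used to decide what counts as "alphanumeric" or "kana" are all assumptions made for illustration; in particular, this sketch counts kanji under "other", which the text above does not specify. A separate node counter is kept so that a row is removed only when no registered node uses the path any longer.

```python
from collections import Counter


def classify(ch):
    """Assumed classification into 'alnum', 'kana' or 'other'."""
    o = ord(ch)
    if ch.isascii() and ch.isalnum():
        return "alnum"
    if 0xFF10 <= o <= 0xFF19 or 0xFF21 <= o <= 0xFF3A or 0xFF41 <= o <= 0xFF5A:
        return "alnum"  # full-width digits and letters
    if 0x3041 <= o <= 0x30FF:
        return "kana"   # hiragana and katakana blocks
    return "other"


class CharTypeTable:
    """Hypothetical stand-in for the character type information."""

    def __init__(self):
        self.counts = {}        # path -> Counter of character types
        self.nodes = Counter()  # path -> number of registered nodes

    def add(self, path, text):
        self.nodes[path] += 1
        self.counts.setdefault(path, Counter()).update(classify(c) for c in text)

    def remove(self, path, text):
        self.counts[path].subtract(classify(c) for c in text)
        self.nodes[path] -= 1
        if self.nodes[path] == 0:  # the path no longer exists
            del self.nodes[path]
            del self.counts[path]

    def row(self, path):
        c = self.counts[path]
        return (c["alnum"], c["kana"], c["other"])


table = CharTypeTable()
table.add("/item/id", "0001")
print(table.row("/item/id"))
```

Registering a second document with the same path adds to the existing row, and deleting it subtracts the same amounts, so the table always reflects the documents currently registered.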

<Document registration process>
Next, the operation of the third modification is described with reference to the flowchart of FIG. 11, taking the document registration process in the structured document retrieval system 50 as an example. In FIG. 11, steps identical to those in the flowchart of FIG. 4 bear the same reference signs.

On receiving a registration request from the client terminal 20, the request processing unit 52 passes it to the document registration unit 56 and requests registration of the XML document (step S2). On receiving the registration request through the request processing unit 52, the document registration unit 56 starts parsing the XML document specified in the request (step S3). Then, as in the above embodiment, each time an element (node) is extracted from the XML document (step S4), the index management unit 55 performs index registration processing for that node (step S5).

Next, the index management unit 55 functions as character type information management means. When the content (value) of the node (element) being indexed is text, it identifies the character types in that text, counts the characters, and updates, in the character type information 551 stored in the character type information storage unit 550, the character count of each identified character type associated with the path of that node (for example, increases it by the calculated count) (step S40). The flowchart of FIG. 11 differs from that of FIG. 4 in that this step S40 has been added. The function of the index management unit 55 as character type information management means may instead be given to means independent of the index management unit 55.

When the index management unit 55 has updated the character type information 551 (step S40), the document registration unit 56 performs document storage processing, which stores the extracted node in the database 42 (step S6). If unprocessed nodes remain (step S7), the document registration unit 56 returns to step S4 and continues processing with the next node.

<Search processing>
Next, the search processing in the structured document retrieval system 50 applied in the third modification will be described with reference to the flowchart of FIG. 12, focusing on the differences from the above embodiment. In FIG. 12, steps identical to those in the flowchart of FIG. 6 are denoted by the same reference numerals.

Then, the index management unit 55 acquires, from the character type information 551 stored in the character type information storage unit 550, the character type information specific to the search target path (path-corresponding character type information) (step S50). In step S50, the index management unit 55 functions as identification condition exclusion means: based on the count (number of characters) of each character type contained in the acquired character type information, it identifies useless identification conditions that can never produce a hit and excludes them from the plurality of identification conditions designated by the batch identification condition. The flowchart of FIG. 12 differs from the flowchart of FIG. 6 in that step S50 has been added. The detailed procedure of step S50 is described later. Step S50 may instead be performed by the search processing unit 54, which would query the index management unit 55 for the path-corresponding character type information.

On the other hand, if the search expression included in the search request does not specify the batch identification condition (step S14), steps S15 and S16 are skipped and step S50 is executed. In this case, the identified useless identification conditions are excluded from the identification conditions specified directly in the search expression.

After step S50 has been executed, the index management unit 55 expands the search keyword contained in the parsed query (search expression) in accordance with the identification conditions that were not excluded in step S50 (step S17). Next, for all of the expanded search keywords, the index management unit 55 uses the index stored in the database 42 to obtain, as the search result, a list of nodes that match the search condition (step S18). Because useless identification conditions that can never produce a hit have been excluded, the search keyword is not expanded under those conditions. Useless expansion of the search keyword is thus suppressed, which speeds up the search.
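
To illustrate why dropping a condition reduces the work in steps S17 and S18, the following is a minimal sketch of keyword expansion under a set of remaining identification conditions. It is an assumption-laden simplification: only case and width variants of half-width ASCII letters and the hiragana-to-katakana variant are generated, small kana, variant characters, and synonyms are ignored, and the condition labels "alphanumeric" and "kana" as well as all function names are hypothetical.

```python
from itertools import product


def alnum_variants(ch):
    """Case and width variants of a half-width ASCII letter (sketch only)."""
    if ch.isascii() and ch.isalpha():
        half = {ch.lower(), ch.upper()}
        full = {chr(ord(c) + 0xFEE0) for c in half}  # full-width forms
        return half | full
    return {ch}


def kana_variants(ch):
    """Hiragana character plus its katakana counterpart (sketch only)."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x3096:
        return {ch, chr(cp + 0x60)}  # katakana block is offset by 0x60
    return {ch}


def expand(keyword, conditions):
    """Step S17 (sketch): expand `keyword` under the remaining conditions."""
    per_char = []
    for ch in keyword:
        variants = {ch}
        if "alphanumeric" in conditions:
            variants |= alnum_variants(ch)
        if "kana" in conditions:
            variants |= kana_variants(ch)
        per_char.append(sorted(variants))
    # Every combination of per-character variants is one expanded keyword.
    return {"".join(chars) for chars in product(*per_char)}


both = expand("あA", {"alphanumeric", "kana"})
kana_only = expand("あA", {"kana"})
```

Under these assumptions, the keyword "あA" expands to eight keywords when both conditions apply but to only two when the alphanumeric condition has been excluded, so fewer index lookups are needed in step S18.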

<Processing for excluding useless identification conditions>
Next, the detailed procedure of step S50 performed by the index management unit 55 (that is, the processing for excluding useless identification conditions) will be described with reference to the flowchart of FIG. 13.

First, the index management unit 55 acquires, from the character type information 551 stored in the character type information storage unit 550, the character type information specific to the search target path (path-corresponding character type information) (step S61). Next, the index management unit 55 determines whether the count (number of characters) for the character type "alphanumeric" in the acquired character type information is 0 (step S62). If the alphanumeric count is 0 (step S62), the index management unit 55 concludes that the alphanumeric identification condition cannot produce a hit even if it is specified, and excludes it from the identification conditions (step S63). That is, the index management unit 55 sets the alphanumeric identification condition to "do not identify" (not applied). If the alphanumeric count is not 0 (step S62), the index management unit 55 skips step S63.

Similarly, the index management unit 55 determines whether the count (number of characters) for the character type "kana" in the acquired character type information is 0 (step S64). If the kana count is 0 (step S64), the index management unit 55 concludes that the kana identification condition cannot produce a hit even if it is specified, and excludes it from the identification conditions (step S65). That is, the index management unit 55 sets the kana identification condition to "do not identify" (not applied). If the kana count is not 0 (step S64), the index management unit 55 skips step S65. The processing for excluding useless identification conditions described above (steps S61 to S65) may instead be performed by the search processing unit 54, which would query the index management unit 55 for the path-corresponding character type information.
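
The exclusion procedure of steps S61 to S65 can be sketched as follows. This is a minimal illustration, assuming that the path-corresponding character type information is available as a dictionary of counts and that identification conditions are represented by simple labels. The labels "alphanumeric", "kana", and "synonym", the count keys "alnum" and "kana", and the function name are hypothetical.

```python
def exclude_useless_conditions(conditions, path_counts):
    """Return the identification conditions that can still produce a hit.

    conditions:  labels of the conditions in effect for the query
    path_counts: character type -> character count for the search target
                 path (the result of step S61 in this sketch)
    """
    remaining = set(conditions)
    # Steps S62 and S63: no alphanumeric characters are stored under the path,
    # so the alphanumeric identification condition is set to "not applied".
    if path_counts.get("alnum", 0) == 0:
        remaining.discard("alphanumeric")
    # Steps S64 and S65: the same check for kana.
    if path_counts.get("kana", 0) == 0:
        remaining.discard("kana")
    return remaining


requested = {"alphanumeric", "kana", "synonym"}
remaining = exclude_useless_conditions(requested, {"alnum": 0, "kana": 5})
```

Conditions that the sketch does not check against character counts, such as the hypothetical "synonym" label, are passed through unchanged, mirroring the flowchart, which tests only the alphanumeric and kana counts.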

According to the third modification of the above embodiment, excluding useless identification conditions suppresses useless expansion of the search keyword. This reduces the processing cost of the search and speeds it up. Moreover, because the useless identification conditions are excluded automatically, the user can write a search expression for an identification search without having to consider which character types make up the search keyword.

The above embodiment and its modifications assume that the structured document is an XML document. However, the present invention is equally applicable to structured documents other than XML documents, such as SGML (Standard Generalized Markup Language) documents.

The present invention is not limited to the above embodiment or its modifications as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment or its modifications. For example, some constituent elements may be deleted from the full set of constituent elements shown in the embodiment or its modifications.

FIG. 1 is a block diagram showing the hardware configuration of a client-server system including a structured document retrieval system according to an embodiment of the present invention.
FIG. 2 is a block diagram mainly showing the functional configuration of the structured document retrieval system shown in FIG. 1.
FIG. 3 is a diagram showing an example of the identification condition information applied in the embodiment.
FIG. 4 is a flowchart showing the procedure of document registration processing in the embodiment.
FIG. 5 is a diagram showing a specific example of the index applied in the embodiment.
FIG. 6 is a flowchart showing the procedure of search processing in the embodiment.
FIG. 7 is a flowchart showing the procedure of search processing in a first modification of the embodiment.
FIG. 8 is a flowchart showing the procedure of identification condition setting processing in a second modification of the embodiment.
FIG. 9 is a block diagram showing the configuration of the index management unit applied in a third modification of the embodiment.
FIG. 10 is a diagram showing an example of the character type information applied in the third modification.
FIG. 11 is a flowchart showing the procedure of document registration processing in the third modification.
FIG. 12 is a flowchart showing the procedure of search processing in the third modification.
FIG. 13 is a flowchart showing the procedure of the processing for excluding useless identification conditions in the third modification.

Explanation of symbols

10 ... database server, 11 ... memory, 20 ... client terminal, 30 ... network, 40 ... external storage device, 41 ... database management program, 42 ... database, 50 ... structured document retrieval system, 51 ... database management system, 52 ... request processing unit, 53 ... identification condition management unit, 54 ... search processing unit, 55 ... index management unit, 56 ... document registration unit, 530 ... identification condition information storage unit, 531 ... identification condition information, 550 ... character type information storage unit, 551 ... character type information.

Claims

1. A structured document retrieval system comprising:
a database that stores a plurality of structured documents;
identification condition information storage means for storing a plurality of identification conditions for identification with a search keyword;
identification-based expansion means for, when a search expression included in a search request given from a client terminal includes a batch identification condition that collectively specifies all identification conditions, expanding a search keyword included in the search expression based on all of the plurality of identification conditions stored in the identification condition information storage means;
search processing means for searching the database for a structured document including a structure that matches the search condition indicated by the search expression, based on the search keyword included in the search expression and the expanded search keywords; and
identification condition exclusion means for, when the identification-based expansion means expands the search keyword included in the search expression based on each of the identification conditions, determining whether a search under each of the identification conditions can produce a hit, based on whether characters of the character type specific to that identification condition are present, among the characters included in the structured documents stored in the database, in the structure indicated by the search expression, and for excluding any identification condition determined to be unable to produce a hit,
wherein the identification-based expansion means excludes the identification conditions excluded by the identification condition exclusion means from the conditions under which the search keyword is expanded.

2. The structured document retrieval system according to claim 1, further comprising character type information storage means for storing, for each of a plurality of predetermined character types and for each structure of the structured documents, information indicating whether characters of that character type are included in the structured documents stored in the database,
wherein the identification condition exclusion means determines whether a search under each of the identification conditions can produce a hit based on the information indicating the presence or absence of characters that is stored in the character type information storage means in association with the character type specific to that identification condition and the structure indicated by the search expression.

3. The structured document retrieval system according to claim 2, wherein the information indicating the presence or absence of characters is the number of characters.

4. The structured document retrieval system according to claim 3, further comprising character type information management means for managing the character type information storage means, wherein, when a structured document is stored in the database, the character type information management means identifies, for each structure of the structured document, the character types of the text included in that structure, calculates the number of characters, and increases the number of characters stored in the character type information storage means in association with the identified character type and the structure of the structured document by the calculated number of characters.

5. A program for causing a database server computer, which comprises a database that stores a plurality of structured documents and identification condition information storage means for storing a plurality of identification conditions for identification with a search keyword, and which performs search processing on the structured documents stored in the database, to execute the steps of:
receiving a search request given from a client terminal;
determining whether the search expression included in the received search request includes a batch identification condition that collectively specifies all identification conditions;
obtaining, if the search expression includes the batch identification condition, all of the plurality of identification conditions from the identification condition information storage means;
determining whether a search under each of the obtained identification conditions can produce a hit, based on whether characters of the character type specific to that identification condition are present, among the characters included in the structured documents stored in the database, in the structure indicated by the search expression;
expanding the search keyword included in the search expression based on the obtained identification conditions other than those determined to be unable to produce a hit; and
searching the database for a structured document including a structure that matches the search condition indicated by the search expression, based on the search keyword included in the search expression and the expanded search keywords.