JP2009251845A

JP2009251845A - Retrieval result evaluation device and retrieval result evaluation method

Info

Publication number: JP2009251845A
Application number: JP2008097778A
Authority: JP
Inventors: Koji Aida; 浩二合田
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2008-04-04
Filing date: 2008-04-04
Publication date: 2009-10-29

Abstract

PROBLEM TO BE SOLVED: To retrieve structured documents in consideration of the hierarchical relationship between keywords merely by entering a plurality of keywords, and to evaluate the retrieval results.

SOLUTION: A vocabulary hierarchical relationship storage part 132 stores, for each structured document, a vocabulary hierarchical relationship graph representing the hierarchical relationship between vocabulary items included in each of a plurality of structured documents stored in a retrieval object data storage part 131. A query creation part 121 refers to the vocabulary hierarchical relationship storage part 132 and creates, from a plurality of character strings included in retrieval conditions designated by a user, a retrieval formula for retrieving a structured document. A storage part 13 retrieves the structured documents matching the retrieval formula created by the query creation part 121 from the retrieval object data storage part 131. A retrieval result evaluation part 141 evaluates the retrieved structured documents based on the plurality of character strings included in the retrieval conditions.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a search result evaluation apparatus and a search result evaluation method for evaluating retrieved structured documents.

In general, a document having a logical structure is called a structured document. The logical structure of a structured document may be indicated by tags written in the document. A structured document whose logical structure is expressed with such tags is well suited to being interpreted or processed by a computer for various purposes. A representative example is an XML document, that is, a document written in XML (Extensible Markup Language).

In recent years, XML has come to be used in a very large number of applications, and a wide variety of data is now described in XML. Techniques for searching XML documents have therefore become important.

Search conditions for such XML documents are specified by one of the following methods:
1. Condition specification in a query language such as XPath or XQuery
2. Condition specification by pairs of element names or attribute names and their values
3. Condition specification by keywords (character strings)
4. Condition specification in natural language

With methods 1 and 2, the user either has specialized knowledge, such as knowledge of a query language, or understands in advance the structure of the XML documents to be searched, and specifies conditions on that basis. The user is therefore likely to retrieve the desired XML document.

Methods 3 and 4, on the other hand, have the advantage that XML documents can be searched even by a user who has no specialized knowledge such as knowledge of a query language.

When XML documents are searched in this way, a large number of XML documents may be presented (returned) to the user as search results, depending on the conditions (search conditions) specified by the user. In that case, the user must look for the desired XML document among the many documents presented, which is difficult and time-consuming.

For this reason, each of the retrieved XML documents (the search results) can be evaluated and the results returned with the documents the user wants ranked, for example, near the top, making it easier for the user to find the data ultimately needed.

One method of evaluating search results is, for example, to hold in advance, as an index, the importance of each word within the XML document in which it appears, and to evaluate that importance for every keyword specified at search time.

In addition, when XML documents are searched, it is desirable that the evaluation of search results also take the structure of the XML documents into account.

With regard to technology for evaluating search results as described above, a technique (hereinafter referred to as the prior art) has been disclosed that can perform appropriate relevance calculation using document structure information and filters through the simple operation of describing, in a format file, which part of each document's structure an index is to be created for (see, for example, Patent Document 1).
Patent Document 1: JP 2001-325293 A

However, condition specification in a query language, or by pairs of element or attribute names and their values, requires specialized knowledge of a query language such as XPath or XQuery, or requires the user to understand in advance the structure of the XML documents to be searched. Users without such knowledge cannot use these methods, so the methods lack convenience.

On the other hand, condition specification by keywords or in natural language cannot express information for identifying the required data (for example, the structure of an XML document). Condition specification in natural language also retains the ambiguity inherent in natural language. As a result, a large amount of data other than what the user needs is returned as search results.

Moreover, when search results are evaluated, it is difficult for existing keyword-based full-text search and the like to reflect the user's intention in the search results. In the prior art described above, ordering (ranking) is based on, for example, the specified keywords and an index statically defined in advance. The prior art can therefore statically apply importance that follows the structure of an XML document, but it has difficulty dynamically ranking near the top those XML documents that have the hierarchical structure the user intends, based on the search conditions the user specifies at search time.

An object of the present invention is to provide a search result evaluation apparatus and a search result evaluation method that search structured documents in consideration of the hierarchical relationship between keywords merely upon input of a plurality of keywords (character strings), and that evaluate the search results according to those keywords.

According to one aspect of the present invention, there is provided a search result evaluation apparatus that executes a search on search target data storage means, in which a plurality of structured documents to be searched are stored, according to a search condition that includes a plurality of character strings specified by a user. The search result evaluation apparatus comprises: vocabulary hierarchy relationship graph storage means for storing, for each structured document, a vocabulary hierarchy relationship graph indicating the hierarchical relationship between vocabulary items included in each of the plurality of structured documents stored in the search target data storage means; query creation means for creating, with reference to the vocabulary hierarchy relationship graph storage means, a search expression for retrieving structured documents from the plurality of character strings included in the search condition; search means for retrieving structured documents that match the created search expression from the search target data storage means; and search result evaluation means for evaluating the retrieved structured documents based on the plurality of character strings included in the search condition.

According to the present invention, structured documents can be searched in consideration of the hierarchical relationship between keywords merely by inputting a plurality of keywords, and the search results can be evaluated.

Embodiments of the present invention will be described below with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram mainly showing the functional configuration of a search result evaluation apparatus according to a first embodiment of the present invention. The search result evaluation apparatus 10 searches structured documents according to a search condition that includes a plurality of character strings specified by, for example, a searcher (user) 20.
The search result evaluation apparatus 10 shown in FIG. 1 includes a data analysis unit 11, a search execution unit 12, a storage unit 13, and a search result evaluation unit 14.

The data analysis unit 11 creates a vocabulary hierarchy relationship graph in response to, for example, a vocabulary hierarchy relationship graph creation request from an administrator 30 of the search result evaluation apparatus 10. The vocabulary hierarchy relationship graph indicates the hierarchical relationship between vocabulary items included in a structured document to be searched (hereinafter referred to as search target data). Details of the vocabulary hierarchy relationship graph will be described later.
The data analysis unit 11 includes a data analysis control unit 111 and a vocabulary hierarchy relationship graph creation unit 112. The data analysis control unit 111 acquires search target data from the storage unit 13 and analyzes it. The vocabulary hierarchy relationship graph creation unit 112 creates a vocabulary hierarchy relationship graph based on the results of the analysis by the data analysis control unit 111, and registers the created graph in the storage unit 13.

The search execution unit 12 executes search processing according to the search condition specified by the searcher 20 (a search request from the searcher 20). The search execution unit 12 passes the search condition specified by the searcher 20, together with the search results (data) that match it, to the search result evaluation unit 14. The search execution unit 12 also returns the search results to the searcher 20. The search execution unit 12 includes a query creation unit 121 and a search control unit 122.
The query creation unit 121 creates a list of the search conditions specified by the searcher 20 and, from that list, creates a search expression for retrieving search target data. The search expression is created by referring to the vocabulary hierarchy relationship graph storage unit 132 and on the basis of conversion data described later. The search condition specified by the searcher 20 is expressed in path form. Hereinafter, a search condition expressed in path form is referred to as a search condition path expression. The search condition path expression includes, for example, a plurality of character strings specified by the searcher 20.
The search control unit 122 controls execution of search processing on the storage unit 13 using the search expression created by the query creation unit 121. The search control unit 122 passes the search result data received from the storage unit 13 to the search result evaluation unit 14. The search result data contains the search target data retrieved with the search expression created by the query creation unit 121, that is, the search target data that matches the search condition specified by the searcher 20.

The storage unit 13 stores various data. The storage unit 13 has a search target data storage unit 131, a vocabulary hierarchy relationship graph storage unit 132, and a conversion data storage unit 133, and stores or retrieves data as needed.
The search target data storage unit 131 stores the search target data (structured documents) to be searched by the search result evaluation apparatus 10. The search target data storage unit 131 may instead be managed (stored) in, for example, an external storage device separate from the search result evaluation apparatus 10. The vocabulary hierarchy relationship graph storage unit 132 stores (registers) the vocabulary hierarchy relationship graphs created by the vocabulary hierarchy relationship graph creation unit 112 of the data analysis unit 11. The conversion data storage unit 133 stores the conversion data mentioned above, which is the data used to create (convert) a search expression from a search condition path expression.

The search result evaluation unit 14 evaluates the search result data passed from the search execution unit 12 based on the search condition (search condition path expression) passed from the search execution unit 12. The search result evaluation unit 14 includes a path-expression inter-document evaluation unit 141.
The path-expression inter-document evaluation unit 141 calculates an evaluation value for each item of search target data (structured document) contained in the search result data passed from the search execution unit 12. It calculates the evaluation value of each structured document based on the locations within the search target data at which the plurality of character strings in the search condition path expression appear. Based on the calculated evaluation values, the path-expression inter-document evaluation unit 141 orders (ranks) the search target data contained in the search result data, and returns the search result data ordered in this way to the search execution unit 12.

Next, the processing procedure by which the search result evaluation apparatus 10 shown in FIG. 1 creates a vocabulary hierarchy relationship graph will be described with reference to the flowchart of FIG. 2.
First, the administrator 30 issues, for example to the data analysis unit 11, a vocabulary hierarchy relationship graph creation request for having a vocabulary hierarchy relationship graph created (step S1). At this time, the administrator 30 may supply conditions, such as a specification of vocabulary to be included in the graph to be created. In that case, the created vocabulary hierarchy relationship graph may, for example, always retain the vocabulary specified by the administrator 30, or may retain the specified vocabulary preferentially.

In response to the vocabulary hierarchy relationship graph creation request from the administrator 30, the data analysis control unit 111 of the data analysis unit 11 outputs to the storage unit 13 a request to acquire the data to be analyzed (hereinafter referred to as analysis target data) (step S2). The analysis target data is the search target data stored in the search target data storage unit 131. Hereinafter, search target data processed by the data analysis unit 11 is described as analysis target data.
In response to this acquisition request, the storage unit 13 retrieves the analysis target data stored in the search target data storage unit 131 (step S3). The storage unit 13 passes the retrieved analysis target data to the data analysis control unit 111, which thereby acquires the analysis target data.

The data analysis control unit 111 basically acquires all the search target data stored in the search target data storage unit 131 as analysis target data. However, if the administrator 30 specified a condition in step S1, only the search target data that matches the condition may be acquired as analysis target data. For example, it is also possible to analyze only the difference from the previous analysis, that is, only data that has not yet been analyzed.

The data analysis control unit 111 analyzes the acquired analysis target data (step S4). The vocabulary hierarchy relationship graph creation unit 112 creates a vocabulary hierarchy relationship graph based on the results of the analysis by the data analysis control unit 111 (step S5), and registers the created graph in the vocabulary hierarchy relationship graph storage unit 132 (step S6).

Next, the processing procedure by which the search result evaluation apparatus 10 shown in FIG. 1 executes a search in response to a search request from the searcher 20 will be described with reference to the flowchart of FIG. 3.
First, the searcher 20 issues a search request to the search execution unit 12 by specifying, for example, a search condition path expression (step S11). The search condition path expression includes a plurality of character strings.

The query creation unit 121 creates, according to the input search condition path expression, a search expression for executing a search on the search target data storage unit 131 (step S12). Specifically, the query creation unit 121 refers to the vocabulary hierarchy relationship graph storage unit 132 and the conversion data storage unit 133, and creates the search expression based on the hierarchical relationship between the character strings in the search condition path expression, as indicated by a vocabulary hierarchy relationship graph that contains those character strings as vocabulary.

The search control unit 122 of the search execution unit 12 specifies the search expression created by the query creation unit 121 and outputs a search request to the storage unit 13 (step S13).
In response to the search request from the search control unit 122, and under the control of the search control unit 122, the storage unit 13 retrieves from the search target data storage unit 131 the search target data that matches the search request (step S14). The storage unit 13 returns search results (data) containing the matching search target data to the search control unit 122. The search control unit 122 thereby acquires search result data based on the search expression. The search result data contains the search target data (structured documents) that the storage unit 13 retrieved as matching the search request.

The search control unit 122 passes the acquired search result data and the search condition path expression to the search result evaluation unit 14. The search result evaluation unit 14 evaluates each item of search target data contained in the search result data based on the search condition path expression passed from the search control unit 122 (step S15). At this time, the path-expression inter-document evaluation unit 141 of the search result evaluation unit 14 calculates the evaluation value of each item of search target data based on the locations within that data at which the plurality of character strings in the search condition path expression appear. Details of the evaluation value calculation will be described later.

The path-expression inter-document evaluation unit 141 orders the items of search target data contained in the search result data based on the calculated evaluation values. The search result data containing the search target data ordered by the path-expression inter-document evaluation unit 141 is passed to the search execution unit 12.
The search control unit 122 returns the search result data passed from the path-expression inter-document evaluation unit 141 to the searcher 20 as the search results for the search request (step S16). The searcher 20 thereby obtains the search results for the search request (step S17). In this case, the search results are presented (displayed) to the searcher 20 in order, starting from the search target data ranked highest by the path-expression inter-document evaluation unit 141.
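The flow of steps S12 to S16 can be sketched as a small pipeline. This is only an illustration under stated assumptions: `run_search` and its three callable parameters are hypothetical names standing in for the query creation unit 121, the storage unit 13, and the path-expression inter-document evaluation unit 141, and the sketch assumes that a larger evaluation value means a higher rank.

```python
def run_search(path_expression, create_query, search_store, evaluate):
    """Hypothetical sketch of steps S12 to S16; all names are assumptions."""
    # S12: build a search expression from the search condition path expression.
    query = create_query(path_expression)
    # S13-S14: retrieve the documents that match the search expression.
    documents = search_store(query)
    # S15: compute an evaluation value per document from the path expression.
    scored = [(evaluate(doc, path_expression), doc) for doc in documents]
    # Assumed convention: larger evaluation value ranks higher.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # S16: return the documents in ranked order.
    return [doc for _, doc in scored]
```

Passing the three stages in as callables keeps the sketch independent of how the search expression and the evaluation value are actually computed, which this part of the description leaves to later sections.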

FIG. 4 is a diagram showing an example of search target data stored in the search target data storage unit 131. The search target data shown in FIG. 4 is, for example, weather forecast data 100 written in XML (Extensible Markup Language). The weather forecast data 100 has, for example, "weather forecast" as its root element name, and has "location", "forecast", "weather", "temperature", "highest", "lowest", and "precipitation probability" as element names.
The weather forecast data 100 further has, for example, an attribute named "prefecture" with the attribute values "Tokyo", "Kanagawa", and "Saitama"; an attribute named "region" with the attribute values "Tokyo region", "Northern Izu Islands", "Southern Izu Islands", "Ogasawara Islands", "Eastern", "Western", "Northern", and "Southern"; and an attribute named "unit" with the attribute value "%".
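To make the description of FIG. 4 concrete, the following sketch parses a hypothetical fragment shaped like the weather forecast data 100. The figure itself is not reproduced here, so the nesting, the English element and attribute names, and all text values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: nesting and values are assumed, not taken from FIG. 4.
WEATHER_XML = """
<weather_forecast>
  <location prefecture="Tokyo" region="Tokyo region">
    <forecast>
      <weather>sunny</weather>
      <temperature>
        <highest>12</highest>
        <lowest>4</lowest>
      </temperature>
      <precipitation_probability unit="%">10</precipitation_probability>
    </forecast>
  </location>
</weather_forecast>
"""

root = ET.fromstring(WEATHER_XML)

# Collect the distinct element names and attribute names in the fragment.
element_names = sorted({el.tag for el in root.iter()})
attribute_names = sorted({name for el in root.iter() for name in el.attrib})
```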

The elements contained in the search target data described above are of either a simple type or a complex type. A simple-type element has no attributes and has only text (a character string) as its content. In the weather forecast data 100, the elements named "weather", "highest", and "lowest", for example, are of the simple type.
Complex types are further divided into complex types with simple content and complex types without simple content. An element of a complex type with simple content has attributes and has only text as its content. In the weather forecast data 100, the element named "precipitation probability" is an example. An element of a complex type without simple content, on the other hand, has something other than text as its content. In the weather forecast data 100, the elements named "location" and "temperature" are examples.
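The three categories can be told apart from an element's attributes and children. The following is a minimal sketch, assuming Python's `xml.etree.ElementTree` and treating any element with child elements as having non-text content; `classify` and its return labels are hypothetical names.

```python
import xml.etree.ElementTree as ET

def classify(element):
    """Return a label for the element type (labels are illustrative)."""
    if len(element) > 0:
        # Has child elements, so the content is not text only.
        return "complex_without_simple_content"
    if element.attrib:
        # Has attributes and text-only content.
        return "complex_with_simple_content"
    # No attributes and text-only content.
    return "simple"
```

For example, an element such as `<precipitation_probability unit="%">10</precipitation_probability>` would be labeled `complex_with_simple_content` by this sketch.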

Next, the processing of steps S4 and S5 of FIG. 2 will be described in detail with reference to FIG. 5 and FIG. 7. First, the processing procedure for creating a structure graph, which indicates the hierarchical structure of the analysis target data (search target data), will be described with reference to the flowchart of FIG. 5. The structure graph is used to create the vocabulary hierarchy relationship graph.
The data analysis unit 11 initializes the structure graph (step S21). As a result of this processing, the structure graph becomes a graph with no vertices and no edges.

Next, the data analysis control unit 111 of the data analysis unit 11 acquires, as the target element, the root element of the analysis target data retrieved in step S3 of FIG. 2 (step S22). The root element is, for example, the element that contains all the other elements of the structured document (XML data).
The data analysis control unit 111 acquires the element name of the target element (step S23) and determines whether any attribute is specified for the target element (step S24). If it determines that an attribute is specified (YES in step S24), it acquires the attribute name and attribute value of each attribute specified for the target element (step S25). It then performs morphological analysis on the acquired attribute value and, from the result, extracts the nouns contained in the attribute value (step S26).
If it is determined in step S24 that no attribute is specified for the target element, steps S25 and S26 are not executed.

Next, the data analysis control unit 111 determines whether the target element is of the simple type or of a complex type with simple content, as described above (step S27). If it determines that the target element is of one of these types (YES in step S27), it acquires the content of the target element (step S28). It then performs morphological analysis on the acquired content and, from the result, extracts the nouns contained in the content of the target element (step S29).
If it is determined in step S27 that the target element is neither of the simple type nor of a complex type with simple content, steps S28 and S29 are not executed.

Next, the vocabulary hierarchy relationship graph creation unit 112 scores the acquired element name and attribute name and the nouns (vocabulary) extracted in step S26 or S28 (step S30). The vocabulary hierarchy relationship graph creation unit 112 performs this scoring using, for example, evaluation values.

Examples of the evaluation values are now described concretely. The vocabulary hierarchy relationship graph creation unit 112 performs the scoring using a plurality of evaluation values selected from, for example, the first to seventh evaluation values below. First, examples of evaluation values for element names and attribute names are described.

The first evaluation value is calculated according to, for example, the number of times the element name or attribute name occurs in the analysis target data. For example, the more occurrences, the higher the evaluation value.
The second evaluation value is calculated according to the position of occurrence. The deeper the position (hierarchy level), the smaller the evaluation value. For example, with the depth of the hierarchy denoted depth, the second evaluation value is calculated by the evaluation formula 1/depth or 1/(log depth)+1. The depth of the root element is taken to be 1.
Alternatively, if the vocabulary specified as a condition by, for example, the administrator 30 is an element name, the depth of the element whose name is the specified vocabulary may be treated as 0 when determining the depth of the elements and attributes below that element, and the second evaluation value may be obtained on that basis.

The third evaluation value is calculated according to the child nodes (child elements and attributes). The more child nodes, the larger the evaluation value. For example, with the number of child nodes denoted count, the third evaluation value is calculated by the evaluation formula count / (total number of nodes). Here, “node” is a generic term for elements, element contents, attribute names, and attribute values.
The fourth evaluation value is calculated according to whether the name is an element name or an attribute name. For example, the coefficient used in calculating the evaluation value is changed depending on which of the two it is.
The fifth evaluation value is calculated according to the text content related to the element name or attribute name. For example, when the vocabulary specified as a condition by the administrator 30 is contained in element content or in an attribute value, the evaluation value increases for the element name of the element whose content contains the specified vocabulary, or for the attribute name of the attribute whose value contains the specified vocabulary.
An element name or attribute name (the vocabulary it contains) is scored by, for example, the sum of a plurality of evaluation values including the first to fifth evaluation values above.
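As a non-limiting illustration, the summation of evaluation values for an element name or attribute name might be sketched as follows. The function names are hypothetical, the first evaluation value is assumed to be the raw occurrence count, and the logarithmic formula is assumed to read 1/(log(depth)+1) so that it is defined at the root depth of 1; the description fixes none of these choices.

```python
import math


def depth_value(depth: int, use_log: bool = False) -> float:
    """Second evaluation value: smaller for deeper nodes (root depth is 1)."""
    if depth < 1:
        raise ValueError("this sketch only handles depth >= 1")
    if use_log:
        # Assumed reading of the formula: 1 / (log(depth) + 1).
        return 1.0 / (math.log(depth) + 1.0)
    return 1.0 / depth


def child_value(child_count: int, total_nodes: int) -> float:
    """Third evaluation value: count / total number of nodes."""
    return child_count / total_nodes


def name_score(occurrences: int, depth: int,
               child_count: int, total_nodes: int) -> float:
    """Score of an element or attribute name as a sum of evaluation values."""
    # Assumption: the first evaluation value is the occurrence count itself.
    return occurrences + depth_value(depth) + child_value(child_count, total_nodes)
```

The fourth and fifth evaluation values are omitted here; they could be added as further terms of the same sum.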

Next, examples of evaluation values for element contents and attribute values are described. The sixth evaluation value is calculated according to the number of occurrences of the nouns, proper nouns, and the like obtained as a result of the morphological analysis. For example, the more occurrences, the larger the evaluation value.
The seventh evaluation value is calculated according to the node name related to the element content or attribute value. Here, the node name means an element name or an attribute name. For example, when the vocabulary specified as a condition by the administrator 30 is an element name or an attribute name, the evaluation value becomes larger for the content of the element whose element name is the specified vocabulary, or for the attribute value of the attribute whose attribute name is the specified vocabulary. An element content or attribute value is scored by, for example, the sum of a plurality of evaluation values including the sixth or seventh evaluation value above.

Next, the vocabulary hierarchy relationship graph creation unit 112 adds the acquired element name, the acquired attribute name, and the vocabulary extracted in step S26 or S28 to the structure graph as vertices, each having the name or vocabulary as its label and each holding a node type (step S31). The node type indicates element name, element content, attribute name, or attribute value.
The vocabulary hierarchy relationship graph creation unit 112 adds edges to the structure graph according to definitions (step S32). An edge represents the hierarchical relationship between the element names, attribute names, and vocabulary extracted in step S26 or S28 that appear on the structure graph. The definitions include, expressed by the direction of the arrow (edge), for example “parent element name to child element name”, “element name to its attribute name”, “attribute name to attribute value”, “element name to a character string contained in the content of that element”, “attribute name of a parent element to child element name”, and “character string contained in an attribute value of a parent element to child element name”.
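As a non-limiting illustration of steps S31 and S32, the sketch below builds a simplified structure graph from XML using Python's standard library. It is a sketch under stated assumptions, not the embodiment itself: morphological analysis is skipped (each attribute value or text content is used whole as one label), only four of the edge definitions are implemented, vertices with identical label and node type are merged, and the sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_structure_graph(xml_text: str):
    """Return (vertices, edges) of a simplified structure graph.

    A vertex is a (label, node_type) pair; an edge is a (source, target) pair.
    """
    vertices = set()
    edges = set()

    def visit(elem, parent_vertex):
        v = (elem.tag, "element name")
        vertices.add(v)
        if parent_vertex is not None:
            edges.add((parent_vertex, v))          # parent element name -> child element name
        for name, value in elem.attrib.items():
            a = (name, "attribute name")
            val = (value, "attribute value")
            vertices.update([a, val])
            edges.add((v, a))                      # element name -> its attribute name
            edges.add((a, val))                    # attribute name -> attribute value
        text = (elem.text or "").strip()
        if text and len(elem) == 0:                # leaf element with text content
            c = (text, "element content")
            vertices.add(c)
            edges.add((v, c))                      # element name -> content string
        for child in elem:                         # recurse, as in steps S33 and S34
            visit(child, v)

    visit(ET.fromstring(xml_text), None)
    return vertices, edges


# Hypothetical sample data, not the weather forecast data 100 of FIG. 4.
SAMPLE = ('<location prefecture="Tokyo"><forecast>'
          '<weather>sunny</weather></forecast></location>')
vertices, edges = build_structure_graph(SAMPLE)
```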

The vocabulary hierarchy relationship graph creation unit 112 determines, based on the analysis target data, whether the target element has a child element (step S33). If it is determined that the target element has a child element (YES in step S33), the data analysis control unit 111 acquires that child element (step S34), sets it as the new target element, and returns to step S23 to repeat the processing.

If, on the other hand, it is determined in step S33 that the target element has no child element, the vocabulary hierarchy relationship graph creation unit 112 extracts vocabulary based on the result of the scoring in step S30 (step S35). Specifically, from among the vocabulary (here, the acquired element names and attribute names and the vocabulary extracted in step S26 or S28), it extracts the vocabulary whose score (sum of evaluation values) is equal to or greater than a predetermined value. This predetermined value may be set in advance or may be set each time the processing is performed by, for example, the administrator 30. Alternatively, the vocabulary hierarchy relationship graph creation unit 112 may extract the number of vocabulary items specified by, for example, the administrator 30, in descending order of evaluation value.

The vocabulary hierarchy relationship graph creation unit 112 deletes each vertex whose label is neither vocabulary extracted in step S35 nor vocabulary specified by the administrator 30, together with the edges adjacent to that vertex (step S36). The graph remaining after the deletion in step S36 is the structure graph indicating the hierarchical structure of the analysis target data.
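As a non-limiting illustration of steps S35 and S36, the following sketch extracts vocabulary either by a score threshold or by a top-N count and then prunes the graph. The function names are hypothetical, and vertices are represented simply by their labels for brevity.

```python
def extract_vocabulary(scores, threshold=None, top_n=None):
    """Step S35: keep vocabulary by score threshold, or the top_n highest scores."""
    if top_n is not None:
        ranked = sorted(scores, key=scores.get, reverse=True)
        return set(ranked[:top_n])
    if threshold is None:
        raise ValueError("either threshold or top_n must be given")
    return {word for word, score in scores.items() if score >= threshold}


def prune_graph(vertices, edges, extracted, specified):
    """Step S36: drop vertices that are neither extracted nor specified,
    together with every edge touching a dropped vertex."""
    keep = set(extracted) | set(specified)
    kept_vertices = {v for v in vertices if v in keep}
    kept_edges = {(a, b) for a, b in edges if a in keep and b in keep}
    return kept_vertices, kept_edges
```

For example, with scores of 3.0 for “weather”, 1.5 for “temperature” and 0.2 for a low-scoring label, a threshold of 1.0 keeps the first two, and any label specified by the administrator survives pruning regardless of its score.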

FIG. 6 shows an example of the structure graph 200 of the weather forecast data 100 created by the processing shown in the flowchart of FIG. 5. This structure graph 200 is assumed to have been created with the vocabulary “weather” specified as a condition by, for example, the administrator 30.

As shown in FIG. 6, the structure graph 200 shows “location”, “forecast”, “weather”, “temperature”, and “precipitation probability” as element names. It shows “prefecture” as an attribute name, and “Tokyo”, “Kanagawa”, and “Saitama” as character strings contained in attribute values. It also shows “sunny”, “cloudy”, and “rain” as character strings contained in the content of the element whose element name is “weather”. In FIG. 6, vertices are drawn differently according to their node type in the XML. The structure graph 200 also shows, for example, an edge directed from the element name “location” to the element name “forecast”. This edge was added according to the definitions described above.

Of the element names, attribute names, attribute values, and element contents of the weather forecast data 100 shown in FIG. 4, the vocabulary not shown in the structure graph 200 is assumed to have been deleted in step S36 of FIG. 5 described above. The deleted vocabulary is, for example, vocabulary whose evaluation value as a result of the scoring is smaller than the predetermined value and which was not specified by the administrator 30.

Next, the procedure for creating a vocabulary hierarchy relationship graph from a structure graph is described with reference to the flowchart of FIG. 7. The structure graph used here is assumed to be the structure graph 200 shown in FIG. 6.
First, the vocabulary hierarchy relationship graph creation unit 112 refers to the created structure graph 200 and adds to the vocabulary hierarchy relationship graph vertices having the same labels as the vertices of the structure graph 200 (step S41). If the structure graph 200 contains a plurality of vertices with the same label, the vertices are added so that no label is duplicated.

The vocabulary hierarchy relationship graph creation unit 112 refers to the structure graph 200 and acquires any two vertices that lie within a specific distance of each other (step S42). With the distance of one hierarchy level on the structure graph 200 taken as 1, the specific distance is preferably 2 or more. The specific distance may be specified in advance by, for example, the administrator 30, or may be specified each time the processing is performed.

The vocabulary hierarchy relationship graph creation unit 112 adds an edge between the two vertices on the vocabulary hierarchy relationship graph that correspond to the two acquired vertices (step S43). The added edge has a label. The label of the edge indicates the pair of node types that the two acquired vertices have on the structure graph 200.
When the vocabulary hierarchy relationship graph storage unit 132 is to be updated, the vocabulary hierarchy relationship graph creation unit 112 initializes the vocabulary hierarchy relationship graph and then executes the processing shown in the flowchart of FIG. 7.
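As a non-limiting illustration of steps S41 to S43, the sketch below derives labelled edges from a structure graph by a breadth-first search bounded by the specific distance. It assumes that distance is measured along the direction of the structure-graph edges and that each derived edge runs from the starting vertex to the reached vertex; the description does not state either point, and the function name is hypothetical.

```python
from collections import deque


def build_vocabulary_graph(vertices, edges, max_distance=2):
    """vertices: set of (label, node_type); edges: set of (source, target) vertices.

    Returns (labels, labelled_edges), where each labelled edge is
    (source_label, target_label, (source_node_type, target_node_type)).
    """
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    labels = {label for label, _ in vertices}       # step S41: labels are not duplicated
    labelled_edges = set()
    for start in vertices:
        queue = deque([(start, 0)])
        seen = {start}
        while queue:
            vertex, dist = queue.popleft()
            if dist == max_distance:                # do not search past the specific distance
                continue
            for nxt in children.get(vertex, []):
                if nxt in seen:
                    continue
                seen.add(nxt)
                queue.append((nxt, dist + 1))
                # Steps S42 and S43: connect the pair, labelled with its node types.
                labelled_edges.add((start[0], nxt[0], (start[1], nxt[1])))
    return labels, labelled_edges
```

With a specific distance of 2, a chain such as location, prefecture, Tokyo yields an edge from “location” to “Tokyo” labelled (element name, attribute value).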

FIG. 8 is a schematic diagram of a vocabulary hierarchy relationship graph created by the processing shown in FIG. 7. The vocabulary hierarchy relationship graph 300 shown in FIG. 8 was created with the specific distance set to 2. For convenience, FIG. 8 shows only the portion centered on the vertex whose label is “Tokyo” (hereinafter simply “Tokyo”). Edges between vertices with labels other than “Tokyo” are shown as dotted lines, and their labels are omitted. For vertices other than “Tokyo”, some edges and the like are also omitted.
In the structure graph 200 shown in FIG. 6, the vertices within a distance of 2 from “Tokyo” are “location”, “prefecture”, “forecast”, “weather”, “temperature”, and “precipitation probability”. Hereinafter, these vertices are referred to as the vertices within the specific distance.

In the vocabulary hierarchy relationship graph 300 shown in FIG. 8, “Tokyo” is connected by an edge to each of the vertices within the specific distance, and each edge carries a label. For example, an edge runs from “location” to “Tokyo”, and its label is “element name, attribute value”. This indicates that the node type of “location” is element name and the node type of “Tokyo” is attribute value. Although omitted from the graph 300 of FIG. 8, vertices other than “Tokyo” are likewise connected by edges to their own vertices within the specific distance, and each of those edges carries a label.

Next, the processing of step S12 shown in FIG. 3 is described in detail with reference to the flowchart of FIG. 9.
The query creation unit 121 of the search execution unit 12 receives the search condition path expression specified by the searcher 20 in step S11 of FIG. 3. Hereinafter, the search condition path expression received by the query creation unit 121 is referred to as the input path expression. Here, the input path expression is assumed to be, for example, “Tokyo/weather”.

The sequence of character strings (vocabulary) in the input path expression indicates a hierarchical relationship through the order of the strings. When the input path expression is “Tokyo/weather” as above, it indicates a hierarchical relationship from “Tokyo” to “weather”. Hereinafter, this is referred to as the hierarchical relationship of the input path expression.

The query creation unit 121 refers to the vocabulary hierarchy relationship graph storage unit 132 and acquires a vocabulary hierarchy relationship graph that contains the hierarchical relationship of the input path expression as, for example, a parent-child relationship (step S51). If no vocabulary hierarchy relationship graph containing all of the hierarchical relationships of the input path expression as parent-child relationships exists in the vocabulary hierarchy relationship graph storage unit 132, the query creation unit 121 acquires a vocabulary hierarchy relationship graph that contains part of those hierarchical relationships as parent-child relationships.
For example, suppose that the input path expression is A/B/C (its hierarchical relationships being A to B and B to C) and that no vocabulary hierarchy relationship graph completely containing these relationships exists in the vocabulary hierarchy relationship graph storage unit 132. In this case, the query creation unit 121 acquires, for example, a vocabulary hierarchy relationship graph containing only the relationship from A to B as a parent-child relationship, or one containing only the relationship from B to C as a parent-child relationship.

The query creation unit 121 determines, based on the acquired vocabulary hierarchy relationship graph, the node type of each character string contained in the input path expression (step S52).
Based on the hierarchical relationship of the input path expression and the determined node types, the query creation unit 121 creates a search expression, for example an XPath expression, for retrieving search target data that has the hierarchical relationship between the character strings contained in the input path expression. That is, the query creation unit 121 converts the input path expression into an XPath expression (step S53). Alternatively, the query creation unit 121 performs the conversion into an XPath expression based on the conversion data stored in the conversion data storage unit 133.

If the query creation unit 121 acquires a plurality of vocabulary hierarchy relationship graphs in step S51, steps S52 and S53 are executed for each of the acquired graphs. If a plurality of XPath expressions are created (converted) as a result, the search is executed on the search target data storage unit 131 with those XPath expressions combined by OR (logical sum).

FIG. 10 shows an example of the data structure of the conversion data stored in the conversion data storage unit 133. The conversion data is defined in advance as the method of converting each relationship in a vocabulary hierarchy relationship graph into an XPath expression. As shown in FIG. 10, the conversion data contains relationship information and the XPath expression information associated with it.

The relationship information indicates the hierarchical relationship between vocabulary items (character strings). The example of FIG. 10 includes the relationship parent-child (element name, element name), which indicates that the node type of each of the two vocabulary items is element name. Similarly, parent-child (element name, element content) indicates that the node type of the parent vocabulary item is element name and the node type of the other (child) vocabulary item is element content (a character string contained in it). The relationship sibling (element name, element name, ...) indicates that the vocabulary items are at the same hierarchy level and have the same node type.

The XPath expression information indicates an XPath expression suitable for retrieving search target data (structured documents) that match the relationship between vocabulary items indicated by the associated relationship information. For example, the XPath expression information associated with the relationship information parent-child (element name, element name) indicates the XPath expression “//element1[.//element2]”. Similarly, the XPath expression information associated with the relationship information parent-child (element name, element content) indicates the XPath expression “//element1[./text()="context"]”. In these XPath expressions, element1 and element2 denote element names and context denotes element content. In addition, attri denotes an attribute name and value denotes an attribute value.

The relationship information contained in the conversion data of FIG. 10 defines hierarchical relationships between two vocabulary items, but hierarchical relationships among three or more vocabulary items can be defined by combining several of these. This makes it possible to convert an input path expression containing three or more vocabulary items into an XPath expression.

Returning to step S53 of FIG. 9, the query creation unit 121 identifies the matching relationship information from the node types determined in step S52 and the hierarchical relationship of the input path expression. The query creation unit 121 then performs the conversion into an XPath expression based on the XPath expression information associated with the identified relationship information.

Consider, for example, the case where the input path expression is “Tokyo/weather” as above and the vocabulary hierarchy relationship graph 300 shown in FIG. 8 is acquired in step S51. In the vocabulary hierarchy relationship graph 300, “Tokyo” is an attribute value and “weather” is an element name. Since an edge runs from “Tokyo” to “weather”, the two are in a parent-child relationship. The query creation unit 121 therefore creates, based on for example the conversion data shown in FIG. 10, the XPath expression indicated by the XPath expression information corresponding to the relationship information parent-child (attribute value, element name). That is, the query creation unit 121 converts the input path expression “Tokyo/weather” into the XPath expression “//*[./@*="Tokyo" and .//weather]”, and the search is executed on the search target data storage unit 131 based on this XPath expression. The XPath expression “//*[./@*="Tokyo" and .//weather]” is a search expression for retrieving XML data (structured documents) in which “Tokyo” and “weather” are in a parent-child relationship, the node type of “Tokyo” is attribute value, and the node type of “weather” is element name. “Tokyo” and “weather” need not be in a direct parent-child relationship; a grandchild or great-grandchild relationship, for example, is also retrieved.
Although the example here has the conversion data contain XPath expression information and converts the input path expression into an XPath expression, the conversion may instead target any query language for any data source, such as XQuery or SQL.
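As a non-limiting illustration, the conversion data and the conversion of step S53 might be sketched as a lookup table of templates keyed by relationship information. The three entries correspond to the XPath expressions given in this description; the table layout, the {a}/{b} placeholders, and the function name are assumptions made for this sketch.

```python
# Relationship information -> XPath template (placeholders {a}, {b} are hypothetical).
CONVERSION_DATA = {
    ("parent-child", "element name", "element name"): "//{a}[.//{b}]",
    ("parent-child", "element name", "element content"): '//{a}[./text()="{b}"]',
    ("parent-child", "attribute value", "element name"): '//*[./@*="{a}" and .//{b}]',
}


def to_xpath(relation, first, second):
    """Convert one hierarchical relationship of an input path expression.

    first and second are (vocabulary, node_type) pairs, with node types
    as determined in step S52.
    """
    template = CONVERSION_DATA[(relation, first[1], second[1])]
    return template.format(a=first[0], b=second[0])


xpath = to_xpath("parent-child",
                 ("Tokyo", "attribute value"),
                 ("weather", "element name"))
```

A relationship with no entry in the table raises a KeyError in this sketch; how such a case is handled in practice is a separate design choice.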

FIG. 11 shows the search result 400 obtained by applying the XPath expression “//*[./@*="Tokyo" and .//weather]”, created by the query creation unit 121 as described above, to the weather forecast data 100 of FIG. 4. The search result 400 is XML data containing the attribute value “Tokyo” and the element name “weather” that is in a parent-child relationship with it (here, a grandchild relationship).
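As a non-limiting illustration, the condition expressed by this XPath expression can be checked with Python's standard library alone, without a full XPath engine: an element matches if any of its attributes has the given value and it has a descendant element with the given name. The sample XML is hypothetical and is not the weather forecast data 100 of FIG. 4; the function names are likewise assumptions.

```python
import xml.etree.ElementTree as ET


def matches(elem, attr_value, descendant_name):
    """True if elem has an attribute equal to attr_value and a descendant
    element named descendant_name (at any depth below elem)."""
    has_value = attr_value in elem.attrib.values()
    has_descendant = elem.find(f".//{descendant_name}") is not None
    return has_value and has_descendant


def search(xml_text, attr_value, descendant_name):
    """Return every element of the document that satisfies the condition."""
    root = ET.fromstring(xml_text)
    return [e for e in root.iter() if matches(e, attr_value, descendant_name)]


# Hypothetical sample data.
SAMPLE = """
<forecasts>
  <location prefecture="Tokyo"><forecast><weather>sunny</weather></forecast></location>
  <location prefecture="Kanagawa"><forecast><weather>rain</weather></forecast></location>
</forecasts>
"""
hits = search(SAMPLE, "Tokyo", "weather")
```

Here only the location element for Tokyo is returned, and its “weather” element sits two levels below it, which corresponds to the grandchild relationship mentioned above.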

Now suppose, for example, that the input path expression contains a plurality of hierarchical relationships and that one of them is a hierarchical relationship that is not set in any vocabulary hierarchy relationship graph in the vocabulary hierarchy relationship graph storage unit 132 (an unset hierarchical relationship). For this case, a setting (setting level) for unset hierarchical relationships may be defined in advance. Setting the level in advance makes it possible to obtain the desired range of search results.

FIG. 12 shows an example of predefined setting levels. In the example of FIG. 12, levels 1 to 3 are defined with respect to the node types to be set for the two vocabulary items constituting an unset hierarchical relationship and the hierarchical relationship to be set between them. The level to be applied is set as appropriate by, for example, the searcher 20.

At level 1, for example, it is defined that unset hierarchical relationships contained in the input path expression are ignored. Suppose that the input path expression is A/B/C. If a vocabulary hierarchy relationship graph stored in the vocabulary hierarchy relationship graph storage unit 132 contains the relationship from A to B but not the relationship from B to C, the relationship from B to C is ignored and the XPath expression is created from the relationship from A to B alone.
At level 2, for example, the type is “element name” and the operation is “set the unset hierarchical relationship to a parent-child relationship”. That is, the node type of each of the two vocabulary items constituting the unset hierarchical relationship is set to element name, the relationship between them is treated as a parent-child relationship, and the input path expression is converted into an XPath expression accordingly.
At level 3, for example, the type is “all” and the operation is “set the unset hierarchical relationship to a parent-child relationship or a sibling relationship”. That is, the node type of each of the two vocabulary items constituting the unset hierarchical relationship is set to each of element name, attribute name, attribute value, and element content. For every resulting combination, the relationship between the vocabulary items is set to a parent-child relationship or a sibling relationship. The input path expression is converted into XPath expressions based on all of the combinations and relationships so set.
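As a non-limiting illustration, the three setting levels might be expressed as a function that enumerates the candidate node types and relationships to try for an unset hierarchical relationship. The function name and the tuple layout are assumptions made for this sketch.

```python
from itertools import product

NODE_TYPES = ["element name", "attribute name", "attribute value", "element content"]


def candidates_for_unset(level):
    """Return the (first_type, second_type, relation) candidates for one
    unset hierarchical relationship under the given setting level."""
    if level == 1:
        return []                                   # level 1: ignore the relationship
    if level == 2:
        return [("element name", "element name", "parent-child")]
    if level == 3:
        return [(a, b, relation)
                for a, b in product(NODE_TYPES, repeat=2)
                for relation in ("parent-child", "sibling")]
    raise ValueError(f"unknown setting level: {level}")
```

Under these assumptions level 3 yields 4 x 4 x 2 = 32 candidates per unset relationship, which shows how a higher level widens the range of search results.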

Next, the search result evaluation processing in the search result evaluation apparatus 10 (the processing of step S15 shown in FIG. 3) is described concretely with reference to FIGS. 13 and 14. Here, the input path expression (search condition path expression) is assumed to be “invention/(Goda and structured document)”. This input path expression means, for example, “inventions that were made by Goda and that contain structured document”. From this input path expression, the query creation unit 121 creates an XPath expression such as “//invention[.//text()="Goda" and .//text()="structured document"]”.

It is assumed that the search target data 101 shown in FIG. 13 and the search target data 102 shown in FIG. 14 were obtained as the results retrieved from the search target data storage unit 131 in response to the XPath expression “//invention[.//text()="Goda" and .//text()="structured document"]”.

以下、検索結果として得られた検索対象データ１０１及び検索対象データ１０２のそれぞれについて、パス式文書間評価部１４１が入力パス式（ここでは、「発明／（合田 and 構造化文書）」）に基づいて評価値を算出するアルゴリズム（以下、第１のアルゴリズムと表記）の一例について説明する。
まず、パス式文書間評価部１４１は、例えば入力パス式で階層関係にある全てのキーワード（文字列）間に対して、検索結果として得られた検索対象データ１０１及び検索対象データ１０２の各々における距離を算出する。ここで、キーワード間の距離とは、例えば要素からその子要素、要素からその属性名に対しては距離を１とする。また、要素からその要素の内容、属性名からその属性値に対しても、同様に距離を１とする。 Hereinafter, an example of an algorithm (hereinafter referred to as the first algorithm) will be described by which the path expression inter-document evaluation unit 141 calculates an evaluation value, based on the input path expression (here, “invention / (Goda and structured document)”), for each of the search target data 101 and the search target data 102 obtained as search results.
First, for every pair of keywords (character strings) that are in a hierarchical relationship in the input path expression, the path expression inter-document evaluation unit 141 calculates the distance between them in each of the search target data 101 and the search target data 102 obtained as search results. Here, the distance between keywords is defined, for example, as 1 from an element to its child element and as 1 from an element to its attribute name. Similarly, the distance is 1 from an element to the content of that element and from an attribute name to its attribute value.

次に、パス式文書間評価部１４１は、検索結果として得られた検索対象データ毎に、算出された距離の逆数を求め、それらの合計を当該検索対象データの評価値として算出する。
ここで、図１３に示す検索対象データ１０１及び図１４に示す検索対象データ１０２の評価値の算出処理について具体的に説明する。 Next, the path type inter-document evaluation unit 141 obtains the reciprocal of the calculated distance for each search target data obtained as a search result, and calculates the sum of them as an evaluation value of the search target data.
Here, processing for calculating the evaluation values of the search target data 101 shown in FIG. 13 and the search target data 102 shown in FIG. 14 will be specifically described.

入力パス式は「発明／（合田 and 構造化文書）」であるため、当該入力パス式の階層関係（にあるキーワード）は、「発明→合田」及び「発明→構造化文書」である。 Since the input path expression is “invention / (Goda and structured document)”, the hierarchical relationship (keywords) in the input path expression is “invention → Goda” and “invention → structured document”.

検索対象データ１０１における「発明→合田」に対応する距離を算出する。この場合、検索対象データ１０１の要素である「発明」からその子要素である「発明者」の距離が１であり、要素である「発明者」からその要素の内容である「合田」の距離が１である。よって、検索対象データ１０１における「発明→合田」に対応する距離は２と算出される。 The distance corresponding to “invention → Goda” in the search target data 101 is calculated. In this case, the distance from the element “invention” of the search target data 101 to its child element “inventor” is 1, and the distance from the element “inventor” to its content “Goda” is 1. Therefore, the distance corresponding to “invention → Goda” in the search target data 101 is calculated as 2.

また、検索対象データ１０１における「発明→構造化文書」に対応する距離を算出する。この場合、検索対象データ１０１の要素である「発明」からその子要素である「名称」の距離が１であり、要素である「名称」からその要素の内容である「構造化文書」の距離が１である。よって、検索対象データ１０１における「発明→構造化文書」に対応する距離は２と算出される。
上記したように評価値はキーワード間の距離の逆数の合計であるため、検索対象データ１０１の評価値は、１／２＋１／２＝１となる。 Further, a distance corresponding to “invention → structured document” in the search target data 101 is calculated. In this case, the distance from the “invention” that is the element of the search target data 101 to the “name” that is the child element is 1, and the distance from the “name” that is the element to the “structured document” that is the content of the element is 1. Therefore, the distance corresponding to “invention → structured document” in the search target data 101 is calculated as 2.
As described above, since the evaluation value is the sum of the reciprocal of the distance between the keywords, the evaluation value of the search target data 101 is 1/2 + 1/2 = 1.

一方、検索対象データ１０２における「発明→合田」に対応する距離を算出する。この場合、検索対象データ１０２の要素である「発明」からその子要素である「発明者」の距離が１であり、要素である「発明者」からその子要素である「氏名」の距離が１であり、要素である「氏名」からその要素の内容である「合田」の距離が１である。よって、検索対象データ１０２における「発明→合田」に対応する距離は３と算出される。 On the other hand, the distance corresponding to “invention → Goda” in the search target data 102 is calculated. In this case, the distance from the element “invention” of the search target data 102 to its child element “inventor” is 1, the distance from the element “inventor” to its child element “name” is 1, and the distance from the element “name” to its content “Goda” is 1. Therefore, the distance corresponding to “invention → Goda” in the search target data 102 is calculated as 3.

また、検索対象データ１０２における「発明→構造化文書」に対応する距離を算出する。この場合、検索対象データ１０２の要素である「発明」からその子要素である「内容」の距離が１であり、要素である「内容」からその子要素である「名称」の距離が１であり、要素である「名称」からその要素の内容である「構造化文書」の距離が１である。よって、検索対象データ１０２における「発明→構造化文書」に対応する距離は３と算出される。
上記したように評価値はキーワード間の距離の逆数の合計であるため、検索対象データ１０２の評価値は、１／３＋１／３＝２／３となる。 In addition, the distance corresponding to “invention → structured document” in the search target data 102 is calculated. In this case, the distance from “invention” that is an element of the search target data 102 to “content” that is the child element is 1, and the distance from “content” that is the element to “name” that is the child element is 1. The distance from the “name” that is an element to the “structured document” that is the content of the element is 1. Therefore, the distance corresponding to “invention → structured document” in the search target data 102 is calculated as 3.
As described above, since the evaluation value is the sum of the reciprocals of the distances between the keywords, the evaluation value of the search target data 102 is 1/3 + 1/3 = 2/3.

パス式文書間評価部１４１は、算出された検索対象データ１０１及び検索対象データ１０２の評価値に基づいて、当該検索対象データ１０１及び検索対象データ１０２の順位付けを行う。パス式文書間評価部１４１は、例えば算出された評価値が大きい検索対象データ（ここでは、検索対象データ１０１）を上位に順位付ける。したがって、この場合には検索対象データ１０１、検索対象データ１０２の順で検索結果が検索者２０に対して提示されることになる。
なお、上記した第１のアルゴリズムは一例であり、例えばキーワード間の距離の定義または評価値の算出方法は、ここで説明した以外のものであっても構わない。 The path type inter-document evaluation unit 141 ranks the search target data 101 and the search target data 102 based on the calculated evaluation values of the search target data 101 and the search target data 102. The path type inter-document evaluation unit 141 ranks, for example, search target data (in this case, the search target data 101) having a large calculated evaluation value in a higher rank. Accordingly, in this case, search results are presented to the searcher 20 in the order of the search target data 101 and the search target data 102.
Note that the first algorithm described above is merely an example, and for example, the method for defining the distance between keywords or calculating the evaluation value may be other than that described here.
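The first algorithm described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: FIGS. 13 and 14 are not reproduced here, so the two XML documents are reconstructed from the distances stated in the text, and the English tag names (invention, inventor, full_name, title, content) as well as the function names distance_below and evaluation_value are hypothetical. Where a keyword occurs more than once, the sketch assumes the shortest distance is used, and a keyword that is not found contributes nothing to the total.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

# Hypothetical reconstructions of search target data 101 and 102.
DOC_101 = ("<invention><inventor>Goda</inventor>"
           "<title>structured document</title></invention>")
DOC_102 = ("<invention><inventor><full_name>Goda</full_name></inventor>"
           "<content><title>structured document</title></content></invention>")


def distance_below(node, keyword, depth=0):
    """Shortest distance from the starting element down to `keyword`, or None."""
    candidates = []
    # element -> its content: one step
    if (node.text or "").strip() == keyword:
        candidates.append(depth + 1)
    # element -> attribute name: one step; attribute name -> value: one more
    for name, value in node.attrib.items():
        if name == keyword:
            candidates.append(depth + 1)
        if value == keyword:
            candidates.append(depth + 2)
    # element -> child element: one step
    for child in node:
        if child.tag == keyword:
            candidates.append(depth + 1)
        below = distance_below(child, keyword, depth + 1)
        if below is not None:
            candidates.append(below)
    return min(candidates) if candidates else None


def evaluation_value(root, relations):
    """Sum of the reciprocals of the distances for each (parent, child) pair."""
    total = Fraction(0)
    for parent, child in relations:
        distances = []
        for element in root.iter(parent):
            d = distance_below(element, child)
            if d is not None:
                distances.append(d)
        if distances:
            total += Fraction(1, min(distances))
    return total


RELATIONS = [("invention", "Goda"), ("invention", "structured document")]
scores = {
    "101": evaluation_value(ET.fromstring(DOC_101), RELATIONS),
    "102": evaluation_value(ET.fromstring(DOC_102), RELATIONS),
}
# Documents with larger evaluation values are ranked higher.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions the sketch yields 1/2 + 1/2 = 1 for data 101 and 1/3 + 1/3 = 2/3 for data 102, so data 101 is ranked first, which matches the worked example in the text.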

上記したように本実施形態においては、検索者から指定された複数の文字列（語彙）を含むパス形式の検索条件（検索条件パス式）から、語彙階層関係グラフ格納部１３２に格納されている語彙階層関係グラフ及び変換データ格納部１３３に格納されている変換データに基づいて、当該検索条件に含まれる文字列間の階層関係に適合する構造化文書（検索対象データ）を検索するためのXPath式が作成される。この作成されたXPath式に基づいて、検索対象データ格納部１３１に対して検索が実行される。これにより、検索者２０は、検索対象となる構造化文書のデータ構造を意識することなく、例えばキーワード（文字列）を指定するような簡便さで、構造化文書の階層関係を考慮した検索を実行することが可能となる。 As described above, in the present embodiment, the search condition (search condition path expression) in the path format including a plurality of character strings (vocabulary) designated by the searcher is stored in the vocabulary hierarchy relation graph storage unit 132. XPath for searching a structured document (search target data) that matches the hierarchical relationship between character strings included in the search condition based on the lexical hierarchy relationship graph and the conversion data stored in the conversion data storage unit 133 An expression is created. Based on the created XPath expression, the search target data storage unit 131 is searched. As a result, the searcher 20 can perform a search in consideration of the hierarchical relationship of the structured document, for example, by simply specifying a keyword (character string) without being aware of the data structure of the structured document to be searched. It becomes possible to execute.

また、本実施形態においては、上記したXPath式により検索された検索結果を、検索者２０によって指定された検索条件に基づいて評価することにより、当該検索者２０が意図した階層構造の構造化文書をより上位に順位付けして検索結果を返すことができる。これにより、検索者２０は、効率的に自身が必要とするデータ（構造化文書）を取得することが可能となる。 In the present embodiment, the search result searched by the above XPath expression is evaluated based on the search condition specified by the searcher 20, so that the structured document having the hierarchical structure intended by the searcher 20 is obtained. Can be ranked higher and return search results. As a result, the searcher 20 can efficiently acquire data (structured document) required by the searcher 20.

また、本実施形態においては、構造化文書の階層構造を示す構造グラフ上で、例えば任意の２頂点の特定の距離に基づいて、語彙階層関係グラフが作成される。これにより、この語彙階層関係グラフを用いて検索処理が実行される際、検索対象となる構造化文書上で、検索者２０によって指定された語彙が直接階層関係にない構造化文書であっても考慮して検索することが可能となる。 In the present embodiment, a lexical hierarchy relation graph is created on the structure graph indicating the hierarchical structure of the structured document, for example, based on a specific distance between two arbitrary vertices. Thus, when a search process is executed using this vocabulary hierarchy relation graph, even if the vocabulary specified by the searcher 20 is not directly in a hierarchical relation on the structured document to be searched, It becomes possible to search in consideration.

また、本実施形態においては、予め定義されたレベルを設定指定おくことで、未設定階層関係が入力パス式として指定された場合であっても、当該設定されたレベルに応じて、当該未設定階層関係を静的または動的に設定することが可能となる。 Further, in the present embodiment, by setting and specifying a pre-defined level, even if the unset hierarchy relationship is specified as an input path expression, the non-set level is set according to the set level. Hierarchical relationships can be set statically or dynamically.

なお、上記した本実施形態においては、検索者２０によって検索実行部１２に検索要求が出され、管理者３０によってデータ解析部１１に語彙階層関係グラフ作成要求が出される構成としているが、全ての要求を入力する要求制御部を設ける構成でも構わない。この場合、要求制御部は、入力された要求を解釈し、データ解析部１１または検索実行部１２に処理を自動で振り分けることが可能となる。 In the above-described embodiment, the searcher 20 issues a search request to the search execution unit 12, and the administrator 30 issues a lexical hierarchy relationship graph creation request to the data analysis unit 11. A configuration may be provided in which a request control unit for inputting a request is provided. In this case, the request control unit can interpret the input request and automatically distribute the processing to the data analysis unit 11 or the search execution unit 12.

また、本実施形態に係る検索結果評価装置１０は、上記したように検索装置であるものとして説明したが、検索対象データを検索対象データ格納部１３１に登録する機能を有する構成であってもよい。また、検索結果評価装置１０は、検索対象データを更新または削除する機能を含む管理装置として利用される構成であっても構わない。 Moreover, although the search result evaluation apparatus 10 according to the present embodiment has been described as being a search apparatus as described above, the search result evaluation apparatus 10 may have a function of registering search target data in the search target data storage unit 131. . Further, the search result evaluation device 10 may be configured to be used as a management device including a function of updating or deleting search target data.

［第２の実施形態］
次に、図１５を参照して、本発明の第２の実施形態に係る検索結果評価装置について説明する。図１５は、本実施形態に係る検索結果評価装置の主として機能構成を示すブロック図である。なお、前述した図１と同様の部分には同一参照符号を付してその詳しい説明を省略する。ここでは、図１と異なる部分について主に述べる。 [Second Embodiment]
Next, a search result evaluation apparatus according to the second embodiment of the present invention will be described with reference to FIG. FIG. 15 is a block diagram mainly showing a functional configuration of the search result evaluation apparatus according to the present embodiment. The same parts as those in FIG. 1 described above are denoted by the same reference numerals, and detailed description thereof is omitted. Here, parts different from FIG. 1 will be mainly described.

図１５に示す検索結果評価装置４０は、検索実行部４１及び検索結果評価部４２を含む。検索実行部４１は、クエリ作成部４１１を含む。クエリ作成部４１１は、例えば検索者２０によって指定された検索条件パス式を複数の検索条件パス式に展開する処理を実行する。このとき、クエリ作成部４１１は、検索者２０によって指定された検索条件パス式に含まれる複数の文字列に基づいて、当該検索条件パス式を展開する。クエリ作成部４１１は、例えば検索者２０によって指定された検索条件パス式に含まれる複数の文字列の並び替えることによって複数の検索条件パス式に展開する。これにより、クエリ作成部４１１は、検索者２０によって指定された検索条件パス式から複数の検索条件パス式を作成する。また、クエリ作成部４１１は、語彙階層関係グラフ格納部１３２及び変換データ格納部１３３を参照して、検索者２０によって指定された検索条件パス式及び展開された複数の検索パス式から検索式（例えばXPath式）を作成する。 The search result evaluation device 40 shown in FIG. 15 includes a search execution unit 41 and a search result evaluation unit 42. The search execution unit 41 includes a query creation unit 411. The query creation unit 411 executes, for example, processing for expanding the search condition path expression designated by the searcher 20 into a plurality of search condition path expressions. At this time, the query creation unit 411 expands the search condition path expression based on the plurality of character strings included in the search condition path expression designated by the searcher 20. For example, the query creation unit 411 expands the expression into a plurality of search condition path expressions by rearranging the character strings included in the search condition path expression designated by the searcher 20. In this way, the query creation unit 411 creates a plurality of search condition path expressions from the search condition path expression designated by the searcher 20. In addition, the query creation unit 411 refers to the vocabulary hierarchy relation graph storage unit 132 and the conversion data storage unit 133, and creates search expressions (for example, XPath expressions) from the search condition path expression designated by the searcher 20 and the plurality of expanded search condition path expressions.

検索結果評価部４２は、パス式間評価部４２１を含む。パス式間評価部４２１は、クエリ作成部４１１によって展開された複数の検索条件パス式の各々を、例えば検索者２０によって指定された検索条件パス式に基づいて評価する。この場合、パス式間評価部４２１は、検索者によって指定された検索条件パス式に含まれる複数の文字列の並び順及びクエリ作成部４１１によって展開された複数の検索条件パス式の各々に含まれる複数の文字列の並び順に基づいて、当該展開された複数の検索条件パス式の各々の評価値を算出する。パス式間評価部４２１は、算出された評価値に基づいて、展開された複数の検索条件パス式の順序付けを行う。 The search result evaluation unit 42 includes an inter-path expression evaluation unit 421. The inter-path-expression evaluation unit 421 evaluates each of the plurality of search condition path expressions developed by the query creation unit 411 based on, for example, the search condition path expression specified by the searcher 20. In this case, the inter-path expression evaluation unit 421 is included in each of the plurality of search condition path expressions developed by the query creation unit 411 and the arrangement order of the plurality of character strings included in the search condition path expression designated by the searcher. Each evaluation value of the developed plurality of search condition path expressions is calculated based on the order of arrangement of the plurality of character strings. The inter-path expression evaluation unit 421 orders a plurality of expanded search condition path expressions based on the calculated evaluation value.

次に、図１６のフローチャートを参照して、図１５に示す検索結果評価装置４０の検索者２０からの検索要求に応じて検索が実行される際の処理手順について説明する。
まず、検索者２０は、検索条件パス式を指定して検索要求を出す（ステップＳ６１）。この検索条件パス式には、複数の文字列（語彙）が含まれる。ここでは、検索条件パス式は、「天気／東京都」であるものとして説明する。 Next, a processing procedure when a search is executed in response to a search request from the searcher 20 of the search result evaluation device 40 shown in FIG. 15 will be described with reference to the flowchart of FIG.
First, the searcher 20 designates a search condition path expression and issues a search request (step S61). This search condition path expression includes a plurality of character strings (vocabulary). Here, the search condition path expression will be described as “weather / Tokyo”.

検索実行部４１のクエリ作成部４１１は、検索者２０によって指定された検索条件パス式を入力する。以下、クエリ作成部４１１によって入力された検索条件パス式を入力パス式と称する。
次に、クエリ作成部４１１は、入力パス式を展開する（ステップＳＳ６２）。クエリ作成部４１１は、入力パス式に含まれる文字列を並び替えることによって、当該入力パス式を複数の検索条件パス式に展開する。このとき、クエリ作成部４１１は、検索条件パス式に含まれる全ての文字列の順列（並び順）毎に展開する。具体的には、入力パス式が「天気／東京都」であれば、当該入力パス式は、「天気／東京都」及び「東京都／天気」のように複数の検索条件パス式に展開される。以下、クエリ作成部４１１によって展開された複数の検索条件パス式の各々を展開パス式と称する。 The query creation unit 411 of the search execution unit 41 inputs the search condition path expression designated by the searcher 20. Hereinafter, the search condition path expression input by the query creation unit 411 is referred to as an input path expression.
Next, the query creation unit 411 expands the input path expression (step S62). The query creation unit 411 expands the input path expression into a plurality of search condition path expressions by rearranging the character strings included in the input path expression. At this time, the query creation unit 411 generates one search condition path expression for every permutation (ordering) of all the character strings included in the search condition path expression. Specifically, if the input path expression is “weather / Tokyo”, it is expanded into a plurality of search condition path expressions such as “weather / Tokyo” and “Tokyo / weather”. Hereinafter, each of the plurality of search condition path expressions expanded by the query creation unit 411 is referred to as an expanded path expression.
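The permutation-based expansion of step S62 can be illustrated with a short Python sketch. The function name expand_path_expression and the use of a plain "/" separator are assumptions made for illustration; the sketch covers only the simple reordering case, not morphological analysis or expansion with logical operators.

```python
from itertools import permutations


def expand_path_expression(input_path):
    """Return one path expression per permutation of the keywords."""
    keywords = [keyword.strip() for keyword in input_path.split("/")]
    return ["/".join(order) for order in permutations(keywords)]


expanded = expand_path_expression("weather/Tokyo")
# expanded == ["weather/Tokyo", "Tokyo/weather"]
```

Because the number of permutations grows factorially with the number of keywords, restricting expansion to orderings supported by a stored graph, as the text goes on to suggest, is one way to keep the candidate set small.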

なお、展開パス式の各々によって示される文字列の順列は、当該順列の文字列の順によって階層関係を示す。例えば展開パス式が「天気／東京都」である場合には、「天気」から「東京都」に階層関係を有する旨を示す。以下、この階層関係を展開パス式の階層関係と称する。 Note that the permutation of character strings represented by each expanded path expression indicates a hierarchical relationship through the order of its character strings. For example, the expanded path expression “weather / Tokyo” indicates a hierarchical relationship from “weather” to “Tokyo”. Hereinafter, this hierarchical relationship is referred to as the hierarchical relationship of the expanded path expression.

また、上記したステップＳ６２においては、入力パス式に含まれるすべての文字列の順列毎に展開する場合について説明したが、例えば語彙階層関係グラフ格納部１３２を参照して、上記した展開パス式の階層関係を含む語彙階層関係グラフが存在する展開パス式のみが展開される構成でもよい。これにより、以下のステップにおいて処理される展開パス式の数が少なくなるため、処理量を減少させることが可能となる。 Further, in the above-described step S62, the case where expansion is performed for each permutation of all character strings included in the input path expression has been described. For example, referring to the vocabulary hierarchy relation graph storage unit 132, the expansion path expression Only a development path expression in which a vocabulary hierarchy relation graph including a hierarchy relation exists may be developed. As a result, the number of development path expressions processed in the following steps is reduced, and the amount of processing can be reduced.

また、入力パス式に含まれる文字列を形態素解析し、その結果により抽出された文字列により当該入力パス式を展開する構成でもよい。この場合、例えば入力パス式が「天気／くもり時々晴れ」である場合、「くもり時々晴れ」を形態素解析し、展開パス式を「天気／くもり／晴れ」、「天気／くもり」及び「天気／晴れ」のように展開することが可能となる。 Alternatively, a character string included in the input path expression may be subjected to morphological analysis, and the input path expression may be expanded using the character strings extracted as a result. In this case, for example, when the input path expression is “weather / cloudy, sometimes sunny”, the string “cloudy, sometimes sunny” is morphologically analyzed, and the input path expression can be expanded into expanded path expressions such as “weather / cloudy / sunny”, “weather / cloudy”, and “weather / sunny”.

また、入力パス式の展開には、上記したような当該入力パス式に含まれる文字列の単純な並び替えだけでなく、例えば「and」、「or」または「not」のような論理演算による展開も含まれる。つまり、入力パス式が「Ａ／Ｂ／Ｃ」である場合には、例えば「Ａ／（Ｂ and Ｃ）」に展開することも可能である。 In addition, the expansion of the input path expression is not only based on the simple rearrangement of the character strings included in the input path expression as described above, but also by a logical operation such as “and”, “or” or “not”. Deployment is also included. That is, when the input path expression is “A / B / C”, for example, it can be expanded to “A / (B and C)”.

次に、クエリ作成部４１１は、展開パス式の各々に対して、スコアリングを行う（ステップＳ６３）。例えば評価値を用いてスコアリング処理を実行する。また、クエリ作成部４１１は、必要に応じて語彙階層関係グラフ格納部１３２に格納されている語彙階層関係グラフを利用して評価値を算出する。 Next, the query creation unit 411 performs scoring for each expansion path expression (step S63). For example, scoring processing is executed using the evaluation value. Further, the query creation unit 411 calculates an evaluation value using the vocabulary hierarchy relationship graph stored in the vocabulary hierarchy relationship graph storage unit 132 as necessary.

ここで、上記したスコアリング処理に用いられる評価値の例について具体的に説明する。クエリ作成部４１１は、以下の例えば第８の評価値から第１０の評価値を含む複数の評価値を用いてスコアリング処理を実行する。
第８の評価値は、入力パス式と展開パス式の各々とを比較して、両者に含まれる文字列の順序の違いに応じて算出される。この場合、入力パス式に含まれる文字列の順序と比較して、順序が異なる文字列が多く含まれる展開パス式に対しては、評価値は小さくなる。例えばｘ＝入力パス式と順序の異なる文字列の数、Ｘ＝入力パス式または展開パス式に含まれる文字列の数とすると、例えば評価式Ｘ／（log a x）＋Ｎによって第８の評価値は算出される。ただし、評価式Ｘ／（log a x）＋Ｘにおいて、入力パス式と順序の異なる文字列がない場合、つまり、ｘ＝０の場合には、評価値１が算出されるものとする。なお、評価式において、「a」は底とし、任意に設定可能である。以下の評価値においても同様である。 Here, an example of evaluation values used in the above scoring process will be specifically described. The query creation unit 411 executes the scoring process using a plurality of evaluation values including the following eighth evaluation value to tenth evaluation value, for example.
The eighth evaluation value is calculated by comparing the input path expression with each expanded path expression, according to the difference in the order of the character strings they contain. In this case, an expanded path expression that contains many character strings whose order differs from that in the input path expression receives a smaller evaluation value. For example, let x be the number of character strings whose order differs from the input path expression, and let X be the number of character strings included in the input path expression or the expanded path expression; the eighth evaluation value is then calculated by, for example, the evaluation expression X / (log_a x) + X. However, when no character string differs in order from the input path expression, that is, when x = 0, the evaluation value is 1. In the evaluation expression, “a” is the base of the logarithm and can be set arbitrarily. The same applies to the evaluation values below.

第９の評価値は、入力パス式と展開パス式の各々とを比較して、両者の文字列（語彙）の種類の違いに応じて算出される。この場合、入力パス式に含まれる語彙の種類と異なる種類の語彙が多く含まれる展開パス式に対しては、評価値は小さくなる。ここで、例えば展開パス式に含まれる語彙が、上記した入力パス式に含まれる語彙そのものでなく、形態素解析して抽出された語彙である場合には、語彙の種類が異なるものとして扱われる。例えばｙ＝入力パス式に含まれる語彙の種類と異なる種類の語彙の数、Ｙ＝入力パス式または展開パス式に含まれる語彙の数とすると、例えば評価式Ｙ／（log a y）＋Ｙによって第９の評価値は算出される。ただし、評価式Ｙ／（log a y）＋Ｙにおいて入力パス式に含まれる語彙の種類と異なる種類の語彙がない場合、つまり、ｙ＝０の場合には評価値１が算出されるものとする。 The ninth evaluation value is calculated by comparing the input path expression with each expanded path expression, according to the difference in the types of character strings (vocabulary) they contain. In this case, an expanded path expression that contains many vocabulary items whose type differs from those in the input path expression receives a smaller evaluation value. Here, for example, when a vocabulary item in the expanded path expression is not a vocabulary item of the input path expression itself but one extracted by morphological analysis, it is treated as being of a different type. For example, let y be the number of vocabulary items whose type differs from those in the input path expression, and let Y be the number of vocabulary items included in the input path expression or the expanded path expression; the ninth evaluation value is then calculated by, for example, the evaluation expression Y / (log_a y) + Y. However, when no vocabulary item differs in type from those in the input path expression, that is, when y = 0, the evaluation value is 1.

第１０の評価値は、展開パス式内で隣接されている文字列（語彙）間の関係に応じて算出される。この場合、展開パス式の階層関係の各々のうち、例えば親子関係として設定されていない階層関係の数が多い場合には、評価値は小さくなる。例えばＺ＝親子関係として設定されていない階層関係の数とすると、例えば評価式１／（log a Ｚ）＋１によって第１０の評価値は算出される。ただし、評価式１／（log a Ｚ）＋１において、設定されていない階層関係がない場合、つまり、Ｚ＝０の場合には評価値１が算出されるものとする。なお、第１０の評価値は、例えば語彙階層関係グラフ格納部１３２に格納されている語彙階層関係グラフの各々について上記した評価式で評価値を求め、当該語彙階層関係グラフの各々について求められた評価値を合計することによって求められる。 The tenth evaluation value is calculated according to the relationship between adjacent character strings (vocabulary) in the development path expression. In this case, for example, when the number of hierarchical relationships that are not set as parent-child relationships is large in each of the hierarchical relationships of the development path expression, the evaluation value becomes small. For example, if Z = the number of hierarchical relationships that are not set as parent-child relationships, the tenth evaluation value is calculated by, for example, the evaluation formula 1 / (log a Z) +1. However, in the evaluation formula 1 / (log a Z) +1, when there is no hierarchical relationship that is not set, that is, when Z = 0, the evaluation value 1 is calculated. The tenth evaluation value is obtained for each of the vocabulary hierarchy relationship graphs, for example, by obtaining the evaluation value with the above-described evaluation formula for each of the vocabulary hierarchy relationship graphs stored in the vocabulary hierarchy relationship graph storage unit 132. It is obtained by summing the evaluation values.
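The source notation for the eighth to tenth evaluation expressions leaves the grouping of terms ambiguous. The Python sketch below assumes one possible reading, value = N / (log_a(n) + N), chosen here only because it stays defined when n = 1 and decreases as n grows, as the text requires; this grouping is an assumption and not something the text confirms. The function name log_penalty_score and the default base 2 are likewise hypothetical. For the eighth and ninth values, N is the number of character strings (X or Y) and n is the number of differing ones (x or y); for the tenth value, N = 1 and n = Z.

```python
import math


def log_penalty_score(total, differing, base=2.0):
    """Assumed reading: total / (log_base(differing) + total); 1 when differing == 0."""
    if differing == 0:
        # The text defines the evaluation value as 1 in this case.
        return 1.0
    return total / (math.log(differing, base) + total)


eighth_no_change = log_penalty_score(3, 0)   # no reordered strings
eighth_two_moved = log_penalty_score(3, 2)   # two of three strings reordered
tenth_two_unset = log_penalty_score(1, 2)    # N = 1, Z = 2
```

Under this reading every value lies in (0, 1], so the sum used for scoring weights the eighth, ninth, and tenth values comparably; a different grouping would change both the range and the behavior at n = 1.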

展開パス式の各々は、例えば上記した第８から第１０の評価値を含む複数の評価値の合計によってスコアリングされる。クエリ作成部４１１は、スコアリングされた結果を元に、スコア（評価値の合計）が上位の展開パス式を選択する（ステップＳ６４）。以下、クエリ作成部４１１によって選択されたスコアが上位の展開パス式を中間パス式と称する。なお、中間パス式は複数であっても構わない。 Each of the development path expressions is scored by, for example, the sum of a plurality of evaluation values including the above-described eighth to tenth evaluation values. Based on the scored result, the query creation unit 411 selects an expanded path expression having a higher score (total evaluation value) (step S64). Hereinafter, an expanded path expression having a higher score selected by the query creation unit 411 is referred to as an intermediate path expression. There may be a plurality of intermediate path expressions.

検索実行部４１（クエリ作成部４１１）は、入力パス式及び中間パス式を検索結果評価部４２に渡す。
検索結果評価部４２のパス式間評価部４２１は、検索実行部４１から渡された入力パス式及び中間パス式を取得する。パス式間評価部４２１は、取得された入力パス式に基づいて、取得された中間パス式（の各々）を評価する（ステップＳ６５）。この場合、パス式間評価部４２１は、入力パス式に含まれる複数の文字列の並び順及び中間パス式に含まれる複数の文字列の並び順に基づいて、当該中間パス式の評価値を算出し、算出された評価値に基づいて、中間パス式の順序付けを行う。パス式間評価部４２１は、順序付けされた中間パス式を検索実行部４１に返す。 The search execution unit 41 (query creation unit 411) passes the input path expression and intermediate path expression to the search result evaluation unit.
The inter-path expression evaluation unit 421 of the search result evaluation unit 42 acquires the input path expression and the intermediate path expression passed from the search execution unit 41. The inter-path expression evaluation unit 421 evaluates the acquired intermediate path expressions (each) based on the acquired input path expression (step S65). In this case, the inter-path expression evaluation unit 421 calculates the evaluation value of the intermediate path expression based on the arrangement order of the plurality of character strings included in the input path expression and the arrangement order of the plurality of character strings included in the intermediate path expression. Then, the intermediate path formulas are ordered based on the calculated evaluation values. The inter-path expression evaluation unit 421 returns the ordered intermediate path expressions to the search execution unit 41.

ここで、パス式間評価部４２１による中間パス式の評価値を算出する処理について具体的に説明する。ここでは、入力パス式は、「合田／発明／構造化文書」であるものとして説明する。この入力パス式は、例えば「合田が行った発明で構造化文書を含むもの」といった意味を表す。また、クエリ作成部４１１によって入力パス式「合田／発明／構造化文書」から中間パス式「発明／（合田 and 構造化文書）」及び「合田／発明／構造化文書」が作成されたものとする。以下、中間パス式「発明／（合田 and 構造化文書）」を第１の中間パス式、中間パス式「合田／発明／構造化文書」を第２の中間パス式と称する。 Here, the process by which the inter-path expression evaluation unit 421 calculates the evaluation value of an intermediate path expression will be described in detail. In this description, the input path expression is assumed to be “Goda / invention / structured document”. This input path expression means, for example, “an invention made by Goda that includes a structured document”. It is also assumed that the query creation unit 411 has created the intermediate path expressions “invention / (Goda and structured document)” and “Goda / invention / structured document” from the input path expression “Goda / invention / structured document”. Hereinafter, the intermediate path expression “invention / (Goda and structured document)” is referred to as the first intermediate path expression, and the intermediate path expression “Goda / invention / structured document” is referred to as the second intermediate path expression.

なお、例えば上記ステップＳ６３におけるスコアリングの結果によっては、上記のように入力パス式と第２の中間パス式が同じになる場合もあり得る。この場合には、入力パス式及び第２の中間パス式を１つの検索条件パス式（例えば入力パス式）として扱っても構わない。 For example, depending on the result of scoring in step S63, the input path expression and the second intermediate path expression may be the same as described above. In this case, the input path expression and the second intermediate path expression may be handled as one search condition path expression (for example, an input path expression).

以下、パス式間評価部４２１が第１の中間パス式及び第２の中間パス式の評価値を算出するアルゴリズム（以下、第２のアルゴリズムと表記）の一例について説明する。
まず、パス式間評価部４２１は、例えば入力パス式及び中間パス式（ここでは、第１の中間パス式及び第２の中間パス式）において同一の階層関係にある同一のキーワードの組み合わせの数（以下、第１の組み合わせ数と表記）を算出する。
ここで、階層関係とは、入力パス式または中間パス式において、「／」で区切られているキーワードの関係を意味する。また、複数の「／」で区切られているキーワード間にも階層関係があると定義する。例えば入力パス式（中間パス式）が「Ａ／Ｂ／Ｃ」である場合には、当該入力パス式には、「Ａ→Ｂ」、「Ａ→Ｃ」及び「Ｂ→Ｃ」という階層関係が含まれている。 Hereinafter, an example of an algorithm (hereinafter referred to as a second algorithm) in which the inter-path expression evaluation unit 421 calculates evaluation values of the first intermediate path expression and the second intermediate path expression will be described.
First, the inter-path expression evaluation unit 421 calculates, for example, the number of pairs of identical keywords that are in the same hierarchical relationship in both the input path expression and the intermediate path expression (here, the first intermediate path expression and the second intermediate path expression). This number is hereinafter referred to as the first combination number.
Here, a hierarchical relationship means the relationship between keywords separated by “/” in an input path expression or an intermediate path expression. Keywords separated by more than one “/” are also defined as having a hierarchical relationship. For example, when the input path expression (intermediate path expression) is “A / B / C”, it contains the hierarchical relationships “A → B”, “A → C”, and “B → C”.

次に、パス式間評価部４２１は、入力パス式に含まれる複数のキーワード（文字列）のうち、任意の２つのキーワードの組み合わせの数（以下、第２の組み合わせ数と表記）を算出する。上記したように例えば入力パス式が「Ａ／Ｂ／Ｃ」である場合には、任意の２つのキーワードの組み合わせは「Ａ、Ｂ」、「Ａ、Ｃ」及び「Ｂ、Ｃ」であるから、第２の組み合わせ数は３となる。
パス式間評価部４２１は、算出された第１の組み合わせ数及び第２の組み合わせ数から、第１の組み合わせ数／第２の組み合わせ数を中間パス式の評価値として算出する。 Next, the inter-path expression evaluation unit 421 calculates the number of combinations of arbitrary two keywords (hereinafter referred to as the second combination number) among a plurality of keywords (character strings) included in the input path expression. . As described above, for example, when the input path expression is “A / B / C”, the combination of any two keywords is “A, B”, “A, C”, and “B, C”. The second combination number is 3.
The inter-path expression evaluation unit 421 calculates the first combination number / the second combination number as the evaluation value of the intermediate path expression from the calculated first combination number and second combination number.

The evaluation values of the first and second intermediate path expressions described above are now explained concretely. First, the hierarchical relationships of the input path expression “Goda/invention/structured document” are “Goda → invention”, “Goda → structured document”, and “invention → structured document”. The hierarchical relationships of the first intermediate path expression “invention/(Goda and structured document)” are “invention → Goda” and “invention → structured document”. The hierarchical relationships of the second intermediate path expression “Goda/invention/structured document” are “Goda → invention”, “Goda → structured document”, and “invention → structured document”.
In this case, the first combination number between the input path expression and the first intermediate path expression is 1 (“invention → structured document”). The first combination number between the input path expression and the second intermediate path expression is 3 (“Goda → invention”, “Goda → structured document”, “invention → structured document”).
The second combination number of the input path expression is 3 (“Goda → invention”, “Goda → structured document”, “invention → structured document”).
Accordingly, the evaluation value of the first intermediate path expression, given by the first combination number divided by the second combination number, is 1/3. The evaluation value of the second intermediate path expression is 3/3 (= 1).
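
The calculation above can be illustrated with a minimal Python sketch. It assumes that the hierarchical relationships of a path expression are represented as a set of (ancestor, descendant) pairs; the function names `linear_relations` and `evaluation_value` are hypothetical and do not appear in the specification.

```python
from fractions import Fraction
from itertools import combinations


def linear_relations(keywords):
    """Ancestor -> descendant pairs of a linear path such as A/B/C (assumed representation)."""
    return set(combinations(keywords, 2))


def evaluation_value(input_relations, intermediate_relations, keyword_count):
    """First combination number divided by second combination number."""
    # First combination number: relationships shared with the input path expression.
    first = len(input_relations & intermediate_relations)
    # Second combination number: ways of choosing any two keywords of the input path expression.
    second = keyword_count * (keyword_count - 1) // 2
    return Fraction(first, second)


keywords = ["Goda", "invention", "structured document"]
input_rel = linear_relations(keywords)

# First intermediate path expression: invention/(Goda and structured document)
first_intermediate = {("invention", "Goda"), ("invention", "structured document")}
# Second intermediate path expression: Goda/invention/structured document
second_intermediate = linear_relations(keywords)

value_1 = evaluation_value(input_rel, first_intermediate, len(keywords))
value_2 = evaluation_value(input_rel, second_intermediate, len(keywords))
print(value_1, value_2)  # 1/3 1
```

Exact fractions are used here so that the values 1/3 and 3/3 compare without rounding; any numeric type would serve equally well for ranking.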

The inter-path expression evaluation unit 421 orders the first and second intermediate path expressions based on their calculated evaluation values. For example, the intermediate path expression with the larger evaluation value (here, the second intermediate path expression) is ranked higher.

Note that the second algorithm described above is merely an example; the definition of the hierarchical relationships and the method of calculating the evaluation value may differ from those described here.

Next, the query creation unit 411 refers to the vocabulary hierarchy relationship graph storage unit 132 and acquires, for each input path expression and each intermediate path expression, a vocabulary hierarchy relationship graph that includes the hierarchical relationships of that expression, for example as parent-child relationships (step S66).
If no vocabulary hierarchy relationship graph that includes all of the hierarchical relationships of the input path expression and the intermediate path expression as parent-child relationships exists in the vocabulary hierarchy relationship graph storage unit 132, the query creation unit 411 acquires a vocabulary hierarchy relationship graph that includes some of those hierarchical relationships as parent-child relationships. For example, suppose that the input path expression (or intermediate path expression) is A/B/C (its hierarchical relationships being A to B and B to C) and that no vocabulary hierarchy relationship graph completely including these relationships exists in the vocabulary hierarchy relationship graph storage unit 132. In this case, the query creation unit 411 acquires, for example, a vocabulary hierarchy relationship graph that includes only the A-to-B relationship as a parent-child relationship, or one that includes only the B-to-C relationship as a parent-child relationship.
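
The fallback behavior of step S66 can be sketched as follows. This is only an illustration under stated assumptions: each stored graph is modeled as a set of (parent, child) edges keyed by a hypothetical graph name, and `select_graphs` is an invented helper, not a component of the described apparatus.

```python
def select_graphs(required, graphs):
    """Return names of graphs covering all required relations, else those covering some.

    required: set of (parent, child) pairs taken from a path expression
    graphs:   dict mapping a graph name to its set of (parent, child) edges (assumed model)
    """
    # Prefer graphs that contain every required parent-child relationship.
    full = [name for name, edges in graphs.items() if required <= edges]
    if full:
        return full
    # Otherwise fall back to graphs that contain at least one of them.
    return [name for name, edges in graphs.items() if required & edges]


required = {("A", "B"), ("B", "C")}  # path expression A/B/C
stored = {
    "g1": {("A", "B")},
    "g2": {("B", "C"), ("C", "D")},
    "g3": {("X", "Y")},
}
partial_match = select_graphs(required, stored)
print(partial_match)  # ['g1', 'g2']
```

When several graphs are returned, each one would be processed separately in the subsequent steps, as the description notes for the case of multiple acquired graphs.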

Based on the acquired vocabulary hierarchy relationship graph, the query creation unit 411 determines the node type of each character string (vocabulary item) included in the input path expression and the intermediate path expression (step S67). Based on the hierarchical relationships of the input path expression and the determined node types of its character strings, the query creation unit 411 creates a search expression, for example an XPath expression, for retrieving search target data that has the hierarchical relationships between the character strings included in the input path expression. Similarly, based on the hierarchical relationships of the intermediate path expression and the determined node types of its character strings, the query creation unit 411 creates a search expression (XPath expression) for retrieving search target data that has the hierarchical relationships between the character strings included in the intermediate path expression.
That is, the query creation unit 411 converts the input path expression and the intermediate path expression into respective XPath expressions (step S68). The query creation unit 411 performs this conversion based on the conversion data stored in the conversion data storage unit 133. Since the conversion process is the same as in the first embodiment described above, a detailed description is omitted.

Next, the search control unit 122 of the search execution unit 41 specifies the XPath expressions created by the query creation unit 411 and outputs a search request to the storage unit 13.
In response to the search request, and under the control of the search control unit 122, the storage unit 13 searches the search target data storage unit 131 for search target data (structured documents) that match the search request (step S69). The storage unit 13 executes the search for each of the XPath expressions created by the query creation unit 411 and, for each XPath expression, returns search result data including the retrieved search target data to the search control unit 122. The search control unit 122 thereby acquires search result data for each XPath expression.

Hereinafter, search result data including search target data retrieved with an XPath expression created (converted) from the input path expression is referred to as the search result data of the input path expression. Likewise, search result data including search target data retrieved with an XPath expression created from an intermediate path expression is referred to as the search result data of the intermediate path expression. Where the text refers simply to search result data, it covers both the search result data of the input path expression and the search result data of the intermediate path expressions.

The search control unit 122 passes the acquired search result data and the input path expression to the search result evaluation unit 42.
The path expression inter-document evaluation unit 141 of the search result evaluation unit 42 evaluates the search target data (structured documents) included in the search result data, based on the input path expression and the search result data passed from the search control unit 122 and on the result of ordering the intermediate path expressions by the inter-path expression evaluation unit 421 (step S70).

At this time, the path expression inter-document evaluation unit 141 calculates evaluation values for the search result data of the input path expression and of each intermediate path expression, based on the input path expression passed from the search control unit 122.
That is, based on the input path expression, the path expression inter-document evaluation unit 141 calculates an evaluation value for each item of search target data included in the search result data of the input path expression. It then orders the search target data included in the search result data of the input path expression based on the calculated evaluation values.

Similarly, based on the input path expression passed from the search control unit 122, the path expression inter-document evaluation unit 141 calculates an evaluation value for each item of search target data included in the search result data of each intermediate path expression. It then orders the search target data included in the search result data of the intermediate path expression based on the calculated evaluation values.
Since the evaluation value of the search target data is calculated by, for example, the first algorithm of the first embodiment described above, a detailed description is omitted.

The path expression inter-document evaluation unit 141 then orders the search target data included in the search result data as a whole, based on the result of ordering the intermediate path expressions by the inter-path expression evaluation unit 421 in step S65 and on the result of ordering the individual items of search target data. The search result data including the search target data ordered by the path expression inter-document evaluation unit 141 is passed to the search execution unit 41.

The search control unit 122 returns the search result data passed from the path expression inter-document evaluation unit 141 to the searcher 20 as the search result for the search request from the searcher 20 (step S71). The searcher 20 thereby acquires the search result for the search request (step S72). The search result is presented (displayed) to the searcher 20 starting from the search target data ranked highest by the path expression inter-document evaluation unit 141.

If the query creation unit 411 acquires a plurality of vocabulary hierarchy relationship graphs in step S66, the processing of steps S67 and S68 is executed for each acquired vocabulary hierarchy relationship graph. As a result, when a plurality of XPath expressions are created (converted) from the input path expression (or an intermediate path expression), for example, the processing of step S69 is executed with those XPath expressions combined in an OR (logical disjunction) relationship.
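
A minimal sketch of searching with several expressions in an OR relationship is shown below. It assumes documents held as XML strings in a dictionary and uses the limited XPath subset of Python's standard `xml.etree.ElementTree`, evaluating each expression separately; `search_any` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def search_any(documents, xpath_expressions):
    """Return IDs of documents matched by at least one expression (OR relationship)."""
    hits = []
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # findall returns an empty list when nothing matches, which is falsy.
        if any(root.findall(expr) for expr in xpath_expressions):
            hits.append(doc_id)
    return hits


documents = {
    "d1": "<doc><author>Goda</author></doc>",
    "d2": "<doc><title>structured document</title></doc>",
    "d3": "<doc><note/></doc>",
}
or_hits = search_any(documents, ["./author", "./title"])
print(or_hits)  # ['d1', 'd2']
```

In a full XPath 1.0 processor the same effect could be obtained by joining the expressions with the union operator `|`; the per-expression loop is used here only because it stays within the standard library.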

Next, a specific example of how the path expression inter-document evaluation unit 141 orders the search target data is described with reference to FIGS. 17 and 18.
Here, the intermediate path expressions created from the input path expression by the query creation unit 411 are assumed to be intermediate path expression 1 and intermediate path expression 2.
FIG. 17 shows an example of the search result data for each search condition path expression (input path expression, intermediate path expression 1, intermediate path expression 2). In the example of FIG. 17, the search result data of the input path expression includes search target data 1 to 3. Similarly, the search result data of intermediate path expression 1 includes search target data 4 to 6, and the search result data of intermediate path expression 2 includes search target data 7 and 8.

Assume that the inter-path expression evaluation unit 421 has calculated, for example by the second algorithm described above, an evaluation value of 1/3 for intermediate path expression 1 and an evaluation value of 1 for intermediate path expression 2. In this case, the inter-path expression evaluation unit 421 ranks intermediate path expression 2 first and intermediate path expression 1 second.
Assume also that the path expression inter-document evaluation unit 141 has calculated evaluation values for the search result data of the input path expression, intermediate path expression 1, and intermediate path expression 2, for example by the first algorithm described above. As a result, the search result data of the input path expression is ordered as search target data 1, search target data 3, search target data 2. The search result data of intermediate path expression 1 is ordered as search target data 5, search target data 4, search target data 6. The search result data of intermediate path expression 2 is ordered as search target data 8, search target data 7.
Based on the ordering of the intermediate path expressions and the ordering within each set of search result data described above, the path expression inter-document evaluation unit 141 orders all of the search target data included in the search result data.

FIG. 18 shows the result of ordering the search target data included in the search result data based on the ordering of the intermediate path expressions and the ordering within each set of search result data described above.
As described above, the intermediate path expressions are ranked in the order intermediate path expression 2, intermediate path expression 1. The search result data is therefore arranged in the order: search result data of the input path expression, search result data of intermediate path expression 2, search result data of intermediate path expression 1. Within each set of search result data, the search target data keeps the order assigned to that set as described above.

Thus, as shown in FIG. 18, search target data 1, search target data 3, and search target data 2, which are included in the search result data of the input path expression, are ranked highest. Next come search target data 8 and search target data 7, which are included in the search result data of intermediate path expression 2. These are followed by search target data 5, search target data 4, and search target data 6, which are included in the search result data of intermediate path expression 1.
That is, in this case, the search result data (each item of search target data included in it) is presented to the searcher 20 in the order shown in FIG. 18.
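
The overall ordering in this example can be reproduced with a short Python sketch. The data layout (lists of already-ordered document numbers and a dictionary of evaluation values) and the function name `merge_results` are assumptions made for illustration only.

```python
from fractions import Fraction


def merge_results(input_hits, intermediate_hits, intermediate_scores):
    """Concatenate result lists: input path expression first, then intermediate
    path expressions in descending order of their evaluation values."""
    ordered = list(input_hits)
    ranked = sorted(intermediate_hits,
                    key=lambda name: intermediate_scores[name],
                    reverse=True)
    for name in ranked:
        # Each list is assumed to be already ordered by its per-document evaluation.
        ordered.extend(intermediate_hits[name])
    return ordered


input_hits = [1, 3, 2]
intermediate_hits = {
    "intermediate path expression 1": [5, 4, 6],
    "intermediate path expression 2": [8, 7],
}
intermediate_scores = {
    "intermediate path expression 1": Fraction(1, 3),
    "intermediate path expression 2": Fraction(1),
}
final_order = merge_results(input_hits, intermediate_hits, intermediate_scores)
print(final_order)  # [1, 3, 2, 8, 7, 5, 4, 6]
```

The printed sequence corresponds to the order described for FIG. 18: the results of the input path expression, then those of intermediate path expression 2, then those of intermediate path expression 1.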

As described above, in this embodiment, unlike the first embodiment, a plurality of search condition path expressions (intermediate path expressions) are created from the search condition path expression (input path expression) specified by the searcher 20. The search processing is executed based not only on the XPath expression created from the input path expression but also on the XPath expressions created from these intermediate path expressions. This makes it possible to search for hierarchical relationships other than those of the input path expression containing the plurality of character strings (vocabulary items) specified by the searcher 20. Therefore, compared with the first embodiment, the searcher 20 can execute a search that takes the hierarchical relationships of structured documents into account with even less awareness of their data structure.

Further, in this embodiment, evaluating the search results (data) of the input path expression and the intermediate path expressions makes it possible to return search results in which structured documents having the hierarchical structure intended by the searcher 20 are ranked higher. The searcher 20 can thus efficiently obtain the data (structured documents) that he or she needs.

Further, in this embodiment, the intermediate path expressions created from the input path expression are evaluated and ranked in advance. This reduces the evaluation processing required for the search target data included in the search results and thereby shortens the overall processing time.

Note that the present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from all the constituent elements shown in each embodiment. Furthermore, constituent elements of different embodiments may be combined as appropriate.

FIG. 1 is a block diagram mainly showing the functional configuration of a search result evaluation apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing procedure when a vocabulary hierarchy relationship graph is created in the search result evaluation apparatus 10 shown in FIG. 1.
FIG. 3 is a flowchart showing the processing procedure when a search is executed in response to a search request from the searcher 20 in the search result evaluation apparatus 10 shown in FIG. 1.
FIG. 4 is a diagram showing an example of search target data stored in the search target data storage unit 131.
FIG. 5 is a flowchart showing the processing procedure for creating a structure graph indicating the hierarchical structure of analysis target data.
FIG. 6 is a diagram showing an example of a structure graph 200 indicating the hierarchical structure of the weather forecast data 100 shown in FIG. 4.
FIG. 7 is a flowchart showing the processing procedure for creating a vocabulary hierarchy relationship graph using a structure graph.
FIG. 8 is an abbreviated diagram of the vocabulary hierarchy relationship graph 300 created by the vocabulary hierarchy relationship graph creation unit 112.
FIG. 9 is a flowchart showing the processing procedure for creating a search expression.
FIG. 10 is a diagram showing an example of the data structure of the conversion data stored in the conversion data storage unit 133.
FIG. 11 is a diagram showing an example of a search result obtained with an XPath expression created by the query creation unit 121.
FIG. 12 is a diagram showing an example of the data structure of information indicating predefined setting levels.
FIG. 13 is a diagram showing search target data 101 obtained as a search result.
FIG. 14 is a diagram showing search target data 102 obtained as a search result.
FIG. 15 is a block diagram mainly showing the functional configuration of a search result evaluation apparatus according to a second embodiment of the present invention.
FIG. 16 is a flowchart showing the processing procedure when a search is executed in response to a search request from the searcher 20 in the search result evaluation apparatus 40 shown in FIG. 15.
FIG. 17 is a diagram showing an example of the search result data of each search condition path expression.
FIG. 18 is a diagram showing the result of ordering the search target data included in the search result data.

Explanation of symbols

10, 40 ... search result evaluation apparatus, 11 ... data analysis unit, 12, 41 ... search execution unit, 13 ... storage unit, 14, 42 ... search result evaluation unit, 111 ... data analysis control unit, 112 ... vocabulary hierarchy relationship graph creation unit, 121, 411 ... query creation unit, 122 ... search control unit, 131 ... search target data storage unit, 132 ... vocabulary hierarchy relationship graph storage unit, 133 ... conversion data storage unit, 141 ... path expression inter-document evaluation unit, 421 ... inter-path expression evaluation unit.

Claims

A search result evaluation device that executes, on search target data storage means storing a plurality of structured documents to be searched, a search according to a search condition including a plurality of character strings specified by a user, the device comprising:
vocabulary hierarchy relationship graph storage means for storing, for each structured document, a vocabulary hierarchy relationship graph indicating the hierarchical relationships between vocabulary items included in each of the plurality of structured documents stored in the search target data storage means;
query creation means for creating, with reference to the vocabulary hierarchy relationship graph storage means, a search expression for searching for a structured document from the plurality of character strings included in the search condition;
search means for searching the search target data storage means for a structured document that matches the created search expression; and
search result evaluation means for evaluating the retrieved structured document based on the plurality of character strings included in the search condition.

The search result evaluation device according to claim 1, wherein the search result evaluation means
includes calculation means for calculating an evaluation value of the retrieved structured document based on the locations at which the plurality of character strings included in the search condition appear in that structured document, and
orders the retrieved structured documents based on the calculated evaluation value.

The search result evaluation device further comprising conversion data storage means for storing in advance relationship information indicating a hierarchical relationship between character strings and search expression information indicating a search expression for searching for a structured document having that hierarchical relationship between the character strings, wherein
the query creation means includes:
expansion means for expanding the search condition into a plurality of search conditions by rearranging the character strings included in the search condition;
acquisition means for acquiring, for each of the search condition specified by the user and the plurality of expanded search conditions, a vocabulary hierarchy relationship graph that includes as vocabulary the plurality of character strings included in that search condition; and
search expression creation means for creating the search expression indicated by the search expression information stored in the conversion data storage means in association with the relationship information indicating the hierarchical relationship between the plurality of character strings indicated by the acquired vocabulary hierarchy relationship graph,
the search means searches the search target data storage means for a structured document that matches the search expression created by the search expression creation means,
the search result evaluation means includes
search condition evaluation means for evaluating each of the plurality of expanded search conditions based on the search condition including the plurality of character strings specified by the user, and
the search result evaluation means evaluates the structured document retrieved by the search means based on the plurality of character strings included in the search condition specified by the user and on the evaluation result of the search condition evaluation means.

The search result evaluation device according to claim 3, wherein the search condition evaluation means includes calculation means for calculating an evaluation value of each of the plurality of expanded search conditions based on the order of the plurality of character strings in the search condition specified by the user and the order of the plurality of character strings in each of the plurality of expanded search conditions, and
the search result evaluation means orders the retrieved structured documents based on the calculated evaluation value.

A search result evaluation method applied in a search result evaluation device that has search target data storage means storing a plurality of structured documents to be searched and vocabulary hierarchy relationship graph storage means, and that executes, on the search target data storage means, a search according to a search condition including a plurality of character strings specified by a user, the method comprising the steps of:
storing, in the vocabulary hierarchy relationship graph storage means and for each structured document, a vocabulary hierarchy relationship graph indicating the hierarchical relationships between vocabulary items included in each of the plurality of structured documents stored in the search target data storage means;
creating, with reference to the vocabulary hierarchy relationship graph storage means, a search expression for searching for a structured document from the plurality of character strings included in the search condition;
searching the search target data storage means for a structured document that matches the created search expression; and
evaluating the retrieved structured document based on the plurality of character strings included in the search condition.