JP5439028B2

JP5439028B2 - Information search apparatus, information search method, and program

Info

Publication number: JP5439028B2
Application number: JP2009116025A
Authority: JP
Inventors: 達彦岡田; 健典亘; 敬司溝渕; 貞治高井; 隆光石岡; 世紀井上
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2009-05-12
Filing date: 2009-05-12
Publication date: 2014-03-12
Anticipated expiration: 2029-05-12
Also published as: JP2010266970A

Description

The present invention relates to an information search apparatus, an information search method, and a program that perform a search according to the result of analyzing input text data.

For example, user opinions received by e-mail and electronic documents within a company are collected, converted into text data, and stored in a database or the like. When an operator handles an inquiry or complaint from a user, one known search method uses the text entered by the operator as a search key, retrieves the applicable response procedure from the database, and transmits it to the terminal used by the operator.
In such operator-handled situations, the search must narrow the results down to those that best fit the search key so that the customer is not kept waiting. In these searches, however, the search key is a text such as an inquiry or complaint from a user, which carries the meaning of the sentences and the user's intention. It is therefore necessary to perform a search that gives more weight to the meaning of the text and the user's intention than a general search that uses words or keywords as search keys.
For example, in one technique, when the dictionary data to be searched by a search device is created, the syntactic structure and meaning of each text to be searched are analyzed, the parts of speech and dependency relations of the words in the analyzed text are extracted, and tree-structured dictionary data that uses the extracted information as matching conditions is created. A search text entered as a search key is then analyzed, analysis results that satisfy the matching conditions are retrieved from the dictionary data, and the results are displayed on the user's terminal (see, for example, Patent Document 1).

Patent Document 1: JP 2003-58537 A

However, a search using dictionary data as in Patent Document 1 has the problem that separate dictionary data must be built for each matching purpose and condition.
For example, when the two sentences "the mobile phone is hard to connect" and "the mobile phone does not connect" are used to create dictionary data under matching conditions suited to "inquiries about usage", the dictionary data is built as follows. Both sentences carry the sense of an inquiry about how to use a mobile phone, so under these matching conditions they are sentences that should be associated with each other. Accordingly, dictionary data for "inquiries about usage" must be created under matching conditions in which the two sentences are associated so that both are retrieved together.

On the other hand, consider creating dictionary data based on matching conditions suited to "handling opinions about the company". Here the sentence "the mobile phone is hard to connect" carries the sense of a user's request for improvement, whereas "the mobile phone does not connect" carries the sense of a complaint from a user. When searching for improvement requests from users, it is therefore preferable that only the former, and not the latter, is returned as a search result. In such a case, separate dictionary data based on matching conditions suited to "handling opinions about the company" must be created.

In other words, when the purposes of searches differ as described above, dictionary data based on the matching conditions for each purpose has had to be created separately, which increases the effort required to build dictionary data.
In addition, because a huge volume of dictionary data must be stored separately for each set of matching conditions, the storage area cannot be used efficiently.

The present invention has been made in view of these circumstances, and its object is to provide an information search apparatus, an information search method, and a program capable of performing searches based on different matching conditions using a single set of dictionary data.

To solve the problems described above, an information search apparatus according to the present invention includes: an input unit to which a search key sentence composed of a plurality of words is input; an analysis unit that analyzes the search key sentence and obtains an analysis result regarding the words constituting the search key sentence; a match dictionary storage unit that stores a dictionary organized as a tree structure in which clauses, each composed of at least one of the words, are subtree nodes, the dictionary holding, as match dictionary information on a sentence composed of at least one of the clauses, rule information representing information on the clauses contained in that sentence; a match profile storage unit that stores match profile information with which a matching condition for checking the relationship between the match dictionary information stored in the match dictionary storage unit and the search key sentence is associated, and which has an evaluation criterion for evaluating the degree of matching with the search key sentence for words that satisfy the matching condition; and a search processing unit that, based on the match profile information, matches the search key sentence against the match dictionary information in accordance with the associated matching condition and, for each sentence that satisfies the matching condition as a result of the matching, calculates a score representing the degree of matching between the search key sentence and the match dictionary information in accordance with the evaluation criterion associated with the match profile information.
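The relationship between a single match dictionary and several match profiles can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the attribute labels ("device", "difficulty", and so on), and the two simplified matching rules are hypothetical and are not taken from the specification. It only shows how the same dictionary entries can yield different results depending on which matching conditions a profile selects.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Word:
    surface: str    # character string of the word
    attribute: str  # hypothetical attribute label


@dataclass(frozen=True)
class Sentence:
    text: str
    words: Tuple[Word, ...]


# A matching condition compares the search key sentence with one dictionary entry.
Condition = Callable[[Sentence, Sentence], bool]


def word_element_match(key: Sentence, entry: Sentence) -> bool:
    """Assumed rule: key and entry share at least one word surface."""
    return bool({w.surface for w in key.words} & {w.surface for w in entry.words})


def attribute_match(key: Sentence, entry: Sentence) -> bool:
    """Assumed rule: every attribute in the key also occurs in the entry."""
    return {w.attribute for w in key.words} <= {w.attribute for w in entry.words}


@dataclass(frozen=True)
class MatchProfile:
    name: str
    conditions: Tuple[Condition, ...]


def search(key: Sentence, dictionary: List[Sentence], profile: MatchProfile) -> List[str]:
    """Return the texts of entries that satisfy all conditions of the profile."""
    return [e.text for e in dictionary if all(c(key, e) for c in profile.conditions)]


PHONE = Word("mobile phone", "device")
DICTIONARY = [
    Sentence("the mobile phone is hard to connect",
             (PHONE, Word("connect", "action"), Word("hard", "difficulty"))),
    Sentence("the mobile phone does not connect",
             (PHONE, Word("connect", "action"), Word("not", "negation"))),
]
KEY = Sentence("mobile phone hard to connect", (PHONE, Word("hard", "difficulty")))

# Two hypothetical profiles applied to the same dictionary.
USAGE = MatchProfile("usage inquiries", (word_element_match,))
OPINION = MatchProfile("improvement requests", (word_element_match, attribute_match))

print(search(KEY, DICTIONARY, USAGE))    # both entries match
print(search(KEY, DICTIONARY, OPINION))  # only the "hard to connect" entry matches
```

Under these assumed rules, the looser profile retrieves both example sentences together, while the stricter profile retrieves only the improvement request, without building a second dictionary.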

In this information search apparatus, the evaluation criterion indicates whether or not points corresponding to the degree of matching are given to a word that satisfies the matching condition, and the search processing unit obtains the score by calculating, for each sentence that satisfies the matching condition, the points given to the words that satisfy the matching condition in accordance with the evaluation criterion.
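As a rough illustration of this scoring rule, the following Python sketch treats the evaluation criterion as a per-condition flag that says whether matched words earn points. The names `MatchedWord`, `degree`, and `award_points`, as well as the point values, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MatchedWord:
    surface: str
    condition: str  # matching condition the word satisfied (hypothetical label)
    degree: int     # hypothetical points reflecting the degree of matching


def score_sentence(matched: List[MatchedWord], award_points: Dict[str, bool]) -> int:
    """Sum the points of matched words whose condition is flagged to earn points."""
    return sum(w.degree for w in matched if award_points.get(w.condition, False))


matched_words = [
    MatchedWord("mobile phone", "word_element", 2),
    MatchedWord("connect", "attribute", 1),
]

# Two hypothetical evaluation criteria applied to the same matched sentence.
criterion_a = {"word_element": True, "attribute": True}
criterion_b = {"word_element": True, "attribute": False}

print(score_sentence(matched_words, criterion_a))  # 3
print(score_sentence(matched_words, criterion_b))  # 2
```

The same matched words therefore produce different sentence scores depending on which conditions the criterion allows to contribute points.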

In this information search apparatus, the match profile storage unit holds a plurality of pieces of the match profile information according to the purposes of searches, each associated with at least one of a plurality of matching conditions having mutually different characteristics.

In this information search apparatus, at least one of word element matching, attribute matching, and dependency matching is associated as the matching condition in the match profile storage unit.

In this information search apparatus, the input unit receives a search target sentence composed of a plurality of words, and the analysis unit analyzes the search target sentence and obtains an analysis result regarding the words constituting the search target sentence. The apparatus further includes a dictionary creation unit that, based on the analysis result, creates the match dictionary information and stores it in the match dictionary storage unit, the match dictionary information being dictionary information on a sentence composed of at least one clause, organized as a tree structure of subtree nodes in which the rule information, including character information on the character string of each word and attribute information representing the attribute of each word, is associated with the clauses each composed of at least one of the words.
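A minimal Python sketch of such a dictionary creation step is given below. It assumes that the analysis result is already available as a list of clauses, each a list of (surface, attribute) pairs, together with the index of the clause that each clause depends on. The parent-child layout by dependency, the names `Rule`, `ClauseNode`, and `build_sentence_tree`, and the attribute labels are all hypothetical; the actual tree layout is the one shown in the drawings.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Rule:
    surface: str    # character information of the word
    attribute: str  # attribute information of the word (hypothetical label)


@dataclass
class ClauseNode:
    rules: List[Rule]
    children: List["ClauseNode"] = field(default_factory=list)


def build_sentence_tree(
    clauses: Sequence[Sequence[Tuple[str, str]]],
    head_of: Sequence[int],
) -> ClauseNode:
    """Build a clause tree; head_of[i] is the clause that clause i depends on, -1 for the root."""
    nodes = [ClauseNode([Rule(s, a) for s, a in clause]) for clause in clauses]
    root = None
    for i, head in enumerate(head_of):
        if head < 0:
            root = nodes[i]
        else:
            nodes[head].children.append(nodes[i])
    if root is None:
        raise ValueError("no root clause (head index -1) was given")
    return root


# Pre-analyzed form of the example sentence, split into two clauses.
# Clause 0 depends on clause 1, which is the root.
clauses = [
    [("携帯電話", "noun"), ("が", "particle")],
    [("つながりにくい", "predicate")],
]
tree = build_sentence_tree(clauses, head_of=[1, -1])

print(tree.rules[0].surface)              # つながりにくい
print(tree.children[0].rules[0].surface)  # 携帯電話
```

Because each node keeps both the surface string and the attribute of its words, the same tree can later be compared by word, by attribute, or by clause relationship.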

To solve the problems described above, in an information search method according to the present invention, an input unit accepts input of a search key sentence composed of a plurality of words; an analysis unit analyzes the search key sentence and obtains an analysis result regarding the words constituting the search key sentence; and a search processing unit reads match profile information from a match profile storage unit, the match profile information being associated with a matching condition for checking the relationship between match dictionary information and the search key sentence and having an evaluation criterion for evaluating the degree of matching with the search key sentence for words that satisfy the matching condition. Using the match dictionary information in a match dictionary storage unit, which stores a dictionary organized as a tree structure in which clauses each composed of at least one of the words are subtree nodes and which holds, as match dictionary information on a sentence composed of at least one of the clauses, rule information representing information on the clauses contained in that sentence, the search processing unit matches the search key sentence against the match dictionary information according to the matching condition associated with the match profile information and, for each sentence that satisfies the matching condition as a result of the matching, calculates a score representing the degree of matching between the search key sentence and the match dictionary information in accordance with the evaluation criterion associated with the match profile information.

The present invention also provides a program that causes a computer to function as: input means to which a search key sentence composed of a plurality of words is input; analysis means that analyzes the search key sentence and obtains an analysis result regarding the words constituting the search key sentence; and search processing means. The search processing means reads match profile information from a match profile storage unit, the match profile information being associated with a matching condition for checking the relationship between match dictionary information and the search key sentence and having an evaluation criterion for evaluating the degree of matching with the search key sentence for words that satisfy the matching condition. Using the match dictionary information in a match dictionary storage unit, which stores a dictionary organized as a tree structure in which clauses each composed of at least one of the words are subtree nodes and which holds, as match dictionary information on a sentence composed of at least one of the clauses, rule information representing information on the clauses contained in that sentence, the search processing means matches the search key sentence against the match dictionary information according to the matching condition associated with the match profile information and, for each sentence that satisfies the matching condition as a result of the matching, calculates a score representing the degree of matching between the search key sentence and the match dictionary information in accordance with the evaluation criterion associated with the match profile information.

In this program, the input means receives a search target sentence composed of a plurality of words, and the analysis means analyzes the search target sentence and obtains an analysis result regarding the words constituting the search target sentence. The program further causes the computer to function as dictionary creation means that, based on the analysis result, creates the match dictionary information and stores it in the match dictionary storage unit, the match dictionary information being dictionary information on a sentence composed of at least one clause, organized as a tree structure of subtree nodes in which the rule information, including character information on the character string of each word and attribute information representing the attribute of each word, is associated with the clauses each composed of at least one of the words.

According to the present invention, searches based on different matching conditions can be performed using a single set of dictionary data.

FIG. 1 is a block diagram showing an example of an information search system according to this embodiment.
FIG. 2 is a block diagram showing an example of a client terminal device according to this embodiment.
FIG. 3 is a block diagram showing an example of a WEB server according to this embodiment.
FIG. 4 is a block diagram showing an example of a Japanese language analysis server according to this embodiment.
FIG. 5 is a schematic diagram showing an example of a match profile stored in the match profile storage unit according to this embodiment.
FIG. 6 is a schematic diagram showing an example of match dictionary data stored in the match dictionary storage unit according to this embodiment.
FIG. 7 is a block diagram showing an example of the Japanese language analysis server according to this embodiment.
FIG. 8 is a schematic diagram showing an example of a structure tree created by the syntax analysis unit according to this embodiment.
FIG. 9 is a schematic diagram for explaining word element matching.
FIG. 10 is a schematic diagram for explaining dependency matching.
FIG. 11 is a schematic diagram for explaining attribute matching.
FIG. 12 is a flowchart showing an example of a method of creating match dictionary data in the information search system according to this embodiment.
FIG. 13 is a flowchart showing an example of a search method in the information search system according to this embodiment.
FIG. 14 is a flowchart explaining in detail an example of the matching processing and scoring processing in the information search system according to this embodiment.
FIG. 15 is a schematic diagram explaining an example of search result data according to this embodiment.
FIG. 16 is a reference diagram for explaining search results according to this embodiment.
FIG. 17 is a flowchart showing an example of search start processing in the information search system according to this embodiment.
FIG. 18 is a flowchart showing an example of a method of displaying search results in the information search system according to this embodiment.
FIG. 19 is a schematic diagram showing an example of an image representing search results displayed on the display unit of the client terminal device according to this embodiment.
FIG. 20 is a schematic diagram showing an example of a screen displayed after a narrowing search is performed on the search results shown in FIG. 19.
FIG. 21 is a reference diagram showing an example of a search key sentence.
FIG. 22 is a reference diagram showing an example of a matched single sentence.
FIG. 23 is a reference diagram for explaining an example of match profile settings.
FIG. 24 is a schematic diagram showing an example of a screen displaying search results obtained based on match profile A.
FIG. 25 is a schematic diagram showing an example of a screen displaying search results obtained based on match profile B.
FIG. 26 is a schematic diagram showing an example of a screen displaying search results obtained based on match profile C.
FIG. 27 is a reference diagram for explaining another example of match profile settings.
FIG. 28 is a schematic diagram showing an example of search results obtained with a specific match mode.
FIG. 29 is a schematic diagram showing another example of search results obtained with a specific match mode.
FIG. 30 is a schematic diagram showing another example of search results obtained with a specific match mode.
FIG. 31 is a schematic diagram showing another example of search results obtained with a specific match mode.

An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing an example of an information search system according to this embodiment.
As shown in FIG. 1, the information search system 1 includes a client terminal device 100, a WEB server 300, a Japanese language analysis server 500, and a database file server 700.

The client terminal device 100 is an information processing device such as a personal computer. It has an input unit to which a search key sentence is input, and it transmits the search key sentence entered by the user through the input unit to the WEB server 300 via a network. Details are described later with reference to FIG. 2.

The WEB server 300 relays communication between the client terminal device 100 and the Japanese language analysis server 500. For example, it transmits a search key sentence received from the client terminal device 100 to the Japanese language analysis server 500, and transmits the search results received from the Japanese language analysis server 500 to the client terminal device 100. Details are described later with reference to FIG. 3.

The Japanese language analysis server 500 includes a search processing unit 501, a match profile storage unit 502, a match dictionary storage unit 503, a dictionary creation unit 504, a document analysis unit 505, and a memory area 506. On receiving a search key sentence from the WEB server 300, the Japanese language analysis server 500 performs a search based on the search key sentence and transmits the search results to the client terminal device 100 via the WEB server 300. Details are described later with reference to FIGS. 4 to 8.

The database file server 700 is a storage unit that stores the information to be searched by the Japanese language analysis server 500 (hereinafter, search target information). It includes a data source 701, a storage device that stores, as text data, information about the company and its products and services, such as inquiry handling histories, information on repair manuals, and opinions and complaints obtained from users by e-mail.
A storage system used as a data warehouse in a company, for example, can serve as the database file server 700.

Next, the client terminal device 100 will be described in detail with reference to FIG. 2. FIG. 2 is a block diagram showing an example of the client terminal device 100 according to this embodiment.
As shown in FIG. 2, the client terminal device 100 includes a browser (display control unit) 101, a display unit 102, an input unit 103, and a communication unit 104.
The display unit 102 is, for example, a liquid crystal display device, and displays display data such as operation screens and search result screens.
The input unit 103 is an input interface consisting of, for example, a keyboard and a mouse, and accepts operation instructions and search key sentences entered by the user.
When the user specifies a type of search service through the input unit 103, the communication unit 104 transmits a request control signal requesting execution of a search by the specified search service to the Japanese language analysis server 500 via the WEB server 300. The communication unit 104 also transmits the search key sentence entered by the user through the input unit 103 to the WEB server 300 via the network.

The browser 101 is a display control unit that receives from the WEB server 300 a program (for example, JavaScript (registered trademark)) for displaying web pages received from the WEB server 300 on the display unit 102, and executes that program. Functioning as this program, the browser 101 generates the display data to be shown by the display unit 102 and outputs it to the display unit 102.
The browser 101 includes a storage unit 111, a data processing unit 112, and a display processing unit 113, each of which is described below.

The storage unit 111 stores the programs, predetermined setting values, and the like used in processing by the data processing unit 112 and the display processing unit 113. The storage unit 111 also stores the search results obtained by the Japanese language analysis server 500 (for example, matched sentences, matched single sentences, matched words, the matching conditions used for these matches, and match information including match position information) and the search rule information used when a narrowing search is executed (for example, a program and setting values for retrieving the narrowing targets from the search results using a matched word specified through the input unit 103 as a search key).

The data processing unit 112 operates as the program on the browser 101 received from the WEB server 300. It converts the display data received from the WEB server 300 into display data for the screen of the display unit 102, and controls the display processing unit 113 so that the display unit 102 displays it. Based on the search results stored in the storage unit 111, the data processing unit 112 also creates result display data in which tag information indicating highlighting is attached to the words that satisfy the matching conditions.
The display processing unit 113 is controlled by the data processing unit 112 and causes the display unit 102 to display the display data converted by the data processing unit 112.

Next, the WEB server 300 will be described in detail with reference to FIG. 3. FIG. 3 is a block diagram showing an example of the WEB server 300 according to this embodiment.
As shown in FIG. 3, the WEB server 300 includes a communication unit 301, a request processing unit 302, a data conversion unit 303, and a storage unit 304.
The communication unit 301 communicates with the client terminal device 100 and the Japanese language analysis server 500 via, for example, a network.
Based on a request control signal received from the client terminal device 100 via the communication unit 301, the request processing unit 302 controls the data conversion unit 303 so as to create the web page data for the display data shown by the display unit 102 of the client terminal device 100. On receiving a request control signal from the client terminal device 100, the request processing unit 302 also creates the code files executed by the client terminal device 100 and the setting data for displaying the display data, and transmits them to the client terminal device 100.

The data conversion unit 303 is controlled by the request processing unit 302 and creates the web page data to be transmitted to the client terminal device 100 based on, for example, the search results received from the Japanese language analysis server 500.
The storage unit 304 temporarily stores the setting data and the like used by the request processing unit 302 and the data conversion unit 303, as well as the search results obtained by searches on the Japanese language analysis server 500.

The WEB server 300 may be, for example, a WEB server that is connected to an in-house LAN (local area network) and provides information suited to the handling of user inquiries by call center operators, or a WEB server that provides data for in-house knowledge sharing. There may thus be a plurality of WEB servers 300 according to the search purposes of the information search system 1.

Next, the Japanese language analysis server 500 will be described in detail with reference to FIG. 4. FIG. 4 is a block diagram showing an example of the Japanese language analysis server 500 according to this embodiment.
As shown in FIG. 4, the search processing unit 501 has a function of executing programs that provide different search services α and β to the client terminal device 100. When it receives from the WEB server 300 a control signal for executing a search by the search service α specified by the user, it starts the program of the search service α and reads the match profile associated with the search service α from the match profile storage unit 502. The search processing unit 501 also reads the match dictionary data stored in the match dictionary storage unit 503, and expands the read match profile and match dictionary data in the memory area 506 to create a dictionary object.

When the search processing unit 501 receives a search key sentence from the client terminal device 100 via the WEB server 300, it outputs the received search key sentence to the document analysis unit 505 and generates, in the memory area 506, an initially empty search result object into which the search results are to be written. This secures a memory area for recording the search results.
Further, in accordance with the match mode predetermined in the match profile read from the match profile storage unit 502, the search processing unit 501 collates the analyzed search key sentence with the match dictionary data expanded into the dictionary object in the memory area 506 (hereinafter referred to as matching), and searches for sentences and the like that satisfy the conditions of the match mode (hereinafter referred to as matching processing).

In accordance with the score mode predetermined in the match profile, the search processing unit 501 also calculates a score that evaluates how well a sentence obtained by matching (hereinafter referred to as a matched sentence) matches the search key sentence (hereinafter referred to as scoring processing). Although details will be described later, the search processing unit 501, for example, calculates the scores of the words contained in a single sentence obtained by matching (hereinafter referred to as a matched single sentence) and sums these word scores to obtain the score of the matched sentence or of the matched single sentence.
Further, the search processing unit 501 associates the sentence ID or the like of each sentence obtained by matching with the score of that sentence, and stores them as a search result in the search result object in the memory area 506. The search processing unit 501 then transmits the search results held in the search result object to the client terminal device 100 via the WEB server 300.
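The scoring and result-storing steps described above can be sketched as follows. This is a minimal illustration only, assuming each matched word already carries a numeric score. The names `score_sentence` and `build_search_result`, the sample sentence IDs, and the descending sort are assumptions made for the example and do not appear in the specification.

```python
from typing import Dict, List, Tuple

def score_sentence(word_scores: List[float]) -> float:
    """Score of a matched sentence = sum of the scores of its matched words."""
    return sum(word_scores)

def build_search_result(matches: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Pair each sentence ID with its score.

    `matches` maps a sentence ID to the scores of its matched words
    (hypothetical input shape). Sorting by score is an assumption.
    """
    result = [(sid, score_sentence(scores)) for sid, scores in matches.items()]
    return sorted(result, key=lambda pair: pair[1], reverse=True)

# Hypothetical matched sentences and word scores.
search_result = build_search_result({"S1": [1.0, 0.5], "S2": [2.0, 1.5, 0.5]})
```

Here `search_result` plays the role of the search result object: a list of (sentence ID, score) pairs ready to be returned to the client.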

The memory area 506 is a storage area for temporarily storing information; for example, it is the area in which the dictionary object and the search result object created by the search processing unit 501 are held.

The match profile storage unit 502 stores, for example, match profiles A, B, ... corresponding to the respective search services α, β, .... Here, a match profile includes match mode information that defines a predetermined match mode and score mode information that defines the score calculation method applied to the results extracted in that match mode. For example, in the match mode definition of match profile A, the search service α and match profile A are associated in advance. Thus, when the user designates a search service, the match profile predetermined for that search service, and the match mode and score mode predetermined in that match profile, are determined.
The present invention is not limited to this. When the client terminal device 100 transmits information representing a combination of search service type, match profile type, and match mode type to the Japanese analysis server 500 via the WEB server 300 together with the request control signal, the combination may instead be determined by the user.

Here, the match profile will be described in detail with reference to FIG. 5. FIG. 5 is a schematic diagram showing an example of a match profile stored in the match profile storage unit 502 according to the present embodiment.
As shown in FIG. 5, match profile A includes a match mode definition PA1 as match mode information and, as score mode information, a relative appearance frequency flag PA2, a relative appearance frequency emphasis coefficient PA3, a sentence appearance position PA4, a search key appearance position PA5, a predicate attribute match coefficient PA6, a dependency match coefficient PA7, a part-of-speech category PA8, a conjunction evaluation PA9, and a synonym match coefficient PA10.
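As an aid to reading FIG. 5, the items PA1 to PA10 can be pictured as one record per match profile, looked up by search service. The sketch below is only one possible layout: the field names, default values, mode names, and the `PROFILES` table are assumptions for illustration, not the data format of the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class MatchProfile:
    # PA1: combination of match modes (mode names are illustrative)
    match_modes: Tuple[str, ...] = ("word_element", "dependency", "attribute")
    use_relative_frequency: bool = True           # PA2
    relative_frequency_emphasis: float = 1.0      # PA3
    weight_by_sentence_position: bool = False     # PA4
    weight_by_search_key_position: bool = False   # PA5
    predicate_attribute_coeff: float = 1.0        # PA6
    dependency_coeff: float = 1.0                 # PA7
    pos_category_weights: Dict[str, float] = field(default_factory=dict)  # PA8
    conjunction_coeff: float = 1.0                # PA9
    synonym_coeff: float = 1.0                    # PA10

# Hypothetical association of search services with match profiles.
PROFILES = {
    "alpha": MatchProfile(synonym_coeff=0.8),
    "beta": MatchProfile(match_modes=("word_element",)),
}

def profile_for(service: str) -> MatchProfile:
    """Return the match profile predetermined for a search service."""
    return PROFILES[service]
```

Designating a service then fixes both the match modes and the scoring settings in one lookup, for example `profile_for("alpha")`.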

The match mode definition PA1 is information on the combination of match modes predetermined for each of the search services α, β, .... Here, a match mode represents a technique used when matching the search key sentence against the match dictionary data. Examples are word element matching, dependency matching, and attribute matching, described later with reference to FIGS. 9 to 11, and the definition is given as a combination of these.

The score mode information includes information on the score conditions and weighting applied to the results obtained in the match mode, and is referred to when calculating the score of a matched sentence or single sentence.
The score mode information includes the relative appearance frequency flag PA2, the relative appearance frequency emphasis coefficient PA3, the sentence appearance position PA4, the search key appearance position PA5, the predicate attribute match coefficient PA6, the dependency match coefficient PA7, the part-of-speech category PA8, the conjunction evaluation PA9, and the synonym match coefficient PA10, and is information on score modes that can be used in any match mode. Here, the match mode information and the score mode information can be combined arbitrarily, regardless of the match mode predetermined in the match mode definition PA1.

The score mode information also indicates, for each item, whether or not a score is to be calculated for a matched word; where an item is set to indicate that the score is calculated, the coefficient or points it awards are specified as setting values. In other words, for the sentences and single sentences matched using the match mode information, the search processing unit 501 can use the score mode information to calculate a score that evaluates the degree of matching between the matched sentence or single sentence and the search key sentence. Here, the score indicates that degree of matching: it is a number for evaluating how similar in meaning the matched sentence is to the search key sentence, based on, for example, whether the sentence structure and dependency relationships match, or whether the predicate attributes match.

The relative appearance frequency flag PA2 indicates whether or not weighting based on the relative appearance frequency (tf × idf) is used. When the flag is on, this weighting is applied; when the flag is off, it is not.
Here, the relative appearance frequency (tf × idf) is a relative value commonly used as a cue for extracting keywords (important words), and is the product of the two coefficients tf and idf described below.
Note that tf (term frequency) is the relative frequency with which a specific word appears in a given sentence, and idf (inverse document frequency) is the reciprocal of the number of sentences that contain the specific word. In other words, the more commonplace a word is, in the sense that it appears in almost every sentence, the smaller its relative appearance frequency becomes. A given sentence is therefore characterized by the words it contains that have a high relative appearance frequency (tf × idf).
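The definition above can be tried out with a few lines of code. The sketch follows the text literally, taking idf as the reciprocal of the number of sentences containing the word; many other systems use a logarithmic form such as log(N/df) instead. The function names and the tiny English corpus are assumptions for illustration.

```python
from typing import List

def tf(word: str, sentence: List[str]) -> float:
    """Relative frequency of `word` within one tokenised sentence."""
    return sentence.count(word) / len(sentence) if sentence else 0.0

def idf(word: str, corpus: List[List[str]]) -> float:
    """Reciprocal of the number of sentences containing `word`."""
    containing = sum(1 for s in corpus if word in s)
    return 1.0 / containing if containing else 0.0

def relative_frequency(word: str, sentence: List[str], corpus: List[List[str]]) -> float:
    """tf x idf for `word` in `sentence`, measured against `corpus`."""
    return tf(word, sentence) * idf(word, corpus)

# Hypothetical corpus of three tokenised sentences.
corpus = [["image", "send", "fail"], ["image", "send", "ok"], ["image", "delete"]]
common = relative_frequency("image", corpus[0], corpus)  # appears in every sentence
rare = relative_frequency("fail", corpus[0], corpus)     # appears in one sentence
```

Because "image" occurs in all three sentences while "fail" occurs in only one, `rare` comes out larger than `common`, which is the behavior the paragraph describes.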

The relative appearance frequency emphasis coefficient PA3 is the coefficient used to weight the tf-value score; the weighting is applied when the relative appearance frequency flag PA2 is on.

The sentence appearance position PA4 is information indicating whether or not weighting is applied according to the position of appearance within a sentence or single sentence. For example, when the sentence appearance position PA4 is set to apply position-dependent weighting, graded coefficients are assigned to the words of a sentence stored in the match dictionary storage unit 503, starting from the words nearest the beginning. Here, the coefficients are set so that words appearing near the beginning are weighted more heavily and words appearing nearer the end of the sentence are weighted more lightly. These coefficients can be set arbitrarily.

The search key appearance position PA5 is information indicating whether or not weighting is applied according to the position of appearance within the search key sentence. For example, when the search key appearance position PA5 is set to apply position-dependent weighting, graded coefficients are assigned to the words and other character strings that matched, according to where they appear in the search key sentence, starting from those nearest the beginning. Here, the coefficients are set so that words appearing near the beginning of the search key sentence are weighted more heavily and words appearing nearer its end are weighted more lightly. These coefficients can be set arbitrarily.
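One way to picture the graded coefficients used for PA4 and PA5 is a linear ramp from the first word to the last. The specification only says that earlier words are weighted more heavily and that the coefficients are freely configurable, so the linear shape, the function name `position_weights`, and the end values 1.0 and 0.5 are assumptions.

```python
from typing import List

def position_weights(n_words: int, first: float = 1.0, last: float = 0.5) -> List[float]:
    """Linearly graded weights: `first` for the first word, `last` for the final word."""
    if n_words <= 0:
        return []
    if n_words == 1:
        return [first]
    step = (first - last) / (n_words - 1)
    return [first - i * step for i in range(n_words)]

# Weights for a five-word sentence (or search key sentence).
weights = position_weights(5)
```

The same helper could serve either item: applied to the stored sentence for PA4, or to the search key sentence for PA5.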

The predicate attribute match coefficient PA6 is information representing the score coefficient of a node when predicate attributes match. For example, it specifies how much additional weight is given to a matched word, or to words in a dependency relationship, when the predicate attribute of the matched word also agrees with that of the corresponding word in the search key sentence. When there is no attribute, as with a simple noun clause, no additional weight is applied. When an attribute such as negation is assigned and it matches, the additional weight is applied as a score coefficient that acts as a multiplier and can be set arbitrarily.

The dependency match coefficient PA7 is information representing the score coefficient applied when a match occurs at the level of a dependency unit. For example, it specifies whether or not weighting is applied to matched words that stand in a dependency relationship with each other (hereinafter referred to as a dependency pair). The score coefficient can be set arbitrarily as a multiplier.

The part-of-speech category PA8 is information representing weighting by part-of-speech category; it indicates whether or not graded coefficients are assigned to each part of speech according to a priority order such as user word > proper noun > common noun > adjective / adjectival verb > verb. The graded coefficient for each part of speech can be set arbitrarily.
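The priority order for PA8 can be turned into a table of coefficients as in the sketch below. Only the ordering comes from the text; the category keys, the top value 2.0, and the step 0.25 are assumptions chosen so that the example is easy to follow.

```python
from typing import Dict

# Priority order from the text; the adjective entry stands for both
# adjectives and adjectival verbs. Key names are illustrative.
POS_PRIORITY = ["user_word", "proper_noun", "common_noun", "adjective", "verb"]

def pos_weights(top: float = 2.0, step: float = 0.25) -> Dict[str, float]:
    """Assign a descending coefficient to each category in priority order."""
    return {pos: top - i * step for i, pos in enumerate(POS_PRIORITY)}

POS_WEIGHTS = pos_weights()
```

With these assumed values a matched user word is weighted 2.0 and a matched verb 1.0, so matches on domain vocabulary dominate the score.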

The conjunction evaluation PA9 is information indicating whether or not the whole of a matched sentence is given more (or less) weight when a sentence or the like matched with the search key sentence contains a specific conjunction. For example, when the conjunction evaluation PA9 is set to emphasize the whole sentence when a specific conjunction is present, it specifies a coefficient by which each word of a matched sentence containing the specific conjunction is multiplied, or a coefficient by which only the matched words in such a sentence are multiplied.

The synonym match coefficient PA10 is information indicating the factor by which the score of a character string is multiplied when that string has undergone character string replacement by the synonym processing unit of the document analysis unit 505, described later. The synonym match coefficient PA10 is used to rank matches against synonyms or near-synonyms lower than matches in which the word itself agrees.
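The multiplier-style coefficients PA6, PA7, and PA10 could be combined on a per-word basis roughly as follows. How the coefficients are actually combined is not stated in this passage, so the plain product, the function name `word_score`, and the default values are assumptions for illustration.

```python
def word_score(base: float, *, predicate_match: bool = False,
               dependency_match: bool = False, via_synonym: bool = False,
               pa6: float = 1.5, pa7: float = 2.0, pa10: float = 0.5) -> float:
    """Apply the PA6, PA7 and PA10 coefficients as multipliers on a base weight."""
    score = base
    if predicate_match:
        score *= pa6    # predicate attribute also matched
    if dependency_match:
        score *= pa7    # word matched as part of a dependency pair
    if via_synonym:
        score *= pa10   # below 1.0 so synonym hits rank under exact hits
    return score

exact_hit = word_score(1.0, predicate_match=True, dependency_match=True)
synonym_hit = word_score(1.0, predicate_match=True, dependency_match=True, via_synonym=True)
```

Choosing a PA10 value below 1.0 is what pushes a synonym match beneath an otherwise identical exact match.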

The match dictionary storage unit 503 stores match dictionary data. The match dictionary data includes, for example, a symbol map MD1 representing the correspondence between symbol IDs and the word information they replace, sentence information MD2 representing information about each sentence, and single sentence information MD3 for the single sentences contained in each sentence.
Here, the match dictionary data will be described in detail with reference to FIG. 6. FIG. 6 is a schematic diagram showing an example of the match dictionary data stored in the match dictionary storage unit 503 according to the present embodiment.

The symbol map MD1 is information that associates the text data of each item of word information with the symbol ID that identifies it. This allows the match dictionary storage unit 503 to store word information in the sentence information MD2 and the single sentence information MD3 not as text data but as the symbol IDs associated with it in the symbol map MD1.
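The idea of replacing word text with symbol IDs is a form of string interning. The sketch below shows one way such a map might behave; the class name `SymbolMap`, its methods, and the use of consecutive integers as IDs are assumptions, since the passage does not specify the ID format.

```python
from typing import Dict, List

class SymbolMap:
    """Bidirectional mapping between word text and integer symbol IDs."""

    def __init__(self) -> None:
        self._id_by_word: Dict[str, int] = {}
        self._word_by_id: List[str] = []

    def intern(self, word: str) -> int:
        """Return the symbol ID for `word`, registering it on first use."""
        if word not in self._id_by_word:
            self._id_by_word[word] = len(self._word_by_id)
            self._word_by_id.append(word)
        return self._id_by_word[word]

    def text(self, symbol_id: int) -> str:
        """Recover the word text for a symbol ID."""
        return self._word_by_id[symbol_id]

symbols = SymbolMap()
encoded = [symbols.intern(w) for w in ["画像", "送信", "画像"]]
```

Repeated words share one ID, so the sentence and single sentence records only need to hold small integers.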

The sentence information MD2 is the registration information needed to register the structure tree of an analyzed sentence (described in detail later) in the match dictionary storage unit 503. It includes a sentence ID 21, the sentence text data 22 of the sentence identified by the sentence ID 21, sentence additional information 23, and a term map 24. The sentence additional information 23 includes information such as the date and time at which the sentence was stored in the database file server 700 as search target information and an address representing its storage location in the data source 701. The term map 24 is information representing the number of times each item of word information appears in the sentence, and associates the number of appearances of each item of word information with its symbol ID.
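A record for the sentence information MD2 might look like the sketch below, with the term map 24 held as a counter keyed by symbol ID. The field names, the sample text, the keys of the additional-information dictionary, and the symbol IDs are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SentenceInfo:
    sentence_id: str                # sentence ID 21
    text: str                       # sentence text data 22
    additional: Dict[str, str]      # sentence additional information 23
    term_map: Counter = field(default_factory=Counter)  # term map 24: symbol ID -> count

info = SentenceInfo(
    sentence_id="S1",
    text="画像を送信。画像を受信。",
    additional={"stored_at": "2009-01-01T00:00", "address": "/example/path"},
)
# Hypothetical symbol IDs for the content words: 画像=0, 送信=1, 受信=2.
info.term_map.update([0, 1, 0, 2])
```

The counts in `term_map` are the kind of per-sentence figures a term-frequency calculation would draw on.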

On the other hand, the single sentence information MD3 includes, for each clause contained in a single sentence, a rule (rule information) 32 representing the information of a subtree node in the structure tree (see FIG. 8; described in detail later), and is given a single sentence ID 31 for identifying each single sentence.
The rule 32 included in the single sentence information MD3 includes, for example, word information 321, a predicate attribute 322, a parent rule ID 323, a weight value 324, a conjunction type 325, a category 326, and a child node presence flag 327.

The word information 321 includes, for example, a symbol ID and position information representing the position of the word information within the single sentence. The word information 321 comprises word information 1, word information 2, ..., word information n, as many items as there are words contained in the single sentence, and each item includes, for example, the symbol ID of the word information and the position information (start position and end position) of the word in the single sentence.
The predicate attribute 322 includes, for example, a word ID, the attribute of the phrase, such as verb or adjective, and an attribute symbol ID representing the meaning of the phrase (negation, negative tendency, desire, affirmation, and so on).

The parent rule ID 323 is information representing the clause of the parent subtree node in a parent-child dependency relationship.
The weight value 324 is, for example, a coefficient that assigns a weight according to the subject or predicate within the sentence. The weight value 324 is also a coefficient or the like that defines the base points awarded when the rule (node) matches in the scoring described later. Normally it is set to a single arbitrary value for the whole dictionary, but weights according to the subject or predicate within a sentence can be assigned when the dictionary is created.
The conjunction type 325 is information that, when the clause (phrase) corresponding to the rule 32 is a conjunction such as "therefore", "so", or "that is", identifies that conjunction.
The category 326 is information representing the type of part of speech, such as verb, noun, adverb, or conjunction.
The child node presence flag 327 is information representing whether there is a clause of a child subtree node in a parent-child dependency relationship; when the flag is on, it indicates that the subtree node is a parent subtree node.
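The items 321 to 327 of a rule can be gathered into a single record, as in the sketch below. The class and field names are assumptions, as is the explicit `rule_id` field (implied by the parent rule ID but not listed in the text); the symbol IDs and character positions in the example are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class WordInfo:
    symbol_id: int
    span: Tuple[int, int]   # start and end position within the single sentence

@dataclass
class Rule:
    rule_id: int                             # assumed identifier of this rule
    words: List[WordInfo]                    # word information 321
    predicate_attribute: Optional[str]       # 322, e.g. "negation"
    parent_rule_id: Optional[int]            # 323, None for the root node
    weight: float = 1.0                      # 324
    conjunction_type: Optional[str] = None   # 325
    category: str = "noun"                   # 326
    has_child: bool = False                  # 327

# Hypothetical rule for a sentence-final clause carrying a negation attribute.
a105 = Rule(rule_id=5,
            words=[WordInfo(10, (18, 20)), WordInfo(11, (20, 28))],
            predicate_attribute="negation",
            parent_rule_id=None,
            has_child=True)
```

Following `parent_rule_id` from any rule leads up toward the root, while `has_child` tells a matcher whether there is anything below the node to descend into.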

Next, the search processing unit 501, the dictionary creation unit 504, and the document analysis unit 505 will be described in detail with reference to FIG. 7. FIG. 7 is a block diagram showing an example of the Japanese analysis server 500 according to the present embodiment.
The dictionary creation unit 504 reads the sentences to be searched (search target information) from the database file server 700, divides the sentences contained in the search target information into single sentences, for example at each full stop, and outputs them to the document analysis unit 505. For example, suppose the text data of sentence A contained in the search target information is 「ＰＣ内にある画像を送信したところ受信できませんでした。また、携帯内にある画像を送信したところ受信できました。」 ("When I sent an image stored on my PC, it could not be received. Also, when I sent an image stored on my mobile phone, it was received."). The text is split at the full stop 「。」 into two single sentences, and single sentence A1 「ＰＣ内にある画像を送信したところ受信できませんでした。」 and single sentence A2 「また、携帯内にある画像を送信したところ受信できました。」 are output to the document analysis unit 505.
When the dictionary creation unit 504 receives the result of the analysis from the document analysis unit 505, it stores the result in the match dictionary storage unit 503.
Note that the dictionary creation unit 504 need only divide the search target information read from the database file server 700 into units of suitable length; for example, it may divide at periods, at individual bullet-list items, or at units judged to be a single sentence on the basis of spaces or line breaks.
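The division into single sentences at the full stop can be sketched with a regular expression. This is a minimal illustration assuming the delimiter stays attached to its sentence; the function name `split_single_sentences` and the `delimiters` parameter are hypothetical, and other delimiters such as periods or line breaks could be passed in the same way.

```python
import re
from typing import List

def split_single_sentences(text: str, delimiters: str = "。") -> List[str]:
    """Split after each delimiter character, keeping the delimiter with its sentence."""
    cls = re.escape(delimiters)
    pattern = "[^" + cls + "]+[" + cls + "]?"
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]

sentence_a = ("ＰＣ内にある画像を送信したところ受信できませんでした。"
              "また、携帯内にある画像を送信したところ受信できました。")
singles = split_single_sentences(sentence_a)
```

For sentence A this yields the two single sentences A1 and A2 in order.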

When the dictionary creation unit 504 creates match dictionary data, the document analysis unit 505 receives from it, for example, the search target information divided into single sentences, performs document analysis, and outputs the analysis result to the dictionary creation unit 504.
In addition, when a request control signal for executing a search with a specific search service is received from the client terminal device 100, the document analysis unit 505 performs document analysis on the search key sentence received from the search processing unit 501 and outputs the analysis result to the search processing unit 501. The document analysis unit 505 is described in detail below.

The document analysis unit 505 includes a dictionary unit 550 and an analysis unit 551.
The dictionary unit 550 includes a system dictionary 5501, a user dictionary 5502, and a synonym dictionary 5503. The analysis unit 551 includes a morphological analysis unit 5511, a syntax analysis unit 5512, and a synonym processing unit 5513.

The system dictionary 5501 is, for example, dictionary data in which words, as the smallest units that carry meaning as morphemes, are associated with their meanings, parts of speech, attribute information, and the like.
The user dictionary 5502 is dictionary data added to the system dictionary 5501 by, for example, an administrator who uses the Japanese analysis server 500.
The synonym dictionary 5503 is dictionary data that associates words with their synonyms and near-synonyms so that multiple synonyms can be replaced by a single word; for example, the word information of the match dictionary data in the match dictionary storage unit 503 is associated with its synonyms.

The morphological analysis unit 5511 receives, for example, the search target information divided into single sentences by the dictionary creation unit 504, and breaks the sentences to be searched down into a plurality of morphemes (word information). For example, when sentence A is input, the morphological analysis unit 5511 breaks the single sentence A1 contained in sentence A down into the morphemes 「ＰＣ」「内」「に」「ある」「画像」「を」「送信」「した」「ところ」「受信」「できませんでした」 (roughly "PC", "inside", a locative particle, "exist", "image", an object particle, "send", "did", "when", "receive", "could not").
In this way the morphological analysis unit 5511 can break search target information down into morphemes when match dictionary data is created. It is not limited to this, however: when a search is performed by entering a search key sentence, it receives the search key sentence, divided into single sentences by the search processing unit 501, and breaks it down into morphemes.

The morphological analysis unit 5511 also refers to the system dictionary 5501 and the user dictionary 5502 to look up the parts of speech of the decomposed morphemes and, based on the part-of-speech information obtained, creates clauses that reflect the dependency relationships in the sentence and the meanings of the morphemes. For example, based on the morphemes decomposed from single sentence A1, the syntax analysis unit 5512 creates clause A101 「ＰＣ内に」 ("in the PC"), clause A102 「ある」 ("existing"), clause A103 「画像を」 ("image", as object), clause A104 「送信したところ」 ("when sent"), and clause A105 「受信できませんでした」 ("could not be received"). Here, a clause is a unit of character string containing at least one word. A single sentence is a unit of character string that contains at least one clause and forms one sentence, delimited by, for example, a full stop. A sentence, in the sense of a passage of text, is a unit made up of a plurality of single sentences. Where a sentence is referred to without qualification, the term covers both single sentences and such multi-sentence passages.

Further, the morphological analysis unit 5511 refers to the dictionary data stored in the system dictionary 5501 and the user dictionary 5502 to look up the part-of-speech category (for example, verb, noun, adverb, conjunction), the type of conjunction (for example, "therefore", "so", "such as"), and the attribute representing the meaning of a phrase such as a verb or adjective (for example, negation, negative tendency, desire, affirmation), and attaches the information obtained to the morphemes and clauses.
For example, the morphological analysis unit 5511 analyzes clause A105 「受信できません」 ("cannot receive"), obtains as the analysis result that its part of speech is "noun (sa-hen connection)" and that the meaning of the phrase is "negation", and attaches this result to clause A105.

Based on the information analyzed by the morphological analysis unit 5511, the syntax analysis unit 5512 evaluates the parts of speech, meanings, attribute information, positions within the sentence, ordering, and so on of the clauses that make up the sentence, analyzes the dependency relationships between the clauses, and outputs the analysis result to the synonym processing unit 5513.

Further, when it analyzes search target information, the syntax analysis unit 5512 assigns a sentence ID for identifying each sentence and generates the registration information needed to register (store) the sentence in the match dictionary storage unit 503. Using the analysis results for the word information, clauses, and so on, the syntax analysis unit 5512 also creates a structure tree such as that shown in FIG. 8 on the basis of the dependency relationships between clauses, generates information representing the rule for each subtree node, and outputs it to the synonym processing unit 5513.
A rule is the information associated with each subtree node of the structure tree shown in FIG. 8 and, as shown in FIG. 6, includes word information 321, a predicate attribute 322, a parent rule ID 323, a weight value 324, a conjunction type 325, a category 326, a child node presence flag 327, and the like.

Here, the structural tree created by the syntax analysis unit 5512 will be described with reference to FIG. 8. FIG. 8 is a schematic diagram showing an example of a structural tree created by the syntax analysis unit 5512.
As shown in FIG. 8, a rule corresponding to a subtree node is created for each phrase segmented by the morpheme analysis unit 5511. The structural tree composed of the subtree nodes is built from the dependency relationships that follow from the context of the sentence.

The synonym processing unit 5513 refers to the synonym dictionary 5503 to check whether any of the decomposed morphemes or phrases has a synonym to which it should be unified and, if so, replaces it with the synonym retrieved from the synonym dictionary 5503.
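
The synonym unification described above amounts to a dictionary lookup with a fallback to the original word. The following is a minimal sketch; the dictionary contents and the function name `unify_synonyms` are hypothetical and only stand in for the synonym dictionary 5503.

```python
# Hypothetical synonym dictionary: variant form -> unified form.
SYNONYMS = {"携帯": "携帯電話", "ケータイ": "携帯電話", "パソコン": "PC"}


def unify_synonyms(words, synonyms=SYNONYMS):
    """Replace each word that has a unified form; leave other words unchanged."""
    return [synonyms.get(word, word) for word in words]
```

Normalizing both the registered text and the search key sentence with the same dictionary lets later word comparisons treat the variants as one word.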

Here, when the dictionary creation unit 504 reads the search target information from the database file server 700 and outputs it, divided into simple sentences, to the document analysis unit 505, the document analysis unit 505 performs the document analysis described above and outputs the analysis result to the dictionary creation unit 504.
The dictionary creation unit 504 receives the analysis result from the document analysis unit 505 and obtains the information needed for registration in the match dictionary storage unit 503, namely the information that serves as the rules of the phrases constituting each simple sentence: for example, word information, symbol IDs, the predicate attribute of each piece of word information, the parent rule ID and child node presence flag that represent the links between rules (subtree nodes), the weight value, the conjunction type, and the category.

The dictionary creation unit 504 also checks the symbol map MD1 read from the match dictionary storage unit 503 and replaces word information that is used uniformly within the match dictionary storage unit 503 with its symbol ID. The dictionary creation unit 504 further creates sentence information MD2, which includes the sentence ID, sentence text, additional sentence information, a term map, and the like, and word information MD3, which includes a word ID 31 and a rule 32, thereby producing the data for match dictionary registration. The dictionary creation unit 504 then adds this registration data to the match dictionary data in the match dictionary storage unit 503.
If the analysis result received from the document analysis unit 505 contains word information that has no corresponding symbol ID in the symbol map MD1, the dictionary creation unit 504 assigns a new symbol ID to that word information and adds the correspondence between the word information and the new symbol ID to the symbol map MD1.
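
The symbol map behaviour described above (reuse an existing symbol ID, otherwise assign a new one and record it) can be sketched as follows. The class name `SymbolMap` and the choice of sequential integers as symbol IDs are assumptions; the source does not state how symbol IDs are generated.

```python
class SymbolMap:
    """Maps word information to symbol IDs, assigning new IDs on demand."""

    def __init__(self):
        self._ids = {}

    def symbol_id(self, word):
        # Reuse the registered ID, or assign the next unused integer.
        if word not in self._ids:
            self._ids[word] = len(self._ids) + 1
        return self._ids[word]

    def encode(self, words):
        """Replace every word with its symbol ID."""
        return [self.symbol_id(word) for word in words]
```

For example, encoding ["サッカー", "観戦"] and then ["観戦", "ツアー"] with the same map reuses the ID of "観戦" and assigns a fresh ID only to "ツアー".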

Next, the match modes executed by the search processing unit 501 will be described in detail with reference to FIGS. 9 to 11. FIG. 9 is a schematic diagram for explaining word element matching. FIG. 10 is a schematic diagram for explaining dependency matching. FIG. 11 is a schematic diagram for explaining attribute matching.

As shown in FIGS. 9(a) to 9(c), word element matching has three types: the intersection type, the full-set type, and the subset type. In word element matching, the search processing unit 501 compares a rule 32 of the match dictionary data stored in the match dictionary storage unit 503 with the character string (for example, a phrase) in the search key sentence that corresponds to the rule 32. For word element matching, the search processing unit 501 executes whichever one of the intersection type, the full-set type, and the subset type is specified in the match mode definition PA1 of the match profile.

Here, the intersection type is a match mode in which, when at least part of the character string of a rule in the match dictionary data of the match dictionary storage unit 503 (the phrase corresponding to a subtree node) matches at least part of a character string contained in the search key sentence (the phrase corresponding to a subtree node), the matching character string is obtained as a word that satisfies the match mode condition. If no word matches even partially, the result is that no simple sentence satisfying the match mode condition was obtained.

For example, as shown in FIG. 9(a), suppose the phrase "サッカー観戦" (watching soccer) corresponding to a rule in the match dictionary data of the match dictionary storage unit 503 is compared with each of the phrases "サッカー少年" (soccer boy), "観戦ツアー" (spectator tour), and "戦" (battle) contained in the search key sentence. The phrase "サッカー少年" contains the word "サッカー" (soccer), which matches part of the rule phrase "サッカー観戦". In this case, the rule phrase "サッカー観戦" satisfies the match mode condition.
Likewise, the phrase "観戦ツアー" contained in the search key sentence satisfies the match mode condition, because the word "観戦" (watching a game), which is part of it, matches part of the rule phrase "サッカー観戦".
When the match mode condition is satisfied in this way, the rule phrase "サッカー観戦" is obtained as the result of the matching process. The search processing unit 501 also detects the word that satisfies the match mode condition as the matched word.

On the other hand, the phrase "戦" (battle) contained in the search key sentence does not satisfy the match mode condition, because no part of it matches when compared in units of the words contained in the rule phrase "サッカー観戦" (watching soccer). The result of the matching process is therefore that no simple sentence satisfying the match mode condition was obtained. Although "戦" is a character contained in the word "観戦" (watching a game) of the phrase "サッカー観戦", "戦" and "観戦" are different character strings (words) when compared word by word, so they are judged here not to match.
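
The intersection type can be sketched as a word-level set intersection. The sketch assumes that each phrase has already been split into words by morphological analysis; the function name `intersection_match` is hypothetical.

```python
def intersection_match(rule_words, key_words):
    """Return the rule words that also occur in the search key phrase."""
    key_set = set(key_words)
    return [word for word in rule_words if word in key_set]


rule = ["サッカー", "観戦"]                            # rule phrase "サッカー観戦"
print(intersection_match(rule, ["サッカー", "少年"]))  # ['サッカー']
print(intersection_match(rule, ["観戦", "ツアー"]))    # ['観戦']
print(intersection_match(rule, ["戦"]))                # []
```

Because the comparison is made on whole words rather than on characters, "戦" does not match "観戦" even though it is a substring of it.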

Next, the full-set type will be described with reference to FIG. 9(b).
The full-set type is a match mode in which, when the character string of a rule in the match dictionary data of the match dictionary storage unit 503 (the phrase corresponding to a subtree node) matches a character string contained in the search key sentence (the phrase corresponding to a subtree node) in its entirety, the matched character string is obtained as a word satisfying the match mode condition (a matched word). Unlike the intersection type described above, even if part of a phrase (even a single word) matches, the result is that no simple sentence satisfying the match mode condition was obtained unless the entire character string of the phrase matches.
For example, as shown in FIG. 9(b), suppose the phrase "サッカー観戦" (watching soccer) corresponding to a rule in the match dictionary data of the match dictionary storage unit 503 is compared with the phrases "サッカー観戦" and "観戦" (watching a game) contained in the search key sentence. For the phrase "サッカー観戦", the rule phrase and the search key phrase match completely, so the match mode condition is satisfied and the phrase "サッカー観戦" (the matched phrase) is obtained as the result of the matching process.
On the other hand, the search key phrase "観戦" matches part of the rule phrase "サッカー観戦" but not its entire character string, so it does not satisfy the match mode condition, and the result of the matching process is that no simple sentence satisfying the match mode condition was obtained.
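
A minimal sketch of the full-set type, assuming that phrases are represented as lists of words and that "matching in its entirety" means the two word sequences are identical; the function name `full_set_match` is hypothetical.

```python
def full_set_match(rule_words, key_words):
    """True only when the rule phrase and the search key phrase are identical."""
    return list(rule_words) == list(key_words)


rule = ["サッカー", "観戦"]                         # rule phrase "サッカー観戦"
print(full_set_match(rule, ["サッカー", "観戦"]))   # True
print(full_set_match(rule, ["観戦"]))               # False
```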

Next, the subset type will be described with reference to FIG. 9(c).
The subset type is a match mode in which, when the character string of a rule in the match dictionary data of the match dictionary storage unit 503 (the phrase corresponding to a subtree node) matches completely with part of a character string contained in the search key sentence (the phrase corresponding to a subtree node), the matching character string is obtained as a word that satisfies the match mode condition. Unlike the intersection type described above, the match mode condition is satisfied only if the search key phrase contains at least the whole of the rule phrase, in which case the matching word or phrase is obtained as the matched word or matched phrase. Conversely, if the phrase contained in the search key sentence does not contain the whole of the rule phrase, the result is that no word information satisfying the match mode condition was obtained.

For example, as shown in FIG. 9(c), suppose the phrase "サッカー観戦" (watching soccer) corresponding to a rule in the match dictionary data of the match dictionary storage unit 503 is compared with the phrases "サッカー観戦ツアー" (soccer spectator tour) and "観戦ツアー" (spectator tour) contained in the search key sentence. The rule phrase "サッカー観戦" is part of the search key phrase "サッカー観戦ツアー", and every word constituting the rule phrase matches a word contained in the search key phrase, so the match mode condition is satisfied. When the match mode condition is satisfied in this way, the rule phrase "サッカー観戦" (the matched phrase) is obtained as the result of the matching process.
On the other hand, for the search key phrase "観戦ツアー", the word "観戦" (watching a game), which is part of the rule phrase "サッカー観戦", matches, but the rule phrase as a whole does not match part of the search key phrase. The match mode condition is therefore not satisfied, and the result of the matching process is that no simple sentence satisfying the match mode condition was obtained.
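
A minimal sketch of the subset type. It assumes phrases are lists of words and reads "contains the whole of the rule phrase" as set containment of the rule words in the search key words; a stricter reading that also requires the words to be contiguous and in order is equally compatible with the text. The function name `subset_match` is hypothetical.

```python
def subset_match(rule_words, key_words):
    """True when every word of the rule phrase occurs in the search key phrase."""
    return set(rule_words) <= set(key_words)


rule = ["サッカー", "観戦"]                                  # rule phrase "サッカー観戦"
print(subset_match(rule, ["サッカー", "観戦", "ツアー"]))    # True
print(subset_match(rule, ["観戦", "ツアー"]))                # False
```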

Next, examples of dependency matching will be described with reference to FIGS. 10(a) and 10(b).
As shown in FIGS. 10(a) and 10(b), dependency matching has two types: the node parent-child relationship type, which extracts items that have a dependency relationship, and the node-only type, which does not evaluate dependencies. In dependency matching, the search processing unit 501 compares the dependency relationships of the character strings of the rules 32 of the match dictionary data stored in the match dictionary storage unit 503 (the phrases corresponding to subtree nodes) with the dependency relationships of the character strings contained in the search key sentence (the phrases corresponding to subtree nodes). For dependency matching, the search processing unit 501 executes whichever one of the node parent-child relationship type and the node-only type is specified in the match mode definition PA1 of the match profile.

Here, the node parent-child relationship type is a condition on the dependency relationships between words matched by the intersection type of word element matching. It is a match mode in which, when the parent-child relationship of the character strings of rules in the match dictionary data of the match dictionary storage unit 503 (the phrases corresponding to subtree nodes) matches the parent-child relationship of the matched words (the phrases corresponding to subtree nodes) obtained, by the intersection type of word element matching, from the character strings contained in the search key sentence, the matching character strings are obtained as words satisfying the match mode condition. If no character strings have a matching parent-child relationship, the result is that no simple sentence satisfying the match mode condition was obtained.

For example, as shown in FIG. 10(a), when, in a parent-child relationship of the match dictionary data of the match dictionary storage unit 503, the phrase corresponding to the parent rule is "行く" (go) and the phrase corresponding to the child rule is "サッカー観戦" (watching soccer), the following two patterns match this parent-child relationship: the pattern in which the phrase corresponding to the parent rule is "行く" and the phrase corresponding to the child rule is "サッカー" (soccer), and the pattern in which the phrase corresponding to the parent rule is "行く" and the phrase corresponding to the child rule is "観戦" (watching a game).
Accordingly, a parent-child relationship "(サッカー)-(行く)" among the character strings contained in the search key sentence, in which the phrase corresponding to the parent rule is "行く" and the phrase corresponding to the child rule is "サッカー", satisfies the match mode condition. When the match mode condition is satisfied in this way, the parent-child relationship "サッカー観戦"-"行く" of the rule phrases is obtained as the result of the matching process.
On the other hand, a search key sentence in which the phrase corresponding to the child rule is "サッカー観戦" but there is no phrase corresponding to the parent rule, or in which the phrase corresponding to the child rule is "行く" and there is no phrase corresponding to the parent rule, does not satisfy the match mode condition.
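
A minimal sketch of the node parent-child relationship type, assuming each dependency is given as a (child words, parent words) pair, with None standing for a missing parent, and that nodes are compared with the intersection type. The names `shares_word` and `parent_child_match` are hypothetical.

```python
def shares_word(a, b):
    """Intersection-type comparison: the two phrases share at least one word."""
    return bool(set(a) & set(b))


def parent_child_match(rule_edge, key_edges):
    """True if some search key dependency matches both ends of the rule dependency."""
    rule_child, rule_parent = rule_edge
    return any(
        parent is not None
        and shares_word(rule_child, child)
        and shares_word(rule_parent, parent)
        for child, parent in key_edges
    )


# Rule: child "サッカー観戦" depends on parent "行く".
rule_edge = (["サッカー", "観戦"], ["行く"])
print(parent_child_match(rule_edge, [(["サッカー"], ["行く"])]))        # True
print(parent_child_match(rule_edge, [(["サッカー", "観戦"], None)]))    # False
```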

Next, the node-only type will be described with reference to FIG. 10(b).
The node-only type is a match mode in which, when at least one of the parent node and the child node matches between the parent-child relationship of the character strings of rules in the match dictionary data of the match dictionary storage unit 503 (the phrases corresponding to subtree nodes) and the parent-child relationship of the character strings contained in the search key sentence (the phrases corresponding to subtree nodes), the matching character string is obtained as a word satisfying the match mode condition. In other words, the dependency is not actually evaluated. The character strings within a node are compared according to the configured type of word element matching; if neither the parent node nor the child node has a matching character string, the result is that no simple sentence satisfying the match mode condition was obtained.

For example, as shown in FIG. 10(b), when the parent-child relationship in the match dictionary data of the match dictionary storage unit 503 has the phrase "サッカー観戦" (watching soccer) for the parent node and the phrase "行く" (go) for the child node, a search key sentence containing "サッカー" (soccer) as the phrase corresponding to the parent node satisfies the match mode condition, and the parent-child relationship "サッカー観戦"-"行く" of the rule phrases is obtained as the result of the matching process.
A search key sentence containing "行く" as the phrase corresponding to the child node likewise satisfies the matching condition. On the other hand, a search key sentence containing "する" (do) as the phrase corresponding to the child node does not satisfy the match mode condition, because neither the parent node nor the child node has a matching character string.
For example, the match mode definition PA1 of the match profile specifies in advance whether dependency matching is performed and which one of the types described above is used.
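
A minimal sketch of the node-only type, assuming that nodes are lists of words and that the configured word element matching is the intersection type. The function name `node_only_match` is hypothetical.

```python
def node_only_match(rule_nodes, key_phrases):
    """True if any rule node shares a word with any search key phrase.

    The dependency between the nodes is deliberately ignored.
    """
    return any(
        set(node) & set(phrase)
        for node in rule_nodes
        for phrase in key_phrases
    )


rule_nodes = [["サッカー", "観戦"], ["行く"]]           # parent node and child node
print(node_only_match(rule_nodes, [["サッカー"]]))       # True
print(node_only_match(rule_nodes, [["行く"]]))           # True
print(node_only_match(rule_nodes, [["する"]]))           # False
```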

Next, attribute matching will be described. Attribute matching has a sentence attribute match type and, as with dependency matching, a word match type that does not actually evaluate whether attributes match.
FIG. 11 shows the sentence attribute match type.
In attribute matching, the search processing unit 501 compares the attribute of the character string of a rule in the match dictionary data stored in the match dictionary storage unit 503 (the phrase corresponding to a subtree node) with the attribute of the corresponding character string in the search key sentence (the phrase corresponding to a subtree node).
Here, the sentence attribute match type is a match mode in which, when at least part of the phrase of a rule in the match dictionary data of the match dictionary storage unit 503 matches at least part (a word) of a phrase contained in the search key sentence and the attributes of the matching parts also match, the matching character string is obtained as a word satisfying the match mode condition. Even if the character strings match, if the attributes differ, the result is that no simple sentence satisfying the match mode condition was obtained.

For example, when, in a rule 32 of the match dictionary data of the match dictionary storage unit 503, the predicate attribute of the rule phrase "サッカー観戦" (watching soccer) is "negation" and the predicate attribute of the phrase "観戦" (watching a game) contained in the search key sentence is "negation", the match mode condition is satisfied. That is, "サッカー観戦しない" (do not watch soccer) is decomposed into "サッカー (noun)" + "観戦 (noun)" + "しない (auxiliary verb)", and the attribute of "しない (auxiliary verb)" is "negation". The predicate attribute of "サッカー観戦" therefore becomes "negation", and the match mode condition is satisfied.
On the other hand, when the predicate attribute of the phrase "サッカー観戦" contained in the search key sentence is "possibility", the character strings of the phrase "サッカー観戦" match but the attributes differ, so the match mode condition is not satisfied. That is, "サッカー観戦できる" (can watch soccer) is decomposed into "サッカー (noun)" + "観戦 (noun)" + "できる (auxiliary verb)", and the attribute of "できる (auxiliary verb)" is "possibility". The predicate attribute of "サッカー観戦" therefore becomes "possibility", and the match mode condition is not satisfied.
The "attribute" referred to here is the semantic information of an auxiliary verb, such as negation, question, or possibility. For example, the word "使えない" (cannot use) is decomposed into "使う (verb)" + "ない (auxiliary verb)", and the attribute of "ない (auxiliary verb)" is "negation".
In attribute matching, a match between items that both have no attribute, such as nouns, is also judged to satisfy the match mode condition.
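
A minimal sketch of the sentence attribute match type, assuming that each phrase carries a list of words and one predicate attribute, with None standing for "no attribute". The function name `attribute_match` and the English attribute labels are hypothetical.

```python
def attribute_match(rule_words, rule_attr, key_words, key_attr):
    """Words must overlap and the predicate attributes must be equal.

    Two phrases that both have no attribute (None) also count as a match.
    """
    return bool(set(rule_words) & set(key_words)) and rule_attr == key_attr


rule = ["サッカー", "観戦"]
print(attribute_match(rule, "negation", ["観戦"], "negation"))                  # True
print(attribute_match(rule, "negation", ["サッカー", "観戦"], "possibility"))   # False
print(attribute_match(rule, None, ["サッカー"], None))                          # True
```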

Next, an example of a method for creating match dictionary data in the information search system 1 according to the present embodiment will be described with reference to FIG. 12. FIG. 12 is a flowchart showing an example of a method for creating match dictionary data in the information search system 1 according to the present embodiment.
As shown in FIG. 12, when creation of match dictionary data is instructed, for example from an operation unit (not shown) of the Japanese analysis server 500, the dictionary creation unit 504 reads the text to be searched from the data source 701 of the database file server 700, divides the text at full stops and the like into simple sentences, and outputs them to the document analysis unit 505. For example, when the text data of sentence A to be searched is "ＰＣ内にある画像を送信したところ受信できませんでした。また、携帯内にある画像を送信したところ受信できました。" ("When I sent an image stored on my PC, it could not be received. Also, when I sent an image stored on my mobile phone, it was received."), the dictionary creation unit 504 divides it at the full stops "。" into two simple sentences and outputs simple sentence A1 "ＰＣ内にある画像を送信したところ受信できませんでした。" and simple sentence A2 "また、携帯内にある画像を送信したところ受信できました。" to the document analysis unit 505 (step ST1).
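
Step ST1 can be sketched as a split after each full stop "。". This is a minimal sketch that handles only "。"; the source also mentions other delimiters ("full stops and the like"), which are not specified and are therefore not handled here. The function name `split_simple_sentences` is hypothetical.

```python
def split_simple_sentences(text):
    """Split text after each full stop "。", keeping the full stop."""
    sentences, current = [], ""
    for ch in text:
        current += ch
        if ch == "。":
            sentences.append(current)
            current = ""
    if current.strip():
        sentences.append(current)  # trailing text without a full stop
    return sentences


text_a = ("ＰＣ内にある画像を送信したところ受信できませんでした。"
          "また、携帯内にある画像を送信したところ受信できました。")
a1, a2 = split_simple_sentences(text_a)
```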

Upon receiving the simple sentence A1, the morpheme analysis unit 5511 of the document analysis unit 505 separates the simple sentence A1 into morphemes (for example, in units of words), decomposing the text to be searched into a plurality of morphemes. For example, when sentence A is input, the morpheme analysis unit 5511 decomposes the simple sentence A1 contained in sentence A into the words "ＰＣ", "内", "に", "ある", "画像", "を", "送信", "した", "ところ", "受信", and "できませんでした".
The morpheme analysis unit 5511 then refers to the system dictionary 5501 and the user dictionary 5502 to analyze the part of speech, attributes, meaning, and the like of the decomposed morphemes, and obtains these as the analysis result (step ST2).

Next, the syntax analysis unit 5512 combines one or more morphemes to create the phrases corresponding to the subtree nodes of the structural tree. The example described here uses, as the character string corresponding to a subtree node, a single phrase (bunsetsu), which is one of the constituents of a sentence and is the smallest unit obtained when the sentence is divided only so far as it still sounds natural as actual language. However, the present invention is not limited to this.

Then, based on the analysis result from the morpheme analysis unit 5511, the syntax analysis unit 5512 evaluates the part of speech, meaning, and attribute information of the words constituting the sentence, as well as their positions and order within the sentence, analyzes the dependency relationships between the phrases in the sentence, and obtains as the analysis result the dependency relationships between phrases, the positions at which words appear, the sentence components (subject, predicate, and so on) within the sentence, and the like. The syntax analysis unit 5512 also assigns to each simple sentence a simple sentence ID for identifying it.
Next, based on the analysis result, the syntax analysis unit 5512 creates a structural tree whose subtree nodes are the phrases, and outputs the analysis result to the synonym processing unit 5513 (step ST3).

The synonym processing unit 5513 refers to the synonym dictionary 5503 to check whether any decomposed word has a synonym or equivalent term to which it should be unified and, if so, replaces the word with the synonym or the like retrieved from the synonym dictionary 5503 (step ST4). The synonym processing unit 5513 then outputs the analysis result produced by the analysis unit 551 to the dictionary creation unit 504.

Upon receiving the analysis result, the dictionary creation unit 504 obtains from it, as the rule for each phrase, the information needed for registration in the match dictionary storage unit 503: for example, the text data of the word information, the predicate attribute of each piece of word information, the parent rule ID and child node presence flag that represent the links between rules (subtree nodes), the weight value, the conjunction type, and the category. Then, for each simple sentence composed of these phrases, the dictionary creation unit 504 combines the rules of the phrases with the simple sentence ID to create registration data that can be registered as the simple sentence information MD3 of the match dictionary data.

The dictionary creation unit 504 also reads the symbol map MD1 from the match dictionary storage unit 503 and checks whether the text to be searched contains word information that is used uniformly within the match dictionary storage unit 503; if the text contains a word identical to such word information, the dictionary creation unit 504 replaces the word with its symbol ID. If the symbol map MD1 contains no identical word information to replace, the dictionary creation unit 504 assigns a new symbol ID to the word information.
The dictionary creation unit 504 then creates the match dictionary registration data based on the sentence information (the information needed to register the sentence), which includes the sentence ID, sentence text, additional sentence information, a term map, and the like, and on the simple sentence information (step ST5).

Next, the dictionary creation unit 504 writes the created match dictionary registration data to the match dictionary storage unit 503, registering the analysis result as match dictionary data (step ST6).

Next, an example of a search method based on the match dictionary data in the information search system 1 according to the present embodiment will be described with reference to FIG. 13. FIG. 13 is a flowchart showing an example of a search method in the information search system 1 according to the present embodiment.
As shown in FIG. 13, for example, when the user specifies a search service α from the input unit 103 of the client terminal device 100, the client terminal device 100 transmits a request control signal for executing a search with the specified search service α to the Japanese analysis server 500 via the WEB server 300.

日本語解析サーバ５００は、この検索リクエスト制御信号を受信すると、検索処理部５０１が検索サービスαのプログラムを起動させ、検索サービスαと関連付けられているマッチプロファイルをマッチプロファイル記憶部５０２から読み出す。ここでは、マッチプロファイルＡのマッチモード定義において、予め検索サービスαとマッチプロファイルＡとが関連づけられているため、検索プログラムαが起動されることにより、検索処理部５０１がマッチプロファイルＡを読み出す。また、検索処理部５０１は、マッチ辞書記憶部５０３に記憶されているマッチ辞書データを読み出す（ステップＳＴ１０）。 When the Japanese analysis server 500 receives this search request control signal, the search processing unit 501 activates the search service α program and reads the match profile associated with the search service α from the match profile storage unit 502. Here, since the search service α and the match profile A are associated in advance in the match mode definition of the match profile A, the search processing unit 501 reads the match profile A when the search program α is started. Further, the search processing unit 501 reads the match dictionary data stored in the match dictionary storage unit 503 (step ST10).

そして、検索処理部５０１は、例えば、読み出したマッチプロファイルＡおよびマッチ辞書データとを、メモリ領域５０６に展開して辞書オブジェクトを作成する（ステップＳＴ１１）。なお、検索処理部５０１は、マッチプロファイルＡから、各文章に付与された用語マップ２４を読み出し、単語情報の出現頻度情報を計算し、メモリ領域５０６に一時的に記憶させておいてもよい。このように、辞書オブジェクトを作成するメモリ展開時に、予め各単語情報の出現頻度情報を得て置くことにより、マッチング処理の際に、単語情報の出現頻度情報を計算する処理負荷が軽減される。 Then, for example, the search processing unit 501 expands the read match profile A and match dictionary data in the memory area 506 to create a dictionary object (step ST11). Note that the search processing unit 501 may read the term map 24 attached to each sentence from the match profile A, calculate the appearance frequency information of the word information, and temporarily store it in the memory area 506. As described above, by obtaining the appearance frequency information of each word information in advance when expanding the memory for creating the dictionary object, the processing load for calculating the appearance frequency information of the word information is reduced during the matching process.
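Precomputing the appearance frequency when the dictionary object is expanded in memory could look like the following Python sketch. It assumes, purely for illustration, that each term map is a list of symbol IDs keyed by sentence ID; the specification does not define the layout, and `build_frequency_table` is a hypothetical name.

```python
from collections import Counter


def build_frequency_table(term_maps):
    """Count how often each symbol ID occurs across all registered sentences."""
    frequency = Counter()
    for term_map in term_maps.values():
        frequency.update(term_map)
    return frequency


# Hypothetical term maps: sentence ID -> symbol IDs of the words in that sentence.
term_maps = {"001": [1, 2, 1], "002": [2, 3]}
frequency = build_frequency_table(term_maps)
```

Because the table is built once at load time, the matching process only needs a lookup per word rather than a recount.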

ここで、クライアント端末装置１００の入力部１０３から、ユーザによって検索キー文が入力されると、クライアント端末装置１００は、検索キー文を、ＷＥＢサーバ３００を介して日本語解析サーバ５００に送信する（ステップＳＴ１２）。
そして、日本語解析サーバ５００は、この検索キー文を受信すると（ステップＳＴ１３）、以下に示す通り、この検索キー文に基づく検索を行う。 Here, when a search key sentence is input by the user from the input unit 103 of the client terminal apparatus 100, the client terminal apparatus 100 transmits the search key sentence to the Japanese analysis server 500 via the WEB server 300 ( Step ST12).
When receiving the search key sentence (step ST13), the Japanese analysis server 500 performs a search based on the search key sentence as shown below.

まず、検索処理部５０１は、検索サービスαに対する検索結果をクライアント端末装置１００に返信するため、メモリ領域５０６に、空のオブジェクトである検索結果オブジェクトを生成し、結果記録用の記憶領域を確保する（ステップＳＴ１４）。
そして、検索処理部５０１は、検索サービスαのプログラムに従って、検索キー文の解析を行う。すなわち、形態素解析部５５１１は、検索処理部５０１によって分割された単文の形態素解析を行い形態素に分割し、システム辞書５５０１およびユーザ辞書５５０２を参照して、品詞や属性等を検索する。そして、形態素解析部５５１１は、得られた品詞や属性等を表す情報に基づき、文章中の係りうけ関係や形態素の意味に応じた文節を作成する。 First, the search processing unit 501 generates a search result object, which is an empty object, in the memory area 506 in order to return a search result for the search service α to the client terminal device 100, and secures a storage area for recording the result. (Step ST14).
Then, the search processing unit 501 analyzes the search key sentence according to the program of the search service α. That is, the morpheme analysis unit 5511 performs morphological analysis on each single sentence divided by the search processing unit 501, splits it into morphemes, and looks up parts of speech, attributes, and the like in the system dictionary 5501 and the user dictionary 5502. Based on the information indicating the obtained parts of speech, attributes, and the like, the morpheme analysis unit 5511 then creates phrases according to the dependency relationships in the sentence and the meanings of the morphemes.

次いで、構文解析部５５１２は、形態素解析部５５１１による解析結果に基づき、文章を構成する文節の品詞や意味、属性情報、文章内での位置、並び等を評価し、文章における文節同士の係りうけ関係を解析し、解析結果を類義語処理部５５１３に出力する。
類義語処理部５５１３は、類義語辞書５５０３を参照して、分解された単語や文節に対して、統一すべき類義語や同義語があるか否かを検索し、該当する類義語等があれば、該当する単語や文節を、類義語辞書５５０３から検索によって得られた類義語等に置換える。そして、類義語処理部５５１３は、解析部５５１による解析結果を検索処理部５０１に出力する（ステップＳＴ１５）。 Next, the syntax analysis unit 5512 evaluates the part of speech and the meaning of the phrase constituting the sentence, the attribute information, the position in the sentence, the arrangement, etc. based on the analysis result by the morpheme analysis unit 5511, and determines the relationship between the phrases in the sentence. The relationship is analyzed, and the analysis result is output to the synonym processing unit 5513.
The synonym processing unit 5513 refers to the synonym dictionary 5503 and checks whether any decomposed word or phrase has a near-synonym or synonym to which it should be unified. If there is one, the word or phrase is replaced with the synonym or the like found in the synonym dictionary 5503. Then, the synonym processing unit 5513 outputs the analysis result of the analysis unit 551 to the search processing unit 501 (step ST15).
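The synonym unification step can be sketched in Python as a simple dictionary lookup. The entries below and the name `unify_synonyms` are hypothetical; the actual structure of the synonym dictionary 5503 is not described in this section.

```python
def unify_synonyms(words, synonym_dict):
    """Replace each word by its unified form if the dictionary has one."""
    return [synonym_dict.get(word, word) for word in words]


# Hypothetical entries: variant form -> unified form.
synonym_dict = {"ネット": "インターネット", "繋がらない": "つながらない"}
normalized = unify_synonyms(["ネット", "が", "繋がらない"], synonym_dict)
```

Words without an entry pass through unchanged, which mirrors the case where no synonym to be unified is found.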

そして、検索処理部５０１は、マッチプロファイルＡにおいて予め決められているマッチモードに従って、解析された検索キー文と、メモリ領域506の辞書オプジェクトに展開されているマッチ辞書データとのマッチングを行い、各マッチングのどのタイプに属するかを判定し、マッチモード定義の条件を満たす文章等の検索(マッチング処理）を行う(ステップＳＴ１６)。なお、詳細については、後述する。
さらに、検索処理部５０１は、ステップＳＴ１６において、マッチプロファイルＡにおいて予め決められているスコアモードに従って、マッチングによって得られた文章等における、検索キー文とのマッチングの程度を評価するスコアを算出する（スコアリング処理）。 Then, the search processing unit 501 matches the analyzed search key sentence against the match dictionary data expanded in the dictionary object in the memory area 506 according to the match mode predetermined in the match profile A, determines which type of each matching applies, and searches for sentences and the like that satisfy the conditions of the match mode definition (matching process) (step ST16). Details will be described later.
Further, in step ST16, the search processing unit 501 calculates a score that evaluates the degree to which each sentence or the like obtained by the matching matches the search key sentence, according to the score mode predetermined in the match profile A (scoring process).

そして、検索処理部５０１は、マッチング処理における検索によって得られたマッチモードの条件を満たす文章と、スコアリング処理によって得られたスコアとを、メモリ領域５０６の検索結果オブジェクトに書き込む（ステップＳＴ１７）。
そして、検索処理部５０１は、ＷＥＢサーバ３００を介してクライアント端末装置１００に、検索結果オブジェクトの内容を送信する（ステップＳＴ１８）。 Then, the search processing unit 501 writes the sentence satisfying the match mode obtained by the search in the matching process and the score obtained by the scoring process in the search result object in the memory area 506 (step ST17).
Then, the search processing unit 501 transmits the contents of the search result object to the client terminal device 100 via the WEB server 300 (step ST18).

次に、図１４を用いて、本実施の形態に係る情報検索システム１におけるマッチング処理とスコアリング処理の一例について詳細に説明する。図１４は、本実施の形態に係る情報検索システム１におけるマッチング処理とスコアリング処理の一例について詳細に説明するフローチャートである。なお、図１４に示す処理は、図１３のステップＳＴ１６に対応する処理を詳細に記載したものである。
図１４に示す通り、検索処理部５０１は、マッチ辞書データのシンボルマップＭＤ１を参照して、図１３のステップＳＴ１５において文書解析部５０５によって解析された検索キー文の単語情報をシンボルＩＤに置き換える（ステップＳＴ２０）。
そして、検索処理部５０１は、マッチプロファイルＡにおいて予め決められているマッチモードに従ってマッチング処理を行う。本実施の形態において、検索処理部５０１は、
単語要素マッチング、係りうけマッチング、属性マッチングについて、それぞれどのタイプに合致するかを判定し、マッチモード定義ＰＡ１によって定義された条件に合致する単文を抽出する（ステップＳＴ２１）。これにより、検索処理部５０１は、マッチング処理によって検索キー文とマッチした文章を、マッチ辞書データから検索によって得ることができる。 Next, an example of matching processing and scoring processing in the information search system 1 according to the present embodiment will be described in detail with reference to FIG. FIG. 14 is a flowchart illustrating in detail an example of matching processing and scoring processing in the information search system 1 according to the present embodiment. The process shown in FIG. 14 is a detailed description of the process corresponding to step ST16 in FIG.
As shown in FIG. 14, the search processing unit 501 refers to the symbol map MD1 of the match dictionary data and replaces the word information of the search key sentence, analyzed by the document analysis unit 505 in step ST15 of FIG. 13, with symbol IDs (step ST20).
Then, the search processing unit 501 performs a matching process according to the match mode predetermined in the match profile A. In the present embodiment, the search processing unit 501 determines which type applies for each of word element matching, dependency matching, and attribute matching, and extracts the single sentences that match the conditions defined by the match mode definition PA1 (step ST21). In this way, the search processing unit 501 can retrieve from the match dictionary data the sentences that match the search key sentence.

検索処理部５０１はマッチング処理によって得られた結果に対し、各マッチングモードで判定されたマッチタイプの情報と、マッチプロファイルＡによって定義されているスコアモードを利用してスコアを算出し、これらの和をメモリ領域５０６に一時的に記憶させる(ステップＳＴ２２)。ここで、スコアの算出にあたっては、実際に結果の抽出に用いられたマッチモード定義とは関連なく行われる。これは、マッチモード定義を条件としたマッチングは、検索結果そのものの抽出である一方、スコア算出の処理は、抽出された結果の中で、より検索の意図に適した結果を得やすくするための評価を行う処理であり、抽出した結果に対して、マッチモード定義の条件にかかわらず再度、単語要素、係りうけ、属性の観点で結果を評価することが有用だからである。 The search processing unit 501 calculates scores for the results obtained by the matching process, using the match type information determined in each matching mode and the score mode defined by the match profile A, and temporarily stores their sum in the memory area 506 (step ST22). Here, the score is calculated independently of the match mode definition actually used for extracting the results. This is because matching under the match mode definition extracts the search results themselves, whereas the score calculation evaluates the extracted results so that results better suited to the intent of the search are easier to obtain. It is therefore useful to evaluate the extracted results again in terms of word elements, dependency, and attributes, regardless of the conditions of the match mode definition.

例えば、係りうけマッチングにおいて、図１０（ｂ）のように、ノード単独タイプが行われていた場合、ルールに対応する文節が「（サッカー観戦）−（行く）」であり、検索キー文に含まれる文節が「（サッカー）−（行く）」、「（サッカー）−（する）」であった場合、いずれもマッチング結果としては、一致すると判断されるが、マッチプロファイルにおける係りうけマッチ係数ＰＡ７は、検索キー文の係りうけとして一致する「（サッカー）−（行く）」に対して、係数を適用してスコアが算出される。
つまり、検索処理部５０１は、このような係りうけマッチングの条件を満たす文章や単文が得られた場合、このマッチした文章や単文と、検索キー文とを比較して、スコアモード情報が示すような関係であった場合、係数を適用してスコアを算出することができる。 For example, in dependency matching, suppose the node-only type is used as shown in FIG. 10B, the clause corresponding to the rule is “(watch soccer game) − (go)”, and the clauses included in the search key sentence are “(soccer) − (go)” and “(soccer) − (do)”. Both are judged to match as matching results. With the dependency match coefficient PA7 in the match profile, however, the score is calculated by applying the coefficient to “(soccer) − (go)”, which matches as a dependency of the search key sentence.
That is, when a sentence or single sentence satisfying such dependency matching conditions is obtained, the search processing unit 501 compares the matched sentence or single sentence with the search key sentence and, if their relationship is as indicated by the score mode information, can calculate the score by applying the coefficient.
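One way to picture the scoring step is as a sum of coefficients selected by the match type determined in each matching mode. The Python sketch below is only an assumed reading of the description: the mode names, match type names, and coefficient values are all hypothetical and do not come from the match profile A.

```python
# Hypothetical coefficients per matching mode and match type (values are made up).
COEFFICIENTS = {
    "word_element": {"exact": 1.0, "partial": 0.5},
    "dependency": {"pair": 2.0, "node_only": 0.5},
    "attribute": {"same": 0.25},
}


def score(match_types, coefficients=COEFFICIENTS):
    """Sum the coefficient of the match type determined in each matching mode."""
    total = 0.0
    for mode, match_type in match_types.items():
        # A match type without a coefficient contributes nothing.
        total += coefficients.get(mode, {}).get(match_type, 0.0)
    return total


# A clause pair whose dependency also matches, versus one matching on a node only.
go_score = score({"word_element": "exact", "dependency": "pair"})
do_score = score({"word_element": "exact", "dependency": "node_only"})
```

With these assumed values, both candidates pass the matching condition, but the one whose dependency agrees with the search key sentence receives the higher score.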

そして、検索処理部５０１は、マッチした文章と、この文章のスコアに基づき、マッチした文章のマッチ情報（例えば、マッチング処理において利用したマッチモードの種類、マッチした単語やマッチした文節の文章内における出現位置（以下、マッチ位置情報という）、スコア）を生成する（ステップＳＴ２３）。
次いで、検索処理部５０１は、例えば、ステップＳＴ２２によって算出されたスコアの点数が高い順にマッチした文章を並び替える（ステップＳＴ２４）。そして、検索処理部５０１は、検索キー文、マッチした文章、マッチした単文およびマッチ情報を関連付けて、メモリ領域５０６の検索結果オブジェクトに書き込む（ステップＳＴ２５）。 Then, based on the matched sentence and its score, the search processing unit 501 generates match information for the matched sentence (for example, the type of match mode used in the matching process, the position at which the matched word or matched clause appears in the sentence (hereinafter referred to as match position information), and the score) (step ST23).
Next, for example, the search processing unit 501 sorts the matched sentences in descending order of the score calculated in step ST22 (step ST24). Then, the search processing unit 501 associates the search key sentence, the matched sentence, the matched single sentence, and the match information, and writes them in the search result object in the memory area 506 (step ST25).

そして、検索処理部５０１は、ＷＥＢサーバ３００に検索結果オブジェクトの内容を送信する（ステップＳＴ２６）。ＷＥＢサーバ３００は、受信した検索結果オブジェクトを記憶部３０４に一時的に記憶させ、クライアント端末装置１００の表示部１０２によって表示可能な検索結果の表示データ（ウェブページ）を作成しクライアント端末装置１００に送信する。クライアント端末装置１００は、この表示データに基づき、検索結果の表示データを表示部１０２に表示する。
なお、マッチ位置情報とは、マッチング条件を満たす単語（マッチした単語）が、このマッチした単語を含むマッチした単文や文章の文中に出現する文字位置を表す情報である。 Then, the search processing unit 501 transmits the contents of the search result object to the WEB server 300 (step ST26). The WEB server 300 temporarily stores the received search result object in the storage unit 304, creates search result display data (a web page) that can be displayed by the display unit 102 of the client terminal device 100, and transmits it to the client terminal device 100. The client terminal device 100 displays the search results on the display unit 102 based on this display data.
Note that the match position information is information indicating a character position at which a word satisfying the matching condition (matched word) appears in a matched single sentence or sentence including the matched word.

ここで、図１５を用いて、検索結果オブジェクトの内容である検索結果データの一例について説明する。図１５は、検索結果データの一例について説明する概略図である。
図１５に示す通り、検索結果データは、クライアント端末装置１００においてユーザによって入力された“検索キー文”と、日本語解析サーバ５００によるマッチング処理によって得られた検索キー文とマッチした“マッチした文章”と、検索キー文とマッチした単文であって、マッチした文章に含まれる単文であることを表す“マッチした単文”と、このマッチした単文に含まれる文節毎に生成される“マッチ情報”とが、それぞれ関連付けられている。 Here, an example of search result data, which is the content of the search result object, will be described with reference to FIG. FIG. 15 is a schematic diagram illustrating an example of search result data.
As shown in FIG. 15, the search result data associates the following with one another: the “search key sentence” input by the user at the client terminal device 100; the “matched sentence” that the matching process of the Japanese analysis server 500 found to match the search key sentence; the “matched single sentence”, which is a single sentence that matches the search key sentence and is contained in the matched sentence; and the “match information” generated for each clause contained in the matched single sentence.

次に、図１６を用いて、検索結果オブジェクトの内容として、クライアント端末装置１００に送信される検索結果データの一例について、より詳細に説明する。
図１６（ａ）に示す通り、検索キー文が「インターネットがつながらない」であって、日本語解析サーバ５００によって、例えば、検索結果１として、単文ＩＤ「００１−１」、テキスト「インターネットがつながらない場合でも操作は可能ですか？」と、検索結果２として、単文ＩＤ「００２−３」、テキスト「突然、インターネットができなくなりました」が得られた場合について以下説明する。 Next, an example of search result data transmitted to the client terminal device 100 as the contents of the search result object will be described in more detail with reference to FIG.
As shown in FIG. 16A, the following describes a case where the search key sentence is “インターネットがつながらない” (“The Internet does not connect”) and the Japanese analysis server 500 obtains, for example, search result 1 with the single sentence ID “001-1” and the text “インターネットがつながらない場合でも操作は可能ですか？” (“Can it be operated even when the Internet does not connect?”), and search result 2 with the single sentence ID “002-3” and the text “突然、インターネットができなくなりました” (“Suddenly, I can no longer use the Internet”).

図１６（ｂ）は、検索キー文や検索結果で得られた文章のマッチ位置情報を表す文字位置について説明する図である。図１６（ｂ）に示す通り、例えば、検索キー文は、文章の先頭から順番に、一文字ずつ「１，２，３，・・・，１２」という文字位置を表す番号が付与されている。この文字位置を表す番号によって、マッチした単文や文章の文中に出現するマッチした単語の文字位置を表すことができる。 FIG. 16B is a diagram for explaining a character position representing match position information of a sentence obtained from a search key sentence or a search result. As shown in FIG. 16B, for example, the search key sentence is given a number indicating the character position “1, 2, 3,..., 12” one by one from the beginning of the sentence. The number representing the character position can represent the character position of the matched word appearing in the matched single sentence or sentence.

図１６（ｃ）は、検索結果の一例を示す。図１６（ｃ）に示す通り、検索結果１は、文章ＩＤが「００１」、単文ＩＤが「００１−１」であって、マッチモードが「係りうけマッチング」の場合では、スコアが「８．９」であって、マッチ位置情報が「ｋｅｙ１：７，ｒｅｓ１：７」「ｋｅｙ９：１２，ｒｅｓ９：１２」であることが示されている。ここで、マッチ位置は、当該マッチモードにおいてマッチングした単語を示しており、マッチ位置情報「ｋｅｙ１：７，ｒｅｓ１：７」は、検索キー文の「インターネット」を、マッチ位置情報「ｋｅｙ９：１２，ｒｅｓ９：１２」は、検索キー文の「つながらない」を、意味している。つまり、この「インターネット」と「つながらない」は、係りうけマッチングにおいて、検索キー文における親子関係と、検索結果１の文章における親子関係のマッチングの程度がスコア「８．９」と評価されていることを意味している。 FIG. 16C shows an example of the search result. As shown in FIG. 16C, the search result 1 has a score of “8.” when the sentence ID is “001”, the single sentence ID is “001-1”, and the match mode is “relevant matching”. 9 ”and the match position information is“ key1: 7, res1: 7 ”,“ key9: 12, res9: 12 ”. Here, the match position indicates a word matched in the match mode, and the match position information “key1: 7, res1: 7” indicates “Internet” of the search key sentence and the match position information “key9: 12”. “res9: 12” means “not connected” in the search key sentence. In other words, the “Internet” and “not connected” have a score of “8.9” for the matching between the parent-child relationship in the search key sentence and the parent-child relationship in the sentence of the search result 1 in the dependency matching. Means.

ここで、マッチ位置情報「ｋｅｙ１：７，ｒｅｓ１：７」とは、マッチした単語の文中における位置を表す情報であって、「ｋｅｙ１：７」は、検索キー文の先頭から数えて、１番目から７番目までの文字列が、マッチした単語に該当することを表している。また、「ｒｅｓ１：７」は、マッチした単文（あるいはマッチした文章）の先頭から数えて、１番目から７番目までの文字列が、マッチした単語に該当することを表している。なお、この数字は、文の先頭を基点として数えられる文字の数であって、文中における文字の位置を表す情報である。
このように、検索結果は、マッチング条件を満たす単語を含む文と、当該文に含まれるマッチした単語のマッチ位置情報とが関連付けられている情報を含む。 Here, the match position information “key1: 7, res1: 7” is information indicating the position of the matched word in the sentence. “key1: 7” indicates that the first to seventh characters, counted from the head of the search key sentence, correspond to the matched word. Likewise, “res1: 7” indicates that the first to seventh characters, counted from the head of the matched single sentence (or matched sentence), correspond to the matched word. These numbers are character counts taken from the beginning of the sentence and indicate the position of the characters in the sentence.
As described above, the search result includes information in which a sentence including a word satisfying the matching condition is associated with match position information of a matched word included in the sentence.
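The match position notation can be made concrete with a short Python sketch that parses a string such as “key1:7,res1:7” and extracts the corresponding characters. The functions `parse_match_position` and `extract` are hypothetical helpers; the sketch assumes ASCII notation and treats positions as 1-based and inclusive, as the description states.

```python
import re


def parse_match_position(info):
    """Parse 'key1:7,res1:7' into {'key': (1, 7), 'res': (1, 7)}."""
    spans = {}
    for label, start, end in re.findall(r"(key|res)\s*(\d+)\s*:\s*(\d+)", info):
        spans[label] = (int(start), int(end))
    return spans


def extract(text, span):
    """Return the characters covered by a 1-based, inclusive span."""
    start, end = span
    return text[start - 1:end]


key_sentence = "インターネットがつながらない"
result_sentence = "インターネットがつながらない場合でも操作は可能ですか？"
spans = parse_match_position("key1:7,res1:7")
key_word = extract(key_sentence, spans["key"])
res_word = extract(result_sentence, spans["res"])
```

Here both spans select the first seven characters, which is the word “インターネット” in the search key sentence and in search result 1.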

次に、図１７を用いて、本実施の形態に係る情報検索システム１における検索開始処理の一例について説明する。図１７は、本実施の形態に係る情報検索システム１における検索開始処理の一例を示すフローチャートである。
図１７に示す通り、例えば、クライアント端末装置１００の入力部１０３が日本語解析サーバ５００による検索サービスを利用するリクエストをユーザから受け付けると、クライアント端末装置１００は、日本語解析サーバ５００による検索用表示データを送信するよう、ＷＥＢサーバ３００に対してリクエスト制御信号を送信する。 Next, an example of a search start process in the information search system 1 according to the present embodiment will be described using FIG. FIG. 17 is a flowchart showing an example of a search start process in the information search system 1 according to the present embodiment.
As shown in FIG. 17, for example, when the input unit 103 of the client terminal device 100 receives from the user a request to use the search service of the Japanese analysis server 500, the client terminal device 100 transmits a request control signal to the WEB server 300 asking it to send the search display data for the Japanese analysis server 500.

ＷＥＢサーバ３００は、通信部３０１を介して、クライアント端末装置１００からリクエスト制御信号を受信すると、リクエスト処理部３０２が、このリクエスト制御信号に基づき、クライアント端末装置１００の表示部１０２によって表示される表示データのウェブページのデータを作成するようデータ変換部３０３を制御する。次いで、データ変換部３０３は、記憶部３０４から必要な設定データ等を読み出し、ユーザによって検索キー文が入力されるテキストボックスを表示する検索用表示データを作成する。そして、通信部３０１は、この検索用表示データを、クライアント端末装置１００に送信する（ステップＳＴ３０）。
例えば、データ変換部３０３は、クライアント端末装置１００に対して検索用表示データを表示させるためのＨＴＭＬ文章等で構成された表示データを作成する。そして、リクエスト処理部３０２は、この表示データと、検索結果を表示するためのルールが記載されたルール情報（例えば、ＣＳＳファイルで構成されたもの）、あるいは、検索結果をクライアント端末装置１００の表示部１０２に表示されるために利用されるプログラム（例えば、ｊａｖａｓｃｒｉｐｔ等）であって、ブラウザ１０１上で動作するプログラムコードを、通信部３０１を介して、クライアント端末装置１００に送信する。 When the WEB server 300 receives a request control signal from the client terminal device 100 via the communication unit 301, the request processing unit 302 displays on the display unit 102 of the client terminal device 100 based on the request control signal. The data conversion unit 303 is controlled so as to create data of the data web page. Next, the data conversion unit 303 reads necessary setting data from the storage unit 304 and creates search display data for displaying a text box in which a search key sentence is input by the user. And the communication part 301 transmits this display data for a search to the client terminal device 100 (step ST30).
For example, the data conversion unit 303 creates display data composed of HTML text or the like for causing the client terminal device 100 to display the search display data. Then, the request processing unit 302 transmits to the client terminal device 100, via the communication unit 301, this display data together with rule information describing the rules for displaying the search results (for example, a CSS file) or program code that runs on the browser 101 and is used to display the search results on the display unit 102 of the client terminal device 100 (for example, JavaScript).

クライアント端末装置１００は、ＷＥＢサーバ３００から表示データやプログラム等を受信すると、このプログラムを起動させる。そして、データ処理部１１２は、このプログラムに従って、ＷＥＢサーバ３００から受信される検索用表示データを、表示部１０２によって表示される表示データを生成し、表示処理部１１３を制御する。そして、表示処理部１１３は、データ処理１１２によって生成された表示データを、表示部１０２に表示させる。
クライアント端末装置１００の入力部１０３を介して、ユーザから特定の検索サービスが指定されると、クライアント端末装置１００は、指定された検索サービスによる検索を実行するためのリクエスト制御信号を生成する。
また、ユーザによって検索キー文が入力されると、入力部１０３はこれを受け付ける（ステップＳＴ３１）。 When the client terminal device 100 receives display data or a program from the WEB server 300, the client terminal device 100 starts this program. Then, the data processing unit 112 generates display data displayed by the display unit 102 from the display data for search received from the WEB server 300 according to this program, and controls the display processing unit 113. The display processing unit 113 causes the display unit 102 to display the display data generated by the data processing 112.
When a specific search service is specified by the user via the input unit 103 of the client terminal device 100, the client terminal device 100 generates a request control signal for executing a search by the specified search service.
Further, when a search key sentence is input by the user, the input unit 103 receives this (step ST31).

次いで、クライアント端末装置１００は、ユーザによって指定された検索サービスの種類や、入力された検索キー文を、検索リクエストメッセージとともに、通信部１０４を介してＷＥＢサーバ３００に送信する。
ＷＥＢサーバ３００は、クライアント端末装置１００から検索リクエストメッセージを受信すると、この検索リクエストメッセージから検索キー文を取り出し、日本語解析サーバ５００に、例えば、検索サービスαによる検索を要求する（ステップＳＴ３２）。 Next, the client terminal device 100 transmits the type of search service designated by the user and the input search key text to the WEB server 300 via the communication unit 104 together with the search request message.
When receiving the search request message from the client terminal device 100, the WEB server 300 extracts a search key sentence from the search request message and requests the Japanese analysis server 500 to perform a search using, for example, the search service α (step ST32).

次に、図１８を用いて、本実施の形態に係る情報検索システム１における検索結果の表示方法の一例について説明する。図１８は、本実施の形態に係る情報検索システム１における検索結果の表示方法の一例を示すフローチャートである。
図１８に示す通り、ＷＥＢサーバ３００は、日本語解析サーバ５００から検索結果を受信すると、記憶部３０４に一時的に記憶させる。そして、データ変換部３０３が、クライアント端末装置１００に表示させるためのルール情報を記憶部３０４から読み出す。次いで、データ変換部３０３は、このルール情報に基づき、検索結果をクライアント端末装置１００の表示装置１０２において表示するためのウェブページの表示データを作成し、検索結果のメッセージとして、通信部３０１を介して、クライアント端末装置１００に送信する（ステップＳＴ４０）。
例えば、データ変換部３０３は、検索キー文ごとに、マッチした文章へのリンク情報、マッチした単文、適用したマッチモード、マッチ位置情報、スコア等のそれぞれに所定のタグを付与して、クライアント端末装置１００側のデータ処理部１１２によって取り扱い可能なデータ（ＸＭＬファイル）を作成し、検索結果として送信する。 Next, an example of a search result display method in the information search system 1 according to the present embodiment will be described with reference to FIG. FIG. 18 is a flowchart showing an example of a search result display method in the information search system 1 according to the present embodiment.
As shown in FIG. 18, when the WEB server 300 receives the search result from the Japanese analysis server 500, it temporarily stores the result in the storage unit 304. Then, the data conversion unit 303 reads from the storage unit 304 the rule information for display on the client terminal device 100. Next, based on this rule information, the data conversion unit 303 creates web page display data for displaying the search result on the display device 102 of the client terminal device 100 and transmits it, as a search result message, to the client terminal device 100 via the communication unit 301 (step ST40).
For example, for each search key sentence, the data conversion unit 303 attaches predetermined tags to the link information to the matched sentence, the matched single sentence, the applied match mode, the match position information, the score, and so on, creates data (an XML file) that the data processing unit 112 of the client terminal device 100 can handle, and transmits it as the search result.
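The tagged search result could be serialized along the following lines. This Python sketch uses the standard library `xml.etree.ElementTree`; the element and attribute names (`searchResult`, `keySentence`, `match`, `singleSentence`, `position`) are invented for illustration, since the specification does not give the XML schema.

```python
import xml.etree.ElementTree as ET


def build_result_xml(key_sentence, results):
    """Serialize a search key sentence and its matches as an XML string."""
    root = ET.Element("searchResult")
    ET.SubElement(root, "keySentence").text = key_sentence
    for result in results:
        # One <match> element per matched single sentence.
        match = ET.SubElement(
            root, "match", {"mode": result["mode"], "score": str(result["score"])}
        )
        single = ET.SubElement(match, "singleSentence", {"id": result["single_id"]})
        single.text = result["text"]
        ET.SubElement(match, "position").text = result["position"]
    return ET.tostring(root, encoding="unicode")


xml_text = build_result_xml(
    "インターネットがつながらない",
    [{
        "mode": "dependency",
        "score": 8.9,
        "single_id": "001-1",
        "text": "インターネットがつながらない場合でも操作は可能ですか？",
        "position": "key1:7,res1:7",
    }],
)
```

The sample values are those of search result 1 in FIG. 16C; only the layout of the XML is assumed.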

クライアント端末装置１００のデータ処理部１１２は、この検索結果を受信すると、この検索結果（ＸＭＬファイル）を記憶部１１１に一時的に記憶させる。そして、データ処理部１１２は、記憶部１１１に記憶されているルール情報に基づき、このＸＭＬファイルの電文に含まれるマッチ位置に該当する単語に対して、適用したマッチモードに対応した表示用のタグを挿入する（ステップＳＴ４１）。
例えば、図１６（ｃ）に示すような検索結果の場合、データ処理部１１２は、マッチ位置情報の「ｋｅｙ１：７、ｒｅｓ１：７」に該当する「インターネット」に対して、適用したマッチモードとして「係りうけマッチング」であることを表す表示用のタグを、検索結果を表示するためのルールが記載されたルール情報（ＣＳＳファイル）を参照することで作成する。例えば、データ処理部１１２は、検索結果を表示するための情報として、特定の単語を強調して表示する強調表示設定情報をルール情報に基づき作成し、タグとしてマッチ位置情報に該当する単語に付与する。この強調表示設定情報としては、例えば、マッチした単語には下線を付加して表示するための設定情報や、あるいは、単語要素マッチングのマッチモードによってマッチした単語等を赤色で表示し、属性マッチングのマッチモードによってマッチした単語等を青色で表示することによって、マッチモードごとにユーザが視覚的に区別して認識することができるように表示するための設定情報が含まれている。 When receiving the search result, the data processing unit 112 of the client terminal device 100 temporarily stores the search result (XML file) in the storage unit 111. Based on the rule information stored in the storage unit 111, the data processing unit 112 displays a tag corresponding to the match mode applied to the word corresponding to the match position included in the message of the XML file. Is inserted (step ST41).
For example, in the case of a search result as shown in FIG. 16C, the data processing unit 112 creates, for “Internet” corresponding to the match position information “key1: 7, res1: 7”, a display tag indicating that the applied match mode is “dependency matching”, by referring to the rule information (CSS file) describing the rules for displaying search results. For example, as information for displaying the search result, the data processing unit 112 creates, based on the rule information, highlight setting information for emphasizing a specific word and attaches it as a tag to the word at the match position. The highlight setting information includes, for example, settings for underlining matched words, or settings for displaying words matched in the word element matching mode in red and words matched in the attribute matching mode in blue, so that the user can visually distinguish the match modes.
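Inserting display tags at the match positions can be sketched in Python as follows. The class names in `MODE_CLASS` and the use of an HTML `span` element are assumptions; the specification only says that a display tag corresponding to the match mode is attached and that the CSS rule information determines its appearance.

```python
import html

# Hypothetical mapping from match mode to a CSS class defined in the rule information.
MODE_CLASS = {
    "word_element": "match-word",
    "dependency": "match-dependency",
    "attribute": "match-attribute",
}


def highlight(text, spans):
    """Wrap each (start, end, mode) span, 1-based inclusive and non-overlapping, in a tag."""
    parts = []
    cursor = 0
    for start, end, mode in sorted(spans):
        parts.append(html.escape(text[cursor:start - 1]))
        parts.append('<span class="%s">%s</span>'
                     % (MODE_CLASS[mode], html.escape(text[start - 1:end])))
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


marked = highlight("インターネットがつながらない", [(1, 7, "dependency")])
```

A stylesheet could then, for instance, render `match-word` in red and `match-attribute` in blue, in line with the highlight settings described above.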

また、強調表示設定情報としては、マッチしたと判断されるマッチモードが複数ある場合、予めユーザによって設定されている優先順位に従って、優先的に表示させるように設定されているマッチモードとマッチした単語等を強調表示するようなものであってもよく、スコアの高い順に優先的に強調表示するようなものであってもよい。
さらに、検索結果として、上述のように予めユーザによってマッチモードの優先順位が決定されている場合、マッチした単文や文章が複数あれば、この優先順位に従って、マッチした単文等を表示させるものであってもよい。
また、検索キー文の文頭に近いマッチした単語から順番に、検索結果として、優先的に表示させるものであってもよい。 In addition, as highlight setting information, when there are a plurality of match modes that are determined to be matched, a word that matches a match mode that is set to be displayed preferentially according to a priority set in advance by the user Or the like may be highlighted, or may be preferentially highlighted in descending order of score.
Furthermore, when the priority order of the match modes has been determined in advance by the user as described above and there are multiple matched single sentences or sentences, the matched single sentences and the like may be displayed as search results according to this priority order.
Alternatively, the search results may be displayed preferentially in order starting from the matched word nearest the beginning of the search key sentence.

このような強調表示設定情報に基づき、検索結果をクライアント端末装置１００の表示部１０２に表示させる。このとき、ユーザによって、検索結果として表示されているマッチした単文等の中から、強調して表示されているマッチした単語等が選択されると、入力部１０２は、これを受け付ける。そして、データ処理部１１２は、記憶部１１１に一時的に記憶されている検索結果のマッチ位置情報から、ユーザによって選択されたマッチした単語等の位置を特定し、このユーザによって選択されたマッチした単語等に応じた絞み込み検索をさらに行う（ステップＳＴ４２）。 Based on such highlight setting information, the search result is displayed on the display unit 102 of the client terminal device 100. At this time, when the user selects a matched word or the like displayed in an emphasized manner from the matched single sentences or the like displayed as the search results, the input unit 102 accepts this. Then, the data processing unit 112 identifies the position of the matched word or the like selected by the user from the match position information of the search result temporarily stored in the storage unit 111, and the matching selected by the user A narrow-down search according to the word or the like is further performed (step ST42).

ここで、絞り込み検索について、図１９、２０を用いて、詳細に説明する。図１９は、検索結果表示データに基づき、クライアント端末装置１００の表示部１０２に表示される検索結果を表す画像の一例を示す概略図である。
図１９に示す通り、クライアント端末装置１００の表示部１０２は、検索結果表示データに基づき画面１０２Ａを表示し、この画面１０２Ａの左側には、検索キー文を表示するテキストボックス１０２Ｂが、右側には検索結果を表示するサジェスト画面１０２Ｃが表示されている。
テキストボックス１０２Ｂには、検索キー文「クレジットカードの支払方法を登録したのですが料金サポート窓口から請求書が届きます。」が表示されている。 Here, the refinement search will be described in detail with reference to FIGS. FIG. 19 is a schematic diagram illustrating an example of an image representing a search result displayed on the display unit 102 of the client terminal device 100 based on the search result display data.
As shown in FIG. 19, the display unit 102 of the client terminal device 100 displays a screen 102A based on the search result display data. On the left side of the screen 102A is a text box 102B for displaying a search key sentence, and on the right side. A suggestion screen 102C for displaying the search result is displayed.
In the text box 102B, a search key sentence “I have registered a credit card payment method, but I receive a bill from the charge support window” is displayed.

サジェスト画面１０２Ｃには、この検索キー文１０２Ｃ１と、この検索キー文に基づき日本語解析サーバ５００による検索が行われた検索結果であるマッチした単文１０２Ｃ２が表示されている。
例えば、サジェスト画面１０２Ｃでは、検索キー文１０２Ｃ１のマッチした単語が、それぞれマッチモードに応じた色ごとに強調表示されている。マッチした単文１０２Ｃ２では、複数の単文が表示されている場合、通常の検索結果として、マッチした単文が、スコアが高い順番で表示されている。また、マッチした単文１０２Ｃ２に含まれるマッチした単語（例えば、「クレジットカードで支払方法を登録したのですが、料金サポート窓口から請求書が届きます。」）は、検索キー文１０２Ｃ１に含まれるマッチした単語と同様な強調して表示されており、同じマッチモードによって検索された単語に対しては、同じ色で強調して表示されている。
なお、サジェスト画面１０２Ｃの検索キー文１０２Ｃ１において強調して表示されているマッチした単語は、クライアント端末装置１００の入力部１０２からの選択指示を受け付け、ユーザによって選択可能である。
ＷＥＢサーバ３００は、検索結果が得られた場合、図１９に示すような検索結果の表示データを作成し、クライアント端末装置１００に送信する。 The suggestion screen 102C displays the search key sentence 102C1 and a matched single sentence 102C2 that is a search result obtained by the search by the Japanese analysis server 500 based on the search key sentence.
For example, on the suggestion screen 102C, the matched word of the search key sentence 102C1 is highlighted for each color corresponding to the match mode. In the matched simple sentence 102C2, when a plurality of simple sentences are displayed, the matched simple sentences are displayed in the order of the highest score as a normal search result. In addition, the matched word included in the matched simple sentence 102C2 (for example, “I registered the payment method with a credit card, but I receive an invoice from the charge support window”) is included in the search key sentence 102C1 The highlighted words are displayed in the same color as the selected words, and the words searched in the same match mode are highlighted in the same color.
Note that the matched word highlighted and displayed in the search key sentence 102C1 on the suggestion screen 102C can be selected by the user by receiving a selection instruction from the input unit 102 of the client terminal device 100.
When a search result is obtained, the WEB server 300 creates search result display data as shown in FIG. 19 and transmits it to the client terminal device 100.

Next, a method for performing a narrowing search from the screen shown in FIG. 19 will be described with reference to FIG. 20. FIG. 20 is a schematic diagram illustrating an example of a screen displayed after a narrowing search has been performed from the screen shown in FIG. 19. Note that the client terminal device 100 stores, in the storage unit 111, the search results received from the Japanese analysis server 500 (for example, matched sentences, matched single sentences, matched words, the matching conditions used for these matches, match information including match position information, scores, and the like) and the search rule information used when executing a narrowing search (for example, a program and set values for retrieving the narrowing target from the search results, using a matched word designated via the input unit 103 as the search key). As shown in FIG. 20, suppose, for example, that the user selects the matched word “credit card”, which is highlighted in the search key sentence 102C1 on the suggestion screen 102C (for example, the word is designated as the target of the narrowing search by an operation such as a double-click while the on-screen pointer, moved with the mouse, points at the portion displaying “credit card”). In that case, the data processing unit 112 of the client terminal device 100 receives the selection instruction from the user via the input unit 103 and executes a narrowing search by “credit card”.

The data processing unit 112 refers to the search results and search rule information stored in the storage unit 111 and detects the match position information of “credit card” in the search key sentence. Then, based on that match position information, the data processing unit 112 searches for matched single sentences and sentences that contain a matched word associated, in the match position information, with “credit card” in the search key sentence.
For example, in the case shown in FIG. 19, the match position of “credit card” in the search key sentence is “1:8”. The data processing unit 112 therefore searches the search results for matched sentences whose match position information is “key1:8, res1:8” and that contain a word matched in the same match mode.
Further, the data processing unit 112 creates display data (screen 102A-1; see the upper part of FIG. 20) in which the matched single sentences and sentences obtained by this search are placed at the top of the search results in the search result display data. The display processing unit 113 then causes the display unit 102 to display this display data, in which the matched sentences obtained by the narrowing search appear at the top.
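The narrowing step described above can be sketched as a reordering of results already held on the client. The following Python sketch is illustrative only: the `Match` and `Result` types, their field names, the mode name `"word"`, and the `narrow` function are assumptions introduced here, and the positions in the sample data other than the key positions “1:8” and “33:35” are hypothetical. It assumes that a stored result qualifies when one of its matches has the selected key position and the same match mode, and that non-selected results simply follow in score order.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Match:
    mode: str        # match mode that produced this match (name is hypothetical)
    key_span: tuple  # (start, end) of the word in the search key sentence
    res_span: tuple  # (start, end) of the word in the matched sentence

@dataclass
class Result:
    text: str
    score: float
    matches: list = field(default_factory=list)

def narrow(results, key_span, mode):
    """Put results matching key_span in the given mode first; sort each group by score."""
    hits, rest = [], []
    for r in results:
        selected = any(m.key_span == key_span and m.mode == mode for m in r.matches)
        (hits if selected else rest).append(r)
    by_score = lambda r: -r.score
    return sorted(hits, key=by_score) + sorted(rest, key=by_score)

# Stored search results; no re-analysis of the sentences is needed to narrow them.
stored = [
    Result("invoice breakdown inquiry", 4.0, [Match("word", (33, 35), (9, 11))]),
    Result("registered a credit card, still invoiced", 1.4,
           [Match("word", (1, 8), (1, 8)), Match("word", (33, 35), (31, 33))]),
    Result("change to credit card payment", 2.2, [Match("word", (1, 8), (1, 8))]),
]

for r in narrow(stored, key_span=(1, 8), mode="word"):
    print(r.score, r.text)
```

Because the selection relies only on the stored position pairs, the same function serves any highlighted word in the search key sentence; only `key_span` changes.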

Then, as shown in the search result screen 102A-1 in the upper part of FIG. 20, the display unit 102 displays “Change to credit card payment” (score: 2.2) at the top of the matched single sentences 102C2 of the search results, followed by “I registered a credit card as my payment method, but I still receive invoices from the billing support desk.” (score: 1.4).
In this way, the client terminal device 100 can use the match position information of the search results to execute a re-search based on a matched word, and can display the matched single sentences in descending order of the degree of matching for that word.
Here, an example has been described in which, when a matched word of the search key sentence is designated for a narrowing search, the data processing unit 112 re-searches the search results stored in the storage unit 111 for words that were matched by word element matching and whose match position information agrees; however, the present invention is not limited to this.

For example, when the user selects the matched word “invoice”, which is highlighted in the search key sentence 102C1 on the suggestion screen 102C, the data processing unit 112 receives the selection instruction from the user via the input unit 103 and executes a narrowing search by “invoice”.
The data processing unit 112 refers to the search results and search rule information stored in the storage unit 111 and detects the match position information of “invoice” in the search key sentence. Then, based on that match position information, the data processing unit 112 searches for matched single sentences and sentences that contain a matched word associated, in the match position information, with “invoice” in the search key sentence.
For example, in the case shown in FIG. 19, the match position of “invoice” in the search key sentence is “33:35”. The data processing unit 112 therefore searches the search results for matched sentences whose match position information is “key33:35, res33:35” and that contain a word matched in the same match mode.

Then, the data processing unit 112 creates display data (screen 102A-2; see the lower part of FIG. 20) in which the matched sentences obtained by this search are placed at the top of the search results in the search result display data. The display processing unit 113 causes the display unit 102 to display this display data, in which the matched sentences obtained by the narrowing search appear at the top.
The display unit 102 displays this display data and, as shown in the search result screen 102A-2 in the lower part of FIG. 20, shows “I receive invoices from the billing center, and I would like to check the breakdown of the invoice details.” (score: 4.0) at the top of the matched single sentences 102C2 of the search results, followed by “I received an invoice even though I canceled my contract.” (score: 2.5), and so on.

As described above, by storing the search results in the storage unit 111 of the client terminal device 100 and using the match position information, the data processing unit 112 can perform a re-search based on a matched word. Thus, when the user instructs (requests) a narrowing search, the client terminal device 100 can obtain the position of the matched word within a sentence from the match position information, without performing analysis such as morphological decomposition or syntactic analysis on the search results. Moreover, because the match position information is created for each match mode, a tag that produces different highlighting depending on the match mode can be attached to the matched word. The client terminal device 100 can therefore present the result of the narrowing search by displaying the re-searched results on the display unit 102.
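As an illustration of attaching a mode-dependent highlighting tag from position information alone, the following Python sketch wraps each matched span in an HTML `<span>`. The tag format, the CSS class names, the mode names, and the `highlight` function are assumptions made for this example. The sketch also assumes that positions such as “1:8” are 1-based inclusive character positions, which is consistent with the eight-character Japanese word クレジットカード at the start of the sample sentence, and that spans do not overlap.

```python
from html import escape

# Hypothetical mapping from match mode to a CSS class used for highlighting.
MODE_CLASS = {"word": "hl-word", "attribute": "hl-attr"}

def highlight(text, matches):
    """Wrap each matched span in a <span> whose class depends on the match mode.

    matches: iterable of (mode, start, end); start and end are assumed to be
    1-based inclusive character positions, and spans must not overlap.
    """
    parts, pos = [], 0
    for mode, start, end in sorted(matches, key=lambda m: m[1]):
        parts.append(escape(text[pos:start - 1]))   # text before the match
        word = escape(text[start - 1:end])          # the matched word itself
        parts.append(f'<span class="{MODE_CLASS[mode]}">{word}</span>')
        pos = end
    parts.append(escape(text[pos:]))                # text after the last match
    return "".join(parts)

print(highlight("クレジットカード支払いへの変更について", [("word", 1, 8)]))
```

No morphological or syntactic analysis is involved: the function slices the sentence purely by the stored positions.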

As described above, search results including match positions and match patterns are stored in the storage unit 111 of the client terminal device 100, and when a word serving as the key for a narrowing search by the data processing unit 112 is designated via the input unit 103, the narrowed search results can be obtained easily from only the position information and match pattern of the corresponding word. The client terminal device 100 can therefore reconstruct the display data for the re-search results of the narrowing search.
In contrast, if, unlike the present embodiment, the match position information were not stored in the storage unit 111 as part of the search results, it is considered that the location of the word designated for the narrowing search within a matched sentence or matched single sentence could not be detected without analyzing the sentence. Likewise, if the match position information of the words matched in each match mode were not stored in the client terminal device 100 as part of the search results, as it is in the present embodiment, it is considered that matched words could not be highlighted with the same color, the same font, or the like for each match mode. The information search system 1 according to the present embodiment solves these problems by adopting the configuration described above.

Further, because the client terminal device 100 according to the present embodiment can determine the position of a matched word within a matched single sentence or sentence from the match position information, it can extract the matched word from the sentence. A re-search can therefore be performed without document analysis such as morphological analysis or syntactic analysis. In addition, knowing the position of the matched word allows the client terminal device 100 to create and display display data in which the matched word is highlighted.
In contrast, without match position information, the client terminal device 100 cannot highlight a word, because it does not know where in the sentence the word to be highlighted is located unless it performs document analysis on the search results. Nor can it highlight words in different colors according to the match condition.
In the processing described above, the data processing unit 112 on the client terminal device 100 side performs the narrowing search and the redisplay. Alternatively, the client terminal device 100 may transmit information on the word selected by the user to the WEB server 300, and the WEB server 300 may perform the narrowing process. In this case, the search results and search rule information are transmitted from the Japanese analysis server to the WEB server 300 and stored in the storage unit 304.

Next, an example of the matching process and the scoring process will be described in detail.
Here, an example will be described in which the user has designated the search service α, and the match profile A, which is predetermined as the match profile for the search service α, has been loaded into the dictionary object in the memory area 506.

FIG. 21 is a diagram illustrating an example of a search key sentence.
As shown in FIG. 21, the search key sentence “The ETC card cannot be used at the store.” is input to the Japanese analysis server 500. When the Japanese analysis server 500 performs matching on this input, matched single sentences such as those shown in FIG. 22 are obtained. FIG. 22 shows a plurality of matched single sentences obtained by the search, for example, “ETC card cannot be used”, “Credit card cannot be used”, “I want to use an ETC card”, “I want to use a credit card”, “ETC card is easy to use”, and “I lost my ETC card.”

FIG. 23 is a schematic diagram for explaining the settings of the match profiles A to C stored in the match profile storage unit 502. As shown in FIG. 23, the settings of the match profiles A to C, which are stored in the match profile storage unit 502 shown in FIG. 5, consist of match mode information, which includes the match mode definition PA1 specifying the combination of match modes, and score mode information, which includes the information corresponding to the relative appearance frequency flag PA2 through the synonym match coefficient PA10. As described later with reference to FIG. 27, the combination patterns (1) to (4) are available as match modes. Here, an example will be described in which (2) word element matching + attribute matching is predetermined as the match mode information of the match profiles.

In the match profile A, it is predetermined as the match mode information (match mode definition PA1) that (2) word element matching and attribute matching are performed, and that no weighting is applied to the result of the matching process (for example, a coefficient of 1.0). In the match profile B, it is likewise predetermined as the match mode information (match mode definition PA1) that word element matching and attribute matching are performed, and that the result of the matching process is weighted for dependency matches according to the dependency match coefficient PA7 (for example, a coefficient of 2.0). In the match profile C, it is also predetermined as the match mode information (match mode definition PA1) that word element matching and attribute matching are performed, and that a single sentence matched in attribute matching is weighted according to the predicate attribute match coefficient PA6 (for example, a coefficient of 2.0).
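The three profiles differ only in their score mode coefficients, which can be made explicit with a small data structure. The following Python sketch is a hypothetical representation, not the stored format of the match profile storage unit 502: the class name, field names, and mode names are assumptions, and only the two coefficients discussed here (PA6 and PA7) are modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MatchProfile:
    match_modes: frozenset            # corresponds to the match mode definition PA1
    predicate_attr_coef: float = 1.0  # corresponds to the predicate attribute match coefficient PA6
    dependency_coef: float = 1.0      # corresponds to the dependency match coefficient PA7

# Combination pattern (2): word element matching + attribute matching.
WORD_PLUS_ATTRIBUTE = frozenset({"word_element", "attribute"})

PROFILES = {
    "A": MatchProfile(WORD_PLUS_ATTRIBUTE),                           # no weighting
    "B": MatchProfile(WORD_PLUS_ATTRIBUTE, dependency_coef=2.0),      # weights dependency pairs
    "C": MatchProfile(WORD_PLUS_ATTRIBUTE, predicate_attr_coef=2.0),  # weights predicate attributes
}

for name, p in PROFILES.items():
    print(name, sorted(p.match_modes), p.predicate_attr_coef, p.dependency_coef)
```

Keeping the match modes and the coefficients as separate fields mirrors the separation between match mode information and score mode information in the description.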

Next, the search results obtained based on the match profile A in the example shown in FIGS. 21 to 23 will be described with reference to FIG. 24. FIG. 24 is a schematic diagram showing an example of a screen in which the search results obtained based on the match profile A are displayed on the suggestion screen 102C.
Here, since the match mode definition PA1 of the match profile A is word element + attribute, FIG. 24 shows the results (single sentences) extracted as satisfying these conditions.
In the scoring process shown in FIG. 24, no particular weighting is set as the score mode information (for example, a coefficient of 0.0 or 1.0). When a node (rule) matches, a reference point is assigned, and the weighting based on the score mode information is then calculated. In the present example, the weight value of each rule (node) stored in the match dictionary is used as the reference point, but a single predetermined value may simply be set for the entire apparatus. In the following description, the reference point is 1.2 points.
For example, for “ETC card cannot be used”, the words “ETC”, “card”, and “use” match. Since “ETC” and “card” are nouns, they have no attribute and receive 1.2 points each. Since “use” matches on the attribute “use (negation)”, it receives 1.2 points × 1.0 (predicate attribute match coefficient) = 1.2 points. The subtotal is 3.6 points. In addition, since “ETC” and “use”, and “card” and “use”, each form a dependency pair, the dependency match coefficient of 1.0 (no weighting) is applied, and 1.2 points × 1.0 × 2 = 2.4 points are added, for a total of 6.0 points.
As described above, although dependency matching is not performed in the match mode, defining a dependency match coefficient in the score mode of the match profile allows the obtained results to be weighted flexibly when the score is calculated.

On the other hand, for “I want to use an ETC card”, the words “ETC” and “card” match with no attribute. In this case, the score is 1.2 points + 1.2 points = 2.4 points. No predicate attribute match coefficient or dependency match coefficient applies, so no further points are added.
For “Credit card cannot be used”, the word “use” and the attribute “use (negation)” match. In this case, the score is 1.2 points × 1.0 = 1.2 points. No dependency match coefficient applies, so no further points are added. Accordingly, as shown in FIG. 24, in the search results arranged in order of score, “ETC card cannot be used” (score: 6.0 points) is at the top, followed by “I want to use an ETC card” (score: 2.4 points).
In the present example, “credit card” and “ETC card” share the “card” part. However, unless “credit” is registered as a separate word in the system dictionary 5501 used for morphological analysis, the two are not obtained as matching phrases, because word element matching is executed in the all-elements type.
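The profile A arithmetic above can be reproduced with a short calculation. The following Python sketch is a simplified model of the worked example, not the scoring routine of the Japanese analysis server 500: the function name, its parameters, and the way matches are reduced to three counts (words matched without an attribute, predicate attribute matches, and dependency pairs) are assumptions made for illustration.

```python
BASE_POINT = 1.2  # reference point used in the description

def score(plain_words, predicate_attr_matches, dependency_pairs,
          predicate_coef=1.0, dependency_coef=1.0):
    """Sum the reference points for each kind of match, weighted by its coefficient."""
    word_points = BASE_POINT * plain_words
    predicate_points = BASE_POINT * predicate_coef * predicate_attr_matches
    dependency_points = BASE_POINT * dependency_coef * dependency_pairs
    return round(word_points + predicate_points + dependency_points, 1)

# Counts taken from the worked example for match profile A (all coefficients 1.0).
candidates = {
    "ETC card cannot be used": score(2, 1, 2),
    "I want to use an ETC card": score(2, 0, 0),
    "Credit card cannot be used": score(0, 1, 0),
}

# Present the matched single sentences in descending order of score.
ranking = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
for sentence, points in ranking:
    print(points, sentence)
```

The resulting order places “ETC card cannot be used” first and “I want to use an ETC card” second, in line with the ranking described for FIG. 24.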

Next, the search results obtained based on the match profile B in the example shown in FIGS. 21 to 23 will be described with reference to FIG. 25. FIG. 25 is a schematic diagram showing an example of a screen in which the search results obtained based on the match profile B are displayed on the suggestion screen 102C.
In the scoring process shown in FIG. 25, the dependency match coefficient is predetermined as “present” (the points are multiplied by 2.0) in the score mode information. For example, for “ETC card cannot be used”, the words and attributes of “ETC”, “card”, and “use” match, giving 3.6 points as with profile A. In addition, since “ETC” and “use”, and “card” and “use”, each form a dependency pair, the dependency match coefficient of 2.0 is applied, and 1.2 points × 2.0 × 2 = 4.8 points are added, for a total of 8.4 points. The scoring of the other matched single sentences is the same as described for FIG. 24, and a detailed description is therefore omitted.

Next, the search results obtained based on the match profile C in the example shown in FIGS. 21 to 23 will be described with reference to FIG. 26. FIG. 26 is a schematic diagram showing an example of a screen in which the search results obtained based on the match profile C are displayed on the suggestion screen 102C.
In the scoring process shown in FIG. 26, the predicate attribute match coefficient is predetermined as “present” (the points are multiplied by 2.0) in the score mode information. For example, for “ETC card cannot be used”, the words “ETC” and “card” match with no attribute, giving 1.2 points + 1.2 points = 2.4 points. Further, since “use” matches on both the word and the attribute “use (negation)”, the predicate attribute match coefficient of 2.0 is applied, giving 1.2 points × 2.0 = 2.4 points.
As with profile A, since “ETC” and “use”, and “card” and “use”, each form a dependency pair, the dependency match coefficient of 1.0 (no weighting) is applied, and 1.2 points × 1.0 × 2 = 2.4 points are added, for a total of 7.2 points. On the other hand, “Credit card cannot be used” matches only on the word “use” with the attribute “use (negation)”, so the predicate attribute match coefficient of 2.0 is applied, giving 1.2 points × 2.0 = 2.4 points.

In this way, different search results can be obtained by using the match profiles A to C, in which different score mode information is set.
Consequently, there is no need to create separate match dictionary data for each match mode to be used; a single set of match dictionary data can be used to perform searches with multiple match modes and combinations thereof.
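To show how the same matches yield different scores under each profile, the following Python sketch tabulates the two sentences discussed for profiles A to C. It is a simplified model under stated assumptions: the reference point is 1.2, each sentence is reduced to hypothetical counts of words matched without an attribute, predicate attribute matches, and dependency pairs, and each profile is reduced to its predicate attribute and dependency coefficients. For profile B, the terms stated in the description for “ETC card cannot be used” (3.6 points plus 4.8 points) sum to 8.4 points.

```python
BASE_POINT = 1.2  # reference point used in the description

# (predicate attribute coefficient, dependency coefficient) for each profile.
COEFS = {"A": (1.0, 1.0), "B": (1.0, 2.0), "C": (2.0, 1.0)}

# (words matched without attribute, predicate attribute matches, dependency pairs).
COUNTS = {
    "ETC card cannot be used": (2, 1, 2),
    "Credit card cannot be used": (0, 1, 0),
}

def score(counts, coefs):
    """Weighted sum of reference points for one sentence under one profile."""
    words, predicates, pairs = counts
    predicate_coef, dependency_coef = coefs
    total = BASE_POINT * (words + predicate_coef * predicates + dependency_coef * pairs)
    return round(total, 1)

table = {
    profile: {sentence: score(counts, coefs) for sentence, counts in COUNTS.items()}
    for profile, coefs in COEFS.items()
}

for profile, row in table.items():
    print(profile, row)
```

The match dictionary data plays no part in the difference between the rows; only the coefficients change, which is the point of keeping score mode information in the profile.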

Next, the characteristics of the search results for each match mode will be described with reference to FIGS. 27 to 31.
FIG. 27 is a diagram showing search results, namely the matched single sentences and the combinations of match modes by which each matched single sentence was obtained.
For example, it shows that “ETC card cannot be used” was obtained as a matched single sentence in all of (1) word element matching, (2) the combination of word element matching and attribute matching, (3) dependency matching, and (4) the combination of dependency matching and attribute matching. The search key sentence is “The ETC card cannot be used at the store.”

FIG. 28 is a schematic diagram showing the search results obtained by (1) word element matching in the match mode definition PA1 of the match profile. As shown in FIG. 28, the matched single sentences include sentences with opposite meanings, such as “ETC card cannot be used” and “I want to use an ETC card”. Sentences in which only the word “ETC card” matches, such as “I lost my ETC card.”, and sentences in which only the word “use” matches, such as “Credit card cannot be used”, are also obtained as matched sentences.
Thus, this match profile enables a search that gathers results over a wide range, including similar sentences, so that a plurality of matched single sentences are obtained. If a very large number of matched single sentences is obtained, user convenience may actually suffer. However, by weighting the matched words, the Japanese analysis server 500 can display the matched single sentences as search results evaluated on the basis of their scores. Therefore, even when many similar sentences are retrieved over a wide range, the search results can be prioritized and displayed in order of priority according to their scores.

FIG. 29 is a schematic diagram showing the search results obtained by (2) the combination of word element matching and attribute matching in the match mode definition PA1 of the match profile. As shown in FIG. 29, the matched single sentences again include sentences with opposite meanings, such as “ETC card cannot be used” and “I want to use an ETC card”. However, since “use” here matches on the attribute “use (negation)”, the difference between the scores of the two sentences can be made larger than in the example shown in FIG. 28. By weighting the score for the attributes of matching single sentences in this way, matched single sentences whose meaning is close to that of the search key sentence can be displayed higher.

FIG. 30 is a schematic diagram showing the search results obtained by (3) dependency matching in the match mode definition PA1 of the match profile. As shown in FIG. 30, single sentences that differ in meaning but relate to “using an ETC card”, such as “ETC card cannot be used”, “I want to use an ETC card”, and “ETC card is easy to use”, are matched. When this match mode is used, a search that emphasizes dependency relationships can be performed regardless of attributes.

FIG. 31 is a schematic diagram showing the search results obtained by (4) the combination of dependency matching and attribute matching in the match mode definition PA1 of the match profile. As shown in FIG. 31, a single sentence whose meaning is quite close to that of the search key sentence, such as “ETC card cannot be used”, can be obtained as a matched single sentence. This is because, in the parent-child relationship type of dependency matching nodes, a search result is obtained only when the parent node and the child node of the dependency relationship each match and, in addition, the attribute matches for the word corresponding to the parent node or the child node.
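The way the four match modes widen or narrow the result set can be illustrated with a toy model. The following Python sketch is an assumption-laden simplification, not the matching logic of the match dictionary: each sentence is hand-annotated with hypothetical (word, attribute) tokens and (child, parent) dependency pairs, the attribute labels are invented, inflected forms of “use” are treated as one word, and mode (4) is approximated by requiring both ends of a shared dependency pair to agree on word and attribute.

```python
def sentence(tokens, deps):
    return {"tokens": set(tokens), "deps": set(deps)}

# Search key sentence: "The ETC card cannot be used at the store."
KEY = sentence([("ETC", None), ("card", None), ("store", None), ("use", "negation")],
               [("ETC", "use"), ("card", "use"), ("store", "use")])

CANDIDATES = {
    "ETC card cannot be used": sentence(
        [("ETC", None), ("card", None), ("use", "negation")],
        [("ETC", "use"), ("card", "use")]),
    "I want to use an ETC card": sentence(
        [("ETC", None), ("card", None), ("use", "desire")],
        [("ETC", "use"), ("card", "use")]),
    "ETC card is easy to use": sentence(
        [("ETC", None), ("card", None), ("use", "ease")],
        [("ETC", "use"), ("card", "use")]),
    "I lost my ETC card": sentence(
        [("ETC", None), ("card", None), ("lose", "past")],
        [("ETC", "lose"), ("card", "lose")]),
    "Credit card cannot be used": sentence(
        [("credit card", None), ("use", "negation")],
        [("credit card", "use")]),
}

def words(s):
    return {w for w, _ in s["tokens"]}

def matches(key, cand, mode):
    shared_deps = key["deps"] & cand["deps"]
    if mode == 1:  # (1) word element matching: any shared word
        return bool(words(key) & words(cand))
    if mode == 2:  # (2) word element + attribute: any shared (word, attribute) token
        return bool(key["tokens"] & cand["tokens"])
    if mode == 3:  # (3) dependency matching: any shared (child, parent) pair
        return bool(shared_deps)
    if mode == 4:  # (4) dependency + attribute: shared pair whose ends also agree on attribute
        agreed = {w for w, _ in (key["tokens"] & cand["tokens"])}
        return any(c in agreed and p in agreed for c, p in shared_deps)
    raise ValueError(f"unknown match mode: {mode}")

for mode in (1, 2, 3, 4):
    print(mode, [text for text, c in CANDIDATES.items() if matches(KEY, c, mode)])
```

Under these assumptions, modes (1) and (2) return all five sentences (leaving the ordering to the score), mode (3) returns the three sentences about using an ETC card, and mode (4) returns only “ETC card cannot be used”.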

Therefore, when an accurate answer to a customer's inquiry must be found and given in a short time, as in the case of an operator at a call center, combining multiple match modes makes it possible to narrow the search down to a smaller number of results according to the dependency relationships and the attributes contained in the words. For example, a search can be performed that reflects fine differences in meaning, such as what kind of request or complaint is being made about what kind of product, or what the customer finds hard to understand.

As described above, the match modes each have different characteristics. By selecting the match mode or combination of match modes to be used according to the purpose of the search, a search suited to the user can be realized without having to rebuild the dictionary data.
In addition, compared with the prior art, switching among a plurality of profiles for a single set of dictionary data makes it possible to greatly reduce the overall amount of dictionary data.
As described above, the Japanese analysis server 500 according to the present embodiment has, for each match profile, the match mode definition PA1 in which the match modes to be used are registered, and also has the relative appearance frequency flag PA2 through the synonym match coefficient PA10 so that the score mode conditions and coefficients can be adjusted for each match profile. The Japanese analysis server 500 can thereby use the match mode determined according to the match profile. Consequently, the match dictionary data in the match dictionary storage unit 503 need not be created for each match mode, and matching in different match modes can be performed using a single set of match dictionary data. This reduces the storage capacity that would otherwise be needed to store multiple sets of match dictionary data, as well as the labor of creating match dictionary data for each match mode. Furthermore, since the associated match mode is uniquely determined when the user selects a search service, the match mode can be changed easily by the user changing the search service.

In addition, the search processing unit 501 of the Japanese analysis server 500 according to the present embodiment uses the matching mode information to obtain the sentences and simple sentences that satisfy the matching condition, and then calculates a score for each matched sentence or simple sentence. The score thus indicates which of the matched sentences and simple sentences are closer in meaning to the search key sentence.
Accordingly, the match mode information is used to narrow the search results to those that satisfy the matching condition. A score is then calculated for each matched result according to the attributes of the search key sentence and its sentence structure (dependency relations), and the matched sentences and simple sentences can be prioritized based on the scores obtained. This makes it possible to obtain matched sentences and simple sentences whose meaning is closer to that of the search key sentence.
Furthermore, in the search processing unit 501 of the Japanese analysis server 500 according to the present embodiment, the matching mode information and the score mode information are set independently, without one being determined by the other. This enables a more multifaceted search suited to the characteristics of the search key sentence and to the purpose and conditions of the search.
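The two-stage flow described here (narrow by matching condition, then score and rank, with the two settings chosen independently) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the embodiment's algorithm: whitespace tokenization stands in for the Japanese morphological and dependency analysis, and every function name and scoring formula is hypothetical.

```python
from typing import Callable, List, Set, Tuple


def tokens(sentence: str) -> Set[str]:
    # Assumption: whitespace splitting replaces real linguistic analysis.
    return set(sentence.lower().split())


# Stage 1 options: matching conditions (hypothetical examples).
def any_word_match(key: str, sentence: str) -> bool:
    return bool(tokens(key) & tokens(sentence))


def all_words_match(key: str, sentence: str) -> bool:
    return tokens(key) <= tokens(sentence)


# Stage 2 options: score modes (hypothetical examples).
def overlap_count(key: str, sentence: str) -> float:
    return float(len(tokens(key) & tokens(sentence)))


def overlap_ratio(key: str, sentence: str) -> float:
    sent = tokens(sentence)
    return len(tokens(key) & sent) / len(sent) if sent else 0.0


def search(
    key: str,
    sentences: List[str],
    matches: Callable[[str, str], bool],
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    # Stage 1: keep only sentences that satisfy the matching condition.
    hits = [s for s in sentences if matches(key, s)]
    # Stage 2: score the survivors and order them by descending score.
    scored = [(s, score(key, s)) for s in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because `matches` and `score` are separate parameters, any matching condition can be combined with any score mode, which mirrors the independence of the two settings described above.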

The client terminal device 100 according to the present embodiment is, for example, an input terminal device used for input work at a call center or the like, and preferably consists of an information processing device such as a workstation or a personal computer. For the client terminal device 100 according to the present embodiment, the match dictionary data may be built from vocabulary that appears frequently in telephone responses to users at a call center that provides user support for mobile phones.
This enables more efficient text input in call center work and similar tasks where the content of a telephone exchange must be entered in real time.

The operation process of the information retrieval system 1 described above can be provided as a program to be executed by a computer, or as a computer-readable recording medium storing that program; the above processing is performed when a computer system reads and executes the program. The “computer system” here includes a CPU, various memories, an OS, and hardware such as peripheral devices.
The “computer system” also includes a homepage providing environment (or display environment) if a WWW system is used.
The “computer-readable recording medium” refers to a storage device such as a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a hard disk built into a computer system.

Furthermore, the “computer-readable recording medium” also includes media that hold the program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
The program may be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the “transmission medium” that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line.
The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

1 Information retrieval system
100 Client terminal device
300 Web server
500 Japanese analysis server
700 Database file server

Claims

An information search apparatus comprising:
an input unit that inputs a search key sentence composed of a plurality of words;
an analysis unit that analyzes the search key sentence and obtains an analysis result relating to the words constituting the search key sentence;
a match dictionary storage unit that stores, as match dictionary information about a sentence composed of at least one clause, rule information representing information about the clauses included in the sentence, in dictionary information structured as a tree in which each clause, composed of at least one of the words, is a subtree node;
a match profile storage unit that stores match profile information in which a matching condition for checking the relationship between the match dictionary information stored in the match dictionary storage unit and the search key sentence is associated with an evaluation criterion for evaluating the degree of matching with the search key sentence for words that satisfy the matching condition; and
a search processing unit that, based on the match profile information, collates the search key sentence with the match dictionary information according to the associated matching condition and, for each sentence found by the collation to satisfy the matching condition, calculates a score representing the degree of matching between the search key sentence and the match dictionary information according to the evaluation criterion associated in the match profile information.

The information search apparatus according to claim 1, wherein
the evaluation criterion indicates whether or not a score corresponding to the degree of matching is to be given to a word that satisfies the matching condition, and
the search processing unit obtains the score by calculating, for each sentence that satisfies the matching condition, the scores given to the words that satisfy the matching condition in accordance with the evaluation criterion.

The information search apparatus according to claim 1 or 2, wherein
the match profile storage unit stores a plurality of pieces of match profile information, each associated, according to the purpose of the search, with at least one of a plurality of matching conditions having different characteristics.

The information search apparatus according to claim 1, wherein
the match profile storage unit associates, as the matching condition, at least one of word element matching, attribute matching, and dependency matching.

The information search apparatus according to any one of claims 1 to 4, wherein
the input unit inputs a search target sentence composed of a plurality of words,
the analysis unit analyzes the search target sentence and obtains an analysis result relating to the words constituting the search target sentence, and
the information search apparatus further comprises a dictionary creation unit that, based on the analysis result, creates the match dictionary information about a sentence composed of at least one clause and stores it in the match dictionary storage unit, the match dictionary information being dictionary information structured as a tree in which rule information, including character information relating to the character string of each word and attribute information indicating the attribute of the word, is associated with a clause composed of at least one of the words and each clause is a subtree node.

An information search method in which:
an input unit accepts input of a search key sentence composed of a plurality of words;
an analysis unit analyzes the search key sentence and obtains an analysis result relating to the words constituting the search key sentence; and
a search processing unit
reads match profile information from a match profile storage unit that stores match profile information in which a matching condition for checking the relationship between match dictionary information and the search key sentence is associated with an evaluation criterion for evaluating the degree of matching with the search key sentence for words that satisfy the matching condition,
collates, based on the match profile information and according to the associated matching condition, the search key sentence with the match dictionary information held in a match dictionary storage unit that stores, as match dictionary information about a sentence composed of at least one clause, rule information representing information about the clauses included in the sentence, in dictionary information structured as a tree in which each clause, composed of at least one of the words, is a subtree node, and
calculates, for each sentence found by the collation to satisfy the matching condition, a score representing the degree of matching between the search key sentence and the match dictionary information according to the evaluation criterion associated in the match profile information.

A program for causing a computer to function as:
input means for inputting a search key sentence composed of a plurality of words;
analysis means for analyzing the search key sentence and obtaining an analysis result relating to the words constituting the search key sentence; and
search processing means for
reading match profile information from a match profile storage unit that stores match profile information in which a matching condition for checking the relationship between match dictionary information and the search key sentence is associated with an evaluation criterion for evaluating the degree of matching with the search key sentence for words that satisfy the matching condition,
collating, based on the match profile information and according to the associated matching condition, the search key sentence with the match dictionary information held in a match dictionary storage unit that stores, as match dictionary information about a sentence composed of at least one clause, rule information representing information about the clauses included in the sentence, in dictionary information structured as a tree in which each clause, composed of at least one of the words, is a subtree node, and
calculating, for each sentence found by the collation to satisfy the matching condition, a score representing the degree of matching between the search key sentence and the match dictionary information according to the evaluation criterion associated in the match profile information.

The program according to claim 7, wherein
the input means inputs a search target sentence composed of a plurality of words,
the analysis means analyzes the search target sentence and obtains an analysis result relating to the words constituting the search target sentence, and
the program further causes the computer to function as dictionary creation means for creating, based on the analysis result, the match dictionary information about a sentence composed of at least one clause and storing it in the match dictionary storage unit, the match dictionary information being dictionary information structured as a tree in which rule information, including character information relating to the character string of each word and attribute information indicating the attribute of the word, is associated with a clause composed of at least one of the words and each clause is a subtree node.