JP7168411B2

JP7168411B2 - Information processing system and information processing method

Info

Publication number: JP7168411B2
Application number: JP2018202130A
Authority: JP
Inventors: 美沙佐藤; 孝介柳井; 康嗣森本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-10-26
Filing date: 2018-10-26
Publication date: 2022-11-09
Anticipated expiration: 2038-10-26
Also published as: JP2020067971A

Description

The present invention relates to an information processing system and an information processing method for processing information.

Patent Literature 1 discloses a dictionary creation device that learns semantic categories to extend a semantic category dictionary and that can correct the learning results even if they contain errors. The device has a semantic category learning unit which, taking into account the semantic categories assigned by a semantic category assigning unit, updates both the correspondence between words and semantic categories held in the semantic category dictionary and the semantic category extraction rules. In addition to this learning unit, the device has a semantic category editing unit that presents the updated correspondence between words and semantic categories and accepts corrections to it.

Patent Literature 1: JP 2007-213336 A

However, the conventional technology described above has the problem that it is difficult to increase the number of words in the word dictionary DB. The dictionary creation device of Patent Literature 1 retrieves example sentences containing the relationship to be extracted by keyword search alone. Adding a new example sentence therefore requires editing both the extraction rules and the semantic category dictionary.

An object of the present invention is to collect unregistered words efficiently.

The information processing system and information processing method disclosed in the present application use a processor that executes a program and a storage device that stores the program. The system has a word dictionary database that stores word groups, each being a set of words grouped by a predetermined attribute, and a rule database that stores tree structure patterns, each obtained by using the word groups to abstract tree structure data that indicates the relationships between words in a sentence with respect to sentence elements. The processor executes: an acquisition process of acquiring a target sentence that includes a combination of a word of a first sentence element to which no word group applies and a word of a second sentence element to which no word group applies; a determination process of determining whether the acquired target sentence matches a specific tree structure pattern from which a third sentence element, to which a word group applies, has been excluded; and an extraction process of extracting the word of the third element from a target sentence determined to match the specific tree structure pattern and outputting the extraction result.
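As a rough illustration of the acquisition, determination, and extraction processes summarized above, the following Python sketch treats each sentence as a (subject, verb, object) triple instead of a parsed syntax tree. The function name `collect_unregistered_words`, the triple representation, the sample sentences, and the filtering of already registered verbs are all assumptions made for this sketch; the processing actually used is the one described in the embodiments.

```python
def collect_unregistered_words(sentences, subject, obj, registered_verbs):
    """Collect verbs that link a given subject and object but are not registered.

    sentences: iterable of (subject, verb, object) triples (assumed representation).
    The subject and object play the role of the first and second elements; the
    verb is the third element, whose position the pattern leaves open.
    """
    candidates = []
    for s, verb, o in sentences:
        matches_pattern = (s == subject and o == obj)  # verb is excluded from the test
        if matches_pattern and verb not in registered_verbs and verb not in candidates:
            candidates.append(verb)
    return candidates


# Hypothetical sample data.
sentences = [
    ("X", "suppress", "A"),
    ("X", "reduce", "A"),
    ("Y", "reduce", "B"),
    ("X", "reduce", "A"),
]
candidates = collect_unregistered_words(sentences, "X", "A", {"suppress", "decrease"})
```

In this sketch only sentences that share the given subject and object are considered, so the extracted verbs are candidates for addition to the word group rather than confirmed members.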

According to a representative embodiment of the present invention, unregistered words can be collected efficiently. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is an explanatory diagram showing database maintenance example 1.
FIG. 2 is an explanatory diagram showing database maintenance example 2.
FIG. 3 is an explanatory diagram showing database maintenance example 3.
FIG. 4 is a block diagram showing a hardware configuration example of a computer.
FIG. 5 is an explanatory diagram showing an example of the contents stored in the word dictionary DB.
FIG. 6 is an explanatory diagram showing an example of the contents stored in the rule DB.
FIG. 7 is an explanatory diagram showing an example of the contents stored in the data store.
FIG. 8 is an explanatory diagram showing an example of a text.
FIG. 9 is an explanatory diagram showing an example of tree structure data and a tree structure pattern.
FIG. 10 is an explanatory diagram showing an example of a pattern expression.
FIG. 11 is an explanatory diagram showing a conversion example using the pattern expression shown in FIG. 10.
FIG. 12 is a flowchart showing an example of an information processing procedure performed by the information processing system.
FIG. 13 is an explanatory diagram showing a usage example of the information processing system.
FIG. 14 is an explanatory diagram showing display screen example 1 of the information processing system.
FIG. 15 is an explanatory diagram showing display screen example 2 of the information processing system.
FIG. 16 is an explanatory diagram showing display screen example 3 of the information processing system.
FIG. 17 is an explanatory diagram showing display screen example 4 of the information processing system.
FIG. 18 is an explanatory diagram showing display screen example 5 of the information processing system.
FIG. 19 is an explanatory diagram showing display screen example 6 of the information processing system.
FIG. 20 is a flowchart showing an example of a processing procedure in the usage example of the information processing system.
FIG. 21 is an explanatory diagram showing a word registration example.
FIG. 22 is a flowchart showing an example of a word registration processing procedure.
FIG. 23 is an explanatory diagram showing display screen example 7 of the information processing system.
FIG. 24 is an explanatory diagram showing display screen example 8 of the information processing system.

This specification describes maintenance examples for the word dictionary DB (database) and examples of additionally registering words in the word dictionary DB separately. The maintenance examples are described with reference to FIGS. 1 to 20, and the additional registration examples with reference to FIGS. 21 to 24.

[1. Maintenance Examples for the Word Dictionary DB]
FIG. 1 is an explanatory diagram showing database maintenance example 1. FIG. 1 illustrates maintenance of the word dictionary DB 101. The word dictionary DB 101 stores one or more word groups. A word group is a set of words grouped by a predetermined attribute, that is, a characteristic that the word group exhibits. Specific examples of the predetermined attribute include verbs whose subject takes the case particle "ga" in Japanese sentences and verbs that co-occur with a specific adverb. A word group may also consist of synonyms or near-synonyms, or of words used in a specific field (investment, medicine, and so on). In FIG. 1, as an example, word group Ga is a synonym group containing "suppress" and "decrease".

The rule DB 102 is a database that stores tree structure patterns representing rules. A tree structure pattern is data obtained by using word groups to abstract tree structure data, which indicates the relationships between words in a sentence with respect to sentence elements. Sentence elements are, for example, the subject, the predicate, and the object. The tree structure data is, for example, a syntax tree generated according to phrase structure rules by morphological analysis and phrase structure analysis (hereinafter, syntactic analysis). Rule Ra in FIG. 1 is a tree structure pattern with the word order subject (wildcard), predicate, object (wildcard), in which the verb forming the predicate is word group Ga.

The data store 103 stores the text data of various sentences (for example, sentences in academic papers and books, sentences in newspapers and magazines, and sentences written on web pages).

(A) When the data store 103 is searched with the tree structure pattern of rule Ra (S11), search result 111 is obtained. Every sentence in search result 111 is text data that satisfies rule Ra. Suppose that, starting from (A), maintenance of the word dictionary DB 101 adds "reduce" to word group Ga, producing state (B). (B) When the data store 103 is searched with the tree structure pattern of rule Ra (S12), search result 112 is obtained. Even though "reduce" has been added to word group Ga, the search can be performed without modifying rule Ra.

In this case, every sentence in search result 112 is text data that satisfies rule Ra, and search result 112 consists of search result 111 plus the sentences containing "reduce", namely "Z reduces D." and "X is going to reduce E." In this way, maintaining only the word dictionary DB 101, without maintaining the rule DB 102, is enough for searches to reflect the maintenance of the word dictionary DB 101.

Conversely, suppose that, starting from (B), maintenance of the word dictionary DB 101 deletes "reduce" from word group Ga, producing state (A). (A) When the data store 103 is searched with the tree structure pattern of rule Ra, search result 111 is obtained. Even though "reduce" has been deleted from word group Ga, the search can be performed without modifying rule Ra.

In this case, every sentence in search result 111 is text data that satisfies rule Ra, and the sentences "Z reduces D." and "X is going to reduce E.", which contain "reduce" and appeared in search result 112, are no longer retrieved. A word is changed by performing the deletion and addition described above. For example, to change "reduce" to "drop" in (B), "reduce" is deleted from word group Ga and "drop" is added. In this way, maintaining only the word dictionary DB 101, without maintaining the rule DB 102, is enough for searches to reflect the maintenance of the word dictionary DB 101.
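The behavior described for FIG. 1 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: sentences are reduced to (subject, verb lemma, object, text) tuples instead of parsed syntax trees, the first two sentences are made up for illustration, and the names `word_groups`, `Rule`, `data_store`, and `search` are assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical stand-in for word dictionary DB 101: group ID -> set of verb lemmas.
word_groups = {"Ga": {"suppress", "decrease"}}


@dataclass(frozen=True)
class Rule:
    """Simplified rule: subject (wildcard), verb from a word group, object (wildcard)."""
    rule_id: str
    verb_group: str


# Hypothetical stand-in for data store 103.
data_store = [
    ("X", "suppress", "A", "X suppresses A."),   # invented sample sentence
    ("Y", "decrease", "B", "Y decreases B."),    # invented sample sentence
    ("Z", "reduce", "D", "Z reduces D."),
    ("X", "reduce", "E", "X is going to reduce E."),
]


def search(rule, groups, store):
    """Return the texts whose verb belongs to the word group the rule refers to."""
    verbs = groups[rule.verb_group]  # resolved at search time, not stored in the rule
    return [text for _subj, verb, _obj, text in store if verb in verbs]


rule_ra = Rule("Ra", "Ga")
result_111 = search(rule_ra, word_groups, data_store)  # state (A)
word_groups["Ga"].add("reduce")                        # dictionary maintenance only
result_112 = search(rule_ra, word_groups, data_store)  # state (B), rule unchanged
```

Because the rule holds only the group ID, removing "reduce" from the group again restores the first result without any edit to `rule_ra`.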

FIG. 2 is an explanatory diagram showing database maintenance example 2. FIG. 2 illustrates maintenance of the rule DB 102. (A) is the same as (A) in FIG. 1. (B) shows a newly added rule Rb. Rule Rb is a tree structure pattern with the word order subject (wildcard), predicate (auxiliary verb (wildcard) and verb), object (wildcard), in which the verb is word group Ga. In other words, rule Rb is the tree structure pattern obtained by adding an auxiliary verb to rule Ra.

(B) When the data store 103 is searched with the tree structure pattern of rule Rb (S13), search result 210 is obtained. Every sentence in search result 210 is text data that satisfies rule Rb. Deleting a rule likewise requires only deleting rule Rb from the rule DB 102; the word dictionary DB 101 does not need to be maintained. A rule is changed by performing the deletion and addition described above. For example, to change rule Ra into rule Rb, rule Ra is retrieved and an auxiliary verb (wildcard) is added in front of the verb (word group Ga). In this way, maintaining only the rule DB 102, without maintaining the word dictionary DB 101, is enough for searches to reflect the maintenance of the rule DB 102.

FIG. 3 is an explanatory diagram showing database maintenance example 3. FIG. 3 illustrates maintenance of the rule DB 102. When a rule uses a word group, there is no need to register a separate rule in the rule DB 102 for each word in the group. For example, because rule Ra uses word group Ga, the per-verb rules Ra1 and Ra2 do not need to be registered in the rule DB 102. This suppresses duplication of rules and reduces the memory used by the rule DB 102.

<Hardware Configuration Example of Information Processing System>
Next, a hardware configuration example of the one or more computers 400 that make up the information processing system is described.

FIG. 4 is a block diagram showing a hardware configuration example of a computer. The computer 400 has a processor 401, a storage device 402, an input device 403, an output device 404, and a communication interface (communication IF 405). The processor 401, storage device 402, input device 403, output device 404, and communication IF 405 are connected by a bus 406. The processor 401 controls the computer 400. The storage device 402 serves as a work area for the processor 401. The storage device 402 is also a non-transitory or transitory recording medium that stores various programs and data. Examples of the storage device 402 include ROM (Read Only Memory), RAM (Random Access Memory), an HDD (Hard Disk Drive), and flash memory. The input device 403 is used to input data. Examples of the input device 403 include a keyboard, a mouse, a touch panel, a numeric keypad, and a scanner. The output device 404 outputs data. Examples of the output device 404 include a display and a printer. The communication IF 405 connects to a network and transmits and receives data.

<Examples of Database Contents>
Next, examples of the contents stored in the word dictionary DB 101, the rule DB 102, and the data store 103 are described. The word dictionary DB 101, the rule DB 102, and the data store 103 may be implemented by the storage device 402 in the computer 400 shown in FIG. 4, or by another computer accessible via the communication IF 405. In the following descriptions of databases and tables, the value of an AA field bbb (AA being the field name and bbb the reference numeral) may be written as AA bbb. For example, the value of the group ID field 501 is written as group ID 501.

FIG. 5 is an explanatory diagram showing an example of the contents stored in the word dictionary DB 101. The word dictionary DB 101 has a group ID field 501, an attribute field 502, a word field 503, and a part-of-speech field 504. The combination of values of fields 501 to 504 in one row defines an entry representing one word group. The group ID field 501 is a storage area that stores a group ID. The group ID 501 is identification information that uniquely identifies a word group.

The attribute field 502 is a storage area that stores an attribute. The attribute 502 is a characteristic that the word group exhibits. Examples include verbs whose subject takes the case particle "ga" in Japanese sentences and verbs that co-occur with a specific adverb. The attribute may also indicate synonyms or near-synonyms, or words used in a specific field (investment, medicine, and so on).

The word field 503 is a storage area that stores words. A word 503 is a word belonging to the word group. An operator (a user or an administrator) can add, change, and delete words 503 in the word field 503.

The part-of-speech field 504 is a storage area that stores a part of speech. The part of speech 504 is a category that classifies the words belonging to the word group by form and role. The part of speech 504 may also specify the form of the word. For English words, a verb form is selected from, for example, the base form (present tense), the past tense, the past participle, and the present progressive; a noun form from, for example, uncountable, countable, singular, and plural; and an adjective or adverb form from, for example, the base form, the comparative, and the superlative. When only the part of speech is specified (no form is specified), all forms of that part of speech 504 may be treated as included.

FIG. 6 is an explanatory diagram showing an example of the contents stored in the rule DB 102. The rule DB 102 has a rule ID field 601 and a tree structure pattern field 602. The combination of values of fields 601 and 602 in one row defines an entry representing one rule. The rule ID field 601 is a storage area that stores a rule ID. The rule ID 601 is identification information that uniquely identifies a rule. The tree structure pattern field 602 is a storage area that stores a tree structure pattern. The operator can add, change, and delete tree structure patterns 602 in the tree structure pattern field 602. FIGS. 1 to 3 show rules in which the verb in the tree structure pattern 602 is a word group and the subject and object are wildcards, but a word group may instead be applied to a phrase other than the predicate, such as the subject or object, with the remaining phrases as wildcards. A rule may also apply multiple word groups to a single tree structure pattern 602.

FIG. 7 is an explanatory diagram showing an example of the contents stored in the data store 103. The data store 103 has a headword field 701, a text field 702, and a tree structure data field 703. The combination of values of fields 701 to 703 in one row defines an entry for one sentence.

The headword field 701 is a storage area that stores headwords and is used for index searches. The headword field 701 has multiple annotation fields (three in FIG. 7: annotation a0 field 710 to annotation a2 field 712). The annotation a0 field 710 is a storage area that stores a headword 701 set in advance as annotation a0. The annotation a1 field 711 and annotation a2 field 712 are storage areas that store the headwords 701 serving as annotation a1 and annotation a2. The annotation a1 field 711 and annotation a2 field 712 are initially blank; annotation a1 and annotation a2 are added by the index update described later.

The text field 702 is a storage area that stores a text. The text 702 is the text data from which the tree structure data 703 is derived by analysis. The tree structure data field 703 is a storage area that stores the tree structure data obtained by parsing the text according to phrase structure rules.
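The record layouts described for FIGS. 5 to 7 could be modelled along the following lines. This is only a sketch: the class and attribute names, the use of plain strings for tree structure patterns and tree structure data, and the sample values are assumptions, since the text specifies the fields but not a storage format.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class WordGroupEntry:
    """One row of the word dictionary DB (fields 501 to 504)."""
    group_id: str                 # uniquely identifies the word group
    attribute: str                # characteristic shared by the group
    words: Set[str] = field(default_factory=set)
    part_of_speech: str = ""      # may also carry a form, e.g. "verb, base form"


@dataclass
class RuleEntry:
    """One row of the rule DB (fields 601 and 602)."""
    rule_id: str
    tree_pattern: str             # serialized tree structure pattern (format assumed)


@dataclass
class SentenceEntry:
    """One row of the data store (fields 701 to 703)."""
    text: str
    tree_data: Optional[str] = None       # parse result; None until parsed
    annotation_a0: Optional[str] = None   # preset headword
    annotation_a1: Optional[str] = None   # blank until an index update fills it
    annotation_a2: Optional[str] = None


# Hypothetical sample values.
group = WordGroupEntry("affect", "synonyms", {"increase", "cause"}, "verb")
entry = SentenceEntry(text="X increases A.")
entry.annotation_a1, entry.annotation_a2 = "X", "A"   # simulated index update
```

Keeping the annotation fields optional mirrors the description that annotations a1 and a2 start out blank and are filled in later.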

<Examples of Various Data>
FIG. 8 is an explanatory diagram showing an example of the text 702. FIG. 8 shows an English text st1 as an example, but the text is not limited to English and may be in another language such as Japanese.

FIG. 9 is an explanatory diagram showing an example of tree structure data and a tree structure pattern. Tree structure data tr1 is the syntax tree obtained by parsing the text st1 of FIG. 8 according to phrase structure rules. In tree structure data tr1, "POS" denotes a part of speech and "ROOT" denotes the root of the syntax tree. A string of one to three uppercase letters indicates the type of part of speech (noun, verb, and so on). Tree structure pattern tp1 is a pattern that the operator produced by editing tree structure data tr1 to remove unnecessary information. Tree structure pattern tp1 represents a word-order rule in which the subject is a wildcard, the predicate is the verb "spin off", and the object is a wildcard.

FIG. 10 is an explanatory diagram showing an example of a pattern expression. The pattern expression 1000 is used when the information processing system executes information processing. An operator who understands the pattern expression 1000 can also edit tree structure data 703 to generate a tree structure pattern 602. In the pattern expression 1000, "_" denotes a leaf node test (a leaf of the syntax tree), "|" denotes alternatives, "#" denotes subtree extraction (a subtree within the syntax tree), "!" denotes negation, "*" denotes zero or more occurrences of a subtree, and "+" denotes one or more occurrences. The pattern expression 1000 shown in FIG. 10 is only an example.

FIG. 11 is an explanatory diagram showing a conversion example using the pattern expression shown in FIG. 10. In tree structure data tr11, the choice between "increase" and "cause", whose part of speech (POS) is verb (VP), is converted into a call (\dic.) to the word group whose group ID 501 is "affect". This produces tree structure pattern tp11, which contains a word group. This conversion is carried out by an editing operation of the operator.
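The conversion described for FIG. 11 can be sketched in Python as a pair of string rewrites. The concrete serialization is an assumption: the text gives only the operator "|" and the call prefix `\dic.`, so the fragment `(VP increase|cause)`, the function names `to_group_call` and `expand_group_calls`, and the contents of `word_groups` are hypothetical.

```python
import re

# Hypothetical dictionary content: group ID -> ordered list of member words.
word_groups = {"affect": ["increase", "cause"]}


def to_group_call(pattern, group_id, groups):
    """Replace the literal alternation of a group's words with a group call."""
    alternation = "|".join(groups[group_id])
    return pattern.replace(alternation, "\\dic." + group_id)


def expand_group_calls(pattern, groups):
    """Resolve every group call back into an alternation of the current members."""
    return re.sub(
        r"\\dic\.(\w+)",
        lambda match: "|".join(groups[match.group(1)]),
        pattern,
    )


tree_data = "(VP increase|cause)"                        # assumed fragment of tr11
tree_pattern = to_group_call(tree_data, "affect", word_groups)
expanded_before = expand_group_calls(tree_pattern, word_groups)

word_groups["affect"].append("raise")                    # dictionary maintenance
expanded_after = expand_group_calls(tree_pattern, word_groups)
```

Resolving the call only when the pattern is used is what lets a dictionary edit change the words a rule matches while the stored pattern stays the same.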

<Example of Information Processing Procedure>
FIG. 12 is a flowchart showing an example of an information processing procedure performed by the information processing system. The information processing system waits for a maintenance request (step S1201: No). A maintenance request is given by an instruction from the processor 401, from a terminal via the communication IF 405, or from the input device 403. When a maintenance request is received (step S1201: Yes), the information processing system determines, from the information contained in the request, whether it is a maintenance request for a word or a maintenance request for a rule (tree structure pattern) (step S1202).

If the request is a maintenance request for a word (step S1202: word), the information processing system determines, from the information contained in the request, whether it asks for the addition or the deletion of a word (step S1203). If a word is to be added (step S1203: add), the information processing system identifies the destination word group in the word dictionary DB 101 (step S1204). Specifically, for example, if the word maintenance request contains the group ID of the destination, the information processing system identifies the word group designated by that group ID 501 as the destination of the word to be added that is contained in the request.

If the word maintenance request does not contain the group ID of the destination, the information processing system may identify the destination word group automatically. For example, if the word to be added was extracted from a text 702 contained in the word maintenance request, the information processing system identifies, in the word dictionary DB 101, a word group whose attribute corresponds to the characteristics of that text. The information processing system then adds the word to the identified destination word group (step S1205) and returns to step S1201.

If, in step S1203, a word is to be deleted (step S1203: delete), the information processing system deletes the word designated in the word maintenance request from the target word groups in the word dictionary DB 101 (step S1206) and returns to step S1201. The target word groups are, for example, all entries of the word dictionary DB 101 if the word maintenance request does not specify a group ID 501, or the entry designated by the group ID 501 if one is specified.

If, in step S1202, the request is a maintenance request for a rule (step S1202: rule), the information processing system determines, from the information contained in the request, whether it asks for the addition or the deletion of a rule (step S1207). If a rule is to be added (step S1207: add), the information processing system adds the rule contained in the rule maintenance request to the rule DB 102 (step S1205) and returns to step S1201.

If, in step S1207, a rule is to be deleted (step S1207: delete), the information processing system deletes from the rule DB 102 the entry with the rule ID 601 contained in the rule maintenance request (step S1209) and returns to step S1201.
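The branching of FIG. 12 can be sketched as a single dispatch function in Python. The request format (a dict with the keys `target`, `action`, `word`, `group_id`, `rule_id`, and `pattern`) and the in-memory dictionaries standing in for the two databases are assumptions made for this sketch, and the automatic selection of a destination group is deliberately left out.

```python
def handle_maintenance_request(request, word_db, rule_db):
    """Apply one maintenance request to the word dictionary or the rule DB.

    word_db: dict mapping group ID -> set of words (stand-in for DB 101).
    rule_db: dict mapping rule ID -> tree structure pattern (stand-in for DB 102).
    """
    target, action = request["target"], request["action"]
    if target == "word":                                  # S1202: word
        word = request["word"]
        group_id = request.get("group_id")
        if action == "add":                               # S1203: add
            if group_id is None:
                raise ValueError("automatic group selection is not sketched here")
            word_db[group_id].add(word)                   # S1204 and S1205
        elif action == "delete":                          # S1203: delete
            # Without a group ID, every word group is a deletion target (S1206).
            targets = [group_id] if group_id is not None else list(word_db)
            for gid in targets:
                word_db[gid].discard(word)
        else:
            raise ValueError("unknown action: " + str(action))
    elif target == "rule":                                # S1202: rule
        if action == "add":                               # S1207: add
            rule_db[request["rule_id"]] = request["pattern"]
        elif action == "delete":                          # S1207: delete, S1209
            rule_db.pop(request["rule_id"], None)
        else:
            raise ValueError("unknown action: " + str(action))
    else:
        raise ValueError("unknown target: " + str(target))


# Hypothetical sample state and requests.
word_db = {"Ga": {"suppress", "decrease"}, "Gb": {"reduce"}}
rule_db = {"Ra": "pattern-for-Ra"}
handle_maintenance_request(
    {"target": "word", "action": "add", "word": "reduce", "group_id": "Ga"},
    word_db, rule_db)
```

In a running system this function would be called from the loop that waits for requests in step S1201; the loop itself is omitted here.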

<Usage example of the information processing system>
FIG. 13 is an explanatory diagram showing a usage example of the information processing system. (1) The information processing system acquires the text stc1 from the data store 103. In (1), the information processing system may acquire the text stc1 by specifying it directly or by an index search using the headword 701. (2) The information processing system converts the acquired text stc1 into tree structure data trc by syntactic analysis. In (2), the information processing system may perform the syntactic analysis itself, or it may send the text stc1 to another computer, which performs the analysis and returns the tree structure data trc to the information processing system. If the tree structure data trc has already been generated, the information processing system instead reads the tree structure data trc associated with the text stc1 from the data store 103.

(3) In response to an editing operation by the operator, the information processing system generates a tree structure pattern from the tree structure data trc and uses it as the rule Rc. Here, the verb word group Gb is assumed to be applied to the predicate of the rule Rc.

(4) Using the tree structure pattern of the rule Rc, the information processing system extracts "X", the subject of the text stc1, as the annotation a1 and "A", the object of the text stc1, as the annotation a2, and displays them on the display screen.

(5) The information processing system registers the rule Rc in the rule DB 102. If a rule with the same content is already registered, the information processing system does not register the rule Rc.

(6) The information processing system registers the tree structure data trc from (2) and the annotations a1 and a2 from (4) in the entry for the text stc1 in the data store 103. The headword 701 of the acquired text stc1 can thus be generated automatically, which makes subsequent index searches more efficient.

(7) The information processing system searches the texts in the data store 103 other than the text stc1, identifies the text stc2 that matches the rule Rc, and registers "J", the subject, as the annotation a1 and "K", the object, as the annotation a2 in the entry for the text stc2 (index update). The automatic generation of the headword 701 thus extends to the other text stc2, which makes subsequent index searches more efficient.
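Steps (4) to (7) can be sketched as follows. This is a simplified illustration under stated assumptions: the tree structure data is reduced to a pre-parsed (subject, predicate, object) triple, a rule is reduced to the ID of its predicate word group, and the function names, the entry layout, and the sample sentences are hypothetical.

```python
def extract_annotations(rule, parsed, word_groups):
    """Step (4): return annotations a1/a2 if the predicate belongs to the rule's word group."""
    subject, predicate, obj = parsed
    if predicate in word_groups[rule["predicate_group"]]:
        return {"a1": subject, "a2": obj}
    return None


def register_rule(rule_db, rule):
    """Step (5): register the rule unless a rule with the same content exists."""
    if rule in rule_db:
        return False
    rule_db.append(rule)
    return True


def update_index(data_store, rule, word_groups):
    """Steps (6) and (7): attach annotations to every entry that matches the rule."""
    updated = []
    for entry in data_store:
        annotations = extract_annotations(rule, entry["parsed"], word_groups)
        if annotations is not None:
            entry["annotations"] = annotations
            updated.append(entry["id"])
    return updated


# Hypothetical data: word group Gb and three pre-parsed texts.
word_groups = {"Gb": {"spins off", "divests"}}
data_store = [
    {"id": "stc1", "parsed": ("X", "spins off", "A"), "annotations": {}},
    {"id": "stc2", "parsed": ("J", "divests", "K"), "annotations": {}},
    {"id": "stc3", "parsed": ("P", "acquires", "Q"), "annotations": {}},
]
rule_c = {"predicate_group": "Gb"}
```

Calling `register_rule([], rule_c)` and then `update_index(data_store, rule_c, word_groups)` annotates stc1 and stc2 and leaves stc3 untouched, mirroring how one rule propagates to other texts in the data store.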

Next, example display screens for the usage example shown in FIG. 13 are described with reference to FIGS. 14 to 19. The display screens of FIGS. 14 to 19 are displayed on a computer 400 in the information processing system.

FIG. 14 is an explanatory diagram showing display screen example 1 of the information processing system. The display screen 1400 has a sample tab 1401, a validate tab 1402, and an index tab 1403. FIG. 14 shows the sample tab 1401. The sample tab 1401 has a search keyword input field 1411, a search button 1412, and a save button 1415. The search keyword input field 1411 is where the operator enters a search keyword. The search button 1412, when operated by the operator, runs an index search over the headwords 701 in the data store 103 and extracts the corresponding texts 702. Although this example describes an index search, a full-text search of the texts 702 may be used instead.

In FIG. 14, assume that "spin off" is entered in the search keyword input field 1411 and the search button 1412 is pressed. As shown in (1) of FIG. 13, the headwords 701 in the data store 103 are then index-searched, and the corresponding texts 702 are displayed as the search result 1413. Each text in the search result 1413 has a check box 1414, and the information processing system selects the texts whose check box 1414 the operator has checked. In FIG. 14, the text st1 is assumed to be selected. The save button 1415 saves the texts selected with the check boxes 1414 from the search result 1413. Pressing the save button 1415 saves the checked text st1 in the data store 103.

FIG. 15 is an explanatory diagram showing display screen example 2 of the information processing system. Display screen example 2 is shown when the validate tab 1402 is selected with the check box 1414 checked in display screen example 1 of FIG. 14. The validate tab 1402 has a confirmation area 1501, a copy area 1502, an analysis button 1503, an annotation button 1504, an add button 1505, and an editing area 1506. The confirmation area 1501 has a selected sentence display area 1510, an annotation a1 display area 1511, and an annotation a2 display area 1512. The selected sentence display area 1510 displays the text selected by checking the check box 1414 in display screen example 1 of FIG. 14. The annotation a1 display area 1511 displays the annotation a1 (subject). The annotation a2 display area 1512 displays the annotation a2 (object).

In display screen example 2, the annotation a1 display area 1511 has a text input field 1513 for the annotation a1. Referring to the text st1 in the selected sentence display area 1510, the operator enters a phrase corresponding to the annotation a1 (subject), for example "Nichiritsu", in the text input field 1513. The annotation a2 display area 1512 has a text input field 1514 for the annotation a2. Referring to the text st1 in the selected sentence display area 1510, the operator enters a phrase corresponding to the annotation a2 (object), for example "home appliance", in the text input field 1514.

The combination of the text st1 displayed in the confirmation area 1501, the phrase "Nichiritsu" entered in the text input field 1513 for the annotation a1, and the phrase "home appliance" entered in the text input field 1514 for the annotation a2 is referred to as the confirmation data set 1500.

The copy button 1515, when operated by the operator, copies the text in the selected sentence display area 1510 to the copy area 1502. The copy area 1502 displays the copy of the text st1 made from the selected sentence display area 1510 when the copy button 1515 is pressed. The analysis button 1503 runs syntactic analysis on the text st1 copied to the copy area 1502 (corresponding to (2) in FIG. 13). The annotation button 1504 extracts the annotations of the text st1 using the tree structure pattern edited in the editing area 1506 (corresponding to (4) in FIG. 13). The add button 1505 adds the tree structure pattern edited in the editing area 1506 to the rule DB 102 as a rule (corresponding to (5) in FIG. 13).

FIG. 16 is an explanatory diagram showing display screen example 3 of the information processing system. Display screen example 3 is shown when the copy button 1515 and the analysis button 1503 are pressed in display screen example 2 of FIG. 15. When the operator presses the copy button 1515, the selected text st1 is copied to the copy area 1502. When the operator then presses the analysis button 1503, the tree structure data tr1 obtained by syntactic analysis of the selected text st1 is displayed in the editing area 1506 (corresponding to (2) in FIG. 13).

FIG. 17 is an explanatory diagram showing display screen example 4 of the information processing system. Display screen example 4 is shown when the tree structure data tr1 in the editing area 1506 is edited in display screen example 3 of FIG. 16. For example, in response to the operator's operation, the information processing system attaches the annotation labels "a0", "a1", and "a2" to the words to be extracted as annotations. "a0", "a1", and "a2" define the rule. The annotation a0 is not itself extracted; it serves as the reference for extracting the other annotations a1 and a2. That is, if the annotation a0 is a word, it is a non-extracted word that must match a word in another text, and if the annotation a0 is a word group, it is a non-extracted word group that must contain a word of the other text. Because the tree structure pattern tp1 defines the annotation a1 as the subject (noun phrase (NP)) of the annotation a0 and the annotation a2 as the object (noun phrase (NP)) of the annotation a0, the noun phrases that match the rule are extracted from other texts.
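The roles of a0, a1, and a2 can be illustrated with a small tree matcher. This sketch assumes a representation the description does not specify: trees are nested tuples of the form (label, child, ...), noun phrases are kept as leaf strings, and the `Capture` and `Group` marker classes, the word group ID "spinoff", and the sample sentences are hypothetical. The a0 node is checked against a word group but never returned, while a1 and a2 are captured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """Pattern leaf whose matching text is extracted (annotation a1 or a2)."""
    name: str


@dataclass(frozen=True)
class Group:
    """Pattern leaf (annotation a0) that must be a member of a word group; not extracted."""
    group_id: str


def match(pattern, tree, word_groups):
    """Return the captured annotations if the tree fits the pattern, else None."""
    captured = {}

    def walk(p, t):
        if isinstance(p, Capture):
            if not isinstance(t, str):
                return False
            captured[p.name] = t
            return True
        if isinstance(p, Group):
            return isinstance(t, str) and t in word_groups[p.group_id]
        if isinstance(p, str):
            return p == t
        # Inner node: same arity, and label plus every child must match in order.
        return (isinstance(t, tuple) and len(p) == len(t)
                and all(walk(pc, tc) for pc, tc in zip(p, t)))

    return captured if walk(pattern, tree) else None


# Hypothetical pattern: subject NP (a1), predicate from a word group (a0), object NP (a2).
tp1 = ("S", ("NP", Capture("a1")),
       ("VP", ("V", Group("spinoff")), ("NP", Capture("a2"))))
word_groups = {"spinoff": {"spins off", "divests"}}

tree_ok = ("S", ("NP", "Nichiritsu"),
           ("VP", ("V", "spins off"), ("NP", "its divisions")))
tree_ng = ("S", ("NP", "Nichiritsu"),
           ("VP", ("V", "acquires"), ("NP", "a startup")))
```

With this data, `match(tp1, tree_ok, word_groups)` yields the two noun phrases under the keys a1 and a2, and `match(tp1, tree_ng, word_groups)` yields `None` because the predicate is outside the word group.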

In addition, subtrees and "lemma" entries (base forms of words) that the operator judges to be unimportant are deleted by the operator's operation. As shown in FIG. 11, a word defined in the tree structure data tr1 may also be replaced with a description that calls the word group containing that word.

FIG. 18 is an explanatory diagram showing display screen example 5 of the information processing system. Display screen example 5 is shown when the annotation button 1504 is pressed in display screen example 4 of FIG. 17. When the operator presses the annotation button 1504, the information processing system extracts, from the selected text st1 on the copy screen, the character strings corresponding to the annotations a1 and a2 of the tree structure pattern tp1 (rule) edited in the editing area 1506, and displays the extraction result 1800 (corresponding to (4) in FIG. 13). In this case, "Japanese electronics maker Nichiritsu" is extracted as the noun phrase of the annotation a1, and "its home appliance and industrial equipment divisions" is extracted as the noun phrase of the annotation a2. The extracted noun phrases of the annotations a1 and a2 are displayed in the annotation a1 display area 1511 and the annotation a2 display area 1512, respectively.

The operator can thus check the plausibility of the rule by comparing the phrase "Nichiritsu" entered in the text input field 1513 for the annotation a1 with the noun phrase "Japanese electronics maker Nichiritsu" extracted for the annotation a1 according to the rule. Similarly, the operator can check the plausibility of the rule by comparing the phrase "home appliance" entered in the text input field 1514 for the annotation a2 with the noun phrase "its home appliance and industrial equipment divisions" extracted for the annotation a2 according to the rule.
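In the description this comparison is made by the operator. As an illustration only, a mechanical version of the same check could test whether each phrase in the confirmation data set appears inside the corresponding extracted noun phrase. The containment criterion and the function names below are assumptions for this sketch, not something the description prescribes.

```python
def supports(expected: str, extracted: str) -> bool:
    """Return True if the operator's phrase appears inside the extracted phrase."""
    return expected.casefold() in extracted.casefold()


def check_rule(confirmation: dict, extraction: dict) -> dict:
    """Compare each confirmation phrase with the extraction for the same annotation label."""
    return {label: supports(phrase, extraction.get(label, ""))
            for label, phrase in confirmation.items()}


# Phrases taken from the example screens.
confirmation = {"a1": "Nichiritsu", "a2": "home appliance"}
extraction = {
    "a1": "Japanese electronics maker Nichiritsu",
    "a2": "its home appliance and industrial equipment divisions",
}
```

Here `check_rule(confirmation, extraction)` reports a match for both a1 and a2, which corresponds to the operator judging the rule to be plausible.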

When the add button 1505 is pressed, the character string in the editing area 1506 (the edited tree structure data tr1) becomes the tree structure pattern tp1 and is registered in the rule DB 102 as a rule (corresponding to (5) in FIG. 13).

FIG. 19 is an explanatory diagram showing display screen example 6 of the information processing system. Display screen example 6 is shown when the index tab 1403 is selected in display screen example 5 of FIG. 18. The index tab 1403 has an update button 1900. When the operator presses the update button 1900, the information processing system associates the selected text st1 with the tree structure data tr1, the noun phrase "Japanese electronics maker Nichiritsu" of the annotation a1, and the noun phrase "its home appliance and industrial equipment divisions" of the annotation a2, and registers them in the data store 103, thereby updating the index of the entry for the selected text st1 (corresponding to (6) in FIG. 13).

Similarly, for each other text, the information processing system registers the noun phrases of the annotations a1 and a2 that match the rule of the tree structure pattern tp1 in the data store 103 in association with that text, thereby updating the index of the entry for that text (corresponding to (7) in FIG. 13).

<Example of a processing procedure in the usage example of the information processing system>
FIG. 20 is a flowchart showing an example of a processing procedure in the usage example of the information processing system. As shown in FIG. 14, the information processing system accepts a search keyword entered in the search keyword input field 1411 (step S2001) and, when the search button 1412 is pressed, runs an index search with the entered search keyword (step S2002). As shown in FIG. 14, the information processing system then saves the text selected by the operator (step S2003).

Next, as shown in FIG. 15, the information processing system sets the confirmation data set 1500 in response to the operator's operation (step S2004). As shown in FIG. 16, the information processing system then obtains the tree structure data tr1 by syntactic analysis of the selected text st1 (step S2005). When the operator presses the add button 1505, the information processing system registers the tree structure pattern tp1, edited from the tree structure data tr1, in the rule DB 102 (step S2006). Pressing the add button 1505 corresponds to step S1207: add in FIG. 12, and registering the tree structure pattern tp1 corresponds to step S1208 in FIG. 12.

Then, as shown in FIG. 18, when the operator presses the annotation button 1504, the information processing system extracts the phrase for the annotation a1 and the phrase for the annotation a2 from the selected text st1 according to the rule of the tree structure pattern tp1, and displays them as the extraction result 1800 (step S2007).

The operator can edit the tree structure pattern tp1 repeatedly, and the information processing system may register the tree structure pattern tp1 as a rule each time. In that case, in step S2007 the information processing system extracts annotations from the selected text for each tree structure pattern tp1. After that, as shown in FIG. 19, the information processing system updates the index in the data store 103 with the extracted annotations (step S2008).

As described above, the information processing system has the word dictionary DB 101 and the rule DB 102, and the processor 401 executes a reception process that receives a maintenance request and a maintenance process. In the maintenance process, if the received maintenance request concerns a word, the processor 401 performs maintenance on the word group to which the word belongs in the word dictionary DB 101; if the maintenance request concerns a tree structure pattern, the processor 401 performs maintenance of the tree structure pattern in the rule DB 102.

The word dictionary DB 101 and the rule DB 102 can therefore be maintained independently of each other. In other words, for a given maintenance request the information processing system maintains only one of the two databases, either the word dictionary DB 101 or the rule DB 102. Accordingly, maintaining a word group in the word dictionary DB 101 does not require maintaining the rules in the rule DB 102 that use that word group. Conversely, maintaining a rule in the rule DB 102 does not require maintaining the word groups used in that rule. Database maintenance is thus made easier.
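The independence follows from a rule referring to a word group by ID rather than listing the words itself. The following minimal sketch shows this; the dictionary layout, the rule ID "R1", the group ID "spinoff", and the function name are assumptions for illustration.

```python
import copy


def rule_applies(rule: dict, predicate: str, word_dict: dict) -> bool:
    """A rule matches a predicate if the predicate is in the word group the rule refers to."""
    return predicate in word_dict.get(rule["predicate_group"], set())


# Hypothetical contents of the two databases.
word_dict = {"spinoff": {"spin off"}}
rule_db = {"R1": {"predicate_group": "spinoff"}}

rules_before = copy.deepcopy(rule_db)
before = rule_applies(rule_db["R1"], "divest", word_dict)   # "divest" not yet registered

# Maintenance of the word dictionary only: add one word to the group.
word_dict["spinoff"].add("divest")
after = rule_applies(rule_db["R1"], "divest", word_dict)

rules_unchanged = rule_db == rules_before
```

After the word is added, the same unmodified rule R1 also covers "divest", so only the word dictionary needed maintenance.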

When the word maintenance request is a request to add a word, the processor 401 executes an identification process that identifies, based on the word, the attribute of the word group to which the word should belong. In the maintenance process, the processor 401 adds the word to the word group with the attribute identified by the identification process.

Thus, when a word addition is requested, the word is additionally registered in the corresponding word group in the word dictionary DB 101, but the rules in the rule DB 102 that use that word group do not need maintenance. Maintenance at the time of word registration is therefore made easier.

In the maintenance process, when the word maintenance request is a request to delete a word, the processor 401 deletes the word from the word group to which it belongs.

Thus, when a word deletion is requested, the word is deleted from the corresponding word group in the word dictionary DB 101, but the rules in the rule DB 102 that use that word group do not need maintenance. Maintenance at the time of word deletion is therefore made easier.

In the maintenance process, when the tree structure pattern maintenance request is a request to add a tree structure pattern, the processor 401 registers the tree structure pattern in the rule DB 102 if it does not already exist there.

Thus, when the addition of a tree structure pattern is requested, the tree structure pattern is additionally registered in the rule DB 102 as a new rule, but the word groups used in the new rule do not need maintenance in the word dictionary DB 101. Maintenance at the time of tree structure pattern registration is therefore made easier.

In the maintenance process, when the tree structure pattern maintenance request is a request to delete a tree structure pattern, the processor 401 deletes the tree structure pattern from the rule DB 102.

Thus, when the deletion of a tree structure pattern is requested, the tree structure pattern is deleted from the rule DB 102, but the word groups used in that tree structure pattern do not need maintenance in the word dictionary DB 101. Maintenance at the time of tree structure pattern deletion is therefore made easier.

The processor 401 can also access the data store 103, which stores a plurality of sentences, and executes the following processes: an acquisition process that acquires a specific tree structure pattern, obtained by abstracting specific tree structure data (the analysis result of a specific sentence in the data store 103 that contains a specific word) using a specific word group that includes the specific word; an extraction process that extracts, from the specific tree structure data, the words contained in a phrase that co-occurs with the specific word group in the acquired specific tree structure pattern (for example, the subject or the object when the specific word group is the predicate verb); and an output process that outputs the extracted words so that they can be displayed on the display screen. In the maintenance process, if the maintenance request for the specific tree structure pattern is a request to add it (for example, a press of the add button 1505), the processor 401 registers the specific tree structure pattern in the rule DB 102.

The words that match the specific tree structure pattern can thus be displayed as annotations of the specific sentence. Therefore, for example, if the operator has selected in advance the phrases that co-occur with the specific word group in the specific sentence, the operator can compare the selected phrases with the annotations to check the plausibility of the specific tree structure pattern before registering it in the rule DB 102.

The processor 401 can also access the data store 103, which stores a plurality of sentences, and executes the following processes: an acquisition process that acquires a specific tree structure pattern, obtained by abstracting specific tree structure data (the analysis result of a specific sentence in the data store 103 that contains a specific word) using a specific word group that includes the specific word; an extraction process that extracts, from the specific tree structure data, the words contained in a phrase that co-occurs with the specific word group in the acquired specific tree structure pattern; and an update process that updates the data store 103 by associating the extracted words with the specific sentence. In the maintenance process, if the maintenance request for the specific tree structure pattern is a request to add it, the processor 401 registers the specific tree structure pattern in the rule DB 102.

The words that match the specific tree structure pattern can thus be registered in association with the specific sentence as its annotations, and the specific tree structure pattern used for the association can be registered in the rule DB 102 as a rule. As a further result, when the data store 103 is to be searched, the specific sentence can be retrieved from it by an index search that uses the associated annotations as headwords.
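Using annotations as headwords amounts to an inverted index from annotation phrase to sentence. The following is a minimal sketch of that idea; the entry IDs, the sample annotations, and the function names are hypothetical, and exact-phrase lookup is an assumption made for brevity.

```python
from collections import defaultdict


def build_index(entries: dict) -> dict:
    """Map each annotation phrase (headword) to the IDs of the sentences it annotates."""
    index = defaultdict(list)
    for entry_id, annotations in entries.items():
        for phrase in annotations.values():
            index[phrase].append(entry_id)
    return dict(index)


def search(index: dict, headword: str) -> list:
    """Index search: return the sentence IDs registered under the headword."""
    return index.get(headword, [])


# Hypothetical data store entries: sentence ID -> annotations a1/a2.
entries = {
    "stc1": {"a1": "X", "a2": "A"},
    "stc2": {"a1": "J", "a2": "K"},
    "stc3": {"a1": "X", "a2": "K"},
}
index = build_index(entries)
```

With this index, `search(index, "X")` returns the sentences annotated with "X" without scanning the sentence bodies.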

The processor 401 also extracts, from other tree structure data that is the analysis result of a sentence other than the specific sentence among the plurality of sentences, other words contained in a phrase that co-occurs with the specific word group in the specific tree structure pattern, and updates the data store 103 by associating the extracted other words with the other sentence.

For the other sentences in the data store 103 as well, the other words that match the specific tree structure pattern can thus be registered in association with those sentences as their annotations, so that the rule defined by the specific tree structure pattern extends to the other sentences.

[2. Example of additional registration of words in the word dictionary DB 101]
As described above, the information processing system has the rule DB 102, which defines tree structure patterns representing grammatical structures as rules, and the word dictionary DB 101, which collects words that express relationships. With these, the information processing system defines grammatical structures for relationships such as those between sentence elements and extracts the annotations a0 to a2 from text data as relation annotations. The information processing system checks the text against the rule DB 102 and the word dictionary DB 101 and extracts the words that match both. To extract a large amount of relationship information in this way, words must be added to the word dictionary DB 101.

However, increasing the number of words in the word dictionary DB 101 is difficult. Patent Literature 1 searches for example sentences containing the relationship to be extracted by keyword search alone, so example sentences that do not match the tree structure rules are also retrieved. Consequently, adding a new example sentence requires editing both the rule DB 102 and the word dictionary DB 101.

For this reason, the information processing system obtains registration candidates (for example, predicates) from a large amount of text data by using relation annotations that are extraction results (for example, combinations of words corresponding to the subject and the object) together with the rules indicated by the tree structure patterns in the rule DB 102 (for example, combinations of sentence elements such as subject, predicate, and object). Because the example sentences containing the registration candidates obtained in this way are guaranteed to match the rules indicated by the tree structure patterns, simply adding the acquired registration candidates to the word dictionary DB 101 makes new relation extraction from those example sentences possible.
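The idea can be sketched as follows: sentences whose subject and object form an already extracted relation annotation, and whose structure fits the rule, yield their predicate as a candidate when that predicate is not yet in the word group. This is a simplified sketch under assumptions. Parsed sentences are reduced to (subject, predicate, object) triples that are taken to fit the rule's structure already, and the function name, the word group contents, and the sample data are hypothetical.

```python
from collections import Counter


def candidate_predicates(known_pairs, parsed_corpus, word_group):
    """Count predicates that link a known (subject, object) pair but are not yet registered."""
    counts = Counter()
    for subject, predicate, obj in parsed_corpus:
        if (subject, obj) in known_pairs and predicate not in word_group:
            counts[predicate] += 1
    return counts


# Hypothetical data: one known relation annotation and a small parsed corpus.
enter_group = {"enter", "join"}
known_pairs = {("Company J", "the battery market")}
parsed_corpus = [
    ("Company J", "enter", "the battery market"),       # already covered by the group
    ("Company J", "move into", "the battery market"),   # new predicate, known pair
    ("Company J", "move into", "the battery market"),
    ("Company P", "leave", "the battery market"),       # pair not known: ignored
]
candidates = candidate_predicates(known_pairs, parsed_corpus, enter_group)
```

In this data only "move into" is returned as a candidate, and adding it to the word group would make the existing rule extract the relation from those two sentences without any rule edit.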

Therefore, the performance of the information processing system can be improved by updating only the word dictionary DB 101, without editing the rule DB 102. Editing the rule DB 102 requires a certain understanding of grammatical structure and is difficult for anyone other than trained personnel, whereas the word dictionary DB 101 consists of lists of words such as verbs and is easy to handle. In actual operation, the work can be divided so that, for example, personnel experienced in developing the rule DB 102 create a certain amount of the rule DB 102, and personnel who only update the word dictionary DB 101 do so using the information processing system.

In this example, it is assumed that, as described in [1. Maintenance Example for Word Dictionary DB] above, the rule DB 102 has been constructed and headwords 701 (annotations a0 to a2) have been assigned to the text 702 in the data store 103. It is also assumed that "enter" is already registered as one of the word groups in the word dictionary DB 101. "enter" is a word group containing the Japanese verbs 「参入する」 (to enter) and 「参画する」 (to participate).

This example treats the word group as a word group of predicates and describes how a new predicate word is additionally registered in the word dictionary DB 101 based on the relationship between the subject, predicate, and object in an example sentence. However, additional registration in the word dictionary DB 101 is not limited to predicates; any sentence element may be used as long as the word corresponds to a word group. Although the example sentences appear in both English and Japanese, the language does not matter.

<Word addition registration example>
FIG. 21 is an explanatory diagram showing an example of word addition registration, and FIG. 22 is a flowchart showing an example of a word registration processing procedure.

(1) The information processing system searches for an example sentence 2001 that matches any of the rules R1, R2, ... (hereinafter referred to as rule R when they are not distinguished) in which annotation a1 and annotation a2 are defined (step S2201). In this example, in the retrieved example sentence 2001, "various companies" (「様々な企業」) corresponds to annotation a1, and "in the retail market" (「小売市場に」) corresponds to annotation a2.

Note that the example sentence 2001 is text data that exists on a network outside the information processing system (for example, a web page on the Internet or document data in a database server). The example sentence may also be text 702 in the data store 103 to which no headword 701 has been assigned.

(2) The information processing system parses the example sentence 2001 retrieved in (1) and generates tree structure data 2002 (step S2202). The tree structure data 2002 is, for example, a syntax tree generated according to phrase structure rules by syntactic analysis (morphological analysis and phrase structure analysis) (see FIG. 9). To improve the accuracy of the processing from (3) onward, the user may edit the tree structure data 2002 into a tree structure pattern, as shown in FIG. 9.
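
As a rough illustration of what tree structure data such as 2002 might look like in memory, the following sketch builds a small syntax tree by hand. The `Node` class, the phrase labels, and the English gloss of the example sentence are all assumptions; no real parser is invoked, and the patent does not specify this representation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    label: str                      # phrase or part-of-speech label, e.g. "S", "NP", "V"
    text: str = ""                  # surface string, used by leaf nodes only
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> Iterator[Node]:
    """Yield the leaf nodes of the tree from left to right."""
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from leaves(child)


def to_bracketed(node: Node) -> str:
    """Render the tree in bracketed notation for inspection."""
    if not node.children:
        return f"({node.label} {node.text})"
    inner = " ".join(to_bracketed(child) for child in node.children)
    return f"({node.label} {inner})"


# Hand-built tree for an English gloss of example sentence 2001 (assumed labels).
tree = Node("S", children=[
    Node("NP", "various companies"),
    Node("VP", children=[
        Node("V", "appear"),
        Node("PP", "in the retail market"),
    ]),
])

print(to_bracketed(tree))
print([leaf.text for leaf in leaves(tree)])
```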

(3) The information processing system compares the tree structure data 2002 obtained in (2) with the rule R in the rule DB 102 (step S2203). In (3), the word dictionary DB 101 is not applied, so the information processing system does not use condition C of annotation a0 in this comparison. Specifically, for example, the information processing system removes condition C from rule R.

For example, in rule R1, condition C specifies the word group "enter". Therefore, "\dic.enter", which corresponds to "enter" (that is, 「参入する」 (to enter) and 「参画する」 (to participate)), is removed from rule R1.
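
Removing condition C from a rule can be sketched as follows. The rule layout and the function name `without_condition` are hypothetical; only the identifier `\dic.enter` comes from the description. The sketch copies the rule so that the stored rule stays unchanged.

```python
import copy

# Hypothetical layout of rule R1: annotation a0 carries condition C, which
# refers to the word group "enter" via the identifier \dic.enter.
RULE_R1 = {
    "a0": {"element": "predicate", "condition": r"\dic.enter"},
    "a1": {"element": "subject"},
    "a2": {"element": "object"},
}


def without_condition(rule: dict, annotation: str = "a0") -> dict:
    """Return a copy of the rule with the word-group condition removed."""
    relaxed = copy.deepcopy(rule)          # leave the stored rule untouched
    relaxed[annotation].pop("condition", None)
    return relaxed


relaxed_r1 = without_condition(RULE_R1)
print(relaxed_r1)
```

Working on a copy reflects the point that the rule DB 102 itself does not need to be edited; only the comparison uses the relaxed rule.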

(4) Through the comparison in (3), the information processing system obtains an extraction result 2003 from the example sentence 2001 (step S2204). For example, as a result of the comparison with rule R1, the information processing system extracts the subject "various companies" (「様々な企業が」), which satisfies the condition of annotation a1 of rule R, from the example sentence 2001 as part of the extraction result 2003.

The information processing system also extracts the object "in the retail market", which satisfies the condition of annotation a2 of rule R, from the example sentence 2001 as part of the extraction result 2003. Furthermore, because condition C of annotation a0 was excluded in (3) above, the information processing system can extract the predicate "are appearing" (「登場している」), which does not satisfy condition C (「参入する」 and 「参画する」), as part of the extraction result 2003. The base form "appear" (「登場する」) of the predicate extracted with condition C excluded becomes a candidate for registration in the word dictionary DB 101. In this way, unregistered words can be collected efficiently. The extraction result 2003 only needs to include at least the registration candidate.
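
Step (4) can be sketched as follows, under the assumption that parsing has already produced a surface form and a base form for each sentence element. The data layout, the English glosses, and the function name `extract_candidate` are illustrative and not part of the patent.

```python
from typing import Optional

# Hypothetical word group "enter" from word dictionary DB 101 (English glosses).
WORD_DICT = {"enter": {"enter", "participate"}}

# Rule R1 with condition C on a0 already removed: annotation -> sentence element.
RELAXED_R1 = {"a0": "predicate", "a1": "subject", "a2": "object"}

# Assumed parser output for example sentence 2001: element -> (surface, base form).
PARSED_2001 = {
    "subject": ("various companies", "company"),
    "object": ("in the retail market", "market"),
    "predicate": ("are appearing", "appear"),
}


def extract_candidate(parsed: dict, relaxed_rule: dict, group_words: set) -> Optional[dict]:
    """Return annotations plus the base-form registration candidate, or None."""
    result = {}
    for annotation, element in relaxed_rule.items():
        if element not in parsed:
            return None                    # sentence does not fit the pattern
        surface, _base = parsed[element]
        result[annotation] = surface
    base_form = parsed[relaxed_rule["a0"]][1]
    result["candidate"] = base_form        # base form is what gets registered
    result["already_registered"] = base_form in group_words
    return result


result_2003 = extract_candidate(PARSED_2001, RELAXED_R1, WORD_DICT["enter"])
print(result_2003)
```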

(5) The information processing system checks whether the registration candidate "appear" (「登場する」) in the extraction result 2003 of (4) can be newly added to the word dictionary DB 101 (step S2205). Specifically, for example, the information processing system determines whether text 702 corresponding to the extraction result 2003 exists in the data store 103.

If text 702 corresponding to the extraction result 2003 exists, the registration candidate "appear" in the extraction result 2003 is already registered in the word dictionary DB 101. The registration candidate "appear" therefore does not need to be registered (step S2206: No), and the information processing system does not execute the next process (6) (step S2207).

On the other hand, if no text 702 corresponding to the extraction result 2003 exists, the registration candidate "appear" in the extraction result 2003 is a predicate that is not yet registered in the word dictionary DB 101, and it needs to be registered (step S2206: Yes). The information processing system therefore executes the next process (6) (step S2207).
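
One possible reading of the check in step (5) is sketched below. The layout of the data store entries and the function name `needs_registration` are assumptions; in particular, "text corresponding to the extraction result" is interpreted here as an entry whose headword annotations equal those of the extraction result.

```python
# Hypothetical contents of data store 103: body text with optional headword
# annotations (a0 to a2), shown with English glosses.
DATA_STORE = [
    {"body": "Various companies enter the retail market.",
     "headword": {"a0": "enter", "a1": "various companies", "a2": "the retail market"}},
    {"body": "Prices rose last year.", "headword": None},
]


def needs_registration(result: dict, data_store: list) -> bool:
    """Step (5): True when no annotated text in the data store matches the result."""
    key = (result["a0"], result["a1"], result["a2"])
    for entry in data_store:
        headword = entry.get("headword")
        if headword and (headword["a0"], headword["a1"], headword["a2"]) == key:
            return False                   # matching text exists: already covered
    return True


known = {"a0": "enter", "a1": "various companies", "a2": "the retail market"}
new = {"a0": "appear", "a1": "various companies", "a2": "the retail market"}

print(needs_registration(known, DATA_STORE))
print(needs_registration(new, DATA_STORE))
```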

(6) The information processing system additionally registers the registration candidate "appear" (「登場する」) from the extraction result 2003 in the word dictionary DB 101 (step S2207). The information processing system may also perform this additional registration without executing (5). If the candidate matches an already registered predicate, the entry is simply overwritten, so skipping the process (5) (step S2205) causes no problem.

However, if (5) is not executed, the registration candidate "appear" may be additionally registered in the word dictionary DB 101 even when the data store 103 contains no text corresponding to annotation a1 and annotation a2 of the example sentence 2001. Executing the process (5) (step S2205) before the process (6) (step S2207) therefore makes new registrations in the word dictionary DB 101 more accurate (suppresses erroneous registrations).
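
The registration in step (6), with the check of step (5) as an optional gate, might look like the sketch below. The function name `register` and the `confirm` callback are hypothetical. A Python set is used for each word group so that registering an existing word leaves the group unchanged, which mirrors the "simply overwritten" behavior described above.

```python
from typing import Callable, Dict, Optional, Set


def register(candidate: str,
             group: str,
             word_dict: Dict[str, Set[str]],
             confirm: Optional[Callable[[str], bool]] = None) -> bool:
    """Add the candidate to the word group; return False if the check rejects it."""
    if confirm is not None and not confirm(candidate):
        return False                       # step (5) decided against registration
    word_dict.setdefault(group, set()).add(candidate)
    return True


word_dict = {"enter": {"enter", "participate"}}

register("appear", "enter", word_dict)                  # without step (5)
register("appear", "enter", word_dict)                  # repeated: no change
register("vanish", "enter", word_dict, lambda w: False) # rejected by the check

print(sorted(word_dict["enter"]))
```

The `confirm` callback could equally wrap a user prompt, so the same entry point would serve both the automatic check and a manual approval.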

In the process (6) (step S2207), regardless of whether the process (5) (step S2205) has been executed, the information processing system may display the registration candidate "appear" on the display screen of the computer 400 operated by the user and prompt the user to decide whether to register it. This allows the user to confirm, before the additional registration, whether the candidate should be registered.

In this way, the information processing system can collect only example sentences 2001 that match rule R, which makes expansion of the word dictionary DB 101 more efficient. The word dictionary DB 101 can also be expanded without updating the rule DB 102.

<Screen transition example>
Next, the screen transitions of the display screen in the word addition registration example described above will be explained.

FIG. 23 is an explanatory diagram showing display screen example 7 of the information processing system. FIG. 24 is an explanatory diagram showing display screen example 8 of the information processing system. On the display screen 1400, the tabs used for word addition registration are the INVESTIGATE tab 2300 and the DICT tab 2400.

FIG. 23 shows an example of the display screen during the process (1) (step S2201) described above. In FIG. 23, the user selects the INVESTIGATE tab 2300, and the display screen 1400 changes to the screen shown in FIG. 23. When the user selects the type of target relationship (for example, "penalize") from the "Select Annotator" pull-down and clicks the SEARCH button 2302, the information processing system uses the existing rules R and the word dictionary DB 101 to search for example sentences 2302A and 2302B that express the "penalize" relationship, and displays the search results 2303 on the display screen 1400.

If a word for narrowing the search is entered in the "Please Input keyword" field 2304, only example sentences 2302A and 2302B that contain that word are retrieved and displayed. For the example sentences 2302A and 2302B in which the "penalize" relationship has been extracted correctly, the user ticks the check box 2305 at the left end of the search results 2303 and clicks the STORE button 2306 to save them temporarily. The user then selects the DICT tab 2400, and the display screen 1400 changes to the screen shown in FIG. 24.

FIG. 24 shows an example of the display screen during the processes (2) to (6) (steps S2202 to S2207) described above. In FIG. 24, the user clicks the SEARCH button 2401 to search for predicate candidates. As a result, predicate candidates 2402A and 2402B are displayed as the extraction results 2402 together with the example sentences 2302A and 2302B. In the example sentences 2302A and 2302B, the character strings extracted as annotations a0 to a2 are highlighted.

The names of the tree structure patterns 2403A and 2403B, which indicate the rules R used for the extraction, are also displayed. Registration candidates that are not yet registered in the word dictionary DB 101 are displayed preferentially. Registration candidates that are already registered in the word dictionary DB 101 are also displayed, together with the name 2404 of the corresponding word dictionary DB 101. The user reviews the displayed registration candidates and adds the appropriate ones to the word dictionary DB 101 referenced by the corresponding tree structure patterns 2403A and 2403B. In FIG. 24, the word "impose" (「科す」) is newly registered.
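
The display order described here, with unregistered candidates first and registered ones shown together with their dictionary name, could be produced as in the following sketch. The function name `order_candidates`, the sample candidates, and the dictionary contents are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


def order_candidates(candidates: List[str],
                     word_dict: Dict[str, Set[str]]) -> List[Tuple[str, Optional[str]]]:
    """Pair each candidate with its dictionary name (None if unregistered),
    listing unregistered candidates first and keeping the original order otherwise."""
    def dictionary_of(word: str) -> Optional[str]:
        for name, words in word_dict.items():
            if word in words:
                return name
        return None

    rows = [(word, dictionary_of(word)) for word in candidates]
    # sorted() is stable, and False sorts before True, so unregistered rows lead.
    return sorted(rows, key=lambda row: row[1] is not None)


word_dict = {"penalize": {"penalize", "fine"}}
rows = order_candidates(["penalize", "impose", "fine"], word_dict)
print(rows)
```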

This expands the word dictionary DB 101 so that more relationships can be extracted. In addition, when the user clicks the update button 1900 on the index tab 1403 after expanding the word dictionary DB 101, the computer 400 in the information processing system on which the display screen 1400 is displayed sends an instruction to write annotations a0 to a2 to the computer 400 that can access the data store 103.

The computer 400 that can access the data store 103 writes annotations a0 to a2 to the example sentences 2302A and 2302B stored in the data store 103. This updates the headwords 701 of the example sentences 2302A and 2302B that correspond to the extraction results 2402. Repeating these operations makes more extraction results 2402 available for word addition registration, so even more word candidates can be collected and the number of words registered in the word dictionary DB 101 grows.

The present invention is not limited to the embodiments described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiments above have been described in detail to explain the present invention clearly, and the present invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may be implemented partly or wholly in hardware, for example by designing them as integrated circuits, or in software, by having a processor interpret and execute programs that implement the respective functions.

Information such as the programs, tables, and files that implement each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC (Integrated Circuit) card, an SD card, or a DVD (Digital Versatile Disc).

The control lines and information lines shown are those considered necessary for the explanation and do not necessarily represent all the control lines and information lines required in an implementation. In practice, almost all of the components may be considered to be interconnected.

101 word dictionary DB
102 rule DB
103 data store
400 computer
401 processor
402 storage device

Claims

An information processing system comprising a processor that executes a program and a storage device that stores the program, the system further comprising:
a word dictionary database that stores word groups, each being a group of words grouped by a predetermined attribute; and
a rule database that stores tree structure patterns obtained by abstracting, using the word groups, tree structure data that indicates relationships between words in a sentence with respect to sentence elements,
wherein the processor executes:
an acquisition process of acquiring a target sentence that includes a combination of a word of a first element to which the word groups do not apply and a word of a second element to which the word groups do not apply, among the sentence elements;
a determination process of determining whether the target sentence acquired by the acquisition process matches a specific tree structure pattern from which a third element corresponding to a word group, among the sentence elements, has been excluded; and
an extraction process of extracting the word of the third element from the target sentence determined by the determination process to match the specific tree structure pattern, and outputting an extraction result.

The information processing system according to claim 1,
wherein the processor executes a registration process of registering the word of the third element in the target sentence in the word group corresponding to the third element excluded from the specific tree structure pattern.

The information processing system according to claim 1,
The information processing system, wherein in the extraction process, the processor outputs the extraction result in a displayable manner.

The information processing system according to claim 3,
wherein, when a registration instruction is input after the extraction result is displayed, the processor executes a registration process of registering the word of the third element in the target sentence in the word group corresponding to the third element excluded from the specific tree structure pattern.

The information processing system according to claim 1,
further comprising a data store that stores a set of sentences,
wherein the processor executes a confirmation process of confirming whether a sentence composed of the combination of the word of the first element and the word of the second element exists in the data store, and outputting a confirmation result.

The information processing system according to claim 2,
further comprising a data store that stores a set of sentences,
wherein the processor executes a confirmation process of confirming whether a sentence composed of the combination of the word of the first element and the word of the second element exists in the data store, and
wherein, in the registration process, when the confirmation process confirms that no sentence composed of the combination of the word of the first element and the word of the second element exists in the data store, the processor registers the word of the third element in the word group corresponding to the third element excluded from the specific tree structure pattern.

The information processing system according to claim 3,
further comprising a data store that stores a set of sentences,
wherein the processor executes a confirmation process of confirming whether a sentence composed of the combination of the word of the first element and the word of the second element exists in the data store, and
wherein, when the confirmation process confirms that no sentence composed of the combination of the word of the first element and the word of the second element exists in the data store and a registration instruction is input after the extraction result is displayed, the processor executes a registration process of registering the word of the third element in the target sentence in the word group corresponding to the third element excluded from the specific tree structure pattern.

An information processing method performed by an information processing system comprising a processor that executes a program, a storage device that stores the program, a word dictionary database that stores word groups, each being a group of words grouped by a predetermined attribute, and a rule database that stores tree structure patterns obtained by abstracting, using the word groups, tree structure data that indicates relationships between words in a sentence with respect to sentence elements,
wherein the processor executes:
an acquisition process of acquiring a target sentence that includes a combination of a word of a first element to which the word groups do not apply and a word of a second element to which the word groups do not apply, among the sentence elements;
a determination process of determining whether the target sentence acquired by the acquisition process matches a specific tree structure pattern from which a third element corresponding to a word group, among the sentence elements, has been excluded; and
an extraction process of extracting the word of the third element from the target sentence determined by the determination process to match the specific tree structure pattern, and outputting an extraction result.