JP7095874B2

JP7095874B2 - Natural language analysis system, analysis method and program

Info

Publication number: JP7095874B2
Application number: JP2019006592A
Authority: JP
Inventors: 利充荒牧
Original assignee: HARDIS SYSTEM DESIGN CO., LTD.
Current assignee: HARDIS SYSTEM DESIGN CO., LTD.
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2022-07-05
Anticipated expiration: 2039-01-18
Also published as: JP2020115303A

Description

The present invention relates to a natural language analysis system, an analysis method, and a program.

Conventionally, it is known to generate a data structure by combining two techniques. The first is morphological analysis using mathematical language-analysis methods such as statistics and probability. The second is syntactic parsing using, for example, a syntax tree or an abstract syntax tree (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-191457

However, analysis systems that use conventional syntax trees or abstract syntax trees have a problem. It is difficult to obtain, from the resulting data structure, information that is not directly related to the search term.

In a first aspect of the present invention, an analysis system is provided that includes the following units:
- a sentence acquisition unit that acquires an analysis target sentence to be analyzed;
- a phrase decomposition unit that generates phrase data by decomposing the analysis target sentence into phrases or compound phrases; and
- a DAG generation unit that, for syntactic analysis of the phrase data, generates a directed acyclic graph (DAG) data structure from the phrases or compound phrases.

In a second aspect of the present invention, an analysis method is provided that includes the following stages:
- acquiring an analysis target sentence to be analyzed;
- generating phrase data by decomposing the analysis target sentence into phrases or compound phrases; and
- generating DAG data from the phrases or compound phrases for syntactic analysis of the phrase data.

In a third aspect of the present invention, a program is provided that causes a computer to execute the analysis method according to the second aspect.

The above summary does not list all the features of the present invention. Subcombinations of these feature groups may also constitute inventions.

FIG. 1 shows an outline of the configuration of the analysis system 100.
FIG. 2A is an example of a flowchart for executing the information acquisition process.
FIG. 2B is an example of a flowchart for executing the DAG generation process.
FIG. 3 is a conceptual diagram showing an example of a DAG generated by the DAG generation unit 40.
FIG. 4 shows a more specific example of the configuration of the analysis system 100.
FIG. 5A shows an example of a word decomposition table.
FIG. 5B shows an example of a particle table.
FIG. 5C shows an example of a phrase assembly table.
FIG. 5D shows an example of a connection pattern table.
FIG. 5E shows an example of a duplicate word table.
FIG. 5F shows an example of a node table.
FIG. 5G shows an example of a link table.
The remaining drawings show, in order:
- an example of a GUI screen used in the analysis system 100;
- an example configuration in which the analysis system 100 is realized as hardware;
- two further conceptual diagrams of DAGs generated by the DAG generation unit 40; and
- an example hardware configuration of a computer 1900 functioning as the analysis system 100.

The present invention is described below through embodiments. The following embodiments do not limit the invention according to the claims. Not all combinations of the features described in the embodiments are necessarily essential to the solution of the invention.

FIG. 1 shows an outline of the configuration of the analysis system 100. The analysis system 100 includes the following units:
- a sentence acquisition unit 10;
- a phrase decomposition unit 20;
- a particle table setting unit 30;
- a DAG generation unit 40;
- an output unit 50;
- a DAG structure information acquisition unit 60; and
- a duplicate word setting unit 70.

The sentence acquisition unit 10 acquires an analysis target sentence to be analyzed. It may acquire a plurality of analysis target sentences. Analysis target sentences may be input sentence by sentence or paragraph by paragraph. The user may type them on a keyboard or enter them by another input method, such as voice input. In the medical field, for example, the analysis target sentences include information on the symptoms of each disease.

The phrase decomposition unit 20 generates phrase data by decomposing the analysis target sentence into phrases or compound phrases. In one example, the unit proceeds as follows:
1. It decomposes the analysis target sentence into words by character type, such as kanji, hiragana, katakana, numerals, alphabetic characters, and special characters.
2. It classifies each word into one of a set of predetermined categories.
3. Based on the particle table described later, it concatenates the decomposed words into phrases or compound phrases, producing the phrase data.
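The following is a minimal Python sketch of the character-type split. It assumes that a "word" is simply a maximal run of characters of the same script. The Unicode ranges and the names `char_type` and `split_by_char_type` are illustrative assumptions, not taken from the patent.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify one character by script (assumed Unicode block ranges)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (kanji)
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "special"               # punctuation such as 、 and 。, symbols, etc.


def split_by_char_type(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into maximal runs of characters of the same type."""
    return [("".join(run), kind) for kind, run in groupby(sentence, key=char_type)]
```

For example, `split_by_char_type("一般的な風邪")` yields `[("一般的", "kanji"), ("な", "hiragana"), ("風邪", "kanji")]`. A pure script split keeps a hiragana run such as "などである" together, so particle matching against the particle table has to happen in a later step.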

The particle table setting unit 30 creates a particle table that associates words with identification codes predetermined for each category. The particle table may be registered in advance, and the phrase decomposition unit 20 refers to it. The particle table is described in detail later.

For syntactic analysis of the phrase data, the DAG generation unit 40 generates DAG data from the decomposed phrases or compound phrases. In this example, it creates the DAG data based on the pattern of identification codes assigned to those phrases. The resulting DAG uses the phrases or compound phrases that make up the sentence as its nodes and links them together.

The output unit 50 outputs the DAG data generated by the DAG generation unit 40. It may send the DAG data to a device external to the analysis system 100, or to a display unit such as a monitor so that the DAG data is displayed.

The DAG structure information acquisition unit 60 acquires DAG structure information. This is information on the structure of DAG data generated by analyzing analysis target sentences in the past. The unit passes the DAG structure information to the DAG generation unit 40, which generates DAG data based on it. The DAG generation unit 40 can thus combine existing DAG structure information with the phrase data of a new analysis target sentence.

The duplicate word setting unit 70 sets the duplicate words for which shared nodes are permitted in the DAG data. A shared node is a single node used for a phrase that appears in different analysis target sentences.

The duplicate word setting unit 70 searches for duplicate words and generates a duplicate word table. For example, it finds words that appear multiple times and displays them in a list. It may then build the duplicate word table by adding, modifying, and deleting entries among those words to select the ones to be shared. It may also provide a duplicate word search function that finds words appearing multiple times within a paragraph and displays them in a list.
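The candidate search for duplicate words can be sketched as a simple frequency count. The sketch assumes the sentences have already been decomposed into phrase lists. The name `find_duplicate_words` and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def find_duplicate_words(sentences: list[list[str]], min_count: int = 2) -> list[str]:
    """Return phrases that occur at least `min_count` times across all sentences.

    Most frequent phrases come first; ties keep first-seen order
    (Counter.most_common preserves insertion order for equal counts).
    """
    counts = Counter(word for words in sentences for word in words)
    return [word for word, n in counts.most_common() if n >= min_count]


# The two example sentences of FIG. 3, already split into phrases.
S1 = ["一般的な", "風邪の症状は", "鼻みず", "咳", "頭痛", "などである。"]
S2 = ["インフルエンZAの症状は".replace("ZA", "ザ"), "頭痛", "筋肉痛", "高熱", "などである。"]
```

On the two FIG. 3 sentences it returns `["頭痛", "などである。"]`. Only "頭痛" is registered in the example, which shows why the candidate list is reviewed (added to, modified, deleted from) rather than used directly.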

When a node of the DAG data matches a registered duplicate word, the DAG generation unit 40 shares that node. Sharing duplicate words in this way lets different analysis target sentences be associated with each other in the DAG data.

The analysis system 100 of this example outputs DAG data corresponding to the analysis target sentences. For example, when the sentences concern medicine, the system can generate a data structure describing the symptoms of diseases. Searching this DAG data with a symptom-related keyword then finds the diseases associated with that symptom.
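The symptom-to-disease search can be sketched as a backward breadth-first walk over the links. The sketch makes three assumptions:
- Disease nodes are recognized by the substring "の症状は" ("the symptoms of ... are"). This is a hypothetical heuristic.
- The node IDs follow FIG. 3.
- The links of the second sentence are inferred from the description of FIG. 3, with the shared "頭痛" node (C-3) connecting both sentences.

The name `find_related` is illustrative.

```python
from collections import deque

# FIG. 3 nodes; links for the second sentence are inferred from the text (assumption).
NODE_TEXT = {
    "A-1": "一般的な", "B-1": "風邪の症状は", "C-1": "鼻みず", "C-2": "咳",
    "C-3": "頭痛", "D-1": "などである。", "E-1": "インフルエンザの症状は",
    "F-2": "筋肉痛", "F-3": "高熱", "G-1": "などである。",
}
LINKS = [
    ("A-1", "B-1"), ("B-1", "C-1"), ("B-1", "C-2"), ("B-1", "C-3"),
    ("C-1", "D-1"), ("C-2", "D-1"), ("C-3", "D-1"),
    ("E-1", "C-3"), ("E-1", "F-2"), ("E-1", "F-3"),
    ("C-3", "G-1"), ("F-2", "G-1"), ("F-3", "G-1"),
]


def find_related(links, node_text, keyword, marker):
    """Walk links backwards from every node whose text equals `keyword` and
    collect the text of upstream nodes that contain `marker`."""
    parents = {}
    for src, dst in links:
        parents.setdefault(dst, []).append(src)
    start = [n for n, text in node_text.items() if text == keyword]
    seen, queue, hits = set(start), deque(start), []
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
                if marker in node_text[parent]:
                    hits.append(node_text[parent])
    return hits
```

`find_related(LINKS, NODE_TEXT, "頭痛", "の症状は")` returns `["風邪の症状は", "インフルエンザの症状は"]`: the shared headache node leads back to both diseases.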

FIG. 2A is an example of a flowchart for executing the information acquisition process. Before analyzing the analysis target sentences, the analysis system 100 acquires the information the analysis requires. In steps S200 and S202, the analysis system 100 of this example acquires each table based on past DAG structure information. This flowchart is only one example of the information acquisition process.

In step S200, the analysis system 100 acquires past DAG structure information. This is the DAG structure data obtained when the analysis system 100 analyzed analysis target sentences in the past. The analysis system 100 may also start analyzing without acquiring any past DAG structure information.

In step S202, the analysis system 100 acquires its tables, such as the particle table and the duplicate word table. The tables stored in the analysis system 100 are described later. Through these steps, the analysis system 100 acquires in advance the information needed to analyze the text.

FIG. 2B is an example of a flowchart for executing the DAG generation process. In steps S204 to S210, the analysis system 100 analyzes the input analysis target sentences and generates DAG data. This flowchart is only one example of the DAG generation process.

In step S204, an analysis target sentence is acquired, for example by the sentence acquisition unit 10. The sentence may be input directly by the user, input from another device, or received via a communication circuit. A program may also search the web with a search keyword, automatically retrieve several related documents, and use them as analysis target sentences.

In step S206, phrase data is generated, for example by the phrase decomposition unit 20 in three steps:
1. It decomposes the analysis target sentence into words and creates a word decomposition table.
2. It refers to the particle table and assigns identification codes to the words.
3. It combines words to which no identification code has been assigned to create a phrase assembly table.

In step S208, DAG data is generated, for example by the DAG generation unit 40 based on the phrase data.

Steps S204 to S208 may be repeated for the input analysis target sentences. For example, the analysis system 100 repeats them once for each input analysis target sentence. The loop may run once per sentence or once per paragraph, and may continue until all analysis target sentences have been analyzed.

In step S210, the DAG data is output, for example by the output unit 50. The DAG data may include the node table and link table described later.

Steps S200 to S210 may be executed by the hardware components that make up the analysis system 100, or by a computer running a program.

FIG. 3 is a conceptual diagram showing an example of a DAG generated by the DAG generation unit 40. In this example, the DAG is created from analysis target sentences about the symptoms of the common cold and influenza. Labels (A-1) to (G-1) are the node numbers of the individual nodes.

The sentence acquisition unit 10 has acquired two analysis target sentences:
- First sentence: "一般的な風邪の症状は、鼻みず、咳、頭痛などである。" ("The symptoms of a common cold are a runny nose, cough, headache, and the like.")
- Second sentence: "インフルエンザの症状は、頭痛、筋肉痛、高熱などである。" ("The symptoms of influenza are headache, muscle pain, high fever, and the like.")

The phrase decomposition unit 20 decomposes the analysis target sentences into phrases or compound phrases.

The first sentence is decomposed into:
- "一般的な" ("common")
- "風邪の症状は" ("the symptoms of a cold are")
- "鼻みず" ("runny nose")
- "咳" ("cough")
- "頭痛" ("headache")
- "などである。" ("and the like.")

The second sentence is decomposed into:
- "インフルエンザの症状は" ("the symptoms of influenza are")
- "頭痛"
- "筋肉痛" ("muscle pain")
- "高熱" ("high fever")
- "などである。"

This follows because, as described later, "な" (na), "は" (ha), and "など" (nado) are registered in the particle table in this example.

For the first analysis target sentence, the DAG generation unit 40 assigns a node number to each phrase:
- (A-1): "一般的な" ("common")
- (B-1): "風邪の症状は" ("the symptoms of a cold are")
- (C-1): "鼻みず" ("runny nose")
- (C-2): "咳" ("cough")
- (C-3): "頭痛" ("headache")
- (D-1): "などである。" ("and the like.")

For the second analysis target sentence, the DAG generation unit 40 likewise assigns a node number to each phrase:
- (E-1): "インフルエンザの症状は" ("the symptoms of influenza are")
- (F-2): "筋肉痛" ("muscle pain")
- (F-3): "高熱" ("high fever")
- (G-1): "などである。"

The adverbial particle "など" (nado, "and the like") gathers the nodes connected in parallel into the node that follows them.

The analysis system 100 of this example generates the DAG by setting each link pattern to either serial or parallel. Whether nodes are connected in series or in parallel may be determined by the particles and adverbial particles. For example, nodes (C-1) to (C-3) lie between "は" (ha) and "などである。", so they are connected in parallel. Nodes (C-3), (F-2), and (F-3) likewise lie between "は" and "などである。", so they are also connected in parallel.

The duplicate word setting unit 70 has registered "頭痛" (headache) in the duplicate word table. The DAG generation unit 40 therefore shares the "頭痛" node (C-3), which links the first and second analysis target sentences. By contrast, "などである。" is not registered in the duplicate word table. The DAG generation unit 40 therefore does not share nodes (D-1) and (G-1), even though both represent "などである。".

As described above, the analysis system 100 associates different analysis target sentences in DAG form by sharing nodes. The common cold and influenza are different illnesses, but the DAG links them through their common symptom, "頭痛" (headache). The analysis system 100 can represent the relationships among a large number of documents as a DAG. It can also grow the DAG automatically by analyzing further documents and adding them. For these reasons, the analysis system 100 can also serve as a learning engine for AI.

FIG. 4 shows a more specific example of the configuration of the analysis system 100. Each means of the analysis system 100 may be realized by any hardware configuration or by a program.

The phrase decomposition unit 20 includes a word decomposition means 22, a word classification means 24, and a phrase assembly means 26. The word decomposition means 22 decomposes the analysis target sentence into words.

The word classification means 24 classifies the decomposed words into predetermined categories. These include, for example, kanji, hiragana, sequential conjunctive particles, parallel conjunctive particles, Japanese commas (、), adverbial particles, and Japanese periods (。). The word classification means 24 may then give each word an identification code corresponding to its category.

For example, among hiragana words, it assigns identification codes to words that match:
- the sequential particle "は";
- the parallel particle "や";
- the adverbial particle "など"; and
- the punctuation marks "、" and "。".

The phrase assembly means 26 assembles phrases based on the identification codes. It joins words that have no identification code, continuing until a following word that has an identification code appears. This produces a phrase or a compound phrase.

The particle table setting unit 30 has a particle table acquisition means 32 and a particle table storage means 34. The particle table acquisition means 32 acquires the particle table described later, and the particle table storage means 34 stores it. The word classification means 24 reads the particle table from the particle table storage means 34.

The DAG generation unit 40 includes a DAG creation means 42, a node joining means 44, and a closed-loop check means 46:
- The DAG creation means 42 creates a DAG from the analysis target sentence.
- The node joining means 44 joins nodes of the generated DAG with nodes of other DAGs.
- The closed-loop check means 46 checks the DAG produced by the node joining means 44 for closed loops (cycles).

The DAG generation unit 40 may also be able to correct closed loops in the links by means of topological sorting. Combining Kahn's algorithm with Tarjan's algorithm, it can identify the nodes and links that cause a closed loop. If a loop exists, it creates a new destination node for the offending link and redirects the link there, which removes the loop.
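The patent does not say exactly how the two methods are combined. One plausible sketch splits the work:
- Kahn's algorithm reports whether a topological order exists at all.
- Tarjan's strongly-connected-components algorithm pinpoints which nodes form each loop.

The function names are illustrative. The recursive Tarjan implementation suits sentence-sized graphs but may hit Python's recursion limit on very large ones.

```python
from collections import defaultdict, deque


def kahn_order(nodes, links):
    """Kahn's algorithm. Returns (topological order, has_cycle)."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for src, dst in links:
        succ[src].append(dst)
        indeg[dst] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    # Nodes left out of the order sit on, or downstream of, a cycle.
    return order, len(order) < len(nodes)


def tarjan_cycles(nodes, links):
    """Tarjan's SCC algorithm. Returns the node sets that form closed loops."""
    succ = defaultdict(list)
    for src, dst in links:
        succ[src].append(dst)
    index, low, on_stack, stack, result = {}, {}, set(), [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            if len(comp) > 1 or v in succ[v]:   # a real loop (or self-loop)
                result.append(sorted(comp))

    for v in nodes:
        if v not in index:
            visit(v)
    return result
```

Kahn alone only says that a loop exists: the nodes it cannot order include everything downstream of the loop. Tarjan narrows this down to the nodes actually on the loop, which are the candidates for redirecting a link.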

The duplicate word setting unit 70 has a duplicate word table acquisition means 72 and a duplicate word table storage means 74. The duplicate word table acquisition means 72 acquires the duplicate word table described later, and the duplicate word table storage means 74 stores it. The duplicate word table storage means 74 may output the stored table to the node joining means 44.

FIG. 5A shows an example of a word decomposition table. The word decomposition table of this example may be used to generate the DAG of FIG. 3. It has columns for number, word, classification, and identification code, and each word is given a classification and an identification code.

The phrase decomposition unit 20 classifies each decomposed word into a predetermined category. The categories include, for example, kanji, hiragana, sequential conjunctive particles, parallel conjunctive particles, commas, adverbial particles, and periods. The word classification method is not limited to this example.

Words used to assemble phrases are given identification codes corresponding to their category. In one example, the phrase decomposition unit 20 assigns identification codes to predetermined hiragana words in the word decomposition table:

| Word | Role | Identification code |
|---|---|---|
| "な", "は" | sequential conjunctive particles | 2 |
| "や" | parallel conjunctive particle | 3 |
| "、" | comma | 4 |
| "などである" | adverbial particle that gathers parallel connections into one | 5 |
| "。" | period | 13 |

The phrase decomposition unit 20 need not assign an identification code to words that are not used for phrase assembly. In this example, it assigns "0" as the identification code to such words.

FIG. 5B shows an example of a particle table. The particle table of this example may be used to generate the DAG of FIG. 3.

The particle table stores the particles corresponding to each classification:

| Classification | Particles | Identification code |
|---|---|---|
| Sequential | "な", "は" | 2 |
| Parallel | "や" | 3 |
| Multiple-sequential | "など" | 5 |
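The code-assignment step can be sketched as a dictionary lookup. For brevity, this sketch merges the particle table entries with the punctuation codes given for the word decomposition table ("、" = 4, "。" = 13). It also assumes exact whole-word matching; a real implementation would need more care with hiragana runs that contain a particle. `PARTICLE_TABLE` and `assign_codes` are hypothetical names.

```python
# Identification codes as described for FIG. 5A/5B; every other word gets 0.
PARTICLE_TABLE = {
    "な": 2,     # sequential
    "は": 2,     # sequential
    "や": 3,     # parallel
    "など": 5,   # multiple-sequential (gathers parallel branches)
    "、": 4,     # comma
    "。": 13,    # period
}


def assign_codes(words):
    """Attach an identification code to each word by exact table lookup."""
    return [(word, PARTICLE_TABLE.get(word, 0)) for word in words]
```

For the words of "一般的な風邪の症状は", "な" and "は" receive code 2. "の" is not in the table, so it receives 0, as in the phrase assembly example.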

FIG. 5C shows an example of a phrase assembly table. The phrase assembly table of this example may be used to generate the DAG of FIG. 3. It has, for example, columns for number, phrase or compound phrase, classification, division, and identification code. Each phrase or compound phrase is given a classification, a division, and an identification code.

The phrase decomposition unit 20 concatenates words without an identification code into a phrase or compound phrase. This keeps the character strings that become DAG nodes as meaningful chunks, which makes them easier to handle. For example, "一般的な" is formed because "一般的" has identification code "0" and "な" has identification code "2". Like the particle "な" it ends with, the phrase "一般的な" is classified as a sequential conjunctive particle and given identification code "2".

Similarly, "風邪の症状は" is formed because "風邪", "の", and "症状" have identification code "0" and "は" has identification code "2". Like the particle "は", the phrase "風邪の症状は" is classified as a sequential conjunctive particle and given identification code "2".
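The assembly rule in the two examples above can be sketched as follows. The sketch rests on three assumptions inferred from those examples:
- Untagged words (code 0) accumulate until a tagged word appears.
- The tagged word closes the phrase, and the phrase inherits its code.
- Trailing untagged words form a final phrase with code 0.

Stripping punctuation (the FIG. 3 nodes show "鼻みず" without its comma) is not modeled. The name `assemble_phrases` is hypothetical.

```python
def assemble_phrases(coded_words):
    """Join untagged words (code 0) onto the next tagged word.

    coded_words: list of (word, code) pairs in sentence order.
    Returns a list of (phrase, code) pairs.
    """
    phrases, buffer = [], []
    for word, code in coded_words:
        buffer.append(word)
        if code != 0:                      # a tagged word closes the phrase
            phrases.append(("".join(buffer), code))
            buffer = []
    if buffer:                             # leftover untagged words
        phrases.append(("".join(buffer), 0))
    return phrases
```

Fed the coded words of "一般的な風邪の症状は", it returns `[("一般的な", 2), ("風邪の症状は", 2)]`, matching rows 1 and 2 of the phrase assembly example.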

FIG. 5D shows an example of a connection pattern table. The connection pattern table of this example may be used to generate the DAG of FIG. 3. It stores the patterns of identification codes assigned to the sequence of phrases in an analysis target sentence. The DAG generation unit 40 uses these patterns to create the DAG for one chain.

In one example, the DAG generation unit 40 creates a single-chain DAG based on the pattern 2-2-3-4-4-5 in the phrase assembly table. The parts of the pattern mean the following:
- 2-2 indicates a serial connection.
- In 2-3, the 2 marks the node at which a branch begins.
- 3-4-4 indicates parallel links.
- 5 marks the node at which the parallel connections converge.

From this pattern, the DAG generation unit 40 creates the links (A-1)→(B-1), (B-1)→(C-1), (B-1)→(C-2), (B-1)→(C-3), (C-1)→(D-1), (C-2)→(D-1), and (C-3)→(D-1).
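The pattern rule can be sketched in Python. The sketch assumes one reading of the pattern that the text suggests:
- A phrase coded 3 or 4 is a parallel member linked from the node just before the group.
- A phrase coded 5 is the convergence node.
- Any other code continues the serial chain.

The function name `build_links`, and the choice to let any non-parallel node close an open group, are illustrative assumptions.

```python
def build_links(nodes):
    """nodes: list of (node_id, code) in sentence order.
    Returns (source, destination) links under the assumed pattern rules."""
    links = []
    prev = []          # node the next serial node hangs off
    branch_from = []   # node that a parallel group fans out from
    group = []         # parallel members waiting to converge
    for node_id, code in nodes:
        if code in (3, 4):                     # parallel member
            if not group:
                branch_from = prev             # the preceding node starts the branch
            links.extend((p, node_id) for p in branch_from)
            group.append(node_id)
        else:
            sources = group if group else prev  # converge, or continue serially
            links.extend((s, node_id) for s in sources)
            group = []
            prev = [node_id]
    return links


FIG3_FIRST_SENTENCE = [("A-1", 2), ("B-1", 2), ("C-1", 3), ("C-2", 4), ("C-3", 4), ("D-1", 5)]
```

`build_links(FIG3_FIRST_SENTENCE)` yields exactly the seven links listed above, from (A-1)→(B-1) through (C-3)→(D-1).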

FIG. 5E shows an example of a duplicate word table. The duplicate word table of this example may be used to generate the DAG of FIG. 3. It registers the duplicate words to be shared when the DAG is created; here, "頭痛" (headache) and "高熱" (high fever) are registered. If "頭痛" or "高熱" appears as a DAG node, that node is therefore created as a shared node. The analysis system 100 creates one chain of the DAG for each analysis target sentence. When the same word appears in more than one chain, the chains are connected through that duplicate word, which captures the relationships between paragraphs.
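Node sharing can be sketched by choosing node IDs so that registered duplicate words map to a single ID across sentences. Every other phrase gets a per-sentence ID. The ID scheme (`shared:<text>`, `s<sentence>:<position>`) and the name `merge_sentences` are hypothetical.

```python
def merge_sentences(sentences, shared_words):
    """Assign node IDs so that phrases in `shared_words` map to one node across
    sentences; every other phrase gets its own node.

    Returns (node_text, per_sentence): node ID -> text, and each sentence's IDs.
    """
    node_text, shared_ids, per_sentence = {}, {}, []
    for s_idx, phrases in enumerate(sentences):
        ids = []
        for p_idx, text in enumerate(phrases):
            if text in shared_words:
                node_id = shared_ids.setdefault(text, f"shared:{text}")
            else:
                node_id = f"s{s_idx}:{p_idx}"
            node_text[node_id] = text
            ids.append(node_id)
        per_sentence.append(ids)
    return node_text, per_sentence


S1 = ["一般的な", "風邪の症状は", "鼻みず", "咳", "頭痛", "などである。"]
S2 = ["インフルエンザの症状は", "頭痛", "筋肉痛", "高熱", "などである。"]
```

With `shared_words={"頭痛", "高熱"}`, both sentences point at the same "頭痛" node, giving 10 nodes instead of 11. "などである。" is not registered, so it stays as two separate nodes, as with (D-1) and (G-1) in FIG. 3.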

The closed loop check means 46 may check for a closed loop (a cycle) when nodes are shared. If a closed loop exists, the DAG generation unit 40 creates a new node to break the loop.
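A closed-loop check can be sketched in Python with Kahn's topological sort. The graph is acyclic exactly when every node can be removed in topological order. The description does not say how the replacement node is chosen. Copying the target node's content into a fresh node, as `link_without_loop` does, is an assumption. Both function names are hypothetical.

```python
from collections import defaultdict, deque


def has_closed_loop(links):
    """Return True if the directed graph given by (src, dst) pairs has a cycle.

    Kahn's topological sort: nodes whose in-degree never drops to zero
    lie on a closed loop or downstream of one.
    """
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in links:
        succ[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed < len(nodes)


def link_without_loop(links, src, dst, nodes, copy_id):
    """Add src -> dst, or link to a new copy of dst if that would close a loop.

    Copying dst's content into a fresh node is one assumed way of
    "creating a new node to break the loop". Returns the id actually linked.
    """
    if has_closed_loop(links + [(src, dst)]):
        nodes[copy_id] = nodes[dst]
        links.append((src, copy_id))
        return copy_id
    links.append((src, dst))
    return dst
```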

FIG. 5F shows an example of a node table. The node table is an example of DAG structure information. The node table of this example may be used to generate the DAG of FIG. 3. The node table registers node numbers and the content corresponding to each node number.

Specifically, the node table registers the following contents: "一般的な" (common) for node (A-1), "風邪の症状は" (the symptoms of a cold, as the topic) for node (B-1), "鼻みず" (runny nose) for node (C-1), "咳" (cough) for node (C-2), "頭痛" (headache) for node (C-3), and "などである。" (and so on.) for node (D-1).

FIG. 5G shows an example of a link table. The link table is an example of DAG structure information. The link table of this example may be used to generate the DAG of FIG. 3. The link table registers the link source and the link destination of each node. The link sources and link destinations may be determined based on the connection pattern table.

Specifically, node (A-1) in the node table is created from the information of "number 1" in the phrase assembly table. The identification code of "number 1" is "2", which indicates a sequential (forward) connection. The identification code of "number 2" is also "2", so the link is serial. Therefore, node (B-1) is registered as the link destination of the link source node (A-1).

Next, the identification code of "number 3" in the phrase assembly table is "3", and the identification codes of "number 4" and "number 5" are "4", which indicates a parallel structure. Because the identification code of "number 2" is "2", the parallel structure begins with the phrase following "number 2". Because the identification code of "number 6" is "5", the preceding parallel phrases converge at the phrase "number 6". Therefore, nodes (C-1) to (C-3) are registered as link destinations of the link source node (B-1), and nodes (C-1) to (C-3) are registered as link sources of the link destination node (D-1).

The DAG generation unit 40 can create the DAG by connecting nodes according to the node table and the link table. The DAG generation unit 40 may create the link table in parallel with the node table. The output unit 50 may output the node table and the link table as DAG data.
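As one illustration that the description itself does not contain, the FIG. 5F node table and FIG. 5G link table can be loaded into an adjacency list and walked from the root node. Each root-to-sink path then corresponds to one reading of the sentence. The dictionary layout and the helpers `build_adjacency` and `paths_from` are hypothetical.

```python
from collections import defaultdict

# Node table (FIG. 5F) and link table (FIG. 5G) for the FIG. 3 sentence.
node_table = {
    "A-1": "一般的な", "B-1": "風邪の症状は", "C-1": "鼻みず",
    "C-2": "咳", "C-3": "頭痛", "D-1": "などである。",
}
link_table = [
    ("A-1", "B-1"), ("B-1", "C-1"), ("B-1", "C-2"), ("B-1", "C-3"),
    ("C-1", "D-1"), ("C-2", "D-1"), ("C-3", "D-1"),
]


def build_adjacency(links):
    """Map each link source to the list of its link destinations."""
    succ = defaultdict(list)
    for src, dst in links:
        succ[src].append(dst)
    return succ


def paths_from(node, succ):
    """Yield every path (list of node ids) from `node` to a sink node."""
    if not succ[node]:
        yield [node]
        return
    for nxt in succ[node]:
        for rest in paths_from(nxt, succ):
            yield [node] + rest


succ = build_adjacency(link_table)
for path in paths_from("A-1", succ):
    print("".join(node_table[n] for n in path))
# 一般的な風邪の症状は鼻みずなどである。
# 一般的な風邪の症状は咳などである。
# 一般的な風邪の症状は頭痛などである。
```

Because parallel phrases share their predecessor and successor nodes, a query such as "what follows 風邪の症状は" becomes a lookup of `succ["B-1"]`.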

FIG. 6 shows an example of a GUI screen used by the analysis system 100. On this GUI screen, the user enters the analysis target sentence in the input field for the analysis target sentence. The sentence may be entered from a keyboard or by another input method such as voice input. The sentence acquisition unit 10 may also acquire the analysis target sentence from a medium storing digital data. Alternatively, it may use a program to read HTML documents that are associated with one another by links on the Internet. When the user clicks the analysis execution button after entering the sentence, the analysis system 100 analyzes it. The GUI screen is not limited to this example.

FIG. 7 shows an example of a configuration in which the analysis system 100 is implemented in hardware. The analysis system 100 includes a CPU 110, a main memory 120, an HDD 130, an input device 140, and a display 150. This hardware configuration is only an example and is not limiting.

The CPU 110 executes the arithmetic processing that implements the sentence acquisition unit 10, the phrase decomposition unit 20, and the DAG generation unit 40. For example, the CPU 110 refers to a program loaded into the main memory 120 and executes processing according to the procedure described by the program.

The main memory 120 stores a program for acquiring analysis target sentences, a phrase decomposition program, and a DAG generation program. The main memory 120 may also store other programs as appropriate. A plurality of addresses may be assigned to the main memory 120. The CPU 110 can perform arithmetic processing on stored data by specifying an address and accessing the data there.

The HDD 130 operates as a storage unit for the tables. For example, the HDD 130 stores the DAG structure information, the word decomposition table, the phrase assembly table, and the particle table. A plurality of addresses may also be assigned to the HDD 130.

The input device 140 is a user input device for entering text, such as a keyboard.

The display 150 displays the GUI screen for entering the analysis target sentence. The display 150 may also display information needed to operate the analysis system 100, as well as the analysis results of the analysis system 100.

The hardware components of the analysis system 100 may be interconnected by a data communication path such as a system bus. Over this path, the components exchange information and carry out processing.

The analysis system 100 automatically analyzes the analysis target sentence to generate DAG structure information and records it on the HDD 130. For example, when the user enters a sentence, the CPU 110 executes the phrase decomposition program, generates the tables, and records them on the HDD 130.

For example, suppose the CPU 110 acquires the sentence "一般的な風邪の症状は、鼻みず、咳、頭痛などである。" ("Common cold symptoms include a runny nose, cough, headache, and so on.") as the analysis target sentence. The CPU 110 performs word decomposition on the sentence and creates a word decomposition table. It then assigns identification codes in the word decomposition table using the particle table, which it has read from the HDD 130 into the main memory 120. After that, the CPU 110 performs phrase assembly, creates a phrase assembly table, and records it on the HDD 130.

Next, the CPU 110 executes the DAG generation program, which interprets the identification-code patterns in the phrase assembly table, and creates the node table and the link table. The CPU 110 then executes the DAG data output program and outputs the node table and the link table.

FIG. 8 is a conceptual diagram showing an example of a DAG generated by the DAG generation unit 40. In this example, the analysis system 100 adds data to the DAG structure information as sentences are added.

The sentence acquisition unit 10 acquires two analysis target sentences. The first is "インフルエンザの症状は、頭痛、筋肉痛、高熱などである。" ("The symptoms of influenza include headache, muscle pain, high fever, and so on."). The second is "高熱の場合は解熱剤の投与が必要である。" ("In the case of high fever, an antipyretic must be administered."). In this example, "は" (the topic particle) and "など" ("and so on") are registered in the particle table, and "高熱" (high fever) is registered in the duplicate word table.

The analysis system 100 has created DAG data (D-1, E-1, E-2, E-3, F-1) from the first sentence. When the second sentence is added, its "高熱" matches node (E-3), and "高熱" is registered in the duplicate word table. Node (E-3) is therefore shared, and node (F-2) is linked to node (E-3).

In this example, the analysis system 100 uses the match-rate calculation described below to treat "高熱" and "高熱の場合には" as matching. On that basis, it adds the data to the DAG structure information.

FIG. 9 is a conceptual diagram showing an example of a DAG generated by the DAG generation unit 40. In this example, "は" and "が" are registered in the particle table.

The sentence acquisition unit 10 acquires two analysis target sentences. The first is "インフルエンザの処方は日本ではＡ薬が投与される。" ("As the influenza prescription (処方), drug A is administered in Japan."). The second is "インフルエンザの処方箋はアメリカではＢ薬が投与される。" ("As the influenza prescription slip (処方箋), drug B is administered in the United States.").

The analysis system 100 has created DAG data (A-1, B-1, C-1) from the first sentence and adds the DAG data of the second sentence to it.

In this example, the DAG generation unit 40 calculates the match rate between the words in a phrase or compound phrase of one analysis target sentence and the words in a phrase or compound phrase of another analysis target sentence. The DAG generation unit 40 shares a node when the match rate is equal to or greater than a predetermined threshold.

The node joining means 44 calculates the match rate between the words in "インフルエンザの処方" (influenza prescription) and the words in "インフルエンザの処方箋" (influenza prescription slip). For example, it compares the words "インフルエンザ" and "処方" of node (A-1) in the first sentence with the words "インフルエンザ" and "処方箋" of the second sentence.

For example, "インフルエンザ" matches completely (100%), and "処方" and "処方箋" share 2 of 3 characters (66%). The overall match rate is therefore 166/200 = 83%. If the match rate required for sharing is set to 80% or more, the analysis system 100 judges that the phrases "インフルエンザの処方" and "インフルエンザの処方箋" match and shares the node. The match rate between node (B-1) and node (B-2), in contrast, is 0%, so node (B-2) is added as a new node. The match rate for node (C-1) is 100%, so that node is shared. As a result, the analysis system 100 links the second sentence as (A-1)-(B-2)-(C-1).
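The description gives only the arithmetic for the match rate, so the following Python sketch relies on several assumptions. Words are paired in order, characters are compared position by position, each word's score is divided by the length of the longer word, and the word scores are averaged. These choices reproduce the 2/3 figure for "処方" and "処方箋". The function names are hypothetical. The description truncates 2/3 to 66%, which gives 83%; exact arithmetic gives 83.3%.

```python
def word_match_rate(a, b):
    """Fraction of aligned character positions that agree, over the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


def phrase_match_rate(words_a, words_b):
    """Average of per-word match rates, pairing the words of two phrases in order."""
    n = max(len(words_a), len(words_b))
    if n == 0:
        return 0.0
    return sum(word_match_rate(a, b) for a, b in zip(words_a, words_b)) / n


THRESHOLD = 0.80  # sharing threshold used in the example above

rate = phrase_match_rate(["インフルエンザ", "処方"], ["インフルエンザ", "処方箋"])
print(f"{rate:.1%}")        # 83.3%
print(rate >= THRESHOLD)    # True
```

Positional comparison is the simplest rule that matches the stated numbers. An edit-distance or longest-common-subsequence ratio would be a more robust alternative, but the source does not specify one.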

As described above, the analysis system 100 analyzes sentences with natural language processing and DAGs. This lets it perform semantic analysis of large document collections ("big data") and analyze their context. In particular, it can take into account causal relationships among symptoms and diseases that are not directly related to the search keyword.

Moreover, the analysis system 100 does more than translate or extract sentence features, as conventional natural language processing does: it analyzes text using a DAG data structure. It can therefore correctly store, as data, the causal relationships that matter when analyzing large bodies of documents, such as documents on machine failures or disease diagnosis. In this way, the analysis system 100 obtains a DAG data structure in which documents are decomposed into phrases with attention to causal relationships.

The analysis system 100 is not limited to Japanese and can be applied to other languages in the same way. In that case, storing tables specific to each language allows the system to handle the grammar and vocabulary of languages other than Japanese.

FIG. 10 shows an example of the hardware configuration of a computer 1900 that functions as the analysis system 100. A plurality of computers may also cooperate to function as the analysis system 100.

The computer 1900 according to the embodiment includes three sections. The first is a CPU peripheral section with a CPU 2000, a RAM 2020, a graphic controller 2075, and a display device 2080, which are interconnected by a host controller 2082. The second is an input/output section with a communication interface 2030, a hard disk drive 2040, and a DVD drive 2060, which are connected to the host controller 2082 by an input/output controller 2084. The third is a legacy input/output section with a ROM 2010, a flexible disk drive 2050, and an input/output chip 2070, which are connected to the input/output controller 2084.

The host controller 2082 connects the RAM 2020 to the CPU 2000 and the graphic controller 2075, which access the RAM 2020 at a high transfer rate. The CPU 2000 operates based on programs stored in the ROM 2010 and the RAM 2020 and controls each unit. The graphic controller 2075 acquires image data that the CPU 2000 or another component generates in a frame buffer provided in the RAM 2020, and displays that data on the display device 2080. Alternatively, the graphic controller 2075 may contain its own frame buffer for storing image data generated by the CPU 2000 or another component.

The input/output controller 2084 connects the host controller 2082 to the communication interface 2030, the hard disk drive 2040, and the DVD drive 2060, which are relatively high-speed input/output devices. The communication interface 2030 communicates with other devices over a network. The hard disk drive 2040 stores programs and data used by the CPU 2000 in the computer 1900. The DVD drive 2060 reads programs or data from a DVD-ROM 2095 and provides them to the hard disk drive 2040 via the RAM 2020.

The input/output controller 2084 is also connected to the ROM 2010 and to two relatively low-speed input/output devices: the flexible disk drive 2050 and the input/output chip 2070. The ROM 2010 stores a boot program that the computer 1900 executes at startup and/or programs that depend on the hardware of the computer 1900. The flexible disk drive 2050 reads programs or data from a flexible disk 2090 and provides them to the hard disk drive 2040 via the RAM 2020. The input/output chip 2070 connects the flexible disk drive 2050 to the input/output controller 2084. It also connects various input/output devices to the input/output controller 2084 via, for example, a parallel port, a serial port, a keyboard port, or a mouse port.

The program provided to the hard disk drive 2040 via the RAM 2020 is stored on a recording medium such as the flexible disk 2090, the DVD-ROM 2095, or an IC card, and is provided by the user. The program is read from the recording medium, installed on the hard disk drive 2040 of the computer 1900 via the RAM 2020, and executed by the CPU 2000. Once installed, the program causes the computer 1900 to function as each component of the analysis system 100.

When the information processing described in the program is read into the computer 1900, it functions as at least part of the following units: the sentence acquisition unit 10, the phrase decomposition unit 20, the particle table setting unit 30, the DAG generation unit 40, the output unit 50, the DAG structure information acquisition unit 60, and the duplicate word setting unit 70. These units are concrete means realized by the software cooperating with the hardware resources described above. Using these concrete means to compute or process information according to the intended use of the computer 1900 in this embodiment constructs an analysis system 100 specific to that use.

For example, when the computer 1900 communicates with an external device, the CPU 2000 executes a communication program loaded into the RAM 2020. Based on the processing described in that program, the CPU 2000 instructs the communication interface 2030 to communicate. Under the control of the CPU 2000, the communication interface 2030 reads transmission data from a transmission buffer area in a storage device and sends it to the network. Such storage devices include the RAM 2020, the hard disk drive 2040, the flexible disk 2090, and the DVD-ROM 2095. Conversely, the communication interface 2030 writes data received from the network into a reception buffer area in the storage device. The communication interface 2030 may transfer data to and from the storage device by DMA (direct memory access). Alternatively, the CPU 2000 may transfer the data itself, reading it from the source storage device or communication interface 2030 and writing it to the destination communication interface 2030 or storage device.

The CPU 2000 also reads all or a necessary part of a file or database into the RAM 2020 by DMA transfer or a similar method, and processes the data in the RAM 2020. The file or database may be stored in an external storage device such as the hard disk drive 2040, the DVD drive 2060 (DVD-ROM 2095), or the flexible disk drive 2050 (flexible disk 2090). The CPU 2000 then writes the processed data back to the external storage device by DMA transfer or a similar method. In such processing, the RAM 2020 can be regarded as temporarily holding the contents of the external storage device. This embodiment therefore refers to the RAM 2020, the external storage devices, and the like collectively as a memory, a storage unit, or a storage device. Programs, data, tables, databases, and other information in this embodiment are stored in such a storage device and are subject to information processing. The CPU 2000 can also hold part of the RAM 2020 in a cache memory and read from and write to that cache memory. Because the cache memory then performs part of the function of the RAM 2020, this embodiment includes the cache memory in the RAM 2020, the memory, and/or the storage device unless otherwise indicated.

The CPU 2000 performs various processes on data read from the RAM 2020 and writes the results back to the RAM 2020. These processes are specified by the program's instruction sequence and described in this embodiment, and include arithmetic operations, information processing, condition evaluation, and information search and replacement. For example, when evaluating a condition, the CPU 2000 determines whether a variable described in this embodiment satisfies a condition relative to another variable or constant. Such conditions include greater than, less than, greater than or equal to, less than or equal to, and equal to. When the condition is satisfied (or not satisfied), the CPU 2000 branches to a different instruction sequence or calls a subroutine.

The CPU 2000 can also search for information stored in a file, database, or similar structure in the storage device. For example, suppose the storage device stores multiple entries, each associating an attribute value of a first attribute with an attribute value of a second attribute. The CPU 2000 can search these entries for one whose first-attribute value matches a specified condition and read the second-attribute value stored in that entry. This yields the second-attribute value associated with the first attribute that satisfies the condition.

The programs or modules described above may be stored on an external recording medium. Besides the flexible disk 2090 and the DVD-ROM 2095, usable recording media include optical media such as DVD, Blu-ray (registered trademark), and CD; magneto-optical media such as MO; tape media; and semiconductor memories such as IC cards. A storage device such as a hard disk or RAM in a server system connected to a dedicated communication network or the Internet may also serve as the recording medium, with the program provided to the computer 1900 over the network.

Although the present invention has been described above using embodiments, the technical scope of the invention is not limited to the scope described in those embodiments. It will be apparent to those skilled in the art that various changes or improvements can be made to the above embodiments. It is clear from the claims that forms incorporating such changes or improvements can also fall within the technical scope of the present invention.

Note that the processes in the devices, systems, programs, and methods shown in the claims, specification, and drawings can be executed in any order. These processes include operations, procedures, steps, and stages. The exceptions are cases where the order is explicitly indicated by terms such as "before" or "prior to", and cases where the output of an earlier process is used in a later process. Even where the operation flows in the claims, specification, and drawings are described with terms such as "first" and "next" for convenience, this does not mean they must be performed in that order.

10 ... sentence acquisition unit, 20 ... phrase decomposition unit, 22 ... word decomposition means, 24 ... word classification means, 26 ... phrase assembly means, 30 ... particle table setting unit, 32 ... particle table acquisition means, 34 ... particle table storage means, 40 ... DAG generation unit, 42 ... DAG creation means, 44 ... node joining means, 46 ... closed loop check means, 50 ... output unit, 60 ... DAG structure information acquisition unit, 70 ... duplicate word setting unit, 72 ... duplicate word table acquisition means, 74 ... duplicate word table storage means, 100 ... analysis system, 110 ... CPU, 120 ... main memory, 130 ... HDD, 140 ... input device, 150 ... display, 1900 ... computer, 2000 ... CPU, 2010 ... ROM, 2020 ... RAM, 2030 ... communication interface, 2040 ... hard disk drive, 2050 ... flexible disk drive, 2060 ... DVD drive, 2070 ... input/output chip, 2075 ... graphic controller, 2080 ... display device, 2082 ... host controller, 2084 ... input/output controller, 2090 ... flexible disk, 2095 ... DVD-ROM

Claims

An analysis system comprising:
a sentence acquisition unit that acquires an analysis target sentence to be analyzed;
a phrase decomposition unit that generates phrase data by decomposing the analysis target sentence into phrases or compound phrases; and
a DAG generation unit that generates DAG data from the phrases or compound phrases for parsing the phrase data, wherein
the phrase decomposition unit decomposes the analysis target sentence into words and classifies each of the words into predetermined categories,
the analysis system further comprises a particle table setting unit that creates a particle table associating the words with a predetermined identification code for each category, and
the phrase decomposition unit generates phrases or compound phrases by concatenating the decomposed words based on the particle table.

The analysis system according to claim 1, wherein the DAG generation unit generates the DAG data from the phrases or compound phrases based on a connection pattern table storing patterns of the identification codes attached to a series of phrases or compound phrases of the analysis target sentence.

3. The analysis system according to claim 1 or 2, further comprising a DAG structure information acquisition unit that acquires DAG structure information including structure data of past DAG data, wherein the DAG generation unit generates the DAG data based on the DAG structure information.
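Claim 3 does not say how past DAG structure information is used. One plausible reading, sketched here purely as an assumption, is to mine the edges of previously generated DAGs for recurring pairs of identification codes and reuse the frequent pairs as connection patterns. The `min_count` cutoff and the (phrases, edges) record format are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, str]  # (phrase text, concatenated identification codes)
PastDag = Tuple[List[Phrase], Dict[int, List[int]]]  # (phrases, edges)


def patterns_from_past_dags(
    past_dags: List[PastDag], min_count: int = 1
) -> Set[Tuple[str, str]]:
    """Collect (source codes, target codes) pairs seen on edges of past DAGs.

    Pairs observed at least `min_count` times are returned as connection
    patterns that a DAG generator could reuse for new sentences.
    """
    counts: Counter = Counter()
    for phrases, edges in past_dags:
        for src, targets in edges.items():
            for dst in targets:
                counts[(phrases[src][1], phrases[dst][1])] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


past = [
    ([("私は", "NP"), ("本を", "NP"), ("読む", "V")], {0: [2], 1: [2], 2: []}),
    ([("猫が", "NP"), ("鳴く", "V")], {0: [1], 1: []}),
]
print(patterns_from_past_dags(past, min_count=2))  # {('NP', 'V')}
```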

4. The analysis system according to any one of claims 1 to 3, further comprising a duplicate word setting unit having a duplicate word table in which duplicate words permitted to be shared nodes of the DAG data are registered, wherein, when content of the DAG data matches a duplicate word registered in the duplicate word table, the DAG generation unit sets the node corresponding to that content as a shared node.
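The shared-node behavior of claim 4 can be sketched as a combined graph that reuses a node only when its label appears in the duplicate word table. The reference signs also list a closed-loop check means (46); the `has_cycle` function below is a guess at what such a check might do, implemented here with Kahn's topological sort. The class name, method names, and example data are hypothetical.

```python
from typing import Dict, List, Set


class SharedDag:
    """Combined DAG in which nodes for registered duplicate words are shared."""

    def __init__(self, duplicate_words: Set[str]) -> None:
        self.duplicate_words = duplicate_words  # hypothetical duplicate word table
        self.labels: List[str] = []
        self.edges: Dict[int, Set[int]] = {}
        self._shared: Dict[str, int] = {}  # duplicate word -> shared node id

    def _node_for(self, label: str) -> int:
        # Reuse an existing node only if the label is a registered duplicate word.
        if label in self._shared:
            return self._shared[label]
        node = len(self.labels)
        self.labels.append(label)
        self.edges[node] = set()
        if label in self.duplicate_words:
            self._shared[label] = node
        return node

    def add_sentence(self, labels: List[str], edges: Dict[int, List[int]]) -> List[int]:
        """Merge one sentence's DAG; return the combined-graph id of each phrase."""
        ids = [self._node_for(label) for label in labels]
        for src, targets in edges.items():
            for dst in targets:
                self.edges[ids[src]].add(ids[dst])
        return ids


def has_cycle(edges: Dict[int, Set[int]]) -> bool:
    """Closed-loop check: True if Kahn's topological sort cannot visit every node."""
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for dst in targets:
            indegree[dst] += 1
    ready = [node for node, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for dst in edges[node]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return visited != len(edges)


dag = SharedDag(duplicate_words={"読む"})
dag.add_sentence(["私は", "本を", "読む"], {0: [2], 1: [2], 2: []})
ids = dag.add_sentence(["彼は", "読む"], {0: [1], 1: []})
print(dag.labels, ids, has_cycle(dag.edges))
```

Here the second sentence's "読む" maps onto the existing node 2, so "彼は" (node 3) and the first sentence's phrases all point at one shared node, and the combined graph remains acyclic. A label not in the table, such as a repeated "私は", would get a fresh node instead.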

5. The analysis system according to any one of claims 1 to 4, wherein the DAG generation unit calculates a match rate between the words included in a phrase or compound phrase of the analysis target sentence and the words included in a phrase or compound phrase of another analysis target sentence, and shares a node when the match rate is equal to or higher than a predetermined threshold.
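Claim 5 leaves the match-rate formula open. The sketch below assumes Jaccard similarity (shared distinct words divided by all distinct words in either phrase) as a stand-in and returns the phrase pairs whose rate meets the threshold; both the formula and the function names are assumptions, not the patent's disclosed method.

```python
from typing import List, Tuple


def match_rate(words_a: List[str], words_b: List[str]) -> float:
    """Hypothetical match rate: Jaccard similarity of the two word sets."""
    a, b = set(words_a), set(words_b)
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)


def nodes_to_share(
    phrases_a: List[List[str]], phrases_b: List[List[str]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs of phrases whose match rate meets the threshold."""
    return [
        (i, j)
        for i, words_a in enumerate(phrases_a)
        for j, words_b in enumerate(phrases_b)
        if match_rate(words_a, words_b) >= threshold
    ]


sentence_a = [["私", "は"], ["本", "を"], ["読む"]]
sentence_b = [["彼", "は"], ["本", "を"], ["読む"]]
print(nodes_to_share(sentence_a, sentence_b, threshold=0.5))  # [(1, 1), (2, 2)]
```

Lowering the threshold to 0.3 would also pair "私は" with "彼は", whose rate is 1/3 because they share only the particle; the threshold therefore controls how aggressively loosely related phrases are merged.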

6. An analysis method comprising:
acquiring an analysis target sentence to be analyzed;
generating phrase data by decomposing the analysis target sentence into phrases or compound phrases; and
generating DAG data from the phrases or compound phrases for parsing of the phrase data,
wherein generating the phrase data includes:
decomposing the analysis target sentence into words and classifying each of the words into one of predetermined categories;
creating a particle table associating each word with a predetermined identification code for its category; and
generating a phrase or compound phrase by concatenating the decomposed words based on the particle table.

7. The analysis method according to claim 6, further comprising setting a duplicate word table in which duplicate words permitted to be shared nodes of the DAG data are registered, wherein generating the DAG data includes, when content of the DAG data matches a duplicate word registered in the duplicate word table, setting the node corresponding to that content as a shared node.

8. The analysis method according to claim 6 or 7, wherein generating the DAG data includes calculating a match rate between the words included in a phrase or compound phrase of the analysis target sentence and the words included in a phrase or compound phrase of another analysis target sentence, and sharing a node when the match rate is equal to or greater than a predetermined threshold.

9. A program for causing a computer to execute the analysis method according to any one of claims 6 to 8.