JP2010117832A

JP2010117832A - Related information extraction device, related information extraction method, program, and recording medium

Info

Publication number: JP2010117832A
Application number: JP2008289720A
Authority: JP
Inventors: Toru Hirano (平野徹); Yoshihiro Matsuo (松尾義博)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-11-12
Filing date: 2008-11-12
Publication date: 2010-05-27
Anticipated expiration: 2028-11-12
Also published as: JP5142395B2

Abstract

PROBLEM TO BE SOLVED: To provide a device, a method, a program, and a recording medium for extracting, with high precision, relation information between a plurality of named entities according to each individual case.

SOLUTION: This relation information extraction device extracts information related to a plurality of input named entities. It includes an analysis processing unit 10 and a relation information extraction processing unit 20. When a text containing the named entities is input, the analysis processing unit 10 performs morphological analysis on the input text and analyzes the dependencies between the phrases (bunsetsu) that make up the text. When the relation information extraction processing unit 20 acquires the analysis result from the analysis processing unit 10, it extracts at least one independent word contained in the input text as a relation information candidate. For each extracted candidate, it acquires relation estimation information indicating the degree to which that candidate is estimated to be relation information. It then extracts the relation information from the candidates on the basis of the analysis result and the relation estimation information.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a technique for extracting information related to a plurality of named entities. Such a technique plays an important role in systems such as summarization systems that summarize input text.

One conventional technique for extracting information related to a plurality of named entities works as follows. A large number of sentences containing two named entities are stored in advance in a predetermined storage device. These sentences are searched for words that occur between the named entities, or within a predetermined number of characters before or after each named entity. The most frequently found word is then extracted as the relation information (see, for example, Non-Patent Document 1).

As shown in FIG. 1, such a relation information extraction device includes a storage unit 1, a text acquisition unit 2, a target extraction unit 3, and a relation information extraction unit 4.

The storage unit 1 stores in advance a plurality of sentences that have already been processed by a known morphological analyzer. The text acquisition unit 2 retrieves from the storage unit the sentences that contain the two named entities entered through an input means such as a keyboard. For example, suppose the two named entities "小泉" (Koizumi) and "日本" (Japan) are input. The text acquisition unit 2 then retrieves sentences such as "日本の首相である小泉氏は来月韓国を訪問する。" ("Mr. Koizumi, the Prime Minister of Japan, will visit Korea next month.") and "日本の小泉首相は先月訪米しブッシュ米大統領と会談した。" ("Japan's Prime Minister Koizumi visited the United States last month and met with U.S. President Bush."). These sentences are stored in the following morphologically analyzed form:
日本 [Japan] (noun) / の (case particle) / 首相 [prime minister] (noun) / で (particle) / ある [be] (verb) / 小泉 [Koizumi] (noun) / 氏 [Mr.] (suffix) / は (particle) / 来月 [next month] (noun) / 韓国 [Korea] (noun) / を (case particle) / 訪問 [visit] (verb) / する (suffix) / 。 (full stop)
日本 [Japan] (noun) / の (case particle) / 小泉 [Koizumi] (noun) / 首相 [prime minister] (noun) / は (particle) / 先月 [last month] (noun) / 訪米 [visit the U.S.] (verb) / し (suffix) / ブッシュ [Bush] (noun) / 米 [U.S.] (noun) / 大統領 [president] (noun) / と (particle) / 会談 [hold talks] (verb) / した (suffix) / 。 (full stop)

The target extraction unit 3 takes the sentences retrieved by the text acquisition unit 2 and, using the two named entities, extracts two kinds of words. The first kind occurs between the named entities. The second kind occurs within 10 characters before or after a named entity. In the example sentences, 首相 (prime minister) and ある (be) are extracted as words between the named entities. The words 来月 (next month), 韓国 (Korea), 訪問 (visit), 首相 (prime minister), 先月 (last month), 訪米 (visit the U.S.), and ブッシュ (Bush) are extracted as words within 10 characters of a named entity.

Next, the target extraction unit 3 counts how many times each word was extracted and outputs each word together with its count. In the extraction result above, 首相 (prime minister) is extracted twice, and every other word is extracted once.

When the relation information extraction unit 4 receives the words and their counts from the target extraction unit 3, it outputs the most frequently extracted word as the relation information. In this case, the word 首相 (prime minister) is output as the relation information.

In this way, the word "prime minister" (首相) is extracted as the relation information for the two named entities "Koizumi" and "Japan".
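The counting procedure of units 2 to 4 can be sketched in Python. This sketch is a minimal illustration and is not the implementation of Non-Patent Document 1. It rests on several assumptions:

- Sentences are already tokenized into (surface, part-of-speech) pairs.
- Only nouns and verbs count as words.
- A word is counted at most once per sentence.
- A word is "within 10 characters" of an entity if it starts within 10 characters after the entity ends, or ends within 10 characters before the entity starts.

All function names are hypothetical. Under these assumptions, the sketch reproduces the counts in the example above.

```python
from collections import Counter

# Assumption: only these part-of-speech tags count as extractable words.
CONTENT_POS = {"noun", "verb"}


def char_spans(sentence):
    """Return (start, end) character offsets for each (surface, pos) morpheme."""
    spans, offset = [], 0
    for surface, _ in sentence:
        spans.append((offset, offset + len(surface)))
        offset += len(surface)
    return spans


def words_near_entities(sentence, entities, window=10):
    """Content words between the entities or within `window` characters of one."""
    spans = char_spans(sentence)
    ent_idx = [i for i, (surface, _) in enumerate(sentence) if surface in entities]
    if {sentence[i][0] for i in ent_idx} != set(entities):
        return set()  # the sentence does not contain every entity
    first, last = min(ent_idx), max(ent_idx)
    found = set()  # a set, so each word counts at most once per sentence
    for i, (surface, pos) in enumerate(sentence):
        if pos not in CONTENT_POS or i in ent_idx:
            continue
        start, end = spans[i]
        between = first < i < last
        near = any(
            spans[j][0] - window < end <= spans[j][0]      # just before entity j
            or spans[j][1] <= start < spans[j][1] + window  # just after entity j
            for j in ent_idx
        )
        if between or near:
            found.add(surface)
    return found


def most_frequent_word(sentences, entities, window=10):
    """Return (most frequent word or None, Counter of all extracted words)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(words_near_entities(sentence, entities, window))
    best = counts.most_common(1)[0][0] if counts else None
    return best, counts


s1 = [("日本", "noun"), ("の", "case particle"), ("首相", "noun"), ("で", "particle"),
      ("ある", "verb"), ("小泉", "noun"), ("氏", "suffix"), ("は", "particle"),
      ("来月", "noun"), ("韓国", "noun"), ("を", "case particle"), ("訪問", "verb"),
      ("する", "suffix"), ("。", "full stop")]
s2 = [("日本", "noun"), ("の", "case particle"), ("小泉", "noun"), ("首相", "noun"),
      ("は", "particle"), ("先月", "noun"), ("訪米", "verb"), ("し", "suffix"),
      ("ブッシュ", "noun"), ("米", "noun"), ("大統領", "noun"), ("と", "particle"),
      ("会談", "verb"), ("した", "suffix"), ("。", "full stop")]
best, counts = most_frequent_word([s1, s2], {"小泉", "日本"})
print(best, counts["首相"])  # 首相 2
```

Because each sentence contributes a set of words, 首相 is counted once from each of the two sentences. This matches the count of two given above.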

In the conventional relation information extraction process, accuracy can be improved by storing many texts that contain the two named entities (for example, "Koizumi" and "Japan") in the storage unit 1 in advance. The process can therefore extract information that represents latent relationships between these two named entities, such as 首相 and 総理 (both meaning "prime minister"). However, it has been difficult to extract information that represents a temporary relationship. An example is the relationship between "Junichiro Koizumi" and "Tokyo Station" in the case "小泉純一郎が東京駅で演説した。" ("Junichiro Koizumi gave a speech at Tokyo Station.").

In view of this, the present inventors have already proposed a relation information extraction device and method that can extract information representing a temporary relationship between named entities (see, for example, Patent Document 1).

This relation information extraction device comprises an analysis processing unit and a relation information extraction processing unit. When a text containing the named entities is input, the analysis processing unit performs morphological analysis on the input text and analyzes the dependencies between the phrases that make up the text. The relation information extraction processing unit acquires this analysis result. If the named entities appear in the same sentence, it considers the dependency analysis result linking the phrases that contain the named entities. It extracts as relation information the independent words in the phrase that has no head, that is, the phrase that does not depend on any other phrase. If the named entities appear in different sentences, it takes the phrase containing one of the named entities and extracts as relation information an independent word in that phrase other than the named entity itself.

As a result, the relation information between the named entities can be extracted from the input text. The relation information consists of independent words in the phrases of the input text, taken from one of two places. The first is the headless phrase in the dependency analysis result linking the phrases that contain the named entities. The second is the phrase that contains one of the named entities.

Non-Patent Document 1: Junichiro Mori and three others, "Extraction of Relationship Information between Entities from the Web", [online], December 2006, The Japanese Society for Artificial Intelligence, [retrieved January 22, 2007], Internet <URL: http://www.jstage.jst.go.jp/article/pjsai/JSAI06/0/12/#pdf/-char/ja/>
Patent Document 1: JP 2008-225566 A

The technique of Patent Document 1 can extract information that represents a temporary relationship. For example, in the case "Junichiro Koizumi gave a speech at Tokyo Station.", it can extract "speech" (演説) for "Junichiro Koizumi" and "Tokyo Station". However, an evaluation showed a precision of only 19% for extracting the optimal relation information and a recall of 33% for relation information in the same cases. It was therefore difficult to extract relation information with high accuracy.

The present invention has been made in view of the above problems. Its object is to provide a device, a method, a program, and a recording medium that can extract, with high accuracy, relation information between a plurality of named entities according to each individual case.

To achieve the above object, the relation information extraction device of the present invention extracts information related to a plurality of input named entities. It comprises an analysis processing unit and a relation information extraction processing unit. When a text containing the named entities is input, the analysis processing unit performs morphological analysis on the input text and analyzes the dependencies between the phrases that make up the text. When the relation information extraction processing unit acquires the analysis result from the analysis processing unit, it performs three steps. First, it extracts at least one independent word contained in the input text as a relation information candidate. Second, for each extracted candidate, it acquires relation estimation information indicating the degree to which the candidate is estimated to be relation information. Third, it extracts relation information from the candidates on the basis of the analysis result and the relation estimation information.

To achieve the above object, the relation information extraction method of the present invention uses a computer to extract information related to a plurality of input named entities. When a text containing the named entities is input, the computer performs morphological analysis on the input text and analyzes the dependencies between the phrases that make up the text. It then extracts at least one independent word contained in the input text as a relation information candidate. For each extracted candidate, it acquires relation estimation information indicating the degree to which the candidate is estimated to be relation information. Finally, it extracts relation information from the candidates on the basis of the analysis result and the relation estimation information.

Furthermore, a program of the present invention causes a computer to function as each means of the above relation information extraction device.

Furthermore, a program of the present invention causes a computer to execute each process of the above relation information extraction method.

Furthermore, a recording medium of the present invention records the above program.

As a result, at least one independent word contained in the input text is extracted as a relation information candidate. Relation information is then extracted from the candidates on the basis of the analysis result and the relation estimation information, which indicates the degree to which each candidate is estimated to be relation information. The relation information between the named entities can therefore be extracted from the input text.

With the relation information extraction device, method, program, and recording medium of the present invention, the relation information of the named entities can be extracted from the input text. Consider the case "小泉純一郎は東京駅で、小沢一郎は大阪駅で演説した。" ("Junichiro Koizumi gave a speech at Tokyo Station, and Ichiro Ozawa at Osaka Station."). Here, the temporary relationship "speech" (演説) can be extracted for the named entities "Junichiro Koizumi" and "Tokyo Station". Relation information between named entities can thus be extracted according to each individual case.

The present invention was compared with the technique described in JP 2008-225566 A mentioned above. It greatly improved the precision of extracting the optimal relation information, from 19% to 68%, and improved the recall of relation information in the same cases, from 33% to 41%.

FIGS. 2 to 9 show an embodiment of the present invention:

- FIG. 2 is a configuration diagram of a relation information extraction device according to the embodiment.
- FIG. 3 is a flowchart of the relation information extraction process.
- FIG. 4 outlines the analysis result produced by the dependency analysis unit.
- FIG. 5 outlines the processing result produced by the named entity association unit.
- FIGS. 6 to 8 outline the processing results produced by the dependency structure information acquisition unit.
- FIG. 9 outlines the processing result produced by the relation estimation information acquisition unit.

An outline of the relation information extraction device and method of the present invention is described below with reference to the drawings.

The relation information extraction device of the present invention consists of a computer built around a well-known CPU. It includes display means such as a monitor, input means such as a keyboard, storage means such as a hard disk and memory, and a communication device that can connect to an external network (none of these are shown). The device is further provided with an analysis processing unit 10, a relation information extraction processing unit 20, a relation estimation information storage unit 30, and a model storage unit 40.

As shown in FIG. 2, the analysis processing unit 10 consists of a morphological analysis unit 11, a dependency analysis unit 12, and a named entity association unit 13. It performs morphological analysis on text entered through the input means and analyzes the dependency relations between the phrases that make up the input text.

When the morphological analysis unit 11 acquires the input text (step S1 in FIG. 3), it applies a well-known morphological analysis process that segments the text into words and assigns a part of speech to each word. It then outputs the result (step S2 in FIG. 3). For example, suppose the input sentence is "小泉純一郎は東京駅で、小沢一郎は大阪駅で演説した。" ("Junichiro Koizumi gave a speech at Tokyo Station, and Ichiro Ozawa at Osaka Station."). The morphological analysis unit 11 then produces:
小泉純一郎 [Junichiro Koizumi] (noun) / は (case particle) / 東京駅 [Tokyo Station] (noun) / で (particle) / 、 (comma) / 小沢一郎 [Ichiro Ozawa] (noun) / は (particle) / 大阪駅 [Osaka Station] (noun) / で (particle) / 演説 [speech] (verb) / した (suffix) / 。 (full stop)

The dependency analysis unit 12 receives the morphologically analyzed text from the morphological analysis unit 11 and applies a well-known dependency analysis process. This process divides the text into phrases (bunsetsu) and analyzes the dependency relations between them. The unit then outputs the result (step S3 in FIG. 3). For the example input text, the output is information that represents the dependency structure (a dependency tree), as shown in FIG. 4. In this tree, the phrases 小泉純一郎/は, 東京駅/で/、, 小沢一郎/は, and 大阪駅/で each depend on the phrase 演説/した/。. When these dependency relations are implemented as data, they can be written, for example, as "(演説した。(小泉純一郎は)(東京駅で、)(小沢一郎は)(大阪駅で))". Each dependency relation also carries a dependency type label defined in a well-known dependency analysis technique: "D" for an ordinary dependency, "P" for a parallel (coordinate) dependency, or "A" for an appositive dependency.
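The bracketed notation above can be produced from a head-index representation of the tree. The Python sketch below assumes that each phrase stores the index of the phrase it depends on (-1 for the root) and a dependency type. The class and function names are hypothetical. The type labels in the example are also assumptions, because the text does not state which type each link in FIG. 4 carries.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str       # surface string of the phrase (bunsetsu)
    head: int       # index of the phrase this one depends on; -1 for the root
    dep_type: str   # "D" ordinary, "P" parallel, "A" appositive (stored, not rendered)


def to_brackets(phrases, index=None):
    """Render the tree as (head(dependent)(dependent)...), dependents in text order."""
    if index is None:
        index = next(i for i, p in enumerate(phrases) if p.head == -1)
    children = [i for i, p in enumerate(phrases) if p.head == index]
    inner = "".join(to_brackets(phrases, c) for c in children)
    return f"({phrases[index].text}{inner})"


sentence = [
    Phrase("小泉純一郎は", 4, "D"),
    Phrase("東京駅で、", 4, "D"),   # assumed "D"; FIG. 4 is not reproduced here
    Phrase("小沢一郎は", 4, "D"),
    Phrase("大阪駅で", 4, "D"),
    Phrase("演説した。", -1, "-"),  # root: depends on nothing, so no real type
]
print(to_brackets(sentence))
# (演説した。(小泉純一郎は)(東京駅で、)(小沢一郎は)(大阪駅で))
```

Storing only a head index per phrase is enough to rebuild the tree. The renderer finds each node's dependents by scanning for phrases that point to it.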

The named entity association unit 13 acquires a pair of two named entities and performs named entity association using the analysis result of the dependency analysis unit 12. It first acquires the pair of named entities entered through the input means and the input text analyzed by the dependency analysis unit 12 (step S4 in FIG. 3). It then locates in the input text the named entity corresponding to each input named entity and assigns to it a named entity identifier indicating its type (step S5 in FIG. 3). For example, when the named entity 小泉純一郎 (Junichiro Koizumi) is input, 小泉純一郎 in the input text is written as "<PSN>小泉純一郎</PSN>". Here, "PSN" is the named entity identifier for a person name. This embodiment uses the eight named entity identifiers defined in a well-known named entity extraction technique:

- "PSN": person name
- "ORG": organization name
- "LOC": place name
- "ART": artifact name
- "MNY": amount of money
- "PNT": percentage
- "TIM": time
- "DAT": date

In this embodiment, a pair of named entities is written in the form "Junichiro Koizumi:Tokyo Station". The named entity that appears first in the input text is the front named entity and is written to the left of ":". The one that appears later is the back named entity and is written to the right. When the pair "Junichiro Koizumi:Tokyo Station" is input, the processing result of the named entity association unit 13 is as shown in FIG. 5.
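A simplified Python sketch of the association step follows. It assumes that the entities and their identifiers are supplied together, and it tags only the first occurrence of each entity in the raw sentence string. The unit described above works on the analyzed text, so this is only an approximation. The function name is hypothetical.

```python
# The eight named entity identifiers listed above.
NE_TYPES = {"PSN", "ORG", "LOC", "ART", "MNY", "PNT", "TIM", "DAT"}


def tag_entity_pair(text, entities):
    """Tag each entity's first occurrence as <TYPE>...</TYPE>.

    entities maps surface -> identifier. Returns the tagged text and the pair
    written as "front:back" in order of appearance in the text.
    """
    found = []
    for surface, ne_type in entities.items():
        if ne_type not in NE_TYPES:
            raise ValueError(f"unknown identifier: {ne_type}")
        start = text.find(surface)
        if start < 0:
            raise ValueError(f"entity not in text: {surface}")
        found.append((start, surface, ne_type))
    found.sort()  # front entity first, back entity second
    tagged = text
    # Insert tags right to left so that earlier character offsets stay valid.
    for start, surface, ne_type in reversed(found):
        end = start + len(surface)
        tagged = f"{tagged[:start]}<{ne_type}>{surface}</{ne_type}>{tagged[end:]}"
    pair = ":".join(surface for _, surface, _ in found)
    return tagged, pair


tagged, pair = tag_entity_pair(
    "小泉純一郎は東京駅で、小沢一郎は大阪駅で演説した。",
    {"東京駅": "LOC", "小泉純一郎": "PSN"},  # deliberately given out of order
)
print(pair)    # 小泉純一郎:東京駅
print(tagged)  # <PSN>小泉純一郎</PSN>は<LOC>東京駅</LOC>で、小沢一郎は大阪駅で演説した。
```

The front/back order comes from each entity's position in the text, not from the order in which the entities were supplied.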

Next, the relation information extraction processing unit 20 is outlined. It consists of a relation information candidate extraction unit 21, a dependency structure information acquisition unit 22, a relation estimation information acquisition unit 23, a model selection unit 24, a classifier 25, and a relation information extraction unit 26. It extracts the relation information for a pair of named entities on the basis of the analysis result from the analysis processing unit 10.

The relation information candidate extraction unit 21 acquires the analysis result of the analysis processing unit 10 from the named entity association unit 13. It then extracts at least one independent word contained in the input text as a relation information candidate (step S6 in FIG. 3). Suppose both named entities of the pair are contained in the same phrase of the input text. In that case, the unit extracts three kinds of candidates: the independent word immediately preceding the front named entity, the independent words between the two named entities, and the independent word immediately following the back named entity. For example, the pair "Ishihara:Tokyo" is contained in a single phrase in "<PSN>石原</PSN><LOC>東京都</LOC>知事が" (Tokyo Governor Ishihara). The independent word 知事 (governor), which immediately follows the back named entity 東京都 (Tokyo), is extracted as a relation information candidate.

Now suppose the two named entities of the pair are contained in different phrases of the input text. In that case, the relation information candidate extraction unit 21 extracts as candidates the independent words immediately preceding or following the front named entity, and the independent words immediately preceding or following the back named entity. For example, the pair "Junichiro Koizumi:Liberal Democratic Party" is contained in different phrases when the back named entity appears as "<ORG>自民党</ORG>総裁には". The independent word 総裁 (party president), which immediately follows the back named entity 自民党 (Liberal Democratic Party), is extracted as a relation information candidate.

Furthermore, the relation information candidate extraction unit 21 also examines phrases that contain neither named entity of the pair. If the head word of such a phrase is an independent word, the unit extracts the morphemes from the beginning of the phrase up to and including the head word as a relation information candidate. For example, in the phrase 演説した。 (gave a speech), the head word 演説 (speech) is an independent word, so the morpheme 演説 is extracted as a relation information candidate.

In this embodiment, when the analysis result shown in FIG. 5 is input, three relation information candidates are extracted: 小沢一郎 (Ichiro Ozawa), 大阪駅 (Osaka Station), and 演説 (speech). This embodiment uses all three of the above candidate extraction methods, but the candidate extraction process may use only one or two of them.
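The three candidate extraction rules can be sketched in Python. This is a hedged illustration and not the embodiment's implementation. It rests on three assumptions:

- Each phrase is given as a list of (surface, part-of-speech) morphemes plus the index of its head word.
- Nouns, verbs, and adjectives are treated as independent words.
- "Adjacent" words are sought only inside the phrase that holds the named entity.

The function name is hypothetical. Under these assumptions, the sketch reproduces the candidates from the examples above.

```python
# Assumption: part-of-speech tags treated as independent (content) words.
INDEPENDENT_POS = {"noun", "verb", "adjective"}


def extract_candidates(phrases, front, back):
    """Return relation information candidates for the pair (front, back).

    phrases: list of (morphemes, head_index), morphemes being (surface, pos).
    front must appear before back in the text.
    """
    # Flatten to (phrase_id, surface, pos) so that adjacency is easy to test.
    flat = [(pid, s, pos) for pid, (ms, _) in enumerate(phrases) for s, pos in ms]
    surfaces = [s for _, s, _ in flat]
    f, b = surfaces.index(front), surfaces.index(back)

    def usable(i, pid):
        # An independent word inside phrase `pid` that is not one of the entities.
        return (0 <= i < len(flat) and flat[i][0] == pid and i not in (f, b)
                and flat[i][2] in INDEPENDENT_POS)

    if flat[f][0] == flat[b][0]:
        # Rule 1: same phrase -> word before front, words between, word after back.
        pid = flat[f][0]
        picks = [(i, pid) for i in [f - 1, *range(f + 1, b), b + 1]]
    else:
        # Rule 2: different phrases -> the words on either side of each entity.
        picks = [(f - 1, flat[f][0]), (f + 1, flat[f][0]),
                 (b - 1, flat[b][0]), (b + 1, flat[b][0])]
    found = [surfaces[i] for i, pid in picks if usable(i, pid)]

    # Rule 3: in phrases holding neither entity, take the morphemes from the
    # phrase start up to the head word when the head word is independent.
    entity_phrases = {flat[f][0], flat[b][0]}
    for pid, (morphemes, head) in enumerate(phrases):
        if pid not in entity_phrases and morphemes[head][1] in INDEPENDENT_POS:
            found.append("".join(s for s, _ in morphemes[: head + 1]))
    return found


koizumi = [
    ([("小泉純一郎", "noun"), ("は", "particle")], 0),
    ([("東京駅", "noun"), ("で", "particle"), ("、", "comma")], 0),
    ([("小沢一郎", "noun"), ("は", "particle")], 0),
    ([("大阪駅", "noun"), ("で", "particle")], 0),
    ([("演説", "verb"), ("した", "suffix"), ("。", "full stop")], 0),
]
print(extract_candidates(koizumi, "小泉純一郎", "東京駅"))  # ['小沢一郎', '大阪駅', '演説']

ishihara = [([("石原", "noun"), ("東京都", "noun"), ("知事", "noun"), ("が", "particle")], 2)]
print(extract_candidates(ishihara, "石原", "東京都"))  # ['知事']
```

Rules 1 and 2 only look inside the phrases that contain the entities, while rule 3 only looks at the other phrases. As a result, no candidate is produced twice.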

The dependency structure information acquisition unit 22 acquires the analysis result of the analysis processing unit 10 from the named entity association unit 13, and the relation information candidates from the relation information candidate extraction unit 21. It then extracts the smallest dependency tree that contains both the pair of named entities and the relation information candidate. Next, it generates tree structure information from the following elements, which gives the dependency structure information (step S7 in FIG. 3):

- the extracted dependency tree
- the part of speech of the head word, the dependency type, and the particle of each phrase
- the named entity identifiers of the pair
- the surface form and part of speech of the morphemes of the candidate

The tree structure information for the pair "Junichiro Koizumi:Tokyo Station" is shown for each candidate:

- FIG. 6: candidate 小沢一郎 (Ichiro Ozawa)
- FIG. 7: candidate 大阪駅 (Osaka Station)
- FIG. 8: candidate 演説 (speech)

In this embodiment, "φ" indicates that a phrase contains no particle. For example, the tree structure information in FIGS. 6 to 8 shows that the phrase 演説した。 (gave a speech) contains no particle.
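The "smallest dependency tree" step can be sketched as follows. Take the paths from each relevant phrase up to their lowest common ancestor, and keep the union of those paths. The Python sketch below assumes the tree is given as a list of head indices (-1 for the root). It covers only the subtree selection and the φ convention, not the full feature set of FIGS. 6 to 8. Its function names are hypothetical.

```python
def path_to_root(heads, node):
    """Phrase indices from `node` up to the root (the phrase whose head is -1)."""
    path = [node]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def minimal_subtree(heads, nodes):
    """Smallest connected set of phrases containing every index in `nodes`."""
    paths = [path_to_root(heads, n) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    lca = next(n for n in paths[0] if n in common)  # lowest common ancestor
    keep = set()
    for path in paths:
        for n in path:
            keep.add(n)
            if n == lca:  # stop climbing once the shared ancestor is reached
                break
    return keep


def particle_slot(particle):
    """Return the phrase's particle, or "φ" when it has none."""
    return particle if particle else "φ"


heads = [4, 4, 4, 4, -1]  # the four phrases all depend on 演説した。 (index 4)
print(sorted(minimal_subtree(heads, [0, 1, 2])))  # entities + 小沢一郎は: [0, 1, 2, 4]
print(sorted(minimal_subtree(heads, [0, 1, 4])))  # entities + 演説した。: [0, 1, 4]
print(particle_slot(None))  # φ
```

Climbing stops at the lowest common ancestor, so in deeper trees the phrases above that ancestor are left out of the subtree.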

When the relationship estimation information acquisition unit 23 acquires the dependency structure information generated for each relationship information candidate by the dependency structure information acquisition unit 22, it retrieves from the relationship estimation information storage unit 30 the relationship estimation information, which represents the degree to which a relationship information candidate is estimated to be relationship information (step S8 in FIG. 3). The relationship estimation information storage unit 30 stores in advance a plurality of relationship estimation values, one for each of a plurality of morphemes, for example “Ichiro Ozawa = 0.1”, “Osaka Station = 0.2” and “speech = 1”. Each value is created either manually or by the calculation method described below. On acquiring the dependency structure information shown in FIG. 8, the relationship estimation information acquisition unit 23 searches the stored relationship estimation information for the entry corresponding to the relationship information candidate “speech” and extracts the matching value “1”. It then adds the extracted value as a child node of the candidate node in the dependency structure information, as shown in FIG. 9.
The relationship estimation information acquisition unit 23 performs the same processing for the other relationship information candidates (“Ichiro Ozawa” and “Osaka Station”) extracted by the relationship information candidate extraction unit 21.
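Step S8 can be pictured as a dictionary lookup followed by a tree edit. The following is a minimal Python sketch under stated assumptions: the `Node` class, the `attach_estimation` function, the `est=` label format and the default value of 0.0 for words missing from the store are all hypothetical; only the three example values come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of the dependency structure tree (hypothetical representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)


# Stand-in for the relationship estimation information storage unit 30,
# filled with the example values given in the text.
ESTIMATION_STORE: Dict[str, float] = {
    "Ichiro Ozawa": 0.1,
    "Osaka Station": 0.2,
    "speech": 1.0,
}


def attach_estimation(candidate: Node, store: Dict[str, float],
                      default: float = 0.0) -> float:
    """Look up the candidate's estimation value and attach it as a child node.

    Assumption: words absent from the store receive `default`; the text does
    not say how missing entries are handled.
    """
    value = store.get(candidate.label, default)
    candidate.children.append(Node(f"est={value}"))
    return value


# Usage: the candidate node for "speech" gains a child node "est=1.0".
speech = Node("speech")
attach_estimation(speech, ESTIMATION_STORE)
print([child.label for child in speech.children])  # ['est=1.0']
```

Keeping the value as a child node, rather than as a separate table, means a downstream learner that consumes trees sees the estimation value as part of the structure it classifies.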

For a technique for calculating, from a large corpus, the degree to which a word is estimated to be relationship information, see Tanaka et al., “Classification of syntactic categories of nouns based on the dispersion of semantic categories,” Transactions of Information Processing Society of Japan, vol. 40, no. 9, pp. 3387-3396, September 1999.

On acquiring the processing result of the relationship estimation information acquisition unit 23, the model selection unit 24 classifies the named entity pair according to the named entity identifiers assigned by the named entity association unit 13, and thereby selects the type of model that the classifier 25, described below, will retrieve (step S9 in FIG. 3). For example, when the named entity pair “Junichiro Koizumi: Tokyo Station” is input, the model selection unit 24 classifies the pair as the type “person name: place name” and outputs this type.
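Step S9 amounts to mapping the two named entity identifiers to a pair type. A minimal sketch, assuming hypothetical identifier names `PERSON` and `LOCATION` (the identifiers actually assigned by the named entity association unit 13 are not given here) and a fallback category `other`:

```python
from typing import Dict, Tuple

# Hypothetical mapping from named entity identifiers to category names.
NE_CATEGORY: Dict[str, str] = {"PERSON": "person name", "LOCATION": "place name"}

Entity = Tuple[str, str]  # (surface form, named entity identifier)


def select_model_type(first: Entity, second: Entity) -> str:
    """Return the pair type, e.g. 'person name: place name'."""
    cat1 = NE_CATEGORY.get(first[1], "other")
    cat2 = NE_CATEGORY.get(second[1], "other")
    return f"{cat1}: {cat2}"


print(select_model_type(("Junichiro Koizumi", "PERSON"),
                        ("Tokyo Station", "LOCATION")))  # person name: place name
```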

On acquiring the processing results of the relationship estimation information acquisition unit 23 and of the model selection unit 24, the classifier 25 retrieves a model from the model storage unit 40, which stores a plurality of models, according to the named entity pair type selected by the model selection unit 24. Using the retrieved model, the classifier 25 then determines whether each relationship information candidate is relationship information for the named entity pair (step S10 in FIG. 3).

Each model is generated in advance by well-known machine learning, using (a) the results of judging in advance which relationship information corresponds to given named entity pairs, and (b) information extracted in advance by the analysis processing unit 10 and the relationship information extraction processing unit 20 from texts containing the named entities of those pairs. The judgments for the given named entity pairs are made in advance by humans. Models may be built separately for each named entity pair type, such as “person name: place name” or “person name: person name”, or built without distinguishing types.
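The model lookup in step S10 can be sketched as a store keyed by pair type. In this hypothetical sketch a "model" is any callable that returns True or False for a candidate's feature dictionary. The toy threshold rule in the usage lines merely stands in for a model produced by machine learning, and falling back to a type-agnostic model when no type-specific one exists is an assumption that combines the two configurations the text allows.

```python
from typing import Callable, Dict, List

# A model maps a candidate's features to True (relationship info) or False.
Model = Callable[[dict], bool]


class ModelStore:
    """Hypothetical stand-in for the model storage unit 40."""

    def __init__(self, generic: Model) -> None:
        self._by_type: Dict[str, Model] = {}
        self._generic = generic  # model trained without distinguishing pair types

    def register(self, pair_type: str, model: Model) -> None:
        self._by_type[pair_type] = model

    def get(self, pair_type: str) -> Model:
        # Assumption: fall back to the type-agnostic model for unseen types.
        return self._by_type.get(pair_type, self._generic)


def classify(candidates: List[dict], pair_type: str, store: ModelStore) -> List[bool]:
    """Judge every candidate with the model selected for the pair type."""
    model = store.get(pair_type)
    return [model(c) for c in candidates]


# Usage with toy models (not learned ones):
store = ModelStore(generic=lambda c: False)
store.register("person name: place name", lambda c: c["estimation"] >= 0.5)
candidates = [{"word": "Ichiro Ozawa", "estimation": 0.1},
              {"word": "speech", "estimation": 1.0}]
print(classify(candidates, "person name: place name", store))  # [False, True]
```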

Because the classifier 25 also uses the relationship estimation information attached by the relationship estimation information acquisition unit 23, it can decide whether each relationship information candidate is relationship information for the named entity pair on the basis of the candidate's estimated degree of being relationship information. In the present embodiment, the candidate “speech” is determined to be the relationship information for the named entity pair “Junichiro Koizumi: Tokyo Station”.

The classifier 25 may be configured to output, in addition to the decision of whether a candidate is relationship information, a numerical value representing the degree to which the candidate is likely to be relationship information. Any well-known machine learning method can be used, but a method that can learn directly from tree-structured or graph-structured input data is preferable.

The relationship information extraction unit 26 extracts, as relationship information, those relationship information candidates that the classifier 25 has determined to be relationship information for the named entity pair, and outputs the extracted relationship information to a display means (step S11 in FIG. 3). When the classifier 25 is configured to output a numerical value representing this degree of likelihood, the relationship information extraction unit 26 may extract either the candidate with the largest value or every candidate whose value exceeds a predetermined threshold.
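The two optional selection rules can be sketched as follows. The function names and the scores are hypothetical; only the two rules themselves (largest value, or value above a predetermined threshold) come from the text.

```python
from typing import Dict, List, Optional


def extract_best(scores: Dict[str, float]) -> Optional[str]:
    """Return the candidate with the largest score, or None if there are none."""
    return max(scores, key=scores.get) if scores else None


def extract_above(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return every candidate whose score is strictly greater than threshold."""
    return [word for word, score in scores.items() if score > threshold]


# Hypothetical classifier scores for the three candidates in the example.
scores = {"Ichiro Ozawa": 0.05, "Osaka Station": 0.1, "speech": 0.9}
print(extract_best(scores))        # speech
print(extract_above(scores, 0.5))  # ['speech']
```

The largest-value rule always yields exactly one answer per pair, while the threshold rule can return zero or several, which suits texts where a pair has no relationship or more than one.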

In this way, when the text “Junichiro Koizumi gave a speech at Tokyo Station, and Ichiro Ozawa at Osaka Station.” (小泉純一郎は東京駅で、小沢一郎は大阪駅で演説した。) and the named entity pair “Junichiro Koizumi: Tokyo Station” are input, the relationship information “speech” is extracted.

The present inventors evaluated the relationship information extraction technique they previously proposed in Japanese Patent Application Laid-Open No. 2008-225566 and obtained a precision of 19% for extracting the optimal relationship information and a recall of 33% for relationship information on the same cases. The same evaluation of the present invention gave a markedly higher precision of 68% and a recall improved to 41%.

As described above, in this embodiment at least one independent word contained in the input text is extracted as a relationship information candidate, and relationship information is extracted from the candidates on the basis of the analysis result and the relationship estimation information, which represents the degree to which each candidate is estimated to be relationship information. Relationship information between the named entities can therefore be extracted from the input text. For example, from the sentence “Junichiro Koizumi gave a speech at Tokyo Station, and Ichiro Ozawa at Osaka Station,” the word “speech”, which expresses a temporary relationship between the named entities “Junichiro Koizumi” and “Tokyo Station”, can be extracted; that is, relationship information between named entities can be extracted for each individual case.

Furthermore, compared with the technique of Japanese Patent Application Laid-Open No. 2008-225566, the present invention raises the precision of extracting the optimal relationship information from 19% to 68% and the recall of relationship information on the same cases from 33% to 41%.

In addition, the device includes the relationship estimation information storage unit 30, which stores relationship estimation information for relationship information candidates in advance, and the relationship information extraction processing unit 20 retrieves the relationship estimation information for each extracted candidate from this storage unit. The relationship estimation information can therefore be obtained easily, which improves the processing efficiency of relationship extraction.

Moreover, when both named entities are contained in the same phrase, the relationship information extraction processing unit 20 extracts as relationship information candidates the independent word adjacent in front of the named entity that appears first in the input text, the independent words located between the two named entities, and the independent word adjacent behind the other named entity. A plurality of candidates can thus be extracted from the single phrase containing both named entities, improving the accuracy of relationship extraction.
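The same-phrase rule can be sketched over a list of (surface, part-of-speech) morphemes. This is a hypothetical illustration: the POS labels, the half-open span convention and the reading of "adjacent" as "nearest independent word on that side" are assumptions, and the toy phrase uses English glosses instead of real analyzer output.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech)
Span = Tuple[int, int]      # half-open [start, end) morpheme indices

# Assumption: these coarse POS labels count as independent words;
# particles, auxiliary verbs and symbols do not.
INDEPENDENT_POS = {"noun", "verb", "adjective", "adverb"}


def same_phrase_candidates(morphs: List[Morpheme], first: Span,
                           second: Span) -> List[str]:
    """Candidates when both entities lie in one phrase; `first` precedes `second`."""
    candidates: List[str] = []
    # Nearest independent word in front of the first entity.
    for surface, pos in reversed(morphs[:first[0]]):
        if pos in INDEPENDENT_POS:
            candidates.append(surface)
            break
    # Every independent word between the two entities.
    candidates += [s for s, p in morphs[first[1]:second[0]] if p in INDEPENDENT_POS]
    # Nearest independent word behind the second entity.
    for surface, pos in morphs[second[1]:]:
        if pos in INDEPENDENT_POS:
            candidates.append(surface)
            break
    return candidates


# Toy phrase in English glosses: "historic Japan-US summit".
morphs = [("historic", "adjective"), ("Japan", "noun"), ("-", "symbol"),
          ("US", "noun"), ("summit", "noun")]
print(same_phrase_candidates(morphs, (1, 2), (3, 4)))  # ['historic', 'summit']
```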

When the named entities are contained in different phrases, the relationship information extraction processing unit 20 extracts as relationship information candidates the independent word adjacent in front of or behind the named entity that appears first in the input text, and the independent word adjacent in front of or behind the other named entity. A plurality of candidates can thus be extracted from the phrases containing the respective named entities, improving the accuracy of relationship extraction.
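For the cross-phrase case, a hypothetical sketch can search each entity's own phrase, as the paragraph describes. Two points are assumptions: "in front of or behind" is read here as "collect the nearest independent word on both sides", and the POS labels are illustrative. The toy morphemes are English glosses of a Japanese phrase pair.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech)
Span = Tuple[int, int]      # half-open [start, end) indices within one phrase

# Assumption: coarse POS labels treated as independent words.
INDEPENDENT_POS = {"noun", "verb", "adjective", "adverb"}


def neighbours(phrase: List[Morpheme], span: Span) -> List[str]:
    """Nearest independent word in front of and behind an entity, within its phrase."""
    found: List[str] = []
    for surface, pos in reversed(phrase[:span[0]]):
        if pos in INDEPENDENT_POS:
            found.append(surface)
            break
    for surface, pos in phrase[span[1]:]:
        if pos in INDEPENDENT_POS:
            found.append(surface)
            break
    return found


def cross_phrase_candidates(phrase1: List[Morpheme], span1: Span,
                            phrase2: List[Morpheme], span2: Span) -> List[str]:
    """Candidates when the two entities lie in different phrases."""
    candidates = neighbours(phrase1, span1) + neighbours(phrase2, span2)
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order


phrase1 = [("Prime Minister", "noun"), ("Koizumi", "noun"), ("wa", "particle")]
phrase2 = [("Tokyo Station", "noun"), ("front", "noun"), ("de", "particle")]
print(cross_phrase_candidates(phrase1, (1, 2), phrase2, (0, 1)))
# ['Prime Minister', 'front']
```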

Further, when the head word of a phrase containing neither named entity is an independent word, the relationship information extraction processing unit 20 extracts the morphemes from the beginning of that phrase up to its head word as a relationship information candidate. Candidates can thus also be extracted from phrases that contain no named entity, improving the accuracy of relationship extraction.
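The rule for phrases without named entities can be sketched as below. This is hypothetical: the head-word index is assumed to come from the dependency analyzer, the POS labels are illustrative, and the range from the phrase start to the head is taken to include the head word.

```python
from typing import List, Optional, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech)

# Assumption: coarse POS labels treated as independent words.
INDEPENDENT_POS = {"noun", "verb", "adjective", "adverb"}


def entity_free_candidate(phrase: List[Morpheme], head: int,
                          contains_entity: bool) -> Optional[str]:
    """Return the morphemes from the phrase start up to the head word, joined.

    `head` is the index of the phrase's head word, assumed to be supplied by
    the dependency analyzer. Returns None when the phrase contains a named
    entity or its head word is not an independent word.
    """
    if contains_entity or phrase[head][1] not in INDEPENDENT_POS:
        return None
    # Japanese is written without spaces, so morphemes are concatenated directly.
    return "".join(surface for surface, _ in phrase[:head + 1])


# Hypothetical phrase 「新しい政策を」 ("new policy" + object particle), head = 政策.
phrase = [("新しい", "adjective"), ("政策", "noun"), ("を", "particle")]
print(entity_free_candidate(phrase, head=1, contains_entity=False))  # 新しい政策
```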

The above embodiment is merely a specific example of the present invention, and the invention is not limited to it. For example, the invention can also be realized by installing on a well-known computer, via a recording medium or a communication line, a program that implements the functions shown in the configuration diagram of FIG. 2, or a program that performs the procedure shown in the flowchart of FIG. 3.

The relationship information candidate extraction unit 21 may also be configured as follows. When a predetermined particle appears at the end of the earlier of two consecutive phrases in the input text and a predetermined verb appears at the beginning of the later phrase, the unit extracts as a relationship information candidate the morpheme sequence consisting of the earlier phrase followed by the morphemes of the later phrase from its beginning up to its head word. For example, the relationship information candidate extraction unit 21 stores the function verb “及ぼす” (oyobosu, “exert”) together with its corresponding particle “を” (wo, object marker). When the phrase “影響を” (eikyo wo, “influence” + object marker) is immediately followed in the input text by the phrase “及ぼす”, the unit extracts as a relationship information candidate the morpheme sequence “影響を及ぼす” (“exert an influence”), formed from the earlier phrase “影響を” and the morphemes of the later phrase up to its head word, namely “及ぼす”.
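The particle and function-verb rule can be sketched with a small lookup table holding the example from the text. Everything else here is a hypothetical simplification: the function name, the POS labels, and matching the verb by its whole surface form. A real morphological analyzer would split inflected verb forms, so matching would normally use the base form.

```python
from typing import Dict, List, Optional, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech)

# Function verb -> particle that must end the preceding phrase (example from the text).
FUNCTION_VERBS: Dict[str, str] = {"及ぼす": "を"}


def function_verb_candidate(prev: List[Morpheme], nxt: List[Morpheme],
                            nxt_head: int) -> Optional[str]:
    """Join `prev` with the start of `nxt` up to its head word when the
    particle/function-verb pattern matches; otherwise return None."""
    if not prev or not nxt:
        return None
    required = FUNCTION_VERBS.get(nxt[0][0])
    if required is None or prev[-1][0] != required:
        return None
    return "".join(s for s, _ in prev) + "".join(s for s, _ in nxt[:nxt_head + 1])


prev = [("影響", "noun"), ("を", "particle")]
nxt = [("及ぼす", "verb")]
print(function_verb_candidate(prev, nxt, nxt_head=0))  # 影響を及ぼす
```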

In this case, morpheme sequences that combine a predetermined particle with a function verb can be extracted as relationship information candidates. This increases the number of candidates and improves the accuracy of relationship extraction.

The model selection unit 24 may also be configured to select the model type according to the positional relationship between the named entities. For example, when both named entities are contained in the same phrase, the model selection unit 24 classifies the pair as the “same-phrase type”. When the front named entity depends on the rear named entity, it classifies the pair as the “dependency type”. When neither applies, it classifies the pair as the “other type”.
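The positional model types can be sketched from phrase indices and a dependency-head array. The representation is hypothetical: `dep_head[i]` is the index of the phrase that phrase i depends on, with -1 marking the root. Counting only a direct dependency of the front phrase on the rear phrase is an assumption, since the text does not say whether indirect dependencies qualify.

```python
from typing import List


def positional_model_type(front: int, rear: int, dep_head: List[int]) -> str:
    """Classify a named entity pair by the positions of its phrases.

    front, rear: indices of the phrases containing the front and rear entities.
    dep_head[i]: index of the phrase that phrase i depends on (-1 for the root).
    Assumption: only a direct dependency of front on rear counts.
    """
    if front == rear:
        return "same-phrase type"
    if dep_head[front] == rear:
        return "dependency type"
    return "other type"


# Hypothetical three-phrase sentence in which phrase 0 depends on phrase 1.
print(positional_model_type(0, 1, [1, 2, -1]))  # dependency type
```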

The model selection method described above may, of course, be combined as appropriate with the model selection method of the embodiment.

FIG. 1 is a diagram showing the configuration of a conventional relationship information extraction device.
FIG. 2 is a configuration diagram of a relationship information extraction device according to an embodiment of the present invention.
FIG. 3 is a flowchart of the relationship information extraction process.
FIG. 4 is a diagram outlining the analysis result produced by the dependency analysis unit.
FIG. 5 is a diagram outlining the processing result produced by the named entity association unit.
FIG. 6 is a diagram outlining a processing result produced by the dependency structure information acquisition unit.
FIG. 7 is a diagram outlining a processing result produced by the dependency structure information acquisition unit.
FIG. 8 is a diagram outlining a processing result produced by the dependency structure information acquisition unit.
FIG. 9 is a diagram outlining the processing result produced by the relationship estimation information acquisition unit.

Explanation of symbols

10: analysis processing unit; 11: morphological analysis unit; 12: dependency analysis unit; 20: relationship information extraction processing unit; 21: relationship information candidate extraction unit; 23: relationship estimation information acquisition unit; 26: relationship information extraction unit; 30: relationship estimation information storage unit.

Claims

A relationship information extraction apparatus for extracting information related to a plurality of input named entities, comprising:
an analysis processing unit that, when a text containing each of the named entities is input, performs morphological analysis of the input text and analyzes the dependencies between the phrases constituting the input text; and
a relationship information extraction processing unit that, on acquiring the analysis result of the analysis processing unit, extracts at least one independent word contained in the input text as a relationship information candidate, acquires, for each extracted relationship information candidate, relationship estimation information representing the degree to which the candidate is estimated to be relationship information, and extracts relationship information from the relationship information candidates on the basis of the analysis result and the relationship estimation information.

The relationship information extraction apparatus according to claim 1, further comprising a relationship estimation information storage unit that stores in advance relationship estimation information corresponding to relationship information candidates,
wherein the relationship information extraction processing unit acquires the relationship estimation information corresponding to each extracted relationship information candidate from the relationship estimation information storage unit.

The relationship information extraction apparatus according to claim 1, wherein, when the named entities are contained in the same phrase, the relationship information extraction processing unit extracts as relationship information candidates the independent word adjacent in front of the named entity that appears first in the input text, the independent words located between the named entities, and the independent word adjacent behind the other named entity.

The relationship information extraction apparatus according to claim 1, wherein, when the named entities are contained in different phrases, the relationship information extraction processing unit extracts as relationship information candidates the independent word adjacent in front of or behind the named entity that appears first in the input text, and the independent word adjacent in front of or behind the other named entity.

The relationship information extraction apparatus according to any one of claims 1 to 4, wherein, when the head word of a phrase containing neither named entity is an independent word, the relationship information extraction processing unit extracts the morphemes from the beginning of that phrase up to its head word as a relationship information candidate.

The relationship information extraction apparatus, wherein, when a predetermined particle is contained at the end of the earlier of two consecutive phrases in the input text and a predetermined verb is contained at the beginning of the later phrase, the relationship information extraction processing unit extracts as a relationship information candidate a morpheme sequence consisting of the earlier phrase and the morphemes of the later phrase from its beginning up to its head word.

A relationship information extraction method for extracting, using a computer, information related to a plurality of input named entities, comprising:
a step in which, when a text containing each of the named entities is input, the computer performs morphological analysis of the input text and analyzes the dependencies between the phrases constituting the input text; and
a step in which the computer extracts at least one independent word contained in the input text as a relationship information candidate, acquires, for each extracted relationship information candidate, relationship estimation information representing the degree to which the candidate is estimated to be relationship information, and extracts relationship information from the relationship information candidates on the basis of the analysis result and the relationship estimation information.

A program for causing a computer to function as each means of the relationship information extraction apparatus according to any one of claims 1 to 6.

A program for causing a computer to execute each step of the relationship information extraction method according to claim 7.

A computer-readable recording medium on which the program according to claim 8 or 9 is recorded.