JP6726638B2

JP6726638B2 - Entailment recognition device, method, and program

Info

Publication number: JP6726638B2
Application number: JP2017094854A
Authority: JP
Inventors: Katsuto Bessho; Hisako Asano; Yoshihiro Matsuo
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-05-11
Filing date: 2017-05-11
Publication date: 2020-07-22
Anticipated expiration: 2037-05-11
Also published as: JP2018190339A

Description

The present invention relates to an entailment recognition device, method, and program for recognizing the entailment relationship between two texts t1 and t2.

The method of Non-Patent Document 1 is an entailment recognition technique that, given two texts t1 and t2, determines whether "if t1, then t2" holds. In this method, taking the set E to be, for example, a set of words, the number of elements of E that appear in both t1 and t2 is defined by the following equation.

Here, f(x, t) denotes the number of times element x of the set E appears in t. Using this equation, a similarity representing the degree to which the entailment "if t1, then t2" holds is calculated by the following equation.
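The two equations referred to above are not reproduced in this text. As a rough illustration only, the sketch below assumes one common form of surface overlap: the shared count is the sum over x of min(f(x, t1), f(x, t2)), and the similarity divides that count by the total number of element occurrences in t2. Both formulas and the function names `shared_count` and `surface_similarity` are assumptions made for illustration, not the definitions given in Non-Patent Document 1.

```python
from collections import Counter

def shared_count(elems_t1, elems_t2):
    """Assumed form: sum over x of min(f(x, t1), f(x, t2))."""
    f1, f2 = Counter(elems_t1), Counter(elems_t2)
    return sum(min(f1[x], f2[x]) for x in f2)

def surface_similarity(elems_t1, elems_t2):
    """Assumed form: shared count divided by the number of elements in t2."""
    total = len(elems_t2)
    return shared_count(elems_t1, elems_t2) / total if total else 0.0
```

Under these assumptions the score depends only on which elements occur and how often, not on where they occur in each text.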

[Non-Patent Document 1] Shohei Hattori, Satoshi Sato, Tomonori Komatani, "Japanese Text Entailment Recognition Based on Surface Similarity," Transactions of the Japanese Society for Artificial Intelligence, Vol.29, No.4, pp.416-426, 2014.

The above conventional method has the following problem.

For two elements ju and jv of E that appear in t2 and t1, one of the following a), b), c), and d) holds. Here, a sub-sentence is a sentence obtained by taking a predicate phrase as the starting point and concatenating that predicate phrase, the non-predicate phrases that depend on it or that it depends on, and the non-predicate phrases that depend on those non-predicate phrases. Alternatively, it is a sentence obtained by concatenating a non-predicate phrase, taken from the group of phrases that remains after such sub-sentences are removed, with the non-predicate phrases that depend on it.

a) Elements ju and jv are in the same sub-sentence in t2 and in the same sub-sentence in t1.
b) Elements ju and jv are in the same sub-sentence in t2 but not in the same sub-sentence in t1.
c) Elements ju and jv are not in the same sub-sentence in t2 but are in the same sub-sentence in t1.
d) Elements ju and jv are not in the same sub-sentence in t2 and not in the same sub-sentence in t1.
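The four cases can be expressed as a small classifier. The sketch below is a minimal illustration that assumes each sub-sentence is represented as a set of the elements it contains and that elements are identified by their strings; the function names are hypothetical.

```python
def in_same_sub_sentence(sub_sentences, u, v):
    """True if some sub-sentence contains both elements."""
    return any(u in s and v in s for s in sub_sentences)

def classify_pair(subs_t2, subs_t1, u, v):
    """Return 'a', 'b', 'c', or 'd' for the pair (u, v)."""
    same_t2 = in_same_sub_sentence(subs_t2, u, v)
    same_t1 = in_same_sub_sentence(subs_t1, u, v)
    if same_t2 and same_t1:
        return "a"
    if same_t2:
        return "b"
    if same_t1:
        return "c"
    return "d"
```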

As an example of a), let t2 be 「雀は水田を荒らす。」 ("Sparrows damage paddy fields.") and t1 be 「毎年、雀は水田を荒らす。」 ("Every year, sparrows damage paddy fields."). In t2, ju 「雀」 (sparrow) and jv 「水田」 (paddy field) are both in the sub-sentence 「雀は水田を荒らす。」 that starts from 「荒らす。」 (damage), so they are in the same sub-sentence. In t1, ju 「雀」 and jv 「水田」 are both in the sub-sentence 「毎年、雀は水田を荒らす。」 that starts from 「荒らす。」, so they are again in the same sub-sentence. The entailment "if t1, then t2" holds.

As an example of b), let t2 be 「雀は水田を荒らす。」 ("Sparrows damage paddy fields.") and t1 be 「雀を駆除することが水田を荒らす。」 ("Exterminating sparrows damages paddy fields."). In t2, ju 「雀」 and jv 「水田」 are both in the sub-sentence 「雀は水田を荒らす。」 that starts from 「荒らす。」, so they are in the same sub-sentence. In t1, ju 「雀」 is in the sub-sentence 「雀を駆除することが」 that starts from 「駆除する」 (exterminate), whereas jv 「水田」 is in the sub-sentence 「ことが水田を荒らす。」 that starts from 「荒らす。」, so ju and jv are not in the same sub-sentence in t1. The entailment "if t1, then t2" does not hold.

As an example of c), let t2 be 「雀を駆除することが水田を荒らす。」 ("Exterminating sparrows damages paddy fields.") and t1 be 「雀は水田を荒らす。」 ("Sparrows damage paddy fields."). In t2, ju 「雀」 is in the sub-sentence 「雀を駆除することが」 that starts from 「駆除する」, whereas jv 「水田」 is in the sub-sentence 「ことが水田を荒らす。」 that starts from 「荒らす。」, so ju and jv are not in the same sub-sentence in t2. In t1, ju 「雀」 and jv 「水田」 are both in the sub-sentence 「雀は水田を荒らす。」 that starts from 「荒らす。」, so they are in the same sub-sentence. The entailment "if t1, then t2" does not hold.

As an example of d), let t2 be 「定期的に雀を駆除することが水田を荒らす。」 ("Regularly exterminating sparrows damages paddy fields.") and t1 be 「雀を駆除することが水田を荒らす。」 ("Exterminating sparrows damages paddy fields."). In t2, ju 「雀」 is in the sub-sentence 「定期的に雀を駆除することが」 that starts from 「駆除する」, whereas jv 「水田」 is in the sub-sentence 「ことが水田を荒らす。」 that starts from 「荒らす。」, so ju and jv are not in the same sub-sentence in t2. In t1, ju 「雀」 is in the sub-sentence 「雀を駆除することが」 that starts from 「駆除する」, whereas jv 「水田」 is in the sub-sentence 「ことが水田を荒らす。」 that starts from 「荒らす。」, so ju and jv are not in the same sub-sentence in t1 either. The entailment "if t1, then t2" holds.

If any two elements ju and jv of E that appear in t2 and t1 always satisfy a) or d), the entailment "if t1, then t2" is more likely to hold. Conversely, if some ju and jv satisfy neither a) nor d), that is, they fall under b) or c), the entailment "if t1, then t2" is less likely to hold. Thus, knowing which of a), b), c), and d) applies to two elements ju and jv of E appearing in t2 and t1 allows the entailment relationship to be predicted to some extent.

However, in the above conventional method, the values relating to ju and jv in the similarity formula do not change at all across these cases, so the information about which of a), b), c), and d) applies to ju and jv is not used at all in recognizing the entailment relationship.

An object of the present invention is to solve this problem and to provide an entailment recognition device, method, and program that improve the accuracy of entailment recognition by using, in recognizing the entailment relationship, the information on whether two elements ju and jv of E are in the same sub-sentence in each target text.

To solve the above problem, the entailment recognition device according to the first invention is an entailment recognition device that recognizes the entailment relationship between two texts t1 and t2, and comprises:
sub-sentence extraction means that, for each of t1 and t2, performs dependency analysis, extracts sub-sentences each obtained by taking a predicate phrase as the starting point and concatenating that predicate phrase, the non-predicate phrases that depend on it or that it depends on, and the non-predicate phrases that depend on those non-predicate phrases, and extracts sub-sentences each obtained by concatenating a non-predicate phrase, taken from the group of phrases that remains after the above sub-sentences are removed, with the non-predicate phrases that depend on it, thereby extracting a list of sub-sentences;
synonymous independent part extraction means that, for each independent part in t2, extracts a list of synonymous independent parts in t1 that are identical or similar in meaning to that independent part;
alignment selection means that selects an alignment between independent parts by selecting, for each independent part in t2, one synonymous independent part from the corresponding list of synonymous independent parts;
alignment similarity calculation means that, for the selected alignment, calculates the similarity of the alignment as the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and its corresponding synonymous independent part;
alignment similarity correction means that, for the selected alignment, obtains from a similarity correction rate database, for each pair of independent parts in t2, the similarity correction rate determined by whether the pair is in the same sub-sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same sub-sentence, calculates the correction rate of the alignment similarity as the average of the pairs' similarity correction rates weighted by a pair weight determined from the weights of the independent parts of each pair, and calculates the corrected similarity of the alignment by multiplying the alignment similarity by this correction rate; and
inter-text similarity calculation means that calculates the maximum of the corrected similarities of the alignments as the similarity between texts t1 and t2.
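The scoring performed by the last three means can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the pair weight is taken to be the product of the two independent-part weights (the text only says it is determined from them), a pair whose aligned part is NULL is treated as "not in the same sub-sentence", sub-sentences are sets of part indices, and the values in `EXAMPLE_RATES` are made up for the example. All names are hypothetical.

```python
from itertools import combinations, product

# Illustrative rates keyed by (same sub-sentence in t2, same sub-sentence in t1).
EXAMPLE_RATES = {(True, True): 1.0, (True, False): 0.5,
                 (False, True): 0.5, (False, False): 1.0}

def same_sub_sentence(sub_sentences, x, y):
    """True if both parts exist and share at least one sub-sentence."""
    if x is None or y is None:  # NULL alignment: assumed "not the same"
        return False
    return any(x in s and y in s for s in sub_sentences)

def alignment_similarity(weights, sims):
    """Weighted average of the per-part similarities."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, sims)) / total if total else 0.0

def correction_rate(weights, aligned, subs_t2, subs_t1, rates):
    """Pair-weighted average of the per-pair correction rates."""
    num = den = 0.0
    for u, v in combinations(range(len(weights)), 2):
        pair_weight = weights[u] * weights[v]  # assumption: product of weights
        case = (same_sub_sentence(subs_t2, u, v),
                same_sub_sentence(subs_t1, aligned[u], aligned[v]))
        num += pair_weight * rates[case]
        den += pair_weight
    return num / den if den else 1.0

def text_similarity(weights, candidates, subs_t2, subs_t1, rates):
    """Maximum corrected similarity over all alignments.

    candidates[i] lists (t1_part_index_or_None, similarity) for t2 part i.
    """
    best = 0.0
    for choice in product(*candidates):
        aligned = [part for part, _ in choice]
        sims = [sim for _, sim in choice]
        corrected = (alignment_similarity(weights, sims)
                     * correction_rate(weights, aligned, subs_t2, subs_t1, rates))
        best = max(best, corrected)
    return best
```

Enumerating every alignment with `product` is exponential in the number of independent parts in t2; it is used here only to keep the sketch short.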

The entailment recognition device according to the second invention takes as input a list of text pairs (t1, t2) in which each text pair is annotated with a list of correct sub-sentences for each of t1 and t2, a correct alignment, which is a list of pairs each consisting of an independent part in t2 and the one correct synonymous independent part in t1 corresponding to it, and a correct label indicating whether t1 and t2 are in an entailment relationship, and comprises:
text pair selection means that selects a text pair (t1, t2) to be processed;
correct alignment similarity calculation means that, for the correct alignment of the selected text pair, calculates the similarity of the correct alignment as the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and its corresponding synonymous independent part;
correct alignment similarity correction means that, for the correct alignment, treats as an unknown the similarity correction rate determined, for each pair of independent parts in t2, by whether the pair is in the same correct sub-sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same correct sub-sentence, derives an expression for the correction rate of the correct alignment similarity as the average of the pairs' similarity correction rates weighted by a pair weight determined from the weights of the independent parts of each pair, and derives an expression for the corrected similarity of the correct alignment by multiplying the correct alignment similarity by that expression; and
multiple regression analysis means that derives the optimal value of each unknown similarity correction rate by applying multiple regression analysis to the list of pairs each consisting of the corrected-similarity expression for the correct alignment of a text pair and the correct label of that text pair.
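Regression is applicable here because the corrected similarity is linear in the unknown correction rates. The derivation below is a sketch under assumptions that the text does not state: the cases are indexed by k (for instance the four combinations a) to d)), w_{uv} is the pair weight, A_n is the correct alignment of the n-th text pair, and the correct label y_n is coded as a number such as 1 for entailment and 0 otherwise.

```latex
\[
s(A) = \frac{\sum_{i} w_i \,\mathrm{sim}(j_i, r_i)}{\sum_{i} w_i},
\qquad
c(A) = \frac{\sum_{u<v} w_{uv}\, c_{k(u,v)}}{\sum_{u<v} w_{uv}}
\]
\[
s'(A) = s(A)\, c(A) = \sum_{k} x_k\, c_k,
\qquad
x_k = s(A)\, \frac{\sum_{u<v,\; k(u,v)=k} w_{uv}}{\sum_{u<v} w_{uv}}
\]
\[
\hat{c} = \arg\min_{c} \sum_{n} \Bigl( y_n - \sum_{k} x_{n,k}\, c_k \Bigr)^{2},
\qquad x_{n,k} = x_k \text{ evaluated on } A_n
\]
```

Each text pair thus contributes one row of regressors x_{n,k}, and the fitted coefficients are the correction rates to be stored in the similarity correction rate database.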

The entailment recognition method according to the third invention is an entailment recognition method in an entailment recognition device that includes sub-sentence extraction means, synonymous independent part extraction means, alignment selection means, alignment similarity calculation means, alignment similarity correction means, and inter-text similarity calculation means, and that recognizes the entailment relationship between two texts t1 and t2, the method comprising:
a step in which the sub-sentence extraction means, for each of t1 and t2, performs dependency analysis, extracts sub-sentences each obtained by taking a predicate phrase as the starting point and concatenating that predicate phrase, the non-predicate phrases that depend on it or that it depends on, and the non-predicate phrases that depend on those non-predicate phrases, and extracts sub-sentences each obtained by concatenating a non-predicate phrase, taken from the group of phrases that remains after the above sub-sentences are removed, with the non-predicate phrases that depend on it, thereby extracting a list of sub-sentences;
a step in which the synonymous independent part extraction means, for each independent part in t2, extracts a list of synonymous independent parts in t1 that are identical or similar in meaning to that independent part;
a step in which the alignment selection means selects an alignment between independent parts by selecting, for each independent part in t2, one synonymous independent part from the corresponding list of synonymous independent parts;
a step in which the alignment similarity calculation means, for the selected alignment, calculates the similarity of the alignment as the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and its corresponding synonymous independent part;
a step in which the alignment similarity correction means, for the selected alignment, obtains from a similarity correction rate database, for each pair of independent parts in t2, the similarity correction rate determined by whether the pair is in the same sub-sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same sub-sentence, calculates the correction rate of the alignment similarity as the average of the pairs' similarity correction rates weighted by a pair weight determined from the weights of the independent parts of each pair, and calculates the corrected similarity of the alignment by multiplying the alignment similarity by this correction rate; and
a step in which the inter-text similarity calculation means calculates the maximum of the corrected similarities of the alignments as the similarity between texts t1 and t2.

The entailment recognition method according to the fourth invention is an entailment recognition method in an entailment recognition device that includes text pair selection means, correct alignment similarity calculation means, correct alignment similarity correction means, and multiple regression analysis means. The method takes as input a list of text pairs (t1, t2) in which each text pair is annotated with a list of correct sub-sentences for each of t1 and t2, a correct alignment, which is a list of pairs each consisting of an independent part in t2 and the one correct synonymous independent part in t1 corresponding to it, and a correct label indicating whether t1 and t2 are in an entailment relationship, and comprises:
a step in which the text pair selection means selects a text pair (t1, t2) to be processed;
a step in which the correct alignment similarity calculation means, for the correct alignment of the selected text pair, calculates the similarity of the correct alignment as the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and its corresponding synonymous independent part;
a step in which the correct alignment similarity correction means, for the correct alignment, treats as an unknown the similarity correction rate determined, for each pair of independent parts in t2, by whether the pair is in the same correct sub-sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same correct sub-sentence, derives an expression for the correction rate of the correct alignment similarity as the average of the pairs' similarity correction rates weighted by a pair weight determined from the weights of the independent parts of each pair, and derives an expression for the corrected similarity of the correct alignment by multiplying the correct alignment similarity by that expression; and
a step in which the multiple regression analysis means derives the optimal value of each unknown similarity correction rate by applying multiple regression analysis to the list of pairs each consisting of the corrected-similarity expression for the correct alignment of a text pair and the correct label of that text pair.

The program of the present invention is a program for causing a computer to function as each means of the entailment recognition device of the present invention.

The entailment recognition device according to the first invention recognizes the entailment relationship between two input texts t1 and t2 by calculating the degree to which the entailment "if t1, then t2" holds as the similarity between t1 and t2.

The entailment recognition device according to the second invention is a device that, before entailment recognition is performed, acquires from training data the optimal similarity correction rates to be recorded in the similarity correction rate database used in entailment recognition.

The entailment recognition device, method, and program of the present invention have the effect of improving the accuracy of entailment recognition.

A block diagram showing the functional configuration of the entailment recognition device according to the first embodiment of the present invention.
A diagram showing an example of the dependency analysis result for a text.
A diagram showing an example of the correspondence between each independent part in text t2 and each element of the list of synonymous independent parts in text t1 extracted for that independent part.
A diagram showing an example of a thesaurus.
A diagram showing an example of a word concept base.
A diagram showing an example of an alignment selected from the lists of synonymous independent parts corresponding to the independent parts in text t2.
A diagram showing an example of the independent part weight database.
A diagram showing an example of the similarity correction rate database.
A flowchart showing the entailment recognition processing routine of the entailment recognition device according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the entailment recognition device according to the second embodiment of the present invention.
A diagram showing an example of the similarity correction rate database.
A flowchart showing the entailment recognition processing routine of the entailment recognition device according to the second embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

<First Embodiment>
FIG. 1 shows a configuration example of an entailment recognition device that is an example of claim 1 of the present invention. As shown in FIG. 1, the entailment recognition device 100 according to the first embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores programs for executing the processing routines described later and various data. Functionally, as shown in FIG. 1, the entailment recognition device 100 includes input means 10, computation means 20, and output means 40.

The input means 10 receives two texts t1 and t2.

The computation means 20 includes sub-sentence extraction means 22, synonymous independent part extraction means 24, an independent part database 25, alignment selection means 26, alignment similarity calculation means 28, an independent part weight database 29, alignment similarity correction means 30, a similarity correction rate database 31, and inter-text similarity calculation means 32.

The sub-sentence extraction means 22 takes as input the two texts t1 and t2 received by the input means 10 and, for each of t1 and t2, performs dependency analysis. It extracts sub-sentences each obtained by taking a predicate phrase as the starting point and concatenating that predicate phrase, the non-predicate phrases that depend on it or that it depends on, and the non-predicate phrases that depend on those non-predicate phrases. It then extracts sub-sentences each obtained by concatenating a non-predicate phrase, taken from the group of phrases that remains after those sub-sentences are removed, with the non-predicate phrases that depend on it. The result is a list of sub-sentences.

FIG. 2 shows the dependency analysis result for the target text 「木造の建造物の賃貸に住む老人の数」 ("the number of elderly people living in rented wooden buildings"). Dependency analysis divides this text into six phrases, 「木造の」, 「建造物の」, 「賃貸に」, 「住む」, 「老人の」, and 「数」, and further divides each phrase into an independent part and an attached part. For the phrase 「木造の」, the independent part is 「木造」 and the attached part is 「の」. For the phrase 「住む」, the independent part is 「住む」 and the attached part is NULL. Dependency analysis also derives the dependency relations between phrases: 「木造の」 depends on 「建造物の」, 「建造物の」 depends on 「賃貸に」, 「賃貸に」 depends on 「住む」, 「住む」 depends on 「老人の」, and 「老人の」 depends on 「数」.

The sub-sentence extraction means 22 takes the predicate phrase 「住む」 as the starting point and concatenates the predicate phrase 「住む」, the non-predicate phrase 「賃貸に」 that depends on 「住む」, the non-predicate phrase 「建造物の」 that depends on 「賃貸に」, the non-predicate phrase 「木造の」 that depends on 「建造物の」, and the non-predicate phrase 「老人の」 on which 「住む」 depends, thereby extracting the sub-sentence 「木造の建造物の賃貸に住む老人の」. This target text has only one predicate phrase; if there were others, a sub-sentence would be extracted in the same way for each predicate phrase. Next, from the group of phrases that remains after the extracted sub-sentences are removed from the target text, the non-predicate phrase 「数」 is concatenated with the non-predicate phrase 「老人の」 that depends on it, thereby extracting the sub-sentence 「老人の数」. In this way, the list of sub-sentences 「木造の建造物の賃貸に住む老人の」, 「老人の数」 is extracted.
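The extraction procedure can be sketched on the example above. This is one possible reading of the rule, under assumptions: the dependency analysis is supplied by hand as a list of phrases with head indices and predicate flags (no real parser is called), non-predicate dependents are followed transitively, and each leftover non-predicate phrase yields its own sub-sentence. The `Phrase` class and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Phrase:
    text: str
    head: Optional[int]   # index of the phrase this one depends on
    is_predicate: bool

def extract_sub_sentences(phrases: List[Phrase]) -> List[str]:
    def non_predicate_dependents(idx):
        return [i for i, p in enumerate(phrases)
                if p.head == idx and not p.is_predicate]

    def grow(start, extra):
        # Collect start, extra, and all non-predicate phrases depending on them.
        members = {start, *extra}
        frontier = [start, *extra]
        while frontier:
            cur = frontier.pop()
            for i in non_predicate_dependents(cur):
                if i not in members:
                    members.add(i)
                    frontier.append(i)
        return members

    groups, covered = [], set()
    # Step 1: one sub-sentence per predicate phrase.
    for idx, p in enumerate(phrases):
        if p.is_predicate:
            head_ok = p.head is not None and not phrases[p.head].is_predicate
            members = grow(idx, [p.head] if head_ok else [])
            covered |= members
            groups.append(members)
    # Step 2: leftover non-predicate phrases with their dependents.
    for idx, p in enumerate(phrases):
        if idx not in covered and not p.is_predicate:
            groups.append(grow(idx, []))
    return ["".join(phrases[i].text for i in sorted(g)) for g in groups]

example = [
    Phrase("木造の", 1, False), Phrase("建造物の", 2, False),
    Phrase("賃貸に", 3, False), Phrase("住む", 4, True),
    Phrase("老人の", 5, False), Phrase("数", None, False),
]
print(extract_sub_sentences(example))
```

Note that 「老人の」 appears in both sub-sentences, as in the description: step 2 may reuse phrases already covered in step 1.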

The synonymous independent part extraction means 24 extracts, for each independent part in t2, a list of synonymous independent parts in t1 that are identical or similar in meaning to that independent part. This is described concretely below.

Let the independent parts in t2 be j1, j2, ..., jm. Each ji in this list is an independent part at a different position, so the list may contain several independent parts with the same character string. For each ji, the list ri1, ri2, ..., rin_i of synonymous independent parts, that is, the independent parts in t1 that are identical or similar in meaning to ji, is extracted and associated with ji. Each rik in this list is an independent part at a different position, so the list may contain several independent parts with the same character string. Different independent parts in t2 may also share a common synonymous independent part. If no independent part in t1 is identical or similar in meaning to ji, then NULL, indicating an empty result, is associated with ji.

FIG. 3 shows the correspondence between each independent part in t2 and each element of the list of synonymous independent parts in t1 extracted for that independent part. The list r11, r12, r13 of synonymous independent parts in t1 corresponds to independent part j1 of t2, the list r21, r22 corresponds to independent part j2 of t2, the list r31, r32, r33 corresponds to independent part j3 of t2, and NULL corresponds to independent part j4 of t2.

An independent part in t1 that has the same character string as an independent part ji in t2 is a synonymous independent part of ji (unless a separate semantic analysis finds that its meaning in t1 differs from the meaning of ji). The similarity between ji and that independent part in t1 is set to 1.0 and associated with the pair consisting of ji and that independent part.

Whether two independent parts that are not necessarily identical as character strings are identical or similar in meaning is judged, for example, using a thesaurus such as the one shown in FIG. 4. The two independent parts are judged to be identical or similar in meaning when the distance between their nodes in the thesaurus is at or below a certain value. The similarity between the two independent parts is also calculated from the distance between the nodes and associated with the two independent parts.

Whether two independent parts that are not necessarily identical as character strings are synonymous or near-synonymous can also be determined using, for example, the word concept base described in Non-Patent Document 2. A word concept base is a list of pairs, each consisting of a word and a word concept vector that represents the concept of the word. FIG. 5 shows an example of a word concept base. The word concept vector of each word is a d-dimensional vector, and the concept vectors of conceptually close words lie close to each other.

［非特許文献2］別所克人, 内山俊郎, 内山匡, 片岡良治, 奥雅博,“単語・意味属性間共起に基づくコーパス概念ベースの生成方式,”情報処理学会論文誌, Vol.49, No.12, pp.3997-4006, Dec. 2008. [Non-patent document 2] Katsuto Bessho, Toshiro Uchiyama, Tadashi Uchiyama, Ryoji Kataoka, Masahiro Oku, "Generation method of corpus concept base based on co-occurrence between word and semantic attributes," Transactions of Information Processing Society of Japan, Vol.49, No.12, pp.3997-4006, Dec. 2008.

The concept vector of each independent part is generated by combining the word concept vectors of the words that make up the independent part. Two independent parts are judged to be synonymous or near-synonymous when, for example, the cosine similarity of their concept vectors is at or above a certain value. The calculated cosine similarity is also associated with the pair of independent parts.
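
The concept-vector comparison can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the toy `WORD_CONCEPT_BASE`, the function names, the threshold of 0.8, and the choice of summing word vectors as the combination rule are all assumptions, since the text does not fix how the vectors are combined or what threshold is used.

```python
import math

# Hypothetical word concept base: word -> d-dimensional concept vector (d = 3 here).
WORD_CONCEPT_BASE = {
    "price": [0.9, 0.1, 0.0],
    "list": [0.2, 0.8, 0.1],
    "fee": [0.8, 0.2, 0.1],
}

def concept_vector(words):
    """Combine word concept vectors into one vector by summation (assumed rule)."""
    dim = len(next(iter(WORD_CONCEPT_BASE.values())))
    vec = [0.0] * dim
    for word in words:
        # Words missing from the concept base contribute a zero vector.
        for k, x in enumerate(WORD_CONCEPT_BASE.get(word, [0.0] * dim)):
            vec[k] += x
    return vec

def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def is_synonymous(words_a, words_b, threshold=0.8):
    """Return (judgement, similarity) for two independent parts given as word lists."""
    sim = cosine_similarity(concept_vector(words_a), concept_vector(words_b))
    return sim >= threshold, sim
```

Here an independent part is passed as the list of words it consists of, so a compound such as "price list" would be given as `["price", "list"]`.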

Databases used to determine whether independent parts are synonymous or near-synonymous, such as the thesaurus and the word concept base above, are collectively called the independent part database 25. For a given independent part in t2, the synonymous independent part extraction means 24 refers to the independent part database 25 to calculate, for each independent part in t1, the similarity between the independent part in t2 and that independent part in t1. When the similarity is at or above a certain value, it recognizes the independent part in t1 as a synonymous independent part and associates the calculated similarity with the pair consisting of the independent part in t2 and the synonymous independent part in t1. When no synonymous independent part exists in t1 for an independent part in t2, NULL, which indicates an empty entry, is associated with the independent part in t2, and a similarity of 0.0 is associated with the pair consisting of that independent part and NULL. In FIG. 3, each link between an independent part in t2 and a corresponding synonymous independent part in t1 is labeled with the corresponding similarity.
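
As a rough sketch of this extraction step, the Python below builds one candidate list per independent part of t2. The function name, the `similarity` callback standing in for a lookup in the independent part database 25, and the threshold of 0.5 are hypothetical. Candidates are recorded together with their position in t1, because two occurrences of the same string in t1 count as different independent parts.

```python
NULL = None  # stands for the NULL entry described in the text

def extract_synonym_lists(t2_parts, t1_parts, similarity, threshold=0.5):
    """For each independent part of t2, list (position in t1, part, similarity)
    for every independent part of t1 judged synonymous with it."""
    result = []
    for j in t2_parts:
        candidates = []
        for pos, r in enumerate(t1_parts):
            # Identical strings get similarity 1.0; otherwise consult the database.
            s = 1.0 if r == j else similarity(j, r)
            if s >= threshold:
                candidates.append((pos, r, s))
        if not candidates:
            # No synonymous part in t1: associate NULL with similarity 0.0.
            candidates = [(NULL, NULL, 0.0)]
        result.append(candidates)
    return result
```

The separate semantic check mentioned for identical strings is omitted here for brevity.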

The alignment selection means 26 selects an alignment between independent parts by choosing, for each independent part in t2, one synonymous independent part from the corresponding synonymous independent part list. The details are as follows.

For each independent part ji in t2, one synonymous independent part is selected from the corresponding synonymous independent part list ri1, ri2, ..., rin_i and is denoted rik_i. When the synonymous independent part list is NULL, the list is regarded as consisting of ri1 alone with ri1 == NULL, and ri1 is selected. This yields an alignment between independent parts (j1, r1k_1), (j2, r2k_2), ..., (jm, rmk_m). Each (ji, rik_i) in the alignment carries the similarity calculated and associated by the synonymous independent part extraction means 24. There are n_1 × n_2 × ... × n_m such alignments.

When alignments that have not yet been selected remain, the alignment selection means 26 selects one of them as the target of subsequent processing, and processing proceeds to the alignment similarity calculation means 28. When none remain, there is no alignment left to process, and processing proceeds to the inter-text similarity calculation means 32.
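
Choosing one candidate per independent part of t2 amounts to walking the Cartesian product of the candidate lists. A minimal Python sketch, with `enumerate_alignments` as a hypothetical name and the labels of FIG. 3 used as placeholder strings:

```python
from itertools import product

def enumerate_alignments(t2_parts, synonym_lists):
    """Yield every alignment: one candidate chosen per independent part of t2."""
    for choice in product(*synonym_lists):
        yield list(zip(t2_parts, choice))

# Candidate lists shaped like FIG. 3 (None plays the role of NULL).
t2_parts = ["j1", "j2", "j3", "j4"]
synonym_lists = [["r11", "r12", "r13"], ["r21", "r22"], ["r31", "r32", "r33"], [None]]
alignments = list(enumerate_alignments(t2_parts, synonym_lists))
print(len(alignments))  # 3 * 2 * 3 * 1 = 18
```

A generator fits the "select one alignment not yet selected" loop of the text, since each alignment is produced exactly once and the loop ends when none remain.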

FIG. 6 shows an example of an alignment selected from the synonymous independent part lists corresponding to the independent parts in t2 of FIG. 3. The alignment in FIG. 6 is (j1, r11), (j2, r21), (j3, r31), (j4, NULL), and a similarity is associated with each (ji, rik_i).

For the selected alignment, the alignment similarity calculation means 28 calculates the similarity of the alignment as the average, weighted by the weight of each independent part in t2, of the similarities between each independent part in t2 and its corresponding synonymous independent part. The details are as follows.

Let the alignment A be (j1, r1k_1), (j2, r2k_2), ..., (jm, rmk_m), and let si be the similarity associated with each (ji, rik_i). The weight wi of the independent part ji in t2 is obtained from the independent part weight database 29.

FIG. 7 shows an example of the independent part weight database 29. Each record consists of an independent part and the real value of its weight. In FIG. 3 and FIG. 6, each independent part in t2 is labeled with its weight obtained from the independent part weight database 29.

The similarity sA of the alignment A is calculated by the following equation:

$$ s_A = \frac{\sum_{i=1}^{m} w_i s_i}{\sum_{i=1}^{m} w_i} $$

This alignment similarity expresses the degree to which the independent parts of t2 appear in t1 when synonymous independent parts are also accepted, and it serves as the basis for the degree of the entailment relation "if t1 then t2".

Regarding the weights of independent parts: when, for example, entailment between question texts in a Q&A collection is recognized, independent parts such as "teach" (教える) and "wish" (願う) carry little meaning in a question text, because asking to be taught or making a request is already presupposed by the content. It is undesirable for the presence or absence of such independent parts to strongly affect the alignment similarity. Therefore, in the independent part weight database 29, the weight of such independent parts is set to 0.1 and the weight of all other independent parts is set to 1.0, as in FIG. 7 for example. As a result, the presence or absence of independent parts such as "teach" or "wish" has almost no effect on the alignment similarity. The IDF value of an independent part is another possible choice for the weight set in the independent part weight database 29.
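
The effect of the weights can be checked with a small Python sketch of the weighted average described above. The function name and the example values are illustrative only, and returning 0.0 when all weights are zero is an assumption the text does not address.

```python
def alignment_similarity(similarities, weights):
    """Weighted average of the per-part similarities s_i with weights w_i."""
    total = sum(weights)
    if total == 0:
        return 0.0  # assumed fallback; the text does not cover this case
    return sum(w * s for w, s in zip(weights, similarities)) / total

# Three content parts matched exactly, plus an unmatched part such as "teach".
print(alignment_similarity([1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.1]))  # about 0.968
print(alignment_similarity([1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]))  # 0.75
```

With the low weight of 0.1 the unmatched part barely lowers the similarity, whereas with a weight of 1.0 it would pull the similarity down to 0.75.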

For the selected alignment, the alignment similarity correction means 30 obtains from the similarity correction rate database 31, for each pair of independent parts in t2, a similarity correction rate determined by whether the pair lies in the same sub-sentence and whether the synonymous independent parts corresponding to the members of the pair lie in the same sub-sentence. It calculates the correction rate of the alignment similarity as the average of the pairs' similarity correction rates, weighted by a pair weight determined from the weights of the pair's independent parts, and it calculates the corrected similarity of the alignment by multiplying the alignment similarity by this correction rate. The details are as follows.

Let the alignment A be (j1, r1k_1), (j2, r2k_2), ..., (jm, rmk_m). A pair (ju, jv) taken from the independent parts j1, j2, ..., jm in t2 satisfies u < v; the list of pairs is (j1, j2), (j1, j3), ..., (j1, jm), (j2, j3), (j2, j4), ..., (j2, jm), (j3, j4), ..., (j(m-1), jm), and the number of pairs is m(m-1)/2.

For each pair (ju, jv), the pair of synonymous independent parts (ruk_u, rvk_v) corresponding to the members of the pair is determined; (ruk_u, rvk_v) is written simply as (ru, rv).

FIG. 8 shows an example of the similarity correction rate database 31. It lists the similarity correction rate for each category of a pair (ju, jv) in t2 and the corresponding pair (ru, rv) in t1. The categories are distinguished by whether (ju, jv) lies in the same sub-sentence in t2 and whether (ru, rv) lies in the same sub-sentence in t1. In categories 1) and 6), ru = rv means that ru and rv are the independent part at the same position of occurrence in t1. In categories 2) and 7), ru ≠ rv means that ru and rv are independent parts at different positions of occurrence in t1 (they may still be the same character string). For a pair (ju, jv) in t2, the category in the similarity correction rate database 31 to which (ju, jv) and (ru, rv) belong is determined, and this determines the similarity correction rate huv.

For a pair (ju, jv) in t2, the pair weight wuv is the smaller of the weight wu of ju and the weight wv of jv.

The similarity correction rate hA of the alignment A is calculated by the following equation:

$$ h_A = \frac{\sum_{u<v} w_{uv} h_{uv}}{\sum_{u<v} w_{uv}} $$

With sA denoting the similarity of the alignment A, the corrected similarity hsA of A is calculated by the following equation:

$$ hs_A = h_A \cdot s_A $$

Regarding the similarity correction rates in the similarity correction rate database 31: categories 2) and 8), which correspond to cases a) and d) listed under the problem to be solved by the invention, do not require the alignment similarity to be lowered, so their similarity correction rate is set to 1.0. Categories 3) and 7), which correspond to cases b) and c), require the alignment similarity to be lowered, so their similarity correction rate is set to 0.1.

Regarding the weight of a pair (ju, jv) in t2: when either ju or jv is an independent part with a small weight, whether (ju, jv) and (ru, rv) each lie in the same sub-sentence is not considered something that should influence the similarity correction rate of the alignment. For this reason, the smaller of the weights of ju and jv is used as the pair weight, so that each pair influences the similarity correction rate in accordance with the weights of the independent parts that form it.
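
The pair weighting and the correction step can be sketched in Python as below. This is an illustration under stated assumptions: the per-pair rates huv are passed in directly rather than looked up by category, indices are 0-based, the function names are hypothetical, and a correction rate of 1.0 (no correction) is assumed when no pair carries any weight, which the text does not specify.

```python
from itertools import combinations

def correction_rate(weights, pair_rates):
    """Average of the pair rates h_uv weighted by w_uv = min(w_u, w_v).

    weights[i] is the weight of the i-th independent part of t2 (0-based);
    pair_rates maps (u, v) with u < v to the similarity correction rate h_uv.
    """
    numerator = 0.0
    denominator = 0.0
    for u, v in combinations(range(len(weights)), 2):
        w_uv = min(weights[u], weights[v])
        numerator += w_uv * pair_rates[(u, v)]
        denominator += w_uv
    # Assumed fallback: leave the similarity uncorrected if there is nothing to average.
    return numerator / denominator if denominator else 1.0

def corrected_similarity(s_a, h_a):
    """Corrected alignment similarity: the alignment similarity times the correction rate."""
    return h_a * s_a

# Two full-weight parts split across sub-sentences (rate 0.1) and one low-weight part.
h_a = correction_rate([1.0, 1.0, 0.1], {(0, 1): 0.1, (0, 2): 1.0, (1, 2): 1.0})
print(corrected_similarity(0.8, h_a))
```

In this example the pair of full-weight parts dominates the average, so the rate comes out near 0.25 even though two of the three pairs have a rate of 1.0.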

When the processing of the alignment similarity correction means 30 ends, processing returns to the alignment selection means 26.

The inter-text similarity calculation means 32 calculates the maximum of the corrected similarities of the alignments as the similarity between the texts t1 and t2.

An alignment with a higher corrected similarity is considered more likely to be the correct one, so the alignment with the maximum corrected similarity is adopted as the correct alignment and that maximum value is used as the similarity between the texts t1 and t2.
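
The selection of the maximum can be written as a short loop over the candidate alignments. In this hypothetical Python sketch, `corrected_similarity_of` stands for whatever routine scores one alignment; the alignments and scores in the usage example are placeholders, not data from the patent.

```python
def text_similarity(alignments, corrected_similarity_of):
    """Return (best alignment, its corrected similarity) over all alignments."""
    best_alignment, best_score = None, 0.0
    for alignment in alignments:
        score = corrected_similarity_of(alignment)
        if best_alignment is None or score > best_score:
            best_alignment, best_score = alignment, score
    return best_alignment, best_score

# Placeholder alignments identified by name, with made-up corrected similarities.
scores = {"A1": 0.25, "A2": 0.8, "A3": 0.6}
print(text_similarity(scores, scores.get))  # ('A2', 0.8)
```

Keeping the best alignment alongside its score makes it possible to inspect which correspondence produced the reported similarity.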

The output means 40 outputs, as the result, the similarity between the texts t1 and t2 calculated by the inter-text similarity calculation means 32. This similarity represents the degree of the entailment relation "if t1 then t2".

FIG. 9 shows an example of the processing flow of the entailment recognition device 100. When the input means 10 receives two texts t1 and t2, the entailment recognition processing routine shown in FIG. 9 is executed.

First, in step S100, the sub-sentence extraction means 22 acquires the two texts t1 and t2 received by the input means 10.

Then, in step S102, the sub-sentence extraction means 22 performs dependency parsing on each of the texts t1 and t2 received in step S100. Starting from each predicate phrase, it extracts the sub-sentence obtained by concatenating the predicate phrase, the non-predicate phrases that modify it or that it modifies, and the non-predicate phrases that modify those non-predicate phrases. From the phrases that remain after these sub-sentences are removed, it extracts the sub-sentences obtained by concatenating a non-predicate phrase and the non-predicate phrases that modify it. In this way it obtains a list of sub-sentences.

In step S104, the synonymous independent part extraction means 24 extracts, for each independent part in the text t2 received in step S100, a list of the independent parts in t1 that are synonymous or near-synonymous with it. Specifically, for a given independent part in t2, the synonymous independent part extraction means 24 refers to the independent part database 25 to calculate, for each independent part in t1, the similarity between the independent part in t2 and that independent part in t1, and when the similarity is at or above a certain value it recognizes the independent part in t1 as a synonymous independent part. The synonymous independent part extraction means 24 then associates the calculated similarity with the pair consisting of the independent part in t2 and the synonymous independent part in t1.

In step S106, the alignment selection means 26 determines whether any alignment between the independent parts in t1 and the independent parts in t2 has not yet been selected. If such an alignment exists, processing proceeds to step S108. If none exists, there is no alignment left to process, and processing proceeds to step S114.

In step S108, the alignment selection means 26 selects one of the alignments not yet selected as the target of subsequent processing.

In step S110, the alignment similarity calculation means 28 refers to the independent part weight database 29 and, for the alignment selected in step S108, calculates the similarity of the alignment as the average, weighted by the weight of each independent part in t2, of the similarities between each independent part in t2 and its corresponding synonymous independent part.

In step S112, for the alignment selected in step S108, the alignment similarity correction means 30 obtains from the similarity correction rate database 31, for each pair of independent parts in t2, the similarity correction rate determined by whether the pair lies in the same sub-sentence and whether the synonymous independent parts corresponding to the members of the pair lie in the same sub-sentence. It calculates the correction rate of the alignment similarity as the average of the pairs' similarity correction rates, weighted by the pair weight determined from the weights of the pair's independent parts. The alignment similarity correction means 30 then calculates the corrected similarity of the alignment by multiplying the alignment similarity obtained in step S110 by this correction rate.

In step S114, the inter-text similarity calculation means 32 calculates the maximum of the corrected similarities of the alignments obtained in step S112 as the similarity between the texts t1 and t2, and the entailment recognition processing routine ends.

The output means 40 outputs, as the result, the similarity between the texts t1 and t2 calculated by the inter-text similarity calculation means 32.

As described above, the entailment recognition device according to the embodiment of the present invention not only calculates an alignment similarity that expresses how much the independent parts overlap between the target texts. When a pair of independent parts lies in the same sub-sentence in one text but not in the other (that is, cases b) and c) listed under the problem to be solved by the invention), it also lowers the similarity correction rate, which lowers the alignment similarity and hence the degree of the entailment relation. In this way, the present invention improves the accuracy of entailment recognition.

<Second Embodiment>
FIG. 10 shows a configuration example of an entailment recognition device that is an example of claim 2 of the present invention. As shown in FIG. 10, the entailment recognition device 200 according to the second embodiment of the present invention can be implemented as a computer that includes a CPU, a RAM, and a ROM storing various data and the programs for executing the processing routines described later. Functionally, as shown in FIG. 10, the entailment recognition device 200 includes an input means 210, a computing means 220, and an output means 240.

The entailment recognition device 200, which is an example of claim 2, takes as input a list of text pairs t1, t2 in which each text pair is annotated with: a list of the correct sub-sentences of each of t1 and t2; a correct alignment, which is a list of pairs each consisting of an independent part in t2 and the one correct synonymous independent part in t1 corresponding to it; and a correct label indicating whether t1 and t2 are in the entailment relation "if t1 then t2". Each text pair t1, t2 is assigned a number p (p = 1, 2, ..., q).

The input means 210 receives the list of text pairs t1, t2. In this list, each text pair is annotated with a list of the correct sub-sentences of each of t1 and t2; a correct alignment, which is a list of pairs each consisting of an independent part in t2 and the one correct synonymous independent part in t1 corresponding to it; and a correct label indicating whether t1 and t2 are in the entailment relation "if t1 then t2".

The computing means 220 includes a text pair selection means 224 for the texts t1, t2, a correct alignment similarity calculation means 228, the independent part database 25, the independent part weight database 29, a correct alignment similarity correction means 232, and a multiple regression analysis means 236.

The text pair selection means 224 selects the text pair t1, t2 to be processed.

That is, when text pairs t1, t2 that have not yet been selected remain, the text pair selection means 224 selects one such text pair p as the target of subsequent processing, and processing proceeds to the correct alignment similarity calculation means 228. When none remain, there is no text pair left to process, and processing proceeds to the multiple regression analysis means 236.

For the correct alignment of the selected text pair t1, t2, the correct alignment similarity calculation means 228 calculates the similarity of the correct alignment as the average, weighted by the weight of each independent part in t2, of the similarities between each independent part in t2 and its corresponding synonymous independent part. The details are as follows.

Let the independent parts in t2 be j1, j2, ..., jm, and let the correct alignment A be (j1, r1), (j2, r2), ..., (jm, rm).

The similarity si corresponding to each (ji, ri) is calculated by referring to the independent part database 25. For example, when the independent part database 25 is a thesaurus as in FIG. 4, si is calculated from the distance between the thesaurus nodes of ji and ri. Alternatively, when the independent part database 25 is a word concept base as in FIG. 5, the concept vectors of ji and ri are each generated by combining the word concept vectors of their constituent words, and the cosine similarity between the generated concept vectors is used as si. When ri is NULL, si is set to 0.0.

The weight wi of each ji is obtained by referring to the independent part weight database 29.

The similarity sA of the correct alignment A is calculated by the following equation:

$$ s_A = \frac{\sum_{i=1}^{m} w_i s_i}{\sum_{i=1}^{m} w_i} $$

For the correct alignment, the correct alignment similarity correction means 232 treats as unknowns the similarity correction rates that are determined, for each pair of independent parts in t2, by whether the pair lies in the same correct sub-sentence and whether the synonymous independent parts corresponding to the members of the pair lie in the same correct sub-sentence. It derives an expression for the correction rate of the correct alignment similarity as the average of the pairs' similarity correction rates, weighted by a pair weight determined from the weights of the pair's independent parts, and it derives an expression for the corrected similarity of the correct alignment by multiplying the correct alignment similarity by that expression. The details are as follows.

When the correct alignment A is (j1, r1), (j2, r2), ..., (jm, rm), each pair (ju, jv) of independent parts in t2 determines the pair (ru, rv) of synonymous independent parts corresponding to the members of the pair.

FIG. 11 shows that the similarity correction rate corresponding to each category g) of the similarity correction rate database 31 of the first embodiment is replaced by an unknown a_g. For a pair (ju, jv) in t2, the category in the similarity correction rate database 31 to which (ju, jv) and (ru, rv) belong is determined, and this determines the similarity correction rate huv, which is one of a_1, a_2, ..., a_10.

The expression hA for the similarity correction rate of the correct alignment A is derived as follows, where each huv is one of the unknowns a_1, a_2, ..., a_10:

$$ h_A = \frac{\sum_{u<v} w_{uv} h_{uv}}{\sum_{u<v} w_{uv}} $$

With sA denoting the similarity of the correct alignment A, the expression hsA for the corrected similarity of A is derived as follows. In the equation below, x_pg denotes the coefficient obtained when the terms in a_g are collected for each unknown a_g; x_pg is a concrete real value calculated for each text pair p.

$$ hs_A = h_A \cdot s_A = a_1 x_{p1} + a_2 x_{p2} + \cdots + a_{10} x_{p10} $$

In this way, the expression a_1 x_p1 + a_2 x_p2 + ... + a_10 x_p10 for the corrected similarity of the correct alignment is derived for the text pair p.
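
Collecting the terms of the weighted average by category gives x_pg = sA × (sum of wuv over the pairs in category g) / (sum of wuv over all pairs). The Python sketch below computes this row of coefficients for one text pair. The function name and the 0-based indexing are assumptions, the pair weight is taken as min(wu, wv) as in the first embodiment, and the category of each pair is supplied by the caller because the full category table of FIG. 8 is not reproduced in the text.

```python
from itertools import combinations

NUM_CATEGORIES = 10

def regression_row(s_a, weights, pair_category):
    """Return [x_p1, ..., x_p10] for one text pair p.

    s_a is the similarity of the correct alignment, weights[i] is the weight of
    the i-th independent part of t2, and pair_category maps (u, v) with u < v
    to the category number g in 1..10 of that pair.
    """
    pair_weights = {
        (u, v): min(weights[u], weights[v])
        for u, v in combinations(range(len(weights)), 2)
    }
    total = sum(pair_weights.values())
    row = [0.0] * NUM_CATEGORIES
    if total == 0:
        return row  # no weighted pairs: every coefficient stays zero (assumed)
    for pair, w_uv in pair_weights.items():
        row[pair_category[pair] - 1] += s_a * w_uv / total
    return row

row = regression_row(0.8, [1.0, 1.0, 0.1], {(0, 1): 3, (0, 2): 2, (1, 2): 2})
print(row)
```

A quick consistency check is that the coefficients of one row add up to sA, so setting every a_g to 1.0 leaves the alignment similarity unchanged.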

When the processing of the correct alignment similarity correction means 232 ends, processing returns to the text pair selection means 224.

The multiple regression analysis means 236 derives the optimal value of each unknown similarity correction rate by applying multiple regression analysis to the list of pairs each consisting of the corrected similarity expression of a text pair's correct alignment and that text pair's correct label. The details are as follows.

When the correct label of the text pair p is "entailment holds", y_p = 1; when it is "entailment does not hold", y_p = 0.

For each p = 1, 2, ..., q, the following equation is set up, where a_11 is an unknown constant term:

$$ a_1 x_{p1} + a_2 x_{p2} + \cdots + a_{10} x_{p10} + a_{11} = y_p $$

Multiple regression analysis is performed on these q equations to find the coefficients a_1, a_2, ..., a_10, a_11 that minimize the residual sum of squares between the value of the left-hand side, a_1 x_p1 + a_2 x_p2 + ... + a_10 x_p10 + a_11, and y_p. With the coefficients a_1, a_2, ..., a_10 obtained in this way, the corrected similarity a_1 x_p1 + a_2 x_p2 + ... + a_10 x_p10 of the correct alignment of each text pair p is large when the entailment relation holds and small when it does not.

In this way, the optimal similarity correction rates a_1, a_2, ..., a_10 that fit the input training data are obtained. By storing these rates in the similarity correction rate database 31 and running the processing of the entailment recognition device of claim 1, an appropriate corrected alignment similarity can be calculated for new input texts t1, t2 as well.
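
As an illustration of the fitting step, the Python sketch below solves the least-squares problem through the normal equations with Gauss-Jordan elimination. It is a plain ordinary-least-squares stand-in for the multiple regression analysis, written without external libraries; the function name is hypothetical, and the check at the end uses two categories instead of ten to keep it short. Note the limitation that a category which never occurs in the training data produces an all-zero column and makes the system singular.

```python
def fit_correction_rates(rows, labels):
    """Least-squares fit of y_p = sum_g a_g * x_pg + a_const.

    rows is a list of coefficient lists [x_p1, ..., x_pG], labels the 0/1
    values y_p. Returns (list of a_g, a_const).
    """
    design = [list(r) + [1.0] for r in rows]  # trailing 1.0 carries the constant term
    n = len(design[0])
    # Normal equations (X^T X) a = X^T y, stored as an augmented matrix.
    aug = [
        [sum(d[i] * d[k] for d in design) for k in range(n)]
        + [sum(d[i] * y for d, y in zip(design, labels))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: some category lacks training data")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= factor * aug[col][k]
    coeffs = [aug[i][n] / aug[i][i] for i in range(n)]
    return coeffs[:-1], coeffs[-1]

# Synthetic check: targets generated from 0.9 * x1 - 0.2 * x2 + 0.1.
rates, constant = fit_correction_rates(
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]],
    [1.0, -0.1, 0.1, 0.8],
)
print(rates, constant)
```

For real training data a numerical library's least-squares routine would normally be preferable, since it also handles rank-deficient inputs more gracefully.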

The output means 240 outputs, as the result, the coefficients a_1, a_2, ..., a_10 calculated by the multiple regression analysis means 236. The coefficients a_1, a_2, ..., a_10 are stored in the similarity correction rate database 31.

FIG. 12 shows an example of the processing flow of the entailment recognition device 200. The entailment recognition processing routine shown in FIG. 12 is executed when the input means 210 receives a list of sets of texts t1 and t2 in which each set is given a list of correct partial sentences for each of t1 and t2, a correct alignment, which is a list of pairs of each independent part in t2 and the one correct synonymous independent part in t1 corresponding to that independent part, and a correct label indicating whether t1 and t2 are in the entailment relationship "if t1 then t2".

In step S200, the text t1, t2 set selection means 224 acquires the list of sets of t1 and t2 received by the input means 210.

In step S202, the text t1, t2 set selection means 224 determines whether the list of sets of t1 and t2 acquired in step S200 contains a set that has not yet been selected. If such a set exists, the process proceeds to step S204. If no such set exists, no set of t1 and t2 remains to be processed, and the process proceeds to step S210.

In step S204, the text t1, t2 set selection means 224 selects one set of t1 and t2 from among the sets not yet selected as the target of the subsequent processing.

In step S206, the correct alignment similarity calculation means 228 calculates the similarity of the correct alignment of the set of t1 and t2 selected in step S204 by taking the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and the synonymous independent part corresponding to it.
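The weighted average of step S206 can be sketched as follows. The function name alignment_similarity and the list-based inputs are hypothetical; how the per-part similarities and weights are obtained is outside this sketch.

```python
def alignment_similarity(similarities, weights):
    """Weighted average of per-part similarities.

    similarities[i]: similarity between the i-th independent part of t2 and
                     the synonymous independent part of t1 aligned to it.
    weights[i]:      weight of the i-th independent part of t2.
    """
    if len(similarities) != len(weights):
        raise ValueError("similarities and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    weighted_sum = sum(s * w for s, w in zip(similarities, weights))
    return weighted_sum / total_weight
```

For example, alignment_similarity([1.0, 0.0], [3.0, 1.0]) weights the first independent part three times as heavily as the second.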

In step S208, for the correct alignment of the set of t1 and t2 selected in step S204, the correct alignment similarity correction means 232 treats as unknowns the similarity correction rates, each of which is determined, for a pair of independent parts in t2, by whether the pair is in the same correct partial sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same correct partial sentence. It derives an expression for the correction rate of the similarity of the correct alignment by taking the average of the similarity correction rates of the pairs, each weighted by the pair weight determined from the weights of the independent parts of the pair. It then derives an expression for the corrected similarity of the correct alignment by multiplying the similarity of the correct alignment by the correction-rate expression.
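Because the correction rates are still unknown at this stage, step S208 yields a linear expression in them. The sketch below computes the known term that multiplies each rate, under several assumptions that are not taken from the description: only four categories are used (one per combination of the two same-partial-sentence conditions, whereas the description uses ten rates), the pair weight is taken as the product of the two part weights, and each independent part belongs to exactly one partial sentence. All names are hypothetical.

```python
from itertools import combinations

N_CATEGORIES = 4  # illustrative; the description uses ten correction rates

def pair_category(same_in_t2, same_in_t1):
    """Map the two same-partial-sentence conditions to a category index 0..3."""
    return 2 * int(same_in_t2) + int(same_in_t1)

def corrected_similarity_terms(similarity, weights, t2_sentence, t1_sentence):
    """Return x_k such that corrected similarity = sum(a_k * x_k).

    similarity:     similarity of the alignment (see step S206).
    weights[i]:     weight of the i-th independent part of t2.
    t2_sentence[i]: id of the partial sentence of t2 containing part i.
    t1_sentence[i]: id of the partial sentence of t1 containing the
                    synonymous independent part aligned to part i.
    """
    totals = [0.0] * N_CATEGORIES
    for i, j in combinations(range(len(weights)), 2):
        pair_weight = weights[i] * weights[j]  # assumption: product of weights
        k = pair_category(t2_sentence[i] == t2_sentence[j],
                          t1_sentence[i] == t1_sentence[j])
        totals[k] += pair_weight
    total = sum(totals)
    if total == 0:
        raise ValueError("need at least one pair with non-zero weight")
    # Weighted share of each category, scaled by the alignment similarity.
    return [similarity * t / total for t in totals]
```

With known rates a_k, the corrected similarity is sum(a_k * x_k for each k); with unknown rates, the returned list forms one row of the regression input.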

In step S210, the multiple regression analysis means 236 applies multiple regression analysis to the list of pairs, each consisting of the corrected-similarity expression of the correct alignment of a set of t1 and t2 obtained in step S208 and the correct label of that set, thereby deriving the optimum value of each similarity correction rate, which is an unknown, and the entailment recognition processing routine ends.

The output means 240 stores the optimum value of each similarity correction rate calculated by the multiple regression analysis means 236 in the similarity correction rate database 31.

The processing described above can be implemented as a program, installed from a communication line or a recording medium, and executed by means such as a CPU.

The present invention is not limited to the above embodiments, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to entailment recognition technology for recognizing the entailment relationship between two texts t1 and t2.

10, 210 Input means
20, 220 Calculation means
22 Partial sentence extraction means
24 Synonymous independent part extraction means
25 Independent part database
26 Alignment selection means
28 Alignment similarity calculation means
29 Independent part weight database
30 Alignment similarity correction means
31 Similarity correction rate database
32 Inter-text similarity calculation means
40, 240 Output means
100, 200 Entailment recognition device
224 Text t1, t2 set selection means
228 Correct alignment similarity calculation means
232 Correct alignment similarity correction means
236 Multiple regression analysis means

Claims

An entailment recognition device for recognizing an entailment relationship between two texts t1 and t2, comprising:
partial sentence extraction means for performing dependency analysis on each of t1 and t2 and extracting a list of partial sentences by extracting partial sentences each obtained by concatenating, starting from a verb phrase, the verb phrase, the non-verb phrases that depend on the verb phrase or on which the verb phrase depends, and the non-verb phrases that depend on those non-verb phrases, and by extracting partial sentences each obtained by concatenating a non-verb phrase in the group of phrases that remains after excluding the former partial sentences and the non-verb phrases that depend on that non-verb phrase;
synonymous independent part extraction means for extracting, for each independent part in t2, a list of synonymous independent parts in t1 that have the same or a similar meaning as the independent part;
alignment selection means for selecting an alignment between independent parts by selecting, for each independent part in t2, one synonymous independent part from the corresponding list of synonymous independent parts;
alignment similarity calculation means for calculating the similarity of the selected alignment by taking the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and the synonymous independent part corresponding to it;
alignment similarity correction means for acquiring from a similarity correction rate database, for each pair of independent parts in t2 in the selected alignment, a similarity correction rate determined by whether the pair is in the same partial sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same partial sentence, calculating a correction rate for the similarity of the alignment by taking the average of the similarity correction rates of the pairs, each weighted by the pair weight determined from the weights of the independent parts of the pair, and calculating a corrected similarity of the alignment by multiplying the similarity of the alignment by the correction rate; and
inter-text similarity calculation means for calculating the maximum of the corrected similarities of the alignments as the similarity between the texts t1 and t2.

An entailment recognition device that receives as input a list of sets of texts t1 and t2 in which each set is given a list of correct partial sentences for each of t1 and t2, a correct alignment that is a list of pairs of each independent part in t2 and the one correct synonymous independent part in t1 corresponding to that independent part, and a correct label indicating whether t1 and t2 are in an entailment relationship, the device comprising:
text t1, t2 set selection means for selecting a set of texts t1 and t2 to be processed;
correct alignment similarity calculation means for calculating the similarity of the correct alignment of the selected set of t1 and t2 by taking the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and the synonymous independent part corresponding to it;
correct alignment similarity correction means for treating as unknowns, in the correct alignment, the similarity correction rates each determined, for a pair of independent parts in t2, by whether the pair is in the same correct partial sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same correct partial sentence, deriving an expression for the correction rate of the similarity of the correct alignment by taking the average of the similarity correction rates of the pairs, each weighted by the pair weight determined from the weights of the independent parts of the pair, and deriving an expression for the corrected similarity of the correct alignment by multiplying the similarity of the correct alignment by the correction-rate expression; and
multiple regression analysis means for deriving the optimum value of each similarity correction rate, which is an unknown, by applying multiple regression analysis to the list of pairs, each consisting of the corrected-similarity expression of the correct alignment of a set of t1 and t2 and the correct label of that set.

An entailment recognition method in an entailment recognition device that includes partial sentence extraction means, synonymous independent part extraction means, alignment selection means, alignment similarity calculation means, alignment similarity correction means, and inter-text similarity calculation means, and that recognizes an entailment relationship between two texts t1 and t2, the method comprising:
a step in which the partial sentence extraction means performs dependency analysis on each of t1 and t2 and extracts a list of partial sentences by extracting partial sentences each obtained by concatenating, starting from a verb phrase, the verb phrase, the non-verb phrases that depend on the verb phrase or on which the verb phrase depends, and the non-verb phrases that depend on those non-verb phrases, and by extracting partial sentences each obtained by concatenating a non-verb phrase in the group of phrases that remains after excluding the former partial sentences and the non-verb phrases that depend on that non-verb phrase;
a step in which the synonymous independent part extraction means extracts, for each independent part in t2, a list of synonymous independent parts in t1 that have the same or a similar meaning as the independent part;
a step in which the alignment selection means selects an alignment between independent parts by selecting, for each independent part in t2, one synonymous independent part from the corresponding list of synonymous independent parts;
a step in which the alignment similarity calculation means calculates the similarity of the selected alignment by taking the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and the synonymous independent part corresponding to it;
a step in which the alignment similarity correction means acquires from a similarity correction rate database, for each pair of independent parts in t2 in the selected alignment, a similarity correction rate determined by whether the pair is in the same partial sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same partial sentence, calculates a correction rate for the similarity of the alignment by taking the average of the similarity correction rates of the pairs, each weighted by the pair weight determined from the weights of the independent parts of the pair, and calculates a corrected similarity of the alignment by multiplying the similarity of the alignment by the correction rate; and
a step in which the inter-text similarity calculation means calculates the maximum of the corrected similarities of the alignments as the similarity between the texts t1 and t2.

An entailment recognition method in an entailment recognition device that includes text t1, t2 set selection means, correct alignment similarity calculation means, correct alignment similarity correction means, and multiple regression analysis means, and that receives as input a list of sets of texts t1 and t2 in which each set is given a list of correct partial sentences for each of t1 and t2, a correct alignment that is a list of pairs of each independent part in t2 and the one correct synonymous independent part in t1 corresponding to that independent part, and a correct label indicating whether t1 and t2 are in an entailment relationship, the method comprising:
a step in which the text t1, t2 set selection means selects a set of texts t1 and t2 to be processed;
a step in which the correct alignment similarity calculation means calculates the similarity of the correct alignment of the selected set of t1 and t2 by taking the average, weighted by the weight of each independent part in t2, of the similarity between that independent part and the synonymous independent part corresponding to it;
a step in which the correct alignment similarity correction means treats as unknowns, in the correct alignment, the similarity correction rates each determined, for a pair of independent parts in t2, by whether the pair is in the same correct partial sentence and whether the synonymous independent parts corresponding to the independent parts of the pair are in the same correct partial sentence, derives an expression for the correction rate of the similarity of the correct alignment by taking the average of the similarity correction rates of the pairs, each weighted by the pair weight determined from the weights of the independent parts of the pair, and derives an expression for the corrected similarity of the correct alignment by multiplying the similarity of the correct alignment by the correction-rate expression; and
a step in which the multiple regression analysis means derives the optimum value of each similarity correction rate, which is an unknown, by applying multiple regression analysis to the list of pairs, each consisting of the corrected-similarity expression of the correct alignment of a set of t1 and t2 and the correct label of that set.

A program for causing a computer to function as each means of the entailment recognition device according to claim 1.