JP4671440B2 - Reputation relationship extraction device, method and program thereof

Info

Publication number: JP4671440B2
Application number: JP2007313203A
Authority: JP
Inventors: 久子浅野; 徹平野; 義博松尾
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-12-04
Filing date: 2007-12-04
Publication date: 2011-04-20
Anticipated expiration: 2027-12-04
Also published as: JP2009140048A

Description

The present invention relates to a technique for extracting relationships between the elements of reputation information from Japanese text, and in particular for extracting pairs of related evaluation targets and evaluation expressions.

In recent years, research has advanced on techniques that extract reputation information (opinions, evaluations, and similar information about some target) from input text data and then organize and present it. The elements that make up reputation information include the evaluation target, which denotes the thing being evaluated; the evaluation attribute, which denotes a specific item of evaluation such as a specification (property, characteristic, etc.) of the target or one of its parts; the evaluation expression, which denotes the opinion or evaluation itself; and the evaluator, which denotes the person or organization making the evaluation.

Techniques exist for extracting each element of reputation information from text data. For example, from the text "The omelet rice at ○○ Restaurant is delicious, but the curry is bad", the elements "evaluation target = ○○ Restaurant; evaluation attributes = omelet rice, curry; evaluation expressions = delicious, bad" are extracted. For extracting evaluation expressions, a method has been proposed that uses an evaluation expression dictionary consisting of a set of pairs of an evaluation expression (its word information) and the evaluation polarity of that expression. For extracting evaluation attributes in particular, the common approach is to build an attribute dictionary consisting of a set of evaluation attributes (see Non-Patent Document 1, in particular "3.4.1 Element extraction").

However, no accurate technique has yet been established for extracting the relationships among the elements of reputation information and outputting them in associated form (see Non-Patent Document 1, in particular "3.4.2 Relation extraction"). Such a technique would, for example, take the text "The omelet rice at ○○ Restaurant is delicious, but the curry is bad" together with the elements "evaluation target = ○○ Restaurant; evaluation attributes = omelet rice, curry; evaluation expressions = delicious, bad" and output the associated reputation information "(evaluation target, evaluation attribute, evaluation expression) = (○○ Restaurant, omelet rice, delicious); (○○ Restaurant, curry, bad)".

Meanwhile, Non-Patent Document 2 describes a technique for statistically determining relationships between named entities (named entity-named entity relationships) in text.

However, named entity-named entity relationships and the evaluation target-evaluation expression relationships found in reputation information differ in their characteristics, so applying the technique of Non-Patent Document 2 as-is to the determination of evaluation target-evaluation expression relationships is not effective.

Non-Patent Document 1: Takashi Inui, et al., "Research Trends on Analysis of Evaluation Information for Texts", Natural Language Processing, The Association for Natural Language Processing, July 2006, Vol. 13, No. 3, pp. 201-241
Non-Patent Document 2: Toru Hirano, et al., "Extracting Semantic Relationships between Named Entities in Text", Proceedings of the 13th Annual Meeting of the Association for Natural Language Processing, March 2007

Named entity-named entity relationships and evaluation target-evaluation expression relationships in reputation information differ mainly in the following two respects.

The first difference is the number of relationships.

In named entity-named entity relationships, a single named entity is often related to several others. For example, in the text "Mr. Yamada took the Shinkansen from Shin-Osaka this morning, arrived in Tokyo at 10 o'clock, and had lunch in Ginza.", the named entity "Yamada" is related to three named entities: Yamada-Shin-Osaka, Yamada-Tokyo, and Yamada-Ginza.

By contrast, in evaluation target-evaluation expression relationships, an evaluation expression is in principle related to exactly one evaluation target; the exception is when evaluation targets are coordinated. For example, in the text "Yamada-chan and Tanaka-chan are both cute", the evaluation expression "cute" is related to both the evaluation target "Yamada" and the evaluation target "Tanaka".

The second difference is the order of appearance.

In named entity-named entity relationships, there is almost no restriction on the order of appearance. Both "I went to Ginza today. I had French food with Mr. Yamada." and "I had a meal with Mr. Yamada today. We went to a French restaurant in Ginza." are possible.

In evaluation target-evaluation expression relationships, however, the evaluation expression almost never appears in a sentence that precedes the evaluation target; in most cases it appears in the same sentence as the evaluation target or in a later sentence (for example, "Yamada-chan is cute" and "I met Yamada-chan today. She really is cute."). The exception is an inverted construction. For example, in the text "So cute. It has to be Yamada-chan!", the evaluation expression "cute" is in a sentence that precedes the evaluation target "Yamada".

Furthermore, the technique of Non-Patent Document 2 determines the presence or absence of a relationship for each individual named entity occurrence in a document. From the standpoint of knowledge extraction, however, when named entities denoting the same entity appear multiple times, it is preferable to aggregate them and extract the relationship once per entity rather than produce one output per occurrence.

The present invention has been made in view of the above. It exploits two facts: an evaluation expression is in principle related to a single evaluation target, and the related evaluation target is usually located in the same sentence as the evaluation expression or in a preceding sentence. Its object is to provide a reputation relationship extraction device, method, and program that can extract pairs of related evaluation targets and evaluation expressions with high accuracy and at low cost.

The present invention is a reputation relationship extraction device that extracts pairs of related evaluation targets and evaluation expressions from document-level language analysis information including at least word information, named entity information, and evaluation expression information, the device comprising: a combination generation unit that extracts evaluation targets and evaluation expressions from the document-level language analysis information and generates combinations of an evaluation target and an evaluation expression as evaluation target-evaluation expression pairs; a rearrangement unit that rearranges the evaluation target-evaluation expression pairs into a predetermined order; a discrimination result storage unit that stores, among all evaluation target-evaluation expression pairs, those determined to be related; a model storage unit that stores a plurality of models for discriminating whether the evaluation target and the evaluation expression of a pair are related; a processing target pair setting unit that sets the rearranged pairs, sequentially from the first pair to the last pair, as the processing target pair; a discrimination target pair determination unit that determines, based on the discrimination results stored in the discrimination result storage unit, whether the pair set as the processing target pair is a discrimination target pair, that is, a pair for which it should be discriminated whether its evaluation target and evaluation expression are related; a general-purpose information extraction unit that, for a pair determined to be a discrimination target pair, extracts from the language analysis information, as features, information on the words between the members of the pair or dependency information between the members of the pair; a reputation information extraction unit that, for a pair determined to be a discrimination target pair, extracts from the language analysis information, as features, the sentence distance between the members of the pair, whether the evaluation target of the pair is included in the document title, the number of evaluation expressions between the members of the pair, and the evaluation target type; a model selection unit that classifies the pair determined to be a discrimination target pair; a classifier that extracts a model from the model storage unit using the features extracted by the general-purpose information extraction unit and the reputation information extraction unit and the classification result of the model selection unit, and discriminates, based on the extracted model, whether the evaluation target and the evaluation expression of the discrimination target pair are related; a discrimination result storing unit that stores pairs discriminated to be related in the discrimination result storage unit; and a pair output unit that outputs the evaluation target-evaluation expression pairs stored in the discrimination result storage unit.

According to the present invention, the relationship of an evaluation target-evaluation expression pair can be discriminated by exploiting two characteristics: an evaluation expression is in principle related to a single evaluation target, and the related evaluation target is usually located in the same sentence as the evaluation expression or in a preceding sentence. Related evaluation target-evaluation expression pairs can therefore be extracted in a manner suited to each individual case.

The present invention is described in detail below with reference to the illustrated embodiment.

FIG. 1 shows an example of an embodiment of the reputation relationship extraction device of the present invention. The device consists of a pair generation processing unit 10, a discrimination result storage unit 20, a model storage unit 30, and a feature extraction/discrimination processing unit 40. FIG. 2 is a flowchart of the pair generation processing in the pair generation processing unit 10, and FIG. 3 is a flowchart of the feature extraction/discrimination processing in the feature extraction/discrimination processing unit 40. An outline of the present invention is given below.

The pair generation processing unit 10 consists of a combination generation unit 11 and a rearrangement unit 12. It takes as input document-level language analysis information stored in the storage means or entered from the input means, and outputs combinations of evaluation targets and evaluation expressions as evaluation target-evaluation expression pairs in a predetermined order.

The discrimination result storage unit 20 stores information on those evaluation target-evaluation expression pairs, among all such pairs, that have been discriminated to be related.

The model storage unit 30 stores a plurality of models for discriminating whether the evaluation target and the evaluation expression contained in an evaluation target-evaluation expression pair are related.

The feature extraction/discrimination processing unit 40 consists of a processing target pair setting unit 41, a discrimination target pair determination unit 42, a general-purpose information extraction unit 43, a reputation information extraction unit 44, a model selection unit 45, a classifier 46, a discrimination result storing unit 47, and a pair output unit 48. Among the evaluation target-evaluation expression pairs output from the pair generation processing unit 10, it uses the discrimination result storage unit 20 to determine which pairs should be discriminated as to whether their evaluation target and evaluation expression are related. For each such pair it extracts features from the language analysis information, selects a model from the model storage unit 30, performs the discrimination, and stores related pairs in the discrimination result storage unit 20. After all evaluation target-evaluation expression pairs have been processed, it reads the related pairs from the discrimination result storage unit 20 and outputs them.

The reputation relationship extraction device of the present invention is implemented on a computer and includes display means such as a monitor, input means such as a keyboard, storage means such as a hard disk and memory, and a communication device connectable to an external network (none of which are shown).

The processing in each of the units described above is explained in detail below with examples.

<Language analysis information>
The input to the device is document-level language analysis information stored in the storage means or entered from the input means. Because reputation information often describes the reputation of an evaluation target named in the document title, it is preferable to include the document title (which may be absent) in the language analysis information in addition to the body text.

The language analysis information includes at least word information, named entity information, and evaluation expression information.

The word information is obtained by applying well-known morphological analysis to the text data. It is information on each word produced by word segmentation and includes at least the surface form and the part of speech.

The named entity information is obtained by applying well-known named entity extraction to the text data and the word information. It is information on each named entity and includes at least the position of the named entity and its type (person name, place name, etc.).

The evaluation expression information is obtained by applying well-known evaluation expression extraction to the text data, the word information, and the named entity information (for example, processing using the evaluation value expression dictionary described in Nozomi Kobayashi, et al., "Extraction of attribute-evaluation value pairs and opinion information using anaphora resolution techniques", Proceedings of the 11th Annual Meeting of the Association for Natural Language Processing, March 2005, pp. 436-439). It is information on each evaluation expression and includes at least the position of the evaluation expression.

In addition to the above, the language analysis information may include dependency analysis information and the like, obtained by applying well-known dependency parsing to the text data and the word information.

FIG. 4 shows an example of language analysis information consisting of word information, named entity information, evaluation expression information, and dependency analysis information.

In the examples of this embodiment, sentence 0 is always treated as the document title (when there is no document title, sentence 0 is empty).
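As an illustration only, the language analysis information described above could be held in a structure like the following Python sketch. The class and field names (`Word`, `NamedEntity`, `EvaluationExpression`, `LanguageAnalysis`, `build_analysis`) are assumptions introduced here and do not appear in the embodiment. The only conventions taken from the text are that each kind of information carries at least the items listed above and that sentence 0 is the (possibly empty) document title.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    surface: str  # surface form (notation) of the word
    pos: str      # part of speech


@dataclass
class NamedEntity:
    sentence: int  # sentence number (0 is the document title)
    word: int      # position of the first word within the sentence
    kind: str      # type, e.g. person name or place name
    text: str


@dataclass
class EvaluationExpression:
    sentence: int
    word: int
    text: str


@dataclass
class LanguageAnalysis:
    sentences: List[List[Word]]
    named_entities: List[NamedEntity] = field(default_factory=list)
    evaluation_expressions: List[EvaluationExpression] = field(default_factory=list)

    @property
    def title(self) -> List[Word]:
        # Sentence 0 is always the title; it is empty when no title exists.
        return self.sentences[0]


def build_analysis(title_words, body_sentences) -> LanguageAnalysis:
    """Place the title at index 0 so that body sentences start at 1."""
    return LanguageAnalysis([list(title_words)] + [list(s) for s in body_sentences])
```

Keeping the title at index 0 means that sentence numbers of body sentences start at 1, which matches the sentence numbering used in the examples of this embodiment.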

<Pair generation processing>
The combination generation unit 11 acquires document-level language analysis information stored in the storage means or entered from the input means (FIG. 2: step S1). From this it extracts evaluation targets and evaluation expressions, treating only named entities of predetermined types as evaluation targets, and generates all combinations of an evaluation target and an evaluation expression as evaluation target-evaluation expression pairs (FIG. 2: step S2).

In this embodiment, the predetermined named entity types are taken to be four: person name, place name, organization name, and artifact name. These types are not fixed and may be chosen in advance to suit the application.

Instead of generating all combinations of evaluation targets and evaluation expressions as pairs, the unit may generate only a subset of them: only the combinations in which the position of the evaluation target relative to the evaluation expression falls within a preset range of sentences; only the combinations in which the evaluation target is contained in at least one preset sentence; or only the combinations that satisfy both conditions. A sentence is specified by its sentence number (an absolute value) indicating its position. A range of sentences is specified by the sentence distance (a relative value), which is the difference between the sentence number of the sentence containing the evaluation target and that of the sentence containing the evaluation expression.

Next, the rearrangement unit 12 rearranges the evaluation target-evaluation expression pairs generated by the combination generation unit 11 into a predetermined order suited to the processing in the feature extraction/discrimination processing unit 40 described later. Here the pairs are grouped by evaluation expression and the groups are arranged in the order in which the evaluation expressions appear (FIG. 2: step S3). Pairs that share the same evaluation expression are ordered as follows. Pairs whose evaluation target is in the same sentence as the evaluation expression come first, and among them a pair whose evaluation target is closer to the evaluation expression comes earlier. Pairs whose evaluation target is in a different sentence come after. Among these, pairs whose evaluation target precedes the evaluation expression come first (a closer evaluation target earlier), and pairs whose evaluation target follows the evaluation expression come last (again, a closer evaluation target earlier). The rearranged evaluation target-evaluation expression pairs are output, and the pair generation processing ends.
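The ordering described above can be expressed as a sort key. The following Python sketch is one possible reading, not the implementation of the embodiment. The names `Mention`, `sort_key`, and `rearrange` are hypothetical. The text does not define "closer" precisely, so the sketch assumes word distance within a sentence and, across sentences, sentence distance followed by word position.

```python
from typing import List, NamedTuple, Tuple


class Mention(NamedTuple):
    text: str
    sentence: int  # sentence number (0 is the document title)
    word: int      # position of the first word within the sentence


Pair = Tuple[Mention, Mention]  # (evaluation target, evaluation expression)


def sort_key(pair: Pair):
    target, expr = pair
    # Primary key: order of appearance of the evaluation expression.
    expr_pos = (expr.sentence, expr.word)
    if target.sentence == expr.sentence:
        # Group 0: same sentence, nearer target first (assumed word distance).
        return (expr_pos, 0, abs(target.word - expr.word), 0)
    if target.sentence < expr.sentence:
        # Group 1: target in a preceding sentence; nearer means a later
        # sentence and, within it, a later word.
        return (expr_pos, 1, expr.sentence - target.sentence, -target.word)
    # Group 2: target in a following sentence; nearer means an earlier
    # sentence and, within it, an earlier word.
    return (expr_pos, 2, target.sentence - expr.sentence, target.word)


def rearrange(pairs: List[Pair]) -> List[Pair]:
    return sorted(pairs, key=sort_key)
```

Because the same-sentence pairs are examined first, a later processing stage can stop considering distant candidates as soon as a nearby related target has been found.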

FIG. 5 shows an example of the output of the pair generation processing unit 10 generated from the language analysis information shown in FIG. 4.

FIG. 5(a) is an example in which all combinations are generated. The named entities whose type is person name, place name, organization name, or artifact name are taken as evaluation targets: "X904i (artifact name)" (sentence 0, the title), "Tanaka Denki (organization name)" (sentence 2), "X904i (artifact name)" (sentence 2), and "ABC Cafe (organization name)" (sentence 4). ("Today" has the type date and is therefore excluded.) The pairs are output grouped by evaluation expression in order of appearance, "it was hot" (sentence 1), "super popular" (sentence 2), and "was really cool" (sentence 3), and within each group they are ordered by the position of the evaluation target.

FIG. 5(b) is an example in which combinations are generated under a position constraint. Here only the combinations in which the position of the evaluation target relative to the evaluation expression is within a preset range of sentences are output. Specifically, these are the combinations whose sentence distance (= sentence number of the evaluation target - sentence number of the evaluation expression) is 0 or less, in other words those whose evaluation target is in the same sentence as the evaluation expression or in a preceding sentence.
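Pair generation with the optional position constraints might look like the following Python sketch. It is an illustration under stated assumptions: the function name `generate_pairs`, its parameters, the type labels, and all word positions except "sentence 1, word 3" for the first evaluation expression are hypothetical. The targets and expressions themselves follow the example described for FIG. 5.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int   # sentence number (0 is the document title)
    word: int       # position of the first word within the sentence
    kind: str = ""  # named entity type; unused for evaluation expressions


# Named entity types accepted as evaluation targets (labels are assumed).
TARGET_KINDS = {"PERSON", "LOCATION", "ORGANIZATION", "ARTIFACT"}


def generate_pairs(
    entities: List[Mention],
    expressions: List[Mention],
    min_distance: Optional[int] = None,
    max_distance: Optional[int] = None,
    target_sentences: Optional[Set[int]] = None,
) -> List[Tuple[Mention, Mention]]:
    targets = [e for e in entities if e.kind in TARGET_KINDS]
    pairs = []
    for target, expr in product(targets, expressions):
        # Sentence distance = target sentence number - expression sentence number.
        distance = target.sentence - expr.sentence
        if min_distance is not None and distance < min_distance:
            continue
        if max_distance is not None and distance > max_distance:
            continue
        if target_sentences is not None and target.sentence not in target_sentences:
            continue
        pairs.append((target, expr))
    return pairs


# Example data; word positions other than (1, 3) are made up.
entities = [
    Mention("X904i", 0, 1, "ARTIFACT"),
    Mention("today", 1, 1, "DATE"),
    Mention("Tanaka Denki", 2, 1, "ORGANIZATION"),
    Mention("X904i", 2, 3, "ARTIFACT"),
    Mention("ABC Cafe", 4, 1, "ORGANIZATION"),
]
expressions = [
    Mention("it was hot", 1, 3),
    Mention("super popular", 2, 5),
    Mention("was really cool", 3, 1),
]

all_pairs = generate_pairs(entities, expressions)
constrained = generate_pairs(entities, expressions, max_distance=0)
```

With this data, `all_pairs` holds every combination of the four accepted targets with the three expressions, and `constrained` keeps only the pairs whose target is in the same sentence as the expression or in an earlier one. Passing `target_sentences={0}` would restrict targets to those in the title.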

The rearranged evaluation target-evaluation expression pairs generated by the pair generation processing unit 10 are temporarily stored, together with the document-level language analysis information described above, in a storage device (not shown) and are used in the processing of the feature extraction/discrimination processing unit 40.

<Feature extraction/discrimination processing>
As described above, the discrimination result storage unit 20 stores information on those evaluation target-evaluation expression pairs, among all such pairs, that have been discriminated to be related. The information on each pair includes at least evaluation target identification information, evaluation expression identification information, and same-sentence information.

The evaluation target identification information may be the surface form of the evaluation target itself, or it may be its standard form (which can be obtained, for example, from morphological analysis or by consulting other external knowledge). As an example of using the standard form: if the word information in the language analysis information carries a standard form, and the standard form of the word written "よこすか" (Yokosuka in hiragana) is "横須賀" (Yokosuka in kanji), as obtained from the result of morphological analysis, then "横須賀" is stored as the evaluation target identification information.

The evaluation expression identification information may be anything that distinguishes evaluation expressions at different positions. For example, if an ID is assigned to each evaluation expression, that ID may be used; position information within the document may also be used.

The same-sentence information is Yes when the evaluation target and the evaluation expression of the pair are in the same sentence, and No otherwise.

In the examples below, the surface form itself is used as the evaluation target identification information, and "sentence position (number)-word position (number)" of the first word of the evaluation expression is used as the evaluation expression identification information. For example, in the pair at order 1 in FIG. 5(b), the evaluation expression "it was hot" is the third word of sentence 1, so the evaluation expression identification information is "1-3", and the evaluation target identification information is "X904i".
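A record in the discrimination result storage unit could be built as in the following Python sketch. The names `StoredPair` and `make_stored_pair` are hypothetical. The sketch follows the convention of the examples (surface form as target identifier, "sentence-word" as expression identifier) and assumes word positions are counted from 1, as the "third word" wording suggests.

```python
from typing import NamedTuple


class StoredPair(NamedTuple):
    target_id: str       # evaluation target identification information
    expression_id: str   # evaluation expression identification information
    same_sentence: bool  # True corresponds to Yes, False to No


def make_stored_pair(target_surface: str, target_sentence: int,
                     expr_sentence: int, expr_word: int) -> StoredPair:
    return StoredPair(
        target_id=target_surface,
        # "sentence number-word number" of the expression's first word
        expression_id=f"{expr_sentence}-{expr_word}",
        same_sentence=(target_sentence == expr_sentence),
    )


# Pair at order 1 in the example: target "X904i" in sentence 0 (the title),
# expression starting at the third word of sentence 1.
record = make_stored_pair("X904i", 0, 1, 3)
```

Using the surface form (or standard form) as the target identifier means that two occurrences of the same entity map to the same record, which is what allows results to be aggregated per entity.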

The processing performed by the feature extraction/discrimination processing unit 40 is described in detail below.

First, in step S11, the processing target pair setting unit 41 sets the first of the evaluation target-evaluation expression pairs output from the pair generation processing unit 10 as the processing target pair. (When step S11 is reached from step S17, the unit instead sets as the processing target pair the pair that follows the one currently set.) Processing then moves to step S12.

Next, in step S12, the discrimination target pair determination unit 42 determines, based on the discrimination results stored in the discrimination result storage unit 20, whether the pair set as the processing target pair is a discrimination target pair, that is, a pair for which it should be discriminated whether its evaluation target and evaluation expression are related. Specifically, it determines whether the discrimination result storage unit 20 holds an evaluation target-evaluation expression pair whose evaluation target identification information and evaluation expression identification information match those of the processing target pair.

If the determination is No (no such pair is stored), the processing target pair is determined to be a discrimination target pair and the process proceeds to step S13. If the determination is Yes (such a pair is stored), the processing target pair is determined not to be a discrimination target pair and the process proceeds to step S17.

The determination condition in step S12 need not be the one described above, namely whether the discrimination result storage unit 20 holds a pair that matches the processing target pair. The following condition may be used instead. It is determined whether the discrimination result storage unit 20 holds an evaluation object-evaluation expression pair whose evaluation expression identification information matches the evaluation expression of the processing target pair. If it does not, the processing target pair is determined to be a discrimination target pair. If it does, it is further determined whether the same-sentence information of the stored pair is Yes and whether the sentence distance of the processing target pair is other than 0. Only when the same-sentence information is Yes and the sentence distance is other than 0 is the processing target pair determined not to be a discrimination target pair; in all other cases it is determined to be a discrimination target pair.

Under this condition, once an evaluation object related to the evaluation expression has been extracted from the same sentence, other parallel evaluation objects in that sentence still undergo the discrimination process (step S15), so more than one may be extracted, whereas evaluation objects in a sentence different from that of the evaluation expression skip the discrimination process and are treated as unrelated. This reflects the property that, in principle, an evaluation expression relates to only one evaluation object.

In the following description, the latter condition is used as the determination condition in step S12.
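The latter step S12 condition can be sketched as below. This is an illustrative reading, not the patented implementation: the class and function names are hypothetical, and treating "the stored pair" as "any stored pair with the same evaluation expression" is an assumption for the case where several are stored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StoredResult:
    target_id: str
    expression_id: str
    same_sentence: bool  # the same-sentence information stored in step S16


@dataclass(frozen=True)
class Pair:
    target_id: str
    expression_id: str
    sentence_distance: int  # object sentence no. minus expression sentence no.


def is_discrimination_target(pair: Pair, stored: Iterable[StoredResult]) -> bool:
    """Return True if the pair should go on to steps S13 to S15."""
    matches = [r for r in stored if r.expression_id == pair.expression_id]
    if not matches:
        return True  # nothing related to this expression has been found yet
    # Assumption: one same-sentence hit among the stored matches is enough.
    already_same_sentence = any(r.same_sentence for r in matches)
    # Skip only when a same-sentence relation exists AND this pair spans sentences.
    return not (already_same_sentence and pair.sentence_distance != 0)


stored_results = [StoredResult("X904i", "2-9", same_sentence=True)]
cross_sentence = Pair("X904i", "2-9", sentence_distance=-1)
parallel_object = Pair("Y", "2-9", sentence_distance=0)  # "Y" is a made-up object
```

With these inputs, `cross_sentence` is skipped while `parallel_object`, which sits in the same sentence as the expression, is still passed to the classifier.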

Next, in step S13, the general-purpose information extraction unit 43 and the reputation information extraction unit 44 perform general-purpose information extraction and reputation information extraction (together called feature extraction) on the evaluation object-evaluation expression pair determined to be a discrimination target pair, and the process proceeds to step S14.

The general-purpose information extraction and reputation information extraction of step S13 are described in detail below.

The general-purpose information extraction extracts, from the language analysis information, information that is generally usable for relation extraction, using existing techniques such as the method described in Non-Patent Document 2. For example, features representing the sentence structure and context between named entities (NEs), as shown in Non-Patent Document 2, are applied directly to the evaluation object-evaluation expression pair: the words between the pair (surface strings), the words between the pair (parts of speech), the number of words between the pair, the number of named entities between the pair, the phrase (bunsetsu) that the evaluation object depends on (surface string), the phrase that the evaluation object depends on (part of speech), the phrase that the evaluation expression depends on (surface string), the phrase that the evaluation expression depends on (part of speech), whether the pair lies within the same phrase, the length of the shortest path between the phrases containing the pair, dependency structure information, topic boundary information, ellipsis information, and so on.

FIG. 6 shows example features extracted by this general-purpose information extraction for the pair at sort order 5 in FIG. 5(b), "X904i - すごくかっこよかった" ("X904i - was really cool"). In this example, the following are output as general-purpose features: the words between the pair (surface strings), the number of words between the pair, the number of named entities between the pair, the phrase that the evaluation object depends on (surface string and part of speech), the phrase that the evaluation expression depends on (surface string and part of speech), whether the pair lies within the same phrase, and, as ellipsis information, whether the evaluation object is present, which case particle's region the evaluation object is stored in, at what position within that region it is stored, whether the evaluation object is stored under the highest-ranked case particle, and the path to the evaluation object.
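Two of the simpler general-purpose features, the words between the pair and their count, might be computed as in the sketch below. It assumes the sentence has already been split into words by a morphological analyzer; the function name and the sample sentence are hypothetical, and the dependency-based and ellipsis features are not covered.

```python
from typing import Dict, List, Union


def between_features(tokens: List[str], target_idx: int,
                     expr_idx: int) -> Dict[str, Union[List[str], int]]:
    """Words strictly between the evaluation object and the first word of the expression."""
    lo, hi = sorted((target_idx, expr_idx))
    between = tokens[lo + 1:hi]
    return {"words_between": between, "num_words_between": len(between)}


# Hypothetical pre-tokenized sentence; index 0 is the object, index 2 starts the expression.
tokens = ["X904i", "は", "すごく", "かっこよかった"]
feats = between_features(tokens, target_idx=0, expr_idx=2)
```

Sorting the two indices lets the same function serve whether the evaluation object precedes or follows the expression.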

The reputation information extraction extracts, from the language analysis information, at least the following features that are effective for reputation analysis: the sentence distance between the pair, whether the evaluation object is included in the document title, the number of evaluation expressions between the pair, and the evaluation object type.

These features are described below using the pair at sort order 5 in FIG. 5(b), "X904i - すごくかっこよかった", as a concrete example.

The sentence distance between the pair is calculated as the evaluation object sentence number minus the evaluation expression sentence number. For "X904i - すごくかっこよかった", the evaluation expression sentence number is 3 and the evaluation object sentence number is 2, so the distance is 2 - 3 = -1.

Whether the evaluation object is included in the document title indicates whether that evaluation object appears in the document title. The "X904i" of "X904i - すごくかっこよかった" appears in the second sentence and not in sentence 0 (the document title), so the value is No.

The number of evaluation expressions between the pair is the number of evaluation expressions located between the pair's evaluation object and its evaluation expression. In the "X904i - すごくかっこよかった" example there are none, so the value is 0.

The evaluation object type is the named entity type of the evaluation object; for "X904i" it is "artifact".

FIG. 7 summarizes the features extracted by the reputation information extraction described above.
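The four reputation features can be sketched for the sort order 5 example as follows. This is a minimal illustration under stated assumptions: `Mention` and `reputation_features` are hypothetical names, the word positions of "X904i" and of the expression are invented for the example (the text gives only their sentence numbers), and the title check simply tests whether the object mention sits in sentence 0.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Mention:
    surface: str
    sentence_no: int  # 0 denotes the document title
    word_no: int      # position of the first word within the sentence


def reputation_features(target: Mention, expression: Mention,
                        all_expressions: Iterable[Mention],
                        target_type: str) -> Dict[str, object]:
    a = (target.sentence_no, target.word_no)
    b = (expression.sentence_no, expression.word_no)
    lo, hi = min(a, b), max(a, b)
    # Count evaluation expressions located strictly between the two mentions.
    between = sum(1 for e in all_expressions
                  if lo < (e.sentence_no, e.word_no) < hi)
    return {
        "sentence_distance": target.sentence_no - expression.sentence_no,
        "in_title": target.sentence_no == 0,
        "expressions_between": between,
        "target_type": target_type,
    }


target = Mention("X904i", 2, 1)                   # word position is hypothetical
expression = Mention("すごくかっこよかった", 3, 4)  # word position is hypothetical
expressions = [Mention("暑かったね", 1, 3), expression]
features = reputation_features(target, expression, expressions, "artifact")
```

The resulting values match those given in the text for this pair: a sentence distance of -1, not in the title, no intervening evaluation expressions, and type "artifact".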

Next, in step S14, the model selection unit 45 classifies the evaluation object-evaluation expression pair determined to be a discrimination target pair, using information on that pair, and the process proceeds to step S15. In this embodiment, the pair is classified as "same sentence" or "other than same sentence", that is, according to whether its evaluation object and evaluation expression are contained in the same sentence.

Next, in step S15, the classifier 46 retrieves a model from the model storage unit 30 using the features output from the general-purpose information extraction unit 43 and the reputation information extraction unit 44 and the classification information for the pair output from the model selection unit 45. Based on the retrieved model, it determines whether there is a relationship between the evaluation object and the evaluation expression of the pair determined to be a discrimination target pair, and the process proceeds to step S16.

The models are assumed to have been created in advance and stored in the model storage unit 30 by applying well-known machine learning to the following: results determined in advance as to whether a relationship exists between the evaluation object and evaluation expression of given evaluation object-evaluation expression pairs; language analysis information produced, using the existing techniques mentioned above, from text containing those pairs; and information extracted in advance by the pair generation processing unit 10, the general-purpose information extraction unit 43, and the reputation information extraction unit 44. The discrimination results for those given pairs are assumed to be based on human judgment. Separate models may be built for each classification of evaluation object-evaluation expression pair, such as "same sentence" and "other than same sentence", or a single model may be built without distinguishing the classifications.

In addition to the determination of whether a relationship exists, the classifier 46 may be configured to output a numerical value representing the degree of relationship. Any well-known machine learning method may be used, but one that can learn directly from tree-structured or graph-structured input data is preferable.

Next, in step S16, the discrimination result storing unit 47 stores, in the discrimination result storage unit 20, the evaluation expression identification information, evaluation object identification information, and same-sentence information of the evaluation object-evaluation expression pair that the classifier 46 determined to be related. The process then proceeds to step S17.

When the classifier 46 outputs a numerical value representing the degree of relationship, a relationship may be deemed to exist only when that value exceeds a predetermined threshold.
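Model selection (step S14) followed by thresholded discrimination (step S15) can be sketched as below. The two "models" are stand-in functions returning fixed scores, not trained classifiers, and the dictionary keys, function names, and threshold of 0.5 are all assumptions made for illustration.

```python
from typing import Callable, Dict, Mapping

# A model maps a feature dictionary to a degree-of-relationship score.
Model = Callable[[Mapping[str, object]], float]


def select_model_key(sentence_distance: int) -> str:
    """Step S14: classify the pair as same sentence or other than same sentence."""
    return "same_sentence" if sentence_distance == 0 else "other"


def discriminate(features: Mapping[str, object], models: Dict[str, Model],
                 threshold: float = 0.5) -> bool:
    """Step S15: pick the model for the pair's class and threshold its score."""
    key = select_model_key(int(features["sentence_distance"]))
    score = models[key](features)
    # Related only when the score exceeds the predetermined threshold.
    return score > threshold


# Stand-in models with fixed scores, for illustration only.
toy_models: Dict[str, Model] = {
    "same_sentence": lambda f: 0.9,
    "other": lambda f: 0.2,
}
related = discriminate({"sentence_distance": 0}, toy_models)
unrelated = discriminate({"sentence_distance": -1}, toy_models)
```

Keeping one model per class lets same-sentence pairs, where dependency features are informative, be scored separately from cross-sentence pairs.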

Next, in step S17, the processing target pair setting unit 41 determines whether the current processing target pair is the last of the evaluation object-evaluation expression pairs output from the pair generation processing unit 10. If Yes, the process proceeds to step S18; if No, it returns to step S11.

Next, in step S18, the pair output unit 48 retrieves from the discrimination result storage unit 20 the evaluation object-evaluation expression pairs (their information) determined to be related in the document being processed, outputs them to a display means or a storage device, and the process ends.
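The control flow of steps S11 to S18 can be summarized in a short loop. This is a sketch only: the step logic is passed in as functions, and the sample pairs, the exact-match S12 rule, and the toy classifier that calls a pair related when its sentence distance is 0 are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# (evaluation object id, evaluation expression id, sentence distance)
PairT = Tuple[str, str, int]


def extract_related_pairs(
    pairs: Sequence[PairT],
    is_target: Callable[[PairT, List[PairT]], bool],
    extract_features: Callable[[PairT], dict],
    classify: Callable[[dict], bool],
) -> List[PairT]:
    stored: List[PairT] = []               # plays the role of storage unit 20
    for pair in pairs:                     # S11 and S17: walk the sorted pairs
        if not is_target(pair, stored):    # S12: skip pairs that need no check
            continue
        features = extract_features(pair)  # S13: feature extraction
        if classify(features):             # S14 and S15: model choice and decision
            stored.append(pair)            # S16: store related pairs
    return stored                          # S18: output


sample_pairs: List[PairT] = [
    ("X904i", "1-3", 1), ("X904i", "2-9", 0), ("X904i", "2-9", -1),
]
result = extract_related_pairs(
    sample_pairs,
    # Simple S12 rule: skip a pair whose object and expression ids are already stored.
    is_target=lambda p, s: all((p[0], p[1]) != (q[0], q[1]) for q in s),
    extract_features=lambda p: {"sentence_distance": p[2]},
    classify=lambda f: f["sentence_distance"] == 0,  # toy stand-in for the classifier
)
```

Because the stored results feed back into step S12, the order in which the pairs are processed affects which later pairs are skipped.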

FIG. 8 shows the processing results in each part of the feature extraction/discrimination processing unit 40 when the output of the pair generation processing unit 10 is as shown in FIG. 5(b).

First, in step S11, the pair at sort order 1 is set as the processing target pair. In step S12, nothing is yet stored in the discrimination result storage unit 20, so the pair is determined to be a discrimination target pair. The process proceeds to step S13 for feature extraction, and in step S14 the pair is classified as "other than same sentence" because its evaluation object and evaluation expression are not in the same sentence. In step S15, the "other than same sentence" model is retrieved and discrimination yields "not related", so nothing is done in step S16. The process then moves from step S17 to step S11, and the pair at sort order 2 is set as the processing target pair.

As before, the process proceeds through steps S12 and S13. In step S14, the pair is classified as "same sentence" because its evaluation object and evaluation expression are in the same sentence. In step S15, the "same sentence" model is retrieved and discrimination yields "related". In step S16, the following are set: "X904i", its surface form, as the evaluation object identification information; "2-9" as the evaluation expression identification information, because the expression is the ninth word of the second sentence; and "Yes" as the same-sentence information, because the evaluation object and evaluation expression are in the same sentence. Processing continues in the same way, yielding the step S12 determinations, the step S15 discrimination results, and, for related pairs, the information to be stored in the discrimination result storage unit 20, as shown in FIG. 8.

In the processing for sort order 4, the evaluation expression "超人気" ("super popular") at position 2-9 and the evaluation object surface form "X904i" match the information stored in the discrimination result storage unit 20 during the processing of sort order 2, so step S12 yields Yes (not a discrimination target pair).

FIG. 9 shows an example of the final output of the feature extraction/discrimination processing unit 40 corresponding to the processing results of FIG. 8. Only the surface forms of the evaluation objects and evaluation expressions are output here, but other information such as their positions may also be output (because there may be more than one evaluation object, for example the position of the evaluation object closest to the evaluation expression).

FIG. 1 is a block diagram showing an example of an embodiment of the reputation relationship extraction device of the present invention.
FIG. 2 is a flowchart of the pair generation processing in the pair generation processing unit.
FIG. 3 is a flowchart of the feature extraction/discrimination processing in the feature extraction/discrimination processing unit.
FIG. 4 is an explanatory diagram showing an example of language analysis information.
FIG. 5 is an explanatory diagram showing an example of evaluation object-evaluation expression pairs generated by the pair generation processing unit.
FIG. 6 is an explanatory diagram showing an example of features extracted by the general-purpose information extraction unit.
FIG. 7 is an explanatory diagram showing an example of features extracted by the reputation information extraction unit.
FIG. 8 is an explanatory diagram showing an example of processing results in the feature extraction/discrimination processing unit.
FIG. 9 is an explanatory diagram showing an example of the final output of the feature extraction/discrimination processing unit.

Explanation of symbols

10: pair generation processing unit, 11: combination generation unit, 12: rearrangement unit, 20: discrimination result storage unit, 30: model storage unit, 40: feature extraction/discrimination processing unit, 41: processing target pair setting unit, 42: discrimination target pair determination unit, 43: general-purpose information extraction unit, 44: reputation information extraction unit, 45: model selection unit, 46: classifier, 47: discrimination result storing unit, 48: pair output unit.

Claims

A reputation relationship extraction device that extracts a set of a related evaluation object and an evaluation expression from document-based language analysis information including at least word information, named entity information, and evaluation expression information,
A combination generation unit that extracts the evaluation object and the evaluation expression from the document unit language analysis information, and generates a combination of the evaluation object and the evaluation expression as an evaluation object-evaluation expression pair;
A rearrangement unit for rearranging the evaluation object-evaluation expression pairs in a predetermined order;
A discrimination result storage unit for storing the evaluation object-evaluation expression pairs determined to have a relationship among all evaluation object-evaluation expression pairs;
A model storage unit that stores a plurality of models for determining whether or not there is a relationship between the evaluation object included in the evaluation object-evaluation expression pair and the evaluation expression;
A processing target pair setting unit for sequentially setting the evaluation target-evaluation expression pair after the rearrangement from the first pair to the last pair as a processing target pair;
Based on the determination result stored in the determination result storage unit, it should be determined whether or not the evaluation object-evaluation expression pair set as the processing object pair has a relationship between the evaluation object and the evaluation expression. A discrimination target pair determination unit for determining whether the pair is a discrimination target;
A general-purpose information extraction unit that extracts information on words between the pairs or dependency information between the pairs as features from the language analysis information for the evaluation target-evaluation expression pair determined to be the determination target pair;
A reputation information extraction unit that extracts, from the language analysis information, for the evaluation object-evaluation expression pair determined to be the determination target pair, the sentence distance between the pair, whether the evaluation object of the pair is included in the document title, the number of evaluation expressions between the pair, and the evaluation object type as features;
A model selection unit for classifying the evaluation object-evaluation expression pair determined as the determination object pair;
A model is extracted from a model storage unit using the features extracted by the general-purpose information extraction unit and the reputation information extraction unit and the classification result classified by the model selection unit, and the discrimination is performed based on the extracted model. A classifier for discriminating whether or not there is a relationship between the evaluation object of the evaluation object-evaluation expression pair determined as the object pair and the evaluation expression;
A discrimination result storing unit that stores the evaluation object-evaluation expression pair determined to have a relationship in the discrimination result storage unit;
A reputation relationship extraction apparatus comprising: a pair output unit that outputs an evaluation object-evaluation expression pair stored in a discrimination result storage unit.

It is characterized by comprising a combination generation unit that extracts evaluation targets and evaluation expressions from document-based language analysis information, and generates all combinations of the extracted evaluation objects and evaluation expressions as evaluation target-evaluation expression pairs. The reputation relationship extraction device according to claim 1.

The evaluation target and the evaluation expression are extracted from the language analysis information of the document unit, and the position of the evaluation target viewed from the evaluation expression is within a preset sentence range among all combinations of the extracted evaluation target and the evaluation expression. The reputation relation extraction device according to claim 1, further comprising a combination generation unit that generates only combinations as evaluation target-evaluation expression pairs.

The evaluation target and the evaluation expression are extracted from the linguistic analysis information of the document unit, and only the combination included in at least one sentence preset by the evaluation target is evaluated among all the combinations of the extracted evaluation target and the evaluation expression. The reputation relation extraction device according to claim 1, further comprising a combination generation unit that generates as an evaluation expression pair.

It is determined whether or not an evaluation target-evaluation expression pair including an evaluation target and an evaluation expression matching the processing target pair is stored in the determination result storage unit. If not stored, the processing target pair is determined as a determination target pair. The reputation relationship extraction device according to claim 1, further comprising: a discrimination target pair determination unit that determines and determines that the processing target pair is not a discrimination target pair if it is stored.

It is determined whether or not an evaluation target-evaluation expression pair including an evaluation expression that matches the processing target pair is stored in the determination result storage unit, and if not stored, the processing target pair is determined as a determination target pair, If stored, the evaluation object and evaluation expression of the evaluation object-evaluation expression pair stored in the determination result storage unit exist in the same sentence, and the evaluation object and evaluation expression of the processing object pair are different in the sentence The reputation relationship extraction device according to claim 1, further comprising: a discrimination target pair determination unit that determines that the processing target pair is not a discrimination target pair only when the processing target pair exists.

A reputation relationship extraction method for extracting a set of a related evaluation object and an evaluation expression from document-based language analysis information including at least word information, unique expression information, and evaluation expression information,
A step of generating a combination of the evaluation object and the evaluation expression as an evaluation object-evaluation expression pair, wherein the combination generation unit extracts the evaluation object and the evaluation expression from the document unit language analysis information;
A rearrangement unit rearranging the evaluation object-evaluation expression pairs in a predetermined order; and
A processing target pair setting unit sequentially sets the rearranged evaluation target-evaluation expression pair from the first pair to the last pair as a processing target pair;
Based on the discrimination result stored in the discrimination result storage unit that stores the evaluation subject-evaluation expression pair that is discriminated as having a relation among all the evaluation object-evaluation expression pairs, the discrimination target pair determination unit Determining whether the evaluation object-evaluation expression pair set as a pair is a determination object pair to determine whether or not there is a relationship between the evaluation object and the evaluation expression;
A step in which a general-purpose information extraction unit extracts information about words between the pairs or dependency information between the pairs as features from the language analysis information for the evaluation target-evaluation expression pair determined to be the determination target pair;
A step in which the reputation information extraction unit extracts, from the language analysis information, for the evaluation object-evaluation expression pair determined to be the determination target pair, the sentence distance between the pair, whether the evaluation object of the pair is included in the document title, the number of evaluation expressions between the pair, and the evaluation object type as features;
A step in which the model selection unit classifies the evaluation target-evaluation expression pairs determined as the determination target pairs;
A step in which the classifier, using the features extracted by the general-purpose information extraction unit and the reputation information extraction unit and the classification result produced by the model selection unit, extracts a model from a model storage unit that stores a plurality of models for determining whether or not there is a relationship between the evaluation object and the evaluation expression included in an evaluation object-evaluation expression pair, and determines, based on the extracted model, whether there is a relationship between the evaluation object and the evaluation expression of the evaluation object-evaluation expression pair determined as the determination target pair;
A step in which the discrimination result storing unit stores the evaluation object-evaluation expression pair determined to be related in the discrimination result storage unit;
A reputation relationship extraction method comprising: a step in which the pair output unit outputs the evaluation object-evaluation expression pairs stored in the discrimination result storage unit.

The combination generation unit includes a step of extracting the evaluation object and the evaluation expression from the document unit language analysis information, and generating all combinations of the extracted evaluation object and the evaluation expression as an evaluation object-evaluation expression pair. The reputation relationship extraction method according to claim 7.

The combination generation unit extracts the evaluation object and the evaluation expression from the document unit language analysis information, and among the combinations of the extracted evaluation object and the evaluation expression, the position of the evaluation object viewed from the evaluation expression is set in advance. The reputation relationship extracting method according to claim 7, further comprising: generating only combinations within the range as evaluation object-evaluation expression pairs.

The combination generation unit extracts the evaluation object and the evaluation expression from the document unit language analysis information, and the evaluation object is included in at least one sentence set in advance among all combinations of the extracted evaluation object and the evaluation expression. The reputation relationship extracting method according to claim 7, further comprising: generating only a combination as an evaluation object-evaluation expression pair.

The reputation relationship extracting method according to claim 7, further comprising a step in which the discrimination target pair determination unit determines whether the discrimination result storage unit, which stores the evaluation object-evaluation expression pairs determined to be related among all the evaluation object-evaluation expression pairs, holds an evaluation object-evaluation expression pair including an evaluation object and an evaluation expression that match the processing target pair, determines the processing target pair to be a discrimination target pair if no such pair is stored, and determines that the processing target pair is not a discrimination target pair if such a pair is stored.

The determination target pair determination unit includes an evaluation expression that matches the processing target pair in the determination result storage unit that stores the evaluation target-evaluation expression pair that is determined to be related among all the evaluation target-evaluation expression pairs. It is determined whether or not an evaluation target-evaluation expression pair is stored, and if not stored, the processing target pair is determined as a determination target pair. If stored, it is further stored in the determination result storage unit. The evaluation target-evaluation expression pair of the evaluation target and the evaluation expression are present in the same sentence, and the processing target pair is determined not to be the discrimination target pair only when the evaluation target of the processing target pair and the evaluation expression exist in different sentences. The reputation relationship extracting method according to claim 7 , further comprising a step of:

A reputation relationship extraction program for causing a computer to execute each processing step of the reputation relationship extraction method according to claim 7.