JP2006190229A

JP2006190229A - Opinion extraction learning device and opinion extraction classifying device

Info

Publication number: JP2006190229A
Application number: JP2005003265A
Authority: JP
Inventors: Kenji Tateishi (立石 健二); Yuji Matsumoto (松本 裕治)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2005-01-07
Filing date: 2005-01-07
Publication date: 2006-07-20
Anticipated expiration: 2025-01-07
Also published as: JP4600045B2

Abstract

PROBLEM TO BE SOLVED: To provide an opinion extraction learning device and an opinion extraction classifying device for precisely extracting the opinions of writers from documents to be investigated.

SOLUTION: The opinion extraction learning device comprises: an attribute evaluation pair candidate extracting means 200 that receives documents to be investigated in which the opinions of writers are described, such as questionnaire results, extracts attribute evaluation pair candidates from the documents in groups, and assigns features of the attribute expression and the evaluation expression to each attribute evaluation pair candidate; a classifying-1 means 260 that receives the grouped attribute evaluation pair candidates and a classification model 1 such as a discriminant function Z1, and extracts positive example attribute evaluation pair candidates from the attribute evaluation pair candidates in the same group; and a classifying-2 means 290 that receives the positive example attribute evaluation pair candidates and a classification model 2 such as a discriminant function Z2, and extracts positive example attribute evaluation pairs from the positive example attribute evaluation pair candidates. The opinions of the writers are thereby extracted from the documents to be investigated.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an opinion extraction learning device and an opinion extraction classification device for extracting, as opinions, pairs of an attribute expression and an evaluation expression from documents under investigation.

Many kinds of documents, such as blogs and other Web bulletin boards, e-mail, questionnaire results, inquiry data received at corporate inquiry desks, and reports, contain many opinions about many subjects. These include opinions expressing a positive or negative evaluation of a specific subject, complaint information reporting troubles or problems, and requests pointing out areas for improvement.

If the passages describing a writer's opinion about a specific object could be extracted automatically from documents containing such varied opinions, the reputation and criticism of that object, customer requests, and similar information could be obtained automatically. That information would support preliminary research before a product purchase, the sale of improved products reflecting those opinions, and the development of new products, and so would be important for marketing activities such as corporate market research.

However, these general documents are not written for the purpose of describing opinions about the object, so they also contain much information unrelated to opinions. Opinion information therefore cannot be extracted from them automatically and accurately, and the documents cannot be used for statistics as they are.

As background art for opinion extraction, a method is known that uses predetermined rules to extract pairs (attribute evaluation pairs) consisting of an attribute of a specific target mentioned in a document (an expression such as design, price, support, or performance) and an evaluation related to that attribute expression (an expression indicating a value the attribute can take, such as good, spacious, or like) (see, for example, Non-Patent Document 1).

The method of Non-Patent Document 1 uses two kinds of rules: an extraction rule that determines whether a correct relationship exists between an attribute expression and an evaluation expression, and an opinion judgment rule that, when such a relationship exists, determines whether the pair expresses a subjective evaluation.

An attribute expression, as used above, is an expression about a specific target of opinion extraction that denotes some aspect of that target, such as "design", "interior", "price", "support", or "performance".

An evaluation expression is an expression indicating an evaluation of something. It describes a value that an attribute expression can take, such as "good", "like", "spacious", or "cheap".

An attribute evaluation pair is a combination of an attribute expression and an evaluation expression that satisfies the following two requirements.
(Requirement 1) There is a correct relationship between the attribute expression and the evaluation expression.
(Requirement 2) The pair expresses a subjective evaluation.
For example, for the attribute expression "design" (デザイン) and the evaluation expression "good" (良い), sentence (a) below satisfies Requirement 1 but sentence (b) does not, because in (b) what "good" refers to is not "design".
(a) デザインが良い。 (The design is good.)
(b) デザインはさておき、良いこともある。 (Design aside, there are good things too.)

Likewise, for the attribute expression "design" and the evaluation expression "good", sentence (c) below satisfies Requirement 2 but sentences (d) and (e) do not. Sentence (d) is a question and sentence (e) is a condition, so neither expresses the writer's own subjective evaluation.
(c) デザインは良い。 (The design is good.)
(d) デザインは良いか？ (Is the design good?)
(e) デザインは良いなら、買う。 (If the design is good, I will buy it.)

Opinion extraction means extracting positive example attribute evaluation pairs (correct combinations of an attribute expression and an evaluation expression) from a document. A negative example attribute evaluation pair is a combination of an attribute expression and an evaluation expression that is not suitable for extraction as an opinion.

A model of anaphora resolution is also known that uses machine learning to extract antecedent-anaphor pairs from context (see, for example, Non-Patent Document 2).
Non-Patent Document 1: Kenji Tateishi, Shunichi Fukushima, Nozomi Kobayashi, Tetsuro Takahashi, Atsushi Fujita, Kentaro Inui, Yuji Matsumoto: Extraction of opinion information from a set of Web documents and creation of a summary based on the viewpoint, Information Processing Society of Japan Research Report, NL-163, pp. 1-8 (2004)
Non-Patent Document 2: Soon, W. M., Ng, H. T. and Lim, D. C. Y.: A Machine Learning Approach to Coreference Resolution of Noun Phrases, Computational Linguistics, Vol. 27, No. 4, pp. 521-544 (2001)

Consider opinion extraction by the method of Non-Patent Document 1 using the following extraction rule (1) and opinion judgment rule (2). The arrow "→" in (1) means that a dependency relationship exists. An extraction rule is a rule for extracting, as a correct answer, a combination in which [attribute] and [evaluation] are related.
Extraction rule (1): [attribute] + 「が」 (the particle "ga") → [evaluation]
Opinion judgment rule (2): [evaluation] + 「なら」 ("nara", meaning "if")

Suppose the attribute expression is "design" (デザイン) and the evaluation expression is "good" (良い), and apply extraction rule (1) and opinion judgment rule (2) to the following sample sentences (a) to (e).
(a) デザインが良い。 (The design is good.)
(b) デザインの方に関しては、良い出来といえるだろう。 (As for the design, it can be called well done.)
(c) デザインが変わった、良い製品がほしい。 (I want a good product with a different design.)
(d) デザインが良いならほしい。 (If the design is good, I want it.)
(e) デザインが良いという考え方もあるが、私は… (Some take the view that the design is good, but I ...)

Attempting to extract opinions from these sample sentences gives the following results.
Sentence (a) matches extraction rule (1), so an attribute evaluation pair representing an opinion is extracted.
Sentences (b) and (c) do not match extraction rule (1), so no attribute evaluation pair is extracted.
Sentence (d) matches extraction rule (1) but also matches opinion judgment rule (2), so no attribute evaluation pair is extracted.
Sentence (e) matches extraction rule (1) and does not match opinion judgment rule (2), so an attribute evaluation pair representing an opinion is extracted.

Therefore, the opinion extraction method of Non-Patent Document 1 extracts attribute evaluation pairs representing opinions from sentences (a) and (e).
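The rule-based behaviour walked through above can be reproduced with a minimal sketch. It is not the implementation of Non-Patent Document 1: the dependency arrow in extraction rule (1) is approximated here by direct adjacency of the strings, and the function names are hypothetical.

```python
import re

# Sample sentences (a)-(e) from the text above.
SAMPLES = {
    "a": "デザインが良い。",
    "b": "デザインの方に関しては、良い出来といえるだろう。",
    "c": "デザインが変わった、良い製品がほしい。",
    "d": "デザインが良いならほしい。",
    "e": "デザインが良いという考え方もあるが、私は…",
}


def matches_extraction_rule(sentence: str, attribute: str, evaluation: str) -> bool:
    """Extraction rule (1): [attribute] + "が" -> [evaluation].

    Assumption: the dependency relation is approximated by adjacency;
    a real system would consult a dependency parser instead.
    """
    pattern = re.escape(attribute) + "が" + re.escape(evaluation)
    return re.search(pattern, sentence) is not None


def matches_opinion_rule(sentence: str, evaluation: str) -> bool:
    """Opinion judgment rule (2): [evaluation] + "なら" marks a condition."""
    return (evaluation + "なら") in sentence


def extract_by_rules(samples: dict, attribute: str, evaluation: str) -> list:
    """Return the keys of sentences that pass rule (1) and are not hit by rule (2)."""
    return [
        key
        for key, sentence in samples.items()
        if matches_extraction_rule(sentence, attribute, evaluation)
        and not matches_opinion_rule(sentence, evaluation)
    ]


extracted = extract_by_rules(SAMPLES, "デザイン", "良い")
print(extracted)  # ['a', 'e']
```

The sketch returns (a) and (e), the same outcome as the walkthrough, which makes the two failure cases easy to see: (b) is missed and (e) is wrongly accepted.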

However, among sentences (a) to (e), the sentences containing an attribute evaluation pair that should be extracted as an opinion are (a) and (b). Sentences (a) and (b) state the writer's opinion that, for a specific target, [attribute] = "design" and [evaluation] = "good". In sentence (c), by contrast, "good" does not refer to "design", and sentences (d) and (e) cannot be said to state the writer's own subjective evaluation or opinion that the design of a specific target is good.

Thus the opinion extraction method of Non-Patent Document 1 fails to extract the opinion from sample sentence (b), even though that sentence states the writer's opinion.

The method of Non-Patent Document 1 also erroneously extracts, as an opinion, an attribute evaluation pair from sample sentence (e), which does not state the writer's opinion.

As this shows, with the rule-based opinion extraction method of Non-Patent Document 1 it is very difficult to write by hand an exhaustive set of rules covering both the relationship between [attribute] and [evaluation] and the judgment of subjectivity.

The anaphora resolution model of Non-Patent Document 2 is not itself intended for opinion extraction, so it cannot extract a writer's opinions from a document as it stands. In addition, an extraction method based on machine learning tends to produce many more incorrect examples (combinations judged to have no relationship between antecedent and anaphor) than correct examples (combinations judged to have such a relationship), which lowers extraction performance. In machine learning generally, extraction performance tends to fall when the training cases are extremely unbalanced between correct and incorrect examples.

The present invention has been made to solve the above problems. Its object is to provide an opinion extraction learning device and an opinion extraction classification device that extract from a document attribute evaluation pairs indicating some relationship between [attribute] and [evaluation], and then extract the positive example attribute evaluation pairs from them by machine learning, so that opinions can be extracted from the document with high accuracy.

To achieve the above object, the opinion extraction learning device of the present invention comprises the following means.
Attribute evaluation pair candidate extraction means, which receives training documents containing positive example attribute evaluation pairs, each carrying a label that associates an attribute expression with an evaluation expression in the document; extracts, grouped by training document, the attribute evaluation pair candidates corresponding to the combinations of attribute expressions and evaluation expressions present in the training documents; and assigns to each attribute evaluation pair candidate feature information such as the surface character strings of the phrases containing the attribute expression and the evaluation expression, the parts of speech of those phrases, whether the phrase containing the attribute expression and the phrase containing the evaluation expression are in a dependency relationship, or the distance between the attribute expression and the evaluation expression.
Training case creation 1 means, which receives the training documents and the attribute evaluation pair candidates grouped by training document; defines each attribute evaluation pair candidate identical to a positive example attribute evaluation pair as a positive example attribute evaluation pair candidate; and, within each group in which a positive example attribute evaluation pair candidate exists, defines the attribute evaluation pair candidates that are not positive example attribute evaluation pair candidates as negative example attribute evaluation pair candidates.
Model creation 1 means, which receives the positive example and negative example attribute evaluation pair candidates and creates a classification model 1 such as a discriminant function Z1.
Classification 1 means, which receives the training documents, the attribute evaluation pair candidates grouped by training document, and the classification model 1; extracts, as undetermined attribute evaluation pair candidates, the candidates belonging to groups that contain no candidate identical to a positive example attribute evaluation pair; and uses the classification model 1 to classify each undetermined attribute evaluation pair candidate as a positive example or a negative example attribute evaluation pair candidate.
Training case creation 2 means, which receives the training documents, the attribute evaluation pair candidates grouped by training document, and the positive example attribute evaluation pair candidates classified by the classification 1 means; replaces those positive example attribute evaluation pair candidates with negative example attribute evaluation pair candidates; and defines the attribute evaluation pair candidates identical to positive example attribute evaluation pairs as positive example attribute evaluation pair candidates.
Model creation 2 means, which receives the positive example attribute evaluation pair candidates defined by the training case creation 2 means and the negative example attribute evaluation pair candidates replaced by it, and creates and outputs a classification model 2 such as a discriminant function Z2.

According to this invention, negative examples are extracted automatically from the training documents, and the device outputs classification model 1, created by selectively using those negative examples, and classification model 2, created by selectively using negative examples so that mainly subjectivity can be learned. The relationship between attribute expressions and evaluation expressions that matters for opinion extraction can therefore be learned.
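The candidate extraction and the first training-case step described for the learning device can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: documents are pre-tokenized English word lists, a candidate is identified only by its two strings, the token distance is the only feature, and all names (`Candidate`, `extract_candidates`, `create_training_cases`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    group: int       # index of the training document (the group)
    attribute: str
    evaluation: str
    distance: int    # feature: token distance between the two expressions


def extract_candidates(documents, attributes, evaluations):
    """Enumerate every attribute/evaluation combination in each document."""
    candidates = []
    for group, tokens in enumerate(documents):
        attr_hits = [(i, t) for i, t in enumerate(tokens) if t in attributes]
        eval_hits = [(i, t) for i, t in enumerate(tokens) if t in evaluations]
        for (ai, a), (ei, e) in product(attr_hits, eval_hits):
            candidates.append(Candidate(group, a, e, abs(ai - ei)))
    return candidates


def create_training_cases(candidates, gold_pairs):
    """Split candidates into positive, negative, and undetermined.

    gold_pairs maps a group index to the set of labeled
    (attribute, evaluation) pairs in that training document.
    Negatives are taken only from groups that contain a positive;
    candidates in other groups stay undetermined.
    """
    def is_gold(c):
        return (c.attribute, c.evaluation) in gold_pairs.get(c.group, set())

    groups_with_positive = {c.group for c in candidates if is_gold(c)}
    positives = [c for c in candidates if is_gold(c)]
    negatives = [c for c in candidates
                 if c.group in groups_with_positive and not is_gold(c)]
    undetermined = [c for c in candidates
                    if c.group not in groups_with_positive]
    return positives, negatives, undetermined


documents = [
    ["design", "is", "good", "but", "price", "is", "high"],
    ["is", "the", "design", "good"],
]
gold_pairs = {0: {("design", "good"), ("price", "high")}}

candidates = extract_candidates(documents, {"design", "price"}, {"good", "high"})
positives, negatives, undetermined = create_training_cases(candidates, gold_pairs)
print(len(positives), len(negatives), len(undetermined))  # 2 2 1
```

In this sketch the undetermined list is what the classification 1 means would then label with classification model 1 before the second training-case step.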

To achieve the above object, the opinion extraction classification device of the present invention comprises the following means.
Attribute evaluation pair candidate extraction means, which receives documents under investigation in which writers' opinions are described, such as blogs and other Web bulletin boards, e-mail, questionnaire results, inquiry data received at corporate inquiry desks, and reports; extracts, grouped by document under investigation, the attribute evaluation pair candidates corresponding to the combinations of attribute expressions and evaluation expressions present in those documents; and assigns to each attribute evaluation pair candidate feature information such as the surface character strings of the phrases containing the attribute expression and the evaluation expression, the parts of speech of those phrases, whether the phrase containing the attribute expression and the phrase containing the evaluation expression are in a dependency relationship, or the distance between the attribute expression and the evaluation expression.
Classification 1 means, which receives the attribute evaluation pair candidates grouped by document under investigation and a classification model 1, such as a discriminant function Z1, that determines the boundary surface separating attribute evaluation pair candidates into positive example and negative example attribute evaluation pairs, and uses the classification model 1 to extract positive example attribute evaluation pair candidates from the attribute evaluation pair candidates in the same group.
Classification 2 means, which receives the positive example attribute evaluation pair candidates extracted by the classification 1 means and a classification model 2, such as a discriminant function Z2, that determines the boundary surface separating positive example attribute evaluation pair candidates into positive example and negative example attribute evaluation pairs; extracts positive example attribute evaluation pairs from the positive example attribute evaluation pair candidates; and judges those positive example attribute evaluation pairs to be true attribute evaluation pairs.

According to this invention, classification model 1, created by selectively using negative examples extracted automatically from the training documents, and classification model 2, created by selectively using negative examples so that mainly subjectivity can be learned, are used together. Opinions can therefore be extracted from the documents under investigation using the result of effectively learning the relationship between attribute expressions and evaluation expressions.
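The two-stage flow of the classification device can be sketched as a cascade of two discriminant functions. This is only an illustration: the feature layout (dependency flag, distance, conditional flag), the linear form of the functions, and the hand-picked weights are assumptions standing in for learned models Z1 and Z2.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Discriminant = Callable[[Features], float]


def linear(weights: Sequence[float], bias: float) -> Discriminant:
    """Build a linear discriminant z(x) = w . x + b (assumed form)."""
    def z(x: Features) -> float:
        return sum(w * v for w, v in zip(weights, x)) + bias
    return z


def classify(groups: Dict[str, List[Features]],
             z1: Discriminant,
             z2: Discriminant) -> List[Tuple[str, Features]]:
    """Stage 1 keeps candidates with z1 > 0; stage 2 keeps those with z2 > 0."""
    accepted = []
    for group_id, candidates in groups.items():
        stage1 = [c for c in candidates if z1(c) > 0]
        accepted.extend((group_id, c) for c in stage1 if z2(c) > 0)
    return accepted


# Hypothetical features: (has_dependency, distance, is_conditional).
z1 = linear([2.0, -0.5, 0.0], 0.0)   # stand-in for Z1: relation between the pair
z2 = linear([0.0, 0.0, -2.0], 1.0)   # stand-in for Z2: subjectivity

groups = {"doc1": [(1, 1, 0), (0, 4, 0), (1, 1, 1)]}
accepted = classify(groups, z1, z2)
print(accepted)  # [('doc1', (1, 1, 0))]
```

Keeping the two stages separate mirrors the text: the first model decides whether the pair is related at all, and the second decides whether a related pair is actually the writer's opinion.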

Also to achieve the above object, the opinion extraction learning device of the present invention comprises the following means.
Attribute evaluation pair candidate extraction means, which receives training documents containing positive example attribute evaluation pairs, each carrying a label that associates an attribute expression with an evaluation expression in the document; extracts, grouped by training document, the attribute evaluation pair candidates corresponding to the combinations of attribute expressions and evaluation expressions present in the training documents; and assigns to each attribute evaluation pair candidate feature information such as the surface character strings of the phrases containing the attribute expression and the evaluation expression, the parts of speech of those phrases, whether the phrase containing the attribute expression and the phrase containing the evaluation expression are in a dependency relationship, or the distance between the attribute expression and the evaluation expression.
Training case creation 1 means, which receives the training documents and the attribute evaluation pair candidates grouped by training document; extracts the attribute evaluation pair candidates in each group that contains at least one candidate identical to a positive example attribute evaluation pair; generates attribute evaluation comparison pair candidates from two attribute evaluation pair candidates, based on the features, positional relationship, and so on of their two attribute expressions; generates an attribute evaluation comparison pair labeled Left when the attribute expression on the left, nearer the beginning of the training document, matches the attribute expression of a positive example attribute evaluation pair in the training document; and generates an attribute evaluation comparison pair labeled Right when the attribute expression on the right matches it.
Model creation 1 means, which receives the attribute evaluation comparison pairs labeled Left or Right by the training case creation 1 means and creates a classification model 3 such as a discriminant function Z3.
Classification 1 means, which receives the training documents, the attribute evaluation pair candidates grouped by training document, and the classification model 3; extracts, as undetermined attribute evaluation pair candidates, the candidates belonging to groups that contain no candidate identical to a positive example attribute evaluation pair; generates attribute evaluation comparison pairs based on the features of the two attribute expressions in two undetermined attribute evaluation pair candidates; compares the generated attribute evaluation comparison pairs in order, tournament style, using the classification model 3; sets the single attribute evaluation pair candidate that finally remains in each group as a positive example attribute evaluation pair candidate; and sets the other attribute evaluation pair candidates as negative example attribute evaluation pair candidates.
Training case creation 2 means, which receives the training documents, the attribute evaluation pair candidates grouped by training document, and the positive example attribute evaluation pair candidates classified by the classification 1 means; replaces those positive example attribute evaluation pair candidates with negative example attribute evaluation pair candidates; and defines the attribute evaluation pair candidates identical to positive example attribute evaluation pairs as positive example attribute evaluation pair candidates.
Model creation 2 means, which receives the positive example attribute evaluation pair candidates defined by the training case creation 2 means and the negative example attribute evaluation pair candidates replaced by it, and creates and outputs a classification model 4 such as a discriminant function Z4.

According to this invention, negative examples are extracted automatically from the training documents, and the device outputs classification model 3, created by selectively using those negative examples, and classification model 4, created by selectively using negative examples so that mainly subjectivity can be learned. The relationship between attribute expressions and evaluation expressions that matters for opinion extraction can therefore be learned.
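The Left/Right labeling of attribute evaluation comparison pairs can be sketched as below. The sketch assumes that each candidate is reduced to the token offsets of its two expressions, that a group has a single labeled attribute position, and that pairs sharing the same attribute are skipped; these simplifications and the name `make_comparison_pairs` are illustrative, not taken from the source.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    attribute_pos: int    # token offset of the attribute expression
    evaluation_pos: int   # token offset of the evaluation expression


def make_comparison_pairs(
    candidates: List[Candidate], gold_attribute_pos: int
) -> List[Tuple[Candidate, Candidate, str]]:
    """Label each pair of candidates by which side holds the gold attribute."""
    labelled = []
    for first, second in combinations(candidates, 2):
        # Order the two candidates by where their attribute appears.
        left, right = sorted((first, second), key=lambda c: c.attribute_pos)
        if left.attribute_pos == right.attribute_pos:
            continue  # assumption: identical attributes give no Left/Right signal
        if left.attribute_pos == gold_attribute_pos:
            labelled.append((left, right, "Left"))
        elif right.attribute_pos == gold_attribute_pos:
            labelled.append((left, right, "Right"))
        # Pairs in which neither side is the gold attribute are not used.
    return labelled


group = [Candidate(0, 5), Candidate(3, 5), Candidate(8, 5)]
pairs = make_comparison_pairs(group, gold_attribute_pos=3)
labels = [label for _, _, label in pairs]
print(labels)  # ['Right', 'Left']
```

Because every training case contains exactly one correct side, the Left and Right classes stay roughly balanced, which is the point of learning a comparison rather than a yes/no decision per candidate.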

In order to achieve the above object, the opinion extraction classification device of the present invention comprises: attribute evaluation pair candidate extraction means that receives survey target documents in which writers' opinions are described, such as Web bulletin boards (blogs and the like), e-mail, questionnaire results, inquiry data sent to a company's inquiry desk, reports, and other documents, extracts attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the survey target documents, grouped per survey target document, and assigns to each attribute evaluation pair candidate feature information such as the surface strings of the phrases containing the attribute expression and the evaluation expression, the parts of speech of those phrases, whether the phrase containing the attribute expression and the phrase containing the evaluation expression are in a dependency relation, or distance information between the attribute expression and the evaluation expression; classification 1 means that receives the attribute evaluation pair candidates grouped per survey target document and a classification model 3, such as a discriminant function Z3 that determines the boundary surface separating attribute evaluation pair candidates into positive example attribute evaluation pairs and negative example attribute evaluation pairs, creates attribute evaluation comparison pairs from two attribute evaluation pair candidates in the same group, compares the generated attribute evaluation comparison pairs one after another in tournament fashion using the classification model 3, sets the one attribute evaluation pair candidate that finally remains in each group as a positive example attribute evaluation pair candidate, and sets the other attribute evaluation pair candidates as negative example attribute evaluation pair candidates; and classification 2 means that receives the positive example attribute evaluation pair candidates set by the classification 1 means and a classification model 4, such as a discriminant function Z4 that determines the boundary surface separating positive example attribute evaluation pair candidates into positive example attribute evaluation pairs and negative example attribute evaluation pairs, extracts positive example attribute evaluation pairs from the positive example attribute evaluation pair candidates set by the classification 1 means, and determines those positive example attribute evaluation pairs to be true attribute evaluation pairs.
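The tournament comparison performed by the classification 1 means can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the `prefer` callback stands in for classification model 3 (discriminant function Z3), and the candidate tuples and the toy distance-based preference in the usage lines are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")


def tournament(candidates: Sequence[C],
               prefer: Callable[[C, C], bool]) -> Tuple[C, List[C]]:
    """Return (winner, losers) for the candidates of one group.

    prefer(a, b) is a stand-in for classification model 3: it returns True
    when candidate a should beat candidate b.
    """
    if not candidates:
        raise ValueError("a group must contain at least one candidate")
    winner = candidates[0]
    losers: List[C] = []
    for challenger in candidates[1:]:
        # (winner, challenger) plays the role of one comparison pair.
        if prefer(winner, challenger):
            losers.append(challenger)
        else:
            losers.append(winner)
            winner = challenger
    return winner, losers


# Hypothetical group: (attribute, evaluation, distance). The toy preference
# simply favors the smaller distance; a learned model would replace it.
group = [("A2", "E1", 5), ("A1", "E1", 1), ("A3", "E1", 3)]
winner, losers = tournament(group, lambda a, b: a[2] <= b[2])
```

The winner corresponds to the positive example attribute evaluation pair candidate of the group, and the losers to its negative example candidates.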

According to this invention, opinions can be extracted from survey target documents using the result of effectively learning the relationship between attribute expressions and evaluation expressions. This is achieved by using classification model 3, created by selectively using negative examples automatically extracted from the training documents, together with classification model 4, created by selectively using negative examples so that mainly subjective evaluation can be learned.

The effect of the present invention is that highly accurate opinion extraction can be realized. The reason is that it uses classification model 1 (or classification model 3), created by selectively using negative examples for the purpose of learning the relationship between attributes and evaluations, and classification model 2 (or classification model 4), created by selectively using negative examples for the purpose of learning subjective evaluation.

The negative example attribute evaluation pair candidates input to the model creation 1 means, which creates classification model 1 (or classification model 3), are limited to the negative examples of group IDs that contain at least one positive example attribute evaluation pair of the training documents. Such negative examples can be judged to be negative because the attribute and the evaluation are not correctly related, so the model creation 1 means can learn with the aim of judging whether an attribute and an evaluation are correctly related. On the other hand, the negative example attribute evaluation pair candidates input to the model creation 2 means, which creates classification model 2 (or classification model 4), are limited to negative examples that the classification 1 means erroneously determines to be positive. A candidate that the classification 1 means determines to be positive but that is in fact negative can be judged to be a negative example in which the attribute and the evaluation are correctly related but the pair is not a subjective evaluation, so the model creation 2 means can learn with the aim of judging subjective evaluation.

Thus, by selectively using negative examples for classification model 1 and classification model 2, learning with a clear objective can be performed and extraction accuracy can be improved.
By using the present invention, the reputation of and criticism about an object, customer requests, and the like can be obtained automatically even from Web bulletin boards such as blogs, e-mail, reports, and other documents that are not written for the purpose of stating opinions, so the writers' opinions can be extracted automatically and more accurately. This improves the efficiency of preliminary research before purchasing a product and of marketing activities such as corporate market research, and makes it possible to provide products that better match users' wishes at an early stage.

In the anaphora resolution model described in Patent Document 2, replacing the context of an antecedent-anaphor pair with that of an attribute expression-evaluation expression pair would seem to make the model usable as part of the process of extracting attribute evaluation pairs. Various machine-learning-based anaphora resolution models are known today; here, applying the simplest one, the model described in Patent Document 2, is examined.

It is considered possible to extract attribute evaluation pairs with this anaphora resolution model by solving a binary classification problem of whether each element of the attribute candidate set forms a pair with the evaluation candidate, thereby obtaining multiple pairs of an evaluation candidate and an attribute candidate.

また、正解として抽出した属性評価ペア（［属性］と［評価］との間に関連があると判断した組み合わせ）の中には、一つの評価候補に対して複数の対を持つ場合もあるが、この手法に対して複数の処理を追加することにより、そのような事例に対しても網羅的に属性表現と評価表現の対を、記述者の意見として抽出できる可能性がある。 In addition, among the attribute evaluation pairs (combinations determined to be related between [attribute] and [evaluation]) extracted as correct answers, there may be a plurality of pairs for one evaluation candidate. By adding a plurality of processes to this technique, there is a possibility that a pair of attribute expression and evaluation expression can be extracted comprehensively as an opinion of the writer for such a case.

FIG. 1 is a block diagram of the processing of a learning device 10 that learns attribute evaluation pair candidate classification from training cases including positive example attribute evaluation pairs input by an operator, and creates and outputs a classification model such as a discriminant function.

FIG. 2 is a block diagram of a classification device 11 that receives survey target documents and extracts opinions using a classification model, such as a discriminant function, generated from training cases including positive example attribute evaluation pairs input by an operator. The classification device 11 shown in FIG. 2 is described later.

図１には、オペレータが正例のラベルを付与したラベル付きの正解データを含む訓練文書を記憶するハードディスク等の訓練文書記憶部１と、訓練文書に基づいた学習を行って分類モデルを作成する学習装置１０と、学習装置１０のモデル作成手段２２０が作成した分類モデルを記憶する分類モデル記憶部１２とが示されている。 FIG. 1 shows a training document storage unit 1 such as a hard disk for storing a training document including labeled correct answer data to which an operator gives a correct example label, and learning based on the training document to create a classification model. A learning device 10 and a classification model storage unit 12 that stores the classification model created by the model creation means 220 of the learning device 10 are shown.

The training document storage unit 1 shown in FIG. 1 stores training documents that include labeled correct answer data (positive example attribute evaluation pairs representing the writer's opinions), in which the operator has assigned positive example labels (for example, tags attached to the attribute expression and the evaluation expression) at least to the locations in the document that truly form a pair of an attribute expression and an evaluation expression.

The learning device 10 shown in FIG. 1 includes attribute evaluation pair candidate extraction means 100, which reads the training documents from the training document storage unit 1, refers to the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120, and extracts and outputs attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the training documents; and attribute evaluation pair candidate storage means 130, which stores the attribute evaluation pair candidates.

The learning device 10 also includes training case creation means 210, which reads the training documents from the training document storage unit 1, defines as positive example attribute evaluation pair candidates those attribute evaluation pair candidates that are identical to the attribute expression and evaluation expression pairs the operator labeled as correct in the training documents, and defines the attribute evaluation pair candidates that are not positive example candidates as negative example attribute evaluation pair candidates; and model creation means 220, which receives the positive example and negative example attribute evaluation pair candidates and creates a classification model.

上述の正例とは、着目する属性評価ペア候補の属性表現と評価表現との間に関係が存在し、かつ主観評価であることを意味している。すなわち正例は、真の属性評価ペアであることを意味する。また負例とは、着目する属性評価ペア候補の属性表現と評価表現との間に関係が存在しない、もしくは、主観評価でないことを意味している。すなわち負例は、真の属性評価ペアでないことを意味する。 The above-mentioned positive example means that there is a relationship between the attribute expression of the attribute evaluation pair candidate to be focused on and the evaluation expression, and it is a subjective evaluation. That is, the positive example means a true attribute evaluation pair. Further, the negative example means that there is no relationship between the attribute expression and the evaluation expression of the attribute evaluation pair candidate to be noticed, or it is not a subjective evaluation. That is, a negative example means that it is not a true attribute evaluation pair.

For example, the training case creation means 210 extracts eight attribute evaluation pair candidates from the training sentences S1 and S2 included in the training document shown in FIG. 1 and, referring to the labels the operator entered in the training document, creates one positive example attribute evaluation pair (A1, E1, feature 1) and seven negative example attribute evaluation pairs from the remainder. The model creation means 220 receives these, creates a classification model, and stores it in the classification model storage unit 12.
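The labeling step of the training case creation means 210 might be sketched as below. This is a minimal illustration only: the tuple layout `(group_id, attribute, evaluation)`, the function name, and the way the eight candidates are split across the two sentences are assumptions made for the example, not details taken from FIG. 1.

```python
def label_candidates(candidates, gold):
    """Split candidates into positive and negative training cases.

    candidates: list of (group_id, attribute, evaluation) tuples
    gold: set of tuples that the operator labeled as correct
    Every candidate not labeled by the operator becomes a negative example.
    """
    positives = [c for c in candidates if c in gold]
    negatives = [c for c in candidates if c not in gold]
    return positives, negatives


# Illustrative data only: four candidates per sentence, one labeled pair in S1.
candidates = (
    [("GID1", a, "E1") for a in ("A1", "A2", "A3", "A4")]
    + [("GID2", a, "E1") for a in ("A5", "A1", "A6", "A7")]
)
gold = {("GID1", "A1", "E1")}
positives, negatives = label_candidates(candidates, gold)
```

With this data the split is one positive and seven negatives, matching the counts given in the text.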

The classification device 11 shown in FIG. 2 uses the model learned by the learning device 10 to extract attribute evaluation pair candidates from the survey target documents stored in the document storage unit 5, and classifies each candidate as a true attribute evaluation pair (positive example) or as not a true attribute evaluation pair (negative example).

分類装置１１の属性評価ペア候補抽出手段１００は、学習装置１０の属性評価ペア候補抽出手段１００が実行する処理と同様にして、調査対象文書から属性評価ペア候補を抽出して、属性評価ペア候補記憶部１３０に記憶する。分類手段２３０は、属性評価ペア候補記憶部１３０から属性評価ペア候補を読み出して、分類モデル記憶部１２に記憶されている分類モデルを用いて、正例属性評価ペア又は負例属性評価ペアに分類する処理を行う。 The attribute evaluation pair candidate extraction unit 100 of the classification device 11 extracts attribute evaluation pair candidates from the investigation target document in the same manner as the processing executed by the attribute evaluation pair candidate extraction unit 100 of the learning device 10, and the attribute evaluation pair candidates. Store in the storage unit 130. The classification unit 230 reads out the attribute evaluation pair candidates from the attribute evaluation pair candidate storage unit 130 and classifies them into a positive example attribute evaluation pair or a negative example attribute evaluation pair using the classification model stored in the classification model storage unit 12. Perform the process.

A pair candidate of an attribute expression and an evaluation expression classified here as a positive example attribute evaluation pair is determined to be a true attribute evaluation pair. However, making this determination with the learning device 10 shown in FIG. 1 and the classification device 11 shown in FIG. 2 may give rise to the following problems.

The machine-learning-based opinion extraction method using the learning device 10 shown in FIG. 1 and the classification device 11 shown in FIG. 2 handles the question of whether an attribute expression and an evaluation expression are related and the question of whether the pair is a subjective evaluation without separating them, so learning is highly likely to mix the two questions. Consequently, the extracted positive example attribute evaluation pairs are less likely to capture the writer's opinions, and extraction performance may fail to improve.

例えば図１に示す例では、訓練文としてＳ１とＳ２の文があり、Ｓ１から抽出されたＧＩＤ１のグループの属性評価ペアには正例属性評価ペアを含んでいるが、訓練文Ｓ２から抽出されたＧＩＤ２のグループの属性評価ペアは負例属性評価ペアのみで構成されており、正例属性評価ペアを一つも含まないものとなっている。 For example, in the example shown in FIG. 1, there are S1 and S2 sentences as training sentences, and the attribute evaluation pair of the GID1 group extracted from S1 includes a positive example attribute evaluation pair, but is extracted from the training sentence S2. The attribute evaluation pair of the GID2 group is composed of only negative example attribute evaluation pairs, and does not include any positive example attribute evaluation pair.

The negative example attribute evaluation pairs extracted from training sentence S1 are highly likely to be true negative examples, because the operator's labeling of S1 indicates that their attribute expressions and evaluation expressions are unrelated. In training sentence S2, however, the operator has not specified any positive example attribute evaluation pair, so for a negative example attribute evaluation pair that the learning device 10 extracts from S2 it is unclear whether it is negative because its attribute expression and evaluation expression are unrelated, or because the operator did not regard it as a subjective evaluation even though the attribute expression and the evaluation expression are in fact related. Therefore, if the learning device 10 learns from all cases, the two questions above are mixed in learning, and opinion extraction performance can be expected not to improve.

以下、本発明を実施するための最良の形態を、図面に基づき説明する。 The best mode for carrying out the present invention will be described below with reference to the drawings.

(First embodiment)
FIG. 3 is a block diagram of the processing of a learning device 2 that learns attribute evaluation pair candidate classification from training cases including positive example attribute evaluation pairs input by an operator, and generates and outputs a classification model 1, such as a discriminant function Z1, and a classification model 2, such as a discriminant function Z2. These models are learned in order to determine in which kinds of documents an attribute expression and an evaluation expression form a positive example attribute evaluation pair and in which they do not.

FIG. 21 is a block diagram of a classification device 7 that extracts positive example opinions from survey target documents using classification model 1, such as discriminant function Z1, and classification model 2, such as discriminant function Z2, which the learning device 2 shown in FIG. 3 created from training cases including positive example attribute evaluation pair candidates. The classification device 7 shown in FIG. 21 is described later.

図３には、オペレータが正例のラベルを付与したラベル付きの正解データを含む訓練文書を記憶するハードディスク等の訓練文書記憶部１と、訓練文書に基づいた学習を行って識別関数Ｚ１等を用いた分類モデル１及び識別関数Ｚ２等を用いた分類モデル２を作成する学習装置２と、学習装置２のモデル作成１手段１５０が作成した識別関数Ｚ１等の分類モデル１を記憶する分類モデル１記憶部３と、学習装置２のモデル作成２手段１９０が作成した識別関数Ｚ２等の分類モデル２を記憶する分類モデル２記憶部４とが示されている。 FIG. 3 shows a training document storage unit 1 such as a hard disk for storing a training document including correct data with a label to which an operator gives a correct example label, and an identification function Z1 and the like by performing learning based on the training document. A learning apparatus 2 for creating a classification model 1 using the classification model 1 and a discrimination function Z2 used, and a classification model 1 for storing a classification model 1 such as a discrimination function Z1 created by the model creation 1 means 150 of the learning apparatus 2 A storage unit 3 and a classification model 2 storage unit 4 that stores the classification model 2 such as the discriminant function Z2 created by the model creation 2 means 190 of the learning device 2 are shown.

The learning device 2 shown in the same figure includes attribute evaluation pair candidate extraction means 100, which reads the training documents from the training document storage unit 1, refers to the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120, and extracts and outputs, grouped per training document, attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the training documents; and attribute evaluation pair candidate storage means 130, which stores the attribute evaluation pair candidates.

The learning device 2 also includes training case creation 1 means 140, which reads the training documents from the training document storage unit 1, defines as positive example attribute evaluation pair candidates those attribute evaluation pair candidates that are identical to the attribute expression and evaluation expression pairs the operator labeled as correct in the training documents, and, within each group that contains a positive example attribute evaluation pair candidate, defines the attribute evaluation pair candidates that are not positive example candidates as negative example attribute evaluation pair candidates; and model creation 1 means 150, which receives the positive example and negative example attribute evaluation pair candidates and creates classification model 1, such as discriminant function Z1, which mainly learns whether a correct relationship exists between the attribute expression and the evaluation expression.
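The selective use of negative examples by the training case creation 1 means 140 could be sketched as follows. The function name, the tuple layout `(group_id, attribute, evaluation)`, and the sample data are assumptions introduced for illustration; only the rule itself (negatives are taken solely from groups that contain a positive) comes from the description.

```python
from collections import defaultdict


def training_cases_1(candidates, gold):
    """Build training cases for classification model 1.

    Negatives are taken only from groups that contain at least one
    operator-labeled positive; candidates in other groups stay undecided.
    """
    by_group = defaultdict(list)
    for cand in candidates:
        by_group[cand[0]].append(cand)

    positives, negatives, undecided = [], [], []
    for members in by_group.values():
        group_pos = [c for c in members if c in gold]
        if group_pos:
            positives += group_pos
            negatives += [c for c in members if c not in gold]
        else:
            # No labeled pair in this group: the reason for being negative
            # is ambiguous, so these are not used to train model 1.
            undecided += members
    return positives, negatives, undecided


# Illustrative data only (hypothetical split of candidates into two groups).
candidates = (
    [("GID1", a, "E1") for a in ("A1", "A2", "A3", "A4")]
    + [("GID2", a, "E1") for a in ("A5", "A1", "A6", "A7")]
)
gold = {("GID1", "A1", "E1")}
positives, negatives, undecided = training_cases_1(candidates, gold)
```

In this sketch the undecided candidates are the ones later passed to the classification 1 means.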

また学習装置２は、分類モデル１記憶部３に記憶されている識別関数Ｚ１を用いて正例属性評価ペア候補にも負例属性評価ペア候補にも属さない同一グループＩＤ内の未定の属性評価ペア候補群を、正例属性評価ペア候補又は負例属性評価ペア候補に分類する処理を行う分類１手段１６０と、分類１手段１６０が分類した正例属性評価ペア候補又は負例属性評価ペア候補を記憶する分類１手段分類結果記憶部１７０とを備える。 In addition, the learning device 2 uses the discriminant function Z1 stored in the classification model 1 storage unit 3 to determine the undetermined attribute evaluation in the same group ID that does not belong to the positive example attribute evaluation pair candidate or the negative example attribute evaluation pair candidate. Classification 1 means 160 for performing processing for classifying the pair candidate group into positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates, and positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates classified by classification 1 means 160 And a classification 1 means classification result storage unit 170 for storing.

The learning device 2 also includes training case creation 2 means 180, which reads the training documents from the training document storage unit 1, defines and extracts as positive example attribute evaluation pair candidates those attribute evaluation pair candidates that are identical to the attribute expression and evaluation expression pairs the operator labeled as correct in the training documents, and reads the positive example attribute evaluation pair candidates stored in the classification 1 means classification result storage unit 170 and replaces them with negative example attribute evaluation pair candidates; and model creation 2 means 190, which receives the positive example attribute evaluation pair candidates extracted by the training case creation 2 means 180 and the replaced negative example attribute evaluation pair candidates, and creates classification model 2 using discriminant function Z2 for learning mainly subjective evaluation from the training documents.
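One way to picture the training case creation 2 means 180 is the sketch below. It assumes that the classification 1 means is available as a callable `classify_1` returning True for candidates it judges positive; that callable, the tuple layout, and the toy data are hypothetical stand-ins rather than details from the description.

```python
def training_cases_2(gold, undecided, classify_1):
    """Build training cases for classification model 2.

    gold: operator-labeled positive candidates
    undecided: candidates from groups without any labeled pair
    classify_1: stand-in for the classification 1 means (model 1)

    Undecided candidates that model 1 accepts are not labeled by the
    operator, so they are relabeled as negatives for model 2.
    """
    positives = sorted(gold)
    negatives = [c for c in undecided if classify_1(c)]
    return positives, negatives


# Illustrative data only.
gold = {("GID1", "A1", "E1")}
undecided = [("GID2", a, "E1") for a in ("A5", "A1", "A6", "A7")]


def toy_classify_1(cand):
    # Hypothetical stand-in for discriminant function Z1.
    return cand[1] == "A1"


positives, negatives = training_cases_2(gold, undecided, toy_classify_1)
```

The negatives collected this way are pairs whose attribute and evaluation look related to model 1 but which the operator did not mark as opinions, which is the kind of example intended to teach subjective evaluation.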

The training document storage unit 1 shown in FIG. 3 stores training documents that include labeled correct answer data (positive example attribute evaluation pairs representing the writer's opinions), in which the operator has assigned positive example labels (for example, tags attached to the attribute expression and the evaluation expression) at least to the locations in the document that truly form a pair of an attribute expression and an evaluation expression.

図４は、訓練文書の例を示す図である。 FIG. 4 is a diagram illustrating an example of a training document.

The figure shows an example in which two training sentences, S1 and S2, from which it is difficult to extract opinions by simple means, have been input and stored. As shown in the figure, training sentence S1 is 「デザインは、価格とサポート性能は知らないが、良い。」 ("As for the design, I don't know about the price or the support performance, but it is good."). This sentence contains a positive example attribute evaluation pair representing the writer's opinion: the attribute expression "design" (デザイン) and the evaluation expression "good" (良い), which stand in the dependency relation "design → good". The operator therefore encloses the attribute expression "design" in the tags <属性:1> and </属性> (attribute tags, one form of label), encloses the evaluation expression "good" in the tags <評価:1> and </評価> (evaluation tags, one form of label) carrying the same id as the attribute expression, and stores the sentence in the training document storage unit 1, so that the learning device 2 can recognize the combination of attribute and evaluation tags with the same id as a positive example attribute evaluation pair.
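As a rough illustration of how tags with a shared id could be read back as positive example pairs, consider the sketch below. The tag names follow the text, but the exact placement of the tags inside the sentence and the function name `gold_pairs` are assumptions, since FIG. 4 is not reproduced here.

```python
import re

# Matches <属性:ID>text</属性> and <評価:ID>text</評価>.
TAG = re.compile(r"<(属性|評価):(\d+)>(.*?)</\1>")


def gold_pairs(tagged_sentence):
    """Return (attribute, evaluation) pairs whose tags share the same id."""
    attrs, evals = {}, {}
    for kind, pair_id, text in TAG.findall(tagged_sentence):
        target = attrs if kind == "属性" else evals
        target[pair_id] = text
    return [(attrs[i], evals[i]) for i in attrs if i in evals]


# Assumed tagged form of training sentence S1.
s1 = "<属性:1>デザイン</属性>は、価格とサポート性能は知らないが、<評価:1>良い</評価>。"
# Training sentence S2 carries no tags.
s2 = "居住性とデザインとエンジン音と静粛性は置いていて、良いこともある。"

pairs_s1 = gold_pairs(s1)
pairs_s2 = gold_pairs(s2)
```

An untagged sentence such as S2 simply yields no pairs.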

Training sentence S2, on the other hand, is 「居住性とデザインとエンジン音と静粛性は置いていて、良いこともある。」 ("Setting aside comfort, design, engine sound, and quietness, there are good points too."), and it contains no attribute evaluation pair that should be extracted as the writer's opinion. The operator therefore stores training sentence S2 in the training document storage unit 1 without attaching any label. The symbols S1 and S2 identifying the training sentences are used only for the purpose of explaining the present invention and need not actually be assigned.

図５は、訓練文書を一般的に表現した状態を示す図である。 FIG. 5 is a diagram showing a state in which a training document is generally expressed.

同図に示すように、図４に示した訓練文Ｓ１及び訓練文Ｓ２における訓練文書中の属性（「デザイン」、「価格」、「サポート」等の属性表現）をＡ１、Ａ２、…、Ａｎのように置き換えて表現することができる。また同様に、訓練文中の評価（「良い」、「安い」、「よい」等の評価表現）をＥ１、…、Ｅｎのように表現することができる。 As shown in the figure, attributes (attribute expressions such as “design”, “price”, “support”) in the training document S1 and training sentence S2 shown in FIG. 4 are represented by A1, A2,. It can be expressed as a replacement. Similarly, evaluations in the training sentence (evaluation expressions such as “good”, “cheap”, and “good”) can be expressed as E1,..., En.

図６は、属性表現記憶部１１０が記憶している属性表現の集合を示す図であり、図７は評価表現記憶部１２０が記憶している評価表現の集合を示す図である。 FIG. 6 is a diagram showing a set of attribute expressions stored in the attribute expression storage unit 110, and FIG. 7 is a diagram showing a set of evaluation expressions stored in the evaluation expression storage unit 120.

図６及び図７に示すように学習装置２は、オペレータが入力した訓練文書にも存在する「デザイン」＝Ａ１、「価格」＝Ａ２、「サポート」＝Ａ３、「性能」＝Ａ４等の、ある特定の対象のある側面を表す属性表現を予め記憶している属性表現記憶部１１０と、属性表現と関連のある評価表現であって、「良い」＝Ｅ１、「安い」、「よい」、「好き」、「最悪」、「使いやすい」等の、属性表現に対応する日本語の評価表現を予め記憶している評価表現記憶部１２０とを備えている。 As shown in FIG. 6 and FIG. 7, the learning device 2 has “design” = A1, “price” = A2, “support” = A3, “performance” = A4, etc. that are also present in the training document input by the operator. An attribute expression storage unit 110 that stores in advance an attribute expression representing an aspect of a specific target, and an evaluation expression related to the attribute expression, where “good” = E1, “cheap”, “good”, An evaluation expression storage unit 120 that stores in advance Japanese evaluation expressions corresponding to attribute expressions such as “like”, “worst”, and “easy to use” is provided.

属性表現記憶部１１０又は評価表現記憶部１２０が記憶する属性表現又は評価表現は、訓練文書記憶部１に記憶されている訓練文書に指定されている正解データ（正例属性評価ペア）の属性表現と評価表現のペアを抽出して記憶するようにしてもよい。 The attribute expression or the evaluation expression stored in the attribute expression storage unit 110 or the evaluation expression storage unit 120 is an attribute expression of correct answer data (correct example attribute evaluation pair) specified in the training document stored in the training document storage unit 1. And a pair of evaluation expressions may be extracted and stored.

The attribute evaluation pair candidate extraction means 100 of the learning device 2 reads the training documents from the training document storage unit 1, refers to the attribute expressions stored in the attribute expression storage unit 110 and the evaluation expressions stored in the evaluation expression storage unit 120, and extracts the portions corresponding to combinations of attribute expressions and evaluation expressions present in the training documents. For example, it extracts combinations of attributes A1, ..., An and evaluations E1, ..., En from the training documents and outputs them as attribute evaluation pair candidates.

そして次に属性評価ペア候補抽出手段１００は、素性（図３に示す例では、素性１〜素性８と表記してある。素性の詳細については、後述する。）の値が所定の範囲内にある属性表現と評価表現のペア（図３に示す例では、（Ａ１，Ｅ１）〜（Ａ７，Ｅ１））を「属性評価ペア候補」として設定する。そして、そのそれぞれの属性評価ペア候補について、それが訓練文書のどの範囲内から抽出したかを示すグループＩＤ（ＧＩＤ）と、素性の情報とを付与する。 Then, the attribute evaluation pair candidate extraction unit 100 has the value of the feature (in the example shown in FIG. 3, described as feature 1 to feature 8. Details of the feature will be described later) within a predetermined range. A pair of an attribute expression and an evaluation expression ((A1, E1) to (A7, E1) in the example shown in FIG. 3) is set as an “attribute evaluation pair candidate”. And about each attribute evaluation pair candidate, group ID (GID) which shows from which range of the training document it extracted, and the information of a feature are provided.

Possible conditions for an attribute evaluation pair candidate to lie within the predetermined range of the training document include, for example, "the attribute expression is in the same sentence as the evaluation expression" and "the attribute expression is in the sentence containing the evaluation expression or within the k sentences preceding it (where k is a natural number)". Grouping is described below with reference to FIGS. 8 to 11.

図８は、訓練文書記憶部１に記憶されている訓練文書の一例を示す図である。同図に示す例では、訓練文Ｓ３〜Ｓ５が記憶されている。 FIG. 8 is a diagram illustrating an example of a training document stored in the training document storage unit 1. In the example shown in the figure, training sentences S3 to S5 are stored.

図９は、属性評価ペア候補抽出手段１００が、評価表現Ｅ１を含む文と同一文内に属性表現が存在する属性評価ペア候補のグループ（ＧＩＤ１）と、評価表現Ｅ２を含む文と同一文内に属性表現が存在する属性評価ペア候補のグループ（ＧＩＤ２）とに分類した分類例を示す図である。 FIG. 9 shows that the attribute evaluation pair candidate extraction unit 100 has a group (GID1) of attribute evaluation pair candidates in which the attribute expression exists in the same sentence as the sentence including the evaluation expression E1, and the same sentence as the sentence including the evaluation expression E2. It is a figure which shows the example of classification classified into the group (GID2) of the attribute evaluation pair candidate in which attribute expression exists.

FIG. 10 is a diagram showing an example in which the attribute evaluation pair candidate extraction means 100 has classified the candidates into a group (GID1) of attribute evaluation pair candidates whose attribute expression is in the sentence containing evaluation expression E1 or within the one sentence preceding it, and a group (GID2) of attribute evaluation pair candidates whose attribute expression is in the sentence containing evaluation expression E2 or within the one sentence preceding it.

FIG. 11 is a diagram showing a classification example in which, in addition to the condition that the attribute expression is in the same sentence as the sentence containing the evaluation expression, the classification range is limited so that it extends only as far as A1, the attribute expression carrying the positive example attribute tag located closest to the evaluation expression (the tag is one form of label; in the example shown in FIG. 8 it is <属性:1>A1</属性>, where 属性 means "attribute").
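
The grouping conditions described for FIGS. 9 and 10 can be sketched as follows. This is only an illustrative sketch, not the implementation of the extraction means 100: the names `Candidate` and `extract_candidates` are hypothetical, expressions are matched by plain substring search rather than by morphological analysis, and one group ID is assumed to be issued per occurrence of an evaluation expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gid: int          # group ID (GID); assumed: one per evaluation-expression occurrence
    attribute: str    # attribute expression
    evaluation: str   # evaluation expression


def extract_candidates(sentences, attributes, evaluations, k=0):
    """Pair each evaluation expression with every attribute expression found
    in the same sentence or in the k preceding sentences.

    k=0 corresponds to the "same sentence" condition, k=1 to the
    "within one preceding sentence" condition.
    """
    candidates = []
    gid = 0
    for idx, sentence in enumerate(sentences):
        for ev in evaluations:
            if ev not in sentence:
                continue
            gid += 1
            # Window: the k preceding sentences plus the current sentence.
            window = sentences[max(0, idx - k): idx + 1]
            for text in window:
                for attr in attributes:
                    if attr in text:
                        candidates.append(Candidate(gid, attr, ev))
    return candidates
```

Calling `extract_candidates(sentences, attributes, evaluations, k=1)` widens each group to include attribute expressions from the preceding sentence, so the same evaluation expression generally yields a larger group than with `k=0`.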

Next, the feature information is described. The feature information can be represented, for example, as an array, and the following features can be used singly or in combination.
Feature a: the surface character strings of the clauses containing the attribute expression and the evaluation expression
Feature b: the parts of speech of the clauses containing the attribute expression and the evaluation expression
Feature c: whether the clause containing the attribute expression and the clause containing the evaluation expression are in a dependency relation
Feature d: distance information between the attribute expression and the evaluation expression

FIG. 12 is a diagram illustrating training sentences S6 and S7 stored in the training document storage unit 1.

Here, the processing by which the attribute evaluation pair candidate extraction means 100 assigns feature information to the attribute evaluation pair candidate (design, good) in training sentences S6 and S7 shown in FIG. 12 is described as an example. First, to obtain the features, morphological analysis is applied to training sentences S6 and S7 to detect word boundaries and clause boundaries (see FIG. 13). Next, syntactic analysis is used to detect the dependency relations between clauses (see FIG. 13).

FIG. 13 is a diagram illustrating the results of morphological analysis and syntactic analysis of training sentences S6 and S7.

In the example shown, "+" represents a word boundary obtained by morphological analysis, and "/" represents a clause boundary. The part of speech of each word obtained by syntactic analysis is shown as the character string between "[" and "]", and dependency relations between clauses are shown using "→".

FIG. 14 is a diagram illustrating the features extracted using the results of the morphological analysis and the syntactic analysis.

Feature a is described with reference to the same figure. In the example shown, the surface character strings of the clauses containing the attribute expression and the evaluation expression are the individual words delimited by "+", and these surface character strings are used as "feature a". An array is prepared whose number of dimensions equals the total number of distinct surface character strings. If a word is present as a feature of the candidate in question, 1 (or a flag) is set as the binary value of feature a for that word. If the word is not present as a feature of the candidate, 0 (or a flag) is set as the binary value of feature a for that word. Because training sentences generally contain many words, this array is very high-dimensional. The object of the present invention can also be achieved by setting the occurrence frequency of each word instead of a binary value.

Feature b is described with reference to the same figure. The parts of speech of the clauses containing the attribute expression and the evaluation expression are the parts of speech shown between "[" and "]". An array is prepared whose number of dimensions equals the number of distinct parts of speech. If a part of speech is present as a feature of the candidate in question, 1 is set as the binary value of feature b; if it is not present, 0 is set. Here too, the object of the present invention can be achieved by substituting the occurrence frequency of each part of speech for the binary value.

Feature c is described with reference to the same figure. When the clause containing the attribute expression and the clause containing the evaluation expression are in a dependency relation, they appear as clauses joined by "→". If a dependency relation between the clauses is judged to exist, 1 (or a flag) is set as the binary value of feature c. If no dependency relation between the clauses is judged to exist, 0 (or a flag) is set as the binary value of feature c.

Feature d is described with reference to the same figure. As the distance information between the attribute expression and the evaluation expression, for example, the number of bytes between the word of the attribute expression and the word of the evaluation expression is set. The object of the present invention can also be achieved by substituting the number of characters for the number of bytes. In the above embodiment, either positive logic or negative logic may be used for the binary values or flags of feature a, feature b, and feature c.
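
As a rough illustration of how features a to d might be combined into a single array, consider the sketch below. It assumes the words and parts of speech of the relevant clauses have already been obtained by a morphological and syntactic analyzer; the function names, the part-of-speech labels, the UTF-8 encoding used for the byte count, and the assumption that the attribute expression precedes the evaluation expression are all illustrative choices, not details given in the description.

```python
def byte_distance(text, attribute, evaluation):
    """Feature d: number of bytes between the attribute expression and the
    evaluation expression (assumes UTF-8 and that the attribute comes first)."""
    start = text.index(attribute) + len(attribute)
    end = text.index(evaluation, start)
    return len(text[start:end].encode("utf-8"))


def build_feature_vector(words, pos_tags, has_dependency, distance,
                         vocab, pos_vocab):
    """Concatenate features a, b, c, and d into one flat list.

    words / pos_tags: sets observed in the clauses of this candidate.
    vocab / pos_vocab: ordered lists of all surface strings / parts of speech.
    """
    feature_a = [1 if w in words else 0 for w in vocab]         # surface strings
    feature_b = [1 if p in pos_tags else 0 for p in pos_vocab]  # parts of speech
    feature_c = [1 if has_dependency else 0]                    # dependency flag
    feature_d = [distance]                                      # distance value
    return feature_a + feature_b + feature_c + feature_d


# Hypothetical example for the candidate (デザイン, 良い) in "デザインが良い".
vocab = ["デザイン", "以外", "が", "良い"]
pos_vocab = ["名詞", "助動詞", "助詞", "形容詞"]
distance = byte_distance("デザインが良い", "デザイン", "良い")
vector = build_feature_vector({"デザイン", "が", "良い"},
                              {"名詞", "助詞", "形容詞"},
                              True, distance, vocab, pos_vocab)
```

Replacing the 0/1 entries with occurrence counts would give the frequency-based variant mentioned in the text.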

FIG. 15 is a diagram showing the combinations of attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit 130.

The attribute evaluation pair candidate storage unit 130 of the learning device 2 shown in FIG. 3 stores the attribute evaluation pair candidates extracted by the attribute evaluation pair candidate extraction means 100. The figure shows an example of the stored result of extracting attribute evaluation pair candidates from the training documents stored in the training document storage unit 1, with reference to the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120. GID1 and GID2 in the figure are group IDs. Feature 1 to feature 7 are the features of the respective attribute evaluation pair candidates (each may be information containing an array). In the grouping shown in the figure, group IDs are assigned taking the predetermined range to be the same sentence.

FIG. 16 is a diagram showing the positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates (training cases) that the training case creation 1 means 140 outputs on the basis of the attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit 130.

The training case creation 1 means 140 of the learning device 2 shown in FIG. 3 designates as a positive example attribute evaluation pair candidate any attribute evaluation pair candidate that is identical to a pair of an attribute expression and an evaluation expression extracted from a location in the training documents stored in the training document storage unit 1 where a positive example attribute evaluation pair exists (that is, a document containing the writer's opinion, which the operator has labeled as a correct, positive example). It also designates as negative example attribute evaluation pair candidates those attribute evaluation pair candidates that belong to a GID containing a positive example attribute evaluation pair candidate but are not themselves positive example attribute evaluation pair candidates (pairs of an attribute expression and an evaluation expression that the operator has not labeled as positive examples).

In the example shown, the attribute evaluation pair candidates of group GID2 stored in the attribute evaluation pair candidate storage unit 130 were extracted from a training sentence that contains no positive example attribute evaluation pair of the training documents, so the training case creation 1 means 140 selects them neither as positive example attribute evaluation pair candidates nor as negative example attribute evaluation pair candidates.
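
The selection rule of the training case creation 1 means 140 can be sketched as below. This is an illustrative sketch only: candidates are represented as `(gid, attribute, evaluation)` tuples, the operator's labels are assumed to be available as a set of such tuples, and the function name is hypothetical.

```python
def create_training_cases_1(candidates, labeled_positive):
    """Split candidates into positive, negative, and undetermined cases.

    candidates: list of (gid, attribute, evaluation) tuples.
    labeled_positive: set of tuples the operator labeled as correct pairs.
    """
    # Groups that contain at least one labeled positive pair.
    labeled_gids = {c[0] for c in candidates if c in labeled_positive}

    positives = [c for c in candidates if c in labeled_positive]
    # Unlabeled candidates in a group that has a positive become negatives.
    negatives = [c for c in candidates
                 if c[0] in labeled_gids and c not in labeled_positive]
    # Groups without any positive are used neither as positive nor negative.
    undetermined = [c for c in candidates if c[0] not in labeled_gids]
    return positives, negatives, undetermined
```

With one labeled pair in GID1 and none in GID2, every candidate of GID2 ends up in the undetermined list, matching the treatment of GID2 described for FIG. 16.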

The model creation 1 means 150 of the learning device 2 shown in FIG. 3 receives the positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates determined by the training case creation 1 means 140 and creates classification model 1. Classification model 1 uses a discriminant function Z1 that determines the boundary surface separating positive example attribute evaluation pairs from negative example attribute evaluation pairs in the subsequent processing. As the discriminant function Z1, for example, the function shown in (Formula 1) below can be used.

Z1 = a_1·x_1 + a_2·x_2 + ... + a_n·x_n + a_0 ... (Formula 1)

In (Formula 1) above, the array elements contained in the features of the input attribute evaluation pair candidate can be used for x_1 to x_n. For example, in the example shown in FIG. 14 above, values corresponding to the binary values (or flags) of feature a can be used, such as surface character string "デザイン" (design) = x_1, "以外" (other than) = x_2, "が" (the particle ga) = x_3, and so on.

Similarly, values corresponding to the binary values of feature b can be used, such as part of speech "noun" = x_1, "auxiliary verb" = x_2, and so on. The binary value (or flag) of feature c indicating whether a dependency relation exists, and the value of feature d representing the distance information, can also be used for x_1 to x_n.

a_1 to a_n in (Formula 1) above are constants. Constants determined by linear discriminant analysis as used in statistics, by a Support Vector Machine, or the like can be assigned to a_1 to a_n. In linear discriminant analysis, for example, the candidates are divided into positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates, x_1 to x_n of each candidate are substituted into the discriminant function Z1, and the discriminant score Z1x is calculated.

The plane that best separates the two groups, the positive example attribute evaluation pair candidates and the negative example attribute evaluation pair candidates, is the plane for which the correlation ratio (Sb/St) is maximal, where St is the total variation and Sb is the between-class variation. a_1 to a_n are therefore selected so that (Sb/St) is maximized. The total variation St is a value indicating how far the individual data are scattered from the overall mean Z_all of the discriminant scores Z1x, and can be obtained by (Formula 2) below.

St = Σ_i Σ_j (Z_ij − Z_all)² ... (Formula 2)

Here i is a dummy index denoting the type of group (positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates) and takes the value 1 or 2, and j is a dummy index denoting the input data. The between-class variation Sb is a value indicating how far the attribute evaluation pairs judged to be positive examples and those judged to be negative examples are each scattered from the overall mean Z_all, and can be obtained by (Formula 3) below.

Sb = Σ_i {n_i·(Z_i − Z_all)²} ... (Formula 3)

Here i is a dummy index denoting the type of group (positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates) and takes the value 1 or 2. n_i denotes the number of data items in the i-th group, and Z_i is understood to be the mean discriminant score of the i-th group.
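
To make (Formula 2) and (Formula 3) concrete, the following sketch computes St, Sb, and the correlation ratio Sb/St from the discriminant scores of the two groups. It assumes that Z_i is the mean score of group i, which the text does not state explicitly, and the function name is hypothetical. It evaluates the ratio for given scores only; searching for the a_1 to a_n that maximize it is not shown.

```python
def correlation_ratio(positive_scores, negative_scores):
    """Return (St, Sb, Sb/St) for two groups of discriminant scores."""
    groups = [positive_scores, negative_scores]
    all_scores = positive_scores + negative_scores
    z_all = sum(all_scores) / len(all_scores)   # overall mean Z_all

    # (Formula 2): total variation over every score in every group.
    st = sum((z - z_all) ** 2 for group in groups for z in group)

    # (Formula 3): between-class variation, using the group means as Z_i.
    sb = sum(len(group) * (sum(group) / len(group) - z_all) ** 2
             for group in groups)

    # Note: St is zero when all scores are identical; that case is not handled.
    return st, sb, sb / st
```

A ratio close to 1 means that nearly all of the variation in the scores is explained by the split into positive and negative examples, which is why the constants are chosen to maximize it.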

The classification model 1 storage unit 3 shown in FIG. 3 stores classification model 1, such as the discriminant function Z1, created by the model creation 1 means 150.

The classification 1 means 160 of the learning device 2 shown in FIG. 3 first receives the attribute evaluation pair candidates from the attribute evaluation pair candidate storage unit 130 and reads the training documents from the training document storage unit 1. It then extracts, as undetermined attribute evaluation pair candidates, the candidates belonging to any group (GID) that contains no attribute evaluation pair candidate identical to a positive example attribute evaluation pair labeled by the operator in the training documents. The candidates extracted here are undetermined: they belong neither to the positive example attribute evaluation pair candidates nor to the negative example attribute evaluation pair candidates. An example of the extraction of undetermined attribute evaluation pair candidates is shown in FIG. 17. As shown in that figure, the classification 1 means 160 extracts the undetermined attribute evaluation pair candidates belonging to GID2.

FIG. 17 is a diagram showing the undetermined attribute evaluation pair candidates extracted by the classification 1 means as described above.

Next, the classification 1 means 160 uses the discriminant function Z1 stored in the classification model 1 storage unit 3 to classify the undetermined attribute evaluation pair candidates, which belong to groups (GIDs) containing no positive example attribute evaluation pair, into positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates. For example, the classification 1 means 160 substitutes the features of an attribute evaluation pair candidate into x_1 to x_n of the discriminant function Z1 recorded in the classification model 1 storage unit 3 and calculates the value of the discriminant score Z1x. Because the discriminant function Z1 represents the boundary surface that separates undetermined attribute evaluation pair candidates into positive example and negative example attribute evaluation pair candidates, each candidate can be classified as a positive example or a negative example attribute evaluation pair candidate according to whether the discriminant score Z1x is positive or negative.
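
The sign-based classification with the discriminant function of (Formula 1) can be sketched as follows. The function names and the example constants are hypothetical, and treating a score of exactly zero as a negative example is an assumption, since the text only distinguishes positive and negative values.

```python
def discriminant_score(weights, bias, features):
    """(Formula 1): a_1*x_1 + ... + a_n*x_n + a_0."""
    return sum(a * x for a, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    """Label a candidate by the sign of its discriminant score.

    A score of exactly 0 is treated as negative here (an assumption).
    """
    if discriminant_score(weights, bias, features) > 0:
        return "positive"
    return "negative"


# Hypothetical constants a_1, a_2 and a_0, and two candidate feature arrays.
weights, bias = [1.0, -2.0], 0.5
label_a = classify(weights, bias, [1, 0])   # score 1.5
label_b = classify(weights, bias, [0, 1])   # score -1.5
```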

FIG. 18 is a diagram showing the positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates classified by the classification 1 means.

The classification 1 means classification result storage unit 170 of the learning device 2 shown in FIG. 3 stores the classification results of the classification 1 means 160, such as those shown in FIG. 18.

The training case creation 2 means 180 of the learning device 2 shown in FIG. 3 reads the training documents from the training document storage unit 1 and extracts, as positive example attribute evaluation pair candidates, the attribute evaluation pair candidates identical to the pairs of an attribute expression and an evaluation expression that the operator has labeled as correct in the training documents. It also reads the positive example attribute evaluation pair candidates stored in the classification 1 means classification result storage unit 170 and relabels them as negative example attribute evaluation pair candidates.

FIG. 19 is a diagram showing the positive example attribute evaluation pair candidates extracted by the training case creation 2 means 180 and the relabeled negative example attribute evaluation pair candidates. As shown in the figure, the positive example attribute evaluation pair candidate extracted by the training case creation 2 means 180 is (A1, E1, feature 1) of GID1, and the candidate given as a negative example is (A1, E1, feature 6), which the classification 1 means 160 had judged to be a positive example attribute evaluation pair candidate.
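
The relabeling performed by the training case creation 2 means 180 can be sketched as follows. The sketch assumes the classification 1 results are available as `(candidate, label)` pairs with the labels "positive" and "negative"; this representation and the function name are illustrative, not taken from the description.

```python
def create_training_cases_2(labeled_positive, stage1_results):
    """Build the training cases for classification model 2.

    labeled_positive: candidates the operator labeled as correct pairs.
    stage1_results: (candidate, label) pairs produced by classification 1 for
        candidates in groups that contain no labeled positive pair.
    """
    positives = list(labeled_positive)
    # Candidates that classification 1 accepted, although their group holds no
    # true opinion, are relabeled as negative examples for the second model.
    negatives = [cand for cand, label in stage1_results if label == "positive"]
    return positives, negatives
```

The second model is thus trained on exactly the cases the first model gets wrong in groups that contain no opinion, which is what allows the later classification 2 step to reject them.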

The model creation 2 means 190 of the learning device 2 shown in FIG. 3 receives the positive example attribute evaluation pair candidates extracted by the training case creation 2 means 180 and the relabeled negative example attribute evaluation pair candidates, and creates classification model 2 using a discriminant function Z2. The method of creating classification model 2 using the discriminant function Z2 is the same as that of the model creation 1 means 150, so its description is omitted here.

The classification model 2 storage unit 4 of the learning device 2 shown in FIG. 3 stores classification model 2, with the discriminant function Z2, created by the model creation 2 means 190.

FIG. 20 is a flowchart of the processing by which the learning device 2 derives classification model 1, such as the discriminant function Z1, and classification model 2, such as the discriminant function Z2. Processing steps S100 to S190 shown in the figure correspond to the processing in the respective means of the learning device 2 in FIG. 3.

FIG. 21 is a block diagram of a classification device 7 that extracts positive example attribute evaluation pairs from investigation target documents as positive example opinions, using classification model 1 (such as the discriminant function Z1) and classification model 2 (such as the discriminant function Z2) that the learning device 2 created from training cases containing positive example attribute evaluation pair candidates.

For the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120 of the classification device 7 shown in FIG. 21, the same information can be used as the attribute expression information stored in the attribute expression storage unit 110 and the information stored in the evaluation expression storage unit 120 shown in FIG. 3.

The classification device 7 shown in the figure comprises an attribute evaluation pair candidate extraction means 200, which reads the investigation target documents from the document storage unit 5, refers to the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120, and extracts and outputs the attribute evaluation pair candidates corresponding to the combinations of attribute expressions and evaluation expressions present in the investigation target documents; and an attribute evaluation pair candidate storage means 230, which stores the attribute evaluation pair candidates.

The classification device 7 also comprises a classification 1 means 260, which receives the attribute evaluation pair candidates and classification model 1 (such as the discriminant function Z1) stored in the classification model 1 storage unit 3, and classifies the attribute evaluation pair candidates within the same group ID into positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates; and a classification 1 means classification result storage unit 270, which stores the positive example or negative example attribute evaluation pair candidates classified by the classification 1 means 260.

The classification device 7 further comprises a classification 2 means 290, which uses classification model 2 (such as the discriminant function Z2) stored in the classification model 2 storage unit 4 to classify attribute evaluation pair candidates into positive example attribute evaluation pairs or negative example attribute evaluation pairs; and a classification 2 means classification result storage unit 6, which stores the positive example or negative example attribute evaluation pairs classified by the classification 2 means 290.

The document storage unit 5 shown in FIG. 21 is a storage means, such as a hard disk, for storing the investigation target documents. The investigation target documents include Web bulletin boards such as blogs, e-mail, questionnaire results, inquiry data received at a company's inquiry desk, reports, and other documents. Unlike the training documents stored in the training document storage unit 1 shown in FIG. 3, these investigation target documents carry no labels indicating positive example attribute evaluation pairs. The case where an investigation target sentence S10 such as that shown in FIG. 22 is stored in the document storage unit 5 is described below as an example.

The processing performed by the attribute evaluation pair candidate extraction means 200 and by the attribute evaluation pair candidate storage unit 230 shown in FIG. 21 is the same as that performed by the attribute evaluation pair candidate extraction means 100 and the attribute evaluation pair candidate storage unit 130 of the learning device 2 shown in FIG. 3, so a description of their functions is omitted. For example, when the attribute evaluation pair candidate extraction processing is applied to the investigation target sentence S10 shown in FIG. 22, the attribute evaluation pair candidates shown in FIG. 23 are extracted and stored in the attribute evaluation pair candidate storage unit 230.

FIG. 23 is a diagram showing an example of the attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit 230. In the example shown, the attribute evaluation pair candidate (A1, E1, feature 1) has been extracted and stored.

The classification 1 means 260 of the classification device 7 shown in FIG. 21 receives the attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit 230 and classification model 1 (such as the discriminant function Z1) stored in the classification model 1 storage unit 3, and uses the discriminant function Z1 to classify the attribute evaluation pair candidates into positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates.

For example, the classification 1 means 260 first substitutes the features of an attribute evaluation pair candidate into x_1 to x_n of the discriminant function Z1 recorded in the classification model 1 storage unit 3 and calculates the discriminant score Z1x. Because the discriminant function Z1 represents the boundary surface separating positive example attribute evaluation pair candidates from negative example attribute evaluation pair candidates, each candidate can be classified as a positive example or a negative example attribute evaluation pair candidate according to whether the discriminant score Z1x is positive or negative. The classified attribute evaluation pair candidates are stored in the classification 1 means classification result storage unit 270.

FIG. 24 is a diagram showing the attribute evaluation pair candidates stored in the classification 1 means classification result storage unit 270.

The classification 1 means classification result storage unit 270 of the classification device 7 shown in FIG. 21 stores the positive example or negative example attribute evaluation pair candidates classified by the classification 1 means 260 (in the example shown in FIG. 24, (A1, E1, feature 1) = positive example attribute evaluation pair candidate).

The classification 2 means 290 shown in FIG. 21 takes the attribute evaluation pair candidates that the classification 1 means 260 judged to be positive example attribute evaluation pair candidates and classifies them into positive example attribute evaluation pairs or negative example attribute evaluation pairs using classification model 2 (such as the discriminant function Z2) stored in the classification model 2 storage unit 4.

An attribute evaluation pair classified as a positive example attribute evaluation pair by the classification 2 means 290 is considered to be an opinion of the writer of the investigation target document, so the classification 2 means 290 judges it to be a true attribute evaluation pair. Here the classification 2 means 290 substitutes the features of the attribute evaluation pair candidate into x_1 to x_n of the discriminant function Z2 recorded in the classification model 2 storage unit 4 and calculates the discriminant score Z2x. Because the discriminant function Z2 represents the boundary surface separating positive example attribute evaluation pairs from negative example attribute evaluation pairs, each candidate can be classified as a positive example or a negative example attribute evaluation pair according to the sign of the discriminant score Z2x. The positive example or negative example attribute evaluation pairs classified by the classification 2 means 290 are stored in the classification 2 means classification result storage unit 6.

FIG. 25 shows the positive or negative example attribute evaluation pairs stored in the classification 2 means classification result storage unit 6. In the example shown in the figure, (A1, E1, feature 1) = positive example attribute evaluation pair is stored. A positive example attribute evaluation pair candidate that the classification 2 means 290 judges to be a positive example attribute evaluation pair is extracted and stored as a true attribute evaluation pair, that is, as an opinion of the writer of the investigation target document.

FIG. 26 is a flowchart of the process of extracting the writer's opinions from the investigation target document using the classification device 7. Processing steps S200 through S290 shown in the figure correspond to the processing performed by the respective means of the classification device 7 in FIG. 21.

(Second Embodiment)
FIG. 27 is a block diagram of the processing of the learning device 8. The learning device 8 learns attribute evaluation pair candidate classification from training examples that include positive example attribute evaluation pairs entered by an operator, and generates and outputs classification model 3, such as the discriminant function Z3, and classification model 4, such as the discriminant function Z4. These models are learned in order to determine in which kinds of text an attribute expression and an evaluation expression form a positive example attribute evaluation pair and, conversely, in which they do not.

FIG. 31 is a block diagram of the classification device 9, which extracts positive example opinions from the investigation target document using classification model 3, such as the discriminant function Z3, and classification model 4, such as the discriminant function Z4, that the learning device 8 shown in FIG. 27 created from training examples including positive example attribute evaluation pair candidates. The classification device 9 shown in FIG. 31 is described later.

FIG. 27 shows a training document storage unit 1, such as a hard disk, that stores training documents containing labeled correct-answer data to which an operator has assigned positive example labels; the learning device 8, which learns from the training documents and creates classification model 3 using the discriminant function Z3 or the like and classification model 4 using the discriminant function Z4 or the like; a classification model 1 storage unit 3 that stores classification model 3, such as the discriminant function Z3, created by the model creation 1 means 350 of the learning device 8; and a classification model 2 storage unit 4 that stores classification model 4, based on the discriminant function Z4, created by the model creation 2 means 190 of the learning device 8.

The learning device 8 shown in FIG. 27 includes attribute evaluation pair candidate extraction means 100, which reads a training document from the training document storage unit 1, refers to the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120, and extracts and outputs the attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the training document; and attribute evaluation pair candidate storage means 130, which stores the attribute evaluation pair candidates.

The training documents stored in the training document storage unit 1 of the learning device 8 shown in FIG. 27 can be the same kind of training documents as those stored in the training document storage unit 1 shown in FIG. 3. Likewise, the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120 can use the same information as the attribute expression information stored in the attribute expression storage unit 110 and the information stored in the evaluation expression storage unit 120 shown in FIG. 3.

The learning device 8 also includes training example creation 1 means 340. This means reads a training document from the training document storage unit 1 and extracts the attribute evaluation pair candidates of every group ID that contains at least one attribute evaluation pair candidate identical to a pair of attribute expression and evaluation expression that the operator labeled as correct in the training document. It then generates attribute evaluation comparison pair candidates based on the features, positional relationship, and so on of the two attribute expressions in two attribute evaluation pair candidates. When the attribute expression on the left (earlier) side, counting from the beginning of the training document, matches the attribute expression of a positive example attribute evaluation pair in the training document, the means generates an attribute evaluation comparison pair by assigning the label Left to the candidate. When the attribute expression on the right (later) side matches, it generates an attribute evaluation comparison pair by assigning the label Right.

The learning device 8 also includes model creation 1 means 350, which takes the attribute evaluation comparison pairs as input and creates classification model 3, such as the discriminant function Z3, which mainly learns whether a correct relationship exists between an attribute expression and an evaluation expression.

The learning device 8 also includes classification 1 means 360, which creates attribute evaluation comparison pairs from pairs of undetermined attribute evaluation pair candidates within the same group ID, compares them in turn in a tournament fashion using the discriminant function Z3 stored in the classification model 1 storage unit 3 and the undetermined attribute evaluation pair candidates, and classifies the candidates of each group ID into positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates; and a classification 1 means classification result storage unit 170, which stores the positive or negative example attribute evaluation pair candidates classified by the classification 1 means 360.

The learning device 8 also includes training example creation 2 means 180 and model creation 2 means 190. The training example creation 2 means 180 reads a training document from the training document storage unit 1, designates and extracts as positive example attribute evaluation pair candidates those attribute evaluation pair candidates that are identical to the pairs of attribute expression and evaluation expression that the operator labeled as correct in the training document, and reads the positive example attribute evaluation pair candidates stored in the classification 1 means classification result storage unit 170 and replaces them with negative example attribute evaluation pair candidates. The model creation 2 means 190 takes as input the positive example attribute evaluation pair candidates extracted by the training example creation 2 means 180 and the replaced negative example attribute evaluation pair candidates, and creates classification model 4 using the discriminant function Z4 or the like, mainly in order to learn subjective evaluation from the training documents.

FIG. 28 shows the attribute evaluation comparison pairs (training examples) that the training example creation 1 means 340 created using the attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit 130.

The training example creation 1 means 340 of the learning device 8 shown in FIG. 27 first extracts the attribute evaluation pair candidates of every group ID that contains at least one attribute evaluation pair candidate identical to a pair of attribute expression and evaluation expression taken from a location in the training documents stored in the training document storage unit 1 where a positive example attribute evaluation pair exists (that is, text that the operator labeled as a correct positive example and that contains an opinion of the writer). In the example shown in FIG. 27, the attribute evaluation pair candidates belonging to the group GID1 are extracted.

Next, the training example creation 1 means 340 generates attribute evaluation comparison pair candidates from the extracted attribute evaluation pair candidates. Each comparison pair candidate combines one evaluation expression with any two attribute expressions, one of which belongs to the attribute expression and evaluation expression pair that matches the positive example attribute evaluation pair. In the example shown in FIG. 27, the GID2 group attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit 130 were extracted from a training sentence that contains no positive example attribute evaluation pair of the training document, so the training example creation 1 means 340 does not select the candidates belonging to the group GID2 for attribute evaluation comparison pairs.

Next, based on the features, positional relationship, and so on of the two attribute expressions in an attribute evaluation comparison pair candidate, the training example creation 1 means 340 generates an attribute evaluation comparison pair by assigning the label Left to the candidate when the attribute expression on the left (earlier) side, counting from the beginning of the training document, matches the attribute expression of the positive example attribute evaluation pair in the training document. The label Left indicates that the left attribute expression has features closer to the positive example attribute evaluation pair than the right attribute expression does. For instance, in the example shown in FIG. 28, the attribute expression "A1" has features closer to the positive example attribute evaluation pair than the attribute expressions "A2", "A3", and "A4".

Similarly, based on the features, positional relationship, and so on of the two attribute expressions in an attribute evaluation comparison pair candidate, the training example creation 1 means 340 generates an attribute evaluation comparison pair by assigning the label Right to the candidate when the attribute expression on the right (later) side, counting from the beginning of the training document, matches the attribute expression of the positive example attribute evaluation pair in the training document. The label Right indicates that the right attribute expression has features closer to the positive example attribute evaluation pair than the left attribute expression does.

For example, when the attribute evaluation pair candidates shown in FIG. 15 are stored in the attribute evaluation pair candidate storage unit 130, the means takes out the candidates belonging to the group GID1, which contains a candidate matching the positive example attribute evaluation pair in the training document storage unit 1. It then creates attribute evaluation comparison pairs (for example, the three attribute evaluation comparison pairs in FIG. 28) by combining any two attribute evaluation pair candidates, one of which is (A1, E1, feature 1), the candidate that matches the positive example attribute evaluation pair.

In the example of three attribute evaluation comparison pairs shown in FIG. 28, all three carry the label "Left" because the left-hand attribute expression "A1", counting from the beginning of the document, matches the positive example attribute evaluation pair in the training document. The labels "Left" and "Right" may be any kind of label as long as they convey left-side and right-side information.

The model creation 1 means 350 of the learning device 8 shown in FIG. 27 takes as input the attribute evaluation comparison pairs created by the training example creation 1 means 340 and creates classification model 3. As in the processing executed by the classification model 1 creation means 150 of the learning device 2 shown in FIG. 3, classification model 3 is created using a discriminant function Z3 that determines the boundary surface used in later processing to separate Left from Right. As the discriminant function Z3, for example, the discriminant function Z1 given in (Formula 1) above can be used. The features of an attribute evaluation comparison pair can be formed by concatenating the feature arrays of the two attribute evaluation pair candidates from which it was combined. The created classification model 3, such as the discriminant function Z3, is stored in the classification model 1 storage unit 3.
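The creation of labeled comparison pairs and the concatenation of their features can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Candidate` and `ComparisonPair` types, the `position` field (offset of the attribute expression from the beginning of the text), and the numeric feature tuples are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    position: int                 # hypothetical: offset of the attribute expression in the text
    attribute: str
    evaluation: str
    features: Tuple[float, ...]


@dataclass(frozen=True)
class ComparisonPair:
    left: Candidate
    right: Candidate
    features: Tuple[float, ...]   # feature arrays of both candidates, concatenated
    label: str                    # "Left" or "Right"


def make_training_pairs(group: List[Candidate],
                        positive: Candidate) -> List[ComparisonPair]:
    """Pair the positive example with every other candidate of its group."""
    pairs = []
    for other in group:
        if other == positive:
            continue
        # Order the two candidates by where their attribute expressions occur.
        left, right = sorted((positive, other), key=lambda c: c.position)
        # The label records which side holds the positive example.
        label = "Left" if left == positive else "Right"
        pairs.append(ComparisonPair(left, right,
                                    left.features + right.features, label))
    return pairs
```

With a group whose positive example is the leftmost candidate, every generated pair is labeled Left, which mirrors the three pairs of FIG. 28.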

The classification 1 means 360 of the learning device 8 shown in FIG. 27 first receives the attribute evaluation pair candidates from the attribute evaluation pair candidate storage unit 130 and reads the training document from the training document storage unit 1. It then extracts, as undetermined attribute evaluation pair candidates, the candidates belonging to groups (GIDs) that contain no candidate identical to a positive example attribute evaluation pair labeled by the operator in the training document. The candidates extracted here are undetermined attribute evaluation pair candidates such as those shown in FIG. 17. Although FIG. 17 shows only candidates belonging to a single group (GID2), multiple groups may be extracted.

Next, the classification 1 means 360 generates attribute evaluation comparison pairs based on the features of the two attribute expressions in two undetermined attribute evaluation pair candidates. It then uses the discriminant function Z3 stored in the classification model 1 storage unit 3, together with the generated comparison pairs and their features, to compare the candidates in turn in a tournament fashion. For each group ID, it sets the single candidate that finally remains as the positive example attribute evaluation pair candidate and sets all other candidates as negative example attribute evaluation pair candidates.

For example, the classification 1 means 360 substitutes the features of the attribute evaluation pair candidates into x_1 through x_n of the discriminant function Z3 stored in the classification model 1 storage unit 3 and calculates the discrimination score Z3x. Because the discriminant function Z3 determines the boundary surface that separates Left from Right, the classification 1 means 360 generates labeled attribute evaluation comparison pairs by sorting them into Left or Right according to whether the discrimination score Z3x is positive or negative.

Next, the process of comparing attribute evaluation pair candidates in turn using the tournament method is described. First, the classification 1 means 360 compares the two attribute evaluation pair candidates whose attribute expressions are closest to the right-hand end of the text. That is, it creates an attribute evaluation comparison pair combining these two candidates and classifies it as Left or Right using classification model 3 in the classification model 1 storage unit 3. If the result is Left, the left candidate is kept; if the result is Right, the right candidate is kept. In each subsequent comparison, the surviving candidate is compared with the candidate that contains the rightmost attribute expression among those not yet compared, and this process is repeated in order.

Whether the classification 1 means 360 classifies a candidate as a positive or negative example attribute evaluation pair candidate depends on the features of the candidate. When classification starts from the attribute evaluation pair candidates shown in FIG. 15, it is performed on the candidates shown in FIG. 17 (the GID2 candidates that were not used by the training example creation 1 means). The following describes the case where the classification yields the result shown in FIG. 18, which is stored in the classification 1 means classification result storage unit 170.

FIG. 29 shows how the classification 1 means 360 classifies candidates using the tournament method.

First, to compare the two attribute evaluation pair candidates closest to the right-hand end of the text, (A7, E1, feature 8) and (A6, E1, feature 7), the classification 1 means 360 creates the attribute evaluation comparison pair (A6, A7, E1, feature 7, feature 8). Suppose that computing the discrimination score Z3x for this comparison pair with classification model 3 classifies it as Left. In that case, (A6, E1, feature 7) remains.

Next, (A6, E1, feature 7) is compared with (A1, E1, feature 6), and suppose that classification model 3 again classifies the comparison as Left. In that case, (A1, E1, feature 6) remains. Finally, (A1, E1, feature 6) is compared with (A5, E1, feature 5), and suppose that classification model 3 classifies the comparison as Right. In that case, (A1, E1, feature 6) remains as the positive example attribute evaluation pair candidate. In this way, the classification 1 means 360 classifies the attribute evaluation pair candidates into positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates.
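The right-to-left tournament walked through above can be sketched as follows. This is an illustrative sketch under stated assumptions: `compare` is a hypothetical callable standing in for classification model 3 (it receives the left and right candidates and returns "Left" or "Right"), and the candidates are assumed to be already ordered by the position of their attribute expressions in the text.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")


def tournament(candidates: Sequence[C],
               compare: Callable[[C, C], str]) -> Tuple[C, List[C]]:
    """Return (positive candidate, negative candidates) for one group.

    `candidates` must be ordered left to right as they appear in the text.
    """
    if not candidates:
        raise ValueError("a group must contain at least one candidate")
    winner = len(candidates) - 1            # start from the rightmost candidate
    for i in range(len(candidates) - 2, -1, -1):
        # candidates[i] always lies to the left of the current survivor.
        if compare(candidates[i], candidates[winner]) == "Left":
            winner = i                      # "Left": the left candidate survives
    losers = [c for j, c in enumerate(candidates) if j != winner]
    return candidates[winner], losers
```

Replaying the outcomes assumed in the FIG. 29 walkthrough (A6 beats A7, A1 beats A6, A1 beats A5) on the ordered group A5, A1, A6, A7 leaves A1 as the survivor. A group of n candidates needs n - 1 comparisons.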

The processing executed by the training example creation 2 means 180 and the model creation 2 means 190 of the learning device 8 shown in FIG. 27 is the same as the processing executed by the learning device 2 shown in FIG. 3, so its description is omitted here.

FIG. 30 is a flowchart of the process of obtaining classification model 3, such as the discriminant function Z3, and classification model 4, such as the discriminant function Z4, using the learning device 8. Processing steps S100 through S190 shown in the figure correspond to the processing performed by the respective means of the learning device 8 in FIG. 27.

FIG. 31 is a block diagram of the classification device 9, which extracts positive example attribute evaluation pairs from the investigation target document as positive example opinions, using classification model 3, such as the discriminant function Z3, and classification model 4, such as the discriminant function Z4, that the learning device 8 created from training examples including positive example attribute evaluation pair candidates.

The set of attribute expressions stored in the attribute expression storage unit 110 of the classification device 9 shown in FIG. 31 and the set of evaluation expressions stored in the evaluation expression storage unit 120 can use the same information as the attribute expression information stored in the attribute expression storage unit 110 and the information stored in the evaluation expression storage unit 120 shown in FIG. 3.

The classification device 9 shown in FIG. 31 includes attribute evaluation pair candidate extraction means 200, which reads an investigation target document from the document storage unit 5, refers to the set of attribute expressions stored in the attribute expression storage unit 110 and the set of evaluation expressions stored in the evaluation expression storage unit 120, and extracts and outputs the attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the investigation target document; and attribute evaluation pair candidate storage means 230, which stores the attribute evaluation pair candidates.

The classification device 9 shown in FIG. 31 also includes classification 1 means 460, which creates attribute evaluation comparison pairs from pairs of attribute evaluation pair candidates within the same group ID stored in the attribute evaluation pair candidate storage unit 230, compares them in turn in a tournament fashion using classification model 3, such as the discriminant function Z3, stored in the classification model 1 storage unit 3 and the attribute evaluation comparison pairs, and classifies the candidates of each group ID into positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates; and a classification 1 means classification result storage unit 270, which stores the positive or negative example attribute evaluation pair candidates classified by the classification 1 means 460.

The classification device 9 also includes classification 2 means 290, which classifies attribute evaluation pair candidates as positive example attribute evaluation pairs or negative example attribute evaluation pairs using classification model 4, such as the discriminant function Z4, stored in the classification model 2 storage unit 4; and a classification 2 means classification result storage unit 6, which stores the positive or negative example attribute evaluation pairs classified by the classification 2 means 290.

The document storage unit 5 shown in FIG. 31 is a storage means, such as a hard disk, for storing investigation target documents. Investigation target documents include web bulletin boards such as blogs, e-mail, questionnaire results, inquiry data received at corporate inquiry desks, reports, and other documents. Unlike the training documents stored in the training document storage unit 1 shown in FIG. 27, these investigation target documents carry no labels indicating positive example attribute evaluation pairs.

The processing performed by the attribute evaluation pair candidate extraction means 200 and the attribute evaluation pair candidate storage unit 230 shown in FIG. 31 is the same as that performed by the attribute evaluation pair candidate extraction means 100 and the attribute evaluation pair candidate storage unit 130 of the learning device 2 shown in FIG. 3, so a description of these functions is omitted. For example, when attribute evaluation pair candidate extraction is performed on the investigation target sentence S10 shown in FIG. 22, the attribute evaluation pair candidates shown in FIG. 23 are extracted and stored in the attribute evaluation pair candidate storage unit 230.

The classification 1 means 460 of the classification device 9 shown in FIG. 31 first creates attribute evaluation comparison pairs from pairs of attribute evaluation pair candidates within the same group ID stored in the attribute evaluation pair candidate storage unit 230.

Next, the classification 1 means 460 compares the candidates in turn in a tournament fashion using classification model 3, such as the discriminant function Z3, stored in the classification model 1 storage unit 3 and the attribute evaluation pair candidates, and classifies the single candidate that finally remains in each group ID as a positive example attribute evaluation pair candidate. The classification 1 means 460 also classifies the candidates that were not selected as positive example attribute evaluation pair candidates as negative example attribute evaluation pair candidates. The tournament-based classification can use the same processing as that performed by the classification 1 means 360 of the learning device 8. Whether the classification 1 means 460 classifies a candidate as positive or negative depends on the features of the attribute evaluation pair candidate. For example, when the attribute evaluation pair candidates shown in FIG. 23 are input and classified, they are classified into the positive example attribute evaluation pair candidates shown in FIG. 24, which are stored in the classification 1 means classification result storage unit 270.

The classification 2 means 290 of the classification device 9 shown in FIG. 31 takes the attribute evaluation pair candidates that the classification 1 means 460 determined to be positive example attribute evaluation pair candidates and classifies each of them as a positive example attribute evaluation pair or a negative example attribute evaluation pair, using the classification model 4, such as the discriminant function Z4, stored in the classification model 2 storage unit 4.

An attribute evaluation pair candidate that the classification 2 means 290 classifies as a positive example attribute evaluation pair is considered to be an opinion of the writer of the document to be investigated, so the classification 2 means 290 determines it to be a true attribute evaluation pair. The processing executed by the classification 2 means 290 is the same as the processing executed by the classification 2 means 290 shown in FIG. 21, so its description is omitted here. The positive example attribute evaluation pairs and negative example attribute evaluation pairs classified by the classification 2 means 290 are stored in the classification 2 means classification result storage unit 6.

FIG. 25 shows the positive example attribute evaluation pairs and negative example attribute evaluation pairs stored in the classification 2 means classification result storage unit 6. A positive example attribute evaluation pair candidate that the classification 2 means 290 determines to be a positive example attribute evaluation pair is extracted and stored as a true attribute evaluation pair, that is, as an opinion of the writer of the document to be investigated.
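The two-stage flow described above (classification 1 keeps one candidate per group, classification 2 decides whether that candidate is a true opinion) can be sketched as below. The function names `pick_best` and `is_subjective` are hypothetical placeholders for the learned classification models; they are not taken from the patent.

```python
from typing import Callable, Dict, List, Tuple

# An attribute evaluation pair as (attribute expression, evaluation expression).
Pair = Tuple[str, str]


def extract_opinions(
    groups: Dict[str, List[Pair]],
    pick_best: Callable[[List[Pair]], Pair],
    is_subjective: Callable[[Pair], bool],
) -> Tuple[List[Pair], List[Pair]]:
    """Return (true attribute evaluation pairs, rejected stage-1 winners)."""
    opinions: List[Pair] = []
    rejected: List[Pair] = []
    for _gid, candidates in groups.items():
        if not candidates:
            continue
        # Stage 1 (classification 1): keep a single candidate per group.
        best = pick_best(candidates)
        # Stage 2 (classification 2): accept only if it is judged an opinion.
        if is_subjective(best):
            opinions.append(best)
        else:
            rejected.append(best)
    return opinions, rejected
```

Separating the stages means the second model only ever sees candidates that already passed the first one, which mirrors how its training negatives are chosen.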

FIG. 32 is a flowchart of the process of extracting the writer's opinions from a document to be investigated using the classification device 9. The processing steps S200 to S290 shown in the figure correspond to the processing of the respective means of the classification device 9 in FIG. 31.

Using the opinion extraction learning device and classification device according to the present invention makes it possible to extract opinions with high accuracy while making effective use of the training documents input by an operator. The reason is that two kinds of model are used: the classification model 1 or classification model 3, created by extracting negative examples from the training documents and using them selectively so that mainly the relation between attribute expressions and evaluation expressions is learned, and the classification model 2 or classification model 4, created by using negative examples selectively so that mainly subjective evaluativeness is learned.

The negative examples used by the model creation 1 means 150 of the learning device 2, which creates the classification model 1 (or by the model creation 1 means 350 of the learning device 8, which creates the classification model 3), are only the negative examples belonging to group IDs that contain at least one positive example. A negative example in a group that contains at least one positive example is likely to be a subjective evaluation whose attribute expression and evaluation expression are unrelated. Therefore, the model creation 1 means 150 of the learning device 2 (or the model creation 1 means 350 of the learning device 8) can mainly learn from the training documents whether a correct relation exists between an attribute expression and an evaluation expression, but it has difficulty learning subjective evaluation from the training documents.
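The selection rule for the first model's training cases can be sketched as follows. This is an illustrative reading of the paragraph above, with hypothetical names (`gold_pairs` for the labeled positive example pairs, `undetermined` for candidates in groups that contain no positive example).

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (attribute expression, evaluation expression)


def build_model1_training_set(
    groups: Dict[str, List[Pair]], gold_pairs: Set[Pair]
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Split candidates into positives, negatives and undetermined candidates.

    Negatives are taken only from groups that contain at least one positive;
    candidates in groups without any positive are left undetermined.
    """
    positives: List[Pair] = []
    negatives: List[Pair] = []
    undetermined: List[Pair] = []
    for _gid, candidates in groups.items():
        has_positive = any(c in gold_pairs for c in candidates)
        for c in candidates:
            if c in gold_pairs:
                positives.append(c)
            elif has_positive:
                negatives.append(c)
            else:
                undetermined.append(c)
    return positives, negatives, undetermined
```

Leaving the undetermined candidates out of the first model's negatives keeps that model focused on the attribute-evaluation relation rather than on whether the evaluation is subjective.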

On the other hand, the negative examples used by the model creation 2 means 190 of the learning device 2, which creates the classification model 2 (or by the model creation 2 means 190 of the learning device 8, which creates the classification model 4), are only the negative examples that the classification 1 means 160 of the learning device 2 (or the classification 1 means 360 of the learning device 8) wrongly determines to be positive examples. When a candidate is determined to be a positive example by the classification 1 means 160 of the learning device 2 (or the classification 1 means 360 of the learning device 8) but is in fact a negative example that does not express the opinion of the writer of the document, it is likely that some relation exists between the attribute expression and the evaluation expression but that the pair is not a subjective evaluation by the writer.
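The selection of negative examples for the second model can be sketched as below. The sketch assumes, following the claims, that the first-stage classifier is applied to the undetermined candidates (those in groups with no labeled positive example) and that whatever it accepts there is relabeled as a negative example. `stage1_is_positive` is a hypothetical stand-in for that classifier.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (attribute expression, evaluation expression)


def build_model2_training_set(
    gold_positives: List[Pair],
    undetermined: List[Pair],
    stage1_is_positive: Callable[[Pair], bool],
) -> Tuple[List[Pair], List[Pair]]:
    """Return (positives, negatives) for training the second model.

    Positives are the labeled pairs. Negatives are the undetermined
    candidates that stage 1 accepted: related attribute and evaluation
    expressions that are nevertheless not the writer's opinion.
    """
    negatives = [c for c in undetermined if stage1_is_positive(c)]
    return list(gold_positives), negatives
```

Because these negatives already look related to the first model, the second model is pushed to learn what distinguishes a subjective evaluation from a merely related pair.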

The model creation 2 means 190 of the learning device 2 according to the present invention (or the model creation 2 means 190 of the learning device 8) can mainly learn subjective evaluativeness from the training documents. In this way, by using negative examples selectively, the invention creates the classification model 1 (or classification model 3), which mainly learns from the training documents whether a correct relation exists between an attribute expression and an evaluation expression, and the classification model 2 (or classification model 4), which mainly learns subjective evaluation from the training documents. For an attribute evaluation pair that the classification model 1 (or classification model 3) alone would select as having a correct relation, the classification model 2 (or classification model 4), which has learned subjective evaluation, corrects that judgment, so the accuracy of opinion extraction can be improved.

According to the present invention, a writer's opinions can be extracted automatically and more accurately from Web bulletin boards such as blogs, e-mail, reports, and other documents that are not written for the purpose of stating opinions. This improves the efficiency of marketing activities such as research before purchasing a product and corporate market research, and makes it possible to provide products that better match users' wishes at an early stage.

A block diagram showing the configuration of a learning device 10 that learns attribute evaluation pair candidate classification from training cases and creates a classification model such as a discriminant function.
A block diagram showing the configuration of a classification device 11 that inputs a document to be investigated and extracts opinions using a classification model, such as a discriminant function, generated from training cases.
A block diagram showing the configuration of the learning device of the first embodiment according to the present invention.
A diagram showing an example of a training document.
A diagram showing a training document expressed in generalized form.
A diagram showing the set of attribute expressions stored in the attribute expression storage unit.
A diagram showing the set of evaluation expressions stored in the evaluation expression storage unit.
A diagram showing an example of a training document stored in the training document storage unit 1.
A diagram showing an example of classification into a group (GID1) of attribute evaluation pair candidates whose attribute expression is in the same sentence as the evaluation expression E1 and a group (GID2) of attribute evaluation pair candidates whose attribute expression is in the same sentence as the evaluation expression E2.
A diagram showing an example of classification into a group (GID1) of attribute evaluation pair candidates whose attribute expression is in the sentence containing the evaluation expression E1 or within one sentence before it and a group (GID2) of attribute evaluation pair candidates whose attribute expression is in the sentence containing the evaluation expression E2 or within one sentence before it.
A diagram showing an example of classification in which the range is limited by the condition that the attribute expression is in the same sentence as the evaluation expression and extends only as far as A1, the positive example attribute-tagged expression located closest to the evaluation expression.
A diagram illustrating the training sentences S6 and S7 stored in the training document storage unit 1.
A diagram illustrating the results of morphological analysis and syntactic analysis of the training sentences S6 and S7.
A diagram illustrating the features extracted using the results of the morphological analysis and syntactic analysis.
A diagram showing the combinations of attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit.
A diagram showing the positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates output by the training case creation 1 means.
A diagram showing the undetermined attribute evaluation pair candidates extracted by the classification 1 means.
A diagram showing the positive example attribute evaluation pair candidates and negative example attribute evaluation pair candidates classified by the classification 1 means.
A diagram showing the positive example attribute evaluation pair candidates extracted by the training case creation 2 means and the negative example attribute evaluation pair candidates it replaced.
A flowchart of the process of creating the classification model 1 and the classification model 2 using the learning device 2.
A block diagram showing the configuration of the classification device of the first embodiment according to the present invention.
A diagram showing the content of the sentence to be investigated S10.
A diagram showing an example of the attribute evaluation pair candidates stored in the attribute evaluation pair candidate storage unit.
A diagram showing the attribute evaluation pair candidates stored in the classification 1 means classification result storage unit.
A diagram showing the positive example attribute evaluation pairs and negative example attribute evaluation pairs stored in the classification 2 means classification result storage unit 6.
A flowchart of the process of extracting the writer's opinions from a document to be investigated using the classification device 7.
A block diagram showing the configuration of the learning device of the second embodiment according to the present invention.
A diagram showing the attribute evaluation comparison pairs created by the training case creation 1 means.
A diagram showing how the classification 1 means classifies using the tournament method.
A flowchart of the process of creating the classification model 3 and the classification model 4 using the learning device 8.
A block diagram showing the configuration of the classification device of the second embodiment according to the present invention.
A flowchart of the process of extracting the writer's opinions from a document to be investigated using the classification device 9.

Explanation of symbols

1 Training document storage unit
2, 8, 10 Learning device
3 Classification model 1 storage unit
4 Classification model 2 storage unit
5 Document storage unit
6 Classification 2 means classification result storage unit
7, 9, 11 Classification device
12 Classification model storage unit
13 Classification means classification result storage unit
100, 200 Attribute evaluation pair candidate extraction means
110 Attribute expression storage unit
120 Evaluation expression storage unit
130, 230 Attribute evaluation pair candidate storage unit
140, 340 Training case creation 1 means
150, 350 Model creation 1 means
160, 260, 360, 460 Classification 1 means
170, 270 Classification 1 means classification result storage unit
180 Training case creation 2 means
190 Model creation 2 means
210 Training case creation means
220 Model creation means
230 Classification means
290 Classification 2 means

Claims

An opinion extraction learning device comprising:
an attribute evaluation pair candidate extraction means that inputs a training document containing positive example attribute evaluation pairs, each labeled to associate an attribute expression with an evaluation expression in the document, extracts attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the training document, grouping them for each training document, and assigns to each attribute evaluation pair candidate feature information such as the surface character strings of the clauses containing the attribute expression and the evaluation expression, the parts of speech of those clauses, whether the clause containing the attribute expression and the clause containing the evaluation expression are in a dependency relation, or the distance between the attribute expression and the evaluation expression;
a training case creation 1 means that inputs the training document containing the labeled positive example attribute evaluation pairs and the attribute evaluation pair candidates grouped for each training document, determines each attribute evaluation pair candidate identical to a positive example attribute evaluation pair to be a positive example attribute evaluation pair candidate, and, within each group in which a positive example attribute evaluation pair candidate exists, determines the attribute evaluation pair candidates that are not positive example attribute evaluation pair candidates to be negative example attribute evaluation pair candidates;
a model creation 1 means that inputs the positive example attribute evaluation pair candidates and the negative example attribute evaluation pair candidates and creates a classification model 1 such as a discriminant function Z1;
a classification 1 means that inputs the training document containing the labeled positive example attribute evaluation pairs, the attribute evaluation pair candidates grouped for each training document, and the classification model 1 such as the discriminant function Z1, extracts as undetermined attribute evaluation pair candidates the attribute evaluation pair candidates belonging to groups that contain no attribute evaluation pair candidate identical to a positive example attribute evaluation pair, and classifies the undetermined attribute evaluation pair candidates into positive example attribute evaluation pair candidates or negative example attribute evaluation pair candidates using the classification model 1 such as the discriminant function Z1;
a training case creation 2 means that inputs the training document containing the labeled positive example attribute evaluation pairs, the attribute evaluation pair candidates grouped for each training document, and the positive example attribute evaluation pair candidates classified by the classification 1 means, replaces those positive example attribute evaluation pair candidates with negative example attribute evaluation pair candidates, and determines each attribute evaluation pair candidate identical to a positive example attribute evaluation pair to be a positive example attribute evaluation pair candidate; and
a model creation 2 means that inputs the positive example attribute evaluation pair candidates determined by the training case creation 2 means and the negative example attribute evaluation pair candidates replaced by the training case creation 2 means, and creates and outputs a classification model 2 such as a discriminant function Z2.

An opinion extraction classification device comprising:
an attribute evaluation pair candidate extraction means that inputs a document to be investigated in which a writer's opinions are recorded, such as a Web bulletin board (for example a blog), e-mail, questionnaire results, inquiry data sent to a company's inquiry desk, or reports, extracts attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the document, grouping them for each document to be investigated, and assigns to each attribute evaluation pair candidate feature information such as the surface character strings of the clauses containing the attribute expression and the evaluation expression, the parts of speech of those clauses, whether the clause containing the attribute expression and the clause containing the evaluation expression are in a dependency relation, or the distance between the attribute expression and the evaluation expression;
a classification 1 means that inputs the attribute evaluation pair candidates grouped for each document to be investigated and a classification model 1, such as a discriminant function Z1, that defines the boundary surface separating attribute evaluation pair candidates into positive example attribute evaluation pairs and negative example attribute evaluation pairs, and uses the classification model 1 to extract a positive example attribute evaluation pair candidate from the attribute evaluation pair candidates in the same group; and
a classification 2 means that inputs the positive example attribute evaluation pair candidates extracted by the classification 1 means and a classification model 2, such as a discriminant function Z2, that defines the boundary surface separating positive example attribute evaluation pair candidates into positive example attribute evaluation pairs and negative example attribute evaluation pairs, extracts positive example attribute evaluation pairs from the positive example attribute evaluation pair candidates, and determines each extracted positive example attribute evaluation pair to be a true attribute evaluation pair.

An opinion extraction learning device comprising:
an attribute evaluation pair candidate extraction means that inputs a training document containing positive example attribute evaluation pairs, each labeled to associate an attribute expression with an evaluation expression in the document, extracts attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the training document, grouping them for each training document, and assigns to each attribute evaluation pair candidate feature information such as the surface character strings of the clauses containing the attribute expression and the evaluation expression, the parts of speech of those clauses, whether the clause containing the attribute expression and the clause containing the evaluation expression are in a dependency relation, or the distance between the attribute expression and the evaluation expression;
a training case creation 1 means that inputs the training document containing the labeled positive example attribute evaluation pairs and the attribute evaluation pair candidates grouped for each training document, extracts the attribute evaluation pair candidates in groups that contain at least one attribute evaluation pair candidate identical to a positive example attribute evaluation pair, generates attribute evaluation comparison pair candidates from two attribute evaluation pair candidates on the basis of the features and positional relation of their two attribute expressions, generates an attribute evaluation comparison pair labeled Left when the attribute expression located on the left side, counting from the beginning of the training document, matches the attribute expression of a positive example attribute evaluation pair in the training document, and generates an attribute evaluation comparison pair labeled Right when the attribute expression located on the right side, counting from the beginning of the training document, matches the attribute expression of a positive example attribute evaluation pair in the training document;
a model creation 1 means that inputs the attribute evaluation comparison pairs labeled Left or Right by the training case creation 1 means and creates a classification model 3 such as a discriminant function Z3;
a classification 1 means that inputs the training document containing the labeled positive example attribute evaluation pairs, the attribute evaluation pair candidates grouped for each training document, and the classification model 3 such as the discriminant function Z3, extracts as undetermined attribute evaluation pair candidates the attribute evaluation pair candidates belonging to groups that contain no attribute evaluation pair candidate identical to a positive example attribute evaluation pair, generates attribute evaluation comparison pairs on the basis of the features of the two attribute expressions in two undetermined attribute evaluation pair candidates, compares the generated attribute evaluation comparison pairs one after another in a tournament format using the classification model 3 such as the discriminant function Z3, sets the single attribute evaluation pair candidate that finally remains for each group as a positive example attribute evaluation pair candidate, and sets the attribute evaluation pair candidates other than the positive example attribute evaluation pair candidate as negative example attribute evaluation pair candidates;
a training case creation 2 means that inputs the training document containing the labeled positive example attribute evaluation pairs, the attribute evaluation pair candidates grouped for each training document, and the positive example attribute evaluation pair candidates classified by the classification 1 means, replaces those positive example attribute evaluation pair candidates with negative example attribute evaluation pair candidates, and determines each attribute evaluation pair candidate identical to a positive example attribute evaluation pair to be a positive example attribute evaluation pair candidate; and
a model creation 2 means that inputs the positive example attribute evaluation pair candidates determined by the training case creation 2 means and the negative example attribute evaluation pair candidates replaced by the training case creation 2 means, and creates and outputs a classification model 4 such as a discriminant function Z4.

An opinion extraction classification device comprising:
an attribute evaluation pair candidate extraction means that inputs a document to be investigated in which a writer's opinions are recorded, such as a Web bulletin board (for example a blog), e-mail, questionnaire results, inquiry data sent to a company's inquiry desk, or reports, extracts attribute evaluation pair candidates corresponding to combinations of attribute expressions and evaluation expressions present in the document, grouping them for each document to be investigated, and assigns to each attribute evaluation pair candidate feature information such as the surface character strings of the clauses containing the attribute expression and the evaluation expression, the parts of speech of those clauses, whether the clause containing the attribute expression and the clause containing the evaluation expression are in a dependency relation, or the distance between the attribute expression and the evaluation expression;
a classification 1 means that inputs the attribute evaluation pair candidates grouped for each document to be investigated and a classification model 3, such as a discriminant function Z3, that defines the boundary surface separating attribute evaluation pair candidates into positive example attribute evaluation pairs and negative example attribute evaluation pairs, creates attribute evaluation comparison pairs from two attribute evaluation pair candidates in the same group, compares the generated attribute evaluation comparison pairs one after another in a tournament format using the classification model 3 such as the discriminant function Z3, sets the single attribute evaluation pair candidate that finally remains for each group as a positive example attribute evaluation pair candidate, and sets the attribute evaluation pair candidates other than the positive example attribute evaluation pair candidate as negative example attribute evaluation pair candidates; and
a classification 2 means that inputs the positive example attribute evaluation pair candidates set by the classification 1 means and a classification model 4, such as a discriminant function Z4, that defines the boundary surface separating positive example attribute evaluation pair candidates into positive example attribute evaluation pairs and negative example attribute evaluation pairs, extracts positive example attribute evaluation pairs from the positive example attribute evaluation pair candidates set by the classification 1 means, and determines each extracted positive example attribute evaluation pair to be a true attribute evaluation pair.