JP2015125570A

JP2015125570A - Information processing apparatus, control method, and program

Info

Publication number: JP2015125570A
Application number: JP2013269019A
Authority: JP
Inventors: 淑隆林; Yoshitaka Hayashi
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc; Canon MJ IT Group Holdings Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc; Canon MJ IT Group Holdings Inc
Priority date: 2013-12-26
Filing date: 2013-12-26
Publication date: 2015-07-06

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, a control method, and a program capable of analyzing an evaluation target document more accurately, without burdening the user, when determining whether the document expresses a positive evaluation, a negative evaluation, or something else. SOLUTION: The information processing apparatus performs morphological analysis and syntactic analysis on an evaluation target document and analyzes the dependency relations among the terms that constitute the document. When a term is included among the evaluation targets, and an evaluation word included among the evaluation targets is in a co-occurrence relation with another evaluation word with which it has a dependency relation, the apparatus obtains change information indicating that the evaluation polarity held for those evaluation words should be changed. The apparatus then determines the evaluation polarity of each document on the basis of the change information.

Description

The present invention relates to an analysis technique for extracting useful information from documents in document analysis processing, and in particular to a technique for extracting and presenting subjective reputations, evaluations, opinions, and the like from digitized document data on the web, and to a technique for managing it.

An enormous volume of documents is created and consumed every day, mainly on the web, and extracting and presenting the necessary information from them is an important challenge. For businesses in particular, it is valuable to analyze how their products are reputed and how their services are evaluated and commented on.

To address this challenge, various methods have been proposed for extracting and presenting subjective evaluations, such as product reputations, evaluations, and opinions, from large amounts of digitized document data.

For example, in one method, words that express an evaluation or reputation (hereinafter, evaluation words) are registered in a dictionary together with their evaluation values (attribute values representing a positive or negative meaning or evaluation). Evaluation words are then extracted from document data, and an evaluation value is calculated quantitatively from their frequency and the like.

Patent Document 1 discloses a method in which predetermined evaluation words are registered in a dictionary, evaluation words are detected in the document to be analyzed, an evaluation value is obtained, and the document is judged to express a positive or negative opinion. It also discloses taking the structure of negative sentences into account and calculating a single concrete evaluation value from a plurality of words.

Patent Document 2, like Patent Document 1, obtains evaluation values from predetermined evaluation words. However, it takes into account that the evaluation value of a single evaluation word can be inverted depending on the target being evaluated, and discloses a dictionary management method based on user voting as a countermeasure.

Patent Document 1: Japanese Patent No. 4796664
Patent Document 2: Japanese Patent No. 3738011

However, the inversion of evaluation values in Patent Document 1 considers only general negation words (for example, 「ない」, "not"). For example, when the evaluation word 「病気」 ("illness") has a negative evaluation value, the expression 「病気ではない」 ("not ill") has its value inverted to positive. In the case of 「病気が治る」 ("the illness is cured"), however, that is, for combinations with words other than negation words, inversion of the evaluation value is not considered.

Patent Document 1 might appear able to solve this inversion problem by using its "binary relation", which calculates a single concrete evaluation value from a plurality of words. However, it discloses no measure for cases such as 「病気が治る」, where a binary relation exists alongside a single evaluation word. In addition, registering and managing a huge number of binary relations that invert evaluation values would itself be a new problem.

In the method described in Patent Document 2, the target word that the evaluation word modifies is used as the basis for inverting the evaluation value. The disclosed example evaluates 「古い寺」 ("old temple") positively and 「古い生鮮食料品」 ("old fresh food") negatively.

However, evaluations like those in this example depend largely on user voting. For certain word combinations the evaluation could be made without user voting, but Patent Document 2 does not specify concretely how the system should handle such cases without user voting.

Furthermore, and this also applies to Patent Document 1, the basis for inverting an evaluation value can be not only the target word but also the word corresponding to the subject. For example, 「値段が高い」 ("the price is high") is a negative evaluation from the consumer's point of view, but it can be regarded as a positive evaluation from the seller's point of view.

As described above, extracting subjective evaluations such as reputations, evaluations, and opinions from document data is an important challenge, and evaluation values need to be calculated more efficiently and more accurately.

The present invention has been made to solve the above problems. Its object is to provide an information processing apparatus, a control method, and a program that can analyze an evaluation target document more accurately, without burdening the user, when analyzing whether the document expresses a positive evaluation, a negative evaluation, or the like.

A first invention for achieving the above object is an information processing apparatus that analyzes positive or negative evaluation in a document, comprising: evaluation information acquisition means for acquiring evaluation word information that includes an evaluation word to be evaluated, obtained from morphological analysis and syntactic analysis of the document, and the evaluation polarity of that evaluation word; dependency evaluation word acquisition means for acquiring an evaluation word, different from said evaluation word, that is in a dependency relation with the evaluation word of the evaluation information acquired by the evaluation information acquisition means; co-occurrence relation determination means for determining whether the evaluation word acquired by the dependency evaluation word acquisition means and the evaluation word in a dependency relation with it are in a co-occurrence relation; and change information acquisition means for acquiring change information for changing the evaluation polarity of the evaluation words determined by the co-occurrence relation determination means to be in a co-occurrence relation.

A second invention for achieving the above object is a control method for an information processing apparatus that analyzes positive or negative evaluation in a document, wherein the information processing apparatus executes: an evaluation information acquisition step of acquiring evaluation word information that includes an evaluation word to be evaluated, obtained from morphological analysis and syntactic analysis of the document, and the evaluation polarity of that evaluation word; dependency evaluation word acquisition means for acquiring an evaluation word, different from said evaluation word, that is in a dependency relation with the evaluation word of the evaluation information acquired in the evaluation information acquisition step; a co-occurrence relation determination step of determining whether the evaluation word acquired by the dependency evaluation word acquisition means and the evaluation word in a dependency relation with it are in a co-occurrence relation; and a change information acquisition step of acquiring change information for changing the evaluation polarity of the evaluation words determined in the co-occurrence relation determination step to be in a co-occurrence relation.

A third invention for achieving the above object is a program that can be read and executed by an information processing apparatus that analyzes positive or negative evaluation in a document, the program causing the information processing apparatus to function as: evaluation information acquisition means for acquiring evaluation word information that includes an evaluation word to be evaluated, obtained from morphological analysis and syntactic analysis of the document, and the evaluation polarity of that evaluation word; dependency evaluation word acquisition means for acquiring an evaluation word, different from said evaluation word, that is in a dependency relation with the evaluation word of the evaluation information acquired by the evaluation information acquisition means; co-occurrence relation determination means for determining whether the evaluation word acquired by the dependency evaluation word acquisition means and the evaluation word in a dependency relation with it are in a co-occurrence relation; and change information acquisition means for acquiring change information for changing the evaluation polarity of the evaluation words determined by the co-occurrence relation determination means to be in a co-occurrence relation.

According to the present invention, when analyzing whether an evaluation target document expresses a positive evaluation, a negative evaluation, or the like, the analysis can take into account the field attributes of the evaluation words that constitute the document. This achieves the effect that the evaluation target document can be analyzed more accurately without burdening the user.

FIG. 1 is a block diagram showing a configuration example of the document analysis apparatus according to the embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the document analysis apparatus according to the embodiment of the present invention.
FIG. 3 is a flowchart of the document analysis processing executed by the document analysis apparatus according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram illustrating an example of the normalization processing in the embodiment of the present invention.
FIG. 5 is a flowchart of the evaluation word extraction processing executed by the document analysis apparatus according to the embodiment of the present invention.
FIG. 6 is an example of the duplicate candidate deletion processing in the embodiment of the present invention.
FIG. 7 is an explanatory diagram illustrating an example of the co-occurrence processing in the embodiment of the present invention.
FIG. 8 is an explanatory diagram illustrating an example of a dependency relation in the embodiment of the present invention.
FIG. 9 is an explanatory diagram illustrating an example of a dependency relation in the embodiment of the present invention.
FIG. 10 is a diagram showing the structure of the evaluation word candidate list in the embodiment of the present invention.
FIG. 11 is a flowchart of the co-occurrence processing executed by the document analysis apparatus according to the embodiment of the present invention.
FIG. 12 is a flowchart of the negative expression processing executed by the document analysis apparatus according to the embodiment of the present invention.
FIG. 13 is a flowchart of the speaker information extraction processing executed by the document analysis apparatus according to the embodiment of the present invention.
FIG. 14 is a diagram showing an example of the structure of the table for storing evaluation word extraction results and speaker information extraction results in the embodiment of the present invention.
FIG. 15 is a flowchart of the polarity determination processing executed by the document analysis apparatus according to the embodiment of the present invention.
FIG. 16 is an example of the evaluation word dictionary in the embodiment of the present invention.
FIG. 17 is an example of field inversion words in the embodiment of the present invention.
FIG. 18 is an example of the synonym dictionary in the embodiment of the present invention.
FIG. 19 is an example of the speaker information dictionary in the embodiment of the present invention.

An example of an embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a diagram showing the configuration of a document analysis apparatus serving as the information processing apparatus in the embodiment of the present invention.

The document analysis apparatus 100 includes an evaluation dictionary unit 101, a synonym dictionary unit 102, a reputation information extraction unit 103, a speaker information extraction unit 104, a polarity determination unit 105, and a speaker information dictionary unit 106. The evaluation dictionary unit 101, the synonym dictionary unit 102, and the speaker information dictionary unit 106 are stored in a storage device such as the external memory 211 described later.

In the document analysis apparatus 100, a text document 107 is sent to the reputation information extraction unit 103 and the speaker information extraction unit 104, where various kinds of information are extracted. Each extraction unit performs its extraction while referring to the results of morphological analysis and syntactic analysis and to the information in the various dictionaries.

The results extracted by each unit are then sent to the polarity determination unit 105, which calculates the evaluation polarity of the text document 107. This series of document analysis techniques is described in detail later.

Next, the hardware configuration of the document analysis apparatus 100 of FIG. 1 is described with reference to FIG. 2.

In the figure, the CPU 201 centrally controls the devices and controllers, described below, that are connected to the system bus 204. The ROM 203 or the external memory 211 stores the BIOS (Basic Input/Output System), which is the control program of the CPU 201, an operating system program (hereinafter, OS), and the various programs and data needed for the document analysis apparatus 100 to execute the processes described later. The RAM 202 functions as the main memory, work area, and the like of the CPU 201.

The CPU 201 loads the programs and other data needed for a process into the RAM 202 and executes them, thereby implementing the various processes described later. The input controller (input C) 205 controls input from the input device 209. The input device 209 is, for example, a mechanical keyboard, a software keyboard, or a touch panel. The video controller (VC) 206 controls display on the display device 210. The display device 210 is, for example, a liquid crystal display.

The memory controller (MC) 207 controls access to the external memory 211, which stores the boot program, browser software, various applications, font data, user files, editing files, various data, and the like. The external memory 211 is, for example, a hard disk (HD), a solid state disk (SSD), or a CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

The communication I/F controller (communication I/FC) 208 connects to and communicates with external devices via a network and executes communication control processing on the network. For example, Internet communication using TCP/IP is possible.

The CPU 201 enables display on the display device 210 by, for example, rasterizing outline fonts into a display information area in the RAM 202. This concludes the description of the hardware configuration of the document analysis apparatus 100. Needless to say, the apparatus does not have to use the hardware configuration shown in FIG. 2 as long as it can execute the various processes described later.

Next, the document analysis processing in the document analysis apparatus 100 is described in detail with reference to FIGS. 3 to 15.

FIG. 3 is a flowchart showing the overall flow of the document analysis processing. In this processing, the CPU 201 performs morphological analysis and syntactic analysis on each sentence in the document, and then performs evaluation word extraction processing and speaker information extraction processing. It then performs polarity determination processing based on the extracted data.

Because the document analysis processing operates sentence by sentence, the document is first divided into sentences in step S301, and the subsequent processing is applied to each sentence. Sentences are divided by treating full stops, exclamation marks, question marks, runs of consecutive newline characters, and the like as delimiters. In the subsequent step S302, morphological analysis is performed on each extracted sentence.
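The sentence division of step S301 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_sentences` and the exact delimiter set (the Japanese full stop, full-width and ASCII exclamation and question marks, and two or more consecutive newlines) are assumptions chosen to match the delimiters listed above.

```python
import re

# Assumed delimiter set: "。", "！", "？", "!", "?", or a run of 2+ newlines.
# A sentence is the shortest run of text that ends at one of these delimiters
# or at the end of the input.
_SENTENCE = re.compile(r".+?(?:[。！？!?]+|\n{2,}|\Z)", re.DOTALL)


def split_sentences(text: str) -> list[str]:
    """Split a document into sentences (hypothetical helper for step S301)."""
    sentences = []
    for match in _SENTENCE.finditer(text):
        sentence = match.group().strip()
        if sentence:  # skip fragments that contain only whitespace
            sentences.append(sentence)
    return sentences


document = "今年のさんまは値段が高い。おいしい魚を食べた！\n\n背の高い人"
sentences = split_sentences(document)
```

Each returned sentence keeps its closing punctuation, so a later stage can still distinguish questions from statements if it needs to.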

Next, normalization is performed in step S303. The normalization keeps the accuracy of the subsequent syntactic analysis stable. It includes, for example, converting honorific and polite expressions to their standard forms and converting inflected forms to their base forms.

FIG. 4 shows an example of normalization for the sentence 「おいしい魚をいただいた」 ("I had a delicious fish", using the humble verb いただく). Sentence 401 is written as the morpheme sequence obtained from the morphological analysis (step S302), and sentence 402 as the morpheme sequence obtained from the normalization (step S303).

In sentence 401, the underlined part 「いただい」 is a humble form of 「食べる」 ("to eat"), so the normalization in step S303 corrects it as shown in sentence 402. In this example two corrections are applied: converting the humble form to the standard form, and converting the inflected form to the base form. The information needed for the corrections is included in the result of the morphological analysis performed in step S302.
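The two corrections can be sketched as a lookup over per-morpheme attributes. The `Morpheme` record and its `base_form` and `plain_form` fields are hypothetical names for the correction information that the morphological analysis result is said to carry. The segmentation below is an assumed one for illustration and does not reproduce FIG. 4.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Morpheme:
    surface: str                      # text as it appears in the sentence
    base_form: str                    # dictionary (uninflected) form
    plain_form: Optional[str] = None  # hypothetical: plain equivalent of a
                                      # humble or polite word, if any


def normalize(morphemes: List[Morpheme]) -> List[str]:
    """Return normalized forms: plain equivalent if present, else base form."""
    return [m.plain_form or m.base_form for m in morphemes]


# Assumed segmentation of 「おいしい魚をいただいた」.
sentence_401 = [
    Morpheme("おいしい", "おいしい"),
    Morpheme("魚", "魚"),
    Morpheme("を", "を"),
    Morpheme("いただい", "いただく", plain_form="食べる"),
    Morpheme("た", "た"),
]
sentence_402 = normalize(sentence_401)
```

Keeping the surface form alongside the normalized form leaves the original wording available to later steps.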

Returning to FIG. 3, syntactic analysis is performed in the subsequent step S304. The syntactic analysis performs dependency analysis, which identifies the phrase (bunsetsu) that each phrase depends on, and thereby obtains the sentence structure at the phrase level.

Common syntactic analysis tools include CaboCha and KNP. CaboCha in particular performs the morphological analysis of step S302 as part of syntactic analysis, in a single pass. When such a tool is used, the syntactic analysis of step S304 must be performed in place of the morphological analysis of step S302, and the normalization of step S303 must then be performed based on the morpheme information included in the syntactic analysis result.

Once step S304 has been performed, two analysis results are available for the input document: a morphological analysis result and a syntactic analysis result. The evaluation word extraction processing of step S305 and the speaker information extraction processing of step S306 are performed on these results.

First, the evaluation word extraction processing of step S305 is described with reference to FIGS. 5 to 12.

The evaluation word extraction processing extracts evaluation words registered in advance in the evaluation dictionary unit 101 (the evaluation dictionary shown in FIG. 16) by pattern matching against the analysis result obtained from the morphological analysis in step S302.

FIG. 16 shows an example of the evaluation dictionary. The evaluation dictionary registers words that can form evaluation expressions. Each word is assigned a unique ID value (word ID 1601 in FIG. 16) and is managed uniquely.

For each word ID 1601, a notation 1602, an initial polarity 1603, a co-occurrence attribute 1604, and a field 1606 are defined. The details of these are as described above. The word ID 1601 is designed so that applying a given operation to the ID value shows that the value is one managed in the evaluation dictionary.
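One way to picture an entry of the evaluation dictionary is as a small record keyed by word ID. The sketch below is illustrative only. The value encodings (polarity as +1, -1, or 0, the co-occurrence attribute as a boolean, the field as an optional string) are assumptions, since this section does not specify them. Word ID 167 for 「値段」 is taken from the candidate list example in this description, and the entry for 「高い」 with ID 201 is made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EvaluationEntry:
    word_id: int               # word ID 1601
    notation: str              # notation 1602
    initial_polarity: int      # initial polarity 1603 (assumed: +1, -1, or 0)
    cooccurrence: bool         # co-occurrence attribute 1604 (assumed boolean)
    field_name: Optional[str]  # field 1606 (assumed: None means no field)


def build_dictionary(entries) -> Dict[int, EvaluationEntry]:
    """Index entries by word ID so each word is managed uniquely."""
    dictionary: Dict[int, EvaluationEntry] = {}
    for entry in entries:
        if entry.word_id in dictionary:
            raise ValueError(f"duplicate word ID: {entry.word_id}")
        dictionary[entry.word_id] = entry
    return dictionary


# Illustrative entries; every value except ID 167 for 「値段」 is hypothetical.
evaluation_dictionary = build_dictionary([
    EvaluationEntry(167, "値段", 0, True, None),
    EvaluationEntry(201, "高い", 0, True, None),
])
```

Rejecting duplicate IDs at build time is one simple way to enforce the uniqueness that the description requires.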

First, in step S501, the evaluation word candidate list is initialized. The evaluation word candidate list is an area for temporarily storing the evaluation words detected in the subsequent extraction processing, together with the attached information related to those words.

In the subsequent steps S502 to S506, the morpheme sequences obtained from the morphological analysis (step S302) and the normalization (step S303) are checked for matches with evaluation words registered in advance in the evaluation dictionary unit 101.

If a match is found ("Yes" in step S504), the process proceeds to step S505, where the combination of the word ID, morpheme ID, and phrase ID is added to the evaluation word candidate list. If no match is found ("No" in step S504), the process proceeds to step S506 (and returns to step S502).
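The loop of steps S502 to S506 can be sketched as a scan over the normalized morpheme sequence. This is a naive window scan written for clarity, not an Aho-Corasick matcher, and all names (`Morpheme`, `Candidate`, `find_candidates`) are hypothetical. The morpheme and phrase IDs are assumed to be zero-based, which reproduces the example values for 「値段」 in this description (word ID 167, morpheme ID 4, phrase ID 2). Word ID 201 for 「高い」 is made up.

```python
from typing import Dict, List, NamedTuple, Tuple


class Morpheme(NamedTuple):
    morpheme_id: int
    phrase_id: int
    base_form: str


class Candidate(NamedTuple):
    word_id: int
    morpheme_id: int  # ID of the first morpheme of the match
    phrase_id: int    # ID of the phrase containing that morpheme


def find_candidates(morphemes: List[Morpheme],
                    dictionary: Dict[Tuple[str, ...], int]) -> List[Candidate]:
    """Collect every dictionary word found in the morpheme sequence."""
    max_len = max((len(key) for key in dictionary), default=0)
    candidates = []
    for start in range(len(morphemes)):
        for length in range(1, max_len + 1):
            window = morphemes[start:start + length]
            if len(window) < length:
                break  # ran past the end of the sentence
            word_id = dictionary.get(tuple(m.base_form for m in window))
            if word_id is not None:
                candidates.append(
                    Candidate(word_id, window[0].morpheme_id, window[0].phrase_id))
    return candidates


# 「今年のさんまは値段が高い」 with an assumed segmentation into 7 morphemes
# and 4 phrases.
sentence = [
    Morpheme(0, 0, "今年"), Morpheme(1, 0, "の"),
    Morpheme(2, 1, "さんま"), Morpheme(3, 1, "は"),
    Morpheme(4, 2, "値段"), Morpheme(5, 2, "が"),
    Morpheme(6, 3, "高い"),
]
word_ids = {("値段",): 167, ("高い",): 201}
candidate_list = find_candidates(sentence, word_ids)
```

Keying the dictionary by tuples of base forms lets multi-morpheme evaluation words be matched with the same lookup as single-morpheme ones.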

FIG. 10 shows an example of the evaluation word candidate list. The word ID 1001 stores the word ID that uniquely identifies the extracted evaluation word candidate. Word IDs are managed by the evaluation dictionary unit 101.

The morpheme ID 1002 is the morpheme identification number in the morphological analysis result obtained in step S302. Similarly, the phrase ID 1003 is the phrase identification number in the syntactic analysis result obtained in step S304.

The notation 1004 is included only for convenience, to make the table easier to understand. In this example, the evaluation word candidate 「値段」 ("price") has been extracted with word ID 167, morpheme ID 4, and phrase ID 2.

In the evaluation word search in step S503, synonyms and near-synonyms registered in the synonym dictionary unit 102 can also be searched for. The relationship between the evaluation dictionary unit 101 and the synonym dictionary unit 102 is described later.

In the subsequent step S507, duplicate candidate deletion processing is performed. This processing is described with reference to FIG. 6.

Sentence 601 in FIG. 6 is the result of applying the morphological analysis of step S302 and the normalization of step S303 to the sentence 「おたふく風邪になった」 ("I caught mumps"). Assume that the evaluation words 「おたふく風邪」 ("mumps") and 「風邪」 ("cold") are both registered in the evaluation dictionary unit 101.

In this case, the evaluation word search in step S503 registers both 「おたふく/風邪」, which consists of the first and second morphemes of sentence 601, and 「風邪」, which consists of the second morpheme only, in the evaluation word candidate list.

One way to implement this is pattern matching with multiple keywords, for example the Aho-Corasick algorithm.

In the duplicate candidate deletion processing, when an evaluation word candidate (「風邪」 in sentence 601) is contained in another evaluation word candidate (「おたふく/風邪」 in sentence 601), the candidate made up of more morphemes takes priority. In the example of sentence 601, 「おたふく/風邪」, which consists of two morphemes, takes priority, and 「風邪」, which consists of one morpheme, is deleted from the evaluation word candidate list.
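The priority rule can be sketched by representing each candidate as a span of morpheme positions and dropping any span that lies inside a longer one. The `Span` record, the half-open index convention, and the word IDs 301 and 302 are assumptions made for this illustration.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    word_id: int
    start: int  # index of the first morpheme of the candidate
    end: int    # index one past the last morpheme


def drop_contained(candidates: List[Span]) -> List[Span]:
    """Remove candidates that lie inside a candidate with more morphemes."""
    def is_inside(inner: Span, outer: Span) -> bool:
        return (outer.start <= inner.start
                and inner.end <= outer.end
                and (outer.end - outer.start) > (inner.end - inner.start))

    return [c for c in candidates
            if not any(is_inside(c, other) for other in candidates)]


# Sentence 601: 「おたふく/風邪」 covers morphemes 0-1, 「風邪」 covers morpheme 1.
mumps = Span(word_id=301, start=0, end=2)  # hypothetical word ID
cold = Span(word_id=302, start=1, end=2)   # hypothetical word ID
remaining = drop_contained([mumps, cold])
```

Requiring the outer span to be strictly longer keeps two identical spans from deleting each other. The pairwise comparison is quadratic in the number of candidates, which is acceptable for the handful of candidates found in a single sentence.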

In the subsequent step S508, co-occurrence processing is performed. This processing is described with reference to FIGS. 7, 8, 9, 11, and 12.

Co-occurrence processing detects expressions that express some evaluation in combination with other words (hereinafter, evaluation expressions). Sentence 701 in FIG. 7 is the result of applying the morphological analysis of step S302 to the sentence 「今年のさんまは値段が高い」 ("This year's saury is expensive"; literally, "as for this year's saury, the price is high"). In this example, 「値段」 ("price") is not an evaluation expression on its own, but it becomes one when it appears together with 「高い」 ("high").

FIG. 8 shows the dependency analysis result obtained by syntactic analysis of sentence 701. A dependency is a syntactic analysis result that indicates, when a sentence is divided into phrases, which phrase depends on which.

図８では、文節８０１は文節８０４に係り、文節８０２は文節８０３に係り、文節８０３は文節８０４に係ることを示す。このとき、「値段」を含む文節８０３と「高い」を含む文節８０４に着目すると、文節８０３から文節８０４に係っていることがわかる。従って、「値段」と「高い」は共起関係にあると判断することができるため、評価表現（値段、高い）が有効となる。 In FIG. 8, the phrase 801 relates to the phrase 804, the phrase 802 relates to the phrase 803, and the phrase 803 relates to the phrase 804. At this time, focusing on the phrase 803 including “price” and the phrase 804 including “high”, it can be seen that the phrase 803 is related to the phrase 804. Therefore, since it can be determined that “price” and “high” are in a co-occurrence relationship, the evaluation expression (price, high) is effective.

一方で、単に「値段」と「高い」が文中で出現したから評価表現であると判断することはできない。このような語の組み合わせは、前述したように、文中において関係性のある位置になければならない。 On the other hand, the mere appearance of “price” and “high” in the same sentence is not enough to treat them as an evaluation expression. As described above, such a word combination must occupy syntactically related positions in the sentence.

例えば、文「背の高い人が売っていたさんまの値段はいくらですか？」の構文解析結果を、図９に示す。「値段」を含む文節９０６は文節９０７に係り、「高い」を含む文節９０２は文節９０３に係るため、「値段」と「高い」は共起関係にない、と判断できる。このように、文中において２つの語に関係性があるかどうかを判定するために、前記ステップＳ３０４で得られた構文解析結果を用いる。 For example, FIG. 9 shows the syntax analysis result for the sentence “How much is the price of the saury that the tall person was selling?”. The clause 906 containing “price” depends on the clause 907, and the clause 902 containing “high” (here used in the sense of “tall”) depends on the clause 903, so it can be determined that “price” and “high” are not in a co-occurrence relationship. In this way, the syntax analysis result obtained in step S304 is used to determine whether two words in a sentence are related.
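The contrast between FIG. 8 and FIG. 9 can be sketched as a lookup in a table of dependency links. The sketch below uses only the links stated in the text; the dictionary layout and the function name `modifies` are assumptions made for illustration.

```python
# Dependency links "clause -> clause it depends on", as stated in the text.
FIG8_HEAD_OF = {801: 804, 802: 803, 803: 804}
FIG9_HEAD_OF = {902: 903, 906: 907}  # only the two links the text mentions

# Clause that contains each word of interest in each figure.
FIG8_CLAUSE_OF = {"値段": 803, "高い": 804}
FIG9_CLAUSE_OF = {"値段": 906, "高い": 902}


def modifies(head_of: dict, clause_of: dict, word: str, partner: str) -> bool:
    """True if the clause containing `word` depends directly on the clause
    containing `partner`."""
    return head_of.get(clause_of[word]) == clause_of[partner]


fig8_related = modifies(FIG8_HEAD_OF, FIG8_CLAUSE_OF, "値段", "高い")
fig9_related = modifies(FIG9_HEAD_OF, FIG9_CLAUSE_OF, "値段", "高い")
```

With these links, the pair is related in the FIG. 8 sentence and unrelated in the FIG. 9 sentence, which matches the judgment described above.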

図１１を用いて共起処理の詳細を説明する。図５のステップＳ５０５で追加した前記評価語候補リストからひとつの評価語候補を取り出し、語ＩＤをキーとして評価辞書部１０１から当該語の極性等の情報を含む評価語情報を取得する（ステップＳ１１０２）。 The co-occurrence process is described in detail with reference to FIG. 11. One evaluation word candidate is taken from the evaluation word candidate list populated in step S505 of FIG. 5, and, using its word ID as a key, evaluation word information including the polarity of the word is acquired from the evaluation dictionary unit 101 (step S1102).

なお、後述するが、語ＩＤが類義語の場合は、類語辞書部１０２（図１８に示す類語辞書）から代表評価語を検索した上で、評価辞書部１０１から当該語の評価語情報を取得する。 As described later, when the word ID refers to a synonym, the representative evaluation word is first looked up in the synonym dictionary unit 102 (the synonym dictionary shown in FIG. 18), and the evaluation word information for that word is then acquired from the evaluation dictionary unit 101.

次に、図１８に類語辞書の一例を示す。類語辞書は評価表現及び代表反転語或いは発言者情報における用言等、本システムで利用される様々な語に対して類義語や同義語を管理している。 Next, FIG. 18 shows an example of a synonym dictionary. The synonym dictionary manages synonyms and synonyms for various words used in this system, such as evaluation expressions, representative inverted words, or predicates in speaker information.

各類語にはユニークなＩＤ値が定義されており（図１８における類語ＩＤ１８０１）、一意に管理されている。類語ＩＤ１８０１ごとに、表記１８０２、語ＩＤ１８０３が設定されている。語ＩＤ１８０３は、当該類語の統一表記へのリンクを意味する。例えば、類語ＩＤ値１０２２３である「価格」の統一表記は語ＩＤ値が１６７であり、これは評価辞書の「値段」であることがわかる。なお、類語ＩＤ１８０１についても前述したように任意の演算によって、類語であることがわかるような識別番号になっている。 A unique ID value is defined for each synonym (synonym ID 1801 in FIG. 18) and is uniquely managed. For each synonym ID 1801, a notation 1802 and a word ID 1803 are set. The word ID 1803 means a link to the unified notation of the synonym. For example, the unified notation of “price”, which is the synonym ID value 10223, has a word ID value of 167, which is understood to be “price” in the evaluation dictionary. Note that the synonym ID 1801 is also an identification number that can be recognized as a synonym by an arbitrary calculation as described above.
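The synonym lookup can be sketched with two small tables. The ID values 10223 and 167 come from the example above; the table layout and the `resolve_word_id` helper are assumptions. The text says an unspecified "arbitrary calculation" identifies an ID as a synonym ID, so membership in the synonym table stands in for that calculation here.

```python
# Synonym dictionary (FIG. 18): synonym ID -> notation and linked word ID.
SYNONYM_DICT = {10223: {"notation": "価格", "word_id": 167}}

# Evaluation dictionary: word ID -> unified notation (other columns omitted).
EVALUATION_DICT = {167: {"notation": "値段"}}


def resolve_word_id(any_id: int) -> int:
    """Map a synonym ID to the word ID of its unified notation.
    IDs that are not synonym IDs are returned unchanged."""
    entry = SYNONYM_DICT.get(any_id)
    return entry["word_id"] if entry is not None else any_id


unified = EVALUATION_DICT[resolve_word_id(10223)]["notation"]
```

Here `unified` is the evaluation dictionary entry reached through the synonym “価格”.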

ステップＳ１１０３において、ステップＳ１１０２で獲得した評価語情報から初期極性を付与する。初期極性とは、評価語に初期値として設定されている極性であり、肯定（正数）或いは否定（負数）或いは中立（０）の極性が設定されている。 In step S1103, initial polarity is given from the evaluation word information acquired in step S1102. The initial polarity is a polarity set as an initial value in the evaluation word, and an affirmative (positive number), negative (negative number), or neutral (0) polarity is set.

続くステップＳ１１０４において、前記評価語情報において、共起属性が設定されているかを判定し、共起属性を持たない（共起属性１６０４に値が設定されていない）と判定した場合（ステップＳ１１０４で「いいえ」の場合）、ステップＳ１１１１に進み、共起属性を持っている（共起属性１６０４に値が設定されている）と判定した場合（ステップＳ１１０４において「はい」の場合）、ステップＳ１１０５に進む。 In the subsequent step S1104, it is determined whether or not a co-occurrence attribute is set in the evaluation word information, and if it is determined that the co-occurrence attribute is not present (a value is not set in the co-occurrence attribute 1604) (in step S1104). If “no”, the process proceeds to step S1111 and if it is determined that the co-occurrence attribute is present (a value is set in the co-occurrence attribute 1604) (“Yes” in step S1104), the process proceeds to step S1105. move on.

ステップＳ１１０５では、前述したように、当該評価語候補の係り先文節を確認する処理を実施する。当該評価語候補の係り先文節ＩＤを持つ別の評価語候補が前記評価語候補リスト内にあるかどうかを確認する。 In step S1105, as described above, a process of confirming the related phrase of the evaluation word candidate is performed. It is checked whether another evaluation word candidate having a related phrase ID of the evaluation word candidate is in the evaluation word candidate list.

また、係り先文節の情報は、ステップＳ３０４で得られた構文解析結果を参照することで得ることができる。あるいは、図１０に示す評価語候補リストに構文解析結果として係り先の文節を示す文節ＩＤを備え、この文節ＩＤを参照して、係り先文節の情報を取得しても良い。 Further, the information on the relation clause can be obtained by referring to the syntax analysis result obtained in step S304. Alternatively, the evaluation word candidate list shown in FIG. 10 may be provided with a clause ID indicating a related clause as a syntax analysis result, and information on the related clause may be acquired by referring to this clause ID.

例えば、図８における文節８０３の文節ＩＤが２及び文節８０４の文節ＩＤが３であるとき、構文解析結果では文節ＩＤ２から文節ＩＤ３に係り受け関係が成立している情報が含まれており、これに図１０で示した評価語候補リストの例と合わせてみると、語ＩＤ１６７の文節ＩＤが２であり、語ＩＤ９３８の文節ＩＤが３であることから、語ＩＤ１６７は語ＩＤ９３８と係り受け関係が成立していると判断できる。 For example, when the phrase ID of the phrase 803 in FIG. 8 is 2 and the phrase ID of the phrase 804 is 3, the syntax analysis result includes information in which the dependency relationship is established from the phrase ID 2 to the phrase ID 3. 10 together with the example of the evaluation word candidate list shown in FIG. 10, since the phrase ID of the word ID 167 is 2 and the phrase ID of the word ID 938 is 3, the word ID 167 has a dependency relationship with the word ID 938. It can be judged that it is established.

このような係り受け関係が成立するような評価語候補が前記評価語候補リスト内に存在する場合（ステップＳ１１０６で「はい」の場合）、ステップＳ１１０７に進み、係り受け関係が成立するような評価語候補が前記評価語候補リスト内に存在しない場合（ステップＳ１１０６で「いいえ」の場合）、ステップＳ１１１１に進む。 If there are evaluation word candidates that satisfy such a dependency relationship in the evaluation word candidate list (in the case of “Yes” in step S1106), the process proceeds to step S1107, and the evaluation is such that the dependency relationship is satisfied. If the word candidate does not exist in the evaluation word candidate list (“NO” in step S1106), the process proceeds to step S1111.

ステップＳ１１０７では、係り受け関係が成立した２つの語が共起関係であるかを判定する。前述した評価辞書部１０１から取得した評価語情報を参照することで判定する。例えば、語「値段」の共起属性１６０４に「高い」が設定されているため、共起関係が成立していると判定し（ステップＳ１１０７で「はい」の場合）、ステップＳ１１０８に進み、係り受け関係は成立するが共起関係にないと判定した場合（ステップＳ１１０７で「いいえ」の場合）、ステップＳ１１１１に進む。 In step S1107, it is determined whether the two words between which the dependency relationship holds are in a co-occurrence relationship. The determination is made by referring to the evaluation word information acquired from the evaluation dictionary unit 101 described above. For example, since “high” is set in the co-occurrence attribute 1604 of the word “price”, it is determined that the co-occurrence relationship holds (“Yes” in step S1107), and the process proceeds to step S1108. If it is determined that the dependency relationship holds but the co-occurrence relationship does not (“No” in step S1107), the process proceeds to step S1111.

ステップＳ１１０８に進むと、極性を伴った評価表現を検出する。 In step S1108, an evaluation expression with polarity is detected.

一方、ステップＳ１１０４で「いいえ」の場合或いはステップＳ１１０６で「いいえ」の場合或いはステップＳ１１０７で「いいえ」の場合は、共起関係等がないため、前記評価語候補単独で評価表現が成立するかどうかを判定する。前記初期極性が中立であると判定した場合（ステップＳ１１１１で「いいえ」の場合）は、極性がないため評価表現として検出しないためステップＳ１１１０に進み、前記初期極性が中立でないと判定した場合（ステップＳ１１１１で「はい」の場合）、ステップＳ１１０８に進み、極性を伴った評価表現として検出する。 On the other hand, if the result is “No” in step S1104, step S1106, or step S1107, there is no co-occurrence relationship, so it is determined whether the evaluation word candidate forms an evaluation expression on its own. If the initial polarity is determined to be neutral (“No” in step S1111), the candidate has no polarity and is not detected as an evaluation expression, and the process proceeds to step S1110. If the initial polarity is determined not to be neutral (“Yes” in step S1111), the process proceeds to step S1108, and the candidate is detected as an evaluation expression with polarity.
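The branch structure of steps S1102 to S1111 can be sketched as a single function. This is an illustration under stated assumptions, not the apparatus itself: the `WordInfo` and `Candidate` classes and the `detect` function are invented here, word ID 938 is assumed to be “高い”, its standalone polarity is assumed neutral, and word ID 501 for “病気” is hypothetical. Word ID 167, clause IDs 2 and 3, and the initial polarity −1 of “病気” come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordInfo:
    notation: str
    polarity: int                     # >0 positive, <0 negative, 0 neutral
    cooccur: frozenset = frozenset()  # co-occurrence attribute; empty = unset


@dataclass(frozen=True)
class Candidate:
    word_id: int
    clause_id: int


def detect(cand, candidates, head_of, dictionary):
    """Return (expression, initial polarity), or None if nothing is detected."""
    info = dictionary[cand.word_id]               # S1102: look up the word
    polarity = info.polarity                      # S1103: initial polarity
    if info.cooccur:                              # S1104: attribute set?
        target = head_of.get(cand.clause_id)      # S1105: dependency target
        for other in candidates:                  # S1106: candidate there?
            if other is cand or other.clause_id != target:
                continue
            partner = dictionary[other.word_id].notation
            if partner in info.cooccur:           # S1107: co-occurrence?
                return (info.notation, partner), polarity   # S1108
    if polarity != 0:                             # S1111: non-neutral alone?
        return (info.notation,), polarity         # S1108
    return None                                   # S1110: not detected


dictionary = {
    167: WordInfo("値段", 0, frozenset({"高い"})),
    938: WordInfo("高い", 0),   # assumed notation and polarity
    501: WordInfo("病気", -1),  # hypothetical word ID
}
candidates = [Candidate(167, 2), Candidate(938, 3)]
head_of = {2: 3}  # clause ID 2 depends on clause ID 3

price_result = detect(candidates[0], candidates, head_of, dictionary)
high_result = detect(candidates[1], candidates, head_of, dictionary)
sick = Candidate(501, 0)
sick_result = detect(sick, [sick], {}, dictionary)
```

In this sketch the pair (値段, 高い) is detected with the neutral initial polarity that FIG. 14 later shows for it, while “高い” alone is not detected because its assumed polarity is neutral.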

ステップＳ１１０８で検出した評価表現に対して、続くステップＳ１１０９で否定表現によって評価極性が変化しないかどうかを判定する。図１２に否定表現処理のフローチャートを示す。 For the evaluation expression detected in step S1108, it is determined in subsequent step S1109 whether or not the evaluation polarity is changed by a negative expression. FIG. 12 shows a flowchart of negative expression processing.

まず、ステップＳ１２０１において、評価辞書部１０１から当該評価語の分野情報（分野１６０６）を取得する。分野情報とは評価語が属する分野を表し、例えば、「癌」「病気」「怪我」といった語は「医療」分野に属する、と定義している。 First, in step S1201, field information (field 1606) of the evaluation word is acquired from the evaluation dictionary unit 101. The field information represents the field to which the evaluation word belongs. For example, the words “cancer”, “disease”, and “injury” are defined as belonging to the “medical” field.

分野情報には代表反転語が定義されており（図１７に示す分野反転語のうち代表反転語１７０２）、例えば、「医療」の代表反転語は「治る」と定義されている。即ち、「医療」関連の評価語が検出されたとき、その共起関係に「治る」が出現している場合、極性を反転させることを目的とする。 In the field information, a representative inverted word is defined (represented inverted word 1702 among the field inverted words shown in FIG. 17). For example, the representative inverted word of “medical” is defined as “cure”. That is, when an evaluation word related to “medical” is detected, if “cure” appears in the co-occurrence relationship, the object is to reverse the polarity.

次に、図１７に分野反転語の一例を示す。分野１７０１に対して代表反転語１７０２がひとつ定義されている。語ＩＤ１７０３は代表反転語１７０２のＩＤ値を示している。代表反転語のＩＤ値も、前述したように任意の演算を実施することで、代表反転語であることがわかるようになっている。 Next, FIG. 17 shows an example of the field inversion word. One representative inversion word 1702 is defined for the field 1701. A word ID 1703 indicates the ID value of the representative inversion word 1702. As described above, the ID value of the representative inversion word can be understood to be a representative inversion word by performing an arbitrary calculation.

続くステップＳ１２０２において、抽出された評価表現の係り先に、前述した代表反転語が含まれているかどうかを判定する。評価表現が共起関係にないと判定した場合（図１１においてステップ１１０４で「いいえ」であり且つステップＳ１１１１で「はい」の場合）は、係り先が存在しないためステップＳ１２０４に進む。 In the subsequent step S1202, it is determined whether the dependency target of the extracted evaluation expression contains the representative inverted word described above. If the evaluation expression is determined not to be in a co-occurrence relationship (“No” in step S1104 and “Yes” in step S1111 in FIG. 11), no dependency target exists, so the process proceeds to step S1204.

一方、共起関係が成立していると判定した場合は、係り先の文節を確認し当該代表反転語（代表反転語１７０２）があれば（ステップＳ１２０２で「はい」の場合）ステップＳ１２０３に進み、なければ（ステップＳ１２０２で「いいえ」の場合）ステップＳ１２０４に進む。 On the other hand, if it is determined that the co-occurrence relationship is established, the related phrase is confirmed, and if there is the representative inverted word (representative inverted word 1702) (in the case of “Yes” in step S1202), the process proceeds to step S1203. If not (if “NO” in step S1202), the process proceeds to step S1204.

なお、類語辞書部１０２に代表反転語の類語が設定されている場合は、代表反転語と同様に処理する。例えば、代表反転語「治る」の類語として「治療する」「完治する」（表記１８０２）等が相当する。 When a synonym of a representative inverted word is set in the synonym dictionary unit 102, the same processing as that of the representative inverted word is performed. For example, “treat”, “completely cure” (notation 1802) and the like correspond to the synonyms of the representative inverted word “cure”.

ステップＳ１２０３では、代表反転語による否定情報を付与する。抽出された前記評価表現の極性を反転させるのではなく、否定情報を付与することのみを実施し、最終的な評価極性の決定は極性判定部１０５で行う。 In step S1203, negative information based on representative inverted words is given. Instead of inverting the polarity of the extracted evaluation expression, only the negative information is given, and the final evaluation polarity is determined by the polarity determination unit 105.

続くステップＳ１２０４において、抽出された評価表現の係り先に否定表現が含まれているかどうかを判定する。共起関係が成立していると判定した場合は、係り先の文節に例えば助動詞「ない」が含まれていないか、或いは「ありません」といった否定表現が存在しないかを確認し、否定表現を含まないと判定した場合は、否定表現処理を終了する。否定表現を含む場合（ステップＳ１２０４で「はい」の場合）は、ステップＳ１２０５に進み、ステップＳ１２０３と同様に、否定情報を付与する。 In the subsequent step S1204, it is determined whether the dependency target of the extracted evaluation expression contains a negative expression. If the co-occurrence relationship is determined to hold, the dependency-target clause is checked for, e.g., the auxiliary verb “ない” (“not”) or a negative expression such as “ありません” (“there is not”). If no negative expression is found, the negative expression process ends. If a negative expression is found (“Yes” in step S1204), the process proceeds to step S1205, and negative information is attached in the same manner as in step S1203.

否定情報処理の一例をあげる。文「病気にならなかった」の場合、構文解析結果は
［病気 / に］→［なる / ない / た］
となり、評価表現「病気」の係り先に否定助動詞「ない」が存在するため、評価表現「病気」の初期極性（−１）に否定情報が付与される。また、文「病気が治った」の場合、構文解析結果は
［病気 / が］→［治る / た］
となり、評価表現「病気」の係り先に分野の代表反転語「治る」が存在するため、評価表現「病気」の初期極性（−１）に否定情報が付与される。また、文「病気が治らなかった」の場合、構文解析結果は
［病気 / が］→［治る / ない / た］
となり、評価表現「病気」の係り先に分野の代表反転語「治る」が存在するため、評価表現「病気」の初期極性（−１）に否定情報が付与され、さらに否定助動詞「ない」が存在するため、さらに否定情報が付与される。即ち、初期極性（−１）に否定情報が２つ付与される。
An example of negative information processing is given. For the sentence “病気にならなかった” (“I did not get sick”), the syntax analysis result is
［病気 / に］→［なる / ない / た］
Since the negative auxiliary verb “ない” exists in the dependency target of the evaluation expression “病気” (“disease”), negative information is attached to the initial polarity (−1) of the evaluation expression “病気”. For the sentence “病気が治った” (“The disease was cured”), the syntax analysis result is
［病気 / が］→［治る / た］
Since the representative inverted word of the field, “治る” (“cure”), exists in the dependency target of the evaluation expression “病気”, negative information is attached to the initial polarity (−1) of the evaluation expression “病気”. For the sentence “病気が治らなかった” (“The disease was not cured”), the syntax analysis result is
［病気 / が］→［治る / ない / た］
Since the representative inverted word “治る” exists in the dependency target of the evaluation expression “病気”, negative information is attached to the initial polarity (−1) of the evaluation expression “病気”, and since the negative auxiliary verb “ない” also exists, further negative information is attached. That is, two pieces of negative information are attached to the initial polarity (−1).
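The three examples above can be reproduced with a short sketch of steps S1201 to S1205. The table contents are taken from the examples in the text; the function name `count_negations`, the table layout, and the representation of a dependency-target clause as a list of base-form morphemes are assumptions. Treating “ありません” as a single entry is a simplification, since a real morphological analyzer would likely split it.

```python
# Field of each evaluation word and the representative inverted word per field.
FIELD_OF = {"癌": "医療", "病気": "医療", "怪我": "医療"}
REPRESENTATIVE_INVERTED_WORD = {"医療": "治る"}

# Synonyms of the representative inverted word (synonym dictionary entries).
INVERTED_WORD_SYNONYMS = {"治療する": "治る", "完治する": "治る"}

# Negative expressions checked in step S1204 (simplified list).
NEGATIVE_EXPRESSIONS = {"ない", "ありません"}


def count_negations(evaluation_word: str, target_morphemes: list) -> int:
    """Count the pieces of negative information attached to an evaluation
    word, given the base-form morphemes of its dependency-target clause."""
    count = 0
    field = FIELD_OF.get(evaluation_word)                       # S1201
    inverted = REPRESENTATIVE_INVERTED_WORD.get(field)
    normalized = [INVERTED_WORD_SYNONYMS.get(m, m) for m in target_morphemes]
    if inverted is not None and inverted in normalized:         # S1202
        count += 1                                              # S1203
    if any(m in NEGATIVE_EXPRESSIONS for m in target_morphemes):  # S1204
        count += 1                                              # S1205
    return count


did_not_get_sick = count_negations("病気", ["なる", "ない", "た"])
was_cured = count_negations("病気", ["治る", "た"])
was_not_cured = count_negations("病気", ["治る", "ない", "た"])
```

As in the text, the function only records negative information and leaves the polarity itself untouched, because the final polarity is decided later by the polarity determination unit 105.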

図１１に戻り、以上の処理を前記評価語候補リスト内のすべての評価語候補について繰り返し、共起処理を終了する。 Returning to FIG. 11, the above process is repeated for all the evaluation word candidates in the evaluation word candidate list, and the co-occurrence process is terminated.

図６に戻り、以上の処理で評価語抽出処理を終了する。この段階で文から、評価表現と否定情報の有無及び構文解析結果等を得ている。 Returning to FIG. 6, the evaluation word extraction processing is completed by the above processing. At this stage, the evaluation expression, the presence or absence of negative information, the result of parsing, etc. are obtained from the sentence.

前記評価語抽出処理を行う一方で、発言者情報抽出部１０４において、発言者情報の抽出処理が実施される。発言者情報の抽出とは、文の主格にあたる発言者の「立場」を推定することである。 While the evaluation word extraction process is performed, the speaker information extraction unit 104 performs the speaker information extraction process. Extracting speaker information means estimating the “position” (standpoint) of the speaker, who corresponds to the grammatical subject of the sentence.

例えば、文「値段が高かったのでさんまは買わなかった」の場合、動詞「買う」が使用されていることから発言者情報は「消費者」と推定できる。 For example, in the sentence “I did not buy saury because the price was high”, the verb “buy” is used, so the speaker information can be estimated to be “consumer”.

ここで、前述した評価表現「値段が高い」を例に説明する。発言者情報が「消費者」の場合、評価表現「値段が高い」は否定極性になることが容易に推測される。一方で発言者情報が「供給者」の場合は肯定極性になることが推測される。 Here, the evaluation expression “price is high” will be described as an example. When the speaker information is “consumer”, it is easily estimated that the evaluation expression “price is high” has a negative polarity. On the other hand, when the speaker information is “supplier”, it is presumed that the polarity is positive.

即ち、評価表現「値段が高い」について、文「値段が高かったのでさんまは買わなかった」では否定極性となり、文「昨日のさんまは値段が高く売れた」の場合では肯定極性と判定すべきである。このような発言者情報を抽出する手法について、図１３を用いて説明する。 That is, the evaluation expression “price is high” should be judged as negative polarity in the sentence “I did not buy saury because the price was high”, and as positive polarity in the sentence “Yesterday's saury sold at a high price”. A method for extracting such speaker information is described with reference to FIG. 13.

図１３のステップＳ１３０１において、前記構文解析の結果から最終文節を選択する。続くステップＳ１３０２で最終文節が引用節を伴う述部であるかどうかを判定する。引用節とは述部が「と思う」或いは「と考える」等といった述部を要しているものであり、本来評価すべき文章は引用節に含まれている。 In step S1301 of FIG. 13, the final clause is selected from the syntax analysis result. In the subsequent step S1302, it is determined whether the final clause is a predicate that takes a quoted clause. A quoted clause is one governed by a predicate such as “と思う” (“I think that ...”) or “と考える” (“I consider that ...”), and the text that should actually be evaluated is contained in the quoted clause.

引用節を伴うと判定した場合（ステップＳ１３０２で「はい」の場合）、ステップＳ１３０３に進み、ステップＳ１３０１で選択した最終節の代わりに引用節を選択する。具体的には、係り受け解析結果を参照し、最終節に係る文節を選択することで引用節を選択することができる。 If it is determined that a quoted clause is present (“Yes” in step S1302), the process proceeds to step S1303, and the quoted clause is selected in place of the final clause selected in step S1301. Specifically, the quoted clause can be selected by referring to the dependency analysis result and choosing the clause that depends on the final clause.

続くステップＳ１３０４において、選択した文節に発言者用言が含まれているかどうかを判定する。発言者用言とは、例えば、前述した「買う」或いは「売る」といった用言を示し、発言者情報として発言者情報辞書部１０６（図１９に示す発言者情報辞書）に登録されている。 In the subsequent step S1304, it is determined whether the selected clause contains a speaker predicate. A speaker predicate is a predicate such as “buy” or “sell” mentioned above, and is registered as speaker information in the speaker information dictionary unit 106 (the speaker information dictionary shown in FIG. 19).

次に、図１９に発言者情報辞書の一例を示す。発言者情報辞書も他の辞書と同様に、ユニークなＩＤ値である発言者情報ＩＤ１９０１を持ち、用言１９０２、評価表現１９０３、極性１９０４、発言者属性１９０５を持つ。発言者情報ＩＤについても前述したように任意の演算によって、発言者情報であることがわかるようになっている。 Next, FIG. 19 shows an example of the speaker information dictionary. Like the other dictionaries, the speaker information dictionary has a unique ID value, the speaker information ID 1901, together with a predicate 1902, an evaluation expression 1903, a polarity 1904, and a speaker attribute 1905. As described above, the speaker information ID is also assigned so that an arbitrary calculation identifies it as speaker information.

選択した文節に発言者用言が含まれると判定した場合（ステップＳ１３０４で「はい」の場合）、ステップＳ１３０５に進み、当該発言者情報を取得し、発言者用言を含まないと判定した場合（ステップＳ１３０４で「いいえ」の場合）、発言者情報がないため、発言者情報抽出処理を終了する。 If it is determined that the selected clause contains a speaker predicate (“Yes” in step S1304), the process proceeds to step S1305 and the corresponding speaker information is acquired. If it is determined that the clause does not contain a speaker predicate (“No” in step S1304), there is no speaker information, so the speaker information extraction process ends.

なお、発言者用言の検索には類語辞書部１０２を利用することも可能であり、この場合は類語辞書から代表用言を獲得して発言者情報辞書部１０６から発言者情報を得る。例えば、発言者用言の代表用言が「買う」のとき、類語辞書部１０２に「購入する」が登録されているような場合である。 Note that the synonym dictionary unit 102 can also be used when searching for speaker predicates. In that case, the representative predicate is obtained from the synonym dictionary, and the speaker information is then obtained from the speaker information dictionary unit 106. This applies, for example, when the representative speaker predicate is “買う” (“buy”) and “購入する” (“purchase”) is registered in the synonym dictionary unit 102.

続くステップＳ１３０６において、発言者用言を含む文節に逆接の接続助詞があるかを判定する。具体的には「が」「けれども」「のに」等が当該文節に含まれているかどうか、を判定し、逆接の接続助詞を含むと判定した場合（ステップＳ１３０６で「はい」の場合）、ステップＳ１３０７に進み、逆接の接続助詞による極性補正情報を付与する。一方、逆接の接続助詞を含まないと判定した場合（ステップＳ１３０６で「いいえ」の場合）、そのまま発言者情報抽出処理を終了する。 In the subsequent step S1306, it is determined whether the clause containing the speaker predicate includes an adversative conjunctive particle. Specifically, it is determined whether “が”, “けれども”, “のに”, or the like is included in the clause. If an adversative conjunctive particle is found (“Yes” in step S1306), the process proceeds to step S1307, and polarity correction information based on the adversative conjunctive particle is attached. If no adversative conjunctive particle is found (“No” in step S1306), the speaker information extraction process ends as is.
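Steps S1301 to S1307 can be sketched as follows. Everything structural here is an assumption made for illustration: a clause is a list of (base form, part of speech) pairs, `head_of` maps a clause index to the index it depends on, and when several clauses depend on the final clause the one nearest to it is taken as the quoted clause. The entry for “買う” follows FIG. 14; the polarity for “売る” is inferred from the remark that a supplier would view a high price positively.

```python
SPEAKER_DICT = {
    "買う": {"attribute": "消費者", "expression": ("値段", "高い"), "polarity": -1},
    "売る": {"attribute": "供給者", "expression": ("値段", "高い"), "polarity": 1},
}
PREDICATE_SYNONYMS = {"購入する": "買う"}   # synonym -> representative predicate
QUOTE_PREDICATES = {"思う", "考える"}        # predicates that take a quoted clause
ADVERSATIVE_PARTICLES = {"が", "けれども", "のに"}


def extract_speaker(clauses, head_of):
    """Return speaker information for a sentence, or None if there is none."""
    selected = len(clauses) - 1                                  # S1301
    if any(base in QUOTE_PREDICATES for base, _ in clauses[selected]):  # S1302
        dependents = [i for i, h in head_of.items() if h == selected]
        if not dependents:
            return None
        selected = max(dependents)                               # S1303
    for base, _ in clauses[selected]:                            # S1304
        key = PREDICATE_SYNONYMS.get(base, base)
        if key in SPEAKER_DICT:
            info = dict(SPEAKER_DICT[key], predicate=key)        # S1305
            info["adversative"] = any(                           # S1306, S1307
                pos == "接続助詞" and b in ADVERSATIVE_PARTICLES
                for b, pos in clauses[selected]
            )
            return info
    return None


# 「値段が高かったのでさんまは買わなかった」
clauses = [
    [("値段", "名詞"), ("が", "格助詞")],
    [("高い", "形容詞"), ("た", "助動詞"), ("ので", "接続助詞")],
    [("さんま", "名詞"), ("は", "係助詞")],
    [("買う", "動詞"), ("ない", "助動詞"), ("た", "助動詞")],
]
head_of = {0: 1, 1: 3, 2: 3}
speaker = extract_speaker(clauses, head_of)
```

The part-of-speech check matters because “が” is also a case particle, as in “値段が”; without it the first clause of this sentence would look adversative.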

なお、発言者情報は常に文中で明示されているわけでない。従って、例えば、ひとつ前の発言者情報抽出処理の結果を一時的に記憶しておき、後段の文章において発言者情報が記載されていない場合にのみ前記一時記憶した前文の発言者情報を参照する、といったこともできる。 Note that speaker information is not always stated explicitly in a sentence. Therefore, for example, the result of the previous speaker information extraction process can be stored temporarily, and the stored speaker information of the preceding sentence can be referred to only when a subsequent sentence contains no speaker information.

発言者情報抽出処理を終了した段階で、発言者用言を検出している場合は、ステップＳ１３０５で取得した発言者情報と、逆接の接続助詞を検出している場合は、ステップＳ１３０７で付与した極性補正情報を獲得している。以上の発言者情報抽出処理の結果と、前述した評価語抽出処理の結果を合わせて、図３のステップＳ３０７における極性判定処理を実施する。 When the speaker information extraction process ends, the speaker information acquired in step S1305 has been obtained if a speaker predicate was detected, and the polarity correction information attached in step S1307 has been obtained if an adversative conjunctive particle was detected. The polarity determination process in step S307 of FIG. 3 is performed by combining the result of this speaker information extraction process with the result of the evaluation word extraction process described above.

図１４には、評価語抽出処理の結果と発言者情報抽出処理の結果と、の一例を示す。文「値段が高かったのでさんまは買わなかった」を処理したとき、評価語抽出処理の結果は１４０１に、発言者情報抽出処理の結果は１４０５に示す。これらは、所定の記憶領域へテーブルを備え、当該テーブルへ結果を記憶する構成として良い。 FIG. 14 shows an example of the result of the evaluation word extraction process and the result of the speaker information extraction process. When the sentence “Sanma was not bought because the price was high” is processed, the result of the evaluation word extraction process is shown in 1401, and the result of the speaker information extraction process is shown in 1405. These may be configured to include a table in a predetermined storage area and store the result in the table.

評価語抽出処理では、評価表現１４０２と初期極性１４０４が抽出される。当該評価表現を含む部分文字列「値段が高かったので」には代表反転語や否定表現を含まないため、結果１４０１に否定情報は存在しない。なお、語ＩＤ１４０３は、評価辞書部１０１に登録されている語「値段」のＩＤ値になる。 In the evaluation word extraction process, an evaluation expression 1402 and an initial polarity 1404 are extracted. Since the partial character string “because the price was high” including the evaluation expression does not include the representative inversion word or the negative expression, there is no negative information in the result 1401. The word ID 1403 is an ID value of the word “price” registered in the evaluation dictionary unit 101.

発言者情報抽出処理では、最終文節から抽出された用言１４０７とその発言者情報である発言者情報ＩＤ１４０６、極性１４０８及び評価表現１４０９が抽出される。逆説の接続助詞は存在しない。なお、語ＩＤ１４１０は、発言者情報の一部である評価表現１４０９に設定されている語「値段」のＩＤ値になる。 In the speaker information extraction process, the predicate 1407 extracted from the final clause and its speaker information, namely the speaker information ID 1406, the polarity 1408, and the evaluation expression 1409, are extracted. No adversative conjunctive particle is present. The word ID 1410 is the ID value of the word “price” set in the evaluation expression 1409, which is part of the speaker information.

次に、図１５を用いて極性判定処理を説明する。極性判定処理は、前記評価語抽出結果及び前記発言者情報抽出結果から最終的な評価極性を決定する処理である。 Next, the polarity determination process will be described with reference to FIG. The polarity determination process is a process of determining a final evaluation polarity from the evaluation word extraction result and the speaker information extraction result.

図１５のステップＳ１５０１において、前記評価語抽出処理で抽出結果が得られたかどうかを判定し、抽出結果がないと判定した場合（ステップＳ１５０１で「いいえ」の場合）、極性を設定する評価表現が存在しないため、極性判定処理を終了し、抽出結果があると判定した場合（ステップＳ１５０１で「はい」の場合）、ステップＳ１５０２に進む。 In step S1501 of FIG. 15, it is determined whether or not an extraction result has been obtained by the evaluation word extraction process. If it is determined that there is no extraction result (“No” in step S1501), an evaluation expression for setting the polarity is Since it does not exist, the polarity determination process is terminated, and when it is determined that there is an extraction result (in the case of “Yes” in step S1501), the process proceeds to step S1502.

続くステップＳ１５０２において、前記発言者抽出処理で抽出結果が得られたかどうかを判定し、抽出結果がないと判定した場合（ステップＳ１５０２で「いいえ」の場合）、ステップＳ１５０５に進み、抽出結果があると判定した場合（ステップＳ１５０２で「はい」の場合）、ステップＳ１５０３に進む。 In subsequent step S1502, it is determined whether or not an extraction result has been obtained by the speaker extraction process. If it is determined that there is no extraction result (in the case of “No” in step S1502), the process proceeds to step S1505, and there is an extraction result. (Yes in step S1502), the process proceeds to step S1503.

ステップＳ１５０３では前記発言者抽出処理の結果得られた発言者用言に対して設定されている評価表現が、前記評価語抽出処理の結果に存在するかどうかを判定する。 In step S1503, it is determined whether or not an evaluation expression set for the speaker word obtained as a result of the speaker extraction process exists in the result of the evaluation word extraction process.

図１４の例で言えば、評価表現１４０９「値段、高い」が発言者情報抽出処理で得られているので、評価語抽出処理の結果である１４０１における評価表現１４０２と一致するかどうかを判定する。 In the example of FIG. 14, since the evaluation expression 1409 “price, high” is obtained by the speaker information extraction process, it is determined whether or not it matches the evaluation expression 1402 in 1401 which is the result of the evaluation word extraction process. .

結果が一致すると判定した場合（ステップＳ１５０３で「はい」の場合）ステップＳ１５０４に進み、一致しないと判定した場合（ステップＳ１５０３で「いいえ」の場合）、ステップ１５０５に進む。 If it is determined that the results match (“Yes” in step S1503), the process proceeds to step S1504. If it is determined that they do not match (“No” in step S1503), the process proceeds to step S1505.

続くステップＳ１５０４では、前述した２つの評価表現が一致したため、発言者情報における極性を評価語抽出結果に適用する。図１４の例で言えば、極性１４０４の中立（０）を極性１４０８の否定（−１）に置き換える。 In subsequent step S1504, since the two evaluation expressions described above match, the polarity in the speaker information is applied to the evaluation word extraction result. In the example of FIG. 14, the neutrality (0) of the polarity 1404 is replaced with the negation (−1) of the polarity 1408.

続くステップＳ１５０５からステップＳ１５０７までにおいて、前記評価語抽出処理において検出した否定情報を適用する（図１２におけるステップＳ１２０３及びステップＳ１２０５の処理）。 In subsequent steps S1505 to S1507, the negative information detected in the evaluation word extraction process is applied (the processes in steps S1203 and S1205 in FIG. 12).

さらに続くステップＳ１５０８において、発言者情報抽出処理において逆接の接続助詞に起因する極性情報（図１３におけるステップＳ１３０７の処理）がないかを判定する。極性情報があると判定した場合（ステップＳ１５０８で「はい」の場合）、ステップＳ１５０９に進み極性を反転し、極性情報がないと判定した場合（ステップＳ１５０８で「いいえ」の場合）、極性補正を行わず極性決定処理を終了する。 In the further subsequent step S1508, it is determined whether the speaker information extraction process produced polarity information resulting from an adversative conjunctive particle (the processing of step S1307 in FIG. 13). If it is determined that such polarity information exists (“Yes” in step S1508), the process proceeds to step S1509 and the polarity is inverted. If it is determined that no such polarity information exists (“No” in step S1508), the polarity determination process ends without polarity correction.

図１４の例で言えば、前述したように極性１４０４が中立（０）から否定（−１）に置き換わり、その他の否定情報は付与されていないことから、全体として否定（−１）として極性が決定する。従って、文「値段が高かったのでさんまは買わなかった」の評価極性は否定となる。 In the example of FIG. 14, as described above, the polarity 1404 is replaced from neutral (0) with negative (−1), and no other negative information is attached, so the polarity is determined to be negative (−1) overall. Therefore, the evaluation polarity of the sentence “I did not buy saury because the price was high” is negative.
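The polarity determination of steps S1501 to S1509 can be sketched as one function. The text says that negative information is "applied" in steps S1505 to S1507 without giving the arithmetic, so this sketch assumes that each piece of negative information flips the sign once. The dictionary keys and the function name `determine_polarity` are likewise assumptions.

```python
def determine_polarity(evaluation, speaker):
    """Combine the evaluation word extraction result with the speaker
    information extraction result. Either argument may be None."""
    if evaluation is None:                                  # S1501
        return None
    polarity = evaluation["polarity"]
    if speaker is not None:                                 # S1502
        if speaker["expression"] == evaluation["expression"]:   # S1503
            polarity = speaker["polarity"]                  # S1504
    for _ in range(evaluation.get("negations", 0)):         # S1505 to S1507
        polarity = -polarity    # assumed: one sign flip per negative information
    if speaker is not None and speaker.get("adversative"):  # S1508
        polarity = -polarity                                # S1509
    return polarity


# FIG. 14: 「値段が高かったのでさんまは買わなかった」
fig14 = determine_polarity(
    {"expression": ("値段", "高い"), "polarity": 0, "negations": 0},
    {"expression": ("値段", "高い"), "polarity": -1, "adversative": False},
)

# 「病気が治った」 (one negation) and 「病気が治らなかった」 (two negations)
cured = determine_polarity(
    {"expression": ("病気",), "polarity": -1, "negations": 1}, None)
not_cured = determine_polarity(
    {"expression": ("病気",), "polarity": -1, "negations": 2}, None)
```

Under the sign-flip assumption, two pieces of negative information cancel out, so “病気が治らなかった” keeps the negative polarity of “病気”.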

次に、各辞書の一例を、図１６〜図１９に示したが、評価辞書部１０１及び類語辞書部１０２及び発言者情報辞書部１０６で使用するすべての語句をひとつのパターンマッチングマシン（トライ法等で構築）に登録して処理することで、システム或いは装置において効率的な抽出処理が実施できることは言うまでもない。 Examples of each dictionary are shown in FIGS. 16 to 19. It goes without saying that registering all words and phrases used by the evaluation dictionary unit 101, the synonym dictionary unit 102, and the speaker information dictionary unit 106 in a single pattern matching machine (built with, for example, a trie) allows the system or apparatus to perform the extraction process efficiently.
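One way to picture the single pattern matching machine is a trie that stores entries from all three dictionaries with a tag saying where each entry came from. The sketch below is a plain character-level trie scanned from every start position, not the Aho-Corasick automaton mentioned earlier, and it matches on characters rather than morphemes. The entry tags and function names are assumptions.

```python
END = "__end__"  # multi-character key, so it cannot collide with a text character


def build_trie(entries):
    """Build a nested-dict trie from (word, payload) pairs."""
    root = {}
    for word, payload in entries:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(payload)
    return root


def find_all(trie, text):
    """Return (start, end, payload) for every dictionary word found in text."""
    hits = []
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            for payload in node.get(END, []):
                hits.append((start, end + 1, payload))
    return hits


trie = build_trie([
    ("値段", ("evaluation", 167)),
    ("価格", ("synonym", 10223)),
    ("買う", ("speaker", "買う")),
])
hits = find_all(trie, "価格が高いので買う")
```

A single pass over the text then yields evaluation words, synonyms, and speaker predicates together, which is the efficiency the paragraph above refers to.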

以上、本発明によれば、評価対象となる文書が、肯定的な評価あるいは否定的な評価等であるかを分析するにあたり、評価対象となる文書を構成する評価語の分野属性を考慮して分析を行うことが可能となるので、ユーザへの手間をかけることなく、より精度良く評価対象文書の分析を行うことができる。 As described above, according to the present invention, in analyzing whether a document to be evaluated is a positive evaluation or a negative evaluation, the field attributes of evaluation words constituting the document to be evaluated are considered. Since the analysis can be performed, it is possible to analyze the evaluation target document with higher accuracy without taking time and effort for the user.

以上、実施形態例を詳述したが、本発明は、例えば、システム、装置、方法、プログラムもしくは記憶媒体等としての実施態様を取ることが可能であり、具体的には、複数の機器から構成するシステムに適用しても良いし、また、一つの機器からなる装置に適用しても良い。 Although the embodiments have been described in detail above, the present invention can take an embodiment as, for example, a system, an apparatus, a method, a program, or a storage medium, and specifically includes a plurality of devices. The present invention may be applied to a system that performs such a process, or may be applied to an apparatus that includes a single device.

なお、上述した各種データの構成及びその内容はこれに限定されるものではなく、用途や目的に応じて、様々な内容で構成されることは言うまでもない。 It should be noted that the configuration and contents of the various data described above are not limited to this, and it is needless to say that they are configured with various contents according to applications and purposes.

また、本発明は、システム或いは装置にプログラムを供給することによって達成される場合にも適用できることは言うまでもない。この場合、本発明を達成するためのソフトウェアによって表されるプログラムを格納した記憶媒体を該システム或いは装置に読み出すことによって、そのシステム或いは装置が、本発明の効果を享受することが可能となる。 Needless to say, the present invention can also be applied to a case where the present invention is achieved by supplying a program to a system or apparatus. In this case, by reading a storage medium storing a program represented by software for achieving the present invention into the system or apparatus, the system or apparatus can enjoy the effects of the present invention.

さらに、本発明を達成するためのソフトウェアによって表されるプログラムをネットワーク上のサーバ、データベース等から通信プログラムによりダウンロードして読み出すことによって、そのシステム或いは装置が、本発明の効果を享受することが可能となる。 Furthermore, by downloading and reading a program represented by software for achieving the present invention from a server, database, etc. on a network using a communication program, the system or apparatus can enjoy the effects of the present invention. It becomes.

なお、上述した各実施形態及びその変形例を組み合わせた構成もすべて本発明に含まれるものである。 In addition, all the structures which combined each embodiment mentioned above and its modification are also included in this invention.

DESCRIPTION OF SYMBOLS
１００文書分析装置 100 Document analysis apparatus
１０１評価辞書部 101 Evaluation dictionary unit
１０２類語辞書部 102 Synonym dictionary unit
１０３評判情報抽出部 103 Reputation information extraction unit
１０４発言者情報抽出部 104 Speaker information extraction unit
１０５極性判定部 105 Polarity determination unit
１０６発言者情報辞書部 106 Speaker information dictionary unit
１０７テキスト文書 107 Text document
２０１ＣＰＵ 201 CPU
２０２ＲＡＭ 202 RAM
２０３ＲＯＭ 203 ROM
２０４システムバス 204 System bus
２０５入力コントローラ 205 Input controller
２０６ビデオコントローラ 206 Video controller
２０７メモリコントローラ 207 Memory controller
２０８通信Ｉ／Ｆ（インターフェース）コントローラ 208 Communication I/F (interface) controller
２０９入力装置 209 Input device
２１０表示装置 210 Display device
２１１外部メモリ 211 External memory

Claims

An information processing apparatus that analyzes positive or negative evaluations in a document, comprising:
evaluation information acquisition means for acquiring evaluation word information, obtained by morphological analysis and syntactic analysis of the document, that includes an evaluation word to be evaluated and an evaluation polarity of the evaluation word;
dependency evaluation word acquisition means for acquiring another evaluation word, different from the evaluation word, that is in a dependency relation with the evaluation word of the evaluation word information acquired by the evaluation information acquisition means;
co-occurrence relation determination means for determining whether the evaluation word acquired by the dependency evaluation word acquisition means and the evaluation word with which it is in the dependency relation are in a co-occurrence relation; and
change information acquisition means for acquiring change information for changing the evaluation polarity of the evaluation words determined by the co-occurrence relation determination means to be in the co-occurrence relation.
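The four means recited in the claim above can be illustrated with a minimal Python sketch. It is not the implementation described in this publication: the `Token` structure, the `EVALUATION_DICT` and `CHANGE_TABLE` contents, and the function names are all hypothetical, and the morphological and dependency analysis is assumed to have been done already, so each token simply carries the index of the token it depends on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    surface: str          # word as it appears in the document
    head: Optional[int]   # index of the token this one depends on (None for the root)


# Hypothetical evaluation dictionary: evaluation word -> default polarity.
EVALUATION_DICT = {"cheap": "positive", "good": "positive", "bad": "negative"}

# Hypothetical co-occurrence table: (dependent word, head word) -> polarity to apply.
CHANGE_TABLE = {("cheap", "bad"): "negative"}


def acquire_evaluation_words(tokens):
    """Evaluation information acquisition: token index -> (word, default polarity)."""
    return {
        i: (t.surface, EVALUATION_DICT[t.surface])
        for i, t in enumerate(tokens)
        if t.surface in EVALUATION_DICT
    }


def acquire_change_info(tokens):
    """Return {token index: changed polarity} for dependency-linked, co-occurring evaluation words."""
    evals = acquire_evaluation_words(tokens)
    changes = {}
    for i, (word, _default_polarity) in evals.items():
        head = tokens[i].head
        # Dependency evaluation word acquisition: the head must itself be an evaluation word.
        if head is None or head not in evals:
            continue
        head_word = evals[head][0]
        # Co-occurrence relation determination, then change information acquisition.
        if (word, head_word) in CHANGE_TABLE:
            changes[i] = CHANGE_TABLE[(word, head_word)]
    return changes
```

In this toy setup, "cheap" is positive by default, but when it depends on "bad" and the pair is registered in the table, the returned change information marks it as negative. A pair that is not registered yields no change information.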

The information processing apparatus according to claim 1, further comprising change determination information storage means for storing change determination information indicating whether to change the evaluation polarity of evaluation words that are in the co-occurrence relation,
wherein the change information acquisition means acquires the change information from the change determination information stored in the change determination information storage means.

The information processing apparatus according to claim 1, wherein the change information acquisition means further changes the evaluation polarity indicated by the change information when the dependency target of the evaluation word is a negation expression.
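The additional change for negation recited above can be sketched as follows. This is only an illustration under assumptions: the `NEGATION_EXPRESSIONS` set, the `FLIP` mapping, and the function name `apply_negation` are hypothetical, and treating the change as a simple positive/negative inversion is one possible reading, not something this publication confirms.

```python
# Hypothetical list of negation expressions that an evaluation word may depend on.
NEGATION_EXPRESSIONS = {"not", "never", "ない"}

# Assumed inversion rule; polarities other than positive/negative are left unchanged.
FLIP = {"positive": "negative", "negative": "positive"}


def apply_negation(polarity: str, dependency_target: str) -> str:
    """Invert the polarity when the evaluation word depends on a negation expression."""
    if dependency_target in NEGATION_EXPRESSIONS:
        return FLIP.get(polarity, polarity)
    return polarity
```

For example, a polarity that co-occurrence has already changed to negative would be inverted back to positive if the evaluation word depends on "not".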

The information processing apparatus according to any one of claims 1 to 3, further comprising speaker identification means for identifying a speaker in the document,
wherein the change information acquisition means further changes the change information of the evaluation word according to the speaker identified by the speaker identification means.

A method for controlling an information processing apparatus that analyzes positive or negative evaluations in a document, the method comprising the following steps performed by the information processing apparatus:
an evaluation information acquisition step of acquiring evaluation word information, obtained by morphological analysis and syntactic analysis of the document, that includes an evaluation word to be evaluated and an evaluation polarity of the evaluation word;
a dependency evaluation word acquisition step of acquiring another evaluation word, different from the evaluation word, that is in a dependency relation with the evaluation word of the evaluation word information acquired in the evaluation information acquisition step;
a co-occurrence relation determination step of determining whether the evaluation word acquired in the dependency evaluation word acquisition step and the evaluation word with which it is in the dependency relation are in a co-occurrence relation; and
a change information acquisition step of acquiring change information for changing the evaluation polarity of the evaluation words determined in the co-occurrence relation determination step to be in the co-occurrence relation.

A program readable and executable by an information processing apparatus that analyzes positive or negative evaluations in a document, the program causing the information processing apparatus to function as:
evaluation information acquisition means for acquiring evaluation word information, obtained by morphological analysis and syntactic analysis of the document, that includes an evaluation word to be evaluated and an evaluation polarity of the evaluation word;
dependency evaluation word acquisition means for acquiring another evaluation word, different from the evaluation word, that is in a dependency relation with the evaluation word of the evaluation word information acquired by the evaluation information acquisition means;
co-occurrence relation determination means for determining whether the evaluation word acquired by the dependency evaluation word acquisition means and the evaluation word with which it is in the dependency relation are in a co-occurrence relation; and
change information acquisition means for acquiring change information for changing the evaluation polarity of the evaluation words determined by the co-occurrence relation determination means to be in the co-occurrence relation.