JP2016012195A - Factor estimation device, program, and factor estimation method

Info

Publication number: JP2016012195A
Application number: JP2014132758A
Authority: JP
Inventors: Kyoko Komatsu; Hiromi Ishisaki; Hajime Hattori; Yasuhiro Takishima
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-06-27
Filing date: 2014-06-27
Publication date: 2016-01-21
Anticipated expiration: 2034-06-27
Also published as: JP6253530B2

Abstract

PROBLEM TO BE SOLVED: To automatically estimate candidate factors behind a request or a dissatisfaction, including factors relating to human relationships and personal life events.

SOLUTION: The factor estimation device comprises a subjectivity detection unit 20 and a factor detection unit 30. The subjectivity detection unit 20 performs document classification on input text data and assigns to each classified word group either a positive label, indicating that the word group expresses positive subjectivity, or a negative label, indicating that it expresses negative subjectivity. The factor detection unit 30 extracts from the text the co-occurrence words that co-occur with a word group given the positive label or the negative label. It then consults a probability table composed of word groups expressing positive or negative subjectivity, word groups indicating the factors behind those word groups, and numerical values indicating the occurrence probabilities of the factor word groups. From this table it extracts the factor word groups that contain the extracted co-occurrence words and selects them, in descending order of occurrence probability, as the factor word groups.

Description

The present invention relates to a technique for estimating the cause for which at least one of a word group expressing positive subjectivity and a word group expressing negative subjectivity appears in a text.

Conventionally, purchasers and users of products have been able to post opinions and impressions about products on the Internet. Patent Document 1 discloses a technique that uses a dictionary to extract words relating to system operation from a group of documents containing product evaluations, and then extracts as dissatisfaction sentences those judged "negative" by evaluation polarity (positive/negative) classification. Non-Patent Document 1 discloses a system that supports consumers' purchase decisions by taking favorable and dissatisfied opinions about products from review articles on the Web (World Wide Web) and placing the products in a space in which principal component analysis has condensed each kind of opinion into a single evaluation axis.

Patent Document 1: JP 2013-168043 A

Non-Patent Document 1: DEIM Forum 2012 A9-2, "Product Recommendation System Considering Dissatisfaction Opinions and Popular Opinions that Users Focus on"

However, in the technique described in Patent Document 1, requests and dissatisfactions are extracted only for matters concerning the operation and malfunction of things such as systems and products, so the scope is narrow. It also says nothing about the causes that give rise to requests and dissatisfactions. The technique described in Non-Patent Document 1 targets product review articles and summarizes multiple favorable and dissatisfied opinions to make purchase decisions easier for the user, but it too says nothing about why those opinions arise. Moreover, both Patent Document 1 and Non-Patent Document 1 are limited to marketing-related subjects such as goods and products. Requests and dissatisfactions about human relationships and personal life events are therefore out of scope, and causes tied to human emotion cannot be estimated.

The present invention has been made in view of these circumstances. Its object is to provide a cause estimation device, a program, and a cause estimation method that can automatically estimate candidate causes corresponding to a request or a dissatisfaction, and that can also estimate causes relating to human relationships and personal life events.

(1) To achieve the above object, the present invention takes the following measures. The cause estimation device of the present invention estimates the cause for which at least one of a word group expressing positive subjectivity and a word group expressing negative subjectivity appears in a text. It comprises a subjectivity detection unit and a cause detection unit. The subjectivity detection unit performs document classification on input text data and assigns to each classified word group either a positive label, indicating that the word group expresses positive subjectivity, or a negative label, indicating that it expresses negative subjectivity. The cause detection unit extracts from the text the co-occurrence words that co-occur with a word group given the positive label or the negative label. It then consults a probability table composed of word groups expressing positive or negative subjectivity, word groups indicating the causes of occurrence of those word groups, and numerical values indicating the occurrence probabilities of the cause word groups. From this table it extracts the word groups indicating causes of occurrence that contain the extracted co-occurrence words, and selects them in descending order of occurrence probability as cause word groups.

In this configuration, document classification is performed on the input text data, each classified word group is given a positive or negative label, and the co-occurrence words that co-occur with the labelled word groups are extracted from the text. The word groups indicating causes of occurrence that contain those co-occurrence words are then extracted from the probability table and selected in descending order of occurrence probability as cause word groups. As a result, the cause of occurrence of a word group expressing positive or negative subjectivity can be estimated automatically. Moreover, because the target is text in general, estimation is not limited to products and systems: causes can also be estimated for word groups expressing positive or negative subjectivity about human relationships and personal life events.

(2) The cause estimation device of the present invention further comprises a preprocessing unit. The preprocessing unit stores in a learning database word groups that have been collected in advance, classified by document classification, and given a positive label or a negative label, and creates the probability table from the word groups stored in the learning database.

Because word groups that have been collected in advance, classified, and labelled as positive or negative are stored in the learning database, and the probability table is created from those stored word groups, the cause of occurrence of a word group expressing positive or negative subjectivity can be estimated automatically.

(3) In the cause estimation device of the present invention, when multiple selected cause word groups have the same occurrence probability, the cause detection unit selects, from among them, the cause word group with the smallest clause interval from the word group given the positive label or the word group given the negative label.

Because the cause word group with the smallest clause interval from the positively or negatively labelled word group is selected when multiple cause word groups have the same occurrence probability, the candidates can be prioritized even when their occurrence probabilities are identical.

(4) The program of the present invention estimates the cause for which at least one of a word group expressing positive subjectivity and a word group expressing negative subjectivity appears in a text. It causes a computer to execute the following series of processes: performing document classification on input text data and assigning to each classified word group either a positive label, indicating that the word group expresses positive subjectivity, or a negative label, indicating that it expresses negative subjectivity; extracting from the text the co-occurrence words that co-occur with a word group given the positive label or the negative label; extracting the word groups indicating causes of occurrence that contain the extracted co-occurrence words from a probability table composed of word groups expressing positive or negative subjectivity, word groups indicating the causes of occurrence of those word groups, and numerical values indicating the occurrence probabilities of the cause word groups; and selecting the extracted word groups in descending order of occurrence probability as cause word groups.

(5) The cause estimation method of the present invention estimates the cause for which at least one of a word group expressing positive subjectivity and a word group expressing negative subjectivity appears in a text. It includes at least the following two steps. In the first step, the subjectivity detection unit performs document classification on input text data and assigns to each classified word group either a positive label, indicating that the word group expresses positive subjectivity, or a negative label, indicating that it expresses negative subjectivity. In the second step, the cause detection unit extracts from the text the co-occurrence words that co-occur with a word group given the positive label or the negative label; extracts the word groups indicating causes of occurrence that contain those co-occurrence words from a probability table composed of word groups expressing positive or negative subjectivity, word groups indicating the causes of occurrence of those word groups, and numerical values indicating the occurrence probabilities of the cause word groups; and selects the extracted word groups in descending order of occurrence probability as cause word groups.

According to the present invention, the cause of occurrence of a word group expressing positive or negative subjectivity can be estimated automatically. Moreover, because the target is text in general, estimation is not limited to products and systems: causes can also be estimated for word groups expressing positive or negative subjectivity about human relationships and personal life events.

FIG. 1 is a block diagram showing the schematic configuration of the cause estimation device according to this embodiment.
FIG. 2 is a flowchart showing the operation of the preprocessing unit.
FIG. 3 is a flowchart showing the operation of the co-occurrence word extraction unit.
FIG. 4 is a flowchart showing the operation of the cause word candidate extraction unit.
FIG. 5 is a flowchart showing the operation of the cause detection unit.
FIG. 6 is a flowchart showing the operation of the cause detection unit.

The present inventors noted that existing techniques could not estimate the causes that give rise to requests, dissatisfactions, and the like. They found that such causes can be estimated automatically by using a probability table composed of word groups expressing subjectivity such as requests or dissatisfactions, word groups indicating the causes of occurrence of those word groups, and numerical values indicating the occurrence probabilities of the cause word groups. This finding led to the present invention.

That is, the cause estimation device of the present invention estimates the cause for which at least one of a word group expressing positive subjectivity and a word group expressing negative subjectivity appears in a text. It comprises a subjectivity detection unit and a cause detection unit. The subjectivity detection unit performs document classification on input text data and assigns to each classified word group either a positive label, indicating that the word group expresses positive subjectivity, or a negative label, indicating that it expresses negative subjectivity. The cause detection unit extracts from the text the co-occurrence words that co-occur with a word group given the positive label or the negative label. It then consults a probability table composed of word groups expressing positive or negative subjectivity, word groups indicating the causes of occurrence of those word groups, and numerical values indicating the occurrence probabilities of the cause word groups. From this table it extracts the word groups indicating causes of occurrence that contain the extracted co-occurrence words, and selects them in descending order of occurrence probability as cause word groups.

With this configuration, the present inventors made it possible to estimate automatically the cause of occurrence of a word group expressing positive or negative subjectivity. Because the target is text in general, estimation is not limited to products and systems: causes can also be estimated for word groups expressing positive or negative subjectivity about human relationships and personal life events. Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of the cause estimation device according to this embodiment. The device consists of a preprocessing unit 10, a subjectivity detection unit 20, and a cause detection unit 30. The preprocessing unit 10 includes a learning database 1 and a table creation unit 3. The learning database 1 stores texts collected in advance and labelled with a request D (a word group expressing positive subjectivity), a dissatisfaction F (a word group expressing negative subjectivity), and a cause C for which at least one of them arose. Various kinds of text data can serve as learning data, for example review texts about products, or SNS (Social Networking Service) posts and their comments. The table creation unit 3 creates a probability table of cause texts from the data set of learning data labelled with request D, dissatisfaction F, and cause C. The data set has, for example, the contents shown in the following table [Learning Data Example].

The probability table created by the table creation unit 3 stores multiple sets, each consisting of a request D_N, a dissatisfaction F_M, a cause C_L, and the probability of that combination occurring, as in the following table [Probability Table Example 1].

The data need not contain all three elements (request, dissatisfaction, cause). Set data consisting of a request and a cause, or of a dissatisfaction and a cause, can also be included, as in the following tables [Probability Table Example 2] and [Probability Table Example 3].

A request, dissatisfaction, or cause can be a word, a clause, or a sentence. These units can be combined freely: for example, requests and dissatisfactions can be recorded as single words while causes are recorded as sentences.

The probability values in the probability table are calculated as in [Probability Table Example 1] above. Suppose there are 100 texts with the request D "want a breeze" and the dissatisfaction F "hot", and their causes are C1 "the window is closed" in 35 texts, C2 "the room has no window" in 5 texts, C3 "the air conditioner is OFF" in 50 texts, and C4 "the air conditioner is broken" in 10 texts. The probabilities are then P1 = 0.35, P2 = 0.05, P3 = 0.50, and P4 = 0.10.
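The calculation above amounts to a relative frequency: the number of texts with a given cause divided by the number of texts sharing the same request and dissatisfaction. The following is a minimal sketch of that calculation, not the patent's implementation. The function name `build_probability_table`, the tuple-based record format, and the English cause strings are assumptions made for illustration.

```python
from collections import Counter


def build_probability_table(records):
    """Return {(request, dissatisfaction, cause): probability}.

    `records` is an iterable of (request, dissatisfaction, cause) tuples,
    one per labelled training text (hypothetical input format).
    """
    pair_totals = Counter()    # texts per (request, dissatisfaction) pair
    triple_counts = Counter()  # texts per (request, dissatisfaction, cause)
    for request, dissatisfaction, cause in records:
        pair_totals[(request, dissatisfaction)] += 1
        triple_counts[(request, dissatisfaction, cause)] += 1
    # Relative frequency of each cause within its (request, dissatisfaction) pair.
    return {
        key: count / pair_totals[key[:2]]
        for key, count in triple_counts.items()
    }


# The 100 texts from the example: D = "want a breeze", F = "hot".
records = (
    [("want a breeze", "hot", "window is closed")] * 35
    + [("want a breeze", "hot", "room has no window")] * 5
    + [("want a breeze", "hot", "air conditioner is off")] * 50
    + [("want a breeze", "hot", "air conditioner is broken")] * 10
)
table = build_probability_table(records)
```

Two-element tables (request and cause, or dissatisfaction and cause) could be built the same way by passing `None` for the missing element.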

FIG. 2 is a flowchart showing the operation of the preprocessing unit. First, the learning database 1 is consulted (step S1) and the probability values are calculated (step S2). Then a probability table of cause texts is created from the data set of learning data labelled with request D, dissatisfaction F, and cause C (steps S3 and S4).

Next, in FIG. 1, the subjectivity detection unit 20 includes a request determination unit 5 and a dissatisfaction determination unit 7. The subjectivity detection unit 20 takes a text group 4 (text sentences) as input. The text may be split into clause-level units as appropriate, for example by paragraph or by punctuation. Communication texts can be used as input, such as posts on the Internet (SNS, blogs, and so on) or a series of e-mail messages. The following table shows [Example of Input Text as Clause-Level Units].

The request determination unit 5 classifies documents using a text search technique or a vector space model, detects requests D, and labels them. The following table shows an example of assigned labels.

For example, using Related Technologies 1 and 2 below, a request feature space and a teacher vector are created from the request data in the learning data. A feature vector of the input text is then created in that request feature space, and a request label can be assigned according to its similarity to the teacher vector. Alternatively, Related Technology 3 can be used to train and apply a request classifier built from the request data in the learning data.

[Related Technology 1] Bag-of-words model: http://en.wikipedia.org/wiki/Bag-of-words_model
[Related Technology 2] Mecab: http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
[Related Technology 3] SVMLIGHT: http://svmlight.joachims.org/

The dissatisfaction determination unit 7 detects dissatisfactions F by analyzing emotional expressions, and labels them. This analysis can be realized with a prepared dictionary, positive/negative classification, and the like, as shown in the related technologies above and in other conventional techniques. The following table shows an example of assigned labels.
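The two labelling steps of the subjectivity detection unit can be sketched as follows: a bag-of-words feature vector compared with a teacher vector by cosine similarity for requests, and a dictionary lookup for dissatisfactions. This is an illustrative sketch only. The example sentences, the word list `NEGATIVE_WORDS`, the threshold of 0.5, and all function names are assumptions. The input is assumed to be already tokenized (for Japanese text, a morphological analyzer such as MeCab would do this).

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in `tokens`."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical request examples taken from the learning data.
REQUEST_EXAMPLES = [["want", "a", "breeze"], ["want", "cool", "air"]]
# The request feature space: every word seen in the request examples.
VOCABULARY = sorted({word for ex in REQUEST_EXAMPLES for word in ex})
# Teacher vector: element-wise sum of the example vectors.
TEACHER = [
    sum(column)
    for column in zip(*(bag_of_words(ex, VOCABULARY) for ex in REQUEST_EXAMPLES))
]
# Hypothetical dictionary of negative emotional expressions.
NEGATIVE_WORDS = {"hot", "noisy", "tired"}


def label_sentence(tokens, threshold=0.5):
    """Return the set of labels ("D" request, "F" dissatisfaction) for a text."""
    labels = set()
    if cosine(bag_of_words(tokens, VOCABULARY), TEACHER) >= threshold:
        labels.add("D")
    if NEGATIVE_WORDS & set(tokens):
        labels.add("F")
    return labels
```

A trained classifier, such as the support vector machine mentioned as Related Technology 3, could replace the similarity threshold without changing the interface of `label_sentence`.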

In FIG. 1, the cause detection unit 30 includes a co-occurrence word extraction unit 9, a cause word candidate extraction unit 11, and a cause determination unit 13. The co-occurrence word extraction unit 9 associates texts labelled "request D" by the request determination unit 5 (request texts) with texts labelled "dissatisfaction F" by the dissatisfaction determination unit 7 (dissatisfaction texts), and extracts the combinations of words that co-occur with them. One way to make this association is to start from a detected dissatisfaction text and pair it with the nearest request text. For example, in the table [Labelling Example 2] above, F "hot" at t5 is associated with D "want a breeze" at tn, and words that co-occur with these two, such as "room temperature", "humidity", "air conditioner", and "patience", are extracted.

FIG. 3 is a flowchart showing the operation of the co-occurrence word extraction unit 9. First, the labelled text group is acquired (step S101), and the presence of a request label or a dissatisfaction label is determined (step S102). If only one of the two labels is present in step S102, the process moves to step S104. If both the request label and the dissatisfaction label are present, requests and dissatisfactions are associated with each other (step S103). Next, co-occurrence word extraction is performed (step S104), and the co-occurrence words are obtained (step S105).
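The flow of FIG. 3 can be sketched as below. The text does not define how co-occurrence is measured, so this sketch assumes a simple rule: a word co-occurs with a labelled unit if it appears within `window` units of it. The data layout (a list of `(tokens, label)` pairs in document order) and the function names are likewise assumptions.

```python
def pair_labels(labeled):
    """Pair each dissatisfaction unit with the nearest request unit (step S103).

    `labeled` is a list of (tokens, label) pairs in document order, where
    label is "D", "F", or None. Returns a list of (f_index, d_index) pairs.
    """
    d_idx = [i for i, (_, label) in enumerate(labeled) if label == "D"]
    f_idx = [i for i, (_, label) in enumerate(labeled) if label == "F"]
    if not d_idx:
        return []
    return [(f, min(d_idx, key=lambda d: abs(d - f))) for f in f_idx]


def cooccurring_words(labeled, window=2, stopwords=frozenset()):
    """Collect words near any labelled unit (steps S104 and S105).

    Words that belong to the labelled units themselves are excluded.
    """
    anchors = [i for i, (_, label) in enumerate(labeled) if label in ("D", "F")]
    anchor_words = {word for i in anchors for word in labeled[i][0]}
    found = []
    for i in anchors:
        start, stop = max(0, i - window), min(len(labeled), i + window + 1)
        for tokens, _ in labeled[start:stop]:
            for word in tokens:
                if word in anchor_words or word in stopwords or word in found:
                    continue
                found.append(word)
    return found


labeled = [
    (["room", "temperature", "is", "high"], None),
    (["hot"], "F"),
    (["air", "conditioner", "is", "off"], None),
    (["want", "breeze"], "D"),
]
pairs = pair_labels(labeled)
words = cooccurring_words(labeled, window=1, stopwords={"is"})
```

A statistical co-occurrence measure computed over a larger corpus could be substituted for the window rule.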

Based on the co-occurrence words extracted by the co-occurrence word extraction unit 9, the cause word candidate extraction unit 11 consults the probability table created by the table creation unit 3 and extracts all matching cause candidates. For example, given the co-occurrence words "room temperature", "humidity", "air conditioner", and "patience" above, the table [Probability Table Example 1] yields "the air conditioner is not on" in the third row and "the air conditioner is broken" in the fourth row. Which probability table is consulted can be changed as appropriate according to the output of the subjectivity detection unit 20 for the input text group 4.
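The lookup described above can be sketched as a filter over the rows of a probability table. This is an illustration under assumptions: rows are represented as dictionaries, and a cause "contains" a co-occurrence word when the word is a substring of the cause text. The probabilities reuse the values given for [Probability Table Example 1]; the English strings and the name `extract_candidates` are hypothetical.

```python
def extract_candidates(table, cooccurrence_words):
    """Return every row whose cause text contains a co-occurrence word."""
    return [
        row
        for row in table
        if any(word in row["cause"] for word in cooccurrence_words)
    ]


probability_table = [
    {"request": "want a breeze", "dissatisfaction": "hot",
     "cause": "window is closed", "probability": 0.35},
    {"request": "want a breeze", "dissatisfaction": "hot",
     "cause": "room has no window", "probability": 0.05},
    {"request": "want a breeze", "dissatisfaction": "hot",
     "cause": "air conditioner is off", "probability": 0.50},
    {"request": "want a breeze", "dissatisfaction": "hot",
     "cause": "air conditioner is broken", "probability": 0.10},
]
cooccurrence_words = ["room temperature", "humidity", "air conditioner", "patience"]
candidates = extract_candidates(probability_table, cooccurrence_words)
```

With these inputs only the third and fourth rows match, which agrees with the example in the text. For Japanese text, comparing morphemes rather than substrings would probably be more robust.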

For example, when the subjectivity detection unit 20 detects both a request and a dissatisfaction, the three-element table DFC, such as [Probability Table Example 1] above, is consulted first, and [Probability Table Example 2] and [Probability Table Example 3] can be consulted as the second and third candidates. Either table DC or table FC may serve as the second candidate. When only a dissatisfaction is detected, the two-element table FC containing dissatisfaction and cause, such as [Probability Table Example 3], is consulted first. A three-element table such as [Probability Table Example 1] can then be consulted as the second candidate, with its request element ignored. The following table shows an example of reference priorities.

N_D: number of requests D detected by the subjectivity detection unit
N_F: number of dissatisfactions F detected by the subjectivity detection unit

FIG. 4 is a flowchart showing the operation of the cause word candidate extraction unit 11. First, the labelled text group and the co-occurrence words are input (step S201). Next, the presence of request labels and dissatisfaction labels is determined (step S202). If both a request label and a dissatisfaction label are present in step S202, table DFC is consulted (step S203) and it is determined whether a second candidate exists (step S204). If there is no second candidate in step S204, the process moves to step S208. If there is one, table DC is consulted (step S205) and it is determined whether a third candidate exists (step S206). If there is no third candidate in step S206, the process moves to step S208. If there is one, table FC is consulted (step S207) and the cause word candidates are extracted (step S208).

If only a dissatisfaction label is present in step S202, table FC is consulted (step S209) and it is determined whether a second candidate exists (step S210). If there is no second candidate, the process moves to step S208. If there is one, table DFC is consulted (step S211) and the process moves to step S208.

If only a request label is present in step S202, table DC is consulted (step S212) and it is determined whether a second candidate exists (step S213). If there is no second candidate, the process moves to step S208. If there is one, table DFC is consulted (step S214) and the process moves to step S208.
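The three branches of FIG. 4 reduce to choosing an ordered list of tables from the counts of detected requests and dissatisfactions, then gathering candidates from each table in turn. The sketch below follows that order. The names `table_priority` and `collect_candidates`, the dictionary of tables keyed by "DFC", "DC", and "FC", and the choice to return no tables when nothing was detected are assumptions; a missing table stands in for the "no second candidate" and "no third candidate" branches.

```python
def table_priority(n_d, n_f):
    """Order in which tables are consulted, given the detected label counts.

    n_d: number of requests D detected; n_f: number of dissatisfactions F.
    """
    if n_d > 0 and n_f > 0:
        return ["DFC", "DC", "FC"]   # steps S203, S205, S207
    if n_f > 0:
        return ["FC", "DFC"]         # steps S209, S211
    if n_d > 0:
        return ["DC", "DFC"]         # steps S212, S214
    return []                        # assumption: nothing to look up


def collect_candidates(tables, n_d, n_f, cooccurrence_words):
    """Gather cause candidates from the available tables in priority order."""
    candidates = []
    for name in table_priority(n_d, n_f):
        for row in tables.get(name, []):  # a missing table is skipped
            if any(word in row["cause"] for word in cooccurrence_words):
                candidates.append({**row, "table": name})
    return candidates


tables = {
    "DFC": [{"cause": "air conditioner is off", "probability": 0.5}],
    "FC": [
        {"cause": "air conditioner is broken", "probability": 0.2},
        {"cause": "window is closed", "probability": 0.3},
    ],
}
found = collect_candidates(tables, n_d=0, n_f=1,
                           cooccurrence_words=["air conditioner"])
```

Recording the source table on each candidate keeps the priority information available to the later determination step.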

The cause determination unit 13 determines the cause 15 from the cause word candidates extracted by the cause word candidate extraction unit 11. The cause is determined by referring to the cause word candidate table and using the probability values. For example, the top several candidates can be displayed, or only the candidates whose probability is higher than a preset threshold can be displayed. In the case of the following table [table example of cause word candidates], C1, C2, and C3 are preferentially selected on the basis of their probability values.
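The two display options mentioned above (top several candidates, or candidates above a preset threshold) can be sketched as follows. The function name `select_causes`, its parameters, and the probability values are hypothetical and are not taken from the table of the embodiment.

```python
def select_causes(candidates, top_n=None, threshold=None):
    """Rank cause word candidates by probability, highest first.

    candidates: {cause_word: probability} (assumed layout).
    threshold:  keep only candidates strictly above this value, if given.
    top_n:      keep only the first top_n ranked candidates, if given.
    """
    # sorted() is stable, so equal probabilities keep their input order.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(w, p) for w, p in ranked if p > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [w for w, _ in ranked]


# Hypothetical probability values for illustration only.
example = {"C1": 0.4, "C2": 0.3, "C3": 0.2, "C4": 0.1}
top_three = select_causes(example, top_n=3)
above = select_causes(example, threshold=0.25)
```

Both options can be combined; in this sketch the threshold is applied before the top-N cut.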

When entries with the same probability exist in the cause word candidate table, the priority in cause determination can be set by, for example, giving priority to the cause C that has the smallest phrase interval from the request D detected by the request determination unit 5 and the dissatisfaction F detected by the dissatisfaction determination unit 7 in the input sentences. For example, C1, C2, and C3 in the above table [table example of cause word candidates] have the same probability. According to the following table [table example for determining the phrase interval], C1 is located at t3, C2 at t7, and C3 at t50; relative to the target D at t100 and F at t5, C1 and C2, which have the smaller phrase intervals, are preferentially selected.
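The example positions above can be checked with a short sketch. The text does not define how the interval to D and F is combined; the sketch assumes it is the distance to the nearer of the two, which is the reading that reproduces the stated outcome (C1 and C2 selected). The function names are hypothetical.

```python
def phrase_interval(cause_pos, target_positions):
    """Distance in phrases to the nearest target (D or F); an assumed definition."""
    return min(abs(cause_pos - t) for t in target_positions)


def closest_causes(cause_positions, target_positions):
    """Return the names of all causes that share the smallest phrase interval."""
    intervals = {
        name: phrase_interval(pos, target_positions)
        for name, pos in cause_positions.items()
    }
    smallest = min(intervals.values())
    return sorted(name for name, d in intervals.items() if d == smallest)


# Positions from the example: C1 at t3, C2 at t7, C3 at t50, D at t100, F at t5.
causes = {"C1": 3, "C2": 7, "C3": 50}
targets = [100, 5]
print(closest_causes(causes, targets))  # ['C1', 'C2']
```

With this definition, C1 and C2 are each two phrases from F, while C3 is 45 phrases from F and 50 from D.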

FIG. 5 is a flowchart showing the operation of the cause detection unit 30. Here, the case where a plurality of causes are identified is described. First, the target probability table is acquired (step S301), and it is then determined whether a probability value in the probability table is equal to or greater than a threshold (step S302). If the probability value is less than the threshold, the process ends; if the probability value is equal to or greater than the threshold, the cause having that probability value is identified (step S303), and the process ends.

FIG. 6 is a flowchart showing the operation of the cause detection unit. Here, an example is shown in which the cause is narrowed down to one when identical probabilities exist. First, the target probability table is acquired (step S401), and it is then determined whether a probability value in the probability table is equal to or greater than a threshold (step S402). If the probability value is less than the threshold, the process ends; if the probability value is equal to or greater than the threshold, it is determined whether identical probabilities exist (step S403). If no identical probabilities exist in step S403, the probability values are compared (step S405), the larger one is extracted and its cause is identified (step S406), and the process ends. If identical probabilities exist in step S403, the sentence intervals are compared (step S404), the cause with the smallest sentence interval is identified (step S406), and the process ends.
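The flow of FIG. 6 can be sketched as below. This is an illustration under assumptions, not the embodiment itself: the `Candidate` record, the function name `narrow_to_one`, and the reading of "identical probabilities exist" as "a tie at the highest probability" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    word: str           # cause word
    probability: float  # value from the probability table
    interval: int       # interval to the subjective word group (assumed precomputed)


def narrow_to_one(rows: List[Candidate], threshold: float) -> Optional[str]:
    """Return a single cause word, or None if nothing reaches the threshold."""
    # S402: keep only entries whose probability is at or above the threshold.
    eligible = [r for r in rows if r.probability >= threshold]
    if not eligible:
        return None
    # S403: check whether the highest probability is shared by several entries.
    best = max(r.probability for r in eligible)
    tied = [r for r in eligible if r.probability == best]
    if len(tied) == 1:
        # S405 -> S406: the largest probability decides.
        return tied[0].word
    # S404 -> S406: among tied entries, the smallest interval decides.
    return min(tied, key=lambda r: r.interval).word


# Hypothetical rows for illustration.
rows = [Candidate("C1", 0.3, 2), Candidate("C2", 0.3, 6), Candidate("C3", 0.1, 1)]
chosen = narrow_to_one(rows, threshold=0.2)
```

In this sketch C3 is dropped at the threshold check even though its interval is the smallest, because the interval is consulted only to break ties.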

When candidates with the same interval exist, a C that precedes D and F can be given priority. This is because sentences are generally written in the order of cause, then result. For example, in the above [table example for determining the phrase interval], C1 is preferentially selected based on the positional relationship between C1 and C2. A cause label can also be assigned to a sentence that contains a cause word extracted as a cause word candidate.
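The positional tie-break can be expressed as a sort key. The sketch assumes that the interval is the distance to the nearer of D and F and that "precedes D and F" means the cause lies before both of them; the function name `pick_cause` is hypothetical.

```python
def pick_cause(cause_positions, target_positions):
    """Choose one cause: smallest interval first, then causes preceding D and F."""
    first_target = min(target_positions)

    def key(item):
        _, pos = item
        interval = min(abs(pos - t) for t in target_positions)
        precedes = pos < first_target  # lies before both D and F
        # False sorts before True, so "not precedes" favours preceding causes.
        return (interval, not precedes, pos)

    return min(cause_positions.items(), key=key)[0]


# C1 at t3 and C2 at t7 are both two phrases from F at t5; D is at t100.
winner = pick_cause({"C1": 3, "C2": 7}, [100, 5])
print(winner)  # C1
```

C1 is chosen because it is the only one of the two equally distant candidates that appears before F.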

As described above, according to the present embodiment, the cause of expression of a word group indicating a request or dissatisfaction can be estimated automatically. Moreover, because the target is text, the estimation is not limited to products and systems; the cause of expression of a word group indicating positive or negative subjectivity about human relationships or personal life events can also be estimated.

DESCRIPTION OF SYMBOLS
1 Learning database
3 Table creation unit
4 Text group
5 Request determination unit
7 Dissatisfaction determination unit
9 Co-occurrence word extraction unit
10 Preprocessing unit
11 Cause word candidate extraction unit
13 Cause determination unit
15 Cause
20 Subjectivity detection unit
30 Cause detection unit

Claims

A cause estimation device that estimates, within a sentence, the cause of expression of at least one of a word group indicating positive subjectivity and a word group indicating negative subjectivity, the device comprising:
a subjectivity detection unit that performs document classification processing on input text data and assigns, to each classified word group, a positive label indicating that the word group indicates positive subjectivity or a negative label indicating that the word group indicates negative subjectivity; and
a cause detection unit that extracts, from the sentence, co-occurrence words that co-occur with the word group given the positive label or the word group given the negative label; extracts, from a probability table composed of word groups indicating positive or negative subjectivity, word groups indicating expression causes of the word groups indicating subjectivity, and numerical values indicating expression probabilities of the word groups indicating expression causes, the word groups indicating expression causes that include the extracted co-occurrence words; and selects the extracted word groups indicating expression causes, in descending order of expression probability, as cause word groups.

The cause estimation device according to claim 1, further comprising a preprocessing unit that stores, in a learning database, word groups that have been collected in advance, classified by the document classification processing, and given a positive label indicating that the word group indicates positive subjectivity or a negative label indicating that the word group indicates negative subjectivity, and that creates the probability table based on the word groups stored in the learning database.

The cause estimation device according to claim 1, wherein, when a plurality of the selected cause word groups have the same expression probability, the cause detection unit selects, from among those cause word groups, the cause word group having the smallest phrase interval from the word group given the positive label or the word group given the negative label.

A program for estimating, within a sentence, the cause of expression of at least one of a word group indicating positive subjectivity and a word group indicating negative subjectivity, the program causing a computer to execute a series of processes comprising:
a process of performing document classification processing on input text data and assigning, to each classified word group, a positive label indicating that the word group indicates positive subjectivity or a negative label indicating that the word group indicates negative subjectivity;
a process of extracting, from the sentence, co-occurrence words that co-occur with the word group given the positive label or the word group given the negative label;
a process of extracting, from a probability table composed of word groups indicating positive or negative subjectivity, word groups indicating expression causes of the word groups indicating subjectivity, and numerical values indicating expression probabilities of the word groups indicating expression causes, the word groups indicating expression causes that include the extracted co-occurrence words; and
a process of selecting the extracted word groups indicating expression causes, in descending order of expression probability, as cause word groups.

A cause estimation method for estimating, within a sentence, the cause of expression of at least one of a word group indicating positive subjectivity and a word group indicating negative subjectivity, the method comprising:
a step in which a subjectivity detection unit performs document classification processing on input text data and assigns, to each classified word group, a positive label indicating that the word group indicates positive subjectivity or a negative label indicating that the word group indicates negative subjectivity; and
a step in which a cause detection unit extracts, from the sentence, co-occurrence words that co-occur with the word group given the positive label or the word group given the negative label; extracts, from a probability table composed of word groups indicating positive or negative subjectivity, word groups indicating expression causes of the word groups indicating subjectivity, and numerical values indicating expression probabilities of the word groups indicating expression causes, the word groups indicating expression causes that include the extracted co-occurrence words; and selects the extracted word groups indicating expression causes, in descending order of expression probability, as cause word groups.