JP6192166B2

JP6192166B2 - Opinion type estimation device and program thereof

Info

Publication number: JP6192166B2
Application number: JP2013263994A
Authority: JP
Inventors: 小早川　健
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2013-12-20
Filing date: 2013-12-20
Publication date: 2017-09-06
Anticipated expiration: 2033-12-20
Also published as: JP2015121846A

Description

The present invention relates to an opinion type estimation device that estimates statistical information about opinion types from utterance data representing the content of opinions, and to a program for the device.

Natural language processing has long been needed to analyze text written by people. As a stage preceding such natural language processing, a technique for estimating opinion types is required (Patent Documents 1 to 5 and Non-Patent Documents 1 and 2). This opinion type estimation technique is an algorithm that assigns a label for each opinion type; for an opinion type expressing approval or disapproval, it assigns the label "agree" or "disagree" to an opinion.

Patent Document 1: Japanese Patent No. 5224532
Patent Document 2: Japanese Patent No. 5254888
Patent Document 3: Japanese Patent No. 5278327
Patent Document 4: Japanese Patent No. 5286298
Patent Document 5: Japanese Patent No. 5221751

Non-Patent Document 1: Jorge Nocedal, Stephen J. Wright, "Numerical Optimization", Springer, 2006
Non-Patent Document 2: Yuko Otsuka, Takashi Miki, Manabu Okumura, "Opinion Analysis Engine", Corona, 2007

However, conventional opinion type estimation techniques contain a certain amount of error, such as assigning an inappropriate label to an opinion. Consequently, when such a technique is used alone to obtain statistical information about opinion types, such as the number or proportion of approvals and disapprovals, the estimation accuracy does not improve.

Consider why opinion type estimation techniques contain errors. When forming an opinion, an individual often decides his or her own position (approval or disapproval) with reference to the opinions of others. Conventional opinion type estimation techniques nevertheless do not reflect this actual opinion formation process, which is considered to be a cause of the errors.

An object of the present invention is therefore to solve the above problem and to provide an opinion type estimation device, and a program for it, that improve the estimation accuracy of statistical information about opinion types.

In view of the above problem, the opinion type estimation device according to the present invention is an opinion type estimation device in which opinion types are set in advance for utterance data representing the content of opinions and which estimates statistical information about the opinion types of the utterance data, the device comprising mixture distribution model storage means, parameter optimization means, and opinion type estimation means.

With this configuration, the mixture distribution model storage means stores in advance a mixture distribution model that represents the utterance-time-dependent opinion formation process and whose parameters are the mixing ratio, the peak time, and the rapidity of the surge for each opinion type. It also stores, for each opinion type, the utterance data to which an utterance time has been added. The parameter optimization means applies the utterance data to the mixture distribution model held in the mixture distribution model storage means and estimates the parameters by a numerical optimization method.

Because the parameters of the mixture distribution model representing the opinion formation process have been optimized, the opinion type estimation means uses this mixture distribution model to estimate the statistical information about opinion types. For example, the statistical information is the proportion of utterances of each opinion type among all the utterance data, or the number of utterances for each opinion type.

Utterance data is text data representing the content of opinions that an unspecified large number of people have posted on a network about a specific matter in a field such as politics or economics.
An opinion type is a category, set in advance, into which the opinion content of utterance data is classified. Examples of opinion types are labels such as "agree" and "disagree" that express approval or disapproval.
The mixing ratio for each opinion type represents the proportion of that opinion type in the mixture distribution model. For example, the mixing ratios represent the proportion of "agree" and the proportion of "disagree".

The distribution model for each opinion type is a distribution model that depends on the utterance time and has one or more peaks (for example, a Gaussian distribution model). Because it depends on the utterance time, the distribution model for each opinion type can be said to reflect the opinion formation process.
The mixture distribution model is a probability model that mixes the distribution models of the individual opinion types. An example is a two-component mixture distribution model consisting of an "agree" distribution model and a "disagree" distribution model.

A surge refers to the speed at which utterances increase on the network. For example, when utterances about a specific matter are increasing rapidly on the network, the matter is said to be surging. Conversely, when utterances about a specific matter remain few, the matter is said not to be surging.
Utterances on the network may increase because the number of speakers grows through diffusion or because the same person posts repeatedly. The two causes occur together, but the influence of repeated posts by the same person can be removed by referring to each person's utterance only once. In that case, the increase in utterances represents diffusion.
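The removal of repeated posts described above can be sketched as follows. This is a minimal illustration, assuming each utterance record carries a speaker identifier; the `speaker_id` field and the function name `first_utterance_per_speaker` are hypothetical and do not appear in the source.

```python
from typing import Iterable, List, Tuple

# (speaker_id, utterance_time, text); the speaker_id field is an assumption.
Utterance = Tuple[str, float, str]


def first_utterance_per_speaker(utterances: Iterable[Utterance]) -> List[Utterance]:
    """Keep only the earliest utterance of each speaker, in time order."""
    seen = set()
    kept = []
    for utt in sorted(utterances, key=lambda u: u[1]):
        speaker = utt[0]
        if speaker not in seen:
            seen.add(speaker)
            kept.append(utt)
    return kept
```

After this filtering, each speaker contributes at most one time stamp, so growth in the count over time reflects new speakers rather than repeated posts.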

The opinion type estimation device according to the present invention can also be realized by an opinion type estimation program that causes hardware resources of a computer, such as a CPU, memory, and hard disk, to operate cooperatively as the means described above. The program may be distributed via a communication line, or written to a recording medium such as a CD-ROM or flash memory and distributed.

The present invention provides the following advantageous effects.
According to the present invention, the parameters of a mixture distribution model representing the opinion formation process are optimized and statistical information about opinion types is estimated, so the estimation accuracy can be improved. As a result, opinion types can be added automatically to opinions on the network, which reduces the labor of adding opinion types manually.

FIG. 1 is a block diagram showing the configuration of the utterance analysis device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the data structure of the utterance data output by the opinion type addition means of FIG. 1.
FIG. 3 is an explanatory diagram of the mixture distribution model stored in the mixture distribution model management means of FIG. 1.
FIG. 4 is a block diagram showing the operation of the utterance analysis device of FIG. 1.
FIG. 5 is an explanatory diagram of the distribution models in a modification of the present invention.

An embodiment of the present invention will be described in detail with reference to the drawings as appropriate.
As shown in FIG. 1, the utterance analysis device 1 according to the embodiment of the present invention analyzes utterances that exist on a network, and includes an opinion type estimation device 2 and utterance analysis means 30.

[Configuration of the utterance analysis device]
Utterance data existing on the network is input to the utterance analysis device 1. The utterance data is, for example, opinions written on a web page, blog, or bulletin board. Here, the utterance data is assumed to be time-series data to which the utterance time (the time at which the opinion was written) has been added and which has been sorted by utterance time. The utterance analysis device 1 outputs the utterance data to the opinion type estimation device 2 so that the proportions of the opinion types are estimated as a stage preceding the analysis of the utterances (natural language processing).

The opinion type estimation device 2 estimates, as statistical information about opinion types, the proportions of the opinion types from the utterance data input from the utterance analysis device 1. To this end, the opinion type estimation device 2 includes analysis target selection means 21, opinion type addition means 22, mixture distribution model management means (mixture distribution model storage means) 23, parameter optimization means 24, and opinion type estimation means 25.

The analysis target selection means 21 selects, from all the utterance data input from the utterance analysis device 1, the utterance data to be analyzed. For example, to analyze opinions on the policy of raising the consumption tax, the analysis target selection means 21 narrows all the utterance data down to utterance data about the consumption tax increase. More specifically, a keyword characteristic of the matter to be analyzed (for example, '消費増税', "consumption tax increase") is set manually in the analysis target selection means 21, which then selects the utterance data containing that keyword. The analysis target selection means 21 outputs the selected utterance data to the opinion type addition means 22.
The analysis target selection means 21 may combine the keyword-based selection with processing that treats spelling variants and synonyms of the set keyword as the same word.
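The keyword-based selection can be sketched as below. This is only an illustration under simple assumptions: utterances are plain strings, matching is by substring, and spelling variants or synonyms are supplied by hand. The function name `select_targets` and its parameters are hypothetical.

```python
from typing import Iterable, List


def select_targets(texts: Iterable[str], keyword: str,
                   variants: Iterable[str] = ()) -> List[str]:
    """Return the texts that contain the keyword or any registered variant."""
    terms = [keyword, *variants]
    return [t for t in texts if any(term in t for term in terms)]


if __name__ == "__main__":
    posts = [
        "I support the consumption tax increase",
        "Nice weather today",
        "The consumption tax hike hurts",
    ]
    # The variant list treats "consumption tax hike" as the same topic.
    for post in select_targets(posts, "consumption tax increase",
                               ["consumption tax hike"]):
        print(post)
```

Treating variants as extra search terms is the simplest way to regard them as "the same word"; a real system might instead normalize the text before matching.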

The opinion type addition means 22 adds an opinion type to the utterance data input from the analysis target selection means 21. In the present embodiment, the opinion type addition means 22 adds "agree" or "disagree" to the utterance data as the opinion type. It does so using a conventional opinion type estimation technique. More specifically, the opinion type addition means 22 can add the opinion type to the utterance data using the methods described in Section 4.2, "Automatic Extraction of Sentences Containing Opinions", and Section 4.3, "Elemental Techniques for Evaluation Analysis", of Reference 1.
Reference 1: Yuko Otsuka, Takashi Miki, Manabu Okumura, "Opinion Analysis Engine", Corona, 2007

The opinion type is added to the utterance data because it is needed when the optimization method is applied to the mixture distribution model. Naturally, since a conventional opinion type estimation technique is used, the opinion types added to the utterance data contain errors.

The utterance time has already been added to the utterance data when the data is input to the opinion type estimation device 2. Therefore, as shown in FIG. 2, each item of utterance data contains the utterance time, the utterance content, and the opinion type. FIG. 2 illustrates utterance data about an economic policy. For example, the first opinion was written at 10:34 on September 19, 2013, reads "I want economic stimulus measures to be taken", and indicates "agree" with the economic policy. The second opinion was written at 10:35 on September 19, 2013, reads "I want the Prime Minister's economic measures to stop", and indicates "disagree" with the economic policy.
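The record layout described for FIG. 2 (utterance time, utterance content, opinion type) could be represented as in the sketch below. The class name `UtteranceRecord` and its field names are hypothetical; only the three fields and the two example rows come from the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UtteranceRecord:
    time: datetime      # utterance time (when the opinion was written)
    text: str           # utterance content
    opinion_type: str   # label from the opinion type addition means; may be wrong


records = [
    UtteranceRecord(datetime(2013, 9, 19, 10, 34),
                    "I want economic stimulus measures to be taken", "agree"),
    UtteranceRecord(datetime(2013, 9, 19, 10, 35),
                    "I want the Prime Minister's economic measures to stop", "disagree"),
]

# The input is assumed to be sorted by utterance time, as stated in the text.
records.sort(key=lambda r: r.time)
```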

The opinion type addition means 22 then outputs the utterance data with the opinion types added to the mixture distribution model management means 23, separately for each opinion type. At this time, the opinion type addition means 22 instructs the parameter optimization means 24, via the mixture distribution model management means 23, to optimize the parameters (parameter optimization command).

The mixture distribution model management means 23 stores and manages the utterance data input from the opinion type addition means 22 and the mixture distribution model.

<Mixture distribution model>
The mixture distribution model will be described with reference to FIG. 3 (see also FIG. 1 as appropriate).
The mixture distribution model is a mixture of the distribution models set for the individual opinion types. In the present embodiment, as shown in FIG. 3, the mixture distribution model is a two-component mixture distribution model that mixes an "agree" distribution model 90 and a "disagree" distribution model 91.

In the present embodiment, the distribution models 90 and 91 are Gaussian distribution models that depend on the utterance time and have one or more peaks. The Gaussian distribution model is expressed by formula (1). In formula (1), t_0 is the time at which opinions of a given opinion type peak (the peak time), and α_0 represents the rapidity of the surge in opinions.

Accordingly, each of the distribution models 90 and 91 can be expressed by formula (1)′. In formula (1)′, n is the index of the opinion type (n is a natural number), t_n is the peak time of opinion type n, and α_n is the rapidity of the surge for opinion type n. In the present embodiment n = 1, 2: n = 1 denotes the "agree" distribution model 90 and n = 2 denotes the "disagree" distribution model 91.

Because the utterance analysis device 1 is intended to analyze utterance data in which "agree" and "disagree" are mixed, the "agree" and "disagree" distribution models 90 and 91 are combined into a two-component mixture (a mixture of distribution models is called a mixture distribution model). This two-component mixture distribution model can be expressed by formula (2). Because formula (2) contains the "agree" peak time t_1 and the "disagree" peak time t_2, it depends on the utterance time and can be said to reflect the opinion formation process.

In formula (2), α_1 is the rapidity of the "agree" surge and α_2 is the rapidity of the "disagree" surge. β is the mixing ratio between "agree" and "disagree" (where 0 < β < 1); that is, β is the proportion of "agree" and (1 − β) is the proportion of "disagree". The semicolon ";" in the argument list of formula (2) indicates that the variables following it are the parameters (arguments) of formula (1)′.

Next, consider the convergence condition for the first term of formula (2). Substituting the parameters t_1 and α_1 of formula (2) into formula (1)′ and moving α_1 to the front gives formula (1)″. A convergence condition is then required under which the integral of formula (1)″ equals 1 (that is, the cumulative total of formula (1)″ is 1). The same convergence condition applies to the parameter α_2.
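The formulas themselves are not reproduced in this text. As a reading aid only, one normalized parameterization consistent with the description (a peak time t_n, a rapidity α_n, a mixing ratio β, and an integral equal to 1) is sketched below. The exact functional form, in particular how α_n enters the exponent and the normalizing factor, is an assumption and may differ from formulas (1)′ and (2) of the original.

```latex
% Assumed per-type Gaussian model (one possible reading of formula (1)')
p_n(t) = \sqrt{\frac{\alpha_n}{\pi}}\,
         \exp\!\left(-\alpha_n\,(t - t_n)^2\right),
\qquad
\int_{-\infty}^{\infty} p_n(t)\,dt = 1 .

% Assumed two-component mixture (one possible reading of formula (2))
p(t;\, t_1, t_2, \alpha_1, \alpha_2, \beta)
  = \beta\, p_1(t) + (1 - \beta)\, p_2(t),
\qquad 0 < \beta < 1 .
```

Under this assumed form, a larger α_n gives a narrower and taller peak, which matches the reading of α_n as the rapidity of the surge, and the mixture integrates to 1 because each component does.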

Returning to FIG. 1, the description of the opinion type estimation device 2 continues.
The parameter optimization means 24 applies the utterance data to the mixture distribution model stored in the mixture distribution model management means 23 and estimates the optimal parameters using a numerical optimization method. The parameter optimization means 24 starts optimizing the parameters when the parameter optimization command is input from the opinion type addition means 22. As the numerical optimization method, the parameter optimization means 24 can use maximum likelihood estimation by the steepest descent method, BFGS (a quasi-Newton method), or the like, or Bayesian estimation.

For example, when the steepest descent method is used, initial values of the parameters t_1, t_2, α_1, α_2, and β are set in advance in the mixture distribution model management means 23. The parameter optimization means 24 then repeatedly updates the parameters t_1, t_2, α_1, α_2, and β held in the mixture distribution model management means 23 until an optimal mixture distribution model is obtained (that is, until the parameters converge). When it has finished updating the parameters, the parameter optimization means 24 instructs the opinion type estimation means 25, via the mixture distribution model management means 23, to estimate the proportions of the opinion types (opinion type estimation command).
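The iterative update can be illustrated with the following sketch of steepest descent on the mean negative log-likelihood of a two-component mixture. Several things here are assumptions rather than the source's method: the Gaussian form sqrt(α/π)·exp(−α(t − t_n)²), the use of utterance times alone (the text does not spell out how the possibly erroneous opinion labels enter the objective), the reparameterization through log α and logit β to keep α > 0 and 0 < β < 1, the finite-difference gradient, and the backtracking step rule. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gauss(t: float, peak: float, alpha: float) -> float:
    # Assumed normalized form: sqrt(alpha / pi) * exp(-alpha * (t - peak)^2).
    return math.sqrt(alpha / math.pi) * math.exp(-alpha * (t - peak) ** 2)


def unpack(theta: Sequence[float]) -> Tuple[float, float, float, float, float]:
    # Unconstrained vector -> (t1, t2, alpha1, alpha2, beta).
    t1, t2, log_a1, log_a2, logit_b = theta
    beta = 1.0 / (1.0 + math.exp(-logit_b))
    return t1, t2, math.exp(log_a1), math.exp(log_a2), beta


def mean_nll(theta: Sequence[float], times: Sequence[float]) -> float:
    # Mean negative log-likelihood of the utterance times under the mixture.
    t1, t2, a1, a2, beta = unpack(theta)
    total = 0.0
    for t in times:
        density = beta * gauss(t, t1, a1) + (1.0 - beta) * gauss(t, t2, a2)
        total -= math.log(max(density, 1e-300))
    return total / len(times)


def gradient(theta: Sequence[float], times: Sequence[float],
             eps: float = 1e-5) -> List[float]:
    # Central-difference approximation of the gradient.
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((mean_nll(up, times) - mean_nll(down, times)) / (2.0 * eps))
    return grad


def fit_steepest_descent(times: Sequence[float], theta0: Sequence[float],
                         max_iter: int = 2000, tol: float = 1e-6):
    theta = list(theta0)
    value = mean_nll(theta, times)
    for _ in range(max_iter):
        grad = gradient(theta, times)
        sq_norm = sum(g * g for g in grad)
        if math.sqrt(sq_norm) < tol:
            break  # parameters have converged
        step = 1.0
        while step > 1e-12:
            # Backtracking: halve the step until the objective decreases enough.
            candidate = [x - step * g for x, g in zip(theta, grad)]
            new_value = mean_nll(candidate, times)
            if new_value <= value - 1e-4 * step * sq_norm:
                theta, value = candidate, new_value
                break
            step *= 0.5
        else:
            break  # no improving step found
    return unpack(theta)
```

The initial vector `theta0` plays the role of the initial values that the text says are set in advance, given here as (t_1, t_2, log α_1, log α_2, logit β). Steepest descent is simple but slow; the text also names BFGS as an alternative.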

The opinion type estimation means 25 estimates the proportions of the opinion types using the mixture distribution model whose parameters were estimated by the parameter optimization means 24. For example, the opinion type estimation means 25 estimates the proportions when the opinion type estimation command is input from the parameter optimization means 24.

By using formula (2) with the estimated parameters t_1, t_2, α_1, α_2, and β, the opinion type estimation means 25 can estimate the proportions of the opinion types at and around the utterance times. That is, from formula (2) the opinion type estimation means 25 can estimate the proportion β of "agree" and the proportion (1 − β) of "disagree" in the utterance data as a whole.

The opinion type estimation means 25 can also estimate the proportion of "agree" and the proportion of "disagree" at a given time τ, as in formula (3).

Furthermore, using formula (4), which generalizes formula (3), the opinion type estimation means 25 can estimate the proportions of "agree" and "disagree" in a time interval (τ_1, τ_2). Here, the time interval (τ_1, τ_2) denotes the period from time τ_1 to time τ_2.
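Formulas (3) and (4) are not reproduced in this text. One natural reading, given here only as a hedged sketch, is that the "agree" share at time τ is the "agree" term of the mixture divided by the whole mixture at τ, and that the share over an interval uses the integrals of the same terms. The Gaussian form sqrt(α/π)·exp(−α(t − t_n)²) and all function names below are assumptions.

```python
import math


def gauss_pdf(t: float, peak: float, alpha: float) -> float:
    # Assumed normalized Gaussian: sqrt(alpha / pi) * exp(-alpha * (t - peak)^2).
    return math.sqrt(alpha / math.pi) * math.exp(-alpha * (t - peak) ** 2)


def gauss_cdf(t: float, peak: float, alpha: float) -> float:
    # Integral of gauss_pdf from -infinity to t.
    return 0.5 * (1.0 + math.erf(math.sqrt(alpha) * (t - peak)))


def agree_share_at(tau, t1, t2, a1, a2, beta):
    """Share of "agree" at time tau (undefined where both densities vanish)."""
    agree = beta * gauss_pdf(tau, t1, a1)
    disagree = (1.0 - beta) * gauss_pdf(tau, t2, a2)
    return agree / (agree + disagree)


def agree_share_between(tau1, tau2, t1, t2, a1, a2, beta):
    """Share of "agree" over the interval (tau1, tau2)."""
    agree = beta * (gauss_cdf(tau2, t1, a1) - gauss_cdf(tau1, t1, a1))
    disagree = (1.0 - beta) * (gauss_cdf(tau2, t2, a2) - gauss_cdf(tau1, t2, a2))
    return agree / (agree + disagree)
```

In both cases the "disagree" share is one minus the returned value. Over an interval wide enough to cover both peaks, the interval share approaches β, which agrees with the whole-data proportion described for formula (2).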

The opinion type estimation means 25 then outputs the proportion of "agree" and the proportion of "disagree" to the utterance analysis means 30 as the estimation result. At this time, the opinion type estimation means 25 also reads the utterance data from the mixture distribution model management means 23 and outputs it to the utterance analysis means 30 together with the estimation result.

Which of formulas (2) to (4) the opinion type estimation means 25 uses to calculate the proportions of the opinion types is set in advance. For example, the user of the utterance analysis device 1 may set which of formulas (2) to (4) is used.

The utterance analysis means 30 of the utterance analysis device 1 is described next.
The utterance analysis means 30 analyzes the utterance data using the estimation result input from the opinion type estimation means 25. The utterance analysis means 30 can analyze the utterance data labeled "agree" and "disagree" using conventional natural language processing. The utterance analysis means 30 then outputs the analysis result of the utterance data to the outside.

[Operation of the utterance analysis device]
The operation of the utterance analysis device 1 will be described with reference to FIG. 4 (see also FIG. 1 as appropriate).
The analysis target selection means 21 selects, from all the utterance data, the utterance data to be analyzed (step S1).

The opinion type addition means 22 adds opinion types to the utterance data selected in step S1 (step S2).
The parameter optimization means 24 applies the utterance data labeled with opinion types in step S2 to the mixture distribution model and estimates the optimal parameters using a numerical optimization method (step S3).

The opinion type estimation means 25 estimates the proportions of the opinion types using the mixture distribution model whose optimal parameters were estimated in step S3 (step S4).
The utterance analysis means 30 analyzes the utterance data using the estimation result of step S4 (step S5).

The opinion type estimation device 2 according to the embodiment of the present invention optimizes the parameters of a mixture distribution model representing the opinion formation process and estimates the proportions of the opinion types, so the accuracy of the estimation result can be improved. As a result, the opinion type estimation device 2 can add opinion types automatically to opinions on the network, which reduces the labor of adding opinion types manually. Furthermore, because the utterance analysis device 1 uses a highly accurate estimation result, it can obtain a good analysis result for the utterance data.

(Modification)
The opinion type estimation device according to the present invention is not limited to the embodiment described above and can be modified without departing from its gist.
In the embodiment described above, the "agree" distribution model 90 has a single peak t_1 as shown in FIG. 3, but the invention is not limited to this. A single-peak "agree" distribution model 90 expresses that utterances about a matter surge once and then subside. In practice, however, utterances about a matter may surge again after they have subsided.

To express this, as shown in FIG. 5, the "agree" distribution model 90a may be set to have two peaks t_11 and t_12 by superimposing two Gaussian distribution models. Likewise, the "disagree" distribution model 91a may be set to have two peaks t_21 and t_22 by superimposing two Gaussian distribution models. The two-component mixture distribution model of FIG. 5 can then be expressed by formula (5). Formula (5) requires the same constraint as formula (1)″ described above.

In formula (5), for the "agree" distribution model 90a, t_11 is the peak time of the Gaussian distribution model p_11(t), α_11 is the rapidity of the surge in p_11(t), and k_1 is the ratio between the Gaussian distribution models p_11(t) and p_12(t) (where 0 < k_1 < 1).
Also for the "agree" distribution model 90a, t_12 is the peak time of the Gaussian distribution model p_12(t), and α_12 is the rapidity of the surge in p_12(t).

For the "disagree" distribution model 91a, t_21 is the peak time of the Gaussian distribution model p_21(t), α_21 is the rapidity of the surge in p_21(t), and k_2 is the ratio between the Gaussian distribution models p_21(t) and p_22(t) (where 0 < k_2 < 1).
Also for the "disagree" distribution model 91a, t_22 is the peak time of the Gaussian distribution model p_22(t), and α_22 is the rapidity of the surge in p_22(t).

In this case, the mixture distribution model management means 23 stores and manages the two-component mixture distribution model of FIG. 5. The parameter optimization means 24 estimates the parameters t_11, t_12, t_21, t_22, α_11, α_12, α_21, α_22, k_1, k_2, and β of formula (5). The opinion type estimation means 25 then estimates the proportions of the opinion types using formula (5) with the optimal parameters.
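The two-peak variant can be sketched as below. Formula (5) is not reproduced in this text, so the structure shown (each opinion type as a k-weighted sum of two Gaussians, the two types then combined with β) is an assumption based on the parameter descriptions, as is the Gaussian form. The function names are hypothetical.

```python
import math
from typing import Tuple

Peaks = Tuple[float, float]
# One opinion type: ((peak_a, peak_b), (alpha_a, alpha_b), k)
Component = Tuple[Peaks, Peaks, float]


def gauss_pdf(t: float, peak: float, alpha: float) -> float:
    # Assumed normalized Gaussian: sqrt(alpha / pi) * exp(-alpha * (t - peak)^2).
    return math.sqrt(alpha / math.pi) * math.exp(-alpha * (t - peak) ** 2)


def two_peak_pdf(t: float, peaks: Peaks, alphas: Peaks, k: float) -> float:
    """Two superimposed Gaussians with weights k and (1 - k), 0 < k < 1."""
    return (k * gauss_pdf(t, peaks[0], alphas[0])
            + (1.0 - k) * gauss_pdf(t, peaks[1], alphas[1]))


def mixture_pdf(t: float, agree: Component, disagree: Component,
                beta: float) -> float:
    """Mix the "agree" and "disagree" two-peak models with ratio beta."""
    return (beta * two_peak_pdf(t, *agree)
            + (1.0 - beta) * two_peak_pdf(t, *disagree))
```

With weights k and (1 − k) each two-peak model still integrates to 1, which is one way to satisfy the constraint that the text says formula (5) shares with formula (1)″.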

In FIG. 3, the "agree" distribution model 90 and the "disagree" distribution model 91 may each include three or more peaks. The peak time t_2 of the "disagree" distribution model 91 may precede the peak time t_1 of the "agree" distribution model 90. The two-component mixture distribution model is not limited to the examples of FIG. 3 and FIG. 5.

In the above-described embodiment, there are two opinion types, "agree" and "disagree", but the invention is not limited to this.
For example, there may be three opinion types: "agree", "neutral", and "disagree". In this case, the mixed distribution model management means stores and manages a three-mixed distribution model.
There may also be four or more opinion types. In this case, the mixed distribution model management means stores and manages a mixed distribution model with four or more components.
That is, if there are n opinion types, the mixed distribution model management means stores and manages an n-mixed distribution model (n is an integer of 2 or more).
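The generalization to n opinion types can be sketched as follows. This is only an illustration under assumptions: each opinion type is given a single Gaussian of the assumed form sqrt(α/π)·exp(−α(t − peak)²), the mixing weights are assumed to be positive and to sum to 1, and the function names `n_mixture` and `type_shares` are hypothetical.

```python
import math


def gaussian(t, peak, alpha):
    """Assumed Gaussian form (not taken from the patent)."""
    return math.sqrt(alpha / math.pi) * math.exp(-alpha * (t - peak) ** 2)


def n_mixture(t, weights, peaks, alphas):
    """Density of an n-mixed model with one Gaussian per opinion type (n >= 2)."""
    if not (len(weights) == len(peaks) == len(alphas)) or len(weights) < 2:
        raise ValueError("need n >= 2 opinion types with matching parameter lists")
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return sum(w * gaussian(t, p, a) for w, p, a in zip(weights, peaks, alphas))


def type_shares(t, weights, peaks, alphas):
    """Share of each opinion type in the mixture density at time t."""
    total = n_mixture(t, weights, peaks, alphas)
    return [w * gaussian(t, p, a) / total for w, p, a in zip(weights, peaks, alphas)]
```

For three types ("agree", "neutral", "disagree"), a call such as `type_shares(3.0, [0.5, 0.2, 0.3], [2.0, 4.0, 6.0], [1.0, 0.5, 0.8])` returns three shares that sum to 1; the weights and peak times here are made-up values.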

In the above-described embodiment, the distribution model for each opinion type has been described as a Gaussian distribution model, but the invention is not limited to this. For example, the parameter optimization means can use the generalized hyperbolic distribution model gh(x) of the following equation (6) as the distribution model for each opinion type. With the generalized hyperbolic distribution model, the opinion formation process can be modeled even when the time course before the peak and the time course after the peak are not symmetric.

In equation (6), μ represents the peak time, δ represents the scale (vertical and horizontal), and λ represents the order of the modified Bessel function of the third kind K_λ(x). Here, α and γ are parameters that determine the shape of the distribution: α affects the kurtosis, and γ affects the skewness (asymmetry).
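For reference, the generalized hyperbolic density is commonly written in the following textbook form. This is a sketch under assumptions, not a reproduction of equation (6): it assumes that γ here plays the role of the asymmetry parameter usually denoted β in the literature, and the auxiliary symbol κ is introduced only for the normalizing constant.

```latex
gh(x;\lambda,\alpha,\gamma,\delta,\mu)
= \frac{(\kappa/\delta)^{\lambda}}{\sqrt{2\pi}\,K_{\lambda}(\delta\kappa)}\,
  e^{\gamma (x-\mu)}\,
  \frac{K_{\lambda-1/2}\!\left(\alpha\sqrt{\delta^{2}+(x-\mu)^{2}}\right)}
       {\left(\sqrt{\delta^{2}+(x-\mu)^{2}}\,/\,\alpha\right)^{1/2-\lambda}},
\qquad \kappa=\sqrt{\alpha^{2}-\gamma^{2}},\quad |\gamma|<\alpha .
```

In this form, γ = 0 gives a density symmetric about μ, while γ ≠ 0 tilts it so that the rise on one side and the decay on the other differ.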

In the above-described embodiment, the opinion type estimation device has been described as independent hardware, but the present invention is not limited to this. For example, the opinion type estimation device can also be realized by an opinion type estimation program that causes the hardware resources of a computer, such as its CPU, memory, and hard disk, to operate cooperatively as the analysis target selection means, the opinion type addition means, the mixed distribution model management means, the parameter optimization means, and the opinion type estimation means.

The opinion type estimation device according to the present invention can be used to analyze comments posted on a network by, for example, politicians or administrative agencies whose policies are under scrutiny, artists whose works are being evaluated, and businesses whose products' reputations are at stake.

DESCRIPTION OF SYMBOLS
1 Comment analysis apparatus
2 Opinion type estimation device
21 Analysis target selection means
22 Opinion type addition means
23 Mixed distribution model management means (mixed distribution model storage means)
24 Parameter optimization means
25 Opinion type estimation means
30 Comment analysis means

Claims

An opinion type estimation device for estimating statistical information about the opinion types of utterance data, the opinion types being preset for utterance data representing the content of opinions, the device comprising:
mixed distribution model storage means for storing in advance a mixed distribution model that represents the opinion formation process as a function of utterance time, with the mixing ratio, peak time, and rapidity of the rise of each opinion type as parameters, and for storing, for each opinion type, the utterance data to which utterance times are attached;
parameter optimization means for applying parameter data to the mixed distribution model of the mixed distribution model storage means and estimating the parameters by a numerical optimization method; and
opinion type estimation means for estimating the statistical information about the opinion types using the mixed distribution model whose parameters have been estimated by the parameter optimization means.

The opinion type estimation device according to claim 1, wherein the mixed distribution model storage means stores in advance a mixed distribution model in which Gaussian distributions p_n(t) of the following formula (1)′ are mixed, each including the peak time t_n of the n-th opinion type and the rapidity α_n of the rise of the n-th opinion type (where n is a natural number).

The opinion type estimation device according to claim 2, wherein there are two opinion types, and the mixed distribution model storage means stores in advance a two-mixed distribution model p(t) of the following formula (2), which takes the peak time t₁ of the first opinion type, the rapidity α₁ of the rise of the first opinion type, the peak time t₂ of the second opinion type, and the rapidity α₂ of the rise of the second opinion type as arguments of formula (1)′, and which includes the mixing ratio β (0 < β < 1) of the first opinion type and the second opinion type;

and wherein the parameter optimization means estimates the parameters t₁, t₂, α₁, α₂, and β (where the variables following ';' are arguments of formula (1)′).

The opinion type estimation device according to claim 2, wherein the mixed distribution model storage means stores in advance, as shown in the following formula (5), a two-mixed distribution model p(t) of a distribution model in which two Gaussian distributions p₁₁(t) and p₁₂(t) are superimposed for the first opinion type and a distribution model in which two Gaussian distributions p₂₁(t) and p₂₂(t) are superimposed for the second opinion type,

and wherein the parameter optimization means estimates, as the parameters:
the mixing ratio β of the first opinion type and the second opinion type;
the ratio k₁ (0 < k₁ < 1) of the Gaussian distribution models p₁₁(t) and p₁₂(t);
the peak time t₁₁ and the rapidity α₁₁ of the rise in the Gaussian distribution model p₁₁(t);
the peak time t₁₂ and the rapidity α₁₂ of the rise in the Gaussian distribution model p₁₂(t);
the ratio k₂ (0 < k₂ < 1) of the Gaussian distribution models p₂₁(t) and p₂₂(t);
the peak time t₂₁ and the rapidity α₂₁ of the rise in the Gaussian distribution model p₂₁(t); and
the peak time t₂₂ and the rapidity α₂₂ of the rise in the Gaussian distribution model p₂₂(t).

The opinion type estimation device according to claim 1, wherein the mixed distribution model storage means stores in advance a mixed distribution model in which two or more generalized hyperbolic distribution models gh(x) of the following formula (6) are mixed, each including the peak time μ, the scale δ, the order λ of the modified Bessel function of the third kind K_λ(x), the kurtosis α, and the skewness γ (where the variables following ';' are arguments of formula (6)).

An opinion type estimation program for causing a computer to function as the opinion type estimation device according to claim 1.