JP5496863B2

JP5496863B2 - Emotion estimation apparatus, method, program, and recording medium

Info

Publication number: JP5496863B2
Application number: JP2010262527A
Authority: JP
Inventors: 済央野本; 浩和政瀧; 理吉岡; 敏高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-11-25
Filing date: 2010-11-25
Publication date: 2014-05-21
Anticipated expiration: 2030-11-25
Also published as: JP2012113542A

Description

The present invention relates to an emotion estimation apparatus that estimates the emotion of anger from target speech data or target text data, and to a method, a program, and a recording medium therefor.

Non-Patent Document 1 is known as a conventional technique for estimating the emotion of anger from target text data. In Non-Patent Document 1, words related to emotions (hereinafter, "emotion-related words") and an emotion score for each emotion-related word are stored in an emotion-related word DB, and a sentence-level emotion score is estimated by referring to this DB. For example, emotion-related words such as "child" and "summer vacation" are associated with emotion scores such as "fun" and "positive", while emotion-related words such as "war" and "accident" are associated with emotion scores such as "anger", "sad", and "negative".

Non-Patent Document 1: Hisashi Sugawara, Alena Neviarouskaya, Mitsuru Ishizuka, "Emotion Extraction from Japanese Text", Proceedings of the 23rd Annual Conference of the Japanese Society for Artificial Intelligence (CD-ROM), 2009, paper no. 3I4-2.

However, because the conventional technique stores emotion-related words and their emotion scores in an emotion-related word DB, it suffers from the following problems.

Storing enough emotion-related words in the DB for estimation incurs enormous costs in labor, time, and money. In practice, therefore, the emotion-related word DB ends up recording only about 300 emotion-related words. As a result, the target text data contains many emotion-related words that are not registered in the DB (hereinafter, "unknown words"), and the emotion score frequently cannot be estimated.

In addition, even the same emotion-related word may be associated with different emotions depending on the context and the way it is used (hereinafter, "context etc."), so the emotion may be estimated incorrectly. For example, in the sentence "Your attitude is just like a child's", "child" is associated with a "negative" emotion, but it may be registered in the emotion-related word DB as a word associated with a "positive" emotion, which causes an incorrect estimate.

The problem to be solved is therefore to provide an emotion estimation apparatus, and a method, program, and recording medium therefor, that can estimate emotion from target speech data or target text data without using a database that records the semantic information of words, such as an emotion-related word DB. With a configuration that does not use such a database, neither the problem that emotion cannot be estimated from the target text data because of unknown words nor the problem that emotion is estimated incorrectly because of context etc. arises.

To solve the above problem, the present invention uses at least one of a morphological analysis result and a syntactic analysis result of the target text data to estimate an explicitness feature that represents how explicitly the semantic content is stated in the target text data. Then, using a classifier trained in advance on the explicitness features of training text data accompanied by supervisory labels indicating whether the emotion of anger is expressed, it identifies, from the explicitness feature of the target text data estimated by the explicitness estimation unit, whether the emotion of anger is expressed in the target text data.

The emotion estimation apparatus according to the present invention can estimate the emotion of anger expressed in target speech data or target text data without being affected by unknown words or by context etc.

FIG. 1: Block diagram showing an example functional configuration of the emotion estimation apparatus 100.
FIG. 2: Diagram showing an example processing flow of the emotion estimation apparatus 100.
FIG. 3: Diagram showing an example of a morphological analysis result.
FIG. 4: Diagram showing an example of a syntactic analysis result.
FIG. 5: Diagram showing an example of a syntactic analysis result input to the omitted case estimation unit.
FIG. 6: Diagram showing an example of a morphological analysis result input to the replacement word estimation unit or the explicit word detection unit.
FIG. 7: Diagram showing example data in the explicit word list storage unit.
FIG. 8: Block diagram showing an example functional configuration of the emotion estimation apparatus 100 when generating a classifier.
FIG. 9: Diagram showing an example processing flow of the emotion estimation apparatus 100 when generating a classifier.
FIG. 10: Block diagram showing an example functional configuration of the emotion estimation apparatuses 200 and 300.
FIG. 11: Diagram showing an example processing flow of the emotion estimation apparatuses 200 and 300.
FIG. 12: Block diagram showing an example functional configuration of the emotion estimation apparatuses 200 and 300 when generating a classifier.
FIG. 13: Diagram showing an example processing flow of the emotion estimation apparatuses 200 and 300 when generating a classifier.
FIG. 14: Block diagram showing an example functional configuration of the monitoring system 400.
FIG. 15: Diagram showing an example of output data of the monitoring system 400.
FIG. 16: Block diagram showing an example functional configuration of the monitoring system 500.
FIG. 17: Diagram showing an example of output data of the monitoring system 500.

[Points of the Invention]
In the present invention, to estimate the emotion of anger expressed in target speech data or target text data, attention is paid to the explicitness of sentences, and in particular to the explicitness of utterance content in ordinary dialogue. The explicitness of utterance content refers to how much the subject or object is omitted, or replaced with a demonstrative such as "それ" ("it") or "これ" ("this"), in an utterance. When no such omission or replacement occurs, the utterance content is said to have "high explicitness"; when many occur, it is said to have "low explicitness".

To show that the explicitness of utterance content differs between angry and non-angry utterances, the following analysis results are described.

For a total of 6,679 utterances from a call center, labels of angry utterance (3,312 utterances) and normal utterance (3,367 utterances) are assigned manually. A classifier is trained on the labeled utterances with an SVM (support vector machine). From the words (unigrams) found by a chi-square test at the 1% significance level to be skewed toward angry utterances, service names, personal names, numbers, business terms, and the like are removed manually, yielding a word list of 256 words. The feature used for each word is the number of times it occurs in the utterance divided by the total number of morphemes in the utterance.
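The normalization described above (occurrence count divided by the total number of morphemes in the utterance) can be sketched as follows. This is a minimal illustration, not the implementation used in the analysis: the function name `normalized_unigram_features`, the pre-segmented utterance, and the three-word list are all hypothetical, and the utterance is assumed to have already been split into morphemes.

```python
from collections import Counter


def normalized_unigram_features(morphemes, word_list):
    """Return {word: count / total morphemes} for each word in word_list.

    `morphemes` is assumed to be an utterance already segmented into
    morpheme surface forms; an empty utterance yields all-zero features.
    """
    total = len(morphemes)
    counts = Counter(morphemes)
    if total == 0:
        return {word: 0.0 for word in word_list}
    return {word: counts[word] / total for word in word_list}


# Hypothetical segmented utterance and a hypothetical three-word list
# (the analysis in the text used a list of 256 words).
utterance = ["私", "は", "お宅", "に", "電話", "し", "た"]
word_list = ["私", "お宅", "わたしら"]
features = normalized_unigram_features(utterance, word_list)
```

Dividing by the utterance length keeps long and short utterances comparable, which matters when the resulting vectors are fed to a margin-based classifier such as an SVM.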

With the word list as a parameter, an utterance to be analyzed is input to the trained classifier, which classifies it either as an angry utterance or a normal utterance, or as an utterance that is the target of anger or one that is not. Call center utterances include both customer utterances and operator utterances: customer utterances are classified into angry and normal utterances, and operator utterances are classified into those that are the target of anger and those that are not. Here, an utterance that is the target of anger means an operator utterance immediately following an angry customer utterance; all other utterances are treated as not being the target of anger. The analysis results are shown below.

For reference, the precision under random extraction is 50.5%; the results indicate that angry and non-angry utterance content differ in their linguistic features. Words that appeared disproportionately in angry utterances include the first-person "私" ("I") and "わたしら" ("we") and the second-person "お宅" ("you").

When native speakers of Japanese talk with others, words such as "あなたは" ("you") and "私は" ("I") are often omitted. Likewise, when referring to some object, speakers often replace it with a demonstrative or omit it altogether. This is because Japanese is a language that tolerates ambiguity as far as possible, which stems from a Japanese social culture that accepts ambiguity in interpersonal communication. When the speaker's emotional state is "anger", however, the speaker tries to eliminate ambiguity and make the target of the anger clearer. For example, second-person nouns such as "あなた" ("you") rarely appear in ordinary communication, but appear frequently when the speaker is angry. Similarly, words that would be omitted in ordinary speech are often stated explicitly when the speaker is angry. For example, when a child has spilled soy sauce, what the mother says to the child differs between a normal utterance and an angry utterance as follows.
Normal utterance: 「こぼしたの？」 ("Spilled it?")
Angry utterance: 「あなたが醤油こぼしたの？」 ("Did you spill the soy sauce?")
The present invention focuses on this phenomenon. It calculates, as the explicitness of the utterance content, how much omission or replacement with demonstratives occurs in the utterance, and estimates that the speaker is angry when the explicitness of the utterance content is high.

Embodiments of the present invention are described in detail below.

<Emotion estimation apparatus 100>
The emotion estimation apparatus 100 according to the first embodiment is described with reference to FIGS. 1 and 2. The emotion estimation apparatus 100 includes a text analysis unit 130, an explicitness estimation unit 140, a classifier generation unit 150, and an emotion identification unit 160. With this configuration, the emotion estimation apparatus 100 estimates the emotion of anger from target text data. The span of target text data for which an emotion is to be estimated is called the emotion estimation unit. The emotion estimation unit may be a single sentence, multiple sentences, or a specific span within a sentence. Each component is described in detail below.

<Text analysis unit 130>
The text analysis unit 130 analyzes the target text data and obtains a morphological analysis result for the target text data and a syntactic analysis result produced by analyzing the dependency relations in the target text data on the basis of the morphological analysis result (s130). For example, the text analysis unit 130 includes a morphological analysis unit 131 and a syntactic analysis unit 133. Each is described in detail below.

(Morphological analysis unit 131)
The morphological analysis unit 131 takes the target text data as input, performs morphological analysis on it to obtain a morphological analysis result (s131), and outputs the result to the syntactic analysis unit 133, the replacement word estimation unit 143, and the explicit word detection unit 144. A morpheme is the smallest linguistically meaningful unit. Morphological analysis is the task of dividing a sentence written in natural language into a sequence of morphemes and determining the part of speech of each, using as information sources knowledge of the grammar of the target language (a collection of grammar rules) and a dictionary (a word list annotated with information such as parts of speech). The grammar knowledge and the dictionary are assumed to be stored in a storage unit (not shown). Conventional techniques can be used for morphological analysis, for example the technique described in Makoto Nagao (ed.), "Natural Language Processing", Iwanami Lecture Series on Software Science, Vol. 15, Iwanami Shoten, 1996 (hereinafter, "Reference 1").

For example, the morphological analysis unit 131 divides the target text data into morphemes, determines the part of speech of each morpheme, and outputs the target text data with a part of speech attached to each morpheme as the morphological analysis result. For example, morphological analysis of the target text data "私は少女を見た" ("I saw a girl") yields the result shown in FIG. 3. In FIG. 3, "/" denotes a morpheme boundary.

(Syntactic analysis unit 133)
The syntactic analysis unit 133 takes the morphological analysis result as input, parses it to obtain a syntactic analysis result (s133), and outputs the result to the omitted case estimation unit 141. For example, the syntactic analysis unit 133 identifies the phrases (bunsetsu) from the morphological analysis result, then analyzes the dependency relations between phrases, such as which phrase is the subject and which is the predicate, and adds this analysis to the morphological analysis result to form the syntactic analysis result. For example, parsing the morphological analysis result of FIG. 3 yields the syntactic analysis result shown in FIG. 4 (the morphological analysis result annotated with phrase-level dependency relations, subject/predicate roles, and so on). Conventional techniques (for example, the technique described in Reference 1) can be used for syntactic analysis.

<Explicitness estimation unit 140>
The explicitness estimation unit 140 uses the morphological analysis result obtained from the target text data and the syntactic analysis result produced by analyzing the dependency relations in the target text data on the basis of the morphological analysis result to estimate an explicitness feature that represents how explicitly the semantic content is stated in the target text data (s140). For example, the explicitness estimation unit 140 includes an omitted case estimation unit 141, a replacement word estimation unit 143, an explicit word detection unit 144, an explicit word list storage unit 144a, and an explicitness feature calculation unit 145. Each is described in detail below.

(Omitted case estimation unit 141)
The omitted case estimation unit 141 uses the syntactic analysis result to estimate, as omitted cases, the cases that should be present in the target text data but have been omitted (s141), and outputs the types of omitted cases and the number of omissions of each to the explicitness feature calculation unit 145.

One way to estimate omitted cases is to use a case frame dictionary. For example, omitted cases can be estimated with a case frame dictionary by the conventional technique described in Daisuke Kawahara and Sadao Kurohashi, "Large-Scale Evaluation of Ellipsis Analysis Based on an Automatically Constructed Case Frame Dictionary", Proceedings of the 9th Annual Meeting of the Association for Natural Language Processing, pp. 589-592, 2003 (hereinafter, "Reference 2"). Instead of an automatically constructed case frame dictionary, an existing one may be used, for example the IPAL verb dictionary (Information-technology Promotion Agency Technology Center, "Basic Japanese Verb Dictionary for Computers IPAL", 1987). In either case, the case frame dictionary is stored in advance in a storage unit (not shown).

A case is a marker of the semantic relation between words in a sentence, in particular between a verb and a noun (or noun phrase). A case frame represents a pattern of case structure that a verb can take, and a case frame dictionary is a database that lists the case frames of individual verbs. Cases include surface cases, which are determined from the surface form (in Japanese, the "ga" case (ガ格), the "wo" case (ヲ格), the "ni" case (ニ格), and so on), and deep cases, which represent the true case and cannot be determined from the surface form alone. When the cases a verb can take are defined as deep cases, the following are used, among others: (1) agent case (Agent), (2) experiencer case (Experiencer), (3) instrument case (Instrument), (4) object case (Object), (5) source case (Source), (6) goal case (Goal), (7) location case (Location), and (8) time case (Time).

Furthermore, the case frame dictionary defines obligatory cases, which a verb must always take, and optional cases, which it may take. For example, for the verb "見る" ("see"), the agent case and the object case are obligatory, while the location case and the instrument case are optional. Omitted case estimation means the process of estimating, by referring to the case frame dictionary, whether an obligatory case of a verb has been omitted. When omitted case estimation is performed on the syntactic analysis result shown in FIG. 5, the object case, which is obligatory for "見る" ("see"), and the agent case, which is obligatory for "驚く" ("be surprised"), are found to be omitted, so the types of omitted cases and their counts, (object case, 1) and (agent case, 1), are output as the omitted case estimation result.
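The omitted case check described above can be sketched as a set difference between the obligatory cases in a case frame and the cases actually filled in the parse. The code below is only an illustration under stated assumptions: `CASE_FRAMES` is a hypothetical two-verb dictionary (the obligatory cases are those named in the text), and the `predicates` input is a hypothetical stand-in for a parse like the one in FIG. 5, whose actual sentence is not reproduced here.

```python
from collections import Counter

# Hypothetical miniature case frame dictionary: verb -> obligatory deep cases.
# The obligatory cases listed are the ones named in the description.
CASE_FRAMES = {
    "見る": {"agent", "object"},
    "驚く": {"agent"},
}


def estimate_omitted_cases(predicates, case_frames=CASE_FRAMES):
    """Count obligatory cases that are not filled for each predicate.

    `predicates` is a list of (verb, filled_cases) pairs, assumed to have
    been extracted from a syntactic analysis result. Verbs missing from
    the dictionary contribute nothing.
    """
    omitted = Counter()
    for verb, filled in predicates:
        required = case_frames.get(verb, set())
        for case in required - set(filled):
            omitted[case] += 1
    return dict(omitted)


# Hypothetical parse: "見る" has its agent but no object; "驚く" has no agent.
predicates = [("見る", {"agent"}), ("驚く", set())]
omitted_result = estimate_omitted_cases(predicates)
```

With this input the function returns one omitted object case and one omitted agent case, which has the same shape as the (object case, 1), (agent case, 1) result in the example above.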

The case frame dictionary may also define other elements, for example semantic constraints on cases.

(Replacement word estimation unit 143)
The replacement word estimation unit 143 uses the morphological analysis result to estimate, as replacement words, words that appear in the target text data in place of the words that should originally be there (s143), and outputs the types of replacement words and the number of occurrences of each to the explicitness feature calculation unit 145. For example, demonstratives are used as the replacement words. A demonstrative is an expression that points to something present in the situation or to an element of the context, such as "これ" ("this"), "それ" ("it"), and "あれ" ("that"), and is used in place of the word that should originally appear.

For example, when the replacement word estimation unit 143 performs replacement word estimation on the morphological analysis result shown in FIG. 6, the replacement word is the demonstrative "これ" ("this"), so it outputs the type of replacement word and its count, ("これ", 1), as the replacement word estimation result.

(Explicit word detection unit 144 and explicit word list storage unit 144a)
The explicit word list storage unit 144a stores in advance words that tend to be omitted in normal utterances. As explained in the Points of the Invention, some words appear disproportionately in angry utterances. For example, as noted above, the first-person "私" ("I") and "わたしら" ("we") and the second-person "お宅" ("you") tend to be omitted in normal utterances but are less likely to be omitted in angry utterances. Words with this tendency are stored in advance in the explicit word list storage unit 144a as explicit words (see FIG. 7).

The explicit word detection unit 144 takes the morphological analysis result as input, refers to the explicit word list storage unit 144a to detect the explicit words present in the target text data (s144), and outputs the types of explicit words and their counts as the explicit word detection result.

For example, when the explicit word detection unit 144 performs explicit word detection on the morphological analysis result shown in FIG. 6 using the explicit word list storage unit 144a shown in FIG. 7, the word "私" ("I") is an explicit word, so it outputs the type of explicit word and its count, ("私", 1), as the explicit word detection result.
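Both replacement word estimation and explicit word detection, as described above, amount to counting the morphemes that appear in a stored list. The sketch below illustrates this with plain surface-form matching. It rests on several assumptions: the two word lists are hypothetical samples built from words mentioned in the text, the helper `count_listed` is an illustrative name, and the segmented sentence merely stands in for the FIG. 6 input, whose actual content is not reproduced here. A real system might also consult part-of-speech tags.

```python
from collections import Counter

# Hypothetical lists; a real system would load them from a storage unit.
DEMONSTRATIVES = {"これ", "それ", "あれ"}
EXPLICIT_WORDS = {"私", "わたしら", "お宅"}


def count_listed(morphemes, listed):
    """Return {word: count} for the morphemes that appear in `listed`."""
    return dict(Counter(m for m in morphemes if m in listed))


# Hypothetical segmentation standing in for the FIG. 6 input.
morphemes = ["私", "は", "これ", "を", "見", "た"]
replacement_result = count_listed(morphemes, DEMONSTRATIVES)
explicit_result = count_listed(morphemes, EXPLICIT_WORDS)
```

For this input the two results are one occurrence of "これ" and one occurrence of "私", matching the form of the example outputs in the text.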

(Method of creating the explicit word list)
The explicit words stored in the explicit word list storage unit 144a may be limited to pronouns and the like (for example, the personal pronouns "あなた" ("you") and "私" ("I")). This is because pronouns and the like tend to be omitted in normal utterances and are less likely to be omitted in angry utterances, and this tendency is not affected by the domain. Here, a domain means the medium from which the training text data or target text data is obtained (newspapers, magazines, speech recognition results of call center calls, speech recognition results of TV, etc.) or its field (sports articles, political articles, economic articles, speech recognition results of calls at an insurance company's call center, speech recognition results of calls at a telecommunications company's call center, etc.). In this case, by targeting only pronouns and the like, there is no need to cover every word, so the cost of creating the explicit word list can be reduced. The cost of creating a separate list for each domain is also reduced.

Alternatively, the explicit words stored in the explicit word list storage unit 144a may be determined by the method described in the Points of the Invention or a similar method. First, labels of angry utterance and normal utterance are assigned manually to sample text data. Next, from the words (unigrams) found by a chi-square test at the 1% significance level to be skewed toward angry utterances, service names, personal names, numbers, business terms, and the like are removed manually, and the remaining words are taken as explicit words. This list of explicit words is stored in advance in the explicit word list storage unit 144a. The significance level of the chi-square test may be set as appropriate after determining a suitable value in advance through experiments or the like.
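The chi-square selection step can be sketched as follows. The text does not specify how the contingency table is built, so this sketch assumes a 2x2 table per word (angry/normal utterances by contains/does not contain the word) and the standard Pearson statistic without continuity correction; 6.635 is the chi-square critical value for one degree of freedom at the 1% level. The function names, the per-word counts, and the excluded term are hypothetical; only the utterance totals (3,312 angry, 3,367 normal) come from the analysis described earlier.

```python
# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.01.
CHI2_CRIT_1PCT_DF1 = 6.635


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_explicit_words(angry_counts, normal_counts, n_angry, n_normal,
                          excluded=()):
    """Keep words significantly skewed toward angry utterances.

    `angry_counts` / `normal_counts` map a word to the number of angry /
    normal utterances containing it (an assumption of this sketch).
    `excluded` plays the role of the manual removal of service names etc.
    """
    selected = []
    for word, a in angry_counts.items():
        if word in excluded:
            continue
        c = normal_counts.get(word, 0)
        skewed_to_angry = a / n_angry > c / n_normal
        stat = chi_square_2x2(a, n_angry - a, c, n_normal - c)
        if skewed_to_angry and stat > CHI2_CRIT_1PCT_DF1:
            selected.append(word)
    return selected


# Hypothetical per-word utterance counts.
angry = {"私": 400, "は": 2000, "サービスA": 300}
normal = {"私": 150, "は": 2030, "サービスA": 100}
explicit_words = select_explicit_words(angry, normal, 3312, 3367,
                                       excluded={"サービスA"})
```

The direction check is needed because the chi-square statistic alone is symmetric: it would also flag words skewed toward normal utterances.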

The explicit word list storage unit 144a may also store, for each type of case (that is, for the agent case, the object case, and so on), the words that tend to be omitted in normal utterances and are less likely to be omitted in angry utterances as explicit words. In that case, the words are classified by case type in advance when the explicit word list described above is created. The explicit word detection unit 144 then uses the syntactic analysis result and refers to the explicit word list storage unit 144a to detect, as explicit words, the words for which both the word and its case type match an entry.

（明示性特徴量算出部１４５）
明示性特徴量算出部１４５は、省略格推定結果と置換単語推定結果と明示単語検出結果とを用いて、明示性特徴量を算出し（ｓ１４５）、感情識別部１６０へ出力する。例えば、明示性特徴量算出部１４５は、省略格推定結果と置換単語推定結果と明示単語検出結果とを用いて得られる以下の値の何れか、または、それらの組合せを明示性特徴量として算出する。
（１）省略格毎の省略回数
（２）省略格毎の省略の有無（「０」「１」情報）
（３）省略格毎でなく、感情推定単位中に含まれる全ての省略格の合計省略回数
（４）感情推定単位における省略格の有無（「０」「１」情報）
（５）置換単語毎の出現回数
（６）置換単語毎の出現の有無（「０」「１」情報）
（７）置換単語毎でなく、感情推定単位中に含まれる全ての置換単語の合計出現回数
(Explicit feature quantity calculation unit 145)
The explicit feature quantity calculation unit 145 calculates an explicit feature quantity from the omitted case estimation result, the replacement word estimation result, and the explicit word detection result (s145), and outputs it to the emotion identification unit 160. For example, the unit calculates any one of the following values, or a combination of them, as the explicit feature quantity.
(1) Number of omissions for each omitted case
(2) Presence or absence of omission for each omitted case ("0"/"1" information)
(3) Total number of omissions over all omitted cases in the emotion estimation unit, rather than per omitted case
(4) Presence or absence of any omitted case in the emotion estimation unit ("0"/"1" information)
(5) Number of appearances of each replacement word
(6) Presence or absence of each replacement word ("0"/"1" information)
(7) Total number of appearances of all replacement words in the emotion estimation unit, rather than per replacement word
(8) Presence or absence of any replacement word in the emotion estimation unit ("0"/"1" information)
(9) Number of appearances of each explicit word
(10) Presence or absence of each explicit word ("0"/"1" information)
(11) Total number of appearances of all explicit words in the emotion estimation unit, rather than per explicit word
(12) Presence or absence of any explicit word in the emotion estimation unit ("0"/"1" information)
When one or more of (1), (5), and (9) are used as the explicit feature quantity, the omitted case estimation result, the replacement word estimation result, and the explicit word detection result need no further processing. In that case the explicit feature quantity calculation unit 145 may be omitted, and the omitted case estimation unit 141, the replacement word estimation unit 143, and the explicit word detection unit 144 may output their respective results directly to the emotion identification unit 160 as the explicit feature quantity.
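
The twelve values above follow one pattern per category: a per-item count, a per-item flag, a total, and an overall flag. A minimal sketch of that pattern, assuming the estimation results are available as simple occurrence counts; the function name `explicit_features`, the `keys` inventory, and the sample words are hypothetical and not taken from the patent.

```python
from collections import Counter


def explicit_features(counts, keys):
    """Build per-item and aggregate features from occurrence counts.

    counts: mapping from an item (omitted case, replacement word, or
            explicit word) to its count in one emotion estimation unit.
    keys:   fixed, ordered inventory of items (an assumption) so that
            every feature vector has the same layout.
    """
    per_item_count = [counts.get(k, 0) for k in keys]           # (1), (5), (9)
    per_item_flag = [1 if c > 0 else 0 for c in per_item_count]  # (2), (6), (10)
    total = sum(per_item_count)                                  # (3), (7), (11)
    any_flag = 1 if total > 0 else 0                             # (4), (8), (12)
    return {
        "count": per_item_count,
        "flag": per_item_flag,
        "total": total,
        "any": any_flag,
    }


# Hypothetical replacement-word occurrences in one emotion estimation unit.
replacement_counts = Counter(["it", "that", "it"])
features = explicit_features(replacement_counts, keys=["it", "that", "this"])
print(features)
```

The same function can be called once per category (omitted cases, replacement words, explicit words), and the selected values concatenated into one feature vector.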

Increasing the number of elements in the explicit feature quantity raises the identification accuracy of the emotion identification unit 160 described later, but also increases the amount of computation. To select suitable elements, sample text data may therefore be prepared to match the target text data, and the elements of the explicit feature quantity may be determined in advance by experiment or similar means.

<Emotion identification unit 160>
The emotion identification unit 160 takes the explicit feature quantity as input. Using a discriminator trained in advance by the discriminator generation unit 150 described later, it identifies, from the explicit feature quantity of the target text data estimated by the explicitness estimation unit 140, whether an emotion of anger is expressed in the target text data (s160).

The emotion identification unit 160 may output the discriminator's identification result ("anger" or "normal") unchanged as the estimation result of the emotion estimation apparatus 100. Alternatively, when the discriminator also outputs a likelihood for its identification result, the estimation result may be output as follows. (A) When the identification result is "anger" and the likelihood is at or above a first threshold (for example, 0.7), the estimation result is "anger". (B) When the identification result is "anger" and the likelihood is below the first threshold but at or above a second threshold (for example, 0.3), the estimation result is "cannot be estimated". (C) Otherwise (that is, when the identification result is "normal" or the likelihood is below the second threshold), the estimation result is "normal".
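
Rules (A) to (C) amount to a three-way decision on the discriminator's label and likelihood. A minimal sketch, assuming the label arrives as the string "anger" or "normal"; the function name `decide` and the returned string for the "cannot be estimated" case are illustrative choices, and the default thresholds are the example values 0.7 and 0.3 from the text.

```python
def decide(label, likelihood, first_threshold=0.7, second_threshold=0.3):
    """Map a discriminator label and its likelihood to an estimation result."""
    if label == "anger" and likelihood >= first_threshold:
        return "anger"                # rule (A)
    if label == "anger" and likelihood >= second_threshold:
        return "cannot be estimated"  # rule (B)
    return "normal"                   # rule (C)


print(decide("anger", 0.9))
print(decide("anger", 0.5))
print(decide("anger", 0.1))
print(decide("normal", 0.9))
```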

<Discriminator generation unit 150>
The discriminator generation unit 150 generates a discriminator from the explicit feature quantities of learning text data with teacher signals (s150 in FIG. 9). A teacher signal is information indicating whether an emotion of anger is expressed in the corresponding learning text data. A person reads each item of learning text data, judges whether anger is expressed, and attaches the teacher signal to it.

Machine learning is one way to generate the discriminator. Various learning algorithms can be adopted, for example supervised methods such as the linear discriminant method, SVMs, and neural networks.
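
To make the training step concrete, the sketch below trains a single-layer perceptron, chosen here only as a simple stand-in for the linear discriminant, SVM, or neural network methods mentioned above; it is not the patent's prescribed algorithm. The feature layout and the toy values are invented for illustration and make no claim about how anger relates to explicitness.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Train a linear discriminator.

    samples: list of explicit feature vectors (lists of numbers).
    labels:  teacher signals, 1 for "anger" and 0 for "normal".
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * v for w, v in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = y - predicted  # 0 when the sample is already correct
            weights = [w + learning_rate * error * v for w, v in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def classify(weights, bias, x):
    """Apply the trained discriminator to one feature vector."""
    activation = sum(w * v for w, v in zip(weights, x)) + bias
    return "anger" if activation > 0 else "normal"


# Toy vectors: [total omitted cases, total replacement words, total explicit words]
train_x = [[3, 2, 0], [4, 1, 0], [0, 0, 2], [1, 0, 3]]
train_y = [1, 1, 0, 0]
weights, bias = train_perceptron(train_x, train_y)
print([classify(weights, bias, x) for x in train_x])
```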

<Method for calculating the explicit feature quantity of learning text data>
The method for calculating the explicit feature quantity of the learning text data used by the discriminator generation unit 150 is described with reference to FIGS. 8 and 9.

The text analysis unit 130 analyzes the learning text data input from the learning text corpus 90 and obtains a morphological analysis result for that data, together with a syntax analysis result in which the dependency relations in the learning text data are analyzed on the basis of the morphological analysis result (s130-2).

Using the morphological analysis result obtained from the learning text data and the syntax analysis result in which the dependency relations in the learning text data are analyzed on the basis of that morphological analysis result, the explicitness estimation unit 140 estimates an explicit feature quantity indicating the degree to which the semantic content is made explicit in the learning text data (s140-2).

In other words, no special configuration is needed: by having the text analysis unit 130 and the explicitness estimation unit 140 described above perform the same processing on the learning text data instead of the target text data, the explicit feature quantity of the learning text data used by the discriminator generation unit 150 can be obtained.

<Effect>
With this configuration, the emotion of anger can be estimated without a database that records the semantic information of words (for example, an emotion related word DB). This reduces the cost of creating an emotion related word DB or the like, and the anger expressed in the target text data can be estimated robustly, without being affected by unknown words, context, and so on.

[Modification]
When morphological analysis, syntax analysis, and the like have already been performed on the target text data by a separate device, and the morphological analysis result and syntax analysis result of the target text data are input to the emotion estimation apparatus 100, the emotion estimation apparatus 100 need not include the text analysis unit 130.

The explicitness estimation unit 140 need only include, as required, at least one of the omitted case estimation unit 141, the replacement word estimation unit 143, and the explicit word detection unit 144. For example, when any one of (1) to (4) described for the explicit feature quantity calculation unit 145, or a combination of (1) to (4), is calculated as the explicit feature quantity, at least the omitted case estimation unit 141 is required. When any one of (5) to (8), or a combination of (5) to (8), is calculated, at least the replacement word estimation unit 143 is required. When any one of (9) to (12), or a combination of (9) to (12), is calculated, at least the explicit word detection unit 144 is required. When the omitted case estimation unit 141 is not provided, the syntax analysis result is unnecessary, so the syntax analysis unit 133 may be omitted; when the explicit word detection unit 144 is not provided, the explicit word list storage unit 144a may be omitted.
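
The dependency between the selected feature values and the sub-units can be written as a small lookup. This is a sketch only: the function name `required_units` and the returned tuple layout are hypothetical, and the integers are simply the reference numerals used in the text.

```python
def required_units(feature_ids):
    """Return which sub-units a selection of features (1)-(12) depends on.

    Returns (sorted unit numerals, syntax analysis unit 133 needed,
    explicit word list storage unit 144a needed).
    """
    units = set()
    for fid in feature_ids:
        if 1 <= fid <= 4:
            units.add(141)  # omitted case estimation unit
        elif 5 <= fid <= 8:
            units.add(143)  # replacement word estimation unit
        elif 9 <= fid <= 12:
            units.add(144)  # explicit word detection unit
        else:
            raise ValueError(f"unknown feature id: {fid}")
    needs_syntax_analysis = 141 in units  # only unit 141 uses the syntax analysis result
    needs_word_list = 144 in units        # only unit 144 uses the explicit word list
    return sorted(units), needs_syntax_analysis, needs_word_list


print(required_units([5, 6]))
print(required_units([1, 9]))
```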

The emotion estimation apparatus 100 does not necessarily have to include the discriminator generation unit 150. For example, when a discriminator trained and generated by one emotion estimation apparatus is used by another, only the former needs a discriminator generation unit.

The emotion estimation apparatus 100 may take target speech data as input. In that case, a speech recognition unit (not shown) is placed before the text analysis unit 130. Furthermore, when this speech recognition unit performs morphological analysis and syntax analysis during speech recognition, the morphological analysis result and syntax analysis result corresponding to the target text data obtained by speech recognition may be used by the explicitness estimation unit 140. In this case too, the emotion estimation apparatus 100 need not include the text analysis unit 130.

<Emotion estimation apparatus 200>
The emotion estimation apparatus 200 according to the second embodiment is described with reference to FIGS. 10 and 11. Only the differences from the first embodiment are described. The emotion estimation apparatus 200 includes a speech recognition unit 210, a text analysis unit 130, an explicitness estimation unit 240, a discriminator generation unit 150, and an emotion identification unit 160. The processing of the text analysis unit 130, the discriminator generation unit 150, and the emotion identification unit 160 is the same as in the first embodiment. However, the target text data handled by the text analysis unit 130 carries the recognition reliability described later, and the text analysis unit 130 outputs a syntax analysis result and a morphological analysis result with recognition reliability (see FIG. 10). The explicit feature quantity processed by the discriminator generation unit 150 and the emotion identification unit 160 is a value obtained using this recognition reliability.

<Speech recognition unit 210>
The speech recognition unit 210 takes target speech data as input, performs speech recognition on it, and converts it into target text data. It obtains the target text data and, for each word in that text, a recognition reliability indicating how reliable the recognition result is (s210), and outputs the target text data with recognition reliability to the text analysis unit 130. The recognition reliability is a value indicating how trustworthy the likelihood of a recognition result is: a high value suggests that the recognition result is correct, and a low value suggests that it is wrong. For speech recognition, conventional techniques can be used, for example those described in [Sadaoki Furui, "Acoustic and Speech Engineering (Introduction to Electronics and Information Engineering Series)", Kindai Kagaku Sha, 1992 (hereinafter "Reference 3")]. Speech recognition is easier when the target speech data is recorded in stereo with the speakers separated than when it is recorded in mono. With mono recordings, a means of identifying the speech of each speaker is used in combination. Conventional techniques (for example, those described in Reference 3) can be used for speaker identification; one example is a method that uses the speech spectrum as the feature quantity together with a GMM (Gaussian Mixture Model).

When morphological analysis and syntax analysis are performed during speech recognition, the morphological analysis result and syntax analysis result corresponding to the target text data obtained by speech recognition may be used by the explicitness estimation unit 240. In this case too, the emotion estimation apparatus 200 need not include the text analysis unit 130.

<Explicitness estimation unit 240>
Using at least one of the morphological analysis result with recognition reliability and the syntax analysis result with recognition reliability, the explicitness estimation unit 240 estimates an explicit feature quantity indicating the degree to which the semantic content is made explicit in the target text data (s240). The explicitness estimation unit 240 has an omitted case estimation unit 241, a replacement word estimation unit 243, an explicit word detection unit 244, and an explicit feature quantity calculation unit 245. Details follow.

(Omitted case estimation unit 241)
The omitted case estimation unit 241 takes the syntax analysis result with recognition reliability as input and uses it to estimate, as omitted cases, the cases that should be present in the target text data but have been omitted (s141). It outputs the omitted case estimation result to the explicit feature quantity calculation unit 145.

By combining the recognition reliability with omitted case estimation, for example as follows, the omitted case estimation unit 241 can estimate omitted cases robustly even from speech recognition results that contain errors. Suppose, for example, that in the target text data of FIG. 5 the verb "see" has a recognition reliability of "0.8" and the verb "be surprised" has a recognition reliability of "0.2".

(1) The omitted case estimation unit 241 outputs, as the omitted case estimation result, the types of omitted cases for verbs whose recognition reliability is at or above a threshold, together with the number of omissions of each. The threshold is set in advance. With a threshold of "0.5", only the omitted case (target case) of the verb "see", whose recognition reliability is "0.5" or higher, is counted, and ("target case", 1) is output as the omitted case estimation result.
(2) The omitted case estimation unit 241 outputs, as the omitted case estimation result, the types of omitted cases together with the sum of the recognition reliability values of the verbs for each omitted case. In this example, ("target case", 0.8) and ("agent case", 0.2) are output.

Recognition reliability is not absolute, so a low value does not necessarily mean that the recognition is wrong. Aggregating as in (2), rather than simply counting above a threshold as in (1), lets omitted cases for verbs with low recognition reliability also be reflected in the estimation result.
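
The two aggregation methods can be compared on the "see" / "be surprised" example. The sketch assumes each estimated omission is available as a pair of the omitted case and the recognition reliability of its verb; the function names are hypothetical, and "target case" and "agent case" are English renderings of the case labels 対象格 and 動作主格 in the example.

```python
from collections import defaultdict


def count_above_threshold(omissions, threshold=0.5):
    """Method (1): count omissions whose verb reliability meets the threshold."""
    counts = defaultdict(int)
    for case, reliability in omissions:
        if reliability >= threshold:
            counts[case] += 1
    return dict(counts)


def sum_reliability(omissions):
    """Method (2): sum the verb reliabilities for each omitted case."""
    sums = defaultdict(float)
    for case, reliability in omissions:
        sums[case] += reliability
    return dict(sums)


# (omitted case, recognition reliability of the verb): "see" 0.8, "be surprised" 0.2
omissions = [("target case", 0.8), ("agent case", 0.2)]
print(count_above_threshold(omissions))  # {'target case': 1}
print(sum_reliability(omissions))        # {'target case': 0.8, 'agent case': 0.2}
```

Method (1) drops the low-reliability verb entirely, while method (2) keeps it with a small weight.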

(Replacement word estimation unit 243)
Using the morphological analysis result with recognition reliability, the replacement word estimation unit 243 estimates, as replacement words, words that appear in the target text data in place of the words that should be there (s243), and outputs the replacement word estimation result to the explicit feature quantity calculation unit 145.

By combining the recognition reliability with replacement word estimation, for example as follows, the replacement word estimation unit 243 can estimate replacement words robustly even from speech recognition results that contain errors. Suppose, for example, that the replacement word "it" appears twice in the target text data, with a recognition reliability of "0.8" for the first "it" and "0.2" for the second.

(1) The replacement word estimation unit 243 outputs, as the replacement word estimation result, the types of replacement words whose recognition reliability is at or above a threshold, together with the number of appearances of each. The threshold is set in advance. With a threshold of "0.5", only the first "it", whose recognition reliability is "0.5" or higher, is counted, and ("it", 1) is output as the replacement word estimation result.
(2) The replacement word estimation unit 243 outputs, as the replacement word estimation result, the types of replacement words together with the sum of the recognition reliability values of each replacement word. In this example, ("it", 1 (= 0.8 + 0.2)) is output.

Recognition reliability is not absolute, so a low value does not necessarily mean that the recognition is wrong. Aggregating as in (2), rather than simply counting above a threshold as in (1), lets replacement words with low recognition reliability also be reflected in the estimation result.

(Explicit word detection unit 244)
Using the morphological analysis result with recognition reliability, the explicit word detection unit 244 refers to the explicit word list storage unit 144a, detects the explicit words present in the target text data (s244), and outputs the explicit word detection result to the explicit feature quantity calculation unit 145.

By combining the recognition reliability with explicit word detection, for example as follows, the explicit word detection unit 244 can detect explicit words robustly even in speech recognition results that contain errors. Suppose, for example, that the explicit word "you" appears twice in the target text data, with a recognition reliability of "0.8" for the first "you" and "0.2" for the second, and that the explicit word list storage unit 144a stores the explicit word list shown in FIG. 7.

(1) The explicit word detection unit 244 outputs, as the explicit word detection result, the types of explicit words whose recognition reliability is at or above a threshold, together with the number of appearances of each. The threshold is set in advance. With a threshold of "0.5", only the first "you", whose recognition reliability is "0.5" or higher, is counted, and ("you", 1) is output as the explicit word detection result.
(2) The explicit word detection unit 244 outputs, as the explicit word detection result, the types of explicit words together with the sum of the recognition reliability values of each explicit word. In this example, ("you", 1 (= 0.8 + 0.2)) is output.

Recognition reliability is not absolute, so a low value does not necessarily mean that the recognition is wrong. Aggregating as in (2), rather than simply counting above a threshold as in (1), lets explicit words with low recognition reliability also be reflected in the detection result.
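
Explicit word detection adds a list lookup before the aggregation. The sketch below assumes the morphological analysis result is available as pairs of a surface form and its recognition reliability; the function name, the contents of `explicit_word_list` (the list of FIG. 7 is not reproduced here), and the word "said" are hypothetical.

```python
def detect_explicit_words(words, explicit_word_list, threshold=None):
    """Detect listed explicit words in (surface form, reliability) pairs.

    With a threshold, occurrences at or above it are counted (method (1)).
    Without one, reliabilities are summed per word (method (2)).
    """
    result = {}
    for surface, reliability in words:
        if surface not in explicit_word_list:
            continue  # not an explicit word
        if threshold is None:
            result[surface] = result.get(surface, 0.0) + reliability
        elif reliability >= threshold:
            result[surface] = result.get(surface, 0) + 1
    return result


explicit_word_list = {"you", "I"}  # hypothetical stand-in for the stored list
recognized = [("you", 0.8), ("said", 0.9), ("you", 0.2)]
print(detect_explicit_words(recognized, explicit_word_list, threshold=0.5))  # {'you': 1}
print(detect_explicit_words(recognized, explicit_word_list))                 # {'you': 1.0}
```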

(Explicit feature quantity calculation unit 245)
The explicit feature quantity calculation unit 245 calculates an explicit feature quantity from the omitted case estimation result, the replacement word estimation result, and the explicit word detection result (s245), and outputs it to the emotion identification unit 160. For example, the unit calculates any one of the following values, or a combination of them, as the explicit feature quantity.

(1') Number of omissions for each omitted case, or the sum of the recognition reliability values of the verbs corresponding to each omitted case
(2) Presence or absence of omission for each omitted case ("0"/"1" information)
(3') Total number of omissions over all omitted cases in the emotion estimation unit, rather than per omitted case, or the sum of the recognition reliability values of the verbs corresponding to all omitted cases in the emotion estimation unit
(4) Presence or absence of any omitted case in the emotion estimation unit ("0"/"1" information)
(5') Number of appearances of each replacement word, or the sum of the recognition reliability values for each replacement word
(6) Presence or absence of each replacement word ("0"/"1" information)
(7') Total number of appearances of all replacement words in the emotion estimation unit, rather than per replacement word, or the sum of the recognition reliability values of all replacement words in the emotion estimation unit
(8) Presence or absence of any replacement word in the emotion estimation unit ("0"/"1" information)
(9') Number of appearances of each explicit word, or the sum of the recognition reliability values for each explicit word
(10) Presence or absence of each explicit word ("0"/"1" information)
(11') Total number of appearances of all explicit words in the emotion estimation unit, rather than per explicit word, or the sum of the recognition reliability values of all explicit words in the emotion estimation unit
(12) Presence or absence of any explicit word in the emotion estimation unit ("0"/"1" information)
The use of (1'), (3'), (5'), (7'), (9'), and (11') is what differs from the explicit feature quantity calculation unit 145.

<Method for calculating the explicit feature quantity of learning text data>
The method for calculating the explicit feature quantity of the learning text data used by the discriminator generation unit 150 is described with reference to FIGS. 12 and 13.

The speech recognition unit 210 performs speech recognition on the learning speech data input from the learning speech corpus 91 and converts it into learning text data. It obtains the learning text data and, for each word in that text, a recognition reliability indicating how reliable the recognition result is (s210-2), and outputs the learning text data with recognition reliability to the text analysis unit 130.

The text analysis unit 130 analyzes the learning text data with recognition reliability input from the speech recognition unit 210 and obtains a morphological analysis result with recognition reliability, together with a syntax analysis result with recognition reliability in which the dependency relations in the learning text data are analyzed on the basis of the morphological analysis result (s130-2).

Using the morphological analysis result with recognition reliability obtained from the learning text data and the syntax analysis result with recognition reliability in which the dependency relations in the learning text data are analyzed on the basis of that morphological analysis result, the explicitness estimation unit 140 estimates an explicit feature quantity indicating the degree to which the semantic content is made explicit in the learning text data (s240-2).

In other words, no special configuration is needed: by having the speech recognition unit 210, the text analysis unit 130, and the explicitness estimation unit 240 described above perform the same processing on the learning speech data instead of the target speech data, the explicit feature quantity of the learning text data used by the discriminator generation unit 150 can be obtained.

<Effect>
This configuration provides the same effect as the first embodiment. In addition, using the recognition reliability makes it possible to estimate explicitness robustly even from speech recognition results that contain errors.

It is generally known that prosodic information is effective for identifying emotion. When the input is speech data, the emotion estimation apparatus 200 of the second embodiment may therefore be combined with prosodic information such as voice pitch and loudness to estimate emotion.

In the third embodiment, the pitch, power, and the like of the target speech data are calculated, and their mean, maximum, minimum, variance, and so on, together with the values of their dynamic features Δ and ΔΔ, are used as prosodic feature quantities. These prosodic feature quantities are combined with the explicit feature quantity and used by the emotion identification unit when identifying emotion. This configuration can raise the estimation accuracy further.

<Emotion estimation apparatus 300>
The emotion estimation apparatus 300 according to the third embodiment is described with reference to FIGS. 10 and 11. Only the differences from the second embodiment are described. In addition to the speech recognition unit 210, the text analysis unit 130, the explicitness estimation unit 240, a discriminator generation unit 350, and an emotion identification unit 360, the emotion estimation apparatus 300 includes a prosodic feature quantity calculation unit 320, shown by broken lines in FIGS. 10 and 12. The processing of the speech recognition unit 210, the text analysis unit 130, and the explicitness estimation unit 240 is the same as in the second embodiment.

<Prosodic feature quantity calculation unit 320>
The prosodic feature quantity calculation unit 320 calculates the prosodic feature quantity from the target speech data (s320, shown by a broken line in FIG. 11). Conventional techniques (for example, those described in Reference 3) can be used to calculate it. Pitch (how high the voice is), power (how loud the voice is), and the like are used as prosodic features. For example, any one of the following values, or a combination of them, is calculated as the prosodic feature quantity.
(A): Mean pitch
(B): Maximum pitch
(C): Variance of pitch
(D): Mean power
(E): Maximum power
(F): Variance of power
(G): Increment Δ of any of (A) to (F)
(H): Increment ΔΔ of (G)
Using (G) and (H) makes it possible to capture the magnitude of changes such as a steep rise. As with the explicit feature quantity, increasing the number of elements in the prosodic feature quantity raises the identification accuracy of the emotion identification unit 360 described later, but also increases the amount of computation. The elements of the prosodic feature quantity may therefore be determined in advance by experiment or similar means.
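
Values (A) to (H) can be illustrated with a short sketch. It assumes that pitch has already been extracted as one list of values per segment, and it interprets the increment Δ as the difference between the statistics of successive segments; the text does not define Δ precisely, so that interpretation, the function names, and the sample numbers are all assumptions. The same functions apply unchanged to power contours for (D) to (F).

```python
from statistics import mean, pvariance


def prosodic_stats(contour):
    """Mean, maximum, and variance of one contour: (A)-(C) or (D)-(F)."""
    return {
        "mean": mean(contour),
        "max": max(contour),
        "variance": pvariance(contour),
    }


def delta(values):
    """Differences between successive values: (G); apply twice for (H)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical pitch values in Hz for three successive segments.
segments = [[100.0, 110.0, 120.0], [150.0, 170.0, 190.0], [160.0, 180.0, 200.0]]
mean_pitch = [prosodic_stats(s)["mean"] for s in segments]
delta_mean = delta(mean_pitch)        # change in mean pitch between segments
delta_delta_mean = delta(delta_mean)  # change of that change
print(mean_pitch, delta_mean, delta_delta_mean)
```

In this toy data the large first Δ reflects the sharp rise in pitch between the first and second segments.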

<Emotion identification unit 360>
The emotion identification unit 360 uses a discriminator learned in advance from the explicit feature quantities of learning text data together with the prosodic feature quantities of the learning speech data corresponding to that learning text data. Using this discriminator, it identifies whether an anger emotion is expressed in the target speech data from the explicit feature quantity estimated by the explicitness estimation unit 240 and the prosodic feature quantity calculated by the prosodic feature quantity calculation unit 320 (s360), and outputs the identification result as the estimation result of the emotion estimation apparatus 300.

<Discriminator generation unit 350>
The discriminator generation unit 350 learns and generates a discriminator from the explicit feature quantities of learning text data with teacher signals together with the prosodic feature quantities of the learning speech data corresponding to that learning text data (s350 in FIG. 13). A teacher signal is information indicating whether an anger emotion is expressed in the corresponding learning text data and learning speech data. Because the third embodiment considers prosodic feature quantities in addition to explicit feature quantities, a person listens to each item of learning speech data and also reads the learning text data, judges comprehensively whether an anger emotion is expressed in the learning text data and learning speech data, and attaches a teacher signal to each item of learning text data.
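The relationship between the discriminator generation unit 350 and the emotion identification unit 360 can be sketched as follows. This is only an illustration under stated assumptions: the explicit and prosodic feature quantities are simply concatenated into one vector, and a plain perceptron stands in for the discriminator, since this section does not specify the learning algorithm. All function names and the toy data are hypothetical.

```python
def concat_features(explicit, prosodic):
    """Join explicit and prosodic feature quantities into one vector (assumption)."""
    return list(explicit) + list(prosodic)

def train_perceptron(samples, labels, epochs=50):
    """Stand-in for the discriminator generation unit 350.

    `labels` are the teacher signals: 1 = anger expressed, 0 = not expressed.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = y - (1 if score > 0 else 0)
            if error:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
    return weights, bias

def identify(model, explicit, prosodic):
    """Stand-in for the emotion identification unit 360: True means anger."""
    weights, bias = model
    x = concat_features(explicit, prosodic)
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0

# Toy learning data: [explicit word count] and [normalized pitch increment].
train_x = [
    concat_features([3], [0.9]),
    concat_features([2], [0.7]),
    concat_features([0], [0.1]),
    concat_features([1], [0.0]),
]
train_y = [1, 1, 0, 0]
model = train_perceptron(train_x, train_y)
```

The same feature extraction has to be applied to the learning data and to the target data, so that the vector passed to `identify` has the same layout as the vectors used for training.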

<Method of calculating the explicit feature quantity of learning text data>
The method of calculating the prosodic feature quantities of the learning speech data and the explicit feature quantities of the learning text data used in the discriminator generation unit 350 is described with reference to FIGS. 12 and 13.

The prosodic feature quantity calculation unit 320 calculates the prosodic feature quantity from the learning speech data input from the learning speech corpus 91 (s320-2, indicated by a broken line in FIG. 12) and outputs it to the discriminator generation unit 350.

By the same processing as in the second embodiment (s210-2 to s240-2), the explicit feature quantity of the learning text data is estimated and output to the discriminator generation unit 350.

In other words, without any special configuration, the prosodic feature quantities of the learning speech data and the explicit feature quantities of the learning text data used in the discriminator generation unit 350 can be obtained by having the prosodic feature quantity calculation unit 320, the speech recognition unit 210, the text analysis unit 130, and the explicitness estimation unit 140 described above perform the same processing on the learning speech data instead of the target speech data.

<Effect>
With this configuration, prosodic feature quantities can be taken into account, which can further improve identification performance.

[Modification]
In the third embodiment, the prosodic feature quantity calculation unit 320 calculates the prosodic feature quantity of the target speech data. However, if the prosodic feature quantity can be calculated in the course of speech recognition processing, the calculation may be performed as part of the speech recognition unit 210 instead of providing a separate prosodic feature quantity calculation unit 320.

In the third embodiment, the speech recognition unit 210 outputs learning text data with recognition reliability, but learning text data without recognition reliability may be used instead. In that case, the explicitness estimation unit 140 of the first embodiment may be used in place of the explicitness estimation unit 240.

In recent years, companies have been increasingly active in trying to extract useful information from the unfiltered voices of customers, such as requests and complaints, that accumulate at their call centers. Call centers have also come to be regarded as the face of the company, and companies are working to improve the quality of call center service in order to improve the image that customers hold of them. Against this background, technology for automatically finding complaint calls in which the customer is angry is desired more than ever. A monitoring system using the emotion estimation apparatus 200 is therefore described below.

<Monitoring system 400>
As shown in FIG. 14, the monitoring system 400 includes the emotion estimation apparatus 200 described above, an input unit 480, and an output unit 490. The monitoring system 400 is installed in a call center or the like, monitors incoming calls from customers to the call center and outgoing calls from the call center to customers, and detects complaint calls.

The emotion estimation apparatus 200 receives target speech data C_1, C_2, ..., C_N (where N is an integer of 1 or more) via the input unit 480. The input unit 480 is, for example, a voice input terminal. The input unit 480 is connected to telephones 1_1, 1_2, ..., 1_N installed in the call center, so the monitoring system 400 can receive the conversation between customer and operator as target speech data in real time.

The emotion estimation apparatus 200 divides the target speech data into predetermined emotion estimation units, estimates the emotion for each emotion estimation unit, and outputs the telephone identifiers P_1, P_2, ..., P_N and the corresponding estimation results E_1, E_2, ..., E_N to the output unit 490 as they become available.

The output unit 490 is, for example, a display. The output unit 490 displays the telephone identifiers and their estimation results (see FIG. 15).
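The flow from the input unit 480 to the output unit 490 could be sketched as below. This is a simplified illustration, not the actual implementation: it assumes that each emotion estimation unit is a fixed number of samples and that the emotion estimation apparatus 200 is available as a callable `estimate` returning True when anger is detected. Both the fixed-length unit and the names `split_into_units` and `monitor` are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

def split_into_units(samples: Sequence[float], unit_len: int) -> List[List[float]]:
    """Divide target speech data into emotion estimation units of fixed length."""
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    return [list(samples[i:i + unit_len]) for i in range(0, len(samples), unit_len)]

def monitor(
    streams: Dict[str, Sequence[float]],
    estimate: Callable[[List[float]], bool],
    unit_len: int,
) -> Iterator[Tuple[str, int, bool]]:
    """Yield (telephone identifier, unit index, estimation result) per unit."""
    for phone_id, samples in streams.items():
        for index, unit in enumerate(split_into_units(samples, unit_len)):
            yield phone_id, index, estimate(unit)
```

Because `monitor` is a generator, each result can be passed to the display as soon as it is produced, which corresponds to outputting results as they become available.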

<Effect>
With this configuration, a call center manager or the like can monitor how operators are handling calls in real time, and can quickly detect and respond to complaints as they arise.

[Modification 1]
<Monitoring system 500>
As shown in FIG. 16, the monitoring system 500 includes the emotion estimation apparatus 200 described above, an input unit 580, and an output unit 590. The monitoring system 500 is installed in a call center or the like, acquires incoming calls from customers to the call center and outgoing calls from the call center to customers from the call database 2, and detects complaint calls from the call records.

The input unit 580 is a data transfer interface or the like. For example, by connecting the input unit 580 to the call database 2, which stores all call records of the call center, the monitoring system 500 can receive the stored call records as target speech data. When target speech data is stored in the call database 2, it is assumed that the identifier of the corresponding telephone, the identifier of the operator, the start time of the target speech data, the end time of the target speech data, and the like are attached to it.

The emotion estimation apparatus 200 estimates the emotion of the target speech data for one call and outputs the telephone identifier, the operator identifier, the start time and end time of the target speech data, the estimation result, and the like to the output unit 590.

The output unit 590 is, for example, a display or a printer, and presents the received data (telephone identifier, operator identifier, start time and end time of the target speech data, estimation result, and the like) on a screen or on paper (see FIG. 17).
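The batch processing of Modification 1 might look like the following sketch. It assumes a hypothetical record layout for entries of the call database 2 and, as before, treats the emotion estimation apparatus 200 as a callable `estimate` that returns True when anger is detected in one call. The class and function names are illustrative and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class CallRecord:
    """One stored call with the metadata attached in the call database 2."""
    phone_id: str
    operator_id: str
    start_time: str
    end_time: str
    audio: Sequence[float]

@dataclass
class ReportRow:
    """One line of the output shown by the output unit 590."""
    phone_id: str
    operator_id: str
    start_time: str
    end_time: str
    anger: bool

def build_report(
    records: List[CallRecord], estimate: Callable[[Sequence[float]], bool]
) -> List[ReportRow]:
    """Estimate the emotion of each call and keep its metadata with the result."""
    return [
        ReportRow(r.phone_id, r.operator_id, r.start_time, r.end_time, estimate(r.audio))
        for r in records
    ]

def complaint_calls(rows: List[ReportRow]) -> List[ReportRow]:
    """Select the calls in which anger was detected."""
    return [row for row in rows if row.anger]
```

Keeping the metadata next to the estimation result is what allows a detected complaint call to be looked up again in the call database.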

<Effect>
Searching for complaint calls by listening through a large number of call records incurs a very high labor cost. With this configuration, complaint calls can be detected automatically, and the output makes it easy to locate them in the call database. Analyzing the complaint calls found in this way leads to the discovery of strong customer requests and dissatisfaction, as well as defects and problems in products and services. It also leads to the discovery of problems in operator handling that trigger complaint calls.

[Other modifications]
As the emotion estimation apparatus, the emotion estimation apparatus 300 of the third embodiment, or an apparatus in which a speech recognition unit is placed in front of the emotion estimation apparatus of the first embodiment, may be used.

<Program and recording medium>
The emotion estimation apparatus and the monitoring system described above can also be realized by a computer. In this case, a program for causing the computer to function as the target apparatus (an apparatus having the functional configuration shown in the drawings of each embodiment), or a program for causing the computer to execute each step of the processing procedure (as shown in each embodiment), is loaded into the computer from a recording medium such as a CD-ROM, a magnetic disk, or a semiconductor storage device, or downloaded via a communication line, and the program is then executed.

90 learning text corpus
91 learning speech corpus
100, 200, 300 emotion estimation apparatus
210 speech recognition unit
320 prosodic feature quantity calculation unit
130 text analysis unit
131 morphological analysis unit
133 syntax analysis unit
140, 240 explicitness estimation unit
141, 241 abbreviated case estimation unit
143, 243 replacement word estimation unit
144, 244 explicit word detection unit
144a explicit word list storage unit
145, 245 explicit feature quantity calculation unit
150, 350 discriminator generation unit
160, 360 emotion identification unit
400, 500 monitoring system
480, 580 input unit
490, 590 output unit

Claims

An explicitness estimation unit that uses at least one of a morphological analysis result and a syntax analysis result of target text data to estimate an explicit feature amount indicating how explicitly the semantic content is stated in the target text data; and
an emotion identification unit that uses a discriminator, learned in advance from the explicit feature amounts of learning text data with teacher signals indicating whether an anger emotion is expressed, to identify whether an anger emotion is expressed in the target text data from the explicit feature amount of the target text data estimated by the explicitness estimation unit,
The explicit feature amount is:
(1) Number of omissions for each abbreviated case
(2) Presence or absence of omission for each abbreviated case
(3) Total number of omissions over all abbreviated cases included in the emotion estimation unit, rather than for each abbreviated case
(4) Presence or absence of an abbreviated case in the emotion estimation unit
(5) Number of appearances for each replacement word
(6) Presence or absence of each replacement word
(7) Total number of appearances of all replacement words included in the emotion estimation unit, rather than for each replacement word
(8) Presence or absence of a replacement word in the emotion estimation unit
(9) Number of appearances for each explicit word
(10) Presence or absence of each explicit word
(11) Total number of appearances of all explicit words included in the emotion estimation unit, rather than for each explicit word
(12) Presence or absence of an explicit word in the emotion estimation unit
or any combination thereof.
Emotion estimation device.

The emotion estimation device according to claim 1,
The explicitness estimation unit includes:
Using the parsing result, and having an abbreviated case estimation unit for estimating a case that should be omitted in the target text data.
Emotion estimation device.

The emotion estimation apparatus according to claim 1 or 2, wherein
The explicitness estimation unit includes:
Using the morphological analysis result, and having a replacement word estimation unit that estimates a word that exists in place of the word that should originally exist in the target text data,
Emotion estimation device.

The emotion estimation apparatus according to any one of claims 1 to 3,
The explicitness estimation unit includes:
An explicit word list storage unit that stores in advance words that are easily omitted in ordinary utterances; and
an explicit word detection unit that uses the morphological analysis result and refers to the explicit word list storage unit to detect explicit words present in the target text data,
Emotion estimation device.

The emotion estimation apparatus according to any one of claims 2 to 4,
The explicitness estimation unit includes:
Further including an explicit feature quantity calculation unit that calculates the explicit feature amount using at least one of an estimation result of the abbreviated case estimation unit, an estimation result of the replacement word estimation unit, and a detection result of the explicit word detection unit,
Emotion estimation device.

The emotion estimation apparatus according to any one of claims 1 to 5,
Further comprising a speech recognition unit that performs speech recognition processing on target speech data and obtains the target text data derived from the target speech data together with a recognition reliability indicating the reliability of the recognition result for each word included in the target text data,
wherein the explicitness estimation unit uses at least one of the morphological analysis result and the syntax analysis result of the target text data with the recognition reliability to estimate the explicit feature amount indicating how explicitly the semantic content is stated in the target text data,
Emotion estimation device.

The emotion estimation apparatus according to claim 6,
Further comprising a prosodic feature quantity calculation unit that calculates a prosodic feature quantity using the target speech data,
wherein the emotion identification unit uses a discriminator learned in advance from the prosodic feature quantity of learning speech data corresponding to the learning text data in addition to the explicit feature amount of the learning text data, and identifies whether an anger emotion is expressed in the target speech data from the explicit feature amount estimated by the explicitness estimation unit and the prosodic feature quantity calculated by the prosodic feature quantity calculation unit,
Emotion estimation device.

An explicitness estimation step in which an explicitness estimation unit uses at least one of a morphological analysis result and a syntax analysis result of target text data to estimate an explicit feature amount indicating how explicitly the semantic content is stated in the target text data; and
an emotion identification step in which an emotion identification unit uses a discriminator, learned in advance from the explicit feature amounts of learning text data with teacher signals indicating whether an anger emotion is expressed, to identify whether an anger emotion is expressed in the target text data from the explicit feature amount of the target text data estimated in the explicitness estimation step,
The explicit feature amount is:
(1) Number of omissions for each abbreviated case
(2) Presence or absence of omission for each abbreviated case
(3) Total number of omissions over all abbreviated cases included in the emotion estimation unit, rather than for each abbreviated case
(4) Presence or absence of an abbreviated case in the emotion estimation unit
(5) Number of appearances for each replacement word
(6) Presence or absence of each replacement word
(7) Total number of appearances of all replacement words included in the emotion estimation unit, rather than for each replacement word
(8) Presence or absence of a replacement word in the emotion estimation unit
(9) Number of appearances for each explicit word
(10) Presence or absence of each explicit word
(11) Total number of appearances of all explicit words included in the emotion estimation unit, rather than for each explicit word
(12) Presence or absence of an explicit word in the emotion estimation unit
or any combination thereof.
Emotion estimation method.

A program for causing a computer to function as the emotion estimation apparatus according to any one of claims 1 to 7.

A computer-readable recording medium on which the program according to claim 9 is recorded.