JP2015099304A

JP2015099304A - Sympathy/antipathy location detecting apparatus, sympathy/antipathy location detecting method, and program

Info

Publication number: JP2015099304A
Application number: JP2013239767A
Authority: JP
Inventors: Takanori Ashihara (芦原 孝典); Yuji Aono (青野 裕司); Sumitaka Sakauchi (阪内 澄宇)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-11-20
Filing date: 2013-11-20
Publication date: 2015-05-28
Anticipated expiration: 2033-11-20
Also published as: JP6110283B2

Abstract

PROBLEM TO BE SOLVED: To detect locations where the emotional state of a dialogue participant changes. SOLUTION: An emotion model storage unit 91 stores an emotion model set obtained by modeling each of a plurality of emotion scores. A dialogue speech analysis unit 13 extracts acoustic features frame by frame from an input speech signal. An emotion score calculation unit 17 calculates an emotion score for each frame from the acoustic features by using the emotion model set. A sympathy/antipathy location estimation score calculation unit 18 calculates a sympathy/antipathy location estimation score for each frame. A sympathy/antipathy location estimation unit 19 estimates sympathy/antipathy locations on the basis of the sympathy/antipathy location estimation scores.

Description

The present invention relates to a technique for detecting changes in an emotional state, such as sympathy or antipathy, of a participant in a dialogue.

Techniques for detecting an emotional state, such as sympathy or antipathy, of a participant in a voice dialogue over a telephone or the like have been in use for some time. For example, Patent Document 1 describes an invention that automatically scores a call center operator's handling of a customer and thereby reduces the burden of operator training.

Specifically, the invention described in Patent Document 1 detects the emotional state as follows. First, speech features are extracted from the input speech signal of the customer. Next, an emotion sequence is generated for each call by time-series matching between the speech features and an emotion model set in which each of a plurality of predefined emotions is modeled. Next, each emotion in the emotion sequence is converted into an emotion point value by using an emotion point list that associates each of the emotions with a point value. Then, based on the sequence of emotion points, a response score is calculated, for example as the emotion point value at the end of the call minus the value at the start of the call, or as the average emotion point value over the call.

Patent Document 1: JP 2007-286377 A

The conventional technique described in Patent Document 1 models customer emotions in advance, quantifies the customer emotions that arise during a dialogue with a call center operator, and automatically evaluates the operator's response based on those values. This conventional technique can be expected to reduce costs, for example through operator training and improved work efficiency, but it cannot be used for the purpose of directly increasing sales. If, for example, the locations at which the customer's emotional state changed during the dialogue between the customer and the operator could be detected, the content of the dialogue at those locations could be examined to estimate points for improvement in a service or product.

An object of the present invention is to detect locations at which the emotional state of a dialogue participant has changed.

To solve the above problem, the sympathy/antipathy location detection device of the present invention includes an emotion model storage unit, a dialogue speech analysis unit, an emotion score calculation unit, a sympathy/antipathy location estimation score calculation unit, and a sympathy/antipathy location estimation unit. The emotion model storage unit stores an emotion model set in which each of a plurality of predefined emotion scores is modeled. The dialogue speech analysis unit extracts acoustic features for each frame from an input speech signal. The emotion score calculation unit calculates an emotion score for each frame from the acoustic features by using the emotion model set. The sympathy/antipathy location estimation score calculation unit calculates a sympathy/antipathy location estimation score for each frame based on the emotion scores. The sympathy/antipathy location estimation unit estimates, based on the sympathy/antipathy location estimation scores, sympathy/antipathy locations, which are frames at which the emotional state has changed.

According to the sympathy/antipathy location detection technique of the present invention, locations at which the emotional state of a dialogue participant has changed can be detected. Applying this technique also makes it possible to identify the dialogue content that caused the change in the participant's emotional state.

FIG. 1 is a diagram for explaining the relationship among the emotion score, the sympathy/antipathy location estimation score, and the sympathy/antipathy locations.

FIG. 2 is a diagram illustrating the functional configuration of the sympathy/antipathy location detection device.

FIG. 3 is a diagram illustrating the processing flow of the sympathy/antipathy location detection method.

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference number, and duplicate description is omitted.

[Points of Invention]
In this invention, the change over time of a customer's emotion is estimated from information that can be acquired in a usage scene such as a call center (for example, speech information or dialogue information). The invention then focuses on the time-series change pattern to estimate which operator utterance the customer sympathized with or felt antipathy toward.

In this invention, in a setting such as a call center, the customer's emotional state is estimated with conventional techniques from the speech uttered by the customer or from the dialogue between the customer and the operator. The emotion types range from "Positive" (for example, comfort, joy, or enjoyment) to "Negative" (for example, discomfort, anger, or sadness) and are calculated as a predefined emotion score. Suppose, for example, that integer values from 5 to -5 are used as scores. In this case, an output score of 5 indicates a definitely "Positive" emotional state. An output score of 0 indicates an ambiguous emotional state that cannot be judged as either "Positive" or "Negative". An output score of -5 indicates a definitely "Negative" emotional state.

A time-series change pattern can be obtained by shifting the calculation window used in feature calculation. Calculating an emotion score from each feature then yields a time-series change pattern of the emotional state. Using this pattern, the part of the operator's speech that the customer sympathized with or felt antipathy toward can be estimated from the locations at which the emotional state changed. For example, operator utterances around the point where the estimated emotion score changes from 5 to -5 are presumed to have made the customer uncomfortable (that is, to have caused antipathy). Conversely, operator utterances around the point where the emotion score changes from -5 to 5 are presumed to have pleased the customer (that is, to have caused sympathy). By collecting the operator utterances made when the score varies by at least a certain amount within a certain time interval, it is possible to estimate which keywords the customer reacted to with sympathy or antipathy. In particular, counting these keywords makes it possible to select words that appear frequently when the customer feels sympathy or antipathy. For example, if customers often react with antipathy to the keyword "price", it can be inferred that a high price made the customer uncomfortable. Conversely, if customers often react with sympathy to the keyword "price", it can be inferred that the customer felt the price was lower than expected.

FIG. 1 shows the relationship among the emotion score, the sympathy/antipathy location estimation score, and the sympathy/antipathy locations used in this invention. The horizontal axis represents the time elapsed since the start of the dialogue, and the vertical axis represents the emotion score. The emotion score is estimated for each frame of the dialogue speech and varies over time. A location at which the sympathy/antipathy location estimation score, which is obtained from the emotion scores, varies greatly can be presumed to indicate a change in the participant's emotional state. Such a location is estimated to be a sympathy/antipathy location.

[Embodiment]
An example of the functional configuration of the sympathy/antipathy location detection device 1 according to the embodiment is described with reference to FIG. 2. The sympathy/antipathy location detection device 1 includes an input terminal 10, a threshold input unit 11, a speech signal acquisition unit 12, a dialogue speech analysis unit 13, a speech recognition unit 14, a morphological analysis unit 15, an emotion model learning unit 16, an emotion score calculation unit 17, a sympathy/antipathy location estimation score calculation unit 18, a sympathy/antipathy location estimation unit 19, a sympathy/antipathy word totaling unit 20, an emotion model storage unit 91, and a sympathy/antipathy word storage unit 92. The sympathy/antipathy location detection device 1 is, for example, a special device configured by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main memory (random access memory, RAM), and the like. The device executes each process under the control of, for example, the central processing unit. Data input to the device and data obtained in each process are stored, for example, in the main memory, and are read out as needed for use in other processes. Each storage unit of the device can be configured, for example, as a main memory such as RAM, as an auxiliary storage device such as a hard disk, an optical disc, or a semiconductor memory element such as flash memory, or as middleware such as a relational database or a key-value store. The storage units only need to be logically separated and may reside on a single physical storage device.

An example of the processing flow of the sympathy/antipathy location detection method executed by the sympathy/antipathy location detection device 1 according to the embodiment is described with reference to FIG. 3, in the order in which the steps are actually performed.

In step S11, the emotion score variation threshold used by the sympathy/antipathy location estimation unit 19 is input to the threshold input unit 11. The emotion score variation threshold is described in detail later. The input threshold is set in the sympathy/antipathy location estimation unit 19. The threshold may instead be set in the sympathy/antipathy location estimation unit 19 in advance and need not be input from outside, in which case the threshold input unit 11 may be omitted.

In step S12, the speech signal acquisition unit 12 converts the analog speech signal input from the input terminal 10 into a digital speech signal. The speech signal may be acquired by any existing means. For example, it may be captured by a microphone connected to the input terminal 10, or a speech signal recorded in advance with a recording device such as an IC recorder may be fed to the input terminal 10. The digital speech signal produced by the speech signal acquisition unit 12 is input to the dialogue speech analysis unit 13 and the speech recognition unit 14.

In step S13, the dialogue speech analysis unit 13 extracts acoustic features from the input digital speech signal. The extracted acoustic features are input to the emotion model learning unit 16 and the emotion score calculation unit 17.

The acoustic features to be extracted are, for example, dimensions 1 to 12 of the mel-frequency cepstral coefficients (MFCC), dynamic parameters such as their changes ΔMFCC and ΔΔMFCC, the power, and its changes Δpower and ΔΔpower. Cepstral mean normalization (CMN) may be applied here. The acoustic features are not limited to MFCC and power; parameters used for speech recognition and utterance interval information can also be used. The calculation window used here may be about 30 milliseconds, with a window shift of about 10 milliseconds.
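The framing described above can be sketched as follows. This is a minimal illustration, not the extraction used in the embodiment: it computes only log power per frame with a 30 ms window and a 10 ms shift, and uses a plain first-order difference as a stand-in for the Δpower feature. The function names, the 8 kHz sampling rate, and the test signal are assumptions made for the example.

```python
import math

def frame_log_power(samples, sample_rate, window_ms=30, shift_ms=10):
    """Split a signal into overlapping frames and return the log power of each."""
    win = int(sample_rate * window_ms / 1000)   # samples per window
    hop = int(sample_rate * shift_ms / 1000)    # samples per shift
    powers = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        energy = sum(x * x for x in frame) / win
        powers.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return powers

def delta(values):
    """First-order difference, a simplified stand-in for a delta feature."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]

# Hypothetical input: one second of a 100 Hz sine sampled at 8 kHz
signal = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
log_power = frame_log_power(signal, 8000)
delta_power = delta(log_power)
```

With these settings each frame covers 240 samples and consecutive frames overlap by 160 samples, which is what produces one feature value every 10 ms.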

In step S14, the speech recognition unit 14 performs speech recognition on the input digital speech signal and generates a recognition result. Any existing speech recognition technique may be used. The generated recognition result is input to the morphological analysis unit 15.

In step S15, the morphological analysis unit 15 performs morphological analysis on the input recognition result. Any existing morphological analysis technique may be used. The generated morphological analysis result is input to the sympathy/antipathy word totaling unit 20.

In step S16, the emotion model learning unit 16 learns emotion models using the input acoustic features. An emotion model can be represented, for example, by a multidimensional Gaussian mixture model (GMM) built on the probabilistic and statistical theory widely used in the field of speech recognition. For details of GMMs, see, for example, D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech Audio Process., vol. 3, no. 1, pp. 72-83, Jan. 1995. For example, when integer emotion scores are defined from "Positive (3)" to "Negative (-3)", seven emotion models are generated. For details of the emotion model learning method, see Patent Document 1. The set of emotion models corresponding to the emotion scores is stored in the emotion model storage unit 91.

In step S17, the emotion score calculation unit 17 matches the input acoustic features against the emotion model set stored in the emotion model storage unit 91 and calculates the most appropriate emotion score. Specifically, it calculates the likelihood of the input acoustic features under each emotion model in the emotion model set. For details of the emotion score calculation method, see Patent Document 1. The calculated emotion scores are input to the sympathy/antipathy location estimation score calculation unit 18.
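The likelihood matching in step S17 can be illustrated with a simplified sketch. It assumes that each emotion score from -3 to 3 has a single diagonal-covariance Gaussian in place of a full GMM, and that the score whose model gives the highest log-likelihood is selected; the exact scoring procedure is the one in Patent Document 1 and is not reproduced here. The model parameters and feature vectors below are made up for the example.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def emotion_score(feature, models):
    """Return the score whose model gives the highest log-likelihood."""
    return max(models, key=lambda s: log_gaussian(feature, *models[s]))

# Hypothetical model set: score -> (mean vector, variance vector).
# Seven models, one per integer score from -3 to 3.
models = {s: ([float(s), 0.5 * s], [1.0, 1.0]) for s in range(-3, 4)}

# Hypothetical two-dimensional features for three frames
frames = [[2.9, 1.4], [0.1, 0.0], [-2.2, -1.1]]
scores = [emotion_score(f, models) for f in frames]
```

Replacing `log_gaussian` with a weighted sum over mixture components would turn each model into a GMM without changing the selection step.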

In step S18, the sympathy/antipathy location estimation score calculation unit 18 calculates a sympathy/antipathy location estimation score from the input emotion scores. The calculated estimation score is input to the sympathy/antipathy location estimation unit 19. The estimation score can be calculated, for example, by either of the following two methods.

In the first method, the difference between the emotion score of the current frame and that of the immediately preceding frame is used as the sympathy/antipathy location estimation score of the current frame. Specifically, the value obtained by subtracting the emotion score of the N-th frame from the emotion score of the (N+1)-th frame is used as the estimation score of the (N+1)-th frame.
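A minimal sketch of the first method follows. The function name is hypothetical, and assigning 0 to the first frame, which has no predecessor, is an assumption not stated in the text.

```python
def difference_scores(emotion_scores):
    """Estimation score of frame N+1 = emotion score of N+1 minus that of N."""
    # The first frame has no preceding frame; using 0 here is an assumption.
    return [0] + [cur - prev
                  for prev, cur in zip(emotion_scores, emotion_scores[1:])]

# Hypothetical emotion scores on the 5 to -5 scale
estimation = difference_scores([5, 5, 4, -5, -5])
```

In this example the drop from 4 to -5 yields an estimation score of -9 at the fourth frame, while the other frames stay near 0.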

In the second method, the sympathy/antipathy location estimation score of the current frame is calculated using a likelihood computed from the emotion scores of a plurality of preceding frames and the posterior distribution for the current frame. Specifically, the emotion score of the (N+1)-th frame is predicted by machine learning, and the value obtained by subtracting the actual emotion score of the (N+1)-th frame from the predicted emotion score is used as the estimation score of the (N+1)-th frame. An example of the machine learning referred to here is Bayesian linear regression. That is, the parameters of the prior distribution for Bayesian linear regression are estimated in advance from training data, the likelihood is calculated from the emotion scores up to the N-th frame, the posterior distribution for the (N+1)-th frame is obtained, and the predicted emotion score of the (N+1)-th frame is calculated using, for example, the maximum a posteriori criterion. For a more detailed description of the calculation, see C. M. Bishop, "Pattern Recognition and Machine Learning, Vol. 2: Statistical Prediction by Bayesian Theory" (Japanese edition), Maruzen Publishing, pp. 151-159.
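The second method can be sketched under strong simplifying assumptions. The example below fits a Bayesian linear regression with basis (1, t) to the most recent emotion scores, using a zero-mean isotropic Gaussian prior with precision `alpha` and noise precision `beta`. For this Gaussian model the maximum a posteriori weights equal the posterior mean. The fixed values of `alpha` and `beta`, the window length `context`, and the function names are assumptions for illustration; the text instead estimates the prior parameters from training data.

```python
def predict_next(history, alpha=1e-3, beta=1.0):
    """MAP prediction of the next score from a linear trend over `history`."""
    n = len(history)
    xs = range(n)
    # Posterior precision A = alpha * I + beta * Phi^T Phi, with phi(x) = (1, x)
    a00 = alpha + beta * n
    a01 = beta * sum(xs)
    a11 = alpha + beta * sum(x * x for x in xs)
    # Right-hand side b = beta * Phi^T t
    b0 = beta * sum(history)
    b1 = beta * sum(x * t for x, t in zip(xs, history))
    # Solve the 2x2 system A w = b for the posterior mean (MAP) weights
    det = a00 * a11 - a01 * a01
    w0 = (a11 * b0 - a01 * b1) / det
    w1 = (a00 * b1 - a01 * b0) / det
    return w0 + w1 * n  # prediction at the next time index

def prediction_scores(emotion_scores, context=5):
    """Estimation score = predicted emotion score minus actual emotion score."""
    scores = []
    for i, actual in enumerate(emotion_scores):
        history = emotion_scores[max(0, i - context):i]
        if len(history) < 2:
            scores.append(0.0)  # too little history to fit a trend (assumption)
        else:
            scores.append(predict_next(history) - actual)
    return scores

# Hypothetical sequence: stable at 3, then a sudden drop to -3
estimation = prediction_scores([3, 3, 3, 3, 3, -3])
```

Because the score is defined as prediction minus actual value, a sudden drop in the actual emotion score gives a positive estimation score here, the opposite sign from the first method.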

In step S19, the sympathy/antipathy location estimation unit 19 estimates, based on the sympathy/antipathy location estimation scores, the sympathy/antipathy locations, which are frames at which the emotional state has changed. Whether the emotional state has changed is determined by comparison with the emotion score variation threshold set via the threshold input unit 11. That is, when the estimation score of a frame exceeds the emotion score variation threshold, that frame is estimated to be a sympathy/antipathy location. The estimated sympathy/antipathy locations are input to the sympathy/antipathy word totaling unit 20.
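The threshold comparison in step S19 can be sketched as below. The text only says that the estimation score "exceeds" the threshold; comparing the absolute value, so that both rises and drops are caught, is an assumption of this sketch, as is the labeling rule, which presumes the sign convention of the first method (current score minus previous score).

```python
def detect_locations(estimation_scores, threshold):
    """Return (frame index, label) for frames whose score magnitude exceeds threshold."""
    locations = []
    for frame, score in enumerate(estimation_scores):
        if abs(score) > threshold:  # absolute value is an assumption
            # Positive change = toward Positive emotion under the first method
            label = "sympathy" if score > 0 else "antipathy"
            locations.append((frame, label))
    return locations

# Hypothetical estimation scores and threshold
found = detect_locations([0, 0, -1, -9, 0, 8], threshold=5)
```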

In step S20, the sympathy/antipathy word totaling unit 20 extracts the words uttered at the sympathy/antipathy locations from the morphologically analyzed recognition result input from the morphological analysis unit 15, and totals the number of occurrences of each word. The extracted words are hereinafter called sympathy/antipathy words. The totals of the sympathy/antipathy words are stored in the sympathy/antipathy word storage unit 92.

Specifically, using the morphological analysis result of the operator utterances corresponding to the customer's sympathy/antipathy locations, the words (morphemes) uttered by the operator when the customer felt sympathy or antipathy are counted, and the sympathy/antipathy words and their cumulative counts are output. The sympathy/antipathy words are counted separately for each part of speech.
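The counting in step S20 can be sketched as follows. The data layout is an assumption: each operator morpheme is given as a (start time in seconds, surface form, part of speech) tuple, sympathy/antipathy locations are given as times, and a morpheme is attributed to a location when it starts within `margin` seconds of it. The text does not specify how utterances are aligned with locations.

```python
from collections import Counter

def count_words(morphemes, locations, margin=1.0):
    """Count (surface, part of speech) pairs uttered near any location."""
    counts = Counter()
    for start, surface, pos in morphemes:
        if any(abs(start - t) <= margin for t in locations):
            counts[(surface, pos)] += 1
    return counts

# Hypothetical morphological analysis result of operator speech
morphemes = [
    (0.5, "hello", "interjection"),
    (10.2, "price", "noun"),
    (10.6, "is", "verb"),
    (25.0, "price", "noun"),
    (40.0, "thanks", "noun"),
]
locations = [10.5, 25.3]  # hypothetical sympathy/antipathy locations (seconds)
counts = count_words(morphemes, locations)
```

Keying the counter on the pair of surface form and part of speech keeps the totals separate per part of speech, as the paragraph above requires.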

Besides simply outputting one overall total, the counts can be accumulated under various conditions to enable more detailed analysis. In a call center, for example, such conditions include "per center" (for example, fault reception or contract reception), "per period" (for example, per quarter or a specific period), "per operator", and "per call center region". For example, by viewing the sympathy/antipathy words for the several days after a new product or service is launched, points for improvement in that product or service can be identified.

Because the sympathy/antipathy words are totaled by part of speech, not only noun keywords but also turns of phrase can be extracted. For example, if customers frequently react with antipathy to a particular phrasing used by operators, this can support operational improvements such as coaching operators to use a different phrasing.

The sympathy/antipathy word totaling unit 20 may be configured to output not only the totaled words and phrasings themselves but also the utterances that contain them. It may also be configured to output those utterances together with several utterances that precede and follow them.

In a call center or similar setting, a large number of calls occur every day. The processing described above may therefore be run periodically, about once a day, with the results added to the totals of sympathy/antipathy words already stored in the sympathy/antipathy word storage unit 92.
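The conditional and daily accumulation described above can be sketched as below. The in-memory dictionary is only a stand-in for the sympathy/antipathy word storage unit 92, and the condition keys, center names, and counts are hypothetical.

```python
from collections import Counter, defaultdict

# Stand-in for storage unit 92: condition -> cumulative word counts
store = defaultdict(Counter)

def add_daily_counts(store, condition, daily_counts):
    """Add one day's counts to the running total kept for a condition."""
    store[condition].update(daily_counts)

# Hypothetical results of two daily runs for one center and quarter
key = ("contract reception", "2013-Q4")
add_daily_counts(store, key, Counter({("price", "noun"): 3}))
add_daily_counts(store, key, Counter({("price", "noun"): 2, ("fee", "noun"): 1}))

top_words = store[key].most_common(1)
```

Keeping one counter per condition lets the same daily job serve per-center, per-period, or per-operator views by changing only the key.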

In this way, the sympathy/antipathy location detection device of this embodiment detects the locations at which the emotional state of a dialogue participant changed and makes it possible to identify the utterance content that caused the change. It can therefore be used to collect information that leads more directly to increased sales.

The present invention is not limited to the embodiment described above, and it goes without saying that modifications can be made as appropriate without departing from the spirit of the invention. The various processes described in the above embodiment need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device executing them or as needed.

[Program, recording medium]
When the various processing functions of each device described in the above embodiment are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on a computer implements the various processing functions of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or transferred from a server computer in its own storage device. When executing a process, the computer reads the program stored in its own storage device and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processes according to it, or it may execute processes according to the received program each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and retrieval of results. The program in this embodiment includes information that is provided for processing by a computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of the processing content may be implemented in hardware.

DESCRIPTION OF SYMBOLS
1 Sympathy/antipathy location detection apparatus
10 Input terminal
11 Threshold input unit
12 Voice signal acquisition unit
13 Dialogue voice analysis unit
14 Speech recognition unit
15 Morphological analysis unit
16 Emotion model learning unit
17 Emotion score calculation unit
18 Sympathy/antipathy location estimation score calculation unit
19 Sympathy/antipathy location estimation unit
20 Sympathy/antipathy word totaling unit
91 Emotion model storage unit
92 Sympathy/antipathy word storage unit

Claims

A sympathy/antipathy location detection apparatus comprising:
an emotion model storage unit that stores an emotion model set obtained by modeling a plurality of predefined emotion scores;
a dialogue voice analysis unit that extracts an acoustic feature quantity for each frame from an input voice signal;
an emotion score calculation unit that calculates an emotion score for each frame from the acoustic feature quantity using the emotion model set;
a sympathy/antipathy location estimation score calculation unit that calculates a sympathy/antipathy location estimation score for each frame based on the emotion score; and
a sympathy/antipathy location estimation unit that estimates, based on the sympathy/antipathy location estimation score, a sympathy/antipathy location, which is a frame in which an emotional state has changed.

The sympathy/antipathy location detection apparatus according to claim 1, further comprising:
a speech recognition unit that generates a recognition result by performing speech recognition on the voice signal;
a morphological analysis unit that generates an analysis result by performing morphological analysis on the recognition result; and
a sympathy/antipathy word totaling unit that extracts, from the analysis result, words spoken at the sympathy/antipathy location and counts the number of occurrences of each word.
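
The word totaling described in this claim can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `tally_words`, the `(word, start_frame, end_frame)` tuple layout, and the rule that a word counts when its frame span contains a detected frame are all assumptions made for illustration, since the claim does not state how words are aligned to frames.

```python
from collections import Counter


def tally_words(words, locations):
    """Count words whose frame span contains a detected location.

    words: list of (word, start_frame, end_frame) tuples; this layout is a
        hypothetical stand-in for the morphological analysis result.
    locations: frame indices estimated as sympathy/antipathy locations.
    """
    detected = sorted(set(locations))
    counts = Counter()
    for word, start, end in words:
        # Assumption: a word is "spoken at" a location when the detected
        # frame falls inside the word's frame span (inclusive).
        if any(start <= frame <= end for frame in detected):
            counts[word] += 1
    return counts


# Illustrative data only; not from the patent.
analysis = [
    ("price", 0, 9),
    ("is", 10, 14),
    ("too", 15, 22),
    ("high", 23, 35),
    ("price", 60, 70),
]
detected_frames = [20, 65]
print(tally_words(analysis, detected_frames))
```

With this sample data, only the words overlapping frames 20 and 65 are counted, so the first occurrence of "price" is ignored and the second is tallied.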

The sympathy/antipathy location detection apparatus according to claim 1 or 2,
further comprising an emotion model learning unit that learns the emotion model using the acoustic feature quantity.

The sympathy/antipathy location detection apparatus according to any one of claims 1 to 3,
wherein the sympathy/antipathy location estimation unit estimates, as the sympathy/antipathy location, a frame in which the sympathy/antipathy location estimation score exceeds a predetermined emotion score variation threshold.

The sympathy/antipathy location detection apparatus according to any one of claims 1 to 4,
wherein the sympathy/antipathy location estimation score calculation unit uses the difference between the emotion score of a frame and the emotion score of the immediately preceding frame as the sympathy/antipathy location estimation score of that frame.
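
The frame-difference score of this claim, combined with the threshold test of the preceding claim, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation in the specification: the function names are hypothetical, the first frame is given a score of 0.0 because it has no preceding frame, and the threshold is applied to the magnitude of the difference, which the claims do not specify.

```python
def estimation_scores(emotion_scores):
    """Per-frame score: difference from the immediately preceding frame."""
    if not emotion_scores:
        return []
    # Assumption: the first frame has no predecessor, so its score is 0.0.
    scores = [0.0]
    for previous, current in zip(emotion_scores, emotion_scores[1:]):
        scores.append(current - previous)
    return scores


def estimate_locations(scores, threshold):
    """Return indices of frames whose score exceeds the threshold.

    Assumption: the comparison uses the absolute value, so that both
    rises and drops in the emotion score are detected.
    """
    return [i for i, score in enumerate(scores) if abs(score) > threshold]


# Illustrative emotion scores only; not from the patent.
emotion = [0.1, 0.2, 0.9, 0.8, 0.1]
scores = estimation_scores(emotion)
print(estimate_locations(scores, threshold=0.5))
```

With these sample values and a threshold of 0.5, frames 2 and 4 are reported, corresponding to the sharp rise and the sharp drop in the emotion score.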

The sympathy/antipathy location detection apparatus according to any one of claims 1 to 4,
wherein the sympathy/antipathy location estimation score calculation unit calculates the sympathy/antipathy location estimation score of a frame using a likelihood calculated from the emotion scores of a plurality of frames preceding that frame and a posterior distribution of that frame.

A sympathy/antipathy location detection method comprising:
a dialogue voice analysis step in which a dialogue voice analysis unit extracts an acoustic feature quantity for each frame from an input voice signal;
an emotion score calculation step in which an emotion score calculation unit calculates an emotion score for each frame from the acoustic feature quantity using an emotion model set obtained by modeling a plurality of predefined emotion scores;
a sympathy/antipathy location estimation score calculation step in which a sympathy/antipathy location estimation score calculation unit calculates a sympathy/antipathy location estimation score for each frame based on the emotion score; and
a sympathy/antipathy location estimation step in which a sympathy/antipathy location estimation unit estimates, based on the sympathy/antipathy location estimation score, a sympathy/antipathy location, which is a frame in which an emotional state has changed.

A program for causing a computer to function as the sympathy/antipathy location detection apparatus according to any one of claims 1 to 6.