JP6712539B2

JP6712539B2 - Satisfaction determination device, method and program

Info

Publication number: JP6712539B2
Application number: JP2016221356A
Authority: JP
Inventors: 厚志安藤; 歩相名神山; 哲小橋川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-11-14
Filing date: 2016-11-14
Publication date: 2020-06-24
Anticipated expiration: 2036-11-14
Also published as: JP2018081125A

Description

The present invention relates to speech technology, and in particular to a technique for determining the satisfaction expressed in an utterance.

In call center operations, there is a demand for a technique that determines from an utterance whether a customer is satisfied (hereinafter referred to as customer satisfaction determination). This technique can be applied, for example, to automating operator evaluation by tallying, for each operator, how often customers were satisfied, and to surveying customer requests by applying speech recognition and text analysis to satisfied utterances.

As related techniques, Non-Patent Documents 1 and 2 propose techniques for estimating customer satisfaction, dissatisfaction, or anger from a call. Non-Patent Document 1 estimates customer satisfaction/dissatisfaction using speaking-style features, such as the customer's speech rate, and linguistic features, such as whether competitors' product names appear. Non-Patent Document 2 estimates customer anger/non-anger using prosodic features, such as the pitch and loudness of the customer's voice, and dialogue features, such as the frequency of backchannel responses. Both techniques use machine learning to learn, from a large number of calls, the relationship between each feature and customer satisfaction/dissatisfaction or anger, and use the learned relationship for estimation.

[Non-Patent Document 1] Youngja Park, Stephen C. Gates, “Towards Real-Time Measurement of Customer Satisfaction Using Automatically Generated Call Transcripts”, in Proceedings of the 18th ACM conference on Information and knowledge management, p.1387-1396, 2009.
[Non-Patent Document 2] 野本済央, 小橋川哲, 田本真詞, 政瀧浩和, 吉岡理, 高橋敏, “Estimation of Anger Emotion from Dialogue Speech Using Temporal Relationships of Utterances” (発話の時間的関係性を用いた対話音声からの怒り感情推定), IEICE Transactions (電子情報通信学会論文誌), Vol.J96-D, No. 1, p. 15-24, 2013.

Customer satisfaction is closely related to local changes in voice pitch. For example, in an utterance in which the customer is satisfied, local changes in voice pitch tend to be distinctive, such as a sharp rise in pitch as an expression of joy. However, the conventional techniques use per-utterance statistics of voice pitch as prosodic features, and local changes in voice pitch are hardly reflected in such features. As a result, the accuracy of satisfaction determination may be degraded.

An object of the present invention is to provide a satisfaction determination device, method, and program capable of determining the satisfaction of an utterance with higher accuracy than conventional techniques.

A satisfaction determination device according to one aspect of the present invention includes: a fundamental frequency extraction unit that extracts the fundamental frequency of each frame constituting each utterance included in a call; a local change amount calculation unit that calculates, for each utterance and at each time, a local change amount, which is the amount of change in the extracted fundamental frequency; a local change amount statistic calculation unit that calculates a local change amount statistic, which is the mean and variance of the calculated per-time local change amounts of the fundamental frequency over the utterances included in the call; a local change amount peculiarity determination unit that determines, based on the calculated local change amount statistic, whether the calculated local change amount at each time of each utterance is peculiar; and a satisfaction determination unit that determines, for each utterance, that the utterance is one in which the speaker is satisfied when the number of times a local change amount in the utterance was determined to be peculiar is equal to or greater than a predetermined threshold.

By determining the satisfaction of an utterance based on whether local changes in voice pitch are peculiar, the satisfaction of an utterance can be determined with higher accuracy than before.

FIG. 1 is a block diagram illustrating an example of a satisfaction determination device.
FIG. 2 is a flowchart illustrating an example of a satisfaction determination method.
FIG. 3 is a diagram illustrating an example of the calculation of the local change amount.

[Embodiment]
Hereinafter, embodiments of the present invention will be described.

As shown in FIG. 1, the satisfaction determination device includes, for example, a voice section detection unit 101, a fundamental frequency extraction unit 102, a local change amount calculation unit 103, a local change amount statistic calculation unit 104, a local change amount peculiarity determination unit 105, and a satisfaction determination unit 106.

The satisfaction determination method is realized, for example, by the units of the satisfaction determination device performing the processing of steps S101 to S106 shown in FIG. 2 and described below.

<Voice section detection unit 101>
The speech signal of a call is input to the voice section detection unit 101. An example of a call is one between a customer who has phoned a call center and an operator of that call center.

The voice section detection unit 101 obtains at least one customer utterance by applying a voice section detection technique to the input speech signal of the call (step S101). Each customer utterance obtained is output to the fundamental frequency extraction unit 102.

Existing voice section detection techniques can be used, such as a method based on power thresholding or a method based on the likelihood ratio of speech/non-speech models. In the method based on power thresholding, a section in which the power of the speech signal is equal to or greater than a predetermined threshold is taken as a customer utterance.
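The power-thresholding method described above might be sketched as follows. This is only an illustration under stated assumptions: the function name `detect_speech_sections`, the fixed non-overlapping frames, and the use of mean squared amplitude as "power" are choices made here, not details given in the description.

```python
from typing import List, Tuple


def detect_speech_sections(
    samples: List[float],
    frame_len: int,
    power_threshold: float,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of sections whose frame power is at
    or above the threshold. The end index is exclusive.

    Assumption: power is the mean squared amplitude of a non-overlapping frame.
    """
    sections: List[Tuple[int, int]] = []
    start = None  # start sample of the section currently being tracked
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len
        if power >= power_threshold:
            if start is None:
                start = i * frame_len  # a new speech section begins here
        elif start is not None:
            sections.append((start, i * frame_len))  # the section just ended
            start = None
    if start is not None:
        # The signal ended while still inside a speech section.
        sections.append((start, n_frames * frame_len))
    return sections
```

Each returned section would be treated as one utterance. A practical detector would typically also merge sections separated by very short pauses, which this sketch omits.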

<Fundamental frequency extraction unit 102>
Each customer utterance is input to the fundamental frequency extraction unit 102.

For each input customer utterance, the fundamental frequency extraction unit 102 extracts the fundamental frequency sequence corresponding to that utterance. In other words, the fundamental frequency extraction unit 102 extracts the fundamental frequency of each frame constituting an utterance included in the call (step S102). The extracted fundamental frequencies are output to the local change amount calculation unit 103.

For example, a method based on autocorrelation (see, for example, Reference 1) can be used to extract the fundamental frequency. Other fundamental frequency extraction methods may of course be used.

[Reference 1]
Shuichi Itabashi, "Speech Engineering" (音声工学), Morikita Publishing Co., Ltd., 2006, pp.104-105
Smoothing may be applied to the extracted fundamental frequency sequence. As the smoothing method, for example, a method using a median filter can be used.
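As a rough illustration of autocorrelation-based extraction followed by median smoothing, the following sketch estimates the fundamental frequency of one frame and smooths a sequence of estimates. It is not the method of Reference 1. The search range (60 to 400 Hz), the voicing threshold of 0.3, the convention of returning 0.0 for unvoiced frames, and the filter width of 3 are all assumptions made for this example.

```python
from statistics import median
from typing import List, Sequence


def estimate_f0(
    frame: Sequence[float],
    sample_rate: float,
    f0_min: float = 60.0,   # assumed lower bound of the search range (Hz)
    f0_max: float = 400.0,  # assumed upper bound of the search range (Hz)
    voicing_threshold: float = 0.3,  # assumed minimum normalized peak
) -> float:
    """Estimate F0 in Hz from the autocorrelation peak; 0.0 means unvoiced."""
    n = len(frame)
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag == 0 or best_corr < voicing_threshold:
        return 0.0
    return sample_rate / best_lag


def median_smooth(values: Sequence[float], width: int = 3) -> List[float]:
    """Median-filter a sequence; windows are truncated at both ends."""
    half = width // 2
    return [
        median(values[max(0, i - half):i + half + 1])
        for i in range(len(values))
    ]
```

Calling `estimate_f0` on successive frames of an utterance gives its fundamental frequency sequence, and `median_smooth` suppresses isolated outliers such as octave errors.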

<Local change amount calculation unit 103>
The fundamental frequencies extracted by the fundamental frequency extraction unit 102 are input to the local change amount calculation unit 103.

The local change amount calculation unit 103 obtains the local change amounts of the fundamental frequency from the fundamental frequency sequence. In other words, the local change amount calculation unit 103 calculates the local change amount, which is the amount of change in the extracted fundamental frequency (step S103). The calculated local change amounts are output to the local change amount statistic calculation unit 104 and the local change amount peculiarity determination unit 105.

The local change amount is, for example, the gradient obtained by first-order (linear) regression of the fundamental frequency. That is, as shown for example in FIG. 3, a linear regression analysis is performed on the fundamental frequencies within a window before and after a given time (for example, 50 milliseconds before and after), and the slope of the resulting regression line is taken as the local change amount at that time. Applying this processing at each time across the fundamental frequency sequence of an utterance yields the local change amounts for the whole utterance. The local change amount calculation unit 103 performs this processing for each utterance constituting the call.
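The windowed regression described above can be sketched as below. The 10 ms frame shift, the truncation of windows at the ends of an utterance, and the assumption that the input contains only voiced frames are illustrative choices. The description specifies only the regression and the example window of 50 milliseconds before and after.

```python
from typing import List, Sequence


def local_change_amounts(
    f0: Sequence[float],
    frame_shift_ms: float = 10.0,   # assumed spacing between F0 frames
    half_window_ms: float = 50.0,   # 50 ms before and after, as in the example
) -> List[float]:
    """Slope (Hz per second) of the least-squares line fitted to the F0
    values within +/- half_window_ms of each frame.

    Assumption: f0 holds voiced frames only; windows shrink at the edges.
    """
    half = int(round(half_window_ms / frame_shift_ms))
    step = frame_shift_ms / 1000.0  # frame spacing in seconds
    slopes: List[float] = []
    for t in range(len(f0)):
        lo, hi = max(0, t - half), min(len(f0), t + half + 1)
        xs = [i * step for i in range(lo, hi)]
        ys = list(f0[lo:hi])
        n = len(xs)
        if n < 2:
            slopes.append(0.0)  # a single point has no defined slope
            continue
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)  # least-squares slope of y on x
    return slopes
```

For a pitch contour that rises by 2 Hz per 10 ms frame, every slope comes out at about 200 Hz per second. A flat contour gives slopes of zero.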

<Local change amount statistic calculation unit 104>
The local change amounts of the fundamental frequency calculated by the local change amount calculation unit 103 are input to the local change amount statistic calculation unit 104.

The local change amount statistic calculation unit 104 obtains the mean and variance of the local change amounts of the fundamental frequency over the entire call, and takes them as the local change amount statistic of the fundamental frequency for the entire call. In other words, the local change amount statistic calculation unit 104 calculates a local change amount statistic, which is the mean and variance of the local change amounts of the fundamental frequency of the utterances included in the call (step S104). The calculated local change amount statistic is output to the local change amount peculiarity determination unit 105.

This call-wide local change amount statistic expresses how the speaker uses intonation in that call. For example, a large variance suggests a speaker who talks with strong intonation.
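A minimal sketch of the call-wide statistic follows, assuming that the local change amounts of all utterances in the call are simply pooled and that the population variance is used. The description does not say whether the population or the sample variance is intended, so that choice is an assumption.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def local_change_statistics(
    changes_per_utterance: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Mean and variance of the local change amounts pooled over every
    utterance of one call.

    Assumption: population variance (divide by N), not sample variance.
    """
    pooled = [c for utterance in changes_per_utterance for c in utterance]
    mean = fmean(pooled)
    return mean, pvariance(pooled, mu=mean)
```

Because the statistic is computed per call, it adapts to each speaker: the same slope may be ordinary for a speaker with lively intonation and unusual for a monotone speaker.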

<Local change amount peculiarity determination unit 105>
The local change amounts of the fundamental frequency calculated by the local change amount calculation unit 103 and the local change amount statistic calculated by the local change amount statistic calculation unit 104 are input to the local change amount peculiarity determination unit 105.

The local change amount peculiarity determination unit 105 determines, from the local change amount of the fundamental frequency at each time, whether the local change amount at that time was peculiar. In other words, the local change amount peculiarity determination unit 105 determines, based on the local change amount statistic calculated by the local change amount statistic calculation unit 104, whether each local change amount calculated by the local change amount calculation unit 103 is peculiar (step S105). The determination results are output to the satisfaction determination unit 106. A determination result indicating that the local change amount was peculiar is called a "peculiar determination", and a determination result indicating that it was not peculiar is called a "non-peculiar determination".

When a local change amount is determined to be peculiar, the speaker can be said to have produced an unusual rise or fall in voice pitch at that time, and such rises and falls are related to whether the speaker is satisfied.

Whether a local change amount was peculiar is determined using, for example, a normal distribution given by the call-wide local change amount statistic (for example, mean and variance) and a predetermined confidence level. The call-wide statistic and the confidence level define a confidence interval. The local change amount of the fundamental frequency at each time is judged not peculiar if it falls within the confidence interval and peculiar if it does not. The confidence level is, for example, 95%.
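One way to read this determination is as a two-sided interval of a normal distribution centered on the call-wide mean. The sketch below follows that reading. The function names and the two-sided interpretation of the confidence level are assumptions made for illustration.

```python
import math
from statistics import NormalDist
from typing import Tuple


def confidence_interval(
    mean: float, variance: float, confidence_level: float = 0.95
) -> Tuple[float, float]:
    """Two-sided interval of N(mean, variance) covering confidence_level."""
    # Standard normal quantile, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


def is_peculiar(
    change: float, mean: float, variance: float, confidence_level: float = 0.95
) -> bool:
    """True when the local change amount lies outside the interval."""
    lower, upper = confidence_interval(mean, variance, confidence_level)
    return not (lower <= change <= upper)
```

With a mean of 0 and a variance of 1, the 95% interval is roughly plus or minus 1.96, so a local change amount of 2.5 is judged peculiar and one of 1.0 is not.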

<Satisfaction determination unit 106>
The determination results obtained by the local change amount peculiarity determination unit 105 are input to the satisfaction determination unit 106.

The satisfaction determination unit 106 returns a satisfaction determination result for each utterance based on the per-time determinations of whether the local change amount is peculiar (step S106). The satisfaction determination result is, for example, a binary value indicating whether the speaker is satisfied in that utterance.

For example, when a customer's utterance contains at least a predetermined threshold number of per-time peculiar determinations, the utterance is regarded as one in which the speaker is satisfied. Otherwise, the utterance is regarded as one in which the speaker is not satisfied. The predetermined threshold is, for example, 2. This corresponds to the case where at least one pair of a sharp rise and a sharp fall in voice pitch occurred. The predetermined threshold may be a positive integer other than 2.
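The counting rule above reduces to a few lines. In this sketch the function name and the representation of per-time determinations as a list of booleans are assumptions. The default threshold of 2 is the example value given in the description.

```python
from typing import Iterable


def is_satisfied_utterance(
    peculiar_flags: Iterable[bool], count_threshold: int = 2
) -> bool:
    """True when the utterance contains at least count_threshold
    per-time peculiar determinations."""
    return sum(1 for flag in peculiar_flags if flag) >= count_threshold
```

An utterance with peculiar determinations at two times is judged satisfied under the default threshold, while an utterance with only one is not.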

[Second embodiment]
The satisfaction determination device and method of the second embodiment additionally use the per-utterance prosodic features and language features employed in the conventional techniques when performing the satisfaction determination. Using these features as well can be expected to further improve the accuracy of the determination.

The following description focuses on the differences from the first embodiment. Descriptions of parts that are the same as in the first embodiment are omitted.

The satisfaction determination device of the second embodiment further includes a prosodic feature extraction unit 201 and a language feature extraction unit 202, shown by dotted lines in FIG. 1.

<Prosodic feature extraction unit 201>
Each customer utterance obtained by the voice section detection unit 101 is input to the prosodic feature extraction unit 201.

The prosodic feature extraction unit 201 obtains prosodic features from each customer utterance. The obtained prosodic features are output to the satisfaction determination unit 106.

The prosodic features are information including at least one statistic (for example, mean or variance) of voice pitch, loudness, or speech rate.

<Language feature extraction unit 202>
Each customer utterance obtained by the voice section detection unit 101 is input to the language feature extraction unit 202.

The language feature extraction unit 202 obtains language features from each customer utterance. The obtained language features are output to the satisfaction determination unit 106.

The language features are information including at least one occurrence count, such as the number of fillers or the number of times competing product names or competitor names appear.

<Satisfaction determination unit 106>
The prosodic features obtained by the prosodic feature extraction unit 201 and the language features obtained by the language feature extraction unit 202 are input to the satisfaction determination unit 106 of the second embodiment.

In addition to the per-time peculiar determinations, the satisfaction determination unit 106 of the second embodiment further uses at least one of the per-utterance prosodic features and language features to obtain a customer satisfaction determination result for each utterance. In other words, the satisfaction determination unit 106 of the second embodiment further uses at least one of the prosodic features and the language features to determine whether an utterance is one in which the speaker is satisfied.
The customer satisfaction determination result is a binary value indicating whether the customer is satisfied in that utterance.

The customer satisfaction determination result is determined, for example, by combining the results of threshold tests on one or more features. For example, an utterance is determined to be satisfied when it contains at least the predetermined threshold number of per-time peculiar determinations and, among the prosodic features, the mean fundamental frequency is equal to or greater than a threshold. This corresponds to determining that the customer is satisfied when the voice is high on average and a sharp rise in voice pitch has occurred.
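The combined rule in this example could be sketched as follows. The function name is hypothetical, and the default mean fundamental frequency threshold of 200 Hz is a placeholder chosen for the example, since the description gives no value for it.

```python
from typing import Iterable


def is_satisfied_utterance_combined(
    peculiar_flags: Iterable[bool],
    mean_f0_hz: float,
    count_threshold: int = 2,
    mean_f0_threshold_hz: float = 200.0,  # placeholder value, not from the source
) -> bool:
    """Satisfied only when both threshold tests pass: enough per-time
    peculiar determinations AND a sufficiently high mean F0."""
    enough_peculiar = sum(1 for flag in peculiar_flags if flag) >= count_threshold
    high_pitch = mean_f0_hz >= mean_f0_threshold_hz
    return enough_peculiar and high_pitch
```

Requiring both conditions makes the rule stricter than the count test alone. Other combinations of threshold tests, including ones over language features, would follow the same pattern.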

The predetermined thresholds are set as appropriate so that the desired results are obtained. They may also be determined by machine learning.

[Program and recording medium]
When the processing of the satisfaction determination device is implemented by a computer, the processing content of the functions that the satisfaction determination device should have is described by a program. Executing this program on a computer then realizes the processing of the satisfaction determination device on that computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

Each processing means may be configured by executing a predetermined program on a computer, or at least part of the processing may be implemented in hardware.

[Modification]
Needless to say, other modifications can be made as appropriate without departing from the spirit of the present invention.

101 voice section detection unit
102 fundamental frequency extraction unit
103 local change amount calculation unit
104 local change amount statistic calculation unit
105 local change amount peculiarity determination unit
106 satisfaction determination unit
201 prosodic feature extraction unit
202 language feature extraction unit

Claims

A satisfaction determination device comprising:
a fundamental frequency extraction unit that extracts the fundamental frequency of each frame constituting each utterance included in a call;
a local change amount calculation unit that calculates, for each utterance and at each time, a local change amount that is the amount of change in the extracted fundamental frequency;
a local change amount statistic calculation unit that calculates a local change amount statistic that is the mean and variance of the calculated per-time local change amounts of the fundamental frequency of the utterances included in the call;
a local change amount peculiarity determination unit that determines, based on the calculated local change amount statistic, whether the calculated local change amount at each time of each utterance is peculiar; and
a satisfaction determination unit that determines, for each utterance, that the utterance is one in which the speaker is satisfied when the number of times a local change amount in the utterance was determined to be peculiar is equal to or greater than a predetermined threshold.

The satisfaction determination device according to claim 1, further comprising:
a prosodic feature extraction unit that extracts prosodic features from the utterance; and
a language feature extraction unit that extracts language features from the utterance,
wherein the satisfaction determination unit further uses at least one of the extracted prosodic features and language features to determine whether the utterance is one in which the speaker is satisfied.

A satisfaction determination method comprising:
a fundamental frequency extraction step in which a fundamental frequency extraction unit extracts the fundamental frequency of each frame constituting each utterance included in a call;
a local change amount calculation step in which a local change amount calculation unit calculates, for each utterance and at each time, a local change amount that is the amount of change in the extracted fundamental frequency;
a local change amount statistic calculation step in which a local change amount statistic calculation unit calculates a local change amount statistic that is the mean and variance of the calculated per-time local change amounts of the fundamental frequency of the utterances included in the call;
a local change amount peculiarity determination step in which a local change amount peculiarity determination unit determines, based on the calculated local change amount statistic, whether the calculated local change amount at each time of each utterance is peculiar; and
a satisfaction determination step in which a satisfaction determination unit determines, for each utterance, that the utterance is one in which the speaker is satisfied when the number of times a local change amount in the utterance was determined to be peculiar is equal to or greater than a predetermined threshold.

A program for causing a computer to function as each unit of the satisfaction determination device according to claim 1.