JP4972107B2

JP4972107B2 - Call state determination device, call state determination method, program, recording medium

Info

Publication number: JP4972107B2
Application number: JP2009016335A
Authority: JP
Inventors: 済央野本; 敏高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-01-28
Filing date: 2009-01-28
Publication date: 2012-07-11
Anticipated expiration: 2029-01-28
Also published as: JP2010175684A

Description

The present invention relates to a call state determination device, a call state determination method, a program, and a recording medium that estimate, for example, the emotional state of each speaker from an input speech signal in a situation where two speakers are conversing.

In recent years, companies have been actively trying to extract useful information from the unfiltered voice of their customers, such as the requests and complaints that reach their call centers. The call center's role as the face of the company has also come to be regarded as important, and companies are working to improve call-center service quality in order to improve the image customers hold of them. Against this background, a technique for automatically finding calls in which the customer is angry (hereinafter, "complaint calls") is desired more than ever. Analyzing complaint calls makes it possible to identify strong customer requests and dissatisfaction, defects and problems in products and services, and problems in operator handling that trigger complaint calls. Monitoring operator handling in real time makes it possible to detect a complaint quickly as it arises and respond to it.

As emotion recognition techniques for a speaker's "angry" speech, used to find complaint calls, acoustic features of utterance speech sections have conventionally been used in general. These focus on speech characteristics such as pitch (pitch frequency), loudness (power), their variation components (Δ components), and speech rate (see Patent Document 1; hereinafter, "Prior Art 1").

A method that focuses on the operator's backchannel responses (aizuchi) during a call has also been proposed (see Patent Document 2; hereinafter, "Prior Art 2"). It applies speech recognition to backchannel words registered in advance and determines whether the call is in a complaint state based on the time, measured from the start of the call, at which the first backchannel response is given.

Patent Document 1: JP 2005-345496 A
Patent Document 2: JP 2007-286097 A

The problems with Prior Art 1 are as follows. The customer speech recorded at a call center is telephone speech. Because it contains recorded noise and has passed through a band-limiting filter, the pitch frequency (pitch pattern) is hard to extract and prone to detection errors. In addition, even if a speaker speaks in the same voice, the power value computed at the receiving-side recorder differs depending on the telephone's volume setting and the distance between the speaker's mouth and the handset. The customer's speech (or the operator's speech) under analysis also varies with speaking style and telephone environment. It is therefore difficult to compute acoustic features such as pitch (pitch frequency) and loudness (power) accurately, and difficult to make complaint determinations robustly and with high accuracy. Moreover, people express anger in different ways: some raise their voices, while others are angry in a calm voice. Even a speaker who does raise their voice does not keep shouting from the beginning of the conversation to the end. For these reasons it is difficult to determine whether a customer is angry from acoustic features such as pitch and loudness alone.

The problem with Prior Art 2 is as follows. This technique cannot determine a complaint unless the call is analyzed from its start. Consequently, it has been difficult to recognize a complaint call when, for example, the operator's handling is poor and the customer becomes angry partway through the call.

To solve the above problems, the call state determination device of the present application determines whether a first speaker who is on a call with a second speaker is in a first state or a second state, and includes a detection unit, an extraction unit, a calculation unit, a vectorization unit, a score calculation unit, a determination unit, and an output unit. The detection unit detects the utterance sections of the first speaker and the second speaker. The extraction unit extracts a predetermined number of utterance pairs as a segment. The calculation unit calculates, for each utterance pair, interactive features relating to the utterance situation. The vectorization unit obtains a feature vector by aggregating the interactive features for each segment. The score calculation unit obtains a first state score by substituting the elements of the feature vector into a predetermined discriminant. The determination unit determines a segment to be a first state segment if its first state score is equal to or greater than a predetermined first threshold. The output unit outputs information indicating that the call state is the first state if the first state segments are equal to or greater than a predetermined second threshold.
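The two-threshold decision performed by the determination unit and the output unit can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name and arguments are hypothetical, and it assumes the second threshold is compared against the count of first state segments, which this passage does not state explicitly (a ratio would be another reading).

```python
from typing import Sequence


def is_first_state_call(segment_scores: Sequence[float],
                        first_threshold: float,
                        second_threshold: int) -> bool:
    """Return True if the call is judged to be in the first state.

    Assumption: the second threshold applies to the number of segments
    whose score reaches the first threshold.
    """
    # Determination unit: a segment is a first state segment if its
    # first state score is at or above the first threshold.
    first_state_segments = sum(1 for s in segment_scores if s >= first_threshold)
    # Output unit: report the first state when enough segments qualify.
    return first_state_segments >= second_threshold
```

For example, with scores `[0.9, 0.2, 0.8]`, a first threshold of 0.5, and a second threshold of 2, two segments qualify and the call is reported as being in the first state.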

According to the call state determination device of the present invention, whether the customer is in a complaint state is determined using interactive features, which characterize the dialogue, rather than acoustic features such as pitch and loudness as in Prior Art 1. Because interactive features are not affected by the customer's telephone environment, voice, or speaking style, the problems of Prior Art 1 are solved, and complaint determination can be performed robustly and accurately.

Furthermore, whereas Prior Art 2 determines whether a call is in a complaint state based on the time at which the first backchannel response is given after the start of the call, the present invention makes the determination using the interactive features of each utterance pair during the call. Complaint determination can therefore be performed robustly and accurately even when the customer becomes angry partway through the call, which solves the problem of Prior Art 2.

FIG. 1 is a diagram showing an example functional configuration of the call state determination device of the embodiment.
FIG. 2 is a diagram showing the processing flow of the call state determination device of the embodiment.
FIGS. 3A to 3D are diagrams each illustrating an utterance pair.
FIG. 4 is a diagram illustrating segments.
FIGS. 5A to 5C are diagrams each illustrating interactive features.

In the following description, in a call center setting, the first speaker is the customer, the second speaker is the operator, the first state is the state in which the customer is angry (hereinafter, "complaint state"), and the second state is the state in which the customer is not angry (that is, the normal state; hereinafter, "non-complaint state"). Each stretch of speech by the customer or the operator is called an utterance, and a set of utterances is called a call.

FIG. 1 shows an example functional configuration of the call state determination device 100 and related components, and FIG. 2 shows the processing flow. When the call audio of the customer and the operator is input to the input terminal 2, the detection unit 4 detects the utterance sections of the customer (first speaker) and the operator (second speaker) (step S2). Specifically, the customer's and operator's speech are separated using hardware such as an existing separation adapter, or a sound source separation technique, and a start time and an end time are obtained for each utterance. One technique for detecting utterance sections is to treat as an utterance section any interval in which speech power at or above a predetermined, arbitrarily chosen third threshold L₃ continues for at least a certain duration.
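The power-threshold technique mentioned for utterance section detection could look roughly like the sketch below. It assumes the audio of one speaker has already been reduced to a list of per-frame power values. The function name, the frame representation, and the `min_frames` parameter (standing in for the "certain duration") are illustrative assumptions; only the threshold L₃ comes from the text.

```python
from typing import List, Tuple


def detect_utterance_sections(frame_power: List[float],
                              l3: float,
                              min_frames: int) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end exclusive.

    A run of frames counts as an utterance section when every frame has
    power >= l3 and the run lasts at least min_frames frames.
    """
    sections: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, power in enumerate(frame_power):
        if power >= l3:
            if start is None:
                start = i
        else:
            # The run just ended; keep it only if it was long enough.
            if start is not None and i - start >= min_frames:
                sections.append((start, i))
            start = None
    # Handle a run that extends to the end of the signal.
    if start is not None and len(frame_power) - start >= min_frames:
        sections.append((start, len(frame_power)))
    return sections
```

Multiplying the frame indices by the frame period would give the start and end times of each utterance.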

FIGS. 3A to 3D schematically show operator utterances and customer utterances. Unhatched rectangles indicate operator utterance sections, hatched rectangles indicate customer utterance sections, and the horizontal axis is time. As shown in FIG. 3, when the operator and the customer are recorded separately in stereo, utterance sections are easier to detect than with monaural recording. With monaural recording, detection must be combined with a means of distinguishing the customer's speech from the operator's. For example, the customer's and operator's speech may be separated using a GMM (Gaussian Mixture Model) with the speech spectrum as the feature.

Here, as shown in FIG. 3A, the right to speak is regarded as having changed hands when one party's continuous utterance (the operator's, in the example of FIG. 3A) ends and the other party (the customer, in the example of FIG. 3A) begins to speak. The interval from the start of the utterance to this end point is taken as one utterance section. In FIG. 3B, the operator stops speaking partway through but then resumes without the right to speak changing hands; this too is treated as a single operator utterance section. In FIG. 3C, the customer gives backchannel responses that overlap the operator's utterance; here again the right to speak does not change hands, and this is likewise regarded as a single operator utterance. A pair consisting of one customer utterance and one operator utterance is called an utterance pair. That is, in the examples of FIGS. 3A to 3C, the operator's utterance together with the customer's utterance forms an utterance pair. FIGS. 3A to 3C show utterance pairs in operator-then-customer order, but, as shown in FIG. 3D, utterances in customer-then-operator order are also called an utterance pair.

Next, the extraction unit 6 extracts a predetermined number Q of utterance pairs as a segment (step S4). FIG. 4 is a schematic diagram of segment extraction. The example in FIG. 4 shows Q = 3, that is, three utterance pairs forming one segment. After the first three utterance pairs are extracted, the window is slid by a fixed interval, or by one utterance section, and the predetermined number Q (three in this example) of utterance pairs is extracted again. Segments are extracted by repeating this process. The predetermined number Q may also be 1, in which case each utterance pair forms one segment.
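The sliding extraction of segments can be illustrated with a short sketch. It assumes the utterance pairs are already available as an ordered list and that the window slides in units of whole utterance pairs. The `step` parameter and the function name are hypothetical; the text also allows sliding by a fixed interval or by a single utterance section, which this sketch does not cover.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def extract_segments(utterance_pairs: Sequence[T], q: int, step: int = 1) -> List[List[T]]:
    """Return overlapping segments of q consecutive utterance pairs.

    Assumption: the window advances by `step` utterance pairs each time.
    """
    if q < 1 or step < 1:
        raise ValueError("q and step must be positive")
    last_start = len(utterance_pairs) - q
    # range() is empty when fewer than q pairs exist, so no segment is produced.
    return [list(utterance_pairs[i:i + q]) for i in range(0, last_start + 1, step)]
```

With five utterance pairs and Q = 3, this yields three segments (pairs 1 to 3, 2 to 4, and 3 to 5); with Q = 1, each utterance pair becomes its own segment.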

Next, the calculation unit 8 calculates, for each utterance pair of the first speaker (customer) and the second speaker (operator), an interactive feature R relating to the utterance situation (step S6). Unlike the per-utterance acoustic features of Prior Art 1, such as loudness and pitch, the interactive feature R is a feature of the dialogue, which consists of the set of utterances exchanged between the first speaker and the second speaker.

In this embodiment, the following are used as the interactive feature R:
(1) the utterance time A of the customer (first speaker) for each utterance pair;
(2) the utterance time B of the operator (second speaker) for each utterance pair;
(3) the discreteness C between the operator's utterance time and the customer's utterance time for each utterance pair;
(4) the customer's backchannel count D for each utterance pair;
(5) the operator's backchannel count E for each utterance pair;
(6) the silent time F between the customer's utterance and the operator's utterance for each utterance pair;
(7) the overlap time G between the customer's utterance and the operator's utterance for each utterance pair; and the like.
Each is described in detail below. FIGS. 5A to 5C show schematic diagrams of the interactive features (1) to (7).

(1) Customer utterance time A
FIG. 5A shows the customer utterance time A. The customer's utterance time is used as an interactive feature because the following tendency is known empirically:
Customer utterance time A is long: the customer is in a complaint state.
Customer utterance time A is short: the customer is in a non-complaint state.
When the customer is in a complaint state, the customer often speaks at the operator one-sidedly, so the customer utterance time A tends to be long.

In this case, the calculation unit 8 is provided with first calculation means 81 (see FIG. 1), which calculates the customer's utterance time. As one example of how to calculate it, the time from the start of the customer's utterance to its end is measured.

(2) Operator utterance time B
FIG. 5A shows the operator utterance time B. The operator utterance time B is used as an interactive feature because the following tendency is known empirically:
Operator utterance time B is long: the customer is in a non-complaint state.
Operator utterance time B is short: the customer is in a complaint state.
When the customer is in a complaint state, the customer often speaks at the operator one-sidedly, so the operator says little and the operator utterance time B tends to be short.

In this case, the calculation unit 8 is provided with second calculation means 82, which calculates the operator's utterance time. As one example of how to calculate it, the time from the start of the operator's utterance to its end is measured.

(3) Discreteness C between customer utterance time and operator utterance time
FIG. 5B shows the discreteness C. The discreteness C indicates how far the customer utterance time A diverges from the operator utterance time B, and is, for example, the difference (A − B) or the ratio (A / B). The discreteness C is used as an interactive feature because the following tendency is known empirically:
Discreteness C is large (A − B or A / B is large, that is, the customer utterance time A is large compared with the operator utterance time B): the customer is in a complaint state.
Discreteness C is small (A − B or A / B is small, that is, the customer utterance time A is small compared with the operator utterance time B): the customer is in a non-complaint state.
When the customer is in a complaint state, the customer talks one-sidedly, so the customer's utterance time becomes longer and the operator's becomes shorter, which increases the discreteness C between the customer utterance time A and the operator utterance time B.

In a call, when the customer's utterance is long (for example, when the customer asks a long question), the operator's utterance answering it may also be long. In such a case the customer is not actually in a complaint state, but if the customer utterance time A is used as the interactive feature R, the long customer utterance time may lead to the erroneous determination that the customer is in a complaint state (the determination method is described later). Using the discreteness C normalizes the customer's utterance time against the operator's, so that the erroneous determination of a complaint state is avoided even in such a case.

Conversely, when the customer's utterance is short, the operator's utterance answering it may also be short. In such a case the customer is not actually in a complaint state, but if the operator utterance time B is used as the interactive feature R, the short operator utterance time may lead to the erroneous determination that the customer is in a complaint state. Using the discreteness C normalizes the customer's utterance time against the operator's, so that the erroneous determination of a complaint state is avoided even in such a case.

In this case, the calculation unit 8 is provided with third calculation means 83. The third calculation means 83 calculates the customer utterance time A and the operator utterance time B and obtains the discreteness of A and B (A − B or A / B).
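As a small worked illustration of the discreteness C, the sketch below computes either the difference or the ratio named in the text. The function name and the `mode` argument are hypothetical, and the handling of B = 0 for the ratio is an added assumption, since the text does not address that case.

```python
def discreteness(a: float, b: float, mode: str = "difference") -> float:
    """Discreteness C of customer utterance time a and operator utterance time b."""
    if mode == "difference":
        return a - b
    if mode == "ratio":
        if b == 0:
            # Assumption: the ratio is treated as undefined when the operator is silent.
            raise ValueError("operator utterance time is zero; ratio undefined")
        return a / b
    raise ValueError("mode must be 'difference' or 'ratio'")
```

A long customer question (A = 30 s) answered at length (B = 28 s) gives a difference of only 2 s, whereas a one-sided customer utterance (A = 30 s, B = 5 s) gives a difference of 25 s or a ratio of 6, which matches the normalization effect described above.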

(4) Customer backchannel count D
FIG. 5A shows the customer's backchannel count D. The customer's backchannel count D is used as an interactive feature because the following tendency is known empirically:
Customer backchannel count D is high: the customer is in a non-complaint state.
Customer backchannel count D is low: the customer is in a complaint state.
When the customer is in a complaint state, the customer does not listen to the operator and talks one-sidedly, so the customer gives fewer backchannel responses.

In this case, the calculation unit 8 is provided with fourth calculation means 84, which obtains the customer's backchannel count D. An example method for obtaining the backchannel count is as follows. Words that the customer is likely to utter when giving a backchannel response (for example, "un" (うん), "aa" (ああ), and "ee" (ええ), roughly "uh-huh", "ah", and "yes") are defined in advance, the customer's speech is recognized by speech recognition means (not shown), and the number of backchannel words is counted.
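Counting backchannel words in recognizer output could be sketched as below. The sketch assumes that a speech recognizer (not shown, and not specified by the text) has already produced a list of recognized words for one speaker within an utterance pair. The word set reuses the customer examples given in the text; the function name and input format are hypothetical.

```python
from typing import Iterable, Set

# Example customer backchannel words taken from the text.
CUSTOMER_BACKCHANNEL_WORDS: Set[str] = {"うん", "ああ", "ええ"}


def count_backchannels(recognized_words: Iterable[str],
                       backchannel_words: Set[str] = CUSTOMER_BACKCHANNEL_WORDS) -> int:
    """Count how many recognized words are registered backchannel words."""
    return sum(1 for word in recognized_words if word in backchannel_words)
```

The same function could be applied to the operator's speech by passing a different word set.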

(5) Operator backchannel count E
FIG. 5A shows the operator's backchannel count E. The operator's backchannel count E is used as an interactive feature because the following tendency is known empirically:
Operator backchannel count E is low: the customer is in a non-complaint state.
Operator backchannel count E is high: the customer is in a complaint state.
When the customer is in a complaint state, the customer does not listen to the operator and talks one-sidedly, so the operator gives more backchannel responses.

In this case, the calculation unit 8 is provided with fifth calculation means 85, which obtains the operator's backchannel count E. An example method for obtaining the backchannel count is as follows. Words that the operator is likely to utter when giving a backchannel response (for example, "hai" (はい), "sou desu" (そうです), and "moushiwake gozaimasen" (申し訳ございません), roughly "yes", "that's right", and "I am very sorry") are defined in advance, the operator's speech is recognized by the speech recognition means (not shown; described under "(4) Customer backchannel count D"), and the number of backchannel words is counted.

(6) Silent time F
FIG. 5A shows the silent time F. The silent time is the time within an utterance pair during which neither the customer nor the operator is speaking. The silent time F between the customer's utterance and the operator's utterance is used because the following tendency is known empirically:
Silent time F is long: the customer is in a complaint state.
Silent time F is short: the customer is in a non-complaint state.
When the customer is angry, the operator often falls silent (the silent time F becomes longer), and when the customer is not angry, there is often no silence between the operator's utterance and the customer's utterance (the silent time F becomes shorter).

In this case, the calculation unit 8 is provided with sixth calculation means 86, which obtains the silent time F. An example method for obtaining the silent time F is as follows. The sixth calculation means 86 measures the utterance end time T₁ of the speaker who is speaking (the operator, in the example of FIG. 5A) and the utterance start time T₂ of the other speaker (the customer, in the example of FIG. 5), who begins after that utterance ends. In FIG. 5A, T₂ > T₁, so a silent time F occurs, and the sixth calculation means 86 need only calculate T₂ − T₁. T₂ − T₁ is positive and is the value of the silent time F.

(7) Overlap time G
FIG. 5C shows the overlap time G. The overlap time G is the time within an utterance pair during which the customer and the operator are both speaking at once. The overlap time G is used because the following tendency is known empirically:
Overlap time G is long: the customer is in a complaint state.
Overlap time G is short: the customer is in a non-complaint state.
When the customer is angry, the customer often cuts in on the operator's utterance (the overlap time G becomes longer), and when the customer is not angry, there is often no overlap between the operator's utterance and the customer's utterance (the overlap time G becomes shorter).

In this case, the calculation unit 8 is provided with seventh calculation means 87, which obtains the overlap time G. An example method for obtaining the overlap time G is as follows. The seventh calculation means 87 measures the utterance end time T₁ of the speaker who is speaking (the operator, in the example of FIG. 5) and the utterance start time T₂ of the other speaker (the customer, in the example of FIG. 5). In FIG. 5C, T₂ < T₁, so an overlap time G occurs; T₂ − T₁ takes a negative value, which gives the overlap time G.
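Because the silent time F and the overlap time G are both derived from the sign of T₂ − T₁, the two can be computed together, as in the sketch below. The function name is hypothetical, and reporting the overlap as a non-negative magnitude (rather than keeping the negative value) is an assumption made here for convenience.

```python
from typing import Tuple


def silence_and_overlap(t1: float, t2: float) -> Tuple[float, float]:
    """Return (silent_time_f, overlap_time_g) for one speaker change.

    t1: utterance end time of the speaker who was speaking.
    t2: utterance start time of the other speaker.
    """
    gap = t2 - t1
    if gap >= 0:
        # The second speaker starts after the first has finished: silence.
        return gap, 0.0
    # The second speaker starts before the first has finished: overlap.
    return 0.0, -gap
```

For instance, T₁ = 10.0 s and T₂ = 11.5 s give a silent time of 1.5 s and no overlap, while T₁ = 10.0 s and T₂ = 9.25 s give an overlap of 0.75 s and no silence.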

Accordingly, as described above, the interactive feature R is at least one of the customer utterance time A, the operator utterance time B, the discreteness C, the customer backchannel count D, the operator backchannel count E, the silent time F, and the overlap time G, and the calculation unit 8 need only be provided with at least one of the first calculation means 81 to the seventh calculation means 87. Which of the interactive features A to G to use may be decided as appropriate. The interactive feature R is not limited to A to G.

The processing of the calculation unit 8 may be performed on utterance pairs after they have been extracted as segments, or on utterance pairs before they are extracted as segments. In addition to the interactive features above, conventional acoustic features (for example, pitch (pitch frequency), loudness (power), and speech rate) may also be calculated.

The vectorization unit 10 obtains a feature vector by aggregating the interactive feature amounts for each segment (step S8). Specifically, for the Q utterance pairs that constitute a segment, the elements of the feature vector are obtained from the interactive feature amounts R (at least one of A to G). As the elements, for example, the mean, variance, maximum, and minimum over the Q utterance pairs may be used. Let a_q, b_q, c_q, d_q, e_q, f_q, and g_q denote the values of the interactive feature amounts A to G for the q-th utterance pair (q = 1, ..., Q). When all of A to G are used, the feature vector obtained by the vectorization unit 10 for each segment is
(mean of all a_q, variance of all a_q, maximum of all a_q, minimum of all a_q, ..., mean of all g_q, variance of all g_q, maximum of all g_q, minimum of all g_q).
In this case the feature vector has 28 elements (7 feature amounts, 4 statistics each). As stated above, at least one of A to G is used, so the feature vector that is generated depends on which interactive feature amounts are used.
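The aggregation in step S8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment_feature_vector`, the dictionary layout of an utterance pair, and the use of the population variance are all assumptions, since the text does not say which variance estimator is used.

```python
from statistics import mean, pvariance

# Names of the seven interactive feature amounts described in the text.
FEATURES = ("A", "B", "C", "D", "E", "F", "G")


def segment_feature_vector(pairs, features=FEATURES):
    """Build the feature vector of one segment.

    pairs    -- list of Q dicts, one per utterance pair, mapping a feature
                name to its value (hypothetical data layout)
    features -- which interactive feature amounts to use (at least one)
    """
    if not pairs:
        raise ValueError("a segment needs at least one utterance pair")
    vector = []
    for name in features:
        values = [float(pair[name]) for pair in pairs]
        # Four statistics per feature: mean, variance, maximum, minimum.
        # Assumption: population variance; the source does not specify.
        vector.extend([mean(values), pvariance(values), max(values), min(values)])
    return vector
```

With all seven feature amounts the returned list has 28 entries, ordered as in the tuple shown above; passing a shorter `features` tuple yields a correspondingly shorter vector.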

The score calculation unit 12 obtains a first state score (claim score) by substituting each element of the feature vector into a predetermined discriminant F(X) (step S10). The discriminant F(X) is obtained in advance by the learning device 200. The learning device 200 includes the same detection unit 4, extraction unit 6, calculation unit 8, and vectorization unit 10 as the call state determination device 100, together with a learning unit 18. How the discriminant F(X) is obtained is described below.

The call database storage unit 20 stores a plurality of claim calls and non-claim calls. These calls are processed by the detection unit 4 through the vectorization unit 10 in the same way as in the call state determination device 100. The interactive feature amounts R (A to G above) and the feature vector elements (the mean, variance, maximum, and minimum above) used in the learning device 200 must be identical to those used in the call state determination device 100. The learning unit 18 performs machine learning on the feature vectors from the vectorization unit 10. Various learning methods are available; for example, a linear discriminant method, a support vector machine, or a neural network may be used.

When the linear discriminant method is used, the discriminant F(X) obtained from the learning device 200 is, for example, given by the following expression (1).
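The expression itself is not reproduced in this text. A weighted sum consistent with the definitions that follow (M feature vector elements X_m and weighting coefficients α_m, m = 1, ..., M) would take the form below. This is an assumption; in particular, whether the original expression also contains a constant term is not known.

```latex
F(X) = \sum_{m=1}^{M} \alpha_m X_m \tag{1}
```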

Here, M is the number of elements of the feature vector, and each element of the feature vector is substituted into the corresponding X_m. Learning yields a discriminant F(X) such that the calculated claim score is large when the elements of the feature vector of a segment in which the customer is in a claim state are substituted, and small when those of a segment in a non-claim state are substituted. Through this learning, the learning device 200 obtains the weighting coefficients α_m (m = 1, ..., M) of expression (1).

The determination unit 14 then determines, for each segment, whether the claim score is equal to or greater than a predetermined first threshold L_1 (step S12). If it is, the segment is treated as a claim segment (an interval in the claim state); if the score is smaller than the first threshold L_1, the segment is treated as a non-claim segment (an interval in the non-claim state).
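Steps S10 and S12 can be sketched together as below. The sketch assumes a linear discriminant that is a plain weighted sum of the feature vector elements with no constant term. The function names, the weights, and the threshold value are hypothetical and would in practice come from the learning device and from tuning.

```python
def claim_score(feature_vector, weights):
    """Score of one segment under an assumed linear discriminant:
    the sum of weight * element over all M feature vector elements."""
    if len(feature_vector) != len(weights):
        raise ValueError("feature vector and weights must have the same length M")
    return sum(w * x for w, x in zip(weights, feature_vector))


def is_claim_segment(score, first_threshold):
    """Step S12: a segment counts as a claim segment when its score is
    equal to or greater than the first threshold."""
    return score >= first_threshold
```

For example, `is_claim_segment(claim_score(x, alpha), 0.0)` labels a segment given its feature vector `x` and learned weights `alpha`; the threshold `0.0` here is only a placeholder.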

If the state of the claim segments (first state segments) is equal to or greater than a predetermined second threshold L_2, the output unit 16 outputs information indicating that the call is in the first state (step S14). Here, the state of the claim segments means the number of claim segments or the proportion of claim segments among all segments. For example, the second threshold L_2 may be set to 1, so that the call is regarded as a claim call (that is, the customer is angry) if it contains even one claim segment. More generally, the call may be regarded as a claim call when the number of claim segments is equal to or greater than the second threshold L_2, or when the proportion of claim segments among all segments is equal to or greater than the second threshold L_2.
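The call-level decision of step S14 can be sketched as follows. The function name, the `use_ratio` switch, and the treatment of a call with no segments as a non-claim call are assumptions made for this illustration.

```python
def is_claim_call(segment_flags, second_threshold, use_ratio=False):
    """Decide whether a whole call is a claim call.

    segment_flags    -- one boolean per segment, True for a claim segment
    second_threshold -- minimum count (or minimum ratio if use_ratio is True)
    use_ratio        -- compare the proportion of claim segments instead of
                        their number
    """
    if not segment_flags:
        # Assumption: a call without any segment is not flagged.
        return False
    count = sum(1 for flag in segment_flags if flag)
    state = count / len(segment_flags) if use_ratio else count
    return state >= second_threshold
```

Setting `second_threshold=1` with the default count mode reproduces the example in the text, where a single claim segment is enough to flag the call.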

Thus, the call state determination device of the present invention determines whether the customer is in a claim state using interactive feature amounts, which characterize the interaction itself, rather than acoustic feature amounts such as voice pitch and loudness as in prior art 1. Because interactive feature amounts are not affected by the customer's telephone environment, voice, or manner of speaking, and vary little, claim determination can be performed robustly and accurately. The invention can also determine the claim state of a customer who is angry but speaks calmly.

In addition, because the present invention determines whether a claim state exists using the interactive feature amounts of each utterance pair in the call, claim determination remains robust and accurate even if the customer becomes angry partway through the call.

Moreover, because the feature amounts capture phenomena that appear while the customer is angry, they are suited to determining whether a set of utterances (that is, a call) is an angry one.

<Hardware configuration>
The present invention is not limited to the embodiment described above. The various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as required. Needless to say, other modifications may be made as appropriate without departing from the spirit of the present invention.

When the configuration described above is implemented by a computer, the processing content of the functions that the call state determination device 100 should have is described by a program. The processing functions are realized on the computer by executing this program on the computer.

The program describing this processing content can be stored on a computer-readable recording medium. The computer-readable recording medium may be of any kind, such as a magnetic storage device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic storage device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electrically Erasable and Programmable Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing in accordance with the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing in accordance with it, or it may execute processing in accordance with the received program each time a program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and the retrieval of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of this processing content may instead be implemented in hardware.

The call state determination device 100 described in this embodiment has a CPU (Central Processing Unit), an input unit, an output unit, an auxiliary storage device, a RAM (Random Access Memory), a ROM (Read Only Memory), and a bus (none of which are shown).

The CPU executes various arithmetic operations in accordance with the programs that have been loaded. The auxiliary storage device is, for example, a hard disk, an MO (Magneto-Optical disc), or a semiconductor memory, and the RAM is, for example, an SRAM (Static Random Access Memory) or a DRAM (Dynamic Random Access Memory). The bus connects the CPU, the input unit, the output unit, the auxiliary storage device, the RAM, and the ROM so that they can communicate with one another.

<Cooperation between hardware and software>
The call state determination device 100 of this embodiment is constructed by loading a predetermined program into hardware such as that described above and having the CPU execute it. The functional configuration of each device constructed in this way is described below.

The input unit and the output unit of the call state determination device 100 are communication devices, such as a LAN card or a modem, that operate under the control of a CPU into which a predetermined program has been loaded. The other units, such as the calculation unit 8, are arithmetic units constructed by loading a predetermined program into the CPU and executing it. The storage unit functions as the auxiliary storage device.

Claims

A call state determination device that takes as input the call voice between a first speaker and a second speaker and determines whether the call is in a claim state, that is, a state in which one of the speakers is angry, the device comprising:
a detection unit that obtains the start time and end time of each utterance of the first speaker and of each utterance of the second speaker;
an extraction unit that treats one utterance of the first speaker and one utterance of the second speaker contained in the call voice as an utterance pair, and extracts, as a segment, a set consisting of a predetermined number of utterance pairs;
a calculation unit that calculates, for each utterance pair, the time during which the utterance of the first speaker and the utterance of the second speaker overlap, as an interactive feature amount;
a vectorization unit that obtains, as elements of a feature vector, the average value of the interactive feature amounts of all utterance pairs constituting the segment, the variance of the interactive feature amounts of all utterance pairs constituting the segment, the maximum value of the interactive feature amounts of all utterance pairs constituting the segment, and the minimum value of the interactive feature amounts of all utterance pairs constituting the segment;
a score calculation unit that obtains a claim state score by substituting each element of the feature vector into a discriminant determined in advance using a plurality of pre-stored call voices in the claim state and a plurality of pre-stored call voices in a non-claim state; and
a determination unit that determines the segment to be a claim state segment if the claim state score is equal to or greater than a predetermined first threshold.

The call state determination device according to claim 1,
further comprising an output unit that outputs information indicating that the call is in the claim state if the number of claim state segments, or the proportion of claim state segments among all segments, is equal to or greater than a predetermined second threshold.

The call state determination device according to claim 1 or 2, wherein
the calculation unit further calculates, as the interactive feature amount, any one or more of: the utterance time of the first speaker; the utterance time of the second speaker; the difference between the utterance time of the first speaker and the utterance time of the second speaker; the ratio of the utterance time of the first speaker to the utterance time of the second speaker; the silent time between the utterance of the first speaker and the utterance of the second speaker; the number of backchannel responses of the first speaker; and the number of backchannel responses of the second speaker, and
the vectorization unit obtains, as the elements of the feature vector, the average value of each interactive feature amount over all utterance pairs constituting the segment, the variance of each interactive feature amount over all utterance pairs constituting the segment, the maximum value of each interactive feature amount over all utterance pairs constituting the segment, and the minimum value of each interactive feature amount over all utterance pairs constituting the segment.

A call state determination method that takes as input the call voice between a first speaker and a second speaker and determines whether the call is in a claim state, that is, a state in which one of the speakers is angry, the method comprising:
a detection process of obtaining the start time and end time of each utterance of the first speaker and of each utterance of the second speaker;
an extraction process of treating one utterance of the first speaker and one utterance of the second speaker contained in the call voice as an utterance pair, and extracting, as a segment, a set consisting of a predetermined number of utterance pairs;
a calculation process of calculating, for each utterance pair, the time during which the utterance of the first speaker and the utterance of the second speaker overlap, as an interactive feature amount;
a vectorization process of obtaining, as elements of a feature vector, the average value of the interactive feature amounts of all utterance pairs constituting the segment, the variance of the interactive feature amounts of all utterance pairs constituting the segment, the maximum value of the interactive feature amounts of all utterance pairs constituting the segment, and the minimum value of the interactive feature amounts of all utterance pairs constituting the segment;
a score calculation process of obtaining a claim state score by substituting each element of the feature vector into a discriminant determined in advance using a plurality of pre-stored call voices in the claim state and a plurality of pre-stored call voices in a non-claim state; and
a determination process of determining the segment to be a claim state segment if the claim state score is equal to or greater than a predetermined first threshold.

The call state determination method according to claim 4,
further comprising an output process of outputting information indicating that the call is in the claim state if the number of claim state segments, or the proportion of claim state segments among all segments, is equal to or greater than a predetermined second threshold.

The call state determination method according to claim 4 or 5, wherein
the calculation process further calculates, as the interactive feature amount, any one or more of: the utterance time of the first speaker; the utterance time of the second speaker; the difference between the utterance time of the first speaker and the utterance time of the second speaker; the ratio of the utterance time of the first speaker to the utterance time of the second speaker; the silent time between the utterance of the first speaker and the utterance of the second speaker; the number of backchannel responses of the first speaker; and the number of backchannel responses of the second speaker, and
the vectorization process obtains, as the elements of the feature vector, the average value of each interactive feature amount over all utterance pairs constituting the segment, the variance of each interactive feature amount over all utterance pairs constituting the segment, the maximum value of each interactive feature amount over all utterance pairs constituting the segment, and the minimum value of each interactive feature amount over all utterance pairs constituting the segment.

A program that causes a computer to operate as the call state determination device according to any one of claims 1 to 3.

A computer-readable recording medium on which the program according to claim 7 is recorded.