JP7238940B2

JP7238940B2 - Information processing device, information processing method and information processing program

Info

Publication number: JP7238940B2
Application number: JP2021165059A
Authority: JP
Inventors: 祥史大西; 真寺尾
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2017-03-30
Filing date: 2021-10-06
Publication date: 2023-03-14
Anticipated expiration: 2037-03-30
Also published as: JP2022000825A; JP2018169843A; JP6957933B2

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

In the above technical field, Patent Document 1 discloses a technique that compares customer information and customer behavior history information against a customer extraction rule defining which customers are regarded as having a high cancellation risk, and extracts the customers who satisfy the rule as customers with a high cancellation risk. Non-Patent Document 1 discloses a method of detecting utterance sections. Non-Patent Document 2 discloses a method of emotion recognition. Non-Patent Document 3 discloses machine learning methods. Non-Patent Document 4 discloses a method of detecting utterance sections.

Patent Document 1: Japanese Patent Application Laid-Open No. 2002-334200

Non-Patent Document 1: Yusuke Kida and Tatsuya Kawahara, "Voice Activity Detection based on Optimally Weighted Combination of Multiple Features," Proc. INTERSPEECH 2005, pp. 2621-2624, 2005.
Non-Patent Document 2: Florian Eyben, Martin Wollmer, and Bjorn Schuller, "openEAR - Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit," 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops.
Non-Patent Document 3: Pedregosa et al., "Scikit-learn: Machine Learning in Python," JMLR 12, pp. 2825-2830, 2011.
Non-Patent Document 4: J. P. Yamron, I. Carp, L. Gillick, S. Lowe, and P. van Mulbregt, "A HIDDEN MARKOV MODEL APPROACH TO TEXT SEGMENTATION AND EVENT TRACKING," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 333-336, 1998.

However, because the technique described in the above documents relies on the customer's behavior history information, it cannot predict the cancellation risk of a customer who makes contact for the first time and therefore has no behavior history information.

An object of the present invention is to provide a technique that solves the above problem.

To achieve the above object, an information processing device according to the present invention comprises:
means for acquiring a speech amount feature relating to the amounts of speech of a customer and an operator, calculated from voice data of a call between the customer and the operator;
means for acquiring an emotion value feature relating to the emotion values of the customer and the operator during the call, calculated from the voice data; and
means for predicting the cancellation risk of the customer using a model that predicts the cancellation risk of the customer from the speech amount feature and the emotion value feature, together with the acquired speech amount feature and the acquired emotion value feature.

To achieve the above object, an information processing method according to the present invention includes:
a step of acquiring a speech amount feature relating to the amounts of speech of a customer and an operator, calculated from voice data of a call between the customer and the operator;
a step of acquiring an emotion value feature relating to the emotion values of the customer and the operator during the call, calculated from the voice data; and
a step of predicting the cancellation risk of the customer using a model that predicts the cancellation risk of the customer from the speech amount feature and the emotion value feature, together with the acquired speech amount feature and the acquired emotion value feature.

To achieve the above object, an information processing program according to the present invention causes a computer to execute:
a step of acquiring a speech amount feature relating to the amounts of speech of a customer and an operator, calculated from voice data of a call between the customer and the operator;
a step of acquiring an emotion value feature relating to the emotion values of the customer and the operator during the call, calculated from the voice data; and
a step of predicting the cancellation risk of the customer using a model that predicts the cancellation risk of the customer from the speech amount feature and the emotion value feature, together with the acquired speech amount feature and the acquired emotion value feature.

According to the present invention, cancellation risk can be predicted even for a customer who makes contact for the first time and has no behavior history information.

FIG. 1 is a block diagram showing the configuration of an information processing device according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of an information processing device according to a second embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of an emotion feature calculation unit of the information processing device according to the second embodiment of the present invention.
FIG. 4 is a diagram showing the hardware configuration of the information processing device according to the second embodiment of the present invention.
FIG. 5 is a flowchart explaining the processing procedure of the information processing device according to the second embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of an information processing device according to a third embodiment of the present invention.
FIG. 7 is a diagram showing the hardware configuration of the information processing device according to the third embodiment of the present invention.
FIG. 8 is a flowchart explaining the processing procedure of the information processing device according to the third embodiment of the present invention.

Embodiments for carrying out the present invention are described below in detail, by way of example, with reference to the drawings. The configurations, numerical values, processing flows, functional elements, and the like described in the following embodiments are merely examples and may be freely modified or changed; they are not intended to limit the technical scope of the present invention to the following description.

[First Embodiment]
An information processing device 100 as a first embodiment of the present invention is described with reference to FIG. 1. The information processing device 100 predicts the cancellation risk associated with a call based on the speech amount features and the emotions in that call.

As shown in FIG. 1, the information processing device 100 includes a call storage unit 101, a speech amount feature calculation unit 102, an emotion recognition unit 103, and a cancellation prediction unit 104. The call storage unit 101 acquires and stores a call between a customer and an operator. The speech amount feature calculation unit 102 calculates the speech amount features of the stored call. The emotion recognition unit 103 recognizes the emotion in each utterance section. The cancellation prediction unit 104 predicts the cancellation risk associated with the call based on the speech amount features and the recognized emotions.

According to this embodiment, cancellation risk can be predicted even for a customer who makes contact for the first time and has no behavior history information.

[Second Embodiment]
Next, an information processing device according to a second embodiment of the present invention is described with reference to FIGS. 2 to 5. FIG. 2 is a block diagram showing the configuration of the information processing device according to this embodiment.

The information processing device according to this embodiment can extract, from calls at a call center or the like, customers with a high cancellation risk and customers who are likely to continue, so that appropriate follow-up can be carried out selectively for each customer.

The information processing device 200 has a call storage unit 201, a speech amount feature calculation unit 202, an emotion recognition unit 203, an emotion feature calculation unit 204, a customer extraction model storage unit 205, a cancellation prediction unit 206, and an extracted customer output unit 207.

The call storage unit 201 acquires a call between a customer and an operator as call voice data, and stores the acquired call voice data as a file or as stream data.

When storing call voice data, the call storage unit 201 assigns to it a call ID (identifier) that identifies the call. The call storage unit 201 also detects the utterance sections of the call voice data and assigns speaker information, that is, operator information or customer information, to each detected utterance section. In addition, it assigns utterance time information, that is, the utterance start time and the utterance end time, to each detected utterance section. The call voice data is thus stored together with a call ID, operator information, customer information, and utterance time information.
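As an illustration only, the metadata that accompanies the stored call voice data could be held in a record like the following. This is a minimal sketch; the names `UtteranceSection`, `CallRecord`, and `add_section`, and the use of seconds as the time unit, are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UtteranceSection:
    speaker: str   # speaker information: "operator" or "customer"
    start: float   # utterance start time (seconds, assumed unit)
    end: float     # utterance end time (seconds, assumed unit)


@dataclass
class CallRecord:
    call_id: str   # call ID identifying the call
    sections: List[UtteranceSection] = field(default_factory=list)

    def add_section(self, speaker: str, start: float, end: float) -> None:
        # Reject records that could not have come from a two-party call.
        if speaker not in ("operator", "customer"):
            raise ValueError("speaker must be 'operator' or 'customer'")
        if end < start:
            raise ValueError("end time must not precede start time")
        self.sections.append(UtteranceSection(speaker, start, end))


# Hypothetical example values.
record = CallRecord(call_id="call-0001")
record.add_section("operator", 0.0, 4.2)
record.add_section("customer", 4.5, 7.0)
```

Keeping the speaker label and the start and end times on every utterance section is what allows the later units to compute per-speaker statistics without going back to the audio.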

Various methods of detecting utterance sections have been disclosed; for example, the method described in Non-Patent Document 1 may be used.

As for the call ID, in recent call centers it can be obtained from a CTI (Computer Telephony Integration) system. The customer-side signal and the operator-side signal can also be recorded as separate channels. The call ID and the speaker information can therefore be assigned using the call ID and the channel information obtained from the CTI system.

The utterance time information can be assigned by obtaining, for the start and the end of each detected utterance section, either the absolute time or time information defined by the system.

The speech amount feature calculation unit 202 calculates the speech amount features of the operator and the customer from the time information of each utterance section stored in the call storage unit 201. Specifically, it uses at least one of the following: the total duration of the utterance sections of the operator and of the customer in each call; the ratio between the total utterance durations of the operator and the customer in each call; and the mean, variance, median, mode, or another statistically derived value of the utterance section lengths of the operator and of the customer in each call.
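The statistics listed above can be sketched as follows. This is an illustrative example under stated assumptions, not the implementation of unit 202: the function name `speech_amount_features`, the tuple layout of the input, the feature key names, and the choice of population variance are all hypothetical.

```python
from statistics import mean, median, pvariance


def speech_amount_features(sections):
    """sections: list of (speaker, start_sec, end_sec) tuples for one call."""
    features = {}
    totals = {}
    for speaker in ("operator", "customer"):
        lengths = [end - start for who, start, end in sections if who == speaker]
        totals[speaker] = sum(lengths)
        features[f"{speaker}_total"] = totals[speaker]
        # Statistics of the utterance section lengths (0.0 if the speaker was silent).
        features[f"{speaker}_mean"] = mean(lengths) if lengths else 0.0
        features[f"{speaker}_variance"] = pvariance(lengths) if lengths else 0.0
        features[f"{speaker}_median"] = median(lengths) if lengths else 0.0
    # Ratio of total utterance time; undefined (None) if the customer never spoke.
    features["operator_to_customer_ratio"] = (
        totals["operator"] / totals["customer"] if totals["customer"] > 0 else None
    )
    return features


# Hypothetical call: two operator utterances and two customer utterances.
call = [
    ("operator", 0.0, 4.0),
    ("customer", 4.0, 6.0),
    ("operator", 6.0, 8.0),
    ("customer", 8.0, 12.0),
]
features = speech_amount_features(call)
```

In this toy call both speakers talk for 6 seconds in total, so the ratio is 1.0; a ratio well above 1 would indicate that the operator did most of the talking.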

For each utterance section stored in the call storage unit 201, the emotion recognition unit 203 calculates at least one emotion value among valence (the quality of the emotion, that is, the degree to which it is positive or negative), arousal (whether the speaker is aroused or calm, that is, the degree of emotional excitement), sense of conviction, and sense of expectation. Various emotion recognition methods have been disclosed; for example, the method described in Non-Patent Document 2 can be used.

For every emotion, the emotion value is calculated either as a number within a range specified in advance or as one of a finite set of categories. For example, for valence the emotion recognition unit 203 calculates a value in the range -3 to 3, or a category from a set such as {-3, -2, -1, 0, 1, 2, 3}. The same applies to arousal, sense of conviction, and sense of expectation, although the ranges and categories specified for the individual emotions need not be identical.

The emotion feature calculation unit 204 calculates the emotion features of the operator and the customer from the emotion values of the utterance sections calculated by the emotion recognition unit 203. The emotion features include at least one of the following, calculated for the operator and for the customer in each call: the sum of the emotion values, the emotion value ratio, and the mean, variance, median, mode, or another statistically derived value of the emotion values.
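A per-speaker aggregation of this kind can be sketched as below. The sketch assumes integer emotion values on the -3 to 3 scale mentioned as an example in the description; the function name `emotion_features`, the dictionary layout, and the key names are hypothetical, and the emotion value ratio is left out for brevity.

```python
from statistics import mean, median, mode, pvariance


def emotion_features(utterances, emotion="valence"):
    """utterances: list of dicts with a 'speaker' key and one key per emotion."""
    features = {}
    for speaker in ("operator", "customer"):
        values = [u[emotion] for u in utterances if u["speaker"] == speaker]
        if not values:
            continue  # no utterances for this speaker, so no features
        prefix = f"{speaker}_{emotion}"
        features[f"{prefix}_sum"] = sum(values)
        features[f"{prefix}_mean"] = mean(values)
        features[f"{prefix}_variance"] = pvariance(values)
        features[f"{prefix}_median"] = median(values)
        features[f"{prefix}_mode"] = mode(values)
    return features


# Hypothetical emotion values for the utterance sections of one call.
utterances = [
    {"speaker": "operator", "valence": 1, "arousal": 0},
    {"speaker": "customer", "valence": -2, "arousal": 2},
    {"speaker": "operator", "valence": 1, "arousal": 1},
    {"speaker": "customer", "valence": -2, "arousal": 3},
    {"speaker": "customer", "valence": 2, "arousal": 1},
]
valence_features = emotion_features(utterances, "valence")
arousal_features = emotion_features(utterances, "arousal")
```

Calling the function once per emotion keeps the ranges of the individual emotions independent, which matches the remark that they need not share the same scale.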

FIG. 3 is a block diagram showing the configuration of the emotion feature calculation unit of the information processing device according to this embodiment. With the configuration shown in FIG. 3, the emotion feature calculation unit 204 may calculate the emotion features of the operator and the customer using only the emotion values of the utterance sections contained in a specific section of the call voice data. The emotion feature calculation unit 204 shown in FIG. 3 includes a topic section detection unit 241, a narrowing unit 242, and a feature calculation unit 243.

The topic section detection unit 241 detects the topic sections contained in the call voice data stored in the call storage unit 201 and assigns a topic start time and a topic end time to each detected topic section. Examples of topics include opening, closing, product or service explanation, fee explanation, solicitation, needs hearing, personal information confirmation, and administrative procedures.

Various methods of detecting topic sections have been disclosed; for example, speech recognition can be combined with the method described in Non-Patent Document 4. Specifically, a topic model trained in advance on the frequencies with which words appear in various topics can be used to divide the word sequence obtained by speech recognition into a plurality of topic sections.

The topic section detection unit 241 may also detect topic sections based on time information or on the number of utterances. For example, the first 30 seconds or the first 5 utterance sections of a call can be detected as the opening topic section. Likewise, the last 30 seconds or the last 5 utterance sections of a call can be detected as the closing topic section. Furthermore, the final third of the call (for example, the section from 6 to 9 minutes of a 9-minute call) may be detected as a single topic section.
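The time-based rules above can be written down directly. The following is a minimal sketch covering only the time-based variants (not the utterance-count variants); the function name `rule_based_topic_sections`, the `edge_sec` parameter, and the section labels are assumptions made for the example.

```python
def rule_based_topic_sections(call_length_sec, edge_sec=30.0):
    """Return {topic_name: (start_sec, end_sec)} using fixed time rules."""
    # For calls shorter than edge_sec, the opening and closing cover the whole call.
    edge = min(edge_sec, call_length_sec)
    return {
        "opening": (0.0, edge),
        "closing": (call_length_sec - edge, call_length_sec),
        "latter_third": (call_length_sec * 2.0 / 3.0, call_length_sec),
    }


# A 9-minute call, as in the example in the description.
sections = rule_based_topic_sections(9 * 60)
```

For the 9-minute call the latter third runs from 360 to 540 seconds, that is, from 6 to 9 minutes, in line with the example given in the text.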

Of the utterance sections to which the emotion recognition unit 203 has assigned emotion values, the narrowing unit 242 outputs to the feature calculation unit 243 only those contained in a specific topic detected by the topic section detection unit 241. For example, it outputs to the feature calculation unit 243 only the utterance sections contained in the closing topic section, together with their emotion values.

The feature calculation unit 243 calculates the emotion features of the operator and the customer from the emotion values of the utterance sections output by the narrowing unit 242. The emotion features include at least one of the following, calculated for the operator and for the customer within the specific topic section: the sum of the emotion values, the emotion value ratio, and the mean, variance, median, mode, or another statistically derived value of the emotion values. For example, the sum or the mean of the valence or sense-of-expectation values of the customer's utterances in the opening or closing topic section is calculated as an emotion feature.
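The narrowing step followed by the feature calculation can be sketched as follows. This is an illustration only: the rule that an utterance section belongs to a topic section when it lies entirely inside it is an assumption (the description does not say how partial overlaps are handled), and the names `narrow_to_topic` and `customer_mean` are hypothetical.

```python
from statistics import mean


def narrow_to_topic(utterances, topic_start, topic_end):
    """Keep utterance sections lying entirely inside [topic_start, topic_end]."""
    return [
        u for u in utterances
        if u["start"] >= topic_start and u["end"] <= topic_end
    ]


def customer_mean(utterances, emotion):
    """Mean emotion value of the customer's utterances, or None if there are none."""
    values = [u[emotion] for u in utterances if u["speaker"] == "customer"]
    return mean(values) if values else None


# Hypothetical utterance sections with emotion values already assigned.
utterances = [
    {"speaker": "customer", "start": 5.0, "end": 9.0, "valence": -1, "expectation": 0},
    {"speaker": "operator", "start": 515.0, "end": 520.0, "valence": 1, "expectation": 1},
    {"speaker": "customer", "start": 521.0, "end": 528.0, "valence": 2, "expectation": 3},
    {"speaker": "customer", "start": 530.0, "end": 538.0, "valence": 3, "expectation": 1},
]

# Closing topic section of a 540-second call, taken as its last 30 seconds.
closing = narrow_to_topic(utterances, 510.0, 540.0)
closing_valence = customer_mean(closing, "valence")
closing_expectation = customer_mean(closing, "expectation")
```

Only the three utterances after 510 seconds survive the narrowing, so the customer's early negative utterance does not affect the closing features.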

The customer extraction model storage unit 205 stores, as a customer extraction model, a model that predicts customers with a high cancellation risk, using as its features the emotion features of the operator and the customer calculated by the emotion feature calculation unit 204 and the speech amounts of the operator and the customer calculated by the speech amount feature calculation unit 202.

The customer extraction model is trained in advance by a machine learning method as a model that predicts cancellation risk, using ground-truth labels that indicate, for each item of call data, whether or not the contract was cancelled. Various machine learning methods have been disclosed; for example, Non-Patent Document 3 discloses their details together with executable programs. For cancellation prediction, a prediction method corresponding to the method used to train the model can be used.
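The description leaves the learning method open. Purely as an illustration of training on labeled calls and then predicting a risk, the sketch below fits a small logistic regression by batch gradient descent in plain Python. Logistic regression is a stand-in chosen for this example, not the method named in the specification, and the two features, the toy data, and the names `train_logistic` and `predict_risk` are all invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_risk(model, x):
    """Return the predicted cancellation risk (a value between 0 and 1)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy data, one row per call:
# [operator-to-customer talk ratio, mean customer valence in the closing section]
# Label 1 = the contract was later cancelled, 0 = it was not.
X = [[3.0, -2.0], [2.5, -1.0], [2.0, -1.5], [1.0, 2.0], [0.8, 1.0], [1.2, 2.5]]
y = [1, 1, 1, 0, 0, 0]
model = train_logistic(X, y)

risk_negative_call = predict_risk(model, [2.8, -1.5])
risk_positive_call = predict_risk(model, [0.9, 2.0])
```

In practice an existing library such as the one described in Non-Patent Document 3 would normally replace the hand-written training loop; the point here is only the shape of the data: one feature vector and one cancellation label per call.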

The cancellation prediction unit 206 uses the customer extraction model to predict (calculate) the cancellation risk for each call.

The extracted customer output unit 207 compares the predicted cancellation risk of each call with a cancellation risk threshold specified in advance, or with a proportion of high-cancellation-risk customers among the target calls specified in advance, and extracts the customers with a high cancellation risk, that is, the customers of the calls whose cancellation risk is high.
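The two extraction criteria, a risk threshold and a proportion of the target calls, can be sketched as follows. The function names `extract_by_threshold` and `extract_top_fraction`, the call IDs, and the risk values are hypothetical, and treating a risk equal to the threshold as "high" is an assumption.

```python
def extract_by_threshold(risks, threshold):
    """risks: {call_id: predicted risk}. Return call IDs at or above the threshold."""
    return sorted(call_id for call_id, risk in risks.items() if risk >= threshold)


def extract_top_fraction(risks, fraction):
    """Return the call IDs forming the highest-risk `fraction` of the target calls."""
    count = int(len(risks) * fraction)
    ranked = sorted(risks, key=risks.get, reverse=True)
    return ranked[:count]


# Hypothetical predicted cancellation risks for four calls.
risks = {"call-1": 0.91, "call-2": 0.12, "call-3": 0.67, "call-4": 0.35}

high_by_threshold = extract_by_threshold(risks, 0.6)
high_by_fraction = extract_top_fraction(risks, 0.25)
```

A threshold yields a variable number of customers per batch, whereas a fixed proportion always yields the same share of the calls, which may suit a follow-up team with fixed capacity.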

Although this embodiment has been described using the prediction of cancellation as an example, it can be applied in the same way to the prediction of continuation.

FIG. 4 is a block diagram explaining the hardware configuration of the information processing device 200 according to this embodiment. A CPU (Central Processing Unit) 410 is a processor for arithmetic control and implements the functional components of the information processing device 200 of FIG. 2 by executing programs. The CPU 410 may have a plurality of processors and execute different programs, modules, tasks, threads, and the like in parallel. A ROM (Read Only Memory) 420 stores fixed data such as initial data and programs, as well as other programs. A network interface 430 communicates with other devices via a network. The CPU 410 is not limited to a single CPU; there may be a plurality of CPUs, and a GPU (Graphics Processing Unit) for image processing may be included. The network interface 430 desirably has a CPU independent of the CPU 410 and writes transmitted and received data to, or reads it from, an area of a RAM (Random Access Memory) 440. It is also desirable to provide a DMAC (Direct Memory Access Controller) that transfers data between the RAM 440 and a storage 450 (not shown). The CPU 410 recognizes that data has been received in or transferred to the RAM 440 and processes the data. The CPU 410 also prepares processing results in the RAM 440 and leaves the subsequent transmission or transfer to the network interface 430 or the DMAC.

The RAM 440 is a random access memory that the CPU 410 uses as a work area for temporary storage. An area for storing the data needed to implement this embodiment is reserved in the RAM 440. A call ID 441 is data that identifies a call between an operator and a customer and is assigned to each call. A speech amount feature 442 is the speech amount feature calculated for a call between the operator and the customer. An emotion value 443 is the emotion value calculated for each utterance section in each call. An emotion feature 444 is the emotion feature calculated from the emotion values of the utterance sections. A customer extraction model 445 is a model for predicting customers with a high cancellation risk. A cancellation prediction 446 is the cancellation risk predicted for each call using the customer extraction model.

Input/output data 447 is data input and output via an input/output interface 460. Transmitted/received data 448 is data transmitted and received via the network interface 430. The RAM 440 also has an application execution area 449 for executing various application modules.

The storage 450 stores a database, various parameters, and the following data and programs needed to implement this embodiment. The storage 450 stores a speech amount feature calculation module 453, an emotion recognition module 454, an emotion feature calculation module 455, and a cancellation prediction module 457.

The speech amount feature calculation module 453 calculates the speech amount features of a call between an operator and a customer. The emotion recognition module 454 calculates the emotion values of the utterance sections of a call between an operator and a customer. The emotion feature calculation module 455 calculates emotion features from the calculated emotion values of the utterance sections. The cancellation prediction module 457 predicts the cancellation risk associated with a call based on the speech amount features and the recognized emotions. These modules 453 to 457 are read by the CPU 410 into the application execution area 449 of the RAM 440 and executed. A control program 456 is a program for controlling the information processing device 200 as a whole.

The input/output interface 460 interfaces input and output data with input/output devices. A display unit 461 and an operation unit 462 are connected to the input/output interface 460. A storage medium 464 may also be connected to the input/output interface 460. In addition, a speaker 463 serving as an audio output unit, a microphone serving as an audio input unit (not shown), or a GPS position determination unit may be connected. Programs and data relating to the general-purpose functions and other realizable functions of the information processing device 200 are not shown in the RAM 440 and the storage 450 of FIG. 4.

FIG. 5 is a flowchart explaining the processing procedure of the information processing device 200 according to this embodiment. The CPU 410 of FIG. 4 executes this flowchart using the RAM 440, thereby implementing the functional components of the information processing device 200 of FIG. 2.

In step S501, the information processing device 200 acquires a call between an operator and a customer. In step S503, the information processing device 200 detects the utterance sections of the acquired call and assigns speaker information and utterance time information to each detected utterance section. In step S505, the information processing device 200 calculates the speech amount features of the operator and the customer in the call. In step S507, the information processing device 200 calculates the emotion values of the utterance sections. In step S509, the information processing device 200 calculates emotion features from the calculated emotion values of the utterance sections. In step S513, the information processing device 200 uses the customer extraction model to predict the cancellation risk for each call. In step S515, the information processing device 200 extracts, based on the predicted cancellation risk, the customers of the calls whose cancellation risk is high.

According to this embodiment, by using the call data recorded when a customer inquires about or applies for a product or service, cancellation can be predicted even for customers whose behavior history cannot be obtained, such as customers making contact for the first time.

In addition, according to this embodiment, information such as whether the operator or the customer led the conversation is calculated as a speech feature by statistical processing, which also captures information that cannot be turned into rules by hand. Likewise, the emotion features are not limited to humanly interpretable emotion values such as satisfaction or dissatisfaction; they are calculated from the emotion values by statistical processing, which again captures information that cannot be turned into rules by hand. This makes it possible to train a model that estimates cancellation risk with high accuracy and to make predictions with it.

Whether a customer cancels is not necessarily determined by the exchange that takes place when the customer inquires about or applies for a product or service. Some customers, however, tend to cancel when upselling or cross-selling has led them to purchase a product they did not originally intend to buy. The present inventors have found that, in such cases, the conversation at the time of the inquiry or application is effective for predicting cancellation or continuation, and that utterance features and emotion features are effective for this purpose.

Therefore, in this embodiment, the predicted (calculated) risk is compared with a pre-specified cancellation risk threshold or with a pre-specified proportion of high-cancellation-risk customers among the target calls, and the customers judged to have a high cancellation risk are extracted. This embodiment can thereby extract customers whose high cancellation risk stems mainly from how the telephone call was handled.
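
The two extraction criteria mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the predicted risks are taken to be numbers keyed by a call identifier, the function names are hypothetical, and the choices of an inclusive comparison and of rounding the count upward are not specified in the embodiment.

```python
import math


def extract_by_threshold(risks, threshold):
    """Return call IDs whose predicted risk is at or above the threshold."""
    return [call_id for call_id, risk in risks.items() if risk >= threshold]


def extract_top_fraction(risks, fraction):
    """Return the highest-risk call IDs, limited to a fraction of all calls."""
    # Assumption: round the number of extracted calls upward.
    count = math.ceil(len(risks) * fraction)
    ranked = sorted(risks, key=risks.get, reverse=True)
    return ranked[:count]
```

A fixed threshold lets the number of extracted customers vary with the data, whereas a fixed proportion always yields the same share of the target calls; which is preferable depends on how the extracted list is used afterwards.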

The present inventors have also found that whether a customer will cancel in the future depends on the emotions of the operator and the customer in specific topic sections of the call. For example, they found that the cancellation risk is low when the customer's emotion values for positive emotion and expectation are large in the opening and closing topic sections.

Therefore, in this embodiment, the cancellation risk is estimated based on emotion features that focus on the emotions of the operator and the customer in specific topic sections of the call. This embodiment can thereby extract customers with a high cancellation risk with high accuracy.
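
The narrowing of emotion values to specific topic sections can be sketched as follows. This is a minimal illustration, assuming that topic sections have already been detected as named time ranges and that each utterance section carries a single numeric emotion value; the data layout, the function name, and the rule that an utterance must lie entirely inside a topic section are assumptions made for the example.

```python
from statistics import mean, pvariance


def topic_emotion_features(utterances, topic_sections, target_topics, speaker):
    """Summarize one speaker's emotion values inside the target topic sections.

    utterances: list of dicts with "speaker", "start", "end", "emotion" keys.
    topic_sections: list of (topic_name, start, end) tuples.
    """
    # Keep only the time ranges of the topics of interest.
    windows = [(s, e) for name, s, e in topic_sections if name in target_topics]
    # Keep the emotion values of utterances lying fully inside such a range.
    values = [
        u["emotion"]
        for u in utterances
        if u["speaker"] == speaker
        and any(s <= u["start"] and u["end"] <= e for s, e in windows)
    ]
    if not values:
        return {"count": 0, "mean": None, "variance": None}
    return {
        "count": len(values),
        "mean": mean(values),
        "variance": pvariance(values),
    }
```

Calling the function with the target topics set to the opening and closing sections and the speaker set to the customer would give features of the kind discussed above.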

[Third Embodiment]
Next, an information processing apparatus according to a third embodiment of the present invention will be described with reference to FIGS. 6 to 9. FIG. 6 is a block diagram showing the configuration of the information processing apparatus according to this embodiment. The information processing apparatus according to this embodiment differs from that of the second embodiment in that it has a phrase storage unit, a phrase recognition unit, and a phrase feature calculation unit. The other configurations and operations are the same as in the second embodiment, so the same configurations and operations are denoted by the same reference numerals and their detailed description is omitted.

The information processing apparatus 600 has a phrase storage unit 601, a phrase recognition unit 602, and a phrase feature calculation unit 603.

The phrase storage unit 601 stores pre-specified phrases. For at least one of consent, non-consent, pushiness, and concern about fees, the phrase storage unit 601 stores phrases that can be taken as indicating that state. Examples for consent are "I understand" (わかりました) and "that's right" (その通り). Examples for non-consent are "for now" (ひとまず), "for the time being" (とりあえず), and "for the moment" (さしあたって). Examples for pushiness are "let's decide" (決めましょう) and "this is fine, isn't it?" (これでいいですね). Examples for concern about fees are "price" (価格), "fee" (料金), "free shipping" (送料無料), "consumption tax" (消費税), and "wasteful" (もったいない).

The phrase recognition unit 602 performs speech recognition, word spotting, or the like to detect and recognize occurrences of the phrases stored in the phrase storage unit 601.
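
The detection of stored phrases can be illustrated by the following sketch. It assumes that speech recognition has already produced a text transcript for each utterance section, and it substitutes a plain substring search for the word spotting of the embodiment, which would operate on the audio itself. The category keys, the `spot_phrases` function, and the transcript layout are hypothetical; the phrases are the examples given for the phrase storage unit 601.

```python
# Example phrases per category, as stored in the phrase storage unit.
PHRASES = {
    "consent": ["わかりました", "その通り"],
    "non_consent": ["ひとまず", "とりあえず", "さしあたって"],
    "pushiness": ["決めましょう", "これでいいですね"],
    "fee_concern": ["価格", "料金", "送料無料", "消費税", "もったいない"],
}


def spot_phrases(transcript, phrases=PHRASES):
    """Return one (speaker, category, phrase, start_time) tuple per occurrence.

    transcript: list of (start_time, speaker, text) tuples, one per utterance.
    """
    hits = []
    for start, speaker, text in transcript:
        for category, items in phrases.items():
            for phrase in items:
                # str.count counts non-overlapping occurrences in the text.
                for _ in range(text.count(phrase)):
                    hits.append((speaker, category, phrase, start))
    return hits
```

The resulting list of occurrences, each tagged with a speaker and a time, is the kind of recognition result from which occurrence counts can then be derived.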

The phrase feature calculation unit 603 calculates phrase features of the operator and the customer. Using the recognition results (detection results) of the phrase recognition unit 602, the phrase feature calculation unit 603 performs the following calculations for at least one of consent, non-consent, pushiness, and concern about fees. Specifically, it calculates at least one of: the phrase occurrence count, which is the number of occurrences of the phrases stored in the phrase storage unit 601; the normalized phrase occurrence count, which is the phrase occurrence count normalized by the utterance time; and the phrase occurrence count for each pre-specified time interval.
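
The three phrase features can be sketched as follows for a single phrase category and a single speaker. The sketch assumes that the detection times of the phrases are given in seconds; the function name, the parameter names, and the handling of a final partial interval are assumptions made for the example.

```python
import math


def phrase_features(occurrence_times, utterance_time, interval, call_length):
    """Compute the three phrase features for one phrase category and speaker.

    occurrence_times: times (seconds) at which stored phrases were detected.
    utterance_time: total time (seconds) the speaker spent talking.
    interval: length (seconds) of each counting window.
    call_length: total length (seconds) of the call.
    """
    count = len(occurrence_times)
    # Occurrences per second of speech; 0.0 if the speaker never spoke.
    normalized = count / utterance_time if utterance_time > 0 else 0.0
    # One bin per interval, including a final partial interval if any.
    n_bins = max(1, math.ceil(call_length / interval))
    per_interval = [0] * n_bins
    for t in occurrence_times:
        index = min(int(t // interval), n_bins - 1)
        per_interval[index] += 1
    return {
        "count": count,
        "normalized_count": normalized,
        "per_interval_counts": per_interval,
    }
```

Normalizing by utterance time keeps long calls from appearing more phrase-heavy merely because they are long, and the per-interval counts retain where in the call the phrases were concentrated.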

As in the second embodiment, the customer extraction model storage unit 205 generates a model from the emotion features of the operator and the customer calculated by the emotion feature calculation unit 204 and the utterance amount features of the operator and the customer calculated by the utterance amount feature calculation unit 202, with the following features added. That is, the customer extraction model storage unit 205 adds the phrase features of the operator and the customer calculated by the phrase feature calculation unit 603 to the emotion features and the utterance amount features, and generates and stores a customer extraction model that predicts customers with a high cancellation risk. Training (generation) of the customer extraction model and prediction with the trained model are the same as in the second embodiment except that the phrase features are also used.
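
Adding the phrase features to the other two feature groups amounts to forming one input vector per call. The following sketch shows only that assembly step, under the assumption that each feature group is a mapping from a feature name to a number; the function name and the prefixing scheme are hypothetical, and the learning algorithm applied to the resulting vectors is outside the scope of the sketch.

```python
def build_feature_vector(utterance_features, emotion_features, phrase_features):
    """Concatenate the three feature groups into one name-ordered vector."""
    merged = {}
    for prefix, group in (
        ("utterance", utterance_features),
        ("emotion", emotion_features),
        ("phrase", phrase_features),
    ):
        for name, value in group.items():
            # Prefix each name so that equal names in different groups stay distinct.
            merged[f"{prefix}.{name}"] = float(value)
    # Sort the names so every call yields its features in the same order.
    names = sorted(merged)
    return names, [merged[name] for name in names]
```

Keeping a fixed feature order matters because the same layout has to be used both when the model is trained and when it is applied to a new call.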

FIG. 7 is a block diagram illustrating the hardware configuration of the information processing apparatus 600 according to this embodiment. A CPU (Central Processing Unit) 410 is a processor for arithmetic control and implements the functional components of the information processing apparatus 600 of FIG. 6 by executing programs.

The RAM 740 is a random access memory that the CPU 410 uses as a work area for temporary storage. An area for storing the data needed to implement this embodiment is reserved in the RAM 740. The stored phrases 741 are the pre-specified, stored phrases.

The storage 750 stores a database, various parameters, and the following data and programs needed to implement this embodiment. The storage 750 stores phrases 751, a phrase recognition module 752, and a phrase feature calculation module 753. The phrases 751 are the pre-specified phrases. The phrase recognition module 752 detects and recognizes occurrences of the stored phrases by speech recognition or word spotting. The phrase feature calculation module 753 uses the phrase recognition results (detection results) to calculate at least one of the phrase occurrence count, the phrase occurrence count normalized by the utterance time, and the phrase occurrence count for each predetermined time interval. The modules 752 and 753 are read by the CPU 410 into the application execution area 449 of the RAM 740 and executed.

FIG. 8 is a flowchart illustrating the processing procedure of the information processing apparatus 600 according to this embodiment. The CPU 410 of FIG. 7 executes this flowchart using the RAM 740, thereby implementing the functional components of the information processing apparatus 600 of FIG. 6.

In step S801, the information processing apparatus 600 detects and recognizes occurrences of the pre-specified phrases. In step S803, the information processing apparatus 600 calculates features of the recognized phrases. The phrase features are, for example, the phrase occurrence count, the phrase occurrence count normalized by the utterance time, and the phrase occurrence count for each predetermined time interval.

According to this embodiment, phrases related to consent, non-consent, pushiness, and concern about fees are detected, phrase features are calculated, and a model is generated with these phrase features added. Cancellation and continuation can therefore be predicted with high accuracy, taking the content of what was said into account as well.

[Other Embodiments]
Although the present invention has been described above with reference to the embodiments, the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope. A system or apparatus that combines, in any manner, the separate features included in the respective embodiments is also included in the scope of the present invention.

The present invention may be applied to a system composed of a plurality of devices or to a single apparatus. The present invention is also applicable when an information processing program that implements the functions of the embodiments is supplied to a system or apparatus directly or remotely. Accordingly, a program installed on a computer to implement the functions of the present invention on that computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded are also included in the scope of the present invention. In particular, at least a non-transitory computer readable medium storing a program that causes a computer to execute the processing steps included in the embodiments described above is included in the scope of the present invention.

[Other Expressions of the Embodiments]
Some or all of the above embodiments may also be described as in the following appendices, but are not limited to them.
(Appendix 1)
An information processing apparatus comprising:
call storage means for acquiring and storing a call between a customer and an operator;
utterance amount feature calculation means for calculating an utterance amount feature of the stored call;
emotion recognition means for recognizing an emotion in an utterance section of the call; and
cancellation prediction means for predicting a cancellation risk for the call based on the utterance amount feature and the recognized emotion.
(Appendix 2)
The information processing apparatus according to Appendix 1, wherein the emotion recognition means further calculates an emotion value of the recognized emotion,
the information processing apparatus further comprising:
emotion feature calculation means for calculating, from the emotion value, an emotion feature that characterizes the emotion; and
customer extraction model storage means for storing a pre-trained customer extraction model for predicting and extracting customers with a high cancellation risk based on the utterance amount feature and the emotion feature,
wherein the cancellation prediction means predicts the cancellation risk based on the customer extraction model.
(Appendix 3)
The information processing apparatus according to Appendix 2, wherein the emotion feature calculation means further comprises:
topic section detection means for detecting a specific topic section included in the call;
narrowing means for outputting the emotion values of the utterance sections included in the specific topic section; and
feature calculation means for calculating, from the emotion values output by the narrowing means, an emotion feature that characterizes the emotion.
(Appendix 4)
The information processing apparatus according to any one of Appendices 1 to 3, further comprising:
phrase storage means for storing pre-specified phrases;
phrase recognition means for detecting and recognizing occurrences in the call of the phrases stored in the phrase storage means; and
phrase feature calculation means for calculating a phrase feature that characterizes the phrases that occurred,
wherein the cancellation prediction means predicts the cancellation risk further based on the phrase feature.
(Appendix 5)
The information processing apparatus according to any one of Appendices 1 to 4, wherein the call storage means detects utterance sections of the call, attaches speaker information and utterance time information to each detected utterance section, and stores the utterances.
(Appendix 6)
The information processing apparatus according to any one of Appendices 1 to 5, wherein the emotion includes at least one of valence, arousal, a sense of consent, and a sense of expectation.
(Appendix 7)
The information processing apparatus according to any one of Appendices 3 to 5, wherein
the topic section includes at least one of the opening, the closing, product or service explanation, fee explanation, solicitation, hearing, personal information confirmation, and administrative procedures of the call, and
the emotion includes at least one of valence and a sense of expectation.
(Appendix 8)
The information processing apparatus according to any one of Appendices 2 to 7, wherein the emotion feature includes at least one of a ratio, an average, a variance, a median, and a mode of the emotion values.
(Appendix 9)
The information processing apparatus according to any one of Appendices 4 to 7, wherein the phrase feature includes at least one of the phrase occurrence count, which is the number of occurrences of the phrases, the normalized phrase occurrence count, which is the phrase occurrence count normalized by the utterance time, and the phrase occurrence count for each pre-specified time interval.
(Appendix 10)
An information processing method comprising:
a call storage step of acquiring and storing a call between a customer and an operator;
an utterance amount feature calculation step of calculating an utterance amount feature of the stored call;
an emotion recognition step of recognizing an emotion in an utterance section of the call; and
a cancellation prediction step of predicting a cancellation risk for the call based on the utterance amount feature and the recognized emotion.
(Appendix 11)
An information processing program that causes a computer to execute:
a call storage step of acquiring and storing a call between a customer and an operator;
an utterance amount feature calculation step of calculating an utterance amount feature of the stored call;
an emotion recognition step of recognizing an emotion in an utterance section of the call; and
a cancellation prediction step of predicting a cancellation risk for the call based on the utterance amount feature and the recognized emotion.

Claims

1. An information processing apparatus comprising:
means for acquiring an utterance amount feature relating to the amounts of utterance of a customer and an operator, calculated based on voice data of a call between the customer and the operator;
means for acquiring an emotion value feature relating to the emotion values of the customer and the operator during the call, calculated based on the voice data; and
means for predicting the cancellation risk of the customer using a model that predicts a customer's cancellation risk based on the utterance amount feature and the emotion value feature, the acquired utterance amount feature, and the acquired emotion value feature.

2. The information processing apparatus according to claim 1, wherein
in the voice data, speaker information indicating whether the speaker is the customer or the operator, and utterance time information including the start time and the end time of the utterance, are attached to the voice data of each detected utterance section,
the means for acquiring the utterance amount feature acquires the utterance amount feature calculated using the speaker information and the utterance time information, and
the means for acquiring the emotion value feature acquires the emotion value feature calculated using the emotion value of each of the utterance sections.

3. The information processing apparatus according to claim 1, wherein the utterance amount feature includes at least one of:
the total utterance time of the customer and the operator;
the ratio of the total utterance time of the customer to the total utterance time of the operator;
the average length of the customer's utterance sections and the average length of the operator's utterance sections;
the variance of the length of the customer's utterance sections and the variance of the length of the operator's utterance sections;
the median length of the customer's utterance sections and the median length of the operator's utterance sections; and
the mode of the length of the customer's utterance sections and the mode of the length of the operator's utterance sections.

4. The information processing apparatus according to any one of claims 1 to 3, wherein the emotion value includes at least one of valence (emotional valence, indicating the quality of the emotion), arousal (the degree of arousal, indicating arousal or calmness), a sense of consent, and a sense of expectation.

5. The information processing apparatus according to any one of claims 1 to 4, wherein the emotion value feature includes at least one of a ratio, an average, a variance, a median, and a mode of the emotion values.

6. The information processing apparatus according to claim 2, wherein the means for acquiring the emotion value feature acquires the emotion value feature calculated based on the voice data of a specific topic section of the call.

7. The information processing apparatus according to claim 6, wherein the specific topic section includes at least one of an opening topic section, a closing topic section, a product or service explanation topic section, a fee explanation topic section, a solicitation topic section, a hearing topic section, a personal information confirmation topic section, and an administrative procedure topic section.

8. The information processing apparatus according to any one of claims 1 to 7, further comprising means for acquiring phrase features of the customer and the operator during the call, calculated based on the voice data, wherein
the means for predicting predicts the cancellation risk of the customer using a model that predicts a customer's cancellation risk based on the utterance amount feature, the emotion value feature, and the phrase feature, the acquired utterance amount feature, the acquired emotion value feature, and the acquired phrase features, and
the phrase features are calculated by recognizing occurrences of pre-specified phrases during the call.

9. An information processing method comprising:
a step of acquiring an utterance amount feature relating to the amounts of utterance of a customer and an operator, calculated based on voice data of a call between the customer and the operator;
a step of acquiring an emotion value feature relating to the emotion values of the customer and the operator during the call, calculated based on the voice data; and
a step of predicting the cancellation risk of the customer using a model that predicts a customer's cancellation risk based on the utterance amount feature and the emotion value feature, the acquired utterance amount feature, and the acquired emotion value feature.

10. An information processing program that causes a computer to execute:
a step of acquiring an utterance amount feature relating to the amounts of utterance of a customer and an operator, calculated based on voice data of a call between the customer and the operator;
a step of acquiring an emotion value feature relating to the emotion values of the customer and the operator during the call, calculated based on the voice data; and
a step of predicting the cancellation risk of the customer using a model that predicts a customer's cancellation risk based on the utterance amount feature and the emotion value feature, the acquired utterance amount feature, and the acquired emotion value feature.