JP2009175336A

JP2009175336A - Database system of call center, and its information management method and information management program

Info

Publication number: JP2009175336A
Application number: JP2008012658A
Authority: JP
Inventors: Hisaya Kinugasa; 尚也衣笠; Yoshihiko Yajima; 芳彦矢島; Terukazu Uchida; 輝和内田; Jun Ishikawa; 潤石川
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2008-01-23
Filing date: 2008-01-23
Publication date: 2009-08-06

Abstract

PROBLEM TO BE SOLVED: To provide a call center database system capable of multifaceted data analysis, together with its information management method and information management program. SOLUTION: The call center system 1 includes: a speech recognition device 11 that performs speech recognition on the call audio when an operator talks with a customer and creates text data by transcribing the call content; an emotion recognition device 14 that detects feature values of the speech in a recognition target section of the call audio, creates an emotion detection value estimating the speaker's emotion from those feature values, and associates the emotion detection value of the recognition target section with the text data corresponding to that section; and a D/B management device 12 in which the text data and the emotion detection value are stored in association with each other. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a call center database system, an information management method for that system, and an information management program.

A call center is a department that specializes in telephone response work such as handling customer inquiries about products and taking product orders. A call center system generally includes a private branch exchange (PBX), an automatic call distribution (ACD) device, terminals used by operators, and various databases (see, for example, Patent Document 1). Conventionally, an operator enters the content of each telephone response into a terminal, and the response history information entered in this way is accumulated in a database. The call center's management department has analyzed this response history information and used it for product improvement, service improvement, and the like.
Patent Document 1: JP 2007-228273 A

However, with a conventional database it has been difficult to analyze, for example, customer satisfaction or operator handling from multiple angles. Moreover, even when an operator enters keywords or the like from a call into the terminal so that the call content can be analyzed, the operator's subjectivity enters the data, which makes it difficult to evaluate the call content objectively.

The present invention has been made in view of the above problems, and its object is to provide a call center database system capable of multifaceted data analysis, an information management method for that system, and an information management program.

The present invention comprises: speech recognition means for performing speech recognition on the call audio produced when a call center operator and a customer talk, and generating character data that transcribes the call content; emotion detection means for detecting a feature amount of the audio in a target section of the call audio and generating emotion detection information that estimates the speaker's emotion on the basis of the feature amount; registration means for associating the emotion detection information of the target section with the character data corresponding to that target section; and call information storage means for storing the character data and the emotion detection information in association with each other.

According to this invention, the character data generated by speech recognition of the call between the operator and the customer and the emotion detection information obtained from the feature amount of the call audio are stored in the call information storage means in association with each other. The speaker's emotion detection information and the character data of the target section in which that information was detected can therefore both be stored in the database, which enables multifaceted data analysis and increases the freedom of data analysis.

In this call center database system, the speech recognition means performs speech recognition on at least the customer's call audio and the operator's call audio, and generates character data based on the customer's utterances and character data based on the operator's utterances; the emotion detection means detects the emotion detection information of a target section of the customer's call audio and the emotion detection information of a target section of the operator's call audio; and the registration means associates, on the basis of the customer's call audio, the customer's emotion detection information detected in a target section with the character data corresponding to that target section, and associates, on the basis of the operator's call audio, the operator's emotion detection information detected in a target section with the character data corresponding to that target section.

According to this invention, the emotion detection information of the customer and of the operator can each be associated with the character data based on that speaker's utterances, so data can be analyzed from both the customer side and the operator side. This enables multifaceted data analysis.

In this call center database system, the speech recognition means generates the character data by performing speech recognition on combined call audio in which the customer's call audio and the operator's call audio are mixed; the emotion detection means generates the emotion detection information on the basis of the combined call audio; and the registration means associates the emotion detection information of a target section with the character data corresponding to that target section.

According to this invention, emotion detection information for the call as a whole can be associated with the character data based on both parties' call audio, so the call can be analyzed as a whole.

In this call center database system, the emotion detection means detects a plurality of pieces of emotion detection information between the start and the end of a call, and the registration means associates each piece of emotion detection information with the target section of the character data in which it was detected.

According to this invention, a plurality of pieces of emotion detection information are each associated with a target section of the character data. Because emotion detection information can be associated in units of words, phrases, sentences, or the like, the freedom of data analysis is increased; for example, words, phrases, or sentences spoken with strong emotion can be extracted.

This call center database system further comprises response history registration means for storing response history data, which indicates the content of the response to the customer, in the call information storage means in association with the emotion detection data and the character data.

According to this invention, the response history data is stored in association with the emotion detection data and the character data, so data can be analyzed, for example, for each product about which an inquiry was received.
The present invention is also an information management method for a database using control means that manages the database, the method comprising the steps in which the control means: acquires character data that transcribes the call content by performing speech recognition on the call audio produced when a call center operator and a customer talk; detects a feature amount of the audio in a target section of the call audio and acquires emotion detection information that estimates the speaker's emotion on the basis of the feature amount; and stores the emotion detection information of the target section in call information storage means in association with the character data corresponding to that target section.

According to this method, the character data generated by speech recognition of the call between the operator and the customer and the emotion detection information obtained from the feature amount of the call audio are stored in the call information storage means in association with each other. The speaker's emotion detection information and the character data of the target section in which that information was detected can therefore both be stored in the database, which enables multifaceted data analysis and increases the freedom of data analysis.

The present invention is also an information management program for a database using control means that manages the database, the program causing the control means to function as: character data acquisition means for acquiring character data that transcribes the call content by performing speech recognition on the call audio produced when a call center operator and a customer talk; detection data acquisition means for detecting a feature amount of the audio in a target section of the call audio and generating emotion detection information that estimates the speaker's emotion on the basis of the feature amount; and registration means for associating the emotion detection information of the target section with the character data corresponding to that target section and storing them in call information storage means.

According to this invention, in accordance with the information management program, the character data generated by speech recognition of the call between the operator and the customer and the emotion detection information obtained from the feature amount of the call audio are stored in the call information storage means in association with each other. The speaker's emotion detection information and the character data of the target section in which that information was detected can therefore both be stored in the database, which enables multifaceted data analysis and increases the freedom of data analysis.

An embodiment of the present invention is described below with reference to FIGS. 1 to 6. FIG. 1 is a schematic diagram of a call center system 1 serving as the database system.
The call center system 1 includes a PBX device 5, operator terminals 6, a center management device 7, and an administrator terminal 8. The PBX device 5, the center management device 7, and the terminals 6 and 8 are connected via a LAN (Local Area Network) 9 so that they can exchange various data.

The PBX device 5 is connected to a customer's telephone (hereinafter, customer telephone 2) via a public telephone network N1. The PBX device 5 is also connected to each telephone 3 used by an operator. The operation unit of each telephone 3 has a login button B1, a hold button B2, and a preparing button B3. The login button B1 is a button for enabling the telephone 3 to receive incoming calls. The hold button B2 is a button for placing the connected call on hold, and the preparing button B3 is turned on while the operator is preparing to take calls.

The operator terminal 6 includes a CPU, RAM, ROM, a communication I/F, and the like, and a plurality of operator terminals 6 are connected to the LAN 9. An input device I such as a mouse and keyboard and a display DI are connected to each operator terminal 6. The operator operates the input device I while talking with the customer through a headset (not shown) that outputs the audio from the telephone 3 and picks up the operator's speech.

The center management device 7 is a server that has a CTI (Computer Telephony Integration) function for connecting the telephones 3 to the call center system 1, and that processes incoming calls to the call center and outgoing calls from it. Specifically, the center management device 7 has an ACD function and an IVR (Interactive Voice Response) function, and controls the PBX device 5. The ACD function automatically distributes calls arriving at the call center to waiting operators. The IVR function is an automated voice response function.

The center management device 7 has a customer information storage unit (not shown). The customer information stored there concerns customers who have called the call center in the past or customers registered in advance. On the basis of customer identification data such as a name entered by the operator, or customer identification data entered via the customer telephone 2, the center management device 7 reads the corresponding customer information and displays it on the display DI of the operator terminal 6.

The center management device 7 also includes the history information storage unit 20 shown in FIG. 2. The history information storage unit 20 stores history data 21 for each incoming call. The history data 21 is generated by the center management device 7 for each incoming call and records the history of the response. In this embodiment, the history data 21 has an incoming call ID 21A, an incoming call date 21B, a call start time 21C, a call end time 21D, an operator ID 21E, a product group 21F, a product model number 21G, a customer ID 21H, and a hold history 21J. The structure of the history data 21 is not limited to this one; other structures may be used.
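As an illustration only, the record layout of the history data 21 could be sketched in Python as follows. The field types, the `HoldEvent` helper, and the sample values are assumptions made for this sketch; the specification names the data elements but does not define their formats.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class HoldEvent:
    """One hold period within a call (hypothetical shape for hold history 21J)."""
    start: datetime
    end: datetime


@dataclass
class HistoryRecord:
    """One entry of history data 21, created per incoming call."""
    call_id: int            # incoming call ID 21A
    received_at: datetime   # incoming call date 21B (date and time)
    call_start: datetime    # call start time 21C
    call_end: datetime      # call end time 21D
    operator_id: str        # operator ID 21E
    product_group: str      # product group 21F
    product_model: str      # product model number 21G
    customer_id: str        # customer ID 21H
    holds: List[HoldEvent] = field(default_factory=list)  # hold history 21J

    def hold_count(self) -> int:
        """Number of times the call was placed on hold."""
        return len(self.holds)

    def total_hold(self) -> timedelta:
        """Total time spent on hold during the call."""
        return sum((h.end - h.start for h in self.holds), timedelta())


# Made-up sample values, for illustration only.
record = HistoryRecord(
    call_id=1001,
    received_at=datetime(2008, 1, 23, 10, 0, 0),
    call_start=datetime(2008, 1, 23, 10, 0, 5),
    call_end=datetime(2008, 1, 23, 10, 12, 40),
    operator_id="OP-07",
    product_group="inkjet printer",
    product_model="MODEL-X",
    customer_id="C-0042",
    holds=[HoldEvent(datetime(2008, 1, 23, 10, 3, 0),
                     datetime(2008, 1, 23, 10, 4, 30))],
)
```

Keeping the incoming call ID as a plain key makes it straightforward to join such a record with other per-call data that carries the same identifier.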

The incoming call ID 21A is a number assigned by the center management device 7 to each incoming call, and the incoming call date 21B indicates the date and time at which the call center received the customer's call. The call start time 21C and the call end time 21D indicate the times at which the operator started and ended the call, respectively. The operator ID 21E is the identification number of the operator who took the call.

The product group 21F is the category of the product about which the inquiry was made. For example, when the call center handles inquiries about electrical products such as printers, projectors, and scanners, the product group 21F stores a product category such as "inkjet printer" or "laser printer", or a purpose-based category such as "personal" or "business". The product model number 21G indicates the model number of the product about which the inquiry was made. When the call center takes product orders, the product group 21F and the product model number 21G store the group and model number of the ordered product. When the call center places a call to a customer, identifiers of the service or product concerned in that call are stored.

The customer ID 21H is the identifier of the customer who was served, and corresponds to the customer information described above. The hold history 21J is data such as the number of holds and the hold time from the start to the end of each hold.
The administrator terminal 8 is a terminal used by a call center supervisor or the like, and has a CPU, RAM, ROM, a communication I/F, and the like (not shown). As shown in FIG. 1, an input device I such as a mouse or keyboard is connected to the administrator terminal 8. In response to the supervisor's input operations, the administrator terminal 8 extracts the designated information from the history information storage unit 20 and other databases and displays the extracted data on the display DI as a browsing screen.

Further, as shown in FIG. 1, the call center system 1 includes: a voice data registration device 10; a speech recognition device 11 serving as the speech recognition means; a database (D/B) management device 12 serving as the call information storage means, control means, character data acquisition means, and detection data acquisition means; a recording device 13; and an emotion recognition device 14 serving as the emotion detection means and registration means. The devices 10 to 14 are connected to the LAN 9 so that data for the processing described later can be exchanged not only among the devices 10 to 14 but also with the PBX device 5, the center management device 7, and so on. In this embodiment the devices 10 to 14 are provided separately in order to distribute the processing, but several of the devices 10 to 14 may be integrated into one device, or all of the devices 10 to 14 may be combined into a single device.

As shown in FIG. 3, the voice data registration device 10 includes a control unit 30, an overall voice storage unit 32, a customer voice storage unit 33, and an operator voice storage unit 34. The control unit 30 includes a CPU, RAM, ROM, a communication I/F, and the like, and acquires an audio signal containing only the customer's utterances from the PBX device 5, the telephone 3 used by the operator, or the like. It also acquires an audio signal containing only the operator's utterances from a voice input unit IV (see FIG. 1) such as the telephone 3 or the microphone of the headset. In addition, it acquires from the PBX device 5, the telephone 3, the voice input unit IV, or the like an audio signal of the combined call audio in which the customer's utterances and the operator's utterances are mixed.

When the control unit 30 has acquired the audio signal containing only the customer's utterances, the audio signal containing only the operator's utterances, and the audio signal in which both are mixed, it A/D-converts the audio signals into digital data in WAV or a similar format. The conversion to voice data may be performed in real time or as batch processing. Alternatively, the customer-only audio, the operator-only audio, and the combined audio may each be recorded by the recording device 13 or the center management device 7 (see FIG. 1), A/D-converted by the recording device 13, and then acquired as voice data from the recording device 13.

When the voice data has been generated, the control unit 30 attaches to it an identifier generated for each incoming call. The identifier serves as a primary key that uniquely distinguishes a database record from other records; it uses a data element of the history data 21, such as the incoming call ID 21A, so that the voice data can be matched with the history data 21. The control unit 30 also attaches to each piece of voice data a speaker code indicating whether the data comes from the customer, the operator, or both. The voice data carrying the identifier and the speaker code is then stored in the corresponding one of the storage units 32 to 34.

As a result, the overall voice storage unit 32 stores overall voice data 32A containing the utterances of both the customer and the operator, and the customer voice storage unit 33 stores customer voice data 33A containing only the customer's utterances. The operator voice storage unit 34 stores operator voice data 34A.
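The tagging and routing of voice data described above might be sketched as follows. This is a minimal in-memory illustration; the `Speaker` code values, the `VoiceData` class, and the `register` function are names invented for the sketch, and the specification does not state how the storage units are implemented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Speaker(Enum):
    """Speaker code: whose speech the audio contains (values are assumptions)."""
    BOTH = "both"
    CUSTOMER = "customer"
    OPERATOR = "operator"


@dataclass
class VoiceData:
    call_id: int      # identifier shared with history data 21 (e.g. call ID 21A)
    speaker: Speaker  # speaker code
    pcm: bytes        # digitized audio, e.g. the payload of a WAV file


# One list per storage unit: 32 (overall), 33 (customer), 34 (operator).
stores: Dict[Speaker, List[VoiceData]] = {s: [] for s in Speaker}


def register(call_id: int, speaker: Speaker, pcm: bytes) -> VoiceData:
    """Attach the identifier and speaker code, then store in the matching unit."""
    data = VoiceData(call_id, speaker, pcm)
    stores[speaker].append(data)
    return data


# Three recordings of the same call, one per speaker code (dummy audio bytes).
register(1001, Speaker.BOTH, b"\x00\x01")
register(1001, Speaker.CUSTOMER, b"\x00")
register(1001, Speaker.OPERATOR, b"\x01")
```

Because all three recordings carry the same call identifier, any of them can later be matched with the history data for that call.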

As shown in FIG. 4, the speech recognition device 11 includes a speech recognition processing unit 41, a recognition information storage unit 42, a both-party utterance information storage unit 43, a customer utterance information storage unit 44, and an operator utterance information storage unit 45.

The speech recognition processing unit 41 includes a CPU, RAM, ROM, a communication I/F, and the like, and performs speech recognition processing in accordance with a speech recognition program stored in a storage unit (not shown). The speech recognition processing may be real-time processing performed sequentially, for example during a call, or batch processing performed collectively on a plurality of pieces of voice data. The recognition information storage unit 42 stores recognition information 42A used for the speech recognition processing. The recognition information 42A includes, for example, an acoustic model that associates speech feature amounts with phonemes, a recognition dictionary that stores tens of thousands to hundreds of thousands of words associated with phoneme strings, and a language model that models the probability of a word appearing at the beginning or end of a sentence, the connection probability between consecutive words, and dependency relations.

In accordance with the speech recognition program, the speech recognition processing unit 41 converts each of the voice data 32A to 34A into character data using the recognition information 42A. A known method can be used for this processing. For example, the speech recognition processing unit 41 calculates features of the speech waveform from the voice data 32A to 34A and selects phonemes by matching these feature amounts against the acoustic model. It then matches the resulting phoneme strings against the recognition dictionary to select candidate words. Further, the speech recognition processing unit 41 uses the language model to calculate the probability of the connections between words and judges their consistency. When the recognition result is finalized, the text is stored in association with, for example, the time at which it was spoken.

As shown in FIG. 5, the speech recognition processing unit 41 attaches an identifier C1 and a speaker code C2 to the text TXT that transcribes the call content, in the same way as for the voice data, and thereby generates text data TX serving as the character data. It then stores the generated text data TX in the storage units 43 to 45.

That is, the character data obtained by speech recognition of the overall voice data 32A, which contains the utterances of both the customer and the operator, is stored in the both-party utterance information storage unit 43 as both-party text data TX1. In the both-party text data TX1, what the customer said and what the operator said are transcribed in mixed form. The character data obtained by speech recognition of the customer voice data 33A, which contains only the customer's utterances, is stored in the customer utterance information storage unit 44 as customer text data TX2. The character data obtained by speech recognition of the operator voice data 34A is stored in the operator utterance information storage unit 45 as operator text data TX3. Where the both-party text data TX1, the customer text data TX2, and the operator text data TX3 need not be distinguished, they are referred to simply as text data TX.

The emotion recognition device 14 has a CPU, RAM, ROM, a communication I/F, and the like, and calculates emotion detection values from the feature amounts of the call audio in accordance with an emotion recognition program stored in a storage unit (not shown). The emotion recognition device 14 performs emotion recognition based on the call audio in which the customer's and operator's voices are mixed, emotion recognition based on the customer's call audio, and emotion recognition based on the operator's call audio. The emotion recognition processing may be real-time processing performed sequentially, for example at the end of each call, or batch processing performed collectively on a plurality of pieces of voice data. In this emotion recognition processing, a value indicating the strength of the emotion is determined for each type of emotion, such as "anger", "joy", and "sadness".

A known method can be used for emotion recognition. For example, the emotion recognition device 14 acquires the voice intensity from the voice data 32A to 34A (or the audio signals) of the customer, the operator, or both.

The emotion recognition device 14 also acquires the recognition target sections for emotion recognition. A recognition target section may be delimited by the words, phrases, or sentences of the text data TX, or may be a section of fixed length, such as every 5 seconds. Alternatively, a region of the waveform of the voice data 32A to 34A that contains a predetermined pattern may be used as a recognition target section. The device then detects the voice intensity in each recognition target section and calculates the intonation of the speech from the pattern of intensity changes.
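The fixed-length option for recognition target sections, together with a per-section intensity, can be sketched as below. The specification does not say how intensity is measured; root-mean-square amplitude is used here purely as an assumption, and the function names are hypothetical. The intonation calculation is not shown.

```python
import math
from typing import List, Tuple


def fixed_sections(duration_s: float, width_s: float = 5.0) -> List[Tuple[float, float]]:
    """Cut [0, duration_s) into consecutive sections of width_s seconds."""
    sections = []
    start = 0.0
    while start < duration_s:
        sections.append((start, min(start + width_s, duration_s)))
        start += width_s
    return sections


def rms(samples: List[float]) -> float:
    """Root-mean-square amplitude, used here as the 'intensity' of a section."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def section_intensities(samples: List[float], rate_hz: int,
                        width_s: float = 5.0) -> List[float]:
    """RMS intensity of each fixed-width section of a mono sample sequence."""
    duration = len(samples) / rate_hz
    result = []
    for start, end in fixed_sections(duration, width_s):
        lo, hi = int(start * rate_hz), int(end * rate_hz)
        result.append(rms(samples[lo:hi]))
    return result
```

Word-, phrase-, or sentence-based sections would replace `fixed_sections` with boundaries taken from the timed text data, while the per-section measurement could stay the same.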

Furthermore, the device acquires the phoneme data calculated by the speech recognition processing unit 41 or the like and takes the number of phonemes as the tempo. Other physical feature quantities of the voice, such as scale, pitch, melody, and frequency, may also be detected.

The emotion recognition device 14 also stores in advance a voice characteristic pattern for an angry state, a voice characteristic pattern for a joyful state, a voice characteristic pattern for a sad state, and so on. The emotion recognition device 14 compares the pattern derived from each voice signal with the stored patterns and specifies the current strength of the emotion. For example, the values "-2", "-1", "0", "+1", and "+2" are defined for each of the emotions "anger", "joy", and "sadness", and the emotion recognition device 14 specifies one of these values as the strength of that emotion.

As a result, an emotion detection value V1 for the customer's utterances, an emotion detection value V2 for the operator's utterances, and a both-party emotion detection value V3 for the whole conversation, in which the utterances of the customer and the operator are mixed, are obtained. When the emotion detection values V1 and V2 and the both-party emotion detection value V3 need not be distinguished from one another, they are referred to simply as the emotion detection value V.

Having generated the emotion detection value V, the emotion recognition device 14 uses the emotion detection value V and the recognition target section on which emotion recognition was performed to generate the emotion detection data D shown schematically in FIG. 5. Like the text data TX, the emotion detection data D is given an identifier C1. The emotion detection data D is also given a speaker code C2 indicating which speaker the emotion belongs to, and an emotion type C3 indicating which type of emotion, such as "anger", "sadness", or "joy", the data describes. Each emotion detection value V contained in the emotion detection data D is associated with the recognition target section S in which that value was detected.
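As an illustration of the record just described, the emotion detection data D could be modeled as below. This is a minimal sketch, not the layout of FIG. 5: the class names, the field names, and the representation of a recognition target section S as a (start, end) pair in seconds are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ALLOWED_VALUES = (-2, -1, 0, 1, 2)  # the five strength levels given as an example


@dataclass
class SectionValue:
    section: Tuple[float, float]  # recognition target section S (assumed: start, end in seconds)
    value: int                    # emotion detection value V for that section


@dataclass
class EmotionDetectionData:
    identifier: str    # C1, shared with the text data TX of the same call
    speaker_code: str  # C2, e.g. customer, operator, or both
    emotion_type: str  # C3, e.g. "anger", "joy", "sadness"
    values: List[SectionValue] = field(default_factory=list)

    def add(self, start: float, end: float, value: int) -> None:
        """Attach one emotion detection value V to the section where it was detected."""
        if value not in ALLOWED_VALUES:
            raise ValueError(f"unsupported emotion detection value: {value}")
        self.values.append(SectionValue((start, end), value))
```

One such record would exist per combination of call, speaker, and emotion type.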

Therefore, as shown in FIG. 5, the emotion detection data D and the text data TX can be associated with each other by the identifier C1 and the speaker code C2. In addition, because the characters contained in the text TXT of the text data TX correspond to the recognition target section S of each emotion detection value V1, the emotion detection value V for "anger", "joy", "sadness", and so on can be associated with character data such as a word, phrase, or sentence. For example, if the "anger" value is "+2" for the character data of a customer who said "an error has occurred", it can be inferred that the customer feels strong anger about the error. If a different customer says "an error has occurred" and the "anger" value is "0", it can be inferred that this customer is not particularly angry about the error. Furthermore, the time-series transition of the emotion detection value V from the start to the end of the call can be examined to determine how the customer's emotion changed.
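The association described above can be sketched as a lookup keyed on the identifier C1 and the speaker code C2. The dictionary layout (keys `c1`, `c2`, `c3`, `txt`, `values`) and the use of an integer section index to stand for the recognition target section S are assumptions made for the example.

```python
def phrases_with_strong_emotion(text_records, emotion_records, emotion_type, threshold):
    """Return (C1, phrase, value) for sections whose value for emotion_type meets the threshold."""
    # Index the text data TX by (C1, C2) so each emotion record finds its text.
    text_by_key = {(t["c1"], t["c2"]): t for t in text_records}
    hits = []
    for d in emotion_records:
        if d["c3"] != emotion_type:
            continue
        text = text_by_key.get((d["c1"], d["c2"]))
        if text is None:
            continue  # no text data for this call and speaker
        for section, value in d["values"].items():
            if value >= threshold and section in text["txt"]:
                hits.append((d["c1"], text["txt"][section], value))
    return hits


# Two customers say the same sentence with different "anger" values.
texts = [
    {"c1": "call-001", "c2": "customer", "txt": {0: "an error has occurred"}},
    {"c1": "call-002", "c2": "customer", "txt": {0: "an error has occurred"}},
]
emotions = [
    {"c1": "call-001", "c2": "customer", "c3": "anger", "values": {0: 2}},
    {"c1": "call-002", "c2": "customer", "c3": "anger", "values": {0: 0}},
]
strong = phrases_with_strong_emotion(texts, emotions, "anger", 2)
```

With this sample data only the first call is returned, which mirrors the two-customer example in the text.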

The emotion recognition device 14 also specifies an emotion detection value for the call as a whole, taking the period from the start of the call to its end as the target section for emotion recognition. In doing so, the emotion recognition device 14 specifies the emotion detection value as described above, based on the feature quantities of the voice over the entire call. As a result, for each type of emotion such as "anger", "joy", and "sadness", an overall emotion detection value VT for all of the customer's utterances, an overall emotion detection value VT for all of the operator's utterances, and an overall emotion detection value VT for all utterances of both parties are generated. Having generated the overall emotion detection values VT, the emotion recognition device 14 assigns the identifier C1, the speaker code C2, and the emotion type C3 to each overall emotion detection value VT, as shown in FIG. 6, and generates overall emotion detection data T for the customer, overall emotion detection data T for the operator, and overall emotion detection data T for both parties.

The D/B management device 12 includes a CPU, a RAM, a ROM, a communication I/F, and the like. In accordance with an information management program stored in a storage unit (not shown), it acquires various kinds of information from the center management device 7 and the emotion recognition device 14 at predetermined timings and builds a database. The D/B management device 12 acquires the history data 21 from the center management device 7, and acquires the emotion detection data D and the overall emotion detection data T from the emotion recognition device 14. It then integrates these data and adds them to the database as a new record.

FIG. 6 is a schematic diagram of the database structure. The database 100 stores the text data TX, the history data 21, the emotion detection data D, and the overall emotion detection data T in association with one another by the identifier C1. The voice data 32A to 34A of the voice data registration device 10 are also associated by the identifier C1. The voice data 32A to 34A may be stored in a distributed manner in the voice data registration device 10, or may be stored in the D/B management device 12.
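One way to picture a database in which everything is tied together by the identifier C1 is the relational sketch below, using Python's built-in `sqlite3` module. The table and column names are hypothetical, and the patent does not state that a relational database is used; FIG. 6 remains the authoritative description.

```python
import sqlite3

# Hypothetical schema: every table carries the identifier C1 of the call.
SCHEMA = """
CREATE TABLE history (c1 TEXT PRIMARY KEY, operator_id TEXT, product_model TEXT);
CREATE TABLE text_data (c1 TEXT, c2 TEXT, section INTEGER, txt TEXT);
CREATE TABLE emotion_detection (c1 TEXT, c2 TEXT, c3 TEXT, section INTEGER, v INTEGER);
CREATE TABLE overall_emotion (c1 TEXT, c2 TEXT, c3 TEXT, vt INTEGER);
"""


def open_database():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def records_for_call(conn, c1):
    """Return (speaker code, text, emotion type, value) rows for one call."""
    query = """
        SELECT t.c2, t.txt, e.c3, e.v
        FROM text_data AS t
        JOIN emotion_detection AS e
          ON e.c1 = t.c1 AND e.c2 = t.c2 AND e.section = t.section
        WHERE t.c1 = ?
        ORDER BY t.section, e.c3
    """
    return conn.execute(query, (c1,)).fetchall()
```

Voice data could be referenced from the same key, for example by storing a file location per C1, whether the audio itself lives in the voice data registration device 10 or in the D/B management device 12.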

As described above, the emotion detection value V is associated with the character data of the text TXT of the text data TX, so the speaker's emotion at the time of uttering a word, phrase, or sentence can be inferred.

The call center supervisor operates the administrator terminal 8 and analyzes data using the database 100. Because the emotion detection value V, the overall emotion detection value VT, and the character data are added as database parameters, and because morphological analysis and emotion recognition are performed on the customer's voice, the operator's voice, and the voices of both, the data can be analyzed from multiple angles.

For example, using the history data 21, the emotion detection data D, and the text data TX, data whose emotion detection value V indicates an emotion strength at or above a predetermined value can be extracted per product, per predetermined period, and so on, and the number of extracted items or the emotion detection values V can be compared between products or between periods. The character data associated with the extracted emotion detection data D can then be extracted from the text data TX. For example, extracting the character data for which "anger" or "joy" is "+2" makes it possible to evaluate objectively what the customer felt strong "anger" or "joy" about. Generating the text data TX by speech recognition in this way also reduces the variation in keywords compared with having the operator enter keywords that summarize the call.
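The per-product extraction described above might look like the following sketch. It assumes the history data and the emotion detection data are available as lists of dictionaries with the hypothetical keys `c1`, `product_model`, `c3`, and `v`; none of these names come from the patent.

```python
from collections import Counter


def count_strong_by_product(history, emotion_rows, emotion_type, threshold):
    """Count sections per product whose value for emotion_type is at or above threshold."""
    # Map the identifier C1 of each call to the product recorded in the history data.
    product_of = {h["c1"]: h["product_model"] for h in history}
    counts = Counter()
    for row in emotion_rows:
        if row["c3"] != emotion_type or row["v"] < threshold:
            continue
        product = product_of.get(row["c1"])
        if product is not None:
            counts[product] += 1
    return counts


history = [
    {"c1": "call-001", "product_model": "model-A"},
    {"c1": "call-002", "product_model": "model-A"},
    {"c1": "call-003", "product_model": "model-B"},
]
emotion_rows = [
    {"c1": "call-001", "c3": "anger", "v": 2},
    {"c1": "call-002", "c3": "anger", "v": 2},
    {"c1": "call-003", "c3": "anger", "v": 0},
    {"c1": "call-003", "c3": "joy", "v": 2},
]
anger_counts = count_strong_by_product(history, emotion_rows, "anger", 2)
```

Grouping by period instead of product would only change the key taken from the history data.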

As another example, even when multiple items of text data TX containing the word "paper jam" have been accumulated for different products, customer satisfaction with each product can be evaluated objectively from the magnitude of the emotion detection value V1 for "paper jam", and that evaluation can be fed back into product development. This spares the operator the effort of entering information about the customer's emotions and removes the subjectivity of a person judging those emotions.

In addition, among the emotion detection data D whose speaker code C2 identifies the operator, it can be determined whether any data has an emotion detection value V at or above a predetermined value. If such data is found, examining the character data associated with that emotion detection value V makes it possible to determine objectively at what point the stress on the operator increased. Changes in the emotions of the customer and the operator can also be measured from the transition of the emotion detection value V over the entire call, so that know-how for lowering the "anger" value and raising the "joy" value can be accumulated and reflected in telephone response work. Furthermore, the emotion detection data D or the overall emotion detection data T associated with history data 21 having the same operator ID 21E can be extracted and applied to a predetermined calculation formula per day, per month, or per some other period, to evaluate the operator for each period.
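The per-period operator evaluation can be sketched as below. The "predetermined calculation formula" is not given in the description, so a plain average of the overall emotion detection value VT per operator and month is substituted purely for illustration. The dictionary keys and the ISO date format are also assumptions.

```python
from collections import defaultdict
from statistics import mean


def monthly_operator_scores(history, overall_rows, speaker_code, emotion_type):
    """Average VT per (operator ID, month) for one speaker code and emotion type.

    Assumption: the average stands in for the unspecified calculation formula.
    """
    # Map C1 to (operator ID, "YYYY-MM"); dates are assumed to be "YYYY-MM-DD" strings.
    info = {h["c1"]: (h["operator_id"], h["date"][:7]) for h in history}
    buckets = defaultdict(list)
    for row in overall_rows:
        if row["c2"] != speaker_code or row["c3"] != emotion_type:
            continue
        key = info.get(row["c1"])
        if key is not None:
            buckets[key].append(row["vt"])
    return {key: mean(values) for key, values in buckets.items()}
```

Calling it with the customer's speaker code and "anger" would, for instance, give a monthly picture of how angry customers were on calls handled by each operator.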

According to the above embodiment, the following effects can be obtained.
(1) In the above embodiment, the call center system 1 includes the speech recognition device 11, which performs speech recognition on the voice of a call between a call center operator and a customer and generates text data TX in which the call content is converted into text. It also includes the emotion recognition device 14, which detects the emotion detection value V and the overall emotion detection value VT from the feature quantities of the call voice and associates each emotion detection value V with the recognition target section S of the text data TX in which that value was detected. It further includes the D/B management device 12, which stores the text data TX, the emotion detection value V, and the overall emotion detection value VT in association with one another. Storing in the database both the emotion detection value V, from which the speaker's emotion can be inferred, and the character data of the recognition target section S in which it was detected therefore enables data analysis from multiple angles, such as customer satisfaction, operator stress assessment, and operator evaluation.

(2) In the above embodiment, the speech recognition device 11 performs speech recognition on the customer's call voice and on the operator's call voice, and generates customer text data TX2 from the customer's utterances and operator text data TX3 from the operator's utterances. Based on each call voice, the emotion recognition device 14 detects the emotion detection value V1 and the overall emotion detection value VT of the customer's call voice, and the emotion detection value V2 and the overall emotion detection value VT of the operator's call voice. The emotion recognition device 14 also associates the emotion detection values V1 and V2 of the customer and the operator with the sections of the text data TX that correspond to the recognition target sections S in which those values were detected, and associates each overall emotion detection value VT with the text data TX. Data can therefore be analyzed from both the customer side and the operator side, which increases the flexibility of the analysis.

(3) In the above embodiment, both-party text data TX1 is generated by performing speech recognition on the both-party call voice, in which the customer's call voice and the operator's call voice are mixed. The emotion recognition device 14 detects the emotion detection value V and the overall emotion detection value VT from the both-party call voice, and associates each emotion detection value V with the recognition target section S of the both-party text data TX1 in which that value was detected. It also associates the overall emotion detection value VT with the both-party text data TX1. Using this data for overall evaluations therefore reduces the amount of data to be handled and lightens processing such as searches.

(4) In the above embodiment, the D/B management device 12 stores the history data 21 in association with the text data TX and the emotion detection data D. This allows searches based on, for example, the product group 21F, the product model number 21G, the operator ID 21E, or the hold history 21J, and data analysis from the viewpoint of the product group or product, the operator, or the hold history. Data can therefore be analyzed from multiple angles.

(5) In the above embodiment, the emotion recognition device 14 detects the emotion detection value V for each of a plurality of recognition target sections S set between the start and the end of the call, and associates each emotion detection value V with the recognition target section S in which it was detected. Because the emotion detection value V can thus be associated in units of words, phrases, sentences, and the like, the flexibility of data analysis is increased; for example, the words, phrases, or sentences whose emotion detection value V is at or above, or at or below, a predetermined value can be extracted.

The present embodiment may be modified as follows.
・In the above embodiment, the emotion recognition device 14 associates the emotion detection value V with the character data, but the D/B management device 12 may perform this association instead. In this case, the emotion recognition device 14 detects the emotion detection value V for each predetermined section, and the D/B management device 12 collates the predetermined sections with the recognition target sections S and associates the emotion detection values V with the character data.

・The history data 21 may include response content data that the operator entered at the operator terminal 6 as a summary of the call.
・In the above embodiment, the emotion detection value V is associated with the text TXT, but voice feature quantities such as frequency, tempo, and intensity may be associated instead. Even in this case, the customer's emotional state can be inferred from the feature quantities, so data analysis using text and emotion as parameters remains possible.
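The first modification above, in which the D/B management device 12 collates predetermined sections with the recognition target sections S, can be sketched as an interval-matching step. The rule used here, giving each recognition target section the value of the predetermined section that overlaps it most, is an assumption; the description only says that the sections are collated. Times are in seconds and the function name is hypothetical.

```python
def collate(sections, intervals):
    """Attach an emotion detection value to each recognition target section.

    sections:  list of (start, end, text) for recognition target sections S
    intervals: list of (start, end, value) for the predetermined sections
    Returns a list of (text, value); value is None when nothing overlaps.
    """
    result = []
    for s_start, s_end, text in sections:
        best_value, best_overlap = None, 0.0
        for i_start, i_end, value in intervals:
            # Length of the overlap between the two time ranges (negative if disjoint).
            overlap = min(s_end, i_end) - max(s_start, i_start)
            if overlap > best_overlap:
                best_value, best_overlap = value, overlap
        result.append((text, best_value))
    return result
```

Other rules, such as averaging the values of all overlapping predetermined sections, would fit the same interface.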

FIG. 1 is a schematic diagram of the call center system of the present embodiment.
FIG. 2 is a schematic diagram of the history information storage unit.
FIG. 3 is a block diagram of the voice data registration device.
FIG. 4 is a block diagram of the speech recognition device.
FIG. 5 is a schematic diagram of the text data and the emotion detection data.
FIG. 6 is a schematic diagram of the database.

Explanation of symbols

1: call center system as the database system; 7: center management device as the response history registration means; 11: speech recognition device as the voice recognition means; 12: D/B management device as the call information storage means, control means, character data acquisition means, and detection data acquisition means; 15: emotion recognition device as the emotion detection means and registration means; 20: history information storage unit; 21: history data as the response history data; 100: database; D: emotion detection data; S: recognition target section; TX: text data as the character data; V: emotion detection value as the emotion detection information; VT: overall emotion detection value as the emotion detection information.

Claims

Voice recognition means for performing voice recognition on a call voice when a call center operator and a customer make a call, and generating character data in which the content of the call is converted into text;
An emotion detecting means for detecting a feature amount of a voice in a target section of the call voice and generating emotion detection information inferring a speaker's emotion based on the feature amount;
Registration means for associating the emotion detection information of the target section with the character data corresponding to the target section;
A call center database system comprising: call information storage means for storing the character data and the emotion detection information in association with each other.

The call center database system according to claim 1,
The voice recognition means
Performing voice recognition on at least the customer's call voice and the operator's call voice to generate character data based on the customer's utterance and the character data based on the utterance of the operator;
The emotion detection means includes
Detecting the emotion detection information of the target section of the call voice of the customer and the emotion detection information of the target section of the call voice of the operator;
The registration means includes
Based on the call voice of the customer, the character data corresponding to the target section is associated with the emotion detection information of the customer detected in the target section, and corresponds to the target section based on the call voice of the operator. A call center database system, wherein the character data is associated with the emotion detection information of the operator detected in the target section.

In the call center database system according to claim 1 or 2,
The voice recognition means
Generating the character data by performing voice recognition on both call voices in which the customer call voice and the operator call voice are mixed,
The emotion detection means includes
Generating the emotion detection information based on the voice of both calls;
The registration means includes
A call center database system, wherein the emotion detection information of the target section is associated with the character data corresponding to the target section.

In the call center database system according to any one of claims 1 to 3,
The emotion detection means detects a plurality of the emotion detection information from the start to the end of a call,
The call center database system, wherein the registration means associates each emotion detection information with the target section of the character data from which the emotion detection information is detected.

In the call center database system according to any one of claims 1 to 4,
A call center database system further comprising: a response history registration unit that stores response history data indicating a response content with a customer in the call information storage unit in association with the feature amount detection data and the character data.

An information management method for the database using a control means for managing the database,
The control means is
Performing voice recognition on a call voice when a call center operator and a customer make a call, and obtaining character data in which the call contents are converted into text;
Detecting a feature amount of speech in a target section of the call speech and obtaining emotion detection information inferring a speaker's emotion based on the feature amount;
And storing the emotion detection information of the target section in association with the character data corresponding to the target section in a call information storage unit.

An information management program for the database using a control means for managing the database,
The program causing the control means to function as:
Character data acquisition means for performing voice recognition on a call voice when a call is made between a call center operator and a customer, and acquiring character data in which the call contents are converted into text;
Detection data acquisition means for detecting a feature amount of speech in a target section of the call speech and generating emotion detection information inferring a speaker's emotion based on the feature amount; and
Registration means for associating the emotion detection information of the target section with the character data corresponding to the target section and storing them in a call information storage means.