JP7164372B2

JP7164372B2 - Speech recognition system and speech recognition method

Info

Publication number: JP7164372B2
Application number: JP2018177526A
Authority: JP
Inventors: 興平呰上; 隆司杉山
Original assignee: Hitachi Information and Telecommunication Engineering Ltd
Current assignee: Hitachi Information and Telecommunication Engineering Ltd
Priority date: 2018-09-21
Filing date: 2018-09-21
Publication date: 2022-11-01
Anticipated expiration: 2038-09-21
Also published as: JP2020046634A

Description

The present invention relates to a speech recognition system and a speech recognition method.

In call centers and offices, calls between customers and operators are routinely recorded, both to prepare for later disputes and to review the content. Once the recorded data is converted into text by speech recognition, it can be searched, displayed, or printed by a computer system and used more effectively as business data.

In call centers, speech recognition is mainly used in two ways: improving response quality based on recognition results from the operator's utterances, and improving customer service based on recognition results from the customer's utterances.

Meanwhile, the turnover rate of operators engaged in call center work tends to be high, and reducing it leads to lower operating costs. One reason operators leave is the load they receive from calls with customers when answering the telephone.

Patent Document 1, for example, discloses a technique for grasping the load state of an operator. In Patent Document 1, the operator's load state is grasped based on the operator's response time to customer inquiries, interviews with the operator, and the like.

Patent Document 1: JP 2010-239353 A

In Patent Document 1, the operator's manager grasps the operator's load state through interviews with the operator and the like, guided by the operator's response time. With this method, it is difficult to grasp the operator's latent load state. This is because Patent Document 1 grasps the load state by focusing on the operator's utterances rather than on the customer's utterances.

An object of the present invention is to grasp the latent load state of an operator in a speech recognition system by focusing on the content of the customer's utterances.

A speech recognition system according to one aspect of the present invention comprises: a call recording device that, from the content of a call between a customer and an operator, separately records the customer's utterances and the operator's utterances; a speech recognition device that performs speech recognition on the customer's utterances and the operator's utterances recorded by the call recording device; a recognition result management device that evaluates the load state the operator received from the customer, based on the speech recognition result of the customer's utterances recognized by the speech recognition device, and obtains an evaluation result; and an administrator terminal that displays the evaluation result of the load state obtained by the recognition result management device on a load state screen.

A speech recognition method according to one aspect of the present invention comprises: separately recording, from the content of calls between customers and a plurality of operators, the customer's utterances and the operator's utterances; performing speech recognition on the recorded customer's utterances and operator's utterances; evaluating, for each operator, the load state the operator received from the customer based on the speech recognition result of the customer's utterances, and obtaining an evaluation result; and displaying the evaluation result of the load state for each operator on a load state screen.

According to one aspect of the present invention, a speech recognition system can grasp the latent load state of an operator by focusing on the content of the customer's utterances.

FIG. 1 is an overall configuration diagram of the call center system.
FIG. 2 is a diagram showing the configuration of the load evaluation value table (T-1).
FIG. 3 is a diagram showing the configuration of the load word table (T-2).
FIG. 4 is a diagram showing the configuration of the load state table (T-3).
FIG. 5 is a diagram showing the specifications of the reference values (evaluation values) for "volume", "speech rate", and "words" defined in the load evaluation value table (T-1).
FIG. 6 is a diagram showing an example in which evaluation values for "words", "volume", and "speech rate" are set for each operator.
FIG. 7 is a diagram showing setting examples of load words.
FIG. 8 is a diagram showing a calculation example of the load state for two calls handled by operator A.
FIG. 9 is a diagram showing an example of the load state screen.
FIG. 10 is a diagram showing an example of the load state screen.
FIG. 11 is a diagram showing an example of the load state screen.
FIG. 12 is a configuration diagram of the speech recognition system.
FIG. 13 is a diagram for explaining the operation of the speech recognition system.
FIG. 14 is a flowchart for explaining the evaluation value selection method (5-1).
FIG. 15 is the load analysis flow for the item "volume".
FIG. 16 is the load analysis flow for the item "speech rate".
FIG. 17 is the load analysis flow for the item "words".

Embodiments are described below with reference to the drawings.

First, the call center system is described with reference to FIG. 1.
A speech recognition system in a call center generally performs speech recognition by associating CTI (Computer Telephony Integration) information, such as the called number, with a speech recognition engine (dictionary). The CTI information is information from which the language can be identified. Here, CTI is a general term for technologies that link telephones and computers. In call centers and similar settings, it is used to look up customer information in a database from the customer's telephone number and to perform automatic dialing and automatic transfer.

As shown in FIG. 1, the call center system is configured by connecting, via a network 100, an IP-PBX (Internet Protocol-Private Branch eXchange: a private branch exchange supporting IP lines) device 101, a CTI device 102, a voice call processing system 103, an operator terminal 104, and an administrator terminal 105.

The IP-PBX device 101 receives calls from the call terminal 107 of a customer 106 and performs protocol conversion between the IP network and the public network 108, call control for incoming and outgoing calls, and the like. The CTI device 102 acquires call information (the called number and the like) from the IP-PBX device 101 and transmits it to the voice call processing system 103.

The operator terminal 104 is a PC terminal that an operator 109 uses for operator work. The operator 109 talks with the call terminal 107 of the customer 106 over the public network 108 using a call terminal (IP telephone) 110. The administrator terminal 105 displays, on a load state screen, the evaluation result of the load state obtained by the recognition result management device 115.

The IP-PBX device 101, to which the call terminal 107 of the customer 106 connects via the public network 108, connects to the call terminal (IP telephone) 110 via the network 100 to establish a call. The operator 109 can perform telephone operations from the call terminal (IP telephone) 110; when a call arrives from the customer 106, the customer 106 and the operator 109 enter a talking state.

The voice call processing system 103 includes a call recording information management device 111, a call recording device 112, a speech recognition control device 113, a speech recognition device 114, and a recognition result management device 115.

The call recording information management device 111 has a function of accumulating CTI information and recording information.
The call recording device 112 has a function of acquiring and recording mirrored call audio and a function of transmitting it to the call recording information management device 111.

The call recording device 112 records, as recorded data, the data stream of a call exchanged by the call terminal 107 via the IP-PBX device 101. A call at the call terminal 107 is sent to the call recording device 112 and saved as a recording file.

Specifically, the operator 109 uses the call terminal (IP telephone) 110 to talk with the customer 106 on the public network 108 via the IP-PBX device 101. The voice of the operator 109 (RTP packets) is transmitted from the call terminal (IP telephone) 110 to the public network 108. The voice of the customer 106 (RTP packets) is transmitted from the public network 108 to the call terminal (IP telephone) 110. An RTP packet whose source is the IP telephone 110 is treated as an utterance of the operator 109, and an RTP packet whose source is the public network 108 is treated as an utterance of the customer 106. The call recording device 112 thereby records the voice of the customer 106 and the voice of the operator 109 separately and creates an audio file containing only each voice: the voice of the customer 106 is recorded in a customer audio file, and the voice of the operator 109 is recorded in an operator audio file.
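The source-based separation of the two speakers can be illustrated with a minimal Python sketch. The `RtpPacket` type, the `split_streams` function, and the address `192.0.2.10` are hypothetical names and values chosen for illustration; the description does not specify how the call recording device 112 represents packets or identifies the IP telephone 110.

```python
from dataclasses import dataclass

# Hypothetical address of the operator's IP telephone 110 (illustrative only).
OPERATOR_PHONE_ADDR = "192.0.2.10"


@dataclass
class RtpPacket:
    source_addr: str  # address the packet was sent from
    payload: bytes    # encoded audio carried by the packet


def split_streams(packets, operator_addr=OPERATOR_PHONE_ADDR):
    """Return (customer_audio, operator_audio) as two separate byte strings."""
    customer = bytearray()
    operator = bytearray()
    for pkt in packets:
        if pkt.source_addr == operator_addr:
            operator += pkt.payload  # sent by the IP telephone: operator speech
        else:
            customer += pkt.payload  # arrived from the public network side: customer speech
    return bytes(customer), bytes(operator)
```

In this sketch each returned byte string stands in for one of the two audio files (customer and operator), so that later recognition can run on the customer's side alone.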

The call recording device 112 acquires the mirrored call audio, records it, and transmits it to the speech recognition device 114. The call recording information management device 111 is a server that manages call information and recording information in association with each other. The speech recognition device 114 has a speech recognition engine that executes speech recognition; it converts the recorded data into text data with that engine and transmits the recognition result to the recognition result management device 115.

The speech recognition control device 113 has a function of instructing the speech recognition device 114 to execute recognition. The recognition result management device 115 stores the text data output by the speech recognition device 114 in a database, accumulates the speech recognition results, and outputs them.

The recognition result management device 115 has a load evaluation value table (T-1) (see FIG. 2) that defines evaluation values for quantifying the load state, a load word table (T-2) (see FIG. 3) that specifies words that place a load on the operator 109, and a load state table (T-3) (see FIG. 4) that stores the results of evaluating the operator's load state from linguistic and non-linguistic information.

The administrator sets the values in the load evaluation value table (T-1) and the load word table (T-2). Specifically, the recognition result management device 115 provides an evaluation value setting screen and displays it on the administrator terminal 105, and the administrator sets the evaluation values through that screen.

The configuration of the load evaluation value table (T-1) is described with reference to FIG. 2.
As shown in FIG. 2, the load evaluation value table (T-1) holds, for each operator, the reference values (evaluation values) used in load calculation for the "volume", "speech rate", and "words" extracted from the customer's utterances.

For example, as shown in FIG. 2, operator A is assigned "volume": 40, "speech rate": "d", and "words": 10. Operator B is assigned "volume": 50, "speech rate": 50, and "words": "d". Operator C is assigned "volume": "d", "speech rate": 20, and "words": "d". The "default" row is assigned "volume": 30, "speech rate": 40, and "words": 5. Here, "d" means that the value set for "default" is used. For example, since operator C has "d" set for "volume", the "default" value of 30 is used as operator C's "volume" evaluation value.

Next, the specifications of the reference values (evaluation values) for "volume", "speech rate", and "words" defined in the load evaluation value table (T-1) are described with reference to FIG. 5.
As shown in FIG. 5, volume is the loudness of the customer's voice, in units of "dB". Speech rate is the number of syllables the customer utters per second, in units of "word/sec". Words is the number of times load words appear in the customer's utterances, in units of occurrences. From the load evaluation viewpoint, the larger each of these three values is, the greater the load.

An example in which evaluation values for words, volume, and speech rate are set for each operator 109 is described with reference to FIG. 6.
As shown in FIG. 6, operator A is assigned "volume": 40, "speech rate": 2, and "words": 10. Operator B is assigned "volume": "d", "speech rate": "d", and "words": 5. Operator C is assigned "volume": "d", "speech rate": 20, and "words": "d". The operator "default" is assigned "volume": 30, "speech rate": 10, and "words": 2.

For each parameter in FIG. 6, an item set to "d" uses the value set for the operator "default". For example, operator B's volume is set to "d", so the value 30 is taken from "default" as the evaluation value. Likewise, operator B's speech rate is set to "d", so the value 10 is taken from "default" as the evaluation value.

The configuration of the load word table (T-2) is described with reference to FIG. 3.
As shown in FIG. 3, the load word table (T-2) holds load words, that is, words that place a load on the operator. For example, "AAAA", "BBBB", and "ZZZZ" are set as load words.

Setting examples of load words are described with reference to FIG. 7.
As shown in FIG. 7, load words may be set to, for example, "Get me your supervisor" (No. 1) and "Put someone else on" (No. 2).

The configuration of the load state table (T-3) is described with reference to FIG. 4.
As shown in FIG. 4, the load state table (T-3) defines, for each of operators A, B, and C, a call ID, an outside line calling number, a call end time, and evaluation results. The evaluation results cover volume, speech rate, and words, and each is expressed as either "load" or "normal".

For example, for operator A's call ID "0x3001", the evaluation result is "load" for "volume", "load" for "speech rate", and "load" for "words". For operator B's call ID "0x3007", the evaluation result is "normal" for "volume", "normal" for "speech rate", and "normal" for "words". For operator C's call ID "0x3010", the evaluation result is "load" for "volume", "load" for "speech rate", and "normal" for "words".
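One way to picture a row of the load state table (T-3) is the following Python sketch. The class name `LoadStateRecord`, its field names, and the `load_items` helper are hypothetical; the outside line calling number and call end time are left as empty placeholders because the description does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadStateRecord:
    operator: str
    call_id: str
    outside_number: str  # outside line calling number (placeholder here)
    call_end_time: str   # call end time (placeholder here)
    volume: str          # "load" or "normal"
    speech_rate: str     # "load" or "normal"
    words: str           # "load" or "normal"

    def load_items(self):
        """Names of the items evaluated as "load" for this call."""
        items = ("volume", "speech_rate", "words")
        return [name for name in items if getattr(self, name) == "load"]


# The three rows described for FIG. 4; unspecified columns are left empty.
TABLE_T3 = [
    LoadStateRecord("A", "0x3001", "", "", "load", "load", "load"),
    LoadStateRecord("B", "0x3007", "", "", "normal", "normal", "normal"),
    LoadStateRecord("C", "0x3010", "", "", "load", "load", "normal"),
]
```

Keeping the three items as separate fields, rather than one combined score, mirrors the table as described: each item is judged on its own.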

Next, a method of obtaining the operator's load state and providing it to the administrator terminal 105 is described.

First, the administrator sets values for each item in the load evaluation value table (T-1) and in the load word table (T-2). The speech recognition device 114 performs speech recognition when a call between the customer 106 and the operator 109 is completed.

The recognition result management device 115 evaluates the operator's load state from the recognition result for the customer 106 obtained from the speech recognition device 114 and from the contents of the load evaluation value table (T-1) and the load word table (T-2), and stores the evaluation result in the load state table (T-3). Here, the speech recognition device 114 recognizes the voice of the customer 106 based on the customer audio file stored in the call recording device 112.

FIG. 8 shows a calculation example of the load state for two calls handled by operator A.
As shown in FIG. 8, for operator A's call ID "0x3001", the "call time" is 300 seconds, and the evaluation result is "load" for "volume", "load" for "speech rate", and "normal" for "words". For operator A's call ID "0x3002", the "call time" is 120 seconds, and the evaluation result is "load" for "volume", "load" for "speech rate", and "load" for "words".

The recognition result management device 115 aggregates the results for each operator and each calling customer in order of call start time, provides the aggregated results to the administrator terminal 105, and displays them on the load state reference screen (administrator PC screen). Through this screen, the administrator judges the load state of the operator 109 from the content evaluated by the recognition result management device 115.

In this way, the embodiment focuses on the customer's side of the call and calculates the load on the operator 109 from the speech recognition result of the customer's utterances, thereby providing the administrator with the load state of the operator 109.

Conventional speech recognition systems focused on the operator's utterances in order to improve the operator's responses. In contrast, the embodiment shifts the focus to the customer's utterances and uses the speech recognition result not to improve the operator's responses but to evaluate, through speech recognition and analysis, the factors that place a load on the operator. The speech recognition system of the embodiment thus grasps the operator's latent load state by focusing on the content of the customer's utterances.

The elements of the recognition result that impose load are the words contained in the "linguistic information" and the volume and speech rate of the utterances contained in the "non-linguistic information". Word evaluation assesses the number of occurrences, in the customer's utterances, of words that place a load on the operator. Volume evaluation assesses the volume of the customer's speech. Speech rate evaluation assesses the speed of the customer's speech. An evaluation value is set for each item, and the load state is judged from the magnitude relationship between the information obtained from the utterances and the evaluation value. The evaluated load state of the operator is then provided to the administrator as an aggregated result.
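The word evaluation can be sketched in Python as counting how often the registered load words occur in the recognized customer text and comparing that count with the evaluation value. The function names are hypothetical, the load words are the placeholder strings "AAAA", "BBBB", and "ZZZZ" from the load word table, and the strict greater-than comparison is an assumption, since this passage only says the judgment rests on the magnitude relationship.

```python
# Placeholder load words taken from the load word table (T-2) example.
LOAD_WORDS = ["AAAA", "BBBB", "ZZZZ"]


def count_load_words(customer_text, load_words=LOAD_WORDS):
    """Total number of (non-overlapping) load word occurrences in the text."""
    return sum(customer_text.count(word) for word in load_words)


def evaluate_words(customer_text, word_evaluation_value, load_words=LOAD_WORDS):
    """Return "load" when the count exceeds the evaluation value (assumed rule)."""
    count = count_load_words(customer_text, load_words)
    return "load" if count > word_evaluation_value else "normal"
```

For instance, a transcript containing "AAAA" twice and "BBBB" once yields a count of 3, which this sketch judges as "load" against an evaluation value of 2 and as "normal" against an evaluation value of 5.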

Next, the load state reference screen (administrator PC screen) displayed on the administrator terminal 105 is described. The administrator judges the load state of the operator 109 from the administrator PC screen.
As shown in FIG. 9, the load state screen displays load analysis results for the operator's calls. Based on the contents of the load state table (T-3), the screen shows the administrator the load status of each call.

As shown in FIG. 10, the load state screen also displays load analysis results for the operator's calls. When the administrator specifies a week, the screen shows, from the load state table (T-3), the daily ratio (number of calls that imposed load / total number of the operator's calls) for that week as a percentage.

As shown in FIG. 11, the load state screen also displays load analysis results for the operator's calls. When the administrator specifies a month, the screen shows, from the load state table (T-3), the weekly ratio (number of calls that imposed load / total number of the operator's calls) for that month as a percentage.
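The percentage shown on these screens can be sketched as a simple aggregation in Python. The description does not define which calls count as "calls that imposed load"; the sketch below assumes a call counts if any of its three items was evaluated as "load". The function names and the input shape (a list of date and evaluation pairs) are also hypothetical.

```python
from collections import defaultdict
from datetime import date


def is_load_call(evaluation):
    """Assumed rule: a call imposes load if any item is evaluated as "load"."""
    return "load" in evaluation.values()


def daily_load_percent(calls):
    """Map each day to 100 * (load calls / all calls) for one operator.

    calls: iterable of (day, evaluation) pairs, where evaluation is a dict
    such as {"volume": "load", "speech_rate": "normal", "words": "normal"}.
    """
    totals = defaultdict(int)
    loads = defaultdict(int)
    for day, evaluation in calls:
        totals[day] += 1
        if is_load_call(evaluation):
            loads[day] += 1
    return {day: 100.0 * loads[day] / totals[day] for day in sorted(totals)}
```

The weekly view for a specified month follows the same pattern with the grouping key changed from the day to the week.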

Next, the operation of the speech recognition system in the call center, from receiving a call to calculating the load, is described with reference to the system configuration diagram of FIG. 12 and the flowchart of FIG. 13.

As an example, the following describes the case where speech recognition is performed after the call between the operator 109 and the customer 106 ends (steps (1) to (4)), and the load that the operator 109 received from the utterances of the customer 106 is then calculated (step (5)).

(1) The call recording information management device 111 receives CTI information.
(2) The call recording device 112 transmits recording information to the call recording information management device 111.
(3) The call recording information management device 111 transmits the CTI information and the recording information to the speech recognition control device 113.
(4) The speech recognition control device 113 transmits the CTI information and the recording information to the speech recognition device 114 and requests it to execute speech recognition. The speech recognition device 114 performs speech recognition and transmits the recognition result to the speech recognition control device 113. The speech recognition control device 113 transmits the recognition result to the recognition result management device 115.
(5) The recognition result management device 115 analyzes the load the operator received from the customer, using the recognition result received from the speech recognition control device 113 together with the load evaluation value table (T-1) and the load word table (T-2), and accumulates the analysis result in the load state table (T-3) as the evaluation result. As described later, the load analysis proceeds in the order of evaluation value selection (5-1), load analysis (5-2), and storage of the analysis result (5-3). The recognition result management device 115 displays the recognition results on the administrator terminal 105, and the administrator views them.

Next, the evaluation value selection method (5-1) is described with reference to FIG. 14.
First, the evaluation values in the load evaluation value table (T-1) are acquired (S141).
Next, it is determined whether the operator name in the call information exists in the load evaluation value table (T-1) (S142).

If it exists, the "volume", "speech rate", and "words" values of the record corresponding to the operator in the load evaluation value table (T-1) are acquired (S143).

If it does not exist, the values in the record with operator name "default" in the load evaluation value table (T-1) are used as the evaluation values (S144).

Next, it is determined whether any item in the operator's record in the load evaluation value table (T-1) has the value "d" (S145).
If so, each item whose value is "d" takes as its evaluation value the value in the record with operator name "default" in the load evaluation value table (T-1) (S146).
If not, the values in the operator's record in the load evaluation value table (T-1) are used as the evaluation values (S147).
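Steps S141 to S147 can be sketched in Python as a lookup with two levels of fallback: an operator missing from the table uses the "default" record, and any item marked "d" in the operator's record is filled from the "default" record. The table values are those given for FIG. 2; the dictionary layout, key names, and the function name `select_evaluation_values` are hypothetical.

```python
DEFAULT = "default"
USE_DEFAULT = "d"  # marker meaning "use the value from the default record"

# Load evaluation value table (T-1) with the values described for FIG. 2.
TABLE_T1 = {
    "A": {"volume": 40, "speech_rate": "d", "words": 10},
    "B": {"volume": 50, "speech_rate": 50, "words": "d"},
    "C": {"volume": "d", "speech_rate": 20, "words": "d"},
    DEFAULT: {"volume": 30, "speech_rate": 40, "words": 5},
}


def select_evaluation_values(table, operator):
    """Resolve the evaluation values for one operator (S141 to S147)."""
    default_row = table[DEFAULT]
    # S142 to S144: an operator not in the table uses the default record.
    row = table.get(operator, default_row)
    # S145 to S147: replace each "d" item with the default record's value.
    return {
        item: default_row[item] if value == USE_DEFAULT else value
        for item, value in row.items()
    }
```

With these values, operator C resolves to a volume of 30, a speech rate of 20, and a word count of 5, matching the worked example given for the "d" marker.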

Next, the load analysis method (5-2) is described with reference to FIGS. 15 to 17. The load that the operator 109 received from the utterances of the customer 106 is analyzed for "volume", "speech rate", and "words".

The load analysis method for the item "volume" is described with reference to FIG. 15.
First, the non-linguistic information on the utterances of the customer 106 is acquired from the recognition result (S151).
Next, the volume is extracted from the non-linguistic information (S152).
Next, the speech volume of the customer 106 is compared with the volume evaluation value (S153).
It is then determined whether the speech volume of the customer 106 is greater than the volume evaluation value (S154).
If the speech volume of the customer 106 is greater than the volume evaluation value, the evaluation result of the call for the item "volume" is set to "load" (S155).
If the speech volume of the customer 106 is not greater than the volume evaluation value, the evaluation result of the call for the item "volume" is set to "normal" (S156).

次に、図１６を参照して、項目「話速」に関する負荷の分析方法について説明する。
まず、認識結果から顧客１０６の発話内容に関する非言語情報を取得する（Ｓ１６１）。
次に、非言語情報の中から、話速を抽出する（Ｓ１６２）。
次に、顧客１０６の発話話速と話速評価値を比較する（Ｓ１６３）。
比較の結果、顧客１０６の話速音量が話速評価値よりも大きいかを判定する（Ｓ１６４）。
判定の結果、顧客１０６の話速音量が音量評価値よりも大きい場合には、通話から項目「話速」について評価結果を“負荷”とする（Ｓ１６５）。
判定の結果、顧客１０６の話速音量が話速評価値よりも大きくない場合には、通話から項目「話速」について評価結果を“平常”とする（Ｓ１６６）。 Next, referring to FIG. 16, a load analysis method for the item "speech speed" will be described.
First, non-verbal information about the content of the speech of the customer 106 is acquired from the recognition result (S161).
Next, speech speed is extracted from the non-verbal information (S162).
Next, the speech speed of the customer 106 is compared with the speech speed evaluation value (S163).
As a result of the comparison, it is determined whether the speech speed of the customer 106 is greater than the speech speed evaluation value (S164).
As a result of the determination, if the speech speed of the customer 106 is greater than the speech speed evaluation value, the evaluation result for the item "speech speed" from the call is set to "load" (S165).
As a result of the determination, if the speech speed of the customer 106 is not greater than the speech speed evaluation value, the evaluation result for the item "speech speed" from the call is set to "normal" (S166).
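
The checks for "volume" (S151 to S156) and "speech speed" (S161 to S166) have the same shape: extract one measurement from the non-verbal information and compare it with its evaluation value. A minimal sketch follows. The function names, the dictionary keys, and the sample numbers are assumptions for illustration, and the units of the measurements are not specified in the text.

```python
def evaluate_threshold(measured, evaluation_value):
    """Return "load" only when the measurement is strictly greater than
    the evaluation value (S154/S164); otherwise return "normal"."""
    return "load" if measured > evaluation_value else "normal"


def analyze_nonverbal(nonverbal_info, evaluation_values):
    """Evaluate the items "volume" and "speech speed" for one call."""
    return {
        # S152-S156: item "volume"
        "volume": evaluate_threshold(
            nonverbal_info["volume"], evaluation_values["volume"]
        ),
        # S162-S166: item "speech speed"
        "speech_speed": evaluate_threshold(
            nonverbal_info["speech_speed"], evaluation_values["speech_speed"]
        ),
    }


# Hypothetical measurements and evaluation values.
result = analyze_nonverbal(
    {"volume": 75, "speech_speed": 8},
    {"volume": 70, "speech_speed": 8},
)
print(result)
```

Because the comparison is strict, a measurement equal to the evaluation value is treated as "normal", which matches the "not greater than" branch in the text.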

次に、図１７を参照して、項目「単語」に関する負荷の分析方法について説明する。
まず、認識結果から顧客１０６の発話内容に関する言語情報を取得する（Ｓ１７１）。
次に、非言語情報の中から、単語情報を抽出する（Ｓ１７２）。
次に、単語情報の中に「負荷単語テーブル」（Ｔ－２）に登録した単語が存在するかを判定する（Ｓ１７３）。 Next, referring to FIG. 17, a load analysis method for the item "word" will be described.
First, linguistic information about the content of the customer 106's utterance is acquired from the recognition result (S171).
Next, word information is extracted from the linguistic information (S172).
Next, it is determined whether or not the words registered in the "load word table" (T-2) exist in the word information (S173).

存在する場合には、単語情報中に存在する「負荷単語テーブル」（Ｔ－２）に登録した単語の数をカウントする。カウントで算出した値を発話負荷単語数とする（Ｓ１７４）。
存在しない場合には、発話負荷単語数を０とする（Ｓ１７５）。 If it exists, the number of words registered in the "load word table" (T-2) that exists in the word information is counted. The value calculated by counting is set as the number of utterance load words (S174).
If it does not exist, the number of utterance load words is set to 0 (S175).

次に、顧客１０６の発話負荷単語数と単語数評価値を比較する（Ｓ１７６）。
次に、比較の結果、顧客１０６の発話負荷単語数が単語数評価値よりも大きいかを判定する（Ｓ１７７）。
判定の結果、顧客１０６の発話負荷単語数が単語数評価値よりも大きい場合には、通話から項目「単語」について評価結果を“負荷”とする（Ｓ１７８）。
判定の結果、顧客１０６の発話負荷単語数が単語数評価値よりも大きくない場合には、通話から項目「単語」について評価結果を“平常”とする（Ｓ１７９）。 Next, the number of utterance load words of the customer 106 and the word number evaluation value are compared (S176).
Next, as a result of the comparison, it is determined whether the number of utterance load words of the customer 106 is larger than the word number evaluation value (S177).
As a result of the determination, if the number of utterance load words of the customer 106 is larger than the word count evaluation value, the evaluation result for the item "word" from the call is set to "load" (S178).
As a result of the determination, if the number of utterance load words of the customer 106 is not larger than the word count evaluation value, the evaluation result for the item "word" from the call is set to "normal" (S179).
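
The word analysis (S171 to S179) can be sketched as a count followed by the same strict comparison. The function names and the sample words are assumptions for illustration. The text does not say whether repeated occurrences of the same registered word are counted separately; this sketch assumes every occurrence counts.

```python
def count_load_words(words, load_words):
    """S173-S175: count occurrences of registered load words.
    The result is 0 when no registered word appears."""
    return sum(1 for word in words if word in load_words)


def analyze_words(words, load_words, word_count_evaluation_value):
    """S176-S179: "load" when the count is strictly greater than the
    evaluation value, otherwise "normal"."""
    count = count_load_words(words, load_words)
    return "load" if count > word_count_evaluation_value else "normal"


# Hypothetical contents of the "load word table" (T-2) and of one call.
load_word_table = {"refund", "lawsuit"}
customer_words = ["refund", "now", "refund", "please"]

print(count_load_words(customer_words, load_word_table))
print(analyze_words(customer_words, load_word_table, 1))
```

Holding the registered words in a set keeps each membership test constant time, which is convenient when the word information for a long call is large.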

最後に、分析結果の保存方法（５－３）について説明する。
分析した「単語」、「音量」、「話速」の負荷結果を「負荷状態テーブル」（Ｔ－３）に保存する。保存時には、負荷結果に加えて、通話に紐づく「オペレータ名」、「通話ＩＤ」、「外線発信番号」、「通話開始時刻」及び「通話時間」を音声認識制御装置１１３より取得してレコードとして記録する。 Finally, a method (5-3) for saving analysis results will be described.
The load results of the analyzed "words", "volume", and "speech speed" are stored in the "load state table" (T-3). When saving, in addition to the load results, the "operator name", "call ID", "outside line calling number", "call start time", and "call duration" associated with the call are acquired from the speech recognition control device 113 and stored together as one record.
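
One way to picture a record of the "load state table" (T-3) is shown below. The field names, the field types, the list used as the table, and the sample call are all assumptions for this sketch; the text only names the items that a record contains.

```python
from dataclasses import dataclass, asdict


@dataclass
class LoadStateRecord:
    # Call information acquired from the speech recognition control device 113.
    operator_name: str
    call_id: str
    outside_line_number: str
    call_start_time: str
    call_duration_seconds: int
    # Load results of the three analyzed items ("load" or "normal").
    word: str
    volume: str
    speech_speed: str


def save_result(load_state_table, call_info, load_results):
    """Combine the call information and the load results into one record
    and append it to the table (here a plain list of dictionaries)."""
    record = LoadStateRecord(**call_info, **load_results)
    load_state_table.append(asdict(record))
    return record


# Hypothetical call, for illustration only.
load_state_table = []
saved = save_result(
    load_state_table,
    {
        "operator_name": "operator_a",
        "call_id": "c-001",
        "outside_line_number": "0312345678",
        "call_start_time": "2018-09-21T10:00:00",
        "call_duration_seconds": 300,
    },
    {"word": "load", "volume": "normal", "speech_speed": "normal"},
)
print(load_state_table[0])
```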

実施例によれば、管理者はオペレータが顧客との通話から受ける負荷とオペレータの状態を把握することができる。さらに、管理者がオペレータの負荷傾向を早期に把握することで、電話対応における負荷の増大を事前に緩和することができる。 According to the embodiment, the administrator can grasp the load that the operator receives from calls with customers and the state of the operator. Furthermore, by grasping the operator's load tendency at an early stage, the administrator can alleviate an increase in the load of telephone handling in advance.

尚、実施例では、顧客用音声ファイルに基づいてオペレータの負荷を求めているが、本発明はこれに限定されず、オペレー用音声ファイルをも併用してオペレータの負荷を求めても良い。 In the embodiment, the operator's load is determined based on the customer voice file, but the present invention is not limited to this, and the operator's voice file may also be used to determine the operator's load.

１００ネットワーク
１０１ＩＰ－ＰＢＸ装置
１０２ＣＴＩ装置
１０３音声通話処理システム
１０４オペレータ用端末
１０５管理者用端末
１０６顧客
１０７通話端末
１０８公衆網
１０９オペレータ
１１０通話端末（ＩＰ電話機）
１１１通話録音情報管理装置
１１２通話録音装置
１１３音声認識制御装置
１１４音声認識装置
１１５認識結果管理装置

100 network
101 IP-PBX device
102 CTI device
103 voice call processing system
104 operator terminal
105 administrator terminal
106 customer
107 call terminal
108 public network
109 operator
110 call terminal (IP telephone)
111 call recording information management device
112 call recording device
113 speech recognition control device
114 speech recognition device
115 recognition result management device

Claims

1. A speech recognition system comprising:
a call recording device that records the content of a customer's utterances;
a speech recognition device that performs speech recognition on the customer's utterance content recorded by the call recording device;
a recognition result management device that obtains an evaluation result by evaluating the load state that an operator receives from the customer, based on the speech recognition result of the customer's utterance content recognized by the speech recognition device; and
an administrator terminal that displays the evaluation result of the load state obtained by the recognition result management device on a load state screen,
wherein the recognition result management device has:
a load evaluation value table that defines evaluation values for evaluating the load state;
a load word table that defines load words that act as a load on the operator; and
a load state table that stores the evaluation results of evaluating the load state that the operator receives from the customer,
wherein the recognition result management device evaluates, for each operator, the load state that the operator receives from the customer by referring to the speech recognition result of the customer's utterance content obtained from the speech recognition device, the load evaluation value table, and the load word table, and accumulates the evaluation results in the load state table for each operator,
wherein the administrator terminal displays the evaluation results accumulated in the load state table on the load state screen for each operator, and
wherein the recognition result management device defines, as the evaluation values in the load evaluation value table, the volume and speech speed of the customer's utterance content included in the non-verbal information of the customer's utterance content, and the number of load words included in the linguistic information of the customer's utterance content.

2. The speech recognition system according to claim 1, wherein the recognition result management device compares the speech recognition result of the customer's utterance content acquired from the speech recognition device with the evaluation values defined in the load evaluation value table, and accumulates in the load state table, as the evaluation result, either "load" or "normal" according to whether the customer's utterance content imposes a load on the operator.

3. The speech recognition system according to claim 2, wherein the recognition result management device:
acquires the non-verbal information of the customer's utterance content from the speech recognition result of the customer's utterance content;
extracts the speech volume from the non-verbal information;
compares the speech volume with the volume evaluation value to determine whether the speech volume is greater than the volume evaluation value;
sets the volume evaluation result to "load" if the speech volume is greater than the volume evaluation value; and
sets the volume evaluation result to "normal" if the speech volume is not greater than the volume evaluation value.

4. The speech recognition system according to claim 2, wherein the recognition result management device:
acquires the non-verbal information of the customer's utterance content from the speech recognition result of the customer's utterance content;
extracts the speech speed from the non-verbal information;
compares the speech speed with the speech speed evaluation value to determine whether the speech speed is greater than the speech speed evaluation value;
sets the speech speed evaluation result to "load" if the speech speed is greater than the speech speed evaluation value; and
sets the speech speed evaluation result to "normal" if the speech speed is not greater than the speech speed evaluation value.

5. The speech recognition system according to claim 2, wherein the recognition result management device:
acquires the linguistic information of the customer's utterance content from the speech recognition result of the customer's utterance content;
extracts word information from the linguistic information;
determines whether a load word defined in the load word table exists in the word information;
if the load word exists, calculates the number of uttered load words in the word information;
compares the number of uttered load words with the evaluation value of the load words to determine whether the number of uttered load words is greater than the evaluation value of the load words;
sets the evaluation result of the load words to "load" if the number of uttered load words is greater than the evaluation value of the load words; and
sets the evaluation result of the load words to "normal" if the number of uttered load words is not greater than the evaluation value of the load words.

6. The speech recognition system according to claim 1, wherein the administrator terminal displays a setting screen, and the recognition result management device sets the evaluation values in the load evaluation value table and the load words in the load word table via the setting screen.

7. A speech recognition system comprising:
a call recording device that records the content of a customer's utterances;
a speech recognition device that performs speech recognition on the customer's utterance content recorded by the call recording device;
a recognition result management device that obtains an evaluation result by evaluating the load state that an operator receives from the customer, based on the speech recognition result of the customer's utterance content recognized by the speech recognition device; and
an administrator terminal that displays the evaluation result of the load state obtained by the recognition result management device on a load state screen,
wherein the administrator terminal displays, on the load state screen, aggregated results of the evaluation results for each operator and each customer according to call start time.

8. A speech recognition system comprising:
a call recording device that records the content of a customer's utterances;
a speech recognition device that performs speech recognition on the customer's utterance content recorded by the call recording device;
a recognition result management device that obtains an evaluation result by evaluating the load state that an operator receives from the customer, based on the speech recognition result of the customer's utterance content recognized by the speech recognition device; and
an administrator terminal that displays the evaluation result of the load state obtained by the recognition result management device on a load state screen,
wherein the administrator terminal displays an aggregated result obtained by aggregating the evaluation results, on a daily or weekly basis, as a percentage obtained by dividing the number of calls that imposed a load on the operator by the total number of calls of the operator.