JP5860085B2

JP5860085B2 - Call recording server, call data management system, and call data management method

Info

Publication number: JP5860085B2
Application number: JP2014053355A
Authority: JP
Inventors: 政悟新井; 満堤; 健森脇
Original assignee: Advanced Media Inc
Current assignee: Advanced Media Inc
Priority date: 2014-03-17
Filing date: 2014-03-17
Publication date: 2016-02-16
Anticipated expiration: 2034-03-17
Also published as: TW201540041A; KR20160100412A; JP2015177411A; CN106068641B; KR20170102394A; CN106068641A; KR101826918B1; TWI569619B; WO2015141189A1

Description

The present invention relates to a call recording server, a call data management system, and a call data management method for recording and managing call voice data in an IP telephone network.

Conventionally, call contents have been checked or monitored (hereinafter “monitoring”) for various purposes, such as improving service quality in a call center. In recent years, IP telephones using VoIP (Voice over Internet Protocol) technology have become widespread. Various techniques therefore exist for recording and managing call voice data in an IP telephone network (see, for example, Patent Document 1).

In the technique described in Patent Document 1 (hereinafter “prior art”), the operator terminal of a call center transmits two kinds of information to a server: work history information, which the operator creates after finishing handling a call, and voice recognition information, which is the result of voice recognition processing on the voice data of the call. The administrator's terminal acquires the work history information and the voice recognition information and presents them to the administrator.

With this prior art, the administrator can check the voice recognition results for calls on the IP telephone network after the calls have ended. That is, the prior art makes it possible to monitor the contents of IP telephone calls.

Patent Document 1: JP 2008-211271 A

However, picking out the calls that need checking after each call has ended, searching for the voice data of those calls, and reviewing the stored voice recognition results and voice data is laborious. When there are many IP telephones, as in a large-scale call center, the amount of work history information and voice recognition information accumulated on the server also grows, and this work becomes very cumbersome. The prior art is therefore difficult to apply when a large number of IP telephones are to be monitored.

An object of the present invention is to provide a call recording server, a call data management system, and a call data management method that make it easier to monitor the call contents of a large number of IP telephones.

The call recording server of the present disclosure includes: a voice recording control unit that sequentially acquires voice data of a call transmitted over an IP telephone network and records it in a memory; a call start acquisition unit that acquires, based on control information accompanying the acquired voice data, a call start timing at which the call started; and a voice recognition control unit that causes voice recognition processing of the recorded voice data to start immediately after the acquired call start timing.

The call data management system of the present disclosure includes: a call recording server that records voice data of a call transmitted over an IP telephone network; a voice recognition server that performs voice recognition processing on the recorded voice data and generates text data as the result of the processing; and a monitoring device that presents the recorded voice data and the generated text data in association with each other. The call recording server includes: a voice recording control unit that sequentially acquires the voice data from the IP telephone network and records it in a memory; a call start acquisition unit that acquires, based on control information accompanying the acquired voice data, a call start timing at which the call started; and a voice recognition control unit that outputs the recorded voice data to the voice recognition server and causes the voice recognition server to start voice recognition processing of the voice data immediately after the acquired call start timing.

The call data management method of the present disclosure includes: a step of sequentially acquiring voice data of a call transmitted over an IP telephone network and recording it in a memory; a step of acquiring, based on control information accompanying the acquired voice data, a call start timing at which the call started; and a step of starting voice recognition processing of the recorded voice data immediately after the acquired call start timing.

According to the present disclosure, voice recognition processing of the voice data of a call transmitted over the IP telephone network starts immediately after the call start timing, so the voice recognition results can be presented in near real time while the call is still in progress. The present disclosure therefore makes it easier to monitor the call contents of a large number of IP telephones.

FIG. 1: System configuration diagram showing an example of the configuration of a communication system including a call data management system according to an embodiment of the present invention
FIG. 2: Block diagram showing an example of the configuration of the call recording server according to the embodiment
FIG. 3: Flowchart showing an example of the operation of the call recording server according to the embodiment
FIG. 4: Sequence diagram showing an example of the operation flow of the communication system according to the embodiment

An embodiment of the present invention is described in detail below with reference to the drawings. This embodiment is one concrete example in which the present invention is applied to a call monitoring system for a call center in which many IP telephones are deployed.

<System configuration>
First, the configuration of a communication system including the call data management system according to the present embodiment is described.

FIG. 1 is a system configuration diagram showing an example of the configuration of a communication system including the call data management system according to the present embodiment.

In FIG. 1, the communication system 100 includes an external network 200, an extension network 300, and a call management network 400.

The external network 200 is a public network such as the Internet, to which IP terminals (not shown) used by call center customers are connected. That is, the external network 200 forms part of the IP telephone network formed by the call center.

The extension network 300 is part of a communication network, such as a LAN (Local Area Network), built in the call center. The extension network 300 includes first to Nth telephones 310_1 to 310_N, a network device 320, and a PBX (Private Branch eXchange) device 330.

Each telephone 310 is an IP telephone used by an operator who handles customers. The first to Nth telephones 310_1 to 310_N are each connected to the PBX device 330 via the network device 320.

The network device 320 is a relay device, such as a switching hub, a TAP box, or a router, that forwards IP packets between each telephone 310 and the PBX 330. In addition, the network device 320 sends copies of the IP packets it forwards to the call management network 400, using a function such as port mirroring.

The PBX device 330 is a private branch exchange connected to the external network 200. The PBX device 330 receives IP packets addressed to the first to Nth telephones 310_1 to 310_N from the external network 200 and forwards them to the network device 320. The PBX device 330 also receives IP packets addressed to IP telephones (not shown) on the external network 200 from the network device 320 and forwards them to the external network 200.

That is, the extension network 300 forms part of the IP telephone network; while carrying the IP packets of the many calls made at the call center, it sends copies of those packets to the call management network 400.

The call management network 400 is, for example, part of a communication network such as a LAN built in the call center, and corresponds to the call data management system of the present invention. The call management network 400 includes a call recording server 410, a management server 420, a voice recognition server 430, and a monitoring device 440.

The connections between the devices are not limited to the connection lines shown in FIG. 1. For example, each device is connected to a LAN, so that any device can communicate with any other.

The call recording server 410 is connected to the network device 320 of the extension network 300. The call recording server 410 receives the IP packets transmitted from the network device 320, extracts the call voice data from the received packets, and records it. That is, the call recording server 410 records the voice data of calls transmitted over the IP telephone network.

FIG. 2 is a block diagram showing an example of the configuration of the call recording server 410.

In FIG. 2, the call recording server 410 includes a telephone network communication unit 411, a management network communication unit 412, a memory 413, a voice recording control unit 414, a call start acquisition unit 415, and a voice recognition control unit 416.

The telephone network communication unit 411 is a communication interface for connecting to the communication network of the extension network 300, and is connected to the network device 320. The telephone network communication unit 411 receives the IP packets transmitted from the network device 320 and outputs each received packet in turn to the voice recording control unit 414 and the call start acquisition unit 415.

The management network communication unit 412 is a communication interface for connecting to the communication network of the call management network 400, and is connected to the management server 420, the voice recognition server 430, and the monitoring device 440.

The memory 413 is a recording medium such as a hard disk, and holds the information stored in it by the voice recording control unit 414 in a readable form.

The voice recording control unit 414 analyzes each input IP packet and extracts voice data (call voice signal) and control information (communication control signal) from it. The voice recording control unit 414 then stores the extracted voice data in the memory 413, item by item, in association with information that identifies the voice data, such as the control information. That is, the voice recording control unit 414 sequentially acquires voice data from the IP telephone network and records it in the memory 413.

The voice data is acoustic data containing the speech of both parties to a call. The control information accompanies the voice data and includes call identification information, speaker identification information, and time information. The call identification information identifies the call. The speaker identification information identifies the speaker (IP telephone) of the speech contained in the voice data. The time information indicates the time to which the voice data corresponds. The control information may be obtained from either the header portion or the payload portion of the IP packet.
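The patent does not name the media protocol. As a minimal sketch, assuming the mirrored packets carry RTP (RFC 3550) over UDP and that IP/UDP headers have already been stripped, the fixed RTP header already holds fields that could serve as part of the control information: the SSRC could stand in for the speaker identification information and the RTP timestamp for the time information. Call identification would normally come from the signalling channel (for example a SIP Call-ID), which this sketch does not parse. `RtpPacket` and `parse_rtp` are hypothetical names.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class RtpPacket:
    payload_type: int
    sequence: int
    timestamp: int  # assumed stand-in for the "time information"
    ssrc: int       # assumed stand-in for the "speaker identification information"
    payload: bytes  # encoded voice samples (the "voice data")


def parse_rtp(datagram: bytes) -> RtpPacket:
    """Parse an RTP (RFC 3550) packet taken from a UDP payload."""
    if len(datagram) < 12:
        raise ValueError("shorter than the fixed 12-byte RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)          # skip the CSRC list (CC field)
    if b0 & 0x10:                          # X bit: a header extension follows
        if len(datagram) < offset + 4:
            raise ValueError("truncated header extension")
        _profile, words = struct.unpack("!HH", datagram[offset:offset + 4])
        offset += 4 + 4 * words
    end = len(datagram)
    if b0 & 0x20:                          # P bit: last byte holds the padding length
        end -= datagram[-1]
    if offset > end:
        raise ValueError("truncated RTP packet")
    return RtpPacket(b1 & 0x7F, seq, ts, ssrc, datagram[offset:end])
```

Keeping the parsed fields together with the payload makes it straightforward to store each voice chunk alongside its identifying information, as the voice recording control unit 414 is described as doing.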

The call start acquisition unit 415 analyzes each input IP packet and extracts from it control information including the call identification information. Based on the extracted control information, the call start acquisition unit 415 identifies, for each call, the timing at which the telephone network communication unit 411 first received an IP packet of that call, and acquires this timing as the timing at which the call started (hereinafter “call start timing”). Each time it acquires a call start timing, the call start acquisition unit 415 notifies the voice recognition control unit 416 that the call start timing has been reached, together with the control information of the corresponding call.
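The first-packet rule above can be sketched as a set of call IDs already seen: the first packet of an unseen call ID marks its call start timing and triggers a notification. This is a minimal, single-threaded sketch; `CallStartDetector`, `observe`, and `forget` are hypothetical names, and the callback stands in for the notification to the voice recognition control unit 416.

```python
from typing import Callable, Hashable, Set


class CallStartDetector:
    """Hypothetical helper: reports the first packet seen for each call ID."""

    def __init__(self, on_call_start: Callable[[Hashable, float], None]) -> None:
        self._seen: Set[Hashable] = set()
        self._on_call_start = on_call_start

    def observe(self, call_id: Hashable, received_at: float) -> bool:
        """Return True if this packet is the first one received for call_id."""
        if call_id in self._seen:
            return False
        self._seen.add(call_id)
        # The reception time of the first packet is taken as the call start timing.
        self._on_call_start(call_id, received_at)
        return True

    def forget(self, call_id: Hashable) -> None:
        """Drop state for a finished call (end-of-call detection is assumed)."""
        self._seen.discard(call_id)
```

In a long-running server some end-of-call signal would be needed to call `forget`, otherwise the set grows without bound; the patent text does not describe how call ends are handled.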

If the control information contains information that directly indicates the call start timing, such as the call start time, the call start acquisition unit 415 may acquire the call start timing from that information.

The extraction of voice data and control information from the IP packets may instead be performed by the telephone network communication unit 411.

When notified that the call start timing has been reached, the voice recognition control unit 416 transmits, via the management network communication unit 412, a call start notification indicating the call start timing to the management server 420. The call start notification includes, for example, the control information.

When the voice recognition control unit 416 receives a request to transmit voice data (hereinafter “voice transmission request”) via the management network communication unit 412, it returns the requested voice data recorded in the memory 413 to the requester. The voice transmission request includes information that identifies the voice data, such as the control information. Voice transmission requests are sent, for example, by the voice recognition server 430 and the monitoring device 440. A voice transmission request may, for example, specify call identification information and ask that the voice data of the corresponding call be returned piece by piece as soon as it is stored.
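The “return piece by piece as soon as it is stored” behaviour can be sketched as a per-call store that first replays the chunks already recorded and then pushes each new chunk to every registered requester. This is an in-memory, single-threaded sketch; `RecordingStore`, `record`, and `request_voice` are hypothetical names, and the `sink` callback stands in for sending data back over the management network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Chunk = bytes
Sink = Callable[[Chunk], None]  # assumed transport back to the requester


class RecordingStore:
    """Hypothetical in-memory stand-in for memory 413 with streaming replies."""

    def __init__(self) -> None:
        self._chunks: DefaultDict[str, List[Chunk]] = defaultdict(list)
        self._sinks: DefaultDict[str, List[Sink]] = defaultdict(list)

    def record(self, call_id: str, chunk: Chunk) -> None:
        """Store a voice chunk and forward it to everyone streaming this call."""
        self._chunks[call_id].append(chunk)
        for sink in self._sinks[call_id]:
            sink(chunk)

    def request_voice(self, call_id: str, sink: Sink) -> None:
        """Handle a voice transmission request: replay, then keep streaming."""
        for chunk in self._chunks[call_id]:
            sink(chunk)
        self._sinks[call_id].append(sink)
```

Replaying before subscribing means a requester that arrives a little after the call start timing, such as the voice recognition server 430 reacting to a recognition start request, still receives the beginning of the call.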

As described later, when the call start notification is transmitted, the management function of the management server 420 causes the voice recognition server 430 to request the voice data from the call recording server 410 and to start voice recognition processing of the returned voice data.

That is, as a result of transmitting the call start notification, the voice recognition control unit 416 outputs the recorded voice data to the voice recognition server 430 and causes the voice recognition server 430 to start voice recognition processing of the voice data recorded in the memory 413 immediately after the call start timing.

The management server 420 in FIG. 1 acquires the call start timing by receiving the call start notification transmitted from the call recording server 410. Based on the acquired call start timing, the management server 420 controls the operation timing of each of the call recording server 410, the voice recognition server 430, and the monitoring device 440.

More specifically, on receiving a call start notification, the management server 420 decides, based on the control information included in the notification, whether to perform voice recognition processing on the voice data of the call indicated by the notification.

If it decides to perform voice recognition, the management server 420 sends the voice recognition server 430 a request to start voice recognition processing of the voice data recorded on the call recording server 410 (hereinafter “recognition start request”). The recognition start request includes information identifying the voice data, such as the control information.

Also, if it decides to perform voice recognition, the management server 420 forwards the call start notification to the monitoring device 440. Further, on receiving from the voice recognition server 430 a notification that voice recognition processing has started (hereinafter “recognition start notification”), the management server 420 forwards it to the monitoring device 440. The recognition start notification includes information identifying the voice data, such as the control information.
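The message flow of the management server 420 described above can be sketched as a small router with one handler per incoming message. This is a minimal sketch: the class, method, and message-kind names are hypothetical, the `decide` predicate stands in for whatever selection rule is configured, and the `Send` callables stand in for an unspecified transport.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CallStartNotice:
    call_id: str
    speaker_id: str


Send = Callable[[str, str], None]  # (message kind, call_id); transport is assumed


class ManagementServer:
    """Hypothetical router reproducing the message flow of management server 420."""

    def __init__(self, decide: Callable[[CallStartNotice], bool],
                 send_to_recognizer: Send, send_to_monitor: Send) -> None:
        self._decide = decide
        self._send_to_recognizer = send_to_recognizer
        self._send_to_monitor = send_to_monitor

    def on_call_start(self, notice: CallStartNotice) -> None:
        if not self._decide(notice):
            return  # call not selected for recognition: nothing is sent
        self._send_to_recognizer("recognition_start_request", notice.call_id)
        self._send_to_monitor("call_start_notification", notice.call_id)

    def on_recognition_started(self, call_id: str) -> None:
        self._send_to_monitor("recognition_start_notification", call_id)
```

Keeping the selection rule as an injected predicate separates “which calls to recognize” from “how the servers are coordinated”, which mirrors the way the text treats the decision and the resulting requests as separate steps.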

On receiving a recognition start request, the voice recognition server 430 sends the call recording server 410 a voice transmission request for the same voice data as that specified in the recognition start request. The voice recognition server 430 then performs voice recognition processing on the voice data returned from the call recording server 410, generates text data as the result of the processing, and stores it in its memory (not shown).

The voice recognition server 430 performs voice recognition processing using a known voice recognition technique. For example, the voice recognition server includes a voice recognition database, an acoustic analysis unit, and a recognition decoder unit (none of which are shown).

The voice recognition database stores an acoustic model, a dictionary, and a language model in advance. The acoustic model is data representing the probabilistic correspondence between speech feature quantities and phonetic symbols. The dictionary describes multiple text sequences as the candidate set of voice recognition results. The language model is data giving the occurrence probability and connection probability of each text sequence described in the dictionary.
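One common way to express “occurrence probability and connection probability” is a bigram language model, in which a word sequence is scored by the probability of its first word times the probability of each word given the one before it. The patent does not say which kind of language model is used, so this is an illustrative sketch only; `sequence_log_prob` is a hypothetical name and the fixed `floor` for unseen pairs is a crude stand-in for proper smoothing.

```python
import math
from typing import Dict, Sequence, Tuple


def sequence_log_prob(words: Sequence[str],
                      unigram: Dict[str, float],
                      bigram: Dict[Tuple[str, str], float],
                      floor: float = 1e-6) -> float:
    """log P(w1..wn) ~ log P(w1) + sum of log P(wi | wi-1) under a bigram model."""
    if not words:
        return 0.0
    # Occurrence probability of the first word.
    logp = math.log(unigram.get(words[0], floor))
    # Connection probabilities of each adjacent pair; unseen pairs get the floor.
    for prev, cur in zip(words, words[1:]):
        logp += math.log(bigram.get((prev, cur), floor))
    return logp
```

Working in log probabilities avoids numerical underflow when many small probabilities are multiplied over a long utterance.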

The acoustic analysis unit divides the voice signal into frames, performs predetermined processing including Fourier analysis on each frame, and extracts speech feature quantities. From the analysis results, the acoustic analysis unit then detects the speech sections that contain uttered speech and generates time-series data consisting only of the speech feature quantities of those sections.
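The framing, per-frame Fourier analysis, and speech-section detection can be sketched in plain Python. This sketch makes several assumptions not in the source: a Hamming window, the magnitude spectrum itself as the “feature”, and a simple spectral-energy threshold as the speech detector. Real systems typically use features such as MFCCs, an FFT, and a more robust voice activity detector. The function names and the `threshold` value are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping frames (a trailing partial frame is dropped)."""
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Hamming-windowed DFT magnitudes for bins 0..N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def speech_features(signal: Sequence[float], frame_len: int = 64, hop: int = 32,
                    threshold: float = 1.0) -> List[Tuple[int, List[float]]]:
    """Keep (frame_index, spectrum) only for frames whose mean spectral energy exceeds threshold."""
    result = []
    for idx, frame in enumerate(frames(signal, frame_len, hop)):
        spectrum = magnitude_spectrum(frame)
        energy = sum(m * m for m in spectrum) / len(spectrum)
        if energy > threshold:  # crude stand-in for speech-section detection
            result.append((idx, spectrum))
    return result
```

For example, a signal of 128 zero samples followed by 128 samples of a sine wave completing 8 cycles per 64 samples yields features only for the frames that overlap the tone.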

Based on the time-series speech feature data generated by the acoustic analysis unit, the recognition decoder unit determines the voice recognition result by referring to the acoustic model, the dictionary, and the language model in the voice recognition database.
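The patent does not describe the decoder's scoring rule. A standard formulation, shown here as background rather than as the patent's method, chooses the candidate word sequence W from the dictionary's candidate set D that is most probable given the feature time series X. By Bayes' rule, P(X) is constant across candidates, so the acoustic model supplies P(X | W) and the language model supplies P(W):

```latex
\hat{W} = \arg\max_{W \in \mathcal{D}} P(W \mid X)
        = \arg\max_{W \in \mathcal{D}} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W \in \mathcal{D}} P(X \mid W)\, P(W)
```

In practice the product is usually computed as a sum of log probabilities, often with a weighting factor on the language-model term.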

When voice recognition succeeds, the voice recognition result includes text data obtained by transcribing the uttered speech. That is, the voice recognition server 430 performs voice recognition processing on the voice data recorded on the call recording server 410 and generates text data as the result of the processing.

When the voice recognition server 430 receives a request to transmit text data (hereinafter “recognition result transmission request”) from the monitoring device 440, it returns the stored voice recognition result for the requested voice data to the monitoring device 440. The recognition result transmission request includes information identifying the voice data, such as the control information of the source voice data. A recognition result transmission request may, for example, specify call identification information and ask that the voice recognition results of the corresponding call be returned piece by piece as soon as they are generated.

The monitoring device 440 is the portion, functioning as a web browser, of a personal computer used by an administrator who monitors the calls at the call center. On receiving a call start notification from the management server 420, the monitoring device 440 sends a voice transmission request to the call recording server 410 and a recognition result transmission request to the voice recognition server 430.

The monitoring device 440 then displays, on a display unit such as a liquid crystal display, the voice data returned from the call recording server 410 in association with at least the text data contained in the voice recognition results returned from the voice recognition server 430. That is, from immediately after the call start timing, the monitoring device 440 presents the voice data and its voice recognition result (text data) to the administrator in association with each other.
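The text does not say how the voice data and the text are associated on screen. One plausible approach, assumed here for illustration, is to tag each recognized segment with its start and end offsets in the recording, so the display can highlight the text being spoken at the current playback position. `Segment` and `Transcript` are hypothetical names, and segments are assumed to arrive in time order as results stream in.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    start_ms: int  # offset from the start of the call recording
    end_ms: int
    text: str


class Transcript:
    """Hypothetical per-call view pairing recognized text with audio offsets."""

    def __init__(self) -> None:
        self._segments: List[Segment] = []
        self._starts: List[int] = []

    def add(self, segment: Segment) -> None:
        # Streaming results are assumed to arrive in start-time order.
        if self._starts and segment.start_ms < self._starts[-1]:
            raise ValueError("segments must arrive in start-time order")
        self._segments.append(segment)
        self._starts.append(segment.start_ms)

    def at(self, position_ms: int) -> Optional[Segment]:
        """Return the segment being spoken at a playback position, if any."""
        i = bisect_right(self._starts, position_ms) - 1
        if i >= 0 and position_ms < self._segments[i].end_ms:
            return self._segments[i]
        return None
```

The same mapping works in reverse: clicking a segment in the display could seek playback to `start_ms`.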

Although not shown, each of the call recording server 410, the management server 420, the voice recognition server 430, and the monitoring device 440 includes, for example, a CPU (Central Processing Unit), a storage medium such as a ROM (Read Only Memory) storing a control program, a working memory such as a RAM (Random Access Memory), and a communication circuit. In this case, the functions of the devices and units described above are realized by the CPU executing the control program.

In the communication system 100 configured in this way, the call recording server 410 can stream the voice data of calls transmitted over the IP telephone network to the voice recognition server 430, and the voice recognition server 430 can stream the voice recognition results for that voice data to the monitoring device 440.

That is, because the communication system 100 can start voice recognition processing and the presentation of its results immediately after the call start timing, it can present voice recognition results in near real time while the call is in progress.

<Operation of the call recording server>
Next, the operation of the call recording server is described.

FIG. 3 is a flowchart showing an example of the operation of the call recording server 410.

First, in step S1100, the telephone network communication unit 411 determines whether it has received an IP packet from the external network 200. If an IP packet has been received (S1100: YES), the telephone network communication unit 411 advances the process to step S1200. If not (S1100: NO), it advances the process to step S1500, described later.

In step S1200, the voice recording control unit 414 extracts the voice data from the IP packet and records it in the memory 413 in association with information identifying the voice data, such as the control information. The call start acquisition unit 415 also extracts the control information from the IP packet.

In step S1300, the call start acquisition unit 415 determines, based on the control information, whether the call start timing has been reached. If so (S1300: YES), it advances the process to step S1400. If not (S1300: NO), it advances the process to step S1500, described later.

In step S1400, the voice recognition control unit 416 transmits a call start notification to the management server 420 via the management network communication unit 412.

In step S1500, the voice recognition control unit 416 determines whether it has received a voice transmission request via the management network communication unit 412. If so (S1500: YES), it advances the process to step S1600. If not (S1500: NO), it advances the process to step S1700, described later.

In step S1600, the voice recognition control unit 416 starts transferring the voice data to the sender (requester) of the voice transmission request.

In step S1700, the voice recognition control unit 416 determines whether it has been instructed, for example by an administrator operation, to end the process of monitoring call data. If it has not been instructed to end the process (S1700: NO), the voice recognition control unit 416 returns the process to step S1100. If it has been instructed to end the process (S1700: YES), it ends the series of processes.
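The loop of steps S1100 to S1700 can be sketched in Python as follows. This is a minimal, single-threaded sketch under stated assumptions, not the patented implementation: the `Packet` type, its `call_id` and `speaker_id` fields (standing in for the control information), and the callback-based notification and transfer are all hypothetical. The call start timing is taken to be the first packet seen for a call ID, following the rule the description gives for the telephone network communication unit.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Packet:
    """Hypothetical IP packet: control information plus one voice payload."""
    call_id: str     # call identification information (control information)
    speaker_id: str  # speaker identification information (assumed field)
    payload: bytes   # voice data carried by the packet


class CallRecordingServer:
    """Sketch of the S1100-S1700 loop of call recording server 410."""

    def __init__(self, notify_call_start: Callable[[str, str], None]) -> None:
        # Stands in for memory 413: recorded voice data keyed by call ID.
        self.memory: Dict[str, List[bytes]] = defaultdict(list)
        self.started: Set[str] = set()
        self.notify_call_start = notify_call_start
        # Requesters currently receiving voice data, per call.
        self.subscribers: Dict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def on_packet(self, packet: Packet) -> None:
        # S1200: extract the voice data and record it under its control information.
        self.memory[packet.call_id].append(packet.payload)
        # S1300/S1400: the first packet of a call marks the call start timing.
        if packet.call_id not in self.started:
            self.started.add(packet.call_id)
            self.notify_call_start(packet.call_id, packet.speaker_id)
        # Continue any transfer already started in S1600.
        for send in self.subscribers[packet.call_id]:
            send(packet.payload)

    def on_voice_request(self, call_id: str, send: Callable[[bytes], None]) -> None:
        # S1500/S1600: begin the transfer with everything recorded so far,
        # so the requester also receives the first part of the call.
        for chunk in self.memory.get(call_id, []):
            send(chunk)
        self.subscribers[call_id].append(send)


def run(server: CallRecordingServer, packets: List[Packet],
        stop_after: Optional[int] = None) -> None:
    # S1100-S1700: process packets until the end of monitoring is instructed.
    for i, packet in enumerate(packets):
        if stop_after is not None and i >= stop_after:
            break  # S1700: YES
        server.on_packet(packet)
```

Replaying stored data before adding the requester to the live subscribers is one way to ensure a late request still covers the call from its beginning. In practice packets would arrive through the telephone network communication unit 411 and requests through the management network communication unit 412; the sketch collapses both into method calls.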

Through this operation, the call recording server 410 acquires the call start timing of a call transmitted over the IP telephone network and can start the voice recognition process on the voice data of that call immediately after the call start timing.

<Operation of the entire system>
Next, an example of the operation flow of the entire communication system 100 will be described.

FIG. 4 is a sequence diagram illustrating an example of the operation flow of the communication system 100.

First, the monitoring device 440 transmits to the management server 420, and thereby sets in advance, the conditions that voice data must satisfy to be subjected to the voice recognition process (S2010). Such conditions are, for example, speaker identification information, a call time zone, or the inclusion of a predetermined word in the call. That is, the monitoring device 440 registers the targets of the voice recognition process in the management server 420 in advance. When a call starts, the network device 320 starts transmitting IP packets to the call recording server 410 (S2020).

The call recording server 410 starts extracting the voice data and control information from each received IP packet and recording the voice data (S2030), and transmits a call start notification to the management server 420 (S2040). At this point, the call recording server 410 stores at least the voice data of the first part of the call.

The management server 420 determines whether to perform voice recognition on the voice data based on the control information included in the call start notification and the conditions set in S2010 (S2050). When the management server 420 determines that voice recognition is to be performed, it transmits a recognition start request to the voice recognition server 430 (S2060) and a call start notification to the monitoring device 440 (S2070). On receiving the recognition start request, the voice recognition server 430 transmits a voice transmission request to the call recording server 410 (S2080).
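The S2050 decision can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each registered condition is a set of allowed speaker IDs plus an optional time window, and that the call's start time is available alongside the control information. The "predetermined word" condition is left out because no text exists yet at call start; time windows that wrap past midnight are also not handled.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class RecognitionCondition:
    """Hypothetical condition registered by monitoring device 440 in S2010."""
    speaker_ids: FrozenSet[str] = frozenset()  # empty: any speaker
    start: Optional[time] = None  # start of call time zone (inclusive)
    end: Optional[time] = None    # end of call time zone (exclusive)

    def matches(self, speaker_id: str, call_start: time) -> bool:
        # Every field that is set must agree with the call's control information.
        if self.speaker_ids and speaker_id not in self.speaker_ids:
            return False
        if self.start is not None and call_start < self.start:
            return False
        if self.end is not None and call_start >= self.end:
            return False
        return True


def should_recognize(conditions: Iterable[RecognitionCondition],
                     speaker_id: str, call_start: time) -> bool:
    """S2050: perform voice recognition if any registered condition matches."""
    return any(c.matches(speaker_id, call_start) for c in conditions)
```

Treating the registered conditions as alternatives (any match triggers recognition) is a design choice of this sketch; the description does not say how multiple conditions combine.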

As described above, the call recording server 410 stores at least the voice data of the first part of the call. Therefore, on receiving the voice transmission request, the call recording server 410 returns the stored voice data to the voice recognition server 430 (S2090). To obtain highly accurate voice recognition results, the voice data transmitted to the voice recognition server 430 should preferably retain the quality of the voice data extracted from the IP packets.

In this way, the voice recognition server 430 starts the voice recognition process on the voice data stored in the call recording server 410 (S2100). At this point, the voice recognition server 430 stores the voice recognition result of at least the first part of the call. The voice recognition server 430 also transmits a recognition start notification to the management server 420 (S2110).

Because this recognition start notification is sent, even a monitoring device 440 that acquires what it displays by a pull-type operation, as a web browser does, can acquire and display the voice data and the voice recognition result in real time.

The management server 420 transfers the recognition start notification received from the voice recognition server 430 to the monitoring device 440 (S2120). This recognition start notification, or the call start notification transmitted in step S2070, preferably includes the identification information of the voice recognition server 430 as information indicating where the voice recognition result is to be acquired. On receiving the recognition start notification, the monitoring device 440 transmits a recognition result transmission request to the voice recognition server 430 (S2130).
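A notification that carries the identification information of the voice recognition server 430 might look like the following sketch. The JSON encoding and the field names `call_id` and `result_source` are assumptions made for illustration; the description only requires that the notification identify where the voice recognition result can be acquired.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RecognitionStartNotice:
    """Hypothetical wire form of the recognition start notification (S2110/S2120)."""
    call_id: str        # call identification information from the control information
    result_source: str  # identification information of voice recognition server 430


def encode(notice: RecognitionStartNotice) -> str:
    # Serialized for forwarding by management server 420 (JSON is an assumption).
    return json.dumps(asdict(notice), sort_keys=True)


def decode(message: str) -> RecognitionStartNotice:
    # Parsed by monitoring device 440 to decide where to send the
    # recognition result transmission request (S2130).
    return RecognitionStartNotice(**json.loads(message))
```

Because the management server only forwards the message, it does not need to know the result format; the monitoring device reads `result_source` and contacts that server directly.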

As described above, the voice recognition server 430 stores the voice recognition result of at least the first part of the call. Therefore, on receiving the recognition result transmission request, the voice recognition server 430 transmits the stored voice recognition result to the monitoring device 440 (S2140).

The monitoring device 440 further transmits a voice transmission request to the call recording server 410 (S2150) and receives voice data from the call recording server 410 (S2160). The voice recognition control unit 416 of the call recording server 410 preferably converts the voice data transmitted to the monitoring device 440 into a format that a web browser can play. The monitoring device 440 then displays the received voice data and voice recognition result in association with each other (S2170).
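One way to produce browser-playable audio is to wrap the recorded samples in a WAV container, as in this sketch using Python's standard `wave` module. It assumes the recorded voice data is already 16-bit linear PCM, mono, at 8 kHz; RTP payloads are often G.711 instead, which would have to be decoded first and is outside this sketch.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 8000, channels: int = 1,
               sample_width: int = 2) -> bytes:
    """Wrap raw linear PCM in a WAV container that browsers can play.

    Assumes the recorded voice data is already linear PCM (16-bit mono
    at 8 kHz by default); G.711 payloads would need decoding first.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

The resulting bytes could be served with an `audio/wav` content type; this conversion is applied only on the path to the monitoring device, so the voice recognition server still receives the original-quality data.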

For example, when several calls to be monitored are in progress at the same time, the monitoring device 440 can acquire the voice recognition results for these calls call by call, based on the call identification information or the speaker identification information included in the control information of each piece of voice data. In this case, the monitoring device 440 preferably displays the voice recognition results for these calls simultaneously on a single web browser screen.
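Grouping results by call before rendering them on one screen can be sketched as follows. The `(call_id, event_number, text)` tuple shape and the plain-text rendering are hypothetical stand-ins for whatever the recognition server returns and for the browser view.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical recognition result: (call_id, event_number, text)
Result = Tuple[str, int, str]


def group_by_call(results: Iterable[Result]) -> Dict[str, List[str]]:
    """Collect recognition results per call, ordered by event number."""
    grouped: Dict[str, List[Tuple[int, str]]] = {}
    for call_id, event_no, text in results:
        grouped.setdefault(call_id, []).append((event_no, text))
    # Calls keep first-seen order; texts within a call follow event numbers.
    return {cid: [t for _, t in sorted(items)] for cid, items in grouped.items()}


def render_screen(grouped: Dict[str, List[str]]) -> str:
    """Plain-text stand-in for one screen showing every monitored call."""
    lines: List[str] = []
    for call_id, texts in grouped.items():
        lines.append(f"[{call_id}]")
        lines.extend(f"  {t}" for t in texts)
    return "\n".join(lines)
```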

Through this operation, the communication system 100 can start the voice recognition process and the presentation of its results immediately after the call start timing, while narrowing the targets of voice recognition to those that are needed. The communication system 100 can also display the voice data and voice recognition results of a call in real time in a web browser.

The various requests transmitted in the communication system 100 may each request processing of the data of an entire call in a single request, or may request processing of part of the call's data at a time, in units of packets, frames, or a group of voice recognition results. In the latter case, for example, a frame number or the event number of a voice recognition result can be used as identification information specifying the processing target.
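Requesting part of a call's data at a time, keyed by event number, fits the pull-type monitoring device well: each poll asks only for results newer than the last one seen. The sketch below assumes, hypothetically, that event numbers are assigned per call starting from 1; the class and function names are illustrative only.

```python
from typing import Dict, List, Tuple


class RecognitionResultStore:
    """Hypothetical result store on voice recognition server 430.

    Each recognized segment gets a per-call event number (starting at 1),
    so a pull-type client can request only results it has not yet seen.
    """

    def __init__(self) -> None:
        self._results: Dict[str, List[str]] = {}

    def append(self, call_id: str, text: str) -> int:
        events = self._results.setdefault(call_id, [])
        events.append(text)
        return len(events)  # event number of the new result

    def since(self, call_id: str, last_seen: int) -> List[Tuple[int, str]]:
        """Return (event_number, text) pairs newer than last_seen."""
        events = self._results.get(call_id, [])
        return [(i, events[i - 1]) for i in range(last_seen + 1, len(events) + 1)]


def poll(store: RecognitionResultStore, call_id: str,
         last_seen: int) -> Tuple[List[Tuple[int, str]], int]:
    """One polling round of monitoring device 440: (new results, new cursor)."""
    new = store.since(call_id, last_seen)
    return new, (new[-1][0] if new else last_seen)
```

Starting with a cursor of 0 returns everything stored so far, which matches the way the first request in S2130 retrieves results from the beginning of the call.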

<Effects of the present embodiment>
As described above, in the communication system 100 including the call data management system according to the present embodiment, the voice recognition process on the voice data of calls transmitted through the extension network 300, which forms the IP telephone network, starts immediately after the call start timing. The communication system 100 can therefore present voice recognition results for IP telephone calls almost in real time, while the calls are still in progress.

As noted above, picking up the calls that need checking after each call has ended, searching for their voice data, and reviewing the accumulated voice recognition results and voice data becomes very laborious when the number of IP telephones is enormous.

In this respect, because the communication system 100 according to the present embodiment presents the content of each call to the administrator in real time, each call can be monitored efficiently without such laborious work. The communication system 100 according to the present embodiment therefore makes it easier to monitor the calls of a large number of IP telephones in real time.

Furthermore, when an administrator takes an action such as giving advice to an operator, checking the call content only after the call has ended, as in the prior art, delays the action past the appropriate moment. In this respect, because the communication system 100 according to the present embodiment can monitor the calls of each IP telephone in real time, it allows actions responding to the call content to be taken at the right time.

In the communication system 100 according to the present embodiment, the management server 420 controls the operation timing of each of the call recording server 410, the voice recognition server 430, and the monitoring device 440 for each call, based on the control information. As a result, even when the call recording server 410, the voice recognition server 430, and the monitoring device 440 are independent devices, the communication system 100 can make them operate in concert with minimal modification to them and obtain the effects described above.

In the communication system 100 according to the present embodiment, the monitoring device 440 acquires the voice recognition results stored in the voice recognition server 430 from that server and presents them. Therefore, even when there are several monitoring devices 440, each can present voice recognition results independently.

The communication system 100 according to the present embodiment can also dynamically select the voice data (call, IP telephone, speaker, or the like) subject to voice recognition, which makes monitoring the calls of a large number of IP telephones even more efficient.

Because the communication system 100 according to the present embodiment acquires the voice data of calls from the IP telephone network, it can acquire the voice data of each call efficiently and with high quality. For example, compared with providing each IP telephone with equipment for acquiring voice data, the communication system 100 can reduce the required equipment cost and space. In addition, because it can acquire high-quality voice data in which the transmitted and received voices are recorded completely separately, the communication system 100 can obtain highly accurate text data as voice recognition results and thereby achieve higher reliability.

The method of acquiring the voice data of calls transmitted over the IP telephone network is not limited to the above example. For example, when the call recording server 410 is located on the transmission path of each call's voice data, it may obtain a copy of the voice data while forwarding it.

Some or all of the functions of the management server 420, the voice recognition server 430, and the monitoring device 440 may be located in the call recording server 410.

For example, the call recording server 410 may include a processing target determination unit that determines, based on the acquired control information, whether to perform the voice recognition process on the recorded voice data. In this case, the call recording server 410 can narrow down the targets of voice recognition, which reduces the number of call start notifications transmitted.

The application of the present invention is not limited to call centers. The present invention can be applied to any IP telephone network over which multiple calls can be made, such as main telephone lines for reception, sales, and other service desks at public agencies or companies, or internal extension networks.

The call recording server may include a processing target determination unit that determines, based on the acquired control information, whether to perform the voice recognition process on the recorded voice data.

The call recording server may also include a telephone network communication unit that receives, from the IP telephone network, packets that carry the voice data and to which the control information, including the identification information of the call, is attached. The call start acquisition unit may then identify, based on the control information, the timing at which the telephone network communication unit first received a packet of the call, and acquire that timing as the call start timing.

In the call recording server, the voice recognition control unit may manage, for each call and based on the control information, the text data resulting from the voice recognition process in association with the recorded voice data.
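Associating text data with recorded voice data can be sketched by tying each recognition result to a span of recorded frames, so that a playback position can be mapped back to the text being spoken. The `Segment` frame spans and the 20 ms per frame are assumptions for illustration (20 ms is a common RTP packetization interval); the description does not specify how the association is represented.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical recognition result tied to a span of recorded frames."""
    start_frame: int  # first frame of recorded voice data the text covers
    end_frame: int    # one past the last frame covered
    text: str


class CallTranscript:
    """Text data of one call, associated with positions in its recorded voice data."""

    def __init__(self, frame_ms: int = 20) -> None:
        self.frame_ms = frame_ms  # assumed duration of one frame in milliseconds
        self.segments: List[Segment] = []

    def add(self, seg: Segment) -> None:
        # Results may arrive out of order; keep them sorted by start frame.
        self.segments.append(seg)
        self.segments.sort(key=lambda s: s.start_frame)

    def text_at(self, playback_ms: int) -> Optional[str]:
        """Return the text covering a playback position, or None in a gap."""
        frame = playback_ms // self.frame_ms
        starts = [s.start_frame for s in self.segments]
        i = bisect_right(starts, frame) - 1
        if i >= 0 and frame < self.segments[i].end_frame:
            return self.segments[i].text
        return None
```

Keeping one `CallTranscript` per call, for example in a dictionary keyed by the call identification information, gives the per-call management the paragraph describes.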

The call data management system of the present disclosure includes a call recording server that records the voice data of calls transmitted over an IP telephone network, a voice recognition server that performs the voice recognition process on the recorded voice data and generates text data as its result, and a monitoring device that presents the recorded voice data and the generated text data in association with each other. The call recording server may include: a voice recording control unit that sequentially acquires the voice data from the IP telephone network and records it in a memory; a call start acquisition unit that acquires the call start timing, at which the call started, based on control information accompanying the acquired voice data; and a voice recognition control unit that outputs the recorded voice data to the voice recognition server and causes the voice recognition server to start the voice recognition process on the voice data immediately after the acquired call start timing.

In the call data management system, the voice recognition control unit may output the recorded voice data to the monitoring device in response to a request from the monitoring device, and the voice recognition server may output the generated text data to the monitoring device in response to a request from the monitoring device. The system may further include a management server that acquires the call start timing from the call recording server and, based on the acquired call start timing, controls the operation timing of each of the call recording server, the voice recognition server, and the monitoring device.

In the call data management system, the call recording server may include a telephone network communication unit that receives, from the IP telephone network, packets that carry the voice data and to which the control information, including the identification information of the call, is attached, and the management server may control the operation timing of each of the call recording server, the voice recognition server, and the monitoring device for each call, based on the control information.

In the call data management system, the management server may determine, based on the control information, whether to perform the voice recognition process on the voice data recorded in the call recording server.

The call data management method of the present disclosure may include: a step of sequentially acquiring the voice data of a call transmitted over an IP telephone network and recording it in a memory; a step of acquiring the call start timing, at which the call started, based on control information accompanying the acquired voice data; and a step of starting the voice recognition process on the recorded voice data immediately after the acquired call start timing.

INDUSTRIAL APPLICABILITY
The present invention is useful as a call recording server, a call data management system, and a call data management method that make it easier to monitor the calls of a large number of IP telephones.

DESCRIPTION OF SYMBOLS
100 Communication system
200 External network
300 Extension network
310 Telephone
320 Network device
330 PBX device
400 Call management network
410 Call recording server
411 Telephone network communication unit
412 Management network communication unit
413 Memory
414 Voice recording control unit
415 Call start acquisition unit
416 Voice recognition control unit
420 Management server
430 Voice recognition server
440 Monitoring device

Claims

A voice recording control unit that sequentially acquires voice data of calls transmitted over the IP telephone network and records them in a memory;
A call start acquisition unit for acquiring a call start timing at which the call is started based on control information accompanying the acquired voice data;
A voice recognition control unit for starting the voice recognition process for the recorded voice data and the recording of text data as a result of the voice recognition process when the call start timing is acquired;
A management unit that, when the recording of the voice data and the recording of the text data have started, notifies a monitoring device, which acquires and presents the recorded voice data and text data by a pull-type operation, that the voice data and the text data of the call can be acquired,
Call recording server.

A processing target determining unit that determines whether or not to perform the voice recognition process on the recorded voice data based on the acquired control information;
The call recording server according to claim 1.

A telephone network communication unit that receives, from the IP telephone network, packets that carry the voice data and to which the control information, including the identification information of the call, is attached;
The call start acquisition unit
Based on the control information, specify the timing when the telephone network communication unit first received the packet of the call, and acquire the specified timing as the call start timing,
The call recording server according to claim 1.

The voice recognition control unit
Based on the control information, for each call, the text data as a result of the voice recognition process is managed in association with the recorded voice data.
The call recording server according to claim 1.

A call data management system having a call recording server, a voice recognition server, a management server, and a monitoring device,
The call recording server
Sequentially acquires, from the IP telephone network, the voice data of a call transmitted over the IP telephone network, records it in a memory, and acquires, based on control information attached to the acquired voice data, a call start timing at which the call started,
The voice recognition server
When the call start timing is acquired, the voice recognition process for the recorded voice data and the recording of text data as a result of the voice recognition process are started,
The management server
When the recording of the voice data and the recording of the text data have started, notifies the monitoring device that the voice data and the text data of the call can be acquired,
The monitoring device includes:
The recorded voice data and the text data are obtained and presented by a pull-type operation.
Call data management system.

The call recording server
In response to a request from the monitoring device, the recorded audio data is output to the monitoring device,
The voice recognition server
In response to a request from the monitoring device, the recorded text data is output to the monitoring device,
The management server
Acquiring the call start timing from the call recording server, and controlling each operation timing of the call recording server, the voice recognition server, and the monitoring device based on the acquired call start timing;
The call data management system according to claim 5.

The call recording server
Receives, from the IP telephone network, packets that carry the voice data and to which the control information, including the identification information of the call, is attached,
The management server
Based on the control information, for each call, control the operation timing of the call recording server, the voice recognition server, and the monitoring device,
The call data management system according to claim 6.

The management server
Based on the control information, determine whether or not to perform the voice recognition processing on the voice data recorded in the call recording server,
The call data management system according to claim 6.

A step in which a voice recording control unit sequentially acquires voice data of a call transmitted over the IP telephone network and records it in a memory;
A step in which a call start acquisition unit acquires a call start timing, at which the call started, based on control information attached to the voice data acquired by the voice recording control unit;
A step in which, when the call start acquisition unit acquires the call start timing, a voice recognition control unit starts the voice recognition process on the voice data recorded by the voice recording control unit and the recording of text data as a result of the voice recognition process;
A step in which, when the recording of the voice data by the voice recording control unit and the recording of the text data by the voice recognition control unit have started, a management unit notifies a monitoring device, which acquires and presents the recorded voice data and text data by a pull-type operation, that the voice data and the text data of the call can be acquired,
Call data management method.