JP2014036354A

JP2014036354A - Automatic response system and automatic response method

Info

Publication number: JP2014036354A
Application number: JP2012176797A
Authority: JP
Inventors: Noriko Ito; 範子伊東; Taketoshi Tsuyui; 武俊栗花; Mitsuteru Umezawa; 光輝梅沢
Original assignee: Hitachi Information and Telecommunication Engineering Ltd
Current assignee: Hitachi Information and Telecommunication Engineering Ltd
Priority date: 2012-08-09
Filing date: 2012-08-09
Publication date: 2014-02-24

Abstract

PROBLEM TO BE SOLVED: To provide an automatic response system and an automatic response method capable of transmitting a DTMF signal to an IVR device without leaking it to a third party, and of allowing the IVR device to correctly recognize the transmitted DTMF signal.

SOLUTION: A communication terminal converts the output signal of a push operation, together with voice, into a voice signal and transmits voice data of a first format containing the converted voice signal to a server device. The server device converts the first-format voice data received from the communication terminal into voice data of a second format different from the first format. When the second-format voice data is determined to contain data whose frequency matches that of the voice signal obtained by converting the push-operation output signal, that data is deleted from the second-format voice data and replacement voice data is generated in which the deleted portion is replaced with silence data. The deleted data is output in a third format, which is a protocol different from that of the replacement voice data.

Description

The present invention relates to an automatic response system that includes an IVR (Interactive Voice Response) device with an automatic response function, and in particular to a technique for preventing leakage of DTMF (Dual Tone Multi Frequency) signals entered by a caller.

In general, facilities that handle telephone calls from customers, such as call centers, accept customer inquiries through automatic response in order to improve operational efficiency. An IVR device provides this function. An IVR device accepts DTMF signals entered by a caller and recognizes what they represent. For example, when an IVR device is installed in a telephone banking call center, it accepts the DTMF signals entered by the caller and recognizes menu items for voice operation, account numbers, personal identification information (such as a PIN), amounts, and so on. This makes it possible to run a telephone banking call center 24 hours a day at lower cost than employing operators.

However, in a multi-site call center system composed of a data center and call centers at multiple sites (a cloud-type configuration is one example), the data center and each site's call center are often connected by a WAN (Wide Area Network). RTP voice data carried over the WAN may be stored and listened to by a third party, so there is a risk that the content of the DTMF signals entered by the caller will leak. The same risk exists inside a call center system if RTP (Real-time Transport Protocol) voice data carried over a LAN (Local Area Network) is listened to. Transmitting and receiving voice data over a network can therefore be a security weakness.

In addition, as a call center procedure for verifying a caller's personal identification information, a caller who is talking with an operator is sometimes connected temporarily to an IVR device. The caller enters personal identification information following the IVR device's automated prompts, and the IVR device recognizes the input as DTMF signals and verifies the caller's identity. At this point the caller, the IVR device, and the operator are in a three-party call, and the operator hears the DTMF signals entered by the caller. That is, even if the network problem described above is resolved, there is a risk that the content of the DTMF signals will leak whenever a three-party call is established.

As methods for preventing the content of DTMF signals from leaking to third parties other than the caller and the IVR device, Patent Document 1, for example, discloses inserting a small form-factor security device into the telephone line and encrypting the DTMF signal. Patent Document 2 discloses scrambling the DTMF signal by outputting a disturbance signal in addition to the DTMF signal.

Patent Document 1: Japanese Translation of PCT Application Publication No. 2010-514272
Patent Document 2: JP 2007-288648 A

However, the techniques disclosed in Patent Documents 1 and 2 encrypt or scramble the DTMF signal according to a fixed algorithm. In either case a risk of eavesdropping remains, and the DTMF signals entered by the caller may leak to third parties other than the caller and the IVR device.

Furthermore, when an IVR device receives DTMF signals over a WAN or LAN, the audio that carries the DTMF signal in band (RTP voice data) may be distorted if network communication quality has deteriorated or bandwidth is insufficient. In that case, the distortion prevents the IVR device from recognizing the DTMF signal correctly.

The present invention has been made in view of the above, and its object is to provide an automatic response system and an automatic response method capable of transmitting a DTMF signal to an IVR device without leaking it to a third party and of causing the IVR device to correctly recognize the transmitted DTMF signal.

To solve the problems described above and achieve the object, an automatic response system according to the present invention is a system in which a call terminal and a server device are connected via a network, the server device converting a voice signal transmitted from the call terminal into voice data and transmitting the voice data to a response device that responds to the voice data. The call terminal comprises: a transmitter unit that receives voice uttered by a caller; an input reception unit that accepts a push operation from the caller; and a transmission control unit that converts the output signal of the push operation accepted by the input reception unit and the voice into a voice signal, and transmits voice data of a first format containing the converted voice signal to the server device. The server device comprises: a first communication unit that receives the first-format voice data from the call terminal; a voice analysis generation unit that converts the first-format voice data received by the first communication unit into voice data of a second format different from the first format and, when it is determined that the second-format voice data contains data whose frequency matches the frequency of the voice signal into which the push-operation output signal was converted, deletes that data from the second-format voice data and generates replacement voice data in which it is replaced with silence data; an analysis unit that determines whether the second-format voice data contains data whose frequency matches the frequency of the voice signal into which the push-operation output signal was converted; a protocol control unit that outputs the data deleted by the voice analysis generation unit in a third format, which is a protocol different from that of the replacement voice data; and a second communication unit that transmits the replacement voice data and the data output in the third format to the response device.

An automatic response method according to the present invention is performed in an automatic response system in which a call terminal and a server device are connected via a network, the server device converting a voice signal transmitted from the call terminal into voice data and transmitting the voice data to a response device that responds to the voice data. The method includes: a transmitter step of receiving voice uttered by a caller; an input reception step of accepting a push operation on the call terminal from the caller; a first transmission step of converting the output signal of the push operation accepted in the input reception step and the voice into a voice signal, and transmitting voice data of a first format containing the converted voice signal to the server device; a reception step of receiving the first-format voice data from the call terminal; a conversion step of converting the first-format voice data received in the reception step into voice data of a second format different from the first format; an analysis step of determining whether the second-format voice data contains data whose frequency matches the frequency of the voice signal into which the push-operation output signal was converted; a generation step of, when such data is determined to be contained, deleting that data from the second-format voice data and generating replacement voice data in which it is replaced with silence data; an output step of outputting the deleted data in a third format, which is a protocol different from that of the replacement voice data; and a second transmission step of transmitting the replacement voice data and the data output in the third format to the response device.

According to the present invention, it is possible to provide an automatic response system and an automatic response method capable of transmitting a DTMF signal to an IVR device without leaking it to a third party and of causing the IVR device to correctly recognize the transmitted DTMF signal.

FIG. 1 is a diagram showing a configuration example of a call center system in which a caller and an operator are in a call.
FIG. 2 is a diagram showing a configuration example of a call center system in which a caller and an IVR device are in a call (two-party call).
FIG. 3 is a block diagram showing the functional configuration of the call terminal.
FIG. 4 is a block diagram showing the functional configuration of the gateway server.
FIG. 5 is a diagram showing a format example of the SIP message output by the SIP protocol control unit.
FIG. 6 is a diagram showing a format example of the RTP voice data in which the voice analysis generation unit replaces the DTMF signal portion.
FIG. 7 is a diagram showing an example of the DTMF table stored in the setting information storage unit.
FIG. 8 is a flowchart showing the procedure of the conversion control process.
FIG. 9 is a conceptual diagram showing the transition of data stored in the reception jitter buffer of the voice analysis generation unit.
FIG. 10 is a diagram showing how the DTMF analysis unit repeatedly samples and acquires RTP voice data at a fixed period.
FIG. 11 is a diagram showing a configuration example of a call center system in which a caller and an IVR device are in a call (three-party call).
FIG. 12 is a diagram showing a configuration example in which the automatic response system and automatic response method according to the present invention are applied to a multi-site call center system.

Embodiments of the automatic response system and automatic response method according to the present invention are described in detail below with reference to the accompanying drawings. The description covers the case where the system and method are applied to a call center system that handles telephone calls from customers, but the invention is not limited to this and applies to any arrangement that has a mechanism for transmitting DTMF signals to another party. First, the configuration of a call center system in which a caller and an operator are in a call is described.

FIG. 1 is a diagram showing a configuration example of a call center system in which a caller and an operator are in a call. As shown in FIG. 1, in this example a call terminal 700 and a call center 800 are connected via a public switched telephone network (PSTN) 900. The devices in the call center are connected to one another by a general communication line such as a LAN (Local Area Network).

First, when caller U operates the call terminal 700 and places a call to the call center, a call is established with operator O. The call voice (voice signal) S1 that caller U speaks to operator O then reaches the gateway server 801 in the call center system via the public switched telephone network 900. The gateway server 801 converts the call voice (voice signal) S1 into outgoing voice data (RTP voice data) S2 and transmits it to the call terminal 804 of the operator at the other end.

Meanwhile, the call voice that operator O speaks to caller U reaches the gateway server 801 as RTP voice data S2 through the operator's call terminal 805. The gateway server 801 then internally converts the RTP voice data S3 into call voice (voice signal) S1 and transmits it to the call terminal 700 of caller U, so that the conversation is established.

Thus, when a caller and an operator talk one to one in the call center system, the gateway server converts call voice into RTP voice data, or conversely RTP voice data into call voice, so that the conversation between the two is established. Accordingly, while caller U and operator O are talking to each other as in this example, the telephony server 802 does not manage or control IP telephony, and no automatic voice response is given by an IVR (Interactive Voice Response) device. In addition, because no DTMF signal is entered from the call terminal of caller U, the gateway server performs no DTMF signal recognition. Next, the case where a caller and an IVR device are in a call in the call center system shown in FIG. 1 is described.

FIG. 2 is a diagram showing a configuration example of a call center system in which a caller and an IVR device are in a call. As shown in FIG. 2, in this example a call terminal 100 and a call center 200 are connected via a public switched telephone network 300, as in FIG. 1.

The call terminal 100 is, for example, a terminal device such as a general mobile phone or smartphone. FIG. 3 is a block diagram showing the functional configuration of the call terminal 100. As shown in FIG. 3, the call terminal 100 includes an input reception unit 101, a receiver unit 102, a transmitter unit 103, and a communication control unit 104.

The input reception unit 101 has an operation unit such as push buttons and accepts operations from caller U (for example, a button push). The receiver unit 102 has a built-in speaker and outputs the voice signal received via the public switched telephone network 300 as sound. The transmitter unit 103 has a built-in microphone, converts the call voice uttered by caller U into a voice signal, and outputs it. The communication control unit 104 controls the operation of these units: it outputs the voice signal received from the public switched telephone network 300 to the receiver unit 102, and it converts the voice signal received from the transmitter unit 103 and the output signal of the operation accepted by the input reception unit 101 into a voice signal and sends it to the public switched telephone network 300.

The call center 200 is a facility that accepts voice signals from the call terminal 100 and handles telephone tasks such as responding to inquiries. As shown in FIG. 1, the call center 200 includes a gateway server 201, a telephony server 202, an IVR device 203, and a plurality of computers 204 and call terminals 205.

The gateway server 201 is a server that mediates communication, via the public switched telephone network 300, between the call terminal 100 and the devices in the call center 200. FIG. 4 is a block diagram showing the functional configuration of the gateway server 201. As shown in FIG. 4, the gateway server 201 includes a first transmission/reception control unit 2011, a second transmission/reception control unit 2012, a PSTN protocol control unit 2013, a SIP protocol control unit 2014, a voice analysis generation unit 2015, a DTMF analysis unit 2016, and a setting information storage unit 2017.

The first transmission/reception control unit 2011 communicates with the call terminal 100 via the public switched telephone network 300 in accordance with call control performed by the PSTN protocol control unit 2013 using a PSTN protocol such as the ISDN protocol. The first transmission/reception control unit 2011 also receives a voice signal from the call terminal 100 via the public switched telephone network 300 and outputs that voice signal to the voice analysis generation unit 2015.

The second transmission/reception control unit 2012 transmits the SIP message containing the DTMF signal, which the SIP protocol control unit 2014 generates using the INFO method of the SIP protocol, to the IP address of the IVR device 203 that the SIP protocol control unit 2014 has read from the setting information storage unit 2017. The second transmission/reception control unit 2012 also transmits, as call voice, the RTP voice data generated by the voice analysis generation unit 2015, in which the DTMF signal has been replaced with silent RTP voice data or with noise data in a frequency band unrelated to the DTMF signal, to the computers 204 and call terminals 205 in the call center. It likewise receives RTP voice data sent from the computers 204 and call terminals 205 and outputs it to the voice analysis generation unit 2015.

The PSTN protocol control unit 2013 performs the PSTN-protocol call control for the communication that the first transmission/reception control unit 2011 carries out with the call terminal 100.

The SIP protocol control unit 2014 performs the SIP-protocol call control for the communication that the second transmission/reception control unit 2012 carries out with the computers 204 and call terminals 205 in the call center. For a DTMF signal that the DTMF analysis unit 2016 has determined to match a predetermined frequency, the SIP protocol control unit 2014 outputs a SIP message that uses the INFO method of the SIP protocol to the second transmission/reception control unit 2012.

FIG. 5 is a diagram showing a format example of the SIP message output by the SIP protocol control unit 2014. As shown in FIG. 5, the SIP protocol control unit 2014 outputs a SIP message composed of a request part 501 indicating the request type, destination, and so on, a header part 502 giving an outline of the request, and a body part 503 indicating the communication-related processing for the request. The SIP protocol control unit 2014 writes the DTMF signal analyzed by the DTMF analysis unit 2016, described later, into the body part 503 and outputs the resulting SIP message to the second transmission/reception control unit 2012.
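The three-part message structure can be illustrated with a minimal Python sketch. The source only states that the DTMF signal is written into the body part 503 of a SIP INFO message; the `application/dtmf-relay` body with `Signal=` and `Duration=` lines is a widely used convention assumed here for illustration, and the function name, the `gateway.example.invalid` host, the tags, and the default duration are all hypothetical.

```python
def build_sip_info_dtmf(digit, ivr_ip, call_id, cseq, duration_ms=160):
    """Build a SIP INFO request whose body carries one DTMF symbol.

    Illustrative only: the body syntax and all header values are assumptions.
    """
    if len(digit) != 1 or digit not in "0123456789*#ABCD":
        raise ValueError(f"not a DTMF symbol: {digit!r}")
    # Body part (503 in FIG. 5): the digit, carried separately from the audio.
    body = f"Signal={digit}\r\nDuration={duration_ms}\r\n"
    # Request part (501): request type and destination.
    request_line = f"INFO sip:ivr@{ivr_ip} SIP/2.0"
    # Header part (502): outline of the request.
    headers = [
        f"Via: SIP/2.0/UDP gateway.example.invalid;branch=z9hG4bK-{call_id}-{cseq}",
        "Max-Forwards: 70",
        "From: <sip:gateway@gateway.example.invalid>;tag=gw",
        f"To: <sip:ivr@{ivr_ip}>;tag=ivr",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/dtmf-relay",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    return "\r\n".join([request_line, *headers]) + "\r\n\r\n" + body
```

Because the digit travels in a signalling message rather than in the audio stream, a listener on the RTP path alone would not hear it, which is the separation the embodiment relies on.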

The voice analysis generation unit 2015 converts the call voice (voice signal) received by the first transmission/reception control unit 2011 into RTP voice data and outputs it to the second transmission/reception control unit 2012. In doing so, on instruction from the DTMF analysis unit 2016, the voice analysis generation unit 2015 deletes the DTMF signal portion of the RTP voice data, inserts silent RTP voice data or noise data at a frequency unrelated to the DTMF signal in its place, and thereby replaces the RTP voice data.

Specifically, the voice analysis generation unit 2015 determines whether the destination of the converted RTP voice data matches the IP address of the IVR device 203 stored in the setting information storage unit 2017, described later. If the destination matches, the unit further determines whether the call between caller U and the IVR device 203 is continuing. If the call is continuing, the unit accumulates the RTP voice data in a reception jitter buffer, which is an internal storage area, and then replaces the RTP voice data as described above.

FIG. 6 is a diagram showing a format example of the RTP voice data in which the voice analysis generation unit 2015 replaces the DTMF signal portion. As shown in FIG. 6, the RTP voice data consists of RTP packets, each having an IP header part 601, a UDP header part 602, an RTP header part 603, and an RTP payload part 604. The voice analysis generation unit 2015 deletes the DTMF signal data contained in the RTP payload part 604, inserts silent RTP voice data or noise data at a frequency unrelated to the DTMF signal, and thereby replaces the RTP voice data.
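As a rough illustration of replacing the payload part 604 while leaving the RTP header part 603 intact, the following Python sketch overwrites the payload of one RTP packet with silence. It works on the UDP payload only (RTP header plus RTP payload). The codec is not specified in the source; G.711 mu-law, in which the byte `0xFF` encodes a zero-amplitude sample, is assumed here, and the function name is hypothetical.

```python
import struct

RTP_FIXED_HEADER_LEN = 12   # fixed RTP header size in bytes
PCMU_SILENCE = 0xFF         # assumption: G.711 mu-law code for a zero sample


def silence_rtp_payload(rtp_packet: bytes) -> bytes:
    """Return a copy of the packet with every payload byte set to silence."""
    if len(rtp_packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    first_byte = rtp_packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not RTP version 2")
    if first_byte & 0x20:
        # Padded packets are rejected to keep this sketch short.
        raise ValueError("padded packets are not handled in this sketch")
    # The low four bits count the 32-bit CSRC entries after the fixed header.
    header_len = RTP_FIXED_HEADER_LEN + 4 * (first_byte & 0x0F)
    if first_byte & 0x10:
        # Header extension: 16-bit profile id, then 16-bit length in words.
        (ext_words,) = struct.unpack_from("!H", rtp_packet, header_len + 2)
        header_len += 4 + 4 * ext_words
    if header_len > len(rtp_packet):
        raise ValueError("truncated RTP header")
    payload_len = len(rtp_packet) - header_len
    # Sequence number, timestamp and SSRC are preserved unchanged.
    return rtp_packet[:header_len] + bytes([PCMU_SILENCE]) * payload_len
```

Keeping the header untouched means the receiver sees a continuous stream with the same sequence numbers and timestamps, only without the tone.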

When the voice analysis generation unit 2015 accumulates the received voice signal in the reception jitter buffer, the DTMF analysis unit 2016 determines whether the RTP voice data matches a DTMF signal frequency. If it does, the DTMF analysis unit 2016 outputs the DTMF signal recognized in that determination to the SIP protocol control unit 2014. The DTMF analysis unit 2016 also instructs the voice analysis generation unit 2015 to delete the DTMF signal portion of the RTP voice data, store its position, and insert silent RTP voice data, or noise data at a frequency unrelated to the DTMF signal, at the same position.

The setting information storage unit 2017 is a storage device such as an HDD (Hard Disk Drive) or memory. The setting information storage unit 2017 stores the IP address of the IVR device 203. It also stores a DTMF table used to determine whether the RTP voice data converted by the voice analysis generation unit 2015 contains a DTMF signal.

FIG. 7 is a diagram showing an example of the DTMF table stored in the setting information storage unit 2017. As shown in FIG. 7, the DTMF table is a matrix of high frequencies and low frequencies defined by a predetermined standard. For example, when the RTP voice data converted by the voice analysis generation unit 2015 contains a voice signal synthesized from two tones, a high frequency of 1209 Hz and a low frequency of 697 Hz, the DTMF analysis unit 2016 determines that the RTP voice data contains the DTMF signal representing "1". The DTMF analysis unit 2016 determines other digits and symbols in the same way.
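The table lookup can be sketched in Python as follows. The source gives only the 1209 Hz and 697 Hz pair for "1"; the remaining entries below are the standard DTMF keypad frequencies. The source does not say how the frequency match is computed, so the Goertzel algorithm, a common choice for single-frequency detection, is substituted here as an assumption, along with the 8000 Hz sample rate, the `min_ratio` threshold, and all function names.

```python
import math

LOW_FREQS = (697, 770, 852, 941)        # row (low-group) frequencies in Hz
HIGH_FREQS = (1209, 1336, 1477, 1633)   # column (high-group) frequencies in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")  # KEYPAD[row][column]


def goertzel_power(samples, freq, sample_rate):
    """Squared magnitude of the signal's spectrum at one frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, sample_rate=8000, min_ratio=0.1):
    """Return the DTMF symbol found in the samples, or None."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0:
        return None
    low = [goertzel_power(samples, f, sample_rate) for f in LOW_FREQS]
    high = [goertzel_power(samples, f, sample_rate) for f in HIGH_FREQS]
    row = max(range(4), key=low.__getitem__)
    col = max(range(4), key=high.__getitem__)
    # For a clean two-tone signal each normalized power is close to 0.25;
    # speech or a single tone leaves at least one group far below that.
    if low[row] / (n * energy) < min_ratio:
        return None
    if high[col] / (n * energy) < min_ratio:
        return None
    return KEYPAD[row][col]
```

A production detector would add further checks, such as minimum tone duration and the level difference between the two groups, which are omitted from this sketch.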

Next, the procedure by which the gateway server 201 shown in FIG. 4 converts and transmits RTP voice data and transmits the DTMF signal in a SIP message (hereinafter called the conversion control process) is described. FIG. 8 is a flowchart showing the procedure of this conversion control process. In the example below, the gateway server 201 is assumed to have received a message indicating the start of a call from the call terminal 100.

As shown in FIG. 8, in the conversion control process, when the gateway server 201 receives a message indicating the start of a call from the call terminal 100, the voice analysis generation unit 2015 first determines whether the message contains the same IP address as the IP address of the IVR device 203 stored in the setting information storage unit 2017 (step S801).

If the voice analysis generation unit 2015 determines that the message does not contain the IP address of the IVR device 203 stored in the setting information storage unit 2017 (step S801; No), it judges the call to be a two-party call between the caller U and the operator O and ends the conversion control process. The gateway server 201 then converts the voice signal of the call into RTP audio data, as shown in FIG. 1.

If, on the other hand, the voice analysis generation unit 2015 determines that the message contains the IP address of the IVR device 203 stored in the setting information storage unit 2017 (step S801; Yes), it further determines whether the call between the caller U and the IVR device 203 is continuing (step S802). For example, the call is judged to be in progress while a session is established between the call terminal 100 and the IVR device 203.

If the voice analysis generation unit 2015 determines that the call between the caller U and the IVR device 203 is not continuing, that is, that the call has ended (step S802; No), it outputs the RTP audio data accumulated in the reception jitter buffer 20151 to the second transmission/reception control unit 2012, which transmits that RTP audio data to the IVR device 203, and the process ends (step S803).

If, on the other hand, the voice analysis generation unit 2015 determines that the call between the caller U and the IVR device 203 is continuing, that is, that the call is in progress (step S802; Yes), it converts the received voice signal into RTP audio data and accumulates the converted RTP audio data in the reception jitter buffer 20151 (step S804).

The DTMF analysis unit 2016 then analyzes the RTP audio data converted by the voice analysis generation unit 2015 (step S805) and determines whether the signal it contains is composed of frequencies listed in the DTMF table shown in FIG. 7 (step S806).

If the DTMF analysis unit 2016 determines that the converted RTP audio data contains a signal composed of frequencies listed in the DTMF table, that is, that there is a matching frequency (step S806; Yes), it extracts the information of the portion of the RTP audio data that contains the DTMF signal (DTMF signal information). The SIP protocol control unit 2014 outputs the DTMF signal information to the second transmission/reception control unit 2012 as a SIP message (INFO data), and the second transmission/reception control unit 2012 transmits the SIP message to the IVR device 203 via the telephony server 202 (step S807).
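The embodiment does not state how the DTMF signal information is encoded inside the SIP INFO message. As one possible illustration only, the sketch below assumes the `application/dtmf-relay` body convention (a `Signal` line and a `Duration` line) that many SIP products use for out-of-band DTMF. The function names are hypothetical, and a real INFO request would also need the usual SIP dialog headers, which are omitted here.

```python
VALID_KEYS = "0123456789*#ABCD"


def dtmf_relay_body(key: str, duration_ms: int) -> str:
    """Build an INFO message body for one key press (assumed dtmf-relay convention)."""
    if len(key) != 1 or key not in VALID_KEYS:
        raise ValueError(f"not a DTMF key: {key!r}")
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return f"Signal={key}\r\nDuration={duration_ms}\r\n"


def info_content_headers(body: str) -> dict:
    """Return only the content headers that describe the body above."""
    return {
        "Content-Type": "application/dtmf-relay",
        "Content-Length": str(len(body.encode("ascii"))),
    }
```

Because the key travels in the signalling message rather than in the audio stream, a listener who captures only the RTP audio never sees it, which is the separation the embodiment relies on.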

The DTMF analysis unit 2016 then instructs the voice analysis generation unit 2015 to delete the DTMF signal information from the RTP audio data, store its position, and insert at the same position either silent RTP audio data or noise data at frequencies unrelated to DTMF signals. The voice analysis generation unit 2015 replaces the RTP audio data according to the instruction (step S808).
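The replacement in step S808 can be pictured as an in-place overwrite that keeps the payload length unchanged. The following sketch assumes, purely for illustration, that the RTP payload is G.711 mu-law audio, in which the byte 0xFF encodes zero amplitude; the embodiment does not name a codec, and `replace_with_silence` is a hypothetical helper.

```python
MU_LAW_SILENCE = 0xFF  # assumption: payload is G.711 mu-law, where 0xFF is zero amplitude


def replace_with_silence(payload: bytearray, start: int, length: int,
                         fill: int = MU_LAW_SILENCE):
    """Overwrite payload[start:start+length] with silence.

    Returns (start, removed_bytes) so the caller can remember the position
    and hand the removed DTMF portion to the signalling path.
    """
    if start < 0 or length < 0 or start + length > len(payload):
        raise ValueError("span is outside the payload")
    removed = bytes(payload[start:start + length])
    payload[start:start + length] = bytes([fill]) * length  # same length, so timing is preserved
    return start, removed
```

Keeping the length identical means the downstream timing of the audio stream is not disturbed; inserting noise instead of silence would only change the `fill` pattern.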

The voice analysis generation unit 2015 then determines whether a predetermined timing (for example, a 30 ms cycle) has arrived (step S809). If the timing has arrived (step S809; Yes), the unit outputs the RTP audio data accumulated in the reception jitter buffer 20151 to the second transmission/reception control unit 2012, which transmits that RTP audio data to the IVR device 203 (step S810). If the timing has not arrived (step S809; No), the process returns to step S802, and the subsequent steps are repeated until the call between the caller U and the IVR device 203 ends.

As described above, this embodiment is an automatic response system in which the call terminal 100 and the gateway server 201 are connected via the public switched telephone network 300, the gateway server 201 converting the voice signal transmitted from the call terminal 100 into voice data and transmitting the voice data to the IVR device 203, which responds to it. In the call terminal 100, the input reception unit 101 accepts a push operation from the caller U, the transmitter unit 103 receives the voice uttered by the caller U, and the transmission control unit 104 converts the output signal of the push operation and the voice into a voice signal and transmits voice data in a first format (PSTN protocol) containing the converted voice signal to the gateway server 201. In the gateway server 201, the first transmission/reception control unit 2011 receives the first-format voice data from the call terminal 100. The voice analysis generation unit 2015 converts that data into voice data in a second format (RTP protocol) different from the first format and, when the second-format voice data is determined to contain data at a frequency matching the frequency of the voice signal into which the push-operation output signal was converted, deletes that data and generates replacement voice data in which it is replaced by silence data. The DTMF analysis unit 2016 makes that determination. The SIP protocol control unit 2014 outputs the data deleted by the voice analysis generation unit 2015 in a third format (SIP protocol), a protocol different from that of the replacement voice data, and the second transmission/reception control unit 2012 transmits the replacement voice data and the third-format data to the IVR device. The DTMF signal can therefore be transmitted to the IVR device without leaking to a third party.

With encryption or scrambling by a fixed algorithm, as in the prior art, the data may be decoded if the algorithm is broken. In this embodiment, the DTMF signal information in the RTP audio data is silenced, so there is no algorithm to break, and the DTMF content cannot be recovered even if a third party listens to the RTP audio data. Furthermore, with scrambling, the IVR device outputs a scrambling sound that the caller hears, which is grating and unpleasant; silencing the DTMF signal information causes no such discomfort. In addition, encrypted or scrambled RTP audio data must be decoded on the IVR device side, which increases the processing load there; silencing the DTMF signal information in the RTP audio data, as in this embodiment, has no such drawback.

More specifically, as shown in FIG. 2, when the caller U operates the call terminal 100, calls the call center 200, and is connected to the IVR device 203, the voice spoken by the caller U first reaches the gateway server 201 in the call center 200 as voice signal S4 via the public switched telephone network 300, as in the case shown in FIG. 1. The gateway server 201 converts the voice signal into RTP audio data and, when the IP address contained in the RTP audio data matches the IP address of the IVR device 203 stored in advance, recognizes the destination as the IVR device 203 and transmits RTP audio data S6 to the IVR device 203. Meanwhile, the guidance voice played by the IVR device 203 reaches the gateway server 201 as RTP audio data S7. The gateway server 201 converts the RTP audio data into guidance voice (voice signal S4) and transmits it to the call terminal 100 of the caller U.

When the call center 200 is operated so that the IVR device 203 accepts DTMF signals that the caller U enters on the call terminal 100, the DTMF signal entered by the caller U travels as a voice signal via the public switched telephone network 300, reaches the gateway server 201 in the call center 200, and is converted into RTP audio data. As described above, the gateway server 201 determines whether the destination of the RTP audio data is the IVR device 203 and, only when it is, further determines whether the RTP audio data contains a signal matching a predetermined DTMF frequency. If it does, the gateway server 201 transmits the information of the matching signal to the IVR device 203 via the telephony server 202 using the INFO method of the SIP protocol. The gateway server 201 then deletes from the RTP audio data the portion that was transmitted by the SIP INFO method and inserts silent RTP audio data of the same length in its place. The RTP audio data processed in this way is transmitted to the IVR device 203 like any other call audio (RTP audio data).

With this scheme, the DTMF portion of RTP audio data containing a DTMF signal is separated from the rest before each is sent to the IVR device 203. Consequently, even if RTP audio data is captured and listened to (wiretapped) on the network inside the call center 200, the content of the DTMF signal that the caller U entered on the call terminal 100 does not become known to a third party. Moreover, because the gateway server 201 mediates the connection between the call center 200 and the public switched telephone network 300 and performs the conversion control process internally, the DTMF signal is already silenced on the internal network of the call center 200, which removes the risk of DTMF content leaking out from that network.

FIG. 9A illustrates how the data stored in the reception jitter buffer 20151 of the voice analysis generation unit 2015 changes over time. As shown in FIG. 9A, RTP audio data 1, which the voice analysis generation unit 2015 converted from a voice signal carrying call speech, is first accumulated in the reception jitter buffer 20151, and the DTMF analysis unit 2016 analyzes the frequencies of the audio signal contained in it. Because RTP audio data 1 is call speech, it does not match any DTMF frequency, and it is left in the reception jitter buffer as it is (procedure 1).

The voice analysis generation unit 2015 then accumulates the next input, RTP audio data 2, in the reception jitter buffer 20151 in the same way as RTP audio data 1 (procedure 2), and the DTMF analysis unit 2016 analyzes the frequencies of the audio signal contained in RTP audio data 2. Because RTP audio data 2 is a DTMF signal entered by the caller U on the call terminal 100, it matches a DTMF frequency. RTP audio data 2 is therefore converted into RTP audio data 3, in which that portion is replaced by silence, and RTP audio data 3 is accumulated in the reception jitter buffer (procedure 3).

Next, RTP audio data 4 is input. Because it is call speech, the DTMF analysis unit 2016 determines, as in procedure 1, that it is not a DTMF signal, and it is accumulated in the reception jitter buffer as it is (procedure 4). When RTP audio data 4 has been accumulated, the voice analysis generation unit 2015 determines that the predetermined timing has arrived and notifies the second transmission/reception control unit 2012 to transmit the RTP audio data accumulated in the reception jitter buffer 20151 at that point to the IVR device 203. The second transmission/reception control unit 2012 transmits the data according to the notification (procedure 5).
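Procedures 1 to 5 can be walked through with a toy model of the reception jitter buffer. This is only a sketch under simplifying assumptions: each RTP audio data item is a labelled frame, the DTMF decision is supplied as a flag rather than computed, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    label: str
    is_dtmf: bool = False  # stands in for the DTMF analysis result


@dataclass
class ReceiveJitterBuffer:
    frames: List[Frame] = field(default_factory=list)      # audio waiting to be sent as RTP
    extracted: List[Frame] = field(default_factory=list)   # DTMF portions for the SIP path

    def push(self, frame: Frame) -> None:
        """Accumulate a frame, swapping a DTMF frame for a silent one of the same slot."""
        if frame.is_dtmf:
            self.extracted.append(frame)
            frame = Frame(label=f"silence({frame.label})")
        self.frames.append(frame)

    def flush(self) -> List[Frame]:
        """Hand over everything accumulated so far (the periodic transmission)."""
        out, self.frames = self.frames, []
        return out


def figure_9a_walkthrough() -> List[str]:
    buf = ReceiveJitterBuffer()
    buf.push(Frame("RTP 1"))                 # procedure 1: speech, stored unchanged
    buf.push(Frame("RTP 2", is_dtmf=True))   # procedures 2 and 3: DTMF, stored as silence
    buf.push(Frame("RTP 4"))                 # procedure 4: speech, stored unchanged
    return [f.label for f in buf.flush()]    # procedure 5: transmit at the timing
```

The silent frame produced from "RTP 2" plays the role of RTP audio data 3 in the description above.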

Since a DTMF signal is normally longer than 50 ms, the capacity of the reception jitter buffer 20151 must be set to more than 50 ms so that the RTP audio data of the DTMF signal can be replaced with silenced RTP audio data in procedure 3 before the RTP audio data is transmitted to the IVR device 203. If the buffer capacity is made extremely large, for example about 500 ms, transmission and reception of RTP audio data may be delayed. However, the delay occurs only while the caller U is operating the call terminal 100 in response to the automatic voice response of the IVR device 203, so it is not large enough to delay work at the call center 200 and does not hinder operations overall. In addition, because RTP audio data is transmitted to the IVR device 203 only at fixed timings, the processing load on the gateway server 201 is lower than with sequential transmission.

The length of a DTMF signal also varies with how long the caller U operates the call terminal 100 (for example, how long a push button is held down). Therefore, as shown in FIG. 9B, the DTMF analysis unit 2016 repeatedly samples the RTP audio data that the voice analysis generation unit 2015 output to the reception jitter buffer 20151 at a fixed period (for example, 5 ms, one tenth of the length of a DTMF signal) and determines whether each sample matches a DTMF frequency. If it does, the DTMF analysis unit 2016 notifies the voice analysis generation unit 2015 of the frequency and the length of the DTMF signal (for example, 50 ms if the same frequency is detected 10 times at 5 ms intervals).

The DTMF analysis unit 2016 then extracts DTMF signal information of that length from the RTP audio data. The SIP protocol control unit 2014 outputs the DTMF signal information to the second transmission/reception control unit 2012 as a SIP message (INFO data), and the second transmission/reception control unit 2012 transmits the SIP message to the IVR device 203 via the telephony server 202. Meanwhile, the DTMF analysis unit 2016 instructs the voice analysis generation unit 2015 to delete DTMF signal information of that length and insert at the same position either silent RTP audio data or noise data at frequencies unrelated to DTMF signals. The voice analysis generation unit 2015 replaces the RTP audio data according to the instruction.

At this point, the DTMF analysis unit 2016 detects a frequency different from the DTMF signal in period T1, one period before the periods in which the DTMF frequency is detected (or in period T2, one period after them). It therefore compares at least the frequency detected in the periods where the DTMF frequency was found with the frequencies detected in the periods immediately before and after, and determines whether those neighboring frequencies differ from the DTMF frequency. If they do, it determines that a DTMF signal spanning the interval between them was input. In the example of FIG. 9B, when the DTMF analysis unit 2016 detects a frequency different from the DTMF signal at timing T2, it determines that a DTMF signal lasting until period T2 was input, and the portion of the RTP audio data corresponding to that DTMF signal is silenced. Likewise, when a different frequency is detected after a shorter length than a normal DTMF signal, it determines that a DTMF signal lasting until that period, T3, was input, and the corresponding portion of the RTP audio data is silenced.
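The length determination above amounts to finding runs of identical detection results bounded by a different result on either side. A minimal sketch follows, assuming the per-period detection results are already available as a list in which `None` means "no DTMF frequency detected"; `find_dtmf_spans` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def find_dtmf_spans(detections: List[Optional[str]],
                    period_ms: int = 5) -> List[Tuple[str, int, int]]:
    """Group per-period detections into (key, start_ms, length_ms) spans.

    A span ends at the first period whose result differs from the run,
    which mirrors the comparison with the neighbouring periods (T1, T2, T3).
    A run that reaches the end of the list is also reported; a real
    implementation might instead wait for the next period to confirm it.
    """
    spans = []
    start = 0
    for i in range(1, len(detections) + 1):
        if i == len(detections) or detections[i] != detections[start]:
            key = detections[start]
            if key is not None:
                spans.append((key, start * period_ms, (i - start) * period_ms))
            start = i
    return spans
```

With ten consecutive detections of the same key at 5 ms intervals, the reported length is 50 ms, matching the example in the text. A double press of the same button, separated by at least one non-DTMF period, yields two separate spans, so the gap between them is not silenced.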

Because silencing is performed only after the change in frequency has been confirmed, the DTMF signal produced by a single operation on the call terminal 100 is reliably silenced. For example, even when a button on the call terminal 100 is pressed twice in succession, the DTMF analysis unit 2016 does not misjudge the length of the DTMF signal, and RTP audio data that is not part of a DTMF signal is not silenced by mistake.

The example of FIG. 2 described the case where the caller U and the IVR device 203 are in a call. In practice, however, the operator O in the call center 200 may check the situation, resulting in a three-party call among the caller U, the IVR device 203, and the operator O. In this case, as shown in FIG. 10, the caller U operates the call terminal 100, calls the call center 200, and enters a call with the operator O; the operator O then operates the computer 204 or the call terminal 205 to connect to the IVR device 203 so that the caller U and the IVR device 203 can communicate. The call between the caller U and the operator O does not end, and the caller U, the operator O, and the IVR device 203 are in a three-party call.

Even in this case, the transmission scheme for call audio between the caller U and the operator O and between the caller U and the IVR device 203 is the same as in FIG. 2 (S4 to S9). The DTMF signal information is transmitted only to the IVR device 203 by the INFO method of the SIP protocol, while the gateway server 201 replaces the RTP audio data containing DTMF signal information with silenced RTP audio data, which is transmitted to the IVR device 203 and the call terminal 205. Therefore, as in FIG. 2, the DTMF signal information is protected from being listened to (wiretapped) even when the third party is an operator of the same call center.

また、図２に示した例では、１つのコールセンタを対象としたが、複数のコールセンタを有したマルチサイトコールセンタシステムについて適用することも可能である。図１１は、本発明にかかる自動応答システムおよび自動応答方法を、マルチサイトコールセンタシステムに適用した場合の構成例を示す図である。図１１では、マルチサイトコールセンタシステムにおいて、架電者ＵとＩＶＲ装置１１０２が通話状態にある場合を示している。 Further, in the example shown in FIG. 2, one call center is targeted, but the present invention can also be applied to a multi-site call center system having a plurality of call centers. FIG. 11 is a diagram showing a configuration example when the automatic response system and the automatic response method according to the present invention are applied to a multi-site call center system. FIG. 11 shows a case where the caller U and the IVR device 1102 are in a call state in the multi-site call center system.

図１１に示すように、マルチサイトコールセンタシステムでは、テレフォニーサーバ１１０１およびＩＶＲ装置１１０２が、コールセンタとは別のセンタ（データセンタ）１１００に設けられ、各コールセンタとデータセンタ１１００とが互いにＷＡＮにより接続されている。このような場合であっても、ゲートウェイサーバ２０１がＤＴＭＦ信号情報をＳＩＰプロトコルのＩＮＦＯメソッドによりＩＶＲ装置１１０２に送信し、ＲＴＰ音声データのうちのＤＴＭＦ信号情報を無音化したＲＴＰ音声データに差し替える点は、図２に示した場合と同様である（Ｓ４、Ｓ１０〜Ｓ１３）。 As shown in FIG. 11, in the multi-site call center system, the telephony server 1101 and the IVR device 1102 are provided in a center (data center) 1100 different from the call center, and each call center and the data center 1100 are connected to each other by a WAN. ing. Even in such a case, the gateway server 201 transmits the DTMF signal information to the IVR apparatus 1102 by the SIP protocol INFO method, and replaces the DTMF signal information in the RTP audio data with the silenced RTP audio data. This is the same as the case shown in FIG. 2 (S4, S10 to S13).

In a multi-site call center system, as shown in FIG. 11, the data center 1100 and the call center 200 at each site are often connected by the WAN 1100. As in FIG. 2, the DTMF signal entered by the caller U is transmitted over the WAN 1100 as RTP audio data whose DTMF signal information was silenced at the gateway server 201. Therefore, even if RTP audio data is captured and listened to (wiretapped) on the WAN 1100, the content of the DTMF signal entered by the caller U does not become known to a third party.

Furthermore, when the communication quality of the WAN 1100 is poor or bandwidth is insufficient, RTP audio data may be distorted on the WAN 1100, and the IVR device 1102 has in some cases been unable to recognize DTMF signals correctly from RTP audio data. By switching to a scheme in which the DTMF signal information is conveyed to the IVR device 1102 by the INFO method of the SIP protocol, as in FIG. 2, the IVR device can recognize the DTMF signal information correctly without being affected by such distortion. Encryption or scrambling as in the prior art can strengthen security against eavesdropping, but because the DTMF signal is still sent to the IVR device over RTP, it cannot prevent misrecognition at the IVR device.

When the call center system or multi-site call center system described above is linked with a call recording system that records callers' voices, the call recording system records the RTP audio data after the DTMF signal information has been silenced, so no grating, unnecessary sounds are played back when the recording is listened to. Furthermore, where the call center converts RTP audio data recorded by the call recording system into text, encryption or scrambling as in the prior art risks the encrypted or scrambled portions being misconverted and not transcribed correctly. In this embodiment the audio data is silenced, so that risk does not arise.

DESCRIPTION OF SYMBOLS
100 Call terminal
101 Input reception unit
102 Receiver unit
103 Transmitter unit
104 Communication control unit
200 Call center
201 Gateway server
2011 First transmission/reception control unit
2012 Second transmission/reception control unit
2013 PSTN protocol control unit
2014 SIP protocol control unit
2015 Voice analysis generation unit
2016 DTMF analysis unit
2017 Setting information storage unit

Claims

An automatic response system in which a call terminal is connected via a network to a server device that converts a voice signal transmitted from the call terminal into voice data and transmits the voice data to a response device that responds to the voice data, wherein
the call terminal comprises:
a transmitter unit that receives voice uttered by a caller;
an input receiving unit that receives a push operation from the caller; and
a transmission control unit that converts the voice and the output signal produced by the push operation received by the input receiving unit into a voice signal, and transmits voice data in a first format including the converted voice signal to the server device, and
the server device comprises:
a first communication unit that receives the voice data in the first format from the call terminal;
a voice analysis generation unit that converts the voice data in the first format received by the first communication unit into voice data in a second format different from the first format and, when it is determined that the voice data in the second format includes data of a frequency matching the frequency of the voice signal obtained by converting the output signal of the push operation, deletes that data from the voice data in the second format and generates replacement voice data in which the deleted data is replaced with silence data;
an analysis unit that determines whether the voice data in the second format includes data of a frequency matching the frequency of the voice signal obtained by converting the output signal of the push operation;
a protocol control unit that outputs the data deleted by the voice analysis generation unit in a third format using a protocol different from that of the replacement voice data; and
a second communication unit that transmits the replacement voice data and the data output in the third format to the response device.
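The frequency-matching determination and the replacement with silence data can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes 8 kHz linear samples in 20 ms frames and uses the Goertzel algorithm, which the source does not name, and every function name and threshold here is hypothetical. Production DTMF detectors also apply duration and harmonic checks that are omitted.

```python
import math

SAMPLE_RATE = 8000                      # Hz; assumed telephony sampling rate
LOW_FREQS = (697, 770, 852, 941)        # DTMF row tones (Hz)
HIGH_FREQS = (1209, 1336, 1477, 1633)   # DTMF column tones (Hz)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate=SAMPLE_RATE):
    """Squared magnitude of the signal at `freq` via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s_prev, s_prev2 = x + coeff * s_prev - s_prev2, s_prev
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, min_ratio=0.5):
    """Return the DTMF key present in one frame, or None (thresholds assumed)."""
    energy = sum(x * x for x in samples)
    if energy == 0:
        return None
    low_power = {f: goertzel_power(samples, f) for f in LOW_FREQS}
    high_power = {f: goertzel_power(samples, f) for f in HIGH_FREQS}
    low = max(low_power, key=low_power.get)
    high = max(high_power, key=high_power.get)
    p_low, p_high = low_power[low], high_power[high]
    # For a clean dual tone this normalized ratio is close to 1.
    if 2.0 * (p_low + p_high) / (len(samples) * energy) < min_ratio:
        return None  # most energy lies outside the DTMF tones
    if min(p_low, p_high) < 0.1 * max(p_low, p_high):
        return None  # only one of the two tones is really present
    return KEYPAD[LOW_FREQS.index(low)][HIGH_FREQS.index(high)]


def silence_dtmf(frames):
    """Split frames into replacement audio and the keys that were removed.

    `removed` holds one entry per silenced frame, so a long key press
    appears several times; merging them is left to the caller.
    """
    replacement, removed = [], []
    for frame in frames:
        key = detect_dtmf(frame)
        if key is None:
            replacement.append(list(frame))
        else:
            replacement.append([0.0] * len(frame))  # silence data
            removed.append(key)
    return replacement, removed


def dtmf_tone(key, n_samples=160, sample_rate=SAMPLE_RATE):
    """Generate a test frame containing the two tones for `key`."""
    row = next(i for i, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    lo, hi = LOW_FREQS[row], HIGH_FREQS[col]
    return [0.5 * math.sin(2 * math.pi * lo * n / sample_rate)
            + 0.5 * math.sin(2 * math.pi * hi * n / sample_rate)
            for n in range(n_samples)]
```

In this sketch the `replacement` list corresponds to the replacement voice data, while `removed` is the information that would be handed to the protocol control unit for output in the third format.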

The automatic response system according to claim 1, wherein
the voice analysis generation unit includes a buffer unit that stores the voice data in the second format,
the voice analysis generation unit accumulates the voice data in the second format in the buffer unit until a predetermined period arrives and, when the period arrives, outputs the voice data in the second format accumulated up to that point to the second communication unit, and
the second communication unit transmits the output voice data in the second format to the response device.
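The buffering behavior can be pictured with a small sketch. It assumes that the predetermined period is measured by counting fixed-length frames rather than by a wall clock, which is a simplification not stated in the source; the class name `PeriodicBuffer` and its parameters are hypothetical.

```python
class PeriodicBuffer:
    """Accumulate second-format frames and release them once per period."""

    def __init__(self, period_ms, frame_ms=20):
        if period_ms <= 0 or frame_ms <= 0 or period_ms % frame_ms:
            raise ValueError("period_ms must be a positive multiple of frame_ms")
        self.frames_per_period = period_ms // frame_ms
        self._frames = []

    def push(self, frame):
        """Store one frame; return the accumulated batch when the period arrives.

        Returns None while the period has not yet arrived.
        """
        self._frames.append(frame)
        if len(self._frames) < self.frames_per_period:
            return None
        batch, self._frames = self._frames, []
        return batch


# Usage: with a 60 ms period and 20 ms frames, every third push yields a batch
# that would be handed to the second communication unit.
buffer = PeriodicBuffer(period_ms=60, frame_ms=20)
for frame in (b"f1", b"f2", b"f3"):
    batch = buffer.push(frame)
```

Holding frames back for one period gives the analysis side time to decide whether a frame belongs to a key press before it leaves the server device, at the cost of added latency equal to the period.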

The automatic response system according to claim 1 or 2, wherein
the analysis unit acquires, by sampling at a predetermined period, frequency data included in the voice data in the second format output by the voice analysis generation unit, determines whether the frequency data acquired at a given timing differs from the frequency data acquired at the timing before or after it and, when it determines that the frequency data differs, instructs the voice analysis generation unit to silence the frequency data for the length acquired up to that point, and
the voice analysis generation unit generates the replacement voice data in accordance with the instruction.
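One way to read the sampling logic is as run-length grouping: each sampling period yields either a detected key or nothing, and a change between neighboring periods closes the current run, whose length is then silenced. The sketch below works on that reading under the assumption that per-period detection results are already available as a list; `find_dtmf_spans` and `silence_spans` are hypothetical names.

```python
def find_dtmf_spans(labels):
    """Group per-period detections into (start_index, length, key) runs.

    `labels` holds one entry per sampling period: a key such as "5",
    or None when no DTMF frequency was found in that period.
    """
    spans = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends when the next period differs or the input ends.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None:
                spans.append((start, i - start, labels[start]))
            start = i
    return spans


def silence_spans(frames, spans):
    """Return a copy of `frames` with every frame inside a span zeroed."""
    out = [list(frame) for frame in frames]
    for start, length, _key in spans:
        for i in range(start, start + length):
            out[i] = [0] * len(frames[i])
    return out
```

Note that two presses of the same key with no gap between them would merge into one run under this simple grouping, so a real analysis unit would need an additional rule for that case.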

The automatic response system according to any one of claims 1 to 3, wherein
the transmission control unit outputs the voice data in the first format conforming to the PSTN (Public Switched Telephone Network) protocol,
the voice analysis generation unit outputs the voice data in the second format conforming to RTP (Real-time Transport Protocol), and
the protocol control unit outputs data in a format conforming to SIP (Session Initiation Protocol).
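To make the third format concrete, the sketch below builds one SIP request that carries a removed key press outside the RTP stream. The claim requires only conformance to SIP; the use of the INFO method with an `application/dtmf-relay` body (a widely deployed convention rather than an IETF-standardized media type) is an assumption of this example, as are the function name and all header values, which are placeholders.

```python
DTMF_KEYS = "0123456789*#ABCD"


def build_sip_info(key, duration_ms, *, request_uri, via, from_hdr, to_hdr,
                   call_id, cseq):
    """Build a SIP INFO request carrying one DTMF key (illustrative only).

    All header values are supplied by the caller; in a real gateway they
    would come from the dialog already established with the response device.
    """
    if len(key) != 1 or key not in DTMF_KEYS:
        raise ValueError(f"not a DTMF key: {key!r}")
    body = f"Signal={key}\r\nDuration={duration_ms}\r\n"
    lines = [
        f"INFO {request_uri} SIP/2.0",
        f"Via: {via}",
        "Max-Forwards: 70",
        f"From: {from_hdr}",
        f"To: {to_hdr}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/dtmf-relay",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # A blank line separates the headers from the message body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


# Usage with placeholder addresses under the reserved .invalid domain.
message = build_sip_info(
    "5", 160,
    request_uri="sip:ivr@example.invalid",
    via="SIP/2.0/UDP gw.example.invalid;branch=z9hG4bK-demo",
    from_hdr="<sip:gw@example.invalid>;tag=1",
    to_hdr="<sip:ivr@example.invalid>;tag=2",
    call_id="abc123@gw.example.invalid",
    cseq=2,
)
```

Sending the key as signaling data means the response device never has to decode tones from audio, while anything that only captures the RTP stream hears silence.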

An automatic response method performed in an automatic response system in which a call terminal is connected via a network to a server device that converts a voice signal transmitted from the call terminal into voice data and transmits the voice data to a response device that responds to the voice data, the method comprising:
a transmission step of receiving voice uttered by a caller;
an input receiving step of receiving a push operation performed on the call terminal by the caller;
a first transmission step of converting the voice and the output signal produced by the push operation received in the input receiving step into a voice signal, and transmitting voice data in a first format including the converted voice signal to the server device;
a reception step of receiving the voice data in the first format from the call terminal;
a conversion step of converting the voice data in the first format received in the reception step into voice data in a second format different from the first format;
an analysis step of determining whether the voice data in the second format includes data of a frequency matching the frequency of the voice signal obtained by converting the output signal of the push operation;
a generation step of, when it is determined that the voice data in the second format includes data of a frequency matching the frequency of the voice signal obtained by converting the output signal of the push operation, deleting that data from the voice data in the second format and generating replacement voice data in which the deleted data is replaced with silence data;
an output step of outputting the deleted data in a third format using a protocol different from that of the replacement voice data; and
a second transmission step of transmitting the replacement voice data and the data output in the third format to the response device.