KR20170102394A

KR20170102394A - Call Recording Server, Call-Data Management System, and Call-Data Management Method

Info

Publication number: KR20170102394A
Application number: KR1020177024362A
Authority: KR
Inventors: Seigo Arai; Mitsuru Tsutsumi; Takeshi Moriwaki
Original assignee: Advanced Media, Inc.
Priority date: 2014-03-17
Filing date: 2015-03-11
Publication date: 2017-09-08
Also published as: KR20160100412A; KR101826918B1; WO2015141189A1; JP5860085B2; JP2015177411A; CN106068641A; CN106068641B; TW201540041A; TWI569619B

Abstract

A call recording server is provided that makes it possible to monitor the call content of many IP telephones more simply. The call recording server 410 includes: a voice recording control unit 414 that sequentially acquires voice data of a call transmitted over an IP telephone network and records it in a memory; a call start acquisition unit 415 that acquires, based on control information accompanying the acquired voice data, the call start timing at which the call started; and a speech recognition control unit 416 that causes speech recognition processing on the recorded voice data to start immediately after the acquired call start timing.

Description

[0001] The present invention relates to a call recording server, a call data management system, and a call data management method.

The present disclosure relates to a call recording server, a call data management system, and a call data management method for recording and managing call voice data in an IP telephone network.

Call centers have long checked or monitored call content (hereinafter, "monitoring") for various purposes such as improving service quality. In recent years, IP telephony based on VoIP (Voice over Internet Protocol) technology has also become widespread. Accordingly, various techniques exist for recording and managing call voice data in an IP telephone network (see, for example, Patent Document 1).

In the technique described in Patent Document 1 (hereinafter, the "prior art"), an operator terminal in a call center transmits two items to a server: business history information created by the operator after the telephone response has ended, and speech recognition information, which is the result of speech recognition processing on the voice data of the call. A manager terminal acquires the business history information and the speech recognition information and presents them to a manager.

With this prior art, the manager can check the speech recognition result for a call in the IP telephone network after the call has ended. That is, the prior art makes it possible to monitor the content of IP telephone calls.

Patent Document 1: Japanese Patent Application Laid-Open No. 2008-211271

However, picking out the calls that need checking after each call has ended, searching for the voice data of those calls, and reviewing the accumulated speech recognition results and voice data is cumbersome and time-consuming. When the number of IP telephones is large, as in a large-scale call center, the amount of business history information and speech recognition information accumulated in the server also grows, and this work becomes extremely burdensome. The prior art is therefore difficult to apply when a large number of IP telephones must be monitored.

An object of the present disclosure is to provide a call recording server, a call data management system, and a call data management method that make it possible to monitor the call content of many IP telephones more simply.

The call recording server of the present disclosure includes: a voice recording control unit that sequentially acquires voice data of a call transmitted over an IP telephone network and records it in a memory; a call start acquisition unit that acquires, based on control information accompanying the acquired voice data, the call start timing at which the call started; and a speech recognition control unit that causes speech recognition processing on the recorded voice data to start immediately after the acquired call start timing.

The call data management system of the present disclosure includes a call recording server that records voice data of a call transmitted over an IP telephone network, a speech recognition server that performs speech recognition processing on the recorded voice data and generates text data as the result of that processing, and a monitoring device that presents the recorded voice data and the generated text data in association with each other. The call recording server includes: a voice recording control unit that sequentially acquires the voice data from the IP telephone network and records it in a memory; a call start acquisition unit that acquires, based on control information accompanying the acquired voice data, the call start timing at which the call started; and a speech recognition control unit that outputs the recorded voice data to the speech recognition server and causes the speech recognition server to start speech recognition processing on the voice data immediately after the acquired call start timing.

The call data management method of the present disclosure includes: a step of sequentially acquiring voice data of a call transmitted over an IP telephone network and recording it in a memory; a step of acquiring, based on control information accompanying the acquired voice data, the call start timing at which the call started; and a step of starting speech recognition processing on the recorded voice data immediately after the acquired call start timing.

According to the present disclosure, speech recognition processing on the voice data of a call transmitted over an IP telephone network starts immediately after the call start timing, so the speech recognition result can be presented in near real time while the call is still in progress. The present disclosure therefore makes it possible to monitor the call content of many IP telephones more simply.

FIG. 1 is a system configuration diagram showing an example of the configuration of a communication system including a call data management system according to an embodiment of the present disclosure.
FIG. 2 is a block diagram showing an example of the configuration of the call recording server according to the present embodiment.
FIG. 3 is a flowchart showing an example of the operation of the call recording server according to the present embodiment.
FIG. 4 is a sequence diagram showing an example of the flow of operation of the communication system according to the present embodiment.

Hereinafter, an embodiment of the present disclosure will be described in detail with reference to the drawings. The present embodiment is one concrete example in which the present disclosure is applied to a call monitoring system of a call center equipped with many IP telephones.

<System configuration>

First, the configuration of a communication system including the call data management system according to the present embodiment will be described.

FIG. 1 is a system configuration diagram showing an example of the configuration of a communication system including the call data management system according to the present embodiment.

In FIG. 1, the communication system 100 has an external network 200, an internal network 300, and a call management network 400.

The external network 200 is a public network such as the Internet, and is a communication network to which IP terminals (not shown) used by customers of the call center are connected. That is, the external network 200 constitutes part of the IP telephone network formed by the call center.

The internal network 300 is part of a communication network, such as a LAN (Local Area Network), built in the call center. The internal network 300 has first to N-th telephones 310_1 to 310_N, a network device 320, and a PBX (Private Branch eXchange) device 330.

Each telephone 310 is an IP telephone used by an operator who handles customers. The first to N-th telephones 310_1 to 310_N are each connected to the PBX device 330 via the network device 320.

The network device 320 is a relay device that forwards IP packets between each telephone 310 and the PBX device 330, and is, for example, a switching hub, a TAP box, or a router. In addition, the network device 320 transmits a copy of each IP packet it forwards to the call management network 400, using a function such as port mirroring.

The PBX device 330 is a private branch exchange and is connected to the external network 200. The PBX device 330 receives, from the external network 200, IP packets destined for the first to N-th telephones 310_1 to 310_N and forwards them to the network device 320. The PBX device 330 also receives, from the network device 320, IP packets destined for an IP telephone (not shown) on the external network 200 and forwards them to the external network 200.

That is, the internal network 300 constitutes part of the IP telephone network and, while forwarding the IP packets of the many calls handled in the call center, transmits copies of those IP packets to the call management network 400.

The call management network 400 is, for example, part of a communication network such as a LAN built in the call center, and is the portion corresponding to the call data management system of the present disclosure. The call management network 400 has a call recording server 410, a management server 420, a speech recognition server 430, and a monitoring device 440.

The connection relationship among the devices is not limited to the connection lines shown in FIG. 1. For example, each device is connected to the LAN so that any two devices can communicate with each other.

The call recording server 410 is connected to the network device 320 of the internal network 300. The call recording server 410 receives the IP packets transmitted from the network device 320, and extracts and records the voice data of calls from the received IP packets. That is, the call recording server 410 records the voice data of calls transmitted over the IP telephone network.

FIG. 2 is a block diagram showing an example of the configuration of the call recording server 410.

In FIG. 2, the call recording server 410 has a telephone network communication unit 411, a management network communication unit 412, a memory 413, a voice recording control unit 414, a call start acquisition unit 415, and a speech recognition control unit 416.

The telephone network communication unit 411 is a communication interface for connecting to the communication network of the internal network 300, and is connected to the network device 320. The telephone network communication unit 411 receives the IP packets transmitted from the network device 320 and sequentially outputs the received IP packets to the voice recording control unit 414 and the call start acquisition unit 415.

The management network communication unit 412 is a communication interface for connecting to the communication network of the call management network 400, and is connected to the management server 420, the speech recognition server 430, and the monitoring device 440.

The memory 413 is a recording medium such as a hard disk, and holds the information stored by the voice recording control unit 414 so that it can be read out.

The voice recording control unit 414 analyzes the input IP packets and extracts voice data (call voice signal) and control information (communication control signal) from each IP packet. The voice recording control unit 414 then stores the extracted voice data sequentially in the memory 413, in association with information that identifies the voice data, such as the control information. That is, the voice recording control unit 414 sequentially acquires voice data from the IP telephone network and records it in the memory 413.

The voice data is acoustic data containing the uttered speech of both speakers in a call. The control information is information accompanying the voice data and includes call identification information, speaker identification information, and time information. The call identification information identifies the call (for example, the telephone numbers of both parties). The speaker identification information identifies the speaker (IP telephone) of the uttered speech contained in the voice data. The time information indicates the time to which the voice data corresponds. The control information may be obtained from the header portion of an IP packet or from the payload portion of an IP packet.
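As a rough illustration of the control information described above, the following sketch models one stored chunk of voice data together with its call identification, speaker identification, and time information. The class names, field names, and the `make_call_id` helper are hypothetical and are not taken from the disclosure, which does not fix a concrete data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlInfo:
    """Control information accompanying voice data (hypothetical layout)."""
    call_id: str      # call identification information, e.g. both telephone numbers
    speaker_id: str   # identifies the speaker (IP telephone) of the utterance
    timestamp: float  # time to which the voice data corresponds, in seconds


@dataclass(frozen=True)
class VoiceRecord:
    """One chunk of voice data stored together with its control information."""
    control: ControlInfo
    payload: bytes


def make_call_id(number_a: str, number_b: str) -> str:
    """Build a call identifier from both telephone numbers, regardless of direction."""
    return "|".join(sorted((number_a, number_b)))


# Example record; the numbers and the speaker label are made up for illustration.
record = VoiceRecord(
    control=ControlInfo(
        call_id=make_call_id("0900001111", "0312345678"),
        speaker_id="310-1",
        timestamp=0.02,
    ),
    payload=b"\x00\x01",
)
```

Sorting the two numbers is only one possible convention; it makes packets travelling in either direction map to the same call.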

The call start acquisition unit 415 analyzes the input IP packets and extracts, from each IP packet, control information including the call identification information. Based on the extracted control information, the call start acquisition unit 415 identifies, for each call, the timing at which the telephone network communication unit 411 first received an IP packet of that call. The call start acquisition unit 415 acquires the identified timing as the timing at which the call started (hereinafter, the "call start timing"). Each time it acquires a call start timing, the call start acquisition unit 415 notifies the speech recognition control unit 416 that the call start timing has arrived, together with the control information of the corresponding call.
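The first-packet rule described above can be sketched as follows. This is a minimal illustration that assumes each packet has already been reduced to a call identifier and a receive time; the class and method names are hypothetical.

```python
from typing import Dict, Optional


class CallStartDetector:
    """Treats the first packet seen for a call as that call's start (illustrative)."""

    def __init__(self) -> None:
        # Maps call identifier -> receive time of the first packet of that call.
        self._start_times: Dict[str, float] = {}

    def observe(self, call_id: str, received_at: float) -> Optional[float]:
        """Return the call start timing if this is the call's first packet, else None."""
        if call_id in self._start_times:
            return None
        self._start_times[call_id] = received_at
        return received_at


detector = CallStartDetector()
first = detector.observe("call-1", 10.0)   # first packet of call-1: a call start
second = detector.observe("call-1", 10.02) # later packet of the same call: not a start
```

A non-`None` return value would be the point at which the speech recognition control unit is notified.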

If the control information contains information that directly indicates the call start timing, such as information indicating the call start time, the call start acquisition unit 415 may acquire the call start timing from that information.

The extraction of voice data and control information from IP packets may instead be performed by the telephone network communication unit 411.

When notified that the call start timing has arrived, the speech recognition control unit 416 transmits, via the management network communication unit 412, a call start notification indicating the call start timing to the management server 420. The call start notification includes, for example, the control information.

When the speech recognition control unit 416 receives a request to transmit voice data (hereinafter, a "voice transmission request") via the management network communication unit 412, it returns the requested voice data recorded in the memory 413 to the requester. The voice transmission request includes information that identifies the voice data, such as the control information. Voice transmission requests are transmitted from, for example, the speech recognition server 430 and the monitoring device 440. A voice transmission request is, for example, a request that specifies call identification information and asks that the voice data of the corresponding call be returned in order as soon as it is stored.
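One way to picture a voice transmission request that asks for voice data "in order as soon as it is stored" is a per-call store that a requester reads with a cursor. The sketch below is only an assumption about how this could be organized; `VoiceStore`, `append`, and `read_from` are hypothetical names, and the disclosure does not specify the transfer mechanism.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class VoiceStore:
    """Per-call voice data store that requesters read incrementally (illustrative)."""

    def __init__(self) -> None:
        self._chunks: DefaultDict[str, List[bytes]] = defaultdict(list)

    def append(self, call_id: str, chunk: bytes) -> None:
        """Record one chunk of voice data for the given call, in arrival order."""
        self._chunks[call_id].append(chunk)

    def read_from(self, call_id: str, cursor: int) -> Tuple[List[bytes], int]:
        """Return the chunks stored since `cursor` and the new cursor position."""
        chunks = self._chunks.get(call_id, [])
        return chunks[cursor:], len(chunks)


store = VoiceStore()
store.append("call-1", b"aa")
store.append("call-1", b"bb")
chunks, cursor = store.read_from("call-1", 0)            # everything stored so far
store.append("call-1", b"cc")
more_chunks, cursor2 = store.read_from("call-1", cursor) # only the newly stored chunk
```

Each requester (for example the speech recognition server or the monitoring device) would keep its own cursor, so both can follow the same call independently.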

As described later, when the call start notification is transmitted, the speech recognition server 430, under the management function of the management server 420, requests voice data from the call recording server 410 and starts speech recognition processing on the returned voice data.

That is, as a result of transmitting the call start notification, the speech recognition control unit 416 outputs the recorded voice data to the speech recognition server 430 and causes the speech recognition server 430 to start, immediately after the call start timing, speech recognition processing on the voice data recorded in the memory 413.

The management server 420 in FIG. 1 acquires the call start timing by receiving the call start notification transmitted from the call recording server 410. Based on the acquired call start timing, the management server 420 controls the operation timing of each of the call recording server 410, the speech recognition server 430, and the monitoring device 440.

More specifically, upon receiving a call start notification, the management server 420 decides, based on the control information included in the notification, whether to perform speech recognition processing on the voice data of the call indicated by the notification.

When the management server 420 decides to perform speech recognition, it transmits to the speech recognition server 430 a request to start speech recognition processing on the voice data recorded in the call recording server 410 (hereinafter, a "recognition start request"). The recognition start request includes information that identifies the voice data, such as the control information.

When the management server 420 decides to perform speech recognition, it also forwards the call start notification to the monitoring device 440. Furthermore, when the management server 420 receives from the speech recognition server 430 a notification that speech recognition processing has started (hereinafter, a "recognition start notification"), it forwards that recognition start notification to the monitoring device 440. The recognition start notification includes information that identifies the voice data, such as the control information.
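The management server's handling of a call start notification can be sketched as a small dispatch function. The disclosure does not state the criterion used to decide whether a call is recognized; the allow-list of monitored telephones below is purely an assumption for illustration, as are the message names and the dictionary layout.

```python
from typing import Dict, List, Set, Tuple

# (destination, message type, control information); a hypothetical message shape.
Message = Tuple[str, str, Dict[str, str]]


def handle_call_start(control: Dict[str, str], monitored: Set[str]) -> List[Message]:
    """Decide whether to recognize the call and return the messages to send.

    Assumption: a call is recognized when its telephone is in `monitored`.
    """
    if control.get("speaker_id") not in monitored:
        return []  # no speech recognition for this call, so nothing is sent
    return [
        ("speech_recognition_server", "recognition_start_request", control),
        ("monitoring_device", "call_start_notification", control),
    ]


control_info = {"call_id": "call-1", "speaker_id": "310-1"}
messages = handle_call_start(control_info, monitored={"310-1", "310-2"})
skipped = handle_call_start({"call_id": "call-2", "speaker_id": "310-9"}, {"310-1"})
```

Both outgoing messages carry the control information, matching the statement that each request or notification includes information identifying the voice data.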

Upon receiving a recognition start request, the speech recognition server 430 transmits to the call recording server 410 a voice transmission request targeting the same voice data as the recognition start request. The speech recognition server 430 then performs speech recognition processing on the voice data returned from the call recording server 410, generates text data as the result of the speech recognition processing, and stores it in a memory (not shown) of the speech recognition server 430.

The speech recognition server 430 performs speech recognition processing using a known speech recognition technique. For example, the speech recognition server 430 has a speech recognition database, an acoustic analysis unit, and a recognition decoder unit (none of which are shown).

The speech recognition database stores an acoustic model, a dictionary, and a language model in advance. The acoustic model is data representing the probabilistic correspondence between speech feature quantities and phonetic symbols. The dictionary describes a plurality of text strings as candidates for the speech recognition result. The language model is data representing the occurrence probability and connection probability of each text string described in the dictionary.

The acoustic analysis unit divides the speech signal into frames and performs predetermined processing, including Fourier analysis, on each frame to extract speech feature quantities. The acoustic analysis unit then detects, from the analysis result, the speech sections that contain uttered speech, and generates time-series data consisting only of the speech feature quantities of those speech sections.
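The framing, per-frame Fourier analysis, and speech section detection described above can be illustrated with a deliberately small sketch. It is not the recognizer used in the disclosure: the naive DFT, the fixed frame size, and the simple energy threshold used to decide which frames contain speech are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math
from typing import List, Sequence


def split_frames(signal: Sequence[float], size: int, hop: int) -> List[Sequence[float]]:
    """Cut the signal into frames of `size` samples, advancing by `hop` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 (adequate for tiny illustrative frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def voiced_features(signal: Sequence[float], size: int = 8, hop: int = 8,
                    energy_threshold: float = 0.01) -> List[List[float]]:
    """Return spectra only for frames whose mean energy exceeds the threshold."""
    features = []
    for frame in split_frames(signal, size, hop):
        energy = sum(x * x for x in frame) / len(frame)
        if energy > energy_threshold:  # crude stand-in for speech section detection
            features.append(magnitude_spectrum(frame))
    return features


silence = [0.0] * 8
tone = [math.sin(2 * math.pi * 2 * t / 8) for t in range(8)]  # 2 cycles per frame
features = voiced_features(silence + tone + silence)
```

Here only the middle frame survives, and its spectrum peaks at bin 2, which corresponds to the two cycles per frame of the synthetic tone.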

The recognition decoder unit determines the speech recognition result from the time-series data of speech feature quantities generated by the acoustic analysis unit, by referring to the acoustic model, dictionary, and language model in the speech recognition database.

When speech recognition succeeds, the speech recognition result includes text data obtained by converting the uttered speech into text. That is, the speech recognition server 430 performs speech recognition processing on the voice data recorded in the call recording server 410 and generates text data as the result of that processing.

When the speech recognition server 430 receives from the monitoring device 440 a request to transmit text data (hereinafter, a "recognition result transmission request"), it returns the stored speech recognition result for the requested voice data to the monitoring device 440. The recognition result transmission request includes information that identifies the voice data, such as the control information of the original voice data. A recognition result transmission request is, for example, a request that specifies call identification information and asks that the speech recognition results of the corresponding call be returned in order as soon as they are generated.

The monitoring device 440 is the part, functioning as a web browser, of a personal computer used by a manager who monitors the calls in the call center. Upon receiving a call start notification from the management server 420, the monitoring device 440 transmits a voice transmission request to the call recording server 410 and a recognition result transmission request to the speech recognition server 430.

The monitoring device 440 then displays, on a display unit such as a liquid crystal display, the voice data returned from the call recording server 410 in association with at least the text data among the speech recognition results returned from the speech recognition server 430. That is, from immediately after the call start timing, the monitoring device 440 presents the voice data and its speech recognition result (text data) to the manager in association with each other.

Although not shown, each of the call recording server 410, the management server 420, the speech recognition server 430, and the monitoring device 440 has, for example, a CPU (Central Processing Unit), a storage medium such as a ROM (Read Only Memory) storing a control program, a working memory such as a RAM (Random Access Memory), and a communication circuit. In this case, the functions of the devices and units described above are realized by the CPU executing the control program.

In this communication system 100, the call recording server 410 can stream the voice data of calls transmitted over the IP telephone network to the speech recognition server 430. The speech recognition server 430 can in turn stream the speech recognition results for that voice data to the monitoring device 440.

That is, because the communication system 100 can start speech recognition processing and the presentation of speech recognition results immediately after the call start timing, it can present the speech recognition results in near real time while the call is in progress.

<Operation of the call recording server>

Next, the operation of the call recording server will be described.

FIG. 3 is a flowchart showing an example of the operation of the call recording server 410.

First, in step S1100, the telephone network communication unit 411 determines whether it has received an IP packet from the external network 200. If an IP packet has been received (S1100: YES), the telephone network communication unit 411 advances the processing to step S1200. If no IP packet has been received (S1100: NO), it advances the processing to step S1500, described later.

In step S1200, the voice recording control unit 414 extracts voice data from the IP packet and records it in the memory 413 in association with information that identifies the voice data, such as the control information. The call start acquisition unit 415 also extracts the control information from the IP packet.

Then, in step S1300, the call start acquisition unit 415 determines, based on the control information, whether the current point is a call start timing. If it is a call start timing (S1300: YES), the call start acquisition unit 415 advances the processing to step S1400. If it is not (S1300: NO), it advances the processing to step S1500, described later.

In step S1400, the voice recognition control unit 416 transmits a call start notification to the management server 420 via the management network communication unit 412.

Then, in step S1500, the voice recognition control unit 416 determines whether a voice transmission request has been received via the management network communication unit 412. If a voice transmission request has been received (S1500: YES), the voice recognition control unit 416 advances the processing to step S1600. If no voice transmission request has been received (S1500: NO), it advances the processing to step S1700, described later.

In step S1600, the voice recognition control unit 416 starts transferring voice data to the sender of the voice transmission request (the requester).

Then, in step S1700, the voice recognition control unit 416 determines whether it has been instructed, by an administrator's operation or the like, to end the processing for monitoring call data. If it has not been instructed to end the processing (S1700: NO), the voice recognition control unit 416 returns the processing to step S1100. If it has been instructed to end the processing (S1700: YES), it ends the series of processes.

Through this operation, the call recording server 410 can acquire the call start timing of a call transmitted over the IP telephone network and, immediately after that timing, cause voice recognition processing to start on the voice data of that call.
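The loop of FIG. 3 can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions, not the implementation of the call recording server 410: the names `Packet`, `CallRecordingServer`, and `run_once`, and the `is_call_start` flag standing in for the control information, are all hypothetical, and notifications and transfers are merely appended to lists instead of being sent over a network.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Packet:
    """Hypothetical stand-in for a received IP packet."""
    call_id: str          # control information: which call the voice belongs to
    is_call_start: bool   # assumed flag derived from the control information
    voice: bytes          # voice payload carried by the packet


@dataclass
class CallRecordingServer:
    memory: dict = field(default_factory=dict)               # call_id -> recorded chunks
    start_notifications: list = field(default_factory=list)  # stands in for S1400 sends
    transfers: list = field(default_factory=list)            # stands in for S1600 transfers

    def run_once(self, packet: Optional[Packet] = None,
                 request: Optional[str] = None) -> None:
        """One pass through steps S1100 to S1600; S1700 is left to the caller's loop."""
        if packet is not None:                                   # S1100: packet received?
            # S1200: record the voice data keyed by its identifying information
            self.memory.setdefault(packet.call_id, []).append(packet.voice)
            if packet.is_call_start:                             # S1300: call start timing?
                self.start_notifications.append(packet.call_id)  # S1400: notify
        if request is not None:                                  # S1500: request received?
            self.transfers.append(request)                       # S1600: start transfer
```

A caller would invoke `run_once` repeatedly until an end instruction arrives, which corresponds to the S1700 branch.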

<Operation of the entire system>

Next, an example of the overall operation flow of the communication system 100 will be described.

FIG. 4 is a sequence chart showing an example of the operation flow of the communication system 100.

First, the monitoring device 440 transmits to the management server 420 the conditions for the voice data to be subjected to voice recognition processing, and sets them in advance (S2010). Such conditions are, for example, speaker identification information, a call time period, or that the call contains a predetermined word. In other words, the monitoring device 440 registers the targets of voice recognition processing with the management server 420 in advance. Then, when a call starts, the network device 320 begins transmitting IP packets to the call recording server 410 (S2020).

The call recording server 410 starts extracting voice data and control information from each incoming IP packet and recording the voice data (S2030), and at the same time transmits a call start notification to the management server 420 (S2040). At this point, the call recording server 410 holds at least the voice data of the initial part of the call.

The management server 420 determines whether to perform voice recognition on the voice data, based on the control information included in the call start notification and the conditions set in S2010 (S2050). When it determines that voice recognition is to be performed, the management server 420 transmits a recognition start request to the voice recognition server 430 (S2060) and, at the same time, transmits a call start notification to the monitoring device 440 (S2070). On receiving the recognition start request, the voice recognition server 430 transmits a voice transmission request to the call recording server 410 (S2080).
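The decision in S2050 can be illustrated with a small Python sketch. It assumes, purely for illustration, that the control information arrives as a dictionary with `speaker_id` and `started_at` fields and that the preset conditions are a set of speaker IDs and a time window; the names `Condition` and `should_recognize` are hypothetical. The "call contains a predetermined word" condition is left out because it cannot be evaluated from control information alone.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class Condition:
    """Hypothetical form of the conditions registered in S2010."""
    speaker_ids: Optional[frozenset] = None  # None means any speaker
    start: Optional[time] = None             # start of the call time period
    end: Optional[time] = None               # end of the call time period


def should_recognize(control: dict, cond: Condition) -> bool:
    """Return True when the call described by `control` matches `cond`."""
    if cond.speaker_ids is not None and control["speaker_id"] not in cond.speaker_ids:
        return False
    if cond.start is not None and cond.end is not None:
        if not (cond.start <= control["started_at"] <= cond.end):
            return False
    return True
```

With a condition such as `Condition(frozenset({"op-17"}), time(9, 0), time(17, 0))`, only calls by speaker `op-17` that start between 09:00 and 17:00 would be passed on for recognition.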

As described above, the call recording server 410 holds at least the voice data of the initial part of the call. Accordingly, on receiving the voice transmission request, the call recording server 410 returns the stored voice data to the voice recognition server 430 (S2090). So that highly accurate voice recognition results are obtained, the voice data sent to the voice recognition server 430 preferably retains the quality of the voice data as extracted from the IP packets.

In this way, the voice recognition server 430 starts voice recognition processing on the voice data stored in the call recording server 410 (S2100). At this point, the voice recognition server 430 holds at least the voice recognition result for the initial part of the call. The voice recognition server 430 also transmits a recognition start notification to the management server 420 (S2110).

Because such a recognition start notification is issued, even a monitoring device 440 that obtains its display content by a pull-type operation, such as a web browser, can acquire and display the voice data and the voice recognition results in real time.

The management server 420 forwards the recognition start notification received from the voice recognition server 430 to the monitoring device 440 (S2120). This recognition start notification, or the call start notification transmitted in step S2070, preferably includes identification information of the voice recognition server 430 as information indicating where the voice recognition results are to be obtained. On receiving the recognition start notification, the monitoring device 440 transmits a recognition result transmission request to the voice recognition server 430 (S2130).

As described above, the voice recognition server 430 holds at least the voice recognition result for the initial part of the call. Accordingly, on receiving the recognition result transmission request, the voice recognition server 430 transmits the stored voice recognition results to the monitoring device 440 (S2140).

The monitoring device 440 further transmits a voice transmission request to the call recording server 410 (S2150) and receives voice data from the call recording server 410 (S2160). In the call recording server 410, the voice recognition control unit 416 preferably converts the voice data to be sent to the monitoring device 440 into voice data in a format that a web browser can output. The monitoring device 440 then displays the received voice data and the voice recognition results in association with each other (S2170).

When, for example, multiple calls to be monitored are in progress at the same time, the monitoring device 440 can obtain the voice recognition results for those calls on a per-call basis, based on the call identification information or speaker identification information included in the control information of each piece of voice data. In this case, the monitoring device 440 preferably displays the voice recognition results for the multiple calls simultaneously on a single web browser screen.
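As a rough illustration of obtaining results per call, the following Python sketch groups interleaved recognition results by call identification information. The record layout (dictionaries with `call_id` and `text` keys) and the function name `group_by_call` are assumptions made for this example only.

```python
from collections import defaultdict


def group_by_call(results):
    """Group recognition results by their call ID, preserving arrival order."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result["call_id"]].append(result["text"])
    return dict(grouped)
```

Each key of the returned dictionary could then back one panel of a single monitoring screen.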

Through this operation, the communication system 100 can narrow the voice recognition targets down to those that are needed, while beginning the voice recognition processing and the presentation of the voice recognition results immediately after the call start timing. The communication system 100 can also display the voice data of a call and its voice recognition results in real time in a web browser.

The various requests transmitted in the communication system 100 may each request processing of the data of an entire call in a single request, or may request processing of part of a call's data at a time, in units such as a packet, a frame, or a batch of voice recognition results. In the latter case, for example, a frame number or the event number of a voice recognition result can be used as identification information specifying the processing target.
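Partial requests keyed by an event number can be sketched as below. This is a minimal Python illustration assuming that event numbers are simply consecutive integers starting at 0; the class `ResultStore` and its methods are hypothetical and only show how a pull-type client could ask for "everything from event N onward".

```python
class ResultStore:
    """Hypothetical store of recognition results, indexed by event number."""

    def __init__(self):
        self._events = []

    def append(self, text):
        """Store one result and return its event number."""
        self._events.append(text)
        return len(self._events) - 1

    def fetch_since(self, next_event):
        """Return results from `next_event` onward and the next number to ask for."""
        new = self._events[next_event:]
        return new, next_event + len(new)
```

A client keeps the returned number and passes it in its next request, so each pull transfers only results it has not yet seen.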

<Effects of the present embodiment>

As described above, the communication system 100, which includes the call data management system according to the present embodiment, starts voice recognition processing on the voice data of a call transmitted over the internal network 300 forming the IP telephone network immediately after the call start timing. The communication system 100 can thereby present the voice recognition results for the voice data of calls on the IP telephone network in near real time while the call is in progress.

As noted above, the work of picking out the calls that need checking after each call has ended, searching for the voice data of those calls, and checking the accumulated voice recognition results and voice data becomes extremely cumbersome when the number of IP telephones is very large.

In contrast, the communication system 100 according to the present embodiment presents the content of each call to the administrator in real time, which makes it possible to monitor each call efficiently while avoiding such cumbersome work. The communication system 100 according to the present embodiment therefore makes it possible to monitor the call content of many IP telephones more simply and in real time.

In addition, when the administrator takes an action such as advising an operator, checking the call content only after the call has ended, as in the prior art, means that the action comes later than the appropriate time. In contrast, the communication system 100 according to the present embodiment can monitor the call content of each IP telephone in real time, and thus makes it possible to take an action suited to the call content at the right time.

In the communication system 100 according to the present embodiment, the management server 420 also controls, for each call and based on the control information, the operation timing of each of the call recording server 410, the voice recognition server 430, and the monitoring device 440. As a result, even if the call recording server 410, the voice recognition server 430, and the monitoring device 440 are independent devices, the communication system 100 according to the present embodiment can make them operate in coordination with minimal changes to those devices and obtain the effects described above.

In the communication system 100 according to the present embodiment, the monitoring device 440 acquires from the voice recognition server 430 the voice recognition results stored there and presents them. Therefore, even when there are multiple monitoring devices 440, the communication system 100 according to the present embodiment can present the voice recognition results independently on each monitoring device 440.

The communication system 100 according to the present embodiment can also dynamically select the voice data (by call, IP telephone, speaker, or the like) to be subjected to voice recognition, which makes it possible to monitor the call content of many IP telephones even more efficiently.

The communication system 100 according to the present embodiment also acquires the voice data of calls from the IP telephone network, so it can acquire the voice data of each call with both high quality and high efficiency. For example, compared with installing equipment for acquiring voice data at each IP telephone, the communication system 100 according to the present embodiment can reduce the required equipment cost and equipment space. Furthermore, because the communication system 100 according to the present embodiment can acquire high-quality voice data in which the transmitted voice and the received voice are recorded completely separately, it can obtain highly accurate text data as the voice recognition result and thus achieve higher reliability.

The method of acquiring the voice data of calls transmitted over the IP telephone network is not limited to the example described above. For example, if the call recording server 410 is placed on the transmission path of the voice data of each call, it may obtain a copy of the voice data as it forwards that data.

Some or all of the functions of the management server 420, the voice recognition server 430, and the monitoring device 440 may also be placed in the call recording server 410.

For example, the call recording server 410 may have a processing target determination unit that determines, based on the acquired control information, whether to perform voice recognition processing on the recorded voice data. In this case, the voice recognition targets can be selected in the call recording server 410, which reduces the number of call start notifications that are transmitted.

The application of the present disclosure is not limited to call centers. The present disclosure can be applied to various IP telephone networks in which multiple calls can take place, such as the main telephone numbers of reception, sales, and other service desks of public institutions or companies, and in-house extension telephone networks.

The call recording server may also have a processing target determination unit that determines, based on the acquired control information, whether to perform the voice recognition processing on the recorded voice data.

The call recording server may also have a telephone network communication unit that receives, from the IP telephone network, packets that contain the voice data and to which the control information, including identification information of the call, is attached; and the call start acquisition unit may identify, based on the control information, the timing at which the telephone network communication unit first received a packet of the call, and acquire the identified timing as the call start timing.
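Treating the first received packet of a call as its start can be sketched as follows. This Python example assumes, for illustration only, that each packet is reduced to a `(received_at, call_id)` pair in arrival order; the function name `call_start_timings` is hypothetical.

```python
def call_start_timings(packets):
    """Map each call ID to the reception time of its first packet.

    `packets` is an iterable of (received_at, call_id) pairs in arrival order.
    """
    starts = {}
    for received_at, call_id in packets:
        # setdefault keeps the time of the first packet and ignores later ones
        starts.setdefault(call_id, received_at)
    return starts
```

In a streaming setting the same check would be made per packet, with a notification issued the first time a call ID is seen.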

In the call recording server, the voice recognition control unit may also manage, for each call and based on the control information, the text data resulting from the voice recognition processing in association with the recorded voice data.

The call data management system of the present disclosure is a call data management system having a call recording server that records voice data of a call transmitted over an IP telephone network, a voice recognition server that performs voice recognition processing on the recorded voice data and generates text data as the result of the voice recognition processing, and a monitoring device that presents the recorded voice data and the generated text data in association with each other. The call recording server may have a voice recording control unit that sequentially acquires the voice data from the IP telephone network and records it in a memory, a call start acquisition unit that acquires, based on control information accompanying the acquired voice data, the call start timing at which the call started, and a voice recognition control unit that outputs the recorded voice data to the voice recognition server and causes the voice recognition server to start voice recognition processing on the voice data immediately after the acquired call start timing.

In the call data management system, the voice recognition control unit may output the recorded voice data to the monitoring device in response to a request from the monitoring device, the voice recognition server may output the generated text data to the monitoring device in response to a request from the monitoring device, and the system may have a management server that acquires the call start timing from the call recording server and, based on the acquired call start timing, controls the operation timing of each of the call recording server, the voice recognition server, and the monitoring device.

In the call data management system, the call recording server may also have a telephone network communication unit that receives, from the IP telephone network, packets that contain the voice data and to which the control information, including identification information of the call, is attached, and the management server may control, for each call and based on the control information, the operation timing of each of the call recording server, the voice recognition server, and the monitoring device.

In the call data management system, the management server may also determine, based on the control information, whether to perform the voice recognition processing on the voice data recorded in the call recording server.

The call data management method of the present disclosure may have a step of sequentially acquiring voice data of a call transmitted over an IP telephone network and recording it in a memory, a step of acquiring, based on control information accompanying the acquired voice data, the call start timing at which the call started, and a step of causing voice recognition processing on the recorded voice data to start immediately after the acquired call start timing.

The disclosures of the specification, drawings, and abstract included in Japanese Patent Application No. 2014-053355, filed on March 17, 2014, are incorporated herein by reference in their entirety.

<Industrial applicability>

The present disclosure is useful as a call recording server, a call data management system, and a call data management method that make it possible to monitor the call content of many IP telephones more simply.

100 communication system
200 external network
300 internal network
310 telephone
320 network device
330 PBX device
400 call management network
410 call recording server
411 telephone network communication unit
412 management network communication unit
413 memory
414 voice recording control unit
415 call start acquisition unit
416 voice recognition control unit
420 management server
430 voice recognition server
440 monitoring device

Claims

1. A call data management system comprising a call recording server, a voice recognition server, a management server, and a monitoring device, wherein
the call recording server
sequentially acquires, from an IP telephone network, voice data of a call transmitted over the IP telephone network and records the acquired voice data in a memory, and acquires, based on control information accompanying the acquired voice data, a call start timing at which the call started,
the voice recognition server,
immediately after the call start timing is acquired, starts, based on a recognition start request received from the management server, voice recognition processing on the recorded voice data and recording of text data that is the result of the voice recognition processing,
the management server
accepts from the monitoring device a setting of a condition for the voice data to be subjected to the voice recognition processing, determines, based on the control information and the set condition, that the voice recognition is to be performed, and, immediately after the recording of the voice data and the recording of the text data are started, transmits to the monitoring device a notification that includes information indicating the acquisition destination of the text data and that indicates that the voice data and the text data can be acquired, and
the monitoring device
acquires and presents the recorded voice data and the text data by a pull-type operation.

2. The call data management system according to claim 1, wherein
the call recording server
outputs the recorded voice data to the monitoring device in response to a request from the monitoring device,
the voice recognition server
outputs the recorded text data to the monitoring device in response to a request from the monitoring device, and
the management server
acquires the call start timing from the call recording server and controls, based on the acquired call start timing, the operation timing of each of the call recording server, the voice recognition server, and the monitoring device.

3. The call data management system according to claim 2, wherein
the call recording server
receives, from the IP telephone network, packets that contain the voice data and to which the control information, including identification information of the call, is attached, and
the management server
controls, for each call and based on the control information, the operation timing of each of the call recording server, the voice recognition server, and the monitoring device.

4. The call data management system according to any one of claims 1 to 3, wherein
the monitoring device
acquires the recorded voice data and the text data by a pull-type operation of a web browser.