KR102597261B1

KR102597261B1 - A system that can introduce artificial intelligence-based voice recognition service without modifying the existing in-house phone system

Info

Publication number: KR102597261B1
Application number: KR1020210159176A
Authority: KR
Inventors: 이종우; 유상준; 배영식; 임승민; 최선진
Original assignee: (주) 큰사람커넥트
Priority date: 2021-11-18
Filing date: 2021-11-18
Publication date: 2023-11-03
Also published as: KR20230072681A

Abstract

The present invention relates to an artificial intelligence (AI)-based voice recognition providing system that adds an AI-based voice recognition function to an Internet telephone system. The system comprises: an in-house telephone system that includes an Internet telephone system; a relay telephone system configured to receive a call connection request directed from a calling terminal to the Internet telephone system and to connect the call to the Internet telephone system; and an AI-based voice recognition server configured to receive voice data from the relay telephone system and return the voice recognition result to the relay telephone system. The relay telephone system includes a voice processing module that extracts only the voice data from the voice data packets sent by the calling terminal and transmits it, and an STT (speech-to-text) interworking interface that receives, from the voice recognition server, the recognized voice recognition information and the natural language processing information.

Description

A system that can introduce artificial intelligence-based voice recognition service without modifying the existing in-house phone system

The present invention relates to an AI-based voice recognition providing system that adds an AI-based voice recognition function to an Internet telephone system, and more specifically to a system that can introduce an AI-based voice recognition service without modifying the existing in-house telephone system.

ARS systems have recently matured and are widely used. However, many callers find ARS and voice recognition systems frustrating because they are rigidly scripted and do not support two-way interaction, and such callers are often connected to an agent without having entered the correct information. As a result, building the system increases costs while agent time is still wasted.

In this situation, CTI (Computer Telephony Integration) is used to connect ARS calls accurately and to distribute consultations smoothly within the call center. In this regard, Korean Patent Publication No. 2012-0070417 (published June 29, 2012) discloses a call center PBX link device that allows general-purpose IP-PBXs and CTI middleware from various manufacturers to interwork easily without protocol changes.

To make CTI work more smoothly, adding an AI-based voice recognition function to the in-house telephone system has recently been considered. FIG. 1 shows the minimum configuration needed to perform voice recognition in an in-house telephone system (10) equipped with a conventional CTI module (100).

As shown in FIG. 1, providing an AI-based voice recognition function in the in-house telephone system (10) equipped with the CTI module (100) requires not only building a dedicated AI voice recognition server (30), but also building, inside the in-house telephone system (10), a voice reception module and an STT (speech-to-text) interworking interface that can communicate with the AI voice recognition server (30).

For a company's in-house telephone system, this means modifying, for example, a CTI-enabled call system or a VoIP-based Centrex or IP-PBX call system. Such call systems are typically used for a long time without modification after they are first deployed. To integrate one with a specific system such as an AI or voice recognition system, the required functions (voice reception and STT interworking) must be developed into the existing service system and then rolled out to the production system.

This amounts to reworking an in-house telephone system that is in daily service. Cost is one problem. In addition, because most telephone systems are used for years after deployment without modification or added functions, the work is impossible without development support from the vendor that built the telephone system. Even the vendor faces difficulty: it must carry out separate development for the specific system to be integrated and adapt its own product to the characteristics of the product being introduced.

The present invention was devised in view of the problems described above, and its purpose is to provide a system that can introduce an AI-based voice recognition system without modifying the existing in-house telephone system.

To solve the problem described above, one aspect of the present invention provides a system for providing an AI-based voice recognition service without modifying the in-house telephone system, the system comprising:

an in-house telephone system including an Internet telephone system;

a relay telephone system configured to receive a call connection request directed from a calling terminal to the Internet telephone system and to connect the call to the Internet telephone system; and

an AI-based voice recognition server configured to receive voice data from the relay telephone system and return the voice recognition result to the relay telephone system,

wherein the relay telephone system includes:

a voice processing module that extracts only the voice data from the voice data packets sent by the calling terminal and transmits it; and

an STT (speech-to-text) interworking interface that receives, from the voice recognition server, the recognized voice recognition information and the natural language processing information.

In the above aspect, the voice recognition server includes:

a voice recognition module that converts voice data into text-form voice recognition information using a voice recognition model built through deep learning;

a natural language processing module that extracts numbers and meaningful words from the text-form voice recognition information or analyzes the relationships within sentences; and

an interworking interface that delivers the voice recognition information and the natural language processing information from the voice recognition module and the natural language processing module to the STT interworking interface.

In any one of the above aspects, the STT interworking interface is configured to receive the voice recognition information and the natural language processing information from the voice recognition server and to deliver them to the in-house telephone system.

In any one of the above aspects, the Internet telephone system includes a CTI server module that performs functions including call information extraction, call status monitoring, call transfer, call creation, and call termination, and the CTI server module is configured to provide information about the customer at the calling terminal.

In any one of the above aspects, the in-house telephone system includes a web interface module; the STT interworking interface receives the voice recognition information and the natural language processing information from the voice recognition server and delivers them to the web interface module of the in-house telephone system; and the web interface module displays the received voice recognition information and natural language processing information on the user's monitor.

According to the present invention, a relay telephone system that relays voice recognition between the in-house telephone system and the AI voice recognition server makes it possible to provide AI services, including voice recognition, more conveniently and effectively without changing the existing in-house telephone system.

FIG. 1 is a diagram showing an example of a conventional system configuration for providing an AI voice recognition service to an in-house telephone system;
FIG. 2 is a diagram showing the configuration of a system that provides an AI voice recognition service to an in-house telephone system according to an embodiment of the present invention; and
FIG. 3 is a flowchart showing the processing flow over time among the components of the AI-based voice recognition providing system according to the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms.

The embodiments in this specification are provided so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of its scope. The present invention is defined only by the scope of the claims. Accordingly, in some embodiments, well-known components, operations, and techniques are not described in detail, to avoid obscuring the invention.

Like reference numerals refer to like elements throughout the specification. The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, singular forms include plural forms unless the wording specifically states otherwise. In addition, components and operations described as "including" (or "having") do not exclude the presence or addition of one or more other components and operations.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly so defined.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 2 is a block diagram illustrating the concept of providing an AI-based voice recognition service to an existing in-house telephone system (10) that does not itself offer such a service.

As shown in FIG. 2, the voice recognition providing system (1) includes: a relay telephone system (20) that receives an outgoing call from a calling terminal; an in-house telephone system (10) that is interworked with the relay telephone system for call switching; and an AI voice recognition server (30) that receives voice data from the relay telephone system (20) and returns the voice recognition result to the relay telephone system (20).

In this embodiment, a network lies between the calling terminal and the relay telephone system. The network may be a mobile communication network, the Internet (for example, a VoIP (Voice over Internet Protocol) voice packet network), or a telephone network such as a PSTN (Public Switched Telephone Network) or PSDN (Packet Switched Data Network).

When the relay telephone system (20) receives a call placed to the in-house telephone system (10), it performs telephone interworking with the in-house telephone system (10) and, at the same time, passes the voice data to the AI voice recognition server (30). It then receives the recognized sentences and/or natural language processed information from the AI voice recognition server (30) and delivers them to the in-house telephone system (10).

To perform these operations, the relay telephone system (20) includes a voice processing module (22) and an STT interworking interface (24). In a VoIP system, voice is compressed with a codec and carried in RTP (Real-time Transport Protocol) packets. So that the voice recognition server (30) can perform recognition, the voice processing module (22) decompresses this audio in real time, restores it to PCM data, and sends it to the voice recognition server. RTP voice is also transmitted in segments of 10 to 20 ms, so the voice processing module (22) collects these segments and buffers enough audio for the voice recognition server to process. Because hundreds of calls must be handled simultaneously, the voice processing module (22) decodes and buffers each call independently in real time, and works with the voice recognition server (30) to transmit the data in real time and communicate with the interface.
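
The decode-and-buffer step described above can be sketched as follows. This is a minimal illustration, not the implementation disclosed in the specification: the specification does not name a codec, so G.711 mu-law at 8 kHz is assumed here, and the names `rtp_payload`, `ulaw_to_pcm16`, and `CallBuffer`, as well as the chunk size, are hypothetical. The RTP header layout follows RFC 3550.

```python
import struct
from typing import List


def rtp_payload(packet: bytes) -> bytes:
    """Strip the RTP header (RFC 3550 layout) and return only the audio payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F
    offset = 12 + 4 * csrc_count
    if has_extension:
        # Extension header: 16-bit profile id, then 16-bit length in 32-bit words.
        _, ext_words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * ext_words
    # When the padding bit is set, the last byte holds the padding length.
    end = len(packet) - (packet[-1] if has_padding else 0)
    return packet[offset:end]


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear PCM sample."""
    u = ~byte & 0xFF
    magnitude = ((((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)) - 0x84
    return -magnitude if u & 0x80 else magnitude


class CallBuffer:
    """Per-call buffer: one instance per call keeps calls independent."""

    def __init__(self, chunk_samples: int = 8000) -> None:
        # 8000 samples is one second at 8 kHz (an assumed chunk size).
        self.chunk_samples = chunk_samples
        self._samples: List[int] = []

    def push(self, packet: bytes) -> List[bytes]:
        """Decode one RTP packet; return any full little-endian PCM16 chunks."""
        self._samples.extend(ulaw_to_pcm16(b) for b in rtp_payload(packet))
        chunks = []
        while len(self._samples) >= self.chunk_samples:
            head = self._samples[: self.chunk_samples]
            del self._samples[: self.chunk_samples]
            chunks.append(struct.pack(f"<{len(head)}h", *head))
        return chunks
```

Each chunk returned by `push` would then be sent to the voice recognition server. A real deployment would also need to handle packet reordering and loss, which this sketch ignores.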

The relay telephone system (20) also includes the STT interworking interface (24). The voice recognition server (30), which handles STT (speech-to-text), recognizes each file or buffer on a local server. The voice recognition server is configured for multiprocessing so that many calls can be recognized independently, and the STT interworking interface (24) handles the interfacing with this server. The STT interworking interface (24) also delivers the voice recognition information or natural language processed information to the administrator web program (customer management program) or the agent's web interface module (124). The STT interworking interface (24) is built as a client/server (C/S) service so that multiple calls can be recognized independently.

The voice recognition server (30) is a centralized voice recognition processing server that performs recognition independently for each call channel and then applies natural language processing to the recognized sentences for tasks such as number recognition, sentence structure analysis, and named entity recognition. The voice recognition server (30) may be configured to perform other intelligent processing in addition to voice recognition. To perform these operations, the voice recognition server (30) may include a voice recognition module (32), a natural language processing module (34), an interworking interface (36), and a voice recognition DB (38).

The voice recognition module (32) converts PCM voice data into text using a voice recognition model built through deep learning. This is also referred to simply as STT (Speech to Text). The information recognized by the voice recognition module (32) is converted into text form and delivered through the interworking interface (36) to the STT interworking interface (24) of the relay telephone system (20).

The natural language processing module (34) analyzes the text sentences recognized by the voice recognition module (32), performing tasks such as extracting the numeric parts, extracting words that carry the main meaning, or analyzing the relationships within sentences. The sentences or information analyzed by the natural language processing module (34) are delivered to the relay telephone system through the interworking interface (36).
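
As a rough illustration of the number and keyword extraction described above, the sketch below uses regular expressions and a fixed keyword list. These are stand-ins chosen for brevity: the specification does not say how the natural language processing module is implemented, and the `KEYWORDS` set and the `analyze` function are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical domain vocabulary; a real module would likely use a trained model.
KEYWORDS = {"refund", "delivery", "cancel", "payment"}


def analyze(text: str) -> Dict[str, List[str]]:
    """Extract numeric parts and domain keywords from a recognized sentence."""
    # Digit runs, optionally joined by hyphens (order or phone numbers).
    numbers = re.findall(r"\d+(?:-\d+)*", text)
    words = re.findall(r"[a-z]+", text.lower())
    keywords = [w for w in words if w in KEYWORDS]
    return {"numbers": numbers, "keywords": keywords}
```

For example, `analyze("Please cancel order 2021-1118 and refund 35000 won")` yields the numbers `2021-1118` and `35000` and the keywords `cancel` and `refund`.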

The interworking interface (36) works with the voice processing module (22) of a VoIP telephone system, AI speaker, or similar device: it receives the decoded voice data and passes it to the voice recognition module (32), and in the other direction it returns the sentences and information recognized by the voice recognition module (32) and/or the natural language processing module (34) to the telephone system or AI speaker. Because each call has its own call information (Call ID), the interface also keeps this information updated and stored.
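
The per-call bookkeeping mentioned above could look like the following sketch. The specification says only that call information (Call ID) is updated and stored, so the `CallSession` and `CallRegistry` names and the fields they hold are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CallSession:
    """State kept for one call, keyed by its Call ID."""
    call_id: str
    transcripts: List[str] = field(default_factory=list)


class CallRegistry:
    """Tracks active calls so results are routed back to the right call."""

    def __init__(self) -> None:
        self._sessions: Dict[str, CallSession] = {}

    def open(self, call_id: str) -> CallSession:
        # Reuse the existing session if the call is already known.
        return self._sessions.setdefault(call_id, CallSession(call_id))

    def add_transcript(self, call_id: str, text: str) -> None:
        self.open(call_id).transcripts.append(text)

    def close(self, call_id: str) -> CallSession:
        # Remove and return the session when the call ends.
        return self._sessions.pop(call_id)

    def active_calls(self) -> List[str]:
        return sorted(self._sessions)
```

Keying every piece of state by Call ID is what lets many simultaneous calls be recognized independently without their results being mixed.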

The in-house telephone system (10) is the telephone system already installed within the company. In the embodiment shown in FIG. 2, the in-house telephone system (10) includes an Internet telephone system (110) and a customer management program (120).

The Internet telephone system (110) of the in-house telephone system (10) is call-interworked with the relay telephone system (20) and includes a CTI server module (112). The CTI server module (112) performs various functions for calls interworked through the relay telephone system (20): extracting call information, monitoring and reporting call status, and call-related auxiliary functions such as call transfer, call creation, and call termination. When used with a system such as CRM, it serves to build a system that enriches information about the other party by linking with a DB through information delivery and call status monitoring.

The customer management program (120) is a program that works with systems such as CRM and ERP and uses the information delivered by the CTI server module (112) to present the customer's status and previous history and to make customer information available while a call is in progress.

In this embodiment, the customer management program (120) further includes a web interface module (124) so that the information recognized by the voice recognition server (30) can be provided to the in-house telephone system (10). The web interface module (124) is configured to present the recognized sentences and/or natural language processed information received from the STT interworking interface (24) on the user's (agent's) screen, for example as a pop-up window.

The web interface module (124) may run within the customer management program, using a script executed in a web browser, so that the recognized sentences and/or natural language processed information are shown on the agent's monitor. Here, a web browser is a program that provides access to World Wide Web (WWW) services by receiving and displaying hypertext written in HTML (HyperText Markup Language); examples include Netscape, Explorer, and Chrome.

FIG. 3 is a flowchart showing the operation flow in the voice recognition providing system described above. As shown in FIG. 3, the caller's terminal requests a call connection to the in-house telephone system (10). Because the relay telephone system (20) is configured to be call-interworked with the in-house telephone system (10), it is the relay telephone system (20) that receives the call connection request. On receiving the request, the relay telephone system (20) performs call interworking with the in-house telephone system (10).

After call interworking is established between the relay telephone system (20) and the in-house telephone system (10), voice data from the caller's terminal arrives at the relay telephone system (20). The relay telephone system forwards the voice data to the in-house telephone system (10), and at the same time its voice processing module (22) decompresses the voice data and sends the decompressed data to the voice recognition server (30) to request recognition.

Because the caller's terminal and the in-house telephone system (10) are call-interworked through the relay telephone system (20), the caller talks with an agent on the in-house telephone system (10). At the same time, the voice recognition server (30) recognizes the voice data that arrives from the voice processing module (22) buffered to a certain size, converts it into text, and performs natural language processing on the recognized data to extract the required information.

The recognized text sentences and natural language processed information produced by the voice recognition server (30) are sent to the relay telephone system (20) through the interworking interface established between the voice recognition server and the relay telephone system. The STT interworking interface (24) of the relay telephone system (20) then sends the recognized text sentences and natural language processed information to the customer management program (120) of the in-house telephone system.

As described above, in the embodiment of the present invention, the number that previously terminated at the in-house Internet telephone system is routed to the relay telephone system instead, and the relay telephone system forwards the call to the original call system, the Internet telephone system. In this process, the relay telephone system processes the call information and voice data and sends them to the voice recognition server, then receives the recognition results from the voice recognition server and delivers them to the customer management program included in the in-house telephone system.
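
The relay behavior summarized above, forwarding the call unchanged while tapping the audio for recognition, can be sketched as a small fan-out. The class `RelaySketch` and its three callbacks are hypothetical names introduced for illustration; the specification does not define such an interface.

```python
from typing import Callable, Optional


class RelaySketch:
    """Forwards each voice packet to the in-house system and taps it for STT."""

    def __init__(
        self,
        forward_to_pbx: Callable[[str, bytes], None],
        recognize: Callable[[str, bytes], Optional[str]],
        notify_crm: Callable[[str, str], None],
    ) -> None:
        self.forward_to_pbx = forward_to_pbx  # path to the in-house telephone system
        self.recognize = recognize            # path to the voice recognition server
        self.notify_crm = notify_crm          # path to the customer management program

    def on_voice_packet(self, call_id: str, packet: bytes) -> None:
        # 1. The call itself passes through untouched, so the in-house
        #    telephone system needs no modification.
        self.forward_to_pbx(call_id, packet)
        # 2. A copy of the audio goes to recognition; a result may not be
        #    available for every packet because audio is buffered.
        text = self.recognize(call_id, packet)
        # 3. Any recognized text is delivered to the customer management program.
        if text is not None:
            self.notify_crm(call_id, text)
```

In practice the recognition path would run asynchronously so that it never delays the forwarded call; the synchronous form here only shows the order of the three steps.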

The embodiment above was described using an Internet telephone system as the example. Voice telephone systems based on Internet communication, such as IP-PBX and Centrex, are now common, so a design that interworks with Internet telephone systems can interwork with most call networks. Even a PBX system that is analog, or that uses CTI rather than Internet telephony, can be linked to an Internet telephone network relatively simply, so the approach applies to most in-house telephone systems.

In addition, the customer consultation program used by agents at the final stage is in most cases continuously developed and maintained by the service provider, so linking the information processed by the relay telephone system to the customer consultation program can be developed without major difficulty. Such programs are usually built as web programs or client/server (C/S) programs, and the link can be implemented through a REST API, JSON interworking, message interworking, or similar means.
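
A JSON message of the kind that could carry recognition results to the customer consultation program is sketched below. The specification names JSON interworking as an option but defines no schema, so every field name here (`call_id`, `text`, `numbers`, `keywords`) and the function `build_stt_message` are assumptions.

```python
import json
from typing import List


def build_stt_message(
    call_id: str, text: str, numbers: List[str], keywords: List[str]
) -> str:
    """Serialize one recognition result as a JSON string (hypothetical schema)."""
    payload = {
        "call_id": call_id,    # identifies which call the result belongs to
        "text": text,          # recognized sentence
        "numbers": numbers,    # numeric parts found by natural language processing
        "keywords": keywords,  # meaningful words found by natural language processing
    }
    # ensure_ascii=False keeps non-ASCII text such as Korean readable.
    return json.dumps(payload, ensure_ascii=False)
```

The receiving program would parse the string with `json.loads` and display the fields, for example in the agent's pop-up window.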

According to this embodiment of the present invention, providing a relay telephone system (20) that relays voice recognition between the in-house telephone system and the AI voice recognition server makes it possible to offer AI services, including voice recognition, more conveniently without changing the existing in-house telephone system.

The device described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an Arithmetic Logic Unit (ALU), a Digital Signal Processor, a microcomputer, a Field Programmable Gate Array (FPGA), a Programmable Logic Unit (PLU), a microprocessor, or any other device capable of executing and responding to commands. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, commands, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by a processing device or to provide commands or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, the scope of protection of the present invention should be interpreted according to the claims below rather than being limited by the embodiments described above, and all technical ideas within the scope equivalent to the claims should be interpreted as falling within the scope of rights of the present invention.

10: In-house phone system 20: Relay phone system
22: Voice processing module 24: STT interworking interface
30: AI-based voice recognition server 32: Voice recognition module
34: Natural language processing module 36: Interworking interface
38: Voice recognition DB 110: Internet phone system
112: CTI server module 120: Customer management program
122: CTI client module 124: Web interface module

Claims

A voice recognition provision system that allows an artificial intelligence-based voice recognition service to be introduced without modifying an in-house phone system, the system comprising:
an in-house phone system including an Internet phone system;
a relay phone system connected to the in-house phone system through a network independent of the in-house phone system, and configured to receive a call connection request from a calling terminal to the Internet phone system and perform a call connection to the Internet phone system; and
an AI-based voice recognition server configured to receive voice data from the relay phone system and transmit a voice recognition result to the relay phone system,
wherein the relay phone system includes:
a voice processing module that extracts only voice data from voice data packets transmitted from the calling terminal and transmits the extracted voice data; and
an STT (speak to text) interworking interface that receives voice recognition information and natural language processing information from the voice recognition server,
wherein the STT interworking interface is configured to receive the voice recognition information and the natural language processing information from the voice recognition server, and to transmit the received voice recognition information and natural language processing information to the in-house phone system,
wherein the in-house phone system includes a web interface module,
wherein the STT interworking interface receives the voice recognition information and the natural language processing information from the voice recognition server, and transmits the received voice recognition information and natural language processing information to the web interface module of the in-house phone system,
wherein the web interface module operates to display the received voice recognition information and natural language processing information on the user's monitor,
wherein, after call interconnection is established between the relay phone system and the in-house phone system, voice data from the calling terminal is transmitted to the relay phone system, and the relay phone system transmits the voice data to the in-house phone system while, at the same time, the voice processing module of the relay phone system decompresses the voice data, transmits the decompressed voice data to the voice recognition server, and requests voice recognition,
wherein the voice recognition server is configured to collect the decompressed voice data and transmit voice recognition information to the relay phone system each time a sentence is completed, and
wherein the STT interworking interface of the relay phone system is configured to transmit the voice-recognized text sentences and the natural language processing information to a customer management program of the in-house phone system.

The voice recognition provision system according to claim 1,
wherein the voice recognition server includes:
a voice recognition module that converts voice data into voice recognition information in text form through a voice recognition model created through deep learning;
a natural language processing module that extracts numbers and meaningful words from the voice recognition information in text form, or analyzes sentence relationships; and
an interworking interface that transmits the voice recognition information and the natural language processing information from the voice recognition module and the natural language processing module to the STT interworking interface.

(Deleted)

The voice recognition provision system according to claim 1,
wherein the Internet phone system includes a CTI server module that performs functions including phone information extraction, call status monitoring, call forwarding, call creation, and call termination, and the CTI server module is configured to provide customer information of the calling terminal.

(Deleted)