KR100929531B1

KR100929531B1 - Information provision system and method in wireless environment using speech recognition

Info

Publication number: KR100929531B1
Application number: KR1020060136411A
Authority: KR
Inventors: 유관식; 이현주; 이정훈; 정승우
Original assignee: 에스케이마케팅앤컴퍼니 주식회사
Priority date: 2006-12-28
Filing date: 2006-12-28
Publication date: 2009-12-03
Also published as: KR20080061549A

Abstract

The present invention relates to a system and method for providing information in a wireless environment using speech recognition.

The information providing system is connected to at least one user terminal through a network and provides information in response to requests from the user terminal. A voice signal in which the user requests information is processed into streaming voice data in a form suitable for speech recognition and is then sent to the system through the network. The system detects the endpoint of the received streaming voice data and performs speech recognition to determine what information the user has requested. It then processes the information corresponding to the recognition result into a transmittable form and provides it to the user terminal.

Accordingly, because the user terminal preprocesses the user's input voice signal into a form suitable for speech recognition before transmitting it, distortion during voice data transmission can be prevented.

Description

Information Offering System and Method using Voice Recognition in Wireless Environment

FIG. 1 is a network connection diagram of an information providing system according to an embodiment of the present invention.

FIG. 2 is a structural diagram of a user terminal according to an embodiment of the present invention.

FIG. 3 is a detailed structural diagram of an information providing system according to an embodiment of the present invention.

FIGS. 4 and 5 are flowcharts of an information providing method according to an embodiment of the present invention.

FIG. 6 is a flowchart of the processing performed according to the speech recognition result in an embodiment of the present invention.

FIGS. 7 and 8 show examples of information output on a user terminal by the information providing method according to an embodiment of the present invention.

The present invention relates to an information providing system and, more particularly, to an information providing system and method using speech recognition in a wireless environment, and to an information display apparatus.

In general, a system that provides information to users by telephone is a customer consultation system: when a user connects over wired or wireless communication, it provides various information at the user's request. A conventional customer consultation system typically connects a customer who calls in, by wire or wirelessly, to an agent, so that the customer obtains the desired information by speaking directly with the agent. In another form, the customer is connected to an ARS (automatic response system) server and receives information through that server. In yet another form, the customer requests a service menu item by voice on the terminal, and the system analyzes the voice and provides the corresponding information.

Systems that provide information using speech recognition deliver their service over either a telephone network or a wireless network, and fall into a first type and a second type according to how the speech is processed.

In the first type, the terminal merely transmits an analog voice file recording the user's speech to the system, and the system processes the file with a speech recognition process to generate voice data and perform recognition. For example, the terminal sends the analog voice file to the system over the telephone network, and the system's speech recognition engine processes and recognizes the file delivered through an exchange and an IVR (interactive voice response) server. Because the terminal sends the voice file to the system in analog form, however, the transmission is accompanied by distortion and noise.

In the second type, by contrast, the terminal does not transmit an analog recording of the user's speech. Instead, it applies speech recognition processing to the voice file and transmits the resulting data to the system. This approach is called DSR (distributed speech recognition), and the terminal performs a major part of the general speech recognition process on the voice file.

In this case, however, the processing performed by the terminal is complex, so the load on the terminal increases. The hardware needed to implement that processing also raises the manufacturing cost of the terminal.

The technical problem addressed by the present invention is therefore to solve the conventional problems described above by performing speech recognition more efficiently in a wireless environment and providing the corresponding information to users' terminals.

A further technical problem addressed by the present invention is to reduce the processing load on the terminal while allowing the system to perform speech recognition without distortion, so that accurate information is provided to the terminal.

A further technical problem addressed by the present invention is to display the speech recognition status on the terminal so that the user can check it easily.

To achieve these objects, an information providing method according to one aspect of the present invention is a method by which a system, connected through a network to at least one user terminal and to agent terminals, provides information in response to a request from the user terminal. The method includes:
a) the user terminal, while requesting information from the system, using a separately provided codec to convert the user's analog voice as-is into a digital signal and compress it, processing the result into a predetermined format that can be transmitted over the wireless network, and transmitting it to the system as streaming voice data;
b) the system, when an information request is made through the user terminal, receiving the streaming voice data transmitted from the user terminal;
c) the system restoring the streaming voice data to generate voice data;
d) the system performing endpoint detection on the voice data;
e) the system performing speech recognition on the voice data based on the detected endpoint; and
f) the system finding the information corresponding to the speech recognition result and transmitting it to the user terminal.

An information providing system according to another aspect of the present invention is connected through a network to at least one user terminal and to agent terminals, and provides information in response to a request from the user terminal. The system includes: a user terminal that requests information, uses a separately provided codec to convert the user's analog voice as-is into a digital signal and compress it, processes the result into a predetermined format that can be transmitted over the wireless network, and transmits it as streaming voice data; a voice processing unit that receives the streaming voice data so transmitted in response to the information request from the user terminal and processes it to generate voice data; an endpoint detection unit that performs endpoint detection on the generated voice data; a speech recognition engine that performs speech recognition on the voice data based on the detected endpoint; and a service control unit that finds the information corresponding to the speech recognition result and transmits it to the user terminal.

In this case, the user terminal may include: an interface unit through which the user's voice signal is input; a voice signal processing unit that processes the voice signal input through the interface unit into streaming voice data using a separately installed voice codec; a transceiver unit that transmits the streaming voice data to the system through the network and receives signals transmitted from the system; and a display unit that displays processing information, including the speech recognition result and the information transmitted from the system. In particular, the voice signal processing unit converts the user's analog voice signal into a digital signal, compresses it, generates transmittable streaming voice data in a predetermined format, and transmits it to the system.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. Parts unrelated to the description are omitted from the drawings in order to describe the invention clearly.

FIG. 1 is a network connection diagram of an information providing system according to an embodiment of the present invention. As shown in FIG. 1, the information providing system 100 according to the embodiment is connected to a plurality of user terminals 300 through a wireless network 200.

Here, the user terminal 300 is a communication device that can access the information providing system 100 through the wireless network 200 (hereinafter also called the "network" for convenience). It may be any communication device capable of wireless network access, for example a mobile communication terminal, an Internet phone, or a PDA.

FIG. 2 is a structural diagram of a terminal according to an embodiment of the present invention.

As shown in FIG. 2, the user terminal 300 according to the embodiment includes an interface unit 310, a voice signal processing unit 320 that processes the voice signal input through the interface unit 310, a transceiver unit 330 that transmits the processed voice signal to the system 100 through the network 200, and a display unit 340 that displays various kinds of information such as the speech recognition result and information transmitted from the system. It further includes a text information processing unit 350 that processes text information for display on the display unit 340, an image information processing unit 360 that processes image information for display on the display unit 340, and a service control unit 370 that controls the transmission of voice signals and the reception and processing of information including text and images.

The interface unit 310 handles interaction with the user. It includes, for example, a microphone that converts the user's voice into an electrical voice signal and a speaker that outputs voice signals, as well as input devices such as a keypad and a mouse.

The voice signal processing unit 320 processes the user's voice signal input through the interface unit 310 and generates the voice data to be sent to the system. Specifically, it converts the analog voice signal input through the interface unit into a digital voice signal according to a configured bit rate. It then compresses the digital voice signal, processes it into streaming data, and transmits it to the system 100 through the transceiver unit 330. The voice signal processing unit 320 also records the input user voice signal and, if speech recognition later fails, transmits the voice recording file to the system 100 as needed. In addition, it processes voice signals received from the system 100 so that they are output through the interface unit 310. Hereinafter, the voice signal output from the user terminal and transmitted to the system is called streaming voice data. The voice signal processing unit 320 may also be called a voice codec (CODEC), the text information processing unit 350 a text codec, and the image information processing unit 360 an image codec.
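The terminal-side steps above (digitize, compress, packetize as streaming data) can be sketched as follows. This is only an illustration under stated assumptions: the patent does not name a codec or a packet format, so the mu-law companding, the 160-sample frame size, and the 6-byte header (sequence number and payload length) are all hypothetical choices, as are the function names.

```python
import math
import struct

MU = 255  # mu-law companding constant (assumed codec; the source names none)


def mu_law_encode(sample: float) -> int:
    """Compress one sample in [-1.0, 1.0] into an 8-bit code (0..255)."""
    sample = max(-1.0, min(1.0, sample))
    magnitude = math.log1p(MU * abs(sample)) / math.log1p(MU)
    signed = math.copysign(magnitude, sample)
    return int(round((signed + 1.0) / 2.0 * 255))


def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back into an approximate sample in [-1.0, 1.0]."""
    signed = code / 255 * 2.0 - 1.0
    magnitude = math.expm1(abs(signed) * math.log1p(MU)) / MU
    return math.copysign(magnitude, signed)


def to_stream_packets(samples, frame_size=160):
    """Yield packets: a hypothetical 6-byte header followed by compressed audio."""
    codes = bytes(mu_law_encode(s) for s in samples)
    for seq, start in enumerate(range(0, len(codes), frame_size)):
        payload = codes[start:start + frame_size]
        yield struct.pack("!IH", seq, len(payload)) + payload


def from_stream_packets(packets):
    """Server-side counterpart: restore approximate samples from the packets."""
    samples = []
    for packet in packets:
        _seq, length = struct.unpack("!IH", packet[:6])
        samples.extend(mu_law_decode(c) for c in packet[6:6 + length])
    return samples
```

With 8-bit codes at an assumed 8 kHz sampling rate, this sketch would correspond to a 64 kbit/s stream; a real terminal would use whatever bit rate and codec the service configures.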

The transceiver unit 330 processes the streaming voice data, a digital voice signal in a predetermined format supplied by the voice signal processing unit 320, into a transmittable signal and sends it to the system 100. It also receives signals from the system 100 that carry various kinds of information.

The text information processing unit 350 extracts text information from the signal received by the transceiver unit 330 and processes it into a form that can be shown on the display unit. The image information processing unit 360 extracts image information from the received signal and processes it into a displayable form. Here, image information includes both still images and video.

The service control unit 370 controls the components 310 to 360 described above so that the speech-recognition-based information service according to the embodiment is delivered through the user terminal. For example, it runs on the basis of an application that provides information by voice for each service menu, so that, through the cooperation of the terminal 300 and the system 100, the information corresponding to the menu item the user requested is presented to the user in various forms, including voice.

In addition to the components described above, the user terminal 300 according to the embodiment may include other components for performing the terminal's own functions (for example, call handling, including call connection processing).

The information providing system 100, which transmits information to the terminal in response to a request from the terminal 300 as described above, is structured as follows.

FIG. 3 is a detailed structural diagram of an information providing system according to an embodiment of the present invention.

As shown in FIGS. 1 and 3, the information providing system 100 according to the embodiment includes a speech recognition server 10 and an agent management server 20. It may further include a call processing unit 50 for call connection with the user terminal.

The information providing system 100 may be reached through a mobile carrier's FEP server (not shown). When the user terminal requests information, the FEP server performs functions such as locating the speech recognition server that will provide the information and locating the base station, so that the information providing system 100 and the user terminal 300 are connected efficiently. Another server that performs the same functions may of course be used in place of the FEP server.

The speech recognition server 10 processes the voice signal of a predetermined format transmitted from the user terminal 300 and performs speech recognition. For this purpose it includes: a voice processing unit 11 that restores the incoming streaming voice data to generate the corresponding voice data; an EPD processing unit 12 that performs endpoint detection (hereinafter, EPD) on the restored voice data; a speech recognition engine 13 that performs speech recognition on the voice data based on the endpoint detected by the EPD processing unit and generates the text information corresponding to the voice data together with a speech recognition result; and a service control unit 14 that provides the text information and the speech recognition result to the agent management server 20, or provides the speech recognition result and the information retrieved according to it to the user terminal 300. The service control unit 14 may also perform communication control for synchronization between the user terminal 300 and the system 100.
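As an illustration of the endpoint detection step performed by the EPD processing unit 12, the sketch below uses a simple frame-energy rule. The patent does not specify the detection algorithm, so the energy threshold, the frame size, the `hangover` count of trailing silent frames, and the function names are all assumptions made for this example.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_endpoints(samples, frame_size=160, threshold=1e-3, hangover=5):
    """Return (start, end) sample indices of the utterance, or None.

    A frame counts as speech when its energy exceeds `threshold`.
    Scanning stops once `hangover` consecutive silent frames follow speech,
    which is how a streaming receiver could decide the utterance has ended.
    """
    start = None
    end = None
    silent_run = 0
    for index in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[index:index + frame_size]
        if frame_energy(frame) > threshold:
            if start is None:
                start = index
            end = index + frame_size
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover:
                break
    if start is None:
        return None
    return start, end
```

Only the samples between the returned indices would then be handed to the recognition engine, which keeps leading and trailing silence out of the recognition input.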

The speech recognition engine 13 according to the embodiment processes the speech recognition result into a predetermined form of information and transmits it to the user terminal 300, so that the user can check how the system recognized the request. The recognition result sent to the user terminal may be text, or it may be multimedia information such as an image or voice. The speech recognition engine 13 may also include a speech recognition DB (not shown) that stores the basic information needed to recognize the input voice data. This DB may store, for each customer, a list of recently or frequently used words, or a list of words frequently used across all customers.

The information requested by the user is provided on the basis of the recognition result from the speech recognition server 10. For this purpose, a number of agent group 1 terminals 30 and agent group 2 terminals 40 may be connected to the agent management server 20. The agent group 1 terminals 30 are connected to the agent management server 20 and the speech recognition engine 13, and the agent group 2 terminals 40 are connected to the agent management server 20 and the call processing unit 50.

The agent management server 20 manages the agent group 1 terminals 30 and the agent group 2 terminals 40. In particular, it selects an agent terminal in response to a user's information request so that the corresponding information is provided to the user terminal.

In particular, when the recognition result produced by the speech recognition engine 13 falls below a preset threshold, the agent management server 20 selects one of the agent group 1 terminals 30 and provides the selected terminal with the voice recording file transmitted from the user terminal and with text information that includes the list of words recognized by the speech recognition engine 13. The agent at the selected group 1 terminal 30 then works out the user's request from the played-back recording and the text information, and the result is passed to the speech recognition engine 13. The voice recording file here is the file transmitted from the user terminal.

If recognition by agent group 1 through an agent group 1 terminal 30 also fails, the agent management server 20 selects one agent group 2 terminal 40 so that an agent in group 2 can listen to the user's voice directly. The agent at the selected group 2 terminal listens to the user's voice and passes the corresponding result to the call processing unit 50. The group 2 agent does not converse with the user but only listens to the user's voice directly; the response to the user may be handled by the call processing unit 50.
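The three-tier fallback described above (engine, then agent group 1, then agent group 2) can be sketched as a small decision function. The score scale, the callback signatures, and all names here are hypothetical; the patent only states that the recognition result is compared with a preset reference value.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RecognitionResult:
    text: str     # best hypothesis from the engine
    score: float  # confidence, assumed here to lie in [0, 1]


def resolve_request(
    result: RecognitionResult,
    threshold: float,
    group1_review: Callable[[RecognitionResult], Optional[str]],
    group2_listen: Callable[[], str],
) -> Tuple[str, str]:
    """Return (resolved_text, handled_by) following the fallback order."""
    if result.score >= threshold:
        # The engine's result is trusted as-is.
        return result.text, "engine"
    # Group 1 agent works from the recording plus the recognized word list.
    corrected = group1_review(result)
    if corrected is not None:
        return corrected, "agent_group_1"
    # Group 2 agent listens to the user's voice directly.
    return group2_listen(), "agent_group_2"
```

In this sketch a group 1 reviewer signals failure by returning `None`, which is one simple way to trigger the hand-off to group 2.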

As shown in FIG. 3, the call processing unit 50 is a server that handles calls in response to call connection requests from the user terminal 300. It includes an internal exchange 51 connected to the external exchange of a wireless or wireline carrier, a call distribution server 52 (CTI: computer telephony integration) that distributes multiple calls, and an IVR (interactive voice response) server 53.

The internal exchange 51 connects, through the external exchange of the wired or wireless carrier, directly to the customer's wired or wireless communication terminal, so that the customer can receive the speech-recognition-based information service of the embodiment through that terminal. The call distribution server 52 is connected to the exchange 51. Besides sharing information resources between telephone and computer, it controls the connected devices and forms a network with existing information stores to provide registered information. In particular, when a call connection request arrives from the user terminal 300, the call distribution server 52 connects the user terminal 300 to one of a number of IVR servers, and the IVR server 53 plays voice prompts to the user terminal 300 according to a predefined service scenario so that the user can request the desired information. To this end, the IVR server 53 stores and manages a number of prompts and voice data in which the information to be provided has been rendered as speech, and plays back the stored prompts once the call with the user terminal is connected. It also provides the information that corresponds to the user request supplied by the agent terminal 40, either as voice or processed into multimedia data such as text or images, to the user terminal 300. In particular, in the embodiment, when speech recognition of the user's request fails, the call processing unit 50 sets up a call connection between an agent group 2 terminal 40 and the user terminal 300. An agent in group 2 is then connected to the user through the agent group 2 terminal 40 and can listen directly to the information the user is requesting.
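The call distribution server's role of connecting each incoming call to one of several IVR servers can be sketched as below. The patent does not say how the IVR server is chosen, so the round-robin policy, the class name, and the server labels are assumptions for illustration only.

```python
import itertools


class CallDistributor:
    """Assigns each calling terminal to an IVR server in round-robin order.

    Round-robin is an assumed policy; the source only states that one of
    several IVR servers is selected.
    """

    def __init__(self, ivr_servers):
        if not ivr_servers:
            raise ValueError("at least one IVR server is required")
        self._next_server = itertools.cycle(list(ivr_servers))
        self.assignments = {}  # terminal id -> IVR server label

    def connect(self, terminal_id):
        """Pick the next IVR server and remember which terminal it serves."""
        server = next(self._next_server)
        self.assignments[terminal_id] = server
        return server
```

A production distributor would more likely weigh server load or availability; the fixed rotation is used here only because it keeps the example deterministic.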

한편, 본 발명의 실시 예에 따른 정보 제공 시스템에는 문자 데이터를 음성으로 변환하는 TTS(Text-to-Speech) 서버(도시되지 않음)가 더 포함될 수 있다. On the other hand, the information providing system according to an embodiment of the present invention may further include a text-to-speech (TTS) server (not shown) for converting text data into voice.

본 실시 예에서, 각 서버의 구성 요소들이 해당 서버 내에서 동작되도록 도시되었으나, 이에 한정되지 않고 각각 독립적인 서버로서 구현되어 해당 기능을 처리할 수도 있다. 또한, 각 서버 및 서버를 구성하는 구성 요소들은 그 기능에 따라 분류된 것이며, 위에 기술된 바와 같이 분류되는 것으로 한정되지 않는다. 예를 들어, 상담원 관리 서버가 하는 기능을 음성 인식 서버에서 수행하도록 구현할 수 있다. In the present embodiment, although components of each server are shown to operate in the corresponding server, the present invention is not limited thereto and may be implemented as independent servers to process corresponding functions. In addition, each server and the components constituting the server are classified according to their functions, and are not limited to those classified as described above. For example, the voice management server may perform a function of the agent management server.

Next, an information providing method according to an embodiment of the present invention is described on the basis of this structure.

To receive information at the terminal 300 through the information providing system 100 according to an embodiment of the present invention, the user connects the terminal 300 to a predetermined kit or presses a hot key or the like to request a network connection. As shown in FIGS. 4 and 5, the service controller 370 of the terminal 300 then attempts to connect to the information providing system 100 on the network 200 through the transceiver 340. Once the connection is made, a service menu may be sent from the information providing system 100 and displayed on the terminal 300 (S100). Alternatively, the service menu may be stored in the terminal and displayed once the connection to the system is made, or displayed after the menu has been updated through the system as needed.

When the user then selects a menu item, the service controller 370 of the terminal 300 requests the service by transmitting information corresponding to the selected menu to the information providing system 100 (S110). In response to the service request, the information providing system 100 authenticates the user terminal 300 (S120). Authentication checks, based on the terminal identification number supplied at connection time (for example, a unique number assigned when the terminal is manufactured), whether the user is registered as a member eligible for the service. This authentication step is optional.

In response to the service request, as shown in FIG. 5, the information providing system 100 asks the terminal 300 to transmit user voice data for the selected service menu. The terminal 300 then processes the user's voice input through the interface unit 310 and transmits it to the system 100 (S130 to S140). That is, the voice signal processor 320 of the terminal 300 processes the user's voice signal input through the interface unit 310 into streaming voice data of a predetermined format and passes it to the system 100 through the transceiver 340. At this time, the voice signal processor 320 also records the input user voice signal and stores it as a file.
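The terminal-side step above (digitized voice turned into streaming voice data, with a recording kept alongside) could be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not specify the packet format, sample rate, or file type, so the header layout, `SAMPLE_RATE`, `FRAME_BYTES`, and all function names are hypothetical, and the codec compression step is omitted.

```python
import io
import struct
import wave

SAMPLE_RATE = 8000             # assumed sample rate, not given in the source
FRAME_BYTES = 320              # assumed 20 ms of 16-bit mono audio at 8 kHz
HEADER = struct.Struct("!IH")  # hypothetical header: sequence number, payload length


def to_stream_packets(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> list[bytes]:
    """Split digitized voice into sequence-numbered packets for the data network."""
    packets = []
    for seq, start in enumerate(range(0, len(pcm), frame_bytes)):
        payload = pcm[start:start + frame_bytes]
        packets.append(HEADER.pack(seq, len(payload)) + payload)
    return packets


def from_stream_packets(packets: list[bytes]) -> bytes:
    """Server-side counterpart: order packets by sequence number and rebuild the audio."""
    frames = []
    for packet in packets:
        seq, length = HEADER.unpack_from(packet)
        frames.append((seq, packet[HEADER.size:HEADER.size + length]))
    return b"".join(payload for _, payload in sorted(frames))


def record_wav(pcm: bytes) -> bytes:
    """Keep a recording of the same utterance as an in-memory WAV file image."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Numbering the packets lets the receiving side rebuild the utterance even if packets arrive out of order, which is one plausible way to realize the "predetermined format" the description mentions.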

Through the above process, streaming voice data is transmitted between the terminal 300 and the system 100. Because the voice data sent from the user terminal 300 to the system 100 is not the analog voice signal itself but data that has been preprocessed so that the system 100 can perform voice recognition on it, distortion during voice data transmission can be prevented. In addition, because the terminal includes only the means needed for this preprocessing, problems such as increased cost and a more complex structure can be avoided.

Meanwhile, the streaming voice data transmitted from the user terminal 300 is passed to the voice recognition server 10 (S150), which performs voice recognition on it. Specifically, as shown in FIG. 4, the server restores (for example, decodes) the incoming streaming voice data to generate the corresponding voice data (S160), extracts the voice section, that is, the start point and end point of the speech, from the restored voice data (S170), and performs voice recognition on the voice data based on the extracted end point (S180). At this time, the voice recognition engine 12 of the voice recognition server 10 may perform recognition by searching a voice recognition DB that corresponds to a predetermined scenario.
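The source does not say how the start and end points of the speech are found in step S170. As a minimal sketch, assuming a simple frame-energy threshold (a common textbook approach, not necessarily the one used here), the voice section could be located like this; `frame_len` and `threshold` are hypothetical parameters.

```python
def detect_endpoints(samples, frame_len=160, threshold=500.0):
    """Return (start, end) sample indices of the voice section, or None if silent.

    A frame counts as speech when its mean absolute amplitude reaches the
    threshold; the section runs from the first to the last such frame.
    """
    voiced = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            voiced.append(i)
    if not voiced:
        return None
    return voiced[0], min(voiced[-1] + frame_len, len(samples))
```

Only the samples between the returned indices would then be handed to the recognition engine, so leading and trailing silence does not affect the recognition step.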

When the recognition score, one of the voice recognition results, is equal to or greater than a predetermined reference value, the service controller 15 completes the information search using the voice recognition result (S29) and provides the retrieved information to the user terminal 300 (S190 to S200). The retrieved information may be provided to the user in various forms, such as text, voice, graphics, or a combination of text and graphics. Because information is exchanged between the information providing system 100 and the user terminal 300 not over a voice network through an exchange or the like but over a data network such as a TCP/IP network, information can be provided more accurately on the basis of voice recognition.

Meanwhile, the system 100 may provide information differently depending on the result of the voice recognition process.

When voice recognition succeeds, information according to the recognition result is provided to the user terminal 300 through the data network, as described above (S300 to S320). However, when the voice recognition result from the voice recognition engine 12 is below the preset reference value, the service controller 15 notifies the user terminal 300 of the voice recognition failure so that the user responds again, and receives new voice data (S330).
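The branch on the recognition score can be illustrated with a short sketch. The score scale, the value of `REFERENCE_SCORE`, and the names `RecognitionResult` and `next_action` are assumptions made for illustration; the source only states that a preset reference value is compared against the recognition result.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    text: str     # best candidate returned by the engine
    score: float  # recognition score; the 0..1 scale is an assumption


REFERENCE_SCORE = 0.7  # hypothetical preset reference value


def next_action(result: RecognitionResult, reference: float = REFERENCE_SCORE) -> str:
    """Return 'provide_info' when the score meets the reference value, else 'request_retry'."""
    return "provide_info" if result.score >= reference else "request_retry"
```

A score exactly equal to the reference value counts as success here, matching the "equal to or greater than" wording of the description.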

At this time, text or voice indicating the recognition failure is displayed or output through the user terminal 300 so that the user can confirm the voice recognition status, and the user then retries the information request by voice input. The voice signal processor 320 of the user terminal 300 processes the voice signal input on the retry into streaming voice data and transmits it, together with a voice recording file in which that voice signal is recorded. The voice recognition server 10 of the system 100 then performs voice recognition on the retransmitted streaming voice data (S340 to S350).

Thereafter, the service controller 15 of the system 100 selects one of the agent group 1 terminals 30 through the agent management server 20 and provides the selected agent terminal with the voice recording file transmitted from the user terminal 300, together with text information including the list of words recognized by the voice recognition engine 12, so that recognition by agent group 1 can be performed (S360). The service controller 15 may call agent group 1 in this way only when the recognition result for the voice data received on the retry is below the preset reference value.

The agent group 1 terminal 30 displays the recognized word list sent from the voice recognition engine 40 and plays back the voice recording file. An agent belonging to agent group 1 thus listens to the recording through a headset or the like and recognizes the user's speech. If the recognized word appears in the displayed list, the agent selects it; if not, the agent searches a predetermined DB and enters the word found. When the agent selects or enters the result word, the agent group 1 terminal 30 transmits the result to the service controller 15 through the agent management server 20. The service controller 15 then determines that voice recognition has been completed successfully by agent group 1, performs the information search using the recognized result, and provides the retrieved information to the user terminal 300 (S370 to S380). Alternatively, the agent belonging to agent group 1, rather than the service controller 15, may perform the information search directly based on the recognized result word, in which case the service controller 15 may simply forward the retrieved information to the terminal 300.
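The two ways an agent can produce a result word (picking it from the displayed list, or finding it in a DB) can be modeled with a small sketch. The function name, the case-insensitive matching, and the use of a plain set as the "predetermined DB" are illustrative assumptions, not details from the source.

```python
from typing import Optional


def resolve_agent_input(heard: str, recognized_words: list[str], place_db: set[str]) -> Optional[str]:
    """Return the result word chosen by the agent, or None if recognition fails."""
    key = heard.strip().lower()
    for word in recognized_words:  # case 1: the word is in the displayed list
        if word.lower() == key:
            return word
    for entry in place_db:         # case 2: the agent finds it in a DB instead
        if entry.lower() == key:
            return entry
    return None                    # neither: recognition by this agent failed
```

A `None` result corresponds to the case in which recognition by agent group 1 ends in failure.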

Meanwhile, if recognition by agent group 1 in step S370 ends in failure, for example because the agent listening to the recording cannot make out the customer's speech owing to unclear pronunciation or ambient noise, or because the search is impossible since the information the customer wants does not exist, the service controller 15 connects the user's terminal, through the call processing unit 50, directly to an agent group 2 terminal 40 selected by the agent management server 20 (S390). An agent belonging to agent group 2 is thereby connected by call with the user through the agent group 2 terminal 40 and can listen directly to the information the user requests. Here, the agent in agent group 2 does not converse with the user. Instead, the IVR server 53 of the call processing unit 50 sends the user a message requesting voice input again, following the service scenario for a recognition failure by agent group 1, and the agent listens to and recognizes the speech the customer then inputs through the agent group 2 terminal 40. That is, the agent group 2 terminal 40 plays the speech input directly by the user to the agent through a headset or the like connected to the telephone, so the agent can hear the user's voice directly without a two-way conversation. In this case the user terminal 300 transmits the user's voice signal to the system 100 in the same way as in an ordinary voice call over a wireless network, so signals between the terminal 300 and the system 100 are carried over the voice network.

Accordingly, the agent belonging to agent group 2 listens to and recognizes the user's speech directly through a headset or the like, searches an information DB (not shown) for the information the user requested, and enters the search result. When the recognized result is entered, the agent group 2 terminal 40 transmits it to the IVR server 53. The IVR server 53 determines that voice recognition has been completed successfully by agent group 2 (S27), performs the information search, and provides the retrieved information to the user through the exchange 10 (S400).
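Taken together, the method tries automatic recognition first, then a retry, then agent group 1 working from the recording, and finally agent group 2 listening over a call. A minimal sketch of such a fallback chain follows; the `Stage` type, the stage names, and the function `recognize_with_fallback` are hypothetical, and each stage is reduced to a callable that returns recognized text or `None`.

```python
from typing import Callable, Optional

# Each stage takes the user's request and returns recognized text, or None on failure.
Stage = Callable[[str], Optional[str]]


def recognize_with_fallback(
    request: str, stages: list[tuple[str, Stage]]
) -> tuple[str, Optional[str]]:
    """Try each stage in order; return (stage name, text) for the first that succeeds."""
    for name, stage in stages:
        text = stage(request)
        if text is not None:
            return name, text
    return "unresolved", None


# Example wiring with stand-in stages (the lambdas only simulate outcomes).
stages = [
    ("engine", lambda r: None),                # score below the reference value
    ("engine_retry", lambda r: None),          # retry also below the reference value
    ("agent_group_1", lambda r: r.upper()),    # agent recognizes from the recording
    ("agent_group_2", lambda r: r),            # agent listens over a call
]
```

In the described system the first three stages run over the data network, while the agent group 2 stage switches to the voice network; the sketch leaves that transport difference out.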

Next, an example is given of how the information providing method according to an embodiment of the present invention is applied in practice.

FIGS. 7 and 8 illustrate an example of the process in which the information providing method according to an embodiment of the present invention is applied. FIG. 7 shows the screens displayed on the user terminal, in cooperation with the information providing system according to an embodiment of the present invention, when a user (a customer) wants directions to a destination in Seoul, for example COEX.

As illustrated in FIG. 7, when the user selects the "Voice Search" menu from the <Initial Menu Screen> provided by the system 100, the system 100 requests the terminal 300 to transmit voice data. The <Voice Input Screen> is then displayed on the terminal 300, and the user speaks the name of the destination to be found.

When the user speaks the destination name, the terminal 300 processes the input voice signal to generate streaming voice data, and the corresponding destination name is displayed on the screen, as in the <City/Province Name Input Screen> of FIG. 7, so that the user can confirm it.

The terminal 300 then transmits the streaming voice data corresponding to the user-confirmed destination name, <Seoul, COEX>, to the system 100, and the system 100 performs restoration, end point detection, and voice recognition as described above to recognize the destination name the user entered. When recognition succeeds, or when it fails and recognition is subsequently achieved through agent group 1 or agent group 2, the information corresponding to the recognition result is provided to the terminal 300 and displayed, as in the <Route Guidance Initial Screen> of FIG. 7, so that the user can confirm it. Alternatively, the transmitted information may be processed into voice data and output as speech, for example: "COEX is 50 m from Samseong Station in the direction of Yeongdong Bridge. The telephone number is 02-XXXX-OOOO."

On the other hand, when voice recognition fails, the messages "Voice recognition failed" and "Sorry, please try again" are displayed, as in the <Recognition Failure Screen> of FIG. 8, or an announcement is played, so that the customer can confirm the system's voice recognition status. Information corresponding to the destination name or service menu name that the customer speaks again is then provided through the same process as in the scenario above.

The information provision according to the present invention is not limited to information provision in a wireless environment, and may be variously modified and changed without departing from its technical gist.

Although the present invention has been described with reference to what is considered the most practical and preferred embodiment, the present invention is not limited to the disclosed embodiment and includes various modifications and equivalents falling within the scope of the following claims.

As described above, according to an embodiment of the present invention, when information is provided using voice recognition, the user terminal preprocesses the user's input voice signal so that voice recognition can be performed on it and then transmits it, which prevents distortion during voice data transmission. Voice recognition can therefore be performed more efficiently in a wireless environment, and the requested information can be provided accurately to users' terminals.

In addition, because the user terminal includes only the means needed for the preprocessing, problems such as increased cost and a more complex structure can be avoided. As a result, voice recognition through the system is performed without distortion while the processing load on the terminal is reduced.

In addition, because the voice recognition status of the system is displayed on the terminal, the user can easily check it.

Claims

A method for providing information in response to a request from a user terminal, in a system connected to at least one user terminal and an agent terminal through a wireless network, the method comprising:

a) the user terminal requesting the system to provide information, converting the user's analog voice into a digital signal and compressing it using a separately configured codec, processing it into a predetermined format that can be transmitted through the wireless network, and then transmitting the resulting streaming voice data to the system through the network;

b) the system receiving the streaming voice data transmitted from the user terminal when there is a request for information provision through the user terminal;

c) the system restoring the streaming voice data to generate voice data;

d) the system performing end point detection on the voice data;

e) the system performing voice recognition on the voice data based on the detected end point; and

f) finding the corresponding information according to the voice recognition result and transmitting it to the user terminal.

delete

The method of claim 1, wherein

the agent terminal is divided into a group 1 terminal and a group 2 terminal, and

step f) further comprises:

the system transmitting, to a group 1 terminal, the voice recording file transmitted from the user terminal and text information including a recognized word list according to the voice recognition result, when the voice recognition result is below a preset reference value; and

finding information based on the result of recognition performed at the group 1 terminal using the text information and the voice recording file, and transmitting the information to the user terminal.

The method of claim 3,

further comprising the system connecting the group 2 terminal to the user terminal by call when recognition by the group 1 terminal fails,

wherein the group 2 terminal recognizes the user's request based on the voice signal provided from the user terminal and provides the information to the user terminal.

The method of claim 3,

further comprising, when the voice recognition succeeds, processing the voice recognition result into at least one of text, voice, video, and composite media information and providing the processed result to the user terminal.

The method of claim 1,

wherein the streaming voice data is transmitted and received between the user terminal and the system over a data network on the network.

An information providing system connected to at least one user terminal and an agent terminal through a wireless network to provide information in response to a request from the user terminal, the system comprising:

a user terminal that requests information provision, converts the user's analog voice as it is into a digital signal and compresses it using a separately configured codec, processes it into a predetermined format that can be transmitted through the wireless network, and then transmits the resulting streaming voice data;

a voice processing unit that, in response to a request for information provision from the user terminal, receives the streaming voice data, in which the user's analog voice has been converted as it is into a digital signal, compressed, and processed into a predetermined format that can be transmitted through the wireless network, and processes it to generate voice data;

an end point detection unit that performs end point detection on the generated voice data;

a voice recognition engine that performs voice recognition on the voice data based on the detected end point; and

a service controller that finds the corresponding information according to the voice recognition result and transmits it to the user terminal.

The system of claim 7, wherein

the agent terminal is divided into a group 1 terminal and a group 2 terminal,

the system further comprising an agent management server that selects, from among the group 1 or group 2 terminals, a terminal to respond to the request for information provision from the user terminal.

The system of claim 8,

wherein, when the voice recognition result is below a preset reference value, the selected group 1 terminal performs voice recognition through an agent, based on the voice recording file transmitted from the user terminal and text information including a recognized word list according to the voice recognition result.

The system of claim 9,

further comprising a call processing unit that connects one of the group 2 terminals with the user terminal by call when recognition fails even at the group 1 terminal.

The system of claim 10,

wherein the call processing unit comprises:

a call distribution server that connects a call between the user terminal and the group 2 terminal; and

a response processing server that delivers the voice signal provided from the call-connected user terminal to the group 2 terminal and provides the response information supplied from the group 2 terminal to the user terminal.

The system of any one of claims 7 to 11,

wherein the user terminal comprises:

an interface unit through which a user voice signal is input;

a voice signal processor that processes the user voice signal input through the interface unit into streaming voice data using a separately installed voice codec;

a transceiver that transmits the streaming voice data to the system through the network and receives signals transmitted from the system; and

a display unit that displays processing information, including the information transmitted from the system as a result of the voice recognition processing.

delete