KR20190107351A

KR20190107351A - System and method for minimizing service delays for user voice based on terminal

Info

Publication number: KR20190107351A
Application number: KR1020180028496A
Authority: KR
Inventors: 정영섭; 김영진
Original assignee: 순천향대학교 산학협력단
Priority date: 2018-03-12
Filing date: 2018-03-12
Publication date: 2019-09-20

Abstract

Disclosed are a terminal-centered voice conversation system and method that minimize service delay for a user's voice. According to one aspect of the present invention, the series of processing steps for providing the voice conversation service is performed on the terminal, so the response speed is fast.

Description

SYSTEM AND METHOD FOR MINIMIZING SERVICE DELAYS FOR USER VOICE BASED ON TERMINAL

The present invention relates to a voice conversation system and, more particularly, to a terminal-centered voice conversation system and method that minimize service delay for a user's voice by receiving from a server only the resources needed to provide the conversation service and providing the voice conversation service on the terminal.

As communication equipment and network technology have advanced, users now use voice conversation services to obtain desired information through voice input on terminals such as mobile phones.

However, in a voice conversation service according to the related art, when a user inputs a voice through a terminal, the conversation-service information corresponding to that voice is obtained through a series of processing steps performed on the server where the resources reside. When the processing of the voice conversation service is server-centered in this way, functions are easy to update and the service can still be provided when the terminal changes, but the response speed becomes slower. In addition, when the server is not operating, the voice conversation system is paralyzed and the service cannot be provided smoothly, and when the server is under load, the quality of the voice conversation service delivered to terminal users degrades.

Korean Patent Publication No. 2017-0043955 (published Apr. 24, 2017)

The present invention has been proposed to solve the above problems. Its object is to provide a terminal-centered voice conversation system and method that minimize service delay for a user's voice, in which only the resources needed to provide the conversation service are received from a server and the processing for providing the voice conversation service is performed on the terminal.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from an embodiment of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

To achieve the above object, one aspect of the present invention provides a voice conversation system including a user terminal that provides a voice conversation service for a voice uttered by a user, and a resource server that provides a resource to the user terminal when the user terminal requests a resource for providing the voice conversation service. The user terminal includes: a voice recognition unit that recognizes the user's voice from the uttered voice received from an input unit; a voice conversation processing unit that analyzes the recognized voice, requests the necessary resource from the resource server, receives it, and processes the voice conversation; a voice conversation delay processing unit that provides a delay scenario to the user when the voice conversation processing unit does not process the voice conversation within a preset period of time; and a voice generation unit that generates voice for delivering the voice conversation processed by the voice conversation processing unit and the delay scenario provided by the voice conversation delay processing unit.

The voice conversation processing unit understands the natural language needed for the voice conversation from the recognized user's voice, based on a context management database composed of unstructured text data collected in advance; extracts the key words needed for the voice conversation from the understood natural language; requests a resource for the key words from the resource server and receives it; and processes the voice conversation by generating, based on the resource, natural language corresponding to the voice uttered by the user.

When the voice conversation processing unit does not process the voice conversation within the preset period of time, the voice conversation delay processing unit asks the user for another voice conversation or provides the user with a list of voice conversation services that can be provided immediately.

To achieve the above object, another aspect of the present invention provides a voice conversation method in a voice conversation system including a user terminal that provides a voice conversation service for a voice uttered by a user, and a resource server that provides a resource to the user terminal when the user terminal requests a resource for providing the voice conversation service. The method includes: recognizing, by the user terminal, the user's voice from the voice uttered by the user; analyzing, by the user terminal, the recognized voice, requesting the necessary resource from the resource server, receiving it, and processing the voice conversation; providing, by the user terminal, a delay scenario to the user when the voice conversation is not processed within a preset period of time; and generating, by the user terminal, voice for delivering the processed voice conversation and the delay scenario.

The step of analyzing the recognized voice, requesting the necessary resource from the resource server, receiving it, and processing the voice conversation includes: understanding the natural language needed for the voice conversation from the recognized user's voice, based on a context management database composed of unstructured text data collected in advance; extracting the key words needed for the voice conversation from the understood natural language, and requesting a resource for the key words from the resource server and receiving it; and processing the voice conversation by generating, based on the received resource, natural language corresponding to the voice uttered by the user.

The step of providing a delay scenario to the user when the voice conversation is not processed within the preset period of time includes, when the voice conversation processing unit does not process the voice conversation within the preset period of time, asking the user for another voice conversation or providing the user with a list of voice conversation services that can be provided immediately.

According to one aspect of the present invention, the series of processing steps for providing the voice conversation service is performed on the terminal, so the response speed is fast.

In addition, only the resources needed to provide the voice conversation service are requested and received from the server, and the series of processing steps for providing the service is performed on the user's terminal. This minimizes the load on the server and improves the quality of the voice conversation service the user receives.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of the technical idea of the present invention; the present invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a diagram showing the configuration of a voice conversation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the flow of a voice conversation method according to an embodiment of the present invention.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, a detailed description of related known technology is omitted where it could unnecessarily obscure the gist of the invention. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 is a diagram showing the configuration of a voice conversation system according to an embodiment of the present invention.

Referring to FIG. 1, the voice conversation system according to the present embodiment includes a user terminal 100 and a resource server 200.

The user terminal 100 provides a voice conversation service for a voice uttered by the user, and the resource server 200 provides the corresponding resource to the user terminal 100 when the user terminal 100 requests a resource for providing the voice conversation service. Here, a resource may be information for providing the voice conversation service. For example, if the voice uttered by the user is "What is today's weather?", the resource may be weather information. The user terminal 100 and the resource server 200 may be connected through a network to communicate. The network may be any network capable of transmitting and receiving data over the Internet Protocol using various wired and wireless communication technologies, such as the Internet, an intranet, a mobile communication network, or a satellite communication network. The term network may collectively refer not only to closed networks such as a LAN (Local Area Network) or WAN (Wide Area Network) and open networks such as the Internet, but also to networks such as CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), GSM (Global System for Mobile Communication), LTE (Long Term Evolution), and EPC (Evolved Packet Core), as well as next-generation networks and computing networks to be implemented in the future.

In the present embodiment, the user terminal 100 is a device capable of voice conversation that includes an input unit 110 for receiving voice from the user and an output unit 160 for outputting a response to it (for example, a conversation service corresponding to the input voice). It may be, for example, a smartphone, a mobile phone, a PDA (Personal Digital Assistant), or a PMP (Portable Multimedia Player). It may also be a robot capable of conversation.

The user terminal 100 includes an input unit 110, a voice recognition unit 120, a voice conversation processing unit 130, a voice conversation delay processing unit 140, a voice generation unit 150, and an output unit 160.

The input unit 110 receives the voice uttered by the user and may be a device such as a microphone.

The voice recognition unit 120 recognizes the user's voice from the uttered voice received from the input unit 110. That is, the voice recognition unit 120 may convert the voice uttered by the user into text.

The voice conversation processing unit 130 analyzes the user's voice recognized by the voice recognition unit 120 to identify the necessary resource, requests the identified resource from the resource server 200, receives it, and processes the voice conversation.

The voice conversation processing unit 130 understands the natural language needed for the voice conversation from the recognized user's voice, based on a context management database composed of unstructured text data collected in advance. It then extracts the key words needed for the voice conversation from the understood natural language, requests a resource for the key words from the resource server 200, and receives it. In more detail, the voice conversation processing unit 130 parses the user's utterance, converted into text, to determine the structure of the sentence and the syntactic role of each word, and analyzes the correlation between adjacent words to identify the collective dependency relations among the words; from these it can extract the key words. The voice conversation processing unit 130 then processes the voice conversation by generating, based on the received resource, natural language corresponding to the voice uttered by the user.
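The terminal-side flow described above (extract key words, request only the matching resource, generate a reply) can be sketched as follows. This is a minimal illustration, not the patented implementation: a toy stop-word filter stands in for the syntactic analysis, an in-memory dictionary stands in for the resource server 200, and the names `extract_keywords`, `FakeResourceServer`, and `process_dialogue` are hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: a tiny stop-word list replaces the syntactic analysis and the
# context management database described in the text.
STOP_WORDS = {"what", "is", "the", "of", "a", "please", "tell", "me"}


def extract_keywords(utterance: str) -> List[str]:
    """Return lower-cased tokens that are not stop words (toy stand-in for parsing)."""
    tokens = [t.strip("?.!,").lower() for t in utterance.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


class FakeResourceServer:
    """In-memory stand-in for the resource server 200 (hypothetical)."""

    def __init__(self, resources: Dict[str, str]) -> None:
        self._resources = resources

    def request(self, keyword: str) -> Optional[str]:
        """Return the resource for a key word, or None if the server has none."""
        return self._resources.get(keyword)


def process_dialogue(utterance: str, server: FakeResourceServer) -> str:
    """Extract key words on the terminal, fetch only the needed resource, build a reply."""
    for keyword in extract_keywords(utterance):
        resource = server.request(keyword)
        if resource is not None:
            return f"Here is the {keyword} information: {resource}"
    return "Sorry, I could not find information for that request."
```

Only the key word travels to the server in this sketch; understanding the request and generating the reply stay on the terminal, which is the division of labor the description emphasizes.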

Meanwhile, the natural language may be a sentence supplied through an operation of the user terminal 100 such as search, question answering, or chat, and in the case of search or question answering it may be obtained through voice recognition. In this case, a text-based recognition result is preferably supplied as the input sentence; when only a voice signal is input, however, the signal may be passed to the conversation processing unit, which generates the text-based recognition result.

In addition, to understand the natural language needed for the voice conversation from the recognized user's voice based on the context management database composed of unstructured text data collected in advance, the voice conversation processing unit 130 may, more specifically, convert the voice uttered by the user into text, decompose the converted text into morphemes, append a dummy morpheme after the last morpheme, and generate part-of-speech information for the morphemes and relation information between the morphemes.
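As a rough illustration of the dummy-morpheme step, the sketch below assumes the text has already been split into (surface form, part-of-speech) pairs by some morphological analyzer, which is not shown. The `DUMMY` marker, the tag names, and both helper functions are hypothetical; the patent does not specify how relation information is represented, so adjacent tag pairs are used here purely as an example.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, part-of-speech tag)

# Assumption: a sentinel morpheme marking the end of the utterance.
DUMMY: Morpheme = ("<END>", "DUMMY")


def add_dummy(morphemes: List[Morpheme]) -> List[Morpheme]:
    """Return a new list with the dummy morpheme appended after the last morpheme."""
    return list(morphemes) + [DUMMY]


def adjacent_relations(morphemes: List[Morpheme]) -> List[Tuple[str, str]]:
    """Pair each morpheme's tag with its successor's tag as simple relation info."""
    return [(a[1], b[1]) for a, b in zip(morphemes, morphemes[1:])]
```

With the dummy appended, the last real morpheme also has a right-hand neighbor, so every morpheme appears on the left side of exactly one relation pair.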

The voice conversation delay processing unit 140 may ask the user for another voice input when the voice conversation processing unit 130 does not process the voice conversation. Specifically, when the voice conversation processing unit 130 does not process the voice conversation within a preset period of time, the voice conversation delay processing unit 140 may provide a delay scenario to the user. The delay scenario keeps the user from becoming bored when the voice conversation processing unit 130 cannot yet deliver the voice conversation service for the user's utterance; it may consist of asking the user for another voice conversation or providing the user with a list of voice conversation services that can be provided immediately. That is, when the voice conversation processing unit 130 does not process the voice conversation within the preset period of time, the voice conversation delay processing unit 140 determines that the voice conversation service is delayed and either asks the user for another voice conversation or provides the user with a list of voice conversation services that can be provided immediately.
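One way to detect such a delay is to run the dialogue processing in a worker thread and wait on it with a timeout. The sketch below uses Python's standard `concurrent.futures` module; the function name `answer_or_delay`, the prompt text, and the choice of a thread pool are assumptions made for illustration, not details from the patent.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional, Tuple

# Assumption: wording modeled on the delay scenario dialogue in the description.
DELAY_PROMPT = "I will let you know as soon as I find it. Is there anything else you need?"


def answer_or_delay(
    executor: ThreadPoolExecutor,
    process: Callable[[str], str],
    utterance: str,
    timeout_s: float,
) -> Tuple[str, Optional[Future]]:
    """Return (reply, pending); pending is the still-running job if a delay occurred."""
    future = executor.submit(process, utterance)
    try:
        # Processing finished within the preset time: answer normally.
        return future.result(timeout=timeout_s), None
    except FutureTimeout:
        # Preset time exceeded: treat it as a delay and offer the delay scenario.
        return DELAY_PROMPT, future
```

The returned `Future` lets the caller deliver the original answer later, once the delayed request completes, while the user continues with another request in the meantime.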

The voice generation unit 150 may generate voice for delivering the voice conversation processed by the voice conversation processing unit 130 and the delay scenario provided by the voice conversation delay processing unit 140.

The output unit 160 outputs the voice generated by the voice generation unit 150 to the user and may be a device such as a speaker.

Meanwhile, the user terminal 100 may further include a preprocessing unit (not shown) for removing noise from the voice uttered by the user. The voice may contain ambient noise in addition to the user's spoken message, and accurate voice recognition may not be possible while such noise is present. To address this, the user terminal 100 may use the preprocessing unit (not shown) to remove the noise contained in the user's uttered voice. Specifically, the preprocessing unit (not shown) may include any one of LowPassFilter, HighPassFilter, and BandPassFilter; it may set a characteristic frequency based on the user's voice and apply the filter around that frequency to filter out unwanted signals.
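The filtering idea can be sketched with simple first-order filters in pure Python. This is only an illustration under stated assumptions: the patent does not specify the filter design, so single-pole RC-style filters are used here, and the 300 to 3400 Hz pass band and 8 kHz sample rate in the usage note are conventional telephony values chosen for the example rather than values from the source.

```python
import math
from typing import List


def low_pass(samples: List[float], cutoff_hz: float, rate_hz: float) -> List[float]:
    """First-order low-pass filter: attenuates content above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate_hz
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def high_pass(samples: List[float], cutoff_hz: float, rate_hz: float) -> List[float]:
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate_hz
    alpha = rc / (rc + dt)
    out, prev_y, prev_x = [], 0.0, 0.0
    for x in samples:
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out


def band_pass(samples: List[float], low_hz: float, high_hz: float, rate_hz: float) -> List[float]:
    """Keep roughly the band between low_hz and high_hz by cascading both filters."""
    return low_pass(high_pass(samples, low_hz, rate_hz), high_hz, rate_hz)
```

For example, `band_pass(samples, 300.0, 3400.0, 8000.0)` would pass a 1 kHz tone largely intact while strongly attenuating a constant offset or 50 Hz mains hum. A production preprocessor would likely use higher-order filters for a sharper roll-off.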

In addition, the user terminal 100 may further include a storage unit (not shown), which may store a program (application) that performs the voice conversation processing method (natural language understanding, context management, natural language generation, and so on).

Although the components above are described as operating individually in the present embodiment, they are not limited to this and may operate together under the control of a control unit (not shown).

The resource server 200 may store information related to the resources requested by the user terminal 100.

FIG. 2 is a diagram showing the flow of a voice conversation method according to an embodiment of the present invention.

Referring to FIG. 2, the user terminal 100 first recognizes the user's voice from the voice uttered by the user (S210).

Next, the user terminal 100 analyzes the recognized voice, requests the necessary resource from the resource server 200, receives it, and processes the voice conversation (S220).

Next, when the voice conversation is not processed within the preset period of time, the user terminal 100 provides a delay scenario to the user (S230). For example, the user terminal 100 may provide delay scenarios such as the following.

- Delay Scenario 1 -

User: Tell me Team A's game schedule for this week.

System: I will let you know as soon as I find the game information. Is there anything else you need?

User: Tell me today's weather.

System: Today's weather is chilly at 19 degrees, with a 60% chance of snow.

System: Here is the Team A game schedule for this week that you requested. There are games this Wednesday at 19:00 at K Stadium and on Friday at 19:00 at X Stadium.

- Delay Scenario 2 -

System: I could not find the Team A game schedule for this week that you requested. There seems to be a problem with the server.

In this way, the user terminal 100 provides the voice conversation service centered on the user terminal 100 rather than on the server 200, so the series of processing steps is performed on the terminal and the response speed of the voice conversation service can be improved. In addition, only the resources needed to provide the voice conversation service are requested and received from the server 200, and the series of processing steps for providing the service is performed on the user's terminal; this minimizes the load on the server 200 and improves the quality of the voice conversation service the user receives. Furthermore, applying the delay scenario keeps the user from becoming bored while waiting for the voice conversation service.

Thereafter, the user terminal 100 generates voice for delivering the processed voice conversation and the delay scenario (S240).
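Steps S210 to S240 can be summarized as a small pipeline. The sketch below is a hypothetical arrangement, not the patent's implementation: each stage is injected as a plain callable so that real speech recognition, dialogue processing, and speech synthesis components could be substituted, and `process` is assumed to return `None` when no answer is ready within the preset time.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceTerminal:
    """Hypothetical terminal-side pipeline mirroring steps S210 to S240."""

    recognize: Callable[[bytes], str]          # S210: uttered voice -> text
    process: Callable[[str], Optional[str]]    # S220: text -> reply, or None if delayed
    delay_scenario: Callable[[str], str]       # S230: fallback prompt for the user
    synthesize: Callable[[str], bytes]         # S240: reply text -> output voice

    def handle(self, audio: bytes) -> bytes:
        """Run one turn of the conversation and return the audio to play."""
        text = self.recognize(audio)
        reply = self.process(text)
        if reply is None:
            # No answer within the preset time: fall back to the delay scenario.
            reply = self.delay_scenario(text)
        return self.synthesize(reply)
```

Because every stage runs on the terminal object itself, the only remote call in this arrangement would be whatever resource request the injected `process` callable makes.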

The methods according to embodiments of the present invention may be implemented as an application or in the form of program instructions executable by various computer components, and may be recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although this specification includes many features, these features should not be construed as limiting the scope of the invention or the claims. Features described in separate embodiments of this specification may be combined and implemented in a single embodiment. Conversely, features described in a single embodiment of this specification may be implemented separately in multiple embodiments or in any suitable combination.

Although operations are described in a particular order in the drawings, this should not be understood to mean that the operations must be performed in the particular order shown or in sequential order, or that all of the described operations must be performed, to obtain the desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, the separation of the various system components in the embodiments described above should not be understood as requiring such separation in all embodiments. The app components and systems described above may generally be integrated into a single software product or packaged into multiple software products.

The present invention described above may be substituted, modified, and changed in various ways by those of ordinary skill in the art to which the present invention pertains without departing from its technical idea, and is therefore not limited by the foregoing embodiments and the accompanying drawings.


Claims

A terminal-centered voice conversation system for minimizing service delay for a user's voice, the system including a user terminal that provides a voice conversation service for a voice uttered by a user and a resource server that provides a resource to the user terminal when the user terminal requests a resource for providing the voice conversation service,
wherein the user terminal comprises:
a voice recognition unit that recognizes the user's voice from the uttered voice received from an input unit;
a voice conversation processing unit that analyzes the recognized voice, requests the necessary resource from the resource server, receives it, and processes the voice conversation;
a voice conversation delay processing unit that provides a delay scenario to the user when the voice conversation processing unit does not process the voice conversation within a preset period of time; and
a voice generation unit that generates voice for delivering the voice conversation processed by the voice conversation processing unit and the delay scenario provided by the voice conversation delay processing unit.

The system of claim 1,
wherein the voice conversation processing unit
understands, from the recognized user's voice, the natural language required for the voice conversation based on a context management database consisting of unstructured text data collected in advance, extracts a key word required for the voice conversation from the understood natural language, requests and receives a resource for the key word from the resource server, and generates a natural language corresponding to the voice spoken by the user based on the received resource, thereby processing the voice conversation.

The system of claim 1,
wherein the voice conversation delay processing unit,
when the voice conversation is not processed by the voice conversation processing unit for the predetermined time,
requests another voice conversation from the user or provides the user with a list of voice conversation services that can be provided immediately.

A voice conversation method in a voice conversation system including a user terminal that provides a voice conversation service for a voice spoken by a user, and a resource server that provides a resource to the user terminal when a resource request for providing the voice conversation service is received from the user terminal, the method comprising:
recognizing, by the user terminal, the user's voice from the voice spoken by the user;
analyzing, by the user terminal, the recognized user's voice, and requesting and receiving a required resource from the resource server to process a voice conversation;
providing, by the user terminal, a delay scenario to the user when the voice conversation is not processed for a predetermined time; and
generating, by the user terminal, a voice for delivering the processed voice conversation and the delay scenario.

The method of claim 4, wherein
the step of analyzing the recognized user's voice and requesting and receiving the required resource from the resource server to process the voice conversation comprises:
understanding, from the recognized user's voice, the natural language required for the voice conversation based on a context management database consisting of unstructured text data collected in advance;
extracting a key word required for the voice conversation from the understood natural language, and requesting and receiving a resource for the key word from the resource server; and
generating a natural language corresponding to the voice spoken by the user based on the received resource to process the voice conversation.

The method of claim 4, wherein
the step of providing the delay scenario to the user when the voice conversation is not processed for the predetermined time comprises:
when the voice conversation is not processed by the voice conversation processing unit for the predetermined time, requesting another voice conversation from the user or providing the user with a list of voice conversation services that can be provided immediately.
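
The terminal-side flow recited in claims 4 to 6 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this publication. All names (`extract_keyword`, `process_conversation`, `delay_scenario`, `respond`, `LOCAL_SERVICES`), the 1-second default time limit, and the reply wording are hypothetical. Speech recognition and speech synthesis are omitted, so the utterance is assumed to be already recognized text, the key word is found by simple word matching rather than by a context management database, and the resource server is represented by a caller-supplied function.

```python
import concurrent.futures

# Hypothetical services the terminal could offer immediately, without the server.
LOCAL_SERVICES = ["alarm", "timer", "calculator"]


def extract_keyword(utterance, known_keywords):
    """Return the first known key word found in the utterance, or None."""
    for word in utterance.lower().split():
        if word in known_keywords:
            return word
    return None


def process_conversation(utterance, fetch_resource, known_keywords):
    """Find the key word, request its resource, and build a text reply."""
    keyword = extract_keyword(utterance, known_keywords)
    if keyword is None:
        return "Sorry, I did not understand that."
    # fetch_resource stands in for the request to the resource server.
    resource = fetch_resource(keyword)
    return f"Here is the {keyword} information: {resource}"


def delay_scenario():
    """Reply used when processing does not finish within the time limit."""
    return ("That is taking a while. You can ask something else, or try: "
            + ", ".join(LOCAL_SERVICES))


def respond(utterance, fetch_resource, known_keywords, timeout_s=1.0):
    """Return the processed reply, or the delay scenario on timeout."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(
        process_conversation, utterance, fetch_resource, known_keywords
    )
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return delay_scenario()
    finally:
        # Do not block on the still-running request after a timeout.
        executor.shutdown(wait=False)
```

In this sketch a call such as `respond("weather today", fetch, {"weather"})` returns the generated reply when `fetch` answers within the limit, and otherwise returns the delay scenario text, which asks for another request and lists the immediately available services. A real terminal would pass the returned text to its voice generation step.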