KR102411619B1

KR102411619B1 - Electronic apparatus and the controlling method thereof

Info

Publication number: KR102411619B1
Application number: KR1020150128511A
Authority: KR
Inventors: 최형탁; 황인철; 김덕호; 이정섭; 전희식
Original assignee: 삼성전자주식회사 (Samsung Electronics Co., Ltd.)
Priority date: 2015-05-11
Filing date: 2015-09-10
Publication date: 2022-06-21
Also published as: KR20160132748A

Abstract

An electronic device is disclosed. The electronic device includes a storage unit that stores domain information categorized by conversation topic, a speaker unit that outputs a system response corresponding to a user utterance, and a processor that detects a domain corresponding to the user utterance and generates the system response by determining, based on the confidence between the user utterance and the detected domain, which of the detected domain and the previous domain is to process the user utterance. Accordingly, a variety of user utterances can be recognized, and system responses can be generated based on the functions of a plurality of terminal devices.

Description

ELECTRONIC APPARATUS AND THE CONTROLLING METHOD THEREOF

The present invention relates to an electronic device and a method for controlling the same, and more particularly, to an electronic device that generates a system response corresponding to a user utterance, and a method for controlling the same.

With the development of electronic technology, various types of electronic products are being developed and distributed. In particular, display devices such as TVs, mobile phones, PCs, notebook PCs, and PDAs are widely used in most households.

As the use of display devices has grown, user demand for more diverse functions has also increased. Manufacturers have accordingly stepped up their efforts to meet these demands, and products with previously unavailable functions are appearing one after another.

In particular, households use not only such display devices but also various home appliances such as refrigerators, air conditioners, and electric lights, and home network systems that connect these appliances over a network and control them have been commercialized.

A user can operate such a home network system directly to achieve a desired purpose; when the home network system supports a voice recognition function, the user can instead achieve that purpose by uttering a voice command.

However, current systems that recognize a user utterance and generate a corresponding response are used only to control the functions provided by a single terminal device. They cannot generate a response by combining and comparing the functions of several terminal devices, and they cannot process a recognized user utterance that falls outside the dialogs programmed into the system.

In addition, when the user and the system are conversing within one specific domain and the user switches to a conversation about another domain, the system cannot process it properly.

Accordingly, there is growing demand for a system that recognizes a variety of user utterances, generates corresponding responses, and bases those responses on the functions of a plurality of terminal devices.

The present invention has been devised to solve the above problems, and an object of the present invention is to provide an electronic device that generates a system response by determining a domain to process a user utterance, and a method for controlling the same.

To achieve this object, an electronic device according to an embodiment of the present invention includes a storage unit that stores domain information categorized by conversation topic, a speaker unit that outputs a system response corresponding to a user utterance, and a processor that detects a domain corresponding to the user utterance and generates the system response by determining, based on the confidence between the user utterance and the detected domain, which of the detected domain and the previous domain is to process the user utterance.

Here, the storage unit may store the conversation topics corresponding to each domain, categorized by context. When the previous domain is determined as the domain to process the user utterance, the processor may determine a context corresponding to the user utterance and generate the system response by determining, based on the confidence between the user utterance and the determined context, which of the determined context and the previous context is to process the user utterance.

In addition, when the determined context is selected as the context to process the user utterance, the processor may store information related to the previous context in the storage unit and, once utterance processing in the determined context ends, process a new utterance based on the stored information related to the previous context.

In addition, when the detected domain is selected as the domain to process the user utterance, the processor may store information related to the previous domain in the storage unit and, once utterance processing in the detected domain ends, process a new utterance based on the stored information related to the previous domain.
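The save-and-resume behavior described in the two preceding paragraphs can be pictured as a stack of dialogue frames. The Python sketch below is an illustration under assumptions, not the patented implementation: the names `DialogueFrame` and `DialogueStack` and their fields are hypothetical. When another domain takes over, the current frame is pushed; when processing in that domain ends, the saved frame is restored so that a new utterance is handled with the earlier state. The same structure would apply to contexts within a domain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogueFrame:
    """Hypothetical record of the state kept for one domain (or context)."""
    domain: str
    slots: Dict[str, str] = field(default_factory=dict)


class DialogueStack:
    """Saves the previous domain's state while another domain is active."""

    def __init__(self) -> None:
        self.current: Optional[DialogueFrame] = None
        self._saved: List[DialogueFrame] = []

    def switch_to(self, domain: str) -> None:
        # Store information related to the previous domain before switching.
        if self.current is not None:
            self._saved.append(self.current)
        self.current = DialogueFrame(domain)

    def finish_current(self) -> Optional[DialogueFrame]:
        # Processing in the active domain has ended: restore the saved domain
        # (if any) so the next utterance is processed with its stored state.
        self.current = self._saved.pop() if self._saved else None
        return self.current


stack = DialogueStack()
stack.switch_to("broadcast")
stack.current.slots["program"] = "news"
stack.switch_to("weather")            # the user digresses to another domain
resumed = stack.finish_current()      # the weather exchange is over
print(resumed.domain, resumed.slots)  # broadcast {'program': 'news'}
```

A stack (rather than a single saved slot) is a design choice that also covers nested digressions, for example a weather question asked in the middle of a scheduling exchange that itself interrupted a broadcast dialogue.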

In addition, the processor may determine the confidence between the user utterance and the detected domain based on a confidence score that reflects whether at least one speech element constituting the user utterance is identical to at least one speech element belonging to the detected domain.

Meanwhile, the electronic device according to an embodiment of the present invention may further include a communication unit that communicates with at least one external device. When the system response corresponding to the utterance is generated based on a context, within the determined domain, that requires controlling a function of the at least one external device, the processor may generate the system response for controlling that function based on information about the functions of the external device.

In addition, the storage unit may further store information about the functions of the external device, the communication unit may receive function information about at least one external device added to a preset network, and the processor may update the information stored in the storage unit based on the received function information.
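The following is a minimal sketch of the stored device-function information, its update when a device joins the network, and its use in building a control response. `DeviceFunctionRegistry`, `build_control_response`, the device identifiers, and the response dictionary format are all hypothetical; the text does not specify how function information is represented or transmitted.

```python
from typing import Dict, Optional, Set


class DeviceFunctionRegistry:
    """Hypothetical store of the functions each networked device supports."""

    def __init__(self) -> None:
        self._functions: Dict[str, Set[str]] = {}

    def update(self, device_id: str, functions: Set[str]) -> None:
        # Called when function information for a newly added device is received.
        self._functions.setdefault(device_id, set()).update(functions)

    def find_device(self, function: str) -> Optional[str]:
        # Return the first registered device that supports the function.
        for device_id, funcs in self._functions.items():
            if function in funcs:
                return device_id
        return None


def build_control_response(registry: DeviceFunctionRegistry,
                           function: str, value: str) -> Dict[str, str]:
    """Build an illustrative system response that controls an external device."""
    device = registry.find_device(function)
    if device is None:
        return {"speech": f"No connected device supports {function}."}
    return {
        "speech": f"Okay, sending {function} to {device}.",
        "target": device,
        "command": f"{function}={value}",
    }


registry = DeviceFunctionRegistry()
registry.update("aircon-1", {"power", "set_temperature"})  # device joins the network
print(build_control_response(registry, "set_temperature", "24")["target"])  # aircon-1
```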

In addition, the processor may generate the system response by determining the domain to process the user utterance based on utterance history information. The utterance history information may include at least one of a previously received user utterance, information related to the domain that processed the previously received user utterance, and the system response corresponding to the previously received user utterance.
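The utterance history information can be modeled as a list of turns, each holding the three items listed above. In the sketch below, `Turn` and `domain_from_history` are hypothetical names, and the fallback rule (reuse the most recent turn's domain when no domain can be detected for an elliptical follow-up such as "What about tomorrow?") is only one plausible way the history might be used; the text does not fix a specific rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One entry of (hypothetical) utterance history information."""
    utterance: str        # previously received user utterance
    domain: str           # domain that processed it
    system_response: str  # system response returned for it


def domain_from_history(history: List[Turn], detected: Optional[str]) -> Optional[str]:
    """Prefer the detected domain; otherwise fall back to the most recent one."""
    if detected is not None:
        return detected
    return history[-1].domain if history else None


history = [Turn("How's the weather in Seoul?", "weather", "Shall I tell you the temperature?")]
print(domain_from_history(history, None))  # weather
```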

Also, the domain information may include at least one of control information for performing a task corresponding to the conversation topic and a conversation pattern for each conversation topic.

Meanwhile, the electronic device according to an embodiment of the present invention may further include a microphone unit that receives the user utterance.

Meanwhile, a method of controlling an electronic device that includes a storage unit storing domain information categorized by conversation topic, according to an embodiment of the present invention, includes detecting a domain corresponding to a user utterance, and generating a system response by determining, based on the confidence between the user utterance and the detected domain, which of the detected domain and the previous domain is to process the user utterance.

Here, the storage unit may store the conversation topics corresponding to each domain, categorized by context. Generating the system response may include, when the previous domain is determined as the domain to process the user utterance, determining a context corresponding to the user utterance and generating the system response by determining, based on the confidence between the user utterance and the determined context, which of the determined context and the previous context is to process the user utterance.

Also, generating the system response may include, when the determined context is selected as the context to process the user utterance, storing information related to the previous context in the storage unit and, once utterance processing in the determined context ends, processing a new utterance based on the stored information related to the previous context.

Also, generating the system response may include, when the detected domain is selected as the domain to process the user utterance, storing information related to the previous domain in the storage unit and, once utterance processing in the detected domain ends, processing a new utterance based on the stored information related to the previous domain.

Also, generating the system response may include determining the confidence between the user utterance and the detected domain based on a confidence score that reflects whether at least one speech element constituting the user utterance is identical to at least one speech element belonging to the detected domain.

Also, generating the system response may include, when the system response corresponding to the utterance is generated based on a context, within the determined domain, that requires controlling a function of at least one external device, generating the system response for controlling that function based on information about the functions of the external device.

Meanwhile, the method of controlling an electronic device according to an embodiment of the present invention may further include receiving function information about at least one external device added to a preset network and updating the previously stored information about the functions of the external device.

In addition, the method of controlling an electronic device according to an embodiment of the present invention may further include generating the system response by determining the domain to process the user utterance based on utterance history information. The utterance history information may include at least one of a previously received user utterance, information related to the domain that processed the previously received user utterance, and the system response corresponding to the previously received user utterance.

In addition, the method of controlling an electronic device according to an embodiment of the present invention may further include receiving the user utterance.

In addition, for a storage medium storing a program that generates a system response according to an embodiment of the present invention, the program may include detecting a domain corresponding to a user utterance, and generating the system response by determining, based on the confidence between the user utterance and the detected domain, which of the detected domain and the previous domain is to process the user utterance.

According to the various embodiments of the present invention described above, a variety of user utterances can be recognized, and system responses can be generated based on the functions of a plurality of terminal devices.

FIG. 1 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a process of processing a user utterance according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a process of processing a user utterance according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the processing performed when the domain that processes a user utterance changes, according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the processing performed when the domain that processes a user utterance does not change, according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the processing performed in the DM module of the present invention.
FIG. 7 is a diagram for explaining a process performed in a system that includes the electronic device 100 and a database containing information about external devices, according to an embodiment of the present invention.
FIG. 8 is a block diagram illustrating the configuration of an electronic device according to another embodiment of the present invention.
FIG. 9 is a block diagram illustrating the configuration of an electronic device according to yet another embodiment of the present invention.
FIG. 10 is a block diagram illustrating a detailed configuration of the electronic device shown in FIG. 1.
FIG. 11 is a diagram of software modules stored in a storage unit according to an embodiment of the present invention.
FIG. 12 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they might unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of a user or operator; their definitions should therefore be based on the content of this entire specification.

FIG. 1 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention.

Referring to FIG. 1, the electronic device 100 includes a storage unit 110, a speaker unit 120, and a processor 130. Here, the electronic device 100 covers any device capable of recognizing a user utterance and processing a system response corresponding to it, and may be implemented as various types of electronic devices such as a TV, an electronic blackboard, an electronic table, a large format display (LFD), a smartphone, a tablet, a desktop PC, a notebook computer, or a home network system server. The electronic device 100 may also be implemented as a system on chip (SoC) that recognizes the user utterance and processes the corresponding system response.

The storage unit 110 may store domain information categorized by conversation topic. The storage unit 110 is a storage medium that stores the various programs needed to operate the electronic device 100 and may be implemented as memory, a hard disk drive (HDD), or the like. For example, the storage unit 110 may include a ROM that stores programs for the operation of the processor 130 and a RAM that temporarily stores data generated as the processor 130 operates, and may further include an electrically erasable and programmable ROM (EEPROM) for storing various kinds of reference data.

In particular, the storage unit 110 may store domain information categorized by conversation topic, that is, information about groups divided according to the topic to which a user utterance belongs. The domain information may include at least one of control information for performing a task corresponding to a conversation topic and a conversation pattern for each conversation topic.

Specifically, the storage unit 110 may store a conversation pattern for each service domain, and to this end may include a corpus database for each service domain. As described above, service domains may be distinguished according to the topic to which a user utterance belongs.

For example, the storage unit 110 may include a first corpus database for a broadcast service domain and a second corpus database for a weather service domain.

In this case, the first corpus database may store various conversation patterns that can occur within the broadcast service domain. For example, it may store "Which program's start time would you like to know?" as the answer to "When does the program start?", and "The start time of ○○○ you asked about is ..." as the answer to "When does ○○○ (broadcast program name) start?".

Likewise, the second corpus database may store conversation patterns that can occur within the weather service domain. For example, it may store "Shall I tell you the temperature?" as the answer to "How's the weather in ○○ (region name)?", and "The temperature in ○○ you asked about is ..." as the answer to "What's the temperature in Seoul?".
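To make the corpus-database idea concrete, the sketch below stores (user pattern, answer template) pairs per service domain and fills the answer from a matched slot. The dictionary layout, the `{slot}` placeholder syntax, and the function names are assumptions for illustration; the specification does not say how conversation patterns are encoded.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical corpus databases: (user pattern, answer template) pairs per domain.
# "{name}" marks a slot, such as a program or region name, copied into the answer.
CORPORA: Dict[str, List[Tuple[str, str]]] = {
    "broadcast": [
        ("When does the program start?", "Which program's start time would you like to know?"),
        ("When does {program} start?", "The start time of {program} you asked about is ..."),
    ],
    "weather": [
        ("How's the weather in {region}?", "Shall I tell you the temperature?"),
        ("What's the temperature in {region}?", "The temperature in {region} you asked about is ..."),
    ],
}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Escape the literal text and turn each {slot} into a named capture group."""
    parts = re.split(r"\{(\w+)\}", pattern)
    body = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part) for i, part in enumerate(parts)
    )
    return re.compile(body)


def answer(domain: str, utterance: str) -> Optional[str]:
    """Return the filled-in answer of the first pattern in the domain that matches."""
    for pattern, template in CORPORA.get(domain, []):
        match = pattern_to_regex(pattern).fullmatch(utterance)
        if match:
            return template.format(**match.groupdict())
    return None


print(answer("broadcast", "When does News 9 start?"))
# The start time of News 9 you asked about is ...
```

Patterns are tried in list order, so the literal "When does the program start?" entry is listed before the slotted one and wins for that exact question.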

In addition, the storage unit 110 may store control commands matched to each user utterance intention. For example, when the user's intention is to change the channel, a control command for changing the channel of a display device (not shown) may be matched and stored, and when the user's intention is scheduled recording, a control command for executing the scheduled recording function for a specific program on the display device (not shown) may be matched and stored.

Similarly, when the user's intention is temperature control, a control command for adjusting the temperature of an air conditioner (not shown) may be matched and stored, and when the user's intention is music playback, a control command for starting playback on a sound output device (not shown) may be matched and stored. In this way, the storage unit 110 may store control commands for controlling various external devices, matched to each user utterance intention.
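The intention-to-command matching described above amounts to a lookup table. The sketch below mirrors the four examples in the text; the intent labels, device identifiers, and action names are hypothetical placeholders, since the specification does not define a command format.

```python
from typing import Dict, NamedTuple, Optional


class ControlCommand(NamedTuple):
    device: str  # hypothetical identifier of the external device
    action: str  # hypothetical command name understood by that device


# Hypothetical table matching utterance intentions to control commands.
INTENT_COMMANDS: Dict[str, ControlCommand] = {
    "change_channel": ControlCommand("display", "set_channel"),
    "reserve_recording": ControlCommand("display", "schedule_recording"),
    "adjust_temperature": ControlCommand("air_conditioner", "set_temperature"),
    "play_music": ControlCommand("sound_output", "play"),
}


def command_for(intent: str) -> Optional[ControlCommand]:
    """Return the stored control command matched to an utterance intention."""
    return INTENT_COMMANDS.get(intent)


print(command_for("adjust_temperature"))
# ControlCommand(device='air_conditioner', action='set_temperature')
```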

The speaker unit 120 may output a system response corresponding to the user utterance. The speaker unit 120 may be implemented as a speaker that outputs the system response as voice, or as an output port, such as a jack, for connecting an external speaker through which the system response is output as voice.

The processor 130 may detect a domain corresponding to the user utterance and generate a system response by determining, based on the confidence between the user utterance and the detected domain, which of the detected domain and the previous domain is to process the user utterance.

Specifically, when a user utterance is input, the processor 130 may analyze it to determine which conversation topic it corresponds to, thereby detecting the domain corresponding to the user utterance.

Here, the processor 130 may convert the user utterance into text using an automatic speech recognition (ASR) module. The ASR module converts a speech signal into text and may use any of various previously disclosed ASR algorithms.

For example, the processor 130 determines the speech section by detecting the start and end of the speech uttered by the user within the received speech signal. Specifically, the processor 130 may calculate the energy of the received speech signal, classify the energy level of the signal according to the calculated energy, and detect the speech section through dynamic programming. The processor 130 then generates phoneme data by detecting phonemes, the minimum units of speech, within the detected speech section based on an acoustic model, and converts the user utterance into text by applying a hidden Markov model (HMM) probability model to the generated phoneme data.
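The energy-based endpoint detection step can be sketched as follows. This is a deliberately simplified stand-in: it uses a single fixed energy threshold instead of the multi-level energy classification and dynamic programming mentioned in the text, and the frame length and threshold values are assumptions (160 samples is 10 ms at 16 kHz).

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping, full-length frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech_section(samples: Sequence[float], frame_len: int = 160,
                          threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices spanning all frames above the threshold.

    Simplified: fixed threshold, no dynamic programming or hangover smoothing.
    """
    energies = frame_energies(samples, frame_len)
    speech = [i for i, e in enumerate(energies) if e > threshold]
    if not speech:
        return None
    return speech[0] * frame_len, (speech[-1] + 1) * frame_len


signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 320  # silence, "speech", silence
print(detect_speech_section(signal))  # (320, 800)
```

The returned span would then be passed to the acoustic model and HMM decoding stages described above.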

The processor 130 may then use a spoken language understanding (SLU) module to perform various analyses on the text corresponding to the user utterance, such as part-of-speech tagging, named entity extraction, information extraction, and semantic analysis, so that the processor 130 can interpret the text.

Thereafter, the processor 130 may detect the domain corresponding to the user utterance by finding the corpus database that contains a conversation pattern matching the text converted from the user utterance.

For example, when the text "When does the program start?" is received, the processor 130 may determine that the user utterance corresponds to the broadcast service domain, and when the text "How's the weather in ○○ (region name)?" is received, it may determine that the user utterance corresponds to the weather service domain.
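Domain detection by corpus lookup can be sketched as a search for the first domain whose patterns match the recognized text. The regular-expression patterns, the lower-casing normalization, and the first-match order are assumptions made for this illustration; a deployed system would likely use fuzzier matching than exact regular expressions.

```python
import re
from typing import Dict, List, Optional

# Hypothetical corpus patterns per domain, written as regular expressions
# over lower-cased text.
DOMAIN_PATTERNS: Dict[str, List[str]] = {
    "broadcast": [r"when does .+ start\?"],
    "weather": [r"how'?s the weather in .+\?", r"what'?s the temperature in .+\?"],
}


def detect_domain(text: str) -> Optional[str]:
    """Return the first domain whose corpus has a pattern matching the whole text."""
    normalized = text.strip().lower()
    for domain, patterns in DOMAIN_PATTERNS.items():
        if any(re.fullmatch(p, normalized) for p in patterns):
            return domain
    return None


print(detect_domain("When does the program start?"))  # broadcast
print(detect_domain("How's the weather in Seoul?"))   # weather
```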

The processor 130 may then analyze the confidence between the user utterance and the detected domain.

Specifically, the processor 130 may determine the confidence between the user utterance and the detected domain based on a confidence score that reflects whether at least one speech element constituting the user utterance is identical to at least one speech element belonging to the detected domain.

For example, the processor 130 may extract a dialogue act, a main action, and component slots from the user utterance. The dialogue act, the main action, and the component slots are all speech elements.

The processor 130 may extract the dialogue act and the main action from the user utterance using a maximum entropy (MaxEnt) classifier and extract the component slots using a conditional random field (CRF). However, the invention is not limited to these methods; the dialogue act, main action, and component slots may be extracted by various known methods.
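A maximum entropy classifier scores each label as P(label | features) = exp(Σ weights of active features) / Z. The sketch below shows only that inference step with hand-set weights; the feature names (such as `w=when`), the two main-action labels, and the weight values are hypothetical, and a real model would learn its weights from annotated utterances.

```python
import math
from typing import Dict, List


def maxent_predict(features: List[str],
                   weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return P(label | features) under a maximum entropy (log-linear) model."""
    scores = {label: sum(w.get(f, 0.0) for f in features) for label, w in weights.items()}
    peak = max(scores.values())  # subtract the max score for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    z = sum(exps.values())       # partition function Z
    return {label: e / z for label, e in exps.items()}


# Hypothetical hand-set weights for two main actions in the broadcast domain.
WEIGHTS = {
    "search_program_time": {"w=start": 2.0, "w=when": 1.0},
    "reserve_program": {"w=record": 2.0, "w=reserve": 2.0},
}

probs = maxent_predict(["w=when", "w=start"], WEIGHTS)
print(max(probs, key=probs.get))  # search_program_time
```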

여기서, 화행은 문장의 형태와 관련된 분류 기준으로, 해당 문장이 서술문(Statement), 요청문(Request), Why 의문문(WH-Question) 또는 Yes-No 의문문(YN-Question)인지를 나타내는 것이다. 주행은 해당 발화가 특정 도메인에서 대화를 통해 원하는 행위를 나타내는 의미적 정보이다. 예를 들어, 방송 서비스 도메인에서, 주행은 TV 온/오프, 프로그램 찾기, 프로그램 시간 찾기, 프로그램 예약 등을 포함할 수 있다. 구성요소는 발화에 나타나는 특정 도메인에 대한 개체 정보즉, 특정 도메인에서 의도하는 행동의 의미를 구체화하기 위해서 부가되는 정보이다. 예를 들어, 방송 서비스 도메인에서 구성요소는 장르, 프로그램명, 시작시간, 채널명, 배우 이름 등을 포함할 수 있다.Here, the dialogue act is a classification criterion related to the form of a sentence, and indicates whether the corresponding sentence is a Statement, a Request, a Why question (WH-Question), or a Yes-No question (YN-Question). Driving is semantic information in which a corresponding utterance indicates a desired action through a conversation in a specific domain. For example, in the broadcast service domain, driving may include TV on/off, program search, program time search, program reservation, and the like. A component is information added to specify the meaning of an action intended in a specific domain, that is, individual information about a specific domain appearing in the speech. For example, in the broadcast service domain, components may include a genre, a program name, a start time, a channel name, an actor name, and the like.

그리고, 프로세서(130)는 사용자 발화 음성으로부터 추출된 화행, 주행 및 구성요소 중 적어도 하나와 검출된 도메인에 속하는 적어도 하나의 발화 요소 간의 동일 여부에 따라 신뢰도 스코어를 산출할 수 있고, 프로세서(130)는 산출된 신뢰도 스코어에 기초하여 사용자 발화 음성과 검출된 도메인 간의 신뢰도를 판단할 수 있다.In addition, the processor 130 may calculate a reliability score according to whether at least one of dialogue acts, driving, and components extracted from the user's speech voice is identical to at least one speech component belonging to the detected domain, and the processor 130 . may determine the reliability between the user's uttered voice and the detected domain based on the calculated reliability score.
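The text says only that the score depends on whether extracted elements match elements belonging to the domain; it gives no formula. The following is a minimal sketch under that reading. The weights, the domain inventories, and names such as `SpeechElements`, `Domain`, and `confidence_score` are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class SpeechElements:
    """Speech elements extracted from one utterance (hypothetical container)."""
    dialogue_act: str                                       # e.g. "WH-Question", "Request"
    main_action: str                                        # e.g. "search_program_time"
    component_slots: Set[str] = field(default_factory=set)  # slot types, e.g. {"program_name"}


@dataclass
class Domain:
    """Speech elements that belong to a domain (hypothetical inventory)."""
    name: str
    dialogue_acts: Set[str]
    main_actions: Set[str]
    slot_types: Set[str]


def confidence_score(utt: SpeechElements, domain: Domain,
                     weights: Tuple[float, float, float] = (10.0, 50.0, 40.0)) -> float:
    """Return a 0-100 score built from which extracted elements also belong to the domain.

    The weights are illustrative assumptions; the text does not specify a formula.
    """
    w_act, w_action, w_slot = weights
    score = 0.0
    if utt.dialogue_act in domain.dialogue_acts:
        score += w_act
    if utt.main_action in domain.main_actions:
        score += w_action
    if utt.component_slots:
        # Fraction of the utterance's slot types that the domain knows about.
        matched = len(utt.component_slots & domain.slot_types)
        score += w_slot * matched / len(utt.component_slots)
    return score


# Hypothetical domain inventories for the examples in the text.
BROADCAST = Domain("broadcast",
                   {"WH-Question", "YN-Question", "Request"},
                   {"tv_power", "search_program", "search_program_time", "reserve_program"},
                   {"genre", "program_name", "start_time", "channel_name", "actor_name"})
WEATHER = Domain("weather",
                 {"WH-Question", "YN-Question"},
                 {"get_forecast"},
                 {"region", "date"})
```

Consider a program start-time question carrying a `program_name` slot. It matches every element of `BROADCAST` (score 100.0) but only the dialogue act of `WEATHER` (score 10.0). A weighted sum is only one plausible choice; a trained classifier's posterior could serve the same role.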

Further, based on the determined confidence between the user's uttered voice and the detected domain, the processor 130 may decide whether the detected domain or the previous domain will process the user's uttered voice.

For example, suppose the previously received user utterance concerned the weather, so the previous domain is the weather domain. If the currently received utterance is "When does the program start?", the processor 130 may detect the broadcast domain. Suppose the confidence score between "When does the program start?" and the weather domain is 10, and the score between the same utterance and the broadcast domain is 80. The processor 130 may then determine that the broadcast domain, not the previous (weather) domain, will process "When does the program start?".

As another example, suppose the previously received utterance was "Close the window if it rains" and it was processed in the weather domain. If the currently received utterance is "Turn on the TV if it rains", the processor 130 may detect both the weather domain and the broadcast domain. Suppose the confidence score between "Turn on the TV if it rains" and the weather domain is 70, and the score between that utterance and the broadcast domain is 50. The processor 130 may then determine that the utterance will be processed in the same domain as before, namely the weather domain.

That is, even if the domain that processed the previous utterance has already been determined, the processor 130 may determine anew, each time an utterance is received, which domain will process it.

In addition, the two confidence scores may be so close that the processor 130 cannot decide which domain should process the current utterance. These are the score between the utterance and the previous domain, and the score between the utterance and its detected domain. In that case, the processor 130 may generate a message asking the user which domain the utterance concerns and output it through the speaker unit 120.
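The domain decision above, including the fallback question when the scores are too close, can be sketched as follows. This is an illustration, not the claimed method. The `margin` threshold that defines "similar" scores and the wording of the clarification question are assumptions.

```python
from typing import Dict, Optional, Tuple


def choose_domain(scores: Dict[str, float], previous: Optional[str],
                  margin: float = 5.0) -> Tuple[Optional[str], Optional[str]]:
    """Pick the domain that should process the current utterance.

    scores   -- confidence of the utterance against each candidate domain,
                i.e. the newly detected domain(s) and the previous domain.
    previous -- domain that handled the previous utterance (None at start).
    margin   -- hypothetical threshold below which two scores count as "similar".

    Returns (domain, None) on a clear decision, or (None, question) when
    the top two scores are too close and the user should be asked.
    """
    if not scores:
        return previous, None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][1] < margin:
        options = " or ".join(name for name, _ in ranked[:2])
        return None, f"Is your request about {options}?"
    return best, None
```

With the scores from the two examples, `{"weather": 10, "broadcast": 80}` selects the broadcast domain and `{"weather": 70, "broadcast": 50}` keeps the previous weather domain.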

Meanwhile, the processor 130 may determine the utterance intention contained in the user's uttered voice using the extracted dialogue act, main action, and component slots.

For example, suppose the user utterance "When is ○○○ (broadcast program name) on?" is received. The processor 130 may search the corpus database for a dialogue pattern matching the utterance and detect that "When is ○○○ on?" belongs to the broadcast service domain.

Next, the processor 130 determines from the dialogue act that the sentence is interrogative. From the main action and key elements, it determines that the user wants to know the "program start time" of "○○○". As a result, the processor 130 may determine that the utterance intention is to "inquire" about the "program start time" of "○○○".

The utterance corresponds to the broadcast service domain, and its intention is to "inquire" about the "program start time" of "○○○". Taking both into account, the processor 130 may finally determine that the user's utterance "inquires" about the "program start time" of "○○○" within the broadcast service domain.
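The combination of domain, dialogue act, main action, and key slot into a final intention can be pictured as building a small record. The `Intent` type, the act-to-request mapping, and the action and slot names below are hypothetical labels, not identifiers from the source.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical mapping from dialogue act to the kind of request it expresses.
ACT_TO_REQUEST = {
    "WH-Question": "inquire",
    "YN-Question": "inquire",
    "Request": "request",
    "Statement": "inform",
}


@dataclass(frozen=True)
class Intent:
    domain: str        # domain chosen to process the utterance
    request_type: str  # derived from the dialogue act
    main_action: str   # e.g. "program_start_time"
    target: str        # value of the key component slot, e.g. a program name


def build_intent(domain: str, dialogue_act: str, main_action: str,
                 slots: Dict[str, str], key_slot: str) -> Intent:
    """Combine domain, dialogue act, main action and key slot into one intent."""
    return Intent(domain=domain,
                  request_type=ACT_TO_REQUEST.get(dialogue_act, "unknown"),
                  main_action=main_action,
                  target=slots.get(key_slot, ""))
```

For "When is ○○○ on?", `build_intent("broadcast", "WH-Question", "program_start_time", {"program_name": "○○○"}, "program_name")` yields an intent that reads "inquire about program_start_time of ○○○ in the broadcast domain".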

In particular, each time a user utterance is received, the processor 130 may determine which domain will process it. It may then determine the user's utterance intention within each domain based on the context contained in that domain.

A conventional speech-recognition-based processing system determined the processing domain once for a received utterance. After that, it could determine the user's utterance intention only from the context within that single domain. By contrast, the processor 130 according to an embodiment of the present invention may determine the processing domain for every utterance as it arrives. It then determines the user's utterance intention within each domain from the context belonging to that domain, and can therefore process a wide variety of user utterances.

Meanwhile, once the domain to process the user's uttered voice is determined, the processor 130 may generate a system response corresponding to the user's uttered voice.

For example, suppose the processor 130 determines that the user's utterance "inquires" about the "program start time" of "○○○" in the broadcast service domain. It then extracts the answer template "The start time of ○○○ you asked about is ..." from the corpus database of the broadcast service domain. In this case, the processor 130 may look up the broadcast start time of "○○○" in the EPG (Electronic Program Guide) information pre-stored in the storage 110. It may then generate the system response "The start time of ○○○ you asked about is 7 o'clock on Saturday."
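Filling the extracted answer template with a value looked up in the EPG can be sketched as below. The dictionaries stand in for the corpus database and the EPG data in storage 110. Their keys, contents, and the fallback message are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Stand-in for EPG data pre-stored in storage 110 (hypothetical values).
EPG: Dict[str, str] = {"○○○": "7 o'clock on Saturday"}

# Stand-in for answer templates in the broadcast domain's corpus database.
TEMPLATES: Dict[Tuple[str, str, str], str] = {
    ("broadcast", "inquire", "program_start_time"):
        "The start time of {program} you asked about is {time}.",
}


def generate_response(domain: str, request_type: str, main_action: str,
                      program: str, epg: Dict[str, str]) -> str:
    """Fill the domain's answer template with the start time looked up in the EPG."""
    template = TEMPLATES[(domain, request_type, main_action)]
    start_time = epg.get(program)
    if start_time is None:
        # Fallback wording is an assumption; the text does not cover this case.
        return f"No schedule information was found for {program}."
    return template.format(program=program, time=start_time)
```

Calling `generate_response("broadcast", "inquire", "program_start_time", "○○○", EPG)` returns "The start time of ○○○ you asked about is 7 o'clock on Saturday."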

The processor 130 may also generate, as the system response, a control command for performing a function corresponding to the user's utterance intention.

For example, suppose the user utterance "Reserve ○○○ (broadcast program name)" is received. In this case, the processor 130 may search the corpus database for a dialogue pattern matching the user's voice and determine that "Reserve ○○○" belongs to the broadcast service domain.

The processor 130 may then determine from the dialogue act that the utterance is a "request"-type sentence. From the main action and key elements, it determines that the user wants a "program reservation" for "○○○". As a result, the processor 130 may determine that the utterance intention is to "request" a "program reservation" for "○○○".

The processor 130 may then retrieve from the storage 110 the control command corresponding to that intention. It may generate a control command for performing a scheduled recording of "○○○" on a display device (not shown). In this case, the processor 130 may also extract, from the corpus database of the broadcast service domain, the response message "Recording of the requested program has been reserved" for that intention. It outputs this message through the speaker unit 120 together with the command.
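Producing both a control command and a spoken confirmation from one intention can be sketched as a table lookup. The device name "display_device", the function "reserve_recording", and the table layout are hypothetical stand-ins for the data the text says is kept in storage 110 and the corpus database.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

IntentKey = Tuple[str, str, str]  # (domain, request_type, main_action)


@dataclass
class ControlCommand:
    target_device: str  # device that should execute the function
    function: str       # function to perform on that device
    argument: str       # e.g. the program to record


# Hypothetical tables standing in for data in storage 110 and the corpus database.
COMMAND_TABLE: Dict[IntentKey, Tuple[str, str]] = {
    ("broadcast", "request", "reserve_program"): ("display_device", "reserve_recording"),
}
RESPONSE_TABLE: Dict[IntentKey, str] = {
    ("broadcast", "request", "reserve_program"):
        "Recording of the requested program has been reserved.",
}


def handle_request(domain: str, request_type: str, main_action: str,
                   target: str) -> Tuple[ControlCommand, str]:
    """Return the control command for an intent plus the message to speak with it."""
    key = (domain, request_type, main_action)
    device, function = COMMAND_TABLE[key]
    return ControlCommand(device, function, target), RESPONSE_TABLE[key]
```

Keying both tables on the same intent tuple keeps the command and its confirmation message in step. An unknown intent raises `KeyError`, which a real system would turn into an error response.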

Meanwhile, the storage 110 stores the conversation topics of each domain, categorized by context. Suppose the previous domain is determined as the domain to process the user's uttered voice. The processor 130 may then determine a context corresponding to the utterance. Based on the confidence between the utterance and that context, it decides whether the determined context or the previous context will process the utterance, and generates a system response.

Here, the per-context categories of conversation topics that the storage 110 holds for each domain may correspond to that domain's corpus database described above. That is, the contexts of each domain include the various dialogue patterns that may occur within the domain. Besides these patterns, they may also include words used in a specific conversation topic, such as specialized terms, proper nouns, and place names, categorized by topic.

Suppose the previous domain is determined as the domain to process the currently received utterance. The processor 130 then determines a context corresponding to the utterance. It compares the confidence between the current utterance and that context with the confidence between the current utterance and the previous context. From this comparison, it determines the context in which to process the current utterance and generates a system response.

For example, suppose the previously received utterance was "When I watch the TV in the large room, turn off the lights in the other rooms and make the large room cool and dark". The processor 130 assigns it to the display device domain. It determines that the context concerns the lights and temperature while watching TV in the large room. Suppose the next utterance is "Which channel is showing a movie?". The processor 130 may again choose the display device domain, as for the previous utterance, and determine that the corresponding context concerns channel information. Suppose the confidence score between "Which channel is showing a movie?" and the channel information context is 80. Suppose also that the score between the same utterance and the lights-and-temperature-while-watching-TV context is 40. The processor 130 may then determine that the channel information context will process "Which channel is showing a movie?".

That is, even if the domain that processed the previous utterance has already been determined, the processor 130 may determine anew, each time an utterance is received, which domain will process it. Even when that domain is the same as the previous one, the processor 130 may determine which of the domain's contexts will process the newly received utterance.

If the determined context is the same as the previous context, the processor 130 processes the newly received utterance within the previous context. If the determined context differs from the previous context, the processor 130 processes the newly received utterance within the determined context.
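Choosing among the contexts of a single domain works the same way as choosing among domains, only one level down. This sketch returns the chosen context and whether it differs from the previous one. Keeping the previous context on a tie is an assumption, and the context names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def choose_context(scores: Dict[str, float],
                   previous_context: Optional[str]) -> Tuple[str, bool]:
    """Pick the context within the current domain that should process the utterance.

    scores -- confidence of the utterance against each context of the domain,
              including the previous context.
    Returns (context, switched). On a tie the previous context is kept, an
    assumption made here so that the conversation does not jump needlessly.
    """
    best = max(scores, key=lambda c: (scores[c], c == previous_context))
    return best, best != previous_context
```

With the example scores (80 for channel information, 40 for the lights-and-temperature context), the function switches to the channel information context.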

Meanwhile, when the determined context is selected to process the user's uttered voice, the processor 130 may store information related to the previous context in the storage 110. When processing of utterances in the determined context ends, it may process a new utterance based on the stored information related to the previous context.

For example, suppose the utterance "When I watch the TV in the large room, turn off the other rooms and make the large room cool and dark" is received. The processor 130 assigns it to the display device domain and determines that the context concerns lights and temperature while watching TV in the large room. In response, it may generate a system response such as "Shall I also turn off the lights in the living room and hallway?"

Then, suppose the utterance "Which channel is showing a movie?" is received. The processor 130 again chooses the display device domain, as for the previous utterance, but determines that the corresponding context concerns channel information. Accordingly, the processor 130 stores information related to the lights-and-temperature context in the storage 110. It then processes utterances received within the channel information context. Here, the information related to the lights-and-temperature context may include:
- the user utterances received within that context,
- the data used to process them, and
- information on the system responses generated for them.

When processing of utterances received in the channel information context ends, the processor 130 may read the information related to the lights-and-temperature context back from the storage 110. It may then process newly received utterances based on that information.

For example, when processing of utterances in the channel information context ends, the processor 130 reads back the lights-and-temperature context information from the storage 110. At the same time, it may generate a voice message such as "Let's continue the conversation we were having." and output it through the speaker unit 120.

Then, based on the lights-and-temperature context, the processor 130 outputs again through the speaker unit 120 the pending voice message "Shall I also turn off the lights in the living room and hallway?". The user may reply with the new utterance "Turn off the living room and turn on the hallway". The processor 130 may then process that utterance based on the lights-and-temperature context.

The above examples describe the case where the previous domain is chosen to process the user's uttered voice. The case where the domain detected for the utterance, rather than the previous domain, is chosen to process it is described next.

When the domain detected for the user's uttered voice is chosen to process it, the processor 130 may store information related to the previous domain in the storage 110. When processing of utterances in the detected domain ends, it may process a new utterance based on the stored information related to the previous domain.

For example, suppose the utterance "When I watch the TV in the large room, turn off the lights in the other rooms and make the large room cool and dark" is received. The processor 130 determines that it belongs to the display device domain and processes it in that domain. At this point, the processor 130 may generate a system response such as "Shall I also turn off the lights in the living room and hallway?"

Then the user says "Oh, and add a wedding to my schedule for next weekend". The processor 130 may detect the schedule domain as the domain corresponding to that utterance. It weighs the confidence between the utterance and the schedule domain against the confidence between the utterance and the display device domain. On that basis, it may choose the schedule domain to process the utterance.

At this point, the processor 130 stores information related to the previous domain, that is, the display device domain, in the storage 110. In the schedule domain, it processes a user utterance such as "Saturday, 12 o'clock" and generates a system response such as "The schedule has been registered", which it outputs through the speaker unit 120. Processing in the schedule domain then ends. The processor 130 reads back the display device domain information from the storage 110. At the same time, it may generate a voice message such as "Let's continue the conversation we were having." and output it through the speaker unit 120.

Then, based on the display device domain, the processor 130 outputs again through the speaker unit 120 the pending voice message "Shall I also turn off the lights in the living room and hallway?". The user may reply with the new utterance "Turn off the living room and turn on the hallway". The processor 130 may then process that utterance based on the display device domain.
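The same suspend-and-resume pattern appears at both levels described above. A changed context within one domain and a switch to a new domain are handled alike: the current topic is saved, the new one is handled, and then the old one is restored along with its unanswered question. The following is a minimal sketch of that bookkeeping. The class and method names are hypothetical. The two lists only echo the roles the text gives the Context Stack 231 and Context History 232; they are not the patent's data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One suspended or active conversation topic (domain plus context)."""
    domain: str
    context: str
    pending_prompt: Optional[str] = None  # last system question still awaiting an answer
    turns: List[str] = field(default_factory=list)


class DialogueState:
    """Hypothetical suspend/resume bookkeeping for topic switches."""

    RESUME_MESSAGE = "Let's continue the conversation we were having."

    def __init__(self, domain: str, context: str) -> None:
        self.current = Frame(domain, context)
        self.context_stack: List[Frame] = []                          # suspended frames
        self.context_history: List[Tuple[str, str, str, str]] = []    # every turn, in order

    def log(self, speaker: str, text: str) -> None:
        self.current.turns.append(f"{speaker}: {text}")
        self.context_history.append((self.current.domain, self.current.context, speaker, text))

    def ask(self, prompt: str) -> None:
        """Record a system question that the user has not answered yet."""
        self.current.pending_prompt = prompt
        self.log("system", prompt)

    def switch(self, domain: str, context: str) -> None:
        """Suspend the current frame and start a new topic."""
        self.context_stack.append(self.current)
        self.current = Frame(domain, context)

    def finish(self) -> List[str]:
        """Close the current topic and resume the most recent suspended one.

        Returns the system messages to speak on resumption (empty if nothing to resume).
        """
        if not self.context_stack:
            return []
        self.current = self.context_stack.pop()
        messages = [self.RESUME_MESSAGE]
        if self.current.pending_prompt:
            messages.append(self.current.pending_prompt)
        return messages
```

The wedding example maps onto these methods as follows:
1. `ask` records the light question in the display device frame.
2. `switch("schedule", ...)` suspends that frame.
3. `finish()` returns the resume message followed by the pending light question.

A stack is used because nested interruptions resume in reverse order.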

FIG. 2 is a diagram illustrating a process of processing a user's uttered voice according to an embodiment of the present invention.

Referring to FIG. 2, when a user's uttered voice is input, the processor 130 may use the ASR (Automatic Speech Recognition) module 210 to convert the voice signal into text with reference to the Language Model 211. It may then use the SLU (Spoken Language Understanding) module 220, with reference to the SLU Model 221, to analyze the text input in various ways so that the processor 130 can understand it.

In addition, the processor 130 may use the DM (Dialogue Manager) module 230 to analyze the user's uttered voice and utterance intention and to generate various natural-language dialogues. Specifically, the processor 130 may use the DM module 230 to detect a domain corresponding to the user's uttered voice. Based on the confidence between the utterance and the detected domain, it decides whether the detected domain or the previous domain will process the utterance.

The processor 130 may also use the DM module 230 to determine a context corresponding to the user's uttered voice. Based on the confidence between the utterance and that context, it decides whether the determined context or the previous context will process the utterance. In particular, the processor 130 may store information related to the previous domain or previous context in the Context Stack 231. It may store data related to the conversation between the user and the electronic device 100 in the Context History 232. Here, the Context Stack 231 is a storage space that holds information about the previous context when the conversation with the user switches to another topic. The Context History 232 is a storage space that holds data related to the conversation between the user and the electronic device.

In addition, the processor 130 may use the Context Manager & Task Delegating module 240 to monitor the conversation between the electronic device 100 and the user and to provide information usable in that conversation.

The processor 130 may also use the Context Manager & Task Delegating module 240 to direct at least one of the Action Planning Agent 250, the Family Member Agent 260, the Health Agent 270, and the like to process the user's uttered voice.

Meanwhile, the processor 130 may use the Action Planning Agent 250 to control a function of at least one external device. It does so based on what the DM module 230 has analyzed: the user's uttered voice, the utterance intention, and the domain and context to process the utterance.

The processor 130 may also use the Action Planning Agent 250 to decide, based on the functions and states of the external devices, which external device should perform the function corresponding to the user's uttered voice. Here, the processor 130 may use the Action Planning Agent 250 to determine the functions and states of the external devices from the Action Ontology 251 and the Things Graph DB 252.
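One way to picture how the Action Planning Agent could use an action ontology together with a device graph is a two-step lookup. First, map the requested function to a required capability. Second, keep only the devices that have that capability and are currently usable. Both dictionaries below are hypothetical stand-ins; the text does not describe the schema of the Action Ontology 251 or the Things Graph DB 252.

```python
from typing import Dict, List, Optional

# Hypothetical Action Ontology: function -> capability a device needs for it.
ACTION_ONTOLOGY: Dict[str, str] = {
    "turn_off_light": "light_switch",
    "lower_temperature": "cooling",
    "reserve_recording": "recording",
}

# Hypothetical Things Graph: device -> capabilities, current state, location.
THINGS_GRAPH: Dict[str, dict] = {
    "living_room_light": {"capabilities": {"light_switch"}, "online": True, "room": "living room"},
    "hallway_light": {"capabilities": {"light_switch"}, "online": True, "room": "hallway"},
    "large_room_aircon": {"capabilities": {"cooling"}, "online": False, "room": "large room"},
    "large_room_tv": {"capabilities": {"recording"}, "online": True, "room": "large room"},
}


def plan_action(function: str, room: Optional[str] = None) -> List[str]:
    """Return the devices able to perform `function` right now, optionally in one room."""
    capability = ACTION_ONTOLOGY.get(function)
    if capability is None:
        return []
    return sorted(
        name for name, info in THINGS_GRAPH.items()
        if capability in info["capabilities"]          # device has the function
        and info["online"]                             # and is currently usable
        and (room is None or info["room"] == room)     # and is in the requested room
    )
```

In this toy graph, `plan_action("lower_temperature")` returns an empty list because the only cooling device is offline. That is the kind of state-dependent decision the paragraph above describes.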

Meanwhile, the processor 130 may use the SLU (Spoken Language Understanding) module 280 to convert the generated system response into text that the user can understand.

The processor 130 may then use the TTS (Text to Speech) module 290 to convert the text into a voice signal. In this way, the processor 130 converts the generated system response into a voice signal and outputs it through the speaker unit 120.
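The FIG. 2 modules form a chain in which each stage consumes the previous stage's output. The sketch below shows only that data flow. Every stage is a trivial placeholder because the text does not name concrete speech or language engines. The stub logic, and the name `response_to_text` for the step performed by module 280, are assumptions.

```python
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


def run_pipeline(audio: bytes, stages: List[Stage]) -> Any:
    """Feed the output of each stage into the next, as in FIG. 2."""
    data: Any = audio
    for stage in stages:
        data = stage(data)
    return data


# Placeholder stages. Real engines are not specified in the text; these
# stubs only show what each stage consumes and produces.
def asr(audio: bytes) -> str:                  # module 210: voice signal -> text
    return audio.decode("utf-8")               # stub: pretend audio is UTF-8 text


def slu(text: str) -> dict:                    # module 220: text -> analysis
    act = "WH-Question" if text.endswith("?") else "Request"
    return {"text": text, "dialogue_act": act}


def dm(frame: dict) -> dict:                   # module 230: choose domain, build response
    domain = "broadcast" if "program" in frame["text"] else "unknown"
    return {"domain": domain, "dialogue_act": frame["dialogue_act"]}


def response_to_text(resp: dict) -> str:       # module 280: system response -> text
    return f"[{resp['domain']}] handling a {resp['dialogue_act']}"


def tts(text: str) -> bytes:                   # module 290: text -> voice signal
    return text.encode("utf-8")                # stub: bytes stand in for audio


PIPELINE: List[Stage] = [asr, slu, dm, response_to_text, tts]
```

Keeping the stages as plain callables in a list makes it easy to swap in real components one at a time without changing `run_pipeline`.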

FIG. 3 is a flowchart illustrating a process of processing a user's uttered voice according to an embodiment of the present invention.

Referring to FIG. 3, when a user's uttered voice is input, the processor 130 may perform the following steps:

- convert the voice signal into text using an Automatic Speech Recognition (ASR) module (S310);
- detect the domain corresponding to the uttered voice (S321);
- analyze the confidence between the uttered voice and each of the detected domain and the previous domain (S322).

The processor 130 may also analyze the user's utterance intention (S320). These operations have already been described above, so a detailed description is omitted.

Then, based on the confidence between the uttered voice and each of the detected domain and the previous domain, the processor 130 may determine whether the domain that will process the uttered voice is the same as the previous domain (S330).

If that domain is the same as the previous domain, the processor 130 determines the context corresponding to the uttered voice. It then analyzes the confidence scores between the uttered voice and each of the determined context and the previous context (S340).

Based on these confidence scores, the processor 130 may determine whether the context that will process the uttered voice is the same as the previous context, or whether the utterance can be processed in the current state (S350).

If the context is the same as the previous context, the processor 130 may perform state management based on the previous context (S360). It then completes the processing of the received uttered voice based on that context (S370). The processor 130 may also read back from the storage unit 110 a context that was saved before the current utterance was received, and use it to process a new uttered voice (S380).

The domain that will process the uttered voice may instead be the newly detected domain rather than the previous domain. In that case, the processor 130 stores information related to the previous domain and the previous context (S331). It then performs context state management for the detected domain (S332) to process the uttered voice.

A similar case arises when the domain is the same as the previous domain but the context changes. If the context that will process the uttered voice is the newly detected context rather than the previous one, the processor 130 stores information related to the context that was in progress (S351). It then performs new context state management within the previous domain (S352) to process the uttered voice.
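The branching in steps S330 to S352 can be sketched as follows. This is an illustration only: the patent does not specify how confidences are compared. The rule used here, like all names below, is an assumption: a detected domain or context replaces the previous one only if it differs and its score is strictly higher.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DialogState:
    domain: str
    context: str


def route(prev: Optional[DialogState], detected: DialogState,
          domain_conf: Tuple[float, float], context_conf: Tuple[float, float],
          saved: List[DialogState]) -> Tuple[DialogState, str]:
    """Pick the domain/context that will process the utterance.

    domain_conf and context_conf are (detected_score, previous_score) pairs.
    Returns the new state and a label for the FIG. 3 branch taken.
    """
    if prev is None:
        return detected, "new"
    # S330: does a different detected domain beat the previous domain?
    if detected.domain != prev.domain and domain_conf[0] > domain_conf[1]:
        saved.append(prev)            # S331: store previous domain and context
        return detected, "S332"       # context state management in the new domain
    # S340-S350: same domain; does a different detected context beat the previous one?
    if detected.context != prev.context and context_conf[0] > context_conf[1]:
        saved.append(prev)            # S351: store the in-progress context
        return DialogState(prev.domain, detected.context), "S352"
    return prev, "S360"               # keep managing state in the previous context
```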

FIG. 4 is a diagram illustrating the processing performed when the domain for processing a user's uttered voice changes, according to an embodiment of the present invention.

For example, the user may say, "When I watch TV in the big room, turn off the lights in the other rooms and make this room cool and dark" (411). The processor 130 may determine that this utterance should be processed in a domain concerning lighting and temperature while watching TV. It processes the utterance and generates and outputs a system response such as "Shall I also turn off the living room and hallway lights?" (421).

The user may then say, "Oh, and register a wedding on my schedule for next weekend" (412). The processor 130 detects the schedule domain corresponding to this utterance and analyzes the utterance intention (S431). It then analyzes the confidence between the utterance and the detected schedule domain (S432). Based on the result, it determines whether the domain for processing the utterance is the same as the previous domain concerning lighting and temperature while watching TV (S433).

Here, the processor 130 may determine that the utterance belongs to the schedule domain rather than to the lighting-and-temperature domain. In that case, it may store information related to the context that was in progress in the previous domain in the storage unit 110 (S434). It then generates a context suited to the newly determined schedule domain to process the utterance (S435).

As shown in FIG. 4, the processor 130 may generate and output the system response "What day and what time?" (422). The user replies, "Saturday, 12 o'clock." This utterance still belongs to the same context within the schedule domain, so the processor 130 keeps that context and outputs the system response "Is there any additional information?" (423). The user then says, "Kim Dae-kyung's wedding, at the ○○○ University Alumni Hall in Sinchon" (414). This is again the same context within the schedule domain, so the processor 130 keeps it and outputs the system response "The schedule has been registered" (424). The user may then reply "OK" (415), and the processor 130 may determine that the task in progress has been completed (S436).

The processor 130 then reads the suspended domain concerning lighting and temperature while watching TV, together with its related context, from the storage unit 110 (S437). It may then process a new utterance from the user based on that previous domain and related context (S438).
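The save-and-resume behaviour in steps S434 to S438 can be modelled as a stack of suspended tasks. The sketch below assumes that interpretation. `TaskStack` and its methods are hypothetical names, and an in-memory list stands in for the storage unit 110.

```python
class TaskStack:
    """Hypothetical store for suspended dialog tasks (storage unit 110 stand-in)."""

    def __init__(self):
        self._suspended = []   # most recently suspended task is last
        self.current = None

    def start(self, domain, context):
        """Begin a new task, saving the one in progress (as in S434)."""
        if self.current is not None:
            self._suspended.append(self.current)
        self.current = {"domain": domain, "context": context}

    def complete(self):
        """Finish the current task (S436) and resume the latest suspended one (S437)."""
        self.current = self._suspended.pop() if self._suspended else None
        return self.current
```

Following FIG. 4, starting the schedule task while the lighting task is pending and then completing it returns the lighting task with its context intact.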

For example, the processor 130 may output a system response such as "I will continue with the planning that was in progress." The user then says, "Turn off the living room and turn on the hallway." The processor 130 processes this utterance based on the previous domain concerning lighting and temperature while watching TV and its related context. It may then generate and output the system response "Shall I also turn off the living room and hallway lights?" (426). The user then says, "Make it medium" (417). The processor 130 determines that this utterance still concerns the lighting-and-temperature domain and its related context, and keeps them. It may then generate and output a system response such as "What level should I set the air conditioner to?"

FIG. 5 is a diagram illustrating the processing performed when the domain for processing a user's uttered voice does not change, according to an embodiment of the present invention.

For example, the user may say, "When I watch TV in the big room, turn off the lights in the other rooms and make this room cool and dark" (511). The processor 130 may determine that this utterance should be processed in a domain concerning lighting and temperature while watching TV. It processes the utterance and generates and outputs a system response such as "Shall I also turn off the living room and hallway lights?" (521).

The user may then ask, "What channel is the movie on?" (512). The processor 130 detects the channel-information domain corresponding to this utterance and analyzes the utterance intention (S531). It then analyzes the confidence between the utterance and the detected broadcast-program domain (S532). Based on the result, it determines whether the domain for processing the utterance is the same as the previous domain concerning lighting and temperature while watching TV (S533).

Here, the processor 130 may determine that the domain for this question is the same as the previous lighting-and-temperature domain. In that case, it analyzes, within that domain, the confidence of the channel-information context detected for the utterance (S534). Based on the result, the processor 130 determines whether the context for processing the utterance is the same as the previous context concerning lighting and temperature while watching TV (S535). Here, the processor 130 may determine that the utterance should be processed in the detected channel-information context rather than the previous context. It then stores information related to the context that was in progress in the storage unit 110 (S536).

The processor 130 may then create a new channel-information context within the domain concerning lighting and temperature while watching TV, and process the uttered voice in that context (S537).

That is, in response to "What channel is the movie on?", the processor 130 may generate and output a system response such as "It is on channels 23, 37, 101, 157, and 274" (522). When the user then says "Number 34," the processor 130 may generate and output a system response such as "Changed the channel to 34" (523).

The processor 130 may then determine that the task in progress has been completed (S538). It reads the suspended context concerning lighting and temperature while watching TV from the storage unit 110 (S539-1), and processes new utterances from the user based on it (S539-2).

For example, the processor 130 outputs a system response such as "I will continue with the planning that was in progress" (524). The user then says, "Turn off the living room and turn on the hallway" (514). The processor 130 processes this utterance based on the previous lighting-and-temperature context, and may generate and output the system response "Shall I also turn off the living room and hallway lights?" (525). The user then says, "Make it medium" (515). The processor 130 determines that the utterance still concerns the lighting-and-temperature context and keeps it. It may then generate and output a system response such as "What level should I set the air conditioner to?" (526).

FIG. 6 is a diagram illustrating the processing performed in the DM module of the present invention.

Referring to FIG. 6, suppose the user says, "Manage energy efficiently." The processor 130 may have the Dialog Manager 10 consult the Dialog Context, the Dialog History, and Question Answering, and have the Natural Language Processing (NLP) 640 module analyze the utterance intention. For example, the Dialog Manager 10 sends the NLP 640 module information about the parts of the utterance whose meaning is unclear (610). The NLP 640 module then determines the exact meaning and intention of the utterance. It does so by applying interpretations such as managing unnecessary or unused energy, or applying efficient performance values.

The Dialog Manager 10 may also ask the Context Manager 650 module for information needed to resolve unclear content, conditions, or selections (620). In that case, the processor 130 uses the Context Manager 650 module to check the contexts of the various external devices related to the utterance "Manage energy efficiently." It then analyzes the context related to the utterance and recommends a helpful task (670) or suggests a related task (680).

Likewise, content needed for planning may be requested from the Planning Manager 660 module (630). The processor 130 may then use the Planning Manager 660 module to perform Action Planning for the utterance "Manage energy efficiently" (660). Action Planning produces an appropriate system response by deciding how to combine the various external devices to perform the requested function.

FIG. 7 is a diagram illustrating a process performed in a system that includes the electronic device 100 and a database containing information about external devices, according to an embodiment of the present invention.

Referring to FIG. 7, the Dialog Manager 10, the NLP 640, the Context Manager 650, and the Planning Manager 660 of the processor 130 may request information about the external devices from the database (KB) 20. This information covers the functions, performance, characteristics, and the like of the external devices. The KB Manager 30 and the Things Manager 40, which control the database (KB) 20, process the request. The Batch Manager 50 may then send the requested information to the Dialog Manager 10, the NLP 640, the Context Manager 650, and the Planning Manager 660.

The processor 130 can then use the received information about the external devices to process the uttered voice through the Dialog Manager 10, the NLP 640, the Context Manager 650, and the Planning Manager 660, and generate and output a system response.

FIG. 8 is a block diagram illustrating the configuration of an electronic device according to another embodiment of the present invention.

Referring to FIG. 8, the electronic device 100 includes a storage unit 110, a speaker unit 120, a processor 130, and a communication unit 140. The storage unit 110, the speaker unit 120, and the processor 130 have been described above, so a detailed description is omitted.

The communication unit 140 may communicate with at least one external device. Within the determined domain, the system response may depend on a context that requires controlling a function of at least one external device. In that case, the processor 130 may generate a system response that controls that function, based on information about the functions of the external devices.

Specifically, suppose the user says, "Lower the temperature in the house." The processor 130 may determine a temperature-related domain as the domain for processing this utterance. If the context within this domain requires controlling a function of at least one external device, the processor 130 may search the external devices in the house for those related to temperature control. For example, it may find an air conditioner, windows, and lights. It may then send a control command to each of them:

- turn on the air conditioner to lower the temperature;
- close the windows to increase energy efficiency;
- turn off the lights.
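The temperature example can be pictured as a small plan. Each step maps a capability to a command, and the command is sent to every device with that capability. The plan table, capability names, and command strings below are hypothetical; the patent does not define a command format.

```python
# Hypothetical plan for the temperature-related domain, following the example:
# (capability to look for, command to send to each matching device).
TEMPERATURE_PLAN = [
    ("cooling", "turn_on"),    # turn on the air conditioner to lower the temperature
    ("window", "close"),       # close the windows to increase energy efficiency
    ("lighting", "turn_off"),  # turn off the lights
]


def build_commands(devices, plan=TEMPERATURE_PLAN):
    """devices: mapping of device id -> set of capability names.

    Returns (device_id, command) pairs in plan order.
    """
    commands = []
    for capability, command in plan:
        for device_id, caps in devices.items():
            if capability in caps:
                commands.append((device_id, command))
    return commands
```

Devices without any capability in the plan, such as a TV, receive no command.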

To control the temperature within this domain, the processor 130 may also weigh the functions, performance, and characteristics of the various external devices. On that basis, it selects at least one external device that fits the system response corresponding to the uttered voice.

Here, the storage unit 110 may further store information about the functions of external devices. The communication unit 140 may receive function information about at least one external device added to a preset network.

The processor 130 may then update the information stored in the storage unit 110 based on the received function information.

For example, take the preset network to be the home network, and suppose a new display device is added in the home. The communication unit 140 may receive function information about the new display device, and the processor 130 may update the information stored in the storage unit 110 accordingly. When processing later utterances, the processor 130 can then also take the new display device's functions into account in generating a system response.
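A minimal sketch of this update follows. It assumes the stored function information is a table from device identifier to a set of function names. `DeviceRegistry` is a hypothetical name. Merging newly received functions into an existing entry, rather than replacing it, is an assumed policy.

```python
class DeviceRegistry:
    """Hypothetical function-information table kept in the storage unit 110."""

    def __init__(self):
        self._functions = {}  # device id -> set of function names

    def update(self, device_id, functions):
        """Merge function info received via the communication unit 140."""
        self._functions.setdefault(device_id, set()).update(functions)

    def devices_with(self, function):
        """Return the ids of devices offering `function`, sorted for stable output."""
        return sorted(d for d, f in self._functions.items() if function in f)
```

After `update("new_display", {"video_playback"})`, later queries for video playback include the new display alongside any existing TV.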

Meanwhile, the processor 130 may use utterance history information to determine the domain for processing the uttered voice and generate the system response. The utterance history information may include at least one of the following:

- a previously received uttered voice;
- information related to the domain that processed a previously received uttered voice;
- a system response corresponding to a previously received uttered voice.

For example, the processor 130 may be unable to detect a domain or context for the uttered voice. It may also be unable to choose the domain or context for processing the utterance from the confidence between the utterance and the detected domain or context. In either case, it may determine the domain or context from the utterance history information.
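The fallback can be sketched as follows. The sketch assumes that detection yields a confidence score per candidate domain. It also assumes that the best fallback is the most recent turn in the history with a known domain. Both the threshold value and this most-recent-turn rule are assumptions; the patent does not say how the history is weighted.

```python
def choose_domain(scores, history, threshold=0.5):
    """Pick a domain, falling back to utterance history when detection fails.

    scores: dict of domain -> confidence for the current utterance (may be empty).
    history: list of past turns, oldest first, each a dict that may have a "domain".
    threshold: hypothetical minimum confidence for trusting detection.
    """
    if scores:
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best
    # Detection failed or was inconclusive: use the latest turn with a known domain.
    for turn in reversed(history):
        if turn.get("domain"):
            return turn["domain"]
    return None
```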

The utterance history information may also include the following:

- a previously received uttered voice;
- information about the context related to the domain that processed it;
- information about the external devices referred to in generating the corresponding system response;
- information about the functions of those external devices, and the like.

FIG. 9 is a block diagram illustrating the configuration of an electronic device according to still another embodiment of the present invention. Referring to FIG. 9, the electronic device 100 includes a storage unit 110, a speaker unit 120, a processor 130, and a microphone unit 160. The storage unit 110, the speaker unit 120, and the processor 130 have already been described, so a detailed description is omitted.

The microphone unit 160 may receive the user's uttered voice. It may be built into the electronic device 100, exist separately outside it, or be implemented in a detachable form.

As described above, the electronic device 100 may output a system response corresponding to the uttered voice. At the same time, it may provide a feedback effect to the user. The electronic device 100 may also include a display unit (not shown) and provide the feedback effect through it.

Specifically, while generating and outputting the system response, the processor 130 may tell the user that the operation corresponding to the uttered voice has been completed. It may do so through an acoustic signal, a message, a user interface screen, or the like. The message or user interface screen may be output through the display unit (not shown).

For example, suppose the user says, "Lower the temperature in the house." The processor 130 searches the external devices in the house for those related to temperature control. It then sends control commands to the air conditioner, the windows, and the lights: turn on the air conditioner to lower the temperature, close the windows to increase energy efficiency, and turn off the lights.

At the same time, the processor 130 may give the user feedback in one of two ways. It may output an acoustic signal through the speaker unit 120, such as "Your request has been processed" or "As requested, I turned on the air conditioner, closed the windows, and turned off the lights." Alternatively, it may show the same text as a message, user interface screen, icon, or the like on the display unit (not shown).

After outputting the system response, the processor 130 may also recommend or suggest to the user another operation or task related to that response.

For example, suppose the user says, "Show me the movie I was watching yesterday." The processor 130 may send the TV a control command to display that movie. It may then use the keyword "play movie" in this command to recommend or suggest other operations or tasks suited to the movie-playing situation. Specifically, it may suggest tasks such as the following:

- "Shall I dim the lights?"
- "The robot vacuum is running and may disturb the movie; shall I stop it?"
- "Shall I close the windows so the movie isn't disturbed?"
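The recommendation step can be sketched as a table of rules keyed by the action keyword (here a hypothetical `"play_movie"` key). Each rule pairs a condition on the current home state with a suggestion. The state keys and the rule format are assumptions for illustration, and the suggestion texts follow the examples above.

```python
# Hypothetical rule table: action keyword -> list of (condition on home state, suggestion).
RELATED_TASKS = {
    "play_movie": [
        (lambda s: s.get("lights") == "on", "Shall I dim the lights?"),
        (lambda s: s.get("robot_vacuum") == "running",
         "The robot vacuum is running and may disturb the movie; shall I stop it?"),
        (lambda s: s.get("windows") == "open",
         "Shall I close the windows so the movie isn't disturbed?"),
    ],
}


def suggest(keyword, home_state):
    """Return the suggestions whose conditions hold in the current home state."""
    return [text for condition, text in RELATED_TASKS.get(keyword, []) if condition(home_state)]
```

Checking conditions against the current state keeps the assistant from suggesting, say, closing windows that are already closed.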

In this way, while generating and outputting the system response corresponding to the user utterance, the processor 130 may notify the user, through one of an acoustic signal, a message, a user interface screen, and the like, that the operation corresponding to the utterance has been completed, and may additionally recommend or suggest another operation or task related to that system response.

FIG. 10 is a block diagram illustrating a detailed configuration of the electronic device shown in FIG. 1.

Referring to FIG. 10, the electronic device 100' includes a storage unit 110, a speaker unit 120, a processor 130, a communication unit 140, a display unit 150, a microphone unit 160, and a sensor unit 170. Detailed descriptions of the components of FIG. 10 that overlap with those shown in FIG. 1 are omitted.

The processor 130 controls the overall operation of the electronic device 100.

Specifically, the processor 130 includes a RAM 131, a ROM 132, a main CPU 133, a graphic processing unit 134, first to n-th interfaces 135-1 to 135-n, and a bus 136.

The RAM 131, the ROM 132, the main CPU 133, the graphic processing unit 134, the first to n-th interfaces 135-1 to 135-n, and the like may be connected to one another through the bus 136.

The first to n-th interfaces 135-1 to 135-n are connected to the various components described above. One of the interfaces may be a network interface connected to an external device through a network.

The main CPU 133 accesses the storage unit 110 and boots using the O/S stored in the storage unit 110. It then performs various operations using the programs, content, data, and the like stored in the storage unit 110.

The ROM 132 stores an instruction set for booting the system and the like. When a turn-on command is input and power is supplied, the main CPU 133 copies the O/S stored in the storage unit 110 to the RAM 131 according to the instructions stored in the ROM 132 and executes the O/S to boot the system. When booting is completed, the main CPU 133 copies the various application programs stored in the storage unit 110 to the RAM 131 and executes the copied programs to perform various operations.

The graphic processing unit 134 generates a screen containing various objects, such as icons, images, and text, using a calculation unit (not shown) and a rendering unit (not shown). Based on a received control command, the calculation unit computes attribute values, such as the coordinates, shape, size, and color with which each object is to be displayed according to the screen layout. The rendering unit generates screens of various layouts containing the objects based on the attribute values computed by the calculation unit. In particular, when converting a system response generated for a user utterance into text, the graphic processing unit 134 may determine the font, size, color, and the like of the characters. The screen generated by the rendering unit may be displayed through the display unit 150.

Meanwhile, the above-described operations of the processor 130 may be performed by a program stored in the storage unit 110.

The storage unit 110 stores various data, such as an operating system (O/S) software module for driving the electronic device 100' and various multimedia content.

In particular, the storage unit 110 includes a software module for detecting a domain corresponding to a user utterance, determining, based on the confidence between the user utterance and the detected domain, which of the detected domain and the previous domain is to process the utterance, and generating a system response. This will be described in detail with reference to FIG. 11.

Meanwhile, the sensor unit 170 may include various sensors, such as a touch sensor for recognizing touches and a motion sensor for detecting the user's movements. In particular, the sensor unit 170 may include a sensor that distinguishes the user's voice from external noise.

FIG. 11 is a diagram illustrating the software modules stored in the storage unit according to an embodiment of the present invention.

Referring to FIG. 11, the storage unit 110 may store programs such as a Dialogue Manager module 111, a Dialogue Context module 112, an NLP module 113, an NLG module 114, a Discourse Manage module 115, a Question Answering module 116, a Context Manager module 117, and an Action Planner module 118.

Hereinafter, the detailed operations of the processor 130 using the programs stored in the storage unit 110 will be described.

The Dialogue Manager module 111 may analyze a user utterance to detect its content and intent, and may generate a natural-language dialogue with the user. In particular, as the main module of the dialogue system, the Dialogue Manager module 111 may manage the overall flow of the conversation between the user and the electronic device and manage the other internal modules.

The Dialogue Context module 112 may record and manage the content of the conversation exchanged between the user and the electronic device 100.

The NLP module 113 may process natural language and detect the user's intent based on it.

The NLG module 114 may convert a system response generated for a user utterance into text.

The Discourse Manage module 115 may generate a system response based on the content and intent of the user utterance.

The Question Answering module 116 may process questions received from the user.

The Context Manager module 117 may monitor the conversation between the user and the electronic device 100 and provide the necessary context information, or may detect and provide the context corresponding to a user utterance.

The Action Planner module 118 may control at least one external device based on the analyzed content and intent of the user utterance, taking into account the functions, performance, and the like of the at least one external device.

The results output by the Action Planner module 118 may be of various types. For example, when the user utterance is processed normally, a message such as "The request has been performed successfully" may be output; when the user utterance is not processed because the request has already been carried out, a message such as "The requested operation has already been performed" may be output. When the user utterance is processed, but in a way different from the user's intent, a message such as "The request was performed in a different way than you asked" may be output.

When, in addition to what was requested by the user utterance, a related operation is also processed, a message such as "Your request has been performed successfully. Another possible function related to your request was also found" may be output.

When manual operation by the user is required, a message such as "Your selection is required to process the request" may be output, and when a parameter value must be requested, a message such as "Please enter a value for the parameter" may be output. When two or more ways of processing the user utterance are determined, a message such as "Your selection is required" may be output.
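The result types listed above can be modeled as an enumeration whose values are the feedback messages. The sketch below assumes hypothetical member names and a hypothetical `classify_outcome` helper; in particular, the order in which the conditions are checked is an assumption, since the description does not specify one.

```python
from enum import Enum


class ActionResult(Enum):
    """Result types of the Action Planner; messages paraphrase the examples in the text."""

    SUCCESS = "The request has been performed successfully."
    ALREADY_DONE = "The requested operation has already been performed."
    ALTERNATIVE = "The request was performed in a different way than you asked."
    SUCCESS_WITH_SUGGESTION = ("Your request has been performed successfully. Another possible "
                               "function related to your request was also found.")
    NEEDS_MANUAL_SELECTION = "Your selection is required to process the request."
    NEEDS_PARAMETER = "Please enter a value for the parameter."
    AMBIGUOUS = "Your selection is required."


def classify_outcome(*, already_done: bool = False, missing_parameter: bool = False,
                     needs_manual_step: bool = False, candidate_plans: int = 1,
                     matched_intent: bool = True,
                     related_function_found: bool = False) -> ActionResult:
    """Pick a result type; the precedence of these checks is a hypothetical choice."""
    if already_done:
        return ActionResult.ALREADY_DONE
    if missing_parameter:
        return ActionResult.NEEDS_PARAMETER
    if needs_manual_step:
        return ActionResult.NEEDS_MANUAL_SELECTION
    if candidate_plans >= 2:  # two or more ways of processing the utterance
        return ActionResult.AMBIGUOUS
    if not matched_intent:
        return ActionResult.ALTERNATIVE
    if related_function_found:
        return ActionResult.SUCCESS_WITH_SUGGESTION
    return ActionResult.SUCCESS
```

Keeping the message on the enum value lets the NLG or display path read `result.value` directly.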

Meanwhile, FIG. 12 is a flowchart describing a method of controlling an electronic device according to an embodiment of the present invention.

In the control method shown in FIG. 12, for an electronic device including a storage unit that stores domain information categorized by conversation topic, a domain corresponding to a user utterance is first detected (S1210).

Then, based on the confidence between the user utterance and the detected domain, the domain that is to process the user utterance is determined from among the detected domain and the previous domain, and a system response is generated (S1220).
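Step S1220 can be illustrated with a simple decision rule that compares a first confidence (utterance vs. detected domain) with a second confidence (utterance vs. previous domain), as in claim 1. This is a minimal sketch; the `switch_margin` hysteresis is a hypothetical choice meant to keep the dialogue from hopping between domains on near-ties, and the source does not give a concrete rule.

```python
from typing import Optional


def choose_domain(detected: str, detected_conf: float,
                  previous: Optional[str], previous_conf: float,
                  switch_margin: float = 0.1) -> str:
    """Choose the domain that should process the current utterance.

    detected_conf: confidence between the utterance and the newly detected domain.
    previous_conf: confidence between the utterance and the previous domain.
    switch_margin: hypothetical margin the detected domain must win by.
    """
    if previous is None or previous == detected:
        return detected
    # Switch away from the ongoing domain only when the detected one is clearly better.
    if detected_conf >= previous_conf + switch_margin:
        return detected
    return previous
```

With this rule, a short follow-up such as "What about tomorrow?" that scores only slightly higher for another domain stays in the ongoing conversation.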

Here, the storage unit categorizes the conversation topics corresponding to each domain by context and stores them. In generating the system response, when the previous domain is determined as the domain to process the user utterance, the context corresponding to the utterance is determined, and the context that is to process the utterance is chosen from among the determined context and the previous context, based on the confidence between the utterance and the determined context, to generate the system response.

In generating the system response, when the determined context is chosen as the context to process the user utterance, information related to the previous context may be stored in the storage unit, and when processing of utterances in the determined context is finished, a new utterance may be processed based on the stored information related to the previous context.

Likewise, when the detected domain is chosen as the domain to process the user utterance, information related to the previous domain may be stored in the storage unit, and when processing of utterances in the detected domain is finished, a new utterance may be processed based on the stored information related to the previous domain.
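Saving the previous domain (or context) and returning to it once the new one is finished behaves like a stack of interrupted dialogues. The following minimal sketch assumes a hypothetical `DialogueState` class that holds per-domain information as a plain dictionary; the same structure could be applied to contexts within a domain.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DialogueState:
    """Active domain plus a stack of interrupted domains (illustrative structure)."""

    active_domain: Optional[str] = None
    active_info: dict[str, Any] = field(default_factory=dict)
    suspended: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def switch_to(self, domain: str) -> None:
        """Store the current domain's information, then make `domain` active."""
        if domain == self.active_domain:
            return
        if self.active_domain is not None:
            self.suspended.append((self.active_domain, self.active_info))
        self.active_domain, self.active_info = domain, {}

    def finish_active(self) -> Optional[str]:
        """End the active domain and resume the most recently stored one, if any."""
        if self.suspended:
            self.active_domain, self.active_info = self.suspended.pop()
        else:
            self.active_domain, self.active_info = None, {}
        return self.active_domain
```

For example, a restaurant search interrupted by a weather question resumes with its stored slot values once the weather exchange ends.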

In generating the system response, the confidence between the user utterance and the detected domain may be determined based on a confidence score that reflects whether at least one utterance element constituting the user utterance is identical to at least one utterance element belonging to the detected domain.
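A confidence score based on identical utterance elements can be sketched as the fraction of the utterance's elements that also belong to the domain. This overlap ratio is an assumed scoring function chosen for illustration, and `element_confidence` is a hypothetical name.

```python
def element_confidence(utterance_elements: list[str], domain_elements: set[str]) -> float:
    """Fraction of the utterance's elements that also belong to the domain (assumed scoring)."""
    if not utterance_elements:
        return 0.0
    matches = sum(1 for element in utterance_elements if element in domain_elements)
    return matches / len(utterance_elements)
```

For example, the elements "play", "movie", "yesterday" matched against a domain containing "play", "movie", "channel" score 2/3.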

In generating the system response, when the system response corresponding to the utterance is generated based on a context that requires controlling a function of at least one external device within the determined domain, a system response for controlling that function may be generated based on the information about the functions of the external devices.

The control method according to an embodiment of the present invention may further include receiving function information on at least one external device added to a preset network and updating the previously stored information on the functions of the external devices.
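The stored function information can be kept in a small registry that is refreshed whenever a device joins the preset network. The sketch assumes a hypothetical `FunctionRegistry` class keyed by device identifier; how devices announce their functions on the network is outside its scope.

```python
class FunctionRegistry:
    """Stored function information for external devices on the preset network (illustrative)."""

    def __init__(self) -> None:
        self._functions: dict[str, set[str]] = {}

    def update(self, device_id: str, functions: set[str]) -> None:
        """Add or refresh the function information received from a (newly added) device."""
        self._functions[device_id] = set(functions)

    def devices_with(self, function: str) -> list[str]:
        """Return, in sorted order, the devices whose stored information lists `function`."""
        return sorted(d for d, fs in self._functions.items() if function in fs)
```

After a fan advertising `cooling` is added, a later temperature request can consider it without any change to the dialogue logic.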

The control method may further include determining the domain to process the user utterance based on utterance history information to generate the system response. The utterance history information may include at least one of a previously received user utterance, information related to the domain that processed the previously received utterance, and the system response corresponding to the previously received utterance.
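Utterance history can serve as a fallback when the detected domain is uncertain. In this minimal sketch, a hypothetical `HistoryEntry` record holds the three kinds of history information listed above, and the rule "reuse the most recent domain when confidence is below a threshold" together with the threshold value are assumptions, not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryEntry:
    """One turn of utterance history: the utterance, its domain, and the system response."""

    utterance: str
    domain: str
    response: str


def domain_from_history(history: list[HistoryEntry], detected: Optional[str],
                        detected_conf: float, threshold: float = 0.5) -> Optional[str]:
    """Keep a confidently detected domain; otherwise reuse the most recent domain in history."""
    if detected is not None and detected_conf >= threshold:
        return detected
    return history[-1].domain if history else detected
```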

The domain information may include at least one of control information for performing a task corresponding to a conversation topic and a conversation pattern for each conversation topic.

The control method according to an embodiment of the present invention may further include receiving the user utterance.

Meanwhile, a non-transitory computer readable medium storing a program that sequentially performs the control method according to the present invention may be provided.

For example, a non-transitory computer readable medium may be provided that stores a program performing the steps of detecting a domain corresponding to a user utterance, and of determining, based on the confidence between the user utterance and the detected domain, which of the detected domain and the previous domain is to process the utterance and generating a system response.

A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only for a short moment, such as a register, a cache, or memory. Specifically, the various applications or programs described above may be stored and provided in a non-transitory readable medium such as a CD, a DVD, a hard disk, a Blu-ray disc, a USB drive, a memory card, or a ROM.

Although a bus is not shown in the above block diagram of the electronic device, the components of the electronic device may communicate with one another through a bus. Each device may further include a processor, such as a CPU or a microprocessor, that performs the various steps described above.

While preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed, and such modifications should not be understood separately from the technical idea or scope of the present invention.

110: storage unit 120: speaker unit
130: processor

Claims

a storage unit for storing domain information categorized by conversation topic;
a microphone unit;
a speaker unit; and
a processor configured to detect a domain corresponding to a user's uttered voice received through the microphone unit,
determine a domain for processing the user's uttered voice from among the detected domain and a previous domain, based on a first confidence between the user's uttered voice and the detected domain and a second confidence between the user's uttered voice and the previous domain, the previous domain having been used to process a previous user's uttered voice received through the microphone unit before the user's uttered voice, generate a system response corresponding to the user's uttered voice based on the determined domain, and control the speaker unit to output the generated system response,
wherein the processor,
when the detected domain is determined as the domain for processing the user's uttered voice, stores information related to the previous domain in the storage unit, and, when processing of the uttered voice in the detected domain is finished, processes a new uttered voice based on the stored information related to the previous domain.

According to claim 1,
the storage unit
categorizes and stores the conversation topics corresponding to each of the domains by context, and
the processor,
when the previous domain is determined as the domain to process the user's uttered voice, determines a context corresponding to the user's uttered voice, and generates the system response by determining, from among the determined context and a previous context, a context in which to process the user's uttered voice, based on the confidence between the user's uttered voice and the determined context.

3. The method of claim 2,
the processor,
when the determined context is determined as the context to process the user's uttered voice, stores information related to the previous context in the storage unit, and, when processing of the uttered voice in the determined context is finished, processes a new uttered voice based on the stored information related to the previous context.

delete

According to claim 1,
the processor
determines the confidence between the user's uttered voice and the detected domain based on a confidence score according to whether at least one utterance element constituting the user's uttered voice is identical to at least one utterance element belonging to the detected domain.

According to claim 1,
the electronic device further comprises a communication unit configured to communicate with at least one external device, and
the processor,
when a system response corresponding to the uttered voice is generated based on a context in which function control of the at least one external device is requested within the determined domain, generates the system response for controlling a function of the at least one external device based on information on the functions of the external device.

7. The method of claim 6,
the storage unit further stores the information on the functions of the external device,
the communication unit receives function information on at least one external device added to a preset network, and
the processor
updates the information stored in the storage unit based on the received function information on the at least one external device.

According to claim 1,
the processor
generates the system response by determining a domain to process the user's uttered voice based on utterance history information,
wherein the utterance history information includes at least one of a previously received user's uttered voice, information related to the domain in which the previously received user's uttered voice was processed, and a system response corresponding to the previously received user's uttered voice.

According to claim 1,
the domain information
includes at least one of control information for performing a task corresponding to the conversation topic and a conversation pattern for each conversation topic.

According to claim 1,
the electronic device further comprises a microphone unit configured to receive the user's uttered voice.

A control method of an electronic device including a storage unit that stores domain information categorized by conversation topic, the method comprising:
receiving a user's uttered voice;
detecting a domain corresponding to the user's uttered voice;
determining a domain to process the user's uttered voice from among the detected domain and a previous domain, based on a first confidence between the user's uttered voice and the detected domain and a second confidence between the user's uttered voice and the previous domain, the previous domain having been used before the detected domain to process a previous user's uttered voice;
storing information related to the previous domain in the storage unit when the detected domain is selected as the domain to process the user's uttered voice;
generating a system response based on the selected domain;
outputting the system response; and
processing a new uttered voice based on the stored information related to the previous domain when processing of the uttered voice in the selected domain is finished.

delete

12. The method of claim 11,
wherein the generating of the system response comprises
determining the confidence between the user's uttered voice and the detected domain based on a confidence score according to whether at least one utterance element constituting the user's uttered voice is identical to at least one utterance element belonging to the detected domain.

12. The method of claim 11,
wherein the generating of the system response comprises,
when a system response corresponding to the uttered voice is generated based on a context in which function control of at least one external device is required within the determined domain, generating the system response for controlling a function of the at least one external device based on information on the functions of the external device.

17. The method of claim 16,
the method further comprising receiving function information on at least one external device added to a preset network and updating the stored information on the functions of the external devices.

12. The method of claim 11,
the method further comprising generating the system response by determining a domain to process the user's uttered voice based on utterance history information,
wherein the utterance history information includes at least one of a previously received user's uttered voice, information related to the domain in which the previously received user's uttered voice was processed, and a system response corresponding to the previously received user's uttered voice.

12. The method of claim 11,
wherein the domain information
includes at least one of control information for performing a task corresponding to the conversation topic and a conversation pattern for each conversation topic.

delete