KR20200028158A

KR20200028158A - Media play device, method and computer program for providing multi language voice command service

Info

Publication number: KR20200028158A
Application number: KR1020180106483A
Authority: KR
Inventors: 이재동; 류민우; 안지용; 이장원; 홍미정
Original assignee: KT Corporation (주식회사 케이티)
Priority date: 2018-09-06
Filing date: 2018-09-06
Publication date: 2020-03-16
Also published as: KR102638373B1

Abstract

A device for playing media to provide a multilanguage voice command service comprises: an input unit receiving a voice command from a user; a language recognition unit analyzing the inputted voice command and recognizing language uttered by the user; a classification unit determining a sentence structure of the voice command based on the recognized language and classifying vocabulary proficiency according to the determined sentence structure; a conversion unit converting output data for a user interface of the device for playing media based on the recognized language and the classified vocabulary proficiency; and an output unit dynamically outputting the user interface by applying the converted output data.

Description

MEDIA PLAY DEVICE, METHOD AND COMPUTER PROGRAM FOR PROVIDING MULTI LANGUAGE VOICE COMMAND SERVICE

The present invention relates to a media playback device, method, and computer program for providing a multilingual voice command service.

An intelligent personal assistant is a software agent that performs tasks requested by a user and provides services tailored to that user. Based on an artificial intelligence (AI) engine and speech recognition, an intelligent personal assistant collects and provides personalized information and, in response to the user's voice commands, performs functions such as schedule management, sending e-mail, and making restaurant reservations, which improves user convenience.

Such intelligent personal assistants are mainly offered as personalized services on smartphones; representative examples include Apple's Siri, Google Now, and Samsung's Bixby. As related prior art, Korean Patent Publication No. 2016-0071111 discloses a method for providing a personal assistant service in an electronic device.

Recently, intelligent personal assistants have been applied to various service industries, such as concierge services, and their interaction with users has expanded to touch, visual, and voice channels, allowing them to provide a wider range of information. However, foreign users have difficulty using intelligent personal assistants and cannot benefit from services built on this expanded visual and voice interaction.

An object of the present invention is to provide a media playback device, method, and computer program for a multilingual voice command service that dynamically configures the user interface in real time according to the user's language, vocabulary proficiency, and intonation style.

Another object is to provide a media playback device, method, and computer program for a multilingual voice command service that offers a personalized utterance guide which, according to the user's vocabulary proficiency, prompts the user to utter voice commands in the user's own language.

Another object is to provide a media playback device, method, and computer program for a multilingual voice command service that analyzes the pattern of the user's voice command, synthesizes the analysis result for the voice command so that it resembles the user's intonation style, and outputs the synthesized result as audio.

Another object is to provide a media playback device, method, and computer program for a multilingual voice command service that reconfigures the user interface in the language the user speaks, thereby improving the usability of the voice command service and the convenience of user interaction regardless of the user's language.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for solving the above technical problems, an embodiment of the present invention may provide a media playback device including: an input unit that receives a voice command from a user; a language recognition unit that analyzes the input voice command to recognize the language spoken by the user; a classification unit that determines the sentence structure of the voice command based on the recognized language and classifies the user's vocabulary proficiency according to the determined sentence structure; a conversion unit that converts output items of the user interface of the media playback device based on the recognized language and the classified vocabulary proficiency; and an output unit that dynamically outputs the user interface by applying the converted output items.

Another embodiment of the present invention may provide a method for providing a multilingual voice command service, including: receiving a voice command from a user; analyzing the input voice command to recognize the language spoken by the user; determining the sentence structure of the voice command based on the recognized language and classifying the user's vocabulary proficiency according to the determined sentence structure; converting output items of the user interface of the media playback device based on the recognized language and the classified vocabulary proficiency; and dynamically outputting the user interface by applying the converted output items.

Yet another embodiment of the present invention may provide a computer program stored in a medium and including a sequence of instructions that, when executed by a computing device, cause the computing device to receive a voice command from a user, analyze the input voice command to recognize the language spoken by the user, determine the sentence structure of the voice command based on the recognized language, classify the user's vocabulary proficiency according to the determined sentence structure, convert output items of the user interface of the media playback device based on the recognized language and the classified vocabulary proficiency, and dynamically output the user interface by applying the converted output items.

The above means for solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, additional embodiments may be described in the drawings and the detailed description.

According to any one of the above means of the present invention, it is possible to provide a media playback device, method, and computer program for a multilingual voice command service that dynamically configures the user interface in real time according to the user's language, vocabulary proficiency, and intonation style.

It is also possible to provide a media playback device, method, and computer program for a multilingual voice command service that offers a personalized utterance guide prompting the user, according to the user's vocabulary proficiency, to utter voice commands in the user's own language.

It is also possible to provide a media playback device, method, and computer program for a multilingual voice command service that analyzes the pattern of the user's voice command, synthesizes the analysis result for the voice command to resemble the user's intonation style, and outputs the synthesized result as audio.

It is also possible to provide a media playback device, method, and computer program for a multilingual voice command service that, by reconfiguring the user interface in the user's language, improves the usability of the voice command service and the convenience of user interaction regardless of the user's language.

FIG. 1 is a block diagram of a system for providing a multilingual voice command service according to an embodiment of the present invention.
FIG. 2 is a block diagram of a media playback device according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram for explaining a process of classifying the user's vocabulary proficiency based on the user's language in a media playback device according to an embodiment of the present invention.
FIGS. 4A and 4B are exemplary diagrams for explaining a process of converting an utterance list sentence into the recognized language of the user based on a preset per-language mapping table in a media playback device according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram for explaining a process of inferring the intonation style of a user's voice command in a media playback device according to an embodiment of the present invention.
FIGS. 6A to 6C are exemplary diagrams illustrating the user interface of a media playback device whose output items have been converted based on the user's language according to an embodiment of the present invention.
FIG. 7 is a flowchart of a method for providing a multilingual voice command service in a media playback device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted to describe the present invention clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless stated otherwise, and it should be understood that this does not preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, the term "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

Some of the operations or functions described in this specification as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a system for providing a multilingual voice command service according to an embodiment of the present invention. Referring to FIG. 1, the multilingual voice command service providing system 1 may include a media playback device 110, a display 115, and a multilingual conversation server 120. The media playback device 110, the display 115, and the multilingual conversation server 120 are shown as examples of components that can be controlled by the multilingual voice command service providing system 1.

The components of the multilingual voice command service providing system 1 of FIG. 1 are generally connected through a network. For example, as shown in FIG. 1, the media playback device 110 may connect to the multilingual conversation server 120 simultaneously or at time intervals.

A network is a connection structure that allows information to be exchanged between nodes such as terminals and servers, and includes a local area network (LAN), a wide area network (WAN), the Internet (World Wide Web, WWW), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3rd Generation Partnership Project (3GPP), Long Term Evolution (LTE), Worldwide Interoperability for Microwave Access (WiMAX), Wi-Fi, Bluetooth, infrared communication, ultrasonic communication, Visible Light Communication (VLC), and LiFi.

The media playback device 110 may receive a voice command from the user 100 and analyze it to recognize the language spoken by the user 100.

The media playback device 110 may determine the sentence structure of the voice command based on the recognized language and classify the user's vocabulary proficiency according to the determined sentence structure. For example, the media playback device 110 may determine the part of speech of each of one or more words in the sentence structure of the voice command, categorize the vocabulary used for each determined part of speech, map a grade to each part of speech according to the categorized vocabulary, and determine the vocabulary proficiency of the user 100 based on the grades mapped to the parts of speech.

The media playback device 110 may generate an utterance guide sentence in the language spoken by the user 100 based on the classified vocabulary proficiency. For example, the media playback device 110 may select vocabulary based on the classified vocabulary proficiency and, using the selected vocabulary, convert the utterance guide sentence into the recognized language of the user 100. In doing so, the media playback device 110 may derive voice command information from the frequency with which the user 100 uses each voice command, select utterance list sentences based on the derived information, and reconstruct the vocabulary of the selected sentences according to the vocabulary proficiency of the user 100. The media playback device 110 may then convert the reconstructed utterance list sentences into the recognized language of the user 100 using a preset per-language mapping table. The media playback device 110 may also convert the display output items of its user interface based on the recognized language and the classified vocabulary proficiency.

The media playback device 110 may convert text shown on its display 115 into the recognized language, with reference to the classified vocabulary proficiency.

The media playback device 110 may extract linguistic feature information about the intonation and accent of the user 100 from the input voice command and analyze the intonation style of the voice command from patterns in the extracted feature information. For example, the media playback device 110 may convert the input voice command into pitch symbols based on a prosody transcription convention, analyze the converted pitch symbols against a preset speech sequence, and infer the intonation style of the voice command from the similarity between the pattern of the analyzed pitch symbols and pre-stored per-language intonation patterns.
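The pitch-symbol matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the specification: it assumes a frame-level pitch (F0) contour in Hz is already available with 0 marking unvoiced frames, replaces the prosody transcription convention with a crude three-level H/M/L labelling relative to the speaker's mean pitch, and uses `difflib.SequenceMatcher` as the similarity measure. The function names and reference patterns are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def pitch_to_symbols(f0_hz, margin=0.05):
    """Label each voiced frame H/M/L relative to the mean pitch, then collapse repeats.

    f0_hz: list of fundamental-frequency values in Hz; 0 marks an unvoiced frame
    (an assumption about the upstream pitch tracker).
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return ""
    ref = mean(voiced)
    labels = []
    for f in voiced:
        if f > ref * (1 + margin):
            labels.append("H")
        elif f < ref * (1 - margin):
            labels.append("L")
        else:
            labels.append("M")
    # Collapse runs (e.g. "HHHL" -> "HL") so contour shape matters more than speaking rate.
    collapsed = [s for i, s in enumerate(labels) if i == 0 or s != labels[i - 1]]
    return "".join(collapsed)


def infer_intonation_style(f0_hz, reference_patterns):
    """Return (language, score) for the stored pattern most similar to the input contour.

    reference_patterns: non-empty dict mapping a language code to a hypothetical
    stored H/M/L pattern.
    """
    symbols = pitch_to_symbols(f0_hz)
    scores = {
        lang: SequenceMatcher(None, symbols, pattern).ratio()
        for lang, pattern in reference_patterns.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For instance, `infer_intonation_style([100, 100, 150, 150, 100, 60, 60], {"en": "MHML", "ko": "LMH"})` labels the contour as `"MHML"` and picks `"en"` with a score of 1.0. A production system would more plausibly use a trained prosody model than fixed thresholds.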

The media playback device 110 may transmit the voice command to the multilingual conversation server 120 based on the recognized language of the user 100 and receive an analysis result for the voice command from the multilingual conversation server 120. The media playback device 110 may then synthesize the analysis result received from the multilingual conversation server 120 with the intonation style of the user 100. The media playback device 110 may also convert output items of the user interface based on the inferred intonation style.

After converting the output items of the user interface, the media playback device 110 may dynamically output the user interface by applying the converted output items. For example, the media playback device 110 may output, on the display 115, an utterance guide sentence generated to prompt voice commands in the language of the user 100. As another example, once the text shown on the display 115 has been converted into the recognized language with reference to the classified vocabulary proficiency, the media playback device 110 may output the converted text on the display 115. As yet another example, the media playback device 110 may output, as audio, the analysis result of the voice command synthesized with the intonation style of the user 100.

The display 115 may output a user interface screen. For example, the display 115 may output a Korean user interface set as the default and, based on the recognized language of the user 100, output a user interface in another language, such as English.
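A per-language string table with a Korean default is one simple way to realize this switch. The sketch below is hypothetical: the keys, labels, and the `render_labels` function are illustrative assumptions, not part of the described device.

```python
DEFAULT_LANG = "ko"  # Korean UI is the default, as described in the text

# Hypothetical UI string table: language code -> {UI key -> label}.
UI_STRINGS = {
    "ko": {"menu.tv": "TV", "menu.weather": "날씨", "menu.settings": "설정"},
    "en": {"menu.tv": "TV", "menu.weather": "Weather", "menu.settings": "Settings"},
}


def render_labels(keys, recognized_lang):
    """Return labels for `keys` in the recognized language.

    Falls back to the Korean default for unsupported languages or missing keys,
    and finally to the key itself.
    """
    localized = UI_STRINGS.get(recognized_lang, {})
    default = UI_STRINGS[DEFAULT_LANG]
    return {key: localized.get(key, default.get(key, key)) for key in keys}
```

Falling back key by key, rather than switching the whole screen, keeps the interface usable when a translation table is only partially filled in.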

The user interface screen on the display 115 may show an utterance guide sentence in a first area; everyday information such as the date, time, weather, temperature, and humidity in a second area; submenus in a third area; external linkage messages and terminal setting information in a fourth area; and language setting information and control information in a fifth area.

The multilingual conversation server 120 may receive a voice command from the media playback device 110, analyze the received voice command, and transmit the analysis result to the media playback device 110. In doing so, the multilingual conversation server 120 may transmit to the media playback device 110 an action ID result value for processing the voice command.
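The exchange between the device and the server could be carried as small JSON messages, as in the sketch below. The field names (`language`, `utterance`, `action_id`, `response_text`), the example action ID, the handler table, and the choice to send a text transcript rather than audio are all assumptions for illustration; the text states only that the server returns an analysis result and an action ID.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class VoiceCommandRequest:
    language: str   # language recognized on the device, e.g. "en"
    utterance: str  # transcript of the voice command (assumption: audio could be sent instead)


@dataclass
class AnalysisResult:
    action_id: str      # action ID the device uses to execute the command
    response_text: str  # analysis result to show on screen or speak back


def encode_request(request: VoiceCommandRequest) -> str:
    """Serialize a request for the multilingual conversation server."""
    return json.dumps(asdict(request), ensure_ascii=False)


def decode_result(payload: str) -> AnalysisResult:
    """Parse the server's reply; a KeyError signals a malformed reply."""
    data = json.loads(payload)
    return AnalysisResult(action_id=data["action_id"], response_text=data["response_text"])


def dispatch(result: AnalysisResult, handlers: dict) -> bool:
    """Run the local handler registered for the returned action ID.

    Returns True if a handler ran, False if the action ID is unknown on this device.
    """
    handler = handlers.get(result.action_id)
    if handler is None:
        return False
    handler()
    return True
```

Keeping the action ID separate from the response text lets the device execute the command regardless of which language the response is rendered in.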

FIG. 2 is a block diagram of a media playback device according to an embodiment of the present invention. Referring to FIG. 2, the media playback device 110 may include an input unit 210, a language recognition unit 220, a classification unit 230, an utterance guide sentence generation unit 240, a conversion unit 250, a communication unit 260, a language feature analysis unit 270, a voice conversion unit 280, and an output unit 290.

The input unit 210 may receive a voice command from the user 100. For example, the input unit 210 may receive a foreign-language voice command such as "GiGA Genie, turn off all the lights."

The language recognition unit 220 may analyze the input voice command to recognize the language spoken by the user 100. The language recognition unit 220 may recognize this language automatically using at least one automatic language recognition algorithm, such as a neural network, pattern recognition with a hidden Markov model, or deep-learning-based natural language processing. For example, the language recognition unit 220 may analyze "GiGA Genie, turn off all the lights" and recognize that the language spoken by the user 100 is English.

The classification unit 230 may determine the sentence structure of the voice command based on the recognized language and classify the user's vocabulary proficiency according to the determined sentence structure. For example, the classification unit 230 may determine the part of speech (e.g., noun, verb, adjective, or adverb) of each of one or more words in the sentence structure of the voice command, categorize the vocabulary used for each determined part of speech, map a grade to each part of speech according to the categorized vocabulary, and determine the vocabulary proficiency of the user 100 based on the grades mapped to the parts of speech. The process of determining the vocabulary proficiency of the user 100 is described in detail with reference to FIG. 3.

FIG. 3 is an exemplary diagram for explaining a process of classifying the user's vocabulary proficiency based on the user's language in a media playback device according to an embodiment of the present invention. In FIG. 3, it is assumed that the user 100 has uttered the sentence "Give me linen amenities."

The classification unit 230 may determine a part-of-speech name 310 for each spoken word 300 in the sentence structure of the voice command. For example, the classification unit 230 may determine 'give' to be a verb 311, 'me' a pronoun 312, 'linen' a noun 313, and 'amenities' a noun 314.

The classification unit 230 may categorize the vocabulary used for each part of speech, using a vocabulary grade dictionary database. For example, the classification unit 230 may categorize the vocabulary so that the verb category 311 contains 'give' and the noun category (313, 314) contains 'linen' and 'amenities'.

The classification unit 230 may map a grade to each part of speech according to the categorized vocabulary. For example, verbs may be mapped to grade A, nouns to grade B, and adjectives to grade C, while conjunctions, pronouns, prepositions, interjections, and articles may be excluded from grade mapping. In the example of FIG. 3, the classification unit 230 maps the verb 311 to grade 'A' 321 and the nouns 313 and 314 to grades 'B' 322 and 'B' 323, respectively.

The classification unit 230 may determine the vocabulary proficiency 320 of the user 100 based on the grades mapped to the parts of speech. For example, when the verb 311 yields one grade 'A' 321 and the nouns 313 and 314 yield two grades 'B' 322 and 323, the classification unit 230 may determine the vocabulary proficiency of the user 100 to be type 'B'.
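The grading step of FIG. 3 can be sketched as below. The sketch assumes that a part-of-speech tagger has already labelled each word, assigns grades per part of speech exactly as in the example (verb A, noun B, adjective C), and skips every other part of speech. Choosing the most frequent grade as the user's type, and breaking ties toward the alphabetically first grade, are assumptions inferred from the single example rather than rules stated in the text. A fuller implementation might instead look up each word in the vocabulary grade dictionary database.

```python
from collections import Counter

# Grades per part of speech, following the example in the text.
# Anything not listed (the excluded conjunctions, pronouns, prepositions,
# interjections, and articles, plus, as an assumption, adverbs) is skipped.
POS_GRADE = {"verb": "A", "noun": "B", "adjective": "C"}


def classify_proficiency(tagged_words):
    """Return the user's proficiency type as the most frequent grade, or None.

    tagged_words: list of (word, part_of_speech) pairs from an upstream tagger
    (assumed; the text does not name one).
    """
    grades = [POS_GRADE[pos] for _, pos in tagged_words if pos in POS_GRADE]
    if not grades:
        return None  # nothing gradable, e.g. a pronoun-only utterance
    counts = Counter(grades)
    top = max(counts.values())
    # Assumed tie-break: prefer the alphabetically first grade among the most frequent.
    return min(grade for grade, n in counts.items() if n == top)
```

Applied to the FIG. 3 utterance, `classify_proficiency([("give", "verb"), ("me", "pronoun"), ("linen", "noun"), ("amenities", "noun")])` counts one A and two B grades and returns `"B"`, matching the type given in the text.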

Returning to FIG. 2, the utterance guide sentence generation unit 240 may generate an utterance guide sentence in the language spoken by the user 100 based on the vocabulary proficiency classified by the classification unit 230.

The utterance guide sentence generation unit 240 may select vocabulary based on the classified vocabulary proficiency. Using the selected vocabulary, it may then convert the utterance guide sentence into the recognized language of the user 100.

The utterance guide sentence generation unit 240 may derive voice command information from how frequently the user 100 uses each voice command. It may then select utterance list sentences based on that information. For example, it may derive the voice command information for the top 10% of voice commands by frequency of use and select utterance list sentences accordingly. It may then reconstruct the vocabulary of the selected utterance list sentences to match the vocabulary proficiency of the user 100.
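A minimal sketch of the frequency-based selection, under these assumptions:

- The voice command history is a list of normalized command identifiers.
- "Top 10%" means the top 10% of distinct commands ranked by usage count, rounded up, with at least one command selected.
- The `catalog` mapping from command identifiers to utterance list sentences is hypothetical.

```python
import math
from collections import Counter


def top_commands(history, percent=10):
    """Return the most used command IDs, covering the top `percent` % of distinct commands."""
    counts = Counter(history)
    if not counts:
        return []
    # Round up so that at least one command is always selected.
    k = max(1, math.ceil(len(counts) * percent / 100))
    return [cmd for cmd, _ in counts.most_common(k)]


def select_list_sentences(history, catalog, percent=10):
    """Pick utterance list sentences for the top commands (catalog is a hypothetical lookup)."""
    return [catalog[cmd] for cmd in top_commands(history, percent) if cmd in catalog]


if __name__ == "__main__":
    history = (["tv_on"] * 9 + ["music"] * 4 + ["weather"] * 2 +
               ["time", "news", "alarm", "volume_up", "volume_down", "mute", "channel_up"])
    catalog = {"tv_on": "Turn on the TV", "music": "Play the music",
               "weather": "How is the weather?"}
    print(top_commands(history))  # ['tv_on']
    print(select_list_sentences(history, catalog, percent=30))
    # ['Turn on the TV', 'Play the music', 'How is the weather?']
```

The vocabulary reconstruction that follows selection is a separate step and is not shown here.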

The utterance guide sentence generation unit 240 may convert the vocabulary-reconstructed utterance list sentences into the recognized language of the user 100, using a preset per-language mapping table. If no preset mapping table exists for that language, a translator module (not shown) may perform the translation instead. The translator module may be a web translator or a separately developed translator. The conversion process is described in detail with reference to FIGS. 4A and 4B.

FIGS. 4A and 4B are exemplary diagrams illustrating how a media playback apparatus converts utterance list sentences into the user's recognized language, according to an embodiment of the present invention.

FIG. 4A is an exemplary diagram illustrating how a media playback apparatus converts utterance list sentences into the user's recognized language based on a preset per-language mapping table, according to an embodiment of the present invention. Referring to FIG. 4A, the utterance guide sentence generation unit 240 may search a language conversion database. It may then convert the utterance list sentences into the recognized language of the user 100 based on the preset per-language mapping table.

The utterance guide sentence generation unit 240 may convert the utterance list sentences 400 into the recognized language of the user 100 based on the preset mapping table 410. For example, if the recognized language of the user 100 is English 420, the unit converts the utterance list sentences 400 into sentences such as "Turn on the TV", "Play the music", "How is the weather?", and "What time is it now?".

FIG. 4B is an exemplary diagram illustrating how a media playback apparatus converts utterance list sentences into the user's recognized language through a translator module, according to an embodiment of the present invention. Referring to FIG. 4B, the preset per-language mapping table 410 may contain no text for the user's language for the utterance list sentences 400. In that case, the utterance guide sentence generation unit 240 may convert the utterance list sentences 400 into the language of the user 100 through a separate translator module 430.

For example, the recognized language of the user 100 may be German 440 while the preset mapping table has no German 440 entry. The utterance guide sentence generation unit 240 may then send the utterance list sentences 400 to the separate translator module 430. It receives back the utterance list sentences 400 translated into German 440.
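The table-first, translator-fallback behavior of FIGS. 4A and 4B can be sketched as a dictionary lookup with a callable fallback. The table layout, the sentence IDs, and the `localize` function are hypothetical. The English strings follow FIG. 4A, and the Korean source strings are illustrative only. The translator is modeled as a plain function, and a real web translator would sit behind that interface.

```python
from typing import Callable, Dict

# Hypothetical per-language mapping table keyed by utterance list sentence ID.
# English strings follow FIG. 4A; the Korean source strings are illustrative.
MAPPING_TABLE: Dict[str, Dict[str, str]] = {
    "tv_on": {"ko": "TV 켜줘", "en": "Turn on the TV"},
    "music": {"ko": "음악 틀어줘", "en": "Play the music"},
    "weather": {"ko": "날씨 어때?", "en": "How is the weather?"},
    "time": {"ko": "지금 몇 시야?", "en": "What time is it now?"},
}

# (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]


def localize(sentence_id: str, target_lang: str, translate: Translator,
             table: Dict[str, Dict[str, str]] = MAPPING_TABLE,
             source_lang: str = "ko") -> str:
    """Return the sentence in target_lang, preferring the preset table."""
    entry = table[sentence_id]
    if target_lang in entry:
        # FIG. 4A path: a preset mapping exists for this language.
        return entry[target_lang]
    # FIG. 4B path: no mapping for this language, so delegate to the translator module.
    return translate(entry[source_lang], source_lang, target_lang)


if __name__ == "__main__":
    # Stand-in for a web or in-house translator.
    def fake_translator(text: str, src: str, dst: str) -> str:
        return f"[{src}->{dst}] {text}"

    print(localize("tv_on", "en", fake_translator))  # Turn on the TV
    print(localize("tv_on", "de", fake_translator))  # [ko->de] TV 켜줘
```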

Returning to FIG. 2, the conversion unit 250 may convert the output items of the user interface of the media playback device 110 based on the recognized language and the classified vocabulary proficiency. The conversion unit 250 may also convert text shown on the display 115 of the media playback device 110 into the recognized language, taking the classified vocabulary proficiency into account. For example, if the recognized language of the user 100 is English, the conversion unit 250 may convert a user interface displayed in Korean into English. The converted output reflects the vocabulary proficiency of the user 100.

The conversion unit 250 may also convert the user interface output items based on the inferred intonation style.

The communication unit 260 may transmit the voice command to the multilingual conversation server 120 according to the recognized language of the user 100. It may then receive the analysis result for the voice command from the multilingual conversation server 120. Specifically, the communication unit 260 may receive from the server an action ID result value used to process the voice command.

The language feature analysis unit 270 may extract language feature information on the intonation and accent of the user 100 from the input voice command. It then analyzes the intonation style of the voice command from patterns in the extracted feature information. For example, the language feature analysis unit 270 may proceed as follows:

1. Convert the input voice command into simplified pitch symbols based on the ToBI (Tones and Break Indices) prosodic transcription convention.
2. Analyze the converted pitch symbols against a preset speech sequence.
3. Infer the intonation style of the voice command from the similarity between the analyzed pitch-symbol pattern and per-language intonation patterns stored in a database.

This inference process is described in detail with reference to FIG. 5.

FIG. 5 is an exemplary diagram illustrating how a media playback apparatus infers the intonation style of a user's voice command, according to an embodiment of the present invention. Referring to FIG. 5, the language feature analysis unit 270 may convert the voice command into simplified pitch symbols based on the ToBI convention. The simplified pitch symbols 510 may consist of high (H), mid (M), and low (L) tones and a pitch-accent marker (+).

For example, suppose the user 100 utters the voice command "TURN THE AIR CONDITIONER UP 10 DEGREE." 500. The language feature analysis unit 270 may convert it into the simplified pitch-symbol sequence 'LL+ML+HHM' 520.
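The similarity-based inference can be sketched with the standard-library `difflib.SequenceMatcher`, whose `ratio()` returns a value between 0 and 1 for two sequences. This choice is an assumption, because the patent does not name a similarity measure. The stored patterns and their labels (`style_1` and so on) are hypothetical placeholders, not measured intonation data. The conversion from audio to pitch symbols is assumed to have happened upstream.

```python
from difflib import SequenceMatcher

# Hypothetical stored intonation patterns, written in the simplified symbols
# H (high), M (mid), L (low) and + (pitch accent). Placeholder values only.
STORED_PATTERNS = {
    "style_1": "LL+ML+HHM",
    "style_2": "HM+ML+LLM",
    "style_3": "MMMMHL",
}


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the symbol sequences are identical."""
    return SequenceMatcher(None, a, b).ratio()


def infer_intonation_style(symbols: str, patterns=STORED_PATTERNS):
    """Return (best_style, score) for an utterance's pitch-symbol string."""
    scores = {style: similarity(symbols, pattern) for style, pattern in patterns.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Symbols for "TURN THE AIR CONDITIONER UP 10 DEGREE." from FIG. 5.
    print(infer_intonation_style("LL+ML+HHM"))  # ('style_1', 1.0)
```

An edit-distance or alignment score over the symbol strings would serve the same role. `SequenceMatcher` is used here only because it ships with Python.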

Returning to FIG. 2, the voice conversion unit 280 may synthesize the analysis result for the voice command, received from the multilingual conversation server 120, with the intonation style of the user 100.

When the conversion unit 250 converts the user interface output items, the output unit 290 may dynamically output the user interface by applying the converted items. For example, the output unit 290 may show on the display 115 an utterance guide sentence generated to prompt a voice command in the language of the user 100. The conversion unit 250 may also convert text shown on the display 115 of the media playback device 110 into the recognized language with reference to the classified vocabulary proficiency. The output unit 290 may then output the converted text on the display 115. As another example, the output unit 290 may output, as audio, the analysis result for the voice command synthesized with the intonation style of the user 100.

FIGS. 6A to 6C are exemplary views of the user interface of a media playback device whose output items have been converted according to the user's language, according to an embodiment of the present invention.

FIG. 6A is an exemplary diagram of a Korean-language user interface according to an embodiment of the present invention. Referring to FIG. 6A, the media playback device 110 uses the Korean user interface screen 600 as follows:

- First area 610: an utterance guide sentence.
- Second area 620: lifestyle information.
- Third area 630: a submenu.
- Fourth area 640: external-integration messages and device settings.
- Fifth area 650: language settings and control information.

The utterance guide sentence shown on the display 115 may be one of the utterance guide list sentences, chosen according to the vocabulary proficiency of the user 100. For example:

- Type A: "지니야, 불 다 꺼줘" is displayed (roughly "Genie, turn all the lights off", colloquial).
- Type B: "지니야, 조명 모두 꺼줘" is displayed ("Genie, turn off all the lighting").
- Type C: "지니야, 조명 전체 꺼줘" is displayed ("Genie, turn off the lighting entirely").
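One way to produce a guide sentence per proficiency type is a template whose slots are filled from a per-type vocabulary table. The sketch below reproduces the three Korean variants from FIG. 6A. The template, slot names, and `guide_sentence` function are hypothetical. A real system would presumably hold such tables per intent and per language.

```python
# Hypothetical template for the "turn off all lights" intent; "지니야" addresses the assistant.
TEMPLATE = "지니야, {light} {all} 꺼줘"

# Per-type vocabulary, taken from the FIG. 6A examples.
VOCABULARY_BY_TYPE = {
    "A": {"light": "불", "all": "다"},      # colloquial: "lights, all"
    "B": {"light": "조명", "all": "모두"},  # "lighting, all"
    "C": {"light": "조명", "all": "전체"},  # "lighting, entirely"
}


def guide_sentence(proficiency_type: str) -> str:
    """Fill the template with the vocabulary chosen for the user's type."""
    return TEMPLATE.format(**VOCABULARY_BY_TYPE[proficiency_type])


if __name__ == "__main__":
    for t in "ABC":
        print(t, guide_sentence(t))
```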

FIG. 6B is an exemplary diagram of an English-language user interface according to an embodiment of the present invention. Referring to FIG. 6B, when the language of the user 100 is recognized as English, the media playback device 110 may convert the Korean user interface screen into English. The converted screen 600 shows the same five areas as FIG. 6A, now in English:

- First area 610: the utterance guide sentence.
- Second area 620: the lifestyle information.
- Third area 630: the submenu.
- Fourth area 640: the external-integration messages and device settings.
- Fifth area 650: the language settings and control information.
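The five-area screen can be modeled as a mapping from area numbers to localized strings. In this model, switching languages re-renders every area from a different string table. The area numbers follow FIGS. 6A and 6B. The message keys, string tables, and `render_screen` function are hypothetical. The fallback to Korean for a missing English string is a design assumption, not something the patent describes.

```python
# Area reference numerals follow FIGS. 6A and 6B; the message keys are hypothetical.
AREA_KEYS = {
    610: "guide_sentence",
    620: "lifestyle_info",
    630: "submenu",
    640: "external_messages",
    650: "language_settings",
}

# Hypothetical localized string tables. Missing entries fall back to Korean,
# the default UI language (an assumption, not from the patent).
STRINGS = {
    "ko": {"lifestyle_info": "오늘의 날씨", "submenu": "메뉴",
           "external_messages": "새 메시지", "language_settings": "언어 설정"},
    "en": {"lifestyle_info": "Today's weather", "submenu": "Menu",
           "language_settings": "Language settings"},
}


def render_screen(lang, guide_sentence, default_lang="ko"):
    """Return {area_number: text} for the recognized language."""
    screen = {}
    table = STRINGS.get(lang, {})
    for area, key in AREA_KEYS.items():
        if key == "guide_sentence":
            # Area 610 shows the guide sentence already chosen for the user's type.
            screen[area] = guide_sentence
        else:
            screen[area] = table.get(key, STRINGS[default_lang].get(key, ""))
    return screen


if __name__ == "__main__":
    for area, text in render_screen("en", "Genie, turn off all the lights").items():
        print(area, text)
```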

FIG. 6C is an exemplary diagram of a user interface that applies the user's language, vocabulary proficiency, and intonation style, all derived from the user's voice command, according to an embodiment of the present invention. Referring to FIG. 6C, suppose the user 100 utters the voice command "turn on all room lights" 660. The media playback device 110 may then:

- recognize that the user's language is English,
- classify the vocabulary proficiency of the user 100 as type A, and
- analyze the intonation pattern of the user 100.

For example, because the recognized language of the user 100 is English, the media playback device 110 converts the user interface into English. It also displays an utterance guide sentence matched to the vocabulary proficiency of the user 100. The media playback device 110 may then synthesize the analysis result for the voice command with the intonation style of the user 100 and output it as audio. An example analysis result is "All room lights are now turned off" 670.

FIG. 7 is a flowchart of a method of providing a multilingual voice command service in a media playback device according to an embodiment of the present invention. The method shown in FIG. 7 includes steps processed in time series by the multilingual voice command service providing system 1 of the embodiment shown in FIGS. 1 to 6C. Details described for the media playback device 110 with reference to FIGS. 1 to 6C therefore also apply to this method, even where they are omitted below.

In step S710, the media playback device 110 may receive a voice command from the user 100.

In step S720, the media playback device 110 may analyze the input voice command to recognize the language spoken by the user 100.

In step S730, the media playback device 110 may determine the sentence structure of the voice command based on the recognized language. It then classifies the vocabulary proficiency according to the determined sentence structure.

In step S740, the media playback device 110 may convert the output items of its user interface based on the recognized language and the classified vocabulary proficiency.

In step S750, the media playback device 110 may dynamically output the user interface by applying the converted output items.
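Steps S710 to S750 can be wired together as in the following sketch. It assumes speech has already been transcribed to text for S710 and treats each unit as an injected function. The `Pipeline` class and its field names are hypothetical. The lambdas in the usage stand in for the real recognition, classification, and conversion units.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Pipeline:
    """Steps S710-S750 wired as injectable callables (hypothetical interfaces)."""
    recognize_language: Callable[[str], str]                # S720
    classify_proficiency: Callable[[str, str], str]         # S730
    convert_ui: Callable[[str, str], Dict[int, str]]        # S740
    render: Callable[[Dict[int, str]], None]                # S750

    def handle(self, voice_command: str) -> Dict[int, str]:
        # S710: the voice command arrives (here, already transcribed to text).
        lang = self.recognize_language(voice_command)                 # S720
        proficiency = self.classify_proficiency(voice_command, lang)  # S730
        ui = self.convert_ui(lang, proficiency)                       # S740
        self.render(ui)                                               # S750
        return ui


if __name__ == "__main__":
    pipeline = Pipeline(
        recognize_language=lambda text: "en" if text.isascii() else "ko",
        classify_proficiency=lambda text, lang: "A",
        convert_ui=lambda lang, prof: {610: f"{lang}/{prof} guide sentence"},
        render=print,
    )
    pipeline.handle("turn on all room lights")  # prints {610: 'en/A guide sentence'}
```

Injecting each stage keeps the step order fixed while letting individual steps be split, merged, or replaced, as the description allows.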

Depending on the implementation of the present invention, steps S710 to S750 may be divided into additional steps or combined into fewer steps. Some steps may also be omitted as needed, and the order of steps may be changed.

The method of providing a multilingual voice command service in the media playback device, described with reference to FIGS. 1 to 7, may also be implemented in either of two forms. The first is a computer program stored in a medium and executed by a computer. The second is a recording medium containing computer-executable instructions.

A computer-readable medium may be any available medium that a computer can access. It includes volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media may also include computer storage media. These include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is illustrative. Those of ordinary skill in the art will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above. All changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present invention.

110: media playback device
115: display
120: multilingual conversation server
210: input unit
220: language recognition unit
230: classification unit
240: utterance guide sentence generation unit
250: conversion unit
260: communication unit
270: language feature analysis unit
280: voice conversion unit
290: output unit

Claims

A media playback device that provides a multilingual voice command service, comprising:
An input unit that receives a voice command from a user;
A language recognition unit that analyzes the input voice command and recognizes the language spoken by the user;
A classification unit that determines a sentence structure of the voice command based on the recognized language and classifies a vocabulary proficiency according to the determined sentence structure;
A conversion unit that converts output items of the user interface of the media playback device based on the recognized language and the classified vocabulary proficiency; and
An output unit that dynamically outputs the user interface by applying the converted output items.

According to claim 1,
The classification unit determines the part of speech of at least one word included in the sentence structure of the voice command, categorizes the vocabulary used for each determined part of speech, maps a grade to each determined part of speech according to the categorized vocabulary, and determines the user's vocabulary proficiency based on the grade mapped to each part of speech.

According to claim 1,
Further comprising an utterance guide sentence generation unit that generates an utterance guide sentence in the language spoken by the user based on the classified vocabulary proficiency,
wherein the output unit outputs the generated utterance guide sentence to a display.

The method of claim 3,
The utterance guide sentence generation unit selects vocabulary based on the classified vocabulary proficiency and converts the utterance guide sentence into the recognized language of the user based on the selected vocabulary.

The method of claim 4,
The utterance guide sentence generation unit derives voice command information based on the frequency of use of the voice commands spoken by the user,
selects utterance list sentences based on the derived voice command information,
and reconstructs the vocabulary of the selected utterance list sentences based on the user's vocabulary proficiency.

The method of claim 5,
The utterance guide sentence generation unit converts the utterance list sentences with reconstructed vocabulary into the recognized language of the user based on a preset per-language mapping table.

According to claim 1,
Further comprising a communication unit that transmits the voice command to a multilingual conversation server based on the recognized language of the user,
and receives an analysis result of the voice command from the multilingual conversation server.

The method of claim 7,
Further comprising a voice conversion unit that synthesizes the analysis result of the voice command received from the multilingual conversation server with the intonation style of the user,
wherein the output unit outputs, through audio, the analysis result of the voice command synthesized with the intonation style of the user.

According to claim 1,
Further comprising a language feature analysis unit that extracts language feature information on the intonation and accent of the user from the input voice command and analyzes the intonation style of the voice command from a pattern of the extracted language feature information.

The method of claim 9,
The language feature analysis unit converts the input voice command into pitch symbols based on a prosodic transcription convention, analyzes the converted pitch symbols based on a preset speech sequence, and infers the intonation style of the voice command based on the similarity between the pattern of the analyzed pitch symbols and pre-stored per-language intonation patterns,
and the conversion unit converts the output items based on the inferred intonation style.

According to claim 1,
The conversion unit converts the text output on the display of the media playback device into the recognized language format with reference to the classified vocabulary proficiency,
And the output unit outputs the converted text to the display.

A method for providing a multilingual voice command service in a media playback device, comprising:
Receiving a voice command from a user;
Analyzing the input voice command to recognize the language spoken by the user;
Determining a sentence structure of the voice command based on the recognized language, and classifying a vocabulary proficiency according to the determined sentence structure;
Converting output items of the user interface of the media playback device based on the recognized language and the classified vocabulary proficiency; and
Dynamically outputting the user interface by applying the converted output items.

The method of claim 12,
wherein the classifying comprises: determining the part of speech of at least one word included in the sentence structure of the voice command;
Categorizing the vocabulary used for each determined part of speech;
Mapping a grade to each determined part of speech according to the categorized vocabulary; and
Determining the user's vocabulary proficiency based on the grade mapped to each part of speech.

The method of claim 12,
The method further comprising: generating an utterance guide sentence in the language spoken by the user based on the classified vocabulary proficiency; and
Displaying the generated utterance guide sentence on a display.

The method of claim 14,
wherein generating the utterance guide sentence comprises: selecting vocabulary based on the classified vocabulary proficiency; and
Converting the utterance guide sentence into the recognized language of the user based on the selected vocabulary.

The method of claim 15,
wherein generating the utterance guide sentence comprises: deriving voice command information based on the frequency of use of the voice commands spoken by the user;
Selecting utterance list sentences based on the derived voice command information;
Reconstructing the vocabulary of the selected utterance list sentences based on the user's vocabulary proficiency; and
Converting the utterance list sentences with reconstructed vocabulary into the recognized language of the user based on a preset per-language mapping table.

The method of claim 12,
The method further comprising: transmitting the voice command to a multilingual conversation server based on the recognized language information of the user;
Receiving an analysis result of the voice command from the multilingual conversation server;
Synthesizing the analysis result of the voice command received from the multilingual conversation server with the intonation style of the user; and
Outputting, through audio, the analysis result of the voice command synthesized with the intonation style of the user.

The method of claim 12,
The method further comprising: extracting language feature information on the intonation and accent of the user from the input voice command; and
Analyzing the intonation style of the voice command from a pattern of the extracted language feature information.

The method of claim 18,
The method further comprising: converting the input voice command into pitch symbols based on a prosodic transcription convention;
Analyzing the converted pitch symbols based on a preset speech sequence;
Inferring the intonation style of the voice command based on the similarity between the pattern of the analyzed pitch symbols and pre-stored per-language intonation patterns; and
Converting the output items based on the inferred intonation style.

A computer program stored on a computer-readable medium, comprising a sequence of instructions for providing a multilingual voice command service through a media playback device,
wherein, when the computer program is executed by a computing device, the sequence of instructions causes the computing device to:
receive a voice command from a user,
recognize the language spoken by the user by analyzing the input voice command,
determine the sentence structure of the voice command based on the recognized language of the user,
classify a vocabulary proficiency according to the determined sentence structure,
convert output items of the user interface of the media playback device based on the recognized language and the classified vocabulary proficiency, and
dynamically output the user interface by applying the converted output items.