KR102145370B1 - Media play device and method for controlling screen and server for analyzing screen

Info

Publication number: KR102145370B1
Application number: KR1020180082371A
Authority: KR
Inventors: 최재원; 임정빈; 윤성인; 이강태
Original assignee: 주식회사 케이티
Priority date: 2018-07-16
Filing date: 2018-07-16
Publication date: 2020-08-18
Also published as: KR20200008341A

Abstract

A media playback device for controlling a screen includes: an input unit that receives a voice command uttered by a user; a voice command transmission unit that transmits the input voice command to a voice recognition server; a text information receiving unit that receives, from the voice recognition server, text information generated based on the voice command; a screen state analysis unit that analyzes a menu screen displayed on the media playback device; a screen information transmission unit that transmits the received text information and screen state information on the analyzed menu screen to a screen analysis server; a receiving unit that receives, from the screen analysis server, a control command generated based on the text information and the screen state information; and an execution unit that controls the menu screen based on the received control command.

Description

Media playback device and method for controlling a screen, and server for analyzing a screen {MEDIA PLAY DEVICE AND METHOD FOR CONTROLLING SCREEN AND SERVER FOR ANALYZING SCREEN}

The present invention relates to a media playback device and a method for controlling a screen, and to a server for analyzing a screen.

IPTV (Internet Protocol Television) is an interactive broadcasting and communication service that uses a high-speed Internet network to deliver video content such as movies and broadcast programs, along with various multimedia content such as Internet search, to a television receiver. Using a remote control, viewers can search the Internet and also receive through IPTV the various content and additional services offered on the Internet, such as movies, home shopping, home banking, online games, and MP3s.

Recent IPTV systems provide IPTV services through recognition of speech uttered by the user instead of a remote control. In relation to such services, the prior art Korean Laid-open Patent Publication No. 2011-0027362 discloses an IPTV system and service method using a voice interface.

Conventionally, however, when a user tries to reach an intended menu on IPTV through voice recognition, the IPTV cannot determine which screen the user was viewing when speaking, so it has to reach the related content or the corresponding menu through a conditional text search. As a result, it takes a long time to reach the menu or content the user intended.

An object of the present invention is to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that compare and analyze a voice command uttered by a user against the currently displayed menu screen and move to the lower-level menu screen that matches the user's intention. A further object is to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that compare and analyze a voice command uttered by a user against the currently displayed menu screen and select and display the content that matches the user's intention.

Conventionally, a user could reach a desired menu only by repeatedly uttering a wake word followed by a command. A further object is to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that let the user reach the desired menu quickly by uttering the wake word once and then uttering commands in succession.

A further object is to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that extract and manage logs when the user utters an additional voice command after the menu screen has been controlled according to the user's voice command, and thereby provide customized content recommendations and menu organization based on the user's history. However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for achieving the above technical objects, an embodiment of the present invention may provide a media playback device including: an input unit that receives a voice command uttered by a user; a voice command transmission unit that transmits the input voice command to a voice recognition server; a text information receiving unit that receives, from the voice recognition server, text information generated based on the voice command; a screen state analysis unit that analyzes a menu screen displayed on the media playback device; a screen information transmission unit that transmits the received text information and screen state information on the analyzed menu screen to a screen analysis server; a receiving unit that receives, from the screen analysis server, a control command generated based on the text information and the screen state information; and an execution unit that controls the menu screen based on the received control command.

Another embodiment of the present invention may provide a media playback device including: an input unit that receives a voice command uttered by a user; a screen information transmission unit that transmits the input voice command and screen state information on a menu screen displayed on the media playback device to a screen analysis server; a receiving unit that receives, from the screen analysis server, a control command generated based on the voice command and the screen state information; and a control unit that controls the menu screen based on the received control command.

Still another embodiment of the present invention may provide a menu screen control method including: receiving a voice command uttered by a user; transmitting the input voice command to a voice recognition server; receiving, from the voice recognition server, text information generated based on the voice command; analyzing a menu screen displayed on the media playback device; transmitting the received text information and screen state information on the analyzed menu screen to a screen analysis server; receiving, from the screen analysis server, a control command generated based on the text information and the screen state information; and controlling the menu screen based on the received control command.

Still another embodiment of the present invention may provide a menu screen analysis server including: a screen state information receiving unit that, when a media playback device receives a voice command uttered by a user, receives from the media playback device text information converted from the voice command and screen state information on a menu screen displayed on the media playback device; and a transmission unit that generates, based on the screen state information, a control command corresponding to the voice command on the menu screen and transmits it to the media playback device.

The above means for solving the problems are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description of the invention.

According to any one of the above means for solving the problems, it is possible to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that compare and analyze a voice command uttered by a user against the currently displayed menu screen and move to the lower-level menu screen that matches the user's intention. It is also possible to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that compare and analyze a voice command uttered by a user against the currently displayed menu screen and select and display the content that matches the user's intention.

Conventionally, a user could reach a desired menu only by repeatedly uttering a wake word followed by a command. It is possible to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that let the user reach the desired menu quickly by uttering the wake word once and then uttering commands in succession.

It is possible to provide a media playback device and method for controlling a screen, and a server for analyzing a screen, that extract and manage logs when the user utters an additional voice command after the menu screen has been controlled according to the user's voice command, and thereby provide customized content recommendations and menu organization based on the user's history.

FIG. 1 is a block diagram of a screen control system according to an embodiment of the present invention.
FIG. 2 is a block diagram of a media playback device according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram showing a screen that prompts the user for continuous utterance for screen control according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram showing a screen on which numbered content is selected through the user's voice utterance according to an embodiment of the present invention.
FIG. 5 is a flowchart of a method for controlling a screen in a media playback device according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method for controlling a screen in a media playback device according to another embodiment of the present invention.
FIG. 7 is a block diagram of a screen analysis server according to an embodiment of the present invention.
FIGS. 8A and 8B are exemplary diagrams showing a menu screen on which text has been extracted from an image, for the case where text information is contained in an image constituting the screen data, according to an embodiment of the present invention.
FIG. 9 is a flowchart of a method for analyzing a screen in a screen analysis server according to an embodiment of the present invention.
FIGS. 10A to 10C are exemplary diagrams for explaining a process in which a media playback device according to an embodiment of the present invention is controlled to move from the main screen to the screen for the latest movies category according to the user's intention.
FIGS. 11A to 11C are exemplary diagrams for explaining a process in which a media playback device according to an embodiment of the present invention is controlled to move from the latest movies category to the screen for a movie content item according to the user's intention.
FIGS. 12A and 12B are exemplary diagrams for explaining a process in which a media playback device according to an embodiment of the present invention controls the screen according to continuous utterances input by the user.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a screen control system according to an embodiment of the present invention. Referring to FIG. 1, the screen control system 1 may include a media playback device 110, a display device 115, a voice recognition server 120, and a screen analysis server 130. The media playback device 110, the display device 115, the voice recognition server 120, and the screen analysis server 130 are shown as examples of components that can be controlled by the screen control system 1.

The components of the screen control system 1 of FIG. 1 are generally connected through a network. For example, as shown in FIG. 1, the media playback device 110 may be connected to the voice recognition server 120 or the screen analysis server 130 simultaneously or at different times.

A network refers to a connection structure that allows nodes such as terminals and servers to exchange information with one another, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

According to an embodiment, the media playback device 110, the voice recognition server 120, and the screen analysis server 130 may process successive commands through continuous standby. In continuous standby, whether the next command is executed depends on the result of processing the previous command. After the user 100 utters the wake word, the system waits for further voice commands without requiring the wake word to be uttered again, and the menu screen is controlled accordingly.

The media playback device 110 may receive a voice command uttered by the user 100.

The media playback device 110 may transmit the input voice command to the voice recognition server 120 and receive, from the voice recognition server 120, text information generated based on the voice command.

The media playback device 110 may analyze the menu screen displayed on the media playback device 110. When the analysis of the menu screen is complete, the media playback device 110 may transmit the received text information and the screen state information on the analyzed menu screen to the screen analysis server 130. The screen state information includes the menu ID, category ID, content ID, screen component ID, and the like associated with the menu screen, and each ID may be mapped to a menu name, category name, content name, or screen component name. The media playback device 110 may then receive, from the screen analysis server 130, a control command generated based on the text information and the screen state information, and control the menu screen based on the received control command. The media playback device 110 may additionally transmit the screen data displayed on the media playback device 110 to the screen analysis server 130. If no control command related to the menu screen and the screen data is received from the screen analysis server 130, the media playback device 110 may transmit the received text information to the voice recognition server 120 and receive a search result corresponding to the text information from the voice recognition server 120.
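The device-side flow described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `ScreenState`, `handle_utterance`, `analyze`, and `search` are hypothetical, and the two servers are represented by plain callables so that the fallback logic is visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ScreenState:
    """Screen state information: IDs describing the displayed menu screen."""
    menu_id: Optional[str] = None
    category_id: Optional[str] = None
    content_ids: List[str] = field(default_factory=list)
    component_ids: List[str] = field(default_factory=list)


def handle_utterance(
    text: str,
    screen: Optional[ScreenState],
    analyze: Callable[[str, ScreenState], Optional[dict]],
    search: Callable[[str], dict],
) -> dict:
    """Prefer a control command from the screen analysis server (analyze);
    fall back to a text search (search) when no menu screen is displayed
    or when no control command comes back."""
    if screen is not None:  # a menu screen is currently displayed
        command = analyze(text, screen)
        if command is not None:
            return {"type": "control", "command": command}
    return {"type": "search", "result": search(text)}
```

Passing `screen=None` models the case where no menu screen is displayed, in which the text goes straight to the search path.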

When the media playback device 110 is not displaying a menu screen, the media playback device 110 may transmit the received text information to the voice recognition server 120 and receive a search result corresponding to the text information from the voice recognition server 120.

The display device 115 may display a menu screen for providing TV or VOD.

When the media playback device 110 receives, from the screen analysis server 130, a control command corresponding to the voice command of the user 100, the display device 115 may display the menu screen reached according to the control command, or may play and display content.

The voice recognition server 120 may receive, from the media playback device 110, a voice file containing the voice command uttered by the user 100, extract text information from the received voice file, and transmit the extracted text information to the media playback device 110.

The screen analysis server 130 may receive metadata related to menu screens from a menu management server (not shown), and may store and update a mapping table for the menu screens. Here, a menu screen may consist of a menu list, a category list, and a content list; the menu list may map menu IDs to menu names, the category list may map category IDs to category names, and the content list may map content IDs to content names. Among the menu screens, the content detail screen may include purchase buttons by picture quality (low/high quality), videos frequently watched together, person search, trailers/additional videos, more ratings, more of the synopsis, and the like.

When the screen analysis server 130 determines that text information corresponding to a screen component of the menu screen is contained in an image constituting the screen data, it may extract the text from the image, map the extracted text to the ID of the screen component, and update the mapping table.
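As a rough sketch of the mapping table described above, the following assumes a flat table keyed by ID. The metadata layout, the `build_mapping_table` and `register_image_text` names, and the `(kind, name)` tuple are illustrative assumptions; the text extraction step itself (for example OCR) is outside the sketch and is represented by an already extracted string.

```python
from typing import Dict, Tuple

# Mapping table: ID -> (kind, display name). The layout is an assumption.
MappingTable = Dict[str, Tuple[str, str]]


def build_mapping_table(metadata: dict) -> MappingTable:
    """Flatten the menu, category, and content lists from menu metadata."""
    table: MappingTable = {}
    for kind in ("menu", "category", "content"):
        for entry in metadata.get(kind, []):
            table[entry["id"]] = (kind, entry["name"])
    return table


def register_image_text(table: MappingTable, component_id: str,
                        extracted_text: str) -> None:
    """Map text extracted from an on-screen image to a screen component ID."""
    text = extracted_text.strip()
    if text:  # ignore empty extraction results
        table[component_id] = ("component", text)
```

For instance, a purchase button rendered as an image could be registered under a component ID so that a spoken phrase matching its label becomes addressable like any other menu entry.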

In addition to the programming lineup, the screen analysis server 130 may assign separate IDs to screen configuration menus such as the content detail screen and settings, and manage them additionally through the mapping table.

The screen analysis server 130 may manage the mapping table according to the product to which the media playback device 110 is subscribed. For example, the menu screens provided by the media playback device 110 may differ depending on whether, as a basic product, it is subscribed to Mass/Biz or to a standalone IPTV product or satellite-linked IPTV product, and whether, as an add-on product, it is subscribed to a channel/VOD monthly plan. The screen analysis server 130 may therefore manage the mapping table for the menu screens according to the product to which the media playback device 110 is subscribed.

The screen analysis server 130 may receive, from the media playback device 110, text information converted from the voice command and screen state information on the menu screen displayed on the media playback device 110. The screen analysis server 130 may determine, based on the received text information and screen state information, whether the voice command is a control command for the menu screen. When the media playback device 110 receives the wake word (for example, GiGA Genie) from the user 100, a view session for high-speed processing of menu screen control may be created between the screen analysis server 130 and the media playback device 110.

The screen analysis server 130 may receive screen data from the media playback device 110.

The screen analysis server 130 may determine, based on the screen state information, whether the voice command is a control command for the menu screen, generate a control command corresponding to the voice command on the menu screen, and transmit it to the media playback device 110. For example, the screen analysis server 130 may extract at least one of a menu ID, category ID, content ID, and screen component ID corresponding to the text information received from the media playback device 110, and, based on the extracted ID, transmit to the media playback device 110 a control command for the menu that corresponds to the voice command on the menu screen.
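The ID extraction step above might look like the following sketch. The patent does not specify the matching rule, so substring matching on whitespace-normalized text with a longest-match tie-break is an assumption, as are the function name `build_control_command` and the shape of the returned command.

```python
from typing import Dict, Iterable, Optional, Tuple


def _norm(s: str) -> str:
    """Lowercase and drop whitespace so 'Latest Movies' matches 'latest movies'."""
    return "".join(s.lower().split())


def build_control_command(
    text: str,
    on_screen_ids: Iterable[str],
    table: Dict[str, Tuple[str, str]],
) -> Optional[dict]:
    """Return a command for the on-screen item whose name appears in the
    recognized text, or None if the text matches nothing on this screen."""
    spoken = _norm(text)
    best = None  # (item_id, kind, normalized name)
    for item_id in on_screen_ids:
        if item_id not in table:
            continue
        kind, name = table[item_id]
        key = _norm(name)
        # Prefer the longest matching name to resolve overlaps.
        if key and key in spoken and (best is None or len(key) > len(best[2])):
            best = (item_id, kind, key)
    if best is None:
        return None
    return {"target_id": best[0], "target_kind": best[1]}
```

Restricting candidates to IDs reported in the screen state information is what lets the same spoken phrase resolve differently depending on which screen the user is looking at.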

After the screen analysis server 130 transmits the control command for the menu corresponding to the voice command to the media playback device 110, the media playback device 110 and the screen analysis server 130 may perform continuous standby. In this state, they repeat the following process: text information corresponding to a voice command from the user 100 and screen state information are received in succession, the ID corresponding to the received text information is extracted, and, based on the extracted ID, a control command for the menu corresponding to the voice command on the menu screen is transmitted to the media playback device 110. If no ID corresponding to the text information received from the media playback device 110 can be extracted, the media playback device 110 and the screen analysis server 130 may stop performing continuous standby.
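The continuous standby loop can be illustrated with a short sketch. It assumes utterances arrive as already recognized text and that a hypothetical `resolve` callable returns a control command or `None`; timeouts and session handling are omitted.

```python
from typing import Callable, Iterable, List, Optional


def run_continuous_standby(
    utterances: Iterable[str],
    resolve: Callable[[str], Optional[str]],
    wake_word: str = "GiGA Genie",
) -> List[str]:
    """Process commands after a single wake word until one fails to resolve."""
    commands: List[str] = []
    stream = iter(utterances)
    if next(stream, None) != wake_word:
        return commands  # nothing happens without the wake word
    for text in stream:
        command = resolve(text)
        if command is None:
            break  # no matching ID: leave continuous standby
        commands.append(command)
    return commands
```

With a resolver that knows only "latest movies" and "number two", the sequence wake word, "latest movies", "number two", "what's the weather" yields two commands and then ends standby, so a later utterance would need the wake word again.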

According to another embodiment, the media playback device 110, the voice recognition server 120, and the screen analysis server 130 may process successive commands through continuous utterance. In continuous utterance, after the user 100 utters a voice command, a session between the screen analysis server 130 and the voice recognition server 120 is created and maintained, so that successive commands can be processed to control the menu screen.

The media playback device 110 may receive a voice command uttered by the user 100 and transmit the input voice command and the screen state information on the menu screen displayed on the media playback device 110 to the screen analysis server 130.

The media playback device 110 may receive, from the screen analysis server 130, a control command generated based on the voice command and the screen state information, and control the menu screen based on the received control command.

미디어 재생 장치(110)는 화면 분석 서버(130)로부터 음성 명령 및 메뉴 화면에 표시된 텍스트 정보에 기초하여 음성 명령에 대한 발화 가이드 정보를 수신하여 표시할 수 있다. 예를 들어, 미디어 재생 장치(110)는 복수의 발화 가이드 정보를 오버레이(overlay) 방식 또는 롤링 방식 중 어느 하나로 표시할 수 있다. The media playback device 110 may receive and display speech guide information for the voice command based on the voice command and text information displayed on the menu screen from the screen analysis server 130. For example, the media playback device 110 may display a plurality of utterance guide information in either an overlay method or a rolling method.

When continuous utterance of voice commands is possible on the media playback device 110, the voice recognition server 120 may receive, from the screen analysis server 130, the voice command and a hint derived from the screen state information.

The voice recognition server 120 may perform voice recognition based on the voice command and the hint, and transmit predetermined recognized text to the screen analysis server 130.

The screen analysis server 130 may receive, from the media playback device 110, a voice command and screen state information on the menu screen displayed on the media playback device 110. When the screen analysis server 130 receives the voice command and the screen state information from the media playback device 110, an ASR session for processing continuous utterance of voice commands at high speed may be created between the screen analysis server 130 and the voice recognition server 120.

The screen analysis server 130 may transmit the voice command received from the media playback device 110 and a hint derived from the screen state information of the media playback device 110 to the voice recognition server 120, and receive predetermined text from the voice recognition server 120 in order of accuracy.

The screen analysis server 130 may determine, based on the received recognized text and the screen state information, whether the voice command is a control command for the menu screen of the media playback device 110. Having made this determination, the screen analysis server 130 may generate a control command corresponding to the voice command on the menu screen and transmit it to the media playback device 110.

FIG. 2 is a block diagram of a media playback device according to an embodiment of the present invention. Referring to FIG. 2, the media playback device 110 may include an input unit 210, a transmission unit 220, a text information reception unit 230, a screen state analysis unit 240, a screen information transmission unit 250, a reception unit 260, a control unit 270, and a display unit 280.

According to an embodiment, the media playback device 110 may process continuous commands through continuous standby. In continuous standby, whether a command is executed is determined by the result of processing the previous command. After the user 100 utters the wake word, the device waits for further voice commands without the wake word being uttered again, so that the menu screen can be controlled.

The input unit 210 may receive a voice command uttered by the user 100. For example, the input unit 210 may receive a wake word (for example, "GiGA Genie") uttered by the user 100 and then receive a voice command such as "Go to latest movies" from the user 100.

The transmission unit 220 may transmit the input voice command to the voice recognition server 120.

When the media playback device 110 is not displaying a menu screen, the transmission unit 220 may transmit the converted text information to the voice recognition server 120.

When no control command related to the menu screen and the screen data is received from the screen analysis server 130, the transmission unit 220 may transmit the received text information to the voice recognition server 120.

The text information reception unit 230 may receive, from the voice recognition server 120, text information generated based on the voice command. Here, the text information may have been generated from the voice command by the voice recognition server 120 through speech-to-text (STT).

The screen state analysis unit 240 may analyze the menu screen displayed on the media playback device 110. For example, the screen state analysis unit 240 may analyze whether the menu screen currently displayed on the media playback device 110 is a main menu screen, a category screen, or a content screen.

The screen information transmission unit 250 may transmit the received text information and the screen state information on the analyzed menu screen to the screen analysis server 130. The screen state information includes a menu ID, a category ID, a content ID, a screen component ID, and the like related to the menu screen, and each ID may be mapped to a menu name, a category name, a content name, and a screen component name, respectively.
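One way to picture the screen state information is as a set of ID-to-name tables plus the screen type. The following Python sketch is only an illustration under that assumption; the class name `ScreenState`, its field names, the ID strings, and the payload layout are hypothetical and are not specified in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScreenState:
    """Hypothetical container for the screen state information."""

    screen_type: str  # e.g. "main_menu", "category", or "content"
    menus: Dict[str, str] = field(default_factory=dict)       # menu ID -> menu name
    categories: Dict[str, str] = field(default_factory=dict)  # category ID -> category name
    contents: Dict[str, str] = field(default_factory=dict)    # content ID -> content name
    components: Dict[str, str] = field(default_factory=dict)  # component ID -> component name

    def names(self) -> Dict[str, str]:
        """Flatten every ID -> name table into a single lookup table."""
        merged: Dict[str, str] = {}
        for table in (self.menus, self.categories, self.contents, self.components):
            merged.update(table)
        return merged


state = ScreenState(
    screen_type="category",
    categories={"cat-02": "Movies/Series"},
    menus={"menu-05": "Popular TOP 10"},
)
# What the device might send to the screen analysis server (assumed shape).
payload = {"text": "Popular TOP 10", "screen_state": state.names()}
print(payload)
```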

The screen information transmission unit 250 may transmit the screen data displayed on the media playback device 110 to the screen analysis server 130. The screen data may include, for example, HTML, JavaScript, CSS, image URLs, hash tags, data, and the like.

The reception unit 260 may receive, from the screen analysis server 130, a control command generated based on the text information and the screen state information. The control command may include, for example, menu movement, menu or content selection, content execution, and the like.

When the media playback device 110 is not displaying a menu screen, the reception unit 260 may receive a search result corresponding to the text information from the voice recognition server 120.

When no control command related to the menu screen and the screen data is received from the screen analysis server 130, the reception unit 260 may receive a search result corresponding to the text information from the voice recognition server 120.
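The two fallback conditions above amount to a small routing decision on the device. The sketch below is a hedged illustration only; the function name `choose_route` and the string labels `"control"` and `"search"` are hypothetical.

```python
from typing import Optional


def choose_route(menu_displayed: bool, control_command: Optional[str]) -> str:
    """Decide whether to apply a control command or fall back to a search result."""
    if not menu_displayed:
        # No menu screen is shown, so there is nothing to control.
        return "search"
    if control_command is None:
        # The screen analysis server returned no control command.
        return "search"
    return "control"


print(choose_route(True, "MOVE:menu-05"))
print(choose_route(True, None))
print(choose_route(False, None))
```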

The control unit 270 may control the menu screen based on the received control command.

According to another embodiment, the media playback device 110 may receive continuous utterances from the user 100 and process continuous commands. In continuous utterance, after the user 100 utters a voice command, a session is created and maintained between the screen analysis server 130 and the voice recognition server 120, so that continuous commands can be processed to control the menu screen.

The input unit 210 may receive a voice command uttered by the user 100.

The screen information transmission unit 250 may transmit the input voice command and screen state information on the menu screen displayed on the media playback device 110 to the screen analysis server 130.

When the screen information transmission unit 250 transmits the voice command and the screen state information on the menu screen displayed on the media playback device 110 to the screen analysis server 130, a session may be created between the screen analysis server 130 and the voice recognition server 120.

The reception unit 260 may receive, from the screen analysis server 130, a control command generated based on the voice command and the screen state information. The control command may be generated based on the voice command and a hint derived from the screen state information, both transmitted from the screen analysis server 130 to the voice recognition server 120 through the session.

The reception unit 260 may receive, from the screen analysis server 130, utterance guide information for the voice command that is based on the voice command and the text information displayed on the menu screen.

The display unit 280 may display the utterance guide information. For example, the display unit 280 may display a plurality of pieces of utterance guide information in either an overlay manner or a rolling manner.

FIG. 3 is an exemplary diagram showing a screen that prompts the user's continuous utterance for screen control according to an embodiment of the present invention. Referring to FIG. 3, the media playback device 110 may display utterance guide information to prompt continuous utterance by the user 100.

For example, when the user 100 utters the wake word "GiGA Genie", the media playback device 110 may indicate, by displaying a microphone 310, that it is receiving the voice of the user 100. In addition, when the wake word uttered by the user 100 is recognized, the media playback device 110 may display a message 320 such as "Yes, I'm listening".

By displaying utterance guide information that prompts the user 100 to utter commands in succession, the media playback device 110 can enable the user 100 to reach the final menu or content that the user 100 wants. Suppose the categories include Today's Recommendations, Movies/Series, TV Replay, Kids Land, Animation, Music/Education/Documentary, Flat-Rate Exclusive Hall TV Apps, and the like, and the sub-menus of TV Replay include Burning, Cannes Film Festival Entries, Courtroom Dramas, Popular Animation, VIP Benefit Zone, Popular TOP 100, Today's Events, New Customer Star Rating Zone, and the like. In that case, the media playback device 110 may display utterance guide information 300 that prompts continuous utterance by the user 100, such as "TV Replay" -> "VIP Benefit Zone".

FIG. 4 is an exemplary diagram illustrating a screen on which numbered content is selected through the user's voice utterance according to an embodiment of the present invention. Referring to FIG. 4, when the menu screen displayed on the media playback device 110 is the 'Movies/Series' screen and 'Popular TOP 10' 400, a sub-menu of 'Movies/Series', is received from the user 100 as a voice command, the media playback device 110 may display the menu screen for 'Popular TOP 10' 400. The media playback device 110 may then number and display each of a plurality of contents 410 belonging to 'Popular TOP 10' 400.

For example, when the media playback device 110 receives the voice command "Play number 1" from the user 100 while the 'Popular TOP 10' 400 menu screen is displayed, the media playback device 110 may play 'Champion', the content corresponding to 'No. 1' 411, or move the menu screen to its detail screen.
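Selecting numbered content from an utterance such as "Play number 1" can be sketched as follows. This is a simplified illustration rather than the patented method: the function name `select_numbered` is hypothetical, the second and third titles are placeholders, and the sketch simply takes the first run of digits in the recognized text, which would misfire on utterances that contain other numbers.

```python
import re
from typing import List, Optional


def select_numbered(utterance: str, contents: List[str]) -> Optional[str]:
    """Return the content whose 1-based display number appears in the utterance."""
    match = re.search(r"\d+", utterance)
    if match is None:
        return None  # no number spoken
    index = int(match.group()) - 1  # displayed numbers start at 1
    if 0 <= index < len(contents):
        return contents[index]
    return None  # number is outside the displayed range


# 'Champion' comes from the example above; the other titles are placeholders.
top10 = ["Champion", "Second Title", "Third Title"]
print(select_numbered("Play number 1", top10))
```

Because the pattern only looks for digits, the same function also handles the original Korean utterance "1번 틀어줘".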

Numbers or letters are assigned to each of the plurality of contents because, when the content names are long, the media playback device 110 cannot display the full names on the screen. The numbering makes menu movement and content selection more convenient for the user.

FIG. 5 is a flowchart of a method for controlling a screen in a media playback device according to an embodiment of the present invention. The method of controlling a screen in the media playback device 110 shown in FIG. 5 includes steps processed in time series by the screen control system 1 according to the embodiments shown in FIGS. 1 to 4. Accordingly, even where omitted below, the description given above also applies to the method of controlling a screen in the media playback device 110 according to the embodiments shown in FIGS. 1 to 4.

In step S510, the media playback device 110 may receive a voice command uttered by the user 100.

In step S520, the media playback device 110 may transmit the input voice command to the voice recognition server 120.

In step S530, the media playback device 110 may receive, from the voice recognition server 120, text information generated based on the voice command.

In step S540, the media playback device 110 may analyze the menu screen displayed on the media playback device 110.

In step S550, the media playback device 110 may transmit the received text information and the screen state information on the analyzed menu screen to the screen analysis server 130.

In step S560, the media playback device 110 may receive, from the screen analysis server 130, a control command generated based on the text information and the screen state information.

In step S570, the media playback device 110 may control the menu screen based on the received control command.

In the above description, steps S510 to S570 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
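Steps S510 to S570 can be read as a short device-side pipeline. The sketch below is an illustration under stated assumptions: the two servers are represented by injected callables, and every name (`control_screen`, `recognize`, `analyze_screen`, `request_command`, `apply_command`) and the `MOVE:`/`NOOP` command strings are hypothetical rather than part of the patent.

```python
from typing import Callable, Dict, List


def control_screen(
    audio: bytes,                                            # S510: captured voice command
    recognize: Callable[[bytes], str],                       # S520-S530: voice recognition server
    analyze_screen: Callable[[], Dict[str, str]],            # S540: screen state analysis
    request_command: Callable[[str, Dict[str, str]], str],   # S550-S560: screen analysis server
    apply_command: Callable[[str], None],                    # S570: menu screen control
) -> str:
    """Run one pass of the S510-S570 flow and return the applied command."""
    text = recognize(audio)
    screen_state = analyze_screen()
    command = request_command(text, screen_state)
    apply_command(command)
    return command


# Stub collaborators standing in for the servers and the display.
applied: List[str] = []
result = control_screen(
    b"\x00\x01",  # placeholder audio bytes
    recognize=lambda audio: "Popular TOP 10",
    analyze_screen=lambda: {"menu-05": "Popular TOP 10"},
    request_command=lambda text, state: next(
        (f"MOVE:{item_id}" for item_id, name in state.items() if name == text),
        "NOOP",
    ),
    apply_command=applied.append,
)
print(result, applied)
```

Passing the collaborators in as callables keeps the sketch runnable without any network code and mirrors the note above that steps may be split, merged, or reordered.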

FIG. 6 is a flowchart of a method for controlling a screen in a media playback device according to another embodiment of the present invention. The method of controlling a screen in the media playback device 110 shown in FIG. 6 includes steps processed in time series by the screen control system 1 according to the embodiments shown in FIGS. 1 to 5. Accordingly, even where omitted below, the description given above also applies to the method of controlling a screen in the media playback device 110 according to the embodiments shown in FIGS. 1 to 5.

In step S610, the media playback device 110 may receive a voice command uttered by the user 100.

In step S620, the media playback device 110 may transmit the input voice command and screen state information on the menu screen displayed on the media playback device 110 to the screen analysis server 130.

In step S630, the media playback device 110 may receive, from the screen analysis server 130, a control command generated based on the voice command and the screen state information.

In step S640, the media playback device 110 may control the menu screen based on the received control command.

In the above description, steps S610 to S640 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

FIG. 7 is a block diagram of a screen analysis server according to an embodiment of the present invention. Referring to FIG. 7, the screen analysis server 130 may include a screen state information reception unit 710, an extraction unit 720, a determination unit 730, a transmission unit 740, a text extraction unit 750, a mapping unit 760, and a management unit 770.

According to an embodiment, the screen analysis server 130 may process continuous commands through continuous standby with the media playback device 110. In continuous standby, whether a command is executed is determined by the result of processing the previous command. After the user 100 utters the wake word, the device waits for further voice commands without the wake word being uttered again, so that the menu screen can be controlled.

When the media playback device 110 receives a voice command uttered by the user 100, the screen state information reception unit 710 may receive, from the media playback device 110, text information converted from the voice command and screen state information on the menu screen displayed on the media playback device 110.

The screen state information reception unit 710 may receive screen data from the media playback device 110. The screen data may include, for example, HTML, JavaScript, CSS, image URLs, hash tags, data, and the like.

The extraction unit 720 may extract, from the screen state information on the menu screen received from the media playback device 110, at least one of a menu ID, a category ID, a content ID, and a screen component ID corresponding to the text information of the voice command.

The determination unit 730 may determine, based on the received text information and the screen state information, whether the voice command is a control command for the menu screen. Specifically, the determination unit 730 may analyze each ID extracted from the screen state information in correspondence with the text information to determine whether the voice command is a control command for the menu screen. When the voice command is determined to be a control command for the menu screen, the determination unit 730 may generate a control command corresponding to the voice command.
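The extraction and determination steps can be sketched as a lookup followed by a decision. The matching rule used here (case-insensitive substring match of a displayed name inside the recognized text) is an assumption made for illustration, as are the function names `extract_ids` and `decide_command` and the shape of the returned command.

```python
from typing import Dict, List, Optional


def extract_ids(text: str, screen_state: Dict[str, str]) -> List[str]:
    """Return the IDs whose displayed name occurs in the recognized text."""
    lowered = text.lower()
    return [item_id for item_id, name in screen_state.items() if name.lower() in lowered]


def decide_command(text: str, screen_state: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Build a control command if the text refers to something on the menu screen."""
    ids = extract_ids(text, screen_state)
    if not ids:
        return None  # not a control command for this menu screen
    return {"action": "move", "target_id": ids[0]}  # hypothetical command shape


# Hypothetical screen state: ID -> displayed name.
screen_state = {"cat-02": "Movies/Series", "menu-05": "Popular TOP 10"}
print(decide_command("go to Popular TOP 10", screen_state))
print(decide_command("what's the weather", screen_state))
```

Returning `None` corresponds to the case in which the voice command is not treated as a control command for the menu screen.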

The transmission unit 740 may transmit, based on the screen state information, a control command corresponding to the voice command on the menu screen to the media playback device 110. The control command may include, for example, menu movement, menu or content selection, content execution, and the like.

When the media playback device 110 receives continuously uttered voice commands from the user 100, the transmission unit 740 may transmit a control command for a menu on the menu screen to the media playback device 110 based on the continuously uttered voice commands. Here, the menu screen consists of a menu list, a category list, and a content list; in the menu list, menu IDs may be mapped to menu names, in the category list, category IDs may be mapped to category names, and in the content list, content IDs may be mapped to content names.

The transmission unit 740 may transmit, based on the ID extracted by the extraction unit 720, a control command for the menu corresponding to the voice command on the menu screen to the media playback device 110.

When text information corresponding to a screen component of the menu screen is determined to be contained in an image constituting the screen data, the text extraction unit 750 may extract the text from the image.

The mapping unit 760 may map the extracted text to the ID of the screen component.

When the media playback device 110 receives an additional command from the user 100 after the menu screen has moved to a sub-menu based on a control command, the management unit 770 may collect and manage a log of the additional command. When an additional menu movement is input by the user 100 through a remote control, the management unit 770 may also collect and manage a log of the additional menu movement input through the remote control. Here, an additional command or additional menu movement may occur, for example, when duplicate text exists on the menu screen and the menu movement is handled differently depending on the user's priority, or when the user wants a general search result even though the corresponding text exists on the menu screen.

The management unit 770 may treat a collected log as a case of an utterance command handled in a way the user did not intend, and manage it separately through a per-user exception command set so that the same kind of command processing does not occur again.
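A per-user exception command set could be kept as a simple in-memory table, as in the sketch below. The patent does not describe how the set is stored or keyed, so the class name `ExceptionCommands`, the `(screen_id, text)` key, and the sample identifiers are all assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class ExceptionCommands:
    """Hypothetical per-user store of commands that were handled in an unintended way."""

    def __init__(self) -> None:
        self._by_user: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def record(self, user_id: str, screen_id: str, text: str) -> None:
        """Log that this utterance on this screen needed a follow-up correction."""
        self._by_user[user_id].add((screen_id, text))

    def is_excepted(self, user_id: str, screen_id: str, text: str) -> bool:
        """Check whether the usual menu-control handling should be skipped."""
        return (screen_id, text) in self._by_user.get(user_id, set())


exceptions = ExceptionCommands()
# Assumed scenario: user-1 said "Champion" on screen cat-02 but wanted a general search.
exceptions.record("user-1", "cat-02", "Champion")
print(exceptions.is_excepted("user-1", "cat-02", "Champion"))
print(exceptions.is_excepted("user-2", "cat-02", "Champion"))
```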

Based on the logs collected through this process, customized content based on the user's history can be recommended and reflected in the menu organization.

According to another embodiment, the screen analysis server 130 may process continuous commands through continuous utterance. In continuous utterance, after the user 100 utters a voice command, a voice session between the screen analysis server 130 and the voice recognition server 120 is created and maintained, so that continuous commands can be processed to control the menu screen. In this case, the screen analysis server 130 may further include a screen state analysis unit (not shown).

The screen state information reception unit 710 may receive, from the media playback device 110, a voice command uttered by the user 100 and screen state information on the menu screen displayed on the media playback device 110. When the screen state information reception unit 710 receives the voice command and the screen state information on the menu screen from the media playback device 110, an ASR session for processing continuous utterance of voice commands at high speed may be created between the screen analysis server 130 and the voice recognition server 120.

The screen state information reception unit 710 may receive, from the voice recognition server 120, a predetermined number of recognized texts, selected according to accuracy, that were recognized from the voice command and the screen state information.

The extraction unit 720 may extract, from the screen state information, at least one of a menu ID, a category ID, a content ID, and a screen component ID corresponding to the recognized text.

The determination unit 730 may determine, based on the received recognized text and the screen state information, whether the voice command is a control command for the menu screen. Specifically, the determination unit 730 may analyze each ID extracted from the screen state information in correspondence with the recognized text to determine whether the voice command is a control command for the menu screen. When the voice command is determined to be a control command for the menu screen, the determination unit 730 may generate a control command corresponding to the voice command.

The transmission unit 740 may transmit the voice command and a hint derived from the screen state information to the voice recognition server 120 through the ASR session. Here, the hint may be the set of texts exposed on the current menu screen.
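The patent states that the hint is the set of texts exposed on the current menu screen but does not say how the voice recognition server uses it. One plausible use, shown here purely as an assumption, is to boost recognition candidates that match an on-screen text before ordering them by accuracy. The function name `rerank_with_hints`, the `boost` value, and the candidate scores are hypothetical.

```python
from typing import List, Set, Tuple


def rerank_with_hints(
    candidates: List[Tuple[str, float]],
    hints: Set[str],
    boost: float = 0.2,
) -> List[str]:
    """Order candidate texts by confidence, favoring those found in the hint set."""

    def score(item: Tuple[str, float]) -> float:
        text, confidence = item
        return confidence + (boost if text in hints else 0.0)

    return [text for text, _ in sorted(candidates, key=score, reverse=True)]


# Hint: texts exposed on the current menu screen.
hints = {"TV Replay", "VIP Benefit Zone"}
# Assumed recognizer output: (candidate text, confidence).
candidates = [
    ("VIP benefit zoom", 0.55),
    ("VIP Benefit Zone", 0.50),
    ("beep benefit", 0.20),
]
ranked = rerank_with_hints(candidates, hints)
print(ranked)
```

With these sample scores, the on-screen text "VIP Benefit Zone" moves ahead of the acoustically stronger but off-screen candidate.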

전송부(740)는 음성 명령 및 화면 상태 정보에 기초하여 생성된 제어 명령을 미디어 재생 장치(110)로 전송할 수 있다. 예를 들어, 전송부(740)는 추출된 ID에 기초하여 메뉴 화면에서 음성 명령에 대응하는 하위 메뉴에 대한 제어 명령을 미디어 재생 장치(110)로 전송할 수 있다. The transmission unit 740 may transmit a control command generated based on a voice command and screen state information to the media playback device 110. For example, the transmission unit 740 may transmit a control command for a sub-menu corresponding to a voice command on the menu screen to the media playback device 110 based on the extracted ID.

도 8a 및 도 8b는 본 발명의 일 실시예에 따른 텍스트 정보가 화면 데이터를 구성하는 이미지에 포함된 경우의 이미지로부터 텍스트가 추출된 메뉴 화면을 도시한 예시적인 도면이다. 도 8a 및 도 8b를 참조하면, 화면 분석 서버(130)는 메뉴 화면의 화면 구성 요소에 해당하는 텍스트 정보가 화면 데이터를 구성하는 이미지(800, 810) 내에 포함된 것으로 판단된 경우, 이미지(800, 810)로부터 텍스트를 추출할 수 있다. 이 때, 이미지(800, 810)는 포스터 이미지일 수 있으며, 텍스트 정보가 화면 데이터를 구성하는 이미지(800, 810) 내에 포함된 경우는 컨텐츠 ID는 존재하나, 컨텐츠명이 존재하지 않는 경우를 의미할 수 있다. 8A and 8B are exemplary diagrams illustrating a menu screen from which text is extracted from an image when text information is included in an image constituting screen data according to an embodiment of the present invention. 8A and 8B, when it is determined that text information corresponding to a screen component of a menu screen is included in the images 800 and 810 constituting the screen data, the screen analysis server 130 , 810). In this case, the images 800 and 810 may be poster images, and when text information is included in the images 800 and 810 constituting the screen data, it means that the content ID exists but the content name does not exist. I can.

Based on image URL information, the screen analysis server 130 may request the original image from an external server (not shown) that stores the images 800 and 810, extract text information from the original image received from the external server (not shown) using an OCR (Optical Character Reader) function, and map the extracted text to the ID of the screen component and store the mapping.
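As a non-limiting illustration, the extract-and-map flow described above might be sketched as follows. The callables `fetch_image` and `run_ocr` are placeholders supplied by the caller and do not refer to any particular library; the component keys (`"id"`, `"name"`, `"image_url"`) are likewise assumptions for this example.

```python
from typing import Callable, Dict, List


def map_poster_text(
    components: List[Dict[str, str]],
    fetch_image: Callable[[str], bytes],   # hypothetical: downloads the original image
    run_ocr: Callable[[bytes], str],       # hypothetical: returns text found in the image
) -> Dict[str, str]:
    """Map OCR-extracted text to the ID of each screen component lacking a name."""
    text_to_id: Dict[str, str] = {}
    for component in components:
        if component.get("name"):
            continue  # a name already exists, so no OCR is needed
        url = component.get("image_url")
        if not url:
            continue  # nothing to fetch for this component
        text = run_ocr(fetch_image(url)).strip()
        if text:
            text_to_id[text] = component["id"]  # store the text-to-ID mapping
    return text_to_id
```

Passing the fetch and OCR steps in as parameters keeps the sketch independent of how the external server or the OCR function is actually provided.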

Thereafter, when the media playback device 110 receives the voice command "The Huntsman" from the user 100, the media playback device 110 may play "The Huntsman" 801 or move the menu screen to its detail screen.

FIG. 9 is a flowchart of a method of analyzing a screen in the screen analysis server according to an embodiment of the present invention. The method of analyzing a screen in the screen analysis server 130 illustrated in FIG. 9 includes steps processed in time series by the screen control system 1 according to the embodiments illustrated in FIGS. 1 to 8B. Accordingly, even where omitted below, the description given above with reference to FIGS. 1 to 8B also applies to the method of analyzing a screen in the screen analysis server 130.

In step S910, when the media playback device 110 receives a voice command uttered by the user 100, the screen analysis server 130 may receive, from the media playback device 110, text information converted from the voice command and screen state information on the menu screen displayed on the media playback device 110.

In step S920, the screen analysis server 130 may determine, based on the received text information and screen state information, whether the voice command is a control command for the menu screen.

In step S930, the screen analysis server 130 may transmit to the media playback device 110 a control command corresponding to the voice command on the menu screen, based on the screen state information.

In the above description, steps S910 to S930 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.
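As a non-limiting illustration, steps S910 to S930 might be arranged in a single request handler as sketched below. The JSON field names (`"text"`, `"screen_state"`, `"is_control_command"`, `"control_command"`) and the substring-matching rule are assumptions for this example; the specification does not define a wire format.

```python
import json


def handle_screen_analysis(request_body: str) -> str:
    """Process one request from the media playback device and return a JSON reply."""
    # S910: receive the text information and the screen state information.
    request = json.loads(request_body)
    text = request["text"]
    screen_state = request["screen_state"]  # list of {"id": ..., "name": ...}

    # S920: decide whether the utterance is a control command for the menu screen.
    spoken = text.replace(" ", "").lower()
    matched = [
        entry for entry in screen_state
        if entry["name"] and entry["name"].replace(" ", "").lower() in spoken
    ]

    # S930: return the control command (or report that none applies).
    if matched:
        reply = {
            "is_control_command": True,
            "control_command": {"action": "move", "target_id": matched[0]["id"]},
        }
    else:
        reply = {"is_control_command": False, "control_command": None}
    return json.dumps(reply)
```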

FIGS. 10A to 10C are exemplary diagrams for explaining a process in which the media playback device according to an embodiment of the present invention controls the screen to move from the main screen to the screen corresponding to the latest-movie category in accordance with the user's intention.

FIG. 10A is an exemplary diagram for explaining a process in which the media playback device according to an embodiment of the present invention receives a voice command for moving to the menu intended by the user. Referring to FIG. 10A, while displaying the main menu screen 1000, the media playback device 110 may receive from the user 100, together with the wake word "GiGA Genie", consecutively uttered voice commands such as "Show me the Movie/Series (1010) menu" and "Latest Movies" (1020). Here, the final menu intended by the user 100 may be the 'Latest Movies' 1020 category.

FIG. 10B is a diagram showing a search result for a voice command in a conventional media playback device. Referring to FIGS. 10A and 10B, in response to the voice command 'Latest Movies' received from the user, the conventional media playback device displays, as a search result 1030, the message "I'll search for recently added movies." together with the number of search results and the retrieved movies listed in a single row. Such a conventional search result has the drawback of forcing the user to make a further selection through an additional voice command (for example, selecting 'View all').

FIG. 10C is an exemplary diagram showing, as the outcome of FIG. 10A, a search result for a voice command in the media playback device according to an embodiment of the present invention. Referring to FIGS. 10A and 10C, the media playback device 110 may arrange the plurality of contents 1040 included in 'Latest Movies' 1020 in a grid so that the user 100 can take them in at a glance. The user 100 may then scroll by voice command or remote control to select the desired content.

FIGS. 11A to 11C are exemplary diagrams for explaining a process in which the media playback device according to an embodiment of the present invention controls the screen to move from the latest-movie category to the screen corresponding to a movie content in accordance with the user's intention.

FIG. 11A is an exemplary diagram for explaining a process in which the media playback device according to an embodiment of the present invention receives a voice command for moving to the content intended by the user. Referring to FIG. 11A, while displaying the 'Movie/Series' screen 1100, the media playback device 110 may receive from the user 100, together with the wake word "GiGA Genie", consecutively uttered voice commands such as "Show me the Latest Movies menu" 1110 and "Dongju" 1120. Here, the final menu intended by the user 100 may be the content titled 'Dongju' 1120.

FIG. 11B is a diagram showing a search result for a voice command in a conventional media playback device. Referring to FIGS. 11A and 11B, the conventional media playback device performs a full search of the menus for the voice command 'Dongju' received from the user and displays, as a search result 1120, the message "Search results for Dongju." together with the number of search results and the retrieved movies listed in a single row. Because the conventional media playback device cannot determine which screen the user was on when uttering the voice command, it can reach the content or menu only through a full, text-based search of the menus.

FIG. 11C is an exemplary diagram showing, as the outcome of FIG. 11A, a search result for a voice command in the media playback device according to an embodiment of the present invention. Referring to FIGS. 11A and 11C, the media playback device 110 may move to the 'Latest Movies' 1110 menu, which corresponds to the voice command uttered by the user 100, from among the 'Movie/Series' 1000 menus, and may then arrange the plurality of contents 1140 included in 'Latest Movies' 1110 in a grid so that the user 100 can take them in at a glance. Here, the media playback device 110 may display a bold border around 'Dongju' 1150 among the plurality of contents 1140 displayed on the screen.

FIGS. 12A and 12B are exemplary diagrams for explaining a process of controlling the screen in response to consecutive utterances input by a user in the media playback device according to an embodiment of the present invention.

FIG. 12A is a diagram for explaining a process of moving from the main menu to a content page based on a voice command input by a user in a conventional media playback device. Referring to FIG. 12A, the user may utter the wake word "GiGA Genie" on the main menu screen 1200. The media playback device then displays the message "Yes, I'm listening." and, as indicated by a microphone icon, prepares to receive a voice command from the user.

When the user utters the voice command "Find Dongju", the media playback device performs a full search of the menus based on the text 'Dongju' and displays the plurality of contents corresponding to the search result 1210 in a single row.

Thereafter, to run the content 'Dongju', the user utters the wake word "GiGA Genie" again and then utters "No. 1" 1220, the number assigned to the desired content among the plurality of contents included in the search result 1210.

The media playback device 110 moves to and displays the detail page of the content 'Dongju' corresponding to "No. 1" 1120.

Thus, conventionally, to move from the main menu to a content, the user must utter a 'wake word + command' (GiGA Genie + "Find Dongju") and then another 'wake word + command' (GiGA Genie + "Run No. 1"); that is, the wake word must be uttered before every command.

FIG. 12B is an exemplary diagram for explaining a process of moving from the main menu to a content page based on voice commands input by a user according to an embodiment of the present invention. Referring to FIG. 12B, the user 100 may utter the wake word "GiGA Genie" on the main menu screen 1240. The media playback device 110 then displays the message "Yes, I'm listening." and, as indicated by a microphone icon, prepares to receive a voice command from the user 100.

When the user 100 utters the voice command "Go to Movie/Series", the media playback device 110 moves to the 'Movie/Series' menu screen 1250. When the user 100 then utters the voice command "Go to Latest Movies", naming one of the sub-menus of the 'Movie/Series' menu 1250, the media playback device 110 moves to the 'Latest Movies' menu screen 1260. When the user 100 then utters the voice command "Run Dongju", naming one of the plurality of contents included in the 'Latest Movies' menu 1260, the media playback device 110 may select and run 'Dongju' 1270 from among those contents.
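As a non-limiting illustration, navigation by consecutive utterances might be sketched as follows. The nested-dictionary menu tree, the `MenuSession` class, and the substring-matching rule are assumptions for this example; the point shown is only that each utterance is resolved against the items of the screen currently displayed, without a full search of all menus.

```python
from typing import Dict, List


class MenuSession:
    """Tracks the currently displayed screen while consecutive commands arrive."""

    def __init__(self, tree: Dict[str, dict], root: str) -> None:
        self.tree = tree
        self.path: List[str] = [root]  # names from the root to the current screen

    def _current_children(self) -> Dict[str, dict]:
        """Return the sub-menus or contents shown on the current screen."""
        node = self.tree
        for name in self.path:
            node = node[name]
        return node

    def handle(self, utterance: str) -> bool:
        """Move into the child named in the utterance; return False if none matches."""
        spoken = utterance.replace(" ", "").lower()
        for child in self._current_children():
            if child.replace(" ", "").lower() in spoken:
                self.path.append(child)
                return True
        return False
```

With a tree such as `{"Main": {"Movie/Series": {"Latest Movies": {"Dongju": {}}}}}`, the three utterances of FIG. 12B would walk the session from the main menu down to the content in turn.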

The method of controlling a screen in the media playback device and the method of analyzing a screen in the screen analysis server described with reference to FIGS. 1 to 12B may also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium containing computer-executable instructions. These methods may likewise be implemented in the form of a computer program stored in a medium and executed by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims set out below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

110: media playback device
115: display device
120: speech recognition server
130: screen analysis server
210: input unit
220: voice command transmission unit
230: text information receiving unit
240: screen state analysis unit
250: screen information transmission unit
260: receiving unit
270: execution unit
280: display unit
710: screen state information receiving unit
720: extraction unit
730: determination unit
740: transmission unit
750: text extraction unit
760: mapping unit
770: management unit

Claims

A media playback device for controlling a screen, comprising:
an input unit that receives a voice command uttered by a user;
a voice command transmission unit that transmits the input voice command to a voice recognition server;
a text information receiving unit that receives, from the voice recognition server, text information generated based on the voice command;
a screen state analysis unit that analyzes a menu screen displayed on the media playback device;
a screen information transmission unit that transmits the received text information and screen state information on the analyzed menu screen to a screen analysis server;
a receiving unit that receives, from the screen analysis server, a control command generated based on the text information and the screen state information; and
an execution unit that controls the menu screen based on the received control command,
wherein the screen state information includes at least one of a menu ID, a category ID, a content ID, and a screen component ID related to the menu screen, and
wherein the control command is generated based on at least one ID extracted from the screen state information in correspondence with the text information.

The media playback device of claim 1,
wherein, when the media playback device is not displaying the menu screen,
the transmission unit transmits the received text information to the voice recognition server, and
the receiving unit receives a search result corresponding to the text information from the voice recognition server.

The media playback device of claim 1,
wherein each of the IDs is mapped to a menu name, a category name, a content name, and a screen component name.

The media playback device of claim 3,
wherein the screen information transmission unit further transmits screen data displayed on the media playback device to the screen analysis server.

The media playback device of claim 4,
wherein, when a control command related to the menu screen and the screen data is not received from the screen analysis server,
the transmission unit transmits the received text information to the voice recognition server, and
the receiving unit receives a search result corresponding to the text information from the voice recognition server.

A media playback device for controlling a screen, comprising:
an input unit that receives a voice command uttered by a user;
a screen information transmission unit that transmits the input voice command and screen state information on a menu screen displayed on the media playback device to a screen analysis server;
a receiving unit that receives, from the screen analysis server, a control command generated based on the voice command and the screen state information; and
a control unit that controls the menu screen based on the received control command,
wherein the screen state information includes at least one of a menu ID, a category ID, a content ID, and a screen component ID related to the menu screen, and
wherein the control command is generated based on at least one ID extracted from the screen state information in correspondence with text information generated based on the voice command.

The media playback device of claim 6,
wherein the receiving unit receives, from the screen analysis server, utterance guide information for the voice command based on the voice command and text information displayed on the menu screen.

The media playback device of claim 7,
further comprising a display unit that displays the utterance guide information,
wherein the display unit displays the utterance guide information in either an overlay manner or a rolling manner.

The media playback device of claim 7,
wherein a session is created between the screen analysis server and the voice recognition server, and
wherein the control command is generated based on the voice command and a hint derived from the screen state information, which are transmitted from the screen analysis server to the voice recognition server through the session.

A method of controlling a screen in a media playback device, the method comprising:
receiving a voice command uttered by a user;
transmitting the input voice command to a voice recognition server;
receiving, from the voice recognition server, text information generated based on the voice command;
analyzing a menu screen displayed on the media playback device;
transmitting the received text information and screen state information on the analyzed menu screen to a screen analysis server;
receiving, from the screen analysis server, a control command generated based on the text information and the screen state information; and
controlling the menu screen based on the received control command,
wherein the screen state information includes at least one of a menu ID, a category ID, a content ID, and a screen component ID related to the menu screen, and
wherein the control command is generated based on at least one ID extracted from the screen state information in correspondence with the text information.

A server for analyzing a screen, comprising:
a screen state information receiving unit that, when a media playback device receives a voice command uttered by a user, receives from the media playback device text information converted from the voice command and screen state information on a menu screen displayed on the media playback device; and
a transmission unit that generates, based on the screen state information, a control command corresponding to the voice command on the menu screen and transmits the control command to the media playback device,
wherein the screen state information includes at least one of a menu ID, a category ID, a content ID, and a screen component ID related to the menu screen,
the server further comprising an extraction unit that extracts, from the screen state information, at least one of a menu ID, a category ID, a content ID, and a screen component ID corresponding to the text information,
wherein the control command is generated based on at least one ID extracted from the screen state information in correspondence with the text information.

The server of claim 11,
wherein, when the media playback device receives consecutively uttered commands from the user, the transmission unit transmits a control command for a sub-menu of the menu screen to the media playback device based on the consecutively uttered commands.

The server of claim 11,
wherein the menu screen is composed of a menu list, a category list, and a content list, and
wherein, in the menu list, a menu ID and a menu name are mapped to each other; in the category list, a category ID and a category name are mapped to each other; and in the content list, a content ID and a content name are mapped to each other.

The server of claim 11,
wherein the screen state information receiving unit further receives screen data from the media playback device.

The server of claim 14, further comprising:
a text extraction unit that extracts text from an image when it is determined that text information corresponding to a screen component of the menu screen is included in the image constituting the screen data; and
a mapping unit that maps the extracted text to the ID of the screen component.

The server of claim 15,
wherein the transmission unit transmits, to the media playback device, a control command for a sub-menu corresponding to the voice command on the menu screen based on the extracted ID.

The server of claim 12,
further comprising a management unit that, when the media playback device receives an additional command from the user after the menu screen has moved to the sub-menu based on the control command, collects and manages a log of the additional command.