KR102344044B1

KR102344044B1 - Settop terminal and operating method of thereof

Info

Publication number: KR102344044B1
Application number: KR1020200008372A
Authority: KR
Inventors: Kim Tae-won (김태원); Kim Gyu-seok (김규석); Kim Hyo-jin (김효진)
Original assignee: LG Uplus Corp. (주식회사 엘지유플러스)
Priority date: 2020-01-22
Filing date: 2020-01-22
Publication date: 2021-12-27
Also published as: KR102433628B1; KR20210094756A; KR20210158832A

Abstract

A set-top terminal and an operating method thereof are disclosed. In one embodiment, the terminal receives from a server a speech recognition result for a user's spoken voice signal. If the received result maps to no executable action, the terminal searches a reference list based on that result and executes an action based on the search result.

Description

Set-top terminal and its operating method {SETTOP TERMINAL AND OPERATING METHOD OF THEREOF}

The following embodiments relate to a set-top terminal.

When the speech-to-text (STT) output of speech recognition is incorrect, an existing set-top terminal may be unable to determine which operation to perform, or it may perform the wrong one.

Related prior art includes Korean Patent Application Laid-Open No. 10-2014-0073996 (title: Intelligent TV Control Method and System through Voice; applicant: KT Corporation). The set-top box disclosed in that publication has three parts. A voice command receiver receives a voice command from a remote controller equipped with a microphone. A voice command transmitter sends the received command to a voice recognition dialogue processing server. A command execution unit receives from that server a key combination message corresponding to the voice command, analyzes the message, and sequentially executes the control commands it contains.

A set-top terminal according to one aspect includes a communication interface and a controller. The controller receives, from a server through the communication interface, a speech recognition result for a user's spoken voice signal. If the received result maps to no executable action, the controller searches a reference list based on that result and executes an action based on the search result.

Among the reference data in the reference list, the controller may identify the reference data whose first field matches a first text in the received speech recognition result. It may then determine whether any of the identified reference data is target reference data, that is, data whose second field matches a second text in the result. If target reference data exists, the controller may execute the application corresponding to the first field and identify which of the menus on the application's execution screen is currently focused. It may then compare the coordinates of the menu corresponding to the second field with the coordinates of the focused menu, and move the focus to the menu corresponding to the second field based on the comparison result.

The controller may identify the focused menu as the one menu whose text color or background color differs from that of the others.

When the menu corresponding to the second field is focused, the controller may execute the menu corresponding to the second text.

Based on the comparison result, the controller may move the focus up, down, left, or right.

The target reference data may also contain three further fields: a text color or background color as a third field, the coordinates of the menu corresponding to the second field as a fourth field, and an action as a fifth field.

The controller may receive, from the server through the communication interface, information indicating that the speech recognition result maps to no executable action.

The controller may receive the spoken voice signal from the user.

The controller may encode the spoken voice signal to generate a digital signal, and transmit the generated digital signal to the server through the communication interface.

A method of operating a set-top terminal according to one aspect includes two steps. The first is receiving, from a server, a speech recognition result for a user's spoken voice signal. The second applies when the received result maps to no executable action: it consists of searching a reference list based on the result and executing an action based on the search result.

Executing the action may include: identifying, among the reference data in the reference list, reference data having a first field that matches a first text in the received speech recognition result; determining whether, among the identified reference data, there is target reference data having a second field that matches a second text in the received result; executing the application corresponding to the first field when the target reference data exists; identifying a focused menu among a plurality of menus on the execution screen of the application; and comparing the coordinates of the menu corresponding to the second field with the coordinates of the identified menu, and moving the focus to the menu corresponding to the second field based on the comparison result.

The identifying may include identifying the focused menu as the one menu, among the plurality of menus, whose text color or background color differs from the others.

Executing the action may further include executing the menu corresponding to the second text when the menu corresponding to the second field is focused.

The moving may include moving the focus up, down, left, or right based on the comparison result.

The method may further include receiving, from the server, information indicating that the speech recognition result maps to no executable action.

The method may further include receiving the spoken voice signal from the user.

The method may further include encoding the spoken voice signal to generate a digital signal and transmitting the generated digital signal to the server.

In the embodiments, when an operation is performed based on the STT output, that output can be matched against the reference list. This can improve the accuracy of operation selection.

FIG. 1 is a flowchart illustrating an IPTV system according to an embodiment.
FIGS. 2 to 5 are diagrams for explaining the operation of an IPTV system according to an embodiment.
FIG. 6 is a flowchart illustrating the operation of an IPTV system according to an embodiment.
FIG. 7 is a block diagram illustrating a set-top terminal according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Because various changes may be made to the embodiments, however, the scope of the patent application is not restricted or limited by them. All modifications, equivalents, and substitutes of the embodiments should be understood to fall within the scope of the rights.

The terms used in the embodiments are for description only and should not be construed as limiting. A singular expression includes the plural unless the context clearly dictates otherwise. In this specification, terms such as "comprise" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof. They do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the meaning commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the related art. They are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.

In the description with reference to the accompanying drawings, identical components receive identical reference numerals regardless of the figure, and redundant descriptions of them are omitted. Where a detailed description of related known technology would unnecessarily obscure the gist of the embodiments, that description is omitted.

FIG. 1 is a flowchart illustrating an IPTV system according to an embodiment.

Referring to FIG. 1, an IPTV system 100 according to an embodiment includes a set-top terminal 110, a display 120, and a server 130.

The set-top terminal 110 may also be referred to as a set-top box or an image receiving device.

The set-top terminal 110 receives a spoken voice signal from the user and processes it (for example, by encoding it) to generate a digital signal.

The set-top terminal 110 transmits the generated digital signal to the server 130.

The server 130 includes a speech recognition engine. It uses the engine to perform speech recognition on the digital signal and transmits the speech recognition result to the set-top terminal 110.

The received speech recognition result may map to no action that the set-top terminal 110 can execute. In this case, the set-top terminal 110 searches the reference list based on the result, executes an action based on the search result, and outputs the execution result to the display 120.

When the STT output of speech recognition is incorrect, an existing set-top terminal may be unable to determine which operation to perform, or it may perform the wrong one. By contrast, the set-top terminal 110 according to an embodiment matches the STT output against a reference list when performing an operation based on that output, which can improve the accuracy of operation selection. This is described in detail below with reference to FIGS. 2 to 5.

FIGS. 2 to 5 are diagrams for explaining the operation of an IPTV system according to an embodiment.

The user may say "아이들 나라 영어 유치원 틀어줘" ("Play Children's Country English Kindergarten").

The set-top terminal 110 may receive the user's spoken voice signal through a microphone; in other words, it may acquire a voice input.

The set-top terminal 110 processes the user's spoken voice signal to generate a digital signal. For example, it may encode the signal according to a predetermined encoding scheme such as pulse code modulation (PCM). PCM is only an example, and the encoding scheme is not limited to it.
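The encoding step can be illustrated with a minimal 16-bit linear PCM sketch. The document names PCM only as an example and gives no parameters. The sample format (floating-point samples in [-1.0, 1.0]), the 16-bit depth, and the little-endian byte order below are therefore assumptions, and the helper names are hypothetical.

```python
import struct
from typing import Iterable, List


def pcm16_encode(samples: Iterable[float]) -> bytes:
    """Quantize float samples in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    out = bytearray()
    for s in samples:
        # Clip out-of-range samples, then scale to the signed 16-bit range.
        s = max(-1.0, min(1.0, s))
        q = int(round(s * 32767))
        out += struct.pack("<h", q)
    return bytes(out)


def pcm16_decode(data: bytes) -> List[int]:
    """Inverse view of pcm16_encode: return the integer sample values."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[: count * 2]))
```

A real terminal would also have to fix a sample rate and channel count before streaming the bytes to the server. The document does not specify either.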

The server 130 performs speech recognition on the digital signal using the speech recognition engine. It may thereby produce the text "Play Children's Country English Kindergarten".

The server 130 or its speech recognition engine may have no interworking specification or training data for "Play Children's Country English Kindergarten". In other words, the engine may not have learned the "English Kindergarten" menu of the "Children's Country" application during training. In this case, the server 130 may notify the set-top terminal 110 that the recognition result maps to no executable action, that is, that the action intended by the spoken voice signal cannot be executed. Along with this notification, the server 130 may transmit the recognized text "Play Children's Country English Kindergarten" to the set-top terminal 110.

As in the example of FIG. 2, the set-top terminal 110 may output the speech recognition result "Play Children's Country English Kindergarten" to the display 120.

Because the speech recognition result "Play Children's Country English Kindergarten" maps to no executable action, the set-top terminal 110 may search the reference list. An example of the reference list is shown in FIG. 3.

Referring to FIG. 3, the reference list includes multiple reference data entries.

Each reference data entry may have the following fields:
- identification information of an application;
- identification information of a menu in that application;
- the text color (or background color) of the menu;
- the coordinates of the menu on the application's execution screen;
- an action (or intent).
The action may correspond to the operation or task that the speech recognition engine executes to carry out the command indicated by the spoken voice signal.
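The five fields of a reference data entry can be modeled as a small record type. The sketch below shows one possible in-memory representation. The class name, the field names, and the placeholder values (which mirror the symbolic color₁, (x₁, y₁), action₁ notation of FIG. 3) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReferenceEntry:
    app_name: str            # 1st field: application identification information
    menu_name: str           # 2nd field: menu identification information in that app
    color: str               # 3rd field: text color (or background color) of the menu
    coords: Tuple[int, int]  # 4th field: (x, y) of the menu on the app's execution screen
    action: str              # 5th field: action (intent) that executes the menu


# Hypothetical placeholder rows mirroring the 1st and 5th entries of FIG. 3.
REFERENCE_LIST: List[ReferenceEntry] = [
    ReferenceEntry("Children's Country", "Parent Classroom", "color1", (1, 1), "action1"),
    ReferenceEntry("Children's Country", "English Kindergarten", "color5", (5, 1), "action5"),
]
```

Making the record frozen keeps entries hashable and read-only. That seems a reasonable default for a lookup table, but the document does not prescribe any storage format.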

In the example shown in FIG. 3, the first reference data entry may include:
- the application name "Children's Country";
- "Parent Classroom", the name of one of that application's menus;
- color₁, the text color (or background color) of the "Parent Classroom" menu;
- (x₁, y₁), the coordinates of the "Parent Classroom" menu on the execution screen of "Children's Country";
- action₁, which may represent the task of executing the "Parent Classroom" menu.

In the example shown in FIG. 3, the fifth reference data entry may include:
- the application name "Children's Country";
- "English Kindergarten", the name of one of that application's menus;
- color₅, the text color (or background color) of the "English Kindergarten" menu;
- (x₅, y₅), the coordinates of the "English Kindergarten" menu on the execution screen of "Children's Country";
- action₅, which may represent the action of executing the "English Kindergarten" menu.

The speech recognition result "Play Children's Country English Kindergarten" contains the application name "Children's Country", so the set-top terminal 110 may infer that the utterance relates to that application. The result also contains "English Kindergarten". The set-top terminal 110 may therefore search the reference data whose application name is "Children's Country" for an entry whose menu name is "English Kindergarten". In the example of FIG. 3, this search retrieves the fifth reference data entry from the reference list.
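The two-stage lookup (application name first, then menu name) can be sketched as a substring search over the reference list. This is an illustration under assumptions, not the patent's matching rule. It uses English labels, and it ignores spaces and letter case when comparing, because the same menu appears both as "영어 유치원" and "영어유치원" in this document. The names ReferenceEntry, _normalize, and find_target are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReferenceEntry:
    app_name: str            # first field
    menu_name: str           # second field
    coords: Tuple[int, int]  # fourth field (third and fifth omitted in this sketch)


def _normalize(text: str) -> str:
    # Assumption: ignore whitespace and letter case when matching.
    return "".join(text.split()).lower()


def find_target(stt_text: str, refs: List[ReferenceEntry]) -> Optional[ReferenceEntry]:
    """Return the first entry whose app name and menu name both occur in stt_text."""
    text = _normalize(stt_text)
    # Stage 1: keep entries whose application name appears in the STT text.
    candidates = [r for r in refs if _normalize(r.app_name) in text]
    # Stage 2: among those, look for an entry whose menu name also appears.
    for r in candidates:
        if _normalize(r.menu_name) in text:
            return r
    return None
```

Suppose the list holds entries for "Parent Classroom" and "English Kindergarten" under "Children's Country". The text "Play Children's Country English Kindergarten" would then select the "English Kindergarten" entry, and text that names no known application returns None.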

Having retrieved the fifth reference data entry, the set-top terminal 110 may execute the "Children's Country" application and output the execution result to the display 120, as in the example of FIG. 4.

The set-top terminal 110 may read the text on the execution screen of the "Children's Country" application. For example, it may use computer vision to recognize the text of each menu on that screen: "Parent Classroom" (410), "Book-Reading TV" (420), "Woongjin Book Club TV" (430), "Live Experience Learning" (440), "English Kindergarten" (450), "Nuri Classroom" (460), and "Kids YouTube" (470).

The set-top terminal 110 may identify which of the menus 410 to 470 is focused. For example, the background color behind the text of menu 420 differs from that of the other menus 410 and 430 to 470, so the set-top terminal 110 may determine that menu 420 currently has focus.
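Focus detection by color can be sketched as an odd-one-out test: the focused menu is the only recognized menu whose background color is not shared with any other menu. The sketch assumes a background color has already been sampled from the screen for each recognized menu label, for example as a hex string. find_focused is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Optional


def find_focused(menu_colors: Dict[str, str]) -> Optional[str]:
    """Return the menu whose background color differs from all the others.

    menu_colors maps a recognized menu label to its sampled background color.
    Returns None when there is not exactly one menu with a unique color.
    """
    counts = Counter(menu_colors.values())
    # Menus whose color appears only once on screen.
    odd = [label for label, color in menu_colors.items() if counts[color] == 1]
    if len(odd) == 1 and len(menu_colors) > 1:
        return odd[0]
    return None
```

Returning None when no single color stands out lets the caller treat an ambiguous screen, such as one where every menu has the same color, as "focus unknown" rather than guessing.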

The set-top terminal 110 may compare the coordinates of the currently focused menu 420 with the coordinates of "English Kindergarten", the menu name in the retrieved fifth reference data entry. Menu 420 is at (x₂, y₁), and the "English Kindergarten" menu 450 is at (x₅, y₁). From this comparison (same row, with x₅ > x₂), the set-top terminal 110 may determine that menu 450 lies to the right of menu 420 and move the focus to the right.
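The coordinate comparison reduces to choosing one remote-control direction. This sketch assumes screen coordinates in which x increases to the right and y increases downward. It also resolves horizontal differences before vertical ones. The document states neither convention, and focus_direction is a hypothetical name.

```python
from typing import Tuple


def focus_direction(current: Tuple[int, int], target: Tuple[int, int]) -> str:
    """Pick up/down/left/right to move focus from the current menu toward the target.

    Assumes x grows to the right and y grows downward; horizontal moves are
    chosen first when both coordinates differ. Returns "none" if they coincide.
    """
    (cx, cy), (tx, ty) = current, target
    if tx > cx:
        return "right"
    if tx < cx:
        return "left"
    if ty > cy:
        return "down"
    if ty < cy:
        return "up"
    return "none"
```

For the case above, menu 420 at (x₂, y₁) and menu 450 at (x₅, y₁) with x₅ > x₂, the function returns "right".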

While moving the focus to the right, the set-top terminal 110 may detect "English Kindergarten". In the example of FIG. 5, the set-top terminal 110 may confirm that the background color behind the text of menu 450 differs from that of menus 410 to 440 and 460 to 470, and therefore determine that the focus has moved to menu 450. In other words, the set-top terminal 110 may determine that the focus has moved when the text color or background color changes.
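Moving the focus step by step and re-checking it after each key press can be sketched as a bounded loop. press_key and read_focused are hypothetical hooks. The first stands in for sending a remote-control key event. The second stands in for re-reading the screen (text recognition plus the color check) and returning the label of the focused menu. The step limit and the rule of stopping when the focus does not change are assumptions added so that the loop always terminates.

```python
from typing import Callable


def move_focus_until(target_label: str,
                     direction: str,
                     press_key: Callable[[str], None],
                     read_focused: Callable[[], str],
                     max_steps: int = 20) -> bool:
    """Press the direction key until the focused menu's label equals target_label.

    Returns True once the target is focused, or False if a key press leaves the
    focus unchanged (e.g., the edge of the menu row) or max_steps runs out.
    """
    for _ in range(max_steps):
        current = read_focused()
        if current == target_label:
            return True
        press_key(direction)
        # Focus change is detected by re-reading the screen after the key press.
        if read_focused() == current:
            return False
    return read_focused() == target_label
```

Stopping when the focused label stays the same after a key press covers the case where the focus reaches the end of the menu row without finding the target.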

On detecting the "English Kindergarten" menu, the set-top terminal 110 may execute it.

According to the embodiment, the user may utter a command that the speech recognition engine has not learned. Even then, the set-top terminal 110 may search the reference list based on the recognition result and execute an appropriate operation based on the search result. This can improve the accuracy of operation selection.

FIG. 6 is a flowchart illustrating the operation of an IPTV system according to an embodiment.

In the example shown in FIG. 6, the reference database 610 and the speech recognition engine 620 may be included in the server 130.

Referring to FIG. 6, the set-top terminal 110 receives a spoken voice signal from the user (630).

The set-top terminal 110 processes the spoken voice signal to generate a digital signal (631).

The set-top terminal 110 transmits the digital signal to the speech recognition engine 620 (632).

The speech recognition engine 620 performs speech recognition on the digital signal and transmits the speech recognition result to the set-top terminal 110 (633). At this time, the speech recognition engine 620 may also transmit information indicating that the result maps to no executable action.
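The document states only that the server may send the recognized text together with an indication that no executable action exists. It does not define a message format. Assuming a hypothetical JSON message with "text" and "actionable" fields, the terminal's decision to fall back to the reference list could look like this:

```python
import json
from typing import Optional


def needs_fallback(response_json: str) -> Optional[str]:
    """Return the STT text if the server reports no executable action, else None.

    Hypothetical message format: {"text": str, "actionable": bool, "action": ...}.
    A missing "actionable" field is treated as "no executable action".
    """
    msg = json.loads(response_json)
    if msg.get("actionable", False):
        return None  # the server-side action can be executed directly
    return msg.get("text", "")  # search the reference list with this text
```

Treating a missing flag as "not actionable" is a conservative choice for this sketch: an incomplete response sends the terminal to the reference list instead of executing nothing.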

The set-top terminal 110 may confirm that the speech recognition result maps to no executable action. In this case, it searches the reference database 610 (634), which contains the reference list described above.

The set-top terminal 110 receives the search result from the reference database 610 (635). For example, the set-top terminal 110 may identify, in the reference database 610, reference data whose first field matches the first text "Children's Country" in the speech recognition result "Play Children's Country English Kindergarten". In the reference list described with reference to FIG. 3, it may identify the first through eighth reference data entries, which contain the application name "Children's Country". The set-top terminal 110 may then determine whether any identified entry is target reference data, that is, an entry whose second field matches the second text "English Kindergarten" in the result. Here it may retrieve the fifth entry, which contains the menu name "English Kindergarten".

When target reference data exists, the set-top terminal 110 may execute the application corresponding to the first field and identify the focused menu among the menus on the application's execution screen. For example, having retrieved the fifth entry, it may launch "Children's Country" and identify which menu on its execution screen is focused. The set-top terminal 110 may then compare the coordinates of the menu corresponding to the second field with those of the focused menu, and move the focus to the menu corresponding to the second field based on the comparison result. Because the text background color of menu 420 differs from that of the other menus, the set-top terminal 110 may determine that menu 420 is focused. It may then compare the coordinates of menu 420 with those of menu 450, which corresponds to "English Kindergarten" in the fifth entry. From this comparison, the set-top terminal 110 may determine that menu 450 lies to the right of menu 420 and move the focus to the right, continuing until it detects the "English Kindergarten" menu 450. On detecting menu 450, the set-top terminal 110 may execute it.

In an embodiment, the set-top terminal 110 may update the reference database 610. For example, it may download and run a new application, use computer vision to recognize the menu text on the new application's execution screen, and register the recognition results in the reference database 610.
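Registering a newly installed application's menus can be sketched as inserting one entry per recognized menu label. The dictionary layout keyed by (application, menu), the input tuples assumed to come from a computer-vision pass, and the "open:app/menu" action string are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory reference database keyed by (app_name, menu_name).
ReferenceDB = Dict[Tuple[str, str], dict]


def register_app_menus(db: ReferenceDB, app_name: str,
                       recognized: List[Tuple[str, Tuple[int, int], str]]) -> int:
    """Add one reference entry per menu recognized on a new app's execution screen.

    recognized holds (menu_label, (x, y), color) tuples, as might come from an
    OCR / computer-vision pass. Existing entries are left untouched.
    Returns the number of entries added.
    """
    added = 0
    for label, coords, color in recognized:
        key = (app_name, label)
        if key in db:
            continue  # keep the existing entry rather than overwrite it
        db[key] = {"coords": coords, "color": color,
                   "action": f"open:{app_name}/{label}"}
        added += 1
    return added
```

Skipping keys that already exist makes re-scanning the same screen harmless. Whether a real terminal should instead refresh coordinates that have moved is left open by the document.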

In an embodiment, the speech recognition engine may later learn the utterance "Run English Kindergarten in Children's Country". Once it has, the set-top terminal 110 or the server 130 may delete the reference data for the "English Kindergarten" menu of "Children's Country" from the reference list.
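Pruning entries that the speech recognition engine has since learned can be sketched as deleting the corresponding keys. The sketch assumes a hypothetical dictionary keyed by (application, menu). It also assumes the set of learned pairs is somehow reported to the terminal or server; the document does not say how.

```python
from typing import Dict, Iterable, Tuple


def prune_learned(db: Dict[Tuple[str, str], dict],
                  learned: Iterable[Tuple[str, str]]) -> int:
    """Remove reference entries for (app, menu) pairs the engine now handles directly.

    Returns the number of entries actually removed; unknown pairs are ignored.
    """
    removed = 0
    for key in learned:
        if key in db:
            del db[key]
            removed += 1
    return removed
```

Keeping the reference list limited to utterances the engine cannot yet handle keeps the fallback search small.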

The matters described with reference to FIGS. 1 to 5 also apply to FIG. 6, so a detailed description is omitted.

FIG. 7 is a block diagram illustrating a set-top terminal according to an embodiment.

Referring to FIG. 7, the set-top terminal 110 includes a communication interface 710 and a controller 720.

The communication interface 710 may include a communication module that the set-top terminal 110 uses to communicate with external entities such as the server 130.

The controller 720 may receive or obtain the user's spoken voice signal. For example, the set-top terminal 110 may include a microphone through which it receives or obtains the user's spoken voice signal.

The controller 720 may encode the user's spoken voice signal into a digital signal and transmit the digital signal to the server 130 through the communication interface 710.

The controller 720 receives the voice recognition result for the user's spoken voice signal from the server 130 through the communication interface 710. Depending on the embodiment, the controller 720 may also receive, from the server 130 through the communication interface 710, information indicating that no action is executable with the voice recognition result.

When no action is executable with the voice recognition result, the controller 720 searches the reference list based on that result and executes an action based on the search result.
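The control flow of this fallback can be sketched in a few lines. Here the server's "no executable action" information is assumed to arrive as `server_action=None`, and the reference-list search is passed in as a function; `choose_action` and its parameter names are hypothetical.

```python
from typing import Callable, Optional


def choose_action(
    stt_text: str,
    server_action: Optional[str],
    search_reference_list: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Pick the action the set-top terminal runs for one recognized utterance.

    server_action is None when the server reported that no action is
    executable for the recognition result (a hypothetical encoding).
    """
    if server_action is not None:
        # Normal path: the server already resolved the text to an action.
        return server_action
    # Fallback path: look the recognized text up in the local reference list.
    return search_reference_list(stt_text)
```

Injecting the search as a function keeps the fallback decision separate from how the reference list is stored or matched.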

In an embodiment, the controller 720 may identify, among the reference data in the reference list, the reference data whose first standard (for example, the application name described above) matches a first text in the voice recognition result (for example, "Children's Country" described above). The controller 720 may then determine whether, among the identified reference data, there is target reference data whose second standard (for example, the menu name) matches a second text in the voice recognition result (for example, "English kindergarten" described above). The target reference data may include other standards in addition to the first and second standards. As described above, the target reference data may have the text color or background color as a third standard, the coordinates of the menu corresponding to the second standard as a fourth standard, and an action (or intent) as a fifth standard.
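The two-stage match can be sketched as below, assuming that "matches" means case-insensitive substring containment in the recognized text (the patent does not define the matching rule). `ReferenceData`, its field names, and `find_target` are hypothetical; the five fields mirror the first to fifth standards described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReferenceData:
    app_name: str             # first standard
    menu_name: str            # second standard
    color: str                # third standard: text or background color
    coords: Tuple[int, int]   # fourth standard: menu coordinates on the screen
    action: str               # fifth standard: action (or intent)


def find_target(reference_list: List[ReferenceData], stt_text: str) -> Optional[ReferenceData]:
    """Return the target reference data for the recognized text, or None."""
    text = stt_text.lower()
    # Stage 1: keep entries whose application name (first standard) is in the text.
    candidates = [r for r in reference_list if r.app_name.lower() in text]
    # Stage 2: among those, return the entry whose menu name (second standard)
    # is also in the text.
    for ref in candidates:
        if ref.menu_name.lower() in text:
            return ref
    return None
```

Filtering by application first keeps an identically named menu in another application (for example, a second "English kindergarten") from being selected.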

When the target reference data exists, the controller 720 may execute the application corresponding to the first standard and identify the focused menu among the plurality of menus on the application's execution screen. For example, the controller 720 may identify, as the focused menu, the one menu whose text color or background color differs from that of the others.
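One way to read the "different color" rule is to pick the single menu whose sampled color occurs nowhere else on the screen. The sketch below assumes the colors have already been sampled from the execution screen; `find_focused_menu` and the hex color values are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def find_focused_menu(menus: List[Tuple[str, str]]) -> Optional[str]:
    """Return the one menu whose color differs from every other menu.

    menus holds (menu_name, color) pairs, where color is the text or background
    color sampled from the execution screen (how it is sampled is assumed).
    Returns None when there is no single outlier, e.g. when all colors match.
    """
    counts = Counter(color for _, color in menus)
    outliers = [name for name, color in menus if counts[color] == 1]
    return outliers[0] if len(outliers) == 1 else None
```

Returning `None` when there is no unique outlier (for example, with only two menus of different colors) lets the caller fall back to another strategy instead of guessing.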

The controller 720 may compare the coordinates of the identified menu with the coordinates of the menu corresponding to the second standard, and move the focus to the menu corresponding to the second standard based on the comparison result. For example, based on the comparison result, the controller 720 may move the focus up, down, left, or right.
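A sketch of the coordinate comparison that picks the next direction, assuming pixel coordinates of menu centers with x growing to the right and y growing downward. The `tolerance` parameter is an assumption added so that menus in the same visual row or column still compare as aligned; `next_direction` is hypothetical.

```python
from typing import Optional, Tuple


def next_direction(
    focused_xy: Tuple[int, int],
    target_xy: Tuple[int, int],
    tolerance: int = 10,
) -> Optional[str]:
    """Return 'UP', 'DOWN', 'LEFT', 'RIGHT', or None if the target is focused.

    Offsets within `tolerance` pixels on both axes are treated as "already
    focused", since rendered menus rarely line up to the exact pixel.
    """
    dx = target_xy[0] - focused_xy[0]
    dy = target_xy[1] - focused_xy[1]
    if abs(dx) <= tolerance and abs(dy) <= tolerance:
        return None
    # Move along the axis with the larger offset first.
    if abs(dx) >= abs(dy):
        return "RIGHT" if dx > 0 else "LEFT"
    return "DOWN" if dy > 0 else "UP"
```

Calling this after every key press, with the focused menu re-identified each time, yields the step-by-step movement described above.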

When the menu corresponding to the second standard of the target reference data is focused on the application's execution screen, the controller 720 may execute that menu.

In an embodiment, the controller 720 may update the reference list by registering reference data for a newly downloaded application. For example, the controller 720 may automatically or dynamically extract text from the execution screen of the newly downloaded application, where the text may be the text of each menu on that screen. The controller 720 may generate reference data for the extracted text and register it in the reference list. For example, the controller 720 may recognize menu A and menu B on the execution screen of newly downloaded application A, generate reference data for each of them, and register that reference data in the reference list. The reference data for menu A may include, for example, at least one of the name of application A, the name of menu A, the color (or background color) of menu A's text, the coordinates of menu A on the execution screen of application A, and an action. The reference data for menu B may likewise include at least one of the name of application A, the name of menu B, the color (or background color) of menu B's text, the coordinates of menu B on the execution screen of application A, and an action.
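The registration step can be sketched as follows. It assumes a separate text-recognition step (not shown) has already produced each menu's text, color, and coordinates, and it invents an action string format purely for illustration; `ReferenceData`, `register_new_app`, and the `open:` action format are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ReferenceData:
    app_name: str             # first standard
    menu_name: str            # second standard
    color: str                # third standard: text or background color
    coords: Tuple[int, int]   # fourth standard: menu coordinates on the screen
    action: str               # fifth standard: action (or intent)


def register_new_app(
    reference_list: List[ReferenceData],
    app_name: str,
    recognized_menus: Iterable[Tuple[str, str, Tuple[int, int]]],
) -> List[ReferenceData]:
    """Add one reference entry per menu recognized on a new app's screen.

    recognized_menus yields (menu_text, color, (x, y)) tuples from a text
    recognition step that is assumed here, not implemented. Pairs of
    (app_name, menu_text) that are already registered are skipped.
    """
    existing = {(r.app_name, r.menu_name) for r in reference_list}
    for menu_text, color, coords in recognized_menus:
        if (app_name, menu_text) in existing:
            continue
        # Hypothetical action string; the real action format is not specified.
        action = f"open:{app_name}/{menu_text}"
        reference_list.append(ReferenceData(app_name, menu_text, color, coords, action))
        existing.add((app_name, menu_text))
    return reference_list
```

Skipping already-registered pairs keeps repeated screen captures of the same application from filling the reference list with duplicates.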

The details described with reference to FIGS. 1 to 6 also apply to FIG. 7, so a detailed description is omitted.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications running on the OS. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but one of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or the described components of systems, structures, apparatuses, circuits, and the like are combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A set-top terminal comprising:
a communication interface; and
a controller configured to receive, from a server through the communication interface, a voice recognition result of a user's spoken voice signal and information indicating that no action is executable with the voice recognition result, and, when it is confirmed from the received information that no action is executable with the received voice recognition result, to search a reference list based on the received voice recognition result and execute an operation based on the search result,
wherein the controller
identifies, from among reference data in the reference list, reference data having a first standard that matches a first text in the received voice recognition result; determines whether, from among the identified reference data, there is target reference data having a second standard that matches a second text in the received voice recognition result; when the target reference data exists, executes an application corresponding to the first standard and identifies a focused menu from among a plurality of menus on an execution screen of the application; compares the coordinates of the identified menu with the coordinates of the menu corresponding to the second standard among the menus; and moves the focus to the menu corresponding to the second standard based on the comparison result.

delete

The set-top terminal of claim 1,
wherein the controller
identifies, as the focused menu, one of the plurality of menus whose text color or background color differs from that of the others.

The set-top terminal of claim 1,
wherein the controller
executes the menu corresponding to the second text when the menu corresponding to the second standard is focused.

The set-top terminal of claim 1,
wherein the controller
moves the focus in one of the up, down, left, and right directions based on the comparison result.

The set-top terminal of claim 1,
wherein the target reference data
has a text color or a background color as a third standard, the coordinates of the menu corresponding to the second standard as a fourth standard, and an action as a fifth standard.

delete

The set-top terminal of claim 1,
wherein the controller
receives the spoken voice signal from the user.

The set-top terminal of claim 1,
wherein the controller
generates a digital signal by encoding the spoken voice signal and transmits the generated digital signal to the server through the communication interface.

A method of operating a set-top terminal, the method comprising:
receiving, from a server, a voice recognition result of a user's spoken voice signal and information indicating that no action is executable with the voice recognition result; and
when it is confirmed from the received information that no action is executable with the received voice recognition result, searching a reference list based on the received voice recognition result and executing an operation based on the search result,
wherein the executing of the operation comprises:
identifying, from among reference data in the reference list, reference data having a first standard that matches a first text in the received voice recognition result;
determining whether, from among the identified reference data, there is target reference data having a second standard that matches a second text in the received voice recognition result;
executing an application corresponding to the first standard when the target reference data exists;
identifying a focused menu from among a plurality of menus on an execution screen of the application; and
comparing the coordinates of the menu corresponding to the second standard among the menus with the coordinates of the identified menu, and moving the focus to the menu corresponding to the second standard based on the comparison result.

delete

The method of claim 10,
wherein the identifying of the focused menu comprises
identifying, as the focused menu, one of the plurality of menus whose text color or background color differs from that of the others.

The method of claim 10,
wherein the executing of the operation further comprises
executing the menu corresponding to the second text when the menu corresponding to the second standard is focused.

The method of claim 10,
wherein the moving comprises
moving the focus in one of the up, down, left, and right directions based on the comparison result.

The method of claim 10,
wherein the target reference data
has a text color or a background color as a third standard, the coordinates of the menu corresponding to the second standard as a fourth standard, and an action as a fifth standard.

delete

The method of claim 10, further comprising
receiving the spoken voice signal from the user.

The method of claim 10, further comprising
generating a digital signal by encoding the spoken voice signal and transmitting the generated digital signal to the server.