KR102524675B1

KR102524675B1 - Display apparatus and controlling method thereof

Info

Publication number: KR102524675B1
Application number: KR1020170091494A
Authority: KR
Inventors: 김양수; 수라즈 싱 탄와르
Original assignee: 삼성전자주식회사
Priority date: 2017-05-12
Filing date: 2017-07-19
Publication date: 2023-04-21
Also published as: KR20230130589A; KR20180124682A

Abstract

A display apparatus is disclosed. The display apparatus includes a display and a processor. The processor controls the display to display a UI screen including a plurality of text objects, controls the display so that a preset number is displayed together with any text object, among the plurality of text objects, whose language differs from a predetermined language, and, when a recognition result of a voice uttered by a user includes a displayed number, performs an operation related to the text object corresponding to that number.

Description

DISPLAY APPARATUS AND CONTROLLING METHOD THEREOF

The present disclosure relates to a display apparatus and a control method thereof, and more particularly, to a display apparatus that provides voice recognition control for content composed of various languages, and a control method thereof.

With the development of electronic technology, various types of display apparatuses have been developed and distributed. In particular, electronic devices such as TVs, mobile phones, PCs, notebook PCs, and PDAs are widely used in most households.

Recently, technologies using voice recognition have been developed to control display apparatuses more conveniently and intuitively.

Conventional display apparatuses controlled by a user's voice perform voice recognition using a voice recognition engine. Because a different voice recognition engine exists for each language, it has been necessary to determine in advance which engine to use. The system language of the display apparatus has therefore usually been chosen as the language used for voice recognition.

However, suppose for example that a hyperlink text displayed on the display apparatus is in English while the system language of the display apparatus is Korean. Even if the user utters a voice corresponding to the hyperlink text, the voice is converted into Korean text by the Korean voice recognition engine, so the hyperlink text cannot be selected.

Thus, conventionally, controlling a display apparatus by voice has been limited when the system language differs from the language actually displayed on the display apparatus.

The present disclosure addresses the need described above, and an object of the present disclosure is to provide a display apparatus that provides voice recognition control for content composed of various languages, and a control method thereof.

To achieve the above object, a display apparatus according to an embodiment of the present disclosure includes a display and a processor. The processor controls the display to display a UI screen including a plurality of text objects, controls the display so that a preset number is displayed together with any text object, among the plurality of text objects, whose language differs from a predetermined language, and, when a recognition result of a voice uttered by a user includes the displayed number, performs an operation related to the text object corresponding to the displayed number.
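The numbering scheme in the preceding paragraph can be illustrated with a minimal Python sketch. This is not the implementation of the disclosure: the `TextObject` type, its `language` tag, and both function names are hypothetical, and the sketch assumes that each text object already carries a language tag and that the recognizer returns spoken numbers as digits.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextObject:
    text: str      # label shown on the UI screen
    language: str  # hypothetical language tag such as "ko" or "en"
    url: str       # target of the operation related to this object


def assign_numbers(objects: List[TextObject], predetermined: str) -> Dict[int, TextObject]:
    """Assign 1, 2, ... to every object whose language differs from the predetermined one."""
    numbered: Dict[int, TextObject] = {}
    for obj in objects:
        if obj.language != predetermined:
            numbered[len(numbered) + 1] = obj
    return numbered


def resolve_spoken_number(recognized: str, numbered: Dict[int, TextObject]) -> Optional[TextObject]:
    """Return the object whose displayed number appears in the recognition result."""
    for token in re.findall(r"\d+", recognized):
        obj = numbered.get(int(token))
        if obj is not None:
            return obj
    return None
```

With a Korean predetermined language, only the non-Korean objects receive numbers, so a user can select an English hyperlink by speaking its number in Korean.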

In this case, the processor may set the language configured in a settings menu of the display apparatus as the predetermined language. Alternatively, the processor may set the language most used in the plurality of text objects as the predetermined language.
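The two options above might be sketched as follows. The function names and the `use_settings` flag are hypothetical, and the sketch assumes that a language tag is already known for each text object; the disclosure does not say how ties are broken, so none is handled here.

```python
from collections import Counter
from typing import Iterable, Optional


def most_used_language(object_languages: Iterable[str]) -> Optional[str]:
    """Return the language tag that occurs most often among the text objects."""
    counts = Counter(object_languages)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def choose_predetermined(settings_language: str,
                         object_languages: Iterable[str],
                         use_settings: bool = True) -> str:
    """Pick the predetermined language from the settings menu or from the screen content."""
    if use_settings:
        return settings_language
    # Fall back to the settings language when the screen has no text objects.
    return most_used_language(object_languages) or settings_language
```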

Meanwhile, the UI screen may be a web page, and the processor may set a language corresponding to language information of the web page as the predetermined language.
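The disclosure does not specify what form the language information of a web page takes. One plausible source, assumed here purely for illustration, is the `lang` attribute of the page's `<html>` element. The sketch below reads it with Python's standard `html.parser` module; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class LangAttributeParser(HTMLParser):
    """Collects the lang attribute of the first <html> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    # Keep only the primary subtag, e.g. "en-US" -> "en".
                    self.lang = value.split("-")[0].lower()


def page_language(html: str, fallback: str) -> str:
    """Return the page's declared language, or the fallback when none is declared."""
    parser = LangAttributeParser()
    parser.feed(html)
    return parser.lang or fallback
```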

Meanwhile, for a text object composed of two or more languages among the plurality of text objects, the processor may determine that the text object differs from the predetermined language when the proportion of the predetermined language in the text object is less than a preset ratio.
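A rough sketch of this ratio test follows. The character-level heuristic (Hangul syllables count as Korean, ASCII letters as English), the 0.5 default threshold, and all names are assumptions made for illustration; the disclosure leaves both the detection method and the preset ratio open.

```python
from typing import Optional


def char_language(ch: str) -> Optional[str]:
    """Map one character to a coarse language tag (illustrative heuristic only)."""
    if "\uac00" <= ch <= "\ud7a3":  # Unicode Hangul Syllables block
        return "ko"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None  # digits, spaces, punctuation are ignored


def predetermined_ratio(text: str, predetermined: str) -> float:
    """Fraction of classifiable characters that belong to the predetermined language."""
    tags = [tag for tag in map(char_language, text) if tag is not None]
    if not tags:
        return 0.0
    return tags.count(predetermined) / len(tags)


def needs_number(text: str, predetermined: str, min_ratio: float = 0.5) -> bool:
    """True when the text object should be treated as differing from the predetermined language."""
    return predetermined_ratio(text, predetermined) < min_ratio
```

For example, with Korean as the predetermined language, "삼성 Galaxy" contains two Hangul syllables out of eight classified characters, so its ratio of 0.25 falls below the assumed threshold and the object would receive a number.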

Meanwhile, the processor may control the display to display the preset number adjacent to the text object corresponding to the preset number.

Meanwhile, the display apparatus according to the present disclosure may further include a communication unit that communicates with an external device, and the processor may control the display to display the preset number while a signal corresponding to selection of a specific button of the external device is being received.

In this case, the external device includes a microphone, the communication unit receives a voice signal corresponding to a voice input through the microphone of the external device, and the processor may perform an operation related to the text object corresponding to the displayed number when a recognition result of the received voice signal includes the displayed number.

In this case, the processor may perform an operation related to a text object when the recognition result of the received voice signal includes text corresponding to any one of the plurality of text objects.

Meanwhile, the operation related to the text object may be an operation of displaying a web page at a URL address corresponding to the text object, or an operation of executing an application program corresponding to the text object.

Meanwhile, the plurality of text objects may be included in an execution screen of a first application. When the processor determines that no object corresponding to a recognition result of a voice uttered by the user while the execution screen of the first application is displayed exists on that execution screen, the processor may execute a second application different from the first application to perform an operation corresponding to the recognition result.

In this case, the second application may be an application that provides search results for a search term. When the processor determines that no object corresponding to the recognition result exists on the execution screen of the first application, the processor may execute the second application to provide search results that use text corresponding to the voice recognition result as the search term.
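The fallback from the first application to a search application could look roughly like the sketch below. The function name, the dictionary of on-screen objects keyed by lowercased label, and the `search` callback are all hypothetical stand-ins, and exact label matching is a simplifying assumption.

```python
from typing import Callable, Dict


def handle_recognition(recognized: str,
                       screen_objects: Dict[str, Callable[[], str]],
                       search: Callable[[str], str]) -> str:
    """Run the matching on-screen object's action, or fall back to a search.

    screen_objects maps a lowercased label on the first application's
    execution screen to the action tied to that object.
    """
    action = screen_objects.get(recognized.strip().lower())
    if action is not None:
        return action()
    # No matching object on the current screen: hand the recognized text
    # to the second (search) application as the search term.
    return search(recognized)
```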

Meanwhile, the display apparatus according to the present disclosure may further include a communication unit that communicates with a server performing voice recognition for a plurality of different languages. The processor may control the communication unit to provide the server with a voice signal corresponding to the voice uttered by the user and information on the predetermined language, and may perform an operation related to the text object corresponding to the displayed number when a voice recognition result received from the server includes the displayed number.

In this case, the processor may perform an operation related to a text object when the voice recognition result received from the server includes text corresponding to any one of the plurality of text objects.

Meanwhile, a control method of a display apparatus according to an embodiment of the present disclosure includes: displaying a plurality of text objects; displaying a preset number together with any text object, among the plurality of text objects, whose language differs from a predetermined language; and, when a recognition result of a voice uttered by a user includes the displayed number, performing an operation related to the text object corresponding to the displayed number.

In this case, the control method according to the present disclosure may further include setting the language most used in the plurality of text objects as the predetermined language.

Meanwhile, the plurality of text objects may be included in a web page, and the control method according to the present disclosure may further include setting a language corresponding to language information of the web page as the predetermined language.

Meanwhile, the control method according to the present disclosure may further include, for a text object composed of two or more languages among the plurality of text objects, determining that the text object differs from the predetermined language when the proportion of the predetermined language in the text object is less than a preset ratio.

Meanwhile, the displaying of the preset number may include displaying the preset number adjacent to the text object corresponding to the preset number.

Meanwhile, the displaying of the preset number may include displaying the preset number while a signal corresponding to selection of a specific button of an external device is being received from the external device.

Meanwhile, the performing of the operation related to the text object may include displaying a web page at a URL address corresponding to the text object, or executing an application program corresponding to the text object.

Meanwhile, the plurality of text objects may be included in an execution screen of a first application, and the control method according to the present disclosure may further include, when it is determined that no object corresponding to a recognition result of a voice uttered by the user while the execution screen of the first application is displayed exists on that execution screen, executing a second application different from the first application to perform an operation corresponding to the recognition result.

Meanwhile, the control method according to the present disclosure may further include providing a voice signal corresponding to the voice uttered by the user and information on the predetermined language to a server that performs voice recognition for a plurality of different languages. The performing of the operation related to the text object may include performing an operation related to the text object corresponding to the displayed number when a voice recognition result received from the server includes the displayed number.

Meanwhile, in a computer-readable recording medium storing a program for executing a control method of a display apparatus according to an embodiment of the present disclosure, the control method includes: controlling the display apparatus to display a plurality of text objects; controlling the display apparatus to display a preset number together with any text object, among the plurality of text objects, whose language differs from a predetermined language; and, when a recognition result of a voice uttered by a user includes the displayed number, performing an operation related to the text object corresponding to the displayed number.

FIGS. 1 and 2 are views for explaining a method of inputting a voice command to a display apparatus according to various embodiments of the present disclosure;
FIG. 3 is a view for explaining a voice recognition system according to an embodiment of the present disclosure;
FIG. 4 is a block diagram for explaining the configuration of a display apparatus according to an embodiment of the present disclosure;
FIGS. 5 to 7 are views for explaining number display schemes for object selection according to various embodiments of the present disclosure;
FIGS. 8 and 9 are views for explaining a voice search method according to various embodiments of the present disclosure;
FIG. 10 is a block diagram for explaining the configuration of a display apparatus according to another embodiment of the present disclosure; and
FIG. 11 is a flowchart for explaining a control method of a display apparatus according to an embodiment of the present disclosure.

Before the present disclosure is described in detail, the conventions used in this specification and the drawings are explained.

First, the terms used in this specification and the claims are general terms selected in consideration of their functions in the various embodiments of the present disclosure. However, these terms may vary depending on the intent of those skilled in the art, legal or technical interpretation, the emergence of new technologies, and the like. Some terms were arbitrarily selected by the applicant. Such terms may be interpreted as defined in this specification, and where no specific definition is given, they may be interpreted based on the overall content of this specification and common technical knowledge in the art.

In addition, the same reference numerals or symbols in the drawings attached to this specification denote parts or components that perform substantially the same function. For convenience of description and understanding, the same reference numerals or symbols are used across different embodiments. That is, even if components having the same reference numeral are shown in a plurality of drawings, the plurality of drawings do not represent a single embodiment.

In addition, terms including ordinal numbers such as "first" and "second" may be used in this specification and the claims to distinguish between components. These ordinal numbers are used to distinguish identical or similar components from one another, and the meaning of a term should not be construed as limited by their use. For example, the order of use or arrangement of a component combined with such an ordinal number should not be limited by that number. The ordinal numbers may be interchanged as necessary.

In this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "consist of" are intended to specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the embodiments of the present disclosure, terms such as "module", "unit", and "part" refer to a component that performs at least one function or operation, and such a component may be implemented in hardware, in software, or in a combination of hardware and software. In addition, a plurality of "modules", "units", "parts", and the like may be integrated into at least one module or chip and implemented by at least one processor, except where each needs to be implemented as separate, specific hardware.

In addition, in the embodiments of the present disclosure, when one part is said to be connected to another part, this includes not only a direct connection but also an indirect connection through another medium. Further, when a part is said to include a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Hereinafter, the present disclosure is described in detail with reference to the accompanying drawings.

FIG. 1 is a view for explaining a display apparatus, controlled by voice recognition, according to an embodiment of the present disclosure.

Referring to FIG. 1, the display apparatus 100 may be a TV as illustrated, but this is only an example. It may be implemented as any device having a display function, such as a smartphone, desktop PC, notebook, smart watch, navigation device, or refrigerator.

The display apparatus 100 may perform an operation based on a recognition result of a voice uttered by a user. For example, when the user says "Change to channel 7", it may display the program on channel 7, and when the user says "Turn off the power", it may turn the power off. The display apparatus 100 may also operate as if conversing with the user. For example, in response to the user asking "What is the name of the program currently being broadcast?", it may output the message "The title of the program you asked about is ○○○" as voice or text. When the user says "How is the weather today?", it may output the message "Please tell me the region you want" as voice or text, and when the user answers "Seoul", it may output the message "The temperature in Seoul is ○○" as voice or text.

As shown in FIG. 1, the display apparatus 100 may receive the user's voice through a microphone connected to or included in the display apparatus 100. Alternatively, the display apparatus 100 may receive, from an external device, a voice signal corresponding to a voice input through a microphone of the external device. This is described with reference to FIG. 2.

FIG. 2 is a view for explaining a display system according to an embodiment of the present disclosure.

Referring to FIG. 2, the display system includes the display apparatus 100 and an external device 200.

The display apparatus 100 is a device that operates according to a voice recognition result, as described with reference to FIG. 1.

Although FIG. 2 shows an example in which the external device 200 is implemented as a remote control, it may also be implemented as an electronic device such as a smartphone, tablet PC, or smart watch.

The external device 200 includes a microphone and may transmit a voice signal corresponding to a voice input through the microphone to the display apparatus 100. For example, the external device 200 may transmit the voice signal to the display apparatus 100 using a wireless communication method such as infrared (IR), RF, Bluetooth, or Wi-Fi.

To save power, the microphone of the external device 200 may be activated only when a preset event occurs. For example, the microphone is activated while the microphone button 210 of the external device 200 is held down and deactivated when the microphone button 210 is released. That is, voice can be input only while the microphone button 210 is pressed.
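The press-and-hold behavior of the microphone button 210 can be modeled as a small state holder. This is only a sketch: the class name, its methods, and the idea of feeding audio as byte frames are assumptions introduced for illustration and do not come from the disclosure.

```python
from typing import List


class PushToTalkMicrophone:
    """Captures audio frames only while the microphone button is held down."""

    def __init__(self) -> None:
        self.active = False
        self.captured: List[bytes] = []

    def press(self) -> None:
        """Button pressed: activate the microphone."""
        self.active = True

    def release(self) -> None:
        """Button released: deactivate the microphone."""
        self.active = False

    def feed(self, frame: bytes) -> bool:
        """Offer one audio frame; it is kept only while the microphone is active."""
        if self.active:
            self.captured.append(frame)
            return True
        return False
```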

Voice recognition for a voice input through the microphone of the display apparatus 100 or the microphone of the external device 200 may be performed by an external server. FIG. 3 is a view for explaining a related embodiment.

Referring to FIG. 3, the voice recognition system 2000 includes the display apparatus 100 and a server 300.

The display apparatus 100 is a device that operates according to a voice recognition result, as described with reference to FIG. 1. As described above, the display apparatus 100 may transmit to the server 300 a voice signal corresponding to a voice input through the microphone of the display apparatus 100 or the microphone of the external device 200.

Together with the voice signal, the display apparatus 100 may transmit to the server 300 information indicating the language on which recognition of the voice signal should be based (hereinafter, "language information"). Even the same voice signal may yield different voice recognition results depending on which language's voice recognition engine is used.

The server 300 may perform voice recognition for a plurality of different languages. The server 300 may include a plurality of voice recognition engines, each corresponding to one of the languages. For example, the server 300 may include a Korean voice recognition engine, an English voice recognition engine, a Japanese voice recognition engine, and so on. When the voice signal and the language information are received from the display apparatus 100, the server 300 may perform voice recognition on the voice signal using the voice recognition engine corresponding to the language information.
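Engine selection by language information can be pictured as a lookup in a table of per-language engines. In the sketch below the engines are stub callables rather than real recognizers, and `RecognitionServer`, `Engine`, and the language tags are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# An engine takes a raw voice signal and returns recognized text.
Engine = Callable[[bytes], str]


class RecognitionServer:
    """Routes each voice signal to the engine that matches its language information."""

    def __init__(self, engines: Dict[str, Engine]) -> None:
        self._engines = dict(engines)

    def recognize(self, voice_signal: bytes, language: str) -> str:
        try:
            engine = self._engines[language]
        except KeyError:
            raise ValueError(f"no engine for language {language!r}") from None
        return engine(voice_signal)
```

Registering two stub engines under "ko" and "en" and sending the same bytes with different language information returns different text, which mirrors the point that one voice signal can produce different recognition results depending on the engine.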

The server 300 then transmits the voice recognition result to the display apparatus 100, and the display apparatus 100 may perform an operation corresponding to the voice recognition result received from the server 300.

For example, when text included in the voice recognition result received from the server 300 matches a text object displayed on the display apparatus 100, the display apparatus 100 may perform an operation related to that text object. For instance, if a web page contains a text object matching the text included in the voice recognition result, the display apparatus 100 may display the web page at the URL address corresponding to that text object. This is only an example, however; UI objects provided by various applications of the display apparatus 100 may be selected by voice recognition, and the corresponding operations may be performed.
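One simple way to match recognized text against displayed text objects is sketched below. The case-insensitive, whitespace-normalized substring comparison is an assumption made for illustration, as are the function names and the mapping from link label to URL; the disclosure does not define how a match is determined.

```python
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so small transcription differences do not block a match."""
    return " ".join(text.casefold().split())


def find_target_url(recognized: str, links: Dict[str, str]) -> Optional[str]:
    """Return the URL of the first link whose label appears in the recognition result."""
    spoken = normalize(recognized)
    for label, url in links.items():
        wanted = normalize(label)
        if wanted and wanted in spoken:
            return url
    return None
```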

Although FIG. 3 shows a single server 300, a plurality of servers may exist, each corresponding to one of a plurality of languages. For example, a server responsible for Korean voice recognition and a server responsible for English voice recognition may exist separately.

In the example above, voice recognition is described as being performed by the server 300, which is a device separate from the display apparatus 100. According to another example, however, the display apparatus 100 may perform the function of the server 300. That is, the display apparatus 100 and the server 300 described above may be implemented as a single product.

FIG. 4 is a block diagram for explaining the configuration of the display apparatus 100 according to an embodiment of the present disclosure.

The display apparatus 100 includes a display 110 and a processor 120.

The display 110 may be implemented as, for example, a liquid crystal display (LCD), and in some cases as a cathode-ray tube (CRT), a plasma display panel (PDP), organic light emitting diodes (OLED), a transparent OLED (TOLED), or the like. The display 110 may also be implemented as a touch screen capable of detecting a user's touch input.

The processor 120 is a component for controlling the overall operation of the display apparatus 100.

For example, the processor 120 may include a CPU, RAM, ROM, and a system bus. The ROM stores an instruction set for booting the system. According to the instructions stored in the ROM, the CPU copies the operating system (O/S) stored in a storage unit of the display apparatus 100 to the RAM and executes the O/S to boot the system. When booting is complete, the CPU may copy various applications stored in the storage unit to the RAM and execute them to perform various operations. Although the processor 120 is described above as including a single CPU, it may be implemented with a plurality of CPUs (or DSPs, SoCs, and the like).

When a user command for selecting an object displayed on the display 110 is input, the processor 120 may perform an operation associated with the object selected by the user command. The object may be any selectable object, for example a hyperlink or an icon. The operation associated with the selected object may be, for example, displaying a page, document, or image linked by a hyperlink, or executing a program corresponding to an icon.

The user command for selecting an object may be, for example, a command input through any of various input devices connected to the display apparatus 100 (e.g., a mouse, a keyboard, or a touch pad), or a voice command corresponding to a voice uttered by the user.

Although not shown in FIG. 4, the display apparatus 100 may further include a voice receiver for receiving voice input. The voice receiver may include a microphone and generate a voice signal by directly receiving the voice uttered by the user, or it may receive an electrical voice signal from the external device 200. In the latter case, the voice receiver may be implemented as a communication unit that performs wired or wireless communication with the external device 200. In some cases the voice receiver may be omitted from the display apparatus 100. For example, a voice signal corresponding to a voice input through the microphone of the external device 200 may be delivered to the server 300 via a device other than the display apparatus 100, or directly from the external device 200 to the server 300, and the display apparatus 100 may be implemented to receive only the voice recognition result from the server 300.

Among the text objects displayed on the display 110, the processor 120 may control the display 110 to display a number together with each text object whose language differs from a predetermined language.

Here, the predetermined language is the base language for voice recognition (the language of the voice recognition engine to be used for voice recognition). It may be set manually by the user or set automatically. In the manual case, for example, the language set as the usage language (or system language) in a settings menu provided by the display apparatus 100 may be set as the base language for voice recognition.

According to one embodiment in which the base language for voice recognition is set automatically, the processor 120 may identify the language used most in the text objects currently displayed on the display 110 and automatically set that language as the base language for voice recognition.

Specifically, the processor 120 may analyze the type of characters (e.g., Hangul or the Latin alphabet) contained in each of the text objects currently displayed on the display 110, and set the language corresponding to the character type used most across those text objects as the base language for voice recognition.
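The character-type analysis described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the disclosure: the function names `script_of` and `dominant_language`, the Unicode ranges used for Hangul, and the simplification that Latin letters stand for English are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Optional


def script_of(ch: str) -> Optional[str]:
    """Classify one character as 'ko' (Hangul), 'en' (ASCII letter), or None."""
    code = ord(ch)
    # Hangul syllables, Hangul Jamo, and Hangul compatibility Jamo blocks.
    if 0xAC00 <= code <= 0xD7A3 or 0x1100 <= code <= 0x11FF or 0x3130 <= code <= 0x318F:
        return "ko"
    # Assumption: ASCII letters are treated as English for this sketch.
    if ch.isascii() and ch.isalpha():
        return "en"
    return None


def dominant_language(text_objects: List[str], default: str = "en") -> str:
    """Return the language whose characters occur most often on the screen."""
    counts = Counter()
    for text in text_objects:
        for ch in text:
            lang = script_of(ch)
            if lang is not None:
                counts[lang] += 1
    # Fall back to a default (hypothetical) when no letters are present.
    return counts.most_common(1)[0][0] if counts else default


print(dominant_language(["음성 인식", "TV", "설정 메뉴"]))
```

Counting characters rather than words is one possible choice; the description only states that the most-used character type is selected.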

According to another embodiment, when the objects currently displayed on the display 110 are objects of a web page, the processor 120 may set the language corresponding to the language information of that web page as the base language for voice recognition. The language information of a web page can be found, for example, in the lang attribute of the HTML (e.g., <html lang="en">).
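As a rough illustration of reading the lang attribute, the sketch below uses Python's standard `html.parser` module. The class name `LangAttrParser` and the function `page_language` are hypothetical names chosen for this example; how the display apparatus actually obtains the page markup is not specified in the description.

```python
from html.parser import HTMLParser
from typing import Optional


class LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "html" and self.lang is None:
            for name, value in attrs:
                if name == "lang" and value:
                    self.lang = value


def page_language(html: str) -> Optional[str]:
    """Return the page's declared language, or None if it declares none."""
    parser = LangAttrParser()
    parser.feed(html)
    return parser.lang


print(page_language('<html lang="en"><body>Hello</body></html>'))
```

When the page declares no language, a caller could fall back to the character-type analysis or to the system language; that fallback order is an assumption, not something the description states.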

Once the base language for voice recognition has been set, the processor 120 may control the display 110 so that an arbitrary number is displayed together with each text object whose language differs from the base language. The user can select such a text object by speaking the number displayed with it. Because an image object likewise cannot be selected by voice, the processor 120 may control the display 110 so that an arbitrary number is also displayed together with each image object.

The processor 120 may determine that a text object composed only of a language other than the base language for voice recognition is a text object that differs from the base language. For a text object composed of two or more languages, the processor 120 may determine that it differs from the base language when the proportion of the base language in the text object is less than a preset ratio. This is described in more detail with reference to FIG. 5.

FIG. 5 illustrates a specific screen displayed on the display 110.

Referring to FIG. 5, a UI screen including a plurality of text objects 51 to 59 is displayed on the display 110. Assume that the base language for voice recognition is set to English. The processor 120 may control the display 110 so that arbitrary numbers (① to ⑥) are displayed together with the text objects 51 to 56, which are composed of a language other than English. The numbers (① to ⑥) may be displayed adjacent to the corresponding text objects 51 to 56. For the text objects 57 and 58, which are composed of English, specific icons 57a and 58a may be displayed nearby to inform the user that the text objects 57 and 58 can be selected by uttering the text they contain. The icons 57a and 58a may be rendered as "T" as shown in FIG. 5, but are not limited thereto and may take various forms, such as "Text".

For the text object 59, which is composed of two or more languages, the processor 120 may check whether the proportion of English is less than a preset ratio (e.g., 50%) and, if so, control the display 110 to display a number together with it. The text object 59 shown in FIG. 5 is composed of Korean and English, and because the proportion of English exceeds the preset ratio (e.g., 50%), no number is displayed with it. Instead, an icon 59a indicating that the text object can be selected by uttering the text it contains may be displayed adjacent to the text object 59.
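The ratio test can be sketched as below. The description does not say whether the proportion is measured in characters or in words; this sketch assumes characters, supports only Korean ("ko") and English ("en"), and uses hypothetical names (`needs_number`, `threshold`). The sample strings are invented for illustration and are not the actual text of FIG. 5.

```python
def is_hangul(ch: str) -> bool:
    """True for precomposed Hangul syllables."""
    return "\uac00" <= ch <= "\ud7a3"


def is_latin(ch: str) -> bool:
    """True for ASCII letters (treated as English in this sketch)."""
    return ch.isascii() and ch.isalpha()


def needs_number(text: str, recognition_lang: str, threshold: float = 0.5) -> bool:
    """Return True when the base language's share of letters is below threshold."""
    in_base = is_hangul if recognition_lang == "ko" else is_latin
    letters = [ch for ch in text if is_hangul(ch) or is_latin(ch)]
    if not letters:
        # Assumption: text without recognizable letters cannot be spoken reliably.
        return True
    ratio = sum(1 for ch in letters if in_base(ch)) / len(letters)
    return ratio < threshold


# Mixed Korean/English object with mostly English letters: no number needed.
print(needs_number("Voice recognition 기술", "en"))
# Korean-only object while the base language is English: number needed.
print(needs_number("음성 인식", "en"))
```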

Although FIG. 5 shows the numbers in a form such as "①", the form of the numbers is not limited. For example, "1" may be enclosed in a square instead of a circle, or may be displayed simply as "1". According to another embodiment of the present disclosure, the number may be displayed as a word in the base language for voice recognition: "one" if the base language is English, or "uno" if the base language is Spanish.

Although not shown in FIG. 5, a phrase prompting the user to speak a number, such as "You can select the object corresponding to the number you say," may additionally be displayed on the display 110 together with the numbers.

According to another embodiment of the present disclosure, for a text object composed of two or more languages, the processor 120 may determine that the text object differs from the base language for voice recognition when the language of its first word differs from the base language. This embodiment is described with reference to FIG. 6.

FIG. 6 illustrates a specific screen displayed on the display 110.

Referring to FIG. 6, a UI screen including a plurality of text objects 61 to 63 is displayed on the display 110. Assume that the base language for voice recognition is set to Korean. For the text object 61, which is composed of two or more languages, the processor 120 may determine that it differs from the base language because the language of its first word, "AAA", is English rather than Korean. Accordingly, the processor 120 may control the display 110 so that the number (①) is displayed together with the text object 61.

According to the embodiment described with reference to FIG. 6, even if a text object composed of two or more languages contains the base language for voice recognition at or above the preset ratio, a number is displayed when the first word is in a language different from the base language. Conversely, even if such a text object contains the base language below the preset ratio, no number is displayed when the first word is in the base language. This is because the user is likely to say the first word of a text object in order to select it.
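A minimal sketch of the first-word rule follows. The helper names and the way a word's language is decided (majority of Hangul versus ASCII letters) are assumptions for illustration; the description only says that the language of the first word is compared with the base language.

```python
from typing import Optional


def is_hangul(ch: str) -> bool:
    return "\uac00" <= ch <= "\ud7a3"


def is_latin(ch: str) -> bool:
    return ch.isascii() and ch.isalpha()


def word_language(word: str) -> Optional[str]:
    """Guess 'ko' or 'en' for a single word, or None if it has no letters."""
    hangul = sum(1 for ch in word if is_hangul(ch))
    latin = sum(1 for ch in word if is_latin(ch))
    if hangul == 0 and latin == 0:
        return None
    return "ko" if hangul >= latin else "en"


def needs_number_by_first_word(text: str, recognition_lang: str) -> bool:
    """Return True when the first word is not in the base language."""
    words = text.split()
    if not words:
        return True  # Assumption: an empty label cannot be spoken.
    return word_language(words[0]) != recognition_lang


# First word "AAA" is English while the base language is Korean: number shown.
print(needs_number_by_first_word("AAA 뉴스 속보", "ko"))
# First word is Korean, so no number even though most words are English.
print(needs_number_by_first_word("뉴스 AAA BBB CCC", "ko"))
```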

According to another embodiment of the present disclosure, because an image object likewise cannot be selected by voice, a number may also be displayed with an image object. This embodiment is described below with reference to FIG. 7.

FIG. 7 illustrates a specific screen displayed on the display 110.

Referring to FIG. 7, a first image object 71, a second image object 72, a third image object 74, a first text object 73, and a second text object 75 are displayed on the display 110. The processor 120 may control the display 110 to display the number (①) together with the first image object 71.

According to another embodiment of the present disclosure, when a plurality of objects displayed on the display 110 have URL links, the processor 120 may compare the URL links of those objects. When some of the objects share the same URL link, the processor 120 may control the display 110 to display a number on only one of them if none of them can be selected by voice recognition, and to display no number if any one of them can be selected by voice recognition.

More specifically, when a plurality of objects that cannot be selected by voice recognition (that is, text objects that differ from the base language for voice recognition, or image objects) are displayed on the display 110 and link to the same URL address, a number may be displayed on only one of them. Referring to FIG. 7, the second image object 72 cannot be selected by voice, and the first text object 73 is composed of English, which differs from Korean, the base language for voice recognition. Neither the second image object 72 nor the first text object 73 can therefore be selected by voice, but because both lead to the same URL address when selected, the number (②) may be displayed on only one of them, here the second image object 72. Alternatively, the number may be displayed on the first text object 73 instead of the second image object 72. The purpose is to minimize the count of numbers displayed on the display 110.

To minimize the count of numbers displayed on the display 110, according to another embodiment of the present disclosure, when a plurality of objects having the same URL address are displayed on the display 110 and any one of them is a text object in the base language for voice recognition, no number is displayed for any of them. Referring to FIG. 7, the processor 120 compares the URL address of the third image object 74 with the URL address of the second text object 75. If they are determined to be the same, and the second text object 75 is determined to be a text object in Korean, the base language for voice recognition, the processor 120 controls the display 110 not to display a number on the third image object 74.
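The URL-based reduction of displayed numbers can be sketched as follows. The data class `ScreenObject`, the function `assign_numbers`, and the example URLs are hypothetical; the reference numerals mirror FIG. 7 only to make the example easy to follow. The sketch labels the first object of each group, which is one of the two options the description allows.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScreenObject:
    ref: int                 # reference numeral of the object on screen
    url: str                 # URL the object links to
    voice_selectable: bool   # True if its text is in the base language


def assign_numbers(objects: List[ScreenObject]) -> Dict[int, int]:
    """Map object ref -> displayed number, using one number per URL group."""
    groups: Dict[str, List[ScreenObject]] = {}
    for obj in objects:
        groups.setdefault(obj.url, []).append(obj)

    numbered: Dict[int, int] = {}
    next_number = 1
    for group in groups.values():
        # If any object in the group can be spoken, the group needs no number.
        if any(obj.voice_selectable for obj in group):
            continue
        # Otherwise label only one object of the group (here, the first).
        numbered[group[0].ref] = next_number
        next_number += 1
    return numbered


screen = [
    ScreenObject(71, "http://example.com/a", False),  # image
    ScreenObject(72, "http://example.com/b", False),  # image
    ScreenObject(73, "http://example.com/b", False),  # English text, Korean base
    ScreenObject(74, "http://example.com/c", False),  # image
    ScreenObject(75, "http://example.com/c", True),   # Korean text
]
print(assign_numbers(screen))
```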

When the recognition result of the voice uttered by the user includes specific text displayed on the display 110, the processor 120 may perform an operation related to the text object corresponding to that text. Referring to FIG. 5, when the user says "Voice recognition", the processor 120 may control the display 110 to display the page at the URL address corresponding to the text object 59.

According to an embodiment of the present disclosure, when the recognition result of the voice uttered by the user includes text that is common to two or more of the text objects displayed on the display 110, the processor 120 may display a number with each of those text objects and, when the user utters one of the displayed numbers, perform an operation related to the text object corresponding to that number. Referring to FIG. 5, when the recognition result includes "Speech recognition", the processor 120 searches the text objects displayed on the screen for those containing "Speech recognition". When a plurality of text objects 57 and 58 are found, the processor 120 may control the display 110 to display arbitrary numbers next to the text objects 57 and 58. For example, the number ⑦ may be displayed next to the text object 57 and the number ⑧ next to the text object 58, and the user can then select the text object 57 by saying "7".

When the recognition result of the voice uttered by the user includes a number displayed on the display 110, the processor 120 may perform an operation related to the text object or image object corresponding to that number. Referring to FIG. 6, when the user says "일" (the Korean word for "one"), the processor 120 may control the display 110 to display the page at the URL address corresponding to the text object 61.
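The selection flow described above (match by spoken text, fall back to numbers when several objects match) might look like the sketch below. The function name `resolve_utterance`, the returned action labels, and the sample label texts are assumptions for illustration; substring matching is a simplification of whatever matching the recognizer actually supports.

```python
from typing import Dict, List, Tuple


def resolve_utterance(
    recognized: str,
    text_objects: Dict[int, str],
    number_labels: Dict[str, int],
) -> Tuple[str, List[int]]:
    """Decide what to do with a recognition result.

    text_objects maps object ref -> displayed text.
    number_labels maps a spoken number (e.g. "7") -> object ref.
    """
    spoken = recognized.strip().lower()
    # A displayed number in the utterance selects its object directly.
    for token in spoken.split():
        if token in number_labels:
            return ("select", [number_labels[token]])
    # Otherwise look for text objects containing the spoken phrase.
    matches = [ref for ref, text in text_objects.items()
               if spoken and spoken in text.lower()]
    if len(matches) == 1:
        return ("select", matches)
    if len(matches) > 1:
        # Several candidates: the caller should display numbers next to them.
        return ("disambiguate", matches)
    return ("no_match", [])


labels = {57: "Speech recognition engine", 58: "Speech recognition API",
          59: "Voice recognition 기술"}
print(resolve_utterance("Speech recognition", labels, {}))
print(resolve_utterance("7", labels, {"7": 57, "8": 58}))
```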

The voice uttered by the user may be input through a microphone of the display apparatus 100 or through a microphone of the external device 200. In the latter case, the display apparatus 100 may include a communication unit for communicating with the external device 200 that includes the microphone, and the communication unit may receive a voice signal corresponding to the voice input through the microphone of the external device 200. When the recognition result for the voice signal received from the external device 200 through the communication unit includes a number displayed on the display 110, the processor 120 may perform an operation related to the text object corresponding to that number. Referring to FIG. 6, when the user says "일" (the Korean word for "one") into the microphone of the external device 200, the external device 200 transmits a voice signal to the display apparatus 100, and the processor 120 may control the display 110 to display the page at the URL address corresponding to the text object 61 based on the voice recognition result for the received voice signal.

The numbers displayed for text or image objects may be shown only for a limited period. According to one embodiment, the processor 120 may control the display 110 to display the numbers while a signal corresponding to the selection of a specific button is being received from the external device 200. That is, the numbers may be displayed only while the user is holding down the specific button of the external device 200. The specific button may be, for example, the microphone button 210 of the external device 200 described with reference to FIG. 2.

According to another embodiment, the processor 120 may display the numbers when a voice input through the microphone of the display apparatus 100 includes a preset keyword (e.g., "Hi TV"), and remove the displayed numbers when a preset time elapses with no voice input through the microphone of the display apparatus 100.

Although the above embodiments describe numbers being displayed, the label need not be a number; any word that the user can see and read aloud (with or without meaning) may be used. For example, a, b, c, ... may be displayed instead of 1, 2, 3, ....

According to another embodiment of the present disclosure, when the web page displayed on the display 110 contains a search window, the user can easily perform a search by uttering the word to be searched for together with a specific keyword that triggers the search function. For example, when the displayed web page contains a search window, simply saying "○○○ 검색" ("○○○ search") or "검색 ○○○" ("search ○○○") causes the search results for "○○○" to be displayed on the display 110.

To this end, the processor 120 may detect a search word input window in the web page displayed on the display 110. Specifically, the processor 120 may search the objects that make up the displayed web page for an object that accepts input. In HTML, an input tag is an object that accepts input. An input tag has various attributes, among which the type attribute defines the nature of the input. When the type is "search", the object is unambiguously a search word input window.

However, for an object whose type is "text", it cannot be determined immediately whether it is a search word input window. Ordinary input objects also have the text type, so the type alone does not distinguish a search word input window from a general input window. In this case, a separate step is needed to determine whether the object is a search word input window.

For an object whose type is "text", information about additional attributes of the object is consulted to determine whether it is a search word input window. When the title or aria-label attribute contains the keyword "검색" ("search"), the object may be determined to be a search word input window.
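The detection steps above can be sketched with Python's standard `html.parser`. The class `SearchInputFinder`, the function `find_search_inputs`, and the default keyword list are hypothetical; the description names only the "검색" keyword, so accepting the English word "search" as well is an added assumption. An input tag with no type attribute is treated as type "text", which is its default in HTML.

```python
from html.parser import HTMLParser
from typing import Dict, List, Sequence


class SearchInputFinder(HTMLParser):
    """Collects <input> elements that look like search word input windows."""

    def __init__(self, keywords: Sequence[str] = ("검색", "search")) -> None:
        super().__init__()
        self.keywords = [k.lower() for k in keywords]
        self.found: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = {name: (value or "") for name, value in attrs}
        input_type = attr.get("type", "text").lower()
        if input_type == "search":
            # type="search" is unambiguous.
            self.found.append(attr)
        elif input_type == "text":
            # type="text" needs a hint in title or aria-label.
            hint = (attr.get("title", "") + " " + attr.get("aria-label", "")).lower()
            if any(keyword in hint for keyword in self.keywords):
                self.found.append(attr)


def find_search_inputs(html: str) -> List[Dict[str, str]]:
    finder = SearchInputFinder()
    finder.feed(html)
    return finder.found


page = ('<input type="search" name="q">'
        '<input type="text" name="user">'
        '<input type="text" name="news" title="뉴스 검색">')
print([a.get("name") for a in find_search_inputs(page)])
```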

The processor 120 then determines whether the recognition result of the voice uttered by the user includes a specific keyword. The specific keyword may be, for example, "검색" ("search") or "찾아" ("find"). When it is determined that a specific keyword is included, the processor 120 checks the position of the keyword in order to determine the user's intention more accurately. If at least one word precedes or follows the keyword, the user most likely intends to search for that word or those words. If the recognition result contains only the keyword itself, such as "검색" or "찾아", the user most likely does not intend to perform a search.

This determination of the user's intention may be performed in the display apparatus 100, or it may be performed in the server 300, which then provides the result to the display apparatus 100.

When it is determined that the user intends to search, the processor 120 selects the words other than the specific keyword as the search word and performs the search by entering the selected search word into the search word input window detected as described above. For example, when a web page including a search word input window 810 is displayed on the display 110 as shown in FIG. 8, the processor 120 detects the search word input window 810. When the user utters "강아지 검색" ("puppy search"), the processor 120 selects "강아지" ("puppy") from the voice recognition result as the search word, enters it into the detected search word input window 810, and performs the search.
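The keyword check and search word selection can be sketched as below. The function name `extract_query` and whole-token matching are assumptions made to keep the example short; real Korean utterances often attach endings to the keyword (for example "찾아줘"), which this sketch does not handle.

```python
from typing import Optional, Sequence

# Keywords named in the description: "검색" (search) and "찾아" (find).
SEARCH_KEYWORDS = ("검색", "찾아")


def extract_query(recognized: str,
                  keywords: Sequence[str] = SEARCH_KEYWORDS) -> Optional[str]:
    """Return the search word, or None when no search intent is detected."""
    words = recognized.split()
    if not any(word in keywords for word in words):
        return None  # no trigger keyword in the utterance
    remaining = [word for word in words if word not in keywords]
    # A keyword alone (nothing before or after it) is not treated as a search.
    return " ".join(remaining) if remaining else None


print(extract_query("강아지 검색"))
print(extract_query("검색"))
```

The first call yields the search word "강아지"; the second yields None because the utterance contains only the keyword.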

The operation of detecting a search word input window in the web page displayed on the display 110 may be performed after it is determined that the voice recognition result includes a specific keyword, or it may be performed in advance.

FIG. 9 is a diagram illustrating another example of the search word input method. FIG. 9 illustrates how a search is performed when one web page contains a plurality of search word input windows.

FIG. 9 shows a case in which one web page contains two search windows. The first search word input window 910 is for news search, and the second search word input window 920 is for stock search. Based on information about the placement of the objects and information about the layout of the current screen, the processor 120 performs the search using the search word input window that is displayed at the time the user utters the voice containing the search word. For example, when the user utters a voice containing a search word and the specific keyword while the first search word input window 910 is displayed on the display 110, the processor 120 enters the search word into the first search word input window 910. When the page has been scrolled down so that the second search word input window 920 is displayed on the display 110 and the user utters a voice containing a search word and the specific keyword, the processor 120 may enter the search word into the second search word input window 920. That is, when one web page contains a plurality of search word input windows, the search may be performed using the search word input window visible on the current screen.
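Choosing the search window that is currently on screen could be sketched as follows. The `SearchBox` data class, pixel coordinates, and the requirement that the box be fully inside the viewport are all assumptions; the description only says that placement and layout information are used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchBox:
    ref: int      # reference numeral, e.g. 910 or 920
    top: int      # distance from the top of the page, in pixels
    height: int   # height of the input window, in pixels


def visible_search_box(boxes: List[SearchBox], scroll_y: int,
                       viewport_height: int) -> Optional[SearchBox]:
    """Return the first search box fully inside the current viewport."""
    bottom = scroll_y + viewport_height
    for box in boxes:
        if box.top >= scroll_y and box.top + box.height <= bottom:
            return box
    return None


boxes = [SearchBox(910, top=100, height=40), SearchBox(920, top=1500, height=40)]
# At the top of the page the news search window 910 is visible.
print(visible_search_box(boxes, scroll_y=0, viewport_height=1080).ref)
# After scrolling down, the stock search window 920 is visible instead.
print(visible_search_box(boxes, scroll_y=1000, viewport_height=1080).ref)
```

If both windows were visible at once, this sketch would pick the first one in document order; the description does not address that case.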

Voice control is performed based on the screen shown on the display 110. That is, by default, the function corresponding to a voice command is performed using the application that corresponds to the screen being displayed on the display 110. However, when the input voice command does not match any object included in the currently displayed screen, or concerns a function that the application currently displaying the screen does not have, another application may be executed to perform the function corresponding to the voice command.

For example, when the currently running application is a web browsing application and the voice uttered by the user does not match any object in the web page displayed by the web browsing application, the processor 120 may execute another preset application to perform a search function corresponding to the uttered voice. The other preset application is an application that provides a search function, for example an application that provides search results for the text corresponding to the voice using the Google™ search engine, or an application that provides search results for VOD content corresponding to that text. Before executing such an application, the processor 120 may display a UI for obtaining the user's consent, such as "There is no result matching ○○○ on the current screen. Do you want to search for ○○○ on the Internet?", and may execute the Internet search application or the like and provide the search results after the user's consent is input through the UI.

The display device 100 may include a voice processing unit that processes the voice recognition result received from the server 300 and an application unit that executes applications installed on the display device 100. The voice processing unit provides the voice recognition result received from the server 300 to the application unit. If the recognition result is provided while a first application of the application unit is running and the screen of the first application is displayed on the display 110, the first application may perform the operations described above based on the voice recognition result provided by the voice processing unit. For example, it may search for a text or image object corresponding to a number included in the voice recognition result, search for a text object corresponding to a word included in the voice recognition result, or, if the voice recognition result includes "search", enter a keyword in the search box and run the search. If the first application has no operation to perform using the voice recognition result provided by the voice processing unit, for example because there is no text or image object corresponding to the voice recognition result or there is no search box, the first application notifies the voice processing unit of this, and the voice processing unit may control the application unit to execute a second application capable of performing an operation related to the voice recognition result. For example, the second application is an application that provides search results for a specific search term. The application unit may execute the second application to provide search results using the text included in the voice recognition result as the search term.
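The hand-off between the voice processing unit and the two applications can be sketched as below. This is a minimal sketch under stated assumptions: the class names `BrowserApp`, `SearchApp`, and `VoiceProcessingUnit`, the string return values, and the word-by-word matching are all hypothetical simplifications, not the implementation of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BrowserApp:
    """Hypothetical first application that knows the objects on its screen."""
    objects: Dict[str, str]      # spoken word or displayed number -> target
    has_search_box: bool = False

    def handle(self, text: str) -> Optional[str]:
        words = text.split()
        for word in words:
            if word in self.objects:
                return f"select:{self.objects[word]}"
        if "search" in words and self.has_search_box:
            keyword = " ".join(w for w in words if w != "search")
            return f"search_box:{keyword}"
        return None  # nothing to do; this is the "notification" to the caller


class SearchApp:
    """Hypothetical second application that searches for the recognized text."""

    def handle(self, text: str) -> str:
        return f"search_results:{text}"


class VoiceProcessingUnit:
    def __init__(self, first_app: BrowserApp, second_app: SearchApp) -> None:
        self.first_app = first_app
        self.second_app = second_app

    def on_recognition_result(self, text: str) -> str:
        outcome = self.first_app.handle(text)
        if outcome is None:
            # The first application had nothing to do, so fall back.
            outcome = self.second_app.handle(text)
        return outcome
```

Returning `None` from the first application stands in for the notification described above; a real system would more likely use an explicit message or callback.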

FIG. 10 is a block diagram illustrating the configuration of the display device 100 when implemented as a TV. In describing FIG. 10, descriptions of components that overlap with those described with reference to FIG. 4 are omitted.

Referring to FIG. 10, the display device 100 may be implemented as, for example, an analog TV, a digital TV, a 3D TV, a smart TV, an LED TV, an OLED TV, a plasma TV, a monitor, a curved TV having a screen with a fixed curvature, a flexible TV having a screen with a fixed curvature, a bended TV having a screen with a fixed curvature, and/or a curvature-variable TV in which the curvature of the current screen can be changed by a received user input, but is not limited thereto.

The display device 100 includes a display 110, a processor 120, a tuner 130, a communication unit 140, a microphone 150, an input/output unit 160, an audio output unit 170, and a storage unit 180.

The tuner 130 may select, from among many radio wave components of a broadcast signal received by wire or wirelessly, only the frequency of the channel that the display device 100 intends to receive, by tuning to it through amplification, mixing, resonance, and the like. The broadcast signal may include video, audio, and additional data (for example, an Electronic Program Guide (EPG)).

The tuner 130 may receive video, audio, and data in the frequency band corresponding to the channel number that corresponds to a user input.

The tuner 130 may receive broadcast signals from various sources such as terrestrial broadcasting, cable broadcasting, or satellite broadcasting. The tuner 130 may also receive broadcast signals from sources such as analog broadcasting or digital broadcasting.

The tuner 130 may be implemented integrally (all-in-one) with the display device 100, or as a separate device having a tuner unit electrically connected to the display device 100 (for example, a set-top box or a tuner connected to the input/output unit 160).

The communication unit 140 is a component that communicates with various types of external devices according to various types of communication methods. The communication unit 140 may be connected to an external device through a local area network (LAN) or the Internet, and may be connected to an external device by wireless communication (for example, Z-Wave, 6LoWPAN, RFID, LTE D2D, BLE, GPRS, Weightless, EDGE, Zigbee, ANT+, NFC, IrDA, DECT, WLAN, Bluetooth, Wi-Fi, Wi-Fi Direct, GSM, UMTS, LTE, or WiBro). The communication unit 140 includes various communication chips such as a Wi-Fi chip 141, a Bluetooth chip 142, an NFC chip 143, and a wireless communication chip 144. The Wi-Fi chip 141, the Bluetooth chip 142, and the NFC chip 143 communicate using the Wi-Fi, Bluetooth, and NFC methods, respectively. The wireless communication chip 144 refers to a chip that communicates according to various communication standards such as IEEE, Zigbee, 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), and LTE (Long Term Evolution). The communication unit 140 also includes a light receiving unit 145 capable of receiving a control signal (for example, an IR pulse) from the external device 200.

The processor 120 may transmit a voice signal and language information (information on the language that is the basis for voice recognition) to the server 300 through the communication unit 140. When the server 300 transmits the result of voice recognition performed on the voice signal using the voice recognition engine for the language corresponding to the language information, the processor 120 may receive the result of the voice recognition through the communication unit 140.
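The exchange with the server can be illustrated with a small in-process stand-in. This is a sketch under assumptions: `RecognitionRequest`, `RecognitionServer`, and the language codes are hypothetical names chosen for illustration, and the "engines" are plain functions rather than real recognizers.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RecognitionRequest:
    audio: bytes     # the voice signal
    language: str    # language that recognition is based on, e.g. "ko"


class RecognitionServer:
    """Stand-in for a server that holds one engine per supported language."""

    def __init__(self, engines: Dict[str, Callable[[bytes], str]]) -> None:
        self._engines = engines

    def recognize(self, request: RecognitionRequest) -> str:
        # Pick the engine that matches the language information in the request.
        engine = self._engines.get(request.language)
        if engine is None:
            raise ValueError(f"no engine for language {request.language!r}")
        return engine(request.audio)
```

Sending the language together with the audio lets the server choose the engine, so the same utterance can be recognized differently depending on the configured language.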

The microphone 150 may receive a voice uttered by the user and generate a voice signal corresponding to the received voice. The microphone 150 may be implemented integrally with the display device 100 or separately from it. A separate microphone 150 may be electrically connected to the display device 100.

If the display device 100 has no microphone, the display device 100 may receive, from the external device 200 through the communication unit 140, a voice signal corresponding to the voice input through the microphone of the external device 200. The communication unit 140 may receive the voice signal from the external device 200 through a communication method such as Wi-Fi or Bluetooth.

The input/output unit 160 is a component for connecting to an external device. The input/output unit 160 may include at least one of an HDMI (High-Definition Multimedia Interface) input port 161, a component input jack 162, and a USB port 163. In addition to those illustrated, the input/output unit 160 may include at least one of ports such as RGB, DVI, HDMI, DP, and Thunderbolt.

The audio output unit 170 is a component for outputting audio, and may output, for example, audio included in a broadcast signal received through the tuner 130, audio input through the communication unit 140 or the input/output unit 160, or audio included in an audio file stored in the storage unit 180. The audio output unit 170 may include a speaker 171 and a headphone output terminal 172.

The storage unit 180 may include various application programs, data, and software modules for driving and controlling the display device 100 under the control of the processor 120. For example, the storage unit 180 may include a web parsing module that parses web content data received through the Internet, a JavaScript module, a graphics processing module, a voice recognition result processing module, and an input processing module.

If voice recognition is performed by the display device 100 itself rather than by the external server 300, the storage unit 180 may store a voice recognition module that includes various voice recognition engines for various languages.

The storage unit 180 may store data for composing the various UI screens provided on the display 110. The storage unit 180 may also store data for generating control signals corresponding to various user interactions.

The storage unit 180 may be implemented as a non-volatile memory, a volatile memory, a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or the like. The storage unit 180 may be implemented not only as a storage medium inside the display device 100 but also as an external storage medium, for example, a micro SD card, a USB memory, or a web server accessed over a network.

The processor 120 controls the overall operation of the display device 100 and the signal flow between the internal components of the display device 100, and performs data processing.

The processor 120 includes a RAM 121, a ROM 122, a CPU 123, and a bus 124. The RAM 121, the ROM 122, the CPU 123, and the like may be connected to one another through the bus 124. The processor 120 may be implemented as a system on chip (SoC).

The CPU 123 accesses the storage unit 180 and boots using the O/S stored in the storage unit 180. It then performs various operations using the programs, content, data, and the like stored in the storage unit 180.

The ROM 122 stores an instruction set for system booting and the like. When a turn-on command is input and power is supplied, the CPU 123 copies the O/S stored in the storage unit 180 to the RAM 121 according to the instructions stored in the ROM 122, and executes the O/S to boot the system. When booting is complete, the CPU 123 copies the various application programs stored in the storage unit 180 to the RAM 121 and executes the application programs copied to the RAM 121 to perform various operations.

The processor 120 may perform various operations using the modules stored in the storage unit 180. For example, the processor 120 may parse and process web content data received through the Internet and display the overall layout of the content and each of its objects on the display 110.

When the voice recognition function is activated, the processor 120 may analyze the objects of the web content to find the objects that can be controlled by voice, pre-process information such as the position of each object, the operation related to the object, and whether the object contains text, and store the pre-processing result in the storage unit 180.
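The pre-processing pass described above might look like the following sketch. The `ScreenObject` type, its fields, and the rule that an object is voice-controllable when it has a non-empty `action` are assumptions made for illustration, not details taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ScreenObject:
    text: str                  # visible label, possibly empty (e.g. an image)
    position: Tuple[int, int]  # (x, y) on the screen
    action: str                # e.g. a URL or application ID; empty if none


def preprocess(objects: List[ScreenObject]) -> List[Dict]:
    """Collect the voice-controllable objects with the metadata to be stored."""
    table = []
    for obj in objects:
        if not obj.action:
            continue  # assumption: objects without an action are not controllable
        label = obj.text.strip()
        table.append(
            {
                "text": label,
                "has_text": bool(label),
                "position": obj.position,
                "action": obj.action,
            }
        )
    return table
```

Doing this once when voice recognition is activated means a later utterance only needs a lookup in the stored table rather than a fresh scan of the page.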

Based on the pre-processed object information, the processor 120 may control the display 110 so that the objects that can be controlled (selected) by voice are displayed distinguishably. For example, the processor 120 may control the display 110 to display voice-controllable objects in a color different from that of other objects.

The processor 120 may recognize the voice input through the microphone 150 as text using a voice recognition engine. In this case, the processor 120 uses the voice recognition engine for the predetermined language (the language set as the basis for voice recognition). Alternatively, the processor 120 may send the voice signal and information on the language that is the basis for voice recognition to the server 300 and receive text from the server 300 as the voice recognition result.

The processor 120 may then search the pre-processed objects for an object corresponding to the voice recognition result, and indicate at the position of the found object that the object has been selected. For example, the processor 120 may control the display 110 to highlight the object selected by voice. Based on the pre-processed object information, the processor 120 may perform the operation related to the object corresponding to the voice recognition result and output the result through the display 110 or the audio output unit 170.
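The lookup and selection step can be sketched as follows. The table layout (dictionaries with `text`, `position`, and `action` keys) and the exact, case-insensitive match are hypothetical simplifications; a practical matcher would likely tolerate partial or fuzzy matches.

```python
from typing import Dict, List, Optional


def find_object(table: List[Dict], recognized: str) -> Optional[Dict]:
    """Return the pre-processed entry whose label equals the recognized text."""
    needle = recognized.strip().casefold()
    if not needle:
        return None
    for entry in table:
        if entry["text"].casefold() == needle:
            return entry
    return None


def select_by_voice(table: List[Dict], recognized: str) -> Optional[Dict]:
    """Describe what to highlight and which operation to run, if anything."""
    entry = find_object(table, recognized)
    if entry is None:
        return None
    return {"highlight_at": entry["position"], "run": entry["action"]}
```

For example, with a table containing an entry labelled "News" at position (10, 20), `select_by_voice(table, "news")` returns that position for highlighting together with the entry's action.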

FIG. 11 is a flowchart illustrating a control method of the display device 100 according to an embodiment of the present disclosure. The flowchart shown in FIG. 11 may consist of operations processed by the display device 100 described in this specification. Therefore, even where omitted below, the description given for the display device 100 also applies to the flowchart shown in FIG. 11.

Referring to FIG. 11, the display device 100 first displays a UI screen including a plurality of text objects (S1110).

Then, for each text object in a language different from the predetermined language among the plurality of text objects displayed on the display device, a preset number is displayed together with the object (S1120). Here, the predetermined language means the language determined in advance as the basis for voice recognition. The language that serves as the basis for voice recognition may be the language set as the default language, may be set manually by the user, or may be set automatically based on the language of the objects displayed on the display device 100. In the case of automatic setting, the language of the objects may be identified, for example, by applying optical character recognition (OCR) to the objects displayed on the display device 100.
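One way to picture the automatic setting is to pick the writing system that dominates the text already extracted from the displayed objects. The sketch below is an assumption-laden illustration: the function names are hypothetical, the first word of each character's Unicode name is used as a rough script label, and the OCR step itself is out of scope. A script is not the same as a language (Latin script covers many languages), so a real system would need a proper language identifier.

```python
import unicodedata
from collections import Counter
from typing import Iterable, Optional


def script_of(ch: str) -> str:
    """Rough script label: the first word of the character's Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def dominant_script(texts: Iterable[str]) -> Optional[str]:
    """Return the script used by most letters across all object texts."""
    counts = Counter(
        script_of(ch) for text in texts for ch in text if ch.isalpha()
    )
    if not counts:
        return None  # no letters on screen, so nothing to infer
    return counts.most_common(1)[0][0]
```

For a screen whose labels are mostly Korean with a few Latin abbreviations, this sketch would report `HANGUL`, which a caller could then map to the Korean recognition engine.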

Then, if the recognition result of the voice uttered by the user includes a displayed number, the operation related to the text object corresponding to the displayed number is performed (S1130).
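Steps S1120 and S1130 can be sketched together as below. This is a minimal sketch, not the method of the disclosure: scripts stand in for languages, the function names are hypothetical, and the `min_ratio` threshold of 0.5 is an assumed value (the claims only speak of a preset ratio for text objects that mix languages).

```python
import unicodedata
from typing import Dict, List, Optional


def script_of(ch: str) -> str:
    """Rough script label: the first word of the character's Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def is_in_script(text: str, script: str, min_ratio: float = 0.5) -> bool:
    """True if at least min_ratio of the letters belong to the given script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    hits = sum(1 for ch in letters if script_of(ch) == script)
    return hits / len(letters) >= min_ratio


def assign_numbers(texts: List[str], script: str) -> Dict[int, str]:
    """S1120: number the text objects that are not in the recognition script."""
    numbered: Dict[int, str] = {}
    for text in texts:
        if not is_in_script(text, script):
            numbered[len(numbered) + 1] = text
    return numbered


def resolve(recognized: str, numbered: Dict[int, str]) -> Optional[str]:
    """S1130: map a spoken number in the recognition result to its object."""
    for token in recognized.split():
        if token.isdecimal() and int(token) in numbered:
            return numbered[int(token)]
    return None
```

With the recognition script set to `LATIN`, the labels "News" and "Sports" stay unnumbered while Korean labels receive 1, 2, and so on, so the user can select them by saying the number instead of the untranscribable text.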

The recognition result of the voice uttered by the user may be obtained through voice recognition performed by the display device itself, or may be received after requesting voice recognition from an external server that performs voice recognition for a plurality of different languages. In the latter case, the display device 100 provides the external server with the voice signal corresponding to the voice uttered by the user and with information on the language set as the basis for voice recognition, and, if the voice recognition result received from the external server includes a displayed number, may perform the operation related to the text object corresponding to the displayed number.

For example, if the text object is hyperlink text in a web page, the web page at the URL corresponding to the text object may be displayed, and if the text object is an icon for launching an application, that application may be executed.

Meanwhile, the UI screen including the plurality of text objects may be an execution screen of a first application. The execution screen of the first application may be any screen provided by the first application. If it is determined, while the execution screen of the first application is displayed, that no object corresponding to the recognition result of the voice uttered by the user exists on the execution screen of the first application, the display device may execute a second application different from the first application to perform an operation corresponding to the recognition result of the voice. Here, the first application may be a web browsing application, and the second application may be an application that searches various sources, such as the Internet, data stored in the display device, VOD content, and channel information (e.g., EPG). For example, if the currently displayed web page has no object corresponding to the voice recognition result, the display device may execute another application to provide search results corresponding to the voice recognition result (for example, Google search results, VOD search results, or channel search results).

According to the various embodiments described above, voice control is possible for objects composed in various languages, and voice search becomes easier.

Meanwhile, the various embodiments described above may be implemented in a recording medium readable by a computer or a similar device, using software, hardware, or a combination thereof. In a hardware implementation, the embodiments described in the present disclosure may be implemented using at least one of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions. In some cases, the embodiments described in this specification may be implemented by the processor 120 itself. In a software implementation, embodiments such as the procedures and functions described in this specification may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described in this specification.

Meanwhile, computer instructions for performing the processing operations of the display device 100 according to the various embodiments of the present disclosure described above may be stored in a non-transitory computer-readable medium. When executed by the processor of a specific device, the computer instructions stored in such a non-transitory computer-readable medium cause the specific device to perform the processing operations of the display device 100 according to the various embodiments described above.

A non-transitory computer-readable medium refers to a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only for a short time, such as a register, a cache, or a memory. Specific examples of a non-transitory computer-readable medium include a CD, a DVD, a hard disk, a Blu-ray disc, a USB device, a memory card, and a ROM.

Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described. Various modifications may of course be made by those of ordinary skill in the technical field to which the disclosure pertains without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present disclosure.

100: display device
110: display
120: processor

Claims

A display device comprising:
a display; and
a processor configured to:
control the display to display a UI screen including a plurality of text objects;
identify, among the plurality of text objects, a first text object including text in a first language, which is the basis for voice recognition, and a second text object including text in a second language different from the first language;
control the display to display a preset number adjacent to the second text object among the plurality of text objects; and
perform an operation related to the second text object when a result of the voice recognition for a voice uttered by a user includes the preset number.

(Deleted)

The display device of claim 1,
wherein the UI screen is a web page, and
the processor is configured to
set a language corresponding to language information of the web page as the first language.

The display device of claim 1,
wherein the processor is configured to
determine that a text object composed of two or more languages among the plurality of text objects is the second text object when the proportion of the first language in the text object is less than a preset ratio.

(Deleted)

The display device of claim 1,
further comprising a communication unit configured to communicate with an external device,
wherein the processor is configured to
control the display to display the preset number while a signal corresponding to selection of a specific button of the external device is being received.

The display device of claim 6,
wherein the external device includes a microphone,
the communication unit is configured to
receive a voice signal corresponding to a voice input through the microphone of the external device, and
the processor is configured to
perform the operation related to the second text object when a result of the voice recognition for the received voice signal includes the preset number.

The display device of claim 7,
wherein the processor is configured to
perform an operation related to the first text object when the result of the voice recognition for the received voice signal corresponds to the first text object.

The display device of claim 1,
wherein the operation related to the second text object is
an operation of displaying a web page at a URL corresponding to the second text object or an operation of executing an application program corresponding to the second text object.

The display device of claim 1,
wherein the plurality of text objects are included in an execution screen of a first application, and
the processor is configured to,
when it is determined, while the execution screen of the first application is displayed, that no object corresponding to the result of the voice recognition for the voice uttered by the user exists on the execution screen of the first application, execute a second application different from the first application to perform an operation corresponding to the result of the voice recognition.

The display device of claim 10,
wherein the second application is an application that provides search results for a search term, and
the processor is configured to,
when it is determined, while the execution screen of the first application is displayed, that no object corresponding to the result of the voice recognition for the voice uttered by the user exists on the execution screen of the first application, execute the second application to provide search results using text corresponding to the result of the voice recognition as the search term.

The display device of claim 1,
further comprising a communication unit configured to communicate with a server that performs voice recognition for a plurality of different languages,
wherein the processor is configured to
control the communication unit to provide the server with a voice signal corresponding to the voice uttered by the user and information on the first language, and, when a result of voice recognition received from the server includes the preset number, perform the operation related to the second text object corresponding to the preset number.

The display device of claim 12,
wherein the processor is configured to,
when the result of voice recognition received from the server corresponds to the first text object, perform an operation related to the first text object.

A control method of a display device, the method comprising:
displaying a UI screen including a plurality of text objects;
identifying, among the plurality of text objects, a first text object including text in a first language, which is the basis for voice recognition, and a second text object including text in a second language different from the first language;
displaying a preset number adjacent to the second text object among the plurality of text objects; and
performing an operation related to the second text object when a result of the voice recognition for a voice uttered by a user includes the preset number.

(Deleted)

The control method of claim 14,
wherein the UI screen is a web page, and
the control method
further comprises setting a language corresponding to language information of the web page as the first language.

The control method of claim 14,
further comprising determining that a text object composed of two or more languages among the plurality of text objects is the second text object when the proportion of the first language in the text object is less than a preset ratio.

(Deleted)

The control method of claim 14,
wherein the displaying of the preset number comprises
displaying the preset number while a signal corresponding to selection of a specific button of an external device is being received from the external device.

A computer-readable recording medium storing a program for executing a control method of a display device,
the control method comprising:
controlling the display device to display a plurality of text objects;
identifying, among the plurality of text objects, a first text object including text in a first language, which is the basis for voice recognition, and a second text object including text in a second language different from the first language;
controlling the display device to display a preset number adjacent to the second text object among the plurality of text objects; and
performing an operation related to the second text object when a result of the voice recognition for a voice uttered by a user includes the preset number.

The display device of claim 1,
wherein the processor is configured to
control the display to display, together with the first text object among the plurality of text objects, an icon indicating that the first text object can be selected by voice recognition.