KR102049833B1

KR102049833B1 - Interactive server, display apparatus and controlling method thereof

Info

Publication number: KR102049833B1
Application number: KR1020190094895A
Authority: KR
Inventors: 정지혜; 김명재; 신용욱; 이보라; 이진식; 이청재
Original assignee: 삼성전자주식회사
Priority date: 2019-08-05
Filing date: 2019-08-05
Publication date: 2019-11-29
Also published as: KR20190096312A

Abstract

An interactive server, a display apparatus, and a control method thereof are disclosed. The interactive server according to the present invention includes: a communication unit that communicates with a display apparatus and receives an utterance voice signal including a first utterance element representing a target and a second utterance element representing an execution command; a storage unit that stores a plurality of indicators and a plurality of commands; an extraction unit that extracts, from the storage unit, an indicator corresponding to the first utterance element and a command corresponding to the second utterance element; and a control unit that combines the extracted indicator and command to generate response information corresponding to the utterance voice signal and transmits it to the display apparatus. The first utterance element is an utterance element determined based on the display state of an object displayed on the screen of the display apparatus. Accordingly, the interactive server can determine the user's intent for a wide variety of user utterances, generate the corresponding response information, and provide it to the display apparatus.

Description

Interactive server, display apparatus and controlling method

The present invention relates to an interactive server, a display apparatus, and a control method, and more particularly, to an interactive server, a display apparatus, and a control method for providing response information corresponding to a user's utterance.

In general, a display apparatus capable of voice recognition in an interactive system collects the user's uttered voice and transmits it to an external server connected through a network. The external server that receives the uttered voice analyzes it to determine the user's intent, generates the corresponding response information, and transmits that information to the display apparatus. The display apparatus can then execute a function or provide information corresponding to the user's uttered voice, based on the response information received from the external server.

However, such a conventional interactive system is limited in its ability to analyze the user's uttered voice and determine the user's intent from the analysis result. For example, for an utterance whose referent is clear, such as "Show the first content," the external server can analyze the utterance, correctly determine the user's intent, generate the corresponding response information, and transmit it to the display apparatus. The display apparatus can then display the first content requested by the user based on the response information.

By contrast, for an utterance whose referent is unclear, such as "Show this," the external server cannot clearly determine the user's intent from the utterance. In other words, because the conventional interactive system determines the user's intent, and performs the corresponding operation or provides information, only for predefined utterances, the user's utterances are restricted.

The present invention has been devised in view of the above need, and its object is to enable an interactive system to perform operations corresponding to a wide variety of user utterances.

To achieve the above object, an interactive server according to an embodiment of the present invention includes: a communication unit that communicates with a display apparatus and receives an utterance voice signal including a first utterance element representing a target and a second utterance element representing an execution command; a storage unit that stores a plurality of indicators and a plurality of commands; an extraction unit that extracts, from the storage unit, an indicator corresponding to the first utterance element and a command corresponding to the second utterance element; and a control unit that combines the extracted indicator and command to generate response information corresponding to the utterance voice signal and transmits it to the display apparatus. The first utterance element is an utterance element determined based on the display state of an object displayed on the screen of the display apparatus.

The first utterance element may include at least one of a pronoun, an ordinal number, and a direction.

In addition, the extraction unit may determine whether the first utterance element contains request information and, if it does, extract a command corresponding to the request information from the storage unit, and the control unit may add content information corresponding to the request information to the response information based on the extracted command.

The indicator may be an execution word for relatively referring to the target among the objects displayed on the screen of the display apparatus.

In addition, the indicator may be unique identification information of the objects displayed on the screen of the display apparatus, and the extraction unit may determine the target referred to by the first utterance element based on the conversation history of the display apparatus and extract the unique identification information corresponding to the determined target as the indicator.

The interactive server may further include a voice processing unit that converts the received utterance voice signal into text information.

Meanwhile, according to an embodiment of the present invention, a display apparatus includes: an input unit that receives a user's uttered voice; a communication unit that transmits an utterance voice signal for the uttered voice to a server apparatus; a display unit that displays a screen; and a control unit that, when response information including an indicator and a command is received from the server apparatus, selects the target referred to by the indicator based on the display state of the objects displayed on the screen of the display unit and performs an operation corresponding to the command on the selected target.

The utterance voice signal includes a first utterance element representing a target and a second utterance element representing an execution command. The first utterance element is an utterance element determined based on the screen display state of the display apparatus and may include at least one of a pronoun, an ordinal number, and a direction.

Meanwhile, according to an embodiment of the present invention, a control method of an interactive server includes: receiving, from a display apparatus, an utterance voice signal including a first utterance element representing a target and a second utterance element representing an execution command; extracting an indicator corresponding to the first utterance element and a command corresponding to the second utterance element; and combining the extracted indicator and command to generate response information corresponding to the utterance voice signal and transmitting it to the display apparatus. The first utterance element is an utterance element determined based on the display state of an object displayed on the screen of the display apparatus.

In addition, the extracting step may determine whether the first utterance element contains request information and, if it does, extract a command corresponding to the request information from a storage unit, and the transmitting step may add content information corresponding to the request information to the response information based on the extracted command.

In addition, the indicator may be unique identification information of the objects displayed on the screen of the display apparatus, and the extracting step may determine the target referred to by the first utterance element based on the conversation history of the display apparatus and extract the unique identification information corresponding to the determined target as the indicator.

The method may further include converting the received utterance voice signal into text information.

Meanwhile, according to an embodiment of the present invention, a control method of a display apparatus includes: receiving a user's uttered voice; transmitting an utterance voice signal for the uttered voice to a server apparatus; receiving response information including an indicator and a command from the server apparatus; and selecting the target referred to by the indicator based on the display state of the objects displayed on the screen and performing an operation corresponding to the command on the selected target.

As described above, according to various embodiments of the present invention, the interactive server in an interactive system can determine the user's intent for a wide variety of user utterances, generate the corresponding response information, and provide it to the display apparatus.

FIG. 1 is a first exemplary diagram of an interactive system that provides response information suited to a user's uttered voice according to an embodiment of the present invention;
FIG. 2 is a second exemplary diagram of an interactive system that provides response information suited to a user's uttered voice according to another embodiment of the present invention;
FIG. 3 is a block diagram of an interactive server according to an embodiment of the present invention;
FIG. 4 is an exemplary diagram of utterances spoken based on the display state of objects displayed on the screen of a display apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of a display apparatus according to an embodiment of the present invention;
FIG. 6 is a flowchart of a control method of an interactive server according to an embodiment of the present invention; and
FIG. 7 is a flowchart of a control method of a display apparatus according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a first exemplary diagram of an interactive system that provides response information suited to a user's uttered voice according to an embodiment of the present invention.

As shown in FIG. 1, an interactive system according to an embodiment of the present invention includes a display apparatus 100 and an interactive server 200.

The display apparatus 100 is an Internet-capable device and may be implemented as any of various electronic devices such as a smart TV, a mobile phone such as a smartphone, a desktop PC, a notebook computer, or a navigation device. When a user's uttered voice is input, the display apparatus 100 performs an operation corresponding to it. Specifically, the display apparatus 100 may output a response message corresponding to the user's uttered voice or perform a function corresponding to it. That is, the display apparatus 100 analyzes the input uttered voice to determine whether it can be handled internally and, depending on the result, either performs the function corresponding to the input uttered voice itself or performs an operation based on response information received from the interactive server 200.

For example, when the uttered voice "Volume up" is input, the display apparatus 100 may adjust the volume based on the control information, among its pre-stored control information, that corresponds to the input uttered voice.

As another example, when the uttered voice "How is the weather today?" is input, the display apparatus 100 transmits an utterance voice signal for the input uttered voice (hereinafter referred to as the uttered voice) to the interactive server 200. Here, the uttered voice may be an analog signal. The display apparatus 100 therefore converts the analog uttered voice into a digital signal and transmits it to the interactive server 200. The display apparatus 100 may then output the result for today's weather as voice or as a text-form image, based on the response information received from the interactive server 200.
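The two examples above amount to a dispatch decision on the device. The following is a minimal sketch of that decision, assuming the utterance has already been recognized as text and that the pre-stored control information is a simple lookup table. `LOCAL_CONTROL_INFO`, `handle_utterance`, the control codes, and the `send_to_server` callable are hypothetical names introduced for illustration and do not come from the patent.

```python
# Hypothetical pre-stored control information: utterances the device
# can act on without contacting the interactive server.
LOCAL_CONTROL_INFO = {
    "volume up": "VOLUME_UP",
    "volume down": "VOLUME_DOWN",
}


def handle_utterance(text, send_to_server):
    """Handle the utterance locally if possible, otherwise defer to the server.

    `send_to_server` is an assumed callable standing in for the round trip
    to the interactive server; it returns the server's response information.
    """
    control = LOCAL_CONTROL_INFO.get(text.strip().lower())
    if control is not None:
        return ("local", control)
    return ("server", send_to_server(text))
```

With this sketch, "Volume up" resolves from the local table, while "How is the weather today?" falls through to the server path.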

The interactive server 200 provides response information suited to the user's intent based on the user's uttered voice received from the display apparatus 100. Specifically, when the user's uttered voice is received from the display apparatus 100, the interactive server 200 extracts utterance elements from it and, based on the extracted utterance elements, generates and transmits response information related to the user's uttered voice. As described above, the user's uttered voice received from the display apparatus 100 may be a digital signal. Accordingly, when the uttered voice converted into a digital signal is received, the interactive server 200 generates text information from it, analyzes the generated text information to extract utterance elements, and generates response information corresponding to the user's uttered voice based on the extracted utterance elements. Generating text information from an uttered voice converted into a digital signal is a well-known technique, so a detailed description is omitted here.

Meanwhile, an utterance element is a core keyword within the user's uttered voice for performing the operation requested by the user, and the extracted utterance elements may be classified by target domain, user action, and main feature. As in the example above, when text information is generated for the user's uttered voice "How is the weather today?", the interactive server 200 may extract the utterance elements "today", "weather", and "how is it?". The interactive server 200 may then classify "today" and "weather" as utterance elements for the main feature (hereinafter, first utterance elements) and "how is it?" as an utterance element for the user action (hereinafter, the second utterance element). In addition, based on the extracted utterance elements, the interactive server 200 may classify the utterance element for the target domain (hereinafter, the third utterance element) as belonging to the web search domain. Once the first to third utterance elements have been classified from the text information for the user's uttered voice, the interactive server 200 receives weather information from an external server (not shown) that provides various content, generates response information containing it, and transmits the response information to the display apparatus 100. Accordingly, the display apparatus 100 may present today's weather information through at least one of voice and text, based on the response information received from the interactive server 200.
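The classification into main feature, user action, and target domain can be illustrated with a small keyword-based sketch. It assumes English text split on whitespace and fixed keyword lists; the patent does not specify how the analysis is done, so `FEATURE_WORDS`, `USER_ACTION_WORDS`, `DOMAIN_BY_FEATURE`, `UtteranceElements`, and `classify` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword lists; a real system would use a trained analyzer.
FEATURE_WORDS = {"today", "weather"}           # main feature  -> first utterance element
USER_ACTION_WORDS = {"how"}                    # user action   -> second utterance element
DOMAIN_BY_FEATURE = {"weather": "web_search"}  # target domain -> third utterance element


@dataclass
class UtteranceElements:
    features: List[str] = field(default_factory=list)
    user_actions: List[str] = field(default_factory=list)
    domain: Optional[str] = None


def classify(text):
    """Split text on whitespace and sort keywords into the three categories."""
    result = UtteranceElements()
    for token in (t.strip("?.,!").lower() for t in text.split()):
        if token in FEATURE_WORDS:
            result.features.append(token)
            # Keep the first domain implied by a feature keyword.
            result.domain = result.domain or DOMAIN_BY_FEATURE.get(token)
        elif token in USER_ACTION_WORDS:
            result.user_actions.append(token)
    return result
```

For "How is the weather today?" this sketch files "weather" and "today" as features, "how" as the user action, and infers the web search domain from "weather".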

Meanwhile, the interactive server 200 described above may include a first server 10 that generates text information from the user's uttered voice converted into a digital signal, and a second server 20 that generates response information corresponding to the uttered voice converted into text information. An interactive system that provides response information suited to the user's uttered voice through the display apparatus 100 and the first and second servers 10 and 20 is described in detail below.

FIG. 2 is a second exemplary diagram of an interactive system that provides response information suited to a user's uttered voice according to another embodiment of the present invention.

As shown in FIG. 2, when an uttered voice is input from a user, the display apparatus 100 converts the input uttered voice into a digital signal and transmits it to the first server 10. On receiving the uttered voice converted into a digital signal, the first server 10 generates text information for the user's uttered voice according to pre-stored specific patterns for various uttered voices and transmits it to the display apparatus 100.

The display apparatus 100, having received the text information for the user's uttered voice from the first server 10, transmits that text information to the second server 20. The second server 20 analyzes the received text information to extract utterance elements and, based on the extracted utterance elements, generates response information for performing an operation corresponding to the user's uttered voice and transmits it to the display apparatus 100.
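The message flow of FIG. 2, in which the display apparatus relays data between the two servers, can be sketched as follows. The two servers are modeled as plain callables; `run_turn`, `fake_first_server`, and `fake_second_server` are hypothetical stand-ins, and the byte-decoding "recognizer" is only a placeholder for real speech-to-text.

```python
def run_turn(digital_voice, first_server, second_server):
    """Relay one utterance through both servers, as the device does in FIG. 2."""
    # Device -> first server: digital voice signal; first server -> device: text.
    text = first_server(digital_voice)
    # Device -> second server: text; second server -> device: response information.
    response_info = second_server(text)
    return text, response_info


def fake_first_server(digital_voice):
    # Placeholder for speech-to-text: pretend the bytes already hold the words.
    return digital_voice.decode("utf-8")


def fake_second_server(text):
    # Placeholder for utterance analysis: wrap the text as response information.
    return {"utterance": text}
```

The point of the sketch is the ordering: the text produced by the first server passes back through the device before the second server sees it.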

So far, the operation of providing response information corresponding to a user's uttered voice in an interactive system consisting of the display apparatus 100 and the interactive server 200 according to the present invention has been described in outline. The components of the display apparatus 100 and the interactive server 200 are described in detail below.

FIG. 3 is a block diagram of an interactive server according to an embodiment of the present invention.

As shown in FIG. 3, the interactive server 200 includes a communication unit 210, a voice processing unit 220, a storage unit 230, an extraction unit 240, and a control unit 250.

The communication unit 210 communicates with the display apparatus 100, which provides the user's uttered voice. In particular, the communication unit 210 may receive a digital signal for an uttered voice that includes a first utterance element representing a target and a second utterance element representing an execution command. Here, the first utterance element is an utterance element classified as the main feature within the user's uttered voice. When the first utterance element is determined based on the display state of the objects displayed on the screen of the display apparatus 100, it may be an utterance element representing a target. That is, the first utterance element may be an utterance element that represents a target through at least one of a pronoun, an ordinal number, and a direction. The second utterance element is an utterance element classified as the user action within the user's uttered voice; in the present invention, this user action is referred to as an execution command.

For example, in the uttered voice "Execute this," "this" may be a first utterance element representing a pronoun, and "execute" may be a second utterance element representing an execution command. When a digital signal for an uttered voice including such first and second utterance elements is received, the voice processing unit 220 converts the received uttered voice into text information. Depending on the embodiment, the voice processing unit 220 may convert the received uttered voice into text using an STT (Speech to Text) algorithm. However, the present invention is not limited to this, and the communication unit 210 may instead receive text information for the user's uttered voice from the display apparatus 100. In this case, the display apparatus 100 receives the text information for the input uttered voice from a device such as the first server 10 described above and transmits it to the interactive server 200. When the text information for the user's uttered voice is received from the display apparatus 100 in this way, the voice processing unit 220 may be omitted.

When the user's uttered voice has been converted into text information, or text information for the user's uttered voice has been received from the display apparatus 100, the extraction unit 240 extracts from the storage unit 230 the indicator corresponding to the first utterance element and the command corresponding to the second utterance element. Here, the storage unit 230 is a storage medium that stores the various programs needed to operate the interactive server 200 and may be implemented as a memory, an HDD (Hard Disk Drive), or the like. For example, the storage unit 230 may include a ROM that stores programs for performing the operations of the control unit 250, described later, and a RAM that temporarily stores data produced by those operations. The storage unit 230 may further include an EEPROM (Electrically Erasable and Programmable ROM) that stores various reference data. In particular, the storage unit 230 may store a plurality of indicators and a plurality of commands. The indicators and commands are execution information that enables the display apparatus 100 to perform an operation in a form it can interpret, based on the utterance elements extracted from the user's uttered voice. Specifically, an indicator is an execution word for relatively referring to a target among the objects displayed on the screen of the display apparatus 100. In other words, an indicator is an execution word that enables the display apparatus 100 to perform an operation in a form it can interpret, based on a first utterance element that represents a target, such as a pronoun, an ordinal number, or a direction, among the utterance elements extracted from the user's uttered voice.

Accordingly, the storage unit 230 may store a table in which an indicator is matched to each first utterance element representing a target, as shown in Table 1 below.

Table 1
First utterance element | Indicator
이거 (this) | $this$
다음 (next) | $this$+1
세번째 (third) | $3rd$

As shown in Table 1, the indicator corresponding to the first utterance element "this" may be $this$, the indicator corresponding to the first utterance element "next" may be $this$+1, and the indicator corresponding to the first utterance element "third" may be $3rd$.

The plurality of commands stored in the storage unit 230 are execution information that enables the display apparatus 100 to perform an operation in a form it can interpret, based on the second utterance element representing an execution command among the utterance elements extracted from the user's uttered voice. Accordingly, the storage unit 230 may store a table in which a command is matched to each second utterance element representing an execution command, as shown in Table 2 below.

Table 2

Second utterance element                                  Command
run it (실행해줘): execution                              execute
show me (보여줘), what is (뭐야): information output      show

As shown in Table 2, the command corresponding to the second utterance element "run it" (실행해줘) may be "execute", and the command corresponding to "show me" (보여줘) or "what is" (뭐야) may be "show".

As set out in Tables 1 and 2, the storage unit 230 may store an indicator for each first utterance element representing a target and a command for each second utterance element representing an execution command. Accordingly, the extractor 240 may extract the first and second utterance elements from the user's spoken voice, which the speech processor 220 has converted into text information, and may then extract the indicator and command corresponding to those elements from the storage unit 230.
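The lookup described above can be sketched in Python as follows. This is only an illustration: the dictionary names, the function name extract, and the use of simple substring matching are assumptions, since the description does not say how the extractor 240 locates utterance elements in the text. Only the table entries themselves come from Tables 1 and 2.

```python
# Table 1: first utterance element -> indicator (entries from the description).
INDICATORS = {
    "이거": "$this$",      # "this"
    "다음": "$this$+1",    # "next"
    "세번째": "$3rd$",     # "third"
}

# Table 2: second utterance element -> command (entries from the description).
COMMANDS = {
    "실행해줘": "execute",  # "run it"
    "보여줘": "show",       # "show me"
    "뭐야": "show",         # "what is"
}


def extract(text):
    """Return (indicator, command) for the utterance text.

    Substring matching is an assumption made for this sketch; each value is
    None when no table entry occurs in the text.
    """
    indicator = next((v for k, v in INDICATORS.items() if k in text), None)
    command = next((v for k, v in COMMANDS.items() if k in text), None)
    return indicator, command
```

For the utterance "이거 실행해줘" ("Run this"), this sketch yields the indicator "$this$" and the command "execute", matching the example in the description.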

The controller 250 combines the indicator and command corresponding to the first and second utterance elements extracted by the extractor 240 to generate response information corresponding to the user's spoken voice, and transmits it to the display apparatus 100 through the communication unit 210. For example, for the spoken voice "Run this" (이거 실행해줘), the extractor 240 may extract the first utterance element "this", which represents a target, and the second utterance element "run it", which represents an execution command. Once these elements are extracted, the extractor 240 extracts the corresponding indicator and command from the storage unit 230. That is, based on Tables 1 and 2, the extractor 240 may extract the indicator "$this$" corresponding to the first utterance element "this" and the command "execute" corresponding to the second utterance element "run it".

When the indicator and command corresponding to the first and second utterance elements are extracted in this way, the controller 250 combines them to generate an execution command script. That is, the controller 250 may combine the indicator "$this$" corresponding to the first utterance element and the command "execute" corresponding to the second utterance element "run it" to generate the execution command script "execute($this$)".

As another example, for the spoken voice "Run the third one" (세번째 것 실행해줘), the extractor 240 may extract the first utterance element "third", which represents a target, and the second utterance element "run it", which represents an execution command. Once these elements are extracted, the extractor 240 extracts the corresponding indicator and command from the storage unit 230. That is, based on Tables 1 and 2, the extractor 240 may extract the indicator "$3rd$" corresponding to the first utterance element "third" and the command "execute" corresponding to the second utterance element "run it".

When the indicator and command are extracted in this way, the controller 250 combines them to generate an execution command script. That is, the controller 250 may combine the indicator "$3rd$" corresponding to the first utterance element and the command "execute" corresponding to the second utterance element "run it" to generate the execution command script "execute($3rd$)".

As a further example, for the spoken voice "Run the next one" (다음 거 실행해줘), the extractor 240 may extract the first utterance element "next", which represents a target, and the second utterance element "run it", which represents an execution command. Once these elements are extracted, the extractor 240 extracts the corresponding indicator and command from the storage unit 230. That is, based on Tables 1 and 2, the extractor 240 may extract the indicator "$this$+1" corresponding to the first utterance element "next" and the command "execute" corresponding to the second utterance element "run it".

When the indicator and command are extracted in this way, the controller 250 combines them to generate an execution command script. That is, the controller 250 may combine the indicator "$this$+1" corresponding to the first utterance element "next" and the command "execute" corresponding to the second utterance element "run it" to generate the execution command script "execute($this$+1)".
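The three examples above share one combination step, which might be sketched as below. The function name build_script is hypothetical; the output format, with the command followed by the indicator in parentheses, follows the scripts quoted in the description.

```python
def build_script(indicator: str, command: str) -> str:
    """Combine a command and an indicator into an execution command script."""
    return f"{command}({indicator})"


# The three indicator/command pairs from the examples in the description.
scripts = [
    build_script("$this$", "execute"),
    build_script("$3rd$", "execute"),
    build_script("$this$+1", "execute"),
]
```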

When such an execution command script is generated, the controller 250 transmits response information including the script to the display apparatus 100. Based on the execution command script included in the response information received from the interactive server 200, the display apparatus 100 may then select, from the objects displayed on the screen, the object corresponding to the target the user referred to, and display the selected object.

The following describes in detail how a user utters a spoken voice that includes the first and second utterance elements based on the display state of the objects displayed on the screen of the display apparatus 100.

FIG. 4 is an exemplary diagram of a spoken voice uttered based on the display state of objects displayed on the screen of a display apparatus according to an embodiment of the present invention.

As shown in FIG. 4, the display apparatus 100 may receive and display content 410 through a channel requested by the user. The display apparatus 100 may also display on the screen a content list 420 for content requested by the user, based on a user command input through a remote control or the user's spoken voice. As illustrated, the content list 420 may show content information 421 to 425 for the episodes of the first content broadcast so far. The user may refer to the per-episode content information 421 to 425 shown in the content list 420 and speak in order to watch the first content of a particular episode. For example, the user may say "Run this" to watch the first content corresponding to the first-episode content information 421, or "Run the third one" to watch the first content corresponding to the third-episode content information 423.

In this way, the user may utter speech that includes a first utterance element representing the episode of the first content (the target) to be watched from the content list 420 displayed on the screen of the display apparatus 100, and a second utterance element representing an execution command for watching that episode. Accordingly, when the indicator and command corresponding to the first and second utterance elements in the spoken voice are extracted from the storage unit 230, the controller 250 may generate an execution command script from the combination of the extracted indicator and command.

Meanwhile, according to a further aspect of the present invention, the extractor 240 determines whether the first utterance element contains request information and, if it does, extracts the command corresponding to the request information. Based on the command extracted by the extractor 240, the controller 250 may then add content information corresponding to the request information to the response information and transmit it to the display apparatus 100. To this end, the storage unit 230 may additionally store commands for request information. For example, the request information "detailed information" (상세 정보) may be stored in association with the command "detail information", and the request information "title" (제목) may be stored in association with the command "title".

For example, for the spoken voice "What is the title of this?" (이거 제목이 뭐야), the extractor 240 may extract the first utterance elements "this" and "title" and the second utterance element "what is", which represents an execution command. Here, the first utterance element "this" represents the target, and the first utterance element "title" represents the request information. Once the first and second utterance elements are extracted, the extractor 240 may extract from the storage unit 230 the indicator "$this$" corresponding to "this", the command "title" corresponding to "title", and the command "show" corresponding to "what is". The controller 250 then combines the extracted indicator and commands to generate an execution command script. That is, the controller 250 may combine the indicator "$this$", the request-information command "title", and the execution command "show" to generate the execution command script "show(title) at ($this$)".
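A script that carries request information might be assembled as in the sketch below. The function name build_request_script and its parameter names are hypothetical; the output format follows the single example "show(title) at ($this$)" given in the description, and how other request commands would be rendered is not stated there.

```python
def build_request_script(indicator: str, command: str, request: str) -> str:
    """Combine an execution command, a request-information command and an
    indicator, following the 'show(title) at ($this$)' example."""
    return f"{command}({request}) at ({indicator})"


# "What is the title of this?" -> indicator $this$, request title, command show
script = build_request_script("$this$", "show", "title")
```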

When such an execution command script is generated, the controller 250 determines whether the script contains a command representing request information. If it does, the controller 250 determines whether to obtain the content information corresponding to the request information, based on the history of the conversation with the display apparatus 100 stored in the storage unit 230. For example, based on the spoken voice "Show me action movies" (액션 영화 보여줘) received before the spoken voice "What is the title of this?", the controller 250 may transmit response information including content information on action movies to the display apparatus 100. Then, when an execution command script containing a command representing request information is generated as described above, the controller 250 may, based on the previous conversation history, obtain title information for the relevant content from EPG information stored in the storage unit 230 or receive it from an external server (not shown). The controller 250 may then generate response information including the previously generated execution command script and the title information, and transmit it to the display apparatus 100.

However, the present invention is not limited to this, and the controller 250 may instead transmit to the display apparatus 100 response information containing only the execution command script that includes the command representing request information. In this case, the display apparatus 100 may interpret the execution command script included in the response information received from the interactive server 200, select the object on the screen corresponding to the target the indicator refers to, and perform the operation corresponding to the command on the selected object. Accordingly, the display apparatus 100 may obtain the title information of the content corresponding to the selected object from pre-stored EPG information, or receive it from an external server (not shown), and output it.

Meanwhile, according to a further aspect of the present invention, the indicators stored in the storage unit 230 may be unique identification information of the objects displayed on the screen of the display apparatus 100. In this case, the extractor 240 may determine the target referred to by the first utterance element extracted from the user's spoken voice, based on the conversation history of the display apparatus 100, and extract the unique identification information corresponding to the determined target as the indicator.

Specifically, the display apparatus 100 and the interactive server 200 may share unique identification information for the content displayed on the screen of the display apparatus 100. Here, each piece of unique identification information identifies the content currently displayed on the display apparatus 100 or the content to be provided at the user's request. For example, as described with reference to FIG. 4, the display apparatus 100 may display on the screen the content 410 and the content list 420 containing the per-episode content information 421 to 425. In this case, the content 410 may be assigned unique identification information (#1234) indicating that it is currently being displayed, and the content list 420 may be assigned unique identification information (#5678) different from that of the currently displayed content 410.

Accordingly, when the first and second utterance elements are extracted from the user's spoken voice, the extractor 240 may determine the target referred to by the first utterance element, obtain the unique identification information corresponding to that target from the storage unit 230, and use it as the indicator. For example, for the spoken voice "Run this" (이거 실행해줘), the extractor 240 may extract the first utterance element "this". The extractor 240 may then extract the indicator $this$ corresponding to the first utterance element "this" from the storage unit 230. From the extracted indicator, the extractor 240 may determine that the target referred to by the first utterance element is different from the content 410 currently displayed on the screen of the display apparatus 100. Therefore, the extractor 240 may convert the indicator $this$ corresponding to the first utterance element "this" into the unique identification information (#5678).

Meanwhile, for the spoken voice "What is the title of what I'm watching?" (보고 있는 거 제목이 뭐야), the extractor 240 may extract the first utterance element "what I'm watching" (보고 있는 거). The extractor 240 may then extract the indicator $showing_content$ corresponding to this first utterance element from the storage unit 230. From the extracted indicator, the extractor 240 may determine that the target referred to by the first utterance element is the content currently displayed on the screen of the display apparatus 100. Therefore, the extractor 240 may convert the indicator $showing_content$ corresponding to the first utterance element "what I'm watching" into the unique identification information (#1234).
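The conversion of a relative indicator into shared unique identification information might be sketched as follows. The dictionary layout of the screen state and the function name resolve_indicator are hypothetical; only the two indicators and the identifiers #1234 and #5678 come from the description, and the mapping of $this$ to the content list simply mirrors the example above.

```python
# Hypothetical screen state shared by the display apparatus and the server.
SCREEN_STATE = {
    "current_content_id": "#1234",  # content 410, currently displayed
    "content_list_id": "#5678",     # content list 420
}


def resolve_indicator(indicator: str, screen_state: dict) -> str:
    """Convert a relative indicator into unique identification information."""
    if indicator == "$showing_content$":
        # Refers to the content currently being displayed.
        return screen_state["current_content_id"]
    if indicator == "$this$":
        # Refers to a target other than the currently displayed content.
        return screen_state["content_list_id"]
    raise ValueError(f"unsupported indicator: {indicator}")
```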

So far, each component of the interactive server 200 according to the present invention has been described in detail. The following describes in detail each component of the display apparatus 100 that receives the user's spoken voice.

FIG. 5 is a block diagram of a display apparatus according to an embodiment of the present invention.

As shown in FIG. 5, the display apparatus 100 includes an input unit 110, a communication unit 120, a display unit 130, and a controller 140.

The input unit 110 receives the spoken voice uttered by the user. Specifically, when the user's spoken voice is input in analog form, the input unit 110 samples it and converts it into a digital signal. If the input voice contains noise (for example, the sound of an air conditioner or a vacuum cleaner), the input unit 110 preferably removes the noise and then converts the noise-free voice into a digital signal. In addition, the input unit 110 may receive various user manipulations and pass them to the controller 140. In this case, the input unit 110 may receive user manipulation commands through a touch pad, a keypad with function keys, numeric keys, special keys, character keys, and the like, or a touch screen.

The communication unit 120 transmits the user's spoken voice input through the input unit 110 to a server apparatus (hereinafter, the interactive server) and receives response information corresponding to the transmitted spoken voice. The communication unit 120 may include various communication modules such as a short-range wireless communication module (not shown) and a wireless communication module (not shown). The short-range wireless communication module is a communication module that performs wireless communication with the interactive server 200 located nearby and with an external server (not shown) that provides content; it may use, for example, Bluetooth or ZigBee. The wireless communication module is a module that connects to an external network and communicates according to a wireless communication protocol such as WiFi or IEEE. The wireless communication module may further include a mobile communication module that connects to a mobile communication network and communicates according to various mobile communication standards such as 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), and LTE (Long Term Evolution).

The display unit 130 may be implemented as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a plasma display panel (PDP), or the like, and may present the various display screens that the display apparatus 100 can provide. In particular, the display unit 130 displays content or content information corresponding to the user's spoken voice, based on the response information received from the interactive server 200.

The controller 140 controls the overall configuration of the display apparatus 100. In particular, when response information including an execution command script generated from a combination of an indicator and a command is received from the interactive server 200, the controller 140 selects the target the indicator refers to, based on the display state of the objects displayed on the screen of the display unit 130. The controller 140 then performs the operation corresponding to the command on the selected target.

Specifically, as described with reference to FIG. 4, the display unit 130 may, under the control of the controller 140, receive and display the content 410 through a channel requested by the user. The display apparatus 100 may also display on the screen a content list 420 for content requested by the user, based on a user command input through a remote control or the user's spoken voice. In addition, the display apparatus 100 may, based on a preset condition, highlight the first-episode content information 421 among the per-episode content information 421 to 425 shown in the content list 420. That is, when the display apparatus 100 first displays the content list 420 containing the per-episode content information 421 to 425, it may, based on the preset condition, highlight the first-episode content information 421 at the top of the list so that it is selected preferentially.

Meanwhile, the user may refer to the per-episode content information 421 to 425 shown in the content list 420 and speak in order to watch the first content of a particular episode. For example, the user may say "Run this" (이거 실행해줘) to watch the first content corresponding to the first-episode content information 421, and the display apparatus 100 may receive that spoken voice. In this way, while a plurality of objects are displayed on the screen, the display apparatus 100 according to the present invention may receive a spoken voice that includes a first utterance element representing a target, which refers to one of the objects, and a second utterance element representing an execution command for executing that target. Here, the first utterance element may represent the target through at least one of a pronoun, an ordinal number, and a direction. For example, the spoken voice "Run this" may include a first utterance element that represents the target with a pronoun, and the spoken voice "Run the third one" (세번째 거 실행해줘) may include a first utterance element that represents the target with an ordinal number.

When a spoken voice including a first utterance element representing a target and a second utterance element representing an execution command for that target is input in this way, the communication unit 120 transmits the spoken voice to the interactive server 200. The interactive server 200 extracts the indicator and command corresponding to the first and second utterance elements included in the received spoken voice and generates an execution command script from their combination. As in the example above, when the spoken voice "Run this" is received, the interactive server 200 extracts the indicator "$this$" corresponding to the first utterance element "this", which represents the target, and the command "execute" corresponding to the second utterance element "run it", which represents the execution command. The interactive server 200 then combines the extracted indicator and command to generate the execution command script "execute($this$)", generates response information including this script, and transmits it to the display apparatus 100.

When such response information is received, the controller 140 may interpret the execution command script included in it and perform the operation corresponding to the user's spoken voice. As in the example above, when response information including the execution command script "execute($this$)" is received, the controller 140 interprets the script, selects one of the objects displayed on the screen, and executes the selected object. Specifically, the controller 140 interprets the execution command script to separate the indicator from the command. That is, from the execution command script "execute($this$)", the controller 140 can determine that the indicator is "$this$" and the command is "execute".
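The separation of an execution command script into its command and indicator can be illustrated with a minimal Python sketch. The description gives only the example scripts "execute($this$)" and "execute($3rd$)", so the grammar assumed here (a command name followed by a single `$`-delimited indicator in parentheses), the regular expression, and the function name `parse_script` are all hypothetical.

```python
import re

# Assumed grammar (not specified in the description): command(indicator),
# where the indicator is a word wrapped in '$' characters.
SCRIPT_PATTERN = re.compile(
    r"^\s*(?P<command>\w+)\s*\(\s*(?P<indicator>\$\w+\$)\s*\)\s*$"
)


def parse_script(script: str) -> tuple[str, str]:
    """Split an execution command script into (command, indicator)."""
    match = SCRIPT_PATTERN.match(script)
    if match is None:
        raise ValueError(f"unrecognized execution command script: {script!r}")
    return match.group("command"), match.group("indicator")


if __name__ == "__main__":
    print(parse_script("execute($this$)"))  # ('execute', '$this$')
```

A script that does not fit the assumed grammar raises `ValueError`, so a caller can decide how to handle it rather than acting on a partial parse.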

Once the indicator and the command have been separated from the execution command script, the controller 140 may select one of the objects displayed on the screen based on the indicator. As shown in FIG. 4, the display unit 130 displays the content 410 received through the channel requested by the user and, together with it, may display the content list 420 containing the per-episode content information 421 to 425 for the first content requested by the user. The display unit 130 may also highlight the first-episode content information 421 among the per-episode content information 421 to 425 in the content list 420 based on a preset condition. That is, when the content list 420 containing the per-episode content information 421 to 425 is first displayed on the screen, the display unit 130 may, based on the preset condition, highlight the first-episode content information 421 located at the top so that it is selected preferentially. If a user manipulation command is then input through the input unit 110 while the first-episode content information 421 is highlighted, the display unit 130 may highlight the content information corresponding to that command (one of the remaining per-episode content information 422 to 425, excluding the first-episode content information 421). In this case, the content information highlighted in response to the user's manipulation command may be set to be selected preferentially.

Therefore, when the indicator separated from the execution command script is "$this$", the controller 140 may determine that it refers to the content information that is currently highlighted. That is, if the first-episode content information 421 is highlighted as in FIG. 4, the controller 140 may select the highlighted first-episode content information 421 based on the "$this$" indicator. When the first-episode content information 421 is selected, the controller 140 may, based on the command "execute" separated from the execution command script, receive the first content corresponding to the selected first-episode content information 421 from an external server (not shown) and display it.

As in the other example above, the interactive server 200 may generate response information including the execution command script "execute($3rd$)" from the user's spoken voice "Execute the third one" and transmit it to the display apparatus 100. When such response information is received, the controller 140 interprets the execution command script included in it to separate the indicator from the command. That is, from the execution command script "execute($3rd$)", the controller 140 can separate the indicator "$3rd$" and the command "execute". If the first-episode content information 421 is highlighted as in FIG. 4, the controller 140 may, based on the "$3rd$" indicator, select the third-episode content information 423, which is located third when counting from the highlighted first-episode content information 421. When the third-episode content information 423 is selected, the controller 140 may, based on the command "execute" separated from the execution command script, receive the first content corresponding to the selected third-episode content information 423 from an external server (not shown) and display it.
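The selection rule described above, in which "$this$" picks the highlighted item and "$3rd$" picks the third item counted from the highlighted one, can be sketched as follows. This is only an illustration under stated assumptions: the list of reference numerals stands in for the on-screen objects, the highlight is tracked as a list index, and the function name `resolve_indicator` and the ordinal pattern are hypothetical.

```python
import re

# Assumed ordinal indicator format, generalized from the "$3rd$" example.
ORDINAL_PATTERN = re.compile(r"^\$(\d+)(?:st|nd|rd|th)\$$")


def resolve_indicator(indicator: str, items: list[int], highlighted: int) -> int:
    """Return the on-screen item that the indicator refers to.

    items       -- objects shown in the content list, in display order
    highlighted -- index of the currently highlighted item (the reference point)
    """
    if indicator == "$this$":
        # A pronoun indicator refers to the highlighted item itself.
        return items[highlighted]
    match = ORDINAL_PATTERN.match(indicator)
    if match:
        # The highlighted item counts as the first, so the n-th item
        # lies n - 1 positions after it.
        index = highlighted + int(match.group(1)) - 1
        if not 0 <= index < len(items):
            raise IndexError(f"{indicator} is outside the displayed list")
        return items[index]
    raise ValueError(f"unsupported indicator: {indicator!r}")


if __name__ == "__main__":
    episode_items = [421, 422, 423, 424, 425]  # reference numerals from FIG. 4
    print(resolve_indicator("$this$", episode_items, highlighted=0))  # 421
    print(resolve_indicator("$3rd$", episode_items, highlighted=0))   # 423
```

Counting from the highlighted item rather than from the top of the list follows the description's example, where the highlight on 421 yields 423 for "$3rd$"; how the count behaves when the highlight has moved is not spelled out in the text, so the offset rule here is an assumption.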

So far, each component of the display apparatus 100 according to the present invention, which recognizes the user's spoken voice and performs an operation based on response information corresponding to the recognized spoken voice, has been described in detail. The following describes in detail the control methods of the interactive server 200, which provides response information corresponding to the user's spoken voice, and of the display apparatus 100, which performs an operation based on the response information.

FIG. 6 is a flowchart of a control method of an interactive server according to an embodiment of the present invention.

As shown in FIG. 6, the interactive server 200 receives from the display apparatus 100 the user's spoken voice, which includes a first utterance element indicating a target and a second utterance element indicating an execution command (S610). Here, the user's spoken voice is a voice signal that has been converted from an analog voice signal into a digital signal. The first utterance element is an utterance element classified as a main feature within the user's spoken voice. When the first utterance element is determined based on the display state of the objects displayed on the screen of the display apparatus 100, it may be an utterance element that indicates a target. That is, the first utterance element may indicate the target through at least one of a pronoun, an ordinal number, and a direction. The second utterance element may be an utterance element classified as an execution command within the user's spoken voice.

For example, in the spoken voice "Execute this", "this" may be a first utterance element expressed as a pronoun, and "execute" may be a second utterance element indicating an execution command. When the digital signal for a spoken voice including such first and second utterance elements is received, the interactive server 200 converts the received spoken voice into text information (S620). According to an embodiment, the interactive server 200 may convert the received spoken voice into text using a speech-to-text (STT) algorithm. However, the present invention is not limited to this, and the interactive server 200 may instead receive text information for the user's spoken voice from the display apparatus 100. In that case, the display apparatus 100 receives the text information for the input spoken voice from an ASR server, such as the first server 10 described above, and transmits it to the interactive server 200.

When the user's spoken voice has been converted into text information, or text information for the user's spoken voice has been received from the display apparatus 100, the interactive server 200 extracts, from the spoken voice in text form, the indicator corresponding to the first utterance element and the command corresponding to the second utterance element (S630). Specifically, the interactive server 200 may store a plurality of indicators and a plurality of commands. Here, the indicators and commands are execution information, in a form the display apparatus 100 can interpret, for performing an operation based on the utterance elements extracted from the user's spoken voice. More specifically, an indicator may be an execution word for referring to a target relatively among the objects displayed on the screen of the display apparatus 100. In other words, an indicator is an execution word, in a form the display apparatus 100 can interpret, for performing an operation based on a first utterance element that indicates a target, such as a pronoun, an ordinal number, or a direction. A command is execution information, in a form the display apparatus 100 can interpret, for performing an operation based on a second utterance element that indicates an execution command. Accordingly, as in Tables 1 and 2, the interactive server 200 may store tables in which an indicator is matched to each first utterance element indicating a target and a command is matched to each second utterance element indicating an execution command. The interactive server 200 can thus extract the indicator and the command corresponding to the first and second utterance elements from the pre-stored tables.
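A minimal Python sketch of the table lookup in step S630 and the combination that follows it is given below. The two dictionaries stand in for Tables 1 and 2 and hold only the pairs named in the text; the English keys are stand-ins for the Korean utterance elements, and the whitespace tokenization and the function name `build_script` are assumptions made for illustration, not the extraction method of the invention.

```python
# Stand-ins for Table 1 (first utterance element -> indicator) and
# Table 2 (second utterance element -> command). Only the pairs named in
# the description are listed; the English keys are illustrative.
INDICATOR_TABLE = {"this": "$this$", "third": "$3rd$"}
COMMAND_TABLE = {"execute": "execute"}


def build_script(utterance_text: str) -> str:
    """Look up the indicator and command for an utterance and combine them."""
    # Assumption: utterance elements can be found by simple word matching.
    tokens = utterance_text.lower().split()
    indicator = next(
        (INDICATOR_TABLE[t] for t in tokens if t in INDICATOR_TABLE), None
    )
    command = next((COMMAND_TABLE[t] for t in tokens if t in COMMAND_TABLE), None)
    if indicator is None or command is None:
        raise ValueError(f"no indicator/command pair found in {utterance_text!r}")
    return f"{command}({indicator})"


if __name__ == "__main__":
    print(build_script("Execute this"))           # execute($this$)
    print(build_script("Execute the third one"))  # execute($3rd$)
```

Keeping the indicators and commands in separate tables mirrors the description: the same command can be combined with any indicator without enumerating every utterance.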

When the indicator and the command corresponding to the first and second utterance elements have been extracted, the interactive server 200 combines them to generate response information corresponding to the spoken voice and transmits it to the display apparatus 100 (S640).

For example, for the user's spoken voice "Execute this", the interactive server 200 may extract the first utterance element "this", which indicates the target, and the second utterance element "execute", which indicates the execution command. Once the first and second utterance elements are extracted, the interactive server 200 extracts the corresponding indicator and command from the pre-stored tables. That is, as in the tables shown in Tables 1 and 2, the interactive server 200 may extract the indicator "$this$" corresponding to the first utterance element "this" and the command "execute" corresponding to the second utterance element "execute". The interactive server 200 may then combine the extracted indicator and command to generate the execution command script "execute($this$)".

As another example, for the user's spoken voice "Execute the third one", the interactive server 200 may extract the first utterance element "third", which indicates the target, and the second utterance element "execute", which indicates the execution command. Once the first and second utterance elements are extracted, the interactive server 200 extracts the corresponding indicator and command from the pre-stored tables. That is, as in the tables shown in Tables 1 and 2, the interactive server 200 may extract the indicator "$3rd$" corresponding to the first utterance element "third" and the command "execute" corresponding to the second utterance element "execute". The interactive server 200 may then combine the extracted indicator and command to generate the execution command script "execute($3rd$)".

When such an execution command script is generated, the interactive server 200 generates response information including the script and transmits it to the display apparatus 100. Based on the execution command script included in the response information received from the interactive server 200, the display apparatus 100 can then select, from among the objects displayed on the screen, the object corresponding to the target the user referred to, and display the selected object.

Meanwhile, the interactive server 200 determines whether the first utterance element contains request information and, if it does, extracts the command corresponding to the request information. Based on the extracted command, the interactive server 200 may then add content information corresponding to the request information to the response information and transmit it to the display apparatus 100. To this end, the interactive server 200 may additionally store a table in which a command is matched to each item of request information. For example, the interactive server 200 may store the request information "detailed information" matched with the command "detail information", and the request information "title" matched with the command "title".

For example, for the user's spoken voice "What is the title of this", the interactive server 200 may extract the first utterance elements "this" and "title" and the second utterance element "what", which indicates the execution command. Here, the first utterance element "this" indicates the target, and the first utterance element "title" indicates the request information. Once the first and second utterance elements are extracted, the interactive server 200 may refer to the pre-stored tables and extract the indicator "$this$" corresponding to the first utterance element "this", the command "title" corresponding to the first utterance element "title", and the command "show" corresponding to the second utterance element "what". The interactive server 200 may then combine the extracted indicator and commands to generate the execution command script "show(title) at ($this$)".
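The combination with request information can be sketched in Python as follows. The script format "show(title) at ($this$)" is taken from the example above; the tables list only the pairs named in the description, with English keys standing in for the Korean utterance elements, and the function name `combine` and its signature are hypothetical.

```python
from typing import Optional

# Illustrative tables containing only the pairs named in the description.
INDICATOR_TABLE = {"this": "$this$"}
COMMAND_TABLE = {"execute": "execute", "what": "show"}
REQUEST_TABLE = {"title": "title", "detailed information": "detail information"}


def combine(target: str, action: str, request: Optional[str] = None) -> str:
    """Combine already-extracted utterance elements into an execution command script."""
    indicator = INDICATOR_TABLE[target]
    command = COMMAND_TABLE[action]
    if request is None:
        # No request information: command applied directly to the indicator.
        return f"{command}({indicator})"
    # With request information: the request command is the argument and the
    # indicator names the object it applies to.
    return f"{command}({REQUEST_TABLE[request]}) at ({indicator})"


if __name__ == "__main__":
    print(combine("this", "what", request="title"))  # show(title) at ($this$)
    print(combine("this", "execute"))                # execute($this$)
```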

When such an execution command script is generated, the interactive server 200 determines whether the generated script contains a command indicating request information. If it does, the interactive server 200 determines, based on the pre-stored history of its conversation with the display apparatus 100, whether to obtain the content information corresponding to the request information. For example, before the user's spoken voice "What is the title of this", the interactive server 200 may have generated response information including content information about action movies, based on the earlier spoken voice "Show me action movies", and transmitted it to the display apparatus 100. When the spoken voice "What is the title of this" is then received, the interactive server 200 generates the execution command script for it through the steps described above. If the generated execution command script contains a command for request information, the interactive server 200 obtains title information for the relevant content from EPG information, or receives it from an external server (not shown), based on the pre-stored conversation history with the display apparatus 100. The interactive server 200 then generates response information including the generated execution command script and the title information and transmits it to the display apparatus 100.

However, the present invention is not limited to this, and the interactive server 200 may instead transmit to the display apparatus 100 response information for an execution command script that contains a command indicating request information. In this case, the display apparatus 100 interprets the execution command script included in the response information received from the interactive server 200, selects, from among the objects displayed on the screen, the object corresponding to the target the indicator refers to, and performs the operation corresponding to the command on the selected object. The display apparatus 100 can thus obtain the title information of the content corresponding to the selected object from pre-stored EPG information, or receive it through an external server (not shown), and output it.

Meanwhile, according to an additional aspect of the present invention, the indicators stored in the interactive server 200 may be unique identification information of the objects displayed on the screen of the display apparatus 100. Here, each item of unique identification information identifies either the content currently displayed on the display apparatus 100 or content to be provided at the user's request. For example, as described with reference to FIG. 4, the display apparatus 100 may display on the screen the content 410 and the content list 420 containing the per-episode content information 421 to 425. In this case, the content 410 may be assigned unique identification information (#1234) indicating that it is currently being displayed, and the content list 420 may be assigned unique identification information (#5678) that differs from that of the currently displayed content 410.

Accordingly, when the first and second utterance elements are extracted from the user's spoken voice, the interactive server 200 may determine the target referred to by the first utterance element, obtain the unique identification information corresponding to that target from the pre-stored unique identification information, and use it as the indicator. For example, for the spoken voice "Execute this", the interactive server 200 may extract the first utterance element "this". Once the first utterance element is extracted, the interactive server 200 may extract the indicator $this$ corresponding to "this" from the pre-stored indicators for each first utterance element. Through the extracted indicator, the interactive server 200 can determine that the target referred to by the first utterance element is different from the content 410 currently displayed on the screen of the display apparatus 100. The interactive server 200 may therefore convert $this$, the indicator corresponding to the first utterance element "this", into the unique identification information (#5678).
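The conversion of a relative indicator into unique identification information can be sketched as below. The identifiers #1234 and #5678 come from the example above; the dictionary layout, the object names, and the rule that "$this$" resolves to the single object other than the currently displayed content are assumptions drawn from that one example, and `to_unique_id` is a hypothetical name.

```python
# Unique identification information assumed to be stored on the server for
# the objects on screen (values taken from the FIG. 4 example).
SCREEN_OBJECTS = {
    "content_410": "#1234",       # content currently being displayed
    "content_list_420": "#5678",  # content list offered for selection
}
CURRENTLY_DISPLAYED = "content_410"


def to_unique_id(indicator: str) -> str:
    """Convert a relative indicator into the unique ID of the object it refers to."""
    if indicator != "$this$":
        raise ValueError(f"unsupported indicator: {indicator!r}")
    # Per the example, "$this$" refers to an object other than the content
    # that is currently being displayed.
    candidates = [
        uid for name, uid in SCREEN_OBJECTS.items() if name != CURRENTLY_DISPLAYED
    ]
    if len(candidates) != 1:
        raise LookupError("the target of $this$ is ambiguous")
    return candidates[0]


if __name__ == "__main__":
    print(to_unique_id("$this$"))  # #5678
```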

The following describes in detail the control method of the display apparatus 100, which performs an operation based on response information corresponding to the user's spoken voice.

FIG. 7 is a flowchart of a control method of a display apparatus according to an embodiment of the present invention.

As shown in FIG. 7, the display apparatus 100 receives the user's spoken voice (S710). When the user's spoken voice is input, the display apparatus 100 transmits it to the interactive server 200 (S720). Specifically, when the user's spoken voice is input in analog form, the display apparatus 100 converts it into a digital signal. At this point, the display apparatus 100 preferably determines whether the input spoken voice contains noise and, if so, converts the spoken voice into a digital signal after the noise has been removed.

When the user's spoken voice has been converted into a digital signal, the display apparatus 100 transmits it to the interactive server 200 and receives the corresponding response information (S730). When the response information is received, the display apparatus 100 selects the target referred to by the indicator included in the response information, based on the display state of the objects displayed on the screen, and performs the operation corresponding to the command included in the response information on the selected target (S740).

Specifically, as described with reference to FIG. 4, the display apparatus 100 may receive and display the content 410 through the channel requested by the user. The display apparatus 100 may also display on the screen the content list 420 for the content requested by the user, based on a user command input through a remote control or through the user's spoken voice. In addition, the display apparatus 100 may highlight the first-episode content information 421 among the per-episode content information 421 to 425 displayed on the content list 420 based on a preset condition. That is, when the content list 420 containing the per-episode content information 421 to 425 is first displayed on the screen, the display apparatus 100 may, based on the preset condition, highlight the first-episode content information 421 located at the top so that it is selected preferentially.

Meanwhile, to watch the first content corresponding to the first-episode content information 421 displayed on the content list 420, the user may say "Execute this". The display apparatus 100 thus receives the user's spoken voice "Execute this". In this way, while a plurality of objects are displayed on the screen, the display apparatus 100 according to the present invention may receive a spoken voice that includes a first utterance element indicating a target, which refers to one of the objects, and a second utterance element indicating an execution command for executing that target. Here, the first utterance element may indicate the target through at least one of a pronoun, an ordinal number, and a direction. For example, the spoken voice "Execute this" includes a first utterance element that indicates the target with a pronoun, and the spoken voice "Execute the third one" includes a first utterance element that indicates the target with an ordinal number.

When a spoken voice including a first utterance element indicating a target and a second utterance element indicating an execution command for that target is input in this way, the display apparatus 100 converts the input spoken voice into a digital signal and transmits it to the interactive server 200. As described above, the interactive server 200 then extracts the indicator and the command corresponding to the first and second utterance elements included in the spoken voice and combines them to generate an execution command script. As in the example above, when the spoken voice "Execute this" is received, the interactive server 200 extracts the indicator "$this$", which corresponds to the first utterance element "this" indicating the target, and the command "execute", which corresponds to the second utterance element "execute" indicating the execution command. The interactive server 200 then combines the extracted indicator and command to generate the execution command script "execute($this$)". Finally, the interactive server 200 generates response information including the execution command script "execute($this$)" and transmits it to the display apparatus 100.

When such response information is received, the display apparatus 100 may interpret the execution command script included in the received response information and perform the operation corresponding to the user's uttered voice. In the example above, when response information including the execution command script "execute($this$)" is received, the display apparatus 100 interprets the script and distinguishes the indicator, "$this$", from the command, "execute".
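One way the display apparatus might separate the command from the indicator is shown below. This is a minimal sketch under the assumption that every script has the form command(indicator); the regular expression and the name parse_script are hypothetical and are not taken from the description.

```python
import re

# Assumed script grammar: a command name followed by one parenthesized
# indicator, e.g. "execute($this$)".
SCRIPT_PATTERN = re.compile(r"^(?P<command>\w+)\((?P<indicator>[^()]+)\)$")


def parse_script(script: str) -> tuple[str, str]:
    """Split an execution command script into (command, indicator)."""
    match = SCRIPT_PATTERN.match(script.strip())
    if match is None:
        raise ValueError(f"unrecognized execution command script: {script!r}")
    return match.group("command"), match.group("indicator")


command, indicator = parse_script("execute($this$)")
print(command, indicator)  # execute $this$
```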

Once the indicator and the command have been distinguished from the execution command script, the display apparatus 100 may select one of the plurality of objects displayed on the screen based on the indicator. As illustrated in FIG. 4, the display apparatus 100 may display a highlight on the first-episode content information 421 among the per-episode content information 421 to 425 included in the content list 420. Here, the highlighted first-episode content information 421 may serve as the reference for selecting the object that corresponds to the target indicated by the user. Therefore, when the indicator distinguished from the execution command script is "$this$", the display apparatus 100 may determine that the indicator refers to the highlighted first-episode content information 421 and select it. When the first-episode content information 421 is selected, the display apparatus 100 may, based on the "execute" command distinguished from the execution command script, receive the first content corresponding to the first-episode content information 421 from an external server (not shown) and display it.

As another example, the display apparatus 100 may receive from the interactive server 200, in response to the user's uttered voice "Execute the next one", response information including the execution command script "execute($this$+1)". In this case, the display apparatus 100 may interpret the execution command script included in the received response information and distinguish the indicator "$this$+1" from the command "execute". If, as in FIG. 4, the highlight is displayed on the first-episode content information 421, the display apparatus 100 may, based on the "$this$+1" indicator, select the second-episode content information 422, which is located immediately after the highlighted first-episode content information 421. When the second-episode content information 422 is selected, the display apparatus 100 may, based on the "execute" command distinguished from the execution command script, receive the content corresponding to the second-episode content information 422 from an external server (not shown) and display it.
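The highlight-relative selection in the two examples above can be sketched as follows. The function name resolve_target, the list representation of the content list, and the support for negative offsets are assumptions made for illustration; the description itself only shows "$this$" and "$this$+1".

```python
import re

# Assumed indicator grammar: "$this$" optionally followed by a signed
# integer offset, e.g. "$this$+1". Negative offsets are an assumption.
INDICATOR_PATTERN = re.compile(r"^\$this\$(?P<offset>[+-]\d+)?$")


def resolve_target(indicator: str, items: list[str], highlighted: int) -> str:
    """Return the item the indicator refers to, counted from the
    currently highlighted position (a zero-based index into items)."""
    match = INDICATOR_PATTERN.match(indicator)
    if match is None:
        raise ValueError(f"unsupported indicator: {indicator!r}")
    offset = int(match.group("offset") or 0)
    index = highlighted + offset
    if not 0 <= index < len(items):
        raise IndexError(f"indicator {indicator!r} points outside the list")
    return items[index]


# Content information 421 to 425 from FIG. 4, with 421 highlighted.
episodes = ["421", "422", "423", "424", "425"]
print(resolve_target("$this$", episodes, highlighted=0))    # 421
print(resolve_target("$this$+1", episodes, highlighted=0))  # 422
```

Rejecting an out-of-range index rather than wrapping around is a design choice of this sketch; the description does not say how such a case is handled.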

The present invention has been described above with a focus on its preferred embodiments.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications may of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or prospect of the present invention.

10: first server 20: second server
100: display apparatus 110: input unit
120, 210: communication unit 130: display unit
140, 250: control unit 200: interactive server
220: voice processing unit 230: storage unit
240: extraction unit

Claims

A display apparatus comprising:
an input unit configured to receive a voice command and convert the voice command into a digital signal;
a communication unit configured to communicate with a server;
a display unit configured to display a content list including a plurality of objects on a screen; and
a control unit configured to control the communication unit to transmit the digital signal to the server, to receive from the server, through the communication unit, response information for performing an operation corresponding to the voice command, the response information being obtained based on text information corresponding to the voice command, to identify a target object among the plurality of objects in the content list based on the response information, and to control the display unit to display the target object on the screen,
wherein the text information includes a first command element that represents the target object and includes an ordinal number, and a second command element that represents an execution command,
wherein the response information includes first information that relatively represents the target object among the plurality of objects based on the first command element, and second information that is execution information for performing the operation based on the second command element, and
wherein the control unit identifies, based on the first information, the target object at a position corresponding to the ordinal number among the plurality of objects, and performs the operation on the identified target object based on the second information.

The display apparatus of claim 1,
wherein the control unit identifies the target object at a position corresponding to the ordinal number relative to one of the plurality of objects.

The display apparatus of claim 2,
wherein the control unit identifies the target object at a position corresponding to the ordinal number relative to a focus displayed on one of the plurality of objects.

The display apparatus of claim 1,
wherein the first command element further includes at least one of a pronoun or a direction.

The display apparatus of claim 1,
wherein, when the target object is identified, the control unit controls the display unit to display the target object differently from the remaining objects in the content list.

The display apparatus of claim 1,
wherein each of the plurality of objects includes an image representing corresponding content.

The display apparatus of claim 1,
wherein the response information includes an execution command script generated by combining the first information and the second information.

A control method of a display apparatus, the method comprising:
displaying a content list including a plurality of objects on a screen;
receiving a voice command;
converting the voice command into a digital signal;
transmitting the digital signal to a server;
receiving, from the server, response information for performing an operation corresponding to the voice command, the response information being obtained based on text information corresponding to the voice command;
identifying a target object among the plurality of objects in the content list based on the response information; and
displaying the target object on the screen,
wherein the text information includes a first command element that represents the target object and includes an ordinal number, and a second command element that represents an execution command,
wherein the response information includes first information that relatively represents the target object among the plurality of objects based on the first command element, and second information that is execution information for performing the operation based on the second command element,
wherein identifying the target object comprises identifying, based on the first information, the target object at a position corresponding to the ordinal number among the plurality of objects, and
wherein the control method further comprises performing the operation on the identified target object based on the second information.

The method of claim 8,
wherein identifying the target object comprises identifying the target object at a position corresponding to the ordinal number relative to one of the plurality of objects.

The method of claim 9,
wherein identifying the target object comprises identifying the target object at a position corresponding to the ordinal number relative to a focus displayed on one of the plurality of objects.

The method of claim 8,
wherein the first command element further includes at least one of a pronoun or a direction.

The method of claim 8,
further comprising, when the target object is identified, displaying the target object differently from the remaining objects in the content list.

The method of claim 8,
wherein each of the plurality of objects includes an image representing corresponding content.

The method of claim 8,
wherein the response information includes an execution command script generated by combining the first information and the second information.