KR20180068113A

KR20180068113A - Apparatus, method for recognizing voice and method of displaying user interface therefor

Info

Publication number: KR20180068113A
Application number: KR1020160169745A
Authority: KR
Inventors: 이강태; 김진한; 윤성인
Original assignee: 주식회사 케이티
Priority date: 2016-12-13
Filing date: 2016-12-13
Publication date: 2018-06-21
Also published as: KR102208822B1

Abstract

The present invention relates to a speech recognition method of a speech recognition apparatus operated by at least one processor. The method comprises the steps of: extracting, from at least one piece of content information, at least one keyword identifying each content item; mapping the extracted keyword to a content selection voice command for selecting that content item by voice; receiving from a user a voice signal including the content selection voice command; and providing the specific content corresponding to the content selection voice command.

Description

The present invention relates to a speech recognition apparatus, a speech recognition method, and a method of displaying a user interface therefor.

Speech recognition technology recognizes a voice signal, obtained by collecting speech uttered by a user, as a signal corresponding to a given language, and it is used in many industries. Because voice input is simpler than conventional input methods such as pressing a button with a finger, it is used in TVs as a replacement for the conventional remote control.

For example, when a user speaks a specific word such as "channel up", "number 7", or "KBS", a speech recognition engine in the TV or set-top box recognizes the user's voice signal and adjusts the channel accordingly.

With the recent development of communication networks, video on demand (VOD) services, which deliver the video a user wants at the time the user wants it, are increasing. The title of a piece of video content may consist of a few short words, but it may also be a relatively long sentence. Research into accurately recognizing relatively long sentences continues so that speech recognition can also be applied to VOD services.

However, there are limits to supporting the long and complex titles of increasingly diverse content, in particular user created content (UCC) with unusual titles.

The problem to be solved by the present invention is to provide a speech recognition apparatus and a speech recognition method that extract, from a complex content title, a command with which the content can be selected, and that select content using at least one voice command, and to provide a user interface for the speech recognition method.

A speech recognition method of a speech recognition apparatus operated by at least one processor according to an embodiment of the present invention includes: extracting, from at least one piece of content information, at least one keyword identifying each content item; mapping the extracted keyword to a content selection voice command for selecting the corresponding content by voice; receiving from a user a voice signal including the content selection voice command; and providing the specific content corresponding to the content selection voice command.

The step of extracting the voice command list may include extracting the at least one keyword by morphologically analyzing a content title included in the content information, and removing duplicated items from the extracted keywords and mapping the remaining keywords to the individual voice commands.

The method may further include outputting a thumbnail of the at least one content item and the content selection voice command to a display screen in association with each other, and the content selection voice command may be displayed so that at least one of its font size, font color, font weight, and typeface differs from that of the content title, so that it can be distinguished from the content title.

In the step of outputting to the display screen, when there are a plurality of content items, the plurality of content items may be output on a single screen in a split-screen manner, and the serial numbers assigned to the plurality of content items may be displayed together.

In the step of mapping to the content selection voice command, the serial number may be mapped to an additional voice command.

A method of displaying a user interface screen for performing speech recognition in a speech recognition apparatus operated by at least one processor according to an embodiment of the present invention includes: receiving display information of a plurality of content items from a content server; extracting, from the display information, at least one keyword identifying each content item and mapping the extracted keyword to a content selection voice command for selecting the corresponding content by voice; and outputting, for each content item, a thumbnail included in the display information and the content selection voice command to a display screen in association with each other.

The step of mapping to the content selection voice command may include extracting, for each of the plurality of content items, the at least one keyword by morphologically analyzing a content title included in the content information, and removing duplicated items from the extracted keywords and mapping the remaining keywords to content selection voice commands.

The step of outputting to the display screen may further output the content title, and the content selection voice command may be displayed so that at least one of its font size, font color, font weight, and typeface differs from that of the content title, so that it can be distinguished from the content title.

The step of outputting to the display screen may further display serial numbers corresponding to the plurality of content items, and the step of mapping to the content selection voice command may map the serial numbers to additional voice commands.

The method may further include transmitting a content information request message to the content server, and the content information request message may include any one of commands such as a content type, a content genre, a keyword related to the content, a word included in the content title, the name of a person appearing in the content, the name of the content's producer, and a classification criterion covering a plurality of content items.

A speech recognition apparatus operated by at least one processor according to an embodiment of the present invention includes: a content information receiving unit that receives a plurality of pieces of content information from a content server; a content selection voice command generation unit that extracts, from the content information, at least one keyword identifying each content item and maps the extracted keyword to a content selection voice command for selecting the corresponding content by voice; and a user interface configuration unit that displays, for each content item, a thumbnail included in the content information and the content selection voice command on a display screen in association with each other.

The content selection voice command generation unit may extract the at least one keyword by morphologically analyzing a content title included in the content information, remove duplicated items from the extracted keywords, and map the remaining keywords to the content selection voice command.

The user interface configuration unit may display the content selection voice command so that at least one of its font size, font color, font weight, and typeface differs from that of the content title, so that it can be distinguished from the content title.

The user interface configuration unit may output the plurality of content items on a single screen in a split-screen manner and display the serial numbers assigned to the plurality of content items together.

The content selection voice command generation unit may map the serial number to an additional voice command.

According to an embodiment of the present invention, a user can input a content title composed of a complex combination of words by using a simple voice command.

According to an embodiment of the present invention, a user can select content with a simple voice command, without having to speak the entire complex content title.

FIG. 1 is a block diagram of a speech recognition system according to an embodiment of the present invention.
FIG. 2 is a detailed block diagram of a speech recognition system according to an embodiment of the present invention.
FIG. 3 is an example of a user interface screen constructed by a speech recognition apparatus according to an embodiment of the present invention.
FIGS. 4A to 4C are examples of a user interface screen for selecting content through speech recognition according to an embodiment of the present invention.
FIG. 5 is a block diagram of a speech recognition system according to another embodiment of the present invention.
FIG. 6 is a flowchart of a method by which a speech recognition apparatus according to an embodiment of the present invention performs speech recognition to select content.
FIG. 7 is a flowchart of a method by which a speech recognition apparatus according to another embodiment of the present invention performs speech recognition to select content.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

In the following, an example is described in which, when the title of a speech recognition target is a complex combination of words, the speech recognition apparatus extracts from the complex title a simple keyword that can represent the target and lets the user input a voice signal consisting only of that keyword. The present invention is not limited to this, however, and can be extended to constructing a command the user can pronounce when the title is hard to recognize by voice, for example when the title consists of a single syllable or only of consonants or vowels.

In the following, an example is described in which the speech recognition apparatus is a set-top box used to select content with various titles. The present invention is not limited to this, however, and can be extended to speech recognition methods for audio devices or for playing music in a system installed in a vehicle.

FIG. 1 is a block diagram of a speech recognition system according to an embodiment of the present invention.

Referring to FIG. 1, a speech recognition system 1000 includes a speech recognition apparatus 100, a server 200, and a display apparatus 300.

The speech recognition apparatus 100 may receive from a user a voice command for selecting specific content from among a plurality of content items, request the specific content from the server 200 using the voice command, and output the specific content received from the server 200 through the display apparatus 300.

In this embodiment, the speech recognition apparatus 100 is described as a device separate from the display apparatus 300. However, the speech recognition apparatus 100 may itself include a display, provide an interface through which the user selects content, and display the content received from the server 200.

In the following, in consideration of the load and processing performance of the speech recognition apparatus 100, it is mainly described that the speech recognition apparatus 100 receives the voice command list from the server 200. However, the speech recognition apparatus 100 may provide the voice command list with partial help from the server 200, or may provide it on its own without communicating with the server 200. In that case, the speech recognition apparatus 100 may include instructions for performing some of the functions of the server 200, or instructions for performing all of them. Meanwhile, because the typical command spoken by a user is a content title such as a television program title, the program included in the speech recognition apparatus 100 is described below as a content playback program that plays the content corresponding to a voice command when the user speaks it. The present invention, however, is applicable not only to content playback programs but also to various speech recognition programs that perform an operation in response to a voice command spoken by the user.

The speech recognition apparatus 100 includes hardware such as a computer-readable storage medium, a processor, a memory, and a communication module. The storage medium stores a speech recognition program that invokes commands through speech recognition. The memory stores the voice command list of the speech recognition program, or loads the instructions of the speech recognition program from the storage device and stores them temporarily. The processor executes the instructions stored in or loaded into the memory to run the speech recognition program of the present invention. The communication module communicates with the server 200 through a communication network.

The speech recognition apparatus 100 may be implemented in various forms, for example as a mobile terminal such as a smartphone, a pad-type terminal such as a smart pad, any of various kinds of computer such as a laptop, a wearable device, a TV terminal, or a set-top box.

The speech recognition program may extract, from a complex content title included in the user interface screen, a keyword that can represent the content, and set it as a voice command. The speech recognition program may be implemented as a standalone application, but for the purposes of this description it is assumed to be integrated into a content selection program.

FIG. 2 is a detailed block diagram of a speech recognition system according to an embodiment of the present invention.

Referring to FIG. 2, the speech recognition apparatus 100 according to an embodiment of the present invention is operated by at least one processor and includes a voice signal receiving unit 110, a voice command recognition unit 120, a content information receiving unit 140, a content identification voice command generation unit 150, and a user interface configuration unit 160.

The voice signal receiving unit 110 receives the user's voice signal. It may be implemented with a microphone included in the speech recognition apparatus 100, or it may receive a voice command collected through a microphone included in a remote device.

The voice command recognition unit 120 receives the voice command and performs speech recognition processing. Speech recognition is a series of steps that takes a voice command and converts it into the corresponding execution command; the voice command recognition unit 120 may convert the voice command into language data according to any of various known speech recognition methods and output the result.

Because a voice command received through the voice signal receiving unit 110 may contain various noise components in addition to the voice of the user that is the target of recognition, the voice command recognition unit 120 may extract only the user's voice component through preprocessing such as frequency analysis and perform speech recognition on the extracted component. Various speech recognition methods usable by the voice command recognition unit 120 are known, so their description is omitted.

The voice command recognition unit 120 may be implemented as an embedded engine provided inside the speech recognition apparatus 100, and may be implemented as separate hardware or as software executed by the processor.

The content information request unit 130 requests content information from the server 200 when the voice command recognized by the voice command recognition unit 120 is a voice command requesting content information. A voice command requesting content information may be a command for requesting a plurality of pieces of content information from which the user can choose. For example, such voice commands may include "latest movies", "makeup", "fitness", or "Jung Woo-sung", that is, commands such as a content type, a content genre, a keyword related to the content, a word included in the content title, the name of a person appearing in the content, the name of the content's producer, and a classification criterion covering a plurality of content items.

The content information receiving unit 140 receives content information from the server 200. In response to a request from the content information request unit 130, the server 200 may extract at least one piece of content information that the user can select and provide it to the speech recognition apparatus 100. The content information includes the title and thumbnail of the content, and may further include additional information about the content.
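The request and response exchanged between the content information request unit 130, the server 200, and the content information receiving unit 140 can be pictured with a minimal sketch. The specification does not define a wire format, so the JSON encoding, the class names, the field names (`query`, `thumbnail_url`, `additional_info`), and the example URLs below are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContentInfoRequest:
    # Hypothetical field: the recognized request command, e.g. "메이크업".
    query: str


@dataclass
class ContentInfo:
    # The description says content information includes a title and a
    # thumbnail, and may include additional information.
    title: str
    thumbnail_url: str  # hypothetical representation of the thumbnail
    additional_info: dict = field(default_factory=dict)


def encode_request(request: ContentInfoRequest) -> str:
    """Serialize a request message (JSON is an assumed format)."""
    return json.dumps(asdict(request), ensure_ascii=False)


def decode_response(payload: str) -> list:
    """Parse a list of content information entries sent by the server."""
    return [
        ContentInfo(
            title=item["title"],
            thumbnail_url=item["thumbnail_url"],
            additional_info=item.get("additional_info", {}),
        )
        for item in json.loads(payload)
    ]


REQUEST_JSON = encode_request(ContentInfoRequest(query="메이크업"))

# A made-up server response with placeholder thumbnail URLs.
SAMPLE_RESPONSE = json.dumps(
    [
        {
            "title": "선미 메이크업 따라하기",
            "thumbnail_url": "https://example.com/thumb/1.jpg",
            "additional_info": {"genre": "beauty"},
        },
        {
            "title": "할로윈 데이 메이크업",
            "thumbnail_url": "https://example.com/thumb/2.jpg",
        },
    ],
    ensure_ascii=False,
)
ITEMS = decode_response(SAMPLE_RESPONSE)
```

Keeping `additional_info` optional mirrors the description, where only the title and thumbnail are stated as always present.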

For example, when the user inputs "메이크업" ("makeup") as a voice command, the server 200 may extract a content list, that is, a list of on-demand videos related to "makeup". For example, the server 200 may extract a content list such as the one in Table 1 and transmit it to the speech recognition apparatus 100.

Table 1

| Basic command | Content title |
|---|---|
| 메이크업 (makeup) | 선미 메이크업 따라하기 (Following Sunmi's makeup) |
| | 할로윈 데이 메이크업 (Halloween Day makeup) |
| | 마스크팩 리얼 후기 (Mask pack real review) |
| | 전주 갈 때, 필수 메이크업 (Essential makeup when going to Jeonju) |
| | 여신이란 이런 것 (This is what a goddess is) |

The content identification voice command generation unit 150 uses the content information to generate voice commands with which the user can identify specific content.

The content identification voice command generation unit 150 may extract at least one noun by morphologically analyzing the content information, for example the content title. Assume, for example, that the nouns shown in Table 2 are extracted.

Table 2

| Content title | Extracted nouns |
|---|---|
| 선미 메이크업 따라하기 (Following Sunmi's makeup) | 선미 (Sunmi), 메이크업 (makeup) |
| 할로윈 데이 메이크업 (Halloween Day makeup) | 할로윈 (Halloween), 메이크업 (makeup) |
| 마스크팩 리얼 후기 (Mask pack real review) | 마스크팩 (mask pack), 리얼 (real) |
| 전주 갈 때, 필수 메이크업 (Essential makeup when going to Jeonju) | 전주 (Jeonju), 필수 (essential), 메이크업 (makeup) |
| 여신이란 이런 것 (This is what a goddess is) | 여신 (goddess) |

The content identification voice command generation unit 150 may then remove duplicated nouns from the extracted nouns. For example, the duplicated noun "메이크업" ("makeup") may be removed, as shown in Table 3.

Table 3

| Content title | Voice command |
|---|---|
| 선미 메이크업 따라하기 (Following Sunmi's makeup) | 선미 (Sunmi) |
| 할로윈 데이 메이크업 (Halloween Day makeup) | 할로윈 (Halloween) |
| 마스크팩 리얼 후기 (Mask pack real review) | 마스크팩 (mask pack), 리얼 (real) |
| 전주 갈 때, 필수 메이크업 (Essential makeup when going to Jeonju) | 전주 (Jeonju), 필수 (essential) |
| 여신이란 이런 것 (This is what a goddess is) | 여신 (goddess) |
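The step from Table 2 to Table 3 can be sketched as follows. This is only an illustration under assumptions: the nouns are hard-coded from Table 2 rather than produced by a morphological analyzer, and the rule "drop every noun that occurs in more than one title" is one plausible reading of "removing duplicated nouns". The function and variable names are hypothetical.

```python
from collections import Counter

# Nouns per content title, copied from Table 2. A real system would obtain
# these from a morphological analyzer; hard-coding them is an assumption.
EXTRACTED_NOUNS = {
    "선미 메이크업 따라하기": ["선미", "메이크업"],
    "할로윈 데이 메이크업": ["할로윈", "메이크업"],
    "마스크팩 리얼 후기": ["마스크팩", "리얼"],
    "전주 갈 때, 필수 메이크업": ["전주", "필수", "메이크업"],
    "여신이란 이런 것": ["여신"],
}


def remove_duplicate_nouns(nouns_by_title):
    """Keep only nouns that occur in exactly one title.

    A noun shared by several titles (here "메이크업") cannot identify a
    single content item, so it is dropped from every title.
    """
    # Count in how many titles each noun appears (once per title).
    counts = Counter(
        noun for nouns in nouns_by_title.values() for noun in set(nouns)
    )
    return {
        title: [noun for noun in nouns if counts[noun] == 1]
        for title, nouns in nouns_by_title.items()
    }


VOICE_COMMANDS = remove_duplicate_nouns(EXTRACTED_NOUNS)
```

With the Table 2 input, `VOICE_COMMANDS` holds the same title-to-command pairs as Table 3.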

The content identification voice command generation unit 150 may map at least one of the extracted nouns to an individual voice command that refers to the corresponding content.

In this embodiment, the content identification voice command generation unit 150 is described as extracting a keyword from the content title and setting the extracted keyword as the individual voice command that refers to the content. The present invention is not limited to this, however. That is, the content identification voice command generation unit 150 may also set the content selection voice command that refers to the content based on a keyword extracted from the additional information about the content.

Meanwhile, the user interface configuration unit 160 may generate a selection screen on which the user can select any one of a plurality of content items, and provide it to the user.

The user interface configuration unit 160 may configure the selection screen so that the content thumbnail included in the content information provided by the server 200 and the content selection voice command are output to the display screen in association with each other.

The user interface configuration unit 160 may output a plurality of content items on a single screen in a split-screen manner, in which case serial numbers may be assigned to the content items in the order in which they are displayed on the screen. The content identification voice command generation unit 150 may set the serial numbers of the content items as additional voice commands. For example, the serial number of each content item may be set as an additional voice command as shown in Table 4.

Table 4

| Content title | Voice command | Additional voice command |
|---|---|---|
| 선미 메이크업 따라하기 (Following Sunmi's makeup) | 선미 (Sunmi) | 일번 (number one) |
| 할로윈 데이 메이크업 (Halloween Day makeup) | 할로윈 (Halloween) | 이번 (number two) |
| 마스크팩 리얼 후기 (Mask pack real review) | 마스크팩 (mask pack), 리얼 (real) | 삼번 (number three) |
| 전주 갈 때, 필수 메이크업 (Essential makeup when going to Jeonju) | 전주 (Jeonju), 필수 (essential) | 사번 (number four) |
| 여신이란 이런 것 (This is what a goddess is) | 여신 (goddess) | 오번 (number five) |
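As a rough sketch of how the mapping in Table 4 might be built, the code below derives a spoken serial command from each item's display position and merges it with the keyword commands into one lookup table. The helper names are hypothetical, the keyword commands are hard-coded from Table 3, and only positions 1 to 9 are supported to keep the example short.

```python
# Sino-Korean digits used to form spoken serial numbers such as "일번".
SINO_KOREAN_DIGITS = ["일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]

# Keyword commands per title, copied from Table 3, in display order.
COMMANDS_BY_TITLE = {
    "선미 메이크업 따라하기": ["선미"],
    "할로윈 데이 메이크업": ["할로윈"],
    "마스크팩 리얼 후기": ["마스크팩", "리얼"],
    "전주 갈 때, 필수 메이크업": ["전주", "필수"],
    "여신이란 이런 것": ["여신"],
}


def serial_command(position):
    """Return the spoken serial command for a 1-based display position."""
    if not 1 <= position <= len(SINO_KOREAN_DIGITS):
        raise ValueError("this sketch only supports positions 1 to 9")
    return SINO_KOREAN_DIGITS[position - 1] + "번"


def build_command_index(commands_by_title):
    """Map every keyword command and serial command to its content title."""
    index = {}
    # Dict insertion order stands in for the on-screen display order.
    for position, (title, commands) in enumerate(
        commands_by_title.items(), start=1
    ):
        for command in commands + [serial_command(position)]:
            index[command] = title
    return index


COMMAND_INDEX = build_command_index(COMMANDS_BY_TITLE)
```

A flat command-to-title dictionary makes the later lookup a single dictionary access, at the cost of requiring commands to be unique across titles, which the duplicate-removal step already ensures for keywords.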

The user interface configuration unit 160 may configure the selection screen so that the individual voice command is displayed with at least one of its font size, font color, font weight, and typeface different from that of the content title, so that the individual voice command can be distinguished from the content title.
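One way to support this kind of display, sketched here under assumptions, is to split each title into segments flagged as command or non-command, so that a renderer can apply a different size, color, weight, or typeface to the flagged segments. The function name and the `(text, is_command)` tuple format are hypothetical and do not come from the specification.

```python
import re


def split_for_highlight(title, commands):
    """Split a title into (text, is_command) segments.

    Segments flagged True are voice commands that the UI would style
    differently from the rest of the title.
    """
    if not commands:
        return [(title, False)]
    # Try longer commands first so a short command cannot pre-empt a
    # longer one that starts at the same position.
    ordered = sorted(commands, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(c) for c in ordered) + ")"
    # A capturing group makes re.split keep the matched commands.
    parts = re.split(pattern, title)
    command_set = set(commands)
    return [(part, part in command_set) for part in parts if part]


SEGMENTS = split_for_highlight("전주 갈 때, 필수 메이크업", ["전주", "필수"])
```

Concatenating the segment texts reproduces the original title, so the renderer only has to decide on styling, not on layout.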

FIG. 3 is an example of a user interface screen constructed by a speech recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the speech recognition apparatus 100 according to an embodiment of the present invention may run a program that receives a voice command and selects content based on it. That is, in order to recognize a content title such as a broadcast program title, the speech recognition apparatus 100 may run a content playback program that plays the content corresponding to a voice command when the user speaks it.

The speech recognition apparatus 100 receives, from the server 200 connected through a network, at least one piece of content information that the user can select. Here, the server 200 may use the content information request message sent from the speech recognition apparatus 100 to extract from a database a list of content that the user can select, and provide that list.

The speech recognition apparatus 100 may extract, from the received content information, at least one keyword identifying each content item, and map the extracted keyword to an individual voice command for selecting the corresponding content by voice.

The speech recognition apparatus 100 may display a plurality of content items on a single screen in a split-screen layout and let the user select one of them with an individual voice command. The speech recognition apparatus 100 may also assign serial numbers to the content items and set each serial number as an additional voice command for identifying the corresponding item.

As shown in FIG. 3, the speech recognition apparatus 100 may configure the selection screen so that each voice command differs from the content title in at least one of font size, font color, font weight, and typeface, making the command distinguishable from the title.

FIGS. 4A to 4C are examples of user interface screens for selecting content through speech recognition according to an embodiment of the present invention.

Referring to FIGS. 4A to 4C, in one embodiment of the present invention, when the speech recognition apparatus 100 is a display device such as a TV, the user interface screen may be provided on the display built into the speech recognition apparatus 100. In another embodiment, when the speech recognition apparatus 100 is a set-top box, the user interface screen may be provided on a display connected to the speech recognition apparatus 100.

The speech recognition apparatus 100 receives content information from the server 200. The server 200 may transmit the content information to the speech recognition apparatus 100 in response to the content information request message sent by the speech recognition apparatus 100.

The content information request message may include a content request command composed of at least one of: the type of content, the genre of content, a keyword related to the content, a word included in the content title, the name of a person appearing in the content, the content producer's title, and a classification criterion that covers a plurality of content items. According to the content request command, the server 200 may extract at least one piece of content information that the user can select and transmit the extracted content information to the speech recognition apparatus 100.

For example, the server 200 may include a database that can provide a plurality of pieces of content information in response to a content request command, as shown in Table 5.

Menu name | First content request command | Second content request command | Code
메뉴 열기 (Open menu) | 메뉴 열기 (Open menu) | 메뉴 (Menu) | M01
카테고리 (Category) | 카테고리 (Category) | | C01
쇼핑하기 (Go shopping) | 쇼핑하기 (Go shopping) | 쇼핑 (Shopping) | S01
전체 보기 (View all) | 전체 보기 (View all) | 전체 (All) | F01
인기 동영상 (Popular videos) | 인기 동영상 (Popular videos) | 인기 (Popular) | F04
더 보기 (View more) | 더 보기 (View more) | 다음 (Next) | NEXT
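
A lookup like the one Table 5 describes can be sketched as follows. This is a minimal illustration, not the server's actual database: the in-memory list `MENU_ROWS`, the helper names, and the whitespace-insensitive matching are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for the Table 5 database.
# Each row: (menu name, first command, second command or None, code).
MENU_ROWS: List[Tuple[str, str, Optional[str], str]] = [
    ("메뉴 열기", "메뉴 열기", "메뉴", "M01"),
    ("카테고리", "카테고리", None, "C01"),
    ("쇼핑하기", "쇼핑하기", "쇼핑", "S01"),
    ("전체 보기", "전체 보기", "전체", "F01"),
    ("인기 동영상", "인기 동영상", "인기", "F04"),
    ("더 보기", "더 보기", "다음", "NEXT"),
]


def normalize(text: str) -> str:
    """Drop all whitespace so that '더 보기' and '더보기' compare equal (an assumption)."""
    return "".join(text.split())


def build_command_index(rows) -> Dict[str, str]:
    """Map every first and second content request command to its menu code."""
    index: Dict[str, str] = {}
    for _menu, first, second, code in rows:
        for command in (first, second):
            if command is not None:
                index[normalize(command)] = code
    return index


def lookup_code(index: Dict[str, str], spoken: str) -> Optional[str]:
    """Return the menu code for a recognized command, or None if it is unknown."""
    return index.get(normalize(spoken))


COMMAND_INDEX = build_command_index(MENU_ROWS)
```

With this index, both the long form "인기 동영상" and the short form "인기" resolve to code F04, which is the point of keeping two commands per menu entry.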

Referring to FIG. 4A, the user may input "인기 동영상" ("Popular videos"), a content request command for selecting content on the user interface screen. The user interface screen may or may not display the content request commands available to the user. The user may issue the content request command by voice or with a separate input device such as a remote control.

The speech recognition apparatus 100 transmits a content information request message containing the content request command to the server 200. The speech recognition apparatus 100 may then receive, from the server 200, content display information related to "인기 동영상" ("Popular videos"). For example, assume that it receives the information shown in Table 6.

Content ID | Content title
CID1 | 아이폰 6S KT 최대 지원금 (iPhone 6S, KT maximum subsidy)
CID2 | 니콜생지르 남셔츠 4종 (니콜생지르 men's shirts, set of 4)
CID3 | 왁스배쏙티
CID4 | 크리스탈 선스프레이 (Crystal sun spray)
CID5 | 휠라 남성 드로즈 (Fila men's drawers)
CID6 | 풍기 인견 여성 란제리 (Punggi rayon women's lingerie)
CID7 | 갤럭시 S6 엣지 (Galaxy S6 Edge)
CID8 | 마조네뜨
CID9 | KT 홈IoT 월4000원 (KT Home IoT, 4,000 won per month)
CID10 | 커버퀸 (Cover Queen)
... |
CIDn |

Referring to FIG. 4B, the speech recognition apparatus 100 extracts, from each content title received from the server 200, at least one keyword that can identify the content, and maps the extracted keyword to a voice command with which the user selects that content by voice. The speech recognition apparatus 100 may then output at least one content thumbnail on the display screen together with its corresponding voice command. Serial numbers may be assigned to the content items in the order in which they appear on the screen, and the speech recognition apparatus 100 may set each item's serial number as an additional voice command, for example as shown in Table 7.

Content ID | Content title | First voice command | Second voice command
CID1 | 아이폰 6S KT 최대 지원금 | 아이폰 (iPhone) | 일번 (Number one)
CID2 | 니콜생지르 남셔츠 4종 | 니콜생지르 | 이번 (Number two)
CID3 | 왁스배쏙티 | 왁스 (Wax) | 삼번 (Number three)
CID4 | 크리스탈 선스프레이 | 크리스탈 (Crystal) | 사번 (Number four)
CID5 | 휠라 남성 드로즈 | 휠라 (Fila) | 오번 (Number five)

FIG. 4C shows an example in which the user has selected "더보기" ("View more") and the next selection screen for choosing content is output. The content items in this next list may be given new serial numbers. For example, serial numbers may be newly assigned to the content items as shown in Table 8.

Content ID | Content title | First voice command | Second voice command
CID6 | 풍기 인견 여성 란제리 | 란제리 (Lingerie) | 일번 (Number one)
CID7 | 갤럭시 S6 엣지 | 갤럭시 (Galaxy) | 이번 (Number two)
CID8 | 마조네뜨 | 마조네뜨 | 삼번 (Number three)
CID9 | KT 홈IoT 월4000원 | KT | 사번 (Number four)
CID10 | 커버퀸 | 커버퀸 (Cover Queen) | 오번 (Number five)
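
The per-screen renumbering seen in Tables 7 and 8 can be sketched as below. This is only an illustration under assumptions: the page size of five is read off the two tables, and the function and constant names are hypothetical.

```python
from typing import Dict, List

# Spoken serial numbers used in Tables 7 and 8 (number one to number five).
SERIAL_COMMANDS: List[str] = ["일번", "이번", "삼번", "사번", "오번"]


def paginate(content_ids: List[str], page_size: int) -> List[List[str]]:
    """Split the content list into screens of at most page_size items."""
    return [content_ids[i:i + page_size] for i in range(0, len(content_ids), page_size)]


def serial_commands_for_page(page: List[str]) -> Dict[str, str]:
    """Assign serial-number commands in display order, restarting on every screen."""
    if len(page) > len(SERIAL_COMMANDS):
        raise ValueError("more items on screen than serial commands available")
    return {SERIAL_COMMANDS[i]: content_id for i, content_id in enumerate(page)}


content_ids = [f"CID{i}" for i in range(1, 11)]
pages = paginate(content_ids, page_size=5)
first_screen = serial_commands_for_page(pages[0])   # "사번" selects CID4
second_screen = serial_commands_for_page(pages[1])  # "사번" selects CID9
```

Because numbering restarts on each screen, the same spoken number refers to different content depending on which selection screen is currently displayed.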

FIG. 5 is a block diagram of a speech recognition system according to another embodiment of the present invention.

Referring to FIG. 5, the speech recognition system 1000' according to another embodiment of the present invention includes a speech recognition apparatus 100', a server 200', and a display device 300'.

The speech recognition apparatus 100' according to another embodiment of the present invention is operated by at least one processor and includes a voice signal receiving unit 110', a voice command recognition unit 120', a content information receiving unit 140', a content identification voice command generation unit 150', and a user interface configuration unit 160'.

Descriptions that duplicate those given with reference to FIG. 2 are omitted.

The voice signal receiving unit 110' receives the user's voice signal.

The voice command recognition unit 120' receives a voice command and performs speech recognition processing.

The content information request unit 130' requests content information from the server 200' when the voice command recognized by the voice command recognition unit 120' is a command requesting content information.

The content information receiving unit 140' receives content information from the server 200'. At the request of the content information request unit 130', the server 200' may extract at least one piece of content information that the user can select and provide the extracted content information to the speech recognition apparatus 100'. The content information includes the content title, a thumbnail, and a content selection voice command for selecting each content item, and may further include other additional information about the content.
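
In this embodiment the content selection voice command arrives from the server as part of the content information, so the apparatus only needs to index it. The record below is a hedged sketch: the class name, field names, thumbnail file names, and sample commands are all hypothetical, and the text does not specify a wire format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentInfo:
    """Hypothetical shape of one piece of server-provided content information."""
    content_id: str
    title: str
    thumbnail: str                 # e.g. a file name or URL; the format is an assumption
    selection_command: str         # content selection voice command supplied by the server
    extra: Dict[str, str] = field(default_factory=dict)  # other additional information


def index_by_command(items: List[ContentInfo]) -> Dict[str, ContentInfo]:
    """Build a lookup from spoken command to content, rejecting ambiguous commands."""
    index: Dict[str, ContentInfo] = {}
    for item in items:
        if item.selection_command in index:
            raise ValueError(f"duplicate voice command: {item.selection_command}")
        index[item.selection_command] = item
    return index


# Illustrative data only; these command assignments are not taken from the server.
items = [
    ContentInfo("CID7", "갤럭시 S6 엣지", "cid7.png", "갤럭시"),
    ContentInfo("CID10", "커버퀸", "cid10.png", "커버퀸"),
]
command_index = index_by_command(items)
```

Rejecting duplicate commands at indexing time is a design choice of this sketch; it keeps a spoken command from silently mapping to the wrong content.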

The user interface configuration unit 160' may generate a selection screen on which the user can select any one of a plurality of content items and provide it to the user.

The user interface configuration unit 160' may configure the selection screen so that each content thumbnail included in the content information provided by the server 200' is output on the display screen in association with its individual voice command.

The user interface configuration unit 160' may output a plurality of content items on a single screen in a split-screen layout, in which case serial numbers may be assigned to the content items in the order in which they appear on the screen. The content identification voice command generation unit 150' may set the serial number of each content item as an additional voice command.

FIG. 6 is a flowchart of a method by which a speech recognition apparatus according to an embodiment of the present invention performs speech recognition to select content.

Referring to FIG. 6, the speech recognition apparatus 100 transmits a content information request message to the server 200 (S110).

The content information request message may include a content information request command received from the user. The content information request command may be any one of: the type of content, the genre of content, a keyword related to the content, a word included in the content title, the name of a person appearing in the content, the content producer's title, and a classification criterion that covers a plurality of content items (for example, popular content or latest content).

The server 200 uses the received content information request command to extract a content list containing at least one content item (S120).

Using the content information request command received from the speech recognition apparatus 100, the server 200 may extract from its database a plurality of content items related to that command.

The server 200 delivers content information, including the content thumbnail, the content title, and other additional information related to the content, to the speech recognition apparatus 100 (S130).

The speech recognition apparatus 100 extracts, from the content information, a voice command for selecting specific content (S140).

For example, the speech recognition apparatus 100 performs morphological analysis on the content title and extracts at least one keyword contained in the title. After removing duplicated words from the extracted keywords, it may map at least one of the remaining keywords to a content selection voice command that refers to the content.
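
The extract, deduplicate, and map steps can be sketched as follows. This is a rough illustration under stated assumptions, not the apparatus's method: whitespace splitting stands in for a real morphological analyzer, "duplicated" is read as "appearing in more than one title", and the first surviving word is chosen as the command. The function names are hypothetical, and because the selection rule is not specified in the text, the sketch will not necessarily reproduce the choices shown in Tables 7 and 8.

```python
from collections import Counter
from typing import Dict, List


def tokenize(title: str) -> List[str]:
    """Stand-in for morphological analysis: split the title on whitespace."""
    return title.split()


def map_selection_commands(titles: Dict[str, str]) -> Dict[str, str]:
    """Return a mapping from content selection voice command to content ID."""
    tokens = {cid: tokenize(title) for cid, title in titles.items()}
    # Count in how many titles each word appears (once per title).
    spread = Counter(word for words in tokens.values() for word in set(words))
    commands: Dict[str, str] = {}
    for cid, words in tokens.items():
        # Drop words shared with another title; they cannot identify one content.
        unique = [word for word in words if spread[word] == 1]
        if unique:
            commands[unique[0]] = cid  # assumed rule: first surviving keyword
    return commands


titles = {
    "CID1": "아이폰 6S KT 최대 지원금",
    "CID9": "KT 홈IoT 월4000원",
}
commands = map_selection_commands(titles)
```

In this example "KT" appears in both titles, so it is dropped and each content is reached through a word found only in its own title.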

The speech recognition apparatus 100 configures a user interface screen that displays each content thumbnail in association with its content selection voice command (S150). As described with reference to FIG. 3, the speech recognition apparatus 100 may display the content thumbnail, content title, and voice command on the user interface screen, providing a selection screen from which the user can select any one of the content items. The speech recognition apparatus 100 may configure the selection screen so that each voice command differs from the content title in at least one of font size, font color, font weight, and typeface, making the command distinguishable from the title.

The speech recognition apparatus 100 requests from the server 200 the content corresponding to the voice command input by the user (S160). The speech recognition apparatus 100 may receive the user's voice signal, perform speech recognition processing on it, and transmit to the server 200 a message requesting the content that corresponds to the recognized voice command.
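
Resolving a recognized utterance to the content to request in S160 might look like the sketch below. It assumes the recognizer has already produced text, and the function name and the two dictionaries are hypothetical; the sample data is taken from Table 7.

```python
from typing import Dict, Optional

# First (keyword) and second (serial number) voice commands from Table 7.
KEYWORD_COMMANDS: Dict[str, str] = {
    "아이폰": "CID1", "니콜생지르": "CID2", "왁스": "CID3",
    "크리스탈": "CID4", "휠라": "CID5",
}
SERIAL_COMMANDS: Dict[str, str] = {
    "일번": "CID1", "이번": "CID2", "삼번": "CID3", "사번": "CID4", "오번": "CID5",
}


def resolve_selection(recognized: str,
                      keyword_commands: Dict[str, str],
                      serial_commands: Dict[str, str]) -> Optional[str]:
    """Return the content ID selected by the recognized text, or None if no command matches."""
    spoken = recognized.strip()
    if spoken in keyword_commands:
        return keyword_commands[spoken]
    return serial_commands.get(spoken)


selected = resolve_selection("왁스", KEYWORD_COMMANDS, SERIAL_COMMANDS)  # "CID3"
```

The returned content ID is what the apparatus would place in its request to the server; an unmatched utterance returns None so the caller can ignore it or prompt again.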

In response to the request from the speech recognition apparatus 100, the server 200 transmits the content selected by the user to the speech recognition apparatus 100 (S170).

FIG. 7 is a flowchart of a method by which a speech recognition apparatus according to another embodiment of the present invention performs speech recognition to select content.

Referring to FIG. 7, the speech recognition apparatus 100 receives a content list including content titles from the server 200 (S210). The server 200 may transmit to the speech recognition apparatus 100 a content list that includes the titles and thumbnails of a plurality of content items stored in its database.

The speech recognition apparatus 100 performs morphological analysis on the content titles and extracts at least one keyword contained in each title (S220). The speech recognition apparatus 100 may extract keywords from the content titles using an internally built database.

The speech recognition apparatus 100 removes duplicated words from the extracted nouns and sets the remaining words as content selection voice commands with which content can be selected (S230).

The speech recognition apparatus 100 may also assign serial numbers to the content items according to the number of items displayed to the user and set the serial numbers as additional voice commands.

The speech recognition apparatus 100 then configures a user interface screen that includes the content thumbnail, the content title, and the content selection voice command (S240).

That is, the speech recognition apparatus 100 may provide a selection screen from which the user can select any one of a plurality of content items. The speech recognition apparatus 100 may configure the selection screen so that each voice command differs from the content title in at least one of font size, font color, font weight, and typeface, making the command distinguishable from the title.

Thus, according to embodiments of the present invention, a content title made up of a complex combination of words can be entered by the user with a simple voice command. That is, the user can select content with a simple voice command without having to speak the entire complicated content name.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them; various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims

1. A speech recognition method of a speech recognition apparatus operated by at least one processor, the method comprising:
extracting, from at least one piece of content information, at least one keyword that identifies each content item;
mapping the extracted keyword to a content selection voice command for selecting the content by voice;
receiving, from a user, a voice signal including the content selection voice command; and
providing the specific content corresponding to the content selection voice command.

2. The method of claim 1,
wherein the step of extracting the voice command list comprises:
extracting the at least one keyword by morphological analysis of the content title included in the content information; and
removing duplicates from the extracted keywords and mapping the remaining keywords to the individual voice commands.

3. The method of claim 2,
further comprising outputting, on a display screen, a thumbnail of the at least one content item in association with the corresponding content selection voice command,
wherein the content selection voice command is displayed so that at least one of its font size, font color, font weight, and typeface differs from that of the content title, making it distinguishable from the content title.

4. The method of claim 3,
wherein the step of outputting to the display screen comprises,
when there are a plurality of content items,
outputting the plurality of content items on one screen in a split-screen layout and displaying the serial numbers assigned to the plurality of content items together with them.

5. The method of claim 4,
wherein the step of mapping to the content selection voice command
comprises mapping the serial number to an additional voice command.

6. A method of displaying a user interface screen for performing speech recognition in a speech recognition apparatus operated by at least one processor, the method comprising:
receiving display information of a plurality of contents from a content server;
extracting, from the display information, at least one keyword that identifies each content item, and mapping the extracted keyword to a content selection voice command for selecting the content by voice; and
outputting, on a display screen, the thumbnail included in the display information for each content item in association with its content selection voice command.

7. The method of claim 6,
wherein the step of mapping to the content selection voice command comprises:
extracting the at least one keyword by morphological analysis of the content title included in the content information for each of the plurality of contents; and
removing duplicates from the extracted keywords and mapping the remaining keywords to the content selection voice command.

8. The method of claim 6,
wherein the step of outputting to the display screen
further outputs the content title, and
the content selection voice command is displayed so that at least one of its font size, font color, font weight, and typeface differs from that of the content title, making it distinguishable from the content title.

9. The method of claim 6,
wherein the step of outputting to the display screen
further displays serial numbers corresponding to the plurality of contents, and
the step of mapping to the content selection voice command
comprises mapping each serial number to an additional voice command.

10. The method of claim 6,
further comprising transmitting a content information request message to the content server,
wherein the content information request message includes a content request command composed of at least one of the type of content, the genre of content, a keyword related to the content, a word included in the content title, the name of a person appearing in the content, and the content producer's title.

11. A speech recognition apparatus operated by at least one processor, comprising:
a content information receiving unit that receives a plurality of pieces of content information from a content server;
a content selection voice command generation unit that extracts, from the content information, at least one keyword that identifies each content item and maps the extracted keyword to a content selection voice command for selecting the content by voice; and
a user interface configuration unit that displays, on a display screen, the thumbnail included in the content information for each content item in association with its content selection voice command.

12. The apparatus of claim 11,
wherein the content selection voice command generation unit
extracts the at least one keyword by morphological analysis of the content title included in the content information, removes duplicates from the extracted keywords, and maps at least one of the remaining keywords to the content selection voice command.

13. The apparatus of claim 12,
wherein the user interface configuration unit
displays the content selection voice command so that at least one of its font size, font color, font weight, and typeface differs from that of the content title, making it distinguishable from the content title.

14. The apparatus of claim 13,
wherein the user interface configuration unit
outputs the plurality of content items on one screen in a split-screen layout and displays the serial numbers assigned to the plurality of content items together with them.

15. The apparatus of claim 14,
wherein the content selection voice command generation unit
maps the serial number to an additional voice command.