KR20120083025A

KR20120083025A - Multimedia device for providing voice recognition service by using at least two of database and the method for controlling the same

Info

Publication number: KR20120083025A
Application number: KR1020110004443A
Authority: KR
Inventors: 강민구
Original assignee: 엘지전자 주식회사
Priority date: 2011-01-17
Filing date: 2011-01-17
Publication date: 2012-07-25
Also published as: KR101775532B1

Abstract

PURPOSE: A multimedia device that provides a voice recognition service, and a control method thereof, are provided to enlarge the range of voice recognition. CONSTITUTION: The multimedia device extracts a feature vector from received audio data (S620). If a keyword corresponding to the extracted feature vector exists in a first database, the multimedia device controls execution of a voice recognition service according to the keyword (S630, S640). If no corresponding keyword exists in the first database, the multimedia device searches a second database (S650). If the corresponding keyword exists in the second database, the multimedia device controls execution of the voice recognition service according to that keyword (S660).

Description

MULTIMEDIA DEVICE FOR PROVIDING VOICE RECOGNITION SERVICE BY USING AT LEAST TWO OF DATABASE AND THE METHOD FOR CONTROLLING THE SAME

본 발명은 멀티미디어 장치 기술에 대한 것으로서, 보다 상세하게는 서로 다른 적어도 2개 이상의 데이터베이스를 이용하여 음성 인식 서비스를 제공하는 멀티미디어 디바이스 및 그 제어 방법에 대한 것이다.The present invention relates to a multimedia device technology, and more particularly, to a multimedia device for providing a voice recognition service using at least two different databases and a control method thereof.

영상표시기기는 예를 들어, 사용자가 시청할 수 있는 방송영상을 수신하여 처리하는 기능을 갖춘 장치이다. 영상표시기기는 예를 들어, 방송국에서 송출되는 방송신호 중 사용자가 선택한 방송을 디스플레이에 표시한다. 현재 방송은 전 세계적으로 아날로그 방송에서 디지털 방송으로 전환하고 있는 추세이다. The image display device is, for example, a device having a function of receiving and processing a broadcast image that a user can watch. The image display device displays, for example, a broadcast selected by a user on a display among broadcast signals transmitted from a broadcasting station. Currently, broadcasting is shifting from analog broadcasting to digital broadcasting worldwide.

디지털 방송은 디지털 영상 및 음성 신호를 송출하는 방송을 의미한다. 디지털 방송은 아날로그 방송에 비해, 외부 잡음에 강해 데이터 손실이 작으며, 에러 정정에 유리하며, 해상도가 높고, 선명한 화면을 제공한다. 또한, 디지털 방송은 아날로그 방송과 달리 양방향 서비스가 가능하다. 한편, 최근 들어, 영상표시기기의 기능 및 멀티미디어기기의 기능을 결합시킨 스마트 TV(Smart TV)가 논의되고 있다.Digital broadcasting refers to broadcasting for transmitting digital video and audio signals. Digital broadcasting is more resistant to external noise than analog broadcasting, so it has less data loss, is advantageous for error correction, has a higher resolution, and provides a clearer picture. In addition, unlike analog broadcasting, digital broadcasting is capable of bidirectional services. On the other hand, smart TV (Smart TV) that combines the function of the video display device and the function of the multimedia device has recently been discussed.

또한, 최근에 있어서, 종래 기술에 의한 일부 디바이스들은 음성 인식 기술을 선보이고 있다. 그러나, 제한된 데이터베이스만을 이용하는 한계가 있어서, 사용자에게 보다 폭넓은 범위의 음성 인식 서비스를 제공할 수 없는 문제점이 있었다.Also, in recent years, some devices according to the prior art have introduced voice recognition technology. However, there is a limitation of using only a limited database, and thus there is a problem in that a wider range of speech recognition services cannot be provided to a user.

본 발명의 일실시예는, 멀티미디어 디바이스의 내부 데이터베이스 및 외부 데이터베이스를 모두 이용하여, 음성 인식의 범위를 대폭 증대시키는 솔루션을 제공하고자 한다.One embodiment of the present invention is to provide a solution that greatly increases the scope of speech recognition by using both an internal database and an external database of a multimedia device.

Another embodiment of the present invention seeks to define a protocol that provides the service a user wants to access more quickly, by distinguishing voice recognition commands that control the multimedia device itself from voice recognition commands that provide keyword-related data.

그리고, 본 발명의 또 다른 일실시예는, 멀티미디어 디바이스와 통신 가능한 모바일 장치를 이용하여 외부 잡음에 강인한 음성 인식 성능을 담보하기 위한 기술을 제공하고자 한다.In addition, another embodiment of the present invention is to provide a technique for ensuring voice recognition performance that is robust to external noise by using a mobile device that can communicate with a multimedia device.

A method of controlling a multimedia device that provides a voice recognition service using at least two different databases, according to an embodiment of the present invention, includes: receiving voice data of a user of the multimedia device; extracting a feature vector required for recognition from the received voice data; determining, using a first database located in the multimedia device, whether a keyword corresponding to the extracted feature vector exists; if the keyword exists in the first database, controlling a voice recognition service according to the keyword to be executed; if the keyword does not exist in the first database, determining, using a second database located outside the multimedia device and connected through a network, whether a keyword corresponding to the extracted feature vector exists; and, if the keyword exists in the second database, controlling the voice recognition service according to the keyword to be executed.
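The two-stage lookup in this method can be sketched as follows. This is only an illustrative sketch: the function names (`recognize`, `make_table_lookup`), the exact-match tables, and the sample keywords are assumptions introduced here and do not appear in the specification, which leaves the matching procedure to the recognition unit.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

FeatureVector = Sequence[float]
Lookup = Callable[[FeatureVector], Optional[str]]


def recognize(features: FeatureVector,
              first_db_lookup: Lookup,
              second_db_lookup: Lookup) -> Optional[Tuple[str, str]]:
    """Return (keyword, source), or None if neither database has a match."""
    # First database: located inside the multimedia device.
    keyword = first_db_lookup(features)
    if keyword is not None:
        return keyword, "first"
    # Second database: outside the device, reached over the network.
    keyword = second_db_lookup(features)
    if keyword is not None:
        return keyword, "second"
    return None


def make_table_lookup(table: Dict[Tuple[float, ...], str]) -> Lookup:
    """Hypothetical stand-in for a database: exact match on rounded features."""
    def lookup(features: FeatureVector) -> Optional[str]:
        return table.get(tuple(round(x, 1) for x in features))
    return lookup


# Hypothetical contents, for illustration only.
first_db = make_table_lookup({(0.1, 0.2): "volume up"})
second_db = make_table_lookup({(0.5, 0.9): "weather"})
```

The second lookup runs only when the first one fails, which matches the order of the determining steps above.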

A multimedia device that provides a voice recognition service using at least two different databases, according to an embodiment of the present invention, includes: a voice sensor that receives voice data of a user of the multimedia device; a preprocessor that extracts a feature vector required for recognition from the received voice data; a recognition unit that analyzes the extracted feature vector using a first database located in the multimedia device; a controller that, when the analysis shows that the voice data includes a preset tag, controls a device control command corresponding to the voice data to be executed; and a network interface that, when the analysis shows that the voice data does not include the preset tag, transmits the extracted feature vector to an external device that includes a second database.

본 발명의 일실시예에 의하면, 멀티미디어 디바이스의 내부 데이터베이스 및 외부 데이터베이스를 모두 이용하여, 음성 인식의 범위를 대폭 증대시키는 솔루션을 제공한다.According to one embodiment of the present invention, using both an internal database and an external database of a multimedia device, a solution for greatly increasing the scope of speech recognition is provided.

According to another embodiment of the present invention, a protocol is defined that provides the service a user wants to access more quickly, by distinguishing voice recognition commands that control the multimedia device itself from voice recognition commands that provide keyword-related data.

그리고, 본 발명의 또 다른 일실시예에 의하면, 멀티미디어 디바이스와 통신 가능한 모바일 장치를 이용하여 외부 잡음에 강인한 음성 인식 성능을 담보하기 위한 기술을 제공한다.In addition, according to another embodiment of the present invention, a technology for securing voice recognition performance that is robust to external noise using a mobile device capable of communicating with a multimedia device is provided.

More specific effects of the invention are described in detail in the sections below.

FIG. 1 is a diagram schematically showing an example of an entire system including a multimedia device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a state in which a multimedia device, an external device, and a server are connected through a network according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating in more detail the constituent modules of a multimedia device according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a first embodiment of an internal database (DataBase) for voice recognition of a multimedia device according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a second embodiment of an internal database (DataBase) for voice recognition of a multimedia device according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating, in time order, a method of controlling a multimedia device according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating in more detail step S640 among the steps shown in FIG. 6.
FIG. 8 is a flowchart illustrating in more detail step S660 among the steps shown in FIG. 6.
FIG. 9 is a flowchart illustrating, in time order, a method of controlling a multimedia device according to another embodiment of the present invention.
FIG. 10 is a diagram illustrating an example of a voice recognition service of a multimedia device according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating another example of a voice recognition service of a multimedia device according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating yet another example of a voice recognition service of a multimedia device according to an embodiment of the present invention.

Hereinafter, various embodiments of the present invention are described in more detail with reference to the accompanying drawings. The suffixes "module" and "unit" attached to components in the following description are given merely for ease of writing this specification; the two may be used interchangeably, and each component may be implemented in hardware or software.

Meanwhile, the multimedia device described in this specification corresponds to various types of devices that, for example, receive and process broadcast data. The multimedia device may also correspond to a Connected TV. In addition to a broadcast reception function, a Connected TV is equipped with wired/wireless communication devices and the like, so it can offer interfaces that are more convenient to use, such as a handwriting input device, a touch screen, or a motion-recognition remote controller. With support for wired or wireless Internet, it can connect to the Internet and to computers and perform functions such as e-mail, web browsing, banking, or games. A standardized general-purpose OS may be used for these various functions.

Accordingly, because various applications can be freely added to or removed from, for example, a general-purpose OS kernel, the Connected TV can perform a variety of user-friendly functions. More specifically, the Connected TV may be, for example, a web TV, an Internet TV, an HBBTV, a smart TV, or a DTV, and in some cases the same approach is applicable to a smartphone.

Furthermore, embodiments of the present invention are described in detail below with reference to the accompanying drawings and the contents described in them, but the present invention is not restricted or limited by these embodiments.

The terms used in this specification were selected, as far as possible, from general terms currently in wide use, with their function in the present invention in mind; they may nevertheless vary with the intentions or customs of those skilled in the art or with the emergence of new technologies. In certain cases a term was chosen arbitrarily by the applicant, and its meaning is then given in the corresponding part of the description. The terms used in this specification should therefore be interpreted on the basis of their actual meaning and the overall content of this specification, not simply their names.

FIG. 1 is a diagram schematically showing an example of an entire broadcasting system including a multimedia device according to an embodiment of the present invention. The multimedia device of FIG. 1 may correspond to, for example, a Connected TV, but the scope of the present invention is not limited to a Connected TV; in principle, the scope is to be determined by the claims.

As shown in FIG. 1, the entire system including a multimedia device according to an embodiment of the present invention can be divided into a content provider (CP) 10, a service provider (SP) 20, a network provider (NP) 30, and an HNED 40. The HNED 40 corresponds to, for example, the client 100, which is a multimedia device according to an embodiment of the present invention.

The content provider 10 produces and provides various content. As shown in FIG. 1, examples of the content provider 10 include a terrestrial broadcaster, a cable system operator (SO) or multiple system operator (MSO), a satellite broadcaster, and an Internet broadcaster. In addition to broadcast content, the content provider 10 may also provide various applications.

서비스 제공자(20)는, 컨텐츠 제공자(10)가 제공하는 컨텐츠들을 서비스 패키지화하여 제공할 수 있다. 예를 들어, 도 1의 서비스 제공자(20)는, 제1 지상파 방송, 제2 지상파 방송, 케이블 MSO, 위성 방송, 다양한 인터넷 방송, 애플리케이션 등을 패키지화하여 사용자에게 제공할 수 있다.The service provider 20 may provide a service package of contents provided by the content provider 10. For example, the service provider 20 of FIG. 1 may package and provide a first terrestrial broadcast, a second terrestrial broadcast, a cable MSO, satellite broadcast, various internet broadcasts, applications, and the like to a user.

네트워크 제공자(30)는, 서비스를 클라이언트(100)에게 제공하기 위한 네트워크 망을 제공할 수 있다. 클라이언트(100)는 홈 네트워크(Home Network End User;HNED)를 구축하여 서비스를 제공받을 수도 있다.The network provider 30 may provide a network for providing a service to the client 100. The client 100 may establish a home network end user (HNED) to receive a service.

한편, 클라이언트(100)도 네트워크를 통해, 컨텐츠를 제공하는 것이 가능하다. 이러한 경우, 상술한 바와 달리, 역으로, 클라이언트(100)가 컨텐츠 제공자가 될 수 있으며, 컨텐츠 제공자(10)가 클라이언트(100)로부터 컨텐츠를 수신할 수도 있다. 이와 같이 설계된 경우, 양방향 컨텐츠 서비스 또는 데이터 서비스가 가능한 장점이 있다.On the other hand, the client 100 can also provide content through the network. In this case, unlike the above, the client 100 may be a content provider, and the content provider 10 may receive content from the client 100. In the case of designing as described above, an interactive content service or a data service is possible.

FIG. 2 is a diagram illustrating a state in which a multimedia device, external devices, and a server are connected through a network according to an embodiment of the present invention. With reference to FIG. 2, the following outlines how the multimedia device uses an internal database and external databases while executing a voice recognition service.

본 발명의 일실시예에 의한 멀티미디어 디바이스(200)는 음성 인식이 가능한 디바이스로서, 예컨대 커넥티드 TV, 스마트 TV, 웹 TV, 인터넷 TV, 네트워크 TV 등에 해당한다. 나아가, 상기 멀티미디어 디바이스(200)는, 음성 인식 과정에서 필요한 내부 데이터베이스(201)를 포함하고 있다. 그러나, 전술한 내부 데이터베이스(201)는 상대적으로 제한된 데이터만을 구비하고 있으므로, 음성 인식에 따른 기능 실행 역시 제한적일 수 밖에 없다.The multimedia device 200 according to an embodiment of the present invention is a device capable of speech recognition, and corresponds to, for example, a connected TV, a smart TV, a web TV, an Internet TV, a network TV, and the like. In addition, the multimedia device 200 includes an internal database 201 necessary for a voice recognition process. However, since the above-described internal database 201 includes only relatively limited data, the execution of functions according to voice recognition is also limited.

To solve this problem, the multimedia device 200 according to an embodiment of the present invention is connected to a database 210 of a first device and a database 220 of a second device, both located outside the multimedia device. The first and second devices correspond to, for example, peripheral devices based on DLNA (Digital Living Network Alliance). Alternatively, the first and second devices correspond to, for example, equipment connected via USB or HDMI CEC.

그리고, 상기 멀티미디어 디바이스(200)는, 인터넷 등의 네트워크로 연결된 서버(230)와 통신하여, 음성 인식 과정에서 외부 데이터베이스(231) 또한 이용할 수 있도록 설계된다.In addition, the multimedia device 200 is designed to communicate with a server 230 connected through a network such as the Internet so that an external database 231 may also be used in a voice recognition process.

따라서, 이와 같이 설계하는 경우, 멀티미디어 디바이스(200)는 내부의 제한된 데이터베이스(201)에만 의존하지 않고, 외부의 데이터베이스들에 액세스 가능하게 되므로, 보다 다양한 음성 인식 서비스가 가능해 지는 장점이 있다. 또한, 특정 컨디션에 따라, 내부 데이터베이스 또는 외부 데이터베이스를 선택적으로 사용하도록 설계함으로써, 처리 속도도 함께 개선되는 효과가 있다. 이하, 도 3을 참조하여 본 발명의 일실시예에 의한 멀티미디어 디바이스의 동작에 대해 보다 상세히 후술하도록 하겠다.Therefore, in this design, the multimedia device 200 does not rely only on the internal limited database 201, but can access external databases, thereby enabling a variety of voice recognition services. In addition, according to a specific condition, by designing to use an internal database or an external database selectively, there is an effect that the processing speed is also improved. Hereinafter, the operation of the multimedia device according to an embodiment of the present invention will be described in more detail with reference to FIG. 3.

FIG. 3 is a diagram illustrating in more detail the constituent modules of a multimedia device according to an embodiment of the present invention. With reference to FIG. 3, the following describes in detail how the multimedia device executes a voice recognition service using an internal or external database.

As shown in FIG. 3, the multimedia device 300 according to an embodiment of the present invention includes a voice sensor 301, a preprocessor 302, a recognition unit 303, a controller 304, a network interface 305, and a display unit 306. The modules shown in FIG. 3 are only one embodiment, and the scope of the present invention should in principle be determined by the claims. The voice sensor 301 inside the multimedia device 300 may be designed to detect the user's voice, but the user's voice may also be detected using the mobile device 310 shown in FIG. 3. In that design, the user can deliver voice data through a voice detecting sensor attached to the mobile device 310, which is closer to the user's mouth, so ambient noise and the audio signal of the multimedia device 300 itself can be excluded. The mobile device 310, which can communicate with the multimedia device 300, may be, for example, a mobile phone, a smartphone, a laptop, or a tablet PC.

한편, 본 발명의 일실시예에 의한 멀티미디어 디바이스(300)의 음성 인식 단계는 크게 2가지 영역으로 나누어 설명할 수 있다. 즉, 도 3에 도시된 전처리부(302) 및 인식부(303)에서 주요 역할을 수행하게 된다.On the other hand, the speech recognition step of the multimedia device 300 according to an embodiment of the present invention can be largely divided into two areas. In other words, the preprocessor 302 and the recognizer 303 shown in FIG. 3 play a major role.

The preprocessor 302 extracts the feature vectors required for recognition from the voice uttered by the user, and the recognition unit 303 analyzes the feature vectors to obtain a voice recognition result. For example, when voice input through a microphone or the like enters the multimedia device 300, the preprocessor 302 extracts, at fixed intervals (for example, every 1/100 second), feature vectors that can express phonetic characteristics.
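To make the fixed-interval extraction concrete, the sketch below splits a sampled signal into overlapping frames that start every 1/100 second, so that one feature vector could then be computed per frame. The 25 ms frame length, the 16 kHz sample rate in the test, and the function name are assumptions made for illustration; the specification states only the 1/100 second interval.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float],
                      sample_rate: int,
                      hop_seconds: float = 0.01,     # 1/100 second, as in the text
                      frame_seconds: float = 0.025   # assumed frame length
                      ) -> List[List[float]]:
    """Cut the signal into frames that start every hop_seconds."""
    hop = int(round(sample_rate * hop_seconds))
    size = int(round(sample_rate * frame_seconds))
    frames: List[List[float]] = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + size <= len(samples):
        frames.append(list(samples[start:start + size]))
        start += hop
    return frames
```

Each returned frame would then be passed to a feature extractor, which is outside the scope of this sketch.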

These feature vectors should represent phonetic characteristics well while being insensitive to other factors such as background noise, speaker differences, and manner of pronunciation. As a result, the recognition unit 303 can concentrate its analysis purely on phonetic characteristics. The extracted feature vectors are passed to the recognition unit 303 and compared with pre-stored acoustic models, and the result goes through language processing and is output as the finally recognized sentence.

특히, 미리 저장된 음향 모델과 비교하는 과정에서 데이터베이스가 사용되며, 본 발명의 일실시예에 의한 멀티미디어 디바이스는 내부 데이터베이스 및 외부 데이터베이스를 동시에 이용 가능하도록 설계된다. 이에 대해서는 다시 상세히 설명하도록 하겠다.In particular, a database is used in the process of comparing with a previously stored acoustic model, the multimedia device according to an embodiment of the present invention is designed to be able to use the internal database and the external database at the same time. This will be described in detail later.

Meanwhile, there are several kinds of feature vector extraction methods, depending on how they imitate the way humans perceive speech. Representative examples are: LPC (Linear Predictive Coding) extraction, which analyzes all frequency bands with equal weight; MFCC (Mel Frequency Cepstral Coefficients) extraction, which reflects the fact that human speech perception is not linear but follows the mel scale, which resembles a logarithmic scale; high-frequency emphasis extraction, which emphasizes high-frequency components to distinguish speech from noise clearly; and window function extraction, which minimizes the distortion caused by discontinuities when speech is divided into short segments for analysis.
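Three of the ideas named above can be illustrated with short formulas: high-frequency emphasis as a first-order filter, a window function applied to each short segment, and the mel scale used by MFCC. The sketch below uses commonly used forms (a pre-emphasis coefficient of 0.97, the Hamming window, and the 2595·log10(1 + f/700) mel formula); these specific choices are assumptions, since the specification does not name a coefficient, a window, or a mel formula.

```python
import math
from typing import List, Sequence


def pre_emphasize(frame: Sequence[float], alpha: float = 0.97) -> List[float]:
    """High-frequency emphasis: y[n] = x[n] - alpha * x[n-1]."""
    if not frame:
        return []
    return [frame[0]] + [frame[n] - alpha * frame[n - 1]
                         for n in range(1, len(frame))]


def hamming_window(frame: Sequence[float]) -> List[float]:
    """Taper both ends of a frame to reduce edge discontinuities."""
    count = len(frame)
    if count < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (count - 1)))
            for n, x in enumerate(frame)]


def hz_to_mel(hz: float) -> float:
    """Map a frequency in Hz to the (roughly logarithmic) mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)
```

A constant input illustrates the emphasis step: after the first sample, a flat signal is reduced to nearly zero, because it contains no high-frequency content.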

그리고, 상기 인식부(303)는 상기 멀티미디어 디바이스(300)의 내부 또는 외부에 위치한 음성 인식 관련 데이터베이스의 음성학적 정보와 상기 전처리부(302)에서 넘어온 특징 벡터와의 비교를 통해 음성 인식 결과를 획득하게 된다.The recognition unit 303 obtains a voice recognition result by comparing the phonetic information of a voice recognition related database located inside or outside the multimedia device 300 with a feature vector passed from the preprocessor 302. Done.

The database search for voice recognition can be broadly divided into word-level search and sentence-level search. In word-level search, the obtained feature vectors are compared with word models stored in the database, that is, the phonetic characteristics of each word, or with acoustic models of shorter, phoneme-level units, to extract the possible candidate words. Because this is a process of finding suitable candidate patterns by comparing patterns against acoustic models stored in advance in the database, it is also called pattern classification.

The result of pattern classification is passed to sentence-level search as a series of candidate words or candidate phonemes. Based on the information in these candidates, this stage evaluates grammatical structure, overall sentence context, and fit to a particular topic to decide which word or phoneme is most suitable. For example, suppose that in the sentence '나는 간다' ("I go"), unclear pronunciation makes it hard to tell '는' from '능'.

In this case, the voice recognition system produces two candidate words, '는' and '능', from pattern classification. In the sentence-level search that follows, analysis of the sentence structure reveals that '는' serves as a particle in the sentence, and since no particle '능' exists, '능' is excluded from the candidates.

즉, 어휘 및 문법 구조에의 제약을 통해 인식성능을 향상시키는 과정이다. 이 과정에서는 문법 구조 뿐만 아니라 의미 정보도 함께 이용되며 따라서 언어처리 과정이라고도 한다. 또한, 패턴 분류와 언어처리 과정에서 이용되는 데이터들은 미리 컴퓨터에 의해 학습되어 데이터베이스에 저장된다. In other words, it is a process of improving the recognition performance through the restriction on the vocabulary and grammar structure. In this process, not only grammatical structure but also semantic information are used together, so it is also called language processing process. In addition, data used in pattern classification and language processing are learned by a computer in advance and stored in a database.
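The '는' versus '능' example can be reduced to a very small sketch: pattern classification yields two candidates, and a grammatical constraint keeps only those that are valid particles. The particle list below is a deliberately tiny, hypothetical lexicon, and the function name is an assumption; an actual language-processing stage would rely on learned grammatical and semantic data rather than a fixed set.

```python
from typing import List, Set

# Hypothetical, incomplete particle lexicon used only for this illustration.
KNOWN_PARTICLES: Set[str] = {"는", "은", "이", "가", "을", "를"}


def filter_particle_candidates(candidates: List[str],
                               particles: Set[str] = KNOWN_PARTICLES) -> List[str]:
    """Keep only the candidates that can act as a particle in the sentence."""
    return [c for c in candidates if c in particles]


# Candidates produced by pattern classification for the unclear syllable.
candidates = ["는", "능"]
print(filter_particle_candidates(candidates))  # prints ['는']
```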

The representative technique used in the recognition unit 303 is the HMM (Hidden Markov Model), which is based on statistical pattern recognition and integrates word-level and sentence-level search into a single optimization process. This method stores the statistical information of the patterns corresponding to each speech unit as a probability model; when an unknown input pattern arrives, it computes the probability that each model could produce that pattern and thereby finds the speech unit that best fits the pattern.
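The scoring step described above, computing for each model the probability that it produced the input pattern and choosing the best model, can be sketched with the forward algorithm on a discrete HMM. The two toy models, their probabilities, and the use of discrete observation symbols are assumptions made for illustration; a practical recognizer would score continuous feature vectors and typically work in the log domain.

```python
from typing import Dict, List, Sequence, Tuple

# (start probabilities, transition matrix, emission matrix)
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_likelihood(start: List[float],
                       trans: List[List[float]],
                       emit: List[List[float]],
                       observations: Sequence[int]) -> float:
    """Probability that the model generates the observation sequence."""
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
                 for s in range(n_states)]
    return sum(alpha)


def best_unit(models: Dict[str, Model], observations: Sequence[int]) -> str:
    """Name of the speech-unit model with the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(*models[name], observations))


# Two hypothetical speech-unit models over observation symbols 0 and 1.
_trans = [[0.5, 0.5], [0.0, 1.0]]
models: Dict[str, Model] = {
    "A": ([1.0, 0.0], _trans, [[0.9, 0.1], [0.2, 0.8]]),
    "B": ([1.0, 0.0], _trans, [[0.1, 0.9], [0.8, 0.2]]),
}
```

With these toy values, the sequence `[0, 1]` is best explained by model "A" and the sequence `[1, 0]` by model "B".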

Referring again to FIG. 3, the following describes in detail how the recognition unit 303 accesses the databases required for voice recognition (in particular, an embodiment of the present invention uses a hybrid DB that combines an internal database and an external database) and how the voice recognition result is processed.

도 3에 도시된 바와 같이, 상기 보이스 센서(301)는, 상기 멀티미디어 디바이스(300)의 유저의 음성 데이터를 수신하고, 상기 전처리부(302)는 상기 수신된 음성 데이터로부터 인식에 필요한 특징 벡터를 추출한다. 또는, 상기 전처리부(302)는 외부 모바일 디바이스(310)로부터 음성 데이터를 수신하도록 설계할 수도 있다.As shown in FIG. 3, the voice sensor 301 receives voice data of a user of the multimedia device 300, and the preprocessor 302 receives a feature vector required for recognition from the received voice data. Extract. Alternatively, the preprocessor 302 may be designed to receive voice data from the external mobile device 310.

상기 인식부(303)는, 우선 상기 멀티미디어 디바이스(300)내 위치한 제1데이터베이스를 이용하여, 상기 추출된 특징 벡터를 분석한다. 상기 분석 결과 상기 음성 데이터가 기설정된 태그를 포함하고 있는 경우, 상기 제어부(304)는 상기 음성 데이터에 대응하는 디바이스 컨트롤 명령이 실행되도록 제어한다.The recognition unit 303 first analyzes the extracted feature vector using a first database located in the multimedia device 300. When the voice data includes a preset tag as a result of the analysis, the controller 304 controls the device control command corresponding to the voice data to be executed.

반면, 상기 분석 결과 상기 음성 데이터가 기설정된 태그를 포함하고 있지 않은 경우, 상기 네트워크 인터페이스(305)는 상기 추출된 특징 벡터를, 제2데이터베이스를 포함하는 외부 디바이스로 전송한다. 상기 외부 디바이스는, 예를 들어 도 2에 도시된 서버(230) 등에 해당한다.On the other hand, if the voice data does not include a predetermined tag as a result of the analysis, the network interface 305 transmits the extracted feature vector to an external device including a second database. The external device corresponds to, for example, the server 230 illustrated in FIG. 2.
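The division of work among the recognition unit 303, the controller 304, and the network interface 305 might be sketched as follows. This is only an illustration under simplifying assumptions: the dictionary `FIRST_DB`, the tag value, and the callback names are hypothetical, and a real recognizer would score feature vectors against acoustic models rather than look them up exactly.

```python
from typing import Callable, Dict, Sequence, Tuple

PRESET_TAG = "TV"  # assumed tag value, borrowed from the FIG. 4 example

# Hypothetical stand-in for the first (internal) database:
# maps a feature vector to the keyword it represents.
FIRST_DB: Dict[Tuple[float, ...], str] = {
    (0.1, 0.2): "TV volume up",
    (0.3, 0.4): "TV power off",
}

def route(features: Sequence[float],
          execute_command: Callable[[str], None],
          send_to_external: Callable[[Sequence[float]], None]) -> str:
    """Run a tagged command locally, otherwise forward the feature vector."""
    keyword = FIRST_DB.get(tuple(features))
    if keyword is not None and keyword.split()[0] == PRESET_TAG:
        execute_command(keyword)   # role of the controller 304
        return "internal"
    send_to_external(features)     # role of the network interface 305
    return "external"
```

Note that the feature vector, not the raw audio, is what gets forwarded, which matches the description of the network interface 305 above.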

Accordingly, the internal DB serves as the speech recognition database for keywords that control the functions of the multimedia device 300 itself, so commands can be executed more quickly; and because the set of functions is limited, there is no need to use the external DB for them. Furthermore, the external DB serves as the speech recognition database for keywords unrelated to the functions of the multimedia device 300, which makes a wider variety of voice recognition services possible. The external DB differs from the internal DB in that it is easy to update and holds a relatively large amount of data.

The preset tag described above consists of, for example, a specific combination of letters for executing a command that controls the multimedia device 300. This is described in more detail below with reference to FIGS. 4 and 5.

In addition, the multimedia device 300 according to another embodiment of the present invention further includes a display unit 306, as shown in FIG. 3. When the current state of the multimedia device 300 is a typing mode, the display unit 306 displays the keyword itself, as received from the external device described above, in a typing area. When the current state of the multimedia device 300 is not the typing mode, the display unit 306 is designed to display a content list related to the keyword received from the external device. The external device corresponds to, for example, a server (230 in FIG. 2) connected to the multimedia device 300 over a network. The screens output by the display unit 306 are described in more detail below with reference to FIGS. 10 to 12.

FIG. 4 shows a first embodiment of an internal database for speech recognition in a multimedia device according to an embodiment of the present invention. With reference to FIG. 4, the following describes how the multimedia device performs a voice recognition service using an internal database that stores the common tag "TV".

As shown in FIG. 4, keywords that control the functions of the multimedia device itself (for example, a TV) are all given the common tag "TV" and stored in the internal database. Accordingly, when the spoken word "TV" is recognized, the internal database is accessed quickly, and when a word other than "TV" is recognized, the internal DB or the external DB is accessed so that a wider variety of information can be collected.

In addition, the individual command spoken after "TV" (volume up, volume down, power on, or power off, as shown in FIG. 4) is recognized and the corresponding control command is then executed, which reduces the time wasted on unnecessary access to the external database.

FIG. 5 shows a second embodiment of an internal database for speech recognition in a multimedia device according to an embodiment of the present invention. With reference to FIG. 5, the following describes how the multimedia device performs a voice recognition service using an internal database that stores the common tag "device".

As shown in FIG. 5, keywords that control the functions of the multimedia device itself are all given the common tag "device" and stored in the internal database. Accordingly, when the spoken word "device" is recognized, the internal database is accessed quickly, and when a word other than "device" is recognized, the internal DB or the external DB is accessed so that a wider variety of information can be collected.

In addition, the individual command spoken after "device" (channel up, channel down, power on, or power off, as shown in FIG. 5) is recognized and the corresponding control command is then executed, which reduces the time wasted on unnecessary access to the external database.
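The tagged internal databases of FIGS. 4 and 5 can be pictured as a small lookup table keyed first by the common tag and then by the command spoken after it. The sketch below is an assumption about one possible layout; the action identifiers and the function name `lookup_command` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical internal database: common tag -> spoken command -> action id.
INTERNAL_DB: Dict[str, Dict[str, str]] = {
    "tv": {                       # FIG. 4 style entries
        "volume up": "VOLUME_UP",
        "volume down": "VOLUME_DOWN",
        "power on": "POWER_ON",
        "power off": "POWER_OFF",
    },
    "device": {                   # FIG. 5 style entries
        "channel up": "CHANNEL_UP",
        "channel down": "CHANNEL_DOWN",
        "power on": "POWER_ON",
        "power off": "POWER_OFF",
    },
}

def lookup_command(utterance: str) -> Optional[str]:
    """Return an action id if the utterance is '<tag> <command>', else None."""
    tag, _, command = utterance.strip().lower().partition(" ")
    return INTERNAL_DB.get(tag, {}).get(command)
```

Because an untagged utterance fails at the first dictionary lookup, the device can decide quickly that a request is not a local command, which is the time saving the description refers to.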

FIG. 6 is a flowchart showing, in time order, a method of controlling a multimedia device according to an embodiment of the present invention. With reference to FIG. 6, the following describes how the multimedia device provides a voice recognition service by selectively using an internal DB and an external DB.

A multimedia device that provides a voice recognition service using at least two different databases receives voice data of a user of the multimedia device (S610). The multimedia device then extracts a feature vector required for recognition from the received voice data (S620).

Using a first database located in the multimedia device, the multimedia device determines whether a keyword corresponding to the extracted feature vector exists (S630).

If the determination (S630) shows that the keyword exists in the first database, the multimedia device controls the voice recognition service corresponding to the keyword so that it is executed (S640).

If, on the other hand, the determination (S630) shows that the keyword does not exist in the first database, the multimedia device accesses a second database that is located outside the multimedia device and connected via a network (S650). Using the second database, the multimedia device then determines whether a keyword corresponding to the extracted feature vector exists.

If the keyword exists in the second database, the multimedia device controls the voice recognition service corresponding to the keyword so that it is executed (S660).
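The two-tier lookup of steps S630 to S660 might look like the following sketch. Both databases are modeled as in-memory dictionaries purely for illustration; in the described system the second database sits behind a network connection, and matching would be approximate rather than an exact key lookup. The function and type names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Features = Tuple[float, ...]

def find_keyword(features: Sequence[float],
                 first_db: Dict[Features, str],
                 second_db: Dict[Features, str]) -> Optional[Tuple[str, str]]:
    """Return (step, keyword) for the database that matched, or None."""
    key = tuple(features)             # feature vector from S620
    if key in first_db:               # S630: check the internal database first
        return ("S640", first_db[key])
    if key in second_db:              # S650: fall back to the external database
        return ("S660", second_db[key])
    return None                       # no match in either database
```

Checking the internal database first means the slower external lookup only happens for utterances the device cannot resolve on its own.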

FIG. 7 is a flowchart showing step S640 of FIG. 6 in more detail. The sub-steps of step S640 are described below with reference to FIG. 7.

As shown in FIG. 7, step S640 described above further includes determining whether the keyword includes a preset tag (S641). The tag has been described in detail with reference to FIGS. 4 and 5, so the description is not repeated here.

Step S640 is further designed to include controlling the device so that the device control command corresponding to the keyword is executed (S642) if the determination (S641) shows that the tag is included, and displaying a content list related to the keyword (S643) if the determination (S641) shows that the tag is not included. Step S642 can be understood further with reference to FIG. 10, and step S643 with reference to FIG. 11.
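As a rough illustration of the branch in FIG. 7, the sketch below returns which sub-step applies to a keyword found in the first database. The default tag value, the returned tuple layout, and the function name are assumptions made for the example.

```python
from typing import Tuple

def handle_internal_keyword(keyword: str, tag: str = "TV") -> Tuple[str, str, str]:
    """Decide between a device command (S642) and a content list (S643)."""
    words = keyword.split()
    if words[:1] == [tag]:                      # S641: does the keyword carry the tag?
        command = " ".join(words[1:])           # the part spoken after the tag
        return ("S642", "execute", command)
    return ("S643", "show_content_list", keyword)
```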

FIG. 8 is a flowchart showing step S660 of FIG. 6 in more detail. The sub-steps of step S660 are described below with reference to FIG. 8.

As shown in FIG. 8, step S660 described above further includes determining whether the current state of the multimedia device is a typing mode (S661).

Step S660 is further designed to include displaying the keyword itself in a typing area (S662) if the determination (S661) shows a typing mode, and displaying a content list related to the keyword (S663) if the determination (S661) shows that the device is not in a typing mode. Step S662 can be understood further with reference to FIG. 12, and step S663 with reference to FIG. 11.
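The mode-dependent output of FIG. 8 can be sketched as a single branch on the device state. The dictionary returned here and the `find_related` callback are hypothetical placeholders for whatever the display unit actually renders.

```python
from typing import Callable, Dict, List, Union

def handle_external_keyword(keyword: str, typing_mode: bool,
                            find_related: Callable[[str], List[str]]
                            ) -> Dict[str, Union[str, List[str]]]:
    """Show the keyword itself in typing mode, otherwise a related content list."""
    if typing_mode:                                              # S661
        return {"step": "S662", "typing_area": keyword}
    return {"step": "S663", "content_list": find_related(keyword)}
```

The same recognized keyword therefore produces different output depending only on the current state of the device.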

FIG. 9 is a flowchart showing, in time order, a method of controlling a multimedia device according to another embodiment of the present invention. With reference to FIG. 9, the method is described below in two parts, an embedded area and a server area. The embedded area refers to, for example, the part that processes speech recognition using the internal database described above, and the server area refers to, for example, the part that processes speech recognition using the external database described above.

The multimedia device according to another embodiment of the present invention receives voice data of a user (S910). The multimedia device then makes a first determination of whether the received voice data exists in the internal DB (S920). The internal DB is designed to include a common tag, for example as shown in FIG. 4 or FIG. 5.

If the determination (S920) shows that the voice data exists in the internal DB, it is further determined whether the recognized voice data corresponds to a command (S930). If the determination (S930) shows that it does, the corresponding function of the multimedia device is executed automatically (S940).

If, on the other hand, the determination (S930) shows that it does not correspond to a command, search results related to the recognized voice data are displayed (S980).

If the determination (S920) shows that the voice data does not exist in the internal DB, the recognized voice data is transmitted to an external server (S950). The device is designed so that the database of the external server can be used at this point.

It is then determined whether the current state of the multimedia device is a typing mode (S960). If the determination (S960) shows a typing mode, the recognized voice data itself is dictated (S970); if not, search results related to the recognized voice data are displayed (S980).
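The branching of FIG. 9 can be traced with a small helper that lists the steps visited for a given combination of conditions. The three boolean inputs stand in for the outcomes of S920, S930, and S960; the function name is hypothetical.

```python
from typing import List

def trace_steps(in_internal_db: bool, is_command: bool, typing_mode: bool) -> List[str]:
    """Return the sequence of FIG. 9 steps taken for the given conditions."""
    steps = ["S910", "S920"]                 # receive voice data, check internal DB
    if in_internal_db:
        steps.append("S930")                 # embedded area: is it a command?
        steps.append("S940" if is_command else "S980")
        return steps
    steps += ["S950", "S960"]                # server area: send out, then check mode
    steps.append("S970" if typing_mode else "S980")
    return steps
```

For example, `trace_steps(False, False, True)` follows the server area through to dictation, while both areas can end at the search-result display S980.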

To summarize, when the speech recognition result contains the common delimiter (tag) (for example, "TV on", "TV off", or "TV volume up"), the device control corresponding to the command is performed without delay.

When, on the other hand, the speech recognition result does not contain the common delimiter (for example, "TV"), as with utterances such as "무안 도전" or "소년 시대", the external server is accessed and a search is performed.

In addition, when the word match rate against the command DB (embedded type) is low during speech recognition (for example, for a word that is not in the command list shown in FIG. 4 or FIG. 5), the voice data is transmitted to the server and processed in the server area.

When the speech recognition result calls for a content search, the search is performed using the database inside the multimedia device or that of another device located nearby (for example, a PC, set-top box, mobile device, or network HDD). In parallel with this process, the device may also be designed to transmit the voice data to the server so that a web search over the wide-area network is performed at the same time.
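Running the nearby-device search and the web search at the same time could be sketched with a thread pool, as below. The source callables are hypothetical stand-ins for real search backends, and the sketch assumes each one takes a query string and returns a list of result titles.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def search_everywhere(query: str,
                      sources: Dict[str, Callable[[str], List[str]]]
                      ) -> Dict[str, List[str]]:
    """Query every source concurrently and collect the results by source name."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit all searches first so they run in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        # Then wait for each result in turn.
        return {name: future.result() for name, future in futures.items()}
```

With this arrangement the total wait is governed by the slowest source rather than by the sum of all of them.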

The input state of the multimedia device is checked first in order to determine the user's intent, that is, whether the purpose is simple dictation or a search for related content.

FIG. 10 shows an example of the voice recognition service of a multimedia device according to an embodiment of the present invention. With reference to FIG. 10, the following describes how, within the voice recognition service, a specific function is performed using the internal database.

First, as shown in FIG. 10(a), assume that the volume of the multimedia device 1000 according to an embodiment of the present invention is set to level "3" (1030). When the user 1010 of the multimedia device 1000 says "TV volume up" (1020), the corresponding voice data is transmitted to the multimedia device 1000. The multimedia device 1000 is designed as shown in FIG. 3.

Next, the multimedia device 1050, having recognized the voice data using the internal database, automatically raises the TV volume by one step to level "4" (1060), as shown in FIG. 10(b). The internal database has been fully described with reference to FIGS. 4 and 5 above, so the description is not repeated here.

FIG. 11 shows another example of the voice recognition service of a multimedia device according to an embodiment of the present invention. With reference to FIG. 11, the following describes how, within the voice recognition service, a related content list is provided using the external database.

First, as shown in FIG. 11(a), when the user 1110 of the multimedia device 1100 according to an embodiment of the present invention says "마이크 잭스" (1120), the corresponding voice data is transmitted to the multimedia device 1100. The multimedia device 1100 is designed as shown in FIG. 3.

Next, the multimedia device 1150, having recognized the voice data using the external database, is designed to automatically display a content list (1160, 1170, 1180, 1190) related to the recognized voice data, as shown in FIG. 11(b).

FIG. 12 shows yet another example of the voice recognition service of a multimedia device according to an embodiment of the present invention. With reference to FIG. 12, the following describes how, within the voice recognition service, dictation of the recognized voice data is carried out using the external database.

First, as shown in FIG. 12(a), when the user 1210 of the multimedia device 1200 according to an embodiment of the present invention says "how to grow taller" (키 크는 방법) (1220), the corresponding voice data is transmitted to the multimedia device 1200. The multimedia device 1200 is designed as shown in FIG. 3.

Next, the multimedia device 1250, having recognized the voice data using the external database, outputs the recognized voice data (1260) as it is, as shown in FIG. 12(b). Unlike in FIG. 11, no related content list is displayed, because the current state of the multimedia device corresponds to a keyword typing mode, as shown in FIG. 12. Providing different kinds of output according to the current mode or state of the multimedia device is thus an effect particular to the present invention.

According to the embodiments of the present invention described above, commands are executed using the internal database, which keeps execution time short, while requests that require a large amount of data processing use the external database, which has the advantage of displaying more dynamic content search results.

Both a product invention and a method invention are described in this specification, and the descriptions of the two may be applied to supplement each other as needed. In addition, although the drawings are described separately for convenience, combining the drawings or embodiments to implement another embodiment also falls within the scope of the present invention.

The method inventions according to the present invention may all be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to a limited number of embodiments and drawings, the present invention is not limited to those embodiments, and a person of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims below and by their equivalents.

200: multimedia device
210: first device database (DB)
220: second device database (DB)
230: server

Claims

A method of controlling a multimedia device that provides a voice recognition service using at least two different databases, the method comprising:
receiving voice data of a user of the multimedia device;
extracting a feature vector required for recognition from the received voice data;
determining, using a first database located in the multimedia device, whether a keyword corresponding to the extracted feature vector exists;
if the keyword exists in the first database, controlling a voice recognition service corresponding to the keyword to be executed;
if the keyword does not exist in the first database, determining, using a second database located outside the multimedia device and connected via a network, whether a keyword corresponding to the extracted feature vector exists; and
if the keyword exists in the second database, controlling the voice recognition service corresponding to the keyword to be executed.

The method of claim 1,
wherein the step of controlling the voice recognition service corresponding to the keyword to be executed if the keyword exists in the first database comprises:
determining whether the keyword includes a preset tag;
if the keyword includes the preset tag, controlling a device control command corresponding to the keyword to be executed; and
if the keyword does not include the preset tag, displaying a content list related to the keyword.

The method of claim 1,
wherein the step of controlling the voice recognition service corresponding to the keyword to be executed if the keyword exists in the second database comprises:
determining whether a current state of the multimedia device is a typing mode;
if the current state is the typing mode, displaying the keyword itself in a typing area; and
if the current state is not the typing mode, displaying a content list related to the keyword.

The method of claim 1,
wherein the second database
is a database managed by a server connected to the multimedia device via a network.

The method of claim 1,
wherein the receiving comprises
receiving voice data acquired by a voice detection sensor attached to a mobile device capable of communicating with the multimedia device.

A computer-readable recording medium in which a program for executing the method of any one of claims 1 to 5 is recorded.

A multimedia device that provides a voice recognition service using at least two different databases, the multimedia device comprising:
a voice sensor that receives voice data of a user of the multimedia device;
a preprocessor that extracts a feature vector required for recognition from the received voice data;
a recognition unit that analyzes the extracted feature vector using a first database located in the multimedia device;
a controller that, if the analysis shows that the voice data includes a preset tag, controls a device control command corresponding to the voice data to be executed; and
a network interface that, if the analysis shows that the voice data does not include the preset tag, transmits the extracted feature vector to an external device that includes a second database.

The multimedia device of claim 7,
further comprising a display unit that displays a keyword received from the external device, as it is, in a typing area if a current state of the multimedia device is a typing mode, and
displays a content list related to the keyword received from the external device if the current state of the multimedia device is not the typing mode.

The multimedia device of claim 8,
wherein the external device
is a server connected to the multimedia device via a network.

The multimedia device of claim 7,
wherein the preset tag
consists of a specific combination of letters for executing the device control command.

The multimedia device of claim 7,
wherein the multimedia device
is at least one of a network TV, a connected TV, a smart TV, a web TV, and an Internet TV.