KR101687614B1

KR101687614B1 - Method for voice recognition and image display device thereof

Info

Publication number: KR101687614B1
Application number: KR1020100075173A
Authority: KR
Inventors: 조택일; 윤종현
Original assignee: 엘지전자 주식회사
Priority date: 2010-08-04
Filing date: 2010-08-04
Publication date: 2016-12-19
Also published as: KR20120013032A

Abstract

The present invention provides a voice recognition method, and an image display device therefor, that increase the voice recognition rate so that a user can control the image display device more conveniently.
The voice recognition method and the image display device according to an embodiment of the present invention can increase the voice recognition rate. Accordingly, the image display device can be remotely controlled more accurately.

Description

Voice recognition method and image display device therefor

The present invention relates to a voice recognition method and an image display device therefor.

More particularly, it relates to a voice recognition method with a high voice recognition rate and an image display device therefor.

An image display device is a device capable of displaying images, such as a digital television, a set-top box, a PVR (Personal Video Recorder), or a DVD (Digital Video Disc) player. A user can operate such an image display device from a distance by using a remote control device or the like.

Digital broadcasting technology has been developed and commercialized to replace conventional analog broadcasting. As a result, in addition to conventional over-the-air and wired cable media, the Internet connection in each home can be used to provide users with various content services such as real-time broadcasting, CoD (Contents on Demand), games, and news. Digital broadcast receivers can receive and display these content services. IPTV (Internet Protocol TV) is one example of a content service delivered over the Internet.

As described above, with the development of digital broadcasting technology and the availability of IPTV services, an image display device can receive a wide variety of services, and games can also be played through it. As the content and services that an image display device can provide have diversified, the number of remote controller keys needed to use a given content item or service has grown considerably, and each key may carry varied and complex control commands.

Accordingly, in addition to button-type remote controllers that include only a limited set of keys, voice recognition remote controllers that accept control commands by voice recognition are being developed.

FIG. 1 is a diagram for explaining control of an image display device by voice recognition.

Referring to FIG. 1, a user inputs a voice signal containing a predetermined command through a remote controller 120 that has a voice recognition function. The remote controller 120 then transmits the voice signal to the image display device 110, which recognizes the command contained in the voice signal and performs the corresponding control operation.

FIG. 2 is a flowchart for explaining the general operation of voice recognition.

Referring to FIG. 2, in a general voice recognition method, a voice signal is first input to the remote controller 120 (step 210).

The image display device 110, having received the input voice signal, identifies the words contained in the voice signal (step 220). That is, it recognizes the command contained in the voice signal.

It then performs a control operation according to the word recognition result (step 230).

A remote controller that remotely controls an image display device by voice recognition can flexibly input a variety of control keys, but if an error occurs in recognizing the voice signal, a wrong control key may be input.

Accordingly, there is a need for a voice recognition method that offers a high voice recognition rate and accurate recognition, and for an image display device therefor.

An object of the present invention is to provide a voice recognition method capable of increasing the voice recognition rate and an image display device therefor.

Another object of the present invention is to provide a voice recognition method, and an image display device therefor, that increase the voice recognition rate so that a user can control the image display device more conveniently.

A voice recognition method according to an embodiment of the present invention includes: receiving voice signals, each corresponding to one complete character, separated by a predetermined time interval; capturing a motion image of the mouth shape; extracting, from the voice signal, a voice signal section corresponding to the one complete character, and extracting, from the motion image, a video signal section corresponding to the one complete character; and, if the voice signal section and the video signal section coincide, recognizing the complete character corresponding to at least one of the voice signal and the video signal input within the coinciding section.

The step of recognizing the complete character may include: calculating a voice feature value of the voice signal input within the coinciding section; and recognizing the one complete character based on the voice feature value.

The step of recognizing the complete character may further include calculating a motion feature value of the motion image input within the coinciding section.

In addition, the step of recognizing the one complete character based on the voice feature value may include recognizing the one complete character based on both the voice feature value and the motion feature value.

The step of receiving the voice signal may include receiving, at the predetermined time interval, each of at least one voice signal corresponding to at least one complete character that forms the predetermined command.

The step of extracting the video signal section may include: extracting, as the voice signal section, the section from the point at which the voice signal begins to be input until just before the predetermined time interval begins; and extracting, as the video signal section, the section from the point at which the mouth shape begins to move until the point at which the mouth movement stops.
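The two extraction rules above can be illustrated with a minimal sketch. It assumes that the voice signal and the mouth motion image have already been reduced to one boolean per frame (voice present, mouth moving); the patent does not specify this representation, and the name `active_sections` is hypothetical.

```python
from typing import List, Tuple


def active_sections(active: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each run of True.

    Applied to a "voice present" sequence this yields voice signal sections
    (voice starts .. pause begins); applied to a "mouth moving" sequence it
    yields video signal sections (movement starts .. movement stops).
    """
    sections = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i                      # activity begins
        elif not flag and start is not None:
            sections.append((start, i))    # activity ends
            start = None
    if start is not None:                  # still active at the last frame
        sections.append((start, len(active)))
    return sections


# Illustrative per-frame flags (assumed data, not from the patent).
voice_present = [False, True, True, False, False, True, True, True, False]
mouth_moving = [False, True, True, True, False, True, True, True, False]

voice_sections = active_sections(voice_present)
video_sections = active_sections(mouth_moving)
```

Here the same helper serves both signals because each rule amounts to finding runs of activity; a real detector would need to derive the per-frame flags from audio and image data first.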

The voice recognition method according to an embodiment of the present invention may further include: outputting, in real time through a user interface, the recognized complete character or information on the predetermined command composed of the complete characters; and performing the predetermined command.

The step of recognizing the complete character may include: transmitting the voice signal and the video signal input within the coinciding section to an Internet server; and recognizing the complete character using a voice recognition engine and a voice recognition database of the Internet server.

The voice recognition method according to an embodiment of the present invention may further include: if at least one phonetically similar word exists for the recognized complete character, outputting the phonetically similar word through a user interface; and receiving, through the user interface, a selection of the complete character intended by the user from among the at least one phonetically similar word.

The voice recognition method according to an embodiment of the present invention may further include: if a plurality of predetermined commands composed of the recognized complete characters exist, outputting the plurality of predetermined commands through a user interface; and receiving, through the user interface, a selection of one of the plurality of predetermined commands.

The voice recognition method according to an embodiment of the present invention may further include: outputting, in real time through a user interface, the recognized complete character or information on the predetermined command composed of the complete characters; and, when input of the voice signal is complete, searching an Internet server for information related to the predetermined command and outputting the retrieved information through the user interface.

An image display device according to an embodiment of the present invention includes: a remote control unit that receives voice signals, each corresponding to one complete character, separated by a predetermined time interval, and captures a motion image of the mouth shape; a voice recognition processing unit that extracts, from the voice signal, a voice signal section corresponding to the one complete character, extracts, from the motion image, a video signal section corresponding to the one complete character, and, if the voice signal section and the video signal section coincide, recognizes the complete character corresponding to at least one of the voice signal and the video signal input within the coinciding section; and a control unit that controls execution of a predetermined command composed of the recognized complete characters.

The voice recognition processing unit may calculate a voice feature value of the voice signal input within the coinciding section and recognize the one complete character based on the calculated voice feature value.

The voice recognition method and the image display device according to an embodiment of the present invention can increase the voice recognition rate. Accordingly, the image display device can be remotely controlled more accurately.

In addition, by increasing the voice recognition rate, the voice recognition method and the image display device according to an embodiment of the present invention allow the user to use the image display device more easily and conveniently.

FIG. 1 is a diagram for explaining control of an image display device by voice recognition.
FIG. 2 is a flowchart for explaining the general operation of voice recognition.
FIG. 3 is a block diagram showing an image display device according to an embodiment of the present invention.
FIG. 4 is a block diagram showing FIG. 3 in more detail.
FIG. 5 is a diagram showing one display screen output by the image display device according to an embodiment of the present invention.
FIG. 6 is a diagram showing another display screen output by the image display device according to an embodiment of the present invention.
FIG. 7 is a diagram showing another display screen output by the image display device according to an embodiment of the present invention.
FIG. 8 is a diagram showing another display screen output by the image display device according to an embodiment of the present invention.
FIG. 9 is a diagram showing another display screen output by the image display device according to an embodiment of the present invention.
FIG. 10 is a diagram showing another display screen output by the image display device according to an embodiment of the present invention.
FIG. 11 is a diagram showing a voice recognition method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings and the content shown in them, but the present invention is not restricted or limited by these embodiments.

The terms used in this specification were selected, as far as possible, from general terms in wide current use, with the functions of the present invention in mind, but they may vary depending on the intention or custom of those skilled in the art or on the emergence of new technology. In certain cases a term has been chosen arbitrarily by the applicant, and its meaning is then given in the corresponding part of the description. The terms used in this specification should therefore be interpreted based on their substantive meaning and on the overall content of this specification, not merely on their names.

FIG. 3 is a block diagram showing an image display device according to an embodiment of the present invention.

Referring to FIG. 3, an image display device 300 according to an embodiment of the present invention includes a signal input unit 310, a voice recognition unit 320, a control unit (controller) 330, a signal processing unit 340, an interface unit 350, an OSD generation unit 355, a user interface unit 360, and a storage unit 380. It further includes an external remote control unit (remote controller unit) 390.

The image display device 300 is a digital broadcast receiver such as a digital television or a set-top box. When the image display device 300 is a digital television, it includes a display unit 370, as shown in FIG. 3. When the image display device 300 is a set-top box, it does not include the display unit 370.

The image display device 300 may also include other components as needed, in addition to those shown in FIG. 1.

The signal input unit 310 may include a tuner 311, a network interface unit 315, and the like. Hereinafter, a signal received by the signal input unit 310 is referred to as a video signal.

The tuner 311 selectively receives a broadcast signal transmitted as a radio frequency (RF) signal over a channel in a predetermined frequency band. That is, it selectively receives a broadcast signal that is transmitted by a content producer such as a broadcasting station and that contains predetermined content.

The network interface unit 315 connects to a predetermined Internet server or content provider (CP) server over a network and exchanges predetermined signals with those servers. Specifically, the network interface unit 315 receives a broadcast signal containing predetermined content from a content provider.

The remote control unit 390 receives voice signals, each corresponding to one complete character, separated by a predetermined time interval, and captures a motion image of the mouth shape. It then transmits the received voice signal and the captured motion image to the interface unit 350. The remote control unit 390 may include a camera unit 391, a microphone unit 393, a signal conversion unit 395, and a remote controller interface unit 397.

One complete character is a character that can be used independently as a single character. There are 2350 Hangul complete characters (2850 in the extended set of complete characters), and there are 26 English complete characters, one for each letter of the alphabet. A Hangul complete character may be formed by combining at least one consonant (basic or compound) with at least one vowel (basic or compound).
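To make the consonant and vowel composition concrete, the following sketch splits a precomposed Hangul syllable into its parts. It relies on the Unicode Hangul Syllables block (U+AC00 onward), which arranges syllables arithmetically over 19 initial consonants, 21 vowels, and 28 final positions; using Unicode here is an assumption for illustration, since the patent does not name a character encoding, and the Unicode block holds 11,172 syllables rather than the 2350 counted above. The function name `decompose` is hypothetical.

```python
# Jamo tables in Unicode order (compatibility jamo used for readability).
CHOSEONG = ["ㄱ", "ㄲ", "ㄴ", "ㄷ", "ㄸ", "ㄹ", "ㅁ", "ㅂ", "ㅃ", "ㅅ",
            "ㅆ", "ㅇ", "ㅈ", "ㅉ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
JUNGSEONG = ["ㅏ", "ㅐ", "ㅑ", "ㅒ", "ㅓ", "ㅔ", "ㅕ", "ㅖ", "ㅗ", "ㅘ", "ㅙ",
             "ㅚ", "ㅛ", "ㅜ", "ㅝ", "ㅞ", "ㅟ", "ㅠ", "ㅡ", "ㅢ", "ㅣ"]
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable
SYLLABLE_COUNT = len(CHOSEONG) * len(JUNGSEONG) * len(JONGSEONG)


def decompose(ch: str) -> tuple:
    """Split one precomposed Hangul syllable into (initial, vowel, final)."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, len(JUNGSEONG) * len(JONGSEONG))
    vowel, tail = divmod(rest, len(JONGSEONG))
    return CHOSEONG[lead], JUNGSEONG[vowel], JONGSEONG[tail]


parts = decompose("널")  # initial consonant, vowel, final consonant
```

An empty final element means the syllable has no final consonant, as in '가'.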

Each complete character has its own pronunciation characteristics. For example, the Hangul complete characters '가' and '나' have the distinct pronunciations 'ga' and 'na', and the English complete characters 'a' and 'b' have the pronunciations '에이' (ei) and '비' (biː), respectively.

The description below uses Hangul and English complete characters as examples, but complete characters of other languages may also be used.

The camera unit 391 captures an image of the user's mouth shape while the user speaks a predetermined command. That is, it captures the mouth shape while the user inputs a voice signal to the remote control unit 390. The image captured by the camera unit 391 may be a moving picture.

The microphone unit 393 receives the voice signal produced when the user speaks a predetermined command. That is, the microphone unit 393 records the voice signal produced by the user.

The signal conversion unit 395 converts the video signal captured by the camera unit 391 into the predetermined signal format required for transmission to the interface unit 350. It likewise converts the voice signal received by the microphone unit 393 into a predetermined signal format for transmission to the interface unit 350.

For example, if the remote control unit 390 and the interface unit 350 include an RF (radio frequency) module that can send and receive predetermined signals according to an RF communication standard, the signal conversion unit 395 converts the video signal or the voice signal into an RF signal conforming to that standard. Alternatively, if the remote control unit 390 and the interface unit 350 include an IR (infra-red) module that can send and receive predetermined signals according to an IR communication standard, the signal conversion unit 395 converts the video signal or the voice signal into an IR signal conforming to that standard.

The remote controller interface unit 397 transmits the signal output from the signal conversion unit 395 to the interface unit 350. As described above, the remote controller interface unit 397 may consist of an RF (radio frequency) module (not shown) that can send and receive signals according to an RF communication standard, or an IR (infra-red) module (not shown) that can send and receive signals according to an IR communication standard.

The interface unit 350 receives the voice signal and the video signal transmitted from the remote controller interface unit 397 and passes them to the voice recognition processing unit 320. The interface unit 350 may likewise consist of an RF (radio frequency) module (not shown) that can send and receive signals according to an RF communication standard, or an IR (infra-red) module (not shown) that can send and receive signals according to an IR communication standard.

The voice recognition processing unit 320 extracts, from the received voice signal, a voice signal section corresponding to one complete character, and extracts, from the mouth motion image, a video signal section corresponding to the one complete character. If the extracted voice signal section and video signal section coincide, it recognizes the complete character corresponding to at least one of the voice signal and the video signal input within the coinciding section.
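The coincidence check can be sketched as follows. The patent does not define when two sections "coincide", so this sketch assumes that sections are (start, end) frame index pairs and that they coincide when both boundaries differ by at most a small tolerance; `sections_coincide`, `coinciding_sections`, and `tolerance` are hypothetical names. Taking the overlap of the two sections as the coinciding section, and dropping voice sections with no matching mouth movement, are likewise assumptions.

```python
from typing import List, Tuple

Section = Tuple[int, int]  # (start frame, end frame), end exclusive


def sections_coincide(voice: Section, video: Section, tolerance: int = 1) -> bool:
    """True if both boundaries agree within `tolerance` frames (assumed rule)."""
    return (abs(voice[0] - video[0]) <= tolerance
            and abs(voice[1] - video[1]) <= tolerance)


def coinciding_sections(voice_sections: List[Section],
                        video_sections: List[Section],
                        tolerance: int = 1) -> List[Section]:
    """Keep only voice sections backed by a coinciding video section.

    Each result is the overlap of the matching pair; only signal data
    inside these sections would be passed on to character recognition.
    """
    result = []
    for voice in voice_sections:
        for video in video_sections:
            if sections_coincide(voice, video, tolerance):
                result.append((max(voice[0], video[0]), min(voice[1], video[1])))
                break
    return result


# Illustrative data: the third voice section has no matching mouth movement
# (for example, background sound), so it is not passed to recognition.
voice_sections = [(1, 3), (5, 8), (10, 12)]
video_sections = [(1, 3), (5, 9), (20, 22)]
matched = coinciding_sections(voice_sections, video_sections)
```

One plausible benefit of gating on coincidence is that sound picked up while the user's mouth is not moving never reaches the recognizer.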

The detailed operation of the voice recognition processing unit 320 is described below with reference to FIG. 4.

The control unit 330 controls the overall operation of the image display device 300. Specifically, it controls the device so that a video signal containing predetermined content is displayed on the screen. It also controls the device so that a predetermined command composed of the complete characters recognized by the voice recognition processing unit 320 is performed.

The signal processing unit 340 converts the received video signal into data that the image display device 300 can display, and outputs that data. Specifically, it demodulates the received video signal, demultiplexes and decodes the demodulated signal, and performs signal processing such as error correction and signal quality improvement.

The OSD generation unit 355 generates OSD (On Screen Data) data and outputs it to the display unit 370. Specifically, the OSD generation unit 355 can convert the user interface data output from the user interface unit 360 into OSD data and output it. Under the control of the control unit 330, it can also generate various information to be provided to the user as OSD data. In particular, it generates and outputs, in real time as an OSD, the recognition information for the complete character corresponding to the voice signal input by the user. The generated OSD is transmitted to the display unit 370 and shown on the display screen.

The user interface unit 360 generates and outputs, as user interface (UI) data, the control menus to be provided to the user, or receives a predetermined request or predetermined information from the user. The user interface data may be converted into OSD (On Screen Data) data by the OSD generation unit 355. Alternatively, the user interface data may be transmitted directly to the display unit 370 and displayed as a GUI (Graphic User Interface).

The OSD data output from the OSD generation unit 355 and the user interface (UI) output through the user interface unit 360 are described in detail below with reference to FIGS. 5 to 10.

The display unit 370 displays the video signal transmitted from the signal processing unit 340 as an image on the screen. It also displays the OSD data or user interface data output from the user interface unit 360 or the OSD generation unit 355 over the whole screen or in a predetermined area of it.

The storage unit 380 can store various information needed for the display operation under the control of the control unit 330. The storage unit 380 may also store a voice recognition database for voice recognition.

The image recognition operation and detailed configuration of the image display device 300 according to an embodiment of the present invention are described in more detail below with reference to FIGS. 4 to 10.

FIG. 4 is a block diagram showing FIG. 3 in more detail.

Referring to FIG. 4, the voice recognition processing unit 320 may include a voice and video signal input unit 420, a section detection unit 430, a feature value calculation unit 440, and a character recognition unit 450. The network interface unit 315 can exchange predetermined data with an Internet server 410 over a wired or wireless communication network. The Internet server 410 includes a recognition database 411 and a recognition engine 413. Descriptions of components already covered for FIG. 3 are omitted.

The interface unit 350 receives the voice signal and the mouth motion image signal transmitted from the remote control unit 390. Hereinafter, the mouth motion image signal is referred to as the 'video signal'.

The audio and video signal input unit 420 receives, from the interface unit 350, the voice signal and the video signal that were input at the remote control unit 390.

When the remote control unit 390 described above receives a voice signal through the microphone unit 393, it receives the voice signal for each complete character separately, with a predetermined time interval between them. For example, when the voice signal for '채널7' ('channel 7') is input, '채' is received, then '널' after a predetermined time, and then '7(칠)' after a further predetermined time.

Here, the predetermined time is the pause between one complete character and the next complete character input after it. That is, when the complete characters (채, 널, 7(칠)) are spoken one at a time with breaks between them, no voice signal from the user is input during the pause (the predetermined time interval). Accordingly, a section in which no signal in the frequency range of the human voice is input can be judged to be such a pause.

Hereinafter, 'one complete character + predetermined time interval + one complete character + predetermined time interval ...' is written as 'one complete character_one complete character_...'. That is, the predetermined time interval is denoted by the symbol '_'. For example, '채, 널, 칠' input at predetermined time intervals can be written as '채_널_칠(7)'.

Likewise, in the video signal captured by the camera unit 391, there is a section in which the user's mouth does not move between speaking one complete character and the next. This section of no mouth movement corresponds to the predetermined time interval described above.

The section detection unit 430 extracts the voice section and the video section corresponding to one complete character. The section detection unit 430 may include a voice section detection unit 431 and a video section detection unit 433.

The voice section detection unit 431 extracts, from the voice signal, the voice signal section corresponding to one complete character. Specifically, the section between adjacent pauses can be extracted as the voice signal section corresponding to one complete character. For example, if the voice signal starts at time a and a pause is detected from time b, the section from a to b can be extracted as the voice signal section for one complete character. If the pause then lasts from time b to time c and a voice signal is again detected from time c to time d, the voice signal section for the next complete character runs from c to d.
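The pause-based extraction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the signal has already been reduced to one energy value per frame, and the function name `detect_sections` and the fixed threshold are hypothetical.

```python
def detect_sections(frame_energy, threshold):
    """Return (start, end) frame-index pairs for runs of frames above threshold.

    `end` is exclusive: it is the index of the first pause frame after the run.
    """
    sections = []
    start = None
    for i, energy in enumerate(frame_energy):
        active = energy > threshold
        if active and start is None:
            start = i                      # time "a": the signal begins
        elif not active and start is not None:
            sections.append((start, i))    # time "b": the pause begins
            start = None
    if start is not None:                  # the signal ran to the end of the input
        sections.append((start, len(frame_energy)))
    return sections


# Toy per-frame energies: speech, pause, speech, pause, speech.
energies = [0.0, 0.9, 0.8, 0.0, 0.0, 0.7, 0.9, 0.6, 0.0, 0.5]
print(detect_sections(energies, threshold=0.1))  # [(1, 3), (5, 8), (9, 10)]
```

The same function could be applied to a per-frame measure of mouth movement to find video sections, since both detectors look for activity bounded by still periods.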

The video section detection unit 433 detects, from the video signal, the video section corresponding to one complete character. Specifically, the section between adjacent still-image sections can be extracted as the video signal section corresponding to one complete character. For example, if the mouth starts moving at time a and a still-image section, in which the mouth stops moving, is detected from time b, the section from a to b can be extracted as the video signal section for one complete character.

In the image display apparatus according to an embodiment of the present invention, the voice signal is input in the form 'one complete character + predetermined time interval', and the voice signal section and the video signal section are each detected. Because the voice signal is recognized in the section where the voice signal section and the video signal section coincide, it can be recognized more accurately. Specifically, even if the voice signal is input unclearly because of ambient noise, the voice signal can be captured more accurately by comparing it against the video signal sections and detecting the voice signal section that coincides with a video signal section. The recognition rate of the voice signal can be raised accordingly.

If the voice signal section for one complete character detected by the voice section detection unit 431 coincides with the video signal section for one complete character detected by the video section detection unit 433, the feature value calculation unit 440 calculates the feature value of the voice signal within the coinciding section detected by the section detection unit 430. It may additionally calculate the feature value of the video signal. Hereinafter, the voice signal input section in which the voice signal section and the video signal section coincide is referred to as the 'matching section'.
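One way to decide whether a voice section and a video section "coincide" is to compare how much they overlap. The sketch below is an assumption for illustration only: the text does not say how coincidence is tested, and the overlap measure, the `min_ratio` tolerance, and both function names are hypothetical.

```python
def overlap_ratio(a, b):
    """Overlap length of two (start, end) sections divided by the span they cover."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    return inter / span if span > 0 else 0.0


def matching_sections(voice_sections, video_sections, min_ratio=0.8):
    """Keep each voice section that coincides with at least one video section."""
    matches = []
    for voice in voice_sections:
        if any(overlap_ratio(voice, video) >= min_ratio for video in video_sections):
            matches.append(voice)
    return matches


voice = [(10, 30), (45, 60), (80, 95)]   # (45, 60) stands for ambient noise
video = [(11, 30), (82, 96)]             # the mouth moved only twice
print(matching_sections(voice, video))   # [(10, 30), (80, 95)]
```

In this toy input the middle voice section has no corresponding mouth movement, so it is dropped, which mirrors the noise-rejection benefit the text describes.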

Specifically, the voice feature value calculated by the voice feature calculation unit 441 may be, for example, an MFCC (Mel Frequency Cepstral Coefficient) value obtained by digitizing the voice signal in the matching section and applying a discrete cosine transform (DCT). The voice feature value is not the voice signal itself but a value in which only the characteristic values are extracted from the voice signal, so it has a smaller data volume than the original voice signal (for example, about 10% of the original voice signal data).

Specifically, the voice feature calculation unit 441 receives the voice signal in analog form and converts it into a PCM (Pulse Code Modulation) signal using an analog-to-digital (A/D) converter. The converted PCM signal may have a sampling rate of 8 kHz and an amplitude resolution of 16 bits. Next, noise components are removed from the PCM signal so that only the frequency components of the range corresponding to the human voice remain. The PCM signal is then divided into predetermined frequency bands, and an MFCC value is obtained for each band. The feature value of the voice signal can also be obtained by many other methods.
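For orientation, a common textbook MFCC pipeline for a single frame can be sketched in plain Python as below. It is not taken from the patent: the frame length, the 20 mel filters, the 13 output coefficients, the Hamming window, and all function names are assumptions, and the naive DFT is used only to keep the example free of external libraries.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return the power of DFT bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):            # naive O(N^2) DFT, fine for a sketch
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for f in range(1, num_filters + 1):
        lo, mid, hi = bins[f - 1], bins[f], bins[f + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):           # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):           # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=8000, num_filters=20, num_coeffs=13):
    """Log mel-band energies followed by a DCT-II, giving num_coeffs values."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energy = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
                  for row in bank]
    return [sum(e * math.cos(math.pi * n * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energy))
            for n in range(num_coeffs)]


# One 32 ms frame (256 samples at 8 kHz) of a 440 Hz tone.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
print(len(mfcc(frame)))  # 13
```

Under these assumptions a 256-sample frame is reduced to 13 numbers, which illustrates why a feature vector is much smaller than the PCM data it summarizes.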

The motion feature calculation unit 443 extracts the movement of the lips, the jaw, and the cheeks from the video signal in the matching section and calculates feature values for the direction and amount of each movement. Specifically, the movement of the lips, jaw, or cheeks can be calculated as motion vector values.
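A motion vector with a direction and an amount of movement can be illustrated as follows. The sketch assumes that one tracked point per facial region is already available for two consecutive frames; the tracking itself, the region names, and the function names are hypothetical.

```python
import math


def motion_vector(prev_point, curr_point):
    """Displacement (dx, dy) of one tracked point between two frames."""
    return curr_point[0] - prev_point[0], curr_point[1] - prev_point[1]


def motion_features(prev_points, curr_points):
    """Map each region name to (amount of movement, direction in degrees)."""
    features = {}
    for name, prev in prev_points.items():
        dx, dy = motion_vector(prev, curr_points[name])
        features[name] = (math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)))
    return features


prev = {"lip": (100, 200), "jaw": (100, 240)}
curr = {"lip": (103, 204), "jaw": (100, 250)}
for region, (amount, direction) in motion_features(prev, curr).items():
    print(region, round(amount, 2), round(direction, 2))
```

Here the lip point moves 5 pixels and the jaw point moves 10 pixels straight along the y axis; a real system would track many points per region.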

If the discrete signal obtained by frequency-transforming the voice signal itself were used, the signal would be very large, so the time needed to finish transmitting it over the network to a given Internet server, and the amount of data transmitted, would increase greatly. Voice recognition would therefore take longer, making immediate control by voice recognition difficult.

In the present application, the voice feature calculation unit 441 calculates only the feature value of the voice signal, and the motion feature calculation unit 443 calculates only the feature value of the movement. Because only the extracted feature values are transmitted to the Internet server 410 through the network interface unit 310, the time needed to complete transmission to the Internet server and the amount of data transmitted can be minimized. Voice recognition time is thus shortened, allowing immediate control by voice recognition.
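The size difference between raw PCM and feature values can be estimated with simple arithmetic. Only the 8 kHz sampling rate and 16-bit resolution come from the text; the frame rate, vector length, and coefficient size below are assumptions chosen for illustration, so the resulting ratio is indicative only.

```python
SAMPLE_RATE_HZ = 8000      # from the text
BYTES_PER_SAMPLE = 2       # 16-bit PCM, from the text
FRAMES_PER_SECOND = 50     # assumption: one feature vector every 20 ms
COEFFS_PER_FRAME = 13      # assumption: a common MFCC vector length
BYTES_PER_COEFF = 2        # assumption: coefficients quantized to 16 bits

raw_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE                          # per second
feature_bytes = FRAMES_PER_SECOND * COEFFS_PER_FRAME * BYTES_PER_COEFF  # per second
ratio = feature_bytes / raw_bytes

print(raw_bytes, feature_bytes, f"{ratio:.1%}")
```

With these assumed values the feature stream is 1,300 bytes per second against 16,000 bytes per second of PCM, the same order of magnitude as the roughly 10% figure the description gives as an example.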

When the voice signal section and the video signal section detected by the section detection unit 430 coincide, the character recognition unit 450 recognizes the complete character corresponding to at least one of the voice signal and the video signal input within the matching section. The character recognition unit 450 may internally include a voice recognition engine (not shown) and a database for voice recognition (not shown), which it uses to recognize the complete character input within the matching section.

Specifically, the character recognition unit 450 can recognize the complete character input within the matching section based on at least one of the voice feature value and the motion feature value calculated by the feature value calculation unit 440.

The character recognition unit 450 may also connect to a given Internet server 410 through the network interface unit 310 and use the recognition engine 413 and the recognition database 411 of the Internet server 410. In that case, the character recognition unit 450 need not contain its own recognition engine or voice recognition database.

Here, the recognition database 411 has a data structure that can be matched or compared against the voice feature value or motion feature value calculated by the feature value calculation unit 440. For example, if the voice feature value is an MFCC (Mel Frequency Cepstral Coefficient) value, the recognition database 411 contains the MFCC value corresponding to each complete character. It may also contain the feature value of the video signal (for example, a motion vector value) corresponding to each complete character.

The recognition engine 413 determines which character the input voice signal corresponds to. It compares the voice or video signal value output from the voice recognition processing unit 320 (specifically, the voice feature value or motion feature value) with the corresponding values stored in the recognition database 411 to determine which character the voice signal corresponds to.
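The comparison between an incoming feature value and the stored values can be pictured as a nearest-match lookup. The sketch below is a deliberately simplified stand-in: the text does not specify the matching algorithm, and the two-dimensional toy vectors, the Euclidean distance, and the name `recognize` are assumptions.

```python
import math


def recognize(feature, database):
    """Return the character whose stored feature vector is closest to `feature`."""
    return min(database, key=lambda ch: math.dist(feature, database[ch]))


# Toy recognition database: one stored feature vector per complete character.
database = {
    "채": [1.0, 0.0],
    "널": [0.0, 1.0],
    "칠": [1.0, 1.0],
}
print(recognize([0.9, 0.2], database))  # 채
```

A production recognizer would typically compare sequences of feature vectors with a statistical or learned model rather than a single distance, so this shows only the lookup idea.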

The image display apparatus 100 may lack the storage space to hold a large recognition database. Using the recognition database 411 and the recognition engine 413 on the Internet server 410 therefore allows more accurate voice recognition. In addition, because only the feature values calculated by the feature value calculation unit 440 are used, the amount of data sent to the Internet server 410 can be minimized. This enables fast signal delivery and minimizes the number of data packets used.

When the recognition engine 413 of the Internet server 410 is used, the character recognition unit 450 receives the characters recognized by the recognition engine 413 and combines the received characters to interpret the control command.

The control unit 330 performs the control operation of the image display apparatus 100 according to the control command interpreted by the character recognition unit 450.

The control unit 330 also receives the recognized characters that the character recognition unit 450 sends in real time and controls them to be output in real time through a user interface (UI).

Specifically, the user interface unit 360 generates user interface data from the recognized characters, and the display unit 370 displays the user interface data. The control unit 330 may also receive the recognized characters that the character recognition unit 450 sends in real time and control them to be output in real time as an OSD (On Screen Display). Specifically, the OSD generating unit 355 generates an OSD from the recognized characters, and the display unit 370 displays the generated OSD data. The control unit 330 may also control the user interface data output from the user interface unit 360 to be converted into OSD data by the OSD generating unit 355 and output to the display unit 370.

FIG. 5 shows one display screen output by the image display apparatus according to an embodiment of the present invention. Referring to FIG. 5, display screens 510 and 550 are shown as displayed while a voice signal and a video signal are input and the voice recognition processing unit 320 performs voice recognition. FIG. 5 illustrates the case in which the user inputs the voice signals '채, 널, 칠(7)' to the remote control unit 390 at predetermined time intervals.

Referring to FIG. 5(a), the complete character '채' is input, the voice recognition processing unit 320 recognizes the character '채', and the recognized character 531 is shown in the user interface data 520. While voice input is briefly paused, a symbol 535 representing the predetermined time interval is displayed.

Referring to FIG. 5(b), the complete character '널' is input next, the voice recognition processing unit 320 recognizes the character '널', and the recognized character ('널') is shown in the user interface data 520. While voice input is again briefly paused, the symbol representing the predetermined time interval continues to be displayed.

FIG. 6 shows another display screen output by the image display apparatus according to an embodiment of the present invention. In FIG. 6, once input and recognition of the voice signals for '채, 널, 칠(7)' described for FIG. 5 are complete, the user interface unit 360, under the control of the control unit 330, may output an OSD 610 that includes a confirm key 621 for confirming that the recognized characters are correct and an end key 625 for ending voice recognition without performing a control operation.

When the user operates the remote control unit 390 to input the confirm key 621 to the control unit 330, the control unit 330 interprets the command specified by the confirmed characters and executes it. The image display apparatus 300 accordingly switches to 'channel 7'.

FIGS. 5 and 6 illustrate the case in which the image display apparatus 300 is set to recognize the spoken '칠' directly as the digit '7'.

FIG. 7 shows another display screen output by the image display apparatus according to an embodiment of the present invention.

The control unit 330 may also confirm once more whether to execute the command corresponding to the recognized characters. Referring to FIG. 7(a), the control unit 330 does so by controlling the illustrated OSD to be output on the display screen 700.

The illustrated OSD itself is formed as a user interface (specifically, a graphical user interface (GUI)), so the user can operate the remote control unit 390 to input the 'Yes' key 710 to the control unit 330. The user can thereby remotely control the image display apparatus 300 so that it switches to 'channel 7'.

Then, under the control of the control unit 330, an OSD 750 announcing the channel switch may be output.

If the 'No' key 720 is input to the control unit 330, voice recognition may end without the channel switch being performed.

FIG. 8 shows another display screen output by the image display apparatus according to an embodiment of the present invention.

When the recognized complete character has at least one similar-sounding candidate, the control unit 330 may control the similar-sounding candidates to be output through the user interface. The user interface unit 360 may then receive, through the user interface, the selection of one complete character from among the output candidates. The selected complete character is sent to the control unit 330.

Referring to FIG. 8, when the user inputs the voice signal for the character '애', the character recognition unit 450 may recognize '애', '에', and '얘' as characters that could correspond to the voice signal. In that case, for accurate voice recognition, the control unit 330 controls a user interface to be output from which one of the similar-sounding candidates can be selected.

The user selects the intended character through the user interface displayed as the OSD 810. The control unit 330 then receives the selected character and performs the corresponding control operation.
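The candidate list described for FIG. 8 could be produced by keeping every character whose match score is close to the best one. This is a hedged sketch only: the distance measure, the `margin` parameter, the toy vectors, and the name `similar_candidates` are assumptions and do not come from the text.

```python
import math


def similar_candidates(feature, database, margin=0.1):
    """Characters whose distance is within `margin` of the best match, best first."""
    distance = {ch: math.dist(feature, vec) for ch, vec in database.items()}
    best = min(distance.values())
    close = [ch for ch in distance if distance[ch] - best <= margin]
    return sorted(close, key=distance.get)


database = {
    "애": [1.0, 0.0],
    "에": [1.0, 0.05],
    "얘": [0.95, 0.1],
    "오": [0.0, 1.0],
}
print(similar_candidates([1.0, 0.02], database))  # ['애', '에', '얘']
```

When the returned list has more than one entry, the apparatus would show the entries in the OSD and let the user pick; with a single entry no prompt is needed.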

FIG. 9 shows another display screen output by the image display apparatus according to an embodiment of the present invention.

When a recognized character is wrong, the control unit 330 receives a voice signal for cancelling the recognized character (for example, '백' or '빽', meaning 'back'), cancels the recognized character accordingly, and controls voice recognition to be performed again. That is, the control unit 330 registers in advance a voice signal for cancelling a recognized character (for example, '백' or '빽') and stores it in the storage unit 380; when the stored voice signal '백' or '빽' is input and recognized, the most recently recognized character is cancelled.

If the user intends 'OZ' to be recognized, the user should input the voice signal as "'오' + predetermined time interval + '제트' + predetermined time interval". If, however, the user mistakenly inputs "'오' + predetermined time interval + '즈' + predetermined time interval", the voice recognition OSD 910 shown in FIG. 9(a) is output.

When the user then inputs "'빽' + predetermined time interval", the voice signal requesting cancellation of the recognized voice signal, the control unit 330 cancels the most recently recognized voice signal ('즈') and controls the voice recognition OSD 920 shown in FIG. 9(b) to be output.

When the user then inputs the voice signal "'제트' + predetermined time interval", the voice recognition processing unit 320 recognizes the English character 'z', and the control unit 330 controls the voice recognition OSD 930 shown in FIG. 9(c) to be output.
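The cancel behaviour walked through for FIG. 9 amounts to a buffer of recognized characters from which the last entry is removed when a registered cancel word is heard. A minimal sketch, with the names `CANCEL_WORDS` and `apply_recognized` chosen here for illustration:

```python
CANCEL_WORDS = {"백", "빽"}   # the registered cancel utterances named in the text


def apply_recognized(buffer, recognized):
    """Append a recognized character, or drop the last one on a cancel word."""
    if recognized in CANCEL_WORDS:
        if buffer:                 # nothing to cancel if the buffer is empty
            buffer.pop()
    else:
        buffer.append(recognized)
    return buffer


buffer = []
for token in ["O", "즈", "빽", "Z"]:   # the sequence from the FIG. 9 example
    apply_recognized(buffer, token)
print("".join(buffer))  # OZ
```

The sequence reproduces the three OSD states in the example: 'O즈' after the mistake, 'O' after the cancel word, and 'OZ' after the correction.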

FIG. 10 shows another display screen output by the image display apparatus according to an embodiment of the present invention.

Referring to FIG. 10, when multiple commands correspond to the word formed by the recognized complete characters ('OZ' in the example of FIG. 9), the control unit 330 may control an OSD 1010, containing user interface data for selecting among those commands, to be output on the display screen 1000.

Referring to FIG. 10, when the commands corresponding to the voice signal 'OZ' include 'connect to the OZ homepage', 'connect to the LGT homepage', and so on, the corresponding commands are shown in the OSD 1010 so that the user can select one of them.

The control unit 330 receives the selection of one command through the user interface (specifically, a GUI) output in the displayed OSD 1010 and performs the operation corresponding to the selected command. For example, when the user selects 'connect to the OZ homepage' and inputs it to the control unit 330, the control unit 330 connects to the OZ homepage through the network interface unit 310 and outputs the OZ homepage screen to the display unit 370.
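The selection among several commands for one recognized word can be sketched as a lookup followed by an optional user choice. The command table, the `choose` callback standing in for the OSD selection, and the name `resolve_command` are all hypothetical.

```python
# Hypothetical command table: recognized word -> candidate commands.
COMMANDS = {
    "OZ": ["connect to the OZ homepage", "connect to the LGT homepage"],
    "채널7": ["switch to channel 7"],
}


def resolve_command(word, choose):
    """Return the command for `word`, calling `choose` only when it is ambiguous.

    `choose` receives the list of options and returns the selected index; it
    stands in for the user's selection in the OSD.
    """
    options = COMMANDS.get(word, [])
    if not options:
        return None
    if len(options) == 1:
        return options[0]
    return options[choose(options)]


print(resolve_command("OZ", choose=lambda options: 0))
```

Passing the selection step in as a callback keeps the lookup independent of how the choice is displayed, which matches the split between the control unit and the OSD in the description.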

The control unit 330 also controls the recognized complete characters (for example, 'channel 7' as shown in FIG. 6) or predetermined command information composed of the recognized complete characters (for example, "Switching to 'channel 7'!" as shown in FIG. 7) to be output through the user interface. Once voice input is finished and the complete characters are recognized, the control unit 330 may then search an Internet server for information related to the command corresponding to the recognized characters and control the retrieved information to be output through the user interface. For example, as described for FIGS. 9 and 10, when the recognized characters are 'OZ', information related to OZ (for example, the location of LG Telecom's OZ service center) may be retrieved from the Internet server and output through the user interface.

FIG. 11 shows a voice recognition method according to an embodiment of the present invention.

Referring to FIG. 11, the voice recognition method according to an embodiment of the present invention receives voice signals, each corresponding to one complete character, at predetermined time intervals (step 1115) and captures video of the user's mouth while the voice signal is input (step 1117); together these form step 1110.

Next, the voice signal section corresponding to one complete character is extracted (step 1125), and the video signal section corresponding to one complete character is extracted from the motion video (step 1127); together these form step 1120.

If the voice signal section and the video signal section coincide, the complete character corresponding to at least one of the voice signal and the video signal input within the matching section is recognized (step 1130). Specifically, the signal within the matching section is sent to the voice recognition engine (step 1135), and the complete character is recognized using the voice recognition engine (step 1137).

The recognized complete characters are then interpreted to derive the command corresponding to them (step 1140).

The command corresponding to the recognized complete characters is then executed (step 1150).
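Steps 1140 and 1150 turn a sequence of recognized characters into a command. A minimal sketch of the interpretation step, assuming a single hypothetical rule that the word '채널' followed by digits means a channel switch; the rule, the tuple format, and the name `interpret` are illustrative only.

```python
import re


def interpret(characters):
    """Turn recognized characters into a (command, argument) pair, or None."""
    word = "".join(characters)
    match = re.fullmatch(r"채널(\d+)", word)   # '채널' followed by a channel number
    if match:
        return ("switch_channel", int(match.group(1)))
    return None


print(interpret(["채", "널", "7"]))  # ('switch_channel', 7)
```

The example assumes, as in FIGS. 5 and 6, that the spoken '칠' has already been recognized as the digit '7' before interpretation.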

The voice recognition method according to an embodiment of the present invention shares its technical idea and detailed operation with the image display apparatus according to an embodiment of the present invention described with reference to FIGS. 3 to 10. A detailed description is therefore omitted.

The terms used in the present invention are defined in consideration of their functions in the present invention and may vary with the intention or practice of those working in the field; their definitions should therefore be based on the content of the present invention as a whole.

The present invention is not limited to the embodiments described above; those skilled in the art can make various modifications and changes, which fall within the spirit and scope of the present invention as defined in the appended claims.

110: Image display apparatus
340: Remote control device
300: Image display apparatus
310: Signal input unit
311: Tuner
315: Network interface unit
320: Voice recognition unit
330: Control unit (controller)
340: Signal processing unit (signal processor)
350: Interface unit
355: OSD generating unit (On Screen Display generator)
360: User interface unit
370: Display unit
380: Storage unit
390: Remote control unit
391: Camera unit
393: Microphone unit
395: Signal conversion unit
397: Remote controller interface unit

Claims

A method of controlling an image display device using speech recognition, the method comprising:
storing, in a storage unit, a specific audio signal for deleting a recognized complete character;
receiving a voice signal corresponding to one complete character at predetermined time intervals;
capturing a moving image of the mouth shape;
extracting a voice signal section corresponding to the complete character from the voice signal, and extracting a video signal section corresponding to the complete character from the moving image;
recognizing the complete character corresponding to at least one of the voice signal and the video signal input in the matched section if the voice signal section and the video signal section coincide;
deleting the recognized complete character in response to the specific audio signal stored in the storage unit; and
performing a predetermined command including the recognized complete character.

The method of claim 1, wherein recognizing the complete character comprises:
calculating a voice feature value of the voice signal input in the matched section; and
recognizing the complete character based on the voice feature value.

The method of claim 2, wherein recognizing the complete character further comprises:
calculating a motion feature value of the moving image input in the matched section.

The method of claim 3, wherein recognizing the complete character based on the voice feature value comprises:
recognizing the complete character based on the voice feature value and the motion feature value.

The method of claim 1, wherein receiving the voice signal comprises:
receiving, at the predetermined time intervals, at least one voice signal corresponding to each of at least one complete character forming the predetermined command.

The method of claim 1, wherein extracting the video signal section comprises:
extracting, as the voice signal section, a section from the time when the voice signal starts to be input until the start of the predetermined time interval; and
extracting, as the video signal section, a section from the time when the mouth shape starts moving to the time when the mouth shape stops moving.

The method of claim 1, further comprising:
outputting, in real time through a user interface, information corresponding to the recognized complete character or to the predetermined command including the complete character; and
performing the predetermined command.

The method of claim 1, wherein recognizing the complete character comprises:
transmitting the voice signal and the video signal input in the matched section to an Internet server; and
recognizing the complete character using a speech recognition engine and a speech recognition database of the Internet server.

The method of claim 1, further comprising:
outputting, through a user interface, at least one phonetically similar character when such a phonetically similar character exists for the recognized complete character; and
receiving, through the user interface, the user's selection of a desired complete character from among the at least one phonetically similar character.

The method of claim 1, further comprising:
outputting a plurality of predetermined commands through a user interface when a plurality of predetermined commands including the recognized complete character exist; and
receiving, through the user interface, a selection of one predetermined command from among the plurality of predetermined commands.

The method of claim 1, further comprising:
outputting, in real time through a user interface, information corresponding to the recognized complete character or to the predetermined command including the complete character; and
searching an Internet server for information related to the predetermined command when the input of the voice signal is completed, and outputting the retrieved information through the user interface.

An image display device comprising:
a remote control unit for receiving a voice signal corresponding to one complete character at predetermined time intervals and capturing a moving image of the mouth shape;
a voice recognition processor for extracting a voice signal section corresponding to the complete character from the voice signal, extracting a video signal section corresponding to the complete character from the moving image, and, if the voice signal section and the video signal section coincide, recognizing the complete character corresponding to at least one of the voice signal and the video signal input in the matched section;
a storage unit for storing a specific audio signal for deleting a recognized complete character; and
a controller for controlling the execution of a predetermined command including the recognized complete character,
wherein the controller deletes the recognized complete character in response to the specific audio signal stored in the storage unit.

The image display device of claim 12, wherein the voice recognition processor
calculates a voice feature value of the voice signal input in the matched section, and recognizes the complete character based on the calculated voice feature value.

The image display device of claim 13, wherein the voice recognition processor
calculates a motion feature value of the moving image input in the matched section, and recognizes the complete character based on the calculated voice feature value and the motion feature value.

The image display device of claim 12, wherein the voice recognition processor
extracts, as the voice signal section, a section from the time when the voice signal starts to be input until the start of the predetermined time interval, and extracts, as the video signal section, a section from the time when the mouth shape starts moving to the time when the mouth shape stops moving.

The image display device of claim 12,
further comprising a user interface unit for outputting, in real time, user interface data including information corresponding to the recognized complete character or to the predetermined command including the complete character.

The image display device of claim 12,
further comprising a network interface unit for transmitting and receiving data to and from at least one Internet server,
wherein the voice recognition processor
transmits at least one of the voice signal and the video signal input in the matched section to the Internet server through the network interface unit, and recognizes the complete character using a speech recognition engine and a speech recognition database of the Internet server.

The image display device of claim 16,
wherein the voice recognition processor
controls at least one phonetically similar character to be output through the user interface unit when such a phonetically similar character exists for the recognized complete character, and
the user interface unit
receives the user's selection of one of the at least one phonetically similar character.

(Deleted)

The image display device of claim 16, wherein the voice recognition processor
controls a plurality of predetermined commands to be output through the user interface unit when a plurality of predetermined commands including the recognized complete character exist, and
the user interface unit
receives a selection of one predetermined command from among the plurality of predetermined commands.