KR20200040133A

KR20200040133A - Voice recognition apparatus and control method for the same

Info

Publication number: KR20200040133A
Application number: KR1020180120042A
Authority: KR
Inventors: 김현남; 송현정; 이나경; 김홍성; 윤현상
Original assignee: SK Telecom Co., Ltd. (에스케이텔레콤 주식회사)
Priority date: 2018-10-08
Filing date: 2018-10-08
Publication date: 2020-04-17
Also published as: KR102393774B1

Abstract

According to one embodiment of the present invention, a voice recognition device that recognizes a sensed voice and whose operation is controlled through a sensed gesture comprises: a display unit that displays a media playback screen; a camera that obtains an image of a predetermined region of interest; and a control unit that, when a hand gesture is sensed in the image obtained by the camera, controls an operation of the voice recognition device according to the sensed hand gesture.

Description

VOICE RECOGNITION APPARATUS AND CONTROL METHOD FOR THE SAME

The present invention relates to a voice recognition apparatus provided with a display unit on which a media playback screen is displayed, and to a control method for the same.

Recently, interest has grown in voice recognition devices, such as artificial intelligence speakers, that recognize a voice input and perform an operation accordingly. Such a voice recognition device can receive a voice input from a user who is some distance away from where the device is located and provide various services to that user.

For example, a voice recognition device may receive and recognize a user's voice and provide feedback, such as sound, according to a control command corresponding to the recognition result. In addition, when the voice recognition device includes display means capable of displaying various types of information, it may display a screen corresponding to the voice recognition result through the display means.

Meanwhile, the voice recognition device described above may receive control commands in various ways other than the user's voice. If the display means is implemented as a touch screen, the voice recognition device may receive the user's touch input as a control command.

Korean Patent Laid-Open Publication No. 10-2018-0085931 (published July 30, 2018)

The problem to be solved by the present invention is to provide a voice recognition device whose operation is controlled through a detected gesture, and a control method for the same.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned here will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

A voice recognition device according to an embodiment is a voice recognition device that recognizes a sensed voice, and includes: a display unit that displays a media playback screen; a camera that acquires an image of a predetermined region of interest; and a control unit that, when a hand gesture is detected in the image acquired by the camera, controls an operation of the voice recognition device according to the detected hand gesture.

A control method of a voice recognition device according to an embodiment is a control method of a voice recognition device that recognizes a sensed voice, and includes: displaying a media playback screen; acquiring an image of a predetermined region of interest; and, when a gesture is detected in the acquired image, controlling an operation of the voice recognition device according to the detected gesture.

The voice recognition device and its control method according to an embodiment can provide an environment in which operation can be controlled through simple and intuitive gestures in situations where it is difficult to input a control command by methods such as voice or touch. This can improve user convenience of the voice recognition device.

FIG. 1 is an external view of a display device according to an embodiment.
FIG. 2 is a functional block diagram of a display device according to an embodiment.
FIG. 3 is a diagram illustrating a media playback screen displayed by the display unit according to an embodiment.
FIG. 4 is a diagram illustrating a case in which a first hand gesture is detected in an image acquired by a camera according to an embodiment.
FIG. 5 is a diagram illustrating a volume control screen displayed by the display unit according to an embodiment.
FIG. 6 is a diagram illustrating a case in which a second hand gesture is detected in an image acquired by a camera according to an embodiment.
FIG. 7 is a diagram illustrating a case in which the volume control screen of FIG. 5 is controlled by the second hand gesture.
FIG. 8 is a diagram illustrating a case in which a third hand gesture is detected in an image acquired by a camera according to an embodiment.
FIG. 9 is a diagram illustrating a case in which the volume control screen of FIG. 7 is controlled by the third hand gesture.
FIG. 10 is a diagram illustrating a case in which the first hand gesture is not detected in an image acquired by a camera according to an embodiment.
FIG. 11 is a flowchart of a method for controlling a voice recognition device according to an embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, a detailed description of a known function or configuration will be omitted when it is judged that it could unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is an external view of a display device according to an embodiment, and FIG. 2 is a functional block diagram of a display device according to an embodiment.

A display device may mean any electronic device having display means capable of displaying a screen for media or various information. For example, a display device according to one embodiment may include a computer device, a portable communication device (e.g., a smartphone), a portable multimedia device, and the like. The display device 100 according to another embodiment may include a conventional home appliance provided with display means.

Furthermore, a display device according to yet another embodiment may include a voice recognition device 100 provided with display means. Here, the voice recognition device 100 may mean an electronic device that recognizes a voice uttered by a user and is controlled according to a control command corresponding to the recognized voice. The following description assumes that the display device is implemented as the voice recognition device 100.

Referring to FIGS. 1 and 2, the voice recognition device 100 according to an embodiment may include: a microphone 110 that senses a user's voice; a display unit 130 on which a screen for media or various information is displayed; a speaker 140 that outputs sound from media playback or outputs sound as feedback for an input; a storage unit 160 in which information related to the voice recognition device 100 is stored; and a control unit 150 that controls each component of the voice recognition device 100.

The display unit 130 may be provided on the exterior of the voice recognition device 100 and may display a screen for media or for various information directly or indirectly related to the voice recognition device 100. For example, the display unit 130 may display video content that is stored in advance or streamed from an external server or cloud, and may display various information screens such as weather or news. The display unit 130 may also display a control screen for controlling functions of the voice recognition device 100. In this case, the control screen according to an embodiment may include a media playback screen M for controlling playback of media.

To this end, the display unit 130 may be implemented as a liquid crystal display (LCD), a light emitting diode (LED) display, a plasma display panel (PDP), an organic light emitting diode (OLED) display, a cathode ray tube (CRT), or the like, but is not limited thereto.

In addition, when the display unit 130 is combined with a touch panel, the display unit 130 may receive a control command by sensing a user's touch. For example, when a touch is detected at a specific position of the display unit 130 on which the control screen is displayed, the display unit 130 may provide an electrical signal corresponding to the touch to the control unit 150, described later. This electrical signal may be used to search for the control command corresponding to the touch.

The speaker 140 may output sound from media playback. For example, when a command to play specific media is input to the voice recognition device 100, the speaker 140 may output that media as sound. If the media for which the playback command is input is a video that includes sound, the speaker 140 may output the sound in synchronization with the image displayed on the display unit 130.

The speaker 140 may also output various sounds as feedback for an input to the voice recognition device 100. For example, when a control command is input by the user, the speaker 140 may output a sound indicating that the control command was input successfully. By hearing this sound, the user can confirm that the control command has been input to the voice recognition device 100.

The microphone 110 may be provided on the exterior of the voice recognition device 100 so that it can sense a voice uttered by the user. The microphone 110 may output a voice signal, which is an electrical signal corresponding to the sensed voice.

The control unit 150 may receive the voice signal from the microphone 110 and recognize the user's voice. Specifically, the control unit 150 may recognize the user's voice by applying a speech recognition algorithm or a speech recognition engine to the voice signal. More specifically, the control unit 150 may detect the actual voice section in the received voice signal through end point detection (EPD) and, within the detected section, extract a feature vector of the voice signal by applying a feature vector extraction technique such as cepstrum, linear predictive coefficients (LPC), mel-frequency cepstral coefficients (MFCC), or filter bank energy. The control unit 150 may obtain a recognition result by comparing the extracted feature vector with trained reference patterns. To this end, an acoustic model that models and compares the signal characteristics of speech and a language model that models the linguistic order relationships of words or syllables corresponding to the recognition vocabulary may be used.
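The end point detection step can be illustrated with a minimal sketch. The description does not say how EPD is performed, so the short-time energy threshold used here is a stand-in technique, and the function name `detect_endpoints`, the frame length, and the threshold value are all assumptions chosen for illustration.

```python
def detect_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the voiced section, or None.

    Assumption: a frame counts as voiced when its mean squared amplitude
    is at least `threshold`. The patent does not specify the EPD criterion.
    """
    voiced_starts = []
    # Walk over non-overlapping frames of frame_len samples.
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold:
            voiced_starts.append(i)
    if not voiced_starts:
        return None
    # The section runs from the first voiced frame to the end of the last one.
    return voiced_starts[0], voiced_starts[-1] + frame_len


# Two silent frames, three loud frames, one silent frame.
signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 160
section = detect_endpoints(signal)
```

Feature extraction such as MFCC would then be applied only to the samples inside the returned section.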

After obtaining the voice recognition result, the control unit 150 may search for the control command corresponding to that result. A set of control commands for controlling the voice recognition device 100 is stored in advance in the storage unit 160, described later, and the control unit 150 may compare the voice recognition result with the stored control command set and search for an appropriate control command according to similarity.
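As a rough illustration of searching the stored command set by similarity, the sketch below scores the recognized text against each stored phrase and returns the best match. The similarity measure is not specified in the description; `difflib.SequenceMatcher` from the Python standard library is used here as one possible choice, and the phrases, command identifiers, and threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical command set; the description only states that a set of
# control commands is stored in advance in the storage unit 160.
COMMAND_SET = {
    "play music": "PLAY_MUSIC",
    "play music video": "PLAY_MUSIC_VIDEO",
    "volume up": "VOLUME_UP",
    "volume down": "VOLUME_DOWN",
    "mute": "MUTE",
}


def search_command(recognized_text, command_set=COMMAND_SET, threshold=0.6):
    """Return the command whose phrase is most similar to the text, or None."""
    text = recognized_text.strip().lower()
    best_phrase, best_score = None, 0.0
    for phrase in command_set:
        score = SequenceMatcher(None, text, phrase).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    # Reject weak matches (the threshold is an assumed tuning value).
    if best_score < threshold:
        return None
    return command_set[best_phrase]
```

A slightly misrecognized utterance such as "volume upp" still resolves to the volume-up command, while unrelated text falls below the threshold and yields no command.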

When the control command search is complete, the control unit 150 may control the operation of the voice recognition device 100 based on the command found. For example, when the user says “Play music by OOO.”, the control unit 150 may search for the control command corresponding to the user's utterance and output the music corresponding to that command through the speaker 140. If the user says “Play the music video of △△△.”, the control unit 150 may output the music video corresponding to the user's utterance with the display unit 130 and the speaker 140 synchronized.

To this end, the control unit 150 may be implemented in hardware, such as a processor, or in software, such as a program, or alternatively as a combination of hardware and software.

The storage unit 160 may store in advance various information that the control unit 150 needs to control the voice recognition device 100. For example, the storage unit 160 may store in advance, and then provide to the control unit 150, the control command set for the voice recognition device 100; the speech recognition algorithm, speech recognition engine, acoustic model, and language model used for voice recognition; the media, information screens, and control screens displayed by the display unit 130; and the sounds output by the speaker 140.

To this end, the storage unit 160 may be implemented with at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.

As described above, the voice recognition device 100 may receive a control command through the user's voice or a touch on the display unit 130. However, when the voice recognition device 100 is located in a noisy environment, or when the user is in a situation where touch is not possible, it may be difficult for the user to control the voice recognition device 100. In such cases, the voice recognition device 100 needs to receive control commands by a method other than those described above.

The voice recognition device 100 according to an embodiment may receive a user's gesture as input, and each of its components may be controlled according to the control command corresponding to the input gesture. To this end, the voice recognition device 100 according to an embodiment may further include a camera 120 and an illuminance sensor 170.

The camera 120 may be provided on the exterior of the voice recognition device 100 and may acquire an image I of a predetermined region of interest. Here, the region of interest may mean the imaging area determined by the position at which the camera 120 is provided. The control unit 150 may detect a gesture in the acquired image I and control the voice recognition device 100 according to the control command corresponding to the detected gesture.

In addition, the illuminance sensor 170 may sense the illuminance around the voice recognition device 100, and the control unit 150 may control the voice recognition device 100 according to the control command corresponding to a gesture only when the sensed illuminance is equal to or greater than a predetermined reference illuminance.

Each component of the voice recognition device 100 may be controlled by various gestures that can be detected in an image. Examples of gestures used for control may include a hand gesture made with the user's hand, a finger gesture made with a finger of the user's hand, a head gesture made with the user's head, a facial expression gesture based on the user's facial expression, and/or a motion gesture based on movement of the user's body. Furthermore, a gesture used to control the voice recognition device 100 need not be made by the user; it is sufficient that it relates to the movement of a predetermined object identifiable in the image.

The following description assumes that the gesture used to control the voice recognition device 100 is a hand gesture.

Hereinafter, a method of controlling the voice recognition device 100 according to a hand gesture is described with reference to FIGS. 3 to 10, and specifically a method of controlling the media playback screen M displayed on the voice recognition device 100.

FIG. 3 is a diagram illustrating a media playback screen displayed by the display unit according to an embodiment, FIG. 4 is a diagram illustrating a case in which a first hand gesture is detected in an image acquired by a camera according to an embodiment, and FIG. 5 is a diagram illustrating a volume control screen displayed by the display unit according to an embodiment.

The display unit 130 may display a media playback screen M related to the playback of specific media. As described above, when a playback command for specific media is input by a method such as voice or touch, the control unit 150 may play that media and, at the same time, control the display unit 130 to display the associated media playback screen M.

The media playback screen M may show the type of media being played (for example, video, photo, or music), the title of the media, author information, a progress bar for the media, and the like.

When the media playback screen M is displayed on the display unit 130, the control unit 150 may determine whether the environment allows hand gesture input. An environment that allows hand gesture input may mean that the gesture recognition setting of the voice recognition device 100 is on and the illuminance sensed by the illuminance sensor 170 is equal to or greater than the reference illuminance. Here, the reference illuminance may mean the lowest illuminance at which a hand gesture can be detected in the image I acquired by the camera 120.

If it is determined that the environment allows hand gesture input, the control unit 150 may control the display unit 130 to display, on the media playback screen M, a gesture input available object M₁ indicating that hand gestures can be input. FIG. 3 illustrates the case in which the gesture input available object M₁ is displayed at the upper right of the media playback screen M, but the object M₁ may be displayed at any position on the media playback screen M where the user can see it.
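The condition for showing the gesture input available object M₁ can be written as a short check. This is a minimal sketch: the class and function names and the lux value used as the reference illuminance are assumptions, since the description only states that the gesture recognition setting must be on and the sensed illuminance must be at least the reference illuminance.

```python
from dataclasses import dataclass


@dataclass
class GestureEnvironment:
    gesture_setting_on: bool        # gesture recognition setting of the device
    illuminance_lux: float          # value sensed by the illuminance sensor 170
    reference_lux: float = 50.0     # assumed reference illuminance


def gesture_input_available(env: GestureEnvironment) -> bool:
    """True when both conditions in the description hold."""
    return env.gesture_setting_on and env.illuminance_lux >= env.reference_lux


def show_gesture_indicator(env: GestureEnvironment) -> bool:
    """Whether the object M1 should be drawn on the media playback screen M."""
    return gesture_input_available(env)
```

Keeping the check separate from the display decision lets the same condition gate gesture handling itself, as the description of the illuminance sensor 170 suggests.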

A user who sees the gesture input available object M₁ may input a control command through a hand gesture within the region of interest. For example, the user may control the operation of the voice recognition device by inputting, through a media setting screen, a hand gesture for a setting of the media being played by the voice recognition device. Here, a media setting may mean any item related to media playback that can be determined by the user's selection. For example, media settings that can be input by hand gesture may include brightness control of the display screen, size control of the display screen, media playback control, and the like. The media setting screen may mean a screen that provides a UI through which these media settings can be made.

The following description assumes that the hand gesture is for volume control among the media settings.

The camera 120 may acquire an image I of the region of interest, and FIG. 4 illustrates a case in which the image I contains a first hand gesture in which the user extends the index finger. The control unit 150 may detect the first hand gesture with the index finger extended in the image I acquired by the camera 120 and, according to the detected first hand gesture, control the display unit 130 to display the volume control screen V, one of the media setting screens, within the media playback screen M. In this case, the control unit 150 may control the display unit 130 to display the volume control screen V within the media playback screen M only when the first hand gesture with the index finger extended is detected for a predetermined reference time. Here, the reference time may mean the minimum time for which a user who intends to display the volume control screen V holds the first hand gesture.
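The reference-time condition amounts to a dwell check over successive camera frames. The sketch below assumes that some upstream detector labels each frame with a gesture name; the label `"index_extended"`, the class name `DwellDetector`, and the one-second reference time are hypothetical and do not come from the description.

```python
class DwellDetector:
    """Reports True once a target gesture has been held for reference_time seconds."""

    def __init__(self, target="index_extended", reference_time=1.0):
        self.target = target
        self.reference_time = reference_time
        self._start = None  # timestamp at which the current hold began

    def update(self, timestamp, gesture):
        """Feed one frame (timestamp in seconds, detected gesture label or None)."""
        if gesture != self.target:
            # Gesture lost or changed: the hold must start over.
            self._start = None
            return False
        if self._start is None:
            self._start = timestamp
        return timestamp - self._start >= self.reference_time


detector = DwellDetector(reference_time=1.0)
frames = [(0.0, "index_extended"), (0.5, "index_extended"), (1.0, "index_extended")]
results = [detector.update(t, g) for t, g in frames]
```

Resetting the timer whenever the gesture disappears means a brief, unintended pose does not open the volume control screen V.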

The control unit 150 may control the display unit 130 so that the volume control screen V is displayed overlapping the media playback screen M already displayed. Alternatively, the control unit 150 may control the display unit 130 so that the volume control screen V is displayed in an area separate from the media playback screen M.

Referring to FIG. 5, the volume control screen V may include a first control item V₁ for receiving a mute command, a second control item V₂ for receiving a volume down command, and a third control item V₃ for receiving a volume up command. In addition, the volume control screen V may further include a focus f for receiving a selection command for any one of the control items. In FIG. 5, the focus f is located on the second control item V₂.
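One way to picture the volume control screen V is as an ordered list of control items plus a focus index. The sketch below is illustrative only: the class name, the command identifiers, and the left-to-right ordering V₁, V₂, V₃ are assumptions, while the initial focus on the second item follows the description of FIG. 5.

```python
from dataclasses import dataclass


@dataclass
class VolumeControlScreen:
    # Control items V1, V2, V3 in assumed left-to-right screen order.
    items: tuple = ("MUTE", "VOLUME_DOWN", "VOLUME_UP")
    # Index of the item under the focus f; FIG. 5 shows it on V2.
    focus: int = 1

    def focused_command(self):
        """Return the command that a selection would trigger right now."""
        return self.items[self.focus]


screen = VolumeControlScreen()
current = screen.focused_command()
```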

So far, the method of controlling the media playback screen M through the first hand gesture has been described. A method of controlling the media playback screen M through a second hand gesture is described below with reference to FIGS. 6 and 7.

FIG. 6 is a diagram illustrating a case in which a second hand gesture is detected in an image acquired by a camera according to an embodiment, and FIG. 7 is a diagram illustrating a case in which the volume control screen of FIG. 5 is controlled by the second hand gesture.

When the volume control screen V is displayed, the user may move the focus f to select one of the control items V₁, V₂, and V₃. To do so, the user may input a control command for moving the focus f through a hand gesture within the region of interest. The camera 120 may acquire an image I of the region of interest, and FIG. 6 illustrates a case in which the image I contains a second hand gesture in which the user moves the hand horizontally, specifically to the left, with the index finger extended.

The control unit 150 may detect, from the image I acquired by the camera 120, the second hand gesture of moving horizontally with the index finger extended, and may control the display unit 130 to move the position of the focus f within the volume control screen V, which is one of the media setting screens, according to the detected second hand gesture. Specifically, when a second hand gesture moving to the left with the index finger extended is detected, the control unit 150 may control the display unit 130 to move the focus f to the right. Conversely, when a second hand gesture moving to the right with the index finger extended is detected, the control unit 150 may control the display unit 130 to move the focus f to the left.

FIG. 7 shows a case in which the focus f, previously located on the second control item V₂, has moved to the right upon detection of the leftward second hand gesture shown in FIG. 6. As a result, the focus f is located on the third control item V₃.
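The mirrored relationship between the direction of the second hand gesture in the image I and the direction in which the focus f moves can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `Direction`, `CONTROL_ITEMS`, and `move_focus` are hypothetical, and clamping the focus at the first and last control items is an assumption that the description does not address.

```python
from enum import Enum


class Direction(Enum):
    """Horizontal direction of the second hand gesture as seen in image I."""
    LEFT = "left"
    RIGHT = "right"


# Control items of the volume control screen V, ordered left to right
# (labels are hypothetical; the description names them V1, V2, V3).
CONTROL_ITEMS = ["V1_mute", "V2_volume_down", "V3_volume_up"]


def move_focus(focus_index: int, gesture: Direction,
               item_count: int = len(CONTROL_ITEMS)) -> int:
    """Return the new focus index after a second hand gesture.

    A leftward gesture in the image moves the focus to the right, and a
    rightward gesture moves it to the left, as described above.
    Clamping at the ends of the item list is an assumption.
    """
    step = 1 if gesture is Direction.LEFT else -1
    return max(0, min(item_count - 1, focus_index + step))


# FIG. 5 -> FIG. 7: focus on V2 (index 1), leftward gesture, focus ends on V3.
new_index = move_focus(1, Direction.LEFT)
print(CONTROL_ITEMS[new_index])
```

With the focus on V₂ and a leftward gesture, the sketch places the focus on V₃, which matches the transition from FIG. 5 to FIG. 7.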

So far, the method of controlling the media playback screen M through the second hand gesture has been described. Hereinafter, a method of controlling the media playback screen M through a third hand gesture will be described with reference to FIGS. 8 and 9.

FIG. 8 is a diagram illustrating a case in which a third hand gesture is detected in an image acquired by a camera according to an embodiment, and FIG. 9 is a diagram illustrating a case in which the volume control screen of FIG. 7 is controlled by the third hand gesture.

Once the focus f has moved onto the desired control item, the user can select that control item. To do so, the user may input a control command for selecting the control item on which the focus f is located through a third hand gesture within the region of interest. Here, the third hand gesture may consist of a first sub-gesture, in which the extended index finger is bent toward the camera 120, followed by a second sub-gesture, in which the bent index finger is extended again. Intuitively, the third hand gesture may remind the user of a mouse click.

The camera 120 may acquire an image I of the region of interest; FIG. 8 illustrates a case in which the image I contains the first sub-gesture of the third hand gesture, in which the user bends the extended index finger toward the camera 120.

The control unit 150 may detect, from the image I acquired by the camera 120, the third hand gesture in which the index finger is bent toward the camera 120 and then extended, and may select, according to the detected third hand gesture, the third control item V₃ on which the focus f is located in the volume control screen V, which is one of the media setting screens. When the third control item V₃ is selected, the control unit 150 may control the speaker 140 to raise the volume of the media being played in accordance with the selection.

At the same time, the control unit 150 may control the display unit 130 so that selection feedback indicating that the third control item V₃ has been selected is displayed on the volume control screen V. Referring to FIG. 9, the volume control screen V may increase the number of ring-shaped focuses f as the selection feedback. A user who visually confirms the selection feedback can recognize that the desired control item has been successfully selected.

Meanwhile, when the first sub-gesture for selecting the second control item or the third control item among the volume control items is held, the control unit 150 may select the control item a number of times corresponding to the holding time of the first sub-gesture. For example, if the user holds the first sub-gesture, that is, the hand gesture with the index finger bent toward the camera 120, while the focus f is on the third control item V₃, the control unit 150 may increase the volume of the media being played multiple times in proportion to the holding time of the first sub-gesture.

Furthermore, the control unit 150 may control the display unit 130 to display the selection feedback a number of times corresponding to the holding time of the first sub-gesture. For example, if the duration corresponding to one selection is 1 second and the first sub-gesture is held for 3 seconds, the control unit 150 may display three ring-shaped focuses f as the selection feedback.
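The relationship between the holding time of the first sub-gesture and the number of selections can be illustrated with a short sketch. The function names `selection_count` and `apply_volume_up`, the volume step of 1, and the volume ceiling of 100 are hypothetical. Treating a hold shorter than one unit duration as a single selection is also an assumption, made here so that an ordinary click-like third hand gesture still selects the item once.

```python
def selection_count(hold_seconds: float, unit_seconds: float = 1.0) -> int:
    """Number of selections for a first sub-gesture held for hold_seconds.

    One selection is counted per elapsed unit duration. A hold shorter
    than one unit still counts as one selection (an assumption).
    """
    if unit_seconds <= 0:
        raise ValueError("unit_seconds must be positive")
    return max(1, int(hold_seconds // unit_seconds))


def apply_volume_up(volume: int, count: int, step: int = 1,
                    max_volume: int = 100) -> int:
    """Raise the volume once per selection, capped at max_volume.

    The step size and the volume ceiling are hypothetical values.
    """
    return min(max_volume, volume + count * step)


# Example from the description: unit duration 1 s, gesture held for 3 s.
count = selection_count(3.0, 1.0)
print(count)                        # number of ring-shaped focuses to display
print(apply_volume_up(10, count))   # volume after the repeated selections
```

With a unit duration of 1 second and a holding time of 3 seconds, the sketch yields three selections, which corresponds to the three ring-shaped focuses f in the example above.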

So far, the method of controlling the media playback screen M through hand gestures has been described. Hereinafter, a method of ending control of the media playback screen M by hand gestures will be described with reference to FIG. 10.

FIG. 10 is a diagram illustrating a case in which the first hand gesture is not detected in an image acquired by a camera according to an embodiment.

When the user no longer needs to control the media playback screen M through hand gestures, the user can end the hand-gesture control of the media playback screen. To do so, the user may keep the first hand gesture from being detected within the region of interest.

As a result, the first hand gesture of the user may not be detected in the image I of the region of interest acquired by the camera 120. If the first hand gesture is not detected in the image I for a predetermined end time, the control unit 150 may control the display unit 130 so that the volume control screen V on the media playback screen M disappears.
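The end condition described above can be sketched as a small timer that hides the volume control screen V once the first hand gesture has been absent for the predetermined end time. The class name `SettingScreenTimeout`, the default end time of 3 seconds, and the use of caller-supplied timestamps are assumptions made for illustration; the description does not specify a value for the end time.

```python
class SettingScreenTimeout:
    """Tracks whether the volume control screen V should remain visible."""

    def __init__(self, end_time_s: float = 3.0):
        # Predetermined end time; the 3-second default is a hypothetical value.
        self.end_time_s = end_time_s
        self.last_seen = None   # timestamp of the last first-hand-gesture detection
        self.visible = False

    def update(self, first_gesture_detected: bool, now: float) -> bool:
        """Process one frame result and return the visibility of screen V."""
        if first_gesture_detected:
            self.last_seen = now
            self.visible = True
        elif self.visible and now - self.last_seen >= self.end_time_s:
            # The first hand gesture has been absent for the whole end time.
            self.visible = False
        return self.visible


timer = SettingScreenTimeout(end_time_s=3.0)
print(timer.update(True, now=0.0))    # gesture seen: screen shown
print(timer.update(False, now=2.0))   # absent for 2 s: still shown
print(timer.update(False, now=3.0))   # absent for 3 s: screen hidden
```

Passing timestamps in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic and easy to check.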

Meanwhile, even while the volume control screen V is displayed on the media playback screen M, if the user's voice or a touch on the display unit 130 is detected, the control unit 150 may give the detected voice or touch priority over the hand gesture and control the speech recognition device 100 according to the control command corresponding to the detected voice or touch. Specifically, when at least two of a hand gesture, a voice, and a touch are input simultaneously, the control unit 150 may control the speech recognition device 100 according to the priority order of touch, voice, and hand gesture.
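The priority order of touch, voice, and hand gesture for simultaneous inputs can be sketched as a simple lookup. The modality labels, the `resolve_input` function, and the representation of simultaneous inputs as a dictionary of command strings are hypothetical choices made for this illustration.

```python
from typing import Dict, Optional, Tuple

# Priority order stated in the description: touch, then voice, then hand gesture.
PRIORITY = ("touch", "voice", "hand_gesture")


def resolve_input(commands: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick the command to execute among simultaneously received inputs.

    `commands` maps a modality name to the control command it produced.
    Returns (modality, command) for the highest-priority modality present,
    or None when no input was received.
    """
    for modality in PRIORITY:
        if modality in commands:
            return modality, commands[modality]
    return None


# A voice command and a hand gesture arrive together: the voice command wins.
print(resolve_input({"hand_gesture": "volume_up", "voice": "pause"}))
```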

Meanwhile, the speech recognition device 100 may provide a tutorial mode for hand gestures when no control command has ever been input by hand gesture or when the device has been reset. The speech recognition device 100 according to an embodiment may ask the user whether to enter the tutorial mode and, upon entering the tutorial mode, guide the user through learning the first to third hand gestures described above. Because the user can thereby learn how to input control commands by hand gesture, the user convenience of the speech recognition device 100 can be improved.

FIG. 11 is a flowchart of a method for controlling a speech recognition device according to an embodiment.

First, the speech recognition device 100 may display the media playback screen M (S100). Here, the media playback screen M may refer to a screen for providing information related to media playback and for inputting control commands. The media playback screen M according to an embodiment may include the type of media being played (for example, video, photo, or music), the title of the media being played, author information, a progress bar for the media being played, and the like.

Next, the speech recognition device 100 may acquire an image I of a predetermined region of interest (S110). Here, the region of interest may refer to the imaging area determined by the position at which the camera 120 is provided.

When the image I of the region of interest is acquired, the speech recognition device 100 may check whether a hand gesture is detected in the acquired image I (S120). If no hand gesture is detected, the speech recognition device 100 may acquire an image of the region of interest again.

On the other hand, if a hand gesture is detected, the speech recognition device 100 may control its own operation according to the detected hand gesture (S130). The speech recognition device 100 according to an embodiment may control the media playback screen M according to the detected hand gesture. For example, when the first hand gesture is detected, the speech recognition device 100 may display the volume control screen V on the media playback screen M. When the second hand gesture is detected, the speech recognition device 100 may move the position of the focus f among the plurality of control items in the volume control screen V. Furthermore, when the third hand gesture is detected, the speech recognition device 100 may display selection feedback for the control item on which the focus f is located.
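The flow of steps S120 and S130 can be sketched as a dispatch over the detected gesture. This is a simplified sketch under stated assumptions rather than the control method itself: gesture detection is replaced by pre-labelled strings, the names `ScreenState` and `control_step` are hypothetical, the initial focus on V₂ follows FIG. 5, and ignoring the second and third hand gestures while the volume control screen V is hidden is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreenState:
    """Minimal display state for the media playback screen M."""
    volume_screen_visible: bool = False
    focus_index: int = 1          # 0: V1 mute, 1: V2 volume down, 2: V3 volume up
    feedback_shown: bool = False


def control_step(state: ScreenState, gesture: Optional[str]) -> ScreenState:
    """Apply one detection result (S120) to the screen state (S130)."""
    if gesture is None:
        # No hand gesture detected: nothing changes, acquire the next image.
        return state
    if gesture == "first":
        state.volume_screen_visible = True
    elif gesture in ("second_left", "second_right") and state.volume_screen_visible:
        # A leftward gesture in the image moves the focus right, and vice versa.
        step = 1 if gesture == "second_left" else -1
        state.focus_index = max(0, min(2, state.focus_index + step))
        state.feedback_shown = False
    elif gesture == "third" and state.volume_screen_visible:
        state.feedback_shown = True
    return state


state = ScreenState()
for detected in [None, "first", "second_left", "third"]:
    state = control_step(state, detected)
print(state)
```

Feeding the sequence of a first, a leftward second, and a third hand gesture leaves the volume control screen visible, the focus on the third control item, and the selection feedback displayed.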

The speech recognition device and control method described above can provide an environment in which operation can be controlled through simple and intuitive gestures in situations where it is difficult to input control commands by voice or touch. This can improve user convenience for the speech recognition device.

Meanwhile, each step included in the control method of the speech recognition device according to the embodiment described above may be implemented in a computer-readable recording medium on which a computer program programmed to perform these steps is recorded.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present invention.

According to an embodiment, the display device and control method described above can be used in various fields, such as in the home or at industrial sites, and therefore have industrial applicability.

100: speech recognition device
110: microphone
120: camera
130: display unit
140: speaker
150: control unit
160: storage unit
170: illuminance sensor

Claims

A speech recognition device for recognizing a detected voice, comprising:
a display unit that displays a media playback screen;
a camera that acquires an image of a predetermined region of interest; and
a control unit that, when a hand gesture is detected in the image acquired by the camera, controls an operation of the speech recognition device according to the detected hand gesture.

The speech recognition device of claim 1, wherein the control unit:
controls the display unit to display a media setting screen corresponding to a first hand gesture when the first hand gesture, for starting media setting, is detected in the image;
controls the display unit to move, according to a second hand gesture, the position of a focus on any one of a plurality of control items in the displayed media setting screen when the second hand gesture, for searching for a media setting item, is detected in the image; and
controls the display unit to display, according to a third hand gesture, selection feedback indicating that the control item on which the focus is located has been selected when the third hand gesture, for selecting the media setting item, is detected in the image.

The speech recognition device of claim 2, wherein
the third hand gesture consists of a combination of a first sub-gesture and a second sub-gesture performed in succession, and
when the first sub-gesture for selecting the media setting item is held in the image, the control unit controls the display unit to display the selection feedback, indicating that the control item has been selected, a number of times corresponding to the holding time of the first sub-gesture.

The speech recognition device of claim 2, wherein the control unit controls the display unit so that the media setting screen disappears when the first hand gesture is not detected in the image for a predetermined time.

The speech recognition device of claim 1, further comprising an illuminance sensor that detects the illuminance around the speech recognition device,
wherein the control unit controls the operation of the speech recognition device according to the detected hand gesture when, in addition to the hand gesture being detected in the image, the gesture recognition setting of the speech recognition device is on and the illuminance detected by the illuminance sensor is greater than or equal to a reference illuminance.

The speech recognition device of claim 1, wherein the control unit:
controls the operation of the speech recognition device according to any one of the detected hand gesture, the detected voice, and a touch on the display unit; and
when at least two of the hand gesture, the voice, and the touch are input simultaneously, controls the operation of the speech recognition device according to the priority order of the touch, the voice, and the hand gesture.

A control method of a speech recognition device for recognizing a detected voice, the method comprising:
displaying a media playback screen;
acquiring an image of a predetermined region of interest; and
when a gesture is detected in the acquired image, controlling an operation of the speech recognition device according to the detected gesture.

The method of claim 7, wherein controlling the operation of the speech recognition device comprises:
displaying a media setting screen corresponding to a first gesture when the first gesture, for starting media setting, is detected in the image;
moving, according to a second gesture, the position of a focus on any one of a plurality of control items in the displayed media setting screen when the second gesture, for searching for a media setting item, is detected in the image; and
displaying, according to a third gesture, selection feedback indicating that the control item on which the focus is located has been selected when the third gesture, for selecting the media setting item, is detected in the image.

The method of claim 8, wherein
the third gesture consists of a combination of a first sub-gesture and a second sub-gesture performed in succession, and
controlling the operation of the speech recognition device comprises, when the first sub-gesture for selecting the media setting item is held in the image, displaying the selection feedback, indicating that the control item has been selected, a number of times corresponding to the holding time of the first sub-gesture.

The method of claim 8, further comprising causing the media setting screen to disappear when the first gesture is not detected in the image for a predetermined time.

The method of claim 7, further comprising detecting the illuminance around the speech recognition device,
wherein controlling the operation of the speech recognition device comprises controlling the operation of the speech recognition device according to the detected gesture when, in addition to the gesture being detected in the image, the gesture recognition setting of the speech recognition device is on and the detected illuminance is higher than a reference illuminance.

The method of claim 7, wherein controlling the operation of the speech recognition device comprises, when at least two of the gesture, the voice, and a touch on the display unit on which the media playback screen is displayed are input simultaneously, controlling the operation of the speech recognition device according to the priority order of the touch, the voice, and the gesture.

A computer-readable recording medium on which is recorded a program including instructions for performing each step of the method according to any one of claims 7 to 12.