KR20190018330A

KR20190018330A - Method for recognizing voice and apparatus used therefor

Info

Publication number: KR20190018330A
Application number: KR1020170103169A
Authority: KR
Inventors: 임국찬; 배용우; 신동엽; 신승민; 신승호; 이학순; 전진수
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2017-08-14
Filing date: 2017-08-14
Publication date: 2019-02-22
Also published as: KR102331234B1; KR20210098424A; KR102304342B1

Abstract

According to one embodiment, a voice input method performed by a voice recognition apparatus comprises the steps of: when state information, which includes a value that changes when a user carries another device, is received from the other device, determining based on the received state information whether the other device is carried by the user; if it is determined that the other device is carried by the user, requesting the other device to transmit the sound input to its sound input unit to the voice recognition apparatus; and, when the sound is received from the other device, controlling voice recognition to be performed based on the received sound.

Description

The present invention relates to a speech recognition method and a speech recognition apparatus used therefor, and more particularly, to a method of recognizing speech in cooperation with another device that receives speech input, and a speech recognition apparatus used therefor.

A speech-recognition-based interactive device may include a plurality of voice input units (e.g., microphones). With a plurality of voice input units, speech arriving from various directions can be collected with a high recognition rate. FIG. 1 conceptually illustrates the configuration of an interactive device 1 that includes a plurality of voice input units 20. Referring to FIG. 1, the interactive device 1 may include a body part 10 forming its body and a plurality of voice input units 20 mounted on the body part 10. The voice input units 20 may be arranged directionally so as to face various directions.

FIG. 2 is a block diagram of the plurality of voice input units 20 shown in FIG. 1. Referring to FIG. 2, each of the voice input units 20 may be connected to an amplifier 21, and each amplifier 21 may be connected to a microprocessor (MCU) 22. Speech input through each voice input unit 20 is amplified by the amplifier 21 and then passed to the microprocessor 22. After receiving speech from each amplifier 21, the microprocessor 22 may perform speech recognition itself, or it may instead forward the speech to a separate speech recognition server so that recognition is performed there.

The interactive device 1 may be a stationary device that is used while fixed at a particular location. If the user is located a short distance from the interactive device 1, speech uttered by the user is easily recognized by the interactive device 1. If the user is located far from the interactive device 1, however, the speech is difficult for the interactive device 1 to recognize, because it may be distorted on its way to the interactive device 1.

Korean Patent Laid-Open Publication No. 2010-0115783 (published October 28, 2010)

Accordingly, the problem to be solved by the present invention is to provide a technique for improving the speech recognition rate when a user is located far from the speech recognition apparatus or when a noise source exists near the interactive device.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from the description below.

A voice input method according to one embodiment is performed by a speech recognition apparatus and includes: when state information, which includes a value that changes when a user carries another device, is received from the other device, determining based on the received state information whether the other device is carried by the user; when the other device is determined to be carried by the user, requesting the other device to transmit the sound input to its sound input unit to the speech recognition apparatus; and, when sound is received from the other device, controlling speech recognition to be performed based on the received sound.

A voice input apparatus according to one embodiment includes: a communication unit that communicates with another device; a control unit that, when state information having a value that changes when a user carries the other device is received from the other device through the communication unit, determines based on the received state information whether the other device is carried by the user and, when the other device is determined to be carried by the user, requests the other device to deliver the sound input to the sound input unit of the other device; and a speech recognition unit that, when the sound is received from the other device through the communication unit in response to the request, controls speech recognition to be performed based on the received sound.

According to one embodiment, when the user is carrying another device capable of speech recognition, the user's speech can be acquired by that device. Therefore, even if the user speaks at a location far from the speech recognition apparatus, the speech distortion that the distance between the user and the speech recognition apparatus could cause may be reduced or avoided.

In addition, when the user is not carrying another device, the entity that acquires the speech can be selected in consideration of the distance from the noise source. For example, when another device is located farther from the noise source than the voice input apparatus, the user's speech can be acquired by that other device. Therefore, even if a noise source is present in the voice input system, its effect on speech recognition can be minimized.

FIG. 1 conceptually illustrates the configuration of a typical interactive speech recognition apparatus.
FIG. 2 is a block diagram of the speech recognition unit of the interactive speech recognition apparatus shown in FIG. 1.
FIG. 3 conceptually illustrates the configuration of a speech recognition system to which a speech recognition apparatus according to one embodiment is applied.
FIG. 4 conceptually illustrates the configuration of the speech recognition apparatus shown in FIG. 3.
FIG. 5 illustrates an example procedure of a speech recognition method according to one embodiment.
FIG. 6 illustrates an example procedure of a speech recognition method according to another embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not limited to the embodiments disclosed below, however, and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intention or practice of users or operators. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 3 conceptually illustrates the configuration of a speech recognition system to which a speech recognition apparatus according to one embodiment is applied. FIG. 3 is only an example, however, and the speech recognition apparatus 100 should not be construed as applicable only to the speech recognition system 1000 shown in FIG. 3.

Referring to FIG. 3, the speech recognition system 1000 may include a speech recognition server 500, a speech recognition apparatus 100, and at least one other device 200, 210. A noise source 300 that emits noise may be located in the space where the speech recognition system 1000 is installed.

The speech recognition server 500 may be a server that extracts speech from sound and recognizes it. The sound processed by the speech recognition server 500 may be sound delivered from the speech recognition apparatus 100 or from the other devices 200 and 210. The speech recognition server 500 may use known techniques to extract and recognize speech from sound, and their description is omitted here.

The other devices 200 and 210 are described next. The term "other device" may refer collectively to any device that has a function of receiving sound from the outside. Accordingly, the other devices 200 and 210 include a sound input unit that receives sound. The other devices 200 and 210 may have a size or shape that allows the user to carry them while moving, and may include a communication unit (not shown) that exchanges information with the speech recognition apparatus 100. For example, the other devices 200 and 210 may be a smartphone, a smart pad, a smartwatch, a remote control with a sound input function, or a portable speaker with a sound input function.

The sound received by the other devices 200 and 210 may include human speech, sounds produced by objects, and noise generated by the noise source 300, but is not limited to these.

Although not shown in the drawings, the other devices 200 and 210 include a sensor unit. The sensor unit may be a component whose value changes when the user 400 holds the other device 200, 210 or moves while carrying it. For example, the sensor unit may include at least one of an acceleration sensor, a gyro sensor, a geomagnetic sensor, and a haptic sensor. Here, the haptic sensor may be a tactile sensor that detects whether the other device 200, 210 is being touched by the user 400.

The speech recognition apparatus 100 may be a device that recognizes speech uttered by the user 400 and provides an interactive service in response to the recognized speech. The speech recognition apparatus 100 may also control the other devices 200 and 210 so that they receive the speech uttered by the user 400. The configuration of the speech recognition apparatus 100 is described below.

FIG. 4 illustrates an example configuration of the speech recognition apparatus 100 shown in FIG. 3. Referring to FIG. 4, the speech recognition apparatus 100 may include a communication unit 110, a sound input unit 120, a speech recognition unit 130, a processing unit 140, a control unit 150, and a speaker 160. Unlike what is shown in FIG. 4, however, it may omit at least one of these components or further include components not shown in the drawing.

The communication unit 110 may be a wireless communication module. For example, the communication unit 110 may be, but is not limited to, a Bluetooth module, a Wi-Fi module, or an infrared communication module. Through the communication unit 110, the speech recognition apparatus 100 can exchange speech or speech-related data with the other devices 200 and 210. The communication unit 110 may be wirelessly connected to each of the plurality of other devices 200 and 210 sequentially or simultaneously.

The sound input unit 120 is a component that receives sound, such as a microphone, and may also include a component that amplifies the received sound. The sound received by the sound input unit 120 may include human speech, sounds produced by objects, and noise generated by the noise source 300, but is not limited to these. A plurality of sound input units 120 may be arranged and operated directionally.

The speech recognition unit 130 is a component that extracts speech from sound and recognizes it. The speech recognition unit 130 may be implemented by a memory that stores instructions programmed to recognize speech from sound and a microprocessor that executes those instructions.

The sound recognized by the speech recognition unit 130 may be sound input to the sound input unit 120 or sound delivered from the other devices 200 and 210.

The speech recognition unit 130 can extract a wake-up signal from sound and recognize it. The wake-up signal is a signal of a predefined form that wakes up the speech recognition apparatus 100. Once the speech recognition apparatus 100 recognizes the wake-up signal, it interprets speech subsequently uttered by the user 400 as a command from the user 400.
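The wake-up behaviour described above can be pictured as a small two-state gate. The following is a minimal sketch under stated assumptions: the class name `WakeGate`, the wake word `"hello device"`, and the idea that recognized speech arrives as text strings are all hypothetical and do not come from the patent.

```python
from typing import Optional


class WakeGate:
    """Treats utterances as commands only after the wake-up signal is seen."""

    def __init__(self, wake_word: str = "hello device") -> None:  # hypothetical wake word
        self.wake_word = wake_word
        self.awake = False

    def feed(self, utterance: str) -> Optional[str]:
        """Return the utterance as a command if already awake, otherwise None."""
        text = utterance.strip().lower()
        if not self.awake:
            # Before wake-up, only the predefined wake-up signal has any effect.
            if text == self.wake_word:
                self.awake = True
            return None
        # After wake-up, subsequent speech is interpreted as a user command.
        return text


gate = WakeGate()
results = [gate.feed(u) for u in ["turn on the light", "Hello device", "turn on the light"]]
```

The first utterance is ignored because the gate is still asleep; the same utterance is treated as a command once the wake word has been heard.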

In addition to the wake-up signal, the speech recognition unit 130 may also recognize commands uttered by the user 400. Alternatively, the speech recognition unit 130 may not recognize commands uttered by the user 400, in which case recognition of the user's commands may be performed by the speech recognition server 500.

The speech recognition unit 130 can extract noise from sound and recognize it. The algorithm that the speech recognition unit 130 uses to extract and recognize noise from sound is known, so its description is omitted.

The processing unit 140 is a component that provides an interactive service to the user 400. It may be implemented by a memory that stores instructions programmed to provide the interactive service and a microprocessor that executes those instructions. The processing unit 140 provides the interactive service using known algorithms, so their description is omitted.

Depending on the embodiment, the processing unit 140 may not be included in the speech recognition apparatus 100. In that case, the interactive service provided to the user 400 may be one that is generated by the speech recognition server 500 and delivered to the speech recognition apparatus 100.

The speaker 160 is a component that outputs sound to the outside. The speaker 160 used in the speech recognition apparatus 100 may be an ordinary speaker, so its description is omitted.

The control unit 150 may be implemented by a memory that stores instructions programmed to perform the functions described below and a microprocessor that executes those instructions. The control unit 150 is described in detail below.

First, the control unit 150 can search for other devices 200 and 210 located around the speech recognition apparatus 100. For example, when the communication unit 110 is implemented as a Bluetooth module, the control unit 150 can control the search using information such as the Bluetooth connection history.

When the search is complete, the control unit 150 can connect the discovered other devices 200 and 210 to the speech recognition apparatus 100.

Once communication is established, the control unit 150 can request state information, through the communication unit 110, from each of the connected other devices 200 and 210. In response to this request, state information can be received from each of the other devices 200 and 210 through the communication unit 110.
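The search, connect, and state-request sequence can be sketched as follows. This is an illustration only: `FakeDevice` and `collect_state` are hypothetical names, the devices are in-memory stand-ins rather than a real Bluetooth API, and filtering the search by connection history is just one assumed way of "using the connection history".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDevice:
    """In-memory stand-in for another device 200/210 (not a real Bluetooth API)."""
    name: str
    state: Dict[str, float] = field(default_factory=dict)
    connected: bool = False

    def connect(self) -> None:
        self.connected = True

    def report_state(self) -> Dict[str, float]:
        if not self.connected:
            raise RuntimeError("state requested before connection")
        return dict(self.state)


def collect_state(nearby: List[FakeDevice], history: List[str]) -> Dict[str, Dict[str, float]]:
    """Search (filtered by connection history), connect, then request state."""
    found = [d for d in nearby if d.name in history]  # search step (assumed filter)
    states: Dict[str, Dict[str, float]] = {}
    for device in found:
        device.connect()                              # connection step
        states[device.name] = device.report_state()   # state request and response
    return states


nearby = [FakeDevice("phone", {"accel": 0.4}), FakeDevice("unknown", {"accel": 0.0})]
states = collect_state(nearby, history=["phone", "remote"])
```

Only the device that appears in the connection history is connected and queried; the unknown device is left untouched.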

When state information is received from each of the other devices 200 and 210, the control unit 150 uses it to determine, for each other device, whether the user 400 is carrying that device. For example, if the value of the state information received from a given other device changes over time, the control unit 150 may determine that the device is carried by the user 400. More specifically, when the value of the acceleration sensor in the sensor unit changes over time, the control unit 150 may determine that the user 400 is carrying the device, because a changing acceleration value means that the user 400 is walking or running, with acceleration, while carrying the device. Alternatively, when the value of the haptic sensor in the sensor unit is within an error range of a predetermined value that is produced when the user 400 touches the device, the control unit 150 may determine that the user 400 is carrying the device.
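The two possession checks described above might look like the sketch below. The thresholds (`min_std`, `touch_value`, `tolerance`) and all function names are assumptions chosen for illustration; the patent does not specify how "changes over time" or the error range is quantified.

```python
from statistics import pstdev
from typing import Sequence


def accel_indicates_carried(samples: Sequence[float], min_std: float = 0.05) -> bool:
    """True if the accelerometer magnitude varies over time (assumed threshold)."""
    return len(samples) >= 2 and pstdev(samples) > min_std


def haptic_indicates_carried(value: float, touch_value: float = 1.0,
                             tolerance: float = 0.1) -> bool:
    """True if the haptic reading is within tolerance of the preset 'touched' value."""
    return abs(value - touch_value) <= tolerance


def is_carried(accel_samples: Sequence[float], haptic_value: float) -> bool:
    """Either indication alone is taken as evidence that the device is carried."""
    return accel_indicates_carried(accel_samples) or haptic_indicates_carried(haptic_value)


resting = is_carried([9.8, 9.8, 9.8], haptic_value=0.0)
walking = is_carried([9.8, 10.4, 9.1], haptic_value=0.0)
touched = is_carried([9.8, 9.8], haptic_value=0.95)
```

A device lying still with no touch is classified as not carried, while either a varying acceleration or a touch reading near the preset value is enough to classify it as carried.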

As described above, the sensor unit may include at least one of an acceleration sensor, a gyro sensor, a geomagnetic sensor, and a haptic sensor. The control unit 150 may determine whether the other device is carried by the user 400 by combining at least two of the values measured by these sensors.
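One simple way to combine several sensor indications is to require agreement from at least two of them. The rule below is an assumption made for illustration; the patent says only that at least two measured values are combined, not how. The function name and the `min_agreeing` parameter are hypothetical.

```python
from typing import Mapping


def carried_by_combination(indications: Mapping[str, bool], min_agreeing: int = 2) -> bool:
    """Return True when at least `min_agreeing` sensors indicate possession."""
    return sum(indications.values()) >= min_agreeing


readings = {"acceleration": True, "gyro": True, "geomagnetic": False, "haptic": False}
decision = carried_by_combination(readings)
```

Requiring agreement from two sensors reduces the chance that a single spurious reading, such as a vibration on a table, is mistaken for possession.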

When the control unit 150 determines that one (200) of the other devices 200 and 210 is carried by the user 400, the control unit 150 may deliver the following command to that other device 200 through the communication unit 110. The command may be, but is not limited to, an instruction for the other device 200 to deliver the sound input through its own sound input unit to the speech recognition apparatus 100.

That is, according to one embodiment, when the user is carrying another device that has a speech recognition function, the user's speech can be acquired from that device. Therefore, even if the user speaks at a location far from the speech recognition apparatus, the speech distortion that the distance between the user and the speech recognition apparatus could cause may be reduced or avoided.

In this case, depending on the embodiment, the control unit 150 may control the sound input unit 120 so that no sound is input through the sound input unit 120 of the speech recognition apparatus 100, or may control the speech recognition unit 130 so that any sound input through the sound input unit 120 is excluded from speech recognition. This may apply, for example, when sound is being output through the speaker 160. As a result, only the sound input to the other device 200 is subject to speech recognition, and the sound input to the speech recognition apparatus 100 can be excluded.

Meanwhile, the user 400 may not be carrying any of the other devices 200 and 210. Based on the state information, the control unit 150 can determine that none of the other devices 200 and 210 is carried by the user 400, in which case it may deliver the following command to each of the other devices 200 and 210 through the communication unit 110. The command may be, but is not limited to, an instruction for each other device 200, 210 to deliver the sound input through its own sound input unit (hereinafter, the first sound) to the speech recognition apparatus 100. At the same time, the control unit 150 may set the sound input unit 120 of the speech recognition apparatus 100 to receive sound (hereinafter, the second sound). This command is referred to below as the first command.

In accordance with the first command, the first sound can be received through the communication unit 110 and the second sound can be input through the sound input unit 120. The control unit 150 can then control the speech recognition unit 130 to extract noise characteristics from the first sound and the second sound. Accordingly, the speech recognition unit 130 can extract a noise characteristic from the first sound (hereinafter, the characteristic of the first noise) and a noise characteristic from the second sound (hereinafter, the characteristic of the second noise). Here, a noise characteristic may mean a property such as the magnitude or frequency of the noise.
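As a rough illustration of what "magnitude" and "frequency" of noise could mean in practice, the sketch below computes the RMS level and the dominant frequency of a short segment. It assumes the segment already contains only noise (the patent leaves the extraction itself to known algorithms), and `noise_characteristics` is a hypothetical name. A naive DFT is used to keep the example dependency-free; it is only suitable for short segments.

```python
import cmath
import math
from typing import Sequence, Tuple


def noise_characteristics(samples: Sequence[float], sample_rate: float) -> Tuple[float, float]:
    """Return (RMS magnitude, dominant frequency in Hz) of a noise-only segment."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Naive DFT over the positive-frequency bins, skipping the DC bin.
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        if abs(coeff) > best_power:
            best_bin, best_power = k, abs(coeff)
    return rms, best_bin * sample_rate / n


# A 50 Hz tone sampled at 800 Hz for 64 samples, standing in for a steady hum.
tone = [0.5 * math.sin(2 * math.pi * 50 * i / 800) for i in range(64)]
rms, freq = noise_characteristics(tone, 800)
```

For this synthetic hum the RMS level is the amplitude divided by the square root of two, and the dominant bin falls on 50 Hz.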

The control unit 150 may compare a characteristic of the first noise with that of the second noise, for example their magnitudes. From the comparison, the control unit 150 can determine which of the speech recognition apparatus 100 and the other device 200 is located relatively farther from the noise source 300, and can select the target of speech recognition accordingly. For example, when the magnitude of the first noise is greater than that of the second noise, the control unit 150 may determine that the other device 200 is closer to the noise source 300 than the speech recognition apparatus 100 and select the second sound as the target of speech recognition; in the opposite case, it may select the first sound.
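The selection rule in the paragraph above reduces to a single comparison. In the sketch below, the function name and the string labels are hypothetical, and treating equal noise levels as "use the first sound" is an assumption, since the text does not address ties.

```python
def select_recognition_target(first_noise_level: float, second_noise_level: float) -> str:
    """Pick which sound to recognize; louder noise implies a nearer noise source."""
    if first_noise_level > second_noise_level:
        # The other device is nearer the noise source, so use the apparatus's own input.
        return "second_sound"
    # Otherwise use the sound captured by the other device (ties are an assumption).
    return "first_sound"


noisy_device = select_recognition_target(first_noise_level=0.6, second_noise_level=0.2)
noisy_apparatus = select_recognition_target(first_noise_level=0.1, second_noise_level=0.4)
```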

That is, according to one embodiment, when the user is not carrying another device, the entity that acquires the speech can be selected in consideration of the distance from the noise source. For example, when the other device is located farther from the noise source than the voice input apparatus, the user's speech can be acquired by the other device; when the other device is located closer to the noise source than the voice input apparatus, the user's speech can be acquired by the voice input apparatus. Therefore, even if a noise source is present in the voice input system, its effect on speech recognition can be minimized.

In the embodiment described above, only one voice input apparatus 100 and one other device 200 are considered when selecting the target of speech recognition. Depending on the embodiment, however, one voice input apparatus 100 and a plurality of other devices 200 and 210 may be considered, or the voice input apparatus 100 may be excluded and only the plurality of other devices 200 and 210 considered.
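With more than two candidates, the same idea generalizes to picking the source with the lowest measured noise level. This is a sketch under assumptions: the name `pick_quietest`, the `exclude` parameter, and the example levels are hypothetical, and "lowest noise level" is used here as a proxy for "farthest from the noise source".

```python
from typing import Dict, Optional


def pick_quietest(noise_levels: Dict[str, float], exclude: Optional[str] = None) -> str:
    """Return the candidate whose measured noise level is lowest."""
    candidates = {name: lvl for name, lvl in noise_levels.items() if name != exclude}
    if not candidates:
        raise ValueError("no candidate sound sources")
    return min(candidates, key=candidates.get)


levels = {"apparatus_100": 0.20, "device_200": 0.35, "device_210": 0.25}
with_apparatus = pick_quietest(levels)
devices_only = pick_quietest(levels, exclude="apparatus_100")
```

Passing `exclude` covers the variant in which the apparatus itself is left out and only the other devices are compared.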

Meanwhile, when the user 400 is not carrying any of the other devices 200 and 210, the control unit 150 may cause a second command, rather than the first command, to be transmitted to the other device 200 through the communication unit 110. Here, which of the first command and the second command the control unit 150 transmits may be set by the user 400 or may be changed periodically by a predetermined algorithm.

The second command may instruct the other device 200 to periodically transmit the sound input to its own sound input unit (hereinafter referred to as the third sound) to the voice recognition apparatus 100, but is not limited thereto. In this case, the control unit 150 may set the sound input unit 120 included in the voice recognition apparatus 100 to receive a sound (hereinafter referred to as the fourth sound).

In accordance with the second command, the third sound may be received periodically through the communication unit 110. Based on the periodically received third sound, the control unit 150 may determine whether the distance between the other device 200 and the noise source 300 is increasing or decreasing. This determination can be made based on, for example, the loudness of the sound or its frequency characteristics, but is not limited thereto.

When the control unit 150 determines that the distance between the other device 200 and the noise source 300 is increasing, it may select the third sound as the target of voice recognition; in the opposite case, it may select the fourth sound as the target of voice recognition.
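
The selection based on the periodically received third sound can be sketched as a trend test on successive noise levels. This is a simplified illustration under stated assumptions: a falling noise level is taken to mean that the other device is moving away from the noise source, the trend is estimated with a least-squares slope, and a flat trend is treated as the "opposite case". The function names and the tolerance parameter are hypothetical.

```python
from typing import Sequence


def noise_trend(levels: Sequence[float], tolerance: float = 0.0) -> str:
    """Classify noise levels measured at regular intervals.

    Returns "receding" when the level falls over time, "approaching" when
    it rises, and "steady" otherwise or when fewer than two levels exist.
    """
    n = len(levels)
    if n < 2:
        return "steady"
    mean_x = (n - 1) / 2
    mean_y = sum(levels) / n
    # Least-squares slope of level against sample index.
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(levels))
    den = sum((i - mean_x) ** 2 for i in range(n))
    slope = num / den
    if slope < -tolerance:
        return "receding"
    if slope > tolerance:
        return "approaching"
    return "steady"


def select_by_trend(levels: Sequence[float]) -> str:
    """Return "third" when the device recedes from the noise source, else "fourth"."""
    return "third" if noise_trend(levels) == "receding" else "fourth"
```

A real system would likely smooth the levels or combine them with frequency characteristics, which the text also mentions as a possible basis for the determination.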

FIGS. 5 and 6 illustrate, by way of example, the procedure of a voice recognition method according to one embodiment. The method can be performed by the voice recognition apparatus 100 described above. However, at least one of the steps shown in FIGS. 5 and 6 may be omitted, the steps may be performed in an order different from that shown, and other steps not shown may also be performed.

Referring to FIG. 5, state information is received from each of the other devices 200 and 210 through the communication unit 110 (S100). Although not shown in the drawing, the following steps may precede step S100: the control unit 150 searches for other devices 200 and 210 in the vicinity of the voice recognition apparatus 100; once the search is complete, the voice recognition apparatus 100 connects to the other devices 200 and 210; and once communication is established, the control unit 150 requests state information from each of the other devices 200 and 210 through the communication unit 110.

When state information is received from each of the other devices 200 and 210 in step S100, the control unit 150 uses it to determine whether any of the other devices 200 and 210 is being carried by the user 400 (S200). For example, as described above, if the value of the state information received from one of the other devices changes over time, the control unit 150 may determine that the other device that transmitted that state information is being carried by the user 400.
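
The check in step S200 can be sketched as a test for variation in the reported state values. The sketch assumes that the state information arrives as a short series of numeric readings per device and that a spread above a threshold counts as "changing over time"; the function names, the device labels, and the threshold value are hypothetical.

```python
from typing import Dict, Optional, Sequence


def is_carried(readings: Sequence[float], threshold: float = 0.05) -> bool:
    """Return True when the readings vary by more than the threshold."""
    if len(readings) < 2:
        return False
    return max(readings) - min(readings) > threshold


def find_carried_device(status: Dict[str, Sequence[float]],
                        threshold: float = 0.05) -> Optional[str]:
    """Return the first device whose state values change over time, or None.

    status maps a hypothetical device label to the state values received
    from that device, in order of arrival.
    """
    for device_id, readings in status.items():
        if is_carried(readings, threshold):
            return device_id
    return None
```

A result of None corresponds to the branch in which none of the other devices is being carried by the user.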

When the control unit 150 determines that one (200) of the other devices 200 and 210 is being carried by the user 400, the control unit 150 may transmit a command to that other device 200 through the communication unit 110 (S300). As described above, the command may instruct the other device 200 to transmit the sound received through its own sound input unit to the voice recognition apparatus 100.

When the sound is delivered from the other device 200, it may be forwarded to the voice recognition server 500, where the voice is extracted, and an interactive service may be provided based on that voice.

In this case, as described above, depending on the embodiment the control unit 150 may control the sound input unit 120 so that no sound is input through the sound input unit 120 included in the voice recognition apparatus 100, or may control the voice recognition unit 130 so that any sound input through the sound input unit 120 is excluded from voice recognition.

Meanwhile, the user 400 may not be carrying any of the other devices 200 and 210. The control unit 150 may determine, based on the state information, that none of the other devices 200 and 210 is being carried by the user 400, in which case it may transmit a first command to each of the other devices 200 and 210 through the communication unit 110 (S210). As described above, the first command may instruct each of the other devices 200 and 210 to transmit the sound input to its own sound input unit (hereinafter referred to as the first sound) to the voice recognition apparatus 100. In addition, the control unit 150 may set the sound input unit 120 included in the voice recognition apparatus 100 to receive a sound (hereinafter referred to as the second sound) (S211).

In accordance with the setting of step S210, the first sound may be received through the communication unit 110, and the second sound may also be input. The control unit 150 may then control the voice recognition unit 130 to extract a noise characteristic from the first sound (hereinafter referred to as the characteristic of the first noise) and a noise characteristic from the second sound (hereinafter referred to as the characteristic of the second noise) (S212).

The control unit 150 may compare the characteristic of the first noise with the characteristic of the second noise (S213). From the comparison, the control unit 150 can determine which of the voice recognition apparatus 100 and the other device 200 is relatively farther from the noise source 300, and can select the target of voice recognition accordingly (S214, S215). The selected sound may be transmitted to the voice recognition server 500 for voice recognition, and an interactive service may be provided accordingly.

Meanwhile, when the user 400 is not carrying any of the other devices 200 and 210, the control unit 150 may instead cause a second command, different from the first command, to be transmitted to the other device 200 through the communication unit 110. This case is described with reference to FIG. 6.

Of the steps shown in FIG. 6, S100 and S400 are the same as in FIG. 5, so the description given for FIG. 5 applies to them as well.

The second command may instruct the other device 200 to periodically transmit the sound received through its own sound input unit (hereinafter referred to as the third sound) to the voice recognition apparatus 100, but is not limited thereto (S220). In addition, the control unit 150 may set the sound input unit 120 included in the voice recognition apparatus 100 to receive a sound (hereinafter referred to as the fourth sound).

In accordance with the second command, the third sound may be received periodically through the communication unit 110. The control unit 150 may then determine whether the distance between the other device 200 and the noise source 300 is increasing or decreasing (S222). This can be determined by extracting noise from the periodically received third sound (S221) and then analyzing characteristics of that noise, such as time delay or distortion.

When the control unit 150 determines that the distance between the other device 200 and the noise source 300 is increasing, it may select the third sound as the target of voice recognition (S223); in the opposite case, it may select the fourth sound as the target of voice recognition (S224). The selected sound may be transmitted to the voice recognition server 500 for voice recognition, and an interactive service may be provided accordingly.

As described above, according to one embodiment, when the user is carrying another device capable of voice recognition, the user's voice can be acquired by that other device. Therefore, even if the user speaks at a location far from the voice recognition apparatus, the voice distortion that could arise from the distance between the user and the voice recognition apparatus may be reduced or avoided.

In addition, when the user is not carrying the other device, the object that acquires the voice can be selected in consideration of its distance from the noise source. For example, when the other device is located farther from the noise source than the voice recognition apparatus, the user's voice can be acquired by the other device. Therefore, even if a noise source exists in the voice input system, its influence on voice recognition can be minimized.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

According to one embodiment, even if the user speaks from a location far from the voice recognition apparatus, the voice distortion that could arise from the distance between the user and the voice recognition apparatus may be avoided or reduced. In addition, even if a noise source is present in the voice input system, its influence on voice recognition can be kept to a minimum.

100: Voice recognition apparatus
200, 210: Other devices
300: Noise source
400: User

Claims

A speech recognition method performed by a speech recognition apparatus,
Determining, when state information including a value that changes when a user carries another device is received from the other device, whether the other device is being carried by the user based on the received state information;
Requesting the other device to transmit sound input to the sound input unit of the other device to the voice recognition device when it is determined that the other device is owned by the user;
And controlling the voice recognition to be performed based on the received voice when the voice is received from the other device
Speech recognition method.

The method according to claim 1,
Wherein the controlling comprises:
When the other device is judged to be held by the user and sound is output through the speaker of the voice recognition device, the sound received through the voice input device of the voice recognition device, Control to perform speech recognition on a basis
Speech recognition method.

The method according to claim 1, further comprising:
Requesting the other device to transmit a first sound input to the sound input unit of the other device to the voice recognition device if it is determined in the determining step that the other device is not carried by the user;
Requesting a sound input unit included in the speech recognition apparatus to receive a second sound;
Extracting a magnitude of a first noise from information about the first sound, extracting a magnitude of a second noise from information about the second sound, and comparing the magnitudes of the first noise and the second noise; and
Controlling the voice recognition to be performed based on the second sound if the magnitude of the first noise is larger than the magnitude of the second noise, and controlling the voice recognition to be performed based on the first sound if the magnitude of the second noise is larger
Speech recognition method.

The method according to claim 1,
Requesting the other device to periodically transmit the sound input to the sound input unit of the other device to the voice recognition device when it is determined that the other device is not held by the user in the determining step;
Extracting noise included in the sound from information on the periodically received sound;
Determining, based on the noise, whether the distance between the other device and the noise source generating the noise is increasing or decreasing;
When it is determined that the other device is moving away from the noise source, controlling the voice recognition to be performed based on the sound input to the sound input unit of the other device, and when it is determined that the other device is approaching the noise source, controlling the voice recognition to be performed based on the sound received through the sound input unit included in the speech recognition apparatus
Speech recognition method.

A communication unit for performing communication with another device,
A control unit that, when state information including a value that changes when a user carries the other device is received from the other device through the communication unit, determines whether the other device is being carried by the user based on the received state information, and requests the other device to transmit the sound input to the sound input unit of the other device when it is determined that the other device is being carried by the user;
And a voice recognition unit for performing voice recognition based on the received voice when the voice is received from the other device through the communication unit according to the request
Speech recognition apparatus.