KR102304342B1

KR102304342B1 - Method for recognizing voice and apparatus used therefor

Info

Publication number: KR102304342B1
Application number: KR1020170103169A
Authority: KR
Inventors: 임국찬; 배용우; 신동엽; 신승민; 신승호; 이학순; 전진수
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2017-08-14
Filing date: 2017-08-14
Publication date: 2021-09-23
Also published as: KR20210098424A; KR20190018330A; KR102331234B1

Abstract

A voice input method according to an embodiment is performed by a voice recognition apparatus and includes: when status information, including a value that changes when a user carries another device, is received from the other device, determining based on the received status information whether the other device is carried by the user; when it is determined that the other device is carried by the user, requesting the other device to transmit the sound input to the sound input unit of the other device; and when the sound is received from the other device, controlling voice recognition to be performed based on the received sound.

Description

Speech recognition method and device used therefor

The present invention relates to a voice recognition method and a voice recognition apparatus used therefor, and more particularly, to a method of recognizing a voice by interworking with another device that receives the voice input, and a voice recognition apparatus used therefor.

An interactive device based on voice recognition may include a plurality of voice input units (e.g., microphones). When a plurality of voice input units are provided, voices originating from various directions can be collected with a high recognition rate. FIG. 1 is a diagram conceptually illustrating the configuration of an interactive device 1 including a plurality of voice input units 20. Referring to FIG. 1, the interactive device 1 may include a body portion 10 constituting its body and a plurality of voice input units 20 mounted on the body portion 10. The plurality of voice input units 20 may be arranged directionally so as to face various directions.

FIG. 2 is a block diagram of the plurality of voice input units 20 shown in FIG. 1. Referring to FIG. 2, each of the plurality of voice input units 20 may be connected to an amplifier 21, and each amplifier 21 may be connected to a microprocessor (MCU) 22. A voice input through each of the voice input units 20 is amplified by the amplifier 21 and then transferred to the microprocessor 22. After receiving the voice from each amplifier 21, the microprocessor 22 may perform voice recognition directly. Alternatively, it may transmit the voice to a separate voice recognition server so that voice recognition is performed there.

The interactive device 1 may be a stationary device that is used while fixed at a specific location. If the user is a short distance away from the interactive device 1, a voice uttered by the user is easily recognized by the interactive device 1. However, if the user is a long distance away from the interactive device 1, the voice uttered by the user is difficult for the interactive device 1 to recognize, because the voice may be distorted on its way to the interactive device 1.

Korean Patent Laid-Open Publication No. 2010-0115783 (published on October 28, 2010)

Accordingly, the problem to be solved by the present invention is to provide a technique for improving the voice recognition rate when the user is a long distance away from the voice recognition apparatus or when a noise source exists near the interactive device.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

A voice input device according to an embodiment includes: a communication unit that communicates with another device; a control unit that, when status information having a value that changes when a user carries the other device is received from the other device through the communication unit, determines based on the received status information whether the other device is carried by the user and, when it is determined that the other device is carried by the user, requests the other device to transmit the sound input to the sound input unit of the other device; and a voice recognition unit that, when the sound is received from the other device through the communication unit in response to the transmission request, controls voice recognition to be performed based on the received sound.

According to an embodiment, when the user is carrying another device capable of voice recognition, the user's voice can be acquired by that other device. Accordingly, even if the user speaks from a position far from the voice recognition apparatus, the voice distortion that the distance between the user and the voice recognition apparatus could cause may be reduced or may not occur at all.

In addition, when the user is not carrying another device, the object that acquires the voice may be selected in consideration of its distance from the noise source. For example, when the other device is located farther from the noise source than the voice input device, the user's voice may be acquired by the other device. Accordingly, even if a noise source exists within the voice input system, its influence on voice recognition can be minimized.

FIG. 1 is a diagram conceptually illustrating the configuration of a general interactive voice recognition apparatus.
FIG. 2 is a block diagram of the voice recognition unit of the interactive voice recognition apparatus shown in FIG. 1.
FIG. 3 is a diagram conceptually illustrating the configuration of a voice recognition system to which a voice recognition apparatus according to an embodiment is applied.
FIG. 4 is a diagram conceptually illustrating the configuration of the voice recognition apparatus shown in FIG. 3.
FIG. 5 is a diagram exemplarily illustrating the procedure of a voice recognition method according to an embodiment.
FIG. 6 is a diagram exemplarily illustrating the procedure of a voice recognition method according to another embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, a detailed description of a well-known function or configuration is omitted when it is judged that it would unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 3 is a diagram conceptually illustrating the configuration of a voice recognition system to which a voice recognition apparatus according to an embodiment is applied. However, since FIG. 3 is merely exemplary, the voice recognition apparatus 100 should not be interpreted as being applicable only to the voice recognition system 1000 shown in FIG. 3.

Referring to FIG. 3, the voice recognition system 1000 may include a voice recognition server 500, a voice recognition apparatus 100, and at least one other device 200, 210. A noise source 300 that emits noise may be located in the space where the voice recognition system 1000 is installed.

The voice recognition server 500 may be a server that extracts a voice from a sound and recognizes it. The sound processed by the voice recognition server 500 may be a sound received from the voice recognition apparatus 100 or from the other devices 200 and 210. The voice recognition server 500 may use known techniques to extract and recognize a voice from a sound, and a description of them is omitted.

The other devices 200 and 210 are described next. The term covers any device that has a function of receiving sound from the outside. Accordingly, each of the other devices 200 and 210 includes a sound input unit that receives sound. The other devices 200 and 210 may have a size or shape that allows the user to move while carrying them. They may also include a communication unit (not shown) that exchanges information with the voice recognition apparatus 100. For example, the other devices 200 and 210 may be a smartphone, a smart pad, a smart watch, a remote control with a sound input function, or a portable speaker with a sound input function.

The sound received by the other devices 200 and 210 may include a human voice, a sound generated by an object, or noise generated by the noise source 300, but is not limited thereto.

Meanwhile, although not shown in the drawings, each of the other devices 200 and 210 includes a sensor unit. The sensor unit may be a component whose value changes when the user 400 carries the other device 200, 210 or moves while carrying it. For example, the sensor unit may include at least one of an acceleration sensor, a gyro sensor, a geomagnetic sensor, and a haptic sensor. Here, the haptic sensor may be a tactile sensor that detects whether the other device 200, 210 is touched by the user 400.

The voice recognition apparatus 100 may be a device that recognizes a voice uttered by the user 400 and provides an interactive service in response to the recognized voice. The voice recognition apparatus 100 may also control the other devices 200 and 210 so that they receive the voice uttered by the user 400. The configuration of the voice recognition apparatus 100 is described below.

FIG. 4 is a diagram exemplarily illustrating the configuration of the voice recognition apparatus 100 shown in FIG. 3. Referring to FIG. 4, the voice recognition apparatus 100 may include a communication unit 110, a sound input unit 120, a voice recognition unit 130, a processing unit 140, a control unit 150, and a speaker 160. Unlike what is shown in FIG. 4, however, it may omit at least one of these components or may further include components not shown in the drawing.

The communication unit 110 may be a wireless communication module. For example, the communication unit 110 may be any one of a Bluetooth module, a Wi-Fi module, or an infrared communication module, but is not limited thereto. Through the communication unit 110, the voice recognition apparatus 100 may exchange voice or voice-related data with the other devices 200 and 210. The communication unit 110 may be wirelessly connected to each of the plurality of other devices 200 and 210 sequentially or simultaneously.

The sound input unit 120 is a component that receives sound, such as a microphone, and may also include a component that amplifies the received sound. The sound received by the sound input unit 120 may include a human voice, a sound generated by an object, or noise generated by the noise source 300, but is not limited thereto. A plurality of sound input units 120 may be arranged and operated directionally.

The voice recognition unit 130 is a component that extracts a voice from a sound and recognizes it. The voice recognition unit 130 may be implemented by a memory that stores instructions programmed to recognize a voice from a sound and a microprocessor that executes those instructions.

The sound recognized by the voice recognition unit 130 may be a sound input to the sound input unit 120 or a sound received from the other devices 200 and 210.

The voice recognition unit 130 may extract a wake-up signal from the sound and recognize it. The wake-up signal is a signal of a predefined form that wakes up the voice recognition apparatus 100. Once the voice recognition apparatus 100 recognizes the wake-up signal, it interprets any voice subsequently uttered by the user 400 as a command from the user 400.

Meanwhile, the voice recognition unit 130 may also recognize commands uttered by the user 400 in addition to the wake-up signal. Alternatively, the voice recognition unit 130 may not recognize commands uttered by the user 400, in which case recognition of the user's commands may be performed by the voice recognition server 500.
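The wake-up behavior described above can be sketched as a small two-state gate. This is only an illustration: it assumes the sound has already been transcribed to text, and the wake-up phrase `WAKE_WORD` and the class `WakeUpGate` are hypothetical names that do not appear in the patent.

```python
from dataclasses import dataclass, field
from typing import List

WAKE_WORD = "hello device"  # hypothetical wake-up phrase, not from the patent


@dataclass
class WakeUpGate:
    """Treats utterances as commands only after the wake-up signal is heard."""

    awake: bool = False
    commands: List[str] = field(default_factory=list)

    def on_utterance(self, text: str) -> None:
        if not self.awake:
            # Before wake-up, everything except the wake-up signal is ignored.
            if text.strip().lower() == WAKE_WORD:
                self.awake = True
            return
        # After wake-up, the utterance is interpreted as a user command.
        self.commands.append(text)


gate = WakeUpGate()
for heard in ["turn on the light", "hello device", "play music"]:
    gate.on_utterance(heard)
```

In this sketch the first utterance is dropped because the gate is not yet awake, and only the utterance after the wake-up phrase is stored as a command. Whether a command is then recognized locally or forwarded to a server is left out.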

The voice recognition unit 130 may extract noise from a sound and recognize it. The algorithm that the voice recognition unit 130 uses to extract and recognize noise from a sound is well known, and a description of it is omitted.

The processing unit 140 is a component that provides an interactive service to the user 400. It may be implemented by a memory that stores instructions programmed to provide the interactive service and a microprocessor that executes those instructions. Since the processing unit 140 provides the interactive service using known algorithms, a description of them is omitted.

Meanwhile, depending on the embodiment, the processing unit 140 may not be included in the voice recognition apparatus 100. In this case, the interactive service provided to the user 400 may be one that is generated by the voice recognition server 500 and delivered to the voice recognition apparatus 100.

The speaker 160 is a component that outputs sound to the outside. The speaker 160 employed in the voice recognition apparatus 100 may be an ordinary speaker, so a description of it is omitted.

The control unit 150 may be implemented by a memory that stores instructions programmed to perform the functions described below and a microprocessor that executes those instructions. The control unit 150 is described in detail below.

First, the control unit 150 may search for other devices 200 and 210 located around the voice recognition apparatus 100. For example, when the communication unit 110 is implemented as a Bluetooth module, the control unit 150 may control the search using the Bluetooth connection history or the like.

When the search is complete, the control unit 150 may connect the discovered other devices 200 and 210 to the voice recognition apparatus 100.

Once communication is established, the control unit 150 may request status information, through the communication unit 110, from each of the connected other devices 200 and 210. In response to this request, status information may be received from each of the other devices 200 and 210 through the communication unit 110.
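The search, connect, and status-request sequence above can be sketched as follows. The example replaces the wireless link with an in-memory stand-in, so `FakeDevice`, `collect_status`, and the `"accel"` status key are all assumptions made for illustration and not part of the patent or of any Bluetooth API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeDevice:
    """Hypothetical stand-in for another device 200/210 on a wireless link."""

    device_id: str
    accel_samples: List[float]
    connected: bool = False

    def connect(self) -> None:
        self.connected = True

    def get_status(self) -> Dict[str, List[float]]:
        if not self.connected:
            raise RuntimeError("status requested before connection")
        return {"accel": list(self.accel_samples)}


def collect_status(nearby: List[FakeDevice]) -> Dict[str, Dict[str, List[float]]]:
    """Connect to each discovered device, then request its status information."""
    status_by_device = {}
    for device in nearby:  # devices found by the search step
        device.connect()   # establish communication first
        status_by_device[device.device_id] = device.get_status()
    return status_by_device


devices = [FakeDevice("200", [0.1, 0.9, 0.3]), FakeDevice("210", [0.0, 0.0, 0.0])]
statuses = collect_status(devices)
```

The sketch handles devices one at a time. The text also allows simultaneous connections, which would change only how the loop is scheduled, not the order of steps per device.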

When status information is received from each of the other devices 200 and 210, the control unit 150 uses it to determine, separately for each device, whether the user 400 is carrying that device. For example, if the value of the status information received from one of the other devices changes over time, the control unit 150 may determine that this device is being carried by the user 400. More specifically, when the value of the acceleration sensor included in the sensor unit changes over time, the control unit 150 may determine that the user 400 is carrying the corresponding device, because a time-varying acceleration value means that the user 400 is walking or running while carrying it. Alternatively, when the value of the haptic sensor included in the sensor unit is within an error range of a predetermined value that is obtained when the user 400 touches the device, the control unit 150 may determine that the user 400 is carrying the corresponding device.

As described above, the sensor unit may include at least one of an acceleration sensor, a gyro sensor, a geomagnetic sensor, and a haptic sensor. The control unit 150 may determine whether the corresponding device is carried by the user 400 by combining at least two of the values measured by these sensors.
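A minimal sketch of the possession check described above, assuming the status information arrives as a dictionary with an `"accel"` sample list and a `"haptic"` reading. The dictionary layout, the function names, and every threshold are illustrative assumptions; the patent does not give numeric values or say how "changes over time" is measured, so the standard deviation is used here as one possible measure.

```python
from statistics import pstdev
from typing import Any, Dict, Sequence

# All thresholds are illustrative assumptions; the patent gives no numbers.
ACCEL_VARIATION_THRESHOLD = 0.2  # spread that counts as "changing over time"
HAPTIC_TOUCH_VALUE = 1.0         # predetermined value seen when the device is touched
HAPTIC_TOLERANCE = 0.1           # allowed error range around that value


def accel_indicates_carrying(samples: Sequence[float]) -> bool:
    """True if the acceleration samples vary enough over time."""
    return len(samples) >= 2 and pstdev(samples) > ACCEL_VARIATION_THRESHOLD


def haptic_indicates_touch(value: float) -> bool:
    """True if the haptic reading is within the error range of the touch value."""
    return abs(value - HAPTIC_TOUCH_VALUE) <= HAPTIC_TOLERANCE


def is_possessed(status: Dict[str, Any], require_both: bool = False) -> bool:
    """Combine two sensor checks; require_both selects AND instead of OR."""
    accel_ok = accel_indicates_carrying(status.get("accel", []))
    haptic_ok = haptic_indicates_touch(status.get("haptic", 0.0))
    return (accel_ok and haptic_ok) if require_both else (accel_ok or haptic_ok)


moving = {"accel": [0.0, 0.8, -0.5, 0.6], "haptic": 0.0}
resting = {"accel": [0.0, 0.01, 0.0, 0.01], "haptic": 0.0}
touched = {"accel": [0.0, 0.0, 0.0], "haptic": 0.95}
```

The `require_both` flag shows two ways of combining sensor values. The text says only that at least two measured values may be combined, so the choice between OR and AND is an assumption of this sketch.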

When the control unit 150 determines that one (200) of the other devices 200 and 210 is being carried by the user 400, the control unit 150 may transmit the following command to that device 200 through the communication unit 110. The command may instruct the device 200 to transmit the sound received through its own sound input unit to the voice recognition apparatus 100, but is not limited thereto.

That is, according to an embodiment, if the user is carrying another device that has a voice recognition function, the user's voice can be acquired from that device. Accordingly, even if the user speaks from a position far from the voice recognition apparatus, the voice distortion that the distance between the user and the voice recognition apparatus could cause may be reduced or may not occur at all.

In this case, depending on the embodiment, the control unit 150 may control the sound input unit 120 so that no sound is input through the sound input unit 120 of the voice recognition apparatus 100. Alternatively, it may control the voice recognition unit 130 so that any sound input through the sound input unit 120 is excluded from voice recognition. This may apply, for example, when sound is being output through the speaker 160. As a result, only the sound input to the other device 200 is subject to voice recognition, and the sound input to the voice recognition apparatus 100 is excluded.
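The exclusion rule above can be sketched as a small selection function. It assumes the possession decisions and the received sounds are already available as dictionaries keyed by a device identifier; the function name `pick_recognition_input` and the use of `bytes` for sound data are hypothetical choices for the example.

```python
from typing import Dict, Optional


def pick_recognition_input(
    possessed: Dict[str, bool],
    remote_sound: Dict[str, bytes],
    local_sound: bytes,
) -> Optional[bytes]:
    """Return the sound to recognize when a carried device exists, else None."""
    for device_id, carried in possessed.items():
        if carried and device_id in remote_sound:
            # Sound from the carried device is used; local_sound is
            # deliberately ignored, i.e. excluded from recognition.
            return remote_sound[device_id]
    return None  # no carried device: a different selection rule applies


chosen = pick_recognition_input(
    {"200": True, "210": False}, {"200": b"remote"}, b"local"
)
```

Returning `None` when no device is carried keeps this sketch limited to the case described in the paragraph above; the noise-based selection for the other case is a separate step.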

Meanwhile, the user 400 may not be carrying any of the other devices 200 and 210. The control unit 150 may determine, based on the status information, that none of the other devices 200 and 210 is carried by the user 400. In this case the control unit 150 may transmit the following command to each of the other devices 200 and 210 through the communication unit 110. The command may instruct each of the other devices 200 and 210 to transmit the sound received through its own sound input unit (hereinafter, the first sound) to the voice recognition apparatus 100, but is not limited thereto. Together with this, the control unit 150 may set the sound input unit 120 of the voice recognition apparatus 100 to receive a sound (hereinafter, the second sound). Hereinafter, this command is referred to as the first command.

In accordance with the first command, the first sound may be received through the communication unit 110, and the second sound may be input through the sound input unit 120. The control unit 150 may then control the voice recognition unit 130 to extract noise characteristics from the first sound and the second sound. Accordingly, the voice recognition unit 130 may extract a noise characteristic from the first sound (hereinafter, the first noise characteristic) and a noise characteristic from the second sound (hereinafter, the second noise characteristic). Here, a noise characteristic may mean a property such as the magnitude or frequency of the noise.

The control unit 150 may compare the first noise characteristic with the second noise characteristic, for example the magnitude of the noise. From this comparison, the control unit 150 may determine which of the voice recognition apparatus 100 and the other device 200 is located relatively farther from the noise source 300, and may select the target of voice recognition accordingly. For example, when the magnitude of the first noise is greater than the magnitude of the second noise, the control unit 150 may determine that the other device 200 is closer to the noise source 300 than the voice recognition apparatus 100 is, and select the second sound as the target of voice recognition. In the opposite case, it may select the first sound as the target of voice recognition.
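The magnitude comparison above can be sketched as follows, assuming the noise component of each sound has already been isolated as a list of samples. Using the root-mean-square level as the "magnitude", and returning the first sound when the two levels are equal, are assumptions of this example; the patent does not specify either.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level, used here as the 'magnitude' of the noise."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_by_noise(first_noise: Sequence[float], second_noise: Sequence[float]) -> str:
    """Return 'second' if the first noise is louder, otherwise 'first'."""
    if rms(first_noise) > rms(second_noise):
        # The other device is taken to be closer to the noise source,
        # so the apparatus's own (second) sound is recognized.
        return "second"
    return "first"  # includes the tie case, which the text leaves open


choice = select_by_noise([0.5, -0.5], [0.1, -0.1])
```

Here the first noise is louder, so the sketch selects the second sound, matching the example in the paragraph above.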

That is, according to an embodiment, when the user is not carrying another device, the object that acquires the voice may be selected in consideration of its distance from the noise source. For example, when the other device is located farther from the noise source than the voice input device, the user's voice may be acquired by the other device, and when the other device is located closer to the noise source than the voice input device, the user's voice may be acquired by the voice input device. Accordingly, even if a noise source exists within the voice input system, its influence on voice recognition can be minimized.

Meanwhile, although the embodiment described above considers only one voice input device 100 and one other device 200 when selecting the target of voice recognition, depending on the embodiment one voice input device 100 and a plurality of other devices 200 and 210 may be considered, or the voice input device 100 may be excluded and only the plurality of other devices 200 and 210 may be considered.
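One way to extend the pairwise comparison to several candidates is to pick the source with the lowest noise level. This is an assumed generalization, since the patent does not state how more than two candidates are compared; the function name `select_quietest` and the identifier keys are hypothetical.

```python
from typing import Dict


def select_quietest(noise_level: Dict[str, float]) -> str:
    """Return the identifier of the candidate with the lowest noise level."""
    if not noise_level:
        raise ValueError("at least one candidate is required")
    return min(noise_level, key=noise_level.get)


# Apparatus 100 plus two other devices, then the other devices only.
with_apparatus = select_quietest({"100": 0.4, "200": 0.7, "210": 0.2})
devices_only = select_quietest({"200": 0.7, "210": 0.2})
```

Leaving the apparatus out of the dictionary corresponds to the variant in which only the other devices are considered.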

Meanwhile, when the user 400 is not carrying any of the other devices 200 and 210, the control unit 150 may cause a second command, instead of the first command, to be transmitted to the other device 200 through the communication unit 110. Which of the first command and the second command the control unit 150 transmits may be set by the user 400 or may be changed periodically by a predetermined algorithm.

The second command may instruct the other device 200 to periodically transmit the sound input through its own sound input unit (hereinafter referred to as a third sound) to the voice recognition apparatus 100, but is not limited thereto. In this case, the control unit 150 may set the sound input unit 120 included in the voice recognition apparatus 100 to receive a sound (hereinafter referred to as a fourth sound).

In accordance with the second command, the third sound may be periodically received through the communication unit 110. Based on the periodically received third sound, the control unit 150 may determine whether the distance between the other device 200 and the noise source 300 is increasing or decreasing. This determination may be made based on, for example, the loudness or the frequency characteristics of the sound, but is not limited thereto.

When the control unit 150 determines that the distance between the other device 200 and the noise source 300 is increasing, the control unit 150 may select the third sound as the target of voice recognition; in the opposite case, it may select the fourth sound as the target of voice recognition.
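The trend-based selection described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each periodically received third sound has already been reduced to a single noise level and that a falling level means the other device is moving away from the noise source. The function names `noise_trend` and `select_by_trend` are hypothetical, and the handling of a flat trend (falling back to the fourth sound) is an assumption the description does not specify.

```python
from statistics import mean


def noise_trend(levels):
    """Average change between consecutive noise levels.

    A negative value means the noise is fading over time, which this
    sketch reads as the other device moving away from the noise source.
    """
    if len(levels) < 2:
        return 0.0  # not enough samples to observe a trend
    return mean(later - earlier for earlier, later in zip(levels, levels[1:]))


def select_by_trend(third_sound_noise_levels):
    """Return "third" if the other device appears to be receding from the
    noise source, otherwise "fourth" (the apparatus's own sound input).

    Assumption: a flat trend is treated like an approaching one.
    """
    return "third" if noise_trend(third_sound_noise_levels) < 0 else "fourth"


if __name__ == "__main__":
    print(select_by_trend([0.9, 0.7, 0.5]))  # noise fading
    print(select_by_trend([0.2, 0.4, 0.6]))  # noise growing
```

Loudness is only one of the cues the description mentions; frequency characteristics could be substituted for the level values without changing the selection logic.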

FIGS. 5 and 6 illustrate, by way of example, the procedure of a voice recognition method according to an embodiment. The method can be performed by the voice recognition apparatus 100 described above. However, at least one of the steps shown in FIGS. 5 and 6 may be omitted, the steps may be performed in an order different from the one shown, and other steps not shown may also be performed.

Referring to FIG. 5, status information is received from each of the other devices 200 and 210 through the communication unit 110 (S100). Although not shown in the drawing, the following steps may precede step S100: the control unit 150 searches for other devices 200 and 210 in the vicinity of the voice recognition apparatus 100; once the search is complete, the voice recognition apparatus 100 connects to the other devices 200 and 210; and once communication is established, the control unit 150 requests status information from each of the other devices 200 and 210 through the communication unit 110.

When the status information is received from each of the other devices 200 and 210 in step S100, the control unit 150 uses it to determine whether any of the other devices 200 and 210 is possessed by the user 400 (S200). For example, as described above, if the value of the status information received from one of the other devices changes over time, the control unit 150 may determine that the other device that transmitted that status information is possessed by the user 400.
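As a rough illustration of step S200, the sketch below treats a device as possessed when its reported status value varies over time by more than a small tolerance. The representation of status information as a list of numeric samples, the `threshold` value, and the names `is_possessed` and `find_possessed` are all assumptions made for this example; the description only states that a value changing over time indicates possession.

```python
def is_possessed(samples, threshold=0.05):
    """Return True if the status value changes over time.

    `samples` is a hypothetical time series of one status value reported
    by a device; `threshold` is an assumed tolerance for sensor jitter.
    """
    if len(samples) < 2:
        return False  # a single reading cannot show change
    return max(samples) - min(samples) > threshold


def find_possessed(status_by_device, threshold=0.05):
    """Return the id of the first device judged to be possessed, or None."""
    for device_id, samples in status_by_device.items():
        if is_possessed(samples, threshold):
            return device_id
    return None


if __name__ == "__main__":
    status = {
        "200": [1.00, 1.30, 0.80],  # value fluctuates: likely carried
        "210": [1.00, 1.00, 1.00],  # value constant: likely stationary
    }
    print(find_possessed(status))
```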

When the control unit 150 determines that one (200) of the other devices 200 and 210 is possessed by the user 400, the control unit 150 may transmit a command to that other device 200 through the communication unit 110 (S300). As described above, the command may instruct the other device 200 to transmit the sound received through its own sound input unit to the voice recognition apparatus 100.

When a sound is transmitted from the other device 200, the sound may be forwarded to the voice recognition server 500, where the voice is extracted, and an interactive service may be provided based on that voice.

In this case, depending on the embodiment, the control unit 150 may control the sound input unit 120 so that no sound is input through the sound input unit 120 included in the voice recognition apparatus 100, or may control the voice recognition unit 130 so that any sound input through the sound input unit 120 is excluded from the targets of voice recognition, as described above.

Meanwhile, the user 400 may possess none of the other devices 200 and 210. The control unit 150 may determine, based on the status information, that none of the other devices 200 and 210 is possessed by the user 400; in this case, the control unit 150 may transmit a first command to each of the other devices 200 and 210 through the communication unit 110 (S210). As described above, the first command may instruct each of the other devices 200 and 210 to transmit the sound input to its own sound input unit (hereinafter referred to as a first sound) to the voice recognition apparatus 100. In addition, the control unit 150 may set the sound input unit 120 included in the voice recognition apparatus 100 to receive a sound (hereinafter referred to as a second sound) (S211).

In accordance with the setting in step S210, the first sound may be received through the communication unit 110, and the second sound may also be input. The control unit 150 may then control the voice recognition unit 130 to extract a noise characteristic from the first sound (hereinafter referred to as the first noise characteristic) and a noise characteristic from the second sound (hereinafter referred to as the second noise characteristic) (S212).

The control unit 150 may compare the first noise characteristic with the second noise characteristic (S213). From the comparison, the control unit 150 may determine which of the voice recognition apparatus 100 and the other device 200 is relatively farther from the noise source 300, and may select the target of voice recognition accordingly (S214, S215). The selected sound may be forwarded to the voice recognition server 500 for voice recognition, and an interactive service may be provided accordingly.
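The comparison in steps S213 to S215 might look like the following sketch. It assumes that the noise component of each sound has already been extracted as a list of samples and uses root-mean-square magnitude as the compared characteristic, in line with the noise magnitude named in the claims. The helper names `rms` and `select_by_noise` are hypothetical, and resolving a tie in favor of the first sound is an assumption.

```python
import math


def rms(samples):
    """Root-mean-square magnitude of a sequence of noise samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_by_noise(first_noise, second_noise):
    """Choose which sound to recognize.

    first_noise: noise extracted from the first sound (other device).
    second_noise: noise extracted from the second sound (apparatus).
    The side with the larger noise is assumed to be closer to the noise
    source, so the sound from the opposite side is selected.
    """
    if rms(first_noise) > rms(second_noise):
        return "second"
    return "first"  # assumption: ties also fall back to the first sound


if __name__ == "__main__":
    print(select_by_noise([0.5, -0.5, 0.5], [0.1, -0.1, 0.1]))
    print(select_by_noise([0.1, -0.1, 0.1], [0.5, -0.5, 0.5]))
```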

Meanwhile, when the user 400 possesses none of the other devices 200 and 210, the control unit 150 may instead cause a second command, different from the first command, to be transmitted to the other device 200 through the communication unit 110. This case is described with reference to FIG. 6.

Of the steps shown in FIG. 6, S100 and S400 are the same as in FIG. 5, so the description given for FIG. 5 applies to them.

The second command may instruct the other device 200 to periodically transmit the sound received through its own sound input unit (hereinafter referred to as a third sound) to the voice recognition apparatus 100, but is not limited thereto (S220). In addition, the control unit 150 may set the sound input unit 120 included in the voice recognition apparatus 100 to receive a sound (hereinafter referred to as a fourth sound).

In accordance with the second command, the third sound may be periodically received through the communication unit 110. The control unit 150 may then determine whether the distance between the other device 200 and the noise source 300 is increasing or decreasing (S222). This can be determined by extracting the noise from the periodically received third sound (S221) and then analyzing characteristics of that noise, such as time delay or distortion.

When the control unit 150 determines that the distance between the other device 200 and the noise source 300 is increasing, the control unit 150 may select the third sound as the target of voice recognition (S223); in the opposite case, it may select the fourth sound as the target of voice recognition (S224). The selected sound may be forwarded to the voice recognition server 500 for voice recognition, and an interactive service may be provided accordingly.
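Taken together, the branches of FIGS. 5 and 6 amount to a small decision procedure, which the sketch below summarizes. It is an illustrative reading of the flow rather than the patented implementation: the `Source` enum, the `choose_source` function, and its parameters (pre-computed noise levels and a `receding` flag) are hypothetical names introduced here.

```python
from enum import Enum


class Source(Enum):
    OTHER_DEVICE = "other device"
    APPARATUS = "voice recognition apparatus"


def choose_source(possessed, command="first",
                  other_noise=0.0, own_noise=0.0, receding=False):
    """Pick whose sound input feeds voice recognition.

    possessed:   result of the possession check (S200).
    command:     "first" (one-shot noise comparison, FIG. 5) or
                 "second" (periodic distance trend, FIG. 6).
    other_noise: assumed noise level in the other device's sound.
    own_noise:   assumed noise level in the apparatus's own sound.
    receding:    True if the other device is judged to be moving away
                 from the noise source.
    """
    if possessed:
        return Source.OTHER_DEVICE  # S300: use the carried device
    if command == "first":
        # S213 to S215: prefer the side with less noise
        return Source.APPARATUS if other_noise > own_noise else Source.OTHER_DEVICE
    if command == "second":
        # S222 to S224: prefer the other device only while it recedes
        return Source.OTHER_DEVICE if receding else Source.APPARATUS
    raise ValueError(f"unknown command: {command!r}")


if __name__ == "__main__":
    print(choose_source(possessed=True))
    print(choose_source(False, "first", other_noise=0.8, own_noise=0.2))
    print(choose_source(False, "second", receding=True))
```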

As described above, according to an embodiment, when the user is carrying another device capable of voice recognition, the user's voice may be acquired by that other device. Accordingly, even if the user speaks at a location far from the voice recognition apparatus, the voice distortion that could arise from the distance between the user and the voice recognition apparatus may be reduced or avoided.

In addition, when the user is not carrying another device, the object that acquires the voice may be selected by taking into account the distance from the noise source. For example, when the other device is located farther from the noise source than the voice recognition apparatus, the user's voice may be acquired by the other device. Accordingly, even if a noise source exists in the voice input system, its influence on voice recognition can be minimized.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

According to an embodiment, even if the user speaks at a distance from the voice recognition apparatus, the voice distortion that could arise from the distance between the user and the voice recognition apparatus may be avoided or reduced. In addition, even if a noise source exists in the voice input system, its influence on voice recognition can be kept to a minimum.

100: voice recognition apparatus
200, 210: other devices
300: noise source
400: user

Claims

A voice recognition method performed by a voice recognition device, the method comprising:
when status information including a value that changes when a user possesses another device is received from the other device, determining, based on the received status information, whether the other device is possessed by the user;
when it is determined that the other device is possessed by the user, requesting the other device to transmit sound input to a sound input unit of the other device; and
when the sound is received from the other device, controlling voice recognition to be performed based on the received sound.

The method of claim 1,
wherein the controlling comprises, when it is determined that the other device is possessed by the user and sound is being output through a speaker of the voice recognition device, controlling voice recognition to be performed based on the sound received from the other device, with sound input through a sound input unit of the voice recognition device excluded.

The method of claim 1, further comprising:
when it is determined in the determining step that the other device is not possessed by the user, requesting the other device to transmit a first sound input to the sound input unit of the other device;
requesting a second sound input to a sound input unit included in the voice recognition device;
extracting a magnitude of first noise from information on the first sound and a magnitude of second noise from information on the second sound, and comparing the magnitude of the first noise with the magnitude of the second noise; and
controlling voice recognition to be performed based on the second sound when the comparison shows that the magnitude of the first noise is larger, and based on the first sound when the magnitude of the second noise is larger.

The method of claim 1, further comprising:
when it is determined in the determining step that the other device is not possessed by the user, periodically requesting the other device to transmit sound input to the sound input unit of the other device;
extracting noise included in the sound from information on the sound received in response to the periodic transmission request;
determining, based on the noise, whether the distance between the other device and the noise source generating the noise is increasing or decreasing; and
controlling voice recognition to be performed based on the sound input to the sound input unit of the other device when it is determined that the other device is moving away from the noise source, and based on the sound input to a sound input unit included in the voice recognition device when it is determined that the other device is getting closer to the noise source.

A voice recognition device comprising:
a communication unit configured to communicate with another device;
a control unit configured to, when status information having a value that changes when a user possesses the other device is received from the other device through the communication unit, determine, based on the received status information, whether the other device is possessed by the user, and to request the other device to transmit sound input to a sound input unit of the other device when it is determined that the other device is possessed by the user; and
a voice recognition unit configured to, when the sound is received from the other device through the communication unit in response to the transmission request, control voice recognition to be performed based on the received sound.