KR20230138656A

KR20230138656A - Method, system and non-transitory computer-readable recording medium for providing speech recognition trigger

Info

Publication number: KR20230138656A
Application number: KR1020220036502A
Authority: KR
Inventors: 김석중
Original assignee: VTouch Co., Ltd. (주식회사 브이터치)
Priority date: 2022-03-24
Filing date: 2022-03-24
Publication date: 2023-10-05
Also published as: WO2023182766A1

Abstract

The present disclosure relates to a method, a system, and a non-transitory computer-readable recording medium for providing a voice recognition trigger. A method for providing a voice recognition trigger according to one embodiment of the present disclosure includes: calculating a change in the distance between a device and an object based on proximity information sensed by the device; and determining whether to trigger voice recognition on the device with reference to that change in distance.

Description

METHOD, SYSTEM AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM FOR PROVIDING SPEECH RECOGNITION TRIGGER

The present disclosure relates to a method, a system, and a non-transitory computer-readable recording medium for providing a voice recognition trigger.

Recently, as interest in user interfaces has grown and voice processing technology has advanced, an increasing number of IT devices have built-in voice recognition functions. For example, smartphones, smart watches, smart TVs, and smart refrigerators that can recognize a user's voice and perform the operations the user requests have become widespread.

One example of such prior art is the technology disclosed in Korean Laid-Open Patent Publication No. 2016-0039244 (Patent Document 1). In that technology, when a computing device receives audio data, it determines whether the audio data contains a voice-initiated action and, if so, presents a display to the user through the computing device indicating that the voice-initiated action has been recognized.

However, with the technologies introduced so far, including the prior art above, the user has had to press a button or say a predetermined trigger word before starting voice input in order to mark the point at which voice input begins. The former approach, pressing a button, is inconvenient because voice input is impossible when the user's hands are not free. The latter approach, speaking a predetermined trigger word, has two limitations. First, even when the user is only slightly away from the voice recognition device, various noises in the same space, such as other people's voices, make it difficult to pinpoint where voice input starts. Second, even after the user says the trigger word, the device must first give feedback with a sound or light to assure the user that voice input has started, and only then can the user begin speaking, so a considerable amount of time is unavoidably spent just getting voice input started.

Accordingly, the present inventor proposes a voice recognition trigger technology that determines whether to trigger voice recognition on a device based on a change in the distance between the device and the user, without requiring a button press or a trigger word.

(Patent Document 1) Korean Laid-Open Patent Publication No. 2016-0039244 (April 8, 2016)

The present disclosure is intended to solve the problems of the prior art described above, and its object is to let users input voice quickly by eliminating unnecessary steps for starting voice input.

Another object of the present disclosure is to provide a system and method that can accurately detect the point in time at which a device starts voice recognition.

Representative configurations of the present disclosure for achieving the above objects are as follows.

A method for providing a voice recognition trigger according to one embodiment of the present disclosure includes: calculating a change in the distance between a device and an object based on proximity information sensed by the device; and determining whether to trigger voice recognition on the device with reference to the change in the distance between the device and the object.

According to one embodiment of the present disclosure, the proximity information may be obtained from a proximity sensor of the device.

According to one embodiment of the present disclosure, in the step of determining whether to trigger voice recognition, the point in time at which the device, approaching the object at or below a preset speed, comes within a preset distance of the object may be determined as the voice recognition start point, with reference to the change in the distance between the device and the object.

According to one embodiment of the present disclosure, in the step of determining whether to trigger voice recognition, the point in time at which the distance remains constant while the device is within a preset distance of the object may be determined as the voice recognition start point, with reference to the change in the distance between the device and the object. Specifically, the point in time at which the distance has remained constant, with the device within the preset distance of the object, for a preset duration may be determined as the voice recognition start point.

A method for providing a voice recognition trigger according to one embodiment of the present disclosure includes: calculating a change in the distance between a device and an object based on proximity information sensed by the device, and calculating a movement direction of the device based on movement information sensed by the device; and determining whether to trigger voice recognition on the device with reference to the change in the distance between the device and the object and the movement direction of the device.

According to one embodiment of the present disclosure, in the step of determining whether to trigger voice recognition, the point in time at which the device moves upward and then decelerates to a stop may be determined as the voice recognition start point, with reference to the change in the distance between the device and the object and the movement direction of the device. Here, if the distance between the device and the object at the moment the device stops is within a preset distance, that moment may be determined as the voice recognition start point.

According to one embodiment of the present disclosure, in the step of determining whether to trigger voice recognition, the point in time at which the device moves upward and comes within a preset distance of the object may be determined as the voice recognition start point, with reference to the change in the distance between the device and the object and the movement direction of the device.

According to one embodiment of the present disclosure, the proximity information may be obtained from a proximity sensor of the device, and the movement information may be obtained from an inertial measurement unit (IMU) sensor of the device.

A voice recognition system for providing a voice recognition trigger according to one embodiment of the present disclosure includes: a calculation unit that calculates a change in the distance between a device and an object based on proximity information sensed by the device; and a determination unit that determines whether to trigger voice recognition on the device with reference to the change in the distance between the device and the object.

In addition, other methods and other systems for implementing the present disclosure, and a non-transitory computer-readable recording medium storing a computer program for executing the method, are further provided.

According to one embodiment of the present disclosure, unnecessary steps for starting voice input are eliminated, so the user can input voice quickly.

According to one embodiment of the present disclosure, whether to trigger voice recognition on the device is determined with reference to the change in the distance between the device and the user, or with reference to both that change and the movement direction of the device. For example, voice is recognized only once the user has lifted the device and placed it close to his or her lips. This provides a voice recognition trigger that operates by accurately reflecting the user's intention.

FIG. 1 is a diagram illustrating various embodiments of a device including a voice recognition system according to one embodiment of the present disclosure.
FIG. 2 is a functional block diagram schematically showing the configuration of a device according to one embodiment of the present disclosure.
FIG. 3 is a functional block diagram schematically showing the functional configuration of a voice recognition system according to one embodiment of the present disclosure.
FIG. 4 is a diagram illustrating a situation in which a voice recognition trigger is provided through the voice recognition system according to one embodiment of the present disclosure.
FIG. 5 is a diagram illustrating a situation in which a voice recognition trigger is provided through the voice recognition system according to another embodiment of the present disclosure.
FIG. 6 is a diagram illustrating a situation in which a voice recognition trigger is not provided through the voice recognition system according to one embodiment of the present disclosure.
FIG. 7 is an operation flowchart illustrating a method for providing a voice recognition trigger according to one embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Detailed descriptions of well-known functions and configurations are omitted where they might unnecessarily obscure the gist of the present disclosure. It should also be noted that the following description relates only to one embodiment of the present disclosure, and the present disclosure is not limited thereto.

The terms used in the present disclosure are used only to describe specific embodiments and are not intended to limit the present disclosure. For example, a component expressed in the singular should be understood to include plural components unless the context clearly indicates the singular only. The term "and/or" as used in the present disclosure should be understood to encompass any and all possible combinations of one or more of the listed items. Terms such as "include" and "have" as used in the present disclosure are intended only to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described herein; their use does not exclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the embodiments of the present disclosure, a "module" or "unit" refers to a functional part that performs at least one function or operation, and may be implemented in hardware, in software, or in a combination of hardware and software. In addition, except for a "module" or "unit" that must be implemented in specific hardware, a plurality of "modules" or "units" may be integrated into at least one software module and implemented by at least one processor.

In addition, unless otherwise defined, all terms used in the present disclosure, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the present disclosure pertains. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their contextual meanings in the related art, and are not to be interpreted in an overly restrictive or expansive manner unless clearly defined otherwise in the present disclosure.

FIG. 1 is a diagram illustrating various embodiments of a device including a voice recognition system according to one embodiment of the present disclosure.

Referring to FIG. 1, an example situation is shown in which, according to one embodiment of the present disclosure, a voice recognition trigger is provided to a user when the user speaks while holding a device that includes the voice recognition system of the present disclosure near his or her mouth (or lips).

A device according to one embodiment of the present disclosure is a digital device that is equipped with memory means and a microprocessor and therefore has computing capability. In one embodiment, the device may be a smart ring (100a), a smart watch (100b), a smart remote control (100c), a smart pen (100d), or a smartphone (100e). The device is not limited to those shown: it may be another wearable device such as a smart band, or a more conventional device such as a smart pad, desktop computer, laptop computer, workstation, personal digital assistant (PDA), web pad, or mobile phone, and it may be changed in any way within the scope of achieving the objects of the present disclosure.

According to one embodiment of the present disclosure, whether to trigger voice recognition on a device (100a, 100b, 100c, 100d, or 100e) may be determined with reference to the change in the distance between the device and the user and the movement direction of the device.

FIG. 2 is a functional block diagram schematically showing the functional configuration of a device according to one embodiment of the present disclosure.

Referring to FIG. 2, the device 200 according to one embodiment of the present disclosure may include a proximity sensor 202, an IMU sensor 204, a microphone 206, and a voice recognition system 208. The components shown in FIG. 2 neither reflect every function of the device 200 nor are all essential; the device 200 may include more or fewer components than those shown.

The proximity sensor 202 of the device 200 according to one embodiment of the present disclosure detects the distance between the device 200 and an object. In one embodiment, proximity information of the device 200 may be obtained through the proximity sensor 202. In one embodiment of the present disclosure, the proximity sensor 202 may include at least one known sensor, such as an optical sensor, a photoelectric sensor, an ultrasonic sensor, an inductive sensor, a capacitive sensor, a resistive sensor, an eddy current sensor, an infrared sensor, or a magnetic sensor.

The IMU sensor 204 of the device 200 according to one embodiment of the present disclosure detects movement of the device 200. In one embodiment, movement information of the device 200 (for example, a height change value) may be obtained through the IMU sensor 204.

The microphone 206 of the device 200 according to one embodiment of the present disclosure detects the user's voice. According to one embodiment of the present disclosure, when it is determined that voice recognition is to be triggered on the device 200, the device 200 may recognize the user's voice through the microphone 206 and perform the operation requested by the user accordingly.

According to one embodiment of the present disclosure, the proximity sensor 202, the IMU sensor 204, and the microphone 206 included in the device 200 may be physically located close to one another.

According to one embodiment of the present disclosure, the proximity sensor 202, the IMU sensor 204, and the microphone 206 included in the device 200 may be configured to operate sequentially according to predetermined conditions. In one embodiment, the IMU sensor 204 and/or the microphone 206 may be configured to operate when the distance between the device 200 and the object, as detected by the proximity sensor 202, is within a preset distance. In another embodiment, the proximity sensor 202 and/or the microphone 206 may be configured to operate when the device 200 stops after moving upward; here, the movement direction of the device 200 may be detected by the IMU sensor 204.

As described above, configuring the proximity sensor 202, the IMU sensor 204, and the microphone 206 of the device 200 to operate sequentially according to predetermined conditions enables an efficient low-power design. In addition, because another sensor is activated to take part in deciding whether to trigger voice recognition only after one sensor has satisfied a predetermined condition, the accuracy of the voice recognition trigger can be improved.
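The sequential activation described above can be pictured as a small power-gating controller. The following is a minimal sketch of the first variant only (the proximity sensor gates the IMU sensor and the microphone), assuming distances in centimetres and a 5 cm wake distance; `SensorGate` and its fields are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SensorGate:
    """Hypothetical controller: the IMU sensor and microphone stay off until
    the proximity sensor reports an object within `wake_distance` (cm)."""
    wake_distance: float = 5.0  # assumed threshold
    imu_on: bool = False
    mic_on: bool = False
    # (attribute name, new state), recorded only on actual transitions
    events: List[Tuple[str, bool]] = field(default_factory=list)

    def _set(self, name: str, state: bool) -> None:
        if getattr(self, name) != state:
            setattr(self, name, state)
            self.events.append((name, state))

    def on_proximity(self, distance: float) -> None:
        """Handle one proximity reading and power the other sensors accordingly."""
        within = distance <= self.wake_distance
        self._set("imu_on", within)
        self._set("mic_on", within)
```

A real device would drive hardware enable lines rather than boolean flags; logging only the transitions keeps the sketch easy to inspect.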

The voice recognition system 208 of the device 200 according to one embodiment of the present disclosure provides a voice recognition trigger to the user. The voice recognition system 208 may calculate the change in the distance between the device 200 and an object and, with reference to the calculated result, determine whether to trigger voice recognition on the device 200. The specific configuration of the voice recognition system 208 according to one embodiment of the present disclosure is described later.

Meanwhile, the voice recognition system 208 according to one embodiment of the present disclosure may be included in the device 200 in the form of an application that supports the function of providing a voice recognition trigger. Such an application may be downloaded from an external application distribution server (not shown). At least part of the application may, as needed, be replaced with a hardware or firmware device capable of performing substantially the same or equivalent functions.

FIG. 3 is a functional block diagram schematically showing the functional configuration of a voice recognition system according to one embodiment of the present disclosure.

Referring to FIG. 3, the voice recognition system 208 according to one embodiment of the present disclosure may include a calculation unit 302, a determination unit 304, a communication unit 306, and a storage unit 308. The components shown in FIG. 3 neither reflect every function of the voice recognition system 208 nor are all essential; the voice recognition system 208 may include more or fewer components than those shown.

According to one embodiment of the present disclosure, at least some of the calculation unit 302, the determination unit 304, the communication unit 306, and the storage unit 308 may be program modules that communicate with an external system (not shown). Such program modules may be included in the voice recognition system 208 in the form of an operating system, application program modules, and other program modules, and may be physically stored in various known storage devices. They may also be stored in a remote storage device capable of communicating with the voice recognition system 208. These program modules encompass, but are not limited to, routines, subroutines, programs, objects, components, and data structures that perform the specific tasks described below or implement specific abstract data types according to the present disclosure.

Meanwhile, as described above, at least some of the components or functions of the voice recognition system 208 may, as needed, be realized in or included in the device 200, which is carried by the user or worn on a part of the user's body (e.g., a finger or wrist). In some cases, all functions and components of the voice recognition system 208 may be executed entirely within, or included entirely in, the device 200.

The calculation unit 302 of the voice recognition system 208 according to one embodiment of the present disclosure calculates the change in the distance between the device 200 and an object based on proximity information sensed by the device 200. According to one embodiment, the calculation unit 302 may obtain the physical distance between the object and the device 200 from the sensed proximity information and thereby calculate how that distance changes over time. Here, the object may be the user's mouth or lips, or alternatively another part of the user's body or a specific part of another object. In one embodiment, the proximity information may be sensed by the proximity sensor 202 included in the device 200.
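The distance-change calculation performed by the calculation unit 302 can be sketched as a finite difference over timestamped proximity readings. This is a minimal sketch assuming timestamps in seconds and distances in centimetres; `ProximitySample` and `distance_changes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ProximitySample:
    t: float         # timestamp in seconds (assumed unit)
    distance: float  # device-to-object distance in cm (assumed unit)


def distance_changes(samples: Sequence[ProximitySample]) -> List[float]:
    """Rate of change of the device-object distance between consecutive
    samples, in cm/s. Negative values mean the device is approaching."""
    rates: List[float] = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((cur.distance - prev.distance) / dt)
    return rates
```

For readings of 20, 10, and 5 cm taken 0.5 s apart, `distance_changes` returns `[-20.0, -10.0]`: the device is approaching but slowing down.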

In addition, the calculation unit 302 of the voice recognition system 208 according to one embodiment of the present disclosure calculates the movement direction of the device 200 based on movement information sensed by the device 200. According to one embodiment, the calculation unit 302 may calculate the movement direction of the device 200 by obtaining information about changes in acceleration, velocity, and position caused by the physical movement of the device 200. In one embodiment, the movement information may be sensed by the IMU sensor 204 included in the device 200.
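One way to derive a movement direction from IMU data, assumed here purely for illustration, is to integrate gravity-compensated vertical acceleration into vertical velocity and classify its sign. The function names, the dead band, and the simple Euler integration are all assumptions; the disclosure does not specify how the direction is computed.

```python
from typing import List, Sequence


def integrate_vertical_velocity(accel_z: Sequence[float], dt: float,
                                v0: float = 0.0) -> List[float]:
    """Euler-integrate gravity-compensated vertical acceleration (m/s^2),
    sampled every `dt` seconds, into vertical velocity (m/s)."""
    velocities: List[float] = []
    v = v0
    for a in accel_z:
        v += a * dt
        velocities.append(v)
    return velocities


def movement_direction(vz: float, dead_band: float = 0.05) -> str:
    """Classify vertical velocity as 'up', 'down', or 'still'.
    `dead_band` (m/s) is an assumed tolerance that absorbs sensor noise."""
    if vz > dead_band:
        return "up"
    if vz < -dead_band:
        return "down"
    return "still"
```

Plain integration drifts over time, so a practical implementation would likely reset or filter the velocity estimate; the sketch only illustrates the idea.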

The determination unit 304 of the voice recognition system 208 according to one embodiment of the present disclosure determines the voice recognition start point of the device 200, and whether to trigger voice recognition, with reference to the change in the distance between the device 200 and the object calculated by the calculation unit 302.

For example, the determination unit 304 may determine, as the voice recognition start point, the point in time at which the device 200, approaching the object at or below a preset speed, comes within a preset distance of the object. In other words, the voice recognition start point is the moment at which the device 200, gradually approaching the object, comes within a predetermined distance of it.
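The slow-approach rule can be sketched as a scan over (timestamp, distance) samples that fires when the device crosses into the preset distance while closing in no faster than the preset speed. Checking the speed only on the entering step, and the default thresholds (20 cm/s, 5 cm), are assumptions made for this hypothetical `slow_approach_trigger`.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in s, distance in cm), assumed units


def slow_approach_trigger(samples: Sequence[Sample],
                          max_speed: float = 20.0,
                          trigger_distance: float = 5.0) -> Optional[float]:
    """Return the timestamp at which the device enters `trigger_distance`
    while approaching at or below `max_speed` (cm/s), or None."""
    for (t0, d0), (t1, d1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        approach_speed = (d0 - d1) / dt  # positive while closing in
        entered = d0 > trigger_distance >= d1
        if entered and 0 < approach_speed <= max_speed:
            return t1
    return None
```

A quick swing past the object fails the speed check on the entering step and, because later samples are already inside the range, does not trigger afterwards.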

As another example, the determination unit 304 may determine, as the voice recognition start point, the point in time at which the distance remains constant while the device 200 is within a preset distance of the object. In one embodiment, the point in time at which the distance has remained constant, with the device 200 within the preset distance of the object, for a preset duration may be determined as the voice recognition start point. In this way, the voice recognition trigger according to the present disclosure is provided only when the user places the device 200 close to the object.
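The hold rule can be sketched with a stability window: the timer starts once the device is within the preset distance and restarts whenever the distance drifts beyond a tolerance. The tolerance (0.5 cm), hold duration (0.5 s), trigger distance (5 cm), and the name `hold_trigger` are assumptions; the disclosure only states that the distance stays constant for a preset time.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in s, distance in cm), assumed units


def hold_trigger(samples: Sequence[Sample],
                 trigger_distance: float = 5.0,
                 tolerance: float = 0.5,
                 hold_time: float = 0.5) -> Optional[float]:
    """Return the timestamp at which the distance has stayed within
    `tolerance` of a reference value, inside `trigger_distance`,
    for at least `hold_time` seconds; None if that never happens."""
    window_start: Optional[float] = None
    reference: Optional[float] = None
    for t, d in samples:
        if d > trigger_distance:
            window_start = reference = None  # out of range: reset
            continue
        if window_start is None or abs(d - reference) > tolerance:
            window_start, reference = t, d  # (re)start the stability window
            continue
        if t - window_start >= hold_time:
            return t
    return None
```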

The determination unit 304 of the voice recognition system 208 according to an embodiment of the present disclosure may also determine the voice recognition time of the device 200, and whether to trigger voice recognition, by referring to both the change in distance between the device 200 and the object and the movement direction of the device 200, each calculated by the calculation unit 302.

For example, the determination unit 304 may determine, as the voice recognition time, the point at which the device 200 moves upward, then decelerates and stops. In one embodiment, the determination unit 304 uses the stopping point as the voice recognition time only if the distance between the device 200 and the object at that point is within a preset distance. In this way, the voice recognition trigger according to the present disclosure is provided only when the user lifts the device 200 upward.
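A sketch of the "move upward, decelerate, and stop" rule follows. It assumes per-sample height changes ΔH (as in FIG. 5) and per-sample distances are available. It treats a rising sample followed by a near-zero ΔH as deceleration to a stop. The optional near-distance condition of the embodiment is included as the near_mm parameter. The function name and the 2 mm stop threshold are hypothetical.

```python
from typing import Optional, Sequence


def lift_and_stop_index(
    distances_mm: Sequence[float],
    height_deltas_mm: Sequence[float],
    near_mm: float,
    stop_eps_mm: float = 2.0,
) -> Optional[int]:
    """Return the index of the first sample at which the device, having been
    rising on the previous sample, has come to rest within near_mm of the
    object; None otherwise. Hypothetical helper for the lift-and-stop rule."""
    for i in range(1, min(len(distances_mm), len(height_deltas_mm))):
        was_rising = height_deltas_mm[i - 1] > stop_eps_mm
        # A rising sample followed by a near-zero one means the device slowed to a stop.
        stopped = abs(height_deltas_mm[i]) <= stop_eps_mm
        if was_rising and stopped and distances_mm[i] <= near_mm:
            return i
    return None
```

Applied to the FIG. 5(a) values (D = 300, 200, 100, 20, 20 mm; ΔH = +70, +70, +60, +40, +0 mm) with an assumed 30 mm limit, the sketch selects the final sample t. A device that comes to rest while still far away, or that was moving downward, is not selected.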

As another example, when the object is the user's mouth or lips, the determination unit 304 may determine, as the voice recognition time, the point at which the device 200 moves upward and comes within a preset distance of the object, for example within 3 cm of the user's lips. In this case, the voice recognition trigger according to the present disclosure is provided only when the user lifts the device 200 and holds it close to the user's lips.
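The "move upward and come within 3 cm" rule reduces to checking two conditions on the same sample. The sketch below assumes that ΔH[i] is the height change measured at sample i, with positive values meaning upward. It uses the 3 cm (30 mm) example from the text as the default. The function name is hypothetical.

```python
from typing import Optional, Sequence


def lift_into_range_index(
    distances_mm: Sequence[float],
    height_deltas_mm: Sequence[float],
    near_mm: float = 30.0,
) -> Optional[int]:
    """Return the index of the first sample at which the device is moving
    upward (positive height change) and is within near_mm of the object.
    Hypothetical helper; 30 mm mirrors the 3 cm example in the text."""
    for i, (d, dh) in enumerate(zip(distances_mm, height_deltas_mm)):
        if dh > 0 and d <= near_mm:
            return i
    return None
```

On the FIG. 5(a) and 5(b) sequences, this rule fires at t-1 (index 3), one sample earlier than the lift-and-stop rule, because the ring is already within 20 mm of the lips while still rising.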

The communication unit 306 of the voice recognition system 208 according to an embodiment of the present disclosure enables data to be transmitted to and received from the calculation unit 302, the determination unit 304, and the storage unit 308. It may also allow the voice recognition system 208 to communicate with an external communication network.

The storage unit 308 of the voice recognition system 208 according to an embodiment of the present disclosure may store the data needed to operate the voice recognition system 208. Examples include information about the change in distance between the device 200 and the object and about the movement direction of the device 200, both calculated by the calculation unit 302, and information about the voice recognition time determined by the determination unit 304.

FIG. 4 illustrates a situation in which a voice recognition trigger is provided by the voice recognition system according to an embodiment of the present disclosure. In FIG. 4, a smart ring serves as the device 200 that includes the voice recognition system 208, and the object is the user's lips.

FIG. 4(a) illustrates a situation in which the user, wearing the smart ring on a finger, moves the ring upward (that is, toward the user's lips) along a parabolic path and then stops it close to the lips. As shown, the distance D between the smart ring and the user's mouth measured at times t-4, t-3, t-2, t-1, and t is D(t-4) = 300 mm or more, D(t-3) = 200 mm, D(t-2) = 100 mm, D(t-1) = 20 mm, and D(t) = 20 mm.

According to one embodiment, as the user moves the smart ring toward the mouth, the voice recognition system 208 can infer from the proximity information (i.e., the distance-change values) detected by the ring's proximity sensor that the distance between the ring and the user has decreased gradually from far to near.

FIG. 4(b) illustrates a situation in which the user, wearing the smart ring on a finger, moves the ring upward (that is, toward the user's mouth) in a straight line and then stops it close to the mouth. As shown, the distance D between the smart ring and the user's mouth measured at times t-4, t-3, t-2, t-1, and t is D(t-4) = 200 mm, D(t-3) = 140 mm, D(t-2) = 80 mm, D(t-1) = 20 mm, and D(t) = 20 mm.

As in FIG. 4(a), as the user moves the smart ring toward the lips, the voice recognition system 208 can infer from the proximity information detected by the ring's proximity sensor that the distance between the ring and the user has decreased gradually from far to near.

Thus, by deciding whether to trigger voice recognition based on the change in distance between the device and the object, the system can provide the voice recognition trigger accurately when the user holds the device close to the mouth.

In FIG. 4, the decision whether to trigger voice recognition is described as relying only on the change in distance between the device and the object. The decision may instead rely on both the change in distance and the movement direction of the device.

FIG. 5 illustrates a situation in which a voice recognition trigger is provided by the voice recognition system according to another embodiment of the present disclosure. As in FIG. 4, a smart ring serves as the device 200 that includes the voice recognition system 208, and the object is the user's lips.

FIG. 5(a) illustrates a situation in which the user, wearing the smart ring on a finger, moves the ring upward (that is, toward the user's lips) along a parabolic path and then stops it close to the lips. As shown, the distance D between the smart ring and the user's mouth measured at times t-4, t-3, t-2, t-1, and t is D(t-4) = 300 mm or more, D(t-3) = 200 mm, D(t-2) = 100 mm, D(t-1) = 20 mm, and D(t) = 20 mm. The corresponding height changes ΔH of the smart ring are ΔH(t-4) = +70 mm, ΔH(t-3) = +70 mm, ΔH(t-2) = +60 mm, ΔH(t-1) = +40 mm, and ΔH(t) = +0 mm.

According to one embodiment, as the user moves the smart ring toward the mouth, the voice recognition system 208 can infer from the proximity information (i.e., the distance-change values) detected by the ring's proximity sensor that the distance between the ring and the user has decreased gradually from far to near. The voice recognition system 208 can also infer from the movement information (i.e., the height-change values) detected by the ring's IMU sensor that the ring has moved upward.

FIG. 5(b) illustrates a situation in which the user, wearing the smart ring on a finger, moves the ring upward (that is, toward the user's mouth) in a straight line and then stops it close to the mouth. As shown, the distance D between the smart ring and the user's mouth measured at times t-4, t-3, t-2, t-1, and t is D(t-4) = 200 mm, D(t-3) = 140 mm, D(t-2) = 80 mm, D(t-1) = 20 mm, and D(t) = 20 mm. The corresponding height changes ΔH of the smart ring are ΔH(t-4) = +40 mm, ΔH(t-3) = +40 mm, ΔH(t-2) = +40 mm, ΔH(t-1) = +40 mm, and ΔH(t) = +0 mm.

As in FIG. 5(a), as the user moves the smart ring toward the lips, the voice recognition system 208 can infer from the proximity information detected by the ring's proximity sensor that the distance between the ring and the user has decreased gradually from far to near. It can also infer from the movement information detected by the ring's IMU sensor that the ring has moved upward.

Thus, by deciding whether to trigger voice recognition based on both the change in distance between the device and the object and the movement direction of the device, the system can provide the voice recognition trigger accurately when the user lifts the device and holds it close to the mouth.

FIG. 6 illustrates a situation in which the voice recognition system according to an embodiment of the present disclosure does not provide a voice recognition trigger.

FIG. 6 shows a situation in which the user, wearing the smart ring on one finger, brings another finger close to the sensors in the ring (the proximity sensor and the IMU sensor), for example covering or blocking them. Because a finger other than the one wearing the ring covers the proximity sensor, the distance between the ring and the object detected by that sensor drops abruptly from 500 mm or more to 10 mm. The voice recognition system 208 therefore does not provide the voice recognition trigger. More generally, even when the smart ring is within the predetermined distance of the user, the voice recognition system 208 may withhold the trigger unless the proximity information (i.e., the distance-change values) from the ring's proximity sensor shows that the distance decreased gradually from far to near.
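The gradual-versus-abrupt distinction can be made concrete with a check over the whole approach rather than a single crossing. The sketch below walks back from the latest sample through the run of decreasing distances that led into the near zone. It accepts the approach only if there are enough such steps and none of them is larger than a maximum step size. This stricter multi-step test is an assumption, not the method stated in the disclosure. The function name, the 30 mm near distance, the 120 mm maximum step, and the two-step minimum are hypothetical.

```python
from typing import Sequence


def is_gradual_approach(
    distances_mm: Sequence[float],
    near_mm: float,
    max_step_mm: float,
    min_steps: int = 2,
) -> bool:
    """True if the latest sample is within near_mm and the near zone was reached
    through at least min_steps consecutive decreasing steps, none of which
    exceeds max_step_mm. Hypothetical helper for rejecting sensor occlusion."""
    if not distances_mm or distances_mm[-1] > near_mm:
        return False
    # Walk back to the sample where the device first entered the near zone.
    i = len(distances_mm) - 1
    while i > 0 and distances_mm[i - 1] <= near_mm:
        i -= 1
    # Count the decreasing steps that led into the near zone.
    steps = 0
    while i > 0:
        step = distances_mm[i - 1] - distances_mm[i]
        if step <= 0:
            break                 # the approach run starts here
        if step > max_step_mm:
            return False          # abrupt jump, e.g. a finger over the sensor
        steps += 1
        i -= 1
    return steps >= min_steps
```

With these assumed thresholds, both FIG. 4 sequences pass (steps of at most 100 mm), while the FIG. 6 drop from 500 mm to 10 mm is a single 490 mm step and is rejected.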

Thus, by deciding whether to trigger voice recognition based on a gradual change in distance between the device and the object, the system avoids triggering when part of the body or another object covers or blocks a sensor in the device.

FIG. 7 is an operational flowchart illustrating a method of providing a voice recognition trigger according to an embodiment of the present disclosure.

The method of providing a voice recognition trigger for a device begins with the voice recognition system 208 acquiring proximity information detected by the device 200 (step S702). Alternatively, in step S702, the voice recognition system 208 may acquire both proximity information and movement information detected by the device 200. In one embodiment, the proximity information may be obtained from the proximity sensor 202 of the device 200, and the movement information from the IMU sensor 204 of the device 200. In one embodiment, the proximity sensor 202 and the IMU sensor 204 may be configured to operate sequentially according to a predetermined condition. For example, the IMU sensor 204 may be activated when the distance between the device 200 and the object detected by the proximity sensor 202 is within a preset distance.
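The sequential operation of the two sensors can be sketched as a small gate that decides, from each proximity reading, whether the IMU should be running. The disclosure states only that the IMU may be activated once the detected distance is within a preset distance. The release threshold, which adds hysteresis so the IMU does not toggle on every noisy reading near the boundary, is an added assumption. The class and parameter names are hypothetical.

```python
class SensorGate:
    """Keep the IMU off until the proximity sensor reports the object within
    enable_mm; switch it off again once the distance exceeds release_mm.
    Hypothetical sketch of sequential sensor activation with hysteresis."""

    def __init__(self, enable_mm: float, release_mm: float) -> None:
        if release_mm < enable_mm:
            raise ValueError("release_mm must be >= enable_mm")
        self.enable_mm = enable_mm
        self.release_mm = release_mm  # hysteresis band avoids rapid toggling
        self.imu_enabled = False

    def update(self, distance_mm: float) -> bool:
        """Feed one proximity reading; return whether the IMU should be on."""
        if not self.imu_enabled and distance_mm <= self.enable_mm:
            self.imu_enabled = True
        elif self.imu_enabled and distance_mm > self.release_mm:
            self.imu_enabled = False
        return self.imu_enabled
```

With `SensorGate(100, 150)`, readings of 300, 90, 120, 160, and 140 mm leave the IMU off, on, on, off, and off. If a rule needs to observe the upward movement itself, the enable distance presumably has to be set well beyond the trigger distance so that the IMU is already running during the lift.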

In step S704, the voice recognition system 208 may calculate the change in distance between the device 200 and the object based on the proximity information. Alternatively, in step S704, the voice recognition system 208 may calculate both the change in distance between the device 200 and the object, based on the proximity information, and the movement direction of the device 200, based on the movement information.

In step S706, the voice recognition system 208 may determine the voice recognition time of the device 200, and whether to trigger voice recognition, by referring to the change in distance between the device 200 and the object. For example, the voice recognition system 208 may determine, as the voice recognition time, the point at which the device approaches the object at a speed at or below a preset speed and comes within a preset distance of it. As another example, the voice recognition system 208 may determine, as the voice recognition time, the point at which the distance remains constant while the device is within a preset distance of the object. In that case, the voice recognition time may be the point at which the distance has remained constant, with the device within the preset distance of the object, for a preset length of time.

Alternatively, in step S706, the voice recognition system 208 may determine the voice recognition time of the device 200, and whether to trigger voice recognition, by referring to both the change in distance between the device 200 and the object and the movement direction of the device 200. For example, the voice recognition system 208 may determine, as the voice recognition time, the point at which the device moves upward, then decelerates and stops. As another example, the voice recognition system 208 may determine, as the voice recognition time, the point at which the device moves upward and comes within a preset distance of the object.

The embodiments of the present disclosure described above may be implemented as program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present disclosure, or they may be known to and usable by those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present disclosure, and vice versa.

Although the present disclosure has been described with reference to specific details, such as particular components, and to limited embodiments, these embodiments are provided only to aid a general understanding of the present disclosure. The present disclosure is not limited to them, and a person of ordinary skill in the art to which the present disclosure pertains may make various modifications and variations from this description.

Accordingly, the spirit of the present disclosure should not be limited to the embodiments described above. The claims below, together with everything modified equally or equivalently to them, fall within the scope of the spirit of the present disclosure.

200: Device
202: Proximity sensor
204: IMU sensor
206: Microphone
208: Voice recognition system
302: Calculation unit
304: Determination unit
306: Communication unit
308: Storage unit

Claims

A method for providing a voice recognition trigger, the method comprising:
calculating a change in distance between a device and an object based on proximity information detected by the device; and
determining whether to trigger voice recognition of the device by referring to the change in distance between the device and the object.

The method of claim 1,
wherein the proximity information is obtained from a proximity sensor of the device.

The method of claim 1,
wherein, in the determining of whether to trigger voice recognition, a point in time at which the device approaches the object at a speed at or below a preset speed and comes within a preset distance of the object is determined as a voice recognition time by referring to the change in distance between the device and the object.

The method of claim 1,
wherein, in the determining of whether to trigger voice recognition, a point in time at which the distance remains constant while the device is within a preset distance of the object is determined as a voice recognition time by referring to the change in distance between the device and the object.

The method of claim 4,
wherein a point in time at which the distance has remained constant for a preset time while the device is within the preset distance of the object is determined as the voice recognition time.

A method for providing a voice recognition trigger, the method comprising:
calculating a change in distance between a device and an object based on proximity information detected by the device, and calculating a movement direction of the device based on movement information detected by the device; and
determining whether to trigger voice recognition of the device by referring to the change in distance between the device and the object and the movement direction of the device.

The method of claim 6,
wherein, in the determining of whether to trigger voice recognition, a point in time at which the device moves upward, then decelerates and stops is determined as a voice recognition time by referring to the change in distance between the device and the object and the movement direction of the device.

The method of claim 7,
wherein, when the distance between the device and the object at the point in time at which the device stops is within a preset distance, the point in time at which the device stops is determined as the voice recognition time.

The method of claim 6,
wherein, in the determining of whether to trigger voice recognition, a point in time at which the device moves upward and comes within a preset distance of the object is determined as a voice recognition time by referring to the change in distance between the device and the object and the movement direction of the device.

The method of claim 6,
wherein the proximity information is obtained from a proximity sensor of the device, and the movement information is obtained from an IMU sensor of the device.

A non-transitory computer-readable recording medium storing a computer program for executing the method according to any one of claims 1 to 10.

A voice recognition system for providing a voice recognition trigger, the system comprising:
a calculation unit configured to calculate a change in distance between a device and an object based on proximity information detected by the device; and
a determination unit configured to determine whether to trigger voice recognition of the device by referring to the change in distance between the device and the object.