KR20160110085A

KR20160110085A - Speech recognition system and method thereof

Info

Publication number: KR20160110085A
Application number: KR1020160011838A
Authority: KR
Inventors: 김태윤; 김상하; 한창우; 이재원
Original assignee: 삼성전자주식회사
Priority date: 2015-03-13
Filing date: 2016-01-29
Publication date: 2016-09-21
Also published as: KR102585228B1

Abstract

The present disclosure provides technology for effectively recognizing speech by using a personalized wakeup keyword. To this end, a device detects a wakeup keyword from a received speech signal of a user by using a personalized wakeup keyword model, and transmits a signal indicating whether the wakeup keyword was detected, together with the received speech signal, to a speech recognition server. The speech recognition server sets a speech recognition model according to whether the wakeup keyword was detected and performs recognition processing on the speech signal of the user. Accordingly, an accurate speech recognition function can be provided.

Description

SPEECH RECOGNITION SYSTEM AND METHOD THEREOF

The present disclosure relates to a speech recognition system and method, and more particularly, to a speech recognition system and method based on a wakeup keyword.

As devices become smarter, they are increasingly equipped with a speech recognition function that can execute a function of the device by using a user's voice signal.

To use the speech recognition function mounted on a device, the speech recognition function must first be woken up. Existing speech recognition functions are woken up by a fixed wakeup keyword. As a result, when a plurality of devices equipped with the same speech recognition function are in the same place, the speech recognition function of an unintended device may be woken up.

In addition, existing speech recognition functions process the wakeup keyword and the voice command separately. Accordingly, the user must input the wakeup keyword and then, once the speech recognition function of the device has woken up, input the voice command. If the user inputs the wakeup keyword and the voice command continuously, the existing speech recognition function may fail to wake up, or, even if it wakes up, a recognition error may occur for the input voice command.

Accordingly, there is a need for a technique that can wake up the speech recognition function of a device more conveniently and accurately while also recognizing voice commands more accurately.

The background described above is information that the inventors possessed for deriving the present disclosure or acquired in the course of deriving it, and is not necessarily known art disclosed to the general public before the filing of the present disclosure.

Embodiments of the present disclosure are intended to provide a more convenient and accurate speech recognition function by continuously recognizing a personalized wakeup keyword and a voice command.

Embodiments of the present disclosure are also intended to provide a speech recognition function that is woken up more effectively by using a personalized wakeup keyword.

Embodiments of the present disclosure are also intended to provide a speech recognition function that is woken up more effectively by using a personalized wakeup keyword that depends on device-based environment information.

As a technical means for achieving the above technical object, a first aspect of the present disclosure may provide a device comprising: an audio input unit that receives a user's voice signal; a memory that stores a wakeup keyword model; a communication unit capable of communicating with a speech recognition server; and a processor that, as the user's voice signal is received through the audio input unit, detects a wakeup keyword from the user's voice signal by using the wakeup keyword model, transmits a detection/non-detection signal for the wakeup keyword and the user's voice signal to the speech recognition server through the communication unit, receives a speech recognition result from the speech recognition server through the communication unit, and controls the device according to the speech recognition result.

As a technical means for achieving the above technical object, a second aspect of the present disclosure may provide a speech recognition server comprising: a communication unit capable of communicating with at least one device; a memory that stores a wakeup keyword model and a speech recognition model; and a processor that, as a detection/non-detection signal for a wakeup keyword and a user's voice signal are received from one of the at least one device through the communication unit, sets a speech recognition model combined with the wakeup keyword model, recognizes the user's voice signal by using the set speech recognition model, removes the wakeup keyword from the recognition result for the user's voice signal, and transmits the recognition result from which the wakeup keyword has been removed to the device through the communication unit.

As a technical means for achieving the above technical object, a third aspect of the present disclosure may provide a speech recognition system comprising: a device that detects a wakeup keyword from a user's voice signal; and a speech recognition server that, as a detection/non-detection signal for the wakeup keyword and the user's voice signal are received from the device, sets a speech recognition model combined with a wakeup keyword model, recognizes the user's voice signal by using the set speech recognition model, and transmits the recognition result to the device.

As a technical means for achieving the above technical object, a fourth aspect of the present disclosure may provide a speech recognition method in a device, the method comprising: detecting a wakeup keyword from a user's voice signal by using a wakeup keyword model as the user's voice signal is received; transmitting a detection/non-detection signal for the wakeup keyword and the user's voice signal to a speech recognition server; receiving a recognition result for the user's voice signal from the speech recognition server; and controlling the device according to the recognition result.

As a technical means for achieving the above technical object, a fifth aspect of the present disclosure may provide a speech recognition method in a speech recognition server, the method comprising: receiving, from a device, a detection/non-detection signal for a wakeup keyword and a user's voice signal; setting a speech recognition model according to the detection/non-detection signal for the wakeup keyword; recognizing the user's voice signal by using the speech recognition model; removing the wakeup keyword from the recognition result for the user's voice signal; and transmitting the recognition result for the user's voice signal, from which the wakeup keyword has been removed, to the device.

As a technical means for achieving the above technical object, a sixth aspect of the present disclosure may provide a speech recognition method in a speech recognition system, the method comprising: detecting the wakeup keyword from a user's voice signal by using the wakeup keyword model as the user's voice signal is received through the device; transmitting a detection/non-detection signal for the wakeup keyword and the user's voice signal from the device to the speech recognition server; setting, in the speech recognition server, a speech recognition model according to the detection/non-detection signal for the wakeup keyword; recognizing, in the speech recognition server, the user's voice signal by using the set speech recognition model; removing, in the speech recognition server, the wakeup keyword from the recognition result for the user's voice signal; transmitting the recognition result for the user's voice signal, from which the wakeup keyword has been removed, from the speech recognition server to the device; and controlling, in the device, the device according to the received recognition result.

A seventh aspect of the present disclosure may provide a computer-readable recording medium on which a program for causing a computer to execute the method of the fifth aspect is recorded.

An eighth aspect of the present disclosure may provide a computer-readable recording medium on which a program for causing a computer to execute the method of the sixth aspect is recorded.

Fig. 1 is a diagram illustrating a speech recognition system according to some embodiments.
Fig. 2 is an operational flowchart of a speech recognition method performed by a device and a speech recognition server included in the speech recognition system according to some embodiments.
Fig. 3 is an operational flowchart of a process of registering a wakeup keyword model in the speech recognition method according to some embodiments.
Fig. 4 is an operational flowchart of another process of registering a wakeup keyword in the speech recognition method according to some embodiments.
Figs. 5A and 5B are examples in which a candidate wakeup keyword model is displayed on a display of a device included in the speech recognition system according to some embodiments.
Figs. 6 and 7 are operational flowcharts of speech recognition methods performed by a device and a speech recognition server included in the speech recognition system according to some embodiments.
Fig. 8 is an operational flowchart of a speech recognition method performed by a device according to some embodiments.
Figs. 9 and 10 are block diagrams of a device included in the speech recognition system according to some embodiments.
Fig. 11 is a block diagram of a speech recognition server included in the speech recognition system according to some embodiments.
Fig. 12 is a block diagram of a speech recognition system according to some other embodiments.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can easily practice them. However, the present disclosure may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted so that the present disclosure can be described clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case in which it is "directly connected" but also the case in which it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In the present disclosure, a wakeup keyword refers to information that can wake up the speech recognition function. A wakeup keyword may also be referred to as a wakeup word. A wakeup keyword may be based on the user's voice signal, but is not limited thereto. For example, a wakeup keyword may include a gesture-based sound (or audio signal) of the user.

A gesture-based sound of the user may include, for example, the sound of the user snapping fingers, the sound of the user clicking the tongue, the sound of the user laughing, the sound of the user trilling the lips, or the sound of the user whistling. In the present disclosure, the gesture-based sound of the user is not limited to these examples.

In the present disclosure, when the wakeup keyword includes the gesture-based sound of the user described above, it may be referred to as a wakeup signal.

In the present disclosure, a wakeup keyword model refers to a wakeup keyword registered in advance in the device and/or the speech recognition server in order to detect or recognize a wakeup keyword. The wakeup keyword model may include a personalized acoustic model and/or a language model, but is not limited thereto. The acoustic model models the signal characteristics of the user's voice (or of the user's gesture-based sound). The language model models the linguistic order of the words, syllables, and the like that correspond to the recognition vocabulary.

Because the wakeup keyword model registered in the device of the present disclosure is used to detect a wakeup keyword, it may be referred to as a model for wakeup keyword detection. Because the wakeup keyword model registered in the speech recognition server of the present disclosure is used to recognize a wakeup keyword, it may be referred to as a model for wakeup keyword recognition.

The model for wakeup keyword detection and the model for wakeup keyword recognition may be identical or may differ. For example, when the model for wakeup keyword detection includes an acoustic model corresponding to the personalized wakeup keyword "Hi", the model for wakeup keyword recognition may include, for example, the acoustic model corresponding to the personalized wakeup keyword "Hi" and a tag (for example, "!") indicating that it is a wakeup keyword. In the present disclosure, the model for wakeup keyword detection and the model for wakeup keyword recognition are not limited to the above.

In the following description, the model for wakeup keyword detection and the model for wakeup keyword recognition are not distinguished and are both referred to as a wakeup keyword model. However, a wakeup keyword model registered in the device is to be understood as a model for wakeup keyword detection, and a wakeup keyword model registered in the speech recognition server is to be understood as a model for wakeup keyword recognition.

A wakeup keyword model may be generated by the device or by the speech recognition server. The device and the speech recognition server may transmit and receive data in order to share the generated wakeup keyword model with each other.

In the present disclosure, the speech recognition function refers to converting a user's voice signal into a character string (or text). The user's voice signal may include a voice command. A voice command can execute a specific function of the device.

In the present disclosure, a specific function of the device may include, for example, executing an application installed on the device, but is not limited thereto.

For example, when the device is a smart phone, executing an application may include making a call, finding directions, searching the Internet, or setting an alarm. When the device is a smart TV, executing an application may include searching for a program or searching for a channel. When the device is a smart oven, executing an application may include searching for a cooking method. When the device is a smart refrigerator, executing an application may include checking the refrigeration state or checking the freezing state. When the device is a smart car, executing an application may include automatic starting, autonomous driving, or automatic parking. In the present disclosure, executing an application is not limited to the above.
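The mapping from a recognized voice command to a device function can be pictured as a small dispatch table. The following is only a minimal sketch for the smart phone case; the handler names, the `COMMANDS` table, and the prefix-matching rule are hypothetical illustrations and are not taken from the disclosure.

```python
from typing import Callable, Dict

# Hypothetical handlers; a real device would launch the actual application.
def dial(arg: str) -> str:
    return f"dialing {arg}"

def set_alarm(arg: str) -> str:
    return f"alarm set for {arg}"

def search_internet(arg: str) -> str:
    return f"searching the internet for {arg}"

# Maps the leading phrase of a recognized command to a handler (assumed phrases).
COMMANDS: Dict[str, Callable[[str], str]] = {
    "call": dial,
    "set alarm": set_alarm,
    "search": search_internet,
}

def execute(recognized_text: str) -> str:
    """Run the handler whose key prefixes the recognized text."""
    text = recognized_text.strip().lower()
    # Try longer keys first so a multi-word key is not shadowed by a shorter one.
    for key in sorted(COMMANDS, key=len, reverse=True):
        if text == key or text.startswith(key + " "):
            return COMMANDS[key](text[len(key):].strip())
    return "no matching function"
```

For instance, `execute("Call mom")` would route to the hypothetical `dial` handler, while an unknown phrase falls through to the default string.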

In the present disclosure, a voice command may take the form of a word, a sentence, or a phrase. In the present disclosure, the speech recognition model may include a personalized acoustic model and/or a language model.

Hereinafter, the present disclosure is described in detail with reference to the accompanying drawings.

Fig. 1 is a diagram illustrating a speech recognition system 10 according to some embodiments. The speech recognition system 10 includes a device 100 and a speech recognition server 110.

The device 100 may receive a voice signal from a user 101. The device 100 may detect a wakeup keyword from the received voice signal of the user 101 by using a wakeup keyword model. The device 100 may generate a wakeup keyword model and register it in the device 100. The device 100 may transmit the generated wakeup keyword model to the speech recognition server 110. The device 100 may also receive a wakeup keyword model from the speech recognition server 110 and use it.

The device 100 may transmit a detection/non-detection signal for the wakeup keyword and the received voice signal of the user 101 to the speech recognition server 110.

The detection/non-detection signal for the wakeup keyword is a signal indicating whether the wakeup keyword has been detected from the received voice signal of the user 101. The device 100 may express the detection/non-detection signal for the wakeup keyword as binary data. When the wakeup keyword is detected from the received voice signal of the user 101, the device 100 may express the detection/non-detection signal as, for example, '0'. When the wakeup keyword is not detected from the received voice signal of the user 101, the device 100 may express the detection/non-detection signal as, for example, '1'.
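The binary encoding described above can be sketched as follows. The values '0' (detected) and '1' (not detected) follow the example in the text; the `DeviceMessage` container and the `build_message` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Values follow the example in the text: '0' = detected, '1' = not detected.
DETECTED = 0
NOT_DETECTED = 1

@dataclass
class DeviceMessage:
    """Hypothetical payload sent from the device to the speech recognition server."""
    detection_signal: int  # DETECTED or NOT_DETECTED
    audio: bytes           # the user's voice signal, forwarded unchanged

def build_message(keyword_found: bool, audio: bytes) -> DeviceMessage:
    """Pair the detection/non-detection signal with the received voice signal."""
    signal = DETECTED if keyword_found else NOT_DETECTED
    return DeviceMessage(detection_signal=signal, audio=audio)
```

Note that the audio is sent in both cases; only the signal value changes.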

The voice signal of the user 101 received by the device 100 may include a wakeup keyword and a voice command. The voice signal of the user 101 received by the device 100 may also not include a wakeup keyword.

The speech recognition server 110 may receive the detection/non-detection signal for the wakeup keyword and the voice signal of the user 101 from the device 100. The voice signal of the user 101 received from the device 100 is the same as the voice signal of the user 101 received by the device 100.

The speech recognition server 110 may set a speech recognition model according to the received detection/non-detection signal for the wakeup keyword. When the detection/non-detection signal indicates that the voice signal of the user 101 includes the wakeup keyword, the speech recognition server 110 may set the speech recognition model so that the voice signal of the user 101 is recognized by using a speech recognition model combined with the wakeup keyword model.

The wakeup keyword model that is combined with the speech recognition model in the speech recognition server 110 may depend on the wakeup keyword detected by the device 100. For example, when the wakeup keyword detected by the device 100 is "Hi", the speech recognition server 110 may set the speech recognition model so that the voice signal of the user 101 is recognized by using "Hi + speech recognition model (for example, play the music)". When combining the wakeup keyword model and the speech recognition model, the speech recognition server 110 may take into account a silence duration between the wakeup keyword model and the speech recognition model.

As described above, the speech recognition server 110 continuously performs recognition processing on the wakeup keyword and the voice command included in the user's voice signal, thereby stably securing the user's voice signal and improving speech recognition performance.

When the detection/non-detection signal for the wakeup keyword indicates that the voice signal of the user 101 does not include the wakeup keyword, the speech recognition server 110 may set the speech recognition model so that the voice signal of the user 101 is recognized by using a speech recognition model that is not combined with the wakeup keyword model.

In this way, the speech recognition server 110 can dynamically reconfigure (or switch) the speech recognition model used to recognize speech according to the detection/non-detection signal for the wakeup keyword. Accordingly, setting the speech recognition model in the speech recognition server 110 according to the detection/non-detection signal may be described as determining the configuration of the speech recognition model according to whether the wakeup keyword was detected.
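The switching behaviour can be sketched as a function that returns the sequence of models the server would decode against. This is an illustrative sketch only: the list representation, the `<sil>` marker for the optional silence duration, and the `<command>` placeholder are assumptions, and the '0'/'1' values reuse the example encoding given earlier in the text.

```python
from typing import List

DETECTED = 0       # example encoding from the text
NOT_DETECTED = 1
SILENCE = "<sil>"  # hypothetical marker for the optional silence duration

def configure_model(detection_signal: int, wakeup_keyword: str,
                    command_model: str = "<command>") -> List[str]:
    """Return the model sequence the server would decode against."""
    if detection_signal == DETECTED:
        # Wakeup keyword model + silence + general speech recognition model.
        return [wakeup_keyword, SILENCE, command_model]
    # No wakeup keyword in the signal: use the speech recognition model alone.
    return [command_model]
```

With the detected keyword "Hi", the sketch yields the combined sequence; with a non-detection signal it yields only the command model.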

Setting the speech recognition model in the speech recognition server 110 may include loading the speech recognition model. Accordingly, the detection/non-detection signal for the wakeup keyword may be interpreted as including a speech recognition model loading request signal, a speech recognition model setting request signal, or a speech recognition model loading trigger signal. In the present disclosure, the expression used for the detection/non-detection signal for the wakeup keyword is not limited to the above.

The speech recognition server 110 may generate a speech recognition model for recognizing voice commands. The speech recognition model may include an acoustic model and a language model. The acoustic model models the signal characteristics of speech. The language model models the linguistic order relationship of the words, syllables, and the like that correspond to the recognition vocabulary.

The speech recognition server 110 may detect only the speech portion of the received voice signal of the user 101. The speech recognition server 110 may extract speech features from the detected speech portion. The speech recognition server 110 may perform speech recognition processing on the received voice signal of the user 101 by using the extracted speech features, the features of a pre-registered acoustic model, and the language model. The speech recognition server 110 may perform the speech recognition processing by comparing the extracted speech features with the features of the pre-registered acoustic model. The speech recognition processing performed by the speech recognition server 110 on the received voice signal of the user 101 is not limited to the above.
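The disclosure does not state how the speech portion is detected. As one common possibility, a frame-energy threshold can be used; the sketch below assumes that approach, and the function names, frame length, and threshold are hypothetical choices rather than values from the text.

```python
from typing import List, Optional, Tuple

def frame_energy(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def speech_span(samples: List[float], frame_len: int = 4,
                threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices spanning the frames above threshold."""
    energies = frame_energy(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None  # no speech portion found
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Feature extraction and comparison against the acoustic model would then operate only on `samples[start:end]`.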

The speech recognition server 110 may remove the wakeup keyword from the speech recognition result obtained by the speech recognition processing. The speech recognition server 110 may transmit the speech recognition result from which the wakeup keyword has been removed to the device 100.
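One way to picture the removal step is to assume that the recognizer emits a token list in which wakeup keyword tokens carry the "!" tag mentioned earlier as an example. That output format is an assumption made for this sketch, and `strip_wakeup` is a hypothetical helper name.

```python
from typing import List

WAKEUP_TAG = "!"  # tag character taken from the example in the text

def strip_wakeup(tokens: List[str]) -> str:
    """Drop tokens tagged as wakeup keywords and join the remaining command."""
    kept = [t for t in tokens if not t.startswith(WAKEUP_TAG)]
    return " ".join(kept)
```

Under this assumption, a result such as `["!Hi", "play", "the", "music"]` is reduced to the command text alone before it is sent back to the device.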

음성 인식 서버(110)는 웨이크업 키워드 모델을 생성할 수 있다. 음성 인식 서버(110)는 생성한 웨이크업 키워드 모델을 음성 인식 서버(110)에 등록(또는 저장)하면서 디바이스(100)로 전송할 수 있다. 이에 따라 디바이스(100)와 음성 인식 서버(110)는 웨이크업 키워드 모델을 공유할 수 있다. The speech recognition server 110 may generate a wake-up keyword model. The speech recognition server 110 can register the generated wake-up keyword model in the speech recognition server 110 and transmit it to the device 100. [ Accordingly, the device 100 and the speech recognition server 110 can share the wake-up keyword model.

The device 100 may control a function of the device 100 according to the speech recognition result received from the speech recognition server 110.

When the device 100 or the speech recognition server 110 has generated a plurality of wake-up keyword models, the device 100 or the speech recognition server 110 may assign identification information to each wake-up keyword model. When identification information has been assigned to each wake-up keyword model, the wake-up keyword detection signal (the signal indicating whether a wake-up keyword was detected) transmitted from the device 100 to the speech recognition server 110 may include the identification information of the detected wake-up keyword.
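The assignment of identification information can be illustrated with a minimal Python sketch. The class names `WakeupKeywordModel`, `DetectionSignal`, and `ModelRegistry`, and the use of sequential integers as identifiers, are assumptions made for illustration; the disclosure does not specify the form of the identification information.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass(frozen=True)
class WakeupKeywordModel:
    model_id: int  # identification information (hypothetical: a sequential integer)
    keyword: str   # stand-in for the actual model data


@dataclass(frozen=True)
class DetectionSignal:
    detected: bool
    model_id: Optional[int] = None  # set only when a keyword was detected


class ModelRegistry:
    """Registers wake-up keyword models and assigns each one a unique ID."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._models: Dict[int, WakeupKeywordModel] = {}

    def register(self, keyword: str) -> WakeupKeywordModel:
        model = WakeupKeywordModel(next(self._next_id), keyword)
        self._models[model.model_id] = model
        return model

    def signal_for(self, detected: Optional[WakeupKeywordModel]) -> DetectionSignal:
        # Build the detection signal sent to the server; it carries the ID
        # of the detected model, or no ID when nothing was detected.
        if detected is None:
            return DetectionSignal(False)
        return DetectionSignal(True, detected.model_id)
```

With such a signal, the server could look up the matching model by ID instead of inferring it from the audio.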

When the device 100 is a portable device, the device 100 may include at least one of a smartphone, a notebook computer, a smart board, a tablet PC (Personal Computer), a handheld device, a handheld computer, a media player, an electronic book device, and a PDA (Personal Digital Assistant), but the device 100 in the present disclosure is not limited to these.

When the device 100 is a wearable device, the device 100 may include at least one of smart glasses, a smart watch, a smart band (for example, a smart waistband or a smart hairband), various smart accessories (for example, a smart ring, a smart bracelet, a smart anklet, a smart hairpin, a smart clip, or a smart necklace), various smart body protectors (for example, a smart knee protector or a smart elbow protector), smart shoes, smart gloves, smart clothing, a smart hat, a smart prosthetic leg, and a smart prosthetic hand, but the device 100 in the present disclosure is not limited to these.

The device 100 may include devices such as M2M (Machine to Machine) or IoT (Internet of Things) network-based devices (for example, smart home appliances and smart sensors), automobiles, and automotive navigation devices, but the device 100 in the present disclosure is not limited to these.

The device 100 and the speech recognition server 110 may be connected over a wired and/or wireless network. The device 100 and the speech recognition server 110 may be connected over a short-range wireless network and/or a long-range wireless network.

FIG. 2 is a flowchart of a speech recognition method performed by the device 100 and the speech recognition server 110 included in the speech recognition system 10 according to some embodiments. FIG. 2 shows a case in which speech recognition is performed based on the speech signal of the user 101.

Referring to FIG. 2, in step S201, the device 100 registers a wake-up keyword model. FIG. 3 is a flowchart of a process of registering a wake-up keyword model in the speech recognition method according to some embodiments.

Referring to FIG. 3, in step S301, the device 100 receives a speech signal of the user 101. The speech signal received in step S301 is for registering a wake-up keyword model. In step S301, instead of the speech signal of the user 101, the device 100 may receive a sound (or audio signal) based on the specific gesture of the user 101 described above.

In step S302, the device 100 may recognize the speech signal of the user 101 using a speech recognition model. The speech recognition model may include an acoustic model and/or a language model based on ASR (Automatic Speech Recognition), but the speech recognition model in the present disclosure is not limited to this.

If it is determined in step S303 that the received speech signal of the user 101 is valid as a wake-up keyword model, the device 100 generates and registers a wake-up keyword model in step S304. Registering a wake-up keyword model in the device 100 may mean storing the wake-up keyword model in the device 100.

In step S303, the device 100 may determine whether the received speech signal of the user 101 is valid as a wake-up keyword model based on a speech matching rate for the speech signal of the user 101.

For example, the device 100 may recognize the speech signal of the user 101 a plurality of times and compare the recognition results. If the comparison shows that a consistent result was obtained at least a preset number of times, the device 100 may determine that the received speech signal of the user 101 is valid as a wake-up keyword model.

If it is determined in step S303 that the received speech signal of the user 101 is valid, then in step S304 the device 100 registers the wake-up keyword model determined to be valid in the device 100.

In step S303, if the device 100 recognizes the speech signal of the user 101 a plurality of times, compares the recognition results, and finds that a consistent result was obtained fewer than the preset number of times, the device 100 may determine that the received speech signal of the user 101 is not valid as a wake-up keyword model.

If it is determined in step S303 that the received speech signal of the user 101 is not valid, the device 100 does not register the received speech signal of the user 101 as a wake-up keyword model.

When the received speech signal of the user 101 is determined to be invalid as a wake-up keyword model, the device 100 may output a notification message. The notification message may take various forms and have various contents. For example, the notification message may state that the speech signal just input by the user 101 was not registered as a wake-up keyword model. The notification message may also guide the user 101 to input a speech signal that can be registered as a wake-up keyword model.
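The consistency check of step S303 can be sketched in Python as follows. This is a minimal illustration, assuming that each recognition pass yields a text string and that "consistent" means identical strings; the function name `is_valid_wakeup_keyword` and the parameter `min_consistent` are hypothetical, and the disclosure does not state how the matching rate is computed.

```python
from collections import Counter
from typing import Sequence


def is_valid_wakeup_keyword(recognized: Sequence[str], min_consistent: int) -> bool:
    """Return True if the most frequent recognition result occurs at least
    `min_consistent` times among the repeated recognition passes."""
    if not recognized:
        return False
    # most_common(1) returns [(result, occurrences)] for the top result.
    _, occurrences = Counter(recognized).most_common(1)[0]
    return occurrences >= min_consistent


def registration_outcome(recognized: Sequence[str], min_consistent: int) -> str:
    # Register on success; otherwise return a notification for the user.
    if is_valid_wakeup_keyword(recognized, min_consistent):
        return "registered"
    return "not registered: please input a speech signal that can be registered"
```

A keyword that is recognized differently on each attempt is rejected, which tends to filter out phrases that the recognizer cannot reproduce reliably.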

FIG. 4 is a flowchart of another process of registering a wake-up keyword model in the speech recognition method according to some embodiments.

In step S401, the device 100 requests the candidate wake-up keyword models stored in the device 100. The request for candidate wake-up keyword models may be based on a speech signal of the user 101, but the present disclosure is not limited to this. For example, the device 100 may receive a user input requesting candidate wake-up keyword models through a specific button (or dedicated button) of the device 100 or through a touch-based input.

In step S402, the device 100 outputs the candidate wake-up keyword models. The device 100 may output the candidate wake-up keyword models through the display of the device 100.

FIGS. 5(a) and 5(b) show examples of displaying candidate wake-up keyword models on the display of the device 100 included in the speech recognition system 10 according to some embodiments.

FIG. 5(a) shows an example of a candidate wake-up keyword model list displayed on the display of the device 100. Referring to FIG. 5(a), the candidate wake-up keyword models are provided in text form.

When a touch-based input selecting the first candidate wake-up keyword model is received on the candidate wake-up keyword model list shown in FIG. 5(a), the device 100 may display the speech waveform of the selected candidate wake-up keyword model as shown in FIG. 5(b) while outputting the audio signal corresponding to that candidate wake-up keyword model. Accordingly, the user can check a wake-up keyword model before selecting it.

In step S402, the device 100 may output the candidate wake-up keyword models through an audio output unit (for example, a speaker) of the device 100.

When a selection signal for one of the candidate wake-up keyword models is received in step S403, the device 100 registers the selected candidate wake-up keyword model in step S404. In step S404, the device 100 may request the user 101 to input a speech signal corresponding to the selected candidate wake-up keyword model, and may generate and register a wake-up keyword model from the speech signal of the user 101 received in response.

In step S201, the device 100 may receive a wake-up keyword model from the speech recognition server 110 and register it. In step S201, the device 100 may establish a communication channel with the speech recognition server 110 and request registration of a wake-up keyword model while transmitting the received speech signal of the user 101 to the speech recognition server 110 over the established communication channel. Accordingly, the device 100 can receive the wake-up keyword model generated by the speech recognition server 110.

Meanwhile, in step S202 of FIG. 2, the speech recognition server 110 registers a wake-up keyword model. In step S202, the speech recognition server 110 may register the wake-up keyword model received from the device 100, but the manner of registering a wake-up keyword model in the speech recognition server 110 in the present disclosure is not limited to this.

For example, the speech recognition server 110 may request a wake-up keyword model from the device 100 and receive it. To this end, the speech recognition server 110 may monitor the device 100. The speech recognition server 110 may monitor the device 100 periodically, or may monitor the device 100 when access by the device 100 is recognized. When the speech recognition server 110 recognizes that the device 100 is connected to the speech recognition server 110, it may request the wake-up keyword model from the device 100.

In step S202, when registering a wake-up keyword model, the speech recognition server 110 may add to the wake-up keyword a tag indicating that it is a wake-up keyword. The tag may be expressed, for example, as a special symbol such as an exclamation mark (!), but the representation of the tag in the present disclosure is not limited to this.

In step S202, the wake-up keyword model registered in the speech recognition server 110 may be synchronized with the wake-up keyword model registered in the device 100. When the wake-up keyword model registered in the device 100 is updated, the wake-up keyword model registered in the speech recognition server 110 may be updated as well.

In step S202, the speech recognition server 110 may receive the speech signal of the user 101 from the device 100 and generate and register a wake-up keyword model. The speech recognition server 110 may generate the wake-up keyword model as described above with reference to FIG. 3 or FIG. 4. The speech recognition server 110 may receive the speech signal of the user 101 for generating the wake-up keyword model from the device 100 before step S201.

In step S203, the device 100 may receive a speech signal of the user 101. In step S204, the device 100 may detect a wake-up keyword from the received speech signal of the user 101 using the registered wake-up keyword model. The device 100 may detect the wake-up keyword by comparing the signal characteristics of the registered wake-up keyword model with those of the received speech signal of the user 101.

In step S205, the device 100 may transmit the wake-up keyword detection signal and the received speech signal of the user 101 to the speech recognition server 110.

In step S206, the speech recognition server 110 may set a speech recognition model according to the received wake-up keyword detection signal. The speech recognition model may be set as described with reference to FIG. 1. That is, when the detection signal indicates that a wake-up keyword was detected, the speech recognition server 110 may set a speech recognition model combined with the wake-up keyword model. When the detection signal indicates that no wake-up keyword was detected, the speech recognition server 110 may set a speech recognition model that is not combined with the wake-up keyword model.
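The model selection of step S206 can be sketched in Python as follows. This is a minimal illustration in which models are represented by plain string labels; the names `RecognitionModel` and `set_recognition_model` are hypothetical, and how the wake-up keyword model is actually combined with the speech recognition model is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecognitionModel:
    base_model: str                              # stand-in for the acoustic/language model
    wakeup_keyword_model: Optional[str] = None   # None means "not combined"


def set_recognition_model(base_model: str,
                          wakeup_keyword_model: str,
                          detected: bool) -> RecognitionModel:
    """Combine the wake-up keyword model only when the device reported
    that a wake-up keyword was detected."""
    if detected:
        return RecognitionModel(base_model, wakeup_keyword_model)
    return RecognitionModel(base_model)
```

Branching on the device's detection signal lets the server avoid searching for a wake-up keyword in utterances that do not contain one.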

In step S207, the speech recognition server 110 may recognize the received speech signal of the user 101 using the speech recognition model that was set. In step S208, the speech recognition server 110 may remove the wake-up keyword from the speech recognition result. The speech recognition server 110 may remove the wake-up keyword from the speech recognition result by using the tag that was added to the wake-up keyword when the wake-up keyword model was registered.
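The tag-based removal of step S208 can be sketched in Python as follows. The sketch assumes that the recognition result is a list of tokens and that the tag (here the exclamation mark given as an example in the description) appears as a prefix on the wake-up keyword token; both are assumptions, as are the function names.

```python
from typing import List

TAG = "!"  # example tag symbol from the description


def tag_keyword(keyword: str) -> str:
    """Mark a keyword as a wake-up keyword at registration time."""
    return TAG + keyword


def strip_wakeup_keyword(result_tokens: List[str]) -> List[str]:
    """Drop every token carrying the wake-up keyword tag."""
    return [token for token in result_tokens if not token.startswith(TAG)]


# Hypothetical recognition result containing a tagged wake-up keyword.
result = [tag_keyword("hi"), "play", "music"]
command = strip_wakeup_keyword(result)  # ["play", "music"]
```

Because the tag identifies the wake-up keyword directly, the server does not need to compare the result against the keyword text to strip it.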

In step S209, the speech recognition server 110 may transmit the speech recognition result, with the wake-up keyword removed, to the device 100. In step S210, the device 100 may control the device 100 according to the received speech recognition result.

FIG. 6 is a flowchart of a speech recognition method performed by the device 100 and the speech recognition server 110 included in the speech recognition system 10 according to some embodiments. FIG. 6 shows an example in which speech recognition is performed using a wake-up keyword model that depends on environment information based on the device 100.

In step S601, the device 100 may register a plurality of wake-up keyword models based on environment information. The environment information may include location information. The location information may include physical location information and logical location information. Physical location information is information expressed as latitude and longitude. Logical location information is information expressed semantically, such as home, office, or cafe. The environment information may include weather information, time information, or schedule information. The environment information may include location, time, weather, and/or schedule information. The environment information in the present disclosure is not limited to the above and may include condition information or context information that directly or indirectly affects the user 101.

For example, the device 100 may register one wake-up keyword model for when the device 100 is located at home and a different one for when the device 100 is located at the office. The device 100 may register one wake-up keyword model for when the time detected by the device 100 is 6 a.m. and a different one for when the detected time is 6 p.m. The device 100 may register one wake-up keyword model for when the weather detected by the device 100 is clear and a different one for when the detected weather is rainy. The device 100 may register different wake-up keyword models according to the schedule of the user 101 detected by the device 100.

In step S601, as in step S201 described above, the device 100 may receive a plurality of wake-up keyword models based on environment information from the speech recognition server 110 and register them.

In step S602, the speech recognition server 110 may register a plurality of wake-up keyword models based on environment information.

The plurality of wake-up keyword models registered in the speech recognition server 110 may be synchronized in real time with the plurality of wake-up keyword models registered in the device 100. Accordingly, whenever the plurality of wake-up keyword models registered in the device 100 is updated, the plurality of wake-up keyword models registered in the speech recognition server 110 may be updated.

In step S602, the speech recognition server 110 may register a plurality of wake-up keyword models received from the device 100. In step S602, the speech recognition server 110 may request the plurality of wake-up keyword models from the device 100 and receive them from the device 100.

In step S602, as in step S202 described above, the speech recognition server 110 may establish a communication channel between the device 100 and the speech recognition server 110, and may generate and register a plurality of wake-up keyword models based on the environment information described above from the speech signal of the user 101 received from the device 100 over the established communication channel. The speech recognition server 110 may provide the plurality of wake-up keyword models registered in this way to the device 100.

In step S603, the device 100 may receive a speech signal of the user 101. In step S604, the device 100 may detect environment information based on the device 100. The device 100 may detect the environment information using sensors included in the device 100 or applications installed on the device 100.

For example, the device 100 may detect location information using a location sensor (for example, a GPS (Global Positioning System) sensor) included in the device 100. The device 100 may detect time information using a timer application installed on the device 100. The device 100 may detect weather information using a weather application installed on the device 100. The device 100 may detect the schedule of the user 101 using a schedule application installed on the device 100.

In step S605, the device 100 may detect a wake-up keyword from the received speech signal of the user 101 using, among the plurality of registered wake-up keyword models, the wake-up keyword model corresponding to the detected environment information.

For example, suppose the wake-up keyword model for home is "Hi" and the wake-up keyword model for the office is "Good". If the location of the device 100 detected by the device 100 is the office, the device 100 may detect the wake-up keyword from the received speech signal of the user 101 using "Good".
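The home/office example can be sketched in Python as follows. This is a minimal illustration, assuming that the utterance is already available as text and that detection is a word match; the device described here compares signal characteristics instead, so the text matching is only a stand-in. The function names are hypothetical.

```python
from typing import Dict, Optional


def select_wakeup_keyword(models_by_location: Dict[str, str],
                          location: str) -> Optional[str]:
    """Pick the wake-up keyword registered for the detected location."""
    return models_by_location.get(location)


def detect_wakeup_keyword(utterance: str, keyword: Optional[str]) -> bool:
    """Stand-in detector: true if the keyword occurs as a word in the utterance."""
    if keyword is None:
        return False
    return keyword.lower() in utterance.lower().split()


# Registered models from the example: "Hi" at home, "Good" at the office.
models = {"home": "Hi", "office": "Good"}
active = select_wakeup_keyword(models, "office")               # "Good"
detected = detect_wakeup_keyword("Good play the news", active)
```

Only the keyword for the current environment is active, so saying "Hi" at the office would not wake the device in this sketch.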

In step S606, the device 100 may transmit the detected environment information, the wake-up keyword detection signal, and the received speech signal of the user 101 to the speech recognition server 110.

In step S607, the speech recognition server 110 may determine a wake-up keyword model according to the wake-up keyword detection signal and the received environment information based on the device 100, and may set a speech recognition model combined with the determined wake-up keyword model.

In step S608, the speech recognition server 110 may recognize the received speech signal using the speech recognition model that was set. In step S609, the speech recognition server 110 may remove the wake-up keyword from the speech recognition result. The speech recognition server 110 may remove the wake-up keyword contained in the speech recognition result by using the tag that was added to the wake-up keyword when the wake-up keyword model was registered.

In step S610, the speech recognition server 110 may transmit the speech recognition result, with the wake-up keyword removed, to the device 100. In step S611, the device 100 may control the device 100 according to the received speech recognition result.

FIG. 7 is a flowchart of a speech recognition method performed by the device 100 and the speech recognition server 110 included in the speech recognition system 10 according to some embodiments. FIG. 7 shows an example in which speech recognition is performed by setting a speech recognition model according to the identification information of the user 101, environment information based on the device 100, and the wake-up keyword detection signal.

In step S701, the device 100 may register a plurality of wake-up keyword models based on environment information. The environment information may be as described for step S601 of FIG. 6, but is not limited to this. In step S701, the device 100 may register a plurality of wake-up keyword models received from the speech recognition server 110.

In step S702, the speech recognition server 110 may register a plurality of wake-up keyword models based on environment information and the identification information of the user 101. For example, the speech recognition server 110 may register a plurality of wake-up keyword models based on environment information for identification information A of the user 101, and may register a plurality of wake-up keyword models based on environment information for identification information B of the user 101.

The plurality of wake-up keyword models registered in the speech recognition server 110 may be synchronized on a per-user basis. For example, when the wake-up keyword models of user A are updated, the speech recognition server 110 may update only the wake-up keyword models of user A among the plurality of wake-up keyword models registered in the speech recognition server 110.
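Per-user synchronization can be sketched in Python as follows. This is a minimal illustration, assuming that each user's models are stored as a mapping from an environment label to a keyword; the class `ServerModelStore` and its methods are hypothetical names.

```python
from typing import Dict


class ServerModelStore:
    """Server-side store keeping wake-up keyword models separately per user."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Dict[str, str]] = {}

    def sync_user(self, user_id: str, models: Dict[str, str]) -> None:
        # Replace only this user's models; other users are left untouched.
        self._by_user[user_id] = dict(models)

    def models_for(self, user_id: str) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the stored models.
        return dict(self._by_user.get(user_id, {}))


store = ServerModelStore()
store.sync_user("A", {"home": "Hi", "office": "Good"})
store.sync_user("B", {"home": "Hello"})
store.sync_user("A", {"home": "Hey", "office": "Good"})  # user A updates a model
```

After the last call, user B's models are unchanged, which is the behavior the per-user synchronization describes.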

In step S702, the speech recognition server 110 may register the wake-up keyword models described above based on the speech signal of the user 101 received from the device 100. In this case, the speech recognition server 110 may provide the registered wake-up keyword models to the device 100.

In step S703, the device 100 may receive a speech signal of the user 101. In step S704, the device 100 may detect environment information based on the device 100. In step S705, the device 100 may obtain the identification information of the user 101 based on the received speech signal of the user 101. The identification information of the user 101 may include a nickname, gender, name, and the like of the user 101, but the identification information of the user 101 in the present disclosure is not limited to these.

Step S705 may also be configured to obtain the identification information of the user 101 using fingerprint recognition or iris recognition of the user 101.

In step S706, the device 100 may detect the wake-up keyword from the received voice signal of the user 101 by using the wake-up keyword model that corresponds to the detected environment information among the plurality of registered wake-up keyword models.
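
Step S706 can be pictured as a lookup followed by a detection call. The sketch below is only an illustration under strong assumptions: the function names, the dictionary layout, and the example keywords are hypothetical, and the detector works on a text transcript as a stand-in, whereas a real detector would score the voice signal against the model.

```python
def select_wakeup_model(models_by_environment, environment):
    """Pick the model registered for the detected environment (KeyError if none)."""
    return models_by_environment[environment]


def detect_wakeup_keyword(transcript, model):
    """Stand-in detector: checks whether the utterance begins with the keyword.

    A real detector would score acoustic features of the voice signal
    against the wake-up keyword model instead of comparing text.
    """
    return transcript.lower().startswith(model["keyword"].lower())


# Hypothetical registered models, one per environment.
models = {"home": {"keyword": "hi"}, "office": {"keyword": "good day"}}

model = select_wakeup_model(models, "office")
detected = detect_wakeup_keyword("Good day, open my calendar", model)
```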

In step S707, the device 100 may transmit the detected environment information, the identification information of the user 101, a signal indicating whether the wake-up keyword was detected, and the received voice signal of the user 101 to the speech recognition server 110.
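
The four items transmitted in step S707 could be bundled into a single message. The following is a sketch only: the disclosure does not specify a wire format, so the `RecognitionRequest` name, its field names, and the JSON/base64 encoding are assumptions made for illustration.

```python
import base64
import json
from dataclasses import dataclass, asdict


@dataclass
class RecognitionRequest:
    """Hypothetical payload sent from the device to the speech recognition server."""
    environment: str        # detected environment information
    user_id: str            # identification information of the user
    keyword_detected: bool  # whether the wake-up keyword was detected
    audio: bytes            # received voice signal

    def to_json(self) -> str:
        body = asdict(self)
        # Raw bytes are not JSON-serializable, so encode the audio as base64 text.
        body["audio"] = base64.b64encode(self.audio).decode("ascii")
        return json.dumps(body)

    @classmethod
    def from_json(cls, text: str) -> "RecognitionRequest":
        body = json.loads(text)
        body["audio"] = base64.b64decode(body["audio"])
        return cls(**body)


request = RecognitionRequest("home", "user_a", True, b"\x00\x01\x02")
restored = RecognitionRequest.from_json(request.to_json())
```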

In step S708, the speech recognition server 110 may determine a wake-up keyword model according to the signal indicating whether the wake-up keyword was detected, the received environment information based on the device 100, and the identification information of the user 101, and may set a speech recognition model combined with the determined wake-up keyword model.
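
The decision in step S708 can be sketched as a small function. The names below (`configure_recognizer`, the `(user_id, environment)` key, and the strings standing in for models) are hypothetical, and representing a "combined" model as an ordered list of components is an assumption; the disclosure does not state how the models are combined internally.

```python
def configure_recognizer(keyword_detected, environment, user_id,
                         keyword_models, base_model):
    """Return the model components the recognizer should use.

    keyword_models maps a hypothetical (user_id, environment) key to a
    wake-up keyword model; base_model is the speech recognition model.
    """
    if not keyword_detected:
        # No wake-up keyword in the signal: use the speech recognition model alone.
        return [base_model]
    keyword_model = keyword_models[(user_id, environment)]
    # Wake-up keyword present: recognize it first, then the command.
    return [keyword_model, base_model]


keyword_models = {("user_a", "home"): "kw:hi"}
with_keyword = configure_recognizer(True, "home", "user_a", keyword_models, "asr")
without_keyword = configure_recognizer(False, "home", "user_a", keyword_models, "asr")
```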

In step S709, the speech recognition server 110 may recognize the received voice signal by using the set speech recognition model. In step S710, the speech recognition server 110 may remove the wake-up keyword from the speech recognition result. The speech recognition server 110 may remove the wake-up keyword contained in the speech recognition result by using the tag that was added to the wake-up keyword when the wake-up keyword model was registered.
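
The tag-based removal in step S710 might look like the sketch below. The token representation (a list of word and tag pairs) and the `WAKEUP_TAG` value are assumptions for illustration; the disclosure only states that a tag added at registration identifies the wake-up keyword.

```python
WAKEUP_TAG = "#wakeup"  # hypothetical tag value attached at registration time


def strip_wakeup_keyword(tokens):
    """Drop every token carrying the wake-up tag and join the rest into a command."""
    return " ".join(word for word, tag in tokens if tag != WAKEUP_TAG)


# Hypothetical recognition result: each recognized word with its optional tag.
result = [("hi", WAKEUP_TAG), ("play", None), ("music", None)]
command = strip_wakeup_keyword(result)
```

Because removal relies on the tag rather than on the keyword text, a command that happens to contain the same word without the tag is left intact.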

In step S711, the speech recognition server 110 may transmit the speech recognition result, from which the wake-up keyword has been removed, to the device 100. In step S712, the device 100 may control the device 100 according to the received speech recognition result.

FIG. 8 is a flowchart of a speech recognition method performed by the device 100 according to some embodiments. FIG. 8 illustrates a case in which speech recognition is performed by the device 100 independently of the speech recognition server 110.

In step S801, the device 100 may register a wake-up keyword model. When registering, the device 100 may add, to the wake-up keyword, a tag that identifies the wake-up keyword. In step S801, the device 100 may also receive the wake-up keyword model from the speech recognition server 110 and register it.

In step S802, the device 100 may receive a voice signal of the user 101. In step S803, the device 100 may detect the wake-up keyword from the voice signal of the user 101 by using the wake-up keyword model.

When it is determined in step S804 that the wake-up keyword has been detected, the process proceeds to step S805, in which the device 100 may set a speech recognition model combined with the wake-up keyword model. In step S806, the device 100 may perform speech recognition processing on the received voice signal of the user 101 by using the speech recognition model.

In step S807, the device 100 may remove the wake-up keyword from the speech recognition result. The device 100 may remove the wake-up keyword from the speech recognition result by using the tag that identifies the wake-up keyword. In step S808, the device 100 may control the device 100 by using the speech recognition result from which the wake-up keyword has been removed.

When it is determined in step S804 that the wake-up keyword has not been detected, the process proceeds to step S809, in which the device 100 may set a speech recognition model that is not combined with the wake-up keyword model. In step S810, the device 100 may perform recognition processing on the voice signal of the user 101 by using the speech recognition model. In step S811, the device 100 may control the device 100 by using the speech recognition result.
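
The two branches of FIG. 8 can be summarized as a control-flow sketch. Everything named here is hypothetical: the voice signal is stood in for by a plain string, the detector and recognizer are injected placeholder functions, and the keyword is removed by prefix matching as a simplification of the tag-based removal described for step S807.

```python
def recognize_on_device(voice_signal, detect_keyword, recognize, keyword):
    """Hypothetical sketch of steps S803 to S811; signals are plain strings here."""
    detected = detect_keyword(voice_signal)                    # S803, S804
    # S805/S806 when the keyword was detected, S809/S810 otherwise.
    text = recognize(voice_signal, with_keyword_model=detected)
    if detected and text.lower().startswith(keyword.lower()):  # S807 (simplified)
        text = text[len(keyword):].lstrip(" ,")
    return text                                                # used in S808 or S811


# Stand-ins (assumptions): the "signal" is already text and recognition echoes it.
def fake_detect(signal):
    return signal.lower().startswith("hi")


def fake_recognize(signal, with_keyword_model):
    return signal


command = recognize_on_device("Hi, play music", fake_detect, fake_recognize, "hi")
plain = recognize_on_device("play music", fake_detect, fake_recognize, "hi")
```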

FIG. 8 may be modified, as in FIG. 6 described above, so that a plurality of wake-up keyword models are registered based on environment information and used to recognize a voice signal.

Any of FIG. 2, 6, 7, or 8 described above may be modified so that a plurality of wake-up keyword models are registered regardless of environment information and used to recognize a voice signal. The plurality of wake-up keyword models may be set per user. When a plurality of wake-up keyword models are registered, each wake-up keyword model may include identification information that identifies its wake-up keyword.

FIG. 9 is a functional block diagram of the device 100 according to some embodiments.

Referring to FIG. 9, the device 100 includes an audio input unit 910, a communication unit 920, a processor 930, a display 940, a user input unit 950, and a memory 960.

The audio input unit 910 may receive a voice signal of the user 101. The audio input unit 910 may also receive a sound (audio signal) based on the specific gesture of the user 101 described above.

The audio input unit 910 may receive an audio signal input from outside the device 100. The audio input unit 910 may convert the received audio signal into an electrical audio signal and transmit it to the processor 930. The audio input unit 910 may be configured to perform operations based on various noise removal algorithms in order to remove noise generated while an external sound signal is being received. The audio input unit 910 may be implemented as a microphone.

The communication unit 920 may be configured to connect the device 100 and the speech recognition server 110 in a wired and/or wireless manner. The communication unit 920 may be configured like the communication unit 1040 described below with reference to FIG. 10.

The processor 930 may be referred to as a controller that controls the operation of the device 100. The processor 930 may control the audio input unit 910, the communication unit 920, the display 940, the user input unit 950, and the memory 960. When a voice signal of the user 101 is received through the audio input unit 910, the processor 930 may perform speech recognition processing using the wake-up keyword model in real time.

The processor 930 may register a wake-up keyword model in the memory 960. The processor 930 may register, in the memory 960, a wake-up keyword model received from the speech recognition server 110 through the communication unit 920. The processor 930 may request a wake-up keyword model based on the voice signal of the user 101 while transmitting the voice signal of the user 101, received through the audio input unit 910, to the speech recognition server 110.

The processor 930 may transmit the wake-up keyword model registered in the memory 960 to the speech recognition server 110 through the communication unit 920. When a wake-up keyword model request signal is received from the speech recognition server 110 through the communication unit 920, the processor 930 may transmit the registered wake-up keyword model to the speech recognition server 110. The processor 930 may also transmit the registered wake-up keyword model to the speech recognition server 110 at the same time as the wake-up keyword model is registered in the memory 960.

As a voice signal of the user 101 is received through the audio input unit 910, the processor 930 may detect the wake-up keyword from the received voice signal of the user 101 by using the wake-up keyword model registered in the memory 960. The processor 930 may transmit a signal indicating whether the wake-up keyword was detected, together with the received voice signal of the user 101, to the speech recognition server 110 through the communication unit 920.

The processor 930 may receive a speech recognition result from the speech recognition server 110 through the communication unit 920. The processor 930 may control the device 100 according to the received speech recognition result.

When an audio signal for registering a wake-up keyword model is received through the audio input unit 910, the processor 930 may determine, based on a matching rate for the audio signal, whether the audio signal is usable as the wake-up keyword model, as in step S303 of FIG. 3 described above.

The processor 930 may register, in the memory 960, a candidate wake-up keyword model selected from among the candidate wake-up keyword models stored in the memory 960 according to a user input received through the user input unit 950.

The processor 930 may be divided into a main processor and a sub-processor depending on the configuration of the device 100. The sub-processor may be set as a low-power processor.

The display 940 may be controlled by the processor 930 to display the candidate wake-up keywords requested by the user 101. The display 940 may include a liquid crystal display, a thin film transistor-liquid crystal display, an organic light-emitting diode display, a flexible display, a three-dimensional (3D) display, or an electrophoretic display (EPD). The display 940 may include, for example, a touch screen, but the present disclosure does not limit the configuration of the display 940 to the above.

The user input unit 950 may receive a user input to the device 100. The user input unit 950 may receive a user input indicating a wake-up keyword registration request, a user input selecting one of the candidate wake-up keywords, and/or a user input indicating registration of the selected candidate wake-up keyword. In the present disclosure, the user input received through the user input unit 950 is not limited to the above. The user input unit 950 transmits the received user input to the processor 930.

The memory 960 may store the wake-up keyword model. The memory 960 may store programs for the processing and control performed by the processor 930. The programs stored in the memory 960 may include an operating system (OS) program and various application programs. The application programs may include a speech recognition application according to embodiments of the present disclosure, a camera application, and the like. The memory 960 may store information managed by the application programs (for example, wake-up keyword usage history information of the user 101, schedule information of the user 101, or profile information of the user 101).

The programs stored in the memory 960 may be classified into a plurality of modules according to their functions. The plurality of modules may include, for example, a mobile communication module, a Wi-Fi module, a Bluetooth module, a DMB module, a camera module, a sensor module, a GPS module, a video playback module, an audio playback module, a power module, a touch screen module, a UI module, and/or an application module.

The memory 960 may include a storage medium of a flash memory type, a hard disk type, a multimedia card micro type, or a card type (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disk.

FIG. 10 is a block diagram of the device 100 according to another embodiment of the present invention.

Referring to FIG. 10, the device 100 includes a sensor unit 1010, a user interface unit 1020, a memory 1030, a communication unit 1040, an image processing unit 1050, an audio output unit 1060, an audio input unit 1070, a camera 1080, and a processor 1090.

The device 100 may include a battery. The battery may be built into the device 100 or may be included in the device 100 in a detachable form. The battery may supply power to all components included in the device 100. The device 100 may receive power from an external power supply (not shown) through the communication unit 1040. The device 100 may further include a connector that can be connected to the external power supply.

The processor 1090, the display 1021 and the user input unit 1022 included in the user interface unit 1020, the memory 1030, the audio input unit 1070, and the communication unit 1040 shown in FIG. 10 may be regarded as components similar or identical to the processor 930, the display 940, the user input unit 950, the memory 960, the audio input unit 910, and the communication unit 920 shown in FIG. 9.

The programs stored in the memory 1030 may be classified into a plurality of modules according to their functions. For example, the programs stored in the memory 1030 may be classified into a UI module 1031, a notification module 1032, an application module 1033, and the like, but the present disclosure is not limited thereto. For example, the programs stored in the memory 1030 may be classified into a plurality of modules as described for the memory 960 of FIG. 9.

The UI module 1031 may provide the processor 1090 with GUI information for registering the wake-up keyword for speech recognition mentioned in the preferred embodiments, GUI information indicating a speech recognition result (for example, text information), GUI information indicating a speech recognition waveform, and the like. The processor 1090 may display, on the display 1021, a screen based on the GUI information received from the UI module 1031. The UI module 1031 may provide the processor 1090 with a UI and/or GUI specialized for each application installed in the device 100.

The notification module 1032 may generate a notification of speech recognition by the device 100, a notification of wake-up keyword registration, a notification of an incorrect wake-up keyword input, a notification of wake-up keyword recognition, or the like, but the notifications generated by the notification module 1032 are not limited to the above.

The notification module 1032 may output a notification signal in the form of a video signal through the display 1021 and may output a notification signal in the form of an audio signal through the audio output unit 1060, but is not limited thereto.

The application module 1033 may include various applications, including the speech recognition application mentioned in the embodiments of the present disclosure.

The communication unit 1040 may include one or more components for communication between the device 100 and at least one external device (for example, the speech recognition server 110, a smart TV, a smart watch, a smart mirror, and/or an IoT network-based device). For example, the communication unit 1040 may include at least one of a short-range communicator 1041, a mobile communicator 1042, and a broadcast receiver 1043, but the components included in the communication unit 1040 are not limited thereto.

The short-range communicator (short-range wireless communicator) 1041 may include a Bluetooth communication module, a Bluetooth Low Energy (BLE) communication module, a near field communication (NFC, RFID) module, a WLAN (Wi-Fi) communication module, a Zigbee communication module, an Ant+ communication module, a Wi-Fi Direct (WFD) communication module, a beacon communication module, or an ultra wideband (UWB) communication module, but is not limited thereto. For example, the short-range communicator 1041 may include an Infrared Data Association (IrDA) communication module.

The mobile communicator 1042 may transmit and receive wireless signals to and from at least one of a base station, an external device, and a server over a mobile communication network. Here, the wireless signals may include a voice call signal, a video call signal, or various forms of data associated with the transmission and reception of text/multimedia messages.

The broadcast receiver 1043 may receive a broadcast signal and/or broadcast-related information from the outside through a broadcast channel. The broadcast channel may include at least one of a satellite channel, a terrestrial channel, and a radio channel, but is not limited thereto.

According to a preferred embodiment, the communication unit 1040 may transmit at least one piece of information generated by the device 100 to at least one external device, or may receive information transmitted from at least one external device.

The sensor unit 1010 may include a proximity sensor 1011 that senses whether the user 101 approaches the device 100; a biosensor 1012 (or health sensor, for example, a heart rate sensor, a blood flow sensor, a diabetes sensor, a blood pressure sensor, and/or a stress sensor) that senses health information of the user 101 of the device 100; an illuminance sensor 1013 (or optical sensor, LED sensor) that senses illumination around the device 100; a moodscope sensor 1014 that senses the mood of the user 101 of the device 100; a motion sensor 1015 that senses activity; a position sensor 1016 (for example, a Global Positioning System (GPS) receiver) for detecting the position of the device 100; a gyroscope sensor 1017 that measures the azimuth of the device 100; an accelerometer sensor 1018 that measures the tilt, acceleration, and the like of the device 100 relative to the ground surface; and/or a geomagnetic sensor 1019 that senses the north, south, east, and west bearings relative to the device 100. However, the sensors included in the sensor unit 1010 are not limited to the above in the present disclosure.

For example, the sensor unit 1010 may include a temperature/humidity sensor, a gravity sensor, an altitude sensor, a chemical sensor (for example, an odorant sensor), an atmospheric pressure sensor, a fine dust sensor, an ultraviolet sensor, an ozone sensor, a carbon dioxide (CO2) sensor, and/or a network sensor (for example, a network sensor based on Wi-Fi, Bluetooth, 3G, Long Term Evolution (LTE), and/or Near Field Communication (NFC)), but is not limited thereto.

The sensor unit 1010 may include a pressure sensor (for example, a touch sensor, a piezoelectric sensor, or a physical button), a state sensor (for example, an earphone terminal, a Digital Multimedia Broadcasting (DMB) antenna, or a standard terminal, such as a terminal that can recognize whether charging is in progress, a terminal that can recognize whether a personal computer (PC) is connected, or a terminal that can recognize whether a dock is connected), and/or a time sensor, but is not limited thereto.

The sensor unit 1010 may include fewer sensors than those shown in FIG. 10. For example, the sensor unit 1010 may include only the position sensor 1016. When the sensor unit 1010 includes only the position sensor 1016, the sensor unit 1010 may be referred to as a GPS receiver.

A result (or sensing value) sensed by the sensor unit 1010 is transmitted to the processor 1090. When the sensing value received from the sensor unit 1010 indicates a position, the processor 1090 may determine, based on the received sensing value, whether the current position of the device 100 is home or the office.
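
One way the home/office determination could work is to compare the sensed position with stored reference positions. This is a sketch under assumptions that the disclosure does not state: the reference coordinates, the 0.2 km radius, the `"unknown"` fallback, and the function names are hypothetical, and the haversine great-circle distance is simply one common choice of distance measure.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def classify_location(lat, lon, places, radius_km=0.2):
    """Return the name of the first known place within radius_km, else 'unknown'."""
    for name, (plat, plon) in places.items():
        if haversine_km(lat, lon, plat, plon) <= radius_km:
            return name
    return "unknown"


# Hypothetical reference positions registered for the user.
places = {"home": (37.5000, 127.0000), "office": (37.5600, 126.9800)}
environment = classify_location(37.5001, 127.0001, places)
```

The returned label could then serve as the environment information used to select a wake-up keyword model.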

The processor 1090 may operate as a controller that controls the overall operation of the device 100. For example, by executing the programs stored in the memory 1030, the processor 1090 may control the sensor unit 1010, the memory 1030, the user interface unit 1020, the image processing unit 1050, the audio output unit 1060, the audio input unit 1070, the camera 1080, and/or the communication unit 1040 overall.

The processor 1090 may operate like the processor 930 of FIG. 9. With respect to an operation of reading data from the memory 1030, the processor 1090 may perform an operation of receiving the data from an external device through the communication unit 1040. With respect to an operation of writing data to the memory 1030, the processor 1090 may perform an operation of transmitting the data to an external device through the communication unit 1040.

The processor 1090 may perform at least one of the operations described above with reference to FIG. 2, 3, 4, 6, 7, or 8. The processor 1090 may be referred to as a controller that controls the above-described operations.

The image processing unit 1050 may process image data received from the communication unit 1040 or stored in the memory 1030 so that the image data can be displayed on the display 1021.

The audio output unit 1060 may output audio data received from the communication unit 1040 or stored in the memory 1030. The audio output unit 1060 may output a sound signal (for example, a notification sound) related to a function performed by the device 100.

The audio output unit 1060 may include a speaker, a buzzer, or the like, but is not limited thereto.

FIG. 11 is a functional block diagram of the speech recognition server 110 according to some embodiments.

Referring to FIG. 11, the speech recognition server 110 includes a communication unit 1110, a processor 1120, and a memory 1130, but the configuration of the speech recognition server 110 is not limited to that shown in FIG. 11. That is, the speech recognition server 110 may include more or fewer components than those shown in FIG. 11.

The communication unit 1110 may be configured like the communication unit 1040 shown in FIG. 10. The communication unit 1110 may transmit and receive speech recognition-related signals to and from the device 100.

The processor 1120 may perform the operations of the speech recognition server 110 described above with reference to FIG. 2, FIG. 6, or FIG. 7.

The memory 1130 may store a wake-up keyword model 1131 and a speech recognition model 1132 and, under the control of the processor 1120, may provide the wake-up keyword model 1131 and the speech recognition model 1132 to the processor 1120. The speech recognition model 1132 may be referred to as a model for recognizing voice commands.

The wake-up keyword model 1131 and the speech recognition model 1132 stored in the memory 1130 may be updated according to information received through the communication unit 1110. The wake-up keyword model 1131 and the speech recognition model 1132 stored in the memory 1130 may also be updated by information input by an operator. To this end, the speech recognition server 110 may further include a component through which the operator can input information.

FIG. 12 is a block diagram of a speech recognition system 1200 according to some other embodiments. FIG. 12 illustrates a case in which the speech recognition server 110 recognizes voice signals of the user 101 received from a plurality of devices 100, 1210, 1220, 1230, 1240, and 1250.

The plurality of devices may include wearable glasses 1210, a smart watch 1220, an IoT device 1230, an IoT sensor 1240, and/or a smart TV 1250.

The plurality of devices 100, 1210, 1220, 1230, 1240, and 1250 may have the same user or different users. When the plurality of devices 100, 1210, 1220, 1230, 1240, and 1250 have the same user, the speech recognition server 110 may register a wakeup keyword model for each device and perform the speech recognition function. When the plurality of devices 100, 1210, 1220, 1230, 1240, and 1250 have different users, the speech recognition server 110 may register wakeup keyword models using the identification information of each device and the identification information of each device's user, and perform the speech recognition function. Accordingly, the speech recognition system 1200 of the present disclosure can provide more diverse and more accurate speech recognition services. The speech recognition server 110 may provide the registered wakeup keyword models to the plurality of devices 100, 1210, 1220, 1230, 1240, and 1250.
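The two registration cases above (one model per device when the user is the same, and one model per device and user when users differ) can be illustrated with a simple keyed registry. The sketch below is in Python and rests on assumptions: the class `WakeupKeywordRegistry`, the string identifiers, and the fallback from a per-user entry to a per-device entry are hypothetical choices, not details given in the disclosure.

```python
class WakeupKeywordRegistry:
    """Hypothetical registry of wakeup keyword models on the server."""

    def __init__(self):
        # Keys are (device_id, user_id); user_id is None for per-device entries.
        self._models = {}

    def register(self, device_id, model, user_id=None):
        """Register a model per device, or per device and user if user_id is given."""
        self._models[(device_id, user_id)] = model

    def lookup(self, device_id, user_id=None):
        """Prefer the (device, user) entry; fall back to the per-device entry."""
        key = (device_id, user_id)
        if key in self._models:
            return self._models[key]
        return self._models.get((device_id, None))


registry = WakeupKeywordRegistry()
registry.register("smart_tv_1250", "model_tv")                    # same user
registry.register("smart_watch_1220", "model_a", user_id="user_a")  # different users
registry.register("smart_watch_1220", "model_b", user_id="user_b")
```

Looking up `("smart_watch_1220", "user_b")` returns that user's model, while a lookup for the smart TV returns its per-device model regardless of the user identifier supplied.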

In addition, because the speech recognition server 110 recognizes the wakeup keyword and the voice command continuously, it may use the portion of the voice signal other than the wakeup keyword to estimate the noise level around the plurality of devices 100, 1210, 1220, 1230, 1240, and 1250 or to recognize environment information. By providing the estimated noise level or the recognized environment information to the plurality of devices 100, 1210, 1220, 1230, 1240, and 1250 together with the speech recognition result, the speech recognition server 110 allows that information to be used to control the plurality of devices 100, 1210, 1220, 1230, 1240, and 1250 or to be presented to the user.
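The disclosure does not specify how the noise level is estimated. As one rough illustration only, the Python sketch below computes the RMS level, in decibels relative to full scale, of the samples that lie outside the wakeup keyword span. The function name, the use of normalized floating-point samples, and the keyword boundary indices are assumptions; treating everything outside the keyword as background is a crude proxy, since that portion may also contain the voice command.

```python
import math


def estimate_noise_level_db(samples, keyword_start, keyword_end):
    """Estimate a level in dBFS from samples outside [keyword_start, keyword_end).

    `samples` is a list of floats in [-1.0, 1.0]. Returns None when no samples
    remain outside the keyword span, and -inf for pure digital silence.
    """
    rest = samples[:keyword_start] + samples[keyword_end:]
    if not rest:
        return None
    rms = math.sqrt(sum(s * s for s in rest) / len(rest))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms)
```

A server could attach the returned value to the speech recognition result sent back to the device; how the device uses it is left open here, as in the text.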

The network 1260 may be a wired and/or wireless network. The network 1260 may transmit and receive data between the plurality of devices 100, 1210, 1220, 1230, 1240, and 1250 and the server 110 based on at least one of the communication schemes mentioned for the communication unit 1040 shown in FIG. 10.

The methods shown in FIG. 2, FIG. 3, FIG. 4, FIG. 6, FIG. 7, or FIG. 8 described above may be implemented by a computer program. For example, the operations of the device 100 in FIG. 2 may be performed by a speech recognition application installed on the device 100. The operations of the speech recognition server 110 shown in FIG. 2 may be performed by a speech recognition application installed on the speech recognition server 110. The computer program may run in the environment of an operating system installed on the device 100. The computer program may run in the environment of an operating system installed on the speech recognition server 110. The device 100 may write the computer program to a storage medium and read it from the storage medium for use. The speech recognition server 110 may write the computer program to a storage medium and read it from the storage medium for use.

An embodiment of the present disclosure may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, other data in a modulated data signal such as a carrier wave, or another transport mechanism, and include any information delivery medium.

The foregoing description of the present disclosure is illustrative, and those of ordinary skill in the art to which the present disclosure pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present disclosure. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in combined form.

The scope of the present disclosure is indicated by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

1. A device comprising:
an audio input unit for receiving a user's voice signal;
a memory for storing a wakeup keyword model;
a communication unit capable of communicating with a speech recognition server; and
a processor configured to detect a wakeup keyword from the user's voice signal using the wakeup keyword model as the user's voice signal is received through the audio input unit,
transmit a detection signal for the wakeup keyword and the user's voice signal to the speech recognition server through the communication unit,
receive a speech recognition result from the speech recognition server through the communication unit, and
control the device according to the speech recognition result.

2. The device of claim 1, further comprising an environment information sensing unit for sensing environment information based on the device,
wherein the wakeup keyword model includes a plurality of wakeup keyword models based on environment information, and
wherein the processor detects the wakeup keyword from the user's voice signal using, among the plurality of wakeup keyword models, the wakeup keyword model corresponding to the environment information based on the device sensed through the environment information sensing unit.

3. The device of claim 2, wherein the environment information includes location information of the device.

4. The device of claim 2, wherein the processor
acquires identification information of the user based on the user's voice signal received through the audio input unit, and
transmits the acquired identification information of the user to the speech recognition server through the communication unit.

5. The device of claim 1, wherein, when an audio signal for registering the wakeup keyword model is received through the audio input unit,
the processor determines whether the audio signal is valid as the wakeup keyword model based on a matching rate for the audio signal.

6. The device of claim 1, further comprising a user input unit for receiving a user input,
wherein the processor registers a candidate wakeup keyword model selected, according to the user input received through the user input unit, from among candidate wakeup keyword models stored in the memory.

7. The device of claim 1, wherein the processor
receives the wakeup keyword model from the speech recognition server through the communication unit, and
registers the received wakeup keyword model in the memory.

8. The device of claim 1, wherein the processor
generates the wakeup keyword model based on the user's voice signal received through the audio input unit, and
registers the generated wakeup keyword model in the memory.

9. A speech recognition server comprising:
a communication unit capable of communicating with at least one device;
a memory for storing a wakeup keyword model and a speech recognition model; and
a processor configured to set a speech recognition model combined with the wakeup keyword model as a detection signal for a wakeup keyword and a voice signal of a user are received from one of the at least one device through the communication unit,
recognize the user's voice signal using the set speech recognition model,
remove the wakeup keyword from a speech recognition result for the user's voice signal, and
transmit the speech recognition result from which the wakeup keyword has been removed to the device through the communication unit.

10. The speech recognition server of claim 9, wherein the wakeup keyword model includes a plurality of wakeup keyword models based on environment information, and
wherein the processor, as environment information based on the device is received through the communication unit, sets a speech recognition model combined with, among the plurality of wakeup keyword models, the wakeup keyword model corresponding to the environment information based on the device.

11. The speech recognition server of claim 9, wherein the processor,
as identification information of the user is received through the communication unit, sets a speech recognition model combined with a wakeup keyword model associated with the identification information of the user and the environment information based on the device.

12. A speech recognition system comprising:
a device for detecting a wakeup keyword from a voice signal of a user; and
a speech recognition server for setting a speech recognition model combined with a wakeup keyword model as a detection signal for the wakeup keyword and the voice signal of the user are received from the device,
recognizing the user's voice signal using the set speech recognition model, and transmitting the speech recognition result to the device.

13. A speech recognition method comprising:
detecting a wakeup keyword from a user's voice signal using a wakeup keyword model as the user's voice signal is received;
transmitting a detection signal for the wakeup keyword and the user's voice signal to a speech recognition server;
receiving a recognition result for the user's voice signal from the speech recognition server; and
controlling the device according to the recognition result for the user's voice signal.

14. The method of claim 13, wherein the wakeup keyword model includes a plurality of wakeup keyword models based on environment information,
the method further comprising:
detecting environment information based on the device; and
transmitting the detected environment information based on the device to the speech recognition server,
wherein detecting the wakeup keyword comprises
detecting the wakeup keyword from the user's voice signal using, among the plurality of wakeup keyword models, the wakeup keyword model corresponding to the detected environment information based on the device.

15. The method of claim 14, further comprising:
acquiring identification information of the user based on the user's voice signal; and
transmitting the acquired identification information of the user to the speech recognition server.

16. A speech recognition method comprising:
receiving, from a device, a detection signal for a wakeup keyword and a voice signal of a user;
setting a speech recognition model according to the detection signal for the wakeup keyword;
recognizing the user's voice signal using the speech recognition model;
removing the wakeup keyword from a recognition result for the user's voice signal; and
transmitting, to the device, the recognition result for the user's voice signal from which the wakeup keyword has been removed.

17. The method of claim 16,
further comprising receiving environment information based on the device from the device,
wherein setting the speech recognition model comprises
setting a speech recognition model combined with, among a plurality of wakeup keyword models based on environment information, the wakeup keyword model corresponding to the environment information based on the device.

18. The method of claim 16,
further comprising receiving identification information of the user from the device,
wherein setting the speech recognition model comprises
setting a speech recognition model combined with a wakeup keyword model associated with the environment information based on the device and the identification information of the user.

19. A speech recognition method comprising:
registering a wakeup keyword model in a device and a speech recognition server;
detecting a wakeup keyword from a user's voice signal using the wakeup keyword model as the user's voice signal is received through the device;
transmitting a detection signal for the wakeup keyword and the user's voice signal from the device to the speech recognition server;
setting, at the speech recognition server, a speech recognition model according to the detection signal for the wakeup keyword;
recognizing, at the speech recognition server, the user's voice signal using the speech recognition model;
removing, at the speech recognition server, the wakeup keyword from a recognition result for the user's voice signal;
transmitting, from the speech recognition server to the device, the recognition result for the user's voice signal from which the wakeup keyword has been removed; and
controlling, at the device, the device according to the received speech recognition result.

20. A computer-readable recording medium having recorded thereon a program for causing a computer to execute the method according to any one of claims 13 to 15.

21. A computer-readable recording medium having recorded thereon a program for causing a computer to execute the method according to any one of claims 16 to 18.