KR101728941B1

KR101728941B1 - Application operating apparatus based on voice recognition and Control method thereof

Info

Publication number: KR101728941B1
Application number: KR1020150016652A
Authority: KR
Inventors: 이현우
Original assignee: 주식회사 시그널비젼
Priority date: 2015-02-03
Filing date: 2015-02-03
Publication date: 2017-04-20
Also published as: KR20160095418A

Abstract

The present invention discloses a speech recognition-based application driving apparatus and a control method thereof. According to one embodiment of the present invention, there is provided a speech recognition-based application driving apparatus comprising a processor and a memory coupled to the processor, wherein the memory stores program instructions executable by the processor to: run in a background state; store a plurality of trigger commands; determine whether an input user voice corresponds to one of the stored trigger commands; and, if the user voice corresponds to one of the stored trigger commands, call an application according to the corresponding trigger command.

Description


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition-based application driving apparatus and a control method thereof, and more particularly, to a speech recognition-based application driving apparatus and control method that allow various applications to be executed immediately with a single utterance.

The frequency of an uttered voice changes with the shape of the user's mouth and the position of the tongue. Speech recognition technology converts the user's uttered voice into an electrical signal and then extracts and recognizes the frequency characteristics of that signal.
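To make "extracting frequency characteristics" concrete, the following is a minimal sketch, not the method of this patent: a naive discrete Fourier transform over one frame of samples, with a pure 250 Hz tone standing in for a voiced speech frame. The function names are hypothetical, and practical recognizers typically use FFT-based features (for example MFCCs) rather than a single dominant frequency.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT of one frame; returns magnitudes of bins 0..N/2 (Nyquist)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def dominant_frequency(frame, sample_rate):
    """Return the frequency in Hz of the strongest bin in the frame."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    # Bin spacing is sample_rate / frame length.
    return peak_bin * sample_rate / len(frame)


# A 250 Hz tone sampled at 8 kHz: 256 samples give 31.25 Hz bins, so 250 Hz is bin 8.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 250 * t / sample_rate) for t in range(256)]
print(dominant_frequency(frame, sample_rate))  # 250.0
```

The frame length was chosen so the tone falls exactly on a bin; with real speech, energy spreads across many bins and the whole spectrum, not just its peak, is used as the feature.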

Recently, speech recognition technology has come to be used to control mobile devices.

FIG. 1 is a flowchart illustrating a process of controlling a mobile device based on speech recognition according to the related art.

Referring to FIG. 1, a speech recognition application installed on a mobile device is activated by the user operating a button (step 100).

Here, the speech recognition application may be activated by pressing the home button on the front of the mobile device a predetermined number of times or more, or by selecting the speech recognition application icon on the touch screen.

The activated speech recognition application determines whether the user has spoken one or more preset words (step 102).

For example, the preset words may include phrases such as "Hi Galaxy", "OK Google", and "Siri".

Here, the preset words are trigger commands that cause the speech recognition application to recognize the user voice that is input afterward.

That is, a trigger command is defined as a command that initiates analysis of the user voice input through the microphone.

When the trigger command has been input and another user voice is input afterward, the speech recognition application recognizes that voice (step 104) and executes the function corresponding to the recognized voice (step 106).

According to this related art, controlling a mobile device by speech recognition requires a button operation by the user to activate the speech recognition application, which can inconvenience the user while driving or in other situations where the user cannot use their hands freely.

In addition, because a predetermined trigger command must be input as a preliminary step before a given function can be executed, the user has to go through several steps to perform the function originally desired.

To solve these problems of the related art, the present invention proposes a speech recognition-based application driving apparatus and control method that allow a user to use a desired function with a single voice input.

Other objects of the present invention will be apparent to those skilled in the art from the following embodiments.

To achieve the above objects, according to a preferred embodiment of the present invention, there is provided a speech recognition-based application driving apparatus comprising a processor and a memory coupled to the processor, wherein the memory stores program instructions executable by the processor to: run in a background state; store a plurality of trigger commands; determine whether an input user voice corresponds to one of the stored trigger commands; and, if the user voice corresponds to one of the stored trigger commands, call an application according to the corresponding trigger command.

The application may be an application for at least one of calling, camera, music, video, and navigation.

Each of the plurality of trigger commands may include an identification name and a word for an application operation corresponding to the identification name.

The plurality of trigger commands may be commands for making a call to a predetermined number or fewer of the contacts included in the address book.

The memory may store program instructions executable by the processor to: output, at the user's request, an interface for selecting one or more of a plurality of contacts; identify the one or more contacts selected by the user from among the plurality of contacts; and generate a trigger command for each of the identified contacts.

The memory may store program instructions executable by the processor to: output, at the user's request, an interface for selecting one or more of a plurality of contacts; identify the one or more contacts selected by the user from among the plurality of contacts; and, when a user voice input at a first point in time corresponds to a preset single trigger command and a user voice input at a second point in time corresponds to one of the selected contacts, execute an application for calling that contact.

The memory may store program instructions executable by the processor to determine whether the input user voice corresponds to one of the stored trigger commands depending on the posture of the driving apparatus, the proximity of the user, or the ambient brightness, as sensed by a built-in sensor.

According to another aspect of the present invention, there is provided a speech recognition-based application driving apparatus comprising: one or more sensors for sensing movement of the apparatus, proximity of a user, or ambient brightness; a processor; and a memory coupled to the processor, wherein the memory stores program instructions executable by the processor to: store a plurality of trigger commands; when the sensor indicates that the apparatus has a preset posture or that the user is in proximity, determine whether an input user voice corresponds to one of the stored trigger commands; and, if the user voice corresponds to one of the stored trigger commands, call an application according to the corresponding trigger command.

According to yet another aspect of the present invention, there is provided a speech recognition-based control method of an application driving apparatus, the method comprising: storing a plurality of trigger commands; determining, in a background state, whether an input user voice corresponds to one of the stored trigger commands; and, if the user voice corresponds to one of the stored trigger commands, invoking an application according to the corresponding trigger command.
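The method above can be sketched as a small event handler, under two simplifying assumptions: recognition has already turned each utterance into text, and matching is an exact, case-insensitive lookup (the embodiment described later matches acoustically via rejection and phoneme analysis). `SpeechEngine`, `store_trigger`, and `on_utterance` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechEngine:
    """Hypothetical background engine holding several trigger commands.

    Each stored trigger phrase maps directly to the application it launches,
    so a single utterance suffices: no separate wake word, no button press.
    """
    triggers: dict = field(default_factory=dict)  # phrase -> application name
    launched: list = field(default_factory=list)  # applications invoked so far

    def store_trigger(self, phrase, application):
        self.triggers[phrase.strip().lower()] = application

    def on_utterance(self, recognized_text):
        """Called for every recognized utterance while in the background."""
        application = self.triggers.get(recognized_text.strip().lower())
        if application is None:
            return None  # not a trigger command: ignore it and keep listening
        self.launched.append(application)
        return application


engine = SpeechEngine()
engine.store_trigger("play music", "music_player")
engine.store_trigger("run camera", "camera")
for text in ["what a nice day", "Play music", "run camera"]:
    engine.on_utterance(text)
print(engine.launched)  # ['music_player', 'camera']
```

Ordinary speech that matches no stored trigger is simply dropped, which is what lets the engine stay resident without acting on every sentence it hears.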

According to still another aspect of the present invention, there is provided a computer program stored on a medium and comprising a series of instructions for performing the above method.

According to the present invention, because the speech recognition engine runs in a background state and stores a plurality of trigger commands, the user does not need to go through multiple steps to control the device by speech recognition.

FIG. 1 is a flowchart illustrating a process of controlling a mobile device based on speech recognition according to the related art.
FIG. 2 is a block diagram of a speech recognition-based driving apparatus according to a preferred embodiment of the present invention.
FIGS. 3 and 4 illustrate interfaces for setting a limited number of contacts according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the module configuration of a speech recognition engine according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining a speech recognition process according to the posture of an application driving apparatus according to a preferred embodiment of the present invention.

While the invention is susceptible to various modifications and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail below. However, this is not intended to limit the invention to those specific embodiments; the invention should be understood to include all modifications, equivalents, and alternatives falling within its spirit and scope. Like reference numerals are used for like elements throughout the drawings.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of a speech recognition-based driving apparatus according to a preferred embodiment of the present invention.

As shown in FIG. 2, the driving apparatus according to the present embodiment may include a processor 200 and a memory 202.

The processor 200 may include a central processing unit (CPU) or another virtual machine capable of executing a computer program.

The memory 202 may include non-volatile storage such as a fixed hard drive or a removable storage device. The removable storage device may include a CompactFlash unit, a USB memory stick, and the like. The memory 202 may also include volatile memory such as various types of random access memory.

The memory 202 stores program instructions executable by the processor 200.

The speech recognition-based application driving apparatus according to the present embodiment may be a mobile device having a calling function. The following description focuses on the case where the apparatus is a mobile device, but the invention is not necessarily limited thereto.

According to a preferred embodiment of the present invention, the memory 202 stores program instructions that store a plurality of trigger commands on the mobile device, determine whether an input user voice corresponds to one of those trigger commands, and cause the application corresponding to the input voice to be executed from among a plurality of applications.

In this specification, the application that performs the speech recognition and application execution functions described above is defined as the speech recognition engine.

Preferably, the speech recognition engine according to the present embodiment runs in a background state.

In addition, the speech recognition engine according to the present embodiment stores a plurality of trigger commands.

As described above, a trigger command is a command that initiates the process of recognizing the user's voice; conventionally, only a single trigger command was used.

In contrast, the speech recognition engine according to the present embodiment stores a plurality of trigger commands and determines whether the input user voice corresponds to one of them.

Here, each trigger command may include an identification word and a word for the corresponding application operation, for example "AAA call" (outgoing), "BBB call" (incoming), "answer call" (incoming), "play music", "play video", "run camera", and "run navigation".

Here, the identification word may be a contact included in the address book (the other party's name) or a function-related word such as "music" or "camera".

According to the present embodiment, when a user voice corresponding to a trigger command is input, the speech recognition engine running in the background calls the application corresponding to the identification word without requiring any further action from the user, so that a call can be placed (or answered) or music can be played.

To prevent malfunctions, the number of trigger commands may be limited to a predetermined number or fewer, for example 20 or 30.
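A cap like this could be enforced where triggers are registered. The sketch below is a hypothetical `TriggerList` that refuses new entries beyond `max_triggers`; the default of 20 follows the example value in the text, while the class, exception, and method names are assumptions for illustration.

```python
class TriggerLimitError(Exception):
    """Raised when registering one more trigger would exceed the cap."""


class TriggerList:
    """Hypothetical trigger command list with a fixed upper bound.

    A small vocabulary lowers the chance that ordinary speech is
    mistaken for one of the stored trigger commands.
    """

    def __init__(self, max_triggers=20):
        self.max_triggers = max_triggers
        self._phrases = []

    def add(self, phrase):
        if phrase in self._phrases:
            return  # already registered; re-adding is a no-op
        if len(self._phrases) >= self.max_triggers:
            raise TriggerLimitError(
                f"at most {self.max_triggers} trigger commands may be stored")
        self._phrases.append(phrase)

    def __len__(self):
        return len(self._phrases)

    def __contains__(self, phrase):
        return phrase in self._phrases


triggers = TriggerList(max_triggers=2)
triggers.add("play music")
triggers.add("run camera")
try:
    triggers.add("run navigation")
except TriggerLimitError as err:
    print(err)  # at most 2 trigger commands may be stored
```

Raising an error rather than silently dropping the entry lets a settings screen tell the user why a further contact or application cannot be enabled.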

According to a preferred embodiment of the present invention, the trigger commands may include commands for making (dialing) a call to a predetermined number or fewer of the contacts included in the address book.

FIGS. 3 and 4 illustrate interfaces for setting a limited number of contacts according to an embodiment of the present invention.

FIG. 3 shows the initial interface. When the user selects the contact selection menu 300 in the initial interface of FIG. 3, a contact list interface including check boxes for selecting one or more of a plurality of contacts is output, as shown in FIG. 4.

As shown in FIG. 4, the contact list interface may include a wake-up check box 400 and a command check box 402 for each contact.

When the wake-up check box 400 is selected, the speech recognition engine generates a trigger command for the selected contact.

For example, when the wake-up check box 400 is selected for a contact named AAA, the speech recognition engine generates the trigger command "AAA call". Later, when a user voice containing "AAA call" is input, the engine calls the calling application so that a call is placed to the telephone number corresponding to AAA.
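A minimal sketch of this wake-up path follows, assuming contacts arrive as simple records carrying the two check-box states and that a trigger matches when its phrase appears anywhere in the recognized text. The `Contact` record, the "<name> call" phrase format, the cap default, and the dummy phone numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    phone: str
    wake_up: bool = False  # wake-up check box 400
    command: bool = False  # command check box 402


def build_call_triggers(contacts, max_triggers=20):
    """Create a '<name> call' trigger for each wake-up-checked contact, up to a cap."""
    triggers = {}
    for contact in contacts:
        if contact.wake_up and len(triggers) < max_triggers:
            triggers[f"{contact.name} call".lower()] = contact.phone
    return triggers


def handle_utterance(text, triggers):
    """Return the number to dial if the utterance contains a call trigger."""
    lowered = text.lower()
    for phrase, phone in triggers.items():
        if phrase in lowered:
            return phone  # the calling application would dial this number
    return None


contacts = [Contact("AAA", "555-0101", wake_up=True),
            Contact("BBB", "555-0102", command=True)]
triggers = build_call_triggers(contacts)
print(handle_utterance("AAA call please", triggers))  # 555-0101
print(handle_utterance("BBB call", triggers))         # None
```

BBB gets no direct trigger here because only its command check box is ticked; that contact is reachable only through the two-step path using the preset single trigger command.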

As described above, trigger commands are generated only for a predetermined number or fewer of the contacts in order to prevent the speech recognition engine from malfunctioning in response to continuously input user speech.

According to the present embodiment, the speech recognition engine can call the application requested by the user while remaining in the running state, and can directly control the necessary functions.

Without being limited thereto, the speech recognition engine may instead be terminated while the application requested by the user is running, and may return to the background state after that application finishes.

Here, terminating and later restoring the speech recognition engine may be limited to the period during which the calling application is running. In other words, the speech recognition engine is temporarily suspended while a call is conducted through the microphone.

On the other hand, when the command check box 402 is selected, the conventional two-step procedure applies: the preset single trigger command is input at a first point in time, and the calling application is called when a user voice containing the name of the contact whose command check box 402 is selected, together with "call", is input afterward (at a second point in time).
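The two-step path can be sketched as a tiny state machine: the first utterance must be the preset single trigger command, which arms the engine, and the next utterance is checked for "<contact name> call". The wake phrase "hello engine", the class name, and the one-command-per-wake-up rule are assumptions made for illustration.

```python
class TwoStageCaller:
    """Hypothetical handler for contacts whose command check box is ticked.

    Stage 1: only the preset single trigger command is accepted; it arms the engine.
    Stage 2: the next utterance is searched for '<contact name> call'.
    """

    def __init__(self, wake_phrase, command_contacts):
        self.wake_phrase = wake_phrase.lower()
        self.command_contacts = {name.lower(): phone
                                 for name, phone in command_contacts.items()}
        self.armed = False

    def on_utterance(self, text):
        lowered = text.lower()
        if not self.armed:
            # First point in time: arm only on the exact preset trigger.
            self.armed = lowered == self.wake_phrase
            return None
        self.armed = False  # assume one command per wake-up
        for name, phone in self.command_contacts.items():
            if f"{name} call" in lowered:
                return phone  # second point in time: dial this contact
        return None


caller = TwoStageCaller("hello engine", {"BBB": "555-0102"})
print(caller.on_utterance("BBB call"))      # None (not armed yet)
print(caller.on_utterance("hello engine"))  # None (now armed)
print(caller.on_utterance("BBB call"))      # 555-0102
```

Compared with the wake-up path, this costs an extra utterance but keeps command-checked contacts out of the always-listening trigger vocabulary.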

Applications can be executed according to a plurality of trigger commands in this way because both the recognition performance and the rejection performance of the speech recognition engine have been improved.

FIG. 5 is a diagram illustrating the module configuration of a speech recognition engine according to an embodiment of the present invention.

Referring to FIG. 5, the speech recognition engine according to the present embodiment may include a preprocessing module 500, a rejection module 502, a garbage acoustic model 504, a phoneme analysis module 506, a phoneme model 508, a trigger command determination module 510, a trigger command list 512, and an application execution module 514.

The preprocessing module 500 detects the segment of the input user voice to be recognized, removes noise, and extracts features of the input voice.

The rejection module 502 refers to the garbage acoustic model 504 to determine whether the input user voice is garbage.

Only user voice that is not garbage is passed to the phoneme analysis module 506.

The phoneme analysis module 506 refers to the phoneme model 508 to perform phoneme analysis on the user voice.

The trigger command determination module 510 refers to the trigger command list 512 to determine whether the input user voice corresponds to one of the pre-stored trigger commands.

When the user voice corresponds to one of the trigger commands, the application execution module 514 executes the application corresponding to the analyzed user voice.

According to the present embodiment, the trigger command list 512 can be updated through the user's selection of contacts and of the applications to be operated by speech recognition.
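The order of the FIG. 5 modules can be illustrated with a toy pipeline. This is only a structural sketch: real acoustic and garbage models are replaced by hypothetical stand-ins in which each frame is an `(energy, phoneme_label)` pair, a `'?'` label counts as garbage, and the thresholds are arbitrary. The function names map loosely onto modules 500, 502, 506, and 510/514.

```python
def preprocess(frames, energy_floor=0.1):
    """Module 500 stand-in: keep only frames loud enough to be speech."""
    return [f for f in frames if f[0] >= energy_floor]


def is_garbage(frames, max_unknown_ratio=0.3):
    """Module 502 stand-in: reject input dominated by '?' (garbage) frames.

    A real engine would compare acoustic scores against a garbage acoustic model.
    """
    if not frames:
        return True
    unknown = sum(1 for _, label in frames if label == "?")
    return unknown / len(frames) > max_unknown_ratio


def phonemes(frames):
    """Module 506 stand-in: collapse repeated frame labels into a phoneme string."""
    out = []
    for _, label in frames:
        if label != "?" and (not out or out[-1] != label):
            out.append(label)
    return "".join(out)


def recognize(frames, trigger_list):
    """Modules 500 -> 502 -> 506 -> 510; returns the app module 514 would run."""
    speech = preprocess(frames)
    if is_garbage(speech):
        return None
    return trigger_list.get(phonemes(speech))


frames = [(0.02, "?"), (0.5, "k"), (0.6, "a"), (0.6, "a"), (0.5, "m"),
          (0.4, "e"), (0.5, "r"), (0.6, "a"), (0.01, "?")]
print(recognize(frames, {"kamera": "camera"}))  # camera
```

Putting the rejection step before phoneme analysis means most non-command sound is discarded cheaply, which matters for an engine that listens continuously.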

Rather than relying on a single trigger command, the present invention stores a plurality of trigger commands (for example, at the user's request) and can execute the application the user wants in a single step according to the input user voice.

As shown in FIG. 3, when the user selects the power save mode (Power Save) in the interface displayed while the speech recognition application according to the present embodiment is running, the analysis performed by the speech recognition engine (the rejection decision, phoneme analysis, and trigger command matching) may be carried out only while an external power source is connected or while the built-in sensors indicate that speech-based control of the device is expected.
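The power-save condition reduces to a boolean gate in front of the analysis stages. A minimal sketch, with hypothetical parameter names; how `use_expected` is derived from the sensors is left to the posture, proximity, and brightness checks described below.

```python
def should_analyze(power_save, external_power, use_expected):
    """Decide whether to run rejection, phoneme analysis and trigger matching.

    power_save: the 'Power Save' option selected on the FIG. 3 screen.
    external_power: True while a charger or other external supply is attached.
    use_expected: True when built-in sensors suggest voice control is likely.
    """
    if not power_save:
        return True  # normal mode: always analyze
    return external_power or use_expected


for args in [(False, False, False), (True, False, False),
             (True, True, False), (True, False, True)]:
    print(args, should_analyze(*args))
```

Only the combination "power save on, on battery, no sign of intended use" skips analysis; every other case keeps the engine listening.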

FIG. 6 is a diagram for explaining a speech recognition process according to the posture of an application driving apparatus according to a preferred embodiment of the present invention.

The application driving apparatus according to the present embodiment may include sensors such as an acceleration sensor, a gyro sensor, and a magnetic field sensor, which sense the acceleration, angular velocity, or magnetic field change caused by movement of the apparatus, as well as a proximity sensor that senses whether the user is nearby.

In addition, according to the present embodiment, a brightness (illuminance) sensor may also be used.

Here, when the brightness sensor detects low illuminance, the device can be judged to be in a pocket; when the illuminance is at or above a predetermined value, the user can be judged to intend to use the mobile device.

Using these built-in sensors, the speech recognition control process may be performed only when the application driving apparatus is in a preset posture, when the user is determined to be in proximity, or when the brightness is at or above a predetermined value.

When the apparatus is a mobile device, the preset posture may be set as a range of tilt angles of the mobile device relative to the ground.

For example, if the state parallel to the ground is defined as 0 degrees, the preset posture for speech recognition may be defined as the mobile device being tilted within a range of 60 to 300 degrees. Here, the angle may be measured about at least one of the x, y, and z axes.
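The sensor conditions above can be combined into one gate. This sketch assumes tilt readings in degrees about each axis (0 = flat), treats the posture as satisfied if any axis falls in the 60 to 300 degree example range, and uses a placeholder lux threshold, since the text only says "a predetermined value". All names and the threshold are hypothetical.

```python
def in_preset_posture(tilt_deg, low=60.0, high=300.0):
    """True if the tilt (0 = flat on the ground) lies in [low, high] degrees."""
    angle = tilt_deg % 360.0  # fold any reading into [0, 360)
    return low <= angle <= high


def voice_control_enabled(tilts_deg, user_near, lux, lux_threshold=50.0):
    """Gate speech analysis on posture, proximity, or ambient brightness.

    tilts_deg: tilt angles about the x, y and z axes; posture holds if at
    least one of them lies inside the preset range.
    lux_threshold: placeholder for the unspecified 'predetermined value'.
    """
    posture_ok = any(in_preset_posture(a) for a in tilts_deg)
    bright_enough = lux >= lux_threshold  # low light suggests a pocket
    return posture_ok or user_near or bright_enough


print(voice_control_enabled([0, 5, 350], user_near=False, lux=3.0))   # False
print(voice_control_enabled([0, 75, 0], user_near=False, lux=3.0))    # True
```

A phone lying flat in a dark pocket fails all three tests and skips analysis, while lifting it to roughly upright or bringing it near the face enables recognition.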

Referring to FIG. 6, the speech recognition engine uses information sensed by the built-in sensors to determine whether the apparatus is in the preset posture or whether the user is in proximity (step 600).

Here, step 600 may be regarded as the process of deciding whether to perform recognition of the user voice.

If the apparatus is determined to be in the preset posture or the user is in proximity, the speech recognition engine analyzes the user voice input thereafter to determine whether it corresponds to one of the trigger commands (step 602).

In step 602, the rejection decision using the garbage acoustic model, phoneme analysis, and the like may be performed.

If the user voice corresponds to one of the trigger commands, the speech recognition engine invokes the application according to the corresponding trigger command (step 604).

The engine then determines whether the called application has terminated (step 606), and after the application terminates, the speech recognition engine returns to the background state (step 608).

Steps 606 to 608 may optionally be performed only when the calling application is called.
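Steps 600 to 608 can be traced with a small function that returns a log of what happens in one cycle. Assumptions: the step 600 sensor check has already produced a boolean, matching is a simple text lookup standing in for step 602, and application termination (step 606) is simulated rather than observed. Per the optional variant above, the engine is suspended and restored only for the calling application; the names are hypothetical.

```python
def run_fig6_cycle(sensors_ok, utterance, triggers, call_apps=("phone",)):
    """One pass through steps 600-608, returning a log of what happened.

    sensors_ok: result of the step 600 posture/proximity check.
    triggers: phrase -> application name (stand-in for step 602 matching).
    call_apps: apps during which the engine is suspended, then restored.
    """
    log = []
    if not sensors_ok:                                  # step 600
        log.append("skip: not in recognition posture")
        return log
    app = triggers.get(utterance.lower())               # step 602
    if app is None:
        log.append("rejected")
        return log
    log.append(f"invoke {app}")                         # step 604
    if app in call_apps:
        log.append("engine suspended")
        log.append(f"{app} terminated")                 # step 606 (simulated)
        log.append("engine back in background")         # step 608
    return log


table = {"aaa call": "phone", "play music": "music"}
print(run_fig6_cycle(True, "AAA call", table))
print(run_fig6_cycle(True, "Play music", table))  # ['invoke music']
```

For non-call applications the engine simply keeps running in the background, matching the embodiment in which it stays active while the requested application runs.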

As described above, the present invention has been explained with reference to specific matters such as particular components, limited embodiments, and drawings, but these are provided only to aid a general understanding of the invention. The invention is not limited to the above embodiments, and those of ordinary skill in the art to which the invention pertains may make various modifications and changes based on this description. Therefore, the spirit of the invention should not be limited to the described embodiments; not only the appended claims but also everything equal or equivalent to them falls within the scope of the spirit of the invention.

Claims

A speech recognition-based application driving apparatus, comprising:
a processor; and
a memory coupled to the processor,
wherein the memory stores program instructions executable by the processor to:
run in a background state;
store a plurality of trigger commands;
determine whether an input user voice corresponds to one of the stored plurality of trigger commands; and
call an application according to the corresponding trigger command if the user voice corresponds to one of the stored plurality of trigger commands,
wherein the program instructions are terminated during execution of the called application and return to the background state upon termination of the called application,
wherein each trigger command comprises an identification word and a word for an application operation corresponding to the identification word, the identification words and the words for application operation being limited to a predetermined number or fewer,
wherein the plurality of trigger commands are commands for making a call to a predetermined number or fewer of a plurality of contacts included in an address book, and
wherein the program instructions generate a single trigger command for a contact selected by a user at a first point in time, and, if a user voice input at a second point in time after the first point in time includes the contact for which the single trigger command was generated, the single trigger command causes an application for a call to that contact to be executed.

The apparatus according to claim 1,
wherein the application is an application for at least one of a call, a camera, music, video, and navigation.

(Deleted)

The apparatus according to claim 1,
wherein the memory stores program instructions executable by the processor to:
output an interface for selecting one or more of the plurality of contacts according to a user's request;
identify one or more contacts selected by the user from among the plurality of contacts; and
generate a trigger command for each of the identified contacts.

(Deleted)

The apparatus according to claim 1,
wherein the memory stores program instructions executable by the processor to determine whether the input user voice corresponds to one of the stored plurality of trigger commands according to the posture of the driving apparatus, the proximity of the user, or the ambient brightness, as sensed by a built-in sensor.

A speech recognition-based application driving apparatus, comprising:
one or more sensors for sensing movement of the apparatus, proximity of a user, or ambient brightness;
a processor; and
a memory coupled to the processor,
wherein the memory stores program instructions executable by the processor to:
store a plurality of trigger commands;
determine whether an input user voice corresponds to one of the stored plurality of trigger commands if the sensor indicates that the apparatus has a predetermined posture or that the user is in proximity; and
call an application according to the corresponding trigger command if the user voice corresponds to one of the stored plurality of trigger commands,
wherein each trigger command comprises an identification word and a word for an application operation corresponding to the identification word, the identification words and the words for application operation being limited to a predetermined number or fewer,
wherein the plurality of trigger commands are commands for making a call to a predetermined number or fewer of a plurality of contacts included in an address book, and
wherein the program instructions generate a single trigger command for a contact selected by the user at a first point in time, and, if a user voice input at a second point in time after the first point in time includes the contact for which the single trigger command was generated, the single trigger command causes an application for a call to that contact to be executed.

A speech recognition-based control method of an application driving apparatus, the method comprising:
storing a plurality of trigger commands;
determining, in a background state, whether an input user voice corresponds to one of the stored plurality of trigger commands;
invoking an application according to the corresponding trigger command if the user voice corresponds to one of the stored plurality of trigger commands;
terminating the execution of the called application; and
returning to the background state when the called application is terminated,
wherein each trigger command comprises an identification word and a word for an application operation corresponding to the identification word, the identification words and the words for application operation being limited to a predetermined number or fewer,
wherein the plurality of trigger commands are commands for making a call to a predetermined number or fewer of a plurality of contacts included in an address book,
wherein storing the plurality of trigger commands comprises generating a single trigger command for a contact selected by the user at a first point in time, and
wherein, if a user voice input at a second point in time after the first point in time includes the contact for which the single trigger command was generated, the single trigger command causes an application for a call to that contact to be executed.

A computer program stored on a medium, comprising a series of instructions for performing the method according to claim 9.