KR20080066243A

KR20080066243A - Voice recognition device and method of voice recognition using the same

Info

Publication number: KR20080066243A
Application number: KR1020070003442A
Authority: KR
Inventors: 김양문
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2007-01-11
Filing date: 2007-01-11
Publication date: 2008-07-16

Abstract

An apparatus and a method for recognizing voice are provided that perform voice recognition using various parameters, thereby improving the efficiency of voice recognition. A voice-receiving unit (110) receives a voice signal and converts it into voice recognition data having a main parameter and a sub-parameter for identifying the voice signal. A pattern-storing unit (130) stores predetermined voice model data patterns, and a feature-extracting unit (120) extracts a recognition feature part of the voice recognition data according to the main parameter and the sub-parameter. If the recognition feature part is identical to one of the voice model data patterns, a controller (150) outputs a command code corresponding to that voice model data pattern.

Description

VOICE RECOGNITION DEVICE AND METHOD OF VOICE RECOGNITION USING THE SAME

FIG. 1 is a flowchart illustrating a general speech recognition method.

FIG. 2 is a diagram illustrating an example of converting an analog voice signal into digital voice recognition data.

FIG. 3 is a block diagram illustrating a speech recognition device according to an embodiment of the present invention.

FIG. 4 is a diagram for describing a voice recognition data structure having directionality.

FIG. 5 is a diagram comparing a general voice recognition data structure with the voice recognition data structure according to the present invention.

FIG. 6 is a flowchart illustrating a speech recognition method according to an embodiment of the present invention.

Description of reference numerals for the main parts of the drawings

100: speech recognition device
110: voice receiver
120: feature extractor
130: pattern storage unit
140: command code storage unit
150: controller

The present invention relates to a speech recognition device and a speech recognition method using the same, and more particularly, to a speech recognition device with an improved recognition rate and a speech recognition method using the same.

Recently, to make electronic devices more convenient to operate, speech recognition technology has been developing that lets a user input a command by voice, without directly touching a command key on the device, so as to drive a predetermined function of the electronic device.

Speech recognition technology is divided into speaker-dependent systems, which recognize only a specific user, and speaker-independent systems, which recognize speech regardless of who the user is.

Speaker-dependent speech recognition stores and registers the speaker's voice before use and, during actual recognition, compares the pattern of the input voice with the stored voice pattern. Speaker-independent speech recognition, on the other hand, is intended to recognize the voices of many unspecified speakers, so the user does not need to register a voice before operating the system, as speaker-dependent recognition requires.

That is, the voices of many unspecified speakers are collected and stored as statistical voice model data, which is then used to perform speech recognition. As a result, the individual characteristics of each speaker are lost and the characteristics common to all speakers are emphasized. Speaker-dependent speech recognition achieves a relatively higher recognition rate than speaker-independent recognition and is easier to implement, which makes it advantageous for practical use.

A method of performing speech recognition using such technology is described in detail below.

FIG. 1 is a flowchart illustrating a general speech recognition method, and FIG. 2 is a diagram illustrating an example of converting an analog voice signal into digital voice recognition data.

Referring to FIGS. 1 and 2, the general speech recognition method includes receiving a voice signal (S10), converting the voice signal into voice recognition data having only magnitude (S20), processing noise (S30), extracting recognition features from the voice recognition data (S40), comparing the recognition features with voice model data patterns (S50), determining whether a matching pattern exists (S60), and outputting a recognition result (S70).

Specifically, in step S10, when the user speaks a predetermined command, an acoustic sensor such as a microphone receives the voice signal produced by the user.

In step S20, the acoustic sensor converts the received analog voice signal into a digital voice signal (hereinafter, voice recognition data), as shown in FIG. 2.

In step S30, the voice recognition data corresponding to noise from the surrounding environment is separated from the voice recognition data corresponding to the voice signal actually input by the user, and the noise component is removed.
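The patent does not say how the noise component is separated in step S30. As one illustration only, the sketch below swaps in a simple noise gate: it assumes the first few samples contain background noise alone, estimates a noise floor from them, and zeroes every sample that does not rise clearly above it. The function name `remove_noise` and its parameters are hypothetical.

```python
def remove_noise(samples, noise_len=4, margin=1.5):
    """Noise-gate sketch for step S30.

    Assumes the first `noise_len` samples contain only background noise;
    their peak level times `margin` becomes the threshold. Samples at or
    below the threshold are treated as noise and set to zero.
    """
    lead = samples[:noise_len]
    noise_floor = max((abs(s) for s in lead), default=0)
    threshold = noise_floor * margin
    return [s if abs(s) > threshold else 0 for s in samples]
```

For example, `remove_noise([1, -1, 1, 0, 5, -6, 1, 7])` returns `[0, 0, 0, 0, 5, -6, 0, 7]`. A practical system would more likely use spectral methods; the gate is chosen here only because it is short.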

In step S40, recognition features that make the noise-processed voice recognition data easy to distinguish are extracted; for example, the voice signal is classified into phoneme units and the vowel portions of the classified phonemes are extracted as recognition features.
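Locating vowel portions reliably requires a phonetic model that the patent does not describe. A crude stand-in, assumed here purely for illustration, is that vowels tend to be the high-energy stretches of voiced speech: the sketch splits the signal into fixed-length frames and returns the indices of frames whose energy is at least a given fraction of the loudest frame. `frame_energies`, `vowel_like_frames`, `frame_len`, and `ratio` are hypothetical names.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def vowel_like_frames(samples, frame_len=4, ratio=0.5):
    """Indices of frames whose energy is at least `ratio` times the peak energy.

    Energy-based selection is an assumed heuristic, not the patent's method.
    """
    energies = frame_energies(samples, frame_len)
    if not energies or max(energies) == 0:
        return []
    peak = max(energies)
    return [i for i, e in enumerate(energies) if e >= ratio * peak]
```

With frames of 4 samples, the input `[0]*4 + [3]*4 + [1]*4 + [4]*4` has frame energies `[0, 36, 4, 64]`, so `vowel_like_frames` selects frames `[1, 3]`.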

In step S50, the extracted recognition features are compared with voice model data patterns. Here, a voice model data pattern is a pattern registered either through training in the speaker-dependent or speaker-independent system described above, or by modeling the common features of the voice signals of many unspecified speakers; such patterns must be stored in advance in a voice database (hereinafter, voice DB) or the like.

In step S60, based on the comparison between the voice model data patterns and the extracted recognition features, it is determined whether a pattern matching the recognition features exists, and in step S70 the recognition result is output. If a voice model data pattern matching the recognition features exists, the command code previously mapped to that pattern is output, so that the electronic device performs the function corresponding to the command code, that is, to the command spoken by the user.
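Steps S50 to S70 of the conventional method amount to a lookup: if the extracted feature is identical to a stored pattern, the command code mapped to that pattern is output; otherwise nothing is. The sketch below assumes features have already been quantized into hashable tuples so that "identical" can be tested with a dictionary lookup. `VOICE_DB`, `COMMAND_CODES`, and `recognize` are hypothetical, and the feature values are made up.

```python
from typing import Optional

# Hypothetical voice DB: quantized magnitude features -> registered pattern name.
VOICE_DB = {
    (3, 7, 2): "stop",
    (5, 1, 4): "start",
}

# Hypothetical command codes previously mapped to each pattern.
COMMAND_CODES = {"stop": 0x01, "start": 0x02}


def recognize(feature: tuple) -> Optional[int]:
    """Return the command code for an identical stored pattern, else None."""
    pattern = VOICE_DB.get(feature)   # S50/S60: does an identical pattern exist?
    if pattern is None:
        return None                   # no match: no recognition result
    return COMMAND_CODES[pattern]     # S70: output the mapped command code
```

Here `recognize((3, 7, 2))` returns `1` and `recognize((0, 0, 0))` returns `None`.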

In this general method, the voice signal is converted into voice recognition data having only magnitude, recognition features are extracted from that data, and the features are compared with the stored voice model data patterns by considering only the magnitude of the voice signal. As a result, the speech recognition rate is low.
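The limitation can be seen in a toy example with made-up values: two stored patterns share the same magnitude sequence and differ only in direction. Compared on magnitude alone, an input is equally close to both, so the patterns cannot be told apart; adding direction components to the comparison separates them. Representing each time point as a `(magnitude, dx, dy, dz)` tuple and using Euclidean distance are assumptions made for this illustration, not details from the patent.

```python
import math

# Toy patterns: one (magnitude, dx, dy, dz) tuple per time point.
PATTERNS = {
    "A": [(5, 1, 0, 0), (7, 1, 0, 0)],
    "B": [(5, 0, 1, 0), (7, 0, 1, 0)],
}


def distance(a, b, use_direction):
    """Euclidean distance over magnitude only, or over magnitude plus direction."""
    dims = 4 if use_direction else 1
    return math.sqrt(sum((x[k] - y[k]) ** 2
                         for x, y in zip(a, b) for k in range(dims)))


def closest(sample, use_direction):
    """Names of all patterns tied for the smallest distance to `sample`."""
    dists = {name: distance(sample, p, use_direction) for name, p in PATTERNS.items()}
    best = min(dists.values())
    return sorted(name for name, d in dists.items() if d == best)
```

With `sample = [(5, 1, 0, 0), (7, 1, 0, 0)]`, `closest(sample, use_direction=False)` returns `['A', 'B']` (a tie), while `closest(sample, use_direction=True)` returns `['A']`.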

The present invention has been made to solve the above problem, and an object of the present invention is to provide a speech recognition device that improves the recognition rate by performing speech recognition using various parameters.

Another object of the present invention is to provide a speech recognition method using the speech recognition device.

To achieve the above object, a speech recognition device according to an embodiment of the present invention includes: a voice receiver that receives a voice signal and converts it into voice recognition data having a main parameter and a sub-parameter for identifying the voice signal; a pattern storage unit that stores predefined voice model data patterns; a feature extractor that extracts recognition features of the voice recognition data according to the main and sub-parameters; and a controller that compares the recognition features with the voice model data patterns and, if a voice model data pattern matching the recognition features exists, outputs a command code corresponding to that pattern.

Here, the main parameter may be magnitude information of the voice signal, and the sub-parameter may be direction information indicating the output direction of the voice signal.

The voice receiver may include a plurality of acoustic sensors that identify the direction information in three-dimensional space and include it in the voice recognition data.

Preferably, the speech recognition device according to an embodiment of the present invention may further include a command code storage unit that stores the command code.

The pattern storage unit may store predefined voice model data patterns for the voice signal according to the main parameter and the sub-parameter.

To achieve the other object of the present invention, a speech recognition method according to an embodiment of the present invention includes: receiving a voice signal and converting it into voice recognition data having a main parameter and a sub-parameter for identifying the voice signal; extracting recognition features from the converted voice recognition data; comparing the recognition features with predefined voice model data patterns; and, if it is determined that a voice model data pattern matching the recognition features exists, outputting a command code corresponding to that pattern.

According to the speech recognition device and method of the present invention, the recognition rate can be improved by performing speech recognition through comparison with a variety of samples based on various parameters.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 3 is a block diagram illustrating a speech recognition device according to an embodiment of the present invention, FIG. 4 is a diagram for describing the structure of voice recognition data having directionality, and FIG. 5 is a diagram comparing a general voice recognition data structure with the voice recognition data structure according to the present invention.

Referring to FIG. 3, the speech recognition device 100 according to an embodiment of the present invention includes a voice receiver 110, a feature extractor 120, a pattern storage unit 130, a command code storage unit 140, and a controller 150.

Specifically, the voice receiver 110 includes a plurality of acoustic sensors, for example, sensor 1 (112), sensor 2 (114), and sensor 3 (116), and receives an analog voice signal input from the outside.

As is usual, the voice receiver 110 receives information on the magnitude of the input analog voice signal. In addition, the voice receiver 110 is disposed in three-dimensional space, as at point A in FIG. 4, and each of the sensors 112, 114, and 116 included in the voice receiver 110 also receives information on the direction from which the voice signal arrives in that space.

For example, the sensors 112, 114, and 116 are disposed on the x-axis, y-axis, and z-axis of the three-dimensional space, respectively, and receive direction information ((x1, y1, z1) or (x2, y2, z2)) for the position from which the analog voice signal is output (point B or point C).
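The patent does not specify how the three sensors turn their readings into direction information. Purely as a simplified illustration, the sketch below treats the readings from the sensors on the x-, y-, and z-axes as components of the arrival direction and normalizes them to a unit vector. Practical microphone arrays usually estimate direction from time differences of arrival instead; `direction_from_sensors` is a hypothetical name.

```python
import math


def direction_from_sensors(sx, sy, sz):
    """Unit direction vector from the x-, y- and z-axis sensor readings.

    Simplifying assumption: each reading is proportional to the direction
    component along that sensor's axis.
    """
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    if norm == 0:
        raise ValueError("no signal on any sensor")
    return (sx / norm, sy / norm, sz / norm)
```

For instance, `direction_from_sensors(3, 0, 4)` gives `(0.6, 0.0, 0.8)`.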

The voice receiver 110 then converts the analog voice signal into voice recognition data having a main parameter for identifying the signal, for example, its magnitude information, and a sub-parameter, for example, its direction information.

As shown in FIG. 5, in the general speech recognition method described with reference to FIG. 1, the converted voice recognition data contains only the magnitude of the analog voice signal over time (T1, ..., TN), for example as 1 byte of voice recognition data. In the present invention, by contrast, the voice recognition data contains both the magnitude and the direction information of the analog voice signal over time (T1, ..., TN), for example as 4 bytes.
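The difference in FIG. 5 can be pictured as two record layouts. The sketch below assumes, for illustration, that each time point is quantized to one unsigned magnitude byte in the conventional layout, and to one unsigned magnitude byte plus three signed direction bytes (x, y, z) in the directional layout; the patent gives 1 and 4 bytes only as examples and does not fix the encoding.

```python
import struct


def pack_conventional(magnitudes):
    """1 byte per time point: magnitude only (0 to 255)."""
    return struct.pack(f"={len(magnitudes)}B", *magnitudes)


def pack_directional(samples):
    """4 bytes per time point: magnitude (0 to 255), then x, y, z (-128 to 127)."""
    out = bytearray()
    for magnitude, x, y, z in samples:
        out += struct.pack("=Bbbb", magnitude, x, y, z)
    return bytes(out)
```

Three time points therefore take 3 bytes in the first layout and 12 bytes in the second.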

The feature extractor 120 receives the voice recognition data converted by the voice receiver 110 and extracts recognition features for identifying it. The feature extractor 120 extracts recognition features for the main parameter together with recognition features for the sub-parameter. For example, it extracts recognition features for the magnitude of the voice recognition data together with recognition features for each direction (x, y, z).
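The patent does not define the features themselves. As an assumed example only, the sketch computes one simple summary per parameter (the mean and the signed peak of the magnitude channel and of each direction channel), which shows the idea of extracting features for the main parameter and for each direction side by side. `extract_features` is a hypothetical name.

```python
def extract_features(samples):
    """Per-channel (mean, peak) features from (magnitude, x, y, z) tuples.

    Returns four pairs, in channel order magnitude, x, y, z. The peak is the
    signed value with the largest absolute value (the first one on ties).
    """
    if not samples:
        raise ValueError("empty input")
    channels = list(zip(*samples))   # transpose into 4 channel sequences
    return [(sum(ch) / len(ch), max(ch, key=abs)) for ch in channels]
```

For example, `extract_features([(2, 1, 0, -3), (4, -1, 0, 1)])` returns `[(3.0, 4), (0.0, 1), (0.0, 0), (-1.0, -3)]`.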

In a speaker-dependent system, the pattern storage unit 130 stores predefined patterns of voice model data registered by the user (hereinafter, voice model data patterns); in a speaker-independent system, it stores patterns registered by modeling the common features of the voice signals of many unspecified speakers. The pattern storage unit 130 may store patterns registered on the basis of the main parameter and, in addition, the sub-parameter, that is, the direction information. To this end, the pattern storage unit 130 may include a plurality of voice DBs 132, 134, and 136, each storing the patterns registered for a respective piece of direction information.
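One way to picture the plurality of voice DBs is a separate table per direction axis, each holding every registered pattern as magnitude values paired with that axis's direction values. The class below is a hypothetical layout that assumes exactly three DBs, one each for x, y, and z (standing in for DBs 132, 134, and 136); the patent does not specify how the DBs are keyed.

```python
class PatternStore:
    """Hypothetical pattern storage unit with one voice DB per direction axis."""

    AXES = ("x", "y", "z")

    def __init__(self):
        # axis -> {pattern name -> [(magnitude, direction component), ...]}
        self.dbs = {axis: {} for axis in self.AXES}

    def register(self, name, magnitudes, directions):
        """Store a pattern in every axis DB.

        `directions` maps each axis to a sequence as long as `magnitudes`.
        All lengths are checked before anything is stored.
        """
        for axis in self.AXES:
            if len(directions[axis]) != len(magnitudes):
                raise ValueError(f"length mismatch on axis {axis}")
        for axis in self.AXES:
            self.dbs[axis][name] = list(zip(magnitudes, directions[axis]))

    def patterns_for(self, axis):
        """All registered patterns in the DB for one axis."""
        return self.dbs[axis]
```

After `store.register("stop", [5, 7], {"x": [1, 1], "y": [0, 0], "z": [0, 2]})`, the z-axis DB holds `[(5, 0), (7, 2)]` for `"stop"`.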

The command code storage unit 140 stores command codes corresponding to the analog voice signals input by the user. For example, it stores a predefined command code that stops the operation the system is currently performing in response to the user's analog voice signal "stop".

The command code storage unit 140 may also store ASCII (American Standard Code for Information Interchange) codes or the like, so that the user's analog voice signal can be displayed as the corresponding characters in a predetermined display area of the electronic device.

That is, the term "command code" as used here covers not only codes for commands that make the electronic device perform a specific function but also various other kinds of data.

The controller 150 receives the recognition features of the voice recognition data extracted by the feature extractor 120, compares them with the patterns stored in the pattern storage unit 130, and determines whether a matching pattern exists.

Although the preferred embodiment determines whether a pattern matching the recognition features of the voice recognition data exists, the most similar pattern may instead be extracted when no exactly matching pattern exists.
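The fallback can be sketched as nearest-neighbour matching: return an identical pattern if one exists, otherwise the pattern at the smallest distance. Euclidean distance over fixed-length feature vectors and the optional `max_distance` cut-off are assumptions added for illustration; the patent only says the most similar pattern may be extracted. `match` is a hypothetical name.

```python
import math


def match(feature, patterns, max_distance=math.inf):
    """Name of the stored pattern identical or closest to `feature`.

    `patterns` maps a name to a feature vector of the same length as
    `feature`. Returns None if nothing lies within `max_distance`.
    """
    best_name, best_dist = None, math.inf
    for name, ref in patterns.items():
        d = math.dist(feature, ref)   # Euclidean distance (Python 3.8+)
        if d == 0:
            return name               # identical pattern found
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None
```

With the default `max_distance`, the closest pattern is always accepted, as the text describes; a finite cut-off is a common safeguard against accepting unrelated input.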

If a matching pattern exists, the controller 150 reads the command code corresponding to that pattern from the command code storage unit 140 and outputs it as the recognition result, and the system performs the corresponding operation. When the recognition result is not meant to make the system perform a specific function corresponding to the analog voice signal, for example, when characters corresponding to the analog voice signal are to be displayed in a predetermined display area, information on those characters may be output as the recognition result instead of a command code.
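The two kinds of output can be sketched with a small table in which each registered pattern maps either to a command code or to text to display. The entries, the `Entry` type, and `recognition_result` are hypothetical; encoding display text as ASCII follows the example the description gives for the command code storage unit 140.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Entry:
    command_code: Optional[int] = None   # pattern triggers a device function
    text: Optional[str] = None           # pattern should be shown as characters


# Hypothetical contents of the command code storage unit.
CODE_STORE = {
    "stop": Entry(command_code=0x10),
    "letter_a": Entry(text="A"),
}


def recognition_result(pattern_name: str) -> Union[int, bytes, None]:
    """Command code for a function, ASCII bytes for display text, or None."""
    entry = CODE_STORE.get(pattern_name)
    if entry is None:
        return None
    if entry.command_code is not None:
        return entry.command_code
    if entry.text is not None:
        return entry.text.encode("ascii")
    return None
```

Here `recognition_result("stop")` yields the command code `0x10`, while `recognition_result("letter_a")` yields `b"A"` for display.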

A speech recognition method using this speech recognition device will now be described in detail.

FIG. 6 is a flowchart illustrating a speech recognition method according to an embodiment of the present invention.

Referring to FIGS. 4 to 6, the speech recognition method according to an embodiment of the present invention includes receiving a voice signal (S100), converting the voice signal into voice recognition data having main and sub-parameters (S110), processing noise (S120), extracting recognition features of the voice recognition data (S130), comparing voice model data patterns with the recognition features of the voice recognition data (S140), determining whether a matching pattern exists (S150), and outputting a recognition result (S160).

Specifically, in step S100, the voice receiver 110 receives an analog voice signal input from the outside, including its main parameter and sub-parameter, for example, magnitude information and direction information.

In step S110, based on the received main parameter and sub-parameter, the voice receiver 110 converts the signal into voice recognition data having both parameters, for example, 4 bytes of voice recognition data as shown in FIG. 5.

In step S120, the voice receiver 110 separates the voice recognition data corresponding to noise from the surrounding environment from the voice recognition data corresponding to the voice signal actually input by the user, removes the noise component, and outputs the result.

In step S130, the feature extractor 120 receives the noise-free voice recognition data output from the voice receiver 110 and extracts the characteristic parts of the voice recognition data, for example, the vowel portions of phonemes. The feature extractor 120 extracts features that take both the main parameter and the sub-parameter into account.

In step S140, the controller 150 compares the extracted recognition features of the voice recognition data with the voice model data patterns stored in advance in the pattern storage unit 130 on the basis of the main parameter and the sub-parameter. Specifically, the recognition features are compared with the voice model data patterns stored, according to the direction information, in each voice DB of the pattern storage unit 130.

In step S150, the controller 150 determines, from the comparison in step S140, whether a matching pattern exists. If a matching pattern exists, step S160 is performed; if not, the speech recognition process ends. For convenience of description, step S150 is shown as determining only whether a matching pattern exists; however, when no matching pattern exists, the most similar pattern may be treated as the matching pattern.

In step S160, when a matching pattern exists or the most similar pattern has been treated as the matching pattern, the controller 150 outputs the corresponding recognition result.

For example, when the user's analog voice signal is a command that makes the system perform a predetermined function, the recognition result may be the corresponding command code read from the command code storage unit 140. When it is a command to display simple characters rather than to perform such a function, the recognition result may be the ASCII code or the like for the predefined characters, read from the command code storage unit 140.

With this speech recognition device and method, if the recognition rate obtained when speech recognition considers only the magnitude of the voice signal is taken to be 1/10, then performing speech recognition that also considers directionality, that is, the direction along each of the x-axis, y-axis, and z-axis in three-dimensional space, and comparing against voice model data patterns predefined for each direction yields a recognition rate of 1/(10×10×10), enabling high-precision speech recognition.

As described above, according to the present invention, the recognition rate of speech recognition can be improved by comparing the input with a variety of predefined models that take into account not only the magnitude of the voice signal but also its directionality in three-dimensional space.

Although the present invention has been described with reference to preferred embodiments, those skilled in the art will understand that various modifications and changes can be made without departing from the spirit and scope of the invention as set forth in the following claims.

Claims

1. A speech recognition device comprising:

a voice receiver that receives a voice signal and converts the voice signal into voice recognition data having a main parameter and a sub-parameter for identifying the voice signal;

a pattern storage unit that stores a predefined voice model data pattern;

a feature extractor that extracts a recognition feature of the voice recognition data according to the main and sub-parameters; and

a controller that compares the recognition feature with the voice model data pattern and, if a voice model data pattern matching the recognition feature exists, outputs a command code corresponding to that voice model data pattern.

2. The speech recognition device of claim 1, wherein the main parameter is magnitude information of the voice signal, and the sub-parameter is direction information on an output direction of the voice signal.

3. The speech recognition device of claim 2, wherein the voice receiver comprises a plurality of acoustic sensors that identify the direction information in a three-dimensional space and include the direction information in the voice recognition data.

4. The speech recognition device of claim 1, further comprising a command code storage unit that stores the command code.

5. The speech recognition device of claim 1, wherein the pattern storage unit stores a predefined voice model data pattern for the voice signal according to the main parameter and the sub-parameter.

6. A speech recognition method comprising:

receiving a voice signal and converting the voice signal into voice recognition data having a main parameter and a sub-parameter for identifying the voice signal;

extracting a recognition feature of the converted voice recognition data;

comparing the recognition feature with a predefined voice model data pattern; and

if it is determined that a voice model data pattern matching the recognition feature exists, outputting a command code corresponding to that voice model data pattern.

7. The speech recognition method of claim 6, wherein the main parameter is magnitude information of the voice signal, and the sub-parameter is direction information on an output direction of the voice signal.