KR101066472B1

KR101066472B1 - Apparatus and method speech recognition based initial sound

Info

Publication number: KR101066472B1
Application number: KR1020090087154A
Authority: KR
Inventors: 김남운; 김기두
Original assignee: 국민대학교산학협력단
Priority date: 2009-09-15
Filing date: 2009-09-15
Publication date: 2011-09-21
Also published as: KR20110029471A

Abstract

The present invention relates to an initial-consonant-based speech recognition apparatus and speech recognition method. The apparatus comprises a voice input unit that receives speech, a feature analysis unit that computes feature vectors of the received speech signal to generate an input pattern, a reference pattern storage unit in which reference patterns are stored in advance, a pattern recognition unit that compares the input pattern with the reference patterns and outputs initial-consonant information corresponding to the reference pattern similar to the input pattern, and a decision unit that searches for words corresponding to the initial-consonant information and determines the word to output. The reference patterns are generated from syllables in which the same vowel is combined with each consonant. The pattern recognition unit applies a syllable-level comparison algorithm to the input pattern to detect similar reference patterns and, only when two or more similar reference patterns are detected, applies a phoneme-level comparison algorithm to arrive at a single similar reference pattern. By adopting speech recognition technology suited to embedded systems, the invention enables simple yet accurate initial-consonant-based speech recognition, and by outputting results according to the importance of the retrieved words it provides an improved search function.

Dynamic Time Warping (DTW), Hidden Markov Model (HMM), Speech Recognition

Description

Initial-consonant-based speech recognition apparatus and speech recognition method {APPARATUS AND METHOD SPEECH RECOGNITION BASED INITIAL SOUND}

The present invention relates to an initial-consonant-based speech recognition apparatus and speech recognition method. More specifically, it provides an apparatus and method that conserve the resources of an embedded system by performing recognition, in an initial-consonant search by voice, on syllables that combine each consonant with the same vowel, and that improve the search function by outputting words of high importance first.

For driver convenience and driving safety, the need for in-vehicle devices operated by speech recognition is growing. Speech recognition for vehicle navigation devices (navigators) in particular is at the stage of practical use.

However, the speech recognition performance of commercial navigation systems has not reached a satisfactory level. This is due to in-vehicle noise and the harsh constraints of an embedded system. A speech recognition system for a noisy environment must implement noise processing, speech-signal feature modeling, the recognition algorithm, and so on, whereas an embedded system such as the speech recognition system in a navigation device must be implemented within a small-capacity database.

Noise processing relies on speech enhancement techniques. These include Spectral Subtraction (SS), which is based on a statistical model, and Wiener filtering, which uses Minimum Mean Square Error (MMSE) estimation. Methods that take human auditory characteristics into account include signal subspace speech enhancement and techniques that use the masking effect. Among the statistical-model techniques, SS is simple but vulnerable to musical noise, and Wiener filtering likewise introduces speech distortion and residual noise. Among the auditory techniques, signal subspace enhancement and masking-based methods perform well but are computationally expensive. Such speech enhancement techniques control the ambient noise and inherently consume a lot of power, which makes them difficult to apply to embedded systems.

Accordingly, the present invention has been devised to solve the conventional problems described above, and an object of the present invention is to provide a speech recognition apparatus and speech recognition method suited to embedded systems.

Another object of the present invention is to provide a simple yet accurate initial-consonant-based speech recognition apparatus and speech recognition method.

Still another object of the present invention is to provide a speech recognition apparatus and speech recognition method with improved search performance, in which results are output according to the importance of the words retrieved in initial-consonant-based speech recognition.

According to a feature of the present invention for achieving the above objects, an initial-consonant-based speech recognition apparatus comprises: a voice input unit that receives speech and converts it into a speech signal; a feature analysis unit that computes feature vectors of the speech signal received from the voice input unit to generate an input pattern; a reference pattern storage unit in which reference patterns are stored in advance for comparison with the input pattern; a pattern recognition unit that compares the input pattern with the reference patterns and outputs initial-consonant information corresponding to the reference pattern similar to the input pattern; and a decision unit that searches for words corresponding to the initial-consonant information detected by the pattern recognition unit and determines the word to output. The reference patterns stored in the reference pattern storage unit are generated from syllables in which the same vowel is combined with each consonant. The pattern recognition unit first applies a syllable-level comparison algorithm to the input pattern to detect similar reference patterns and, only when two or more similar reference patterns are detected, selectively applies a phoneme-level comparison algorithm to arrive at a single similar reference pattern. The decision unit includes a database in which a plurality of words is stored, searches the stored words for those corresponding to the detected initial-consonant information and outputs them, and, when two or more such words are found, calculates the importance of every retrieved word.

Here, the syllable-level comparison algorithm is Dynamic Time Warping (DTW) and the phoneme-level comparison algorithm is a Hidden Markov Model (HMM) method; the reference pattern storage unit may store the reference patterns for the syllable-level algorithm and those for the phoneme-level algorithm separately.

The pattern recognition unit may also compute, through the syllable-level comparison algorithm, the squared Euclidean distance between the input pattern and each reference pattern, and apply the phoneme-level comparison algorithm when two or more reference patterns have a squared Euclidean distance below a pre-stored threshold distance.

(Deleted)

In this case, the decision unit may detect, within the set of retrieved words, the frequency of each simple noun contained in the retrieved words, and calculate the importance of each word from the detected frequencies.

Furthermore, the decision unit may detect whether each retrieved word contains a simple noun to which a preset weight has been assigned, and calculate the importance of each word from both the frequency and the weight.

Here, the decision unit may output the words in descending order of calculated importance, or output only the word with the highest importance.

The present invention also provides a method comprising: (A) receiving speech and converting it into a speech signal; (B) computing feature vectors of the speech signal to generate an input pattern; (C) comparing the input pattern with pre-stored reference patterns according to a syllable-level comparison algorithm to detect similar reference patterns; (D) only when two or more similar reference patterns are detected in step (C), selectively detecting a single similar reference pattern according to a phoneme-level comparison algorithm; and (E) searching for and outputting words whose initial consonants correspond to the single similar reference pattern detected in step (C) or step (D).
Here, step (E) may comprise: (E1) retrieving all words whose initial consonants correspond to the detected reference pattern; (E2) detecting, within the set of words retrieved in step (E1), the frequency of each simple noun contained in the retrieved words; (E3) calculating the importance of each word from the simple-noun frequencies detected in step (E2); and (E4) outputting the retrieved words according to the importance calculated in step (E3).

Here, the reference patterns may be generated and stored in advance, separately for the syllable-level comparison algorithm and for the phoneme-level comparison algorithm, from syllables in which the same vowel is combined with each consonant.

In addition, the syllable-level comparison algorithm may be Dynamic Time Warping (DTW), and the phoneme-level comparison algorithm may be a Hidden Markov Model (HMM) method.

(Deleted)

Step (E3) may also be performed by detecting whether each retrieved word contains a simple noun to which a preset weight has been assigned, and calculating the importance of each word from both the frequency and the weight.

Here, the weighted simple noun may be a simple noun denoting a public institution.

As described in detail above, the speech recognition apparatus and speech recognition method according to the present invention can be expected to provide the following effects.

First, by adopting speech recognition technology suited to embedded systems, the invention enables simple yet accurate initial-consonant-based speech recognition.

Second, by outputting results according to the importance of the words retrieved in initial-consonant-based speech recognition, the invention provides an improved search function.

Specific embodiments of the speech recognition apparatus and speech recognition method according to the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the schematic configuration of a general speech recognition system. Such a system first includes a voice input unit 10 that receives the user's speech through a microphone and converts it into a signal.

It also includes a feature analysis unit 11 that analyzes the speech signal obtained by the voice input unit 10, extracts the feature vectors needed for recognition, and uses them to generate an input pattern that is later compared with the reference patterns.

The input pattern produced by the feature analysis unit 11 is passed to a pattern recognition unit 12. The pattern recognition unit 12 runs a comparison algorithm between the input pattern and the reference patterns stored in advance in a reference pattern storage unit 13, described below, and outputs information on the reference pattern with the highest similarity as the recognition result.

Here, the reference pattern storage unit 13 is a storage means in which predetermined reference patterns for speech recognition are stored.

When the pattern recognition unit 12 outputs information on the reference pattern finally determined to be the most similar, a decision unit 14 receives it and searches for words, using the characters corresponding to that reference pattern as the search term. The recognition operation ends when the retrieved word is output.

In such a system, the more reference patterns are stored in the reference pattern storage unit 13 and compared with the input pattern, the more accurate the recognition can be. In that case, however, the reference pattern storage unit 13 must have a very large capacity, and because the amount of computation for comparing the input pattern with the reference patterns grows greatly, the pattern recognition unit 12 must also be powerful. As already explained, system resources are limited in an embedded system, so accurate recognition results must be obtained with minimal resources; to this end the present invention supports initial-consonant-based speech recognition.

In particular, when the initial consonants 'ㄱ', 'ㄴ', 'ㄷ' and so on are entered by voice, the letter names '기역' (giyeok), '니은' (nieun), and '디귿' (digeut) are not used. Instead, each initial consonant is pronounced in combination with a single, uniform vowel, as in '가' (ga), '나' (na), and '다' (da), and the reference patterns are likewise generated and stored from speech signals in which each initial consonant is combined with that one vowel.

This greatly reduces the number of reference patterns that must be stored in the reference pattern storage unit 13 and, accordingly, markedly reduces the amount of computation in the pattern recognition unit 12.

The speech recognition method according to the present invention is described in more detail with reference to FIG. 2. FIG. 2 is a flowchart showing, in sequence, the initial-consonant-based speech recognition method according to an embodiment of the present invention.

As shown in FIG. 2, the speech recognition method according to the embodiment begins with a step in which speech is input by the user (S10). The speech input in step S10 could in principle be anything, but, as described above, the embodiment performs initial-consonant-based recognition using reference patterns generated from pronunciations that combine each initial consonant with one common vowel. The input will therefore be a sequence of syllables that share a common vowel and have no final consonant, such as '가' (ga), '나' (na), and '다' (da). For example, a user who wants to search for '홍길동' (Hong Gil-dong) will say '하가다' (ha-ga-da).
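The mapping from a target word to the syllables the user is expected to say can be sketched as follows. This is a minimal illustration that assumes the common vowel is 'ㅏ' (as in the examples above) and relies on the arithmetic layout of precomposed Hangul syllables in Unicode; the function name `spoken_form` is hypothetical and does not come from the patent.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3. Each code point is
# BASE + (initial * 21 + medial) * 28 + final, so one initial consonant
# spans 21 * 28 = 588 consecutive code points.
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
SYLLABLES_PER_INITIAL = 21 * 28


def spoken_form(word: str) -> str:
    """Replace each Hangul syllable by its initial consonant plus the vowel 'ㅏ'."""
    out = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            initial_index = (code - HANGUL_BASE) // SYLLABLES_PER_INITIAL
            # Medial index 0 is 'ㅏ'; final index 0 means no final consonant.
            out.append(chr(HANGUL_BASE + initial_index * SYLLABLES_PER_INITIAL))
        else:
            out.append(ch)  # leave non-Hangul characters unchanged
    return "".join(out)
```

Calling `spoken_form("홍길동")` yields the three syllables of the example above.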

Once speech is input, it is converted into a speech signal and feature vectors are extracted from it (S20). The feature vectors may be extracted by linear predictive coding (LPC). When the feature vectors have been extracted in step S20, they are used to generate an input pattern for the speech signal. The input pattern is the speech signal processed into a form that can be compared algorithmically with the pre-stored reference patterns.
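As an illustration of step S20, the sketch below frames a signal, applies a window, and computes LPC coefficients per frame with the autocorrelation method and the Levinson-Durbin recursion. The text does not state which LPC variant, window shape, or frame overlap is used, so the Hamming window, non-overlapping frames, and all function names here are assumptions.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=20):
    """Split the signal into non-overlapping frames of frame_ms milliseconds."""
    n = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def hamming(frame):
    """Apply a Hamming window (assumed; the source does not name the window)."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def lpc(frame, order=10):
    """Return a_1..a_order of A(z) = 1 + sum(a_k z^-k) via Levinson-Durbin."""
    r = [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a[1:]
```

An input pattern would then be the list of per-frame coefficient vectors, for example `[lpc(hamming(f)) for f in frame_signal(samples, 8000)]`.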

Next, in step S30, the input pattern is compared with the reference patterns. Both the input pattern and the reference patterns compared here are generated at the syllable level. Step S30 may in particular use Dynamic Time Warping (DTW). DTW aligns the input pattern and a reference pattern nonlinearly to compensate for differences in speaking rate and duration between the input speech and the reference speech, and recognizes the input as the speech of the reference pattern with the highest similarity. Similarity is judged by computing the squared Euclidean distance between the input pattern and each reference pattern; the reference pattern with the smallest distance is taken as the one most similar to the input pattern.
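The comparison in step S30 can be sketched with a textbook DTW recurrence that uses the squared Euclidean distance as the local cost. Any step constraints, path normalization, or windowing the actual embodiment may use are not given in the text, so this is only a minimal form; `dtw_distance` and `nearest_reference` are hypothetical names.

```python
def squared_euclidean(u, v):
    """Squared Euclidean distance between two feature vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dtw_distance(input_pattern, reference_pattern):
    """Accumulated cost of the cheapest nonlinear alignment of two sequences."""
    n, m = len(input_pattern), len(reference_pattern)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = squared_euclidean(input_pattern[i - 1], reference_pattern[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_reference(input_pattern, references):
    """Return the label of the reference pattern with the smallest DTW distance."""
    return min(references, key=lambda label: dtw_distance(input_pattern, references[label]))
```

Because the warping path may repeat frames, an utterance spoken more slowly than the reference can still align with zero cost, which is the rate compensation described above.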

While DTW is performed in step S30 to compute the similarity between the input pattern and each reference pattern, it is also determined whether two or more reference patterns have a similarity to the input pattern within a predetermined range (S40). That is, from the squared Euclidean distances computed during DTW, it is determined whether two or more reference patterns have a squared Euclidean distance smaller than a preset threshold.
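The check in step S40 amounts to counting the reference patterns whose distance falls under the threshold. A minimal sketch, assuming the DTW distances are already available as a mapping from syllable label to distance (the function names and the example values in the usage note are illustrative, not from the patent):

```python
def close_candidates(distances, threshold):
    """Labels whose distance is below the threshold, closest first."""
    hits = sorted((d, label) for label, d in distances.items() if d < threshold)
    return [label for _, label in hits]


def needs_second_pass(distances, threshold):
    """True when two or more reference patterns fall under the threshold."""
    return len(close_candidates(distances, threshold)) >= 2
```

With made-up distances `{"가": 1.2, "카": 1.5, "나": 9.0}` and a threshold of 2.0, both '가' and '카' qualify, so the second pass would be triggered.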

If so, the current input could be recognized as any of two or more similar sounds, so a more precise pattern analysis is required. For example, '가' (ga) and '카' (ka), or '다' (da) and '타' (ta), sound alike and therefore have somewhat similar signal patterns; if they are compared by DTW alone, the input may be recognized differently from what the user intended.

Accordingly, it is determined whether there are two or more similar reference patterns and, if so, pattern analysis is performed again with a method that has a better recognition rate than DTW (S50). That is, the speech signal is separated into phonemes and a phoneme-level pattern comparison algorithm, such as a Hidden Markov Model, is applied.

Here, the Hidden Markov Model is a statistical model that assumes the system being modeled is a Markov process with unknown parameters and, on that assumption, determines the hidden parameters from the observed parameters. It is one of the methods most widely used in speech recognition.
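One common way to use an HMM for this kind of decision is to score the observation sequence against one model per candidate phoneme and keep the best-scoring model. The sketch below does this with the forward algorithm on a discrete-observation HMM. The text does not specify the model topology, the observation type, or how the models are trained, so the discrete symbols, the model parameters, and both function names are assumptions for illustration only.

```python
def forward_likelihood(observations, initial, transition, emission):
    """P(observations | model) for a discrete HMM, computed by the forward algorithm."""
    if not observations:
        return 1.0
    states = range(len(initial))
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = [initial[s] * emission[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in states) * emission[s][obs]
            for s in states
        ]
    return sum(alpha)


def best_model(observations, models):
    """Return the label whose (initial, transition, emission) model scores highest."""
    return max(models, key=lambda label: forward_likelihood(observations, *models[label]))
```

For long sequences a practical implementation would work with scaled or log probabilities to avoid underflow; that refinement is omitted here for brevity.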

In step S60, the phoneme is decided from the results of the pattern analysis performed in steps S30 and S50. That is, if the phoneme-level comparison of step S50 was performed, the phoneme it judged most similar is taken as the input phoneme; if only one similar reference pattern was detected in step S40, the phoneme corresponding to the speech of that reference pattern is taken as the input phoneme.
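The decision flow of steps S40 to S60 can be summarized in a short sketch. It assumes the syllable-level distances are already computed and takes the phoneme-level stage as a caller-supplied scoring function, since the text leaves that stage's details open. The names `decide_initial`, `phoneme_score`, and `SYLLABLE_TO_INITIAL` are hypothetical, and returning `None` when no reference pattern is under the threshold is an assumption, because that case is not described.

```python
# Illustrative subset of the syllable-to-initial-consonant mapping.
SYLLABLE_TO_INITIAL = {"가": "ㄱ", "카": "ㅋ", "나": "ㄴ", "다": "ㄷ", "타": "ㅌ"}


def decide_initial(dtw_distances, threshold, phoneme_score):
    """Pick one initial consonant from syllable-level distances (S40 to S60).

    dtw_distances maps a syllable label to its distance from the input.
    phoneme_score(label) returns the second-stage score (higher is better)
    and is only called when two or more labels are under the threshold.
    """
    close = sorted((d, label) for label, d in dtw_distances.items() if d < threshold)
    labels = [label for _, label in close]
    if len(labels) >= 2:
        winner = max(labels, key=phoneme_score)  # S50: phoneme-level comparison
    elif len(labels) == 1:
        winner = labels[0]                       # S50 is skipped
    else:
        return None                              # assumption: nothing close enough
    return SYLLABLE_TO_INITIAL[winner]
```

Keeping the second stage behind a function argument mirrors the stated design goal: the costlier comparison runs only for ambiguous inputs.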

For example, if the user says '가' (ga) and step S30 finds the reference patterns for both '가' (ga) and '카' (ka) to be similar, then in step S50 only the phoneme portion of the stored speech signal is processed separately with the Hidden Markov Model, and the initial consonant 'ㄱ' that the user actually intended is decided as the recognized phoneme. If, on the other hand, the user says '나' (na) and step S30 finds only '나' as a similar reference pattern, 'ㄴ' is decided as the input directly, without going through step S50.

Once a phoneme has been decided in this way for each of the sequentially input speech signals, the phonemes are used to search for words and a final result is selected from the retrieved words (S70). Step S70 is described in more detail below with reference to FIG. 4.

Finally, the recognized result is output, for example by displaying the result retrieved in step S70 on a display device (S80).

With this approach, initial-consonant-based recognition first reduces the number of reference patterns to be compared, which saves memory and reduces computation. In addition, DTW on syllable-level patterns, which is comparatively cheap to compute, is used by default, and the Hidden Markov Model on phoneme-level patterns is used as a supplement only when accuracy demands it. Recognition accuracy is thus secured without placing an excessive load on the system.

FIG. 3 shows speech signal graphs and spectrograms for the speech recognition method according to the embodiment. In FIG. 3, (a) shows a male speech signal for '가, 다, 타' (ga, da, ta), and (b) shows a female speech signal for the same syllables. Panels (c) and (d) show, for the syllable '가' (ga), the male and female speech signals windowed with a 20 ms window, together with their spectrograms and the spectrograms of the 10th-order linear predictive coding.

Syllable-level DTW was performed on the '가, 다, 타' speech signals shown in FIG. 3 (a) and (b). In extracting feature vectors by linear predictive coding for each syllable, the LPC coefficients were computed up to the 10th order, and the DTW distances between syllables and between the male and female speakers were as shown in the following table.

As the table shows, the recognition rate was high for pairs such as '가' (ga) and '다' (da), but the DTW comparison did not achieve a high recognition rate for similar syllables such as '다' (da) and '타' (ta). Recognition accuracy can therefore be improved by additionally running a phoneme-level pattern comparison algorithm, such as a Hidden Markov Model, as a supplement.

The word search and final word selection step shown in FIG. 2 is now examined in more detail with reference to FIG. 4. FIG. 4 is a flowchart showing this step of the initial-consonant-based speech recognition method according to the embodiment in more detail.

When the phonemes of the user's speech have been recognized through steps S10 to S60 shown sequentially in FIG. 2, the control means of the speech recognition system, which corresponds to the decision unit 14 of FIG. 1, searches the database in the system for all words corresponding to the decided phonemes (S71). In a navigation device, for example, the database may be a database of destination names.
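Step S71 can be sketched as a filter over the stored words that compares each word's initial-consonant sequence with the recognized sequence. The example assumes the database is a plain list of Hangul strings and that every syllable of a word must match; a real destination database would more likely be indexed by initials in advance. The function names are hypothetical.

```python
# The 19 initial consonants in Unicode composition order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
SYLLABLES_PER_INITIAL = 21 * 28  # 588 code points per initial consonant


def initials_of(word):
    """Return the initial consonant of every Hangul syllable in the word."""
    out = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            out.append(CHOSEONG[(code - HANGUL_BASE) // SYLLABLES_PER_INITIAL])
    return "".join(out)


def search_by_initials(initials, database):
    """Return every stored word whose initial-consonant sequence matches exactly."""
    return [word for word in database if initials_of(word) == initials]
```

For instance, searching a list of destination names for the sequence 'ㅅㅇㅇㅅㅎㄱ' returns every name whose six syllables begin with those consonants in that order.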

It is then determined whether two or more words were retrieved in step S71 (S72). If only one word was retrieved, it is simply selected as the final word and output; if two or more were retrieved, the cases are distinguished so that it can be decided which of the retrieved words to select and output as the final word.

If step S72 finds that two or more words were retrieved, the retrieved words are split into simple nouns (S73). Destination names in a navigation device are very often compound nouns, such as '서울관광호텔' (Seoul Tourist Hotel). In this example a single word contains several simple nouns, '서울' (Seoul), '관광' (tourist), and '호텔' (hotel); in step S73 every retrieved word is split into simple nouns in this way.

For example, suppose the user speaks '사, 아, 아, 사, 하, 가' (sa, a, a, sa, ha, ga), the speech recognition apparatus recognizes this as 'ㅅ, ㅇ, ㅇ, ㅅ, ㅎ, ㄱ', and a total of three words are found: '서울예술학교' (Seoul Arts School), '상암예술회관' (Sangam Arts Center), and '서울여성학교' (Seoul Women's School). These can be divided into simple-noun units as '서울/예술/학교' (Seoul/Arts/School), '상암/예술/회관' (Sangam/Arts/Center), and '서울/여성/학교' (Seoul/Women/School).
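The lookup of step S71 that yields these three candidates can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the database is a plain list of Hangul strings and derives each syllable's initial consonant from its Unicode code point. The names `initial_consonants`, `search_by_initials`, and `destinations` are hypothetical.

```python
# Sketch of step S71: find every database entry whose initial consonants
# match the recognized sequence. All names here are illustrative only.

# The 19 initial consonants (choseong) in Unicode Hangul-syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
HANGUL_BASE = 0xAC00          # code point of '가'
HANGUL_LAST = 0xD7A3          # code point of '힣'
SYLLABLES_PER_INITIAL = 588   # 21 vowels * 28 final-consonant slots


def initial_consonants(word: str) -> str:
    """Return the initial consonant of each Hangul syllable in `word`."""
    result = []
    for ch in word:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            result.append(CHOSEONG[(code - HANGUL_BASE) // SYLLABLES_PER_INITIAL])
        else:
            result.append(ch)  # non-Hangul characters are kept unchanged
    return "".join(result)


def search_by_initials(recognized: str, database: list) -> list:
    """Return all database entries whose initial consonants equal `recognized`."""
    return [name for name in database if initial_consonants(name) == recognized]


# Hypothetical destination database (assumption: a simple in-memory list).
destinations = ["서울예술학교", "상암예술회관", "서울여성학교", "문경새재"]
matches = search_by_initials("ㅅㅇㅇㅅㅎㄱ", destinations)
```

A linear scan is used only for brevity; an embedded system with a large destination database would more plausibly precompute the initial-consonant string of each entry and index on it.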

After the simple nouns contained in the found words have been separated in this way, the frequency of each simple noun within the search range is detected (S74). In the same example, the total number of found words is three: two words contain '서울', two contain '예술', two contain '학교', and '상암', '회관', and '여성' are each contained in one word.
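The frequency detection of step S74 can be sketched as below. The segmentation into simple nouns (step S73) is supplied by hand here, since the description does not specify how it is performed; the dictionary `segmented` and the function `noun_frequencies` are hypothetical names.

```python
from collections import Counter

# Found words already split into simple nouns (step S73). The segmentation is
# written out by hand; a real system would need some form of noun splitter.
segmented = {
    "서울예술학교": ["서울", "예술", "학교"],
    "상암예술회관": ["상암", "예술", "회관"],
    "서울여성학교": ["서울", "여성", "학교"],
}


def noun_frequencies(segmented_words):
    """Count, for each simple noun, how many found words contain it (S74)."""
    counts = Counter()
    for nouns in segmented_words.values():
        counts.update(set(nouns))  # set(): a word counts once per noun
    return counts


freq = noun_frequencies(segmented)
total_words = len(segmented)  # size of the search range
```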

Thereafter, a step of assigning a weight to each simple noun is optionally performed (S75). Some words have high importance or search frequency but occur rarely in the database. For example, government offices and public institutions such as hospitals, fire stations, schools, district offices, and city halls exist only in limited numbers within a region, so they appear in few of the destination names in the database relative to how often they are searched for. Such words may therefore be given a separate weight.

For example, the weight for a simple noun representing a government office may be calculated by the following equation.

[Equation not reproduced in the source text]

In this equation, one quantity is the total number of words that contain a simple noun representing a public institution, and the other is the number of words that contain the public-institution simple noun appearing in the most words.

Then, a step (S76) of calculating an importance index for the found words is performed, based on the frequency detected in step S74 and the weight assigned in step S75. The importance index may be calculated by either of the following two equations.

[Two equations, left and right, not reproduced in the source text]

The quantities appearing in these equations are the importance index of each found word, the number of found words that contain a specific simple noun, the total number of found words, and the weight described above.

The equation on the left is the importance index when the weight is not considered, that is, when step S75 is not performed; the equation on the right is the importance index when the weight is considered.
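Because the equations themselves are missing from this text, the following is only a hedged reconstruction from the variable definitions and from the worked example in the description. The symbols are labels chosen here, not the patent's own notation: $I$ is the importance index of a found word, $n_i$ is the number of found words containing the word's $i$-th simple noun, $N$ is the total number of found words, $w_i$ is the weight of the $i$-th simple noun, and $k$ is the number of simple nouns in the word. The additive form of the weighted variant is an assumption, based on the remark in the description that the weights are added to obtain the total.

```latex
I = \sum_{i=1}^{k} \frac{n_i}{N}
\qquad \text{or} \qquad
I = \sum_{i=1}^{k} \left( \frac{n_i}{N} + w_i \right)
```

Under this reading, a word with three simple nouns that each occur in 2 of 3 found words has $I = 3 \times 2/3 = 2$ in the unweighted case.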

For easier understanding, consider the example above in which '사, 아, 아, 사, 하, 가' is input, and calculate the importance index of each found word, '서울예술학교', '서울여성학교', and '상암예술회관'. First, '서울예술학교' contains three simple nouns, so the corresponding count in the equation is 3. Since the frequency of '서울' is 2, the number of found words containing '서울' is 2. Calculating the value in the same way for each simple noun in '서울예술학교', namely '서울', '예술', and '학교', gives 2/3, 2/3, and 2/3, so their sum, the importance index, is 2.

Calculating the importance index of '서울여성학교' in the same way gives 2/3 + 1/3 + 2/3 = 5/3, and for '상암예술회관' it gives 1/3 + 2/3 + 1/3 = 4/3.

Therefore, of the three words above, the word with the highest importance index is '서울예술학교'.

When the weight is considered, the total value is obtained in the same way, with the weights for each word added in.
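The worked example above can be checked with a short sketch. It assumes the importance index is the sum, over a word's simple nouns, of the fraction of found words containing that noun, which is what the example's arithmetic implies. The optional `weights` argument adds a per-noun weight to each term; that additive form, the function name `importance_indices`, and any weight values passed to it are assumptions for illustration only.

```python
from fractions import Fraction

# Found words split into simple nouns (hand-written segmentation).
segmented = {
    "서울예술학교": ["서울", "예술", "학교"],
    "서울여성학교": ["서울", "여성", "학교"],
    "상암예술회관": ["상암", "예술", "회관"],
}


def importance_indices(segmented_words, weights=None):
    """Importance index per word: sum of (noun frequency / number of words).

    `weights` maps a simple noun to an additive weight (assumed form).
    """
    weights = weights or {}
    total = len(segmented_words)          # number of found words
    freq = {}
    for nouns in segmented_words.values():
        for noun in set(nouns):           # a word counts once per noun
            freq[noun] = freq.get(noun, 0) + 1
    return {
        word: sum(Fraction(freq[n], total) + weights.get(n, 0) for n in nouns)
        for word, nouns in segmented_words.items()
    }


indices = importance_indices(segmented)
best = max(indices, key=indices.get)      # word with the highest index
```

Exact fractions are used so that the results compare directly with the values 2, 5/3, and 4/3 given in the text.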

However, the equation for calculating the importance index in step S76 need not be defined as in Equation 2a above. The following equation is given as another example.

[Equation not reproduced in the source text]

The additional quantity in this equation is the number of simple nouns included in the found word.

For example, suppose the initial consonants 'ㅁ, ㄱ, ㅅ, ㅈ' are input and two words are found, '문경새재' (Mungyeongsaejae) and '면곡시장' (Myeongok Market). Under Equation 2a, '문경새재' contains a single simple noun, so its importance index is 1/2. For '면곡시장', of the two found words, one contains '면곡' and one contains '시장', so its importance index is 1/2 + 1/2 = 1. Thus, regardless of frequency, the more simple nouns a word contains, the higher its importance index may be. For this reason, the importance index may instead be calculated by dividing the sum of the calculated frequencies by the number of simple nouns in the word.
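The effect of this normalization can be illustrated with the two-word example. The sketch below assumes the alternative index is the summed index divided by the number of simple nouns in the word, as the preceding paragraph describes in words; the function name `importance` and its `normalize` flag are hypothetical.

```python
from fractions import Fraction

# '문경새재' is treated as a single simple noun; '면곡시장' as two.
segmented = {
    "문경새재": ["문경새재"],
    "면곡시장": ["면곡", "시장"],
}


def importance(segmented_words, normalize=False):
    """Summed frequency ratio per word, optionally divided by its noun count."""
    total = len(segmented_words)
    freq = {}
    for nouns in segmented_words.values():
        for noun in set(nouns):
            freq[noun] = freq.get(noun, 0) + 1
    result = {}
    for word, nouns in segmented_words.items():
        score = sum(Fraction(freq[n], total) for n in nouns)
        result[word] = score / len(nouns) if normalize else score
    return result


summed = importance(segmented)                    # favors words with more nouns
averaged = importance(segmented, normalize=True)  # removes that bias
```

With the plain sum, '면곡시장' scores twice as high as '문경새재' purely because it has two simple nouns; after dividing by the noun count, both words receive the same index.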

Then, all the importance indices calculated in step S76 are compared (S77), and the word with the highest importance index among the found words is selected as the final word (S78).

However, calculating the importance index and outputting only the finally selected word is optional. It is also possible to display all found words in Korean alphabetical (가나다) order, or to list all found words in descending order of the calculated importance index.
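The three output options described above can be sketched as one selection function. The mode names and the function `select_output` are hypothetical. Sorting by Unicode code point is used here as an approximation of 가나다 order, on the assumption that all entries consist of precomposed Hangul syllables.

```python
from fractions import Fraction

# Importance indices from the worked example in the description.
indices = {
    "서울예술학교": Fraction(2),
    "서울여성학교": Fraction(5, 3),
    "상암예술회관": Fraction(4, 3),
}


def select_output(importance, mode="best"):
    """Return the words to display for the chosen output mode."""
    if mode == "best":          # S77-S78: only the highest-index word
        return [max(importance, key=importance.get)]
    if mode == "ranked":        # all words, highest index first
        return sorted(importance, key=importance.get, reverse=True)
    if mode == "alphabetical":  # all words, code-point (approx. 가나다) order
        return sorted(importance)
    raise ValueError(f"unknown mode: {mode}")


final_word = select_output(indices, "best")
ranked = select_output(indices, "ranked")
alphabetical = select_output(indices, "alphabetical")
```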

The scope of rights of the present invention is not limited to the embodiments described above but is defined by the claims, and it is self-evident that a person of ordinary skill in the art can make various modifications and adaptations within the scope of the claims.

FIG. 1 is a block diagram showing the schematic configuration of a general speech recognition system.

FIG. 2 is a flowchart sequentially showing a speech recognition method according to an embodiment of the present invention.

FIG. 3 shows a voice signal graph and a voice signal spectrogram in the speech recognition method according to an embodiment of the present invention.

FIG. 4 is a flowchart showing in more detail the word search and final word selection steps of the speech recognition method according to an embodiment of the present invention.

** Description of reference numerals for the main parts of the drawings **

10: voice input unit 11: feature analyzer

12: pattern recognition unit 13: reference pattern storage unit

14: decision unit

Claims

An initial-consonant-based speech recognition apparatus comprising:

a voice input unit which receives a voice and converts the voice into a voice signal;

a feature analyzer for generating an input pattern by calculating a feature vector of the voice signal received from the voice input unit;

a reference pattern storage unit for storing reference patterns to be compared with the input pattern for speech recognition;

a pattern recognition unit for comparing the input pattern with the reference patterns and outputting initial consonant information corresponding to a reference pattern similar to the input pattern; and

a decision unit for determining a word to be output by searching for words corresponding to the initial consonant information detected by the pattern recognition unit,

wherein the reference patterns stored in the reference pattern storage unit are patterns generated based on syllables in which the same vowel is combined with each consonant;

wherein the pattern recognition unit first performs a syllable-unit comparison algorithm on the input pattern to detect similar reference patterns and, only when two or more similar reference patterns are detected, selectively performs a phoneme-unit comparison algorithm to finally detect one similar reference pattern; and

wherein the decision unit includes a database storing a plurality of words, searches the words stored in the database for words corresponding to the initial consonant information detected by the pattern recognition unit and outputs them, and, when two or more such words are found, calculates the importance of all the found words.

The apparatus of claim 1,

wherein the syllable-unit comparison algorithm is a Dynamic Time Warping (DTW) method;

the phoneme-unit comparison algorithm is a Hidden Markov Model method; and

the reference pattern storage unit separately stores a reference pattern for the syllable-unit comparison algorithm and a reference pattern for the phoneme-unit comparison algorithm.

The apparatus of claim 2,

wherein the pattern recognition unit calculates the squared Euclidean distance between the input pattern and each reference pattern by performing the syllable-unit comparison algorithm and, when two or more reference patterns have a calculated squared Euclidean distance smaller than a previously stored threshold distance, performs the phoneme-unit comparison algorithm.

(Deleted)

The apparatus of claim 1,

wherein the decision unit detects, within the range of found words, the frequency of the simple nouns included in each found word, and calculates the importance of each word based on the detected frequency of the simple nouns.

The apparatus of claim 6,

wherein the decision unit detects whether a simple noun with a predetermined weight is included in each found word, and calculates the importance of each word based on the frequency and the weight.

The apparatus of claim 1,

wherein the decision unit outputs the found words in descending order of calculated importance, or outputs only the word of highest importance.

An initial-consonant-based speech recognition method comprising:

(A) receiving a voice and converting it into a voice signal;

(B) generating an input pattern by calculating a feature vector of the voice signal;

(C) detecting a similar reference pattern by comparing the input pattern with previously stored reference patterns according to a syllable-unit comparison algorithm;

(D) selectively detecting one similar reference pattern according to a phoneme-unit comparison algorithm only when two or more similar reference patterns are detected in step (C); and

(E) searching for and outputting a word having initial consonants corresponding to the one similar reference pattern detected in step (C) or step (D),

wherein step (E) comprises:

(E1) searching for all words having initial consonants corresponding to the detected one similar reference pattern;

(E2) detecting, within the range of words found in step (E1), the frequency of the simple nouns included in each found word;

(E3) calculating the importance of each word based on the frequency of the simple nouns detected in step (E2); and

(E4) outputting the found words according to the importance calculated in step (E3).

The method of claim 9,

wherein the reference patterns are based on syllables in which the same vowel is combined with each consonant, and a reference pattern for the syllable-unit comparison algorithm and a reference pattern for the phoneme-unit comparison algorithm are separately generated and stored in advance.

The method of claim 10,

wherein the syllable-unit comparison algorithm is a Dynamic Time Warping (DTW) method; and

the phoneme-unit comparison algorithm is a Hidden Markov Model method.

(Deleted)

The method of claim 9,

wherein step (E3) calculates the importance of each word based on the frequency and the weight.

The method of claim 13,

wherein the weighted simple noun is a simple noun representing a public institution.