KR20030002197A

KR20030002197A - Speech recognition method using posterior distance

Info

Publication number: KR20030002197A
Application number: KR1020010038948A
Authority: KR
Inventors: 박성준
Original assignee: 주식회사 케이티
Priority date: 2001-06-30
Filing date: 2001-06-30
Publication date: 2003-01-08
Also published as: KR100560916B1

Abstract

PURPOSE: A speech recognition method using a post-recognition distance is provided to speed up word recognition and to simplify dictionary management by separating the dictionary used for the Viterbi search from the recognition target word dictionary. CONSTITUTION: The method includes the steps of initializing data while holding a word dictionary (ST-301), receiving speech data from a speaker (ST-302), finding a subword sequence through a Viterbi search (ST-303, ST-304), calculating the distance between the subword sequence and the words in the recognition target word dictionary (ST-305), obtaining a final recognition result (ST-306), and determining whether further speech data remains to be recognized (ST-307).

Description

Speech recognition method using posterior distance

The present invention relates to a speech recognition method using a post-recognition distance, and more particularly to a method that compares a sequence of concatenated subword speech units with the words in an actual word dictionary and recognizes speech using the resulting distance.

As is well known, a speech recognition system receives human speech as input and derives a recognition result through a recognizer. Various services using this technology have been developed and commercialized.

In general, speech recognition is the process of finding the word sequence that corresponds to a given input sequence of acoustic data. One approach is pattern comparison, which compares the input speech data against reference patterns and outputs the most similar reference pattern as the recognition result.

Pattern comparison has been applied in many settings and shows good recognition performance, but it stores little information. Because only one or two reference patterns are stored for each word, it cannot adequately account for the many ways a single word may be pronounced. A method that can hold more information is therefore needed.

Speech recognition based on a statistical model is such a method. In it, information is not fixed deterministically but is embedded statistically.

More specifically, the model is called an HMM (hidden Markov model). The assumption underlying the HMM, and various other kinds of statistical models, is that a speech signal can be well characterized as a parametric random process and that the parameter values can be estimated in a precise, well-defined manner.

An HMM is a set of states connected by transitions, and each transition carries two sets of probabilities. One is the transition probability, which is the probability of taking the transition. The other is the output probability density function, which gives the probability of producing a particular output.

When an HMM is used to model speech, a basic unit must be defined. Words and phonemes are the typical choices. Word models are suitable for small-vocabulary recognition but impractical for large-vocabulary recognition.

Phoneme models, on the other hand, are strongly context dependent, but the phoneme is a standard, well-understood unit that is easy to train.

FIG. 1 shows an example of an HMM that uses phonemes. Probability annotations are omitted from FIG. 1. The model recognizes "YES" and "NO", which may be separated by silence, denoted /sil. "YES" consists of '/YE' and '/S', and "NO" consists of '/N' and '/O'.

When the vocal organs produce a speech sample X, the speech recognition system does not know what acoustic sequence took place inside the speaker. To determine which word sequence W generated X, the recognition system tries to find W by reconstructing the most likely state sequence. There is a formal method for finding the single best sequence, called the Viterbi algorithm.
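The search for the single best state sequence can be illustrated with a minimal Python sketch of the Viterbi algorithm on a discrete HMM. This is an illustration under stated assumptions, not the recognizer of the invention: outputs are attached to states rather than to transitions (the common textbook form), the observation symbols are discrete, and the function name `viterbi`, the two-state model, and all probability values are hypothetical.

```python
import math

def _log(p):
    # log(0) is treated as minus infinity so impossible paths are never chosen
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_probability, best_state_path) for a discrete HMM."""
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # pick the predecessor that maximizes the score of reaching s
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s][observations[t]])
            back[t][s] = prev
    # trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

# Hypothetical two-phoneme model loosely following the "NO" branch of FIG. 1.
STATES = ["/N", "/O"]
START_P = {"/N": 0.9, "/O": 0.1}
TRANS_P = {"/N": {"/N": 0.5, "/O": 0.5}, "/O": {"/N": 0.1, "/O": 0.9}}
EMIT_P = {"/N": {"n": 0.8, "o": 0.2}, "/O": {"n": 0.1, "o": 0.9}}

log_p, path = viterbi(["n", "n", "o", "o"], STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['/N', '/N', '/O', '/O']
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which is why the sketch sums logarithms instead of multiplying probabilities.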

FIG. 2 is a flowchart of a conventional speech recognition process.

Referring to FIG. 2, the speech recognition system holds a word dictionary that contains the words and information on which models each word is composed of. In this state it initializes data (ST-201) and receives speech data from the speaker (ST-202).

Through a Viterbi search (ST-203), the recognizer then searches the word dictionary and outputs the word sequence most similar to the input acoustic sequence as the recognition result (ST-204).

With this method, however, the time spent searching the dictionary grows as the number of recognition target words increases, so the time required for recognition grows as well.

In speech recognition, the basic unit of recognition must first be defined.

In detail, words and phonemes are the typical units used in speech recognition. The phoneme is one kind of subword unit. Other subword units include syllables, two-syllable units (dyads), and acoustic units.

When words are used as the basic unit, however, many word utterances are needed for training to obtain reliable word models, and a new word model must be created every time the vocabulary changes.

The present invention has been made in view of the prior art described above. Its object is to provide a speech recognition method using a post-recognition distance that improves recognition speed through a two-pass recognition process: after speech input, the method first obtains a sequence of concatenated subword speech units and then compares this sequence with the words in an actual word dictionary, outputting the nearest word sequence as the result.

FIG. 1 shows an example of an HMM (hidden Markov model) with a simple grammar according to a conventional embodiment,

FIG. 2 is a flowchart of a speech recognition process according to a conventional embodiment,

FIG. 3 is a flowchart of a speech recognition process using a post-recognition distance according to an embodiment of the present invention.

* Description of the reference symbols for the main parts of the drawings *

ST-304: subword sequence determination step

ST-305: post-recognition distance measurement step

ST-306: word sequence determination step

To achieve the above object, a preferred embodiment of the present invention provides, in a speech recognition system for recognizing input speech data, a speech recognition method using a post-recognition distance, characterized by using together, as dictionaries for speech recognition, a subword speech unit dictionary that does not change when words are inserted or removed, and a recognition target word dictionary that can change as the words change.

Preferably, there is provided a speech recognition method using a post-recognition distance comprising the steps of: finding a subword sequence using the subword speech unit dictionary; and measuring the post-recognition distance between the subword sequence and the words in the word dictionary and taking the word sequence with the shortest distance as the recognition result.

More specifically, there is provided a speech recognition method using a post-recognition distance in which, when a subword speech unit is inserted into or deleted from the recognized subword unit sequence, the distance between that sequence and a recognition target word is calculated to obtain the final recognition result.

In addition, there is provided a speech recognition method using a post-recognition distance in which the step of generating the subword sequence obtains the result sequence using phonetic information and information imposed by a grammar.

There is also provided a speech recognition method using a post-recognition distance in which the post-recognition distance measurement uses stored information obtained by calculating the distance for each phoneme in advance with a phonetic model.

Preferably, there is provided a speech recognition method using a post-recognition distance in which the post-recognition distance measurement uses information calculated by defining in advance the distance values for the phonemes of an added word.

The present invention is described in detail below with reference to the drawings.

First, the aim of the present invention is to make word dictionaries easy to handle and to increase recognition speed in large-vocabulary speech recognition. To this end the invention uses multiple dictionaries: the first consists of the subword speech units used in the Viterbi search, and the second holds information on the recognition target words. The benefit of separate dictionaries is that recognition target words can be added and deleted easily, independently of the Viterbi search.

Recognition according to the present invention therefore proceeds in two stages. The first stage performs a Viterbi search on the input speech and outputs a sequence of subword speech units as an intermediate recognition result.

The second stage calculates the distance between the intermediate result and the words in the recognition word dictionary and outputs the nearest word sequence as the final recognition result.

FIG. 3 shows the speech recognition process according to the present invention.

In the speech recognition process according to the present invention, steps ST-301 through ST-304 correspond to steps ST-201 through ST-204 of the conventional method shown in FIG. 2. However, the first dictionary used in the present invention differs from the dictionary of FIG. 2.

The speech recognition process according to the present invention is described below.

Suppose, for example, that a recognition engine is built to recognize "아버지" ("father") and "어머니" ("mother"). The subword speech unit is assumed here to be the phoneme, although it could also be defined as the syllable or another unit. Following the conventional method, the word dictionary can be constructed as follows.

아버지 (father): ㅏ ㅂ ㅓ ㅈ ㅣ

어머니 (mother): ㅓ ㅁ ㅓ ㄴ ㅣ

Using the predefined models, the recognition engine calculates, for each word sequence, the probability that it generated the acoustic sequence, and outputs the word with the highest probability as the recognition result.

The present invention uses two dictionaries. The first is a subword unit dictionary that holds phoneme information and can be regarded as being composed as follows.

ㅏ

ㅂ

ㅓ

ㅈ

ㅣ

ㅁ

ㄴ

This dictionary must also contain various information about each model, which is omitted here because it is not the gist of the present invention.

The second dictionary is the recognition target word dictionary, composed of the phonemes in the first dictionary. In this example, the conventional word dictionary described above in connection with FIG. 2 is used as is.
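The separation of the two dictionaries can be sketched in Python as follows. This is only an illustration: the names `subword_dictionary`, `word_dictionary`, `add_word`, and `remove_word` are hypothetical, the per-model information that the description omits is represented by a `None` placeholder, and the added word "아니" is an invented example spelled with phonemes from the inventory above.

```python
# First dictionary: the subword (phoneme) inventory used by the Viterbi search.
# The per-model data is omitted in the description, so a placeholder is stored.
subword_dictionary = {p: None for p in ["ㅏ", "ㅂ", "ㅓ", "ㅈ", "ㅣ", "ㅁ", "ㄴ"]}

# Second dictionary: recognition target words spelled with those phonemes.
word_dictionary = {
    "아버지": ["ㅏ", "ㅂ", "ㅓ", "ㅈ", "ㅣ"],
    "어머니": ["ㅓ", "ㅁ", "ㅓ", "ㄴ", "ㅣ"],
}

def add_word(word, phonemes):
    """Add a target word; only the word dictionary is modified."""
    unknown = [p for p in phonemes if p not in subword_dictionary]
    if unknown:
        raise ValueError(f"phonemes not in subword dictionary: {unknown}")
    word_dictionary[word] = list(phonemes)

def remove_word(word):
    """Remove a target word; the subword dictionary is left untouched."""
    word_dictionary.pop(word, None)

# Hypothetical vocabulary change: the phoneme inventory stays the same.
add_word("아니", ["ㅏ", "ㄴ", "ㅣ"])
remove_word("어머니")
print(sorted(word_dictionary))
```

Because vocabulary changes touch only `word_dictionary`, nothing used by the first-stage search has to be rebuilt when a word is added or removed.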

In the first stage, the recognizer generates a phoneme sequence as the result of the Viterbi search (ST-304). For example, if the user says "아버지", the phoneme sequence may be <'ㅏ','ㅂ','ㅓ','ㅈ','ㅣ'>, or <'ㅏ','ㅁ','ㅓ','ㅈ','ㅣ'>, or something else.

The correct sequence is of course <'ㅏ','ㅂ','ㅓ','ㅈ','ㅣ'>, and a recognizer that performs properly will produce it. Using the relationships between phonemes or a grammar during the Viterbi search can reduce recognition time. Such information may be included in the subword unit dictionary or kept separately.

In the second stage, the distance between the phoneme sequence and the words in the second dictionary is calculated (ST-305). Because this distance is obtained from a result already recognized by the Viterbi search, it is called the 'post-recognition distance'.

If the phoneme sequence is <'ㅏ','ㅂ','ㅓ','ㅈ','ㅣ'>, the shortest distance is 0 and the word is "아버지". As another example, if the phoneme sequence is <'ㅏ','ㅁ','ㅓ','ㅈ','ㅣ'>, the distance to "아버지" is not 0. It is nevertheless likely to be shorter than the distance to any other word, so the recognition result is again "아버지".

If the distance to another word turns out to be shorter, the utterance has been misrecognized.

There are several ways to measure the distance. One example is as follows.

First, the distance from each phoneme to every other phoneme is calculated in advance. Phonetically similar phonemes will have short distances.

Second, distance values are also defined and calculated in advance for the cases where a phoneme is inserted into or deleted from a word. For insertion, the effect of the same phoneme differs depending on which word and which position it is inserted into, so the distance must be calculated with context dependency taken into account. The same applies when other subword speech units are used.

Once the distance values have been calculated in advance, the second stage of recognition uses them to compute the overall distance (ST-305) and obtains the final recognition result accordingly (ST-306).
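One possible way to combine precomputed phoneme distances with insertion and deletion costs is a weighted edit distance, sketched below in Python. The description does not prescribe this particular algorithm, so it is a stand-in chosen for illustration. All numeric values in `PHONEME_DISTANCE` are hypothetical, and the insertion and deletion cost is a single constant here, whereas the description calls for context-dependent values.

```python
# Hypothetical precomputed distances between phoneme pairs (smaller = more similar).
PHONEME_DISTANCE = {
    frozenset(("ㅂ", "ㅁ")): 0.3,
    frozenset(("ㅈ", "ㄴ")): 0.6,
    frozenset(("ㅏ", "ㅓ")): 0.4,
}

WORDS = {
    "아버지": ["ㅏ", "ㅂ", "ㅓ", "ㅈ", "ㅣ"],
    "어머니": ["ㅓ", "ㅁ", "ㅓ", "ㄴ", "ㅣ"],
}

def sub_cost(a, b):
    """Substitution cost: 0 for identical phonemes, table lookup otherwise."""
    if a == b:
        return 0.0
    return PHONEME_DISTANCE.get(frozenset((a, b)), 1.0)

def indel_cost(phoneme):
    """Insertion/deletion cost. Simplification: constant, not context dependent."""
    return 1.0

def post_recognition_distance(recognized, target):
    """Weighted edit distance between a recognized sequence and a target word."""
    n, m = len(recognized), len(target)
    # d[i][j]: cheapest way to turn recognized[:i] into target[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel_cost(recognized[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel_cost(target[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = recognized[i - 1], target[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(a, b),  # substitute or match
                d[i - 1][j] + indel_cost(a),       # drop a recognized phoneme
                d[i][j - 1] + indel_cost(b),       # supply a missing phoneme
            )
    return d[n][m]

def nearest_word(recognized, words=WORDS):
    """Return (word, distance) for the target word closest to the sequence."""
    return min(
        ((w, post_recognition_distance(recognized, p)) for w, p in words.items()),
        key=lambda item: item[1],
    )

print(nearest_word(["ㅏ", "ㅁ", "ㅓ", "ㅈ", "ㅣ"])[0])  # 아버지
```

With these assumed values, the misrecognized sequence <'ㅏ','ㅁ','ㅓ','ㅈ','ㅣ'> is still closer to "아버지" than to "어머니", which matches the behavior described in the example above.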

It is then determined whether further speech data remains to be recognized (ST-307). If so, the process returns to the speech input step. If not, the recognizer terminates.
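The overall control flow of FIG. 3 can be summarized in a short Python sketch. The function `run_recognizer` and its two callable parameters are hypothetical names: `decode_subwords` stands in for the first-stage Viterbi search and `nearest_word` for the second-stage distance comparison, both supplied by the caller.

```python
_END = object()  # sentinel marking that no speech data remains

def run_recognizer(utterances, decode_subwords, nearest_word):
    """Sketch of the FIG. 3 loop; returns one recognition result per utterance."""
    results = []                                 # ST-301: initialize data
    pending = iter(utterances)
    speech = next(pending, _END)                 # ST-302: receive speech data
    while speech is not _END:
        subwords = decode_subwords(speech)       # ST-303, ST-304: Viterbi search
        results.append(nearest_word(subwords))   # ST-305, ST-306: distance, result
        speech = next(pending, _END)             # ST-307: more data? back to input
    return results

# Usage with trivial stand-ins for the two stages (illustration only).
print(run_recognizer(["ab", "cd"], list, lambda s: "".join(s).upper()))
```

Passing the two stages as parameters mirrors the separation of the dictionaries: the second stage can be replaced or given a new word list without touching the first.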

As described above, the present invention is a two-stage speech recognition method that uses the post-recognition distance in the second stage. It provides more flexibility in handling word dictionaries and can improve the recognition speed of large-vocabulary recognition systems.

The speech recognition method using a post-recognition distance according to the embodiment of the present invention is not limited to the embodiment described above, and various modifications are possible without departing from its technical gist.

As described above, the speech recognition method using a post-recognition distance according to the present invention determines a subword sequence and then measures the post-recognition distance from that result, which yields faster recognition in large-vocabulary speech recognition. In addition, separating the dictionary for the Viterbi search from the recognition target word dictionary makes the dictionaries easier to manage.

Claims

In a speech recognition system for recognizing input speech data, a speech recognition method using a post-recognition distance, characterized by using together, as dictionaries for speech recognition:

a subword speech unit dictionary that does not change when words are inserted or removed; and

a recognition target word dictionary that can change as the words change.

In a speech recognition system for recognizing input speech data, a speech recognition method using a post-recognition distance, comprising:

finding a subword sequence using a subword speech unit dictionary; and

measuring the post-recognition distance between the subword sequence and the words in a word dictionary, and obtaining the word sequence with the shortest distance as the recognition result.

The speech recognition method using a post-recognition distance of claim 2, wherein, when a subword speech unit is inserted into or deleted from the recognized subword unit sequence, the distance between that sequence and a recognition target word is calculated to obtain the final recognition result.

The speech recognition method of claim 2, wherein the generating of the subword sequence comprises obtaining a result sequence using phonetic information and information imposed by a grammar.

The speech recognition method of claim 2, wherein the post-recognition distance measurement uses stored information obtained by calculating the distance for each phoneme in advance with a phonetic model.

The speech recognition method of claim 2, wherein the post-recognition distance measurement uses information calculated by defining in advance the distance values for the phonemes of an added word.