KR100298177B1

KR100298177B1 - Method for constructing anti-phone model and method for utterance verification based on anti-phone model

Info

Publication number: KR100298177B1
Application number: KR1019980043061A
Authority: KR
Inventors: 구명완; 김우성
Original assignee: 이계철; 한국전기통신공사
Priority date: 1998-10-14
Filing date: 1998-10-14
Publication date: 2001-08-07
Also published as: KR20000025827A

Abstract

1. Technical field of the invention

The present invention relates to a method for constructing anti-phone models in a speech recognition system, a method for utterance verification using these models, and a computer-readable recording medium storing a program that implements the methods.

2. Technical problem to be solved by the invention

The invention aims to provide a method for constructing the anti-phone models used for utterance verification, which implements the recognition-rejection function of a speech recognition system, so that training data and training time are reduced and system performance is improved. Context-dependent (CD) phones are divided into training and test sets, and an anti-phone model is trained for each CD phone on a cohort set composed of the context-independent (CI) phones derived from them. The invention also provides an utterance verification method using these models and a computer-readable recording medium storing a program that implements the methods.

3. Summary of the solution

The invention is a method for constructing anti-phone models for utterance verification in a speech recognition system, comprising: a first step of extracting the features required for speech recognition from externally input speech data, segmenting the data into context-dependent (CD) phone units, collecting and classifying the segments by phone, and dividing the classified phones into CD phones for training and CD phones for testing; a second step of collecting the training CD phones by context-independent (CI) phone and training CI phone models from them; and a third step of running a phoneme-recognition test with the test CD phones as input and the CI phone models as reference to obtain cohort sets, and training an anti-phone model for each CD phone on its cohort set.

4. Principal uses of the invention

The invention is used in speech recognition systems and the like.

Description

METHOD FOR CONSTRUCTING ANTI-PHONE MODEL AND METHOD FOR UTTERANCE VERIFICATION BASED ON ANTI-PHONE MODEL

The present invention relates to a method for constructing the anti-phone models used for utterance verification, which implements the recognition-rejection function of a speech recognition system, by mixing context-independent (CI) and context-dependent (CD) phones so as to reduce training data and training time and improve system performance; to an utterance verification method using these models; and to a computer-readable recording medium storing a program that implements the methods.

A speech recognition system operates on the assumption that only words from a predefined recognition vocabulary will be spoken. If the user, by mistake or on purpose, says something outside that vocabulary, the system still outputs one of the vocabulary words as its result and therefore misrecognizes the input.

A speech recognition system therefore needs a function that rejects, rather than recognizes, words outside the recognition vocabulary: when such a word is input, the system should report that the input is invalid instead of misrecognizing it as another word. This is called the rejection function of speech recognition.

The present invention concerns the construction of the anti-phone models used by utterance verification, one of the methods for implementing the rejection function.

Utterance verification decides whether to accept or reject a recognition result on the basis of a confidence value (confidence score or confidence measure). The confidence indicates how reliable a recognition result is: a high value means the result can be trusted and should be accepted, while a low value means the result is doubtful and should be rejected.

A widely used speech recognition method is based on hidden Markov models (HMMs). Recognition is performed by a Viterbi search, which compares the features of the input speech with HMMs trained in advance for the candidate words and selects the candidate word that matches best.
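To make the Viterbi search concrete, here is a minimal sketch, not the patent's implementation, that scores an observation sequence against one HMM in the log domain and recovers the best state path. It assumes discrete observation symbols and small hand-written probability tables for simplicity; a real recognizer scores continuous feature vectors and searches over every candidate word in the pronunciation dictionary.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf so forbidden transitions stay forbidden."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi(obs, init, trans, emit):
    """Best-path log-likelihood and state sequence of a discrete-observation HMM.

    obs:   list of observation symbol indices
    init:  init[i]     = P(first state is i)
    trans: trans[i][j] = P(state i -> state j)
    emit:  emit[i][k]  = P(symbol k | state i)
    """
    n = len(init)
    # delta[i]: best log score of any partial path ending in state i at the current frame
    delta = [safe_log(init[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n):
            # best predecessor of state j at this frame
            best_i = max(range(n), key=lambda i: delta[i] + safe_log(trans[i][j]))
            new_delta.append(delta[best_i] + safe_log(trans[best_i][j]) + safe_log(emit[j][o]))
            ptr.append(best_i)
        delta = new_delta
        backptrs.append(ptr)
    # trace back from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path

score, path = viterbi([0, 0, 1, 1], [1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]])
print(path, round(math.exp(score), 6))  # [0, 0, 1, 1] 0.124416
```

The returned log score plays the role of the likelihood that the Viterbi search uses to rank candidates; working in logs avoids underflow on long utterances.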

The confidence differs in meaning from the Viterbi search score. The Viterbi score expresses only how well the input matches a particular word or phone, whereas the confidence is a relative value: it compares the probability that the input was produced by the recognized phone or word with the probability that it was produced by other phones or words.

Computing the confidence requires a phone model and an anti-phone model.

A phone model is an HMM trained on segments of a given phone extracted from actual speech. It is the model used by ordinary HMM-based speech recognition systems.

An anti-phone model, by contrast, is an HMM trained on phones that are highly similar to the actual phone; this group of similar phones is called a cohort set.

A speech recognition system thus has both a phone model and an anti-phone model for every phone it uses.

For example, a phone /X/ has a phone model for /X/ and an anti-phone model for /X/.

The phone model for /X/ is built by extracting only the /X/ segments from a speech database and training an HMM on them in the usual way. To build the anti-phone model for /X/, the cohort set of /X/ must be found first. It can be obtained from phoneme-recognition results: run phoneme recognition, observe which phones other than /X/ are misrecognized as /X/, and collect them. If phones such as /A/, /B/ and /C/ are mostly misrecognized as /X/, they form the cohort set of /X/, and HMM training on them produces the anti-phone model for /X/.
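The cohort-set selection just described can be sketched as follows. This is an illustrative sketch under assumptions: the recognition output is given as hypothetical (true phone, recognized phone) pairs, the romanized phone labels are made up, and keeping the `top_k` most frequent confusions is only one possible selection rule, since the text does not say how many phones a cohort set contains.

```python
from collections import Counter, defaultdict

def build_cohort_sets(recognition_pairs, top_k=3):
    """For each phone X, the other phones most often misrecognized as X.

    recognition_pairs: iterable of (true_phone, recognized_phone) pairs produced
    by running phoneme recognition on labelled data (hypothetical format).
    """
    confusions = defaultdict(Counter)  # recognized phone -> counts of the true phones behind it
    for true_phone, recognized in recognition_pairs:
        if true_phone != recognized:   # correct recognitions say nothing about confusability
            confusions[recognized][true_phone] += 1
    return {phone: [p for p, _ in counts.most_common(top_k)]
            for phone, counts in confusions.items()}

pairs = [("b", "p"), ("b", "p"), ("d", "p"), ("p", "p"),
         ("t", "p"), ("t", "p"), ("t", "p"), ("p", "b")]
print(build_cohort_sets(pairs, top_k=2))  # {'p': ['t', 'b'], 'b': ['p']}
```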

Once phone models and anti-phone models have been built for all phones in this way, the confidence of an input utterance is computed as follows.

First, the phone models are searched to find the single best-matching phone.

Next, the likelihood of the input under that phone's anti-phone model is computed.

The final confidence is obtained by taking the difference between the likelihood under the phone model and the likelihood under the anti-phone model, then applying a predetermined function that maps this difference into the desired range of confidence values.

Conventionally, anti-phone models were built only from context-independent (CI) phones. A CI phone is simply the phone itself, without regard to the phones that precede and follow it, whereas a context-dependent (CD) phone is a finer unit that accounts for the way the pronunciation of a phone is affected by its neighbors.

A Korean speech recognition system typically uses about 40 to 60 CI phones and, taking memory and recognition time into account, about 300 CD phones. Ordinary speech recognition systems use CD phones for training and testing, but the anti-phone models used for utterance verification are mostly CI-level models, because CD-level anti-phone models are so numerous that they require a great deal of training data and training time.

Methods that implement speech recognition rejection through utterance verification already existed.

These conventional methods also compute a confidence value for utterance verification, which requires HMMs for the phones and HMMs for the anti-phones: the difference between the scores of the phone HMM and the anti-phone HMM is used as the confidence.

However, conventional anti-phone HMMs were built only from CI phones.

Because conventional phoneme recognition used CD phones for the phone HMMs but CI phones for the anti-phone HMMs, performance suffered. Conversely, building anti-phone models from CD phones requires a large amount of training data because there are so many CD phones, and training them on insufficient data degrades system performance.

The present invention was devised to solve these problems. Its object is to provide a method for constructing the anti-phone models used for the utterance verification that implements recognition rejection in a speech recognition system, in which CD phones are divided into training and test sets and an anti-phone model is trained for each CD phone on a cohort set composed of the CI phones derived from them, thereby reducing training data and training time and improving system performance; an utterance verification method using these models; and a computer-readable recording medium storing a program that implements the methods.

FIG. 1 is a diagram of an example configuration of a speech recognition system to which the present invention is applied.

FIG. 2 is a flowchart of an embodiment of the anti-phone model construction method according to the present invention.

FIG. 3 is a flowchart of an embodiment of the utterance verification method using the anti-phone model construction process according to the present invention.

* Reference numerals for the main parts of the drawings *

11: endpoint detector    12: feature extractor

13: Viterbi searcher    14: pronunciation dictionary

15: CD phone model database    16: utterance verifier

17: CD anti-phone model database

To achieve this object, the present invention provides a method for constructing anti-phone models for utterance verification in a speech recognition system, comprising: a first step of extracting the features required for speech recognition from externally input speech data, segmenting the data into context-dependent (CD) phone units, collecting and classifying the segments by phone, and dividing the classified phones into CD phones for training and CD phones for testing; a second step of collecting the training CD phones by context-independent (CI) phone and training CI phone models from them; and a third step of running a phoneme-recognition test with the test CD phones as input and the CI phone models as reference to obtain cohort sets, and training an anti-phone model for each CD phone on its cohort set.

The present invention also provides an utterance verification method for a speech recognition system, comprising: a first step of constructing anti-phone models by mixing context-dependent (CD) and context-independent (CI) phones; a second step of, when speech is input, detecting the endpoints of the speech to keep only the portion that contains speech and extracting from that portion the features required for speech recognition; a third step of performing speech recognition on the extracted speech data by Viterbi search with reference to a pronunciation dictionary and CD phone models; and a fourth step of performing utterance verification on the recognized speech data based on a confidence computed with reference to the CD anti-phone models and the CD phone models.

To achieve the above object, the present invention also provides a computer-readable recording medium storing a program that causes an anti-phone model construction apparatus having a processor to perform: a first function of extracting the features required for speech recognition from externally input speech data, segmenting the data into context-dependent (CD) phone units, collecting and classifying the segments by phone, and dividing the classified phones into CD phones for training and CD phones for testing; a second function of collecting the training CD phones by context-independent (CI) phone and training CI phone models from them; and a third function of running a phoneme-recognition test with the test CD phones as input and the CI phone models as reference to obtain cohort sets, and training an anti-phone model for each CD phone on its cohort set.

The present invention further provides a computer-readable recording medium storing a program that causes an utterance verification apparatus having a processor to perform: a first function of constructing anti-phone models by mixing CD and CI phones; a second function of, when speech is input, detecting the endpoints of the speech to keep only the portion that contains speech and extracting from that portion the features required for speech recognition; a third function of performing speech recognition on the extracted speech data by Viterbi search with reference to a pronunciation dictionary and CD phone models; and a fourth function of performing utterance verification on the recognized speech data based on a confidence computed with reference to the CD anti-phone models and the CD phone models.

A speech recognition system needs a function that rejects, rather than recognizes, words outside its recognition vocabulary. This is implemented through utterance verification: the system computes a confidence indicating how reliable a recognized utterance is and uses it to decide whether to accept or reject the result. Computing the confidence requires phone models and anti-phone models, and the present invention proposes an efficient way to build the anti-phone models. Whereas the conventional method builds anti-phone models from CI phones only, the invention combines CI and CD phones and builds anti-phone models at the CD level. This overcomes the drawbacks both of CI-level anti-phone models and of CD-level anti-phone models built from CD phones alone, so efficient CD anti-phone models can be built from little training data and performance improves.

The invention builds anti-phone models by mixing CI and CD phones and uses them to implement speech recognition rejection through utterance verification.

When constructing the anti-phone models, the invention divides the CD phones into training and test sets, forms cohort sets composed of CI phones, and then builds the anti-phone models. The steps are as follows.

(a) Extract the features required for speech recognition from the speech data.

(b) Segment the feature-extracted speech data into CD phone units, and collect and classify the segments by phone.

(c) Divide the classified phones into a training set and a test set.

(d) Collect the training CD phones by CI phone to obtain a set of segments for each CI phone.

(e) Train HMMs on the collected training data to build an HMM for each CI phone.

(f) Run a phoneme-recognition test using the test data collected by CD phone as input and the CI phone HMMs built in (e) as reference patterns.

(g) For each CD phone, obtain a cohort set indicating which CI phones it is misrecognized as. (Because the test in (f) uses CD phones as input and CI phones as reference patterns, the result is a cohort set of CI phones for each CD phone.)

(h) For each CD phone, train an HMM on its cohort set to build the anti-phone HMM.

For utterance verification in speech recognition rejection, the invention proceeds as follows.

(a) When speech is input, detect its endpoints and keep only the portion that contains speech.

(b) Extract the features required for speech recognition from that portion.

(c) Perform speech recognition by Viterbi search with reference to the pronunciation dictionary and the CD phone models.

(d) Compute the confidence with reference to the CD anti-phone models and the CD phone models, and perform utterance verification with it.

(e) Compare the confidence obtained in (d) with a threshold: if the confidence is greater than the threshold, accept the recognition result; if it is smaller, reject it.

These objects, features and advantages will become clearer from the following detailed description with reference to the accompanying drawings. A preferred embodiment of the present invention is described in detail below.

FIG. 1 is a diagram of an example configuration of a speech recognition system to which the present invention is applied.

The configuration and operation of the speech recognition system itself are well known in the art, so their detailed description is omitted here; only the process of applying the rejection function to input speech is described in more detail.

When speech is input, the endpoint detector 11 first finds the speech interval, excluding the silence before and after it. The feature extractor 12 then extracts speech features from the signal in that interval.
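The patent treats endpoint detection as known art and does not name an algorithm. As an assumption for illustration only, a simple frame-energy rule can stand in for endpoint detector 11: frames whose mean energy exceeds a threshold count as speech, and the span from the first to the last such frame is returned. The frame length and threshold below are arbitrary example values, not figures from the source.

```python
def detect_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices spanning the frames whose mean energy exceeds threshold.

    Returns None when no frame is above the threshold (no speech found).
    A trailing partial frame is ignored.
    """
    speech_frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            speech_frames.append(start)
    if not speech_frames:
        return None
    # keep everything from the first to the last speech frame, trimming leading/trailing silence
    return speech_frames[0], speech_frames[-1] + frame_len

signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 320
print(detect_endpoints(signal))  # (320, 800)
```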

Next, the Viterbi searcher 13 uses the speech features to select the words with the highest likelihood among the words registered in the pronunciation dictionary 14, which are built from the models in the CD phone model database 15.

The utterance verifier 16 then uses the word selected by the Viterbi searcher 13 to segment the feature sequence into phone units and computes a phone-level likelihood ratio confidence score with the anti-phone models. The phone-level confidence $\delta(CM_p)$ is given by Equation 1.

In Equation 1, α and β are constants (α = 1, β = 0), and LLR is the log-likelihood ratio $LLR = \log P(O_e \mid \lambda^c) - \log P(O_e \mid \lambda^a)$, where $P(O_e \mid \lambda^a)$ is the likelihood under the anti-phone model and $P(O_e \mid \lambda^c)$ is the likelihood under the phone model.
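The exact form of Equation 1 is not given in the text above. A common choice for such a score, assumed here and not confirmed by the source, passes the LLR through a sigmoid with slope α and offset β; with the stated values α = 1 and β = 0 it reduces to the plain logistic function of the LLR. The sketch evaluates it in a numerically stable way.

```python
import math

def phone_confidence(log_p_phone, log_p_anti, alpha=1.0, beta=0.0):
    """Phone-level confidence from phone- and anti-phone-model log-likelihoods.

    Assumed form: sigmoid(alpha * (LLR - beta)),
    with LLR = log P(O|lambda^c) - log P(O|lambda^a).
    """
    llr = log_p_phone - log_p_anti
    z = alpha * (llr - beta)
    # evaluate the logistic function without overflowing math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(phone_confidence(-120.0, -120.0))  # 0.5
```

Equal likelihoods under both models give 0.5; the score moves toward 1 when the phone model explains the segment better than the anti-phone model, and toward 0 otherwise.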

For word-level utterance verification, the word-level confidence $\delta(CM_w)$ is given by Equation 2.

In Equation 2, N is the number of phones in the word. The word is accepted if $\log \delta(CM_w)$ exceeds a predetermined threshold and rejected otherwise.
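The form of Equation 2 is likewise not shown. Assuming, for illustration only, that it averages the log phone confidences over the N phones of the word (a geometric mean of the phone confidences), the word-level decision can be sketched as below; the threshold is a free parameter chosen by the system designer.

```python
import math

def word_log_confidence(phone_confidences):
    """Log word confidence: mean of the log phone confidences over the N phones (assumed form)."""
    n = len(phone_confidences)
    # clamp to avoid log(0) when a phone confidence underflows to zero
    return sum(math.log(max(c, 1e-300)) for c in phone_confidences) / n

def verify_word(phone_confidences, threshold):
    """Accept the word if its log confidence exceeds the threshold, otherwise reject it."""
    return "accept" if word_log_confidence(phone_confidences) > threshold else "reject"

print(verify_word([0.9, 0.8, 0.95], math.log(0.5)))   # accept
print(verify_word([0.9, 0.05, 0.95], math.log(0.5)))  # reject
```

Averaging over N keeps long and short words on a comparable scale, so one threshold can serve the whole vocabulary; a single badly matched phone still pulls the average down enough to trigger rejection in the second call.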

If the word is rejected, the utterance verifier 16 repeats the verification process described above for the next candidate word.
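The candidate-by-candidate loop can be sketched as follows, assuming (hypothetically) that each candidate from the Viterbi search already carries its word-level log confidence. Returning None when every candidate fails models rejecting the whole input; the text does not spell out that final case, so it is an assumption.

```python
def verify_nbest(candidates, threshold):
    """Return the first candidate word that passes utterance verification, or None.

    candidates: (word, log_confidence) pairs in Viterbi-score order, best first.
    """
    for word, log_conf in candidates:
        if log_conf > threshold:
            return word   # accepted
        # rejected: fall through to the next candidate
    return None           # every candidate rejected -> reject the input

print(verify_nbest([("seoul", -2.1), ("busan", -0.3)], threshold=-0.7))  # busan
```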

When sentences are recognized, the same utterance verification process applies, with only a grammar added, so that verification is performed at the sentence level.

FIG. 2 is a flowchart of an embodiment of the method for constructing anti-phone models for utterance verification according to the present invention.

Unlike the conventional method, which builds anti-phone models from CI phones only, the method of the invention combines CI and CD phones and builds anti-phone models at the CD-phone level. This overcomes the drawbacks of both CI-level anti-phone models and CD-level anti-phone models built from CD phones alone, so efficient CD anti-phone models can be built with little training data.

To build CD-level anti-phone models efficiently by combining CI and CD phones, the invention first collects phone segments by CD phone and divides them into training and test sets. Because a CD phone is a CI phone subdivided according to its surrounding context, CD phones can be pooled by CI phone.
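Pooling CD phones by their CI phone amounts to stripping the context from each label. The sketch below assumes HTK-style triphone labels of the form `left-centre+right` (for example `a-n+i`); that notation and the segment identifiers are assumptions for illustration, not something the patent specifies.

```python
from collections import defaultdict

def ci_phone(cd_phone):
    """Strip left/right context from an HTK-style triphone label, e.g. 'a-n+i' -> 'n'."""
    centre = cd_phone.split("-")[-1]   # drop the left context, if any
    return centre.split("+")[0]        # drop the right context, if any

def group_by_ci(cd_tokens):
    """Pool (cd_phone, segment) tokens under their context-independent centre phone."""
    pooled = defaultdict(list)
    for cd_phone, segment in cd_tokens:
        pooled[ci_phone(cd_phone)].append(segment)
    return dict(pooled)

tokens = [("a-n+i", "s1"), ("o-n+a", "s2"), ("n-a+sil", "s3")]
print(group_by_ci(tokens))  # {'n': ['s1', 's2'], 'a': ['s3']}
```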

The training data are therefore regrouped by CI phone, and HMM training on these groups yields CI-level HMMs. When the CI HMMs are then used as the reference patterns of a phoneme recognizer and a phoneme-recognition test is run with CD phones as input, the test shows which CI phones each CD phone is misrecognized as. This result is precisely the cohort set of each CD phone, composed of CI phones.

Although a cohort set is generated for each CD phone, its members are CI phones, so it does not require as much training data as CD phones would.

Finally, HMM training on each cohort set produces the anti-phone model for the corresponding CD phone.

Because this method trains each CD anti-phone model on CI phones, it achieves good performance with little training data, and it also performs better than CI-level anti-phone models.

To evaluate its own performance, a speech recognition system is always built with separate training and test data. The test data are needed to check whether training was successful, so the same data cannot be used for both training and testing. The training-to-test ratio is generally 8:2, but this is a rule of thumb that developers set based on the recognition vocabulary, development time and similar factors; there is no fixed standard.

The method of building an anti-phoneme model for utterance verification according to the present invention is described in detail below.

As shown in FIG. 2, the method of building an anti-phoneme model for utterance verification according to the present invention first extracts the features required for speech recognition from the input speech data (201).
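The patent does not specify which features are extracted. The following sketch only illustrates the framing stage with one simple feature, per-frame log energy. It assumes 16 kHz audio with 25 ms frames (400 samples) and a 10 ms hop (160 samples). A real recognizer would typically compute richer spectral features such as MFCCs on the same frames.

```python
import math
from typing import List, Sequence

def frame_log_energy(samples: Sequence[float], frame_len: int = 400, hop: int = 160) -> List[float]:
    """Return the log energy of each complete frame (frame_len samples, advancing by hop)."""
    energies: List[float] = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log(energy + 1e-10))  # small floor avoids log(0) on digital silence
    return energies
```

Partial frames at the end of the signal are dropped in this sketch.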

Next, the feature-extracted speech data are segmented into CD phoneme units, and the segments are gathered and classified by phoneme (202). The classified phonemes are then divided into training and test sets (203).
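Step 202 amounts to regrouping a phoneme-level segmentation by label. The following sketch assumes each utterance is already aligned as a list of `(CD label, start frame, end frame)` tuples. That representation and the name `group_by_phoneme` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[str, int, int]  # (CD phoneme label, start frame, end frame), assumed format

def group_by_phoneme(alignments: Sequence[Sequence[Segment]]) -> Dict[str, List[Tuple[int, int, int]]]:
    """Gather segments that share a CD label across all utterances (step 202)."""
    groups: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for utt_id, utterance in enumerate(alignments):
        for label, start, end in utterance:
            groups[label].append((utt_id, start, end))  # keep the source utterance for lookup
    return dict(groups)
```

Each resulting list can then be divided into training and test parts (step 203).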

Next, the training CD phonemes are gathered by CI phoneme to collect phoneme sets in CI phoneme units (204). HMM training is run on the collected training data to build an HMM for each CI phoneme (206).
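Step 204 requires mapping each CD phoneme to its CI base phoneme. The following sketch assumes HTK-style triphone labels of the form `left-center+right`; the patent does not state a label format. It pools the training segments of all CD phonemes that share a center phoneme. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, TypeVar

T = TypeVar("T")

def base_phone(cd_label: str) -> str:
    """Strip the context from an HTK-style label: 'l-c+r' -> 'c' (assumed label format)."""
    center = cd_label.split("-")[-1]  # drop the left context, if present
    return center.split("+")[0]       # drop the right context, if present

def pool_by_ci(cd_train: Dict[str, List[T]]) -> Dict[str, List[T]]:
    """Merge the training segments of all CD phonemes that share a CI base (step 204)."""
    pooled: Dict[str, List[T]] = defaultdict(list)
    for cd_label, segments in cd_train.items():
        pooled[base_phone(cd_label)].extend(segments)
    return dict(pooled)
```

The pooled lists are the per-CI training sets on which the CI HMMs would be trained (step 206).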

Then a phoneme recognition test is run (207), using the test data collected in CD phoneme units as input and the CI phoneme HMMs as reference patterns. This yields a similar-phoneme set indicating which CI phonemes each CD phoneme is misrecognized as (208). Because the input to the test is CD phonemes and the reference patterns are CI phonemes, the similar-phoneme set obtained for each CD phoneme consists of CI phonemes.
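Steps 207 and 208 can be illustrated by tallying the recognizer's output. The following sketch assumes the test produces `(CD label, recognized CI label)` pairs. A recognition counts as a misrecognition when the recognized CI phoneme differs from the CD phoneme's own base phoneme. The `min_count` cutoff for rare confusions is a hypothetical knob that the patent does not mention.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

def similar_phoneme_sets(
    results: Iterable[Tuple[str, str]],
    base_of: Dict[str, str],
    min_count: int = 1,
) -> Dict[str, Set[str]]:
    """Return, per CD phoneme, the CI phonemes it was misrecognized as.

    results:   (cd_label, recognized_ci_label) pairs from the phoneme recognition test
    base_of:   maps each CD label to its own CI base phoneme (i.e. the correct answer)
    min_count: confusions observed fewer times than this are dropped (hypothetical)
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for cd_label, recognized in results:
        if recognized != base_of[cd_label]:  # keep only misrecognitions
            counts[cd_label][recognized] += 1
    return {cd: {ci for ci, n in c.items() if n >= min_count} for cd, c in counts.items()}
```

A CD phoneme that was never misrecognized gets no entry in the returned mapping.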

Finally, for each CD phoneme, HMM training is performed on its similar-phoneme set to build the anti-phoneme HMM (209).
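Before the HMM training in step 209, the training data for each CD anti-phoneme model must be assembled from the CI phonemes in its similar-phoneme set. The following sketch shows only that data-assembly stage; the HMM training itself is not shown. It assumes the per-CI training pools from step 204 are available as a dictionary. The function name is hypothetical.

```python
from typing import Dict, List, Set, TypeVar

T = TypeVar("T")

def anti_training_data(similar: Dict[str, Set[str]], ci_pool: Dict[str, List[T]]) -> Dict[str, List[T]]:
    """Collect, per CD phoneme, the CI training segments of every member of its similar set."""
    return {
        cd: [seg for ci in sorted(members) for seg in ci_pool.get(ci, [])]  # sorted for determinism
        for cd, members in similar.items()
    }
```

Because each CD anti-phoneme model draws on whole CI pools, it can reuse plentiful CI data. This is the data-saving effect the description claims.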

FIG. 3 is a flowchart of an embodiment of the utterance verification method that uses the anti-phoneme model construction process according to the present invention.

As shown in FIG. 3, the utterance verification method according to the present invention uses the process of building CD-unit anti-phoneme models by combining CI and CD phonemes (see FIG. 2). When speech is input, the endpoint detector (11) first detects the endpoints of the speech and extracts only the portion in which speech is present (301).
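The patent does not describe how the endpoint detector (11) works. The following sketch uses a simple energy-threshold rule as a stand-in, which is an assumption. A frame is speech if its log energy exceeds a threshold. The speech region runs from the first to the last run of at least `min_frames` consecutive speech frames, so isolated noise spikes are ignored.

```python
from typing import List, Optional, Sequence, Tuple

def detect_endpoints(log_energy: Sequence[float], threshold: float,
                     min_frames: int = 3) -> Optional[Tuple[int, int]]:
    """Return inclusive (first, last) frame indices of the speech region, or None if no speech."""
    runs: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, e in enumerate(log_energy):
        if e > threshold:
            if run_start is None:
                run_start = i                   # a speech run begins
        elif run_start is not None:
            runs.append((run_start, i - 1))     # the speech run ended at the previous frame
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(log_energy) - 1))
    long_runs = [(s, t) for s, t in runs if t - s + 1 >= min_frames]  # drop short spikes
    if not long_runs:
        return None
    return long_runs[0][0], long_runs[-1][1]
```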

Then the feature extractor (12) extracts the features required for speech recognition from the extracted speech portion (302). The Viterbi searcher (13) performs speech recognition on the extracted speech data through a Viterbi search, referring to the pronunciation dictionary (14) and the CD phoneme models (15) (303).
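The core of the Viterbi searcher (13) is the Viterbi recursion over HMM states. The following generic, log-domain sketch decodes a single HMM from precomputed per-frame emission log-likelihoods. It does not show the full search over word models built from the pronunciation dictionary. The argument layout is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple

def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Return the best path log score and state sequence.

    log_init[j]:     log P(first state = j)
    log_trans[i][j]: log P(state j at t | state i at t-1)
    log_emit[t][j]:  log-likelihood of frame t under state j
    """
    n_states = len(log_init)
    score = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_score, ptrs = [], []
        for j in range(n_states):
            # best predecessor of state j at frame t
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    last = max(range(n_states), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backptr):  # trace the best path backwards
        path.append(ptrs[path[-1]])
    path.reverse()
    return score[last], path
```

The per-state frame alignment that this recursion produces is what a verifier would use to obtain phoneme segment boundaries.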

Next, the utterance verifier (16) computes a confidence score for the recognized speech data by referring to the CD anti-phoneme models (17) and the CD phoneme models (15), and performs utterance verification based on it (304). The confidence value is compared with a predetermined threshold (305). If it is greater than the threshold, the recognition result is accepted (306); otherwise, the recognition result is rejected (307).
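The patent does not give the confidence formula. The following sketch assumes a common choice: a duration-normalized log-likelihood ratio between each recognized phoneme's CD model and its CD anti-phoneme model, averaged over the phonemes of the utterance. The decision follows steps 305 to 307, accepting only when the score is strictly greater than the threshold.

```python
from typing import Sequence, Tuple

# (log-likelihood under CD phoneme model, log-likelihood under CD anti-phoneme model, frame count)
PhoneScore = Tuple[float, float, int]

def utterance_confidence(segments: Sequence[PhoneScore]) -> float:
    """Average per-frame log-likelihood ratio over recognized phonemes (assumed measure)."""
    if not segments:
        raise ValueError("no recognized phoneme segments")
    per_phone = [(lp - la) / n for lp, la, n in segments]
    return sum(per_phone) / len(per_phone)

def verify(segments: Sequence[PhoneScore], threshold: float) -> bool:
    """Accept (True) when confidence > threshold, otherwise reject (steps 305-307)."""
    return utterance_confidence(segments) > threshold
```

Normalizing by frame count keeps long phonemes from dominating the score. The threshold would be tuned on held-out data.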

The method of the present invention described above may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).

Those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes to the invention described above without departing from its technical spirit. The invention is therefore not limited to the embodiments described above or the accompanying drawings.

As described above, the present invention builds anti-phoneme models from a combination of CI and CD phonemes, so anti-phonemes can be modeled more accurately. Because the CD models are trained with CI data, efficient CD anti-phoneme models can be built even from a small amount of training data. This improves the system's utterance verification performance compared with using conventional CI-unit anti-phoneme models or building CD-unit anti-phoneme models from CD phonemes alone.

Claims

In a method of building an anti-phoneme model for utterance verification applied to a speech recognition system,

A first step of extracting the features required for speech recognition from externally input speech data, segmenting the data into context-dependent (CD) phoneme units, gathering and classifying the segments by phoneme, and dividing the classified phonemes into training context-dependent (CD) phonemes and test context-dependent (CD) phonemes;

A second step of collecting the training context-dependent phonemes in context-independent (CI) phoneme units and building context-independent (CI) phoneme models through training; and

A third step of running a phoneme recognition test based on the test context-dependent (CD) phonemes and the context-independent (CI) phoneme models to obtain similar-phoneme sets, and building an anti-phoneme model for each context-dependent (CD) phoneme through training on its similar-phoneme set,

An anti-phoneme model construction method in a speech recognition system, comprising the above steps.

The method of claim 1,

Wherein the second step comprises:

A fourth step of gathering the training context-dependent (CD) phonemes by context-independent (CI) phoneme to collect phoneme sets in CI phoneme units; and

A fifth step of building hidden Markov models (HMMs) for the context-independent (CI) phonemes through HMM training on the collected training data.

The method according to claim 1 or 2,

Wherein the third step comprises:

A sixth step of running the phoneme recognition test using the test data collected in context-dependent (CD) phoneme units as input and the hidden Markov models (HMMs) for the context-independent (CI) phonemes as reference patterns, thereby obtaining a similar-phoneme set that indicates which context-independent (CI) phonemes each context-dependent (CD) phoneme is misrecognized as; and

A seventh step of building hidden Markov models for the anti-phonemes by training hidden Markov models on the similar-phoneme set of each context-dependent (CD) phoneme.

In an utterance verification method applied to a speech recognition system,

A first step of building anti-phoneme models by combining context-dependent (CD) phonemes and context-independent (CI) phonemes;

A second step of detecting the endpoints of the speech upon speech input, extracting only the portion in which speech is present, and extracting the features required for speech recognition from the extracted speech portion;

A third step of performing speech recognition on the extracted speech data by referring to a pronunciation dictionary and context-dependent (CD) phoneme models; and

A fourth step of performing utterance verification on the recognized speech data based on a confidence score obtained by referring to the context-dependent (CD) anti-phoneme models and the context-dependent (CD) phoneme models,

An utterance verification method in a speech recognition system, comprising the above steps.

The method of claim 4,

Wherein the first step comprises:

A fifth step of extracting the features required for speech recognition from externally input speech data, segmenting the data into context-dependent (CD) phoneme units, gathering and classifying the segments by phoneme, and dividing the classified phonemes into training context-dependent (CD) phonemes and test context-dependent (CD) phonemes;

A sixth step of collecting the training context-dependent phonemes in context-independent (CI) phoneme units and building context-independent (CI) phoneme models through training; and

A seventh step of running a phoneme recognition test based on the test context-dependent (CD) phonemes and the context-independent (CI) phoneme models to obtain similar-phoneme sets, and building an anti-phoneme model for each context-dependent (CD) phoneme through training on its similar-phoneme set,

An utterance verification method in a speech recognition system, comprising the above steps.

In an anti-phoneme model construction apparatus having a processor, for building anti-phoneme models,

A first function of extracting the features required for speech recognition from externally input speech data, segmenting the data into context-dependent (CD) phoneme units, gathering and classifying the segments by phoneme, and dividing the classified phonemes into training context-dependent (CD) phonemes and test context-dependent (CD) phonemes;

A second function of collecting the training context-dependent phonemes in context-independent (CI) phoneme units and building context-independent (CI) phoneme models through training; and

A third function of running a phoneme recognition test based on the test context-dependent (CD) phonemes and the context-independent (CI) phoneme models to obtain similar-phoneme sets, and building an anti-phoneme model for each context-dependent (CD) phoneme through training on its similar-phoneme set,

A computer-readable recording medium having recorded thereon a program for realizing the above functions.

In an utterance verification apparatus having a processor, for verifying utterances,

A first function of building anti-phoneme models by combining context-dependent (CD) phonemes and context-independent (CI) phonemes;

A second function of detecting the endpoints of the speech upon speech input, extracting only the portion in which speech is present, and extracting the features required for speech recognition from the extracted speech portion;

A third function of performing speech recognition on the extracted speech data through a Viterbi search that refers to the pronunciation dictionary and the context-dependent (CD) phoneme models; and

A fourth function of performing utterance verification on the recognized speech data based on a confidence score obtained by referring to the context-dependent (CD) anti-phoneme models and the context-dependent (CD) phoneme models