KR100577515B1

KR100577515B1 - Method for choosing gaussian mixture number of hmm

Info

Publication number: KR100577515B1
Application number: KR1019990050273A
Authority: KR
Inventors: 김호경; 구명완
Original assignee: KT Corporation (주식회사 케이티)
Priority date: 1999-11-12
Filing date: 1999-11-12
Publication date: 2006-05-10
Also published as: KR20010046482A

Abstract

1. Technical field of the invention described in the claims

The present invention relates to a method for setting the number of Gaussian mixtures of hidden Markov model (HMM) parameters in a speech recognition system, and to a computer-readable recording medium storing a program for implementing the method.

2. Technical problem to be solved by the invention

The present invention aims to provide a method for setting the number of Gaussian mixtures of HMM parameters that increases the number of mixtures during training in order to find the optimal number of mixtures, and a computer-readable recording medium storing a program for implementing the method.

3. Summary of the solution of the invention

The present invention comprises: a first step of initializing the hidden Markov model (HMM) parameters and the number of mixtures; a second step of obtaining the HMM parameters for the current number of mixtures; a third step of changing the number of mixtures until it reaches a predetermined number, repeating from the second step; and a fourth step of testing the speech recognition rate with the HMM parameters obtained for each number of mixtures and setting the number of mixtures based on the results.

4. Important uses of the invention

The present invention is used in speech recognition systems and the like.

Number of mixtures, hidden Markov model parameters, speech recognition, recognition rate

Description

Method for setting the number of Gaussian mixtures of hidden Markov model parameters {METHOD FOR CHOOSING GAUSSIAN MIXTURE NUMBER OF HMM}

FIG. 1 is an exemplary configuration diagram of a typical speech recognition system.

FIG. 2 illustrates an example of the speech parameter training process in a typical speech recognition system.

FIG. 3 illustrates an example of the speech recognition process in a typical speech recognition system.

FIG. 4 is a flowchart of a conventional speech parameter training process in which the number of mixtures is specified in advance.

FIG. 5 is a flowchart of an embodiment of the method for setting the number of Gaussian mixtures of HMM parameters according to the present invention.

FIG. 6 shows an example of the recognition rates of CDP models trained by expanding from each number of CIP mixtures, as used in the present invention.

*Explanation of reference numerals for the main parts of the drawings*

11: endpoint detector    12: feature extractor

13: pattern comparator    14: reference pattern unit

The present invention relates to a method for setting the number of Gaussian mixtures of hidden Markov model (HMM) parameters and to a computer-readable recording medium storing a program for implementing the method, and more particularly to a method for setting the number of Gaussian mixtures of the HMM parameters used in speech parameter training in a speech recognition system or the like, and to a computer-readable recording medium storing a program for implementing that method.

FIG. 1 is an exemplary configuration diagram of a typical speech recognition system.

A speech recognition system to which the present invention is applied operates as follows.

First, when a user's speech is input, the endpoint detector 11 finds the speech section, excluding the silent sections before and after the speech. Next, the feature extractor 12 extracts speech features from the signal of that speech section. The pattern comparator 13, which performs the actual recognition, compares the extracted feature data with the reference pattern data stored in advance in the reference pattern unit 14, finds the most similar reference pattern, and takes it as the recognition result.
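The endpoint detection stage can be illustrated with a minimal sketch. It assumes a simple fixed energy threshold applied to non-overlapping frames; the function names, frame length, and threshold are hypothetical, and practical endpoint detectors are considerably more elaborate. Feature extraction and pattern comparison are not shown.

```python
def frame_energies(samples, frame_len):
    """Mean energy of consecutive non-overlapping frames (a trailing partial frame is dropped)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(energies, threshold):
    """Return (first, last) indices of frames whose energy exceeds the threshold,
    i.e. the speech section with leading and trailing silence removed.
    Returns None if no frame exceeds the threshold (assumed fixed-threshold rule)."""
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return None
    return voiced[0], voiced[-1]
```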

The scheme proposed by the present invention concerns the reference pattern unit 14 of FIG. 1; HMM parameters are one kind of reference pattern. The present invention can thus be regarded as relating to the generation of reference patterns that raise the recognition rate.

Speech recognition in a speech recognition system can be divided into two main stages. The first is a training process that generates the parameters used for recognition, and the second is the actual recognition of input speech. The hidden Markov model (HMM) parameters are generated during training and used later during recognition.

The HMM is widely used in speech recognition systems, and the parameters of each phoneme unit that make it up can be built with either discrete output probabilities or continuous output probabilities.

Each phoneme's HMM parameters consist of state transition probabilities and observation probabilities, and the two approaches differ in how the observation probabilities are represented. A discrete HMM, which has discrete output probabilities, uses a codebook: the probabilities are expressed over as many representative values as the codewords defined in the codebook.
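The codebook lookup of a discrete HMM can be sketched as follows, assuming nearest-codeword vector quantization by squared Euclidean distance. The function names and the per-state probability table are illustrative and are not taken from the patent.

```python
def quantize(vector, codebook):
    """Index of the codeword nearest to the feature vector (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])),
    )


def discrete_output_prob(vector, codebook, state_probs):
    """Observation probability of one discrete-HMM state: the table entry for the
    codeword that the feature vector is quantized to."""
    return state_probs[quantize(vector, codebook)]
```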

In contrast, in the continuous HMM to which the present invention applies, the observation probability is modeled as a continuous probability density using Gaussian distribution functions, each expressed by a mean and a variance. The number of Gaussian functions used in this representation affects recognition performance; it is called the number of mixtures.
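For the continuous HMM, the observation density of state j is usually written as a weighted sum of M Gaussians, b_j(o) = Σ_{m=1..M} c_jm N(o; μ_jm, Σ_jm), with mixture weights c_jm summing to 1; M is the number of mixtures discussed here. The sketch below evaluates this density in the log domain, assuming diagonal covariances (a common choice that the patent does not state). The function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm_density(x, weights, means, variances):
    """log b_j(x) = log sum_m c_m N(x; mu_m, diag(var_m)).
    len(weights) is the number of mixtures; log-sum-exp avoids underflow."""
    terms = [
        math.log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```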

FIGS. 2 and 3 are example flowcharts of the speech parameter training process and the speech recognition process when HMM parameters are used in a speech recognition system.

FIG. 2 illustrates an example of the speech parameter training process in a typical speech recognition system.

In the speech parameter training process, training speech files are first read from the training speech file database and their features are extracted (201). Features are extracted frame by frame.

Phoneme segmentation information is received from the phoneme segmentation information database, and the HMM phoneme parameters are initialized (202).

A pronunciation dictionary is generated from the basic phoneme database (203) and stored in the pronunciation dictionary database.

HMM context-independent phoneme parameter training is then performed (204) using the data from the pronunciation dictionary database, the features extracted from the training speech files, and the initialized HMM context-independent phoneme parameters obtained from the phoneme segmentation information. This training produces context-independent (CI) parameters.

In addition, the context-independent phoneme parameters are expanded into initialized HMM context-dependent phoneme parameters (205), and HMM context-dependent (CD) phoneme parameter training is performed (206) using these together with the data from the pronunciation dictionary database and the features extracted from the training speech files.

The HMM parameters are generated through the speech parameter training process described above.

FIG. 3 illustrates an example of the speech recognition process in a typical speech recognition system.

FIG. 3 shows how a speech recognition system performing actual recognition recognizes speech data.

The speech recognition system reads speech data from the recognition speech database and extracts its features (301).

Using the features extracted from the speech data, speech recognition is performed with the pronunciation dictionary, the HMM parameters, and the grammar information (302).

The pronunciation dictionary is generated from the basic phoneme database, as described for FIG. 2. For example, when the entry "할아버지" (grandfather) is stored, the dictionary shows that it consists of the phonemes "ㅎ+ㅏ+ㄹ+ㅇ+ㅏ+ㅂ+ㅓ+ㅈ+ㅣ".

The HMM parameters characterize each phoneme; for example, they contain probability values representing the characteristics of "ㅎ", probability values representing the characteristics of "ㅏ", and so on.
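A minimal sketch of how the pronunciation dictionary and the per-phoneme HMM parameters fit together: the dictionary maps a word to its phoneme sequence, and a word model is formed by concatenating the phoneme models in that order. The dictionary entry is the example from the text; the data layout and the `word_model` helper are hypothetical.

```python
# Example entry from the text; the dictionary layout is a hypothetical illustration.
PRONUNCIATION_DICT = {
    "할아버지": ["ㅎ", "ㅏ", "ㄹ", "ㅇ", "ㅏ", "ㅂ", "ㅓ", "ㅈ", "ㅣ"],
}


def word_model(word, phoneme_models):
    """Concatenate per-phoneme HMM parameters in pronunciation order to form a word model.
    phoneme_models maps each phoneme to its (hypothetical) HMM parameter object."""
    return [phoneme_models[p] for p in PRONUNCIATION_DICT[word]]
```

Both occurrences of ㅏ in the example refer to the same phoneme model.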

Grammar information lets the system enumerate the candidate words that can occur at the current step (state), because the content of the input speech is determined by the scenario flow of the application in which recognition is performed. For example, in a speech recognition system for train ticket reservation, the probability of any word other than a station name is zero at the step where the departure station is entered, and the probability of any word other than hours and minutes is zero at the step where the departure time is entered. The grammar information holds the probability that each candidate word appears during recognition.
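The role of the grammar information can be sketched as a per-state table of word probabilities combined with acoustic scores. The state names, vocabulary entries, probabilities, and the additive log-domain scoring are illustrative assumptions; the patent does not specify how grammar and acoustic scores are combined.

```python
import math

# Hypothetical grammar for the train-ticket example:
# words absent from a state's table have probability 0 in that state.
GRAMMAR = {
    "departure_station": {"Seoul": 0.5, "Busan": 0.5},
    "departure_time": {"10 o'clock": 0.5, "30 minutes": 0.5},
}


def word_prior(state, word):
    """Probability of a candidate word in the current dialogue state."""
    return GRAMMAR.get(state, {}).get(word, 0.0)


def recognize(state, acoustic_log_likelihoods):
    """Pick the word maximizing acoustic log-likelihood + log grammar probability.
    Words the grammar forbids in this state (probability 0) are never chosen."""
    scores = {
        word: ll + math.log(word_prior(state, word))
        for word, ll in acoustic_log_likelihoods.items()
        if word_prior(state, word) > 0.0
    }
    return max(scores, key=scores.get) if scores else None
```

For example, recognize("departure_station", {"Seoul": -10.0, "Busan": -12.0, "10 o'clock": -5.0}) returns "Seoul": "10 o'clock" has the best acoustic score but zero probability at the departure-station step.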

A speech recognition system thus has a general speech parameter training process and a speech recognition process, and the training process includes an HMM parameter setting step aimed at raising the recognition rate.

The present invention concerns efficiently setting the number of Gaussian mixtures of the HMM parameters during this setting step. Conventionally, the operator of the speech recognition system specified the number of mixtures, ran speech parameter training, and then checked the resulting performance.

This is illustrated in FIG. 4.

FIG. 4 is a flowchart of a conventional speech parameter training process in which the number of mixtures is specified in advance.

It presents the speech parameter training process described for FIG. 2 as a flowchart.

First, the HMM parameters are initialized and the number of mixtures is specified (401).

The phonemes are then estimated by performing forward-backward segmentation (402). That is, given input data such as "할아버지", the positions of the frames occupied by each phoneme, such as "ㅎ" and "ㅏ", are predicted and the data is segmented accordingly.
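Step 402 can be illustrated with the forward-backward algorithm, which gives the posterior probability of each state at each frame; assigning every frame to its most probable state then yields a frame-level segmentation of the kind described. This is a sketch under simplifying assumptions: one state per phoneme, per-frame emission likelihoods supplied by the caller, and per-frame argmax decoding (Viterbi alignment is a common alternative). The function names are hypothetical.

```python
def forward_backward(init, trans, emis):
    """Posterior state occupancies gamma[t][j] = P(state j at frame t | all frames).

    init[j]     -- initial state probabilities
    trans[i][j] -- transition probability from state i to state j
    emis[t][j]  -- likelihood of frame t under state j (computed by the caller)
    Forward variables are normalized at every frame to avoid numerical underflow.
    """
    T, N = len(emis), len(init)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [init[j] * emis[0][j] for j in range(N)]
        else:
            a = [sum(alpha[t - 1][i] * trans[i][j] for i in range(N)) * emis[t][j]
                 for j in range(N)]
        c = sum(a)
        scale.append(c)
        alpha.append([x / c for x in a])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(N)) / scale[t + 1]
                   for i in range(N)]
    gamma = []
    for t in range(T):
        g = [alpha[t][j] * beta[t][j] for j in range(N)]
        total = sum(g)
        gamma.append([x / total for x in g])
    return gamma


def segment(gamma):
    """Assign each frame to its most probable state (phoneme)."""
    return [max(range(len(g)), key=g.__getitem__) for g in gamma]
```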

Next, the HMM parameters are estimated (403); that is, the characteristics of the phonemes located frame by frame are estimated.

After the HMM parameters are estimated, it is determined whether estimation has been carried out sufficiently to reach the desired target (404). The criterion may be the number of parameter estimation iterations or a target value specified by the operator. Because this is the speech parameter training process, the check judges how well the trained models represent the characteristics of the input data.

If the target has not been reached, the process repeats from the forward-backward segmentation step (402).

If the target has been reached, the number of mixtures given at the start is adopted (405) and the process ends. If the target is a number of parameter estimation iterations, training is performed that many times; the operator then reviews the results, specifies a different number of mixtures, and runs the process again.
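The loop of steps 402 to 404 with its stopping test might look like the sketch below. `estimate_step` is a hypothetical callback standing in for one segmentation plus re-estimation pass, and the two criteria (an iteration cap and a minimum gain in average log-likelihood) are assumed concrete forms of the "number of estimations" and "operator-specified target" mentioned above. The number of mixtures stays fixed throughout, as in the conventional scheme.

```python
def train_fixed_mixtures(params, estimate_step, max_iters=20, min_gain=1e-3):
    """Repeat segmentation + re-estimation (steps 402-403) until the stopping test (404) holds.

    estimate_step(params) -> (new_params, avg_log_likelihood) is a hypothetical callback.
    Training stops after max_iters passes, or as soon as a pass improves the average
    log-likelihood by less than min_gain. The number of mixtures never changes here.
    """
    prev_ll = float("-inf")
    iterations = 0
    while iterations < max_iters:
        params, ll = estimate_step(params)
        iterations += 1
        if ll - prev_ll < min_gain:
            break
        prev_ll = ll
    return params, iterations
```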

As described above, although the number of mixtures significantly affects the performance of a speech recognition system, it was conventionally set by the operator's judgment, which made an accurate decision difficult.

The present invention has been devised to solve the above problem, and its object is to provide a method for setting the number of Gaussian mixtures of HMM parameters that increases the number of mixtures during training to find the optimal number of mixtures, and a computer-readable recording medium storing a program for implementing the method.

To achieve this object, the method of the present invention is a method for setting the number of Gaussian mixtures of hidden Markov model (HMM) parameters applied to a speech recognition system, comprising: a first step of initializing the HMM parameters and the number of mixtures; a second step of obtaining the HMM parameters for the current number of mixtures; a third step of changing the number of mixtures until it reaches a predetermined number, repeating from the second step; and a fourth step of testing the speech recognition rate with the HMM parameters for each number of mixtures and setting the number of mixtures based on the results.

Another method of the present invention is a method for setting the number of Gaussian mixtures of hidden Markov model (HMM) parameters applied to a speech recognition system, comprising: a first step of initializing the HMM parameters and the number of mixtures; a second step of obtaining the HMM parameters for the current number of mixtures and performing a speech recognition rate test; a third step of changing the number of mixtures until it reaches a predetermined number, repeating from the second step; and a fourth step of setting the number of mixtures by comparing the speech recognition rate test results for each number of mixtures.

The present invention also provides a computer-readable recording medium storing a program that causes a speech recognition system having a large-capacity processor to implement: a first function of initializing the HMM parameters and the number of mixtures; a second function of obtaining the HMM parameters for the current number of mixtures; a third function of changing the number of mixtures until it reaches a predetermined number, repeating from the second function; and a fourth function of testing the speech recognition rate with the HMM parameters for each number of mixtures and setting the number of mixtures based on the results.

The present invention further provides a computer-readable recording medium storing a program that causes a speech recognition system having a large-capacity processor to implement: a first function of initializing the HMM parameters and the number of mixtures; a second function of obtaining the HMM parameters for the current number of mixtures and performing a speech recognition rate test; a third function of changing the number of mixtures until it reaches a predetermined number, repeating from the second function; and a fourth function of setting the number of mixtures by comparing the speech recognition rate test results for each number of mixtures.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 5 is a flowchart of an embodiment of the method for setting the number of Gaussian mixtures of HMM parameters according to the present invention.

The present invention concerns setting the number of these Gaussian mixtures, specifically a method of finding the number of mixtures that yields the best recognition performance. The initial values of the HMM parameters were generated using phoneme segmentation information. This information marks, section by section (in frame units), which phoneme is uttered in a speech file; it is determined by a phonetics expert who listens to the sound and examines the waveform.

The phonemes here are context-independent phonemes (CIPs), the basic phoneme set defined and used in the speech recognition system of Korea Telecom (KT); they are defined somewhat more finely than the phonemes of Korean. The phoneme segmentation information is used to generate initial HMM parameter values for each phoneme unit, and at this point the initial model can be generated with one mixture or split into N mixtures. In the present invention, the initial values were generated with one mixture.
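Generating one-mixture initial values from the hand segmentation can be sketched as taking, for each phoneme, the mean and variance of the frames labeled with it. The sketch simplifies the real setup: it assumes one emitting state per phoneme, diagonal variances, and a hypothetical variance floor `var_floor`, so it should be read as an illustration only.

```python
def init_single_gaussians(features, labels, var_floor=1e-3):
    """One-mixture initial model per phoneme from hand-labeled frames.

    features: list of frame feature vectors; labels: phoneme label for each frame.
    Returns {phoneme: (mean, variance)} with diagonal variances; var_floor is an
    assumed lower bound that keeps variances from collapsing to zero.
    """
    groups = {}
    for x, lab in zip(features, labels):
        groups.setdefault(lab, []).append(x)
    models = {}
    for lab, frames in groups.items():
        n, dim = len(frames), len(frames[0])
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
               for d in range(dim)]
        models[lab] = (mean, var)
    return models
```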

Using these initial parameter values, the training speech files, and the pronunciation of each speech file from the pronunciation dictionary, a forward-backward segmentation process divides each speech file into phoneme sections, after which HMM parameter estimation is performed. This sequence of processes is called parameter training.

The initial parameters, generated with one mixture, go through HMM training with the training speech files to produce parameters that again consist of one mixture. The process is then repeated, increasing the number of mixtures by one each time, until the desired maximum number of mixtures is reached (the maximum may be taken as the largest number of mixtures the system's memory allows). Each time the HMM parameters for a given number of mixtures are generated, they are stored.
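The patent states only that the number of mixtures is increased one at a time. One common way to do this, used for example by HTK's mixture splitting, is to split the component with the largest weight; the sketch below assumes that rule, halving the weight and offsetting the two new means by ±0.2 standard deviations. The function name and the `perturb` parameter are hypothetical.

```python
import math


def add_one_mixture(weights, means, variances, perturb=0.2):
    """Grow a diagonal-covariance GMM from M to M+1 components by splitting the
    heaviest component (assumed HTK-style rule): its weight is halved and the two
    resulting means are offset by +/- perturb standard deviations."""
    k = max(range(len(weights)), key=weights.__getitem__)
    offset = [perturb * math.sqrt(v) for v in variances[k]]
    half = weights[k] / 2
    new_weights = weights[:k] + [half] + weights[k + 1:] + [half]
    plus = [m + o for m, o in zip(means[k], offset)]
    minus = [m - o for m, o in zip(means[k], offset)]
    new_means = means[:k] + [plus] + means[k + 1:] + [minus]
    new_vars = variances + [list(variances[k])]   # the new component copies the variance
    return new_weights, new_means, new_vars
```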

Once the process has been repeated up to the maximum number of mixtures and parameters have been generated for every number of mixtures, the recognition performance of each set of HMM parameters is tested on the recognition test speech files, and the parameters with the number of mixtures showing the best recognition performance are selected.

This is described below following the flow of the drawing.

First, the HMM parameters are initialized and the number of mixtures is set to 1 (501). It is then determined whether the current number of mixtures exceeds the maximum number of mixtures (502). The maximum number of mixtures is determined by the system's memory capacity or by the operator's setting.

If the current number of mixtures is less than or equal to the maximum, forward-backward segmentation is performed to divide the speech files into phoneme sections (503).

HMM parameter estimation is then performed (504), and it is checked whether the estimated HMM parameters have converged to the desired target (505). If they have not converged, the process repeats from the forward-backward segmentation step (503).

If they have converged, the obtained HMM parameters are stored (506), the number of mixtures is incremented by one, and the process repeats from the step of checking whether the current number of mixtures exceeds the maximum (502).

If the current number of mixtures exceeds the maximum, the recognition rate is tested with the HMM parameters for each number of mixtures (508), and the number of mixtures is set to the best-performing value (509).
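The flow of FIG. 5 can be summarized by the following sketch. The three callbacks are hypothetical stand-ins: `train` runs steps 503 to 505 until convergence, `add_mixture` adds one Gaussian per state, and `recognition_rate` scores a model set on the recognition test speech files. Keeping every trained model in `stored` mirrors step 506.

```python
def choose_mixture_number(init_params, max_mixtures, train, add_mixture, recognition_rate):
    """Sketch of FIG. 5 (steps 501-509); assumes max_mixtures >= 1.
    Returns (best mixture count, its model, {mixture count: recognition rate})."""
    stored = {}
    params, m = init_params, 1                 # 501: initialize HMM parameters, one mixture
    while m <= max_mixtures:                   # 502: stop once m exceeds the maximum
        params = train(params)                 # 503-505: segment and re-estimate until converged
        stored[m] = params                     # 506: store the model for this mixture count
        m += 1                                 # increase the number of mixtures by one
        if m <= max_mixtures:
            params = add_mixture(params)
    rates = {k: recognition_rate(p) for k, p in stored.items()}   # 508: test every stored model
    best = max(rates, key=rates.get)           # 509: highest rate (ties go to fewer mixtures)
    return best, stored[best], rates
```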

In the embodiment above, the HMM parameters are obtained for every number of mixtures, and the optimal number is then set through a single overall recognition rate test. Alternatively, the recognition rate can be tested each time the parameters for a number of mixtures are obtained, continuing up to the maximum number of mixtures, after which the optimal number of mixtures is found by comparing the resulting values.
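The alternative just described changes only where the test happens. A sketch with the same kind of hypothetical callbacks (`train`, `add_mixture`, `recognition_rate`) follows. Because it keeps only the best model found so far, it also needs less storage; that is an observation about this sketch, not a claim of the patent.

```python
def choose_mixture_number_online(init_params, max_mixtures, train, add_mixture, recognition_rate):
    """Variant: test the recognition rate right after each mixture count is trained.
    Assumes max_mixtures >= 1; callbacks are hypothetical."""
    best_m, best_params, rates = None, None, {}
    params = init_params
    for m in range(1, max_mixtures + 1):
        if m > 1:
            params = add_mixture(params)       # grow by one mixture
        params = train(params)                 # train until convergence
        rates[m] = recognition_rate(params)    # test immediately
        if best_m is None or rates[m] > rates[best_m]:
            best_m, best_params = m, params    # keep only the best model so far
    return best_m, best_params, rates
```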

The method above yields the number of mixtures and the corresponding HMM parameters. Because these HMM parameters use context-independent phonemes (CIPs), they are called HMM context-independent (CI) parameters.

To improve recognition performance, an actual speech recognition system uses context-dependent (CD) parameters consisting of phonemes expanded from the CI-HMM.

CIPs are subdivided into context-dependent phonemes (CDPs) according to the types of phonemes in the preceding and following context, and this subdivision is determined by the words of the target domain. Here, the domain is the application area in which speech recognition is used. For example, for speech recognition of railway station names, the set of all target railway stations constitutes the domain of the speech recognition system.

As an example of how CDPs depend on the domain words: even for the same phoneme "ㄱ", an initial (onset) "ㄱ" and a final (coda) "ㄱ" strictly speaking sound different, and a final "ㄱ" itself sounds different depending on which phonemes precede and follow it. Taking this surrounding phoneme environment into account, the phoneme set is expanded to CDPs.
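The CIP-to-CDP expansion can be sketched as relabeling each phoneme with its left and right neighbors. The "left-center+right" label syntax (borrowed from common triphone notation) and the choice to omit missing context at word edges are assumptions; the patent does not specify the CDP inventory.

```python
def to_context_dependent(phones):
    """Expand a CIP sequence into triphone-style CDP labels 'left-center+right'.
    At the start or end of the sequence the missing context is simply omitted."""
    labels = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + p + right)
    return labels
```

For the syllable 각 (ㄱ, ㅏ, ㄱ), the function returns ["ㄱ+ㅏ", "ㄱ-ㅏ+ㄱ", "ㅏ-ㄱ"], so the initial and final ㄱ become distinct units.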

Once the phoneme units are expanded to CDPs, the same training process as for the CIP-unit HMM parameters is repeated. However, if the expansion to CDPs is made after the number of mixtures has been increased during CIP training, the CDP-unit HMM parameters can only have a number of mixtures equal to or greater than that number.

For example, if three mixtures were generated in the CIP phoneme training stage of the HMM, continuing training after expansion to CDPs starts at three mixtures and increases from there.

In other words, since a model cannot be reduced from a larger to a smaller number of mixtures, the question becomes how many mixtures the CIP models should have when they are expanded to CDPs. Experiments showed that the best performance was obtained by generating the CIP-stage HMMs with a small number of mixtures, namely one, then expanding to CDPs and increasing the number of mixtures until the recognition rate peaked. This is shown in FIG. 6.

FIG. 6 shows an example of the recognition rates of CDP models trained by expanding from each number of CIP mixtures, as used in the present invention.

As shown in the figure, it was confirmed that repeated training after expansion to an appropriate CDP set works better than repeating training excessively at the CIP stage.
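The comparison behind FIG. 6 can be outlined as a two-stage search over the stored CIP models: for each CIP mixture count k, the models are expanded to CDPs and grown from k mixtures upward, never below k. All callbacks (`expand_to_cdp`, `train`, `add_mixture`, `recognition_rate`) are hypothetical, and the sketch only records recognition rates; it makes no claim about the values in FIG. 6.

```python
def two_stage_search(cip_models, max_mixtures, expand_to_cdp, train, add_mixture, recognition_rate):
    """cip_models: {mixture count k: trained CIP model}, e.g. the models stored per count.
    For each k, expand to CDPs and grow mixtures from k up to max_mixtures.
    Returns {(k, m): recognition rate of the CDP model with m mixtures started from CIP k}."""
    results = {}
    for k, cip in sorted(cip_models.items()):
        params = train(expand_to_cdp(cip))             # CDP models start with k mixtures
        results[(k, k)] = recognition_rate(params)
        for m in range(k + 1, max_mixtures + 1):       # mixtures can only grow from k
            params = train(add_mixture(params))
            results[(k, m)] = recognition_rate(params)
    return results
```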

HMM training is carried out in CDP units, and the parameters obtained for each number of mixtures are stored. Once parameters have been generated up to the desired maximum number of mixtures, the recognition performance of each stored parameter set is tested. The parameter set that shows the best performance is then applied to the speech recognition system.
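The store-then-test procedure in the preceding paragraph can be sketched as follows. The sketch assumes two hypothetical callbacks:

- `train_step` grows and re-trains the CDP parameters to `n` mixtures, starting from the previous set.
- `evaluate` returns a recognition rate.

Every count up to `max_mix` is trained and stored before any comparison is made, so the choice covers the full range.

```python
import copy
from typing import Any, Callable, Dict, Optional, Tuple


def select_mixture_count(
    train_step: Callable[[Optional[Any], int], Any],
    evaluate: Callable[[Any], float],
    max_mix: int,
) -> Tuple[int, Any, Dict[int, float]]:
    """Train and store parameters for 1..max_mix mixtures, test each set,
    and return (best count, its parameters, all recognition rates)."""
    stored: Dict[int, Any] = {}
    rates: Dict[int, float] = {}
    params = None
    for n in range(1, max_mix + 1):
        params = train_step(params, n)          # grow from the previous count
        stored[n] = copy.deepcopy(params)       # keep a snapshot per count
    for n, p in stored.items():
        rates[n] = evaluate(p)                  # recognition test per count
    # max() returns the first key with the highest rate; keys are in
    # ascending order, so ties resolve to the smaller mixture count.
    best = max(rates, key=rates.get)
    return best, stored[best], rates


rates_table = {1: 70.0, 2: 78.0, 3: 81.0, 4: 81.0, 5: 79.5}  # made-up
best, params, rates = select_mixture_count(
    train_step=lambda prev, n: {"n_mix": n},
    evaluate=lambda p: rates_table[p["n_mix"]],
    max_mix=5,
)
print(best)  # 3 (ties with 4; the smaller count wins)
```

The deep copy guards against a `train_step` that mutates its input in place, which would otherwise make every stored entry alias the final parameters.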

The present invention described above is not limited to the above-described embodiments and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the present invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the present invention.

As described above, the present invention makes it easy to find, among the possible numbers of mixtures of a continuous-density hidden Markov model (HMM), the number that gives good performance. Processing is therefore much simpler and faster than in the conventional approach, where parameters are obtained with N mixtures designated by an operator.

In addition, the present invention generates and stores the HMM parameters for each number of mixtures. When speech parameter training on context-independent phonemes (CIP) is extended to training on context-dependent phonemes (CDP), it is therefore easy to determine from which number of mixtures the extension should be made.

Claims

In a method of setting the number of Gaussian mixtures of hidden Markov model (HMM) parameters applied to a speech recognition system,

A first step of initializing the hidden Markov model (HMM) parameters and the number of mixtures;

A second step of obtaining the HMM parameters according to the number of mixtures;

A third step of changing the number of mixtures until a predetermined number is reached and repeating from the second step; and

A fourth step of testing the speech recognition rate using the HMM parameters for each number of mixtures and setting the number of mixtures based on the results,

A method of setting the number of Gaussian mixtures of hidden Markov model parameters, comprising the above steps.

The method of claim 1,

The second step,

A fifth step of performing a forward-backward segmentation operation that divides the speech files into phoneme segments, using the initial HMM parameter values, the number of mixtures, the training speech files, and the pronunciation dictionary;

A sixth step of obtaining estimates of the HMM parameters from the speech files divided into phoneme segments;

A seventh step of determining whether the estimated HMM parameters converge to a predetermined value;

An eighth step of repeating from the fifth step if, as a result of the determination in the seventh step, the estimates do not converge to the predetermined value; and

A ninth step of storing the estimated HMM parameters as the HMM parameters for the current number of mixtures if, as a result of the determination in the seventh step, the estimates converge to the predetermined value.
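The fifth to ninth steps of claim 2 form an iterate-until-converged loop for a single mixture count. The sketch below rests on several assumptions:

- Parameters are a flat list of floats.
- "Converges to a predetermined value" is interpreted as the largest parameter change falling below a tolerance `tol`.
- `segment` (standing in for forward-backward segmentation) and `reestimate` are hypothetical callbacks.
- `max_iter` is an added safeguard.

```python
from typing import Callable, Dict, List, Tuple

Params = List[float]


def train_at_mixture_count(
    params: Params,
    n_mix: int,
    segment: Callable[[Params, int], object],
    reestimate: Callable[[object, int], Params],
    store: Dict[int, Params],
    tol: float = 1e-4,
    max_iter: int = 50,
) -> Tuple[Params, bool]:
    """Repeat segmentation and re-estimation until the parameters converge,
    then store them under n_mix. Returns (params, converged)."""
    for _ in range(max_iter):
        alignment = segment(params, n_mix)             # fifth step
        new_params = reestimate(alignment, n_mix)      # sixth step
        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        if change < tol:                               # seventh step
            store[n_mix] = list(params)                # ninth step
            return params, True
        # not converged: loop back to the fifth step (eighth step)
    return params, False  # gave up after max_iter; nothing stored
```

In a real system `segment` would align the training speech files against the pronunciation dictionary using the current parameters. `reestimate` would then compute new means, variances, and weights from that alignment. Here both are left abstract.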

A step of performing a speech recognition rate test by obtaining the HMM parameters for each number of mixtures;

A fourth step of setting the number of mixtures by comparing the speech recognition rate test results for each number of mixtures

The method of claim 3, wherein

The second step,

A fifth step of performing a forward-backward segmentation operation that divides the speech files into phoneme segments, using the initial HMM parameter values, the number of mixtures, the training speech files, and the pronunciation dictionary;

An eighth step of repeating from the fifth step if, as a result of the determination in the seventh step, the estimates do not converge to the predetermined value;

A ninth step of storing the estimated HMM parameters as the HMM parameters for the current number of mixtures if, as a result of the determination in the seventh step, the estimates converge to the predetermined value; and

A tenth step of performing a speech recognition rate test with the HMM parameters for the current number of mixtures and storing the result;

The method according to any one of claims 1 to 4,

wherein the predetermined number

indicates the range within which the number of mixtures can be set.

The method of claim 5, wherein

wherein the hidden Markov model (HMM) parameters

are substantially HMM parameters in units of context-independent phonemes (CIP).

The method of claim 5, wherein

wherein the hidden Markov model (HMM) parameters

are substantially HMM parameters in units of context-dependent phonemes (CDP).

In a speech recognition system having a large-capacity processor,

A first function of initializing the hidden Markov model (HMM) parameters and the number of mixtures;

A second function of obtaining the HMM parameters according to the number of mixtures;

A third function of changing the number of mixtures until a predetermined number is reached and repeating from the second function; and

A fourth function of testing the speech recognition rate with the HMM parameters for each number of mixtures and setting the number of mixtures based on the results,

A computer-readable recording medium having recorded thereon a program for realizing the above functions.

In a speech recognition system having a large-capacity processor,

A second function of performing a speech recognition rate test by obtaining the HMM parameters for each number of mixtures;

A fourth function of setting the number of mixtures by comparing the speech recognition rate test results for each number of mixtures