KR100633228B1

KR100633228B1 - Method for presenting gaussian probability density and method of speech recognition training for obtaining the same

Info

Publication number: KR100633228B1
Application number: KR1019990068001A
Authority: KR
Inventors: 박용규
Original assignee: 주식회사 케이티
Priority date: 1999-12-31
Filing date: 1999-12-31
Publication date: 2006-10-11
Also published as: KR20010060005A

Abstract

1. Technical field to which the invention described in the claims belongs

The present invention relates to a Gaussian probability density representation method, a speech recognition training method for obtaining the same, and a computer-readable recording medium storing a program for implementing these methods.

2. Technical problem to be solved by the invention

The present invention aims to provide a Gaussian probability density representation method that divides the probability density space into several PGAM spaces, represents these spaces as GAMs, and then takes the GAM with the largest value as the representative probability density; a speech recognition training method for obtaining this representation; and a computer-readable recording medium storing a program for implementing these methods.

3. Summary of the solution

The present invention comprises: a first step of receiving a speech recognition model whose number of mixture groups has been initialized and obtaining a trained speech recognition model through recognition training; a second step of running a speech recognition test on the trained model, performing recognition training on the misrecognized words to obtain a new speech recognition model, and combining it with the model obtained in the first step to obtain a speech recognition model with an increased number of mixture groups; a third step of performing recognition training on the model obtained in the second step and checking whether the number of mixture groups has reached the desired number; a fourth step of repeating from the second step if the check in the third step shows that the number of mixture groups has not reached the desired number; and a fifth step of ending speech recognition training if the check in the third step shows that the number of mixture groups has reached the desired number.

4. Important uses of the invention

The present invention is used in speech recognition systems and the like.

Speech recognition, probability density, speech recognition training, Gaussian

Description

METHOD FOR PRESENTING GAUSSIAN PROBABILITY DENSITY AND METHOD OF SPEECH RECOGNITION TRAINING FOR OBTAINING THE SAME

FIG. 1 is an example configuration diagram of a general computer.

FIG. 2 is an example of a conventional probability model for representing probability density.

FIG. 3 is an explanatory diagram of one embodiment of a probability density model according to the present invention.

FIG. 4 is a flowchart of one embodiment of the Gaussian probability density representation method according to the present invention.

FIG. 5 is a flowchart of one embodiment of the speech recognition training method for obtaining an HPGAM according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

1: input device 2: central processing unit

3: main memory 4: auxiliary storage

5: display device

The present invention relates to a Gaussian probability density representation method, a speech recognition training method for obtaining the same, and a computer-readable recording medium storing a program for implementing these methods. In particular, it relates to a method of representing a Gaussian probability density by combining the Gaussian Autoregressive Mixture (GAM) and the Partitioned Gaussian Autoregressive Mixture (PGAM), a speech recognition training method for obtaining that representation, and a computer-readable recording medium storing a program for implementing these methods.

FIG. 1 is an example configuration diagram of a general computer.

In general, a computer comprises an input device (1), a central processing unit (2), a main memory (3), an auxiliary storage (4), and a display device (5). The method of representing probability density and the method of performing speech recognition training with it are implemented on such a computer.

FIG. 2 is an example of a conventional probability model for representing probability density.

It shows a widely used probability model: two Gaussian mixture probability distributions (PDFs) overlap to form a single probability distribution (PDF), drawn as the dotted line.

When two Gaussian distributions with different probability densities are combined, the mixture model assigns a weight to each Gaussian distribution and simply sums them to produce a new probability distribution (PDF). This is called the Gaussian Autoregressive Mixture (GAM) model (dotted line). Considering only the largest of the two Gaussian distributions and ignoring the other is called the Partitioned Gaussian Autoregressive Mixture (PGAM) model.

When a probability density is represented as a sum of Gaussian densities, two conventional methods have been used, called GAM and PGAM. GAM represents the probability density as a weighted sum of Gaussian densities. PGAM represents the probability space with Gaussians, takes the largest value as the probability density, and ignores the remaining probability density values.
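The difference between the two conventional representations can be illustrated with a minimal sketch. It assumes one-dimensional Gaussians without any autoregressive structure, and it assumes that PGAM compares the weighted component values; the text does not say whether the weights apply there. The function names and the example components are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gam_density(x, components):
    """GAM: weighted sum over all (weight, mean, variance) components."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def pgam_density(x, components):
    """PGAM: keep only the largest component value and ignore the rest.

    Assumption: the weight is applied before taking the maximum.
    """
    return max(w * gaussian_pdf(x, m, v) for w, m, v in components)

# Two hypothetical components, as in the two-Gaussian case of FIG. 2.
components = [(0.5, -1.0, 1.0), (0.5, 2.0, 0.5)]
```

Because every term is non-negative, `gam_density` is never smaller than `pgam_density` at the same point; the two agree only where a single component dominates completely.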

However, when these probability density representations are applied to speech recognition, the recognition rate is not satisfactorily high. A better probability density representation is therefore needed to improve the speech recognition rate.

The present invention was devised to solve the above problem. Its object is to provide a Gaussian probability density representation method that divides the probability density space into several PGAM spaces, represents these spaces as GAMs, and takes the GAM with the largest value as the representative probability density, together with a computer-readable recording medium storing a program for implementing the method.

Another object of the present invention is to provide a speech recognition training method that obtains the above Gaussian probability density representation by combining misrecognized data with the conventional training scheme and reinitializing, together with a computer-readable recording medium storing a program for implementing the method.

To achieve the above object, the present invention provides a Gaussian probability density representation method applied to a speech recognition system, comprising: a first step of dividing the probability density space into several Partitioned Gaussian Autoregressive Mixture (PGAM) spaces; a second step of representing the divided PGAM spaces as Gaussian Autoregressive Mixtures (GAM); and a third step of representing the Gaussian probability density by taking the GAM with the largest value as the representative probability density, the Hybrid Partitioned Gaussian Autoregressive Mixture (HPGAM).

The present invention also provides a computer-readable recording medium storing a program that causes a speech recognition system having a processor to realize: a first function of dividing the probability density space into several Partitioned Gaussian Autoregressive Mixture (PGAM) spaces; a second function of representing the divided spaces as Gaussian Autoregressive Mixtures (GAM); and a third function of representing the Gaussian probability density by taking the GAM with the largest value as the representative probability density, the Hybrid Partitioned Gaussian Autoregressive Mixture (HPGAM).

To achieve the other object, the present invention provides a speech recognition training method for obtaining a Hybrid Partitioned Gaussian Autoregressive Mixture applied to a speech recognition system, comprising: a first step of setting first training data with the number of mixture groups at its initial value and then training on the first training data; a second step of performing a recognition test on the trained first training data and training on the misrecognized words; a third step of combining the training data for the misrecognized words with the first training data to obtain new second training data whose number of mixture groups is increased by 1, and training on the second training data; a fourth step of checking whether the number of mixture groups of the training data of the third step has reached the desired number; a fifth step of repeating from the second step if the number of mixture groups has not reached the desired number in the fourth step; and a sixth step of ending speech recognition training if the number of mixture groups has reached the desired number in the fourth step.

The present invention also provides a computer-readable recording medium storing a program that causes a speech recognition system having a processor to realize: a first step of setting first training data with the number of mixture groups at its initial value and then training on the first training data; a second step of performing a recognition test on the trained first training data and training on the misrecognized words; a third step of combining the training data for the misrecognized words with the first training data to obtain new second training data whose number of mixture groups is increased by 1, and training on the second training data; a fourth step of checking whether the number of mixture groups of the training data of the third step has reached the desired number; a fifth step of repeating from the second step if the number of mixture groups has not reached the desired number in the fourth step; and a sixth step of ending speech recognition training if the number of mixture groups has reached the desired number in the fourth step.

The above objects, features, and advantages will become clearer from the following detailed description taken with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 3 is an explanatory diagram of one embodiment of a probability density model according to the present invention.

FIG. 3 shows two probability density (PDF) models: a conventional Gaussian Autoregressive Mixture (GAM) model with 12 mixtures, and the Hybrid Partitioned Gaussian Autoregressive Mixture (HPGAM) model proposed in the present invention with 4 mixture groups and 3 mixtures.

This embodiment describes the method of representing probability density. When a probability density is represented as a sum of Gaussian densities, two methods, called GAM and PGAM, have conventionally been used.

GAM represents the probability density as a weighted sum of Gaussian densities. PGAM represents the probability space with Gaussians, takes the largest value as the probability density, and ignores the remaining probability density values.

The method proposed in the present invention combines the two: it divides the probability density space into several PGAM spaces, represents these spaces as GAMs, and takes the GAM with the largest value as the representative probability density. Because the proposed representation of the Gaussian probability density merges GAM and PGAM, we call it the Hybrid Partitioned Gaussian Autoregressive Mixture (HPGAM).
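One way to write the hybrid representation down is given below. The notation is an assumption, since the text gives no formulas: G is the number of mixture groups, K_g the number of mixtures in group g, w_{g,k} the mixture weights, and N a Gaussian density with mean \mu_{g,k} and covariance \Sigma_{g,k}.

```latex
p_{\mathrm{HPGAM}}(x) \;=\; \max_{1 \le g \le G} \; \sum_{k=1}^{K_g} w_{g,k}\, \mathcal{N}\!\left(x;\, \mu_{g,k},\, \Sigma_{g,k}\right)
```

Under this reading, G = 1 reduces to a plain weighted sum (GAM), and K_g = 1 for every g reduces to taking the single largest component (PGAM, assuming its components are compared after weighting).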

FIG. 4 is a flowchart of one embodiment of the Gaussian probability density representation method according to the present invention.

First, the probability density space is divided into several PGAM spaces (401), and these spaces are represented as GAMs (402). The GAM with the largest value is then taken as the representative probability density to represent the Gaussian probability density (403).
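The three steps of FIG. 4 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: it uses one-dimensional Gaussians without autoregressive structure, treats the partition of step 401 as already given by how the components are grouped, and uses hypothetical names (`hpgam_density`, `groups`).

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def hpgam_density(x, groups):
    """Return (density, index of the winning group) at point x.

    groups: list of mixture groups (step 401, assumed given); each group is
    a list of (weight, mean, variance) tuples.
    """
    # Step 402: evaluate each group as a GAM, i.e. a weighted sum.
    group_values = [
        sum(w * gaussian_pdf(x, m, v) for w, m, v in group) for group in groups
    ]
    # Step 403: the group with the largest GAM value is the representative.
    best_group = max(range(len(group_values)), key=lambda g: group_values[g])
    return group_values[best_group], best_group

# Two hypothetical groups of two mixtures each.
groups = [
    [(0.5, -2.0, 1.0), (0.5, -1.0, 1.0)],
    [(0.5, 2.0, 1.0), (0.5, 3.0, 1.0)],
]
value, winner = hpgam_density(2.5, groups)
```

Returning the winning group index alongside the value is a design choice of this sketch; it makes it easy to see which partition a given observation falls into.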

FIG. 5 is a flowchart of one embodiment of the speech recognition training method for obtaining an HPGAM according to the present invention.

This embodiment also describes the training method for HPGAM. The method trains according to the conventional scheme, then combines the misrecognized data and reinitializes to obtain the HPGAM. As shown in FIG. 5, a speech recognition model with the desired number of mixtures (1 mixture group and N mixtures) is first obtained in the conventional way, and a speech recognition test is run with this model. The misrecognized words from the test results are used to build a new speech recognition model (1 mixture group and N mixtures). This model is combined with the previous model to give a new speech recognition model (2 mixture groups and N mixtures), which is then trained. The speech recognition test is then run again with this model, and the procedure repeats until the desired number of mixture groups is obtained. This training compensates for the initialization problem that is a weakness of conventional training.

This is described below following the flow of the drawing.

First, the training data is set up with the number of mixture groups set to 1 (501), and segmental k-means training and forward-backward training are performed on this training data (502).

A recognition test is performed on the trained training data (503), and segmental k-means training and forward-backward training are performed on the misrecognized words (504, 505).

The training data obtained by training on the misrecognized words is combined with the existing training data to obtain new training data with the number of mixture groups increased by 1 (506), forward-backward training is performed (507), and it is determined whether the number of mixture groups of this training data equals the desired number of mixture groups (508). If it does not, the desired number has not yet been reached, so the process repeats from the recognition test on the training data (503). If it does, the speech recognition training process for the training data ends.
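The control flow of FIG. 5 can be sketched as a loop. Everything concrete here is an assumption made for illustration: the model is treated as a plain list of mixture groups, and the four callables (`train_initial`, `find_errors`, `train_on_errors`, `retrain`) are hypothetical stand-ins for segmental k-means training, the recognition test, and forward-backward training. The early exit when no words are misrecognized is also an assumption; the text does not cover that case.

```python
def train_hpgam(data, target_groups, train_initial, find_errors,
                train_on_errors, retrain):
    """Grow a model one mixture group at a time, following FIG. 5.

    The model is assumed to be a list of mixture groups, so len(model)
    is the current number of mixture groups.
    """
    # Steps 501-502: start with one mixture group and train it.
    model = train_initial(data)
    while True:
        # Step 503: recognition test; collect the misrecognized words.
        errors = find_errors(model, data)
        if not errors:
            # Assumption: with nothing misrecognized there is nothing
            # to build a new group from, so stop early.
            return model
        # Steps 504-505: train a new one-group model on the errors.
        error_groups = train_on_errors(errors)
        # Steps 506-507: combine (one more group) and retrain.
        model = retrain(model + error_groups, data)
        # Step 508: stop once the desired number of groups is reached.
        if len(model) >= target_groups:
            return model
```

The check uses `>=` rather than equality as a defensive choice, so the loop also ends if the target is smaller than the group count after the first merge.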

The present invention described above is not limited by the foregoing embodiment and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the invention belongs that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, the present invention makes it easier to obtain an HPGAM, which gives a better recognition rate than the conventional GAM or PGAM, and improves the speech recognition rate through a training procedure that compensates for the initialization problem that is a weakness of conventional training.

Claims

A Gaussian probability density representation method applied to a speech recognition system, comprising:

a first step of dividing the probability density space into a plurality of Partitioned Gaussian Autoregressive Mixture (PGAM) spaces;

a second step of representing the divided PGAM spaces as Gaussian Autoregressive Mixtures (GAM); and

a third step of representing the Gaussian probability density by taking the GAM with the largest value as the representative probability density, the Hybrid Partitioned Gaussian Autoregressive Mixture (HPGAM).

A speech recognition training method for obtaining a Hybrid Partitioned Gaussian Autoregressive Mixture applied to a speech recognition system, comprising:

a first step of setting first training data with the number of mixture groups at its initial value and then training on the first training data;

a second step of performing a recognition test on the trained first training data and training on the misrecognized words;

a third step of combining the training data for the misrecognized words with the first training data to obtain new second training data whose number of mixture groups is increased by one, and training on the second training data;

a fourth step of checking whether the number of mixture groups of the training data of the third step has reached a desired number;

a fifth step of repeating from the second step if the number of mixture groups has not reached the desired number in the fourth step; and

a sixth step of ending speech recognition training if the number of mixture groups has reached the desired number in the fourth step.

A computer-readable recording medium storing a program that causes a speech recognition system having a processor to realize:

a first function of dividing the probability density space into a plurality of Partitioned Gaussian Autoregressive Mixture (PGAM) spaces;

a second function of representing the divided PGAM spaces as Gaussian Autoregressive Mixtures (GAM); and

a third function of representing the Gaussian probability density by taking the GAM with the largest value as the representative probability density, the Hybrid Partitioned Gaussian Autoregressive Mixture (HPGAM).

In a speech recognition system having a processor,

The computer-readable recording medium having recorded thereon a program for realizing the sixth step of terminating the speech recognition training when the number of mixture groups reaches the desired number in the fourth step.