KR100262596B1 - Voice recognition method for shortening computing time

Info

Publication number: KR100262596B1
Application number: KR1019960068149A
Authority: KR
Inventors: 심갑종
Original assignee: 정몽규; 현대자동차주식회사
Priority date: 1996-12-19
Filing date: 1996-12-19
Publication date: 2000-08-01
Also published as: KR19980049435A

Abstract

PURPOSE: To provide a speech recognition control method that reduces calculation time by log-converting the model parameters extracted from a speech database, so that the model parameters can be represented as integers. CONSTITUTION: The speech recognition control method includes the following steps. First, a speech database containing waveform data on standard speech is accessed and the data on the standard speech are extracted (31). Second, observation probability model parameters and transition probability model parameters, represented in floating-point format, are calculated from the data extracted in the first step (33, 34). Third, the observation probability model parameters and transition probability model parameters calculated in the second step are converted with a log function to generate data represented as integers (35). Finally, the data output by the third step are stored in a memory (36, 37).

Description

Speech recognition control method to reduce the calculation time

The present invention relates to a speech recognition control method applied to a speech recognition apparatus. More specifically, it relates to a speech recognition control method that increases the execution speed of the apparatus and reduces the required memory capacity by log-converting the model parameters obtained by modeling during model parameter learning and then storing them in memory.

Speech recognition technology has been widely studied because it lets a user operate a machine without separately manipulating an input device, and it has recently been put to practical use in automobiles, home appliances, and other products closely tied to people's living environment.

One known embodiment of this technology is a speech recognition control method that uses the Viterbi scoring algorithm to compare a user's input speech with previously trained model parameters. This method is described in more detail below with reference to the accompanying drawings.

FIG. 1 is a functional block diagram illustrating the model parameter learning process of a conventional speech recognition control method, and FIG. 2 is a functional block diagram illustrating the execution process of the conventional speech recognition control method.

The conventional speech recognition control method consists of a model parameter learning process and an execution process, which are shown schematically in FIG. 1 and FIG. 2, respectively.

Referring to FIG. 1, the speech recognition database (11) is a collection of standard speech data gathered in advance, and the learning algorithm execution block (12) accesses the speech recognition database (11) according to a predetermined learning algorithm and extracts the data on standard speech. After the learning algorithm execution block (12) has run, the observation probability modeling block (13) and the transition probability modeling block (14) derive the observation probability parameter b(0) and the transition probability parameter a_ij from the extracted data. Here, the observation probability parameter serves to identify a standard speech unit, and the transition probability parameter expresses the transition between two standard speech units when they are combined. Both parameters are represented in floating-point format, and the probability parameters thus obtained are stored, through the memory access device (15), in an EPROM (Erasable and Programmable Read Only Memory) (16), a memory device that can be both read and written. In this conventional method, however, because the model parameters are represented in floating-point format, the amount of data is very large, and a memory with a large storage capacity is therefore required.

Next, the execution process using the model parameters stored in the EPROM (16) is described with reference to FIG. 2.

As shown in FIG. 2, the model parameter read block (22) reads the observation probability parameters and transition probability parameters stored in the EPROM (21) and passes them to the Viterbi scoring algorithm execution block (23). The Viterbi scoring algorithm execution block (23) receives the speech data and the probability parameters and, following the Viterbi scoring algorithm, uses a central processing unit (CPU) to calculate the similarity P(o|K). The similarity is calculated as follows.

$$P(o \mid K) = \log\left[a_{ij} \cdot b(0)\right]$$

However, as already noted, because the probability parameters are represented in floating point, they not only occupy a large amount of storage but also affect the time needed to calculate the similarity. That is, the central processing unit requires considerable calculation time to perform the logarithm calculation.
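The run-time cost described above can be illustrated with a minimal Python sketch of the conventional per-step score. The function name `conventional_step_score` and the sample probabilities are hypothetical and serve only as an illustration; the patent gives only the formula itself.

```python
import math

def conventional_step_score(a_ij: float, b_obs: float) -> float:
    """Score one step the conventional way: multiply two floating-point
    probabilities, then take the logarithm at run time."""
    return math.log(a_ij * b_obs)

# Hypothetical parameter values, for illustration only.
score = conventional_step_score(0.5, 0.25)
```

Every call performs a floating-point multiplication and a logarithm, which is the work the invention aims to move out of the execution process.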

Accordingly, the conventional speech recognition control method leaves open the technical problems of reducing memory storage space and shortening calculation time.

The present invention is intended to solve the technical problems of the prior art described above. Its object is to provide a speech recognition control method that performs a log conversion on the model parameters extracted from a speech database before writing them to a memory device, thereby allowing the model parameters to be represented as integers and reducing both storage space and calculation time.

FIG. 1 is a functional block diagram showing the model parameter learning process of a conventional speech recognition control method.

FIG. 2 is a functional block diagram showing the execution process of the conventional speech recognition control method.

FIG. 3 is a functional block diagram showing the model parameter learning process of the speech recognition control method according to the present invention.

FIG. 4 is a functional block diagram showing the execution process of the speech recognition control method according to the present invention.

To achieve the above object, the speech recognition control method according to the present invention includes: a first step of accessing, according to a predetermined learning algorithm, a speech database in which waveform data on standard speech are collected, and extracting the data on the standard speech; a second step of calculating, from the data extracted in the first step, an observation probability model parameter and a transition probability model parameter represented in floating-point format; a third step of converting the observation probability model parameter and the transition probability model parameter calculated in the second step with a log function to generate data represented as integers; and a fourth step of writing the data generated in the third step to a memory device.

The speech recognition control method according to the present invention further includes: a fifth step of reading the model parameters from the memory device; and a sixth step of receiving speech data, running the Viterbi scoring algorithm with the model parameters read in the fifth step, and outputting the resulting similarity externally.

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 3 is a functional block diagram illustrating the model parameter learning process of the speech recognition control method according to the present invention, and FIG. 4 is a functional block diagram illustrating the execution process of the speech recognition control method according to the present invention.

First, the model parameter learning process of the speech recognition control method according to the embodiment is described with reference to FIG. 3.

The speech recognition database (31) shown in FIG. 3 is built from a collection of standard speech data gathered in advance, and the learning algorithm execution block (32) accesses the speech recognition database (31) according to a predetermined learning algorithm and extracts the data on standard speech. After the learning algorithm execution block (32) has run, the observation probability modeling block (33) and the transition probability modeling block (34) derive the observation probability parameter b(0) and the transition probability parameter a_ij from the extracted data. Here, the observation probability parameter serves to identify a standard speech unit, and the transition probability parameter expresses the transition between two standard speech units when they are combined. Both parameters are represented in floating-point format. The probability parameters thus obtained are converted to log values by the log conversion block (35), and the values produced by the log function are represented as integers. The data produced by the log conversion block (35) are stored, through the memory access device (36), in an EPROM (Erasable and Programmable Read Only Memory) (37), a memory device that can be both read and written.
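The log conversion of block (35) can be sketched in Python as follows. The patent does not state how the log values are mapped to integers, so the scale factor `SCALE`, the 16-bit floor `INT16_MIN` used for zero probabilities, the rounding rule, and the names `log_to_int` and `convert_model` are all assumptions made for illustration. The sample matrices are likewise hypothetical.

```python
import math

SCALE = 100          # assumed fixed-point scale; not specified in the patent
INT16_MIN = -32768   # assumed 16-bit storage width

def log_to_int(p: float, scale: int = SCALE) -> int:
    """Convert a probability to a scaled, rounded integer log value."""
    if p <= 0.0:
        return INT16_MIN  # floor value standing in for log(0) = -infinity
    return max(INT16_MIN, round(scale * math.log(p)))

def convert_model(a, b, scale=SCALE):
    """Log-convert a transition matrix a[i][j] and an observation matrix b[j][k]."""
    a_int = [[log_to_int(p, scale) for p in row] for row in a]
    b_int = [[log_to_int(p, scale) for p in row] for row in b]
    return a_int, b_int

# Hypothetical two-state model with two observation symbols.
a = [[0.5, 0.5], [0.0, 1.0]]
b = [[0.9, 0.1], [0.2, 0.8]]
a_int, b_int = convert_model(a, b)
```

A larger scale factor keeps more precision at the cost of a wider integer range, so the choice would depend on the target memory word size.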

Next, the execution process using the model parameters stored in the EPROM (37) is described with reference to FIG. 4.

As shown in FIG. 4, the model parameter read block (42) reads the observation probability parameters and transition probability parameters stored in the EPROM (37) and passes them to the Viterbi scoring algorithm execution block (43). The Viterbi scoring algorithm execution block (43) receives the speech data and the probability parameters and, following the Viterbi scoring algorithm, uses a central processing unit (CPU) to calculate the similarity P(o|K). The similarity is calculated as follows.

$$P(o \mid K) = \log\left[a_{ij} \cdot b(0)\right]$$

In this embodiment of the invention, the log values of the observation probability parameter and the transition probability parameter are stored in the EPROM (37). Because the logarithm of a product equals the sum of the logarithms, the log term in the above formula is obtained simply by adding the observation probability data and the transition probability data read by the model parameter read block (42). Thus, whereas the conventional control method required a multiplication of the two parameter values and consumed considerable calculation time, the control method of the present invention needs only an addition to calculate the similarity, so the calculation time can be greatly reduced.
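As a rough illustration of the addition-only execution process, the following Python sketch scores an observation sequence with integer log parameters. It assumes a discrete-symbol hidden Markov model and an integer log initial-state table `log_pi`, neither of which is specified in the patent; the function name `viterbi_score_int` and all numeric values are hypothetical.

```python
def viterbi_score_int(log_pi, log_a, log_b, observations):
    """Best-path log score using integer additions and comparisons only.

    log_pi[i]   : integer log initial score of state i (assumed input)
    log_a[i][j] : integer log transition score from state i to state j
    log_b[j][k] : integer log observation score of symbol k in state j
    """
    n_states = len(log_pi)
    # Initialise with the first observation.
    delta = [log_pi[j] + log_b[j][observations[0]] for j in range(n_states)]
    # Each later step adds stored log values instead of multiplying probabilities.
    for obs in observations[1:]:
        delta = [
            max(delta[i] + log_a[i][j] for i in range(n_states)) + log_b[j][obs]
            for j in range(n_states)
        ]
    return max(delta)

# Hypothetical integer tables, as they might be read from the memory device.
log_pi = [0, -32768]
log_a = [[-69, -69], [-32768, 0]]
log_b = [[-11, -230], [-161, -22]]
score = viterbi_score_int(log_pi, log_a, log_b, [0, 1, 1])
```

No multiplication or logarithm call appears in the scoring loop. On fixed-width hardware the accumulator would need to be wider than the stored parameters so that repeated additions of large negative values do not overflow.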

The similarity obtained by the Viterbi scoring algorithm execution block (43) is output externally and used by the speech recognition apparatus.

As described above, the speech recognition control method of the present invention derives probability parameters from the standard speech data extracted from a speech database, log-converts them, and stores them in a memory device, which reduces the memory storage space. In addition, the execution process is carried out by adding the log-converted parameters stored in the memory device, which shortens the calculation time.
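The storage saving can be made concrete with a small Python sketch. The patent does not give the word sizes involved, so the 32-bit float and 16-bit integer formats, the parameter count `N_PARAMS`, and the sample values are assumptions chosen only to show the comparison.

```python
import struct

N_PARAMS = 1000  # hypothetical number of model parameters

# Assumed storage formats: 32-bit float versus 16-bit signed integer.
FLOAT_BYTES = struct.calcsize("<f")
INT_BYTES = struct.calcsize("<h")

# Pack the same number of parameters in each format.
float_table = struct.pack(f"<{N_PARAMS}f", *([0.5] * N_PARAMS))
int_table = struct.pack(f"<{N_PARAMS}h", *([-69] * N_PARAMS))
```

Under these assumptions the integer table occupies half the bytes of the floating-point table; the actual ratio would depend on the formats chosen for the target memory device.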

Claims

A speech recognition control method comprising:

a first step of accessing, according to a predetermined learning algorithm, a speech database in which waveform data relating to standard speech are collected, and extracting data relating to the standard speech;

a second step of calculating, using the data extracted in the first step, an observation probability model parameter and a transition probability model parameter represented in floating-point format;

a third step of generating data expressed as integers by converting the observation probability model parameter and the transition probability model parameter calculated in the second step with a logarithmic function; and

a fourth step of recording the data generated in the third step in a memory device.