KR100332748B1

KR100332748B1 - Vector quantization searching method in voice recognition

Info

Publication number: KR100332748B1
Application number: KR1019940034853A
Authority: KR
Inventors: 김민성
Original assignee: 엘지전자주식회사
Priority date: 1994-12-17
Filing date: 1994-12-17
Publication date: 2002-10-25
Also published as: KR960025316A

Abstract

PURPOSE: A vector quantization search method for speech recognition is provided that reduces the computation time of the quantization process so that speech recognition can be processed in real time. CONSTITUTION: The first frame of the input speech is compared with all codewords of the codebook, and the codeword with the minimum distance is selected. For each subsequent frame, the codeword of the previous frame is compared with the input of the next frame to obtain a distance. When this distance is larger than the sum of the average distance and the variance distance, the input is compared with all codewords. Otherwise, the input is compared only with the selected codewords corresponding to the previous-frame codeword to obtain the quantization codeword. This process is repeated over the whole speech interval to obtain the quantization codewords.

Description

Vector quantization search method for speech recognition

The present invention relates to a recognition method that uses vector quantization for speech recognition, and in particular to a vector quantization search method for speech recognition that speeds up the search by predicting the codeword of the next frame from the quantized result of the previous frame, thereby reducing the number of codebook codewords to be searched in the next frame.

The vector quantization method used in conventional speech recognition compares every codeword of the codebook with the input speech feature vector and outputs the codeword that minimizes a predefined distance function.

Let CB denote a codebook consisting of M codewords and W each codeword vector (W_i for the i-th codeword). The vector quantization result for an input X is then given by Equation (1) below.

$$W = \arg\min_{W_i \in CB} \operatorname{distance}(W_i, X) \qquad (1)$$

where i runs over the codewords of the codebook (given as i = 0, M in the original).

That is, the input is compared with every codeword of the codebook, and the codeword giving the minimum value is output.

Repeating this process for each input frame yields the codeword sequence for the entire utterance. The distance function that measures the degree of match is the distance of the difference between the two vectors. Written out in symbols, the distance used in Equation (1) is as follows.

$$\operatorname{distance}(W_i, X) = \sum_j (X_j - W_{ij})^2$$

where X_j and W_{ij} are the j-th components of the input X and of the codeword W_i.

Evaluating this expression M times, once per codeword, yields the codeword for one frame.
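The conventional full search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `squared_distance` and `full_search`, the two-dimensional toy codebook, and the sample frames are all assumptions made for the example.

```python
def squared_distance(w, x):
    """Sum of squared component differences between codeword w and input x."""
    return sum((xj - wj) ** 2 for wj, xj in zip(w, x))


def full_search(codebook, x):
    """Return (index, distance) of the nearest codeword, checking all M codewords."""
    best_index, best_distance = 0, squared_distance(codebook[0], x)
    for i in range(1, len(codebook)):
        d = squared_distance(codebook[i], x)
        if d < best_distance:
            best_index, best_distance = i, d
    return best_index, best_distance


# Hypothetical toy data: M = 3 codewords, 3 input frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
frames = [(0.1, 0.2), (0.9, 1.2), (3.5, 4.1)]

# One full search (M distance evaluations) per frame gives the codeword sequence.
codeword_sequence = [full_search(codebook, x)[0] for x in frames]
```

Each frame costs M distance evaluations here, which is the cost the invention aims to reduce.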

However, this conventional technique obtains the codeword for each input frame by comparing the frame against the entire codebook, so the quantization step of vector quantization takes a long time.

Accordingly, an object of the present invention is to provide a vector quantization search method for speech recognition that reduces the M distance calculations. The codeword for the current frame is predicted from the codeword quantized in the previous frame, and this information is used to exclude from the comparison those codewords that differ greatly from the input. The M-step search can thus be carried out in N steps, with N smaller than M, which cuts the computation by (M − N) and shortens the vector quantization time.

To achieve this object, the present invention comprises: a step of comparing the first frame of the input speech with all codewords of the codebook and selecting the codeword having the minimum distance; a step of comparing the codeword of the previous frame with the input of the next frame to obtain a distance; a step of obtaining the quantization codeword by comparing the input with all codewords if that distance is greater than the sum of the average distance and the variance distance, and otherwise only with the selected codewords corresponding to the previous-frame codeword; and a step of repeating the above steps over all frames.

The operation and effects of the present invention, composed of these steps, are described in detail below.

Let CB be the codebook, M its size (the number of codewords), C the feature vector sequence of the input speech, C_i the i-th frame of C, and W_{i-1} the vector-quantized codeword of frame i−1. The vector-quantized codeword W_i for the i-th input frame is then obtained as follows.

When the input to be quantized arrives, it is compared with the codebook while excluding every codeword whose distance from W_{i-1} is more than twice the distance D between W_{i-1} and the input C_i. A codeword farther than 2D from the quantized codeword is at a distance of more than D from the input, so it can never be selected as the minimum-distance codeword and can safely be left out.

In Fig. 1, if the distance between the quantized codeword W_i and the input C_i is D, then the codewords W_k and W_n, whose distance from W_i is greater than 2D, are always farther from the input C_i than W_i is, so they need not be compared.
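The exclusion rule can be checked with the triangle inequality. The short derivation below assumes that the distance d is a true metric, for example the Euclidean distance (the square root of the sum of squares); a plain squared sum does not satisfy the triangle inequality in general. With D = d(W_i, C_i) and a codeword W_k such that d(W_k, W_i) > 2D:

```latex
\begin{aligned}
d(W_k, W_i) &\le d(W_k, C_i) + d(C_i, W_i) \\
d(W_k, C_i) &\ge d(W_k, W_i) - d(W_i, C_i) \\
            &> 2D - D = D = d(W_i, C_i)
\end{aligned}
```

So W_k is strictly farther from the input than W_i and cannot be the nearest codeword.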

Therefore, if a codeword expected to be close to the input can be found during vector quantization, its distance to the input together with the distances between codewords can be used to eliminate codewords that need not be compared.

In the present invention, the codeword W_{i-1} of the previous frame is chosen as the codeword likely to be close to the current input frame.

In speech, adjacent frames are very similar, so the codeword of the previous frame is very likely to be close to the input of the next frame as well. By repeatedly carrying the result of one frame over to the next, the vector quantization of the whole utterance can be obtained quickly. Since no previous-frame information exists for the first frame, it is compared with all codewords. The second frame is then quantized using the quantized codeword of the first frame, the third frame using the quantized result of the second, and so on for every frame of the speech interval.

Quantizing frames in this way requires the distances between the codewords of the codebook, but these need to be computed only once, when the codebook is built.

However, storing all the distances between codewords requires a large amount of memory: for a codebook of size 256, memory for 256 × 256 = 65,536 distances is needed.

A method for reducing the amount of memory is therefore described with reference to Fig. 2. First, given the training speech and the codebook, the average distance (μ) and the variance distance (σ) obtained when the training speech is quantized with the codebook are computed. Next, the distances between the codewords of the codebook are computed, and each codeword whose distance from W_i is less than 2(μ + σ) is stored, as a selected codeword for W_i, at the storage address S[W_i][m].

Here S[W_i][m] is analogous to a memory address: S[W_i] refers to the i-th vector quantization codeword, and [m] takes values from 1 up to at most M in a set order. That is, the selected codewords for W_i, those whose distance from W_i is less than 2(average distance + variance distance), are stored in S[W_i][1] through S[W_i][m].

With this table, when the distance D between the input and W_i is less than (μ + σ), the input need only be compared with the codewords in S[W_i][m]. This agrees with the 2D exclusion rule: when D is at most (μ + σ), every codeword farther than 2(μ + σ) from W_i is also farther than 2D. If the maximum value of m equals the total number of codewords M (the codebook size), the amount of computation is unchanged; if the maximum value of m is smaller than M, the computation is reduced accordingly.
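The training-time construction of the table S can be sketched as follows. Several points here are assumptions rather than statements from the patent: the "variance distance" σ is taken to be the population standard deviation of the quantization distances, the distance is Euclidean, and the names `distance`, `nearest`, and `build_selected_table` and the toy data are hypothetical.

```python
import math
import statistics


def distance(a, b):
    """Euclidean distance between two vectors (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest(codebook, x):
    """Index of the codeword closest to x, found by full search."""
    return min(range(len(codebook)), key=lambda i: distance(codebook[i], x))


def build_selected_table(codebook, training_frames):
    """Return (mu, sigma, table); table[i] lists codewords within 2*(mu+sigma) of codeword i."""
    # Quantization distance of each training frame to its nearest codeword.
    dists = [distance(codebook[nearest(codebook, x)], x) for x in training_frames]
    mu = statistics.mean(dists)
    sigma = statistics.pstdev(dists)  # assumption: sigma is a standard deviation
    limit = 2 * (mu + sigma)
    table = [
        [k for k in range(len(codebook)) if distance(codebook[i], codebook[k]) < limit]
        for i in range(len(codebook))
    ]
    return mu, sigma, table


# Hypothetical toy data.
codebook = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
training_frames = [(0.5, 0.0), (0.6, 0.0), (9.0, 0.0), (10.5, 0.0)]
mu, sigma, table = build_selected_table(codebook, training_frames)
```

Storing only these short lists, instead of all M × M pairwise distances, is what keeps the memory requirement low; how much is saved depends on how tightly the codewords cluster.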

After the codewords have been selected and stored in this way, the recognition stage is carried out: when speech to be recognized arrives, fast quantization is performed using S[W_i][m] and the codebook.

Fig. 3 shows the process described so far in detail. If the input (C_t) is the starting point (t = 0), it is compared with all codewords and quantized. Otherwise, the quantized codeword of the previous frame is compared with the input (C_t) to obtain the distance (D).

If the distance (D) so obtained is greater than the sum of the average distance (μ) and the variance distance (σ), the input is compared with all codewords. Otherwise it is compared only with the selected codewords corresponding to the previous-frame codeword, and then quantized.

This process is repeated for every frame of the speech interval (t = 0, ..., T).
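The recognition-time flow of Fig. 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: Euclidean distance is used so that the triangle inequality applies, and the table, μ, and σ are hypothetical precomputed values standing in for the output of the training stage. The names `search` and `quantize_sequence` are not from the patent.

```python
import math


def distance(a, b):
    """Euclidean distance between two vectors (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def search(codebook, x, candidates):
    """Index, among the candidate indices, of the codeword closest to x."""
    return min(candidates, key=lambda k: distance(codebook[k], x))


def quantize_sequence(codebook, frames, table, mu, sigma):
    """Quantize every frame, using the previous frame's codeword to limit the search."""
    all_indices = range(len(codebook))
    result = []
    for t, x in enumerate(frames):
        if t == 0:
            candidates = all_indices  # no previous frame: full search
        else:
            prev = result[-1]
            d = distance(codebook[prev], x)
            # D > mu + sigma: prediction is poor, fall back to full search.
            # Otherwise search only the selected codewords stored for prev.
            candidates = all_indices if d > mu + sigma else table[prev]
        result.append(search(codebook, x, candidates))
    return result


# Hypothetical precomputed training results for a 3-codeword codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
table = [[0, 1], [0, 1], [2]]
mu, sigma = 0.6, 0.2345

frames = [(0.1, 0.0), (0.7, 0.0), (9.5, 0.0), (10.2, 0.0)]
codewords = quantize_sequence(codebook, frames, table, mu, sigma)
```

In this toy run the second and fourth frames are searched over the short lists only, while the jump at the third frame triggers a full search. With a codebook this small the saving is negligible; the benefit would only show for larger M.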

As described in detail above, the present invention reduces the computation time of the quantization process in speech recognition, which makes it possible to process speech recognition in real time.

Fig. 1 is a diagram showing the distances between an input and each codeword.

Fig. 2 is an explanatory diagram of the vector quantization method for speech.

Fig. 3 is a flowchart of the vector quantization search method for speech recognition according to the present invention.

Claims

A vector quantization search method for speech recognition, characterized by comprising: a step of comparing the first frame of the input speech with all codewords of the codebook and selecting the codeword having the minimum distance; a step of comparing the codeword of the previous frame with the input of the next frame to obtain a distance; a step of obtaining a quantization codeword by comparing the input with all codewords if the distance obtained in the preceding step is greater than the sum of the average distance and the variance distance, and otherwise only with the selected codewords corresponding to the previous-frame codeword; and a step of repeating the above steps over the entire frame interval to obtain the quantization codewords.

The method of claim 1, wherein the selected codewords are obtained by computing the average distance and the variance distance when the training speech is quantized with the codebook, then computing the mutual distances between the codewords of the codebook, and storing, at the storage address S[W_i][m], each codeword whose distance from the corresponding codeword is less than twice the sum of the average distance and the variance distance.