KR101372020B1

KR101372020B1 - Method and apparatus for Gaussian mixture model based switched split vector quantization

Info

Publication number: KR101372020B1
Application number: KR1020120063776A
Authority: KR
Inventors: 김무영; 노명훈; 이윤주
Original assignee: 세종대학교산학협력단
Priority date: 2012-06-14
Filing date: 2012-06-14
Publication date: 2014-03-07
Also published as: KR20130140403A

Abstract

The present invention relates to a switched split vector quantization method based on a Gaussian mixture model and an apparatus therefor. A switched split vector quantization method of a switched split vector quantization apparatus based on a Gaussian mixture model according to an embodiment of the present invention includes: converting the line spectral frequency of each frame of an input speech signal into an input vector based on a Gaussian mixture model; generating a plurality of quantization vectors by performing switched split vector quantization on each of the input vector and the prediction error vector of the input vector; and comparing the generated quantization vectors with the input vector to determine one final quantization vector having a predetermined similarity.
Accordingly, the speaker recognition performance of a channel speech signal can be improved by using inter-frame correlation and intra-frame correlation for the frames of the speech signal.

Description

Switched split vector quantization method based on a Gaussian mixture model and apparatus therefor

The present invention relates to a switched split vector quantization method based on a Gaussian mixture model and an apparatus therefor, and more particularly discloses a technique for quantizing a speech signal using inter-frame correlation and intra-frame correlation.

For high-quality speech coding in a speech encoder, it is very important to efficiently quantize the line spectral frequency (LSF) coefficients, which represent the short-term correlation of the speech signal. The optimal linear prediction coefficients of a linear predictive coding (LPC) filter are obtained by dividing the input signal into frames and minimizing the energy of the prediction error in each frame.

LSF data converted from LPC coefficients improves coding efficiency, but becomes harder to handle as its dimension grows. As the dimension of a vector quantizer (VQ) increases, the computation becomes more complex and the memory requirement grows. Split vector quantization (SVQ) was proposed to address this problem.
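
To make the complexity argument concrete, the following sketch counts the code vectors that an unsplit quantizer and a three-way split quantizer would have to store and search. The 24-bit budget and the 8/8/8 split are illustrative assumptions and are not values taken from this document.

```python
def codebook_entries(bits_per_split):
    """Total number of code vectors to store and search.

    Each split that receives b bits needs its own codebook of 2**b entries.
    """
    return sum(2 ** b for b in bits_per_split)


# Hypothetical budget of 24 bits per frame (an assumption for illustration).
full_vq = codebook_entries([24])        # one codebook over the whole vector
split_vq = codebook_entries([8, 8, 8])  # three sub-vectors, 8 bits each

print(full_vq, split_vq)
```

Under these assumptions the split quantizer stores a few hundred code vectors instead of millions, at the cost of ignoring dependencies between the sub-vectors.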

When the coefficients of the LPC filter are quantized directly, the filter characteristics are very sensitive to the quantization error of the coefficients, and the stability of the LPC filter after coefficient quantization is not guaranteed. The LPC coefficients should therefore be converted into other parameters with better quantization properties before being quantized, typically reflection coefficients or LSFs. In particular, because LSF values are closely related to the frequency characteristics of speech, most recently developed standard speech codecs use LSF quantization.

For efficient quantization, the LSF quantization method uses the inter-frame correlation of the LSF coefficients. That is, instead of quantizing the LSF of the current frame directly, it predicts the LSF of the current frame from the LSF values of past frames and quantizes the prediction error. Because LSF values are closely related to the frequency characteristics of the speech signal, they are predictable over time and a fairly large prediction gain can be obtained.

This method is vulnerable to channel errors because it relies on the reconstructed previous frame. If an error occurs in the channel, the frame corrupted by that error can spoil all the data that follows, because the previous frame used to remove redundancy is itself a reconstructed frame. The method is therefore difficult to use in real-time systems that are sensitive to channel conditions.
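
The error propagation described above can be illustrated with a toy predictive decoder. This is a minimal sketch, assuming one scalar value per frame and a decoder that simply adds each received residual to the previously reconstructed frame; the function name `decode` and all numbers are hypothetical.

```python
def decode(residuals, start):
    """Rebuild each frame as (previous reconstructed frame + received residual)."""
    frames, prev = [], start
    for r in residuals:
        prev = prev + r
        frames.append(prev)
    return frames


sent = [1.0, 0.5, -0.25, 0.25]   # residuals produced by the encoder
clean = decode(sent, 0.0)

received = list(sent)
received[1] = 2.5                # a channel error hits frame 1 only
corrupt = decode(received, 0.0)

# The error does not stay in frame 1: every later frame inherits it.
errors = [c - k for c, k in zip(corrupt, clean)]
print(errors)
```

In this sketch a single corrupted residual shifts every subsequent reconstructed frame, which is the weakness the passage attributes to purely predictive schemes.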

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 10-2004-0078760 (2004. 09. 31).

The technical problem to be solved by the present invention is to provide a switched split vector quantization method based on a Gaussian mixture model, and an apparatus therefor, that improve the speaker recognition performance of a channel speech signal by using inter-frame correlation and intra-frame correlation for the frames of the speech signal.

A switched split vector quantization method of a switched split vector quantization apparatus based on a Gaussian mixture model according to an embodiment of the present invention includes: converting the line spectral frequency of each frame of an input speech signal into an input vector based on a Gaussian mixture model; generating a plurality of quantization vectors by performing switched split vector quantization on each of the input vector and the prediction error vector of the input vector; and comparing the generated quantization vectors with the input vector to determine one final quantization vector having a predetermined similarity.

In addition, the generating of the plurality of quantization vectors may include transforming each of the input vector and the prediction error vector of the input vector into a KLT domain and performing split vector quantization.

In addition, the generating of the plurality of quantization vectors may include: selecting, from a vector quantization codebook, a first code vector having a predetermined similarity with the input vector, and generating a first quantization vector by dividing the first code vector into a plurality of sub-vectors and quantizing them; and selecting, from the vector quantization codebook, a second code vector having a predetermined similarity with the prediction error vector, and generating a second quantization vector by dividing the second code vector into a plurality of sub-vectors and quantizing them.

In addition, the generating of the second quantization vector may include generating the second quantization vector by adding the quantized prediction error vector of the current frame of the speech signal to the final quantization vector generated in the previous frame.

In addition, the prediction error vector may be generated using the input vector of the current frame of the speech signal and the input vector of the previous frame.

In addition, the input vector may be a Gaussian mixture model generated from the line spectral frequencies of each frame of the input speech signal using an expectation-maximization (EM) algorithm.

In addition, the first code vector or the second code vector may be selected using a maximum likelihood estimation algorithm.

In addition, in the determining of the one final quantization vector, the number of bits $B_{i,n}$ allocated to the n-th dimension of the i-th cluster by the final quantization vector may be calculated using the following equation:

Here, $B_i$ is the number of bits allocated to the i-th cluster, k is the split-vector dimension, and $G_{i,k}$ denotes the normalized second moment of a k-dimensional shape.

A switched split vector quantization apparatus based on a Gaussian mixture model according to another embodiment of the present invention includes: a transform unit that converts the line spectral frequency of each frame of an input speech signal into an input vector based on a Gaussian mixture model; a quantization unit that generates a plurality of quantization vectors by performing switched split vector quantization on each of the input vector and the prediction error vector of the input vector; and a decision unit that compares the generated quantization vectors with the input vector to determine one final quantization vector having a predetermined similarity.

As described above, according to the embodiments of the present invention, the speaker recognition performance of a channel speech signal can be improved by using inter-frame correlation and intra-frame correlation for the frames of the speech signal.

FIG. 1 is a block diagram of a switched split vector quantization apparatus based on a Gaussian mixture model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a switched split vector quantization method based on a Gaussian mixture model according to FIG. 1;
FIG. 3 is a block diagram of a switched split vector quantization apparatus based on a Gaussian mixture model according to another embodiment of the present invention; and
FIG. 4 is an exemplary diagram for describing an input vector and the corresponding quantization vector of the switched split vector quantization apparatus based on a Gaussian mixture model according to FIGS. 1 and 3.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. The terms used are selected in consideration of their functions in the embodiments, and their meanings may vary according to the intention or custom of users and operators. Therefore, the terms used in the following embodiments follow their definitions where they are specifically defined in this specification, and otherwise should be interpreted in the sense generally understood by those skilled in the art.

FIG. 1 is a block diagram of a switched split vector quantization apparatus based on a Gaussian mixture model according to an embodiment of the present invention, and FIG. 2 is a flowchart of a switched split vector quantization method based on a Gaussian mixture model according to FIG. 1.

Referring to FIGS. 1 and 2, a switched split vector quantization apparatus 100 based on a Gaussian mixture model according to an embodiment of the present invention includes a transform unit 110, a quantization unit 120, and a decision unit 130. First, the transform unit 110 converts the line spectral frequency (LSF) of each frame of the input speech signal into an input vector based on a Gaussian mixture model (GMM) (S210). The speech signal may consist of a plurality of frames, and the input vector may be a Gaussian mixture model generated from the line spectral frequencies of each frame using an expectation-maximization (EM) algorithm.
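
As a rough illustration of how an EM algorithm fits a Gaussian mixture, the following toy sketch runs EM on one-dimensional data with two components. It is a simplification under stated assumptions: real LSF vectors are multi-dimensional, and the function name `em_1d`, the initial values, and the data are all hypothetical rather than taken from this document.

```python
import math


def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_1d(data, means, variances, weights, iters=50):
    """Fit a 1-D Gaussian mixture by expectation-maximization."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(len(means)):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor avoids a collapsing component
    return means, variances, weights


data = [0.9, 1.0, 1.1, 2.9, 3.0, 3.1]
means, variances, weights = em_1d(data, [0.5, 3.5], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

With this data the two component means settle near the two visible groups of samples; the resulting means play the role of the cluster centers that later stages switch between.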

The transform unit 110 may also generate a prediction error vector of the input vector. The prediction error vector may be generated using the input vector of the current frame of the speech signal and the input vector of the previous frame. In other words, the prediction error vector is obtained through inter-frame correlation, which exploits the correlation of the LSF coefficients between frames. That is, rather than quantizing the LSF of the current frame directly, the transform unit 110 predicts the LSF of the current frame from the LSF values of the past frame and obtains the prediction error. Because LSF values are closely related to the frequency characteristics of the speech signal, they are predictable over time and a fairly large prediction gain can be obtained.
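
A minimal sketch of the prediction error computation follows. The document states only that the current and previous input vectors are used, so the first-order predictor and its coefficient `rho` are assumptions introduced here for illustration.

```python
def prediction_error(current, previous, rho=1.0):
    """Error left after predicting the current frame from the previous one.

    rho is a hypothetical prediction coefficient; rho=1.0 gives a plain
    frame-to-frame difference.
    """
    return [c - rho * p for c, p in zip(current, previous)]


previous_frame = [0.25, 1.0, 2.0]   # made-up LSF-like values
current_frame = [0.5, 1.0, 1.75]
error = prediction_error(current_frame, previous_frame)
print(error)
```

When consecutive frames are similar, the entries of `error` are small compared with the entries of the frame itself, which is the prediction gain the passage refers to.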

Next, the quantization unit 120 generates a plurality of quantization vectors by performing switched split vector quantization (SSVQ) on each of the input vector and the prediction error vector of the input vector (S220). That is, the quantization unit 120 generates one quantization vector from the input vector and another from the prediction error vector. The split vector quantization algorithm divides a frame of the input speech signal into a plurality of sub-dimensions and performs vector quantization on each sub-dimension.
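
The split step can be sketched as follows: the vector is cut into sub-vectors and each sub-vector is matched against its own small codebook. The codebooks, split sizes, and squared-error similarity measure are assumptions chosen for illustration.

```python
def nearest(codebook, v):
    """Index of the code vector with the smallest squared error to v."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def split_vq(x, split_sizes, codebooks):
    """Quantize x by splitting it and searching one codebook per split."""
    indices, recon, start = [], [], 0
    for size, codebook in zip(split_sizes, codebooks):
        sub = x[start:start + size]
        idx = nearest(codebook, sub)
        indices.append(idx)          # what would be transmitted
        recon.extend(codebook[idx])  # what the decoder would rebuild
        start += size
    return indices, recon


# Hypothetical 4-dimensional vector split into two 2-dimensional parts.
codebooks = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 0.0], [0.5, 0.5]],
]
indices, recon = split_vq([0.1, 0.9, 0.4, 0.6], [2, 2], codebooks)
print(indices, recon)
```

Each split is searched independently, so the search cost is the sum of the codebook sizes rather than their product.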

Specifically, the quantization unit 120 includes a first quantizer 121, a second quantizer 122, and a codebook DB 127. In this specification, the first quantizer 121 performs switched split vector quantization on the input vector, and the second quantizer 122 performs switched split vector quantization on the prediction error vector. The codebook DB 127 stores a plurality of code vectors $\{\mu_{m,X}\}_{m=1}^{N}$. For example, the first quantizer 121 compares each value of the input vector $x^k$ with the $k_1$-th to $k_n$-th code vectors of each codebook in the codebook DB 127 and selects the code vector i with the highest similarity, and the second quantizer 122 compares each value of the prediction error vector $e^k$ with the $k_1$-th to $k_n$-th code vectors of each codebook and selects the code vector i with the highest similarity.

The first quantizer 121 includes a first selection unit 123 and a first split vector quantization unit 125. The first selection unit 123 selects, from the codebook DB 127, which is a vector quantization codebook, a first code vector having a predetermined similarity with the input vector. The first code vector is selected using a maximum likelihood estimation algorithm. The first split vector quantization unit 125 divides the first code vector selected by the first selection unit 123 into a plurality of sub-vectors and quantizes each sub-vector to generate a first quantization vector.
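
One way to read the maximum likelihood selection is as picking the mixture component under which the vector is most probable. The sketch below assumes diagonal covariances and ignores mixture weights; these choices, and the names `log_likelihood` and `select_component`, are assumptions made for illustration.

```python
import math


def log_likelihood(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
               for a, m, v in zip(x, mean, var))


def select_component(x, means, variances):
    """Index of the component that gives x the highest likelihood."""
    scores = [log_likelihood(x, m, v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


# Equal variances: the choice reduces to the nearest mean.
near = select_component([0.9, 0.8], [[0.0, 0.0], [1.0, 1.0]],
                        [[1.0, 1.0], [1.0, 1.0]])

# Unequal variances: a broad component can win over a closer, narrow one.
wide = select_component([2.0], [[0.0], [3.0]], [[4.0], [0.01]])
print(near, wide)
```

The second call shows why likelihood differs from plain distance: the sample is closer to the second mean, yet far more plausible under the broad first component.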

The second quantizer 122 includes a second selection unit 124 and a second split vector quantization unit 126. The second selection unit 124 selects, from the codebook DB 127, which is a vector quantization codebook, a second code vector having a predetermined similarity with the prediction error vector. The second code vector is selected using a maximum likelihood estimation algorithm. The second split vector quantization unit 126 divides the second code vector selected by the second selection unit 124 into a plurality of sub-vectors and quantizes each sub-vector to generate a second quantization vector.

The second quantizer 122 may also generate the second quantization vector by adding the quantized prediction error vector of the current frame of the speech signal to the final quantization vector generated in the previous frame. That is, the second quantization vector is generated using the inter-frame correlation between the previous frame and the current frame.
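
The summation described above can be sketched as follows. A uniform scalar quantizer stands in for the split vector quantizer, and the error is taken against the previous final quantization vector; both are simplifying assumptions, as are the step size and the sample values.

```python
def quantize(v, step):
    """Uniform scalar quantizer, used here as a stand-in for split VQ."""
    return [round(a / step) * step for a in v]


def second_candidate(x, prev_final, step=0.25):
    """Quantized prediction error added to the previous final vector."""
    error = [a - p for a, p in zip(x, prev_final)]
    error_hat = quantize(error, step)
    return [p + e for p, e in zip(prev_final, error_hat)]


prev_final = [0.5, 1.0]   # final quantization vector of the previous frame
x = [0.8, 0.9]            # input vector of the current frame
candidate = second_candidate(x, prev_final)
print(candidate)
```

Because the decoder also holds `prev_final`, it can rebuild the same candidate from the quantized error alone.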

Finally, the decision unit 130 compares the plurality of quantization vectors generated by the quantization unit with the input vector and determines one final quantization vector having a predetermined similarity (S230). That is, an intra-frame correlation algorithm, which obtains the correlation within the same frame of the speech signal, is used. Accordingly, the decision unit 130 selects, from the first quantization vector and the second quantization vector generated by the quantization unit, the one most similar to the input vector as the final quantization vector, and allocates bits for the speech signal using the final quantization vector.
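
The decision step amounts to keeping whichever candidate is closest to the input. The sketch below assumes squared error as the similarity measure, which the document does not specify; the candidate values are made up.

```python
def squared_error(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def decide(x, candidates):
    """Return (index, vector) of the candidate closest to the input x."""
    best = min(range(len(candidates)),
               key=lambda i: squared_error(x, candidates[i]))
    return best, candidates[best]


x = [0.8, 0.9]
first_quantization_vector = [1.0, 1.0]     # from the input vector path
second_quantization_vector = [0.75, 1.0]   # from the prediction error path
choice, final = decide(x, [first_quantization_vector,
                           second_quantization_vector])
print(choice, final)
```

In a complete codec the index `choice` would presumably also have to be signalled so the decoder knows which path was used; that detail is an inference and is not stated in this document.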

The decision unit 130 may also allocate the total number of bits $B_{tot}$ to the individual clusters and sub-dimensions. Here, the original speech data is divided into clusters, each represented by one of the code vectors, and each cluster is divided into a plurality of sub-dimensions. The decision unit 130 may calculate the number of bits $B_i$ allocated to the i-th cluster using Equation 1 below.

In Equation 1, $\beta_i = k\left[\prod_{n=1}^{N}\left(G_{i,k_n}/k_n\right)^{k_n}\right]^{1/k}$ and $k_i = \prod_{n=1}^{N}\left(\lambda_{i,l}\right)^{1/k}$, where $G_{i,k}$ denotes the normalized second moment of a k-dimensional shape.

The decision unit 130 may also calculate the number of bits $B_{i,n}$ allocated to the n-th dimension of the i-th cluster by the final quantization vector using Equation 2 below.

In Equation 2, $B_i$ is the number of bits allocated to the i-th cluster, k is the split-vector dimension, and $G_{i,k}$ denotes the normalized second moment of a k-dimensional shape.

FIG. 3 is a block diagram of a switched split vector quantization apparatus based on a Gaussian mixture model according to another embodiment of the present invention.

Referring to FIG. 3, a switched split vector quantization apparatus 300 based on a Gaussian mixture model according to another embodiment of the present invention includes a transform unit 310, a quantization unit 320, and a decision unit 330. The functions of the transform unit 310 and the decision unit 330 are the same as those of the transform unit 110 and the decision unit 130 of the apparatus of FIG. 1 described above, so their description is omitted. The quantization unit 320 of the apparatus of FIG. 3 is described below with a focus on its differences from the quantization unit 120 of the apparatus 100 of FIG. 1.

The quantization unit 320 may transform each of the input vector and the prediction error vector of the input vector into a Karhunen-Loeve transform (KLT) domain and perform split vector quantization. KLT is one method of transforming the original speech signal by dividing it into small sub-blocks. More specifically, the quantization unit 320 includes a first quantizer 321, a second quantizer 322, and a codebook DB 329. The first quantizer 321 obtains a quantization vector for the input vector, and the second quantizer 322 obtains a quantization vector for the prediction error vector of the input vector.

The first quantizer 321 includes a first selection unit 323, a first KLT transform unit 325-1, a first split vector quantization unit 327, and a first inverse KLT transform unit 325-2. The first selection unit 323 selects, from the codebook DB 329, which is a vector quantization codebook, a first code vector having a predetermined similarity with the input vector. The first code vector is selected using a maximum likelihood estimation algorithm. The first KLT transform unit 325-1 transforms the first code vector into the KLT domain. The first split vector quantization unit 327 divides the KLT-transformed first code vector into a plurality of sub-vectors and quantizes each sub-vector. The first inverse KLT transform unit 325-2 applies the inverse KLT to the vector quantized by the first split vector quantization unit 327, returning it to the original domain, to generate the first quantization vector. The first quantization vector is therefore a value that has undergone split vector quantization in the KLT domain.
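
The transform, quantize, inverse-transform chain can be sketched as below. The sketch assumes the KLT basis is already available as an orthonormal matrix (in practice the eigenvectors of a covariance matrix) and uses a uniform per-coefficient quantizer in place of the split vector quantizer; the 2-D basis, mean, and step size are hypothetical.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def klt_quantize(x, mean, basis, step):
    """Quantize x in the KLT domain and map the result back."""
    centered = [a - m for a, m in zip(x, mean)]
    y = matvec(basis, centered)                      # forward KLT
    y_hat = [round(c / step) * step for c in y]      # stand-in quantizer
    back = matvec(transpose(basis), y_hat)           # inverse KLT (basis is orthonormal)
    x_hat = [a + m for a, m in zip(back, mean)]
    return y_hat, x_hat


s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]   # hypothetical orthonormal basis (rows are axes)
y_hat, x_hat = klt_quantize([1.0, 1.0], [0.0, 0.0], basis, 0.5)
print(y_hat, x_hat)
```

For correlated inputs such as this one, the energy concentrates in the first transformed coefficient, so the remaining coefficients can be quantized coarsely.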

The second quantizer 322 includes a second selection unit 324, a second KLT transform unit 326-1, a second split vector quantization unit 328, and a second inverse KLT transform unit 326-2. The second selection unit 324 selects, from the codebook DB 329, which is a vector quantization codebook, a second code vector having a predetermined similarity with the prediction error vector. The second code vector is selected using a maximum likelihood estimation algorithm. The second KLT transform unit 326-1 transforms the second code vector into the KLT domain. The second split vector quantization unit 328 divides the KLT-transformed second code vector into a plurality of sub-vectors and quantizes each sub-vector. The second inverse KLT transform unit 326-2 applies the inverse KLT to the vector quantized by the second split vector quantization unit 328, returning it to the original domain, to generate the second quantization vector. The second quantization vector is therefore a value that has undergone split vector quantization in the KLT domain.

The second quantizer 322 may also generate the second quantization vector by adding the quantized prediction error vector of the current frame of the speech signal to the final quantization vector generated in the previous frame. That is, the second quantization vector is generated using the inter-frame correlation between the previous frame and the current frame.

The decision unit 330 compares the plurality of quantization vectors generated by the quantization unit 320 with the input vector and determines one final quantization vector having a predetermined similarity. Its details are the same as for the switched split vector quantization apparatus of FIG. 1 and are therefore omitted.

FIG. 4 is an exemplary diagram for describing an input vector and the corresponding quantization vector of the switched split vector quantization apparatus based on a Gaussian mixture model according to FIGS. 1 and 3.

FIG. 4 shows a speech signal quantized with (a) the split vector quantization (SVQ) algorithm, (b) GMM-SSVQ (Gaussian Mixture Model-Switched Split Vector Quantization), the switched split vector quantization algorithm based on a Gaussian mixture model according to FIG. 1, and (c) GMM-KLT-SSVQ (Gaussian Mixture Model-Karhunen Loeve Transform-Switched Split Vector Quantization), the KLT-domain switched split vector quantization algorithm based on a Gaussian mixture model according to FIG. 3. Compared with quantizing the speech signal with SVQ, applying GMM-SSVQ or GMM-KLT-SSVQ allows bits to be allocated to quantized data that is closer to the original speech data.

As described above, according to the embodiments of the present invention, the speaker recognition performance of a channel speech signal can be improved by using inter-frame correlation and intra-frame correlation for the frames of the speech signal.

Although the present invention has been described above with a focus on preferred embodiments described with reference to the drawings, it is not limited thereto. Accordingly, the present invention should be construed according to the claims, which are intended to cover obvious modifications derivable from the described embodiments.

100: switched split vector quantization apparatus
110: transform unit
120: quantization unit
121: first quantizer
122: second quantizer
123: first selection unit
124: second selection unit
125: first split vector quantization unit
126: second split vector quantization unit
127: codebook DB
130: decision unit
300: switched split vector quantization apparatus
310: transform unit
320: quantization unit
321: first quantizer
322: second quantizer
323: first selection unit
324: second selection unit
325-1: first KLT transform unit
325-2: first inverse KLT transform unit
326-1: second KLT transform unit
326-2: second inverse KLT transform unit
327: first split vector quantization unit
328: second split vector quantization unit
329: codebook DB
330: decision unit

Claims

A switched split vector quantization method performed by a Gaussian mixture model-based switched split vector quantization apparatus, the method comprising:
converting a linear spectral frequency for each frame of an input speech signal into an input vector based on a Gaussian mixture model;
generating a plurality of quantized vectors by applying switched split vector quantization to each of the input vector or a prediction error vector of the input vector; and
determining one final quantized vector having a predetermined similarity by comparing the generated plurality of quantized vectors with the input vector,
wherein the prediction error vector is generated using the input vector of a current frame of the speech signal and the input vector of a previous frame.
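The switching step of this claim can be sketched in Python under several stated assumptions: the prediction error is taken as the plain difference between the current and previous input vectors, "predetermined similarity" is read as the smallest squared Euclidean error, and `quantize` is a hypothetical stand-in for the split vector quantizer. The claim itself fixes none of these choices. The predictive candidate is rebuilt by adding the previous frame's final quantized vector, as the dependent claim on the second quantized vector describes.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def switched_quantize(
    x: Vector,
    prev_x: Vector,
    prev_final: Vector,
    quantize: Callable[[Vector], Vector],
) -> Vector:
    """Return whichever candidate quantized vector is closer to x."""
    # Candidate 1: quantize the input vector directly.
    direct = quantize(x)
    # Prediction error: assumed to be current input minus previous input.
    error = [cur - prev for cur, prev in zip(x, prev_x)]
    # Candidate 2: quantized prediction error plus the previous frame's
    # final quantized vector.
    predictive = [q + p for q, p in zip(quantize(error), prev_final)]
    # Keep the candidate with the smaller squared error to the input.
    return min((direct, predictive), key=lambda c: squared_error(c, x))


def round_to_integers(v: Vector) -> Vector:
    """Hypothetical coarse quantizer used only for illustration."""
    return [float(round(c)) for c in v]
```

With a slowly varying signal the predictive candidate tends to win, because the small prediction error survives coarse quantization better than the full vector does; after an abrupt change the direct candidate is usually closer.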

The method of claim 1,
wherein generating the plurality of quantized vectors comprises transforming each prediction error vector of the input vector into a Karhunen-Loeve Transform (KLT) domain and performing split vector quantization.
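As an illustration of the KLT step, the sketch below estimates a KLT basis from the sample covariance of a set of prediction error vectors and applies the forward and inverse transforms. It assumes NumPy and a single global basis; the function names are hypothetical, and an actual embodiment may well derive one basis per Gaussian mixture component instead.

```python
import numpy as np


def fit_klt(vectors: np.ndarray):
    """Estimate the mean and a KLT basis from row vectors.

    The basis columns are eigenvectors of the sample covariance,
    ordered by decreasing eigenvalue.
    """
    mean = vectors.mean(axis=0)
    cov = np.cov(vectors, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def klt_forward(v: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project mean-removed vectors onto the KLT basis (decorrelation)."""
    return (v - mean) @ basis


def klt_inverse(coeffs: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map KLT coefficients back to the original domain."""
    return coeffs @ basis.T + mean
```

Because the transformed components are decorrelated, splitting them into subvectors and quantizing each subvector separately discards less inter-component dependency than splitting the untransformed vector.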

The method of claim 1,
wherein generating the plurality of quantized vectors comprises:
selecting, from a vector quantization codebook, a first code vector having a predetermined similarity with the input vector, and generating a first quantized vector, which is a value quantized by splitting the first code vector into a plurality of subvectors; and
selecting, from the vector quantization codebook, a second code vector having a predetermined similarity with the prediction error vector, and generating a second quantized vector, which is a value quantized by splitting the second code vector into a plurality of subvectors.
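The splitting into subvectors can be illustrated with a generic split vector quantizer. This is a sketch of ordinary split VQ, not necessarily the exact procedure claimed: the split sizes, the per-subvector codebooks, and the nearest-neighbour rule under squared error are all assumptions introduced for the example.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], v: Sequence[float]) -> Vector:
    """Return the codeword with the smallest squared error to v."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, v)))


def split_vq(
    vector: Vector,
    split_sizes: Sequence[int],
    codebooks: Sequence[Sequence[Vector]],
) -> Vector:
    """Quantize each subvector with its own codebook and concatenate."""
    assert sum(split_sizes) == len(vector), "splits must cover the vector"
    quantized: Vector = []
    start = 0
    for size, codebook in zip(split_sizes, codebooks):
        sub = vector[start:start + size]
        quantized.extend(nearest(codebook, sub))
        start += size
    return quantized
```

Searching two small codebooks instead of one large one is what keeps the search and storage cost manageable: a 2+3 split with 2 codewords each requires 4 distance evaluations rather than a joint search over every combination.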

The method of claim 3,
wherein generating the second quantized vector comprises generating the second quantized vector by adding the quantized value of the prediction error vector of the current frame of the speech signal to the value of the final quantized vector generated in the previous frame.

(Deleted)

The method of claim 1,
wherein the input vector is obtained by converting the linear spectral frequency for each frame of the input speech signal based on a Gaussian mixture model generated using an Expectation-Maximization (EM) algorithm.

The method of claim 3,
wherein the first code vector or the second code vector is selected using a maximum likelihood estimation algorithm.
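One way to read the maximum likelihood selection is as choosing the Gaussian mixture component under which the vector is most probable. The sketch below does this for diagonal-covariance components and ignores mixture weights; both simplifications, and all function names, are assumptions made for illustration and are not stated in the claim.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(
    x: Sequence[float], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def select_cluster(
    x: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> int:
    """Index of the component with the highest log-likelihood for x."""
    scores: List[float] = [
        log_gaussian_diag(x, m, v) for m, v in zip(means, variances)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

The returned index would then pick which cluster-specific codebook the vector is quantized with.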

The method of claim 1,
wherein, in determining the one final quantized vector,
the bits ($B_{i,n}$) for dimension n of the i-th cluster, allocated by the final quantized vector, are calculated using the following equation:

where $B_i$ is the i-th cluster, $k$ is the split vector, and $G_{i,k}$ is the normalized second moment of a k-dimensional shape.

A Gaussian mixture model-based switched split vector quantization apparatus comprising:
a conversion unit configured to convert a linear spectral frequency for each frame of an input speech signal into an input vector based on a Gaussian mixture model;
a quantization unit configured to generate a plurality of quantized vectors by applying switched split vector quantization to each of the input vector or a prediction error vector of the input vector; and
a determination unit configured to determine one final quantized vector having a predetermined similarity by comparing the generated plurality of quantized vectors with the input vector,
wherein the prediction error vector is generated using the input vector of a current frame of the speech signal and the input vector of a previous frame.

The apparatus of claim 9,
wherein the quantization unit transforms the input vector or the prediction error vector of the input vector into a Karhunen-Loeve Transform (KLT) domain.

The apparatus of claim 9,
wherein the quantization unit comprises:
a first quantization unit configured to select, from a vector quantization codebook, a first code vector having a predetermined similarity with the input vector, and to generate a first quantized vector, which is a value quantized by splitting the first code vector into a plurality of subvectors; and
a second quantization unit configured to select, from the vector quantization codebook, a second code vector having a predetermined similarity with the prediction error vector, and to generate a second quantized vector, which is a value quantized by splitting the second code vector into a plurality of subvectors.

The apparatus of claim 11,
wherein the second quantization unit generates the second quantized vector by adding the quantized value of the prediction error vector of the current frame of the speech signal to the value of the final quantized vector generated in the previous frame.

(Deleted)

The apparatus of claim 9,
wherein the input vector is obtained by converting the linear spectral frequency for each frame of the input speech signal based on a Gaussian mixture model generated using an Expectation-Maximization (EM) algorithm.

The apparatus of claim 11,
wherein the first code vector or the second code vector is selected using a maximum likelihood estimation algorithm.

The apparatus of claim 9,
wherein the bits ($B_{i,n}$) for dimension n of the i-th cluster, allocated by the final quantized vector, are calculated using the following equation: