KR20130080271A

KR20130080271A - A method and device for KLT-based domain switching split vector quantization

Info

Publication number: KR20130080271A
Application number: KR1020120001096A
Authority: KR
Inventors: 김무영; 노명훈; 이윤주
Original assignee: 세종대학교산학협력단
Priority date: 2012-01-04
Filing date: 2012-01-04
Publication date: 2013-07-12
Also published as: KR101348888B1

Abstract

PURPOSE: A Karhunen-Loeve transform (KLT) based domain switching split vector quantization (SVQ) method and apparatus are provided. By quantizing in both the KLT domain and the original domain and letting the two domains interact, the apparatus and algorithm reduce the outliers in distortion that each domain exhibits. CONSTITUTION: A KLT domain converter (101) converts an input speech signal vector into small sub-blocks and applies the KLT to output a KLT vector signal. A first quantizer (102) compares each value of the input speech signal vector with the codes of each codebook and outputs the nearest codebook. A second quantizer (103) compares each value of the KLT vector with the codes of each codebook and outputs the nearest codebook. A final codebook determiner (105) compares the code vector quantized in the original domain of the speech signal and the code vector quantized in the KLT domain with the source vector, and selects the nearest code vector. [Reference numerals] (101) KLT domain converter; (104) Inverse KLT domain converter; (105) Final codebook determiner

Description

KLT-based domain switching split vector quantization method and apparatus {A METHOD AND DEVICE FOR KLT BASED DOMAIN SWITCHING SPLIT VECTOR QUANTIZATION}

The present invention relates to a speech signal processing method. Specifically, it relates to a method of selecting a codebook in a split vector quantization (SVQ) algorithm.

For high-quality speech coding in a speech encoder, it is very important to efficiently quantize the line spectral frequency (LSF) coefficients, which represent the short-term correlation of the speech signal. The optimal linear prediction coefficients of a linear predictive coefficient (LPC) filter are obtained by dividing the input signal into frames and minimizing the energy of the prediction error in each frame.

LSF data converted from LPC improves coding efficiency, but it becomes more complex as its dimension grows. As the order of vector quantization (VQ) increases, the computation becomes more complex and the memory requirement grows. Split vector quantization (SVQ) was proposed to solve this problem.

When the coefficients of the LPC filter are quantized directly, the filter characteristics are very sensitive to the quantization error of the coefficients, and the stability of the LPC filter after quantization is not guaranteed. The LPC coefficients should therefore be converted into other parameters with better quantization properties before quantization, typically reflection coefficients or LSFs. In particular, because LSF values are closely related to the frequency characteristics of speech, most recently developed standard speech coders use LSF quantization.

The LSF quantization method exploits the inter-frame correlation of the LSF coefficients for efficient quantization. That is, instead of quantizing the LSF of the current frame directly, it predicts the LSF of the current frame from the LSF values of past frames and quantizes the prediction error. Because the LSF values are closely related to the frequency characteristics of the speech signal, they are predictable over time and a fairly large prediction gain can be obtained.
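As a minimal sketch of the prediction-error scheme described above, the following Python reduces each frame to a single scalar. The prediction coefficient `ALPHA`, the step size `STEP`, and the uniform scalar quantizer are illustrative assumptions that do not come from the source, which quantizes LSF vectors.

```python
# Sketch of predictive (inter-frame) quantization on scalar values.
# ALPHA and STEP are hypothetical illustration values, not from the source.

ALPHA = 0.5   # assumed first-order prediction coefficient
STEP = 0.25   # assumed uniform quantizer step size


def quantize(value, step=STEP):
    """Uniform scalar quantizer: round to the nearest multiple of step."""
    return round(value / step) * step


def encode_decode(frames):
    """Predict each frame from the previous reconstruction, quantize the error."""
    prev_recon = 0.0
    recon = []
    for x in frames:
        predicted = ALPHA * prev_recon          # prediction from the past frame
        residual_q = quantize(x - predicted)    # only the prediction error is coded
        prev_recon = predicted + residual_q     # decoder-side reconstruction
        recon.append(prev_recon)
    return recon


frames = [1.0, 1.1, 1.3, 1.2]
recon = encode_decode(frames)
max_err = max(abs(x - r) for x, r in zip(frames, recon))
```

Because the prediction is formed from the reconstructed value rather than the original, the reconstruction error of each frame stays within half a quantizer step in this sketch.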

This method is vulnerable to channel errors because it uses the reconstructed previous frame. If an error occurs in the channel, the frame corrupted by that error can ruin all subsequent data, because the previous frame used to remove the inter-frame correlation is itself a reconstructed frame. The method is therefore not used in real-time systems that are sensitive to the channel. The algorithm poses no problem, however, in systems that are not affected by the channel. For example, text-to-speech (TTS) systems, telephone answering devices (TADs), voice recorders, and emergency callback systems are not affected by the channel. Because these are not real-time systems, the channel error problem can be ignored.

In general vector quantization, quantizing the entire vector at once is impractical because the vector table becomes too large and the search takes too long. To solve this, a method was developed that divides the entire vector into several subvectors and vector-quantizes each one independently; this is called split vector quantization (SVQ). For example, quantizing a 10th-order vector with 20 bits in one step requires a vector table of size 10 × 2^20, whereas splitting it into two 5th-order subvectors with 10 bits each reduces the table size to only 5 × 2^10 × 2. Dividing into more subvectors shrinks the vector table, which saves memory and reduces search time, but performance degrades because the correlation between the vector components is not fully exploited.
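The table sizes in the example above can be checked with a few lines of Python. The helper name `table_size` is hypothetical; it counts stored scalar values as dimension × number of code vectors × number of codebooks.

```python
def table_size(dim, bits, splits=1):
    """Stored scalars for `splits` codebooks of 2**bits vectors of length dim."""
    return dim * (2 ** bits) * splits


full_vq = table_size(dim=10, bits=20)             # one 10-dimensional codebook, 20 bits
split_vq = table_size(dim=5, bits=10, splits=2)   # two 5-dimensional codebooks, 10 bits each

print(full_vq)              # 10485760
print(split_vq)             # 10240
print(full_vq // split_vq)  # 1024
```

With the same 20 bits in total, the split arrangement stores 1024 times fewer values than the single full-dimension table.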

Meanwhile, much research on LSF quantization has been conducted to improve speech coding performance by compressing the LSF. In resolution-constrained quantization (RCQ), which uses a fixed bit rate, the cell size varies with the probability density function (PDF) of the input data. This produces outliers in distortion, which degrade performance.

In addition, a typical vector quantizer stores a codebook. Many studies aim to build an optimal codebook during training, but in practice the training data and the test data differ, so a source mismatch occurs. This source mismatch increases distortion and degrades speech coding performance.

The background art of the present invention is described in Korean Registered Patent Publication No. 10-0446630 (August 23, 2004).

An object of the present invention is to provide a method that selects a codebook closer to the input speech signal vector, in order to reduce the performance degradation caused by outliers in distortion and source mismatch during LSF quantization.

According to an aspect of the present invention for achieving the above object, a quantization apparatus includes: a KLT domain converter that converts an input speech signal vector into small sub-blocks and applies the KLT to output a KLT vector signal; a first quantizer that compares each value of the input speech signal vector with the codes of each codebook to find and output the closest codebook; a second quantizer that compares each value of the KLT vector with the codes of each codebook to find and output the closest codebook; and a final codebook determiner that compares the code vector quantized in the original domain of the speech signal and the code vector quantized in the KLT domain with the source vector and selects the closest code vector.

In one embodiment, the quantization apparatus is a split vector quantization (SVQ) apparatus.

Preferably, the apparatus further includes an inverse KLT domain converter that applies the inverse KLT to the KLT vector output from the second quantizer and outputs the result, and the final codebook comparator compares the code vectors output from the first quantizer and the inverse KLT domain converter with the source vector.

In one embodiment, the input speech signal is a line spectral frequency (LSF) type signal, and the input speech signal is the difference between the input current frame and the predicted current-frame data.

More preferably, the first quantizer compares each value of the input speech signal vector x^k, which has the values x^k1 to x^kn, with the k¹-th to kⁿ-th codes of each codebook to find the closest codebook, and the second quantizer compares each value of the KLT vector X^k, which has the values X^k1 to X^kn obtained through the KLT domain converter, with the k¹-th to kⁿ-th codes of each codebook to find the closest codebook.

The KLT-based SVQ apparatus and algorithm according to the present invention reduce the outliers in distortion that each domain exhibits, through the interaction between the KLT domain and the original domain. Because two domains are used, the code vectors are spread more widely, which reduces source mismatch.

When the performance of the KLT-based SVQ according to the present invention is compared with that of conventional SVQ in terms of spectral distortion (SD), the KLT-based SVQ shows a gain of 1 bit or more over conventional SVQ.

FIG. 1 is a block diagram of a quantization apparatus using the KLT domain according to the present invention.
FIG. 2 is a flowchart of a quantization method implemented by the quantization apparatus of FIG. 1.

Specific details of other embodiments are included in the detailed description and the drawings.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described below in conjunction with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be embodied in many different forms. These embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those skilled in the art, and the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

For efficient speech coding, the human vocal tract filter is modeled by linear predictive coding (LPC) coefficients. The line spectral frequency (LSF) representation was proposed for better coding of the LPC coefficients. Each frame of speech can be modeled as an all-pole filter, H(z) = 1/A(z), where A(z) is the inverse filter composed of the LPC coefficients, expressed using the z-transform.

In Equation 1, P is the LPC order and a is the LPC coefficient. To define the LSF, the inverse filter can be expressed as two polynomials, as in Equation 2.

In Equation 2, P(z) is the even term of the inverse filter and Q(z) is the odd term. The LSF coefficients used for coding are the roots of P(z) and Q(z). The LSFs obtained from Equation 2 are sorted in ascending order. This property enables efficient LSF computation and thus improves compression efficiency.
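Equations 1 and 2 are not reproduced in this text. For orientation, the standard textbook definitions of the inverse filter and of the two LSF polynomials are sketched below. The sign convention for the coefficients a_i and the exact form used by the authors are assumptions, not taken from the source. As in the text, P denotes the LPC order and P(z) one of the two polynomials.

```latex
% Equation 1 (assumed sign convention): inverse filter of order P
A(z) = 1 + \sum_{i=1}^{P} a_i z^{-i}

% Equation 2 (standard LSF decomposition): symmetric and antisymmetric parts
P(z) = A(z) + z^{-(P+1)} A(z^{-1}), \qquad
Q(z) = A(z) - z^{-(P+1)} A(z^{-1})

% The inverse filter is recovered as the average of the two polynomials
A(z) = \tfrac{1}{2}\left[ P(z) + Q(z) \right]
```

In this standard form, the LSFs are the angles of the roots of P(z) and Q(z) on the unit circle.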

With 10th-order LPC, 10th-order LSF data is produced. The computational load of vector quantization (VQ) grows sharply with dimension, so applying VQ directly to 10th-order LSF data is too expensive for practical use. Split vector quantization (SVQ), which divides the vector appropriately before quantizing, addresses this problem. SVQ reduces the computation, but because the correlation between dimensions is only partially exploited, a split loss occurs. SVQ therefore performs worse than VQ.

FIG. 1 is a block diagram of a quantization apparatus using the KLT domain according to the present invention, and FIG. 2 is a flowchart of a quantization method implemented by the quantization apparatus of FIG. 1.

One coding paradigm is quantization based on the Karhunen-Loeve transform (KLT). The KLT is one method of removing the correlation between vector components.
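The decorrelating property of the KLT can be illustrated with a short NumPy sketch. The training data, the function names `klt` and `inverse_klt`, and the choice to estimate the transform from a sample covariance matrix are assumptions made for illustration; the source does not specify how the transform is estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 2000 correlated 4-dimensional vectors.
mixing = rng.normal(size=(4, 4))
train = rng.normal(size=(2000, 4)) @ mixing.T

mean = train.mean(axis=0)
cov = np.cov(train - mean, rowvar=False)

# KLT basis: eigenvectors of the covariance matrix (columns of `basis`).
eigvals, basis = np.linalg.eigh(cov)


def klt(x):
    """Forward KLT: project the mean-removed vector onto the eigenvectors."""
    return (x - mean) @ basis


def inverse_klt(y):
    """Inverse KLT: the basis is orthonormal, so its transpose inverts it."""
    return y @ basis.T + mean


transformed = klt(train)
cov_klt = np.cov(transformed, rowvar=False)
# Off-diagonal covariance entries vanish (up to rounding) in the KLT domain.
off_diag = cov_klt - np.diag(np.diag(cov_klt))
```

Because the basis is orthonormal, the inverse transform is a plain transpose, which is what allows a code vector quantized in the KLT domain to be mapped back and compared with the source vector in the original domain.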

Referring to FIGS. 1 and 2, the SVQ quantization apparatus 100 according to the present invention includes a KLT domain converter 101, a first quantizer 102, a second quantizer 103, an inverse KLT domain converter 104, and a final codebook determiner 105. When the input signal x^k is received, the KLT domain converter 101 converts it into small sub-blocks and generates the KLT vector signal X^k (S200). The first quantizer 102 compares each value of the input signal vector x^k, which has the values x^k1 to x^kn, with the k¹-th to kⁿ-th codes of each codebook to find the closest codebook (S210). The second quantizer 103 compares each value of the KLT vector X^k, which has the values X^k1 to X^kn obtained through the KLT, with the k¹-th to kⁿ-th codes of each codebook to find the closest codebook (S220).

The performance of the first and second quantizers 102 and 103 depends on the bit allocation. For optimal performance, the bit allocation is determined by the covariance matrix and the dimension. The bit allocation is given by Equation 3.

In Equation 3, c and c_i are the total number of bits and the number of bits of the i-th sub-block, respectively, and k and k_i are the total dimension and the dimension of the i-th sub-block. G_k is a constant that depends on the dimension, and C_Yi is the covariance matrix of the i-th sub-block in the KLT domain. Equation 3 is used for the bit allocation of the second quantizer in the KLT domain. For the first quantizer, C_Yi in Equation 3 is replaced by C_Xi, the covariance matrix of the i-th sub-block in the original domain.
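Equation 3 itself is not reproduced in this text. The sketch below is therefore not the patent's formula. It implements one common high-resolution allocation rule that uses the same quantities (total bits c, dimensions k and k_i, sub-block covariance matrices), under the assumption that the dimension-dependent constant G_k is equal for all sub-blocks and cancels. The function name `allocate_bits` and the example covariances are hypothetical.

```python
import numpy as np


def allocate_bits(total_bits, covariances):
    """High-resolution allocation across sub-blocks; ignores the G_k constant."""
    dims = np.array([c.shape[0] for c in covariances], dtype=float)
    k = dims.sum()
    # log2 of each sub-block covariance determinant (slogdet avoids overflow).
    log_dets = np.array([np.linalg.slogdet(c)[1] for c in covariances]) / np.log(2.0)
    # c_i = (k_i / k) c + 0.5 * (log2 det C_i - (k_i / k) * sum_j log2 det C_j)
    return dims / k * total_bits + 0.5 * (log_dets - dims / k * log_dets.sum())


cov_a = np.diag([4.0, 4.0, 4.0])   # hypothetical high-variance sub-block
cov_b = np.diag([1.0, 1.0, 1.0])   # hypothetical low-variance sub-block
bits = allocate_bits(12, [cov_a, cov_b])
print(bits)  # [7.5 4.5]
```

The allocations sum to the total budget, and the sub-block with the larger variance receives more bits. A practical design would still have to round these values to integers.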

Each sub-block is trained using the Linde-Buzo-Gray (LBG) algorithm. If more classes of codebooks are included in the codebook database, the SNR performance of the vector quantizer for speech signals improves further.
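For readers unfamiliar with LBG training, a compact NumPy sketch of the splitting variant follows. The iteration count, the perturbation `eps`, the squared Euclidean distortion, and the random training data are illustrative assumptions. The source names the algorithm but gives no training parameters.

```python
import numpy as np


def lbg(train, bits, iters=20, eps=1e-3):
    """Train a 2**bits-entry codebook by repeated splitting and Lloyd iterations."""
    codebook = train.mean(axis=0, keepdims=True)      # start from the global centroid
    for _ in range(bits):
        # Split every code vector into a slightly perturbed pair.
        codebook = np.concatenate([codebook + eps, codebook - eps])
        for _ in range(iters):
            # Nearest-neighbour partition under squared Euclidean distance.
            dists = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = dists.argmin(axis=1)
            # Centroid update; empty cells keep their previous code vector.
            for j in range(len(codebook)):
                members = train[nearest == j]
                if len(members):
                    codebook[j] = members.mean(axis=0)
    return codebook


rng = np.random.default_rng(1)
train = rng.normal(size=(500, 3))   # hypothetical training data for one sub-block
codebook = lbg(train, bits=3)       # 8 code vectors of dimension 3
```

In an SVQ design, one such codebook would be trained per sub-block and per domain, with `bits` set by the bit allocation.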

The inverse KLT domain converter 104 then applies the inverse KLT to the quantized speech signal output from the second quantizer 103.

The KLT domain converter 101 and the inverse KLT domain converter 104 operate on a frame-by-frame basis.

The final codebook determiner 105 then compares the code vectors selected in the two domains, which are output from the first quantizer 102 and the inverse KLT domain converter 104, with the source vector and selects the closest code vector (S230).

That is, the SVQ according to the present invention uses a quantizer designed in the original domain and a quantizer designed in the KLT domain to select, in each domain, a code vector with small distortion. Finally, the code vector selected in each domain is compared with the source vector, and the closest code vector is selected.

When the distortion between the KLT-domain code vector and the source vector is smaller than the distortion between the original-domain code vector and the source vector, the KLT-domain code vector is selected to reduce the distortion. Conversely, when the distortion between the original-domain code vector and the source vector is smaller, the original-domain code vector is selected.
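The selection rule above can be sketched as follows. Several details are assumptions made to keep the example short: the KLT is applied to the whole zero-mean vector before splitting, distortion is squared Euclidean distance, and the codebooks are placeholder subsets of the training data rather than LBG-trained codebooks. All function and variable names are hypothetical.

```python
import numpy as np


def nearest(codebook, x):
    """Return the codebook entry closest to x (squared Euclidean distance)."""
    return codebook[((codebook - x) ** 2).sum(axis=1).argmin()]


def svq(codebooks, splits, x):
    """Split x at the given index ranges and quantize each sub-block separately."""
    return np.concatenate([nearest(cb, x[a:b]) for cb, (a, b) in zip(codebooks, splits)])


def switch_quantize(x, basis, cbs_orig, cbs_klt, splits):
    """Quantize in both domains and keep whichever reconstruction is closer to x."""
    q_orig = svq(cbs_orig, splits, x)                   # first quantizer (S210)
    q_klt = svq(cbs_klt, splits, x @ basis) @ basis.T   # second quantizer, then inverse KLT
    d_orig = ((x - q_orig) ** 2).sum()
    d_klt = ((x - q_klt) ** 2).sum()
    return ("klt", q_klt, d_klt) if d_klt < d_orig else ("orig", q_orig, d_orig)


rng = np.random.default_rng(2)
train = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))   # hypothetical data
_, basis = np.linalg.eigh(np.cov(train, rowvar=False))         # KLT basis
splits = [(0, 2), (2, 4)]
# Placeholder codebooks: 16 training sub-vectors per sub-block (not LBG-trained).
cbs_orig = [train[:16, a:b] for a, b in splits]
cbs_klt = [(train @ basis)[:16, a:b] for a, b in splits]

domain, q, dist = switch_quantize(train[500], basis, cbs_orig, cbs_klt, splits)
```

By construction, the returned distortion is never larger than that of either single-domain quantizer, which is the mechanism the description relies on to suppress outliers.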

This interaction reduces the outliers in distortion that each domain exhibits. Because two domains are used, the code vectors are spread more widely, which reduces source mismatch.

When the performance of the KLT-based SVQ according to the present invention is compared with that of conventional SVQ in terms of spectral distortion (SD), it can be seen that the KLT-based SVQ obtains a gain of 1 bit or more over conventional SVQ.

Although the present invention has been described with reference to the embodiments illustrated in the accompanying drawings, these are merely examples, and a person having ordinary knowledge in the related art will understand that various modifications and equivalent other embodiments are possible.

Accordingly, the true scope of technical protection of the present invention should be determined by the technical idea of the appended claims.

100: SVQ quantization apparatus 101: KLT domain converter
102: first quantizer 103: second quantizer
104: inverse KLT domain converter 105: final codebook determiner

Claims

A quantization apparatus comprising:
a KLT domain converter that converts an input speech signal vector into small sub-blocks and applies the KLT to output a KLT vector signal;
a first quantizer that compares each value of the input speech signal vector with the codes of each codebook to find and output the closest codebook;
a second quantizer that compares each value of the KLT vector with the codes of each codebook to find and output the closest codebook; and
a final codebook determiner that compares the code vector quantized in the original domain of the speech signal and the code vector quantized in the KLT domain with a source vector and selects the closest code vector.

The quantization apparatus of claim 1,
wherein the first quantizer or the second quantizer sets the bit allocation according to the covariance matrix and the dimension using the following equation:

where c and c_i are the total number of bits and the number of bits of the i-th sub-block, k and k_i are the total dimension and the dimension of the i-th sub-block, G_k is a constant that depends on the dimension, and C_Yi is the covariance matrix of the i-th sub-block in the KLT domain or the original domain.

The quantization apparatus of claim 1,
wherein the quantization apparatus is a split vector quantization apparatus.

The quantization apparatus of claim 1,
further comprising an inverse KLT domain converter that applies the inverse KLT to the KLT vector output from the second quantizer and outputs the result.

The quantization apparatus of claim 4,
wherein the final codebook comparator compares the code vectors output from the first quantizer and the inverse KLT domain converter with the source vector.

The quantization apparatus of claim 5,
wherein the input speech signal is a line spectral frequency (LSF) type signal.

The quantization apparatus of claim 6,
wherein the input speech signal is the difference between the input current frame and the predicted current-frame data.

The quantization apparatus of claim 5,
wherein the first quantizer compares each value of the input speech signal vector x^k, which has the values x^k1 to x^kn, with the k¹-th to kⁿ-th codes of each codebook to find the closest codebook.

A quantization method comprising:
a KLT domain conversion step of converting an input speech signal vector into small sub-blocks and applying the KLT to output a KLT vector signal;
a first quantization step of comparing each value of the input speech signal vector with the codes of each codebook to find and output the closest codebook;
a second quantization step of comparing each value of the KLT vector with the codes of each codebook to find and output the closest codebook; and
a final codebook determination step of comparing the code vector quantized in the original domain of the speech signal and the code vector quantized in the KLT domain with the source vector and selecting the closest code vector.

The quantization method of claim 9,
wherein the first or second quantization step sets the bit allocation according to the covariance matrix and the dimension using the following equation:

The quantization method of claim 9,
wherein the quantization method uses a split vector quantization algorithm.

The quantization method of claim 9,
further comprising an inverse KLT domain transformation step of applying the inverse KLT to the KLT vector generated in the second quantization step and outputting the result.

The quantization method of claim 12,
wherein the final codebook comparison step compares the code vectors output in the first quantization step and the inverse KLT domain transformation step with the source vector.

The quantization method of claim 13,
wherein the input speech signal is a line spectral frequency (LSF) type signal.

The quantization method of claim 14,
wherein the input speech signal is the difference between the input current frame and the predicted current-frame data.

The quantization method of claim 13,
wherein the first quantizer compares each value of the input speech signal vector x^k, which has the values x^k1 to x^kn, with the k¹-th to kⁿ-th codes of each codebook to find the closest codebook.