KR20040043278A

KR20040043278A - Speech encoder and speech encoding method thereof

Info

Publication number: KR20040043278A
Application number: KR1020020071497A
Authority: KR
Inventors: 성호상
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute)
Priority date: 2002-11-18
Filing date: 2002-11-18
Publication date: 2004-05-24
Also published as: KR100465316B1

Abstract

PURPOSE: A speech encoder (vocoder) and a speech encoding method using it are provided to reduce the computation required by the fixed codebook search used for innovation analysis, and to reduce the memory and the number of bits that this search requires. CONSTITUTION: The vocoder includes a preprocessor that preprocesses an input speech signal; a linear predictive coding (LPC) analysis and quantization unit that performs LPC analysis and quantization on the preprocessed signal; a pitch analyzer that analyzes the pitch to obtain the long-term correlation; and a codebook search unit that searches an algebraic codebook to model the innovation signal. The vocoder further includes a synthesis filter that synthesizes a speech signal from the pitch and code vector found by the pitch analyzer and the codebook search unit and from the LPC coefficients output by the LPC analysis and quantization unit; an adder that computes the error between the outputs of the preprocessor and the synthesis filter; a weighting filter that receives this error and outputs a weighted error; and a parameter encoder that encodes the LPC coefficients, the pitch parameter selected by the pitch analyzer, and the codebook parameter selected by the codebook search unit, and outputs the encoded speech data.

Description

Speech Encoder and Speech Encoding Method Using The Same {SPEECH ENCODER AND SPEECH ENCODING METHOD THEREOF}

The present invention relates to speech signal processing and, more particularly, to an encoder and an encoding method for wideband speech signals sampled at 16 kHz.

In general, the signals handled in speech signal processing can be divided into narrowband and wideband signals according to their bandwidth. A narrowband signal means that the encoder input is 16-bit linear PCM (pulse code modulation) data obtained by sampling the analog input speech signal at 8 kHz; a wideband signal means that the encoder input is 16-bit linear PCM data obtained by sampling the analog input signal at 16 kHz.

Speech coding techniques (codecs) for narrowband signals include, among the standards of the ITU-T (Telecommunication Standardization Sector of the International Telecommunication Union), the PCM methods of G.711 to G.712 and the non-PCM compression methods of the G.720 to G.729 series. Speech codecs for wideband signals include ITU-T G.722 and G.722.1, and AMR-WB (G.722.2), which is intended for use in IMT-2000.

ITU-T G.723.1, a representative narrowband speech coder, was standardized to compress multimedia signals at low bit rates. It compresses and reconstructs input speech at dual rates of 5.3 and 6.3 kbit/s and provides the speech quality of the wired telephone network (toll quality). G.723.1 uses a hybrid technique that combines waveform coding and parametric coding, and it belongs to the CELP (Code Excited Linear Prediction) family of speech coders.

ITU-T G.722 was standardized for coding wideband audio signals. It operates at 64, 56, and 48 kbit/s, provides face-to-face communication quality, and splits the signal into two subbands, each of which is coded with ADPCM.

3GPP AMR-WB is the most recently standardized wideband speech coder. It was standardized for IMT-2000 to expand the demand for mobile communications, and the ITU-T adopted the same codec as G.722.2 so that it can be used in both wired and wireless networks. G.722.2 has nine bit rates, up to 23.85 kbit/s, and at its highest rate it provides better speech quality than ITU-T G.722 at 64 kbit/s.

The CELP speech codecs widely used today can be divided into LPC (linear predictive coding) analysis, pitch analysis, and innovation analysis, and many methods have been proposed for modeling the innovation in particular. Among them, the fixed codebook with an algebraic structure performs well and is widely used.

An algebraic fixed codebook consists of three components: the method of constructing the codebook structure, the method of indexing the selected pulses, and the codebook search method. Algebraic codebooks are widely used in narrowband speech coders in particular because they require little computation.

However, a wideband speech signal is sampled at 16 kHz and therefore has twice as many samples as a narrowband signal. Applying such an algebraic codebook to a wideband speech coder consequently raises serious problems of computational load and memory usage.

The present invention addresses these problems. Its object is to provide, for a CELP wideband speech coding method, a fixed codebook structure that reduces the computation, memory, and bits used by the fixed codebook search for innovation analysis, together with a fixed codebook search method using that structure.

FIG. 1 shows the structure of a typical CELP encoder.

FIG. 2 shows the structure of a typical CELP decoder.

FIG. 3 shows the structure of the conventional AMR-WB algebraic codebook at 12.65 kbit/s.

FIG. 4 shows a split algebraic codebook structure according to an embodiment of the present invention.

*** Description of reference numerals for the main parts of the drawings ***

101: preprocessor; 102: LPC analysis unit

103: synthesis filter; 104: pitch analysis unit

105: fixed codebook search unit; 108: fixed codebook

110: adaptive codebook; 113: parameter encoder

To achieve this object, an algebraic codebook structure according to one aspect of the present invention is a subframe-based algebraic codebook structure for a speech encoder that converts an input speech signal into encoded speech data. The structure comprises a predetermined number of sequential pulse numbers, a series of tracks corresponding to the respective pulse numbers, and the pulse positions belonging to each track. It is divided into a first part and a second part, and the pulse numbers and pulse positions are distributed between the two parts.

The pulse positions included in each part are limited to at most half the number of samples in the subframe.

A speech encoder according to another aspect of the present invention converts an input speech signal into encoded speech data and comprises: a preprocessor that preprocesses the input speech signal; an LPC (linear predictive coding) analysis and quantization unit that performs LPC analysis and quantization on the preprocessed signal; a pitch analyzer that analyzes the pitch to obtain the long-term correlation; a codebook search unit that searches an algebraic codebook to model the innovation signal, that is, the part of the excitation signal that remains after the long-term correlation contribution is removed, where the algebraic codebook is defined per subframe, comprises a predetermined number of sequential pulse numbers, a series of tracks corresponding to the respective pulse numbers, and the pulse positions belonging to each track, and is divided into a first part and a second part between which the pulse numbers and pulse positions are distributed; a synthesis filter that synthesizes a speech signal from the pitch and code vector found by the pitch analyzer and the codebook search unit and from the LPC coefficients output by the LPC analysis and quantization unit; an adder that computes the error between the output of the preprocessor and the output of the synthesis filter; a weighting filter that receives the error from the adder and outputs a weighted error; and a parameter encoder that encodes the LPC coefficients, the pitch parameter selected by the pitch analyzer, and the codebook parameter selected by the codebook search unit, and outputs the encoded speech data.

The pulse positions included in each part are limited to at most half the number of samples in the subframe.

A speech encoding method according to yet another aspect of the present invention converts an input speech signal into encoded speech data and comprises: a) preprocessing the input speech signal; b) performing LPC (linear predictive coding) analysis and quantization on the preprocessed signal; c) analyzing the pitch and searching an algebraic codebook to generate a code vector for synthesizing the input speech signal, where the algebraic codebook is defined per subframe, comprises a predetermined number of sequential pulse numbers, a series of tracks corresponding to the respective pulse numbers, and the pulse positions belonging to each track, and is divided into a first part and a second part between which the pulse numbers and pulse positions are distributed, and where the codebook search first performs an algebraic code search over the pulses belonging to the first part to find the algebraic code that minimizes the error with respect to the target vector of the input speech signal, and then performs an algebraic code search over the pulses belonging to the second part to find the algebraic code that minimizes that error; d) synthesizing a speech signal from the pitch and code vector found by the pitch analysis and the codebook search and from the LPC coefficients output by the LPC analysis and quantization; and e) encoding the LPC coefficients, the pitch parameter selected by the pitch analysis, and the codebook parameter selected by the codebook search, and outputting the encoded speech data.

Step c) comprises: sequentially selecting the positions of the pulses belonging to the first part; sequentially selecting the positions of the pulses belonging to the second part; and selecting a sign for each pulse whose position has been selected.

A preferred embodiment of the present invention, which a person of ordinary skill in the art can readily implement, is described in detail below with reference to the accompanying drawings.

FIG. 1 shows the configuration of a typical CELP speech encoder.

As shown in FIG. 1, the preprocessor 101 preprocesses the input speech signal s(n), and the LPC (linear predictive coding) analysis and quantization unit 102 performs LPC analysis and quantization on the preprocessed speech signal and outputs LPC coefficients. The LPC coefficients output from unit 102 are quantized for transmission, and the quantized coefficients form the synthesis filter 103. The synthesis filter 103 is then used to synthesize the final speech signal from the results (pitch and code vector) found in the analysis-by-synthesis process. The transfer function of the synthesis filter 103 is given by Equation 1 below.
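A standard form of the all-pole synthesis filter, consistent with the definitions of â_i and m that follow, is given below. The sign convention A(z) = 1 + Σ â_i z^(-i) is an assumption; some coders write the sum with a minus sign instead.

```latex
\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i \, z^{-i}}
```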

In the above equation, â_i (i = 1, …, m) are the quantized LPC coefficients. The prediction order is determined by m: narrowband speech codecs typically use order 10, and wideband speech codecs use an order of about 16 to 20.
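As an illustration, the recursion below filters an excitation through 1/Â(z) under the assumed convention A(z) = 1 + Σ â_i z^(-i). The function name and the state-passing interface are hypothetical; returning the filter memory lets consecutive subframes be filtered without discontinuities.

```python
def synthesis_filter(excitation, a_hat, memory=None):
    """Filter `excitation` through the all-pole filter 1/A(z).

    Assumes A(z) = 1 + sum_{i=1..m} a_hat[i-1] * z^-i, so that
    y[n] = e[n] - sum_{i=1..m} a_hat[i-1] * y[n-i].
    `memory` holds the last m outputs, most recent first.
    """
    m = len(a_hat)
    past = list(memory) if memory is not None else [0.0] * m
    out = []
    for e in excitation:
        y = e - sum(a * p for a, p in zip(a_hat, past))
        out.append(y)
        past = ([y] + past)[:m]  # shift the state: newest output first
    return out, past


# Order-1 example: a_hat = [-0.5] gives y[n] = e[n] + 0.5 * y[n-1].
y, state = synthesis_filter([1.0, 0.0, 0.0, 0.0], [-0.5])
```

An impulse input therefore yields a decaying response, and `state` can be passed back in as `memory` for the next subframe.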

After the synthesis filter 103 is formed, the excitation signal is obtained in a closed loop using it. LPC analysis is performed only once per frame, whereas the excitation signal is obtained per subframe. To form the synthesis filter for each subframe, the LPC coefficients obtained in the current frame are interpolated with those obtained in the previous frame.
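The per-subframe interpolation can be sketched as a weighted blend of the previous-frame and current-frame parameter sets. The linear weights and the function name are hypothetical, and practical coders such as AMR-WB interpolate in a transformed domain (immittance spectral pairs) rather than directly on the LPC coefficients, so this only illustrates the idea.

```python
def interpolate_per_subframe(prev_params, curr_params, n_subframes=4):
    """Blend previous- and current-frame parameter vectors for each subframe.

    Hypothetical linear weights: subframe k uses weight (k + 1) / n_subframes
    on the current frame, so the last subframe uses the current frame exactly.
    """
    if len(prev_params) != len(curr_params):
        raise ValueError("parameter vectors must have the same order")
    out = []
    for k in range(n_subframes):
        w = (k + 1) / n_subframes
        out.append([(1.0 - w) * p + w * c for p, c in zip(prev_params, curr_params)])
    return out


sub = interpolate_per_subframe([0.0, 1.0], [1.0, 1.0])
```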

The excitation signal consists of the long-term correlation signal obtained from the adaptive codebook 110 and the signal obtained from the fixed codebook 108. Each of these two signals is multiplied by an appropriate gain, and their sum forms the excitation of the synthesis filter. Equation 2 below gives the synthesis filter for the long-term correlation signal obtained from the adaptive codebook 110.
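Consistent with the definitions of T and g_p that follow, a standard single-tap form of the long-term (pitch) synthesis filter, and the resulting total excitation, can be written as below. This is an assumption: practical coders including AMR-WB use fractional pitch lags in an adaptive-codebook formulation, and the fixed-codebook gain g_c and vectors v(n), c(n) are symbols introduced here for illustration.

```latex
\frac{1}{P(z)} = \frac{1}{1 - g_p \, z^{-T}}

u(n) = g_p \, v(n) + g_c \, c(n)
```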

In the above equation, T is the pitch period and g_p is the gain. That is, the current signal is predicted over the long term from the past synthesized signal one period T earlier, and multiplying this prediction by the gain yields the current long-term correlation signal.
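The long-term prediction just described can be sketched as copying the past excitation one period T back and scaling it by g_p. This assumes an integer lag; the function names are hypothetical. When T is shorter than the subframe, the last pitch cycle is repeated periodically.

```python
def adaptive_codebook_vector(past_excitation, T, L):
    """Return v(n) = u(n - T) for n = 0..L-1 using an integer lag T.

    When T < L, samples not yet available are taken from the vector being
    built, which repeats the last pitch cycle periodically.
    """
    if not 1 <= T <= len(past_excitation):
        raise ValueError("lag T must be between 1 and len(past_excitation)")
    buf = list(past_excitation)
    start = len(buf) - T
    for n in range(L):
        buf.append(buf[start + n])
    return buf[len(past_excitation):]


def long_term_contribution(past_excitation, T, g_p, L):
    """Scale the lagged excitation by the pitch gain g_p."""
    return [g_p * v for v in adaptive_codebook_vector(past_excitation, T, L)]


contrib = long_term_contribution([1.0, 2.0, 3.0], T=2, g_p=0.5, L=4)
```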

After T and g_p, the main parameters of the long-term correlation, are obtained, pitch analysis 104 and codebook search 105 are performed to obtain a more precise excitation signal. In this process, the synthesized output (the output of the synthesis filter 103) is computed for every given pitch period and gain and for every code vector in the codebook.

The adder 106 computes the error between the output of the preprocessor 101 and the output of the synthesis filter 103, and the adaptive weighting filter 107 receives this error and outputs a weighted error. Pitch analysis 104 and codebook search 105 are performed so as to minimize the energy of this weighted error.
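The analysis-by-synthesis selection amounts to synthesizing each candidate, weighting the error, and keeping the candidate with the smallest error energy. In this sketch the `synthesize` and `weight` callables are hypothetical placeholders; the weighting defaults to identity, whereas a real coder applies a perceptual weighting filter.

```python
def weighted_error_energy(target, synthesized, weight=lambda e: e):
    """Energy of the (optionally weighted) error between target and synthesis."""
    err = [t - s for t, s in zip(target, synthesized)]
    return sum(x * x for x in weight(err))


def select_best(target, candidates, synthesize, weight=lambda e: e):
    """Analysis-by-synthesis: return (index, energy) of the best candidate."""
    best_i, best_e = None, float("inf")
    for i, cand in enumerate(candidates):
        energy = weighted_error_energy(target, synthesize(cand), weight)
        if energy < best_e:
            best_i, best_e = i, energy
    return best_i, best_e


# Toy run with an identity "synthesis" and identity weighting.
idx, energy = select_best([1.0, 0.0], [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]], lambda c: c)
```

Running this loop jointly over every pitch and code vector is what makes the exhaustive search expensive.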

After LPC analysis 102, pitch analysis 104, and codebook search 105 have been performed in this way, the parameter encoder 113 encodes the LPC coefficients and the selected pitch and codebook parameters, thereby converting the input speech signal s(n) into bit-stream data for output.

The target signal of the fixed codebook search 105 is the signal that remains after the long-term correlation signal is subtracted from the target signal used to obtain the long-term correlation. Fixed codebooks can be implemented in various ways, and the algebraic codebook structure is currently the most widely used. It requires no memory to store the codebook and can quickly produce the desired innovation signal. Its drawback is a large computational load; various fast algorithms have recently been proposed to reduce it. Nevertheless, the fixed codebook search routine still accounts for the largest share of computation in the whole encoding process.

In this method, the criterion for selecting the optimal pulses is given by Equation 3 below.
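Consistent with the definitions of c_k, d, and Φ that follow, this criterion is assumed here to take the standard ACELP form used in coders such as G.729 and AMR-WB: the code vector c_k is chosen to maximize

```latex
Q_k = \frac{\left( \mathbf{d}^{T} \mathbf{c}_k \right)^{2}}{\mathbf{c}_k^{T} \, \boldsymbol{\Phi} \, \mathbf{c}_k}
```

where the numerator measures the correlation with the target and the denominator is the energy of the filtered code vector.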

In this equation, the innovation vector c_k consists of the predetermined number of pulses selected from the codebook.

The vector d and the matrix Φ are precomputed by Equations 4 and 5 below, respectively.
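The standard ACELP definitions, assumed here to match Equations 4 and 5, are given below, with N the subframe length (an assumed symbol) and x₂(n) and h(n) as defined next. Φ is symmetric, so only the entries with j ≥ i need to be computed.

```latex
d(n) = \sum_{i=n}^{N-1} x_2(i) \, h(i-n), \qquad n = 0, \dots, N-1

\phi(i,j) = \sum_{n=j}^{N-1} h(n-i) \, h(n-j), \qquad i = 0, \dots, N-1, \; j = i, \dots, N-1
```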

Here, x₂(n) is the target signal for the fixed codebook search, and h(n) is the impulse response of the weighted synthesis filter.
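A direct sketch of these precomputations and of the ACELP criterion for a sparse code vector of unit-amplitude pulses follows. The function names are hypothetical, and `h` is assumed to be truncated to at least the subframe length.

```python
def backward_filtered_target(x2, h):
    """d(n) = sum_{i=n}^{N-1} x2(i) h(i-n): correlation of target and impulse response."""
    N = len(x2)
    if len(h) < N:
        raise ValueError("impulse response must cover the subframe")
    return [sum(x2[i] * h[i - n] for i in range(n, N)) for n in range(N)]


def impulse_correlation(h, N):
    """phi(i, j) = sum_{n=j}^{N-1} h(n-i) h(n-j) for j >= i, mirrored to a full matrix."""
    phi = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i, N):
            s = sum(h[n - i] * h[n - j] for n in range(j, N))
            phi[i][j] = phi[j][i] = s
    return phi


def criterion(d, phi, positions, signs):
    """Q = (d^T c)^2 / (c^T Phi c) for unit pulses at `positions` with `signs`."""
    num = sum(s * d[p] for p, s in zip(positions, signs))
    den = sum(si * sj * phi[pi][pj]
              for pi, si in zip(positions, signs)
              for pj, sj in zip(positions, signs))
    return num * num / den


h = [1.0, 0.0, 0.0, 0.0]  # trivial impulse response for the example
d = backward_filtered_target([0.0, 3.0, 0.0, -2.0], h)
phi = impulse_correlation(h, 4)
q = criterion(d, phi, positions=[1, 3], signs=[1, -1])
```

Because only d and Φ depend on the signal, the many candidate evaluations during the search reduce to table lookups; storing Φ is what drives the memory figures discussed below.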

Because the typical CELP speech coder of FIG. 1 actually synthesizes the speech signal for every possible pitch and codebook entry and selects the pitch and code vector that come closest to the input signal, it requires a very large amount of repetitive computation. In practice, therefore, the pitch is usually searched first while ignoring the codebook, and then the selected pitch is held fixed while the codebook is searched.

Moreover, the fixed codebook of a typical CELP speech coder is quite large, so many code vectors must be searched. Because the fixed codebook is also searched more often than the pitch, the fixed codebook search accounts for the largest share of the computation of the whole CELP speech coder.

The fixed codebook structure is therefore critical in determining both the computational load and the performance of a CELP speech coder. A very simple, small fixed codebook reduces the search computation, but it cannot represent a wide variety of excitation shapes, so coder performance suffers. A large fixed codebook improves coder performance but markedly increases the search computation. The computation needed to find the actual pulse positions also grows with the number of positions in each track.

For this reason, AMR-WB uses a depth-first search to reduce memory usage; in practice it requires 16 (track) × 16 (track) × 4 = 1024 words of memory. A full search, by contrast, would require 16 (track) × 16 (track) × 6 = 1536 words.

FIG. 2 shows a typical CELP decoder.

As shown in FIG. 2, the decoder constructs the LPC synthesis filter 206 from the bit stream transmitted by the encoder, decodes the indices of the fixed codebook 202 and the adaptive codebook 204, multiplies each resulting vector by its gain, and forms the excitation signal. Passing this signal through the LPC synthesis filter 206 produces the synthesized signal, which the post-processing filter 207 then makes more pleasant to listen to.
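The decoder steps can be sketched for one subframe as follows, assuming the decoded adaptive-codebook vector `v`, fixed-codebook vector `c`, gains, and LPC coefficients are already available. The function name and the sign convention A(z) = 1 + Σ â_i z^(-i) are assumptions, and the post-filter 207 is omitted.

```python
def decode_subframe(v, c, g_p, g_c, a_hat, memory):
    """Form u(n) = g_p*v(n) + g_c*c(n) and pass it through 1/A(z).

    Assumes A(z) = 1 + sum a_hat[i-1] z^-i; `memory` holds the last
    synthesized outputs, most recent first. No post-filtering is applied.
    """
    m = len(a_hat)
    u = [g_p * vi + g_c * ci for vi, ci in zip(v, c)]
    past = list(memory)
    out = []
    for e in u:
        y = e - sum(a * p for a, p in zip(a_hat, past))
        out.append(y)
        past = ([y] + past)[:m]  # newest output first
    return u, out, past


u, s, state = decode_subframe([1.0, 0.0], [0.0, 1.0], 0.5, 2.0, [-0.5], [0.0])
```

The excitation `u` would also be appended to the decoder's past-excitation buffer so that the next subframe's adaptive-codebook vector can be built from it.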

The structure of the fixed codebook according to an embodiment of the present invention is described in detail below.

Standard AMR-WB is a wideband speech coder with a 7 kHz bandwidth and therefore uses a 16 kHz sampling rate. The codec uses a split-band structure that divides the signal into a lower band and an upper band at 6.4 kHz. The upper band carries little information and is modeled simply, while the lower band, which contains the important signal content, is coded with CELP. The frame length is 20 ms, and since the lower band is processed at a 12.8 kHz sampling rate, one frame contains 256 lower-band samples. Each frame has four subframes, so each subframe processes 64 samples.

FIG. 3 shows the structure of the conventional AMR-WB algebraic codebook at 12.65 kbit/s for these 64 samples.

As shown in FIG. 3, each track has 16 possible positions, pulses are assigned to each track, and each pulse can occur only at the positions specified for its track. The number of pulses per track depends on the bit rate.
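Since FIG. 3 itself is not reproduced here, the track layout can be sketched as four interleaved tracks over the 64-sample subframe, with track t holding positions t, t + 4, …, t + 60; in the 12.65 kbit/s mode, to our understanding, two pulses are placed on each track. The function name is hypothetical.

```python
def interleaved_tracks(subframe=64, n_tracks=4):
    """Track t holds positions t, t + n_tracks, t + 2*n_tracks, ... within the subframe."""
    return [list(range(t, subframe, n_tracks)) for t in range(n_tracks)]


tracks = interleaved_tracks()
```

Interleaving spreads each track across the whole subframe, so every track can place a pulse anywhere in time at a 4-sample resolution.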

Once a position is determined in a track, the sign of the corresponding pulse is determined. One sign is determined per pulse, and the sign and position information alone suffice to generate the pulses produced by the fixed codebook.
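As a sketch of this point, the fixed-codebook vector can be rebuilt from the position and sign lists alone. The function name is hypothetical; letting pulses that share a position add up is an assumption made here for generality.

```python
def build_code_vector(positions, signs, length=64):
    """Place unit pulses with the given signs at the given positions."""
    c = [0] * length
    for p, s in zip(positions, signs):
        if s not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        c[p] += s  # pulses that share a position add up
    return c


c = build_code_vector([0, 5, 5], [1, -1, -1], length=8)
```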

The performance of a fixed codebook is generally determined by how the codebook structure is constructed, by the pulse indexing method, and by the codebook search method. The present invention organizes the codebook as a split structure, which gives substantial advantages in pulse indexing and codebook search and also reduces the amount of memory used.

FIG. 4 shows the structure of an algebraic codebook with a split structure according to an embodiment of the present invention.

As shown in FIG. 4, the algebraic codebook according to the embodiment divides the codebook, which is organized per subframe, into two smaller parts. Compared with the typical codebook structure of FIG. 3, the tracks in the codebook structure of the embodiment are therefore smaller.
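The exact assignment of positions to tracks in FIG. 4 is not given in the text, so the sketch below assumes, hypothetically, that part 1 covers positions 0 to 31 and part 2 covers 32 to 63, each divided into four interleaved tracks of 8 positions. This reproduces the values n1 = 31 and n2 = 63 that the description gives for a 64-sample subframe.

```python
def split_tracks(subframe=64, n_tracks=4):
    """Hypothetical split layout: part 1 covers 0..subframe/2-1, part 2 the rest.

    Each part is divided into interleaved tracks, mirroring the AMR-WB layout.
    """
    half = subframe // 2
    part1 = [list(range(t, half, n_tracks)) for t in range(n_tracks)]
    part2 = [list(range(half + t, subframe, n_tracks)) for t in range(n_tracks)]
    return part1, part2


part1, part2 = split_tracks()
n1 = max(max(track) for track in part1)  # largest position in part 1
n2 = max(max(track) for track in part2)  # largest position in part 2
```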

In addition, with the codebook structure of the embodiment, the entire search can be performed independently on the first half of the codebook alone. To obtain the position and sign information, the quantities needed to evaluate Equation 3 are precomputed and stored in memory. With the split algebraic codebook of the embodiment, the fixed codebook search for one part computes and stores the information for only half of the positions and then performs the search; afterwards, the information for the remaining positions is stored in memory again and searched. Memory usage is thereby reduced. Because the position search is also performed part by part, it requires far less computation than searching the same positions in the conventional way.
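The part-by-part search can be sketched as follows. The exhaustive enumeration of one pulse per track, the preselection of each sign as the sign of d at that position, and holding the part-1 pulses fixed while part 2 is searched are assumptions made for illustration; the function names are hypothetical, and `phi` is passed as a callable so that an implementation could compute only the entries a part needs.

```python
from itertools import product


def search_part(d, phi, tracks, fixed=()):
    """Choose one pulse per track in `tracks`, maximizing (d^T c)^2 / (c^T Phi c).

    Signs are preset to sign(d[p]); `fixed` holds (position, sign) pulses
    already chosen in an earlier part and kept in the criterion.
    """
    best, best_q = None, -1.0
    for combo in product(*tracks):
        pulses = list(fixed) + [(p, 1 if d[p] >= 0 else -1) for p in combo]
        num = sum(s * d[p] for p, s in pulses)
        den = sum(si * sj * phi(pi, pj) for pi, si in pulses for pj, sj in pulses)
        if den > 0 and num * num / den > best_q:
            best, best_q = pulses, num * num / den
    return best, best_q


def split_search(d, phi, part1_tracks, part2_tracks):
    first, _ = search_part(d, phi, part1_tracks)               # part 1 searched on its own
    return search_part(d, phi, part2_tracks, fixed=first)      # then part 2, part 1 held fixed


d = [0.0] * 16
d[4], d[3], d[10], d[15] = 5.0, -4.0, 3.0, -2.0
identity = lambda i, j: 1.0 if i == j else 0.0  # toy Phi for a trivial impulse response
pulses, q = split_search(d, identity,
                         [[0, 2, 4, 6], [1, 3, 5, 7]],
                         [[8, 10, 12, 14], [9, 11, 13, 15]])
```

Each part enumerates only its own 4 × 4 position pairs here, instead of the combinations spanning the whole subframe.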

For example, if each track in the conventional codebook structure has 16 positions, finding all position combinations for two tracks requires 16 × 16 = 256 evaluations. If instead, according to the embodiment, each 16-position track is divided into two parts of 8 positions each, the number of combinations to be searched in each part is 8 × 8 = 64, and since this is done for both parts, 64 × 2 = 128 searches are performed. The number of searched positions is thus cut in half.

Pulse indexing also benefits: because each track has fewer positions, the position information can be encoded with fewer bits.
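The search-count and indexing arithmetic above can be checked directly. The bit counts assume a plain binary index for each pulse position within its track and ignore sign bits; the helper names are hypothetical.

```python
def joint_combinations(positions_per_track, n_tracks=2):
    """Number of position combinations for an exhaustive joint search of n_tracks tracks."""
    return positions_per_track ** n_tracks


def index_bits(n_positions):
    """Bits needed to index one of n_positions positions (ceil(log2(n)))."""
    return (n_positions - 1).bit_length()


conventional = joint_combinations(16)   # 16 x 16 = 256
split = 2 * joint_combinations(8)       # (8 x 8) per part, two parts = 128
bits_conventional = index_bits(16)      # 4 bits per pulse position
bits_split = index_bits(8)              # 3 bits per pulse position
```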

In the split algebraic codebook of the embodiment, the values n1 and n2 are important. In FIG. 4, the positions 0, …, n1 assigned to pulses K*index₀, K*index₁, K*index₂, and K*index₃ can be recombined in various ways and placed in suitable tracks, but no value greater than n1 may be included, because the search for that part could then no longer be performed independently.

For example, if the G.722.2 structure is applied to the split algebraic codebook of the embodiment, n1 = 31 and n2 = 63 in FIG. 4. The pulse positions of each part are then restricted to half of the subframe, which may cause a slight degradation in speech quality.

However, with the algebraic codebook structure of the embodiment, all processes except LPC analysis and quantization are performed per subframe. Bits can therefore be saved in pulse indexing, the codebook can be searched quickly, and memory usage is reduced.

In addition, the designer may choose the number of pulses assigned to each track and the sign information of each pulse in various ways.

The drawings and the detailed description are merely illustrative of the present invention. They are used only to explain the invention and do not limit its meaning or the scope of the invention as set forth in the claims. Those skilled in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

As described above, applying the split-structure codebook according to the present invention leaves each search sub-part with less position information, which reduces memory usage. Because fewer positions must be searched, the computational load also drops and the fixed codebook search can be performed at high speed.
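The two-stage search (the first part first, then the second part with the first part's pulses fixed, as in claims 5 and 6) can be sketched as follows. This is a simplified stand-in rather than the patent's implementation. It uses the usual ACELP criterion (dᵀc)² / (cᵀΦc), where d is the backward-filtered target and Φ is the impulse-response correlation matrix. It also places one pulse per track and takes each selected pulse's sign from d[p], which is an assumed simplification of the sign-selection step. All function names are hypothetical.

```python
from itertools import product

def criterion(pulses, d, phi):
    """ACELP-style criterion (d^T c)^2 / (c^T Phi c) for pulses {position: sign}."""
    num = sum(s * d[p] for p, s in pulses.items())
    den = sum(si * sj * phi[pi][pj]
              for pi, si in pulses.items() for pj, sj in pulses.items())
    return num * num / den if den > 0 else 0.0

def search_part(tracks, d, phi, fixed):
    """Exhaustively choose one position per track in one part, keeping `fixed` pulses."""
    best, best_val = dict(fixed), -1.0
    for combo in product(*tracks):
        pulses = dict(fixed)
        for p in combo:
            # Assumed simplification: the sign follows the target correlation d[p].
            pulses[p] = 1 if d[p] >= 0 else -1
        val = criterion(pulses, d, phi)
        if val > best_val:
            best, best_val = pulses, val
    return best

def split_search(first_tracks, second_tracks, d, phi):
    """Stage 1 searches the first part; stage 2 searches the second with stage 1 fixed."""
    stage1 = search_part(first_tracks, d, phi, fixed={})
    return search_part(second_tracks, d, phi, fixed=stage1)

# Toy example: 16-sample subframe, identity Phi (uncorrelated impulse response assumed).
d = [0.1, -0.9, 0.3, 0.2, 0.8, 0.0, -0.1, 0.4,
     0.05, 0.6, -0.7, 0.2, 0.3, 0.1, 0.0, -0.2]
phi = [[1.0 if i == j else 0.0 for j in range(16)] for i in range(16)]
first_tracks = [[0, 2, 4, 6], [1, 3, 5, 7]]         # positions <= n1 = 7
second_tracks = [[8, 10, 12, 14], [9, 11, 13, 15]]  # positions n1+1 .. n2 = 15
result = split_search(first_tracks, second_tracks, d, phi)
print(result)  # {4: 1, 1: -1, 10: -1, 9: 1}
```

Each stage tests only 4 × 4 = 16 combinations, for 32 in total. A joint search over all four tracks would test 4⁴ = 256. The trade-off is that the second stage cannot revise the pulses already fixed in the first stage.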

In addition, because there is less position information, many bits can be saved when the pulses are indexed. Moreover, because the codec's computational load is reduced, a system using it can be implemented on a lower-capacity processor, which lowers the cost of the overall system.

Claims

In a subframe-based algebraic codebook structure of a speech encoder that converts an input speech signal into encoded speech data,

the structure comprising a predetermined number of serial pulse numbers, a series of tracks corresponding to the respective pulse numbers, and pulse positions belonging to each track,

and being divided into a first part and a second part, wherein the pulse numbers and pulse positions are divided between the respective parts,

Algebraic codebook structure.

The algebraic codebook structure of claim 1,

wherein the number of pulse positions included in each part is less than or equal to half the number of samples in a subframe.

Algebraic codebook structure.

In the speech encoder for converting the input speech signal to the encoded speech data,

A preprocessor that preprocesses the input speech signal;

An LPC analysis and quantization unit that performs linear predictive coding (LPC) analysis and quantization on the preprocessed signal;

A pitch analyzer for analyzing pitches to obtain long-term correlations;

A codebook search unit that searches an algebraic codebook for modeling the innovation signal, that is, the excitation signal excluding the contribution of long-term correlation, wherein the algebraic codebook is configured in subframe units and comprises a predetermined number of serial pulse numbers, a series of tracks corresponding to the respective pulse numbers, and pulse positions belonging to each track, the structure being divided into a first part and a second part such that the pulse numbers and pulse positions are divided between the respective parts;

A synthesis filter that synthesizes a speech signal using the pitch and code vector retrieved by the pitch analyzer and the codebook search unit, and the LPC coefficients output from the LPC analysis and quantization unit;

An adder for obtaining an error between the output of the preprocessor and the synthesis filter;

A weighting filter that receives the error obtained by the adder and outputs a weighted error; and

A parameter encoder that outputs encoded speech data by encoding the LPC coefficients, the pitch parameter selected by the pitch analyzer, and the codebook parameter selected by the codebook search unit

A speech signal encoder comprising the foregoing elements.

The speech signal encoder of claim 3,

Speech signal encoder.

In a speech encoding method for converting an input speech signal into encoded speech data,

a) preprocessing the input speech signal;

b) performing linear predictive coding (LPC) analysis and quantization on the preprocessed signal;

c) analyzing the pitch and searching an algebraic codebook that generates a code vector for synthesizing the input speech signal, wherein the algebraic codebook is configured in subframe units and comprises a predetermined number of serial pulse numbers, a series of tracks corresponding to the respective pulse numbers, and pulse positions belonging to each track, the structure being divided into a first part and a second part such that the pulse numbers and pulse positions are divided between the respective parts, and wherein searching the algebraic codebook comprises performing an algebraic code search on the pulses belonging to the first part to find the algebraic code that minimizes the error with respect to a target vector for the input speech signal, and then performing an algebraic code search on the pulses belonging to the second part to find the algebraic code that minimizes the error with respect to the target vector;

d) synthesizing a speech signal using the pitch and code vector obtained by the pitch analysis and the codebook search, and the LPC coefficients output by the LPC analysis and quantization; and

e) outputting encoded speech data by encoding the LPC coefficients, the pitch parameter selected by the pitch analysis, and the codebook parameter selected by the codebook search.

A speech signal encoding method comprising the foregoing steps.

The method of claim 5,

wherein step c) comprises:

Sequentially selecting the positions of the pulses belonging to the first part;

Sequentially selecting the positions of the pulses belonging to the second part; and

Selecting a sign for each pulse whose position has been selected

A speech signal encoding method comprising the foregoing steps.