KR0138878B1

KR0138878B1 - Method for reducing the pitch detection time of vocoder

Info

Publication number: KR0138878B1
Application number: KR1019940036577A
Authority: KR
Inventors: 유하영; 변경진; 김종재; 한기천; 김재석; 배명진
Original assignee: 양승택; Electronics and Telecommunications Research Institute (ETRI)
Priority date: 1994-12-24
Filing date: 1994-12-24
Publication date: 1998-07-01
Also published as: KR960027875A

Abstract

The present invention relates to a pitch search time reduction method for a vocoder, which shortens the pitch search time of a CELP-type speech coder that compresses speech in digital voice communication. In particular, it applies a decimation technique that detects the peaks and valleys within fixed intervals of the input speech signal and takes them as preliminary pitches, so that the pitch search is performed only for those preliminary pitches. This considerably reduces the overall processing of a CELP vocoder implementation without degrading speech quality.

Description

Method for Reducing the Pitch Detection Time of Vocoder

FIG. 1 is a diagram showing an example hardware configuration for implementing the present invention.

FIG. 2 is a block diagram of the software processing that implements the present invention.

*Explanation of reference numerals for the main parts of the drawings*

11: microphone
12: first amplifier
13: first low-pass filter
14: A/D converter
15: input port
21: speaker
22: second amplifier
23: second low-pass filter
24: D/A converter
25: output port
30: DSP chip
31: memory
32: I/O port

[Technical Field]

The present invention relates to a pitch search time reduction method for a vocoder that shortens the pitch search time of a CELP-type speech coder used to compress speech in digital voice communication. More specifically, it relates to a method of shortening the pitch search processing time of a vocoder by means of preliminary pitches obtained through peak-valley detection: a decimation technique detects the peaks and valleys within fixed intervals of the input speech signal and takes them as preliminary pitches, and the pitch search is performed only for those preliminary pitches, which saves pitch search time.

[Background Art and Problems of the Prior Art]

To date, digital portable communication devices have implemented speech coders based on various vocoder theories in order to use the bandwidth of the transmission channel efficiently while achieving high speech quality.

However, these vocoder techniques require a large amount of computation, and the pitch search alone accounts for more than 50% of the total computation.

Consequently, when a vocoder is implemented on a DSP chip, the heavy computational load makes real-time operation difficult unless a high-speed DSP chip is used.

Vocoder techniques for encoding speech signals fall broadly into waveform coding, source coding, and hybrid coding.

Considering current coding technology and the quality of the synthesized speech, hybrid coding is the most suitable technique for vocoders.

Hybrid coding models the vocal tract filter by linear predictive analysis and transmits the remaining residual signal as is. Examples include Residual Excited Linear Prediction (RELP), Voice Excited Linear Prediction (VELP), and Code Excited Linear Prediction (CELP).

Among these, the CELP vocoder is known to give the best speech quality for the bandwidth it uses.

The CELP vocoder uses an analysis-by-synthesis method: it analyzes the input speech signal to extract the required parameters, synthesizes a speech signal from those parameters, and compares the result with the input speech. This yields very good speech quality even at low bit rates, but because candidate signals must be synthesized and compared, the structure is highly complex, and the resulting computational load makes real-time implementation difficult.

The most computationally expensive parts of a CELP encoder are searching the codebook for the input excitation signal and computing the pitch filter coefficients.

Of these, pitch analysis, the part relevant to the present invention, extracts information about the pitch period, which corresponds to the long-term correlation of the speech signal. Because it accounts for more than 50% of the total computation of the CELP encoder, improving it has a large effect on the coder as a whole.

For speech signals, quality drops rapidly if the pitch analysis interval is made longer than a certain length, so the interval is usually set between 5 ms and 10 ms to keep computation low without degrading quality.

For speech sampled at 8 kHz, the pitch filter parameters, the pitch delay L and the pitch gain b, are usually obtained with a closed-loop structure because it gives good speech quality; in the closed-loop structure the pitch delay is restricted to values from 20 to 147.

For each of the 128 delay values in this range, the pitch gain is computed and used to obtain the response of the pitch filter to the residual signal of the spectral filter.

The mean squared error of the residual signal is computed for each case, and the b and L giving the minimum error define the optimal pitch filter. Finding the optimal pitch delay and gain therefore always requires repeating the closed-loop computation 128 times, so the computation needed for a single set of pitch parameters is very large. In other words, because the conventional pitch search must search repeatedly over the whole pitch range to find the pitch delay and gain, it accounts for more than 50% of the total computation time of the CELP encoder.

[Object of the Invention]

Accordingly, the object of the present invention is to provide a method for shortening the pitch search processing time of a vocoder by improving the pitch search method so as to reduce the computation it requires. This makes real-time implementation easier and frees DSP capacity equal to the saved computation for additional functions, allowing a more efficient system design.

[Summary of the Invention]

To achieve the above object, the present invention comprises: a first step of receiving a speech signal and removing the zero-input response of the formant synthesis filter from it; a second step of applying perceptual weighting to the speech signal from which the zero-input response has been removed and setting the pitch delay L to a predetermined initial value; a third step of passing the input speech signal through a low-pass filter to remove the effect of the higher formants; a fourth step of dividing the low-pass-filtered speech signal into fixed intervals and performing decimation, that is, detecting in each interval the peak with the largest value and the valley with the smallest value, to obtain preliminary pitches; a fifth step of passing the current-frame formant residual of the input speech signal and the output of the previous frame's pitch filter through a weighting filter to produce a synthesized speech signal; a sixth step of computing the zero-lag autocorrelation of the synthesized speech signal produced in the fifth step and the correlation value at the given pitch delay, and dividing the square of the latter by the former to obtain the correlation E_L; a seventh step of determining whether the pitch delay L exceeds a predetermined value and, if it does not, obtaining the pitch delay L corresponding to a preliminary pitch found in the fourth step and returning to the fifth step; and an eighth step of selecting, when the pitch delay L exceeds the predetermined value in the seventh step, the pitch delay L at which the correlation E_L obtained in the sixth step is maximal, together with the pitch gain b.

[Configuration and Embodiment]

The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 shows an example hardware configuration for implementing the present invention, and FIG. 2 is a block diagram of the software processing that implements it.

Referring to FIG. 1, sound waves are first converted into an electrical signal by the microphone 11, and the first amplifier 12 amplifies this signal to a fixed level.

For a speech signal, the signal input through the microphone 11 consists of components with frequencies from 20 Hz to 20 kHz.

Only the components that carry the communicated information are needed to implement the present invention, so the first low-pass filter (LPF) 13 passes only components below 4 kHz, the frequency range of that information, and removes everything above it.

In fact, telephone networks carry speech only up to 3.4 kHz.

Components above a certain frequency are removed in order to reduce the number of samples that must be processed per second once the speech signal is digitized.

The low-pass-filtered signal, which now contains only components below 4 kHz, must be converted into a digital signal so that a computer can process it; it is sampled by the analog-to-digital converter (ADC) 14.

According to the Nyquist sampling theorem, the sampling frequency used to digitize the analog signal must be at least twice the maximum frequency of the band-limited signal, so 8 kHz (twice 4 kHz) is used here.

Each sample's voltage level must also be quantized; 12 bits (2¹² = 4096 levels) were used, based on telephone speech quality.
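As a rough illustration of the sampling and 12-bit quantization figures above, the sketch below maps a sample to one of 2¹² = 4096 signed integer codes. The patent does not specify the quantizer, so the linear (uniform) characteristic, the normalization of the analog input to [-1.0, 1.0], and the name `quantize_12bit` are all assumptions.

```python
BITS = 12
LEVELS = 2 ** BITS          # 4096 quantization levels
HALF = LEVELS // 2          # codes run from -2048 to 2047


def quantize_12bit(x):
    """Hypothetical uniform quantizer for a sample normalized to [-1.0, 1.0]."""
    code = round(x * HALF)                  # scale to the integer code range
    return max(-HALF, min(HALF - 1, code))  # clip to 12-bit two's complement


# 8 kHz sampling (twice the 4 kHz band limit) gives 40 samples per 5 ms interval.
FS = 2 * 4000
print(FS * 5 // 1000, quantize_12bit(0.5), quantize_12bit(1.0))  # 40 1024 2047
```

A real telephone-grade front end might use companded rather than linear coding; the uniform mapping is chosen here only because it is the simplest reading of "12 bits".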

The digital speech signal processed in this way is fed through the input port 15 to the digital signal processor (DSP) 30 for computation and processing.

The DSP 30 processes the input speech data in software and then, as required, stores it in the memory 31 or outputs it to the input/output port 32 for transmission over the transmission channel 121. When needed, the DSP 30 synthesizes a speech signal through a decoding process, using data read from the memory 31 or received over the transmission channel 121.

The synthesized speech signal decoded by the DSP 30 is sent to the output port 25 so that it can be played through the speaker 21 to check whether it was processed correctly.

Data delivered to the output port 25 is passed to the digital-to-analog converter (DAC) 24.

Here, too, the digital signal is converted to an analog signal at a sampling rate of 8 kHz.

Because the converted analog signal is still a sequence of discrete sample levels containing components at multiples of the sampling rate, it is passed through the second low-pass filter 23 so that only the baseband signal remains.

The low-pass-filtered analog signal is amplified by the second amplifier 22 to a level sufficient to drive the speaker 21 and is then fed to the speaker 21.

The speaker 21 converts the electrical signal into sound pressure waves, so that the final speech signal can be heard by the human ear.

The pitch search method according to the present invention is described below with reference to FIG. 2.

First, the speech signal s(n) is received and the zero-input response of the formant synthesis filter (ZIR of 1/A(z)) is removed from it (s1). Perceptual weighting A(z)/A(z/α) is applied to the resulting signal x(n) (s2), and the pitch delay L is initialized to 20 (s3).
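Steps s1 and s2 can be sketched as below, assuming the LPC polynomial convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names, the zero initial state of the weighting filter, and the example coefficients and α = 0.8 are illustrative assumptions; the patent does not give them. `zero_input_response` runs 1/A(z) with zero input from its stored output memory, and `perceptual_weighting` implements the difference equation of W(z) = A(z)/A(z/α), whose denominator coefficients are a_k·α^k.

```python
def zero_input_response(a, memory, n):
    """ZIR of 1/A(z): y[k] = -sum_j a_j * y[k - j] with zero input.

    `memory` holds at least len(a) previous filter outputs, most recent last.
    """
    hist = list(memory)
    out = []
    for _ in range(n):
        y = -sum(aj * hist[-j] for j, aj in enumerate(a, start=1))
        out.append(y)
        hist.append(y)
    return out


def perceptual_weighting(x, a, alpha):
    """Filter x with W(z) = A(z) / A(z/alpha), starting from a zero state."""
    p = len(a)
    x_hist, y_hist, out = [0.0] * p, [0.0] * p, []
    for xn in x:
        num = xn + sum(a[j - 1] * x_hist[-j] for j in range(1, p + 1))
        den = sum(a[j - 1] * alpha ** j * y_hist[-j] for j in range(1, p + 1))
        yn = num - den
        out.append(yn)
        x_hist.append(xn)
        y_hist.append(yn)
    return out


# Usage (s1, s2): subtract the ZIR from the input, then weight the result.
a = [-0.9]                      # hypothetical first-order LPC coefficients
s = [1.0, 0.5, -0.25, 0.0]      # hypothetical input subframe
zir = zero_input_response(a, memory=[0.2], n=len(s))
x = [sn - zn for sn, zn in zip(s, zir)]
xw = perceptual_weighting(x, a, alpha=0.8)
```

With α = 1 the numerator and denominator cancel and the weighting filter passes the signal unchanged, which is a convenient sanity check.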

To remove the effect of the higher formants on the input speech signal s(n), s(n) is passed through a low-pass filter (s9). The low-pass-filtered signal is then divided into fixed intervals and decimated, detecting in each interval the peak with the largest value and the valley with the smallest value, to obtain the preliminary pitches (s10).

In addition, the current-frame formant residual of the input speech signal s(n) and the output of the previous frame's pitch filter are passed through the weighting filter to synthesize the speech signal y_L(n) (s4). The zero-lag autocorrelation E_yy of y_L(n) and the correlation E_xy at pitch delay L (between the target x(n) and y_L(n)) are computed (s5), and E_xy² is divided by E_yy to obtain the correlation E_L (s6).

Next, it is determined whether the pitch delay L exceeds 147 (s7). If it does not, the pitch delay L corresponding to the next preliminary pitch is obtained (s11) and processing returns to synthesizing y_L(n) (s4). If L exceeds 147, the pitch delay L at which the correlation E_L is maximal and the corresponding pitch gain b are selected (s8).

The conventional method consists of the blocks in the figure other than s9 to s11: L is increased by 1 from 20 to 147, the closed-loop computation is carried out 128 times, and the value giving the smallest error is taken as the pitch delay L. The improved method adds the functions of s9 to s11 to detect the intervals with high autocorrelation; the remaining intervals are set to zero and skipped, so they are excluded from the pitch delay candidates in the closed-loop computation.

The update in s11 (L = L + Ks) was L = L + 1 in the conventional method, giving 128 closed-loop passes in total, whereas the improved method performs the closed-loop computation only outside the skipped intervals.
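Replacing L = L + 1 with L = L + Ks amounts to running the same closed-loop evaluation over a sparse, sorted set of candidate delays instead of all 128. The sketch below shows that idea under simplifying assumptions: no weighting filter (each candidate is the past excitation delayed by L, repeated periodically if L is shorter than the subframe), hypothetical function names, and made-up preliminary pitch values.

```python
import random


def delayed_excitation(past, lag, n):
    """Past excitation delayed by `lag`, repeated periodically if lag < n."""
    y = []
    for k in range(n):
        y.append(past[len(past) - lag + k] if k < lag else y[k - lag])
    return y


def restricted_pitch_search(target, past, candidates, lag_min=20, lag_max=147):
    """Evaluate E(L) = Exy^2 / Eyy only at the preliminary-pitch delays."""
    lags = sorted({d for d in candidates if lag_min <= d <= lag_max})
    best_lag, best_gain, best_score = None, 0.0, float("-inf")
    for lag in lags:                      # plays the role of L = L + Ks
        y = delayed_excitation(past, lag, len(target))
        exy = sum(xi * yi for xi, yi in zip(target, y))
        eyy = sum(yi * yi for yi in y)
        if eyy > 0.0 and exy * exy / eyy > best_score:
            best_lag, best_gain, best_score = lag, exy / eyy, exy * exy / eyy
    return best_lag, best_gain, len(lags)


rng = random.Random(1)
past = [rng.uniform(-1.0, 1.0) for _ in range(200)]
target = [0.8 * v for v in delayed_excitation(past, 57, 40)]
# Hypothetical preliminary pitches; 10 and 200 fall outside 20..147 and are dropped.
lag, gain, evaluations = restricted_pitch_search(target, past, [10, 56, 57, 58, 113, 200])
print(lag, evaluations)  # 57 4
```

The saving comes entirely from the length of `lags`: the per-candidate work is unchanged, so search time scales with the number of preliminary pitches rather than with the full 20 to 147 range.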

In a CELP vocoder, the pitch search finds the pitch delay, and the corresponding pitch gain, for which the speech synthesized from the residual signal most closely resembles the original speech; this amounts to finding the time delay with the highest correlation.

Finding that time delay requires examining, one by one, every region in which the pitch could lie.

Because this sequential pitch search is time-consuming, the pitch search time can be reduced by using a preprocessing relation to identify in advance the intervals where the correlation is high, and then running the full pitch search only over those intervals.

The pitch of a speech signal is defined as the interval between repeating peaks, or between repeating valleys, of the speech waveform.

When pitch is detected from the waveform peaks, the autocorrelation is high only at time delays where prominent peaks occur.

Conversely, when pitch is detected from the waveform valleys, the autocorrelation is high only at time delays where prominent valleys occur.

Furthermore, the pitch period is never shorter than 2.5 ms (20 samples at 8 kHz), so the preliminary pitches are obtained by decimating the waveform used for the pitch search as follows.

First, a frame is divided into blocks of 19 samples (19/8000 s = 2.375 ms), and each block is given an index i. Since a block is shorter than the minimum pitch period, it cannot contain more than one pitch-period peak. For the i-th block, the largest peak is found; its magnitude is stored in p(i, 1) and its position in p(i, 0).

Likewise, the deepest valley is found, and its height and position are stored in v(i, 1) and v(i, 0), respectively.
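The block-wise peak and valley decimation can be sketched as follows. This is an illustrative Python version, not the patent's code: the function name is hypothetical, blocks are indexed from 0 rather than 1, ties go to the first sample, and a trailing partial block is simply ignored (an assumption; the text does not say how frame edges are handled). For example, an assumed 240-sample frame gives 240 // 19 = 12 full blocks.

```python
def decimate_peaks_valleys(frame, block=19):
    """Per 19-sample block i, record the largest peak and the deepest valley.

    Returns (p, v): p[i] = (position, value) of the peak, mirroring
    p(i, 0) and p(i, 1); v[i] likewise mirrors v(i, 0) and v(i, 1).
    Blocks are indexed from 0 here; a trailing partial block is ignored.
    """
    p, v = [], []
    for start in range(0, len(frame) - block + 1, block):
        seg = frame[start:start + block]
        k_max = max(range(block), key=seg.__getitem__)  # first largest sample
        k_min = min(range(block), key=seg.__getitem__)  # first smallest sample
        p.append((start + k_max, seg[k_max]))
        v.append((start + k_min, seg[k_min]))
    return p, v


frame = [0.0] * 60
frame[5], frame[30], frame[40] = 1.0, -1.0, 2.0
p, v = decimate_peaks_valleys(frame)
print(len(p), p[0], v[1], p[2])  # 3 (5, 1.0) (30, -1.0) (40, 2.0)
```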

Peaks and valleys found this way can leave a preliminary pitch a few samples off, because of the phase shift caused by the third formant of the speech signal. Applying the following Hanning filter to the speech signal before the decimation above removes the influence of the higher formants:

$$s'(n-2) = \frac{s(n) + 2s(n-1) + 3s(n-2) + 2s(n-3) + s(n-4)}{9} \qquad (1)$$

The cutoff frequency of this Hanning filter is 2.67 kHz.
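The smoothing filter of Equation (1) can be sketched as below, assuming its last tap is s(n-4) so that the taps 1, 2, 3, 2, 1 are symmetric and sum to the divisor 9. Written about the centre tap, its response is H(ω) = (3 + 4 cos ω + 2 cos 2ω)/9 = (1 + 2 cos ω)²/9, which is non-negative, equals 1 at DC, and has a double zero at ω = 2π/3, i.e. at fs/3 ≈ 2.67 kHz for fs = 8 kHz; this coincides with the 2.67 kHz figure given for the filter. The names `TAPS`, `hanning_smooth`, and `smoothing_gain` are hypothetical.

```python
import math

TAPS = (1, 2, 3, 2, 1)  # taps of Equation (1); they sum to 9


def hanning_smooth(s):
    """Return s'(n-2) = sum_k TAPS[k] * s(n-k) / 9 for n = 4 .. len(s)-1.

    Element j of the result is the smoothed sample s'(j + 2); edges are dropped.
    """
    return [sum(c * s[n - k] for k, c in enumerate(TAPS)) / 9.0
            for n in range(4, len(s))]


def smoothing_gain(f_hz, fs_hz=8000.0):
    """Magnitude response (1 + 2 cos w)^2 / 9 of the zero-phase (centred) filter."""
    w = 2.0 * math.pi * f_hz / fs_hz
    return (1.0 + 2.0 * math.cos(w)) ** 2 / 9.0


print(hanning_smooth([0, 0, 9, 0, 0, 0]))                 # [3.0, 2.0]
print(smoothing_gain(0.0), smoothing_gain(8000 / 3) < 1e-12)  # 1.0 True
```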

For the detected peaks and valleys to serve as preliminary pitches, the full autocorrelation of Equation (3) is performed only when the distance from the first peak (valley) found to a subsequent peak (valley) falls within the following intervals:

$$T_P(2i) = p(i,0) - \mathrm{Thp}, \qquad T_P(2i+1) = v(i,0) - \mathrm{Thv}, \qquad i = 1, 2, \ldots, 12 \qquad (2)$$

where Thp is the position of the first prominent peak and Thv is the position of the first valley.
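Equation (2) can be sketched as follows, taking as input the peak positions p(i, 0) and valley positions v(i, 0) from the decimation. The patent describes Thp as the position of the first prominent peak; using the first block's peak and valley as Thp and Thv is a simplifying assumption, as is discarding distances outside the 20 to 147 pitch range. The function name is hypothetical.

```python
def preliminary_pitch_candidates(peak_pos, valley_pos, lag_min=20, lag_max=147):
    """Equation (2): distances of later peaks/valleys from the first ones.

    Assumption: Thp and Thv are the positions of the first block's peak and
    valley; distances outside the pitch range lag_min..lag_max are discarded.
    """
    thp, thv = peak_pos[0], valley_pos[0]
    cands = set()
    for pp, vp in zip(peak_pos, valley_pos):
        for d in (pp - thp, vp - thv):        # T_P(2i) and T_P(2i+1)
            if lag_min <= d <= lag_max:
                cands.add(d)
    return sorted(cands)


print(preliminary_pitch_candidates([5, 24, 55, 105, 160], [12, 40, 62, 112, 200]))
# [28, 50, 100]
```

Peak-based and valley-based distances are pooled into one candidate set, so a delay supported by both (50 and 100 above) is evaluated only once.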

For each detected preliminary pitch T_P(i), the correlation

$$E(L) = \frac{E_{xy}^2}{E_{yy}}$$

is evaluated, and the T_P(i) that maximizes E(T_P(i)) is taken as the pitch value L of the pitch filter. The pitch filter coefficient b is then determined by

$$b = \frac{E_{xy}}{E_{yy}}$$

With this decimation, one peak and one valley are found per 19 samples. If the preliminary pitch intervals are found separately for peaks and for valleys, the search time relative to sequential pitch search is reduced to:

$$T_R = \frac{2}{19} \times 1.05 \approx 11\% \qquad (4)$$

The additional 5% in the computation time accounts for the decimation performed to obtain the preliminary pitches.
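Equation (4) is simple arithmetic; the sketch below only makes its terms explicit, reading the factor 105 as 105%, i.e. 1.05, for the decimation overhead. The function name is hypothetical. Applied to the 128-delay range of the conventional search, keeping 2 of every 19 delays corresponds to about 13.5 candidate delays.

```python
def search_time_ratio(block_len=19, extrema_per_block=2, overhead=0.05):
    """T_R of Equation (4): kept fraction of delays times (1 + decimation cost)."""
    return extrema_per_block / block_len * (1.0 + overhead)


ratio = search_time_ratio()
print(round(ratio, 4))          # 0.1105
print(round(128 * 2 / 19, 1))   # 13.5 candidate delays instead of 128
```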

To compare the pitch search times of the two methods, the average search time per 1 second of speech was measured for the test utterances.

The conventional sequential pitch search took 7.52 s on average and the proposed method 1.02 s, a time saving of about 86% (1 - 1.02/7.52 ≈ 0.864). Because absolute timings depend on the computer used, only the relative reduction was considered in the evaluation.

Compared with sequential pitch search, the proposed method lowered the average prediction gain of the pitch filter from 11.64 dB to 10.89 dB, a degradation of about 0.75 dB.

[Effects of the Invention]

According to the present invention, applying the pitch search only to the intervals where the autocorrelation of the speech waveform is high reduces the overall vocoder processing by 44.5% or more in a CELP vocoder implementation without degrading speech quality. This figure follows from the estimates above: the pitch search accounts for at least 50% of the total computation and its cost falls to about 11%, so the overall saving is at least 0.5 × (1 - 0.11) ≈ 44.5%.

As a result, a CELP vocoder can be implemented in real time even on a low-cost DSP chip with modest processing speed.

Moreover, the processing capacity freed by the reduced pitch search computation can be used for other service functions, enabling the design of an economical CELP vocoder system.

Finally, because vocoder processing time directly affects power consumption, the operating time of portable vocoders can be extended, which improves the product's competitiveness.

Claims

1. A method for shortening the pitch search processing time of a vocoder using preliminary pitches, which shortens the pitch search time when searching for the pitch in a CELP vocoder, comprising: a first step of receiving a speech signal and removing the zero-input response of a formant synthesis filter from the speech signal; a second step of applying perceptual weighting to the speech signal from which the zero-input response has been removed and setting the pitch delay (L) to a predetermined value; a third step of passing the speech signal through a low-pass filter to remove the effect of higher-order formants on the input speech signal; a fourth step of dividing the low-pass-filtered speech signal into predetermined intervals and performing decimation, detecting in each interval the peak with the largest value and the valley with the smallest value, to obtain preliminary pitches; a fifth step of passing the current-frame formant residual of the input speech signal and the output of the previous frame's pitch filter through a weighting filter to produce a synthesized speech signal; a sixth step of computing the zero-lag autocorrelation of the synthesized speech signal generated in the fifth step and the correlation value at a predetermined pitch delay, and dividing the square of the latter by the former to obtain a correlation (E_L); a seventh step of determining whether the pitch delay (L) exceeds a predetermined value and, if it does not, obtaining the pitch delay (L) corresponding to a preliminary pitch obtained in the fourth step and proceeding to the fifth step; and an eighth step of selecting, if the pitch delay (L) exceeds the predetermined value in the seventh step, the pitch delay (L) at which the correlation (E_L) obtained in the sixth step is maximal, and the pitch gain (b).

2. The method of claim 1, wherein in the second step the minimum value of the pitch delay is assumed to be 20 samples.

3. The method of claim 1, wherein in the seventh step it is determined whether the pitch delay exceeds 147.