KR20120072099A

KR20120072099A - Pitch estimation system in an integrated time and frequency domain by applying interpolation

Info

Publication number: KR20120072099A
Application number: KR1020100133889A
Authority: KR
Inventors: 서경학; 송재종; 이석필; 양창모; 김기출; 김무영
Original assignee: 전자부품연구원 (Korea Electronics Technology Institute)
Priority date: 2010-12-23
Filing date: 2010-12-23
Publication date: 2012-07-03
Also published as: KR101203158B1

Abstract

PURPOSE: A pitch estimation system operating in a mixed time-frequency domain is provided, which uses interpolation when the time-domain and frequency-domain autocorrelations are combined. CONSTITUTION: A second autocorrelator (520) outputs a frequency-domain autocorrelation value. A first interpolator interpolates the time-domain autocorrelation value. A second interpolator interpolates the frequency-domain autocorrelation value. A pitch estimator (540) applies weights to the interpolated time-domain and frequency-domain autocorrelation values and estimates the final pitch.

Description

Pitch Estimation System in an Integrated Time and Frequency Domain by Applying Interpolation

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a pitch estimation system and, more particularly, to a pitch estimation system operating in a mixed time-frequency domain that uses interpolation to reduce the rate of pitch-doubling and pitch-halving errors.

In general, voiced sounds are produced by vibration of the vocal cords, and this vibration appears as a periodic characteristic of the voiced sound signal.

This periodicity is called pitch. In short-time spectrum analysis it corresponds to the fundamental frequency, and perceptually it corresponds to how high or low the sound is.

Pitch is currently used for efficient speech compression in speech coders, is being applied to speaker recognition and speech recognition, and has recently also been used in music retrieval systems that take a user's humming as the query. Research on pitch estimation methods therefore has a large ripple effect on speech signal processing systems.

Conventional time-domain pitch estimation has difficulty when a phoneme spans a transition period, because the signal level changes sharply within the frame and the pitch period fluctuates. In noisy speech in particular, the decision logic for pitch estimation becomes complicated and detection errors increase.

Conventional frequency-domain pitch estimation is less affected by phoneme transitions, fluctuations, and background noise, but it requires a transform to the frequency domain, which makes it computationally expensive, and improving the precision of the fundamental frequency lengthens the processing time.

Moreover, when autocorrelation is applied in these two approaches, time-domain autocorrelation frequently produces pitch-doubling errors for low-pitched sounds, while frequency-domain autocorrelation frequently produces pitch-halving errors for high-pitched sounds.

To address this, pitch estimation in a hybrid time-frequency domain has also been proposed, but that method is likewise computationally expensive.

The present invention has been made in view of the technical background described above, and its object is to provide a pitch estimation system operating in a mixed time-frequency domain that uses interpolation when combining the time domain and the frequency domain.

A pitch estimation system according to one aspect of the present invention comprises: a first autocorrelator that, in the time domain, autocorrelates a voiced sound signal below a predetermined reference frequency and outputs a time-domain autocorrelation value; a second autocorrelator that, in the frequency domain, autocorrelates the voiced sound signal at or above the reference frequency and outputs a frequency-domain autocorrelation value; a first interpolation processor that interpolates the time-domain autocorrelation value; a second interpolation processor that interpolates the frequency-domain autocorrelation value; and a pitch estimator that applies weights to the interpolated time-domain and frequency-domain autocorrelation values to estimate a final pitch.

A pitch estimation system according to another aspect of the present invention comprises: a first autocorrelator that autocorrelates a voiced sound signal in the time domain to calculate a plurality of candidate pitches; a second autocorrelator that, in the frequency domain, performs autocorrelation only at the indices adjacent to the plurality of candidate pitches; an interpolation processor that interpolates the calculation result of the second autocorrelator in the frequency domain; and a pitch estimator that determines a final pitch from the interpolated calculation result.

According to the present invention, when the time-domain and frequency-domain autocorrelation values are combined, interpolation mitigates the problem that the low-frequency or high-frequency region is represented indistinctly because of the resolution mismatch between the two domains, and pitch-doubling and pitch-halving errors are reduced.

In addition, the present invention maintains high performance even with a smaller FFT size, which reduces the computation of the FFT and greatly reduces the computation needed for autocorrelation, thereby lowering the system load and increasing processing speed.

FIG. 1A is a graph showing a voiced sound signal in the time domain.
FIG. 1B is a graph showing the time-domain autocorrelation plotted on the time axis.
FIG. 1C is a graph showing the frequency-domain autocorrelation plotted on the time axis.
FIG. 1D is a graph showing the mixed time-frequency-domain autocorrelation plotted on the time axis.
FIG. 2A is a graph showing the time-domain autocorrelation.
FIG. 2B is a graph showing the frequency-domain autocorrelation.
FIG. 2C is a graph showing the frequency-domain autocorrelation plotted on the time axis.
FIG. 2D is a graph showing the interpolated frequency-domain autocorrelation plotted on the time axis.
FIG. 3 is a graph illustrating the process of combining the time and frequency domains using linear interpolation.
FIG. 4 is a block diagram of a pitch estimation system using IACF 1 according to an embodiment of the present invention.
FIG. 5 is a block diagram of a pitch estimation system using IACF 2 and IACF 3 according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that the disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains, and the invention is defined only by the scope of the claims. The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the invention. In this specification, the singular form includes the plural form unless the context specifically indicates otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

By estimating pitch with autocorrelation in a mixed time-frequency domain, the present invention reduces the pitch-doubling and pitch-halving errors that can occur when only time-domain or only frequency-domain autocorrelation is used.

Before the configuration of the present invention is described, pitch estimation using time-domain autocorrelation, frequency-domain autocorrelation, and mixed time-frequency-domain autocorrelation is explained with reference to graphs and equations.

First, the time-domain autocorrelation function (TACF) computes the time-domain autocorrelation value R_t(τ) according to Equation 1, where x[n] is the n-th sample of the voiced sound signal, τ is the lag corresponding to the pitch, and N_t is the frame length of the voiced sound signal.

A pitch estimation system using time-domain autocorrelation then takes the time index (lag) of the maximum peak of the time-domain autocorrelation as the final pitch.
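The image for Equation 1 is missing from this text. As a minimal sketch, assuming the common unnormalized form R_t(τ) = Σ_{n=0}^{N_t−1−τ} x[n]·x[n+τ], the computation and the peak-picking rule above look like this in Python with NumPy. The function names and the `min_lag` cutoff (which skips the trivial maximum at τ = 0) are illustrative assumptions, not taken from the patent.

```python
import numpy as np


def time_acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Time-domain autocorrelation R_t(tau) for tau = 0..max_lag.

    Assumed form of Equation 1 (the equation is missing from the text):
    R_t(tau) = sum_{n=0}^{N_t-1-tau} x[n] * x[n + tau], with N_t = len(x).
    """
    n_t = len(x)
    return np.array([np.dot(x[: n_t - tau], x[tau:]) for tau in range(max_lag + 1)])


def pitch_lag_from_tacf(r_t: np.ndarray, min_lag: int) -> int:
    """Lag (in samples) of the maximum peak of R_t, ignoring lags below min_lag.

    min_lag is a hypothetical guard: R_t is always largest at tau = 0.
    """
    return min_lag + int(np.argmax(r_t[min_lag:]))


if __name__ == "__main__":
    fs = 8000                                # sampling rate used in the text's examples
    n = np.arange(260)                       # 32.5 ms frame at 8 kHz
    x = np.sin(2 * np.pi * 200 * n / fs)     # 200 Hz tone, period of 40 samples
    lag = pitch_lag_from_tacf(time_acf(x, max_lag=160), min_lag=20)
    print(lag, fs / lag)
```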

Next, the frequency-domain autocorrelation function (FACF) computes the frequency-domain autocorrelation value R_f(ω_τ) according to Equation 2, where X[ω_n] is the log magnitude of the voiced sound signal at frequency index ω_n, ω_τ is the lag corresponding to the fundamental frequency, and N_f is the FFT size. Because X[ω_n] carries little useful information outside its high-energy portions, where the harmonics are not well represented, it may be clipped at its mean value before use.

A pitch extraction system using frequency-domain autocorrelation takes the frequency index of the maximum peak of the frequency-domain autocorrelation computed by Equation 2 as the final fundamental frequency. The fundamental frequency is the reciprocal of the pitch period and corresponds to the spacing between the harmonics of the windowed spectrum.
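Equation 2 is likewise not reproduced here. The sketch below assumes it mirrors Equation 1 on the clipped log-magnitude spectrum, R_f(ω_τ) = Σ_n X[ω_n]·X[ω_n + ω_τ], uses a Hamming window as in the example setup described later, and keeps only the non-negative-frequency half of the FFT. Reading "clipping at the mean" as subtracting the mean and zeroing negative values is our assumption, and all names are hypothetical.

```python
import numpy as np


def clipped_log_spectrum(frame: np.ndarray, n_fft: int) -> np.ndarray:
    """Log magnitude X[w_n] of a Hamming-windowed frame, clipped at its mean.

    Assumption: clipping is read as subtracting the mean and zeroing negative
    values, so only the high-energy (harmonic) bins remain. Only the
    non-negative-frequency half of the spectrum (rfft) is used.
    """
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.maximum(log_mag - log_mag.mean(), 0.0)


def freq_acf(x_log: np.ndarray, max_lag_bins: int) -> np.ndarray:
    """R_f(w_tau) for bin lags 0..max_lag_bins (assumed form of Equation 2)."""
    n = len(x_log)
    return np.array([np.dot(x_log[: n - k], x_log[k:]) for k in range(max_lag_bins + 1)])


def f0_from_facf(r_f: np.ndarray, min_bin: int, fs: float, n_fft: int) -> float:
    """Fundamental frequency (Hz) from the bin lag of the maximum peak of R_f."""
    k = min_bin + int(np.argmax(r_f[min_bin:]))
    return k * fs / n_fft


if __name__ == "__main__":
    fs, n_fft = 8000, 1024
    n = np.arange(260)  # 32.5 ms frame at 8 kHz
    x = sum(np.sin(2 * np.pi * 250 * h * n / fs) for h in range(1, 11))
    r_f = freq_acf(clipped_log_spectrum(x, n_fft), max_lag_bins=64)
    print(f0_from_facf(r_f, min_bin=16, fs=fs, n_fft=n_fft))
```

For a 250 Hz harmonic tone at 8 kHz with a 1024-point FFT the harmonic spacing is exactly 32 bins, so the peak of R_f should land at or next to bin lag 32.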

Because the fundamental frequency and the pitch are reciprocals of each other, the resolution that time-domain and frequency-domain autocorrelation can represent is limited by the sampling rate and the FFT size, respectively.

In particular, the resolution of frequency-domain autocorrelation depends on the frame length and on the length and type of the window function, so pitch is easier to estimate in the frequency domain with a long frame and a long window. However, frequency-domain pitch estimation has difficulty when the pitch changes within the analysis frame and is computationally expensive, so these drawbacks must be taken into account.

Accordingly, the following description uses a Hamming window and a 32.5 ms frame as an example.

Time-frequency domain autocorrelation (TFAC) combines the two autocorrelation values according to Equation 3. Before they are combined, R_t(τ) and R_f(ω_τ) are each normalized to a maximum of 1 so that differences in signal magnitude do not bias the combination.

In Equation 3, β is the weight between the time-domain and frequency-domain autocorrelation values, a parameter that controls the balance between pitch-doubling and pitch-halving errors. The examples below use β = 0.5.

In Equation 3, $\hat{\omega}_\tau$ is the frequency-domain lag closest to the time-domain lag $\tau$, obtained with the Round function as in Equation 4:

$$\hat{\omega}_\tau = \mathrm{Round}\left(\frac{N_f}{\tau}\right)$$
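A sketch of the Round-based combination, assuming Equation 3 is the weighted sum β·R_t(τ) + (1 − β)·R_f(ω̂_τ) of the max-normalized autocorrelations and that ω̂_τ = Round(N_f/τ). Both forms are inferred from the surrounding definitions rather than copied from the patent, and the toy curves in the usage block are made up to mimic a pitch-doubling case.

```python
import numpy as np


def normalize_max1(r: np.ndarray) -> np.ndarray:
    """Scale an autocorrelation so that its maximum is 1."""
    return r / np.max(r)


def tfac_round(r_t, r_f, n_fft, beta=0.5, min_lag=20):
    """Combine R_t and R_f on the time-lag grid, mapping each lag to a bin by rounding.

    Assumed forms (Equations 3 and 4 are not reproduced in the text):
      w_hat(tau) = round(N_f / tau)
      R_tf(tau)  = beta * R_t(tau) + (1 - beta) * R_f(w_hat(tau))
    Returns the lag with the largest combined value and the combined curve.
    """
    r_t = normalize_max1(np.asarray(r_t, dtype=float))
    r_f = normalize_max1(np.asarray(r_f, dtype=float))
    taus = np.arange(min_lag, len(r_t))
    w_hat = np.rint(n_fft / taus).astype(int)
    combined = np.full(len(taus), -np.inf)  # lags whose bin is out of range stay at -inf
    valid = w_hat < len(r_f)
    combined[valid] = beta * r_t[taus[valid]] + (1 - beta) * r_f[w_hat[valid]]
    return int(taus[np.argmax(combined)]), combined


if __name__ == "__main__":
    # Made-up curves: R_t peaks higher at the doubled lag 80 than at the true lag 40,
    # while R_f peaks at bin 26 = round(1024 / 40).
    r_t = np.zeros(161)
    r_t[0], r_t[40], r_t[80] = 1.0, 0.8, 0.9
    r_f = np.zeros(65)
    r_f[0], r_f[26], r_f[13] = 1.0, 0.9, 0.2
    lag, _ = tfac_round(r_t, r_f, n_fft=1024)
    print(lag)  # 40
```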

The graphs of FIGS. 1A to 1D illustrate how mixed time-frequency-domain autocorrelation can correct a pitch-doubling error.

FIG. 1A shows a voiced sound signal in the time domain, FIG. 1B shows the time-domain autocorrelation plotted on the time axis, FIG. 1C shows the frequency-domain autocorrelation plotted on the time axis, and FIG. 1D shows the mixed time-frequency-domain autocorrelation plotted on the time axis. In FIGS. 1A to 1D, * marks the maximum peak of the autocorrelation.

As shown in FIG. 1B, the time-domain autocorrelation of Equation 1 selects an integer multiple of the actual pitch period as τ, producing a pitch-doubling error in which the estimated pitch period is twice the actual one.

However, as shown in FIG. 1D, combining the time-domain autocorrelation of FIG. 1B with the frequency-domain autocorrelation of FIG. 1C corrects the pitch-doubling error.

The mixed time-frequency-domain autocorrelation described above combines the two domains by using the ω_τ closest to the time-domain τ, obtained with the Round function. Interpolation can be used instead; mixed time-frequency-domain autocorrelation with interpolation is described below.

Using interpolation when combining the time domain and the frequency domain provides two main advantages.

First, interpolation reduces the resolution mismatch between the time and frequency domains.

As noted above, pitch and fundamental frequency are reciprocals. The resolution of time-domain autocorrelation is determined by the sampling rate, whereas that of frequency-domain autocorrelation is determined by the FFT size N_f, so a resolution mismatch inevitably arises when the two domains are combined.

For example, the time domain and the frequency domain have different resolutions: 8 kHz sampling and a 1024-point FFT, respectively. Moreover, time-domain samples fall at inversely proportional positions N_f/n in the frequency domain, whereas frequency-domain samples are evenly spaced in frequency. Consequently, with 8 kHz sampling and a 1024-point FFT, the time-domain autocorrelation represents the region below 250 Hz well, and the frequency-domain autocorrelation represents the region above 250 Hz well.
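A first-order check of the 250 Hz figure, assuming a 1024-point FFT: near a pitch f0, a one-sample lag step changes the frequency by about f0²/fs, while one FFT bin spans fs/N_f, and the two are equal at f0 = fs/√N_f. This derivation is ours, offered as a sanity check rather than as the patent's stated rule for choosing the reference frequency.

```python
import math


def lag_step_hz(f0: float, fs: float) -> float:
    """Approximate frequency change per one-sample lag step near f0.

    fs/tau - fs/(tau + 1) is about fs/tau**2 = f0**2 / fs for tau = fs / f0.
    """
    return f0 ** 2 / fs


def bin_step_hz(fs: float, n_fft: int) -> float:
    """Frequency spacing of one FFT bin."""
    return fs / n_fft


def crossover_hz(fs: float, n_fft: int) -> float:
    """Frequency where both steps are equal: f0**2 / fs == fs / n_fft."""
    return fs / math.sqrt(n_fft)


print(crossover_hz(8000, 1024))  # 250.0
```

Below the crossover the lag grid is finer (for example 1.25 Hz per lag at 100 Hz versus 7.8125 Hz per bin); above it the bin grid is finer (20 Hz per lag at 400 Hz).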

This mismatch between the two domains can be resolved by interpolation, as described below with reference to FIGS. 2A to 2D.

FIG. 2A shows the time-domain autocorrelation, FIG. 2B shows the frequency-domain autocorrelation, FIG. 2C shows the frequency-domain autocorrelation plotted on the time axis, and FIG. 2D shows the interpolated frequency-domain autocorrelation plotted on the time axis. In FIGS. 2A to 2D, ○, □, *, ◇, and △ mark the peak indices of the time-domain autocorrelation.

FIG. 2A shows that the peaks of the time-domain autocorrelation occur at evenly spaced intervals owing to the periodicity of the voiced sound signal. The time-domain autocorrelation in this example uses a sampling frequency of 8 kHz.

FIG. 2B shows that the time-domain peak indices map to reciprocal positions in the frequency domain.

FIG. 2C shows that when the frequency-domain autocorrelation is plotted on the time axis, the region preceding the time-domain peak indices is represented indistinctly.

However, as shown in FIG. 2D, additionally interpolating the frequency-domain autocorrelation mapped onto the time axis also represents well the high-frequency region corresponding to the part preceding the time-domain peak indices.
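The difference between Round-based mapping and interpolation can be shown in a few lines. This sketch reads R_f at the fractional bin N_f/τ of each lag; linear interpolation via `np.interp` is a simple stand-in for the spline interpolation used in the text, and the names are hypothetical.

```python
import numpy as np


def rf_on_lag_grid(r_f: np.ndarray, taus: np.ndarray, n_fft: int, interpolate: bool = True) -> np.ndarray:
    """Sample the frequency-domain ACF R_f at the fractional bin N_f / tau of each lag.

    interpolate=True uses linear interpolation (np.interp) as a stand-in for a spline;
    interpolate=False reproduces the Round-based mapping. np.interp clamps positions
    outside the bin range to the end values.
    """
    pos = n_fft / np.asarray(taus, dtype=float)
    if interpolate:
        return np.interp(pos, np.arange(len(r_f)), r_f)
    idx = np.clip(np.rint(pos).astype(int), 0, len(r_f) - 1)
    return r_f[idx]


if __name__ == "__main__":
    r_f = np.linspace(1.0, 0.0, 65)  # made-up smooth R_f curve
    taus = np.array([100, 101, 102])
    print(rf_on_lag_grid(r_f, taus, 1024, interpolate=False))  # one repeated value
    print(rf_on_lag_grid(r_f, taus, 1024))                     # three distinct values
```

With N_f = 1024, lags 100, 101, and 102 all round to bin 10, so the rounded curve is a flat step there, while the interpolated curve gives each lag its own value.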

Second, interpolation reduces the amount of computation.

Because the frequency-domain resolution is determined by the FFT size N_f, N_f must be increased to represent the frequency-domain autocorrelation in fine detail. Increasing N_f, however, also increases the computation of both the FFT and the autocorrelation. For a signal of length N_f, the FFT requires approximately $N_f \log_2 N_f$ multiplications, so its cost grows with the signal length. In addition, the computation of the autocorrelation grows in proportion to the increase in N_f.

With interpolation, however, even an FFT length of N_f/8 causes almost no loss of performance compared with using the frequency-domain autocorrelation computed with the full N_f.

For example, when N_f is 2048, the FFT in mixed time-frequency-domain autocorrelation requires 22,528 multiplications, whereas with interpolation only 2,048 multiplications are required, a reduction of about 11 times. The computation of the autocorrelation in Equation 2 can be reduced even further.
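The quoted figures are consistent with a cost model of N·log2(N) multiplications: 2048 × 11 = 22,528 for the full FFT and 256 × 8 = 2,048 for an N_f/8 = 256-point FFT. The check below assumes that model, which the text implies through its numbers rather than stating outright.

```python
import math


def fft_multiplications(n: int) -> int:
    """Operation count N * log2(N), the model that reproduces the figures in the text."""
    return int(n * math.log2(n))


full = fft_multiplications(2048)          # N_f = 2048
reduced = fft_multiplications(2048 // 8)  # N_f / 8 = 256 points with interpolation
print(full, reduced, full / reduced)      # 22528 2048 11.0
```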

The present invention provides the following three methods of mixed time-frequency-domain autocorrelation using interpolation.

The first method (hereinafter, IACF (Interpolated Autocorrelation Function) 1) applies interpolation to both the time domain and the frequency domain. The interpolation applied may be spline interpolation, which is highly accurate.

For example, a system using IACF 1 with a time-domain sampling frequency of 8 kHz and a 1024-point FFT in the frequency domain interpolates the frequency-domain autocorrelation below 250 Hz and the time-domain autocorrelation above 250 Hz, and then applies weights to combine them.
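IACF 1 as described interpolates whichever autocorrelation is coarser in each band. The hypothetical sketch below does that with linear interpolation (`np.interp`) standing in for the spline interpolation the text names. It relies on the fact that a lag τ corresponds to bin N_f/τ and a bin k to lag N_f/k, because f0 = fs/τ = k·fs/N_f. The weighted-sum form of Equation 3, the search band `f_min`/`f_max`, and all names are assumptions.

```python
import numpy as np


def iacf1_f0(r_t, r_f, fs, n_fft, f_ref=250.0, beta=0.5, f_min=50.0, f_max=400.0):
    """Hypothetical IACF 1 sketch: read the coarser ACF on the finer grid in each band.

    Below f_ref the lag grid is finer, so R_f is interpolated at fractional bin N_f / tau.
    At or above f_ref the bin grid is finer, so R_t is interpolated at fractional lag N_f / k.
    Scores use the assumed form beta * R_t + (1 - beta) * R_f. Returns f0 in Hz.
    """
    r_t = np.asarray(r_t, dtype=float) / np.max(r_t)  # normalize to a maximum of 1
    r_f = np.asarray(r_f, dtype=float) / np.max(r_f)
    lags, bins = np.arange(len(r_t)), np.arange(len(r_f))

    taus = lags[1:]
    f_low = fs / taus
    low = (f_low >= f_min) & (f_low < f_ref)
    s_low = beta * r_t[taus[low]] + (1 - beta) * np.interp(n_fft / taus[low], bins, r_f)

    ks = bins[1:]
    f_high = ks * fs / n_fft
    high = (f_high >= f_ref) & (f_high <= f_max)
    s_high = beta * np.interp(n_fft / ks[high], lags, r_t) + (1 - beta) * r_f[ks[high]]

    freqs = np.concatenate([f_low[low], f_high[high]])
    scores = np.concatenate([s_low, s_high])
    return float(freqs[np.argmax(scores)])


if __name__ == "__main__":
    fs, n_fft = 8000, 1024
    lags, bins = np.arange(201), np.arange(65)
    # Made-up ACF shapes for a 200 Hz voice: lag period 40, bin period 25.6.
    r_t = np.maximum(np.cos(2 * np.pi * 200 * lags / fs), 0) * (1 - lags / 400)
    r_f = np.maximum(np.cos(2 * np.pi * bins / 25.6), 0) * (1 - bins / 400)
    print(iacf1_f0(r_t, r_f, fs, n_fft))
```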

The second method (hereinafter, IACF 2) finds candidate pitches (that is, peak indices) in the time-domain autocorrelation, performs frequency-domain autocorrelation only for the candidate pitches, and applies spline interpolation. Frequency-domain autocorrelation then need not be computed for every pitch, which greatly reduces the computation.
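A sketch of the IACF 2 front end: candidate lags are local maxima of the normalized time-domain autocorrelation at or above a threshold, and for each candidate only the two integer bins around N_f/τ need a frequency-domain autocorrelation value, each costing a single dot product instead of a sweep over all bin lags. The threshold rule, the dot-product form assumed for Equation 2, and the names are assumptions.

```python
import math

import numpy as np


def candidate_lags(r_t: np.ndarray, threshold: float, min_lag: int) -> list:
    """Lags of local maxima of the normalized time-domain ACF at or above threshold."""
    r = np.asarray(r_t, dtype=float) / np.max(r_t)
    return [tau for tau in range(max(min_lag, 1), len(r) - 1)
            if r[tau] >= threshold and r[tau] >= r[tau - 1] and r[tau] > r[tau + 1]]


def neighbour_bins(tau: int, n_fft: int) -> tuple:
    """Left/right integer bins around the fractional bin N_f / tau of a candidate lag."""
    pos = n_fft / tau
    left = math.floor(pos)
    return left, left + 1, pos


def facf_at(x_log: np.ndarray, k: int) -> float:
    """R_f at a single bin lag k (same assumed dot-product form as Equation 2)."""
    n = len(x_log)
    return float(np.dot(x_log[: n - k], x_log[k:]))
```

For a candidate lag of 40 samples with N_f = 1024, only bins 25 and 26 around the fractional position 25.6 would be evaluated.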

The third method (hereinafter, IACF 3) is nearly identical to the second, but instead of computationally complex spline interpolation it applies the low-cost linear interpolation expressed by Equation 5.

Here, $\tau_p$ is the peak index of the time-domain autocorrelation and $R_t(\tau_p)$ is the time-domain autocorrelation value; $\omega_p$ is the peak index of the frequency-domain autocorrelation and $R_f(\omega_p)$ is the frequency-domain autocorrelation value. In addition, $\omega_L$ and $\omega_R$ are the left and right indices adjacent to $\omega_p$, and $R_f(\omega_L)$ and $R_f(\omega_R)$ are the autocorrelation values at those adjacent indices.

As shown in FIG. 3, the frequency-domain autocorrelation value at the peak index is calculated from the result of interpolating the frequency-domain autocorrelation.
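Equation 5 itself is not reproduced here. The sketch assumes it is ordinary two-point linear interpolation between the two integer bins adjacent to the fractional position N_f/τ of each candidate lag, and, following the description of the pitch estimator, selects the candidate with the largest interpolated value. `rf_at` is a hypothetical callable returning R_f at an integer bin, for example one filled only at the neighbouring bins of each candidate.

```python
import math


def linear_interp(w_l: float, r_l: float, w_r: float, r_r: float, w: float) -> float:
    """Value at w on the straight line through (w_l, r_l) and (w_r, r_r)."""
    return r_l + (r_r - r_l) * (w - w_l) / (w_r - w_l)


def iacf3_select(candidates, rf_at, n_fft: int):
    """Pick the candidate lag whose linearly interpolated R_f at N_f / tau is largest.

    rf_at(k) returns R_f at integer bin k. Returns (best_lag, interpolated_value).
    """
    best = None
    for tau in candidates:
        w_p = n_fft / tau                # fractional bin of this candidate
        w_l = math.floor(w_p)            # left neighbour; right neighbour is w_l + 1
        value = linear_interp(w_l, rf_at(w_l), w_l + 1, rf_at(w_l + 1), w_p)
        if best is None or value > best[1]:
            best = (tau, value)
    return best


if __name__ == "__main__":
    rf = {12: 0.2, 13: 0.3, 25: 0.9, 26: 0.5}  # made-up R_f values at neighbour bins
    print(iacf3_select([40, 80], rf.get, 1024))
```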

IACF 1, IACF 2, and IACF 3 show almost no difference in pitch-estimation performance, so IACF 3, which requires the least computation, can be regarded as the best of the three.

The specific configuration of systems that estimate pitch using IACF 1, IACF 2, and IACF 3 is described below with reference to FIGS. 4 and 5.

FIG. 4 is a block diagram of a pitch estimation system using IACF 1 according to an embodiment of the present invention.

As shown in FIG. 4, the pitch estimation system 40 using IACF 1 according to an embodiment of the present invention includes a first autocorrelator 410, a second autocorrelator 430, a first interpolation processor 420, a second interpolation processor 440, and a pitch estimator 450.

The first autocorrelator 410 autocorrelates, in the time domain, the voiced sound signal below a predetermined reference frequency and outputs the time-domain autocorrelation value.

Here, the reference frequency may be a frequency, determined by the sampling frequency of the time-domain autocorrelation and the FFT size of the frequency-domain autocorrelation, at which the doubling and halving errors can be further reduced.

The second autocorrelator 430 autocorrelates, in the frequency domain, the voiced sound signal at or above the reference frequency and outputs the frequency-domain autocorrelation value.

The first interpolation processor 420 interpolates the time-domain autocorrelation value, for example by spline interpolation.

The second interpolation processor 440 interpolates the frequency-domain autocorrelation value, for example by spline interpolation.

The pitch estimator 450 normalizes the interpolated time-domain and frequency-domain autocorrelation values, for example to a maximum of 1, detects the pitch in each, applies weights to the detected pitches, and determines the time index of the maximum peak of the result as the final pitch.

Here, the pitch estimator 450 uses Equation 3 above but determines its frequency-domain term by interpolation rather than by rounding.

A pitch estimation system using IACF 2 and IACF 3 is described below with reference to FIG. 5, a block diagram of a pitch estimation system using IACF 2 and IACF 3 according to an embodiment of the present invention.

As shown in FIG. 5, the pitch estimation system 50 using IACF 2 and IACF 3 according to an embodiment of the present invention includes a first autocorrelator 510, a second autocorrelator 520, an interpolation processor 530, and a pitch estimator 540.

The first autocorrelator 510 autocorrelates the voiced sound signal in the time domain to calculate a plurality of candidate pitches.

The candidate pitches may be the time indices of those time-domain peaks that are at or above a preset threshold. The threshold may be set in consideration of the overall magnitude of the peaks of the voiced sound signal.

The second autocorrelator 520 performs autocorrelation in the frequency domain only at the indices adjacent to the candidate pitches.

Because the candidate pitches, once converted to the frequency domain, do not fall exactly on frequency indices, the second autocorrelator 520 performs autocorrelation at the adjacent indices to the left and right of the reciprocal position of each candidate pitch.

Here, the adjacent indices of a candidate pitch may be the indices immediately to its left and right.

The interpolation processor 530 interpolates the autocorrelation values at the adjacent indices.

The interpolation processor 530 may use spline interpolation or the linear interpolation of Equation 5.

The pitch estimator 540 determines, as the final pitch, the time index corresponding to the maximum peak among the interpolated values at the adjacent indices.

In this way, when the time-domain and frequency-domain autocorrelation values are combined, the present invention uses interpolation to mitigate the problem that the low-frequency or high-frequency region is represented indistinctly because of the resolution mismatch between the two domains, and reduces pitch-doubling and pitch-halving errors.

Although the configuration of the present invention has been described in detail with reference to the accompanying drawings, this is merely illustrative, and those of ordinary skill in the art to which the present invention pertains can make various modifications and changes within the scope of its technical spirit. The scope of protection of the present invention should therefore not be limited to the embodiments described above but should be defined by the following claims.

Claims

A first autocorrelator that autocorrelates, in the time domain, a voiced sound signal below a predetermined reference frequency and outputs a time-domain autocorrelation value;
A second autocorrelator that autocorrelates, in the frequency domain, the voiced sound signal at or above the reference frequency and outputs a frequency-domain autocorrelation value;
A first interpolation processor that interpolates the time-domain autocorrelation value;
A second interpolation processor that interpolates the frequency-domain autocorrelation value; and
A pitch estimator that estimates a final pitch by applying weights to the interpolated time-domain and frequency-domain autocorrelation values,
Wherein the reference frequency is determined according to a sampling frequency of the first autocorrelator and an FFT size of the second autocorrelator.

The pitch estimation system of claim 1, wherein the first and second interpolation processors
use an interpolation method including spline interpolation.

The pitch estimation system of claim 1, wherein the pitch estimator
combines the time-domain autocorrelation values and the frequency-domain autocorrelation values by applying the weights, and estimates the index corresponding to the maximum peak of the combined autocorrelation values as the final pitch.

The pitch estimation system of claim 3, wherein the pitch estimator
normalizes the time-domain autocorrelation values and the frequency-domain autocorrelation values before applying the weights.

A first autocorrelator that autocorrelates a voiced sound signal in the time domain to calculate a plurality of candidate pitches;
A second autocorrelator that, in the frequency domain, performs autocorrelation only at indices adjacent to the plurality of candidate pitches;
An interpolation processor that interpolates a calculation result of the second autocorrelator in the frequency domain; and
A pitch estimator that determines, as the final pitch, the reciprocal of the frequency index corresponding to the maximum peak value in the interpolated calculation result,
A pitch estimation system comprising the foregoing elements.

The pitch estimation system of claim 5, wherein the interpolation processor
performs the interpolation using spline interpolation or linear interpolation.

The pitch estimation system of claim 5, wherein the first autocorrelator
calculates, as the candidate pitches, the indices of autocorrelation peaks that are at or above a predetermined threshold.

The pitch estimation system of claim 5, wherein the adjacent indices
are the frequency indices to the left and right of the frequency corresponding to the reciprocal of the candidate pitch.