KR19990003629A

KR19990003629A - Apparatus and method for detecting pitch interval using second-order linear prediction residual and nonlinear filter

Info

Publication number: KR19990003629A
Application number: KR1019970027527A
Authority: KR
Inventors: 오광철; 김경선
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1997-06-26
Filing date: 1997-06-26
Publication date: 1999-01-15

Abstract

The present invention relates to an apparatus for detecting the pitch information of voiced speech, which is important information for speech signal processing. In particular, to detect pitch information quickly, second-order linear prediction coefficients are computed. The corresponding inverse filter is used to obtain a residual signal from which the main frequency components of the original signal have been removed. A nonlinear system is then used to emphasize the pitch component.

The apparatus comprises:
- a buffer (10) that buffers the input signal in given units;
- an autocorrelation coefficient unit (20) that computes the 0th- to 2nd-order autocorrelation coefficients from the data buffered in the buffer (10);
- a linear prediction coefficient unit (30) that computes second-order linear prediction coefficients from the autocorrelation coefficients obtained by the autocorrelation coefficient unit (20);
- an inverse filter (40) that inverse-filters with the linear prediction coefficients obtained by the linear prediction coefficient unit (30) to obtain the residual signal of the original input signal;
- a nonlinear filter (50) that nonlinearly filters the residual signal obtained by the inverse filter (40);
- a peak detector (60) that detects the peak values of the signal filtered by the nonlinear filter (50) and their positions; and
- a pitch period determination unit (70) that compares the peak values obtained by the peak detector (60) with a threshold to determine the pitch period.

Description

Apparatus and method for detecting pitch interval using second-order linear prediction residual and nonlinear filter

The present invention relates to an apparatus for detecting the pitch information of voiced speech, which is important information for speech signal processing. More particularly, it relates to an apparatus and method for detecting the pitch interval using a second-order linear prediction residual and a nonlinear filter. To detect pitch information quickly, second-order linear prediction coefficients are computed. The corresponding inverse filter yields a residual signal from which the main frequency components of the original signal have been removed. A nonlinear system then emphasizes the pitch component.

The pitch information of voiced speech is important information for speech signal processing.

Accordingly, many methods for finding the pitch of voiced speech have been studied.

These include methods based on the linear prediction residual, methods based on autocorrelation coefficients, and methods based on cepstral coefficients.

In the linear prediction residual method, speech is first modeled by a linear filter that has only poles (all-pole modeling). The model parameters are then estimated from the input data.

This linear model captures the fact that speech, and voiced speech in particular, has several distinct resonances. These resonances (the formants of voiced speech) are modeled as poles.

Removing these poles therefore removes the formant structure of the speech and flattens its spectrum.

To remove the poles, the inverse of the linear filter obtained above is applied to the input speech signal, yielding the residual.

In the resulting residual signal, only the pulse train of the pitch component remains in voiced segments. In unvoiced segments, only white noise remains.

The positions of the peaks corresponding to the pitch component are therefore located in this residual signal. The distance between peak positions is taken as the pitch period.

In general, as shown in FIG. 1, a pitch detection apparatus using a linear prediction filter comprises:
- a buffer (1) that stores the input signal;
- an autocorrelation coefficient unit (2) that computes the 0th- to 20th-order autocorrelation coefficients from the data stored in the buffer (1);
- a linear prediction coefficient unit (3) that computes 20th-order linear prediction coefficients from the autocorrelation coefficients obtained by the autocorrelation coefficient unit (2);
- an inverse filter (4) that obtains the residual signal of the original input signal from the linear prediction coefficients obtained by the linear prediction coefficient unit (3);
- a peak detector (5) that finds the peak values and their positions in the residual signal obtained by the inverse filter (4); and
- an interval calculator (6) that compares the peak values obtained by the peak detector (5) with a threshold to determine the pitch period.

The conventional pitch detection apparatus configured as above operates as follows.

First, the linear prediction coefficients are usually obtained by the autocorrelation method, because of the stability of the resulting prediction filter.

Accordingly, the input signal is first stored in the buffer (1) one block at a time (usually 20 to 30 msec). The autocorrelation coefficient unit (2) then computes the 0th- to 20th-order autocorrelation coefficients from the data in the buffer (1).

The linear prediction coefficient unit (3) computes the 20th-order linear prediction coefficients from these autocorrelation coefficients by the autocorrelation method.

The inverse filter (4) is built from the linear prediction coefficients obtained by the linear prediction coefficient unit (3). It is applied to the original input signal to obtain the residual signal.

The peak detector (5) finds the peak values and their positions in the residual signal from the inverse filter (4), using only the signal magnitude.

If a peak value found by the peak detector (5) exceeds a fixed threshold, it is regarded as corresponding to the pitch of voiced speech. The interval calculator (6) then takes the distance between the peak positions as the pitch period.

If instead the peak value is below the threshold, the segment is treated as unvoiced or silent, i.e., a segment with no pitch.
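The conventional decision rule can be sketched as follows. This is a minimal illustration, not the apparatus itself. Peaks whose magnitude exceeds a fixed threshold are treated as pitch pulses, and their spacing gives the period. A frame with fewer than two such peaks is declared unvoiced or silent. The function name and the threshold value are hypothetical.

```python
def conventional_pitch(peaks, threshold):
    """Hypothetical helper for the conventional threshold rule.

    peaks: list of (position, value) pairs from the peak detector.
    Returns the distances between successive above-threshold peaks,
    or None for an unvoiced or silent frame.
    """
    # Only the magnitude of each peak is compared with the threshold.
    voiced = [pos for pos, val in peaks if abs(val) > threshold]
    if len(voiced) < 2:
        return None  # no pitch: unvoiced or silence
    return [b - a for a, b in zip(voiced, voiced[1:])]


# The small peak at 40 is rejected; the rest are 80 samples apart.
print(conventional_pitch([(12, 0.9), (40, 0.1), (92, 1.2), (172, 0.8)], 0.5))
```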

Finding the pitch this way requires the pitch component to be larger than the other components, so that it can be easily distinguished. At the onset and end of voiced speech and in transition segments, however, the pitch component is small and the peaks are hard to find.

The computational load is also large. This places too heavy a burden on the processor when only the pitch information is needed.

For example, consider computing the pitch of an 8 kHz sampled signal in 30 msec (240-sample) units. The required multiplications are as follows.

- Computing the 0th- to 20th-order autocorrelation coefficients takes about 21 x 240 = 5,040 multiplications.
- Computing the 20th-order linear prediction coefficients takes about 20 x log(20) ≒ 60 multiplications.
- Performing the inverse filtering takes 240 x 20 = 4,800 multiplications.

In total, about 9,900 multiplications are required, consuming about 1 MIPS (million instructions per second) of processor resources.
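The per-frame multiplication count above can be reproduced with a few lines of arithmetic. The text writes log(20) without a base. A natural logarithm is assumed here, because 20 x ln(20) ≈ 60 matches the quoted figure.

```python
import math

FRAME = 240   # 30 ms at 8 kHz
ORDER = 20    # conventional LPC order

autocorr = (ORDER + 1) * FRAME      # lags 0..20, one multiply per sample each
lpc = ORDER * math.log(ORDER)       # the text's estimate, natural log assumed
inverse_filter = FRAME * ORDER      # 20 taps per output sample
total = autocorr + lpc + inverse_filter

print(autocorr, round(lpc), inverse_filter, round(total))  # 5040 60 4800 9900
```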

The autocorrelation method, on the other hand, exploits the fact that the autocorrelation of the input signal peaks at every pitch period. It requires even more computation than the linear prediction residual method.

For example, assume 8 kHz sampling and a pitch range of 70 Hz to 300 Hz. The pitch period then spans 8000/300 ≈ 26.7 to 8000/70 ≈ 114.3 samples, so the autocorrelation coefficients of orders 26 to 115 must be computed.

Computing these autocorrelation coefficients alone therefore takes 90 x 240 = 21,600 multiplications, which requires about 2.1 MIPS.
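A minimal sketch of the autocorrelation method is shown below. It assumes an 8 kHz sample rate and the 70 to 300 Hz search range from the text. The lag range follows from dividing the sample rate by the frequency limits. The pitch lag is the lag with the largest autocorrelation inside that range. The function names are illustrative, and no normalization or voicing decision is included.

```python
import math


def lag_range(sample_rate=8000, f_min=70.0, f_max=300.0):
    """Lags (in samples) covering pitches from f_max down to f_min."""
    return int(sample_rate // f_max), math.ceil(sample_rate / f_min)


def autocorr_pitch_lag(frame, lo, hi):
    """Return the lag in [lo, hi] with the largest autocorrelation."""
    best_lag, best_r = lo, float("-inf")
    for k in range(lo, hi + 1):
        # One multiply per overlapping sample pair at this lag.
        r = sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        if r > best_r:
            best_lag, best_r = k, r
    return best_lag


lo, hi = lag_range()
print(lo, hi, hi - lo + 1)  # 26 115 90
```

For a 240-sample frame containing a pulse every 50 samples, `autocorr_pitch_lag(frame, 26, 115)` returns 50.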

The cepstral method exploits the fact that the cepstrum separates two components of the signal-generation model: the source component and the filter component (for speech, the vocal tract model).

That is, the components of the vocal tract filter are concentrated in the low-order cepstral coefficients. The source components appear in the high-order coefficients.

The pitch is therefore found from the peak at the higher orders (order 26 and above, when pitches only up to 300 Hz are considered).

With a 256-point fast Fourier transform (FFT), the forward FFT takes 256 x log2(256) = 2,048 multiplications. The inverse FFT takes another 2,048.

Computing the spectral magnitude additionally takes 128 x 2 multiplications and 128 logarithm operations.

Computing the cepstrum thus takes about 4,350 multiplications and 128 logarithm operations.

A logarithm typically costs tens of times as much as a multiplication. The cepstral method therefore also consumes about 1 MIPS.
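Tallying the cepstral costs the same way gives 4,352 multiplications, which the text rounds to 4,350. One interpretation is assumed here: the 128 x 2 magnitude multiplications square the real and imaginary parts of the 128 positive-frequency bins.

```python
import math

N_FFT = 256
fft = N_FFT * int(math.log2(N_FFT))  # forward FFT, per the text's estimate
ifft = fft                           # inverse FFT, same cost
magnitude = (N_FFT // 2) * 2         # real^2 and imag^2 for 128 bins (assumed)
logs = N_FFT // 2                    # one logarithm per magnitude bin

print(fft, ifft, magnitude, fft + ifft + magnitude, logs)  # 2048 2048 256 4352 128
```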

The present invention was devised to solve these problems of the prior art. Its object is to provide an apparatus and method for detecting the pitch interval using a second-order linear prediction residual and a nonlinear filter. To detect pitch information quickly, second-order linear prediction coefficients are computed. The corresponding inverse filter yields a residual signal from which the main frequency components of the original signal have been removed. A nonlinear system then emphasizes the pitch component.

FIG. 1 is a block diagram of a conventional pitch detection apparatus using the linear prediction residual;

FIG. 2 is a block diagram of the pitch detection apparatus using a second-order linear prediction residual and a nonlinear filter according to the present invention;

FIG. 3 illustrates how the autocorrelation coefficients are computed from the buffer data;

FIG. 4 illustrates the speech production model;

FIG. 5 shows an example waveform of the onset of a voiced segment of a speech signal;

FIG. 6 shows an example waveform of the 20th-order linear prediction residual of the speech signal in FIG. 5;

FIG. 7 shows an example waveform of the second-order linear prediction residual of the speech signal in FIG. 5; and

FIG. 8 shows an example waveform of the nonlinear filter output for the second-order linear prediction residual.

* Explanation of reference numerals for the main parts of the drawings

10: buffer    20: autocorrelation coefficient unit

30: linear prediction coefficient unit    40: inverse filter

50: nonlinear filter    60: peak detector

70: pitch period determination unit    100: vocal tract filter

200: pulse train    300: white noise

To achieve the above object, the present invention comprises:
- a buffer (10) that buffers the input signal in given units;
- an autocorrelation coefficient unit (20) that computes the 0th- to 2nd-order autocorrelation coefficients from the data buffered in the buffer (10);
- a linear prediction coefficient unit (30) that computes second-order linear prediction coefficients from the autocorrelation coefficients obtained by the autocorrelation coefficient unit (20);
- an inverse filter (40) that inverse-filters with the linear prediction coefficients obtained by the linear prediction coefficient unit (30) to obtain the residual signal of the original input signal;
- a nonlinear filter (50) that nonlinearly filters the residual signal obtained by the inverse filter (40);
- a peak detector (60) that detects the peak values of the signal filtered by the nonlinear filter (50) and their positions; and
- a pitch period determination unit (70) that compares the peak values obtained by the peak detector (60) with a threshold to determine the pitch period.

To achieve the above object, the present invention also provides a method comprising the steps of:
1. buffering the input signal until one analysis interval of data has been collected;
2. computing the autocorrelation coefficients by multiplying the buffered data, shifted by 0 to 2 samples, with the original samples and accumulating the products;
3. computing the linear prediction coefficients {a1, a2} from the autocorrelation coefficients {r0, r1, r2} obtained in the preceding step, by the autocorrelation method;
4. constructing an inverse filter from the linear prediction coefficients {a1, a2} and passing the signal originally input to the buffer through it to obtain the residual signal;
5. filtering the residual signal again with a nonlinear filter;
6. finding the peak values and their positions in the output of the nonlinear filter; and
7. determining the pitch from the peak values and peak positions, either by averaging the individual pitch intervals or by taking their median.

To extract this pitch information quickly, the present invention first computes second-order linear prediction coefficients. The corresponding inverse filter is then used to obtain a residual signal from which the main frequency components of the original signal have been removed.

The residual signal contains the pitch component but also other formant information. A nonlinear system is therefore used to emphasize the pitch component.

First, the input signal is buffered in the buffer (10) until one analysis interval of data has been collected.

The analysis interval is usually set long enough to contain two or three pitch pulses, about 20 to 30 msec. When the input is sampled at 8 kHz, this is 160 to 240 samples.
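The buffering step can be sketched as splitting the signal into fixed-length analysis frames. Two choices here are assumptions, because the text does not specify them: the frames do not overlap, and a trailing partial frame is discarded, mirroring a buffer that waits for a full analysis interval. The helper name is hypothetical.

```python
def analysis_frames(samples, sample_rate=8000, frame_ms=30):
    """Hypothetical helper: split a signal into non-overlapping frames.

    A trailing partial frame is dropped, as if the buffer were still
    waiting for a full analysis interval.
    """
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


print(8000 * 20 // 1000, 8000 * 30 // 1000)  # 160 240
```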

As shown in FIG. 3, the autocorrelation coefficient unit (20) multiplies the data in the buffer (10), shifted by 0 to 2 samples, with the original samples and accumulates the products to obtain the autocorrelation coefficients.
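The shift, multiply, and accumulate operation of FIG. 3 can be sketched as below. No analysis window is applied; this is an assumption, since the text does not mention one. For a 240-sample frame, the three lags cost at most 3 x 240 multiplications.

```python
def autocorrelation(frame, max_lag=2):
    """r[k] = sum over n of x(n) * x(n - k), for k = 0..max_lag (no window)."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]


print(autocorrelation([1.0, 2.0, 3.0, 4.0]))  # [30.0, 20.0, 11.0]
```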

From the resulting autocorrelation coefficients {r0, r1, r2}, the linear prediction coefficient unit (30) computes the linear prediction coefficients {a1, a2} by the autocorrelation method.
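For order 2, the autocorrelation-method normal equations can be solved in closed form, so no recursion is needed. The sketch assumes the convention in which the inverse filter is y(n) = x(n) + a1 x(n-1) + a2 x(n-2), with a0 = 1. In that convention the a_i are the negated predictor weights. Returning a pass-through filter for a degenerate (for example, silent) frame is an illustrative choice.

```python
def lpc2(r0, r1, r2):
    """Second-order LPC coefficients (a1, a2) from r0, r1, r2.

    Solves [r0 r1; r1 r0] [p1; p2] = [r1; r2] for the predictor weights
    p1, p2 by Cramer's rule and returns a1 = -p1, a2 = -p2 (a0 = 1 convention).
    """
    det = r0 * r0 - r1 * r1
    if det == 0:
        return 0.0, 0.0  # degenerate frame: inverse filter passes x(n) through
    p1 = r1 * (r0 - r2) / det
    p2 = (r0 * r2 - r1 * r1) / det
    return -p1, -p2


print(lpc2(30.0, 20.0, 11.0))  # approximately (-0.76, 0.14)
```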

The inverse filter (40) first constructs the inverse filter from the linear prediction coefficients {a1, a2} obtained by the linear prediction coefficient unit (30), as follows.

[Equation 1]

$$y(n) = \sum_{i=0}^{2} a_i\, x(n-i) = x(n) + a_1 x(n-1) + a_2 x(n-2)$$

Here, a0 is 1, x(n) is the speech signal input to the inverse filter, and y(n) is the residual signal output by the inverse filter.

Once the inverse filter has been constructed, the signal originally input to the buffer is passed through it to obtain the residual signal.
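A direct sketch of the second-order inverse filter follows, with a0 = 1 as stated. Treating samples before the start of the buffer as zero is an assumption; the text does not say how the buffer edge is handled.

```python
def inverse_filter(x, a1, a2):
    """Residual y(n) = x(n) + a1 x(n-1) + a2 x(n-2), with a0 = 1.

    Samples before the start of the buffer are taken as zero (assumption).
    """
    y = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y.append(x[n] + a1 * x1 + a2 * x2)  # two multiplies per sample
    return y
```

As a check, consider a signal generated by the matching all-pole recursion x(n) = e(n) + 1.2 x(n-1) - 0.5 x(n-2). Applying `inverse_filter(x, -1.2, 0.5)` recovers its excitation e(n).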

The residual signal is then filtered again by the nonlinear filter (50).

The nonlinear filter (50) has the following structure.

[Equation 2]

Here, z(n) is the output of the nonlinear filter, and y(n) is its input, i.e., the residual signal.
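Equation 2 itself is not reproduced in this text. As a stand-in, the sketch below uses the Teager-Kaiser energy operator, z(n) = y(n)^2 - y(n-1) y(n+1). This choice is an assumption and should not be read as the patent's Equation 2. It was chosen because the operator is a well-known nonlinear filter that emphasizes high-energy samples. It also costs two multiplications per sample, in line with the 2 x 240 count given for the nonlinear filter.

```python
def teager_energy(y):
    """z(n) = y(n)^2 - y(n-1) * y(n+1); edge samples use zero neighbours.

    Stand-in for Equation 2 (assumption, not the patent's formula).
    """
    z = []
    for n in range(len(y)):
        prev = y[n - 1] if n >= 1 else 0.0
        nxt = y[n + 1] if n + 1 < len(y) else 0.0
        z.append(y[n] * y[n] - prev * nxt)  # two multiplies per sample
    return z
```

Applied to a small pulse on a low-level background, such as [0.1, -0.1, 1.0, 0.1, -0.1], the output peaks at the pulse sample (about 1.01). The neighbouring samples stay near zero.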

The peak detector (60) finds the peak values and their positions in the output of the nonlinear filter (50).
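A simple peak picker is sketched below. The threshold and the minimum peak spacing are hypothetical parameters that the text does not give. For instance, a minimum spacing of 26 samples would correspond to the 300 Hz upper pitch limit at 8 kHz.

```python
def find_peaks(z, threshold, min_distance):
    """Local maxima of z at or above `threshold`, at least `min_distance`
    samples apart (hypothetical parameters). Returns (position, value) pairs.
    """
    peaks = []
    for n in range(1, len(z) - 1):
        if z[n] >= threshold and z[n] >= z[n - 1] and z[n] > z[n + 1]:
            if peaks and n - peaks[-1][0] < min_distance:
                # Too close to the previous peak: keep the larger of the two.
                if z[n] > peaks[-1][1]:
                    peaks[-1] = (n, z[n])
            else:
                peaks.append((n, z[n]))
    return peaks
```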

The pitch period determination unit (70) then determines the pitch from these peak values and peak positions.

Because one buffer contains two or three pitch pulses, the pitch is determined by averaging the individual pitch intervals or by taking their median.
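The two options, mean or median of the pitch intervals, can be sketched with Python's `statistics` module. The helper name and the example positions are illustrative. The example shows why the median can be the more robust choice when one spurious peak is detected.

```python
import statistics


def pitch_period(peak_positions, use_median=True):
    """Pitch period in samples from the intervals between successive peaks.

    Returns None when fewer than two peaks were found (no pitch).
    """
    if len(peak_positions) < 2:
        return None
    intervals = [b - a for a, b in zip(peak_positions, peak_positions[1:])]
    if use_median:
        return statistics.median(intervals)
    return statistics.mean(intervals)


# A spurious peak at 200 drags the mean to 45 samples; the median stays at 50.
positions = [20, 70, 120, 170, 200]
print(pitch_period(positions, use_median=False), pitch_period(positions))
```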

The computational cost of the pitch detection apparatus operating as above is as follows.

- Computing the autocorrelation coefficients takes 3 x 240 = 720 multiplications.
- Computing the linear prediction coefficients takes a few multiplications.
- Inverse filtering takes 2 x 240 = 480 multiplications.
- Nonlinear filtering takes 2 x 240 = 480 multiplications.

This totals about 1,700 multiplications, so only about 0.2 MIPS of processor resources is required.
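Tallying these counts shows where the claimed speed-up of more than five times comes from. Excluding the few multiplications for the coefficients themselves, the new pipeline needs 1,680 multiplications per 240-sample frame, versus about 9,900 for the conventional 20th-order method.

```python
FRAME = 240
autocorr = 3 * FRAME          # lags 0, 1, 2
inverse_filter = 2 * FRAME    # a1 and a2 taps (a0 = 1 needs no multiply)
nonlinear = 2 * FRAME         # two multiplies per output sample, per the text
proposed = autocorr + inverse_filter + nonlinear  # plus a few for a1, a2

conventional = 9_900          # 20th-order LPC residual method, from the text
print(proposed, round(conventional / proposed, 1))  # 1680 5.9
```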

The operating principle of the pitch interval detection apparatus using a second-order linear prediction residual and a nonlinear filter according to the present invention is described in detail below.

In general, speech production is described by the source-filter model shown in FIG. 4. In this model, the uttered speech signal is regarded as the excitation (source) signal from the glottis passed through the vocal tract filter (100).

The excitation signal is a pulse train (200) for voiced speech and white noise (300) for unvoiced speech.

The vocal tract filter (100) represents the characteristics of the phoneme. It generally consists of several resonant frequencies, the formants.
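The source-filter model of FIG. 4 can be imitated with a toy synthesizer, which is also handy as test input for the pitch detector. The excitation is a unit pulse train (voiced) or Gaussian white noise (unvoiced), passed through an all-pole filter. The filter here has a single, stable resonance standing in for one formant. Its coefficients, the pulse period, and the function name are illustrative assumptions, not values from the text.

```python
import random


def synthesize(n_samples, voiced, period=80, a=(-1.3, 0.8), seed=0):
    """Toy source-filter model: excitation -> all-pole vocal tract filter.

    Voiced: unit pulse every `period` samples; unvoiced: Gaussian white noise.
    Filter: s(n) = e(n) - a1 s(n-1) - a2 s(n-2), i.e. the inverse of
    A(z) = 1 + a1 z^-1 + a2 z^-2 (poles inside the unit circle, so stable).
    """
    rng = random.Random(seed)
    if voiced:
        e = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    else:
        e = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    s = []
    for n in range(n_samples):
        s1 = s[n - 1] if n >= 1 else 0.0
        s2 = s[n - 2] if n >= 2 else 0.0
        s.append(e[n] - a[0] * s1 - a[1] * s2)
    return e, s
```

Inverse-filtering the output with the same (a1, a2) returns the excitation exactly. This is the idealized situation that the residual-based pitch detection methods rely on.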

For this reason, speech signals are often analyzed by fitting a linear prediction model.

This is because the linear prediction model is usually an all-pole model, and the frequencies of its poles are close to the formants of the vocal tract filter (100).

In addition, the residual signal of the linear prediction model can be regarded as the excitation signal of speech production.

Because the general speech production model and the linear prediction model used in speech analysis correspond to each other in this way, linear prediction is widely used in speech-related fields. These include speech coding, speech analysis, and speech recognition.

The all-pole linear prediction model predicts the current sample x(n) from the past N samples {x(n-1), ..., x(n-N)}. The predicted current sample x'(n) is a linear combination of these N past samples, as follows.

[Equation 3]

The prediction coefficients {ai, i = 1, ..., N}, which weight the linear combination, are chosen so that the error x(n) - x'(n) between the current sample and its prediction is small.

There are several ways to obtain the prediction coefficients. The autocorrelation method is widely used, in part because of the stability of the resulting filter.

Obtaining the prediction coefficients by the autocorrelation method requires at least N+1 autocorrelation coefficients.
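The autocorrelation method is commonly solved with the Levinson-Durbin recursion, which turns the N+1 values r0, ..., rN into N coefficients. The text does not name a solver, so Levinson-Durbin is an assumption here. So is the sign convention a0 = 1, with the inverse filter y(n) = Σ a_i x(n-i). For N = 2 the recursion gives the same coefficients as solving the 2 x 2 normal equations directly.

```python
def levinson_durbin(r, order):
    """Levinson-Durbin recursion for the autocorrelation method.

    Needs r[0..order] (order + 1 autocorrelation values, r[0] > 0).
    Returns (a, err) with a[0] = 1, so the inverse filter is
    y(n) = sum_i a[i] x(n - i); err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


print(levinson_durbin([30.0, 20.0, 11.0], 2))  # a is about [1.0, -0.76, 0.14]
```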

Building an inverse filter from the linear prediction coefficients and applying it to the original speech signal yields the residual signal. The residual resembles the excitation signal of the speech production model: a pulse train for voiced speech and white noise for unvoiced speech.

Methods have therefore been proposed that detect the pitch of voiced speech from these properties of the residual signal.

That is, residual-based pitch detection first removes the formant components from the speech signal to obtain the excitation signal. It then detects the period of the pulses (the pitch) in that excitation signal.

The inverse filter is constructed from the linear prediction coefficients {ai, i = 1, ..., N} as in Equation 1 above.

Pitch detection using the residual of a linear prediction model generally consumes a great deal of system resources, because linear prediction coefficients of about order 20 must be computed.

The present invention therefore computes only second-order linear prediction coefficients. Its inverse filter removes only the main formant component, which could be confused with the pitch component, and the pitch period is detected from the resulting residual.

To reinforce the pitch component in the residual signal, a nonlinear filter such as Equation 2 above is then applied to it.

This nonlinear filter outputs large values where the input signal has high energy and small values where it has low energy. It therefore emphasizes the high-energy pulses in the residual signal and attenuates the other, lower-energy components.

This makes it easier to find the peaks when detecting the pitch of voiced speech.

An embodiment of the present invention is described below with reference to FIGS. 5 to 8.

In pitch detection on speech signals, most errors occur at the onset and end of speech and in transition segments.

FIG. 5 shows the onset of speech. The peaks corresponding to the pitch grow steadily, and their shape also changes considerably.

To illustrate the performance of the conventional linear prediction residual method, FIG. 6 shows the residual of the inverse filter built from 20th-order linear prediction coefficients. The peak positions are clear, but the signal level still closely follows that of the original speech signal.

FIG. 7 shows the residual of the inverse filter built from the second-order linear prediction coefficients used in the present invention. In overall shape and performance, it differs little from the result obtained with 20th-order coefficients.

Removing only the main formant component yields similar performance. It is therefore effective to compute low-order linear prediction coefficients and so reduce the consumption of system resources.

FIG. 8 shows the output of the nonlinear filter applied to the residual of the second-order inverse filter. The pitch pulse components stand out clearly.

This more than compensates for the performance loss incurred by using second-order linear prediction coefficients.

As described in detail above, the present invention detects the pitch of voiced speech while consuming few system resources. It does so quickly: more than five times faster than the conventional method based on the linear prediction residual. It is therefore suited to applications that can make use of the detected speech information.

For example, it will be useful in voice detection, speech analysis, speech recognition, speaker recognition, and prosody extraction, where efficient pitch detection matters more than precise pitch information.

In voice detection, the presence or absence of pitch indicates whether the input signal is voiced. This makes it possible to determine whether a high-energy signal is due to speech or to other dynamic noise.

In speech analysis, the analysis interval can be varied according to the pitch, which allows more accurate analysis.

In speech recognition, pitch information can be used when computing feature vectors. Prosody information can also be used in tasks such as continuous speech recognition.

Finally, pitch is the component of the speech signal that carries the most information about the speaker. A fast pitch detection method therefore adds valuable information for speaker recognition.

Claims

A pitch interval detection apparatus using a second-order linear prediction residual and a nonlinear filter, comprising: a buffer 10 for buffering an input signal in given units; an autocorrelation coefficient unit 20 for obtaining the 0th- to 2nd-order autocorrelation coefficients from the data buffered in the buffer 10; a linear prediction coefficient unit 30 for obtaining second-order linear prediction coefficients from the autocorrelation coefficients obtained by the autocorrelation coefficient unit 20; an inverse filter 40 for inverse-filtering with the linear prediction coefficients obtained by the linear prediction coefficient unit 30 to obtain a residual signal of the original input signal; a nonlinear filter 50 for nonlinearly filtering the residual signal obtained by the inverse filter 40; a peak detector 60 for detecting the peak values of the signal filtered by the nonlinear filter 50 and their positions; and a pitch period determination unit 70 for determining a pitch period by comparing the peak values obtained by the peak detector 60 with a threshold.

A pitch interval detection method using a second-order linear prediction residual and a nonlinear filter, comprising the steps of: buffering an input signal until one analysis interval of data has been collected; obtaining autocorrelation coefficients by multiplying the buffered data, shifted by 0 to 2 samples, with the original samples and accumulating the products; obtaining linear prediction coefficients {a1, a2} from the autocorrelation coefficients {r0, r1, r2} obtained in the preceding step, by the autocorrelation method; constructing an inverse filter from the linear prediction coefficients {a1, a2} obtained in the preceding step and passing the signal originally input to the buffer through the filter to obtain a residual signal; filtering the residual signal obtained in the preceding step again with a nonlinear filter; obtaining peak values and their positions from the output of the nonlinear filter; and determining the pitch from the peak values and peak positions obtained in the preceding step, by averaging the individual pitch intervals or by taking their median.

The method of claim 2, wherein the step of obtaining the residual signal uses the inverse filter of Equation 1, where a0 is 1, x(n) is the speech signal input to the inverse filter, and y(n) is the residual signal output by the inverse filter.

The method of claim 2, wherein the filtering by the nonlinear filter uses the nonlinear filter of Equation 2, where z(n) is the output of the nonlinear filter and y(n) is its input, i.e., the residual signal.