KR100337293B1 - Method of harmonic estimation in voice coder

Info

Publication number: KR100337293B1
Application number: KR1020000049345A
Authority: KR
Inventors: 최용수
Original assignee: 서평원; 엘지정보통신주식회사
Priority date: 2000-08-24
Filing date: 2000-08-24
Publication date: 2002-05-17
Also published as: KR20020016175A

Abstract

The present invention relates to a speech coder and, more particularly, to efficient harmonic estimation in a speech coder: a delta adjustment method is used to estimate the spectral harmonics of a speech signal in a low-rate speech coder of 4 kbps or less, which greatly reduces the amount of computation.

Conventional real-valued-pitch-based harmonic estimation, which applies an MMSE calculation over the entire harmonic band, requires an accurate real-valued pitch search on the input signal spectrum. If the pitch search of the speech coder is inaccurate because of limits on the allocated bits or on computation, the error between the harmonic center frequencies of the input signal spectrum and those of the synthesized signal spectrum grows toward higher frequencies, and the performance of harmonic analysis degrades sharply. This problem arises because the same harmonic spacing is used over the entire frequency band even though the harmonic spacing of an actual speech spectrum varies slightly, and it becomes more serious when the pitch resolution is low. In addition, the performance of the real-valued-pitch-based method depends on the accuracy of the pitch of the input signal spectrum, and an accurate pitch search requires a large amount of computation.

The present invention is a harmonic estimation method that greatly reduces the computation of a harmonic coder; because it requires little computation, it is very efficient for real-time implementation on a DSP chip. The delta adjustment method repeats the minimization independently for each harmonic band, once per harmonic, so it requires little computation and is less sensitive to errors. It also exploits the property of speech that the pitch contour matters more to sound quality than the pitch resolution. Computation analysis, spectral distortion measurement, and preference evaluation show that it achieves good harmonic estimation performance with little computation.

Description

Method of harmonic estimation in a speech coder

The present invention relates to a speech coder and, more particularly, to harmonic estimation in a speech coder in which an efficient delta adjustment method is used to estimate the spectral harmonics of a speech signal in a low-rate speech coder of 4 kbps or less, greatly reducing the amount of computation.

In general, a speech coder takes a human voice through a microphone, converts the frequency distribution, intensity, and waveform of the speech data into codes, and transmits them; the receiving side synthesizes the speech. Speech coders are used in many fields, such as mobile communication terminals, switching systems, and video conferencing systems. Most low-rate speech coders used in multimedia communication such as VoIP (Voice over Internet Protocol) and in voice storage systems are CELP (Code-Excited Linear Prediction) coders. At rates of 4 to 13 kbps the CELP coder, a time-domain coder, is used; at rates of 4 kbps or less, frequency-domain coders are used. A harmonic coder represents the excitation signal by the harmonic components of the fundamental frequency. Compared with CELP, which represents the excitation signal in the form of white noise, a harmonic coder therefore produces less natural synthesized speech in unvoiced segments. In voiced segments, which make up most of a speech signal, it can nevertheless encode at a much lower bit rate than CELP. Many speech coders with rates of 4 kbps or less are harmonic coders.
A harmonic coder consists of a harmonic estimator and a harmonic synthesizer. The harmonic estimator has the greatest influence on the performance of the whole coder, so it must be designed with due regard to both performance and computation. Within the harmonic estimator, spectral harmonic estimation is the part that most affects computation and sound quality. Because estimating pitch, amplitude, phase, and so on requires a large amount of computation, a DSP (Digital Signal Processor) chip is used.

Pitch is searched in integer units in the time domain and in real-valued units in the frequency domain. Real-valued-pitch-based harmonic estimation relies on analysis-by-synthesis that minimizes the error energy between the input spectrum and the synthesized spectrum, so it requires considerable computation. On the other hand, in a harmonic coder, which unlike a CELP coder reproduces synthesized speech through interpolation, the pitch contour plays a more important role in sound quality than the pitch resolution. Harmonic estimation methods fall broadly into those based on the discrete Fourier transform (DFT) and those based on the fast Fourier transform (FFT). DFT-based harmonic estimation can estimate the magnitude and phase of the spectral harmonics simultaneously regardless of the pitch period, but when the pitch period is long the DFT requires a large amount of computation.
In FFT-based harmonic estimation, a waveform of two to three pitch periods is transformed by FFT so that the harmonics are visible in the spectrum. Relatively simple methods can then be used, such as peak picking, which extracts the spectral peaks, or sampling the spectrum at the frequencies corresponding to the harmonics of the fundamental frequency. Another method is MMSE (Minimum Mean Squared Error), which requires more computation than these but performs better.
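The two simple FFT-based approaches mentioned above can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the function names, the precomputed magnitude spectrum `mag`, the fundamental frequency expressed in FFT bins (`f0_bins`), and the band convention of half a harmonic spacing on each side of a harmonic are all assumptions made for the example.

```python
import math


def harmonic_sampling(mag, f0_bins, num_harmonics):
    """Sample the magnitude spectrum at the bin nearest each harmonic l * f0."""
    last = len(mag) - 1
    return [mag[min(last, round(l * f0_bins))]
            for l in range(1, num_harmonics + 1)]


def peak_picking(mag, f0_bins, num_harmonics):
    """Take the largest magnitude inside each band of width f0 centred on l * f0.

    Assumes every requested band lies inside the spectrum.
    """
    peaks = []
    for l in range(1, num_harmonics + 1):
        lo = math.ceil((l - 0.5) * f0_bins)        # first bin of band l
        hi = math.ceil((l + 0.5) * f0_bins) - 1    # last bin of band l
        peaks.append(max(mag[lo:hi + 1]))
    return peaks
```

When a spectral peak sits one bin away from the nominal harmonic position, sampling misses it while peak picking still finds it, which is the kind of mismatch that becomes more likely at higher harmonics when the pitch estimate is coarse.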

In the conventional real-valued-pitch-based harmonic estimation method for rates of 4 kbps or less, the speech data are processed in the frequency domain. The magnitude of each harmonic of the speech data is obtained by the MMSE (Minimum Mean Squared Error) method, which minimizes the error energy between two spectra: the input signal spectrum, obtained by applying a window to the input signal and taking an FFT, and the synthesized signal spectrum, obtained by applying the window spectrum to a real-valued pitch candidate. This error energy is given by Equation 1.

Here the harmonic bands are determined by the fundamental frequency. The FFT size is 256 for the input signal spectrum and 16,384 for the synthesized signal spectrum. For each harmonic, Equation 1 squares the difference between the absolute value of the input signal spectrum and the absolute value of the synthesized signal spectrum and accumulates it from the start point to the end point of that harmonic band. The error energy in Equation 1 is minimized by differentiating Equation 1 with respect to the harmonic magnitude and setting the result to zero.

Equation 2 expresses the synthesized signal spectrum as the window spectrum scaled by the harmonic magnitude; that is, the synthesized signal spectrum is obtained by applying the window spectrum to the real-valued pitch candidate.

Equation 3 defines the start point of each harmonic band, and Equation 4 defines the end point of each harmonic band. The synthesized signal spectrum is obtained by applying Equations 2, 3, and 4.
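As a rough illustration of how a per-band synthesized spectrum and its error energy could be computed, consider the Python sketch below. Several things here are assumptions rather than content of the source: the band convention (each band spans half a harmonic spacing on either side of the harmonic, as is common in multiband excitation coders), the Gaussian `window_mag` used as a stand-in for the window magnitude spectrum, and all function names.

```python
import math


def window_mag(offset_bins, width=1.5):
    """Hypothetical stand-in for the window magnitude spectrum: a Gaussian main lobe."""
    return math.exp(-0.5 * (offset_bins / width) ** 2)


def band_edges(l, f0_bins):
    """Assumed convention: band l covers [(l - 0.5) * f0, (l + 0.5) * f0) in bins."""
    a = math.ceil((l - 0.5) * f0_bins)
    b = math.ceil((l + 0.5) * f0_bins) - 1
    return a, b


def synth_band(amp, l, f0_bins):
    """Synthesized magnitudes in band l: the window spectrum scaled by the harmonic magnitude."""
    a, b = band_edges(l, f0_bins)
    return [amp * window_mag(k - l * f0_bins) for k in range(a, b + 1)]


def band_error(mag, amp, l, f0_bins):
    """Sum of squared magnitude differences between input and synthesis over band l."""
    a, b = band_edges(l, f0_bins)
    synth = synth_band(amp, l, f0_bins)
    return sum((mag[k] - s) ** 2 for k, s in zip(range(a, b + 1), synth))
```

With this convention consecutive bands tile the spectrum without overlap, so the total error over all bands is simply the sum of the per-band errors.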

To minimize the error energy, Equation 1 is differentiated with respect to the harmonic magnitude and set to zero, which yields Equation 5 for the harmonic magnitude.
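The minimization step can be written out as a short derivation. The notation is assumed for illustration and may differ from the original equations: $S(k)$ is the input signal spectrum, $W_l(k)$ is the window spectrum centred on the $l$-th harmonic of the pitch candidate, $A_l$ is the $l$-th harmonic magnitude, and $a_l$, $b_l$ are the start and end points of the $l$-th harmonic band.

```latex
\begin{align}
E_l(A_l) &= \sum_{k=a_l}^{b_l} \bigl( |S(k)| - A_l\,|W_l(k)| \bigr)^2 \\
\frac{\partial E_l}{\partial A_l}
  &= -2 \sum_{k=a_l}^{b_l} |W_l(k)| \bigl( |S(k)| - A_l\,|W_l(k)| \bigr) = 0 \\
A_l &= \frac{\sum_{k=a_l}^{b_l} |S(k)|\,|W_l(k)|}{\sum_{k=a_l}^{b_l} |W_l(k)|^2}
\end{align}
```

Under these assumptions the harmonic magnitude is a least-squares projection of the input magnitudes onto the window shape within the band, so its quality depends directly on how well the window is centred on the true harmonic.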

To make the harmonic magnitude given by Equation 5 reliable, a precise real-valued pitch search must first be carried out that minimizes the error energy between the input signal spectrum and the synthesized signal spectrum over the entire frequency band, as given by Equation 6. The number of real-valued pitch candidates to be searched is usually 10.

Fig. 1 is a block diagram of the real-valued-pitch-based harmonic estimation method, described in detail as follows. The inputs are the input signal spectrum, obtained by FFT, and the synthesized signal spectrum. The real-valued pitch refinement unit (10) obtains the error energy between the input signal spectrum and the synthesized signal spectrum: for each real-valued pitch candidate it computes the synthesized signal spectrum against the single input signal spectrum, searches for the candidate that minimizes the error energy summed over the bands, and selects the corresponding fundamental frequency. The harmonic magnitude estimation unit (11) applies the frequency that minimizes the error energy and selects the value at which the harmonic magnitude is maximal as the optimal harmonic.

The method based on real-valued pitch candidates operates in the following order. The input signal spectrum, obtained by applying a window to the input signal and taking an FFT, is used as the input to the harmonic estimator (steps S30 and S31). The synthesized signal spectrum is computed by applying the window spectrum to a real-valued pitch candidate (step S32). The error energy between the input signal spectrum and the synthesized signal spectrum over the entire frequency band is obtained from Equation 6, and steps S32 and S33 are repeated for every real-valued pitch candidate to find and select the candidate that minimizes the error energy (step S33). The fundamental frequency of the optimal candidate found in step S33 is applied to Equation 5 to obtain the maximum harmonic magnitude (step S34).
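The conventional procedure in steps S30 to S34 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patented or any reference implementation: the Gaussian `window_mag` stands in for the real window spectrum, the band convention of half a harmonic spacing on each side is assumed, the fundamental frequency is expressed in FFT bins, and all names are hypothetical.

```python
import math


def window_mag(offset_bins, width=1.5):
    """Hypothetical stand-in for the window magnitude spectrum."""
    return math.exp(-0.5 * (offset_bins / width) ** 2)


def band_edges(l, f0):
    """Assumed band convention: half a harmonic spacing on each side of l * f0."""
    return math.ceil((l - 0.5) * f0), math.ceil((l + 0.5) * f0) - 1


def fit_band(mag, l, f0):
    """Least-squares magnitude and residual error energy for band l."""
    a, b = band_edges(l, f0)
    b = min(b, len(mag) - 1)
    w = [window_mag(k - l * f0) for k in range(a, b + 1)]
    s = mag[a:b + 1]
    den = sum(x * x for x in w)
    amp = sum(x * y for x, y in zip(w, s)) / den if den > 0 else 0.0
    err = sum((y - amp * x) ** 2 for x, y in zip(w, s))
    return amp, err


def refine_real_pitch(mag, candidates):
    """Try every real-valued candidate over the whole band; keep the lowest total error."""
    best = None
    for f0 in candidates:
        n_harm = int((len(mag) - 1) / f0 - 0.5)   # bands that fit inside the spectrum
        fits = [fit_band(mag, l, f0) for l in range(1, n_harm + 1)]
        total = sum(err for _, err in fits)
        if best is None or total < best[0]:
            best = (total, f0, [amp for amp, _ in fits])
    return best[1], best[2]
```

Every candidate requires a full pass over all harmonic bands, which is where the computational cost of the conventional method comes from.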

Conventional real-valued-pitch-based harmonic estimation, which applies the MMSE calculation over the entire harmonic band with band start and end points fixed by the pitch value, requires an accurate real-valued pitch search on the input signal spectrum. If the pitch search of the speech coder is inaccurate because of limits on the allocated bits or on computation, the error between the harmonic center frequencies of the input signal spectrum and those of the synthesized signal spectrum grows toward higher frequencies, and the performance of harmonic analysis degrades sharply. This problem arises because the same harmonic spacing is used over the entire frequency band even though the harmonic spacing of an actual speech spectrum varies slightly, and it becomes more serious when the pitch resolution is low. In addition, the performance of the real-valued-pitch-based method depends on the accuracy of the pitch of the input signal spectrum, and an accurate pitch search requires a large amount of computation.

The present invention has been made to solve the problems described above. Its object is to provide harmonic estimation that is applied adaptively to each harmonic band rather than uniformly to the entire frequency band, thereby reducing the large amount of computation needed for the pitch search, making real-time implementation on a DSP chip very efficient, and making the estimation less sensitive to pitch.

Fig. 1 is a block diagram of the conventional real-valued-pitch-based harmonic estimation method.

Fig. 2 is a block diagram of the harmonic estimation method using delta adjustment according to the present invention.

Fig. 3 is an operation flowchart of the conventional real-valued-pitch-based harmonic estimator.

Fig. 4 is an operation flowchart of a harmonic estimator using the delta adjustment method according to the present invention.

* Description of reference numerals for the main parts of the drawings *

10: real-valued pitch refinement unit, 11: harmonic magnitude estimation unit,

20: delta adjustment unit, 21: harmonic magnitude estimation unit.

To achieve the object described above, the present invention is characterized by comprising: obtaining an input signal spectrum by applying a window spectrum to an input speech signal; obtaining a synthesized signal spectrum by applying the window spectrum to an integer pitch candidate; obtaining a per-band error energy by squaring the difference between the absolute value of the input signal spectrum and the absolute value of the synthesized signal spectrum and accumulating it within each harmonic band; obtaining the range of the harmonic frequency adjustment value that minimizes the error energy; and obtaining the harmonic magnitude by applying the harmonic frequency adjustment value.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

The present invention uses a calculation scheme called delta adjustment (DA), in which harmonic estimation is adjusted adaptively for each harmonic band instead of being applied uniformly over the entire frequency band. This removes the dependence on the input pitch and reduces the large amount of computation needed for the pitch search. Using a pitch in integer units, the delta adjustment method shifts the frequency of each harmonic to the left and right by an adjustment value, finds the adjustment value that minimizes the error energy between the input signal spectrum and the synthesized signal spectrum, and applies it to obtain the maximum harmonic magnitude. Equation 7 gives the error energy.

For an integer pitch candidate, Equation 7 sums the squared difference between the absolute value of the input signal spectrum and the absolute value of the synthesized signal spectrum from the start point to the end point of each harmonic band. The adjustment value varies between a lower and an upper limit. Equation 8 gives this range of the harmonic frequency adjustment value; the range is proportional to frequency, small in the low-frequency band and larger toward the high-frequency band. Equation 9 gives the maximum harmonic magnitude obtained by applying the harmonic frequency adjustment value for each integer pitch candidate.

Fig. 2 is a block diagram of the operation of the integer-pitch-based harmonic estimator, described in detail as follows. The input signal spectrum is obtained by applying a window to the input speech signal and taking an FFT, and the synthesized signal spectrum is obtained by applying the window spectrum to an integer pitch candidate. The delta adjustment unit (20) uses the integer pitch to obtain the range of the harmonic frequency adjustment value, applies the adjustment values within that range, and takes the one for which the harmonic magnitude is maximal as the optimal frequency adjustment value. The harmonic magnitude obtained with that optimal adjustment value is selected as the optimal harmonic.

The delta adjustment method based on an integer pitch candidate operates in the following order. The input signal spectrum, obtained by applying a window to the input signal and taking an FFT, is used as the input to the harmonic estimator (steps S40 and S41). The synthesized signal spectrum is computed by applying the window spectrum to the integer pitch candidate (step S42). Using the integer pitch, the limit of the harmonic frequency adjustment value is computed from Equation 8, with the range proportional to frequency: small at low frequencies and large at high frequencies (step S43). Within the range obtained in step S43, the adjustment value that minimizes the error energy is found, and that value is applied to Equation 9 to obtain the maximum harmonic magnitude (step S44).
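Steps S40 to S44 can be illustrated with the following Python sketch. It is a sketch under stated assumptions rather than the patented procedure itself: the exact form of Equation 8 is not available here, so the search limit is modelled as `range_scale` times the harmonic frequency (a hypothetical parameter), the search step `step` is likewise hypothetical, the Gaussian `window_mag` stands in for the real window spectrum, and the best shift is chosen by minimum band error (the text also describes selecting the shift that gives the maximum harmonic magnitude).

```python
import math


def window_mag(offset_bins, width=1.5):
    """Hypothetical stand-in for the window magnitude spectrum."""
    return math.exp(-0.5 * (offset_bins / width) ** 2)


def delta_adjust(mag, pitch_period, n_fft, range_scale=0.02, step=0.25):
    """Fit each harmonic independently, shifting its centre within a frequency-dependent range.

    Returns one (shift_in_bins, magnitude) pair per harmonic.
    """
    f0 = n_fft / pitch_period                    # fundamental in FFT bins (integer pitch)
    n_harm = int((len(mag) - 1) / f0 - 0.5)      # bands that fit inside the spectrum
    result = []
    for l in range(1, n_harm + 1):
        centre = l * f0
        a = math.ceil((l - 0.5) * f0)            # assumed band start
        b = math.ceil((l + 0.5) * f0) - 1        # assumed band end
        s = mag[a:b + 1]
        limit = range_scale * centre             # assumed: range grows with frequency
        n_steps = int(limit / step)
        best = None
        for i in range(-n_steps, n_steps + 1):
            delta = i * step
            w = [window_mag(k - (centre + delta)) for k in range(a, b + 1)]
            den = sum(x * x for x in w)
            amp = sum(x * y for x, y in zip(w, s)) / den
            err = sum((y - amp * x) ** 2 for x, y in zip(w, s))
            if best is None or err < best[0]:
                best = (err, delta, amp)
        result.append((best[1], best[2]))
    return result
```

Because each band is searched on its own, the work per harmonic is a handful of short sums over one band instead of a full-band synthesis per pitch candidate, and a harmonic that has drifted from its nominal position can still be matched.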

Table 1 compares the computation of the conventional real-valued-pitch-based method with that of the delta adjustment method, under fixed parameter settings and the usual pitch range for the fundamental frequency. The ratio of computation is at least 4:1, at most 19:1, and 13:1 on average, a substantial reduction. Although the computation of the delta adjustment method varies with pitch, the large average reduction makes it efficient for real-time implementation on a DSP chip, where computation is a key factor.

Table 1. Computation amount of the real-valued-pitch-based method and the delta adjustment method

To evaluate the delta adjustment method, the harmonic spectral distortion (SD) relative to the conventional method was measured and a preference test was conducted. The harmonic SD, used as an objective measure of sound quality, is defined in terms of the harmonic magnitudes estimated by the two methods.

Here the two quantities compared are the magnitudes of each harmonic estimated by the real-valued-pitch-based method and by the delta adjustment method, respectively, and the count is the number of harmonics in each frame.
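For illustration only, one commonly used form of spectral distortion is the root-mean-square difference of log magnitudes in dB within a frame, averaged over frames. The Python sketch below implements that form; it is an assumption and may not match the exact definition used in the source, and the function name and argument layout are hypothetical.

```python
import math


def harmonic_sd_db(ref_frames, test_frames):
    """Average over frames of the RMS log-magnitude difference (dB) between two sets of harmonics.

    ref_frames, test_frames: lists of frames, each a list of positive harmonic magnitudes.
    Assumed definition, not taken from the source.
    """
    total = 0.0
    for ref, test in zip(ref_frames, test_frames):
        diffs = [20.0 * math.log10(a) - 20.0 * math.log10(b)
                 for a, b in zip(ref, test)]
        total += math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return total / len(ref_frames)
```

Under this definition identical magnitude sets give 0 dB, and a uniform factor-of-ten difference in magnitude gives 20 dB.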

Table 2. Harmonic spectral distortion
            Female speaker   Male speaker   Average
SD (dB)     0.0962           0.12           0.108

Table 2 shows the spectral distortion between the harmonics obtained over the entire frequency band by the conventional real-valued-pitch-based method and the harmonics obtained for each harmonic by the delta adjustment method of the present invention. A spectral distortion of about 0.1 dB corresponds to a difference in sound quality that is subjectively almost indistinguishable.

As described above, the delta adjustment method adaptively adjusts the harmonic frequency adjustment value according to the frequency region, minimizing the error energy between the input signal spectrum and the synthesized signal spectrum independently for each harmonic band, once per harmonic of the integer pitch candidate. Although its computation varies with the pitch range, it reduces computation substantially on average. Because it requires little computation, it is very efficient for real-time implementation on a DSP (Digital Signal Processor) chip, where computation is a major concern, and it is less sensitive to pitch errors.

The embodiments of the present invention are not limited to those described above, and various alternatives, modifications, and changes may be made within the scope evident to those of ordinary skill in the art to which the present invention relates.

As described above, the present invention is a harmonic estimation method that greatly reduces the computation of a harmonic coder; because it requires little computation, it is very efficient for real-time implementation on a DSP chip. The delta adjustment method repeats the minimization independently for each harmonic band, once per harmonic, so it requires little computation and is less sensitive to errors. It also exploits the property of speech that the pitch contour matters more to sound quality than the pitch resolution. Computation analysis, spectral distortion measurement, and preference evaluation show that it achieves good harmonic estimation performance with little computation.

Claims

A method of harmonic estimation in a speech coder, comprising: obtaining an input signal spectrum by applying a window spectrum to an input speech signal; obtaining a synthesized signal spectrum by applying the window spectrum to an integer pitch candidate; obtaining a per-band error energy by squaring the difference between the absolute value of the input signal spectrum and the absolute value of the synthesized signal spectrum and accumulating it within each harmonic band; obtaining a range of a harmonic frequency adjustment value that minimizes the error energy; and obtaining a harmonic magnitude by applying the harmonic frequency adjustment value.

The method according to claim 1, wherein the range of the harmonic frequency adjustment value is adjusted adaptively in proportion to frequency, from small at low frequencies to large at high frequencies.

The method according to claim 1, wherein the maximum harmonic magnitude obtained by applying frequency adjustment values within the range of the harmonic frequency adjustment value is taken as the optimal harmonic.