KR101072457B1

KR101072457B1 - Overlap and add harmonic synthesis method and apparatus for sinusoidal speech

Info

Publication number: KR101072457B1
Application number: KR1020100006313A
Authority: KR
Inventors: 이인성; 김영준; 박종배; 정규혁
Original assignee: 충북대학교 산학협력단 (Chungbuk National University Industry-Academic Cooperation Foundation)
Priority date: 2010-01-25
Filing date: 2010-01-25
Publication date: 2011-10-11
Also published as: KR20110086918A

Abstract

Disclosed are an overlap-add harmonic synthesis method and apparatus for sinusoidal speech. The harmonic synthesis method comprises the following steps:
- extracting sinusoidal parameters at the maximum-overlap point between the k-th and (k+1)-th frames;
- a phase-level update step that adjusts the phase level of the synthesized sinusoid so that it approaches the original phase;
- a sinusoidal parameter ratio prediction step that predicts the current frame from the previous frame, in order to predict the sinusoidal magnitude used as a weight;
- modeling the synthesized sinusoid using the adjusted phase level and the predicted magnitude.

The invention removes the discontinuity that appears at frame boundaries and reduces the error between the synthesized sinusoid and the original signal. This prevents degradation of sound quality.
[Index terms]
Sinusoid, overlap-add, weight

Description

Overlap and add harmonic synthesis method and apparatus for sinusoidal speech

The present invention relates to a method for minimizing inter-frame discontinuity when interpolating synthesized speech signals. In particular, it relates to a method for improving the continuity of the synthesized waveform by three means: weighting with sinusoidal magnitude values, applying a phase-shift factor, and applying a damped harmonic magnitude parameter.

Sinusoidal models are used in speech codecs that give good quality at 4 kbps and below, such as STC (Sinusoidal Transform Coding), MBE (Multi-band Excitation), HVXC (Harmonic Vector Excitation Coding), and MELP (Mixed Excitation Linear Prediction). They are also used in audio codecs such as HILN (Harmonic Individual Line plus Noise). More recently, they have been applied in digital signal processing fields such as voice conversion, speech quality enhancement, and low-bit-rate speech coding.

A sinusoidal (or harmonic) model represents a speech signal as a linear sum of sinusoidal components whose frequency, amplitude, and phase vary over time. A speech coder based on this model synthesizes speech either by interpolating parameters or by overlap-adding waveforms. For parameter estimation, matching pursuit (MP) is generally known to be accurate.
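As a minimal sketch of the model just described (not the patent's implementation), one frame of a harmonic model can be synthesized as a sum of cosines whose frequencies are integer multiples of a fundamental. The function name `synthesize_harmonic_frame` and its arguments are illustrative assumptions. The parameters are held constant over the frame, which is exactly the assumption that produces the frame-boundary discontinuities discussed next.

```python
import math


def synthesize_harmonic_frame(amps, phases, omega0, n_samples):
    """Sum-of-sinusoids frame: s(n) = sum_l A_l * cos(l * omega0 * n + theta_l).

    amps[l-1] and phases[l-1] belong to harmonic l (l = 1, 2, ...);
    omega0 is the fundamental frequency in radians per sample.
    All parameters are held constant over the whole frame.
    """
    out = []
    for n in range(n_samples):
        sample = 0.0
        for l, (a, theta) in enumerate(zip(amps, phases), start=1):
            sample += a * math.cos(l * omega0 * n + theta)
        out.append(sample)
    return out


# Example: two harmonics of a 100 Hz fundamental sampled at 8 kHz (80-sample period).
frame = synthesize_harmonic_frame([1.0, 0.5], [0.0, 0.0], 2 * math.pi * 100 / 8000, 80)
```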

The sinusoidal model produces inter-frame discontinuities because it assumes that the sinusoidal parameters are constant within a frame. To reduce these discontinuities, linear interpolation and phase interpolation are used together during synthesis. Discontinuities nevertheless remain, because the parameters are estimated at the center of each frame. The present invention therefore aims to minimize inter-frame discontinuity by estimating the sinusoidal parameters with three techniques: sinusoidal-magnitude weighting, a phase-shift factor, and a damped harmonic magnitude parameter.

A prior-art overlap-add sinusoidal synthesis method with linear (first-order) phase works as follows.

The key idea of overlap-add sinusoidal synthesis with linear phase concerns an overlap length of N/2. It finds the sinusoidal parameters at the N/2 point between the k-th and (k+1)-th frames so that parameter continuity is maintained around that point.

The magnitude, frequency, and phase in the k-th frame are denoted $A_k$, $\omega_k$, and $\theta_k$, respectively. The magnitude, frequency, and phase at the midpoint (the N/2 point) are denoted $\tilde{A}$, $\tilde{\omega}$, and $\tilde{\theta}$. The midpoint magnitude and frequency can then simply be taken as the averages of the past and present parameters, as in Equations 1 and 2:

$$\tilde{A} = \frac{A_k + A_{k+1}}{2} \qquad (1)$$

$$\tilde{\omega} = \frac{\omega_k + \omega_{k+1}}{2} \qquad (2)$$

However, the choice of the midpoint phase parameter is not obvious. In overlap-add with overlap length N/2, a triangular window is applied to each frame, so the maximum overlap of the sinusoids occurs where the windows cross. It is therefore appropriate to choose the midpoint phase so as to minimize the phase discontinuity at the maximum-overlap points. These two points are N/4 and 3N/4. The phase errors there are denoted $\epsilon_{N/4}(\tilde{\theta})$ and $\epsilon_{3N/4}(\tilde{\theta}, M)$, respectively, and are given by Equations 3 and 4.
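The window geometry behind the N/4 and 3N/4 points can be checked numerically. The sketch below assumes length-N triangular windows advanced by N/2 samples; the text does not specify this. Under that assumption, adjacent windows sum to one in the overlap region and cross at half height, N/4 samples after the earlier window's peak. `triangular_window` and `overlap_add` are illustrative helpers, not names from the patent.

```python
def triangular_window(N):
    """Length-N triangular window with peak 1.0 at n = N/2 (N assumed even)."""
    half = N / 2
    return [1.0 - abs(n - half) / half for n in range(N)]


def overlap_add(segments, hop):
    """Overlap-add equal-length segments whose starts are `hop` samples apart."""
    seg_len = len(segments[0])
    out = [0.0] * (hop * (len(segments) - 1) + seg_len)
    for i, seg in enumerate(segments):
        for n, x in enumerate(seg):
            out[i * hop + n] += x
    return out


N = 8
w = triangular_window(N)
# Three windows advanced by N/2: in the fully overlapped region they sum to 1.
total = overlap_add([w, w, w], N // 2)
# Position in the first window's falling half where it reaches 0.5; the next
# window (starting N/2 later) is also 0.5 there, so this is the crossing point.
crossing = w.index(0.5, N // 2)
```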

Two unknowns must be estimated: the phase $\tilde{\theta}$, introduced to reduce the phase discontinuity, and the integer M. For this, the sum of the squared errors at the two points is defined as the phase error function $l(\tilde{\theta}, M)$ in Equation 5:

$$l(\tilde{\theta}, M) = \epsilon_{N/4}^{2}(\tilde{\theta}) + \epsilon_{3N/4}^{2}(\tilde{\theta}, M) \qquad (5)$$

The phase $\tilde{\theta}$ and integer M that minimize the phase discontinuity are the values at which the partial derivatives of $l(\tilde{\theta}, M)$ with respect to each vanish, that is, $\partial l / \partial \tilde{\theta} = 0$ and $\partial l / \partial M = 0$. They are given by Equations 6 and 7.
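Equations 3 to 7 are not reproduced in this text, so the following sketch assumes one common way of writing them. Frame k is centered at n = 0, the midpoint at N/2, and frame k+1 at N, each with linear phase. The assumed errors are:

- $\epsilon_{N/4} = (\tilde{\theta} - \tilde{\omega}N/4) - (\theta_k + \omega_k N/4)$
- $\epsilon_{3N/4} = (\tilde{\theta} + \tilde{\omega}N/4) - (\theta_{k+1} - \omega_{k+1}N/4 + 2\pi M)$

Setting $\partial l / \partial \tilde{\theta} = 0$ makes $\tilde{\theta}$ the average of the two phase targets. Eliminating $\tilde{\theta}$ leaves a quadratic in M, whose minimizer is the nearest integer. The function name `linear_phase_midpoint` is hypothetical.

```python
import math


def linear_phase_midpoint(theta_k, omega_k, theta_k1, omega_k1, N):
    """Closed-form minimizer of l = eps_N4**2 + eps_3N4**2 under the assumed error forms.

    eps_N4  = theta_mid - a
    eps_3N4 = theta_mid - b - 2*pi*M
    """
    omega_mid = 0.5 * (omega_k + omega_k1)                # Equation 2
    a = theta_k + omega_k * N / 4 + omega_mid * N / 4     # theta_mid that zeroes eps_N4
    b = theta_k1 - omega_k1 * N / 4 - omega_mid * N / 4   # theta_mid that zeroes eps_3N4 when M = 0
    # Eliminating theta_mid leaves l = (a - b - 2*pi*M)**2 / 2, minimized by the nearest integer.
    M = round((a - b) / (2 * math.pi))
    # dl/dtheta_mid = 0: theta_mid is the average of the two phase targets.
    theta_mid = 0.5 * (a + b + 2 * math.pi * M)
    return theta_mid, M
```

For a stationary sinusoid, where $\theta_{k+1} = \theta_k + \omega N$ modulo 2π, this recovers $\tilde{\theta} = \theta_k + \omega N/2$ and makes both errors zero.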

As described above, the sinusoidal speech model is known as an efficient technique for coding speech at low bit rates. It has recently been used in digital signal processing fields such as voice conversion, speech quality enhancement, and low-rate speech coding. The processing in such a coder is performed either by parameter interpolation or by waveform overlap-add.

However, because the sinusoidal model assumes that the parameters are constant within a frame, they are estimated with weight concentrated at the frame center. This causes discontinuities at frame boundaries during synthesis and is a major cause of quality degradation.

There is therefore an urgent need for an overlap-add technique that reduces such synthesis errors. Such a technique would reduce discontinuities at frame boundaries and prevent quality degradation.

An object of the present invention is to remove the discontinuities that appear at frame boundaries and thereby improve the continuity of the sinusoidal synthesized waveform.

Another object of the present invention is to provide an overlap-add technique that prevents degradation of sound quality by reducing the error between the synthesized sinusoid and the original signal.

One aspect of the present invention for achieving the above objects relates to an overlap-add harmonic synthesis method for a sinusoidal speech model. The model generates a synthesized sinusoid by summing the k-th and (k+1)-th frames of an original speech signal. The method comprises four steps:
- extracting sinusoidal parameters at the maximum-overlap point between the k-th and (k+1)-th frames;
- a phase-level update step that adjusts the phase level of the synthesized sinusoid so that it approaches the original phase;
- a sinusoidal parameter ratio prediction step that predicts the current frame from the previous frame, in order to predict the sinusoidal magnitude used as a weight;
- modeling the synthesized sinusoid using the adjusted phase level and the predicted magnitude.

In particular, the sinusoidal parameter extraction step comprises:
- a midpoint magnitude and frequency extraction step. When the overlap length between the k-th and (k+1)-th frames is N, it extracts the magnitude $\tilde{A}$ and frequency $\tilde{\omega}$ of the synthesized sinusoid at the midpoint corresponding to N/2, where $A_k$ and $\omega_k$ are the magnitude and frequency in the k-th frame;
- a midpoint phase extraction step that extracts the phase $\tilde{\theta}$ of the synthesized sinusoid at the midpoint;
- a weighted phase error function computation step. Let N be divided into quarters, and let $\epsilon^{S}_{N/4}(\tilde{\theta})$ and $\epsilon^{S}_{3N/4}(\tilde{\theta}, M)$ be the error values of the synthesized sinusoid at the points N/4 and 3N/4 from the k-th frame. This step computes the weighted phase error function $r(\tilde{\theta}, M)$ as the sum of their squares;
- a step of extracting the phase $\tilde{\theta}$ and the phase unconstrained integer factor M that minimize the discontinuity of the synthesized sinusoid.

Furthermore, the phase-level update step comprises:
- receiving the midpoint phase $\tilde{\theta}$ and extracting the optimal midpoint phase $\theta_{opt}$ with a matching pursuit algorithm;
- computing a phase-shift factor β from $\tilde{\theta}$ and $\theta_{opt}$;
- defining a shifted phase $\theta_{SF,l}$ satisfying $\theta_{SF,l} = \tilde{\theta}_l + 2\pi\beta_l$, where $\tilde{\theta}_l$ and $\beta_l$ are the midpoint phase and the phase-shift factor of the l-th harmonic, respectively;
- updating the midpoint phase value of the l-th harmonic with $\theta_{SF,l}$.

Preferably, the sinusoidal parameter ratio prediction step comprises:
- computing a damping factor $g^{l}_{k}$ that satisfies $A^{l}_{k+1} = g^{l}_{k} \cdot A^{l}_{k}$ between the magnitudes of the l-th harmonic in the k-th and (k+1)-th frames;
- an optimal damping factor extraction step that finds the damping factor minimizing the mean squared error (MSE) between the original speech signal and the synthesized sinusoid;
- updating the sinusoidal magnitude by applying the optimal damping factor $g^{l}_{k}$ to the magnitude of the synthesized sinusoid.

In particular, the damping factor $g^{l}_{k}$ is selected to satisfy this MSE criterion, where $s_k(n)$ is the original signal.

Another aspect of the present invention for achieving the above objects relates to an overlap-add harmonic synthesis apparatus for a sinusoidal speech model. The model generates a synthesized sinusoid by summing the k-th and (k+1)-th frames of an original speech signal. The apparatus comprises four units:
- a sinusoidal parameter extraction unit that extracts sinusoidal parameters at the maximum-overlap point between the k-th and (k+1)-th frames;
- a phase level update unit that adjusts the phase level of the synthesized sinusoid so that it approaches the original phase;
- a sinusoidal parameter ratio prediction unit that predicts the current frame from the previous frame, in order to predict the sinusoidal magnitude used as a weight;
- a synthesized sinusoid modeling unit that models the synthesized sinusoid using the adjusted phase level and the predicted magnitude.

The sinusoidal parameter extraction unit comprises:
- a midpoint magnitude and frequency extractor. When the overlap length between the k-th and (k+1)-th frames is N, it extracts the magnitude $\tilde{A}$ and frequency $\tilde{\omega}$ of the synthesized sinusoid at the midpoint corresponding to N/2;
- a midpoint phase extractor that extracts the midpoint phase $\tilde{\theta}$;
- a weighted phase error function operator. Let N be divided into quarters, and let $\epsilon^{S}_{N/4}(\tilde{\theta})$ and $\epsilon^{S}_{3N/4}(\tilde{\theta}, M)$ be the error values of the synthesized sinusoid at the points N/4 and 3N/4 from the k-th frame. The operator computes the weighted phase error function $r(\tilde{\theta}, M)$ as the sum of their squares;
- a phase unconstrained integer factor extractor that extracts the phase $\tilde{\theta}$ and the integer factor M that minimize the discontinuity of the synthesized sinusoid.

In particular, the phase level update unit comprises:
- an optimal phase extractor that receives the midpoint phase $\tilde{\theta}$ and extracts the optimal midpoint phase $\theta_{opt}$ with a matching pursuit algorithm;
- a phase-shift factor extractor that computes the phase-shift factor β from $\tilde{\theta}$ and $\theta_{opt}$;
- a shifted phase determiner that defines a shifted phase $\theta_{SF,l}$ satisfying $\theta_{SF,l} = \tilde{\theta}_l + 2\pi\beta_l$, where $\tilde{\theta}_l$ and $\beta_l$ are the midpoint phase and the phase-shift factor of the l-th harmonic, respectively;
- a phase level updater that updates the midpoint phase value of the l-th harmonic with $\theta_{SF,l}$.

Furthermore, the sinusoidal parameter ratio prediction unit comprises:
- a sinusoidal magnitude predictor that computes a damping factor $g^{l}_{k}$ satisfying $A^{l}_{k+1} = g^{l}_{k} \cdot A^{l}_{k}$ between the magnitudes of the l-th harmonic in the k-th and (k+1)-th frames;
- an optimal damping factor extractor that finds the damping factor minimizing the mean squared error (MSE) between the original speech signal and the synthesized sinusoid;
- a sinusoidal magnitude updater that updates the sinusoidal magnitude by applying the optimal damping factor $g^{l}_{k}$ to the magnitude of the synthesized sinusoid.

The prior art estimates the sinusoidal parameters at the frame midpoint. The present invention instead estimates them by weighting with sinusoidal magnitude values, applying a phase-shift factor, and applying a damped harmonic magnitude parameter. This improves the continuity of the synthesized waveform.

In addition, sound quality is improved by minimizing the error between the synthesized sinusoid and the original signal.

FIG. 1 is a flowchart conceptually illustrating an overlap-add harmonic synthesis method according to one aspect of the present invention.
FIG. 2 is a block diagram conceptually illustrating an overlap-add harmonic synthesis apparatus according to a second aspect of the present invention.
FIG. 3 is a diagram detailing the operation of the sinusoidal parameter extractor 210 shown in FIG. 2.
FIG. 4 is a diagram detailing the operation of the phase level updater 230 shown in FIG. 2.
FIG. 5 is a diagram detailing the operation of the sinusoidal parameter ratio predictor 250 shown in FIG. 2.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, refer to the accompanying drawings, which illustrate preferred embodiments, and to the description of those drawings.

The present invention is described in detail below through preferred embodiments, with reference to the accompanying drawings. However, the present invention may be embodied in many different forms and is not limited to the embodiments described. Parts irrelevant to the description are omitted for clarity, and identical reference numerals in the drawings denote identical members.

Throughout the specification, when a part is said to "include" a component, it may further include other components unless stated otherwise. Terms such as "...unit", "...device", "module", and "block" denote a unit that performs at least one function or operation. Such a unit may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 is a flowchart conceptually illustrating an overlap-add harmonic synthesis method according to one aspect of the present invention.

First, the original speech signal to be modeled is received (S110). The sinusoidal parameters at the maximum-overlap point between the k-th and (k+1)-th frames of the original speech signal are then extracted (S130). That is, the midpoint sinusoidal parameters $\tilde{A}$, $\tilde{\omega}$, and $\tilde{\theta}$ are obtained.

The conventional linear-phase overlap-add synthesis method estimated the phase of the sinusoidal parameters by minimizing only the phase error at the points where the maximally overlapping windows meet. It does not minimize the discontinuity of the synthesized waveform itself, so it cannot sufficiently improve the continuity of the sinusoidal parameters. The sinusoidal parameters must therefore be found by minimizing the error of the synthesized waveform.

Once the value that minimizes the mean squared error (MSE) between the synthesized sinusoid and the original signal has been computed, it is used to adjust the phase level of the synthesized sinusoid toward the original (S150). The MSE of the synthesized sinusoid reduces to a magnitude-weighted phase error function, as derived in detail later.

In this process, the current frame is predicted from the past frame, and the weight to be applied is also predicted (S170). This weight corresponds to the magnitude ratio between the past and current frames. The sinusoidal magnitude serves as a weighting factor when the midpoint phase and related parameters are computed. The synthesized sinusoid is then modeled using the resulting phase level and magnitude (S190).

The methods for predicting and computing the parameters in each step of FIG. 1 are described in detail below.

FIG. 2 is a block diagram conceptually illustrating an overlap-add harmonic synthesis apparatus according to a second aspect of the present invention.

In FIG. 2, the overlap-add harmonic synthesis apparatus 200 includes a sinusoidal parameter extractor 210, a phase level updater 230, a sinusoidal parameter ratio predictor 250, and a synthesized sinusoid modeling unit 270.

The sinusoidal parameter extractor 210 extracts the sinusoidal parameters at the maximum-overlap point. See FIG. 3 for a detailed description of its operation.

FIG. 3 is a diagram detailing the operation of the sinusoidal parameter extractor 210 shown in FIG. 2.

The sinusoidal parameter extractor 300 shown in FIG. 3 includes a midpoint sinusoidal magnitude and frequency extractor 310, a midpoint phase extractor 330, a phase error function operator 350, and a phase unconstrained integer factor extractor 370.

Among the midpoint sinusoidal parameters, the midpoint sinusoidal magnitude and frequency extractor 310 defines the magnitude $\tilde{A}$ as in Equation 1 and the frequency $\tilde{\omega}$ as in Equation 8.

Once the magnitude and frequency are defined, the midpoint phase extractor 330 obtains the midpoint phase $\tilde{\theta}$. To do so, it defines the mean squared error (MSE) values of the synthesized waveform at the maximum-overlap points N/4 and 3N/4, $\epsilon^{S}_{N/4}(\tilde{\theta})$ and $\epsilon^{S}_{3N/4}(\tilde{\theta}, M)$, as in Equation 9.


In Equation 9, $W_{1/4}$ and $W_{3/4}$ are the average magnitudes at the N/4 and 3N/4 points, respectively, computed from the k-th and (k+1)-th frames.

The midpoint phase that minimizes the MSE is where the partial derivative vanishes. However, the MSE contains trigonometric functions of the phase and has many local solutions, so the optimum is hard to find. To address this, the phase error function operator 350 rearranges Equation 9 with respect to the midpoint phase $\tilde{\theta}$ using trigonometric identities, yielding Equation 10.

If the variable x is small, the Taylor expansion truncated after the quadratic term gives $\cos x \approx 1 - x^{2}/2$, so Equation 10 is approximated by Equation 11.
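The sketch below illustrates why the MSE becomes an amplitude-weighted phase error. It uses a standard identity, not Equation 10 itself, which is not reproduced here. Over an integer number of periods, the mean squared difference between $A\cos\phi$ and $A\cos(\phi + x)$ equals $A^{2}(1 - \cos x)$. The quadratic Taylor term turns this into $(A^{2}/2)x^{2}$. The code checks both numerically; `waveform_mse` is a hypothetical helper.

```python
import math


def waveform_mse(amp, omega, phase_error, n_samples):
    """Mean of (A cos(wn) - A cos(wn + x))**2 over n = 0..n_samples-1."""
    total = 0.0
    for n in range(n_samples):
        diff = amp * math.cos(omega * n) - amp * math.cos(omega * n + phase_error)
        total += diff * diff
    return total / n_samples


amp, x = 0.8, 0.1
omega = 2 * math.pi / 40                 # 40-sample period
mse = waveform_mse(amp, omega, x, 400)   # exactly 10 whole periods
exact = amp ** 2 * (1 - math.cos(x))     # A^2 (1 - cos x)
approx = 0.5 * amp ** 2 * x ** 2         # A^2 x^2 / 2, using cos(x) ~ 1 - x**2/2
```

The squared amplitude multiplies the squared phase error. A phase mismatch on a strong harmonic therefore costs more than the same mismatch on a weak one, which is the motivation for weighting by magnitude.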

Equation 11 shows that the MSE of the final synthesized signal reduces to the conventional phase error function weighted by the sinusoidal amplitude. This is defined as weighted overlap-add linear-phase (WE-OLA-LP) synthesis.

Using the weighted phase error function of Equation 11, the phase error function operator 350 defines the sum of squared errors at the N/4 and 3N/4 points, $r(\tilde{\theta}, M)$, as in Equation 12.

The phase unconstrained integer factor extractor 370 then uses Equation 12 to obtain the phase $\tilde{\theta}$ and the phase unconstrained integer factor M that minimize the discontinuity of the synthesized signal. These are the values at which the partial derivatives with respect to $\tilde{\theta}$ and M each vanish. The results are given by Equations 13 and 14.

In Equations 13 and 14, X, Y, and Z are defined as follows.
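The exact forms of Equations 12 to 14 (including X, Y, and Z) are not available here, so this is only a sketch under an assumed form. It takes $r(\tilde{\theta}, M) = w_1\epsilon_{N/4}^{2} + w_3\epsilon_{3N/4}^{2}$, with the same linear-phase errors as in the prior-art case. The positive weights $w_1$ and $w_3$ stand in for the magnitude weights; their exact dependence on $W_{1/4}$ and $W_{3/4}$ is an assumption. Setting $\partial r / \partial \tilde{\theta} = 0$ gives a weighted average of the two phase targets. The remaining minimum, $w_1 w_3 (a - b - 2\pi M)^{2} / (w_1 + w_3)$, shows that M is still the nearest integer. `weighted_midpoint_phase` is a hypothetical name.

```python
import math


def weighted_midpoint_phase(theta_k, omega_k, theta_k1, omega_k1, N, w1, w3):
    """Minimize r = w1*eps_N4**2 + w3*eps_3N4**2 over real theta_mid and integer M.

    Assumed errors: eps_N4 = theta_mid - a, eps_3N4 = theta_mid - b - 2*pi*M.
    """
    omega_mid = 0.5 * (omega_k + omega_k1)
    a = theta_k + (omega_k + omega_mid) * N / 4     # zero of eps_N4
    b = theta_k1 - (omega_k1 + omega_mid) * N / 4   # zero of eps_3N4 when M = 0
    # With theta_mid eliminated, r = w1*w3/(w1+w3) * (a - b - 2*pi*M)**2.
    M = round((a - b) / (2 * math.pi))
    c = b + 2 * math.pi * M
    # dr/dtheta_mid = 0 gives the weight-averaged phase target.
    theta_mid = (w1 * a + w3 * c) / (w1 + w3)
    return theta_mid, M
```

With equal weights this reduces to the unweighted prior-art solution. A much larger $w_1$ pulls $\tilde{\theta}$ toward the phase demanded at N/4.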

Once the sinusoidal parameters are obtained as above, the phase level updater (230 in FIG. 2) adjusts the phase level toward the original phase. Among the interpolation-point sinusoidal parameters obtained by WE-OLA-LP synthesis, it applies a phase-shift factor to the phase value. This brings the phase closer to that of the original signal and improves speech synthesis performance. See FIG. 4 for a detailed description of the operation of the phase level updater 230.

FIG. 4 is a diagram detailing the operation of the phase level updater 230 shown in FIG. 2.

The phase level updater 400 shown in FIG. 4 includes a midpoint phase value receiver 410, a midpoint phase parameter extractor 430, a phase-shift factor extractor 450, a shifted phase determiner 470, and a phase level updater 490.

First, the midpoint phase value receiver 410 receives the phase value from the midpoint phase extractor (330 in FIG. 3). The midpoint phase parameter extractor 430 then applies matching pursuit (MP) to extract the midpoint phase parameter $\theta_{opt}$. Next, the phase-shift factor extractor 450 computes the phase-shift factor β from $\tilde{\theta}$ (from the receiver 410) and $\theta_{opt}$ (from the extractor 430). The error E of Equation 15 is used to determine β.
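The text does not detail the matching-pursuit step, so this is only a simplified stand-in for a single MP iteration. It assumes the atom's frequency is already known. The least-squares amplitude and phase then follow from projecting the frame onto $\cos(\omega n)$ and $\sin(\omega n)$ and solving the 2×2 normal equations. A full MP would search over a dictionary of frequencies and subtract each selected atom from the residual before the next iteration. `fit_phase` is a hypothetical name.

```python
import math


def fit_phase(signal, omega):
    """Least-squares fit of s(n) ~ A cos(omega*n + theta) for a known omega.

    Writes A cos(wn + theta) = p cos(wn) + q sin(wn) with p = A cos(theta),
    q = -A sin(theta), and solves the 2x2 normal equations for (p, q).
    """
    c_c = c_s = s_s = y_c = y_s = 0.0
    for n, y in enumerate(signal):
        c, s = math.cos(omega * n), math.sin(omega * n)
        c_c += c * c
        c_s += c * s
        s_s += s * s
        y_c += y * c
        y_s += y * s
    det = c_c * s_s - c_s * c_s   # singular if sin(omega*n) vanishes for all n (e.g. omega = 0 or pi)
    p = (y_c * s_s - y_s * c_s) / det
    q = (c_c * y_s - c_s * y_c) / det
    return math.hypot(p, q), math.atan2(-q, p)   # (amplitude, phase)
```

Solving the full 2×2 system, rather than dividing each projection separately, removes the bias caused by cos and sin not being exactly orthogonal over a finite frame.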

The shifted phase determiner 470 then uses the β that minimizes the mismatch in the phase frame-boundary condition $\theta^{k}_{l}(N) = \theta^{k+1}_{l}(0)$. From this β it redefines $\theta_{SF,l}$ as in Equation 16, that is, $\theta_{SF,l} = \tilde{\theta}_l + 2\pi\beta_l$.

The phase level updater 490 then uses $\theta_{SF,l}$ as the phase of the l-th harmonic at the N/2 point in WE-OLA-LP synthesis, yielding a synthesis closer to the original phase.
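Equation 15 is not reproduced in this text, so this sketch assumes one plausible reading: β is the wrapped phase difference between $\theta_{opt}$ and the midpoint phase, divided by 2π. Then $\theta_{SF,l} = \tilde{\theta}_l + 2\pi\beta_l$ equals $\theta_{opt}$ modulo 2π, while moving $\tilde{\theta}_l$ by the smallest possible angle. `wrap_phase`, `phase_shift_factor`, and `shifted_phase` are hypothetical helpers.

```python
import math


def wrap_phase(x):
    """Map an angle to the interval [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def phase_shift_factor(theta_mid, theta_opt):
    """Assumed beta: smallest signed rotation from theta_mid to theta_opt, in cycles."""
    return wrap_phase(theta_opt - theta_mid) / (2 * math.pi)


def shifted_phase(theta_mid, theta_opt):
    """theta_SF = theta_mid + 2*pi*beta, the relation stated in the claims."""
    return theta_mid + 2 * math.pi * phase_shift_factor(theta_mid, theta_opt)
```

For example, with $\tilde{\theta} = 0.1$ and $\theta_{opt} = 2\pi - 0.1$, the shifted phase is -0.1. That is the same angle as $\theta_{opt}$, reached by a 0.2 rad rotation instead of a near-2π jump.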

Once the phase level has been updated in this way, the sinusoidal parameter ratio predictor (250 in FIG. 2) predicts the current frame from the past frame in order to obtain, more accurately, the sinusoidal magnitude values used as weights. The operation of the sinusoidal parameter ratio predictor 250 is described in detail with reference to FIG. 5.

FIG. 5 is a diagram illustrating in detail the operation of the sinusoidal parameter ratio predictor 250 shown in FIG. 2. The sinusoidal parameter ratio predictor 500 includes a sinusoidal magnitude predictor 510, an optimal damping factor extractor 530, and a sinusoidal magnitude value updater 550.

In the WE-OLA-LP sinusoidal synthesis method, the sinusoidal magnitude values $W_{1/4}$ and $W_{3/4}$ are applied as weighting factors when estimating the midpoint phase and the unconstrained integer factor $M$.
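The role of $W_{1/4}$ and $W_{3/4}$ can be sketched as a grid search that minimizes a weighted sum of squared errors at the N/4 and 3N/4 points over candidate midpoint phases and integers $M$. The error functions `err_q1` and `err_q3` are hypothetical stand-ins for the errors defined earlier in the patent, and weighting the squared errors (rather than the errors before squaring) is an assumption.

```python
import numpy as np

def search_midpoint_phase(err_q1, err_q3, w14, w34, thetas, m_values):
    """Grid search minimising r(theta, M) = w14*err_q1(theta)**2 + w34*err_q3(theta, M)**2.

    err_q1(theta) and err_q3(theta, m) are hypothetical callables returning the
    synthesis errors at N/4 and 3N/4; returns (theta, M, r) at the minimum.
    """
    best_r, best_theta, best_m = np.inf, None, None
    for theta in thetas:
        r_q1 = w14 * err_q1(theta) ** 2  # N/4 term does not depend on M
        for m in m_values:
            r = r_q1 + w34 * err_q3(theta, m) ** 2
            if r < best_r:
                best_r, best_theta, best_m = r, theta, m
    return best_theta, best_m, best_r
```

An exhaustive search keeps the sketch transparent; a practical implementation would likely refine $\theta$ analytically once $M$ is fixed.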

Rather than expressing the sinusoidal magnitude values used as weights as the average of the past and current magnitude parameters, the sinusoidal magnitude predictor 510 exploits the correlation between the original signal and the signal synthesized from the past frame's sinusoidal parameters. Through repeated analysis and synthesis of the sinusoidal magnitudes at the N/4 and 3N/4 points, it predicts the optimal ratio of sinusoidal magnitude parameters with respect to the past frame, and thereby obtains a more accurate midpoint phase and unconstrained integer factor $M$.

Here, the sinusoidal magnitude predictor 510 defines the ratio of the sinusoidal magnitudes of the l-th harmonic in the k-th frame as the damping factor $g_k^l$, which satisfies Equation 17:

$$A_{k+1}^l = g_k^l A_k^l \qquad (17)$$

Amplitude interpolation uses a linear function, on the assumption that the pitch does not change significantly over time. Applying Equation 17 to this linear interpolation of the spectral magnitude gives Equation 18.
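Equation 18 itself is not reproduced here. A minimal sketch, assuming the usual linear ramp from $A_k^l$ at $n = 0$ to $A_{k+1}^l = g_k^l A_k^l$ at $n = N$, i.e. $A^l(n) = A_k^l[1 + (g_k^l - 1)n/N]$; the function name `damped_amplitude` is hypothetical.

```python
import numpy as np

def damped_amplitude(a_k, g, N):
    """Linear amplitude track A(n) = a_k * (1 + (g - 1) * n / N) for n = 0..N.

    Assumed form of the interpolation: starts at A_k and ends at A_{k+1} = g * A_k.
    """
    n = np.arange(N + 1)
    return a_k * (1.0 + (g - 1.0) * n / N)
```

For instance, `damped_amplitude(2.0, 0.5, 8)` decays from 2.0 to 1.0, passing 1.5 at the midpoint $n = 4$.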

The optimal damping factor extractor 530 then defines the mean squared error (MSE) between the original signal $s_k(n)$ and the synthesized signal $\hat{s}_k(n)$ as in Equation 19, and obtains the optimal $g_k^l$ by finding the value at which the partial derivative of Equation 19 with respect to $g_k^l$ is zero.
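Equations 19 and 20 are not reproduced here. Assuming the synthesized signal follows the linear amplitude ramp implied by Equation 17, with known phase tracks $\phi_l(n)$ (our notation), the MSE is quadratic in the damping factors, so setting the partial derivatives to zero gives linear equations and, for a single harmonic, a closed form. This is a sketch of that reasoning, not necessarily the patent's Equation 20.

```latex
% Assumed model: linear amplitude ramp with known phase tracks \phi_l(n)
\hat{s}_k(n) = \sum_l A_k^l\Big[1 + (g_k^l - 1)\tfrac{n}{N}\Big]\cos\phi_l(n)
             = \sum_l \big[u_l(n) + g_k^l\, v_l(n)\big],
\quad u_l(n) = A_k^l\Big(1 - \tfrac{n}{N}\Big)\cos\phi_l(n),\;
      v_l(n) = A_k^l\,\tfrac{n}{N}\cos\phi_l(n)

E = \sum_n \big(s_k(n) - \hat{s}_k(n)\big)^2,
\qquad
\frac{\partial E}{\partial g_k^l} = -2\sum_n \big(s_k(n) - \hat{s}_k(n)\big)\, v_l(n) = 0

% Normal equations, one per harmonic l:
\sum_j g_k^j \sum_n v_l(n)\, v_j(n) = \sum_n \Big(s_k(n) - \sum_j u_j(n)\Big) v_l(n)

% Single harmonic (L = 1):
g_k^l = \frac{\sum_n \big(s_k(n) - u_l(n)\big)\, v_l(n)}{\sum_n v_l(n)^2}
```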


The value obtained by the optimal damping factor extractor 530 is given by Equation 20.
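A sketch of computing the damping factors numerically, assuming the linear amplitude ramp $A^l(n) = A_k^l[1 + (g_k^l - 1)n/N]$ and known phase tracks $\phi_l(n)$ for $n = 0, \dots, N-1$. Because the synthesized signal is then linear in the $g_k^l$, minimizing the MSE is an ordinary linear least-squares problem, solved here jointly for all harmonics with `numpy.linalg.lstsq`. Equation 20 is not reproduced, so this is not claimed to be the patent's formula; the function name `optimal_damping` is hypothetical.

```python
import numpy as np

def optimal_damping(s, amps, phases, N):
    """Least-squares damping factors g (one per harmonic).

    s: original segment, shape (N,); amps: A_k^l, shape (L,);
    phases: phase tracks phi_l(n), shape (L, N). All are assumed inputs.
    """
    n = np.arange(N)
    c = np.cos(phases)                        # (L, N) carrier per harmonic
    u = amps[:, None] * (1.0 - n / N) * c     # part of the ramp not scaled by g
    v = amps[:, None] * (n / N) * c           # part of the ramp scaled by g
    target = s - u.sum(axis=0)                # what the g-scaled terms must explain
    g, *_ = np.linalg.lstsq(v.T, target, rcond=None)
    return g
```

When the original segment exactly follows the assumed model, the true damping factors are recovered; for real speech the result is the MSE-optimal fit under that model.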

The sinusoidal magnitude value updater 550 then applies the damping harmonic magnitude factor $g_k^l$ obtained from Equation 20 to compute, more accurately, the sinusoidal magnitude values that the conventional WE-OLA-LP sinusoidal synthesis method uses as weights. The sinusoidal synthesis method to which the factor $g_k^l$ is applied is defined as the Damp-OLA-LP sinusoidal synthesis method.
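Once $g_k^l$ is known, the magnitudes at the N/4 and 3N/4 points follow directly from the linear ramp, assuming it has the form $A_k^l[1 + (g_k^l - 1)n/N]$: $W_{1/4} = A_k^l(3 + g_k^l)/4$ and $W_{3/4} = A_k^l(1 + 3g_k^l)/4$. By contrast, the plain average of past and current magnitudes, $A_k^l(1 + g_k^l)/2$, assigns the same value to both points. A minimal sketch with a hypothetical function name:

```python
def quarter_point_weights(a_k, g):
    """Return (W_1/4, W_3/4) sampled from the assumed ramp a_k * (1 + (g - 1) * n / N)."""
    w_quarter = a_k * (3.0 + g) / 4.0              # value at n = N/4
    w_three_quarter = a_k * (1.0 + 3.0 * g) / 4.0  # value at n = 3N/4
    return w_quarter, w_three_quarter
```

With $A_k^l = 2.0$ and $g_k^l = 0.5$, this gives $W_{1/4} = 1.75$ and $W_{3/4} = 1.25$, whereas the plain average would assign 1.5 to both.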

As described above, to maintain the continuity of the sinusoidal parameters, the conventional scheme searched for the sinusoidal parameters at the frame boundaries and the midpoint simply to minimize phase discontinuity. The present invention, by contrast, finds the sinusoidal parameters at the frame midpoint while minimizing not only the phase difference but also the difference between the synthesized signals. In this process, the sinusoidal magnitude values are applied as weighting factors when estimating the midpoint phase and the integer $M$. In addition, the Damp-OLA-LP sinusoidal synthesis method exploits the correlation between the original signal and the signal synthesized from the past frame's sinusoidal parameters: by predicting, through repeated analysis and synthesis of the midpoint sinusoidal magnitudes, the optimal ratio of sinusoidal magnitude parameters with respect to the past frame, it minimizes the error in the synthesized waveform and improves the continuity of the synthesized speech signal.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those skilled in the art will understand that various modifications and other equivalent embodiments are possible.

Therefore, the true scope of technical protection of the present invention should be defined by the technical spirit of the appended claims.

The present invention can be applied to techniques for improving the continuity of synthesized speech waveforms.

200: overlap-add sinusoidal synthesis apparatus
210, 300: sinusoidal parameter extraction unit
230, 400: phase level update unit
250, 500: sinusoidal parameter ratio prediction unit
270: synthesized sinusoid modeling unit
310: sinusoidal magnitude and frequency extractor
330: midpoint phase extractor
350: phase error function operator
370: phase and unconstrained integer factor extractor
410: midpoint phase value receiver
430: midpoint phase parameter extractor
450: phase shift factor extractor
470: transform phase determiner
490: phase level updater
510: sinusoidal magnitude predictor
530: optimal damping factor extractor
550: sinusoidal magnitude value updater

Claims

An overlap-add harmonic synthesis method for a sinusoidal speech model that generates a synthesized sinusoid by summing the k-th frame and the (k+1)-th frame of an original speech signal, the method comprising:
extracting sinusoidal parameters at the point of maximum overlap between the k-th and (k+1)-th frames;
a phase level updating step of adjusting the phase level of the synthesized sinusoid so that it approaches the original phase;
a sinusoidal parameter ratio prediction step of predicting the sinusoidal magnitude to be used as a weight by predicting the current frame from the previous frame; and
modeling the synthesized sinusoid using the adjusted phase level and the predicted magnitude,
wherein the phase level updating step comprises: receiving the midpoint phase $\tilde{\theta}$ and extracting the optimal midpoint phase $\theta_{opt}$ through a matching pursuit algorithm; calculating a phase shift factor $\beta$ using the midpoint phase $\tilde{\theta}$ and the optimal phase $\theta_{opt}$; defining a transformed phase $\theta_{SF,l}$ satisfying
$\theta_{SF,l} = \tilde{\theta}_l + 2\pi\beta_l$,
where $\tilde{\theta}_l$ and $\beta_l$ are the midpoint phase and the phase shift factor of the l-th harmonic, respectively; and updating the transformed phase $\theta_{SF,l}$ as the midpoint phase value of the l-th harmonic.

The method of claim 1, wherein the sinusoidal parameter extraction step comprises:
a midpoint magnitude and frequency extraction step of extracting, when the overlap length between the k-th and (k+1)-th frames is N, the magnitude and frequency of the synthesized sinusoid at the midpoint corresponding to N/2 as follows [equation not reproduced], where $A_k$ and $\omega_k$ are the magnitude and frequency in the k-th frame, respectively;
a midpoint phase extraction step of extracting the phase $\tilde{\theta}$ of the synthesized sinusoid at the midpoint;
a weighted phase error function calculation step of computing, when N is divided into four quarters and the errors of the synthesized sinusoid at the two points N/4 and 3N/4 from the k-th frame are $\varepsilon^S_{N/4}(\tilde{\theta})$ and $\varepsilon^S_{3N/4}(\tilde{\theta}, M)$, the weighted phase error function $r(\tilde{\theta}, M)$ as the weighted sum of the squares of $\varepsilon^S_{N/4}(\tilde{\theta})$ and $\varepsilon^S_{3N/4}(\tilde{\theta}, M)$; and
a phase and unconstrained integer factor extraction step of extracting the phase $\tilde{\theta}$ and the unconstrained integer factor $M$ that minimize the discontinuity of the synthesized sinusoid.

The overlap-add harmonic synthesis method for a sinusoidal speech model of claim 2, wherein the phase $\tilde{\theta}$ and the unconstrained integer factor $M$ are selected to satisfy the following relations [equations not reproduced], where X, Y, and Z are each defined as [definitions not reproduced].

(Deleted)

The method of claim 1, wherein the sinusoidal parameter ratio prediction step comprises:
a sinusoidal magnitude prediction step of calculating a damping factor $g_k^l$ that satisfies the following relationship between the magnitudes of the l-th harmonic in the k-th and (k+1)-th frames:
$A_{k+1}^l = g_k^l A_k^l$;
an optimal damping factor extraction step of obtaining the optimal damping factor that minimizes the mean squared error (MSE) between the original speech signal and the synthesized sinusoid; and
a sinusoidal magnitude value update step of updating the sinusoidal magnitude value by applying the optimal damping factor to the magnitude of the synthesized sinusoid.

The overlap-add harmonic synthesis method for a sinusoidal speech model of claim 5, wherein the damping factor $g_k^l$ is selected to satisfy the following relationship [equation not reproduced], where $s_k(n)$ is the original signal [further definitions not reproduced].

An overlap-add harmonic synthesis apparatus for a sinusoidal speech model that generates a synthesized sinusoid by summing the k-th frame and the (k+1)-th frame of an original speech signal, the apparatus comprising:
a sinusoidal parameter extraction unit that extracts sinusoidal parameters at the point of maximum overlap between the k-th and (k+1)-th frames;
a phase level update unit that adjusts the phase level of the synthesized sinusoid so that it approaches the original phase;
a sinusoidal parameter ratio predictor that predicts the sinusoidal magnitude to be used as a weight by predicting the current frame from the previous frame; and
a synthesized sinusoid modeling unit that models the synthesized sinusoid using the adjusted phase level and the predicted magnitude,
wherein the phase level update unit comprises: an optimal phase extractor that receives the midpoint phase $\tilde{\theta}$ and extracts the optimal midpoint phase $\theta_{opt}$ through a matching pursuit algorithm; a phase shift factor extractor that calculates a phase shift factor $\beta$ using the midpoint phase $\tilde{\theta}$ and the optimal phase $\theta_{opt}$; a transform phase determiner that defines a transformed phase $\theta_{SF,l}$ satisfying
$\theta_{SF,l} = \tilde{\theta}_l + 2\pi\beta_l$,
where $\tilde{\theta}_l$ and $\beta_l$ are the midpoint phase and the phase shift factor of the l-th harmonic, respectively; and a phase level updater that updates the transformed phase $\theta_{SF,l}$ as the midpoint phase value of the l-th harmonic.

The apparatus of claim 7, wherein the sinusoidal parameter extraction unit comprises:
a midpoint magnitude and frequency extractor that extracts, when the overlap length between the k-th and (k+1)-th frames is N, the magnitude and frequency of the synthesized sinusoid at the midpoint corresponding to N/2 [equation not reproduced];
a midpoint phase extractor that extracts the phase $\tilde{\theta}$ of the synthesized sinusoid at the midpoint;
a weighted phase error function operator that, when N is divided into four quarters and the errors of the synthesized sinusoid at the two points N/4 and 3N/4 from the k-th frame are $\varepsilon^S_{N/4}(\tilde{\theta})$ and $\varepsilon^S_{3N/4}(\tilde{\theta}, M)$, computes the weighted phase error function $r(\tilde{\theta}, M)$ as the weighted sum of the squares of $\varepsilon^S_{N/4}(\tilde{\theta})$ and $\varepsilon^S_{3N/4}(\tilde{\theta}, M)$; and
a phase and unconstrained integer factor extractor that extracts the phase $\tilde{\theta}$ and the unconstrained integer factor $M$ that minimize the discontinuity of the synthesized sinusoid.

The overlap-add harmonic synthesis apparatus for a sinusoidal speech model of claim 8, wherein the phase $\tilde{\theta}$ and the unconstrained integer factor $M$ are selected to satisfy the following relations [equations not reproduced], where X, Y, and Z are each defined as [definitions not reproduced].

(Deleted)

The apparatus of claim 7, wherein the sinusoidal parameter ratio predictor comprises:
a sinusoidal magnitude predictor that computes a damping factor $g_k^l$ satisfying the following relationship between the magnitudes of the l-th harmonic in the k-th and (k+1)-th frames:
$A_{k+1}^l = g_k^l A_k^l$;
an optimal damping factor extractor that obtains the optimal damping factor minimizing the mean squared error (MSE) between the original speech signal and the synthesized sinusoid; and
a sinusoidal magnitude value updater that updates the sinusoidal magnitude value by applying the optimal damping factor to the magnitude of the synthesized sinusoid.

The overlap-add harmonic synthesis apparatus for a sinusoidal speech model of claim 11, wherein the damping factor $g_k^l$ is selected to satisfy the following relationship [equation not reproduced], where $s_k(n)$ is the original signal [further definitions not reproduced].