KR100416754B1

KR100416754B1 - Apparatus and Method for Parameter Estimation in Multiband Excitation Speech Coder

Info

Publication number: KR100416754B1
Application number: KR1019970026008A
Authority: KR
Inventors: 조용덕; 김홍국; 김무영
Original assignee: 삼성전자주식회사
Priority date: 1997-06-20
Filing date: 1997-06-20
Publication date: 2005-05-24
Also published as: KR19990002399A

Abstract

The present invention relates to a multiband excitation (MBE) vocoder for a speech signal coding apparatus, and in particular to an apparatus and method for parameter estimation in a multiband excitation speech coder that simultaneously estimates the optimal values of the pitch, the band magnitudes, and the per-band voiced/unvoiced decisions when the vocoder's parameters are estimated. The apparatus comprises: a pitch candidate selection unit (10) that defines the pitch using the synthesized spectrum; a harmonic excitation generator (20) that generates a harmonic excitation for a given pitch (τ) from a periodic pulse train and the window spectrum; a voiced band magnitude estimator (30) that obtains the spectral magnitude of the m-th band from the harmonic excitation, assuming all bands are voiced (V); a voiced spectrum synthesizer (40) that synthesizes the voiced spectrum from the voiced band magnitudes and the harmonic excitation spectrum; a random excitation generator (50) that generates an unvoiced spectrum from a random spectrum and the window spectrum; an unvoiced band magnitude estimator (60) that obtains the spectral magnitude of the m-th band from the random excitation, assuming all bands are unvoiced (UV); an unvoiced spectrum synthesizer (70) that synthesizes the unvoiced spectrum from the unvoiced band magnitudes and the random excitation spectrum; a voiced/unvoiced decision unit (80) that, for each band, takes whichever of the voiced and unvoiced spectra differs less from the original spectrum; and a minimization unit (90) that evaluates the candidate parameters in order to find the optimal parameter estimate.

Description

Apparatus and Method for Parameter Estimation in Multiband Excitation Speech Coder

The present invention relates to a multiband excitation vocoder for a speech signal coding apparatus, and more particularly to an apparatus and method for parameter estimation in a multiband excitation speech coder that can simultaneously estimate the optimal values of the pitch, the band magnitudes, and the per-band voiced/unvoiced decisions when the vocoder's parameters are estimated.

In general, a multiband excitation (hereinafter MBE) vocoder represents a speech signal by a pitch, band magnitudes, and a voiced/unvoiced (hereinafter V/UV) decision made for each spectral band.

In the conventional MBE model, the model parameters are estimated sequentially in two stages.

First, the pitch and band magnitudes are estimated in the frequency domain by analysis-by-synthesis (hereinafter AbS) under an assumed voiced speech model.

Then the V/UV decisions are made.

However, the spectrum synthesized under the assumed voiced model suffers large spectral distortion when the speech frame contains a strong unvoiced component, as in transition regions.

In low-bit-rate speech coding, sinusoidal speech coders such as the MBE vocoder [D. W. Griffin and J. S. Lim, "Multiband Excitation Vocoder," IEEE Trans. ASSP, vol. ASSP-36, pp. 1223-1235, Aug. 1988] and the sinusoidal transform coder (hereinafter STC) [R. J. McAulay and T. F. Quatieri, "Speech Analysis/Synthesis Based on a Sinusoidal Representation," IEEE Trans. ASSP, vol. ASSP-34, no. 4, pp. 744-754, Aug. 1986] are widely known to reproduce high-quality speech at low bit rates.

In particular, according to R. V. Cox, "Speech Coding Standards," in W. B. Kleijn and K. K. Paliwal, Speech Coding and Synthesis (Elsevier, 1995), and S. Dimolitsas, "Transmission Performance Evaluation of Voice Encoding Technology for the INMARSAT Mini-M System," International Journal of Satellite Communications, vol. 14, pp. 381-387, 1996, the MBE vocoder was adopted as the standard vocoder for INMARSAT-M of the International Marine Satellite Telecommunication Organization (hereinafter INMARSAT), for APCO/NASTD/Fed Project 25, and for INMARSAT Mini-M.

The MBE vocoder represents a speech signal by the pitch, the band magnitudes, and the V/UV decisions.

The pitch and band magnitudes are estimated by AbS, assuming all bands are voiced, so that the estimation error between the original spectrum and the synthesized spectrum is minimized.

The estimated band magnitudes and pitch are then used for the V/UV decision of each harmonic band, which is made by comparing the spectral estimation error with a precomputed or adaptive threshold.

However, in the MBE model the spectrum generated during analysis-by-synthesis differs from the spectrum generated after decoding at the receiving end.

In the analysis procedure, the speech spectrum is synthesized on the assumption that all bands are voiced.

When many bands are unvoiced, however, the estimation error of the model parameters is large.

As shown in Fig. 1(a), the conventional sequential MBE model consists of: a pitch candidate selection unit (1) that defines the pitch using the synthesized spectrum; a harmonic excitation generator (2) that generates a harmonic excitation for a given pitch (τ) from a periodic pulse train and the window spectrum; a band magnitude estimator (3) that obtains the spectral magnitude of the m-th band from the harmonic excitation and the random excitation, assuming all bands are voiced (V); a voiced spectrum synthesizer (4) that synthesizes the voiced spectrum from the voiced band magnitudes and the harmonic excitation spectrum; and a minimization unit (5) that evaluates the candidate parameters in order to find the optimal parameter estimate.

As shown in Fig. 1(b), the V/UV decision for each band compares the difference between the original spectrum and the synthesized spectrum with a threshold: the band is declared voiced if the difference is below the threshold, and unvoiced otherwise.

In the MBE vocoder, a speech signal is modeled by the pitch (τ), the harmonic band magnitudes {A_m, m = 1, …, M(τ)}, and the V/UV decisions {V_m, m = 1, …, M(τ)}.

Here, M(τ) is the number of harmonics for pitch τ.

The voiced excitation spectrum |E_w(ω)| is assumed to be |E_w(ω)| = |E(ω) * W(ω)|.

Here, E(ω) = δ(ω − mF_s/τ) and W(ω) denote the periodic excitation and the window spectrum, respectively (F_s is the sampling frequency).
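The relation |E_w(ω)| = |E(ω) * W(ω)| can be illustrated numerically. The sketch below is not taken from the patent: it assumes τ is the pitch period in samples, picks a Hamming window for w(n), and uses a plain DFT. Multiplying a pulse train by the window in the time domain corresponds to convolving E(ω) with W(ω) in the frequency domain. The names `hamming`, `dft_magnitude`, and `harmonic_excitation` are hypothetical.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window w(n); the window type is an assumption."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def dft_magnitude(x):
    """Plain O(N^2) DFT magnitude, adequate for a short illustration."""
    n_fft = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft)]


def harmonic_excitation(pitch_period, n_fft=128):
    """|E_w| on a DFT grid for a pulse train of period `pitch_period` samples.

    The pulse train is multiplied by w(n) in time, which is equivalent to
    convolving the line spectrum E with the window spectrum W.
    """
    window = hamming(n_fft)
    pulses = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_fft)]
    return dft_magnitude([p * w for p, w in zip(pulses, window)])


# A 16-sample period on a 128-point grid puts harmonics on bins 0, 8, 16, ...
spectrum = harmonic_excitation(pitch_period=16)
```

In this toy setting the magnitude at a harmonic bin such as `spectrum[8]` is far larger than at the midpoint between harmonics, `spectrum[4]`, which is the comb-like structure the voiced model relies on.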

The band magnitudes and the pitch are estimated by minimizing the AbS error measure ξ(τ), defined as follows.

[The equation is not reproduced in this text.]

Here, a_m and b_m are the upper and lower frequencies of the m-th harmonic band, respectively, w(n) is the window, and the remaining coefficient, whose symbol is missing from this text, corrects the biasing of the error measure due to the pitch.

After the pitch and the harmonic bands have been estimated, the MBE vocoder computes the normalized spectral estimation error (ξ_m) of each band as follows.

[The equation is not reproduced in this text.]

The MBE vocoder then decides the V/UV state of the m-th harmonic band using a threshold (θ), as in Equation 4 below.

[Equation 4 is not reproduced in this text.]
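The threshold rule of the conventional model can be sketched as follows. Because the patent's own formulas are not legible here, the sketch assumes a common form for the normalized band error (squared difference divided by the band energy of the original spectrum); the function names, the toy spectra, and the value `theta=0.2` are all hypothetical.

```python
def band_error(original, synthetic, lo, hi):
    """Normalized squared error over bins lo..hi-1 (assumed form of xi_m)."""
    num = sum((original[k] - synthetic[k]) ** 2 for k in range(lo, hi))
    den = sum(original[k] ** 2 for k in range(lo, hi))
    return num / den if den > 0.0 else 0.0


def threshold_vuv(original, synthetic, bands, theta):
    """Conventional rule: band m is voiced when xi_m is below theta."""
    return [band_error(original, synthetic, lo, hi) < theta
            for lo, hi in bands]


# Toy magnitude spectra: the first band is peaky, the second is flat.
original = [1.0, 4.0, 1.0, 2.0, 2.0, 2.0]
voiced_fit = [1.0, 4.0, 1.0, 0.5, 4.0, 0.5]  # all-voiced synthetic spectrum
bands = [(0, 3), (3, 6)]
decisions = threshold_vuv(original, voiced_fit, bands, theta=0.2)
```

Here the first band matches the voiced fit exactly and is marked voiced, while the flat second band has a normalized error of about 0.71 and is marked unvoiced. The outcome depends directly on the chosen `theta`, which is the sensitivity the description goes on to criticize.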

In practice, for the V/UV decision the Improved MBE (hereinafter IMBE) vocoder uses several thresholds defined in terms of the integer pitch search error, the band energy, and so on.

As Fig. 1 shows, the MBE model analyzes the speech signal on the basis of the voiced model alone.

That is, if the error measure is large, the AbS procedure treats the pitch candidate as inaccurate, and the band is subsequently classified as unvoiced.

However, this model has several problems.

First, unvoiced bands, which have large spectral estimation errors, cause inaccurate pitch estimation.

Second, because the thresholds vary with the signal characteristics, the speaking environment, and so on, it is difficult to obtain the thresholds needed for the V/UV decision procedure.

In IMBE, several thresholds are defined through a pitch smoothing method based on dynamic programming, but this method incurs a long codec delay and a heavy computational load.

Finally, the spectrum ultimately synthesized by AbS differs from the one produced at the decoder.

In the AbS method the encoder synthesizes only the voiced spectrum, whereas the decoder synthesizes both voiced and unvoiced spectra.

The present invention was devised to solve the problems of the prior art described above. Its object is to provide an apparatus and method for parameter estimation in a multiband excitation speech coder that can simultaneously estimate the optimal values of the pitch, the band magnitudes, and the per-band voiced/unvoiced decisions when the parameters of a multiband excitation vocoder are estimated.

To achieve this object, the apparatus of the present invention comprises: a pitch candidate selection unit (10) that defines the pitch using the synthesized spectrum; a harmonic excitation generator (20) that generates a harmonic excitation for a given pitch (τ) from a periodic pulse train and the window spectrum; a voiced band magnitude estimator (30) that obtains the spectral magnitude of the m-th band from the harmonic excitation, assuming all bands are voiced (V); a voiced spectrum synthesizer (40) that synthesizes the voiced spectrum from the voiced band magnitudes and the harmonic excitation spectrum; a random excitation generator (50) that generates an unvoiced spectrum from a random spectrum and the window spectrum; an unvoiced band magnitude estimator (60) that obtains the spectral magnitude of the m-th band from the random excitation, assuming all bands are unvoiced (UV); an unvoiced spectrum synthesizer (70) that synthesizes the unvoiced spectrum from the unvoiced band magnitudes and the random excitation spectrum; a voiced/unvoiced decision unit (80) that, for each band, takes whichever of the voiced and unvoiced spectra differs less from the original spectrum; and a minimization unit (90) that evaluates the candidate parameters in order to find the optimal parameter estimate.

To achieve the same object, the method of the present invention comprises: a pitch candidate selection step of defining the pitch using the synthesized spectrum; a harmonic excitation generation step of generating a harmonic excitation for a given pitch (τ) from a periodic pulse train and the window spectrum; a voiced spectrum synthesis step of synthesizing the voiced spectrum from the voiced band magnitudes and the harmonic excitation spectrum; a random excitation generation step of generating an unvoiced spectrum from a random spectrum and the window spectrum; a band magnitude estimation step of obtaining the spectral magnitude of the m-th band from the harmonic excitation and the random excitation, for the all-voiced (V) case and the all-unvoiced (UV) case respectively; an unvoiced spectrum synthesis step of synthesizing the unvoiced spectrum from the unvoiced band magnitudes and the random excitation spectrum; a voiced/unvoiced decision step of taking, for each band, whichever of the voiced and unvoiced spectra differs less from the original spectrum; and a minimization step of evaluating the candidate parameters in order to find the optimal parameter estimate.

The present invention proposes a new parameter estimation method for the MBE vocoder.

The method estimates the speech signal using not only the voiced spectrum but also the unvoiced spectrum.

Accordingly, first, a new V/UV decision method is proposed.

Second, the spectral estimation error is computed under the voiced-spectrum assumption and under the unvoiced-spectrum assumption.

The two error values are then compared with each other.

The V/UV decision thereby becomes part of the closed AbS loop, so that the spectral estimation error is minimized.

That is, all parameters of the MBE model are estimated jointly.

With this joint estimation method, the model parameters are estimated more accurately and the reproduced speech is of better quality than with the conventional MBE model.

Moreover, because the V/UV decision is made from the voiced and unvoiced spectra, the voicing threshold can be eliminated from the V/UV decision.

The present invention proposes a joint estimation method that estimates and decides all model parameters within the AbS loop.

Accordingly, both a voiced and an unvoiced speech model are used for each band during the analysis-by-synthesis procedure.

After the parameters under the two speech models have been estimated separately, the model that yields the lower spectral estimation error is selected for each band.

Analysis of short-term spectra and long-term spectrograms shows that speech reproduced by the proposed model is superior to that of the conventional model.

The proposed MBE model jointly estimates the pitch, the band magnitudes, and the V/UV decisions within the AbS procedure.

For spectrum estimation in speech analysis, the proposed MBE model uses both voiced and unvoiced spectra, whereas the conventional model uses only the voiced spectrum.

The voiced spectrum is modeled by the band magnitude, the periodic pulse train δ(ω − mF_s/τ), and the window function W(ω) as follows.

[The equation is not reproduced in this text.]

The unvoiced spectrum, on the other hand, is modeled as follows.

[The equation is not reproduced in this text.]

Here, R(ω) is a random spectrum whose expected value is 1.

To measure the accuracy of the voiced and unvoiced models, the spectral estimation errors corresponding to the voiced model and to the unvoiced model are computed from the original spectrum and the synthesized spectrum of the m-th harmonic band as follows.

[The equation is not reproduced in this text.]

If the spectral estimation error of a band under the voiced model is smaller than its error under the unvoiced model, the band is declared voiced.

Otherwise, the band is classified as unvoiced.

That is, the V/UV decision for the m-th spectral band is given by Equation 8 below.

[Equation 8 is not reproduced in this text.]

This model requires no separate threshold for the V/UV decision of each harmonic band.

The magnitude of the m-th band is obtained by minimizing the error of Equation 7, as follows.

[The equation is not reproduced in this text.]
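As a hedged illustration of what such a minimization can look like, assume the per-band error is the squared difference between the original magnitude spectrum, written here as |S_w(ω)|, and a scaled excitation A_m|X_w(ω)|, where X_w stands for either the harmonic or the random excitation. Both symbols and the squared-error form are assumptions, not the patent's notation. Setting the derivative with respect to A_m to zero gives a closed-form least-squares magnitude:

```latex
\begin{aligned}
\xi_m(A_m) &= \int_{b_m}^{a_m} \bigl(|S_w(\omega)| - A_m\,|X_w(\omega)|\bigr)^2 \, d\omega, \\
\frac{\partial \xi_m}{\partial A_m} &= -2 \int_{b_m}^{a_m} |X_w(\omega)|\,\bigl(|S_w(\omega)| - A_m\,|X_w(\omega)|\bigr)\, d\omega = 0, \\
\hat{A}_m &= \frac{\int_{b_m}^{a_m} |S_w(\omega)|\,|X_w(\omega)|\, d\omega}{\int_{b_m}^{a_m} |X_w(\omega)|^2 \, d\omega}.
\end{aligned}
```

The integration limits follow the description's convention that b_m is the lower and a_m the upper band edge. Under this assumption the same expression serves both models: substituting the harmonic excitation yields the voiced magnitude, and substituting the random excitation yields the unvoiced magnitude.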

According to the V/UV decision of each band, the synthesized spectrum of Equation 1 becomes the voiced spectrum if the m-th harmonic band is voiced, and the unvoiced spectrum otherwise.

The pitch is then determined by AbS, as in Equation 2, using this synthesized spectrum.
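The joint procedure described above can be sketched end to end. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a plain squared error with least-squares band magnitudes, omits the pitch-dependent bias correction, and stands in a flat spectrum for the random excitation. The names `fit_band` and `joint_estimate`, the dictionary layout of `candidates`, and all toy values are hypothetical.

```python
def fit_band(original, excitation, lo, hi):
    """Least-squares magnitude and squared error for bins lo..hi-1."""
    num = sum(original[k] * excitation[k] for k in range(lo, hi))
    den = sum(excitation[k] ** 2 for k in range(lo, hi))
    gain = num / den if den > 0.0 else 0.0
    err = sum((original[k] - gain * excitation[k]) ** 2 for k in range(lo, hi))
    return gain, err


def joint_estimate(original, candidates, random_exc):
    """Return (pitch, magnitudes, voiced flags, error) with the lowest error.

    `candidates` maps each pitch candidate to a pair
    (harmonic excitation spectrum, list of (lo, hi) band edges).
    """
    best = None
    for pitch, (harmonic_exc, bands) in candidates.items():
        mags, flags, total = [], [], 0.0
        for lo, hi in bands:
            a_v, e_v = fit_band(original, harmonic_exc, lo, hi)
            a_uv, e_uv = fit_band(original, random_exc, lo, hi)
            voiced = e_v < e_uv  # threshold-free V/UV decision
            mags.append(a_v if voiced else a_uv)
            flags.append(voiced)
            total += min(e_v, e_uv)
        if best is None or total < best[3]:
            best = (pitch, mags, flags, total)
    return best


# Toy spectrum: two peaky (voiced-like) bands followed by two flat ones.
original = [0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
candidates = {
    40: ([0.0, 1.0, 0.0] * 4, [(0, 3), (3, 6), (6, 9), (9, 12)]),
    20: ([0.0, 0.0, 1.0, 0.0, 0.0, 0.0] * 2, [(0, 6), (6, 12)]),
}
random_exc = [1.0] * 12  # flat stand-in for a random spectrum with mean 1
result = joint_estimate(original, candidates, random_exc)
```

Because each band contributes the smaller of its two errors, a flat band no longer penalizes a correct pitch candidate. In the toy data the candidate 40 wins with the first two bands voiced and the last two unvoiced, and no threshold appears anywhere in the decision.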

Fig. 3 shows that the spectrum (d) generated by the mixed MBE optimization model approximates the original spectrum (b) more closely than the spectrum (c) generated by the sequential MBE model.

Spectrograms comparing long-term speech are shown in Fig. 4.

The spectrograms show that the proposed MBE model reproduces harmonic and noise structures that are closer to the original speech.

Informal listening tests also confirmed the superiority of the proposed model.

As described in detail above, the present invention proposes a mixed MBE model that synthesizes speech of higher quality than the conventional MBE model.

Short-term spectra, long-term spectrograms, and informal listening test results were used to demonstrate the improved performance of the proposed MBE model.

It can therefore be seen that the proposed MBE model yields more accurate model parameters than the conventional model.

Fig. 1 is a block diagram of the conventional sequential multiband excitation optimization model;

Fig. 2 is a block diagram of the parameter estimation apparatus in a multiband excitation speech coder according to the present invention;

Fig. 3 is an example comparing short-term speech spectra; and

Fig. 4 is an example comparing long-term speech spectrograms.

*** Explanation of reference numerals for the main parts of the drawings ***

10: pitch candidate selection unit

20: harmonic excitation generator

30: voiced band magnitude estimator

40: voiced spectrum synthesizer

50: random excitation generator

60: unvoiced band magnitude estimator

70: unvoiced spectrum synthesizer

80: voiced/unvoiced decision unit

90: minimization unit

Claims

A parameter estimation apparatus in a multiband excitation speech coder, comprising:

a pitch candidate selection unit (10) that defines a pitch using the synthesized spectrum;

a harmonic excitation generator (20) that generates a harmonic excitation for a given pitch (τ) from a periodic pulse train and the window spectrum;

a voiced band magnitude estimator (30) that obtains the spectral magnitude of the m-th band from the harmonic excitation, assuming all bands are voiced (V);

a voiced spectrum synthesizer (40) that synthesizes the voiced spectrum from the voiced band magnitudes and the harmonic excitation spectrum;

a random excitation generator (50) that generates an unvoiced spectrum from a random spectrum and the window spectrum;

an unvoiced band magnitude estimator (60) that obtains the spectral magnitude of the m-th band from the random excitation, assuming all bands are unvoiced (UV);

an unvoiced spectrum synthesizer (70) that synthesizes the unvoiced spectrum from the unvoiced band magnitudes and the random excitation spectrum;

a voiced/unvoiced decision unit (80) that, for each band, takes whichever of the voiced and unvoiced spectra differs less from the original spectrum; and

a minimization unit (90) that evaluates the candidate parameters for optimal parameter estimation.

The apparatus according to claim 1,

wherein the apparatus is configured to simultaneously estimate the optimal values of the pitch, the band magnitudes, and the voiced/unvoiced decision for each band.

A parameter estimation method in a multiband excitation speech coder, comprising:

a pitch candidate selection step of defining a pitch using the synthesized spectrum;

a harmonic excitation generation step of generating a harmonic excitation for a given pitch (τ) from a periodic pulse train and the window spectrum;

a voiced spectrum synthesis step of synthesizing the voiced spectrum from the voiced band magnitudes and the harmonic excitation spectrum;

a random excitation generation step of generating an unvoiced spectrum from a random spectrum and the window spectrum;

a band magnitude estimation step of obtaining the spectral magnitude of the m-th band from the harmonic excitation and the random excitation, for the all-voiced (V) case and the all-unvoiced (UV) case respectively;

an unvoiced spectrum synthesis step of synthesizing the unvoiced spectrum from the unvoiced band magnitudes and the random excitation spectrum;

a voiced/unvoiced decision step of taking, for each band, whichever of the voiced and unvoiced spectra differs less from the original spectrum; and

a minimization step of evaluating the candidate parameters for optimal parameter estimation.

The method according to claim 3,

wherein the band magnitude estimation step uses the following equation.

[The equation is not reproduced in this text.]

The method according to claim 3,

wherein the voiced spectrum synthesis step uses the following equation,

[The equation is not reproduced in this text.]

where the leading factor, whose symbol is missing from this text, is the band magnitude, δ(ω − mF_s/τ) is the periodic pulse train, and W(ω) is the window function.

The method according to claim 3,

wherein the unvoiced spectrum synthesis step uses the following equation,

[The equation is not reproduced in this text.]

where R(ω) is a random spectrum whose expected value is 1.

The method according to claim 3,

wherein the voiced/unvoiced decision step uses the following equation.

[The equation is not reproduced in this text.]

The method according to claim 3,

wherein the minimization step uses the following equation,

[The equation is not reproduced in this text.]

where a_m and b_m are the upper and lower frequencies of the m-th harmonic band, respectively, w(n) is the window, and the remaining factor, whose symbol is missing from this text, corrects the biasing of the error measure due to the pitch.

The method according to claim 3,

wherein the harmonic excitation generation step generates, for a given pitch, the harmonic excitation from the periodic pulse train spectrum δ(ω − mF_s/τ) and the window spectrum W(ω).

[The equation is not reproduced in this text.]