KR19980051012A

KR19980051012A - A method for changing the speech pitch in the time domain by separating components

Info

Publication number: KR19980051012A
Application number: KR1019960069861A
Authority: KR
Inventors: 김상철; 박충희; 임준석; 배명진
Original assignee: 김정국; 현대중공업 주식회사
Priority date: 1996-12-21
Filing date: 1996-12-21
Publication date: 1998-09-15
Also published as: KR100196387B1

Abstract

The present invention relates to a method for changing speech pitch in speech synthesis. More specifically, it relates to a time-domain speech pitch changing method based on component separation: the speech signal is separated into a wideband component and a narrowband component, the wideband component is held unchanged, time scaling is performed only on the narrowband component, and the scaled narrowband component is added to the held wideband component to obtain a pitch-changed speech signal.

Conventional pitch changing methods in speech synthesis include time-domain, frequency-domain, and time-frequency hybrid methods, but none of them overcomes the spectral distortion that appears when the pitch is changed, so the intelligibility of the synthesized speech degrades.

The present invention provides a speech synthesis technique that changes pitch through the steps of: separating a speech signal into a wideband component and a narrowband component using a variable low-pass filter; holding the separated wideband component; performing time scaling on the separated narrowband component; and adding the time-scaled narrowband component to the held wideband component to construct the synthesized speech.

Description

A method for changing the speech pitch in the time domain by separating components

The present invention relates to speech synthesis and, more particularly, to a method for changing the pitch of synthesized speech.

Digital portable communication devices implement speech coders using various pitch changing theories in order to use the transmission channel bandwidth efficiently and to obtain high sound quality.

In particular, the recently diversified field of voice services demands high-quality synthesized speech.

Most waveform coding and hybrid coding methods used to build synthesized speech store the same word as different data depending on the neighboring words. This not only increases the size of the database but also complicates database design, so pitch changing methods are used to overcome the limitation.

That is, waveform coding reduces redundancy while simply preserving the shape of the waveform; in speech synthesis of this kind, synthesis by analysis is used mainly for high-quality synthesis.

However, because the coding parameters in this case are not separated into an excitation component and a filter component, it is difficult to apply waveform coding to synthesis by rule. Applying synthesis by rule to waveform coding requires a pitch changing technique as part of prosody control.

Conventional pitch changing methods are classified by processing domain into time-domain methods, frequency-domain methods, and time-frequency hybrid methods.

Time-domain methods include the multi-pulse method and the pitch halving method.

Caspers and Atal proposed inserting or deleting zeros in the pulse train of MPLPC (multi-pulse LPC), but because the MPLPC pulse train is correlated with both pitch and formants, the spectral distortion is severe.

Varga and Fallside proposed a pitch extension method using LPC coefficients. When shortening the pitch period, however, the method merely removes part of the waveform and smooths the result, which produces considerable spectral distortion.

The pitch halving method first constructs a waveform whose period is twice the desired pitch period and then halves the period of that waveform.

Because it operates only in the time domain, however, it introduces spectral distortion that lowers the intelligibility of the synthesized speech.

McAulay and Quatieri proposed a phase-preserving pitch changing method in the frequency domain, which extracts the amplitude and phase spectra of the input speech and processes them separately.

For the amplitude spectrum, the prominent spectral peaks are extracted and then interpolated by the pitch change rate (ρ) to change the pitch of the amplitude spectrum.

For the phase spectrum, the phase corresponding to the pitch onset time obtained in the time domain is removed, and the phase of the new pitch onset time after the pitch change is added to form the new phase.

Because this method keeps the shape of the waveform intact, it has the advantage that adjacent frames are very easy to join in the usual frame-by-frame analysis.

On the other hand, the pitch onset time must be supplied separately from the pitch period when the pitch is changed, and because harmonic interpolation is centered on the prominent peaks of the amplitude spectrum, spectral distortion increases.

One time-frequency hybrid method exploits the properties of the cepstrum, changing the pitch by inserting or deleting zeros where the cepstrum values are nearly zero.

This method, too, has difficulty preserving phase.

The time-frequency hybrid method proposed by Takagi and Miyasaka corrects, in the spectral domain and by means of the LPC envelope, the spectral distortion caused by a time-domain pitch change.

Because the LPC spectral envelope has a transfer characteristic weighted toward poles, the method cannot handle all voiced sounds satisfactorily.

The pitch changing methods proposed so far are thus time-domain, frequency-domain, and time-frequency hybrid methods. With the time-domain method, stretching the pitch period of a voiced sound by a factor of r through simple mean-value interpolation lengthens the pitch period naturally, but it also alters the vocal tract characteristics, which lowers the intelligibility of the utterance.

When the pitch period of the standard sentences used for this invention was stretched by mean-value interpolation and the spectral distortion relative to the original speech was measured as a percentage, the distortion grew exponentially with the pitch change rate, and clarity also dropped considerably.

The cause is that the wideband signal is scaled together with the narrowband signal.
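The reason a whole-signal time stretch disturbs the formants can be seen from the standard Fourier scaling property. The note below is only an illustration: treating the speech as a continuous-time signal x(t) with spectrum X(f), and the symbol F_k for a formant frequency, are assumptions introduced here and are not notation from the source.

```latex
y(t) = x\!\left(\frac{t}{r}\right), \quad r > 0
\;\Longrightarrow\;
Y(f) = r\,X(r f),
\qquad\text{so a spectral peak at } F_k \text{ moves to } \frac{F_k}{r}.
```

With r = 2, for example, the pitch period doubles, but every formant frequency is also halved, which is consistent with the changed vocal tract characteristics described above.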

The present invention first separates the speech signal into a narrowband component and a wideband component using a variable low-pass filter. The wideband component is held unchanged, and time scaling is performed only on the narrowband component. The scaled narrowband component and the held wideband component are then added to construct the pitch-changed speech signal. The invention thereby provides a time-domain speech pitch changing method based on component separation.

That is, to increase the intelligibility of synthesized speech, the invention proposes a pitch changing method that separates the narrowband and wideband signals and changes the period of the narrowband signal while leaving the wideband signal unchanged.

FIG. 1 is a block diagram of a speech signal processing system for implementing the present invention.

FIG. 2 is a block diagram of the speech pitch changing method of the present invention.

FIG. 3 is a block diagram of the formant detector used in the present invention.

FIG. 1 shows the configuration of a speech signal processing system for realizing the present invention.

Referring to FIG. 1, the system comprises: a microphone (1) that converts speech into an electrical speech signal; an amplifier (2) that amplifies the speech signal from the microphone (1); a low-pass filter (3) that filters the speech signal amplified by the amplifier (2); an analog-to-digital converter (4) that converts the speech signal output by the low-pass filter (3) into a digital signal; an input port (5) through which the converted digital signal enters the microprocessor for processing; a microprocessor (6) that processes the speech data received through the input port (5); a memory (7) that stores the speech data processed by the microprocessor (6); an input/output port (8) for transmitting the speech data output by the microprocessor (6) to a transmission channel; an output port (9) for the speech data processed by the microprocessor (6); a digital-to-analog converter (10) that converts the speech data from the output port (9) into an analog signal; a low-pass filter (11) that filters the converted analog speech signal; an amplifier (12) that amplifies the speech signal output by the low-pass filter (11); and a speaker (13) that outputs the amplified speech signal as sound in the audible frequency band.

Speech entering the microphone (1) is converted into an electrical signal and amplified by the amplifier (2) with a predetermined gain.

Since only the components that carry the communicative information are needed, the speech signal amplified by the amplifier (2) passes through the low-pass filter (3), which removes frequency components above 4 kHz. The analog-to-digital converter (4) then samples it with an 8 kHz clock and, taking telephone sound quality as the reference, converts it to digital form at a 12-bit quantization level.
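The front-end figures above (4 kHz band limit, 8 kHz sampling clock, 12-bit quantization) can be illustrated with a minimal sketch. The signed integer code range, the input range of [-1.0, 1.0], and the helper names `quantize` and `sample_tone` are assumptions made for this example; the source does not specify the converter's number format.

```python
import math

SAMPLE_RATE_HZ = 8000   # sampling clock stated in the text
BITS = 12               # quantization depth stated in the text

def quantize(x, bits=BITS):
    """Map a sample in [-1.0, 1.0] to a signed integer code (assumed format)."""
    levels = 1 << (bits - 1)                      # 2048 for 12 bits
    code = int(round(x * levels))
    return max(-levels, min(levels - 1, code))    # clip to [-2048, 2047]

def sample_tone(freq_hz, n_samples, rate_hz=SAMPLE_RATE_HZ):
    """Sample a unit-amplitude sine tone and quantize each sample."""
    return [quantize(math.sin(2.0 * math.pi * freq_hz * n / rate_hz))
            for n in range(n_samples)]

# A 1 kHz tone lies well below the 4 kHz band limit (half the sampling rate).
codes = sample_tone(1000.0, 8)
```

An 8 kHz clock can represent frequencies only up to 4 kHz, which is why the low-pass filter (3) removes everything above that limit before conversion.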

The digital speech data enters the microprocessor (6) through the input port (5) and is encoded. The encoded data is either stored in the memory (7) or transmitted to the transmission channel through the input/output port (8).

Meanwhile, a synthesized speech signal is decoded either from processed speech data read out of the memory (7) or from data received through the input/output port (8). To check that it has been processed correctly, it is fed through the output port (9) to the digital-to-analog converter (10).

The digital-to-analog converter (10) converts the digital speech data into an analog speech signal using an 8 kHz clock. The low-pass filter (11) removes the harmonic components so that only the baseband signal remains, and the result is amplified by the amplifier (12) and output to the speaker (13).

FIG. 2 is a block diagram of the pitch changing method used in the speech data processing described above. It comprises a wideband detection means (201) and a low-pass filter (202), which receive the speech signal S(n) and separate it into a wideband component and a narrowband component; a time scaling means (203), which scales the narrowband speech signal S_L(n) filtered by the low-pass filter (202) in the time domain; and an adding means (204), which adds the narrowband speech signal S_L(n') scaled by the time scaling means (203) to the wideband component.

In the band separation means (201)(202), the narrowband signal consists of the first and second formants, and the higher formants are the wideband signal component. Because the formant frequencies of voiced sounds vary, the low-pass filter (202) is a variable low-pass filter with an adjustable cutoff frequency f_T, so that the narrowband and wideband signals can be separated.

The wideband detection means (201) receives the speech signal S(n) and variably controls the cutoff frequency f_T of the variable low-pass filter (202) using two quantities: the high-frequency zero-crossing rate contained in the original speech signal, obtained from the peak-valley rate (PVR), and the zero-crossing rate of the error signal, obtained from the low-pass-filtered error signal. This separates the wideband signal component from the narrowband signal component.

The time scaling means (203) changes the pitch by performing time-domain scaling only on the separated narrowband signal component S_L(n). The adding means (204) then adds the scaled narrowband component S_L(n') to the wideband signal component to reconstruct the final pitch-changed signal.

That is, the pitch period of the narrowband signal is expanded by a factor of r through simple mean-value interpolation, or compressed by a factor of r through decimation. Because the invention focuses on preserving the wideband signal component, a simple time scaling algorithm suffices for the pitch change: the narrowband component is scaled, and the pitch-changed narrowband component and the wideband component are added in the adding means (204) to reconstruct the pitch-changed signal.
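The split, scale, and add flow of FIG. 2 can be sketched as follows. Several pieces are assumptions rather than the patent's design: a first-order filter stands in for the variable low-pass filter (202), the wideband component is taken as the residual S(n) minus the filter output, linear interpolation generalizes the mean-value interpolation and decimation named in the text, and the output is kept at the original frame length because the source does not say how the two components are aligned after scaling. All function names are hypothetical.

```python
import math

def one_pole_lowpass(x, cutoff_hz, rate_hz=8000.0):
    """First-order low-pass filter; a simple stand-in for the variable LPF (202)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / rate_hz)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y

def time_scale(x, r):
    """Read x at positions n / r with linear interpolation, keeping len(x) samples.

    r > 1 stretches the waveform (longer pitch period); r < 1 compresses it,
    and positions past the end of the input are filled with zeros.
    """
    out = []
    last = len(x) - 1
    for n in range(len(x)):
        pos = n / r
        i = int(pos)
        if pos > last:                 # ran past the input (only when r < 1)
            out.append(0.0)
        elif i == last:
            out.append(x[last])
        else:
            frac = pos - i
            out.append((1.0 - frac) * x[i] + frac * x[i + 1])
    return out

def change_pitch(s, r, cutoff_hz):
    """Scale only the narrowband part of s and add back the untouched wideband part."""
    low = one_pole_lowpass(s, cutoff_hz)            # narrowband S_L(n)
    high = [a - b for a, b in zip(s, low)]          # wideband residual, held as-is
    scaled = time_scale(low, r)                     # S_L(n')
    return [l + h for l, h in zip(scaled, high)]    # adding means (204)
```

With r = 1 the narrowband part is unchanged, so the sum reproduces the input; this is a quick sanity check that the separation loses nothing before scaling is applied.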

FIG. 3 shows the configuration of the formant detector used as the band separation means in the pitch changing method of the present invention.

The formant detector comprises: the variable low-pass filter (301), whose cutoff frequency is variably controlled and which low-pass filters the input speech signal S(n); a subtracting means (306), which outputs the error signal e(n) between the filter output S'(n) and the speech signal S(n); a first calculation means (302), which estimates the high-frequency zero-crossing rate Z_SM contained in the original speech from the peak-valley rate (PVR) of the input speech signal S(n); a second calculation means (303), which derives the zero-crossing rate Zi of the wideband formant component of the original speech signal from the zero-crossing rate (ZCR) of the error signal e(n) output by the subtracting means (306); a third calculation means (304), which calculates the point used to determine the cutoff frequency of the variable low-pass filter from the Z_SM output by the first calculation means (302) and the Zi output by the second calculation means (303); and a comparison control means (305), which compares the values calculated by the third calculation means (304) and controls the cutoff frequency f_T of the variable low-pass filter (301).

The formant detector configured in this way operates as follows.

The subtracting means (306) computes the difference between the speech signal S'(n) output by the low-pass filter (301) and the input speech signal S(n) to obtain the error signal e(n).

Because the error signal e(n) is a residual signal made up of high-frequency components, its zero-crossing rate (ZCR) equals the zero-crossing rate Zi of the wideband formant component of the original speech signal.
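The error signal and its zero-crossing rate can be sketched as below. The normalization (sign changes divided by the number of adjacent sample pairs), the sign convention e(n) = S(n) - S'(n), and the function names are assumptions; the source gives no formula for either quantity. The sign of e(n) does not affect the zero-crossing count.

```python
def error_signal(s, s_low):
    """Residual e(n) between the input S(n) and the low-pass output S'(n)."""
    return [a - b for a, b in zip(s, s_low)]

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ (assumed normalization)."""
    if len(x) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(x) - 1)

# A rapidly alternating residual crosses zero at every step.
zi = zero_crossing_rate(error_signal([2.0, 0.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]))
```

A residual dominated by high frequencies changes sign often, so its rate is high; a slowly varying signal yields a rate near zero.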

The second calculation means (303) therefore outputs the zero-crossing rate Zi to the third calculation means (304), while the first calculation means (302) estimates the high-frequency zero-crossing rate Z_SM contained in the original speech from the peak-valley rate (PVR) of the original signal and also passes it to the third calculation means (304).

The peak-valley rate (PVR) of a voiced sound consists of about 70% of the peak-valley rate of the higher-order formants, whereas the PVR of an unvoiced sound consists of the peak-valley rate of the higher-order formants. The cutoff frequency f_T of the variable low-pass filter (301) can therefore be set at the point where Zi = 0.7 × Z_SM.

The third calculation means (304) computes this point and passes it to the comparison control means (305), which compares Zi with 0.7 × Z_SM and controls the cutoff frequency f_T accordingly.

That is, the cutoff frequency f_T is initialized to a predefined low frequency. While Zi is smaller than 0.7 × Z_SM, f_T is increased and the comparison is repeated until Zi exceeds 0.7 × Z_SM. This performs the component separation needed so that the pitch change can be applied to the desired narrowband signal component.
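The search for the cutoff frequency can be sketched as a simple loop. The starting frequency, the step size, the upper bound, and the callable `zi_at` (which is assumed to return the error-signal zero-crossing rate Zi obtained with a given cutoff) are all hypothetical; the source states only that f_T starts at a predefined low frequency and is raised until Zi exceeds 0.7 × Z_SM.

```python
def find_cutoff(zi_at, z_sm, f_start=200.0, f_step=100.0, f_max=4000.0, ratio=0.7):
    """Raise the cutoff f_T until Zi exceeds ratio * Z_SM or the band limit is hit.

    zi_at : callable mapping a cutoff frequency in Hz to the measured Zi
    z_sm  : high-frequency zero-crossing rate estimated from the PVR
    """
    f_t = f_start
    while f_t < f_max and zi_at(f_t) <= ratio * z_sm:
        f_t += f_step                  # Zi still too small: widen the low band
    return min(f_t, f_max)

# Toy stand-in for the measurement: Zi grows linearly with the cutoff.
cutoff = find_cutoff(lambda f: f / 8000.0, z_sm=0.4)
```

Raising the cutoff leaves only higher frequencies in the residual, so Zi tends to grow with f_T and the loop ends; the upper bound guards against inputs for which the threshold is never reached.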

The narrowband signal component obtained in this way is scaled and added to the wideband signal component to reconstruct the final pitch-changed signal.

The pitch changing method of the present invention separates the speech signal into a wideband component and a narrowband component in the time domain, performs time scaling only on the narrowband component while holding the wideband component, and then adds the two to construct the final pitch-changed speech signal.

A synthesized speech signal with better intelligibility than that of the conventional techniques can therefore be obtained.

As a result, when the pitch period was stretched up to 200%, the spectral distortion averaged 5.9%, an improvement of about 3% over the conventional method.

Furthermore, because the invention retains the higher-order formant components of the original speech as they are, there is almost no loss of clarity. Applied to high-quality synthesis, it can be used for phoneme-based synthesis, and it can be applied to good effect in high-quality synthesis systems for voice services.

Claims

1. A method for changing speech pitch in the time domain by separating components, comprising the steps of: separating a speech signal into a wideband component and a narrowband component and holding the wideband component; performing scaling in the time domain on the separated narrowband component; and adding the held wideband component to the scaled narrowband component to construct the final pitch-changed speech signal.

2. The method of claim 1, wherein narrowband components are separated using a variable low-pass filter in the component separation step.

3. The method of claim 2, wherein the cutoff frequency of the variable low-pass filter is set at the point where Zi = 0.7 × Z_SM, determined from the zero-crossing rate Zi of the wideband formant component of the original speech signal and the high-frequency zero-crossing rate Z_SM contained in the original speech.