KR0141167B1 - Nonvoice synthesizing method - Google Patents
Nonvoice synthesizing method
Info
- Publication number
- KR0141167B1 (application KR1019950001576A)
- Authority
- KR
- South Korea
- Prior art keywords
- unvoiced
- band
- unvoiced sound
- sound
- obtaining
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
- G10L19/07—Line spectrum pair [LSP] vocoders
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention discloses a method for synthesizing unvoiced sound in a multi-band excitation coding method. The method comprises: a spectral-envelope step of obtaining the spectral envelope of the unvoiced signal using Equation (1) below,

|H(w)| = G / |1 + Σ_{i=1}^p a_i e^(-jwi)| ----- (1)

(where p is the order of the linear prediction, G is the power, a_i are the linear prediction coefficients, and i indexes the coefficients);

a white-noise step of obtaining the white-noise spectrum N(w); an unvoiced-spectrum step of obtaining the spectrum |H(w)|uv for unvoiced synthesis from the voiced/unvoiced information UV(w), which is at a first level when the frequency band belongs to an unvoiced band and at a second level otherwise, together with N(w) and |H(w)|, using Equation (2) below,

|H(w)|uv = |H(w)| N(w) UV(w) ----- (2)

and an unvoiced-synthesis step of obtaining the time-domain unvoiced signal by taking the inverse Fourier transform of |H(w)|uv. The method improves sound quality when unvoiced sound is reproduced, improves the performance of multi-band excitation speech coders, and, by using linear prediction coefficients to transmit the spectral envelope, allows the speech coder to be implemented at a low bit rate.
Description
FIG. 1 is a flowchart for explaining a conventional unvoiced sound synthesis method.
FIG. 2 is a waveform diagram of the spectrum during unvoiced sound synthesis by the method of FIG. 1.
FIG. 3 is a flowchart for explaining the unvoiced sound synthesis method in the multi-band excitation coding method according to the present invention.
FIG. 4 is a waveform diagram of the spectrum during unvoiced sound synthesis by the method of FIG. 3.
The present invention relates to a speech coding method and, more particularly, to a method for synthesizing unvoiced sound in a multi-band excitation coding method.

Speech coding, which reduces the amount of information by removing redundancy from the speech signal, increases transmission efficiency when the speech signal is transmitted and reduces the storage capacity required when speech information is stored. Speech coding methods can be broadly classified into waveform coding, source coding, and hybrid coding, which combines the two.

Multi-Band Excitation (MBE) coding, one of the source coding methods, uses a multi-band excitation speech production model: unlike conventional speech production models, which classify an entire speech segment as either voiced or unvoiced, it divides the frequency range of a speech segment into several bands and classifies each band individually as voiced or unvoiced. Source coding methods include Linear Predictive Coding (LPC), PARtial CORrelation (PARCOR), Line Spectrum Pairs (LSP), formant coding, and MBE. Among the coding methods currently under study is the Improved Multi-Band Excitation (IMBE) vocoder, which can deliver high-quality synthesized speech even at bit rates below 4 kbps and is known to require less computation than vector excitation coding.

In its analysis part, IMBE represents the speech signal by the amplitude at each harmonic frequency, by the voiced/unvoiced information of frequency bands each spanning several harmonics, and by the pitch.

In its synthesis part, using the extracted information, a band to be synthesized that is voiced is rendered as periodic signals built from trigonometric functions, while a band that is unvoiced is rendered as a noise signal at the representative magnitude of the band.
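As a rough sketch of the voiced-band half of this synthesis (the sampling rate, pitch, harmonic amplitudes, and frame length below are invented example values; the patent itself concerns the unvoiced half):

```python
import numpy as np

fs = 8000.0                       # sampling rate in Hz (example value)
pitch = 100.0                     # fundamental frequency in Hz (example value)
amps = [1.0, 0.6, 0.3]            # harmonic amplitudes from the analysis part (invented)
t = np.arange(256) / fs           # one synthesis frame on the time axis

# A voiced band is synthesized as a periodic signal: a sum of
# trigonometric functions at the pitch harmonics.
voiced = sum(A * np.cos(2 * np.pi * pitch * (k + 1) * t)
             for k, A in enumerate(amps))
```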
FIG. 1 is a flowchart for explaining a conventional unvoiced sound synthesis method.

FIG. 2 shows spectral waveforms for unvoiced synthesis by the method of FIG. 1: (a) shows the original spectrum and (b) shows the spectrum of the reproduced unvoiced sound. The U/V (unvoiced/voiced) decision indicates whether the signal in each frequency band is unvoiced or voiced.

Hereinafter, a conventional unvoiced sound synthesis method will be described with reference to FIGS. 1 and 2.

Conventionally, to synthesize unvoiced sound, white noise with a uniform distribution is generated (step 200). The noise spectrum is then obtained by applying a Fourier transform to the white noise generated in step 200 (step 202).

After step 202, the unvoiced spectrum to be reproduced is obtained from the noise spectrum using the voiced/unvoiced information and the per-band magnitude information (step 204), and an inverse Fourier transform is applied to produce the synthesized sound on the time axis (step 206).
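Steps 200 to 206 of the conventional method can be sketched as follows (a minimal illustration, not the patented implementation; the frame length, band layout, representative magnitudes, and voiced/unvoiced flags are invented example values):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256  # frame length (assumed)

# Step 200: white noise with a uniform distribution
noise = rng.uniform(-1.0, 1.0, N)

# Step 202: noise spectrum via the Fourier transform
noise_spec = np.fft.rfft(noise)

# Step 204: apply the per-band representative magnitude to each unvoiced band.
# Example values only: 4 bands, with bands 1 and 3 unvoiced.
band_edges = np.linspace(0, len(noise_spec), 5, dtype=int)
band_mag = [0.0, 2.0, 0.0, 0.5]   # representative magnitude per band (invented)
uv = [0, 1, 0, 1]                 # 1 = unvoiced band (invented)
spec = np.zeros_like(noise_spec)
for b in range(4):
    lo, hi = band_edges[b], band_edges[b + 1]
    if uv[b]:
        # the magnitude is held constant across the whole band:
        # this is exactly the flat envelope criticized in FIG. 2
        spec[lo:hi] = band_mag[b] * noise_spec[lo:hi] / np.abs(noise_spec[lo:hi])

# Step 206: inverse Fourier transform back to the time axis
synth = np.fft.irfft(spec, n=N)
```

Note how, inside an unvoiced band, every frequency bin ends up with the same magnitude, which is the flat-envelope behavior the invention sets out to fix.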
As shown in FIG. 2, the spectral envelope of the original sound varies within the unvoiced band, but the envelope of the reproduced spectrum remains constant over that band, so the original unvoiced sound is not faithfully reproduced. That is, spectral distortion occurs when unvoiced sound is synthesized.

To solve this problem, an object of the present invention is to provide an unvoiced sound synthesis method in a multi-band excitation coding method that generates unvoiced sound using the envelope information of each band, instead of reproducing the unvoiced sound at the representative magnitude of the band.
To achieve the above object, the unvoiced sound synthesis method in the multi-band excitation coding method according to the present invention comprises: a spectral-envelope step of obtaining the spectral envelope of the unvoiced signal using Equation (1) below,

|H(w)| = G / |1 + Σ_{i=1}^p a_i e^(-jwi)| ----- (1)

(where p is the order of the linear prediction, G is the power, a_i are the linear prediction coefficients, and i indexes the coefficients);

a white-noise step of obtaining the white-noise spectrum N(w); an unvoiced-spectrum step of obtaining the spectrum |H(w)|uv for unvoiced synthesis from the voiced/unvoiced information UV(w), which is at a first level when the frequency band belongs to an unvoiced band and at a second level otherwise, together with N(w) and |H(w)|, using Equation (2) below,

|H(w)|uv = |H(w)| N(w) UV(w) ----- (2)

and an unvoiced-synthesis step of obtaining the unvoiced signal in the time domain by taking the inverse Fourier transform of |H(w)|uv.
FIG. 3 is a flowchart for explaining the unvoiced sound synthesis method in the multi-band excitation coding method according to the present invention.

FIG. 4 shows spectral waveforms for unvoiced synthesis by the method of FIG. 3: (a') shows the spectrum of the original unvoiced signal and (b') shows the spectrum of the reproduced unvoiced signal; the U/V decision is as described for FIG. 2.

Hereinafter, the unvoiced sound synthesis method in the multi-band excitation coding method according to the present invention will be described with reference to FIGS. 3 and 4.
Let (a1, a2, ..., ap) be the p-th order linear prediction coefficients of the speech waveform and let G be the power. The spectral envelope for generating the unvoiced signal is obtained by

|H(w)| = G / |1 + Σ_{i=1}^p a_i e^(-jwi)| ----- (1)

(step 400). After step 400, white noise with a uniform distribution is generated and its Fourier transform is taken to obtain the white-noise spectrum N(w) (step 402). After step 402, the value of UV(w), a function representing the voiced/unvoiced information, is determined.
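As an illustrative sketch of step 400 (the frequency grid, the second-order coefficient values, and the sign convention of the denominator are assumptions for this example, not values from the patent), Equation (1) can be evaluated with NumPy:

```python
import numpy as np

def lpc_envelope(a, G, n_freq=129):
    """Spectral envelope |H(w)| = G / |1 + sum_i a_i e^(-jwi)| of Equation (1)."""
    a = np.asarray(a, dtype=float)
    w = np.linspace(0.0, np.pi, n_freq)   # frequency grid over half the spectrum
    i = np.arange(1, len(a) + 1)          # coefficient indices 1..p
    # denominator A(w) = 1 + sum_{i=1}^{p} a_i e^(-jwi)
    A = 1.0 + np.exp(-1j * np.outer(w, i)) @ a.astype(complex)
    return G / np.abs(A)

# Example with made-up 2nd-order coefficients and unit power
env = lpc_envelope(a=[-0.9, 0.4], G=1.0)
```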
If the frequency falls within an unvoiced band, UV(w) = '1'; if it does not, UV(w) = '0'.
The spectrum for unvoiced synthesis is then obtained (step 404) by

|H(w)|uv = |H(w)| N(w) UV(w)

where |H(w)|uv denotes the spectrum of the synthesized unvoiced sound.

After step 404, letting Suv denote the unvoiced signal on the time axis, Suv is obtained by taking the inverse Fourier transform of |H(w)|uv (step 406).
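Putting steps 400 to 406 together, a hedged end-to-end sketch (the frame length, the first-order LPC coefficient, and the band layout are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256
n_bins = N // 2 + 1

# Step 400: spectral envelope |H(w)| per Equation (1), here with a toy
# 1st-order coefficient a1 = -0.5 and power G = 1.
w = np.linspace(0.0, np.pi, n_bins)
H = 1.0 / np.abs(1.0 - 0.5 * np.exp(-1j * w))

# Step 402: white-noise spectrum N(w) from uniformly distributed noise
noise_spec = np.fft.rfft(rng.uniform(-1.0, 1.0, N))

# UV(w): 1 where the frequency falls in an unvoiced band, else 0.
# Example layout only: treat the upper half of the spectrum as unvoiced.
UV = np.zeros(n_bins)
UV[n_bins // 2:] = 1.0

# Step 404, Equation (2): |H(w)|uv = |H(w)| N(w) UV(w)
H_uv = H * noise_spec * UV

# Step 406: inverse Fourier transform -> time-domain unvoiced signal Suv
Suv = np.fft.irfft(H_uv, n=N)
```

Unlike the conventional sketch, the envelope H now shapes the noise inside the unvoiced band rather than holding a constant magnitude, which is the improvement FIG. 4 shows over FIG. 2.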
As shown in FIG. 4, the spectrum of the speech signal produced by the unvoiced sound synthesis method of the present invention follows the original speech spectrum, unlike the conventional method illustrated in FIG. 2.

As described above, the unvoiced sound synthesis method in the multi-band excitation coding method according to the present invention improves sound quality when unvoiced sound is reproduced, improves the performance of multi-band excitation speech coders, and, by using linear prediction coefficients to transmit the spectral envelope, allows the speech coder to be implemented at a low bit rate.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1019950001576A KR0141167B1 (en) | 1995-01-27 | 1995-01-27 | Nonvoice synthesizing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1019950001576A KR0141167B1 (en) | 1995-01-27 | 1995-01-27 | Nonvoice synthesizing method |
Publications (2)
Publication Number | Publication Date |
---|---|
KR960030559A KR960030559A (en) | 1996-08-17 |
KR0141167B1 true KR0141167B1 (en) | 1998-07-15 |
Family
ID=19407412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1019950001576A KR0141167B1 (en) | 1995-01-27 | 1995-01-27 | Nonvoice synthesizing method |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR0141167B1 (en) |
- 1995-01-27: KR application KR1019950001576A filed; granted as KR0141167B1 (status: not active, IP right cessation)
Also Published As
Publication number | Publication date |
---|---|
KR960030559A (en) | 1996-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4843124B2 (en) | Codec and method for encoding and decoding audio signals | |
US10249313B2 (en) | Adaptive bandwidth extension and apparatus for the same | |
US7792672B2 (en) | Method and system for the quick conversion of a voice signal | |
JP3446764B2 (en) | Speech synthesis system and speech synthesis server | |
KR19980028284A (en) | Method and apparatus for reproducing voice signal, method and apparatus for voice decoding, method and apparatus for voice synthesis and portable wireless terminal apparatus | |
WO2003010752A1 (en) | Speech bandwidth extension apparatus and speech bandwidth extension method | |
Yang | Low bit rate speech coding | |
EP1163662B1 (en) | Method of determining the voicing probability of speech signals | |
JPH10124089A (en) | Processor and method for speech signal processing and device and method for expanding voice bandwidth | |
KR0141167B1 (en) | Nonvoice synthesizing method | |
JP3264679B2 (en) | Code-excited linear prediction encoding device and decoding device | |
Dankberg et al. | Development of a 4.8-9.6 kbps RELP Vocoder | |
KR0155798B1 (en) | Vocoder and the method thereof | |
JP4287840B2 (en) | Encoder | |
JP3510168B2 (en) | Audio encoding method and audio decoding method | |
Sercov et al. | An improved speech model with allowance for time-varying pitch harmonic amplitudes and frequencies in low bit-rate MBE coders. | |
Garcia-Mateo et al. | Multi-band vector excitation coding of speech at 4.8 kbps | |
JP2853170B2 (en) | Audio encoding / decoding system | |
Yang et al. | Pitch synchronous multi-band (PSMB) speech coding | |
Chiu et al. | Quad‐band excitation for low bit rate speech coding | |
KR0156983B1 (en) | Voice coder | |
Kang et al. | Phase adjustment in waveform interpolation | |
Polotti et al. | Sound modeling by means of harmonic-band wavelets: new results and experiments | |
Teague | An enhanced multiband excitation speech coder at 2,400 b/s | |
JPH08160993A (en) | Sound analysis-synthesizer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant | ||
FPAY | Annual fee payment | | Payment date: 20080115; year of fee payment: 11 |
LAPS | Lapse due to unpaid annual fee | | |