WO2017125840A1 - Method for analysis and synthesis of aperiodic signals - Google Patents
- Publication number
- WO2017125840A1 (PCT/IB2017/050208)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Definitions
- Step A112 perform Short-Time Fourier Transform (STFT) on the extracted aperiodic component signal and obtain a series of magnitude spectra of the extracted aperiodic component signal.
- Step A113 for each magnitude spectrum in the series of magnitude spectra obtained in step A112, calculate the corresponding spectral envelope.
- the result should be a series of spectral envelopes.
- the preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
- Step A114 for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the extracted aperiodic component signal to remove the portion of the aperiodic signal outside of the frequency band.
- Step A115 extract the time-domain envelope of the band-pass filtered signal obtained in step A114.
- the preferred method for time-domain envelope extraction is to low-pass filter the absolute value of the signal.
- Step A116 store the analysis results, including the series of spectral envelopes obtained in step A113, one or a plurality of time-domain envelopes obtained in step A115, the pitch contour obtained in step A102 and the series of harmonic amplitudes and harmonic phases obtained in step A104.
- Step M101 optionally, modify the analysis results. For example, to shift the pitch up, multiply the pitch contour by a constant, adjust the amplitudes of the harmonics at each time instant accordingly, and resample the time-domain envelopes for each frequency band describing the aperiodic component accordingly.
- Step S101 generate a plurality of sinusoids with time-varying amplitude and time-varying frequency according to the series of harmonic amplitudes and harmonic phases obtained in step A104 or step M101 and the pitch contour obtained in step A102 or step M101.
- Step S112 generate a white noise signal with the same duration as the time-domain envelopes obtained in step A115 or step M101.
- Step S113 for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the white noise signal generated in step S112 to remove the portion of the noise signal outside of the frequency band.
- Step S114 for each of the frequency bands designated by the predetermined array of frequency values, multiply the band-pass filtered noise signal obtained in step S113, corresponding to the frequency band, by the time-domain envelope corresponding to the frequency band.
- Step S115 obtain the noise excitation signal by computing the sum of the modulated signals obtained in step S114.
- Step S116 perform STFT on the noise excitation signal and obtain a series of complex spectra of the noise excitation signal. For each complex spectrum, calculate the corresponding magnitude spectrum.
- the preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
- Step S117 for each magnitude spectrum in the series of magnitude spectra obtained in step S116, calculate the corresponding spectral envelope.
- the result should be a series of spectral envelopes.
- Step S118 inverse filter the series of complex spectra obtained in step S116 by the series of spectral envelopes obtained in step S117.
- the inverse filtering can be implemented as dividing each complex spectrum by the corresponding spectral envelope.
- the result should be a series of complex spectra.
- Step S119 filter the series of inverse filtered complex spectra obtained in step S118 by the series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal.
- the filtering can be implemented as multiplying each complex spectrum by the corresponding spectral envelope.
- the result should be a series of complex spectra.
- Step S120 perform inverse STFT on the series of complex spectra obtained in step S119.
- the resulting time-domain signal is the synthesized aperiodic signal.
- Step S121 calculate the sum of the synthesized periodic signal obtained in step S101 and the synthesized aperiodic signal obtained in step S120. Send the resulting signal to a sound output device such as a speaker.
Abstract
The present invention is a method for analysis and synthesis of the aperiodic component in speech signals. The analysis stage involves spectral envelope estimation and decomposition of the input aperiodic component into a plurality of band-pass filtered signals, from which time-domain envelopes are extracted. The synthesis stage involves multi-band time-domain modulation and spectral modification on a white noise signal. The present invention preserves both temporal and spectral characteristics of the aperiodic component when applied to speech signals.
Description
Method for Analysis and Synthesis of Aperiodic Signals
FIELD OF THE INVENTION
This invention relates to a method for analysis and synthesis of aperiodic signals, in particular the analysis and synthesis of the aperiodic component in speech signals.
DESCRIPTION OF THE PRIOR ART
Various speech processing technologies have been proposed that decompose a speech signal into periodic and aperiodic components in the analysis stage and recombine the two components in the synthesis stage. In some of the literature, the periodic component is referred to as the deterministic component and the aperiodic component as the stochastic component or noise component.
For example, U.S. Patent No. 5029509 discloses a well-known Spectral Modeling Synthesis (SMS) technology in which the periodic component is represented as a series of sinusoids and the aperiodic component is represented as a series of magnitude spectral envelopes. According to Childers, Donald G., and C. K. Lee. "Vocal quality factors: Analysis, synthesis, and perception." The Journal of the Acoustical Society of America 90.5 (1991): 2394-2410, the time-domain envelope of the speech aperiodic component was found to be related to the periodic component. Further, the use of a square-wave-modulated white noise excitation signal synchronized to the periodic excitation in a formant synthesizer was found to produce a more natural-sounding voice.
In many applications it is desirable to preserve both the temporal and the frequency-domain characteristics of the aperiodic signal during processing, modification, or parametrization. One such attempt, for the purpose of pitch shifting, is described in Mehta, Daryush, and Thomas F. Quatieri. "Synthesis, analysis, and pitch modification of the breathy vowel." Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on. IEEE, 2005. There, the speech signal is first decomposed into a periodic component and an aperiodic component, and the periodic component is pitch-shifted using a sinusoidal model. The pitch shifting of the aperiodic component involves whitening the aperiodic signal, estimating the time-domain envelope of the whitened signal, demodulating the whitened signal by the estimated time-domain envelope, resampling the time-domain envelope, re-modulating the demodulated signal by the resampled time-domain envelope, and finally spectrally coloring the re-modulated signal with the spectrum of the original aperiodic signal. However, the short-time energy of the resynthesized aperiodic signal is not guaranteed to match that of the original aperiodic signal, because resampling the time-domain envelope of the whitened aperiodic signal may introduce an energy distortion.
A similar technology is described in Pantazis, Yannis, and Stylianou, Yannis, "Improving the modeling of the noise part in the harmonic plus noise model of speech." International Conference on Acoustics, Speech, and Signal Processing (2008), in which the synthesis stage involves first coloring the white noise signal and then modulating the colored signal by a time-domain envelope. However, the time-domain modulation introduces a frequency-domain distortion that blurs the spectrum of the aperiodic signal, resulting in a degradation in naturalness of the synthesized speech.
Another problem of conventional technologies is the over-simplified assumption that the noise excitation receives the same modulation in all frequency channels. In fact, noise is produced not only near the glottis but also in the vocal tract during phonation, which implies that the aperiodic component exhibits different time-domain characteristics in different frequency regions. For example, the time-domain envelope of the aperiodic component extracted from a sustained vowel in the 0-5kHz band has very weak periodicity, while the time-domain envelope of the aperiodic component in the 9-12kHz band has stronger periodicity, as shown in Fig. 1.
SUMMARY OF THE INVENTION
The present invention is a method for analysis and synthesis of stochastic signals with quasi-periodic time-domain envelopes. The method is primarily designed for high-quality speech processing applications, for example, speech synthesis and audio production.
The present invention, in its analysis stage, involves the following steps. (1) Estimate one or a plurality of spectral envelopes from the input aperiodic signal. (2) Band-pass filter the input aperiodic signal for each designated frequency band. (3) Extract a time-domain envelope from each band-pass filtered signal. (4) Store the analysis results.
The present invention, in its synthesis stage, involves the following steps. (1) Generate a white noise signal. (2) Band-pass filter the white noise signal for each designated frequency band. (3) Modulate each band-pass filtered signal with the corresponding input time-domain envelope. (4) Obtain the full-band excitation signal as the sum of the modulated band-limited signals. (5) Whiten the excitation signal by inverse filtering it with its own spectral envelope. (6) Filter the whitened excitation signal with the input spectral envelope.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is the plot of an example of an aperiodic signal extracted from a sustained vowel sound /a/ filtered by a 0-5kHz band-pass filter and a 9-12kHz band-pass filter, respectively. The time-domain envelopes of the signal are also plotted in the figure.
Fig. 2 is a flow chart showing the analysis process of this invention.
Fig. 3 is a flow chart showing the synthesis process of this invention.
Fig. 4 is a flow chart showing the analysis process of a speech processing application involving this invention.
Fig. 5 is a flow chart showing the synthesis process of a speech processing application involving this invention.
Fig. 6 is the plot of an example of a magnitude spectrum of the aperiodic signal and its spectral envelope obtained by cepstral smoothing.
DETAILED DESCRIPTION OF THE INVENTION
As shown in Fig. 2, the analysis stage of the present invention consists of the following steps.
Step A001, obtain the input aperiodic signal and a predetermined array of one or a plurality of frequency values designating the frequency bands for modeling the time-domain characteristics of the aperiodic signal.
Step A002, perform Short-Time Fourier Transform (STFT) on the input aperiodic signal and obtain a series of magnitude spectra of the input aperiodic signal.
Step A003, for each magnitude spectrum in the series of magnitude spectra obtained in step A002, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum. An example of a magnitude spectrum obtained by STFT in step A002 and the corresponding spectral envelope obtained by truncating the cepstrum is shown in Fig. 6.
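The cepstral smoothing described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `cepstral_envelope`, the default order of 30, and the small floor that avoids taking the logarithm of zero are not specified in this description.

```python
import numpy as np

def cepstral_envelope(magnitude, order=30, floor=1e-12):
    """Smooth a one-sided magnitude spectrum by truncating its real cepstrum.

    magnitude: bins 0..N/2 of an N-point FFT (length N/2 + 1).
    order: highest quefrency index kept (hypothetical default).
    """
    n_fft = 2 * (len(magnitude) - 1)
    order = min(order, n_fft // 2)
    # Magnitude spectrum -> log spectrum -> real cepstrum.
    log_mag = np.log(np.maximum(magnitude, floor))
    cepstrum = np.fft.irfft(log_mag, n=n_fft)
    # Truncate: keep quefrencies 0..order and their mirrored counterparts.
    lifter = np.zeros(n_fft)
    lifter[:order + 1] = 1.0
    lifter[n_fft - order:] = 1.0
    # Truncated cepstrum -> smoothed log spectrum -> spectral envelope.
    smoothed_log = np.fft.rfft(cepstrum * lifter).real
    return np.exp(smoothed_log)
```

A lower order gives a smoother envelope; a higher order follows the fine structure of the magnitude spectrum more closely.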
Step A004, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the input aperiodic signal to remove the portion of the aperiodic signal outside of the frequency band.
Step A005, extract the time-domain envelope of the band-pass filtered signal obtained in step A004.
The preferred method for time-domain envelope extraction is to low-pass filter the absolute value of the band-pass filtered signal.
Step A006, store the analysis results, including the series of spectral envelopes obtained in step A003 and one or a plurality of time-domain envelopes obtained in step A005.
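Steps A004 and A005 can be sketched as follows. The description does not fix a filter design, so this sketch assumes a brick-wall FFT band-pass filter and a moving-average low-pass filter; it also assumes that the predetermined array of frequency values lists band edges in ascending order. All function and parameter names are hypothetical.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Zero all FFT bins outside [f_lo, f_hi) (brick-wall filter, an assumption)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def time_envelope(band_signal, fs, smooth_ms=1.0):
    """Low-pass filter |x| with a moving average of about smooth_ms milliseconds."""
    width = max(1, int(round(fs * smooth_ms / 1000.0)))
    kernel = np.ones(width) / width
    return np.convolve(np.abs(band_signal), kernel, mode="same")

def analyze_bands(x, fs, band_edges):
    """Return one time-domain envelope per band (rows), steps A004 and A005.

    band_edges: ascending frequencies in Hz; consecutive pairs define the bands.
    """
    envelopes = []
    for f_lo, f_hi in zip(band_edges[:-1], band_edges[1:]):
        band = bandpass_fft(x, fs, f_lo, f_hi)        # step A004
        envelopes.append(time_envelope(band, fs))     # step A005
    return np.array(envelopes)
```

The smoothing length controls how much of the quasi-periodic structure of the envelope survives; 1 ms is an arbitrary choice for illustration.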
As shown in Fig. 3, the synthesis stage of the present invention consists of the following steps.
Step S001, obtain a series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal, one or a plurality of time-domain envelopes describing the time-domain characteristics of the aperiodic signal, and a predetermined array of one or a plurality of frequency values designating the frequency bands for modeling the time-domain characteristics of the aperiodic signal.
Step S002, generate a white noise signal with the same duration as the input time-domain envelopes.
Step S003, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the white noise signal generated in step S002 to remove the portion of the noise signal outside of the frequency band.
Step S004, for each of the frequency bands designated by the predetermined array of frequency values, multiply the band-pass filtered noise signal obtained in step S003, corresponding to the frequency band, by the time-domain envelope corresponding to the frequency band.
Step S005, calculate the sum of the modulated signals obtained in step S004. The result will be denoted as the noise excitation signal in the rest of this description.
Because the time-domain modulation in step S004 changes the energy of the band-pass filtered signal in each frequency band, the resulting noise excitation signal becomes colored. The spectral envelope of the noise excitation signal should therefore be taken into account in the noise coloring procedure that follows, described in steps S007 to S009.
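Steps S002 to S005 can be sketched as below. As an assumption for illustration, the band-pass filter is a brick-wall FFT filter, the frequency array is read as ascending band edges, and the envelopes are stored as one row per band; none of these choices is prescribed by this description.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Zero all FFT bins outside [f_lo, f_hi) (brick-wall filter, an assumption)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def noise_excitation(envelopes, fs, band_edges, seed=0):
    """Build the multi-band modulated noise excitation signal.

    envelopes: array of shape (n_bands, n_samples), one envelope per band.
    band_edges: ascending frequencies in Hz, n_bands + 1 values.
    """
    rng = np.random.default_rng(seed)
    n_samples = envelopes.shape[1]
    white = rng.standard_normal(n_samples)                  # step S002
    excitation = np.zeros(n_samples)
    bands = zip(band_edges[:-1], band_edges[1:])
    for k, (f_lo, f_hi) in enumerate(bands):
        band_noise = bandpass_fft(white, fs, f_lo, f_hi)    # step S003
        excitation += band_noise * envelopes[k]             # steps S004, S005
    return excitation
```

With all envelopes equal to one and bands that cover the whole spectrum, the excitation reduces to the original white noise; any other envelopes change the per-band energy, which is the coloring that the later whitening step removes.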
Step S006, perform STFT on the noise excitation signal and obtain a series of complex spectra of the noise excitation signal. For each complex spectrum, calculate the corresponding magnitude spectrum.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
Step S007, for each magnitude spectrum in the series of magnitude spectra obtained in step S006, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
Step S008, inverse filter the series of complex spectra obtained in step S006 by the series of spectral envelopes obtained in step S007. The inverse filtering can be implemented as dividing each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
Step S009, filter the series of inverse filtered complex spectra obtained in step S008 by the series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal. The filtering can be implemented as multiplying each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
Step S010, perform inverse STFT on the series of complex spectra obtained in step S009. The resulting time-domain signal is the synthesized aperiodic signal.
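Steps S006 to S010 amount to computing, for each frame, Y = X / E_x * E_a, where X is the complex spectrum of the noise excitation, E_x its own spectral envelope, and E_a the stored envelope of the aperiodic signal. The sketch below is one possible reading, assuming a Hann window, a hop of a quarter window, cepstral smoothing for E_x, and target envelopes computed on the same frame grid; these settings and all names are illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT; the signal is zero-padded by n_fft on both sides."""
    window = np.hanning(n_fft)
    xp = np.concatenate([np.zeros(n_fft), x, np.zeros(n_fft)])
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spectra, length, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft(), trimmed to `length` samples."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spectra, n=n_fft, axis=1)
    out = np.zeros((len(frames) - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out[n_fft:n_fft + length]

def cepstral_envelope(magnitude, order=30, floor=1e-12):
    """Row-wise spectral envelope by cepstrum truncation."""
    n_fft = 2 * (magnitude.shape[-1] - 1)
    log_mag = np.log(np.maximum(magnitude, floor))
    cepstrum = np.fft.irfft(log_mag, n=n_fft, axis=-1)
    lifter = np.zeros(n_fft)
    lifter[:order + 1] = 1.0
    lifter[n_fft - order:] = 1.0
    return np.exp(np.fft.rfft(cepstrum * lifter, axis=-1).real)

def recolor(excitation, target_envelopes, n_fft=512, hop=128, order=30):
    """Whiten the excitation by its own envelope, then apply the target one."""
    spectra = stft(excitation, n_fft, hop)                   # step S006
    own_env = cepstral_envelope(np.abs(spectra), order)      # step S007
    whitened = spectra / own_env                             # step S008
    colored = whitened * target_envelopes                    # step S009
    return istft(colored, len(excitation), n_fft, hop)       # step S010
```

`target_envelopes` must have the same shape as the STFT of the excitation, that is, one envelope per frame with n_fft / 2 + 1 bins. Because the envelopes are real and positive, the division and multiplication change only the magnitudes and leave the noise phases untouched.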
The following describes the implementation of an exemplary speech processing application involving this invention, as shown in Fig. 4 and Fig. 5.
Step A101, receive a speech signal from a sound input device, such as a microphone.
Step A102, extract the pitch contour from the input speech signal. The extracted pitch contour is an array of frequency values corresponding to the fundamental frequency of the speech signal at a series of time instants, spaced at a fixed time interval (around 5 milliseconds). If the speech is unvoiced at a certain time instant, then the frequency value corresponding to the time instant is set to zero.
The preferred pitch extraction method is the YIN algorithm described in De Cheveigne, Alain, and Kawahara, Hideki, "YIN, a fundamental frequency estimator for speech and music." Journal of the Acoustical Society of America 111.4 (2002): 1917-1930.
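To make the shape of the pitch contour concrete, the following is a deliberately simplified estimator in the spirit of YIN. It keeps only the difference function, the cumulative mean normalized difference, and the absolute threshold, and omits the parabolic interpolation and best-local-estimate steps of the published algorithm. The search range, the threshold of 0.1, the frame length, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def yin_frame_f0(frame, fs, f_min=60.0, f_max=500.0, threshold=0.1):
    """Return an f0 estimate in Hz for one frame, or 0.0 if judged unvoiced."""
    tau_min = int(fs / f_max)
    tau_max = int(fs / f_min)
    w = len(frame) - tau_max              # integration window length
    if w <= 0 or np.sum(frame ** 2) < 1e-12:
        return 0.0
    # Difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Cumulative mean normalized difference; its value at lag 0 is set to 1.
    cmnd = np.ones(tau_max + 1)
    running_sum = np.cumsum(d[1:])
    lags = np.arange(1, tau_max + 1)
    cmnd[1:] = d[1:] * lags / np.maximum(running_sum, 1e-12)
    # Absolute threshold: take the first dip below the threshold and
    # walk down to its local minimum.
    tau = tau_min
    while tau < tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau
        tau += 1
    return 0.0

def pitch_contour(x, fs, hop_s=0.005, frame_len=1024):
    """One f0 value every hop_s seconds; 0.0 marks unvoiced frames."""
    hop = int(round(hop_s * fs))
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([yin_frame_f0(x[s:s + frame_len], fs) for s in starts])
```

Without the interpolation step the estimate is quantized to integer lags, so the resolution degrades at high fundamental frequencies.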
Step A103, perform STFT analysis on the input speech signal at the same series of time instants at which the pitch contour is extracted, and obtain a series of complex spectra of the input speech signal. The preferred window for the STFT analysis is a Blackman window. The window length is preferably time-varying, equal to twice the length of a period of the speech signal around the analysis time instant.
Step A104, for each complex spectrum in the series of complex spectra obtained in step A103, calculate the log magnitude spectrum and pick the spectral peaks around each harmonic frequency calculated as the integer multiple of the fundamental frequency at the corresponding time instant. Perform parabolic interpolation at the spectral peaks to obtain a refined estimation of the harmonic amplitudes and harmonic frequencies. Perform linear interpolation at the refined harmonic frequencies on the unwrapped phase spectrum calculated from the complex spectrum to obtain a refined estimation of the phase of the harmonics. Normalize the estimated harmonic amplitudes by dividing the amplitudes by half the sum of the analysis window for the STFT analysis in step A103.
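The peak picking, parabolic interpolation and amplitude normalization of step A104 can be sketched for a single harmonic as follows. This is an illustrative sketch under stated assumptions: the function name `harmonic_peak`, the zero-padded FFT size `n_fft` and the search range of half a fundamental on each side of the nominal harmonic are choices made for the example, and the phase estimation is omitted. The example frame is also longer than the two-period window preferred in step A103, which keeps the single-sinusoid check simple.

```python
import numpy as np

def harmonic_peak(frame, fs, f0, k, n_fft=4096):
    """Estimate frequency (Hz) and amplitude of the k-th harmonic in one frame."""
    w = np.blackman(len(frame))
    spec = np.fft.rfft(np.asarray(frame, dtype=float) * w, n_fft)
    log_mag = np.log(np.abs(spec) + 1e-12)

    # Search for the largest bin within half a fundamental of k * f0.
    lo = max(1, int(np.floor((k - 0.5) * f0 * n_fft / fs)))
    hi = min(len(log_mag) - 2, int(np.ceil((k + 0.5) * f0 * n_fft / fs)))
    p = lo + int(np.argmax(log_mag[lo:hi + 1]))

    # Parabolic interpolation through the peak bin and its two neighbours.
    a, b, c = log_mag[p - 1], log_mag[p], log_mag[p + 1]
    denom = a - 2.0 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0.0 else 0.0
    peak_log = b - 0.25 * (a - c) * offset

    freq = (p + offset) * fs / n_fft
    # Normalize by half the sum of the analysis window.
    amp = np.exp(peak_log) / (0.5 * w.sum())
    return freq, amp

fs = 16000
t = np.arange(1024) / fs
frame = 0.5 * np.cos(2 * np.pi * 200.0 * t + 0.3)
freq, amp = harmonic_peak(frame, fs, f0=200.0, k=1)
```

Dividing by half the window sum works because a real sinusoid of amplitude A produces a spectral peak of height A times half the sum of the window samples.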
Step A105, generate a plurality of sinusoids with time-varying amplitude and time-varying frequency according to the series of harmonic amplitudes and harmonic phases obtained in step A104 and the pitch contour obtained in step A102. Calculate the sum of the sinusoids. In the rest of this description the sum of the sinusoids is denoted as the extracted periodic component.
Step A106, subtract the extracted periodic component from the input speech signal. The resulting signal is denoted as the extracted aperiodic component in the rest of this description.
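Steps A105 and A106 can be illustrated with a small additive synthesizer followed by a subtraction. The sketch below assumes frame-rate parameters are linearly interpolated to the sample rate and that each harmonic carries a single initial phase, which is a simplification of the per-frame harmonic phases described above. The function name and argument layout are hypothetical.

```python
import numpy as np

def synth_harmonics(f0, amps, phases0, hop, fs, n_samples):
    """Sum of harmonics. f0: (frames,), amps: (frames, K), phases0: (K,)."""
    f0 = np.asarray(f0, dtype=float)
    amps = np.asarray(amps, dtype=float)
    frame_t = np.arange(len(f0)) * hop / fs      # analysis time instants (s)
    t = np.arange(n_samples) / fs                # sample time instants (s)

    # Interpolate the pitch contour and integrate it to get the phase.
    f0_s = np.interp(t, frame_t, f0)
    phase = 2.0 * np.pi * (np.cumsum(f0_s) - f0_s[0]) / fs
    # Fade harmonics out where the contour marks the speech as unvoiced (f0 = 0).
    voiced = np.interp(t, frame_t, (f0 > 0).astype(float))

    y = np.zeros(n_samples)
    for k in range(amps.shape[1]):
        a_k = np.interp(t, frame_t, amps[:, k]) * voiced
        y += a_k * np.cos((k + 1) * phase + phases0[k])
    return y

fs, hop, n = 16000, 80, 1600                     # 5 ms hop at 16 kHz
f0 = np.full(21, 200.0)
amps = np.full((21, 1), 0.5)
periodic = synth_harmonics(f0, amps, np.zeros(1), hop, fs, n)

rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(n)
speech = periodic + noise                        # toy "input speech"
aperiodic = speech - periodic                    # step A106
```

In this toy case the residual is exactly the noise that was added, which is the behaviour step A106 aims for when the harmonic estimates are accurate.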
Step A112, perform Short-Time Fourier Transform (STFT) on the extracted aperiodic component signal and obtain a series of magnitude spectra of the extracted aperiodic component signal. The length of the window for the STFT analysis is around 10 milliseconds.
Step A113, for each magnitude spectrum in the series of magnitude spectra obtained in step A112, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
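The cepstral smoothing described above can be sketched in a few lines of NumPy. The sketch assumes the magnitude spectrum is the one-sided output of a real FFT of even length; the function name `cepstral_envelope` and the small floor applied before the logarithm are assumptions made for the example.

```python
import numpy as np

def cepstral_envelope(mag, order):
    """Smooth a one-sided magnitude spectrum by truncating its real cepstrum."""
    log_mag = np.log(np.maximum(np.asarray(mag, dtype=float), 1e-12))
    cep = np.fft.irfft(log_mag)          # real, symmetric cepstrum of even length

    # Keep quefrencies 0..order and their mirrored counterparts, zero the rest.
    lifter = np.zeros_like(cep)
    lifter[:order + 1] = 1.0
    if order > 0:
        lifter[-order:] = 1.0

    # Back to the log spectrum, then to a linear magnitude envelope.
    return np.exp(np.fft.rfft(cep * lifter).real)

rng = np.random.default_rng(0)
mag = np.abs(np.fft.rfft(rng.standard_normal(512)))   # 257 bins
env = cepstral_envelope(mag, order=20)
```

A lower `order` gives a smoother envelope; keeping every cepstral coefficient reproduces the original magnitude spectrum.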
Step A114, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the extracted aperiodic component signal to remove the portion of the aperiodic signal outside of the frequency band.
Step A115, extract the time-domain envelope of the band-pass filtered signal obtained in step A114.
The preferred method for time-domain envelope extraction is to low-pass filter the absolute value of the signal.
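Steps A114 and A115 can be sketched as a band-pass filter followed by rectification and low-pass filtering. The description does not specify the filters, so the sketch below uses an FFT-domain brick-wall mask as the band-pass filter and a moving average as the low-pass filter; both are stand-ins chosen for brevity, and the function names and the 10 millisecond smoothing length are assumptions.

```python
import numpy as np

def band_pass(x, fs, lo, hi):
    """Keep only the frequency content in [lo, hi) Hz using an FFT mask."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def time_envelope(x, fs, smooth_ms=10.0):
    """Low-pass filter the absolute value of x with a moving average."""
    n = max(1, int(round(smooth_ms * 1e-3 * fs)))
    kernel = np.ones(n) / n
    return np.convolve(np.abs(x), kernel, mode="same")

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000.0 * t) + np.sin(2 * np.pi * 3000.0 * t)
band = band_pass(x, fs, 500.0, 2000.0)    # keeps only the 1 kHz component
env = time_envelope(band, fs)
```

Repeating the two calls for every pair of adjacent values in the predetermined array of frequency values yields one time-domain envelope per band.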
Step A116, store the analysis results, including the series of spectral envelopes obtained in step A113, one or a plurality of time-domain envelopes obtained in step A115, the pitch contour obtained in step A102 and the series of harmonic amplitudes and harmonic phases obtained in step A104.
Step M101, optionally, modify the analysis results. For example, multiply the pitch contour by a constant, accordingly adjust the amplitudes of the harmonics at each time instant and accordingly resample the time-domain envelopes for each frequency band describing the aperiodic component to shift up the pitch.
Step S101, generate a plurality of sinusoids with time-varying amplitude and time-varying frequency according to the series of harmonic amplitudes and harmonic phases obtained in step A104 or step M101 and the pitch contour obtained in step A102 or step M101. Calculate the sum of the sinusoids. In the rest of this description the sum of the sinusoids is denoted as the synthesized periodic signal.
Step S112, generate a white noise signal with the same duration as the time-domain envelopes obtained in step A115 or step M101.
Step S113, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the white noise signal generated in step S112 to remove the portion of the noise signal outside of the frequency band.
Step S114, for each of the frequency bands designated by the predetermined array of frequency values, multiply the band-pass filtered noise signal obtained in step S113, corresponding to the frequency band, by the time-domain envelope corresponding to the frequency band.
Step S115, obtain the noise excitation signal by computing the sum of the modulated signals obtained in step S114.
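Steps S112 to S115 can be sketched as follows. The sketch assumes Gaussian white noise, non-overlapping bands defined by consecutive entries of a band-edge array, and an FFT-domain brick-wall mask as the band-pass filter; none of these choices is prescribed by the description, and the function name is hypothetical.

```python
import numpy as np

def noise_excitation(envelopes, band_edges, fs, rng):
    """Build a noise excitation from per-band time-domain envelopes.

    envelopes:  array of shape (bands, samples)
    band_edges: increasing frequencies in Hz, length bands + 1
    """
    envelopes = np.asarray(envelopes, dtype=float)
    n = envelopes.shape[1]
    noise = rng.standard_normal(n)                 # step S112
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    excitation = np.zeros(n)
    for b in range(len(band_edges) - 1):
        mask = (freqs >= band_edges[b]) & (freqs < band_edges[b + 1])
        band = np.fft.irfft(spec * mask, n=n)      # step S113
        excitation += band * envelopes[b]          # steps S114 and S115
    return excitation

fs, n = 16000, 1024
edges = [0.0, 2000.0, 4000.0, 8001.0]              # last edge lies above Nyquist
flat = noise_excitation(np.ones((3, n)), edges, fs, np.random.default_rng(0))
silent = noise_excitation(np.zeros((3, n)), edges, fs, np.random.default_rng(0))
```

With all envelopes equal to one and bands that tile the whole spectrum, the band signals sum back to the original noise, which is a convenient sanity check for the band layout.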
Step S116, perform STFT on the noise excitation signal and obtain a series of complex spectra of the noise excitation signal. For each complex spectrum, calculate the corresponding magnitude spectrum.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
Step S117, for each magnitude spectrum in the series of magnitude spectra obtained in step S116, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
Step S118, inverse filter the series of complex spectra obtained in step S116 by the series of spectral envelopes obtained in step S117. The inverse filtering can be implemented as dividing each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
Step S119, filter the series of inverse filtered complex spectra obtained in step S118 by the series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal. The filtering can be implemented as multiplying each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
Step S120, perform inverse STFT on the series of complex spectra obtained in step S119. The resulting time-domain signal is the synthesized aperiodic signal.
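The STFT of step S116 and the inverse STFT of step S120 can be sketched with plain NumPy using weighted overlap-add. The Hann window, the 160-sample window (10 milliseconds at an assumed 16 kHz sampling rate) and the 40-sample hop are assumptions chosen for the example and are not taken from the description.

```python
import numpy as np

def _window(n_win):
    # Periodic Hann window.
    return np.hanning(n_win + 1)[:-1]

def stft(x, n_win=160, hop=40):
    """Return complex spectra of shape (frames, n_win // 2 + 1)."""
    w = _window(n_win)
    n_frames = 1 + (len(x) - n_win) // hop
    frames = np.stack([x[i * hop:i * hop + n_win] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spectra, n_win=160, hop=40):
    """Weighted overlap-add resynthesis of a time-domain signal."""
    w = _window(n_win)
    frames = np.fft.irfft(spectra, n=n_win, axis=1)
    n = n_win + hop * (len(frames) - 1)
    y = np.zeros(n)
    norm = np.zeros(n)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_win] += frame * w
        norm[i * hop:i * hop + n_win] += w ** 2
    # Divide by the accumulated squared window to undo the double windowing.
    return y / np.maximum(norm, 1e-12)

rng = np.random.default_rng(0)
x = rng.standard_normal(1600)
spectra = stft(x)
y = istft(spectra)
```

If the spectra are left unmodified, the interior of the signal is reconstructed exactly; in the synthesis procedure the spectra would be inverse filtered and filtered between the two calls.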
Step S121, calculate the sum of the synthesized periodic signal obtained in step S101 and the synthesized aperiodic signal obtained in step S120. Send the resulting signal to a sound output device such as a speaker.
Claims
1. A speech processing method that extracts information from an aperiodic signal, consisting of the following steps:
estimate one or a plurality of spectral envelopes from the input aperiodic signal;
band-pass filter the input aperiodic signal for each designated frequency band;
extract time-domain envelope from each band-pass filtered signal;
store the analysis results comprising one or a plurality of spectral envelopes and a time-domain envelope for each designated frequency band.
2. A speech processing method that generates an aperiodic signal from spectral and time-domain envelopes, consisting of the following steps:
generate a white noise signal;
band-pass filter the white noise signal for each designated frequency band;
modulate the band-pass filtered signal with input time-domain envelope;
obtain the full-band excitation signal as the summation of modulated band-limited signals;
whiten the excitation signal by inverse-filtering by its spectral envelope;
filter the whitened excitation signal by one or a plurality of input spectral envelopes.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662280572P | 2016-01-19 | 2016-01-19 | |
US62/280,572 | 2016-01-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017125840A1 (en) | 2017-07-27 |
Family
ID=59362144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2017/050208 WO2017125840A1 (en) | 2016-01-19 | 2017-01-15 | Method for analysis and synthesis of aperiodic signals |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2017125840A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101051460A (en) * | 2006-04-05 | 2007-10-10 | 三星电子株式会社 | Speech signal pre-processing system and method of extracting characteristic information of speech signal |
CN101123088A (en) * | 2007-09-03 | 2008-02-13 | 北京中星微电子有限公司 | A chorus special effect processing method and system |
CN102208186A (en) * | 2011-05-16 | 2011-10-05 | 南宁向明信息科技有限责任公司 | Chinese phonetic recognition method |
WO2015166694A1 (en) * | 2014-05-01 | 2015-11-05 | 日本電信電話株式会社 | Periodic-combined-envelope-sequence generation device, periodic-combined-envelope-sequence generation method, periodic-combined-envelope-sequence generation program, and recording medium |
WO2015188627A1 (en) * | 2014-06-12 | 2015-12-17 | 华为技术有限公司 | Method, device and encoder of processing temporal envelope of audio signal |
2017-01-15: WO PCT/IB2017/050208, patent WO2017125840A1 (en), active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5958866B2 (en) | Spectral envelope and group delay estimation system and speech signal synthesis system for speech analysis and synthesis | |
US8255222B2 (en) | Speech separating apparatus, speech synthesizing apparatus, and voice quality conversion apparatus | |
EP1612770B1 (en) | Voice processing apparatus and program | |
Mowlaee et al. | Interspeech 2014 special session: Phase importance in speech processing applications | |
JP2009042716A (en) | Cyclic signal processing method, cyclic signal conversion method, cyclic signal processing apparatus, and cyclic signal analysis method | |
Cabral et al. | Glottal spectral separation for parametric speech synthesis | |
JP6347536B2 (en) | Sound synthesis method and sound synthesizer | |
JP2018077283A (en) | Speech synthesis method | |
JP2005531990A5 (en) | ||
Rao et al. | Voice conversion by prosody and vocal tract modification | |
JPH04358200A (en) | Speech synthesizer | |
CN102231275B (en) | Embedded speech synthesis method based on weighted mixed excitation | |
Babacan et al. | Parametric representation for singing voice synthesis: A comparative evaluation | |
JP6831767B2 (en) | Speech recognition methods, devices and programs | |
WO2017125840A1 (en) | Method for analysis and synthesis of aperiodic signals | |
Morise | Modification of velvet noise for speech waveform generation by using vocoder-based speech synthesizer | |
Arakawa et al. | High quality voice manipulation method based on the vocal tract area function obtained from sub-band LSP of STRAIGHT spectrum | |
Arroabarren et al. | Instantaneous frequency and amplitude of vibrato in singing voice | |
US10586526B2 (en) | Speech analysis and synthesis method based on harmonic model and source-vocal tract decomposition | |
JPH07261798A (en) | Voice analyzing and synthesizing device | |
Kawahara et al. | Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation | |
JP5745453B2 (en) | Voice clarity conversion device, voice clarity conversion method and program thereof | |
Morise et al. | High-quality waveform generator from fundamental frequency, spectral envelope, and band aperiodicity | |
Fulop et al. | The Reassigned Spectrogram | |
JPH02134699A (en) | Voice analyzing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17741154; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 17741154; Country of ref document: EP; Kind code of ref document: A1 |