WO2017125840A1 - Method for analysis and synthesis of aperiodic signals - Google Patents


Info

Publication number
WO2017125840A1
Authority
WO
WIPO (PCT)
Application number
PCT/IB2017/050208
Other languages
French (fr)
Inventor
Kanru HUA
Original Assignee
Hua Kanru
Application filed by Hua Kanru

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Definitions

  • Step S101 generate a plurality of sinusoids with time-varying amplitude and time-varying frequency according to the series of harmonic amplitudes and harmonic phases obtained in step A104 or step M101 and the pitch contour obtained in step A102 or step M101.
  • Step S112 generate a white noise signal with the same duration as the time-domain envelopes obtained in step A115 or step M101.
  • Step S113 for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the white noise signal generated in step S112 to remove the portion of the noise signal outside of the frequency band.
  • Step S114 for each of the frequency bands designated by the predetermined array of frequency values, multiply the band-pass filtered noise signal obtained in step S113, corresponding to the frequency band, by the time-domain envelope corresponding to the frequency band.
  • Step S115 obtain the noise excitation signal by computing the sum of the modulated signals obtained in step S114.
  • Step S116 perform STFT on the noise excitation signal and obtain a series of complex spectra of the noise excitation signal. For each complex spectrum, calculate the corresponding magnitude spectrum.
  • the preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
  • Step S117 for each magnitude spectrum in the series of magnitude spectra obtained in step S116, calculate the corresponding spectral envelope.
  • the result should be a series of spectral envelopes.
  • Step S118 inverse filter the series of complex spectra obtained in step S116 by the series of spectral envelopes obtained in step S117.
  • the inverse filtering can be implemented as dividing each complex spectrum by the corresponding spectral envelope.
  • the result should be a series of complex spectra.
  • Step S119 filter the series of inverse filtered complex spectra obtained in step S118 by the series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal.
  • the filtering can be implemented as multiplying each complex spectrum by the corresponding spectral envelope.
  • the result should be a series of complex spectra.
  • Step S120 perform inverse STFT on the series of complex spectra obtained in step S119.
  • the resulting time-domain signal is the synthesized aperiodic signal.
  • Step S121 calculate the sum of the synthesized periodic signal obtained in step S101 and the synthesized aperiodic signal obtained in step S120. Send the resulting signal to a sound output device such as a speaker.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention is a method for analysis and synthesis of the aperiodic component in speech signals. The analysis stage involves spectral envelope estimation and decomposition of the input aperiodic component into a plurality of band-pass filtered signals, from which time-domain envelopes are extracted. The synthesis stage involves multi-band time-domain modulation and spectral modification on a white noise signal. The present invention preserves both temporal and spectral characteristics of the aperiodic component when applied to speech signals.

Description

Method for Analysis and Synthesis of Aperiodic Signals
FIELD OF THE INVENTION
This invention relates to a method for analysis and synthesis of aperiodic signals, in particular the analysis and synthesis of the aperiodic component in speech signals.
DESCRIPTION OF THE PRIOR ART
Various speech processing technologies have been proposed that decompose a speech signal into periodic and aperiodic components in the analysis stage and recombine the two components in the synthesis stage. In some literature, the periodic component is referred to as the deterministic component, and the aperiodic component is referred to as the stochastic component or noise component.
For example, U.S. Patent No. 5029509 discloses a well-known Spectral Modeling Synthesis (SMS) technology in which the periodic component is represented as a series of sinusoids and the aperiodic component is represented as a series of magnitude spectral envelopes. According to Childers, Donald G., and C. K. Lee, "Vocal quality factors: Analysis, synthesis, and perception," The Journal of the Acoustical Society of America 90.5 (1991): 2394-2410, the time-domain envelope of the aperiodic component of speech was found to be related to the periodic component. Further, using a square-wave-modulated white noise excitation signal synchronized to the periodic excitation in a formant synthesizer was found to produce a more natural-sounding voice.
In many applications it is desirable to preserve both the temporal and the frequency-domain characteristics of the aperiodic signal during processing, modification, or parametrization. One such attempt, for the purpose of pitch shifting, is described in Mehta, Daryush, and Thomas F. Quatieri, "Synthesis, analysis, and pitch modification of the breathy vowel," Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on. IEEE, 2005. There, the speech signal is first decomposed into a periodic component and an aperiodic component, and the periodic component is then pitch-shifted using a sinusoidal model. Pitch shifting the aperiodic component involves whitening the aperiodic signal, estimating the time-domain envelope of the whitened signal, demodulating the whitened signal by the estimated time-domain envelope, resampling the time-domain envelope, re-modulating the demodulated signal by the resampled time-domain envelope, and finally coloring the re-modulated signal with the spectrum of the original aperiodic signal. However, the short-time energy of the resynthesized aperiodic signal is not guaranteed to match that of the original aperiodic signal, because resampling the time-domain envelope of the whitened aperiodic signal may introduce energy distortion.
A similar technology is described in Pantazis, Yannis, and Stylianou, Yannis, "Improving the modeling of the noise part in the harmonic plus noise model of speech," International Conference on Acoustics, Speech, and Signal Processing (2008), in which the synthesis stage first colors a white noise signal and then modulates the colored signal by a time-domain envelope. However, the time-domain modulation introduces a frequency-domain distortion that blurs the spectrum of the aperiodic signal, degrading the naturalness of the synthesized speech.
Another problem of conventional technologies is the oversimplified assumption that the noise excitation receives the same modulation in all frequency channels. In fact, during phonation noise is produced not only near the glottis but also in the vocal tract, which implies that the aperiodic component exhibits different time-domain characteristics in different frequency regions. For example, as shown in Fig. 1, the time-domain envelope of the aperiodic component extracted from a sustained vowel has very weak periodicity in the 0-5kHz band, while in the 9-12kHz band it has stronger periodicity.
SUMMARY OF THE INVENTION
The present invention is a method for analysis and synthesis of stochastic signals with quasi-periodic time-domain envelopes. The method is primarily designed for high-quality speech processing applications, for example, speech synthesis and audio production.
The present invention, in its analysis stage, involves the following steps. (1) Estimate one or a plurality of spectral envelopes from the input aperiodic signal. (2) Band-pass filter the input aperiodic signal for each designated frequency band. (3) Extract time-domain envelope from each band-pass filtered signal. (4) Store the analysis results.
The present invention, in its synthesis stage, involves the following steps. (1) Generate a white noise signal. (2) Band-pass filter the white noise signal for each designated frequency band. (3) Modulate each band-pass filtered signal with the corresponding input time-domain envelope. (4) Obtain the full-band excitation signal as the sum of the modulated band-limited signals. (5) Whiten the excitation signal by inverse filtering it with its own spectral envelope. (6) Filter the whitened excitation signal with the input spectral envelope.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is the plot of an example of an aperiodic signal extracted from a sustained vowel sound /a/ filtered by a 0-5kHz band-pass filter and a 9-12kHz band-pass filter, respectively. The time-domain envelopes of the signal are also plotted in the figure.
Fig. 2 is a flow chart showing the analysis process of this invention.
Fig. 3 is a flow chart showing the synthesis process of this invention.
Fig. 4 is a flow chart showing the analysis process of a speech processing application involving this invention.
Fig. 5 is a flow chart showing the synthesis process of a speech processing application involving this invention.
Fig. 6 is the plot of an example of a magnitude spectrum of the aperiodic signal and its spectral envelope obtained by cepstral smoothing.
DETAILED DESCRIPTION OF THE INVENTION
As shown in Fig. 2, the analysis stage of the present invention consists of the following steps.
Step A001, obtain the input aperiodic signal and a predetermined array of one or a plurality of frequency values designating the frequency bands for modeling the time-domain characteristics of the aperiodic signal.
Step A002, perform Short-Time Fourier Transform (STFT) on the input aperiodic signal and obtain a series of magnitude spectra of the input aperiodic signal.
Step A003, for each magnitude spectrum in the series of magnitude spectra obtained in step A002, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum. An example of a magnitude spectrum obtained by STFT in step A002 and the corresponding spectral envelope obtained by truncating the cepstrum is shown in Fig. 6.
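The cepstral smoothing described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the inventor's implementation: the function name `cepstral_envelope`, the log floor of 1e-12, and the use of a one-sided `numpy.fft.rfft` magnitude spectrum are assumptions, and the "designated order" is left to the caller.

```python
import numpy as np

def cepstral_envelope(magnitude, order):
    """Spectral envelope of one-sided magnitude spectra by cepstral truncation.

    magnitude: array of shape (..., n_fft // 2 + 1), e.g. np.abs(np.fft.rfft(frame)).
    order: number of cepstral coefficients kept (hypothetical caller's choice).
    """
    magnitude = np.asarray(magnitude, dtype=float)
    n_fft = 2 * (magnitude.shape[-1] - 1)
    # Magnitude spectrum -> log spectrum (floored to avoid log(0)).
    log_mag = np.log(np.maximum(magnitude, 1e-12))
    # Log spectrum -> real cepstrum (real and symmetric since log_mag is real).
    cepstrum = np.fft.irfft(log_mag, n_fft, axis=-1)
    # Truncate: keep quefrencies 0..order-1 and their mirror images, zero the rest.
    lifter = np.zeros(n_fft)
    lifter[:order] = 1.0
    lifter[n_fft - order + 1:] = 1.0
    # Cepstrum -> smoothed log spectrum -> envelope on the linear scale.
    smoothed_log = np.fft.rfft(cepstrum * lifter, axis=-1).real
    return np.exp(smoothed_log)

# Example: envelope of a Hann-windowed noise frame (512-point FFT, order 30).
rng = np.random.default_rng(0)
frame = rng.standard_normal(512) * np.hanning(512)
envelope = cepstral_envelope(np.abs(np.fft.rfft(frame)), order=30)
```

A lower order yields a smoother envelope; keeping all `n_fft // 2 + 1` coefficients returns the input magnitude unchanged.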
Step A004, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the input aperiodic signal to remove the portion of the aperiodic signal outside of the frequency band.
Step A005, extract the time-domain envelope of the band-pass filtered signal obtained in step A004.
The preferred method for time-domain envelope extraction is to low-pass filter the absolute value of the band-pass filtered signal.
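Steps A004 and A005 can be sketched as below. The patent does not specify the filter designs, so this sketch assumes a zero-phase FFT-masking band-pass filter and a normalized Hann-window moving average as the low-pass filter applied to the absolute value. It also assumes, hypothetically, that the predetermined array of frequency values lists consecutive band edges in Hz. The names `bandpass_fft`, `time_envelope`, and `analyze_bands` and the 2 ms smoothing length are illustrative.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Zero-phase band-pass filter: zero every FFT bin outside [f_lo, f_hi)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, len(x))

def time_envelope(x, fs, smooth_ms=2.0):
    """Low-pass filter |x| with a normalized Hann-window moving average."""
    n = max(3, int(round(fs * smooth_ms / 1000.0)))
    kernel = np.hanning(n)
    kernel /= kernel.sum()
    return np.convolve(np.abs(x), kernel, mode="same")

def analyze_bands(x, fs, band_edges):
    """Steps A004-A005: one time-domain envelope per band between consecutive edges."""
    return [time_envelope(bandpass_fft(x, fs, lo, hi), fs)
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]

# Example: bands 0-5 kHz, 5-9 kHz and 9-12 kHz of a 32 kHz noise signal.
rng = np.random.default_rng(0)
noise = rng.standard_normal(32000)
envelopes = analyze_bands(noise, 32000, [0, 5000, 9000, 12000])
```

The smoothing length sets the trade-off between tracking the quasi-periodic envelope shape and suppressing the fine structure of the noise; it should stay well below one pitch period.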
Step A006, store the analysis results, including the series of spectral envelopes obtained in step A003 and one or a plurality of time-domain envelopes obtained in step A005.
As shown in Fig. 3, the synthesis stage of the present invention consists of the following steps.
Step S001, obtain a series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal, one or a plurality of time-domain envelopes describing the time-domain characteristics of the aperiodic signal, and a predetermined array of one or a plurality of frequency values designating the frequency bands for modeling the time-domain characteristics of the aperiodic signal.
Step S002, generate a white noise signal with the same duration as the input time-domain envelopes.
Step S003, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the white noise signal generated in step S002 to remove the portion of the noise signal outside of the frequency band.
Step S004, for each of the frequency bands designated by the predetermined array of frequency values, multiply the band-pass filtered noise signal obtained in step S003, corresponding to the frequency band, by the time-domain envelope corresponding to the frequency band.
Step S005, calculate the sum of the modulated signals obtained in step S004. The result will be denoted as the noise excitation signal in the rest of this description.
Because the time-domain modulation in step S004 changes the energy of the band-pass filtered signal in each frequency band, the resulting noise excitation signal becomes colored. Thus, the spectral envelope of the noise excitation signal should be taken into account in the subsequent noise coloring procedure, described in particular in steps S007-S009.
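Steps S002 to S005 can be sketched as follows, again using an FFT-masking band-pass filter as a stand-in for the unspecified filter design. The function name `noise_excitation`, the Gaussian white noise, the fixed random seed, and the band-edge interpretation of the frequency array are assumptions for illustration. With all envelopes equal to one, the bands sum back to the original white noise; any non-constant envelope changes the per-band energy, which is why the excitation becomes colored.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Zero-phase band-pass filter: zero every FFT bin outside [f_lo, f_hi)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs >= f_hi)] = 0.0
    return np.fft.irfft(spectrum, len(x))

def noise_excitation(envelopes, fs, band_edges, seed=0):
    """S002-S005: modulate band-limited white noise by per-band envelopes and sum.

    envelopes: list of 1-D arrays, one per band, all of the same length.
    band_edges: consecutive band edges in Hz (len(envelopes) + 1 values).
    """
    rng = np.random.default_rng(seed)
    n = len(envelopes[0])
    noise = rng.standard_normal(n)                      # S002: white noise
    excitation = np.zeros(n)
    for env, lo, hi in zip(envelopes, band_edges[:-1], band_edges[1:]):
        band = bandpass_fft(noise, fs, lo, hi)          # S003: band-limit the noise
        excitation += band * np.asarray(env)            # S004: modulate; S005: sum
    return excitation
```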
Step S006, perform STFT on the noise excitation signal and obtain a series of complex spectra of the noise excitation signal. For each complex spectrum, calculate the corresponding magnitude spectrum.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
Step S007, for each magnitude spectrum in the series of magnitude spectra obtained in step S006, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
Step S008, inverse filter the series of complex spectra obtained in step S006 by the series of spectral envelopes obtained in step S007. The inverse filtering can be implemented as dividing each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
Step S009, filter the series of inverse filtered complex spectra obtained in step S008 by the series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal. The filtering can be implemented as multiplying each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
Step S010, perform inverse STFT on the series of complex spectra obtained in step S009. The resulting time-domain signal is the synthesized aperiodic signal.
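Steps S006 to S010 can be sketched as below. The STFT uses a periodic Hann window with a hop of one quarter of the FFT size and a weighted overlap-add inverse (dividing by the summed squared window); these choices, the FFT size, the cepstral order, and the names `stft`, `istft`, and `recolor` are assumptions, since the patent does not fix them. `target_envelopes` stands for the stored spectral envelopes of the aperiodic signal and is assumed to have one row per STFT frame.

```python
import numpy as np

def cepstral_envelope(magnitude, order):
    """Cepstrally smoothed envelope of one-sided magnitude spectra (last axis)."""
    n_fft = 2 * (magnitude.shape[-1] - 1)
    cepstrum = np.fft.irfft(np.log(np.maximum(magnitude, 1e-12)), n_fft, axis=-1)
    lifter = np.zeros(n_fft)
    lifter[:order] = 1.0
    lifter[n_fft - order + 1:] = 1.0
    return np.exp(np.fft.rfft(cepstrum * lifter, axis=-1).real)

def stft(x, n_fft, hop):
    """Complex spectra of Hann-windowed frames, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1), window

def istft(spectra, window, hop, length):
    """Weighted overlap-add inverse STFT (least-squares signal estimate)."""
    n_fft = len(window)
    frames = np.fft.irfft(spectra, n_fft, axis=-1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def recolor(excitation, target_envelopes, n_fft=512, hop=128, order=30):
    """S006-S010: whiten the excitation by its own envelopes, color by the targets."""
    spectra, window = stft(excitation, n_fft, hop)                # S006
    own_envelopes = cepstral_envelope(np.abs(spectra), order)    # S007
    whitened = spectra / own_envelopes                            # S008
    colored = whitened * target_envelopes                         # S009
    return istft(colored, window, hop, len(excitation))           # S010
```

If the target envelopes equal the excitation's own envelopes, the procedure returns the excitation unchanged (apart from the unwindowed edge samples), which is a convenient sanity check.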
The following describes the implementation of an exemplary speech processing application involving this invention, as shown in Fig. 4 and Fig. 5.
Step A101, receive a speech signal from a sound input device, such as a microphone.
Step A102, extract the pitch contour from the input speech signal. The extracted pitch contour is an array of frequency values corresponding to the fundamental frequency of the speech signal at a series of time instants spaced at a fixed time interval (around 5 milliseconds). If the speech is unvoiced at a certain time instant, the frequency value corresponding to that time instant is set to zero.
The preferred pitch extraction method is the YIN algorithm described in De Cheveigne, Alain, and Kawahara, Hideki, "YIN, a fundamental frequency estimator for speech and music," Journal of the Acoustical Society of America 111.4 (2002): 1917-1930.
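For reference, the core of YIN (difference function, cumulative mean normalized difference, absolute threshold) can be sketched for a single frame as follows. This is a simplified illustration rather than the published algorithm in full: YIN's parabolic refinement and best-local-estimate steps are omitted, and the search range, the threshold of 0.1, and returning 0.0 (the unvoiced convention of step A102) when no lag falls below the threshold are assumptions.

```python
import numpy as np

def yin_f0(frame, fs, f_min=60.0, f_max=500.0, threshold=0.1):
    """Estimate the f0 of one frame with the core YIN steps; 0.0 means unvoiced."""
    frame = np.asarray(frame, dtype=float)
    tau_min = int(fs / f_max)
    tau_max = int(fs / f_min)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame must be longer than fs / f_min samples")
    if not np.any(frame):
        return 0.0  # silence is treated as unvoiced
    # Difference function d(tau) for tau = 0..tau_max.
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Cumulative mean normalized difference: d'(0) = 1,
    # d'(tau) = d(tau) * tau / sum_{j=1..tau} d(j).
    cmnd = np.ones(tau_max + 1)
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)
    # Absolute threshold: first lag below threshold, then walk down to the local minimum.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau
    return 0.0
```

Calling `yin_f0` on frames centered every 5 ms would produce a pitch contour in the format described in step A102; the frame must be longer than `fs / f_min` samples.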
Step A103, perform STFT analysis on the input speech signal at a series of time instants corresponding to the time instants at which the pitch contour is extracted, and obtain a series of complex spectra of the input speech signal. The preferred window for the STFT analysis is the Blackman window. The window length is preferably time-varying, equal to twice the pitch period of the speech signal around the analysis time instant.
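A minimal sketch of the pitch-adaptive framing in step A103 follows, assuming a Blackman window whose length is rounded to an odd number of samples near two pitch periods so that it can be centered on the analysis instant. The function name, the zero-padding at signal edges, and the 100 Hz fallback for unvoiced instants are assumptions; the patent does not say how unvoiced instants are framed.

```python
import numpy as np

def pitch_adaptive_frame(x, center, f0, fs, fallback_f0=100.0):
    """Blackman-windowed frame about two pitch periods long, centered on `center`."""
    if f0 <= 0:
        f0 = fallback_f0  # unvoiced instant: hypothetical fallback period
    n = int(round(2.0 * fs / f0))
    if n % 2 == 0:
        n += 1  # odd length so the window is centered exactly on `center`
    window = np.blackman(n)
    half = n // 2
    padded = np.pad(np.asarray(x, dtype=float), half)  # zero-pad both ends
    # padded[center + half] corresponds to x[center].
    return padded[center:center + n] * window, window
```

Because such a frame is short, it would typically be zero-padded to a fixed FFT size (for example with the `n` argument of `numpy.fft.rfft`) before peak picking.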
Step A104, for each complex spectrum in the series of complex spectra obtained in step A103, calculate the log magnitude spectrum and pick the spectral peaks around each harmonic frequency, calculated as an integer multiple of the fundamental frequency at the corresponding time instant. Perform parabolic interpolation at the spectral peaks to obtain refined estimates of the harmonic amplitudes and harmonic frequencies. Perform linear interpolation, at the refined harmonic frequencies, on the unwrapped phase spectrum calculated from the complex spectrum to obtain refined estimates of the harmonic phases. Normalize the estimated harmonic amplitudes by dividing them by half the sum of the analysis window used in step A103.
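The refinement in step A104 can be sketched for a single harmonic as follows. It applies three-point parabolic interpolation to the log-magnitude spectrum, linear interpolation of the unwrapped phase at the refined bin position, and the normalization by half the window sum. The function name `refine_harmonic`, the plus or minus 2-bin search range, and the phase reference (the first sample of the frame, since the frame is not circularly shifted) are assumptions.

```python
import numpy as np

def refine_harmonic(spectrum, k_guess, fs, n_fft, window_sum, search=2):
    """Refined (frequency Hz, amplitude, phase rad) of the peak near bin k_guess."""
    lo = max(k_guess - search, 1)
    hi = min(k_guess + search, len(spectrum) - 2)
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    # Peak picking within the search range around the expected harmonic bin.
    k = lo + int(np.argmax(log_mag[lo:hi + 1]))
    # Parabola through the log magnitudes at bins k-1, k, k+1.
    a, b, c = log_mag[k - 1], log_mag[k], log_mag[k + 1]
    denom = a - 2.0 * b + c
    delta = 0.5 * (a - c) / denom if denom != 0 else 0.0
    freq = (k + delta) * fs / n_fft
    peak_log = b - 0.25 * (a - c) * delta
    # A sinusoid of amplitude A gives a peak of about A * sum(window) / 2.
    amp = np.exp(peak_log) / (window_sum / 2.0)
    # Linear interpolation of the unwrapped phase at the refined bin position.
    phase_spec = np.unwrap(np.angle(spectrum))
    phase = np.interp(k + delta, np.arange(len(spectrum)), phase_spec)
    return freq, amp, phase
```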
Step A105, generate a plurality of sinusoids with time-varying amplitude and time-varying frequency according to the series of harmonic amplitudes and harmonic phases obtained in step A104 and the pitch contour obtained in step A102, and calculate their sum. In the rest of this description, this sum is referred to as the extracted periodic component.
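The sinusoidal synthesis of step A105 (and likewise step S101) can be sketched with per-sample linear interpolation of the pitch and amplitude tracks and phase accumulation. This is a simplification under stated assumptions: the measured phases are used only as initial phases, whereas a fuller implementation would presumably also match the measured phases at later frames (for example with cubic phase interpolation), and the sketch assumes a single voiced segment with f0 > 0 throughout. The function names are hypothetical.

```python
import bisect
import math


def interp(times, values, t):
    """Piecewise-linear interpolation of values(times) at t, clamped at the ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect.bisect_right(times, t) - 1
    alpha = (t - times[i]) / (times[i + 1] - times[i])
    return values[i] + alpha * (values[i + 1] - values[i])


def synthesize_harmonics(frame_times, f0, amps, phases0, fs, n_samples):
    """Sum of harmonics with time-varying amplitude and frequency.

    frame_times: analysis instants in seconds (ascending)
    f0:          pitch contour at those instants in Hz (assumed > 0 throughout)
    amps:        amps[i][h] = amplitude of harmonic h + 1 at frame i
    phases0:     phases0[h] = initial phase of harmonic h + 1 (radians)
    """
    n_harm = len(phases0)
    amp_tracks = [[row[h] for row in amps] for h in range(n_harm)]
    phase = list(phases0)
    out = []
    for n in range(n_samples):
        t = n / fs
        f = interp(frame_times, f0, t)
        s = 0.0
        for h in range(n_harm):
            fh = (h + 1) * f
            if fh < fs / 2:  # skip harmonics above the Nyquist frequency
                s += interp(frame_times, amp_tracks[h], t) * math.cos(phase[h])
            phase[h] += 2 * math.pi * fh / fs  # integrate instantaneous frequency
        out.append(s)
    return out
```

Integrating the instantaneous frequency keeps each harmonic's phase continuous as f0 changes, which avoids clicks at frame boundaries.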
Step A106, subtract the extracted periodic component from the input speech signal. In the rest of this description, the resulting signal is referred to as the extracted aperiodic component.
Step A112, perform a Short-Time Fourier Transform (STFT) on the extracted aperiodic component and obtain a series of magnitude spectra of the extracted aperiodic component. The window length for this STFT analysis is around 10 milliseconds.
Step A113, for each magnitude spectrum in the series of magnitude spectra obtained in step A112, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
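A dependency-free sketch of this cepstral smoothing (the same procedure is preferred again in step S117). It assumes the input is a full, two-sided magnitude spectrum of length N, so that the log magnitude is real and even, uses the natural logarithm, and keeps quefrencies 0 through `order` together with their mirror images. The naive DFT and the function names are illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT (O(N^2)); an FFT would be used in practice."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in range(n)]


def cepstral_envelope(magnitude, order):
    """Spectral envelope of a two-sided magnitude spectrum by cepstral truncation."""
    n = len(magnitude)
    if not 0 <= order < n // 2:
        raise ValueError("order must satisfy 0 <= order < N/2")
    log_mag = [math.log(max(m, 1e-12)) for m in magnitude]
    # Real cepstrum. For a real, even log spectrum, DFT/N equals the inverse DFT.
    cep = [c.real / n for c in dft(log_mag)]
    # Truncate: keep quefrencies 0..order and their mirror images N-order..N-1.
    lifted = [c if (q <= order or q >= n - order) else 0.0 for q, c in enumerate(cep)]
    # Back to the (log) spectral domain, then undo the logarithm.
    return [math.exp(v.real) for v in dft(lifted)]
```

As a quick check, an alternating 3, 1, 3, 1, ... magnitude (a crude stand-in for harmonic ripple) has its ripple at quefrency N/2; truncating to a low order removes it and leaves the geometric mean, √3, across all bins.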
Step A114, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the extracted aperiodic component signal to remove the portion of the aperiodic signal outside of the frequency band.
Step A115, extract the time-domain envelope of the band-pass filtered signal obtained in step A114.
The preferred method for time-domain envelope extraction is to low-pass filter the absolute value of the signal.
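Steps A114 and A115 for one frequency band can be sketched as below. The description specifies neither the band-pass filter nor the low-pass filter, so both are assumptions here: a single second-order band-pass section using the RBJ audio-EQ-cookbook formulas (0 dB peak gain), which has gentle skirts and would likely be replaced by a steeper design in practice, and a one-pole smoother with a hypothetical 50 Hz cutoff applied to the rectified signal. Band edges must be positive because the centre frequency is taken as their geometric mean.

```python
import math


def bandpass_biquad(x, fs, f_lo, f_hi):
    """Second-order band-pass (RBJ cookbook, 0 dB peak gain) between f_lo and f_hi."""
    f0 = math.sqrt(f_lo * f_hi)                 # geometric centre frequency
    bw_oct = math.log2(f_hi / f_lo)             # bandwidth in octaves
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(math.log(2) / 2 * bw_oct * w0 / math.sin(w0))
    b0, b2 = alpha / (1 + alpha), -alpha / (1 + alpha)   # b1 = 0
    a1, a2 = -2 * math.cos(w0) / (1 + alpha), (1 - alpha) / (1 + alpha)
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def envelope(x, fs, cutoff_hz=50.0):
    """Time-domain envelope: one-pole low-pass of |x| (cutoff is an assumption)."""
    k = 1.0 - math.exp(-2 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for xn in x:
        state += k * (abs(xn) - state)
        env.append(state)
    return env
```

For a predetermined edge array such as the hypothetical `[100, 1000, 2000, 4000, 7000]` Hz, both functions would be applied once per adjacent pair of edges. For a steady in-band sinusoid the envelope settles near 2/π times its amplitude, the mean of the rectified signal.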
Step A116, store the analysis results, including the series of spectral envelopes obtained in step A113, one or a plurality of time-domain envelopes obtained in step A115, the pitch contour obtained in step A102 and the series of harmonic amplitudes and harmonic phases obtained in step A104.
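One way to group what step A116 stores is a small record type. The class and field names below are hypothetical; the fields simply mirror the items listed in the step, plus the band edges needed to interpret the per-band envelopes. The spectral envelopes are not checked against the pitch frames because they come from a separate STFT (step A112) that may use a different hop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisResults:
    """Hypothetical container for what step A116 stores (names are illustrative)."""
    frame_times: List[float]               # analysis instants in seconds (about 5 ms apart)
    f0: List[float]                        # pitch contour in Hz, 0.0 = unvoiced (A102)
    harmonic_amps: List[List[float]]       # [frame][harmonic] amplitudes (A104)
    harmonic_phases: List[List[float]]     # [frame][harmonic] phases in radians (A104)
    spectral_envelopes: List[List[float]]  # aperiodic spectral envelopes (A113)
    band_edges: List[float]                # predetermined band edge frequencies in Hz
    band_envelopes: List[List[float]] = field(default_factory=list)  # one per band (A115)

    def validate(self) -> None:
        """Check that per-frame and per-band arrays have consistent lengths."""
        n = len(self.frame_times)
        if not len(self.f0) == len(self.harmonic_amps) == len(self.harmonic_phases) == n:
            raise ValueError("pitch and harmonic data need one entry per analysis instant")
        if self.band_envelopes and len(self.band_envelopes) != len(self.band_edges) - 1:
            raise ValueError("need one time-domain envelope per frequency band")
```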
Step M101, optionally, modify the analysis results. For example, to shift the pitch up, multiply the pitch contour by a constant greater than one, adjust the amplitudes of the harmonics at each time instant accordingly, and resample the time-domain envelopes of each frequency band describing the aperiodic component accordingly.
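The description leaves open how the harmonic amplitudes are adjusted. One plausible reading, sketched here purely as an assumption, is to preserve the spectral shape by reading the new amplitudes off each frame's original amplitude-versus-frequency curve (linear interpolation of log amplitude, clamped beyond the measured harmonics). The resampling of the band envelopes is not shown because the source does not specify it. All names are hypothetical.

```python
import math


def interp_log_amp(freqs, amps, f):
    """Interpolate log amplitude linearly over frequency, clamped at both ends."""
    if f <= freqs[0]:
        return amps[0]
    if f >= freqs[-1]:
        return amps[-1]
    for i in range(len(freqs) - 1):
        if freqs[i] <= f <= freqs[i + 1]:
            alpha = (f - freqs[i]) / (freqs[i + 1] - freqs[i])
            la = math.log(max(amps[i], 1e-12))
            lb = math.log(max(amps[i + 1], 1e-12))
            return math.exp(la + alpha * (lb - la))
    return amps[-1]  # only reached if freqs is not ascending


def shift_pitch(f0_contour, harm_freqs, harm_amps, factor):
    """Scale the pitch contour and re-read each frame's harmonic amplitudes.

    harm_freqs[i], harm_amps[i]: refined harmonic frequencies/amplitudes of frame i.
    Unvoiced frames (f0 == 0) are passed through unchanged.
    """
    new_f0 = [f * factor for f in f0_contour]
    new_amps = []
    for f0, freqs, amps in zip(new_f0, harm_freqs, harm_amps):
        if f0 <= 0:
            new_amps.append(list(amps))
        else:
            new_amps.append([interp_log_amp(freqs, amps, (k + 1) * f0)
                             for k in range(len(amps))])
    return new_f0, new_amps
```

With a factor of 2, for instance, the new first harmonic lands where the old second harmonic was and inherits its amplitude.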
Step S101, generate a plurality of sinusoids with time-varying amplitude and time-varying frequency according to the series of harmonic amplitudes and harmonic phases obtained in step A104 or step M101 and the pitch contour obtained in step A102 or step M101.
Calculate the sum of the sinusoids. In the rest of this description the sum of the sinusoids is denoted as the synthesized periodic signal.
Step S112, generate a white noise signal with the same duration as the time-domain envelopes obtained in step A115 or step M101.
Step S113, for each of the frequency bands designated by the predetermined array of frequency values, band-pass filter the white noise signal generated in step S112 to remove the portion of the noise signal outside of the frequency band.
Step S114, for each of the frequency bands designated by the predetermined array of frequency values, multiply the band-pass filtered noise signal obtained in step S113, corresponding to the frequency band, by the time-domain envelope corresponding to the frequency band.
Step S115, obtain the noise excitation signal by computing the sum of the modulated signals obtained in step S114.
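Steps S112 to S115 can be sketched as follows, assuming each band's time-domain envelope is available at the audio sampling rate and all envelopes have equal length. The band-pass filter is the same assumed RBJ-cookbook biquad as a simple stand-in (the description does not name a filter), the seeded Gaussian generator is an assumption made for reproducibility, and the band-edge layout and function names are hypothetical.

```python
import math
import random


def bandpass_biquad(x, fs, f_lo, f_hi):
    """Second-order band-pass (RBJ cookbook, 0 dB peak gain) between f_lo and f_hi."""
    f0 = math.sqrt(f_lo * f_hi)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(math.log(2) / 2 * math.log2(f_hi / f_lo) * w0 / math.sin(w0))
    b0, b2 = alpha / (1 + alpha), -alpha / (1 + alpha)   # b1 = 0
    a1, a2 = -2 * math.cos(w0) / (1 + alpha), (1 - alpha) / (1 + alpha)
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def noise_excitation(band_edges, envelopes, fs, seed=0):
    """Band-limited white noise, modulated per band, then summed (S112-S115).

    band_edges: ascending positive edge frequencies in Hz; band i spans
                band_edges[i]..band_edges[i + 1]
    envelopes:  envelopes[i] = per-sample time-domain envelope of band i
    """
    n = len(envelopes[0])
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                          # S112
    excitation = [0.0] * n
    for i, env in enumerate(envelopes):
        band = bandpass_biquad(noise, fs, band_edges[i], band_edges[i + 1])  # S113
        for k in range(n):
            excitation[k] += band[k] * env[k]                                # S114, S115
    return excitation
```

Filtering a single noise signal into every band, rather than drawing fresh noise per band, follows step S113, which band-pass filters "the white noise signal generated in step S112".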
Step S116, perform STFT on the noise excitation signal and obtain a series of complex spectra of the noise excitation signal. For each complex spectrum, calculate the corresponding magnitude spectrum.
Step S117, for each magnitude spectrum in the series of magnitude spectra obtained in step S116, calculate the corresponding spectral envelope. The result should be a series of spectral envelopes.
The preferred method for calculating the spectral envelope from a magnitude spectrum is to convert the magnitude spectrum into cepstrum, truncate the cepstrum to a designated order, and finally convert the cepstrum back to spectrum.
Step S118, inverse filter the series of complex spectra obtained in step S116 by the series of spectral envelopes obtained in step S117. The inverse filtering can be implemented as dividing each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
Step S119, filter the series of inverse filtered complex spectra obtained in step S118 by the series of spectral envelopes describing the frequency-domain characteristics of the aperiodic signal. The filtering can be implemented as multiplying each complex spectrum by the corresponding spectral envelope. The result should be a series of complex spectra.
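Steps S117 to S119 for a single STFT frame can be sketched as below: the frame's own cepstrally smoothed envelope is divided out (whitening) and the analysed aperiodic envelope is multiplied in, which keeps the phase of every bin. The sketch assumes two-sided complex spectra and an equally long target envelope; the naive DFT and the function names are illustrative assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT (O(N^2)); the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * b * k / n) for k in range(n))
           for b in range(n)]
    return [v / n for v in out] if inverse else out


def cepstral_envelope(magnitude, order):
    """S117: log magnitude -> cepstrum -> truncate to `order` -> spectrum."""
    n = len(magnitude)
    cep = dft([math.log(max(m, 1e-12)) for m in magnitude], inverse=True)
    lifted = [c if (q <= order or q >= n - order) else 0.0 for q, c in enumerate(cep)]
    return [math.exp(v.real) for v in dft(lifted)]


def reshape_frame(spectrum, target_envelope, order):
    """S117-S119 for one two-sided complex STFT frame of the noise excitation."""
    own_envelope = cepstral_envelope([abs(z) for z in spectrum], order)  # S117
    whitened = [z / e for z, e in zip(spectrum, own_envelope)]          # S118
    return [z * t for z, t in zip(whitened, target_envelope)]           # S119
```

Because the smoothed envelope is an exponential, it is never zero, so the division in S118 cannot fail.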
Step S120, perform inverse STFT on the series of complex spectra obtained in step S119. The resulting time-domain signal is the synthesized aperiodic signal.
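The inverse STFT of step S120 can be sketched as a weighted overlap-add: each frame is inverse-transformed, multiplied by the synthesis window and overlap-added, and the accumulated sum of squared windows is divided out. This least-squares form reconstructs an unmodified STFT exactly wherever the window sum is non-zero. The description does not state the synthesis window or hop, so reusing the analysis window, the fixed hop, and the function names are assumptions.

```python
import cmath
import math


def idft_real(spectrum):
    """Naive inverse DFT, returning the real part (O(N^2))."""
    n = len(spectrum)
    return [sum(spectrum[b] * cmath.exp(2j * math.pi * b * k / n) for b in range(n)).real / n
            for k in range(n)]


def istft(frames, hop, window):
    """Weighted overlap-add inverse STFT of full-length complex frames."""
    n = len(window)
    length = hop * (len(frames) - 1) + n
    out = [0.0] * length
    norm = [0.0] * length
    for i, spec in enumerate(frames):
        segment = idft_real(spec)
        start = i * hop
        for k in range(n):
            out[start + k] += segment[k] * window[k]   # synthesis windowing
            norm[start + k] += window[k] ** 2          # accumulate window energy
    # Samples that no window covers (norm ~ 0) are set to zero.
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]
```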
Step S121, calculate the sum of the synthesized periodic signal obtained in step S101 and the synthesized aperiodic signal obtained in step S120. Send the resulting signal to a sound output device such as a speaker.

Claims

1. A speech processing method that extracts information from an aperiodic signal, consisting of the following steps:
estimate one or a plurality of spectral envelopes from the input aperiodic signal;
band-pass filter the input aperiodic signal for each designated frequency band;
extract a time-domain envelope from each band-pass filtered signal;
store the analysis results, comprising one or a plurality of spectral envelopes and the time-domain envelope for each designated frequency band.
2. A speech processing method that generates an aperiodic signal from spectral and time-domain envelopes, consisting of the following steps:
generate a white noise signal;
band-pass filter the white noise signal for each designated frequency band;
modulate each band-pass filtered signal with the corresponding input time-domain envelope;
obtain the full-band excitation signal as the sum of the modulated band-limited signals;
whiten the excitation signal by inverse filtering it with its own spectral envelope;
filter the whitened excitation signal by one or a plurality of input spectral envelopes.
PCT/IB2017/050208 2016-01-19 2017-01-15 Method for analysis and synthesis of aperiodic signals WO2017125840A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662280572P 2016-01-19 2016-01-19
US62/280,572 2016-01-19

Publications (1)

Publication Number Publication Date
WO2017125840A1 true WO2017125840A1 (en) 2017-07-27

Family

ID=59362144

Country Status (1)

Country Link
WO (1) WO2017125840A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101051460A (en) * 2006-04-05 2007-10-10 三星电子株式会社 Speech signal pre-processing system and method of extracting characteristic information of speech signal
CN101123088A (en) * 2007-09-03 2008-02-13 北京中星微电子有限公司 A chorus special effect processing method and system
CN102208186A (en) * 2011-05-16 2011-10-05 南宁向明信息科技有限责任公司 Chinese phonetic recognition method
WO2015166694A1 (en) * 2014-05-01 2015-11-05 日本電信電話株式会社 Periodic-combined-envelope-sequence generation device, periodic-combined-envelope-sequence generation method, periodic-combined-envelope-sequence generation program, and recording medium
WO2015188627A1 (en) * 2014-06-12 2015-12-17 华为技术有限公司 Method, device and encoder of processing temporal envelope of audio signal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17741154

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17741154

Country of ref document: EP

Kind code of ref document: A1