US7333931B2 - Method for estimating resonance frequencies - Google Patents
- Publication number
- US7333931B2 (application US10/568,150, US56815006A)
- Authority
- US
- United States
- Prior art keywords
- estimating
- resonance frequencies
- input signal
- peaks
- differential
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/09—Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Definitions
- the present invention is related to an analysis technique for recorded speech signals that can be used in various fields of speech processing technology.
- the basic source-filter speech model is very frequently used. It assumes that the speech signal is produced by exciting a filter (corresponding to the vocal tract) with an excitation produced by lung pressure and the larynx (the source signal, or glottal flow signal).
- Decomposition of the two systems has been an interesting problem in all areas of speech processing.
- the source and the filter characteristics provide very useful information for speech applications. In many applications, removing one system's effect on the other improves the quality of analysis performed by the application.
- estimating source signal characteristics is very important for voice quality analysis of speech, database labelling (for voice quality and prosodic events), and speech quality modification (e.g., emotional speech synthesis).
- Both systems show some resonance characteristics, which are considered to be their essential features. These resonances are called the formants and their estimation has been studied by various researchers, especially for the filter part.
- estimation of the spectral resonance of the source (called the glottal formant), as presented in the present application, is a rather new concept.
- Rabiner System for automatic formant analysis of voiced speech
- Rabiner and Schafer, JASA, vol. 47, no. 2/2, pp. 634-648, 1970
- Murthy and Yegnanarayana, 'Formant extraction from group delay function', Speech Communication, vol. 10, no. 3, pp. 209-221, August, 199.
- Both methodologies are based on spectral processing of speech.
- Rabiner's approach is based on analysis of the Z-transform amplitude spectrum, and Murthy's on the minimum-phase group delay function derived from the amplitude spectrum. In both cases, one of the most important steps is cepstral smoothing.
- aspects of the invention include a method for estimating the formant frequencies for vocal tract and glottal flow, directly from speech signals and further include a computer usable medium that implements such a method.
- the circle on which the Z-transform is evaluated is different from the unit circle in the Z-plane.
- the Z-transform of the input signal is evaluated on more than one circle.
- the input signal is windowed.
- the input signal is a speech signal.
- the source is a glottal flow signal and the filter is a vocal tract system.
- Attributing the peaks is performed based on the sign of said peaks. Said attributing is preferably further based on the radius of said circle.
- the method for estimating the resonance frequencies further comprises removing zeros of the input signal's Z-transform before calculating the differential-phase spectrum.
- a computer usable medium having computer readable program code embodied therein for estimating from an input signal the resonance frequencies of a system modeled as a source and a filter, the computer readable code comprising instructions for determining the Z-transform of said input signal, calculating the differential-phase spectrum of said Z-transformed input signal, said Z-transform thereby being evaluated on a circle centered around the origin of the Z-plane, detecting the peaks on said differential-phase spectrum, attributing said peaks to either said source or said filter, and estimating said resonance frequencies from said peaks.
- FIG. 1 represents the source-filter speech model.
- FIG. 2 shows the anti-causal character of the glottal flow signal. a) a causal filter response, b) an anti-causal filter response, c) a typical glottal flow signal.
- FIG. 3 represents a causal and an anti-causal single pole filter response plots: a) causal impulse response, b) log-amplitude spectrum of a), c) group delay spectrum of a), d) anti-causal impulse response, e) log-amplitude spectrum of d), f) group delay spectrum of d).
- FIG. 4 represents a mixed phase all-pole signal with causal resonances at 1000 Hz and 2000 Hz and anti-causal resonances at 500 Hz and 1500 Hz.
- FIG. 5 shows the effect of zeros on the group delay function, a) Zeros of Z-Transform (ZZT) plotted in polar coordinates (region of zeros close to the unit circle indicated by dashed lines), b) group delay function with ZZT close to unit circle superimposed.
- ZZT Zeros of Z-Transform
- FIG. 6 represents an example of differential-phase spectrum analysis of synthetic speech.
- FIG. 7 represents a flowchart of the method according to one embodiment.
- Certain embodiments target the estimation of resonance frequencies (formant frequencies) of the source and the vocal tract contributions directly from the speech signal itself.
- the source-tract separation problem needs to be handled with tools that can detect anti-causal resonances.
- the technique is more effective than current state of the art methods, mainly because it is capable of detecting causal and anti-causal resonances without utilization of a particular model of analysis, but only with spectral peak analysis. Additionally, the technique has no dependency on analysis degrees as in LP analysis systems.
- the source-filter model (see FIG. 1 ) is usually accompanied by the assumption that a speech signal is a physical system output and therefore it is the output of a stable filter system.
- all the resonances of the signal shall correspond to poles inside the unit circle in z-plane.
- the system is all-pole (i.e., the system can be defined by only poles and a gain factor)
- one ends up with a minimum-phase system: systems having all zeros and poles inside the unit circle are classified as minimum-phase systems.
- Speech signals have been assumed to be minimum-phase signals for many years in many studies.
- an anti-causal signal x(−n) is obtained.
- the version of x(−n) time shifted to positive time indexes is also referred to as anti-causal, because the filter characteristics are time-reversed. Shifting the signal in time only introduces a linear phase component to the signal (a DC component is added to the group delay spectrum) and the amplitude spectrum is unaffected.
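The effect of time reversal and shifting can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`dft`, `reverse_and_shift`, `delay`) and the toy sequence are illustrative assumptions, not part of the patent. It shows that reversing a real sequence and shifting it to positive indexes leaves the amplitude spectrum unchanged, and that a delay of d samples only multiplies the spectrum by the linear phase term exp(−jωd).

```python
import cmath
import math

def dft(x, nfft):
    """Direct DFT of x, implicitly zero-padded to nfft points."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / nfft) for n, v in enumerate(x))
        for k in range(nfft)
    ]

def reverse_and_shift(x):
    """x(-n) shifted to positive time indexes: a plain reversal of the list."""
    return list(reversed(x))

def delay(x, d):
    """Shift x by d samples towards positive time indexes."""
    return [0.0] * d + list(x)

NFFT = 16
x = [1.0, 0.6, 0.36, 0.216]        # toy causal decay (assumed values)
X = dft(x, NFFT)
X_rev = dft(reverse_and_shift(x), NFFT)
X_del = dft(delay(x, 3), NFFT)

# Amplitude spectra agree bin by bin; only the phase differs.
same_amplitude = all(abs(abs(a) - abs(b)) < 1e-9 for a, b in zip(X, X_rev))
# A 3-sample delay is exactly a linear phase factor exp(-j*w*3).
linear_phase = all(
    abs(b - a * cmath.exp(-2j * math.pi * k * 3 / NFFT)) < 1e-9
    for k, (a, b) in enumerate(zip(X, X_del))
)
```

Both flags evaluate to true for any real input sequence, which is why causality cannot be read from the amplitude spectrum alone.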
- the anti-causality assumption for the source is based on the characteristics of glottal flow models (as explained in detail in 'Spectral correlates of glottal waveform models: an analytic study', Doval and d'Alessandro, Proc. ICASSP 97, Munich, pp. 446-452).
- One easy explanation is through visual inspection of signal waveforms.
- In FIG. 2, an example glottal flow signal is presented together with a causal and an anti-causal filter response.
- the glottal flow signal has the same characteristics as the anti-causal response, namely a slowly increasing function with a rather sharper decay.
- the glottal flow signals can be modelled by an all-pole system where the poles are anti-causal. For stability of an anti-causal all-pole system, all of the poles have to lie outside the unit circle, and therefore the system is maximum phase.
- the mixed-phase model assumes speech signals have two types of resonances: anti-causal resonances of the source (glottal flow) signal and causal resonances of the vocal tract filter. Certain embodiments estimate these resonances from the speech signal.
- the estimation method is based on analysis of ‘differential-phase spectra’.
- The closest concept to differential-phase spectra is the group delay, so the differential-phase spectra will be introduced as a more general form of group delay.
- the source-tract separation is based on spectral analysis of causal and anti-causal parts of the speech signal.
- the frequently used amplitude (or power) spectra offer very little help (if any).
- the phase spectra have to be studied, since causality can only be observed in phase spectra.
- the derivative of the phase spectrum, however, does not have the same property, and it offers various other advantages over both phase spectra and amplitude spectra.
- the group delay function GD(ω) is defined as the negative of the derivative of the argument θ(ω) of X(ω), the discrete Fourier transform of a signal x(n): GD(ω) = −dθ(ω)/dω.
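A minimal sketch of this definition follows, assuming the standard identity that the negative phase derivative equals Re{Y(ω)/X(ω)}, where Y is the DFT of n·x(n). This avoids phase unwrapping. The function name `group_delay` is an illustrative choice, and the code does not guard against bins where X(ω) is exactly zero.

```python
import cmath
import math

def group_delay(x, nfft):
    """Group delay (in samples) of x at nfft equally spaced frequencies.

    Uses GD(w) = Re{ DFT(n * x(n)) / DFT(x(n)) }, which equals the
    negative derivative of the phase of DFT(x) with respect to w.
    """
    gd = []
    for k in range(nfft):
        w = 2.0 * math.pi * k / nfft
        X = sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))
        Y = sum(n * v * cmath.exp(-1j * w * n) for n, v in enumerate(x))
        gd.append((Y / X).real)
    return gd

# A unit impulse delayed by 3 samples has a flat group delay of 3.
gd_impulse = group_delay([0.0, 0.0, 0.0, 1.0], 8)
```

For the delayed impulse the result is 3 samples at every frequency, which matches the constant offset that a pure time shift adds to the group delay spectrum.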
- In FIG. 4, a mixed-phase signal (synthesised with an all-pole model) and its group delay spectrum are presented.
- the mixed phase signal in FIG. 4 is synthesised by convolving a causal filter response with resonances at 1000 Hz and 2000 Hz and anti-causal filter response with resonances at 500 Hz and 1500 Hz.
- the causal and anti-causal resonances appear as peaks with opposite directions on the group delay spectrum, whereas on the amplitude spectrum causality or anti-causality cannot be observed. Therefore, for analysis of the causality of resonances of mixed-phase signals like speech, group delay function processing (obtained from phase information) is advantageous over amplitude spectrum processing.
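The opposite peak directions can be reproduced with a small synthetic example. The sketch below is an illustration under stated assumptions rather than the synthesis used for FIG. 4: it uses one causal resonance at normalised frequency 0.25 and one anti-causal resonance at 0.125 (both arbitrary), a pole radius of 0.9, and responses truncated to 64 samples. The anti-causal response is obtained by time-reversing a causal resonator response, and the constant group delay caused by shifting it to positive indexes is subtracted before inspecting the peaks.

```python
import cmath
import math

def resonator(freq, radius, length):
    """Truncated impulse response of a causal two-pole resonator.

    freq is normalised by the sampling rate (0 to 0.5); the poles sit at
    radius * exp(+/- j*2*pi*freq).
    """
    a1 = 2.0 * radius * math.cos(2.0 * math.pi * freq)
    a2 = -radius * radius
    h = []
    for n in range(length):
        y = 1.0 if n == 0 else 0.0
        if n >= 1:
            y += a1 * h[n - 1]
        if n >= 2:
            y += a2 * h[n - 2]
        h.append(y)
    return h

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out

def group_delay_at(x, freq):
    """Group delay of x at one normalised frequency, via Re{DFT(n*x)/DFT(x)}."""
    w = 2.0 * math.pi * freq
    X = sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))
    Y = sum(n * v * cmath.exp(-1j * w * n) for n, v in enumerate(x))
    return (Y / X).real

LENGTH = 64
causal = resonator(0.25, 0.9, LENGTH)
anticausal = resonator(0.125, 0.9, LENGTH)[::-1]   # time-reversed and shifted
mixed = convolve(causal, anticausal)
offset = LENGTH - 1                                # delay added by the shift

gd_at_causal = group_delay_at(mixed, 0.25) - offset       # positive peak
gd_at_anticausal = group_delay_at(mixed, 0.125) - offset  # negative peak
```

With these settings the group delay is roughly +9 samples at the causal resonance and roughly −9 samples at the anti-causal one, while the amplitude spectrum shows a peak at both frequencies and cannot tell them apart.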
- X(e^{jω}) denotes the z-transform of a discrete time sequence x(n)
- the Z_m represent the roots of the z-transform
- G is the gain factor.
- Each factor in (eq. 4) corresponds, in the z-plane, to a vector starting at Z_m and ending at e^{jω}.
- the problem is first redefined in a more general framework of ‘differential-phase spectrum’.
- the differential-phase spectrum is defined as the negative derivative of the phase spectrum calculated from the signal's z-transform, evaluated on a circle with any radius centered at the origin of the z-plane.
- the spiky effects of the zeros can be avoided and resonance peaks can be tracked.
- Certain embodiments advantageously make use of the insight that signal resonances can be tracked from differential phase spectra calculated on circles with radius different from 1 (the unit circle), e.g., on circles with a radius either larger or smaller than 1.
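Evaluating the z-transform on a circle of radius R amounts to weighting x(n) by R^(−n) before taking the Fourier transform, since X(R·e^{jω}) = Σ x(n)·R^(−n)·e^{−jωn}. The sketch below uses that observation. It is a minimal illustration; the function name `differential_phase` and the two-sample test signal are assumptions made for the example.

```python
import cmath
import math

def differential_phase(x, radius, freq):
    """Negative phase derivative of X(z) at z = radius * exp(j*2*pi*freq).

    The weighting radius**(-n) moves the evaluation from the unit circle
    to a circle of the given radius; radius = 1 gives the group delay.
    """
    w = 2.0 * math.pi * freq
    X = 0j
    Y = 0j
    for n, v in enumerate(x):
        term = v * radius ** (-n) * cmath.exp(-1j * w * n)
        X += term
        Y += n * term
    return (Y / X).real

# X(z) = 1 - 0.5 / z has a single zero at z = 0.5.
x = [1.0, -0.5]
on_unit_circle = differential_phase(x, 1.0, 0.0)    # circle outside the zero
far_outside = differential_phase(x, 2.0, 0.0)       # circle further outside
inside_the_zero = differential_phase(x, 0.25, 0.0)  # circle inside the zero
```

For this signal the values are −1, −1/3 and +2: moving the circle away from the zero shrinks its contribution, and moving the circle to the other side of the zero flips the sign.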
- the analysis of more than one differential-phase spectrum is advantageous for the estimation of source and tract characteristics due to the poles existing inside and outside the unit circle (though a single differential-phase spectrum can also reveal all causal and anti-causal resonances). Therefore the method preferably includes the step of processing more than one differential-phase spectrum calculated at circles with different radius, as this yields an improved robustness.
- the resulting differential-phase spectra are much less noisy than group delay functions, but still zeros may exist anywhere in the z-plane.
- a single unexpected zero causes the same type of spiky effect for the frequency regions, where the zero is close to the analysis circle.
- a zero-removal technique is proposed that effectively calculates noise-free differential-phase spectra. The procedure comprises the steps of:
- the roots (zeros) of a z-transform polynomial can be determined by a numerical method.
- the obtained set of roots of the z-transform polynomial can be divided into two sets of roots (which corresponds to dividing the z-transform polynomial into two polynomials).
- the obtained two sets of roots correspond to the spectral representation of glottal flow and vocal tract contributions of speech signal: when classifying the roots according to their distance to the origin of the z-plane (i.e., their radius), roots outside the unit circle are classified as glottal flow roots and roots inside the unit circle as vocal tract roots.
- glottal flow roots, which lie outside the unit circle, are removed from the complete set of zeros and then the differential-phase spectrum calculation is performed.
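The steps above can be sketched as follows. The text only says the roots are found "by a numerical method"; the Durand-Kerner iteration used here is a stand-in chosen because it needs no external library, and all function names are illustrative. The differential-phase value is computed directly from the retained zeros, taking the retained part as the product of (1 − Z_m/z), so each zero contributes 1 − Re{z/(z − Z_m)}.

```python
import cmath
import math

def find_zeros(x, iterations=200):
    """Roots of X(z) = sum x[n] * z**(-n) via Durand-Kerner (assumes x[0] != 0)."""
    degree = len(x) - 1
    monic = [c / x[0] for c in x]
    roots = [complex(0.4, 0.9) ** k for k in range(degree)]
    for _ in range(iterations):
        for i in range(degree):
            z = roots[i]
            value = 0j
            for c in monic:                 # Horner evaluation of the polynomial
                value = value * z + c
            denom = 1 + 0j
            for j in range(degree):
                if j != i:
                    denom *= z - roots[j]
            roots[i] = z - value / denom
    return roots

def split_zeros(zeros):
    """Classify zeros by radius: inside vs outside the unit circle."""
    inside = [z for z in zeros if abs(z) < 1.0]
    outside = [z for z in zeros if abs(z) >= 1.0]
    return inside, outside

def differential_phase_from_zeros(zeros, radius, freq):
    """Differential-phase value at radius * exp(j*2*pi*freq) for the given zeros."""
    z = radius * cmath.exp(2j * math.pi * freq)
    return len(zeros) - sum((z / (z - zm)).real for zm in zeros)

# Toy signal with zeros at 0.5, -0.8 (inside) and 2.0 (outside).
signal = [1.0, -1.7, -1.0, 0.8]
inside, outside = split_zeros(find_zeros(signal))
# Zeros outside the unit circle are dropped before the calculation.
dps_value = differential_phase_from_zeros(inside, 1.1, 0.0)
```

In a real analysis frame the polynomial degree is in the hundreds, where a dedicated root finder would be the more practical choice; the toy degree here only keeps the sketch short.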
- An example of synthetic speech analysis is presented in FIG. 6 for the zero-removal technique and its effect on the differential-phase spectrum.
- the first row of plots includes the actual amplitude spectrum of the glottal flow (FIG. 6a) and the amplitude spectrum of the vocal tract (FIG. 6b) used in synthesis.
- the aim is to estimate the resonance peak (formant) locations of these two systems directly from the speech signal, which is constructed by convolution of these two systems and an impulse train to obtain several cycles of speech signal.
- An all-pole vocal tract filter (of a typical vowel “a” with normalised resonance frequencies at 0.075, 0.15, 0.275, 0.4 for 16000 Hz) is used for synthesis. This synthetic speech signal is windowed for analysis.
- Peak picking is performed on these spectra, and the sign and frequency of each peak are stored.
- the negative peak in FIG. 6i will be classified as the glottal formant peak and the positive peaks in FIG. 6j will be classified as vocal tract formant peaks in the final decision.
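Peak picking with stored signs can be sketched as below. This is a minimal illustration: the threshold parameter, the function names and the toy spectrum are assumptions. The sign-to-label mapping follows the example in the text (negative peak as glottal formant, positive peaks as vocal tract formants); the text states that the final decision also depends on the radius of the analysis circle, which this sketch does not model.

```python
def pick_peaks(spectrum, threshold):
    """Return (bin, sign) for local extrema whose magnitude reaches threshold."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        v = spectrum[i]
        if v >= threshold and v > spectrum[i - 1] and v > spectrum[i + 1]:
            peaks.append((i, +1))
        elif v <= -threshold and v < spectrum[i - 1] and v < spectrum[i + 1]:
            peaks.append((i, -1))
    return peaks

def classify(peaks):
    """Split peaks by sign into vocal tract and glottal formant candidates."""
    vocal_tract = [i for i, sign in peaks if sign > 0]
    glottal = [i for i, sign in peaks if sign < 0]
    return vocal_tract, glottal

# Toy differential-phase spectrum (assumed values, one entry per frequency bin).
toy = [0.0, 1.0, 5.0, 1.0, 0.0, -1.0, -6.0, -1.0, 0.0, 2.0, 0.0]
peaks = pick_peaks(toy, 3.0)
vocal_tract_bins, glottal_bins = classify(peaks)
```

A bin index k converts to a frequency in Hz as k·fs/N for a sampling rate fs and N frequency bins.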
- FIG. 7 summarises a process of estimating resonance frequencies 700 according to one embodiment in a flowchart. The various steps are as described previously.
- the process 700 begins with speech data 702 that is input to a windowing state 710 . Proceeding to state 720 , process 700 performs a z-transform and then advances to state 730 for calculation of zeros. Continuing at state 740 , process 700 performs classification of zeros according to radius. If the radius (r) is less than one (that is, inside the unit circle (UC)), process 700 advances to state 750 and performs a differential-phase spectrum calculation outside the UC. Based on the results of the calculations at state 750 , process 700 then performs peak picking at state 752 .
- UC unit circle
- Otherwise (the radius is greater than one, that is, outside the UC), process 700 advances to state 754 and performs a differential-phase spectrum calculation inside the UC. Based on the results of the calculations at state 754, process 700 then performs peak picking at state 756. At the completion of peak picking states 752 or 756, process 700 continues at state 760 to perform classification of vocal tract formants and glottal flow formants according to the sign of the peak and the radius of the analysis circle for the differential-phase spectrum calculation. The results of the classification performed at state 760 are vocal tract formant frequencies 762 and glottal flow formant frequencies 764.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/568,150 US7333931B2 (en) | 2003-08-11 | 2004-08-11 | Method for estimating resonance frequencies |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US49437503P | 2003-08-11 | 2003-08-11 | |
US56405404P | 2004-04-21 | 2004-04-21 | |
US10/568,150 US7333931B2 (en) | 2003-08-11 | 2004-08-11 | Method for estimating resonance frequencies |
PCT/BE2004/000116 WO2005031702A1 (fr) | 2003-08-11 | 2004-08-11 | Method for estimating resonance frequencies |
Publications (2)
Publication Number | Publication Date |
---|---|
US20060229868A1 US20060229868A1 (en) | 2006-10-12 |
US7333931B2 true US7333931B2 (en) | 2008-02-19 |
Family
ID=34396150
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/568,150 Expired - Fee Related US7333931B2 (en) | 2003-08-11 | 2004-08-11 | Method for estimating resonance frequencies |
Country Status (5)
Country | Link |
---|---|
US (1) | US7333931B2 (fr) |
EP (1) | EP1665228A1 (fr) |
JP (1) | JP2007501957A (fr) |
AU (1) | AU2004276847B2 (fr) |
WO (1) | WO2005031702A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120004906A1 (en) * | 2009-02-04 | 2012-01-05 | Martin Hagmuller | Method for separating signal paths and use for improving speech using electric larynx |
US11610597B2 (en) | 2020-05-29 | 2023-03-21 | Shure Acquisition Holdings, Inc. | Anti-causal filter for audio signal processing |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8385864B2 (en) * | 2006-02-21 | 2013-02-26 | Wolfson Dynamic Hearing Pty Ltd | Method and device for low delay processing |
US20120089392A1 (en) * | 2010-10-07 | 2012-04-12 | Microsoft Corporation | Speech recognition user interface |
US9231805B2 (en) * | 2012-07-09 | 2016-01-05 | Telefonaktiebolaget L M Ericsson (Publ) | Device for carrier phase recovery |
US9865247B2 (en) * | 2014-07-03 | 2018-01-09 | Google Inc. | Devices and methods for use of phase information in speech synthesis systems |
GB2537802A (en) * | 2015-02-13 | 2016-11-02 | Univ Sheffield | Parameter estimation and control method and apparatus |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6069857A (en) * | 1991-02-15 | 2000-05-30 | Discovision Associates | Optical disc system having improved circuitry for performing blank sector check on readable disc |
US6704711B2 (en) * | 2000-01-28 | 2004-03-09 | Telefonaktiebolaget Lm Ericsson (Publ) | System and method for modifying speech signals |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0670750B2 (ja) * | 1987-08-19 | 1994-09-07 | 日本電気株式会社 | Pole-zero analysis device with estimation of phase characteristics |
JPH01232224A (ja) * | 1988-03-11 | 1989-09-18 | Matsushita Electric Ind Co Ltd | Resonance frequency extraction device |
JP3150277B2 (ja) * | 1995-10-30 | 2001-03-26 | 松下電器産業株式会社 | Linear prediction coefficient calculation device |
US6195632B1 (en) * | 1998-11-25 | 2001-02-27 | Matsushita Electric Industrial Co., Ltd. | Extracting formant-based source-filter data for coding and synthesis employing cost function and inverse filtering |
JP2002169579A (ja) * | 2000-12-01 | 2002-06-14 | Takayuki Arai | Device for embedding additional data in an audio signal and device for reproducing additional data from an audio signal |
SE0101175D0 (sv) * | 2001-04-02 | 2001-04-02 | Coding Technologies Sweden Ab | Aliasing reduction using complex-exponential-modulated filterbanks |
JP2003157100A (ja) * | 2001-11-22 | 2003-05-30 | Nippon Telegr & Teleph Corp <Ntt> | Voice communication method and device, and voice communication program |
-
2004
- 2004-08-11 WO PCT/BE2004/000116 patent/WO2005031702A1/fr active Search and Examination
- 2004-08-11 US US10/568,150 patent/US7333931B2/en not_active Expired - Fee Related
- 2004-08-11 EP EP04761477A patent/EP1665228A1/fr not_active Withdrawn
- 2004-08-11 AU AU2004276847A patent/AU2004276847B2/en not_active Ceased
- 2004-08-11 JP JP2006522850A patent/JP2007501957A/ja active Pending
Non-Patent Citations (7)
Title |
---|
Bozkurt et al., "Mixed-Phase Speech Modeling and Formant Estimation, Using Differential Phase Spectrums," PROC. ISCA ITRW VOQUAL 2003, 'Online! Aug. 27, 2003, pp. 21-24, XP002312214. |
Doval et al., "Spectral Correlates of Glottal Waveform Models: An Analytic Study," 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, abstract, XP002312219. |
Doval et al., "The Voice Source as a causal/anticausal linear filter," PROC. ISCA ITRW VOQUAL 2003, 'Online! Aug. 27, 2003, sheets 15-20, XP002312215. |
Duncan et al., "A Nonparametric Method of Formant Estimation Using Group Delay Spectra," 1989 International Conference on Acoustics, Speech and Signal Processing, abstract, XP002312217. |
Jackson, "Noncausal ARMA Modeling of Voiced Speech," IEEE Transactions on Acoustics, Speech and Signal Processing, Oct. 1989, abstract, XP002312218. |
Reddy et al., "High-Resolution Formant Extraction from Linear-Prediction Phase Spectra," IEEE Transactions on Acoustics, Speech and Signal Processing, Dec. 1984, abstract, XP002312216. |
Reddy et al., "High-Resolution Formant Extraction from Linear-Prediction Phase Spectra," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, No. 6, Dec. 6, 1984, pp. 1136-1144. |
Also Published As
Publication number | Publication date |
---|---|
US20060229868A1 (en) | 2006-10-12 |
AU2004276847A1 (en) | 2005-04-07 |
JP2007501957A (ja) | 2007-02-01 |
WO2005031702A1 (fr) | 2005-04-07 |
EP1665228A1 (fr) | 2006-06-07 |
AU2004276847B2 (en) | 2009-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Drugman et al. | Causal–anticausal decomposition of speech using complex cepstrum for glottal source estimation | |
Bozkurt et al. | Chirp group delay analysis of speech signals | |
Lim et al. | All-pole modeling of degraded speech | |
US6047254A (en) | System and method for determining a first formant analysis filter and prefiltering a speech signal for improved pitch estimation | |
Bozkurt et al. | Mixed-phase speech modeling and formant estimation, using differential phase spectrums | |
JP2007293285A (ja) | Enhancement and extraction of formants of speech signals | |
CN101599272B (zh) | Pitch search method and device | |
US8942977B2 (en) | System and method for speech recognition using pitch-synchronous spectral parameters | |
US7333931B2 (en) | Method for estimating resonance frequencies | |
US9466285B2 (en) | Speech processing system | |
Deng et al. | A structured speech model with continuous hidden dynamics and prediction-residual training for tracking vocal tract resonances | |
Bozkurt | Zeros of the z-transform (ZZT) representation and chirp group delay processing for the analysis of source and filter characteristics of speech signals | |
Bonafonte et al. | Duration modeling with expanded HMM applied to speech recognition | |
Bozkurt et al. | A method for glottal formant frequency estimation | |
Christensen | A method for low-delay pitch tracking and smoothing | |
Drugman et al. | Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques | |
Walker et al. | Advanced methods for glottal wave extraction | |
Arroabarren et al. | Glottal source parameterization: a comparative study | |
US5911170A (en) | Synthesis of acoustic waveforms based on parametric modeling | |
Rudoy et al. | Conditionally linear Gaussian models for estimating vocal tract resonances. | |
Bank | Accurate and efficient modeling of beating and two-stage decay for string instrument synthesis | |
Giri et al. | Block sparse excitation based all-pole modeling of speech | |
Funaki et al. | WLP-based TV-CAR speech analysis and its evaluation for F0 estimation | |
US6259014B1 (en) | Additive musical signal analysis and synthesis based on global waveform fitting | |
Ding et al. | Fast and robust joint estimation of vocal tract and voice source parameters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: FACULTE POLYTECNIQUE DE MONS, BELGIUM Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BOZKURT, BARIS;DUTOIT, THIERRY;D'ALESSANDRO, CHRISTOPHE;AND OTHERS;REEL/FRAME:017581/0348;SIGNING DATES FROM 20051208 TO 20051216 |
|
CC | Certificate of correction | ||
REMI | Maintenance fee reminder mailed | ||
LAPS | Lapse for failure to pay maintenance fees | ||
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20120219 |