TW200721108A - Apparatus and method for normalizing and converting speech waveforms into equal sized patterns of linear predict code vectors using elastic frames and classification by bayesian classifier - Google Patents

Apparatus and method for normalizing and converting speech waveforms into equal sized patterns of linear predict code vectors using elastic frames and classification by bayesian classifier

Info

Publication number
TW200721108A
TW200721108A
Authority
TW
Taiwan
Prior art keywords
syllable
waveform
pattern
elastic frames
feature
Prior art date
Application number
TW094140528A
Other languages
Chinese (zh)
Other versions
TWI297487B (en)
Inventor
Tze-Fen Li
Original Assignee
Tze-Fen Li
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tze-Fen Li filed Critical Tze-Fen Li
Priority to TW94140528A priority Critical patent/TWI297487B/en
Publication of TW200721108A publication Critical patent/TW200721108A/en
Application granted granted Critical
Publication of TWI297487B publication Critical patent/TWI297487B/en

Links

Landscapes

  • Auxiliary Devices For Music (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Abstract

The speech waveform of a syllable is a nonlinear, time-varying response system, and hence the syllable also has a nonlinear, time-varying dynamic feature changing with the waveform. The present invention discloses a pattern matching system applicable to syllable recognition, theoretically and statistically derived from this nonlinear, time-varying waveform. The invention includes a converting means that applies a uniform segmentation to the speech waveform using E small, equal elastic frames without filter, half overlapped, which stretch or contract themselves to cover the variable length of the whole speech waveform of a known syllable. Since a signal has a linear relation with its previous signals [1,3,4], the nonlinear, time-varying waveform is approximated by a linear regression model in each of the E short frames, normalizing and converting the waveform into a sequence of E linear predictive coding (LPC) cepstra vectors, such that the same j-th elastic frame, j=1,...,E, produces, one by one in order, the same LPC cepstra (LPCC) vector by the least squares method for the same syllable. Many speakers pronouncing the same known syllable produce nonlinear, time-varying waveforms of various lengths. The E elastic frames normalize and convert them into samples of E LPCC vectors such that the same LPCC vectors in the samples are pulled into the same time positions in the sequence of E vectors. The sample means and sample variances of the LPCC are stored in a database as an equal sized standard pattern representing the feature of the known syllable. 
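The conversion described above can be sketched in code. This is a minimal illustration of the general technique, not the patented implementation: a waveform of any length is cut into E equal, half-overlapped frames whose size stretches or contracts with the signal, and each frame yields one LPC-cepstrum vector via the least-squares (autocorrelation/Levinson-Durbin) solution, so every input produces a fixed-size pattern of E vectors. All function names, the frame count E, and the LPC order are illustrative assumptions.

```python
import numpy as np

def elastic_frames(signal, E=12):
    """Split `signal` into E half-overlapped frames covering its whole length.

    With 50% overlap, E frames of length L span roughly N = (L/2) * (E + 1)
    samples, so L is chosen from N to stretch or contract with the input.
    """
    N = len(signal)
    L = max(2, int(np.ceil(2 * N / (E + 1))))   # frame length scales with N
    step = L // 2                               # half-overlap hop
    return [signal[j * step : j * step + L] for j in range(E)]

def lpc_coeffs(frame, order=8):
    """LPC coefficients by the autocorrelation method (Levinson-Durbin)."""
    frame = np.asarray(frame, dtype=float)
    r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] if r[0] > 0 else 1e-9
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err      # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]  # order-update of a
        err = max(err * (1 - k * k), 1e-12)             # guard against zero error
    return a[1:]

def lpc_to_cepstrum(a, n_ceps=8):
    """Convert LPC coefficients to LPC cepstra by the standard recursion."""
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= len(a) else 0.0
        for k in range(1, n):
            if n - k <= len(a):
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def waveform_to_pattern(signal, E=12, order=8):
    """Normalize a variable-length waveform into an equal sized E x order pattern."""
    return np.array([lpc_to_cepstrum(lpc_coeffs(f, order), order)
                     for f in elastic_frames(signal, E)])
```

Because the frame length L is derived from the total signal length N, two utterances of very different durations both yield exactly E LPCC vectors, which is what makes the stored sample means and variances directly comparable across speakers.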
The pattern matching system further includes a converting means, using a uniform segmentation of the speech waveform of an unknown syllable with E small, equal elastic frames without filter, half overlapped to stretch or contract themselves to cover the variable length of the whole speech waveform of the unknown syllable, for normalizing and converting the waveform of the input unknown syllable into an equal sized categorizing pattern of E LPC cepstra vectors representing the unknown syllable. The pattern matching system further includes a Bayesian categorizing means for matching the equal sized standard pattern representing a known syllable against the equal sized categorizing pattern representing an unknown syllable by computing a Bayesian misclassification probability for each of the known syllables. The Bayesian categorizing means further includes a comparing and identification means for selecting the known syllable with the least misclassification probability as the identified syllable for the input unknown syllable. One advantage of the pattern matching system of the invention is that the elastic frames normalize the various lengths of waveform before the feature of an unknown syllable is extracted, and as soon as the feature is extracted, the simplified Bayes rule can immediately classify the unknown syllable without additionally normalizing, adjusting, compressing or warping the feature pattern as in dynamic time-warping methods. Another advantage of the invention is that the E elastic frames pull the same features to the same time positions in the feature pattern from the different waveforms representing the same syllable. A further advantage is that the elastic frames can adjust waveforms that are too short or too long, as pronounced in fast or slow speech, and convert them into equal sized feature patterns that can be classified immediately. 
This raises the recognition applicability of the invention, especially for English words made of short basic phonemes. The pattern matching system in the invention has the following features: (1) The pattern recognition system is statistically and theoretically derived from the nonlinear, time-varying waveforms without any arbitrary, artificial or experimental adjustments for better pattern recognition. (2) Since the waveform of a syllable is normalized before feature extraction, the E equal elastic frames can use the same j-th elastic frame, j=1,...,E, to catch, one by one in order, the same LPC feature vector for the same syllable and different LPC feature vectors for different syllables. (3) The system can be used readily and immediately without additional training or computation of thresholds and parameters. (4) The waveform normalization before feature extraction by E equal elastic frames saves computation time in training and recognition. (5) The waveform normalization before feature extraction significantly increases recognition rates. (6) The waveform normalization before feature extraction significantly simplifies the algorithms used in the pattern matching system. (7) The waveform normalization before feature extraction increases the applicability of the speech recognition system: the E equal elastic frames stretch or contract themselves to cover the whole speech waveform denoting a syllable, producing an equal sized pattern of E LPC cepstra vectors by the least squares method, and hence the pattern recognition system in the invention can recognize syllables with very short or very long speech waveforms, especially English words made of short basic phonemes. (8) The waveform normalization before feature extraction with E equal elastic frames provides a simple feature easily extracted from the speech signal waveform. 
(9) The Bayesian categorizing means needs less time for classification of an equal sized categorizing pattern of LPC cepstra, which have normal distributions, and attains the least misclassification probability, hence providing higher recognition rates.
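The Bayesian matching step above can be sketched as follows. This is an illustrative assumption of one common simplified Bayes rule consistent with the abstract (diagonal-covariance Gaussians built from the stored sample means and variances), not necessarily the patent's exact decision formula; the function names and the database layout are hypothetical. Minimizing the negative log-likelihood over syllables corresponds to choosing the class with the least misclassification probability under the stated normality assumption.

```python
import numpy as np

def bayes_score(pattern, means, variances):
    """Negative log-likelihood of an E x D LPCC pattern under a diagonal
    Gaussian model given by per-position sample means and variances."""
    v = np.maximum(variances, 1e-9)   # guard against zero sample variance
    return float(np.sum((pattern - means) ** 2 / (2 * v) + 0.5 * np.log(v)))

def classify(pattern, database):
    """Select the known syllable whose standard pattern (means, variances)
    gives the least Bayesian score, i.e. the least misclassification
    probability under equal priors."""
    return min(database, key=lambda label: bayes_score(pattern, *database[label]))
```

In use, `database` maps each known syllable to the sample means and sample variances accumulated from many speakers' equal sized E-vector patterns; classifying an unknown pattern is then a single pass over the database with no warping or re-normalization.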

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW94140528A TWI297487B (en) 2005-11-18 2005-11-18 A method for speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW94140528A TWI297487B (en) 2005-11-18 2005-11-18 A method for speech recognition

Publications (2)

Publication Number Publication Date
TW200721108A true TW200721108A (en) 2007-06-01
TWI297487B TWI297487B (en) 2008-06-01

Family

ID=45069112

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94140528A TWI297487B (en) 2005-11-18 2005-11-18 A method for speech recognition

Country Status (1)

Country Link
TW (1) TWI297487B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI402824B (en) * 2009-10-15 2013-07-21 Univ Nat Cheng Kung A pronunciation variation generation method for spontaneous speech synthesis
TWI512719B (en) * 2013-02-01 2015-12-11 Tencent Tech Shenzhen Co Ltd An acoustic language model training method and apparatus

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9396723B2 (en) 2013-02-01 2016-07-19 Tencent Technology (Shenzhen) Company Limited Method and device for acoustic language model training
CN111712874B (en) * 2019-10-31 2023-07-14 支付宝(杭州)信息技术有限公司 Method, system, device and storage medium for determining sound characteristics

Also Published As

Publication number Publication date
TWI297487B (en) 2008-06-01

Similar Documents

Publication Publication Date Title
Tiwari MFCC and its applications in speaker recognition
KR101785500B1 (en) A monophthong recognition method based on facial surface EMG signals by optimizing muscle mixing
CN103824557A (en) Audio detecting and classifying method with customization function
Ghule et al. Feature extraction techniques for speech recognition: A review
Rashmi Review of algorithms and applications in speech recognition system
Ramgire et al. A survey on speaker recognition with various feature extraction and classification techniques
Thiruvengatanadhan Speech recognition using SVM
TW200721108A (en) Apparatus and method for normalizing and converting speech waveforms into equal sized patterns of linear predict code vectors using elastic frames and classification by bayesian classifier
Kekre et al. Speaker recognition using Vector Quantization by MFCC and KMCG clustering algorithm
Nancy et al. Audio based emotion recognition using mel frequency cepstral coefficient and support vector machine
Dutta Dynamic time warping based approach to text-dependent speaker identification using spectrograms
Rozario et al. Performance comparison of multiple speech features for speaker recognition using artifical neural network
Yousfi et al. Isolated Iqlab checking rules based on speech recognition system
Büker et al. Double compressed AMR audio detection using long-term features and deep neural networks
Aggarwal et al. Grid search analysis of nu-SVC for text-dependent speaker-identification
Al-Rawahy et al. Text-independent speaker identification system based on the histogram of DCT-cepstrum coefficients
Swathy et al. Review on feature extraction and classification techniques in speaker recognition
Kinney et al. Wavelet packet cepstral analysis for speaker recognition
Mengistu et al. Text independent Amharic language dialect recognition: A hybrid approach of VQ and GMM
Waghmare et al. Speaker Recognition for forensic application: A Review
Bora et al. Speaker identification for biometric access control using hybrid features
TW200744067A (en) Apparatus and method for classifying similar mandarin syllables using two consecutive Bayesian decision rules
Sharma et al. Text-independent speaker identification using backpropagation mlp network classifier for a closed set of speakers
Gambhir et al. A run-through: Text independent speaker identification using deep learning
Besbes et al. Wavelet packet energy and entropy features for classification of stressed speech

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees