AU646060B2 - Adaptation of reference speech patterns in speech recognition - Google Patents

Adaptation of reference speech patterns in speech recognition

Publication number
AU646060B2
Authority
AU
Australia
Prior art keywords
pattern
speech pattern
speech
stored
reference speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU81307/91A
Other versions
AU8130791A (en)
Inventor
Uwe Ackermann
Susanne Dvorak
Thomas Hormann
Dieter Kopp
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alcatel Lucent NV
Original Assignee
Alcatel NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alcatel NV
Publication of AU8130791A
Application granted
Publication of AU646060B2
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Machine Translation (AREA)
  • Noise Elimination (AREA)

Description

P/00/011 28/5/91 Regulation 3.2
AUSTRALIA
Patents Act 1990
ORIGINAL
COMPLETE SPECIFICATION
STANDARD PATENT
Invention Title: "ADAPTATION OF REFERENCE SPEECH PATTERNS IN SPEECH RECOGNITION"
The following statement is a full description of this invention, including the best method of performing it known to us:-
This invention relates to a method of continuously adapting stored reference speech patterns for speech recognition to environment dependent pronunciation variations.
Conventional automatic adapting techniques use a speech recognition unit performing a filter bank analysis, wherein adaptation to ambient noise which is constant during word input is implemented by arranging that, in a filter bank output signal consisting of useful signal and noise signal components, a constant level noise component caused by the noisy environment is suppressed by differentiation and subsequent integration.
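The differentiate-then-integrate idea described above can be illustrated with a minimal Python sketch. It is not taken from the cited prior art: the function name `suppress_constant_noise`, the use of NumPy, and the assumption that the first sample of a channel contains noise only (so that integration can start from zero) are all illustrative choices.

```python
import numpy as np


def suppress_constant_noise(channel_output):
    """Remove a constant noise level from one filter-bank channel.

    Hypothetical helper: differentiation cancels any constant offset,
    and integrating the differences from zero restores the shape of
    the useful signal without that offset.
    """
    x = np.asarray(channel_output, dtype=float)
    differences = np.diff(x)  # constant component vanishes here
    # Integrate, assuming the first sample holds noise only.
    return np.concatenate(([0.0], np.cumsum(differences)))


useful = np.array([0.0, 0.0, 1.0, 3.0, 2.0, 0.0])
noisy = useful + 5.0  # constant noise level of 5 added to the channel
cleaned = suppress_constant_noise(noisy)
```

Under these assumptions `cleaned` equals `useful`; a noise level that changes during the word would not be removed this way.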
It is also known to carry out speech pattern adaptation in order to achieve speaker independent characteristics. To do this, the original reference speech patterns are derived by clustering several speakers, adaptation being effected only upon identification of an incorrect speech pattern (Smith et al., "Template Adaptation in a Hypersphere Word Classifier", IEEE, ICASSP, 1990, Vol. 1, pp. 565-568).
Investigations have shown, however, that the results produced by speech recognisers depend not only on the acoustic environment (background noise) but also on the speaker's psychic state (stress factor) (Junqua, Y. Anglade, "Acoustic and Perceptual Studies of Lombard Speech: Application to Isolated-Words Automatic Speech Recognition", IEEE, ICASSP, 1990, Vol. 2, pp. 841-844).
Especially if speech recognisers are used in motor vehicles (e.g., car telephone), the driver/speaker is under considerable stress at higher speeds, so that his or her pronunciation will differ greatly from the normal pronunciation, for which the reference speech patterns were trained.
It is, therefore, the object of the invention to provide a method which permits adaptation to the environment induced pronunciation variations.
According to the present invention, there is provided a method of continuously adapting stored reference speech patterns for speech recognition to environment induced pronunciation variations wherein after termination of each recognition phase, a new reference speech pattern is calculated from a correct speech pattern and the associated stored reference speech pattern by weighted averaging, and stored instead of the previous reference speech pattern.
In particular, the method according to the invention permits continuous adaptation to changes in pronunciation caused by ambient influences. Besides the changed pronunciation, the method according to the invention implicitly takes into account background/disturbing noise, because such noise is already contained in the new reference speech pattern, so that the hitherto used, very complicated methods employing filter banks are no longer necessary.
The method according to the invention will now be explained in more detail with the aid of an embodiment.
It is assumed that reference speech patterns determined in a training phase are stored in a speech recogniser.
For each reference speech pattern, coefficients $r(i)$ representing a number of feature vectors are stored as a frame, with $i = 0, \ldots,$ (number of coefficients of a frame), e.g., $i = 0, \ldots, 9$.
An i-th coefficient computed in the training phase will hereinafter be denoted by $r_r(i)$.
According to the invention, in order to continuously adapt such a stored (original) reference speech pattern to environment induced pronunciation variations, after termination of a recognition phase a new reference speech pattern, e.g., new coefficients $r_n(i)$, is calculated from a correct speech pattern and the stored coefficients $r_r(i)$ by weighted averaging, and stored instead of the original coefficients.
The weighted averaging is performed according to the following rule:
$$r_n(i) = \frac{a \times r_r(i) + s(i)}{a + 1}$$
where $s(i)$ is the i-th coefficient of the correct speech pattern and $a$ is the weighting factor. For $a = 1$, the conventional averaging results. Preferably, a weighting factor of $a = 3$ is chosen, whereby the originally calculated coefficients are weighted more heavily than the newly determined coefficients to effect a continuous rather than momentary correction of the reference speech patterns.
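As a minimal sketch of this averaging step, the following Python snippet reads the rule as new = (a × stored + correct) / (a + 1), which is how claim 2 defines the terms. The function name `adapt_reference`, the use of NumPy arrays for frames, and the sample values are illustrative assumptions, not part of the specification.

```python
import numpy as np


def adapt_reference(stored, correct, a=3.0):
    """Weighted average of a stored reference pattern and a correct pattern.

    Hypothetical helper: `stored` and `correct` hold coefficients of equal
    shape; `a` is the weighting factor applied to the stored coefficients.
    """
    stored = np.asarray(stored, dtype=float)
    correct = np.asarray(correct, dtype=float)
    if stored.shape != correct.shape:
        raise ValueError("patterns must have the same shape")
    return (a * stored + correct) / (a + 1.0)


stored_frame = np.array([4.0, 8.0, 0.0])
correct_frame = np.array([8.0, 4.0, 4.0])
# With a = 3 the stored coefficients count three times as much
# as the newly recognised ones.
new_frame = adapt_reference(stored_frame, correct_frame, a=3.0)
# new_frame is [5.0, 7.0, 1.0]
```

In a recogniser, `new_frame` would then overwrite `stored_frame`, so the next adaptation starts from the already corrected reference.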
In a further embodiment of the invention, a new reference speech pattern for continuous adaptation is calculated only if the speech pattern just identified as correct and the stored reference speech pattern are sufficiently similar. To this end, a pattern spacing calculated in the recognition phase, i.e., the spacing of the respective coefficient values of the correct speech pattern from the coefficient values of the stored reference speech pattern, is compared with a predetermined threshold value. If the pattern spacing is less than the threshold value, i.e., if the two speech patterns are similar enough, a new reference speech pattern will be calculated using the aforementioned averaging, and stored.
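The threshold test can be sketched as follows. The specification does not say how the pattern spacing is computed, so the mean absolute coefficient difference used here is purely an assumption, as are the names `pattern_spacing` and `adapt_if_similar` and the threshold value.

```python
import numpy as np


def pattern_spacing(stored, correct):
    """Hypothetical spacing measure: mean absolute coefficient difference."""
    stored = np.asarray(stored, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean(np.abs(stored - correct)))


def adapt_if_similar(stored, correct, threshold, a=3.0):
    """Return (pattern, adapted): average only if spacing < threshold."""
    stored = np.asarray(stored, dtype=float)
    correct = np.asarray(correct, dtype=float)
    if pattern_spacing(stored, correct) < threshold:
        # Similar enough: replace the reference by the weighted average.
        return (a * stored + correct) / (a + 1.0), True
    # Too far apart: keep the stored reference unchanged.
    return stored, False


reference = np.array([[1.0, 2.0], [3.0, 4.0]])
close_pattern = reference + 0.4
far_pattern = reference + 5.0

updated, was_adapted = adapt_if_similar(reference, close_pattern, threshold=1.0)
kept, was_adapted_far = adapt_if_similar(reference, far_pattern, threshold=1.0)
```

Here the close pattern shifts the reference by a quarter of the difference, while the distant pattern leaves it untouched, which guards the reference against outliers.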
If the correct speech pattern differs in length from the stored reference speech pattern, i.e., if it contains more frames, adaptation to the length of the stored reference speech pattern will be effected by dynamic programming. The dynamic programming method is described by H. Sakoe, S. Chiba, "Dynamic Programming Algorithm Optimisation for Spoken Word Recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No. 1, February 1978.
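One way to picture the length adaptation is a basic dynamic time warping pass that aligns the frames of the correct pattern with those of the stored reference and then averages the frames mapped onto each reference frame. This is only a sketch under assumptions: the specification does not give the local distance, the step pattern, or how aligned frames are merged, so the Euclidean distance, the three-way step, and the names `dtw_path` and `warp_to_reference_length` are illustrative.

```python
import numpy as np


def dtw_path(ref, inp):
    """Return the minimum-cost alignment path as (ref_index, inp_index) pairs."""
    n, m = len(ref), len(inp)
    # Local distance between every reference frame and every input frame.
    dist = np.linalg.norm(ref[:, None, :] - inp[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
            cost[i, j] = dist[i - 1, j - 1] + best_prev
    # Backtrack from the end of both patterns to their start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda step: cost[step])
        path.append((i - 1, j - 1))
    return path[::-1]


def warp_to_reference_length(ref, inp):
    """Map `inp` onto the frame count of `ref` by averaging aligned frames."""
    ref = np.asarray(ref, dtype=float)
    inp = np.asarray(inp, dtype=float)
    sums = np.zeros_like(ref)
    counts = np.zeros(len(ref))
    for ref_idx, inp_idx in dtw_path(ref, inp):
        sums[ref_idx] += inp[inp_idx]
        counts[ref_idx] += 1
    return sums / counts[:, None]


stored = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
# The spoken pattern has one extra frame (the middle frame is repeated).
spoken = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
warped = warp_to_reference_length(stored, spoken)
```

After warping, `warped` has the same number of frames as `stored`, so the coefficient-wise weighted averaging can be applied frame by frame.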
The method according to the invention thus permits the trained reference speech patterns to be automatically adapted to changing ambient situations by weighted averaging whenever a correct speech pattern is identified.
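The effect of repeated adaptation can be made explicit with a short derivation. It reads the averaging rule as $r_n(i) = (a \times r_r(i) + s(i)) / (a + 1)$, following the definitions in claim 2, and it assumes, purely for illustration, that the same correct coefficient $s(i)$ is observed in $k$ successive recognition phases; the superscript $(k)$ is notation introduced here.

```latex
% One adaptation step moves the stored coefficient a fraction 1/(a+1)
% of the way towards the correct coefficient:
\[
  r_n(i) = \frac{a\, r_r(i) + s(i)}{a + 1}
         = r_r(i) + \frac{1}{a + 1}\,\bigl(s(i) - r_r(i)\bigr)
\]
% Repeating the step k times with the same s(i) shrinks the remaining
% difference geometrically:
\[
  r^{(k)}(i) - s(i) = \left(\frac{a}{a + 1}\right)^{k} \bigl(r^{(0)}(i) - s(i)\bigr)
\]
```

For $a = 3$ each step closes one quarter of the gap, and after four steps about $(3/4)^4 \approx 0.32$ of the original difference remains, which matches the description of a continuous rather than momentary correction.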
If, because of a large disturbing-noise component in the speech signal, additional noise reduction by means of conventional filter-bank analyses should be necessary, the resulting negative effects of the spectral discolorations are also compensated for by the continuous reference speech pattern adaptation according to the invention.

Claims (4)

1. A method of continuously adapting stored reference speech patterns for speech recognition to environment induced pronunciation variations wherein after termination of each recognition phase, a new reference speech pattern is calculated from a correct speech pattern and the associated stored reference speech pattern by weighted averaging, and stored instead of the previous reference speech pattern.
2. A method as claimed in claim 1 wherein the weighted averaging is performed according to the following rule: $r_n(i) = (a \times r_r(i) + s(i)) / (a + 1)$, where $i = 0, \ldots,$ (number of coefficients of a frame), $r_n(i)$ = i-th coefficient of the new reference speech pattern of a frame, $r_r(i)$ = i-th coefficient of the originally stored reference pattern of a frame, $s(i)$ = i-th coefficient of the correct speech pattern of a frame, and $a$ = weighting factor $\geq 1$, preferably 3.
3. A method as claimed in claim 1 or 2, wherein a pattern spacing is calculated in the said recognition phase from the respective coefficient value of the correct speech pattern and the coefficient value of the stored speech pattern, and compared with a predetermined threshold value, a new reference speech pattern being formed only if the pattern spacing is less than the threshold value.
4. A method as claimed in any one of the preceding claims wherein if the correct speech pattern differs in length from the stored reference speech pattern, adaptation to the length of the stored reference speech pattern is effected by time-dynamic programming.
DATED THIS TWENTY-FOURTH DAY OF NOVEMBER 1993
ALCATEL N.V.
ABSTRACT
Investigations have shown that the results produced by speech recognisers depend not only on the acoustic environment (background noise) but also on the speaker's psychic state (stress).
According to this invention, stored reference speech patterns are continuously adapted to pronunciation variations by calculating, for each correctly identified word, a new reference speech pattern from the stored pattern and the newly identified pattern by weighted averaging, and storing it instead of the previous pattern.
AU81307/91A 1990-08-06 1991-07-25 Adaptation of reference speech patterns in speech recognition Ceased AU646060B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE4024890 1990-08-06
DE19904024890 DE4024890A1 (en) 1990-08-06 1990-08-06 ADAPTATION OF REFERENCE LANGUAGE PATTERNS TO ENVIRONMENTAL PRONOUNCEMENT VERSIONS

Publications (2)

Publication Number Publication Date
AU8130791A AU8130791A (en) 1992-02-13
AU646060B2 AU646060B2 (en) 1994-02-03

Family

ID=6411712

Family Applications (1)

Application Number Title Priority Date Filing Date
AU81307/91A Ceased AU646060B2 (en) 1990-08-06 1991-07-25 Adaptation of reference speech patterns in speech recognition

Country Status (4)

Country Link
EP (1) EP0470411A3 (en)
JP (1) JPH04240700A (en)
AU (1) AU646060B2 (en)
DE (1) DE4024890A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4322372A1 (en) * 1993-07-06 1995-01-12 Sel Alcatel Ag Method and device for speech recognition
DE19804047C2 (en) * 1998-02-03 2000-03-16 Deutsche Telekom Mobil Method and device for increasing the probability of recognition of speech recognition systems
DE19813061A1 (en) * 1998-03-25 1999-09-30 Keck Klaus Arrangement for altering the micromodulations contained in electrical speech signals of telephone equipment
GB2348035B (en) 1999-03-19 2003-05-28 Ibm Speech recognition system
DE60213195T8 (en) 2002-02-13 2007-10-04 Sony Deutschland Gmbh Method, system and computer program for speech / speaker recognition using an emotion state change for the unsupervised adaptation of the recognition method
DE102004063552A1 (en) * 2004-12-30 2006-07-13 Siemens Ag Method for determining pronunciation variants of a word from a predefinable vocabulary of a speech recognition system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4829577A (en) * 1986-03-25 1989-05-09 International Business Machines Corporation Speech recognition method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4297528A (en) * 1979-09-10 1981-10-27 Interstate Electronics Corp. Training circuit for audio signal recognition computer
JPH0792673B2 (en) * 1984-10-02 1995-10-09 株式会社東芝 Recognition dictionary learning method
JP2584249B2 (en) * 1986-10-31 1997-02-26 三洋電機株式会社 Voice recognition phone

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4829577A (en) * 1986-03-25 1989-05-09 International Business Machines Corporation Speech recognition method

Also Published As

Publication number Publication date
EP0470411A3 (en) 1992-09-16
JPH04240700A (en) 1992-08-27
DE4024890A1 (en) 1992-02-13
AU8130791A (en) 1992-02-13
EP0470411A2 (en) 1992-02-12

Similar Documents

Publication Publication Date Title
CN1248192C (en) Semi-monitoring speaker self-adaption
FI117954B (en) System for verifying a speaker
US5812973A (en) Method and system for recognizing a boundary between contiguous sounds for use with a speech recognition system
JP2768274B2 (en) Voice recognition device
JP2000507714A (en) Language processing
WO1998038632A1 (en) Method and system for establishing handset-dependent normalizing models for speaker recognition
US20020077813A1 (en) System and method for relatively noise robust speech recognition
US5734793A (en) System for recognizing spoken sounds from continuous speech and method of using same
JPH01291298A (en) Adaptive voice recognition device
Delcroix et al. Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation
JPH075892A (en) Voice recognition method
JPH09160584A (en) Voice adaptation device and voice recognition device
US6574596B2 (en) Voice recognition rejection scheme
AU646060B2 (en) Adaptation of reference speech patterns in speech recognition
Gupta et al. High-accuracy connected digit recognition for mobile applications
JP2002123286A (en) Voice recognizing method
JP2001083986A (en) Method for forming statistical model
Molau et al. Enhanced histogram normalization in the acoustic feature space.
Yoma et al. Weighted Viterbi algorithm and state duration modelling for speech recognition in noise
US20080228477A1 (en) Method and Device For Processing a Voice Signal For Robust Speech Recognition
GB2231700A (en) Speech recognition
Gouvêa et al. Adaptation and compensation: Approaches to microphone and speaker independence in automatic speech recognition
NZ239139A (en) Speech recognition: adapting stored speech patterns to ambient
Fissore et al. HMM modeling for speaker independent voice dialing in car environment
KR20040073145A (en) Performance enhancement method of speech recognition system