AU646060B2 - Adaptation of reference speech patterns in speech recognition - Google Patents
Adaptation of reference speech patterns in speech recognition
- Publication number
- AU646060B2
- Authority
- AU
- Australia
- Prior art keywords
- pattern
- speech pattern
- speech
- stored
- reference speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Machine Translation (AREA)
- Noise Elimination (AREA)
Description
P/00/011 28/5/91 Regulation 3.2
AUSTRALIA
Patents Act 1990
ORIGINAL
COMPLETE SPECIFICATION
STANDARD PATENT
Invention Title: "ADAPTATION OF REFERENCE SPEECH PATTERNS IN SPEECH RECOGNITION"
The following statement is a full description of this invention, including the best method of performing it known to us:-
This invention relates to a method of continuously adapting stored reference speech patterns for speech recognition to environment dependent pronunciation variations.
Conventional automatic adaptation techniques use a speech recognition unit performing a filter bank analysis, wherein adaptation to ambient noise that is constant during word input is implemented by arranging that, in a filter bank output signal consisting of useful signal and noise signal components, a constant-level noise component caused by the noisy environment is suppressed by differentiation and subsequent integration.
It is also known to carry out speech pattern adaptation in order to achieve speaker independent characteristics. To do this, the original reference speech patterns are derived by clustering several speakers, adaptation being effected only upon identification of an incorrect speech pattern (Smith et al., "Template Adaptation in a Hypersphere Word Classifier", IEEE, ICASSP, 1990, Vol. 1, pp. 565-568).
Investigations have shown, however, that the results produced by speech recognisers depend not only on the acoustic environment (background noise) but also on the speaker's psychic state (stress factor) (Junqua, Y. Anglade, "Acoustic and Perceptual Studies of Lombard Speech: Application to Isolated-Words Automatic Speech Recognition", IEEE, ICASSP, 1990, Vol. 2, pp. 841-844).
Especially if speech recognisers are used in motor vehicles (e.g. car telephone), the driver/speaker is under considerable stress at higher speeds, so that his or her pronunciation will differ greatly from the normal pronunciation, for which the reference speech patterns were trained.
It is, therefore, the object of the invention to provide a method which permits adaptation to the environment induced pronunciation variations.
According to the present invention, there is provided a method of continuously adapting stored reference speech patterns for speech recognition to environment induced pronunciation variations wherein after termination of each recognition phase, a new reference speech pattern is calculated from a correct speech pattern and the associated stored reference speech pattern by weighted averaging, and stored instead of the previous reference speech pattern.
In particular, the method according to the invention permits continuous adaptation to changes in pronunciation caused by ambient influences. Besides the changed pronunciation, the method according to the invention implicitly takes into account background/disturbing noise, because such noise is already contained in the new reference speech pattern, so that the hitherto used, very complicated methods employing filter banks are no longer necessary.
The method according to the invention will now be explained in more detail with the aid of an embodiment.
It is assumed that reference speech patterns determined in a training phase are stored in a speech recogniser.
For each reference speech pattern, coefficients $r_r(i)$ representing a number of feature vectors are stored as a frame, with i = 0, … up to the number of coefficients of a frame (e.g. i = 0, …, 9).
An i-th coefficient computed in the training phase will hereinafter be denoted by $r_r(i)$. According to the invention, in order to continuously adapt such a stored (original) reference speech pattern to environment induced pronunciation variations, after termination of a recognition phase a new reference speech pattern, i.e., new coefficients $r_n(i)$, is calculated from a correct speech pattern and the stored coefficients $r_r(i)$ by weighted averaging, and stored instead of the original coefficients.
The weighted averaging is performed according to the following rule:

$$r_n(i) = \frac{a \cdot r_r(i) + s(i)}{a + 1}$$

where $s(i)$ is the i-th coefficient of the correct speech pattern and a is the weighting factor. For a = 1, conventional averaging results. Preferably, a weighting factor of a = 3 is chosen, whereby the originally calculated coefficients are weighted more heavily than the newly determined coefficients to effect a continuous rather than momentary correction of the reference speech patterns.
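As an illustration only, the weighted-averaging rule can be sketched in Python, reading it as r_n(i) = (a · r_r(i) + s(i)) / (a + 1) with r_r the stored coefficients and s the coefficients of the correct speech pattern. The function name `adapt_frame` and the sample values are assumptions made for this sketch and do not come from the specification.

```python
def adapt_frame(stored, correct, a=3.0):
    """Weighted average of one frame: r_n(i) = (a * r_r(i) + s(i)) / (a + 1).

    stored  -- coefficients r_r(i) of the stored reference frame
    correct -- coefficients s(i) of the correctly recognised frame
    a       -- weighting factor; a = 1 gives the plain mean, a = 3 is the
               value the text names as preferred
    """
    if len(stored) != len(correct):
        raise ValueError("frames must have the same number of coefficients")
    return [(a * r + s) / (a + 1.0) for r, s in zip(stored, correct)]


# Hypothetical coefficient values, chosen only to make the arithmetic easy.
stored_frame = [4.0, 8.0, 0.0]
correct_frame = [8.0, 4.0, 4.0]

print(adapt_frame(stored_frame, correct_frame, a=3.0))  # [5.0, 7.0, 1.0]
print(adapt_frame(stored_frame, correct_frame, a=1.0))  # [6.0, 6.0, 2.0]
```

With a = 3 each new coefficient moves only a quarter of the way towards the newly observed value, which matches the stated aim of a continuous rather than momentary correction.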
In a further embodiment of the invention, a new reference speech pattern for continuous adaptation is calculated only if the speech pattern just identified as correct and the stored reference speech pattern are sufficiently similar. To this end, a pattern spacing calculated in the recognition phase, i.e., the spacing between the respective coefficient values of the correct speech pattern and the coefficient values of the stored reference speech pattern, is compared with a predetermined threshold value. If the pattern spacing is less than the threshold value, i.e., if the two speech patterns are similar enough, a new reference speech pattern will be calculated using the aforementioned averaging, and stored.
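The threshold test in this embodiment might look roughly like the following Python sketch. The text does not say how the pattern spacing is computed, so the mean absolute coefficient difference used here is an assumption, as are the names `pattern_spacing` and `adapt_if_similar` and the sample values.

```python
def pattern_spacing(stored, correct):
    """Assumed spacing measure: mean absolute difference of the coefficients."""
    return sum(abs(r - s) for r, s in zip(stored, correct)) / len(stored)


def adapt_if_similar(stored, correct, threshold, a=3.0):
    """Adapt the stored frame only if the spacing is below the threshold.

    Returns (frame, adapted): the new frame and True when the spacing is
    less than the threshold, otherwise an unchanged copy and False.
    """
    if pattern_spacing(stored, correct) < threshold:
        new_frame = [(a * r + s) / (a + 1.0) for r, s in zip(stored, correct)]
        return new_frame, True
    return list(stored), False


stored_frame = [4.0, 8.0, 0.0]
correct_frame = [8.0, 4.0, 4.0]   # spacing to stored_frame is 4.0

print(adapt_if_similar(stored_frame, correct_frame, threshold=5.0))
# ([5.0, 7.0, 1.0], True)
print(adapt_if_similar(stored_frame, correct_frame, threshold=4.0))
# ([4.0, 8.0, 0.0], False)
```

The comparison is strict, following the wording "less than the threshold value", so a spacing exactly equal to the threshold leaves the stored pattern untouched.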
If the correct speech pattern differs in length from the stored reference speech pattern, i.e., if it contains more frames, adaptation to the length of the stored reference speech pattern will be effected by dynamic programming. The dynamic programming method is described by H. Sakoe, S. Chiba, "Dynamic Programming Algorithm Optimisation for Spoken Word Recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No. 1, February 1978.
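One way to picture the length adaptation is a basic dynamic time warping (DTW) alignment, sketched below in Python. This is a simplified textbook variant without the slope constraints discussed by Sakoe and Chiba, and it is not taken from the specification: the city-block frame distance, the averaging of frames that map to the same reference frame, and all function names are assumptions made for illustration.

```python
def frame_distance(x, y):
    """Assumed local distance: city-block distance between two frames."""
    return sum(abs(p - q) for p, q in zip(x, y))


def dtw_path(ref, obs):
    """Return the minimum-cost alignment as a list of (ref_idx, obs_idx)."""
    n, m = len(ref), len(obs)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], obs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end of both patterns to their start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def warp_to_reference_length(ref, obs):
    """Map obs onto the frame count of ref by averaging aligned frames."""
    groups = [[] for _ in ref]
    for i, j in dtw_path(ref, obs):
        groups[i].append(obs[j])
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]


# Hypothetical one-coefficient frames: the observed pattern has one extra frame.
reference = [[0.0], [10.0]]
observed = [[0.0], [1.0], [10.0]]

print(dtw_path(reference, observed))                  # [(0, 0), (0, 1), (1, 2)]
print(warp_to_reference_length(reference, observed))  # [[0.5], [10.0]]
```

After warping, the observed pattern has as many frames as the stored reference, so the coefficient-wise weighted averaging can be applied frame by frame.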
The method according to the invention thus permits the trained reference speech patterns to be automatically adapted to changing ambient situations by weighted averaging whenever a correct speech pattern is identified.
If, because of a large disturbing-noise component in the speech signal, additional noise reduction by means of conventional filter-bank analyses should be necessary, the resulting negative effects of the spectral discolorations are also compensated for by the continuous reference speech pattern adaptation according to the invention.
Claims (4)
1. A method of continuously adapting stored reference speech patterns for speech recognition to environment induced pronunciation variations wherein after termination of each recognition phase, a new reference speech pattern is calculated from a correct speech pattern and the associated stored reference speech pattern by weighted averaging, and stored instead of the previous reference speech pattern.
2. A method as claimed in claim 1, wherein the weighted averaging is performed according to the following rule: $r_n(i) = \frac{a \cdot r_r(i) + s(i)}{a + 1}$, where i = 0, … (number of coefficients of a frame), $r_n(i)$ = i-th coefficient of the new reference speech pattern of a frame, $r_r(i)$ = i-th coefficient of the originally stored reference pattern of a frame, $s(i)$ = i-th coefficient of the correct speech pattern of a frame, and a = weighting factor ≥ 1, preferably 3.
3. A method as claimed in claim 1 or 2, wherein a pattern spacing is calculated in the said recognition phase from the respective coefficient value of the correct speech pattern and the coefficient value of the stored speech pattern, and compared with a predetermined threshold value, a new reference speech pattern being formed only if the pattern spacing is less than the threshold value.
4. A method as claimed in any one of the preceding claims, wherein if the correct speech pattern differs in length from the stored reference speech pattern, adaptation to the length of the stored reference speech pattern is effected by time-dynamic programming.
DATED THIS TWENTY-FOURTH DAY OF NOVEMBER 1993
ALCATEL N.V.
ABSTRACT
Investigations have shown that the results produced by speech recognisers depend not only on the acoustic environment (background noise) but also on the speaker's psychic state (stress). According to this invention, stored reference speech patterns are continuously adapted to pronunciation variations by calculating, for each correctly identified word, a new reference speech pattern from the stored pattern and the newly identified pattern by weighted averaging, and storing it instead of the previous pattern.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE4024890 | 1990-08-06 | ||
DE19904024890 DE4024890A1 (en) | 1990-08-06 | 1990-08-06 | ADAPTATION OF REFERENCE LANGUAGE PATTERNS TO ENVIRONMENTAL PRONOUNCEMENT VERSIONS |
Publications (2)
Publication Number | Publication Date |
---|---|
AU8130791A (en) | 1992-02-13 |
AU646060B2 (en) | 1994-02-03 |
Family
ID=6411712
Family Applications (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
AU81307/91A | Ceased | AU646060B2 (en) | 1990-08-06 | 1991-07-25 | Adaptation of reference speech patterns in speech recognition |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0470411A3 (en) |
JP (1) | JPH04240700A (en) |
AU (1) | AU646060B2 (en) |
DE (1) | DE4024890A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE4322372A1 (en) * | 1993-07-06 | 1995-01-12 | Sel Alcatel Ag | Method and device for speech recognition |
DE19804047C2 (en) * | 1998-02-03 | 2000-03-16 | Deutsche Telekom Mobil | Method and device for increasing the probability of recognition of speech recognition systems |
DE19813061A1 (en) * | 1998-03-25 | 1999-09-30 | Keck Klaus | Arrangement for altering the micromodulations contained in electrical speech signals of telephone equipment |
GB2348035B (en) | 1999-03-19 | 2003-05-28 | Ibm | Speech recognition system |
DE60213195T8 (en) | 2002-02-13 | 2007-10-04 | Sony Deutschland Gmbh | Method, system and computer program for speech / speaker recognition using an emotion state change for the unsupervised adaptation of the recognition method |
DE102004063552A1 (en) * | 2004-12-30 | 2006-07-13 | Siemens Ag | Method for determining pronunciation variants of a word from a predefinable vocabulary of a speech recognition system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4829577A (en) * | 1986-03-25 | 1989-05-09 | International Business Machines Corporation | Speech recognition method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4297528A (en) * | 1979-09-10 | 1981-10-27 | Interstate Electronics Corp. | Training circuit for audio signal recognition computer |
JPH0792673B2 (en) * | 1984-10-02 | 1995-10-09 | 株式会社東芝 | Recognition dictionary learning method |
JP2584249B2 (en) * | 1986-10-31 | 1997-02-26 | 三洋電機株式会社 | Voice recognition phone |
- 1990-08-06: DE application DE19904024890, patent DE4024890A1, not active (Withdrawn)
- 1991-07-18: EP application EP19910112008, patent EP0470411A3, not active (Withdrawn)
- 1991-07-25: AU application AU81307/91A, patent AU646060B2, not active (Ceased)
- 1991-08-02: JP application JP19434091A, patent JPH04240700A, active (Pending)
Also Published As
Publication number | Publication date |
---|---|
EP0470411A3 (en) | 1992-09-16 |
JPH04240700A (en) | 1992-08-27 |
DE4024890A1 (en) | 1992-02-13 |
AU8130791A (en) | 1992-02-13 |
EP0470411A2 (en) | 1992-02-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1248192C (en) | Semi-monitoring speaker self-adaption | |
FI117954B (en) | System for verifying a speaker | |
US5812973A (en) | Method and system for recognizing a boundary between contiguous sounds for use with a speech recognition system | |
JP2768274B2 (en) | Voice recognition device | |
JP2000507714A (en) | Language processing | |
WO1998038632A1 (en) | Method and system for establishing handset-dependent normalizing models for speaker recognition | |
US20020077813A1 (en) | System and method for relatively noise robust speech recognition | |
US5734793A (en) | System for recognizing spoken sounds from continuous speech and method of using same | |
JPH01291298A (en) | Adaptive voice recognition device | |
Delcroix et al. | Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation | |
JPH075892A (en) | Voice recognition method | |
JPH09160584A (en) | Voice adaptation device and voice recognition device | |
US6574596B2 (en) | Voice recognition rejection scheme | |
AU646060B2 (en) | Adaptation of reference speech patterns in speech recognition | |
Gupta et al. | High-accuracy connected digit recognition for mobile applications | |
JP2002123286A (en) | Voice recognizing method | |
JP2001083986A (en) | Method for forming statistical model | |
Molau et al. | Enhanced histogram normalization in the acoustic feature space. | |
Yoma et al. | Weighted Viterbi algorithm and state duration modelling for speech recognition in noise | |
US20080228477A1 (en) | Method and Device For Processing a Voice Signal For Robust Speech Recognition | |
GB2231700A (en) | Speech recognition | |
Gouvêa et al. | Adaptation and compensation: Approaches to microphone and speaker independence in automatic speech recognition | |
NZ239139A (en) | Speech recognition: adapting stored speech patterns to ambient | |
Fissore et al. | HMM modeling for speaker independent voice dialing in car environment | |
KR20040073145A (en) | Performance enhancement method of speech recognition system |