AU646060B2

AU646060B2 - Adaptation of reference speech patterns in speech recognition

Info

Publication number: AU646060B2
Application number: AU81307/91A
Authority: AU
Inventors: Uwe Ackermann; Susanne Dvorak; Thomas Hormann; Dieter Kopp
Original assignee: Alcatel NV
Current assignee: Alcatel Lucent NV
Priority date: 1990-08-06
Filing date: 1991-07-25
Publication date: 1994-02-03
Anticipated expiration: 2011-07-25
Also published as: EP0470411A3; JPH04240700A; DE4024890A1; AU8130791A; EP0470411A2

Description

P/00/011 2815/91 Regulation 3.2 AUSTRALIA Patents Act 1990

ORIGINAL

COMPLETE SPECIFICATION
STANDARD PATENT
Invention Title: "ADAPTATION OF REFERENCE SPEECH PATTERNS IN SPEECH RECOGNITION"

The following statement is a full description of this invention, including the best method of performing it known to us:-

This invention relates to a method of continuously adapting stored reference speech patterns for speech recognition to environment-dependent pronunciation variations.

Conventional automatic adaptation techniques use a speech recognition unit that performs a filter bank analysis. Adaptation to ambient noise that is constant during word input is implemented as follows: in a filter bank output signal consisting of useful-signal and noise-signal components, the constant-level noise component caused by the noisy environment is suppressed by differentiation and subsequent integration.
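The prior-art noise suppression summarised above can be illustrated for a single filter-bank channel with a minimal sketch. It assumes (this is not stated in the text) that the noise adds a constant level to the channel output and that the first frame contains noise only, i.e., silence before the word. The function name `suppress_constant_noise` is hypothetical.

```python
from typing import List


def suppress_constant_noise(channel: List[float]) -> List[float]:
    """Differentiate, then integrate, one filter-bank channel (illustrative sketch).

    Assumes additive constant noise and a noise-only first frame.
    """
    if not channel:
        return []
    # Differentiation: a constant offset has a first difference of zero.
    diffs = [b - a for a, b in zip(channel, channel[1:])]
    # Integration (running sum from zero) restores the time course of the
    # channel relative to the first, noise-only frame.
    out = [0.0]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# Useful signal that starts in silence, plus a constant noise level of 5.0.
useful = [0.0, 1.0, 4.0, 2.0, 0.0]
noisy = [u + 5.0 for u in useful]
print(suppress_constant_noise(noisy))  # [0.0, 1.0, 4.0, 2.0, 0.0]
```

Because the running sum starts at zero, whatever level the first frame had is removed along with the noise, which is why the noise-only first frame matters in this sketch.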

It is also known to carry out speech pattern adaptation in order to achieve speaker-independent characteristics. To do this, the original reference speech patterns are derived by clustering several speakers, adaptation being effected only upon identification of an incorrect speech pattern (Smith et al., "Template Adaptation in a Hypersphere Word Classifier", IEEE ICASSP 1990, Vol. 1, pp. 565-568).

Investigations have shown, however, that the results produced by speech recognisers depend not only on the acoustic environment (background noise) but also on the speaker's psychological state (stress factor) (Junqua, Y. Anglade, "Acoustic and Perceptual Studies of Lombard Speech: Application to Isolated-Words Automatic Speech Recognition", IEEE ICASSP 1990, Vol. 2, pp. 841-844).

Especially if speech recognisers are used in motor vehicles (e.g., in a car telephone), the driver/speaker is under considerable stress at higher speeds, so that his or her pronunciation will differ greatly from the normal pronunciation for which the reference speech patterns were trained.

It is, therefore, the object of the invention to provide a method which permits adaptation to the environment induced pronunciation variations.

According to the present invention, there is provided a method of continuously adapting stored reference speech patterns for speech recognition to environment induced pronunciation variations wherein after termination of each recognition phase, a new reference speech pattern is calculated from a correct speech pattern and the associated stored reference speech pattern by weighted averaging, and stored instead of the previous reference speech pattern.

In particular, the method according to the invention permits continuous adaptation to changes in pronunciation caused by ambient influences. Besides the changed pronunciation, the method according to the invention implicitly takes into account background/disturbing noise, because such noise is already contained in the new reference speech pattern, so that the hitherto used, very complicated methods employing filter banks are no longer necessary.

The method according to the invention will now be explained in more detail with the aid of an embodiment.

It is assumed that reference speech patterns determined in a training phase are stored in a speech recogniser.

For each reference speech pattern, coefficients $r(i)$ representing a number of feature vectors are stored frame by frame, with $i = 0 \ldots$ (number of coefficients of a frame), e.g., $i = 0 \ldots 9$.

The i-th coefficient computed in the training phase will hereinafter be denoted by $r_r(i)$. According to the invention, in order to continuously adapt such a stored (original) reference speech pattern to environment-induced pronunciation variations, a new reference speech pattern, i.e., new coefficients $r_n(i)$, is calculated after termination of a recognition phase from a correct speech pattern and the stored coefficients $r_r(i)$ by weighted averaging, and stored instead of the original coefficients.

The weighted averaging is performed according to the following rule:

$$r_n(i) = \frac{a \times r_r(i) + s(i)}{a + 1}$$

where $s(i)$ is the i-th coefficient of the correct speech pattern and $a$ is the weighting factor. For $a = 1$, conventional averaging (the arithmetic mean) results. Preferably, a weighting factor of $a = 3$ is chosen, whereby the originally calculated coefficients are weighted more heavily than the newly determined coefficients, so as to effect a continuous rather than momentary correction of the reference speech patterns.
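The averaging rule can be sketched as follows, applied coefficient by coefficient and frame by frame. This is a minimal sketch assuming both patterns already have the same number of frames and coefficients; the function name `adapt_reference` and the list-of-lists layout are illustrative choices, not taken from the specification.

```python
from typing import List

Frame = List[float]    # one frame, e.g. 10 coefficients (i = 0 ... 9)
Pattern = List[Frame]  # a reference or spoken speech pattern


def adapt_reference(reference: Pattern, correct: Pattern, a: float = 3.0) -> Pattern:
    """Return r_n(i) = (a * r_r(i) + s(i)) / (a + 1) for every frame and coefficient."""
    if a < 1:
        raise ValueError("weighting factor a must be >= 1")
    if len(reference) != len(correct):
        raise ValueError("patterns must have the same number of frames")
    new_pattern = []
    for r_frame, s_frame in zip(reference, correct):
        if len(r_frame) != len(s_frame):
            raise ValueError("frames must have the same number of coefficients")
        new_pattern.append([(a * r + s) / (a + 1) for r, s in zip(r_frame, s_frame)])
    return new_pattern


ref = [[1.0, 2.0], [3.0, 4.0]]
spoken = [[5.0, 2.0], [3.0, 0.0]]
print(adapt_reference(ref, spoken))       # [[2.0, 2.0], [3.0, 3.0]]  (a = 3)
print(adapt_reference(ref, spoken, a=1))  # [[3.0, 2.0], [3.0, 2.0]]  (plain mean)
```

Rewriting the rule as $r_n(i) = r_r(i) + (s(i) - r_r(i))/(a+1)$ shows why it corrects continuously: with $a = 3$ each update moves the reference only a quarter of the way toward the new input, and after $k$ updates the contribution of the originally trained value has decayed to $(3/4)^k$ of its weight.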

In a further embodiment of the invention, a new reference speech pattern for continuous adaptation is calculated only if the speech pattern just identified as correct and the stored reference speech pattern are sufficiently similar. To this end, a pattern spacing (distance) calculated in the recognition phase, i.e., the distance between the respective coefficient values of the correct speech pattern and the coefficient values of the stored reference speech pattern, is compared with a predetermined threshold value. If the pattern spacing is less than the threshold value, i.e., if the two speech patterns are similar enough, a new reference speech pattern is calculated using the aforementioned weighted averaging, and stored.
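The threshold test can be sketched as a guard around the weighted averaging. The specification does not say which distance measure is used or what the threshold is, so this sketch assumes the mean Euclidean distance between corresponding frames of equal-length patterns; `pattern_distance`, `maybe_adapt`, and the `threshold` parameter are hypothetical names.

```python
import math
from typing import List

Pattern = List[List[float]]


def pattern_distance(reference: Pattern, correct: Pattern) -> float:
    """Assumed pattern spacing: mean Euclidean distance between corresponding frames."""
    if not reference or len(reference) != len(correct):
        raise ValueError("patterns must be non-empty and of equal length")
    total = sum(math.dist(r, s) for r, s in zip(reference, correct))
    return total / len(reference)


def maybe_adapt(reference: Pattern, correct: Pattern,
                threshold: float, a: float = 3.0) -> Pattern:
    """Adapt only if the pattern spacing is below the threshold."""
    if pattern_distance(reference, correct) >= threshold:
        return reference  # not similar enough: keep the stored reference unchanged
    return [[(a * r + s) / (a + 1) for r, s in zip(rf, sf)]
            for rf, sf in zip(reference, correct)]


ref = [[0.0, 0.0]]
print(maybe_adapt(ref, [[3.0, 4.0]], threshold=6.0))  # [[0.75, 1.0]]
print(maybe_adapt(ref, [[3.0, 4.0]], threshold=5.0))  # [[0.0, 0.0]] (distance 5.0 is not below 5.0)
```

The guard keeps a single badly matched utterance (for instance one confirmed as correct despite a large mismatch) from pulling the stored reference far from its trained values.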

If the correct speech pattern differs in length from the stored reference speech pattern, i.e., if it contains more frames, adaptation to the length of the stored reference speech pattern is effected by dynamic programming. The dynamic programming method is described by H. Sakoe, S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No. 1, February 1978.
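Length adaptation by dynamic programming can be sketched as follows. This is a basic, unconstrained dynamic time warping (DTW) alignment, simpler than the slope-constrained, symmetric form in the Sakoe-Chiba paper. The rule of averaging all spoken frames aligned to one reference frame is an assumption, since the specification does not say how the aligned frames are combined. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Pattern = List[List[float]]


def dtw_path(reference: Pattern, spoken: Pattern) -> List[Tuple[int, int]]:
    """Basic DTW without slope constraints; returns (reference_frame, spoken_frame) pairs."""
    n, m = len(reference), len(spoken)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(reference[i - 1], spoken[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the final cell to recover the optimal alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def warp_to_reference(reference: Pattern, spoken: Pattern) -> Pattern:
    """Map the spoken pattern onto the reference length (assumed averaging rule)."""
    buckets: List[Pattern] = [[] for _ in reference]
    for ri, si in dtw_path(reference, spoken):
        buckets[ri].append(spoken[si])
    return [[sum(col) / len(frames) for col in zip(*frames)] for frames in buckets]


def adapt_with_dtw(reference: Pattern, spoken: Pattern, a: float = 3.0) -> Pattern:
    """Warp the spoken pattern to the reference length, then apply weighted averaging."""
    warped = warp_to_reference(reference, spoken)
    return [[(a * r + s) / (a + 1) for r, s in zip(rf, sf)]
            for rf, sf in zip(reference, warped)]


ref = [[0.0], [10.0]]              # 2 frames
spoken = [[2.0], [2.0], [10.0]]    # 3 frames: the first sound was held longer
print(dtw_path(ref, spoken))           # [(0, 0), (0, 1), (1, 2)]
print(warp_to_reference(ref, spoken))  # [[2.0], [10.0]]
print(adapt_with_dtw(ref, spoken))     # [[0.5], [10.0]]
```

Because every reference frame appears at least once on a DTW path, each bucket is non-empty, and the adapted pattern keeps exactly the length of the stored reference.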

The method according to the invention thus permits the trained reference speech patterns to be automatically adapted to changing ambient situations by weighted averaging whenever a correct speech pattern is identified.
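The complete adaptation cycle can be sketched with a small store of reference patterns that is updated after each recognition phase. Everything here is illustrative: `AdaptiveReferenceStore` is a hypothetical class, the nearest-reference decision by summed squared coefficient differences (equal-length patterns) is an assumed stand-in for the recogniser, and the `is_correct` flag stands for however the system establishes that a result was correct, which the text does not specify.

```python
from typing import Dict, List

Pattern = List[List[float]]


class AdaptiveReferenceStore:
    """Hypothetical store of reference speech patterns keyed by word label."""

    def __init__(self, references: Dict[str, Pattern], a: float = 3.0):
        self.references = references
        self.a = a

    def recognize(self, spoken: Pattern) -> str:
        """Pick the nearest reference (assumed distance; equal-length patterns)."""
        def dist(ref: Pattern) -> float:
            return sum((r - s) ** 2
                       for rf, sf in zip(ref, spoken) for r, s in zip(rf, sf))
        return min(self.references, key=lambda w: dist(self.references[w]))

    def end_recognition_phase(self, word: str, spoken: Pattern, is_correct: bool) -> None:
        """After a recognition phase, replace the reference if the result was correct."""
        if not is_correct:
            return
        a = self.a
        old = self.references[word]
        self.references[word] = [[(a * r + s) / (a + 1) for r, s in zip(rf, sf)]
                                 for rf, sf in zip(old, spoken)]


store = AdaptiveReferenceStore({"yes": [[0.0, 0.0]], "no": [[10.0, 10.0]]})
stressed_yes = [[2.0, 0.0]]  # the same word, pronounced differently under stress
for _ in range(3):
    word = store.recognize(stressed_yes)
    store.end_recognition_phase(word, stressed_yes, is_correct=True)
print(store.references["yes"])  # [[1.15625, 0.0]]
```

Over successive recognition phases the reference for "yes" drifts gradually toward the changed pronunciation (0.5, then 0.875, then 1.15625), while the unused reference for "no" is left untouched.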

If, because of a large disturbing-noise component in the speech signal, additional noise reduction by means of conventional filter-bank analysis should be necessary, the resulting negative effects of spectral coloration are also compensated for by the continuous reference speech pattern adaptation according to the invention.

Claims

1. A method of continuously adapting stored reference speech patterns for speech recognition to environment induced pronunciation variations wherein after termination of each recognition phase, a new reference speech pattern is calculated from a correct speech pattern and the associated stored reference speech pattern by weighted averaging, and stored instead of the previous reference speech pattern.

2. A method as claimed in claim 1 wherein the weighted averaging is performed according to the following rule:

$$r_n(i) = \frac{a \times r_r(i) + s(i)}{a + 1}$$

where $i = 0 \ldots$ (number of coefficients of a frame), $r_n$ = i-th coefficient of the new reference speech pattern of a frame, $r_r$ = i-th coefficient of the originally stored reference pattern of a frame, $s$ = i-th coefficient of the correct speech pattern of a frame, and $a$ = weighting factor $\geq 1$, preferably 3.

3. A method as claimed in claim 1 or 2, wherein a pattern spacing is calculated in the said recognition phase from the respective coefficient value of the correct speech pattern and the coefficient value of the stored speech pattern, and compared with a predetermined threshold value, a new reference speech pattern being formed only if the pattern spacing is less than the threshold value.

4. A method as claimed in any one of the preceding claims wherein, if the correct speech pattern differs in length from the stored reference speech pattern, adaptation to the length of the stored reference speech pattern is effected by time-dynamic programming.

DATED THIS TWENTY-FOURTH DAY OF NOVEMBER 1993

ALCATEL N.V.

ABSTRACT

Investigations have shown that the results produced by speech recognisers depend not only on the acoustic environment (background noise) but also on the speaker's psychological state (stress). According to this invention, stored reference speech patterns are continuously adapted to pronunciation variations by calculating, for each correctly identified word, a new reference speech pattern from the stored pattern and the newly identified pattern by weighted averaging, and storing it instead of the previous pattern.