WO1995005655B1

WO1995005655B1 - Method for recognizing a spoken word in the presence of interfering speech

Info

Publication number: WO1995005655B1
Application number: PCT/US1994/009353
Authority: WO
Filing date: 1994-08-15
Publication date: 1995-05-18

Abstract

A method for recognizing a spoken word in the presence of interfering speech begins by echo cancelling the voice prompt and any detected speech signal to produce a residual signal (60). Portions of the residual signal that have been most recently echo-cancelled are then continuously stored in a buffer (62). The energy in the residual signal is also continuously processed to determine onset of the spoken word (64). Upon detection of word onset, the portion of the residual signal then currently in the buffer is retained, the voice prompt is terminated, and the recognizer begins realtime recognition of subsequent portions of the residual signal (66). Upon detection of word completion (68), the method retrieves the portion of the residual signal that was retained in the buffer upon detection of word onset (70) and performs recognition of that portion (72).
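The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the frame-based layout, the `recognize` callback, and the parameters `threshold`, `buffer_len` and `hangover` are assumptions introduced here, and the residual frames are taken as already echo-cancelled.

```python
from collections import deque


def frame_energy(frame):
    """Mean squared amplitude of one frame of residual samples."""
    return sum(s * s for s in frame) / len(frame)


def recognize_with_barge_in(residual_frames, recognize, threshold=0.01,
                            buffer_len=3, hangover=2):
    """Hypothetical sketch of steps 62-72 of the abstract.

    `recognize` is an assumed callback that takes a list of frames and
    returns some recognition result; it stands in for the recognizer.
    """
    history = deque(maxlen=buffer_len)  # step 62: most recent residual frames
    retained = None                     # buffer contents frozen at word onset
    live_results = []                   # results of real-time recognition
    quiet = 0                           # consecutive low-energy frames seen

    for frame in residual_frames:
        if retained is None:
            history.append(frame)
            if frame_energy(frame) >= threshold:  # step 64: word onset
                retained = list(history)          # step 66: retain the buffer
                # The voice prompt would be terminated at this point.
            continue

        # Step 66: real-time recognition of the subsequent portions.
        live_results.append(recognize([frame]))

        # Step 68: treat a run of low-energy frames as word completion.
        quiet = quiet + 1 if frame_energy(frame) < threshold else 0
        if quiet >= hangover:
            break

    if retained is None:
        return None  # no word onset was detected

    # Steps 70 and 72: recognize the portion retained at word onset.
    first_result = recognize(retained)
    return first_result, live_results
```

The function returns the result for the retained portion together with the real-time results; how the two are combined into a single word is left open here because the abstract does not specify it.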

Claims

AMENDED CLAIMS

[received by the International Bureau on 30 March 1995 (30.03.95); original claim 5 cancelled; remaining claims amended (4 pages)]

1. A method for recognizing a spoken word in the presence of a voice message generated by a voice processing system, the voice processing system having a speech recognizer, comprising the steps of:

(a) echo cancelling the voice message and any detected speech signal to produce a residual signal;

(b) continuously storing a portion of the residual signal that has been most recently processed;

(c) processing the residual signal to detect a first portion of the spoken word;

(d) upon detection of the first portion of the spoken word, storing the portion of the residual signal including the first portion of the spoken word that has been most recently processed at the time of such detection, stopping echo cancelling of the voice message and any detected speech signal and initiating speech recognition of a second portion of the spoken word;

(e) thereafter initiating speech recognition of the stored first portion of the spoken word; and

(f) combining results of the speech recognition effected in steps (c) and (d) to determine the spoken word.
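Step (a) does not name a particular echo cancelling technique. Purely as an illustration, the sketch below uses a normalized LMS (NLMS) adaptive filter, which is an assumption made here and is not taken from the claims. The function name, the tap count, the step size `mu`, and the simulated echo path (a one-sample delay with gain 0.5) are likewise hypothetical.

```python
import random


def nlms_residual(prompt, mic, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the prompt echo from the mic signal.

    prompt: samples of the outgoing voice message (the echo source).
    mic:    samples of the incoming line signal (echo plus any caller speech).
    Returns the residual signal, one sample per input sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent prompt samples, newest first
    residual = []
    for x, d in zip(prompt, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # residual sample
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the input vector, scaled by its power.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


# Simulated check: the mic hears only a delayed, attenuated copy of the prompt.
rng = random.Random(0)
prompt = [rng.uniform(-1.0, 1.0) for _ in range(600)]
mic = [0.0] + [0.5 * p for p in prompt[:-1]]
residual = nlms_residual(prompt, mic)
```

With no caller speech present, the residual energy should fall toward zero as the filter adapts, so any later rise in residual energy can be attributed to the caller.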

2. The method as described in Claim 1 further including the step of ceasing the voice message upon detection of the first portion of the spoken word.

3. The method as described in Claim 1 further including the step of detecting completion of the spoken word prior to initiating speech recognition of the stored first portion of the spoken word.

4. The method as described in Claim 1 wherein the recognition of the second portion of the spoken word occurs.

6. The method as described in Claim 1 wherein the step of echo cancelling further includes the steps of estimating an energy level in the residual signal and comparing the estimated energy level to a predetermined threshold energy level.
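The energy estimate and threshold comparison of Claim 6 might look something like the sketch below. The exponentially smoothed estimator, the smoothing factor `alpha`, and the function name are assumptions chosen for illustration; the claim only requires that an energy level be estimated and compared to a predetermined threshold.

```python
def detect_onset(residual, threshold, alpha=0.9):
    """Return the index of the first sample at which the smoothed energy of
    the residual exceeds the threshold, or None if it never does."""
    energy = 0.0
    for i, sample in enumerate(residual):
        # Exponentially weighted running estimate of the signal energy.
        energy = alpha * energy + (1.0 - alpha) * sample * sample
        if energy > threshold:
            return i
    return None
```

Smoothing trades detection delay for robustness: a larger `alpha` ignores short spikes in the residual but reports word onset a few samples later.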

7. A method for recognizing a spoken word in the presence of a voice message generated by a voice processing system, the voice processing system having a speech recognizer, comprising the steps of:

(a) echo cancelling the voice message and any detected speech signal to produce a residual signal;

(b) continuously storing a portion of the residual signal that has been most recently echo cancelled;

(d) upon detection of the first portion of the spoken word, retaining the stored portion of the residual signal including the first portion of the spoken word that has been most recently processed at the time of such detection, ceasing the voice message, stopping echo cancelling of the voice message and any detected speech signal and initiating speech recognition of spoken word;

(e) thereafter initiating speech recognition of the first portion of the spoken word retained upon detection of the first portion of the spoken word; and

(f) combining results of the recognition effected in steps (d) and (e) to determine the spoken word.

8. The method as described in Claim 7 further including the step of detecting completion of the spoken word prior to initiating speech recognition of the retained first portion of the spoken word.

9. The method as described in Claim 7 wherein the recognition of the second portion of the spoken word occurs in realtime.

10. A method, using a single digital signal processor, for recognizing a spoken word in the presence of interfering speech, comprising the steps of:

(a) echo cancelling the interfering speech and any detected speech signal with the single digital signal processor to produce a residual signal;

(d) upon detection of the first portion of the spoken word, retaining the portion of the residual signal including the first portion of the spoken word that has been most recently processed at the time of such detection, ceasing the interfering speech and switching the single digital signal processor from echo cancelling of the interfering speech to speech recognition of a second portion of the spoken word;

(e) detecting completion of the spoken word;

(f) upon detection of completion of the spoken word, initiating speech recognition of the retained first portion of the spoken word; and

(g) combining results of the speech recognition effected in steps (d) and (f) to determine the spoken word.

STATEMENT UNDER ARTICLE 19

In response to the Notification of Transmittal of The International Search Report Or The Declaration, the Applicant has amended Claims 1-4 and 6-10 and cancelled Claim 5. Thus Claims 1-4 and 6-10 are now pending in the Application. The Applicant has adopted the language of the Abstract as suggested by the Examiner.

The Examiner rejected Claims 1-10 as being unpatentable over Hartwell, Johnson and Noso. The Claims have been amended to more particularly point out how, upon detection of a first portion of a spoken word, the portion of the residual signal including the first portion of the spoken word is retained, echo cancellation is stopped and speech recognition of a second portion of the spoken word is initiated. After the second portion of the spoken word is subjected to speech recognition, the stored first portion of the spoken word is then subjected to speech recognition. The first and second portions are finally combined to determine the spoken word. Neither the Hartwell nor the Johnson reference describes a method which first detects a portion of a spoken word within interfering speech and then stops echo cancelling to initiate speech recognition of a second portion of the detected word. This procedure enables an echo canceller and voice recognizer to be utilized with fewer system requirements than would otherwise be necessary.

The Hartwell reference discloses an apparatus which first echo cancels throughout a received spoken word for a predetermined period of time. After the predetermined time period expires, the echo cancelled spoken signal is subjected to speech recognition. Thus, the reference does not disclose stopping echo cancellation to initiate speech recognition upon detection of a spoken word, but rather continuous echo cancellation of the signal throughout a predetermined time period including the duration of the spoken word, followed by speech recognition of the echo cancelled signal including the spoken word. Additionally, the Hartwell, et al. reference does not disclose speech recognition of a second portion of a detected spoken word and then going back to speech recognize the first portion of the spoken word to determine the complete spoken word.

The Johnson, et al. reference describes a system wherein the cancellation procedure is begun after the initiation of a prompt or announcement and continues throughout the playing of the announcement. Even when a speech signal is detected within the echo cancelled incoming signal, the echo cancellation continues as the received signal is recorded. In the present invention, once the speech signal is detected, echo cancellation ceases and speech recognition begins. Furthermore, the Johnson, et al. reference describes the recording of an incoming speech signal, and not the voice recognition thereof as claimed by the Applicant.

The Applicant respectfully submits that the claims are allowable over the Noso, et al. reference for reasons similar to those discussed with respect to the Hartwell and Johnson references.

Furthermore, the Applicant respectfully submits that the Noso, et al. reference fails to disclose the use of a cancellation/recognition system wherein echo cancellation is stopped and speech recognition initiated upon the detection of a spoken word within a residual signal.

Submitted concurrently herewith are substitute pages 13-17 for pages 13-16 originally submitted with the Application. Upon review of the amended claims, it will be evident that the Claims are now fully patentable over the prior art.