CA2398579C - Intermediate voice and dtmf detector device for improved speech recognition utilization and penetration - Google Patents

Info

Publication number
CA2398579C
CA2398579C CA002398579A CA2398579A CA2398579C CA 2398579 C CA2398579 C CA 2398579C CA 002398579 A CA002398579 A CA 002398579A CA 2398579 A CA2398579 A CA 2398579A CA 2398579 C CA2398579 C CA 2398579C
Authority
CA
Canada
Prior art keywords
voice
speech recognition
dtmf
telephone devices
call control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA002398579A
Other languages
French (fr)
Other versions
CA2398579A1 (en)
Inventor
Paul Andrew Erb
Dieter Schulz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitel Networks Corp
Original Assignee
Mitel Networks Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitel Networks Corp
Priority to CA002398579A
Publication of CA2398579A1
Application granted
Publication of CA2398579C
Anticipated expiration
Expired - Lifetime

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q: SELECTING
    • H04Q3/00: Selecting arrangements
    • H04Q3/58: Arrangements providing connection between main exchange and sub-exchange or satellite
    • H04Q3/62: Arrangements providing connection between main exchange and sub-exchange or satellite for connecting to private branch exchanges
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q: SELECTING
    • H04Q2213/00: Indexing scheme relating to selecting arrangements in general and for multiplex systems
    • H04Q2213/13103: Memory
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q: SELECTING
    • H04Q2213/00: Indexing scheme relating to selecting arrangements in general and for multiplex systems
    • H04Q2213/1322: PBX
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q: SELECTING
    • H04Q2213/00: Indexing scheme relating to selecting arrangements in general and for multiplex systems
    • H04Q2213/13378: Speech recognition, speech analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q: SELECTING
    • H04Q2213/00: Indexing scheme relating to selecting arrangements in general and for multiplex systems
    • H04Q2213/13405: Dual frequency signaling, DTMF

Abstract

A telephone system is provided comprising a plurality of telephone devices, a call control for establishing calls between the telephone devices in response to receipt of commands, and at least one Voice and DTMF detector (VDD) resource allocated to the calls between the telephone devices for (i) detecting and transmitting DTMF tones received from the telephone devices to the call control for further call processing, and (ii) detecting and buffering voice received from the telephone devices and notifying the call control. At least one Speech Recognition Engine is also provided for receiving the voice from the Voice and DTMF detector resource upon notification of the call control and, in response, performing speech recognition and notifying the call control for further call processing.

Description

INTERMEDIATE VOICE AND DTMF DETECTOR DEVICE FOR IMPROVED SPEECH RECOGNITION UTILIZATION AND PENETRATION
Field of the Invention
The present invention relates in general to voice recognition systems for telephony, and more specifically to a method and apparatus for providing voice and tone detection prior to allocation of a speech recognition engine to a call.
Background of the Invention
The integration of speech recognition into modern day PBX systems provides new user interface capabilities to augment traditional telephone device DTMF tones and 'feature' keys for call control. Speech recognition capabilities may be provided through the allocation of speech recognition engines (SREs) to a call in progress. For example, PBX systems manufactured by Mitel Networks Corporation may be configured with a number of ports for allocating Speak@Ease™ SRE resources. Each SRE resource is a general purpose "device" which provides all speech recognition and related capabilities (which may be composed of one or more processes). These capabilities include, but are not limited to, voice detection, DTMF detection, voice recognition, and application processing.
As speech recognition becomes more common, it is anticipated that a much larger number of SRE resources will be required to accommodate increased utilization. The provisioning of additional SRE resources to meet anticipated usage increases the overall cost of a PBX installation. As a result, the potential penetration of speech recognition applications is subject to cost considerations and is limited to cases where the cost is justified.
According to the existing state of the art, SRE resources are associated with a call whenever there is a potential need for speech recognition, regardless of whether speech recognition is actually invoked during the call. Consequently, PBX systems are now configured with a plurality of SRE resources that are dedicated to servicing one or more speech recognition applications in a PBX network. When all of the SRE resources are in use, subsequent requests for the supported speech recognition applications are denied or deferred until an SRE resource becomes available. When the SRE is servicing a user, all capabilities are provided, regardless of utilization. For example, if a user initiates a request for which an SRE is allocated and simply dials digits at the telephone device (i.e. dialing the destination number rather than speaking the name), then the full capabilities of the SRE are underutilized.
However, as indicated above, call control allocates the SRE resource whenever speech recognition may be required, regardless of actual utilization.
Summary of the Invention
According to the present invention, a voice and DTMF detector resource (VDD) is allocated to a call prior to allocating an SRE resource. The SRE resource is only allocated when speech recognition capabilities are required. The Voice and DTMF detector resource (VDD) is a limited capability digital signal processor that can be provided in volume at relatively low cost (using existing Digital Signal Processing (DSP) technology). The presence or absence of the Voice and DTMF detector resource (VDD) does not impact the SRE resource.
Brief Description of the Detailed Drawings
A description of a preferred embodiment of the present invention is provided herein below with reference to the accompanying drawings in which:
Figure 1 is a block diagram of a telephony system incorporating a plurality of intermediate Voice and DTMF detector resources (VDDs) and Speech Recognition Engines (SREs), according to the present invention; and
Figure 2 shows the structure and operation of a VDD in accordance with the preferred embodiment.
Detailed Description of Preferred Embodiments
Turning to Figure 1, a telephony system is shown incorporating the principles of the present invention. Specifically, a plurality of telephones 1A, 1B, etc. are provided for conducting telephone calls between parties under control of call control 3. In accordance with the present invention, a plurality of Voice and DTMF detector resources (VDDs) 5A, 5B...5F may be allocated to an originating call by call control 3 in a manner similar to a traditional DTMF receiver. Each VDD 5A, 5B, etc. is a special purpose digital signal processing (DSP) resource which provides DTMF detection, voice detection, voice buffering and voice streaming, as discussed in greater detail below. A plurality of SRE resources 7 is also provided, as is known in the art.
In operation, call control 3 allocates a VDD (e.g. VDD 5A) in response to one of the telephones (e.g. phone 1A) initially going off hook. The SRE resources 7 are not immediately allocated to the call, in contrast with prior art systems. Upon detection of a DTMF tone, the VDD 5A performs digit collection and reporting as is done by a traditional DTMF receiver. Upon detection of voice, however, the VDD 5A provides an indication to call control 3. Call control 3 then allocates an SRE resource 7, establishes a connection between the VDD 5A and the SRE resource 7, and requests the VDD 5A to begin streaming buffered voice to the connected SRE resource 7 (while continuing to buffer the received voice). The SRE resource 7 receives voice from the VDD 5A as though connected directly to a telephone device and performs the usual speech recognition operations.
If an SRE resource 7 is unavailable, a warning is provided to the user by call control 3. The user is prompted to be patient until an SRE resource 7 becomes available or, alternatively, to try their request later (i.e. deferring or rejecting the request, respectively). However, in the event that the VDD 5A contains sufficient memory to buffer up to ten seconds of voice (e.g. by applying real-time voice compression algorithms), call control 3 is configured to perform a second attempt to obtain an SRE resource 7 prior to deferring or rejecting the user request.
The VDD resource 5A purges its voice buffer upon receipt of an instruction from call control 3.
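The allocation sequence described above can be sketched in Python. This is only an illustrative model under stated assumptions, not the patented implementation: the names `SrePool`, `Vdd` and `handle_event`, the string event labels, and the immediate (non-waiting) second attempt are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SrePool:
    """Hypothetical pool of speech recognition engine (SRE) ports."""
    free: int

    def try_allocate(self) -> bool:
        # Take one SRE port if any is free.
        if self.free > 0:
            self.free -= 1
            return True
        return False


@dataclass
class Vdd:
    """Hypothetical VDD that holds buffered voice until streamed or purged."""
    buffered: list = field(default_factory=list)

    def stream(self) -> list:
        # Hand a copy of the buffered voice to the connected SRE.
        return list(self.buffered)

    def purge(self) -> None:
        # Call control instructs the VDD to discard its buffered voice.
        self.buffered.clear()


def handle_event(event: str, pool: SrePool, attempts: int = 2) -> str:
    """Return the action call control takes for one VDD event."""
    if event == "dtmf":
        # Digits are collected as with a traditional DTMF receiver; no SRE is used.
        return "collect_digit"
    if event == "voice":
        # An SRE is requested only now. A real system would wait between
        # attempts while the VDD keeps buffering; here the retry is immediate.
        for _ in range(attempts):
            if pool.try_allocate():
                return "stream_to_sre"
        return "defer_or_reject"
    return "ignore"
```

In this sketch a call that only dials digits never touches the SRE pool, which is the saving the description is aiming at.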
Turning to Figure 2, the VDD consists of a DTMF receiver 10, a voice detector 12, and optionally a narrow band frequency detector 14 to guard against spurious triggers when tones are present. The output of the detectors is fed to a decision logic block 16 that determines whether a voice phrase or a DTMF tone is present. In parallel with the foregoing blocks, the voice is buffered in a voice buffer 18, to compensate for latencies in the detection process and delays in switching in the SRE resource.
The voice buffer 18 is continuously updated such that the newest sample overwrites the oldest. This can be implemented either by a circular buffer or a shift register buffer, both of which are commonly known in the art. The length of the buffer depends on the inherent latencies of the detectors, the delays in the system and the typical time the user wishes to allow for the system to respond.
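As a minimal sketch of the circular-buffer option, the following Python class overwrites the oldest sample once the buffer is full. The class name, method names and the use of integer samples are assumptions made for illustration; the description does not specify a sample format or capacity.

```python
class CircularVoiceBuffer:
    """Fixed-size ring buffer in which the newest sample overwrites the oldest."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [0] * capacity
        self._next = 0   # index where the next sample will be written
        self._count = 0  # number of valid samples currently held

    def write(self, sample: int) -> None:
        # Store the sample, then advance the write index with wrap-around.
        self._data[self._next] = sample
        self._next = (self._next + 1) % len(self._data)
        self._count = min(self._count + 1, len(self._data))

    def read_all(self) -> list:
        """Return the buffered samples, oldest first."""
        size = len(self._data)
        start = (self._next - self._count) % size
        return [self._data[(start + i) % size] for i in range(self._count)]
```

With a capacity of 3, writing samples 1 through 5 leaves 3, 4 and 5 in the buffer. Python's standard `collections.deque` with a `maxlen` argument provides the same overwrite-oldest behavior.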
Upon detection of a DTMF digit, the digit is reported to call control 3. When voice is detected, the delayed voice stream is switched at 20 to an SRE 7 for speech recognition.
Additional speech recognition functions can be performed by the VDDs 5A, 5B, etc., such as voice end-pointing and hot word spotting, in order to further offload the requirements of the SREs 7.
Exemplary pseudo-code for implementation of the DTMF detector 10 is as follows:
    DO Forever
        Receive Signal
        IF DTMF detected
            Send DTMF detected event to decision logic
        END
    ENDDO
Exemplary pseudo-code for implementation of the voice detector 12 is as follows:
    DO Forever
        Receive Signal
        IF Voice detected
            Send Voice detected event to decision logic
        END
    ENDDO
Exemplary pseudo-code for implementation of the optional narrow-band frequency detector 14 is as follows:
    DO Forever
        Receive Signal
        IF Narrow Band Freq detected
            Send Narrow Band Freq detected event to decision logic
        END
    ENDDO
Exemplary pseudo-code for implementation of the decision logic 16 is as follows:
    DO Forever
        IF DTMF detected
            Send DTMF detected event to client
        ELSE IF Narrow band freq detected (optional)
            Do nothing
        ELSE IF Voice detected
            Send Voice detected event to client
        END
    ENDDO
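The priority order in the decision logic pseudo-code (DTMF first, then the narrow-band guard, then voice) can be expressed as a small Python function. This is a sketch only: the function name `decide`, the boolean detector flags and the string return values are hypothetical stand-ins for the detector events and client notifications.

```python
from typing import Optional


def decide(dtmf: bool, voice: bool, narrow_band: bool = False) -> Optional[str]:
    """Return the event to report to the client, or None to report nothing."""
    if dtmf:
        # DTMF takes priority over every other detector output.
        return "dtmf"
    if narrow_band:
        # A narrow-band tone suppresses the voice trigger (spurious-trigger guard).
        return None
    if voice:
        return "voice"
    return None
```

For example, `decide(False, True, narrow_band=True)` reports nothing, so a tone that also trips the voice detector does not cause an SRE to be switched in.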
A person of ordinary skill in the art will appreciate that by utilizing the VDD resources of the present invention, substantial savings may be obtained in the cost of deploying SRE resources. For example, in a networked telephone system having 1000 users and allowing for 20% of the users to actively initiate call re-direction, 192 VDD resources may be provided (6 banks × 32 VDDs per bank) for initial call allocation. Allowing for 10% of these calls to require actual speech recognition services, 16 SRE resources are needed. The cost of 192 VDDs and 16 SREs is significantly less than the cost of 192 SRE resources.
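To make the cost comparison concrete, the following Python sketch totals both configurations using the resource counts given above. The unit costs are purely hypothetical placeholders, since the description gives no prices, and the function name `deployment_costs` is likewise an assumption.

```python
def deployment_costs(vdds: int, sres: int, vdd_cost: int, sre_cost: int) -> tuple:
    """Return (cost with a VDD front end, cost with SREs only)."""
    # VDDs handle initial allocation; SREs are added only for voice calls.
    with_vdd = vdds * vdd_cost + sres * sre_cost
    # Without VDDs, every initially allocated call needs its own SRE.
    sre_only = vdds * sre_cost
    return with_vdd, sre_only


# Hypothetical unit costs: 10 per VDD, 1000 per SRE.
with_vdd, sre_only = deployment_costs(192, 16, vdd_cost=10, sre_cost=1000)
```

Under these assumed prices the VDD configuration costs 17,920 units against 192,000 for SREs alone. More generally, with these counts the VDD configuration is cheaper whenever a VDD costs less than 176/192 of an SRE.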
It will be appreciated that, although embodiments of the invention have been described and illustrated in detail, various changes and modifications may be made. For example, although the preferred embodiment utilizes a Mitel Speech Recognition Engine (sold under the name SpeakEasy or Speak@Ease), other SRE resources may be used. Furthermore, the telephones 1A, 1B, etc. may be any suitable voice communication device such as a Plain Old Telephone Set (POTS), mobile phone, IP Phone, PC with voice communication functionality, etc. Also, the principles of the invention may be applied to applications other than speech recognition enhanced telephone directory services. For example, the invention may be used to advantage in a Call Center and/or Interactive Voice Response (IVR) application where a user may be prompted for information and DTMF or speech recognition is used to obtain caller responses. The invention as described above has been applied specifically to voice applications in the PBX domain. However, the invention may also be applied equally to voice applications in the CO domain and in mixed media communications, in either domain. Specifically, the invention may be applied to any situation where a large number of general purpose SRE resources can be displaced by a similar number of low cost VDD resources and a smaller number of SRE resources.
All of the foregoing changes and modifications may be made without departing from the sphere and scope of the invention as defined by the claims appended herein.

Claims (4)

What is Claimed is:
1. A telephone system comprising:
a plurality of telephone devices;
a call control for establishing calls between said telephone devices in response to receipt of commands from said telephone devices;
at least one Voice and DTMF detector resource automatically allocated to said calls between said telephone devices for (i) detecting and transmitting DTMF tones received from said telephone devices to said call control for further call processing, and (ii) for detecting and buffering voice received from said telephone devices and notifying said call control; and
at least one Speech Recognition Engine for receiving said voice from said at least one Voice and DTMF detector resource only upon notification of said call control and in response performing speech recognition and notifying said call control for further call processing.
2. The telephone system of claim 1, wherein said at least one Voice and DTMF detector resource further comprises:
a DTMF detector for detecting DTMF tones received from said telephone devices;
a voice detector for detecting voice received from said telephone devices;
a voice buffer for buffering voice received from said telephone devices; and
decision logic connected to said DTMF detector and said voice detector for detecting between voice and DTMF tones received from said telephone devices and in response notifying said call control, whereupon said voice buffered within said voice buffer is transmitted to said at least one Speech Recognition Engine.
3. The telephone system of claim 2, wherein said voice buffer is one of either a circular buffer or shift register buffer for continuously updating samples of said voice such that each new sample over-writes an oldest buffered sample.
4. The telephone system of claim 2, wherein said at least one Voice and DTMF detector resource further comprises a narrow band frequency detector for preventing spurious switching of said voice to said at least one Speech Recognition Engine.
CA002398579A 2002-08-19 2002-08-19 Intermediate voice and dtmf detector device for improved speech recognition utilization and penetration Expired - Lifetime CA2398579C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002398579A CA2398579C (en) 2002-08-19 2002-08-19 Intermediate voice and dtmf detector device for improved speech recognition utilization and penetration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA002398579A CA2398579C (en) 2002-08-19 2002-08-19 Intermediate voice and dtmf detector device for improved speech recognition utilization and penetration

Publications (2)

Publication Number Publication Date
CA2398579A1 CA2398579A1 (en) 2004-02-19
CA2398579C CA2398579C (en) 2005-11-15

Family

ID=31892644

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002398579A Expired - Lifetime CA2398579C (en) 2002-08-19 2002-08-19 Intermediate voice and dtmf detector device for improved speech recognition utilization and penetration

Country Status (1)

Country Link
CA (1) CA2398579C (en)

Also Published As

Publication number Publication date
CA2398579A1 (en) 2004-02-19

Similar Documents

Publication Publication Date Title
US6801602B2 (en) Method of initiating a call feature request
US6493428B1 (en) Text-enhanced voice menu system
RU2173028C2 (en) Method and device for transmission and routing of speech telephone calls in computer network with package switching
US6353745B1 (en) Method for providing performance features for mobile subscribers via a communications network
US6400815B1 (en) Method and apparatus for subscriber line to telephone call distribution
US6332073B1 (en) Emergency number dialing from a fixed cellular terminal
US6795533B2 (en) Intermediate voice and DTMF detector device for improved speech recognition utilization and penetration
US5448286A (en) Process and terminal for video telephony permitting acceptance, and possibly calling, by a telephone set
US6058180A (en) Automatic person-following communication system
US5995604A (en) Method of preventing fraudulent toll calls by key system users
KR100348606B1 (en) Gateway apparatus
CA2238169C (en) A system for connecting calls on physically distinct servers on an advanced intelligent network
CA2398579C (en) Intermediate voice and dtmf detector device for improved speech recognition utilization and penetration
US6553024B1 (en) Method and configuration for providing performance features for mobile subscribers via a communications network
US7142658B2 (en) DTMF control
CA2357326C (en) Telephone independent provision of speech recognition during dial tone and subsequent call progress states
JPH10285285A (en) Call connection method capable of using subscriber's telephone number before change at the time of changing accommodated exchange
RU2212117C2 (en) Establishing connections in stationary wireless communication system
US20040028202A1 (en) Method and system for providing conference feature between internet call and telephone network call in a webphone system
US20020196920A1 (en) Undetected provision of auxiliary services during a telephone conversation
JPS6074757A (en) Intercepting service system
KR100851182B1 (en) Apparatus and method for processing multi-bell service to public-switched-telephone-network subscribers in electronic-switching-system
KR19980014156A (en) How to intercept foreign subscribers from other electronic exchanges
KR20010087700A (en) method for recipient distinctive service providing in switching system
KR19980061876A (en) How to break-in the Centx guide in a small capacity electronic changer

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry

Effective date: 20220819