AU3044299A - Method for establishing telephone calls

Info

Publication number: AU3044299A
Application number: AU30442/99A
Authority: AU
Inventors: Reginald Alfred King
Original assignee: Domain Dynamics Ltd
Current assignee: Domain Dynamics Ltd
Priority date: 1998-03-25
Filing date: 1999-03-25
Publication date: 1999-10-18
Anticipated expiration: 2019-03-25
Also published as: GB9806401D0; WO1999049639A1; AU756212B2; JP2002508629A; EP1070415A1; GB2335826B; GB2335826A; GB9906921D0

Description

WO 99/49639 PCT/GB99/00910

METHOD FOR ESTABLISHING TELEPHONE CALLS

Field of the Invention

The present invention relates to establishing telephone calls using speech recognition.

Background to the Invention

Hands-free communication systems are known in which reliance is placed upon speech recognition in order to establish a communications channel. An important example of such systems is their application in mobile telephones and car phones when deployed with hands-free modules. Furthermore, in some jurisdictions, there are statutory provisions to the effect that telephones may only be used in vehicles by drivers when the vehicle is in motion if hands-free facilities are employed.

Known systems provide speech recognition capabilities that enable a call-originating user to speak the name of a required remote telephone user or a desired destination into a microphone and to have the recognition system translate these acoustic voice inputs into telephone numbers for automatically dialling the required destination.

Known systems operate by activating a word recognition module, whereafter the user speaks individual digits or the name of the person or the destination required. Thereafter, by means of synthetic speech prompts or by other indications, the user is invited to confirm to the word recognition module that the destination selected by the device is correct and then indicates to the system that a call is to be made. The call is then established so as to allow the user to communicate, by voice or by other means, with the called subscriber. Thus, by this mechanism, the driver may give full attention to the driving conditions (or other operational conditions when deployed in alternative environments) without having to divert their attention to look down at a fixed mobile telephone display and thereafter manually activate physical buttons mounted to the telephone.
This requirement has also been identified by the title "Safe Behind The Wheel Communication".

With safety concerns increasing and statutory requirements in place in some countries, the demand for hands-free facilities within motor vehicles (and other environments) is growing. However, a problem exists in that, in many normally occurring situations, the performance of speech recognition systems cannot always be guaranteed and a high failure rate may result.

Summary of the Invention

According to a first aspect of the present invention, there is provided telephony apparatus, including speech recognition means configured to identify a destination to be called in response to a vocalisation; calling means configured to call an assisting person if the speech recognition means fails to correctly identify said vocalised destination; and means for receiving telephone number data transmitted back in response to said call to said assisting person, wherein said calling means establishes a new call in response to said transmitted data.

Preferably, the speech recognition means is configured to offer an alternative recognised destination before calling said assisting person. Furthermore, the recognition means may be configured to offer a number of alternatives before calling an assisting person and, in a preferred embodiment, incoming speech vocalisations are recognised by a process of Time Encoded Signal Processing And Recognition (TESPAR).

Thus, the invention relates to mobile and/or hands-free communication systems and especially to systems relying upon voice control to enable interconnection or reconfiguration of elements, which may include all elements of communication systems.

One important example of such systems is their utilisation in telephones and, in particular, in mobile phones when deployed with so-called "hands-free modules" in cars and other mobile vehicles.

The in-car telephone environment is used as an exemplar to illustrate the key features of the current disclosure, but it can be appreciated that the invention has broader application in environments where telephone calls are established.

Voice-activated, so-called hands-free car kits are now becoming available for the cellular phone market and are well known to those skilled in the art. These involve a voice (word) recognition capability to enable the user to speak the name of a required remote telephone user or a wanted destination into a microphone and to have the recognition hardware and software translate these acoustic voice inputs into telephone numbers for automatic dialling to the distant subscriber.

By such means, the user whilst driving:
A. activates a word recognition module;
B. speaks the name of the person or the destination required;
C. confirms to the word recognition module that the destination selected by the device is the correct one;
D. indicates that the subscriber chosen is to be dialled; and
E. communicates by voice or other means with a distant subscriber.

By this means, a driver may give full attention to the road, the traffic and driving conditions, without having to look down at a fixed mobile telephone display and manually activate a button or buttons on the mobile telephone mounted in the car, which would seriously divert attention from the complexities of the driving task. One description of the requirement for this capability is "safe behind the wheel communication". The demand for such a facility is growing rapidly and is generating serious concerns for the safety of such mobile phone users when driving vehicles in complex traffic scenarios.
These concerns are persuading some state and government authorities to propose the introduction of legislative measures completely denying the use of mobile communication equipment in vehicles. Notwithstanding the threat of legislation, the demand for safe and effective mobile vehicular communications is expanding rapidly, world-wide.

The current concepts deployed by existing and proposed conventional voice operated systems may be described in outline with reference to Figure A, as follows. A driver of a car fitted with a hands-free car telephone kit operates an activate voice switch A1, which may be housed on the steering column or the steering wheel or, for example, provided as a foot operated switch similar to that used in some vehicles for dipping headlights. In response to the switch operation, and using a synthetic speech prompt such as, for example, "do you wish to transmit?", a word recognition module A2 seeks confirmation that the car driver wishes to transmit. The driver responds with a "yes" or a "no", which is recognised by the word recognition module A2, which then acts appropriately. Thus, if the word "no" is recognised, module A2 remains dormant. If the word "yes" is recognised, the recognition module prepares for the next acoustic input which will be, for example, the name or location of the person that the driver of the vehicle wishes to contact.

The word recognition module attempts to recognise the name or location spoken to it by the driver from the full portfolio of words previously stored in the module. It converts the word selected into an appropriate telephone number via a keypad code module A3. At this stage, the voice operated system may provide a synthetic speech output such as "do you wish to transmit to person X?", seeking confirmation from the driver that person X thus selected is the person that the driver now wishes to contact. The driver will respond with a "yes" or "no" acoustic response.
If "no" is the response, the system has made a mistake and the driver may be prompted to re-input the name, that is to say, to try again, or, alternatively, the system may automatically provide the second choice from the set of scores previously calculated in the word recognition module. A range of similar alternative man-machine protocols is available to enable this interactive process.

On the assumption that the word selected by the word recognition module is the correct one and the driver responds with a "yes" indicating correct recognition, the code selected and previously stored in the keypad code module A3 is passed to a telephone dial module A4, resulting in the appropriate number being dialled. The call is thus connected to the distant subscriber's phone and ringing tone is passed back to the driver via a loudspeaker in the car. When the called subscriber picks up the phone, the connection is complete and the call takes place.

Variants of these options and protocols are well known and form the basis of a number of different commercial offerings. Thus, for example, DSE Communication Corporation provides "new voice list management features" enabling customers to create and modify name dialling phoning lists using voice commands, and the DSP Group Inc of Santa Clara, California has developed hands-free car kits to provide alternatives to the facilities described above.

In general, these products provide substantially hands-free in-vehicle operation, enabling drivers to concentrate on driving while setting up and making a call, thus facilitating communication and significantly improving road safety.

The general capabilities outlined above are well known and it is also known that all such systems suffer from a number of serious limitations. Such systems do not work very well in mobile vehicles, due to the variable levels of ambient noise associated with cars and other vehicles.
For static operation and for relatively quiet driving conditions, recognition rates and system effectiveness may be high. However, when driving at high speeds, possibly with a window open, the variable acoustic background effectively prohibits accurate word recognition and creates frustration and confusion for the driver. This leads to many drivers failing to establish a call by voice and then attempting to use some other form of hands-on override to the voice input option while continuing to drive the vehicle with one hand. Alternatively, the driver may be forced arbitrarily to slow down until the noise conditions are acceptable for voice operation, or to stop at a convenient lay-by.

A range of complicated and costly signal processing procedures has been developed to reduce the effects of acoustic noise in such vehicles. These include echo cancellation, noise cancellation, noise suppression, the use of distributed microphones and speakers, and on-chip filtering. Research in this area continues.

All such known approaches involve expensive and complicated algorithms and chip sets. However, no system has yet been devised which is able to cope satisfactorily with the many variabilities associated with direct voice input operation in real-world vehicular environments.

Further, the set of acoustic templates produced, for example, by the driver during the training mode in benign conditions differs significantly from the data sets derived from voice waveforms produced by that same speaker in high acoustic background noise. This physiological factor is well known.
Current voice or word recognition algorithms based around spectral templates, including hidden Markov models, are unable satisfactorily to cope with such circumstances and such noise, or to support well-known reinforcement strategies commonly employed in human communication, such as the driver repeating the control words consecutively, for example "John Smith, John Smith", to emphasise to the recogniser, through the noise, that the person to be spoken to is John Smith. Indeed, such time-varying versions of the inputs would cause conventional word recognition systems immediately to fail.

In addition, it is well known that the population of human speakers may be split into two groups, based on their basic ability to operate voice activated devices. In the literature, these groups are described as sheep and goats. Sheep are those individuals who can, without effort, produce consistent acoustic outputs and who perform consistently and well in most direct voice input task scenarios. Goats are those individuals who are unable so to perform, even in benign, stress-free, anechoic ambient conditions. Thus, a significant percentage of the population is unlikely to be able to use conventional hands-free vehicular telephone systems, even when effective anti-noise strategies become available.

These examples emphasise the fact that the development of an effective voice activated hands-free telephone capability for the vehicular mobile cellular phone market is becoming increasingly complex and costly. As yet, current systems are unable to cope effectively with the many adverse variabilities associated with the real-world driving environment and, even if and when they do become effective, a sizeable portion of the general public may be unable to use them satisfactorily.
Currently, and for many decades to come, it is unlikely that any word recognition algorithm, irrespective of cost, will be able to match, or even to approach, the performance of the human ear and the mind behind it in recognising a limited vocabulary of spoken words in very high and variable acoustic noise backgrounds.

In this arena, human minds have an exceptional and most powerful capability to integrate a wanted signal at the expense of the background noise. In military communications, for example, "words twice" is a routine example of such a capability, wherein, in a very noisy environment, words are repeated two, three or more times and deployed as part of an interactive dialog to enable the recipient to understand with certainty what the originator has said. The human mind is able to integrate and filter out the wanted information from the unwanted noise. No system currently available, irrespective of cost and complexity, is able to compete with the flexibility and effectiveness of the human-to-human interface.

The present invention is disclosed to enable voice operated mobile communications to cope safely and effectively in extreme and variable ambient noise environments and to guarantee flexible hands-free voice operated communications which fully map the capability of human beings.

It is well known that the performance, that is, the recognition rate, of any word recognition algorithm is inversely proportional to the number of words that are to be presented at any one time to the algorithm for recognition. Current conventional systems may offer from, say, eight to sixty-four names, often splitting these up into sub-groups to present the algorithm at any one time with a minimum number of alternative names, thereby improving performance.
It is also well understood that the performance of all such systems degrades significantly and progressively as the level of acoustic background noise increases, both in terms of its magnitude and variability.

According to a second aspect of the present invention, there is provided telephony apparatus configured to establish a telephone call using speech recognition, including means for receiving acoustic vocalisations wherein the name of a destination is repeated; and processing means configured to analyse said repeated vocalisations so as to improve recognition properties.

In a preferred embodiment, the speech is recognised by a process of Time Encoded Signal Processing And Recognition (TESPAR) and the processing means may be configured to offer a predetermined number of alternatives if an incorrect recognition of a vocalisation is made.

Preferably, a plurality of TESPAR archetypes are stored for specific users.

In a preferred embodiment, the recognition equipment is mounted within a motor vehicle and telephone communications are made by mobile cellular networks.

According to a third aspect of the present invention, there is provided a method of establishing a telephone call using speech recognition wherein, after a speech recognition system has failed to correctly identify a destination to call, a call is made to an assisting person; telephone number data is transmitted back to said user; and a new telephone call is established in response to transmitted data.

Brief Description of the Drawings

Figure A shows a conventional mobile cellular telephone with hands-free voice recognition facilities;
Figure 1 shows a hands-free mobile telephone system mounted within a motor vehicle;
Figure 2 illustrates vehicles operating within a cellular telephone environment;
Figure 4 summarises the operation of the environment identified in Figure 2; and
Figure 5 details the telephone system identified in Figure 1.
Detailed Description of The Preferred Embodiments

Hands-free telephony systems may be employed in many situations where an operative cannot physically operate a telephone in the usual way or where, in doing so, the operative may be distracted, leading to a potentially dangerous situation. The hands-free environment described herein relates to the deployment of a mobile cellular telephone within a motor vehicle. However, it should be appreciated that this particular application presents an example and should not be considered limiting.

A car interior having mobile telephony equipment is shown in Figure 1. The mobile telephone may be permanently mounted within the vehicle or, alternatively, it may include a portable mobile telephone interfaced to an in-car system, commonly referred to as a "car kit".

A mobile telephone 101 is supported within a cradle 102. The cradle 102 is connected to an in-car unit 103 by means of an umbilical connection 104. The in-car unit 103 receives power from the car's internal battery via a power connection 105. In addition, an aerial connection 106 is connected to an external aerial for the transmission and reception of radio signals. Audio output signals are supplied to audio loudspeakers 107 via audio leads 108 and internal vocalisations are received from a microphone 109 via an audio input lead 110.

In addition to facilitating the establishment of telephone calls in a conventional manner, the in-car unit 103 also includes speech recognition facilities, thereby allowing a driver to establish a telephone call with minimal physical interaction. Rather than removing the telephone from its cradle or activating telephone buttons while it resides in its cradle, a driver is merely required to activate a voice recognition switch 111, thereby placing the in-car unit 103 into a condition which facilitates the establishment of a call by voice recognition procedures.
Furthermore, the system is provided with a talk-back system embodying the present invention such that, should voice recognition procedures fail to establish a call, a call is automatically made to a secretary or to an appropriately populated bureau or service centre, whereafter the required telephone number details may be returned in machine-readable form to the in-car unit 103, thereby facilitating the establishment of a further call, again without any manual intervention on the part of the driver.

The vehicle identified in Figure 1 is also shown in Figure 2 at 201. The vehicle operates within a cellular telephone environment and other vehicles in this environment, such as vehicles 202 and 203, also communicate via the GSM facilities, illustrated generally as a cellular network 204. The GSM cellular network 204 is connected to a local public switched telephone network 205 via conventional interface channels 206, from which it is possible to establish conventional telephone connections to an office environment, shown generally at 207, and to a home environment, shown generally at 208. It is possible that these connections may be established using speech recognition systems, where the telephone numbers for "the office" and "home" are stored against appropriate encoded speech templates. Thus, a speech utterance is compared against a selection of templates and a best classification is made by the speech recognition equipment, which is then presented acoustically to the driver, thus enabling the driver to confirm or otherwise whether a telephone call is to be established.

In addition, in accordance with the present invention, the system is provided with a "talk-back" system such that, if the speech recognition fails to identify the number required after a number of attempts, the system is automatically programmed to make a call to a service centre 209.
Alternatively, the system may be programmed to make a specified call, possibly to a secretary or other assistant. However, in a preferred embodiment, the service centre provides talk-back facilities for a significant number of users on a subscription basis.

After failing to identify a required number using speech recognition, a call is made to the service centre and an audio call is established with a service centre operative. The driver identifies the destination to which efforts are being made to establish a call and the operative at the service centre takes measures to identify the appropriate number. This may involve listing numbers from customer-specific databases. Having determined the number required, the information is relayed back to the calling customer in machine-readable form, such that the customer is then in a position to establish the call using the received data.

Thus, it can be seen that there is provided an environment in which it is possible to establish a telephone call using speech recognition. If the speech recognition system is successful (which it should be on the majority of occasions), a telephone call is established and a user is then in a position to communicate without any additional manual interaction. However, when the speech recognition fails to identify a required destination, the system automatically makes a call to an assisting person, in the form of an operative at service centre 209 or a personal assisting person, such as a personal secretary. Such a call, identified by the term "talk-back", is selectively used by a user and, when so enabled, telephone number data is transmitted back to the user, thereby allowing the user to establish a new telephone call without additional manual intervention.

Thus, if the system is working properly, automatic speech recognition procedures within the car will allow the number to be selected and automatically dialled.
However, should these procedures fail, the user is still not required to intervene manually, given that a back-up system, in the form of the talk-back procedures, will activate automatically, thereby providing a reliable, human-involved procedure for allowing the call to be established.

In a preferred embodiment, the speech recognition procedures employ Time Encoded Signal Processing And Recognition (TESPAR) and, in particular, time encoded speech processing and recognition. The fundamental operating aspects of TESPAR are disclosed in United States patents 4,382,160, 5,091,949, 5,101,433 and 5,101,434, along with European patent publications 0 166 607 and 0 256 081.

TESPAR has the unique capability of coding time-varying signals into common fixed-size matrices, thereby facilitating its application in voice recognition systems. TESPAR archetypes for individual words, when correlated against individual or multiple versions of the same word, may produce one hundred percent scores, and this ability contrasts significantly with alternative systems.

It has been discovered that, in high acoustic background noise, a TESPAR matrix produced by a speaker repeating a word a number of times is effective in averaging up the wanted features of the signal and averaging out much of the background noise. Thus, one important element of the human capability discussed previously may be reproduced by a substantially equivalent capability in the TESPAR word recognition domain.

Furthermore, word recognition modules may incorporate the so-called nth choice option by means of a dialog between the human user and the machine, using synthetic speech prompts.
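The averaging effect described above for repeated words can be illustrated with a small numerical sketch. It does not implement TESPAR coding: the fixed-size matrices here are simply random numbers standing in for a stored archetype, the additive Gaussian noise model is an assumption, and the names `average_matrices`, `correlation` and `noisy_copy` are hypothetical. It only shows that averaging several noisy versions of the same matrix correlates better with the archetype than a single noisy version does.

```python
import random


def average_matrices(matrices):
    """Element-wise mean of equally sized matrices (lists of lists)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    count = len(matrices)
    return [[sum(m[r][c] for m in matrices) / count for c in range(cols)]
            for r in range(rows)]


def correlation(a, b):
    """Pearson correlation between two matrices, compared element by element."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0


def noisy_copy(archetype, noise_level, rng):
    """Stand-in for one utterance in noise: archetype plus Gaussian noise (assumed model)."""
    return [[v + rng.gauss(0.0, noise_level) for v in row] for row in archetype]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
archetype = [[rng.random() for _ in range(16)] for _ in range(16)]

# One utterance in heavy noise versus the same word repeated eight times.
single = noisy_copy(archetype, 1.0, rng)
repeated = average_matrices([noisy_copy(archetype, 1.0, rng) for _ in range(8)])

score_single = correlation(archetype, single)
score_repeated = correlation(archetype, repeated)
print(f"single utterance: {score_single:.2f}, repeated and averaged: {score_repeated:.2f}")
```

The matrix size, noise level and repeat count are arbitrary illustration values; a real system would derive its matrices from the speech waveform rather than from a random generator.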
Thus, if the recognition vocabulary were to consist of the digits zero to nine and the driver spoke the word five to it, the recognition procedure may incorrectly recognise the word nine and prompt the user with the words "did you say nine?". The user would respond with the word "no", resulting in the machine being in a position to offer a second choice out of the list of comparisons previously made by it. If the second choice were to be the word five, which is likely, the machine would then prompt with the phrase "did you say five?" and the driver would then respond with the word "yes". At this point, the system would act accordingly.

Thus, by this nth choice mechanism, provided that the yes and no recognition capability has a very high integrity, a system with this facility can guarantee that the driver can always select the correct word.

The effectiveness of this procedure is enhanced by the fact that the second choice selected by the machine is more likely to be the correct word than the third or higher, and that the third choice is likely to be more probable than the fourth or higher, and so on. It has been discovered that this capability is an inherent property of TESPAR-based word recognition systems, in contrast with systems deploying spectral templates or hidden Markov models. Thus, this nth choice facility is one which may be deployed very effectively to cater for errors that may occur in TESPAR-based systems but not so effectively in alternative systems where the error rate is likely to be high.

It has been discovered that a measure of nth choice activity may be a very powerful indication of difficulty in voice communication when operating in high or variable acoustic background noise.
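The nth choice dialog described above can be sketched as a loop over candidates ranked by recognition score. This is a minimal illustration only: the scores are invented, the function name `nth_choice` is hypothetical, and the spoken "yes"/"no" reply is replaced by a callback.

```python
def nth_choice(scores, confirm, max_choices=None):
    """Offer candidates in descending score order until one is confirmed.

    scores: dict mapping word -> recognition score (higher is better).
    confirm: callable(word) -> bool, standing in for the spoken yes/no reply
             to a prompt such as "did you say <word>?".
    Returns (word, n), where n is the 1-based choice that was accepted,
    or (None, number_offered) if every offered candidate was rejected.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if max_choices is not None:
        ranked = ranked[:max_choices]
    for n, word in enumerate(ranked, start=1):
        if confirm(word):
            return word, n
    return None, len(ranked)


# Illustrative scores only: the driver said "five" but "nine" scored highest.
scores = {"nine": 0.81, "five": 0.78, "one": 0.40, "four": 0.22}
result = nth_choice(scores, confirm=lambda word: word == "five")
print(result)  # ('five', 2)
```

The returned value of n is the "measure of nth choice activity" referred to in the text: a run of calls with n greater than one suggests that the acoustic conditions are poor.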
It has been discovered that this nth choice activity may be measured to provide powerful additional alternative communication options which may enable effective communication to take place irrespective of the acoustic environment experienced by the driver.

A typical portable phone architecture (based on the GSM system) is shown in Figure 3. This includes microphone 109, loudspeaker 107, voice and base-band coder 301, GSM processor 302, interface to the keypad 303, mobile phone display 304, a random access memory module 305 and a read-only module 306. In addition, there is provided a radio module 307 that enables both the transmission and reception of radio signals. A Subscriber Identity Module (SIM) 308 is provided and, to facilitate the transmission of data, the system includes a data terminal adapter 309.

A word recognition/keypad code module 311 is provided, embodying the previously described characteristics. This is inserted between a hands-free module 312 and the voice/base-band coder 301. Appropriate data is re-routed through the voice/base-band coder 301 to the GSM processor 302 for emulating activity of the keypad 303 and the display 304 options. In addition, a Dual Tone Multi-Frequency (DTMF)/keypad code module 313 is placed in parallel with loudspeaker 107, thereby receiving an output from the voice/base-band coder 301. Similarly, an output from the DTMF/keypad code module 313 is connected to GSM processor 302. However, it should be stressed that a range of different and alternative interconnections may be used to achieve similar capabilities and that the arrangement shown in Figure 3 is merely a particular example of interconnections provided to illustrate the embodiment.

A situation may be assumed in which a driver wishes to communicate with one of N organisations or individuals stored in the word recognition/keypad module 311.
If the word recognition system is a speaker-independent one, pre-stored templates or TESPAR archetypes will be provided within the module. If the system is a speaker-dependent word recognition process, the user will previously have provided a number of examples and trained the system on them.

On operating the active voice switch 111, the system will produce a synthetic speech prompt asking "do you wish to make a call?". The driver will respond with the word "yes" if he wishes to make a call or with the word "no" if the active voice switch has been inadvertently operated. If the driver's response is "yes", the recognition module 311 will respond with an acoustic prompt "please indicate a caller". The driver will then speak out one of the destination addresses from the list stored in his word recognition sub-directory, such as "home" or "John Smith".

The word recognition module 311 will compare the driver's acoustic input with the archetypes or templates of each of the words in the personalised telephone sub-directory, select the highest scoring entry and respond with a synthetic speech prompt associated with the highest score. Thus, a response may be produced along the lines "do you want John Smith?". If the response is "yes", John Smith's appropriate keypad code, that is to say, the relevant telephone number, will be passed via the voice/base-band coder 301 to the GSM processor 302, and the GSM processor will then output the correct telephone code number for John Smith over the signalling channel of the radio communication link.

By these means, a normal telephone call will be set up, the appropriate phone at the destination address will ring and the driver will communicate with the destination address via the hands-free module equipment 312.
If the acoustic conditions are poor and the speech recognition system does not work effectively, such that the first word spoken by the driver to the system is not recognised as the correct word, the driver will indicate "no" and the recognition module will produce a second choice, a third choice and so on for a predetermined number of times until the correct word is identified. This should enable the driver to obtain the selected number. As previously indicated, this process is referred to as nth choice and should provide significant enhancements which, with TESPAR-based word recognition systems, should result in a one hundred percent success rate irrespective of acoustic conditions.

However, for any large-sized vocabulary, the nth choice procedure can prove tedious, since the larger the vocabulary, the more likely an error is to occur and the more frustration is likely to be engendered.

If the value of N is large, a large number of interactions or acoustic transactions may need to take place before the correct subscriber is eventually chosen. For example, in the pathological case of a sixty-four word sub-directory, sixty-three system interactions may be required in order to achieve success. This is unsatisfactory, even when background noise is very intrusive.

In situations where very high noise levels are present, a VHN mode, also referred to as "talk-back", may be brought into operation, in which a DTMF code module 313 or other keypad code generator is brought into play.

The VHN facility may be introduced manually or automatically, the latter being via the nth choice procedure, in the following manner. If the correct word is not chosen first, nor is it chosen a second time, that is to say when n is greater than, for example, two, the word recognition keypad module 311 is activated to prompt with the phrase "do you want talk-back?"
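The worst-case interaction count and the automatic talk-back trigger described above can be sketched as follows. The function names and the default threshold of two are illustrative assumptions taken from the "for example, two" wording; the sketch stops at the point where the talk-back prompt would be issued.

```python
def worst_case_rejections(vocabulary_size):
    """Number of "no" replies needed if the wanted word is ranked last."""
    return max(vocabulary_size - 1, 0)


def select_or_talk_back(ranked_words, wanted, talk_back_after=2):
    """Walk the ranked candidates and decide between dialling and talk-back.

    ranked_words: candidates in descending score order.
    wanted: the word the driver actually spoke (stands in for yes/no replies).
    talk_back_after: number of choices offered before prompting for talk-back.
    Returns ("dial", word) or ("talk_back", None).
    """
    for n, word in enumerate(ranked_words, start=1):
        if n > talk_back_after:
            # n is now greater than the threshold: prompt "do you want talk-back?"
            return ("talk_back", None)
        if word == wanted:
            return ("dial", word)
    return ("talk_back", None)


print(worst_case_rejections(64))  # 63
print(select_or_talk_back(["John Smith", "home", "office"], "home"))
print(select_or_talk_back(["John Smith", "home", "office"], "office"))
```

With a threshold of two, the driver is asked to reject at most two candidates before being offered an assisting person, rather than up to sixty-three in the sixty-four word case.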
If the response is a vocalised "yes", the DTMF keypad code module 313 "dials", via the GSM processor 302, one of the list of talk-back numbers, the details of which are known to the driver. These may be, for example, the driver's office, the driver's home or a bureau designated specifically to provide the talk-back service, as identified in Figure 2. Having dialled this number, the driver is then, by the means previously described, interconnected directly to his office or directly to his secretary or to his home or to a talk-back bureau with whom he has listed the N individuals that form part of his word recognition sub-directory or, if required, to other subscribers not on the list.

A typical interaction may be illustrated with reference to Figure 4, where the call from the vehicle is routed via a radio base station 401, through a telephone exchange 402 and to an appropriate telephone 403. Upon receiving an incoming call, the handset 404 of the telephone is picked up, thereby placing the telephone 403 off-hook and, hence, in a receptive mode. The driver's office or the talk-back bureau are then in direct voice communication and, irrespective of the background acoustic conditions, this should enable the secretary or bureau to ascertain which number the driver wishes to access.

Many modern telephones incorporate Dual Tone Multi-Frequency (DTMF) dialling code capability and, for example, the driver's secretary may have the driver's N codes previously stored by means of quick dial memory keys. Thus, when the driver requests the number for a particular contact using the talk-back provision, the information may be provided quickly by means of accessing the memory keys. This exchange of information will take place to the limits of human capability, using words twice, or whatever other form of interactive voice exchange is needed to confirm the requirements, irrespective of the acoustic background conditions.
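The automatic route into talk-back described above (offer successive choices, then propose talk-back once a small number of offers has been rejected) might be sketched as follows. This is only an illustration under stated assumptions: the `Outcome` record, the function name, the `confirm` callback standing in for the driver's spoken "yes" or "no", and the talk-back number are all hypothetical, and the limit of two offers follows the "for example, two" given in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    action: str            # "dial_destination", "dial_talk_back" or "give_up"
    target: Optional[str]  # directory name, talk-back number, or None
    offers_made: int       # how many candidate names were offered


def nth_choice_with_talk_back(
    ranked: List[str],
    confirm: Callable[[str], bool],
    talk_back_number: str,
    max_offers: int = 2,
) -> Outcome:
    """Offer candidates in rank order, then fall back to talk-back."""
    offers = 0
    for name in ranked[:max_offers]:
        offers += 1
        if confirm(f"do you want {name}"):
            return Outcome("dial_destination", name, offers)
    # The correct word was not among the first max_offers choices.
    if confirm("do you want talk-back?"):
        return Outcome("dial_talk_back", talk_back_number, offers)
    return Outcome("give_up", None, offers)


# Hypothetical ranking in which noise has pushed the wanted entry to third place.
ranked = ["Joan Smythe", "home", "John Smith"]
result = nth_choice_with_talk_back(
    ranked, lambda prompt: prompt == "do you want talk-back?", "5550100"
)
```

Capping the number of offers bounds the number of voice interactions regardless of the size of the sub-directory, which is the concern raised for large vocabularies.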
Once the secretary or bureau operative confirms that the number for John Smith is required, the secretary will be aware that John Smith's telephone number is on, for example, quick dial three. With the handset off-hook, the secretary will depress the quick dial three button, which will then generate a series of DTMF tones associated with John Smith's telephone number. These tones will then be automatically transmitted from the office phone to be received in the driver's vehicle telephone equipment via the loudspeaker channel, as shown in Figure 3. The tones will be heard by the driver and appropriately interpreted in parallel by the DTMF keypad code module 313. At the DTMF keypad code module 313, the numerical values are stored and translated into appropriate dial codes for the GSM processor 302 in preparation for subsequent transmission.

By these means, a variety of different options are made available. In the current example, the existing link to the secretary will automatically be disabled and the telephone number decoded by the DTMF keypad module 313.

It can be seen that by these means:
A. an optimum capability may be achieved, irrespective of acoustic background noise;
B. the facilities may be provided for any word recognition procedure to enhance its effectiveness, irrespective of its basic performance in acoustic background noise; and
C. the procedures disclosed are equally effective and enabling for the sheep and goat populations of vehicular mobile radio users.

Preferably, for the reasons indicated, TESPAR-based word recognition procedures should be used in preference to existing known systems, since these are more likely to reduce the number of occasions when the talk-back facility is required, as the nth choice routines are particularly efficient when using TESPAR processes.
It is also relevant that, in relation to this disclosure, TESPAR procedures may use words twice and similar protocols productively to overcome acoustic background noise.
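The interpretation of received DTMF tones by a module such as the DTMF keypad code module 313 could be sketched along the following lines. The disclosure does not say how the tones are decoded; the Goertzel recurrence used here is a commonly used technique swapped in for illustration, and the function names, the 8000 Hz sample rate, the 50 ms tone length and the example number are assumptions. The row and column frequencies are the standard DTMF values. The sketch omits the tone presence thresholds and inter-digit silence detection that practical equipment would need.

```python
import math
from typing import List, Sequence

ROW_FREQS = (697.0, 770.0, 852.0, 941.0)      # standard DTMF low group (Hz)
COL_FREQS = (1209.0, 1336.0, 1477.0, 1633.0)  # standard DTMF high group (Hz)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: Sequence[float], sample_rate: float, freq: float) -> float:
    """Estimate signal power at one frequency with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def decode_tone(samples: Sequence[float], sample_rate: float) -> str:
    """Pick the strongest row and column frequency and map them to a key."""
    row_power = [goertzel_power(samples, sample_rate, f) for f in ROW_FREQS]
    col_power = [goertzel_power(samples, sample_rate, f) for f in COL_FREQS]
    row = row_power.index(max(row_power))
    col = col_power.index(max(col_power))
    return KEYPAD[row][col]


def make_tone(key: str, sample_rate: float = 8000.0, duration: float = 0.05) -> List[float]:
    """Synthesise one DTMF tone (test signal only, no noise added)."""
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if len(key) == 1 and c >= 0:
            n = int(sample_rate * duration)
            return [
                math.sin(2.0 * math.pi * ROW_FREQS[r] * i / sample_rate)
                + math.sin(2.0 * math.pi * COL_FREQS[c] * i / sample_rate)
                for i in range(n)
            ]
    raise ValueError(f"not a DTMF key: {key!r}")


def decode_number(tones: Sequence[Sequence[float]], sample_rate: float) -> str:
    """Decode a sequence of already separated tone bursts into a digit string."""
    return "".join(decode_tone(t, sample_rate) for t in tones)


# Hypothetical number sent by the assisting person's quick dial key.
bursts = [make_tone(d) for d in "5550102"]
decoded = decode_number(bursts, 8000.0)
```

The decoded digit string is what would then be stored and handed on for dialling the new call.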

It will be apparent that by this means, if the driver wishes to contact an unusual telephone number not included in the local telephone sub-directory, the talk-back facility may provide the driver with any number very effectively. Thus, the balance between word recognition capability, telephone sub-directory size, cost and complexity may be optimised for particular users and individual applications, to expand and improve all voice input facilities and capabilities, to provide one hundred percent system integrity in the most difficult environments and in a manner which leaves both sheep and goat drivers free to drive in a safe and effective fashion. By means of the current disclosure, this capability is easily embodied and does not involve changes to standard mobile telephone equipment, using the office or the bureau or the home to provide the facility disclosed to the driver on the highway.

It will also be obvious to those skilled in the art that the features of this disclosure may be configured to be used in a variety of differing scenarios, in differing application areas and with differing data coding strategies: for example, for handicapped users, for long-term hospital patients, for the safety of engineers working manually in remote locations or hazardous situations, and for operators involved in complicated manual procedures, all or any of which may otherwise preclude the manual hands-on use of portable telephone equipment for communication, information transfer and remote control.

Claims

1. Telephony apparatus, including speech recognition means configured to identify a destination to be called in response to a vocalisation; calling means configured to call an assisting person if the speech recognition means fails to correctly identify said vocalised destination; and means for receiving telephone number data transmitted back in response to said call to said assisting person, wherein said calling means establishes a new call in response to said transmitted data.

2. Apparatus according to claim 1, wherein speech recognition means is configured to offer an alternative recognised destination before calling said assisting person.

3. Apparatus according to claim 2, wherein said recognition means is configured to offer a number of alternatives before calling an assisting person.

4. Apparatus according to claim 2 or claim 3, wherein said speech recognition means recognises incoming speech vocalisations by a process of Time Encoded Signal Processing And Recognition (TESPAR).

5. Apparatus according to claim 1, wherein said speech recognition means is configured to be responsive to words being spoken more than once so as to improve the recognition procedures.

6. Apparatus according to claim 5, wherein said speech recognition procedures are performed using Time Encoded Signal Processing And Recognition (TESPAR).

7. Apparatus according to claim 1, wherein said assisting person has means for generating an encoded representation of said telephone number using conventional signalling techniques.

8. Apparatus according to claim 7, wherein the equipment of said assisting person is configured to transmit audio tones.

9. Apparatus according to claim 1, wherein said calling means establishes a call by use of a cellular telephony network.

10. Apparatus according to claim 9, wherein said cellular telephone system is mounted within a motor vehicle.

11. Telephony apparatus configured to establish a telephone call using speech recognition, including means for receiving acoustic vocalisations wherein the name of a destination is repeated; and processing means are configured to analyse said repeated vocalisation so as to improve recognition properties.

12. Apparatus according to claim 11, wherein said speech is recognised by a process of Time Encoded Signal Processing And Recognition (TESPAR).

13. Apparatus according to claim 12, wherein said processing means is configured to offer a predetermined number of alternatives if an incorrect recognition of a vocalisation is made.

14. Apparatus according to claim 12, wherein a plurality of TESPAR archetypes are stored for specific users.

15. Apparatus according to claim 11, wherein said recognition equipment is mounted within a motor vehicle and telephony communications are made via mobile cellular networks.

16. A method of establishing a telephone call using speech recognition wherein, after a speech recognition system has failed to correctly identify a destination to call, a call is made to an assisting person; telephone number data is transmitted back to said user; and a new telephone call is established in response to said transmitted data.

17. A method according to claim 16, wherein said speech recognition system offers an alternative recognised destination before calling said assisting person.

18. A method according to claim 17, wherein said speech recognition system is configured to offer a number of alternatives before calling an assisting person.

19. A method according to claim 17 or claim 18, wherein said speech recognition system recognises incoming speech vocalisations by a process of Time Encoded Signal Processing And Recognition (TESPAR).

20. A method according to claim 16, wherein the speech recognition system is responsive to words being spoken more than once to improve recognition.

21. A method according to claim 20, wherein said speech recognition system is performed in accordance with Time Encoded Signal Processing And Recognition (TESPAR) procedures.

22. A method according to claim 16, wherein said assisting person has means for generating an encoded representation of said telephone number using conventional signalling techniques.

23. A method according to claim 22, wherein the equipment of said assisting person is configured to transmit audio tones.

24. A method of establishing a telephone call using speech recognition, wherein a speech recognition process is configured to receive acoustic vocalisations in which the name of a destination is repeated; and said repeated vocalisations are analysed so as to improve recognition properties.

25. A method according to claim 24, wherein the speech is recognised by a process of Time Encoded Signal Processing And Recognition (TESPAR).

26. A method according to claim 25, wherein a predetermined number of alternatives are offered if an incorrect recognition of a vocalisation is made.

27. A method according to claim 25, wherein a plurality of TESPAR archetypes are stored for specific users.

28. A method according to claim 24, wherein the recognition process is performed within a motor vehicle for application with respect to a mobile cellular network.