WO2019207597A1

WO2019207597A1 - System and method of operating open ended interactive voice response in any spoken languages

Info

Publication number: WO2019207597A1
Application number: PCT/IN2019/050325
Authority: WO
Inventors: Zubair Ahmed; MD. Rezwanul HOQUE; Sadat Sakif AHMED
Original assignee: Zubair Ahmed
Priority date: 2018-04-23
Filing date: 2019-04-22
Publication date: 2019-10-31
Also published as: PH12020551761A1; AU2019260038A1

Abstract

A system and method of operating open ended interactive voice response (IVR) in any spoken language, such as Austroasiatic, Austronesian, Dravidian, Indo-Aryan, Afroasiatic, Sino-Tibetan and Tai-Kadai languages, is disclosed. In the open ended IVR system, customers just have to speak their desired intention in a spoken language and the system takes care of the rest, making the interaction efficient, user friendly, secure and unified. The IVR also serves businesses by eliminating their need to create and maintain their own IVR systems, to build secure integrations with other businesses and with the system itself, and to worry about the scalability of their systems.

Description

SYSTEM AND METHOD OF OPERATING OPEN ENDED INTERACTIVE VOICE RESPONSE IN ANY SPOKEN LANGUAGES

FIELD OF THE INVENTION

The invention relates to the field of operating Interactive Voice Response (IVR). More particularly, the invention relates to operating an open ended IVR system.

BACKGROUND OF THE INVENTION

Interactive Voice Response (IVR) is a system in which a call, usually a GSM call or a call made by other means of wireless communication for mobile devices, such as cellular or data networks, is picked up by the system and a static menu is presented to the caller in the form of audio or text. Based on the user’s keypad input, a certain task is performed. Businesses set up IVR systems to cut costs and to serve customers effectively before any human intervention is required.

Normally, when a user calls an entity such as a bank, the call is first received by an automated audio IVR system. The user is presented with audio menus whose items are bound to the keys 0 - 9 of the keypad. Most menus provide an instant service or access to an endpoint of a specific division of the entity. For a bank, instant services could include checking a balance or changing the PIN of a card, while endpoint access could mean getting in touch with the loan department or contacting the security department. Menus are always specific to the entity and might contain services specific to the individual caller. For example, a bank’s IVR system will only serve account menus to a caller who has previously registered an account with the phone number, whereas the IVR system of a pizza delivery service will only take in the order and place of delivery. An IVR system may or may not use the caller’s information to process a request.

In most cases the caller has to listen to a set of menus and perform certain keystrokes to reach an intended service or endpoint. This makes the process very time intensive. The process is also quite error prone: if the caller makes a wrong input, the whole process might have to be repeated. Moreover, the menu of the IVR may be changed or updated; as a result, a regular caller will not be able to reach a desired service or endpoint just by remembering the keystrokes. Furthermore, the endpoint or service may not fulfill the caller’s desired intent, and the caller may have to call the IVR again to perform the task, which is not at all user friendly and might frustrate the caller. Additionally, a user might have to call multiple IVR systems to perform related tasks, since most business IVRs are not interconnected, generating a negative user experience. Former technology U.S. Pat. No. 8041575B2 tried to solve the IVRs’ intercommunication problem by providing a menu driven top level IVR system, where the user can connect to different IVR systems using the menu, and data is bundled up by the main IVR and passed to secondary IVR systems.

However, the back and forth between menus poses a great challenge for the users. Additionally, the menu hierarchy grows drastically, so the user is deprived of a user friendly solution.

Another technique is provided in US Patent 20050033684A1, assigned to Tekelec, which provides a method of payment through an IVR system using either keypad entry or voice based commands. The transaction only occurs when a local Point of Sale (POS) device generates a sale transaction.

However, the users are limited to only one task: transferring money from one point to another. Moreover, the user cannot initiate a transaction; only a POS device can do this, which makes the service very limiting. The technique integrates multiple services with the same agenda into one and does not serve the purpose of streamlining several tasks under one call.

US Patent No. 8155280B1, assigned to Zvi Or-Bach and Tal Lavian, tried to tackle the disconnected IVR systems by maintaining a database of its own and linking to each menu of the different IVR systems.

However, the technique is not efficient enough, as the user can switch from one IVR to another, but the IVRs do not share data. Moreover, the users have to press menu buttons to reach the desired destination, which makes the system very cumbersome. Although that technology made multiple IVR systems accessible under one phone call, it was unable to make them interconnected.

Some prior inventions tried to address this problem by providing a visual form of IVR. These prior arts display the IVR menu graphically on the caller’s device. U.S. Pat. No. 7,215,743, assigned to International Business Machines Corporation, and a published U.S. patent application Ser. No. 11/957,605, filed Dec. 17, 2007 and assigned to Motorola Inc., provide the IVR menu of the destination in a visual form to the caller. The caller can select the options from the IVR menu without listening to the complete audio representation of the IVR menu. In these techniques, the IVR menu displayed on the caller device is stored on an IVR server at the destination end.

However, the visual IVR menu is specific to the destination, and only the IVR of the destination dialed is displayed. These techniques therefore require each destination to set up hardware, software and other facilities for providing visual IVR servers. Moreover, the caller must be literate enough to read and respond to the menu, or else the caller has to listen through the entire menu before proceeding.

Another existing technique, as disclosed in U.S. Pat. No. 6,560,320 assigned to International Business Machines Corporation, enables an operator of the IVR to send customized signals to the caller for generating and displaying graphical elements on the device of the caller. Thereafter, the caller can respond by selecting options through the touch-screen interface of the device by utilising Dual-tone multi-frequency (DTMF) signals of the IVR.

However, this technique requires a specifically configured device to interpret the codes sent as DTMF signals for generating the graphics. Moreover, an operator is required to present the graphics to the caller. Furthermore, specialized software and hardware are required at the operator’s end to design and generate DTMF codes. Therefore, the technique faces various practical limitations.

Generally, the IVR menus of organizations are presented as audible menus. Moreover, a large number of organizations use IVR menus. Therefore, converting the audible menus to visual IVR menus can be time consuming. An existing technique, disclosed in U.S. Pat. No. 6,920,425 assigned to Nortel Networks Limited, provides an automated script to convert audible menu scripts to visual IVR menu scripts.

However, the audible menu scripts must be available in a particular format to enable the conversion. Furthermore, the audio menu scripts must be available or downloadable for the program to function. As a result, only the audio menu scripts that are available can be converted to visual IVR menu scripts. Furthermore, the device of the caller must be designed or programmed to be able to display the visual IVR menu scripts.

The effectiveness of providing the IVR in visual form is discussed in a technical paper titled ‘The Benefits of Augmenting Telephone Voice Menu Navigation with Visual Browsing and Search’ by Min Yin et al. The paper discusses a setup, where visual content of the IVR is sent from a service provider to a computer connected to a mobile phone.

However, the technique discussed in the paper is limited to the visual content provided by the service provider's end, after the connection is established. Moreover, the providers are required to individually set up the hardware and services for providing visual content.

As discussed above, the existing technologies have various limitations. Hence, techniques are desired for providing enhanced IVR systems.

SUMMARY OF THE INVENTION

The invention provides an enhanced IVR system which lets the user perform multiple tasks across multiple services under one streamlined phone call with just voice input, without using any menus. The user is authenticated via voice or passcode and the credentials are fetched. The user tells the system the intended tasks. An automatic speech recognition (ASR) system converts the user’s utterances to text. A natural language processing (NLP) engine corrects any mistakes the ASR made during the conversion in languages such as Austroasiatic, Austronesian, Dravidian, Indo-Aryan, Afroasiatic, Sino-Tibetan and Tai-Kadai. Afterwards the logic system takes over and fetches the set of services needed by the tasks, checks whether these tasks are valid, fetches the required information and generates responses to be given to the user. Thereupon the response system takes charge and presents the responses to the user through speech synthesis or in text form (e.g. through SMS messages).

As a result, the invention tackles a wide array of problems not dealt with previously. The elimination of both acoustic and visual menus leads to more natural and time effective user interaction. There is no need to type in any information, since all data is extracted from voice; thus less literate users are able to interact with the IVR system effortlessly. In addition, the interconnection across multiple services allows the user to perform similar tasks in a single phone call session.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagrammatic view of the system organization for an exemplary embodiment of the presented invention; and

FIGS. 2, 4, 5 and 6 are segmented flow charts of the principal steps involved in using the presented invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT(S)

Referring to FIG. 1, a diagrammatic view of the system organization for an embodiment of the presented invention is shown. One or more callers 101 can initiate a call 104 to the open ended IVR system 200. The caller 101 can make the call 104 using any cellular device 102, which could be a smartphone or a feature phone, and the call 104 lands on the system 200, which runs on cloud servers, over the cellular network 103 that the device 102 uses. Once the call 104 has landed on the system 200, the caller 101 is greeted with an audio message and asked to speak the desired intent. Once the system 200 receives the caller’s 101 intent, the system 200 fetches the services 106 necessary to perform the intent via internet 105 based APIs (application programming interfaces). The system 200 checks whether the caller’s 101 intent is executable and, based on the intent, responds to the caller 101 in one of the following ways:

1. The system 200 reports the requested information to the caller 101 in audio, or in text via SMS Gateway servers.

2. The system 200 might ask for the caller’s 101 consent to execute the requested task, presenting the task to be executed in audio form.

3. The system 200 might report an error in the intent to the caller 101 in audio form.

If the caller 101 requested an informative intent or a task based intent, the caller 101 might ask to execute another intent or end the call. If the caller 101 receives an error, the caller 101 is asked to repeat the corrected intent. For example: a caller 101 calls the system 200, and the system 200 greets the caller 101 with an audio message and asks the caller 101 for the intent. The caller 101 speaks “Buy 2 dozen eggs from EGGY Store and pay the bill from my AB bank account and deliver the eggs to my home by EATS”. The system 200 processes the intent and responds to the caller 101 in audio with “From EGGY Store purchase 24 eggs. An amount BDT 50 will be debited from your AB bank account number XXXXXX by EGGY. Deliver eggs to house 16, flat A2, Jefferson Street by EATS. An amount BDT 8 will be debited from your AB bank account number XXXXXX by EATS. Confirm task?”. The caller 101 responds with “YES” and the system 200 replies “Your request was successfully executed”. The caller 101 ends the call. The system 200 sends two receipts via SMS, one from EGGY Store and another from the EATS delivery service.

The system may also invoke a particular service multiple times, based on the caller’s 101 needs. For example: a caller 101 calls the system 200. The system 200 greets the caller 101 with an audio message and asks the caller 101 for the intent. The caller 101 speaks “Transfer BDT 5000 from my account in ABC bank to account number XXXXX in ABC Bank and pay my Electric bill ID XXXX with my ABC bank account”. The system 200 processes the intent and responds to the caller 101 in audio with “From your account XXXX in ABC bank transfer BDT 5000 to account XXXXXX in ABC. An amount BDT 4200 will be deducted from your ABC bank account XXXXX in reference to the Electric bill ID XXXX. Confirm task?”. The caller 101 responds with “YES” and the system 200 replies “Your request was successfully executed”. The caller 101 ends the call. The system 200 sends two SMS logs, one for the account transfer and another for the electric bill payment.
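The two worked examples above suggest that a single spoken request decomposes into an ordered list of tasks, that the same service may appear more than once in that list, and that the caller hears one combined confirmation before anything is executed. The following Python sketch illustrates that shape only. The `Task` fields, the `confirmation_prompt` and `sms_logs` helpers, and the wording of the generated sentences are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """One unit of work extracted from the caller's request (hypothetical structure)."""
    service: str       # service that executes the task, e.g. "ABC Bank"
    description: str   # human readable summary read back to the caller
    amount_bdt: int    # amount to be deducted from the caller's account


def confirmation_prompt(tasks: List[Task]) -> str:
    """Combine all tasks into the single prompt spoken before execution."""
    parts = [
        f"{t.description} An amount BDT {t.amount_bdt} will be deducted by {t.service}."
        for t in tasks
    ]
    return " ".join(parts) + " Confirm task?"


def sms_logs(tasks: List[Task]) -> List[str]:
    """Produce one SMS log entry per task."""
    return [f"{t.service}: {t.description} (BDT {t.amount_bdt})" for t in tasks]


# Modeled on the second example above: the same service is invoked twice.
example_tasks = [
    Task("ABC Bank", "Transfer to account XXXXXX.", 5000),
    Task("ABC Bank", "Pay Electric bill ID XXXX.", 4200),
]

if __name__ == "__main__":
    print(confirmation_prompt(example_tasks))
    for log in sms_logs(example_tasks):
        print(log)
```

Keeping one entry per task, rather than one per service, is what lets a single service be invoked several times within the same call while still yielding one log per task.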

Referring to FIG. 2, a diagrammatic view of the open ended IVR system 200 for an exemplary embodiment of the presented invention is shown. When a call 104 enters the system 200, the Call Landing server 201 handles the call 104 and routes it to the authentication system 300. The authentication system 300 identifies the caller 101 and fetches the caller’s 101 credentials. The audio speech of the caller 101 is converted to a machine readable intent by the intent handler system 400 and passed to the logic system 500 for further processing. The logic system 500 fetches the required services, checks the viability of the intent and gives feedback to the response system 600. The response system 600 converts the feedback into a human understandable format and passes it to the caller 101 via the available channels.
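The hand-off between the stages described above can be sketched as a chain of function calls. This is a minimal illustration under stated assumptions, not the disclosed implementation: the in-memory `directory`, the splitting of an utterance on the word "and", and the rule that a task is viable when it names a service for which the caller has credentials are all hypothetical stand-ins for the real authentication, intent handling and logic components.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the stored caller records.
Directory = Dict[str, Dict[str, str]]


def authenticate(phone_number: str, directory: Directory) -> Dict[str, str]:
    """Authentication system 300: identify the caller and fetch credentials."""
    if phone_number not in directory:
        raise PermissionError("unknown caller")
    return directory[phone_number]


def handle_intent(utterance: str) -> List[str]:
    """Intent handler 400 stand-in: the utterance is assumed to be text already
    and is naively split into tasks on the word 'and'."""
    return [part.strip() for part in utterance.lower().split(" and ") if part.strip()]


def run_logic(credentials: Dict[str, str], tasks: List[str]) -> List[Tuple[str, bool]]:
    """Logic system 500 stand-in: a task is viable when it names a service
    for which the caller has stored credentials."""
    return [(task, any(service in task for service in credentials)) for task in tasks]


def respond(feedback: List[Tuple[str, bool]]) -> str:
    """Response system 600: turn the feedback into a human readable sentence."""
    failed = [task for task, ok in feedback if not ok]
    if failed:
        return "Error: cannot perform: " + "; ".join(failed)
    return f"Ready to perform {len(feedback)} task(s). Confirm task?"


def handle_call(phone_number: str, utterance: str, directory: Directory) -> str:
    """Call landing server 201: route one call through every stage in order."""
    credentials = authenticate(phone_number, directory)
    tasks = handle_intent(utterance)
    feedback = run_logic(credentials, tasks)
    return respond(feedback)
```

For instance, with `directory = {"+8801000000": {"abc bank": "token"}}`, calling `handle_call("+8801000000", "Check my balance in ABC bank", directory)` passes through all four stages and returns a confirmation prompt, while an unknown number is rejected at the authentication stage.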

Referring to FIG. 3, a diagrammatic view of the open ended IVR system 200 for an embodiment of the present invention is shown.

Referring to FIG. 4, a diagrammatic view of the intent handler system 400 for an exemplary embodiment of the present invention is shown. The audio from the caller 202 is passed to the automatic speech recognition (ASR) system 401, which converts the audio speech to text 402. The text 402 is passed to the natural language processing (NLP) system 403, which corrects any errors generated during audio to text conversion in languages such as Austroasiatic, Austronesian, Dravidian, Indo-Aryan, Afroasiatic, Sino-Tibetan and Tai-Kadai. The corrected text is then passed to the contextual AI system, which reads the text and collects the tasks 406 that the caller 101 wants to perform. The resulting machine readable intent 406 is passed to the logic system 500 for further processing.
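The three stages of the intent handler described above (speech to text, correction of recognition errors, and task extraction) can be illustrated with a toy pipeline. Everything here is an assumption made for illustration: the ASR is an injected callable, the `CORRECTIONS` word table stands in for the NLP system, and the regular expressions in `TASK_PATTERNS` stand in for the contextual AI system. A real system would use trained models for each stage.

```python
import re
from typing import Callable, Dict, List

# Hypothetical table of common misrecognitions and their corrections.
CORRECTIONS: Dict[str, str] = {"transfar": "transfer", "bil": "bill"}

# Hypothetical task patterns; each captures one argument of the task.
TASK_PATTERNS = {
    "transfer": re.compile(r"transfer bdt (\d+)", re.IGNORECASE),
    "bill_payment": re.compile(r"pay my (\w+) bill", re.IGNORECASE),
}


def correct_text(text: str) -> str:
    """NLP stand-in: replace known misrecognized words, keep all others."""
    return " ".join(CORRECTIONS.get(word.lower(), word) for word in text.split())


def extract_tasks(text: str) -> List[Dict[str, str]]:
    """Contextual AI stand-in: collect every task the text mentions."""
    tasks = []
    for name, pattern in TASK_PATTERNS.items():
        for match in pattern.finditer(text):
            tasks.append({"task": name, "argument": match.group(1)})
    return tasks


def handle_audio(audio: bytes, asr: Callable[[bytes], str]) -> List[Dict[str, str]]:
    """Run the three stages in order: ASR, correction, task extraction."""
    raw_text = asr(audio)
    corrected = correct_text(raw_text)
    return extract_tasks(corrected)
```

Passing the ASR in as a parameter keeps the sketch runnable without any speech library; in a test it can be replaced by a function that returns a fixed transcript.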

Referring to FIG. 5, a diagrammatic view of the logic system 500 for an exemplary embodiment of the presented invention is shown. Intent 406 can be one of the following 3 types:

1. New intent 501

2. Correction intent 502

3. Confirmation intent 503

In the case of a new intent 501 or a correction intent 502, the logic system 500 retrieves the services required to perform the intent (504). The logic system 500 then evaluates whether the intent is valid and executable (506). If the intent 507 requires only information, the data is retrieved and passed to the response system 600. Otherwise, either the intent waits for the caller’s 101 confirmation, or the intent contains an error, which has to be corrected by the caller 101 by providing a new intent 406. Upon receiving the confirmation intent 503, the tasks are executed and the result is passed on to the response system 600.
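The branching described above amounts to a small state machine: new and correction intents are validated and either answered, rejected or parked until confirmed, and a confirmation intent executes whatever is parked. The sketch below is one possible reading, with assumptions labeled: tasks are reduced to service names, validity means only that the service is known, and the `LogicSystem`, `Intent` and status strings are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set, Tuple


class IntentType(Enum):
    NEW = "new"                    # new intent 501
    CORRECTION = "correction"      # correction intent 502
    CONFIRMATION = "confirmation"  # confirmation intent 503


@dataclass
class Intent:
    kind: IntentType
    tasks: List[str] = field(default_factory=list)  # assumed: one service name per task
    information_only: bool = False


class LogicSystem:
    def __init__(self, available_services: Set[str]) -> None:
        self.available_services = set(available_services)
        self.pending: Optional[List[str]] = None  # tasks awaiting confirmation

    def process(self, intent: Intent) -> Tuple[str, List[str]]:
        """Return a (status, payload) pair destined for the response system."""
        if intent.kind is IntentType.CONFIRMATION:
            if self.pending is None:
                return ("error", [])  # nothing is waiting to be confirmed
            executed, self.pending = self.pending, None
            return ("executed", executed)

        # New and correction intents follow the same path and replace
        # any previously pending task list.
        self.pending = None
        unknown = [t for t in intent.tasks if t not in self.available_services]
        if unknown or not intent.tasks:
            return ("error", unknown)
        if intent.information_only:
            return ("information", list(intent.tasks))
        self.pending = list(intent.tasks)
        return ("needs_confirmation", list(intent.tasks))
```

Treating a correction intent exactly like a new intent mirrors the text, which routes both through service retrieval and validation before any confirmation is requested.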

Referring to FIG. 6, a diagrammatic view of the response system 600 for an exemplary embodiment of the presented invention is shown. The logic system generates 2 types of responses:

1. A confirmation request, when the system needs a confirmation on the task list (502).

2. A report to be delivered to the caller 101, which can be information or an error (503).

The response is converted to human readable text 504 based on the logic system response 501. The human readable text 504 is then passed either to the text to speech (TTS) system 505 or to an SMS gateway server, before it is delivered to the caller 101.
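The response path described above has two steps: render the logic system's response as human readable text, then hand that text to either the TTS system or the SMS gateway. The sketch below shows one way this could look. The channel selection rule (confirmations are always spoken, reports are spoken while the call is active and sent by SMS otherwise) is an assumption, since the text does not state how the channel is chosen, and `speak` and `send_sms` are hypothetical callables standing in for the TTS system and the SMS gateway server.

```python
from typing import Callable, List


def to_human_text(kind: str, items: List[str]) -> str:
    """Render one of the two response types as human readable text."""
    if kind == "confirmation":
        return "; ".join(items) + ". Confirm task?"
    if kind == "report":
        return "Report: " + "; ".join(items)
    raise ValueError(f"unknown response kind: {kind}")


def deliver(
    kind: str,
    items: List[str],
    call_active: bool,
    speak: Callable[[str], None],
    send_sms: Callable[[str], None],
) -> str:
    """Convert the response to text and pass it to one delivery channel.

    Returns the name of the channel used ("tts" or "sms").
    """
    text = to_human_text(kind, items)
    # Assumed rule: a confirmation needs a spoken answer, so it always uses TTS.
    if kind == "confirmation" or call_active:
        speak(text)
        return "tts"
    send_sms(text)
    return "sms"
```

In a test, `speak` and `send_sms` can simply be the `append` methods of two lists, which makes it easy to check which channel received which text.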

Claims

We Claim:

1. System and method of operating open ended interactive voice response in any spoken languages such as Austroasiatic, Austronesian, Dravidian, Indo-Aryan, Afroasiatic, Sino-Tibetan and Tai-Kadai, comprising:

a system of call landing via wireless communication for mobile devices, such as cellular or data networks, where a call landing server picks up the call and passes the call to the authentication server, wherein the authentication server verifies the user and fetches stored information associated with the corresponding user; and

a system of input voice in any language by the user, wherein the voice intent is converted to text by the ASR, and the text is passed through an NLP engine to correct any grammatical mistakes; further, the corrected text is passed to the contextual AI engine, which picks up the user’s desired tasks and passes the tasks to the logic system, wherein the said logic system checks, validates and fetches the services required to complete the tasks, with user confirmation, which may be required by the logic system to execute the task; and

a response system, whose output is generated as a reply to the user’s audio input, where the system uses either TTS (Text to Speech) or SMS notification to ask for or provide feedback, such that, with a combination of modular servers and services, the system works as one streamlined input mechanism.

2. System and method of operating open ended interactive voice response in any spoken languages according to claim 1, wherein, after authentication, the system fetches from storage the list of services that the user is affiliated with, and is also in charge of fetching from the data center the user’s credentials required to operate the corresponding services.

3. System and method of operating open ended interactive voice response in any spoken languages according to claim 1, wherein the system takes analog input from users in audio form in any language, which is converted to text by an ASR and NLP engine.

4. System and method of operating open ended interactive voice response in any spoken languages according to claim 1, wherein the system is connected to multiple services via APIs, wherein the system asks affiliated business/es to provide services to the users within periodical time intervals or one at a time with the user’s consent.

5. System and method of operating open ended interactive voice response in any spoken languages according to claim 1, wherein the system invokes multiple services or a single service multiple times based on the user’s task as coded in the logic server.

6. System and method of operating open ended interactive voice response in any spoken languages according to claim 1, wherein the system asks the users to provide inputs multiple times based on the task requirements.

7. System and method of operating open ended interactive voice response in any spoken languages according to claim 1, wherein the system passes data from one service to another or within the service if necessary for the execution of the required task.

8. System and method of operating open ended interactive voice response in any spoken languages according to claim 1, wherein the system responds to users with TTS (Text to Speech) to ask users to provide further information, to report an error that has been generated or to provide the requested information; and a system that uses an SMS Gateway server as a form of record keeping or log generation.