WO2001038962A1 - Voice activated hyperlinks - Google Patents

Voice activated hyperlinks

Info

Publication number
WO2001038962A1
Authority
WO
WIPO (PCT)
Prior art keywords
functions
user
function
lists
input action
Application number
PCT/US1999/028004
Other languages
French (fr)
Inventor
Michael J. Polcyn
Original Assignee
Intervoice Limited Partnership
Application filed by Intervoice Limited Partnership filed Critical Intervoice Limited Partnership
Priority to AU24743/00A priority Critical patent/AU2474300A/en
Priority to EP99968050A priority patent/EP1240579A1/en
Priority to PCT/US1999/028004 priority patent/WO2001038962A1/en
Publication of WO2001038962A1 publication Critical patent/WO2001038962A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482 - Interaction with lists of selectable items, e.g. menus
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 - Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 - Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M3/00 - Automatic or semi-automatic exchanges
    • H04M3/42 - Systems providing special services or facilities to subscribers
    • H04M3/487 - Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 - Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals



Abstract

The inventive system provides a user with a plurality of function choices based upon the communication device of the user. The system includes a device specific variable state transition model that formats the system functions into the function choices based upon a device of the user. Thus, if the user has a personal computer or other device with a large screen, the functions are formatted into a single level so that all of the functions are displayed simultaneously. If the user has a TDD or PDA or other device with a small screen, the functions are formatted into two or more levels, depending upon the screen size, so that only a manageable portion of the functions are displayed simultaneously. If the system lacks a screen, then the functions are formatted into a plurality of levels, each level having only a small number of function choices. The system uses a data base having key words that describe each function, and permutation lists that have expanded lists of phrases that are synonymous with the key words. The system uses a dialogue engine to compare the commands from the user with the key words and permutation lists to identify the requested function.

Description

VOICE ACTIVATED HYPERLINKS
TECHNICAL FIELD OF THE INVENTION
This invention relates in general to automated telephony systems, and in specific to
voice activated menu command systems.
BACKGROUND OF THE INVENTION
The existing prior art uses a scripting tool and an execution environment for interactive
voice mail audio text that defines scripts in a very linear fashion. For instance, a script
definition could be "wait for a telephone ring." When the phone rings, then answer;
prompt the caller with a predefined set of choices; receive the selected choice via either DTMF
or voice recognition, and then perform the selected function.
FIGURE 7 is a flow diagram of a typical prior art interactive voice recognition/DTMF
program 700, i.e., "for information regarding ... speak or press 1 now". Typical programs
follow a sequence of events: telephone ring and answer 701, prompt and greet the caller 702,
present some finite number of choices or functions 704 in a predetermined fashion 703, and
receive the user input 705 either through a DTMF input or a voice command. Then perform
the selected function 706, and so forth.
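The rigid sequence of steps 701-706 can be sketched as a short script. This is a minimal illustration, not code from the patent; the menu contents, step labels, and the `get_input` callback are assumptions.

```python
# A minimal sketch of the prior-art linear IVR flow (steps 701-706 of FIGURE 7).
# The menu choices and log labels are illustrative assumptions.

def run_prior_art_script(get_input):
    """Run one pass of a rigidly scripted IVR session and log each step."""
    log = []
    log.append("ring and answer")                               # step 701
    log.append("greet caller")                                  # step 702
    choices = {"1": "savings", "2": "checking", "3": "loans"}   # steps 703-704
    log.append("prompt: " + ", ".join(
        f"press {k} for {v}" for k, v in choices.items()))
    selection = get_input()                                     # step 705: DTMF or voice
    log.append("perform: " + choices.get(selection, "invalid"))  # step 706
    log.append("return to prompt")  # the caller always loops back to the top
    return log
```

Because the prompt, the choices, and the return point are hard-coded into the script, any change to the flow requires reprogramming; this is the inflexibility the following paragraphs criticize.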
Whether the system uses a voice recognition choice or a numeric DTMF input, some
function is invoked, and again in a very predetermined manner, the caller is returned to the
prompt 702 at the beginning of the program. The prior art system is very inflexible in that the
flow of the choices or functions, and the presentation of those choices, are tied very tightly to
the functions in the function definition.
The limitation of the prior art is that it is very rigid and only able to present a finite
number of options to the caller in a predetermined fashion. Moreover, the logic of the
presentation to the caller is preset, and thus is limited by the creativity of the system
programmer.
SUMMARY OF THE INVENTION
The invention is to decouple the invocation and execution of functions from the
manner in which they are selected. For instance, in a listing of twenty-five functions in a voice
menu, no more than three or four choices are presented to the caller via DTMF
at any given menu step in the call. This is because of ergonomic or human limitations.
However, if the same functions are displayed on a screen to the caller, for example via an
internet session, there is no reason that all twenty-five of the functions could not be presented
at the same time. By decoupling the functions from the method by which those functions are
invoked, the functions can be leveraged across many different access devices and technologies.
For example, a twenty-five function system can be invoked from voice menus by
grouping certain sets of the functions through a choice engine or choice selection mechanism.
However, a graphical interface allows the presentation of many more options, if not all
options, to the caller.
Another advantage is that both natural language voice recognition and screen based
interfaces allow the user to exit to the top level choices in the choice engine once the user
completes a function. In the prior art, by contrast, the user is very restricted as to allowable
navigation through functions because of how functions are nested. Specifically, a user may
only move up one level at a time. Thus, all possible navigation paths must be pre-defined by
the programmer.
The invention allows for dynamic addition and deletion of both function and
access methods. Decoupling of the functions allows for a fully webbed navigation of the
system. The user can go from function 1, directly to function 2, then to function 3 without
necessarily having to return to the top level menu or choice engine. This provides more
efficient navigation to desired functions.
A technical advantage of the present invention is to promote re-use of defined
functions.
Another technical advantage of the present invention is to allow the addition of new
access methods independent of functions.
A further technical advantage of the present invention is to allow the addition of new
functions independent of defined access methods.
A further technical advantage of the present invention is to have a device specific
variable state transition model that formats the different functions into function choices based
upon a device of the user.
A further technical advantage of the present invention is to have a key word data base
comprising key words used to describe each function in the system, and to have a plurality of
permutation lists comprising expanded lists of phrases that are synonymous with the key
words.
A further technical advantage of the present invention is to have a dialogue engine for
parsing voice input from the user into segments and examining each segment for key words
and expanded phrases for identifying the requested function.
A further technical advantage of the present invention is to have a learning operation
for ascertaining the meaning and the function association for an unknown phrase received
from the user, and amending the key word data base and permutation lists to include the
unknown phrase.
A further technical advantage of the present invention is to have the permutation lists
predefined by applying thesaurus and lexicon applications to the key word data base.
A further technical advantage of the present invention is to use a format controller for
determining characteristics of the device, and arranging the functions into function choices
based upon limitations of the device.
The foregoing has outlined rather broadly the features and technical advantages of the
present invention in order that the detailed description of the invention that follows may be
better understood. Additional features and advantages of the invention will be described
hereinafter which form the subject of the claims of the invention. It should be appreciated by
those skilled in the art that the conception and the specific embodiment disclosed may be
readily utilized as a basis for modifying or designing other structures for carrying out the same
purposes of the present invention. It should also be realized by those skilled in the art that
such equivalent constructions do not depart from the spirit and scope of the invention as set
forth in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present invention, and the advantages
thereof, reference is now made to the following descriptions taken in conjunction with the
accompanying drawings, in which:
FIGURE 1 depicts the inventive hyperlink system;
FIGURE 2 depicts a telephone access decision tree;
FIGURE 3 depicts a screen phone or PDA decision tree;
FIGURE 4 depicts a personal computer decision tree;
FIGURE 5 depicts the logical groupings of related functions;
FIGURE 6 depicts the device specific variable state transition
model; and FIGURE 7 depicts a flow diagram of a prior art program.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIGURE 1 depicts the inventive hyperlink system 100. The system 100 interacts with
various types of communications equipment, including a personal computer or PC 101. The
interface with the PC 101 may be some proprietary PC interface, a graphical interface or
possibly an internet browser interface. Another type of communication equipment may be a
TDD (telecommunication device for the deaf) screen phone 102, which is a telephone device
with a VDT screen for displaying text or images. The screen phone may be a PDA (personal
digital assistant). A further type of communication equipment that interacts with the system is a
telephone interface 103. This interface 103 may be to a standard DTMF telephone, but this
interface may also be to many other types of telephonic devices including faxes and other
devices that allow for dialogue. The device specific variable state transition model 104 acts as
an interpretive assembler in that the model selects and arranges the various functions 10, 11,
12, 99 based upon the requesting device, i.e., the computer 101, the screen phone 102, and the
telephone 103. The functions 10, 11, 12, 99 represent the various options and choices that a
system user has when operating the system. The system 100 of FIGURE 1 can be divided into
three sections, the devices 101, 102, 103, the device specific variable state transition model
104, and the functions 10, 11, 12, 99. The system 100 presents choices and options,
or functions 10, 11, 12, 99, to the user in a variable manner based on the device type 101, 102,
103 that the user is using to access the system, without forcing the system programmer to
anticipate all the different devices that may attempt to access the system when the programmer
develops the code for the system.
An example of the system 100 is a banking system. Function 10, for instance, in the
banking application, may be a savings account menu. Function 11 may be a checking account menu; function 12 may be a credit card account menu; and another function may be a loan
application function. On a touch tone telephone for instance, the user would be offered these
different functions as voice choices with DTMF input. Specifically, push one for savings,
push two for checking and so on. If the user entered the system with a PC, however, the
different functions may be presented in a series of text choices or graphical icons. The
inventive system 100 relieves the programmer of having to anticipate the navigation portion,
i.e., device specific portion for determining the different functions and how these functions
access the data and format the data for output.
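The decoupling described above can be sketched as one function set with several presentations. The device names and rendering strings below are assumptions for illustration only.

```python
# Sketch of FIGURE 1's idea: the same function set, presented differently
# per requesting device. Device names and formats are illustrative assumptions.

FUNCTIONS = ["savings menu", "checking menu", "credit card menu", "loan application"]

def present(functions, device):
    """Format the same functions for the requesting device."""
    if device == "telephone":
        # DTMF voice menu: "push one for savings" style prompts
        return [f"push {i} for {name}" for i, name in enumerate(functions, start=1)]
    if device == "pc":
        # Large screen: every function shown at once as an icon or text choice
        return [f"[icon] {name}" for name in functions]
    raise ValueError("unknown device: " + device)
```

The functions themselves never change; only the presentation layer varies, so the programmer does not have to anticipate each device when writing the functions.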
FIGURE 2 depicts the output menu or tree structure 200 from the device specific
variable state transition model when a telephone 103 is used to access the system 100. As an
example, the tree structure 200 shown in FIGURE 2 defines 20 functions. These functions are
the same functions shown in FIGURE 1. The caller first receives a greeting 201 and a
message defining the top level of input choices, shown as functions F1 10, F2 11, and F3 12.
Selection of one of these choices reveals a second layer of functions. For example, if F1 10 is
selected, the caller is presented with further options of F4, F5 and F6. However, the caller is
not presented all of the different functions F1-F20 because of ergonomic limitations.
The device specific variable state transition model 104 has recognized the limitations
of the caller's telephone device 103 and has appropriately formatted the presentation of the
functions. However, this has limited the user's movement around the tree in that a caller
cannot move directly from F9 203 to F11 204 without moving back up the tree to F1 10. This
works similarly for a move from F9 203 to F19 205; except in this instance, the caller must
return to the greeting 201.
FIGURE 3 depicts the output menu or tree structure 300 from the device specific
variable state transition model when a screen telephone or PDA 102 is used to access the
system 100. As an example, the tree structure 300 shown in FIGURE 3 defines 20 functions;
these functions are the same functions shown in FIGURES 1 and 2. First, the caller is shown a
greeting 301 and a message defining the top level of input choices, shown as functions F1 10,
F2 11, and F3 12. Selection of one of these choices reveals a second layer of functions.
As compared to FIGURE 2, however, the device specific variable state transition
model 104 has recognized that the caller is using a screen telephone device or PDA 102, which
has a small screen that is capable of displaying several lines of text or a few graphical icons,
and thus has appropriately formatted the presentation of the functions so that some, but not all,
are displayed at the same time. A small or diminutively sized screen is capable of displaying
some of the functions on the screen simultaneously, but not all of the functions. In this
instance, the user can freely move between F9 203 and F11 204 without moving back up the
tree to F1 10. But the caller cannot move from F9 203 to F19 205 without returning to the
greeting 301 because of the small screen size.
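The formatting decision behind FIGURES 2-4 amounts to chunking the same twenty functions into menu levels sized to the device. A sketch follows; the per-device capacities are illustrative assumptions, since the patent gives only qualitative limits.

```python
# Sketch: split a flat list of functions into menu levels sized to the device.
# The capacities (3 choices per voice menu, 8 per small screen, all 20 on a PC)
# are illustrative assumptions.

CAPACITY = {"telephone": 3, "screen_phone": 8, "pc": 20}

def format_menu(functions, device):
    """Chunk the function list into levels of at most CAPACITY[device] items."""
    size = CAPACITY[device]
    return [functions[i:i + size] for i in range(0, len(functions), size)]

functions = [f"F{n}" for n in range(1, 21)]
```

A telephone caller gets many small levels and must climb the tree to move between distant functions; the PC user gets a single level and can jump anywhere.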
FIGURE 4 depicts the output menu or tree structure 400 from the device specific
variable state transition model when a personal computer 101 or other large screen device is
used to access the system 100. A large or substantially sized screen is capable of displaying
all or nearly all of the functions on the screen simultaneously. As an example, the tree
structure 400 shown in FIGURE 4 defines 20 functions; these functions are the same functions
shown in FIGURES 1-3. First, the caller is shown a greeting 401 and then a text message or
graphical icons that define all of the functions F1 to F20.
As compared to FIGURES 2 and 3, however, the device specific variable state
transition model 104 has recognized that the caller is using personal computer 101, which has
a large screen that is capable of displaying many lines of text or many graphical icons, and
thus has appropriately formatted the presentation of the functions so that all of the functions
are displayed at the same time. In this instance, the user can freely move between F9 203 and
F11 204, as well as from F9 203 to F19 205, without returning to the greeting 401.
FIGURE 4 also depicts the tree structure 400 when using a natural language voice
recognition system, which means that if, for example, the caller using a telephone 103, screen
phone 102, or PC based phone 101 utters a phrase such as "I want my savings account balance,"
the system would automatically invoke function F1 10. Using such a system would put all of
the functions F1 to F20 at the disposal of the user at the same time.
FIGURE 5 depicts the logical groupings of related functions. This shows the functions
and their relationship to each other and describes the hierarchy or priority of the particular
functions. For instance, there are logical groupings in the banking system example previously
discussed, such as types of accounts, e.g. savings, checking, credit cards, and loans, or types of
transactions, e.g. balance inquiries, withdrawals, deposits, and transfers.
These groupings can be arranged into particular sets of functions that are roughly
equivalent in precedence like withdrawals and deposits. Moreover, the different elements of
the different groups can be combined to form various associated groups, such as all of the
types of transactions can be combined with each of the account types. For example, savings
can be combined with balance inquiries, withdrawals, deposits, and transfers to form a savings
account transaction grouping.
FIGURE 5 is a diagrammatic representation of these types of groups, but this could be
described physically, graphically or in some text definition, which would show some logical
grouping of these functions. In FIGURE 5, function F1 10 is associated with F2 and F3 in logical
group B, but is also associated with F3, F4 and F9 in logical group A. This grouping of
functions would support a program that generates the actual navigation structures of the access
methods in determining how the choices are presented according to the different types of
devices. In the case of touch-tone telephone input, it would use the information provided here
in FIGURE 5 to set a hierarchical menu structure shown in FIGURE 2. In the case of access by
a screen phone device, the system would provide some logical grouping choices in a graphical
representation as shown in FIGURE 3, either as a logical grouping of icons or text menu
choices. Thus, the grouping diagram of FIGURE 5 provides the necessary information to
express the association between the different functions.
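The groupings of FIGURE 5 can be sketched as a small data structure. F1's membership in groups A and B follows the text; the remaining group members are assumptions.

```python
# Sketch of the logical groupings of FIGURE 5. F1's membership in groups A and B
# follows the text; the other members are illustrative assumptions.

GROUPS = {
    "A": {"F1", "F3", "F4", "F9"},
    "B": {"F1", "F2", "F3"},
}

def groups_of(function):
    """Names of every logical group containing the function."""
    return sorted(name for name, members in GROUPS.items() if function in members)

def related(function):
    """Functions reachable from `function` through any shared group."""
    linked = set()
    for name in groups_of(function):
        linked |= GROUPS[name]
    linked.discard(function)
    return sorted(linked)
```

A navigation generator could walk `related` to build the hierarchical telephone menu of FIGURE 2 or the icon groupings of FIGURE 3 from the same data.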
FIGURE 6 depicts the device specific variable state transition model 104 that uses the
groups of functions shown in FIGURE 5 and arranges the functions according to the device
type, as shown in FIGURES 2-4.
FIGURE 6 includes a series of functions 601. These functions are the same functions
depicted in previous figures. In keeping with the example of banking functions used
previously, an example of a discrete function would be, for instance, a cleared checks function
602. This function is account specific, i.e., it is only applicable to the checking account.
However, other functions, such as a balance inquiry function, would not be account specific
and could be re-used or called by other functions, i.e., checking account, savings account, or
credit card menu functions.
The cleared check function 602 involves accessing a host system for a given
account, then retrieving the details of the last several checks that have cleared through that
account, and then formatting this information in a manner that is presentable to the caller, i.e.,
for a voice activated dialogue transaction as shown in FIGURE 2, then speaking that
information to the caller.
The cleared check function 602 has a certain number of required inputs 604; for
example, there may be an account number or a security key like a pin number. Other inputs
include parameters that are passed to the function from the user; for instance, the number of
checks to be examined, such as the last five checks or the last ten checks. The cleared check
function 602 also has a certain number of methods 605 for performing its function with the
input 604. For example, within the function there are data access methods, data manipulation
methods and data formatting methods. The function 602 has an output section 606 for sending
the output back to the caller in a format acceptable to the device being used by the caller.
The cleared check function 602 has a logical group section 603 that defines links and
associations to other functions as depicted in FIGURE 5. The function 602 also has the ability
to invoke other functions 607 when necessary. For example, cleared check function 602 can
invoke a security function if the pin number is incorrectly entered.
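The anatomy just described, required inputs (604), a logical group link (603), and the ability to invoke another function (607), can be sketched as a small object. The field names and callback shapes are assumptions for illustration.

```python
# Sketch of the function anatomy of FIGURE 6: required inputs (item 604), links
# to logical groups (item 603), and a fallback invoked when inputs are missing
# (item 607). Field names and callbacks are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Function:
    name: str
    required_inputs: list   # e.g. account number and pin (item 604)
    logical_groups: list    # links to related functions (item 603)

    def invoke(self, supplied, perform, on_missing):
        """Run `perform` only if every required field is filled; otherwise
        fall back, e.g. to a security prompt."""
        missing = [f for f in self.required_inputs if f not in supplied]
        if missing:
            return on_missing(missing)
        return perform(supplied)

cleared_checks = Function("cleared checks", ["account", "pin"], ["checking"])
```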
The cleared check function 602 can be invoked by a user 608 in a number of different
ways, depending upon the user's device 614. For instance, in a non-voice manner or graphical
format 609, the function may appear as an icon to be activated by the caller. Another way is
via voice dialogue 610, although this avenue is problematic because of the number of ways
spoken language can describe the desired function. A user 608 could say "give me the history of the checking account", or "look at (or examine, retrieve, check, verify etc.) the last ten
checks", and so on.
Thus, as a function 602 is developed for voice input action 610, a certain number of
key action words or phrases will be described for that function. The key action words or
phrases will be a starting point to describe that functionality. A further step, taken at non-run
time, i.e., during the development of the function, is to run the key action phrases through a
lexicon-grammar engine that constructs the various permutations of the possible requests. If
enough of the key word phrases are defined up front, and given a sufficiently intelligent
grammar engine, then the dialogue engine 611 could determine the proper function that would
satisfy the user's 608 needs and match the user's input request.
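Building the permutation list at development time can be sketched with a synonym table. The tiny table below stands in for the lexicon-grammar engine, whose internals the patent does not specify.

```python
# Sketch: expand each key action phrase through a synonym table into a
# permutation list. The table is an illustrative stand-in for the
# lexicon-grammar engine.

SYNONYMS = {"look at": ["examine", "retrieve", "check", "verify"]}

def expand(phrase):
    """Return the key phrase plus every synonym substitution."""
    variants = [phrase]
    for word, alternatives in SYNONYMS.items():
        if word in phrase:
            variants += [phrase.replace(word, alt) for alt in alternatives]
    return variants
```

Running every key phrase through such an expansion up front is what lets the run-time dialogue engine match the many ways a caller may word the same request.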
The permutation lists 613 and the key word data base 612 feed the dialogue engine
611. The dialogue engine 611 is the reactive mechanism that allows the user 608 to call or
otherwise verbally communicate with the system 100. The engine 611 parses a command
sentence or expression into words or phrases. These words and phrases are examined by the
engine for various key action phrases that are defined in the key word data base 612 and the
expanded permutations of those key words stored in the permutation lists 613. The dialogue
engine 611 would then determine which function is being invoked by the user from the data in
the key word data base 612 and the lists 613. In the example shown in FIGURE 6, the user
608 may have said, "what are the last ten checks that have cleared my account."
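A minimal sketch of this matching step follows; the function names, vocabularies, and word-overlap scoring are illustrative assumptions, not the patent's actual parsing method.

```python
# Hypothetical key word data base and permutation lists, keyed by function.
KEYWORD_DB = {
    "cleared_check": {"cleared", "checks", "check"},
    "account_balance": {"balance", "funds"},
}
PERMUTATION_LISTS = {
    "cleared_check": {"posted", "processed"},
    "account_balance": {"available", "remaining"},
}

def identify_function(utterance):
    """Parse an utterance into words and pick the function whose
    key words and permutations best overlap with it."""
    words = set(utterance.lower().replace("?", "").replace(".", "").split())
    best, best_score = None, 0
    for func in KEYWORD_DB:
        vocab = KEYWORD_DB[func] | PERMUTATION_LISTS[func]
        score = len(words & vocab)
        if score > best_score:
            best, best_score = func, score
    return best

func = identify_function("what are the last ten checks that have cleared my account")
```

On the example utterance from FIGURE 6, both "checks" and "cleared" hit the cleared-check vocabulary, so that function is selected.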
The format controller 615 organizes the functions into a hierarchical arrangement
depending upon the type of device 614 of the user 608. The information about the user's
device is obtained in several ways. The most direct is for the format controller to solicit the
device type from the user, i.e., press one for a touch tone phone, two for a screen phone, etc.
Another way is for the format controller to compare the signal characteristics of the command with
a stored data base of such information to determine the type of device being used by the user.
Another way is for the format controller to send interrogatory signals to the device and have
the device respond as to its type. Once the user's device type is ascertained, the format
controller will display the functions as shown in FIGURES 2-4. The format controller works
with the dialogue engine to control the responses sent by the dialogue engine back to the user,
such that if full natural language voice recognition is being utilized by the user, the format of
the functions will appear as shown in FIGURE 4. However, if only a voice response unit is
being used (i.e., say one for savings), then the format of the functions as shown in FIGURE 2
would be used.
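The device-dependent formatting can be sketched as below. The device names, function list, and page sizes are assumptions for illustration; the actual hierarchies are those shown in FIGURES 2-4.

```python
# Illustrative function list; a real system would draw these from
# the logical groups of FIGURE 5.
FUNCTIONS = ["cleared checks", "balance", "transfer", "stop payment",
             "statement", "new checks"]

def format_menu(device_type, functions=FUNCTIONS, page_size=3):
    """Arrange functions into one or more menu levels based on the device."""
    if device_type == "pc":            # large screen: one flat level
        return [functions]
    if device_type == "screen_phone":  # small screen: a few paged levels
        return [functions[i:i + page_size]
                for i in range(0, len(functions), page_size)]
    # no display (touch-tone / voice response): deeper hierarchy of short lists
    return [functions[i:i + 2] for i in range(0, len(functions), 2)]

menus = format_menu("screen_phone")
```

The same set of functions thus collapses into one level for a full-screen device and expands into more, shorter levels as the device's presentation capability shrinks.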
Each function 601 has other inputs, either voice or non-voice. In the example of
FIGURE 6, one of the required inputs is an account number and PIN. If these inputs are
not provided, the dialogue engine 611 will not attempt to perform a function without having
its required fields filled. The engine 611 will either terminate the function request or
invoke another function 607; for example, a security function would prompt the user to supply
the missing account number and PIN. The security function would then verify these
numbers and fill in the data set of the invoking function 602. The security function, if needed,
could invoke other functions.
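The required-field check and security hand-off can be sketched as follows; the field names, the placeholder credentials, and the stand-in security function are hypothetical.

```python
# Required input fields per function (illustrative).
REQUIRED = {"cleared_check": ["account_number", "pin"]}

def security_function(data):
    """Stand-in for the invoked security function: it would prompt the
    user for missing credentials, verify them, and fill the data set.
    Placeholder values are used here instead of real prompting."""
    data.setdefault("account_number", "123456")
    data.setdefault("pin", "0000")
    return data

def invoke(function, data):
    """Invoke a function only after its required fields are filled."""
    missing = [f for f in REQUIRED.get(function, []) if f not in data]
    if missing:
        data = security_function(data)  # fill the invoking function's data set
    return f"{function} invoked with {sorted(data)}"

result = invoke("cleared_check", {"account_number": "123456"})
```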
Beginning with a limited number of key action words and phrases, followed by a
process through grammar analysis, thesaurus support, lexicons, and dictionaries (generating
the various permutations of this relatively limited set of definitions), would allow for a more
flexible system and make a natural language voice recognizer much more useful. Thus, it would open up a much broader range of dialogue and a more useful approach to natural
language dialogues between the user and the system.
This system's approach is not to independently define a transaction engine with all of
the permutations and then provide a limited, finite number of them for use. Rather, it
describes the function, the general access methods, the data elements required, and the key
action words and phrases that a user would typically use, and builds all of the permutations of
that access method, automatically feeding the dialogue engine.
The use of key words and phrases has the same application in non-voice settings, for
example on a screen phone. For a list selection of functions to be displayed, the choices
may be narrowed to a few, or even a single item, which would then be added dynamically to a
menu on the screen phone.
The dialogue engine is capable of learning new word/phrase associations for existing
functions. For example, if the dialogue engine receives a command or request, and the engine
is unable to determine the action or the function in the command because the choice of words
used by the user to convey the command is not located in either the key word data base or the
permutation lists, then the dialogue engine will ascertain the meaning of the command. The
engine will do this either by asking the user to restate the command in other terms or by
consulting built-in dictionaries, thesauruses, and the like to determine the most likely meaning
of the command, and then proceed accordingly. The engine will then update its permutation
lists to contain the new reference from the command.
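The learning path can be sketched as follows; the thesaurus entry and list contents are hypothetical, and the fall-back of re-prompting the user is elided to a comment.

```python
# Hypothetical permutation lists and a one-entry stand-in thesaurus.
permutation_lists = {"cleared_check": {"cleared", "posted"}}
THESAURUS = {"settled": "cleared"}  # illustrative synonym entry

def learn(phrase):
    """When a phrase is unknown, look it up in the built-in thesaurus and,
    if a known synonym is found, add the phrase to that function's
    permutation list so it is recognized directly next time."""
    known = THESAURUS.get(phrase)
    for func, phrases in permutation_lists.items():
        if known in phrases:
            phrases.add(phrase)  # remember the new word/phrase association
            return func
    return None  # would instead ask the user to restate the command

func = learn("settled")
```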
Because the functions are decoupled from the manner that the functions are invoked,
additional functions may be added by the system manager without requiring revision of the
existing functions. Only the key word data base and the permutation lists would have to be updated when a new function is added to the system. However, if the new function is going to
be invoked by existing functions or will be associated with particular logical groups, then
those particular, existing functions will have to be updated. Additionally, it may be possible
for the user to define user-specific functions characterizing a scripting preference for the user.
These user defined functions would be formed from combining or modifying existing
functions.
Although the present invention and its advantages have been described in detail, it
should be understood that various changes, substitutions and alterations can be made herein
without departing from the spirit and scope of the invention as defined by the appended
claims.

Claims

WHAT IS CLAIMED IS:
1. A system for providing a user with a plurality of function choices, the system
comprising:
a device specific variable state transition model; and
a plurality of functions;
wherein the state transition model formats the functions into function choices based
upon a device of the user.
2. The system of claim 1, wherein:
the functions are associated into a plurality of logical groupings, wherein each logical
group comprises functions which have a related functionality.
3. The system of claim 2, wherein:
at least one function is associated with more than one logical group.
4. The system of claim 1, wherein:
the device is selected from the group consisting of: a personal computer, a telephone, a
screen telephone, a personal data assistant, and a TDD.
5. The system of claim 1, wherein:
the device has a substantially sized display screen, and the data is formatted into a
single level comprising all of the functions as function choices.
6. The system of claim 1, wherein:
the device has a diminutively sized display screen and the data is formatted into at least
two levels, each level comprising a portion of the functions as function choices.
7. The system of claim 1, wherein:
the device lacks a display screen and the data is formatted into a plurality of levels,
each level comprising a portion of the functions as function choices.
8. The system of claim 7, wherein:
the device is a telephone using a voice response system.
9. The system of claim 7, wherein:
the device is a telephone using a touch tone response system.
10. The system of claim 1, wherein:
the device lacks a display screen and the data is formatted into a single level
comprising all of the functions as function choices.
11. The system of claim 10, wherein:
the device is a telephone using a natural language voice recognition system.
12. The system of claim 1, wherein the transition model comprises:
a key word data base comprising key words used to describe each function;
a plurality of permutation lists comprising expanded lists of phrases that are
synonymous with the key words; and
a dialogue engine for receiving voice input action from the user, parsing the input
action into segments, searching the segments for key words and expanded lists of phrases,
identifying a requested function in the voice input action, and activating the requested
function.
13. The system of claim 12, wherein the dialogue engine comprises:
a learning operation for ascertaining a meaning and a function association for an
unknown phrase received from the user, and amending the key word data base and
permutation lists to include the unknown phrase.
14. The system of claim 12, further comprising:
a format controller for determining characteristics of the device, and arranging the
functions into function choices based upon limitations of the device.
15. The system of claim 14, wherein:
the format controller determines the characteristics of the device by querying the user.
16. The system of claim 14, wherein:
the format controller determines the characteristics of the device by interrogating the
device.
17. The system of claim 14, wherein:
the format controller determines the characteristics of the device by analyzing signal
characteristics of the input action.
18. The system of claim 12, wherein:
the permutation lists are predefined.
19. The system of claim 18, wherein:
the permutation lists are formed by applying a thesaurus application to the key word
data base.
20. The system of claim 18, wherein:
the permutation lists are formed by applying a lexicon application to the key word data
base.
21. The system of claim 18, wherein:
the permutation lists are formed by applying a lexicon application and a thesaurus
application to the key word data base.
22. The system of claim 1, wherein the transition model comprises:
a format controller for determining characteristics of the device, and arranging the
functions into function choices based upon limitations of the device.
23. The system of claim 22, wherein:
the format controller determines the characteristics of the device by querying the user.
24. The system of claim 22, wherein:
the format controller determines the characteristics of the device by interrogating the
device.
25. The system of claim 22, wherein:
the format controller determines the characteristics of the device by analyzing signal
characteristics of a command issued by the device from the user.
26. A variable state transition system for arranging a plurality of functions into a
presentation format based upon a device of a user of the system, the system comprising:
a key word data base comprising key words used to describe each function;
a plurality of permutation lists comprising expanded lists of phrases that are
synonymous with the key words;
a dialogue engine for receiving voice input action from the user, parsing the input
action into segments, searching the segments for key words and expanded lists of phrases,
identifying a requested function in the voice input action, and activating the requested
function; and
a format controller for determining characteristics of the device, arranging the
functions into function choices based upon limitations of the device.
27. The system of claim 26, wherein the dialogue engine comprises:
a learning operation for ascertaining a meaning and a function association for an
unknown phrase received from the user, and amending the key word data base and
permutation lists to include the unknown phrase.
28. The system of claim 26, wherein:
the format controller determines the characteristics of the device by performing one of
the group of methods consisting of querying the user, interrogating the device, and analyzing
signal characteristics of the input action.
29. The system of claim 26, wherein:
the permutation lists are predefined.
30. A variable state transition system for arranging a plurality of functions into a
presentation format based upon a device of a user of the system, the system comprising:
means for determining characteristics of the device;
means for arranging the functions into the presentation format based upon limitations
of the device to allow the user to select a requested function;
means for storing key words used to describe each function;
means for storing expanded lists of phrases that are synonymous with the key words;
means for receiving voice input action from the user;
means for parsing the input action into segments;
means for searching the segments for key words and expanded lists of phrases; and
means for identifying the requested function in the voice input action.
31. The system of claim 30, wherein the means for identifying comprises:
means for ascertaining a meaning and a function association for an unknown phrase
received from the user, and amending the key word data base and permutation lists to include
the unknown phrase.
32. The system of claim 30, wherein the means for determining comprises:
means for determining the characteristics of the device by querying the user.
33. The system of claim 30, wherein the means for determining comprises:
means for determining the characteristics of the device by interrogating the device.
34. The system of claim 30, wherein the means for determining comprises:
means for determining the characteristics of the device by analyzing signal
characteristics of a command issued by the device from the user.
35. The system of claim 30, wherein:
the expanded lists of phrases are predefined.

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU24743/00A AU2474300A (en) 1999-11-23 1999-11-23 Voice activated hyperlinks
EP99968050A EP1240579A1 (en) 1999-11-23 1999-11-23 Voice activated hyperlinks
PCT/US1999/028004 WO2001038962A1 (en) 1999-11-23 1999-11-23 Voice activated hyperlinks


Publications (1)

Publication Number Publication Date
WO2001038962A1 true WO2001038962A1 (en) 2001-05-31

Family

ID=22274153


Country Status (3)

Country Link
EP (1) EP1240579A1 (en)
AU (1) AU2474300A (en)
WO (1) WO2001038962A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822404A (en) * 1996-09-30 1998-10-13 Intervoice Limited Partnership System and method for identifying remote communications formats
US5825856A (en) * 1994-03-31 1998-10-20 Citibank, N.A. Interactive voice response system for banking by telephone
EP0949571A2 (en) * 1998-04-07 1999-10-13 Xerox Corporation Document re-authoring systems and methods for providing device-independent access to the world wide web


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102024454A (en) * 2009-09-10 2011-04-20 三菱电机株式会社 System and method for activating plurality of functions based on speech input
EP3843091A1 (en) * 2017-10-03 2021-06-30 Google LLC Voice user interface shortcuts for an assistant application
JP2022003408A (en) * 2017-10-03 2022-01-11 グーグル エルエルシーGoogle LLC Voice user interface shortcuts for assistant application
US11450314B2 (en) 2017-10-03 2022-09-20 Google Llc Voice user interface shortcuts for an assistant application
JP7297836B2 (en) 2017-10-03 2023-06-26 グーグル エルエルシー Voice user interface shortcuts for assistant applications
EP4270171A3 (en) * 2017-10-03 2023-12-13 Google LLC Voice user interface shortcuts for an assistant application

Also Published As

Publication number Publication date
AU2474300A (en) 2001-06-04
EP1240579A1 (en) 2002-09-18


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 1999968050

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1999968050

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642