EP3005075A1 - User interfaces that adapt automatically to enable hands-free interaction - Google Patents

User interfaces that adapt automatically to enable hands-free interaction

Info

Publication number
EP3005075A1
Authority
EP
European Patent Office
Prior art keywords
user
electronic device
assistant
input
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP14736158.8A
Other languages
German (de)
English (en)
Inventor
Thomas R. GRUBER
Harry J. SADDLER
Lia T. NAPOLITANO
Emily Clark Schubert
Brian Conrad SUMNER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apple Inc
Original Assignee
Apple Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 13/913,421 (US 10,705,794 B2)
Application filed by Apple Inc filed Critical Apple Inc
Publication of EP3005075A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/60Substation equipment, e.g. for use by subscribers including speech amplifiers
    • H04M1/6033Substation equipment, e.g. for use by subscribers including speech amplifiers for providing handsfree use or a loudspeaker mode in telephone sets
    • H04M1/6041Portable telephones adapted for handsfree use
    • H04M1/6075Portable telephones adapted for handsfree use adapted for handsfree use in a vehicle
    • H04M1/6083Portable telephones adapted for handsfree use adapted for handsfree use in a vehicle by interfacing with the vehicle audio system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/60Substation equipment, e.g. for use by subscribers including speech amplifiers
    • H04M1/6033Substation equipment, e.g. for use by subscribers including speech amplifiers for providing handsfree use or a loudspeaker mode in telephone sets
    • H04M1/6041Portable telephones adapted for handsfree use
    • H04M1/6075Portable telephones adapted for handsfree use adapted for handsfree use in a vehicle
    • H04M1/6083Portable telephones adapted for handsfree use adapted for handsfree use in a vehicle by interfacing with the vehicle audio system
    • H04M1/6091Portable telephones adapted for handsfree use adapted for handsfree use in a vehicle by interfacing with the vehicle audio system including a wireless interface

Definitions

  • the present invention relates to multimodal user interfaces, and more specifically to user interfaces that include both voice-based and visual modalities.
  • voice command systems which map specific verbal commands to operations, for example to initiate dialing of a telephone number by speaking the person's name.
  • An intelligent automated assistant also referred to herein as a virtual assistant, is able to provide an improved interface between human and computer, including the processing of natural language input.
  • Such an assistant, which may be implemented as described in related U.S. Utility Application Serial No. 12/987,982 for "Intelligent Automated Assistant", allows users to interact with a device or system using natural language, in spoken and/or text forms.
  • Such an assistant interprets user inputs, operationalizes the user's intent into tasks and parameters to those tasks, executes services to support those tasks, and produces output that is intelligible to the user.
  • Virtual assistants are capable of using general speech and natural language understanding technology to recognize a greater range of input, enabling generation of a dialog with the user.
  • Some virtual assistants can generate output in a combination of modes, including verbal responses and written text, and can also provide a graphical user interface (GUI) that permits direct manipulation of on-screen elements.
  • the user may not always be in a situation where he or she can take advantage of such visual output or direct manipulation interfaces.
  • the user may be driving or operating machinery, or may have a sight disability, or may simply be uncomfortable or unfamiliar with the visual interface.
  • any situation in which a user has limited or no ability to read a screen or interact with a device via contact is referred to herein as a "hands-free context".
  • For example, in situations where the user is attempting to operate a device while driving, as mentioned above, the user can hear audible output and respond using their voice, but for safety reasons should not read fine print, tap on menus, or enter text.
  • Hands-free contexts present special challenges to the builders of complex systems such as virtual assistants. Users demand full access to features of devices whether or not they are in a hands-free context. However, failure to account for particular limitations inherent in hands-free operation can result in situations that limit both the utility and the usability of a device or system, and can even compromise safety by causing a user to be distracted from a primary task such as operating a vehicle.
  • SUMMARY
  • a user interface for a system such as a virtual assistant is automatically adapted for hands-free use.
  • a hands-free context is detected via automatic or manual means, and the system adapts various stages of a complex interactive system to modify the user experience to reflect the particular limitations of such a context.
  • the system of the present invention thus allows for a single implementation of a virtual assistant or other complex system to dynamically offer user interface elements and to alter user interface behavior to allow hands-free use without compromising the user experience of the same system for hands-on use.
  • the system of the present invention provides mechanisms for adjusting the operation of a virtual assistant so that it provides output in a manner that allows users to complete their tasks without having to read details on a screen.
  • the virtual assistant can provide mechanisms for receiving spoken input as an alternative to reading, tapping, clicking, typing, or performing other functions often achieved using a graphical user interface.
  • the system of the present invention provides underlying functionality that is identical to (or that approximates) that of a conventional graphical user interface, while allowing for the particular requirements and limitations associated with a hands-free context. More generally, the system of the present invention allows core functionality to remain substantially the same, while facilitating operation in a hands-free context.
  • systems built according to the techniques of the present invention allow users to freely choose between hands-free mode and conventional ("hands-on") mode, in some cases within a single session. For example, the same interface can be made adaptable to both an office environment and a moving vehicle, with the system dynamically making the necessary changes to user interface behavior as the environment changes.
  • any of a number of mechanisms can be implemented for adapting operation of a virtual assistant to a hands-free context.
  • the virtual assistant is an intelligent automated assistant as described in U.S. Utility Application Serial No. 12/987,982 for "Intelligent Automated Assistant", filed January 10, 2011, the entire disclosure of which is incorporated herein by reference.
  • Such an assistant engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
  • a virtual assistant may be configured, designed, and/or operable to detect a hands-free context and to adjust its operation accordingly in performing various different types of operations, functionalities, and/or features, and/or to combine a plurality of features, operations, and applications of an electronic device on which it is installed.
  • a virtual assistant of the present invention can detect a hands-free context and adjust its operation accordingly when receiving input, providing output, engaging in dialog with the user, and/or performing (or initiating) actions based on discerned intent.
  • Actions can be performed, for example, by activating and/or interfacing with any applications or services that may be available on an electronic device, as well as services that are available over an electronic network such as the Internet.
  • activation of external services can be performed via application programming interfaces (APIs) or by any other suitable mechanism(s).
  • the present invention can provide a hands-free usage environment for many different applications and functions of an electronic device, and with respect to services that may be available over the Internet.
  • the use of such a virtual assistant can relieve the user of the burden of learning what functionality may be available on the device and on web- connected services, how to interface with such services to get what he or she wants, and how to interpret the output received from such services; rather, the assistant of the present invention can act as a go-between between the user and such diverse services.
  • the virtual assistant of the present invention provides a conversational interface that the user may find more intuitive and less burdensome than conventional graphical user interfaces.
  • the user can engage in a form of conversational dialog with the assistant using any of a number of available input and output mechanisms, depending in part on whether a hands-free or hands-on context is active.
  • Examples of such input and output mechanisms include, without limitation, speech, graphical user interfaces (buttons and links), text entry, and the like.
  • the system can be implemented using any of a number of different platforms, such as device APIs, the web, email, and the like, or any combination thereof.
  • Requests for additional input can be presented to the user in the context of a conversation presented in an auditory and/or visual manner.
  • Short and long term memory can be engaged so that user input can be interpreted in proper context given previous events and communications within a given session, as well as historical and profile information about the user.
  • the virtual assistant of the present invention can control various features and operations of an electronic device.
  • the virtual assistant can call services that interface with functionality and applications on a device via APIs or by other means, to perform functions and operations that might otherwise be initiated using a conventional user interface on the device.
  • functions and operations may include, for example, setting an alarm, making a telephone call, sending a text message or email message, adding a calendar event, and the like.
  • Such functions and operations may be performed as add-on functions in the context of a conversational dialog between a user and the assistant.
  • Such functions and operations can be specified by the user in the context of such a dialog, or they may be automatically performed based on the context of the dialog.
  • the assistant can thereby be used as a mechanism for initiating and controlling various operations on the electronic device.
  • the system of the present invention is able to present mechanisms for enabling hands-free operation of a virtual assistant to implement such a mechanism for controlling the device.
  • Fig. 1 is a screen shot illustrating an example of a hands-on interface for reading a text message, according to the prior art.
  • Fig. 2 is a screen shot illustrating an example of an interface for responding to a text message.
  • Figs. 3A and 3B are a sequence of screen shots illustrating an example wherein a voice dictation interface is used to reply to a text message.
  • Fig. 4 is a screen shot illustrating an example of an interface for receiving a text message, according to one embodiment.
  • Figs. 5A through 5D are a series of screen shots illustrating an example of operation of a multimodal virtual assistant according to an embodiment of the present invention, wherein the user receives and replies to a text message in a hands-free context.
  • Figs. 6A through 6C are a series of screen shots illustrating an example of operation of a multimodal virtual assistant according to an embodiment of the present invention, wherein the user revises a text message in a hands-free context.
  • FIGs. 7A-7D are flow diagrams of methods of adapting a user interface, according to some embodiments.
  • Fig. 7E is a flow diagram depicting methods of operation of a virtual assistant that supports dynamic detection of and adaptation to a hands-free context, according to one embodiment.
  • Fig. 8 is a block diagram depicting an example of a virtual assistant system according to one embodiment.
  • Fig. 9 is a block diagram depicting a computing device suitable for implementing at least a portion of a virtual assistant according to at least one embodiment.
  • Fig. 10 is a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a standalone computing system, according to at least one embodiment.
  • Fig. 11 is a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a distributed computing network, according to at least one embodiment.
  • Fig. 12 is a block diagram depicting a system architecture illustrating several different types of clients and modes of operation.
  • Fig. 13 is a block diagram depicting a client and a server, which communicate with each other to implement the present invention according to one embodiment.
  • Figs. 14A-14L are flow diagrams depicting a method of operation of a virtual assistant that provides hands-free list reading, according to some embodiments.
  • a hands-free context is detected in connection with operations of a virtual assistant, and the user interface of the virtual assistant is adjusted accordingly, so as to enable the user to interact with the assistant meaningfully in the hands-free context.
  • the term "virtual assistant" is equivalent to the term "intelligent automated assistant", both referring to any information processing system that performs one or more of the following functions: interpreting human language input, in spoken and/or text form; operationalizing the user's intent into tasks and parameters to those tasks; executing services to support those tasks; and producing output that is intelligible to the user.
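  • As a rough illustration of those four functions working together, the following Python sketch (with invented class and method names, not taken from the patent) shows the interpret, operationalize, execute, and respond stages in sequence:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Intent:
    """A simplified representation of operationalized user intent."""
    task: str                      # e.g. "send_message"
    parameters: Dict[str, str] = field(default_factory=dict)

class VirtualAssistant:
    """Minimal sketch of the interpret -> operationalize -> execute -> respond loop."""

    def __init__(self, services: Dict[str, Callable[[Dict[str, str]], str]]):
        self.services = services   # task name -> service callable

    def interpret(self, utterance: str) -> Intent:
        # A real assistant applies speech recognition and natural-language parsing;
        # here only one hypothetical phrasing is recognized, for illustration.
        if utterance.lower().startswith("reply to"):
            return Intent("send_message", {"recipient": utterance[9:].strip()})
        return Intent("unknown")

    def respond(self, text: str, hands_free: bool) -> None:
        # Output is rendered as speech in a hands-free context, on screen otherwise.
        channel = "SPOKEN" if hands_free else "DISPLAYED"
        print(f"[{channel}] {text}")

    def handle(self, utterance: str, hands_free: bool = False) -> None:
        intent = self.interpret(utterance)                 # interpret input
        service = self.services.get(intent.task)           # operationalize into a task
        if service is None:
            self.respond("Sorry, I didn't understand that.", hands_free)
            return
        result = service(intent.parameters)                # execute supporting service
        self.respond(result, hands_free)                   # produce intelligible output

# Usage: register a hypothetical messaging service and run one request.
assistant = VirtualAssistant({"send_message": lambda p: f"Message to {p['recipient']} is ready."})
assistant.handle("Reply to Tom Devon", hands_free=True)
```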
  • Devices that are in communication with each other need not be in continuous communication with each other, unless expressly specified otherwise.
  • devices that are in communication with each other may communicate directly or indirectly through one or more intermediaries.
  • embodiments include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise.
  • the virtual assistant techniques disclosed herein may be implemented on hardware or a combination of software and hardware. For example, they may be implemented in an operating system kernel, in a separate user process, in a library package bound into network applications, on a specially constructed machine, and/or on a network interface card. In a specific embodiment, the techniques disclosed herein may be implemented in software such as an operating system or in an application running on an operating system.
  • Software/hardware hybrid implementation(s) of at least some of the virtual assistant embodiment(s) disclosed herein may be implemented on a programmable machine selectively activated or reconfigured by a computer program stored in memory.
  • Such network devices may have multiple network interfaces which may be configured or designed to utilize different types of network communication protocols. A general architecture for some of these machines may appear from the descriptions disclosed herein.
  • At least some of the features and/or functionalities of the various virtual assistant embodiments disclosed herein may be implemented on one or more general-purpose network host machines such as an end-user computer system, computer, network server or server system, mobile computing device (e.g., personal digital assistant, mobile phone, smartphone, laptop, tablet computer, or the like), consumer electronic device, music player, or any other suitable electronic device, router, switch, or the like, or any combination thereof.
  • at least some of the features and/or functionalities of the various virtual assistant embodiments disclosed herein may be implemented in one or more virtualized computing environments (e.g., network computing clouds, or the like).
  • Computing device 60 may be, for example, an end-user computer system, network server or server system, mobile computing device (e.g., personal digital assistant, mobile phone, smartphone, laptop, tablet computer, or the like), consumer electronic device, music player, or any other suitable electronic device, or any combination or portion thereof.
  • Computing device 60 may be adapted to communicate with other computing devices, such as clients and/or servers, over a communications network such as the Internet, using known protocols for such communication, whether wireless or wired.
  • computing device 60 includes central processing unit (CPU) 62. When acting under the control of appropriate software or firmware, CPU 62 may be responsible for implementing specific functions associated with the functions of a specifically configured computing device or machine.
  • a user's personal digital assistant (PDA) or smartphone may be configured or designed to function as a virtual assistant system utilizing CPU 62, memory 61, 65, and interface(s) 68.
  • the CPU 62 may be caused to perform one or more of the different types of virtual assistant functions and/or operations under the control of software modules/components, which, for example, may include an operating system and any appropriate applications software, drivers, and the like.
  • CPU 62 may include one or more processor(s) 63 such as, for example, a processor from the Motorola or Intel family of microprocessors or the MIPS family of microprocessors.
  • processor(s) 63 may include specially designed hardware (e.g., application-specific integrated circuits (ASICs), electrically erasable programmable read-only memories (EEPROMs), field-programmable gate arrays (FPGAs), and the like) for controlling the operations of computing device 60.
  • a memory 61 (such as non-volatile random access memory (RAM) and/or read-only memory (ROM)) also forms part of CPU 62.
  • Memory block 61 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions, and the like.
  • the term "processor" is not limited merely to those integrated circuits referred to in the art as a processor, but broadly refers to a microcontroller, a microcomputer, a programmable logic controller, an application-specific integrated circuit, and any other programmable circuit.
  • interfaces 68 are provided as interface cards (sometimes referred to as "line cards"). Generally, they control the sending and receiving of data packets over a computing network and sometimes support other peripherals used with computing device 60.
  • interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like.
  • interfaces may be provided such as, for example, universal serial bus (USB), Serial, Ethernet, Firewire, PCI, parallel, radio frequency (RF), Bluetooth™, near-field communications (e.g., using near-field magnetics), 802.11 (WiFi), frame relay, TCP/IP, ISDN, fast Ethernet interfaces, Gigabit Ethernet interfaces, asynchronous transfer mode (ATM) interfaces, high-speed serial interface (HSSI) interfaces, Point of Sale (POS) interfaces, fiber distributed data interfaces (FDDIs), and the like.
  • Although Fig. 9 illustrates one specific architecture for a computing device 60 for implementing the techniques of the invention described herein, it is by no means the only device architecture on which at least a portion of the features and techniques described herein may be implemented.
  • architectures having one or any number of processors 63 can be used, and such processors 63 can be present in a single device or distributed among any number of devices.
  • a single processor 63 handles communications as well as routing computations.
  • different types of virtual assistant features and/or functionalities may be implemented in a virtual assistant system which includes a client device (such as a personal digital assistant or smartphone running client software) and server system(s) (such as a server system described in more detail below).
  • the system of the present invention may employ one or more memories or memory modules (such as, for example, memory block 65) configured to store data, program instructions for the general-purpose network operations and/or other information relating to the functionality of the virtual assistant techniques described herein.
  • the program instructions may control the operation of an operating system and/or one or more applications, for example.
  • the memory or memories may also be configured to store data structures, keyword taxonomy information, advertisement information, user click and impression information, and/or other specific non-program information described herein.
  • At least some network device embodiments may include nontransitory machine-readable storage media, which, for example, may be configured or designed to store program instructions, state information, and the like for performing various operations described herein.
  • nontransitory machine-readable storage media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM), flash memory, memristor memory, random access memory (RAM), and the like.
  • Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
  • the system of the present invention is implemented on a standalone computing system.
  • Referring now to Fig. 10, there is shown a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a standalone computing system, according to at least one embodiment.
  • Computing device 60 includes processor(s) 63 which run software for implementing multimodal virtual assistant 1002.
  • Input device 1206 can be of any type suitable for receiving user input, including for example a keyboard, touchscreen, mouse, touchpad, trackball, five-way switch, joystick, and/or any combination thereof.
  • Device 60 can also include speech input device 1211, such as for example a microphone.
  • Output device 1207 can be a screen, speaker, printer, and/or any combination thereof.
  • Memory 1210 can be random-access memory having a structure and architecture as are known in the art, for use by processor(s) 63 in the course of running software.
  • Storage device 1208 can be any magnetic, optical, and/or electrical storage device for storage of data in digital form; examples include flash memory, magnetic hard drive, CD-ROM, and/or the like.
  • the system of the present invention is implemented on a distributed computing network, such as one having any number of clients and/or servers.
  • Referring now to Fig. 11, there is shown a block diagram depicting an architecture for implementing at least a portion of a virtual assistant on a distributed computing network, according to at least one embodiment.
  • any number of clients 1304 are provided; each client 1304 may run software for implementing client-side portions of the present invention.
  • any number of servers 1340 can be provided for handling requests received from clients 1304.
  • Clients 1304 and servers 1340 can communicate with one another via electronic network 1361, such as the Internet.
  • Network 1361 may be, for example, the Internet or any other suitable communications network.
  • servers 1340 can call external services 1360 when needed to obtain additional information or to refer to stored data concerning previous interactions with particular users. Communications with external services 1360 can take place, for example, via network 1361.
  • external services 1360 include web-enabled services and/or functionality related to or installed on the hardware device itself.
  • assistant 1002 can obtain information stored in a calendar application ("app"), contacts, and/or other sources.
  • assistant 1002 can control many features and operations of an electronic device on which it is installed.
  • assistant 1002 can call external services 1360 that interface with functionality and applications on a device via APIs or by other means, to perform functions and operations that might otherwise be initiated using a conventional user interface on the device.
  • functions and operations may include, for example, setting an alarm, making a telephone call, sending a text message or email message, adding a calendar event, and the like.
  • Such functions and operations may be performed as add-on functions in the context of a conversational dialog between a user and assistant 1002.
  • Such functions and operations can be specified by the user in the context of such a dialog, or they may be automatically performed based on the context of the dialog.
  • assistant 1002 can thereby be used as a control mechanism for initiating and controlling various operations on the electronic device, which may be used as an alternative to conventional mechanisms such as buttons or graphical user interfaces.
  • the user may provide input to assistant 1002 such as "I need to wake up tomorrow at 8am".
  • assistant 1002 can call external services 1360 to interface with an alarm clock function or application on the device.
  • Assistant 1002 sets the alarm on behalf of the user. In this manner, the user can use assistant 1002 as a replacement for conventional mechanisms for setting the alarm or performing other functions on the device. If the user's requests are ambiguous or need further clarification, assistant 1002 can use the various techniques described herein, including active elicitation, paraphrasing, suggestions, and the like, which may be adapted to a hands-free context, so that the correct services 1360 are called and the intended action taken.
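  • A minimal Python sketch of this alarm example follows; the service function is a hypothetical stand-in for whatever alarm API the device exposes, and the phrase parsing is deliberately simplistic:

```python
import re

def set_device_alarm(hour: int, minute: int) -> str:
    """Stand-in for a device alarm-clock service reached via an API."""
    return f"Alarm set for {hour:02d}:{minute:02d}."

def handle_wake_request(utterance: str) -> str:
    # Discern the intent: a wake-up request with a time parameter.
    match = re.search(r"wake.*?(\d{1,2})\s*(am|pm)", utterance, re.IGNORECASE)
    if not match:
        return "What time would you like to wake up?"   # elicit the missing parameter
    hour = int(match.group(1)) % 12
    if match.group(2).lower() == "pm":
        hour += 12
    # Call the alarm service on the user's behalf instead of requiring GUI interaction.
    return set_device_alarm(hour, 0)

print(handle_wake_request("I need to wake up tomorrow at 8am"))  # -> Alarm set for 08:00.
```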
  • assistant 1002 may prompt the user for confirmation and/or request additional context information from any suitable source before calling a service 1360 to perform a function.
  • a user can selectively disable assistant's 1002 ability to call particular services 1360, or can disable all such service-calling if desired.
  • the system of the present invention can be implemented with any of a number of different types of clients 1304 and modes of operation.
  • Referring now to Fig. 12, there is shown a block diagram depicting a system architecture illustrating several different types of clients 1304 and modes of operation.
  • One skilled in the art will recognize that the various types of clients 1304 and modes of operation shown in Fig. 12 are merely exemplary, and that the system of the present invention can be implemented using clients 1304 and/or modes of operation other than those depicted. Additionally, the system can include any or all of such clients 1304 and/or modes of operation, alone or in any combination. Depicted examples include:
  • Computer devices with input/output devices and/or sensors 1402. A client component may be deployed on any such computer device 1402.
  • At least one embodiment may be implemented using a web browser 1304A or other software application for enabling communication with servers 1340 via network 1361.
  • Input and output channels may be of any type, including for example visual and/or auditory channels.
  • the system of the invention can be implemented using voice-based communication methods, allowing for an embodiment of the assistant for the blind whose equivalent of a web browser is driven by speech and uses speech for output.
  • for mobile devices, the client may be implemented as an application on the mobile device 1304B.
  • Networked computing devices such as routers 1418, or any other device that resides on or interfaces with a network, for which the client may be implemented as a device-resident application 1304E.
  • Email Modality server 1426 acts as a communication bridge, for example taking input from the user as email messages sent to the assistant and sending output from the assistant to the user as replies.
  • Messaging Modality server 1430 acts as a communication bridge, taking input from the user as messages sent to the assistant and sending output from the assistant to the user as messages in reply.
  • Voice telephones 1432 for which an embodiment of the assistant is connected via a Voice over Internet Protocol (VoIP) Modality Server 1434.
  • VoIP Modality server 1434 acts as a communication bridge, taking input from the user as voice spoken to the assistant and sending output from the assistant to the user, for example as synthesized speech, in reply.
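  • The modality servers described above share a common shape: decode a channel-specific message into text for the assistant, then encode the assistant's reply back onto the same channel. The Python sketch below illustrates that shape with an invented interface; it is not the patent's actual server design:

```python
from typing import Callable

class ModalityBridge:
    """Generic bridge between a communication channel and the assistant."""

    def __init__(self, assistant: Callable[[str], str],
                 decode: Callable[[object], str],
                 encode: Callable[[str], object]):
        self.assistant = assistant   # takes user text, returns assistant reply text
        self.decode = decode         # channel message -> plain text input
        self.encode = encode         # plain reply text -> channel message

    def handle(self, channel_message: object) -> object:
        user_text = self.decode(channel_message)
        reply_text = self.assistant(user_text)
        return self.encode(reply_text)

# Usage as a toy "email" bridge: subject lines in, reply bodies out.
bridge = ModalityBridge(
    assistant=lambda text: f"Received your request: {text}",
    decode=lambda email: email["subject"],
    encode=lambda reply: {"body": reply},
)
print(bridge.handle({"subject": "What's on my calendar today?"}))
```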
  • assistant 1002 may act as a participant in the conversations. Assistant 1002 may monitor the conversation and reply to individuals or the group using one or more of the techniques and methods described herein for one-to-one interactions.
  • functionality for implementing the techniques of the present invention can be distributed among any number of client and/or server components.
  • various software modules can be implemented for performing various functions in connection with the present invention, and such modules can be variously implemented to run on server and/or client components. Further details for such an arrangement are provided in related U.S. Utility Application Serial No. 12/987,982 for "Intelligent Automated Assistant", referenced above.
  • input elicitation functionality and output processing functionality are distributed among client 1304 and server 1340, with client part of input elicitation 2794a and client part of output processing 2792a located at client 1304, and server part of input elicitation 2794b and server part of output processing 2792b located at server 1340.
  • the following components are located at server 1340:
  • client 1304 maintains subsets and/or portions of these components locally, to improve responsiveness and reduce dependence on network communications.
  • Such subsets and/or portions can be maintained and updated according to well known cache management techniques.
  • Such subsets and/or portions include, for example:
  • Additional components may be implemented as part of server 1340, including for example:
  • dialog flow processor 2780;
  • Server 1340 obtains additional information by interfacing with external services 1360 when needed.
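  • One way to picture the client-side subsets and caches described above is a small cache object that refreshes its copy from the server only when stale. The Python sketch below assumes a hypothetical fetch callable and is illustrative rather than the patent's actual cache management scheme:

```python
import time
from typing import Set

class CachedSubset:
    """Client-side subset of a server-maintained component (e.g. a vocabulary subset)."""

    def __init__(self, fetch_subset, max_age_seconds: float = 300.0):
        self._fetch = fetch_subset          # callable returning the current subset
        self._max_age = max_age_seconds
        self._data: Set[str] = set()
        self._loaded_at = 0.0

    def get(self) -> Set[str]:
        # Refresh from the server copy only when the cached copy is stale,
        # reducing dependence on network communications.
        if time.time() - self._loaded_at > self._max_age:
            self._data = self._fetch()
            self._loaded_at = time.time()
        return self._data

# Usage with a stand-in for the server's complete vocabulary.
server_vocabulary = {"send", "cancel", "review", "change"}
client_vocab = CachedSubset(lambda: set(server_vocabulary))
print("send" in client_vocab.get())   # True; served from the local cache thereafter
```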
  • multimodal virtual assistant 1002 may be configured, designed, and/or operable to provide various different types of operations, functionalities, and/or features generally relating to virtual assistant technology. Further, as described in greater detail herein, many of the various operations, functionalities, and/or features of multimodal virtual assistant 1002 disclosed herein may enable or provide different types of advantages and/or benefits to different entities interacting with multimodal virtual assistant 1002.
  • the embodiment shown in Fig. 8 may be implemented using any of the hardware architectures described above, or using a different type of hardware architecture.
  • multimodal virtual assistant 1002 may be implemented using any of the hardware architectures described above, or using a different type of hardware architecture.
  • multimodal virtual assistant 1002 may be configured, designed, and/or operable to provide various different types of operations, functionalities, and/or features, such as, for example, one or more of the following (or combinations thereof):
  • multimodal virtual assistant 1002 may also enable the combined use of several sources of data and services at once. For example, it may combine information about products from several review sites, check prices and availability from multiple distributors, and check their locations and time constraints, and help a user find a personalized solution to their problem.
  • multimodal virtual assistant 1002 can be used to initiate, operate, and control many functions and apps available on the device.
  • multimodal virtual assistant 1002 may be implemented at one or more client systems(s), at one or more server system(s), and/or combinations thereof.
  • multimodal virtual assistant 1002 may use contextual information in interpreting and operationalizing user input, as described in more detail herein.
  • multimodal virtual assistant 1002 may be operable to utilize and/or generate various different types of data and/or other types of information when performing specific tasks and/or operations. This may include, for example, input data/information and/or output data/information.
  • multimodal virtual assistant 1002 may be operable to access, process, and/or otherwise utilize information from one or more different types of sources, such as, for example, one or more local and/or remote memories, devices and/or systems.
  • multimodal virtual assistant 1002 may be operable to generate one or more different types of output data/information, which, for example, may be stored in memory of one or more local and/or remote devices and/or systems.
  • Examples of different types of input data/information which may be accessed and/or utilized by multimodal virtual assistant 1002 may include, but are not limited to, one or more of the following (or combinations thereof):
  • Location information coming from sensors or location-based systems; examples include Global Positioning System (GPS) and Assisted GPS (A-GPS) on mobile phones.
  • location information is combined with explicit user input.
  • the system of the present invention is able to detect when a user is at home, based on known address information and current location determination. In this manner, certain inferences may be made about the type of information the user might be interested in when at home as opposed to outside the home, as well as the type of services and actions that should be invoked on behalf of the user depending on whether or not he or she is at home.
  • Time information from clocks on client devices may include, for example, time from telephones or other client devices indicating the local time and time zone.
  • time may be used in the context of user requests, such as, for instance, to interpret phrases such as "in an hour" and "tonight".
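  • A small Python sketch of how a client device's clock might anchor such phrases; the convention chosen for "tonight" (8pm local time) is an assumption made for illustration only:

```python
from datetime import datetime, timedelta

def interpret_relative_time(phrase: str, now: datetime) -> datetime:
    """Resolve a relative time phrase against the client device's local clock."""
    phrase = phrase.lower().strip()
    if phrase == "in an hour":
        return now + timedelta(hours=1)
    if phrase == "tonight":
        # Assumed convention: "tonight" means 8pm on the current local day.
        return now.replace(hour=20, minute=0, second=0, microsecond=0)
    raise ValueError(f"unrecognized phrase: {phrase}")

local_now = datetime(2014, 6, 5, 17, 30)                   # device-reported local time
print(interpret_relative_time("in an hour", local_now))    # 2014-06-05 18:30:00
print(interpret_relative_time("tonight", local_now))       # 2014-06-05 20:00:00
```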
  • Compass, accelerometer, gyroscope, and/or travel velocity data, as well as other sensor data from mobile or handheld devices or embedded systems such as automobile control systems. This may also include device positioning data from remote controls to appliances and game consoles.
  • the input to the embodiments described herein also includes the context of the user interaction history, including dialog and request history.
  • many different types of output data/information may be generated by multimodal virtual assistant 1002. These may include, but are not limited to, one or more of the following (or combinations thereof):
  • Speech output which may include one or more of the following (or combinations thereof):
  • hyperlinks (for instance, the content rendered in a web browser);
  • Actuator output to control physical actions on a device such as causing it to turn on or off, make a sound, change color, vibrate, control a light, or the like;
  • Actuator output to control physical actions to devices attached or controlled by a device, such as operating a remote camera, controlling a wheelchair, playing music on remote speakers, playing videos on remote displays, and the like.
  • multimodal virtual assistant 1002 of Fig. 8 is but one example from a wide range of virtual assistant system embodiments which may be implemented.
  • Other embodiments of the virtual assistant system may include additional, fewer and/or different components/features than those illustrated, for example, in the example virtual assistant system embodiment of Fig. 8.
  • Multimodal virtual assistant 1002 may include a plurality of different types of components, devices, modules, processes, systems, and the like, which, for example, may be implemented and/or instantiated via the use of hardware and/or combinations of hardware and software.
  • assistant 1002 may include one or more of the following types of systems, components, devices, processes, and the like (or combinations thereof):
  • Active input elicitation component(s) 2794 (which may include client part 2794a and server part 2794b);
  • Short term personal memory component(s) 2752 (which may include master version 2752b and cache 2752a);
  • Long-term personal memory component(s) 2754 (which may include master version 2754b and cache 2754a);
  • Vocabulary component(s) 2758 (which may include complete vocabulary 2758b and subset 2758a);
  • Language pattern recognizer(s) component(s) 2760 (which may include full library 2760b and subset 2760a);
  • Task flow models component(s) 2786;
  • In client/server-based embodiments, some or all of these components may be distributed between client 1304 and server 1340. Such components are further described in the related U.S. utility applications referenced above.
  • virtual assistant 1002 receives user input 2704 via any suitable input modality, including for example touchscreen input, keyboard input, spoken input, and/or any combination thereof.
  • assistant 1002 also receives context information 1000, which may include event context, application context, personal acoustic context, and/or other forms of context, as described in the related U.S. Utility Applications.
  • Context information 1000 also includes a hands-free context, if applicable, which can be used to adapt the user interface according to techniques described herein.
  • virtual assistant 1002 Upon processing user input 2704 and context information 1000 according to the techniques described herein, virtual assistant 1002 generates output 2708 for presentation to the user.
  • Output 2708 can be generated according to any suitable output modality, which may be informed by the hands-free context as well as other factors, if appropriate. Examples of output modalities include visual output as presented on a screen, auditory output (which may include spoken output and/or beeps and other sounds), haptic output (such as vibration), and/or any combination thereof.
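  • The choice of output modality can be pictured as a simple policy keyed on the detected context. The Python sketch below uses invented field names; it is not the patent's actual output mechanism:

```python
from dataclasses import dataclass

@dataclass
class OutputPlan:
    spoken: str          # text to be synthesized as speech
    displayed: str       # text (possibly abbreviated) echoed on screen, if any
    play_tone: bool      # short audio cue that output is available

def plan_output(message: str, hands_free: bool) -> OutputPlan:
    if hands_free:
        # Favor auditory output; keep any visual echo glanceable rather than essential.
        return OutputPlan(spoken=message, displayed=message[:60], play_tone=True)
    # Hands-on: full visual presentation, speech optional.
    return OutputPlan(spoken="", displayed=message, play_tone=False)

print(plan_output("You have a new message from Tom Devon.", hands_free=True))
```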
  • Referring now to Fig. 1, there is shown a screen shot illustrating an example of a conventional hands-on interface 169 for reading a text message, according to the prior art.
  • a graphical user interface (GUI) as shown in Fig. 1 generally requires the user to be able to read fine details, such as the message text shown in bubble 171, and respond by typing in text field 172 and tapping send button 173.
  • Such actions require looking at and touching the screen, and are therefore impractical to perform in certain contexts, referred to herein as hands-free contexts.
  • Referring now to Fig. 2, there is shown a screen shot illustrating an example of an interface 170 for responding to text message 171.
  • Virtual keyboard 270 is presented in response to the user tapping in text field 172, permitting text to be entered in text field 172 by tapping on areas of the screen corresponding to keys.
  • the user taps on send button 173 when the text message has been entered. If the user wishes to enter text by speaking, he or she taps on speech button 271, which invokes a voice dictation interface for receiving spoken input and converting it into text.
  • button 271 provides a mechanism by which the user can indicate that he or she is in a hands-free context.
  • Referring now to Figs. 3A and 3B, there is shown a sequence of screen shots illustrating an example of an interface 175 wherein a voice dictation interface is used to reply to text message 171.
  • Screen 370 is presented, for example, after user taps on speech button 271.
  • Microphone icon 372 indicates that the device is ready to accept spoken input.
  • the user inputs speech, which is received via speech input device 1211, which may be a microphone or similar device.
  • the user taps on Done button 371 to indicate that he or she has finished entering spoken input.
  • Speech-to-text functionality can reside on device 60 or on a server.
  • speech-to-text functionality is implemented using, for example, Nuance Recognizer, available from Nuance Communications, Inc. of Burlington, Massachusetts.
  • Keyboard 270 can be presented, to allow the user to edit the generated text in field 172.
  • Send button 173 When the user is satisfied with the entered text, he or she taps on Send button 173 to cause the text message to be sent.
  • mechanisms for accepting and processing speech input are integrated into device 60 in a manner that reduces the need for a user to interact with a display screen and/or to use a touch interface when in a hands-free context. Accordingly, the system of the present invention is thus able to provide an improved user interface for interaction in a hands-free context.
  • Referring now to Figs. 4 and 5A through 5D, there is shown a series of screen shots illustrating an example of an interface for receiving and replying to a text message, according to one embodiment wherein a hands-free context is recognized; thus, in this example, the need for the user to interact with the screen is reduced, in accordance with the techniques of the present invention.
  • screen 470 depicts text message 471, which is received while device 60 is in a hands-free context.
  • multimodal virtual assistant 1002 provides functionality for receiving and replying to text message 471 in such a hands-free context.
  • virtual assistant 1002 installed on device 60 automatically detects the hands-free context. Such detection may take place by any means of determining a scenario or situation where it may be difficult or impossible for the user to interact with the screen of device 60 or to properly operate the GUI.
  • determination of hands-free context can be made based on any of the following, singly or in any combination:
  • sensors (including, for example, compass, accelerometer, gyroscope, speedometer (e.g., whether device 60 is travelling at or above a predetermined speed), ambient light sensor, Bluetooth connection detector, clock, WiFi signal detector, microphone, and the like);
  • determining that device 60 is in a certain geographic location, for example via GPS (for example, determining that device 60 is travelling on or near a road);
  • signal information (e.g., cell tower triangulation);
  • predefined parameters (for example, the user or an administrator can specify that hands-free context is active when any condition or combination of conditions is detected);
  • peripherals including headphones, headsets, charging cables or docking stations (including vehicle docking stations), things connected by adapter cables, and the like;
  • the particular signal used to trigger interaction with assistant 1002 (for example, a motion gesture in which the user holds the device to the ear, or the pressing of a button on a Bluetooth device, or pressing of a button on an attached audio device);
  • assistant 1002 can be configured to be listening for commands, and to be invoked when the user calls its name or says some command such as "Computer!"; the particular command can indicate whether or not hands-free context is active.
  • hands-free context can be automatically determined based (at least in part) on determining that the user is in a moving vehicle or driving a car. In some embodiments, such determination is made without user input and without regard to whether a digital assistant has been separately invoked by a user.
  • a device through which a user interacts with assistant 1002 may contain multiple applications that are configured to execute within an operating system on the device. The determination that the device is in a vehicle, therefore, can be made without regard to whether a user has selected or activated a digital assistant application for immediate execution on the device. In some embodiments, the determination is made while a digital assistant application is not being executed in the foreground of an operating system, or is not displaying a graphical user interface on the device.
  • determining that the electronic device is in the vehicle is performed without regard to whether the digital assistant application was recently invoked by a user.
  • automatically determining a hands-free context can be based (at least in part) on detecting that the electronic device is moving at or above a first predetermined speed. For example, if the device is moving above about 20 miles per hour, indicating that the user is not merely walking, hands-free context can be invoked, including invoking a listening mode as described below. In some embodiments, automatically determining a hands-free context can be further based on detecting that the electronic device is moving at or below a second predetermined speed. This is useful, for example, to prevent the device from mistakenly detecting hands-free context when a user is in a plane. In some embodiments, hands-free context can be detected if the electronic device is moving less than about 150 miles per hour, indicating that the user is likely not flying in an airplane.
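  • A minimal Python sketch of such a detector, combining the speed band just described with two of the other signals listed above. The input fields and the exact combination logic are assumptions made for illustration; only the 20 and 150 mile-per-hour figures come from this description:

```python
from dataclasses import dataclass
from typing import Optional

LOWER_MPH = 20.0    # above this, the user is probably not merely walking
UPPER_MPH = 150.0   # above this, the user is more likely flying than driving

@dataclass
class DeviceState:
    speed_mph: float
    docked_in_vehicle: bool = False
    bluetooth_car_audio: bool = False
    user_override: Optional[bool] = None   # manual on/off indication, if any

def hands_free_context(state: DeviceState) -> bool:
    # A manual indication from the user always wins over automatic detection.
    if state.user_override is not None:
        return state.user_override
    # A vehicle dock or car audio connection is treated as a direct indication.
    if state.docked_in_vehicle or state.bluetooth_car_audio:
        return True
    # Otherwise fall back to the speed band: fast enough to be driving,
    # slow enough that the user is unlikely to be on an airplane.
    return LOWER_MPH <= state.speed_mph <= UPPER_MPH

print(hands_free_context(DeviceState(speed_mph=45.0)))    # True  (likely driving)
print(hands_free_context(DeviceState(speed_mph=500.0)))   # False (likely flying)
print(hands_free_context(DeviceState(speed_mph=3.0)))     # False (likely walking)
```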
  • the user can manually indicate that hands-free context is active or inactive, and/or can schedule hands-free context to activate and/or deactivate at certain times of day and/or certain days of the week.
  • upon receiving text message 471 while in a hands-free context, multimodal virtual assistant 1002 causes device 60 to output an audio indication, such as a beep or tone, indicating receipt of a text message.
  • the user can activate slider 472 to reply to or otherwise interact with message 471 according to known techniques (for example if hands-free mode was incorrectly detected, or if the user elects to stop driving or otherwise make him or herself available for hands-on interaction with device 60).
  • the user can engage in a spoken dialog with assistant 1002 to enable interaction with assistant 1002 in a hands-free manner.
  • the user initiates the spoken dialog by any suitable mechanism appropriate to a hands-free context.
  • an easily-accessed button (for example, one mounted on the steering wheel of a car)
  • Pressing the button initiates a spoken dialog with assistant 1002, and allows the user to communicate with assistant 1002 via the BlueTooth connection and through a microphone and/or speaker installed in the vehicle.
  • the user can initiate the spoken dialog by pressing a button on device 60 itself, or on a headset, or on any other peripheral device, or by performing some other distinctive action that signals to assistant 1002 that the user wishes to initiate a spoken dialog.
  • the user can speak a command that is understood by assistant 1002 and that initiates the spoken dialog, as described in greater detail below.
  • the mechanism that is used for initiating the spoken dialog does not require hand-eye coordination on the part of the user, thus allowing the user to focus on a primary task, such as driving, and/or can be performed by an individual having a disability that prevents, hinders, restricts, or limits his or her ability to interact with a GUI such as depicted in Figs. 2, 3 A, and 3B.
  • assistant 1002 listens for spoken input.
  • assistant 1002 acknowledges the spoken input by some output mechanism that is easily detected by the user while in the hands-free context.
  • An example is an audio beep or tone, and/or visual output on a vehicle dashboard that is easily seen by the user even while driving, and/or by some other mechanism.
  • Spoken input is processed using known speech recognition techniques.
  • Assistant 1002 then performs action(s) indicated by the spoken input.
  • assistant 1002 provides spoken output, which may be output via speakers (in device 60 or installed in the vehicle), headphones or the like, so as to continue the audio dialog with the user.
  • assistant 1002 can read content of text messages, email messages, and the like, and can provide options to the user in spoken form.
  • assistant 1002 may cause device 60 to emit an acknowledgement tone. Assistant 1002 may then emit spoken output such as "You have a new message from Tom Devon. It says: 'Hey, are you going to the game?'". Spoken output may be generated by assistant 1002 using any known technique for converting text to speech.
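  • A short Python sketch of how such an announcement might be assembled before being handed to a speech synthesizer; the tone and speech functions are stand-ins, not any particular text-to-speech interface:

```python
def announce_incoming_message(sender: str, body: str, play_tone, speak) -> None:
    """Acknowledge receipt with a tone, then read the message aloud."""
    play_tone()                                   # audible cue that a message arrived
    announcement = f"You have a new message from {sender}. It says: '{body}'"
    speak(announcement)                           # handed to the text-to-speech engine

# Usage with stand-in output functions.
announce_incoming_message(
    "Tom Devon",
    "Hey, are you going to the game?",
    play_tone=lambda: print("[tone]"),
    speak=lambda text: print(f"[spoken] {text}"),
)
```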
  • text-to-speech functionality is implemented using, for example, Nuance Vocalizer, available from Nuance Communications, Inc. of Burlington, Massachusetts.
  • Referring now to Fig. 5A, there is shown an example of a screen shot 570 showing output that may be presented on the screen of device 60 while the verbal interchange between the user and assistant 1002 is taking place.
  • the user can see the screen but cannot easily touch it, for example if the output on the screen of device 60 is being replicated on a display screen of a vehicle's navigation system.
  • Visual echoing of the spoken conversation can help the user to verify that his or her spoken input has been properly and accurately understood by assistant 1002, and can further help the user understand assistant's 1002 spoken replies.
  • visual echoing is optional, and the present invention can be implemented without any visual display on the screen of device 60 or elsewhere.
  • the user can interact with assistant 1002 purely by spoken input and output, or by a combination of visual and spoken inputs and/or outputs.
  • assistant 1002 displays and speaks a prompt 571.
  • assistant 1002 repeats the user input 572, on the display and/or in spoken form.
  • Assistant then introduces 573 the incoming text message and reads it.
  • the text message may also be displayed on the screen.
  • the system of the present invention informs the user of available actions in a manner that is well-suited to the hands-free context, in that it does not require the user to look at text fields, buttons, and/or links, and does not require direct manipulation by touch or interaction with on-screen objects.
  • the spoken output is echoed 574 on-screen;
  • echo messages displayed on the screen scroll upwards automatically according to well known mechanisms.
  • the user's spoken input is echoed 575 so that the user can check that it has been properly understood.
  • assistant 1002 repeats the user's spoken input in auditory form, so that the user can verify understanding of his or her command even if he or she cannot see the screen.
  • the system of the present invention provides a mechanism by which the user can initiate a reply command, compose a response, and verify that the command and the composed response were properly understood, all in a hands-free context and without requiring the user to view a screen or interact with device 60 in a manner that is not feasible or well-suited to the current operating environment.
  • assistant 1002 provides further verification of the user's composed text message by reading back the message.
  • assistant 1002 says, verbally, "Here's your reply to Tom Devon: 'Yes I'll be there at six.'”.
  • the meaning of the quotation marks is conveyed with changes in voice and/or prosody.
  • the string "Here's your reply to Tom Devon” can be spoken in one voice, such as a male voice, while the string "Yes I'll be there at six" can be spoken in another voice, such as a female voice.
  • the same voice can be used, but with different prosody to convey the quotation marks.
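  • One way to realize either approach is to split the read-back into segments, each carrying the voice or prosody settings to hand to the synthesizer; the segment structure and parameter names in this Python sketch are illustrative assumptions:

```python
from typing import List, Tuple

def segment_reply_readback(recipient: str, reply_text: str) -> List[Tuple[str, dict]]:
    """Split the confirmation into segments with distinct synthesis settings,
    so the quoted reply is audibly distinguishable from the framing sentence."""
    framing_voice = {"voice": "voice_a", "pitch": 1.0}
    quoted_voice = {"voice": "voice_b", "pitch": 1.15}   # or same voice, altered prosody
    return [
        (f"Here's your reply to {recipient}: ", framing_voice),
        (reply_text, quoted_voice),
    ]

for text, settings in segment_reply_readback("Tom Devon", "Yes I'll be there at six."):
    print(settings["voice"], "->", text)
```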
  • assistant 1002 provides visual echoing of the spoken interchange, as depicted in Figs. 5B and 5C.
  • Figs. 5B and 5C show message 576 echoing assistant's 1002 spoken output of "Here's your reply to Tom Devon".
  • Fig. 5C shows a summary 577 of the text message being composed, including recipient and content of the message.
  • Previous messages have scrolled upward off the screen, but can be viewed by scrolling downwards according to known mechanisms.
  • Send button 578 sends the message; cancel button 579 cancels it.
  • the user can also send or cancel the message by speaking a keyword, such as "send" or "cancel".
  • assistant 1002 can generate a spoken prompt, such as "Ready to send it?"; again, a display 570 with buttons 578, 579 can be shown while the spoken prompt is output. The user can then indicate what he or she wishes to do by touching buttons 578, 579 or by answering the spoken prompt.
  • the prompt can be issued in a format that permits a "yes” or “no” response, so that the user does not need to use any special vocabulary to make his or her intention known.
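  • The equivalence between tapping a button and speaking a keyword can be captured by routing both kinds of events to a single interpretation step, as in this Python sketch (the event names are invented for illustration):

```python
def interpret_confirmation(event_type: str, value: str) -> str:
    """Map a button tap or a spoken keyword onto the same send/cancel decision."""
    value = value.lower().strip()
    if event_type == "tap":
        return {"send_button": "send", "cancel_button": "cancel"}.get(value, "unknown")
    if event_type == "speech":
        if value in ("send", "yes"):
            return "send"
        if value in ("cancel", "no"):
            return "cancel"
    return "unknown"

# Any of these is interpreted the same way, whether spoken or tapped.
print(interpret_confirmation("speech", "yes"))          # send
print(interpret_confirmation("speech", "send"))         # send
print(interpret_confirmation("tap", "send_button"))     # send
print(interpret_confirmation("speech", "cancel"))       # cancel
```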
  • assistant 1002 can confirm the user's spoken command to send the message, for example by generating spoken output such as "OK, I'll send your message.” As shown in Fig. 5D, this spoken output can be echoed 580 on screen 570, along with summary 581 of the text message being sent.
  • assistant 1002 provides redundant outputs in a multimodal interface.
  • assistant 1002 is able to support a range of contexts including eyes-free, hands-free, and fully hands-on.
  • the example also illustrates mechanisms by which the displayed and spoken output can differ from one another to reflect their different contexts.
  • the example also illustrates ways in which alternative mechanisms for responding are made available. For example, after assistant says “Ready to send it?" and displays screen 570 shown in Fig. 5C, the user can say the word “send", or “yes”, or tap on Send button 578 on the screen. Any of these actions would be interpreted the same way by assistant 1002, and would cause the text message to be sent.
  • the system of the present invention provides a high degree of flexibility with respect to the user's interaction with assistant 1002.
  • Referring now to Figs. 6A through 6C, there is shown a series of screen shots illustrating an example of operation of multimodal virtual assistant 1002 according to an embodiment of the present invention, wherein the user revises text message 577 in a hands-free context, for example to correct mistakes or add more content.
• in a visual interface involving direct manipulation, such as described above in connection with Figs. 3A and 3B, the user might type on virtual keyboard 270 to edit the contents of text field 172 and thereby revise text message 577. Since such operations may not be feasible in a hands-free context, multimodal virtual assistant 1002 provides a mechanism by which such editing of text message 577 can take place via spoken input and output in a conversational interface
• once text message 577 has been composed (based, for example, on the user's spoken input), multimodal virtual assistant 1002 generates verbal output informing the user that the message is ready to be sent, and asking the user whether the message should be sent. If the user indicates, via verbal or direct manipulation input, that he or she is not ready to send the message, then multimodal virtual assistant 1002 generates spoken output to inform the user of available options, such as sending, canceling, reviewing, or changing the message. For example, assistant 1002 may say "OK, I won't send it yet. To continue, you can Send, Cancel, Review, or Change it."
• as shown in Fig. 6A, multimodal virtual assistant 1002 echoes the spoken output by displaying message 770, visually informing the user of the options available with respect to text message 577.
  • text message 577 is displayed in editable field 773, to indicate that the user can edit message 577 by tapping within field 773, along with buttons 578, 579 for sending or canceling text message 577, respectively.
  • tapping within editable field 773 invokes a virtual keyboard (similar to that depicted in Fig. 3B), to allow editing by direct manipulation.
  • the user can also interact with assistant 1002 by providing spoken input.
  • Assistant 1002 recognizes the spoken text and responds with a verbal message prompting the user to speak the revised message. For example, assistant 1002 may say, "OK... What would you like the message to say?" and then starts listening for the user's response.
  • Fig. 6B depicts an example of a screen 570 that might be shown in connection with such a spoken prompt. Again, the user's spoken text is visually echoed 771, along with assistant's 1002 prompt 772.
  • assistant 1002 then repeats back the input text message in spoken form, and may optionally echo it as shown in Fig. 6C.
  • Assistant 1002 offers a spoken prompt, such as "Are you ready to send it?", which may also be echoed 770 on the screen as shown in Fig. 6C.
  • the user can then reply by saying “cancel”, “send”, “yes”, or “no”, any of which are correctly interpreted by assistant 1002.
  • the user can press a button 578 or 579 on the screen to invoke the desired operation.
  • the system of the present invention provides a flow path appropriate to a hands-free context, which is integrated with a hands-on approach so that the user can freely choose the mode of interaction at each stage.
  • assistant 1002 adapts its natural language processing mechanism to particular steps in the overall flow; for example, as described above, in some situations assistant 1002 may enter a mode where it bypasses normal natural language interpretation of user commands when the user has been prompted to speak a text message.
  • multimodal virtual assistant 1002 detects a hands-free context and adapts one or more stages of its operation to modify the user experience for hands-free operation. As described above, detection of the hands-free context can be applied in a variety of ways to affect the operation of multimodal virtual assistant 1002.
  • Fig. 7A is a flow diagram depicting a method 800 of adapting a user interface, according to some embodiments.
  • the method 800 is performed at an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors (e.g., device 60).
  • the method 800 includes automatically, without user input and without regard to whether a digital assistant application has been separately invoked by a user, determining (802) that the electronic device is in a vehicle.
  • automatically determining that the electronic device is in the vehicle is performed without regard to whether the digital assistant application was recently invoked by a user (e.g., within about the previous 1 minute, 2 minutes, 5 minutes).
  • determining that the electronic device is in a vehicle comprises detecting (806) that the electronic device is in communication with the vehicle.
  • the communication is wireless communication.
  • the communication is BLUETOOTH communication.
• detecting that the electronic device is in communication with the vehicle comprises detecting that the electronic device is in communication with a voice control system of the vehicle (e.g., via wireless communication such as BLUETOOTH).
• determining that the electronic device is in a vehicle comprises detecting (808) that the electronic device is moving at or above a first predetermined speed.
  • determining that the electronic device is in a vehicle further comprises detecting (810) that the electronic device is moving at or below a second predetermined speed. In some embodiments, the second predetermined speed is about 150 miles per hour. In some embodiments, the speed of the electronic device is determined using one or more of the group consisting of: GPS location information; accelerometer data; wireless data signal information; and speedometer information.
  • determining that the electronic device is in a vehicle further comprises detecting (812) that the electronic device is travelling on or near a road.
  • the location of the vehicle may be determined by GPS location information, cellular tower triangulation, and/or other location detecting techniques and technologies.
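The in-vehicle determination described in steps 806-812 combines several weak signals: communication with the vehicle, a speed band, and proximity to a road. A minimal Python sketch of one way these signals might be combined is shown below; the DeviceState fields, the 20 mph lower bound, and the 50 m road-proximity threshold are illustrative assumptions, not values taken from this disclosure.

```python
# Sketch of combining the in-vehicle signals described above (steps 806-812).
# Field names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DeviceState:
    bluetooth_connected_to_vehicle: bool
    speed_mph: float           # e.g., fused from GPS, accelerometer, speedometer data
    distance_to_road_m: float  # e.g., from map-matching the GPS fix

FIRST_SPEED_MPH = 20.0    # assumed "first predetermined speed"
SECOND_SPEED_MPH = 150.0  # "second predetermined speed" per the text
NEAR_ROAD_M = 50.0        # assumed threshold for "on or near a road"

def is_in_vehicle(state: DeviceState) -> bool:
    """Return True if the device appears to be in a vehicle."""
    if state.bluetooth_connected_to_vehicle:
        return True
    moving_like_a_car = FIRST_SPEED_MPH <= state.speed_mph <= SECOND_SPEED_MPH
    return moving_like_a_car and state.distance_to_road_m <= NEAR_ROAD_M

print(is_in_vehicle(DeviceState(False, 35.0, 10.0)))  # True: highway-like speed near a road
```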
  • the method 800 further includes, responsive to the determining, invoking (814) a listening mode of a virtual assistant implemented by the electronic device.
  • Example embodiments of listening modes are described herein.
  • the listening mode causes the electronic device to continuously listen (816) for voice input from a user.
  • the listening mode causes the electronic device to continuously listen for voice input from the user responsive to detecting that the electronic device is connected to a charging source.
  • the listening mode causes the electronic device to listen for voice input from a user for a predetermined time after initiation of the listening mode (e.g., for about 5 minutes after initiation of the listening mode).
  • the listening mode causes the electronic device to automatically, without a physical input from a user, listen (818) for a voice input from the user after the electronic device provides an auditory output (such as a "beep").
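The listening-mode behaviors just listed (continuous listening while charging, a timed window after initiation, and listening after an auditory cue) can be summarized as a small policy function. The sketch below is illustrative; the five-minute window and the signal names are assumptions.

```python
# Sketch of a listening-mode policy reflecting the behaviors described above.
# Signal names and the five-minute window are illustrative assumptions.
import time

LISTEN_WINDOW_S = 5 * 60  # "about 5 minutes after initiation of the listening mode"

def should_listen(in_vehicle: bool, on_charger: bool,
                  listening_mode_started_at: float | None,
                  now: float | None = None) -> bool:
    """Decide whether the device should currently be listening for voice input."""
    now = time.time() if now is None else now
    if not in_vehicle:
        return False
    if on_charger:
        return True  # continuous listening while connected to a charging source
    if listening_mode_started_at is not None:
        # otherwise listen only for a predetermined window after the mode was initiated
        return (now - listening_mode_started_at) <= LISTEN_WINDOW_S
    return False
```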
  • the method 800 also comprises limiting functionality of the device (e.g., device 60) and/or the digital assistant (e.g., assistant 1002) when it is determined that the electronic device is in a vehicle.
  • the method includes, responsive to determining that the electronic device is in the vehicle, taking any of the following actions (alone or in combination): limiting the ability to view visual output presented by the electronic device; limiting the ability to interact with a graphical user interface presented by the electronic device; limiting the ability to use a physical component of the electronic device; limiting the ability to perform touch input on the electronic device; limiting the ability to use a keyboard on the electronic device; limiting the ability to execute one or more applications on the electronic device; limiting the ability to perform one or more functions enabled by the electronic device; limiting the device so as to not request touch input from the user; limiting the device so as to not respond to touch input from the user; and limiting the amount of items in the list to a predetermined amount.
  • the method 800 further comprises, while the device is in the listening mode, detecting (822) a wake-up word spoken by the user.
  • the wake-up word may be any word that a digital assistant (e.g., assistant 1002) is configured to recognize as a trigger signaling the assistant to begin listening for voice input from a user.
  • the method further comprises, in response to detecting the wake-up word, listening (824) for voice input from the user, receiving (826) a voice input from the user, and generating (828) a response to the voice input.
  • the method 800 further comprises, receiving (830) a voice input from the user; generating (832) a response to the voice input, the response including a list of information items to be presented to the user; and outputting (834) the information items via an auditory output mode, wherein if the electronic device were not in a vehicle, the information items would only be presented on a display screen of the electronic device.
  • information items that are returned in response to a web search are displayed visually on a device. In some cases, they are only displayed visually (e.g., without any audio). In contrast, this aspect of method 800 instead provides only auditory output for the information items, without any visual output.
  • the method 800 further comprises receiving (836) a voice input from the user, wherein the voice input corresponds to content to be sent to a recipient.
  • the content is to be sent to a recipient via text message, email message, etc.
  • the method further comprises producing (838) text corresponding to the voice input, and outputting (840) the text via an auditory output mode, wherein if the electronic device were not in a vehicle, the text would only be presented on a display screen of the electronic device. For example, in some cases, message content that is transcribed from a voice input is displayed visually on a device. In some cases, it is only displayed visually (e.g., without any audio).
  • this aspect of method 800 instead provides only auditory output for the transcribed text, without any visual output.
  • the method further comprises requesting (842) confirmation prior to sending the text to the recipient.
  • requesting confirmation comprises asking the user, via the auditory output mode, whether the text should be sent to the recipient.
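Steps 836-842 describe a hands-free message flow in which transcribed content is spoken back rather than displayed, and confirmation is requested before sending. A minimal sketch under assumed platform hooks (transcribe, speak, listen_for_reply, send_message, none of which are named in this disclosure) follows.

```python
# Sketch of the auditory-only compose-and-confirm flow described above.
# transcribe(), speak(), listen_for_reply(), and send_message() are assumed hooks.

def compose_and_send_hands_free(audio, recipient,
                                transcribe, speak, listen_for_reply, send_message):
    text = transcribe(audio)                              # produce text from the voice input
    speak(f"Your message to {recipient} says: {text}")    # auditory output instead of display
    speak("Ready to send it?")                            # request confirmation before sending
    reply = listen_for_reply().strip().lower()
    if reply in ("yes", "send"):
        send_message(recipient, text)
        speak("OK, I'll send your message.")
    else:
        speak("OK, I won't send it yet.")
```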
  • Fig. 7D is a flow diagram depicting a method 850 of adapting a user interface, according to some embodiments.
  • the method 850 is performed at an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors.
• the method 850 comprises automatically, without user input, determining that the electronic device is in a vehicle.
  • determining that the electronic device is in a vehicle comprises detecting (854) that the electronic device is in communication with the vehicle.
  • the communication is wireless communication.
  • the communication is BLUETOOTH communication.
• detecting that the electronic device is in communication with the vehicle comprises detecting that the electronic device is in communication with a voice control system of the vehicle (e.g., via wireless communication such as BLUETOOTH).
• determining that the electronic device is in a vehicle comprises detecting (856) that the electronic device is moving at or above a first predetermined speed.
  • determining that the electronic device is in a vehicle further comprises detecting (858) that the electronic device is moving at or below a second predetermined speed. In some embodiments, the second predetermined speed is about 150 miles per hour. In some embodiments, the speed of the electronic device is determined using one or more of the group consisting of: GPS location information; accelerometer data; wireless data signal information; and speedometer information.
  • determining that the electronic device is in a vehicle further comprises detecting (860) that the electronic device is travelling on or near a road.
  • the location of the vehicle may be determined by GPS location information, cellular tower triangulation, and/or other location detecting techniques and technologies.
  • the method 850 further comprises, responsive to the determining, limiting certain functions of the electronic device, as described above.
  • limiting certain functions of the device comprises deactivating (864) a visual output mode in favor of an auditory output mode.
  • deactivating the visual output mode includes preventing (866) the display of a subset of visual outputs that the electronic device is capable of displaying.
• Referring now to Fig. 7E, there is shown a flow diagram depicting method 10 of operation of virtual assistant 1002 that supports dynamic detection of and adaptation to a hands-free context, according to one embodiment.
  • Method 10 may be implemented in connection with one or more embodiments of multimodal virtual assistant 1002.
  • the hands-free context can be used at various stages of processing in multimodal virtual assistant 1002, according to one embodiment.
  • method 10 may be operable to perform and/or implement various types of functions, operations, actions, and/or other features such as, for example, one or more of the following (or combinations thereof):
  • a conversational interface is an interface in which the user and assistant 1002 communicate by making utterances back and forth in a conversational manner.
  • method 10 may be concurrently implemented and/or initiated via the use of one or more processors 63 and/or other combinations of hardware and/or hardware and software. In at least one embodiment, one or more or selected portions of method 10 may be implemented at one or more client(s) 1304, at one or more server(s) 1340, and/or combinations thereof.
  • various aspects, features, and/or functionalities of method 10 may be performed, implemented and/or initiated by software components, network services, databases, and/or the like, or any combination thereof.
  • one or more different threads or instances of method 10 may be initiated in response to detection of one or more conditions or events satisfying one or more different types of criteria (such as, for example, minimum threshold criteria) for triggering initiation of at least one instance of method 10.
  • Examples of various types of conditions or events which may trigger initiation and/or implementation of one or more different threads or instances of the method may include, but are not limited to, one or more of the following (or combinations thereof):
• a session being started with an embodiment of multimodal virtual assistant 1002, such as, for example, but not limited to, one or more of:
  • a mobile device application starting up, for instance, a mobile device application that is implementing an embodiment of multimodal virtual assistant 1002;
• a dedicated button on a mobile device being pressed, such as a "speech input button";
  • a button on a peripheral device attached to a computer or mobile device such as a headset, telephone handset or base station, a GPS navigation system, consumer appliance, remote control, or any other device with a button that might be associated with invoking assistance;
• an interaction started from within an existing web browser session to a website implementing multimodal virtual assistant 1002, in which, for example, multimodal virtual assistant 1002 service is requested;
• a phone call is made to a VOIP modality server 1434 that is mediating communication with an embodiment of multimodal virtual assistant 1002;
• an event such as an alert or notification is sent to an application that is providing an embodiment of multimodal virtual assistant 1002.
  • one or more different threads or instances of method 10 may be initiated and/or implemented manually, automatically, statically, dynamically, concurrently, and/or combinations thereof. Additionally, different instances and/or embodiments of method 10 may be initiated at one or more different time intervals (e.g., during a specific time interval, at regular periodic intervals, at irregular periodic intervals, upon demand, and the like).
  • a given instance of method 10 may utilize and/or generate various different types of data and/or other types of information when performing specific tasks and/or operations, including detection of a hands-free context as described herein.
  • Data may also include any other type of input data/information and/or output data/information.
  • at least one instance of method 10 may access, process, and/or otherwise utilize information from one or more different types of sources, such as, for example, one or more databases.
  • at least a portion of the database information may be accessed via communication with one or more local and/or remote memory devices.
  • at least one instance of method 10 may generate one or more different types of output data/information, which, for example, may be stored in local memory and/or remote memory devices.
• initial configuration of a given instance of method 10 may be performed using one or more different types of initialization parameters.
  • at least a portion of the initialization parameters may be accessed via communication with one or more local and/or remote memory devices.
  • at least a portion of the initialization parameters provided to an instance of method 10 may correspond to and/or may be derived from the input data/information.
  • assistant 1002 is installed on device 60 such as a mobile computing device, personal digital assistant, mobile phone, smartphone, laptop, tablet computer, consumer electronic device, music player, or the like.
  • Assistant 1002 operates in connection with a user interface that allows users to interact with assistant 1002 via spoken input and output as well as direct manipulation and/or display of a graphical user interface (for example via a touchscreen).
  • Device 60 has a current state 11 that can be analyzed to detect 20 whether it is in a hands-free context.
  • a hands-free context can be detected 20, based on state 11, using any applicable detection mechanism or combination of mechanisms, whether automatic or manual. Examples are set forth above.
  • Speech input is elicited and interpreted 100. Elicitation may include presenting prompts in any suitable mode. Thus, depending on whether or not hands-free context is detected, in various embodiments, assistant 1002 may offer one or more of several modes of input. These may include, for example:
• For example, if a hands-free context is detected, speech input may be elicited by a tone or other audible prompt, and the user's speech may be interpreted as text.
  • One skilled in the art will recognize, however, that other input modes may be provided.
  • the output of step 100 may be a set of candidate interpretations of the text of the input speech.
  • This set of candidate interpretations is processed 200 by language interpreter 2770 (also referred to as a natural language processor, or NLP), which parses the text input and generates a set of possible semantic interpretations of the user's intent.
  • dialog flow processor 2780 implements an embodiment of a dialog and flow analysis procedure to operationalize the user's intent as task steps.
  • Dialog flow processor 2780 determines which interpretation of intent is most likely, maps this interpretation to instances of domain models and parameters of a task model, and determines the next flow step in a task flow. If appropriate, one or more task flow step(s) adapted to hands-free operation is/are selected 310. For example, as described above, the task flow step(s) for modifying a text message may be different when hands-free context is detected.
• in step 400, the identified flow step(s) is/are executed.
  • invocation of the flow step(s) is performed by services orchestration component 2782, which invokes a set of services on behalf of the user's request. In one embodiment, these services contribute some data to a common result.
  • dialog response generation 500 is influenced by the state of hands-free context.
  • different and/or additional dialog units may be selected 510 for presentation using the audio channel.
  • additional prompts such as "Ready to send it?" may be spoken verbally and not necessarily displayed on the screen.
  • the detection of hands-free context can influence the prompting for additional input 520, for example to verify input.
  • multimodal output (which, in one embodiment includes verbal and visual content) is presented to the user, who then can optionally respond again using speech input.
  • context information 1000 can be used by various components of the system to influence various steps of method 10.
• context 1000, including hands-free context, can be used at steps 100, 200, 300, 310, 500, 510, and/or 520.
  • context information 1000, including hands-free context is not limited to these specific steps, and that the system can use context information at other points as well, without departing from the essential characteristics of the present invention. Further description of the use of context 1000 in the various steps of operation of assistant 1002 is provided in related U.S. Utility Application Serial No.
  • method 10 may include additional features and/or operations than those illustrated in the specific embodiment depicted in Fig. 7, and/or may omit at least a portion of the features and/or operations of method 10 as illustrated in the specific embodiment of Fig. 7.
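The overall shape of method 10, with the hands-free flag threaded through the steps it can influence, can be sketched as a simple pipeline. The function names below are placeholders for the components discussed above (elicitation 100, NLP 200, task flow 300/310, execution 400, dialog generation 500/510/520); this is an illustration, not the claimed implementation.

```python
# High-level sketch of method 10 with hands-free context influencing each stage.
# All callables are assumed placeholders for the components described above.

def run_assistant_turn(device_state, elicit_speech, interpret, plan_task,
                       execute, generate_dialog, present):
    hands_free = detect_hands_free(device_state)   # step 20: analyze current state 11
    text = elicit_speech(hands_free)                # step 100: audible prompt if hands-free
    intents = interpret(text, hands_free)           # step 200: NLP with extra spoken commands
    flow_steps = plan_task(intents, hands_free)     # steps 300/310: hands-free-adapted flow steps
    result = execute(flow_steps)                    # step 400: services orchestration
    dialog = generate_dialog(result, hands_free)    # steps 500/510/520: audio-heavy dialog units
    present(dialog, hands_free)                     # multimodal output to the user

def detect_hands_free(device_state: dict) -> bool:
    # placeholder for the detection mechanisms discussed above
    return bool(device_state.get("in_vehicle") or device_state.get("headset_connected"))
```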
  • Elicitation and interpretation of speech input 100 can be adapted to a hands- free context in any of several ways, either singly or in any combination.
  • speech input may be elicited by a tone and/or other audible prompt, and the user's speech is interpreted as text.
  • multimodal virtual assistant 1002 may provide multiple possible mechanisms for audio input (such as, for example, Bluetooth-connected microphones or other attached peripherals), and multiple possible mechanisms for invoking assistant 1002 (such as, for example, pressing a button on a peripheral or using a motion gesture in proximity to device 60).
  • the information about how assistant 1002 was invoked and/or which mechanism is being used for audio input can be used to indicate whether or not hands-free context is active and can be used to alter the hands-free experience. More particularly, such information can be used to direct step 100 to use a particular audio path for input and output.
  • the manner in which audio input devices are used can be changed.
  • the interface can require that the user press a button or make a physical gesture to cause assistant 1002 to start listening for speech input.
  • the interface can continuously prompt for input after every instance of output by assistant 1002, or can allow continuous speech in both directions (allowing the user to interrupt assistant 1002 while assistant 1002 is still speaking).
• Natural Language Processing (NLP) 200 can be adapted to a hands-free context, for example, by adding support for certain spoken responses that are particularly well-suited to hands-free operation. Such responses can include, for example, "yes", "read the message", and "change it". In one embodiment, support for such responses can be provided in addition to support for spoken commands that are usable in a hands-on situation. Thus, for example, in one embodiment, a user may be able to operate a graphical user interface by speaking a command that appears on a screen (for example, when a button labeled "Send" appears on the screen, support may be provided for understanding the spoken word "send" and its semantic equivalents). In a hands-free context, additional commands can be recognized to account for the fact that the user may not be able to view the screen.
  • Detection of a hands-free context can also alter the interpretation of words by assistant 1002.
• assistant 1002 can be tuned to recognize the command "quiet!" and its semantic variants, and to turn off all audio output in response to such a command. In a non-hands-free context, such a command might be ignored as not relevant (see the sketch below).
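One way to realize this kind of context-dependent vocabulary is to keep a base command set and extend it only when a hands-free context is active. The sketch below is illustrative; the command sets and the fallback behavior are assumptions.

```python
# Sketch of extending the recognized spoken-command vocabulary in a hands-free context.
# Vocabularies and return values are illustrative assumptions.

BASE_COMMANDS = {"send", "cancel", "yes", "no"}
HANDS_FREE_EXTRAS = {"read the message", "change it", "quiet"}

def recognized_commands(hands_free: bool) -> set[str]:
    return BASE_COMMANDS | HANDS_FREE_EXTRAS if hands_free else BASE_COMMANDS

def handle_utterance(utterance: str, hands_free: bool) -> str:
    u = utterance.strip().lower().rstrip("!")
    if u == "quiet" and hands_free:
        return "mute_audio_output"   # only meaningful when audio is the primary channel
    if u in recognized_commands(hands_free):
        return u.replace(" ", "_")
    return "fallback_nlp"            # hand off to full natural-language interpretation
```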
• step 300, which includes identifying task(s) associated with the user's intent, parameter(s) for the task(s), and/or task flow steps 300 to execute, can be adapted for hands-free context in any of several ways, singly or in combination.
• one or more additional task flow step(s) adapted to hands-free operation is/are selected 310 for operation. Examples include steps to review and confirm content verbally.
  • assistant 1002 can read lists of results that would otherwise be presented on a display screen.
• when a hands-free context is detected, items that would normally be displayed only via a visual interface (e.g., in a hands-on mode) are instead output to a user only via an auditory output mode.
  • a user may provide a voice input requesting a web search, thus causing the assistant 1002 to generate a response including a list of information items to be presented to the user.
  • a list may be presented to the user via visual output only, without any auditory output.
  • the assistant 1002 can speak the list aloud, either in its entirety or in a truncated or summarized version, instead of displaying it on a visual interface.
• in some cases, information that is typically displayed only via a visual interface is not well adapted to auditory output modes.
  • a typical web search for restaurants will return results that include multiple pieces of information, such as a name, address, hours, phone number, user ratings, and the like. These items are well suited to being displayed in a list on a screen (such as a touchscreen on a mobile device). But this information may not all be necessary in a hands-free context, and it may be confusing or difficult to follow if it were to be converted directly to a spoken output. For example, speaking all of the displayed components of a list of restaurant results may be very confusing, especially for longer lists.
  • the assistant 1002 summarizes or truncates information items (such as items in a list) so that they can be more easily understood by a user.
  • the assistant 1002 may receive a list of restaurant results and read aloud only a subset of the information in each result, such as the restaurant name and street name, or restaurant name and rating information (e.g., 4 stars), etc., for each result.
  • Other ways of summarizing or truncating lists and/or information items within lists are also contemplated by the present disclosure.
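For instance, a result record with many display-oriented fields can be reduced to a short speakable phrase. The sketch below assumes simple dictionary-shaped results; the field names and summary styles are illustrative.

```python
# Sketch of truncating a search result to a speakable subset of its fields,
# e.g., name plus rating, or name plus street. Field names are assumptions.

def speakable_summary(result: dict, style: str = "rating") -> str:
    name = result["name"]
    if style == "rating" and "rating" in result:
        return f"{name}, {result['rating']} stars"
    return f"{name} on {result.get('street', 'an unknown street')}"

results = [
    {"name": "Caffe Macs", "street": "Infinite Loop", "rating": 4},
    {"name": "Main Street Diner", "street": "Main Street", "rating": 3},
]
for r in results:
    print(speakable_summary(r))  # e.g., "Caffe Macs, 4 stars"
```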
  • verbal commands can be provided for interacting with individual items in the list. For example, if several incoming text messages are to be presented to the user, and a hands-free context is detected, then identified task flow steps can include reading aloud each text message individually, and pausing after each message to allow the user to provide a spoken command. In some embodiments, if a list of search results (e.g., from a web search) is to be presented to a user, and a hands-free context is detected, then identified task flow steps can include reading aloud each search result individually (either the entire result or a truncated or summarized version), and pausing after each result to allow the user to provide a spoken command.
  • task flows can be modified for hands-free context.
  • the task flow for taking notes in a notes application might normally involve prompting for content and immediately adding it to a note. Such an operation might be appropriate in a hands-on environment in which content is immediately shown in the visual interface and immediately available for modification by direct manipulation.
  • the task flow can be modified, for example to verbally review the content and allow for modification of content before it is added to the note. This allows the user to catch speech dictation errors before they are stored in the permanent document.
  • hands-free context can also be used to limit the tasks or functionalities that are allowed at a given time.
• a policy can be implemented to disallow the playing of videos when the user's device is in a hands-free context, or in a specific hands-free context such as driving a vehicle.
  • device 60 limits the ability to view visual output presented by the electronic device. This may include limiting the device in any of the following ways (individually or in any combination):
• limiting the ability to view visual output presented by the electronic device (for example, deactivating a screen/visual output mode, preventing display of videos and/or images, displaying large text, limiting lengths of lists (e.g., search results), limiting the number of visual items displayed on a screen, etc.);
• limiting the ability to interact with a graphical user interface presented by the electronic device (for example, limiting the device so as to not request touch input from the user, limiting the device so as to not respond to touch input from the user, etc.).
  • assistant 1002 can make available entire domains of discourse and/or tasks that are only applicable in a hands-free context.
• Examples include accessibility modes such as those designed for people with limited eyesight or limited use of their hands. These accessibility modes include commands that are implemented as hands-free alternatives for operating an arbitrary GUI on a given application platform, for example commands such as "press the button" or "scroll up".
• Other tasks that may be applicable only in hands-free modes include tasks related to the hands-free experience itself, such as "use my car's Bluetooth kit" or "slow down [the Text to Speech Output]".
  • any of a number of techniques can be used for modifying dialog generation 500 to adapt to a hands-free context.
  • assistant's 1002 interpretation of the user's input can be echoed in writing; however such feedback may not be visible to the user when in a hands- free context.
  • assistant 1002 uses Text-to-Speech (TTS) technology to paraphrase the user's input.
  • Such paraphrasing can be selective; for example, prior to sending a text message, assistant 1002 can speak the text message so that a user can verify its contents even if he or she cannot see the display screen.
  • the assistant 1002 does not visually display transcribed text at all, but rather speaks the text back to the user. This may be beneficial where it may be unsafe for a user to read text from a screen, such as when the user is driving, and/or when a screen or visual output mode has been deactivated.
  • the determination as to when to paraphrase the user's speech, and which parts of the speech to paraphrase, can be driven by task- and/or flow-specific dialogs. For example, in response to a user's spoken command such as "read my new message", in one embodiment assistant 1002 does not paraphrase the command, since it is evident from assistant's 1002 response (reading the message) that the command was understood. However, in other situations, such as when the user's input is not recognized in step 100 or understood in step 200, assistant 1002 can attempt to paraphrase the user's spoken input so as to inform the user why the input was not understood. For example, assistant 1002 might say "I didn't understand 'reel my newt massage'. Please try again.”
  • the verbal paraphrase of information can combine dialog templates with personal data on a device.
  • assistant 1002 uses a spoken output template with variables of the form, "You have a new message from $person. It says $message.”
  • the variables in the template can be substituted with user data and then turned into speech by a process running on device 60.
  • such a technique can help protect the privacy of users while still allowing personalization of output, since the personal data can remain on device 60 and can be filled in upon receipt of an output template from the server.
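A minimal sketch of this split, using the template quoted above, is shown below. string.Template stands in for whatever substitution mechanism device 60 actually uses; that choice is an assumption.

```python
# Sketch of on-device substitution of personal data into a server-supplied
# spoken-output template, so the personal data never leaves the device.
from string import Template

def render_on_device(template_text: str, personal_data: dict) -> str:
    # personal_data (sender name, message body) stays local to device 60
    return Template(template_text).safe_substitute(personal_data)

server_template = "You have a new message from $person. It says $message."
spoken = render_on_device(server_template,
                          {"person": "Tom Devon", "message": "Are we still on for six?"})
print(spoken)  # the rendered string is then handed to the local TTS engine
```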
  • dialog units specifically tailored to hands-free contexts may be selected 510 for presentation using the audio channel.
  • the code or rules for determining which dialog units to select can be sensitive to the particulars of the hands-free context.
• a general dialog generation component can be adapted and extended to support various hands-free variations without necessarily building a separate user experience for different hands-free situations.
  • the same mechanism that generates text and GUI output units can be annotated with texts that are tailored for an audio (spoken word) output modality.
• a dialog generation component can be adapted for a hands-free context in this manner.
  • such annotations support a variable substitution template mechanism which segregates user data from dialog generation.
  • graphical user interface elements can be annotated with text that indicates how they should be verbally paraphrased over TTS.
  • TTS texts can be tuned so that the voice, speaking rate, pitch, pauses, and/or other parameters are used to convey verbally what would otherwise be conveyed in punctuation or visual rendering.
  • the voice that is used when repeating back the user's words can be a different voice, or can use different prosody, than that used for other dialog units.
  • the voice and/or prosody can differ depending on whether content or instructions are being spoken.
  • pauses can be inserted between sections of text with different meanings, to aid in understanding. For example, when paraphrasing a message and asking for confirmation, a pause might be inserted between the paraphrase of the content "Your message reads " and the prompt for confirmation "Ready to send it?"
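One way to express these voice, prosody, and pause annotations is with SSML-like markup attached to each dialog unit. The sketch below is illustrative only; the markup and voice names are assumptions rather than a specific TTS engine's API.

```python
# Sketch of annotating a dialog unit so quoted content is spoken with a different
# voice/prosody and separated from the prompt by a pause, as described above.

def annotate_for_tts(instruction: str, quoted_content: str | None = None,
                     pause_ms: int = 500) -> str:
    parts = [f'<voice name="assistant">{instruction}</voice>']
    if quoted_content is not None:
        parts.append(f'<break time="{pause_ms}ms"/>')
        # a different voice (or prosody) stands in for the quotation marks
        parts.append(f'<voice name="dictation"><prosody rate="95%">{quoted_content}</prosody></voice>')
    return "".join(parts)

print(annotate_for_tts("Here's your reply to Tom Devon:", "Yes, I'll be there at six."))
```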
• non-hands-free contexts can be enhanced using similar mechanisms of using TTS as described above for hands-free contexts.
  • a dialog can generate verbal-only prompts in addition to written text and GUI elements.
  • assistant 1002 can say, verbally, "Shall I send it?" to augment the on- screen display of a Send button.
  • the TTS output used for both hands-free and non-hands-free contexts can be tailored for each case. For example, assistant 1002 may use longer pauses when in the hands-free context.
  • the detection of hands-free context can also be used to determine whether and when to automatically prompt the user for a response. For example, when interaction between assistant 1002 and user is synchronous in nature, so that one party speaks while the other listens, a design choice can be made as to whether and when assistant 1002 should automatically start listening for a speech input from the user after assistant 1002 has spoken.
• the specifics of the hands-free context can be used to implement various policies for this auto-start-listening property of a dialog. Examples include, without limitation:
  • a listening mode is initiated in response to detecting a hands-free context.
  • the assistant 1002 may continuously analyze ambient audio in order to identify voice input, such as a voice command, from a user.
  • the listening mode may be used in hands-free contexts, such as when a user is driving in a vehicle.
  • the listening mode is activated whenever a hands-free context is detected. In some embodiments, it is activated in response to detecting that the assistant 1002 is being used in a vehicle.
  • the listening mode is active as long as the assistant
• the listening mode is active for a predetermined time after initiation of the listening mode. For example, if a user pairs the assistant 1002 to a vehicle, the listening mode may be active for a predetermined time after the pairing event. In some embodiments, the predetermined time is 1 minute. In some embodiments, the predetermined time is 2 minutes. In some embodiments, the predetermined time is 10 or more minutes.
• in some embodiments, when in the listening mode, the assistant 1002 analyzes received audio inputs (e.g., using speech-to-text processing) to determine whether the audio input includes a speech input intended for the assistant 1002.
  • received speech is converted to text locally (i.e., on the device) without sending the audio input to a remote computer.
  • the received speech is first analyzed (e.g., converted to text) locally in order to identify words that are intended for the assistant 1002. Once it is determined that one or more words are intended for the assistant, a portion of the received speech is sent to a remote server (e.g., servers 1340) for further processing, such as speech-to-text processing, natural language processing, intent deduction, and the like.
  • the portion sent to the remote service is a group of words following a predefined wake-up word.
  • the assistant 1002 continuously analyzes received ambient audio (converting the audio to text locally), and when a predefined wake-up word is detected, the assistant 1002 will recognize that one or more of the following words are directed to the assistant 1002.
  • the assistant 1002 will then send recorded audio of the one or more words following the keyword to a remote computer for further analysis (e.g., speech-to-text processing).
  • the assistant 1002 detects a pause (i.e., a silent period) of a predefined length following the one or more words, and sends only those words that are between the keyword and the pause to the remote service.
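The local wake-word gating and pause-based endpointing described above can be sketched as a small state machine over a locally transcribed token stream. The PAUSE marker and the helper names are assumptions; only the words between the wake-up phrase and the pause are forwarded.

```python
# Sketch of wake-word gating with pause endpointing over a local transcript stream.
# WAKE_WORD and the PAUSE marker are illustrative assumptions.

WAKE_WORD = "hey assistant"
PAUSE = "<pause>"  # marker the local recognizer emits for a silent period

def words_to_forward(local_transcript_stream):
    """Yield word lists that follow the wake-up word, ending at the next pause."""
    buffer, armed = [], False
    for token in local_transcript_stream:
        if not armed:
            buffer.append(token.lower())
            if " ".join(buffer[-2:]) == WAKE_WORD:
                armed, buffer = True, []   # wake word heard; start capturing
        elif token == PAUSE:
            if buffer:
                yield buffer               # forward only the words between wake word and pause
            armed, buffer = False, []
        else:
            buffer.append(token)

stream = ["hey", "assistant", "find", "me", "a", "nearby", "gas", "station", PAUSE]
print(list(words_to_forward(stream)))  # [['find', 'me', 'a', 'nearby', 'gas', 'station']]
```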
  • the assistant 1002 then proceeds to fulfill the user's intent, including executing appropriate task flows and/or dialog flows.
  • a user may say "Hey Assistant - find me a nearby gas station."
  • the assistant 1002 is configured to detect the phrase “hey assistant” as a wake-up to signal the beginning of an utterance that is directed to the assistant 1002.
• the assistant 1002 then processes the received audio to determine what should be sent to a remote service for further processing. In this case, the pause following the word "station" marks the end of the utterance, so the words following the wake-up phrase and preceding the pause are sent to the remote service.
  • a hands-free context once detected, is a system-side parameter that can be used to adapt various processing steps of a complex system such as multimodal virtual assistant 1002.
  • the various methods described herein provide ways to adapt general procedures of assistant 1002 for hands-free contexts to support a range of user experiences from the same underlying system.
• assistant 1002, when in a hands-free context, allows the user to call anyone if the user can specify the person to be called without tapping or otherwise touching the device. Examples include calling by contact name, calling by phone number (digits recited by user), and the like. Ambiguity can be resolved by additional spoken prompts. Examples are shown below. Example 1: Call a contact, unambiguous
  • Example 2 Cancel a phone call
  • Example 3 Call by name, ambiguous
  • Example 4 Call by name, doubly ambiguous
  • Example 5 Call a business by name, no ambiguity
  • Example 6 Call a business by name, multiple matches
  • Assistant 1002 plays voicemails, one at a time, with prompts
  • Example 9 Read a single text message alert
• Assistant's 1002 audio output: Reads the alert or alerts, or generates a sound indicating an incoming text message
• Assistant's 1002 spoken output: "Message from Mary Richards <change of voice> are you free for dinner tonight?"
  • Example 10 Reply to text message (continues from Example 8)
  • Example 11 Send a text message to one recipient
• Assistant's 1002 spoken output: "Ready to send a text to Lisa Gonzalez with the message <change of voice> I'll be 15 minutes late"
  • Example 12 Send a text message to one recipient - ambiguous
  • Example 13 Read an SMS message from Messages app
  • Example 14 Reply in context of Messages App (continues from Example 12)
• Assistant's 1002 spoken output: "Ready to reply to Betty Joe Bialowski with the message <change of voice> Hi Nancy"
• Example 17: Send a text message to multiple recipients
• this task is determined to be out of scope for hands-free context. Accordingly, assistant 1002 reverts to tapping for disambiguation.
• User's spoken input: "Tell Lisa, Paul, and Nancy that I'll be 15 minutes late to our important meeting"
• Assistant's 1002 spoken output: "Ready to send a text to Lisa Gonzalez, Paul Marcos, and Audrey Farber with the message <change of voice> I'll be 15 minutes late to our important meeting"
  • Example 18 Read a single Reminder alert
  • Example 20 Create a new reminder with alert
  • Example 22 Create a simple appointment (description and date/time given)
  • Example 24 Create a simple appointment (no time given)
  • Example 25 Create a simple appointment (no date or time given)
• the foregoing examples illustrate the use of a hands-free context in particular situations. Additional uses include, for example, maps, playing media such as music, and the like.
  • the following use cases are more specifically directed to how a list of items is presented to the user in a hands-free context, in general and in specific domains (e.g., in the local search domain, calendar domain, reminder domain, text messaging domain, and e-mail domain, etc.).
  • the specific algorithms for presenting a list of items in the hands-free and/or eyes-free context(s) are designed to provide information about the items to the user in an intuitive and personal way, and at the same time, to avoid overburdening the user with unnecessary details.
• Each piece of information to be presented to the user through a speech-based output and/or the accompanying textual interface is carefully selected out of many pieces of potentially relevant information, and optionally paraphrased to provide a smooth and personable dialogue flow.
• the information, particularly unbounded content, is divided into suitable-sized chunks (e.g., pages, sub-lists, categories, etc.) for presentation.
• Known cognitive limitations (e.g., adults are typically only capable of handling 3-7 pieces of information at a time, and children or people with disabilities are capable of handling even fewer pieces of information concurrently) are used to guide the selection of a suitable size for the chunking and categorization of information for presentation.
  • Hands-free list reading is a core, cross-domain ability for users to be able to navigate results involving more than one item.
  • the item can be of a common data item type associated with a particular domain, such as results of a local search, a group of e-mails, a group of calendar entries, a group of reminders, a group of messages, a group of voice mail messages, a group of text messages, etc.
  • the group of data items can be sorted in a particular order (e.g., by time, location, sender, and other criteria), and hence result in a list.
• the general functional requirements for hands-free list reading include one or more of: (1) providing a verbal overview of a list of items (e.g., "There are 6 items.") through a speech-based output; (2) optionally, providing a list of visual snippets representing the list of items on a screen (e.g., within a single dialogue window); (3) iterating through the items and having each one read aloud; (4) reading a domain-specific paraphrase of an item (e.g., "message from X on date Y about Z"); (5) reading the unbounded content of an item (e.g., content body of an email); (6) verbally "paginating" the unbounded content of an individual item (e.g., one section at a time).
• a speech-based overview is first provided. If the list of data items has been identified based on a particular set of selection criteria (e.g., new, unread, from Mark, for today, nearby, in Palo Alto, restaurants, etc.) and/or belongs to a particular domain-specific data type (e.g., local search results, calendar entries, reminders, e-mails, etc.), the overview paraphrases the list of items.
  • the particular paraphrasing used is domain-specific, and typically specifies one or more of the criteria used to select the list of data items.
  • the overview also specifies the length of the list, to provide the user with some idea of how long and involved the reading is going to be. For example, the overview can be "You have 3 new messages from Anna Karenina and Alexei Vronsky.”
  • the criteria used to select the items were specified by the user, and by including the criteria in the overview, the presentation of information would appear more responsive to the user' s request.
  • the interaction also includes providing a speech-based prompt with an offer to read the list and/or the unbounded content of each item to the user.
  • a digital assistant can provide a speech-based prompt such as "Shall I read them to you?" after providing the overview.
  • the prompt is only provided in the hands-free mode, because in a hands-on mode, the user can probably easily read and scroll through the list on a screen rather than hearing the content read out loud.
  • the digital assistant will proceed to read the data items out loud without providing the prompt first.
  • the digital assistant proceeds to read the messages without asking the user whether he or she wants the messages read out loud.
  • the digital assistant will first provide an overview of the list of messages, and will provide a prompt with an offer to read the messages. The messages will not be read out loud unless the user provides a confirmation for doing so.
• the digital assistant identifies fields of text data from each data item in the list, and generates a domain-specific and item-specific paraphrase of the item's content based on a domain-specific template and the actual text identified from the data item. Once the respective paraphrases for the data items are generated, the digital assistant iterates through each item in the list one by one and reads its respective paraphrase out loud. Examples of text data fields in a data item include dates, times, person names, location names, business names, and other domain-specific data fields.
• the domain-specific speakable text templates arrange the different data fields of a domain-specific item type in a suitable order, connect the data fields with suitable connection words, and apply suitable variations (e.g., variations based on grammatical, cognitive, and other requirements) to the text of different text fields, to generate a succinct, natural, and easy-to-understand paraphrase of the data item (see the sketch below).
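A minimal sketch of such templates is shown below; the template strings and field names are illustrative, not the disclosure's actual templates.

```python
# Sketch of domain-specific speakable templates for item paraphrases,
# e.g., "message from X on date Y about Z". Templates and fields are assumptions.

DOMAIN_TEMPLATES = {
    "email":    "message from {sender} on {date} about {subject}",
    "reminder": "reminder to {title} at {time}",
    "local":    "{ordinal}: {name}, {distance} miles {bearing}",
}

def paraphrase(domain: str, item: dict) -> str:
    return DOMAIN_TEMPLATES[domain].format(**item)

print(paraphrase("email", {"sender": "Anna Karenina", "date": "Tuesday",
                           "subject": "dinner plans"}))
```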
  • the digital assistant when iterating through the list of items and providing information (e.g. , the domain-specific, item-specific paraphrase of the items), the digital assistant sets a context marker to the current item.
  • the context marker advances from item to item as the reading proceeds through the list.
  • the context marker can also hop from one item to another item, if the user issues commands to jump from one item to another item.
  • the digital assistant uses the context marker to identify the current context of the interaction between the digital assistant and the user, so that the user' s input can be interpreted correctly in context.
  • the user can interrupt the list reading at any time and issue a command applicable to all or multiple of the list items (e.g., "reply"), and the context marker is used to identify a target data item (e.g. , the current item) for which the command should be applied.
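The context marker can be modeled as an index that advances as items are read and that commands are resolved against. The class below is a minimal illustration; its method names are assumptions.

```python
# Sketch of a context marker that tracks the current item during list reading
# and resolves commands such as "reply" or "number 3" against that position.

class ListReader:
    def __init__(self, items):
        self.items = items
        self.marker = 0                 # context marker: index of the current item

    def current(self):
        return self.items[self.marker]  # target for commands like "reply"

    def next(self):
        if self.marker < len(self.items) - 1:
            self.marker += 1            # marker advances as reading proceeds
        return self.current()

    def jump_to(self, ordinal: int):
        self.marker = ordinal - 1       # "number 3" hops the marker to the 3rd item
        return self.current()

reader = ListReader(["message from Anna", "message from Alexei", "message from Tom"])
reader.next()
print(reader.current())   # the item a spoken "reply" would apply to
print(reader.jump_to(3))  # the item a spoken "number 3" would select
```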
• the domain-specific, item-specific paraphrases are provided to the user through text-to-speech processing.
  • a textual version of the paraphrase is also provided on a screen.
• the textual version of the paraphrase is not provided on the screen; instead, full or detailed versions of the data items are presented on the screen.
  • the unbounded content when reading the unbounded content of a data item, is first divided into sections.
  • the division can be based on paragraphs, lines, number of words, and/or other logical divisions of the unbounded content.
  • the goal is to reduce the cognitive burden on the user, and not overloading the user with too much information or taking up too much time.
  • a speech output is generated for each section, provided to the user one section at a time. Once the speech output for one section is provided, a verbal prompt is provided asking whether the user wishes to proceed with the speech output for the next section. This process repeats until all sections of unbounded content have been read, or until the user asks the reading of the unbounded content to be stopped.
  • the reading of the item-specific paraphrase of the next item in the list can begin.
  • the digital assistant automatically resumes reading of the item- specific paraphrase of the next item in the list.
  • the digital assistant asks the user for a confirmation before resuming the reading.
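The section-by-section reading of unbounded content can be sketched as follows; speak and ask_yes_no are assumed platform hooks, and the 40-word section size is an arbitrary illustrative choice.

```python
# Sketch of verbally "paginating" unbounded content: split it into sections,
# speak one section at a time, and ask before continuing.

def read_in_sections(body: str, speak, ask_yes_no, words_per_section: int = 40):
    words = body.split()
    sections = [" ".join(words[i:i + words_per_section])
                for i in range(0, len(words), words_per_section)]
    for i, section in enumerate(sections):
        speak(section)
        if i < len(sections) - 1 and not ask_yes_no("Shall I continue reading?"):
            return False  # the user stopped the reading early
    return True           # all sections were read
```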
  • the digital assistant is fully responsive to user input from multiple input channels. For example, while the digital assistant is reading through the list of items or in the middle of reading information on one item, the digital assistant allows the user to navigate to other items via natural language commands, gestures on a touch- sensitive surface or display, and other input interfaces (e.g. , mouse, keyboard, cursor, etc.).
• Example navigation commands include: (1) Next: stop reading the current item and start reading the next. (2) More: read more of the current item (if it was truncated or segmented). (3) Repeat: read the last speech output again.
  • the interaction pattern also includes a wrap-up output.
  • a suitable wrap-up output for reading a list of e-mails can be "That was all 5 e-mails", “That was all of the messages”, "That was the end of the last message”, etc.
  • Local search results are search results obtained through a local search, e.g., search for businesses, landmarks, and/or addresses.
  • Examples of local search include a search for restaurants near a geographic location or within a geographic area, a search for gas stations along a route, a search for locations of a particular chain-store, and the like.
• Local search is an example of a domain, and a local search result is an example of a domain-specific item type. The following provides an algorithm for presenting a list of local search results to a user in a hands-free context.
• N: the number of results returned by a search engine for a local search request
• M: the maximum number of search results to show to the user
• P: the number of items per "page" (i.e., concurrently presented to the user on the screen and/or provided under the same sub-section overview)
  • the digital assistant detects a hands-free context, and trims the list of results for hands-free context.
  • the digital assistant trims the list of all relevant results to no more than M: the maximum number of search results to show to the user.
• a suitable number for M is about 3-7. The rationale behind this maximum number is: first, a user is unlikely to perform in-depth research in a hands-free mode, and therefore a small number of the most pertinent items would typically satisfy the user's information needs; and second, a user is unlikely to be able to keep track of too much information simultaneously in his or her mind while in a hands-free mode, because the user is probably distracted by other tasks (e.g., driving or engaged in other hands-on work).
  • the digital assistant summarizes the list of results in text, and generates a domain- specific overview (in text form) of the entire list from the text.
• the overview is tailored to presenting local search results, and therefore location information is particularly relevant in the overview. For example, suppose that the user requested search results for a query in the form of "category, current location" (e.g., queries resulting from natural language search requests "Find Chinese restaurants near me" or "Where can I eat here?"). Then, the digital assistant reviews the search results and identifies search results that are near the user's current location. The digital assistant then generates an overview of the search results in the form of "I found several <categoryPlural> nearby." In some embodiments, no count is provided in the overview unless N ≤ 3. In some embodiments, a count of the search results is provided in the overview if the count is less than 6.
  • the textual form of the overview is provided on a display screen (e.g., within a dialogue window).
  • a speech-based overview is provided to the user.
• the speech-based overview can be generated through text-to-speech conversion of the textual version of the overview.
  • no content is provided on a display screen, and only the speech-based overview is provided at this point.
  • a speech-based subsection overview of a first "page" of results can be provided.
  • the sub-section overview can list the names (e.g., business names) of the first P items on the "page.”
• the digital assistant iterates through all the "pages" of the search result list in the above manner (see the sketch below).
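Using the N, M, and P quantities defined above, the paging behavior can be sketched as below. speak is an assumed hook, and the concrete values of M and P are illustrative.

```python
# Sketch of hands-free presentation of local search results: trim to M results,
# then present them in "pages" of P items, each with a short sub-section overview.

M = 6  # maximum number of results to present hands-free (suggested range is about 3-7)
P = 3  # items per "page"

def present_local_results(results, category_plural, speak):
    trimmed = results[:M]                                # trim the N results down to M
    speak(f"I found several {category_plural} nearby.")  # domain-specific overview
    for start in range(0, len(trimmed), P):
        page = trimmed[start:start + P]
        names = ", ".join(r["name"] for r in page)
        speak(f"Next: {names}.")                         # sub-section overview of this page
        for i, r in enumerate(page, start=start + 1):
            speak(f"Number {i}: {r['name']} on {r['street']}, "
                  f"{r['distance']} miles {r['bearing']}.")
```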
• a current page of search results is presented in visual form (e.g., in textual form).
  • a visual context marker indicates the current item being read.
  • the textual paraphrase for each search result includes the ordinal position (e.g., first, second, etc), distance, and bearing associated with the search result.
• the textual paraphrase for each result only occupies a single line in the list on the display, so that the list appears succinct and easy to read. To keep the text to a single line, no business name is presented; the text paraphrase is in the format of "Second: 0.6 miles south".
  • an individual visual snippet is provided for each result.
• the snippet of each result can be revealed when the textual paraphrase shown on the display is scrolled, so that the one-line text bubble is at the top and the snippet fits underneath.
  • the context marker or context cursor advances through the list of items as the items or paraphrases thereof are presented to the user one by one in a sequential order.
• in speech, the assistant announces the ordinal position, business name, short address, distance, and bearing of the current item. The short address is, for example, the street name portion of the full address.
• the digital assistant can provide a speech output saying "You are already navigating on a route. Would you like to replace this route with directions to <item name>?" If the user replies in the affirmative, the digital assistant presents the directions to the location associated with that result. In some embodiments, the digital assistant provides a speech output saying "Directions to <item name>" and presents the navigation interface (e.g., a maps and directions interface). If the user replies in the negative, the digital assistant provides a speech output saying "OK, I won't replace your route." If in eyes-free mode, the interaction stops here.
  • the digital assistant If user says “show it on a map,” but the digital assistant detects an eyes-free context, the digital assistant generates a speech output saying "Sorry, your vehicle won't let me show items on the map during driving" or some other standard eyes-free warning. If eyes-free context is not detected, the digital assistant provides a speech output saying "Here is the location of ⁇ item name>" and shows the single item snippet for that item again.
• when an item is displayed and the user asks to call it, e.g., by saying "Call," the digital assistant identifies the correct target result and initiates a telephone connection to a telephone number associated with the target result. Before making the telephone connection, the digital assistant provides a speech output saying "Calling <item name>."
  • the following provides a few natural language use cases for identifying the target item/result of an action command.
  • the user can name the item in a command, and the target item is then identified based on the particular item name specified in the command.
  • the user can also use "it” or other reference to refer to a current item.
  • the digital assistant can identify the correct target item based on the current position of the context marker.
  • the user can also use "the nth one" or "number n" to refer to the nth item in the list. In some cases, the nth item can be ahead of the current item. For example, as soon as the user has heard the overview list of names and is hearing information regarding item #1, the user can say "directions to number 3". In response, the digital assistant will perform the "directions" action with respect to the 3rd item in the list.
  • the user can speak a business name to identify a target item. If multiple items in the list match the business name, the digital assistant chooses the last-read item that matches the business name as the target item.
  • the digital assistant disambiguates from the current item (i.e., the item pointed to by the context marker) backward through the list first, and then forward from the current item. For example, if the context marker is on item 5 of 10 items, and the user says a selection criterion (e.g., a particular business name, or other properties of the results) that matches items 2, 4, 6, and 8, then the digital assistant chooses item 4 as the target item for the command.
  • While presenting the list of local search results, the digital assistant allows the user to move around the list by issuing the following commands: "Next," "Previous," "Go back," "Read it again," or "Repeat."
  • when the user provides a speech command that only specifies an item, but not any action applicable to the item, the digital assistant prompts the user to specify an applicable action.
  • the prompt provided by the digital assistant offers one or more actions applicable to the specific item type of the item (e.g., actions applicable to local search results, such as "Call," "Directions," "Show on map," etc.).
  • the digital assistant prompts the user with a speech output saying "Would you like to call it or get directions?" If the user's speech input already specifies a command verb or action applicable to the item, the digital assistant acts on the item according to the command. For example, if the user's input is "call the nearest gas station" or the like, the digital assistant identifies the target item (e.g., the result corresponding to the nearest gas station) and initiates a telephone connection to a telephone number associated with the target item.
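As a rough illustration of this prompting logic (an assumed sketch, not the disclosed implementation; the action table and function names are invented for the example), an assistant could map item types to their applicable actions and either act on a specified verb or prompt for one:

```python
# Illustrative sketch only: maps domain-specific item types to applicable
# actions and either performs a requested action or prompts the user to pick one.

ACTIONS_BY_ITEM_TYPE = {
    "local_search_result": ["call", "get directions", "show on map"],
    "email": ["reply", "delete", "hear more"],
}

def handle_item_selection(item_type, requested_action=None):
    """Return the assistant's next speech output for a selected item."""
    applicable = ACTIONS_BY_ITEM_TYPE.get(item_type, [])
    if requested_action in applicable:
        # The user already specified a verb, e.g. "call the nearest gas station".
        return f"Performing '{requested_action}' on the selected item."
    # No action specified: offer the choices applicable to this item type.
    return "Would you like to " + " or ".join(applicable) + "?"

if __name__ == "__main__":
    print(handle_item_selection("local_search_result"))          # prompts
    print(handle_item_selection("local_search_result", "call"))  # acts
```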
  • the digital assistant is capable of processing and responding to user input related to different domains and contexts. If the user makes a context-independent, fully specified request in another domain, the digital assistant suspends or terminates the list reading and responds to the request in the other domain. For example, while the digital assistant is in the process of asking the user "Would you like to call it, get directions, or go to the next one?" during list reading, the user can say "What is the time in Beijing?" In response to this new user input, the digital assistant determines that the domain of interest has switched from local search and list reading to another domain of clock/time.
  • Based on such a determination, the digital assistant performs the action requested in the clock/time domain (e.g., launches the clock application, or provides the current time in Beijing).
  • the above examples of list reading in the local search domain are merely exemplary.
  • the techniques disclosed for the local search domain are also applicable to other domains and domain-specific item types.
  • the list reading algorithms and presentation techniques can also be applicable to reading a list of business listings outside of a local search domain.
  • Reading reminders in hands-free mode has two important parts: selecting what reminders to read and deciding how to read each reminder.
  • the list of reminders to be presented is filtered down to a group of reminders that is a meaningful subset of all available reminders associated with the user.
  • the group of reminders to be presented to the user in the hands-free context can further be divided into meaningful subgroups based on various reminder properties, such as reminder trigger time, trigger location, and other actions or events that the user or the user's device may perform. For example, if someone says "what are my reminders" it may not be very helpful for the assistant to reply "at least 25" since the user is unlikely to have time or be interested in hearing about all 25 reminders in one sitting.
  • the reminders to be presented to the user should be a rather small and actionable set of reminders that are relevant now, such as "You have 3 recent reminders," "You have 4 reminders for today," or "You have 5 reminders for today, 1 for when you are traveling and 4 for after you get home."
  • a selection criterion can be based on a match between the alert time and due date of the reminder and the current date and time, or another user-specified date and time. For example, the user can ask "what are my reminders" and a small set (e.g., 5) of recent reminders and/or upcoming reminders with trigger times (e.g., alert time and/or due time/date) close to the current time is selected for hands-free list reading to the user. For location triggers, a reminder can be triggered when the user is leaving a current location and/or arriving at another location.
  • a selection criterion can be based on the current location and/or a user specified location. For example, the user can say "what are my reminders" when he or she is leaving a current location, and the assistant can select a small set of reminders that have triggers associated with the user leaving the current location. For another example, the user can say "what are my reminders" when the user steps into a store, and reminders associated with that store can be selected for presentation. For action triggers, a reminder can be triggered when the assistant detects that the user is performing an action (e.g., driving, or walking). Alternatively or in addition, the type of actions to be performed by the user as specified in the reminders can also be used to select relevant reminders for presentation.
  • a selection criterion can be based on the user's current action or the action triggers associated with the reminders.
  • a selection criterion can also be based on the user's current action and the actions that are to be performed by the user according to the reminders. For example, when the user asks "what are my reminders" while he is driving, reminders associated with driving action triggers (e.g., reminders for making calls in the car, reminders for going to the gas station, reminders to do an oil change, etc.) can be selected for presentation.
  • when the user is walking, reminders associated with actions that are suitable to be performed while walking, such as reminders for making calls, a reminder for checking the current pollen count, a reminder to put on sunscreen, etc., can be selected for presentation.
  • the assistant provides a report or overview on a short list of reminders associated with one or more of the following categories of reminders: (1) reminders that were recently triggered, (2) reminders to be triggered when the user is leaving some place (assuming that the place is where the user just was), (3) reminders to be triggered or due today, soonest first, and (4) reminders to be triggered when the user arrives somewhere.
  • the order in which the individual reminders are presented is sometimes not as important as the overview.
  • the overview puts the list of reminders in a context in which the arbitrary title strings of the reminders can make some sense to the user, for example, when the user asks for reminders.
  • the assistant can provide an overview saying "You have N reminders that have recently come up, M for when you are traveling, and J reminders scheduled for today."
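The sub-group overview described in the preceding bullets could be assembled along the following lines; this Python sketch is only illustrative, and the group names and reminder fields are assumptions rather than the patent's data model.

```python
# Illustrative sketch: group reminders into sub-groups and build a spoken
# list-level overview such as "You have 2 reminders that have recently come up,
# 1 reminder for when you are traveling, 1 reminder scheduled for today."

GROUP_NAMES = ["that have recently come up", "for when you are traveling",
               "scheduled for today", "for when you get home"]

def reminder_overview(reminders):
    groups = {name: [] for name in GROUP_NAMES}
    for r in reminders:
        groups.setdefault(r["group"], []).append(r)  # assumed 'group' field
    parts = []
    for name in GROUP_NAMES:
        count = len(groups[name])
        if count:
            noun = "reminder" if count == 1 else "reminders"
            parts.append(f"{count} {noun} {name}")
    if not parts:
        return "You have no reminders."
    return "You have " + ", ".join(parts) + "."

if __name__ == "__main__":
    demo = [
        {"title": "contact that guy", "group": "that have recently come up"},
        {"title": "get a cable", "group": "that have recently come up"},
        {"title": "call Justin", "group": "for when you are traveling"},
        {"title": "finish that report", "group": "scheduled for today"},
    ]
    print(reminder_overview(demo))
```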
  • the assistant can then proceed to go through each sub-group of reminders in the list. For example, the following are the steps that the assistant can perform to present the list to the user:
  • the assistant provides a speech-based sub-section overview: "The reminders that were recently triggered are:", followed by a pause. Then, the assistant provides a speech-based item-specific paraphrase of the content of the reminder (e.g., a title of the reminder, or a short description of the reminder), saying "contact that guy about something."
  • a pause can be inserted, so that the user can tell the reminders apart, and can interrupt the assistant with a command during the pause.
  • the assistant enters a listening mode during the pause, if two-way communication is not constantly maintained.
  • the assistant proceeds with the second reminder in the sub-group, and so on: "<pause> get a cable for intergalactic communication from the company store."
  • the ordinal positions of the reminders are provided before the paraphrases are read.
  • the ordinal positions of the reminders are sometimes deliberately omitted to make the communication more succinct.
  • the assistant continues with the second sub-group of reminders by providing a sub-group overview first: "Reminders for when you are traveling are:" Then, the assistant goes through the reminders in the second sub-group one by one: "<pause> call Justin Beaver", "<pause> check out the sunset." After the second sub-group of reminders is presented, the assistant proceeds to read a sub-group overview of the third sub-group of reminders: "A reminder coming up today is:" Then, the assistant proceeds to provide the item-specific paraphrase of each reminder in the third sub-group: "<pause> finish that report." After the third sub-group of reminders is presented, the assistant provides the sub-group overview of the fourth sub-group by saying "Reminders for when you get home are:" Then, the assistant proceeds to read the item-specific paraphrases for the reminders in the fourth sub-group: "<pause> pull a bottle from the cellar", "<pause> light a fire."
  • the above examples are merely illustrative, and demonstrate the ideas of how hands-free reading of a list of reminders can be structured.
  • the above examples also illustrate the key phrases through which the reminders are presented. For example, a list-level overview including a description of the sub-groups and a count of reminders within each sub-group can be provided. In addition, when there is more than one sub-group, a sub-group overview is provided before the reminders in the sub-group are presented. The sub-group overview states the name or title of the sub-group based on a characteristic or property by which the sub-group is created, and by which reminders within the sub-group are selected.
  • the user will specify which particular group of reminders the user is interested in.
  • the selection criteria are provided by the user input.
  • the user may explicitly request "show me the calls I need to make," "what do I have to do when I get home," "what do I have to buy at this store," and so on.
  • the digital assistant extracts the selection criteria from the user input based on natural language processing, and identifies the relevant reminders for presentation based on the user-specified selection criteria and the pertinent properties (e.g., trigger time/date, trigger actions, actions to be performed, trigger locations, etc.) associated with the reminders.
  • For reminders for calls: the user can ask "what calls do I need to make," and the assistant can say "You have reminders to make 3 calls: Amy Joe, Bernard Julia, and Chetan Cheyer." In this response, the assistant provides an overview followed by the item-specific paraphrases of the reminders.
  • the overview specifies the selection criterion (e.g., the action to be performed by the user is "making calls") used to select the relevant reminders, and a count of the relevant reminders (e.g., 3).
  • the domain-specific, item-specific paraphrase for reminders for calls includes just the name of the person to be called (e.g., "Amy Joe").
  • For reminders for things to do at a specific location: the user asks "what do I have to do when I get home," and the assistant can say "You have 2 reminders for when you get home: <pause> pull a bottle from the cellar, and <pause> light a fire."
  • the assistant provides an overview followed by the item-specific paraphrases of the reminders.
  • the overview specifies the selection criterion (e.g., the trigger location is "home") used to select the relevant reminders, and a count of the relevant reminders (e.g., 2).
  • the domain-specific, item-specific paraphrase for the reminders includes just the action to be performed (e.g., the action specified in the reminders), and no extraneous information is provided in the paraphrases since the user just wants a preview of what's coming up.
  • the following description relates to reading calendar events in a hands-free mode.
  • the two main considerations for hands-free calendar event reading are still selecting which calendar entries to read, and deciding how to read each calendar entry.
  • Similar to reading reminders and other domain-specific data item types, a small subset of all calendar entries associated with the user is selected and grouped into meaningful sub-groups of 3-5 entries each.
  • the division of sub-groups can be based on various selection criteria such as event date/time, reminder date/time, type of events, location of events, participants, etc.
  • the assistant can present information about the event entries for the current day or half day, and then proceeds afterwards in accordance with the user's subsequent commands. For example, the user can ask about additional events for the next day by simply saying "next page.”
  • the calendar entries are divided into sub-groups by date. Each sub-group only includes events on a single day. If the user asks for calendar entries for a date range spanning multiple days, the calendar entries associated with each single day within that range are presented one day at a time. For example, if the user asks "what's on my calendar next week," the assistant can reply with a list-level overview "You have 3 events on Monday, 2 events on Tuesday, and no events on other days." The assistant can then proceed to present the events on each of Monday and Tuesday. For the events on each day, the assistant can provide a sub-group overview of the day first. The overview can specify the times of the events on that day. In some embodiments, if an event is a whole-day event, the assistant provides that information in the sub-group overview as well. For example, the following is an example scenario illustrating the hands-free reading of calendar entries:
  • event time is the most pertinent piece of information to the user in most cases. Streamlining the presentation of a list of times can improve user experience and make the communication of information more efficient.
  • if the event times of the calendar entries span both the morning and the afternoon, only the event times for the first and last calendar entries are provided with an AM/PM indicator in the speech-based overview.
  • if all of the event times are in the morning, the AM indicator is provided for the event times of the first and the last calendar entries.
  • if all of the event times are in the afternoon or evening, the PM indicator is provided for the last event of the day, but no AM/PM indicator is provided for other event times. Noon and midnight are exempt from the AM/PM rules above.
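A minimal sketch of one possible reading of these AM/PM rules follows (an assumed interpretation, not code from the disclosure): for a day whose events span morning and afternoon, indicators are attached only to the first and last entries, and noon and midnight are always spoken by name.

```python
from datetime import time

# Illustrative sketch of the AM/PM pronunciation rules described above, for the
# case where a day's event times span morning and afternoon.

def spoken_times(times):
    def base(t):
        hour = t.hour % 12 or 12
        return f"{hour}:{t.minute:02d}" if t.minute else f"{hour}"

    out = []
    for i, t in enumerate(times):
        if t == time(12, 0):
            out.append("noon")          # noon is exempt from the AM/PM rule
            continue
        if t == time(0, 0):
            out.append("midnight")      # midnight is exempt as well
            continue
        suffix = ""
        if i == 0 or i == len(times) - 1:            # first or last entry only
            suffix = " AM" if t.hour < 12 else " PM"
        out.append(base(t) + suffix)
    return out

if __name__ == "__main__":
    day = [time(11, 0), time(12, 0), time(15, 30), time(19, 0)]
    print(", ".join(spoken_times(day)))   # -> 11 AM, noon, 3:30, 7 PM
```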
  • the assistant provides a count of all-day events. For example, when asked about the events next week, the digital assistant can say "You have (N) all-day event(s).”
  • When reading the list of relevant calendar entries, the digital assistant first reads all of the timed events and then the all-day events. If there are no timed events, the assistant goes directly to reading the list of all-day events after the overview. Then, for each event on the list, the assistant provides a speech-based item-specific paraphrase according to the following template: <time> <subject> <location>, where the location can be omitted if no location is specified in the calendar entry.
  • the item-specific paraphrases of the calendar entries include a <time> component in the form of "at 11 AM", "at noon", "at 1:30 PM", "at 7:15 PM", etc. For an all-day event, no such time component is needed.
  • the assistant optionally specifies the count and/or identities of the participants in addition to the title of the event. For example, if there are more than 3 participants for an event, the <subject> component can include "<event title> with N people". If there are 1-3 participants, the <subject> component can include "<event title> with person 1, person 2, and person 3". If there are no participants for an event other than the user, the <subject> component can include just the <event title>. If a location is specified for a calendar event, a <location> component can be inserted into the paraphrase of the calendar event; the location information may need some filtering.
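The "<time> <subject> <location>" template and the participant-count rule could be rendered roughly as follows; the event field names and the participant limit of 3 are assumptions for this sketch.

```python
# Illustrative sketch of the calendar-entry paraphrase template described above.

def calendar_paraphrase(event, participant_limit=3):
    subject = event["title"]
    people = event.get("participants", [])
    if 0 < len(people) <= participant_limit:
        subject += " with " + ", ".join(people)          # name a few participants
    elif len(people) > participant_limit:
        subject += f" with {len(people)} people"         # otherwise give a count
    location = f" in {event['location']}" if event.get("location") else ""
    time_part = f"At {event['time']}: " if event.get("time") else "All day: "
    return time_part + subject + location

if __name__ == "__main__":
    print(calendar_paraphrase({"time": "noon", "title": "design review",
                               "participants": ["p"] * 9,
                               "location": "Room 8, IL 2"}))
    # -> At noon: design review with 9 people in Room 8, IL 2
```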
  • After the user asks "what's on my calendar," the assistant replies with an overview: "You have events on your calendar at 11 AM, noon, 3:30, and 7 PM. You also have 2 day-long events." After the overview, the assistant continues with the list of calendar entries: "At 11 AM: meeting", "At 11:30 AM: meeting with Harry Saddler", "At noon: design review with 9 people in Room (8), IL 2", "At 3:30 PM: meeting with Susan", "At 7 PM: dinner with Amy Cheyer and Lynn Julia." In some embodiments, the assistant can indicate the end of the list by providing a wrap-up output, such as "That was all."
  • E-mail is different from other item types in that emails typically include an unbounded portion (i.e., the message body) that is of unbounded size (e.g., too large to read in its entirety), and may include content that cannot be readily converted to speech (e.g., objects, tables, pictures, etc.).
  • the unbounded portions of e-mails are divided into smaller chunks, and only one chunk is provided at a time; the rest is omitted from the speech output unless the user specifically requests to hear it (e.g., by using a command such as "More").
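One way to realize this chunking behavior is sketched below; the chunk size of three wrapped lines and the helper names are assumptions, and a real assistant would route the yielded text to text-to-speech rather than print it.

```python
import textwrap

# Illustrative sketch: chunk an unbounded message body into fixed-size sections
# and read one section at a time, asking before continuing.

def chunk_body(body, lines_per_chunk=3, line_width=70):
    lines = textwrap.wrap(body, width=line_width)
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def read_body(body, want_more):
    """want_more(prompt) -> bool, e.g. backed by a spoken 'More' command."""
    chunks = chunk_body(body)
    for i, chunk in enumerate(chunks):
        yield chunk
        remaining = len(chunks) - i - 1
        if remaining and not want_more(f"Continue reading? ({remaining} sections left)"):
            break

if __name__ == "__main__":
    long_body = "word " * 200
    for section in read_body(long_body, want_more=lambda prompt: True):
        pass  # in a real assistant this text would be spoken via TTS
```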
  • pertinent properties for selecting e-mails for presentation, and dividing emails into sub-groups include sender identity, date, subject, read/unread status, urgency flag, etc.
  • Objects (e.g., tables, pictures) and attachments in the email can be identified by the assistant, but may be omitted from hands-free reading.
  • the objects and attachments may be presented on a display.
  • the display of these objects and attachments may be prevented by the assistant.
  • the following is an example scenario illustrating the hands-free list reading for email.
  • the example illustrates the use of a prompt after the overview and before reading the list of emails.
  • a summary or paraphrase of the content of each email is provided one by one.
  • the user can navigate through the list by using the command "Next", “First”, “Previous”, “Last” etc.
  • the user can say "More.”
  • the user can also say commands related to actions applicable to an email.
  • the context marker advances through the list of emails as the assistant reads the emails one by one.
  • the context marker also hops from one email to another if the user's command is directed to an email out of sequential order.
  • the user can ask: "Do I have any new mail from Harry Saddler?"
  • the assistant identifies the relevant e-mails, and provides a list overview and a prompt for reading the list of emails: "You have 5 unread messages from Harry Saddler. Would you like to hear them read?" The user answers in the affirmative, e.g., "Yes."
  • the assistant then proceeds to read a domain-specific, item-specific paraphrase for each email in the list, one by one. For example, the assistant can say, "First message, From Harry Saddler, 3 days ago. With the subject: Short meeting today 3pm."
  • the paraphrase is generated based on the content of the e-mail, and includes key information such as an ordinal position of the message in the list, a sender identity associated with the message, a time associated with the message, and the subject line of the message.
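A paraphrase of this shape could be produced along the following lines; this is an illustrative sketch with assumed message fields, and the coarse relative-time wording is one possible choice rather than the disclosed format.

```python
from datetime import datetime, timedelta

# Illustrative sketch of an e-mail paraphrase: ordinal position, sender,
# a coarse relative arrival time, and the subject line.

ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]

def relative_time(arrived, now):
    days = (now.date() - arrived.date()).days
    if days == 0:
        return "today at " + arrived.strftime("%I:%M %p").lstrip("0")
    if days == 1:
        return "yesterday at " + arrived.strftime("%I:%M %p").lstrip("0")
    return f"{days} days ago"

def email_paraphrase(index, message, now):
    ordinal = ORDINALS[index] if index < len(ORDINALS) else f"Message {index + 1}"
    return (f"{ordinal} message, from {message['sender']}, "
            f"{relative_time(message['arrived'], now)}, "
            f"with the subject: {message['subject']}.")

if __name__ == "__main__":
    now = datetime(2013, 6, 8, 12, 0)
    msg = {"sender": "Harry Saddler",
           "arrived": now - timedelta(days=3),
           "subject": "Short meeting today 3pm"}
    print(email_paraphrase(0, msg, now))
```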
  • the exact arrival time is not used. Instead a paraphrase of the arrival time is used, since the exact arrival time is probably not pertinent in the hands-free context.
  • the body of the message is not read or provided in the paraphrase.
  • the assistant can begin reading a first chunk of the unbounded content of the message: "Harry wrote: Next meeting — to scope a project X— is Monday Oct 17, 2pm. I can't find a room, so let's meet in the IL2/4th elevator lobby, and go from there. Harry"
  • the overall length of the unbounded content of the message is short enough to fit within a predetermined length limit for a chunk (e.g., 3 lines of text); therefore, the entire content body of the message is read in one chunk.
  • a brief pause is provided, so that the user can enter a command for an action (e.g., reply, delete, repeat, etc.) applicable to the message just read, or to move on to the next message.
  • the assistant proceeds to the next message in the list, and the context marker advances to the next message.
  • the assistant provides a speech output saying "Next message, From Harry Saddler, today at 8:23am, with the subject: List reading use cases.”
  • Before the assistant finishes reading this chunk of the message body, the user interrupts the reading with a command "Reply." In response, the assistant stops the reading, and identifies the current message as the target message for the action requested by the user's input. The assistant generates a speech-based response "OK, I can help you reply to Harry Saddler. What would you like to say?" The user speaks the text of the reply email "Great idea period these will help a lot." The assistant mirrors the message back to the user to confirm accuracy, "Your message reads: Great idea. These will help a lot." The assistant then asks "Ready to send it?" Once the user confirms (e.g., by saying "yes"), the assistant acknowledges the user's confirmation, and says "OK, I'll send it. Shall I continue reading your email?" After the user replies "Yes," the assistant goes to the next message in the list and reads it: "Next message, From Harry Saddler, yesterday at 2:34pm with the subject:
  • the assistant detects some object that is not easily read out loud.
  • the message body contains a hyperlink.
  • the assistant does not convert the text of the hyperlink to speech. Instead, the assistant simply identifies the existence of the link in the message body by using a keyword such as "LINK," so that the user is informed of the link, but the speech output of the message body is not distorted into an unrecognizable form.
  • an attachment is identified. The assistant does not attempt to read the attachment. Instead, the assistant simply provides a speech output indicating the presence of the attachment in the email.
  • After reading the first chunk, the assistant again asks "Continue reading this message?" The user says "Repeat." In response, the assistant checks the position of the context marker, which is now residing on the first chunk of the unbounded content of the fourth message in the list. The assistant then reads the first chunk of the message body for the fourth message again: "How about this simple solution: silence the assistant, while assistant is speaking, shake it; it stops talking, silence the assistant and make a new request: while assistant is speaking, shake again."
  • the assistant asks "Continue?" If the user says “Yes”, the assistant continues to read the second chunk of the current message, and the context marker moves to the second chunk as well. For example, the assistant reads the second chunk "this is simply 'silence the assistant' + 'start listening' to get the assistant without making a new request. Home button, then mic button when it appears. This is really forced endpoint with no input. Harry.”
  • In response to the user's command to stop reading the message body of the current message, the assistant proceeds to read the next message: "Next message, From Harry Saddler, 10 minutes ago, In reply to a message from Siamak Hodjat on the subject: the simple rule for staying in transactions (modal prompting)." After a brief pause, the assistant asks "Continue reading this message?" The user says "Next." In response to the user's input, the assistant replies "That was the last of 5 email messages."
  • Figs. 14A-14L show a flow diagram of a method for providing hands-free list reading by a digital assistant (also called a virtual assistant).
  • the digital assistant identifies a plurality of data items for presentation to a user, where the plurality of data items are each associated with a domain-specific item type (1442).
  • the data items include: calendar entries associated with a user, emails from a particular sender, reminders for a particular day, and search results obtained from a particular local search request.
  • the domain-specific item types for the above example data items are calendar entries, emails, reminders, and local search results.
  • Each domain-specific data type has a relatively stable data structure, such that content of particular data fields can be predictably extracted and restructured into a paraphrase of the content.
  • the plurality of data items are also sorted according to a particular order. For example, local search results are often sorted by relevance and distance. Calendar entries are often sorted by event time. Items of some item types do not need to be sorted. For example, reminders may be unsorted.
  • Based on the domain-specific item type, the assistant generates a speech-based overview of the plurality of data items (1444).
  • the overview provides the user with a general idea of what kinds of items are in the list, and how many items are in the list.
  • the assistant For each of the plurality of data items, the assistant further generates a respective speech-based, item-specific paraphrase for the data item based on respective content of the data item (1446).
  • the format of the item-specific paraphrase often depends on the domain-specific item type (e.g., whether the item is a calendar entry or a reminder) and the actual content of the data item (e.g., the event time and subject of a particular calendar entry).
  • the assistant provides the speech-based overview to a user through the speech-enabled dialogue interface (1448).
  • the speech-based overview is then followed by the respective speech-based, item-specific paraphrases for at least a subset of the plurality of data items.
  • if the items in the list are sorted in a particular order, the paraphrases of the items are provided in that particular order.
  • for each of the plurality of data items, the digital assistant generates a respective textual, item-specific snippet for the data item based on respective content of the data item (1450).
  • the snippet can include more details of a corresponding local search result, or the content body of an email, etc.
  • the snippet is for presentation on a display, and accompanies the speech-based reading of the list.
  • the digital assistant provides the respective textual, item-specific snippets for at least the subset of the plurality of data items, to the user through a visual interface (1452).
  • the context marker is provided on the visual interface as well.
  • all of the plurality of data items are presented on the visual interface at the same time, while the reading of the items proceeds "page" by "page", i.e., a subset at a time.
  • the provision of the speech-based, item-specific paraphrases is accompanied by provision of the respective textual, item-specific snippets.
  • while providing the respective speech-based, item-specific paraphrases, the digital assistant inserts a pause between each pair of adjacent speech-based, item-specific paraphrases (1454).
  • the digital assistant enters a listening mode to capture user input during the pause (1456).
  • while providing the respective speech-based, item-specific paraphrases in a sequential order, the digital assistant advances a context marker to the current data item for which the respective speech-based, item-specific paraphrase is being provided to the user (1458).
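The sequential reading loop with pauses, a listening window, and an advancing context marker might be organized as in the sketch below; speak() and listen_during_pause() are placeholders for text-to-speech and speech recognition, and the command set shown is only a small subset of what is described.

```python
import time as _time

# Illustrative sketch of the sequential list-reading loop: speak each paraphrase,
# advance a context marker, and pause between items so the user can interrupt.

def speak(text):
    print(text)

def listen_during_pause(seconds=1.0):
    _time.sleep(seconds)      # placeholder: a real assistant would open the mic
    return None               # return a recognized command string, or None

def read_list(paraphrases):
    context_marker = 0
    while context_marker < len(paraphrases):
        speak(paraphrases[context_marker])
        command = listen_during_pause()
        if command == "previous":
            context_marker = max(0, context_marker - 1)
        elif command == "repeat":
            pass                              # marker stays on the current item
        elif command == "stop":
            return
        else:                                 # None or "next": move forward
            context_marker += 1
    speak("That was all.")

if __name__ == "__main__":
    read_list(["First: Whole Foods, 0.2 miles east.",
               "Second: Safeway, 0.6 miles south."])
```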
  • the digital assistant receives user input requesting an action to be performed, the action applicable to the domain-specific item type (1460).
  • the digital assistant determines a target data item for the action among the plurality of data items based on a current position of the context marker (1462). For example, the user may request an action without explicitly specifying a target item to which the action should apply.
  • the assistant presumes the user is referring to the current data item as the target item. Then, the digital assistant performs the action with respect to the determined target data item (1464).
  • the digital assistant receives user input requesting an action to be performed, the action applicable to the domain-specific item type (1466).
  • the digital assistant determines a target data item for the action among the plurality of data items based on an item reference number specified in the user input (1468). For example, the user may say "the third" item in the user input, and the assistant can determine which item the "third" item is in the list.
  • the digital assistant performs the action with respect to the determined target data item (1470).
  • the digital assistant receives user input requesting an action to be performed, the action applicable to the domain-specific item type (1472).
  • the digital assistant determines a target data item for the action among the plurality of data items based on an item characteristic specified in the user input (1474). For example, the user can say "Reply to the message from Mark,” and the digital assistant can determine which message the user is referring to based on the sender identity "Mark” among the list of messages. Once the target item is determined, the digital assistant performs the action with respect to the determined target data item (1476).
  • when determining the target data item for the action, the digital assistant determines that the item characteristic specified in the user input applies to two or more of the plurality of data items (1478), determines a current position of a context marker among the plurality of data items (1480), and selects one of the two or more data items as the target data item (1482).
  • the selecting of the data item includes: preferentially selecting all data items residing before the context marker over all data items residing after the context marker (1484); and preferentially selecting a data item closest to the context cursor among all data items on the same side of the context marker (1486).
  • For example, the user says "reply to the message from Mark." If all messages from Mark are located after the current context marker, then the one closest to the context marker is selected as the target message. If one message from Mark is before the context marker, and the rest are after the context marker, then the one before the context marker is selected as the target message. If all messages from Mark are located before the context marker, then the one closest to the context marker is selected as the target message.
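This disambiguation order, preferring the closest matching item at or before the context marker and only then the closest one after it, can be sketched as follows; the data structures are assumptions, and the example reproduces the item-4-of-10 case given earlier.

```python
# Illustrative sketch of context-marker disambiguation: among items matching the
# user's criterion, prefer the closest match at or before the marker, then the
# closest one after it.

def pick_target(items, marker_index, matches):
    """items: list of data items; matches: predicate for the user's criterion."""
    candidates = [i for i, item in enumerate(items) if matches(item)]
    if not candidates:
        return None
    before = [i for i in candidates if i <= marker_index]
    after = [i for i in candidates if i > marker_index]
    if before:
        return max(before)           # closest match at or before the marker
    return min(after)                # otherwise the closest match after it

if __name__ == "__main__":
    items = [f"item {n}" for n in range(1, 11)]
    # Context marker on item 5; the criterion matches items 2, 4, 6 and 8.
    target = pick_target(items, marker_index=4,
                         matches=lambda it: it in {"item 2", "item 4",
                                                   "item 6", "item 8"})
    print(items[target])             # -> item 4
```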
  • the digital assistant receives user input selecting one of the plurality of data items without specifying any action applicable to the domain-specific item type (1488).
  • the digital assistant provides a speech-based prompt to the user, the speech-based prompt offering one or more action choices applicable to the selected data item (1490). For example, if the user says "the first gas station," the assistant can offer a prompt saying "would you like to call or get directions?"
  • the digital assistant determines a respective size of an unbounded portion of the data item (1492). Then, in accordance with predetermined criteria, the digital assistant performs one of: (1) providing a speech-based output reading an entirety of the unbounded portion to the user (1494); and (2) chunking the unbounded portion of the data item into multiple discrete sections (1496), providing a speech-based output reading a particular discrete section of the multiple discrete sections to the user (1498), and prompting user input regarding whether to read the remaining discrete sections of the multiple discrete sections (1500).
  • the speech-based output comprises a verbal pagination indicator uniquely identifying the particular discrete section among the multiple discrete sections.
  • the digital assistant provides the respective speech-based, item-specific paraphrases for at least the subset of the plurality of data items in a sequential order (1502).
  • while providing the respective speech-based, item-specific paraphrases in the sequential order, the digital assistant receives a speech input from the user, the speech input requesting one of: skipping one or more paraphrases, presenting additional information for a current data item, or repeating one or more previously presented paraphrases (1504).
  • the digital assistant continues providing the paraphrases in accordance with the user's speech input (1506).
  • while providing the respective speech-based, item-specific paraphrases in the sequential order, the digital assistant receives a speech input from the user, the speech input requesting to pause the provision of the paraphrases (1508). In response to the speech input, the digital assistant pauses the provision of the paraphrases and listens for additional user input during the pausing (1510). During the pausing, the digital assistant performs one or more actions in response to one or more additional user inputs (1512). After performing the one or more actions, the digital assistant automatically resumes the provision of the paraphrases (1514). For example, while reading one of a list of emails, the user can interrupt the reading and ask the assistant to reply to a message. After the message is completed and sent, the assistant resumes reading of the remaining messages in the list. In some embodiments, the digital assistant requests a user confirmation before automatically resuming the provision of the paraphrases (1516).
  • the speech-based overview specifies a count of the plurality of data items.
  • the digital assistant receives a user input requesting presentation of the plurality of data items (1518). The digital assistant processes the user input to determine whether the user has explicitly requested reading of the plurality of data items (1520). Upon determination that the user has explicitly requested reading of the plurality of data items, the digital assistant automatically provides the speech-based, item-specific paraphrases following the provision of the speech-based overview without further user request (1522). Upon determination that the user has not explicitly requested reading of the plurality of data items, the digital assistant prompts a user confirmation before providing the respective speech-based, item-specific paraphrases to the user (1524).
  • the digital assistant determines presence of a hands-free context (1526).
  • the digital assistant divides the plurality of data items into one or more subsets according to a predetermined maximum item count per subset (1528). Then, the digital assistant provides the respective speech-based, item-specific paraphrases for the data items in one subset at a time (1530).
  • the digital assistant determines presence of a hands-free context (1532).
  • the digital assistant limits the plurality of data items for presentation to a user according to a predetermined maximum item count specified for the hands-free context (1534).
  • the digital assistant provides a respective speech-based subset identifier before providing the respective item-specific paraphrases for the data items in each subset (1536).
  • the sub-set identifiers can be "the first five messages", "the next five messages", etc.
  • the digital assistant receives a user input while providing the speech-based overview and item-specific paraphrases to the user (1538).
  • the digital assistant processes the speech input to determine whether the speech input relates to the plurality of data items (1540).
  • the digital assistant suspends output generation related to the plurality of data items (1542), and provides to the user an output that is responsive to the speech input and unrelated to the plurality of data items (1544).
  • after providing the respective speech-based, item-specific paraphrases for all of the plurality of data items, the digital assistant provides a speech-based closure to the user through the dialogue interface (1546).
  • the domain-specific item type is local search results and the plurality of data items are a plurality of search results of a particular local search.
  • the digital assistant determines whether the particular local search is performed with respect to a current user location (1548), upon determining that the particular local search is performed with respect to the current user location, the digital assistant generates the speech-based overview without explicitly naming the current user location in the speech-based overview (1550), and upon determining that the particular local search is performed with respect to a particular location other than the current user location, the digital assistant generates the speech-based overview explicitly naming the particular location in the speech-based overview (1552).
  • the digital assistant determines whether a count of the plurality of search results exceeds three (1554), upon determining that the count does not exceed three, the assistant generates the speech-based overview without explicitly specifying the count (1556), and upon determining that the count exceeds three, the digital assistant generates the speech-based overview explicitly specifying the count (1558).
  • the speech-based overview of the plurality of data items specifies a respective business name associated with each of the plurality of search results.
  • the respective speech-based, item-specific paraphrase of each data item specifies a respective ordinal position of a search result among the plurality of search results, followed in sequence by a respective business name, a respective short address, a respective distance, and a respective bearing associated with the search result, and wherein the respective short address includes only a respective street name associated with the search result.
  • to generate the respective item-specific paraphrase for each data item, the digital assistant: (1) upon determination that an actual distance associated with the data item is less than one distance unit, specifies the actual distance in the respective item-specific paraphrase of the data item (1560); and (2) upon determination that the actual distance associated with the data item is greater than one distance unit, rounds the actual distance to the nearest whole number of distance units and specifies the nearest whole number of units in the respective item-specific paraphrase of the data item (1562).
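The distance-wording rule in the preceding bullet, together with the paraphrase ordering described above, could look roughly like this; the formatting details and function names are assumptions for illustration.

```python
# Illustrative sketch: distances under one unit are spoken as-is, longer
# distances are rounded to the nearest whole number of units, and the paraphrase
# follows the ordinal / name / street / distance / bearing ordering described above.

def spoken_distance(miles):
    if miles < 1.0:
        return f"{miles:g} miles"
    whole = round(miles)
    return f"{whole} mile" + ("s" if whole != 1 else "")

def search_result_paraphrase(ordinal, name, street, miles, bearing):
    return f"{ordinal}: {name} on {street}, {spoken_distance(miles)} {bearing}."

if __name__ == "__main__":
    print(search_result_paraphrase("Second", "Chevron", "North De Anza Boulevard",
                                   0.6, "south"))
    print(spoken_distance(2.7))   # -> 3 miles
```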
  • the respective item-specific paraphrase of a highest-ranked data item among the plurality of data items, according to one of a rating, a distance, and a matching score associated with the data item, includes a phrase indicating the ranking of the data item, while the respective item-specific paraphrases of other data items among the plurality of data items omit the ranking of said data items.
  • the digital assistant automatically prompts user input regarding whether to perform an action applicable to the domain-specific item type, wherein the automatic prompting is only provided once for the first data item among the plurality of data items, and the automatic prompting is not repeated for the other data items among the plurality of data items (1564).
  • the digital assistant receives an additional user input requesting a map view of the business location or the new route (1572).
  • the assistant determines whether the user is already navigating on a planned route to a destination different from the respective business location (1568).
  • the assistant provides a speech output requesting a user confirmation to replace the planned route with a new route leading to the respective business location (1570).
  • the digital assistant receives an addition user input requesting a map view of the business location or the new route (1572).
  • the assistant detects presence of an eyes-free context (1574).
  • the digital assistant provides a speech-based warning indicating that the map view will not be provided in the eyes-free context (1576).
  • detecting the presence of the eyes-free context comprises detecting the user's presence in a moving vehicle.
  • the domain-specific item type is reminders and the plurality of data items are a plurality of reminders for a particular time range.
  • the digital assistant detects a trigger event for presenting a listing of reminders to the user (1578).
  • the digital assistant identifies the plurality of reminders to be presented to the user in accordance with one or more relevance criteria, the one or more relevance criteria based on one or more of a current date, a current time, a current location, an action performed by the user or a device associated with the user, an action to be performed by the user or a device associated with the user, and a reminder category specified by the user (1580).
  • the trigger event for presenting a listing of reminders comprises receipt of a user request to see reminders for the current day, and the plurality of reminders is identified based on the current date, and each of the plurality of reminders has a respective trigger time within the current date.
  • the trigger event for presenting a listing of reminders comprises receipt of a user request to see recent reminders, and the plurality of reminders is identified based on the current time, and each of the plurality of reminders has been triggered within a predetermined time period before the current time.
  • the trigger event for presenting a listing of reminders comprises receipt of a user request to see upcoming reminders, and the plurality of reminders is identified based on the current time, and each of the plurality of reminders has a respective trigger time within a predetermined time period after the current time.
  • the trigger event for presenting a listing of reminders comprises receipt of a user request to see a particular category of reminders, and each of the plurality of reminders belongs to the particular category. In some embodiments, the trigger event for presenting a listing of reminders comprises detecting the user leaving a predetermined location.
  • the trigger event for presenting a listing of reminders comprises detecting the user arriving at a predetermined location.
  • trigger events based on location, action, or time for presenting a list of reminders can also be used as selection criteria for determining which reminders should be included in the list of reminders to present to the user when the user requests to see reminders without specifying a selection criterion in his or her request. For example, as set forth in the use cases for hands-free list reading, the fact that the user is at a particular location, leaving or arriving at a particular location, or performing a particular action (e.g., driving) can be used to select the relevant reminders for presentation.
  • the digital assistant provides the speech-based, item-specific paraphrases of the plurality of reminders in an order sorted according to respective trigger times of the reminders (1582). In some embodiments, the reminders are not sorted.
  • the digital assistant applies increasingly stringent relevance criteria to select the plurality of reminders until a count of the plurality of reminders no longer exceeds a predetermined threshold number (1584).
  • the digital assistant divides the plurality of reminders into multiple categories (1586).
  • the digital assistant generates a respective speech-based category overview for each of the multiple categories (1588).
  • the digital assistant provides the respective speech-based category overview for each category immediately before the respective item-specific paraphrases for the reminders in the category (1590).
  • the multiple categories include one or more of a category based on location, a category based on task, a category based on trigger time relative to the current time, and a category based on trigger time relative to a user-specified time.
  • the domain-specific item type is calendar entries and the plurality of data items are a plurality of calendar entries for a particular time range.
  • the speech-based overview of the plurality of data items provides either or both timing and duration information associated with each of the plurality of calendar entries without providing additional details regarding the calendar entries.
  • the speech-based overview of the plurality of data items provides a count of all-day events among the plurality of calendar entries.
  • the speech-based overview of the plurality of data items includes a listing of respective event times associated with the plurality of calendar entries, and wherein the speech-based overview only explicitly pronounces a respective AM/PM indicator associated with a particular event time under one of the following conditions: (1) the particular event time is the last one in the listing, (2) the particular event time is the first one in the listing and occurs in the morning.
  • each of the speech-based, item-specific paraphrases of the plurality of data items is a paraphrase of a respective calendar event generated according to a "<time> <subject> <location, if available>" format.
  • the paraphrase of the respective calendar event names one or more participants of the respective calendar event if a total count of the participants is below a predetermined number; and the paraphrase of the respective calendar event does not name participants of the respective calendar event if the total count of the participants is above the predetermined number.
  • the paraphrase of the respective calendar event provides the total count of the participants if the total count is above the predetermined number.
  • the domain-specific item type is e-mails and the plurality of data items are a particular group of e-mails.
  • the digital assistant receives a user input requesting a listing of emails (1592).
  • the digital assistant identifies the particular group of e-mails to be presented to the user in accordance with one or more relevance criteria, the one or more relevance criteria based on one or more of: a sender identity, a message arrival time, a read/unread status, and an e-mail subject (1594).
  • the digital assistant processes the user input to determine at least one of the one or more relevance criteria (1596).
  • the speech-based overview of the plurality of data items paraphrases the one or more relevance criteria used to identify the particular group of e-mails, and provides a count of the particular group of e-mails.
  • the digital assistant prompts user input to accept or reject reading of the group of e-mails to the user (1598).
  • the respective speech-based, item-specific paraphrase for each data item is a respective speech-based, item-specific paraphrase for a respective e-mail in the particular group of emails, and the respective paraphrase for the respective e-mail specifies an ordinal position of the respective e-mail in the group of e-mails, a sender of the respective e-mail, and a subject of the email.
  • the digital assistant determines a respective size of an unbounded portion of the e-mail (1600). In accordance with predetermined criteria, the digital assistant performs one of: (1) providing a speech-based output reading an entirety of the unbounded portion to the user (1602); and (2) chunking the unbounded portion of the data item into multiple discrete sections (1604), providing a speech-based output reading a particular discrete section of the multiple discrete sections to the user, and after reading the particular discrete section, prompting user input regarding whether to read the remaining discrete sections of the multiple discrete sections.
  • the present invention can be implemented as a system or a method for performing the above-described techniques, either singly or in any combination.
  • the present invention can be implemented as a computer program product comprising a nontransitory computer-readable storage medium and computer program code, encoded on the medium, for causing a processor in a computing device or other electronic device to perform the above-described techniques.
  • Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the present invention can be embodied in software, firmware and/or hardware, and when embodied in software, can be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
  • The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computing device.
  • Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
  • the computing devices referred to herein may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
  • Accordingly, in various embodiments, the present invention can be implemented as software, hardware, and/or other elements for controlling a computer system, computing device, or other electronic device, or any combination or plurality thereof.
  • Such an electronic device can include, for example, a processor, an input device (such as a keyboard, mouse, touchpad, trackpad, joystick, trackball, microphone, and/or the like), an output device (such as a screen, speaker, and/or the like), memory (such as a hard disk drive, floppy disk drive, and/or the like), long-term storage (such as magnetic storage, optical storage, and/or the like), and/or network connectivity, according to techniques that are well known in the art.
  • Such an electronic device may be portable or nonportable.
  • Examples of electronic devices that may be used for implementing the invention include: a mobile phone, personal digital assistant, smartphone, kiosk, desktop computer, laptop computer, tablet computer, consumer electronic device, consumer entertainment device; music player; camera; television; set-top box; electronic gaming unit; or the like.
  • An electronic device for implementing the present invention may use any operating system such as, for example, iOS or MacOS, available from Apple Inc. of Cupertino, California, or any other operating system that is adapted for use on the device.

Abstract

The present invention relates to a method for automatically determining, without user input and regardless of whether a digital assistant application has been invoked by a user, that the electronic device is in a vehicle. In some embodiments, determining that the electronic device is in the vehicle comprises detecting that the electronic device is in communication with the vehicle (e.g., via wired or wireless communication techniques and/or protocols). The method also comprises, in response to the determination, invoking a listening mode of a virtual assistant implemented by the electronic device. In some embodiments, the method also comprises limiting a user's ability to view content output by the electronic device, to provide input to the electronic device, and the like.
EP14736158.8A 2013-06-08 2014-06-05 Interfaces utilisateur s'adaptant automatiquement pour permettre une interaction sans les mains Ceased EP3005075A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/913,421 US10705794B2 (en) 2010-01-18 2013-06-08 Automatically adapting user interfaces for hands-free interaction
PCT/US2014/041173 WO2014197737A1 (fr) 2013-06-08 2014-06-05 Interfaces utilisateur s'adaptant automatiquement pour permettre une interaction sans les mains

Publications (1)

Publication Number Publication Date
EP3005075A1 true EP3005075A1 (fr) 2016-04-13

Family

ID=51134345

Family Applications (1)

Application Number Title Priority Date Filing Date
EP14736158.8A Ceased EP3005075A1 (fr) 2013-06-08 2014-06-05 Interfaces utilisateur s'adaptant automatiquement pour permettre une interaction sans les mains

Country Status (5)

Country Link
EP (1) EP3005075A1 (fr)
KR (1) KR101834624B1 (fr)
CN (1) CN105284099B (fr)
HK (1) HK1223694A1 (fr)
WO (1) WO2014197737A1 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017158208A1 (fr) * 2016-03-18 2017-09-21 Universidad De La Laguna Système et procédé pour l'automatisation et l'utilisation sécurisées d'applications mobiles dans des véhicules
US10599391B2 (en) * 2017-11-06 2020-03-24 Google Llc Parsing electronic conversations for presentation in an alternative interface
CN107919120B (zh) * 2017-11-16 2020-03-13 百度在线网络技术(北京)有限公司 语音交互方法及装置,终端,服务器及可读存储介质
US10930278B2 (en) * 2018-04-09 2021-02-23 Google Llc Trigger sound detection in ambient audio to provide related functionality on a user interface
GB2573097A (en) * 2018-04-16 2019-10-30 British Gas Trading Ltd Natural language interface for a data management system
JP7203865B2 (ja) * 2018-05-07 2023-01-13 グーグル エルエルシー ユーザと、自動化されたアシスタントと、他のコンピューティングサービスとの間のマルチモーダル対話
GB2575970A (en) 2018-07-23 2020-02-05 Sonova Ag Selecting audio input from a hearing device and a mobile device for telephony
CN109098480A (zh) * 2018-10-10 2018-12-28 中国计量大学 凉亭装置
CN111695044B (zh) * 2019-03-11 2023-08-18 北京柏林互动科技有限公司 用户排名的数据处理方法、装置及电子设备
US11321048B2 (en) 2020-02-25 2022-05-03 Motorola Solutions, Inc. Method and apparatus for temporary hands-free voice interaction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120022872A1 (en) * 2010-01-18 2012-01-26 Apple Inc. Automatically Adapting User Interfaces For Hands-Free Interaction

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4812941B2 (ja) * 1999-01-06 2011-11-09 Koninklijke Philips Electronics N.V. Voice input device having an attention period
KR100477796B1 (ko) * 2002-11-21 2005-03-22 Pantech & Curitel Communications, Inc. Apparatus for hands-free switching based on speed sensing and method thereof
US9318108B2 (en) * 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
KR100819928B1 (ko) * 2007-04-26 2008-04-08 (주)부성큐 Speech recognition apparatus of a mobile terminal and method thereof
CN101325756B (zh) * 2007-06-11 2013-02-13 Inventec Appliances (Shanghai) Electronics Co., Ltd. Mobile phone speech recognition device and method for activating mobile phone speech recognition
CN101448340B (zh) * 2007-11-26 2011-12-07 Lenovo (Beijing) Co., Ltd. Method and system for detecting the state of a mobile terminal, and the mobile terminal
US9858925B2 (en) * 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10706373B2 (en) * 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US20110111724A1 (en) * 2009-11-10 2011-05-12 David Baptiste Method and apparatus for combating distracted driving
US10145960B2 (en) * 2011-02-24 2018-12-04 Ford Global Technologies, Llc System and method for cell phone restriction
US9202465B2 (en) * 2011-03-25 2015-12-01 General Motors Llc Speech recognition dependent on text message content
CN102137193A (zh) * 2011-04-13 2011-07-27 Shenzhen Kaihong Mobile Communication Co., Ltd. Mobile communication terminal and communication control method thereof
US20130035117A1 (en) * 2011-08-04 2013-02-07 GM Global Technology Operations LLC System and method for restricting driver mobile device feature usage while vehicle is in motion

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120022872A1 (en) * 2010-01-18 2012-01-26 Apple Inc. Automatically Adapting User Interfaces For Hands-Free Interaction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO2014197737A1 *

Also Published As

Publication number Publication date
HK1223694A1 (zh) 2017-08-04
CN105284099B (zh) 2019-05-17
KR101834624B1 (ko) 2018-03-05
CN105284099A (zh) 2016-01-27
WO2014197737A1 (fr) 2014-12-11
KR20160003138A (ko) 2016-01-08

Similar Documents

Publication Publication Date Title
US10679605B2 (en) Hands-free list-reading by intelligent automated assistant
US10705794B2 (en) Automatically adapting user interfaces for hands-free interaction
US20190095050A1 (en) Application Gateway for Providing Different User Interfaces for Limited Distraction and Non-Limited Distraction Contexts
EP3005668B1 (fr) Passerelle d'application servant à fournir différentes interfaces utilisateurs pour des contextes de distraction limitée et de distraction non limitée
EP2761860B1 (fr) Interfaces utilisateur s'adaptant automatiquement pour permettre une interaction sans les mains
US10496753B2 (en) Automatically adapting user interfaces for hands-free interaction
US10553209B2 (en) Systems and methods for hands-free notification summaries
CN105144133B (zh) 对中断进行上下文相关处理
AU2017203847B2 (en) Using context information to facilitate processing of commands in a virtual assistant
KR101834624B1 (ko) 핸즈 프리 상호작용을 위한 사용자 인터페이스 자동 적응
RU2542937C2 (ru) Использование контекстной информации для облегчения обработки команд в виртуальном помощнике
US10475446B2 (en) Using context information to facilitate processing of commands in a virtual assistant

Legal Events

Date Code Title Description
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012
17P | Request for examination filed | Effective date: 20151116
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
AX | Request for extension of the European patent | Extension state: BA ME
DAX | Request for extension of the European patent (deleted)
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1223694; Country of ref document: HK
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS
17Q | First examination report despatched | Effective date: 20180209
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: APPLE INC.
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED
18R | Application refused | Effective date: 20190403
REG | Reference to a national code | Ref country code: HK; Ref legal event code: WD; Ref document number: 1223694; Country of ref document: HK