CN107491468B

CN107491468B - Application integration with digital assistant

Info

Publication number: CN107491468B
Application number: CN201710386355.2A
Authority: CN
Inventors: R·A·瓦尔克二世; B·J·妞厄多普; R·达萨里; R·D·朱利; T·R·格鲁伯; C·E·拉德鲍格; A·加格; V·科斯拉; J·H·拉塞尔; C·彼得森
Original assignee: Apple Inc
Current assignee: Apple Inc
Priority date: 2016-06-11
Filing date: 2017-05-26
Publication date: 2021-06-01
Anticipated expiration: 2037-05-26
Also published as: CN111913778A; CN113238707A; CN107493374B; CN107491468A; CN107491295B; CN107493374A; CN107491295A

Abstract

The invention provides application integration with digital assistant. The present invention provides systems and processes for application integration with a digital assistant. According to one embodiment, a method includes, at an electronic device having one or more processors and memory, receiving natural language user input, identifying, with the one or more processors, an intent object in a set of intent objects and a parameter associated with the intent, wherein the intent object and the parameter are obtained from the natural language user input. The method also includes identifying a software application associated with an intent object of the set of intent objects; and providing the intent object and parameters to the software application.

Description

Application integration with digital assistant

Technical Field

The present disclosure relates generally to interacting with applications, and more particularly to techniques for application integration with digital assistants.

Background

A digital assistant may facilitate a user performing various functions on a user device. For example, the digital assistant may set an alarm clock, provide weather updates, and perform searches both locally and on the internet, while providing a natural language interface for the user. However, existing digital assistants cannot be effectively integrated with applications, such as those stored locally on user devices, particularly third party applications. Thus, existing digital assistants fail to provide a natural language interface for such applications.

Disclosure of Invention

Exemplary methods are disclosed herein. An example method includes, at an electronic device having one or more processors, receiving a natural language user input, and identifying, with the one or more processors, an intent of a set of intents and a parameter associated with the intent, wherein the intent and parameter are obtained from the natural language user input. The method also includes identifying a software application associated with an intent of the set of intents and providing the intent and parameters to the software application.

An exemplary method includes, at one or more electronic devices each having one or more processors, receiving natural language user input; determining an intent of a set of intents and a parameter associated with the intent based on the natural language user input; identifying a software application based on at least one of the intent or parameter; and providing the intent and parameters to the software application.

An exemplary method includes, at one or more electronic devices each having one or more processors, receiving natural language user input; identifying, based on the natural language user input, an intent of a set of intents and a parameter associated with the intent; and determining whether a task corresponding to the intent can be satisfied based on at least one of the intent or the parameter. The method also includes, in accordance with a determination that a task corresponding to the intent can be satisfied, providing the intent and parameters to a software application associated with the intent, and in accordance with a determination that a task corresponding to the intent cannot be satisfied, providing a list of one or more software applications associated with the intent.

An example method includes, at a first electronic device having one or more processors, receiving a natural language user input, wherein the natural language user input indicates an intent of a set of intentions; providing the natural language user input to a second electronic device; and receiving, from the second electronic device, an indication that the software application associated with the intent is not located on the first electronic device. The method also includes, in response to the notification, obtaining a list of applications associated with the intent; displaying, with a touch-sensitive display of the first electronic device, a list of applications associated with the intent in a user interface; receiving a user input indicating a selection of an application in the list of applications; and providing the intent of the set of intents to the application.

Example non-transitory computer readable media are disclosed herein. An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of an electronic device, cause the electronic device to receive natural language user input; identifying, with the one or more processors, an intent of a set of intentions and a parameter associated with the intent, wherein the intent and parameter are obtained from the natural language user input; identifying a software application associated with the intent of the set of intents; and providing the intent and the parameters to the software application.

An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of one or more electronic devices, cause the one or more electronic devices to receive natural language user input; determining an intent of a set of intents and a parameter associated with the intent based on the natural language user input; identifying a software application based on at least one of the intent or parameter; and providing the intent and parameters to the software application.

An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of one or more electronic devices, cause the one or more electronic devices to receive natural language user input; identifying, based on the natural language user input, an intent of a set of intents and a parameter associated with the intent; determining whether a task corresponding to the intent can be satisfied based on at least one of the intent or a parameter; in accordance with a determination that a task corresponding to the intent can be satisfied, providing the intent and parameters to a software application associated with the intent; and in accordance with a determination that the task corresponding to the intent cannot be satisfied, providing a list of one or more software applications associated with the intent.

An exemplary non-transitory computer readable storage medium stores one or more programs. The one or more programs include instructions that, when executed by one or more processors of an electronic device, cause the electronic device to receive a natural language user input, wherein the natural language user input indicates an intent of a set of intentions; providing the natural language user input to a second electronic device; receiving, from the second electronic device, an indication that a software application associated with the intent is not located on the first electronic device; obtaining, in response to the notification, a list of applications associated with the intent; displaying, with a touch-sensitive display of the first electronic device, a list of applications associated with the intent in a user interface; receiving a user input indicating a selection of an application in the list of applications; and providing the intent of the set of intents to the application.

Example electronic devices and systems are disclosed herein. An exemplary electronic device includes: one or more processors; a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input; identifying, with the one or more processors, an intent of a set of intents and a parameter associated with the intent, wherein the intent and parameter are obtained from the natural language user input; identifying a software application associated with the intent of the set of intents; and providing the intent and parameters to the software application.

An exemplary system includes one or more processors of one or more electronic devices; one or more memories of one or more electronic devices; and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input; determining an intent of a set of intentions and a parameter associated with the intent based on the natural language user input; identifying a software application based on at least one of the intent or parameter; and providing the intent and parameters to the software application.

An exemplary system includes one or more processors of one or more electronic devices; one or more memories of one or more electronic devices; and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input; identifying, based on the natural language user input, an intent of a set of intents and a parameter associated with the intent; determining, based on at least one of the intent or parameters, whether a task corresponding to the intent can be satisfied; in accordance with a determination that a task corresponding to the intent can be satisfied, providing the intent and parameters to a software application associated with the intent; and in accordance with a determination that the task corresponding to the intent cannot be satisfied, providing a list of one or more software applications associated with the intent.

An exemplary electronic device includes one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for receiving natural language user input, wherein the natural language user input indicates an intent of a set of intentions; providing the native language user input to a second electronic device; receiving, from the second electronic device, an indication that a software application associated with the intent is not located on the first electronic device; obtaining, in response to the notification, a list of applications associated with the intent; displaying, with a touch-sensitive display of the first electronic device, a list of applications associated with the intent in a user interface; receiving a user input indicating a selection of an application in the list of applications; and providing the intent of the set of intents to the application.

An exemplary electronic device comprises means for receiving a natural language user input; means for identifying an intent of a set of intents and a parameter associated with the intent, wherein the intent and parameter are obtained from the natural language user input; means for identifying a software application associated with an intent of the set of intentions; and means for providing the intent and parameters to the software application.

An exemplary system comprises means for receiving a natural language user input; means for determining an intent of a set of intents and a parameter associated with the intent based on the native language user input; means for identifying a software application based on at least one of the intent or parameter; and means for providing the intent and parameters to the software application.

An exemplary system comprises means for receiving a natural language user input; means for identifying an intent of a set of intents and a parameter associated with the intent based on the native language user input; means for determining whether a task corresponding to the intent can be satisfied based on at least one of the intent or a parameter; means for providing the intent and parameters to a software application associated with the intent in accordance with a determination that a task corresponding to the intent can be satisfied; and means for providing a list of one or more software applications associated with the intent in accordance with the determination that the task corresponding to the intent cannot be satisfied.

An exemplary electronic device comprises means for receiving a natural language user input, wherein the natural language user input indicates an intent of a set of intentions; means for providing the natural language user input to a second electronic device; means for receiving, from the second electronic device, an indication that a software application associated with the intent is not located on the first electronic device; means for obtaining a list of applications associated with the intent in response to the notification; means for displaying, with a touch-sensitive display of the first electronic device, a list of applications associated with the intent in a user interface; means for receiving a user input indicating a selection of an application in the list of applications; and means for providing the intent of the set of intents to the application.

Drawings

For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings in which like reference numerals indicate corresponding parts throughout the figures.

FIG. 1 is a block diagram illustrating a system and environment for implementing a digital assistant in accordance with various embodiments.

Fig. 2A is a block diagram illustrating a portable multifunction device implementing a client-side portion of a digital assistant, according to some embodiments.

Fig. 2B is a block diagram illustrating exemplary components for event processing, in accordance with various embodiments.

Figure 3 illustrates a portable multi-function device implementing a client-side portion of a digital assistant, in accordance with various embodiments.

FIG. 4 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface in accordance with various embodiments.

FIG. 5A illustrates an exemplary user interface of an application menu on a portable multifunction device in accordance with various embodiments.

FIG. 5B illustrates an exemplary user interface of a multifunction device with a touch-sensitive surface separate from a display in accordance with various embodiments.

FIG. 6A illustrates a personal electronic device, in accordance with various embodiments.

Fig. 6B is a block diagram illustrating a personal electronic device, in accordance with various embodiments.

Fig. 7A is a block diagram illustrating a digital assistant system or server portion thereof in accordance with various embodiments.

Fig. 7B illustrates functionality of the digital assistant illustrated in fig. 7A in accordance with various embodiments.

FIG. 7C illustrates a portion of an ontology in accordance with various embodiments.

Fig. 8 illustrates a flow diagram of a process for operating a digital assistant, according to some embodiments.

Fig. 9 illustrates a flow diagram of a process for operating a digital assistant, according to some embodiments.

Fig. 10A-10C illustrate exemplary user interfaces of an electronic device according to some embodiments.

Fig. 10D-10E illustrate exemplary data flows of a digital assistant system according to some embodiments.

Fig. 11-14 illustrate functional block diagrams of electronic devices according to some embodiments.

Detailed Description

In the following description of the present disclosure and embodiments, reference is made to the accompanying drawings, in which are shown by way of illustration specific embodiments that may be practiced. It is to be understood that other embodiments and examples may be practiced and that changes may be made without departing from the scope of the present disclosure.

Although the following description uses the terms "first," "second," etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first input may be named a second input and similarly a second input may be named a first input without departing from the scope of the various described embodiments. Both the first and second inputs may be outputs, and in some cases, may be separate different inputs.

The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Depending on the context, the term "if" may be interpreted to mean "when" ("where" or "upon") or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined." or "if [ a stated condition or event ] is detected" may be interpreted to mean "upon determining.. or" in response to determining. "or" upon detecting [ a stated condition or event ] or "in response to detecting [ a stated condition or event ]" depending on the context.

1. System and environment

Fig. 1 illustrates a block diagram of a system 100, in accordance with various embodiments. In some embodiments, system 100 may implement a digital assistant. The terms "digital assistant," "virtual assistant," "intelligent automated assistant," or "automatic digital assistant" may refer to any information processing system that interprets natural language input in spoken and/or textual form to infer user intent, and performs actions (e.g., tasks) based on the inferred user intent. For example, to make the inferred user intent, the system may perform one or more of the following: identifying a task flow by steps and parameters designed to achieve the inferred user intent, entering into the task flow specific requirements from the inferred user intent; executing a task flow by calling a program, method, service, API, etc.; and generating an output response to the user in audible (e.g., speech) and/or visual form.

In particular, the digital assistant is capable of accepting user requests at least partially in the form of natural language commands, requests, announcements, narratives, and/or inquiries. Typically, a user request may seek either a digital assistant to answer informativeness or a digital assistant to perform a task. The satisfactory response to the user request may be to provide the requested informational answer, to perform the requested task, or a combination of both. For example, a user may ask a digital assistant such as "where do i am present? "and the like. Based on the user's current location, the digital assistant may answer "you are near the central park siemens. A "user may also request to perform a task, such as" please invite my friend to join my girlfriend's birthday party on the next week. In response, the digital assistant may confirm the request by speaking "good, now" and then the calendar user sends an appropriate calendar invitation to each of the user's friends listed in the user's electronic address book. During the performance of requested tasks, the digital assistant can sometimes interact with the user over a long period of time in a continuous conversation involving multiple exchanges of information. There are many other ways to interact with a digital assistant to request information or perform various tasks. In addition to providing verbal responses and taking programmed actions, the digital assistant may also provide other visual or audio forms, e.g., as responses to text, alerts, music, video, animation, etc.

As shown in fig. 1, in some embodiments, the digital assistant may be implemented according to a client-server model. The digital assistant may include a client-side portion 102 (hereinafter "DA client 102") executing on a user device 104, and a server-side portion 106 (hereinafter "DA server 106") executing on a server system 108. The DA client 102 may communicate with the DA server 106 over one or more networks 110. The DA client 102 may provide client-side functionality such as user-facing input and output processing and communicate with the DA server 106. The DA server 106 may provide server-side functionality for any number of DA clients 102, each of the number of DA clients 102 located on a respective user device 104.

In some embodiments, DA server 106 may include a client-facing I/O interface 112, one or more processing modules 114, data and models 116, and an I/O interface 118 to external services. The client-facing I/O interface 112 may facilitate client-facing input and output processing of the DA server 106. The one or more processing modules 114 may utilize the data and module 116 to process the speech input and determine the user's intent based on the natural language input. Moreover, the one or more processing modules 114 perform task execution based on the inferred user intent. In some embodiments, DA server 106 may communicate with external services 120 over one or more networks 110 to complete tasks or gather information. An I/O interface 118 to external services may facilitate such communication.

The user device 104 may be any suitable electronic device. For example, the user device may be a portable multifunction device (e.g., device 200 described below with reference to fig. 2A), a multifunction device (e.g., device 400 described below with reference to fig. 4), or a personal electronic device (e.g., device 600 described below with reference to fig. 6A-6B). The portable multifunction device may be, for example, a mobile telephone that also contains other functions, such as PDA and/or music player functions. Particular embodiments of portable multifunction devices can include those from Apple Inc

iPod

And

an apparatus. Other embodiments of the portable multifunction device may include, but are not limited to, a laptop or tablet. Also, in some embodiments, the user device 104 may be a non-portable multifunction device. In particular, the user device 104 may be a desktop computer, a game console, a television, or a television set-top box. In some embodiments, the user device 104 may include a touch-sensitive surface (e.g., a touchscreen display and/or a trackpad). However, the user device 104 optionally may include one or more other physical user interface devices, such as a physical keyboard, mouse, and/or joystick. Various embodiments of electronic devices, such as multifunction devices, are described in more detail below.

Embodiments of one or more communication networks 110 may include a Local Area Network (LAN) and a Wide Area Network (WAN), such as the internet. The one or more communication networks 110 may be implemented using any known network protocol, including various wired or wireless protocols, such as, for example, ethernet, Universal Serial Bus (USB), FIREWIRE (FIREWIRE), global system for mobile communications (GSM), Enhanced Data GSM Environment (EDGE), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), bluetooth, Wi-Fi, internet telephone protocol (VoIP), Wi-MAX, or any other suitable communication protocol.

The server system 108 may be implemented on one or more stand-alone data processing devices or a distributed network of computers. In some embodiments, the server system 108 may also employ various virtual devices and/or services of third party service providers (e.g., third party cloud service providers) to provide potential computing resources and/or infrastructure resources of the server system 108.

In some embodiments, user device 104 may communicate with DA server 106 via second user device 122. The second user device 122 may be similar or identical to the user device 104. For example, the second user equipment 122 may be similar to the apparatus 200,400, or 600 described below with reference to fig. 2A, 4, and 6A-6B. The user device 104 may be configured to communicatively couple to the second user device 122 via a direct communication connection, such as bluetooth, NFC, BTLE, etc., or via a wired or wireless network, such as a local Wi-Fi network. In some embodiments, second user device 122 may be configured to act as a proxy between user device 104 and DA server 106. For example, DA client 102 of user device 104 may be configured to transmit information (e.g., a user request received at user device 104) to DA server 106 via second user device 122. DA server 106 may process the information and return relevant data (e.g., data content in response to the user request) to user device 104 via second user device 122.

In some embodiments, the user device 104 may be configured to transmit an abbreviated request for data to the second user device 122 to reduce the amount of information transmitted from the user device 104. Second user device 122 may be configured to determine that supplemental information added to the abbreviation request generates a complete request to transmit to DA server 106. The system architecture may advantageously allow a user device 104 (e.g., a watch or similar compact electronic device) with limited communication capabilities and/or limited battery power to access services provided by DA server 106 by using a second user device 122 (such as a mobile phone, laptop, tablet, etc.) with stronger communication capabilities and/or battery power as a proxy for DA server 106. Although only two

user devices

104 and 122 are shown in fig. 1, it should be understood that system 100 may include any number and type of user devices configured in the proxy configuration to communicate with DA server system 106.

Although the digital assistant shown in fig. 1 may include both a client-side portion (e.g., DA client 102) and a server-side portion (e.g., DA server 106), in some embodiments, the functionality of the digital assistant may be implemented as a standalone application installed on the user device. Moreover, the division of functionality between the client portion and the server portion of the digital assistant may vary in different implementations. For example, in some embodiments, the DA client may be a thin client that provides only user-oriented input and output processing functions, and delegates all other functions of the digital assistant to a backend server.

2. Electronic device

Attention is now directed to implementations of electronic devices for implementing the client-side portion of a digital assistant. Fig. 2A is a block diagram illustrating a portable multifunction device 200 with a touch-sensitive display system 212 in accordance with some embodiments. The touch sensitive display 212 is sometimes referred to as a "touch screen" for convenience, and may sometimes be referred to or called a "touch sensitive display system". Device 200 includes memory 202 (which optionally includes one or more computer-readable storage media), memory controller 222, one or more processing units (CPUs) 220, peripheral interface 218, RF circuitry 208, audio circuitry 210, speaker 211, microphone 213, input/output (I/O) subsystem 206, other input control devices 216, and external ports 224. The device 200 optionally includes one or more optical sensors 264. Device 200 optionally includes one or more contact intensity sensors 265 for detecting the intensity of contacts on device 200 (e.g., a touch-sensitive surface, such as touch-sensitive display system 212 of device 200). Device 200 optionally includes one or more tactile output generators 267 for generating tactile outputs on device 200 (e.g., generating tactile outputs on a touch-sensitive surface such as touch-sensitive display system 212 of device 200 or touch panel 455 of device 400). These components optionally communicate over one or more communication buses or signal lines 203.

As used in this specification and claims, the term "intensity" of a contact on a touch-sensitive surface refers to the force or pressure (force per unit area) of a contact (e.g., a finger contact) on the touch-sensitive surface, or to a substitute (surrogate) for the force or pressure of a contact on the touch-sensitive surface. The intensity of the contact has a range of values that includes at least four different values and more typically includes hundreds of different values (e.g., at least 256). The intensity of the contact is optionally determined (or measured) using various methods and various sensors or combinations of sensors. For example, one or more force sensors below or adjacent to the touch-sensitive surface are optionally used to measure forces at different points on the touch-sensitive surface. In some implementations, force measurements from multiple force sensors are combined (e.g., a weighted average) to determine an estimated contact force. Similarly, the pressure sensitive tip of the stylus is optionally used to determine the pressure of the stylus on the touch-sensitive surface. Alternatively, the size of the contact area detected on the touch-sensitive surface and/or changes thereof, the capacitance of the touch-sensitive surface adjacent to the contact and/or changes thereof and/or the resistance of the touch-sensitive surface adjacent to the contact and/or changes thereof are optionally used as a substitute for the force or pressure of the contact on the touch-sensitive surface. In some implementations, the surrogate measurement of contact force or pressure is used directly to determine whether an intensity threshold has been exceeded (e.g., the intensity threshold is described in units corresponding to the surrogate measurement). In some implementations, the surrogate measurement of contact force or pressure is converted into an estimated force or pressure, and the estimated force or pressure is used to determine whether an intensity threshold has been exceeded (e.g., the intensity threshold is a pressure threshold measured in units of pressure). The intensity of the contact is used as a property of the user input, allowing the user to access additional device functionality that the user may not have access to on a smaller sized device with limited real estate for displaying affordances (e.g., on a touch-sensitive display) and/or receiving user input (e.g., via a touch-sensitive display, a touch-sensitive surface, or physical/mechanical controls, such as knobs or buttons).

As used in this specification and claims, the term "haptic output" refers to a physical displacement of a device relative to a previous position of the device, a physical displacement of a component of the device (e.g., a touch-sensitive surface) relative to another component of the device (e.g., a housing), or a displacement of a component relative to a center of mass of the device that is to be detected by a user using the user's sense of touch. For example, where the device or component of the device is in contact with a surface of the user that is sensitive to touch (e.g., a finger, palm, or other portion of the user's hand), the haptic output generated by the physical displacement will be interpreted by the user as a haptic sensation corresponding to a perceived change in a physical characteristic of the device or component of the device. For example, movement of a touch-sensitive surface (e.g., a touch-sensitive display or trackpad) is optionally interpreted by a user as a "down click" or "up click" of a physical actuation button. In some cases, the user will feel a tactile sensation, such as a "press click" or "release click," even when the physical actuation button associated with the touch-sensitive surface that is physically pressed (e.g., displaced) by the user's movement is not moving. As another example, movement of the touch sensitive surface may optionally be interpreted or sensed by the user as "roughness" of the touch sensitive surface even when there is no change in the smoothness of the touch sensitive surface. While such interpretation of touch by a user will be limited by the user's individualized sensory perception, many sensory perceptions of the presence of touch are common to most users. Thus, when a haptic output is described as corresponding to a particular sensory perception of a user (e.g., "click down," "click up," "roughness"), unless otherwise stated, the generated haptic output corresponds to a physical displacement of the device or a component thereof that would generate a sensory perception typical (or common) to the user.

It should be understood that device 200 is only one embodiment of a portable multifunction device, and that device 200 optionally has more or fewer components than shown, optionally combines two or more components, or optionally has a different configuration or arrangement of these components. The various components shown in fig. 2A are implemented in hardware, software, or a combination of both hardware and software, including one or more signal processing circuits and/or application specific integrated circuits.

Memory 202 may include one or more computer-readable storage media. The computer-readable storage medium may be tangible and non-transitory. The memory 202 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Memory controller 222 may control other components of device 200 to access memory 202.

In some embodiments, a non-transitory computer-readable storage medium of memory 202 may be used to store instructions (e.g., for performing aspects of process 1100 described below) for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In other embodiments, the instructions (e.g., for performing aspects of the process 1100 described below) may be stored on a non-transitory computer-readable storage medium (not shown) of the server system 108 or may be divided between the non-transitory computer-readable storage medium of the memory 202 and the non-transitory computer-readable storage medium of the server system 108. In the context of this document, a "non-transitory computer-readable storage medium" can be any medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device.

Peripheral interface 218 may be used to couple the input and output peripherals of the device to CPU 220 and memory 202. The one or more processors 220 execute or execute various software programs and/or sets of instructions stored in memory 202 to perform various functions of device 200 and to process data. In some embodiments, peripherals interface 218, CPU 220, and memory controller 222 may be implemented on a single chip, such as chip 204. In some other embodiments, they may be implemented on separate chips.

RF (radio frequency) circuitry 208 receives and transmits RF signals, also known as electromagnetic signals. The RF circuitry 208 converts electrical signals to/from electromagnetic signals and communicates with communication networks and other communication devices via electromagnetic signals. RF circuitry 208 optionally includes well-known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a codec chipset, a Subscriber Identity Module (SIM) card, memory, and so forth. RF circuitry 208 optionally communicates with networks, such as the internet (also known as a World Wide Web (WWW)), intranets, and/or wireless networks, such as cellular telephone networks, wireless Local Area Networks (LANs), and/or Metropolitan Area Networks (MANs), as well as other devices via wireless communications. The RF circuitry 208 optionally includes well-known circuitry for detecting Near Field Communication (NFC) fields, such as by short-range communication radios. The wireless communication optionally uses any of a number of communication standards, protocols, and techniques, including but not limited to global system for mobile communications (GSM), Enhanced Data GSM Environment (EDGE), High Speed Downlink Packet Access (HSDPA), High Speed Uplink Packet Access (HSUPA), evolution, data only (EV-DO), HSPA +, dual cell HSPA (DC-HSPDA), Long Term Evolution (LTE), Near Field Communication (NFC), wideband code division multiple access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), bluetooth low energy, wireless fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11n, and/or IEEE 802.11ac), voice over internet protocol (VoIP), Wi-MAX, email protocols (e.g., internet Message Access Protocol (IMAP) and/or Post Office Protocol (POP)), instant messaging (e.g., extensible messaging and presence protocol (XMPP), session initiation protocol with extensions for instant messaging and presence (SIMPLE), Instant Messaging and Presence Service (IMPS)), and/or Short Message Service (SMS), or any other appropriate communication protocol including communication protocols not yet developed at the filing date of this document.

Audio circuitry 210, speaker 211, and microphone 213 provide an audio interface between a user and device 200. The audio circuitry 210 receives audio data from the peripheral interface 218, converts the audio data to electrical signals, and transmits the electrical signals to the speaker 211. The speaker 211 converts the electrical signals into human audible sound waves. The audio circuit 210 also receives electrical signals converted from sound waves by the microphone 213. The audio circuit 210 converts the electrical signals to audio data and transmits the audio data to the peripheral interface 218 for processing. The audio data may be retrieved from memory 202 and/or RF circuitry 208 and/or transmitted to memory 102 and/or RF circuitry 108 by peripheral interface 218. In some implementations, the audio circuit 210 also includes a headset jack (e.g., 312 in fig. 3). The headset jack provides an interface between the audio circuitry 210 and a removable audio input/output peripheral, such as an output-only headset or a headset having both an output (e.g., a single-ear headset or a dual-ear headset) and an input (e.g., a microphone).

The I/O subsystem 206 couples input/output peripheral devices on the device 200, such as the touch screen 212 and other input control devices 216, to a peripheral interface 218. The I/O subsystem 206 optionally includes a display controller 256, an optical sensor controller 258, an intensity sensor controller 259, a haptic feedback controller 261, and one or more input controllers 260 for other input or control devices. The one or more input controllers 260 receive/transmit electrical signals from/to other input control devices 216 to/from other input control devices 116. Other input control devices 216 optionally include physical buttons (e.g., push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click wheels, and the like. In some alternative embodiments, the one or more input controllers 260 are optionally coupled to (or not coupled to) any of: a keyboard, an infrared port, a USB port, and a pointing device such as a mouse. The one or more buttons (e.g., 308 in fig. 3) optionally include an increase/decrease button for volume control of the speaker 211 and/or microphone 213. The one or more buttons optionally include a push button (e.g., 306 in fig. 3).

A quick push of the push button unlocks the touch screen 212 or initiates the process of Unlocking the Device using a gesture on the touch screen, as described in U.S. patent application No. 11/322,549 entitled "Unlocking a Device by Performance testing on devices an Unlock Image," filed on 23.12.2005, and U.S. patent application No.7,657,849. The above-mentioned U.S. patent application is hereby incorporated by reference in its entirety. Pressing the push button (e.g., 306) longer may turn the device 200 on or off. The user can customize the functionality of one or more buttons. The touch screen 212 is used to implement virtual or soft buttons and one or more soft keyboards.

The touch sensitive display 212 provides an input interface and an output interface between the device and the user. Display controller 256 receives electrical signals from touch screen 212 and/or transmits electrical signals to touch screen 112. Touch screen 212 displays visual output to a user. The visual output may include graphics, text, icons, video, and any combination thereof (collectively "graphics"). In some implementations, some or all of the visual output may correspond to a user interface object.

Touch screen 212 has a touch-sensitive surface, sensor, or group of sensors that accept input from a user based on tactile sensation and/or tactile contact. The touch screen 212 and the display controller 256 (along with any associated modules and/or sets of instructions in the memory 202) detect contact (and any movement or breaking of the contact) on the touch screen 212 and convert the detected contact into interaction with user interface objects (e.g., one or more soft keys, icons, web pages, or images) that are displayed on the touch screen 212. In an exemplary embodiment, the point of contact between the touch screen 212 and the user corresponds to a finger of the user.

The touch screen 212 may use LCD (liquid crystal display) technology, LPD (light emitting polymer display) technology, or LED (light emitting diode) technology, although other display technologies may be used in other embodiments. Touch screen 212 and display controller 256 may use any of a variety of touch sensing technologies now known or later developed, as well asHis proximity sensor array or other element for determining one or more points of contact with the touch screen 212 to detect contact and any movement or breaking thereof, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies. In one exemplary embodiment, projected mutual capacitance sensing technology is used, such as that in Apple Inc. (Cupertino, California)

And iPod

The technique found.

The touch sensitive display in some embodiments of the touch screen 212 may be similar to the multi-touch sensitive touchpad described in the following U.S. patents: 6,323,846(Westerman et al), 6,570,557 (Westerman et al) and/or 6,677,932 (Westerman); and/or U.S. patent publication 2002/0015024a1, each of which is hereby incorporated by reference in its entirety. However, touch screen 212 displays visual output from device 200, while touch sensitive touchpads do not provide visual output.

Touch sensitive displays in some embodiments of touch screen 212 may be described as in the following patent applications: (1) U.S. patent application 11/381,313, "Multipoint Touch Surface Controller," filed on 2.5.2006; (2) U.S. patent application 10/840,862, "Multipoint Touchscreen", filed on 6.5.2004; (3) U.S. patent application 10/903,964, "Gestures For Touch Sensitive Input Devices", filed 30.7.2004; (4) U.S. patent application 11/048,264, "Gestures For Touch Sensitive Input Devices," filed on 31/1/2005; (5) U.S. patent application 11/038,590, "model-Based Graphical User Interfaces For Touch Sensitive Input Devices", filed on 18.1.2005; (6) U.S. patent application 11/228,758, "Virtual Input Device plan On A Touch Screen User Interface", filed On 16.9.2005; (7) U.S. patent application 11/228,700, "Operation Of A Computer With A Touch Screen Interface," filed on 16.9.2005; (8) U.S. patent application 11/228,737, "Activating Virtual Keys Of A Touch-Screen Virtual Keys", filed on 16.9.2005; and (9) U.S. patent application 11/367,749, "Multi-Functional Hand-Held Device," filed 3.3.2006. All of these patent applications are incorporated herein by reference in their entirety.

The touch screen 212 may have a video resolution in excess of 100 dpi. In some embodiments, the touch screen has a video resolution of about 160 dpi. The user may make contact with touch screen 212 using any suitable object or appendage, such as a stylus, a finger, and so forth. In some embodiments, the user interface is designed to work primarily with finger-based contacts and gestures, which may not be as accurate as stylus-based input due to the larger contact area of the finger on the touch screen. In some implementations, the device translates the rough finger-based input into a precise pointer/cursor position or command for performing the action desired by the user.

In some embodiments, in addition to a touch screen, device 200 may include a touch pad (not shown) for activating or deactivating particular functions. In some embodiments, the touchpad is a touch-sensitive area of the device that, unlike a touch screen, does not display visual output. The touchpad may be a touch-sensitive surface that is separate from the touch screen 212 or an extension of the touch-sensitive surface formed by the touch screen.

The device 200 also includes a power system 262 for powering the various components. Power system 262 may include a power management system, one or more power sources (e.g., battery, Alternating Current (AC)), a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator (e.g., a Light Emitting Diode (LED)), and any other components associated with the generation, management, and distribution of power in a portable device.

The device 200 may also include one or more optical sensors 264. Fig. 2A shows an optical sensor coupled to optical sensor controller 258 in I/O subsystem 206. The optical sensor 264 may comprise a Charge Coupled Device (CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The optical sensor 264 receives light projected through one or more lenses from the environment and converts the light into data representing an image. In conjunction with the imaging module 243 (also referred to as a camera module), the optical sensor 264 may capture still images or video. In some embodiments, the optical sensor is located on the back of device 200 opposite touch screen display 212 on the front of the device so that the touch screen display can be used as a viewfinder for still and/or video image acquisition. In some embodiments, the optical sensor is located at the front of the device so that images of the user may be acquired for the video conference while the user views the other video conference participants on the touch screen display. In some implementations, the position of the optical sensor 264 can be changed by the user (e.g., by rotating a lens and sensor in the device housing) so that a single optical sensor 264 can be used with a touch screen display for both video conferencing and still image and/or video image capture.

Device 200 optionally further comprises one or more contact intensity sensors 265. FIG. 2A shows a contact intensity sensor coupled to intensity sensor controller 259 in I/O subsystem 206. Contact intensity sensor 265 optionally includes one or more piezoresistive strain gauges, capacitive force sensors, electrical force sensors, piezoelectric force sensors, optical force sensors, capacitive touch-sensitive surfaces, or other intensity sensors (e.g., sensors for measuring the force (or pressure) of a contact on a touch-sensitive surface). Contact intensity sensor 265 receives contact intensity information (e.g., pressure information or a surrogate for pressure information) from the environment. In some embodiments, at least one contact intensity sensor is juxtaposed or adjacent to the touch-sensitive surface (e.g., touch-sensitive display system 212). In some embodiments, at least one contact intensity sensor is located on the back of device 200 opposite touch screen display 212 located on the front of device 200.

The device 200 may also include one or more proximity sensors 266. Fig. 2A shows a proximity sensor 266 coupled to the peripheral interface 218. Alternatively, the proximity sensor 266 may be coupled to the input controller 260 in the I/O subsystem 206. The proximity sensor 266 may be implemented as described in the following U.S. patent applications: 11/241,839 entitled "Proximaty Detector In Handheld Device"; 11/240,788 entitled "Proximaty Detector In Handheld Device"; 11/620,702 entitled "Using Ambient Light Sensor To Automation restriction Sensor Output"; 11/586,862 entitled "Automated Response To And Sensing Of User Activity In Portable Devices"; and 11/638,251 entitled "Methods And Systems For Automatic Configuration Of Peripherals," which are hereby incorporated by reference in their entirety. In some embodiments, the proximity sensor turns off and disables the touch screen 212 when the multifunction device is placed near the user's ear (e.g., when the user is making a phone call).

Device 200 optionally further comprises one or more tactile output generators 267. Fig. 2A shows a tactile output generator coupled to a tactile feedback controller 261 in the I/O subsystem 206. Tactile output generator 267 optionally includes one or more electro-acoustic devices such as speakers or other audio components; and/or an electromechanical device for converting energy into linear motion such as a motor, solenoid, electroactive aggregator, piezoelectric actuator, electrostatic actuator, or other tactile output generating component (e.g., a component for converting an electrical signal into a tactile output on the device). Contact intensity sensor 265 receives haptic feedback generation instructions from haptic feedback module 233 and generates haptic output on device 200 that can be felt by a user of device 200. In some embodiments, at least one tactile output generator is juxtaposed or adjacent to a touch-sensitive surface (e.g., touch-sensitive display system 212), and optionally generates tactile output by moving the touch-sensitive surface vertically (e.g., into/out of the surface of device 200) or laterally (e.g., back and forth in the same plane as the surface of device 200). In some embodiments, at least one tactile output generator sensor is located on the back of device 200 opposite touch screen display 212, which is located on the front of device 200.

Device 200 may also include one or more accelerometers 268. Fig. 2A shows accelerometer 268 coupled to peripheral interface 218. Alternatively, accelerometer 268 may be coupled to input controller 260 in I/O subsystem 206. Accelerometer 268 may be implemented as described in the following U.S. patent publications: 20050190059 entitled "Acceleration-Based Detection System For Portable Electronic Devices" And 20060017692, entitled "Methods And apparatus For Operating A Portable Device Based On An Accelerator", the disclosures of which are both incorporated herein by reference in their entirety. In some embodiments, information is displayed in a portrait view or a landscape view on the touch screen display based on analysis of data received from the one or more accelerometers. Device 200 optionally includes a magnetometer (not shown) and a GPS (or GLONASS or other global navigation system) receiver (not shown) in addition to the one or more accelerometers 268 for obtaining information regarding the position and orientation (e.g., portrait or landscape) of device 200.

In some embodiments, the software components stored in memory 202 include an operating system 226, a communication module (or set of instructions) 228, a contact/motion module (or set of instructions) 230, a graphics module (or set of instructions) 232, a text input module (or set of instructions) 234, a Global Positioning System (GPS) module (or set of instructions) 235, a digital assistant client module 229, and an application program (or set of instructions) 236. Moreover, memory 202 may store data and models, such as user data and models 231. Further, in some embodiments, memory 202 (fig. 2A) or 470 (fig. 4) stores device/global internal state 257, as shown in fig. 2A, and fig. 4. Device/global internal state 257 includes one or more of: an active application state to indicate which applications (if any) are currently active; a display state indicating what applications, views, or other information occupy various areas of the touch screen display 212; sensor states including information obtained from the various sensors of the device and the input control device 216; and location information regarding the device's location and/or attitude.

The operating system 226 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, iOS, WINDOWS, or embedded operating systems such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components.

The communication module 228 facilitates communication with other devices through one or more external ports 224 and also includes various software components for processing data received by the RF circuitry 208 and/or the external ports 224. External port 224 (e.g., Universal Serial Bus (USB), firewire, etc.) is adapted to couple directly to other devices or indirectly through a network (e.g., the internet, wireless LAN, etc.). In some embodiments, the external port is an external port

(trade mark of Apple inc.) the same or similar and/or compatible multi-pin (e.g., 30-pin) connectors used on 30-pin connectors on devices.

The contact/motion module 230 optionally detects contact with the touch screen 212 (in conjunction with the display controller 256) and other touch sensitive devices (e.g., a touchpad or a physical click wheel). The contact/motion module 230 includes various software components for performing various operations related to contact detection, such as determining whether contact has occurred (e.g., detecting a finger-down event), determining contact intensity (e.g., force or pressure of contact, or a substitute for force or pressure of contact), determining whether there is movement of contact and tracking movement across the touch-sensitive surface (e.g., detecting one or more finger-dragging events), and determining whether contact has ceased (e.g., detecting a finger-up event or contact disconnection). The contact/motion module 230 receives contact data from the touch-sensitive surface. Determining movement of the point of contact optionally includes determining velocity (magnitude), velocity (magnitude and direction), and/or acceleration (change in magnitude and/or direction) of the point of contact, the movement of the point of contact being represented by a series of contact data. These operations are optionally applied to single point contacts (e.g., single finger contacts) or multiple point simultaneous contacts (e.g., "multi-touch"/multiple finger contacts). In some embodiments, the contact/motion module 230 and the display controller 256 detect contact on the touch pad.

In some embodiments, the contact/motion module 230 uses a set of one or more intensity thresholds to determine whether an operation has been performed by the user (e.g., determine whether the user has "clicked" on an icon). In some embodiments, at least a subset of the intensity thresholds are determined according to software parameters (e.g., the intensity thresholds are not determined by the activation thresholds of particular physical actuators and may be adjusted without changing the physical hardware of the device 200). For example, the mouse "click" threshold of the trackpad or touchscreen can be set to any one of a wide range of predefined thresholds without changing the trackpad or touchscreen display hardware. Additionally, in some implementations, a user of the device is provided with software settings for adjusting one or more intensity thresholds of a set of intensity thresholds (e.g., by adjusting individual intensity thresholds and/or by adjusting multiple intensity thresholds at once with a system-level click on an "intensity" parameter).

The contact/motion module 230 optionally detects gesture input by the user. Different gestures on the touch-sensitive surface have different contact patterns (e.g., different motions, timings, and/or intensities of detected contacts). Thus, the gesture is optionally detected by detecting a specific contact pattern. For example, detecting a finger tap gesture includes detecting a finger-down event, and then detecting a finger-up (lift-off) event at the same location (or substantially the same location) as the finger-down event (e.g., at the location of the icon). As another example, detecting a finger swipe gesture on the touch-sensitive surface includes detecting a finger-down event, then detecting one or more finger-dragging events, and then subsequently detecting a finger-up (lift-off) event.

Graphics module 232 includes various known software components for rendering and displaying graphics on touch screen 212 or other display, including components for changing the visual impact (e.g., brightness, transparency, saturation, contrast, or other visual characteristics) of the displayed graphics. As used herein, the term "graphic" includes any object that may be displayed to a user, including without limitation text, web pages, icons (such as user interface objects including soft keys), digital images, videos, animations and the like.

In some embodiments, graphics module 232 stores data to be used to represent graphics. Each graphic is optionally assigned a corresponding code. The graphic module 232 receives one or more codes for specifying a graphic to be displayed from an application program or the like, and also receives coordinate data and other graphic attribute data together if necessary, and then generates screen image data to output to the display controller 256.

Haptic feedback module 233 includes various software components for generating instructions for use by haptic output generator 267 to produce haptic outputs at one or more locations on device 200 in response to user interaction with device 200.

Text input module 234, which may be a component of graphics module 232, provides a soft keyboard for entering text in a variety of applications, such as contacts 237, email 240, instant message 241, browser 247, and any other application that requires text input.

The GPS module 235 determines the location of the device and provides this information for use in various applications (e.g., to the phone 238 for use in location-based dialing, to the camera 243 as picture/video metadata, and to applications that provide location-based services, such as weather desktop applets, local yellow pages desktop applets, and map/navigation desktop applets).

The digital assistant client module 229 may include various client-side digital assistant instructions that provide client-side functionality of a digital assistant. For example, the digital assistant client module 229 can accept voice input (e.g., voice input), text input, touch input, and/or gesture input through various user interfaces of the portable multifunction device 200 (e.g., the microphone 213, the accelerometer 268, the touch-sensitive display system 212, the optical sensor 229, the other input control device 216, etc.). The digital assistant client module 229 can also provide output in audio (e.g., speech output), visual, and/or tactile forms through various output interfaces of the portable multifunction device 200 (e.g., speaker 211, touch-sensitive display system 212, touch output generator 267, etc.). For example, the output may be provided as voice, sound, alarm, text message, menu, graphic, video, animation, vibration, and/or a combination of two or more of the foregoing. During operation, digital assistant client module 229 may communicate with DA server 106 using RF circuitry 208.

The user data and models 231 can include various data associated with the user (e.g., user-specific vocabulary data, user preference data, user-specified name pronunciations, data from the user's electronic address book, to-do lists, shopping lists, etc.) to provide client-side functionality of the digital assistant. Moreover, the user data and models 231 may include various models (e.g., speech recognition models, statistical language modules, natural language processing models, ontologies, task flow models, service models, etc.) for processing user input and determining user intent.

In some embodiments, the digital assistant client module 229 may utilize various sensors, subsystems, and peripherals of the portable multifunction device 200 to gather additional information from the ambient environment of the portable multifunction device 200 to establish a context associated with the user, the current user interaction, and/or the current user input. In some embodiments, the digital assistant client module 229 may provide the contextual information, or a subset thereof, along with the user input to the DA server 106 to help infer the user's intent. In some embodiments, the digital assistant may also use the contextual information to determine how to prepare and communicate the output to the user. Context information may refer to context data.

In some embodiments, the contextual information accompanying the user input may include sensor information, such as lighting, ambient noise, ambient temperature, images or video of the surrounding environment, and the like. In some embodiments, the context information may also include physical states of the device, such as device orientation, device location, device temperature, power level, velocity, acceleration, motion pattern, cellular signal strength, and the like. In some embodiments, information related to the software state of the DA server 106, such as the running process of the portable multifunction device 200, installed programs, past and current network activities, background services, error logs, resource usage, etc., may be provided to the DA server 106 as contextual information associated with the user input.

In some embodiments, the digital assistant client module 229 may selectively provide information (e.g., user data 231) stored on the portable multifunction device 200 in response to a request from the DA server 106. In some embodiments, the digital assistant client module 229 may also elicit additional input from the user via a natural language dialog or other user interface upon request by the DA server 106. The digital assistant client module 229 may transmit the additional input to the DA server 106 to assist the DA server 106 in intent inference and/or to satisfy the user intent expressed in the user request.

A more detailed description of the digital assistant is described below with reference to fig. 7A-7C. It should be appreciated that the digital assistant client module 229 may include any number of the sub-modules of the digital assistant module 726 described below.

The application programs 236 may include the following modules (or sets of instructions), or a subset or superset thereof:

a contacts module 237 (sometimes also referred to as a contact list or contact list);

a phone module 238;

a video conferencing module 239;

an email client module 240;

an Instant Messaging (IM) module 241;

fitness support module 242;

a camera module 243 for still and/or video images;

an image management module 244;

a video player module;

a music player module;

a browser module 247;

a calendar module 248;

desktop applet modules 249 that may include one or more of the following: an weather desktop applet 249-1, a stock market desktop applet 249-2, a calculator desktop applet 249-3, an alarm desktop applet 249-4, a dictionary desktop applet 249-5, and other desktop applets acquired by the user and a desktop applet 249-6 created by the user;

a desktop applet creator module 250 for generating a user-created desktop applet 249-6;

A search module 251;

a video and music player module 252 that incorporates a video player module and a music player module;

a notepad module 253;

a map module 254; and/or

Online video module 255.

Examples of other application programs 236 that may be stored in memory 202 include other word processing applications, other image editing applications, drawing applications, rendering applications, JAVA-enabled applications, encryption, digital rights management, voice recognition, and voice replication.

In conjunction with the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the contacts module 237 may be used to manage an address book or a list of contacts (e.g., stored in the memory 202 or in the application internal state 292 of the contacts module 237 in the memory 470), including: adding one or more names to an address book; deleting one or more names from the address book; associating one or more telephone numbers, one or more email addresses, one or more physical addresses, or other information with a name; associating the image with a name; classifying and classifying names; providing a telephone number or email address to initiate and/or facilitate communication via telephone 238, video conference module 239, email 240, or IM 241; and the like.

In conjunction with the RF circuitry 208, the audio circuitry 210, the speaker 211, the microphone 213, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the phone module 238 may be used to enter a sequence of characters corresponding to a phone number, access one or more phone numbers in the contacts module 237, modify the entered phone number, dial a corresponding phone number, conduct a conversation, and disconnect or hang up when the conversation is completed. As described above, wireless communication may use any of a number of communication standards, protocols, and technologies.

In conjunction with the RF circuitry 208, the audio circuitry 210, the speaker 211, the microphone 213, the touch screen 212, the display controller 256, the optical sensor 264, the optical sensor controller 258, the contact/motion module 230, the graphics module 232, the text input module 234, the contacts module 237, and the phone module 238, the video conference module 239 includes executable instructions to initiate, conduct, and terminate video conferences between the user and one or more other participants according to user instructions.

In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, email client module 240 includes executable instructions to create, send, receive, and manage emails in response to user instructions. In conjunction with the image management module 244, the e-mail client module 240 makes it very easy to create and send an e-mail having a still image or a video image photographed by the camera module 243.

In conjunction with the RF circuitry 208, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the instant message module 241 includes executable instructions for: inputting a sequence of characters corresponding to an instant message, modifying previously input characters, transmitting a corresponding instant message (e.g., using a Short Message Service (SMS) or Multimedia Messaging Service (MMS) protocol for a phone-based instant message or using XMPP, SIMPLE, or IMPS for an internet-based instant message), receiving an instant message, and viewing the received instant message. In some embodiments, the transmitted and/or received instant messages may include graphics, photos, audio files, video files, and/or other attachments supported in MMS and/or Enhanced Messaging Service (EMS). As used herein, "instant message" refers to both telephony-based messages (e.g., messages transmitted using SMS or MMS) and internet-based messages (e.g., messages transmitted using XMPP, SIMPLE, or IMPS).

In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, GPS module 235, map module 254, and music player module, fitness support module 242 includes executable instructions for: creating fitness (e.g., with time, distance, and/or calorie burning goals); communicating with fitness sensors (mobile devices); receiving fitness sensor data; calibrating a sensor for monitoring fitness; selecting body-building music and playing; and displaying, storing and transmitting fitness data.

In conjunction with the touch screen 212, the display controller 256, the one or more optical sensors 264, the optical sensor controller 258, the contact/motion module 230, the graphics module 232, and the image management module 244, the camera module 243 includes executable instructions for: capturing still images or video (including video streams) and storing them in the memory 202, modifying features of the still images or video, or deleting the still images or video from the memory 202.

In conjunction with the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, the text input module 234, and the camera module 243, the image management module 244 includes executable instructions for arranging, modifying (e.g., editing), or otherwise manipulating, labeling, deleting, presenting (e.g., in a digital slide or album), and storing still and/or video images.

In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, browser module 247 includes executable instructions for browsing the internet (including searching, linking to, receiving, and displaying web pages or portions thereof, and attachments and other files linked to web pages) according to user instructions.

In conjunction with the RF circuitry 208, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, the text input module 234, the email client module 240, and the browser module 247, the calendar module 248 includes executable instructions for creating, displaying, modifying, and storing a calendar and data associated with the calendar (e.g., calendar entries, to-do, etc.) according to user instructions.

In conjunction with the RF circuitry 208, the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, the text input module 234, and the browser module 247, the desktop applet module 249 is a mini-application (e.g., a weather desktop applet 249-1, a stock market desktop applet 249-2, a calculator desktop applet 249-3, an alarm desktop applet 249-4, and a dictionary desktop applet 249-5) or a mini-application created by a user (e.g., a user-created desktop applet 249-6) that may be downloaded and used by the user. In some embodiments, the desktop applet includes an HTML (hypertext markup language) file, a CSS (cascading style sheet) file, and a JavaScript file. In some embodiments, the desktop applet includes an XML (extensible markup language) file and a JavaScript file (e.g., Yahoo! desktop applet).

In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, and browser module 247, desktop applet creator module 250 may be used by a user to create a desktop applet (e.g., to transfer a user-specified portion of a web page into the desktop applet).

In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, and text input module 234, search module 251 includes executable instructions for searching memory 202 for text, music, sound, images, videos, and/or other files that match one or more search criteria (e.g., one or more user-specified search terms) according to user instructions.

In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, audio circuitry 210, speakers 211, RF circuitry 208, and browser module 247, video and music player module 252 includes executable instructions that allow a user to download and playback recorded music and other sound files stored in one or more file formats, such as MP3 or AAC files, as well as executable instructions for displaying, rendering, or otherwise playing back video (e.g., on touch screen 212 or on an external display connected via external port 224). In some embodiments, the device 200 optionally includes the functionality of an MP3 player, such as an iPod (trademark of Apple inc.).

In conjunction with the touch screen 212, the display controller 256, the contact/motion module 230, the graphics module 232, and the text input module 234, the notepad module 253 includes executable instructions to create and manage notepads, backlogs, and the like according to user instructions.

In conjunction with RF circuitry 208, touch screen 212, display controller 256, contact/motion module 230, graphics module 232, text input module 234, GPS module 235, and browser module 247, map module 254 may be used to receive, display, modify, and store maps and data associated with maps (e.g., driving directions, data related to stores and other points of interest at or near a particular location, and other location-based data) according to user instructions.

In conjunction with touch screen 212, display controller 256, contact/motion module 230, graphics module 232, audio circuit 210, speaker 211, RF circuit 208, text input module 234, email client module 240, and browser module 247, online video module 255 includes instructions that allow a user to access, browse, receive (e.g., by streaming and/or downloading), playback (e.g., on the touch screen or on an external display connected via external port 224), send emails with links to particular online videos, and otherwise manage online videos in one or more file formats, such as h.264. In some embodiments, a link to a particular online video is sent using instant messaging module 241 rather than email client module 240. Additional descriptions of Online video applications may be found in U.S. provisional patent application No.60/936,562 entitled "Portable Multi function Device, Method, and Graphical User Interface for Playing Online video," filed on year 6, 20, 2007, and U.S. patent application No.11/968,067 entitled "Portable Multi function Device, Method, and Graphical User Interface for Playing Online video," filed on year 12, 31, 2007, the contents of which are hereby incorporated by reference in their entirety.

Each of the modules and applications described above corresponds to a set of executable instructions for performing one or more of the functions described above as well as the methods described in this patent application (e.g., the computer-implemented methods and other information processing methods described herein). These modules (e.g., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. For example, a video player module may be combined with a music player module into a single module (e.g., video and music player module 252 in fig. 2A). In some embodiments, memory 202 may store a subset of the modules and data structures described above. In addition, memory 202 may store additional modules and data structures not described above.

In some embodiments, device 200 is a device on which the operation of a predefined set of functions is performed exclusively through a touch screen and/or a touchpad. By using a touch screen and/or touch pad as the primary input control device for operation of device 200, the number of physical input control devices (such as push buttons, dials, and the like) on device 200 may be reduced.

The predefined set of functions performed exclusively by the touchscreen and/or touchpad optionally include navigating between user interfaces. In some embodiments, the touchpad, when touched by a user, navigates device 200 from any user interface displayed on device 200 to a main, home, or root menu. In such embodiments, a touchpad is used to implement a "menu button". In some other embodiments, the menu button is a physical push button or other physical input control device, rather than a touchpad.

Fig. 2B is a block diagram illustrating exemplary components for event processing, according to some embodiments. In some embodiments, the memory 202 (FIG. 2A) or the memory 470 (FIG. 4) includes the event classifier 270 (e.g., in the operating system 226) and the corresponding application 236-1 (e.g., any of the aforementioned applications 237, 251,255,480, 490).

The event sorter 270 receives the event information and determines the application 236-1 to which the event information is to be delivered and the application view 291 of the application 236-1. Event sorter 270 includes event monitor 271 and event dispatcher module 274. In some embodiments, the application 236-1 includes an application internal state 292 that indicates one or more current application views that are displayed on the touch-sensitive display 212 when the application is active or executing. In some embodiments, device/global internal state 257 is used by event classifier 270 to determine which application(s) are currently active, and application internal state 292 is used by event classifier 270 to determine the application view 291 to which to deliver event information.

In some embodiments, the application internal state 292 includes additional information, such as one or more of the following: resume information to be used when the application 236-1 resumes execution, user interface state information indicating information being displayed by the application 236-1 or information that is ready for display by the application 236-1, a state queue for enabling a user to return to a previous state or view of the application 136-1, and a repeat/undo queue of previous actions taken by the user.

The event monitor 271 receives event information from the peripheral interface 218. The event information includes information about a sub-event (e.g., a user touch on the touch-sensitive display 212 as part of a multi-touch gesture). Peripherals interface 218 transmits information it receives from I/O subsystem 206 or sensors (such as proximity sensor 266), one or more accelerometers 268, and/or microphone 213 (via audio circuitry 210). Information received by peripheral interface 218 from I/O subsystem 206 includes information from touch-sensitive display 212 or a touch-sensitive surface.

In some embodiments, event monitor 271 sends requests to peripheral interface 218 at predetermined intervals. In response, peripheral interface 218 transmits event information. In other embodiments, peripheral interface 218 transmits event information only when there is a significant event (e.g., receiving input above a predetermined noise threshold and/or receiving input for more than a predetermined duration).

In some embodiments, event classifier 270 also includes hit view determination module 272 and/or activity event recognizer determination module 273.

When the touch-sensitive display 212 displays more than one view, the hit view determination module 272 provides a software process for determining where within one or more views a sub-event has occurred. The view consists of controls and other elements that the user can see on the display.

Another aspect of the user interface associated with an application is a set of views, sometimes referred to herein as application views or user interface windows, in which information is displayed and touch-based gestures occur. The application view (of the respective application) in which the touch is detected may correspond to a programmatic level within a programmatic or view hierarchy of applications. For example, the lowest level view in which a touch is detected may be referred to as a hit view, and the set of events identified as correct inputs may be determined based at least in part on the hit view of the initial touch that began the touch-based gesture.

Hit view determination module 272 receives information related to sub-events of the touch-based gesture. When the application has multiple views organized in a hierarchy, hit view determination module 272 identifies the hit view as the lowest view in the hierarchy that should handle the sub-event. In most cases, the hit view is the lowest level view in which the initiating sub-event (e.g., the first sub-event in a sequence of sub-events that form an event or potential event) occurs. Once the hit view is identified by hit view determination module 272, the hit view typically receives all sub-events related to the same touch or input source to which it was identified as being the hit view.

The activity event identifier determination module 273 determines which view or views within the view hierarchy should receive a particular sequence of sub-events. In some implementations, the activity event recognizer determination module 273 determines that only the hit view should receive a particular sequence of sub-events. In other embodiments, the activity event recognizer determination module 273 determines that all views including the physical location of the sub-event are actively participating views, and thus determines that all actively participating views should receive a particular sequence of sub-events. In other embodiments, even if the touch sub-event is completely confined to the area associated with one particular view, the higher views in the hierarchy will remain as actively participating views.

Event dispatcher module 274 dispatches event information to event recognizers (e.g., event recognizer 280). In embodiments that include the activity event recognizer determination module 273, the event dispatcher module 274 delivers the event information to the event recognizer determined by the activity event recognizer determination module 273. In some embodiments, the event dispatcher module 274 stores event information in an event queue, which is retrieved by the respective event receiver 282.

In some embodiments, the operating system 226 includes an event classifier 270. Alternatively, application 236-1 includes event classifier 270. In further embodiments, the event classifier 270 is a separate module or is part of another module stored in the memory 202 (such as the contact/motion module 230).

In some embodiments, the application 236-1 includes a plurality of event handlers 290 and one or more application views 291, where each application view includes instructions for handling touch events occurring within a respective view of the application's user interface. Each application view 291 of the application 236-1 includes one or more event recognizers 280. Typically, the respective application view 291 includes a plurality of event recognizers 280. In other embodiments, one or more of the event recognizers 280 are part of a separate module, such as a user interface toolkit (not shown) or a higher level object from which the application 236-1 inherits methods and other properties. In some embodiments, the respective event handlers 290 include one or more of: data updater 276, object updater 277, GUI updater 278, and/or event data 279 received from event classifier 270. Event handler 290 may utilize or call data updater 276, object updater 277 or GUI updater 278 to update application internal state 292. Alternatively, one or more of the application views 291 include one or more respective event handlers 290. Additionally, in some embodiments, one or more of the data updater 276, the object updater 277, and the GUI updater 278 are included in respective application views 291.

The corresponding event identifier 280 receives event information (e.g., event data 279) from the event classifier 270 and identifies events from the event information. Event recognizer 280 includes an event receiver 282 and an event comparator 284. In some embodiments, event recognizer 280 also includes at least a subset of: metadata 283, and event delivery instructions 288 (which may include sub-event delivery instructions).

Event receiver 282 receives event information from event sorter 270. The event information includes information about a sub-event (e.g., a touch or touch movement). According to the sub-event, the event information further includes additional information, such as the location of the sub-event. When the sub-event relates to motion of a touch, the event information may also include the velocity and direction of the sub-event. In some embodiments, the event comprises rotation of the device from one orientation to another (e.g., from a portrait orientation to a landscape orientation, or vice versa) and the event information comprises corresponding information about the current orientation of the device (also referred to as the device pose).

Event comparator 284 compares the event information to predefined event or sub-event definitions and determines an event or sub-event or determines or updates the state of an event or sub-event based on the comparison. In some embodiments, event comparator 284 includes an event definition 286. The event definition 286 contains definitions of events (e.g., predefined sub-event sequences), such as event 1(287-1), event 2(287-2), and other events. In some embodiments, sub-events in event (287) include, for example, touch start, touch end, touch move, touch cancel, and multi-touch. In one embodiment, event 1(287-1) is defined as a double click on the displayed object. For example, a double tap includes a first touch (touch start) on the displayed object for a predetermined length of time, a first lift-off (touch end) for a predetermined length of time, a second touch (touch start) on the displayed object for a predetermined length of time, and a second lift-off (touch end) for a predetermined length of time. In another example, the definition of event 2(287-2) is a drag on the displayed object. For example, the drag includes a predetermined length of time of touch (or contact) on the displayed object, movement of the touch on the touch-sensitive display 212, and lifting of the touch (touch end). In some embodiments, the event also includes information for one or more associated event handlers 290.

In some embodiments, the event definitions 287 include definitions of events for respective user interface objects. In some embodiments, event comparator 284 performs a hit test to determine which user interface object is associated with a sub-event. For example, in an application view that displays three user interface objects on the touch-sensitive display 212, when a touch is detected on the touch-sensitive display 212, the event comparator 284 performs a hit-test to determine which of the three user interface objects is associated with the touch (sub-event). If each displayed object is associated with a corresponding event handler 290, the event comparator uses the results of the hit test to determine which event handler 290 should be activated. For example, event comparator 284 selects the event handler associated with the sub-event and the object that triggered the hit test.

In some embodiments, the definition of the respective event (287) further comprises a delay action that delays the delivery of the event information until it has been determined whether the sequence of sub-events does or does not correspond to the event type of the event recognizer.

When the respective event recognizer 280 determines that the sequence of sub-events does not match any event in the event definition 286, the respective event recognizer 280 enters an event not possible, event failed, or event ended state, after which subsequent sub-events of the touch-based gesture are ignored. In this case, other event recognizers (if any) that remain active for the hit view continue to track and process sub-events of the persistent touch-based gesture.

In some embodiments, the respective event recognizer 280 includes metadata 283 with configurable attributes, tags, and/or lists for indicating how the event delivery system should perform sub-event delivery to actively participating event recognizers. In some embodiments, metadata 283 includes configurable attributes, flags, and/or lists that indicate how event recognizers may interact with each other or be enabled to interact with each other. In some embodiments, metadata 283 includes configurable attributes, tags, and/or lists for indicating whether a sub-event is delivered to different levels in a view or programmatic hierarchy.

In some embodiments, when one or more particular sub-events of an event are identified, the respective event identifier 280 activates the event handler 290 associated with the event. In some embodiments, the respective event identifier 280 delivers event information associated with the event to the event handler 290. Activating the event handler 290 is different from sending (and deferring) sub-events to the corresponding hit view. In some embodiments, event recognizer 280 throws a marker associated with the recognized event, and event handler 290 associated with the marker retrieves the marker and performs a predefined process.

In some embodiments, the event delivery instructions 288 include sub-event delivery instructions that deliver event information about sub-events without activating an event handler. Instead, the sub-event delivery instructions deliver event information to event handlers associated with the sequence of sub-events or to actively participating views. Event handlers associated with the sequence of sub-events or with actively participating views receive the event information and perform a predetermined process.

In some embodiments, the data updater 276 creates and updates data used in the application 236-1. For example, the data updater 276 updates a phone number used in the contacts module 237 or stores a video file used in the video player module. In some embodiments, the object updater 277 creates and updates objects used in the application 236-1. For example, object updater 277 creates a new user interface object or updates the location of a user interface object. The GUI updater 278 updates the GUI. For example, GUI updater 278 prepares display information and sends it to graphics module 232 for display on a touch-sensitive display.

In some embodiments, one or more event handlers 290 include data updater 276, object updater 277, and GUI updater 278 or have access to data updater 276, object updater 277, and GUI updater 278. In some embodiments, the data updater 276, the object updater 277, and the GUI updater 278 are included in a single module of the respective application 236-1 or application view 291. In other embodiments, they are included in two or more software modules.

It should be understood that the above discussion of event processing with respect to user touches on a touch sensitive display is also applicable to other forms of user input utilizing an input device to operate multifunction device 200, not all of which are initiated on a touch screen. For example, mouse movements and mouse button presses, optionally in combination with single or multiple keyboard presses or holds; contact movements on the touch pad, such as taps, drags, scrolls, and the like; inputting by a stylus; movement of the device; verbal instructions; the detected eye movement; inputting biological characteristics; and/or any combination thereof, optionally as input corresponding to the sub-event for defining the event to be identified.

Fig. 3 illustrates a portable multifunction device 200 with a touch screen 212 in accordance with some embodiments. The touch screen optionally displays one or more graphics within a User Interface (UI) 300. In this embodiment, as well as other embodiments described below, a user can select one or more of these graphics by making gestures on the graphics, for example, with one or more fingers 302 (not drawn to scale in the figures) or with one or more styluses 303 (not drawn to scale in the figures). In some embodiments, selection of one or more graphics will occur when the user breaks contact with the one or more graphics. In some embodiments, the gesture optionally includes one or more taps, one or more swipes (left to right, right to left, up, and/or down), and/or a rolling of a finger (right to left, left to right, up, and/or down) that has made contact with device 200. In some implementations, or in some cases, inadvertent contact with a graphic does not select the graphic. For example, when the gesture corresponding to the selection is a tap, a swipe gesture that sweeps over an application icon optionally does not select the corresponding application.

The device 200 may also include one or more physical buttons, such as a "home" button or a menu button 304. As previously described, menu button 304 may be used to navigate to any application 236 in a set of applications that may be executed on device 200. Alternatively, in some embodiments, the menu buttons are implemented as soft keys in a GUI displayed on touch screen 212.

In some embodiments, device 200 includes a touch screen 212, menu buttons 304, a push button 306 for powering the device on/off and for locking the device, one or more volume adjustment buttons 308, a Subscriber Identity Module (SIM) card slot 310, a headset jack 312, and a docking/charging external port 224. The push button 306 is optionally used to: powering on/off the device by pressing and maintaining the button in a depressed state for a predetermined time interval; locking the device by pressing the button and releasing the button before a predetermined time interval has elapsed; and/or unlocking the device or initiating an unlocking process. In an alternative embodiment, device 200 also accepts voice input through microphone 213 for activating or deactivating certain functions. Device 200 also optionally includes one or more contact intensity sensors 265 for detecting the intensity of contacts on touch screen 212, and/or one or more tactile output generators 267 for generating tactile outputs for a user of device 200.

Fig. 4 is a block diagram of an exemplary multifunction device with a display and a touch-sensitive surface in accordance with some embodiments. The device 400 need not be portable. In some embodiments, the device 400 is a laptop, desktop, tablet, multimedia player device, navigation device, educational device (such as a child learning toy), gaming system, or control device (e.g., a home controller or industrial controller). Device 400 typically includes one or more processing units (CPUs) 410, one or more network or other communication interfaces 460, memory 470, and one or more communication buses 420 for interconnecting these components. The communication bus 420 optionally includes circuitry (sometimes called a chipset) that interconnects and controls communication between system components. Device 400 includes an input/output (I/O) interface 430 with a display 440, which is typically a touch screen display. The I/O interface 430 also optionally includes a keyboard and/or mouse (or other pointing device) 450 and a touchpad 455, a tactile output generator 457 (e.g., similar to one or more tactile output generators 267 described above with reference to fig. 2A), a sensor 459 (e.g., an optical sensor, an acceleration sensor, a proximity sensor, a touch-sensitive sensor, and/or a contact intensity sensor similar to one or more contact intensity sensors 265 described above with reference to fig. 2A) for generating tactile outputs on the device 400. The memory 470 includes high-speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and optionally includes non-volatile memory such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices or other non-volatile solid state storage devices. Memory 470 optionally includes one or more storage devices located remotely from CPU 410. In some embodiments, memory 470 stores programs, modules, and data structures similar to or a subset of the programs, modules, and data structures stored in memory 202 of portable multifunction device 200 (fig. 2A). In addition, memory 470 optionally stores additional programs, modules, and data structures not present in memory 202 of portable multifunction device 200. For example, memory 470 of device 400 optionally stores drawing module 480, presentation module 482, word processing module 484, website creation module 486, disk editing module 488, and/or spreadsheet module 490, while memory 202 of portable multifunction device 200 (FIG. 2A) optionally does not store these modules.

Each of the above-described elements in fig. 4 may be stored in one or more of the aforementioned memory devices. Each of the above modules corresponds to a set of instructions for performing a function described above. The modules or programs (e.g., sets of instructions) described above need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various embodiments. In some embodiments, memory 470 may store a subset of the modules and data structures described above. Further, memory 470 may store additional modules and data structures not described above.

Attention is now directed to embodiments of user interfaces that may be implemented on, for example, portable multifunction device 200.

Fig. 5A illustrates an exemplary user interface of an application menu on a portable multifunction device 200 according to some embodiments. A similar user interface may be implemented on device 400. In some embodiments, the user interface 500 includes the following elements, or a subset or superset thereof:

signal strength indicators 502 for wireless communications (such as cellular signals and Wi-Fi signals);

time 504;

A bluetooth indicator 505;

a battery status indicator 506;

tray 508 with common application icons such as:

an icon 516 of the phone module 238 labeled "phone", the icon 416 optionally including an indicator 514 of the number of missed calls or voice messages;

an icon 518 for the email client module 240 labeled "mail", optionally including an indicator 510 of the number of unread emails;

icon 520 of browser module 247 labeled "browser"; and

icon 522 labeled "iPod" for video and music player module 252 (also known as iPod (trademark of Apple inc.) module 252); and

icons for other applications, such as:

icon 524 of IM module 241 labeled "message";

icon 526 of calendar module 248 labeled "calendar";

icon 528 of image management module 244 labeled "photo";

icon 530 for camera module 243 labeled "camera";

icon 532 for online video module 255 labeled "online video";

an icon 534 labeled "stock market" for the O-stock desktop applet 249-2;

Icon 536 for the map module 254 labeled "map";

icon 538 for weather desktop applet 249-1 labeled "weather";

icon 540 labeled "clock" for alarm clock desktop applet 249-4;

icon 542 labeled "fitness support" for fitness support module 242;

icon 544 labeled "notepad" for notepad module 253; and

an icon 546 labeled "settings" for setting applications or modules, the icon 446 providing access to settings of the device 200 and its various applications 236.

It should be noted that the icon labels shown in fig. 5A are merely exemplary. For example, the icon 522 of the video and music player module 252 may optionally be labeled as "music" or "music player". Other tabs are optionally used for various application icons. In some embodiments, the label of the respective application icon includes a name of the application corresponding to the respective application icon. In some embodiments, the label of a particular application icon is different from the name of the application corresponding to that particular application icon.

Fig. 5B illustrates an exemplary user interface on a device (e.g., device 400 of fig. 4) having a touch-sensitive surface 551 (e.g., tablet or touchpad 455 of fig. 4) separate from a display 550 (e.g., touchscreen display 212). The device 400 also optionally includes one or more contact intensity sensors (e.g., one or more of the sensors 457) for detecting the intensity of contacts on the touch-sensitive surface 551 and/or one or more tactile output generators 459 for generating tactile outputs for a user of the device 400.

Although some of the examples that follow will be given with reference to input on the touch screen display 212 (where the touch-sensitive surface and the display are combined), in some embodiments, the device detects input on a touch-sensitive surface that is separate from the display, as shown in fig. 5B. In some embodiments, the touch-sensitive surface (e.g., 551 in fig. 5B) has a major axis (e.g., 552 in fig. 5B) that corresponds to a major axis (e.g., 553 in fig. 5B) on the display (e.g., 550). According to these embodiments, the device detects contacts (e.g., 560 and 562 in fig. 5B) with the touch-sensitive surface 551 at locations that correspond to respective locations on the display (e.g., 560 corresponds to 568 and 562 corresponds to 570 in fig. 5B). As such, when the touch-sensitive surface (e.g., 551 in fig. 5B) is separated from the display (550 in fig. 5B) of the multifunction device, user inputs (e.g.,

contacts

560 and 562 and their movements) detected by the device on the touch-sensitive surface are used by the device to manipulate the user interface on the display. It should be understood that similar methods are optionally used for the other user interfaces described herein.

Additionally, while the following examples are given primarily with reference to finger inputs (e.g., finger contact, single-finger tap gesture, finger swipe gesture), it should be understood that in some embodiments one or more of these finger inputs are replaced by inputs from another input device (e.g., mouse-based inputs or stylus inputs). For example, the swipe gesture is optionally replaced by a mouse click (e.g., rather than a contact), followed by movement of the cursor along the path of the swipe (e.g., rather than movement of the contact). As another example, a flick gesture is optionally replaced by a mouse click (e.g., rather than detection of a contact followed by termination of detection of the contact) while the cursor is over the location of the flick gesture. Similarly, when multiple user inputs are detected simultaneously, it should be understood that multiple computer mice are optionally used simultaneously, or mouse and finger contacts are optionally used simultaneously.

Fig. 6A illustrates an exemplary personal electronic device 600. The device 600 includes a body 602. In some embodiments, apparatus 600 may include some or all of the features described for apparatus 200 and 400 (e.g., fig. 2A-4). In some embodiments, device 600 has a touch-sensitive display screen 604, referred to hereinafter as touch screen 604. Instead of or in addition to the touch screen 604, the device 600 has a display and a touch-sensitive surface. As with devices 200 and 400, in some embodiments, touch screen 604 (or touch-sensitive surface) may have one or more intensity sensors for detecting the intensity of an applied contact (e.g., touch). One or more intensity sensors of touch screen 604 (or touch-sensitive surface) may provide output data representing the intensity of a touch. The user interface of device 600 may respond to the touch based on the strength of the touch, meaning that different strengths of the touch may invoke different user interface operations on device 600.

Techniques for detecting and processing touch intensities may be found, for example, in the following related patent applications: international patent Application Ser. No. PCT/US2013/040061 entitled "Device, Method, and Graphical User Interface for Displaying User Interface Objects reforming to an Application", filed on 8.5.2013, and International patent Application Ser. No. PCT/US2013/069483 entitled "Device, Method, and Graphical User Interface for translating Between Touch Input to Display Output references", filed on 11.2013, 11.11.2013, each of which is hereby incorporated by reference in its entirety.

In some embodiments, device 600 has one or

more input mechanisms

606 and 608. The input mechanisms 606 and 608 (if included) may be in physical form. Examples of physical input mechanisms include push buttons and rotatable mechanisms. In some embodiments, device 600 has one or more attachment mechanisms. Such attachment mechanisms, if included, may allow device 600 to be attached with, for example, a hat, glasses, earrings, necklace, shirt, jacket, bracelet, watchband, bracelet, pants, belt, shoe, purse, backpack, and the like. These attachment mechanisms may allow the user to wear the device 600.

Fig. 6B illustrates an exemplary personal electronic device 600. In some embodiments, the apparatus 600 may include some or all of the components described with reference to fig. 2A, 2B, and 4. The device 600 has a bus 612 that operatively couples an I/O portion 614 with one or more computer processors 616 and a memory 618. I/O portion 614 may be connected to display 604, which may have touch sensitive component 622 and optionally also touch intensity sensitive component 624. Further, I/O portion 614 may connect with communications unit 630 for receiving applications and operating system data using Wi-Fi, bluetooth, Near Field Communication (NFC), cellular, and/or other wireless communication technologies. Device 600 may include input mechanisms 606 and/or 608. For example, input mechanism 606 may be a rotatable input device or a depressible input device as well as a rotatable input device. In some examples, input mechanism 608 may be a button.

In some examples, input mechanism 608 may be a microphone. The personal electronic device 600 may include various sensors, such as a GPS sensor 632, an accelerometer 634, an orientation sensor 640 (e.g., a compass), a gyroscope 636, a motion sensor 638, and/or combinations thereof, all of which may be operatively connected to the I/O section 614.

The memory 618 of the personal electronic device 600 may be a non-transitory computer-readable storage medium for storing computer-executable instructions that, when executed by one or more computer processors 616, may, for example, cause the computer processors to perform the techniques described below, including processes 800 and 900 (fig. 8-9). The computer-executable instructions may also be stored and/or transmitted within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. The personal electronic device 600 is not limited to the components and configuration of fig. 6B, but may include other components or additional components in a variety of configurations.

As used herein, the term "affordance" refers to a user-interactive graphical user interface object that may be displayed on a display screen of device 200, 400, and/or 600 (FIGS. 2, 4, and 6). For example, images (e.g., icons), buttons, and text (e.g., links) can each constitute an affordance.

As used herein, the term "focus selector" refers to an input element that is used to indicate the current portion of the user interface with which the user is interacting. In some particular implementations that include a cursor or other position marker, the cursor acts as a "focus selector" such that when an input (e.g., a press input) is detected on a touch-sensitive surface (e.g., touchpad 455 in fig. 4 or touch-sensitive surface 551 in fig. 5B) while the cursor is over a particular user interface element (e.g., a button, window, slider, or other user interface element), the particular user interface element is adjusted according to the detected input. In some implementations that include a touch screen display (e.g., touch-sensitive display system 212 in fig. 2A or touch screen 212 in fig. 5A) that enables direct interaction with user interface elements on the touch screen display, a detected contact on the touch screen acts as a "focus selector" such that when an input (e.g., a press input by the contact) is detected at a location of a particular user interface element (e.g., a button, window, slider, or other user interface element) on the touch screen display, the particular user interface element is adjusted in accordance with the detected input. In some implementations, the focus is moved from one area of the user interface to another area of the user interface without corresponding movement of a cursor or movement of a contact on the touch screen display (e.g., by moving the focus from one button to another using tab or arrow keys); in these implementations, the focus selector moves according to focus movement between different regions of the user interface. Regardless of the particular form taken by the focus selector, the focus selector is typically a user interface element (or contact on a touch screen display) that is controlled by the user to deliver the user-intended interaction with the user interface (e.g., by indicating to the device the element with which the user of the user interface desires to interact). For example, upon detection of a press input on a touch-sensitive surface (e.g., a touchpad or touchscreen), the location of a focus selector (e.g., a cursor, contact, or selection box) over a respective button will indicate that the user desires to activate the respective button (as opposed to other user interface elements shown on the device display).

As used in the specification and in the claims, the term "characteristic intensity" of a contact refers to a characteristic of the contact based on one or more intensities of the contact. In some embodiments, the characteristic intensity is based on a plurality of intensity samples. The characteristic intensity is optionally based on a predefined number of intensity samples or a set of intensity samples acquired during a predetermined time period (e.g., 0.05 seconds, 0.1 seconds, 0.2 seconds, 0.5 seconds, 1 second, 2 seconds, 5 seconds, 10 seconds) relative to a predefined event (e.g., after detection of contact, before detection of contact lift, before or after detection of contact start movement, before or after detection of contact end, before or after detection of increase in intensity of contact, and/or before or after detection of decrease in intensity of contact). The characteristic intensity of the contact is optionally based on one or more of: maximum value of contact strength, mean value of contact strength, average value of contact strength, value at the first 10% of contact strength, half maximum value of contact strength, 90% maximum value of contact strength, and the like. In some embodiments, the duration of the contact is used in determining the characteristic intensity (e.g., when the characteristic intensity is an average of the intensity of the contact over time). In some embodiments, the characteristic intensity is compared to a set of one or more intensity thresholds to determine whether the user has performed an operation. For example, the set of one or more intensity thresholds may include a first intensity threshold and a second intensity threshold. In the present embodiment, a contact whose characteristic intensity does not exceed the first threshold value results in a first operation, a contact whose characteristic intensity exceeds the first intensity threshold value but does not exceed the second intensity threshold value results in a second operation, and a contact whose characteristic intensity exceeds the second threshold value results in a third operation. In some embodiments, a comparison between the strength of specificity and one or more thresholds is used to determine whether to perform one or more operations (e.g., whether to perform the respective operation or to forgo performing the respective operation), rather than to determine whether to perform the first operation or the second operation.

In some implementations, a portion of the gesture is recognized for determining the characteristic intensity. For example, the touch-sensitive surface may receive a continuous swipe contact that transitions from a starting location and reaches an ending location where the intensity of the contact increases. In this embodiment, the characteristic strength of the contact at the end position may be based on only a portion of the continuous swipe contact, and not the entire swipe contact (e.g., only the portion of the swipe contact at the end position). In some implementations, a smoothing algorithm may be applied to the intensity of the swipe gesture before determining the characteristic intensity of the contact. For example, the smoothing algorithm optionally includes one or more of: a non-weighted moving average smoothing algorithm, a triangular smoothing algorithm, a median filter smoothing algorithm, and/or an exponential smoothing algorithm. In some cases, these smoothing algorithms eliminate narrow spikes or dips in the intensity of the swipe contact for the purpose of determining the characteristic intensity.

The intensity of a contact on the touch-sensitive surface may be characterized relative to one or more intensity thresholds, such as a contact detection intensity threshold, a light press intensity threshold, a deep press intensity threshold, and/or one or more other intensity thresholds. In some embodiments, the light press intensity threshold corresponds to an intensity that: at which intensity the device will perform the operations typically associated with clicking a button or touchpad of a physical mouse. In some embodiments, the deep press intensity threshold corresponds to an intensity that: at which intensity the device will perform a different operation than that typically associated with clicking a button of a physical mouse or trackpad. In some embodiments, when a contact is detected whose characteristic intensity is below a light press intensity threshold (e.g., and above a nominal contact detection intensity threshold, a contact below the nominal contact detection intensity threshold is no longer detected), the device will move the focus selector in accordance with movement of the contact across the touch-sensitive surface without performing operations associated with a light press intensity threshold or a deep press intensity threshold. Generally, unless otherwise stated, these intensity thresholds are consistent between different sets of user interface drawings.

The increase in contact characteristic intensity from an intensity below the light press intensity threshold to an intensity between the light press intensity threshold and the deep press intensity threshold is sometimes referred to as a "light press" input. Increasing the contact characteristic intensity from an intensity below the deep press intensity threshold to an intensity above the deep press intensity threshold is sometimes referred to as a "deep press" input. Increasing the contact characteristic intensity from an intensity below the contact detection intensity threshold to an intensity between the contact detection intensity threshold and the light press intensity threshold is sometimes referred to as detecting a contact on the touch surface. The decrease in contact characteristic intensity from an intensity above the contact detection intensity threshold to an intensity below the contact detection intensity threshold is sometimes referred to as detecting a lift of the contact from the touch surface. In some embodiments, the contact detection intensity threshold is zero. In some embodiments, the contact detection intensity threshold is greater than zero.

In some embodiments described herein, one or more operations are performed in response to detecting a gesture that includes a respective press input or in response to detecting a respective press input performed with a respective contact (or contacts), wherein the respective press input is detected based at least in part on detecting an increase in intensity of the contact (or contacts) above a press input intensity threshold. In some embodiments, the respective operation is performed in response to detecting an increase in intensity of the respective contact above a press input intensity threshold (e.g., a "down stroke" of the respective press input). In some embodiments, the press input includes an increase in intensity of the respective contact above a press input intensity threshold and a subsequent decrease in intensity of the contact below the press input intensity threshold, and the respective operation is performed in response to detecting the subsequent decrease in intensity of the respective contact below the press input threshold (e.g., "upstroke" of the respective press input).

In some embodiments, the device employs intensity hysteresis to avoid accidental input sometimes referred to as "jitter," where the device defines or selects a hysteresis intensity threshold having a predefined relationship to the press input intensity threshold (e.g., the hysteresis intensity threshold is X intensity units lower than the press input intensity threshold, or the hysteresis intensity threshold is 75%, 90%, or some reasonable proportion of the press input intensity threshold). Thus, in some embodiments, a press input includes an increase in intensity of a respective contact above a press input intensity threshold and a subsequent decrease in intensity of the contact below a hysteresis intensity threshold corresponding to the press input intensity threshold, and a respective operation is performed in response to detecting a subsequent decrease in intensity of the respective contact below the hysteresis intensity threshold (e.g., an "up stroke" of the respective press input). Similarly, in some embodiments, a press input is detected only when the device detects an increase in intensity of the contact from an intensity at or below the hysteresis intensity threshold to an intensity at or above the press input intensity threshold and optionally a subsequent decrease in intensity of the contact to an intensity at or below the hysteresis intensity, and a corresponding operation is performed in response to detecting the press input (e.g., an increase in intensity of the contact or a decrease in intensity of the contact, depending on the circumstances).

For ease of explanation, optionally, a description of an operation performed in response to a press input associated with a press input intensity threshold or in response to a gesture that includes a press input is triggered in response to detection of any of the following: the contact intensity increases above the press input intensity threshold, the contact intensity increases from an intensity below the hysteresis intensity threshold to an intensity above the press input intensity threshold, the contact intensity decreases below the press input intensity threshold, and/or the contact intensity decreases below the hysteresis intensity threshold corresponding to the press input intensity threshold. Additionally, in examples in which operations are described as being performed in response to detecting that the intensity of the contact decreases below the press input intensity threshold, the operations are optionally performed in response to detecting that the intensity of the contact decreases below a hysteresis intensity threshold that corresponds to and is less than the press input intensity threshold.

3. Digital assistant system

Fig. 7A is a block diagram of a digital assistant system 700 according to various embodiments. In some embodiments, the digital assistant system 700 may be implemented on a stand-alone computer system. In some embodiments, the digital assistant system 700 may be distributed across multiple computers. In some embodiments, some of the modules and functionality of a digital assistant may be divided into a server portion and a client portion, where the client portion is located on one or more user devices (e.g., devices 104,122,200,400 or 600) and communicates with the server portion (e.g., server system 108) over one or more networks, for example as shown in fig. 1. In some embodiments, the digital assistant system 700 may be an implementation of the server system 108 (and/or DA server 106) shown in fig. 1. It should be noted that the digital assistant system 700 is only one example of a digital assistant system, and that the digital assistant system 700 may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of components. The various components shown in fig. 7A may be implemented in hardware, software instructions for execution by one or more processors, firmware (including one or more signal processing integrated circuits and/or application specific integrated circuits), or a combination thereof.

The digital assistant system 700 can include a memory 702, one or more processors 704, input/output (I/O) interfaces 706, and a network communication interface 708. These components may communicate with each other via one or more communication buses or signal lines 710.

In some embodiments, the memory 302 may include a non-transitory computer-readable medium, such as a high speed random access memory and/or a non-volatile computer-readable storage medium (e.g., one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices).

In some embodiments, the I/O interface 706 may couple input/output devices 716 of the digital assistant system 700, such as a display, a keyboard, a touch screen, and a microphone, to the user interface module 722. I/O interface 706, in conjunction with user interface module 722, may receive user inputs (e.g., voice inputs, keyboard inputs, touch inputs, etc.) and process those inputs accordingly. In some embodiments, such as when the digital assistant is implemented on a standalone user device, the digital assistant system 700 may include any of the components and I/O communication interfaces described with respect to the devices 200,400, or 600 in fig. 2A, 4, 6A-6B, respectively. In some embodiments, the digital assistant system 700 may represent a server portion of a digital assistant implementation and may interact with a user through a client-side portion located on a user device (e.g., device 104,200,400 or 600).

In some embodiments, the network communication interface 708 may include wireless transmit and receive circuitry 714 and/or one or more wired communication ports 712. The one or more wired communication ports may receive and transmit communication signals via one or more wired interfaces, such as ethernet, Universal Serial Bus (USB), firewire, and the like. The wireless circuitry 714 may receive and transmit RF and/or optical signals to and from communication networks and other communication devices. The wireless communication may use any of a variety of communication standards, protocols, and technologies, such as GSM, EDGE, CDMA, TDMA, Bluetooth, Wi-Fi, VoIP, Wi-MAX, or any other suitable communication protocol. Network communication interface 708 may enable communication between digital assistant system 700 and other devices via a network, such as the internet, an intranet, and/or a wireless network, such as a cellular telephone network, a wireless Local Area Network (LAN), and/or a Metropolitan Area Network (MAN).

In some embodiments, memory 702, or a computer-readable storage medium of memory 702, may store programs, modules, instructions, and data structures, including all or a subset of the following: an operating system 718, a communications module 720, a user interface module 722, one or more application programs 724, and a digital assistant module 726. In particular, memory 702 or the computer-readable storage medium of memory 702 may store instructions for performing processes 800,900 described below. The one or more processors 704 may execute the programs, modules, and instructions and read data from, or write data to, the data structures.

The operating system 718 (e.g., Darwin, RTXC, LINUX, UNIX, iOS, OS X, WINDOWS, or an embedded operating system such as VxWorks) may include various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware, firmware, and software components.

The communication module 720 may facilitate communications between the digital assistant system 700 and other devices via the network communication interface 708. For example, the communication module 720 may communicate with the RF circuitry 208 of an electronic device, such as the devices 200,400, and 600 shown in fig. 2A, 4, and 6A-6B, respectively. The communications module 720 may also include various components for processing data received by the wireless circuitry 714 and/or the wired communications port 712.

User interface module 722 may receive commands and/or input from a user (e.g., from a keyboard, touch screen, pointing device, controller, and/or microphone) via I/O interface 706 and generate user interface objects on the display. User interface module 722 may also prepare and communicate output (e.g., voice, sound, animation, text, icons, vibrations, haptic feedback, lighting, etc.) to the user via I/O interface 706 (e.g., through a display, audio channel, speaker, touch pad, etc.).

The application programs 724 may include programs and/or modules configured to be executed by the one or more processors 704. For example, if the digital assistant system is implemented on a standalone user device, the applications 724 may include user applications such as games, calendar applications, navigation applications, or mail applications. If the digital assistant system 700 is implemented on a server, the application 724 may include, for example, an asset management application, a diagnostic application, or a scheduling application.

The memory 702 may also store a digital assistant module 726 (or a server portion of a digital assistant). In some embodiments, digital assistant module 726 may include the following sub-modules, or a subset or superset thereof: an input/output processing module 728, a Speech To Text (STT) processing module 730, a natural language processing module 732, a dialog flow processing module 734, a task flow processing module 736, a service processing module 738, and a speech synthesis module 740. Each of these modules may have access to one or more, or a subset or superset thereof, of the systems or data and models of the following digital assistant module 726: ontology 760, vocabulary index 744, user data 748, task flow model 754, service model 756, and ASR system.

In some embodiments, using the processing modules, data, and models implemented on the digital assistant module 726, the digital assistant can perform at least some of the following: converting the voice input into text; identifying a user's intent expressed from a natural language input received by the user; actively elicit and obtain the information needed to fully infer the user's intent (e.g., by disambiguating words, names, intent, etc.); determining a task flow for implementing the inferred intent; and executing the task stream to fulfill the inferred intent.

In some embodiments, as shown in fig. 7B, I/O processing module 728 may interact with a user through I/O device 716 in fig. 7A or with a user device (e.g., device 104, 200, 400, or 600) through network communication interface 708 in fig. 7A to obtain user input (e.g., voice input) and provide a response to the user input (e.g., as voice output). The I/O processing module 728 may optionally obtain contextual information associated with the user input from the user device along with or shortly after receiving the user input. The contextual information may include user-specific data, vocabulary, and/or preferences related to user input. In some embodiments, the context information also includes software and hardware states of the user device at the time the user request is received, and/or information relating to the user's surroundings at the time the user request is received. In some embodiments, the I/O processing module 728 may also send follow-up questions to the user regarding the user's request and receive answers back from the user. When a user request is received by the I/O processing module 728 and the user request may include speech input, the I/O processing module 728 may forward the speech input to the STT processing module 730 (or speech recognizer) for speech-to-text conversion.

STT processing module 730 may include one or more ASR systems. One or more ASR systems may process speech input received through I/O processing module 728 to generate recognition results. Each ASR system may include a front-end speech preprocessor. A front-end speech preprocessor may extract representative features from the speech input. For example, a front-end speech preprocessor may perform a fourier transform on a speech input to extract spectral features that characterize the speech input as a sequence of representative multi-dimensional vectors. Rather, each ASR system may include one or more speech recognition models (e.g., acoustic models and/or language models) and may implement one or more speech recognition engines. Embodiments of speech recognition models may include hidden Markov models, Gaussian mixture models, deep neural network models, n-gram language models, and other statistical models. Embodiments of the speech recognition engine may include a dynamic time warping based engine and a Weighted Finite State Transducer (WFST) based engine. One or more speech recognition models and one or more speech recognition engines may be used to process the extracted representative features of the front-end preprocessor to produce intermediate recognition results (e.g., phonemes, speech strings, and sub-words) and ultimately text recognition results (e.g., words, word strings, or symbol sequences). In some embodiments, the speech input may be processed at least in part by a third party service or on a user device (e.g., device 104,200,400 or 600) to produce a recognition result. Once STT processing module 730 generates a recognition result (e.g., a word, or a string of words, or a sequence of symbols) that includes a text string, the recognition result may be passed to natural language processing module 732 for intent inference.

More details regarding the processing of Speech to text are described in U.S. utility model patent application serial No. 13/236,942 entitled "Consolidating Speech recognitions Results" filed on 20/9/2011, the entire disclosure of which is incorporated herein by reference.

In some embodiments, STT processing module 730 may include and/or access a vocabulary of recognizable words via speech-to-alphabet conversion module 731. Each vocabulary word may be associated with one or more candidate pronunciations for the word represented in the speech recognition alphabet. In particular, the vocabulary of recognizable words may include words associated with multiple candidate pronunciations. For example, the vocabulary may include

And

the word "tomato" associated with the candidate pronunciation. Also, the vocabulary words may be associated with a customized candidate pronunciation based on a previous speech input from the user. Such customized candidate pronunciations can be stored in STT processing module 730 and can be associated with a particular user via a user's profile on the device. In some embodiments, candidate issues for a word may be determined based on the spelling of the word and one or more linguistic and/or phonetic rulesAnd (4) sound. In some embodiments, candidate pronunciations may be generated manually, for example, based on known canonical pronunciations.

In some embodiments, candidate pronunciations may be ranked based on their common points. For example, candidate pronunciation

Can be compared

The ranking is higher because the former is a more common pronunciation (e.g., among all users, for users in a particular geographic area, or for any other appropriate subset of users). In some embodiments, the candidate pronunciations may be ranked based on whether the candidate pronunciations are customized candidate pronunciations associated with the user. For example, the custom candidate pronunciation may be ranked higher than the canonical candidate pronunciation. This is useful for identifying proper nouns with unique pronunciations that deviate from the canonical pronunciation. In some embodiments, the candidate pronunciations may be associated with one or more speech characteristics, such as geographic origin, nationality, or ethnicity. For example, candidate pronunciations

Can be associated with the United states to make candidate pronunciation

May be associated with the united kingdom. Moreover, the ranking of the candidate pronunciations may be based on one or more characteristics of the user (e.g., geographic origin, nationality, ethnicity, etc.) in the user's profile stored on the device. For example, the determination may be based on a profile of a user associated with the United states. Based on the user associated with the United states, a candidate pronunciation (associated with the United states)

May be ranked higher than the candidate pronunciation (associated with the united kingdom). In some embodiments, one of the ranked candidate pronunciations may be selected as the predicted pronunciation (e.g., the most likely pronunciation).

When a speech input is received, the STT processing module 730 may be used to determine phonemes corresponding to the speech input (e.g., using an acoustic model) and then attempt to determine words that match the phonemes (e.g., using a language model). For example, if STT processing module 730 may first identify a phoneme sequence corresponding to a portion of speech input

Then it may be determined that the sequence corresponds to the word "tomato" based on the lexical index 744.

In some embodiments, STT processing module 730 may use approximate matching techniques to determine words in the utterance. Thus, for example, STT processing module 730 may determine a phoneme sequence

Corresponding to the word "tomato", even if the particular phoneme sequence is not one of the candidate phoneme sequences for that word.

The natural language processing module 732 ("natural language processor") of the digital assistant may take the sequence of words or symbols ("symbol sequence") generated by the STT processing module 730 and attempt to associate the symbol sequence with one or more "actionable intents" identified by the digital assistant. An "actionable intent" may represent a task that may be performed by a digital assistant and that may have an associated task flow implemented in task flow model 754. The associated task stream may be a series of programmed actions and steps taken by the digital assistant to perform the task. The capability scope of the digital assistant may depend on the number and variety of task flows that have been implemented and stored in task flow model 754, or in other words, on the number and variety of "actionable intents" that the digital assistant recognizes. However, the effectiveness of a digital assistant may also depend on the assistant's ability to infer the correct "executable intent or intents" from a user request expressed in natural language.

In some embodiments, natural language processing module 732 may receive context information associated with the user request, such as from I/O processing module 728, in addition to the sequence of words or symbols obtained from STT processing module 730. The natural language processing module 732 may optionally use the context information to clarify, supplement, and/or further define information contained in the sequence of symbols received from the STT processing module 730. The contextual information may include, for example, user preferences, hardware and/or software states of the user device, sensor information collected before, during, or shortly after the user request, previous interactions (e.g., conversations) between the digital assistant and the user, and so forth. As described herein, contextual information may be dynamic and may vary with time, location, content of a conversation, and other factors.

In some embodiments, natural language processing may be based on ontology 760, for example. Ontology 760 may be a hierarchical structure that includes a number of nodes, each node representing an "actionable intent" or an "attribute" related to one or more of an "actionable intent" or other "attribute". As described above, an "actionable intent" may represent a task that a digital assistant is capable of performing, i.e., that is, "actionable" or can be performed. An "attribute" may represent a parameter associated with a sub-aspect of an executable intent or another attribute. The connection between the actionable intent node and the property node in ontology 760 may define how the parameters represented by the property node pertain to the task represented by the actionable intent node.

In some embodiments, ontology 760 may be composed of actionable intent nodes and property nodes. Within ontology 760, each actionable intent node may be connected to one or more property nodes directly or through one or more intermediate property nodes. Similarly, each property node may be connected directly to one or more actionable intent nodes or through one or more intermediate property nodes. For example, as shown in FIG. 7C, ontology 760 can include a "restaurant reservation" node (i.e., an actionable intent node). The attribute nodes "restaurant," "date/time" (for reservation), and "number of people with whom" may all be directly connected to the actionable intent node (i.e., "restaurant reservation" node).

Further, the property nodes "cuisine," price interval, "" phone number, "and" location "may be child nodes of the property node" restaurant, "and may each be connected to the" restaurant reservation "node (i.e., actionable intent node) through an intermediate property node" restaurant. As another example, as shown in fig. 7C, ontology 760 may also include a "set reminder" node (i.e., another actionable intent node). The property node "date/time" (for set reminders) and "subject" (for reminders) may both be connected to the "set reminders" node. Since the property "date/time" may be related to both the task of making restaurant reservations and the task of setting reminders, the property node "date/time" may be connected to both the "restaurant reservation" node and the "setting reminders" node in ontology 760.

The actionable intent node, along with the concept nodes to which it connects, may be described as a "domain". In the present discussion, each domain may be associated with a respective executable intent and refer to a set of nodes (and relationships between those nodes) associated with a particular executable intent. For example, ontology 760 shown in FIG. 7C may include an embodiment of a restaurant reservation field 762 and an embodiment of a reminder field 764 within ontology 760. The restaurant reservation domain includes the actionable intent node "restaurant reservation," the property nodes "restaurant," date/time, "and" party size, "and the child property nodes" cuisine, "" price interval, "" phone number, "and" location. The reminder field 764 may include the executable intent node "set reminder" and the property nodes "subject" and "date/time". In some embodiments, ontology 760 may be composed of multiple domains. Each domain may share one or more attribute nodes with one or more other domains. For example, in addition to the restaurant reservation field 762 and reminder field 764, a "date/time" property node may be associated with many different fields (e.g., a scheduling field, a travel reservation field, a movie tickets field, etc.).

Although fig. 7C shows two exemplary domains within ontology 760, other domains may include, for example, "find movie," "initiate phone call," "find direction," "arrange meeting," "send message," and "provide answer to question," "read list," "provide navigation instructions," "provide instructions for task," etc. The "send message" field may be associated with a "send message" executable intent node, and may also include attribute nodes such as "one or more recipients," message type, "and" message body. The attribute node "recipient" may further be defined, for example, by child attribute nodes such as "recipient name" and "message address".

In some embodiments, ontology 760 may include all domains (and thus actionable intents) that a digital assistant is able to understand and act upon. In some embodiments, ontology 760 may be modified, such as by adding or removing entire domains or nodes, or by modifying relationships between nodes within ontology 760.

In some embodiments, nodes associated with multiple related executables may be clustered under a "super domain" in the awareness ontology 760. For example, a "travel" super-domain may include a cluster of attribute nodes and actionable intent nodes related to travel. Executable intent nodes related to travel may include "airline reservations," "hotel reservations," "car rentals," "route planning," "finding points of interest," and so forth. An actionable intent node under the same super-domain (e.g., a "travel" super-domain) may have multiple attribute nodes in common. For example, executable intent nodes for "airline reservations," hotel reservations, "" car rentals, "" route plans, "and" find points of interest "may share one or more of the attribute nodes" starting location, "" destination, "" departure date/time, "" arrival date/time, "and" peer count.

In some embodiments, each node in ontology 760 can be associated with a set of words and/or phrases that are related to the attribute or actionable intent represented by the node. The respective set of words and/or phrases associated with each node may be a so-called "vocabulary" associated with the node. The respective set of words and/or phrases associated with each node may be stored in the lexical index 744 associated with the property or actionable intent represented by the node. For example, returning to fig. 7B, the vocabulary associated with the node of the "restaurant" attribute may include words such as "food," "drinks," "cuisine," "hungry," "eating," "pizza," "fast food," "meal," and so forth. As another example, the vocabulary associated with the node of the actionable intent of "initiate a phone call" may include words and phrases such as "call," "make a call," "dial," "make a call with … …," "call the number," "make a call to," and so on. The vocabulary index 744 may optionally include words and phrases in different languages.

Natural language processing module 732 may receive a sequence of symbols (e.g., a text string) from STT processing module 730 and determine which nodes are involved in words in the sequence of symbols. In some embodiments, if a word or phrase in the sequence of symbols is found to be associated with one or more nodes in ontology 760 (via lexical index 744), the word or phrase may "trigger" or "activate" those nodes. Based on the number and/or relative importance of the activated nodes, natural language processing module 732 may select one of the actionable intents as a task that the user intends for the digital assistant to perform. In some embodiments, the domain with the most "triggered" nodes may be selected. In some embodiments, the domain with the highest confidence (e.g., based on the relative importance of its respective triggered node) may be selected. In some embodiments, the domain may be selected based on a combination of the number and importance of triggered nodes. In some embodiments, additional factors are also considered in selecting a node, such as whether the digital assistant has previously correctly interpreted a similar request from the user.

The user data 748 may include user-specific information such as user-specific vocabulary, user preferences, user addresses, the user's default and second languages, the user's contact list, and other short-term or long-term information for each user. In some embodiments, the natural language processing module 732 may use user-specific information to supplement information contained in the user input to further define the user intent. For example, for a user request "invite my friend to my birthday party," natural language processing module 732 can access user data 748 to determine which people "friends" are and where and when the "birthday party" will be held without requiring the user to explicitly provide such information in their request.

Additional details of Searching for ontologies based on symbolic strings are described in U.S. utility model patent application serial No. 12/341,743, entitled "Method and Apparatus for Searching Using An Active Ontology," filed 22/2008, the entire disclosure of which is incorporated herein by reference.

In some embodiments, once natural language processing module 732 identifies an actionable intent (or domain) based on the user request, natural language processing module 732 may generate a structured query to represent the identified actionable intent. In some embodiments, the structured query may include parameters for one or more nodes within the domain that can execute the intent, and at least some of the parameters are populated with specific information and requirements specified in the user request. For example, the user may say "help me reserve a seat at 7 pm in a sushi shop. In this case, the natural language processing module 732 can correctly recognize the executable intention as "restaurant reservation" based on the user input. According to the knowledge body, the structured query of the "dining room reservation" domain may include parameters such as { cuisine }, { time }, { date }, { party size }, and the like. In some embodiments, based on the speech input and text obtained from the speech input using STT processing module 730, natural language processing module 732 may generate a partially structured query for the restaurant reservation field, where the partially structured query includes parameters { cuisine ═ sushi class "} and { time ═ 7 points-later" }. However, in this embodiment, the user utterance contains insufficient information to complete a structured query associated with the domain. Thus, based on the currently available information, other necessary parameters such as { co-workers } and { date } may not be specified in the structured query. In some embodiments, natural language processing module 732 may populate some parameters of the structured query with the received contextual information. For example, in some embodiments, if the user requests a sushi store that is "nearby," the natural language processing module 732 may populate the { location } parameter in the structured query with GPS coordinates from the user device.

In some embodiments, the natural language processing module 732 may pass the generated structured query (including any completed parameters) to the task flow processing module 736 ("task flow processor"). Task stream processing module 736 may be configured to receive the structured query from natural language processing module 732, complete the structured query (if necessary), and perform the actions required to "complete" the user's final request. In some embodiments, the various processes necessary to accomplish these tasks may be provided in task flow model 754. In some embodiments, task flow model 754 may include procedures for obtaining additional information from a user, as well as task flows for performing actions associated with an executable intent.

As described above, to complete a structured query, the task flow processing module 736 may need to initiate additional conversations with the user in order to obtain additional information and/or clarify potentially ambiguous utterances. When such interaction is necessary, the task flow processor 736 may invoke the dialog flow processor module 734 to engage in a dialog with the user. In some embodiments, the dialog flow processor module 734 may determine how (and/or when) to request additional information from the user, and receive and process the user response. The questions may be provided to the user and answers may be received from the user through the I/O processing module 728. In some embodiments, the dialog processing module 734 may present dialog output to the user via audio and/or video output and receive input from the user via spoken or physical (e.g., click) responses. Continuing with the above-described embodiment, when the task flow processing module 736 invokes the conversation flow processing module 734 to determine "number of peers" and "date" information for a structured query associated with the domain "restaurant reservation," the conversation flow processing module 734 may generate a query such as "a few digits in a line? "and" which day to subscribe? "and the like to the user. Upon receiving an answer from the user, the dialog flow processing module 734 may populate the structured query with missing information or pass the information to the task flow processing module 736 to complete the missing information from the structured query.

Once the task flow processing module 736 has completed the structured query against the executable intent, the task flow processing module 736 may continue to perform the final task associated with the executable intent. Thus, the task flow processing module 736 may perform the steps and instructions in the task flow model according to the specific parameters contained in the structured query. For example, a task flow model for the actionable intent "restaurant reservation" may include steps and instructions for contacting a restaurant and actually requesting a reservation for a particular colleague at a particular time. For example, by using structured queries such as: { restaurant reservation, restaurant, ABC cafe, date 2012/3/12, time 7 pm, peer 5 }, task flow processing module 736 may perform the following steps: (1) a server or restaurant reservation system such as that logged into ABC cafe

(2) Entering date, time, and peer information in a form on a website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar.

In some embodiments, the task flow processing module 736 may complete the tasks requested in the user input or provide the informational answers requested in the user input with the assistance of the service processing module 738 ("service processing module"). For example, the service processing module 738 may initiate phone calls, set calendar entries, invoke map searches, invoke or interact with other user applications installed on the user device, and invoke or interact with third-party services (e.g., restaurant reservation portals, social networking sites, bank portals, etc.) on behalf of the task flow processing module 736. In some embodiments, the protocols and Application Programming Interfaces (APIs) required for each service may be specified by respective ones of service models 756. The service handling module 738 may access the appropriate service model for the service and generate a request for the service according to the service model according to the protocols and APIs required by the service.

For example, if a restaurant has enabled an online reservation service, the restaurant may submit a service model that specifies the necessary parameters to make a reservation and an API to communicate the values of the necessary parameters to the online reservation service. The service processing module 738, when requested by the task flow processing module 736, may use the web address stored in the service model to establish a network connection with the online booking service and send the necessary parameters for booking (e.g., time, date, number of co-workers) to the online booking interface in the format of the API of the online booking service.

In some embodiments, the natural language processing module 732, the dialog processing module 734, and the task flow processing module 736 may be used jointly and iteratively to infer and define the user's intent, to obtain information to further clarify and refine the user's intent, and to ultimately generate a response (i.e., output to the user, or complete a task) to satisfy the user's intent. The generated response may be a dialog response to a speech input that at least partially satisfies the user's intent. Also, in some embodiments, the generated response may be output as a voice output. In these embodiments, the generated response may be sent to a speech synthesis module 740 (e.g., a speech synthesizer) where it may be processed to synthesize a conversational response in speech form. In other embodiments, the generated response may be data content related to satisfying the user request in the voice input.

The speech synthesis module 740 may be configured to synthesize speech output for presentation to a user. The speech synthesis module 740 synthesizes speech output based on text provided by the digital assistant. For example, the generated dialog response may be in the form of a text string. The speech synthesis module 740 may convert the text strings into audible speech output. The speech synthesis module 740 may use any suitable speech synthesis technique to generate speech output from text, including, but not limited to, concatenative synthesis, unit-selection synthesis, diphone synthesis, domain-specific synthesis, formant synthesis, articulatory synthesis, Hidden Markov Model (HMM) based synthesis, and sinusoidal wave synthesis. In some embodiments, speech synthesis module 740 may be configured to synthesize respective words based on phoneme strings corresponding to the words. For example, the phoneme string may be associated with a word in the generated dialog response. The phoneme string may be stored in metadata associated with the word. Speech synthesis model 740 may be configured to directly process the phone strings in the metadata to synthesize words in speech.

In some embodiments, instead of (or in addition to) using the speech synthesis module 740, speech synthesis may be performed on a remote device (e.g., the server system 108) and the synthesized speech may be sent to a user device user for output to the user. This may occur, for example, in some implementations, where the output of the digital assistant occurs at the server system. And because server systems typically have more processing power or resources than user devices, it is possible to obtain higher quality speech output than is practical for client-side synthesis.

Additional details regarding digital assistants can be found in U.S. patent application No. 12/987,982 entitled "Intelligent Automated Assistant" filed on 10.1.2011 and U.S. patent application No. 13/251,088 entitled "Generating and Processing Task Items at recovery Tasks to performance" filed on 30.9.2011, the entire disclosures of which are incorporated herein by reference.

4. Process for operating a digital assistant

Fig. 8 illustrates a flow diagram of a process 800 for operating a digital assistant, according to some embodiments. Process 800 is performed, for example, using one or more electronic devices (e.g.,

devices

104, 108, 200, 400, or 600) implementing a digital assistant. In some embodiments, process 800 is performed using a client-server system (e.g., system 100), and the blocks of process 800 may be divided in any manner between a server (e.g., DA server 106) and a client device. Thus, although portions of process 800 are described herein as being performed by a particular device of a client-server system, it should be understood that process 800 is not so limited. In other embodiments, process 800 is performed using only a client device (e.g., user device 104). In process 800, some blocks are optionally combined, the order of some blocks is optionally changed, and some blocks are optionally omitted. In some embodiments, additional steps may be performed in conjunction with process 800.

At block 805, natural language user input is received by a user device, such as user device 104 of FIG. 1. The natural language input is, for example, a speech input or a text input. In some embodiments, natural language input may include requesting a user device and/or other device to perform a task. For example, in the embodiment "dispatch a vehicle to 1200 avenues," the natural language input may include requesting the user device to pick up a vehicle using the ride reservation service. In some embodiments, the natural language input may also specify one or more parameters of the requested task. "1200 avenues" for example specifies a predetermined pick-up location for a car. In an embodiment, "order from Domino's," the natural language input may include requesting the user device to order from a pizza chain of Domino's. "my usual" may also specify, from a contextual perspective, that a predetermined food is desired.

At block 810, an intent and optionally one or more parameters associated with the intent are identified. The intent and parameters may be obtained, for example, from natural language user input. As noted, the intent may correspond to a task requested by the user. Thus, identifying (e.g., determining) the intent may include identifying a task specified in the natural language user input and/or inferring an intent corresponding to the requested task based on the language and/or context of the natural language user input. The intent may correspond to any kind of task performed by the user device, and in particular may correspond to a task performed by one or more applications of the user device, as described in more detail below.

In some embodiments, the intent is associated with (e.g., included in) one or more domains (e.g., an intent category, a set of intents). Each domain may include a particular kind of intent, allowing intuitive grouping of intents. For example, an intent to reserve a car, cancel a car reservation, and/or any other intent related to a task commonly associated with a ride reservation may be included in the ride reservation field. In other embodiments, the intent to board the flight, cancel the flight, reschedule the flight, obtain flight information, and/or any other intent related to a task commonly associated with the flight travel may be included in the flight travel domain. In other embodiments, directions may be provided, the intent to obtain road condition information, and/or any other intent related to a task commonly associated with navigation may be included in the navigation field. In other embodiments, the intent to issue a payment, receive a payment, and/or any other intent related to a task commonly associated with a financial transaction may be included in the financial transaction domain.

Identifying parameters may include identifying portions of the natural language input that specify a manner in which a task corresponding to the intent is to be performed. The parameters may specify, for example, a location (e.g., address or point of interest), a time, a date, a contact, a type, text (e.g., to be inserted into an email or message), a quantity (e.g., distance, money), and, in some cases, a name of a software application that performs the task. The parameters may also specify other conditions for the task, embodiments of which are described herein.

The parameters may be identified using one or more detectors, for example. Each of the detectors may be configured to analyze natural language user input (e.g., a textual representation of the natural language user input) and identify one or more respective data types. For example, a first detector may be configured to identify a user contact and a second detector may be configured to identify an address. Other detectors may identify data types including, but not limited to, phone number, name, person of interest, place of interest, URL, time, flight number, package tracking number, and date.

In some embodiments, words of the customized vocabulary may be identified as parameters. For example, one or more detectors may be configured to identify customized vocabulary for one or more applications, respectively. The customized vocabulary of the application may include the name of the application (e.g., Uber, Lyft, Instagram, Flickr, WeChat, WhatsApp, LINE, Viber) and/or may include other terms uniquely associated with the application (e.g., UberX, DM, Lyftline, ZipCar).

In some embodiments, one or more applications may register with the application registration service. The services may be hosted by the server 108 and/or the user device 104 or otherwise accessible by the server 108 and/or the user device 104. Registering in this manner may include specifying one or more custom lexical terms associated with the application and, optionally, one or more language models of the custom lexical terms. The language model may, for example, provide one or more pronunciations for each of the custom lexical terms. The language model provided in this manner may then be used to help identify the use of such custom terms during analysis of natural language user input. In some embodiments, the customized vocabulary may be included in the vocabulary index 744 (FIG. 7B).

In some embodiments, one or more parameters may be inferred from natural language user input. For example, in the example "please drive me to the stadium," one parameter associated with intent may be inferred to be the user's location. In another embodiment, "refund john rice," one parameter that may be inferred to be associated with intent is the number of money.

In some embodiments, an intent is identified based on natural language user input, and then a parameter associated with the intent is identified. Also, in some embodiments, parameters not associated with intent are not identified. For example, an intent corresponding to a heading (e.g., driving direction) may be associated with parameters that specify one or more locations (e.g., origin and/or destination) and/or transit modes. Consider, for example, the example "please give me real-time driving directions to 1200 avenues". In this way, the recognized intention corresponds to the task of providing the direction, "driving" is a parameter that specifies the relay mode, and "1200 avenues" is a parameter that specifies the location. The portion of the user input that is "real-time" is not a parameter associated with the identified intent. Thus, while "real-time" may be some intended valid parameter, it will not be recognized as a parameter during operation.

In other embodiments, one or more parameters may be identified first and an intent may be identified based on the identified one or more parameters. In other embodiments, the intent and the parameters associated with the intent may be identified simultaneously.

In some embodiments, the intent and parameters of the natural language user input are recognized by a user device, such as user device 104 of fig. 1. In other embodiments, the user device provides the natural language user input (or a representation thereof) to a server, such as server 108 of fig. 1, and the server identifies (e.g., determines) the intent and parameters of the natural language user input, as described. The server then provides (transmits) the identified intent and parameters to the user device.

Optionally, once the intent and any parameters are identified, the user device determines the identified intent and/or parameters, and in some cases, inferred parameters, to the user of the user device. Determining in this manner may include prompting the user to determine the identified intent and all identified parameters associated with the intent in response to the natural language query. For example, in response to a user input of "please drive me to the airport," the user device may provide a natural language query of "do you want to drive from your current location to the airport? "the natural language query provided by the user device may be provided to the user as text using a touch-sensitive display of the user device and/or may be provided to the user as speech using an audio output component of the user device (e.g., speaker 211 of fig. 2). The user may respond to the natural language query, for example, by providing natural language user input to the user device.

Optionally, the user equipment determines the respective parameters. In some embodiments, this may include prompting the user to determine one or more parameters. For example, in response to a user input "please drive me to a bus stop," the user device may provide a natural language query "do you say at a guest stop? "as another example, in response to a user input" pay john $5, "the user device may provide a natural language query" do you say john smith? "the natural language query provided by the user device may be provided to the user as text using a touch-sensitive display of the user device and/or may be provided to the user as speech using an audio output component of the user device (e.g., speaker 211 of fig. 2). The user may respond to the natural language query, for example, by providing natural language user input to the user device.

In some embodiments, the one or more parameters are context. Accordingly, the user device may determine (e.g., resolve) one or more parameters based on the context information. The context information may be context information of the user device (or any data stored therein) and/or context information of a user of the user device. For example, the natural language user input may say "please send a vehicle to my home". Because "my home" is a contextual parameter and does not specify a real location, the user device may determine the location of the user device and identify the determined location as a parameter (i.e., instead of "my home"). As another example, the natural language user input may speak "call back to him". Because "he" is a context parameter and does not specify a particular contact, the user device may determine the contact that "he" wants and pass the contact as a parameter (i.e., instead of "he").

In some embodiments, the identified intent and parameters are implemented as intent objects. When so implemented, each intent object is an object (e.g., a data structure, a programming object) and corresponds to a respective intent. Each intent object may include one or more fields (e.g., instance variables) that respectively correspond to one or more parameters. For example, an intent object corresponding to an intent in a ride reservation intent can generate (e.g., instantiate) pseudo code as follows:

as will be appreciated by those skilled in the art, the above pseudo-code is exemplary and the intended object may be implemented in other ways.

By implementing the intent as an intent object, the intent may not be linguistically known. As described, the intent may be obtained from natural language user input. Thus, the same intent can be obtained from natural language input provided in any number of spoken languages. For example, an English-language natural language user input of "find Uber to 1200 park avenues" and a German natural language user input of "find Uber to 1200 park avenues" each originate from the same intent (and from the same intent object) that was identified.

At block 815, a software application associated with the intent may be identified (e.g., selected). Typically, this may include identifying one or more software applications configured to perform tasks corresponding to the intent.

In some embodiments, identifying the software application may include determining one or more domains corresponding to the intent and identifying the application corresponding to the domain. As described, one or more software applications may register with the application registration service. Registering in this manner may include specifying which domains (e.g., ride schedule domain, flight travel domain, navigation domain) correspond to the software application. The application corresponding to the domain may support each of the domain-specified intents or may support only some of the domain-specified intents. In some embodiments, the applications may be registered with respective intents, and identifying the applications may include identifying the applications that correspond to the identified intents.

In some embodiments, the application is identified based on the identified parameters. For example, based on a user input of "send a black car to an airport," an intent to order a car using a ride reservation service may be identified along with a parameter specifying the type of vehicle (i.e., "black car"). While several available applications may typically be configured to book a car, those applications configured to book a "black car" may be identified. As another example, sending a message to the collar "hello" based on a user input. Although several applications may be configured to send messages, those applications having contact information for the contact of Sam may be identified.

In some embodiments, only applications that are installed on and/or accessible to the user device are identified. For example, while several available applications may be configured to perform tasks, those applications that may access the user device are identified. Accessible applications include applications that are resident and/or installed on the user device and also include applications that are remotely accessible by the user device, for example, on one or more other devices.

Thus, in at least some embodiments, the identified application is an application that is configured to perform a task according to the identified parameters and that is accessible to the user device. In some embodiments, multiple applications may satisfy these criteria, however the user device may wish to identify fewer applications, or a single application. Thus, the application may also be identified based on previous usage of the application by the user device. In some embodiments, for a given intent, the application most recently used to perform the task corresponding to the intent is identified. Consider a user input "call to robo," where intent (i.e., making a call) and parameters (i.e., "robo") may be identified. In this embodiment, the application that was last used to place a call is identified. In other embodiments, the applications most commonly used to perform the tasks corresponding to the intent are identified. For the same embodiment, the application most commonly used to place a call is identified. In some embodiments, the application is further selected based on one or more parameters. For example, the application most recently used to place a call to a contact in robo is identified, or the application most commonly used to place a call to a contact in robo is identified. In some embodiments, a default application may be specified for one or more particular tasks and/or parameters, for example, by a user or a digital assistant. The user may specify, for example, to use the first application when calling the first contact and to use the second application when calling the second contact.

As described, in some cases, the natural language user input may include a customized vocabulary that is identifiable as one or more parameters. In some embodiments, such customized vocabulary includes the application name, and thus the application may be identified based on the presence of the customized vocabulary in the input. For example, a natural language user input may say "call robo using Skype". In response, the software application Skype may be the identified software application. In another embodiment, the natural language input may speak "play a cape on Spotify" and in response, the software application Spotify may be the identified software application.

The customized vocabulary may also include terms uniquely associated with the application. Thus, such terms may be identified as parameters and used to identify applications. In the embodiment "assign me an UberX," UberX is a term of the custom vocabulary of the software application Uber and as a result Uber is identified as a software application. In the embodiment "push me wishes the shark team to cup," Tweet "is a term for the custom vocabulary of the software application Twitter and as a result Twitter is identified as a software application.

In some embodiments, applications that are not configured to perform tasks according to the identified parameters may be accessed by the user device. As a result, the user device may access (e.g., download and/or install) an application configured to perform a task in accordance with the identified parameters. In some embodiments, the user device may identify a plurality of software applications and provide a list of software applications to the user. The user may select one or more applications and the user device may access the one or more selected applications.

At block 820, the intent and parameters are provided to the identified software application. In some embodiments, the intent and parameters are provided to the software application as intent objects.

In some embodiments, the intents and parameters may be selectively provided to the software application based on the state of the user device. As an embodiment, the intent and parameters may be selectively provided to the user device based on whether the user device is in a locked state. In some embodiments, the application may be allowed to receive specific intents and parameters while the device is in a locked state. In other embodiments, the application may be allowed to receive specific intents and parameters when the user device is not in a locked state. Whether an application can receive a particular intent for a particular state of a user device can be specified by a software application, for example, during a registration process with an application registration service.

At block 825, the user device may receive one or more answers from the software application. In some embodiments, the user device receives a reply for each parameter provided to the software application. Each response may indicate, for example, whether the parameter is valid or whether additional user input is required. If the answer indicates that the parameter is valid, no further action is taken with respect to the parameter.

If the response provided by the software application does not indicate that the parameter is valid, the response may indicate that clarification of the parameter is required. For example, the parameters may not be appropriate (i.e., invalid) and the software application may request additional input from the user. As an example, consider a user input of "Send a blue car to 1200 avenues". Although the user requests a blue car, the ride reservation application (e.g., Uber, Lyft) may not allow the selection of a blue car (e.g., the blue car may not be a supported parameter or the application may determine that no blue car is currently available). Thus, in the event that a parameter is not appropriate (e.g., a user specifies an invalid type of car), the application may request that an appropriate (e.g., valid) value be provided for the parameter (e.g., car type). For example, referring to FIG. 10A, based on a response from the software application indicating that the parameters are not appropriate, the user device may provide a natural language query 1002 prompting the user to select valid parameters. In this embodiment, the user device providing the natural language query asks the user to select a valid car type. The natural language query 1002 may be provided to the user as text using a touch-sensitive display of the user device and/or may be provided to the user as speech using an audio output component of the user device. As illustrated, in some embodiments, the user device may provide (e.g., display) one or more candidate parameters 1004 to the user for selection. In the present embodiment, the candidate parameters 1004 include "pre-calculation", "black car", "SUV", and "share". In some embodiments, the candidate parameters 1004 provided in this manner may be provided by a software application. The user device may select one of the candidate parameters by providing a touch input and/or providing a natural language user input to the user device, and in response, the user device may provide the selected candidate parameter to the software application.

If the answer provided by the software application does not indicate that the parameter is valid, the answer may indicate that disambiguation of the parameter is required. As a result, the user device may request input from the user to disambiguate the parameters. In an embodiment where the user enters "call tom," the parameters of the contact tom may be provided to a software application configured to place a call. If the software application determines that there are multiple contacts named "tom" when determining the contact information (e.g., phone number) for the contact tom, the software application may request the user to specify which "tom" is desired. As part of the request, the software application includes a disambiguation list having a plurality of candidate parameters in the reply. The user device may provide a natural language query asking the user to select a candidate parameter. Also, candidate parameters for the disambiguation list may be displayed to allow user selection. The user may select one of the candidate parameters, for example, by providing touch input and/or providing natural language user input, and the user device may provide the selected candidate parameter to the software application.

Performing a task may require a user to specify one or more parameters of a particular type. In some embodiments, the natural language user input may ignore one or more requested parameters. Thus, the software application may optionally provide one or more parameters that the response indicates, if present, that the requirements are not specified. Consider the user input "dispatch a car to 1200 avenues". Although automobiles are typically requested using user input, a ride reservation application identified based on the user input may request selection of a particular type of automobile. Thus, in the event that a parameter is missing (e.g., the user does not specify the type of car), the application may request that an approval value be provided for the parameter (e.g., the type of car). Referring again to FIG. 10A, the user device may provide a native language query 1002 prompting the user to specify valid parameters. Thereafter, the user may select a candidate parameter, for example, from a list of candidate parameters, and the selected parameter may be provided to the software application, as described above.

Once the software application indicates that each parameter is valid and no additional information is requested, the user device may confirm the intent to the software application at block 830. In particular, the user device may request a notification that, given an intent and parameters associated with the intent, the software application may successfully perform a task corresponding to the intent.

Once the software application provides the notification indicating that the software application may perform the task, optionally, the user device confirms the intent to the user. For example, the user device may provide a natural language query "can i get uberX in your location, do i want to request it? The user may confirm or reject the intent by touch input or natural language input. In some embodiments, the notification provided by the software application may include information provided to the user. The information may, for example, allow the user to make a more informed decision when prompted for confirmation. For example, the user device may provide a natural language query "can i paiberx get to your location after 9 minutes, do i want to request it? "

Thereafter, the user device causes (e.g., instructs) the software application to perform the task corresponding to the intent in accordance with the parameters.

At block 835, the user device receives a result reply from the software application indicating whether the software application successfully performed the task. The result answer indicating that the task was not performed may also indicate one or more reasons for the failure. In some embodiments, the user device may provide an output, such as a natural language output, to the user indicating one or more reasons for the failure.

The result answer indicating successful execution of the task may include one or more answer items. Each response item may be a result (e.g., received, generated) determined by the software application when performing the task. For example, the answer items corresponding to a car scheduled with a ride reservation application may include car type, license plate number, driver name, arrival time, current car location, pickup location, destination, estimated journey time, estimated cost and estimated journey route, service type (e.g., Uber pool vs. As another example, responsive items corresponding to initiating an exercise session with a fitness application may include confirming that the session has been initiated, an exercise duration, an activity type, and one or more goals.

In some embodiments, one or more response items may be provided to the user. Referring to FIG. 10B, for example, one or more response items may be provided to the user as natural language output 1012, text input, and/or audio output. The answering item may also be provided visually. For example, a map 1014 of the estimated trip route may be provided to the user. It should be appreciated that the response items may be provided to the user in any desired manner.

In some embodiments, the software application may specify the manner in which one or more response items are provided to the user. That is, the software application may determine the manner in which the response items are displayed and/or spoken to the user and the digital assistant may provide each response item accordingly.

In some embodiments, the software application may specify a manner of providing the response item using the UI extension of the digital assistant. The user device may, for example, provide a software application with a set of view controller parameters (e.g., fields that may be provided to the view controller for display), and in response, the software application may provide a set of view controller parameter values. The set of view controller parameter values may indicate which answer items are to be displayed in the various fields of the view controller and/or the manner in which the answer items are displayed in each field.

In other embodiments, the digital assistant can determine how to provide the response item. In other embodiments, the software application is invoked such that the user can interact directly with the software application. In some embodiments, invoking the application in this manner may terminate the session with the digital assistant.

In some embodiments, the license of the software application is verified, e.g., before the intent and parameters are provided to the software application. The user device may, for example, determine whether a software application is allowed to access data associated with a particular intent. The determination may be made based on permissions configured on the user device. In the embodiment "assign a black car at my location," location "may be a contextual parameter that requires contextual information (e.g., location data) of the user device. Thus, before resolving the location of the user device and providing the location as a parameter to the software application, the user device may first determine whether the software application is allowed to access the information. If the software application is allowed to access the data, operation proceeds as described. If the software application is not allowed to access the data, the intent and parameters are not provided to the software application and no task is performed.

In some embodiments, the natural language input may include a plurality of task requests. Accordingly, based on the native language input, multiple intents and/or multiple applications may be identified. Optionally, parameters associated with each intent are also identified. The natural language input "please drive me to the airport and tell me the status of the flight" may include, for example, both an intent to book a car and an intent to acquire the status of a user's flight. In some embodiments, the tasks corresponding to each intent may be performed sequentially or simultaneously.

In some embodiments, the natural language input may include a plurality of related task requests. For example, in some embodiments, a request task input by a natural language user may be dependent on completion of another request task input by the natural language. In the example "email me directions to airport" two tasks are requested: the first task provides directions and the second task sends emails. The parameter "i me" specifies a particular contact and is a parameter that specifies an intent to send an email, and the parameter "airport" is a parameter that specifies a destination that provides the intent to point to. The second task (email) depends on the first task (providing directions) since sending directions by email requires providing directions first. Therefore, the task of acquiring the pointing is performed first.

In some embodiments, intent may be provided between applications. For example, an application may provide an intent object to another application to cause the application to perform a task. In this embodiment, both intentions (providing directions and email) may be provided to the map application to provide the requested directions. The second intent or the intent to send the email may be provided as a parameter to a mapping application, for example. The map application may provide the requested directions according to the first intent or the intent to provide directions. The map application may then provide the second intent to the email application, e.g., including the pointed-to intent object as a parameter. In response, the email application may direct the sending by email as requested.

As another example, a user may provide a user input "please drive me to a shark team race" while browsing a sports application (e.g., an ESPN application). In response, the sports application with information about the game may communicate the intent (e.g., the scheduled car) and parameters (e.g., the game address) to the ride reservation application. In some embodiments, intents and parameters may be provided as intent objects.

As another example, the user may provide the user input "pay me brother $ 5" when using the ride reservation application. In response, the ride reservation application may pass the intent (e.g., payment) and parameters ($5) to a payment application (e.g., PayPal, Venmo, online payment service). As described, in some embodiments, intents and parameters may be provided as intent objects.

Fig. 9 illustrates a flow diagram of a process for operating a digital assistant, according to some embodiments. Process 900 may be used, for example, to implement at least a portion of process 800 of FIG. 8, including but not limited to block 815 and/or block 820 of FIG. 8. Process 900 is performed, for example, using one or more electronic devices (e.g., devices 104,108,200,400 or 600) implementing a digital assistant. In some embodiments, process 900 is performed using a client-server system (e.g., system 100), and the blocks of process 900 may be divided in any manner between the server (e.g., DA server 106) and the client devices. Thus, although portions of process 900 are described herein as being performed by a particular device of a client-server system, it should be understood that process 900 is not so limited. In other embodiments, process 900 is performed using only a client device (e.g., user device 104). In process 900, some blocks are optionally combined, the order of some blocks is optionally changed, and some blocks are optionally omitted. In some embodiments, additional steps may be performed in conjunction with process 900.

At block 905, natural language user input is received by a user device, such as user device 104 of FIG. 1. As described, the natural language input may include a request to the user device and/or another device to perform a task, and may also specify one or more parameters of the requested task.

At block 910, an intent and optionally one or more parameters associated with the intent are identified. The intent and parameters may be obtained from natural language user input. As previously mentioned, the intent may correspond to any kind of task performed by the user device, and in particular may correspond to a task performed by one or more applications of the user device. The parameters associated with the intent may identify portions of the natural language input that specify a manner in which a task corresponding to the intent is to be performed. In an embodiment, "please drive me to the airport" is intended to correspond to a mission for a scheduled car and "airport" is a parameter that specifies a destination. Assuming the user's intent is a predetermined car, the user's location may also be a parameter (e.g., an inferred parameter).

At block 915, it is determined whether the task corresponding to the intent can be satisfied. In some embodiments, determining may include determining whether an application configured to perform a task according to a parameter is accessible from the user device. In this embodiment, it is determined whether the user device has access to an application configured to predetermine a car at the location of the user device. As noted, accessible applications are those that are stored locally on the user device and are accessible remotely by the user device.

In accordance with a determination that the task corresponding to the intent can be satisfied, at block 920, the intent and parameters are provided to the software application. For example, if it is determined at block 915 that the user device has access to a software application configured to perform the requested task according to any identified parameters, then the intent and parameters are provided to the application to perform the task. In this embodiment, this includes determining whether a software application configured to reserve a car at the location of the user device is accessible from the user device. For example, the ride reservation application Lyft may be installed on a user device and may be used to reserve an automobile according to embodiments described herein.

In accordance with a determination that the task corresponding to the intent cannot be satisfied, at block 925, a list of one or more software applications is provided. The one or more software application lists may include, for example, one or more software applications configured to perform the task associated with the intent based on any identified parameters. In some embodiments, the one or more software applications of the list may be identified, for example, based on one or more domains associated with the intent (to which the cancellation application may register). Referring to fig. 10C, once the list of software applications (e.g., ride plan applications) is determined, the user device may provide the list to the user. As shown, providing the list can include providing a natural language input 1022 requesting that the user select an application from a list of one or more software applications. In some embodiments, the list is generated by the user device. In other embodiments, the list of software applications is generated by the server and provided to the user device, which in turn may provide the list to the user, as described.

At block 930, the user device receives user input indicating a selection of one or more software applications in the one or more software application lists. The user input may be a touch input on a touch sensitive display of the user device and/or may be a natural language user input.

At block 935, the intent and parameters are provided to the software application selected by the user. In some embodiments, providing the intent and the parameter includes downloading and/or installing the software application such that the user device may access the software application locally. In other embodiments, this includes remotely accessing the selected application.

As described, the software application may provide one or more responses in response to the intent and the parameters. Once the parameters are verified, the user device may confirm the intent with the software application and cause the software application to perform the task. The user device may thereafter receive the resulting response and optionally provide one or more response items of the resulting response to the user.

Reference is made herein to providing a natural language output and/or a natural language query to a user of a user device. In some embodiments, the manner in which the natural language output and query are provided to the user may depend on the type or state of the user device. If, for example, the user device is a mobile phone, the user device may provide the query using both text and audio. If on the other hand the user device is a speaker, the user device may provide the query using audio only. As another example, if the user device is a mobile phone that is not paired with a headset, the user device may provide the query using text and/or a relatively short natural language query. If the user device is paired with a headset, the user device may only provide queries using relatively long natural language queries.

Fig. 10D illustrates an exemplary data flow of a digital assistant system according to some embodiments. In some embodiments, the data flow of FIG. 10D may be implemented using one or more of the

processes

800, 900. Fig. 10D illustrates an exemplary data flow of a digital assistant system 1030 according to some embodiments. Specifically, fig. 10D shows a data flow of the application registration process and a data flow of the execution task. Data flows 1031-.

Typically, the data flow associated with the application enrollment process involves the enrollment of the application with an application enrollment service (e.g., a verification service), whereby the application and the customized vocabulary corresponding thereto may be accessed and/or utilized by a digital assistant performing the task.

In operation, in data stream 1031, an application is submitted to application browsing module 1032. Both a language model corresponding to the application and an intent of the application may also be submitted. The language model may include a customized vocabulary of applications. In turn, in data flow 1033, the application browsing module 1032 can provide the application, the customized vocabulary, and/or the intent of the application to the verification service 1034. The authentication service 1034 may determine whether to authenticate the application, e.g., based on whether the application is operational with the digital assistant. This may include, for example, ensuring that any intent of the application corresponds to one or more domains of the application. For example, the verification service may reject an instant messaging application associated with an intent to reserve a car because the domain and intent are mismatched. At data flow 1035, the verification service 1034 can provide a verification reply indicating whether the application is valid.

If the verification service 1034 indicates that the application is valid, the application browsing module 1032 provides the application (as verified) to the application store 1036. Typically, applications are downloaded by user device 1040 via DA server 1038 and/or accessed at application storage 1036, as indicated by data flow 1039. In some embodiments, user device 1040 may be user device 104 of fig. 1, and DA server 1038 may be DA server 106 of fig. 1. Plist may, for example, result in an application list (e.g., info. plist) of user device 1040 being updated and/or synchronized with DA server 1038. In data flow 1041, the verification service can provide the customized vocabulary (e.g., runtime vocabulary) of the application to DA server 1038 to facilitate parsing of the natural language input, as described.

In general, the data flow associated with performing a task involves providing an intent, and optionally, one or more parameters of an application for performing the task corresponding to the intent.

In operation, in data flow 1043, user device 1040 can provide natural language input to DA server 1038. In some embodiments, the natural language input may be provided by the digital assistant 1042 of the user device 1040. Based on the natural language input, DA server 1038 can identify one or more tasks requested in the natural language input and one or more parameters associated with the intent. Additionally, DA server 1038 can identify an application for performing a task associated with the intent. In some embodiments, the name (or other form of identifier) of the identified application 1044 may be a parameter of intent. DA server 1038 thereafter provides the intent, parameters, and identification of the identified application 1044 to user device 1040 (e.g., digital assistant 1042 of user device 1040) at data flow 1045. In some embodiments, the intent and parameters may be provided to the user device 1040 as an intent object.

In response, the digital assistant determines whether to allow the identified application 1044 to access information associated with the identified parameter. For example, if the parameter is the location of user device 1040, the digital assistant queries data permissions 1046 to determine whether to allow application 1044 access to the location data.

Where the application is allowed to access the data for each parameter, the user device (e.g., the user device's digital assistant 1042) provides the intent to the application. As noted, the application may reside on user device 1040. In other embodiments, the application may reside on one or more other devices and the intent may be transmitted to the application over one or more networks. As described, if the application determines that one or more parameters are missing, inappropriate, and/or unclear, the application 1044 may thereafter request input from a user of the user device 1040. In some embodiments, the user-entered query may be provided as a natural language query generated by DA server 1038. Thus, in data flow 1051, user device 1040 can request, and subsequently receive, one or more natural language queries. Once all parameters are resolved, the application 1044 can execute the task corresponding to the intent and provide a resulting answer indicating whether the task was successfully performed.

One or more of the data flows of fig. 10D are performed (e.g., generated), for example, using one or more electronic devices (e.g.,

devices

104, 108, 200, 400, or 600) implementing a digital assistant. In particular, the data flow provided between DA server 1038 and digital assistant 1042 of user device 1040 is shown to be dependent on the client-server architecture. In other embodiments, DA server 1038 may be implemented as a process and/or service on user device 1040. Thus, in some embodiments, the data streams exchanged between DA server 1038 and digital assistant 1042 may be exchanged only on user device 1040.

Fig. 10E illustrates an exemplary data flow of the digital assistant system 1060, according to some embodiments. In particular, fig. 10E illustrates an exemplary data flow of an application registration process and may be used to implement the application registration process as discussed in fig. 10D. Also, several components of fig. 10E correspond to components of fig. 10D, respectively, and may be provided with the same reference numerals. For the sake of brevity, a description of its function and operation will not be repeated.

In data flow 1065, the verified vocabulary is provided from the verification service 1034 to the global application vocabulary store 1060. In general, the global application vocabulary storage may store language models and/or vocabularies for any number and/or version of software applications. In

data streams

1061 and 1063, a speech training module 1062 and a natural language training module 1064 are trained to recognize natural language and to process application-specific vocabulary that accompanies the verification application. Based on the data, the global application vocabulary store may generate and/or train one or more language models that allow the digital assistant to recognize and process utterances that include application-specific vocabularies.

During operation of a user device, such as user device 1040 of fig. 10D, the runtime-specific global application vocabulary store may receive vocabulary and/or language models for one or more applications of the user device from the global application vocabulary store 1060. The vocabulary may be specific to a user ID of a user of the user device and/or may be specific to a version of an application and/or operating system of the user device. Based on the vocabulary, one or more terms of the natural language input, for example, may be identified as parameters.

According to some embodiments, fig. 11 illustrates a functional block diagram of an electronic device 1100 configured according to the principles of various described embodiments, including those described with reference to fig. 8. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software which embody the principles of the various described embodiments. Those skilled in the art will appreciate that the functional blocks described in fig. 11 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.

As shown in fig. 11, the electronic device 1100 includes a touch-sensitive display unit 1102 and a processing unit 1108 that is optionally coupled to the touch-sensitive display unit 1102. In some embodiments, the processing unit 1108 includes a receiving unit 1110, an identifying unit 1112, a providing unit 1114, and optionally, an output unit 1116, a speaker unit 1118, a display enabling unit 1120, a determining unit 1122, a requesting unit 1124, an accessing unit 1126, and a facilitating unit 1128.

In some embodiments, the processing unit 1108 is configured to receive (e.g., with the receiving unit 1110) natural language user input (e.g., block 805 of fig. 8); identifying (e.g., with the identification unit 1112) an intent object in a set of intent objects and parameters associated with the intent object, wherein the intent object and parameters are obtained from a natural language user input (e.g., block 810 of FIG. 8); identifying (e.g., with the identifying unit 1112) a software application associated with an intent object in the set of intent objects (e.g., block 815 of FIG. 8); and providing (e.g., with providing unit 1114) the intent object and the parameters to the software application (e.g., block 820 of fig. 8).

In some embodiments, processing unit 1108 is further configured to receive (e.g., with receiving unit 1110) an acknowledgement from the software application, where the acknowledgement indicates whether the parameter is valid (e.g., block 825 of fig. 8).

In some embodiments, receiving (e.g., with receiving unit 1110) the reply from the software application includes receiving (e.g., with receiving unit 1110) a disambiguation list associated with the parameter from the software application, where the disambiguation list includes a plurality of candidate parameters.

In some embodiments, processing unit 1108 is further configured to output (e.g., with output unit 1116) the disambiguation list; receiving (e.g., with receiving unit 1110) a user input indicating a selection of a candidate parameter value for a disambiguation list; the candidate parameter values of the selected disambiguation list are provided (e.g., with providing unit 1114) to the software application.

In some embodiments, the natural language user input is a first natural language input, the parameter is a first parameter, the processing unit 1108 is further configured to receive (e.g., with the receiving unit 1110) a request from a software application for a second parameter associated with the intent object; providing (e.g., with the providing unit 1114) a natural language query based on the request; receiving (e.g., with receiving unit 1110) a second natural language user input; a second parameter is identified (e.g., with identification unit 1112), where the second parameter is obtained from the second natural language user input, and the second parameter is provided (e.g., with providing unit 1114) to the software application.

In some embodiments, the electronic device further includes an audio output component, and providing the natural language query includes speaking (e.g., with the speaker unit 1118) the natural language query through the audio output component via speech synthesis.

In some embodiments, providing the natural language query includes providing (e.g., with the providing unit 1114) the natural language query based on the type of electronic device, the state of the electronic device, or a combination thereof.

In some embodiments, the processing unit 1108 is further configured to, after providing the intent object and parameters to the software application, receive (e.g., with the receiving unit 1110) a resulting reply associated with the intent object from the software application (e.g., block 835 of fig. 8).

In some embodiments, receiving the resulting response includes receiving (e.g., with receiving unit 1110) a set of response items associated with the intent object from the software application and outputting (e.g., with outputting unit 1116) the set of response items.

In some embodiments, the processing unit 1108 is further configured to provide (e.g., with the providing unit 1114) a set of view controller parameters to the software application; and receiving (e.g., with receiving unit 1110) a set of view controller parameter values corresponding to the set of view controller parameters from the software application, wherein outputting the set of answer items comprises enabling display of the set of answer items in the user interface (e.g., with display enabling unit 1120) based on the received view controller parameter values.

In some embodiments, each intent object in the set of intent objects is associated with a software application.

In some embodiments, the intent object is a first intent object and wherein the parameter is associated with the first intent object and is not associated with the second intent object.

In some embodiments, the processing unit 1108 is further configured to determine (e.g., with the determining unit 1122) whether the electronic device is in a locked state; and determining (e.g., with the determining unit 1122) whether the intent object is allowed to be provided to the software application while the electronic device is in the locked state based on determining that the intent object is allowed to be provided to the software application while the electronic device is in the locked state, wherein the intent object and the parameter value are provided to the software application only based on determining that the intent object is allowed to be provided to the software application while the electronic device is in the locked state.

In some embodiments, the parameters are obtained from the natural language user input by analyzing the natural language user input with each of a plurality of detectors.

In some embodiments, the processing unit 1108 is further configured to determine (e.g., with the determining unit 1122) a user context of the electronic device, wherein the parameter is based at least in part on the user context.

In some embodiments, the processing unit 1108 is further configured to request (e.g., with the requesting unit 1124) an acknowledgement of the parameter; and receiving (e.g., with receiving unit 1110) user input corresponding to the confirmation of the parameters, wherein the parameters are provided to the software application in response to receiving the user input.

In some embodiments, identifying the software application includes determining (e.g., using determining unit 1122) whether the software application is resident on the electronic device; and in accordance with a determination that the software application is not resident on the electronic device, determining (e.g., with determining unit 1122) whether the software application is resident on an external device in communication with the electronic device. In some embodiments, providing the intent object and parameters to the software application includes, in accordance with a determination that the software application is resident on an external device in communication with the electronic device, providing (e.g., with the providing unit 1114) the intent object and parameters to the software application by providing the intent object and parameters to the external device.

In some embodiments, identifying the software application includes determining (e.g., using determining unit 1122) whether the software application is resident on the electronic device; in accordance with a determination that the software application is not resident on the electronic device, identifying (e.g., with identification unit 1112) a set of software applications associated with the intent object; enabling display (e.g., with display enabling unit 1120) of the set of software applications associated with the intended object on the user interface; receiving (e.g., with receiving unit 1110) a user input indicating a selection of one or more of the set of software applications associated with the intent object; and accessing (e.g., with accessing unit 1126) the selected one or more of the set of software applications associated with the intent object.

In some embodiments, identifying the software application associated with the intent object in the set of intent objects includes identifying (e.g., with the identification unit 1112) the software application based on the parameters.

In some embodiments, the processing unit 1108 is further configured to identify (e.g., with the identifying unit 1112) a third intent object from the set of intent objects based on the natural language user input; identifying (e.g., with identification unit 1112) a second software application based on the identified third intent object; providing (e.g., with providing unit 1114) the third intent object and the at least one response item to the second software application; and receiving (e.g., with receiving unit 1110) a second answer item associated with the third intent object from the second software application.

In some embodiments, the software application is a first software application, and providing the intent object and parameters to the software application includes causing (e.g., with the facilitating unit 1128) a third software application to provide the intent object and parameters to the first software application.

In some embodiments, receiving the natural language user input includes receiving (e.g., with receiving unit 1110) a natural language user input including application-specific terms; and wherein identifying the software application associated with the intent object of the set of intent objects comprises identifying (e.g., with the identifying unit 1112) the software application based on the application-specific terms.

In some embodiments, the processing unit 1108 is further configured to receive (e.g., with the receiving unit 1110) a command comprising an identification of the software application; and in response to the command, determine (e.g., with determination unit 1122) whether the software application is allowed to access the data associated with the intent object.

In some embodiments, the processing unit 1108 is further configured to cause (e.g., with the facilitating unit 1128) the software application to perform a task associated with the intent object based on the parameter.

The operations described above with respect to fig. 8 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 11. For example, receive operations 805,825 and 835; identifying

operations

810 and 815; providing operation 820 and confirming operation 830 are optionally implemented by processor 120. Those of ordinary skill in the art will clearly know how other processes may be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 11.

Those skilled in the art will appreciate that the functional blocks described in fig. 11 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, processing unit 1108 may have an associated "controller" unit operatively coupled to processing unit 1108 to initiate operations. This controller unit is not separately shown in fig. 11, but should be understood to be in the hands of one skilled in the art designing a device, such as device 1100, with processing unit 1108. In some embodiments, as another example, one or more units, such as the receiving unit 1110, may be a hardware unit other than the processing unit 1108. Thus, the description herein optionally supports combinations, separations, and/or further limitations of the functional blocks described herein.

According to some embodiments, fig. 12 illustrates a functional block diagram of an electronic device 1200, including those described with reference to fig. 8, configured according to the principles of various described embodiments. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software which embody the principles of the various described embodiments. Those skilled in the art will understand that the functional blocks described in fig. 12 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.

As shown in fig. 12, one or more electronic devices 1200 include one or more processing units 1208. In some embodiments, the one or more processing units 1208 include a receiving unit 1210, a determining unit 1212, an identifying unit 1214, and a providing unit 1216.

In some embodiments, the one or more processing units 1208 are configured to receive (e.g., with the receiving unit 1210) natural language user input (e.g., block 805 of fig. 8); determining (e.g., with determining unit 1212) an intent object of a set of intent objects and a parameter associated with the intent object (e.g., block 810 of FIG. 8) based on the natural language user input; identifying (e.g., with the identifying unit 1214) the software application (e.g., block 815 of fig. 8) based on at least one of the intent object or the parameter; and providing (e.g., with providing unit 1216) the intent object and parameters to the software application (e.g., block 820 of fig. 8).

In some embodiments, determining the intent object and the parameters associated with the intent object in the set of intent objects based on the natural language user input includes determining (e.g., with the determining unit 1212) an intent object and parameters associated with the intent object in the set of intent objects with a first electronic device of the one or more electronic devices 1200; and wherein providing the intent object and the parameters to the software application comprises providing (e.g., with the providing unit 1216) the intent object and the parameters to the software application with a second electronic device of the one or more electronic devices 1200.

In some embodiments, the one or more processing units 1208 are configured to provide (e.g., with the providing unit 1216) a command with a first electronic device of the one or more electronic devices 1200 to a second electronic device of the one or more electronic devices 1200; and determining (e.g., with determining unit 1212), with the second electronic device, whether the software application is allowed to access the data associated with the intended object, in response to the command.

In some embodiments, the one or more processing units 1208 are configured to receive (e.g., with the receiving unit 1210) an acknowledgement from the software application, where the acknowledgement indicates whether the parameter is valid (e.g., block 825 of fig. 8).

In some embodiments, the parameter indicates a software application.

In some embodiments, the one or more processing units 1208 are configured to identify (e.g., with the identification unit 1214) the software application based on the natural language user input; and determining (e.g., with determining unit 1212) whether an intent object in the set of intent objects corresponds to a registered intent object of the software application; wherein the intent object and parameters are determined only from the registered intent object that determines that the intent object corresponds to the software application.

In some embodiments, the parameter is a first parameter, and the one or more processing units are configured to determine (e.g., with determining unit 1212) a second parameter based on the natural language user input; and providing (e.g., with providing unit 1216) the second parameter to the software application.

In some embodiments, the answer is a first answer, and the one or more processing units are configured to receive (e.g., with receiving unit 1210) a second answer from the software application, wherein the second answer indicates whether the second parameter is valid.

The operations described above with respect to fig. 8 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 12. For example, receive operations 805,825 and 835; identifying

operations

810 and 815; providing operation 820 and confirming operation 830 are optionally implemented by processor 120. Those of ordinary skill in the art will clearly know how other processes may be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 12.

Those skilled in the art will understand that the functional blocks described in fig. 12 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, the one or more processing units 1208 can have an associated "controller" unit operatively coupled to at least one of the one or more processing units 1208 to initiate operations. The controller unit is not separately shown in fig. 12, but should be understood to be within the grasp of those skilled in the art of designing an apparatus, such as apparatus 1200, having one or more processing units 1208. In some embodiments, as another example, one or more units, such as receiving unit 1210, may be hardware units other than one or more processing units 1208. Thus, the description herein optionally supports combinations, subcombinations, and/or further definitions of the functional blocks described herein.

According to some embodiments, fig. 13 illustrates a functional block diagram of an electronic device 1300, including those described with reference to fig. 9, configured according to principles of various described embodiments. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software which embody the principles of the various described embodiments. Those skilled in the art will appreciate that the functional blocks described in fig. 13 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.

As shown in fig. 13, one or more electronic devices 1300 include one or more processing units 1308. In some embodiments, the one or more processing units 1308 include a receiving unit 1310, an identifying unit 1312, a determining unit 1314, a providing unit 1316, and optionally a facilitating unit 1318.

In some embodiments, the one or more processing units 1308 are configured to receive (e.g., with the receiving unit 1310) natural language user input (e.g., block 905 of fig. 9); identifying (e.g., with identification unit 1312) an intent object of a set of intent objects and parameters associated with the intent object based on the natural language user input (e.g., block 910 of FIG. 9); determining (e.g., with determining unit 1314) whether the task corresponding to the intent object can be satisfied based on at least one of the intent object or the parameters (e.g., block 915 of FIG. 9); in accordance with a determination that the task corresponding to the intent object can be satisfied, providing (e.g., with the providing unit 1316) the intent object and the parameters to a software application associated with the intent object (e.g., block 920 of FIG. 9); and in accordance with a determination that the task corresponding to the intent object cannot be satisfied, providing (e.g., with providing unit 1316) a list of one or more software applications associated with the intent object (e.g., block 925 of fig. 9).

In some embodiments, the one or more processing units 1308 are configured to, after providing the one or more lists of software applications associated with the intended object, receive (e.g., with the receiving unit 1310) user input indicating a selection of a software application of the one or more lists of software applications (e.g., block 930 of fig. 9); and providing (e.g., with the providing unit 1316) the intent object of the set of intent objects to the selected software application (e.g., block 935 of fig. 9) in response to the user input.

In some embodiments, the one or more processing units 1308 are configured to provide (e.g., with the providing unit 1316) the parameters to the selected software application in response to user input.

In some embodiments, the one or more processing units 1308 are configured to receive (e.g., with the receiving unit 1310) an acknowledgement from the selected software application, and the acknowledgement indicates whether the parameter is valid.

In some embodiments, the one or more processing units 1308 are configured to cause (e.g., with the facilitating unit 1318) the selected software application to perform a task corresponding to the intent object, and to receive (e.g., with the receiving unit 1310) from the selected software application a resulting response associated with the intent object after providing the intent object to the selected software application.

The operations described above with respect to fig. 9 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 13. For example, determine operation 905; providing operations 910,915 and 925 and receiving operation 920 are optionally performed by processor 120. Those of ordinary skill in the art will clearly know how other processes may be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 13.

Those skilled in the art will appreciate that the functional blocks described in fig. 13 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, one or more processing units 1308 can have an associated "controller" unit operatively coupled to at least one of the one or more processing units 1308 for initiating operations. The controller unit is not separately shown in fig. 13, but should be understood to be within the grasp of those skilled in the art of designing devices, such as device 1300, having one or more processing units 1308. In some embodiments, as another example, one or more units, such as the receiving unit 1310, may be hardware units other than one or more processing units 1308. Thus, the description herein optionally supports combinations, subcombinations, and/or further definitions of the functional blocks described herein.

According to some embodiments, fig. 14 illustrates functional block diagrams of an electronic device 1400, including those described with reference to fig. 9, configured according to principles of various described embodiments. The functional blocks of the device are optionally implemented by hardware, software, or a combination of hardware and software which embody the principles of the various described embodiments. Those skilled in the art will appreciate that the functional blocks described in fig. 14 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein.

As shown in fig. 14, the electronic device 1400 includes a touch-sensitive display unit 1402 and a processing unit 1408 optionally coupled to the touch-sensitive display unit 1402. In some embodiments, the processing unit 1402 includes a receiving unit 1410, a providing unit 1412, an obtaining unit 1414, a display enabling unit 1416, and optionally an identifying unit 1418 and a facilitating unit 1420.

In some embodiments, the processing unit 1408 is configured to receive (e.g., with the receiving unit 1410) a natural language user input, wherein the natural language user input indicates an intent object of a set of intent objects (e.g., block 905 of fig. 9); providing (e.g., with providing unit 1412) the natural language user input to the second electronic device; receiving (e.g., with receiving unit 1410) an indication from the second electronic device that the software application associated with the intent object is not on the first electronic device (e.g., block 925 of FIG. 9); in response to the notification, obtain (e.g., with obtaining unit 1414) a list of applications associated with the intent object; enabling display (e.g., with display enabling unit 1416) of a list of applications associated with the intent object in the user interface with the touch-sensitive display of the first electronic device; receiving (e.g., with receiving unit 1410) a user input indicating a selection of an application in the list of applications (e.g., block 930 of FIG. 9); and providing (e.g., with providing unit 1412) the intent object of the set of intent objects to the application (e.g., with block 935 of fig. 9).

In some embodiments, the processing unit 1408 is further configured to identify (e.g., with the identifying unit 1418) an intent object in the set of intent objects.

In some embodiments, the processing unit 1408 is further configured to identify (e.g., with the identifying unit 1418) parameters associated with the intent objects in the set of intent objects; and provide (e.g., with providing unit 1412) the parameters to the application.

In some embodiments, the processing unit 1408 is further configured to receive (e.g., with the receiving unit 1410) an acknowledgement from the application, and the acknowledgement indicates whether the parameter is valid.

In some embodiments, the processing unit 1408 is configured to cause (e.g., with the facilitating unit 1420) the application to perform a task associated with the intent object; and receiving (e.g., with receiving unit 1410) a resulting response associated with the intent object from the application after providing the intent object to the application.

The operations described above with respect to fig. 9 are optionally implemented by the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 14. For example, determine operation 905; providing operations 910,915 and 925 and receiving operation 920 are optionally performed by processor 120. Those of ordinary skill in the art will clearly know how other processes may be implemented based on the components depicted in fig. 1, 2A, 4, 6A-6B, 7A, and 14.

Those skilled in the art will appreciate that the functional blocks described in fig. 14 are optionally combined or separated into sub-blocks in order to implement the principles of the various described embodiments. Thus, the description herein optionally supports any possible combination or separation or further definition of the functional blocks described herein. For example, the processing unit 1408 may have an associated "controller" unit operatively coupled to the processing unit 1408 to initiate operations. This controller unit is not shown separately in fig. 14, but should be understood to be in the hands of one skilled in the art designing devices, such as device 1400, with a processing unit 1408. In some embodiments, as another example, one or more units, such as receiving unit 1410, may be hardware units other than processing unit 1408. Thus, the description herein optionally supports combinations, separations, and/or further limitations of the functional blocks described herein.

The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the technology and its practical applications. Those skilled in the art are thus well able to best utilize the techniques and various embodiments with various modifications as are suited to the particular use contemplated.

Although the present disclosure and examples have been fully described with reference to the accompanying drawings, it is to be noted that various changes and modifications will become apparent to those skilled in the art. It is to be understood that such changes and modifications are to be considered as included within the scope of the disclosure and examples as defined by the following claims.

Claims

1. A method of processing natural language, comprising:

at an electronic device with one or more processors:

receiving a natural language user input;

identifying, with the one or more processors, an intent object of a set of intent objects and a parameter associated with the intent object, wherein the intent object and the parameter are obtained from the natural language user input;

identifying a software application associated with an intent object of the set of intent objects;

providing the intent object and the parameters to the software application;

confirming the intent object to the software application, wherein confirming the intent object to the software application comprises requesting a notification that the software application can successfully perform a task corresponding to the intent object; and

after confirming the intent object to the software application, causing the software application to perform a task corresponding to the intent object in accordance with the parameters.

2. The method of claim 1, further comprising:

receiving a reply from the software application, wherein the reply indicates whether the parameter is valid.

3. The method of claim 2, wherein receiving an acknowledgement from the software application comprises:

receiving a disambiguation list associated with the parameter from the software application, wherein the disambiguation list comprises a plurality of candidate parameters.

4. The method of claim 3, further comprising:

outputting the disambiguation list;

receiving user input indicating a selection of a candidate parameter value of the disambiguation list; and

providing the selected candidate parameter values of the disambiguation list to the software application.

5. The method of any of claims 1-4, wherein the natural language user input is a first natural language input and the parameter is a first parameter, the method further comprising:

receiving, from the software application, a request for a second parameter associated with the intent object;

providing a natural language query based on the request;

receiving a second natural language user input;

identifying the second parameter, wherein the second parameter is obtained from the second natural language user input; and

Providing the second parameter to the software application.

6. The method of claim 5, wherein the electronic device further comprises an audio output component, and wherein providing the natural language query comprises speaking the natural language query through the audio output component via speech synthesis.

7. The method of claim 5, wherein providing the natural language query comprises providing the natural language query based on a type of the electronic device, a state of the electronic device, or a combination thereof.

8. The method of any of claims 1 to 4, further comprising:

after providing the intent object and the parameters to the software application, receiving a result reply associated with the intent object from the software application.

9. The method of claim 8, wherein receiving a result reply comprises:

receiving a set of response items associated with the intent object from the software application; and

outputting the set of response items.

10. The method of claim 9, further comprising:

providing a set of view controller parameters to the software application; and

receive a set of view controller parameter values corresponding to the set of view controller parameters from the software application,

Wherein outputting the set of response items comprises displaying the set of response items in a user interface based on the received view controller parameter values.

11. The method of any of claims 1-4, wherein each intent object in the set of intent objects is associated with the software application.

12. The method of any of claims 1-4, wherein the intent object is a first intent object and wherein the parameter is associated with the first intent object and is not associated with a second intent object.

13. The method of any of claims 1 to 4, further comprising:

determining whether the electronic device is in a locked state; and

in accordance with a determination that the device is in the locked state, determining whether the intent object is allowed to be provided to the software application while the electronic device is in the locked state,

wherein the intent object and the parameter value are provided to the software application only in accordance with a determination that the intent object is permitted to be provided to the software application while the electronic device is in the locked state.

14. The method of any of claims 1-4, wherein the parameters are obtained from the natural language user input by analyzing the natural language user input with each of a plurality of detectors.

15. The method of any of claims 1 to 4, further comprising:

determining a user context of the electronic device;

wherein the parameter is based at least in part on the user context.

16. The method of any of claims 1 to 4, further comprising:

requesting confirmation of the parameter; and

receiving user input corresponding to a confirmation of the parameter, wherein the parameter is provided to the software application in response to receiving the user input.

17. The method of any of claims 1-4, wherein identifying the software application comprises:

determining whether the software application is resident on the electronic device;

in accordance with a determination that the software application is not resident on the electronic device, determining whether the software application is resident on an external device in communication with the electronic device, and wherein providing the intent object and the parameters to the software application comprises:

in accordance with a determination that the software application is resident on an external device in communication with the electronic device:

providing the intent object and the parameters to the software application by providing the intent object and the parameters to the external device.

18. The method of any of claims 1-4, wherein identifying the software application comprises:

in accordance with a determination that the software application is not resident on the electronic device:

identifying a set of software applications associated with the intent object;

displaying the set of software applications associated with the intent object on a user interface;

receiving user input comprising a selection of one or more software applications of the set of software applications associated with the intent object; and

accessing one or more software applications of the selected set of software applications associated with the intent object.

19. The method of any of claims 1-4, wherein identifying a software application associated with the intent object of the set of intent objects comprises identifying the software application based on the parameter.

20. The method of any of claims 1 to 4, further comprising:

identifying a third intent object from the set of intent objects based on the natural language user input;

identifying a second software application based on the identified third intent object;

Providing the third intent object and at least one response item to the second software application; and

receiving a second response item associated with the third intent object from the second software application.

21. The method of any of claims 1-4, wherein the software application is a first software application, and wherein providing the intent object and the parameters to the software application comprises:

causing a third software application to provide the intent object and the parameters to the first software application.

22. The method of any of claims 1-4, wherein receiving natural language user input comprises:

receiving a natural language user input comprising application-specific terms; and

wherein identifying the software application associated with the intent object of the set of intent objects comprises:

the software application is identified based on the application-specific term.

23. The method of any of claims 1 to 4, further comprising:

receiving a command comprising an identification of the software application; and

in response to the command, determining whether to allow the software application to access data associated with the intent object.

24. The method of any of claims 1 to 4, further comprising:

causing the software application to perform a task associated with the intent object based on the parameter.

25. A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to:

receiving a natural language user input;

identifying, with the one or more processors, an intent object of a set of intent objects and a parameter associated with the intent object, wherein the intent and the parameter are obtained from the natural language user input;

providing the intent object and the parameters to the software application;

26. The non-transitory computer-readable storage medium of claim 25, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

27. The non-transitory computer-readable storage medium of claim 26, wherein receiving an acknowledgement from the software application comprises:

28. The non-transitory computer-readable storage medium of claim 27, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

outputting the disambiguation list;

29. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the natural language user input is a first natural language input and the parameter is a first parameter, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

providing a natural language query based on the request;

receiving a second natural language user input;

providing the second parameter to the software application.

30. The non-transitory computer-readable storage medium of claim 29, wherein the electronic device further comprises an audio output component, and wherein providing the natural language query comprises speaking the natural language query through the audio output component via speech synthesis.

31. The non-transitory computer-readable storage medium of claim 29, wherein providing the natural language query comprises providing the natural language query based on a type of the electronic device, a state of the electronic device, or a combination thereof.

32. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

33. The non-transitory computer readable storage medium of claim 32, wherein receiving a result reply comprises:

outputting the set of response items.

34. The non-transitory computer-readable storage medium of claim 33, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

providing a set of view controller parameters to the software application; and

35. The non-transitory computer readable storage medium of any one of claims 25-28, wherein each intent object of the set of intent objects is associated with the software application.

36. The non-transitory computer readable storage medium of any one of claims 25-28, wherein the intent object is a first intent object and wherein the parameter is associated with the first intent object and not a second intent object.

37. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

determining whether the electronic device is in a locked state; and

38. The non-transitory computer readable storage medium of any one of claims 25-28, wherein the parameters are obtained from the natural language user input by analyzing the natural language user input with each of a plurality of detectors.

39. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

determining a user context of the electronic device;

Wherein the parameter is based at least in part on the user context.

40. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

requesting confirmation of the parameter; and

41. The non-transitory computer readable storage medium of any one of claims 25-28, wherein identifying the software application comprises:

in accordance with a determination that the software application is not resident on the electronic device, determining whether the software application is resident on an external device in communication with the electronic device; and wherein providing the intent object and the parameters to the software application comprises:

42. The non-transitory computer readable storage medium of any one of claims 25-28, wherein identifying the software application comprises:

identifying a set of software applications associated with the intent object;

43. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein identifying a software application associated with the intent object of the set of intent objects comprises identifying the software application based on the parameters.

44. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

45. The non-transitory computer readable storage medium of any of claims 25-28, wherein the software application is a first software application, and wherein providing the intent object and the parameters to the software application comprises:

46. The non-transitory computer readable storage medium of any one of claims 25-28, wherein receiving natural language user input comprises:

the software application is identified based on the application-specific term.

47. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

determining, in response to the command, whether to allow the software application to access data associated with the intent object.

48. The non-transitory computer-readable storage medium of any one of claims 25-28, wherein the instructions, when executed by the one or more processors, further cause the electronic device to:

49. An electronic device that processes natural language, comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:

receiving a natural language user input;

providing the intent object and the parameters to the software application;

50. The electronic device of claim 49, wherein the one or more programs further include instructions for:

51. The electronic device of claim 50, wherein receiving an acknowledgement from the software application comprises:

52. The electronic device of claim 51, wherein the one or more programs further include instructions for:

Outputting the disambiguation list;

53. The electronic device of any of claims 49-52, wherein the natural language user input is a first natural language input and the parameter is a first parameter, wherein the one or more programs further include instructions to:

providing a natural language query based on the request;

receiving a second natural language user input;

providing the second parameter to the software application.

54. The electronic device of claim 53, wherein the electronic device further comprises an audio output component, and wherein providing the natural language query comprises speaking the natural language query through the audio output component via speech synthesis.

55. The electronic device of claim 53, wherein providing the natural language query includes providing the natural language query based on a type of the electronic device, a state of the electronic device, or a combination thereof.

56. The electronic device of any of claims 49-52, wherein the one or more programs further include instructions for:

57. The electronic device of claim 56, wherein receiving a result reply comprises:

outputting the set of response items.

58. The electronic device of claim 57, wherein the one or more programs further include instructions for:

providing a set of view controller parameters to the software application; and

59. The electronic device of any of claims 49-52, wherein each intent object of the set of intent objects is associated with the software application.

60. The electronic device of any of claims 49-52, wherein the intent object is a first intent object and wherein the parameter is associated with the first intent object and is not associated with a second intent object.

61. The electronic device of any of claims 49-52, wherein the one or more programs further include instructions for:

determining whether the electronic device is in a locked state; and

62. The electronic device of any of claims 49-52, wherein the parameters are obtained from the natural language user input by analyzing the natural language user input with each of a plurality of detectors.

63. The electronic device of any of claims 49-52, wherein the one or more programs further include instructions for:

Determining a user context of the electronic device;

wherein the parameter is based at least in part on the user context.

64. The electronic device of any of claims 49-52, wherein the one or more programs further include instructions for:

requesting confirmation of the parameter; and

65. The electronic device of any of claims 49-52, wherein identifying the software application comprises:

66. The electronic device of any of claims 49-52, wherein identifying the software application comprises:

identifying a set of software applications associated with the intent object;

67. The electronic device of any of claims 49-52, wherein identifying a software application associated with the intent object of the set of intent objects includes identifying the software application based on the parameters.

68. The electronic device of any of claims 49-52, wherein the one or more programs further include instructions for:

69. The electronic device of any of claims 49-52, wherein the software application is a first software application, and wherein providing the intent object and the parameters to the software application comprises:

70. The electronic device of any of claims 49-52, wherein receiving natural language user input comprises:

the software application is identified based on the application-specific term.

71. The electronic device of any of claims 49-52, wherein the one or more programs further include instructions for:

72. The electronic device of any of claims 49-52, wherein the one or more programs further include instructions for:

73. A system for processing natural language, comprising:

apparatus for performing the method of any one of claims 1 to 24.