US20180090137A1 - Forming chatbot output based on user state - Google Patents
- Publication number
- US20180090137A1 (application US15/277,954)
- Authority
- US
- United States
- Prior art keywords
- user
- chatbot
- input
- session
- client device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/02—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- Chatbots, also referred to as “interactive assistant modules,” “virtual assistants,” and/or “mobile assistants,” may be designed to mimic human conversation. For example, a chatbot may greet a user with conversational statements such as “hello” and “how are you today?” Some chatbots may even be configured to identify a state associated with a user statement and respond accordingly. Suppose a user tells a chatbot, “I feel lousy today.” The chatbot may detect the negative state expressed by the user and may select and output an appropriate response, such as “I'm sorry to hear that.” In spite of efforts to make chatbots seem more “human,” however, chatbots may still tend to come off as unnatural or awkward because, for instance, they do not keep track of users' emotional states over time.
- a user's “state” may refer to a particular condition of the user (at that time or at a previous time) or of another being (e.g., the user's friend/family member/pet), such as an emotional and/or physical condition (e.g., a sentiment of the user).
- a client device such as a smart phone, smart watch, standalone voice-activated product, or a vehicle computing system (e.g., a vehicle navigation or media management system) that operates a chatbot may receive input from the user.
- the input may arrive during a first “session” between the user and the chatbot in various forms, including but not limited to spoken or voice input, typed input, gesture input, eye movement input, facial expression input, and so forth.
- the chatbot may semantically process the input to determine a state of the user (e.g., sentiment) expressed by the user, and may store an indication of the state of the user for later use. For example, suppose during a first session a user indicates a negative state, e.g., by saying, “I feel lousy,” or by making a facial expression associated with negativity (e.g., frowning, grimacing, etc.).
- the chatbot may detect and retain in memory an indication of the user's negative state, such as the user's actual statement and/or a sentiment measure.
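The stored indication might be pictured as a timestamped record holding both the user's actual statement and a sentiment measure. This is only an illustrative sketch; the `StateStore` and `StateIndication` names below are invented, not from the patent:

```python
import time
from dataclasses import dataclass, field

@dataclass
class StateIndication:
    """One recorded indication of a user-expressed state."""
    text: str            # the user's actual statement, e.g. "I feel lousy"
    sentiment: float     # signed sentiment measure; negative = bad
    timestamp: float = field(default_factory=time.time)

class StateStore:
    """Retains state indications across sessions for later use."""
    def __init__(self):
        self._history = []

    def record(self, text, sentiment):
        self._history.append(StateIndication(text, sentiment))

    def most_recent(self):
        return self._history[-1] if self._history else None

store = StateStore()
store.record("I feel lousy", sentiment=-0.8)
assert store.most_recent().sentiment < 0
```

Keeping the full history (rather than only the latest entry) also supports the sequence-of-inputs storage described later.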
- during a subsequent session, the chatbot may form one or more statements (e.g., empathetic statements such as “I hope you're feeling better,” “I hope your dog is feeling better,” or inquiries such as “are you feeling better?”) from one or more candidate words, phrases, or statements, and output them to the user based on the stored user state indication.
- the chatbot is able to retain knowledge of the user's state over time, and is able to engage the user in a more socially reasonable manner.
- a “session” may include a logically and/or temporally self-contained exchange of one or more messages between the user and the chatbot.
- a chatbot may distinguish between multiple sessions with a user based on one or more signals, such as passage of time (e.g., a predetermined time interval) between sessions, change of user context (e.g., location, before/during/after a scheduled meeting, etc.) between sessions, detection of one or more intervening interactions between the user and the client device other than dialog between the user and the chatbot (e.g., the user switches applications for a while, the user walks away from then later returns to a standalone voice-activated product), locking/sleeping of the client device between sessions, and so forth.
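A minimal sketch of how such session-boundary signals might be combined follows. The timeout value and parameter names are assumptions made for illustration; any single signal sufficing is one possible policy, not the only one:

```python
SESSION_TIMEOUT_S = 30 * 60  # hypothetical "predetermined time interval"

def new_session(last_interaction_ts, now_ts, last_context, context,
                intervening_interaction=False, device_was_locked=False):
    """Heuristic session-boundary check over the signals described above."""
    if now_ts - last_interaction_ts > SESSION_TIMEOUT_S:
        return True                      # passage of time between sessions
    if context != last_context:
        return True                      # e.g. location or calendar change
    if intervening_interaction:
        return True                      # user switched apps, walked away, ...
    if device_was_locked:
        return True                      # device locked/slept in between
    return False

assert new_session(0, 3600, "home", "home")      # long gap -> new session
assert not new_session(0, 60, "home", "home")    # same short session
```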
- a chatbot may track a user's state over more than two sessions. For example, a chatbot may learn that at particular times of particular days each week (or month, or year), a user tends to have a particular user state. The chatbot may proactively output statements that are targeted towards these learned user states, giving the chatbot the appearance of empathy.
- Configuring a chatbot as described herein may give rise to various technical effects and advantages. For example, the more empathetic (and hence, more “human”) a chatbot appears, the more likely a user may be to converse with it in the future. The more a user utilizes a chatbot, the more the chatbot may be able to learn about the user and the user's lifestyle/interactions. Consequently, the chatbot may be able to make more intelligent recommendations and provide more useful assistance in the future, increasing the chatbot's overall efficiency and conserving computing resources such as memory, power, processor cycles, and/or network bandwidth. Moreover, tracking a state of a user may yield more efficient dialog between the user and the chatbot, likewise decreasing the consumption of computing resources. For example, if a chatbot issues a statement that reflects a user's previous state, the user may immediately issue directed requests to the chatbot, without having to remind the chatbot of the user's state.
- Chatbots may output statements obtained from various sources.
- the chatbot may have access to a library of statements extracted from prior message exchange threads between multiple participants (assuming, of course, the prior message exchange threads were authorized for such use).
- the chatbot may map one or more user states (e.g., sentiments) to groups of statements from the library, e.g., using heuristics.
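One way to picture such a heuristic mapping is a simple lookup from coarse states to groups of candidate statements; the state labels and statements below are invented for illustration:

```python
# Hypothetical heuristic mapping from coarse user states to groups of
# candidate statements extracted from (authorized) prior message threads.
STATEMENTS_BY_STATE = {
    "negative": ["I'm sorry to hear that.", "I hope you're feeling better."],
    "positive": ["Glad to hear it!", "That's great news."],
    "neutral":  ["How can I help?"],
}

def candidates_for(state):
    """Return the candidate statement group for a state, defaulting to neutral."""
    return STATEMENTS_BY_STATE.get(state, STATEMENTS_BY_STATE["neutral"])

assert "I'm sorry to hear that." in candidates_for("negative")
assert candidates_for("unknown") == ["How can I help?"]
```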
- the chatbot may utilize a machine learning classifier that is trained based at least in part on pairs of participant statements expressing sentiment and participant responses to those statements of sentiment extracted from the prior message exchange threads.
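As a stand-in for such a trained classifier, the toy retrieval below matches a new statement of sentiment against training pairs and returns the response whose paired statement shares the most tokens. A real implementation would learn this mapping (e.g., with a neural network) rather than count shared words; the training pairs here are invented:

```python
# Hypothetical (statement, response) pairs extracted from prior threads.
TRAINING_PAIRS = [
    ("i feel lousy", "I'm sorry to hear that."),
    ("my dog is sick", "I hope your dog is feeling better."),
    ("i got the job", "Congratulations!"),
]

def respond(statement):
    """Return the response paired with the most similar training statement."""
    words = set(statement.lower().split())
    def overlap(pair):
        return len(words & set(pair[0].split()))
    best = max(TRAINING_PAIRS, key=overlap)
    return best[1]

assert respond("I feel pretty lousy today") == "I'm sorry to hear that."
```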
- a method may include: receiving, at a client device operating a chatbot, input from a user, wherein the input is received during a first session between the user and the chatbot, and the input is based on user interface input generated by the user via one or more input devices of the client device; semantically processing, by the chatbot, the input from the user to determine a state expressed by the user to the chatbot; storing, by the chatbot, an indication of the state expressed by the user in memory for future use by the chatbot; determining, by the chatbot based on one or more signals, that a second session between the user and the chatbot is underway; and outputting, by the chatbot, as part of the second session, a statement formed from a plurality of candidate words, phrases, or statements based on the stored indication of the state expressed by the user, wherein the statement is output to the user via one or more output devices of the client device.
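The claimed steps can be sketched end to end as follows; every class, method, and statement below is hypothetical scaffolding, not the patent's implementation:

```python
class Chatbot:
    """Minimal illustration of the claimed method steps."""
    def __init__(self):
        self._stored_state = None

    def semantically_process(self, text):
        # Crude stand-in for semantic processing of the user's input.
        return "negative" if "lousy" in text.lower() else "neutral"

    def store_state(self, state):
        # Store an indication of the expressed state for future use.
        self._stored_state = state

    def second_session_underway(self, signals):
        # Any of the described signals (time gap, context change, ...) suffices
        # in this sketch.
        return any(signals.values())

    def form_statement(self):
        # Form a statement from candidates based on the stored indication.
        candidates = {
            "negative": "I hope you're feeling better.",
            "neutral": "Hello again!",
        }
        return candidates[self._stored_state]

bot = Chatbot()
bot.store_state(bot.semantically_process("I feel lousy today"))
greeting = None
if bot.second_session_underway({"time_gap": True, "context_change": False}):
    greeting = bot.form_statement()
assert greeting == "I hope you're feeling better."
```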
- the determining may include determining that the second session has commenced.
- the statement formed from the plurality of candidate words, phrases, or statements may be a greeting selected from a plurality of candidate greetings.
- the state expressed by the user may be a negative sentiment, and the statement formed from the plurality of candidate words, phrases, or statements may include an inquiry of whether the user or other individual about which the state was expressed has improved.
- the method may further include: receiving, at the client device, voice input from the user; and converting the voice input into textual input.
- the statement may be formed remotely from the client device or locally by the chatbot.
- the method may further include outputting, by the chatbot during the first session in response to the input from the user, a response selected from a plurality of candidate responses based on the state expressed by the user.
- the state expressed by the user may be a negative sentiment, and the response selected from the plurality of candidate responses may include an empathetic statement.
- the one or more signals may include detection of one or more intervening interactions between the user and the client device other than dialog between the user and the chatbot. In various implementations, the one or more signals may include passage of a predetermined time interval since a last interaction between the user and the chatbot. In various implementations, the one or more signals may include detection of a change in a context of the user since a last interaction between the user and the chatbot.
- the chatbot may obtain the plurality of candidate words, phrases, or statements from prior message exchange threads between multiple individuals.
- the statement may be formed based on a machine learning classifier trained based at least in part on the prior message exchange threads.
- the storing may include storing the textual user input in a sequence of user inputs that include states expressed by the user over time.
- the statement may be formed further based on a change of context of the user detected by the client device between the first session and the second session.
- the state expressed by the user may be a sentiment of the user.
- Some implementations include an apparatus including memory and one or more processors operable to execute instructions stored in the memory, where the instructions are configured to perform any of the aforementioned methods. Some implementations also include a non-transitory computer readable storage medium storing computer instructions executable by one or more processors to perform any of the aforementioned methods.
- FIG. 1 illustrates an example architecture of a computer system.
- FIG. 2 is a block diagram of an example distributed voice input processing environment.
- FIG. 3 is a flowchart illustrating an example method of processing a voice input using the environment of FIG. 2 .
- FIG. 4 and FIG. 5 illustrate examples of how disclosed techniques may be implemented in an example scenario, in accordance with various implementations.
- FIG. 6 is a flowchart illustrating an example method performable by and/or on behalf of a chatbot, in accordance with various implementations.
- FIG. 7 is an example of how user states may be mapped to groups of statements, in accordance with various implementations.
- a client device such as a smart phone, smart watch, standalone voice-activated product, or a vehicle computing system (e.g., a vehicle navigation or media management system) that operates a chatbot may receive input from the user.
- the input may arrive during a first “session” between the user and the chatbot in various forms using various modalities, such as spoken or typed input, gesture input, facial expression of the user, eye movements, and so forth. If the input is received as voice input, it may first be parsed and tokenized into text tokens as described below.
- textual input includes both voice input (that is ultimately converted to text) and input that a user types using a virtual or physical keyboard.
- the chatbot may semantically process the textual input to determine a state (e.g., sentiment) expressed by the user (which may relate to the user or to someone else, such as the user's family member/friend/pet/co-worker), and may store an indication of the state for later use. For example, if during a first session a user says, “I feel lousy,” the chatbot may retain in memory an indication of the user's sentiment, such as the user's statement itself.
- the chatbot may form one or more statements (e.g., empathetic statements such as “I hope you're feeling better,” “I hope your family is feeling better,” or inquiries such as “are you feeling better?,” etc.) to output to the user via one or more output devices.
- the chatbot is able to retain knowledge of states expressed by the user over time (about the user and/or others), and is able to engage the user in a more socially reasonable manner.
- a “session” may include a logically and/or temporally self-contained exchange of one or more messages between the user and the chatbot.
- a chatbot may distinguish between multiple sessions with a user based on one or more signals, such as passage of time between sessions, change of user context (e.g., location, before/during/after a scheduled meeting, etc.) between sessions, detection of one or more intervening interactions between the user and the client device other than dialog between the user and the chatbot (e.g., the user switches applications for a while, the user walks away from then later returns to a standalone voice-activated product), locking/sleeping of the client device between sessions, and so forth.
- a chatbot may track states expressed by a user over more than two sessions. For example, a chatbot may learn that at particular times of particular days each week (or month, or year), a user (or someone whom the user knows) tends to have a particular state. The chatbot may proactively output statements that are targeted towards learned states, giving the chatbot the appearance of empathy. For example, suppose a user indicates a romantic sentiment to a chatbot every year around the user's wedding anniversary. The chatbot may proactively issue statements leading up to the user's anniversary that put the user in a romantic state of mind (e.g., “Remember how smitten you were this time last year?”).
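Learning that a particular state recurs at a particular time slot could be as simple as counting past observations per slot, as this sketch assumes (the observation format is invented):

```python
from collections import Counter

def recurring_state(observations, weekday, hour):
    """Given (weekday, hour, state) observations across many sessions, return
    the state most often expressed at that time slot, if any."""
    counts = Counter(s for (d, h, s) in observations
                     if d == weekday and h == hour)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

obs = [("mon", 9, "stressed"), ("mon", 9, "stressed"), ("mon", 9, "neutral"),
       ("sat", 20, "romantic")]
assert recurring_state(obs, "mon", 9) == "stressed"
assert recurring_state(obs, "tue", 9) is None
```

A chatbot could then proactively select a statement targeted at the learned state for the upcoming slot.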
- FIG. 1 is a block diagram of electronic components in an example computer system 10 .
- System 10 typically includes at least one processor 12 that communicates with a number of peripheral devices via bus subsystem 14 .
- peripheral devices may include a storage subsystem 16 , including, for example, a memory subsystem 18 and a file storage subsystem 20 , user interface input devices 22 , user interface output devices 24 , and a network interface subsystem 26 .
- the input and output devices allow user interaction with system 10 .
- Network interface subsystem 26 provides an interface to outside networks and is coupled to corresponding interface devices in other computer systems.
- user interface input devices 22 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices.
- use of the term “input device” is intended to include all possible types of devices and ways to input information into computer system 10 or onto a communication network.
- User interface output devices 24 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices.
- the display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image.
- the display subsystem may also provide non-visual output such as audio output.
- output device is intended to include all possible types of devices and ways to output information from computer system 10 to the user or to another machine or computer system.
- Storage subsystem 16 stores programming and data constructs that provide the functionality of some or all of the modules described herein.
- the storage subsystem 16 may include the logic to perform selected aspects of the methods disclosed hereinafter.
- Memory subsystem 18 used in storage subsystem 16 may include a number of memories including a main random access memory (RAM) 28 for storage of instructions and data during program execution and a read only memory (ROM) 30 in which fixed instructions are stored.
- a file storage subsystem 20 may provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges.
- the modules implementing the functionality of certain implementations may be stored by file storage subsystem 20 in the storage subsystem 16 , or in other machines accessible by the processor(s) 12 .
- Bus subsystem 14 provides a mechanism for allowing the various components and subsystems of system 10 to communicate with each other as intended. Although bus subsystem 14 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple busses.
- System 10 may be of varying types including a mobile device, a portable electronic device, an embedded device, a standalone voice-activated product, a vehicle computing system (e.g., a vehicle navigation or media management system), a desktop computer, a laptop computer, a tablet computer, a wearable device, a workstation, a server, a computing cluster, a blade server, a server farm, or any other data processing system or computing device.
- Implementations discussed hereinafter may include one or more methods implementing various combinations of the functionality disclosed herein.
- Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform a method such as one or more of the methods described herein.
- Still other implementations may include an apparatus including memory and one or more processors operable to execute instructions, stored in the memory, to perform a method such as one or more of the methods described herein.
- FIG. 2 illustrates an example distributed voice input processing environment 50 , e.g., for use with a voice-enabled device 52 (or more generally, a “client device”) in communication with an online service such as online semantic processor 54 .
- voice-enabled device 52 is described as a mobile device such as a cellular phone or tablet computer.
- Other implementations may utilize a wide variety of other voice-enabled devices, however, so the references hereinafter to mobile devices are merely for the purpose of simplifying the discussion hereinafter.
- voice-enabled devices may use the herein-described functionality, including, for example, laptop computers, watches, head-mounted devices, virtual or augmented reality devices, other wearable devices, audio/video systems, navigation systems, automotive and other vehicular systems, standalone voice-activated products, etc.
- voice-enabled devices may be considered to be resource-constrained in that the memory and/or processing capacities of such devices may be constrained based upon technological, economic or other reasons, particularly when compared with the capacities of online or cloud-based services that can devote virtually unlimited computing resources to individual tasks.
- online semantic processor 54 may be implemented as a cloud-based service employing a cloud infrastructure, e.g., using a server farm or cluster of high performance computers running software suitable for handling high volumes of requests from multiple users. Online semantic processor 54 may not be limited to voice-based requests, and may also be capable of handling other types of requests, e.g., text-based requests, image-based requests, etc. In some implementations, online semantic processor 54 may handle voice-based requests such as setting alarms or reminders, managing lists, initiating communications with other users via phone, text, email, etc., or performing other actions that may be initiated via voice input. In other implementations, online semantic processor 54 may handle other types of voice inputs, such as conversational statements from a user expressing the user's state (e.g., sentiment).
- voice input received by voice-enabled device 52 is processed by a voice-enabled application (or “app”), which in FIG. 2 takes the form of a chatbot 56 .
- voice input may be handled within an operating system or firmware of voice-enabled device 52 .
- chatbot 56 includes a voice action module 58 , online interface module 60 and render/synchronization module 62 .
- Voice action module 58 receives voice input directed to chatbot 56 and coordinates the analysis of the voice input and performance of one or more actions for a user of the voice-enabled device 52 .
- Online interface module 60 provides an interface with online semantic processor 54 , including forwarding voice input to online semantic processor 54 and receiving responses thereto.
- Render/synchronization module 62 manages the rendering of a response to a user, e.g., via a visual display, spoken audio, or other feedback interface suitable for a particular voice-enabled device.
- render/synchronization module 62 also handles synchronization with online semantic processor 54 , e.g., whenever a response or action affects data maintained for the user in the online search service (e.g., where voice input requests creation of an appointment that is maintained in a cloud-based calendar).
- Chatbot 56 may rely on various middleware, framework, operating system and/or firmware modules to handle voice input, including, for example, a streaming voice to text module 64 and a semantic processor module 66 including a parser module 68 , dialog manager module 70 and action builder module 72 .
- Streaming voice to text module 64 receives an audio recording of voice input, e.g., in the form of digital audio data, and converts the digital audio data into one or more textual words or phrases (also referred to herein as “tokens”).
- Streaming voice to text module 64 is also a streaming module, such that voice input is converted to text on a token-by-token basis and in real time or near-real time.
- tokens may be output from streaming voice to text module 64 concurrently with a user's speech, and thus prior to a user enunciating a complete spoken request.
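The token-by-token behavior can be illustrated with a generator that emits text as each audio chunk is decoded, rather than waiting for the complete utterance. The stand-in "decoder" below simply passes pre-segmented chunks through; a real module would apply the acoustic and language models:

```python
def streaming_tokens(audio_chunks, decode):
    """Yield text tokens as each audio chunk is decoded (streaming style)."""
    for chunk in audio_chunks:
        token = decode(chunk)
        if token:
            yield token

# Stand-in decoder: maps pre-segmented "chunks" straight to tokens.
fake_chunks = ["remind", "me", "to", "pick", "up", "bread"]
tokens = list(streaming_tokens(fake_chunks, decode=lambda c: c))
assert tokens == ["remind", "me", "to", "pick", "up", "bread"]
```

Because it is a generator, a consumer can begin semantic processing of early tokens while the user is still speaking.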
- Streaming voice to text module 64 may rely on one or more locally-stored offline acoustic and/or language models 74 , which together model a relationship between an audio signal and phonetic units in a language, along with word sequences in the language.
- a single model 74 may be used, while in other implementations, multiple models may be supported, e.g., to support multiple languages, multiple speakers, etc.
- semantic processor module 66 attempts to discern the semantics or meaning of the text output by streaming voice to text module 64 (or provided initially by the user as typed text) for the purpose of formulating an appropriate response.
- Parser module 68 relies on one or more offline grammar models 76 to map text to particular actions and to identify attributes that constrain the performance of such actions, e.g., input variables to such actions.
- a single model 76 may be used, while in other implementations, multiple models may be supported, e.g., to support different actions or action domains (i.e., collections of related actions such as communication-related actions, search-related actions, audio/visual-related actions, calendar-related actions, device control-related actions, etc.).
- an offline grammar model 76 may support an action such as “set a reminder” having a reminder type parameter that specifies what type of reminder to set, an item parameter that specifies one or more items associated with the reminder, and a time parameter that specifies a time to activate the reminder and remind the user.
- Parser module 68 may receive a sequence of tokens such as “remind me to,” “pick up,” “bread,” and “after work” and map the sequence of tokens to the action of setting a reminder with the reminder type parameter set to “shopping reminder,” the item parameter set to “bread” and the time parameter of “5:00 pm”, such that at 5:00 pm that day the user receives a reminder to “buy bread.”
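The token-to-action mapping just described can be sketched as follows. This is a minimal illustration only; the phrase patterns, parameter names, and the "after work" default of 5:00 pm are assumptions for the sketch, not the disclosure's actual grammar model 76:

```python
import re
from typing import Optional

# Hypothetical grammar-model rule mapping an utterance to a structured
# "set a reminder" action; patterns and defaults are illustrative.
def parse_reminder(utterance: str) -> Optional[dict]:
    match = re.match(
        r"remind me to (?P<item>.+?) (?P<when>after work|tomorrow)$", utterance
    )
    if match is None:
        return None
    # Map a coarse time expression to a concrete activation time (assumed defaults).
    when = {"after work": "5:00 pm", "tomorrow": "9:00 am"}[match.group("when")]
    return {
        "action": "set_reminder",
        "reminder_type": "shopping reminder",  # assumed for this illustration
        "item": match.group("item"),
        "time": when,
    }

action = parse_reminder("remind me to pick up bread after work")
```

In a fuller implementation, each action domain would contribute its own rules, and the item would be further normalized (e.g., "pick up bread" to "bread").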
- Parser module 68 may also work in conjunction with a dialog manager module 70 that manages dialog with a user.
- Dialog in this context refers to a set of voice inputs and responses similar to a conversation between two individuals. Module 70 therefore maintains a “state” of dialog to enable information obtained from a user in a prior voice input to be used when forming future outputs. Thus, for example, if a user were to say “I'm stressed,” a response could be generated to say “maybe it's time for a break.”
- In some implementations, dialog manager module 70 may be implemented in whole or in part as part of chatbot 56.
- Action builder module 72 receives parsed text from parser module 68, representing a voice input interpretation, and generates one or more responsive actions or "tasks," along with any associated parameters, for processing by module 62 of chatbot 56.
- Action builder module 72 may rely on one or more offline action models 78 that incorporate various rules for creating actions from parsed text. It will be appreciated that some parameters may be directly received as voice input, while other parameters may be determined in other manners, e.g., based upon a user's location, demographic information, or other information particular to a user.
- In some instances, a location parameter may not be determinable without additional information, such as the user's current location, the user's known route between work and home, the user's regular grocery store, etc.
- In some implementations, models 74, 76 and 78 may be combined into fewer models or split into additional models, as may be the functionality of modules 64, 68, 70 and 72.
- Models 74-78 are referred to herein as offline models insofar as they are stored locally on voice-enabled device 52 and are thus accessible offline, when device 52 is not in communication with online semantic processor 54.
- While module 56 is described herein as being a chatbot, that is not meant to be limiting.
- In various implementations, any type of app operating on voice-enabled device 52 may perform the techniques described herein to tailor output to a user's state.
- In some implementations, online semantic processor 54 may include complementary functionality for handling voice input, e.g., using a voice-based query processor 80 that relies on various online acoustic/language, grammar and/or action models 82. It will be appreciated that in some implementations, particularly when voice-enabled device 52 is a resource-constrained device, voice-based query processor 80 and the models 82 used thereby may implement more complex and computationally resource-intensive voice processing functionality than is available locally to voice-enabled device 52.
- In some implementations, multiple voice-based query processors 80 may be employed, each acting as an online counterpart for one or more chatbots 56.
- For example, each client device in a user's ecosystem of client devices may be configured to operate an instance of a chatbot 56 that is associated with the user (e.g., configured with the user's preferences, associated with the same interaction history, etc.).
- A single, user-centric online instance of voice-based query processor 80 may then be accessible to each of these multiple instances of chatbot 56, depending on which client device the user is operating at the time.
- In some implementations, both online and offline functionality may be supported, e.g., such that online functionality is used whenever a client device is in communication with an online service, while offline functionality is used when no connectivity exists.
- In other implementations, different actions or action domains may be allocated to online and offline functionality, while in still other implementations online functionality may be used only when offline functionality fails to adequately handle a particular voice input. In yet other implementations, no complementary online functionality may be used.
- FIG. 3 illustrates a voice processing routine 100 that may be executed by voice-enabled device 52 to handle a voice input.
- Routine 100 begins in block 102 by receiving voice input, e.g., in the form of a digital audio signal.
- An initial attempt is then made to forward the voice input to the online search service (block 104).
- If that attempt is unsuccessful, block 106 passes control to block 108 to convert the voice input to text tokens (block 108, e.g., using streaming voice to text module 64 of FIG. 2), parse the text tokens (block 110, e.g., using module 68 of FIG. 2), and build an action from the parsed text (block 112, e.g., using module 72 of FIG. 2).
- If the online attempt is successful, block 106 bypasses blocks 108-112 and passes control directly to block 114 to perform client-side rendering and synchronization. Processing of the voice input is then complete. It will be appreciated that in other implementations, as noted above, offline processing may be attempted prior to online processing, e.g., to avoid unnecessary data communications when a voice input can be handled locally.
- FIGS. 4 and 5 schematically demonstrate an example scenario in which chatbot 56 may track a user's state (e.g., sentiment) across multiple sessions and output a statement formed based on the user's last known state.
- a voice-enabled device 152 takes the form of a smart phone or tablet computer with a touch screen display 154 that is used to render a transcript 156 of a dialog between a user (“YOU” in FIGS. 4 and 5 ) and a chatbot ( 56 in FIG. 2 ).
- FIG. 4 depicts a first session between the user and the chatbot that occurs late in the evening of August 1st.
- The user has provided textual input (originally spoken or typed) indicating that the user feels lousy.
- The chatbot has detected the negative user state and has provided a suitable response, such as "I'm sorry to hear that."
- The chatbot has also stored an indication of the user's expressed state in memory, e.g., of voice-enabled device 152.
- In some implementations, the chatbot may store the user's statement verbatim in memory.
- In other implementations, the chatbot may determine a generic user state (e.g., a numeric sentiment measure or enumerated sentiment level) from the user's statement, such as "sick," "sad," "depressed," etc., and may store an indication of that generic user state.
- FIG. 5 depicts a second session between the user and the chatbot that occurs the next morning, on August 2nd.
- The user initiates the second session by asking, "What's the weather today?"
- The chatbot first responds to the user's query by replying, "80 degrees and sunny." Then, without any prompting from the user, and based on the user's negative state expressed the previous evening, the chatbot asks, "Are you feeling better?" While the second session depicted in FIG. 5 occurs one day after the first session depicted in FIG. 4, this is not meant to be limiting.
- In various implementations, separate sessions between the user and the chatbot may be distinguished from each other based on other signals, such as intervening interactions between the user and voice-enabled device 152, a change of context of the user (which may be detected, for instance, based on one or more signals from one or more sensors associated with voice-enabled device 152, such as accelerometers, GPS, etc.), and so forth.
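The session-distinguishing signals described above can be combined into a simple detector. The following sketch is illustrative only; the signal names and the 30-minute gap threshold are assumptions, not values from the disclosure:

```python
from dataclasses import dataclass

# Hypothetical snapshot of signals available when a new user message arrives.
@dataclass
class Signals:
    seconds_since_last_message: float
    context_changed: bool           # e.g., location change from GPS/accelerometer
    intervening_interaction: bool   # user used other apps on the device in between
    device_was_locked: bool         # device locked/slept since last message

# Assumed threshold: messages more than 30 minutes apart start a new session.
SESSION_GAP_SECONDS = 30 * 60

def is_new_session(signals: Signals) -> bool:
    # Any one signal is treated as sufficient to mark a session boundary.
    return (
        signals.seconds_since_last_message > SESSION_GAP_SECONDS
        or signals.context_changed
        or signals.intervening_interaction
        or signals.device_was_locked
    )
```

A production system might instead weight or learn these signals rather than treating any one as decisive.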
- FIG. 6 illustrates a routine 660 suitable for execution by a chatbot 56 to communicate with a user in a more natural (i.e., "human," "graceful") manner.
- Routine 660 may be executed by the same service that processes voice-based queries, or may be a different service altogether.
- Input is received from a user during a first session.
- The input may take various forms and/or be received using various input modalities.
- The input may take the form of a digital audio signal or text typed by the user at a physical or virtual keyboard.
- The input may take other forms, such as gestures (e.g., shaking a phone may indicate excitement or frustration), eye movements (e.g., a lot of eye movement may indicate stress or excitement), and so forth.
- The user input may be semantically processed online or offline to determine a state of the user.
- If the input was received as voice input, it may be converted to text tokens (e.g., using streaming voice to text module 64 and/or model 82 of FIG. 2) and then semantically processed at block 664.
- Chatbot 56 may store, e.g., in local memory and/or at one or more remote computing devices (e.g., hosted in the cloud), an indication of the user's state.
- The indication may include the user's statement verbatim.
- The indication may include a generalized label of the user's state (e.g., "happy," "sad," "sick," "excited," "stressed," etc.).
- The indication may be stored as a numeric state (or "sentiment") measure.
- The user input may be stored as part of a sequence of user inputs that express states of the user over time (e.g., across a plurality of distinct sessions).
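One possible shape for such stored indications is sketched below; the record fields and the in-memory store are illustrative assumptions, and a real implementation could persist the sequence locally or in the cloud as described above:

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record of one expressed user state, combining the three
# kinds of indication mentioned above (verbatim, label, numeric measure).
@dataclass
class StateIndication:
    verbatim: str            # the user's statement, stored verbatim
    label: str               # generalized label, e.g., "sick", "happy"
    sentiment: float         # numeric sentiment measure, e.g., -1.0 to 1.0
    timestamp: float = field(default_factory=time.time)

class StateStore:
    """Keeps a chronological sequence of expressed states across sessions."""

    def __init__(self) -> None:
        self._history: list = []

    def record(self, indication: StateIndication) -> None:
        self._history.append(indication)

    def last_known_state(self) -> Optional[StateIndication]:
        return self._history[-1] if self._history else None

store = StateStore()
store.record(StateIndication("I feel lousy", "sick", -0.6))
```

At the start of a later session, `last_known_state()` supplies the state from which a statement such as "Are you feeling better?" can be formed.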
- Chatbot 56 may determine, based on one or more signals, that a subsequent session has commenced and/or is underway. Examples of signals that may be considered by chatbot 56 when distinguishing between multiple distinct sessions were described previously.
- Chatbot 56 may output one or more statements that are formed, e.g., by chatbot 56 or by a remote processor, from a plurality of candidate words, phrases, and/or statements.
- The plurality of candidate words, phrases, and/or statements may be obtained/extracted from prior message exchange threads between multiple individuals.
- A corpus of prior message exchange threads may be authorized for use in training an artificial intelligence scheme such as a machine learning classifier or neural network.
- User words, phrases, and/or statements in the message exchange threads expressing states of the users (e.g., user sentiments) may be provided as labeled inputs.
- Responses to those statements from other users in the message exchange threads may be provided as labeled outputs.
- Responses of empathy, congratulations, encouragement, etc. may be identified as responses to user statements expressing sentiment.
- A machine learning classifier, neural network, or other artificial intelligence model may be trained using these labeled pairs to identify future words, phrases, and/or statements to be formed and provided by chatbot 56 in response to user statements of sentiment.
- In other implementations, less complex techniques may be employed to identify suitable statements for chatbot 56 to output in subsequent sessions.
- For example, a plurality of candidate statements may be provided for each of an enumerated set of user states (e.g., sentiments). Whenever a new session commences between chatbot 56 and a user, the user's last known state (or a combination of a plurality of previous states) may be used to identify the user's potential current sentiment. Then, a statement may be selected from a plurality of candidate statements associated with that sentiment.
- FIG. 7 schematically depicts a non-limiting example of how various levels of user state, and more particularly, user sentiment, may be mapped to a plurality of candidate statements to be output by a chatbot.
- On the left is a range of sentiments from strong negative to strong positive, with intermediate values in between, that may be determined by a chatbot based on user input during a first session.
- On the right are candidate statements that may be selected and output by a chatbot during a subsequent session with the user based on the previously-determined sentiment.
- In some implementations, a chatbot may form (e.g., assemble) a statement from a plurality of candidate words, phrases, and/or complete statements.
- Each level of sentiment may map to a plurality of candidate statements. For example, both strong negative and negative map to the same group of four candidate statements ("I'm sorry for your loss," "Is there anything I can do to help?," "I hope you're feeling better," "How are you feeling?"). Neutral maps to three relatively generic candidate statements that take the form of common greetings ("Good morning," "Good afternoon," "How are you?"). Positive and strong positive both map to another group of four candidate statements ("Still glowing?," "Still in a good mood?," "Wonderful to see you so happy <insert previous time>," "Congratulations").
- The sentiment levels and candidate statements depicted in FIG. 7 are for illustrative purposes only and are not meant to be limiting. Any number of candidate statements may be provided for any number of sentiment levels.
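The FIG. 7 mapping can be expressed as a simple lookup. The sketch below abbreviates the statement groups slightly and uses random selection, which is only one of several possible selection strategies; the enumeration of sentiment levels is an illustrative assumption:

```python
import random

# Sentiment levels mapped to candidate statements, mirroring the groups of FIG. 7.
CANDIDATES = {
    "strong_negative": ["I'm sorry for your loss", "Is there anything I can do to help?",
                        "I hope you're feeling better", "How are you feeling?"],
    "negative": ["I'm sorry for your loss", "Is there anything I can do to help?",
                 "I hope you're feeling better", "How are you feeling?"],
    "neutral": ["Good morning", "Good afternoon", "How are you?"],
    "positive": ["Still glowing?", "Still in a good mood?", "Congratulations"],
    "strong_positive": ["Still glowing?", "Still in a good mood?", "Congratulations"],
}

def statement_for(last_known_sentiment: str) -> str:
    # Random choice here; alternatives include picking the most generic
    # statement or one scored by the user's prior responses.
    return random.choice(CANDIDATES[last_known_sentiment])
```

Note how two adjacent levels can share one group of candidates, as strong negative and negative do in FIG. 7.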
- If a chatbot determines during a first session with a user that the user has a particular sentiment, the chatbot may output a statement during a subsequent session, such as the next session, that corresponds to the prior sentiment. For example, if a user expresses a negative sentiment during a first session, the chatbot may select and output one of the four statements in the top group of statements.
- The chatbot may also form (e.g., assemble) statements to output from a plurality of candidate words, phrases, and/or statements.
- The chatbot may select and output one or more images, symbols, and/or ideograms (such as one or more so-called "emojis") to convey empathy or otherwise respond to a user's expressed sentiment.
- The chatbot may select a particular statement from a group of candidate statements mapped to one or more sentiment levels in various ways.
- In some implementations, the chatbot may merely select the statement that is the most broadly applicable (i.e., generic). For example, if the chatbot only knows that the user's last sentiment was negative, it may select a relatively generic empathetic response, such as "I hope you're feeling better" or "How are you feeling?"
- In other implementations, the chatbot may randomly select from two or more candidate statements.
- The chatbot may also analyze prior responses by the user to the candidate statements, and may select the candidate statement to which the user has responded most positively in the past. For example, suppose in multiple instances in the past when a user's last-known sentiment was negative, the chatbot has output both "I hope you're feeling better" (a declarative statement) and "How are you feeling?" (a solicitation of a user state). Suppose further that the user ignored the former but responded to the latter (e.g., "Yes, I am feeling better, thank you for asking"). The chatbot may take this into account when selecting which of these two phrases to output in future sessions. To this end, in some implementations, the chatbot may create and maintain scores, statistics, and/or other metrics in association with each candidate statement, so that those statements that elicit positive responses are used more often.
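The response-based scoring described above might be implemented along these lines; the particular update rule (reward a response, mildly penalize being ignored) is an assumption for illustration, not a scheme specified by the disclosure:

```python
from collections import defaultdict

class StatementScorer:
    """Tracks how often each candidate statement elicited a user response."""

    def __init__(self) -> None:
        # Unseen statements default to a score of 0.0.
        self.scores = defaultdict(float)

    def record_outcome(self, statement: str, user_responded: bool) -> None:
        # Assumed update rule: +1.0 when the user responds, -0.5 when ignored.
        self.scores[statement] += 1.0 if user_responded else -0.5

    def best(self, candidates: list) -> str:
        # Prefer the candidate with the highest accumulated score.
        return max(candidates, key=lambda s: self.scores[s])

scorer = StatementScorer()
scorer.record_outcome("I hope you're feeling better", user_responded=False)
scorer.record_outcome("How are you feeling?", user_responded=True)
choice = scorer.best(["I hope you're feeling better", "How are you feeling?"])
```

Over many sessions, statements that elicit positive responses accumulate higher scores and are selected more often, as the passage above describes.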
- As noted above, candidate statements may be drawn from prior message exchange threads between multiple (human) participants that have been authorized for use as training examples, e.g., for machine learning classifiers.
- A so-called "sentiment classifier" trained to identify user statements expressing sentiment may be employed to identify words, phrases, and/or statements by message exchange thread participants expressing various sentiments. Responses and/or replies from other participants may then be identified.
- These pairs of statements may be used as positive training examples for a so-called "sentiment response classifier."
- Triplets of statements (e.g., a first statement expressing sentiment, a second statement empathetically responding to the first statement, and a third statement positively acknowledging the second statement) may be used as positive training examples.
- Triplets of statements (e.g., a first statement expressing sentiment, a second statement empathetically responding to the first statement, and a third statement rejecting or otherwise correcting the second statement) may be used as negative training examples.
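Extraction of such labeled pairs from an authorized corpus could be sketched as follows. The thread representation and the keyword-based sentiment test are placeholder assumptions; a real system would use a trained sentiment classifier, as noted above:

```python
# Hypothetical thread format: a list of (speaker, message) tuples.
# Placeholder sentiment test; stands in for a trained sentiment classifier.
NEGATIVE_MARKERS = ("sad", "lousy", "sick", "stressed")

def expresses_sentiment(message: str) -> bool:
    return any(marker in message.lower() for marker in NEGATIVE_MARKERS)

def extract_training_pairs(thread: list) -> list:
    """Pair each sentiment-expressing message with the next speaker's reply."""
    pairs = []
    for (speaker_a, msg_a), (speaker_b, msg_b) in zip(thread, thread[1:]):
        if expresses_sentiment(msg_a) and speaker_b != speaker_a:
            pairs.append((msg_a, msg_b))  # labeled input -> labeled output
    return pairs

thread = [("alice", "I feel lousy today"), ("bob", "I'm sorry to hear that")]
pairs = extract_training_pairs(thread)
```

Extending this to the triplets described above would additionally capture the original speaker's follow-up to the empathetic reply and label the triplet positive or negative accordingly.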
- In some implementations, statements may be formed based on one or more signals available to the chatbot other than a general indication of prior user sentiment. For example, suppose user input expressing negative sentiment also provides other details, such as "I'm sad because my friend moved away." The additional information about the friend moving away may be semantically processed, e.g., by the chatbot, and may be used to select "I'm sorry for your loss" from the top group of statements in FIG. 7.
- Likewise, if user input expressing positive sentiment provides additional details, the chatbot may select "Congratulations" to be output to the user at a subsequent session, rather than a more generic statement such as "Still in a good mood?"
- The examples described above in association with FIG. 7 include maintaining or tracking user sentiment inter-session (i.e., across multiple sessions). However, this is not meant to be limiting. In some implementations, similar techniques may be employed intra-session (i.e., within a single session). For example, if a user provides input that expresses a negative/positive sentiment, the chatbot may immediately respond with a statement selected from the candidate statements depicted in FIG. 7 (or in other implementations may assemble/form such a statement using candidate words, phrases, and/or statements). The chatbot may also score or otherwise rank the candidate statements based on the user's immediate response or lack thereof, e.g., for future reference.
- In some implementations, a user may, during a given session, express a prior state (e.g., "I felt sick last night").
- The chatbot may, during the same given session, form a response from a plurality of candidate words, phrases, and/or statements (e.g., "Are you feeling better this morning?").
- In some implementations, techniques described herein may be employed without knowledge of a user's prior state (e.g., sentiment). For example, if a chatbot were to output a generic greeting (e.g., "Hello," "How are you," etc.) to a user each time a new session between the user and the chatbot commenced, the user may become annoyed, particularly if there are multiple sessions within a relatively short period of time (e.g., a few minutes, a couple of hours, etc.). For instance, if a chatbot already output the greeting "Good morning" to a user, it would not make sense for the chatbot to output the same greeting later that morning, even if the user engages in multiple distinct sessions with the chatbot.
- Accordingly, the chatbot may maintain, e.g., in memory of voice-enabled device 52, an indication that the greeting "Good morning" has already been output today. Should a morning session between the user and the chatbot cease, and should the user then initiate a new session later in the morning, the chatbot may determine that it has already greeted the user, and may refrain from issuing the same or a similar greeting again.
- In some implementations, a chatbot may be configured to output only a single greeting per day. In other implementations, a chatbot may be configured to output a greeting only if a threshold amount of time has elapsed since it last output a greeting.
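A greeting throttle along these lines can be sketched in a few lines of code; the 4-hour threshold and the in-memory timestamp are illustrative assumptions:

```python
import time
from typing import Optional

# Assumed threshold: suppress repeat greetings within a 4-hour window.
GREETING_INTERVAL_SECONDS = 4 * 60 * 60

class GreetingThrottle:
    def __init__(self) -> None:
        self.last_greeted_at: Optional[float] = None

    def should_greet(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.last_greeted_at is None:
            return True  # never greeted before
        return now - self.last_greeted_at >= GREETING_INTERVAL_SECONDS

    def mark_greeted(self, now: Optional[float] = None) -> None:
        self.last_greeted_at = time.time() if now is None else now

throttle = GreetingThrottle()
first = throttle.should_greet(now=0.0)         # no greeting yet
throttle.mark_greeted(now=0.0)
soon_after = throttle.should_greet(now=600.0)  # a new session 10 minutes later
```

A once-per-day variant would simply compare calendar dates instead of elapsed seconds.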
Abstract
Description
- Chatbots, also referred to as “interactive assistant modules,” “virtual assistants,” and/or “mobile assistants,” may be designed to mimic human conversation. For example, a chatbot may greet a user with conversational statements such as “hello” and “how are you today?” Some chatbots may even be configured to identify a state associated with a user statement and respond accordingly. Suppose a user tells a chatbot, “I feel lousy today.” The chatbot may detect the negative state expressed by the user and may select and output an appropriate response, such as “I'm sorry to hear that.” In spite of efforts to make chatbots seem more “human,” however, chatbots may still tend to come off as unnatural or awkward because, for instance, they do not keep track of users' emotional states over time.
- This specification is directed generally to various techniques for tailoring chatbot output to a user's state in order to achieve more natural dialog. As used herein, a user's “state” may refer to a particular condition of the user (at that time or at a previous time) or of another being (e.g., the user's friend/family member/pet), such as an emotional and/or physical condition (e.g., a sentiment of the user). In various implementations, a client device such as a smart phone, smart watch, standalone voice-activated product, or a vehicle computing system (e.g., a vehicle navigation or media management system) that operates a chatbot may receive input from the user. The input may arrive during a first “session” between the user and the chatbot in various forms, including but not limited to spoken or voice input, typed input, gesture input, eye movement input, facial expression input, and so forth. The chatbot may semantically process the input to determine a state of the user (e.g., sentiment) expressed by the user, and may store an indication of the state of the user for later use. For example, suppose during a first session a user indicates a negative state, e.g., by saying, “I feel lousy,” or by making a facial expression associated with negativity (e.g., frowning, grimacing, etc.). The chatbot may detect and retain in memory an indication of the user's negative state, such as the user's actual statement and/or a sentiment measure. During a subsequent session with the user, the chatbot may form, e.g., from one or more candidate words, phrases, or statements, one or more statements (e.g., empathetic statements such as “I hope you're feeling better,” “I hope your dog is feeling better,” or inquiries such as “are you feeling better?,” etc.) to output to the user based on the stored user state indication. In this manner, the chatbot is able to retain knowledge of the user's state over time, and is able to engage the user in a more socially reasonable manner.
- A “session” may include a logically and/or temporally self-contained exchange of one or more messages between the user and the chatbot. A chatbot may distinguish between multiple sessions with a user based on one or more signals, such as passage of time (e.g., a predetermined time interval) between sessions, change of user context (e.g., location, before/during/after a scheduled meeting, etc.) between sessions, detection of one or more intervening interactions between the user and the client device other than dialog between the user and the chatbot (e.g., the user switches applications for a while, the user walks away from then later returns to a standalone voice-activated product), locking/sleeping of the client device between sessions, and so forth. In some implementations, a chatbot may track a user's state over more than two sessions. For example, a chatbot may learn that at particular times of particular days each week (or month, or year), a user tends to have a particular user state. The chatbot may proactively output statements that are targeted towards these learned user states, giving the chatbot the appearance of empathy.
- Techniques described herein may give rise to various technical effects and advantages. For example, the more empathetic (and hence, more "human") a chatbot appears, the more likely a user may be to converse with it in the future. The more a user utilizes a chatbot, the more the chatbot may be able to learn about the user and the user's lifestyle/interactions. Consequently, the chatbot may be able to make more intelligent recommendations and provide more useful assistance in the future, increasing the chatbot's overall efficiency and conserving computing resources such as memory, power, processor cycles, and/or network bandwidth. Moreover, tracking a state of a user may yield more efficient dialog between the user and the chatbot, likewise decreasing the consumption of computing resources. For example, if a chatbot issues a statement that reflects a user's previous state, the user may immediately issue directed requests to the chatbot, without the user having to remind the chatbot of the user's state.
- Chatbots may output statements obtained from various sources. In some implementations, the chatbot may have access to a library of statements extracted from prior message exchange threads between multiple participants (assuming, of course, the prior message exchange threads were authorized for such use). In some implementations, the chatbot may map one or more user states (e.g., sentiments) to groups of statements from the library, e.g., using heuristics. In some implementations, the chatbot may utilize a machine learning classifier that is trained based at least in part on pairs of participant statements expressing sentiment and participant responses to those statements of sentiment extracted from the prior message exchange threads.
- Therefore, in some implementations, a method may include: receiving, at a client device operating a chatbot, input from a user, wherein the input is received during a first session between the user and the chatbot, and the input is based on user interface input generated by the user via one or more input devices of the client device; semantically processing, by the chatbot, the input from the user to determine a state expressed by the user to the chatbot; storing, by the chatbot, an indication of the state expressed by the user in memory for future use by the chatbot; determining, by the chatbot based on one or more signals, that a second session between the user and the chatbot is underway; and outputting, by the chatbot, as part of the second session, a statement formed from a plurality of candidate words, phrases, or statements based on the stored indication of the state expressed by the user, wherein the statement is output to the user via one or more output devices of the client device.
- In various implementations, the determining may include determining that the second session has commenced. In various implementations, the statement formed from the plurality of candidate words, phrases, or statements may be a greeting selected from a plurality of candidate greetings. In various implementations, the state expressed by the user may be a negative sentiment, and the statement formed from the plurality of candidate words, phrases, or statements may include an inquiry of whether the user or other individual about which the state was expressed has improved.
- In various implementations, the method may further include: receiving, at the client device, voice input from the user; and converting the voice input into textual input. In various implementations, the statement may be formed remotely from the client device or locally by the chatbot. In various implementations, the method may further include outputting, by the chatbot during the first session in response to the input from the user, a response selected from a plurality of candidate responses based on the state expressed by the user. In various implementations, the state expressed by the user may be a negative sentiment, and the response selected from the plurality of candidate responses may include an empathetic statement.
- In various implementations, the one or more signals may include detection of one or more intervening interactions between the user and the client device other than dialog between the user and the chatbot. In various implementations, the one or more signals may include passage of a predetermined time interval since a last interaction between the user and the chatbot. In various implementations, the one or more signals may include detection of a change in a context of the user since a last interaction between the user and the chatbot.
- In various implementations, the chatbot may obtain the plurality of candidate words, phrases, or statements from prior message exchange threads between multiple individuals. In various implementations, the statement may be formed based on a machine learning classifier trained based at least in part on the prior message exchange threads.
- In various implementations, the storing may include storing the textual user input in a sequence of user inputs that include states expressed by the user over time. In various implementations, the statement may be formed further based on a change of context of the user detected by the client device between the first session and the second session. In various implementations, the state expressed by the user may be a sentiment of the user.
- In addition, some implementations include an apparatus including memory and one or more processors operable to execute instructions stored in the memory, where the instructions are configured to perform any of the aforementioned methods. Some implementations also include a non-transitory computer readable storage medium storing computer instructions executable by one or more processors to perform any of the aforementioned methods.
- It should be appreciated that all combinations of the foregoing concepts and additional concepts described in greater detail herein are contemplated as being part of the subject matter disclosed herein. For example, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the subject matter disclosed herein.
- FIG. 1 illustrates an example architecture of a computer system.
- FIG. 2 is a block diagram of an example distributed voice input processing environment.
- FIG. 3 is a flowchart illustrating an example method of processing a voice input using the environment of FIG. 2.
- FIG. 4 and FIG. 5 illustrate examples of how disclosed techniques may be implemented in an example scenario, in accordance with various implementations.
- FIG. 6 is a flowchart illustrating an example method performable by and/or on behalf of a chatbot, in accordance with various implementations.
- FIG. 7 is an example of how user states may be mapped to groups of statements, in accordance with various implementations.
- This specification is directed generally to various techniques for tailoring chatbot output to a user's state to achieve more natural dialog. In various implementations, a client device such as a smart phone, smart watch, standalone voice-activated product, or a vehicle computing system (e.g., a vehicle navigation or media management system) that operates a chatbot may receive input from the user. The input may arrive during a first "session" between the user and the chatbot in various forms using various modalities, such as spoken or typed input, gesture input, facial expression of the user, eye movements, and so forth. If the input is received as voice input, it may first be parsed and tokenized into text tokens as described below. Accordingly, as used herein, "textual input" includes both voice input (that is ultimately converted to text) and input that a user types using a virtual or physical keyboard. The chatbot may semantically process the textual input to determine a state (e.g., sentiment) expressed by the user (which may relate to the user or to someone else, such as the user's family member/friend/pet/co-worker), and may store an indication of the state for later use. For example, if during a first session a user says, "I feel lousy," the chatbot may retain in memory an indication of the user's sentiment, such as the user's statement itself. During a subsequent session with the user, the chatbot may form one or more statements (e.g., empathetic statements such as "I hope you're feeling better," "I hope your family is feeling better," or inquiries such as "are you feeling better?," etc.) to output to the user via one or more output devices.
In this manner, the chatbot is able to retain knowledge of states expressed by the user over time (about the user and/or others), and is able to engage the user in a more socially reasonable manner.
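The semantic determination of a user state described above can be caricatured with a simple keyword lookup. The cue lists, labels, and function name below are invented for illustration and stand in for the full semantic processing machinery described later in this specification:

```python
# Toy stand-in for semantically processing textual input into a user
# state label. The cue sets and three-way labeling are assumptions for
# this sketch, not part of the disclosed semantic processor.
NEGATIVE_CUES = {"lousy", "sick", "sad", "stressed", "depressed"}
POSITIVE_CUES = {"great", "happy", "excited", "wonderful"}

def detect_state(text_tokens):
    # Normalize tokens and look for sentiment cues.
    words = {t.lower().strip(".,!?") for t in text_tokens}
    if words & NEGATIVE_CUES:
        return "negative"
    if words & POSITIVE_CUES:
        return "positive"
    return "neutral"
```

For example, the input "I feel lousy" from the scenario above would yield a "negative" label that can then be stored for use in a later session.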
- A “session” may include a logically and/or temporally self-contained exchange of one or more messages between the user and the chatbot. A chatbot may distinguish between multiple sessions with a user based on one or more signals, such as passage of time between sessions, change of user context (e.g., location, before/during/after a scheduled meeting, etc.) between sessions, detection of one or more intervening interactions between the user and the client device other than dialog between the user and the chatbot (e.g., the user switches applications for a while, the user walks away from then later returns to a standalone voice-activated product), locking/sleeping of the client device between sessions, and so forth.
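The session-distinguishing signals listed above might be combined in a heuristic along these lines; the 30-minute threshold, the particular signal set, and the function signature are assumptions for illustration, not values taken from the disclosure:

```python
# Illustrative session-boundary heuristic based on the signals named
# above: passage of time, change of user context (here, location), and
# locking/sleeping of the client device between interactions.
SESSION_GAP_SECONDS = 30 * 60  # assumed threshold

def is_new_session(last_ts, now_ts, last_location, current_location,
                   device_was_locked):
    if now_ts - last_ts > SESSION_GAP_SECONDS:
        return True  # passage of time between sessions
    if last_location is not None and last_location != current_location:
        return True  # change of user context
    if device_was_locked:
        return True  # locking/sleeping of the client device
    return False
```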
- In some implementations, a chatbot may track states expressed by a user over more than two sessions. For example, a chatbot may learn that at particular times of particular days each week (or month, or year), a user (or someone whom the user knows) tends to have a particular state. The chatbot may proactively output statements that are targeted towards learned states, giving the chatbot the appearance of empathy. For example, suppose a user indicates a romantic sentiment to a chatbot every year around the user's wedding anniversary. The chatbot may proactively issue statements leading up to the user's anniversary that put the user in a romantic state of mind (e.g., “Remember how smitten you were this time last year?”).
- Further details regarding selected implementations are discussed hereinafter. It will be appreciated, however, that other implementations are contemplated, so the implementations disclosed herein are not exclusive.
- Now turning to the drawings, wherein like numbers denote like parts throughout the several views,
FIG. 1 is a block diagram of electronic components in an example computer system 10. System 10 typically includes at least one processor 12 that communicates with a number of peripheral devices via bus subsystem 14. These peripheral devices may include a storage subsystem 16, including, for example, a memory subsystem 18 and a file storage subsystem 20, user interface input devices 22, user interface output devices 24, and a network interface subsystem 26. The input and output devices allow user interaction with system 10. Network interface subsystem 26 provides an interface to outside networks and is coupled to corresponding interface devices in other computer systems. - In some implementations, user
interface input devices 22 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into computer system 10 or onto a communication network. - User interface output devices 24 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual output such as audio output. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from
computer system 10 to the user or to another machine or computer system. -
Storage subsystem 16 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 16 may include the logic to perform selected aspects of the methods disclosed hereinafter. - These software modules are generally executed by
processor 12 alone or in combination with other processors. Memory subsystem 18 used in storage subsystem 16 may include a number of memories including a main random access memory (RAM) 28 for storage of instructions and data during program execution and a read only memory (ROM) 30 in which fixed instructions are stored. A file storage subsystem 20 may provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 20 in the storage subsystem 16, or in other machines accessible by the processor(s) 12. -
Bus subsystem 14 provides a mechanism for allowing the various components and subsystems of system 10 to communicate with each other as intended. Although bus subsystem 14 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple busses. -
System 10 may be of varying types including a mobile device, a portable electronic device, an embedded device, a standalone voice-activated product, a vehicle computing system (e.g., a vehicle navigation or media management system), a desktop computer, a laptop computer, a tablet computer, a wearable device, a workstation, a server, a computing cluster, a blade server, a server farm, or any other data processing system or computing device. In addition, functionality implemented by system 10 may be distributed among multiple systems interconnected with one another over one or more networks, e.g., in a client-server, peer-to-peer, or other networking arrangement. Due to the ever-changing nature of computers and networks, the description of system 10 depicted in FIG. 1 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of system 10 are possible having more or fewer components than the computer system depicted in FIG. 1. - Implementations discussed hereinafter may include one or more methods implementing various combinations of the functionality disclosed herein. Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform a method such as one or more of the methods described herein. Still other implementations may include an apparatus including memory and one or more processors operable to execute instructions, stored in the memory, to perform a method such as one or more of the methods described herein.
- Various program code described hereinafter may be identified based upon the application within which it is implemented in a specific implementation. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience. Furthermore, given the endless number of manners in which computer programs may be organized into routines, procedures, methods, modules, objects, and the like, as well as the various manners in which program functionality may be allocated among various software layers that are resident within a typical computer (e.g., operating systems, libraries, API's, applications, applets, etc.), it should be appreciated that some implementations may not be limited to the specific organization and allocation of program functionality described herein.
- Furthermore, it will be appreciated that the various operations described herein that may be performed by any program code, or performed in any routines, workflows, or the like, may be combined, split, reordered, omitted, performed sequentially or in parallel and/or supplemented with other techniques, and therefore, some implementations are not limited to the particular sequences of operations described herein.
-
FIG. 2 illustrates an example distributed voice input processing environment 50, e.g., for use with a voice-enabled device 52 (or more generally, a "client device") in communication with an online service such as online semantic processor 54. In the implementations discussed hereinafter, for example, voice-enabled device 52 is described as a mobile device such as a cellular phone or tablet computer. Other implementations may utilize a wide variety of other voice-enabled devices, however, so the references hereinafter to mobile devices are merely for the purpose of simplifying the discussion hereinafter. Countless other types of voice-enabled devices may use the herein-described functionality, including, for example, laptop computers, watches, head-mounted devices, virtual or augmented reality devices, other wearable devices, audio/video systems, navigation systems, automotive and other vehicular systems, standalone voice-activated products, etc. Moreover, many of such voice-enabled devices may be considered to be resource-constrained in that the memory and/or processing capacities of such devices may be constrained based upon technological, economic or other reasons, particularly when compared with the capacities of online or cloud-based services that can devote virtually unlimited computing resources to individual tasks. - In some implementations, online
semantic processor 54 may be implemented as a cloud-based service employing a cloud infrastructure, e.g., using a server farm or cluster of high performance computers running software suitable for handling high volumes of requests from multiple users. Online semantic processor 54 may not be limited to voice-based requests, and may also be capable of handling other types of requests, e.g., text-based requests, image-based requests, etc. In some implementations, online semantic processor 54 may handle voice-based requests such as setting alarms or reminders, managing lists, initiating communications with other users via phone, text, email, etc., or performing other actions that may be initiated via voice input. In other implementations, online semantic processor 54 may handle other types of voice inputs, such as conversational statements from a user expressing the user's state (e.g., sentiment). - In the implementation of
FIG. 2, voice input received by voice-enabled device 52 is processed by a voice-enabled application (or "app"), which in FIG. 2 takes the form of a chatbot 56. In other implementations, voice input may be handled within an operating system or firmware of voice-enabled device 52. In the illustrated implementation, chatbot 56 includes a voice action module 58, online interface module 60 and render/synchronization module 62. Voice action module 58 receives voice input directed to chatbot 56 and coordinates the analysis of the voice input and performance of one or more actions for a user of the voice-enabled device 52. Online interface module 60 provides an interface with online semantic processor 54, including forwarding voice input to online semantic processor 54 and receiving responses thereto. Render/synchronization module 62 manages the rendering of a response to a user, e.g., via a visual display, spoken audio, or other feedback interface suitable for a particular voice-enabled device. In addition, in some implementations, render/synchronization module 62 also handles synchronization with online semantic processor 54, e.g., whenever a response or action affects data maintained for the user in the online search service (e.g., where voice input requests creation of an appointment that is maintained in a cloud-based calendar). -
Chatbot 56 may rely on various middleware, framework, operating system and/or firmware modules to handle voice input, including, for example, a streaming voice to text module 64 and a semantic processor module 66 including a parser module 68, dialog manager module 70 and action builder module 72. - Streaming voice to text
module 64 receives an audio recording of voice input, e.g., in the form of digital audio data, and converts the digital audio data into one or more textual words or phrases (also referred to herein as "tokens"). In the illustrated implementation, streaming voice to text module 64 is also a streaming module, such that voice input is converted to text on a token-by-token basis and in real time or near-real time. In effect, tokens may be output from streaming voice to text module 64 concurrently with a user's speech, and thus prior to a user enunciating a complete spoken request. Streaming voice to text module 64 may rely on one or more locally-stored offline acoustic and/or language models 74, which together model a relationship between an audio signal and phonetic units in a language, along with word sequences in the language. In some implementations, a single model 74 may be used, while in other implementations, multiple models may be supported, e.g., to support multiple languages, multiple speakers, etc. - Whereas streaming voice to text
module 64 converts speech to text, semantic processor module 66 attempts to discern the semantics or meaning of the text output by streaming voice to text module 64 (or provided initially by the user as typed text) for the purpose of formulating an appropriate response. Parser module 68, for example, relies on one or more offline grammar models 76 to map text to particular actions and to identify attributes that constrain the performance of such actions, e.g., input variables to such actions. In some implementations, a single model 76 may be used, while in other implementations, multiple models may be supported, e.g., to support different actions or action domains (i.e., collections of related actions such as communication-related actions, search-related actions, audio/visual-related actions, calendar-related actions, device control-related actions, etc.). - As an example, an
offline grammar model 76 may support an action such as "set a reminder" having a reminder type parameter that specifies what type of reminder to set, an item parameter that specifies one or more items associated with the reminder, and a time parameter that specifies a time to activate the reminder and remind the user. Parser module 68 may receive a sequence of tokens such as "remind me to," "pick up," "bread," and "after work" and map the sequence of tokens to the action of setting a reminder with the reminder type parameter set to "shopping reminder," the item parameter set to "bread" and the time parameter of "5:00 pm", such that at 5:00 pm that day the user receives a reminder to "buy bread." -
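The "set a reminder" example above can be sketched as a toy parser. The regular expression, trigger phrasing, and hard-coded resolution of "after work" to 5:00 pm are illustrative assumptions, not the behavior of the actual grammar model 76:

```python
import re

def parse_reminder(tokens):
    # Join the streamed tokens and match a single illustrative pattern;
    # a real grammar model would cover many phrasings and domains.
    text = " ".join(tokens)
    match = re.match(r"remind me to pick up (.+) after work$", text)
    if match is None:
        return None
    return {
        "action": "set_reminder",
        "reminder_type": "shopping reminder",
        "item": match.group(1),
        "time": "5:00 pm",  # "after work" resolved to an assumed default
    }
```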
Parser module 68 may also work in conjunction with a dialog manager module 70 that manages dialog with a user. Dialog in this context refers to a set of voice inputs and responses similar to a conversation between two individuals. Module 70 therefore maintains a "state" of dialog to enable information obtained from a user in a prior voice input to be used when forming future outputs. Thus, for example, if a user were to say "I'm stressed," a response could be generated to say "maybe it's time for a break." In some implementations, dialog manager module 70 may be implemented in whole or in part as part of chatbot 56. -
Action builder module 72 receives parsed text from parser module 68, representing a voice input interpretation, and generates one or more responsive actions or "tasks" along with any associated parameters for processing by module 62 of chatbot 56. Action builder module 72 may rely on one or more offline action models 78 that incorporate various rules for creating actions from parsed text. It will be appreciated that some parameters may be directly received as voice input, while some parameters may be determined in other manners, e.g., based upon a user's location, demographic information, or based upon other information particular to a user. For example, if a user were to say "remind me to pick up bread at the grocery store," a location parameter may not be determinable without additional information such as the user's current location, the user's known route between work and home, the user's regular grocery store, etc. - It will be appreciated that in some implementations,
the models and modules described above may be stored locally on device 52 and are thus accessible offline, when device 52 is not in communication with online semantic processor 54. Moreover, while module 56 is described herein as being a chatbot, that is not meant to be limiting. In various implementations, any type of app operating on voice-enabled device 52 may perform the techniques described herein to tailor output to a user's state. - In various implementations, online
semantic processor 54 may include complementary functionality for handling voice input, e.g., using a voice-based query processor 80 that relies on various online acoustic/language, grammar and/or action models 82. It will be appreciated that in some implementations, particularly when voice-enabled device 52 is a resource-constrained device, voice-based query processor 80 and models 82 used thereby may implement more complex and computational resource-intensive voice processing functionality than is local to voice-enabled device 52. - In some implementations, multiple voice-based query processors 80 may be employed, each acting as an online counterpart for one or
more chatbots 56. For example, in some implementations, each client device in a user's ecosystem of client devices may be configured to operate an instance of a chatbot 56 that is associated with the user (e.g., configured with the user's preferences, associated with the same interaction history, etc.). A single, user-centric online instance of voice-based query processor 80 may be accessible to each of these multiple instances of chatbot 56, depending on which client device the user is operating at the time. - In some implementations, both online and offline functionality may be supported, e.g., such that online functionality is used whenever a client device is in communication with an online service, while offline functionality is used when no connectivity exists. In other implementations, different actions or action domains may be allocated to online and offline functionality, while in still other implementations, online functionality may be used only when offline functionality fails to adequately handle a particular voice input. In other implementations, however, no complementary online functionality may be used.
-
FIG. 3, for example, illustrates a voice processing routine 100 that may be executed by voice-enabled device 52 to handle a voice input. Routine 100 begins in block 102 by receiving voice input, e.g., in the form of a digital audio signal. In this implementation, an initial attempt is made to forward the voice input to the online search service (block 104). If unsuccessful, e.g., due to a lack of connectivity or a lack of a response from the online search service, block 106 passes control to block 108 to convert the voice input to text tokens (block 108, e.g., using streaming voice to text module 64 of FIG. 2), parse the text tokens (block 110, e.g., using module 68 of FIG. 2), and build an action from the parsed text (block 112, e.g., using action builder module 72 of FIG. 2). The resulting action is then used to perform client-side rendering and synchronization (block 114, e.g., using render/synchronization module 62 of FIG. 2), and processing of the voice input is complete. - Returning to block 106, if the attempt to forward the voice input to the online search service is successful, then block 106 bypasses blocks 108-112 and passes control directly to block 114 to perform client-side rendering and synchronization. Processing of the voice input is then complete. It will be appreciated that in other implementations, as noted above, offline processing may be attempted prior to online processing, e.g., to avoid unnecessary data communications when a voice input can be handled locally.
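The online-first control flow of routine 100 can be sketched as follows. Both callables are hypothetical placeholders: forward_online is assumed to raise ConnectionError when the online service is unreachable, and offline_pipeline stands in for the local convert/parse/build steps of blocks 108-112:

```python
def process_voice_input(audio, forward_online, offline_pipeline):
    """Attempt online processing first (blocks 104/106); fall back to
    the on-device pipeline (blocks 108-112) on connectivity failure.
    The resulting action would then be rendered/synchronized (block 114)."""
    try:
        return forward_online(audio)
    except ConnectionError:
        return offline_pipeline(audio)
```

The variant noted at the end of the paragraph, trying offline processing first, would simply swap the two branches.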
-
FIGS. 4 and 5 schematically demonstrate an example scenario in which chatbot 56 may track a user's state (e.g., sentiment) across multiple sessions and output a statement formed based on the user's last known state. In FIG. 4, a voice-enabled device 152 takes the form of a smart phone or tablet computer with a touch screen display 154 that is used to render a transcript 156 of a dialog between a user ("YOU" in FIGS. 4 and 5) and a chatbot (56 in FIG. 2). FIG. 4 depicts a first session between the user and the chatbot that occurs late in the evening of August 1st. The user has provided textual input (originally spoken or typed) indicating that the user feels lousy. The chatbot has detected the negative user state and has provided a suitable response, such as "I'm sorry to hear that." The chatbot has also stored an indication of the user's expressed state in memory, e.g., of voice-enabled device 152. For example, in some implementations, the chatbot may store the user's statement verbatim in memory. In other implementations, the chatbot may determine a generic user state (e.g., a numeric sentiment measure or enumerated sentiment level) determined from the user's statement, such as "sick," "sad," "depressed," etc., and may store an indication of that generic user state. -
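The storage alternatives just described (verbatim statement, generalized label, numeric measure) might look like the following minimal sketch; the JSON-lines format, field names, and function signature are assumptions for illustration:

```python
import json
import time

def store_state_indication(path, label, measure, verbatim=None):
    """Append one record per expressed user state. Keeping the verbatim
    statement, a generalized label, and a numeric measure side by side
    is an illustrative choice; an implementation might keep only one."""
    record = {
        "timestamp": time.time(),
        "label": label,        # e.g. "sick", "sad", "depressed"
        "measure": measure,    # e.g. a value in [-1.0, 1.0]
        "verbatim": verbatim,  # optional raw statement
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

At the start of a later session, the chatbot could read the most recent record back to decide whether a follow-up such as "Are you feeling better?" is warranted.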
FIG. 5 depicts a second session between the user and the chatbot that occurs the next morning on August 2nd. The user initiates the second session by asking, "What's the weather today?" The chatbot first responds to the user's query by replying, "80 degrees and sunny." Then, without any prompting from the user, and based on the user's negative state expressed the previous evening, the chatbot asks, "Are you feeling better?" While the second session depicted in FIG. 5 occurs one day after the first session depicted in FIG. 4, this is not meant to be limiting. As noted above, in various implementations, separate sessions between the user and the chatbot may be distinguished from each other based on other signals, such as intervening interactions between the user and voice-enabled device 152, a change of context of the user (which may be detected, for instance, based on one or more signals from one or more sensors associated with voice-enabled device 152, such as accelerometers, GPS, etc.), and so forth. -
FIG. 6 illustrates a routine 660 suitable for execution by a chatbot 56 to communicate with a user in a more natural (i.e., "human," "graceful") manner. Routine 660 may be executed by the same service that processes voice-based queries, or may be a different service altogether. - At
block 662, input is received from a user during a first session. As noted above, the input may take various forms and/or be received using various input modalities. In some implementations, the input may take the form of a digital audio signal or text typed by the user at a physical or virtual keyboard. In other implementations, the input may take other forms, such as gestures (e.g., shaking a phone may indicate excitement or frustration), eye movements (e.g., a lot of eye movement may indicate stress or excitement), and so forth. Assuming the input is textual input (originally spoken or typed), at block 664, the user input may be semantically processed online or offline to determine a state of the user. In some implementations, the textual input may be converted to text tokens (e.g., using streaming voice to text module 64 and/or model 82 of FIG. 2) and then semantically processed at block 664. - At
block 666, chatbot 56 may store, e.g., in local memory and/or at one or more remote computing devices (e.g., hosted in the cloud), an indication of the user's state. In some implementations, the indication may include the user's statement verbatim. In other implementations, the indication may include a generalized label of the user's state (e.g., "happy," "sad," "sick," "excited," "stressed," etc.). In yet other implementations, the indication may be stored as a numeric state (or "sentiment") measure. In some implementations, the user input may be stored as part of a sequence of user inputs that express states of the user over time (e.g., across a plurality of distinct sessions). At block 668, chatbot 56 may determine based on one or more signals that a subsequent session has commenced and/or is underway. Examples of signals that may be considered by chatbot 56 when distinguishing between multiple distinct sessions were described previously. - At
block 670, chatbot 56 may output one or more statements that are formed, e.g., by chatbot 56 or by a remote processor, from a plurality of candidate words, phrases, and/or statements. In some implementations, the plurality of candidate words, phrases, and/or statements may be obtained/extracted from prior message exchange threads between multiple individuals. For example, a corpus of prior message exchange threads may be authorized for use in training an artificial intelligence scheme such as a machine learning classifier or neural network. User words, phrases, and/or statements in the message exchange threads expressing states of the users (e.g., user sentiments) may be used as labeled inputs. Responses to those statements from other users in the message exchange threads may be provided as labeled outputs. For example, responses of empathy, congratulations, encouragement, etc., may be identified as responses to user statements expressing sentiment. A machine learning classifier, neural network, or other artificial intelligence model may be trained using these labeled pairs to identify future words, phrases, and/or statements to be formed and provided by chatbot 56 in response to user statements of sentiment. - In other implementations, less complex techniques may be employed to identify suitable statements for
chatbot 56 to output in subsequent sessions. For example, in some implementations, a plurality of candidate statements may be provided for each of an enumerated set of user states (e.g., sentiments). Whenever a new session commences between chatbot 56 and a user, the user's last known state (or a combination of a plurality of previous states) may be used to identify the user's potential current sentiment. Then, a statement may be selected from a plurality of candidate statements associated with that sentiment. -
FIG. 7 schematically depicts a non-limiting example of how various levels of user state, and more particularly, user sentiment, may be mapped to a plurality of candidate statements to be output by a chatbot. On the left is a range of sentiments from strong negative to strong positive, with intermediate values in between, that may be determined by a chatbot based on user input during a first session. On the right are candidate statements that may be selected and output by a chatbot during a subsequent session with the user based on the previously-determined sentiment. In other implementations, in addition to or instead of selecting a candidate statement to output, a chatbot may form (e.g., assemble) a statement from a plurality of candidate words, phrases, and/or complete statements. - In the implementation depicted in
FIG. 7, each level of sentiment may map to a plurality of candidate statements. For example, both strong negative and negative map to the same group of four candidate statements ("I'm sorry for your loss," "Is there anything I can do to help?," "I hope you're feeling better," "How are you feeling?"). Neutral maps to three relatively generic candidate statements that take the form of common greetings ("Good morning," "Good afternoon," "How are you?"). Positive and strong positive both map to another group of four candidate statements ("Still glowing?," "Still in a good mood?," "Wonderful to see you so happy <insert previous time>," "Congratulations"). Of course, the sentiment levels and candidate statements depicted in FIG. 7 are for illustrative purposes only, and are not meant to be limiting. Any number of candidate statements may be provided for any number of sentiment levels. - In various implementations, if a chatbot (e.g., 56) determines during a first session with a user that the user has a particular sentiment, the chatbot may output a statement during a subsequent session, such as the next session, that corresponds to the prior sentiment. For example, if a user expresses a negative sentiment during a first session, the chatbot may select and output one of the four statements in the top group of statements. As noted above, in other implementations, the chatbot may form (e.g., assemble) statements to output from a plurality of candidate words, phrases, and/or statements. Additionally or alternatively, in some implementations, the chatbot may select and output one or more images, symbols, and/or ideograms (such as one or more so-called "emojis") to convey empathy or otherwise respond to a user's expressed sentiment.
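The mapping of sentiment levels to candidate statement groups described above can be sketched as a lookup table. The dictionary keys, the bucketing of the five sentiment levels into three groups, and random selection within a group are illustrative choices, not requirements of the disclosure:

```python
import random

# Candidate statements transcribed from the groups described above.
CANDIDATES = {
    "negative": ["I'm sorry for your loss.",
                 "Is there anything I can do to help?",
                 "I hope you're feeling better.",
                 "How are you feeling?"],
    "neutral": ["Good morning.", "Good afternoon.", "How are you?"],
    "positive": ["Still glowing?", "Still in a good mood?",
                 "Congratulations!"],
}

def opening_statement(last_sentiment):
    # Strong negative/negative and positive/strong positive each share
    # a group; everything else falls back to generic greetings.
    if last_sentiment in ("strong negative", "negative"):
        group = CANDIDATES["negative"]
    elif last_sentiment in ("positive", "strong positive"):
        group = CANDIDATES["positive"]
    else:
        group = CANDIDATES["neutral"]
    return random.choice(group)
```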
- Referring back to
FIG. 7, the chatbot may select a particular statement from a group of candidate statements mapped to one or more sentiment levels in various ways. In some implementations, if the chatbot lacks additional information about the context of the user, etc., the chatbot may merely select the statement that is the most broadly applicable (i.e., generic). For example, if the chatbot only knows that the user's last sentiment was negative, it may select a relatively generic empathetic response, such as "I hope you're feeling better" or "How are you feeling?" In some implementations, if two or more candidate statements are equally applicable to a user's last-known sentiment, the chatbot may randomly select from the two or more statements. - In other implementations, the chatbot may analyze prior responses by the user to the candidate statements, and may select the candidate statement to which the user has responded most positively in the past. For example, suppose in multiple instances in the past when a user's last-known sentiment was negative, the chatbot has output both "I hope you're feeling better" (a declarative statement) and "How are you feeling?" (a solicitation of the user's state). Suppose further that the user ignored the former but responded to the latter (e.g., "Yes, I am feeling better, thank you for asking"). The chatbot may take this into account when selecting which of these two phrases to output in future sessions. To this end, in some implementations, the chatbot may create and maintain scores, statistics, and/or other metrics in association with each candidate statement, so that those statements that elicit positive responses are used more often.
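The score-keeping described above might be sketched as follows; the +1/-1 update rule, the class name, and its interface are assumptions for illustration:

```python
from collections import defaultdict

class StatementScorer:
    """Track which candidate statements elicit user responses, so that
    statements with positive track records are preferred over those
    the user has ignored."""

    def __init__(self):
        self.scores = defaultdict(int)

    def record_outcome(self, statement, user_responded):
        # Assumed update rule: reward a response, penalize being ignored.
        self.scores[statement] += 1 if user_responded else -1

    def pick(self, candidates):
        # Prefer the candidate with the best track record so far.
        return max(candidates, key=lambda s: self.scores[s])
```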
- In some implementations, candidate statements (or words, or phrases) may be drawn from prior message exchange threads between multiple (human) participants that have been authorized for use as training examples, e.g., for machine learning classifiers. For example, a so-called "sentiment classifier" trained to identify user statements expressing sentiment may be employed to identify words, phrases, and/or statements by message exchange thread participants expressing various sentiments. Responses and/or replies from other participants may then be identified.
- In some implementations, pairs of statements, one expressing sentiment and another empathetically responding thereto, may be used as positive training examples for a so-called "sentiment response classifier." Additionally or alternatively, in some implementations, triplets of statements—e.g., a first statement expressing sentiment, a second statement empathetically responding to the first statement, and a third statement positively acknowledging the second statement—may be used as a positive training example. Likewise, in some implementations, triplets of statements—e.g., a first statement expressing sentiment, a second statement empathetically responding to the first statement, and a third statement rejecting or otherwise correcting the second statement—may be used as a negative training example. Once sufficient prior message exchange threads have been analyzed in this fashion, the sentiment response classifier may be used by a chatbot to select candidate words, phrases, and/or statements for use as output in response to previously-determined user sentiments.
- In yet other implementations, statements may be formed based on one or more signals available to the chatbot other than a general indication of prior user sentiment. For example, suppose user input expressing negative sentiment also provides other details, such as “I'm sad because my friend moved away.” The additional information about the friend moving away may be semantically processed, e.g., by the chatbot, and may be used to select “I'm sorry for your loss” from the top group of statements in
FIG. 7 . - As another example, suppose that during a first session, a user expresses positive sentiment to the chatbot but does not elaborate on why they are happy. However, suppose the chatbot has access to personal data associated with the user, such as a calendar entry describing a ceremony at which the user is going to receive an award, a social networking status update indicating the user has become engaged, or an email notifying the user that the user has won a free vacation. Based on any of these data points (or other similar types of data points that might warrant congratulations), the chatbot may select “Congratulations” to be output to the user at a subsequent session, rather than a more generic statement such as “Still in a good mood?”
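The selection between a specific statement such as “Congratulations” and a more generic one can be sketched as a simple rule over available data points. This is an illustrative sketch; the event types and the rule itself are assumptions, and an actual system would rely on semantic processing of the underlying calendar, social-network, or email data.

```python
# Choose a statement for a user's positive sentiment based on personal
# data points (calendar entries, status updates, emails) the chatbot
# is authorized to access.

CONGRATULATORY_EVENTS = {"award_ceremony", "engagement", "prize_won"}

def select_positive_response(data_points,
                             default="Still in a good mood?"):
    """data_points: iterable of (event_type, description) tuples.
    Returns "Congratulations" when any data point warrants it,
    otherwise a more generic statement."""
    for event_type, _description in data_points:
        if event_type in CONGRATULATORY_EVENTS:
            return "Congratulations"
    return default

signals = [("meeting", "Weekly sync"),
           ("engagement", "User changed status to engaged")]
print(select_positive_response(signals))  # -> Congratulations
print(select_positive_response([]))       # -> Still in a good mood?
```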
- The examples described above in association with
FIG. 7 include maintaining or tracking user sentiment inter-session (i.e., across multiple sessions). However, this is not meant to be limiting. In some implementations, similar techniques may be employed intra-session (i.e., within a single session). For example, if a user provides input that expresses a negative/positive sentiment, the chatbot may immediately respond with a statement selected from the candidate statements depicted in FIG. 7 (or in other implementations may assemble/form such a statement using candidate words, phrases, and/or statements). The chatbot may also score or otherwise rank the candidate statements based on the user's immediate response or lack thereof, e.g., for future reference. In some implementations, a user may, during a given session, express a prior state (e.g., “I felt sick last night”). In response to such a statement about a prior state, the chatbot may, during the same given session, form a response from a plurality of candidate words, phrases, and/or statements (e.g., “are you feeling better this morning?”).
- In some implementations, techniques described herein may be employed without knowledge of a user's prior state (e.g., sentiment). For example, if a chatbot were to output a generic greeting (e.g., “Hello,” “How are you,” etc.) to a user each time a new session between the user and the chatbot commenced, the user may become annoyed, particularly if there are multiple sessions within a relatively short period of time (e.g., a few minutes, a couple of hours, etc.). For example, if a chatbot has already output the greeting “Good morning” to a user, it would not make sense for the chatbot to output the same greeting later that morning, even if the user engages in multiple distinct sessions with the chatbot.
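One simple way to avoid repeating a greeting across closely spaced sessions can be sketched as follows. This is a minimal sketch, not the claimed implementation; the four-hour threshold, the class name, and the in-memory storage are assumptions (a per-day limit, also described in this disclosure, would compare calendar dates instead of elapsed seconds).

```python
import time

class GreetingTracker:
    """Remembers when a greeting was last output and stays silent if a
    new session begins before a threshold amount of time has elapsed."""

    def __init__(self, min_interval_s=4 * 3600):  # assumed threshold
        self.min_interval_s = min_interval_s
        self.last_greeting_time = None

    def greeting_for_new_session(self, now=None):
        """Return a greeting for a newly commenced session, or None if
        the user was already greeted recently."""
        now = time.time() if now is None else now
        if (self.last_greeting_time is not None
                and now - self.last_greeting_time < self.min_interval_s):
            return None  # already greeted recently; refrain
        self.last_greeting_time = now
        return "Good morning"

tracker = GreetingTracker()
print(tracker.greeting_for_new_session(now=0))    # -> Good morning
print(tracker.greeting_for_new_session(now=600))  # -> None (10 min later)
```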
- Accordingly, techniques similar to those described above in association with
block 668 of FIG. 6 may be employed to determine whether it would be socially reasonable for a chatbot to issue a greeting to a user on commencement of a session. In the above example, for instance, the chatbot may maintain, e.g., in memory of voice-enabled device 52, an indication that the greeting “Good morning” has already been output today. Should a morning session between the user and the chatbot cease, and should the user then initiate a new session later in the morning, the chatbot may determine that it already greeted the user, and may refrain from issuing the same or a similar greeting again. In some implementations, a chatbot may be configured to output only a single greeting per day. In other implementations, a chatbot may be configured to output a greeting only if a threshold amount of time has elapsed since it last output a greeting.
- While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed.
Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.
Claims (21)
Priority Applications (11)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/277,954 US9947319B1 (en) | 2016-09-27 | 2016-09-27 | Forming chatbot output based on user state |
GB1713939.5A GB2555922A (en) | 2016-09-27 | 2017-08-31 | Selecting chatbot output based on user state |
EP17771933.3A EP3510497A1 (en) | 2016-09-27 | 2017-09-06 | Forming chatbot output based on user state |
PCT/US2017/050297 WO2018063758A1 (en) | 2016-09-27 | 2017-09-06 | Forming chatbot output based on user state |
DE202017105815.8U DE202017105815U1 (en) | 2016-09-27 | 2017-09-25 | Forming a chatbot output based on a user state |
DE102017122200.6A DE102017122200A1 (en) | 2016-09-27 | 2017-09-25 | Forming a chatbot output based on a user state |
CN201710880196.1A CN107870977B (en) | 2016-09-27 | 2017-09-26 | Method, system, and medium for forming chat robot output based on user status |
CN202110896567.1A CN113779378B (en) | 2016-09-27 | 2017-09-26 | Method, system, and medium for forming chat robot output based on user status |
US15/915,599 US10157615B2 (en) | 2016-09-27 | 2018-03-08 | Forming chatbot output based on user state |
US16/181,874 US10515635B2 (en) | 2016-09-27 | 2018-11-06 | Forming chatbot output based on user state |
US16/712,481 US11322143B2 (en) | 2016-09-27 | 2019-12-12 | Forming chatbot output based on user state |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/277,954 US9947319B1 (en) | 2016-09-27 | 2016-09-27 | Forming chatbot output based on user state |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/915,599 Continuation US10157615B2 (en) | 2016-09-27 | 2018-03-08 | Forming chatbot output based on user state |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180090137A1 true US20180090137A1 (en) | 2018-03-29 |
US9947319B1 US9947319B1 (en) | 2018-04-17 |
Family
ID=59930774
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/277,954 Active US9947319B1 (en) | 2016-09-27 | 2016-09-27 | Forming chatbot output based on user state |
US15/915,599 Active US10157615B2 (en) | 2016-09-27 | 2018-03-08 | Forming chatbot output based on user state |
US16/181,874 Active US10515635B2 (en) | 2016-09-27 | 2018-11-06 | Forming chatbot output based on user state |
US16/712,481 Active 2037-05-06 US11322143B2 (en) | 2016-09-27 | 2019-12-12 | Forming chatbot output based on user state |
Family Applications After (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/915,599 Active US10157615B2 (en) | 2016-09-27 | 2018-03-08 | Forming chatbot output based on user state |
US16/181,874 Active US10515635B2 (en) | 2016-09-27 | 2018-11-06 | Forming chatbot output based on user state |
US16/712,481 Active 2037-05-06 US11322143B2 (en) | 2016-09-27 | 2019-12-12 | Forming chatbot output based on user state |
Country Status (6)
Country | Link |
---|---|
US (4) | US9947319B1 (en) |
EP (1) | EP3510497A1 (en) |
CN (2) | CN113779378B (en) |
DE (2) | DE202017105815U1 (en) |
GB (1) | GB2555922A (en) |
WO (1) | WO2018063758A1 (en) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180364798A1 (en) * | 2017-06-16 | 2018-12-20 | Lenovo (Singapore) Pte. Ltd. | Interactive sessions |
KR101999657B1 (en) * | 2017-09-22 | 2019-07-16 | 주식회사 원더풀플랫폼 | User care system using chatbot |
US11369297B2 (en) * | 2018-01-04 | 2022-06-28 | Microsoft Technology Licensing, Llc | Providing emotional care in a session |
CN108521366A (en) * | 2018-03-27 | 2018-09-11 | 联想(北京)有限公司 | Expression method for pushing and electronic equipment |
US10594635B2 (en) * | 2018-04-20 | 2020-03-17 | Oracle International Corporation | Managing customer relationship using multiple chat servers designed to interface with service applications |
CN108877792B (en) * | 2018-05-30 | 2023-10-24 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device and computer readable storage medium for processing voice conversations |
US11132681B2 (en) | 2018-07-06 | 2021-09-28 | At&T Intellectual Property I, L.P. | Services for entity trust conveyances |
DE102018212503A1 (en) * | 2018-07-26 | 2020-01-30 | Krones Aktiengesellschaft | Communication and control system for a bottling plant |
CN109243438B (en) * | 2018-08-24 | 2023-09-26 | 上海擎感智能科技有限公司 | Method, system and storage medium for regulating emotion of vehicle owner |
US10802872B2 (en) | 2018-09-12 | 2020-10-13 | At&T Intellectual Property I, L.P. | Task delegation and cooperation for automated assistants |
CN111226194B (en) * | 2018-09-27 | 2024-08-13 | 三星电子株式会社 | Method and system for providing interactive interface |
US11037048B2 (en) | 2018-10-22 | 2021-06-15 | Moveworks, Inc. | Virtual conversation method or system |
US11481186B2 (en) | 2018-10-25 | 2022-10-25 | At&T Intellectual Property I, L.P. | Automated assistant context and protocol |
US20200142719A1 (en) * | 2018-11-02 | 2020-05-07 | International Business Machines Corporation | Automatic generation of chatbot meta communication |
US10942941B2 (en) | 2018-11-12 | 2021-03-09 | International Business Machines Corporation | Natural language processor for cognitively generating contextual filler designed to keep users engaged |
US10594634B1 (en) | 2018-12-05 | 2020-03-17 | Soo Chuan Teng | Electronic mail generation device and method of use |
DE102018133149A1 (en) | 2018-12-20 | 2020-06-25 | Bayerische Motoren Werke Aktiengesellschaft | Multimodal multi-level interaction |
US11183186B2 (en) | 2019-01-16 | 2021-11-23 | International Business Machines Corporation | Operating a voice response system |
DE102019105590B3 (en) | 2019-03-05 | 2020-08-06 | Bayerische Motoren Werke Aktiengesellschaft | Cross-platform messaging in vehicles |
US20220147944A1 (en) * | 2019-03-18 | 2022-05-12 | Cognitive Industries Pty Ltd | A method of identifying and addressing client problems |
US11206229B2 (en) * | 2019-04-26 | 2021-12-21 | Oracle International Corporation | Directed acyclic graph based framework for training models |
US11106874B2 (en) * | 2019-05-08 | 2021-08-31 | Sap Se | Automated chatbot linguistic expression generation |
US11146501B2 (en) * | 2019-06-21 | 2021-10-12 | International Business Machines Corporation | Decision based resource allocation in response systems |
US11227592B1 (en) * | 2019-06-27 | 2022-01-18 | Amazon Technologies, Inc. | Contextual content for voice user interfaces |
KR20210028380A (en) | 2019-09-04 | 2021-03-12 | 삼성전자주식회사 | Electronic device for performing operation using speech recognition function and method for providing notification associated with operation thereof |
WO2021086970A1 (en) * | 2019-10-28 | 2021-05-06 | Liveperson, Inc. | Dynamic communications routing to disparate endpoints |
CN112750430A (en) * | 2019-10-29 | 2021-05-04 | 微软技术许可有限责任公司 | Providing responses in automatic chat |
US11227583B2 (en) | 2019-11-05 | 2022-01-18 | International Business Machines Corporation | Artificial intelligence voice response system having variable modes for interaction with user |
ES2847588A1 (en) * | 2020-02-03 | 2021-08-03 | Univ Vigo | System to improve the user experience without abstraction capabilities in the information query (Machine-translation by Google Translate, not legally binding) |
JP7248615B2 (en) * | 2020-03-19 | 2023-03-29 | ヤフー株式会社 | Output device, output method and output program |
US10798031B1 (en) | 2020-04-13 | 2020-10-06 | Moveworks, Inc. | Generic disambiguation |
JP2022059868A (en) * | 2020-10-02 | 2022-04-14 | シャープ株式会社 | Image processing system |
US11675820B2 (en) | 2020-10-27 | 2023-06-13 | International Business Machines Corporation | Building and modifying conversational user journeys |
US11881216B2 (en) | 2021-06-08 | 2024-01-23 | Bank Of America Corporation | System and method for conversation agent selection based on processing contextual data from speech |
US11889153B2 (en) | 2022-05-11 | 2024-01-30 | Bank Of America Corporation | System and method for integration of automatic response generating systems with non-API applications |
US11977779B2 (en) | 2022-05-11 | 2024-05-07 | Bank Of America Corporation | Smart queue for distributing user requests to automated response generating systems |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6757362B1 (en) | 2000-03-06 | 2004-06-29 | Avaya Technology Corp. | Personal virtual assistant |
US7058566B2 (en) * | 2001-01-24 | 2006-06-06 | Consulting & Clinical Psychology, Ltd. | System and method for computer analysis of computer generated communications to produce indications and warning of dangerous behavior |
US7346492B2 (en) * | 2001-01-24 | 2008-03-18 | Shaw Stroz Llc | System and method for computerized psychological content analysis of computer and media generated communications to produce communications management support, indications, and warnings of dangerous behavior, assessment of media images, and personnel selection support |
CN100518070C (en) * | 2004-08-13 | 2009-07-22 | 上海赢思软件技术有限公司 | Chat robot system |
US20080096533A1 (en) | 2006-10-24 | 2008-04-24 | Kallideas Spa | Virtual Assistant With Real-Time Emotions |
US8000969B2 (en) | 2006-12-19 | 2011-08-16 | Nuance Communications, Inc. | Inferring switching conditions for switching between modalities in a speech application environment extended for interactive text exchanges |
US20140279050A1 (en) | 2008-05-21 | 2014-09-18 | The Delfin Project, Inc. | Dynamic chatbot |
CN101604204B (en) * | 2009-07-09 | 2011-01-05 | 北京科技大学 | Distributed cognitive technology for intelligent emotional robot |
US20150206000A1 (en) * | 2010-06-07 | 2015-07-23 | Affectiva, Inc. | Background analysis of mental state expressions |
CN103390047A (en) * | 2013-07-18 | 2013-11-13 | 天格科技(杭州)有限公司 | Chatting robot knowledge base and construction method thereof |
US9342501B2 (en) * | 2013-10-30 | 2016-05-17 | Lenovo (Singapore) Pte. Ltd. | Preserving emotion of user input |
JP6257368B2 (en) * | 2014-02-18 | 2018-01-10 | シャープ株式会社 | Information processing device |
US10120955B2 (en) * | 2014-07-18 | 2018-11-06 | Nuance Communications, Inc. | State tracking over machine-learned relational trees in a dialog system |
CN104615646A (en) * | 2014-12-25 | 2015-05-13 | 上海科阅信息技术有限公司 | Intelligent chatting robot system |
US20170046496A1 (en) * | 2015-08-10 | 2017-02-16 | Social Health Innovations, Inc. | Methods for tracking and responding to mental health changes in a user |
US9723149B2 (en) * | 2015-08-21 | 2017-08-01 | Samsung Electronics Co., Ltd. | Assistant redirection for customer service agent processing |
CN105068661B (en) * | 2015-09-07 | 2018-09-07 | 百度在线网络技术(北京)有限公司 | Man-machine interaction method based on artificial intelligence and system |
CN105183848A (en) * | 2015-09-07 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Human-computer chatting method and device based on artificial intelligence |
CN105206284B (en) * | 2015-09-11 | 2019-06-18 | 清华大学 | Dredge the cyberchat method and system of adolescent psychology pressure |
CN105138710B (en) * | 2015-10-12 | 2019-02-19 | 金耀星 | A kind of chat agency plant and method |
CN105426436B (en) * | 2015-11-05 | 2019-10-15 | 百度在线网络技术(北京)有限公司 | Information providing method and device based on artificial intelligence robot |
CN105868188A (en) * | 2016-03-25 | 2016-08-17 | 海信集团有限公司 | Man-machine interaction method and system |
CN105893344A (en) * | 2016-03-28 | 2016-08-24 | 北京京东尚科信息技术有限公司 | User semantic sentiment analysis-based response method and device |
CN105930374B (en) * | 2016-04-12 | 2019-07-19 | 华南师范大学 | Based on emotional robot dialogue method, system and the robot fed back recently |
CN105938484B (en) * | 2016-04-12 | 2020-06-23 | 华南师范大学 | Robot interaction method and system based on user feedback knowledge base |
US10419375B1 (en) * | 2016-06-14 | 2019-09-17 | Symantec Corporation | Systems and methods for analyzing emotional responses to online interactions |
2016
- 2016-09-27 US US15/277,954 patent/US9947319B1/en active Active
2017
- 2017-08-31 GB GB1713939.5A patent/GB2555922A/en not_active Withdrawn
- 2017-09-06 EP EP17771933.3A patent/EP3510497A1/en not_active Withdrawn
- 2017-09-06 WO PCT/US2017/050297 patent/WO2018063758A1/en unknown
- 2017-09-25 DE DE202017105815.8U patent/DE202017105815U1/en active Active
- 2017-09-25 DE DE102017122200.6A patent/DE102017122200A1/en active Pending
- 2017-09-26 CN CN202110896567.1A patent/CN113779378B/en active Active
- 2017-09-26 CN CN201710880196.1A patent/CN107870977B/en active Active
2018
- 2018-03-08 US US15/915,599 patent/US10157615B2/en active Active
- 2018-11-06 US US16/181,874 patent/US10515635B2/en active Active
2019
- 2019-12-12 US US16/712,481 patent/US11322143B2/en active Active
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180096699A1 (en) * | 2016-09-30 | 2018-04-05 | Honda Motor Co., Ltd. | Information-providing device |
US10250532B2 (en) * | 2017-04-28 | 2019-04-02 | Microsoft Technology Licensing, Llc | Systems and methods for a personality consistent chat bot |
US10855627B2 (en) | 2017-06-09 | 2020-12-01 | Google Llc | Modification of audio-based computer program output |
US10657173B2 (en) | 2017-06-09 | 2020-05-19 | Google Llc | Validate modification of audio-based computer program output |
US20180357309A1 (en) * | 2017-06-09 | 2018-12-13 | Google Inc. | Balance modifications of audio-based computer program output |
US11582169B2 (en) | 2017-06-09 | 2023-02-14 | Google Llc | Modification of audio-based computer program output |
US20180358010A1 (en) * | 2017-06-09 | 2018-12-13 | Google Inc. | Balance modifications of audio-based computer program output |
US10600409B2 (en) * | 2017-06-09 | 2020-03-24 | Google Llc | Balance modifications of audio-based computer program output including a chatbot selected based on semantic processing of audio |
US10614122B2 (en) | 2017-06-09 | 2020-04-07 | Google Llc | Balance modifications of audio-based computer program output using a placeholder field based on content |
US10652170B2 (en) | 2017-06-09 | 2020-05-12 | Google Llc | Modification of audio-based computer program output |
US20230317074A1 (en) * | 2017-06-27 | 2023-10-05 | Amazon Technologies, Inc. | Contextual voice user interface |
US11386171B1 (en) * | 2017-10-30 | 2022-07-12 | Wells Fargo Bank, N.A. | Data collection and filtering for virtual assistants |
US10771408B2 (en) * | 2017-11-27 | 2020-09-08 | Electronics And Telecommunications Research Institute | Chatbot system and service method thereof |
US20190166071A1 (en) * | 2017-11-27 | 2019-05-30 | Electronics And Telecommunications Research Institute | Chatbot system and service method thereof |
US11537744B2 (en) * | 2017-12-05 | 2022-12-27 | Microsoft Technology Licensing, Llc | Sharing user information with and between bots |
US20190180743A1 (en) * | 2017-12-13 | 2019-06-13 | Kabushiki Kaisha Toshiba | Dialog system |
US11087753B2 (en) * | 2017-12-13 | 2021-08-10 | KABUSHIKl KAISHA TOSHIBA | Dialog system |
CN108809817A (en) * | 2018-07-06 | 2018-11-13 | 上海博泰悦臻电子设备制造有限公司 | Vehicle, vehicle device equipment, Cloud Server and the communication means of vehicle-mounted instant chat |
CN110880319A (en) * | 2018-09-06 | 2020-03-13 | 丰田自动车株式会社 | Voice interaction device, control method for voice interaction device, and non-transitory recording medium storing program |
US11223583B2 (en) | 2018-09-20 | 2022-01-11 | The Toronto-Dominion Bank | Chat bot conversation manager |
US10749822B2 (en) | 2018-09-20 | 2020-08-18 | The Toronto-Dominion Bank | Chat bot conversation manager |
US11037576B2 (en) | 2018-11-15 | 2021-06-15 | International Business Machines Corporation | Distributed machine-learned emphatic communication for machine-to-human and machine-to-machine interactions |
US11188548B2 (en) | 2019-01-14 | 2021-11-30 | Microsoft Technology Licensing, Llc | Profile data store automation via bots |
US11361674B2 (en) * | 2019-01-24 | 2022-06-14 | Toyota Jidosha Kabushiki Kaisha | Encouraging speech system, encouraging speech method, and computer readable medium |
CN113454708A (en) * | 2019-02-28 | 2021-09-28 | 微软技术许可有限责任公司 | Linguistic style matching agent |
US20210065019A1 (en) * | 2019-08-28 | 2021-03-04 | International Business Machines Corporation | Using a dialog system for learning and inferring judgment reasoning knowledge |
US10841251B1 (en) * | 2020-02-11 | 2020-11-17 | Moveworks, Inc. | Multi-domain chatbot |
US11735206B2 (en) * | 2020-03-27 | 2023-08-22 | Harman International Industries, Incorporated | Emotionally responsive virtual personal assistant |
US11580977B2 (en) * | 2020-09-29 | 2023-02-14 | Salesforce, Inc. | Configurable conversation engine for executing customizable chatbots |
US11887599B2 (en) | 2020-09-29 | 2024-01-30 | Salesforce, Inc. | Configurable conversation engine for executing customizable chatbots |
US20220100990A1 (en) * | 2020-09-30 | 2022-03-31 | Ringcentral, Inc. | System and method of determining an emotional state of a user |
US11972636B2 (en) * | 2020-09-30 | 2024-04-30 | Ringcentral, Inc. | System and method of determining an emotional state of a user |
CN112307188A (en) * | 2020-12-30 | 2021-02-02 | 北京百度网讯科技有限公司 | Dialog generation method, system, electronic device and readable storage medium |
US20220366896A1 (en) * | 2021-05-11 | 2022-11-17 | AskWisy, Inc. | Intelligent training and education bot |
US20230169990A1 (en) * | 2021-12-01 | 2023-06-01 | Verizon Patent And Licensing Inc. | Emotionally-aware voice response generation method and apparatus |
CN114363280A (en) * | 2022-03-18 | 2022-04-15 | 深圳市欧乐智能实业有限公司 | Mobile phone chat auxiliary system based on multi-section voice summary type transmission |
US20230298580A1 (en) * | 2022-03-18 | 2023-09-21 | Google Llc | Emotionally Intelligent Responses to Information Seeking Questions |
WO2024025178A1 (en) * | 2022-07-25 | 2024-02-01 | Samsung Electronics Co., Ltd. | A system to provide natural utterance by a voice assistant and method thereof |
Also Published As
Publication number | Publication date |
---|---|
US11322143B2 (en) | 2022-05-03 |
US10157615B2 (en) | 2018-12-18 |
GB2555922A (en) | 2018-05-16 |
DE102017122200A1 (en) | 2018-03-29 |
DE202017105815U1 (en) | 2017-12-18 |
US9947319B1 (en) | 2018-04-17 |
CN113779378A (en) | 2021-12-10 |
US20200118567A1 (en) | 2020-04-16 |
GB201713939D0 (en) | 2017-10-18 |
US20190074010A1 (en) | 2019-03-07 |
US20180197542A1 (en) | 2018-07-12 |
WO2018063758A1 (en) | 2018-04-05 |
US10515635B2 (en) | 2019-12-24 |
CN107870977B (en) | 2021-08-20 |
CN113779378B (en) | 2022-12-13 |
CN107870977A (en) | 2018-04-03 |
EP3510497A1 (en) | 2019-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11322143B2 (en) | Forming chatbot output based on user state | |
US10490190B2 (en) | Task initiation using sensor dependent context long-tail voice commands | |
US11929072B2 (en) | Using textual input and user state information to generate reply content to present in response to the textual input | |
US20210029131A1 (en) | Conditional provision of access by interactive assistant modules | |
US12093270B2 (en) | Automatically augmenting message exchange threads based on tone of message | |
CN109983430B (en) | Determining graphical elements included in an electronic communication | |
US10282218B2 (en) | Nondeterministic task initiation by a personal assistant module |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: GOOGLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HORLING, BRYAN;GARRETT, MARYAM;QUAH, WAN FEN;AND OTHERS;SIGNING DATES FROM 20160914 TO 20160927;REEL/FRAME:039921/0156 |
|
AS | Assignment |
Owner name: GOOGLE LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044567/0001 Effective date: 20170929 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |