CN114270435A - System and method for intelligent dialogue based on knowledge tracking - Google Patents


Info

Publication number
CN114270435A
Authority
CN
China
Prior art keywords
user
conversation
aog
topic
dialog
Prior art date
Legal status
Pending
Application number
CN202080053996.6A
Other languages
Chinese (zh)
Inventor
S·汤拜尔
刘昌松
Current Assignee
DMAI Guangzhou Co Ltd
Original Assignee
De Mai Co ltd
Priority date
Filing date
Publication date
Application filed by De Mai Co ltd filed Critical De Mai Co ltd
Publication of CN114270435A

Classifications

    • G10L 15/22 — Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G06F 40/30 — Handling natural language data; semantic analysis
    • G10L 15/1822 — Speech classification or search using natural language modelling; parsing for meaning understanding

Abstract

The present teachings relate to methods, systems, media, and embodiments for adaptive dialog management. A language understanding result and an evaluation thereof are received. The language understanding result is derived based on an utterance from a user participating in a conversation that is directed to a topic and governed by a conversation policy. The evaluation is obtained with respect to an expected result expressed in the conversation policy. A plurality of probabilities is derived based on the language understanding result and the associated evaluation. A first set of parameters associated with the conversation policy is updated based on the plurality of probabilities, wherein the first set of parameters parameterizes the conversation policy with respect to the user and characterizes the effectiveness of the conversation with the user under the conversation policy.

Description

System and method for intelligent dialogue based on knowledge tracking
Cross Reference to Related Applications
This application claims priority from the following patent applications: U.S. provisional patent application 62/862268 (attorney docket No. 047437-) and U.S. provisional patent application 62/862275 (attorney docket No. 047437-).
Technical Field
The present teachings relate generally to computers. More particularly, the present teachings relate to human-machine dialog management.
Background
Computer-assisted dialog systems are becoming increasingly popular with advances in artificial intelligence and the explosive growth of internet-based communications enabled by ubiquitous internet connectivity. For example, more and more call centers deploy automated dialog robots to handle customer calls. Hotels have begun to install kiosks that can answer questions from guests or visitors. In recent years, automated human-machine communication in other fields has become increasingly popular as well.
Conventional computer-assisted dialog systems are typically preprogrammed with certain dialog content, such as questions and answers based on dialog patterns well known in the relevant arts. Unfortunately, a dialog pattern that suits some human users may not suit others. Furthermore, a human user may run into problems during a conversation, and continuing a fixed dialog pattern regardless of what the user says may cause irritation or loss of interest.
When planning a dialog, human designers often need to manually author the dialog content based on known knowledge, which is time consuming and tedious. Even more labor is required given the need to create different dialog patterns. While authoring the dialog content, any deviation from the designed dialog pattern may need to be anticipated and used to determine how to continue the dialog. Previous dialog systems have not effectively addressed such problems.
With recent developments in the AI field, dynamic information observed during a dialog can be adaptively incorporated into learning and used to guide the progress of a human-machine interaction session. How to develop a knowledge representation that can incorporate dynamic information in different dimensions, and sometimes in different modalities, is a challenging problem. Since this knowledge representation is the basis for a dynamic conversational process between human and machine, the representation must be configured to adequately support adaptive dialogs.
In order to converse with a human being, an automated dialog system may need to achieve different levels of understanding: what the human said, the semantics of the spoken words, sometimes the emotional state of the person, and the mutual causal relationships between the spoken content and the environment of the conversation. Conventional computer-assisted dialog systems do not adequately address such problems.
Accordingly, there is a need for methods and systems that address such limitations.
Disclosure of Invention
The teachings disclosed herein relate to methods, systems, and programming for human-machine dialog. More particularly, the present teachings relate to methods, systems, and programming for adaptive dialog management based on knowledge tracking.
In one example, a method for adaptive dialog management is implemented on a machine comprising at least one processor, a memory, and a communication platform connectable to a network. A language understanding result and an evaluation thereof are received. The language understanding result is derived based on utterances from a user participating in a conversation that is directed to a topic and governed by a conversation policy. The evaluation is obtained with respect to an expected result expressed in the conversation policy. A plurality of probabilities is derived based on the language understanding result and the associated evaluation. A first set of parameters associated with the conversation policy is updated based on the plurality of probabilities, wherein the first set of parameters parameterizes the conversation policy with respect to the user and characterizes the effectiveness of the conversation with the user under the conversation policy.
In various examples, a system for adaptive dialog management is disclosed that includes a knowledge tracking unit, a plurality of probability estimators, and an information state updater. The knowledge tracking unit is configured for receiving a language understanding result together with an evaluation thereof, wherein the language understanding result is derived based on an utterance from a user participating in a conversation directed to a topic, the conversation is governed by a conversation policy, and the evaluation is obtained for an expected result in the conversation policy. The plurality of probability estimators are configured for determining a plurality of probabilities based on the language understanding results and the associated evaluations. The information state updater is configured to update a first set of parameters associated with the conversation policy based on the plurality of probabilities, wherein the first set of parameters parameterizes the conversation policy with respect to the user and characterizes effectiveness of a conversation with the user under the conversation policy.
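For concreteness, the sketch below outlines how the three disclosed components might fit together in code. It is a minimal illustration only; the class and method names (KnowledgeTracker, ProbabilityEstimators, InformationStateUpdater) and the placeholder probability keys are assumptions, not the patent's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class InformationState:
    # First set of parameters: parameterizes the dialog policy with respect
    # to the user and characterizes the policy's effectiveness for that user.
    policy_params: dict = field(default_factory=dict)

class KnowledgeTracker:
    """Receives a language-understanding result together with its evaluation
    against the expected result expressed in the dialog policy."""
    def receive(self, nlu_result: dict, evaluation: dict) -> tuple:
        return nlu_result, evaluation

class ProbabilityEstimators:
    """Derives a plurality of probabilities from the NLU result and evaluation."""
    def estimate(self, nlu_result: dict, evaluation: dict) -> dict:
        # Placeholder values; the knowledge-tracking discussion later in the
        # description names guess/slip/transition-style probabilities.
        return {"P_G": 0.2, "P_S": 0.1, "P_T": 0.3}

class InformationStateUpdater:
    """Updates the policy parameters based on the derived probabilities."""
    def update(self, state: InformationState, probs: dict) -> None:
        state.policy_params.update(probs)
```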
Other concepts relate to software for implementing the present teachings. A software product according to this concept includes at least one machine-readable non-transitory medium and information carried by the medium. The information carried by the medium may be executable program code data, parameters associated with the executable program code, and/or information related to the user, the request, the content, or other additional information.
In one example, a machine-readable, non-transitory, and tangible medium has data recorded thereon for adaptive dialog management, wherein the medium, when read by a machine, causes the machine to perform a series of steps of the disclosed method for dialog management. A language understanding result is received along with an evaluation thereof. The language understanding result is derived based on utterances from a user participating in a conversation that is directed to a topic and governed by a conversation policy. The evaluation is obtained with respect to an expected result expressed in the conversation policy. A plurality of probabilities is derived based on the language understanding result and the associated evaluation. A first set of parameters associated with the conversation policy is updated based on the plurality of probabilities, wherein the first set of parameters parameterizes the conversation policy with respect to the user and characterizes the effectiveness of the conversation with the user under the conversation policy.
Additional advantages and novel features will be set forth in part in the description which follows and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The advantages of the present teachings may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
Drawings
The methods, systems, and/or programs described herein are further described in accordance with the exemplary embodiments. These exemplary embodiments are described in detail with reference to the accompanying drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and in which:
FIG. 1A depicts an exemplary configuration of a dialog system centered on information states that capture dynamic information observed during a dialog, according to embodiments of the present teachings;
FIG. 1B is a flow diagram of an exemplary process for using a dialog system that captures the information state of dynamic information observed during a dialog, according to an embodiment of the present teachings;
FIG. 2A depicts an exemplary construction of an information state according to embodiments of the present teachings;
FIG. 2B illustrates how the representations of the different estimated mindsets are connected in a dialog with a robot tutor teaching a user fraction addition, according to an embodiment of the present teachings;
FIG. 2C illustrates an exemplary relationship between the estimated representations of the mindset of an agent, the shared mindset, and the mindset of the user in an information state, according to embodiments of the present teachings;
FIG. 3A illustrates exemplary relationships between different types of AND-OR graphs (AOGs) used to represent the estimated mindsets of parties involved in a conversation, according to embodiments of the present teachings;
FIG. 3B depicts an exemplary association between a spatial AOG (S-AOG) and a temporal AOG (T-AOG) in an information state according to an embodiment of the present teachings;
FIG. 3C illustrates an exemplary S-AOG and its associated T-AOG, according to embodiments of the present teachings;
FIG. 3D illustrates an exemplary relationship between S-AOG, T-AOG and C-AOG according to an embodiment of the present teachings;
FIG. 4A illustrates an exemplary S-AOG that represents, in part, the mindset of an agent for teaching different mathematical concepts, in accordance with an embodiment of the present teachings;
FIG. 4B illustrates an exemplary T-AOG that represents a dialog policy associated in part with the mindset of an agent teaching the concept of fractions, according to an embodiment of the present teachings;
FIG. 4C illustrates exemplary dialog content for teaching concepts associated with fractions according to embodiments of the present teachings;
FIG. 5A illustrates an exemplary temporal parse graph (T-PG) within a T-AOG that represents a shared mindset between a user and a machine in accordance with embodiments of the present teachings;
FIG. 5B illustrates a portion of a conversation between a machine and a human along a dialog path representing a current representation of the shared mindset, in accordance with an embodiment of the present teachings;
FIG. 5C depicts an exemplary S-AOG having nodes parameterized with measurements related to mastery levels of different underlying concepts to represent a user's mindset, according to embodiments of the present teachings;
FIG. 5D illustrates exemplary personality feature types of a user that may be estimated based on observations from the conversation, according to embodiments of the present teachings;
FIG. 6A depicts a generic S-AOG for tutoring dialogs according to an embodiment of the present teachings;
FIG. 6B depicts a particular T-AOG of a dialog relating to a greeting, according to an embodiment of the present teachings;
FIG. 6C illustrates different types of parameterized alternatives for different types of AOGs in accordance with an embodiment of the present teachings;
FIG. 6D illustrates an S-AOG having different nodes parameterized with rewards that are updated based on dynamic observations from conversations in accordance with an embodiment of the present teachings;
FIG. 6E illustrates an exemplary T-AOG generated by merging different graphs via graph matching with parameterized content, according to embodiments of the present teachings;
FIG. 6F illustrates an exemplary T-AOG with parameterized content associated with nodes, in accordance with embodiments of the present teachings;
FIG. 6G illustrates a T-AOG having each node parameterized with one or more content sets according to an embodiment of the present teachings;
FIG. 6H illustrates exemplary data in different content sets associated with different nodes of a T-AOG, according to embodiments of the present teachings;
FIG. 6I illustrates an exemplary T-AOG in which different paths traverse different nodes, the different paths parameterized with rewards that are updated based on dynamic observations from dialogs, according to embodiments of the present teachings;
FIG. 7A depicts a high level system diagram of a knowledge tracking unit in accordance with an embodiment of the present teachings;
FIG. 7B illustrates how knowledge tracking enables adaptive dialog management according to an embodiment of the present teachings;
FIG. 7C is a flow chart of an exemplary process of a knowledge tracking unit according to an embodiment of the present teachings;
FIG. 8A illustrates an example of utility-driven tutoring (node) planning with respect to an S-AOG, according to an embodiment of the present teachings;
FIG. 8B illustrates an example of utility-driven path planning with respect to a T-AOG, according to an embodiment of the present teachings;
FIG. 8C illustrates dynamic states in utility-driven adaptive dialog management based on parameterized AOG derivation according to embodiments of the present teachings;
FIG. 9A depicts an exemplary mode of creating an AOG with authored content according to embodiments of the present teachings;
FIG. 9B depicts an exemplary high-level system diagram of a content authoring system for automatically creating AOGs via machine learning, in accordance with embodiments of the present teachings;
FIG. 9C illustrates different types of topic-based AOGs derived from machine learning, in accordance with an embodiment of the present teachings;
FIG. 9D is a flow diagram of an exemplary process for a content authoring system for creating AOGs via machine learning, in accordance with an embodiment of the present teachings;
FIG. 10A illustrates an exemplary visual programming interface configured for content authoring associated with AOG, according to embodiments of the present teachings;
FIG. 10B illustrates an exemplary visual programming interface configured for authoring content for a parameterized AOG, in accordance with embodiments of the present teachings;
FIG. 10C is a flowchart of an exemplary process for creating AOG and content authoring via visual programming, according to an embodiment of the present teachings;
FIG. 11A illustrates exemplary code obtained via automatic/semi-automatic content authoring for generating scenes associated with S-AOG, according to an embodiment of the present teachings;
FIG. 11B illustrates exemplary code obtained via automatic/semi-automatic content authoring for generating a T-AOG, according to embodiments of the present teachings;
FIG. 12A depicts an exemplary high-level system diagram of a system for authoring content based on multimodal input from a user in accordance with embodiments of the present teachings;
FIG. 12B illustrates different types of metadata that may be automatically generated based on multimodal input from a user in accordance with an embodiment of the present teachings;
FIG. 12C is a flow diagram of an exemplary process of a system configured for authoring content based on multimodal input from a user in accordance with an embodiment of the present teachings;
FIG. 13 is an illustrative diagram of an exemplary mobile device architecture that can be used to implement a dedicated system embodying the present teachings in accordance with various embodiments; and
FIG. 14 is an illustrative diagram of an exemplary computing device architecture that can be used to implement a special purpose system embodying the present teachings in accordance with various embodiments.
Detailed Description
In the following detailed description, by way of example, numerous specific details are set forth in order to provide a thorough understanding of the relevant teachings. It will be apparent, however, to one skilled in the art that the present teachings may be practiced without these specific details. In other instances, well-known methods, procedures, components, and/or circuitry have been described at a relatively high-level without detail so as to avoid unnecessarily obscuring aspects of the present teachings.
The present teachings are directed to addressing the deficiencies of conventional human-machine dialog systems and providing methods and systems that enable rich presentation of multimodal information from a conversation environment to allow machines to have an improved feel to the surroundings of the conversation in order to better adapt the conversation with more efficient conversations and enhanced interaction with the user. Based on such representations, the present teachings further disclose different modes of creating such representations and authoring content of the dialog in the processing representation. Further, to allow adaptation of the representations based on the dynamics occurring during the conversation, the present teachings also disclose mechanisms to track the dynamics of the conversation and update the representations accordingly, which the machine then uses to conduct the conversation in a utility-driven adaptive manner to achieve maximized results.
FIG. 1A depicts an exemplary configuration of a dialog system 100, the dialog system 100 being centered on an information state 110 that captures dynamic information observed during a dialog, according to embodiments of the present teachings. The dialog system 100 includes a multimodal information processor 120, an Automatic Speech Recognition (ASR) engine 130, a Natural Language Understanding (NLU) engine 140, a Dialog Manager (DM) 150, a Natural Language Generation (NLG) engine 160, and a text-to-speech (TTS) engine 170. The system 100 interacts with a user 180 to conduct a conversation.
During the conversation, multimodal information is collected from the environment, including from the user 180, capturing ambient information of the conversation environment, the user's voice, the user's expressions (of the face or body), etc. The multimodal information thus collected is analyzed by the multimodal information processor 120 to extract relevant features of the different modalities in order to estimate different characteristics of the user and the environment. For example, the speech signal may be analyzed to determine speech-related features such as speaking rate, pitch, or even accent. Visual signals associated with the user may also be analyzed to extract, for example, facial features or body gestures, in order to determine the user's expression. By combining the acoustic and visual features, the multimodal information processor 120 may also be able to infer an emotional state of the user. For example, a high pitch and fast speech plus an angry facial expression may indicate that the user is unhappy. In some embodiments, observed user activity may also be analyzed to better understand the user. For example, if a user points or walks to a particular object, it may reveal what the user referred to in his/her speech. Such multimodal information can provide useful context for understanding the intent of the user. The multimodal information processor 120 can continuously analyze multimodal information and store the analyzed information in the information state 110, which can then be used by various components in the system 100 to facilitate decisions related to dialog management.
In operation, speech information from the user 180 is sent to the ASR engine 130 to perform speech recognition. Speech recognition may include determining the language being spoken by the user 180 and the words spoken. The results from the ASR engine 130 are further processed by the NLU engine 140 in order to understand the semantics of what the user said. This understanding may depend not only on the words spoken, but also on other information (such as the expression and gestures of the user 180) and/or other contextual information (such as what was said before). Based on the understanding of the user's utterance, the dialog manager 150 determines how to respond to the user. The determined response may then be generated in text form via the NLG engine 160 and further transformed from text form into a speech signal via the TTS engine 170. The output of the TTS engine 170 may then be delivered to the user 180 as a response to the user's utterance. Via this back-and-forth interaction, the dialog system continues to conduct the conversation with the user 180.
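The turn-by-turn flow described above can be summarized with a short sketch; every engine consults the shared information state 110, which is the point of FIG. 1A. The function and engine signatures here are illustrative assumptions, not the patent's implementation.

```python
def dialog_turn(audio_in, info_state, asr, nlu, dm, nlg, tts):
    """One back-and-forth turn of the dialog loop described above."""
    words = asr(audio_in, info_state)     # speech recognition (ASR engine 130)
    meaning = nlu(words, info_state)      # semantic understanding (NLU engine 140)
    response = dm(meaning, info_state)    # decide how to respond (DM 150)
    text = nlg(response, info_state)      # response in text form (NLG engine 160)
    return tts(text, info_state)          # text to speech signal (TTS engine 170)
```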
As seen in fig. 1A, the components in the system 100 connect to an information state 110, as discussed herein, the information state 110 captures the dynamics surrounding a conversation and provides relevant and rich contextual information that can be used to facilitate speech recognition (ASR), language understanding (NLU), and various conversation-related determinations, including what is the appropriate response (DM), what language features to apply to a text response (NLG), and how to convert the text response into speech form (TTS) (e.g., what accent). As discussed herein, the information state 110 can represent dialog-related dynamics obtained based on multimodal information that are relevant to the user 180 or to the surroundings of the dialog.
Upon receiving multimodal information from a dialog scene (about a user or about the dialog environment), the multimodal information processor 120 analyzes the information and characterizes the dialog surroundings in different dimensions, for example, acoustic features (e.g., the user's pitch, speed, accent), visual features (e.g., the user's facial expressions, objects in the environment), physical features (e.g., the user waving or pointing at an object in the environment), the estimated emotion and/or mood of the user, and/or the preferences or intentions of the user. This information may then be stored in the information state 110.
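The characterization written into the information state after each observation might look like the record below; the field names and values are illustrative assumptions based on the dimensions listed above.

```python
# Hypothetical per-observation record stored in the information state 110.
observation = {
    "acoustic": {"pitch": "high", "rate": "fast", "accent": "regional"},
    "visual":   {"facial_expression": "angry", "objects": ["computer", "desk"]},
    "physical": {"gesture": "pointing", "target": "computer"},
    "estimated_emotion": "unhappy",
    "estimated_intent": "refers to the computer on the desk",
}
```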
The rich media context information stored in the information state 110 may facilitate the different components in playing their respective roles, such that the conversation may proceed in an adaptive, more engaging, and more efficient manner with respect to the intended goal. For example, rich contextual information may improve understanding of the utterance of the user 180 based on what is observed in the conversation scene, support evaluating the performance of the user 180 and/or estimating the utility (or preference) of the user based on the intended goal of the conversation, help determine how to respond to the utterance of the user 180 based on the estimated emotional state of the user, and allow the response to be delivered in the manner considered most appropriate based on knowledge of the user, and so on. For example, if accent information represented in both acoustic form (e.g., a particular way of speaking certain phonemes) and visual form (e.g., a particular appearance of the user) is captured in the information state about the user, the ASR engine 130 may utilize this information to determine the words spoken by the user. Similarly, the NLU engine 140 can also utilize the rich context information to determine the semantics the user intends. For example, if the user points to a computer that is placed on a desk (visual information) and says "I like this," the NLU engine 140 may combine the output of the ASR engine 130 (i.e., "I like this") and the visual information that the user points to a computer in the room to understand that the user's "this" refers to the computer. As another example, if the user 180 repeatedly makes mistakes in a tutoring session while being estimated, based on the tone of speech and/or the user's facial expressions (e.g., as determined based on multimodal information), to be very frustrated, the DM 150 may decide, instead of continuing to advance the tutoring content, to temporarily change the topic based on the user's known interests (e.g., talking about a LEGO game) in order to keep the user engaged. The decision to distract the user may be made based on, for example, previously observed utilities regarding what worked (e.g., intermittently distracting the user worked in the past) and what did not work (e.g., continuing to press the user did not make things better).
FIG. 1B is a flow diagram of an exemplary process of the dialog system 100 in which the information state 110 captures dynamic information observed during a dialog, according to an embodiment of the present teachings. As seen in fig. 1B, the process is iterative. At 105, multimodal information is received, which is then analyzed by the multimodal information processor 120 at 125. As discussed herein, multimodal information includes information related to the user 180 and/or information related to the environment surrounding the conversation. Multimodal information related to the user may include utterances of the user and/or visual observations of the user, such as body gestures and/or facial expressions. The information related to the environment surrounding the conversation may include information about the environment, such as objects that are present, spatial/temporal relationships between the user and such observed objects (e.g., the user standing in front of a table), and/or dynamic relationships between the user's activities and the observed objects (e.g., the user walking to a table and pointing to a computer on the table). An understanding of the multimodal information captured from the dialog scene can then be used to facilitate other tasks in the dialog system 100.
Based on the information (representing the past state) stored in the information state 110 and the analysis results (regarding the current state) from the multimodal information processor 120, the ASR engine 130 and the NLU engine 140 perform, at 125, speech recognition to determine the words spoken by the user and language understanding based on the recognized words, respectively. Both ASR and NLU may be performed based on the current information state 110 and the analysis results from the multimodal information processor 120.
Based on the results of the multimodal information analysis and language understanding (i.e., what the user said or meant), changes to the dialog state are tracked at 135, and such changes are used to update the information state 110 accordingly at 145 to facilitate subsequent processing. To carry on the dialog, the DM 150 determines a response at 155 based on the dialog tree designed for the underlying dialog, the output of the NLU engine 140 (the understanding of the utterance), and the information stored in the information state 110. Once the response is determined, the NLG engine 160 generates the response, e.g., in textual form, based on the information state 110. When a response is determined, there may be different ways to phrase it. The NLG engine 160 may generate the response at 165 based on the user's preferred style or a style known to be more suitable for the particular user in the current conversation. For example, if a user answers a question incorrectly, there may be different ways to indicate that the answer is incorrect. If the particular user in the current conversation is known to be sensitive and easily frustrated, a gentler wording may be used to tell the user that his/her answer is incorrect. For example, rather than saying "This is wrong," the NLG engine 160 may generate the textual response "That is not exactly correct."
The text response generated by the NLG engine 160 may then be rendered by the TTS engine 170 at 175 into speech form, e.g., as an audio signal. Although TTS may be performed using standard or commonly used TTS techniques, the present teachings disclose that the response generated by the NLG engine 160 may be further personalized based on the information stored in the information state 110. For example, if a slower speaking speed or a softer speech style is known to be more effective for the user, the generated response may be correspondingly rendered by the TTS engine 170 at 175 into speech form having, e.g., a lower speed and pitch. Another example is to render the response with an accent consistent with the user's known accent, based on the user's personalized information in the information state 110. The rendered response may then be delivered to the user at 185 as the response to the user's utterance. After responding to the user, the dialog system 100 then tracks further changes to the dialog and updates the information state 110 accordingly at 195.
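A hedged sketch of this personalization follows, combining the NLG wording choice from the example above with TTS rendering parameters drawn from the information state. The profile keys are assumptions for illustration, not fields the patent defines.

```python
def render_feedback(is_correct: bool, profile: dict) -> str:
    """Choose feedback wording based on the user's profile (NLG step)."""
    if is_correct:
        return "Well done!"
    if profile.get("sensitive"):          # easily frustrated user
        return "That is not exactly correct."
    return "That is wrong."

def tts_settings(profile: dict) -> dict:
    """Pick speech-rendering parameters known to work for this user (TTS step)."""
    return {
        "rate":   "slow" if profile.get("prefers_slow_speech") else "normal",
        "pitch":  "low" if profile.get("prefers_soft_style") else "normal",
        "accent": profile.get("accent", "neutral"),
    }
```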
FIG. 2A depicts an exemplary configuration of the information state representation 110 according to embodiments of the present teachings. Without limitation, the information state 110 includes representations of estimated mindsets. As shown, different representations may be estimated to represent, for example, the agent's mindset 200, the user's mindset 220, and the shared mindset 210, along with other information recorded therein. The agent's mindset 200 may refer to the intended goal(s) that the dialog agent (the machine) is to achieve in a particular dialog. The shared mindset 210 may refer to a representation of the current dialog situation, which is a combination of the agent carrying out the intended agenda according to the agent's mindset 200 and the user's actual performance. The user's mindset 220 may refer to a representation of an estimate of what state the user is in with respect to the intended purpose of the dialog, based on the shared mindset or the user's performance. For example, if the agent's current task is to teach the student user the concept of fractions in mathematics (which may include sub-concepts that build up an understanding of fractions), the user's mindset may include the user's estimated mastery of the various related concepts. Such an estimate may be derived based on evaluations of the student's performance at different stages of tutoring the related concepts.
Fig. 2B illustrates how such representations of the different mindsets are connected, in an example where a robot tutor 205 teaches the student user 180 concepts 215 related to fraction addition, according to an embodiment of the present teachings. As can be seen, the robot agent 205 interacts with the student user 180 via multimodal interactions. The robot agent 205 may begin tutoring based on an initial representation of the agent's mindset 200 (e.g., a lesson on fraction addition, which may be represented as an AOG). During tutoring, the student user 180 may answer questions from the robot 205, and the answers to such questions form a particular dialog path, enabling estimation of a representation of the shared mindset 210. Based on the user's answers, the user's performance is evaluated and a representation of the user's mindset 220 is estimated with respect to different aspects, e.g., whether the student has mastered the taught concept and what style of dialog works with this particular student.
As seen in fig. 2A, the representations of the estimated mindsets are based on graph-related forms, including but not limited to spatial-temporal-causal AND-OR graphs (STC-AOG) 230 and STC parse graphs (STC-PG) 240, and may be used in conjunction with other types of information stored in the information state, such as dialog history 250, dialog context 260, event-centric knowledge 270, common sense models 280, …, and user profiles 290. These different types of information may belong to multiple modalities and constitute different aspects of the dynamics of each dialog with respect to each user. As such, the information state 110 captures general information for various dialogs as well as personalized information about each user and each dialog, interconnected so as to facilitate the different components of the dialog system 100 in performing their corresponding tasks in a more adaptive, personalized, and engaging manner.
Fig. 2C illustrates an exemplary relationship between the agent's mindset 200, the shared mindset 210, and the user's mindset 220 represented in the information state 110, according to an embodiment of the present teachings. As discussed herein, the shared mindset 210 represents the state of the dialog effected via the interaction between the agent and the user, and is a combination of the agent's intended agenda (in terms of the agent's mindset) and the user's performance in following that agenda. Based on the shared mindset 210, the dynamics of the dialog may be tracked as to what the agent has been able to carry out and what the user has achieved up to that point.
Tracking this dynamic knowledge enables the system to estimate what the user has achieved up to that point, i.e., which concepts or sub-concepts the student user has mastered so far and in what way (i.e., which dialog path(s) worked and which may not have). Based on the achievements made by the student user to date, the user's mindset 220 can be inferred or estimated, which is then used to determine how the agent may adjust or update the dialog policy in order to achieve the desired goal, or how to adjust the agent's mindset to suit the user. The process of adjusting the agent's mindset yields an updated agent mindset 200. Based on the dialog history, the dialog system 100 learns the user's preferences or what is more effective (the utility) for the user. This information, once incorporated into the information state, is used to adjust the dialog policy via utility-driven (or preference-driven) dialog planning. The updated dialog policy drives the next step of the dialog, which in turn may elicit a response from the user and subsequent updates to the shared mindset, the user's mindset, and the agent's mindset. The process iterates so that the agent can continually adjust the dialog policy based on the dynamic information state.
According to the present teachings, the different mindsets are represented based on, for example, STC-AOGs and STC-PGs. FIG. 3A illustrates exemplary relationships between the different types of AND-OR graphs (AOGs) used to represent the estimated mindsets of the parties involved in a conversation, according to embodiments of the present teachings. An AOG is a graph with AND branches and OR branches. Branches associated with a node in an AOG through an AND relationship represent tasks that must all be traversed. Branches from a node in an AOG that are associated through an OR relationship represent alternative tasks that may be traversed selectively. As discussed herein, an STC-AOG comprises an S-AOG corresponding to a spatial AOG, a T-AOG corresponding to a temporal AOG, and a C-AOG corresponding to a causal AOG. In accordance with the present teachings, an S-AOG is a graph whose nodes may each correspond to a topic to be covered in a dialog. A T-AOG is a graph whose nodes may each correspond to a temporal action to be taken. Each T-AOG may be associated with a topic or node in the S-AOG, i.e., it represents the steps to be performed during a dialog about the topic corresponding to that S-AOG node. A C-AOG is a graph whose nodes can each be linked to a node in a T-AOG and a node in the corresponding S-AOG, representing an action occurring in the T-AOG and the causal impact of that action on the node in the corresponding S-AOG.
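The AND/OR semantics just described can be captured by a small node structure. The sketch below is a minimal illustration under the stated assumptions; the class name and fields are ours, not the patent's.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AOGNode:
    name: str                      # a topic (S-AOG) or a temporal action (T-AOG)
    kind: str = "AND"              # "AND": traverse all children; "OR": pick one
    children: List["AOGNode"] = field(default_factory=list)

# An S-AOG fragment: mastering "fractions" requires (AND) its sub-concepts.
fractions = AOGNode("fractions", "AND",
                    [AOGNode("addition"), AOGNode("division")])
```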
FIG. 3B depicts an exemplary relationship between a node in an S-AOG and nodes of an associated T-AOG represented in the information state 110, according to embodiments of the present teachings. In this illustration, each K node corresponds to a node in the S-AOG, representing a skill or topic to be taught in the dialog. The evaluation for each K node may be "mastered" or "not mastered," with respective probabilities P(T) and 1-P(T), where P(T) represents the transition probability from the not-mastered state to the mastered state. P(L0) represents the probability of prior learning of the skill or prior knowledge of the topic, i.e., the likelihood that the student had mastered the concept before the tutoring session started. To teach the skill/concept associated with each K node, the robot tutor may ask multiple questions based on the T-AOG associated with the K node, and the student then answers each question. Each question is shown as a Q node, and the student's answer is represented as an A node in FIG. 3B.
During the dialog between the agent and the user, the student's answer may be a correct answer A(c) or an incorrect answer A(w), as seen in FIG. 3B. Based on each answer received from the user, additional probabilities are determined based on various knowledge or observations collected, for example, during the dialog. For example, if the user provides the correct answer (A(c)), the probability P(G) that the answer is a guess may be determined, representing the likelihood that the student did not know the correct answer but guessed it. Conversely, 1-P(G) is the probability that the user knew the correct answer and answered correctly. For an incorrect or wrong answer A(w), a probability P(S) may be determined, representing the likelihood that the student gave a wrong answer (a slip) even though the student did know the concept. From P(S), the probability 1-P(S) can be estimated, representing the probability that the student gave a wrong answer because the concept is unknown. Such probabilities may be computed for each node along the path traversed in the actual dialog, and may be used to estimate when the student has mastered a concept, and what might and might not work in teaching this particular student each particular topic.
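The probabilities P(L0), P(T), P(G), and P(S) named above match the parameters of standard Bayesian knowledge tracing, so a textbook-style update rule is sketched below. This is a hedged illustration consistent with the description, not the patent's stated formula.

```python
def bkt_update(p_mastered: float, correct: bool,
               p_guess: float, p_slip: float, p_transit: float) -> float:
    """One knowledge-tracing step: posterior mastery after an answer,
    then the chance of transitioning to the mastered state (P(T))."""
    if correct:
        # Correct answer: mastered and did not slip, or unmastered but guessed.
        num = p_mastered * (1.0 - p_slip)
        post = num / (num + (1.0 - p_mastered) * p_guess)
    else:
        # Wrong answer: mastered but slipped, or unmastered and did not guess.
        num = p_mastered * p_slip
        post = num / (num + (1.0 - p_mastered) * (1.0 - p_guess))
    return post + (1.0 - post) * p_transit

p = 0.3                                     # P(L0): prior mastery of the K node
for answer_correct in (False, True, True):  # observed A(w), A(c), A(c)
    p = bkt_update(p, answer_correct, p_guess=0.2, p_slip=0.1, p_transit=0.25)
```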
FIG. 3C illustrates an exemplary S-AOG and its associated T-AOGs, according to embodiments of the present teachings. In the example S-AOG 310 for tutoring the concept of fractions, each node corresponds to a topic or concept to be taught to the student user during the dialog. For example, S-AOG 310 includes node P0 or 310-1 representing the concept of fractions, node P1 or 310-2 representing the concept of division, node P2 or 310-3 representing the concept of multiplication, node P3 or 310-4 representing the concept of addition, and node P4 or 310-5 representing the concept of subtraction. In this example, the different nodes in S-AOG 310 are related. For example, to master the concept of fractions, at least some of the other concepts regarding addition, subtraction, multiplication, and division need to be mastered first. To teach the concept (e.g., fractions) represented by an S-AOG node, the agent may need to perform a series of steps or a process in a dialog session with the student user. This process or series of steps corresponds to a T-AOG. In some embodiments, there may be multiple T-AOGs for each node in an S-AOG, each of which may represent a different way to teach the student and may be invoked in a personalized manner. As shown, S-AOG node 310-1 has a plurality of T-AOGs 320, one of which is shown as 320-1, corresponding to a series of temporal question/answer steps 330, 340, 350, 360, 370, 380, and so on. The choice of which T-AOG to use in each tutoring session teaching the concept of fractions may vary and may be determined based on various considerations, such as the user in the session (personalization), the degree of mastery of the current concept (e.g., P(L0)), etc.
The STC-AOG-based representation of a dialog captures the entities/objects/concepts related to the dialog (S-AOG), the possible actions observed during the dialog (T-AOG), and the impact of each of these actions on those entities/objects/concepts (C-AOG). The actual dialog activity occurring during the dialog (the speech) results in a traversal of the corresponding graph representation or STC-AOG, yielding a parse graph (PG) corresponding to the traversed portion of the STC-AOG. In some embodiments, the S-AOG may model a spatial decomposition of the objects and scenes of the dialog. In some embodiments, the S-AOG may model a decomposition of concepts and sub-concepts as discussed herein. In some embodiments, a T-AOG may model a temporal decomposition of events/sub-events/actions that may be performed or have occurred in a dialog related to certain entities/objects/concepts represented in the corresponding S-AOG. The C-AOG can model the decomposition of the events represented in the T-AOG and their causal relationships to the corresponding entities/objects/concepts represented in the S-AOG. That is, the C-AOG describes the changes to nodes in the S-AOG caused by the events/actions taken in the dialog and represented in the T-AOG. Such information covers different aspects of the dialog and is captured in the information state 110. That is, the information state 110 represents the dynamics of the dialog between the user and the dialog agent. This is illustrated in FIG. 3D.
As discussed herein, the particular path traversed in an actual dialog session may result in different types of corresponding parse graphs (PGs). For example, traversal can be applied to an S-AOG to produce an S-PG, to a T-AOG to produce a T-PG, and to a C-AOG to produce a C-PG. That is, given the STC-AOG, the actual dialog results in a dynamic STC-PG, which at least partially represents the different mindsets of the parties participating in the dialog session. To illustrate this, FIGS. 4A-4C show exemplary S-AOGs/T-AOGs associated with an agent's mindset for teaching fraction-related concepts; FIGS. 5A-5B provide an exemplary representation of the shared mindset via a T-PG generated based on the dialog in a particular tutoring session; and FIGS. 6A-6B illustrate exemplary representations of the user's mindset in terms of the estimated mastery of the different concepts taught in the dialog with the dialog agent.
Fig. 4A illustrates an exemplary representation of an agent's mindset with respect to tutoring fractions, according to embodiments of the present teachings. As discussed herein, the representation of the agent's mindset may reflect what the agent desires or is designed to cover in the dialog. The agent's mindset may be adjusted during the dialog session based on the user's performance/behavior, such that the representation of the agent's mindset captures such dynamics or adjustments. As shown in fig. 4A, the exemplary representation of the agent's mindset includes various nodes, each node representing a sub-concept related to the concept of fractions. For example, there are sub-concepts related to: "understand fractions" 400, "compare fractions" 405, "understand equivalent fractions" 410, "expand and simplify equivalent fractions" 415, "find factor pairs" 420, "apply properties of multiplication/division" 425, "add fractions" 430, "find the LCM" 435, "solve for unknowns in multiplication/division" 440, "multiply and divide within 100" 445, "simplify improper fractions" 450, "understand improper fractions" 455, and "add and subtract" 460. These sub-concepts may constitute a lattice for fractions, some of which may need to be taught before others; e.g., "understand improper fractions" 455 may need to be covered before "simplify improper fractions" 450, "add and subtract" 460 may need to be mastered before "multiply and divide within 100" 445, and so on.
FIG. 4B illustrates an exemplary T-AOG representing an agent's mindset in teaching concepts related to fractions, according to embodiments of the present teachings. As discussed herein, a T-AOG includes various steps associated with a dialog, some of which relate to what the agent says, some of which relate to what the user responds, and some of which correspond to certain evaluations of the dialog performed by the agent. There are branches in the T-AOG that represent decisions. For example, at 470, the corresponding action is for the agent to highlight the numerator and denominator boxes, which may come after teaching the student what the numerator and denominator are. After link 480, the agent proceeds to 490 to request user input, e.g., asking the user to tell the agent which highlighted box is the denominator. Based on the answer received from the student, the agent follows one of two links combined by OR (shown with a plus sign), where each link represents a path taken according to the user's answer. For example, if the user correctly answered which is the denominator, the agent proceeds to 490-3, e.g., further asking the user to evaluate the denominator. If the user answers incorrectly, the agent proceeds to 490-4 to provide the user with a prompt about the denominator, and then returns to 490 via link 490-2, again requesting user input as to which is the denominator.
If the agent asks the user to evaluate the denominator at 490-3, there are two associated outcomes: a wrong answer and a correct answer. The former leads to 490-5, where the agent indicates to the user that the answer is incorrect, and then returns to 490 via link 490-1, asking the user for input again. If the answer is correct, the agent follows the other path to 490-6, letting the user know that he/she is correct, and continues along that path to set the denominator and clear the highlighting at 490-7 and 490-8, respectively. As can be seen, the steps at 490 represent the agent's planned temporal actions related to teaching the concept of the denominator, which relates to the S-AOG in FIG. 4A representing the agent's plan for teaching the student the concept of fractions. Thus, together they form part of the representation of the agent's mindset. FIG. 4C illustrates exemplary dialog content authored to teach concepts associated with fractions, according to embodiments of the present teachings. Using a similar dialog strategy, the dialog is intended to proceed as a question-and-answer process.
FIG. 5A illustrates an exemplary representation of the shared mindset in the form of a T-PG (corresponding to a path in the T-AOG in FIG. 4B), according to embodiments of the present teachings. The highlighted steps form the particular path taken by the actions in the dialog performed by the dialog agent based on the answers from the user. In contrast to the T-AOG shown in FIG. 4B, what is shown in FIG. 5A is a T-PG of various highlighted steps (e.g., 470, 510, 520, 530, 540, 550, ...) along a highlighted path. The T-PG shown in FIG. 5A represents an instantiated path traversed based on the actions of both the agent and the user, and thus represents the shared mindset. FIG. 5B illustrates a portion of authored dialog content between an agent and a user based on which a shared mindset representation may be obtained, according to an embodiment of the present teachings. As discussed herein, a shared mindset representation may be derived based on the stream of the dialog, which forms a particular path or T-PG traversed along an existing T-AOG.
As discussed herein, during a dialog, the dialog agent estimates the mindset of the user participating in the dialog based on observations of the dialog with the user, such that both the dialog and the estimated representation of the user's mindset are adjusted based on the dynamics of the dialog. For example, to determine how to conduct the dialog, the agent may need to evaluate or estimate the user's mastery of a particular topic based on observations of the user. As discussed with respect to FIG. 3B, the estimate may be probabilistic. Based on such probabilities, the agent may infer the current mastery level of a concept and determine how to proceed with the dialog, e.g., continue tutoring the current topic if the estimated mastery level is insufficient, or move on to other concepts if the estimated mastery level is sufficient. The agent may evaluate periodically during the dialog and annotate (parameterize) the PG in this process to facilitate decisions about the next moves in traversing the graph. Such an annotated or parameterized S-AOG may produce an S-PG, e.g., indicating which nodes in the S-AOG have been adequately covered and which have not. FIG. 5C depicts an exemplary S-PG of the corresponding S-AOG that represents an estimated user's mindset, according to embodiments of the present teachings. The underlying S-AOG is shown in FIG. 4A. In this illustrated example, during the dialog, each node in this S-AOG is evaluated based on the dialog and parameterized or annotated based on such evaluation. As shown in fig. 5C, the nodes representing the different sub-concepts related to fractions are annotated with respective different parameters (which indicate, for example, the degree of mastery of the corresponding node).
As shown in FIG. 5C, the nodes of the initial S-AOG (FIG. 4A) are now annotated with different weights, each weight indicating the evaluated degree of mastery of the corresponding node's sub-concept. As can be seen, the nodes in FIG. 5C are presented in different shades, determined according to the weights representing the different degrees of mastery of the underlying sub-concepts. For example, the dotted nodes may correspond to sub-concepts that have already been mastered, and therefore require no further traversal. Nodes 560 and 565 (corresponding to "understand fractions" and "understand improper fractions") may correspond to sub-concepts for which the desired level of mastery has not been achieved. All nodes (mastered and not mastered) connected to these two nodes may be considered when determining why the user has not yet mastered the concepts of fractions and improper fractions.
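One plausible way to act on such mastery annotations is sketched below: tutor the least-mastered sub-concept whose prerequisites are already mastered. The threshold, the concept names, and the selection rule are assumptions for illustration, not the patent's planning method.

```python
def next_topic(mastery: dict, prerequisites: dict, threshold: float = 0.95):
    """Pick the next node to tutor from a mastery-annotated S-PG."""
    candidates = [
        topic for topic, p in mastery.items()
        if p < threshold and all(mastery.get(pre, 0.0) >= threshold
                                 for pre in prerequisites.get(topic, ()))
    ]
    return min(candidates, key=lambda t: mastery[t]) if candidates else None

mastery = {"add_and_subtract": 0.98,
           "understand_improper_fractions": 0.40,
           "simplify_improper_fractions": 0.20}
prerequisites = {"simplify_improper_fractions": ["understand_improper_fractions"]}
print(next_topic(mastery, prerequisites))  # -> understand_improper_fractions
```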
Such estimated levels of mastery of the corresponding nodes in the original S-AOG result in an annotated S-PG that represents an estimated user's mindset, indicating the degree of understanding of the concepts associated with those nodes. This provides a basis for the dialog agent to understand the user's relevant patterns, e.g., with respect to what has been taught, what the user has understood, and where the user still has problems. As can be seen, the representation of the user's mindset is dynamically estimated based on, for example, the user's performance and activity during the ongoing dialog. In addition to estimating the level of mastery of the concepts associated with different nodes, contextual observations and information about the user collected during the ongoing dialog may also be used to estimate other traits or behavioral indicators of the user as part of understanding the user's mindset. FIG. 5D illustrates exemplary types of personality traits of a user that may be estimated based on information observed in the dialog, according to embodiments of the present teachings. As shown, in a dialog with the user, based on observations of the user's behavior or expression (whether in spoken, visual, or physical form), the agent can estimate, via multimodal information processing (e.g., by the multimodal information processor 120), various characteristics of the user in different dimensions, e.g., whether the user is outgoing, how mature the user is, whether the user is angry, whether the user is easily excited, whether the user is generally honest, how confident or secure the user feels about himself/herself, whether the user is reliable, rigorous, etc. Such information, once estimated, forms a profile of the user that can inform how the dialog system 100 adjusts its dialog policy when needed and in what manner the agent of the dialog system 100 should converse with the user.
Both S-AOG and T-AOG may have certain structures that are organized based on, for example, topics, concepts, or streams of conversation. FIG. 6A depicts an exemplary generalized structure of an S-AOG associated with a tutorial dialog according to an embodiment of the present teachings. This general structure as shown in fig. 6A is not subject-specific, but can be used to teach any subject. Exemplary structures include the different stages involved in the tutorial dialog, which are represented as different nodes in the S-AOG. As shown, node 600 is used for conversations related to greetings, node 605 is used for chats about, for example, weather or health, node 610 is used for conversations reviewing previously learned knowledge (e.g., as a basis for teaching an intended topic), node 615 is used for teaching an intended topic, node 620 is used for testing student users about a taught topic, and node 625 is used for conversations evaluating the student users' mastery of the taught topic based on the testing. Different nodes may be connected in a manner that encompasses different flows between the underlying sub-dialogs, but the particular flow in each dialog may be dynamically determined based on the situation. Some branches coming out of a node may be related via an AND relationship AND some branches coming out of a node may be related via an OR relationship.
As seen in fig. 6A, a tutoring-related dialog may begin with a greeting dialog 600, such as "good morning," "good afternoon," or "good evening." There are three branches out of node 600: going to node 605 for a short chat, going to node 610 for reviewing previously learned knowledge, and going to node 615 to begin teaching directly. The three branches are OR'd together, i.e., the dialog agent may follow any of the three. After the chat session 605, there are also three branches, one to the teaching node 615, one to the testing node 620, and one to the review node 610. The review node 610 likewise has two branches, one going to the teaching node 615 and the other going to the testing node 620 (the student's prior knowledge of the subject, or the level thereof, may be tested before teaching). In this illustrated embodiment, the teaching and testing nodes are required dialogs, such that the branches from nodes 605 and 610 to the teaching and testing nodes 615 and 620 are related by AND.
The teaching and testing may be iterative, as indicated by the double-headed arrow between nodes 615 and 620. Either the teaching node 615 or the testing node 620 may proceed to the evaluation node 625 as needed. That is, the evaluation may be performed based on the teaching results from node 615 or the test results from node 620. Based on the evaluation results, the conversation may proceed to one of four alternative scenarios (related by OR): teaching 615 (reviewing the concepts again), testing 620 (retesting), reviewing 610 (to strengthen the user's understanding of some concepts), or even chatting 605 (e.g., if the user is found to be frustrated, the conversation system 100 may switch topics to keep the user engaged rather than lose the user). This general S-AOG for tutoring-related dialogs is provided by way of illustration and not limitation. The S-AOG used for tutoring can be derived from any logic flow required by the application.
As seen in fig. 6A, each node is itself a conversation and, as discussed herein, may be associated with one or more T-AOGs, each representing a conversation flow for an intended topic. FIG. 6B depicts an exemplary T-AOG with dialog content for a greeting authored for the S-AOG node 600, according to embodiments of the present teachings. A T-AOG may be regarded as a dialog policy for a dialog; following the steps defined in the T-AOG implements a strategy that achieves some intended purpose. In fig. 6B, the content in each rectangular box represents what the agent is to speak, and the content in each ellipse represents how the user responds. As can be seen, in the T-AOG for a greeting shown in fig. 6B, the agent first says one of the three alternative greetings available, namely, good morning 630-1, good afternoon 630-2, and good evening 630-3. The user's response to such a greeting may differ. For example, the user may repeat what the agent said (i.e., good morning, good afternoon, or good evening). Some people will repeat it and then add "you too" 635-1. Some people say "thank you, how about you?" 635-2. Some would say both 635-1 and 635-2. Some people may simply remain silent 635-3. There may be other alternative ways to respond to the agent's greeting. Upon receiving a response from the user, the conversation agent may then reply to it; each alternative response from the user may have a corresponding reply. This is illustrated by the contents at 640-1, 640-2 and 640-3 in FIG. 6B.
The T-AOG shown in FIG. 6B may encompass multiple T-AOGs. For example, 630-1, 635-2, and 640-2 in FIG. 6B may constitute one T-AOG for a greeting. Similarly, 630-1, 635-1, 640-1 may correspond to another T-AOG for a greeting; 630-2, 635-1, 640-1 may form another T-AOG; 630-1, 635-3, 640-3 form a different T-AOG; 630-2, 635-3 and 640-3 may form yet another T-AOG, and so on. Although different, these alternative T-AOGs all have substantially similar structures and general content. This commonality can be used to generate a simplified T-AOG with flexible content associated with each node, which may be achieved via, for example, graph matching. For example, the different greeting-related T-AOGs mentioned above, while having different authored content, all have a similar structure, namely the initial greeting, plus the response from the user, plus the reply to the user's response to the greeting. In this sense, the T-AOG in FIG. 6B may not correspond to the simplest general T-AOG for greetings.
In order to facilitate flexible dialog content and to enable the dialog system 100 to adapt the dialog in a personalized manner, the AOGs may be parameterized. According to different embodiments of the present teachings, such parameterization may be applied to both S-AOGs and T-AOGs, with respect to both parameters associated with the nodes in an AOG and parameters associated with the links between different nodes. FIG. 6C illustrates different exemplary types of parameterization according to embodiments of the present teachings. As shown, the parameterized AOGs include parameterized S-AOGs and T-AOGs. For a parameterized S-AOG, each of its nodes can be parameterized with a reward representing, for example, the reward obtained by covering the topic or concept associated with that node. In the context of tutoring, the higher the reward associated with a node in the S-AOG, the greater the value for the agent to teach the student user the concept associated with that node. Conversely, if the student user is already familiar with (e.g., has mastered) the concept associated with a node in the S-AOG, a lower reward is assigned to that node, since there is no further benefit in teaching the associated concept to the student. Such rewards associated with the nodes may be dynamically updated during the course of the tutoring session. This is illustrated in FIG. 6D, where S-AOG 310 has nodes associated with mathematical concepts related to fractions. As can be seen, each node representing a related concept is parameterized with a reward that is evaluated to indicate the value of teaching the student that concept.
Each node in the S-AOG may have different branches, and each branch leads to another node associated with a different topic. Such branches may also be associated with parameters, such as the probability of taking the respective branch, as shown in fig. 6C. Also illustrated in FIG. 6D is the parameterization of paths in the AOG. Teaching fractions may require accumulating knowledge starting with addition and subtraction, followed by multiplication and division. Each connection between different concepts carries a probability of going from one to the other. For example, as shown, the branch from the "add" node 310-4 to the "subtract" node 310-5 is parameterized with probability P_{a,s}, which may indicate the likelihood of successfully teaching the student to understand the concept of "subtraction" if the concept of "addition" is taught first. In contrast, the probability P_{s,a} may indicate the likelihood of successfully teaching the student to understand addition if subtraction is taught first. As another example, the branches from "add" to "multiply" and "divide" are parameterized with probabilities P_{a,m} and P_{a,d}, respectively. Similarly, the branches from "subtract" to "multiply" and "divide" are parameterized with probabilities P_{s,m} and P_{s,d}, respectively. With such probabilities, the conversation agent can select an optimized path by maximizing the probability of successfully teaching the intended concepts in an order that is likely to work better. Such probabilities may also be dynamically updated based on, for example, observations from the conversation. In this way, the optimal course of teaching actions can be adjusted in real time based on the individual student's situation.
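By way of illustration, the following Python sketch shows how an agent might select a teaching order by maximizing the product of such transition probabilities; the probability values and concept names are hypothetical.

```python
import itertools

# Hypothetical transition probabilities P[x][y]: the likelihood of successfully
# teaching concept y when concept x has been taught first (cf. P_{a,s}, P_{a,m}, ...).
P = {
    "add":      {"subtract": 0.9, "multiply": 0.7, "divide": 0.5},
    "subtract": {"add": 0.6, "multiply": 0.6, "divide": 0.6},
    "multiply": {"add": 0.2, "subtract": 0.3, "divide": 0.8},
    "divide":   {"add": 0.2, "subtract": 0.2, "multiply": 0.4},
}

def best_teaching_order(concepts):
    """Select the ordering that maximizes the product of link probabilities."""
    def score(order):
        s = 1.0
        for a, b in zip(order, order[1:]):
            s *= P[a][b]
        return s
    return max(itertools.permutations(concepts), key=score)

print(best_teaching_order(["add", "subtract", "multiply", "divide"]))
# -> ('add', 'subtract', 'multiply', 'divide') under these illustrative values
```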
Parameterization may also be applied to the T-AOG, as indicated in FIG. 6C. As discussed herein, a T-AOG represents a dialog strategy for a particular topic. Each node in the T-AOG represents a specific step in the conversation, often related to what the conversation agent is going to say to the user, how the user reacts to the conversation agent, or an evaluation of the transition. As discussed herein, the same thing can often be said in different ways, and any way of saying it should be considered to convey the same thing. Based on this observation, the content associated with the nodes in the T-AOG can be parameterized. This is shown in fig. 6E according to an embodiment of the present teachings. As shown in fig. 6B, there are different ways to conduct a greeting dialog; even for such a simple topic, there are many different ways to express nearly the same thing. The content of the greeting dialog may be parameterized in a more simplified T-AOG. FIG. 6E illustrates an exemplary parameterized T-AOG corresponding to the T-AOG shown in FIG. 6B. The initial greeting is now parameterized as "Good [ ___ ]!" 650, where the content in brackets is parameterized with the possible instances "morning", "afternoon" and "evening". The user's response to the initial greeting is now divided into two cases: a spoken response 655-1, or no spoken response, i.e., silence 655-2. The spoken answer 655-1 may be parameterized with different content choices in response to the initial greeting, as shown in the brackets associated with 655-1. That is, any content included in the parameterized set 655-1 may be identified as a possible answer from the user in response to the initial greeting from the agent. Similarly, the content of the agent's replies at 660-1 may also be parameterized as a set of all possible replies to the user's answer. In the case of user silence, the reply content 660-2 of the agent may be similarly parameterized.
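The following Python sketch illustrates one possible realization of such a parameterized greeting node; the template, answer set, and replies are illustrative assumptions, not content from the disclosed figures.

```python
import random

# Parameterized greeting node (cf. 650): "Good [ ___ ]!" with slot fillers,
# plus parameterized answer sets and replies; all strings are hypothetical.
GREETING_TEMPLATE = "Good {slot}!"
SLOT_VALUES = ["morning", "afternoon", "evening"]
SPOKEN_ANSWERS = {"good morning", "good morning, you too", "thank you, how about you?"}
REPLY_TO_SPOKEN = ["Thank you!", "I am good."]
REPLY_TO_SILENCE = ["Hello? Can you hear me?"]

def initial_greeting() -> str:
    return GREETING_TEMPLATE.format(slot=random.choice(SLOT_VALUES))

def reply(user_utterance: str) -> str:
    text = user_utterance.strip().lower()
    if not text:                    # silence (cf. 655-2)
        return random.choice(REPLY_TO_SILENCE)
    if text in SPOKEN_ANSWERS:      # recognized parameterized answer (cf. 655-1)
        return random.choice(REPLY_TO_SPOKEN)
    return "Nice to meet you!"      # fallback for unrecognized spoken responses
```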
Another example of parameterizing content associated with nodes in a T-AOG is illustrated in FIG. 6F. This example relates to a T-AOG for testing students on the concept of "addition". As shown, the T-AOG used for this test may include the following steps: a question is posed (665), the student is asked for an answer (667), the student provides an answer (670-1 or 675-1), a response is made to the user's answer (670-2 or 675-2), and then the reward associated with the S-AOG node for "addition" is evaluated (677). For each node of the T-AOG in FIG. 6F, the content associated with it is parameterized. For example, for step 665, the parameters involved include X, Y, and Oi, where X and Y are numbers and Oi refers to an object of type i. By instantiating specific values for these parameters, a variety of questions can be created. In this example in fig. 6F, the first step of the test at 665 is to present X objects of type 1 ("o1") and Y objects of type 2 ("o2"), where X and Y are instantiated with numbers (3 and 4) and "o1" and "o2" may be instantiated with object types, such as apple and orange. Based on this instantiation of the parameters, a specific test question may be generated. At 667 of the T-AOG in FIG. 6F, to test the student, the dialog agent asks the user what the sum of X objects of type "o1" and Y objects of type "o2" is. When X, Y, o1 and o2 are instantiated with specific values, such as X = 3, Y = 4, o1 = apple and o2 = orange, the test would present "3 apples, 4 oranges" (or even a picture thereof), and the instantiated parameterized question asking for the sum X + Y could be, e.g., "How many fruits are there?" or "Can you tell me the total number of fruits?". In this way, flexible test questions can be generated under a generic, parameterized T-AOG. Parameterized test questions also facilitate the generation of the expected correct answer. In this example, since X and Y are instantiated as 3 and 4, respectively, the expected correct answer to the summation test question may be dynamically generated as X + Y = 7. Such a dynamically generated expected correct answer may then be used to evaluate the answer from the student user in response to the question. In this way, the T-AOG can be parameterized with a simpler graph structure, while enabling the dialog agent to flexibly configure different dialog content within the parameterized framework to perform the intended task.
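A minimal Python sketch of such parameterized question generation follows, assuming the parameters X, Y, object types, and question phrasings described above (all values illustrative):

```python
import random

def make_addition_test(x: int, y: int, obj1: str, obj2: str, category: str):
    """Instantiate the parameterized test of FIG. 6F: the presentation, the
    question, and the dynamically derived expected correct answer."""
    presentation = f"{x} {obj1}s, {y} {obj2}s"
    question = random.choice([
        f"How many {category} are there?",
        f"Can you tell me the total number of {category}?",
    ])
    expected_answer = x + y   # expected correct answer from the instantiation
    return presentation, question, expected_answer

# X=3, Y=4, o1=apple, o2=orange, as in the example above:
pres, q, ans = make_addition_test(3, 4, "apple", "orange", "fruits")
print(pres)  # 3 apples, 4 oranges
print(q)
print(ans)   # 7
```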
As discussed herein, the dialog agent may also dynamically derive a basis for evaluating the test as the parameterized content is instantiated. In this example, the expected correct answer 7 is formed based on the instantiation X = 3 and Y = 4. When an answer is received, it may be categorized as an unknown answer at 670-1 (e.g., when the user does not respond at all or the answer does not contain a number) or an answer with a number at 675-1 (a correct or incorrect number). The response to the answer may also be parameterized. An answer that is either unknown or incorrect may be considered a non-correct answer, which may be responded to using a parameterized response at 670-2.
Responses to incorrect answers may also be classified into different cases, and in each case the response content may be parameterized with content appropriate for that classification. For example, when the incorrect answer is a wrong total, the response may be parameterized to address an incorrect answer that may be due to a slip or a guess. If the incorrect answer is because the user does not know at all (e.g., does not answer at all), then the response can be parameterized to address that situation directly, using appropriate response alternatives. Similarly, the response to a correct answer may also be parameterized to account for whether it is indeed a known correct answer or is estimated to be a lucky guess.
As shown in fig. 6F, after the dialog agent responds to the answer (or lack of an answer) from the user in the different situations, the T-AOG includes a step at 677 for evaluating the current reward for mastering the concept of "addition". After the evaluation, the process may return to 665 to test the student with more questions. The process may also continue to "teach" if the evaluation reveals that the student has not fully understood the concept, or exit if the student is deemed to have mastered it. In some cases, the process may also exit abnormally; for example, if it is detected that the student cannot complete the exercise at all, the system may consider temporarily switching topics, as discussed with respect to fig. 6A.
Furthermore, since the T-AOG corresponds to a conversation policy (which indicates the possible alternative flows available to the conversation), an actual conversation may traverse a portion of the T-AOG by following a particular path in it. Different users, or the same user in different dialog sessions, may generate different paths embedded in the same T-AOG. Such information may be useful for allowing the dialog system 100 to personalize dialogs by parameterizing the links along different paths with respect to different users, and such parameterized paths may represent, for each user, what works and what does not. For example, for each link between two nodes in a T-AOG, the reward for that link may be estimated with respect to each student user's performance in understanding the underlying concepts being taught. Such path-centric rewards may be calculated based on the probabilities associated with the different branches of each node along the path. Figure 6G illustrates a T-AOG associated with a user, having different paths between different nodes parameterized with rewards updated based on dynamic information observed in a conversation with the user, in accordance with an embodiment of the present teachings. In this exemplary parameterized T-AOG (similar to the T-AOG presented in FIG. 6F), after X objects 1 and Y objects 2 are presented to the student user at 680, the agent asks the user for the total X + Y at 685. Based on previous teaching or testing of the same user, there may be an estimated likelihood of how the student will respond in this round, i.e., for each possible outcome (690-1, 690-2 and 690-3), there is an associated reward R_{11}, R_{12} and R_{13}, respectively.
If the answer from the student is incorrect (690-2), it may be responded to in different ways, e.g., 695-1, 695-2, and 695-3. Based on past experience with the user or the user's known personality (also estimated in a personalized manner), there may be different rewards R_{22}, R_{23} and R_{24}, respectively associated with each possible response. For example, if the user is known to be sensitive and to do better with encouraging or positive feedback, the reward associated with response 695-2 may be the highest. In this case, the dialog system 100 may choose to respond to the incorrect answer with response 695-2, which is more positive and motivating; for example, the conversation agent may say "Almost there. Think about it again." A different user may prefer not to be told of the error, in which case, for that user, the reward R_{22} linked to response 695-1 may be the highest compared with R_{23} and R_{24}. Such rewards associated with alternative paths of the T-AOG are personalized based on knowledge about the particular user and/or past interactions with the user. By configuring the AOG with parameters for both nodes and paths, the dialog system 100 can dynamically configure and update the parameters during each dialog to personalize the AOG, so that dialogs can be conducted in a flexible (content is parameterized), personalized (parameters are calculated based on personalized information), and thus more effective manner.
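By way of illustration, the following Python sketch shows personalized response selection as an argmax over such per-response rewards; the response texts and reward values are hypothetical.

```python
# Hypothetical per-user rewards for the alternative responses to a wrong answer
# (cf. R_{22}, R_{23}, R_{24} for responses 695-1, 695-2, 695-3):
RESPONSES = {
    "695-1": "That is not correct.",                  # direct correction
    "695-2": "Almost there. Think about it again.",   # encouraging
    "695-3": "Let's count them together.",            # guided retry
}

def pick_response(user_rewards: dict) -> str:
    """Select the response whose personalized reward is highest."""
    best = max(user_rewards, key=user_rewards.get)
    return RESPONSES[best]

# A sensitive user estimated to respond better to encouragement:
print(pick_response({"695-1": 0.2, "695-2": 0.7, "695-3": 0.5}))
```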
As discussed herein and shown in fig. 2A, the information state 110 is represented based not only on AOGs, but also on various other types of information, such as conversation context 260, conversation history 250, user profile 290, event-centric knowledge 270, and certain common sense models 280. The representations of the different mind states may be derived based on these different types of information. For example, while AOGs are used to represent the different minds, their corresponding PGs (the results of traversing the AOGs based on the dialog) are generated based on the actual traversals (nodes and paths) in the AOGs and the dynamic information collected during the dialog. For example, the values of parameters associated with nodes/links in an AOG may be dynamically estimated based on the ongoing conversation, the conversation history, the conversation context, the user profile, events occurring during the conversation, and so on. In view of this, to update the information state 110, different types of information (such as knowledge about events, the surrounding environment, user characteristics, activities, etc.) may be tracked, and this tracked knowledge may then be used to update the different parameters and ultimately the information state 110.
As discussed herein, AOGs/PGs are used to represent different minds, including that of the robotic agent (designed according to what is expected to be accomplished), that shared between the robotic agent and the user (derived based on the conversation that actually occurs), and that of the user (estimated based on the conversation and the user's performance in it). When AOGs and PGs are parameterized, the values of the parameters associated with nodes and links may be evaluated based on, for example, information related to the conversation, the user's performance, the user's characteristics, and optionally the event(s) occurring during the conversation, etc. Based on such dynamic information, these representations of the minds may be updated over time during the dialog based on changing circumstances.
Fig. 7A depicts a high-level system diagram of a knowledge tracking unit 700, which is used to track information and update the rewards associated with nodes/paths in an AOG/PG, in accordance with embodiments of the present teachings. As discussed herein, the nodes in an AOG may each be parameterized with a state-related reward, and the paths in a PG may each be parameterized with a path-related reward or utility. The reward/utility associated with a state or node in an AOG may represent the degree of mastery of the concept associated with that node. The higher the degree of mastery of the concept of an AOG node, the lower the associated state reward/utility for that node, i.e., there is considerably less reward/utility in teaching a concept that has already been mastered. The reward/utility is personalized and derived based on an assessment of the user's performance, and such assessment may be performed continuously or periodically during the session.
As discussed herein, each S-AOG node associated with a concept (e.g., to be taught in tutoring) may have one or more T-AOGs, each of which may correspond to a particular manner of teaching the concept. The parse path or PG is formed based on nodes and links in the T-AOG traversed during the dialog. The reward/utility associated with a path in a T-AOG or T-PG may indicate the likelihood that such a path will result in a successful tutoring session or a successful mastery of the concept. In view of this, the better the performance of the evaluated user in traversing along the path, the higher the path reward/utility associated with the path. Such path-related rewards/utilities may also be determined based on the performance of multiple users that statistically indicate which pedagogical style is better for a group of users. Such estimated rewards/utilities along different branch paths may be particularly helpful in conversation sessions when determining which branch path to take to continue a conversation, and may guide the conversation agent to select a path that has a statistically better chance of leading to better performance (i.e., reaching a mastery level of the concept sooner).
The illustrated embodiment shown in fig. 7A is directed to tracking state and path rewards during a conversation. In this illustrative embodiment, the rewards associated with the nodes and paths are determined based on different probabilities estimated from the dynamic conditions of the conversation. For example, for a node in the S-AOG associated with, say, the "add" concept, the reward is a state-based reward that indicates whether there is value in teaching the "add" concept to a particular user. For each student user registered to learn mathematics from the robotic agent, the reward value for each node of the S-AOG for mathematical concepts is adaptively calculated. The reward for each node in such an S-AOG (e.g., for the mathematical concept "add") may be assigned an initial value, and the reward value may continue to change as the user engages in the conversation specified by the associated T-AOG (the conversation flow on the "add" concept). During the dialog specified by the T-AOG, the robotic agent may ask the user a question, and the user then answers the question. The answers from the user may be continuously evaluated, and the probability of whether the user is learning or making progress estimated. Such probabilities estimated while traversing the T-AOG may be used to estimate the reward associated with the corresponding node in the S-AOG (i.e., the node representing the concept "add"), which indicates whether the user has mastered the concept. That is, the reward value associated with the node representing the concept is updated during the dialog. If the teaching is successful, the reward may be reduced to a low value, indicating that there is no further value in teaching the student that concept because the student has mastered it. As can be seen, such state-based rewards are personalized in that they are calculated based on each user's performance in the conversation.
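The following Python sketch illustrates one simple way such a state-based reward could decay as the estimated mastery probability rises; the linear functional form is an illustrative assumption, not the disclosed method.

```python
def node_reward(p_mastery: float, base_value: float = 1.0) -> float:
    """State-based reward for teaching a concept: as the estimated mastery
    probability rises, the value of teaching the concept again decays."""
    return base_value * (1.0 - p_mastery)

# The teaching reward for the "add" node drops as mastery is established:
for p in (0.1, 0.5, 0.9):
    print(f"P(mastery)={p:.1f} -> reward={node_reward(p):.2f}")
```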
There are also rewards associated with different paths in the T-AOG. A T-AOG includes different nodes, each of which may have multiple branches representing available alternative paths. The selection of different branches results in different traversals of the underlying T-AOG, and each traversal yields a T-PG. In a tutoring application, in order to track the effectiveness of tutoring, at each node of the T-AOG traversed during a dialog, the different branches may be associated with respective measurements, which may indicate the likelihood of achieving the intended goal when the respective branch is selected. The higher the measurement associated with a branch, the more likely it is to lead to a path that fulfills the intended purpose. However, optimizing the selection of branches at each individual node may not result in an overall optimal path. In some embodiments, rather than optimizing the selection of branches at each node individually, the optimization may be performed on a path basis, i.e., with respect to a path of a particular length. In operation, such path-based optimization may be implemented as a look-ahead operation, i.e., determining the best choice among the branches of the current node when considering the next K choices along the path. This look-ahead operation selects a branch based on a composite measurement along each possible path, determined from the measurements accumulated on the links from the current node along that path. The length of the look-ahead may vary and may be determined based on application needs. The composite measurements associated with all available alternative paths (originating from the current node) may be referred to as path-based rewards. The branch from the current node may then be selected by maximizing the path-based reward over all possible traversals from the current node.
The reward along a path of the T-AOG may be determined based on a plurality of probabilities estimated from the user's performance observed during the conversation. For example, at the current node in the T-AOG, the dialog agent may pose a question to the student user and then receive an answer from the user in response, where the answer corresponds to a branch from that node in the T-AOG. A measurement associated with the reward for that branch may then be estimated based on the probabilities. Such measurements, as well as the path-based rewards, are personalized in that they are calculated based on personal information observed from conversations involving the particular user. Measurements associated with different branches along a T-AOG path (associated with an S-AOG node) may be used to estimate the reward of that S-AOG node with respect to the student's mastery level. These rewards (including node-based rewards and path-based rewards) may constitute the "utilities" or preferences of the user, and may be used by the robotic agent to adaptively determine how to continue the conversation in a utility-driven conversation plan. This is illustrated in fig. 7B, which shows how knowledge can be tracked to enable the dialog system 100 to adapt relevant knowledge on the fly on the basis of the "shared mind" (which represents the actual dialog). The tracked knowledge is used to dynamically update the models, i.e., the parameters in the parameterized AOGs, such as the rewards reflecting the degree of mastery of basic concepts in the S-AOG of the student/user's mind and/or the rewards for different paths in the T-AOGs of the agent's mind. The updated models can then be used by the agent to perform utility-driven dialog planning according to the dynamic situation of the dialog with the particular user.
Referring back to fig. 7A, to perform knowledge tracking and update the information state 110 based on the tracked knowledge, the knowledge tracking unit 700 includes an initial knowledge probability estimator 710, a positive knowledge probability estimator 720, a negative knowledge probability estimator 730, a guess probability estimator 740, a state reward estimator 760, a path reward estimator 750, and an information state updater 770. Fig. 7C is a flow chart of an exemplary process of the knowledge tracking unit 700 according to an embodiment of the present teachings. In operation, the initial knowledge probabilities of the nodes in the representation of the relevant AOGs can first be estimated at 705. This may include an initial knowledge probability for each relevant node in the S-AOG and for each branch of each T-AOG associated with the S-AOG nodes.
With the estimated initial probabilities, the conversation agent can converse with the user on a particular topic represented by the relevant S-AOG node and the particular T-AOG for that S-AOG node, the associated probabilities having been initialized. To initiate a conversation, the robotic agent begins the dialog by following the T-AOG. When the user responds to the robotic agent, the NLU engine 120 may analyze the response and generate a language understanding output. In some embodiments, to understand the user's utterance, the NLU engine 120 may also perform language understanding based on information other than the utterance (e.g., information from the multimodal information analyzer 702). For example, the user may say "this is a robot toy" while pointing at a toy on a table. To understand the semantics of this utterance (i.e., what "this" refers to), the multimodal information analyzer 702 may analyze the audio and visual information and combine cues from the different modalities to help the NLU engine 120 understand the user's meaning and output the user's response together with an assessment of its correctness, e.g., based on the T-AOG.
When the knowledge tracking unit 700 receives the user's response with the assessment at 715, in order to track knowledge based on what is happening in the conversation, different modules may be invoked to estimate the corresponding probabilities based on the received input. For example, if the user's response corresponds to a correct answer, the positive knowledge probability estimator 720 may be invoked to determine the probability that the user positively knows the correct answer; the negative knowledge probability estimator 730 may be invoked to estimate the probability that the user does not know the answer; and the guess probability estimator 740 may be invoked to estimate the likelihood that the user merely made a guess. If the user's response corresponds to an incorrect answer, the positive knowledge probability estimator 720 may determine the probability that the user knows the answer but made a slip; the negative knowledge probability estimator 730 may estimate the probability that the user does not know and answered wrongly; and the guess probability estimator 740 may determine the probability that the answer was only a guess. These steps are performed at 725, 735, and 745, respectively.
As discussed herein, for a T-AOG, when a user interacts with the conversation agent, such interactions form a parse graph that continues to grow as the conversation progresses. One example is shown in fig. 5A. Given a parse graph, or history of interactions between the robotic agent and the user, the probability of the user knowing the underlying concept may be adaptively updated based on the estimated probabilities. In some embodiments, the prior probability of knowing a concept at time t+1 may be updated based on the observations. In some embodiments, it may be calculated based on the following formulas:
P(L_{t+1} | obs = correct) = P(L_t)(1 - P(S)) / [ P(L_t)(1 - P(S)) + (1 - P(L_t)) P(G) ]

P(L_{t+1} | obs = wrong) = P(L_t) P(S) / [ P(L_t) P(S) + (1 - P(L_t))(1 - P(G)) ]
where P(L_{t+1} | obs = correct) represents the updated knowledge probability given an observed correct answer at time t+1, P(L_{t+1} | obs = wrong) represents the updated knowledge probability given an observed wrong answer at time t+1, P(L_t) is the knowledge probability at time t, P(S) is the probability of a slip, and P(G) represents the probability of a guess. Thus, the prior knowledge probabilities can be dynamically updated with probabilities estimated based on observations of the conversation, as shown in the examples herein. The knowledge probability associated with a node in the S-AOG may then be used by the state reward estimator 760, at 755 in fig. 7C, to compute the state-based or node-based reward associated with that node in the S-AOG, which represents the user's mastery of the relevant skills associated with the concept node.
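The update above can be implemented directly. The following Python sketch applies one such knowledge-tracing step per observed answer; the initial probability and the slip/guess values are hypothetical.

```python
def bkt_update(p_know: float, correct: bool, p_slip: float, p_guess: float) -> float:
    """One knowledge-tracing step: update P(L) from one observed answer,
    using the slip probability P(S) and the guess probability P(G)."""
    if correct:
        num = p_know * (1.0 - p_slip)
        den = num + (1.0 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1.0 - p_know) * (1.0 - p_guess)
    return num / den

p = 0.3                                # hypothetical initial P(L_t)
for obs in (True, True, False, True):  # answers observed during the dialog
    p = bkt_update(p, obs, p_slip=0.1, p_guess=0.2)
    print(round(p, 3))
```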
Based on the probabilities computed for different branches (e.g., some corresponding to correct answers and some corresponding to wrong answers) for each node along the PG path in the T-AOG, a path-based reward may be computed by the path reward estimator 750 for each path at 765 in fig. 7C. Based on such estimated state-based rewards and path-based rewards, the information state updater 770 may then continue to update the parameterized AOG in the information state 110 at 775. When the parameters associated with the AOGs in the information state 110 are updated, the updated parameterized AOGs can then be used to control the conversation based on the utility (preferences) of the user.
In some embodiments, the different parameters for parameterizing the AOGs may be learned based on observed and/or calculated probabilities. In some embodiments, unsupervised learning methods may be employed to learn such model parameters, including, for example, knowledge tracking parameters and/or utility/reward parameters. Such learning may be performed online or offline. In the following, an exemplary learning scheme is provided:
α_{t+1}(j) = [ Σ_{i=1..N} α_t(i) a_{ij} ] b_j(o_{t+1}), j ∈ [1, N]

α_1(j) = π_j b_j(o_1), j ∈ [1, N]

β_t(i) = Σ_{j=1..N} a_{ij} b_j(o_{t+1}) β_{t+1}(j), i ∈ [1, N]

β_T(i) = 1, i ∈ [1, N]

γ_t(i) = α_t(i) β_t(i) / Σ_{j=1..N} α_t(j) β_t(j)

ξ_t(i, j) = α_t(i) a_{ij} b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1..N} Σ_{j=1..N} α_t(i) a_{ij} b_j(o_{t+1}) β_{t+1}(j)

a_{ij} = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} γ_t(i)

b_j(k) = Σ_{t: o_t = k} γ_t(j) / Σ_{t=1..T} γ_t(j)

where α and β are the forward and backward probabilities over hidden states i, j ∈ [1, N], with transition probabilities a_{ij}, emission probabilities b_j(o_t) for observations o_1, ..., o_T, and initial distribution π; γ_t(i) and ξ_t(i, j) are the state and transition posteriors used to re-estimate the model parameters.
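The scheme above corresponds to forward-backward estimation for a hidden Markov model. The following Python (NumPy) sketch implements one E-step under that reading; the model sizes and parameter values are hypothetical.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """One E-step of Baum-Welch for an HMM with initial distribution pi,
    transition matrix A, emission matrix B, and an observation sequence."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(j) = pi_j b_j(o_1)
    for t in range(1, T):                             # forward recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0                                 # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                    # backward recursion
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)         # state posteriors gamma_t(i)
    return alpha, beta, gamma

# Two hidden states (e.g. "knows" / "does not know"), two observations (wrong/correct):
pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.1, 0.9]])
B = np.array([[0.8, 0.2], [0.2, 0.8]])
print(forward_backward(pi, A, B, [1, 1, 0, 1])[2])
```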
FIGS. 8A-8B depict utility-driven conversation planning based on dynamically computed AOG parameters, according to embodiments of the present teachings. Utility-driven conversation planning can include conversation node planning and conversation path planning. The former refers to selecting a node in the S-AOG to continue the conversation session; the latter refers to selecting a path in the T-AOG for conducting the conversation. FIG. 8A illustrates an example of utility-driven tutoring planning with respect to a parameterized S-AOG, according to an embodiment of the present teachings. FIG. 8B illustrates an example of utility-driven path planning in a parameterized T-AOG, according to an embodiment of the present teachings.
For node planning, as shown in FIG. 6D, an exemplary S-AOG 310 is used to teach various mathematical concepts, and each node corresponds to a concept. In fig. 8A, different nodes are shown which have reward-related parameters associated with them, and some of them may be parameterized with conditions established based on the rewards of the connected nodes. As seen in fig. 8A, node 310-4 is used to teach the concept "addition", node 310-5 is used to teach the concept "subtraction", etc. Each node is parameterized with, for example, a reward for teaching the concept, which is related to, e.g., the user's current level of mastery of that concept. The rewards associated with some nodes in S-AOG 310 are expressed as functions of the reward parameters of the nodes to which they are connected.
Some concepts may need to be taught under the requirement or condition (e.g., a prerequisite) that the user has already mastered some other concepts. For example, to teach the student the concept of "division", the user may be required to have already mastered the concepts of "addition" and "subtraction". This may be indicated by the requirement 820, expressed as R_d = F_d(R_a, R_s), where the reward R_d associated with node 310-3 is a function F_d of R_a and R_s, which represent the rewards associated with node 310-4 for "addition" and node 310-5 for "subtraction", respectively. For example, an exemplary condition for teaching the concept of "division" 310-3 may be that its reward level R_d must be high enough (i.e., the user has not yet mastered the concept of "division"), while the reward R_a for "addition" (310-4) and the reward R_s for "subtraction" (310-5) must be low enough (i.e., the user has mastered the prerequisite concepts of "addition" and "subtraction"). The function F_d can be designed according to application requirements to express these conditions.
Node-based planning may be set up such that a dialog (T-AOG) associated with a node in the S-AOG that is conditioned on certain reward criteria may not be scheduled until the reward conditions associated with that node are satisfied. In this way, initially, when the user knows none of the concepts, the only unconditional nodes that may be scheduled are 310-4 and 310-5. During a dialog for "addition" or "subtraction", the associated reward (R_a or R_s) can be continuously updated and propagated to nodes 310-2 and 310-3, so that R_m or R_d is also updated according to the function F_m or F_d. When the user masters the concepts of "addition" and "subtraction" to a certain degree, the rewards R_a and R_s become low enough that there is no further need to schedule the dialogs associated with nodes 310-4 and 310-5. At the same time, the low R_a and R_s can be substituted into F_m or F_d, so that the conditions associated with nodes 310-2 and 310-3 may now be satisfied, placing 310-2 and 310-3 in an active state: R_m or R_d can now become high enough that these nodes are ready to be selected for conducting dialogs on the topics of multiplication and division. When this happens, a dialog for teaching the corresponding concept can be initiated using the T-AOG associated with it.
The same applies to the node for "fractions". The user may be required to have mastered the concepts of "multiplication" and "division" (the rewards for 310-2 and 310-3 are sufficiently low), whereupon the reward for the "fractions" node becomes correspondingly high. In this manner, the state-based rewards associated with the nodes in the S-AOG may be used to dynamically control how to traverse between the nodes of the S-AOG in a personalized manner, i.e., in a manner that adapts to the circumstances of each individual. That is, in actual conversations with different users, the traversal can be adaptively controlled in a personalized manner based on observations of the actual conversation situation. For example, in FIG. 8A, different nodes may have different rewards at different times, depending on the stage of teaching. As shown, node 310-4 for "addition" is darkest, representing, for example, the lowest reward value, which may indicate that the user has mastered the concept of "addition". Node 310-5 for "subtraction" has an intermediate reward value, indicating, for example, that the user has not yet mastered the concept but is close. Nodes 310-1, 310-2, and 310-3 are light colored, for example, to indicate high reward values, indicating that the user has not mastered the corresponding concepts.
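By way of illustration, the following Python sketch shows one possible form of such a condition derived from F_d, gating the "division" node on the rewards of its prerequisites; the thresholds and the functional form are hypothetical.

```python
def division_active(r_d: float, r_a: float, r_s: float,
                    high: float = 0.6, low: float = 0.2) -> bool:
    """Hypothetical condition derived from F_d: schedule the "division" dialog
    only when its own teaching reward is still high (concept not yet mastered)
    and the rewards for the prerequisites "addition" and "subtraction" are
    low (i.e., those concepts have been mastered)."""
    return r_d >= high and r_a <= low and r_s <= low

print(division_active(r_d=0.9, r_a=0.1, r_s=0.15))  # True: prerequisites mastered
print(division_active(r_d=0.9, r_a=0.5, r_s=0.15))  # False: "addition" not yet mastered
```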
Path-dependent or path-based rewards associated with paths in a T-AOG may also be dynamically computed based on observations of the actual conversation, and may likewise be used to adjust how the T-AOG is traversed (how branches are selected) during the conversation. FIG. 8B illustrates an example of utility-driven path planning with respect to a T-AOG, according to an embodiment of the present teachings. As shown, while traversing the T-AOG, at each time, e.g., at time t, upon receiving an answer from the user, the robotic agent needs to determine how to respond. During times 1, ..., t, the dialog traverses a parse graph pg_{1...t}, and the traversed states are s_1, s_2, ..., s_t. In response, from state s_t there may be multiple branches leading to the next state s_{t+1}.
To determine which branch to take, a look-ahead operation may be performed based on the path-based rewards along alternative paths. For example, to look ahead one step, the rewards associated with the alternative branches available from s_t (one step away) may be considered, and the branch representing the best path-based reward may be selected. To look ahead two steps, the rewards associated with each first-level branch from s_t and each second-level branch (originating from each branch in the first set of alternative branches) are considered, and the branch that leads to the best path-based reward is selected as the next step. A deeper look-ahead can also be implemented based on the same principle. The example shown in FIG. 8B is a scheme that implements a two-step look-ahead, i.e., at time t, the scope of the look-ahead includes the branches at t+1 and, for each of them, the branches at t+2 originating from it. A branch is then selected via the look-ahead to optimize the path-based reward.
The path-based reward associated with a branch may first be initialized and then updated during the session. In some embodiments, an initial path-based reward may be calculated based on the user's previous conversations. In some embodiments, such initial path-based rewards may also be calculated based on previous conversations of similarly situated users. Each path-based reward may then be dynamically updated over time during the dialog, based on how well each branch selection leads to satisfying the intended purpose of the dialog. Based on such dynamically updated path-based rewards, the look-ahead optimization scheme is driven by the utility (or preference) of each user as to how to conduct the dialog; it thus enables adaptive path planning. The following is an exemplary formulation of path planning that optimizes path selection based on look-ahead operations. In this formulation, a* is the optimal branch choice among a number of branch choices a, given the current state s_t and the parse graph pg_{1...t}; EU is the expected utility of branch selection a; R(s_{t+1}, a) is the reward of selecting a and arriving at state s_{t+1}; and P(s_{t+1} | s_t, a) denotes the probability of reaching s_{t+1} when a is selected. As can be seen, the optimization is recursive, which allows look-ahead at any depth.
a* = arg max_a EU(a | s_t, pg_{1...t})

EU(a | s_t, pg_{1...t}) = Σ_{s_{t+1}} P(s_{t+1} | s_t, a) [ R(s_{t+1}, a) + max_{a'} EU(a' | s_{t+1}, pg_{1...t+1}) ]
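The recursion above can be implemented directly. The following Python sketch performs depth-limited look-ahead over branch choices; the transition, reward, and action functions in the toy example are hypothetical.

```python
def expected_utility(state, action, depth, P, R, actions):
    """EU(a | s_t) with depth-limited look-ahead. P(state, action) yields
    (next_state, probability) pairs; R(next_state, action) is the branch reward."""
    eu = 0.0
    for s_next, prob in P(state, action):
        future = 0.0
        if depth > 1:
            future = max(expected_utility(s_next, a2, depth - 1, P, R, actions)
                         for a2 in actions(s_next))
        eu += prob * (R(s_next, action) + future)
    return eu

def best_action(state, depth, P, R, actions):
    """a* = arg max_a EU(a | state), looking ahead the given number of steps."""
    return max(actions(state),
               key=lambda a: expected_utility(state, a, depth, P, R, actions))

# Toy usage with hypothetical deterministic transitions and rewards:
P = lambda s, a: [((s, a), 1.0)]
R = lambda s_next, a: {"encourage": 0.7, "correct": 0.4}[a]
actions = lambda s: ["encourage", "correct"]
print(best_action("s_t", 2, P, R, actions))  # -> "encourage"
```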
In conjunction with state-based, utility-driven node planning, the conversation system 100 according to the present teachings is able to dynamically control conversations with users based on past accumulated knowledge about the users and instantaneous observations of them, in line with the intended purpose of the underlying conversation. Figure 8C illustrates the use of utility-driven dialog management for a dialog with a student user based on a combination of node and path planning, according to embodiments of the present teachings. That is, in a dialog with a student user, the dialog agent conducts the dialog via utility-driven dialog management, based on dynamic node and path selection over the parameterized AOGs.
In FIG. 8C, S-AOG 310 includes various nodes for the various concepts to be taught, with annotated rewards and/or conditions. The reward associated with a node may be predetermined based on knowledge about the user. For example, as shown, four nodes (310-2, 310-3, 310-4, and 310-5) may have lower rewards (represented as darker nodes), indicating, for example, that the student user has mastered the concepts of addition, subtraction, multiplication, and division. There is one node with a high teaching reward (i.e., for which a dialog can be scheduled), which is node 310-1 for "fractions". Thus, selecting one of the S-AOG nodes to conduct a conversation is reward-driven or utility-driven node planning.
Node 310-1 is shown associated with one or more T-AOGs 320, each corresponding to a conversation policy governing a conversation to teach students the concept of "fractions". One of the T-AOGs (i.e., 320-1) may be selected to govern the conversation session; T-AOG 320-1 includes various steps, such as 330, 340, 350, 360, 370, 380, etc. T-AOG 320-1 may be parameterized with, for example, path-based rewards. During the session, path planning can be performed dynamically using the path-based rewards to optimize the likelihood of achieving the goal of teaching students to master the concept of "fractions". As shown, the nodes highlighted in 320-1 correspond to the path selected based on the path planning, forming a parse graph that represents the dynamic traversal in the actual dialog. This illustrates that knowledge tracking during a conversation enables the conversation system 100 to continually update the parameters in the parameterized AOGs to reflect the utilities/preferences learned from the conversation, and such learned utilities/preferences in turn enable the conversation system 100 to adjust its path planning, thereby making the conversation more effective, engaging, and flexible.
The AOGs used to represent the mental states of an agent in the information state 110 need to be created, together with their content, before being used for human-machine conversations. The process of creating the structure of an AOG and the content associated with its nodes/branches is referred to as content authoring. Traditionally, content in AOGs is authored by humans, which can be time-consuming and tedious. Different AOGs may be created in a certain order; for example, the T-AOGs for each node in an S-AOG may be created after that node in the S-AOG has been created. This is illustrated in FIG. 9A, which depicts an exemplary mode of creating AOGs with authored content according to an embodiment of the present teachings. As seen, the creation of AOGs includes creating an S-AOG and then creating the T-AOGs associated with the nodes in the S-AOG. The present teachings disclose ways to create AOGs via automated or semi-automated means.
Different creators can author different content when creating AOGs. For example, a teacher may be asked to create an AOG related to teaching. Different teachers may find different ways to teach a student a certain subject, such as addition and subtraction. Some may find it useful to teach addition before subtraction, while some may feel the opposite. They may create different S-AOGs based on their personal experience. Further, with respect to a topic (e.g., "addition", corresponding to a node in the S-AOG), different creators can create different sequences of steps in a conversation or T-AOG for interacting with students on that topic, thereby creating flexible ways to convey the same topic. This is illustrated in fig. 6B, where there are different ways to conduct a greeting. According to the present teachings, different AOGs on the same topic can sometimes be integrated via graph matching (see figs. 6B and 6E). This enables the creation of a T-AOG with parameterized content while preserving the structure of the flow, yielding a more compact parameterized T-AOG. While this may simplify the representation of the T-AOG without losing content, it does not by itself make the content authoring process more efficient. The present teachings disclose different ways of authoring content in a more efficient manner, including automated and semi-automated content authoring processes.
FIG. 9B depicts an exemplary high-level system diagram of a content authoring system 900 for automatically creating AOGs via machine learning, in accordance with embodiments of the present teachings. Via this AOG learning system 900, the structure of an S-AOG and associated T-AOG can be automatically created and content associated with the AOG automatically authored, both via learning. In some embodiments, such automatically learned S-AOG and T-AOG can be further refined by humans, enabling semi-automated means of creating AOG. In the illustrated embodiment, the content authoring system 900 includes a data-driven AOG learning engine 910 that is configured to learn not only the structure of a conversation (which corresponds to an S-AOG) but also content (T-AOG) associated with different portions of the conversation structure. The learning is based on data of past conversations retrieved from the past conversation database 915. The content stored in the past conversation database 915 may be organized based on different criteria, such as subject matter, user demographics, feature profiles, and the like. Content from past conversations may be indexed with respect to different categories so that appropriate content may be used for learning. By accessing relevant indexed learning content, the data-driven AOG learning engine 910 can learn (both structurally and conversational content) to create AOGs that are tailored to a particular type of user.
To learn AOGs related to a particular item (e.g., tutoring), data-driven AOG learning engine 910 may access relevant past conversation data about tutoring (e.g., via item-based index 920) and learn the structure or S-AOG of tutoring with respect to the conversation flow of sub-concepts and the speech content for each sub-concept involved (i.e., the T-AOG for each sub-concept). A tutoring session on any topic may have several common sub-dialogs, each of which may be directed to a sub-concept. An example is shown in fig. 6A in the form of an S-AOG, where a tutoring session typically comprises different sub-dialogs (nodes in the S-AOG) for e.g. greeting (600), chat (605) or review (610), teaching prospective concepts (615), testing (620) and evaluation (625). This general structure of tutoring-related dialog may be universally applicable to any tutoring session, regardless of the particular concept or topic to be tutored during the dialog. This general structure forms an S-AOG for tutoring and can be learned from past dialog data.
To learn the structure of the S-AOG associated with an item, the data-driven AOG learning engine 910 can invoke the topic-based classification model 930 to identify different sub-dialogs and classify them into different topics to derive the underlying structure. For example, past conversations related to tutoring may be processed, and different sub-conversation flows for different topics may be identified; such sub-conversations may either always be present (AND relationship) or proceed as alternatives (OR relationship). As shown in fig. 6A, chat (605) and review (610) are connected in an OR relationship, while tutoring (615) and testing (620) are two necessary sub-dialogs in tutoring and are related by an AND relationship. After evaluation (625), the next step can be any of four possibilities: returning to review (610), to testing (620), to teaching (615), or to chat (605). These four possibilities are connected via an OR relationship.
To learn a T-AOG, the data-driven AOG learning engine 910 may classify different portions of each sub-dialog into corresponding types based on the nature of the dialog. For example, as shown in FIG. 6E, the sub-dialog associated with a greeting includes different portions, some related to the initial greeting (650), some related to the answer to the initial greeting (655-1), and some related to the response to that answer (660-1). Such different portions may be recognized via different means, e.g., speaker-based methods whereby utterances from the different parties can be distinguished. Based on the content of the speech from the different parties and the topic-based classification model 930, each portion may be assigned a label that represents the nature of the utterance. Different conversations may have portions with the same labels but different content. For example, in answering an initial greeting (e.g., "good morning"), the answer in one dialog may be "thank you, how about you?". A party in a different conversation may respond to the same greeting in a different way, answering "good morning to you too!". Both answers may be categorized or labeled as answers to the initial greeting, but with different content. Similarly, responses to the answers (660-1) may also be identified from the different answering parties and categorized as responses to answers. Such responses to answers to greetings may also include different utterances (e.g., "thank you" or "I am good" in fig. 6E). The sequence of steps in the conversation may then be generalized into a T-AOG, where each node represents an utterance spoken by a participant of the conversation. As seen in the example shown in FIG. 6E, each label in the T-AOG may be associated with alternative content that may be spoken to instantiate that label.
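By way of illustration only, the following Python sketch shows how alternative utterances for each labeled step of a greeting T-AOG might be collected from transcripts; the position-based labeler is merely a stand-in for the topic-based classification model 930, and all data are hypothetical.

```python
from collections import defaultdict

def label_turn(position: int) -> str:
    """Stand-in for the topic-based classification model 930: here the label
    is derived from the turn position alone (a hypothetical simplification)."""
    return ["initial greeting", "answer to greeting", "response to answer"][position]

def learn_greeting_taog(transcripts):
    """Collect alternative utterances per labeled step across many dialogs."""
    taog = defaultdict(set)
    for turns in transcripts:
        for i, (speaker, text) in enumerate(turns[:3]):
            taog[label_turn(i)].add(text)
    return dict(taog)

transcripts = [  # hypothetical past-dialog data
    [("agent", "Good morning!"), ("user", "Good morning, you too!"), ("agent", "Thank you!")],
    [("agent", "Good evening!"), ("user", "Thank you, how about you?"), ("agent", "I am good.")],
]
print(learn_greeting_taog(transcripts))
```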
FIG. 9C illustrates exemplary different types of item-based AOGs derived from machine learning, in accordance with embodiments of the present teachings. Such AOGs (including S-AOGs and T-AOGs) can be derived by learning from past dialog data in the manner disclosed herein. For example, based on past dialog data, AOGs for tutoring mathematics, language, ..., chemistry can be obtained, in terms of both the relationships between the different sub-dialogs (the S-AOG) under each AOG and the sequence of utterances (the T-AOG) for each sub-dialog, with each step in the sequence associated with alternative utterances.
FIG. 9D is a flow diagram of an exemplary process of the content authoring system for creating AOGs via machine learning, according to an embodiment of the present teachings. In this illustrated embodiment, to create an AOG for a particular item (e.g., a tutoring session for teaching mathematics), past conversations related to the item (e.g., all past conversation data corresponding to tutoring sessions for teaching mathematical concepts) are first accessed at 950 and used, at 955, to identify the sub-dialogs (structures) in each past conversation based on the topic-based classification model 930, enabling the structure or S-AOG for the item to be obtained at 960 and each node in the S-AOG to be labeled, at 965, according to the nature of its sub-dialog. For example, if a sub-dialog is related to teaching a student the concept of fractions, the corresponding node in the S-AOG structure may be labeled as teaching.
For each node in the S-AOG that is linked to a sub-dialog, the past dialog data corresponding to the sub-dialog may then be analyzed at 970 to derive a T-AOG for that S-AOG node. In some embodiments, the learning system 900 may simply employ some sub-dialog content to form a T-AOG. In some embodiments, past dialog data from similar sub-dialogs that relate to similar but different dialog content may be used to create different T-AOGs for the S-AOG node. In some embodiments, different T-AOGs learned from past dialog data may also be integrated (e.g., integrated via graph matching) to create one or more merged T-AOGs. In some cases, based on different T-AOGs to be merged to generate an integrated T-AOG, dialog content from the different T-AOGs may be used to generate parameterized content for the integrated T-AOG. To create the structure of the T-AOG, learning system 900 may identify, at 975, different portions of the sub-dialog (e.g., "initial greeting," "answer to initial greeting," and "response to answer" in fig. 6E) based on topic model 930. Each such portion may be provided with parameterized dialog content generated based on parameterized dialog content of similar portions of different T-AOGs of the same S-AOG node. The concept of parameterized AOG is discussed herein with reference to FIGS. 6E-6G.
In addition to automatically creating AOGs (including S-AOGs and/or T-AOGs), AOGs may also be created semi-automatically, whereby automatically generated AOGs are inspected, validated, refined, or modified by a human via, for example, a graphical user interface. This may include adjusting the S-AOG and/or changing the dialog content in the T-AOGs obtained via machine learning. FIG. 10A illustrates an exemplary visual programming interface 1000 configured for authoring T-AOG content in accordance with embodiments of the present teachings. At the bottom of this example interface 1000, it is indicated that the example is a content authoring interface for a dialog related to "greeting" the user in a tutoring dialog session. There are questions (Q) and answers (A) with authored text content (Q: 1010-1, 1010-3, ...; A: 1010-2, 1010-4, ...). Associated with each text content, there is an illustrated "edit" button to enable editing of the authored text content in the corresponding box.
In some embodiments, the authored text content 1010-1, ..., 1010-4, ... may be initially created via learning and displayed in the interface 1000 for potential revision. If the machine-authored text content is acceptable, a person may click the "save" button 1025 to store the automatically authored text content associated with the T-AOG. The person may also modify the authored text content in the linked boxes via the "edit" options (1020-1, 1020-2, 1020-3, 1020-4, ...). After modification, "save" may be clicked to save the modified dialog content associated with the T-AOG. Different people may save different versions of the modified dialog content of a T-AOG. For example, one person may find the text content authored via machine learning acceptable and save such automatically generated content for use by the robotic agent to teach students in an "addition" tutoring session. Another person may prefer that the robotic agent teach the same "addition" concept in a different way, and may therefore revise what the machine learned from past session data, customizing the T-AOG differently and saving it accordingly to drive his/her robotic agent.
In some embodiments, while S-AOGs may be automatically generated via learning, the generation of the T-AOGs for the associated S-AOG nodes may be done manually, i.e., the interface 1000 may initially display no automatically populated authored text content, and a person may need to enter the text content in each box. Even when T-AOG content is created manually, this semi-automatic AOG creation process is more efficient than a fully manual process, since the S-AOG is learned via machine learning. As discussed herein, in some embodiments, the dialog content of a T-AOG may be parameterized. FIG. 10B illustrates an exemplary visual programming interface 1030 configured for authoring a parameterized T-AOG, in accordance with embodiments of the present teachings. This exemplary authoring tool interface 1030 may be provided according to the T-AOG in FIG. 6F. In some embodiments, the conversation flow/structure in fig. 6F, including the parameterized content, can be learned from past conversation data via the AOG learning system 900. The interface 1030 shown in FIG. 10B may then be provided to one or more people to add choices for the parameterized content.
As seen in FIG. 10B, interface 1030 may present different portions of a T-AOG for testing the concept of "addition." Some portions may correspond to instructions, such as 1040-1 and 1040-4, that merely indicate certain actions to be performed (e.g., display something (1040-1) and/or say something (1040-4)). Some portions are editable, such as the underlined portions for entering values of variables X and Y (1040-2), the content items in brackets [ ] for specifying desired objects (1040-3, 1040-4), and the comments (1040-6, 1040-8, and 1040-9). Some portions provide conditions, expressed with braces { }, such as 1040-5 and 1040-7. For example, if a student is presented with the question X + Y in a tutoring session and answers it at 1040-4, the condition for issuing a positive comment is that the answer from the student equals X + Y, i.e., the condition {[input] = X + Y} is satisfied. When [input] is not equal to X + Y, the dialog is prescribed to evaluate whether the incorrect answer is due to not knowing the answer (e.g., a pure guess) or is simply incorrect (known but misspoken). Such an evaluation may be performed on the fly during the dialog based on probabilities estimated, for example, from various observations of the student. If the answer is evaluated as simply incorrect, an alternative comment for [incorrect comment] 1040-8 may be authored via this authoring tool. If it is evaluated as not knowing the answer, a corresponding comment (1040-9) may be authored and used to comment on the user's answer.
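The three-way condition logic just described may be sketched as follows; the function, its threshold, and the comment labels are assumptions for illustration only, not the patented implementation.

```python
def evaluate_answer(x: int, y: int, user_input: int, p_guess: float) -> str:
    """Route the dialog according to the {[input] = X + Y} condition.
    p_guess stands in for a probability, estimated from observations
    of the student, that a wrong answer was a pure guess."""
    if user_input == x + y:
        return "[positive comment]"       # condition {[input] = X + Y} holds
    # wrong answer: distinguish "does not know" from "knew but misspoke"
    if p_guess > 0.5:
        return "[does-not-know comment]"  # e.g., re-teach the concept
    return "[incorrect comment]"          # e.g., ask the student to try again

print(evaluate_answer(3, 4, 7, p_guess=0.2))  # -> [positive comment]
print(evaluate_answer(3, 4, 8, p_guess=0.8))  # -> [does-not-know comment]
```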
An editable portion may be edited either by selecting from a list of selectable items via a drop-down menu or in an "edit" mode that allows a person to enter textual content or modify an existing text string. For example, the drop-down menu associated with [obj] (1040-3) may be activated by right-clicking 1040-3. When a list of existing selectable items is presented, a person may select from the list. For example, for [obj1] and [obj2], their drop-down menus may be associated with a list of objects such as apples, oranges, and the like. Another editing mode is entering a text string; for example, the values of variables X and Y may be entered by a person in edit mode.
Some editable portions may be edited based both on existing selectable options (e.g., learned from past dialog data) and on newly entered authored text content. One example is provided in FIG. 10B for the content item [positive comment] 1040-6. When the edit button 1045-3 associated therewith is clicked, an additional window 1046 may pop up. The pop-up window 1046 includes pre-existing options for positive comments (each option may be associated with a selection button on the left) and an option to add more dialog content (by clicking the "more" button 1049-1). In this example, there are two pre-existing options ("Well done!" and "That is correct!"), one of which is selected ("Well done!"), and two new items ("Terrific!" and "Excellent!") are entered. The "accept" button 1049-2 may be clicked to save the selected and added options as alternative dialog content associated with [positive comment], i.e., any saved content item may be used in the dialog as a positive-comment response to the student's answer to X + Y.
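The saved alternatives for a content item such as [positive comment] can be pictured as a simple keyed store from which the agent draws one variant at run time; the store and the selection rule below are hypothetical.

```python
import random

# hypothetical store: parameterized content slot -> authored alternatives
authored_alternatives = {
    "[positive comment]": ["Well done!", "That is correct!", "Terrific!", "Excellent!"],
    "[incorrect comment]": ["Not quite; try again."],
}

def realize(slot: str) -> str:
    """Pick one authored alternative for a parameterized content slot."""
    return random.choice(authored_alternatives[slot])

print(realize("[positive comment]"))
```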
If AOG structures are learned from past conversation data via machine learning, such learned AOGs can be used directly by the conversation system to converse with human users, or they can be further modified or enriched via the semi-automatic authoring tool 1030. The results from such authoring tools may also be used to generate program code that executes the underlying dialog specified by the AOG. That is, based on the AOGs (S-AOG and T-AOG) so generated, code can be automatically generated from the dialog content embedded in the nodes of the AOG, and the code will follow the dialog flow specified by the AOG. Accordingly, the authoring tool or interface 1030 may be considered a visual programming tool and, together with the AOG learning system 900, may significantly accelerate the process of designing and implementing a machine agent for a conversation.
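One plausible way to realize "generating code from an AOG" is an interpreter that walks the graph and delivers each node's dialog content, which is functionally equivalent to emitting code for the same traversal. The dict encoding and the dispatch rule below are hypothetical simplifications, not the disclosed format.

```python
def run_dialog(node: dict, listen) -> None:
    """Depth-first execution of a T-AOG encoded as nested dicts
    (a hypothetical encoding). AND nodes run their children in
    sequence; OR nodes branch on the user's reply."""
    print(node["say"])                       # deliver this node's content
    children = node.get("children", [])
    if not children:
        return
    if node.get("kind") == "AND":
        for child in children:
            run_dialog(child, listen)
        return
    reply = listen().lower()                 # OR node: branch on the answer
    for child in children:
        if child.get("on") and child["on"] in reply:
            return run_dialog(child, listen)
    run_dialog(children[0], listen)          # default branch

greeting = {
    "say": "How are you doing?", "kind": "OR",
    "children": [
        {"on": "fine", "say": "Glad to hear it!"},
        {"on": "bad", "say": "Sorry to hear that."},
    ],
}
run_dialog(greeting, listen=lambda: input("> "))
```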
FIG. 10C is a flowchart of an exemplary process for creating an AOG with authored dialog content, according to an embodiment of the present teachings. Based on past dialog data, AOG learning system 900 learns AOGs at 1050. In the automatic mode, determined at 1055, such machine-learned AOGs can be used directly to generate code at 1085 for the machine agent to conduct the underlying dialog, without further content authoring or editing in the semi-automatic mode. If a machine-learned AOG is to be further refined/modified/edited, this may be done semi-automatically with an authoring tool such as 1000 or 1030, shown in FIGS. 10A and 10B. To author or modify dialog content associated with an AOG, each machine-learned AOG can be displayed at 1065 for editing or for authoring content associated with it. As discussed herein, in some embodiments, learned AOGs may be embedded with dialog content obtained via learning, and such learned content may serve as a basis for further refinement. In other embodiments, the dialog content may be authored anew, as shown in FIG. 10A. During content authoring in the semi-automatic mode, as modified or new dialog content is received at 1070, it is stored at 1075 together with the relevant AOG. If it is determined at 1080 that there are more AOGs to edit, the process returns to 1065 to display the next AOG. Once all AOGs have been processed, modified AOGs based on the editing results are generated at 1085. Based on the modified/enhanced AOGs, code is generated at 1090 for executing the dialogs specified by the AOGs based on the authored dialog content.
FIG. 11A illustrates exemplary code 1110 generated via visual programming, and the result of executing that code to render a scene related to an S-AOG, according to an embodiment of the present teachings. Code 1110 is generated, in conjunction with an S-AOG related to mathematical tutoring, to render a scene with a set of objects used to teach child students the concept of number addition. Code 1110 may be generated via visual programming based on dialog content associated with the AOG and authored via the semi-automatic content authoring tool. As can be seen, code 1110 is programmed to render a set of different object types (i.e., 1120-1 for pumpkins and 1130-1 for strawberries) and the number to be rendered for each type (1 pumpkin and 2 strawberries). Execution of such code may then generate 1120-3 and 1130-3, which present different numbers of the different object types according to the authored dialog content. The presentation is created so that the conversation can proceed toward the intended goal of teaching students the concepts of numbers and/or addition. Based on such a presentation, the machine agent may then ask the student user how many pumpkins there are, how many strawberries there are, and what the total number of fruits in the picture is.
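The rendered scene and its follow-up questions can be driven by a small declarative specification; the sketch below mirrors the pumpkin/strawberry example with hypothetical names, and is not the actual code 1110.

```python
# hypothetical scene specification derived from the authored dialog content
scene = {"pumpkin": 1, "strawberry": 2}

def scene_questions(spec: dict) -> list:
    """Generate the tutoring questions implied by the rendered scene."""
    questions = ["How many {}s are in the picture?".format(kind) for kind in spec]
    questions.append("What is the total number of fruits in the picture?")
    return questions

for q in scene_questions(scene):
    print(q)
print("expected total:", sum(scene.values()))  # 1 + 2 = 3
```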
FIG. 11B illustrates exemplary code generated via visual programming based on the conversational content in a T-AOG obtained via semi-automatic content authoring, according to embodiments of the present teachings. Code 1150 as shown implements the conversation represented by T-AOG 1160, which has AND and OR branches, by traversing the T-AOG and taking a particular path (or T-PG) based on the actual progress of the conversation. Such code is automatically generated based on the authored T-AOG and the dialog content associated with each node in the T-AOG. For example, in a particular conversation, the machine agent may ask the user at 1130 whether he/she has decayed teeth. If the answer to this question is "yes" or "not known," the machine agent continues the conversation by following node 1140. Otherwise, the conversation continues at 1150 with diabetes-related questions. If the user's answer to the diabetes question is "yes" or "not known," the path to 1170 is taken; otherwise the path to 1160 is taken. In this way, code for a T-AOG can be developed more efficiently via automated AOG learning, either alone or combined with semi-automated content authoring and visual programming.
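Automatically generated code for such a T-AOG might reduce to straight-line branching logic; the following is a hypothetical rendering of the decayed-teeth/diabetes path, not the actual generated code shown in FIG. 11B.

```python
def screening_dialog(ask) -> str:
    """Hypothetical code generated from a T-AOG with OR branches:
    each question node becomes a conditional on the user's answer."""
    if ask("Do you have decayed teeth?") in ("yes", "not known"):
        return "dental branch"      # corresponds to node 1140
    if ask("Do you have diabetes?") in ("yes", "not known"):
        return "diabetes branch"    # corresponds to node 1170
    return "default branch"         # corresponds to node 1160

answers = iter(["no", "yes"])       # scripted replies for demonstration
print(screening_dialog(lambda q: next(answers)))  # -> diabetes branch
```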
As discussed herein, content authoring may be accomplished via authoring tools that allow a person to modify existing dialog content learned from past dialog data or to enter new dialog content to enrich an existing AOG. In some embodiments, dialog content may also be authored based on what a user says and does, providing not only the voice data but also the manner in which the speech is to be delivered. For example, instead of modifying or entering new text in the interface, a person may simply speak the dialog content to create authored content, possibly with certain facial expressions, tones, and body actions that convey how the authored content should be delivered. That is, both the voice content and metadata about the voice can be authored from human behavior. Adhering to such metadata may enable delivery of conversation content with a desired emotion (e.g., angry, happy, or excited).
FIG. 12A depicts an exemplary high-level configuration of a system 1200 for content authoring based on multimodal input from a user, in accordance with an embodiment of the present teachings. In the illustrated embodiment, person 1210 participates in authoring content related to an AOG via different means. As discussed above, in some embodiments, person 1210 can create textual conversation content via his/her computer/device by typing the authored conversation content into the visual programming interface 1205. In addition, the present teachings also allow person 1210 to author content via other means, with instructions describing the manner in which the authored content is to be delivered to the user during a user-machine conversation. For example, a different means of authoring content is for person 1210 to author dialog content by speaking (rather than typing) the content. In some embodiments, when speaking the authored content, person 1210 may also perform certain activities that can be transformed into instructions regarding the manner in which the conversational content is delivered. For example, person 1210 may speak the content in a particular tone with a particular volume, speed, and pitch, express a particular emotion with a particular facial expression, or make some physical action, all of which may be transformed into instructions so that the authored content may be delivered in the manner demonstrated by person 1210.
In authoring content using speech, utterances are captured by the audio sensor 1220 and then analyzed by an Automatic Speech Recognizer (ASR) 1240 to convert the speech signals into text as authored content. At the same time, the acoustic signals may also be analyzed to extract acoustic features that may serve as acoustics-based instructions for rendering the authored content. Such information (characteristics of the utterance rather than its content) may be analyzed by an audio/visual (A/V)-based instruction generator 1260 and used to generate, for example, acoustics-related rendering instructions. For example, person 1210 may speak the content at a high pitch and a fast rate to express anger, or at a high volume. If person 1210 speaks the content with a facial expression, such visual signals may be captured by camera 1230 and then processed by the A/V-based instruction generator 1260 to convert the visual signals into expression instructions, so that the authored content may be delivered by the robotic agent with a particular facial expression. For example, person 1210 may speak the content with a surprised facial expression.
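A crude sketch of deriving acoustics-related rendering instructions from an utterance is given below; it uses only NumPy, and the RMS/rate features and thresholds are illustrative assumptions, not the disclosed method.

```python
import numpy as np

def acoustic_instructions(samples: np.ndarray, n_words: int,
                          duration_s: float) -> dict:
    """Map coarse acoustic features of an utterance to rendering
    instructions; the thresholds are placeholders."""
    volume = float(np.sqrt(np.mean(samples ** 2)))  # RMS energy
    rate = n_words / duration_s                     # speaking rate (words/s)
    return {
        "volume": "loud" if volume > 0.1 else "normal",
        "speed": "fast" if rate > 3.0 else "normal",
    }

sr = 16000
utterance = 0.3 * np.random.randn(sr * 2)  # 2 s of noise as a stand-in signal
print(acoustic_instructions(utterance, n_words=8, duration_s=2.0))
```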
In some embodiments, the acoustic features and facial features may be combined to derive rendering instructions. Person 1210 can speak the content in an excited tone of voice while wearing a large smile. Both the acoustic and visual information may be captured simultaneously by the audio/visual sensors 1220 and 1230 and used to derive rendering instructions related to acoustics and emotions. In some applications, the robotic device may have a display on its head as its face; the specified emotions may then be rendered based on the emotion instructions while the robot speaks a piece of dialog content according to the acoustics-related rendering instructions.
In some cases, the robotic agent may have body parts that can be controlled based on instructions to perform certain gestures, such as waving a hand, tilting the head, making a fist, leaning the upper body, and the like. The present teachings disclosed herein also facilitate generating body-movement-related instructions based on the actions of person 1210. Such instructions may be used to control the robotic agent to perform certain body actions, as part of the expression to be presented, while rendering some authored dialog content. As part of content authoring, instructions for such body movement(s) may be generated automatically. As shown in FIG. 12A, camera 1230 may capture the motion of person 1210, and movement instruction generator 1250 may analyze the body movements of person 1210 and generate instructions as metadata associated with the authored content.
Instructions generated based on human motion that are intended to instruct a robot to exhibit certain acoustic/visual/physical characteristics while speaking may be referred to as A/V/P instructions. FIG. 12B illustrates exemplary types of metadata that may be generated and stored with authored dialog content, according to embodiments of the present teachings. Metadata or instructions associated with a piece of authored dialog content may be directed to acoustic features, facial features, and body features. The acoustic features include the speed, pitch, tone, and/or volume used to convert a piece of textual dialog content into its spoken form. Facial features may include happy expressions, sympathetic expressions, expressions of interest, and the like. The physical features may include raising an arm, making a fist, tilting the head, or leaning the body, each of which may be further specified based on the physical actions of person 1210, such as left, right, forward, and backward.
The A/V/P instructions so generated based on the actions of person 1210 may be stored with their corresponding dialog content segments: e.g., text content 1 may be associated with A/V/P instructions 1270-1; text content 2 with A/V/P instructions 1270-2; text content 3 with A/V/P instructions 1270-3; text content 4 with A/V/P instructions 1270-4; and so on. In this manner, whenever the robotic agent is to speak a particular piece of dialog content, it may access the associated A/V/P instructions and then control the robot to speak the content with the specified acoustic characteristics (e.g., tone, pitch, speed, volume), facial expressions (e.g., smile, frown, or sadness), and body characteristics (e.g., upper body leaning forward, finger pointing to the sky, or jumping). In this way, conversational content may be authored in a rich manner and in an efficient automatic or semi-automatic fashion.
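The pairing of content segments with A/V/P instructions can be pictured as a keyed store of metadata bundles; the field names and values below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AVPInstruction:
    """Hypothetical bundle of delivery metadata for one content segment."""
    pitch: str = "normal"        # acoustic
    speed: str = "normal"
    volume: str = "normal"
    expression: str = "neutral"  # facial
    gesture: str = "none"        # body

segments = {
    "Great job on that problem!": AVPInstruction(pitch="high",
                                                 expression="smile",
                                                 gesture="raise arm"),
    "Let's look at this again.": AVPInstruction(speed="slow",
                                                expression="sympathetic"),
}

for text, meta in segments.items():
    print(text, "->", meta)
```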
FIG. 12C is a flow diagram of an exemplary process of the content authoring system 1200 for authoring content based on multimodal input from a user, in accordance with an embodiment of the present teachings. In content authoring, the system 1200 first receives multimodal input from different sensors at 1205. Such multimodal sensors may include audio, visual, and other types of sensors. Based on the received audio signal, ASR is performed at 1215 to generate dialog content authored via speech. Furthermore, various types of acoustic features (such as pitch, volume, tone, etc.) may also be estimated from the received audio signals at 1225 and used at 1235 to generate acoustics-related rendering instructions associated with the authored dialog content segment. At the same time, the received visual signals are analyzed at 1245 to extract different visual features. The visual features thus extracted may then be used at 1255 to estimate facial expressions (if any) and to generate the related expression-rendering instructions for the authored dialog content segment. Similarly, the extracted visual features may also be used at 1265 to estimate any physical action performed by person 1210, in order to generate the corresponding physical-action-related rendering instructions at 1275. Then, at 1285, the automatically generated rendering instructions are associated with the speech-authored dialog content for storage. If it is determined at 1290 that there is more dialog content, the process proceeds to the next segment of dialog content, until all dialog content segments associated with the underlying T-AOG have been authored.
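The per-segment authoring loop of FIG. 12C may be summarized in one pass as below; the stub functions stand in for the ASR, the feature extractors, and the instruction generators, and every name is a hypothetical placeholder.

```python
def author_segment(audio, video):
    """One pass of the authoring loop: transcribe the speech, then
    attach acoustic, facial, and body rendering instructions."""
    text = transcribe(audio)                      # ASR -> authored content
    instructions = {}
    instructions.update(acoustic_rules(audio))    # pitch/volume/tone/speed
    instructions.update(expression_rules(video))  # facial expression
    instructions.update(gesture_rules(video))     # body action
    return text, instructions

# minimal stand-ins so the sketch runs end to end
def transcribe(audio): return "Two plus three is five."
def acoustic_rules(audio): return {"pitch": "high"}
def expression_rules(video): return {"expression": "smile"}
def gesture_rules(video): return {"gesture": "none"}

print(author_segment(audio=None, video=None))
```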
FIG. 13 is an illustrative diagram of an exemplary mobile device architecture that can be used to implement a dedicated system embodying the present teachings, in accordance with various embodiments. In this example, the user device on which the present teachings are implemented corresponds to a mobile device 1300, including but not limited to a smartphone, a tablet, a music player, a handheld game console, a Global Positioning System (GPS) receiver, a wearable computing device (e.g., glasses, a wristwatch, etc.), or a device in any other form. Mobile device 1300 may include one or more central processing units ("CPUs") 1340, one or more graphics processing units ("GPUs") 1330, a display 1320, a memory 1360, a communication platform 1310 (such as a wireless communication module), storage 1390, and one or more input/output (I/O) devices 1340. Any other suitable components, including but not limited to a system bus or a controller (not shown), may also be included in mobile device 1300. As shown in FIG. 13, a mobile operating system 1370 (e.g., iOS, Android, Windows Phone, etc.) and one or more applications 1380 may be loaded from storage 1390 into memory 1360 for execution by CPU 1340. Applications 1380 may include a browser or any other suitable mobile application for managing a conversation system on mobile device 1300. User interactions may be received via the I/O devices 1340 and provided to the automated conversation partner via a network.
To implement the various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform(s) for one or more of the elements described herein. The hardware elements, operating systems, and programming languages of such computers are conventional in nature, and it is presumed that those skilled in the art are adequately familiar with them to adapt those technologies to the settings described herein. A computer with user interface elements may be used to implement a personal computer (PC) or another type of workstation or terminal device, although a computer may also act as a server if suitably programmed. It is believed that those skilled in the art are familiar with the structure, programming, and general operation of such computer equipment, and as a result the drawings should be self-explanatory.
FIG. 14 is an illustrative diagram of an exemplary computing device architecture that can be used to implement a dedicated system embodying the present teachings, in accordance with various embodiments. Such a dedicated system incorporating the present teachings is illustrated as a functional block diagram of a hardware platform that includes user interface elements. The computer may be a general-purpose computer or a special-purpose computer; both may be used to implement the dedicated system of the present teachings. Computer 1400 may be used to implement any component of the conversation or dialog management system described herein. For example, the conversation management system may be implemented on a computer such as computer 1400 via its hardware, software programs, firmware, or a combination thereof. Although only one such computer is shown for convenience, the computer functions relating to the conversation management described herein may be implemented in a distributed fashion on a number of similar platforms to distribute the processing load.
For example, computer 1400 includes COM ports 1450 connected to a network to facilitate data communications. Computer 1400 also includes a central processing unit (CPU) 1420, in the form of one or more processors, for executing program instructions. The exemplary computer platform includes an internal communication bus 1410 and program storage and data storage of different forms (e.g., disk 1470, read-only memory (ROM) 1430, or random-access memory (RAM) 1440) for the various data files to be processed and/or communicated by computer 1400, as well as program instructions to be executed by CPU 1420. Computer 1400 also includes I/O components 1460 supporting input/output flows between the computer and other components therein, such as user interface elements 1480. Computer 1400 may also receive programming and data via network communications.
Thus, as outlined above, aspects of the methods of dialog management and/or other processes may be embodied in programming. Program aspects of the technology may be thought of as "products" or "articles of manufacture," typically in the form of executable code and/or associated data that is carried on or embodied in a machine-readable medium. Tangible, non-transitory "storage"-type media include any or all of the memory or other storage for computers, processors, or the like, or associated modules thereof, such as various semiconductor memories, tape drives, and disk drives, which may provide storage at any time for the software programming.
All or portions of the software may at times be communicated through a network such as the Internet or various other telecommunication networks. Such communications may, for example, enable loading of the software from one computer or processor into another, e.g., in connection with conversation management. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as those used across physical interfaces between local devices, through wired and optical landline networks, and over various air links. The physical elements that carry such waves, such as wired or wireless links and optical links, may also be considered media bearing the software. As used herein, unless restricted to tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine-readable medium may take many forms, including but not limited to a tangible storage medium, a carrier-wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, which may be used to implement the system or any of its components shown in the figures. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that form a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards or paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a physical processor for execution.
Those skilled in the art will recognize that the present teachings are amenable to a variety of modifications and/or enhancements. For example, although the implementation of the various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing server. In addition, the dialog management techniques as disclosed herein may be implemented as firmware, a firmware/software combination, a firmware/hardware combination, or a hardware/firmware/software combination.
While the foregoing has described what are considered to constitute the present teachings and/or other examples, it is understood that various modifications may be made thereto, that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. The following claims are intended to claim any and all applications, modifications, and variations that fall within the true scope of the present teachings.

Claims (21)

1. A method for adaptive dialog management, the method implemented on at least one machine comprising at least one processor, memory, and a communication platform connectable to a network, the method comprising:
receiving a language understanding result and an evaluation of the language understanding result, wherein the language understanding result is derived based on an utterance from a user participating in a conversation directed to a topic, the conversation is governed by a conversation policy, and the evaluation is obtained for an expected result represented in the conversation policy;
determining a plurality of probabilities based on the language understanding results and associated evaluations;
updating a first set of parameters associated with the conversation policy based on the plurality of probabilities, wherein the first set of parameters parameterizes the conversation policy with respect to the user and characterizes the effectiveness of the conversation with the user under the conversation policy.
2. The method of claim 1, wherein:
the conversation policy represents an alternative way of conducting the conversation with the user on the topic; and
the expected result represents an answer from a user in response to a statement presented to the user in accordance with the conversation policy.
3. The method of claim 1, wherein the plurality of probabilities comprises:
a know-positive probability indicating a likelihood that the user knows the expected result, regardless of whether the language understanding result is the same as the expected result;
a know-negative probability indicating a likelihood that the user does not know the expected result, regardless of whether the language understanding result is the same as the expected result; and
a guess probability indicating a likelihood that the user guessed the language understanding result.
4. The method of claim 1, further comprising: updating a second set of parameters associated with the representation of the topic based on the plurality of probabilities, wherein the second set of parameters represents a dynamic assessment of a user's mastery of the topic.
5. The method of claim 4, wherein the first set of parameters and the second set of parameters characterize utility of the user with respect to the topic.
6. The method of claim 5, wherein the utility comprises:
a status reward indicating a level of reward for conversing with the user on the topic and associated with a representation of the topic; and
one or more path rewards, each path reward being associated with one of the alternative conversation paths embedded in the conversation policy and representing the effectiveness of a conversation with the user along that conversation path.
7. The method of claim 5, further comprising: determining a response to the user based on the utility of the user, the utility being dynamically updated based on knowledge tracked about the conversation policy for the topic.
8. A machine-readable and non-transitory medium having information recorded thereon for adaptive dialog management, wherein the information, when read by a machine, causes the machine to perform operations comprising:
receiving a language understanding result and an evaluation of the language understanding result, wherein the language understanding result is derived based on an utterance from a user participating in a conversation directed to a topic, the conversation is governed by a conversation policy, and the evaluation is obtained for an expected result represented in the conversation policy;
determining a plurality of probabilities based on the language understanding results and associated evaluations;
updating a first set of parameters associated with the conversation policy based on the plurality of probabilities, wherein the first set of parameters parameterizes the conversation policy with respect to the user and characterizes the effectiveness of the conversation with the user under the conversation policy.
9. The medium of claim 8, wherein:
the conversation policy represents an alternative way of conducting the conversation with the user on the topic; and
the expected result represents an answer from a user in response to a statement presented to the user in accordance with the conversation policy.
10. The medium of claim 8, wherein the plurality of probabilities comprises:
a know-positive probability indicating a likelihood that the user knows the expected result, regardless of whether the language understanding result is the same as the expected result;
a know-negative probability indicating a likelihood that the user does not know the expected result, regardless of whether the language understanding result is the same as the expected result; and
a guess probability indicating a likelihood that the user guessed the language understanding result.
11. The medium of claim 8, wherein the information, when read by the machine, further causes the machine to: update a second set of parameters associated with the representation of the topic based on the plurality of probabilities, wherein the second set of parameters represents a dynamic assessment of the user's mastery of the topic.
12. The medium of claim 11, wherein the first set of parameters and the second set of parameters characterize utility of the user with respect to the topic.
13. The medium of claim 12, wherein the utility comprises:
a status reward indicating a level of reward for conversing with the user on the topic and associated with a representation of the topic; and
one or more path rewards, each path reward being associated with one of the alternative conversation paths embedded in the conversation policy and representing the effectiveness of a conversation with the user along that conversation path.
14. The medium of claim 12, wherein the information, when read by the machine, further causes the machine to: determine a response to the user based on the utility of the user, the utility being dynamically updated based on knowledge tracked about the conversation policy for the topic.
15. A system for adaptive dialog management, comprising:
a knowledge tracking unit configured to receive a language understanding result and an evaluation of the language understanding result, wherein the language understanding result is derived based on an utterance from a user participating in a conversation directed to a topic, the conversation is governed by a conversation policy, and the evaluation is obtained for an expected result represented in the conversation policy;
a plurality of probability estimators configured to determine a plurality of probabilities based on the language understanding results and associated evaluations;
an information state updater configured to update a first set of parameters associated with the conversation policy based on the plurality of probabilities, wherein the first set of parameters parameterizes the conversation policy with respect to the user and characterizes the effectiveness of the conversation with the user under the conversation policy.
16. The system of claim 15, wherein:
the conversation policy represents an alternative way of conducting the conversation with the user on the topic; and
the expected result represents an answer from a user in response to a statement presented to the user in accordance with the conversation policy.
17. The system of claim 15, wherein the plurality of probability estimators comprise:
a know-positive probability estimator configured to estimate a know-positive probability indicating a likelihood that the user knows the expected result, regardless of whether the language understanding result is the same as the expected result;
a know-negative probability estimator configured to estimate a know-negative probability indicating a likelihood that the user does not know the expected result, regardless of whether the language understanding result is the same as the expected result; and
a guess probability estimator configured to estimate a guess probability indicating a likelihood that the user guesses the language understanding result.
18. The system of claim 15, wherein the information state updater is further configured to: update a second set of parameters associated with the representation of the topic based on the plurality of probabilities, wherein the second set of parameters represents a dynamic assessment of the user's mastery of the topic.
19. The system of claim 18, wherein the first set of parameters and the second set of parameters characterize utility of the user.
20. The system of claim 19, wherein the utility comprises:
a status reward indicating a level of reward for conversing with the user on the topic and associated with a representation of the topic; and
one or more path rewards, each path reward being associated with one of the alternative conversation paths embedded in the conversation policy and representing the effectiveness of a conversation with the user along that conversation path.
21. The system of claim 19, wherein the information state updater is further configured to determine a response to the user based on the utility of the user, the utility being dynamically updated based on knowledge tracked about the conversation policy for the topic.