CN110443355B - Conversation method and system applied to compound conversation task - Google Patents

Conversation method and system applied to compound conversation task

Info

Publication number
CN110443355B
CN110443355B (application CN201910720620.5A)
Authority
CN
China
Prior art keywords
node
dialog
subtask
state
conversation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910720620.5A
Other languages
Chinese (zh)
Other versions
CN110443355A (en
Inventor
俞凯
陈志�
Current Assignee
AI Speech Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN201910720620.5A priority Critical patent/CN110443355B/en
Publication of CN110443355A publication Critical patent/CN110443355A/en
Application granted granted Critical
Publication of CN110443355B publication Critical patent/CN110443355B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/02Reservations, e.g. for tickets, services or events
    • G06Q10/025Coordination of plural reservations, e.g. plural trip segments, transportation combined with accommodation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a dialog method applied to compound dialog tasks, comprising the following steps: structuring the current dialog confidence state to obtain an upper-layer structured dialog state; processing the upper-layer structured dialog state with a first graph neural network to determine the subtask information corresponding to the current dialog confidence state; structuring the subtask information together with the current dialog confidence state to obtain a bottom-layer structured dialog state; and processing the bottom-layer structured dialog state with a second graph neural network to determine the dialog action corresponding to the current dialog confidence state. The embodiments of the application combine HDRL and GNN to solve compound tasks while achieving sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.

Description

Conversation method and system applied to compound conversation task
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a conversation method and a conversation system applied to a composite conversation task.
Background
Compound tasks are different from multi-domain conversational tasks. The latter is often mentioned in papers concerned with transfer learning. In most cases, the multi-domain dialog task involves only one domain in a single dialog, and the performance of the one domain model is tested on different domains to highlight its transferability. In contrast, a compound conversation task may involve multiple domains in a single conversation, and the agent must complete all subtasks (accomplish the goals in all domains) to obtain positive feedback.
Consider the process of completing a compound task (e.g., multi-city restaurant reservation). The agent first selects a subtask (e.g., a Cambridge restaurant reservation), then makes a series of decisions to collect relevant information (e.g., price range, area) until all the information needed by the user is provided and the subtask is completed, and then selects the next subtask (e.g., an SF restaurant reservation) to complete. The state-action space grows with the number of subtasks. Therefore, dialog policy learning for compound tasks requires more exploration, and more turns between the agent and the user are needed to complete the compound task. The sparse reward problem is further magnified.
Solving compound tasks with the same approach used for single-domain tasks runs into obstacles. The complexity of the compound task makes it difficult for the agent to learn an acceptable policy. In the prior art, multi-layer perceptrons (MLPs) are often used in DQN to estimate the Q value. The MLP takes the concatenation of the flat dialog state as its input. Thus, it cannot easily capture the structural information of the semantic slots in that state, resulting in sample inefficiency. In the present application, ComNet is proposed, which utilizes Graph Neural Networks (GNNs) to better exploit the graph structure underlying the observation (e.g., the dialog state) while remaining consistent with the HDRL method.
Disclosure of Invention
The embodiment of the present application provides a dialog method and system applied to a composite dialog task, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present application provides a dialog method applied to a compound dialog task, including:
structuring the current conversation confidence state to obtain an upper-layer structured conversation state;
processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
structuring the subtask information and the current dialogue confidence state to obtain a bottom-layer structured dialogue state;
processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
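The four steps above can be sketched end to end as follows. This is a minimal illustration of the data flow only: the two structuring helpers and the two policies are hypothetical stand-ins (in the application the policies are realized by the first and second graph neural networks).

```python
def structure_upper(belief_state):
    # Hypothetical stand-in for the first step: arrange the atomic states of
    # the current confidence state into the upper-layer graph.
    return {"nodes": belief_state}

def structure_lower(belief_state, subtask):
    # Hypothetical stand-in for the third step: rebuild the graph with an
    # extra subtask node carrying the upper-level policy's output.
    return {"nodes": belief_state, "subtask": subtask}

def dialog_turn(belief_state, upper_policy, lower_policy):
    """One decision step of the method: structure -> subtask -> restructure -> action."""
    upper_graph = structure_upper(belief_state)            # step 1
    subtask = upper_policy(upper_graph)                    # step 2 (first GNN)
    lower_graph = structure_lower(belief_state, subtask)   # step 3
    action = lower_policy(lower_graph)                     # step 4 (second GNN)
    return subtask, action
```

With dummy policies, one turn yields the chosen subtask and the primitive dialog action for that turn.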
In a second aspect, an embodiment of the present application provides a dialog system applied to a compound dialog task, including:
the first structuralization processing program module is used for structuralizing the current conversation confidence state to obtain an upper-layer structuralization conversation state;
a subtask information determination program module for processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
the second structured processing program module is used for carrying out structured processing on the subtask information and the current conversation confidence state so as to obtain a bottom-layer structured conversation state;
a dialog action determination program module for processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
In a third aspect, the present application provides a storage medium, in which one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to execute any one of the above-mentioned conversation methods applied to a compound conversation task.
In a fourth aspect, an electronic device is provided, comprising: the apparatus includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a dialog method of any of the above applications for a composite dialog task.
In a fifth aspect, the present application further provides a computer program product comprising a computer program stored on a storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above-mentioned dialog methods applied to a compound dialog task.
The beneficial effects of the embodiments of the application are that: the embodiments combine HDRL and GNN to solve the compound task while achieving sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow diagram of one embodiment of a conversation method applied to a compound conversation task of the present application;
FIG. 2 is a schematic diagram of the structured processing of a dialog state using a two-level policy in the present application;
FIG. 3 is a flow diagram of another embodiment of a conversation method applied to a compound conversation task of the present application;
FIG. 4 is a functional block diagram of an embodiment of a dialog system for composite dialog tasks according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the second graph neural network of the present application;
FIG. 6 is a graph of performance comparison experiments for three agents of the present application;
FIG. 7 is a graph of a comparison experiment between a model pre-trained on the CR + SFR task and a model with stochastic parameters in the present application;
fig. 8 is a schematic structural diagram of an embodiment of an electronic device of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this application, "module," "device," "system," and the like refer to the relevant entity, either hardware, a combination of hardware and software, or software in execution, that applies to a computer. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Dialog policy training for compound tasks, such as restaurant reservations in multiple places, is a practically important and challenging problem. Recently, Hierarchical Deep Reinforcement Learning (HDRL) methods have achieved good performance on compound tasks. However, in vanilla HDRL, both the upper-level and lower-level policies are represented by multi-layer perceptrons (MLPs), which take the concatenation of all observations from the environment as input to predict actions. Therefore, the conventional HDRL method suffers from low sample efficiency and poor transferability.
In the present application, these problems are addressed by exploiting the flexibility of Graph Neural Networks (GNNs). A novel ComNet is proposed to model the structure of a hierarchical agent. ComNet's performance was tested on the composite task of the PyDial benchmark. Experiments show that ComNet outperforms the vanilla HDRL system, approaching the upper performance bound. The method not only achieves sample efficiency but is also more robust to noise, while retaining transferability to other compound tasks.
The present application mainly contributes in three aspects:
1. A new framework, ComNet, is proposed, which combines HDRL and GNN to solve compound tasks while achieving sample efficiency;
2. ComNet was tested on the PyDial benchmark, and the results surpass the vanilla HDRL system and are more robust to environmental noise;
3. The transferability of the ComNet framework was tested, showing that effective and accurate transfer can be performed under this framework.
Reinforcement learning has recently been the mainstream approach for optimizing statistical dialog management policies under the Partially Observable Markov Decision Process (POMDP) framework. One research line is single-domain task-oriented dialog, using flat deep reinforcement learning methods such as DQN, policy gradient, and actor-critic. Multi-domain task-oriented dialog is another direction, where each domain learns a separate dialog policy.
The compound dialog task has recently been proposed. Unlike multi-domain dialog systems, a compound dialog task requires the completion of all individual subtasks. The compound dialog task is formulated with the options framework and solved using hierarchical reinforcement learning. All of these works were built on vanilla HDRL, with the policy represented by a multi-layer perceptron (MLP). In this application, by contrast, we focus on designing a transferable dialog policy for compound dialog tasks based on graph neural networks.
GNNs are also used in other areas of reinforcement learning to provide properties such as transferability or reduced overfitting. In the construction of dialog systems, models such as BUDS also use graphical models for dialog state tracking. Previous work also demonstrates that learning a structured dialog policy with a GNN can significantly improve system performance in a single-domain setting by creating graph nodes corresponding to semantic slots and optimizing the graph structure. However, for compound dialogs, we need to exploit the task's specific structure and change the complete framework.
Layered reinforcement learning:
Before introducing ComNet, we first briefly review HRL for task-oriented compound dialog systems. Following the options framework, assume we have a set of dialog states B, a set of subtasks (or options) G, and a set of primitive actions A.
In contrast to the traditional Markov Decision Process (MDP) setting, where an agent can only select a primitive action at each time step, the hierarchical MDP decision process includes: (1) an upper-level policy π_b that selects subtasks to complete; (2) a lower-level policy π_{b,g} that selects primitive actions to complete the given subtask. The upper-level policy π_b takes the confidence state b produced by the global state tracker as input and selects a subtask g ∈ G. The lower-level policy π_{b,g} perceives the current state b and the subtask g, and outputs a primitive action a ∈ A. The lower-level policy π_{b,g} is shared by all subtasks.
In this application, we represent these two-level policies with two Q-functions, learned by deep Q-learning (DQN) and parameterized by θ_e and θ_i respectively. Corresponding to the two-level policies, there are two types of reward signals from the environment (user): an extrinsic reward r^e and an intrinsic reward r^i. The extrinsic reward guides the dialog agent to select the correct subtask sequence. The intrinsic reward is used to learn an option policy that achieves a given subtask. The combination of extrinsic and intrinsic rewards helps the dialog agent complete the compound task as quickly as possible. The extrinsic and intrinsic rewards are designed as follows:
the internal reward, at the end of the subtask, the agent receives either a positive internal reward 1 or a failed subtask 0 for the successful subtask. To encourage shorter conversations, the agent receives a negative intrinsic award of-0.05 in each turn.
Extrinsic reward: let K be the number of subgoals. At the end of the dialog, the agent receives a positive extrinsic reward of K for a successful dialog, or 0 for a failed dialog. To encourage shorter dialogs, the agent receives a negative extrinsic reward of -0.05 at each turn.
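As a sketch of the reward design above (reading the terminal extrinsic reward for a fully successful dialog as K, which matches the text), the two signals could be written as:

```python
def intrinsic_reward(subtask_over: bool, subtask_success: bool) -> float:
    # Terminal bonus of 1 for a successful subtask, 0 for a failed one;
    # otherwise each turn costs -0.05 to encourage shorter dialogs.
    if subtask_over:
        return 1.0 if subtask_success else 0.0
    return -0.05

def extrinsic_reward(dialog_over: bool, dialog_success: bool, K: int) -> float:
    # K is the number of subgoals; a fully successful dialog earns K, a
    # failed one earns 0; each intermediate turn costs -0.05.
    if dialog_over:
        return float(K) if dialog_success else 0.0
    return -0.05
```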
Suppose we have a subtask trajectory T_k:

T_k = {b_t, g_k, r^e_t, r^e_{t+1}, ..., r^e_{t+N-1}, b_{t+N}},

where k indexes the k-th subtask g_k. A dialog trajectory consists of a sequence of subtask trajectories T_0, T_1, .... Following the Q-learning algorithm, the parameters θ_e of the upper-level Q-function are updated as follows:

θ_e ← θ_e + α · (q_t^e − Q(b_t, g_k; θ_e)) · ∇_{θ_e} Q(b_t, g_k; θ_e),

where

q_t^e = Σ_{τ=0}^{N−1} γ^τ · r^e_{t+τ} + γ^N · max_{g′∈G} Q(b_{t+N}, g′; θ_e),
α is a step-size parameter, and γ ∈ (0,1] is the discount rate. The first term of the q expression above equals the total discounted extrinsic reward accumulated while fulfilling subtask g_k; the second term estimates the maximum total discounted value after g_k completes.
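The upper-level target q can be computed concretely as the discounted sum of the extrinsic rewards collected while the subtask runs, plus the bootstrapped value at its end. A minimal list-based sketch (no deep-learning framework, Q-values passed in as plain numbers):

```python
def upper_q_target(ext_rewards, next_subtask_qvalues, gamma=0.99):
    """Target for the upper-level Q update:
    sum_{tau=0}^{N-1} gamma^tau * r^e_{t+tau}
      + gamma^N * max_{g'} Q(b_{t+N}, g'),
    where N = len(ext_rewards) is the number of turns the subtask took."""
    N = len(ext_rewards)
    discounted = sum(gamma ** tau * r for tau, r in enumerate(ext_rewards))
    return discounted + gamma ** N * max(next_subtask_qvalues)
```

For example, with gamma = 1.0, two turn penalties of -0.05 and next-state subtask values [0.5, 0.2], the target is -0.1 + 0.5 = 0.4.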
The learning process of the lower-level policy is similar, except that the intrinsic reward is used. For each time step t = 0, 1, ..., T,

θ_i ← θ_i + α · (q_t^i − Q(b_t, g, a_t; θ_i)) · ∇_{θ_i} Q(b_t, g, a_t; θ_i),

where

q_t^i = r^i_t + γ · max_{a′∈A} Q(b_{t+1}, g, a′; θ_i).
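The lower-level target is the usual one-step DQN target, only with the intrinsic reward in place of the extrinsic one; sketched the same way as above:

```python
def lower_q_target(r_int, next_action_qvalues, gamma=0.99):
    # q_t^i = r^i_t + gamma * max_{a'} Q(b_{t+1}, g, a'; theta_i),
    # where next_action_qvalues holds Q(b_{t+1}, g, a') for each action a'.
    return r_int + gamma * max(next_action_qvalues)
```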
in vanilla HDRL, MLP is used to approximate the two Q functions described above. The structure of the dialog state is ignored in this setting. Thus, the task of the MLP strategy is to discover potential relationships between observations. This results in a longer convergence time, requiring more survey trials. In the next section, we will explain how to construct graphs to represent relationships in conversational observations.
And (3) compound conversation:
task oriented dialog systems are typically defined by a structured ontology. An ontology consists of some properties (or slots) that a user can use to build a query when completing a task. For a compound dialog state containing K subtasks, each subtask corresponds to several slots. For simplicity, we take subtask k as an example to describe the confidence state. Each slot of subtask k has two Boolean attributes, whether it is requestable or trusted. A user may request a value for a requestable slot and may provide a particular value as a search constraint for a trusted slot. The dialog state tracker updates the confidence state of each communicable slot at each dialog turn.
Generally, the confidence state consists of the distributions over candidate values of all slots. The value with the highest confidence for each informable slot is selected as a constraint for searching the database. Information about matching entities in the database is added to the final dialog state. The dialog state b_k of subtask k is decomposed into several slot-related states and a slot-independent state, expressed as

b_k = b_{k,0} ⊕ b_{k,1} ⊕ ... ⊕ b_{k,n},

where b_{k,j} (1 ≤ j ≤ n) is the j-th slot-related state of subtask k, and b_{k,0} denotes the slot-independent state of subtask k. The overall confidence state is the concatenation of all subtask-related states b_k, i.e.,

b = b_1 ⊕ b_2 ⊕ ... ⊕ b_K,

which is the input of the upper-level dialog policy.
The output of the upper-level policy is a subtask g ∈ G. In this application we use a one-hot vector to represent a particular subtask. Then, the entire confidence state b and the subtask vector g are fed into the lower-level policy. The output of the lower-level policy is a primitive dialog action. Similarly, for each subtask k, the dialog action set A_k can be divided into n slot-related action sets A_{k,j} (1 ≤ j ≤ n), e.g., request_slot_{k,j}, inform_slot_{k,j}, select_slot_{k,j}, and a slot-independent action set A_{k,0}, e.g., repeat_{k,0}, reqmore_{k,0}, ..., bye_{k,0}. The entire dialog action space A is the union of all subtask action spaces.
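This decomposition of the action space can be sketched as follows. The ontology shape and the slot-independent action names are illustrative (the text names repeat, reqmore, and bye among others), not the full PyDial action inventory:

```python
def build_action_space(ontology):
    """ontology maps each subtask k to its list of slot names. Returns the
    flat union A of all subtask action sets: three slot-related actions per
    slot (request/inform/select) plus slot-independent actions per subtask."""
    slot_independent = ["repeat", "reqmore", "bye"]  # illustrative subset
    actions = []
    for task, slots in ontology.items():
        for slot in slots:
            for act in ("request", "inform", "select"):
                actions.append(f"{act}_{task}_{slot}")
        for act in slot_independent:
            actions.append(f"{act}_{task}")
    return actions
```

For two subtasks with two and one slots respectively, this yields 9 + 6 = 15 primitive actions, illustrating how the action space grows with the number of subtasks.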
As shown in fig. 1, an embodiment of the present application provides a dialog method applied to a compound dialog task, including:
and S10, structuring the current dialogue confidence state to obtain an upper-layer structured dialogue state. Illustratively, the dialog state b (e.g., current dialog confidence state) is composed of K subtask-related states, and each subtask-related state may be further decomposed into several slot-related states and a logically inseparable slot-independent state, referred to as an atomic state. The hierarchical format of the dialog states may be viewed naturally as graphics. Each node in the graph represents a respective atomic state. To simplify the structure of the graph, nodes that are not associated with a slot are selected as delegates for nodes corresponding to the same subtask. All nodes not associated with a slot are interconnected in the upper level graph and nodes associated with a slot are only connected to their delegate node.
And S20, processing the upper-layer structured conversation state based on the first graph neural network to determine subtask information corresponding to the current conversation confidence state.
And S30, carrying out structuring processing on the subtask information and the current dialogue confidence state to obtain a bottom layer structured dialogue state.
Illustratively, unlike the input of the upper-level policy, the input of the lower-level policy adds a new node, named subtask node, to represent the target information generated by the upper-level policy. In the bottom graph, nodes that are not related to a slot are all connected to a subtask node (or global delegate node), rather than to each other.
S40, processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
The embodiment of the application provides a new framework, ComNet, which combines HDRL and GNN to solve the compound task while achieving sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.
Illustratively, two graph neural networks (e.g., the first graph neural network and the second graph neural network) are used in embodiments of the present application to parameterize the two-level policy. For ease of subsequent description and understanding, the following notation is first introduced: the graph structure is denoted G = (V, E), with nodes v_i ∈ V (0 ≤ i ≤ n) and directed edges e_ij ∈ E. The adjacency matrix Z represents the structure of G: the element z_ij of Z is 1 if there is a directed edge from the i-th node v_i to the j-th node v_j, otherwise z_ij is 0. The out-neighborhood of v_i is denoted N_out(v_i); similarly, N_in(v_i) denotes the in-neighborhood of v_i. Each node v_i has an associated node type p_i. Each edge e_ij has an edge type c_e determined by its start node type p_i and end node type p_j. In other words, two edges are of the same type if and only if both their start node types and their end node types are the same.
Fig. 2 is a schematic diagram illustrating the structured processing of a dialog state with the two-level policy in the present application. Fig. 2a corresponds to the upper-level policy, which has two node types: slot-related nodes (S nodes) and slot-independent nodes (I nodes). Since there are no edges between slot-related nodes, it has only four edge types. Similarly, Fig. 2b corresponds to the lower-level policy, which has three node types (slot-related, slot-independent, and subtask (T) nodes) and four edge types. The graphs for both the upper-level and lower-level policies are thus well defined. ComNet has two GNNs for parsing these graph-format observations of the upper-level and lower-level policies.
The input of the task-based dialog policy is the dialog state, and the dialog state of each single domain consists of two major types of features: slot-related dialog state features and slot-independent dialog state features. The slot-related features consist of features in one-to-one correspondence with the slots. For a compound dialog task comprising several sub-domains, the dialog state is composed of the dialog states of all sub-domains; the upper-level dialog policy selects, from this combined dialog state, the sub-domain that currently needs to be addressed, and the lower-level dialog policy then makes the dialog decision by combining the dialog state with the selected sub-domain. FIG. 2 shows the graphs formed by structuring the inputs of the upper-level and lower-level policies, where S nodes refer to slot-related features, I nodes to slot-independent features, and the T node to the representation of the currently selected sub-domain. FIG. 5 specifically shows the graph neural network structure of the lower-level policy model, which mainly comprises three parts: an input module, a graph-structure information extraction module, and an output module. All parameters are shared among nodes of the same type. Therefore, as long as the node types in the graph structure (slot-related nodes, slot-independent nodes, and domain feature nodes) remain unchanged, the number of model parameters remains unchanged.
Fig. 3 is a flowchart of another embodiment of the conversation method applied to a compound conversation task according to the present application. Specifically, the structured hierarchical conversation strategy model is mainly composed of two parts: upper level dialog policies and lower level dialog policies. The upper layer dialogue strategy is mainly used for specifying the dialogue sub-fields which need to be solved currently for the lower layer dialogue strategy, and the lower layer dialogue strategy is used for outputting dialogue actions by combining dialogue states and sub-field information. Both the process of structuring the upper dialog state and the structured lower dialog state have been described in detail in relation to the embodiment of fig. 2.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in fig. 4, an embodiment of the present application further provides a dialog system 400 applied to a compound dialog task, including:
a first structured processing program module 410, configured to perform structured processing on the current dialog confidence state to obtain an upper-layer structured dialog state;
a subtask information determination program module 420, configured to process the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
a second structured handler module 430, configured to perform structured processing on the subtask information and the current dialog confidence state to obtain a bottom-layer structured dialog state;
a dialog action determination program module 440 for processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
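The four modules above form a two-level decision pipeline per dialog turn. The following sketch is illustrative only: every function here is a toy stand-in (the real modules are graph neural networks), and the belief-state layout and scoring rules are assumptions made for the example.

```python
# Minimal, hypothetical sketch of the turn-level flow through modules 410-440.
def structure_upper(belief):
    # module 410: expose per-slot beliefs (S) and general features (I)
    return {"S": belief["slots"], "I": belief["general"]}

def structure_lower(belief, subtask):
    # module 430: same features plus a T component for the chosen subtask
    return {"S": belief["slots"], "I": belief["general"], "T": subtask}

def upper_policy(state):
    # module 420 stand-in: pick the subtask whose slots are least filled
    filled = {k: sum(v.values()) for k, v in state["S"].items()}
    return min(filled, key=filled.get)

def lower_policy(state):
    # module 440 stand-in: request the emptiest slot of the chosen subtask
    slots = state["S"][state["T"]]
    return ("request", state["T"], min(slots, key=slots.get))

def dialog_turn(belief):
    subtask = upper_policy(structure_upper(belief))
    return lower_policy(structure_lower(belief, subtask))

belief = {"slots": {"CR": {"food": 0.9, "area": 0.1},
                    "SFR": {"food": 0.0, "price": 0.2}},
          "general": {}}
action = dialog_turn(belief)  # expected: ("request", "SFR", "food")
```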
The embodiment of the application provides a new framework, ComNet, which combines HDRL and GNN to solve compound tasks with high sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.
FIG. 5 is a schematic diagram of an embodiment of the second graph neural network of the present application. The network has three parts for extracting useful representations from the initial graph-formatted observation: 1) an input module, 2) a graph information extraction module, and 3) an output module, introduced in turn below:

1) Input module:
Before each prediction, each node $v_i$ receives the corresponding atomic belief state $b$ or subtask information $g$ (denoted $x_i$), which is fed into the input module to obtain the initial state embedding $h^0_i$ as follows:

$h^0_i = F_{p_i}(x_i)$

where $F_{p_i}$ is a function of node type $p_i$, which may be a multilayer perceptron (MLP). Typically, different slots have different numbers of candidate values, so the raw input dimensions of slot-related nodes differ. However, the belief state of each slot is approximated by the probabilities of its top $M$ values, where $M$ is typically less than the minimum number of values over all slots. Thus, nodes of the same type have the same input dimension.
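A sketch of the input module, under stated assumptions: dimensions are toy values, the single ReLU layer stands in for the per-type MLP $F_{p_i}$, and the weights are random rather than learned. What it shows is the parameter-sharing rule: one embedding function per node *type*, reused by every node of that type.

```python
import numpy as np

def make_input_fn(rng, out_dim, in_dim):
    W = rng.standard_normal((out_dim, in_dim))
    return lambda x: np.maximum(0.0, W @ x)   # one ReLU layer as a stand-in MLP

rng = np.random.default_rng(0)
F = {"S": make_input_fn(rng, 4, 3),   # slot nodes: top-M belief (M = 3 here)
     "I": make_input_fn(rng, 4, 2),   # slot-independent features
     "T": make_input_fn(rng, 4, 2)}   # subtask indicator

def input_module(node_type, x):
    # h^0_i = F_{p_i}(x_i): every node of the same type reuses the same F
    return F[node_type](x)

h0 = input_module("S", np.array([0.7, 0.2, 0.1]))  # top-M belief of one slot
```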
2) Graph information extraction module:

The graph information extraction module takes $h^0_i$ as the initial embedding of node $v_i$, and then propagates higher-level embeddings of each node through the graph. Each extraction layer performs the following operations on the node embeddings.

Message computation: at step $l$, each node $v_i$ has an embedding $h^{l-1}_i$. For each outgoing node $v_j \in N_{out}(v_i)$, node $v_i$ computes a message vector as follows:

$m^l_{i \to j} = M^l_{c_e}(h^{l-1}_i)$

where $c_e$ is the type of the edge from node $v_i$ to node $v_j$, and $M^l_{c_e}$ is a message generation function, which may be a linear embedding: $M^l_{c_e}(h^{l-1}_i) = W^l_{c_e} h^{l-1}_i$. Note that the subscript $c_e$ indicates that edges of the same edge type share the learned weight matrix $W^l_{c_e}$.

Message aggregation: after every node has finished computing its messages, each node $v_j$ aggregates the messages from its incoming neighbors. Specifically, the aggregation process is:

$\bar{m}^l_j = A(\{m^l_{i \to j} \mid v_i \in N_{in}(v_j)\})$

where $A$ is an aggregation function, which may be a sum, an average, or a max-pooling function, and $\bar{m}^l_j$ is the aggregated message vector containing the information sent by all neighboring nodes.

Embedding update: at this point, each node $v_i$ holds two kinds of information, namely the aggregated message vector $\bar{m}^l_i$ and its current embedding vector $h^{l-1}_i$. The embedding update process is:

$h^l_i = U^l_{p_i}(h^{l-1}_i, \bar{m}^l_i)$

where $U^l_{p_i}$ is the update function for node type $p_i$ at the $l$-th extraction layer, which may be non-linear, i.e.

$h^l_i = \delta\big(W^l_{p_i}(h^{l-1}_i + \lambda^l \bar{m}^l_i)\big)$

where $\delta$ is an activation function such as ReLU, $\lambda^l$ is a weight parameter on the aggregated information, clipped to $[0, 1]$, and $W^l_{p_i}$ is a trainable matrix. Note that the subscript $p_i$ indicates that nodes of the same node type share the same instance of the update function, i.e., in our example they share the parameters $W^l_{p_i}$.
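One propagation step can be sketched as below. Assumptions: sum aggregation, the non-linear update $h^l_i = \mathrm{ReLU}(W_{p_i}(h^{l-1}_i + \lambda \bar{m}^l_i))$, toy two-dimensional embeddings, and identity matrices in place of learned weights, so the numbers are checkable by hand rather than meaningful.

```python
import numpy as np

def propagate(h, edges, W_edge, W_node, node_type, lam=0.5):
    """One extraction layer: typed messages, sum aggregation, typed update."""
    # message computation + sum aggregation: W shared per *edge type*
    agg = {i: np.zeros_like(v) for i, v in h.items()}
    for src, dst, etype in edges:
        agg[dst] += W_edge[etype] @ h[src]
    # embedding update: W shared per *node type*
    return {i: np.maximum(0.0, W_node[node_type[i]] @ (h[i] + lam * agg[i]))
            for i in h}

I2 = np.eye(2)
h0 = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
h1 = propagate(h0, [(0, 1, "S-I")], {"S-I": I2}, {"S": I2, "I": I2},
               {0: "S", 1: "I"})
# node 1 receives node 0's message: ReLU([0,1] + 0.5*[1,0]) = [0.5, 1.0]
```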
3) Output module:

After $L$ steps of node embedding updates, each node $v_i$ has a final representation $h^L_i$, also denoted $h^L_{k,i}$, where the subscript $(k, i)$ indicates that node $v_i$ corresponds to subtask $k$.

Upper-layer output: the upper-layer policy aims to predict the subtask to be executed. In the top-level graph, each subtask corresponds to several S nodes and one I node. Therefore, all final embeddings of the nodes related to a subtask are used when computing that subtask's Q-value. Specifically, for each subtask $k$ we compute:

$q^{top}_k = O^{top}\big(\sum_{v_i \in S\text{-node}} h^L_{k,i},\; h^L_{k,0}\big)$

where $O^{top}$ is an output function that may be an MLP, and the subscripts $(k, 0)$ and $(k, i)$ denote the I node and the $i$-th S node of subtask $k$, respectively. In practice, we use the concatenation of $\sum_{v_i \in S\text{-node}} h^L_{k,i}$ and $h^L_{k,0}$ as the input of the MLP, which outputs a scalar value. This MLP is shared across all subtasks. When making a decision, all $q^{top}_k$ are concatenated, i.e.

$q^{top} = \oplus_k\, q^{top}_k$

and a subtask is then selected according to $q^{top}$.
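The upper-layer scoring can be sketched as follows. This is an illustrative reconstruction under assumptions: a dot product with a fixed vector stands in for the shared MLP $O^{top}$, embeddings are one-dimensional toys, and the subtask names are examples.

```python
import numpy as np

def upper_q(final_emb, w_top):
    """final_emb: {subtask: (h_I, [h_S, ...])} final-layer embeddings.
    Returns one scalar per subtask, all scored by the *same* function."""
    q = {}
    for k, (h_i, h_s_list) in final_emb.items():
        # sum the S-node embeddings, concatenate with the I-node embedding
        feat = np.concatenate([np.sum(h_s_list, axis=0), h_i])
        q[k] = float(w_top @ feat)  # shared scorer stands in for O_top
    return q

emb = {"CR":  (np.array([1.0]), [np.array([0.2]), np.array([0.3])]),
       "SFR": (np.array([0.0]), [np.array([0.1]), np.array([0.1])])}
q_top = upper_q(emb, np.array([1.0, 1.0]))
best = max(q_top, key=q_top.get)  # subtask selected from concatenated q_top
```

Because the scorer is shared, a new subtask only adds nodes, not parameters.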
Bottom-layer output: the lower-layer policy aims to predict the primitive dialog action. Each primitive dialog action corresponds to a subtask; if we treat the slot-independent node as a special type of slot-related node, each primitive dialog action further corresponds to a slot node. Thus, the Q-value of each dialog action combines three pieces of information: a subtask-level value, a slot-level value, and a primitive value. We use the T-node embedding $h^L_T$ to compute the subtask-level value:

$q^{sub} = O^{sub}(h^L_T)$

where $O^{sub}$ is the output function of the subtask-level value, which may be an MLP, and $q^{sub}$ is a $K$-dimensional vector in which each value is assigned to the corresponding subtask.

For each node $v_i$ belonging to the S nodes and the I node, the slot-level value and the primitive value are computed:

$q^{slot}_{k,i} = O^{slot}_{p_i}(h^L_{k,i})$
$q^{prim}_{k,i} = O^{prim}_{p_i}(h^L_{k,i})$

where $O^{slot}_{p_i}$ and $O^{prim}_{p_i}$ are the output functions of the slot-level value and the primitive value, respectively, which may in practice be MLPs. Similarly, the subscript $p_i$ indicates that nodes of the same node type share the same instance of the output function. The Q-value of the action $a_{k,i}$ corresponding to slot node $v_i$ is

$q^{low}_{k,i} = q^{sub}_k + q^{slot}_{k,i} + q^{prim}_{k,i}$

where $+$ is an element-wise operation and $q^{sub}_k$ denotes the $k$-th value of $q^{sub}$. When predicting the action, all $q^{low}_{k,i}$ are concatenated, i.e.

$q^{low} = \oplus_{k,i}\, q^{low}_{k,i}$

and the primitive action is then selected according to $q^{low}$.
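The three-part Q-value composition reduces to a broadcast sum, sketched below with toy numbers. The only substantive point is that the subtask-level scalar $q^{sub}_k$ broadcasts over the slot-level and primitive-level vectors (the element-wise "+" in the text).

```python
import numpy as np

def low_q(q_sub_k, q_slot, q_prim):
    # q^low_{k,i} = q^sub_k + q^slot_{k,i} + q^prim_{k,i}
    # the subtask-level scalar broadcasts over the per-action vectors
    return q_sub_k + q_slot + q_prim

# toy values: subtask scalar 0.5, two primitive actions under this slot node
q = low_q(0.5, np.array([1.0, -1.0]), np.array([0.1, 0.2]))
# 0.5 + [1.0, -1.0] + [0.1, 0.2] = [1.6, -0.3]
```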
Although the parameters of the input module and the graph information extraction module are not shared between the upper and lower GNNs, there are many shared parameters within each GNN. Suppose the compound task is modified so that one subtask adds some new slots; we then only need to create new nodes in each GNN. If the set of edge types is unchanged, the parameters of the GNN remain unchanged after the new nodes are added. This property of ComNet yields transferability. In general, if both the node type set and the edge type set of a compound Task1 are subsets of those of another Task2, the ComNet policy learned on Task2 can be used directly on Task1.
Since the initial inputs of nodes of the same type have similar semantic meaning, those nodes share parameters in ComNet. We use GNNs to propagate the relationships between nodes in the graph, connecting the initial inputs to the final outputs.
The main contributions of the present application are three-fold:
1. A new framework, ComNet, is proposed, which combines HDRL and GNN to solve compound tasks with high sample efficiency;
2. ComNet was tested on the PyDial benchmarks, and the results surpass the vanilla HDRL system and are more robust to environmental noise;
3. The transferability of the ComNet framework was tested, showing that effective and accurate transfer can be performed under this framework.
To verify the effects achieved by the present application, the inventors conducted the following experiments:
First, the effectiveness of ComNet is verified on the PyDial benchmark compound tasks. Then, the transferability of ComNet is investigated.
PyDial benchmark: evaluating our target framework requires a compound dialog simulation environment. The PyDial toolkit supports multi-domain dialog simulation with an error model, which lays a good foundation for constructing the compound-task environment.

We modified the policy management module and the user simulation module to support two-subtask compound dialog simulations over three available subtasks: the Cambridge Restaurant task (CR), the San Francisco Restaurant task (SFR), and a laptop shopping task (LAP), while preserving the different error-simulation levels of all functions (Table 1). Note that in the policy management module we discard the domain input provided by the Dialog State Tracking (DST) module for a fair comparison. We also updated the user simulation module and the evaluation management module to support the reward design.
Experiment implementation:
We implemented the following three multitask agents to evaluate the performance of the proposed framework.

Vanilla HDQN: a hierarchical agent using MLPs as its models. This is the baseline we compare against.

ComNet: our target framework, which exploits the flexibility of GNNs.

Handcrafted: a well-designed rule-based agent with a high success rate in noiseless compound dialogs. This agent is also used to warm up the training of the first two agents. Note that this agent uses the exact subtask information provided by the DST, which is unfair compared with the other two agents.
Here, each model is trained for 6,000 dialogs (iterations). The total number of training dialogs is broken into stages (30 stages in total, each containing 200 dialogs). After each stage, 100 dialogs are used to test the performance of the dialog policy. The results for the 3 compound tasks in 3 environments over the 6,000 training dialogs are shown in fig. 6.
From fig. 6 we can observe that ComNet outperforms the vanilla MLP policy in all nine settings (3 environments × 3 types of compound task) in terms of both success rate and learning speed. In ComNet, both the upper- and lower-layer policies are represented by GNNs, where nodes of the same type and edges of the same type share parameters. This means that nodes of the same type share an input space (belief state space), which greatly reduces the exploration space. As shown in fig. 6, ComNet learns faster than the vanilla MLP policy.
Note that the handcrafted agent performs well because it cheats by looking at the exact subtask information, which means the handcrafted agent is in effect solving a multi-domain task; it should be regarded as the upper bound of our model's performance. Compared with vanilla HDQN, ComNet shows its robustness in all environments by a large margin, which facilitates building dialog systems without high-precision ASR or DST.

We also compared the dialogs generated by vanilla HDQN and ComNet after 6,000 training dialogs. Even after extensive training, the vanilla HDQN agent still cannot select the appropriate action in some specific dialogs, which can lead to user frustration. ComNet, on the other hand, advances the progress of the conversation as soon as the required information is obtained, and thus completes the task successfully. This also demonstrates that ComNet is more sample-efficient than the vanilla framework.
Investigating the transferability of ComNet: as discussed in the previous embodiments, another advantage of ComNet is that it transfers naturally due to the flexibility of GNNs.
To evaluate its transferability, we first trained for 6,000 dialogs on the CR+SFR task. We then initialized the policy models on the other two compound tasks with the trained policy, and continued training and testing. The results are shown in FIG. 7.

We find that the model transferred from the CR+SFR task adapts well to the other two compound tasks. This shows that ComNet propagates task-independent relationships between graph nodes based on the connection between initial node inputs and final outputs, and that the training process for a new compound task can be accelerated by reusing related task parameters trained in advance under the ComNet framework. This is crucial for solving the cold-start problem in task-oriented dialog systems.
In this application, we propose ComNet, a structured hierarchical dialog policy represented by two Graph Neural Networks (GNNs). By replacing the MLPs in the traditional HDRL method, ComNet makes better use of the structural information of the dialog state: the observation (dialog state) and the upper-layer decision are fed to the slot-related and slot-independent nodes, respectively, and messages are exchanged between these nodes. We evaluated our framework on a modified PyDial benchmark and showed high efficiency, robustness, and transferability in all settings.
In some embodiments, the present application provides a non-transitory computer readable storage medium, in which one or more programs including executable instructions are stored, and the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any of the above-described dialog methods applied to a composite dialog task.
In some embodiments, the present application further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above-described dialog methods applied to a compound dialog task.
In some embodiments, the present application further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a dialog method applied to a composite dialog task.
In some embodiments, the present application further provides a storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a dialog method applied to a compound dialog task.
The dialog system applied to the composite dialog task in the embodiment of the present application may be configured to execute the dialog method applied to the composite dialog task in the embodiment of the present application, and accordingly achieve the technical effect achieved by the implementation of the dialog method applied to the composite dialog task in the embodiment of the present application, and details are not described here. In the embodiment of the present application, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 8 is a schematic hardware structure diagram of an electronic device for executing a dialog method applied to a compound dialog task according to another embodiment of the present application, where, as shown in fig. 8, the device includes:
one or more processors 810 and a memory 820, with one processor 810 being an example in FIG. 8.
The apparatus for performing a dialog method applied to a compound dialog task may further include: an input device 830 and an output device 840.
The processor 810, the memory 820, the input device 830, and the output device 840 may be connected by a bus or other means, such as the bus connection in fig. 8.
The memory 820, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the dialog methods applied to the compound dialog task in the embodiments of the present application. The processor 810 executes various functional applications of the server and data processing, i.e., implementing a dialog method in which the above-described method embodiments are applied to a composite dialog task, by executing nonvolatile software programs, instructions, and modules stored in the memory 820.
The memory 820 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the conversation device applied to the compound conversation task, and the like. Further, the memory 820 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 820 may optionally include memory located remotely from the processor 810, which may be connected via a network to the conversation devices applied to the composite conversation task. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 830 may receive input numeric or character information and generate signals related to user settings and function control of the dialog device applied to the composite dialog task. The output device 840 may include a display device such as a display screen.
The one or more modules are stored in the memory 820 and, when executed by the one or more processors 810, perform the dialog method applied to the composite dialog task in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones (e.g., iphones), multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as ipads.
(3) Portable entertainment devices such devices may display and play multimedia content. Such devices include audio and video players (e.g., ipods), handheld game consoles, electronic books, as well as smart toys and portable car navigation devices.
(4) The server is similar to a general computer architecture, but has higher requirements on processing capability, stability, reliability, safety, expandability, manageability and the like because of the need of providing highly reliable services.
(5) And other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions substantially or contributing to the related art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A conversation method applied to a compound conversation task, comprising:
structuring the current conversation confidence state to obtain an upper-layer structured conversation state;
processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
structuring the subtask information and the current dialogue confidence state to obtain a bottom-layer structured dialogue state;
processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
2. The method of claim 1, wherein the processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state comprises:
for each subtask k, performing the following calculation:

$q^{top}_k = O^{top}\big(\sum_{v_i \in S\text{-node}} h^L_{k,i},\; h^L_{k,0}\big)$

wherein $O^{top}$ is an output function, which may be an MLP; subscripts $(k,0)$ and $(k,i)$ denote the I node and the $i$-th S node of subtask $k$, respectively; $v_i$ denotes a node, $L$ denotes the number of node embedding update steps, $h^L_{k,i}$ is the representation of node $v_i$, and $h^L_{k,0}$ is the representation of the I node;

using the concatenation of $\sum_{v_i \in S\text{-node}} h^L_{k,i}$ and $h^L_{k,0}$ as the input of the MLP and outputting a scalar value;

when making a decision, concatenating all $q^{top}_k$, i.e. $q^{top} = \oplus_k\, q^{top}_k$, and then selecting a subtask according to $q^{top}$.
3. The method of claim 1, wherein the processing the underlying structured conversation state based on a second graph neural network to determine a conversation action corresponding to the current conversation confidence state comprises:
the Q value of each dialog action contains three pieces of information: a subtask-level value, a slot-level value, and a primitive value;

using the T-node embedding $h^L_T$ to calculate the subtask-level value:

$q^{sub} = O^{sub}(h^L_T)$

wherein $O^{sub}$ is the output function of the subtask-level value, and $q^{sub}$ is a $K$-dimensional vector in which each value is assigned to a respective subtask;

for each node $v_i$ belonging to the S nodes and the I node, calculating the slot-level value and the primitive value:

$q^{slot}_{k,i} = O^{slot}_{p_i}(h^L_{k,i})$
$q^{prim}_{k,i} = O^{prim}_{p_i}(h^L_{k,i})$

wherein $O^{slot}_{p_i}$ and $O^{prim}_{p_i}$ are the output functions of the slot-level value and the primitive value, respectively, and the subscript $p_i$ indicates that nodes of the same node type share the same instance of the output function; the Q value of the action $a_{k,i}$ corresponding to slot node $v_i$ is

$q^{low}_{k,i} = q^{sub}_k + q^{slot}_{k,i} + q^{prim}_{k,i}$

wherein $+$ is an element-wise operation and $q^{sub}_k$ denotes the $k$-th value of $q^{sub}$;

when predicting the action, concatenating all $q^{low}_{k,i}$, i.e. $q^{low} = \oplus_{k,i}\, q^{low}_{k,i}$, and then selecting a dialog action according to $q^{low}$.
4. The method of claim 1, wherein processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state further comprises:
an input preprocessing step: before each prediction, each node $v_i$ receives the corresponding atomic state $b$ or subtask information $g$, denoted $x_i$; the state embedding obtained after preprocessing is:

$h^0_i = F_{p_i}(x_i)$

wherein $F_{p_i}$ is a function of node type $p_i$;

a graph information extraction step: taking $h^0_i$ as the initial embedding of node $v_i$, and then propagating higher-level embeddings of each node through the graph; the propagation process of the node embeddings at each extraction layer is as follows:

message calculation: each node $v_i$ has an embedding $h^{l-1}_i$; for each outgoing node $v_j \in N_{out}(v_i)$, node $v_i$ calculates a message vector as follows:

$m^l_{i \to j} = M^l_{c_e}(h^{l-1}_i)$

wherein $c_e$ is the type of the edge from node $v_i$ to node $v_j$, and $M^l_{c_e}$ is a message generation function, which may be a linear embedding: $M^l_{c_e}(h^{l-1}_i) = W^l_{c_e} h^{l-1}_i$;

message aggregation: the aggregation process is

$\bar{m}^l_j = A(\{m^l_{i \to j} \mid v_i \in N_{in}(v_j)\})$

wherein $A$ is an aggregation function and $\bar{m}^l_j$ is the aggregated message vector;

embedding update: at this point, each node $v_i$ has two kinds of information, namely the aggregated message vector $\bar{m}^l_i$ and its current embedding vector $h^{l-1}_i$; the embedding update process is:

$h^l_i = U^l_{p_i}(h^{l-1}_i, \bar{m}^l_i)$

wherein $U^l_{p_i}$ is the update function of node type $p_i$ at the $l$-th abstraction layer, which may be a non-linear operation:

$h^l_i = \delta\big(W^l_{p_i}(h^{l-1}_i + \lambda^l \bar{m}^l_i)\big)$

wherein $\delta$ is an activation function, $\lambda^l$ is a weight parameter of the aggregated information, and $W^l_{p_i}$ is a trainable matrix.
5. A dialog system for application to a compound dialog task, comprising:
the first structuralization processing program module is used for structuralizing the current conversation confidence state to obtain an upper-layer structuralization conversation state;
a subtask information determination program module for processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
the second structured processing program module is used for carrying out structured processing on the subtask information and the current conversation confidence state so as to obtain a bottom-layer structured conversation state;
a dialog action determination program module for processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
6. The system of claim 5, wherein the processing the upper-level structured dialog state based on the first graph neural network to determine subtask information corresponding to the current dialog confidence state comprises:
for each subtask k, performing the following calculation:

$q^{top}_k = O^{top}\big(\sum_{v_i \in S\text{-node}} h^L_{k,i},\; h^L_{k,0}\big)$

wherein $O^{top}$ is an output function, which may be an MLP; subscripts $(k,0)$ and $(k,i)$ denote the I node and the $i$-th S node of subtask $k$, respectively; $v_i$ denotes a node, $L$ denotes the number of node embedding update steps, $h^L_{k,i}$ is the representation of node $v_i$, and $h^L_{k,0}$ is the representation of the I node;

using the concatenation of $\sum_{v_i \in S\text{-node}} h^L_{k,i}$ and $h^L_{k,0}$ as the input of the MLP and outputting a scalar value;

when making a decision, concatenating all $q^{top}_k$, i.e. $q^{top} = \oplus_k\, q^{top}_k$, and then selecting a subtask according to $q^{top}$.
7. The system of claim 5, wherein the processing the underlying structured conversation state based on a second graph neural network to determine the conversation action corresponding to the current conversation confidence state comprises:
the Q value of each dialog action contains three pieces of information: a subtask-level value, a slot-level value, and a primitive value;

using the T-node embedding $h^L_T$ to calculate the subtask-level value:

$q^{sub} = O^{sub}(h^L_T)$

wherein $O^{sub}$ is the output function of the subtask-level value, and $q^{sub}$ is a $K$-dimensional vector in which each value is assigned to a respective subtask;

for each node $v_i$ belonging to the S nodes and the I node, calculating the slot-level value and the primitive value:

$q^{slot}_{k,i} = O^{slot}_{p_i}(h^L_{k,i})$
$q^{prim}_{k,i} = O^{prim}_{p_i}(h^L_{k,i})$

wherein $O^{slot}_{p_i}$ and $O^{prim}_{p_i}$ are the output functions of the slot-level value and the primitive value, respectively, and the subscript $p_i$ indicates that nodes of the same node type share the same instance of the output function; the Q value of the action $a_{k,i}$ corresponding to slot node $v_i$ is

$q^{low}_{k,i} = q^{sub}_k + q^{slot}_{k,i} + q^{prim}_{k,i}$

wherein $+$ is an element-wise operation and $q^{sub}_k$ denotes the $k$-th value of $q^{sub}$;

when predicting the action, concatenating all $q^{low}_{k,i}$, i.e. $q^{low} = \oplus_{k,i}\, q^{low}_{k,i}$, and then selecting a dialog action according to $q^{low}$.
8. The system of claim 5, wherein processing the underlying structured conversation state based on a second graph neural network to determine a conversation action corresponding to the current conversation confidence state further comprises:
an input preprocessing step:
before each prediction, each node $v_i$ receives the corresponding atomic state $b$ or subtask information $g$, denoted $x_i$; the state embedding obtained after preprocessing is as follows:
$$h_i^0 = F_{p_i}(x_i)$$
wherein $F_{p_i}$ is a function determined by the node type $p_i$;
a graph information extraction step:
taking $h_i^0$ as the initial embedding of node $v_i$;
then further propagating higher-level embeddings of each node in the graph;
the propagation process of node embeddings at each extraction layer $l$ is as follows:
message calculation: each node $v_i$ has an embedding $h_i^{l-1}$; for each outgoing node $v_j \in N_{out}(v_i)$, node $v_i$ computes a message vector as follows:
$$m_{ij}^l = M_{c_e}^l\left(h_i^{l-1}\right)$$
wherein $c_e$ is the type of the edge from node $v_i$ to node $v_j$, and $M_{c_e}^l$ is a message generating function, which may be a linear embedding:
$$M_{c_e}^l\left(h_i^{l-1}\right) = W_{c_e}^l h_i^{l-1}$$
message aggregation: the aggregation process is as follows:
$$m_i^l = A\left(\left\{\, m_{ji}^l \mid v_j \in N_{in}(v_i) \,\right\}\right)$$
wherein $A$ is an aggregation function and $m_i^l$ is the aggregated message vector;
embedding update: at this point, each node $v_i$ holds two kinds of information, namely the aggregated message vector $m_i^l$ and its current embedding vector $h_i^{l-1}$; the embedding update process is as follows:
$$h_i^l = U_{p_i}^l\left(h_i^{l-1}, m_i^l\right)$$
wherein $U_{p_i}^l$ is the update function for node type $p_i$ at the $l$-th abstraction layer, which may be a non-linear operation:
$$h_i^l = \delta\left(W_{p_i}^l h_i^{l-1} + \lambda^l m_i^l\right)$$
wherein $\delta$ is an activation function, $\lambda^l$ is a weight parameter for the aggregated information, and $W_{p_i}^l$ is a trainable matrix.
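One propagation layer of the scheme in claim 8 (type-specific linear messages, sum aggregation, type-specific non-linear update) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the edge types, node types, ReLU as the activation $\delta$, and summation as the aggregation function $A$ are choices made here for concreteness, not fixed by the claim.

```python
import numpy as np

def gnn_layer(h, edges, W_msg, W_upd, node_types, lam=0.5):
    """One embedding-propagation layer:
    message:    m_ij = W_msg[c_e] @ h_i   for each edge (i, j) of type c_e
    aggregation: m_i = sum of messages arriving at node i   (A = sum)
    update:     h_i' = relu(W_upd[p_i] @ h_i + lam * m_i)   (delta = ReLU)
    Shapes and parameter names are assumptions for illustration."""
    n, d = h.shape
    agg = np.zeros_like(h)
    for i, j, c_e in edges:              # messages flow from v_i to v_j
        agg[j] += W_msg[c_e] @ h[i]      # sum aggregation of incoming messages
    h_next = np.empty_like(h)
    for i in range(n):
        p_i = node_types[i]              # node-type-specific update function
        h_next[i] = np.maximum(0.0, W_upd[p_i] @ h[i] + lam * agg[i])
    return h_next

rng = np.random.default_rng(0)
d = 4
h0 = rng.normal(size=(3, d))                           # initial embeddings h^0
edges = [(0, 1, "parent"), (1, 2, "child"), (2, 0, "parent")]
W_msg = {t: rng.normal(size=(d, d)) for t in ("parent", "child")}
W_upd = {p: rng.normal(size=(d, d)) for p in (0, 1)}
h1 = gnn_layer(h0, edges, W_msg, W_upd, node_types=[0, 1, 1])
```

Stacking several such layers yields the higher-level node embeddings from which the dialog actions are predicted.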
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-4.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN201910720620.5A 2019-08-06 2019-08-06 Conversation method and system applied to compound conversation task Active CN110443355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910720620.5A CN110443355B (en) 2019-08-06 2019-08-06 Conversation method and system applied to compound conversation task


Publications (2)

Publication Number Publication Date
CN110443355A CN110443355A (en) 2019-11-12
CN110443355B true CN110443355B (en) 2021-11-16

Family

ID=68433435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910720620.5A Active CN110443355B (en) 2019-08-06 2019-08-06 Conversation method and system applied to compound conversation task

Country Status (1)

Country Link
CN (1) CN110443355B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826700B (en) * 2019-11-13 2021-04-23 中国科学技术大学 Method for realizing and classifying bilinear graph neural network model for modeling neighbor interaction
CN111581534B (en) * 2020-05-22 2022-12-13 哈尔滨工程大学 Rumor propagation tree structure optimization method based on consistency of vertical place
CN112860869B (en) * 2021-03-11 2023-02-03 中国平安人寿保险股份有限公司 Dialogue method, device and storage medium based on hierarchical reinforcement learning network
CN114418119A (en) * 2022-01-21 2022-04-29 深圳市神州云海智能科技有限公司 Dialogue strategy optimization method and system based on structure depth embedding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928753B2 (en) * 2006-11-08 2018-03-27 Cricket Media, Inc. Dynamic characterization of nodes in a semantic network for desired functions such as search, discovery, matching, content delivery, and synchronization of activity and information
CN108962238A (en) * 2018-04-25 2018-12-07 苏州思必驰信息科技有限公司 Dialogue method, system, equipment and storage medium based on structural neural networks
US10152970B1 (en) * 2018-02-08 2018-12-11 Capital One Services, Llc Adversarial learning and generation of dialogue responses
CN109446306A (en) * 2018-10-16 2019-03-08 浪潮软件股份有限公司 Task-driven multi-turn dialogue-based intelligent question and answer method
WO2019065647A1 (en) * 2017-09-28 2019-04-04 株式会社東芝 Interactive processing device and interactive processing system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129601B2 (en) * 2008-11-26 2015-09-08 At&T Intellectual Property I, L.P. System and method for dialog modeling
US8395408B2 (en) * 2010-10-29 2013-03-12 Regents Of The University Of California Homogeneous dual-rail logic for DPA attack resistive secure circuit design
US10535346B2 (en) * 2017-12-07 2020-01-14 Ca, Inc. Speech processing computer system forming collaborative dialog data structures


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning; Lu Chen et al.; IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2019-05-30; Vol. 27, No. 9; full text *
A Survey of Knowledge Reasoning Based on Neural Networks; Zhang Zhongwei et al.; Computer Engineering and Applications; 2019-03-25; Vol. 55, No. 12; full text *

Also Published As

Publication number Publication date
CN110443355A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110443355B (en) Conversation method and system applied to compound conversation task
EP3446260B1 (en) Memory-efficient backpropagation through time
CN111465946B (en) Neural Network Architecture Search Using Hierarchical Representations
CN108962238A (en) Dialogue method, system, equipment and storage medium based on structural neural networks
O'Sullivan Complexity science and human geography
Wang et al. Adaptive and large-scale service composition based on deep reinforcement learning
WO2019228232A1 (en) Method for sharing knowledge between dialog systems, and dialog method and apparatus
Hazra et al. Applications of game theory in deep learning: a survey
CN106471525A Augmenting neural networks to generate additional outputs
Wang et al. Adaptive and dynamic service composition via multi-agent reinforcement learning
Moradi et al. Collective hybrid intelligence: towards a conceptual framework
CN109661672A Using reinforcement learning with external-memory-augmented neural networks
Gym et al. Deep reinforcement learning with python
CN114398556A (en) Learning content recommendation method, device, equipment and storage medium
WO2024120504A1 (en) Data processing method and related device
CN117575008A (en) Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device
CN112541570A (en) Multi-model training method and device, electronic equipment and storage medium
US20210142180A1 (en) Feedback discriminator
CN109299231A Dialogue state tracking method, system, electronic device and storage medium
US20220198217A1 (en) Model parallel training technique for neural architecture search
CN106878403A (en) Based on the nearest heuristic service combining method explored
Evans et al. A unified model of learning to forecast
Le et al. Generating predictable and adaptive dialog policies in single-and multi-domain goal-oriented dialog systems
Ganesh et al. Machine learning and logic: a new frontier in artificial intelligence
CN110096583B (en) Multi-field dialogue management system and construction method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200617

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: AI SPEECH Ltd.

Applicant after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Ltd.

Applicant before: SHANGHAI JIAO TONG University

TA01 Transfer of patent application right

Effective date of registration: 20201023

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: AI SPEECH Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Ltd.

Applicant before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

CB02 Change of applicant information

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Ltd.

GR01 Patent grant