CN110443355B - Conversation method and system applied to compound conversation task - Google Patents

Conversation method and system applied to compound conversation task

Info

Publication number
CN110443355B
CN110443355B (application CN201910720620.5A)
Authority
CN
China
Prior art keywords
node
dialog
subtask
state
conversation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910720620.5A
Other languages
Chinese (zh)
Other versions
CN110443355A (en
Inventor
俞凯
陈志�
Current Assignee
AI Speech Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN201910720620.5A priority Critical patent/CN110443355B/en
Publication of CN110443355A publication Critical patent/CN110443355A/en
Application granted granted Critical
Publication of CN110443355B publication Critical patent/CN110443355B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/02Reservations, e.g. for tickets, services or events
    • G06Q10/025Coordination of plural reservations, e.g. plural trip segments, transportation combined with accommodation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a dialog method applied to compound dialog tasks, comprising the following steps: structuring the current dialog confidence state to obtain an upper-layer structured dialog state; processing the upper-layer structured dialog state with a first graph neural network to determine the subtask information corresponding to the current dialog confidence state; structuring the subtask information together with the current dialog confidence state to obtain a bottom-layer structured dialog state; and processing the bottom-layer structured dialog state with a second graph neural network to determine the dialog action corresponding to the current dialog confidence state. The embodiments of the application combine HDRL and GNN to solve compound tasks while achieving sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.

Description

Conversation method and system applied to compound conversation task
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a conversation method and a conversation system applied to a composite conversation task.
Background
Compound tasks are different from multi-domain conversational tasks. The latter is often mentioned in papers concerned with transfer learning. In most cases, the multi-domain dialog task involves only one domain in a single dialog, and the performance of the one domain model is tested on different domains to highlight its transferability. In contrast, a compound conversation task may involve multiple domains in a single conversation, and the agent must complete all subtasks (accomplish the goals in all domains) to obtain positive feedback.
Consider the process of completing a compound task (e.g., multi-city restaurant reservation). The agent first selects a subtask (e.g., a Cambridge restaurant reservation), then makes a series of decisions to collect relevant information (e.g., price range, area) until all the information needed by the user is provided and the subtask is completed, and then selects the next subtask (e.g., an SF restaurant reservation) to complete. The state-action space grows with the number of subtasks. Therefore, dialog policy learning for compound tasks requires more exploration, and more turns between the agent and the user are needed to complete the compound task. The sparse reward problem is further magnified.
Solving compound tasks with the same approach used for single-domain tasks runs into obstacles. The complexity of the compound task makes it difficult for the agent to learn an acceptable policy. In the prior art, multi-layer perceptrons (MLPs) are often used in DQN to estimate the Q value. The MLP takes the concatenation of the flat dialog state as its input. Thus, it cannot easily capture the structural information of the semantic slots in that state, resulting in sample inefficiency. In the present application, ComNet is proposed, which utilizes Graph Neural Networks (GNNs) to better exploit the graph structure underlying the observation (e.g., the dialog state) while remaining consistent with the HDRL method.
Disclosure of Invention
The embodiment of the present application provides a dialog method and system applied to a composite dialog task, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present application provides a dialog method applied to a compound dialog task, including:
structuring the current conversation confidence state to obtain an upper-layer structured conversation state;
processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
structuring the subtask information and the current dialogue confidence state to obtain a bottom-layer structured dialogue state;
processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
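The four steps above can be sketched end to end as follows. This is a minimal illustration of the data flow only: the two structuring helpers and the two policies are hypothetical stand-ins (in the application the policies are realized by the first and second graph neural networks).

```python
def structure_upper(belief_state):
    # Hypothetical stand-in for the first step: arrange the atomic states of
    # the current confidence state into the upper-layer graph.
    return {"nodes": belief_state}

def structure_lower(belief_state, subtask):
    # Hypothetical stand-in for the third step: rebuild the graph with an
    # extra subtask node carrying the upper-level policy's output.
    return {"nodes": belief_state, "subtask": subtask}

def dialog_turn(belief_state, upper_policy, lower_policy):
    """One decision step of the method: structure -> subtask -> restructure -> action."""
    upper_graph = structure_upper(belief_state)            # step 1
    subtask = upper_policy(upper_graph)                    # step 2 (first GNN)
    lower_graph = structure_lower(belief_state, subtask)   # step 3
    action = lower_policy(lower_graph)                     # step 4 (second GNN)
    return subtask, action
```

With dummy policies, one turn yields the chosen subtask and the primitive dialog action for that turn.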
In a second aspect, an embodiment of the present application provides a dialog system applied to a compound dialog task, including:
the first structuralization processing program module is used for structuralizing the current conversation confidence state to obtain an upper-layer structuralization conversation state;
a subtask information determination program module for processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
the second structured processing program module is used for carrying out structured processing on the subtask information and the current conversation confidence state so as to obtain a bottom-layer structured conversation state;
a dialog action determination program module for processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
In a third aspect, the present application provides a storage medium, in which one or more programs including execution instructions are stored, where the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to execute any one of the above-mentioned conversation methods applied to a compound conversation task.
In a fourth aspect, an electronic device is provided, comprising: the apparatus includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a dialog method of any of the above applications for a composite dialog task.
In a fifth aspect, the present application further provides a computer program product comprising a computer program stored on a storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any of the above-mentioned dialog methods applied to a compound dialog task.
The beneficial effects of the embodiments of the application are that: the embodiments combine HDRL and GNN to solve the compound task while achieving sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow diagram of one embodiment of a conversation method applied to a compound conversation task of the present application;
FIG. 2 is a schematic diagram of the structured processing of a dialog state using a two-level policy in the present application;
FIG. 3 is a flow diagram of another embodiment of a conversation method applied to a compound conversation task of the present application;
FIG. 4 is a functional block diagram of an embodiment of a dialog system for composite dialog tasks according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the second graph neural network of the present application;
FIG. 6 is a graph of performance comparison experiments for three agents of the present application;
FIG. 7 is a graph of a comparison experiment between a model pre-trained on the CR + SFR task and a model with stochastic parameters in the present application;
fig. 8 is a schematic structural diagram of an embodiment of an electronic device of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this application, "module," "device," "system," and the like refer to the relevant entity, either hardware, a combination of hardware and software, or software in execution, that applies to a computer. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Dialog policy training for compound tasks, such as restaurant reservations in multiple places, is a practically important and challenging problem. Recently, Hierarchical Deep Reinforcement Learning (HDRL) methods have achieved good performance on compound tasks. However, in vanilla HDRL, both the upper-level and lower-level policies are represented by multi-layer perceptrons (MLPs), which take the concatenation of all observations from the environment as input to predict actions. Therefore, the conventional HDRL method suffers from low sample efficiency and poor transferability.
In the present application, these problems are addressed by exploiting the flexibility of Graph Neural Networks (GNNs). A novel ComNet is proposed to model the structure of a hierarchical agent. ComNet's performance was tested on the composite task of the PyDial benchmark. Experiments show that ComNet outperforms the vanilla HDRL system, approaching the upper performance bound. The method not only achieves sample efficiency but is also more robust to noise, while retaining transferability to other compound tasks.
The present application mainly contributes in three aspects:
1. A new framework, ComNet, is proposed, which combines HDRL and GNN to solve compound tasks while achieving sample efficiency;
2. ComNet was tested on the PyDial benchmark, and the results surpass the vanilla HDRL system and are more robust to environmental noise;
3. The transferability of the ComNet framework was tested, showing that effective and accurate transfer can be performed under this framework.
Reinforcement learning has recently been the mainstream approach for optimizing statistical dialog management policies under the Partially Observable Markov Decision Process (POMDP) framework. One research line is single-domain task-oriented dialog, using flat deep reinforcement learning methods such as DQN, policy gradient, and actor-critic. Multi-domain task-oriented dialog is another direction, where each domain learns a separate dialog policy.
The compound dialog task has recently been proposed. Unlike multi-domain dialog systems, a compound dialog task requires the completion of all individual subtasks. The compound dialog task is formulated with the options framework and solved using hierarchical reinforcement learning. All of these works were built on vanilla HDRL, with the policy represented by a multi-layer perceptron (MLP). In this application, by contrast, we focus on designing a transferable dialog policy for compound dialog tasks based on graph neural networks.
GNNs are also used in other areas of reinforcement learning to provide properties such as transferability or reduced overfitting. In the construction of dialog systems, models such as BUDS also use graphical models for dialog state tracking. Previous work also demonstrates that learning a structured dialog policy with a GNN can significantly improve system performance in a single-domain setting by creating graph nodes corresponding to semantic slots and optimizing the graph structure. However, for compound dialogs, we need to exploit the task's specific structure and change the complete framework.
Layered reinforcement learning:
Before introducing ComNet, we first briefly review HRL for task-oriented compound dialog systems. Following the options framework, assume we have a set of dialog states B, a set of subtasks (or options) G, and a set of primitive actions A.
In contrast to the traditional Markov Decision Process (MDP) setting, where an agent can only select a primitive action at each time step, the hierarchical MDP decision process includes: (1) an upper-level policy π_b that selects subtasks to complete; (2) a lower-level policy π_{b,g} that selects primitive actions to complete the given subtask. The upper-level policy π_b takes the confidence state b produced by the global state tracker as input and selects a subtask g ∈ G. The lower-level policy π_{b,g} perceives the current state b and the subtask g, and outputs a primitive action a ∈ A. The lower-level policy π_{b,g} is shared by all subtasks.
In this application, we represent these two-level policies with two Q-functions, learned by deep Q-learning (DQN) and parameterized by θ_e and θ_i respectively. Corresponding to the two-level policies, there are two types of reward signals from the environment (user): an extrinsic reward r^e and an intrinsic reward r^i. The extrinsic reward guides the dialog agent to select the correct subtask sequence. The intrinsic reward is used to learn an option policy that achieves a given subtask. The combination of extrinsic and intrinsic rewards helps the dialog agent complete the compound task as quickly as possible. The extrinsic and intrinsic rewards are designed as follows:
the internal reward, at the end of the subtask, the agent receives either a positive internal reward 1 or a failed subtask 0 for the successful subtask. To encourage shorter conversations, the agent receives a negative intrinsic award of-0.05 in each turn.
Extrinsic reward: let K be the number of subgoals. At the end of the dialog, the agent receives a positive extrinsic reward of K for a successful dialog, or 0 for a failed dialog. To encourage shorter dialogs, the agent receives a negative extrinsic reward of -0.05 at each turn.
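As a sketch of the reward design above (reading the terminal extrinsic reward for a fully successful dialog as K, which matches the text), the two signals could be written as:

```python
def intrinsic_reward(subtask_over: bool, subtask_success: bool) -> float:
    # Terminal bonus of 1 for a successful subtask, 0 for a failed one;
    # otherwise each turn costs -0.05 to encourage shorter dialogs.
    if subtask_over:
        return 1.0 if subtask_success else 0.0
    return -0.05

def extrinsic_reward(dialog_over: bool, dialog_success: bool, K: int) -> float:
    # K is the number of subgoals; a fully successful dialog earns K, a
    # failed one earns 0; each intermediate turn costs -0.05.
    if dialog_over:
        return float(K) if dialog_success else 0.0
    return -0.05
```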
Suppose we have a subtask trajectory T_k:

T_k = {b_t, g_k, r^e_t, r^e_{t+1}, ..., r^e_{t+N-1}, b_{t+N}},

where k indexes the k-th subtask g_k. A dialog trajectory consists of a sequence of subtask trajectories T_0, T_1, .... Following the Q-learning algorithm, the parameters θ_e of the upper-level Q-function are updated as follows:

θ_e ← θ_e + α · (q_t^e − Q(b_t, g_k; θ_e)) · ∇_{θ_e} Q(b_t, g_k; θ_e),

where

q_t^e = Σ_{τ=0}^{N−1} γ^τ · r^e_{t+τ} + γ^N · max_{g′∈G} Q(b_{t+N}, g′; θ_e),
α is a step-size parameter, and γ ∈ (0,1] is the discount rate. The first term of the q expression above equals the total discounted extrinsic reward accumulated while fulfilling subtask g_k; the second term estimates the maximum total discounted value after g_k completes.
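The upper-level target q can be computed concretely as the discounted sum of the extrinsic rewards collected while the subtask runs, plus the bootstrapped value at its end. A minimal list-based sketch (no deep-learning framework, Q-values passed in as plain numbers):

```python
def upper_q_target(ext_rewards, next_subtask_qvalues, gamma=0.99):
    """Target for the upper-level Q update:
    sum_{tau=0}^{N-1} gamma^tau * r^e_{t+tau}
      + gamma^N * max_{g'} Q(b_{t+N}, g'),
    where N = len(ext_rewards) is the number of turns the subtask took."""
    N = len(ext_rewards)
    discounted = sum(gamma ** tau * r for tau, r in enumerate(ext_rewards))
    return discounted + gamma ** N * max(next_subtask_qvalues)
```

For example, with gamma = 1.0, two turn penalties of -0.05 and next-state subtask values [0.5, 0.2], the target is -0.1 + 0.5 = 0.4.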
The learning process of the lower-level policy is similar, except that the intrinsic reward is used. For each time step t = 0, 1, ..., T,

θ_i ← θ_i + α · (q_t^i − Q(b_t, g, a_t; θ_i)) · ∇_{θ_i} Q(b_t, g, a_t; θ_i),

where

q_t^i = r^i_t + γ · max_{a′∈A} Q(b_{t+1}, g, a′; θ_i).
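The lower-level target is the usual one-step DQN target, only with the intrinsic reward in place of the extrinsic one; sketched the same way as above:

```python
def lower_q_target(r_int, next_action_qvalues, gamma=0.99):
    # q_t^i = r^i_t + gamma * max_{a'} Q(b_{t+1}, g, a'; theta_i),
    # where next_action_qvalues holds Q(b_{t+1}, g, a') for each action a'.
    return r_int + gamma * max(next_action_qvalues)
```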
in vanilla HDRL, MLP is used to approximate the two Q functions described above. The structure of the dialog state is ignored in this setting. Thus, the task of the MLP strategy is to discover potential relationships between observations. This results in a longer convergence time, requiring more survey trials. In the next section, we will explain how to construct graphs to represent relationships in conversational observations.
And (3) compound conversation:
task oriented dialog systems are typically defined by a structured ontology. An ontology consists of some properties (or slots) that a user can use to build a query when completing a task. For a compound dialog state containing K subtasks, each subtask corresponds to several slots. For simplicity, we take subtask k as an example to describe the confidence state. Each slot of subtask k has two Boolean attributes, whether it is requestable or trusted. A user may request a value for a requestable slot and may provide a particular value as a search constraint for a trusted slot. The dialog state tracker updates the confidence state of each communicable slot at each dialog turn.
Generally, the confidence state consists of the distributions over candidate values of all slots. The value with the highest confidence for each informable slot is selected as a constraint for searching the database. Information about matching entities in the database is added to the final dialog state. The dialog state b_k of subtask k is decomposed into several slot-related states and a slot-independent state, expressed as

b_k = b_{k,0} ⊕ b_{k,1} ⊕ ... ⊕ b_{k,n},

where b_{k,j} (1 ≤ j ≤ n) is the j-th slot-related state of subtask k, and b_{k,0} denotes the slot-independent state of subtask k. The overall confidence state is the concatenation of all subtask-related states b_k, i.e.,

b = b_1 ⊕ b_2 ⊕ ... ⊕ b_K,

which is the input of the upper-level dialog policy.
The output of the upper-level policy is a subtask g ∈ G. In this application we use a one-hot vector to represent a particular subtask. Then, the entire confidence state b and the subtask vector g are fed into the lower-level policy. The output of the lower-level policy is a primitive dialog action. Similarly, for each subtask k, the dialog action set A_k can be divided into n slot-related action sets A_{k,j} (1 ≤ j ≤ n), e.g., request_slot_{k,j}, inform_slot_{k,j}, select_slot_{k,j}, and a slot-independent action set A_{k,0}, e.g., repeat_{k,0}, reqmore_{k,0}, ..., bye_{k,0}. The entire dialog action space A is the union of all subtask action spaces.
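This decomposition of the action space can be sketched as follows. The ontology shape and the slot-independent action names are illustrative (the text names repeat, reqmore, and bye among others), not the full PyDial action inventory:

```python
def build_action_space(ontology):
    """ontology maps each subtask k to its list of slot names. Returns the
    flat union A of all subtask action sets: three slot-related actions per
    slot (request/inform/select) plus slot-independent actions per subtask."""
    slot_independent = ["repeat", "reqmore", "bye"]  # illustrative subset
    actions = []
    for task, slots in ontology.items():
        for slot in slots:
            for act in ("request", "inform", "select"):
                actions.append(f"{act}_{task}_{slot}")
        for act in slot_independent:
            actions.append(f"{act}_{task}")
    return actions
```

For two subtasks with two and one slots respectively, this yields 9 + 6 = 15 primitive actions, illustrating how the action space grows with the number of subtasks.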
As shown in fig. 1, an embodiment of the present application provides a dialog method applied to a compound dialog task, including:
and S10, structuring the current dialogue confidence state to obtain an upper-layer structured dialogue state. Illustratively, the dialog state b (e.g., current dialog confidence state) is composed of K subtask-related states, and each subtask-related state may be further decomposed into several slot-related states and a logically inseparable slot-independent state, referred to as an atomic state. The hierarchical format of the dialog states may be viewed naturally as graphics. Each node in the graph represents a respective atomic state. To simplify the structure of the graph, nodes that are not associated with a slot are selected as delegates for nodes corresponding to the same subtask. All nodes not associated with a slot are interconnected in the upper level graph and nodes associated with a slot are only connected to their delegate node.
And S20, processing the upper-layer structured conversation state based on the first graph neural network to determine subtask information corresponding to the current conversation confidence state.
And S30, carrying out structuring processing on the subtask information and the current dialogue confidence state to obtain a bottom layer structured dialogue state.
Illustratively, unlike the input of the upper-level policy, the input of the lower-level policy adds a new node, named subtask node, to represent the target information generated by the upper-level policy. In the bottom graph, nodes that are not related to a slot are all connected to a subtask node (or global delegate node), rather than to each other.
S40, processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
The embodiment of the application provides a new framework, ComNet, which combines HDRL and GNN to solve the compound task while achieving sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.
Illustratively, two graph neural networks (e.g., the first graph neural network and the second graph neural network) are used in embodiments of the present application to parameterize the two-level policy. For ease of subsequent description and understanding, the following notation is first introduced: the graph structure is denoted G = (V, E), with nodes v_i ∈ V (0 ≤ i ≤ n) and directed edges e_ij ∈ E. The adjacency matrix Z represents the structure of G: the element z_ij of Z is 1 if there is a directed edge from the i-th node v_i to the j-th node v_j, otherwise z_ij is 0. The out-neighborhood of v_i is denoted N_out(v_i); similarly, N_in(v_i) denotes the in-neighborhood of v_i. Each node v_i has an associated node type p_i. Each edge e_ij has an edge type c_e determined by its start node type p_i and end node type p_j. In other words, two edges are of the same type if and only if both their start node types and their end node types are the same.
Fig. 2 is a schematic diagram illustrating the structured processing of a dialog state with the two-level policy in the present application. Fig. 2a corresponds to the upper-level policy, which has two node types: slot-related nodes (S nodes) and slot-independent nodes (I nodes). Since there are no edges between slot-related nodes, it has only four edge types. Similarly, Fig. 2b corresponds to the lower-level policy, which has three node types (slot-related, slot-independent, and subtask (T) nodes) and four edge types. The graphs for both the upper-level and lower-level policies are thus well defined. ComNet has two GNNs for parsing these graph-format observations of the upper-level and lower-level policies.
The input of the task-based dialog policy is the dialog state, and the dialog state of each single domain consists of two major types of features: slot-related dialog state features and slot-independent dialog state features. The slot-related features consist of features in one-to-one correspondence with the slots. For a compound dialog task comprising several sub-domains, the dialog state is composed of the dialog states of all sub-domains; the upper-level dialog policy selects, from this combined dialog state, the sub-domain that currently needs to be addressed, and the lower-level dialog policy then makes the dialog decision by combining the dialog state with the selected sub-domain. FIG. 2 shows the graphs formed by structuring the inputs of the upper-level and lower-level policies, where S nodes refer to slot-related features, I nodes to slot-independent features, and the T node to the representation of the currently selected sub-domain. FIG. 5 specifically shows the graph neural network structure of the lower-level policy model, which mainly comprises three parts: an input module, a graph-structure information extraction module, and an output module. All parameters are shared among nodes of the same type. Therefore, as long as the node types in the graph structure (slot-related nodes, slot-independent nodes, and domain feature nodes) remain unchanged, the number of model parameters remains unchanged.
Fig. 3 is a flowchart of another embodiment of the conversation method applied to a compound conversation task according to the present application. Specifically, the structured hierarchical conversation strategy model is mainly composed of two parts: upper level dialog policies and lower level dialog policies. The upper layer dialogue strategy is mainly used for specifying the dialogue sub-fields which need to be solved currently for the lower layer dialogue strategy, and the lower layer dialogue strategy is used for outputting dialogue actions by combining dialogue states and sub-field information. Both the process of structuring the upper dialog state and the structured lower dialog state have been described in detail in relation to the embodiment of fig. 2.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in fig. 4, an embodiment of the present application further provides a dialog system 400 applied to a compound dialog task, including:
a first structured processing program module 410, configured to perform structured processing on the current dialog confidence state to obtain an upper-layer structured dialog state;
a subtask information determination program module 420, configured to process the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
a second structured handler module 430, configured to perform structured processing on the subtask information and the current dialog confidence state to obtain a bottom-layer structured dialog state;
a dialog action determination program module 440 for processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
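The four modules above form a two-level decision pipeline per dialog turn. The following sketch is illustrative only: every function here is a toy stand-in (the real modules are graph neural networks), and the belief-state layout and scoring rules are assumptions made for the example.

```python
# Minimal, hypothetical sketch of the turn-level flow through modules 410-440.
def structure_upper(belief):
    # module 410: expose per-slot beliefs (S) and general features (I)
    return {"S": belief["slots"], "I": belief["general"]}

def structure_lower(belief, subtask):
    # module 430: same features plus a T component for the chosen subtask
    return {"S": belief["slots"], "I": belief["general"], "T": subtask}

def upper_policy(state):
    # module 420 stand-in: pick the subtask whose slots are least filled
    filled = {k: sum(v.values()) for k, v in state["S"].items()}
    return min(filled, key=filled.get)

def lower_policy(state):
    # module 440 stand-in: request the emptiest slot of the chosen subtask
    slots = state["S"][state["T"]]
    return ("request", state["T"], min(slots, key=slots.get))

def dialog_turn(belief):
    subtask = upper_policy(structure_upper(belief))
    return lower_policy(structure_lower(belief, subtask))

belief = {"slots": {"CR": {"food": 0.9, "area": 0.1},
                    "SFR": {"food": 0.0, "price": 0.2}},
          "general": {}}
action = dialog_turn(belief)  # expected: ("request", "SFR", "food")
```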
The embodiment of the application provides a new framework, ComNet, which combines HDRL and GNN to solve compound tasks with high sample efficiency. In addition, the method is more robust to environmental noise, and effective and accurate transfer can be performed.
FIG. 5 is a schematic diagram of an embodiment of the second graph neural network of the present application. The network has three parts for extracting useful representations from the initial graph-formatted observation: 1) an input module, 2) a graph information extraction module, and 3) an output module, introduced in turn below:

1) Input module:
Before each prediction, each node $v_i$ receives the corresponding atomic belief state $b$ or subtask information $g$ (denoted $x_i$), which is fed into the input module to obtain the initial state embedding $h^0_i$ as follows:

$h^0_i = F_{p_i}(x_i)$

where $F_{p_i}$ is a function of node type $p_i$, which may be a multilayer perceptron (MLP). Typically, different slots have different numbers of candidate values, so the raw input dimensions of slot-related nodes differ. However, the belief state of each slot is approximated by the probabilities of its top $M$ values, where $M$ is typically less than the minimum number of values over all slots. Thus, nodes of the same type have the same input dimension.
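A sketch of the input module, under stated assumptions: dimensions are toy values, the single ReLU layer stands in for the per-type MLP $F_{p_i}$, and the weights are random rather than learned. What it shows is the parameter-sharing rule: one embedding function per node *type*, reused by every node of that type.

```python
import numpy as np

def make_input_fn(rng, out_dim, in_dim):
    W = rng.standard_normal((out_dim, in_dim))
    return lambda x: np.maximum(0.0, W @ x)   # one ReLU layer as a stand-in MLP

rng = np.random.default_rng(0)
F = {"S": make_input_fn(rng, 4, 3),   # slot nodes: top-M belief (M = 3 here)
     "I": make_input_fn(rng, 4, 2),   # slot-independent features
     "T": make_input_fn(rng, 4, 2)}   # subtask indicator

def input_module(node_type, x):
    # h^0_i = F_{p_i}(x_i): every node of the same type reuses the same F
    return F[node_type](x)

h0 = input_module("S", np.array([0.7, 0.2, 0.1]))  # top-M belief of one slot
```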
2) Graph information extraction module:

The graph information extraction module takes $h^0_i$ as the initial embedding of node $v_i$, and then propagates higher-level embeddings of each node through the graph. Each extraction layer performs the following operations on the node embeddings.

Message computation: at step $l$, each node $v_i$ has an embedding $h^{l-1}_i$. For each outgoing node $v_j \in N_{out}(v_i)$, node $v_i$ computes a message vector as follows:

$m^l_{i \to j} = M^l_{c_e}(h^{l-1}_i)$

where $c_e$ is the type of the edge from node $v_i$ to node $v_j$, and $M^l_{c_e}$ is a message generation function, which may be a linear embedding: $M^l_{c_e}(h^{l-1}_i) = W^l_{c_e} h^{l-1}_i$. Note that the subscript $c_e$ indicates that edges of the same edge type share the learned weight matrix $W^l_{c_e}$.

Message aggregation: after every node has finished computing its messages, each node $v_j$ aggregates the messages from its incoming neighbors. Specifically, the aggregation process is:

$\bar{m}^l_j = A(\{m^l_{i \to j} \mid v_i \in N_{in}(v_j)\})$

where $A$ is an aggregation function, which may be a sum, an average, or a max-pooling function, and $\bar{m}^l_j$ is the aggregated message vector containing the information sent by all neighboring nodes.

Embedding update: at this point, each node $v_i$ holds two kinds of information, namely the aggregated message vector $\bar{m}^l_i$ and its current embedding vector $h^{l-1}_i$. The embedding update process is:

$h^l_i = U^l_{p_i}(h^{l-1}_i, \bar{m}^l_i)$

where $U^l_{p_i}$ is the update function for node type $p_i$ at the $l$-th extraction layer, which may be non-linear, i.e.

$h^l_i = \delta\big(W^l_{p_i}(h^{l-1}_i + \lambda^l \bar{m}^l_i)\big)$

where $\delta$ is an activation function such as ReLU, $\lambda^l$ is a weight parameter on the aggregated information, clipped to $[0, 1]$, and $W^l_{p_i}$ is a trainable matrix. Note that the subscript $p_i$ indicates that nodes of the same node type share the same instance of the update function, i.e., in our example they share the parameters $W^l_{p_i}$.
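One propagation step can be sketched as below. Assumptions: sum aggregation, the non-linear update $h^l_i = \mathrm{ReLU}(W_{p_i}(h^{l-1}_i + \lambda \bar{m}^l_i))$, toy two-dimensional embeddings, and identity matrices in place of learned weights, so the numbers are checkable by hand rather than meaningful.

```python
import numpy as np

def propagate(h, edges, W_edge, W_node, node_type, lam=0.5):
    """One extraction layer: typed messages, sum aggregation, typed update."""
    # message computation + sum aggregation: W shared per *edge type*
    agg = {i: np.zeros_like(v) for i, v in h.items()}
    for src, dst, etype in edges:
        agg[dst] += W_edge[etype] @ h[src]
    # embedding update: W shared per *node type*
    return {i: np.maximum(0.0, W_node[node_type[i]] @ (h[i] + lam * agg[i]))
            for i in h}

I2 = np.eye(2)
h0 = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
h1 = propagate(h0, [(0, 1, "S-I")], {"S-I": I2}, {"S": I2, "I": I2},
               {0: "S", 1: "I"})
# node 1 receives node 0's message: ReLU([0,1] + 0.5*[1,0]) = [0.5, 1.0]
```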
3) Output module:

After $L$ steps of node embedding updates, each node $v_i$ has a final representation $h^L_i$, also denoted $h^L_{k,i}$, where the subscript $(k, i)$ indicates that node $v_i$ corresponds to subtask $k$.

Upper-layer output: the upper-layer policy aims to predict the subtask to be executed. In the top-level graph, each subtask corresponds to several S nodes and one I node. Therefore, all final embeddings of the nodes related to a subtask are used when computing that subtask's Q-value. Specifically, for each subtask $k$ we compute:

$q^{top}_k = O^{top}\big(\sum_{v_i \in S\text{-node}} h^L_{k,i},\; h^L_{k,0}\big)$

where $O^{top}$ is an output function that may be an MLP, and the subscripts $(k, 0)$ and $(k, i)$ denote the I node and the $i$-th S node of subtask $k$, respectively. In practice, we use the concatenation of $\sum_{v_i \in S\text{-node}} h^L_{k,i}$ and $h^L_{k,0}$ as the input of the MLP, which outputs a scalar value. This MLP is shared across all subtasks. When making a decision, all $q^{top}_k$ are concatenated, i.e.

$q^{top} = \oplus_k\, q^{top}_k$

and a subtask is then selected according to $q^{top}$.
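The upper-layer scoring can be sketched as follows. This is an illustrative reconstruction under assumptions: a dot product with a fixed vector stands in for the shared MLP $O^{top}$, embeddings are one-dimensional toys, and the subtask names are examples.

```python
import numpy as np

def upper_q(final_emb, w_top):
    """final_emb: {subtask: (h_I, [h_S, ...])} final-layer embeddings.
    Returns one scalar per subtask, all scored by the *same* function."""
    q = {}
    for k, (h_i, h_s_list) in final_emb.items():
        # sum the S-node embeddings, concatenate with the I-node embedding
        feat = np.concatenate([np.sum(h_s_list, axis=0), h_i])
        q[k] = float(w_top @ feat)  # shared scorer stands in for O_top
    return q

emb = {"CR":  (np.array([1.0]), [np.array([0.2]), np.array([0.3])]),
       "SFR": (np.array([0.0]), [np.array([0.1]), np.array([0.1])])}
q_top = upper_q(emb, np.array([1.0, 1.0]))
best = max(q_top, key=q_top.get)  # subtask selected from concatenated q_top
```

Because the scorer is shared, a new subtask only adds nodes, not parameters.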
Bottom-layer output: the lower-layer policy aims to predict the primitive dialog action. Each primitive dialog action corresponds to a subtask; if we treat the slot-independent node as a special type of slot-related node, each primitive dialog action further corresponds to a slot node. Thus, the Q-value of each dialog action combines three pieces of information: a subtask-level value, a slot-level value, and a primitive value. We use the T-node embedding $h^L_T$ to compute the subtask-level value:

$q^{sub} = O^{sub}(h^L_T)$

where $O^{sub}$ is the output function of the subtask-level value, which may be an MLP, and $q^{sub}$ is a $K$-dimensional vector in which each value is assigned to the corresponding subtask.

For each node $v_i$ belonging to the S nodes and the I node, the slot-level value and the primitive value are computed:

$q^{slot}_{k,i} = O^{slot}_{p_i}(h^L_{k,i})$
$q^{prim}_{k,i} = O^{prim}_{p_i}(h^L_{k,i})$

where $O^{slot}_{p_i}$ and $O^{prim}_{p_i}$ are the output functions of the slot-level value and the primitive value, respectively, which may in practice be MLPs. Similarly, the subscript $p_i$ indicates that nodes of the same node type share the same instance of the output function. The Q-value of the action $a_{k,i}$ corresponding to slot node $v_i$ is

$q^{low}_{k,i} = q^{sub}_k + q^{slot}_{k,i} + q^{prim}_{k,i}$

where $+$ is an element-wise operation and $q^{sub}_k$ denotes the $k$-th value of $q^{sub}$. When predicting the action, all $q^{low}_{k,i}$ are concatenated, i.e.

$q^{low} = \oplus_{k,i}\, q^{low}_{k,i}$

and the primitive action is then selected according to $q^{low}$.
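The three-part Q-value composition reduces to a broadcast sum, sketched below with toy numbers. The only substantive point is that the subtask-level scalar $q^{sub}_k$ broadcasts over the slot-level and primitive-level vectors (the element-wise "+" in the text).

```python
import numpy as np

def low_q(q_sub_k, q_slot, q_prim):
    # q^low_{k,i} = q^sub_k + q^slot_{k,i} + q^prim_{k,i}
    # the subtask-level scalar broadcasts over the per-action vectors
    return q_sub_k + q_slot + q_prim

# toy values: subtask scalar 0.5, two primitive actions under this slot node
q = low_q(0.5, np.array([1.0, -1.0]), np.array([0.1, 0.2]))
# 0.5 + [1.0, -1.0] + [0.1, 0.2] = [1.6, -0.3]
```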
Although the parameters of the input module and the graph information extraction module are not shared between the upper and lower GNNs, there are many shared parameters within each GNN. Suppose the compound task is modified so that one subtask adds some new slots; we then only need to create new nodes in each GNN. If the set of edge types is unchanged, the parameters of the GNN remain unchanged after the new nodes are added. This property of ComNet yields transferability. In general, if both the node type set and the edge type set of a compound Task1 are subsets of those of another Task2, the ComNet policy learned on Task2 can be used directly on Task1.
Since the initial inputs of nodes of the same type have similar semantic meaning, those nodes share parameters in ComNet. We use GNNs to propagate the relationships between nodes in the graph, connecting the initial inputs to the final outputs.
The main contributions of the present application are three-fold:
1. A new framework, ComNet, is proposed, which combines HDRL and GNN to solve compound tasks with high sample efficiency;
2. ComNet was tested on the PyDial benchmarks, and the results surpass the vanilla HDRL system and are more robust to environmental noise;
3. The transferability of the ComNet framework was tested, showing that effective and accurate transfer can be performed under this framework.
To verify the effects achieved by the present application, the inventors conducted the following experiments:
First, the effectiveness of ComNet is verified on the PyDial benchmark compound tasks. Then, the transferability of ComNet is investigated.
PyDial benchmark: evaluating our target framework requires a compound dialog simulation environment. The PyDial toolkit supports multi-domain dialog simulation with an error model, which lays a good foundation for constructing the compound-task environment.

We modified the policy management module and the user simulation module to support two-subtask compound dialog simulations over three available subtasks: the Cambridge Restaurant task (CR), the San Francisco Restaurant task (SFR), and a laptop shopping task (LAP), while preserving the different error-simulation levels of all functions (Table 1). Note that in the policy management module we discard the domain input provided by the Dialog State Tracking (DST) module for a fair comparison. We also updated the user simulation module and the evaluation management module to support the reward design.
Experiment implementation:
We implemented the following three multitask agents to evaluate the performance of the proposed framework.

Vanilla HDQN: a hierarchical agent using MLPs as its models. This is the baseline we compare against.

ComNet: our target framework, which exploits the flexibility of GNNs.

Handcrafted: a well-designed rule-based agent with a high success rate in noiseless compound dialogs. This agent is also used to warm up the training of the first two agents. Note that this agent uses the exact subtask information provided by the DST, which is unfair compared with the other two agents.
Here, each model is trained for 6,000 dialogs (iterations). The total number of training dialogs is broken into stages (30 stages in total, each containing 200 dialogs). After each stage, 100 dialogs are used to test the performance of the dialog policy. The results for the 3 compound tasks in 3 environments over the 6,000 training dialogs are shown in fig. 6.
From fig. 6 we can observe that ComNet outperforms the vanilla MLP policy in all nine settings (3 environments × 3 types of compound task) in terms of both success rate and learning speed. In ComNet, both the upper- and lower-layer policies are represented by GNNs, where nodes of the same type and edges of the same type share parameters. This means that nodes of the same type share an input space (belief state space), which greatly reduces the exploration space. As shown in fig. 6, ComNet learns faster than the vanilla MLP policy.
Note that the handcrafted agent performs well because it cheats by looking at the exact subtask information, which means the handcrafted agent is in effect solving a multi-domain task; it should be regarded as the upper bound of our model's performance. Compared with vanilla HDQN, ComNet shows its robustness in all environments by a large margin, which facilitates building dialog systems without high-precision ASR or DST.

We also compared the dialogs generated by vanilla HDQN and ComNet after 6,000 training dialogs. Even after extensive training, the vanilla HDQN agent still cannot select the appropriate action in some specific dialogs, which can lead to user frustration. ComNet, on the other hand, advances the progress of the conversation as soon as the required information is obtained, and thus completes the task successfully. This also demonstrates that ComNet is more sample-efficient than the vanilla framework.
Investigating the transferability of ComNet: as discussed in the previous embodiments, another advantage of ComNet is that it transfers naturally due to the flexibility of GNNs.
To evaluate its transferability, we first trained for 6,000 dialogs on the CR+SFR task. We then initialized the policy models on the other two compound tasks with the trained policy, and continued training and testing. The results are shown in FIG. 7.

We find that the model transferred from the CR+SFR task adapts well to the other two compound tasks. This shows that ComNet propagates task-independent relationships between graph nodes based on the connection between initial node inputs and final outputs, and that the training process for a new compound task can be accelerated by reusing related task parameters trained in advance under the ComNet framework. This is crucial for solving the cold-start problem in task-oriented dialog systems.
In this application, we propose ComNet, a structured hierarchical dialog policy represented by two Graph Neural Networks (GNNs). By replacing the MLPs in the traditional HDRL method, ComNet makes better use of the structural information of the dialog state: the observation (dialog state) and the upper-layer decision are fed to the slot-related and slot-independent nodes, respectively, and messages are exchanged between these nodes. We evaluated our framework on a modified PyDial benchmark and showed high efficiency, robustness, and transferability in all settings.
In some embodiments, the present application provides a non-transitory computer readable storage medium, in which one or more programs including executable instructions are stored, and the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any of the above-described dialog methods applied to a composite dialog task.
In some embodiments, the present application further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above-described dialog methods applied to a compound dialog task.
In some embodiments, the present application further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a dialog method applied to a composite dialog task.
In some embodiments, the present application further provides a storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a dialog method applied to a compound dialog task.
The dialog system applied to the composite dialog task in the embodiment of the present application may be configured to execute the dialog method applied to the composite dialog task in the embodiment of the present application, and accordingly achieve the technical effect achieved by the implementation of the dialog method applied to the composite dialog task in the embodiment of the present application, and details are not described here. In the embodiment of the present application, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 8 is a schematic hardware structure diagram of an electronic device for executing a dialog method applied to a compound dialog task according to another embodiment of the present application, where, as shown in fig. 8, the device includes:
one or more processors 810 and a memory 820, with one processor 810 being an example in FIG. 8.
The apparatus for performing a dialog method applied to a compound dialog task may further include: an input device 830 and an output device 840.
The processor 810, the memory 820, the input device 830, and the output device 840 may be connected by a bus or other means, such as the bus connection in fig. 8.
The memory 820, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the dialog methods applied to the compound dialog task in the embodiments of the present application. The processor 810 executes various functional applications of the server and data processing, i.e., implementing a dialog method in which the above-described method embodiments are applied to a composite dialog task, by executing nonvolatile software programs, instructions, and modules stored in the memory 820.
The memory 820 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the conversation device applied to the compound conversation task, and the like. Further, the memory 820 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 820 may optionally include memory located remotely from the processor 810, which may be connected via a network to the conversation devices applied to the composite conversation task. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 830 may receive input numeric or character information and generate signals related to user settings and function control of the dialog device applied to the composite dialog task. The output device 840 may include a display device such as a display screen.
The one or more modules are stored in the memory 820 and, when executed by the one or more processors 810, perform the dialog method applied to the composite dialog task in any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones (e.g., iphones), multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as ipads.
(3) Portable entertainment devices such devices may display and play multimedia content. Such devices include audio and video players (e.g., ipods), handheld game consoles, electronic books, as well as smart toys and portable car navigation devices.
(4) The server is similar to a general computer architecture, but has higher requirements on processing capability, stability, reliability, safety, expandability, manageability and the like because of the need of providing highly reliable services.
(5) And other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions substantially or contributing to the related art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A conversation method applied to a compound conversation task, comprising:
structuring the current conversation confidence state to obtain an upper-layer structured conversation state;
processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
structuring the subtask information and the current dialogue confidence state to obtain a bottom-layer structured dialogue state;
processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
2. The method of claim 1, wherein the processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state comprises:
for each subtask k, performing the following calculation:

$q^{top}_k = O^{top}\big(\sum_{v_i \in S\text{-node}} h^L_{k,i},\; h^L_{k,0}\big)$

wherein $O^{top}$ is an output function, which may be an MLP; subscripts $(k,0)$ and $(k,i)$ denote the I node and the $i$-th S node of subtask $k$, respectively; $v_i$ denotes a node, $L$ denotes the number of node embedding update steps, $h^L_{k,i}$ is the representation of node $v_i$, and $h^L_{k,0}$ is the representation of the I node;

using the concatenation of $\sum_{v_i \in S\text{-node}} h^L_{k,i}$ and $h^L_{k,0}$ as the input of the MLP and outputting a scalar value;

when making a decision, concatenating all $q^{top}_k$, i.e. $q^{top} = \oplus_k\, q^{top}_k$, and then selecting a subtask according to $q^{top}$.
3. The method of claim 1, wherein the processing the underlying structured conversation state based on a second graph neural network to determine a conversation action corresponding to the current conversation confidence state comprises:
the Q value of each dialog action contains three pieces of information: a subtask-level value, a slot-level value, and a primitive value;

using the T-node embedding $h^L_T$ to calculate the subtask-level value:

$q^{sub} = O^{sub}(h^L_T)$

wherein $O^{sub}$ is the output function of the subtask-level value, and $q^{sub}$ is a $K$-dimensional vector in which each value is assigned to a respective subtask;

for each node $v_i$ belonging to the S nodes and the I node, calculating the slot-level value and the primitive value:

$q^{slot}_{k,i} = O^{slot}_{p_i}(h^L_{k,i})$
$q^{prim}_{k,i} = O^{prim}_{p_i}(h^L_{k,i})$

wherein $O^{slot}_{p_i}$ and $O^{prim}_{p_i}$ are the output functions of the slot-level value and the primitive value, respectively, and the subscript $p_i$ indicates that nodes of the same node type share the same instance of the output function; the Q value of the action $a_{k,i}$ corresponding to slot node $v_i$ is

$q^{low}_{k,i} = q^{sub}_k + q^{slot}_{k,i} + q^{prim}_{k,i}$

wherein $+$ is an element-wise operation and $q^{sub}_k$ denotes the $k$-th value of $q^{sub}$;

when predicting the action, concatenating all $q^{low}_{k,i}$, i.e. $q^{low} = \oplus_{k,i}\, q^{low}_{k,i}$, and then selecting a dialog action according to $q^{low}$.
4. The method of claim 1, wherein processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state further comprises:
an input preprocessing step: before each prediction, each node $v_i$ receives the corresponding atomic state $b$ or subtask information $g$, denoted $x_i$; the state embedding obtained after preprocessing is:

$h^0_i = F_{p_i}(x_i)$

wherein $F_{p_i}$ is a function of node type $p_i$;

a graph information extraction step: taking $h^0_i$ as the initial embedding of node $v_i$, and then propagating higher-level embeddings of each node through the graph; the propagation process of the node embeddings at each extraction layer is as follows:

message calculation: each node $v_i$ has an embedding $h^{l-1}_i$; for each outgoing node $v_j \in N_{out}(v_i)$, node $v_i$ calculates a message vector as follows:

$m^l_{i \to j} = M^l_{c_e}(h^{l-1}_i)$

wherein $c_e$ is the type of the edge from node $v_i$ to node $v_j$, and $M^l_{c_e}$ is a message generation function, which may be a linear embedding: $M^l_{c_e}(h^{l-1}_i) = W^l_{c_e} h^{l-1}_i$;

message aggregation: the aggregation process is

$\bar{m}^l_j = A(\{m^l_{i \to j} \mid v_i \in N_{in}(v_j)\})$

wherein $A$ is an aggregation function and $\bar{m}^l_j$ is the aggregated message vector;

embedding update: at this point, each node $v_i$ has two kinds of information, namely the aggregated message vector $\bar{m}^l_i$ and its current embedding vector $h^{l-1}_i$; the embedding update process is:

$h^l_i = U^l_{p_i}(h^{l-1}_i, \bar{m}^l_i)$

wherein $U^l_{p_i}$ is the update function of node type $p_i$ at the $l$-th abstraction layer, which may be a non-linear operation:

$h^l_i = \delta\big(W^l_{p_i}(h^{l-1}_i + \lambda^l \bar{m}^l_i)\big)$

wherein $\delta$ is an activation function, $\lambda^l$ is a weight parameter of the aggregated information, and $W^l_{p_i}$ is a trainable matrix.
5. A dialog system for application to a compound dialog task, comprising:
the first structuralization processing program module is used for structuralizing the current conversation confidence state to obtain an upper-layer structuralization conversation state;
a subtask information determination program module for processing the upper-level structured dialog state based on a first graph neural network to determine subtask information corresponding to the current dialog confidence state;
the second structured processing program module is used for carrying out structured processing on the subtask information and the current conversation confidence state so as to obtain a bottom-layer structured conversation state;
a dialog action determination program module for processing the underlying structured dialog state based on a second graph neural network to determine a dialog action corresponding to the current dialog confidence state.
6. The system of claim 5, wherein the processing the upper-level structured dialog state based on the first graph neural network to determine subtask information corresponding to the current dialog confidence state comprises:
for each subtask k, performing the following calculation:

$q^{top}_k = O^{top}\big(\sum_{v_i \in S\text{-node}} h^L_{k,i},\; h^L_{k,0}\big)$

wherein $O^{top}$ is an output function, which may be an MLP; subscripts $(k,0)$ and $(k,i)$ denote the I node and the $i$-th S node of subtask $k$, respectively; $v_i$ denotes a node, $L$ denotes the number of node embedding update steps, $h^L_{k,i}$ is the representation of node $v_i$, and $h^L_{k,0}$ is the representation of the I node;

using the concatenation of $\sum_{v_i \in S\text{-node}} h^L_{k,i}$ and $h^L_{k,0}$ as the input of the MLP and outputting a scalar value;

when making a decision, concatenating all $q^{top}_k$, i.e. $q^{top} = \oplus_k\, q^{top}_k$, and then selecting a subtask according to $q^{top}$.
7. The system of claim 5, wherein the processing the underlying structured conversation state based on a second graph neural network to determine the conversation action corresponding to the current conversation confidence state comprises:
the Q value of each dialog action contains three pieces of information: a subtask-level value, a slot-level value, and a primitive value;

using the T-node embedding $h^L_T$ to calculate the subtask-level value:

$q^{sub} = O^{sub}(h^L_T)$

wherein $O^{sub}$ is the output function of the subtask-level value, and $q^{sub}$ is a $K$-dimensional vector in which each value is assigned to a respective subtask;

for each node $v_i$ belonging to the S nodes and the I node, calculating the slot-level value and the primitive value:

$q^{slot}_{k,i} = O^{slot}_{p_i}(h^L_{k,i})$
$q^{prim}_{k,i} = O^{prim}_{p_i}(h^L_{k,i})$

wherein $O^{slot}_{p_i}$ and $O^{prim}_{p_i}$ are the output functions of the slot-level value and the primitive value, respectively, and the subscript $p_i$ indicates that nodes of the same node type share the same instance of the output function; the Q value of the action $a_{k,i}$ corresponding to slot node $v_i$ is

$q^{low}_{k,i} = q^{sub}_k + q^{slot}_{k,i} + q^{prim}_{k,i}$

wherein $+$ is an element-wise operation and $q^{sub}_k$ denotes the $k$-th value of $q^{sub}$;

when predicting the action, concatenating all $q^{low}_{k,i}$, i.e. $q^{low} = \oplus_{k,i}\, q^{low}_{k,i}$, and then selecting a dialog action according to $q^{low}$.
8. The system of claim 5, wherein processing the underlying structured conversation state based on a second graph neural network to determine a conversation action corresponding to the current conversation confidence state further comprises:
an input preprocessing step:
before each prediction, each node $v_i$ receives the corresponding atomic state $b$ or subtask information $g$, denoted $x_i$; the state embedding obtained after preprocessing is as follows:
$$h_i^0 = F_{p_i}(x_i)$$
wherein $F_{p_i}$ is a function determined by the node type $p_i$;
a graph information extraction step:
taking $h_i^0$ as the initial embedding of node $v_i$;
then further propagating higher-level embeddings of each node in the graph;
the propagation process of node embeddings at each extraction layer $l$ is as follows:
message calculation: each node $v_i$ has an embedding $h_i^{l-1}$; for each outgoing node $v_j \in N_{out}(v_i)$, node $v_i$ computes a message vector as follows:
$$m_{ij}^l = M_{c_e}^l\left(h_i^{l-1}\right)$$
wherein $c_e$ is the type of the edge from node $v_i$ to node $v_j$, and $M_{c_e}^l$ is a message generating function, which may be a linear embedding:
$$M_{c_e}^l\left(h_i^{l-1}\right) = W_{c_e}^l h_i^{l-1}$$
message aggregation: the aggregation process is as follows:
$$m_i^l = A\left(\left\{\, m_{ji}^l \mid v_j \in N_{in}(v_i) \,\right\}\right)$$
wherein $A$ is an aggregation function and $m_i^l$ is the aggregated message vector;
embedding update: at this point, each node $v_i$ holds two kinds of information, namely the aggregated message vector $m_i^l$ and its current embedding vector $h_i^{l-1}$; the embedding update process is as follows:
$$h_i^l = U_{p_i}^l\left(h_i^{l-1}, m_i^l\right)$$
wherein $U_{p_i}^l$ is the update function for node type $p_i$ at the $l$-th abstraction layer, which may be a non-linear operation:
$$h_i^l = \delta\left(W_{p_i}^l h_i^{l-1} + \lambda^l m_i^l\right)$$
wherein $\delta$ is an activation function, $\lambda^l$ is a weight parameter for the aggregated information, and $W_{p_i}^l$ is a trainable matrix.
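One propagation layer of the scheme in claim 8 (type-specific linear messages, sum aggregation, type-specific non-linear update) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the edge types, node types, ReLU as the activation $\delta$, and summation as the aggregation function $A$ are choices made here for concreteness, not fixed by the claim.

```python
import numpy as np

def gnn_layer(h, edges, W_msg, W_upd, node_types, lam=0.5):
    """One embedding-propagation layer:
    message:    m_ij = W_msg[c_e] @ h_i   for each edge (i, j) of type c_e
    aggregation: m_i = sum of messages arriving at node i   (A = sum)
    update:     h_i' = relu(W_upd[p_i] @ h_i + lam * m_i)   (delta = ReLU)
    Shapes and parameter names are assumptions for illustration."""
    n, d = h.shape
    agg = np.zeros_like(h)
    for i, j, c_e in edges:              # messages flow from v_i to v_j
        agg[j] += W_msg[c_e] @ h[i]      # sum aggregation of incoming messages
    h_next = np.empty_like(h)
    for i in range(n):
        p_i = node_types[i]              # node-type-specific update function
        h_next[i] = np.maximum(0.0, W_upd[p_i] @ h[i] + lam * agg[i])
    return h_next

rng = np.random.default_rng(0)
d = 4
h0 = rng.normal(size=(3, d))                           # initial embeddings h^0
edges = [(0, 1, "parent"), (1, 2, "child"), (2, 0, "parent")]
W_msg = {t: rng.normal(size=(d, d)) for t in ("parent", "child")}
W_upd = {p: rng.normal(size=(d, d)) for p in (0, 1)}
h1 = gnn_layer(h0, edges, W_msg, W_upd, node_types=[0, 1, 1])
```

Stacking several such layers yields the higher-level node embeddings from which the dialog actions are predicted.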
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-4.
10. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN201910720620.5A 2019-08-06 2019-08-06 Conversation method and system applied to compound conversation task Active CN110443355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910720620.5A CN110443355B (en) 2019-08-06 2019-08-06 Conversation method and system applied to compound conversation task


Publications (2)

Publication Number Publication Date
CN110443355A CN110443355A (en) 2019-11-12
CN110443355B true CN110443355B (en) 2021-11-16

Family

ID=68433435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910720620.5A Active CN110443355B (en) 2019-08-06 2019-08-06 Conversation method and system applied to compound conversation task

Country Status (1)

Country Link
CN (1) CN110443355B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826700B (en) * 2019-11-13 2021-04-23 中国科学技术大学 Method for realizing and classifying bilinear graph neural network model for modeling neighbor interaction
CN111581534B (en) * 2020-05-22 2022-12-13 哈尔滨工程大学 Rumor propagation tree structure optimization method based on consistency of vertical place
CN112860869B (en) * 2021-03-11 2023-02-03 中国平安人寿保险股份有限公司 Dialogue method, device and storage medium based on hierarchical reinforcement learning network
CN114418119A (en) * 2022-01-21 2022-04-29 深圳市神州云海智能科技有限公司 Dialogue strategy optimization method and system based on structure depth embedding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928753B2 (en) * 2006-11-08 2018-03-27 Cricket Media, Inc. Dynamic characterization of nodes in a semantic network for desired functions such as search, discovery, matching, content delivery, and synchronization of activity and information
CN108962238A (en) * 2018-04-25 2018-12-07 苏州思必驰信息科技有限公司 Dialogue method, system, equipment and storage medium based on structural neural networks
US10152970B1 (en) * 2018-02-08 2018-12-11 Capital One Services, Llc Adversarial learning and generation of dialogue responses
CN109446306A (en) * 2018-10-16 2019-03-08 浪潮软件股份有限公司 Task-driven multi-turn dialogue-based intelligent question and answer method
WO2019065647A1 (en) * 2017-09-28 2019-04-04 株式会社東芝 Interactive processing device and interactive processing system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129601B2 (en) * 2008-11-26 2015-09-08 At&T Intellectual Property I, L.P. System and method for dialog modeling
US8395408B2 (en) * 2010-10-29 2013-03-12 Regents Of The University Of California Homogeneous dual-rail logic for DPA attack resistive secure circuit design
US10535346B2 (en) * 2017-12-07 2020-01-14 Ca, Inc. Speech processing computer system forming collaborative dialog data structures


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning; Lu Chen et al.; IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2019-05-30; Vol. 27, No. 9; full text *
A Survey of Knowledge Reasoning Based on Neural Networks; Zhang Zhongwei et al.; Computer Engineering and Applications; 2019-03-25; Vol. 55, No. 12; full text *

Also Published As

Publication number Publication date
CN110443355A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110443355B (en) Conversation method and system applied to compound conversation task
EP3446260B1 (en) Memory-efficient backpropagation through time
CN111465946B (en) Neural Network Architecture Search Using Hierarchical Representations
CN108962238A (en) Dialogue method, system, equipment and storage medium based on structural neural networks
O'Sullivan Complexity science and human geography
Wang et al. Adaptive and large-scale service composition based on deep reinforcement learning
WO2019228232A1 (en) Method for sharing knowledge between dialog systems, and dialog method and apparatus
Hazra et al. Applications of game theory in deep learning: a survey
CN106471525A Augmenting neural networks to generate additional outputs
Wang et al. Adaptive and dynamic service composition via multi-agent reinforcement learning
Moradi et al. Collective hybrid intelligence: towards a conceptual framework
CN109661672A Using reinforcement learning with external-memory-augmented neural networks
Gym et al. Deep reinforcement learning with python
CN114398556A (en) Learning content recommendation method, device, equipment and storage medium
WO2024120504A1 (en) Data processing method and related device
CN117575008A (en) Training sample generation method, model training method, knowledge question-answering method and knowledge question-answering device
CN112541570A (en) Multi-model training method and device, electronic equipment and storage medium
US20210142180A1 (en) Feedback discriminator
CN109299231A Dialogue state tracking method, system, electronic device and storage medium
US20220198217A1 (en) Model parallel training technique for neural architecture search
CN106878403A (en) Based on the nearest heuristic service combining method explored
Evans et al. A unified model of learning to forecast
Le et al. Generating predictable and adaptive dialog policies in single-and multi-domain goal-oriented dialog systems
Ganesh et al. Machine learning and logic: a new frontier in artificial intelligence
CN110096583B (en) Multi-field dialogue management system and construction method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200617

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: AI SPEECH Ltd.

Applicant after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Ltd.

Applicant before: SHANGHAI JIAO TONG University

TA01 Transfer of patent application right

Effective date of registration: 20201023

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: AI SPEECH Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Ltd.

Applicant before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

CB02 Change of applicant information

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Applicant before: AI SPEECH Ltd.

GR01 Patent grant