CN110443355A

CN110443355A - Dialogue method and system applied to compound conversation tasks

Info

Publication number: CN110443355A
Application number: CN201910720620.5A
Authority: CN
Inventors: 俞凯; 陈志�
Original assignee: Shanghai Jiaotong University; AI Speech Ltd
Current assignee: AI Speech Ltd
Priority date: 2019-08-06
Filing date: 2019-08-06
Publication date: 2019-11-12
Anticipated expiration: 2039-08-06
Also published as: CN110443355B

Abstract

The application discloses a dialogue method applied to composite dialogue tasks, comprising: performing structuring processing on a current dialogue belief state to obtain a top-level structured dialogue state; processing the top-level structured dialogue state with a first graph neural network to determine subtask information corresponding to the current dialogue belief state; performing structuring processing on the subtask information and the current dialogue belief state to obtain a low-level structured dialogue state; and processing the low-level structured dialogue state with a second graph neural network to determine a dialogue action corresponding to the current dialogue belief state. The embodiments of the application combine hierarchical deep reinforcement learning (HDRL) with graph neural networks (GNNs) to solve composite tasks while achieving sample efficiency. The resulting policy is also more robust to environmental noise and can be transferred effectively and accurately.

Description

Dialogue method and system applied to compound conversation tasks

Technical field

The present application relates to the field of artificial intelligence, and in particular to a dialogue method and system applied to composite dialogue tasks.

Background art

Composite tasks differ from multi-domain dialogue tasks. The latter are often discussed in papers concerned with transfer learning. In most cases, a multi-domain dialogue task involves only one domain per dialogue, and the performance of this single-domain model is tested in different domains to highlight its transferability. By contrast, a composite dialogue task may involve multiple domains in a single dialogue, and the agent must complete all subtasks (that is, achieve the goals in all domains) to obtain positive feedback.

Consider the process of completing a composite task (for example, multi-domain restaurant reservation). The agent first selects a subtask (for example, reserving a Cambridge restaurant), then makes a sequence of decisions to collect the relevant information (for example, price range and area) until all the information the user requires has been provided and the subtask is completed, and then selects the next subtask (for example, reserve-SF-restaurant) to complete. The state-action space grows with the number of subtasks. Learning a dialogue policy for a composite task therefore requires more exploration, and more dialogue turns between the agent and the user are needed to complete the task. The sparse-reward problem is further amplified.

Solving a composite task with the same method used for single-domain tasks may run into obstacles. The complexity of composite tasks makes it hard for the agent to learn an acceptable policy. In the prior art, a multilayer perceptron (MLP) is commonly used in DQN to estimate Q values. The MLP takes the flat concatenation of the dialogue state as its input, so it cannot easily capture the structural information of the semantic slots in that state, which leads to low sample efficiency. The present application proposes ComNet, which uses graph neural networks (GNNs) to make better use of the graph structure in the observations (for example, the dialogue state) and is compatible with hierarchical deep reinforcement learning (HDRL) methods.

Summary of the invention

The embodiments of the present application provide a dialogue method and system applied to composite dialogue tasks, which solve at least one of the above technical problems.

In a first aspect, an embodiment of the present application provides a dialogue method applied to composite dialogue tasks, comprising:

performing structuring processing on a current dialogue belief state to obtain a top-level structured dialogue state;

processing the top-level structured dialogue state with a first graph neural network to determine subtask information corresponding to the current dialogue belief state;

performing structuring processing on the subtask information and the current dialogue belief state to obtain a low-level structured dialogue state; and

processing the low-level structured dialogue state with a second graph neural network to determine a dialogue action corresponding to the current dialogue belief state.

In a second aspect, an embodiment of the present application provides a dialogue system applied to composite dialogue tasks, comprising:

a first structuring processing program module, configured to perform structuring processing on a current dialogue belief state to obtain a top-level structured dialogue state;

a subtask information determination program module, configured to process the top-level structured dialogue state with a first graph neural network to determine subtask information corresponding to the current dialogue belief state;

a second structuring processing program module, configured to perform structuring processing on the subtask information and the current dialogue belief state to obtain a low-level structured dialogue state; and

a dialogue action determination program module, configured to process the low-level structured dialogue state with a second graph neural network to determine a dialogue action corresponding to the current dialogue belief state.

In a third aspect, an embodiment of the present application provides a storage medium storing one or more programs that include execution instructions. The execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server or a network device) to carry out any of the above dialogue methods applied to composite dialogue tasks of the present application.

In a fourth aspect, an electronic device is provided, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can carry out any of the above dialogue methods applied to composite dialogue tasks of the present application.

In a fifth aspect, an embodiment of the present application further provides a computer program product comprising a computer program stored on a storage medium. The computer program includes program instructions which, when executed by a computer, cause the computer to carry out any of the above dialogue methods applied to composite dialogue tasks.

The beneficial effect of the embodiments of the present application is that they combine HDRL and GNNs to solve composite tasks while achieving sample efficiency. The resulting policy is also more robust to environmental noise and can be transferred effectively and accurately.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of an embodiment of the dialogue method applied to composite dialogue tasks of the present application;

Fig. 2 is a schematic diagram of the structuring processing applied to the dialogue state under the two-level policy of the present application;

Fig. 3 is a flowchart of another embodiment of the dialogue method applied to composite dialogue tasks of the present application;

Fig. 4 is a functional block diagram of an embodiment of the dialogue system applied to composite dialogue tasks of the present application;

Fig. 5 is a schematic structural diagram of an embodiment of the second graph neural network of the present application;

Fig. 6 is a chart of the performance comparison experiment for the three agents of the present application;

Fig. 7 is a chart of the comparison experiment between the model pre-trained on the CR+SFR task and the model with random parameters in the present application;

Fig. 8 is a schematic structural diagram of an embodiment of the electronic device of the present application.

Detailed description of the embodiments

To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.

It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application may also be practised in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

In the present application, terms such as "module", "device" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program and/or a computer. An application or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate by way of local and/or remote processes in accordance with a signal having one or more data packets, for example a signal from data that interacts with another element in a local system or a distributed system, and/or interacts with other systems by way of the signal across a network such as the Internet.

Finally, it should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

Training dialogue policies for composite tasks, for example restaurant reservation across multiple locations, is a practically important and challenging problem. Recently, hierarchical deep reinforcement learning (HDRL) methods have achieved good performance on composite tasks. However, in vanilla HDRL both the top-level policy and the low-level policy are represented by multilayer perceptrons (MLPs), which take the concatenation of all observations from the environment as the input for predicting actions. Traditional HDRL methods therefore suffer from low sample efficiency and poor transferability.

The present application addresses these problems by exploiting the flexibility of graph neural networks (GNNs). A novel framework, ComNet, is proposed to model the structure of a hierarchical agent. The performance of ComNet was tested on the composite tasks of the PyDial benchmark. Experiments show that ComNet outperforms vanilla HDRL systems, with performance close to the upper bound. It not only achieves sample efficiency, but is also more robust to noise while retaining transferability to other composite tasks.

The main contributions of the present application are threefold:

1. A new framework, ComNet, is proposed, which combines HDRL and GNNs to solve composite tasks while achieving sample efficiency;

2. ComNet is tested on the PyDial benchmark, and the results show that it outperforms vanilla HDRL systems and is more robust to environmental noise;

3. The transferability of the ComNet framework of the present application is tested, and it is shown that effective and accurate transfer can be achieved under this framework.

Reinforcement learning has recently become the mainstream approach to optimizing statistical dialogue management policies under the partially observable Markov decision process (POMDP) framework. One line of research is single-domain task-oriented dialogue, which uses flat deep reinforcement learning methods such as DQN, policy gradient and actor-critic. Multi-domain task-oriented dialogue is another direction, in which each domain learns a separate dialogue policy.

The composite dialogue task was proposed recently. Unlike multi-domain dialogue systems, a composite dialogue task requires all individual subtasks to be completed. The composite dialogue task is formulated with the options framework and solved with hierarchical reinforcement learning methods. All of these works are built on vanilla HDRL, in which the policies are represented by multilayer perceptrons (MLPs). The present application instead focuses on designing a transferable dialogue policy for composite dialogue tasks based on graph neural networks.

GNNs have also been used in other areas of reinforcement learning to provide properties such as transferability or reduced overfitting. In dialogue system building, models such as BUDS also exploit the power of graphs for dialogue state tracking. Previous work has further shown that learning a structured dialogue policy with a GNN, by creating graph nodes corresponding to semantic slots and optimizing the graph structure, can significantly improve system performance in a single-domain setting. For composite dialogues, however, the particularities of the task need to be exploited and the whole framework changed.

Hierarchical reinforcement learning:

Before introducing ComNet, we first briefly review hierarchical reinforcement learning (HRL) for task-oriented composite dialogue systems. Following the options framework, assume that we have a set of dialogue states B, a set of subtasks (or options) G, and a set of primitive actions A.

In the conventional Markov decision process (MDP) setting, the agent can only select a primitive action at each time step. By contrast, the decision process of a hierarchical MDP consists of (1) a top-level policy $\pi_b$ that selects the subtask to be completed, and (2) a low-level policy $\pi_{b,g}$ that selects primitive actions to complete the given subtask. The top-level policy $\pi_b$ takes as input the belief state b generated by the global state tracker and selects a subtask $g \in G$. The low-level policy $\pi_{b,g}$ perceives the current state b and the subtask g, and outputs a primitive action $a \in A$. The low-level policy $\pi_{b,g}$ is shared by all subtasks.

In the present application, the two levels of policy are represented by two Q-functions, which are learned with deep Q-learning (DQN) and parameterized by $\theta_e$ and $\theta_i$ respectively. Corresponding to the two-level policy, there are two kinds of reward signal from the environment (the user): the extrinsic reward $r^e$ and the intrinsic reward $r^i$. The extrinsic reward guides the dialogue agent to select the correct sequence of subtasks. The intrinsic reward is used to learn the option policy that accomplishes a given subtask. The combination of extrinsic and intrinsic rewards helps the dialogue agent complete the composite task as quickly as possible. The extrinsic and intrinsic rewards are therefore designed as follows:

Intrinsic reward: at the end of a subtask, the agent receives a positive intrinsic reward of 1 if the subtask succeeded, or 0 if it failed. To encourage shorter dialogues, the agent receives a negative intrinsic reward of -0.05 at each turn.

Extrinsic reward: let K be the number of subtasks. At the end of a dialogue, the agent receives a positive extrinsic reward of K if the dialogue succeeded, or 0 if it failed. To encourage shorter dialogues, the agent receives a negative extrinsic reward of -0.05 at each turn.
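By way of non-limiting illustration, the two reward signals can be sketched in a few lines of Python. The function names and the choice to add the per-turn penalty and the terminal reward on the same final turn are assumptions made for this sketch; the description only states the values 1, K, 0 and -0.05.

```python
TURN_PENALTY = -0.05  # per-turn penalty stated in the description


def intrinsic_reward(subtask_ended: bool, subtask_succeeded: bool) -> float:
    """Intrinsic reward for one turn of the low-level policy.

    Assumption: the turn penalty is also applied on the turn that ends
    the subtask, so a successful final turn yields 1 - 0.05.
    """
    reward = TURN_PENALTY
    if subtask_ended and subtask_succeeded:
        reward += 1.0
    return reward


def extrinsic_reward(dialogue_ended: bool, dialogue_succeeded: bool,
                     num_subtasks: int) -> float:
    """Extrinsic reward for one turn of the top-level policy.

    A successful dialogue earns K (= num_subtasks); a failed one earns 0.
    """
    reward = TURN_PENALTY
    if dialogue_ended and dialogue_succeeded:
        reward += float(num_subtasks)
    return reward
```

With two subtasks, for example, a successful final turn would yield an extrinsic reward of 2 - 0.05 under these assumptions.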

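Assume we have a subtask trajectory $\mathcal{T}_k$:

$$\mathcal{T}_k = (b_0^k, a_0^k, r_0^k, \dots, b_T^k, a_T^k, r_T^k),$$

where k denotes the k-th subtask $g_k$. The dialogue trajectory consists of a sequence of subtask trajectories $\mathcal{T}_0, \mathcal{T}_1, \dots$. According to the Q-learning algorithm, the parameter $\theta_e$ of the top-level Q-function is updated as follows:

$$\theta_e \leftarrow \theta_e + \alpha \left( q_k - Q(g_k \mid b_0^k; \theta_e) \right) \nabla_{\theta_e} Q(g_k \mid b_0^k; \theta_e),$$

where

$$q_k = \sum_{t=0}^{T} \gamma^t r_t^e + \gamma^{T+1} \max_{g' \in G} Q(g' \mid b_{T+1}^k; \theta_e),$$

$b_{T+1}^k$ is the belief state observed once subtask $g_k$ has ended, $\alpha$ is the step-size parameter, and $\gamma \in [0,1]$ is the discount rate. The first term of the above expression for $q_k$ equals the total discounted reward collected while fulfilling subtask $g_k$, and the second term estimates the maximum total discounted value after $g_k$ is completed.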
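As a hedged sketch of the top-level target described above, the following Python function adds the discounted extrinsic rewards collected during one subtask to a discounted bootstrap from the best next subtask value. The indexing convention (rewards $r_0, \dots, r_T$ followed by a bootstrap discounted by $\gamma^{T+1}$) and the `terminal` flag are assumptions of this sketch rather than details stated in the description.

```python
def top_level_target(extrinsic_rewards, next_q_values, gamma, terminal=False):
    """Target q_k for the top-level Q-function.

    extrinsic_rewards: rewards r^e_0 .. r^e_T collected during subtask g_k.
    next_q_values:     Q(g' | b) for every subtask g', evaluated at the
                       belief state reached after g_k has ended.
    terminal:          assumption of this sketch; if the dialogue ended,
                       no bootstrap term is added.
    """
    # First term: total discounted reward while fulfilling the subtask.
    discounted = sum((gamma ** t) * r for t, r in enumerate(extrinsic_rewards))
    if terminal or not next_q_values:
        return discounted
    # Second term: estimated maximum discounted value after completion.
    horizon = len(extrinsic_rewards)
    return discounted + (gamma ** horizon) * max(next_q_values)
```

For instance, with rewards `[-0.05, -0.05, 1.95]`, next values `[0.5, 2.0]` and `gamma = 0.5`, the first term is 0.4125 and the bootstrap term is 0.25.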

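The learning process of the low-level policy is similar, except that it uses the intrinsic reward. For each time step $t = 0, 1, \dots, T$,

$$\theta_i \leftarrow \theta_i + \alpha \left( q_t - Q(a_t^k \mid b_t^k, g_k; \theta_i) \right) \nabla_{\theta_i} Q(a_t^k \mid b_t^k, g_k; \theta_i),$$

where

$$q_t = r_t^i + \gamma \max_{a' \in A} Q(a' \mid b_{t+1}^k, g_k; \theta_i).$$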
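As a hedged sketch of the low-level update, assuming ordinary one-step Q-learning driven by the intrinsic reward, the target and a tabular analogue of the parameter update can be written as follows. Dropping the bootstrap term when the subtask has ended, and the tabular form itself (where the gradient of a table entry is 1), are simplifying assumptions of this sketch.

```python
def low_level_target(intrinsic_reward, next_q_values, gamma, subtask_done):
    """One-step target q_t = r^i_t + gamma * max_a' Q(a' | b_{t+1}, g_k).

    Assumption: when the subtask has ended there is no next step inside
    the subtask, so only the intrinsic reward is returned.
    """
    if subtask_done:
        return intrinsic_reward
    return intrinsic_reward + gamma * max(next_q_values)


def td_update(q_value, target, alpha):
    """Tabular analogue of theta <- theta + alpha * (q - Q) * grad Q."""
    return q_value + alpha * (target - q_value)
```

The same `td_update` shape applies to the top-level Q-function; only the target differs.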

In vanilla HDRL, the two Q-functions above are approximated by MLPs. The structure of the dialogue state is ignored in this setting, so the MLP policy has to discover the latent relationships between observations by itself. This leads to longer convergence times and requires more exploration trials. The following sections explain how to construct graphs that represent the relationships among dialogue observations.

Composite dialogue:

A task-oriented dialogue system is usually defined by a structured ontology. The ontology consists of a number of attributes (or slots) that the user can use to construct queries while completing a task. For a composite dialogue state comprising K subtasks, each subtask corresponds to several slots. For simplicity, we take subtask k as an example to introduce the belief state. Each slot of subtask k has two Boolean attributes, indicating whether it is requestable and whether it is informable. The user can request the value of a requestable slot, and can provide a specific value as a search constraint for an informable slot. At each dialogue turn, the dialogue state tracker updates the belief state of each informable slot.

In general, the belief state consists of distributions over candidate slot values. For each informable slot, the value with the highest belief is selected as a constraint for searching the database. The information about the matching entities in the database is added to the final dialogue state. The dialogue state $b^k$ of subtask k is decomposed into several slot-dependent states and one slot-independent state, expressed as $b^k = b^{k,0} \oplus b^{k,1} \oplus \cdots \oplus b^{k,n}$, where $b^{k,j}$ ($1 \le j \le n$) is the state related to the j-th informable slot of subtask k, and $b^{k,0}$ is the slot-independent state of subtask k. The whole belief state is the concatenation of all subtask-related states $b^k$, that is, $b = b^1 \oplus b^2 \oplus \cdots \oplus b^K$, and it is the input of the top-level dialogue policy.
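To make the decomposition concrete, the following minimal sketch builds the flat belief state by concatenating the slot-independent state and the slot-dependent states of each subtask. The function names, the ordering (slot-independent part first) and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
def subtask_belief(slot_independent, slot_dependent):
    """Concatenate b^{k,0} with b^{k,1} .. b^{k,n} for one subtask k."""
    flat = list(slot_independent)
    for slot_state in slot_dependent:
        flat.extend(slot_state)
    return flat


def full_belief(subtasks):
    """Concatenate the subtask-related states b^1 .. b^K.

    subtasks: list of (slot_independent, slot_dependent) pairs.
    """
    flat = []
    for slot_independent, slot_dependent in subtasks:
        flat.extend(subtask_belief(slot_independent, slot_dependent))
    return flat
```

For example, a subtask with a 2-dimensional slot-independent state and two 3-dimensional slot states contributes 8 entries to the flat vector that a plain MLP policy would consume.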

The output of the top-level policy is a subtask $g \in G$. In the present application, a specific subtask is represented by a one-hot vector. The whole belief state b and the subtask vector g are then fed into the low-level policy, whose output is a primitive dialogue action. Similarly, for each subtask k, the dialogue action set $A^k$ can be divided into n slot-dependent action sets $A^{k,j}$ ($1 \le j \le n$), for example $\mathrm{request\_slot}^{k,j}$, $\mathrm{inform\_slot}^{k,j}$ and $\mathrm{select\_slot}^{k,j}$, and one slot-independent action set $A^{k,0}$, for example $\mathrm{repeat}^{k,0}$, $\mathrm{reqmore}^{k,0}$, ..., $\mathrm{bye}^{k,0}$. The whole dialogue action space A is the union of the action spaces of all subtasks.
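The one-hot subtask vector and the structure of the action space can be illustrated with a short sketch. Only the action names that appear in the description are used; the actions elided by the ellipsis in the slot-independent set are deliberately left out, and the ontology in the usage note (subtask and slot names) is hypothetical.

```python
SLOT_ACTIONS = ("request", "inform", "select")   # slot-dependent acts named in the text
GLOBAL_ACTIONS = ("repeat", "reqmore", "bye")    # named slot-independent acts only


def one_hot(index, size):
    """One-hot vector representing the selected subtask g."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def subtask_actions(subtask, slots):
    """Action set A^k as (subtask, slot_index, act); slot_index 0 is A^{k,0}."""
    actions = [(subtask, 0, act) for act in GLOBAL_ACTIONS]
    for j, _slot in enumerate(slots, start=1):
        actions.extend((subtask, j, act) for act in SLOT_ACTIONS)
    return actions


def action_space(ontology):
    """Union of the action sets of all subtasks (ontology: name -> slots)."""
    space = []
    for subtask, slots in ontology.items():
        space.extend(subtask_actions(subtask, slots))
    return space
```

With a hypothetical ontology `{"CR": ["food", "area", "pricerange"], "SFR": ["food", "area"]}`, the sketch yields 12 actions for the first subtask and 9 for the second.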

As shown in Fig. 1, an embodiment of the present application provides a dialogue method applied to composite dialogue tasks, comprising:

S10: performing structuring processing on the current dialogue belief state to obtain a top-level structured dialogue state. For example, the dialogue state b (that is, the current dialogue belief state) consists of K subtask-related states, and each subtask-related state can be further decomposed into several slot-dependent states and one slot-independent state; these logically indivisible states are called atomic states. The hierarchical format of the dialogue state can naturally be viewed as a graph, in which each node represents the corresponding atomic state. To simplify the structure of the graph, the slot-independent node is chosen as the delegate of the nodes that correspond to the same subtask. In the top-level graph, all slot-independent nodes are connected to one another, and each slot-dependent node is connected only to its delegate node.
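The construction of the top-level graph in step S10 can be sketched as follows. Node identifiers of the form `(subtask, slot_index)` with index 0 for the slot-independent (delegate) node, and the use of a pair of directed edges for every connection, are assumptions of this sketch.

```python
def build_top_graph(slots_per_subtask):
    """Build the top-level graph.

    slots_per_subtask: number of slots for each subtask.
    Returns (nodes, edges); node (k, 0) is the slot-independent delegate
    of subtask k, node (k, j) with j >= 1 is its j-th slot-dependent node.
    """
    nodes, edges = [], set()
    for k, n_slots in enumerate(slots_per_subtask):
        delegate = (k, 0)
        nodes.append(delegate)
        for j in range(1, n_slots + 1):
            nodes.append((k, j))
            # A slot-dependent node is connected only to its delegate.
            edges.add(((k, j), delegate))
            edges.add((delegate, (k, j)))
    # All slot-independent nodes are connected to one another.
    delegates = [node for node in nodes if node[1] == 0]
    for u in delegates:
        for v in delegates:
            if u != v:
                edges.add((u, v))
    return nodes, edges
```

For two subtasks with 3 and 2 slots, this gives 7 nodes and 12 directed edges, none of which links two slot-dependent nodes.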

S20: processing the top-level structured dialogue state with the first graph neural network to determine the subtask information corresponding to the current dialogue belief state.

S30: performing structuring processing on the subtask information and the current dialogue belief state to obtain the low-level structured dialogue state.

For example, unlike the input of the top-level policy, the input of the low-level policy includes a new node, called the subtask node, which represents the goal information produced by the top-level policy. In the low-level graph, the slot-independent nodes are all connected to the subtask node (the global delegate node) rather than to one another.
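The low-level graph of step S30 can be sketched in the same style. The identifier `"T"` for the subtask node and the bidirectional edges are assumptions of this sketch.

```python
SUBTASK_NODE = "T"  # hypothetical identifier for the subtask (T) node


def build_low_graph(slots_per_subtask):
    """Build the low-level graph with an added subtask node.

    Slot-independent nodes (k, 0) connect to the subtask node instead of
    to one another; slot-dependent nodes (k, j) connect to their delegate.
    """
    nodes, edges = [SUBTASK_NODE], set()
    for k, n_slots in enumerate(slots_per_subtask):
        delegate = (k, 0)
        nodes.append(delegate)
        edges.add((delegate, SUBTASK_NODE))
        edges.add((SUBTASK_NODE, delegate))
        for j in range(1, n_slots + 1):
            nodes.append((k, j))
            edges.add(((k, j), delegate))
            edges.add((delegate, (k, j)))
    return nodes, edges
```

Compared with the top-level graph, the only structural change is that the mutual links between slot-independent nodes are replaced by links through the subtask node.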

S40: processing the low-level structured dialogue state with the second graph neural network to determine the dialogue action corresponding to the current dialogue belief state.

The embodiment of the present application proposes a new framework, ComNet, which combines HDRL and GNNs to solve composite tasks while achieving sample efficiency. It is also more robust to environmental noise and can be transferred effectively and accurately.

For example, the embodiment of the present application uses two graph neural networks (the first graph neural network and the second graph neural network) to parameterize the two-level policy. The following notation is introduced first. The graph structure is denoted $G = (V, E)$, with nodes $v_i \in V$ ($0 \le i \le n$) and directed edges $e_{ij} \in E$. The adjacency matrix Z represents the structure of G: if there is a directed edge from the i-th node $v_i$ to the j-th node $v_j$, the element $z_{ij}$ of Z is 1; otherwise $z_{ij}$ is 0. The set of out-going neighbours of node $v_i$ is denoted $N_{out}(v_i)$, and similarly $N_{in}(v_i)$ denotes the set of in-coming neighbours of $v_i$. Each node $v_i$ has an associated node type $p_i$. Each edge $e_{ij}$ has an edge type $c_e$, which is determined by the start node type $p_i$ and the end node type $p_j$. In other words, two edges have the same type if and only if their start node types are the same and their end node types are the same.
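The notation above maps directly onto simple data structures. The following sketch, with hypothetical helper names and integer node indices, builds the adjacency matrix Z, reads the out-going and in-coming neighbour sets from it, and derives the edge type from the two node types.

```python
def adjacency_matrix(num_nodes, edges):
    """z[i][j] = 1 if there is a directed edge from node i to node j."""
    z = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        z[i][j] = 1
    return z


def out_neighbors(z, i):
    """N_out(v_i): nodes that v_i sends messages to."""
    return [j for j, flag in enumerate(z[i]) if flag]


def in_neighbors(z, i):
    """N_in(v_i): nodes that send messages to v_i."""
    return [j for j in range(len(z)) if z[j][i]]


def edge_type(node_types, i, j):
    """Edge type c_e, determined by (start node type, end node type)."""
    return (node_types[i], node_types[j])


# Tiny example: one I-node (index 0) with two S-nodes, plus a second I-node.
node_types = ["I", "S", "S", "I"]
edges = [(1, 0), (0, 1), (2, 0), (0, 2), (0, 3), (3, 0)]
z = adjacency_matrix(len(node_types), edges)
edge_types = {edge_type(node_types, i, j) for i, j in edges}
```

In this example the six edges fall into the types S to I, I to S and I to I, so any parameters indexed by edge type would be shared within each of those groups.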

Fig. 2 is a schematic diagram of the structuring processing applied to the dialogue state under the two-level policy of the present application. Fig. 2a corresponds to the top-level policy; it has two types of node: slot-dependent nodes (S-nodes) and slot-independent nodes (I-nodes). Since there are no edges between slot-dependent nodes, it has only three edge types. Similarly, Fig. 2b corresponds to the low-level policy; it has three types of node (slot-dependent, slot-independent and subtask (T-node)) and four edge types. The graphs of the top-level policy and the low-level policy are now both well defined. ComNet has two GNNs for parsing these graph-format observations of the low-level policy and the top-level policy.

The input of a task-oriented dialogue policy is the dialogue state. The dialogue state of each single domain consists of two broad types of feature: slot-dependent dialogue state features and slot-independent dialogue state features, where the slot-dependent features consist of one feature per slot. For a complex dialogue task comprising multiple subdomains, the dialogue state is composed of the dialogue states of all subdomains. The top-level dialogue policy uses this combined dialogue state to select the dialogue subdomain that currently needs to be solved, and the low-level dialogue policy then makes the dialogue decision by combining the dialogue state with the selected subdomain. Fig. 2 shows the graphs formed by structuring the inputs of the top-level and low-level policies, where an S-node represents slot-dependent features, an I-node represents slot-independent features, and the T-node represents the currently selected subdomain. Fig. 5 shows the graph neural network model of the low-level policy, which consists of three main parts: an input model, a graph-structure information extraction model and an output model. All nodes of the same type share the same parameters. Therefore, as long as the set of node types in the graph (slot-dependent node, slot-independent node and subdomain feature node) remains unchanged, the number of parameters of the model remains unchanged.

Fig. 3 is a flowchart of another embodiment of the dialogue method applied to composite dialogue tasks of the present application. Specifically, the structured hierarchical dialogue policy model consists of two main parts: the top-level dialogue policy and the low-level dialogue policy. The main task of the top-level dialogue policy is to specify, for the low-level dialogue policy, the dialogue subdomain that currently needs to be solved; the low-level dialogue policy outputs a dialogue action by combining the dialogue state with the subdomain information. The two processes of structuring the top-level dialogue state and structuring the low-level dialogue state have been described in the embodiment relating to Fig. 2.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. A person skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application. The descriptions of the above embodiments each have their own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.

As shown in Fig. 4, an embodiment of the present application further provides a dialogue system 400 applied to composite dialogue tasks, comprising:

a first structuring processing program module 410, configured to perform structuring processing on the current dialogue belief state to obtain a top-level structured dialogue state;

a subtask information determination program module 420, configured to process the top-level structured dialogue state with the first graph neural network to determine the subtask information corresponding to the current dialogue belief state;

a second structuring processing program module 430, configured to perform structuring processing on the subtask information and the current dialogue belief state to obtain a low-level structured dialogue state; and

a dialogue action determination program module 440, configured to process the low-level structured dialogue state with the second graph neural network to determine the dialogue action corresponding to the current dialogue belief state.

Fig. 5 is a schematic structural diagram of an embodiment of the second graph neural network of the present application. The second graph neural network has three parts for extracting useful representations from the initial graph-format observation: 1) an input module, 2) a graph-information extraction module, and 3) an output module. They are introduced in turn below.

1) Input module:

Before each prediction, each node $v_i$ of the top-level and low-level graphs receives the corresponding atomic state b or subtask information g (denoted $x_i$), which is fed into the input module to obtain the state embedding $h_i^0$ as follows:

$$h_i^0 = F_{p_i}(x_i),$$

where $F_{p_i}$ is the function for node type $p_i$, which may be a multilayer perceptron (MLP). In general, different slots have different numbers of candidate values, so the input dimensions of slot-dependent nodes differ. However, the belief state of each slot is usually approximated by the probabilities of the M highest-ranked values, where M is usually smaller than the minimum number of values over all slots. Nodes of the same type therefore have the same input dimension.
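A minimal sketch of the input module follows. It truncates a slot's belief distribution to its M largest probabilities so that all slot-dependent nodes share one input dimension, and then applies a function shared by all nodes of the same type. A single linear layer with ReLU stands in for the MLP here, and the parameter layout is hypothetical.

```python
def top_m_beliefs(distribution, m):
    """Approximate a slot belief by its m largest probabilities (sorted)."""
    return sorted(distribution, reverse=True)[:m]


def linear(weights, vector):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def input_module(params_by_type, node_type, x):
    """h^0_i = F_{p_i}(x_i), with one shared function per node type.

    Simplification of this sketch: F is a single linear layer followed
    by ReLU rather than a full multilayer perceptron.
    """
    return [max(0.0, v) for v in linear(params_by_type[node_type], x)]
```

Because the parameters are looked up by node type rather than by node, adding a slot to the ontology adds a node but no new parameters.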

2) Graph-information extraction module:

The graph-information extraction module takes $h_i^0$ as the initial embedding of node $v_i$ and then propagates higher-level embeddings for each node in the graph. The propagation of node embeddings at each extraction layer consists of the following operations.

Message computation: at step l, each node $v_i$ has a node embedding $h_i^{l-1}$. For each out-going neighbour $v_j \in N_{out}(v_i)$, node $v_i$ computes the message vector

$$m_{ij}^l = M_{c_e}^l(h_i^{l-1}),$$

where $c_e$ is the type of the edge from node $v_i$ to node $v_j$, and $M_{c_e}^l$ is the message generation function, which may be a linear embedding: $M_{c_e}^l(h_i^{l-1}) = W_{c_e}^l h_i^{l-1}$. Note that the subscript $c_e$ indicates that edges of the same type share the weight matrix $W_{c_e}^l$ to be learned.

Message aggregation: after every node has finished computing its messages, the messages sent by the in-coming neighbours of each node $v_j$ are aggregated. Specifically, the aggregation is

$$\bar{m}_j^l = A(\{ m_{ij}^l \mid v_i \in N_{in}(v_j) \}),$$

where A is the aggregation function, which may be a summation, averaging or max-pooling function, and $\bar{m}_j^l$ is the aggregated message vector, which contains the information sent from all neighbouring nodes.
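The message computation and aggregation steps can be sketched as follows, assuming the linear message function and choosing averaging as the aggregation function. Node indices, the dictionary layout of the weights and the zero vector returned for a node without in-coming neighbours are assumptions of this sketch.

```python
def matvec(matrix, vector):
    """Matrix-vector product; matrix is a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def compute_messages(embeddings, node_types, edges, weights_by_edge_type):
    """Message m_ij = W_{c_e} h_i for every directed edge (i, j).

    The weight matrix is looked up by edge type (start type, end type),
    so edges of the same type share parameters.
    """
    messages = {}
    for i, j in edges:
        c_e = (node_types[i], node_types[j])
        messages[(i, j)] = matvec(weights_by_edge_type[c_e], embeddings[i])
    return messages


def aggregate_mean(messages, node, dim):
    """Average the messages sent to `node` by its in-coming neighbours."""
    incoming = [m for (_, j), m in messages.items() if j == node]
    if not incoming:
        return [0.0] * dim  # assumption: isolated nodes receive zeros
    return [sum(vals) / len(incoming) for vals in zip(*incoming)]
```

Swapping `aggregate_mean` for a sum or an element-wise maximum gives the other aggregation choices mentioned in the description.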

Embedding update: at this point each node $v_i$ has two kinds of information, namely the aggregated message vector $\bar{m}_i^l$ and its current embedding vector $h_i^{l-1}$. The embedding is updated as

$$h_i^l = U_{p_i}^l(h_i^{l-1}, \bar{m}_i^l),$$

where $U_{p_i}^l$ is the update function for node type $p_i$ at the l-th extraction layer, which may be a nonlinear operation, that is,

$$h_i^l = \delta(\lambda^l W_{p_i}^l h_i^{l-1} + (1 - \lambda^l) \bar{m}_i^l),$$

where $\delta$ is the activation function (ReLU), $\lambda^l$ is the weight parameter that balances the node's own embedding against the aggregated information and is clipped to the range 0 to 1, and $W_{p_i}^l$ is a trainable matrix. Note that the subscript $p_i$ indicates that nodes of the same node type share the same instance of the update function, which in this example means sharing the parameter $W_{p_i}^l$.
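A hedged sketch of the embedding update is given below. It assumes the update takes the form of a ReLU applied to a convex combination, weighted by a clipped $\lambda$, of the linearly transformed previous embedding and the aggregated message; the exact placement of $\lambda$ is an assumption of this sketch.

```python
def matvec(matrix, vector):
    """Matrix-vector product; matrix is a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def update_embedding(h_prev, aggregated, weight, lam):
    """Assumed form: h = ReLU(lam * W h_prev + (1 - lam) * m_bar).

    weight is the matrix shared by all nodes of the same type;
    lam is clipped to [0, 1] before use.
    """
    lam = min(1.0, max(0.0, lam))
    transformed = matvec(weight, h_prev)
    return [max(0.0, lam * t + (1.0 - lam) * m)
            for t, m in zip(transformed, aggregated)]
```

With `lam = 1` the node ignores its neighbours, and with `lam = 0` it keeps only the aggregated messages, so the clipped weight interpolates between the two.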

3) Output module:

After the node embeddings have been updated for L steps, every node v_i has a final representation h_i^L, also written h_{k,i}^L, where the subscript k, i indicates that node v_i belongs to subtask k.

Upper-level output: the upper-level policy aims to predict the subtask to be carried out. In the upper-level graph, a given subtask corresponds to several S-nodes and one I-node. Therefore, when computing the Q-value of a given subtask, all final embeddings of the nodes associated with that subtask are used. Specifically, for each subtask k we perform the following computation:

q_top^k = O_top(Σ_{v_i ∈ S-nodes} h_{k,i}^L, h_{k,0}^L)

where O_top is the output function, which may be an MLP, and the subscripts k, 0 and k, i denote the I-node and the i-th S-node of subtask k, respectively. In practice, we take the concatenation of Σ_{v_i ∈ S-nodes} h_{k,i}^L and h_{k,0}^L as the input of the MLP, which outputs a scalar value. This MLP is shared across all subtasks. When making a decision, all q_top^k are concatenated, i.e. q_top = q_top^1 ⊕ ... ⊕ q_top^K, and the subtask is then selected according to q_top.
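The upper-level output can be sketched as below: the S-node embeddings of a subtask are summed, concatenated with the I-node embedding, and scored by one shared function. The scorer `o_top` is a trivial stand-in for the shared MLP, the embeddings are made-up numbers, and greedy selection is an assumption (the text only says the subtask is selected according to q_top).

```python
def q_top_for_subtask(s_embeddings, i_embedding, o_top):
    """Sum the S-node embeddings, concatenate with the I-node embedding, score."""
    summed = [sum(column) for column in zip(*s_embeddings)]
    return o_top(summed + i_embedding)  # list concatenation


def o_top(x):
    """Stand-in for the shared MLP: a fixed linear scorer returning a scalar."""
    return sum(x)


# Hypothetical final embeddings for two subtasks with different slot counts.
final = {
    "CR": {"S": [[0.1, 0.2], [0.3, 0.4]], "I": [0.5, 0.5]},
    "SFR": {"S": [[0.2, 0.2]], "I": [0.1, 0.1]},
}

q_top = {k: q_top_for_subtask(v["S"], v["I"], o_top) for k, v in final.items()}
chosen = max(q_top, key=q_top.get)  # greedy choice, assumed for illustration
```

Summing over S-nodes keeps the scorer's input size fixed, so the same `o_top` applies to subtasks with different numbers of slots.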

Lower-level output: the lower-level policy aims to predict a primitive dialogue action. A primitive dialogue action must correspond to a subtask. If we regard the slot-independent nodes as a special kind of slot-dependent node, a primitive dialogue action can further correspond to a slot node. Therefore, the Q-value of each dialogue action contains three parts of information: a subtask-level value, a slot-level value and a primitive value. We use the T-node embedding h_T^L to compute the subtask-level value:

q_subt^T = O_subt^T(h_T^L)

where O_subt^T is the output function for the subtask-level value, which may be an MLP. Its output dimension is K, with each value assigned to the corresponding subtask.

Belong to the node v of S node and inode_iSlot grade value and original value will be calculated:

Wherein, O^pi _slotAnd O^pi _primIt is the output function of slot grade value and original value respectively, can actually is MLP.It is similar Ground, subscript p_iIndicate the same instance of the nodes sharing output function of same node point type.Corresponding to slot node v_iMovement a_k,i Q value be q^k,i _low=(q^T _subt)_k+q^k,i _slot+q^k,i _prim, wherein+be is by element operation, and (q^T _subt)_kIndicate q^T _subtIn K-th of value.When prediction action, all q^k,i _lowIt will be connected, i.e.,Then basis q_lowSelect original activities.
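The three-part composition of the lower-level Q-value can be sketched as follows. The shapes are assumptions: the slot-level value is treated as a scalar and the primitive value as a short vector with one entry per primitive action of that node, so that "+" broadcasts element-wise. Node names and all numbers are hypothetical, and greedy selection is used only for illustration.

```python
def q_low_for_node(q_subt_k, q_slot, q_prim):
    """Add the subtask-level and slot-level scalars to each primitive value."""
    return [q_subt_k + q_slot + p for p in q_prim]


# Subtask-level values (q_subt^T)_k for K = 2 subtasks (illustrative).
q_subt = [0.5, -0.25]

# Each entry: (subtask index k, node name, slot-level value, primitive values).
nodes = [
    (0, "CR.food", 0.25, [0.5, 0.0]),
    (0, "CR.I", 0.0, [0.25, 1.0]),
    (1, "SFR.area", 0.5, [1.0, 0.5]),
]

q_low, index = [], []
for k, name, q_slot, q_prim in nodes:
    values = q_low_for_node(q_subt[k], q_slot, q_prim)
    q_low.extend(values)  # concatenation of all q_low^{k,i}
    index.extend((name, a) for a in range(len(values)))

best = index[max(range(len(q_low)), key=q_low.__getitem__)]
```

The `index` list records which node and primitive action each position of the concatenated vector refers to, so the selected position can be mapped back to a dialogue action.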

Although the parameters of the input module and the graph information management module are not shared between the upper-level GNN and the lower-level GNN, many parameters are shared within each GNN. Suppose the composite task is now modified so that some new slots are added to one subtask: we only need to create new nodes in each GNN. If the number of edge types does not change, the parameters of the GNN remain unchanged after the new nodes are added. This property of ComNet gives rise to its transferability. In general, if the node type set and the edge type set of a composite task Task1 are both subsets of those of another task Task2, the ComNet policy learned on Task2 can be used directly on Task1.
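The subset condition for direct reuse can be written as a simple check. The dictionary layout and the edge type names below are hypothetical; only the rule itself (node types and edge types of Task1 must both be subsets of those of Task2) comes from the text.

```python
def can_reuse(learned_on, target):
    """True if every node type and edge type of `target` exists in `learned_on`."""
    return (
        target["node_types"] <= learned_on["node_types"]
        and target["edge_types"] <= learned_on["edge_types"]
    )


# Hypothetical type sets for two composite tasks.
task2 = {
    "node_types": {"S", "I", "T"},
    "edge_types": {"S2S", "S2I", "I2S", "S2T", "T2S"},
}
task1 = {
    "node_types": {"S", "I"},
    "edge_types": {"S2S", "S2I", "I2S"},
}

reuse_on_task1 = can_reuse(task2, task1)  # policy learned on Task2, used on Task1
reuse_on_task2 = can_reuse(task1, task2)  # the reverse direction
```

The check depends only on types, not on how many nodes of each type a task has, which mirrors the statement that adding slots creates new nodes but no new parameters.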

Because the initial outputs of nodes of the same type have similar semantic meanings, these nodes share parameters in ComNet. The intent is to use the GNN to propagate the relationships among the nodes in the graph, based on the connection between the initial inputs and the final outputs.

The main contributions of the present application fall into three aspects:

To verify that the present application can achieve the above effects, the inventors carried out the following experiments:

First, we verify the effectiveness of ComNet on the composite tasks of the PyDial benchmark. Then, we investigate the transferability of ComNet.

PyDial benchmark: evaluating our proposed framework requires a composite dialogue simulation environment. The PyDial toolkit supports multi-domain dialogue simulation with error models, which lays a good foundation for building our composite task environment.

We modified the policy management module and the user simulation module to support two-subtask composite dialogue simulation among three available subtasks: Cambridge Restaurants (CR), San Francisco Restaurants (SFR) and a generic laptop shopping task (LAP), while retaining the full functionality of the error simulation at the different levels listed in Table 1. Note that, in the policy management module, we discard the domain input provided by the dialogue state tracking (DST) module in order to make a fair comparison. We also updated the user simulation module and the evaluation management module to support the reward design.

Experimental implementation:

We implemented the following three composite-task agents to evaluate the performance of our proposed framework.

Vanilla HDQN: a hierarchical agent that uses MLPs as its models. This is the baseline for our comparison.

ComNet: our proposed framework, which exploits the flexibility of GNNs.

Hand-crafted: a carefully designed rule-based agent with a very high success rate in noise-free composite dialogues. This agent is also used to warm up the training of the first two agents. Note that this agent uses the exact subtask information provided by the DST, which makes the comparison with the other two agents unfair.

Here, we train the models for 6,000 dialogues (iterations). The training dialogues are divided into 30 stages of 200 dialogues each. At each stage, 100 dialogues are used to test the performance of the dialogue policy. The results for the 3 composite tasks in 3 environments over 6,000 training dialogues are shown in Fig. 6.
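The staged schedule described above (30 stages, 200 training dialogues per stage, 100 test dialogues per stage) can be sketched as a simple loop. The callbacks `train_dialogue` and `test_dialogue` are hypothetical placeholders for running one simulated dialogue; they are not PyDial functions.

```python
def run_schedule(train_dialogue, test_dialogue,
                 stages=30, train_per_stage=200, test_per_stage=100):
    """Alternate training and testing; return the success rate after each stage."""
    curve = []
    for _ in range(stages):
        for _ in range(train_per_stage):
            train_dialogue()  # one training dialogue (placeholder)
        successes = sum(1 for _ in range(test_per_stage) if test_dialogue())
        curve.append(successes / test_per_stage)
    return curve


# Dummy callbacks: count training dialogues and always report success.
trained = []
curve = run_schedule(lambda: trained.append(1), lambda: True)
```

With the default arguments the loop runs 30 × 200 = 6,000 training dialogues and produces one success-rate point per stage, which is the kind of learning curve plotted against training dialogues.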

Analysis: from Fig. 6 we can observe that ComNet outperforms the vanilla MLP policy in both success rate and learning speed in all nine settings (3 environments × 3 types of composite task). In ComNet, both the upper-level policy and the lower-level policy are represented by GNNs, in which nodes of the same type and edges of the same type share parameters. This means that nodes of the same type share an input space (the belief state space), so the exploration space is greatly reduced. As shown in Fig. 6, ComNet learns faster than the vanilla MLP policy.

Note that the hand-crafted agent performs well because it cheats by looking at the exact subtask information, which means that the hand-crafted agent is effectively solving a multi-domain task. Its performance should be the upper bound for our models. Compared with vanilla HDQN, ComNet shows its robustness in all environments by a large margin, which is helpful for building dialogue systems when a high-accuracy ASR or DST is not available.

We also compared the dialogues generated by vanilla HDQN and ComNet after training on 6,000 dialogues. Even after extensive training, the vanilla HDQN agent still appears unable to select an appropriate action in certain dialogue states, which causes the customer to lose patience. ComNet, on the other hand, also selects the same action, but as soon as it obtains the required information it moves the dialogue forward and thus completes the task successfully. This also helps to demonstrate that ComNet is more sample-efficient than the vanilla framework.

Investigating the transferability of ComNet: as discussed in the preceding embodiments, another advantage of ComNet is that, owing to the flexibility of GNNs, it is naturally transferable.

To evaluate its transferability, we first trained for 6,000 dialogues on the CR+SFR task. We then used the trained policy to initialize the parameters of the policy models for the other two composite tasks, and continued to train and test the models. The results are shown in Fig. 7.
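The initialization step of this transfer experiment can be sketched as copying parameters keyed by node type and edge type. The parameter layout, key names and values below are hypothetical; the sketch only illustrates that a policy for a new composite task can start from a trained one when all of its types already have parameters.

```python
import copy


def warm_start(pretrained, node_types, edge_types):
    """Deep-copy pretrained parameters for every type the new task uses."""
    needed = [("node", t) for t in sorted(node_types)]
    needed += [("edge", t) for t in sorted(edge_types)]
    missing = [key for key in needed if key not in pretrained]
    if missing:
        raise KeyError(f"no pretrained parameters for: {missing}")
    return {key: copy.deepcopy(pretrained[key]) for key in needed}


# Hypothetical parameters learned on the CR+SFR task.
pretrained = {
    ("node", "S"): [[0.1, 0.2]],
    ("node", "I"): [[0.3, 0.4]],
    ("edge", "S2I"): [[0.5, 0.6]],
}

# A new composite task using the same node and edge types.
new_policy = warm_start(pretrained, {"S", "I"}, {"S2I"})
new_policy[("node", "S")][0][0] = 9.9  # continued training changes only the copy
```

The deep copy keeps the source policy intact, so the same trained parameters can be used to initialize several target tasks independently.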

We find that the model transferred from the CR+SFR task is compatible with the other two composite tasks. This shows that ComNet can propagate task-independent relationships among graph nodes based on the connection between the initial node inputs and the final outputs. It also shows that, under the ComNet framework, training on a new composite task can be boosted by using parameters pre-trained on a related task. After all, solving the cold-start problem is essential for task-oriented dialogue systems.

In this application, we propose ComNet, a structured hierarchical dialogue policy represented by two graph neural networks (GNNs). By replacing the MLPs of traditional HDRL methods, ComNet makes better use of the structural information of the dialogue state: it feeds the observation (dialogue state) and the upper-level decision separately into slot-dependent and slot-independent nodes and exchanges messages among these nodes. We evaluated our framework on the modified PyDial benchmark and demonstrated high efficiency, robustness and transferability in all settings.

In some embodiments, an embodiment of the present application provides a non-volatile computer-readable storage medium in which one or more programs containing executable instructions are stored. The executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server or a network device) to perform any of the above dialogue methods of the present application applied to composite dialogue tasks.

In some embodiments, an embodiment of the present application further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, cause the computer to perform any of the above dialogue methods applied to composite dialogue tasks.

In some embodiments, an embodiment of the present application further provides an electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the dialogue method applied to composite dialogue tasks.

In some embodiments, an embodiment of the present application further provides a storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the dialogue method applied to composite dialogue tasks.

The dialogue system applied to composite dialogue tasks of the above embodiments of the present application can be used to perform the dialogue method applied to composite dialogue tasks of the embodiments of the present application, and accordingly achieves the technical effects achieved by that method, which are not repeated here. In the embodiments of the present application, the relevant functional modules may be implemented by a hardware processor.

Fig. 8 is a schematic diagram of the hardware structure of an electronic device for performing the dialogue method applied to composite dialogue tasks according to another embodiment of the present application. As shown in Fig. 8, the device comprises:

one or more processors 810 and a memory 820, with one processor 810 taken as an example in Fig. 8.

The device for performing the dialogue method applied to composite dialogue tasks may further comprise an input device 830 and an output device 840.

The processor 810, the memory 820, the input device 830 and the output device 840 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 8.

The memory 820, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the dialogue method applied to composite dialogue tasks in the embodiments of the present application. By running the non-volatile software programs, instructions and modules stored in the memory 820, the processor 810 executes the various functional applications and data processing of the server, that is, implements the dialogue method applied to composite dialogue tasks of the above method embodiments.

The memory 820 may include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data created through use of the dialogue apparatus applied to composite dialogue tasks. In addition, the memory 820 may include a high-speed random access memory and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device or another non-volatile solid-state storage device. In some embodiments, the memory 820 optionally includes memory located remotely from the processor 810, and such remote memory can be connected via a network to the dialogue apparatus applied to composite dialogue tasks. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The input device 830 can receive input numeric or character information and generate signals related to the user settings and function control of the dialogue apparatus applied to composite dialogue tasks. The output device 840 may include a display device such as a display screen.

The one or more modules are stored in the memory 820 and, when executed by the one or more processors 810, perform the dialogue method applied to composite dialogue tasks in any of the above method embodiments.

The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.

The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:

(1) Mobile communication devices: these devices are characterized by mobile communication capability, with voice and data communication as their main purpose. Such terminals include smartphones (e.g. iPhone), multimedia phones, feature phones and low-end phones.

(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capabilities, and generally also support mobile Internet access. Such terminals include PDA, MID and UMPC devices, e.g. iPad.

(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g. iPod), handheld devices, e-book readers, smart toys and portable in-vehicle navigation devices.

(4) Servers: devices that provide computing services. A server comprises a processor, a hard disk, memory, a system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capacity, stability, reliability, security, scalability and manageability.

(5) Other electronic devices with data interaction functions.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A dialogue method applied to composite dialogue tasks, comprising:

performing structuring processing on a current dialogue belief state to obtain an upper-level structured dialogue state;

processing the upper-level structured dialogue state based on a first graph neural network to determine subtask information corresponding to the current dialogue belief state;

performing structuring processing on the subtask information and the current dialogue belief state to obtain a lower-level structured dialogue state;

processing the lower-level structured dialogue state based on a second graph neural network to determine a dialogue action corresponding to the current dialogue belief state.

2. The method according to claim 1, wherein processing the upper-level structured dialogue state based on the first graph neural network to determine the subtask information corresponding to the current dialogue belief state comprises:

for each subtask k, performing the following computation:

q_top^k = O_top(Σ_{v_i ∈ S-nodes} h_{k,i}^L, h_{k,0}^L)

wherein O_top is the output function, which may be an MLP, and the subscripts k, 0 and k, i denote the I-node and the i-th S-node of subtask k, respectively;

taking the concatenation of Σ_{v_i ∈ S-nodes} h_{k,i}^L and h_{k,0}^L as the input of the MLP and outputting a scalar value;

when making a decision, concatenating all q_top^k, i.e. q_top = q_top^1 ⊕ ... ⊕ q_top^K, and then selecting the subtask according to q_top.

3. The method according to claim 1, wherein processing the lower-level structured dialogue state based on the second graph neural network to determine the dialogue action corresponding to the current dialogue belief state comprises:

the Q-value of each dialogue action contains three parts of information: a subtask-level value, a slot-level value and a primitive value;

the subtask-level value is computed using the T-node embedding h_T^L:

q_subt^T = O_subt^T(h_T^L)

wherein O_subt^T is the output function for the subtask-level value, and the output dimension of O_subt^T is K, with each value assigned to the corresponding subtask;

wherein O_slot^{p_i} and O_prim^{p_i} are the output functions for the slot-level value and the primitive value, respectively, p_i indicates that nodes of the same node type share the same instance of the output function, and the Q-value of action a_{k,i} corresponding to slot node v_i is q_low^{k,i} = (q_subt^T)_k + q_slot^{k,i} + q_prim^{k,i}, where + is an element-wise operation and (q_subt^T)_k denotes the k-th value of q_subt^T;

when predicting an action, all q_low^{k,i} are concatenated, i.e. q_low = q_low^{1,1} ⊕ ... ⊕ q_low^{K,0}, and the dialogue action is then selected according to q_low.

4. The method according to claim 1, wherein processing the lower-level structured dialogue state based on the second graph neural network to determine the dialogue action corresponding to the current dialogue belief state further comprises:

an input preprocessing step:

before each prediction, each node v_i receives the corresponding atomic state b or subtask information g, denoted x_i, and the state embedding is obtained after preprocessing as follows:

h_i^0 = F_{p_i}(x_i)

wherein F_{p_i} is the function for node type p_i;

a graph information management step:

taking h_i^0 as the initial embedding of node v_i;

then propagating higher-level embeddings for each node in the graph;

the propagation of node embeddings at each extraction layer is as follows:

message computation: every node v_i has an embedding h_i^{l-1}, and for each outgoing neighbor v_j ∈ N_out(v_i), node v_i computes the following message vector:

m_{ij}^l = M_{c_e}^l(h_i^{l-1})

wherein c_e is the type of the edge from node v_i to node v_j, and M_{c_e}^l is the message generation function, which may be a linear embedding: M_{c_e}^l(h_i^{l-1}) = W_{c_e}^l h_i^{l-1};

message aggregation: the aggregation process is as follows:

\bar{m}_j^l = A({ m_{ij}^l | v_i ∈ N_in(v_j) })

wherein A is the aggregation function and \bar{m}_j^l is the aggregated message vector;

embedding update: at this point, every node v_i has two kinds of information, namely the aggregated message vector \bar{m}_i^l and its current embedding vector h_i^{l-1}, and the embedding is updated as follows:

h_i^l = U_{p_i}^l(h_i^{l-1}, \bar{m}_i^l)

wherein U_{p_i}^l is the update function for node type p_i at the l-th extraction layer, which may be a nonlinear operation,

in which δ is the activation function, λ^l is the weight parameter of the aggregated information, and W_{p_i}^l is a trainable matrix.

5. A dialogue system applied to composite dialogue tasks, comprising:

a first structuring processing program module, configured to perform structuring processing on a current dialogue belief state to obtain an upper-level structured dialogue state;

a subtask information determination program module, configured to process the upper-level structured dialogue state based on a first graph neural network to determine subtask information corresponding to the current dialogue belief state;

a second structuring processing program module, configured to perform structuring processing on the subtask information and the current dialogue belief state to obtain a lower-level structured dialogue state;

a dialogue action determination program module, configured to process the lower-level structured dialogue state based on a second graph neural network to determine a dialogue action corresponding to the current dialogue belief state.

6. The system according to claim 5, wherein processing the upper-level structured dialogue state based on the first graph neural network to determine the subtask information corresponding to the current dialogue belief state comprises:

for each subtask k, performing the following computation:

taking the concatenation of Σ_{v_i ∈ S-nodes} h_{k,i}^L and h_{k,0}^L as the input of an MLP and outputting a scalar value;

7. The system according to claim 5, wherein processing the lower-level structured dialogue state based on the second graph neural network to determine the dialogue action corresponding to the current dialogue belief state comprises:

computing the subtask-level value using the T-node embedding h_T^L:

8. The system according to claim 1, wherein processing the lower-level structured dialogue state based on the second graph neural network to determine the dialogue action corresponding to the current dialogue belief state further comprises:

an input preprocessing step:

h_i^0 = F_{p_i}(x_i)

wherein F_{p_i} is the function for node type p_i;

a graph information management step:

taking h_i^0 as the initial embedding of node v_i;

then propagating higher-level embeddings for each node in the graph;

message aggregation: the aggregation process is as follows,

wherein A is the aggregation function and the result is the aggregated message vector;

9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method according to any one of claims 1-4.

10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-4.