CN108268616A

CN108268616A - Controllable dialogue management extension method fusing rule information

Info

Publication number: CN108268616A
Application number: CN201810009140.3A
Authority: CN
Inventors: Wang Weikang (王唯康); Zhang Jiajun (张家俊); Li Zhifei (李志飞); Zong Chengqing (宗成庆)
Original assignee: Institute of Automation, Chinese Academy of Sciences; Mobvoi Information Technology Co Ltd
Current assignee: Institute of Automation, Chinese Academy of Sciences; Mobvoi Information Technology Co Ltd
Priority date: 2018-01-04
Filing date: 2018-01-04
Publication date: 2018-07-10
Anticipated expiration: 2038-01-04
Also published as: CN108268616B

Abstract

The invention belongs to the field of human-computer interaction, and in particular relates to a controllable dialogue management extension method fusing rule information. It aims to solve the high cost and low efficiency of extending a data-driven dialogue system by rebuilding the interactive environment. The method includes: S1, based on interaction data, determining the new user intents to be added and extending the original language understanding module; S2, based on the new user intents, constructing the corresponding new dialogue rules; S3, based on the interaction data, the dialogue policy of the original dialogue management module and the new dialogue rules, constructing the constraint that the mapping space of the new dialogue management module must satisfy; S4, based on the constraint obtained in S3, extending the original dialogue management module to generate a new dialogue management module. The invention allows a data-driven dialogue system to be extended quickly and iterated efficiently according to user feedback.

Description

Controllable dialogue management extension method fusing rule information

Technical field

The invention belongs to the field of human-computer interaction, and in particular relates to a controllable dialogue management extension method fusing rule information.

Background

A task-oriented dialogue system is a human-computer interaction system that helps users complete tasks in a specific domain (such as restaurants, hotels or air tickets) through natural-language interaction. Generally, a task-oriented dialogue system needs four basic functions: language understanding, dialogue state tracking, dialogue policy and dialogue generation. Dialogue state tracking and the dialogue policy module are together called the dialogue management module, which is the core of the whole system.

Because task-oriented dialogue systems can help users reach their goals in a friendlier way, considerable effort has gone into designing dialogue systems with a better user experience. In current commercial systems, the dialogue management module is usually rule-based: developers hand-write the dialogue management policy, defining which action the system should take in each dialogue context. Although this approach is simple, intuitive and easy to control, it requires a great deal of manpower and expertise. In recent years, it has been found that reinforcement learning can use the feedback signals provided by users to build a robust dialogue management module automatically, avoiding the design of a large number of rules. This data-driven design paradigm has therefore attracted wide attention from industry.

However, when designing a commercial system, developers can define all the reasonable system actions needed to complete a particular task, whereas the user intents the system must handle have to be expanded continually after deployment according to user needs. In commercial development, developers therefore need to extend the whole system iteratively so that it responds reasonably to user intents that were not considered before.

Although reinforcement-learning methods have great advantages for building task-oriented dialogue systems, a data-driven dialogue management module is a black box: to extend the original module, the only option is to redesign the interactive environment and retrain, and building an interactive environment is very expensive. How to extend an existing reinforcement-learning-based dialogue management module efficiently and at minimal cost, while retaining the potential advantages of data-driven methods, is therefore a subject well worth studying.

Summary of the invention

To solve the above problem in the prior art, namely the high cost and low efficiency of extending a data-driven dialogue system, the present invention proposes a controllable dialogue management extension method fusing rule information, comprising the following steps:

Step S1: based on interaction data D, determine the new user intents to be added, and extend the original language understanding module;

Step S2: based on the new user intents selected in step S1, construct the new dialogue rules corresponding to them;

Step S3: based on the interaction data of step S1, the dialogue policy of the original dialogue management module and the new dialogue rules obtained in step S2, construct the constraint L that the mapping space of the new dialogue management module must satisfy;

Step S4: based on the constraint L obtained in step S3, extend the original dialogue management module to generate a new dialogue management module.

Further, the constraint L that the mapping space of the new dialogue management module must satisfy is specifically:

L = λ₁L_D + λ₂L_{D,θ} + λ₃L_{D,R}

where L_D is the behavior-consistency constraint between the behavior of the new dialogue management module and that of the original dialogue management module; L_{D,θ} is the policy-consistency constraint between the behavior policy of the new dialogue management module and that of the original dialogue management module; L_{D,R} is the rule-compliance constraint between the behavior policy of the new dialogue management module and the defined new dialogue rules; and λ₁, λ₂, λ₃ are preset weights.

Further, the three constraint terms are expressed in terms of the following quantities: θ^new is the model parameter of the new dialogue management module; θ is the model parameter of the original dialogue management module; d is a dialogue sample in the interaction data D; T is the number of turns of dialogue sample d; |A_s| is the number of system actions; h_t is the dialogue context of the t-th turn; a_k (k = 1, ..., |A_s|) is a candidate action of the new dialogue management module under the current dialogue context h_t; a_t is the action of the original system under the current dialogue history h_t, and π(·) is the original dialogue management module; L is the number of dialogue rules defined to handle the new user intents; h_l is the dialogue-context condition stated in the l-th rule, and a_l is the action the system should perform when the rule's context condition h_l is satisfied.
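The formulas for the three terms are not reproduced in this text. One reconstruction consistent with the symbols defined above, assuming each term is a negative log-likelihood (cross-entropy) to be minimized in the usual knowledge-distillation form and writing π(a_k | h_t; θ^new) for the action distribution of the new module, is sketched below. It is an interpretation, not the patent's verbatim formulas; note also that the source uses L both for the total constraint and for the number of rules.

```latex
\begin{aligned}
L_D &= -\sum_{d \in D} \sum_{t=1}^{T} \sum_{k=1}^{|A_s|} \mathbb{1}\{a_k = a_t\}\, \log \pi(a_k \mid h_t; \theta^{new}) \\
L_{D,\theta} &= -\sum_{d \in D} \sum_{t=1}^{T} \sum_{k=1}^{|A_s|} \pi(a_k \mid h_t; \theta)\, \log \pi(a_k \mid h_t; \theta^{new}) \\
L_{D,R} &= -\sum_{d \in D} \sum_{t=1}^{T} \sum_{l=1}^{L} \sum_{k=1}^{|A_s|} \mathbb{1}\{h_t = h_l\}\, \mathbb{1}\{a_k = a_l\}\, \log \pi(a_k \mid h_t; \theta^{new})
\end{aligned}
```

Here 1{h_t = h_l} is read as "context h_t satisfies the rule condition h_l". Under this reading the first term imitates the original system's logged actions, the second matches the original policy's full action distribution, and the third pushes the new module toward the rule action a_l wherever a rule fires.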

Further, the original language understanding module is extended as follows:

on the basis of the original language understanding module, labeled data for the new user intents to be added are added, and the language understanding module is retrained.

Further, the original dialogue management module is extended as follows:

an additional field representing the new user intent is added to the dialogue-state representation of the original dialogue management model;

the dialogue rules corresponding to the new user intent are set;

the constraint that the mapping space of the new dialogue management module must satisfy is set.

Further, both the new dialogue management module and the original dialogue management module are data-driven dialogue management modules.

Further, two user simulation environments, Sim1 and Sim2, are constructed to train and test the new dialogue management module, where Sim1 is used to train the original dialogue management module and Sim2 simulates an online environment containing unknown factors.

Further, the method includes a user satisfaction calculation, in which Satis. is the user satisfaction; d is a dialogue sample in the interaction data D; t is the turn index within dialogue sample d; L is the number of defined dialogue rules; h_t is the dialogue context of the t-th turn; h_l is the dialogue-context condition stated in the l-th rule; a_t is the action of the original system under the current dialogue history h_t; a_l is the action the system should perform when the rule context h_l is satisfied; and 1{·} is the indicator function, which equals 1 if the quantities on both sides of the equals sign inside the braces are equal, and 0 otherwise.
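The satisfaction formula itself is not reproduced in this text. Following the description that satisfaction is the proportion of turns in which the system reacts reasonably to the previously unconsidered intents, one plausible reconstruction (an interpretation, not the verbatim formula) is:

```latex
\text{Satis.} = \frac{\sum_{d \in D} \sum_{t=1}^{T} \sum_{l=1}^{L} \mathbb{1}\{h_t = h_l\}\, \mathbb{1}\{a_t = a_l\}}
                     {\sum_{d \in D} \sum_{t=1}^{T} \sum_{l=1}^{L} \mathbb{1}\{h_t = h_l\}}
```

The denominator counts the turns in which some rule's context condition is met; the numerator counts those in which the system also took the rule's prescribed action, so Satis. lies between 0 and 1.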

Beneficial effects of the invention:

In the design of a commercial system, the original system inevitably has to be extended according to user needs and feedback. One reason the large-scale use of reinforcement learning in commercial development is limited is that a reinforcement-learning-based dialogue management module is a black box, so extending the original system requires building a new interactive environment. Building an interactive environment is very expensive, so extending a system by rebuilding the interactive environment is costly and inefficient. The present invention achieves controllable extension of the original dialogue management module by using resources already produced during commercial development together with a few simple dialogue rules. Experiments show that the proposed method achieves almost the same performance as rebuilding the interactive environment. With the invention, developers can extend a data-driven dialogue system quickly and iterate on it efficiently according to user feedback, while retaining the potential advantages of data-driven methods.

Brief description of the drawings

Fig. 1 is a flow diagram of the controllable dialogue management extension method fusing rule information according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of extension by redesigning the interactive environment;

Fig. 3 is a schematic diagram of extension according to an embodiment of the present invention.

Detailed description of embodiments

Preferred embodiments of the present invention are described below with reference to the drawings. Those skilled in the art should understand that these embodiments only explain the technical principles of the invention and are not intended to limit its scope.

The basic idea of the invention is to use the resources already produced during system development together with simple dialogue rules to achieve controllable extension of the dialogue management module, saving the cost of iterating a commercial system and shortening the development cycle. Fig. 1 is a flow diagram of the controllable dialogue management extension method fusing rule information according to an embodiment of the present invention. If developers find that some user intents were not considered in the initial system design and want the new system to handle them, they could also use the approach of Fig. 2. However, Fig. 2 requires developers to build a new interactive environment for the new dialogue management module to interact with, and to train the new module based on a new model structure, which is costly and inefficient in practical commercial development. Fig. 3 illustrates the method proposed by the invention: its core idea is to form the new dialogue management module from the original dialogue management module and the configured dialogue-logic rules by knowledge distillation. Table 3 verifies the effectiveness of the method. In general, extending a deep-reinforcement-learning-based dialogue management module in the invention comprises three main steps: (1) obtain interaction data between the original system and real users; (2) design the constraint relations that the mapping space of the new dialogue management module must satisfy; (3) extend the functionality of the dialogue management module based on the constraint relations designed in (2).

As shown in Fig. 1, the controllable dialogue management extension method fusing rule information according to an embodiment of the present invention includes the following steps:

The technical solution of the invention is further described below with reference to a specific extension scenario.

Assume our task-oriented dialogue system helps users obtain restaurant information in the restaurant domain. Users can search for a suitable restaurant using constraint attributes (inform slots) such as "restaurant name", "area", "price" and "specialty". Besides the constraint attributes, users can also ask about restaurant attributes (request slots) such as "rating", "number of reviews", "address" and "phone number". In the initial system design, the system actions defined by the developers include "inform the user of some attribute value of a restaurant" (inform), "recommend a restaurant that satisfies the constraints" (recommend), "confirm a constraint with the user" (confirm), "ask the user for a constraint" (request), and so on. The user intents defined by the developers include "greeting" (hello), "provide a search constraint" (inform), "negate a search constraint" (deny), "agree" (affirm), "disagree" (negate), "ask about an attribute" (request), "ask for alternatives" (reqalts) and "unrecognizable" (null). Table 1 gives specific examples, where "N/A" means not supported, "System" is the task-oriented dialogue system and "User" is the user.

Table 1. Examples of dialogue acts

| Act | System | User |
|---|---|---|
| inform | This restaurant is in Zhongguancun. | I'd like to eat in Zhongguancun. |
| recommend | "Peppery Manager" is a good restaurant. | N/A |
| confirm | Do you want Sichuan food? | N/A |
| request | Where would you like to eat? | Where is this restaurant? |
| hello | N/A | Hello. |
| deny | N/A | I don't want Sichuan food. |
| affirm | N/A | Yes. |
| negate | N/A | No. |
| reqalts | N/A | Are there any other restaurants? |
| null | N/A | (any unrecognizable input) |

The task-oriented dialogue system of this embodiment uses a reinforcement-learning-based dialogue management module (reinforcement learning being one kind of data-driven method). After development of the system is complete, the original dialogue management module must be trained on the set of original user intents, either with hired real users or by reinforcement learning with a user simulator. A specific method may be as follows: in each training dialogue, the user is given an interactive task, for example "Find a Sichuan restaurant near Zhongguancun and ask for its phone number." If the system completes the user's task, it receives a positive reward, such as +20. To encourage the system to complete the task in fewer turns, a small negative reward, such as -1, penalizes each additional dialogue turn.
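As a minimal illustration of the reward scheme just described, the sketch below computes the undiscounted return of one training dialogue. The constants mirror the example values in the text (+20 on task completion, -1 per turn); the function name, and the assumptions that every turn (including the last) incurs the penalty and that a failed dialogue receives no extra penalty, are illustrative choices rather than details given in the patent.

```python
# Example reward values from the embodiment: +20 for completing the user's task,
# -1 per dialogue turn to favour shorter dialogues.
SUCCESS_REWARD = 20
TURN_PENALTY = -1


def episode_return(num_turns: int, task_completed: bool) -> int:
    """Undiscounted return of one training dialogue (hypothetical helper).

    Assumes every turn, including the final one, incurs TURN_PENALTY and that
    a failed dialogue receives no additional penalty.
    """
    if num_turns < 0:
        raise ValueError("num_turns must be non-negative")
    total = TURN_PENALTY * num_turns
    if task_completed:
        total += SUCCESS_REWARD
    return total


if __name__ == "__main__":
    print(episode_return(5, True))    # 15
    print(episode_return(12, False))  # -12
```

Under this scheme a successful five-turn dialogue scores 15 while a successful ten-turn dialogue scores 10, so the learned policy is pushed toward shorter dialogues.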

After deployment and a period of operation, by analyzing the records of interaction between the system and its users, developers may find that some user intents fell outside the scope considered during training. The developers then need to extend the original dialogue management module so that the new system responds reasonably to those unconsidered intents. The steps of the invention are explained one by one below with reference to the above extension scenario.

Step S1: based on interaction data D, determine the new user intents to be added, and extend the original language understanding module.

The interaction data D may be records of interactions between users and the previously deployed commercial task-oriented dialogue system.

Potential user needs are analyzed from interaction examples in the above extension scenario; a real interaction example is shown below:

U: I'm looking for a restaurant near Zhongguancun.

S: "Peppery Manager" is a good restaurant near Zhongguancun.

U: Is this restaurant's specialty Sichuan food?

S: "Peppery Manager" is a good restaurant near Zhongguancun.

U: I'm asking whether this is a Sichuan restaurant!

S: "Peppery Manager" is a good restaurant near Zhongguancun.

U: What is this restaurant's specialty?

S: The specialty of "Peppery Manager" is Sichuan food.

U: Tell me the phone number.

S: The phone number of "Peppery Manager" is 189XXXXXXXX.

U: Thanks.

S: You're welcome.

Analysis of the data shows that users may want to confirm whether the restaurant recommended by the system satisfies some constraint (for example, whether the restaurant is in a certain area), a user intent that was not considered at the initial design stage. The new system must therefore be able to handle intents like "the user confirms a constraint" (confirm).

Once the new user intents to be added have been determined, the original language understanding module must be extended accordingly. Extending the original language understanding module is relatively simple: on the basis of the original module, labeled data for the new user intents are added and the module is retrained. This technique is mature and well documented in the literature, so it is not described further here.

Step S2: based on the new user intents selected in step S1, construct the new dialogue rules corresponding to them.

Step S3: based on the interaction data of step S1, the dialogue policy of the original dialogue management module and the new dialogue rules obtained in step S2, construct the constraint L that the mapping space of the new dialogue management module must satisfy, as shown in formula (1).

L = λ₁L_D + λ₂L_{D,θ} + λ₃L_{D,R}    (1)

where λ₁, λ₂, λ₃ are preset weights, set to 1, 1 and 3 respectively in this example.

L_D is the behavior-consistency constraint between the behavior of the new dialogue management module and that of the original dialogue management module, as shown in formula (2);

In this formula, 1{·} is the indicator function: it equals 1 if the quantities on both sides of the equals sign inside the braces are equal, and 0 otherwise.

L_{D,θ} is the policy-consistency constraint between the behavior policy of the new dialogue management module and that of the original dialogue management module, as shown in formula (3);

L_{D,R} is the rule-compliance constraint between the behavior policy of the new dialogue management module and the defined new dialogue rules, as shown in formula (4);

where θ^new is the model parameter of the new dialogue management module; θ is the model parameter of the original dialogue management module; d is a dialogue sample in the interaction data D; T is the number of turns of dialogue sample d; |A_s| is the number of system actions (the system actions of the new dialogue management module are the same as those of the original module; the only change is that the new module supports more user behaviors); h_t is the dialogue context of the t-th turn; a_k is a candidate action of the new dialogue management module under the current dialogue context h_t; a_t is the action of the original system under the current dialogue history h_t, and π(·) is the original dialogue management module; L is the number of dialogue rules defined to handle the new user intents; h_l is the dialogue-context condition stated in the l-th rule; and a_l is the action the system should perform when the rule's context condition h_l is satisfied.
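The sketch below evaluates the three terms and the weighted sum of formula (1) on toy data, using the λ values 1, 1 and 3 from this example. It assumes the terms take the negative-log-likelihood (distillation) form, which is an interpretation, since formulas (2) to (4) are not reproduced in this text. Policies are plain functions returning action distributions; the type aliases, helper names and toy contexts are hypothetical. The code only computes the value of L: actually minimizing it over θ^new would require writing new_policy in an automatic-differentiation framework.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy types for illustration only.
Context = Tuple[str, ...]                       # dialogue context h_t
Turn = Tuple[Context, str]                      # (h_t, a_t): context, original system's logged action
Policy = Callable[[Context], Dict[str, float]]  # context -> distribution over system actions
Rule = Tuple[Callable[[Context], bool], str]    # (condition h_l, rule action a_l)

EPS = 1e-12  # guards against log(0)


def nll(dist: Dict[str, float], action: str) -> float:
    """Negative log-probability of `action` under `dist`."""
    return -math.log(max(dist.get(action, 0.0), EPS))


def extension_loss(dialogues: Sequence[List[Turn]], old_policy: Policy,
                   new_policy: Policy, rules: Sequence[Rule],
                   actions: Sequence[str],
                   lambdas: Tuple[float, float, float] = (1.0, 1.0, 3.0)) -> Dict[str, float]:
    """Evaluate L_D, L_{D,theta}, L_{D,R} and their weighted sum L on logged dialogues."""
    l_d = l_d_theta = l_d_r = 0.0
    for dialogue in dialogues:
        for h_t, a_t in dialogue:
            p_new = new_policy(h_t)  # pi(. | h_t; theta_new)
            p_old = old_policy(h_t)  # pi(. | h_t; theta)
            # L_D: imitate the original system's logged action a_t (hard label).
            l_d += nll(p_new, a_t)
            # L_{D,theta}: match the original policy's full distribution (soft labels).
            l_d_theta += sum(p_old.get(a_k, 0.0) * nll(p_new, a_k) for a_k in actions)
            # L_{D,R}: favour the rule action a_l whenever condition h_l holds.
            l_d_r += sum(nll(p_new, a_l) for cond, a_l in rules if cond(h_t))
    lam1, lam2, lam3 = lambdas
    total = lam1 * l_d + lam2 * l_d_theta + lam3 * l_d_r
    return {"L_D": l_d, "L_D_theta": l_d_theta, "L_D_R": l_d_r, "L": total}


if __name__ == "__main__":
    ACTIONS = ["inform", "recommend", "confirm", "request"]

    def uniform(h: Context) -> Dict[str, float]:
        return {a: 1.0 / len(ACTIONS) for a in ACTIONS}

    rules = [(lambda h: "user:confirm" in h, "inform")]
    dialogues = [[(("user:inform",), "recommend"), (("user:confirm",), "recommend")]]
    # With uniform policies each NLL term is ln 4, so L = (2 + 2 + 3 * 1) * ln 4.
    print(extension_loss(dialogues, uniform, uniform, rules, ACTIONS)["L"])
```

In the toy run the second turn is one where the original system answered "recommend" to a confirm request; the rule term is weighted three times as heavily as the imitation terms, which is how the new module can be steered away from the original behavior exactly where a rule applies.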

Step S4: based on the constraint L obtained in step S3, extend the original dialogue management module to generate a new dialogue management module. This includes adding, to the dialogue-state representation of the original dialogue management model, an additional field that represents the new user intent, and setting the constraint that the mapping space of the new dialogue management module must satisfy.
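A minimal sketch of adding an additional field for the new intent to the dialogue-state representation, assuming the state is a flat feature vector made of belief features followed by a one-hot encoding of the last user act. The patent does not specify the encoding, so this layout and the function names are hypothetical.

```python
from typing import List, Sequence

# User acts known to the original system (Table 1) and the newly added intent.
ORIGINAL_USER_ACTS = ["hello", "inform", "deny", "affirm", "negate",
                      "request", "reqalts", "null"]
NEW_USER_ACTS = ["confirm"]


def encode_state(belief: Sequence[float], last_user_act: str,
                 user_acts: Sequence[str]) -> List[float]:
    """Concatenate belief features with a one-hot encoding of the last user act."""
    one_hot = [1.0 if act == last_user_act else 0.0 for act in user_acts]
    return list(belief) + one_hot


def encode_extended_state(belief: Sequence[float], last_user_act: str) -> List[float]:
    """Original encoding followed by the additional field(s) for the new intent(s).

    Appending at the end keeps every original feature at the same index, so the
    part of the input the original module relied on is unchanged.
    """
    return encode_state(belief, last_user_act, ORIGINAL_USER_ACTS + NEW_USER_ACTS)
```

For example, with belief features [0.2, 0.8], the original state for a user "inform" has 10 entries, and the extended state has the same 10 entries followed by one extra field, which is 1.0 only when the user's act is "confirm".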

In this embodiment, if the user "confirms the value of some attribute" (confirm), the more reasonable system behavior is to "inform the value of this attribute" (inform). Formally:

If user: confirm(Slot=value)

Then system: inform(Slot=Value)

Here Slot is the attribute the user wishes to confirm, value is the value the user asks to confirm, and Value is the true value of that attribute. In the conversation example above, when the user asks "Is this restaurant's specialty Sichuan food?", Slot is the restaurant's food type and value is Sichuan food. The system should then answer "The specialty of "Peppery Manager" is Sichuan food.", so Value is also Sichuan food. Since four conditions can be used to search for restaurants in this example (restaurant name, area, price and specialty), there are four such rules in total.
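The rule template above can be sketched as one small class instantiated once per searchable slot, which yields the four rules. The slot identifiers, the action string format and the example entity values are hypothetical; the patent states the rule only in the If/Then form shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical identifiers for the four searchable constraints in the example:
# restaurant name, area, price and specialty.
SEARCH_SLOTS = ["name", "area", "price", "food"]


@dataclass(frozen=True)
class ConfirmRule:
    """If user confirm(Slot=value) then system inform(Slot=Value)."""
    slot: str

    def matches(self, user_act: str, user_slot: Optional[str]) -> bool:
        # Condition h_l: the latest user act is confirm on this rule's slot.
        return user_act == "confirm" and user_slot == self.slot

    def action(self, entity: Dict[str, str]) -> str:
        # Action a_l: inform the true value (Value) of the slot for the
        # currently recommended entity, whatever value the user asked about.
        return f"inform({self.slot}={entity[self.slot]})"


RULES: List[ConfirmRule] = [ConfirmRule(s) for s in SEARCH_SLOTS]


def rule_action(user_act: str, user_slot: Optional[str],
                entity: Dict[str, str]) -> Optional[str]:
    """Return the prescribed system action, or None if no rule fires."""
    for rule in RULES:
        if rule.matches(user_act, user_slot):
            return rule.action(entity)
    return None
```

For the example dialogue, the user's question about whether the specialty is Sichuan food corresponds to rule_action("confirm", "food", entity), which yields "inform(food=Sichuan)" for an entity whose food value is "Sichuan"; any other user act leaves the decision to the learned policy.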

Step S5: based on the constraint that the mapping space of the new dialogue management module must satisfy, train the new dialogue management module.

To evaluate the trained new task-oriented dialogue system that uses the new dialogue management module, user satisfaction can be measured as the proportion of cases in which the system takes a reasonable action in response to the previously unconsidered user intents, computed with formula (5):

Satis. is the user satisfaction; the higher its value, the better the user experience of the system. In this formula, 1{·} is the indicator function: it equals 1 if the quantities on both sides of the equals sign inside the braces are equal, and 0 otherwise.
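A sketch of the satisfaction measure as a ratio: among turns whose context satisfies some rule condition h_l, the share in which the system's action a_t equals the rule action a_l. Contexts are plain dictionaries and the rule action is simplified to "inform(slot)" without the value; these representations and helper names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Context = Dict[str, str]
Turn = Tuple[Context, str]                    # (h_t, a_t): context, system action in that turn
Rule = Tuple[Callable[[Context], bool], str]  # (condition h_l, expected action a_l)


def make_confirm_rule(slot: str) -> Rule:
    # Bind `slot` now; a bare lambda in a loop would capture only the last slot.
    def condition(h: Context) -> bool:
        return h.get("user_act") == "confirm" and h.get("slot") == slot
    return condition, f"inform({slot})"


def user_satisfaction(dialogues: Sequence[List[Turn]], rules: Sequence[Rule]) -> float:
    """Share of rule-triggering turns in which the system took the rule's action."""
    triggered = reasonable = 0
    for dialogue in dialogues:
        for h_t, a_t in dialogue:
            for condition, a_l in rules:
                if condition(h_t):
                    triggered += 1
                    reasonable += int(a_t == a_l)
    # Undefined (NaN) when no turn exercises the new intent.
    return reasonable / triggered if triggered else float("nan")
```

A system that keeps recommending when the user asks to confirm, as in the example dialogue, scores close to 0 on this measure even if its task success rate is high.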

To simulate a realistic training environment, the invention constructs two user simulation environments, Sim1 and Sim2. Sim1's behaviors include "hello", "inform", "deny", "negate", "affirm", "reqalts", "request" and "null". In addition to the user behaviors available to Sim1, Sim2 can also use "confirm" to ask whether the entity currently recommended by the system satisfies some constraint. Sim1 is used to train the original dialogue management module, and Sim2 simulates an online environment containing unknown factors. Note that the real user behavior after deployment, which Sim2 represents, cannot be predicted in advance. A system trained with Sim2 therefore achieves the best performance in the Sim2 test environment and serves as the upper bound for our model.
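The relationship between the two simulators can be sketched with their act inventories: Sim2's act space is Sim1's plus "confirm". The class below is a deliberately minimal, hypothetical stand-in that samples acts uniformly; practical user simulators are usually goal-driven and react to the system's last action, which this sketch does not attempt.

```python
import random
from typing import Sequence

# User act inventories as listed for the two simulators.
SIM1_ACTS = ("hello", "inform", "deny", "negate", "affirm", "reqalts", "request", "null")
SIM2_ACTS = SIM1_ACTS + ("confirm",)  # Sim2 adds the unforeseen "confirm" intent


class ToyUserSimulator:
    """Samples user acts uniformly from a fixed inventory (illustration only)."""

    def __init__(self, acts: Sequence[str], seed: int = 0) -> None:
        self.acts = tuple(acts)
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def next_act(self) -> str:
        return self.rng.choice(self.acts)


sim1 = ToyUserSimulator(SIM1_ACTS, seed=1)
sim2 = ToyUserSimulator(SIM2_ACTS, seed=1)
```

A policy trained only against sim1 never observes "confirm", which is exactly the gap that appears when the original system meets Sim2.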

The dialogue system trained with Sim1 (the original system) does not consider the user intent "confirm". Therefore, after the original system goes online and interacts with Sim2, developers can see that although the dialogue success rate is very high, the system's responses to the unconsidered user intent are very unreasonable. We simulated both the case without environmental noise and the case with environmental noise. The results are shown in Table 2, where D1 denotes interaction data without environmental noise and D2 denotes interaction data in which the language understanding module has an error probability of 0.1.

Table 2. Simulated results after the original system goes online

| Data | Scale | Dialogue success rate | Average turns | Average reward | User satisfaction |
|---|---|---|---|---|---|
| D1 | 1600 | 0.958 | 12.3 | 4.49 | 0.153 |
| D2 | 1600 | 0.964 | 13.6 | 4.00 | 0.151 |
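The noisy condition D2 (language understanding error probability 0.1) can be simulated by corrupting the recognized user act before it reaches the dialogue manager. The sketch assumes a uniform-substitution error model (an erroneous act is replaced by a different act chosen uniformly at random); the text gives only the error probability, so the model and function name are hypothetical.

```python
import random
from typing import Sequence

USER_ACTS = ("hello", "inform", "deny", "negate", "affirm",
             "reqalts", "request", "null", "confirm")


def noisy_nlu(true_act: str, error_rate: float, rng: random.Random,
              acts: Sequence[str] = USER_ACTS) -> str:
    """Return the true user act, or with probability error_rate a different act.

    Uniform substitution is an assumed error model, not one stated in the text.
    """
    if rng.random() < error_rate:
        return rng.choice([a for a in acts if a != true_act])
    return true_act
```

Passing in a seeded random.Random instance keeps simulated runs reproducible, and setting error_rate to 0.0 recovers the noise-free condition D1.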

With the method proposed by the invention, no additional simulation environment needs to be built to train the new dialogue management module. The experimental results are shown in Table 3, where D3 and D4 are test results of the proposed method, and D5 and D6 come from systems trained by redesigning the interactive environment. Table 3 shows that the proposed method, without redesigning the interactive training environment, achieves performance comparable to that of redesigning it, which demonstrates the effectiveness of the invention.

Table 3. Performance of the new system under the proposed method

| Data | Scale | Dialogue success rate | Average turns | Average reward | User satisfaction |
|---|---|---|---|---|---|
| D3 | 3200 | 0.968 | 11.4 | 5.26 | 1 |
| D4 | 3200 | 0.960 | 13.1 | 4.19 | 0.86 |
| D5 | 3200 | 0.971 | 11.2 | 5.28 | 1 |
| D6 | 3200 | 0.958 | 12.9 | 4.21 | 0.87 |

Those skilled in the art should recognize that the exemplary method steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in software modules executed by a processor, or in a combination of the two. Software modules may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROM, or any other form of storage medium known in the art.

The term "comprising" or any similar term is intended to cover non-exclusive inclusion, so that a process or method comprising a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process or method.

The technical solution of the invention has been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the scope of protection of the invention is obviously not limited to these specific embodiments. Without departing from the principles of the invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the scope of protection of the invention.

Claims

1. A controllable dialogue management extension method fusing rule information, characterized by comprising the following steps:

Step S1: based on interaction data D, determining the new user intents to be added, and extending the original language understanding module;

Step S2: based on the new user intents selected in step S1, constructing the new dialogue rules corresponding to them;

Step S3: based on the interaction data of step S1, the dialogue policy of the original dialogue management module and the new dialogue rules obtained in step S2, constructing the constraint L that the mapping space of the new dialogue management module must satisfy;

Step S4: based on the constraint L obtained in step S3, extending the original dialogue management module to generate a new dialogue management module.

2. The controllable dialogue management extension method fusing rule information according to claim 1, characterized in that the constraint L that the mapping space of the new dialogue management module must satisfy is specifically:

L = λ₁L_D + λ₂L_{D,θ} + λ₃L_{D,R}

where L_D is the behavior-consistency constraint between the behavior of the new dialogue management module and that of the original dialogue management module; L_{D,θ} is the policy-consistency constraint between the behavior policy of the new dialogue management module and that of the original dialogue management module; L_{D,R} is the rule-compliance constraint between the behavior policy of the new dialogue management module and the defined new dialogue rules; and λ₁, λ₂, λ₃ are preset weights.

3. The controllable dialogue management extension method fusing rule information according to claim 2, characterized in that, in the constraint terms, θ^new is the model parameter of the new dialogue management module; θ is the model parameter of the original dialogue management module; d is a dialogue sample in the interaction data D; T is the number of turns of dialogue sample d; |A_s| is the number of system actions; h_t is the dialogue context of the t-th turn; a_k is a candidate action of the new dialogue management module under the current dialogue context h_t; a_t is the action of the original system under the current dialogue history h_t, and π(·) is the original dialogue management module; L is the number of dialogue rules defined to handle the new user intents; h_l is the dialogue-context condition stated in the l-th rule; and a_l is the action the system should perform when the rule's context condition h_l is satisfied.

4. The controllable dialogue management extension method fusing rule information according to claim 1, characterized in that the original language understanding module is extended as follows:

on the basis of the original language understanding module, labeled data for the new user intents to be added are added, and the language understanding module is retrained.

5. The controllable dialogue management extension method fusing rule information according to claim 1, characterized in that the original dialogue management module is extended as follows:

the dialogue rules corresponding to the new user intent are set;

6. The controllable dialogue management extension method fusing rule information according to any one of claims 1 to 5, characterized in that both the new dialogue management module and the original dialogue management module are data-driven dialogue management modules.

7. The controllable dialogue management extension method fusing rule information according to any one of claims 1 to 5, characterized in that two user simulation environments, Sim1 and Sim2, are constructed to train and test the new dialogue management module, where Sim1 is used to train the original dialogue management module and Sim2 simulates an online environment containing unknown factors.

8. The controllable dialogue management extension method fusing rule information according to any one of claims 1 to 5, characterized by further comprising a user satisfaction calculation, in which Satis. is the user satisfaction; d is a dialogue sample in the interaction data D; t is the turn index within dialogue sample d; L is the number of defined dialogue rules; h_t is the dialogue context of the t-th turn; h_l is the dialogue-context condition stated in the l-th rule; a_t is the action of the original system under the current dialogue history h_t; a_l is the action the system should perform when the rule context h_l is satisfied; and 1{·} is the indicator function, which equals 1 if the quantities on both sides of the equals sign are equal, and 0 otherwise.