CN109739965B

CN109739965B - Method, device and equipment for migrating cross-domain conversation strategy and readable storage medium

Info

Publication number: CN109739965B
Application number: CN201811641823.7A
Authority: CN
Inventors: 莫凯翔
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2022-07-15
Anticipated expiration: 2038-12-29
Also published as: CN109739965A

Abstract

The invention provides a method for migrating a cross-domain dialog policy, comprising the following steps: processing an input user dialog to map out a corresponding target-domain dialog state; mapping the target-domain dialog state to a source-domain dialog state; processing the source-domain dialog state based on a preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply; and mapping the source-domain dialog reply to a target-domain dialog reply. The invention also provides a device, equipment and a readable storage medium for migrating a cross-domain dialog policy. The invention addresses the technical problems that conventionally constructed dialog systems are difficult to maintain, that manually labeling data is costly, that data is labeled repeatedly, and that labeled data is difficult to apply across domains.

Description

Method, device and equipment for migrating cross-domain conversation strategy and readable storage medium

Technical Field

The invention relates to the technical field of computers, in particular to a method, a device, equipment and a readable storage medium for migrating a cross-domain conversation strategy.

Background

The dialog system is an important component in the field of human-computer interaction, and the dialog system conventionally constructed at present mainly comprises: a dialogue system built by utilizing rules, a dialogue system based on supervised learning and a dialogue system based on reinforcement learning.

Rule-based dialog systems appeared earliest and are easier for humans to understand and control. Their disadvantage is that the developer must enumerate all cases and define a rule for each case in advance. When the actual scenario is complex and the rules accumulate, they easily conflict with one another, which makes the system difficult to maintain. Such systems have difficulty supporting large-scale dialog applications.

Dialog systems based on supervised learning and on reinforcement learning are obtained by training models on data; developers do not need to define rules for every case in advance and only need to collect annotated data for training the models. The biggest disadvantage of both kinds of system, however, is the need for large-scale annotated data. Because real application scenarios are numerous, collecting enough annotated data for every dialog scenario is impractical. The main reasons include:

1. The cost of manually labeling data is high.

2. Different scenarios may contain many repeated labels, which wastes resources. For example, scenarios such as buying coffee, booking airline tickets and booking hotels share the same categories of demand function (referred to as "intents" in this disclosure), such as "inform" and "request", and the same task information (referred to as "slots" in this disclosure), such as "location" and "time".

3. It is difficult to use data from one domain directly to train a model for another domain. First, the same or similar intents and slots may be labeled with different names by different companies; second, different domains do have genuinely different intents and slots.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The main object of the invention is to provide a method, a device, equipment and a readable storage medium for migrating a cross-domain dialog policy, so as to solve the technical problems that conventionally constructed dialog systems are difficult to maintain, that manually labeling data is costly, that data is labeled repeatedly, and that labeled data is difficult to apply across domains.

In order to achieve the above object, the present invention provides a migration method of cross-domain dialog policy, which includes the following steps:

processing an input user dialog to map out a corresponding target-domain dialog state;

mapping the target-domain dialog state to a source-domain dialog state;

processing the source-domain dialog state based on a preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply;

and mapping the source-domain dialog reply to a target-domain dialog reply.

Preferably, the step of processing the input user dialog to map out a corresponding target-domain dialog state specifically includes:

performing natural language understanding on the input user dialog to identify a target-domain intent and extract target-domain slots;

tracking the target-domain intent;

and mapping the user input dialog according to the target-domain intent, the target-domain slots and the tracking result of the target-domain intent, so as to obtain a corresponding target-domain dialog state.

Preferably, the step of mapping the target-domain dialog state to the source-domain dialog state specifically includes:

determining a source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;

acquiring the source-domain intent with the maximum preset similarity to the target-domain intent;

acquiring the source-domain slot with the maximum preset similarity to the target-domain slot;

and generating a source-domain dialog state according to the source-domain intent and the source-domain slot.

acquiring the source-domain slot for which a correspondence with the target-domain slot has been established, wherein the slots of the target domain and the slots of the source domain are each ranked by importance in advance, and the correspondence between target-domain slots and source-domain slots is established according to the ranking results;

Preferably, the step of mapping the target-domain dialog state to the source-domain dialog state specifically includes:

solving, based on a predefined learning objective equation, for the set of variables that maximizes that equation; determining, from the solved variables, the similarity between any pair of intents or any pair of slots drawn from the source domain and the target domain;

according to the result of the similarity learning, acquiring the source-domain intent with the maximum similarity to the target-domain intent;

according to the result of the similarity determination, acquiring the source-domain slot with the maximum similarity to the target-domain slot;

In addition, to achieve the above object, the present invention further provides a migration apparatus for cross-domain dialog policy, the apparatus including:

a target-domain dialog state mapping unit, configured to process an input user dialog to map out a corresponding target-domain dialog state;

a source-domain dialog state mapping unit, configured to map the target-domain dialog state to a source-domain dialog state;

a source-domain dialog state processing unit, configured to process the source-domain dialog state based on a preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply;

and a target-domain dialog reply mapping unit, configured to map the source-domain dialog reply to a target-domain dialog reply.

Preferably, the target-domain dialog state mapping unit is specifically configured to: perform natural language understanding on the input user dialog to identify a target-domain intent and extract target-domain slots; track the target-domain intent; and map the user input dialog according to the target-domain intent, the target-domain slots and the tracking result of the target-domain intent, so as to obtain a corresponding target-domain dialog state.

Preferably, the source-domain dialog state mapping unit is specifically configured to: determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain; acquire the source-domain intent with the maximum preset similarity to the target-domain intent; acquire the source-domain slot with the maximum preset similarity to the target-domain slot; and generate a source-domain dialog state according to the source-domain intent and the source-domain slot.

Preferably, the source-domain dialog state mapping unit is specifically configured to:

acquire the source-domain slot for which a correspondence with the target-domain slot has been established, wherein the slots of the target domain and the slots of the source domain are each ranked by importance, and the correspondence between target-domain slots and source-domain slots is established according to the ranking results;

Preferably, the source-domain dialog state mapping unit is specifically configured to: determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain; solve, based on a predefined learning objective equation, for the set of variables that maximizes that equation; determine, from the solved variables, the similarity between any pair of intents or any pair of slots drawn from the source domain and the target domain; according to the result of the similarity learning, acquire the source-domain intent with the maximum similarity to the target-domain intent; according to the result of the similarity determination, acquire the source-domain slot with the maximum similarity to the target-domain slot; and generate a source-domain dialog state according to the source-domain intent and the source-domain slot.

In addition, to achieve the above object, the present invention further provides equipment for migrating a cross-domain dialog policy, the equipment including: a memory, a processor, and a cross-domain dialog policy migration program that is stored on the memory and can run on the processor, wherein the migration program, when executed by the processor, implements the steps of the method for migrating a cross-domain dialog policy described above.

In addition, to achieve the above object, the present invention further provides a readable storage medium, on which a migration program of a cross-domain dialogue policy is stored, and when executed by a processor, the migration program of the cross-domain dialogue policy implements the steps of the migration method of the cross-domain dialogue policy as described above.

The embodiments of the invention provide a method, a device, equipment and a readable storage medium for migrating a cross-domain dialog policy. The target-domain dialog state is mapped to a source-domain dialog state, and the source-domain dialog state is processed based on a preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply; the source-domain dialog reply is then mapped to a target-domain dialog reply, thereby migrating the dialog policy of the source domain to the target domain. In this way, the ample training data of the source domain and its high-performing dialog policy are fully used: no large body of training data needs to be prepared for the target domain, and a target-domain dialog reply corresponding to the user input dialog can be generated without training a dialog policy for the target domain. This reduces the amount of manually labeled data required and the cost of data acquisition; it also avoids a large amount of repeated labeling, reduces the waste of data resources, and widens the range of application scenarios in each domain.

Drawings

FIG. 1 is a flowchart illustrating a migration method of cross-domain dialog policies according to a first embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating a detailed step of step S10 in the migration method of cross-domain dialog policy according to the first embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating an implementation process of the migration method of cross-domain conversation policy according to the present invention;

FIG. 4 is a flowchart illustrating a migration method of cross-domain dialog policies according to a second embodiment of the present invention;

FIG. 5 is a flowchart illustrating a migration method of cross-domain dialog policies according to a third embodiment of the present invention;

FIG. 6 is a flowchart illustrating a migration method of cross-domain dialog policies according to a fourth embodiment of the present invention;

FIG. 7 is a schematic diagram illustrating the functional units of the device for migrating a cross-domain dialog policy according to the present invention;

FIG. 8 is a schematic diagram of the operating environment of a migration device for cross-domain dialog policies of the present invention.

The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Some of the terms and explanations related to the present invention are listed below:

Intent: in a task-oriented dialog system, sentences are divided into different categories according to different tasks; each category expresses a different meaning, and each category is an intent.

For example: the sentence "I want to book a ticket from Beijing to Shanghai" expresses a requirement of the user and can be defined as an "inform" intent; the sentence "What times are the tickets available?" indicates that the user is asking about ticket information and can be defined as a "request" intent.

It is worth noting that different companies may use different names for the same intent in different scenarios; for example, the "request" intent may be named "question" or "get information" by other companies.

Slot: in a task-oriented dialog system, different information needs to be collected according to different tasks, and each piece of information is a slot.

For example: in the sentence "I want to book a ticket from Beijing to Shanghai", "Beijing" fills the origin slot and "Shanghai" fills the destination slot. It is also worth noting that different companies may use different names for the same slot in different scenarios; for example, "origin" may be labeled "departure city".

Target domain: a domain that needs to be improved and does not have sufficient training data.

Source domain: an existing domain that has a large amount of training data and a dialog policy with a high performance level.

The invention provides a migration method of a cross-domain conversation strategy.

Referring to fig. 1, fig. 1 is a flowchart illustrating a migration method of a cross-domain dialog policy according to a first embodiment of the present invention. In this embodiment, the method comprises the steps of:

step S10, processing the input user input dialog to map out the corresponding target domain dialog state;

Embodiments of the present invention are particularly applicable to task-oriented dialog systems. The purpose of a task-oriented dialog system is to help the user complete a task, such as booking a hotel or purchasing an airline ticket, by recognizing the user's intent. In a specific implementation, the user input dialog may be dialog information generated from text or voice that the user enters into a human-computer interaction system. For example, when the user needs to book an air ticket, the user may enter "I want to book an air ticket from Shanghai to Beijing" into the human-computer interaction system (a ticket booking platform); after the system detects the user's input, it extracts the corresponding user input dialog.

The target domain is the domain that best matches the user input dialog. Its specific type may be set manually by the user; for example, the user selects a target domain such as "ticket booking" before or after entering information. Alternatively, it may be obtained by analyzing the user input dialog; for example, from the user input dialog "I want to book an air ticket from Shanghai to Beijing", the target domain is determined to be "ticket booking" or "air ticket booking". The target domain may also be another task-oriented scenario such as checking data usage, checking phone charges, ordering a meal or consultation.

As shown in fig. 2, in one embodiment, step S10 includes:

step S11, natural language understanding is carried out on the input user input dialogue to identify the target field intention and extract the target field slot;

Referring to fig. 3, fig. 3 is a schematic diagram illustrating an implementation process of the method for migrating a cross-domain dialog policy according to the present invention. The user input dialog is natural language, and a natural language understanding module (or unit) processes it to perform target-domain identification, user intent identification and slot extraction. Target-domain identification determines the task-oriented scenario to which the user input dialog belongs. User intent identification determines the user's intent, which subdivides the task-oriented scenario into sub-scenarios. Slot extraction extracts slots and their slot values from the user input dialog and may be implemented by slot filling. The specific techniques for performing natural language understanding on the user input dialog, identifying the target-domain intent and extracting the target-domain slots are conventional prior art and are not described here.

Step S12, tracking the intention of the target field;

Dialog state tracking is a core component that ensures the robustness of a dialog system. It estimates the user's goal at each turn of the dialog, manages the input and dialog history of each turn, and outputs the current dialog state. This typical state structure is often called slot filling or a semantic frame. Conventional methods, which are widely used in most commercial implementations, typically apply hand-crafted rules to select the most likely output.
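As a rough illustration of the hand-crafted-rule tracking described above, the sketch below keeps the dialog history and carries the intent and slot values forward from turn to turn. The class name `RuleBasedTracker` and its two update rules are assumptions made for this example; the patent does not specify a tracker implementation.

```python
from typing import Dict, List, Optional, Tuple


class RuleBasedTracker:
    """Keeps the dialog history and the latest intent and slot values."""

    def __init__(self) -> None:
        self.history: List[Tuple[Optional[str], Dict[str, str]]] = []
        self.intent: Optional[str] = None
        self.slots: Dict[str, str] = {}

    def update(self, intent: Optional[str], slots: Dict[str, str]) -> Dict[str, object]:
        """Record one turn and return the current dialog state."""
        self.history.append((intent, dict(slots)))
        if intent is not None:
            # Rule 1 (assumed): a newly recognized intent replaces the old one.
            self.intent = intent
        # Rule 2 (assumed): newer slot values overwrite older ones.
        self.slots.update(slots)
        return {"intent": self.intent, "slots": dict(self.slots)}


tracker = RuleBasedTracker()
tracker.update("book air ticket", {"departure city": "Shanghai"})
# Second turn: the user only supplies a further slot value.
state = tracker.update(None, {"arrival city": "Beijing"})
print(state)
```

The second turn carries no intent, so the tracker keeps the intent from the first turn and only adds the new slot value.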

And step S13, mapping the user input dialog according to the target field intention, the target field slot and the tracking result of the target field intention so as to obtain a corresponding target field dialog state.

The target-domain dialog state may specifically be a combination of an intent and a set of slots with their slot values.

For example, the user input dialog "I want to book an air ticket from Shanghai to Beijing" undergoes word segmentation and stemming, and a semantic slot corresponding to the user input dialog is then generated. Semantic slots can be predefined for different scenarios. From the semantic slot, the intent, the slots and the slot values of the user dialog are determined:

Intent = book air ticket

Slot 1 = departure city, with slot value = Shanghai

Slot 2 = arrival city, with slot value = Beijing
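One possible in-memory representation of such a dialog state is sketched below: an intent plus a mapping from slot names to slot values. The `DialogState` class and its `as_frame` helper are hypothetical names chosen for this example and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class DialogState:
    """A dialog state: one intent plus slot name -> slot value pairs."""
    domain: str
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)

    def as_frame(self) -> str:
        """Render the state in the brace notation used in this description."""
        parts = [f"intent: {self.intent}"]
        parts += [f"{name}: {value}" for name, value in self.slots.items()]
        return "{ " + ", ".join(parts) + " }"


target_state = DialogState(
    domain="air ticket booking",
    intent="book air ticket",
    slots={"departure city": "Shanghai", "arrival city": "Beijing"},
)
print(target_state.as_frame())
```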

Step S20, mapping the target domain dialog state to a source domain dialog state;

Specifically, the corresponding source domain is determined according to the target domain; the source domain is a domain designated in advance for the target domain. The similarity between the target domain and the source domain is preferably high. For example, the source domain designated in advance for the "air ticket booking" domain is the "hotel booking" domain.

Then, the source-domain intent with the maximum similarity to the target-domain intent in the target-domain dialog state is obtained. For example, the target-domain intent "book air ticket" has a different similarity to each intent in the source domain (hotel booking), such as "book hotel" and "query property location"; this embodiment selects the source-domain intent with the maximum similarity.

Likewise, the source-domain slot with the maximum similarity to each target-domain slot in the target-domain dialog state is obtained. For example, the target-domain slot "departure city" in the target domain (air ticket booking) has a different similarity to each slot in the source domain (hotel booking), such as "check-in time", "number of guests" and "property location"; this embodiment selects the source-domain slot with the maximum similarity.

Then, a source-domain dialog state is generated from the source-domain intent and the source-domain slots. Once the source-domain intent and the source-domain slots with the maximum similarity have been obtained, the target-domain dialog state has been mapped to the source-domain dialog state.

For specific implementations of step S20, refer to the other embodiments below.
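The selection "take the source-domain item with the maximum similarity" amounts to an argmax over a similarity table. The sketch below shows this for one intent and one slot; the helper name `most_similar` and all similarity scores are made-up illustrative values, not figures from the patent.

```python
from typing import Dict

# Similarity of each target-domain item to each source-domain item.
# All scores are hypothetical and only serve to illustrate the argmax.
INTENT_SIM: Dict[str, Dict[str, float]] = {
    "book air ticket": {"book hotel": 0.9, "query property location": 0.2},
}
SLOT_SIM: Dict[str, Dict[str, float]] = {
    "departure city": {
        "check-in time": 0.3,
        "number of guests": 0.1,
        "property location": 0.6,
    },
}


def most_similar(target_item: str, similarity: Dict[str, Dict[str, float]]) -> str:
    """Return the source-domain item with the highest similarity score."""
    candidates = similarity[target_item]
    return max(candidates, key=candidates.get)


source_intent = most_similar("book air ticket", INTENT_SIM)
source_slot = most_similar("departure city", SLOT_SIM)
print(source_intent, "|", source_slot)
```

How the scores are obtained (manual specification, importance ranking, or learning) is what distinguishes the later embodiments; the argmax step itself stays the same.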

Step S30, processing the dialog state of the source field based on the preset dialog strategy of the source field to obtain the corresponding dialog reply of the source field;

The source domain has a large amount of training data, and a dialog policy with a high performance level (that is, the preset dialog policy) is generally obtained by training on that data; alternatively, the preset dialog policy is set manually.

Specifically, a preset dialogue strategy of the source field is called, and the dialogue state of the source field is processed through the preset dialogue strategy, so that a corresponding dialogue reply of the source field is obtained.

For example, when the source-domain dialog state is { intent: book hotel, check-in time: October 1, 2018, check-out time: October 2, 2018 }, the preset dialog policy of the source domain generates the optimal abstract dialog reply { intent: query, price: ? }, where "?" means that the reply asking about the price takes the form of a question.

Step S40, mapping the source domain dialog reply to a target domain dialog reply.

As shown in fig. 3, after the source-domain dialog reply is obtained, it is mapped to obtain the target-domain dialog reply in the target domain. For example, the abstract source-domain dialog reply { intent: query, price: ? } of the source domain (hotel booking) is mapped to the abstract target-domain dialog reply { intent: query, price: ? }, where "?" means that the reply asking about the price takes the form of a question.

Further, the target-domain dialog reply may be organized into natural language before being returned to the user, so that the user can understand it easily. For example, the abstract target-domain dialog reply { intent: query, price: ? } of the target domain (air ticket booking) is organized into the natural-language sentence "What price of ticket would you like?".
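Steps S20 to S40 can be read as a composition of three functions. The sketch below wires them together with trivial stand-ins: the state mapping, the hotel policy, the reply mapping and the sentence template are all hypothetical placeholders written for this example, since the patent leaves their implementations to the individual embodiments.

```python
from typing import Callable, Dict

Frame = Dict[str, str]  # {"intent": ..., "<slot>": "<value>", ...}


def migrate_reply(
    target_state: Frame,
    to_source: Callable[[Frame], Frame],
    source_policy: Callable[[Frame], Frame],
    to_target: Callable[[Frame], Frame],
) -> Frame:
    """Reuse a source-domain policy to answer a target-domain dialog state."""
    source_state = to_source(target_state)       # step S20
    source_reply = source_policy(source_state)   # step S30
    return to_target(source_reply)               # step S40


# Hypothetical stand-ins for the three mappings.
def to_source(state: Frame) -> Frame:
    return {"intent": "book hotel",
            "check-in time": "2018-10-01",
            "check-out time": "2018-10-02"}


def hotel_policy(state: Frame) -> Frame:
    return {"intent": "query", "price": "?"}


def to_target(reply: Frame) -> Frame:
    # "query" and "price" are assumed to exist in both domains.
    return dict(reply)


# Illustrative template for turning an abstract reply into a sentence.
TEMPLATES = {("query", "price"): "What price of ticket would you like?"}


def render(reply: Frame) -> str:
    slot = next(name for name in reply if name != "intent")
    return TEMPLATES[(reply["intent"], slot)]


target_state = {"intent": "book air ticket",
                "departure city": "Shanghai",
                "arrival city": "Beijing"}
reply = migrate_reply(target_state, to_source, hotel_policy, to_target)
print(reply, "->", render(reply))
```

Keeping the three mappings as separate callables means the source-domain policy can be swapped or retrained without touching the target-domain code.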

In this embodiment, the target-domain dialog state is mapped to a source-domain dialog state, and the source-domain dialog state is processed based on the existing preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply; the source-domain dialog reply is then mapped to a target-domain dialog reply, thereby migrating the dialog policy of the source domain to the target domain. In this way, the ample training data of the source domain and its high-performing dialog policy are fully used: no large body of training data needs to be prepared for the target domain, and a target-domain dialog reply corresponding to the user input dialog can be generated without training a dialog policy for the target domain. This reduces the amount of manually labeled data required and the cost of data acquisition; it also avoids a large amount of repeated labeling, reduces the waste of data resources, and widens the range of application scenarios in each domain.

The technical solution of the present invention is further described with reference to specific extended scenarios.

Further, on the basis of the first embodiment of the migration method of the cross-domain dialogue strategy, the second embodiment is provided. As shown in fig. 4, one specific implementation of step S20 includes:

Step S201, determining a source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;

The source domain is a domain designated in advance for the target domain; for example, domain B is designated as the source domain of target domain A, and domain D is designated as the source domain of target domain C. This designated relation is the preset association between the target domain and the source domain. After the target domain is determined, the source domain is determined from the target domain and its preset association.
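A preset association of this kind can be as simple as a lookup table. The sketch below uses the A to B and C to D pairs from the text plus the air ticket and hotel pair from the first embodiment; the names `SOURCE_DOMAIN_OF` and `source_domain_for` are illustrative.

```python
from typing import Dict

# Preset association: target domain -> designated source domain.
SOURCE_DOMAIN_OF: Dict[str, str] = {
    "A": "B",
    "C": "D",
    "air ticket booking": "hotel booking",
}


def source_domain_for(target_domain: str) -> str:
    """Look up the source domain designated in advance for a target domain."""
    try:
        return SOURCE_DOMAIN_OF[target_domain]
    except KeyError:
        raise ValueError(
            f"no source domain preset for target domain {target_domain!r}"
        ) from None


print(source_domain_for("air ticket booking"))
```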

Step S202, obtaining a source domain intention with the maximum preset similarity with the target domain intention;

Specifically, the similarity between each target-domain intent and each source-domain intent is manually specified in advance. When the source-domain intent is determined, the source-domain intent with the maximum preset similarity to the target-domain intent is obtained according to the preset similarities of that target-domain intent.

Step S203, acquiring a source field slot position with the maximum preset similarity with the target field slot position;

Specifically, the similarity between each target-domain slot and each source-domain slot is manually specified in advance. When the source-domain slot is determined, the source-domain slot with the maximum preset similarity to the target-domain slot is obtained according to the preset similarities of that target-domain slot.

And step S204, generating a source field conversation state according to the source field intention and the source field slot position.

After the source-domain intent and the source-domain slots are obtained, the source-domain dialog state is generated. The source-domain dialog state may specifically be a combination of the source-domain intent and a set of source-domain slots with their slot values. The slot value of a source-domain slot may be default slot information that is set manually or obtained according to a preset rule. For example, if the source domain is hotel booking and the source-domain slot is the check-in time, the corresponding slot value is set to the current date; if the source-domain slot is the check-out time, the corresponding slot value is set to the following day.

For example, the target domain is "air ticket booking", and the target-domain dialog state is { intent: book air ticket, departure city: Shanghai, arrival city: Beijing }. The source-domain intent with the maximum similarity to the target-domain intent in the target-domain dialog state is "book hotel", and the source-domain slots with the maximum similarity to the target-domain slots "departure time" and "arrival time" are "check-in time" and "check-out time" respectively; the slot values corresponding to these source-domain slots are then obtained to generate the source-domain dialog state. Thus, the target-domain dialog state { intent: book air ticket, departure city: Shanghai, arrival city: Beijing } is mapped to the source-domain dialog state { intent: book hotel, check-in time: October 1, 2018, check-out time: October 2, 2018 }.

In this embodiment, the similarities between any target-domain intent and any source-domain intent, and between any target-domain slot and any source-domain slot, are manually specified. From these, the source-domain intent corresponding to the target-domain intent and the source-domain slot corresponding to each target-domain slot are determined, so that the target-domain dialog state is mapped to the source-domain dialog state. Manually specifying similarities is easy to implement and to maintain.
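The mapping in this example can be sketched with two manually specified similarity tables and a rule for default slot values. The scores in the tables are made-up values, the names `map_state` and `DEFAULT_VALUE` are hypothetical, and the current date is assumed to be October 1, 2018 so that the result matches the example.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# Manually specified similarities (illustrative scores only).
INTENT_SIM: Dict[str, Dict[str, float]] = {
    "book air ticket": {"book hotel": 1.0, "query property location": 0.1},
}
SLOT_SIM: Dict[str, Dict[str, float]] = {
    "departure time": {"check-in time": 1.0, "check-out time": 0.2},
    "arrival time": {"check-in time": 0.2, "check-out time": 1.0},
}

# Preset rules for default slot values in the hotel-booking source domain.
DEFAULT_VALUE: Dict[str, Callable[[date], str]] = {
    "check-in time": lambda today: today.isoformat(),
    "check-out time": lambda today: (today + timedelta(days=1)).isoformat(),
}


def argmax(scores: Dict[str, float]) -> str:
    return max(scores, key=scores.get)


def map_state(target_intent: str, target_slots: List[str], today: date) -> Dict[str, str]:
    """Build the source-domain dialog state from preset similarities."""
    state = {"intent": argmax(INTENT_SIM[target_intent])}
    for slot in target_slots:
        source_slot = argmax(SLOT_SIM[slot])
        state[source_slot] = DEFAULT_VALUE[source_slot](today)
    return state


source_state = map_state(
    "book air ticket", ["departure time", "arrival time"], date(2018, 10, 1)
)
print(source_state)
```

Because the tables are plain dictionaries, adding a new intent or slot is a one-line change, which reflects the ease of maintenance noted for this embodiment.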

Further, on the basis of the first embodiment of the migration method of the cross-domain dialogue strategy, the third embodiment is provided. As shown in fig. 5, one specific implementation of step S20 includes:

Step S205, determining a source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;

step S205 is the same as step S201 described above, and step S201 may be referred to for specific implementation.

Step S206, obtaining a source domain intention with the maximum preset similarity with the target domain intention;

step S206 is the same as step S202 described above, and step S202 may be referred to for specific implementation.

Step S207, acquiring the source-domain slot corresponding to the target-domain slot, wherein the slots of the target domain and the slots of the source domain are each ranked by importance in advance, and the correspondence between target-domain slots and source-domain slots is established according to the ranking results;

Specifically, based on information entropy theory, the importance of a slot in a domain is measured by the attribute entropy of the slot. The attribute entropy of a slot is an entropy that has been normalized, and a preferred calculation formula is as follows:

$$\eta(s) = -\sum_{\upsilon \in V_s} \frac{p(s=\upsilon)\,\log p(s=\upsilon)}{|V_s|}$$

where s denotes a slot, η(s) is the attribute entropy of that slot, υ is an attribute of the slot, V_s is the set of attributes of the slot (|V_s| is the total number of attributes), and p(s = υ) is the empirical probability that an entity in the database has attribute υ for slot s.

For example, the table below shows "whether children are allowed" and "price level" for different restaurants in the restaurant database.

The attribute entropy of the slot "whether children are allowed" is η(s) = −[p(s = allowed) × log p(s = allowed)]/2 − [p(s = not allowed) × log p(s = not allowed)]/2, where p(s = allowed) = 0/4, p(s = not allowed) = 4/4, and |V_s| = 2. With the convention 0 × log 0 = 0, the attribute entropy of "whether children are allowed" is 0.

Similarly, the attribute entropy of the slot "price level" is η(s) = −[p(s = high) × log p(s = high) + p(s = medium) × log p(s = medium) + p(s = low) × log p(s = low)]/3, where p(s = high) = 1/4, p(s = medium) = 1/4, p(s = low) = 2/4, and |V_s| = 3. Taking the logarithm to base 10, the attribute entropy of "price level" is approximately 0.15.

The attribute entropy of a slot calculated by the above formula is non-negative. The lower the entropy, the lower the information gain of the attribute and the lower its importance; correspondingly, the lower the importance of the slot. In the above example, it makes no practical sense for the system to ask the user about the "children allowed" preference, because none of the restaurants in the database allows children, so such a question provides no information gain. Accordingly, the entropy of the "price level" attribute is higher than that of the "whether children are allowed" attribute.
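The restaurant calculation above can be reproduced with a short sketch. The base-10 logarithm is an assumption inferred from the stated result of 0.15, and the four restaurant rows are hypothetical values chosen only to match the stated probabilities.

```python
import math
from collections import Counter


def attribute_entropy(values, num_attributes):
    """Normalized attribute entropy: -sum(p * log10(p)) / |V_s|.

    `values` holds one attribute value per database entity;
    `num_attributes` is |V_s|, the number of possible attribute values.
    """
    counts = Counter(values)
    total = len(values)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        # Attribute values that never occur (p == 0) are absent from `counts`,
        # which implements the convention 0 * log 0 = 0.
        entropy -= p * math.log10(p)
    return entropy / num_attributes


# Hypothetical database of four restaurants.
children_allowed = ["not allowed"] * 4
price_level = ["high", "medium", "low", "low"]

eta_children = attribute_entropy(children_allowed, 2)
eta_price = attribute_entropy(price_level, 3)
```

Under these assumptions `eta_children` is 0 and `eta_price` is roughly 0.15, so the "price level" slot ranks as more important than "whether children are allowed".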

After the attribute entropies of the different slots are calculated, the importance of the slots is compared by comparing their attribute entropy values, and the slots are then ranked by importance, yielding the slot importance ranking of a given domain. For any two domains, a correspondence is established between the slots that occupy the same position in the two domains' respective importance rankings. In this embodiment, the comparison and ranking of slot importance in the target and source domains, and the establishment of the slot correspondence, are performed in advance.

For example, in the target domain "air ticket booking", the slots that most help the user screen tickets are, in descending order of importance: departure city, destination city, departure time, airline, price. In the source domain "hotel booking", the slots that most help the user screen hotels are, in descending order of importance: check-in time, check-out time, hotel location, hotel star rating, price, room type, and so on. When the slot correspondence between the two domains is established, departure city corresponds to check-in time, destination city to check-out time, departure time to hotel location, airline to hotel star rating, and so on. It should be noted that if all flights take off in the morning, the attribute entropy of the slot "departure time" is very low and the slot does not help the user screen flights.

Therefore, based on the correspondence between the slots of the target domain and those of the source domain, the source domain slot corresponding to a target domain slot is found. For example, if the target domain slot is departure city, the corresponding source domain slot is check-in time, and so on.
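The rank-based correspondence can be sketched as below. The entropy values are hypothetical and chosen only so that the rankings match the example; the function name is likewise an assumption.

```python
def align_slots_by_rank(target_entropy, source_entropy):
    """Pair slots that occupy the same position in each domain's importance ranking.

    Both arguments map slot name -> attribute entropy. Slots are ranked in
    descending order of entropy; surplus slots in the longer ranking are left unpaired.
    """
    target_ranked = sorted(target_entropy, key=target_entropy.get, reverse=True)
    source_ranked = sorted(source_entropy, key=source_entropy.get, reverse=True)
    return dict(zip(target_ranked, source_ranked))


# Hypothetical attribute entropies for the two domains.
flight_entropy = {
    "departure_city": 0.30,
    "destination_city": 0.28,
    "departure_time": 0.20,
    "airline": 0.15,
    "price": 0.10,
}
hotel_entropy = {
    "check_in_time": 0.31,
    "check_out_time": 0.29,
    "hotel_location": 0.22,
    "hotel_star_rating": 0.18,
    "price": 0.12,
    "room_type": 0.05,
}

slot_mapping = align_slots_by_rank(flight_entropy, hotel_entropy)
```

Because the pairing depends only on rank positions, it can be computed once in advance and reused as a lookup table at dialog time.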

And S208, generating a source field conversation state according to the source field intention and the source field slot position.

Step S208 is the same as step S204 described above, and step S204 may be referred to for specific implementation.

In this embodiment, a correspondence between the target domain slots and the source domain slots is established based on their importance rankings, and the source domain slot corresponding to a target domain slot is obtained from this correspondence. The target domain dialog state is then mapped to the source domain dialog state based on the source domain intent corresponding to the target domain intent and on the source domain slot. Because the slot correspondence between the two domains is built from the importance ranking, the slot importance data is fully used, which helps improve the accuracy, effectiveness, and reliability of slot matching.

Further, on the basis of the first embodiment of the migration method of the cross-domain dialogue strategy, the fourth embodiment is provided. As shown in fig. 6, one specific implementation of step S20 includes:

Step S209, determining a source domain according to the target domain, wherein a preset association relation exists between the target domain and the source domain;

Step S209, similar to step S201 described above, may refer to step S201.

Step S210, solving a set of variables maximizing a learning objective equation based on a predefined learning objective equation; according to the solved variables, determining the similarity of any group of intents in the source field and the target field or the similarity of any group of slots;

Understandably, the higher the similarity of a pair of intents (or slots), the higher the accuracy of mapping the target domain dialog state to the source domain dialog state. Assume that the similarity of any pair of intents or slots between the target domain and the source domain is a probability variable; this probability variable can then be regarded as a solution of a predefined learning objective equation. The learning objective equation measures the performance improvement of a model algorithm in reinforcement learning in a given domain.

Based on the above logic, one specific measure of the performance improvement of a model algorithm in reinforcement learning in a given domain is as follows: in the n-th dialog round, the total future return of all dialog rounds after the n-th round, as estimated by the model algorithm, is added to the single-round return of the n-th round recorded in the actual data. The smaller the error between this sum and the total future return of all dialog rounds after the (n+1)-th round estimated by the model, the better the performance improvement of the model algorithm in reinforcement learning in that domain.

One preferred learning objective equation is:

In the formula, Θ is the set of variable parameters; h_n is the state of a multi-turn dialog at step n; and y_n is the reply that the dialog system returns to the user at step n.

The squared-error term of the formula is the square of the standard loss equation (Bellman equation) of the conventional Q-learning algorithm, and minimizing this component drives the standard loss toward 0. That is, it reduces the error between the future return estimated for the dialog at round n and the future return actually recorded from the dialog at round n+1, with Q^t(h_n, y_n) denoting the future return for state h_n and reply y_n.
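As a point of reference, the squared Bellman error of textbook Q-learning can be sketched as follows. This is not the learning objective equation of this embodiment, whose formula is not reproduced here; the discount factor `gamma` and the maximum over next-step replies are assumptions taken from the standard algorithm.

```python
def squared_td_error(q_current, reward, next_q_values, gamma=1.0):
    """Squared Bellman (temporal-difference) error of standard Q-learning.

    q_current     -- estimate Q(h_n, y_n) for the current state and reply
    reward        -- single-round return recorded at round n
    next_q_values -- estimates Q(h_{n+1}, y') for each candidate next reply y'
    gamma         -- discount factor (assumed; not specified in the text)
    """
    # A terminal state has no candidate replies and therefore no future return.
    best_next = max(next_q_values) if next_q_values else 0.0
    bellman_target = reward + gamma * best_next
    return (bellman_target - q_current) ** 2


loss = squared_td_error(q_current=1.5, reward=1.0, next_q_values=[0.2, 1.0, 0.5], gamma=0.9)
```

Driving this quantity toward 0 makes the estimate at round n consistent with the recorded reward plus the estimate at round n+1.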

R(Θ) is a regularization term that limits the complexity of the model and aligns the intents and slots of the target and source domains. The underlying principle is that reinforcement learning in a given domain assumes the following by default: if two agents take two similar actions in two similar states in two domains, the next states to which they transfer will be similar, and the rewards they obtain during the state transitions will also be similar.

Specifically, R(Θ) = R₁(Θ) + R₂(Θ) + R₃(Θ) + R₄(Θ)

(1) R₁(Θ) = R_1s(Θ) + R_1t(Θ)

R_1s(Θ) and R_1t(Θ) are the slot vector retention regularization terms of the source domain and the target domain, respectively. R₁(Θ) is the slot vector retention regularization term across domains.

In the formula, L_ce(·) represents the cross-entropy loss, and ο represents the cross-entropy loss. a^s represents any source domain intent. D^s represents the dialogs of the source domain (large in number).

c^t(·) is a prediction function that, given an intent vector a^t, predicts a slot vector s^t in the target domain;

can be regarded as the intent vector that approximates, in the target domain, the counterpart of a^s;

is the compatible slot vector predicted in the target domain;

The formula for R_1t(Θ) is analogous to that for R_1s(Θ) and is not described again here.

D^t represents the dialogs of the target domain (fewer in number);

function representing intent translation from source domain to target domain

Representing a function translated from the target domain to the source domain statement.

Where,

is the probability of occurrence of intent a among all intents of the target domain,

is the probability of occurrence of intent a among all intents of the source domain. L_kl(·) is the Kullback-Leibler divergence loss.

Where,

is the probability of mapping a given target domain slot to a given source domain slot, and |S^s| is the number of slots in the source domain.
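The Kullback-Leibler divergence loss named above can be illustrated on two discrete intent distributions. This sketch shows only the generic divergence; how the regularization terms combine it with the mapping probabilities is not reproduced here, and the two example distributions are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is floored at `eps`
    to avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical occurrence probabilities of three intents.
target_intent_probs = [0.7, 0.2, 0.1]
mapped_source_intent_probs = [0.6, 0.3, 0.1]

kl_same = kl_divergence(target_intent_probs, target_intent_probs)
kl_diff = kl_divergence(target_intent_probs, mapped_source_intent_probs)
```

The divergence is 0 when the two distributions coincide and grows as they drift apart, which is why it can serve as an alignment penalty.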

Further, after the learning objective equation is established, a set of variables that maximize the learning objective equation is found based on a preset optimization algorithm.

The optimization algorithm can be configured according to actual needs, for example the Adam method (see Diederik Kingma and Jimmy Ba, "Adam: A method for stochastic optimization", arXiv preprint arXiv:1412.6980, 2014) or a gradient descent algorithm.

The set of variables that maximizes the learning objective equation corresponds to the similarity between any two intents in the source and target domains or the similarity between any two slots.

Step S211, according to the determination result of the similarity, obtaining the source domain intention with the maximum similarity with the target domain intention;

when determining the source domain intention, according to the similarity between any source domain intention and the target domain intention, determining and acquiring the source domain intention with the maximum similarity to the target domain intention.

Step S212, according to the similarity determining result, obtaining a source field slot position with the maximum similarity with the target field slot position;

When the source domain slot is determined, the source domain slot with the maximum similarity to the target domain slot is determined and acquired according to the similarity between any source domain slot and the target domain slot.

And step S213, generating a source field conversation state according to the source field intention and the source field slot position.

Step S213 is the same as step S204 described above, and step S204 may be referred to for specific implementation.
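Steps S211 to S213 can be sketched as an argmax over the solved similarity variables. The probability values below are hypothetical stand-ins for the result of the optimization in step S210, and all names are assumptions made for the example.

```python
SOURCE_INTENTS = ["book_hotel", "query_hotel_location"]
SOURCE_SLOTS = ["check_in_time", "check_out_time", "hotel_location"]

# Hypothetical solved variables: each row is a distribution over source-domain items.
intent_probs = {"book_flight": [0.85, 0.15]}
slot_probs = {
    "departure_city": [0.6, 0.1, 0.3],
    "arrival_city": [0.2, 0.5, 0.3],
}


def argmax(row):
    """Index of the largest entry in a list of similarities."""
    return max(range(len(row)), key=row.__getitem__)


def select_source_state(target_intent, target_slots):
    """Pick the most similar source intent (S211) and slots (S212), then combine them (S213)."""
    source_intent = SOURCE_INTENTS[argmax(intent_probs[target_intent])]
    source_slots = [SOURCE_SLOTS[argmax(slot_probs[slot])] for slot in target_slots]
    return {"intent": source_intent, "slots": source_slots}


state = select_source_state("book_flight", ["departure_city", "arrival_city"])
```

Slot values would still need to be filled in as in step S204; the sketch stops at selecting the intent and slot names.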

In this embodiment, a learning objective equation is established first, and then a set of variables maximizing the learning objective equation is searched based on a preset optimization algorithm, so as to determine the similarity between any one set of intentions in the source field and the target field or the similarity between any one set of slots; and then determining a source domain intention with the maximum similarity to the target domain intention and a source domain slot with the maximum similarity to the target domain slot, and mapping the target domain dialogue state into a source domain dialogue state according to the determined source domain intention and the source domain slot so as to facilitate the generation of subsequent source domain reply actions.

In this embodiment, based on a learning objective equation and an optimization algorithm of reinforcement learning, a set of variables that maximizes the learning objective equation is found, so as to determine the similarity between a source field and any one set of intents in a target field or the similarity between any one set of slots. The embodiment combines the advantages of reinforcement learning and cross-domain migration application, determines the similarity of any group of intents/slot positions through the representation of the performance improvement effect of reinforcement learning of a certain model algorithm in a certain domain, effectively improves the accuracy and effectiveness of intention/slot position matching, and has strong cross-domain migration generalization capability and reliability.

In addition, the invention also provides a migration device of the cross-domain conversation strategy.

As shown in fig. 7, fig. 7 is a schematic composition diagram of each functional unit of the device. Wherein the apparatus comprises:

a target domain dialog state mapping unit 10, configured to process an input user input dialog to map a corresponding target domain dialog state;

The migration device of the cross-domain dialog strategy is particularly suitable for task-oriented dialog systems. The purpose of a task-oriented dialog system is to help a user complete a task, such as booking a hotel or purchasing an air ticket, by recognizing the user's intent. In a specific implementation, the user input dialog may be dialog information generated from text or voice that the user enters when using the human-computer interaction system. For example, when the user needs to book an air ticket, the user may enter "I want to book an air ticket from Shanghai to Beijing" on the human-computer interaction system (a ticket booking platform); after the system detects the user's input, the corresponding user input dialog is extracted.

The target domain is the domain most closely associated with the user input dialog, and its specific type may be set manually by the user. For example, the user selects the target domain, such as the "air ticket booking" domain, before or after entering information. Alternatively, the target domain is obtained by analyzing the user input dialog. For example, based on the user input dialog "I want to book an air ticket from Shanghai to Beijing", the target domain dialog state mapping unit 10 determines that the target domain is "air ticket booking" or "ticket booking". In addition, the target domain may also be another task-oriented scenario such as data usage inquiry, phone bill inquiry, meal ordering, or consultation.

In a specific implementation, the target domain dialog state mapping unit 10 is specifically configured to: performing natural language understanding on an input user input conversation to identify a target field intention and extract a target field slot;

The user input dialog is natural language. A natural language understanding module (or unit) performs natural language understanding on it to carry out target domain recognition, user intent recognition, and slot extraction. Target domain recognition identifies the task-oriented scenario to which the user input dialog belongs. User intent recognition identifies the user's intent, which subdivides the task-oriented scenario into sub-scenarios. Slot extraction extracts the slots and their slot values from the user input dialog, and may be implemented by slot filling. The specific techniques for performing natural language understanding on the user input dialog, recognizing the target domain intent, and extracting the target domain slots are conventional prior art and are not described here.

The target field dialogue state mapping unit 10 is further configured to track the target field intention;

dialog state tracking is a core component that ensures the robustness of a dialog system. The method estimates the target of the user in each turn of the conversation, manages the input and the conversation history of each turn, and outputs the current conversation state. This typical state structure is often referred to as slot filling or semantic framework. Conventional methods have found widespread use in most commercial implementations, and manual rules are typically employed to select the most likely output.

And the target field dialog state mapping unit 10 is further configured to map the user input dialog according to the target field intention, the target field slot, and the tracking result of the target field intention, so as to obtain a corresponding target field dialog state.

For example, the target domain dialog state mapping unit 10 performs word segmentation and stem extraction on the user input dialog "I want to book an air ticket from Shanghai to Beijing", and then generates the semantic slots corresponding to the user input dialog. Semantic slots can be predefined for different scenarios. From the semantic slots, the intent of the user dialog, the slots, and the slot values are determined:

Intent: book an air ticket

Slot 1 is departure city, and the corresponding slot value is Shanghai

Slot 2 is arrival city, and the corresponding slot value is Beijing
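The result above can be reproduced with a toy rule-based parser. This is only a stand-in for the natural language understanding step, which the text treats as conventional prior art; the keyword rule, the regular expression, and the intent and slot names are all assumptions made for the example.

```python
import re


def parse_user_input(utterance):
    """Toy intent recognition and slot extraction for one fixed sentence pattern."""
    text = utterance.lower()
    # Assumed keyword rule: any mention of a ticket means the flight-booking intent.
    intent = "book_flight" if "ticket" in text else None
    slots = {}
    match = re.search(r"from (\w+) to (\w+)", text)
    if match:
        slots["departure_city"] = match.group(1).capitalize()
        slots["arrival_city"] = match.group(2).capitalize()
    return {"intent": intent, "slots": slots}


state = parse_user_input("I want to book an air ticket from Shanghai to Beijing")
```

A production system would replace both rules with trained intent and slot-filling models; the output structure is the part that matters for the mapping units that follow.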

A source domain dialog state mapping unit 20, configured to map the target domain dialog state into a source domain dialog state;

Specifically, the source domain dialog state mapping unit 20 determines the corresponding source domain according to the target domain; the source domain is a domain specified in advance for the target domain. The designation is appropriate when the similarity between the target domain and the source domain is high. For example, the source domain specified in advance for the "air ticket booking" domain is the "hotel booking" domain.

Then, the source domain dialog state mapping unit 20 acquires the source domain intent that has the greatest similarity to the target domain intent in the target domain dialog state. For example, the target domain intent "book an air ticket" has different similarities with different intents (such as "book a hotel" and "query property location") in the source domain (hotel booking); this embodiment selects the source domain intent with the maximum similarity.

In addition, the source domain dialog state mapping unit 20 acquires the source domain slot that has the greatest similarity to the target domain slot in the target domain dialog state. For example, the target domain slot "departure city" in the target domain (air ticket booking) has different similarities with different slots (such as "check-in time", "number of guests", and "property location") in the source domain (hotel booking); this embodiment selects the source domain slot with the maximum similarity.

Please refer to other embodiments below for specific implementation of the source domain dialog state mapping unit 20.

A source domain dialog state processing unit 30, configured to process a source domain dialog state based on a preset dialog policy of a source domain, to obtain a corresponding source domain dialog reply;

Specifically, the source domain dialog state processing unit 30 retrieves a preset dialog policy of the source domain, and processes the source domain dialog state according to the preset dialog policy, so as to obtain a corresponding source domain dialog reply.

A target domain dialog reply mapping unit 40, configured to map the source domain dialog reply into a target domain dialog reply.

After the source domain dialog reply is obtained, the target domain dialog reply mapping unit 40 maps it to obtain the target domain dialog reply. For example, the abstract source domain dialog reply {intent: query, price: ?} of the source domain (hotel booking) is mapped to the abstract target domain dialog reply {intent: query, price: ?}, where "?" indicates that the reply asking about the price takes the form of a question.

Furthermore, the migration device of the cross-domain dialog strategy further comprises a natural language reply unit, which organizes the target domain dialog reply into natural language and returns it to the user so that the user can understand it. For example, the abstract target domain dialog reply {intent: query, price: ?} of the target domain (air ticket booking) is organized into the natural language "What price of air ticket would you like?".

In the migration device of the cross-domain dialog strategy, the target domain dialog state is mapped to a source domain dialog state; the source domain dialog state is processed based on the existing preset dialog strategy of the source domain to obtain the corresponding source domain dialog reply; and the source domain dialog reply is mapped to a target domain dialog reply, thereby migrating the dialog strategy of the source domain to the target domain. In this way, the ample training data of the source domain and its higher-performing dialog strategy are fully used: no large body of training data needs to be prepared for the target domain, and a target domain dialog reply corresponding to the user input dialog can be generated without training a dialog strategy for the target domain. This reduces the amount of manually labeled data required and the cost of data acquisition, avoids large amounts of repeated labeling, reduces the waste of data resources, and broadens the range of application scenarios of each domain.
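The flow through units 20, 30, and 40 and the natural language reply unit can be sketched end to end. Everything below is hypothetical: the slot tables, the stand-in source policy (ask for the first unfilled slot), and the reply template are assumptions for illustration, and the target domain dialog state is taken as the already-computed output of unit 10.

```python
# All names and tables below are hypothetical and for illustration only.
SLOT_TARGET_TO_SOURCE = {
    "departure_city": "check_in_time",
    "arrival_city": "check_out_time",
    "price": "price",
}
SLOT_SOURCE_TO_TARGET = {src: tgt for tgt, src in SLOT_TARGET_TO_SOURCE.items()}


def to_source_state(target_state):
    """Unit 20: map a target-domain dialog state to a source-domain dialog state."""
    filled = {SLOT_TARGET_TO_SOURCE[slot] for slot in target_state["slots"]}
    return {"intent": "book_hotel", "filled_slots": filled}


def source_policy(source_state):
    """Unit 30: stand-in for the preset source-domain policy; asks for the first unfilled slot."""
    for slot in ("check_in_time", "check_out_time", "price"):
        if slot not in source_state["filled_slots"]:
            return {"intent": "query", "slot": slot}
    return {"intent": "confirm", "slot": None}


def to_target_reply(source_reply):
    """Unit 40: map the abstract source-domain reply back to the target domain."""
    slot = source_reply["slot"]
    return {"intent": source_reply["intent"], "slot": SLOT_SOURCE_TO_TARGET.get(slot)}


def to_natural_language(target_reply):
    """Natural language reply unit: template-based rendering of the abstract reply."""
    templates = {("query", "price"): "What price of air ticket would you like?"}
    return templates.get((target_reply["intent"], target_reply["slot"]), "Could you tell me more?")


target_state = {
    "intent": "book_flight",
    "slots": {"departure_city": "Shanghai", "arrival_city": "Beijing"},
}
reply = to_target_reply(source_policy(to_source_state(target_state)))
text = to_natural_language(reply)
```

The point of the design is that `source_policy` is never retrained: only the two mapping functions on either side of it are specific to the target domain.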

Further, on the basis of the migration device of the cross-domain dialog strategy described above, in a specific implementation, the source domain dialog state mapping unit 20 is specifically configured to determine a source domain according to the target domain, wherein a preset association relation exists between the target domain and the source domain;

The source domain is a domain specified in advance for the target domain; for example, the source domain of target domain A is specified as domain B, and the source domain of target domain C is specified as domain D. This specified relation is the preset association relation between the target domain and the source domain. After the target domain is determined, the source domain is determined according to the target domain and its preset association relation.

The source domain dialog state mapping unit 20 is further configured to obtain the source domain intent with the maximum preset similarity to the target domain intent;

Specifically, the similarity between any intent of the target domain and any intent of the source domain is specified manually in advance. When the source domain intent is determined, the source domain intent with the maximum preset similarity to the target domain intent is determined and acquired according to the preset similarities corresponding to the target domain intent.

The source domain dialog state mapping unit 20 is further configured to obtain the source domain slot with the maximum preset similarity to the target domain slot;

The source domain dialog state mapping unit 20 is further configured to generate a source domain dialog state according to the source domain intent and the source domain slot.

After the source domain intent and the source domain slot are obtained, the source domain dialog state is generated. The source domain dialog state may specifically be a combination of the source domain intent and a set of source domain slots and their slot values. The slot value of a source domain slot may be default slot information set manually, or may be obtained according to a preset rule. For example, if the source domain is hotel booking and the source domain slot is check-in time, the corresponding slot value is set to the current date; if the source domain slot is check-out time, the corresponding slot value is set to the date of the next day.

For example, the target domain is "air ticket booking", and the target domain dialog state is {intent: book an air ticket, departure city: Shanghai, arrival city: Beijing}. The source domain intent with the maximum similarity to the target domain intent is obtained as "hotel booking"; the source domain slots with the maximum similarity to the target domain slots "departure city" and "arrival city" are obtained as "check-in time" and "check-out time"; and the slot values corresponding to these source domain slots are then obtained to generate the source domain dialog state. Thus, the target domain dialog state {intent: book an air ticket, departure city: Shanghai, arrival city: Beijing} is mapped to the source domain dialog state {intent: book a hotel, check-in time: October 1, 2018, check-out time: October 2, 2018}.

In this embodiment, the source domain dialog state mapping unit 20 determines the source domain intent corresponding to the target domain intent and the source domain slot corresponding to the target domain slot based on manually specified similarities between any intent of the target domain and any intent of the source domain, and between any slot of the target domain and any slot of the source domain, so as to map the target domain dialog state to the source domain dialog state. Manually specifying the similarity is easy to implement and maintain.

specific implementations may refer to the above embodiments.

The source domain dialog state mapping unit 20 is further configured to obtain the source domain slot corresponding to the target domain slot, wherein the slots of the target domain and the slots of the source domain are ranked by importance in advance, and a correspondence between the target domain slots and the source domain slots is established according to the ranking results;

Specifically, based on information entropy theory, the importance of a slot in a given domain is measured by the attribute entropy of the slot. The attribute entropy of a slot is an entropy obtained after normalization, and a preferred calculation formula is as follows:

$$\eta(s) = -\frac{1}{|V_s|} \sum_{\upsilon \in V_s} p(s=\upsilon)\,\log p(s=\upsilon)$$

For example, the table below shows "whether children are allowed" and "price level" for different restaurants in the restaurant database.

The attribute entropy of the slot "whether children are allowed" is η(s) = −[p(s = allowed) × log p(s = allowed)]/2 − [p(s = not allowed) × log p(s = not allowed)]/2, where p(s = allowed) = 0/4, p(s = not allowed) = 4/4, and |V_s| = 2. With the convention 0 × log 0 = 0, the attribute entropy of "whether children are allowed" is 0.

The attribute entropy of a slot calculated by the above formula is non-negative. The lower the entropy, the lower the information gain of the attribute and the lower its importance; correspondingly, the lower the importance of the slot. In the above example, it makes no practical sense for the system to ask the user about the "children allowed" preference, because none of the restaurants in the database allows children, so such a question provides no information gain. Accordingly, the entropy of the "price level" attribute is higher than that of the "whether children are allowed" attribute.

After the attribute entropies of the different slots are calculated, the importance of the slots is compared by comparing their attribute entropy values, and the slots are then ranked by importance, yielding the slot importance ranking of a given domain. For any two domains, a correspondence is established between the slots that occupy the same position in the two domains' respective importance rankings. In this embodiment, the comparison and ranking of slot importance in the target and source domains, and the establishment of the slot correspondence, are performed in advance.

Specific implementations may refer to the above embodiments.

In this embodiment, the source domain dialog state mapping unit 20 establishes a correspondence between the target domain slots and the source domain slots based on their importance rankings, and obtains the source domain slot corresponding to a target domain slot from this correspondence. The target domain dialog state is then mapped to the source domain dialog state based on the source domain intent corresponding to the target domain intent and on the source domain slot. Because the slot correspondence between the two domains is built from the importance ranking, the slot importance data is fully used, which helps improve the accuracy, effectiveness, and reliability of slot matching.

Specific implementations may refer to the above embodiments.

The source domain dialog state mapping unit 20 is further configured to solve, based on a predefined learning objective equation, a set of variables that maximizes the learning objective equation, and to determine, according to the solved variables, the similarity of any pair of intents or any pair of slots between the source domain and the target domain;

Based on the above logic, one specific measure of the performance improvement of a model algorithm in reinforcement learning in a given domain is as follows: in the n-th dialog round, the total future return of all dialog rounds after the n-th round, as estimated by the model algorithm, is added to the single-round return of the n-th round recorded in the actual data. The smaller the error between this sum and the total future return of all dialog rounds after the (n+1)-th round estimated by the model, the better the performance improvement of the model algorithm in reinforcement learning in that domain.

One preferred learning objective equation is:

R(Θ) is a regularization term that limits the complexity of the model and aligns the slots and intents of the target domain and the source domain. The underlying principle is that reinforcement learning in a given domain assumes the following logic by default: if two agents in two domains respectively perform two similar groups of actions in two similar states, the next states to which the two agents respectively transition will be similar, and the rewards the two agents obtain during those state transitions will also be similar.

Specifically, R(Θ) = R₁(Θ) + R₂(Θ) + R₃(Θ) + R₄(Θ)

(1) R₁(Θ) = R_1s(Θ) + R_1t(Θ)

R_1s(Θ) and R_1t(Θ) respectively denote the slot vector retention regularization terms of the source domain and the target domain. R₁(Θ) denotes the cross-domain slot vector retention regularization term.

In the formula, L_ce(·) represents a cross entropy loss, and ο represents the cross entropy loss. a^s represents any source domain intent. D^s represents the dialogs of the source domain (large in number).

c^t(·) is a predictor function that, given the intent vector a^t in the target domain, predicts a slot vector s^t in the target domain;

can be regarded as an approximate language action vector of the target domain, serving as the reply to a^s;

is the slot vector predicted in the target domain;

D^t represents the dialogs of the target domain (fewer in number);

represents the intent translation function from the source domain to the target domain.

Wherein,

is the probability of occurrence of the intent a among all intents of the target domain,

is the probability of occurrence of the intent among all intents of the source domain. L_kl(·) is the Kullback-Leibler divergence loss.
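For reference, the Kullback-Leibler divergence between two discrete intent distributions can be computed as in the sketch below. The function name, the `eps` guard, the example probabilities, and the choice of which distribution is passed as `p` and which as `q` are assumptions; the exact arguments of L_kl(·) are given only by the formula itself.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same set of intents."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with p_i == 0 contribute nothing
            # eps guards against division by zero when q_i is 0
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Hypothetical distributions over three intents.
target_intent_probs = [0.5, 0.3, 0.2]
translated_source_probs = [0.4, 0.4, 0.2]
loss = kl_divergence(target_intent_probs, translated_source_probs)
```

The loss is zero when the two distributions coincide and grows as they diverge, so minimizing it pushes the two intent distributions toward each other.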

Wherein,

is the probability of mapping the target slot to the reference slot; |S^s| is the number of slots in the source domain.

The optimization algorithm can be configured according to actual needs, for example the Adam method (see Diederik Kingma and Jimmy Ba, "Adam: A method for stochastic optimization", arXiv preprint arXiv:1412.6980, 2014) or a gradient descent algorithm.
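As an illustration of such an optimizer, the sketch below applies the Adam update rule in its ascent form, since the learning objective is maximized. The toy concave objective, the function names, and the hyperparameter values are assumptions chosen for demonstration only; they are not the learning objective equation or the settings of this embodiment.

```python
import math
from typing import Callable, List


def adam_maximize(
    grad_fn: Callable[[List[float]], List[float]],
    theta: List[float],
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    steps: int = 500,
) -> List[float]:
    """Gradient ascent with Adam moment estimates (Kingma and Ba, 2014)."""
    theta = list(theta)
    m = [0.0] * len(theta)  # first-moment (mean) estimate of the gradient
    v = [0.0] * len(theta)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        grad = grad_fn(theta)
        for i, g in enumerate(grad):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            # "+=" because the objective is maximized rather than minimized
            theta[i] += lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def toy_gradient(theta: List[float]) -> List[float]:
    # Gradient of the toy objective f(x, y) = -(x - 3)^2 - (y + 1)^2,
    # whose maximum lies at (3, -1).
    return [-2.0 * (theta[0] - 3.0), -2.0 * (theta[1] + 1.0)]


solution = adam_maximize(toy_gradient, [0.0, 0.0])
```

Replacing the update line with `theta[i] += lr * g` gives plain gradient ascent, the other option mentioned above.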

The target domain dialog state mapping unit 20 is further configured to obtain, according to the similarity determination result, the source domain intent with the greatest similarity to the target domain intent;

The target domain dialog state mapping unit 20 is further configured to obtain, according to the similarity determination result, the source domain slot with the greatest similarity to the target domain slot;

Specific implementations may refer to the above embodiments.

In this embodiment, a learning objective equation is established first, and then a set of variables maximizing the learning objective equation is found based on a preset optimization algorithm, so as to determine the similarity between the source field and any set of intentions in the target field or the similarity between any set of slots; and then determining a source field intention with the maximum similarity to the target field intention and a source field slot position with the maximum similarity to the target field slot position, and mapping the target field conversation state into a source field conversation state according to the determined source field intention and the source field slot position so as to generate a subsequent source field reply action.
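Once the similarities have been determined, the selection and mapping steps summarized above can be sketched as follows. The dictionary layout of the dialog state, the nested similarity tables, and all names and values are hypothetical and serve only to illustrate choosing the most similar source domain intent and slots.

```python
from typing import Dict


def most_similar(similarity_row: Dict[str, float]) -> str:
    """Return the source domain item with the greatest similarity score."""
    return max(similarity_row, key=similarity_row.get)


def map_state_to_source(
    target_state: dict,
    intent_similarity: Dict[str, Dict[str, float]],
    slot_similarity: Dict[str, Dict[str, float]],
) -> dict:
    """Map a target domain dialog state (intent plus slot values) to the source domain."""
    source_intent = most_similar(intent_similarity[target_state["intent"]])
    # Each target slot keeps its value but is renamed to its most similar source slot.
    source_slots = {
        most_similar(slot_similarity[slot]): value
        for slot, value in target_state["slots"].items()
    }
    return {"intent": source_intent, "slots": source_slots}


# Hypothetical similarities between a hotel (target) and a flight (source) domain.
intent_similarity = {"book_hotel": {"book_flight": 0.8, "cancel_flight": 0.1}}
slot_similarity = {
    "city": {"destination": 0.7, "departure_date": 0.2},
    "check_in_date": {"destination": 0.1, "departure_date": 0.9},
}
target_state = {
    "intent": "book_hotel",
    "slots": {"city": "Shenzhen", "check_in_date": "2019-01-05"},
}
source_state = map_state_to_source(target_state, intent_similarity, slot_similarity)
```

This sketch does not handle the case where two target slots are most similar to the same source slot; a real system would need a rule for resolving such collisions.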

In this embodiment, the target domain dialogue state mapping unit 20 finds a set of variables maximizing a learning objective equation based on the learning objective equation and the optimization algorithm of reinforcement learning, so as to determine the similarity between the source domain and any group of intents in the target domain or the similarity between any group of slots. The embodiment combines the advantages of reinforcement learning and cross-domain migration application, determines the similarity of any group of intents/slot positions through the representation of the performance improvement effect of reinforcement learning of a certain model algorithm in a certain domain, effectively improves the accuracy and effectiveness of intention/slot position matching, and has strong cross-domain migration generalization capability and reliability.

In addition, the present invention also provides a migration device for a cross-domain dialogue policy, which comprises: a memory, a processor, and a migration program of the cross-domain dialogue policy that is stored in the memory and executable on the processor, wherein the migration program of the cross-domain dialogue policy, when executed by the processor, implements the steps of the migration method of the cross-domain dialogue policy described in any one of the above items.

Fig. 8 is a schematic structural diagram of a migration device for a cross-domain dialogue policy according to an embodiment of the present invention.

The migration device of the cross-domain conversation strategy in the embodiment of the invention can be a PC or a server.

As shown in fig. 8, the device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the terminal structure shown in fig. 8 does not constitute a limitation on the device, which may include more or fewer components than those shown, combine some components, or arrange the components differently.

As shown in fig. 8, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a migration program of a cross-domain conversation policy.

In the device shown in fig. 8, the network interface 1004 is mainly used for connecting a backend server and communicating data with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a migration program of the cross-domain dialog policy stored in the memory 1005, and perform the operations in the migration method embodiment of the cross-domain dialog policy described above.

Based on the above hardware structure, the embodiments of the migration method of the cross-domain dialogue policy are provided.

In addition, the invention also provides a readable storage medium.

The storage medium stores a migration program of the cross-domain dialogue strategy, and the migration program of the cross-domain dialogue strategy realizes the steps of the migration method of the cross-domain dialogue strategy as described in any one of the above items when being executed by a processor.

The specific embodiments of the migration device and the storage medium of the cross-domain dialogue policy of the present invention are substantially the same as the embodiments of the migration method of the cross-domain dialogue policy, and are not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element identified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The above serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

While the present invention has been described with reference to the particular illustrative embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but is intended to cover various modifications, equivalent arrangements, and equivalents thereof, which may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A migration method of cross-domain dialogue strategy, characterized in that the method comprises the following steps:

processing the input user input dialog to map out a corresponding target field dialog state, wherein the target field dialog state is a combination of an intention and a group of slot positions and slot position values thereof;

mapping the target domain dialog state to a source domain dialog state;

processing the source field dialogue state based on a preset dialogue strategy of a source field to obtain a corresponding source field dialogue reply, wherein the source field is a field which is specified in advance for a target field, and the target field comprises a field with the highest input user input dialogue association degree;

and mapping the source domain dialog reply to a target domain dialog reply.

2. The migration method of a cross-domain dialog policy according to claim 1, wherein the step of processing the input user input dialog to map out a corresponding target domain dialog state specifically comprises:

Tracking the target field intention;

3. The migration method of a cross-domain dialog policy according to claim 2, wherein the step of mapping the target domain dialog state to a source domain dialog state specifically comprises:

4. The migration method of a cross-domain dialog policy according to claim 2, wherein the step of mapping the target domain dialog state to the source domain dialog state specifically comprises:

5. The migration method of a cross-domain dialog policy according to claim 2, wherein the step of mapping the target domain dialog state to the source domain dialog state in the source domain specifically comprises:

solving a set of variables that maximizes a learning objective equation based on a predefined learning objective equation; according to the solved variables, determining the similarity of any group of intents in the source field and the target field or the similarity of any group of slots;

according to the result of similarity learning, obtaining a source field intention with the maximum similarity with the target field intention;

6. An apparatus for migrating a cross-domain conversation policy, the apparatus comprising:

the target field dialogue state mapping unit is used for processing the input user input dialogue to map out a corresponding target field dialogue state, wherein the target field dialogue state is the combination of an intention and a group of slot positions and slot position values thereof;

the source field dialogue state processing unit is used for processing the source field dialogue state based on a preset dialogue strategy of a source field to obtain a corresponding source field dialogue reply, wherein the source field is a field which is specified in advance for a target field, and the target field comprises a field with the highest input user input dialogue association degree;

7. The migration apparatus of a cross-domain dialog strategy according to claim 6, wherein the target domain dialog state mapping unit is specifically configured to perform natural language understanding on the input user input dialog to identify a target domain intention and extract a target domain slot; tracking the target field intention; and mapping the user input dialog according to the target field intention, the target field slot position and the tracking result of the target field intention so as to obtain a corresponding target field dialog state.

8. The migration apparatus of a cross-domain dialog policy according to claim 7, wherein the source domain dialog state mapping unit is specifically configured to determine the source domain according to the target domain; the method comprises the following steps that a preset incidence relation exists between a target field and a source field; acquiring a source domain intention with the maximum preset similarity to the target domain intention; acquiring a source field slot position with the maximum preset similarity with the target field slot position; and generating a source field conversation state according to the source field intention and the source field slot position.

9. The migration apparatus of a cross-domain dialog policy according to claim 7, wherein the source domain dialog state mapping unit is specifically configured to:

acquiring a source field slot position which establishes a corresponding relation with the target field slot position; respectively sequencing the importance of the slot positions in the target field and the slot positions in the source field, and establishing a corresponding relation between the slot positions in the target field and the slot positions in the source field according to a sequencing result;

10. The migration apparatus of a cross-domain dialog policy according to claim 7, wherein the source domain dialog state mapping unit is specifically configured to: determining a source field according to a target field; the method comprises the following steps that a preset incidence relation exists between a target field and a source field; solving a set of variables that maximizes a learning objective equation based on a predefined learning objective equation; according to the solved variables, determining the similarity of any group of intents in the source field and the target field or the similarity of any group of slots; according to the result of similarity learning, obtaining a source field intention with the maximum similarity with the target field intention; according to the similarity determination result, obtaining a source field slot position with the maximum similarity with the target field slot position; and generating a source field conversation state according to the source field intention and the source field slot position.

11. A migration device of a cross-domain dialogue policy is characterized by comprising: a memory, a processor and a migration program of cross-domain dialogue policies stored on the memory and executable on the processor, the migration program of cross-domain dialogue policies implementing the steps of the migration method of cross-domain dialogue policies according to any one of claims 1 to 5 when executed by the processor.

12. A readable storage medium, characterized in that the readable storage medium has stored thereon a migration program of a cross-domain dialogue policy, which when executed by a processor implements the steps of the migration method of the cross-domain dialogue policy according to any one of claims 1 to 5.