CN109739965B - Method, device and equipment for migrating cross-domain conversation strategy and readable storage medium - Google Patents


Info

Publication number
CN109739965B
Authority
CN
China
Prior art keywords
field
domain
source
target
intention
Prior art date
Legal status
Active
Application number
CN201811641823.7A
Other languages
Chinese (zh)
Other versions
CN109739965A (en)
Inventor
莫凯翔 (Mo Kaixiang)
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201811641823.7A
Publication of CN109739965A
Application granted
Publication of CN109739965B
Legal status: Active (current)
Anticipated expiration


Abstract

The invention provides a method for migrating a cross-domain dialog policy, comprising the following steps: processing an input user dialog to map it to a corresponding target-domain dialog state; mapping the target-domain dialog state to a source-domain dialog state; processing the source-domain dialog state with a preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply; and mapping the source-domain dialog reply to a target-domain dialog reply. The invention also provides a corresponding migration apparatus, device and readable storage medium. The invention addresses the following technical problems of conventionally constructed dialog systems: they are difficult to maintain, manually labeled data is costly, the same data is labeled repeatedly, and labeled data is difficult to reuse across domains.

Description

Method, device and equipment for migrating cross-domain conversation strategy and readable storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to a method, apparatus, device and readable storage medium for migrating a cross-domain dialog policy.
Background
Dialog systems are an important component of human-computer interaction. Conventionally constructed dialog systems fall mainly into three categories: rule-based dialog systems, dialog systems based on supervised learning, and dialog systems based on reinforcement learning.
Rule-based dialog systems appeared earliest and are relatively easy for humans to understand and control. Their drawback is that developers must enumerate every case in advance and write a rule for each one. When the real-world scenario is complex and many rules accumulate, the rules easily conflict with one another, making the system hard to maintain. Such systems therefore scale poorly to large dialog systems.
Dialog systems based on supervised learning and on reinforcement learning are obtained by training models on data: developers need not write rules for every case in advance, but only collect annotated data and use it to train the models. The biggest drawback of both, however, is that they require large-scale annotated data. Because real application scenarios are numerous, it is clearly impractical to collect enough annotated data for every dialog scenario, mainly for the following reasons:
1. Manually labeling data is costly.
2. Different scenarios may contain a large amount of repeated labeling, which wastes resources. For example, the same categories of user need (called "intents" in this disclosure), such as "inform" and "request", occur when buying coffee, booking air tickets, booking hotels and so on, as does the same task information (called "slots" in this disclosure), such as "location" and "time".
3. It is difficult to use data from one domain directly to train a model for another domain. First, different companies may label the same or similar intents and slots with different names; second, different domains genuinely contain different intents and slots.
The above is provided only to assist in understanding the technical solution of the present invention and does not constitute an admission that it is prior art.
Disclosure of Invention
The main object of the invention is to provide a method, apparatus, device and readable storage medium for migrating a cross-domain dialog policy, so as to solve the technical problems that conventionally constructed dialog systems are difficult to maintain, manually labeled data is costly, the same data is labeled repeatedly, and labeled data is difficult to reuse across domains.
To achieve the above object, the present invention provides a method for migrating a cross-domain dialog policy, comprising the following steps:
processing an input user dialog to map it to a corresponding target-domain dialog state;
mapping the target-domain dialog state to a source-domain dialog state;
processing the source-domain dialog state based on a preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply;
and mapping the source-domain dialog reply to a target-domain dialog reply.
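The four steps above compose into a single pipeline in which only the source domain needs a trained policy. The following is a minimal Python sketch under stated assumptions: the `(intent, {slot: value})` representation, the function name `migrate_policy`, and the toy lambdas standing in for each component are hypothetical illustrations, not the patent's implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation: a dialog state or an abstract reply is
# an (intent, {slot: value}) pair.
State = Tuple[str, Dict[str, str]]


def migrate_policy(
    utterance: str,
    understand: Callable[[str], State],       # step 1: utterance -> target-domain state
    to_source: Callable[[State], State],      # step 2: target state -> source state
    source_policy: Callable[[State], State],  # step 3: source policy -> source reply
    to_target: Callable[[State], State],      # step 4: source reply -> target reply
) -> State:
    """Chain the four steps; no target-domain policy is ever trained."""
    target_state = understand(utterance)
    source_state = to_source(target_state)
    source_reply = source_policy(source_state)
    return to_target(source_reply)


# Toy stand-ins for each component, for illustration only.
reply = migrate_policy(
    "I want to book an air ticket from Shanghai to Beijing",
    understand=lambda u: ("book_air_ticket",
                          {"departure_city": "Shanghai", "arrival_city": "Beijing"}),
    to_source=lambda s: ("book_hotel",
                         {"check_in": "2018-10-01", "check_out": "2018-10-02"}),
    source_policy=lambda s: ("query", {"price": "?"}),
    to_target=lambda r: r,  # here the abstract reply carries over unchanged
)
print(reply)  # ('query', {'price': '?'})
```

Passing the components in as callables keeps the sketch agnostic about how each one is built (rules, trained models, or manual tables).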
Preferably, the step of processing the input user dialog to map it to a corresponding target-domain dialog state specifically includes:
performing natural language understanding on the input user dialog to identify a target-domain intent and extract target-domain slots;
tracking the target-domain intent;
and mapping the user dialog, according to the target-domain intent, the target-domain slots and the tracking result of the target-domain intent, to obtain the corresponding target-domain dialog state.
Preferably, the step of mapping the target-domain dialog state to a source-domain dialog state specifically includes:
determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
acquiring the source-domain intent with the highest preset similarity to the target-domain intent;
acquiring the source-domain slot with the highest preset similarity to each target-domain slot;
and generating the source-domain dialog state from the source-domain intent and the source-domain slots.
Preferably, the step of mapping the target-domain dialog state to a source-domain dialog state specifically includes:
determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
acquiring the source-domain intent with the highest preset similarity to the target-domain intent;
acquiring the source-domain slot that corresponds to each target-domain slot, wherein the slots of the target domain and the slots of the source domain are each ranked by importance in advance, and the correspondence between target-domain slots and source-domain slots is established according to the ranking results;
and generating the source-domain dialog state from the source-domain intent and the source-domain slots.
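The importance-ranking correspondence in this variant can be sketched as pairing slots by rank: the most important target slot with the most important source slot, and so on. In this minimal Python sketch, the function `align_slots_by_rank` and the importance scores are hypothetical; how importance is measured is left open here (the third embodiment proposes attribute entropy).

```python
from typing import Dict


def align_slots_by_rank(
    target_importance: Dict[str, float],
    source_importance: Dict[str, float],
) -> Dict[str, str]:
    """Pair the k-th most important target slot with the k-th most important source slot."""
    target_ranked = sorted(target_importance, key=target_importance.get, reverse=True)
    source_ranked = sorted(source_importance, key=source_importance.get, reverse=True)
    # zip stops at the shorter list, so surplus slots stay unmapped.
    return dict(zip(target_ranked, source_ranked))


# Hypothetical importance scores for the air-ticket and hotel domains.
mapping = align_slots_by_rank(
    {"departure_city": 0.9, "arrival_city": 0.7},
    {"check_in": 0.8, "check_out": 0.6, "room_type": 0.3},
)
print(mapping)  # {'departure_city': 'check_in', 'arrival_city': 'check_out'}
```

Leaving surplus slots unmapped is one possible design choice; the text does not say how domains with different numbers of slots are handled.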
Preferably, the step of mapping the target-domain dialog state to a source-domain dialog state specifically includes:
determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
solving, based on a predefined learning objective, for the set of variables that maximizes that objective, and determining from the solved variables the similarity between any pair of intents, or any pair of slots, in the source domain and the target domain;
acquiring, according to the learned similarities, the source-domain intent with the highest similarity to the target-domain intent;
acquiring, according to the learned similarities, the source-domain slot with the highest similarity to each target-domain slot;
and generating the source-domain dialog state from the source-domain intent and the source-domain slots.
In addition, to achieve the above object, the present invention further provides an apparatus for migrating a cross-domain dialog policy, the apparatus including:
a target-domain dialog state mapping unit, configured to process an input user dialog to map it to a corresponding target-domain dialog state;
a source-domain dialog state mapping unit, configured to map the target-domain dialog state to a source-domain dialog state;
a source-domain dialog state processing unit, configured to process the source-domain dialog state based on a preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply;
and a target-domain dialog reply mapping unit, configured to map the source-domain dialog reply to a target-domain dialog reply.
Preferably, the target-domain dialog state mapping unit is specifically configured to: perform natural language understanding on the input user dialog to identify a target-domain intent and extract target-domain slots; track the target-domain intent; and map the user dialog, according to the target-domain intent, the target-domain slots and the tracking result of the target-domain intent, to obtain the corresponding target-domain dialog state.
Preferably, the source-domain dialog state mapping unit is specifically configured to: determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain; acquire the source-domain intent with the highest preset similarity to the target-domain intent; acquire the source-domain slot with the highest preset similarity to each target-domain slot; and generate the source-domain dialog state from the source-domain intent and the source-domain slots.
Preferably, the source-domain dialog state mapping unit is specifically configured to:
determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
acquire the source-domain intent with the highest preset similarity to the target-domain intent;
acquire the source-domain slot that corresponds to each target-domain slot, wherein the slots of the target domain and the slots of the source domain are each ranked by importance, and the correspondence between them is established according to the ranking results;
and generate the source-domain dialog state from the source-domain intent and the source-domain slots.
Preferably, the source-domain dialog state mapping unit is specifically configured to: determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain; solve, based on a predefined learning objective, for the set of variables that maximizes that objective, and determine from the solved variables the similarity between any pair of intents, or any pair of slots, in the source domain and the target domain; acquire, according to the learned similarities, the source-domain intent with the highest similarity to the target-domain intent; acquire, according to the learned similarities, the source-domain slot with the highest similarity to each target-domain slot; and generate the source-domain dialog state from the source-domain intent and the source-domain slots.
In addition, to achieve the above object, the present invention further provides a device for migrating a cross-domain dialog policy, the device including a memory, a processor, and a cross-domain dialog policy migration program stored in the memory and executable on the processor, wherein the migration program, when executed by the processor, implements the steps of the cross-domain dialog policy migration method described above.
In addition, to achieve the above object, the present invention further provides a readable storage medium storing a cross-domain dialog policy migration program which, when executed by a processor, implements the steps of the cross-domain dialog policy migration method described above.
Embodiments of the invention provide a method, apparatus, device and readable storage medium for migrating a cross-domain dialog policy. The target-domain dialog state is mapped to a source-domain dialog state; the source-domain dialog state is processed based on the preset dialog policy of the source domain to obtain a corresponding source-domain dialog reply; and the source-domain dialog reply is mapped to a target-domain dialog reply, thereby migrating the dialog policy of the source domain to the target domain. This makes full use of the ample training data and high-performing dialog policy of the source domain: no large training set has to be prepared for the target domain, and a target-domain reply to the user's input dialog can be generated without training a target-domain dialog policy. The amount of manually labeled data required, and hence the cost of data acquisition, is reduced; large-scale repeated labeling is avoided, reducing the waste of data resources; and the range of application scenarios of each domain is expanded.
Drawings
FIG. 1 is a flowchart illustrating a migration method of cross-domain dialog policies according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a detailed step of step S10 in the migration method of cross-domain dialog policy according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an implementation process of the migration method of cross-domain conversation policy according to the present invention;
FIG. 4 is a flowchart illustrating a migration method of cross-domain dialog policies according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating a migration method of cross-domain dialog policies according to a third embodiment of the present invention;
FIG. 6 is a flowchart illustrating a migration method of cross-domain dialog policies according to a fourth embodiment of the present invention;
FIG. 7 is a schematic diagram of the functional units of the cross-domain dialog policy migration apparatus of the present invention;
FIG. 8 is a schematic diagram of the operating environment of a migration device for cross-domain dialog policies of the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Some terms used in the present invention are explained below:
Intent: in a task-oriented dialog system, sentences are divided into categories according to the task; each category expresses a different meaning and is called an intent.
For example, the sentence "I want to book a ticket from Beijing to Shanghai" expresses the user's requirement and can be defined as an "inform" intent; the sentence "What times are tickets available?" indicates that the user is asking for ticket information and can be defined as a "request" intent.
Note that different companies may name the same intent differently for different scenarios; for example, other companies may name the "request" intent "question" or "get information".
Slot: in a task-oriented dialog system, different information must be collected for different tasks, and each piece of information is a slot.
For example, in the sentence "I want to book a ticket from Beijing to Shanghai", "Beijing" fills the origin slot and "Shanghai" fills the destination slot. Likewise, different companies may name the same slot differently for different scenarios; for example, "origin" may be labeled "departure city".
Target domain: a domain that needs a dialog policy but lacks sufficient training data.
Source domain: an existing domain that has a large amount of training data and a dialog policy with a high level of performance.
The invention provides a migration method of a cross-domain conversation strategy.
Referring to fig. 1, fig. 1 is a flowchart illustrating a migration method of a cross-domain dialog policy according to a first embodiment of the present invention. In this embodiment, the method comprises the steps of:
Step S10, processing an input user dialog to map it to a corresponding target-domain dialog state;
Embodiments of the present invention are particularly applicable to task-oriented dialog systems, whose purpose is to help the user complete a task, such as booking a hotel or buying an air ticket, by recognizing the user's intent. In a specific implementation, the user input dialog may be dialog information generated from text or voice input while the user uses a human-computer interaction system. For example, a user who wants to book an air ticket may enter "I want to book an air ticket from Shanghai to Beijing" in the human-computer interaction system (a ticket booking platform); when the system detects this input, it extracts the corresponding user input dialog.
The target domain is the domain most relevant to the input user dialog. Its specific type may be set manually by the user, for example by selecting a "ticket booking" domain before or after entering information, or it may be determined by analyzing the user input dialog: given "I want to book an air ticket from Shanghai to Beijing", the target domain is determined to be "ticket booking" or "air ticket booking". The target domain may also be another task-oriented scenario, such as data usage inquiry, phone bill inquiry, meal ordering or consultation.
As shown in fig. 2, in one embodiment, step S10 includes:
Step S11, performing natural language understanding on the input user dialog to identify the target-domain intent and extract target-domain slots;
Referring to FIG. 3, which is a schematic diagram of the implementation process of the cross-domain dialog policy migration method of the present invention: the user input dialog is natural language, and a natural language understanding module (or unit) processes it to perform target domain identification, user intent identification and slot extraction. Target domain identification determines the task-oriented scenario to which the user input dialog belongs. User intent identification recognizes the user's intent, subdividing sub-scenarios within that task-oriented scenario. Slot extraction extracts slots and their values from the user input dialog and may be implemented by slot filling. The specific techniques for performing natural language understanding on the user input dialog, identifying the target-domain intent and extracting target-domain slots are conventional prior art and are not described further here.
Step S12, tracking the target-domain intent;
Dialog state tracking is a core component that ensures the robustness of a dialog system. In each turn it estimates the user's goal, manages that turn's input together with the dialog history, and outputs the current dialog state. This typical state structure is often referred to as slot filling or a semantic frame. Conventional methods, widely used in most commercial implementations, typically rely on hand-crafted rules to select the most likely output.
Step S13, mapping the user input dialog according to the target-domain intent, the target-domain slots and the tracking result of the target-domain intent, to obtain the corresponding target-domain dialog state.
The target-domain dialog state may specifically be a combination of an intent and a set of slots with their slot values.
For example, the user input dialog "I want to book an air ticket from Shanghai to Beijing" undergoes word segmentation and stemming, after which the corresponding semantic slots are generated. Semantic slots may be predefined for different scenarios. From the semantic slots, the intent, slots and slot values of the user dialog are determined:
Intent = book air ticket
Slot 1 = departure city, slot value = Shanghai
Slot 2 = arrival city, slot value = Beijing
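As a minimal sketch of steps S12 and S13, assuming a simple rule-based tracker (the text only says that hand-crafted rules are typical), the dialog state can be held as an intent plus a slot-value dictionary and updated turn by turn. The `DialogState` class, the `track` function, and the two-turn split of the example utterance are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogState:
    """Target-domain dialog state: one intent plus slot -> value pairs."""
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def track(state: DialogState, nlu_intent: Optional[str],
          nlu_slots: Dict[str, str]) -> DialogState:
    """Rule-based update: keep the previous intent unless NLU found a new one,
    and let newly extracted slot values overwrite older ones."""
    intent = nlu_intent if nlu_intent is not None else state.intent
    return DialogState(intent, {**state.slots, **nlu_slots})


# Turn 1 (hypothetical NLU output): "I want to book an air ticket from Shanghai"
state = track(DialogState(), "book_air_ticket", {"departure_city": "Shanghai"})
# Turn 2 (hypothetical NLU output): "to Beijing", with no new intent detected
state = track(state, None, {"arrival_city": "Beijing"})
print(state)
```

After the second turn the state matches the example above: the intent is carried over from turn 1 and both city slots are filled.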
Step S20, mapping the target domain dialog state to a source domain dialog state;
Specifically, the corresponding source domain is determined according to the target domain. The source domain is a domain designated in advance for the target domain, and preferably the two domains are highly similar. For example, the source domain designated in advance for the "air ticket booking" domain is the "hotel booking" domain.
Next, the source-domain intent with the highest similarity to the target-domain intent in the target-domain dialog state is obtained. For example, the target-domain intent "book air ticket" has different similarities to the different intents (such as "book hotel" and "query property location") of the source domain (hotel booking); this embodiment selects the source-domain intent with the highest similarity.
The source-domain slot with the highest similarity to each target-domain slot in the target-domain dialog state is then obtained. For example, the target-domain slot "departure city" of the target domain (air ticket booking) has different similarities to the different slots (such as "check-in time", "number of guests" and "property location") of the source domain (hotel booking); this embodiment selects the source-domain slot with the highest similarity.
A source-domain dialog state is then generated from the source-domain intent and the source-domain slots. Once the source-domain intent and source-domain slots with the highest similarity have been obtained, the target-domain dialog state has been mapped to the source-domain dialog state.
Other implementations of step S20 are described in the embodiments below.
Step S30, processing the source-domain dialog state based on the preset dialog policy of the source domain to obtain the corresponding source-domain dialog reply;
The source domain has a large amount of training data, and its high-performing dialog policy (i.e., the preset dialog policy) is generally obtained by training on that data; alternatively, the preset dialog policy may be set manually.
Specifically, the preset dialog policy of the source domain is invoked to process the source-domain dialog state, yielding the corresponding source-domain dialog reply.
For example, given the source-domain dialog state {intent: book hotel, check-in time: 2018-10-01, check-out time: 2018-10-02}, the preset dialog policy of the source domain generates the optimal abstract dialog reply {intent: query, price: ?}, where "?" indicates that the reply asks for the price in the form of a question.
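The patent treats the source-domain policy as a trained (or manually set) component. Purely to make step S30 concrete, the following hypothetical rule-based stand-in asks for the first required slot that is still missing. `REQUIRED_SLOTS`, `hotel_policy`, and the choice of `price` as a required slot are assumptions chosen to reproduce the example reply above; they are not the source domain's actual policy.

```python
from typing import Dict, List, Tuple

# (intent, {slot: value}) pairs, as in the example states above.
State = Tuple[str, Dict[str, str]]

# Hypothetical slot requirements for the hotel-booking source domain.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "book_hotel": ["check_in", "check_out", "price"],
}


def hotel_policy(state: State) -> State:
    """Toy rule-based stand-in for a trained source-domain policy."""
    intent, slots = state
    for slot in REQUIRED_SLOTS.get(intent, []):
        if slot not in slots:
            return ("query", {slot: "?"})  # ask for the first missing slot
    return ("confirm", dict(slots))        # everything known: confirm the booking


reply = hotel_policy(("book_hotel", {"check_in": "2018-10-01", "check_out": "2018-10-02"}))
print(reply)  # ('query', {'price': '?'})
```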
Step S40, mapping the source domain dialog reply to a target domain dialog reply.
As shown in FIG. 3, after the source-domain dialog reply is obtained, it is mapped to obtain the target-domain dialog reply. For example, mapping the abstract reply {intent: query, price: ?} of the source domain (hotel booking) yields the abstract target-domain reply {intent: query, price: ?}, where "?" again indicates that the reply asks for the price in question form.
Further, the target-domain dialog reply may be rendered in natural language and returned to the user so that it is easy to understand. For example, the abstract reply {intent: query, price: ?} for the target domain (air ticket booking) is rendered as the natural-language question "What price range of ticket would you like?".
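Step S40 and the final natural-language rendering can be sketched as a label mapping followed by a template lookup. The mapping tables, templates and function names below are hypothetical; the patent does not specify how the reply mapping or the language generation is implemented.

```python
from typing import Dict, Tuple

State = Tuple[str, Dict[str, str]]

# Hypothetical inverse mappings from source-domain labels to target-domain labels.
INTENT_TO_TARGET = {"query": "query", "confirm": "confirm"}
SLOT_TO_TARGET = {"price": "price", "check_in": "departure_city", "check_out": "arrival_city"}

# Hypothetical natural-language templates for the target domain (air tickets).
TEMPLATES = {
    ("query", "price"): "What price range of ticket would you like?",
    ("query", "departure_city"): "Which city are you departing from?",
}


def to_target_reply(source_reply: State) -> State:
    """Rename the intent and slots of an abstract source-domain reply."""
    intent, slots = source_reply
    return (INTENT_TO_TARGET.get(intent, intent),
            {SLOT_TO_TARGET.get(s, s): v for s, v in slots.items()})


def realize(target_reply: State) -> str:
    """Render the first requested ('?') slot with a template, if one exists."""
    intent, slots = target_reply
    for slot, value in slots.items():
        if value == "?" and (intent, slot) in TEMPLATES:
            return TEMPLATES[(intent, slot)]
    return "Sorry, could you rephrase that?"


text = realize(to_target_reply(("query", {"price": "?"})))
print(text)  # What price range of ticket would you like?
```

Unknown labels pass through unchanged, so a slot that exists in both domains under the same name needs no entry in the tables.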
In this embodiment, the target-domain dialog state is mapped to a source-domain dialog state, which is then processed by the existing preset dialog policy of the source domain to obtain the corresponding source-domain dialog reply; the source-domain dialog reply is mapped to a target-domain dialog reply, thereby migrating the dialog policy of the source domain to the target domain. The ample training data and high-performing dialog policy of the source domain are thus fully exploited: no large training set has to be prepared for the target domain, and a target-domain reply to the user's input dialog can be generated without training a target-domain dialog policy. This reduces the amount of manually labeled data required and the cost of data acquisition, avoids large-scale repeated labeling and the resulting waste of data resources, and expands the range of application scenarios of each domain.
The technical solution of the present invention is further described below with reference to specific embodiments.
Further, a second embodiment is proposed on the basis of the first embodiment of the cross-domain dialog policy migration method. As shown in FIG. 4, one specific implementation of step S20 includes:
Step S201, determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
The source domain is a domain designated in advance for the target domain; for example, domain B is designated as the source domain of target domain A, and domain D as the source domain of target domain C. This designation is the preset association between the target domain and the source domain. Once the target domain is determined, the source domain is determined from the target domain and its preset association.
Step S202, acquiring the source-domain intent with the highest preset similarity to the target-domain intent;
Specifically, the similarity between every target-domain intent and every source-domain intent is specified manually in advance. When determining the source-domain intent, the source-domain intent with the highest preset similarity to the target-domain intent is selected according to the preset similarities of that target-domain intent.
Step S203, acquiring the source-domain slot with the highest preset similarity to the target-domain slot;
Specifically, the similarity between every target-domain slot and every source-domain slot is specified manually in advance. When determining the source-domain slot, the source-domain slot with the highest preset similarity to the target-domain slot is selected according to the preset similarities of that target-domain slot.
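Steps S201 to S203 amount to a lookup of the preset association followed by an argmax over manually specified similarity tables. In this sketch the domain, intent and slot labels follow the running example, but the similarity scores and all identifiers (`SOURCE_DOMAIN`, `INTENT_SIM`, `SLOT_SIM`, `most_similar`) are hypothetical.

```python
from typing import Dict

# Hypothetical preset association: target domain -> designated source domain.
SOURCE_DOMAIN = {"air_ticket_booking": "hotel_booking"}

# Hypothetical manually specified similarities: target label -> {source label: score}.
INTENT_SIM: Dict[str, Dict[str, float]] = {
    "book_air_ticket": {"book_hotel": 0.9, "query_property_location": 0.2},
}
SLOT_SIM: Dict[str, Dict[str, float]] = {
    "departure_city": {"check_in": 0.6, "check_out": 0.3, "property_location": 0.4},
    "arrival_city": {"check_in": 0.3, "check_out": 0.6, "property_location": 0.5},
}


def most_similar(label: str, table: Dict[str, Dict[str, float]]) -> str:
    """Return the source label with the highest preset similarity to `label`."""
    scores = table[label]
    return max(scores, key=scores.get)


source_domain = SOURCE_DOMAIN["air_ticket_booking"]                          # step S201
source_intent = most_similar("book_air_ticket", INTENT_SIM)                  # step S202
source_slots = [most_similar(s, SLOT_SIM) for s in ("departure_city", "arrival_city")]  # S203
print(source_domain, source_intent, source_slots)
```

`max` returns the first highest-scoring label on ties, and nothing prevents two target slots from mapping to the same source slot; a real system may need to resolve such collisions, which the text does not address.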
Step S204, generating a source-domain dialog state from the source-domain intent and the source-domain slots.
After the source-domain intent and slots are obtained, the source-domain dialog state is generated. It may specifically be a combination of the source-domain intent and a set of source-domain slots with their slot values. The value of a source-domain slot may be default slot information set manually or obtained by a preset rule. For example, if the source domain is hotel booking and the source-domain slot is the check-in time, the slot value is set to the current date; if the source-domain slot is the check-out time, the slot value is set to the next day's date.
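The default-value rule for the hotel example (check-in today, check-out the next day) can be sketched with Python's standard `datetime` module. The function names and slot identifiers are hypothetical; a fixed date is passed in so that the result reproduces the example in the next paragraph.

```python
import datetime
from typing import Dict, List, Optional, Tuple


def default_slot_values(today: Optional[datetime.date] = None) -> Dict[str, str]:
    """Hypothetical preset rule: check-in = today, check-out = the next day."""
    if today is None:
        today = datetime.date.today()
    return {
        "check_in": today.isoformat(),
        "check_out": (today + datetime.timedelta(days=1)).isoformat(),
    }


def build_source_state(intent: str, source_slots: List[str],
                       today: Optional[datetime.date] = None) -> Tuple[str, Dict[str, str]]:
    """Fill each mapped source-domain slot with its default value, if a rule exists."""
    defaults = default_slot_values(today)
    return intent, {s: defaults[s] for s in source_slots if s in defaults}


state = build_source_state("book_hotel", ["check_in", "check_out"], datetime.date(2018, 10, 1))
print(state)  # ('book_hotel', {'check_in': '2018-10-01', 'check_out': '2018-10-02'})
```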
For example, the target domain is "air ticket booking", and the target domain dialog state is { intention: booking an air ticket, starting a city: shanghai, arrival at city: beijing }. The method comprises the steps of obtaining a source field intention with the maximum similarity corresponding to a target field intention in a target field conversation state as ' hotel booking ', and obtaining source field slots with the maximum similarity corresponding to a target field slot position ' departure time ' and an arrival time ' in the target field conversation state as ' check-in time ' and ' departure time ', and further obtaining slot position values corresponding to the source field slots to generate a source field conversation state. Thus, the target domain dialog state { intent: booking an air ticket, starting a city: shanghai, arrival at city: beijing is mapped to the source domain dialog state { intent: booking a hotel, and checking in time: number 10 month 1, 2018, time off store: number 2 of 10 months in 2018).
In this embodiment, the similarity between any intention of the target domain and any intention of the source domain, and between any slot of the target domain and any slot of the source domain, is specified manually. On this basis, the source-domain intention corresponding to the target-domain intention and the source-domain slots corresponding to the target-domain slots are determined, so that the target-domain dialog state is mapped to a source-domain dialog state. Manually specified similarities are easy to implement and maintain.
Further, a third embodiment is provided on the basis of the first embodiment of the cross-domain dialog policy migration method. As shown in Fig. 5, one specific implementation of step S20 includes:
Step S205: determine the source domain according to the target domain, where a preset association exists between the target domain and the source domain;
Step S205 is the same as step S201 described above; refer to step S201 for the specific implementation.
Step S206: obtain the source-domain intention with the greatest preset similarity to the target-domain intention;
Step S206 is the same as step S202 described above; refer to step S202 for the specific implementation.
Step S207: obtain the source-domain slots corresponding to the target-domain slots, where the slots of the target domain and of the source domain are ranked by importance in advance, and a correspondence between target-domain slots and source-domain slots is established according to the ranking results;
Specifically, based on information entropy theory, the importance of a slot in a domain is measured by its attribute entropy. The attribute entropy of a slot is a normalized entropy, and a preferred calculation formula is:
$$\eta(s) = -\frac{1}{|V_s|} \sum_{\upsilon \in V_s} p(s=\upsilon) \log p(s=\upsilon)$$
where s is a slot, η(s) is the attribute entropy of the slot, υ is an attribute value of the slot, V_s is the set of attribute values of the slot (|V_s| is the total number of attribute values), and p(s = υ) is the empirical probability that an entity in the database has attribute value υ for slot s.
For example, the table below lists, for the different restaurants in a restaurant database, whether children are allowed and the price level.
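[Table image BDA0001931245760000112 not reproduced: "children allowed" and "price level" for each restaurant in the restaurant database]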
The attribute entropy of the slot "whether children are allowed" in the restaurant database is η(s) = -[p(s = allowed) log p(s = allowed) + p(s = not allowed) log p(s = not allowed)]/2, where p(s = allowed) = 0/4, p(s = not allowed) = 4/4, and |V_s| = 2. With the convention 0 log 0 = 0, the attribute entropy of "whether children are allowed" is 0.
Similarly, the attribute entropy of the slot "price level" is η(s) = -[p(s = high) log p(s = high) + p(s = medium) log p(s = medium) + p(s = low) log p(s = low)]/3, where p(s = high) = 1/4, p(s = medium) = 1/4, p(s = low) = 2/4, and |V_s| = 3. Using base-10 logarithms, the attribute entropy of "price level" is approximately 0.15.
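The worked example can be reproduced with a short function. This is a sketch assuming base-10 logarithms (the base that yields the stated value of 0.15) and a database given as a plain list of slot values, one per restaurant; the function name is hypothetical.

```python
import math
from collections import Counter

def attribute_entropy(values, num_attribute_values):
    """Normalized attribute entropy of one slot, following the worked example:
    eta(s) = -(1 / |V_s|) * sum over v of p(s = v) * log10 p(s = v).

    values: the slot value of each entity in the database (one entry per entity).
    num_attribute_values: |V_s|, which may exceed the number of distinct observed
    values because a value with zero count (e.g. "allowed") still belongs to V_s.
    """
    n = len(values)
    total = 0.0
    for count in Counter(values).values():
        p = count / n
        total -= p * math.log10(p)  # unobserved values are skipped, i.e. 0 * log 0 := 0
    return total / num_attribute_values

children = ["not allowed"] * 4             # no restaurant allows children
prices = ["high", "medium", "low", "low"]  # 1/4, 1/4, 2/4
print(round(attribute_entropy(children, 2), 2))  # 0.0
print(round(attribute_entropy(prices, 3), 2))    # 0.15
```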
The attribute entropy computed by this formula is non-negative. The lower the entropy, the lower the information gain of the attribute and hence the lower its importance; correspondingly, the importance of the slot is lower. In the example above, it is pointless for the system to ask the user about a preference for the "children allowed" attribute, because no restaurant in the database allows children; such a question yields no information gain. Accordingly, the entropy of the "price level" attribute is higher than that of the "children allowed" attribute.
After the attribute entropies of the slots are calculated, the slots are compared by importance through their attribute entropy values and sorted accordingly, giving the slot importance ranking of a domain. For any two domains, a correspondence is established between the slots that occupy the same position in their respective importance rankings. In this embodiment, comparing and ranking the slot importance of the target and source domains and establishing the slot correspondence are performed in advance.
For example, in the target domain "air ticket booking", the slots that best help the user filter tickets, in descending order of importance, are: departure city, destination city, departure time, airline, price. In the source domain "hotel booking", the slots that best help the user filter hotels, in descending order of importance, are: check-in date, check-out date, hotel location, hotel star rating, price, room type, and so on. The correspondences established between the two domains are therefore departure city to check-in date, destination city to check-out date, departure time to hotel location, airline to hotel star rating, and so on. Note that if all flights depart in the morning, the attribute entropy of the slot "departure time" is very low and that slot does not help the user filter flights.
Based on the correspondence between the slots of the target domain and those of the source domain, the source-domain slot corresponding to each target-domain slot is found. For example, if the target-domain slot is departure city, the corresponding source-domain slot is check-in date, and so on.
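The rank-based correspondence can be sketched as sorting each domain's slots by an importance score and pairing slots at the same rank. The scores below are hypothetical numbers chosen to reproduce the orderings listed in the example; in practice they could be the attribute entropies computed above.

```python
def align_slots_by_rank(target_importance, source_importance):
    """Pair slots that occupy the same position in each domain's importance ranking.

    Both arguments map slot name -> importance score (higher = more important).
    zip() stops at the shorter ranking, so surplus slots of the larger domain
    (here "room_type") are left unmatched.
    """
    target_ranked = sorted(target_importance, key=target_importance.get, reverse=True)
    source_ranked = sorted(source_importance, key=source_importance.get, reverse=True)
    return dict(zip(target_ranked, source_ranked))

# Hypothetical importance scores reproducing the orderings in the example.
flight_slots = {"departure_city": 0.9, "destination_city": 0.8, "departure_time": 0.6,
                "airline": 0.4, "price": 0.3}
hotel_slots = {"check_in_date": 0.9, "check_out_date": 0.85, "hotel_location": 0.7,
               "hotel_star_rating": 0.5, "price": 0.45, "room_type": 0.2}
slot_map = align_slots_by_rank(flight_slots, hotel_slots)
```

Pairing purely by rank is what the text describes; it ignores semantic similarity, so it relies on the two rankings being meaningful for the chosen pair of domains.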
Step S208: generate the source-domain dialog state from the source-domain intention and the source-domain slots.
Step S208 is the same as step S204 described above; refer to step S204 for the specific implementation.
In this embodiment, a correspondence between target-domain slots and source-domain slots is established from their importance rankings, and the source-domain slots corresponding to the target-domain slots are obtained. The target-domain dialog state is then mapped to a source-domain dialog state based on the source-domain intention corresponding to the target-domain intention and on these source-domain slots. Ranking slots by importance and aligning them across the two domains makes full use of slot-importance data, which helps improve the accuracy, effectiveness, and reliability of slot matching.
Further, a fourth embodiment is provided on the basis of the first embodiment of the cross-domain dialog policy migration method. As shown in Fig. 6, one specific implementation of step S20 includes:
Step S209: determine the source domain according to the target domain, where a preset association exists between the target domain and the source domain;
Step S209 is similar to step S201 described above; refer to step S201.
Step S210: based on a predefined learning objective equation, solve for the set of variables that maximizes it, and from the solved variables determine the similarity between any pair of intentions, or any pair of slots, in the source and target domains;
Understandably, the higher the similarity of a matched pair of intentions (or slots), the more accurately the target-domain dialog state is mapped to the source-domain dialog state. The similarity between any pair of intentions or slots across the target and source domains is therefore treated as a probability variable, whose value can be obtained as a solution of a predefined learning objective equation. The learning objective equation measures how much a model algorithm improves its performance through reinforcement learning in a domain.
On this basis, one concrete measure of that improvement is as follows. At turn n of a dialog, the single-turn reward of turn n recorded in the actual data is added to the model's estimate of the total future reward of all turns after turn n. The smaller the error between this sum and the model's estimate of the total future reward at turn n, the better the model algorithm performs under reinforcement learning in that domain.
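This measure is a Bellman (temporal-difference) error. The sketch below computes its mean square along one recorded dialog. It is an illustration under stated assumptions: it uses the estimate for the recorded next reply as the bootstrap target (a SARSA-style target; Q-learning proper takes the maximum over candidate replies), adds a discount factor `gamma` that the text does not mention (default 1.0, i.e. no discounting), and treats the future reward after the last turn as 0.

```python
def mean_squared_td_error(q_values, rewards, gamma=1.0):
    """Mean squared Bellman (temporal-difference) error along one recorded dialog.

    q_values[n]: the model's estimate Q(h_n, y_n) of the total future reward at turn n.
    rewards[n]:  the single-turn reward of turn n recorded in the data.
    gamma:       discount factor (an assumption; not mentioned in the text).
    The target for turn n is rewards[n] + gamma * q_values[n + 1]; after the last
    turn the future reward is taken as 0.
    """
    if len(q_values) != len(rewards) or not q_values:
        raise ValueError("need one Q estimate and one reward per turn")
    total = 0.0
    for n, (q_n, r_n) in enumerate(zip(q_values, rewards)):
        q_next = q_values[n + 1] if n + 1 < len(q_values) else 0.0
        total += (r_n + gamma * q_next - q_n) ** 2
    return total / len(q_values)
```

A perfectly consistent estimate, such as `q_values=[1.0, 0.5]` with `rewards=[0.5, 0.5]`, gives an error of 0.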
A preferred learning objective equation is:
[Equation image BDA0001931245760000131 not reproduced: the learning objective over the parameter set Θ]
where Θ is the set of variable parameters, h_n is the state of a multi-turn dialog at step n, and y_n is the reply the dialog system gives the user at step n.
The term [equation image BDA0001931245760000132] is the square of the standard loss of conventional Q-learning (the Bellman error), and minimizing this term drives that loss toward 0. That is, it reduces the error between the future reward estimated for the dialog at turn n, [equation image BDA0001931245760000133], and the future reward Q_t(h_n, y_n) recorded from the dialog at turn n+1.
R(Θ) is a regularization term that limits model complexity and aligns the intentions and slots of the target and source domains. It rests on the following default assumption of reinforcement learning across domains: if two agents, in two similar states of two domains, perform two similar groups of actions, the next states they transition to will be similar, and the rewards they obtain during those transitions will be similar.
Specifically, R(Θ) = R_1(Θ) + R_2(Θ) + R_3(Θ) + R_4(Θ), where:
(1) R_1(Θ) = R_1s(Θ) + R_1t(Θ)
R_1s(Θ) and R_1t(Θ) are the slot-vector preservation regularizers of the source domain and the target domain respectively, so R_1(Θ) is the cross-domain slot-vector preservation regularizer.
[Equation image BDA0001931245760000141: definition of R_1s(Θ)]
In this formula, L_ce(·) denotes the cross-entropy loss and ∘ denotes function composition; a_s denotes an intention of the source domain, and D_s denotes the (abundant) dialogs of the source domain.
c_t(·) is a prediction function that, given an intention vector a_t, predicts a slot vector s_t in the target domain.
[Equation image BDA0001931245760000142] can be regarded as the target-domain intention vector that approximates a_s, i.e., the answer for a_s;
[Equation image BDA0001931245760000143] is the compatible slot vector predicted in the target domain.
The formula for R_1t(Θ) mirrors that of R_1s(Θ) and is not repeated here.
[Equation image BDA0001931245760000144]
where D_t denotes the (fewer) dialogs of the target domain;
[equation image BDA0001931245760000145] denotes the function that translates intentions from the source domain to the target domain;
[equation image BDA0001931245760000146] denotes the function that translates statements from the target domain to the source domain.
[Equation image BDA0001931245760000151]
where [equation image BDA0001931245760000152] is the probability of occurrence of intention a in the target domain, [equation image BDA0001931245760000153] is the probability of occurrence of intention a in the source domain, and L_kl(·) is the Kullback-Leibler divergence loss.
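A Kullback-Leibler divergence between intention-occurrence distributions can be sketched as follows. The exact regularizer is in an image that is not reproduced, so this only illustrates the divergence itself. It assumes both distributions are estimated from intent labels that already share one label space (for instance after applying the source-to-target intention translation function); that alignment, the `eps` smoothing, and the function names are hypothetical.

```python
import math
from collections import Counter

def intent_distribution(intent_labels):
    """Empirical occurrence probability of each intent label in a list of dialog turns."""
    counts = Counter(intent_labels)
    n = len(intent_labels)
    return {intent: c / n for intent, c in counts.items()}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum over a of p(a) * ln(p(a) / q(a)), over the labels in p.

    Labels missing from q get probability eps (a smoothing assumption) so the
    ratio stays finite.
    """
    return sum(pa * math.log(pa / max(q.get(a, 0.0), eps)) for a, pa in p.items() if pa > 0)

target_p = intent_distribution(["book", "query"])  # {"book": 0.5, "query": 0.5}
source_p = {"book": 0.25, "query": 0.75}
print(round(kl_divergence(target_p, source_p), 4))  # 0.1438
```

The divergence is 0 when the two distributions coincide, so minimizing it pushes the domains toward matching intention frequencies.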
[Equation image BDA0001931245760000154]
where [equation image BDA0001931245760000155] is the probability of mapping the target-domain slot [equation image BDA0001931245760000156] to the source-domain slot [equation image BDA0001931245760000157], and |S_s| is the number of slots in the source domain.
Further, once the learning objective equation is established, a set of variables that maximizes it is found with a preset optimization algorithm.
The optimization algorithm can be chosen as needed, for example the Adam method (Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014) or a gradient descent algorithm.
The set of variables that maximizes the learning objective equation gives the similarity between any two intentions, or any two slots, in the source and target domains.
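For illustration, the Adam update rule (Kingma and Ba, 2014) can be written in a few lines and used to maximize an objective by descending on its negation. The actual learning objective is in an image that is not reproduced, so the quadratic below is only a hypothetical stand-in with a known maximum at (3, -1). The hyperparameters are the commonly used defaults, except for the learning rate.

```python
import math

def adam_maximize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Maximize an objective with Adam by minimizing its negation.

    grad(theta) must return the gradient of the objective to MAXIMIZE (list of floats).
    """
    m = [0.0] * len(theta)  # first-moment (mean) estimate
    v = [0.0] * len(theta)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = [-gi for gi in grad(theta)]  # gradient of the negated objective
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        theta = [th - lr * mh / (math.sqrt(vh) + eps)
                 for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta

# Hypothetical stand-in objective: f(theta) = -(theta0 - 3)^2 - (theta1 + 1)^2.
def toy_grad(th):
    return [-2 * (th[0] - 3), -2 * (th[1] + 1)]

best = adam_maximize(toy_grad, [0.0, 0.0])
```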
Step S211: according to the determined similarities, obtain the source-domain intention with the greatest similarity to the target-domain intention;
When determining the source-domain intention, the source-domain intention with the greatest similarity to the target-domain intention is selected according to the similarities between each source-domain intention and the target-domain intention.
Step S212: according to the determined similarities, obtain the source-domain slot with the greatest similarity to the target-domain slot;
When determining the source-domain slot, the source-domain slot with the greatest similarity to the target-domain slot is selected according to the similarities between each source-domain slot and the target-domain slot.
Step S213: generate the source-domain dialog state from the source-domain intention and the source-domain slots.
Step S213 is the same as step S204 described above; refer to step S204 for the specific implementation.
In this embodiment, a learning objective equation is first established, and a set of variables that maximizes it is then found with a preset optimization algorithm, which determines the similarity between any pair of intentions, or any pair of slots, in the source and target domains. The source-domain intention and source-domain slots with the greatest similarity to the target-domain intention and slots are then determined, and the target-domain dialog state is mapped accordingly to a source-domain dialog state, from which the subsequent source-domain reply action is generated.
Because the similarities are derived from a reinforcement-learning objective, which represents how much a model algorithm improves through reinforcement learning in a domain, this embodiment combines reinforcement learning with cross-domain migration. This improves the accuracy and effectiveness of intention and slot matching and gives the migration strong generalization ability and reliability across domains.
In addition, the invention provides a migration device for cross-domain dialog policies.
As shown in Fig. 7, which is a schematic diagram of the functional units of the device, the device comprises:
a target-domain dialog state mapping unit 10, configured to process an input user dialog and map it to the corresponding target-domain dialog state;
The migration device for cross-domain dialog policies is particularly suitable for task-oriented dialog systems, whose purpose is to help the user complete a task, such as booking a hotel or purchasing an air ticket, by recognizing the user's intention. In a specific implementation, the user input dialog may be dialog information generated from text, voice, or other input when the user uses a human-computer interaction system. For example, when the user needs to book an air ticket, the user may enter "I want to book an air ticket from Shanghai to Beijing" on the human-computer interaction system (a ticket booking platform); after detecting this input, the system extracts the corresponding user input dialog.
The target domain is the domain most closely associated with the user input dialog, and its type may be set manually by the user, for example by selecting the "ticket booking" domain before or after entering information. Alternatively, the user input dialog is analyzed: for example, from the user input dialog "I want to book an air ticket from Shanghai to Beijing", the target-domain dialog state mapping unit 10 determines that the target domain is "air ticket booking". The target domain may also be another task-oriented scenario such as data usage inquiry, phone bill inquiry, meal ordering, or consultation.
In a specific implementation, the target-domain dialog state mapping unit 10 is configured to perform natural language understanding on the user input dialog to identify the target-domain intention and extract the target-domain slots;
The user input dialog is natural language, so a natural language understanding module (or unit) processes it to perform target-domain identification, user intention identification, and slot extraction. Target-domain identification determines the task-oriented scenario to which the user input dialog belongs; user intention identification determines the user's intention, subdividing the task-oriented scenario into sub-scenarios; slot extraction extracts the slots and their values from the user input dialog, for example by slot filling. These natural language understanding techniques are conventional prior art and are not described further here.
The target-domain dialog state mapping unit 10 is further configured to track the target-domain intention;
Dialog state tracking is a core component that ensures the robustness of a dialog system. At each turn it estimates the user's goal, manages the input of each turn together with the dialog history, and outputs the current dialog state. This typical state structure is often called slot filling or a semantic frame. Conventional methods, which are widely used in most commercial implementations, typically apply hand-crafted rules to select the most likely output.
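A hand-crafted tracker of the kind described can be as simple as merging each turn's understanding result into a running state. This is a hypothetical sketch: the per-turn dictionary format and the two rules (the latest non-empty intention wins; newer slot values overwrite older ones) are assumptions chosen for illustration.

```python
def track_state(turns):
    """Minimal rule-based dialog state tracker (hypothetical sketch).

    Each turn is a natural-language-understanding result of the form
    {"intent": str or None, "slots": {slot_name: value}}.
    Rule 1: the latest non-empty intention replaces the current one.
    Rule 2: slot values from newer turns overwrite older values.
    """
    state = {"intent": None, "slots": {}}
    for turn in turns:
        if turn.get("intent"):
            state["intent"] = turn["intent"]
        state["slots"].update(turn.get("slots", {}))
    return state

history = [
    {"intent": "book_flight", "slots": {"departure_city": "Shanghai"}},
    {"intent": None, "slots": {"arrival_city": "Beijing"}},
]
current_state = track_state(history)
```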
The target-domain dialog state mapping unit 10 is further configured to map the user input dialog according to the target-domain intention, the target-domain slots, and the tracking result of the target-domain intention, to obtain the corresponding target-domain dialog state.
The target-domain dialog state may specifically be the combination of an intention and a set of slots with their slot values.
For example, the target-domain dialog state mapping unit 10 performs word segmentation and stemming on the user input dialog "I want to book an air ticket from Shanghai to Beijing" and generates the corresponding semantic slots, which can be predefined for different scenarios. From the semantic slots, the intention, slots, and slot values of the user dialog are determined:
Intention: book air ticket
Slot 1 is the departure city, with slot value Shanghai
Slot 2 is the arrival city, with slot value Beijing
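As a toy stand-in for the natural language understanding step (the text treats the real technique as prior art), a single regular expression can turn the example sentence into this intention and these slot values. The pattern, the intent label `book_flight`, and the function name are hypothetical and cover only this sentence shape.

```python
import re

# Hypothetical pattern standing in for a trained NLU model (illustration only).
FLIGHT_PATTERN = re.compile(
    r"book (?:a flight|an air ticket) from (?P<departure_city>\w+) to (?P<arrival_city>\w+)",
    re.IGNORECASE,
)

def parse_utterance(text):
    """Return {"intent": ..., "slots": {...}} for the one supported sentence shape."""
    match = FLIGHT_PATTERN.search(text)
    if match is None:
        return {"intent": None, "slots": {}}
    return {"intent": "book_flight", "slots": match.groupdict()}

parsed = parse_utterance("I want to book an air ticket from Shanghai to Beijing")
```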
a source-domain dialog state mapping unit 20, configured to map the target-domain dialog state to a source-domain dialog state;
Specifically, the source-domain dialog state mapping unit 20 determines the corresponding source domain according to the target domain. The source domain is a domain designated in advance for the target domain, which is appropriate when the target domain and the source domain are highly similar. For example, the source domain designated in advance for the "air ticket booking" domain is the "hotel booking" domain.
Then the source-domain dialog state mapping unit 20 obtains the source-domain intention with the greatest similarity to the target-domain intention in the target-domain dialog state. For example, the target-domain intention "book air ticket" has different similarities to the different intentions of the source domain (hotel booking), such as "book hotel" and "query listing location"; this embodiment selects the source-domain intention with the greatest similarity.
The source-domain dialog state mapping unit 20 also obtains the source-domain slot with the greatest similarity to each target-domain slot in the target-domain dialog state. For example, the target-domain slot "departure city" in the target domain (air ticket booking) has different similarities to the different slots of the source domain (hotel booking), such as "check-in date", "number of guests", and "listing location"; this embodiment selects the source-domain slot with the greatest similarity.
The source-domain dialog state is then generated from the source-domain intention and the source-domain slots. Once the source-domain intention and slots with the greatest similarity are obtained, the target-domain dialog state has been mapped to the source-domain dialog state.
Refer to the other embodiments below for specific implementations of the source-domain dialog state mapping unit 20.
a source-domain dialog state processing unit 30, configured to process the source-domain dialog state based on the preset dialog policy of the source domain to obtain the corresponding source-domain dialog reply;
The source domain has a large amount of training data, and a dialog policy with a relatively high performance level (i.e., the preset dialog policy) is generally trained on that data; alternatively, the preset dialog policy is set manually.
Specifically, the source-domain dialog state processing unit 30 retrieves the preset dialog policy of the source domain and processes the source-domain dialog state according to it to obtain the corresponding source-domain dialog reply.
For example, if the source-domain dialog state is {intention: book hotel, check-in date: October 1, 2018, check-out date: October 2, 2018}, the preset dialog policy of the source domain generates the optimal abstract dialog reply {intention: query, price: ?}, where "?" indicates that the reply asks for the price in the form of a question.
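A manually set policy that produces this reply can be sketched as "ask for the first required slot that is still empty". The slot order and the `confirm_booking` fallback are hypothetical; a trained source-domain policy would replace this function without changing its input and output shapes.

```python
# Hypothetical hand-written stand-in for the source domain's preset dialog policy.
REQUIRED_HOTEL_SLOTS = ["check_in_date", "check_out_date", "price"]

def hotel_policy(state):
    """Return an abstract reply asking for the first required slot that is missing."""
    for slot in REQUIRED_HOTEL_SLOTS:
        if state["slots"].get(slot) is None:
            return {"intent": "query", slot: "?"}
    return {"intent": "confirm_booking"}

hotel_state = {"intent": "book_hotel",
               "slots": {"check_in_date": "2018-10-01", "check_out_date": "2018-10-02"}}
reply = hotel_policy(hotel_state)
```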
a target-domain dialog reply mapping unit 40, configured to map the source-domain dialog reply to a target-domain dialog reply.
After the source-domain dialog reply is obtained, the target-domain dialog reply mapping unit 40 maps it to a dialog reply in the target domain. For example, mapping the abstract source-domain dialog reply {intention: query, price: ?} of the source domain (hotel booking) yields the abstract target-domain dialog reply {intention: query, price: ?}, where "?" indicates that the reply asks for the price in the form of a question.
Further, the migration device also comprises a natural language reply unit, configured to express the target-domain dialog reply in natural language and return it to the user for easier understanding. For example, the abstract target-domain dialog reply {intention: query, price: ?} of the target domain (air tickets) is rendered as the natural language sentence "What price range of air ticket would you like?".
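The reply mapping and the natural language rendering described above can be sketched as renaming slot keys through the inverse of the slot correspondence and then filling a template. The correspondence table, the templates, and the function names are hypothetical; "price" maps to itself, matching the example in which the reply keeps its form across domains.

```python
# Hypothetical inverse slot correspondence (source-domain slot -> target-domain slot).
SOURCE_TO_TARGET_SLOT = {
    "check_in_date": "departure_city",
    "check_out_date": "arrival_city",
    "price": "price",
}
# Hypothetical natural-language templates for target-domain (air ticket) queries.
QUERY_TEMPLATES = {
    "price": "What price range of air ticket would you like?",
    "departure_city": "Which city are you departing from?",
    "arrival_city": "Which city are you flying to?",
}

def map_reply_to_target(source_reply):
    """Rename the slot keys of an abstract source-domain reply to target-domain slots."""
    return {
        key if key == "intent" else SOURCE_TO_TARGET_SLOT.get(key, key): value
        for key, value in source_reply.items()
    }

def render_reply(target_reply):
    """Turn an abstract target-domain reply into a natural-language sentence."""
    if target_reply.get("intent") == "query":
        for slot, value in target_reply.items():
            if slot != "intent" and value == "?":
                return QUERY_TEMPLATES.get(slot, f"Could you tell me the {slot.replace('_', ' ')}?")
    return "OK."

target_reply = map_reply_to_target({"intent": "query", "price": "?"})
sentence = render_reply(target_reply)
```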
In this migration device, the target-domain dialog state is mapped to a source-domain dialog state, the source-domain dialog state is processed with the existing preset dialog policy of the source domain to obtain the corresponding source-domain dialog reply, and the source-domain dialog reply is mapped to a target-domain dialog reply, thereby migrating the dialog policy of the source domain to the target domain. The ample training data and high-performing dialog policy of the source domain are thus fully exploited: a target-domain dialog reply corresponding to the user input dialog can be generated without preparing ample training data for the target domain or training a target-domain dialog policy. This reduces the amount of manually labeled data required and the cost of data acquisition, avoids large-scale repeated labeling and the resulting waste of data resources, and broadens the range of application scenarios in each domain.
The technical solution of the invention is further described below with reference to specific extended scenarios.
Further, on the basis of the migration device for cross-domain dialog policies described above, in a specific implementation the source-domain dialog state mapping unit 20 is configured to determine the source domain according to the target domain, where a preset association exists between the target domain and the source domain;
The source domain is a domain designated in advance for the target domain; for example, the source domain of target domain A is designated as domain B, and the source domain of target domain C is designated as domain D. This designation is the preset association between the target domain and the source domain. Once the target domain is determined, the source domain is determined from the target domain and its preset association.
The source-domain dialog state mapping unit 20 is further configured to obtain the source-domain intention with the greatest preset similarity to the target-domain intention;
Specifically, the similarity between every intention of the target domain and every intention of the source domain is specified manually in advance. When determining the source-domain intention, the source-domain intention with the greatest preset similarity to the target-domain intention is selected according to the preset similarities associated with that target-domain intention.
The source-domain dialog state mapping unit 20 is further configured to obtain the source-domain slot with the greatest preset similarity to the target-domain slot;
Specifically, the similarity between every slot of the target domain and every slot of the source domain is specified manually in advance. When determining the source-domain slot, the source-domain slot with the greatest preset similarity to the target-domain slot is selected according to the preset similarities associated with that target-domain slot.
The source-domain dialog state mapping unit 20 is further configured to generate the source-domain dialog state from the source-domain intention and the source-domain slots.
Once the source-domain intention and slots are obtained, the source-domain dialog state is generated. Specifically, it may be the combination of the source-domain intention and a set of source-domain slots with their slot values. The value of a source-domain slot may be default slot information set manually or obtained by a preset rule. For example, if the source domain is hotel booking and the source-domain slot is the check-in date, its value is set to the current date; if the source-domain slot is the check-out date, its value is set to the next day's date.
For example, the target domain is "air ticket booking" and the target-domain dialog state is {intention: book air ticket, departure city: Shanghai, arrival city: Beijing}. The source-domain intention with the greatest similarity to the target-domain intention in this state is obtained as "book hotel", and the source-domain slots with the greatest similarity to the target-domain slots "departure city" and "arrival city" are obtained as "check-in date" and "check-out date". The slot values of these source-domain slots are then obtained to generate the source-domain dialog state. Thus the target-domain dialog state {intention: book air ticket, departure city: Shanghai, arrival city: Beijing} is mapped to the source-domain dialog state {intention: book hotel, check-in date: October 1, 2018, check-out date: October 2, 2018}.
In this embodiment, the source-domain dialog state mapping unit 20 determines the source-domain intention corresponding to the target-domain intention and the source-domain slots corresponding to the target-domain slots based on manually specified similarities between any intention of the target domain and any intention of the source domain, and between any slot of the target domain and any slot of the source domain, so as to map the target-domain dialog state to the source-domain dialog state. Manually specified similarities are easy to implement and maintain.
Further, on the basis of the migration apparatus of the cross-domain dialog policy of the present invention described above, in a specific implementation, the target domain dialog state mapping unit 20 is specifically configured to determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
Specific implementations may refer to the above embodiments.
The target domain dialog state mapping unit 20 is further configured to obtain the source domain intention with the greatest preset similarity to the target domain intention;
Specific implementations may refer to the above embodiments.
The target domain dialog state mapping unit 20 is further configured to obtain the source domain slot corresponding to each target domain slot, wherein the slots of the target domain and the slots of the source domain are ranked by importance in advance, and the correspondence between target domain slots and source domain slots is established according to the ranking results;
Specifically, based on information entropy theory, the importance of a slot in a given domain is measured by the slot's attribute entropy. The attribute entropy of a slot is a normalized entropy; a preferred calculation formula is:
$$\eta(s) = -\frac{1}{|V_s|}\sum_{\upsilon \in V_s} p(s=\upsilon)\,\log p(s=\upsilon)$$
where $s$ denotes a slot, $\eta(s)$ is the attribute entropy of slot $s$, $\upsilon$ is an attribute (value) of that slot, $V_s$ is the set of attributes of slot $s$ ($|V_s|$ being the total number of attributes), and $p(s=\upsilon)$ is the empirical probability that an entity in the database has attribute $\upsilon$ for slot $s$.
For example, the table below shows, for each restaurant in a restaurant database, whether children are allowed and the price level.
[Table: "children allowed" and "price level" for the four restaurants in the restaurant database; table image not reproduced]
The attribute entropy of the attribute "children allowed" for the restaurants in the restaurant database is
$$\eta(s) = -\frac{p(s=\text{allowed})\log p(s=\text{allowed}) + p(s=\text{not allowed})\log p(s=\text{not allowed})}{2}$$
where $p(s=\text{allowed}) = 0/4$, $p(s=\text{not allowed}) = 4/4$, and $|V_s| = 2$. The attribute entropy of "children allowed" is therefore 0 (taking $0 \cdot \log 0 = 0$).
Similarly, the attribute entropy of the attribute "price level" for the restaurants in the restaurant database is
$$\eta(s) = -\frac{p(s=\text{high})\log p(s=\text{high}) + p(s=\text{medium})\log p(s=\text{medium}) + p(s=\text{low})\log p(s=\text{low})}{3}$$
where $p(s=\text{high}) = 1/4$, $p(s=\text{medium}) = 1/4$, $p(s=\text{low}) = 2/4$, and $|V_s| = 3$. The attribute entropy of "price level" is therefore about 0.15 (using base-10 logarithms).
The attribute entropy of a slot computed with this formula is non-negative. The lower the entropy, the lower the information gain of the attribute and the less important the attribute is; correspondingly, the corresponding slot is less important. In the example above, it is pointless for the system to ask the user about a preference for the "children allowed" slot, because no restaurant in the database allows children, so such a question yields no information gain. Accordingly, the "price level" attribute has a higher entropy than the "children allowed" attribute.
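The attribute entropy calculation can be checked numerically with a short sketch. It assumes base-10 logarithms (the base that reproduces the value 0.15 in the example) and the convention 0 · log 0 = 0. Because the restaurant table is not reproduced here, the per-restaurant values passed in are hypothetical; only their counts (four restaurants, none allowing children; one high-priced, one medium-priced, two low-priced) come from the example. The function name `attribute_entropy` is illustrative.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def attribute_entropy(values: Sequence[str], attribute_set: Iterable[str]) -> float:
    """Normalized attribute entropy: eta(s) = -(1/|V_s|) * sum_v p(s=v) * log10 p(s=v).

    values        -- the slot's value for each entity in the database (hypothetical data)
    attribute_set -- V_s, every possible value of the slot; unobserved values get p = 0
    """
    if not values:
        raise ValueError("need at least one database entity")
    vs = list(attribute_set)
    counts = Counter(values)
    n = len(values)
    total = 0.0
    for v in vs:
        p = counts[v] / n          # empirical probability p(s = v)
        if p > 0:                  # convention: 0 * log 0 = 0
            total += p * math.log10(p)
    return -total / len(vs) if total else 0.0
```

For instance, `attribute_entropy(["high", "medium", "low", "low"], ["high", "medium", "low"])` evaluates to about 0.1505, while four `"not_allowed"` values over `["allowed", "not_allowed"]` give 0.0.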
After the attribute entropies of the slots have been computed, the importance of the slots is compared by comparing their entropy values, and the slots are sorted accordingly to obtain an importance ranking of the slots in each domain. Given the slot importance rankings of any two domains, a correspondence is established between the slots of the two domains that occupy the same rank. In this embodiment, comparing and ranking the importance of the slots in the target domain and the source domain, and establishing the slot correspondences, are performed in advance.
For example, in the target domain "ticket booking", the slots that best help the user filter tickets, in descending order of importance, are: departure city, destination city, departure time, airline, price. In the source domain "hotel booking", the slots that best help the user filter hotels, in descending order of importance, are: check-in time, check-out time, hotel location, hotel star rating, price, room type, and so on. The slot correspondences are then established rank by rank: departure city ↔ check-in time, destination city ↔ check-out time, departure time ↔ hotel location, airline ↔ hotel star rating, and so on. Note that if all flights depart in the morning, the attribute entropy of the "departure time" slot is very low and the slot does not help the user filter flights.
Thus, based on the correspondence between the slots of the target domain and the source domain, the source domain slot corresponding to each target domain slot is found. For example, if the target domain slot is departure city, the corresponding source domain slot is check-in time, and so on for the other slots.
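Ranking slots by attribute entropy and pairing them rank by rank can be sketched as follows. This is a minimal illustration assuming each domain's slot entropies have already been computed; the helper names and the entropy values below are hypothetical, chosen only so that the rankings match the ticket-booking and hotel-booking orders given in the example.

```python
from typing import Dict, List


def rank_slots(entropy: Dict[str, float]) -> List[str]:
    """Sort slot names by attribute entropy, most important (highest entropy) first."""
    return sorted(entropy, key=lambda slot: entropy[slot], reverse=True)


def align_by_rank(target_entropy: Dict[str, float],
                  source_entropy: Dict[str, float]) -> Dict[str, str]:
    """Pair each target domain slot with the source domain slot at the same rank.

    zip() stops at the shorter ranking, so surplus slots stay unmatched.
    """
    return dict(zip(rank_slots(target_entropy), rank_slots(source_entropy)))


# Hypothetical entropies, chosen only to reproduce the orderings in the text.
TARGET_ENTROPY = {"departure_city": 0.9, "destination_city": 0.8,
                  "departure_time": 0.6, "airline": 0.4, "price": 0.3}
SOURCE_ENTROPY = {"check_in_time": 0.95, "check_out_time": 0.85,
                  "hotel_location": 0.7, "hotel_star": 0.5,
                  "price": 0.35, "room_type": 0.2}

if __name__ == "__main__":
    for tgt, src in align_by_rank(TARGET_ENTROPY, SOURCE_ENTROPY).items():
        print(f"{tgt} -> {src}")
```

Here the source domain has one more slot than the target domain, so `room_type` is left without a partner; ties in entropy keep their insertion order because Python's sort is stable.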
The target domain dialog state mapping unit 20 is further configured to generate a source domain dialog state according to the source domain intention and the source domain slot.
Specific implementations may refer to the above embodiments.
In this embodiment, the target domain dialog state mapping unit 20 establishes the correspondence between target domain slots and source domain slots from their importance rankings, and obtains the source domain slot corresponding to each target domain slot. It then maps the target domain dialog state to the source domain dialog state using the source domain intention corresponding to the target domain intention and the corresponding source domain slots. Because the slot correspondence is built from the slot importance rankings, it makes full use of the slots' importance data, which improves the accuracy, effectiveness and reliability of slot matching.
Further, on the basis of the migration apparatus of the cross-domain dialog policy of the present invention described above, in a specific implementation, the target domain dialog state mapping unit 20 is specifically configured to determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
Specific implementations may refer to the above embodiments.
The target domain dialog state mapping unit 20 is further configured to solve, for a predefined learning objective equation, the set of variables that maximizes it, and to determine from the solved variables the similarity between any pair of intentions, or any pair of slots, of the source domain and the target domain;
Understandably, the higher the similarity of a pair of intentions (or slots), the more accurately the target domain dialog state is mapped to the source domain dialog state. Suppose the similarity of any pair of intentions or slots across the target and source domains is treated as a probability variable; this variable can then be regarded as a solution of a predefined learning objective equation. The learning objective equation measures how much a given model algorithm improves performance in reinforcement learning in a given domain.
Based on this logic, one concrete measure of that improvement is as follows. At the nth dialog turn, the single-turn reward of turn n recorded in the actual data is added to the model's estimate of the total future reward of all turns after turn n. The smaller the error between this sum and the model's estimate of the total future reward at turn n, the better the model algorithm performs in reinforcement learning in that domain.
One preferred learning objective equation is:
[formula image: learning objective equation, not reproduced]
where $\Theta$ is the set of variable parameters, $h_n$ is the state of a multi-turn dialog at step n, and $y_n$ is the reply the dialog system gives the user at step n.
The first term of the equation [formula image] is the square of the standard loss (the Bellman error) of the conventional Q-learning algorithm. Minimizing this term drives the Bellman error toward 0; that is, it reduces the error between the future reward estimated for the dialog at turn n [formula image] and the future reward $Q_t(h_n, y_n)$ obtained from the dialog actually recorded at turn n + 1.
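For reference, the squared Bellman error of standard tabular Q-learning can be computed as in the sketch below. It is an assumption-laden illustration: the patent's own learning objective is in a formula image that is not reproduced here and may differ in detail (for example in the discount factor `gamma` or in how the final turn is handled). Dialog states $h_n$ and replies $y_n$ are plain strings, and the function name `squared_td_error` is hypothetical.

```python
from typing import Dict, List, Tuple

# (h_n, y_n, r_n, h_{n+1}, done): dialog state, system reply, turn reward,
# next dialog state, and whether the dialog ended after this turn.
Transition = Tuple[str, str, float, str, bool]


def squared_td_error(q: Dict[Tuple[str, str], float],
                     transitions: List[Transition],
                     replies: List[str],
                     gamma: float = 0.9) -> float:
    """Sum over turns of (r_n + gamma * max_y Q(h_{n+1}, y) - Q(h_n, y_n))^2.

    Unseen (state, reply) pairs default to Q = 0; terminal turns add no future term.
    """
    total = 0.0
    for h, y, r, h_next, done in transitions:
        future = 0.0 if done else max(q.get((h_next, a), 0.0) for a in replies)
        target = r + gamma * future          # one-step Bellman target
        total += (target - q.get((h, y), 0.0)) ** 2
    return total
```

A value of 0 means every estimate $Q(h_n, y_n)$ already equals its one-step target, which is what "driving the Bellman error toward 0" refers to.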
$R(\Theta)$ is a regularization term that limits the complexity of the model and aligns the slots and intentions of the target domain with those of the source domain. It rests on the following default assumption of reinforcement learning in a domain: if two agents in two domains take similar actions in two similar states, the next states they transition to will be similar, and the rewards they receive during these state transitions will also be similar.
Specifically, $R(\Theta) = R_1(\Theta) + R_2(\Theta) + R_3(\Theta) + R_4(\Theta)$.
(1) $R_1(\Theta) = R_{1s}(\Theta) + R_{1t}(\Theta)$
Here $R_{1s}(\Theta)$ and $R_{1t}(\Theta)$ are the slot vector retention regularization terms of the source domain and the target domain, respectively, and $R_1(\Theta)$ is the cross-domain slot vector retention regularization term.
[formula image: $R_{1s}(\Theta)$]
In this formula, $L_{ce}(\cdot)$ denotes the cross-entropy loss and $\circ$ denotes function composition; $a_s$ denotes any source domain intention, and $D_s$ denotes the (large) set of source domain dialogs.
$c_t(\cdot)$ is a predictor function that, given an intention vector $a_t$ in the target domain, predicts a slot vector $s_t$ in the target domain.
[formula image] can be regarded as the language action vector that approximates the target domain, i.e. the answer for $a_s$;
[formula image] is the slot vector predicted in the target domain.
$R_{1t}(\Theta)$ is defined analogously to $R_{1s}(\Theta)$ and is not described again here.
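The cross-entropy loss $L_{ce}$ used in the slot vector retention terms can be illustrated with a minimal sketch, assuming the reference and predicted slot vectors are discrete probability distributions of equal length; for a one-hot reference it reduces to the negative log-probability of the true slot. The function name and the clipping constant `eps` are illustrative assumptions.

```python
import math
from typing import Sequence


def cross_entropy(reference: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """L_ce = -sum_i reference_i * ln(predicted_i).

    `predicted` is clipped below at eps so a zero probability does not raise.
    """
    if len(reference) != len(predicted):
        raise ValueError("vectors must have the same length")
    return -sum(r * math.log(max(p, eps)) for r, p in zip(reference, predicted))
```

A perfect prediction gives a loss of 0, and the loss grows as the predictor puts less probability on the slots present in the reference vector.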
[formula image]
where $D_t$ denotes the (smaller) set of target domain dialogs,
[formula image] denotes the intention translation function from the source domain to the target domain, and
[formula image] denotes the translation function from the target domain to the source domain.
[formula image]
where [formula image] is the occurrence probability of each target domain intention $a$, [formula image] is the occurrence probability of each source domain intention, and $L_{kl}(\cdot)$ is the Kullback-Leibler divergence loss.
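The Kullback-Leibler divergence loss $L_{kl}$ compares two occurrence distributions of intentions. A minimal sketch, assuming both distributions are given over a common set of keys (for instance after source domain intentions have been translated into the target domain) and using natural logarithms; the function name and the `eps` floor are hypothetical:

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """D_KL(p || q) = sum_a p(a) * ln(p(a) / q(a)), summed over the support of p.

    q(a) is floored at eps so an intention missing from q gives a large, finite loss.
    """
    return sum(pa * math.log(pa / max(q.get(a, 0.0), eps))
               for a, pa in p.items() if pa > 0)
```

The divergence is 0 when the two intention distributions coincide and is not symmetric, so the order of the arguments matters.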
[formula image]
where [formula image] is the probability of mapping the target slot [formula image] to the reference slot [formula image], and $|S_s|$ is the number of slots in the source domain.
Further, once the learning objective equation has been established, a set of variables that maximizes it is found using a preset optimization algorithm.
The optimization algorithm can be chosen according to actual needs, for example the Adam method (Diederik Kingma and Jimmy Ba, "Adam: A method for stochastic optimization", arXiv preprint arXiv:1412.6980, 2014) or a gradient descent algorithm.
The set of variables that maximizes the learning objective equation corresponds to the similarity between any two intents in the source and target domains or the similarity between any two slots.
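As an illustration of the Adam method, the sketch below applies the standard Adam update (Kingma and Ba, 2014) to a one-dimensional toy problem; maximizing an objective J is done by minimizing −J. The toy objective, the default hyperparameters, the step count and the function name `adam_minimize` are assumptions for illustration only; in the patent the variables being optimized are the intention and slot similarities.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], x0: float, lr: float = 0.1,
                  beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                  steps: int = 500) -> float:
    """Minimize a 1-D differentiable function with the Adam update rule,
    given a function `grad` returning its derivative."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections for zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    # Maximize J(x) = -(x - 3)^2 by minimizing -J, whose derivative is 2(x - 3).
    print(adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0))
```

Dividing by the running root-mean-square of the gradient makes the step size roughly scale-free, which is one reason Adam is a common default over plain gradient descent.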
The target domain dialog state mapping unit 20 is further configured to obtain, according to the similarity determination result, the source domain intention with the greatest similarity to the target domain intention;
when determining the source domain intention, the source domain intention with the greatest similarity to the target domain intention is selected according to the similarity between each source domain intention and the target domain intention.
The target domain dialog state mapping unit 20 is further configured to obtain, according to the similarity determination result, the source domain slot with the greatest similarity to each target domain slot;
when determining the source domain slot, the source domain slot with the greatest similarity to the target domain slot is selected according to the similarity between each source domain slot and the target domain slot.
The target domain dialog state mapping unit 20 is further configured to generate a source domain dialog state according to the source domain intention and the source domain slot.
Specific implementations may refer to the above embodiments.
In this embodiment, a learning objective equation is first established, and a set of variables that maximizes it is then found with a preset optimization algorithm, which determines the similarity between any pair of intentions, or any pair of slots, of the source domain and the target domain. The source domain intention most similar to the target domain intention and the source domain slot most similar to each target domain slot are then determined, and the target domain dialog state is mapped to a source domain dialog state from them, from which the subsequent source domain reply action is generated.
In this embodiment, the target domain dialog state mapping unit 20 uses the reinforcement learning objective and the optimization algorithm to find the set of variables that maximizes the learning objective equation, and thereby determines the similarity between any pair of intentions or any pair of slots of the source domain and the target domain. The embodiment combines the strengths of reinforcement learning and cross-domain transfer: by measuring how much a model algorithm improves reinforcement learning performance in a domain, it determines the similarity of each pair of intentions or slots, which improves the accuracy and effectiveness of intention and slot matching and gives the method strong cross-domain generalization and reliability.
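Once the optimization has produced learned similarity scores, selecting the most similar source domain intention or slot is an argmax per target domain item. The sketch below assumes, hypothetically, that the learned scores are available as one row of real-valued scores per target domain item and turns each row into a probability distribution with a softmax, which is one way to treat similarities as probability variables. The argmax of a softmax equals the argmax of the raw scores, so the softmax matters only when the probabilities themselves are used. All names and values are illustrative.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def best_source_items(scores: Dict[str, List[float]],
                      source_items: List[str]) -> Dict[str, str]:
    """Map each target domain item to its most probable source domain item.

    scores[target][j] is a (hypothetical) learned score between `target`
    and source_items[j].
    """
    best = {}
    for target, row in scores.items():
        probs = softmax(row)
        best[target] = source_items[max(range(len(probs)), key=lambda j: probs[j])]
    return best
```

For example, with source slots `["check_in_time", "check_out_time", "hotel_location"]` and learned rows `[2.0, 0.5, -1.0]` for `departure_time` and `[0.1, 1.7, 0.3]` for `arrival_time`, the sketch selects `check_in_time` and `check_out_time` respectively.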
In addition, the present invention also provides a migration device for a cross-domain dialog policy, comprising a memory, a processor, and a migration program of the cross-domain dialog policy that is stored in the memory and executable on the processor; when executed by the processor, the migration program implements the steps of the migration method of the cross-domain dialog policy according to any of the embodiments above.
As shown in fig. 8, fig. 8 is a schematic structural diagram of a migration device for a cross-domain dialog policy according to an embodiment of the present invention.
The migration device of the cross-domain conversation strategy in the embodiment of the invention can be a PC or a server.
As shown in fig. 8, the apparatus may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 provides connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory), and may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal structure shown in fig. 8 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 8, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a migration program of a cross-domain conversation policy.
In the device shown in fig. 8, the network interface 1004 is mainly used for connecting a backend server and communicating data with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a migration program of the cross-domain dialog policy stored in the memory 1005, and perform the operations in the migration method embodiment of the cross-domain dialog policy described above.
Embodiments of the migration method of the cross-domain dialog policy are proposed on the basis of this hardware structure.
In addition, the invention also provides a readable storage medium.
The storage medium stores a migration program of the cross-domain dialogue strategy, and the migration program of the cross-domain dialogue strategy realizes the steps of the migration method of the cross-domain dialogue strategy as described in any one of the above items when being executed by a processor.
The specific embodiments of the migration device and the storage medium of the cross-domain dialogue policy of the present invention are substantially the same as the embodiments of the migration method of the cross-domain dialogue policy, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element identified by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
While the present invention has been described with reference to the particular illustrative embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but is intended to cover various modifications, equivalent arrangements, and equivalents thereof, which may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. A migration method of cross-domain dialogue strategy, characterized in that the method comprises the following steps:
processing an input user dialog to map it to a corresponding target domain dialog state, wherein the target domain dialog state is a combination of an intention and a set of slots with their slot values;
mapping the target domain dialog state to a source domain dialog state;
processing the source domain dialog state based on a preset dialog policy of the source domain to obtain a corresponding source domain dialog reply, wherein the source domain is a domain specified in advance for the target domain, and the target domain is the domain with the highest degree of association with the input user dialog;
and mapping the source domain dialog reply to a target domain dialog reply.
2. The migration method of a cross-domain dialog policy according to claim 1, wherein the step of processing the input user dialog to map it to a corresponding target domain dialog state specifically comprises:
performing natural language understanding on the input user dialog to identify a target domain intention and extract target domain slots;
tracking the target domain intention;
and mapping the user dialog according to the target domain intention, the target domain slots and the tracking result of the target domain intention, to obtain the corresponding target domain dialog state.
3. The migration method of a cross-domain dialog policy according to claim 2, wherein the step of mapping the target domain dialog state to a source domain dialog state specifically comprises:
determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
acquiring a source domain intention with the maximum preset similarity to the target domain intention;
acquiring a source field slot position with the maximum preset similarity with the target field slot position;
and generating a source field conversation state according to the source field intention and the source field slot position.
4. The migration method of a cross-domain dialog policy according to claim 2, wherein the step of mapping the target domain dialog state to the source domain dialog state specifically comprises:
determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
acquiring the source domain intention with the greatest preset similarity to the target domain intention;
acquiring the source domain slot that has an established correspondence with the target domain slot, wherein the slots of the target domain and the slots of the source domain are ranked by importance in advance, and the correspondence between target domain slots and source domain slots is established according to the ranking results;
and generating a source field conversation state according to the source field intention and the source field slot position.
5. The migration method of a cross-domain dialog policy according to claim 2, wherein the step of mapping the target domain dialog state to the source domain dialog state in the source domain specifically comprises:
determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
solving, for a predefined learning objective equation, the set of variables that maximizes it, and determining from the solved variables the similarity between any pair of intentions, or any pair of slots, of the source domain and the target domain;
obtaining, according to the similarity learning result, the source domain intention with the greatest similarity to the target domain intention;
obtaining, according to the similarity determination result, the source domain slot with the greatest similarity to the target domain slot;
and generating the source domain dialog state according to the source domain intention and the source domain slot.
6. An apparatus for migrating a cross-domain conversation policy, the apparatus comprising:
a target domain dialog state mapping unit, configured to process an input user dialog to map it to a corresponding target domain dialog state, wherein the target domain dialog state is a combination of an intention and a set of slots with their slot values;
a source domain dialog state mapping unit, configured to map the target domain dialog state into a source domain dialog state;
a source domain dialog state processing unit, configured to process the source domain dialog state based on a preset dialog policy of the source domain to obtain a corresponding source domain dialog reply, wherein the source domain is a domain specified in advance for the target domain, and the target domain is the domain with the highest degree of association with the input user dialog;
and the target field dialogue reply mapping unit is used for mapping the source field dialogue reply into the target field dialogue reply.
7. The migration apparatus of a cross-domain dialog policy according to claim 6, wherein the target domain dialog state mapping unit is specifically configured to perform natural language understanding on the input user dialog to identify a target domain intention and extract target domain slots; track the target domain intention; and map the user dialog according to the target domain intention, the target domain slots and the tracking result of the target domain intention, to obtain the corresponding target domain dialog state.
8. The migration apparatus of a cross-domain dialog policy according to claim 7, wherein the source domain dialog state mapping unit is specifically configured to determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain; acquire the source domain intention with the greatest preset similarity to the target domain intention; acquire the source domain slot with the greatest preset similarity to the target domain slot; and generate the source domain dialog state according to the source domain intention and the source domain slot.
9. The migration apparatus of a cross-domain dialog policy according to claim 7, wherein the source domain dialog state mapping unit is specifically configured to:
determining the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain;
acquiring the source domain intention with the greatest preset similarity to the target domain intention;
acquiring the source domain slot that has an established correspondence with the target domain slot, wherein the slots of the target domain and the slots of the source domain are each ranked by importance, and the correspondence between target domain slots and source domain slots is established according to the ranking results;
and generating the source domain dialog state according to the source domain intention and the source domain slot.
10. The migration apparatus of a cross-domain dialog policy according to claim 7, wherein the source domain dialog state mapping unit is specifically configured to: determine the source domain according to the target domain, wherein a preset association exists between the target domain and the source domain; solve, for a predefined learning objective equation, the set of variables that maximizes it, and determine from the solved variables the similarity between any pair of intentions, or any pair of slots, of the source domain and the target domain; obtain, according to the similarity learning result, the source domain intention with the greatest similarity to the target domain intention; obtain, according to the similarity determination result, the source domain slot with the greatest similarity to the target domain slot; and generate the source domain dialog state according to the source domain intention and the source domain slot.
11. A migration device of a cross-domain dialogue policy is characterized by comprising: a memory, a processor and a migration program of cross-domain dialogue policies stored on the memory and executable on the processor, the migration program of cross-domain dialogue policies implementing the steps of the migration method of cross-domain dialogue policies according to any one of claims 1 to 5 when executed by the processor.
12. A readable storage medium, characterized in that the readable storage medium has stored thereon a migration program of a cross-domain dialogue policy, which when executed by a processor implements the steps of the migration method of the cross-domain dialogue policy according to any one of claims 1 to 5.
CN201811641823.7A 2018-12-29 2018-12-29 Method, device and equipment for migrating cross-domain conversation strategy and readable storage medium Active CN109739965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811641823.7A CN109739965B (en) 2018-12-29 2018-12-29 Method, device and equipment for migrating cross-domain conversation strategy and readable storage medium

Publications (2)

Publication Number Publication Date
CN109739965A CN109739965A (en) 2019-05-10
CN109739965B true CN109739965B (en) 2022-07-15

Family

ID=66362508

Country Status (1)

Country Link
CN (1) CN109739965B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609618B (en) * 2019-08-26 2023-06-20 杭州城市大数据运营有限公司 Man-machine conversation method and device, computer equipment and storage medium
CN110941693A (en) * 2019-10-09 2020-03-31 深圳软通动力信息技术有限公司 Task-based man-machine conversation method, system, electronic equipment and storage medium
CN110727773B (en) * 2019-10-11 2022-02-01 沈阳民航东北凯亚有限公司 Information providing method and device
CN111814958B (en) * 2020-06-30 2023-06-20 中国电子科技集团公司电子科学研究院 Method and device for mapping public culture service individuals to public culture service scenes
CN115440200B (en) * 2021-06-02 2024-03-12 上海擎感智能科技有限公司 Control method and control system of vehicle-mounted system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN108268616A (en) * 2018-01-04 2018-07-10 中国科学院自动化研究所 The controllability dialogue management extended method of fusion rule information
CN108415939A (en) * 2018-01-25 2018-08-17 北京百度网讯科技有限公司 Dialog process method, apparatus, equipment and computer readable storage medium based on artificial intelligence
CN109033223A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 For method, apparatus, equipment and computer readable storage medium across type session
CN109101545A (en) * 2018-06-29 2018-12-28 北京百度网讯科技有限公司 Natural language processing method, apparatus, equipment and medium based on human-computer interaction

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9070366B1 (en) * 2012-12-19 2015-06-30 Amazon Technologies, Inc. Architecture for multi-domain utterance processing

Non-Patent Citations (1)

Title
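Song Bochuan (宋博川). Research and Implementation of Cross-Domain Dialogue Understanding Technology (跨领域对话理解技术研究与实现). CNKI China Master's Theses Full-text Database, 2018, I138-622. *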

Also Published As

Publication number Publication date
CN109739965A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN109739965B (en) Method, device and equipment for migrating cross-domain conversation strategy and readable storage medium
CN109918673B (en) Semantic arbitration method and device, electronic equipment and computer-readable storage medium
CN109145219B (en) Method and device for judging validity of interest points based on Internet text mining
CN107291783B (en) Semantic matching method and intelligent equipment
CN110019616B (en) POI (Point of interest) situation acquisition method and equipment, storage medium and server thereof
CN109543034B (en) Text clustering method and device based on knowledge graph and readable storage medium
CN111144723A (en) Method and system for recommending people's job matching and storage medium
CN110168535A (en) A kind of information processing method and terminal, computer storage medium
US10825071B2 (en) Adaptive multi-perceptual similarity detection and resolution
CN109783812B (en) Chinese named entity recognition method, system and device based on self-attention mechanism
CN112256845A (en) Intention recognition method, device, electronic equipment and computer readable storage medium
CN113377936A (en) Intelligent question and answer method, device and equipment
KR102456148B1 (en) Skill word evaluation method and device, electronic device, and computer readable medium
CN111274822A (en) Semantic matching method, device, equipment and storage medium
CN115017425B (en) Location search method, location search device, electronic device, and storage medium
CN112507103A (en) Task type dialogue and model training method, device, equipment and storage medium
Alsudais Quantifying the offline interactions between hosts and guests of Airbnb
CN111611355A (en) Dialog reply method, device, server and storage medium
CN111882224A (en) Method and device for classifying consumption scenes
CN114461749B (en) Data processing method and device for conversation content, electronic equipment and medium
CN115470790A (en) Method and device for identifying named entities in file
CN115660695A (en) Customer service personnel label portrait construction method and device, electronic equipment and storage medium
US20230316301A1 (en) System and method for proactive customer support
CN113468890B (en) Sedimentology literature mining method based on NLP information extraction and part-of-speech rules
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant