CN113742467B - Method and device for generating dialogue state of hierarchical selection slot phase context - Google Patents


Info

Publication number
CN113742467B
CN113742467B
Authority
CN
China
Prior art keywords
slot
dialog
word
vector
dialogue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111024342.3A
Other languages
Chinese (zh)
Other versions
CN113742467A (en)
Inventor
黄浩
谢红岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang University
Original Assignee
Xinjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang University filed Critical Xinjiang University
Priority to CN202111024342.3A priority Critical patent/CN113742467B/en
Publication of CN113742467A publication Critical patent/CN113742467A/en
Application granted granted Critical
Publication of CN113742467B publication Critical patent/CN113742467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue state generation method and device with hierarchical selection of slot-related context, the method comprising: encoding dialogue-turn vectors with a gated recurrent unit to obtain a turn-level hidden state sequence; outputting, through multi-step selection, the probability that each dialogue turn contains a slot value; using the slot vector as the initial input of a gated recurrent unit and, at each time step of generating the slot value, feeding the word vector of the word label generated in the previous step into a state generator to obtain the current hidden state; and using a weighted copy mechanism to multiply the probability that a dialogue turn contains the slot value by the probability distribution over copying a word from that turn, obtaining the probability distribution over copying a word from the dialogue history and thereby predicting the word label at the current moment. The device comprises a processor and a memory. The invention reduces the interference of noise signals in the dialogue text, making the generated slot values more accurate.

Description

Method and device for dialogue state generation with hierarchical selection of slot-related context
Technical Field
The present invention relates to the field of dialogue state tracking, and more particularly to a method and apparatus for dialogue state generation with hierarchical selection of slot-related context, i.e., accurately estimating a compact representation of the current dialogue state from a series of noisy observations produced by the speech recognition and natural language understanding modules.
Background
Dialogue is one of the long-standing challenges in computer science and artificial intelligence. Since human dialogue is inherently complex and ambiguous, learning an open-domain dialogue AI (a computer program that interacts with humans through natural-language dialogue, like a real person) remains very difficult; industrial applications therefore focus not on creating dialogue systems with human-level intelligence, but on task-oriented dialogue systems that help users accomplish specific tasks such as booking flights or querying bus information. With increasingly diverse user demands and increasingly complex user goals, building a dialogue system capable of handling tasks across different application domains is becoming ever more important. A relatively intelligent system allows the user to modify or refine their needs during a conversation. The system therefore needs to monitor the progress of the dialogue at all times and adopt a suitable dialogue strategy to ensure the dialogue proceeds toward the preset service target. Dialogue state tracking, as a core module of the whole dialogue system, plays a vital role in updating the system's internal state and generating dialogue strategies.
In multi-domain dialogue state tracking, the model is expected to predict a (domain, slot, value) triple for each slot in each domain, rather than only (slot, value) pairs. This task is challenging because the dialogue text grows longer as the dialogue progresses and some slots in different domains are correlated.
Current solutions suffer from two problems. On the one hand, existing work focuses mainly on matching slots against the historical utterances of the conversation at multiple granularity levels, which ignores the side effects of overusing context information: a slot is typically related to only a few dialogue turns, and although historical utterances provide rich information for extracting more features, they also bring noise signals and unnecessary information. On the other hand, different slots may be related to each other and are usually mentioned by the user or the system in different dialogue turns, so it is difficult to predict slot values correctly in cases requiring multi-turn reasoning. For example, a user books a hotel at the beginning of a dialogue and, near its end, requests a taxi to the previously booked hotel, so the taxi's destination is the hotel's address. Existing methods usually concatenate the utterance texts into one sequence as model input, but this neglects the interaction between different utterances and cannot memorize remote information, resulting in poor reasoning ability.
Disclosure of Invention
The invention provides a dialogue state generation method and device that hierarchically select slot-related context: relevant information about each slot is fused through a multi-step reasoning mechanism, and interference from noise signals in the dialogue text is reduced, so that the generated slot values are more accurate. The method is described in detail below:
In a first aspect, a method for generating a dialogue state with hierarchically selected slot-related context comprises:
encoding the dialogue-turn vectors with a gated recurrent unit to obtain a turn-level hidden state sequence, and using multi-step selection to judge the dialogue turn in which a slot value is located;
fusing the slot-aware context vector and the slot vector with a fusion gate to obtain a restated slot vector, and outputting, through the multi-step selection, the probability that each dialogue turn contains the slot value;
using the slot vector as the initial input of a gated recurrent unit and, at each time step of generating the slot value, feeding the word vector of the word label generated in the previous step into the state generator to obtain the current hidden state; and
using a weighted copy mechanism to multiply the probability that a dialogue turn contains the slot value by the probability distribution over copying a word from that turn, obtaining the probability distribution over copying a word from the dialogue history, and thereby predicting the word label at the current moment.
In a second aspect, a dialogue state generation device with hierarchically selected slot-related context comprises a processor and a memory, the memory having program instructions stored therein; the processor invokes the program instructions stored in the memory to cause the device to perform the method steps of any of the first aspects.
In a third aspect, a computer readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method steps of any of the first aspects.
The invention selects slot-related content at both the word level and the dialogue-turn level, effectively reducing the interference of irrelevant information with the model; one dialogue turn comprises a user utterance and a system reply. To avoid missing important information, a multi-step selection module is further proposed that collects slot-related turn-level information at each step and finally infers the probability that each dialogue turn contains the slot value. During decoding, a weighted copy mechanism is used that is more inclined to copy words from the dialogue turns most likely to contain the slot value, giving the network model a stronger copying ability. Compared with existing state generation schemes, the invention has the following advantages:
1. the invention selects slot-related content at the word level and the dialogue-turn level; this hierarchical structure differs from previous schemes that encode the entire dialogue context, so the invention can selectively memorize more valuable information while reducing the interference of irrelevant information with the model;
2. compared with traditional sequence-to-sequence state generation, a multi-step selection module is used in the state generation network; the module collects slot-related information at each step and combines it with previously collected information for the next selection, a structure that avoids missing important information, and finally infers the probability that each dialogue turn contains the slot value;
3. a weighted copy mechanism is adopted during state generation: when copying words from the dialogue history, both the word-level weighted scores and the previously predicted probability that each dialogue turn contains the slot value are used, so the model is more inclined to copy words from the turns most likely to contain the slot value, and the copying ability of the network model is stronger.
Drawings
FIG. 1 is a schematic diagram of the overall framework of the dialogue state generation network with hierarchical selection of slot-related context;
FIG. 2 is a schematic diagram of the multi-step selection module;
FIG. 3 is a schematic diagram of the decoding network;
FIG. 4 is a schematic diagram of the dialogue state generation device with hierarchical selection of slot-related context.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
In a task-oriented dialogue, only knowledge of the current dialogue state can provide an accurate basis for the subsequent action selection task, so maintaining this state inside the system is necessary. During human-machine interaction, the dialogue state is usually represented as a probability distribution, since the user's real needs are ambiguous to the dialogue system. The dialogue state tracking module must update this probability distribution in real time from the dialogue history and supply it to the dialogue decision module; dialogue state tracking is therefore a dynamic process that persists throughout the dialogue. A well-performing dialogue state tracking module can improve the success rate of the dialogue, reducing the number of interaction rounds needed to serve the user, improving user satisfaction, and bringing greater benefits.
To solve the above problems, the embodiment of the invention adopts a dialogue state generation method based on hierarchical selection of slot-related context. The method improves on the conventional sequence-to-sequence formulation of slot value generation, which encodes the entire context with an encoder and then generates slot values with a decoder, by selectively using only the context related to each slot, thereby achieving more robust dialogue state generation. Selectively using slot-related context reduces the influence of noise in the dialogue text on the model while avoiding the loss of important information.
The embodiment of the invention overcomes the defect of traditional sequence-to-sequence state generation methods that, in the encoding stage, the contributions of different parts of the dialogue history to predicting a slot value cannot be measured, making valuable and redundant information hard to distinguish. The traditional approach of encoding the entire dialogue context is improved to one that selects slot-related context at the word level and the dialogue-turn level; a weighted copy mechanism then generates slot values in the decoding stage, finally forming a more robust dialogue state generation model.
The embodiment of the invention generates slot values by hierarchically selecting slot-related context: specifically, the dialogue history is treated as a hierarchical text structure, and slot-related context is selected at two levels, words and dialogue turns. First, a gated recurrent unit encodes the slot and each dialogue turn separately, mapping them into hidden state sequences of the same width. Second, since slot-related words usually appear inside utterances, the hidden state sequences of the slot and of each dialogue turn are fed into a cross-attention mechanism to obtain a word-level matching matrix between the slot and each turn; matrix multiplication between these matching matrices and the hidden state sequences then yields slot-related dialogue-turn vectors and turn-related slot vectors.
After the dialogue-turn vectors and slot vectors are obtained, and considering that predicting the values of certain slots requires combining information from several turns, a gated recurrent unit encodes the turn vectors into a turn-level hidden state sequence, and a multi-step selection module judges the turn in which the slot value is located. At each step, an attention mechanism computes the matching degree between each turn and the slot from the turn-level hidden states and the slot vector; the turn-level hidden states are then weighted and summed by the matching scores to obtain a turn-level context vector. Before the next selection, a unidirectional gated recurrent unit (GRU) stores the turn-level context vectors, yielding a slot-aware context vector, and a fusion gate fuses the slot-aware context vector with the slot vector to obtain a restated slot vector. At its last step, the multi-step selection module outputs the probability that each dialogue turn contains the slot value.
In the decoding stage, a gated recurrent unit serves as the state generator, with the slot vector as its initial input; at each time step of generating the slot value, the word vector of the word label generated in the previous step is fed into the state generator to obtain the current hidden state. The word decoder is a feed-forward neural network; to predict the distribution over generated words at the current moment, the current hidden state is fed into the word decoder. Meanwhile, the weighted copy mechanism multiplies the probability that each dialogue turn contains the slot value by the distribution over copying a word from that turn, giving the distribution over copying a word from the dialogue history, so the state generator tends to copy words from the turn most likely to contain the value, further reducing the negative influence of irrelevant information on the model. The word-probability fusion gate is a feed-forward network with a sigmoid activation; to predict the fusion weight between the generator's word distribution and the weighted-copy distribution, the word vector of the current input word and the current hidden state of the slot value generator are fed jointly into the fusion gate. The word label at the current moment is then predicted from the final word probability distribution.
The embodiment of the invention adopts a dialogue state generation method with hierarchical selection of slot-related context, described in detail below:
1. when training the dialogue state generation model, the method is implemented according to the following steps:
step 1: selecting a certain amount of dialogue text data as training data (also called a sample) of a dialogue state generation model of a hierarchical selection slot phase context;
step 2: preprocessing training data, converting words into corresponding word labels, and then loading pre-training word vectors;
step 3: constructing a dialogue state generation model, and determining various parameters required by the dialogue state generation model;
step 4: sending the selected training dialogue text data into a constructed dialogue state generation model to train, and obtaining parameters of a system model, wherein the training speed depends on the configuration of a machine and the scale of training data;
step 5: the network model parameters are continuously regulated, the result of the training model is continuously observed, the optimal network model parameters are selected, and the trained network model parameters are stored.
2. When the system is used for generating the dialogue state, the method is implemented according to the following steps:
step 1: obtaining test dialogue text data;
step 2: preprocessing data, converting words into corresponding word labels, and then loading pre-training word vectors;
step 3: feeding the selected test dialogue text data into the trained state generation model to generate word sequences, finally obtaining the value of each slot.
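The preprocessing in step 2, mapping words to integer labels before looking up (pre-trained) word vectors, can be sketched as follows. This is a minimal illustration; the vocabulary contents, special tokens, and function names are assumptions, not part of the patent.

```python
# Minimal sketch of the preprocessing step: build a vocabulary and
# convert dialogue words to integer labels. The <pad>/<unk> convention
# is an illustrative assumption.
def build_vocab(turns):
    vocab = {"<pad>": 0, "<unk>": 1}
    for turn in turns:
        for word in turn.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def words_to_labels(turn, vocab):
    # Unknown words map to the <unk> label.
    return [vocab.get(w, vocab["<unk>"]) for w in turn.split()]

turns = ["book a hotel in the north", "the hotel address please"]
vocab = build_vocab(turns)
labels = words_to_labels("a cheap hotel", vocab)  # "cheap" is unseen
```

The label sequence would then index into a pre-trained embedding matrix to produce the word vectors fed to the encoder.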
Example 2
The scheme of example 1 is further described in conjunction with specific examples, as follows:
the invention aims at generating the user dialogue state in the dialogue system facing the task, and improves the capability of generating the user state from the dialogue context.
The invention is used to acquire the user's dialogue state within a dialogue, so that the acquired state meets the requirements of downstream tasks such as the dialogue policy and allows those tasks to be realized more accurately. The method makes dialogue state tracking more accurate than traditional methods and performs well in both research and everyday applications.
The method overcomes the defects of traditional sequence-to-sequence generation: it adopts an algorithm that hierarchically selects slot-related dialogue content, collects slot-related information through the successive steps of the multi-step selection module, outputs at the last step the probability that each dialogue turn contains the slot value for use by the weighted copy mechanism of the decoding stage, and finally applies that weighted copy mechanism during decoding, thereby obtaining a stronger copying ability.
(1) Word-level encoder
The user utterance U_i and the system response R_i are concatenated to obtain the dialogue-turn word sequence D_i = R_i ⊕ U_i, where i = 1, 2, ..., T; |D_i| is the number of words in D_i, and w_{i,j} is the j-th word of D_i. During encoding, each word w_{i,j} is first mapped to its embedding vector. Each dialogue turn is then encoded with a BiGRU (bidirectional gated recurrent unit), mapping D_i and the slot s into H_i ∈ R^{|D_i| × d_h} and H^s ∈ R^{l_s × d_h}, i.e. the word-level representation sequences of D_i and of the slot s; l_s and d_h are the slot length and the size of the encoder output, d_h being the dimension of the hidden state vectors. The word-level representations of all dialogue turns are concatenated in order to form the word-level representation of the dialogue history, G_D = [H_1; H_2; ...; H_T], which is stored in memory and passed to the decoding network so that words can be copied from the dialogue history.
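A single gated-recurrent-unit step, the building block of the encoder above, can be sketched numerically. The weight shapes, initialization, and parameter names below are illustrative assumptions, not the patent's parameters; a BiGRU would run this recurrence in both directions over the turn and concatenate the two hidden sequences.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde.
    params holds input weights W_* and recurrent weights U_*."""
    z = sigmoid(params["Wz"] @ x + params["Uz"] @ h_prev)
    r = sigmoid(params["Wr"] @ x + params["Ur"] @ h_prev)
    h_tilde = np.tanh(params["Wh"] @ x + params["Uh"] @ (r * h_prev))
    return (1 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 4, 3  # toy embedding and hidden sizes
params = {k: rng.standard_normal((d_h, d_in if k[0] == "W" else d_h)) * 0.1
          for k in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):  # 5 word embeddings of one turn
    h = gru_step(x, h, params)            # h is the running hidden state
```

Collecting `h` after every word (instead of keeping only the last one) gives the word-level representation sequence H_i used by the later matching steps.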
(2) Word-level context selection
At the word level, cross attention is used to compute attention weights that give the matching features between each dialogue turn D_i and the slot s:

A_i = tanh( H_i W_1 (H^s)^T ),

where W_1 ∈ R^{d_h × d_h} is a learnable parameter, A_i is the word alignment matrix, and tanh is the activation function. The most significant matching features are then extracted from A_i by taking the maximum over each dimension dim of the word alignment matrix:

a_i^D = softmax( max_{dim=2}(A_i) ),  a_i^s = softmax( max_{dim=1}(A_i) ),

where the attention vectors a_i^D and a_i^s reflect which words in dialogue turn D_i have the same or similar meaning as the slot s. Dot products between the attention vectors and the word-level representations generate sentence-level representations of dialogue turn D_i and of the slot s:

h_i^D = (a_i^D)^T H_i,   h^s = (a_i^s)^T H^s,

where h^s and H_D = [h_1^D; ...; h_T^D] are the sentence-level representations of the slot and of the dialogue, respectively.
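The word-level matching between one dialogue turn and the slot can be sketched as below: an alignment matrix, a max-pool plus softmax to score the turn's words, and an attention-weighted sum giving the sentence-level turn vector. Shapes, random weights, and function names are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def word_level_match(H_turn, H_slot, W):
    """Cross-attention sketch: alignment matrix between turn words and
    slot words, max-pooled and normalized into attention over the turn's
    words, then a sentence-level turn vector as the weighted sum."""
    A = np.tanh(H_turn @ W @ H_slot.T)   # (n_turn_words, n_slot_words)
    a_turn = softmax(A.max(axis=1))      # which turn words match the slot
    return a_turn @ H_turn, a_turn       # sentence-level vector, weights

rng = np.random.default_rng(1)
d_h = 8
H_turn = rng.standard_normal((6, d_h))   # 6 words in the dialogue turn
H_slot = rng.standard_normal((2, d_h))   # slot name of 2 words
W = rng.standard_normal((d_h, d_h)) * 0.1
turn_vec, a = word_level_match(H_turn, H_slot, W)
```

Pooling over the other axis of `A` would analogously give the attention over the slot's words and the turn-related slot vector.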
(3) Multi-step selection module
First, a bidirectional gated recurrent unit (BiGRU) encodes H_D to obtain context-aware sentence-level representations \tilde{H} ∈ R^{T × d_h}. Because predicting some slot values requires combining information across multiple dialogue turns, focusing only on the direct semantic relationship between the slot and each turn may ignore some important information. To gather more slot-related information and infer the turn in which the slot value appears, a multi-step selection module is used.
Specifically, at step τ (τ = 1, 2, ...) an attention mechanism serves as a selector that focuses closely on the content most relevant to the slot, generating a turn-level context vector at each step:

e_i^τ = W_2 · tanh( W_1 [ \tilde{h}_i ; s^{τ-1} ] ),  α_i^τ = softmax_i( e_i^τ ),  c^τ = Σ_i α_i^τ \tilde{h}_i,

where W_1 and W_2 are learnable parameters, e_i^τ is the matching score between the slot and dialogue turn i, α_i^τ is the normalized matching score — which at the last step also serves as the weight for copying words from turn i — tanh is the activation function, c^τ is the τ-th turn-level context vector, and s^τ is the restated slot vector of step τ, obtained by the following procedure.
To take the previously attended content into account before the next selection, a unidirectional gated recurrent unit (GRU) stores the turn-level context vectors to obtain the slot-aware context vector z^τ:

z^τ = GRU( c^τ, z^{τ-1} ).

A fusion gate then combines the current slot vector h^s and the slot-aware context vector z^τ to generate the restated slot vector s^τ for the next selection:

g^τ = Sigmoid( W_f · [ z^τ ; h^s ] ),  s^τ = g^τ ⊙ z^τ + (1 − g^τ) ⊙ h^s,

where W_f is a learnable parameter, Sigmoid is the activation function, and g^τ is a gate scalar with a value between 0 and 1.
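The multi-step selection loop — attend over the turn-level states, accumulate the context, and fuse with the slot vector — can be sketched numerically. The additive scoring is simplified to a dot product, the GRU update is replaced by a fixed blend, and all weights are random: a sketch of the control flow, not the patent's trained module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_step_select(H, h_slot, steps, rng):
    """Sketch: at each step, score every turn against the current slot
    vector, pool the turn states into a context c, accumulate c in a
    state z (stand-in for GRU(c, z)), and gate z against the slot."""
    d = h_slot.shape[0]
    Wf = rng.standard_normal((d, 2 * d)) * 0.1
    s, z = h_slot.copy(), np.zeros(d)
    for _ in range(steps):
        alpha = softmax(H @ s)          # prob. each turn holds the value
        c = alpha @ H                   # turn-level context vector
        z = 0.5 * z + 0.5 * c           # simplified stand-in for GRU(c, z)
        g = sigmoid(Wf @ np.concatenate([z, h_slot]))
        s = g * z + (1 - g) * h_slot    # restated slot vector
    return s, alpha                     # last alpha = copy weights

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 8))         # 4 dialogue turns, d_h = 8
h_slot = rng.standard_normal(8)
s_T, alpha = multi_step_select(H, h_slot, steps=3, rng=rng)
```

The `alpha` of the final step plays the role of the per-turn copy weights used by the decoder below.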
A unidirectional gated recurrent unit (GRU) is used as the decoder of the state generator to generate a value for each slot. The hidden state of the generator for each slot is initialized with the corresponding final slot-aware context vector z^T. The weighted copy mechanism proposed by the embodiment of the invention allows the state generator to focus on copying words from the dialogue turn most likely to contain the slot value.
Examples:
the dialog in the example has four dialog rounds, each of which includes a user query and a system reply, and then the model outputs dialog states (restaurant-name-fresh fish mouth old-size food street, restaurant-average consumption-50-100 yuan, restaurant-business hours-10:00-22:00, attraction-name-hometown, attraction-score)-4.7 minutes). When applying the weighted replication mechanism, assuming that the slot value of this slot is to be obtained (restaurant-name), it is first determined that its slot value may exist in that dialog box, and the dialog box that typically contains the slot value will have a higher weight score, so the weight of the first dialog box is the largest in the example. Then representing G according to vectors and word levels of slots (restaurant-names) D The probability that each word in the dialog box is the slot value of the slot (restaurant-name) is obtained by dot multiplication, and finally the final word probability distribution is obtained by multiplying the weight score of the dialog box obtained before and the probability of each word in the dialog box, and the word with the largest replication probability is used as the output of the model.
(4) Decoding network
The restated slot vector s^T is used as the first input of the decoder. At each step k of generating the value of slot s, the generator GRU produces a hidden state h_k^{dec} from the input word embedding w_{s,k}:

h_k^{dec} = GRU( w_{s,k}, h_{k-1}^{dec} ).

The probability distributions for generating a word from the vocabulary and for copying a word from dialogue turn i are then computed separately:

P_k^{vocab} = softmax( E · h_k^{dec} ),  P_{k,i}^{copy} = softmax( H_i · h_k^{dec} ),

where P_k^{vocab} and P_{k,i}^{copy} represent the distributions over generating a word from the vocabulary and over copying a word from turn i, H_i is the word-level representation sequence of turn i, and E is the trainable word embedding matrix.
The weighted copy mechanism is then used to compute the probability distribution for copying a word from the dialogue history:

P_k^{history} = Σ_i α_i · P_{k,i}^{copy},

where P_k^{history} is the distribution over copying a word from the dialogue history and α_i is the probability of copying a word from turn i.
The final output distribution P_k^{final} is a weighted sum of P_k^{vocab} and P_k^{history}:

P_k^{final} = β_{s,k} · P_k^{vocab} + (1 − β_{s,k}) · P_k^{history},

where β_{s,k} is a scalar representing the weight of generating a word from the vocabulary, computed by:

β_{s,k} = Sigmoid( W_g · [ h_k^{dec} ; c_k ; w_{s,k} ] ),

where W_g is a learnable parameter, c_k is the context vector, and w_{s,k} is the input word embedding.
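The final mixture of the vocabulary and copy distributions can be sketched as follows; the distributions, turn weights, and the gate value β are made-up numbers chosen only to show the arithmetic.

```python
import numpy as np

def final_distribution(p_vocab, p_copy_per_turn, turn_weights, beta):
    """Weighted-copy mixture: turn weights collapse the per-turn copy
    distributions into one history distribution, then the scalar gate
    beta mixes it with the vocabulary distribution."""
    p_history = np.sum(turn_weights[:, None] * p_copy_per_turn, axis=0)
    return beta * p_vocab + (1 - beta) * p_history

p_vocab = np.array([0.2, 0.5, 0.3])   # toy 3-word vocabulary
p_copy = np.array([[0.6, 0.3, 0.1],   # copy distribution of turn 1
                   [0.1, 0.1, 0.8]])  # copy distribution of turn 2
w = np.array([0.75, 0.25])            # turn weights from the selector
p_final = final_distribution(p_vocab, p_copy, w, beta=0.4)
```

Since both inputs are valid distributions and β ∈ (0, 1), the output remains a valid distribution, from which the next word label is taken by argmax.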
Example 3
A dialogue state generation device with hierarchical selection of slot-related context, see fig. 4, comprises a processor 1 and a memory 2, the memory 2 having program instructions stored therein; the processor 1 invokes the program instructions stored in the memory 2 to cause the device to perform the following method steps of the embodiments:
encoding the dialogue-turn vectors with a gated recurrent unit to obtain a turn-level hidden state sequence, and using multi-step selection to judge the dialogue turn in which a slot value is located;
fusing the slot-aware context vector and the slot vector with a fusion gate to obtain a restated slot vector, and outputting, through the multi-step selection, the probability that each dialogue turn contains the slot value;
using the slot vector as the initial input of a gated recurrent unit and, at each time step of generating the slot value, feeding the word vector of the word label generated in the previous step into the state generator to obtain the current hidden state; and
using a weighted copy mechanism to multiply the probability that a dialogue turn contains the slot value by the probability distribution over copying a word from that turn, obtaining the probability distribution over copying a word from the dialogue history, and thereby predicting the word label at the current moment.
In one embodiment, fusing the slot-aware context vector and the slot vector with the fusion gate to obtain the restated slot vector specifically comprises:
using the attention mechanism as a selector at step τ, a round-level context vector is generated at each step:
wherein ,is a parameter that can be learned, < >>For the matching score between the slot and each dialog wheel, +.>Matching scores between the normalized slot positions and each dialog wheel; c τ For the τ -th dialog-level context vector, tanh is the activation function; d, d h A dimension that is an implicit state; s is(s) τ Is a slot vector; />Sentence-level representation for a dialog wheel;
storing dialog-level context vectors using a unidirectional gating loop unit to obtain a slot-aware context vector z τ
z τ =GRU(c τ ,z τ-1 ),
combining the current slot vector h_s and the slot-aware context vector z_τ with the fusion gate to generate a new slot vector s_(τ+1) for the next selection:

g_t = Sigmoid(W_f · [z_τ; h_s])
s_(τ+1) = g_t · z_τ + (1 - g_t) · h_s

where W_f is a learnable parameter; Sigmoid is the activation function; and the gate g_t is a scalar with a value between 0 and 1.
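The scalar fusion gate decides how much of the slot-aware context to mix into the slot vector. A minimal sketch, assuming a convex-combination update (the patent text describes the gating but the exact combination formula is not reproduced in this excerpt, so that form is an assumption), with illustrative random weights:

```python
import numpy as np

rng = np.random.default_rng(2)
d_h = 3

z_tau = rng.normal(size=d_h)    # slot-aware context vector z_tau
h_s = rng.normal(size=d_h)      # current slot vector h_s
W_f = rng.normal(size=2 * d_h)  # learnable parameter mapping [z_tau; h_s] to a scalar

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Scalar gate in (0, 1): how strongly the context overrides the slot vector
g_t = sigmoid(W_f @ np.concatenate([z_tau, h_s]))

# Assumed restated slot vector: convex combination gated by g_t
s_next = g_t * z_tau + (1.0 - g_t) * h_s
```

Because `g_t` lies strictly between 0 and 1, each component of `s_next` stays between the corresponding components of `z_tau` and `h_s`.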
In one embodiment, the probability that each dialogue turn contains the slot value is used to weight the probability distribution of copying a word from that turn, obtaining the probability distribution of copying the word from the dialogue history:

P_history = Σ_i p_i · P_i

where P_history is the probability distribution of copying a word from the dialogue history, p_i is the probability of copying a word from dialogue turn i, and P_i is the probability distribution of copying a word from dialogue turn i.
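The weighted copy is a mixture of per-turn copy distributions. The sketch below uses a hypothetical four-word vocabulary and hand-picked probabilities purely for illustration:

```python
import numpy as np

vocab = ["cheap", "hotel", "north", "wifi"]  # hypothetical vocabulary

# P_i: per-turn distributions of copying each vocabulary word (rows sum to 1)
P = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.60, 0.20, 0.10],
              [0.25, 0.25, 0.25, 0.25]])

# p_i: probability that each of the three turns contains the slot value
p = np.array([0.2, 0.7, 0.1])

# Probability distribution of copying a word from the whole dialogue history
P_history = p @ P

print(vocab[int(np.argmax(P_history))])  # → hotel
```

Since each row of `P` and the turn weights `p` each sum to one, `P_history` is itself a valid distribution over the vocabulary; here the second turn dominates, so the most likely copied word comes from it.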
It should be noted that the device description in the above embodiment corresponds to the method description and is not repeated here.
The processor 1 and the memory 2 may be any devices with computing capability, such as a computer, a single-chip microcomputer, or a microcontroller; the specific implementation is not limited and is selected according to the needs of the practical application.
Data signals are transmitted between the memory 2 and the processor 1 via the bus 3, which is not described in detail in this embodiment.
Based on the same inventive concept, an embodiment of the present invention also provides a computer-readable storage medium that includes a stored program; when the program runs, the device on which the storage medium is located is controlled to execute the method steps of the above embodiments.
The computer readable storage medium includes, but is not limited to, flash memory, hard disk, solid state disk, and the like.
It should be noted that the readable storage medium description in the above embodiment corresponds to the method description and is not repeated here.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part.
The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. The usable medium may be a magnetic medium or a semiconductor medium, or the like.
The embodiment of the invention does not limit the types of other devices except the types of the devices, so long as the devices can complete the functions.
Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (4)

1. A method for generating dialog states for hierarchically selecting slot phase contexts, the method comprising:
encoding the dialogue vectors with a gated recurrent unit (GRU) to obtain a sequence of dialogue-level hidden states, and using multi-step selection to determine which dialogue turn contains the slot value;
fusing the slot-aware context vector with the slot vector through a fusion gate to obtain a restated slot vector, the multi-step selection outputting, for each dialogue turn, the probability that the turn contains the slot value;
using the slot vector as the initial input of a GRU, and at each time step of slot-value generation feeding the word vector of the word token generated at the previous step into the state generator to obtain the current hidden state;
multiplying the probability that a dialogue turn contains the slot value by the probability distribution of copying a word from that turn through a weighted copy mechanism, obtaining the probability distribution of copying the word from the dialogue history, and thereby predicting the word token at the current time step;
wherein the multi-step selection is specifically:
using the attention mechanism as a selector, a dialogue-level context vector is generated at each step τ:

a_τ,i = s_τᵀ · tanh(W_a · h_i^d)
â_τ = softmax(a_τ)
c_τ = Σ_i â_τ,i · h_i^d

where W_a ∈ R^(d_h×d_h) is a learnable parameter; a_τ,i is the matching score between the slot and dialogue turn i, and â_τ,i is the normalized matching score; c_τ is the τ-th dialogue-level context vector; tanh is the activation function; d_h is the dimension of the hidden state; s_τ is the slot vector; h_i^d is the sentence-level representation of dialogue turn i; R is the system response; D_i is the concatenated word sequence; the dialogue-level context vectors are stored with a unidirectional GRU to obtain the slot-aware context vector z_τ:
z_τ = GRU(c_τ, z_(τ-1)),
combining the current slot vector h_s and the slot-aware context vector z_τ with the fusion gate to generate a new slot vector s_(τ+1) for the next selection:

g_t = Sigmoid(W_f · [z_τ; h_s])
s_(τ+1) = g_t · z_τ + (1 - g_t) · h_s

where W_f is a learnable parameter; Sigmoid is the activation function; and the gate g_t is a scalar with a value between 0 and 1.
2. The method for generating a dialogue state of a hierarchical selection slot phase context according to claim 1, wherein the weighted copy mechanism specifically comprises:

P_history = Σ_i p_i · P_i

where P_history is the probability distribution of copying a word from the dialogue history, p_i is the probability of copying a word from dialogue turn i, and P_i is the probability distribution of copying a word from dialogue turn i.
3. A dialogue state generation device for hierarchically selecting slot phase contexts, the device comprising: a processor and a memory, wherein the memory stores program instructions, and the processor invokes the program instructions stored in the memory to cause the device to perform the method steps of any one of claims 1-2.
4. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method steps of any of claims 1-2.
CN202111024342.3A 2021-09-02 2021-09-02 Method and device for generating dialogue state of hierarchical selection slot phase context Active CN113742467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111024342.3A CN113742467B (en) 2021-09-02 2021-09-02 Method and device for generating dialogue state of hierarchical selection slot phase context

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111024342.3A CN113742467B (en) 2021-09-02 2021-09-02 Method and device for generating dialogue state of hierarchical selection slot phase context

Publications (2)

Publication Number Publication Date
CN113742467A CN113742467A (en) 2021-12-03
CN113742467B true CN113742467B (en) 2023-08-08

Family

ID=78734919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111024342.3A Active CN113742467B (en) 2021-09-02 2021-09-02 Method and device for generating dialogue state of hierarchical selection slot phase context

Country Status (1)

Country Link
CN (1) CN113742467B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444730A (en) * 2020-03-27 2020-07-24 新疆大学 Data enhancement Weihan machine translation system training method and device based on Transformer model
CN111462749A (en) * 2020-03-20 2020-07-28 北京邮电大学 End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval
CN112417864A (en) * 2020-11-29 2021-02-26 中国科学院电子学研究所苏州研究院 Gated copy and mask based multi-round conversation omission recovery method
CN113254610A (en) * 2021-05-14 2021-08-13 廖伟智 Multi-round conversation generation method for patent consultation
CN113254575A (en) * 2021-04-23 2021-08-13 中国科学院信息工程研究所 Machine reading understanding method and system based on multi-step evidence reasoning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Task-oriented multi-turn dialogue systems and technologies based on deep learning; Yao Dong; Computer Science; Vol. 48, No. 5; full text *

Also Published As

Publication number Publication date
CN113742467A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN111312245B (en) Voice response method, device and storage medium
CN116415654A (en) Data processing method and related equipment
CN111462749B (en) End-to-end dialogue system and method based on dialogue state guidance and knowledge base retrieval
US11475225B2 (en) Method, system, electronic device and storage medium for clarification question generation
CN112183061B (en) Multi-intention spoken language understanding method, electronic equipment and storage medium
CN109637527A (en) The semantic analytic method and system of conversation sentence
CN111984770A (en) Man-machine conversation method and device
CN110633473A (en) Implicit discourse relation identification method and system based on conditional random field
CN116863920B (en) Voice recognition method, device, equipment and medium based on double-flow self-supervision network
CN113609301A (en) Dialogue method, medium and system based on knowledge graph
CN113742467B (en) Method and device for generating dialogue state of hierarchical selection slot phase context
CN111046157B (en) Universal English man-machine conversation generation method and system based on balanced distribution
CN114880527B (en) Multi-modal knowledge graph representation method based on multi-prediction task
CN116595985A (en) Method for assisting in enhancing emotion recognition in dialogue based on generated common sense
CN115796187A (en) Open domain dialogue method based on dialogue structure diagram constraint
CN116911306A (en) Natural language understanding method and device, server and storage medium
CN114860914A (en) End-to-end multi-domain task type dialogue generation method based on knowledge base enhancement
CN114743056A (en) Dynamic early-quit-based image description generation model and model training method
CN116628149A (en) Variable autoregressive dialogue generation device and method based on joint hidden variables
CN113436752B (en) Semi-supervised multi-round medical dialogue reply generation method and system
CN117194640A (en) User simulator construction method based on generation of countermeasure network
CN115617997A (en) Dialog state tracking method, device, equipment and medium
Guo et al. Optimization of Text Generation Method for Task-based Human-machine Dialogue System
Wang et al. Adaptive Prompt Learning with Distilled Connective Knowledge for Implicit Discourse Relation Recognition
Cui et al. A Comprehensive Survey on Text Filling Algorithms: A Research Review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant