CN114896988A - Unified dialog understanding method and framework - Google Patents

Publication number
CN114896988A
Authority
CN
China
Prior art keywords
task, tasks, generative, dialogue, understanding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210342533.2A
Other languages
Chinese (zh)
Inventor
俞凯
陈露
陈志�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202210342533.2A priority Critical patent/CN114896988A/en
Publication of CN114896988A publication Critical patent/CN114896988A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G06F40/35: Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a unified dialogue understanding method comprising the following steps: converting dialogue understanding tasks of various categories into a plurality of generative dialogue tasks according to a unified generative paradigm; and inputting the generative dialogue tasks into a task-unified generative model, which outputs the task identifier and task answer corresponding to each generative dialogue task, thereby solving the dialogue understanding tasks of all categories. An embodiment of the invention also provides a unified dialogue understanding framework. The embodiments propose a unified generative dialogue understanding framework (UniDU) for sharing knowledge among multiple dialogue understanding tasks. To alleviate the biased-generation problem, an existing learnable-weight method is improved, which achieves the best overall performance. Compared with existing models, the UniDU of this method achieves better performance on multiple DU tasks. Further study of the influencing factors has been conducted. Finally, experimental results show that the UniDU model also obtains excellent performance in few-shot and zero-shot settings.

Description

Unified dialog understanding method and framework
Technical Field
The invention relates to the field of intelligent speech, and in particular to a unified dialogue understanding method and a unified dialogue understanding framework.
Background
The development of dialogue systems plays an important role in the popularization of intelligent devices such as smart assistants. In recent years, interest in neural dialogue systems has grown steadily. Dialogue understanding is a core technology of neural dialogue systems, aimed at accurately analyzing a dialogue from different fine-grained perspectives. However, there are many types of dialogue understanding tasks, such as determining the answer corresponding to a given passage and question, or condensing a long text into a short key sentence. Because their task formats differ, these DU (Dialogue Understanding) tasks are still learned independently.
Existing approaches to learning dialogue tasks include:
Multi-task learning applied to dialogue understanding: a shared dialogue encoding model is trained on the corpora of different understanding tasks, while each task keeps an independent decoding module. Classification problems compute the classification from the dialogue representation of a special token; sequence labeling problems compute a label for every word; generation problems use independent autoregressive decoding.
Generative methods applied to dialogue modeling: generative dialogue modeling generally means modeling the dialogue content with a language model, turning dialogue modeling into a dialogue generation problem, and then using other annotation information to improve the quality of the generated dialogue.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
Multi-task learning applied to dialogue understanding: different dialogue understanding problems still have to be set up as different tasks; for example, dialogue intent recognition is a classification task, dialogue slot filling is a sequence labeling task, and dialogue summarization is a generation task. Different tasks keep independent parameter sets, which limits the model and makes it hard to explore the influence among different tasks. In addition, this multi-task training scheme scales poorly: different tasks still require independent modeling, the memory requirement is high, and it grows almost linearly with the number of tasks.
Generative methods applied to dialogue modeling: only the dialogue generation problem is treated as the main problem, and the influence of different multi-task training schemes on different dialogue understanding tasks during training is not considered. Nor is the impact of the many other forms of dialogue understanding tasks studied in depth.
Disclosure of Invention
In the prior art, different tasks still require independent modeling, which is unfavorable for understanding different dialogue tasks; the performance requirements of all dialogue understanding tasks are not considered globally, other tasks are used only to improve the dialogue generation task, and the influence of different multi-task training schemes is not considered. To solve at least these problems, in a first aspect, an embodiment of the present invention provides a unified dialogue understanding method, including:
converting the dialogue understanding tasks of various categories into a plurality of generative dialogue tasks according to a uniform generative paradigm;
and inputting the generated conversation tasks into a task unified generated model, and outputting task identifiers and task answers corresponding to the generated conversation tasks so as to solve conversation understanding tasks of various categories.
In a second aspect, an embodiment of the present invention provides a unified dialog understanding framework, including:
the unified conversion program module is used for converting the dialogue understanding tasks of all categories into a plurality of generating type dialogue tasks according to a unified generating paradigm;
and the dialogue understanding program module is used for inputting the generated dialogue tasks into the task unified generating model and outputting the task identifications and the task answers corresponding to the generated dialogue tasks so as to solve the dialogue understanding tasks of various types.
In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the unified dialog understanding method of any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the unified dialog understanding method according to any embodiment of the present invention.
The embodiment of the invention has the beneficial effects that: the method provides a unified generative dialog understanding framework for sharing knowledge among multiple dialog understanding tasks. To alleviate the bias generation problem, existing learnable weight methods are improved, which can achieve the best overall performance. Compared with the existing model, the UniDU method of the method realizes better performance on a plurality of DU tasks. Further intensive research into influencing factors has been carried out. Finally, experimental results show that the UniDU model of the method can obtain excellent performance under the settings of small samples and zero samples.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a unified dialog understanding method provided by an embodiment of the present invention;
fig. 2 is a schematic diagram of slot filling task conversion of a unified dialog understanding method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the conversion of dialog state tracking tasks in a unified dialog understanding method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating intent detection task transition of a unified dialog understanding method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a plurality of task transitions of a unified dialog understanding method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a unified dialog understanding method according to an embodiment of the present invention;
fig. 7 is a schematic diagram of the test results of the five DU tasks under eight learning strategies in the unified dialog understanding method according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating the overall performance of the MIX, HUW, and MATS methods of a unified dialog understanding method according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating an effect of each task corpus of a unified dialog understanding method according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating different unified formats of a unified dialog understanding method according to an embodiment of the present invention;
FIG. 11 is a diagram of different pre-trained language models and encoding structures for a unified dialog understanding method according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating the results of a unified dialog understanding method on BART and UniDU according to an embodiment of the present invention;
FIG. 13 is a block diagram of a unified dialog understanding framework according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of an embodiment of an electronic device with unified dialog understanding according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a unified dialog understanding method according to an embodiment of the present invention, which includes the following steps:
s11: converting the dialogue understanding tasks of various categories into a plurality of generative dialogue tasks according to a uniform generative paradigm;
s12: and inputting the generated conversation tasks into a task unified generated model, and outputting task identifiers and task answers corresponding to the generated conversation tasks so as to solve conversation understanding tasks of various categories.
In the present embodiment, a dialogue is represented by C = (H_n, U_n), where H_n = (U_1, U_2, ..., U_{n-1}) denotes the dialogue history containing the first n-1 turns. U_n is the utterance of the n-th turn and may consist of several sentences spoken by one speaker. For task-oriented dialogues, the scope of the domain is limited by a dialogue ontology designed by dialogue experts. The ontology O consists of the dialogue domains D = {d}, the domain slots S = {s}, and the user intent candidates I = {i}.
For step S11, all categories of dialogue understanding tasks are converted into a plurality of generative dialogue tasks according to a unified generative paradigm, where the categories of dialogue understanding tasks include: dialogue summarization, dialogue completion, intent detection, slot filling, and dialogue state tracking. Specifically:
the DS (dialog Summary) is intended to extract important information of a dialog. This is a typical generation problem that takes the entire dialog context C as input, generating a summary description. The DS requires that the model focus on the entire conversation process and important concepts.
The purpose of DC (Dialogue Completion) is to alleviate the coreference and information-omission problems that often occur in dialogue contexts. It is also a typical generation task: it takes the dialogue history H_n and the current utterance U_n as input, then infers the completed form of the current utterance U_n. DC requires the model to focus on the link between the current utterance and the dialogue history.
SF (Slot Filling) extracts the slot types of the entities mentioned by the user. This is conventionally a sequence labeling problem, i.e., utterances are tagged in the IOB (Inside, Outside, Beginning) format. The input is simply the current utterance.
ID (Intent Detection) identifies the intent from a predefined set of abstract intent expressions I. It is usually described as a classification problem. The input is the current utterance U_n; the output is a probability distribution over all intent candidates I.
The purpose of DST (Dialogue State Tracking) is to record the constraints of the user, each consisting of a (domain, slot, value) triple. For example, (hotel, pricerange, cheap) means that the user wants an inexpensive hotel. The DST input at the n-th turn is the first n turns (U_1, ..., U_n).
The five different dialogue understanding tasks described above are unified into a sequence-to-sequence format. The composition of each task is explained in detail below, in particular how intent detection, slot filling and dialogue state tracking are cast as generation tasks.
As one implementation, the dialogue understanding tasks of each category are converted, according to a unified generative paradigm, into generative dialogue tasks comprising a task identifier, dialogue content and a task query;
in the present embodiment, the input of the Unified general Understanding (Unified dialog Understanding) includes three parts: task identification, dialog content, and task query. The task identity is represented by a special mark, for example, a dialog summary identified by "[ DS ]". Dialog content refers to task-related input, such as a dialog history of a dialog summary. A task query may be viewed as a task-specific prompt, including task definitions and domain-related information. The output of a UniDU has two elements: task identification and query answers. The query answer is an understanding of the task query given by the dialog content. The unified input and output may be formalized as:
inputting: [ T1] dialogue content [ C ] task query
And (3) outputting: [ T1] query answer
Where "[ C ]" is an independent character, "[ TI ]" is a task identification (which may be replaced with "[ DS ]", "[ DC ]", "[ SF ]", "[ ID ]" and "[ DST ]", corresponding to a dialog summary, completion dialog, slot filling, intention detection, and dialog state tracking, respectively). When reasoning, the UniDU model must first predict task identification.
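The serialization just described can be sketched as a pair of helper functions (the function names and task-key strings are illustrative, not from the patent; the token spellings follow the format shown above):

```python
# Sketch of the unified UniDU serialization: [TI] dialogue content [C] task query.
TASK_IDS = {
    "summary": "[DS]",        # dialogue summarization
    "completion": "[DC]",     # dialogue completion
    "slot_filling": "[SF]",
    "intent": "[ID]",
    "state_tracking": "[DST]",
}

def build_input(task: str, dialogue_content: str, task_query: str) -> str:
    """Input sequence: [TI] dialogue content [C] task query."""
    return f"{TASK_IDS[task]} {dialogue_content} [C] {task_query}"

def build_target(task: str, answer: str) -> str:
    """Target sequence: [TI] query answer."""
    return f"{TASK_IDS[task]} {answer}"
```

For instance, `build_input("intent", "i need a cheap hotel", "what is the user's intent on domain hotel")` produces the single flat string the generative model consumes, and the decoder is trained to emit the matching `build_target` string.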
For step S12, the task-unified generative model outputs the task identifiers and task answers corresponding to the generative dialogue tasks, for example:
Dialogue summarization and dialogue completion are generation tasks. The dialogue content input is the whole dialogue context C for summarization, and the multi-turn history H_n together with the current utterance for completion. Since these two tasks are independent of the dialogue domain, the task query contains no domain information. For dialogue summarization, the task query is "what is the summary of this dialogue?". For dialogue completion, the query is "what is the semantic completion statement of U_n?", where U_n is the n-th utterance. In the output, the understanding answers are the annotated dialogue summary and the rewritten utterance, respectively.
The slot filling task requires the model to extract all mentioned slot values and their slot types from one utterance. In this method, the UniDU model predicts values slot by slot, an iterative generation process over the candidate list of slots. Fig. 2 compares the conventional format with the slot filling format of this method.
It should be understood that not all candidate slots are listed here. In general, every sample can be processed in the same way, which is not repeated here, and can be expressed as:
Input: [SF] U_n [C] what is s of d
Output: [SF] slot value
where s and d are a predefined slot and domain. If s has no value in U_n, the slot value is "not mentioned". If s has multiple values, they are comma-separated in the slot value. When a value is "not mentioned", the sample is called a negative sample; otherwise it is a positive sample. To balance the ratio of negative to positive samples during training, the ratio is kept at 2:1 or less: if the number of negative samples exceeds this threshold, negative samples are randomly down-sampled to twice the number of positive samples.
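The 2:1 down-sampling of negative samples can be sketched as follows (the helper name is hypothetical; the seed parameter is an assumption for reproducibility):

```python
import random

def balance_negatives(positives, negatives, max_ratio=2, seed=0):
    """Keep at most `max_ratio` negative ("not mentioned") samples per
    positive sample, down-sampling the negatives at random."""
    limit = max_ratio * len(positives)
    if len(negatives) > limit:
        negatives = random.Random(seed).sample(negatives, limit)
    return positives + negatives
```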
For the dialogue state tracking task, classification methods have generally achieved better performance than generative methods. Under the UniDU framework, however, DST is also cast as a slot-value generation task similar to slot filling. The DST task format is shown in fig. 3; the output of the original DST model is a distribution over all candidate values of a slot. The input and output of the DST task under UniDU can be formalized as:
Input: [DST] (H_n, U_n) [C] what is the user's constraint about s of d
Output: [DST] slot value
where (H_n, U_n) is the dialogue context. If slot s of domain d is not in the dialogue state, its value is "not mentioned", which is a negative sample. Different utterances in the input are separated by the special token "[T]". The ratio of negative to positive samples is also kept at 2:1 or less during training.
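A minimal sketch of the DST serialization, with the turns joined by the "[T]" separator (the function name is hypothetical):

```python
def build_dst_input(history, current_utterance, slot, domain):
    """DST input: [DST] (H_n, U_n) [C] what is the user's constraint
    about s of d, with utterances separated by the special token [T]."""
    dialogue = " [T] ".join(list(history) + [current_utterance])
    return (f"[DST] {dialogue} [C] "
            f"what is the user's constraint about {slot} of {domain}")
```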
For the intent detection task, the original methods usually state it as an intent classification problem and output a distribution over all candidate intents. The UniDU model instead directly generates the intent name of the current utterance, which can be formalized as:
Input: [ID] U_n [C] what is the user's intent on domain d
Output: [ID] intent name
where the domain d is usually known in advance. A concrete example of the prior-art format and the UniDU format of this method is shown in fig. 4.
Not all intents are listed in the figure above. To give the UniDU model generalization capability, negative samples are also constructed for the intent detection task: the intent name of a negative sample is "undefined", and its input utterance is sampled from an out-of-domain dialogue. The ratio of negative to positive samples is set to 2:1. At this point, all five dialogue understanding tasks have been defined as unified sequence-to-sequence generation tasks; a concrete example is shown in fig. 5. The corresponding task identifier and task answer are thereby obtained.
According to the above embodiment, all dialogue understanding tasks are unified into generation problems, and the task identifier and task answer are output. The UniDU method can handle any of the exemplified categories of dialogue tasks with a single task-unified generative model, is highly extensible, and delivers more accurate dialogue understanding in few-shot and zero-shot settings.
As one implementation, the task-unified generative model is trained with the following steps:
converting the dialogue understanding task training data of each category into generative dialogue task training data according to the unified generative paradigm;
performing multi-task joint training of the task-unified generative model with the generative dialogue task training data, and outputting the predicted task identifiers and predicted task answers corresponding to the generative dialogue tasks, where the generative dialogue tasks share the same training parameters;
and optimizing the task-unified generative model based on the error between the reference task identifier and answer and the predicted task identifier and answer, ending training when the error converges.
In this embodiment, the multiple dialogue understanding tasks are formulated within a unified framework. Since the output spaces of the DU tasks above differ greatly, effectively training five different tasks simultaneously becomes an important issue.
During training, the dialogue understanding task training data of each category is likewise converted into generative dialogue task training data according to the unified generative paradigm; the conversion process has been exemplified above and is not repeated here. The generative dialogue task training data is then used for multi-task training. Specifically:
the training strategy of the multi-task joint training comprises the following steps:
distributing the same weight to all the generated dialogue tasks, and implementing an average and a strategy; and/or
Utilizing a heuristic training plan to carry out respective training processes on different generative dialogue task plans so as to realize a configurable manual scheduling strategy; and/or
And determining loss weights of different generative dialogue tasks, and using the loss weights to balance the learning weight strategy of global optimization of each loss weight. The learning weight strategy for balancing the global optimization of the loss weights comprises the following steps: homomorphic uncertainty weights, gradient normalized gradient norm.
In this embodiment, multi-task training strategies fall into three categories: average-sum methods, manual scheduling methods, and learnable-weight methods.
The average-sum method assigns the same weight to all samples. In other words, the losses of different samples are averaged directly, with the formula

L_avg = (1/T) Σ_{t=1}^{T} L_t

where T is the number of tasks and L_t is the loss of the t-th task.
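The average-sum objective is simply the mean of the per-task losses, as this one-line sketch shows:

```python
def average_sum_loss(task_losses):
    """Equal-weight multi-task objective: L = (1/T) * sum_t L_t."""
    return sum(task_losses) / len(task_losses)
```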
The manual scheduling method designs a heuristic training plan that schedules the learning of the different tasks. For example, curriculum learning is a typical manual scheduling method that first trains on simple samples and then adds more complex cases. The manual scheduling method can be expressed as

L_manual = Σ_{t=1}^{T} 1_t · L_t

where 1_t is an indicator function whose value is 0 or 1, selecting whether task t participates in the current training phase.
The learnable-weight method parameterizes the loss weights of the different tasks. The goal of parameterized weights is to balance the influence of task instances, prevent the model from being biased toward one or a few tasks, and achieve global optimization. There are two classical learnable-weight algorithms: homoscedastic uncertainty weighting (HUW) and gradient normalization (GradNorm). With learnable weights, the loss function is

L = Σ_{t=1}^{T} W_t L_t

where the learnable weight W_t is greater than 0. In the HUW algorithm, the loss function is updated to:

L_HUW = Σ_{t=1}^{T} (1 / (2 W_t^2)) L_t + log(W_t)

where log(W_t) regularizes the weights; the formulation applies to both regression and classification tasks. The motivation of the GradNorm method is to slow down the learning of tasks with larger gradients and faster convergence.
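The HUW objective can be written down directly, as in this plain-Python sketch (in practice the weights would be learnable parameters updated by gradient descent):

```python
import math

def huw_loss(task_losses, weights):
    """Homoscedastic uncertainty weighting:
    L = sum_t L_t / (2 * W_t**2) + log(W_t), with each W_t > 0.
    A large W_t shrinks that task's effective loss but pays a log penalty."""
    return sum(loss / (2.0 * w * w) + math.log(w)
               for loss, w in zip(task_losses, weights))
```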
After the predicted task identifier and predicted task answer are obtained, training proceeds with the reference and the prediction. In the HUW formula above, the learnable weight W_t depends only on the corresponding task. The weight can instead be treated as a task-dependent function W_φ(t), where φ is a set of parameters shared among the five tasks. Under the UniDU framework, the five tasks share the same encoder-decoder model, so the weight function W_φ(t) can likewise be shared. The task features depend on task attributes such as input, output, and data size. To capture the characteristics of the five tasks, a vector is designed by hand as the task feature representing each task; each dimension has a physical meaning associated with a model-agnostic setting. Function parameters are shared among the different tasks, so the weights are no longer mutually independent as in the original learnable-weight method. The HUW formula above is modified to:

L_MATS = Σ_{t=1}^{T} (1 / (2 W_φ(t)^2)) L_t + log(W_φ(t))
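The weight function W_φ(t) can be sketched as a tiny two-layer network over a hand-crafted task-feature vector, matching the two-linear-layers-plus-ReLU design described in the experiment section; the softplus output and the random initialization are assumptions added here to keep W_φ(t) positive:

```python
import math
import random

def make_weight_net(feat_dim, hidden=64, seed=0):
    """Return W_phi(t): task-feature vector -> positive scalar weight.
    The parameters phi (w1, w2) are shared by all tasks."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(feat_dim)]
    w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]

    def weight(task_feat):
        # hidden layer with ReLU activation
        h = [max(0.0, sum(f * w1[i][j] for i, f in enumerate(task_feat)))
             for j in range(hidden)]
        z = sum(hj * w2[j] for j, hj in enumerate(h))
        return math.log1p(math.exp(z))  # softplus keeps the weight > 0

    return weight
```

Because the same function is evaluated on all five task-feature vectors, the task weights are coupled through the shared parameters φ, unlike the independent per-task weights of plain HUW.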
Training ends as described above.
Briefly, the structure of the method is shown in fig. 6:
1. A dialogue understanding task is first turned into a generation problem, whose input part has three components: the special task identifier [XX], the dialogue content (single- or multi-turn), and the task query. The output part is the task identifier and the task answer.
2. All dialogue understanding tasks share the same parameter model and are trained jointly with a multi-task learning algorithm.
3. At inference time, the corresponding task identifier and the answer to the corresponding question are output, solving all dialogue understanding tasks.
The method is evaluated experimentally on ten dialogue understanding corpora, two per task. The UniDU framework is evaluated with eight different training strategies. Compared with well-designed task-specific models, UniDU obtains better performance on five benchmarks. Different factors affecting the performance of the UniDU model are then analyzed in depth, including the DU tasks, the unified format, and the pre-trained language model. Finally, experiments verify the generalization capability of UniDU.
The ten dialogue understanding corpora cover the five tasks: Dialogue Summarization (DS), Dialogue Completion (DC), Slot Filling (SF), Intent Detection (ID), and Dialogue State Tracking (DST). Two well-studied corpora were selected for each task: one as the evaluation corpus, the other as an auxiliary corpus.
Dialogue summarization: the SAMSUM and DIALOGSUM datasets are selected. The common metric for the summarization task is the ROUGE score, which measures the n-gram overlap between the generated summary and the reference summary.
Dialogue completion: two dialogue-rewrite corpora are used. The metrics are the BLEU score and Exact Match (EM) accuracy. BLEU measures how similar the rewritten sentence is to the reference sentence; exact match is the rate at which the generated sentence exactly equals the reference.
And (3) intention detection: experiments were performed on BANKING77 and HWU64, where 77 and 64 represent the number of predefined intents. The evaluation index is detection Accuracy (ACC).
Slot filling: experiments use RESTAURANTS8K and SNIPS. F1 scores the correctly extracted spans for each user utterance. Note that correct predictions on negative samples are not counted in the F1 score, which keeps it comparable with conventional methods.
Dialogue state tracking: WOZ 2.0 and MultiWOZ 2.2. The metric is Joint Goal Accuracy (JGA), which measures the percentage of successful dialogue turns, where a turn is successful if and only if all slot values are correctly predicted. Note that only the "hotel" domain data of MultiWOZ 2.2 is used in the training phase.
The multi-task training strategies fall into three categories: average sum, manual scheduling, and learnable weights. Before introducing the MTL (multi-task learning) methods, there is an intuitive baseline trained on its own data, named ST (single training): the sequence-to-sequence model is trained separately on each of the five evaluation datasets. The average-sum category contains two training strategies: TT (task transfer learning) and MIX (mixed learning). Task transfer learning aims to improve learning performance using external data from an auxiliary corpus with the same task setting; this is the main reason two corpora were chosen per task. Mixed learning directly mixes all training samples from the 10 corpora together. In both methods the learning weight of every sample is uniform. For manual scheduling, two methods are used. From the input perspective, the five tasks fall into three categories: utterance-level input for intent detection and slot filling, turn-level input for dialogue completion and dialogue state tracking, and dialogue-level input for dialogue summarization.
The input grows progressively more complex: utterance level, turn level, dialogue level. The intuitive method (abbreviated CL) therefore trains the five tasks in this order; note that data from earlier phases is retained in later training phases. From the task-setting perspective, dialogue summarization and dialogue completion are domain-independent tasks, while the other three are domain-related. This gives another training route, from general tasks to domain-specific tasks (G2S). Among the learnable-weight methods, the three methods introduced above are evaluated: GradNorm, HUW, and the proposed MATS (Model-Agnostic Training Strategy).
In the method, BART (Bidirectional and Auto-Regressive Transformers) is used as the backbone of the unified encoder-decoder model. The BART model is implemented with the HuggingFace library. All experiments are performed on a 2080Ti GPU with 11 GB of memory. Each experiment runs for 60 epochs and takes about 72 hours. The batch size is 32, and a gradient accumulation strategy (updating every 8 steps) is used. The learning rates for the unified model and the learnable weights are 1e-5 and 1e-4, respectively. In the MATS method, the weighting function consists of two linear layers with a ReLU activation and a hidden size of 64.
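The weighting function reported above (two linear layers with a ReLU, hidden size 64) can be sketched in plain Python; in practice it would be a small trainable network, and the one-hot task-embedding input used here is an assumption for illustration:

```python
import random

HIDDEN = 64  # hidden size reported in the text

def linear(x, weights, biases):
    """Affine map; weights is a list of rows, one per output unit."""
    return [sum(xi * wij for xi, wij in zip(x, row)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, x) for x in v]

def make_weight_net(in_dim, hidden=HIDDEN, seed=0):
    """Two linear layers with a ReLU in between, producing a scalar loss weight."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)]]
    b2 = [0.0]

    def forward(task_feature):
        h = relu(linear(task_feature, w1, b1))
        return linear(h, w2, b2)[0]  # scalar weight for this task/sample

    return forward

net = make_weight_net(in_dim=5)
weight = net([1.0, 0.0, 0.0, 0.0, 0.0])  # e.g. a one-hot task embedding
```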
As shown in fig. 7, the best evaluation performance of the eight training strategies on the five tasks is reported. The experimental results show that, under the UniDU framework, different training strategies have a great influence on the performance of the five tasks. Except for dialogue summarization, the method achieves the best or near-best performance. On the tasks not conventionally framed as generation (intent detection, slot filling, and dialogue state tracking), UniDU with the MATS method achieves promising improvements over well-designed task-specific models. Compared with single training, the simple task transfer learning method (TT) cannot significantly improve learning performance. The mixing operation (MIX) brings consistent performance improvements on the five tasks; however, except on dialogue completion, the improvement over TT is still limited. Compared with the proposed MATS, MIX favors convergence toward the more complex DU tasks (dialogue summarization and dialogue completion). Neither manual scheduling method (G2S and CL) shows any significant advantage. Among the learnable weight methods, GradNorm only achieves excellent performance on dialogue summarization, while HWU achieves performance improvements on intent detection, slot filling, and dialogue state tracking. The best UniDU model (marked with underlining) is further fine-tuned on the corresponding corpus. It is found that only dialogue summarization and dialogue completion gain significant improvement, which also reflects the necessity of the UniDU framework for the simpler generative tasks.
In fig. 7, the task-specific performance of the UniDU model is reported, with checkpoints selected by task-specific metrics. Fig. 8 shows the unified performance of the MIX, HWU and MATS methods on the five tasks: a single checkpoint of the UniDU model is evaluated, namely the one with the highest overall evaluation score across the five tasks. The overall score is the average of the five main metrics shown in fig. 8. It can be seen that the proposed MATS achieves the highest overall performance, and also the best performance on four of the five DU tasks.
The method analyzes the factors influencing the performance of the UniDU model, including the DU tasks, the unified format, and the pre-trained language model.
To verify the effect of each dialogue understanding task, one of the five DU corpora is deleted in turn and the UniDU model is retrained using the MATS method, as shown in fig. 9. Overall, the five DU tasks benefit from each other, except that dialogue summarization has a negative impact on the dialogue state tracking task. The general dialogue summarization task condenses a dialogue into a single sentence and ignores domain-specific information. On the other hand, it is found that the dialogue completion task has the greatest effect on the other four DU tasks. Studies have shown that coreference and information omission remain major factors affecting dialogue comprehension ability. This phenomenon suggests that the dialogue understanding community should pay more attention to dialogue completion. For example, when pre-training a large-scale dialogue model, the pre-training task should be close to the dialogue completion task.
The dialogue understanding tasks are specified in a QA format. There is an intuitive alternative, the prefix format, in which the task query is concatenated on the decoder side. At inference time, the decoder receives the task query directly and then generates the answer. Fig. 10 summarizes the comparison between the two formats on the five tasks.
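The two input layouts can be sketched as string construction; the delimiters and query wording below are assumptions for illustration, not the patent's exact format:

```python
def qa_format(task_id, dialogue, query):
    """QA format: the task query is part of the encoder input."""
    return f"[{task_id}] dialogue: {dialogue} question: {query}"

def prefix_format(task_id, dialogue, query):
    """Prefix format: the query is fed to the decoder as a generation prefix."""
    encoder_input = f"[{task_id}] dialogue: {dialogue}"
    decoder_prefix = query  # decoder starts from the query and continues with the answer
    return encoder_input, decoder_prefix

qa = qa_format("DST", "user: i need a hotel", "what is the dialogue state?")
enc, dec = prefix_format("DST", "user: i need a hotel", "what is the dialogue state?")
```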
To verify the effect of different pre-training backbones, the encoder of the UniDU model is initialized with random initialization, BART, and T5. Trans-B and Trans-L denote randomly initialized Transformers trained from scratch, as in fig. 11, with the same parameter counts as the BART base model (BART-B) and the BART large model (BART-L). T5-S and T5-B denote T5-small and T5-base, respectively. It can be seen that the pre-trained language models achieve an absolute performance gain over the randomly initialized models. BART-B achieves better performance than T5-S. When the parameter count is increased, T5-base performs best. The results show that large PLMs can greatly enhance the complex dialogue summarization task.
To further evaluate the generalization ability of the UniDU model, few-shot learning experiments are first performed on the domain-specific slot filling task. The zero-shot learning ability of UniDU is then tested on unseen dialogue data.
Few-shot learning: the UniDU model with the best overall evaluation performance on the five tasks, trained using the MATS method, is selected. For the slot filling task, another dialogue corpus, DSTC8, is added. The "bus" domain data in DSTC8 is selected, which is not seen during UniDU training. UniDU shows clear advantages, especially when resources are extremely limited: when only 1% of the training data is available, vanilla BART fails to learn, as shown in fig. 12.
Zero-shot learning performance: the UniDU model trained with the MATS method is validated on unseen "taxi" domain dialogue data collected from the MultiWOZ 2.2 corpus. The UniDU model achieves 18.24% accuracy on intent detection, 39.69% F1 on slot filling, and 1.6% JGA on dialogue state tracking.
In general, the method proposes a unified generative dialogue understanding framework (UniDU) that shares knowledge among five dialogue understanding tasks. To alleviate the biased generation problem, an existing learnable weight method is improved, achieving the best overall performance. The UniDU approach achieves better overall performance on the five DU tasks than well-designed task-specific models. The influencing factors are further studied. Finally, experimental results show that the UniDU model achieves excellent performance under few-shot and zero-shot settings.
Fig. 13 is a schematic structural diagram of a unified dialog understanding framework according to an embodiment of the present invention, where the system can execute the unified dialog understanding method according to any of the above embodiments and is configured in a terminal.
The present embodiment provides a unified dialog understanding framework 10, which includes: a unified conversion program module 11 and a dialogue understanding program module 12.
The unified conversion program module 11 is configured to convert the dialogue understanding tasks of each category into a plurality of generative dialogue tasks according to a unified generative paradigm; the dialogue understanding program module 12 is configured to input the generative dialogue tasks into the task unified generative model, and to output task identifiers and task answers corresponding to the generative dialogue tasks, so as to solve the dialogue understanding tasks of each category.
Further, the unified transformation program module is configured to:
converting the conversation understanding tasks of all categories into a plurality of generative conversation tasks comprising task identification, conversation content and task description according to a uniform generative paradigm, wherein the categories of the conversation understanding tasks comprise: dialog summarization, completion of dialogs, intention detection, slot filling, and dialog state tracking.
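The conversion into a (task identifier, dialogue content, task description) triple can be sketched as follows; the query templates are hypothetical stand-ins for the patent's actual task descriptions:

```python
# Hypothetical task queries; the text specifies the triple
# (task identifier, dialogue content, task description) but not exact wording.
TASK_QUERIES = {
    "ID": "what is the intent of the last user utterance?",
    "SF": "what is the value of slot {slot}?",
    "DST": "what is the dialogue state?",
    "DC": "rewrite the last utterance into a complete sentence.",
    "DS": "summarize the dialogue.",
}

def to_generative_sample(task_id, dialogue, **kwargs):
    """Convert any DU task instance into one unified generative sample."""
    query = TASK_QUERIES[task_id].format(**kwargs)
    return {"task": task_id, "dialogue": dialogue, "query": query}

sample = to_generative_sample("SF", "user: book a hotel in the north", slot="area")
```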
Further, the task unified generative model is trained with the following steps:
converting the dialogue understanding task training data of each category into a plurality of generative dialogue task training data according to a uniform generative paradigm;
performing multi-task joint training on the task unified generative model using the plurality of generative dialogue task training data, and outputting predicted task identifiers and predicted task answers corresponding to the plurality of generative dialogue task training data, wherein the plurality of generative dialogue tasks share the same training parameters;
and optimizing the task unified generative model based on the error between the reference task identifiers and reference task answers and the predicted task identifiers and predicted task answers, ending training when the error converges.
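The train-until-the-error-converges loop described above can be sketched generically; the toy single-parameter objective below stands in for the shared sequence-to-sequence model, which is an illustrative simplification, not the patent's training code:

```python
def train_until_converged(step_fn, tol=1e-6, max_steps=1000):
    """Generic loop: step_fn() performs one update and returns the current loss;
    training ends when successive losses differ by less than tol."""
    prev = float("inf")
    for step in range(max_steps):
        loss = step_fn()
        if abs(prev - loss) < tol:
            return step, loss
        prev = loss
    return max_steps, prev

# Toy stand-in: minimize (w - 3)^2 by gradient descent on one shared parameter.
state = {"w": 0.0}

def step_fn():
    grad = 2 * (state["w"] - 3)   # gradient of the squared error
    state["w"] -= 0.1 * grad      # one optimizer update
    return (state["w"] - 3) ** 2  # loss after the update

steps, final_loss = train_until_converged(step_fn)
```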
Further, the training strategy of the multitask joint training comprises:
assigning the same weight to all generative dialogue tasks, implementing an average-sum strategy; and/or
scheduling the training processes of different generative dialogue tasks with a heuristic training plan, realizing a configurable manual scheduling strategy; and/or
determining loss weights for different generative dialogue tasks and using them to balance global optimization, realizing a learnable weight strategy.
The learnable weight strategies for balancing loss-weight global optimization include: homoscedastic uncertainty weighting (HWU) and gradient normalization (GradNorm).
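A minimal sketch of homoscedastic uncertainty weighting, assuming the common log-variance parameterization (total loss = Σ exp(−s_i)·L_i + s_i, with s_i = log σ_i²); this is the standard formulation, not necessarily the patent's exact variant:

```python
import math

def hwu_total_loss(task_losses, log_vars):
    """Homoscedastic uncertainty weighting: each task loss L_i is scaled by
    exp(-s_i), and s_i is added as a regularizer so the learned variances
    cannot grow without bound. The s_i are trained jointly with the model."""
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

total = hwu_total_loss([1.0, 2.0], [0.0, 0.0])  # with s_i = 0 this is just L1 + L2
```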
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the unified conversation understanding method in any method embodiment;
as one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
converting the dialogue understanding tasks of various categories into a plurality of generative dialogue tasks according to a uniform generative paradigm;
and inputting the generative dialogue tasks into the task unified generative model, and outputting task identifiers and task answers corresponding to the generative dialogue tasks, so as to solve the dialogue understanding tasks of each category.
The non-volatile computer-readable storage medium may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the unified dialog understanding method of any of the method embodiments described above.
Fig. 14 is a schematic hardware structure diagram of an electronic device of a unified dialog understanding method according to another embodiment of the present application, and as shown in fig. 14, the electronic device includes:
one or more processors 1410 and memory 1420, with one processor 1410 being illustrated in FIG. 14. The apparatus of the unified dialog understanding method may further include: an input device 1430 and an output device 1440.
The processor 1410, memory 1420, input 1430, and output 1440 may be connected by a bus or other means, as exemplified by the bus connection in fig. 14.
Memory 1420, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the unified dialog understanding method in embodiments of the present application. The processor 1410 executes various functional applications of the server and performs data processing by running the nonvolatile software programs, instructions, and modules stored in the memory 1420, that is, implements the unified dialog understanding method of the above-described method embodiments.
The memory 1420 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, memory 1420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 1420 optionally includes memory located remotely from processor 1410, which may be connected to a mobile device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 1430 may receive input numeric or character information. The output device 1440 may include a display device such as a display screen.
The one or more modules are stored in the memory 1420 and, when executed by the one or more processors 1410, perform the unified dialog understanding method of any of the method embodiments described above.
The product can execute the method provided by the embodiments of the present application, and has functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the unified dialog understanding method of any of the embodiments of the present invention.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones, multimedia phones, functional phones, and low-end phones, among others.
(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices such devices may display and play multimedia content. Such devices include audio and video players, handheld game consoles, electronic books, as well as smart toys and portable vehicle navigation devices.
(4) Other electronic devices with data processing functions.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A unified dialog understanding method, comprising:
converting the dialogue understanding tasks of various categories into a plurality of generative dialogue tasks according to a uniform generative paradigm;
and inputting the generative dialogue tasks into a task unified generative model, and outputting task identifiers and task answers corresponding to the generative dialogue tasks, so as to solve the dialogue understanding tasks of each category.
2. The method of claim 1, wherein the converting the categories of conversation understanding tasks into a plurality of generative conversation tasks in a unified generative paradigm comprises:
converting the conversation understanding tasks of all categories into a plurality of generative conversation tasks comprising task identification, conversation content and task description according to a uniform generative paradigm, wherein the categories of the conversation understanding tasks comprise: dialog summarization, completion of dialogs, intention detection, slot filling, and dialog state tracking.
3. The method of claim 1, wherein the task uniform generative model is trained using the steps of:
converting the dialogue understanding task training data of each category into a plurality of generative dialogue task training data according to a uniform generative paradigm;
performing multi-task joint training on the task unified generative model using the plurality of generative dialogue task training data, and outputting predicted task identifiers and predicted task answers corresponding to the plurality of generative dialogue task training data, wherein the plurality of generative dialogue tasks share the same training parameters;
and optimizing the task unified generative model based on the error between the reference task identifiers and reference task answers and the predicted task identifiers and predicted task answers, ending training when the error converges.
4. The method of claim 3, wherein the training strategy of the multitask joint training comprises:
assigning the same weight to all generative dialogue tasks, implementing an average-sum strategy; and/or
scheduling the training processes of different generative dialogue tasks with a heuristic training plan, realizing a configurable manual scheduling strategy; and/or
determining loss weights for different generative dialogue tasks and using them to balance global optimization, realizing a learnable weight strategy.
5. The method of claim 3, wherein the learnable weight strategy for balancing loss-weight global optimization comprises: homoscedastic uncertainty weighting (HWU) and gradient normalization (GradNorm).
6. A unified dialog understanding framework, comprising:
the unified conversion program module is used for converting the dialogue understanding tasks of each category into a plurality of generative dialogue tasks according to a unified generative paradigm;
and the dialogue understanding program module is used for inputting the generative dialogue tasks into the task unified generative model and outputting the task identifiers and task answers corresponding to the generative dialogue tasks, so as to solve the dialogue understanding tasks of each category.
7. The dialog understanding framework of claim 6, wherein the unified conversion program module is to:
converting the conversation understanding tasks of all categories into a plurality of generative conversation tasks comprising task identification, conversation content and task description according to a uniform generative paradigm, wherein the categories of the conversation understanding tasks comprise: dialog summarization, completion of dialogs, intention detection, slot filling, and dialog state tracking.
8. The dialog understanding framework of claim 6, wherein the task uniform generative model is trained using the steps of:
converting the dialogue understanding task training data of each category into a plurality of generative dialogue task training data according to a uniform generative paradigm;
performing multi-task joint training on the task unified generative model using the plurality of generative dialogue task training data, and outputting predicted task identifiers and predicted task answers corresponding to the plurality of generative dialogue task training data, wherein the plurality of generative dialogue tasks share the same training parameters;
and optimizing the task unified generative model based on the error between the reference task identifiers and reference task answers and the predicted task identifiers and predicted task answers, ending training when the error converges.
9. The dialog understanding framework of claim 8, wherein the training strategy of the multitask joint training comprises:
assigning the same weight to all generative dialogue tasks, implementing an average-sum strategy; and/or
scheduling the training processes of different generative dialogue tasks with a heuristic training plan, realizing a configurable manual scheduling strategy; and/or
determining loss weights for different generative dialogue tasks and using them to balance global optimization, realizing a learnable weight strategy.
10. The dialog understanding framework of claim 9, wherein the learnable weight strategy for balancing loss-weight global optimization comprises: homoscedastic uncertainty weighting (HWU) and gradient normalization (GradNorm).
CN202210342533.2A 2022-03-31 2022-03-31 Unified dialog understanding method and framework Pending CN114896988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210342533.2A CN114896988A (en) 2022-03-31 2022-03-31 Unified dialog understanding method and framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210342533.2A CN114896988A (en) 2022-03-31 2022-03-31 Unified dialog understanding method and framework

Publications (1)

Publication Number Publication Date
CN114896988A true CN114896988A (en) 2022-08-12

Family

ID=82714926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210342533.2A Pending CN114896988A (en) 2022-03-31 2022-03-31 Unified dialog understanding method and framework

Country Status (1)

Country Link
CN (1) CN114896988A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737172A (en) * 2023-08-11 2023-09-12 杭州初灵信息技术股份有限公司 Small particle data packet analysis system and method
CN116737172B (en) * 2023-08-11 2023-12-12 杭州初灵信息技术股份有限公司 Small particle data packet analysis system and method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination