CN116108918A - Training method and related device for dialogue pre-training model - Google Patents
Training method and related device for dialogue pre-training model
- Publication number
- CN116108918A (application number CN202211737729.8A)
- Authority
- CN
- China
- Prior art keywords
- dialogue
- training
- dialog
- model
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the application relate to the technical field of intelligent dialogue and disclose a training method and related device for a dialogue pre-training model. A neural network is first trained with a plurality of first dialogue histories in a first field to obtain a first dialogue pre-training model; the first dialogue histories are labeled with first real labels, each of which includes at least one dialogue task reflecting the semantic logic of the first dialogue history. The first dialogue pre-training model is then trained with a plurality of second dialogue histories in a second field to obtain a second dialogue pre-training model. The second dialogue histories are labeled with second real labels that have the same structure as the first real labels, and the number of second dialogue histories is smaller than the number of first dialogue histories. Through transfer learning, a well-performing second dialogue pre-training model can be obtained by training on only a small amount of labeled data in the second field, greatly reducing annotation cost.
Description
Technical Field
The embodiment of the application relates to the technical field of intelligent conversations, in particular to a training method and a related device of a conversation pre-training model.
Background
Dialog systems can generally be divided into three broad categories; the main research object of this application is the task-oriented dialogue system. Task-oriented dialogues are typically intended to serve users with explicit purposes, such as checking the weather, making phone calls, booking tickets, or ordering meals. Because user demands are complex, multiple rounds of interaction are often needed, and users may continuously modify and refine their demands during the conversation, so the task-oriented robot needs to help users clarify their goals through inquiry, clarification, and confirmation.
The pipelined TOD (task-oriented dialog) system consists of four modules: NLU (Natural Language Understanding), DST (Dialog State Tracking), DP (Dialog Policy), and NLG (Natural Language Generation).
The four modules operate in a pipelined manner, and the different modules are trained independently based on a dialogue pre-training model. The dialogue pre-training model is a pre-trained language model obtained by training on massive open-domain dialogue data in a self-supervised manner. That is, the dialogue pre-training model for each domain depends heavily on a large amount of annotation data for its training.
Disclosure of Invention
The technical problem that the embodiment of the application mainly solves is to provide a training method and a related device for a dialogue pre-training model, which can train to obtain the dialogue pre-training model with better effect under the condition of greatly reducing the labeling cost.
In a first aspect, an embodiment of the present application provides a training method for a dialog pre-training model, including:
training a neural network by adopting a plurality of first dialogue histories in the first field to obtain a first dialogue pre-training model, wherein the first dialogue histories are marked with first real labels, the first real labels comprise at least one dialogue task, and the dialogue task reflects semantic logic of the first dialogue histories;
training the first dialogue pre-training model by adopting a plurality of second dialogue histories in the second field to obtain a second dialogue pre-training model; the second dialogue history is marked with a second real label, the second real label and the first real label have the same structure, and the number of the second dialogue histories is smaller than that of the first dialogue histories.
In some embodiments, the training the neural network with the first session histories in the first fields to obtain a first session pre-training model includes:
Formatting the first dialogue history to obtain a first formatting sequence;
inputting the first formatted sequence into a neural network, and outputting a first prediction tag by the neural network based on a first database, wherein the first database is a question-answer database in the first field;
and according to the differences between the plurality of first prediction labels and the plurality of first real labels, adjusting model parameters of the neural network until the maximum likelihood function is maximized, and obtaining a first dialogue pre-training model.
In some embodiments, the foregoing formatting the first dialog history to obtain a first formatted sequence includes:
and splicing the first conversation history and each conversation task, wherein an identifier is inserted between any two of the first conversation history and each conversation task to distinguish the first conversation history and each conversation task, so as to obtain a first formatting sequence.
In some embodiments, the aforementioned first real label includes a real dialog state reflecting a topic feature of the first dialog history;
the maximum likelihood function includes a first maximum likelihood function reflecting a probability that the predicted dialog state in the first predicted tag is a true dialog state.
In some embodiments, the foregoing first real label further includes a real problem category reflecting a category to which the problem in the first conversation history belongs;
The maximum likelihood function further includes a second maximum likelihood function reflecting a probability that the predicted problem category in the first predicted tag is a true problem category.
In some embodiments, the first real label further includes a real reply reflecting a real answer of the first dialogue history;
the maximum likelihood function further includes a third maximum likelihood function reflecting a probability that the predicted reply in the first predicted tag is a true reply.
In some embodiments, the third maximum likelihood function also reflects a probability that the predicted reply in the first predicted tag is not a true reply for comparison learning.
In a second aspect, an embodiment of the present application provides a training method for a task-type dialog model, including:
acquiring a training set corresponding to a certain dialogue task;
training the dialogue pre-training model by using a training set to obtain a task type dialogue model, wherein the dialogue pre-training model is obtained by training by using a training method of the dialogue pre-training model as the first aspect.
In a third aspect, an embodiment of the present application provides a method for generating a dialog reply, including:
acquiring a dialogue context;
Inputting the dialogue context into at least one task-type dialogue model, and respectively outputting dialogue tasks by the at least one task-type dialogue model to obtain at least one dialogue task; wherein the at least one task-type dialogue model is used for predicting different dialogue tasks, and any task-type dialogue model is trained by adopting the method of the second aspect;
a dialog reply is output in accordance with the at least one dialog task.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first, second or third aspects.
In a fifth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer-executable instructions for causing an electronic device to perform a method as in the first, second, or third aspects.
The beneficial effects of the embodiment of the application are that: different from the situation of the prior art, the training method of the dialogue pre-training model provided by the embodiment of the application adopts a plurality of first dialogue histories in the first field to train the neural network to obtain the first dialogue pre-training model, wherein the first dialogue histories are marked with first real labels, the first real labels comprise at least one dialogue task, and the dialogue task reflects semantic logic of the first dialogue histories. And then training the first dialogue pre-training model by adopting a plurality of second dialogue histories in the second field to obtain a second dialogue pre-training model. The second dialogue history is marked with a second real label, the second real label and the first real label have the same structure, and the number of the second dialogue histories is smaller than that of the first dialogue histories.
In this embodiment, given a huge number of first dialogue histories in the first field, setting the first real label to include at least one dialogue task that reflects the semantic logic of the first dialogue history enables the trained first dialogue pre-training model to learn the implicit relationships between the dialogue tasks in the first dialogue histories. When the first dialogue pre-training model is then used to learn the smaller number of second dialogue histories in the second field, this ability to learn implicit relationships is transferred to the resulting second dialogue pre-training model. Even with only a small number of second dialogue histories in the second field, the second dialogue pre-training model can therefore learn these implicit relationships that reflect semantic logic. Thus, given massive dialogue data in the first field, transfer learning makes it possible to obtain a well-performing second dialogue pre-training model by training on only a small amount of labeled data in another, second field, greatly reducing annotation cost.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
Fig. 1 is a schematic view of an application scenario of a task-based dialog system according to some embodiments of the present application;
FIG. 2 is a program architecture diagram corresponding to a training method of a dialogue pre-training model according to some embodiments of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
FIG. 4 is a flow chart of a training method of a dialogue pre-training model according to some embodiments of the present application;
FIG. 5 is a schematic diagram of a conversation history according to some embodiments of the present application;
FIG. 6 is a flow chart of a training method of a task-based dialog model according to some embodiments of the present application;
fig. 7 is a flowchart of a method for generating a dialogue reply according to some embodiments of the application.
Detailed Description
The present application is described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the present application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the spirit of the present application. These are all within the scope of the present application.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that, if not conflicting, the various features in the embodiments of the present application may be combined with each other, which is within the protection scope of the present application. In addition, while functional block division is performed in a device diagram and logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. Moreover, the words "first," "second," "third," and the like as used herein do not limit the data and order of execution, but merely distinguish between identical or similar items that have substantially the same function and effect.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application in this description is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items.
In addition, technical features described below in the various embodiments of the present application may be combined with each other as long as they do not conflict with each other.
To facilitate understanding of the methods provided in the embodiments of the present application, the terms involved in the embodiments of the present application are first described:
(1) Pre-training model
'Pre-training' means first learning commonalities from a large amount of cheaply collected data through a certain training method, and then letting the model that has learned these commonalities learn a small amount of labeled data related to a specific task to obtain a model for that task. Equivalently, the commonalities are 'migrated' to the specific task, and the model is 'fine-tuned' with a small amount of labeled data associated with that task. In this way, the model only needs to learn the 'special' part of the specific task on top of the 'commonality'. The model obtained from such large-scale training is called a pre-training model.
(2) Transfer learning
Transfer learning is a machine learning method whose goal is to reuse a model developed for task A as the starting point when developing a model for task B. For example, if a certain field contains massive data, a pre-training model can be trained in that field and then, with certain techniques, applied to other fields, so that a good effect can be achieved with only a small amount of labeled data on downstream tasks in those fields; this learning method is called transfer learning.
(3) Contrast learning
Contrastive learning is a type of self-supervised learning. Its guiding principle is to automatically construct similar and dissimilar instances and learn a representation model that brings similar instances closer together in the projection space while pushing dissimilar instances farther apart, i.e., to maximize the probability of generating positive samples and minimize the probability of generating negative samples.
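As a purely illustrative sketch (not part of the claimed method), the following Python code shows the general idea described above: an encoder (assumed and not shown) maps instances into a projection space, and a margin-based loss pulls similar instances together while pushing dissimilar instances apart. The function name, margin value, and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negative, margin=1.0):
    """Pull the positive (similar) instance toward the anchor in projection
    space and push the negative (dissimilar) instance at least `margin` away.
    anchor, positive, negative: (batch, dim) embeddings from some encoder."""
    pos_dist = F.pairwise_distance(anchor, positive)   # smaller is better
    neg_dist = F.pairwise_distance(anchor, negative)   # larger is better
    return (pos_dist + F.relu(margin - neg_dist)).mean()
```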
Before describing the embodiments of the present application, a simple description of a training method of a task-type dialog model known to the inventor of the present application is provided, so that the embodiments of the present application can be understood later.
In some schemes, an end-to-end pre-training model is trained by performing an information masking process and an information recovery process on each unlabeled conversation data, and then utilizing two adaptive supervised learning tasks.
In this scheme, the black-box nature of the replies generated by the end-to-end pre-training model is obvious: the effect cannot be guaranteed, the model tends to generate meaningless replies such as 'good', and the replies can neither be configured nor controlled. The generated replies are strongly tied to the distribution of the training data and depend heavily on it. In addition, a large amount of labeled data is required to train the pre-training model in each field.
In some schemes, a first-role dialogue sentence sequence and a second-role dialogue sentence sequence are obtained from a historical dialogue record; all dialogue sentences in the two sequences are combined and recombined to obtain a first dialogue sequence; all dialogue sentences in the first dialogue sequence are randomly reordered to obtain a second dialogue sequence; word-vector superposition is performed on each word in the first and second dialogue sequences to obtain a first and a second initial word representation vector sequence; and these are input into a preset BERT model for pre-training.
In this scheme, the pre-training approach only captures grammar and syntactic structure, without interaction with or modeling of the relationships between the dialogue tasks in a task-oriented dialogue system. In addition, the pre-training model for each field still requires a large amount of labeled data to be trained separately.
In view of the above problems, an embodiment of the present application provides a training method for a dialog pre-training model, which trains a neural network by using a plurality of first dialog histories in a first field to obtain the first dialog pre-training model, where the first dialog histories are labeled with first real labels, the first real labels include at least one dialog task, and the dialog task reflects semantic logic of the first dialog histories. And then training the first dialogue pre-training model by adopting a plurality of second dialogue histories in the second field to obtain a second dialogue pre-training model. The second dialogue history is marked with a second real label, the second real label and the first real label have the same structure, and the number of the second dialogue histories is smaller than that of the first dialogue histories.
In this embodiment, given a huge number of first dialogue histories in the first field, setting the first real label to include at least one dialogue task that reflects the semantic logic of the first dialogue history enables the trained first dialogue pre-training model to learn the implicit relationships between the dialogue tasks in the first dialogue histories. When the first dialogue pre-training model is then used to learn the smaller number of second dialogue histories in the second field, this ability to learn implicit relationships is transferred to the resulting second dialogue pre-training model. Even with only a small number of second dialogue histories in the second field, the second dialogue pre-training model can therefore learn these implicit relationships that reflect semantic logic. Thus, given massive dialogue data in the first field, transfer learning makes it possible to obtain a well-performing second dialogue pre-training model by training on only a small amount of labeled data in another, second field, greatly reducing annotation cost.
The embodiment of the application also provides a training method of the task-type dialogue model, which comprises the steps of obtaining a training set corresponding to a certain dialogue task, training the dialogue pre-training model by adopting the training set, and fine-tuning model parameters to obtain the task-type dialogue model. For example, when a conversational task includes intent, the task-based conversational model may be an intent recognition model; when the dialog task includes a dialog state, the task-type dialog model may be a dialog state prediction model. In this embodiment, the dialogue pre-training model is applied to the downstream dialogue task, so that the accuracy of the task-type dialogue model can be greatly improved.
The embodiments of the application also provide a dialogue reply generation method: the dialogue context is acquired and input into at least one task-type dialogue model, such as an intention recognition model or a dialogue state prediction model, each of which outputs a dialogue task, such as the intention or dialogue state of the dialogue context. Finally, a dialogue reply is output in accordance with the at least one dialogue task. In this embodiment, the task-type dialogue models used to generate the dialogue reply have higher accuracy, so the dialogue reply also has higher accuracy.
Exemplary applications provided by embodiments of the present application for training a dialog pre-training model or for training a task-type dialog model or for generating dialog replies are described below. The electronic device provided by the embodiment of the application may be a server, for example, a server deployed in a cloud. The electronic device provided by some embodiments of the present application may be a notebook computer, a desktop computer, or a mobile device, and other various types of terminals.
As an example, referring to fig. 1, fig. 1 is an application scenario schematic diagram of a task-type dialog system provided in an embodiment of the present application. The terminal 10 is connected to the server 20 via a network, which may be a wide area network or a local area network, or a combination of both.
The terminal 10 may be used to acquire training sets and construct neural networks, for example, by those skilled in the art downloading the prepared training sets on the terminal and constructing the network structure of the neural network. It will be appreciated that the terminal 10 may also be used to obtain dialog context, for example, the user entering dialog context via a microphone on the terminal 10.
In some embodiments, the terminal 10 locally executes the training method of the dialogue pre-training model provided in the embodiments of the present application to complete training of the designed neural network by using the training set, and determines the final model parameters, so that the neural network configures the final model parameters, and the dialogue pre-training model can be obtained. In some embodiments, the terminal 10 may also send the training set and the constructed neural network stored on the terminal by the person skilled in the art to the server 20 through the network, the server 20 receives the training set and the neural network, trains the designed neural network with the training set, determines the final model parameters, and then sends the final model parameters to the terminal 10, and the terminal 10 saves the final model parameters, so that the neural network configuration can obtain the session pre-training model from the final model parameters.
Referring to fig. 2, in some embodiments, the program architecture corresponding to the training method of the dialogue pre-training model includes a data preprocessing module, a state tracking prediction module, a reply term prediction module, a dialogue reply generation prediction module, and a joint training module. The data preprocessing module is used to process the data and convert the dialogue data into the input format required by the neural network. The state tracking prediction module is used to execute the task of dialogue state tracking, i.e., to output a predicted dialogue state. The reply term prediction module is used to execute the task of reply term prediction, i.e., to output a predicted reply. The dialogue reply generation prediction module is used to execute the task of problem category prediction, i.e., to output a predicted problem category. The joint training module is used to jointly model and train the above three tasks on the training data to obtain the dialogue pre-training model.
Next, the structure of the electronic device in the embodiment of the present application is described, and fig. 3 is a schematic structural diagram of the electronic device 500 in the embodiment of the present application, where the electronic device 500 includes at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The various components in electronic device 500 are coupled together by bus system 540. It is appreciated that the bus system 540 is used to enable connected communications between these components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to the data bus. The various buses are labeled as bus system 540 in fig. 3 for clarity of illustration.
The processor 510 may be an integrated circuit chip with signal processing capabilities, such as a general purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP, Digital Signal Processor), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
The user interface 530 includes one or more output devices 531 that enable presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 530 also includes one or more input devices 532, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
Memory 550 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 550 described in the embodiments herein is intended to comprise any suitable type of memory. Memory 550 may optionally include one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
network communication module 552 for accessing other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 include Bluetooth, wireless compatibility authentication (WiFi), and universal serial bus (USB, universal Serial Bus), among others;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
the input processing module 554 is configured to detect one or more user inputs or interactions from one of the one or more input devices 532 and translate the detected inputs or interactions.
From the foregoing, it will be appreciated that the training method of the dialog pre-training model provided in the embodiments of the present application may be implemented by various types of electronic devices having processing capabilities, for example, by a processor of an electronic device or by other devices having computing processing capabilities. Other devices with computing processing capabilities may be smart terminals or servers or the like communicatively coupled to the electronic device.
The training method of the dialogue pre-training model provided by the embodiment of the application is described below in connection with exemplary application and implementation of the electronic device provided by the embodiment of the application. Referring to fig. 4, fig. 4 is a flowchart illustrating a training method of a dialogue pre-training model according to an embodiment of the present application. It will be appreciated that the execution subject of the training method of the dialog pre-training model, the training method of the task-type dialog model, or the dialog reply generation method may be one or more processors of the electronic device.
Referring again to fig. 4, the method S100 may specifically include the following steps:
s10: training the neural network by adopting a plurality of first dialogue histories in the first field to obtain a first dialogue pre-training model.
The first dialogue history is marked with a first real label, and the first real label comprises at least one dialogue task which reflects semantic logic of the first dialogue history.
It will be appreciated that the first dialog history is generated when the robot dialogues with the user and is stored in the first database. When a large number of first dialogue histories are accumulated, the first database stores the large number of first dialogue histories. It is to be appreciated that the first dialog history is related to a first field, such as beauty, health or fitness, etc.
Each first dialogue history is labeled with a first real label that includes at least one dialogue task, where the dialogue task reflects the semantic logic of the first dialogue history. A dialogue task can be understood as an understanding that the robot needs to form of what the user says during a conversation with the user. In some embodiments, the dialogue task may be the domain of the user's utterance, an intent, a semantic slot, or the like, where a semantic slot refers to a well-defined attribute of an entity.
It is to be appreciated that the first real label of the first dialogue history may be annotated by one of ordinary skill in the art based on the first dialogue history and the first database. The first database is a question-answer database of the first field that stores a large number of first dialogue histories and at least one dialogue task corresponding to each of them. A person skilled in the art looks up, in the first database, at least one dialogue task that matches the first dialogue history in order to annotate it. The robot may be a terminal with a voice question-answering function, such as a Bluetooth speaker or a smartphone.
After a large number of first dialogue histories marked with the first real labels are obtained, training the neural network by adopting the large number of first dialogue histories to obtain a first dialogue pre-training model. In some embodiments, the neural network may be an Albert network, a Bert network, or an autoregressive network (e.g., a GPT network), or the like.
In some embodiments, the neural network performs supervised training learning by gradient back propagation. After learning the first dialogue histories in the first fields, the neural network continuously adjusts model parameters until convergence, and takes the model parameters corresponding to the convergence as final model parameters to obtain a first dialogue pre-training model. It is to be understood that the training process of the neural network is common knowledge in the field of machine learning and is not described in detail herein.
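A minimal sketch of such supervised training by gradient back-propagation is given below, assuming a generic PyTorch model, a data loader of (formatted sequence, real label) pairs, and a loss function; all names are placeholders rather than the actual implementation, and a fixed number of epochs stands in for a convergence test.

```python
import torch

def train_until_converged(model, data_loader, loss_fn, epochs=10, lr=1e-4):
    """Forward pass, loss, backward pass, parameter update, repeated until
    training stops improving (approximated here by a fixed epoch budget)."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, labels in data_loader:
            optimizer.zero_grad()
            predictions = model(inputs)          # predicted labels
            loss = loss_fn(predictions, labels)  # difference from real labels
            loss.backward()                      # back-propagate the error
            optimizer.step()                     # adjust model parameters
    return model.state_dict()                    # final model parameters
```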
In some embodiments, the foregoing step S10 specifically includes:
s11: and formatting the first dialogue history to obtain a first formatting sequence.
It will be appreciated that the first dialog history includes not only questions and answers between the user and the robot, but also at least one dialog task for the annotation. To facilitate the neural network to learn a number of first dialog histories, in this embodiment, the first dialog histories are formatted into a structured first formatting sequence.
In some embodiments, the step S11 specifically includes: and performing splicing processing on the first dialogue history and each dialogue task, wherein identifiers are inserted between any two of the first dialogue history and each dialogue task to distinguish the first dialogue history and each dialogue task, so as to obtain a first formatting sequence.
Wherein the identifier may be a specific string and/or symbol, etc. For illustration, suppose the first real label annotated on the first dialogue history includes 3 dialogue tasks. In some embodiments, the first dialogue history s and the 3 dialogue tasks are spliced into a first formatting sequence x = (s, c, b, r), where s represents the first dialogue history, c represents the true reply, b represents the true dialogue state, and r represents the true problem category. It will be appreciated that each item in x may be viewed as a module sequence, and in some embodiments each item is concatenated with a specific identifier. For example, referring to fig. 5, in the first dialogue history s, "Chatbot" is added as an identifier at the beginning of each sentence spoken by the robot, and "User" is added as an identifier at the beginning of each sentence spoken by the person. For the real dialogue state b, "→state" is added at its beginning as an identifier and "[EOB]" at its end. For the true reply c, "[DB]" is added at its beginning as an identifier and "[EOKB]" at its end. For the real problem category r, "[EOS]" is added at its end as an identifier.
In this way, the spliced first formatted sequence x is used as the input of the neural network, and the neural network can know the start and end positions of each module sequence, so that it can better learn the characteristics of each module sequence, which speeds up the convergence of the neural network.
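A sketch of this splicing step, following the identifier scheme of fig. 5 ("Chatbot"/"User" prefixes, "→state", "[EOB]", "[DB]", "[EOKB]", "[EOS]"); the function name, argument structure, module ordering, and example content are illustrative assumptions only.

```python
def build_formatted_sequence(history, dialog_state, reply, problem_category):
    """Splice the first dialogue history s and the dialogue tasks (state b,
    reply c, problem category r) into one formatted sequence x, inserting
    identifiers so the model can locate each module sequence."""
    turns = []
    for speaker, utterance in history:            # e.g. [("User", "..."), ("Chatbot", "...")]
        prefix = "Chatbot" if speaker == "Chatbot" else "User"
        turns.append(f"{prefix} {utterance}")
    s = " ".join(turns)
    b = f"→state {dialog_state} [EOB]"            # real dialogue state
    c = f"[DB] {reply} [EOKB]"                    # real reply
    r = f"{problem_category} [EOS]"               # real problem category
    return " ".join([s, b, c, r])

# Hypothetical example:
x = build_formatted_sequence(
    history=[("User", "What skin type do I have?")],
    dialog_state="domain=beauty intent=ask_skin_type",
    reply="neutral skin",
    problem_category="question function",
)
```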
S12: the first formatted sequence is input to a neural network that outputs a first predictive tag based on a first database, wherein the first database is a question-answer database of a first domain.
The first formatted sequence is used as the input of the neural network; the neural network performs spatial mapping transformations on it, extracts and learns the characteristics of each module sequence in the first formatted sequence, searches the first database for each dialogue task corresponding to the first dialogue history, and outputs these dialogue tasks as the first prediction label.
It will be appreciated that the first database is a question-answer database of the first domain, storing a plurality of first dialog histories, and at least one dialog task corresponding to each of the first dialog histories.
S13: and according to the differences between the plurality of first prediction labels and the plurality of first real labels, adjusting model parameters of the neural network until the maximum likelihood function is maximized, and obtaining a first dialogue pre-training model.
To back-propagate the prediction error of the neural network and constrain its prediction labels to move toward the real labels, the total difference (i.e., the total loss) between the first prediction labels and the first real labels can be computed, and the model parameters of the neural network are adjusted according to this loss. As training proceeds, the first prediction labels get closer and closer to the first real labels, the total loss decreases, and the maximum likelihood function is maximized, which indicates that the neural network has converged; the model parameters at that point are taken as the final model parameters, yielding the first dialogue pre-training model.
The maximum likelihood function is used to reflect the likelihood that the first prediction label is the first real label under the model parameters θ. It will be appreciated that when the maximum likelihood function reaches its maximum, the probability that the first prediction label output by the neural network under the model parameters θ is the first real label is the largest.
In this embodiment, a first dialogue pre-training model with higher prediction accuracy can be obtained by formatting the first dialogue history, inputting the resulting first formatted sequence into the neural network for training, and using the maximum likelihood function to constrain the neural network to train toward maximizing the likelihood and minimizing the loss.
In some embodiments, the first real label includes a real dialogue state that reflects a topic characteristic of the first dialogue history. In some embodiments, the real dialogue state may include the domain, intent, or semantic slot to which the first dialogue history belongs, or the like. For example, referring again to fig. 5, when the first dialogue history includes a dialogue asking about skin type, the domain in the dialogue state may be "beauty", the intention may be "ask skin type", and the semantic slot may be "skin slot".
In this embodiment, the maximum likelihood function comprises a first maximum likelihood function reflecting a probability that the predicted dialog state in the first predicted tag is a true dialog state.
It will be appreciated that the first maximum likelihood function is for a dialog state, calculating the likelihood that the predicted dialog state is a true dialog state under the model parameters θ.
In some embodiments, the first maximum likelihood function is as follows:
L_B = Σ_{t=1}^{T_b} log pθ(b_t | b_{<t}, s)
where b is the real dialogue state, s is the first dialogue history, T_b is the length of the dialogue state sequence, and t is the index into the dialogue state sequence. b_t is the t-th item of the real dialogue state, b_{<t} denotes the items of the real dialogue state before the t-th item, and pθ(b_t | b_{<t}, s) represents the probability of predicting the t-th item of the real dialogue state given the preceding items of the real dialogue state and the first dialogue history s.
In this embodiment, the first maximum likelihood function L_B maximizes the probability of obtaining the real dialogue state given the first dialogue history, that is, it maximizes the probability that the predicted dialogue state is the real dialogue state, so that the predicted dialogue state in the first prediction label output by the neural network under the model parameters θ is most likely the real dialogue state. The first dialogue pre-training model obtained after training can thus accurately search for and track the dialogue state in the first database based on the dialogue question.
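The sum-of-log-probabilities form of L_B corresponds to a standard token-level log-likelihood. A sketch is shown below, assuming the model returns per-position logits over a vocabulary for the dialogue-state segment (conditioning on s and b_{<t} is assumed to happen inside the model); the same computation applies to L_R with the problem-category segment.

```python
import torch
import torch.nn.functional as F

def state_log_likelihood(logits, state_token_ids):
    """L_B = sum_t log p_theta(b_t | b_<t, s).

    logits:          (T_b, vocab_size) scores for each position of the
                     dialogue-state sequence.
    state_token_ids: (T_b,) the real dialogue-state tokens b_t.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    positions = torch.arange(state_token_ids.shape[0])
    return log_probs[positions, state_token_ids].sum()  # maximize this value
```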
In some embodiments, the first real label further includes a real problem category reflecting the category to which the question in the first conversation history belongs. For example, referring again to FIG. 5, when the first dialogue history includes a dialogue about skin type, the real problem category may be "question function".
In this embodiment, the maximum likelihood function further comprises a second maximum likelihood function reflecting the probability that the predicted problem category in the first predicted tag is a true problem category.
It will be appreciated that the second maximum likelihood function is for a problem category, calculating the likelihood that the predicted problem category is a true problem category under the model parameters θ.
In some embodiments, the second maximum likelihood function is as follows:
L_R = Σ_{t=1}^{T_r} log pθ(r_t | r_{<t}, c, b, s)
where b is the real dialogue state, s is the first dialogue history, c is the real reply, T_r is the length of the problem category sequence, and t is the index into the problem category sequence. r_t is the t-th item of the real problem category, r_{<t} denotes the items of the real problem category before the t-th item, and pθ(r_t | r_{<t}, c, b, s) represents the probability of predicting the t-th item of the real problem category given the preceding items of the real problem category, the real reply c, the real dialogue state b, and the first dialogue history s.
In this embodiment, through the above second maximum likelihood function, the probability that the predicted problem category is the true problem category is maximized given the first dialogue history, the true dialogue state, and the true reply, so that the predicted problem category in the first prediction label output by the neural network under the model parameters θ is most likely the true problem category. The first dialogue pre-training model obtained after training can then, based on a dialogue question and the first database, accurately determine the category to which the question belongs.
In some embodiments, the first real label further includes a real reply reflecting the real answer of the first dialogue history. For example, referring again to fig. 5, when the first dialogue history includes a dialogue about skin type, the real reply may be "neutral skin texture".
In this embodiment, the maximum likelihood function further comprises a third maximum likelihood function reflecting a probability that the predicted reply in the first predicted tag is a true reply.
It will be appreciated that the third maximum likelihood function is for a reply, calculating the likelihood that the predicted reply is a true reply at the model parameters θ.
In some embodiments, the third maximum likelihood function also reflects a probability that the predicted reply in the first predicted tag is not a true reply for comparison learning.
For example, if the first dialogue history includes a dialogue asking about skin type and the neural network, after querying the first database, predicts the reply entity "neutral skin", then "neutral skin" can be taken as a positive sample, and an entity can be randomly drawn from the first database as a negative sample for contrastive learning against the positive sample. In this way, the probability of the positive sample is maximized and the probability of the negative sample is minimized, so the trained first dialogue pre-training model is better at distinguishing positive from negative samples and can more easily find the correct reply to a dialogue question.
In some embodiments, the third maximum likelihood function is as follows:
L_C = y·log(pθ(x)) + (1−y)·log(1−pθ(x′))
where, for a positive sample, the entity in x is the correct entity, and for a negative sample, the entity in x′ is an erroneous entity. y indicates whether the entity is correct: y = 1 when the entity is correct, and y = 0 when it is not.
In this embodiment, through the above third maximum likelihood function, the positive sample and the negative sample are adopted to perform contrast learning, so as to maximize the probability of the positive sample and minimize the probability of the negative sample, so that the first dialogue pre-training model obtained by training has higher capability in distinguishing the positive sample from the negative sample, and can easily find the correct reply of the dialogue problem.
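A sketch of this positive/negative contrast, treating L_C as a binary log-likelihood over scalar scores that the model assigns to the correct entity x and to a randomly drawn incorrect entity x′; the scoring interface is an assumed placeholder.

```python
import torch

def entity_contrast_loss(pos_score, neg_score):
    """L_C = y*log(p(x)) + (1-y)*log(1 - p(x')), with y = 1 for the correct
    entity x and y = 0 for the randomly drawn incorrect entity x'.
    pos_score / neg_score: scalar tensors output by the model for x and x'."""
    p_pos = torch.sigmoid(pos_score)   # probability assigned to the correct entity
    p_neg = torch.sigmoid(neg_score)   # probability assigned to the wrong entity
    likelihood = torch.log(p_pos) + torch.log(1.0 - p_neg)
    return -likelihood.mean()          # minimizing this maximizes L_C
```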
In some embodiments, the maximum likelihood function is the sum of the first maximum likelihood function L_B, the second maximum likelihood function L_R, and the third maximum likelihood function L_C. In this embodiment, dialogue states, replies, and problem categories are jointly trained with the goal of maximizing the value of this maximum likelihood function. Therefore, the first dialogue pre-training model obtained through training has higher accuracy in predicting dialogue states, replies, and problem categories.
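A short sketch of the joint objective, reusing the terms sketched above; summing the three terms is an assumption consistent with the description, and in practice the negative of the likelihood terms is minimized.

```python
def joint_loss(l_b, l_r, l_c_loss):
    """Joint objective: maximize L_B + L_R + L_C. l_b and l_r are the
    (positive) log-likelihoods of the state and problem-category segments,
    l_c_loss is the already-negated contrastive term returned by
    entity_contrast_loss, so the whole expression is minimized."""
    return -(l_b + l_r) + l_c_loss
```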
The first real label is set to comprise at least one dialogue task reflecting semantic logic of the first dialogue history, so that the trained first dialogue pre-training model can learn the mutual implicit relation among all dialogue tasks in the first dialogue history.
S20: training the first dialogue pre-training model by using second dialogue histories in a plurality of second fields to obtain a second dialogue pre-training model.
The second dialogue history is marked with a second real label, the second real label and the first real label have the same structure, and the number of the second dialogue histories is smaller than that of the first dialogue histories.
It will be appreciated that the second session history is generated when the robot is in session with the user and is stored in the second database. The number of second dialog histories is relatively small, less than the number of first dialog histories. It is understood that the second dialog history is content pertaining to a second domain, which is different from the first domain.
Each second dialogue history is labeled with a second real label that has the same structure as the first real label, i.e., the second real label also includes at least one dialogue task, where the dialogue task reflects the semantic logic of the second dialogue history. A dialogue task can be understood as an understanding that the robot needs to form of what the user says during a conversation with the user. In some embodiments, the dialogue task may be the domain of the user's utterance, an intent, a semantic slot, or the like, where a semantic slot refers to a well-defined attribute of an entity.
It is to be appreciated that the second authentic signature of the second dialog history may be annotated by one of ordinary skill in the art based on the second dialog history and the second database. The second database is a question-answer database of the second domain, stores a relatively small number of second dialogue histories, and at least one dialogue task corresponding to each second dialogue history. The person skilled in the art searches the second database for at least one dialogue task that matches the second dialogue history to annotate the second dialogue history.
Training the first dialogue pre-training model with the plurality of second dialogue histories in the second field yields the second dialogue pre-training model. It will be appreciated that the implicit relationships between the individual dialogue tasks in the first dialogue histories have already been learned by the first dialogue pre-training model. This ability to learn implicit relationships can therefore be transferred to the second dialogue histories in the second field: during training, only a small number of second dialogue histories are needed to fine-tune the parameters of the first dialogue pre-training model and obtain the second dialogue pre-training model, which can then parse from a dialogue the individual dialogue tasks together with their specific implicit mutual relationships.
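A sketch of this transfer step, assuming the parameters of the first dialogue pre-training model are available as a state dict and reusing the generic loop shown earlier; the dataset name, learning rate, and epoch count are illustrative assumptions.

```python
import torch

def finetune_to_second_domain(model, pretrained_params, second_domain_loader,
                              loss_fn, epochs=3, lr=1e-5):
    """Initialize from the first dialogue pre-training model and fine-tune on
    the much smaller set of second-field dialogue histories."""
    model.load_state_dict(pretrained_params)                  # transfer learned parameters
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # small LR for fine-tuning
    for _ in range(epochs):
        for inputs, labels in second_domain_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()
            optimizer.step()
    return model   # second dialogue pre-training model
```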
In summary, the embodiment of the application provides a training method of a dialogue pre-training model, which trains a neural network by using a plurality of first dialogue histories in a first field to obtain the first dialogue pre-training model, wherein the first dialogue histories are marked with first real labels, the first real labels comprise at least one dialogue task, and the dialogue task reflects semantic logic of the first dialogue histories. And then training the first dialogue pre-training model by adopting a plurality of second dialogue histories in the second field to obtain a second dialogue pre-training model. The second dialogue history is marked with a second real label, the second real label and the first real label have the same structure, and the number of the second dialogue histories is smaller than that of the first dialogue histories.
Given a huge number of first dialogue histories in the first field, setting the first real label to include at least one dialogue task that reflects the semantic logic of the first dialogue history enables the trained first dialogue pre-training model to learn the implicit relationships between the dialogue tasks in the first dialogue histories. When the first dialogue pre-training model is then used to learn the smaller number of second dialogue histories in the second field, this ability is transferred to the resulting second dialogue pre-training model. Even with only a small number of second dialogue histories in the second field, the second dialogue pre-training model can therefore learn these implicit relationships that reflect semantic logic. Thus, given massive dialogue data in the first field, transfer learning makes it possible to obtain a well-performing second dialogue pre-training model by training on only a small amount of labeled data in another, second field, greatly reducing annotation cost.
The embodiment of the application further provides a training method of the task-type dialogue model, referring to fig. 6, the training method S200 of the task-type dialogue model at least includes the following steps:
s201: and acquiring a training set corresponding to a certain dialogue task.
S202: training the dialogue pre-training model with the training set and fine-tuning the model parameters to obtain the task-type dialogue model. Specifically, the second dialogue pre-training model in the above embodiment is used to learn the labeled data of a certain dialogue task; the task-type dialogue model is obtained by predicting, computing the loss and gradients, and updating the model parameters. Thus, the task-type dialogue model learns information specific to that dialogue task, namely the ability to predict the dialogue task accurately.
For example, when a conversational task includes intent, the task-based conversational model may be an intent recognition model; when the dialog task includes a dialog state, the task-type dialog model may be a dialog state prediction model. In this embodiment, the dialogue pre-training model is applied to the downstream dialogue task, so that the accuracy of the task-type dialogue model can be greatly improved.
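A sketch of adapting the dialogue pre-training model to a single downstream dialogue task, here intent recognition, by adding a small classification head; the encoder interface, hidden size, and label set are assumptions and not the claimed implementation.

```python
import torch.nn as nn

class IntentRecognitionModel(nn.Module):
    """Task-type dialogue model: a pre-trained dialogue encoder plus a
    classification head, fine-tuned on the training set of one dialogue task."""

    def __init__(self, pretrained_encoder, hidden_size, num_intents):
        super().__init__()
        self.encoder = pretrained_encoder             # dialogue pre-training model
        self.classifier = nn.Linear(hidden_size, num_intents)

    def forward(self, inputs):
        features = self.encoder(inputs)               # (batch, hidden_size) assumed
        return self.classifier(features)              # intent logits
```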
The embodiment of the application also provides a method for generating a dialogue reply, referring to fig. 7, the method S300 for generating a dialogue reply at least includes the following steps:
S301: the dialog context is obtained.
S302: the dialogue context is input into at least one task-type dialogue model, and the at least one task-type dialogue model outputs dialogue tasks respectively to obtain at least one dialogue task. The at least one task-type dialogue model is used to predict different dialogue tasks, and any task-type dialogue model is trained by the training method of the task-type dialogue model in the above embodiment.
The dialogue context is input into at least one task-type dialogue model, such as an intention recognition model or a dialogue state prediction model, and each model outputs a dialogue task, such as the intention or dialogue state of the dialogue context. Finally, a dialogue reply is output in accordance with the at least one dialogue task. In this embodiment, the task-type dialogue models applied to generate the dialogue reply have higher accuracy, so the dialogue reply also has higher accuracy.
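A sketch of the flow of S301–S302: feed the dialogue context to each task-type dialogue model, collect the predicted dialogue tasks, and produce a reply from them; the model interfaces and the reply builder are hypothetical.

```python
def generate_dialog_reply(context, task_models, build_reply):
    """Feed the dialogue context to each task-type dialogue model (e.g. an
    intent recognition model and a dialogue state prediction model), gather
    their predicted dialogue tasks, and output a reply based on them."""
    dialog_tasks = {name: model(context) for name, model in task_models.items()}
    return build_reply(context, dialog_tasks)         # e.g. template or generator
```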
The present application also provides a computer readable storage medium, such as a memory including program code executable by a processor to perform the training method of the dialogue pre-training model, the training method of the task-type dialogue model, or the dialogue reply generation method in the above embodiments. For example, the computer readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a compact disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, etc.
Embodiments of the present application also provide a computer program product comprising one or more program codes stored in a computer-readable storage medium. The processor of the electronic device reads the program code from the computer-readable storage medium, and the processor executes the program code to complete the method steps of the training method of the dialogue pretraining model, the training method of the task-type dialogue model, or the dialogue reply generation method provided in the above embodiments.
It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, or may be implemented by hardware. Those skilled in the art will appreciate that all or part of the processes implementing the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and where the program may include processes implementing the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. The technical features of the above embodiments, or of different embodiments, may also be combined under the idea of the present application, the steps may be implemented in any order, and many other variations of the different aspects of the present application exist as described above, which are not presented in detail for the sake of brevity. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Claims (11)
1. A method of training a dialogue pre-training model, comprising:
training a neural network using a plurality of first dialogue histories in a first field to obtain a first dialogue pre-training model, wherein the first dialogue histories are marked with first real labels, each first real label comprises at least one dialogue task, and the dialogue task reflects semantic logic of the first dialogue history; and
training the first dialogue pre-training model using a plurality of second dialogue histories in a second field to obtain a second dialogue pre-training model, wherein the second dialogue histories are marked with second real labels, the second real labels have the same structure as the first real labels, and the number of the second dialogue histories is smaller than the number of the first dialogue histories.
2. The method of claim 1, wherein training the neural network using the plurality of first dialogue histories in the first field to obtain the first dialogue pre-training model comprises:
formatting the first dialogue history to obtain a first formatted sequence;
inputting the first formatted sequence into the neural network, and outputting, by the neural network, a first predicted label based on a first database, wherein the first database is a question-answer database of the first field; and
adjusting model parameters of the neural network according to differences between a plurality of first predicted labels and the plurality of first real labels until a maximum likelihood function is maximized, to obtain the first dialogue pre-training model.
3. The method of claim 2, wherein formatting the first dialogue history to obtain the first formatted sequence comprises:
concatenating the first dialogue history and each dialogue task, wherein an identifier is inserted between any two of the first dialogue history and the dialogue tasks to distinguish them, so as to obtain the first formatted sequence.
4. The method of claim 3, wherein the first real label comprises a real dialogue state reflecting a topic characteristic of the first dialogue history; and
the maximum likelihood function comprises a first maximum likelihood function reflecting a probability that a predicted dialogue state in the first predicted label is the real dialogue state.
5. The method of claim 4, wherein the first real label further comprises a real question category reflecting a category to which a question in the first dialogue history belongs; and
the maximum likelihood function further comprises a second maximum likelihood function reflecting a probability that a predicted question category in the first predicted label is the real question category.
6. The method of claim 5, wherein the first real label further comprises a real reply reflecting a real answer to the first dialogue history; and
the maximum likelihood function further comprises a third maximum likelihood function reflecting a probability that a predicted reply in the first predicted label is the real reply.
7. The method of claim 6, wherein the third maximum likelihood function further reflects a probability that a predicted reply in the first predicted label is not the real reply, for contrastive learning.
8. A method for training a task-type dialogue model, comprising:
acquiring a training set corresponding to a dialogue task; and
training a dialogue pre-training model using the training set to obtain the task-type dialogue model, wherein the dialogue pre-training model is trained using the training method of the dialogue pre-training model according to any one of claims 1-7.
9. A method for generating a dialogue reply, comprising:
acquiring a dialogue context;
inputting the dialogue context into at least one task-type dialogue model, and outputting, by each of the at least one task-type dialogue model, a dialogue task, to obtain at least one dialogue task, wherein the at least one task-type dialogue model is configured to predict different dialogue tasks, and each task-type dialogue model is trained using the method of claim 8; and
outputting a dialogue reply according to the at least one dialogue task.
10. An electronic device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
11. A non-transitory computer readable storage medium storing computer executable instructions for causing an electronic device to perform the method of any one of claims 1-9.
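For illustration only, the following minimal Python/PyTorch sketch shows the two-stage procedure of claims 1 and 8: training on many labelled first-field dialogue histories, continuing training on far fewer labelled second-field histories with the same label structure, and then fine-tuning on the training set of one dialogue task. A toy classifier and synthetic tensors stand in for the dialogue pre-training model and the dialogue data; all names, sizes, and hyperparameters are assumptions, not part of the patent.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_stage(model, loader, epochs=3, lr=1e-3):
    """One training stage: adjust model parameters against the labelled data."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for features, labels in loader:
            loss = loss_fn(model(features), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

def toy_loader(n_samples):
    # Synthetic stand-in for labelled dialogue histories of one field.
    x = torch.randn(n_samples, 16)
    y = torch.randint(0, 4, (n_samples,))
    return DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Stage 1: many labelled first-field dialogue histories -> first dialogue pre-training model.
first_model = train_stage(model, toy_loader(1000))
# Stage 2: far fewer labelled second-field histories, same label structure
# -> second dialogue pre-training model (transfer learning, smaller learning rate).
second_model = train_stage(first_model, toy_loader(100), lr=1e-4)
# Claim 8: fine-tune on the training set of one dialogue task -> task-type dialogue model.
task_model = train_stage(second_model, toy_loader(50), lr=1e-4)
```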
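A minimal sketch of the formatting step of claim 3: the dialogue history and each dialogue task are concatenated into one sequence, with an identifier inserted between any two segments. The bracketed identifier strings below are illustrative assumptions; the claims do not specify their form.

```python
def format_first_sequence(dialogue_history, dialogue_tasks):
    """dialogue_history: list of utterance strings.
    dialogue_tasks: dict mapping a task name (e.g. "state", "category",
    "reply") to its text segment (empty for segments to be predicted)."""
    parts = ["[HISTORY]", " ".join(dialogue_history)]
    for task_name, task_text in dialogue_tasks.items():
        parts.append(f"[{task_name.upper()}]")  # identifier distinguishing this dialogue task
        parts.append(task_text)
    return " ".join(p for p in parts if p)

history = ["User: I want to book a flight to Beijing.", "Bot: For which date?"]
tasks = {"state": "", "category": "", "reply": ""}
print(format_first_sequence(history, tasks))
# -> "[HISTORY] User: ... Bot: For which date? [STATE] [CATEGORY] [REPLY]"
```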
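A hedged sketch of the multi-task objective of claims 4-7, assuming the neural network exposes a dialogue-state head, a question-category head, and a token-level reply head, and assuming the contrastive part of claim 7 is realized as a margin between the real reply and a sampled negative reply. These are modelling assumptions, not the patent's definitive formulation.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(state_logits, state_label,
                     category_logits, category_label,
                     reply_logits, reply_tokens,
                     neg_reply_logits, neg_reply_tokens,
                     margin=1.0):
    # First maximum-likelihood term: real dialogue state (claim 4).
    l_state = F.cross_entropy(state_logits, state_label)
    # Second term: real question category (claim 5).
    l_category = F.cross_entropy(category_logits, category_label)
    # Third term: token-level log-likelihood of the real reply (claim 6).
    pos_logprob = -F.cross_entropy(reply_logits.transpose(1, 2), reply_tokens)
    # Contrastive part (claim 7): the real reply should score higher than a
    # sampled negative reply by at least `margin` (an assumed hyperparameter).
    neg_logprob = -F.cross_entropy(neg_reply_logits.transpose(1, 2), neg_reply_tokens)
    l_contrast = torch.clamp(margin - (pos_logprob - neg_logprob), min=0.0)
    # Maximizing the likelihood functions is implemented as minimizing this sum.
    return l_state + l_category - pos_logprob + l_contrast
```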
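A sketch of the reply-generation flow of claim 9: the dialogue context is sent to several task-type dialogue models, each predicting a different dialogue task, and the predictions are assembled into a reply. The toy callables and the assembly rule below are assumptions standing in for trained task-type dialogue models.

```python
def generate_reply(context, task_models, assemble):
    """context: list of utterances; task_models: dict name -> callable model;
    assemble: callable turning the predicted dialogue tasks into a dialogue reply."""
    predicted_tasks = {name: model(context) for name, model in task_models.items()}
    return assemble(predicted_tasks)

# Toy stand-ins for trained task-type dialogue models.
task_models = {
    "state": lambda ctx: "flight_booking(destination=Beijing)",
    "category": lambda ctx: "travel",
}
reply = generate_reply(
    ["User: I want to book a flight to Beijing."],
    task_models,
    assemble=lambda tasks: f"Sure, handling {tasks['category']}: {tasks['state']}.",
)
print(reply)
```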
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211737729.8A | 2022-12-30 | 2022-12-30 | Training method and related device for dialogue pre-training model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116108918A (en) | 2023-05-12 |
Family
ID=86255535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211737729.8A | Training method and related device for dialogue pre-training model | 2022-12-30 | 2022-12-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116108918A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116775850A (en) * | 2023-08-24 | 2023-09-19 | 北京珊瑚礁科技有限公司 | Chat model training method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||