CN113239157A - Method, device, equipment and storage medium for training conversation model - Google Patents

Method, device, equipment and storage medium for training conversation model

Info

Publication number
CN113239157A
Authority
CN
China
Prior art keywords
knowledge
probability
reply
model
dialogue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110348055.1A
Other languages
Chinese (zh)
Other versions
CN113239157B (en)
Inventor
黄信娴
鲍思琪
何煌
王凡
吴华
何径舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110348055.1A priority Critical patent/CN113239157B/en
Publication of CN113239157A publication Critical patent/CN113239157A/en
Application granted granted Critical
Publication of CN113239157B publication Critical patent/CN113239157B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure discloses a training method, an apparatus, a device and a storage medium for a dialogue model, relating to the field of computer technology and in particular to natural language processing, human-machine dialogue, and the like. The dialogue model comprises a knowledge selection model and a reply generation model, and the training method comprises the following steps: processing a dialogue sample and a knowledge base with the knowledge selection model to determine knowledge matched with the dialogue sample and a first probability, wherein the first probability is the probability that the knowledge is selected; processing the dialogue sample and the knowledge with the reply generation model to determine a second probability corresponding to a predicted reply, wherein the second probability is the probability that the predicted reply is the reply sample; and determining a loss function based on the first probability and the second probability, and training the knowledge selection model and the reply generation model based on the loss function. The method can introduce relevant knowledge into a dialogue system, and the training approach is highly general.

Description

Method, device, equipment and storage medium for training conversation model
Technical Field
The present disclosure relates to the field of computer technology, in particular to natural language processing, human-computer dialogue, and the like, and more particularly to a method, an apparatus, a device, and a storage medium for training a dialogue model.
Background
To improve the relevance of the replies generated by a dialogue system, knowledge is typically introduced into the system. For a dialogue system that introduces knowledge, the dialogue model employed by the system includes a knowledge selection model.
In the related art, knowledge can be labeled manually and the knowledge selection model trained in a supervised manner; alternatively, unsupervised training is performed with Bag of Words (BoW) and KL divergence (Kullback-Leibler divergence) as optimization targets.
Disclosure of Invention
The disclosure provides a training method, a device, equipment and a storage medium of a dialogue model.
According to an aspect of the present disclosure, there is provided a method of training a dialogue model, the dialogue model including a knowledge selection model and a reply generation model, the method including: processing a conversation sample and a knowledge base by adopting the knowledge selection model to determine knowledge matched with the conversation sample and determine a first probability corresponding to the knowledge, wherein the first probability is the probability that the knowledge is selected; processing the dialog sample and the knowledge by adopting the reply generation model to determine a second probability corresponding to a predicted reply, wherein the second probability is the probability that the predicted reply is a reply sample; determining a loss function based on the first probability and the second probability, and training the knowledge selection model and the reply generation model based on the loss function.
According to another aspect of the present disclosure, there is provided a training apparatus of a dialogue model including a knowledge selection model and a reply generation model, the apparatus including: a knowledge selection module, configured to process a dialogue sample and a knowledge base by using the knowledge selection model to determine knowledge matched with the dialogue sample, and determine a first probability corresponding to the knowledge, where the first probability is a probability that the knowledge is selected; a reply generation module, configured to process the dialog sample and the knowledge by using the reply generation model to determine a second probability corresponding to a predicted reply, where the second probability is a probability that the predicted reply is a reply sample; a training module to determine a loss function based on the first probability and the second probability, and train the knowledge selection model and the reply generation model based on the loss function.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the above aspects.
According to the technical solution of the present disclosure, relevant knowledge can be introduced into a dialogue system, and the training approach is highly general.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an electronic device for implementing the training method of a dialogue model according to any embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In a dialogue system, it is often desirable to introduce knowledge related to the dialogue, so that the system can discuss that knowledge with the user in its replies, improving the relevance, informativeness, and interestingness of the replies.
Introducing knowledge into a dialogue system means selecting reasonable, relevant knowledge from a knowledge base according to the dialogue information and combining it with the system's reply generation. At present there are two main schemes: a supervised training scheme and an unsupervised training scheme based on posterior probability. The former labels knowledge and requires the knowledge selection model, during training, to output the knowledge labeled as correct as far as possible. The latter needs no manual labeling: it mainly exploits the posterior information contained in the current reply, takes the dialogue text as prior information, and generates a prior probability and a posterior probability of knowledge selection; a Bag of Words (BoW) objective is generally used to establish the relation between the knowledge selected according to the posterior probability and the posterior information (the reply), improving the accuracy of the posterior probability, after which the prior and posterior distributions are drawn together by means such as KL divergence.
However, both schemes have certain problems. The former is costly: owing to manpower and time constraints, manual labeling cannot exhaust all reasonable knowledge, so knowledge that relies solely on manual labeling is one-sided. The latter is difficult to train and often requires considerable experience and skill to converge well; for example, the training time cost is high, and the training approach is not general enough and reproduces poorly, because certain word-frequency characteristics of the data set need to be cleaned, the BoW model needs to be pre-trained with the posterior information on the data set, and so on.
To alleviate at least one of the above problems to some extent, embodiments of the present disclosure provide a training method for a dialogue model. The training method is an unsupervised training method, so it avoids the heavy manual annotation required by the supervised training method.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure, which provides a training method of a dialogue model including a knowledge selection model and a reply generation model. As shown in fig. 1, the method includes:
101. and processing the dialogue sample and the knowledge base by adopting the knowledge selection model to determine knowledge matched with the dialogue sample and determine a first probability corresponding to the knowledge, wherein the first probability is the probability of the knowledge being selected.
102. And processing the conversation sample and the knowledge by adopting the reply generation model to determine a second probability corresponding to the predicted reply, wherein the second probability is the probability that the predicted reply is selected as the reply sample.
103. Determining a loss function based on the first probability and the second probability, and training the knowledge selection model and the reply generation model based on the loss function.
A dialogue process generally proceeds as follows: the dialogue system obtains dialogue information (context), processes it with the dialogue model, and generates a reply (response). The dialogue information, which may also be called the context or the preceding text, refers to the information generated during the dialogue, for example the query currently input by the user; since a dialogue generally spans multiple turns, the dialogue information may also include previously occurring dialogue content.
As shown in fig. 2, the dialogue model may include a knowledge selection model 201 and a reply generation model 202. During a dialogue, the input of the knowledge selection model 201 includes the dialogue information (c) and a knowledge base used to store knowledge, and its output is the single (top-1) piece of knowledge (k) that best matches the input dialogue information; the input of the reply generation model 202 is this knowledge together with the dialogue information, and its output is the reply (r).
Knowledge refers to information that is valuable for generating replies. It may be stored in a knowledge base and may cover a variety of domains, such as weather, entertainment, intelligent customer service, and traffic navigation. In the entertainment domain, for example, knowledge may include information about the actors, director, and ratings of a certain movie.
To distinguish them from the dialogue information (context) and replies (responses) of the dialogue process, the corresponding items in the training phase are referred to as the dialogue sample (context sample) and the reply sample (response sample); in addition, a reply generated from the dialogue sample and the knowledge during training may be referred to as a predicted reply.
In the training phase, one input of the knowledge selection model is the dialogue sample and the other is a pre-configured knowledge base containing multiple pieces of knowledge. Through the processing of the knowledge selection model, the top-k pieces of knowledge matched with the dialogue sample can be output, that is, k pieces of knowledge are selected in descending order of matching value, where k is a configurable value. The knowledge selection model may include an encoding model (encoder) with trainable parameters, for example the encoder of a Transformer model, which encodes the inputs (the dialogue sample and the knowledge) into corresponding vectors so that knowledge can be selected based on these vectors. In addition, a first probability corresponding to each selected knowledge may be calculated from its matching value, for example as a normalized value obtained by normalizing the matching value.
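The following is a minimal sketch, for illustration only, of how such a dual-encoder selection step could be realized; it assumes PyTorch-style encoders that map a text to a fixed-size vector, and all names (select_knowledge, context_encoder, knowledge_encoder) are hypothetical rather than taken from the patent:

```python
import torch
import torch.nn.functional as F

def select_knowledge(context_encoder, knowledge_encoder, dialogue, knowledge_base, top_k=4):
    """Encode the dialogue sample and every knowledge entry, score each entry by
    inner product, keep the top-k entries, and turn their matching values into
    selection probabilities (the first probability p(k_i | c))."""
    rep_c = context_encoder(dialogue)                                    # (dim,)
    rep_k = torch.stack([knowledge_encoder(k) for k in knowledge_base])  # (N, dim)
    scores = rep_k @ rep_c                                               # inner products, shape (N,)
    top_scores, top_idx = scores.topk(top_k)                             # k best matches
    first_prob = F.softmax(top_scores, dim=-1)                           # normalized matching values
    return top_idx, first_prob
```

Here the normalization is taken over the k retained scores; the patent leaves open whether the softmax runs over the selected knowledge only or over the whole knowledge base.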
After the k pieces of knowledge are obtained, the dialogue sample and the k pieces of knowledge may be used as input of the reply generation model, which outputs probabilities corresponding to the predicted reply, including the probability that the predicted reply is the reply sample; this probability may be referred to as the second probability. It should be noted that in the training stage, subsequent processing is performed directly on the second probability, which does not need to be mapped to a concrete reply; in the dialogue process, i.e., in the application stage, the text with the highest probability may be selected as the reply after the reply generation model has produced the probabilities of the candidate replies. The reply generation model may be a deep neural network model with trainable parameters, such as a Transformer model.
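As one plausible realization (an assumption, not the patent's exact implementation), the second probability can be computed under teacher forcing by summing the per-token log-probabilities that the model assigns to the reply sample; reply_log_prob and generation_model are hypothetical names:

```python
import torch.nn.functional as F

def reply_log_prob(generation_model, dialogue, knowledge, reply_ids):
    """Log of the second probability, log p(r | c, k): the likelihood the reply
    generation model assigns to the reply sample, summed over its tokens.
    Assumes the model returns, for each reply position, logits over the vocabulary."""
    logits = generation_model(dialogue, knowledge, reply_ids)             # (T, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, reply_ids.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum()
```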
When the first probability and the second probability are obtained, a loss function may be determined based on the first probability and the second probability. The loss function may be a marginal loss function, formulated as:

$$\mathcal{L} = -\log \sum_{i=1}^{k} p(k_i \mid c)\, p(r \mid c, k_i)$$

wherein p(k_i | c) is the first probability, p(r | c, k_i) is the second probability, r is the reply sample, c is the dialogue sample, and k_i is the i-th knowledge.
After the loss function is obtained, the dialogue model may be trained based on it. Since the dialogue model includes the knowledge selection model and the reply generation model, the two models may be trained jointly: the parameters of both models are adjusted until an optimization goal determined based on the loss function is reached, rather than defining a separate loss function for the knowledge selection model and training it on its own.
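A compact sketch of this joint objective, computed in log space for numerical stability, could look as follows; it reuses the hypothetical helpers above, and marginal_loss is likewise an assumed name:

```python
import torch

def marginal_loss(log_p_k, log_p_r):
    """Negative log of sum_i p(k_i | c) * p(r | c, k_i).
    log_p_k: (top_k,) log first probabilities from the knowledge selection model.
    log_p_r: (top_k,) log second probabilities from the reply generation model."""
    return -torch.logsumexp(log_p_k + log_p_r, dim=-1)
```

Because both probabilities enter the same scalar loss, a single backward pass propagates gradients into the knowledge selection model and the reply generation model at the same time.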
In this embodiment, knowledge is introduced into the dialogue system through the processing of the knowledge selection model. When knowledge is introduced, the loss function is determined based on the first probability, produced by the knowledge selection model, and the second probability, produced by the reply generation model, so that the two models can be trained jointly. This avoids the heavy manual annotation required when the knowledge selection model is trained alone; moreover, because the second probability is tied to the reply, the reply itself serves as the optimization target, which makes the training approach more general and more reproducible than approaches that use BoW and KL divergence as optimization targets.
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure, which provides a training method of a dialogue model including a knowledge selection model and a reply generation model. As shown in fig. 3, the method includes:
301. and constructing a training corpus.
Wherein, the corpus can be collected from historical conversations, and each set of corpus can be expressed as < conversation sample, reply sample >.
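Purely as an illustration of this structure (the placeholder strings below are invented, not data from the patent), the corpus could be represented as a list of pairs:

```python
# Each entry is one <dialogue sample, reply sample> pair collected from historical conversations.
corpus = [
    ("<dialogue sample: the conversation history plus the current user query>",
     "<reply sample: the reply actually given in that conversation>"),
]
```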
302. Encode the dialogue sample into a dialogue vector using the encoding model in the knowledge selection model, and encode each knowledge in the knowledge base into a knowledge vector.
303. Determine the inner product value of the dialogue vector and each knowledge vector.
304. Select a preset number of knowledge entries in descending order of inner product value, and determine them as the knowledge matched with the dialogue sample.
305. Normalize the inner product value corresponding to each selected knowledge to obtain a normalized value, and determine the normalized value as the first probability corresponding to that knowledge.
As shown in fig. 4, which is a schematic structural diagram of the knowledge selection model, the knowledge selection model may include an encoding model 401. The input of the encoding model 401 includes the dialogue sample and each knowledge in the knowledge base, and through its processing these inputs are converted into corresponding vectors, which may be called the dialogue vector (Rep-c) and the knowledge vectors (Rep-k). The inner product value of the dialogue vector and each knowledge vector can then be computed, and a preset number k of knowledge entries are selected, in descending order of inner product value, as the knowledge matched with the dialogue sample.
For the k matched knowledge entries, the first probability corresponding to each one may be the normalized value of its inner product value. For example, assume the i-th knowledge is k_i and the dialogue sample is represented by c; after the inner product value of k_i and c is computed, softmax processing can be performed on it to obtain a value between [0, 1], which is the first probability p(k_i | c) corresponding to the knowledge k_i.
By selecting in descending order of inner product value, the knowledge matching the dialogue sample can be acquired, and the first probability can be obtained simply by normalizing the inner product value.
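Written out in the standard softmax form implied by the description above, the normalization of step 305 is:

$$p(k_i \mid c) = \frac{\exp\big(\langle \text{Rep-}c,\ \text{Rep-}k_i\rangle\big)}{\sum_{j} \exp\big(\langle \text{Rep-}c,\ \text{Rep-}k_j\rangle\big)}$$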
306. Process the dialogue sample and the knowledge using the input layer of the reply generation model to obtain an input vector.
307. Process the input vector using the hidden layer of the reply generation model to obtain a state vector.
308. Process the state vector using the output layer of the reply generation model to determine a second probability corresponding to the predicted reply, where the second probability is the probability that the predicted reply is the reply sample.
As shown in fig. 5, which is a schematic structural diagram of the reply generation model, the reply generation model may include an input layer 501, a hidden layer 502, and an output layer 503. The input layer 501 converts the input text, denoted x, into an input vector; in this embodiment, the input text includes the dialogue sample, the knowledge matched with the dialogue sample, and the reply generated so far. The hidden layer 502 processes the input vector and outputs a state vector, denoted h. The input of the output layer 503 is the state vector, and its output in the training stage is the probability of the predicted reply for each candidate text, including the reply sample; the output therefore includes the probability that the predicted reply is the reply sample, i.e., the second probability, denoted p(r | c, k_i).
Further, the input layer may include a type embedding layer, whose inputs include a dialogue information type identifier, a knowledge type identifier, and a reply type identifier that differ from one another. For example, the reply type identifier (type id) is 0, the dialogue information type identifier is 1, and the knowledge type identifier is 2. It can be understood that the input layer may also include other common layers, such as a position embedding layer and a token embedding layer.
By introducing a type embedding layer and using different type identifiers to mark the dialogue information, the knowledge, and the reply respectively, the model can better distinguish and use the knowledge.
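A minimal sketch of such an input layer, assuming the type ids given above (reply = 0, dialogue information = 1, knowledge = 2) and otherwise hypothetical names, might be:

```python
import torch
import torch.nn as nn

class DialogueInputLayer(nn.Module):
    """Input layer summing token, position and type embeddings."""
    def __init__(self, vocab_size, max_len, hidden_dim):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden_dim)
        self.position_embedding = nn.Embedding(max_len, hidden_dim)
        self.type_embedding = nn.Embedding(3, hidden_dim)   # 0: reply, 1: dialogue info, 2: knowledge

    def forward(self, token_ids, type_ids):
        positions = torch.arange(token_ids.size(-1), device=token_ids.device)
        return (self.token_embedding(token_ids)
                + self.position_embedding(positions)
                + self.type_embedding(type_ids))
```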
The backbone of the hidden layer and the output layer can be a Transformer model; for example, the hidden layer includes the encoder of a Transformer model, and the output layer includes the decoder of a Transformer model. As an example, the hidden layer in fig. 5 includes L Transformer blocks.
Further, the hidden layer includes a self-attention model comprising a first part and a second part: the first part corresponds to the dialogue sample and the knowledge, the second part corresponds to the generated reply, the first part uses a bidirectional self-attention mechanism, and the second part uses a unidirectional self-attention mechanism.
As shown in fig. 5, the self-attention over the parts corresponding to the dialogue sample and the knowledge is bidirectional (indicated by solid lines), while the self-attention over the part corresponding to the reply is unidirectional (indicated by dotted lines).
By using a bidirectional self-attention mechanism for the first part of the self-attention layer, the information in the dialogue sample and the knowledge can be extracted more fully, and using bidirectional attention for one part and unidirectional attention for the other adds flexibility compared with making everything unidirectional or everything bidirectional.
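One conventional way to realize this mixed attention pattern is a single mask over the concatenated input, bidirectional over the dialogue-plus-knowledge prefix and causal over the reply. The sketch below is an assumption about how this could be implemented, not code from the patent:

```python
import torch

def build_attention_mask(prefix_len, reply_len):
    """True means attention is allowed. Positions [0, prefix_len) hold the dialogue
    sample and knowledge (bidirectional); the remaining positions hold the reply
    being generated (unidirectional, i.e. causal)."""
    total = prefix_len + reply_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    mask[:prefix_len, :prefix_len] = True                        # prefix sees the whole prefix
    mask[prefix_len:, :prefix_len] = True                        # reply tokens see the prefix
    mask[prefix_len:, prefix_len:] = torch.tril(                 # reply tokens see earlier reply tokens
        torch.ones(reply_len, reply_len, dtype=torch.bool))
    return mask
```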
Through the input layer, the hidden layer, and the output layer, the second probability related to the reply sample can be determined, so that the reply can be used as the optimization target, making training more stable and more general.
309. Determining a loss function based on the first probability and the second probability.
When the first probability and the second probability are obtained, a loss function may be determined based on the first probability and the second probability. The loss function may be a marginal loss function, formulated as:

$$\mathcal{L} = -\log \sum_{i=1}^{k} p(k_i \mid c)\, p(r \mid c, k_i)$$

wherein p(k_i | c) is the first probability, p(r | c, k_i) is the second probability, r is the reply sample, c is the dialogue sample, and k_i is the i-th knowledge.
310. Jointly train the knowledge selection model and the reply generation model based on the loss function.
The training objective may be to minimize the marginal loss described above. That is, after the loss function is obtained, the parameters of the knowledge selection model and the reply generation model may be adjusted until this training objective is achieved.
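Putting the pieces together, a joint training step might look like the following sketch; it reuses the hypothetical helpers from the earlier sketches and assumes that context_encoder, knowledge_encoder, generation_model, knowledge_base and corpus are already defined:

```python
import torch

params = (list(context_encoder.parameters())
          + list(knowledge_encoder.parameters())
          + list(generation_model.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)           # one optimizer over both sub-models

for dialogue, reply_ids in corpus:                      # each <dialogue sample, reply sample> pair
    top_idx, first_prob = select_knowledge(
        context_encoder, knowledge_encoder, dialogue, knowledge_base, top_k=4)
    log_p_k = first_prob.log()                          # log p(k_i | c)
    log_p_r = torch.stack([
        reply_log_prob(generation_model, dialogue, knowledge_base[i], reply_ids)
        for i in top_idx.tolist()])                     # log p(r | c, k_i)
    loss = marginal_loss(log_p_k, log_p_r)
    optimizer.zero_grad()
    loss.backward()                                     # gradients reach both models jointly
    optimizer.step()
```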
Fig. 6 is a schematic diagram according to a sixth embodiment of the present disclosure. As shown in fig. 6, this embodiment provides a training apparatus of a dialogue model. The dialogue model comprises a knowledge selection model and a reply generation model, and the apparatus 600 comprises: a knowledge selection module 601, a reply generation module 602, and a training module 603. The knowledge selection module 601 is configured to process the dialog sample and the knowledge base by using the knowledge selection model to determine knowledge matched with the dialog sample, and determine a first probability corresponding to the knowledge, where the first probability is a probability that the knowledge is selected; the reply generation module 602 is configured to process the dialog sample and the knowledge by using the reply generation model to determine a second probability corresponding to a predicted reply, where the second probability is a probability that the predicted reply is a reply sample; the training module 603 is configured to determine a loss function based on the first probability and the second probability, and train the knowledge selection model and the reply generation model based on the loss function.
In some embodiments, the reply generation model includes an input layer, a hidden layer, and an output layer, and the reply generation module 602 is specifically configured to: process the dialogue sample and the knowledge using the input layer to obtain an input vector; process the input vector using the hidden layer to obtain a state vector; and process the state vector using the output layer to determine the second probability corresponding to the predicted reply.
In some embodiments, the input layer comprises a type embedding layer, and the inputs of the type embedding layer include a dialogue information type identifier, a knowledge type identifier, and a reply type identifier that are different from each other.
In some embodiments, the hidden layer comprises: a self-attention model comprising a first portion and a second portion, the first portion being a portion corresponding to the dialogue sample and knowledge, the second portion being a portion corresponding to the generated reply, the first portion employing a bi-directional self-attention mechanism, the second portion employing a unidirectional self-attention mechanism.
In some embodiments, the knowledge selection model comprises an encoding model, the matched knowledge is determined from the knowledge base, the knowledge base comprises at least one knowledge, and the knowledge selection module 601 is specifically configured to: encode the dialogue sample into a dialogue vector using the encoding model, and encode each knowledge in the knowledge base into a knowledge vector; determine the inner product value of the dialogue vector and each knowledge vector; and select a preset number of knowledge entries in descending order of inner product value, determining them as the knowledge matched with the dialogue sample.
In some embodiments, the knowledge selection module 601 is specifically configured to: normalize the inner product value corresponding to the knowledge to obtain a normalized value, and determine the normalized value as the first probability corresponding to the knowledge.
In this embodiment, knowledge is introduced into the dialogue system through the processing of the knowledge selection model. When knowledge is introduced, the loss function is determined based on the first probability, produced by the knowledge selection model, and the second probability, produced by the reply generation model, so that the two models can be trained jointly. This avoids the heavy manual annotation required when the knowledge selection model is trained alone; moreover, because the second probability is tied to the reply, the reply itself serves as the optimization target, which makes the training approach more general and more reproducible than approaches that use BoW and KL divergence as optimization targets.
It is to be understood that in the disclosed embodiments, the same or similar elements in different embodiments may be referenced.
It is to be understood that "first", "second", and the like in the embodiments of the present disclosure are used for distinction only, and do not indicate the degree of importance, the order of timing, and the like.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
A number of components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the methods and processes described above, such as the training method of the dialogue model. In some embodiments, the training method of the dialogue model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the training method of the dialogue model described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the training method of the dialogue model.
Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method of training a dialogue model, the dialogue model comprising a knowledge selection model and a reply generation model, the method comprising:
processing a conversation sample and a knowledge base by adopting the knowledge selection model to determine knowledge matched with the conversation sample and determine a first probability corresponding to the knowledge, wherein the first probability is the probability that the knowledge is selected;
processing the dialog sample and the knowledge by adopting the reply generation model to determine a second probability corresponding to a predicted reply, wherein the second probability is the probability that the predicted reply is a reply sample;
determining a loss function based on the first probability and the second probability, and training the knowledge selection model and the reply generation model based on the loss function.
2. The method of claim 1, wherein the reply generation model comprises an input layer, a hidden layer, and an output layer, and wherein processing the dialogue sample and the knowledge by using the reply generation model to determine the second probability corresponding to the predicted reply comprises:
processing the dialogue sample and the knowledge information by adopting the input layer to obtain an input vector;
processing the input vector by adopting the hidden layer to obtain a state vector;
processing the state vector with the output layer to determine a second probability corresponding to a prediction reply.
3. The method of claim 2, wherein the input layer comprises: a type embedding layer, inputs of which include a dialogue information type identifier, a knowledge type identifier, and a reply type identifier that are different from each other.
4. The method of claim 2, wherein the hidden layer comprises: a self-attention model comprising a first portion and a second portion, the first portion being a portion corresponding to the dialogue sample and knowledge, the second portion being a portion corresponding to the generated reply, the first portion employing a bi-directional self-attention mechanism, the second portion employing a unidirectional self-attention mechanism.
5. The method of any of claims 1-4, wherein the knowledge selection model comprises a coding model, the matching knowledge is determined from the knowledge base, the knowledge base comprises at least one knowledge, and processing the dialogue sample using the knowledge selection model to determine the knowledge that matches the dialogue sample comprises:
coding the dialogue sample into dialogue vectors by adopting the coding model, and coding each knowledge in the knowledge base into knowledge vectors respectively;
determining an inner product value of the dialogue vector and the knowledge vector;
and selecting a preset number of knowledge according to the sequence of the inner product values from large to small, and determining the knowledge as the knowledge matched with the dialogue sample.
6. The method of claim 5, wherein determining the first probability corresponding to the knowledge comprises:
and normalizing the inner product value corresponding to the knowledge to obtain a normalized value, and determining the normalized value as a first probability corresponding to the knowledge.
7. An apparatus for training a dialogue model, the dialogue model including a knowledge selection model and a reply generation model, the apparatus comprising:
a knowledge selection module, configured to process a dialogue sample and a knowledge base by using the knowledge selection model to determine knowledge matched with the dialogue sample, and determine a first probability corresponding to the knowledge, where the first probability is a probability that the knowledge is selected;
a reply generation module, configured to process the dialog sample and the knowledge by using the reply generation model to determine a second probability corresponding to a predicted reply, where the second probability is a probability that the predicted reply is a reply sample;
a training module to determine a loss function based on the first probability and the second probability, and train the knowledge selection model and the reply generation model based on the loss function.
8. The apparatus of claim 7, wherein the reply generation model comprises an input layer, a hidden layer, and an output layer, and the reply generation module is specifically configured to:
processing the dialogue sample and the knowledge information by adopting the input layer to obtain an input vector;
processing the input vector by adopting the hidden layer to obtain a state vector;
processing the state vector with the output layer to determine a second probability corresponding to a prediction reply.
9. The apparatus of claim 8, wherein the input layer comprises: a type embedding layer, inputs of which include a dialogue information type identifier, a knowledge type identifier, and a reply type identifier that are different from each other.
10. The apparatus of claim 8, wherein the hidden layer comprises: a self-attention model comprising a first portion and a second portion, the first portion being a portion corresponding to the dialogue sample and knowledge, the second portion being a portion corresponding to the generated reply, the first portion employing a bi-directional self-attention mechanism, the second portion employing a unidirectional self-attention mechanism.
11. The apparatus of any of claims 7-10, wherein the knowledge selection model comprises a coding model, the matching knowledge is determined from a knowledge base, the knowledge base comprises at least one knowledge, and the knowledge selection module is specifically configured to:
coding the dialogue sample into dialogue vectors by adopting the coding model, and coding each knowledge in the knowledge base into knowledge vectors respectively;
determining an inner product value of the dialogue vector and the knowledge vector;
and selecting a preset number of knowledge according to the sequence of the inner product values from large to small, and determining the knowledge as the knowledge matched with the dialogue sample.
12. The apparatus of claim 11, wherein the knowledge selection module is specifically configured to:
and normalizing the inner product value corresponding to the knowledge to obtain a normalized value, and determining the normalized value as a first probability corresponding to the knowledge.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202110348055.1A 2021-03-31 2021-03-31 Method, device, equipment and storage medium for training conversation model Active CN113239157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110348055.1A CN113239157B (en) 2021-03-31 2021-03-31 Method, device, equipment and storage medium for training conversation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110348055.1A CN113239157B (en) 2021-03-31 2021-03-31 Method, device, equipment and storage medium for training conversation model

Publications (2)

Publication Number Publication Date
CN113239157A true CN113239157A (en) 2021-08-10
CN113239157B CN113239157B (en) 2022-02-25

Family

ID=77130700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110348055.1A Active CN113239157B (en) 2021-03-31 2021-03-31 Method, device, equipment and storage medium for training conversation model

Country Status (1)

Country Link
CN (1) CN113239157B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130110804A1 (en) * 2011-10-31 2013-05-02 Elwha LLC, a limited liability company of the State of Delaware Context-sensitive query enrichment
CN106997375A (en) * 2017-02-28 2017-08-01 浙江大学 Recommendation method is replied in customer service based on deep learning
CN109933785A (en) * 2019-02-03 2019-06-25 北京百度网讯科技有限公司 Method, apparatus, equipment and medium for entity associated
CN110188182A (en) * 2019-05-31 2019-08-30 中国科学院深圳先进技术研究院 Model training method, dialogue generation method, device, equipment and medium
CN110297887A (en) * 2019-06-26 2019-10-01 山东大学 Service robot personalization conversational system and method based on cloud platform
CN111523328A (en) * 2020-04-13 2020-08-11 中博信息技术研究院有限公司 Intelligent customer service semantic processing method
CN111897941A (en) * 2020-08-14 2020-11-06 腾讯科技(深圳)有限公司 Dialog generation method, network training method, device, storage medium and equipment
CN112541060A (en) * 2020-11-19 2021-03-23 中国科学院深圳先进技术研究院 End-to-end task type dialogue learning framework and method based on confrontation training
CN112559706A (en) * 2020-12-11 2021-03-26 中国科学院深圳先进技术研究院 Training method of dialogue generating model, dialogue method, device and storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
JUNKI OHMURA等: "Context-Aware Dialog Re-Ranking for Task-Oriented Dialog Systems", 《2018 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT)》 *
LIQIANG NIE等: "Multimodal Dialog System: Generating Responses via Adaptive Decoders", 《MM "19: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA》 *
LIZI LIAO等: "Knowledge-aware Multimodal Dialogue Systems", 《MM "18: PROCEEDINGS OF THE 26TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA》 *
安波等: "融合知识表示的知识库问答系统" *
庄传志等: "基于深度学习的关系抽取研究综述", 《中文信息学报》 *
张衍坤等: "面向社区问答匹配的混合神经网络模型", 《小型微型计算机系统》 *
徐聪: "基于深度学习和强化学习的对话模型研究", 《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114416943A (en) * 2021-12-29 2022-04-29 北京百度网讯科技有限公司 Training method and device for dialogue model, electronic equipment and storage medium
CN114819183A (en) * 2022-04-15 2022-07-29 支付宝(杭州)信息技术有限公司 Model gradient confirmation method, device, equipment and medium based on federal learning
CN114610861A (en) * 2022-05-11 2022-06-10 之江实验室 End-to-end dialogue method for integrating knowledge and emotion based on variational self-encoder

Also Published As

Publication number Publication date
CN113239157B (en) 2022-02-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant