WO2021238599A1 - Dialogue model training method and apparatus, computer device, and storage medium - Google Patents

Dialogue model training method and apparatus, computer device, and storage medium

Info

Publication number
WO2021238599A1
WO2021238599A1 (application PCT/CN2021/091954, CN2021091954W)
Authority
WO
WIPO (PCT)
Prior art keywords
dialogue
dialog
feature
reply
model
Prior art date
Application number
PCT/CN2021/091954
Other languages
English (en)
French (fr)
Inventor
欧蛟
张金超
冯洋
孟凡东
Original Assignee
Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Priority to JP2022538829A (published as JP7431977B2)
Publication of WO2021238599A1
Priority to US17/715,778 (published as US20220309088A1)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method for training a dialogue model, a method for generating a dialogue reply, a device, a computer device, and a storage medium.
  • Natural language processing can now be applied in an increasingly wide range of human-computer interaction scenarios, such as chat robots, dialogue systems, and terminal intelligent assistants.
  • In these scenarios, the computer device can output a corresponding dialogue reply according to the dialogue text input by the user during the dialogue. How to prevent the dialogue replies output by the computer device from being overly monotonous is a problem that needs to be solved.
  • the embodiments of the application provide a method for training a dialogue model, a method for generating a dialogue reply, a device, a computer device, and a storage medium.
  • In the dialogue model training method, the parameters of the dialogue model are updated multiple times according to the dialogue features of a dialogue, so that the different semantics of the dialogue are taken into consideration and the dialogue replies contain multiple semantics, which improves the diversity of the dialogue replies generated by the dialogue model.
  • the technical solution is as follows:
  • a method for training a dialogue model includes:
  • The prior network is used to output the probability distribution of dialogue features, and the posterior network is used to estimate the probability distribution of the dialogue features output by the prior network.
  • The first dialogue feature is used to represent the posterior feature of the dialogue above and one dialogue reply in a dialogue, and the second dialogue feature is used to represent the prior feature of the dialogue above and one dialogue reply in a dialogue; the first dialogue includes a first dialogue above and at least two first dialogue replies.
  • the trained model is used as a dialogue model.
  • a method for generating a dialogue reply includes:
  • a training device for a dialogue model includes:
  • a feature acquisition module, configured to acquire at least two first dialogue features and at least two second dialogue features of a first dialogue based on the prior network and the posterior network in the dialogue model, where the prior network is used to output the probability distribution of dialogue features and the posterior network is used to estimate the probability distribution of the dialogue features output by the prior network;
  • the first dialogue feature is used to represent the posterior feature of the dialogue above and one dialogue reply in a dialogue, and the second dialogue feature is used to represent the prior feature of the dialogue above and one dialogue reply in a dialogue;
  • the first dialogue includes a first dialogue above and at least two first dialogue replies;
  • a model update module configured to update the dialogue model based on at least two first dialogue features and at least two second dialogue features of the first dialogue
  • the model update module is further configured to update the posterior network based on at least two first dialog features of the first dialog;
  • the model update module is further configured to update the discriminator of the dialogue model according to at least two first dialogue features and at least two second dialogue features of the second dialogue;
  • the model acquisition module is used to, in response to the training end condition being met, use the trained model as the dialogue model.
  • a device for generating a dialogue reply comprising:
  • the dialogue acquisition module is used to acquire the dialogue above;
  • a feature extraction module configured to input the dialogue above into a dialogue model, and randomly extract a target dialogue feature from the first dialogue features corresponding to a plurality of dialogue replies based on the prior network in the dialogue model;
  • a reply output module configured to decode the target dialogue feature based on the decoder in the dialogue model, and output a target dialogue reply
  • the reply display module is used to display the target dialogue reply.
  • In another aspect, a computer device is provided, including a processor and a memory, where the memory is used to store at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor to implement the operations performed in the dialogue model training method in the embodiments of this application, or the operations performed in the dialogue reply generation method in the embodiments of this application.
  • In another aspect, a storage medium is provided, storing at least one piece of program code, where the at least one piece of program code is used to execute the dialogue model training method in the embodiments of this application, or the dialogue reply generation method in the embodiments of this application.
  • FIG. 1 is a schematic diagram of an implementation environment of a dialogue model training method provided by an embodiment of the present application;
  • FIG. 2 is a flowchart of a dialogue model training method provided by an embodiment of the present application;
  • FIG. 3 is a flowchart of a dialogue reply generation method provided by an embodiment of the present application;
  • FIG. 4 is a flowchart of a dialogue model training method provided by an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of a dialogue model provided by an embodiment of the present application;
  • FIG. 6 is a schematic flowchart of a multi-semantic WAE algorithm provided by an embodiment of the present application;
  • FIG. 7 is a block diagram of a dialogue model training apparatus provided by an embodiment of the present application;
  • FIG. 8 is a block diagram of a dialogue reply generation apparatus provided by an embodiment of the present application;
  • FIG. 9 is a structural block diagram of a terminal provided by an embodiment of the present application;
  • FIG. 10 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • Artificial Intelligence (AI) is the theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence cloud services are generally called AIaaS (AI as a Service). This is currently a mainstream service method of artificial intelligence platforms: an AIaaS platform splits several common AI services and provides independent or packaged services in the cloud.
  • This service model is similar to opening an AI-themed mall: all developers can access one or more of the artificial intelligence services provided by the platform through API interfaces, and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy and operate their own exclusive cloud artificial intelligence services.
  • Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods for realizing effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use daily, so it is closely related to the study of linguistics. Natural language processing technologies usually include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and other technologies.
  • Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or realize human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • the embodiment of the present application provides a method for training a dialogue model, which can be implemented based on artificial intelligence technology.
  • The dialogue model trained by this method can be applied in human-computer interaction scenarios such as chat robots, dialogue systems, and terminal intelligent assistants.
  • For example, when a user is chatting with a chat robot, the chat robot can input the content entered by the user as the dialogue above into the dialogue model, the dialogue model outputs multiple dialogue replies, and one of the dialogue replies is then shown to the user.
  • Similarly, the dialogue system and the terminal intelligent assistant can output a dialogue reply that meets the user's needs based on the content input by the user.
  • FIG. 1 is a schematic diagram of the implementation environment of the training method of the dialogue model provided according to an embodiment of the present application.
  • the implementation environment may include: a terminal 110 and a server 120.
  • the terminal 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • the terminal 110 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.
  • the terminal 110 can install and run an application program that supports human-computer interaction.
  • the application can be a chat robot application, a social application, a terminal intelligent assistant application, etc.
  • the terminal 110 is a terminal used by a user, and a user account is logged in an application program running in the terminal 110.
  • The server 120 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • the server 120 is used to provide background services for applications supporting human-computer interaction.
  • Optionally, the server 120 undertakes the main model training work and the terminal 110 undertakes the secondary model training work; or the server 120 undertakes the secondary model training work and the terminal 110 undertakes the main model training work; or the server 120 or the terminal 110 can independently undertake the model training work.
  • the server 120 may be composed of an access server, a model training server, and a database server.
  • The access server is used to provide access services for the terminal 110.
  • the model training server is used for model training based on the authorized dialogue data provided by the terminal.
  • the terminal 110 may generally refer to one of multiple terminals, and this embodiment only uses the terminal 110 as an example for illustration.
  • The number of the aforementioned terminals may be larger or smaller.
  • For example, there may be only one terminal, or there may be tens or hundreds of terminals, or more.
  • In some embodiments, the implementation environment of the dialogue model training method also includes other terminals.
  • the embodiments of the present application do not limit the number of terminals and device types.
  • Fig. 2 is a flowchart of a method for training a dialogue model provided by an embodiment of the present application. This embodiment is described by taking the execution subject as a server as an example. Referring to Fig. 2, this embodiment includes:
  • In step 201, the server obtains at least two first dialogue features and at least two second dialogue features of a first dialogue based on the prior network and the posterior network in the dialogue model, where the prior network is used to output the probability distribution of dialogue features and the posterior network is used to estimate the probability distribution of the dialogue features output by the prior network.
  • The first dialogue feature is used to represent the posterior feature of the dialogue above and one dialogue reply in a dialogue, and the second dialogue feature is used to represent the prior feature of the dialogue above and one dialogue reply in a dialogue; the first dialogue includes a first dialogue above and at least two first dialogue replies.
  • In other words, in step 201 the server acquires at least two first dialogue features and at least two second dialogue features of the first dialogue, where the first dialogue feature and the second dialogue feature are used to represent, respectively, the posterior feature and the prior feature of the first dialogue above and one first dialogue reply; one dialogue above corresponds to at least two dialogue replies.
  • the server may select one dialogue from a plurality of dialogues as the first dialogue, and the first dialogue includes a first dialogue above and at least two first dialogue replies corresponding to the first dialogue above. For any group of the first dialogue above and the first dialogue reply, the server can obtain the corresponding prior feature and the posterior feature through the prior network and the posterior network, respectively.
  • the server updates the dialogue model based on at least two first dialogue features and at least two second dialogue features of the first dialogue.
  • In a possible implementation, the server may obtain at least two first dialogue features and at least two second dialogue features of the first dialogue.
  • For the first dialogue above and one first dialogue reply in the first dialogue, one first dialogue feature and one second dialogue feature can be obtained, and the dialogue model is updated once according to this first dialogue feature and second dialogue feature, that is, the parameters of the prior network and the posterior network in the dialogue model are updated.
  • Then another first dialogue feature and another second dialogue feature are obtained, and the dialogue model is updated again.
  • the number of times the dialogue model is updated is the same as the number of first dialogue replies contained in the first dialogue.
  • In some embodiments, the dialogue model may also include an encoder, a decoder, and a discriminator. While the server updates the parameters of the prior network and the posterior network, it also updates the parameters of the encoder, the decoder, and the discriminator.
  • the server updates the posterior network based on at least two first dialog features of the first dialog.
  • The server may obtain at least two first dialogue features of the above first dialogue, and then update the parameters of the posterior network once based on each first dialogue feature.
  • the server updates the discriminator of the dialog model according to the at least two first dialog features and the at least two second dialog features of the second dialog.
  • the second dialogue includes one second dialogue above and at least two second dialogue replies.
  • the server selects at least one dialogue from a plurality of dialogues as the second dialogue.
  • For each second dialogue, the server can obtain at least two first dialogue features and at least two second dialogue features of the second dialogue according to the method described in step 201. For the second dialogue above and one second dialogue reply of any second dialogue, one first dialogue feature and one second dialogue feature can be obtained.
  • Based on this first dialogue feature and second dialogue feature, the parameters of the discriminator are updated once. In this case, the number of times the discriminator is updated equals the number of second dialogue replies included in the second dialogue.
  • In a possible implementation, the server can obtain the iteration-count threshold of the discriminator, perform multiple iterations accordingly, and end training when the threshold is reached.
  • In response to the training end condition being met, the server uses the trained model as the dialogue model.
  • the training end condition may be reaching a predetermined number of iterations, or the model converges, or the output result of the model meets the target condition, or meets other training end conditions, etc.
  • the embodiment of the present application does not limit this.
  • In the method provided by the embodiments of this application, the dialogue model is updated multiple times through multiple dialogue features of the first dialogue, the posterior network is then updated again, and the discriminator of the dialogue model is updated according to multiple dialogue features of the second dialogue.
  • In this way, different semantics of the dialogue are taken into consideration, so that the dialogue replies contain multiple semantics, which improves the performance of the dialogue model and also increases the diversity of the dialogue replies generated through the dialogue model.
  • Fig. 3 is a flowchart of a method for generating a dialog reply provided by an embodiment of the present application. This embodiment is described by taking the execution subject as the terminal as an example. Referring to FIG. 3, this embodiment includes:
  • The terminal obtains the dialogue above.
  • the dialogue above may be content input by the terminal user, such as text, voice, or emoticons.
  • the terminal inputs the dialogue text into the dialogue model, and based on the prior network in the dialogue model, randomly extracts a target dialogue feature from the second dialogue features corresponding to the multiple dialogue replies.
  • In a possible implementation, the terminal may be provided with a dialogue model, and the content input by the user is used as the dialogue above.
  • The dialogue model encodes the input dialogue above, and the encoded features are input into the prior network in the dialogue model; based on the prior network, a target dialogue feature is randomly extracted from multiple first dialogue features.
  • Because of the random extraction, when the terminal inputs the same dialogue above again, the dialogue feature extracted by the prior network may differ from the dialogue feature extracted last time, so the dialogue reply output by the dialogue model is also different.
  • the terminal decodes the target dialogue feature based on the decoder in the dialogue model, and outputs the target dialogue reply.
  • the decoder in the dialogue model can decode the target dialogue features obtained by random extraction to obtain the target dialogue reply. If the dialogue features randomly extracted by the prior network are different, the dialogue replies decoded by the decoder are different.
  • the terminal displays the target dialogue reply.
  • the terminal may adopt a manner of voice playback, text display, or display of corresponding emoticons to display the above-mentioned target dialogue response.
  • In the method provided by the embodiments of this application, the dialogue reply corresponding to the dialogue above is obtained by random extraction, so that when the same dialogue above is input into the dialogue model multiple times, different dialogue replies can be obtained, thereby increasing the diversity of dialogue replies.
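  • As a toy illustration of why repeated inputs can yield different replies, the following Python (PyTorch) sketch samples a dialogue feature from a randomly chosen sub-distribution of a prior; the means, variances, and dimensionality below are made-up values for illustration, not parameters from this application.

```python
import torch

def sample_prior_feature(mu, sigma):
    """Randomly pick one sub-distribution of the prior and sample a target
    dialogue feature from it, mirroring the random extraction described above."""
    k = torch.randint(0, mu.size(0), (1,)).item()   # random sub-distribution index
    return mu[k] + sigma[k] * torch.randn_like(mu[k])

# Toy prior with K = 3 sub-distributions over a 4-dimensional feature space.
mu = torch.tensor([[0.0, 0.0, 0.0, 0.0],
                   [2.0, 2.0, 2.0, 2.0],
                   [-2.0, -2.0, -2.0, -2.0]])
sigma = torch.full_like(mu, 0.1)
print(sample_prior_feature(mu, sigma))  # typically differs between calls,
print(sample_prior_feature(mu, sigma))  # which is what diversifies the replies
```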
  • The above description takes as an example the terminal obtaining and outputting the dialogue reply through a dialogue model configured on the terminal itself; in some embodiments, the terminal can obtain the dialogue reply through a dialogue model configured on the server, and output it based on the obtained dialogue reply, so as to achieve the effect of human-machine dialogue.
  • Fig. 4 is a flowchart of a method for training a dialogue model provided by an embodiment of the present application. This embodiment is described by taking the server performing one iteration as an example. Referring to FIG. 4, this embodiment includes:
  • the server obtains a first conversation from a plurality of conversations.
  • the server may randomly select N dialogues from the multiple dialogues as the first dialogue, where N is a positive integer.
  • the first dialogue includes a first dialogue above and K first dialogue replies corresponding to the first dialogue above, where K is a positive integer greater than or equal to 2.
  • the number of first dialog replies included in different first dialogs may be the same or different.
  • For example, the data set includes 1000 dialogues, and the server randomly selects 10 dialogues as the first dialogues, obtaining the first dialogues A, B, C, D, E, F, G, H, I, and J, where the first dialogue A corresponds to the 5 first dialogue replies a1, a2, a3, a4, and a5, the first dialogue B corresponds to the 6 first dialogue replies b1, b2, b3, b4, b5, and b6, and the first dialogue C corresponds to the 6 first dialogue replies c1, c2, c3, c4, c5, and c6. They are not all listed here.
  • In step 402, the server acquires at least two first dialogue features and at least two second dialogue features of the first dialogue based on the prior network and the posterior network in the dialogue model, where the first dialogue feature is used to represent the posterior feature of the dialogue above and one dialogue reply in a dialogue, and the second dialogue feature is used to represent the prior feature of the dialogue above and one dialogue reply in a dialogue; the first dialogue includes a first dialogue above and at least two first dialogue replies.
  • For any first dialogue reply, the server can encode the first dialogue reply and the corresponding first dialogue above, and then input the encoded vector representations into the prior network and the posterior network to obtain the prior feature and the posterior feature, that is, the second dialogue feature and the first dialogue feature.
  • In this way, for each first dialogue reply, the server may obtain, based on the posterior network and the prior network, a pair consisting of one first dialogue feature and one second dialogue feature.
  • the step of the server acquiring at least two first dialog features and at least two second dialog features of a first dialog can be implemented through the following sub-steps 4021 to 4023.
  • In sub-step 4021, the server separately encodes the first dialogue above and the first dialogue reply based on the encoder of the dialogue model to obtain the first vector of the first dialogue above and the second vector of the first dialogue reply.
  • In a possible implementation, the server inputs the first dialogue above and the first dialogue reply respectively into the encoder of the dialogue model, where the encoder is constructed based on a bidirectional gated recurrent unit neural network. According to the encoder, the server encodes the first dialogue above and the first dialogue reply respectively to obtain the first vector of the first dialogue above and the second vector of the first dialogue reply.
  • The encoder encodes all inputs, such as the first dialogue above and the first dialogue reply, through a bidirectional gated recurrent unit neural network, and the encoded vector is a fixed-length vector.
  • Take the first vector $c$ obtained by encoding the first dialogue above as an example; $c$ is calculated by the following formulas (1) to (4):

    $\overrightarrow{h_t} = \mathrm{GRU}(\overrightarrow{h_{t-1}}, e(w_t))$  (1)

    $\overleftarrow{h_t} = \mathrm{GRU}(\overleftarrow{h_{t+1}}, e(w_t))$  (2)

    $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$  (3)

    $c = h_T = [\overrightarrow{h_T}; \overleftarrow{h_1}]$  (4)

  • where $\mathrm{GRU}(\cdot)$ represents the gated recurrent unit, $e(w_t)$ represents the code corresponding to the $t$-th word of the first dialogue above, $\overrightarrow{h_{t-1}}$ represents the vector representation after reading the $(t-1)$-th word from the left, $\overleftarrow{h_{t+1}}$ represents the vector representation after reading the $(t+1)$-th word from the right, $h_t$ represents the splicing of the forward and backward vector representations at the $t$-th word, and $T$ represents the number of words included in the first dialogue above; $c$ is the splicing of the vector of the $T$-th word from the left and the vector of the first word from the right.
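  • A minimal PyTorch sketch of an encoder of this shape follows: a bidirectional GRU whose final forward and backward hidden states are concatenated into a fixed-length vector, as in formula (4). The vocabulary size and layer widths are illustrative assumptions, not values from this application.

```python
import torch
import torch.nn as nn

class UtteranceEncoder(nn.Module):
    """Bidirectional GRU encoder; returns the fixed-length vector of formula (4)."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, token_ids):                    # token_ids: (batch, T)
        emb = self.embed(token_ids)                  # (batch, T, emb_dim)
        _, h_n = self.gru(emb)                       # h_n: (2, batch, hidden_dim)
        # Concatenate the last forward state and the last backward state,
        # i.e. c = [h_T(forward); h_1(backward)] as in formula (4).
        return torch.cat([h_n[0], h_n[1]], dim=-1)   # (batch, 2 * hidden_dim)

enc = UtteranceEncoder(vocab_size=10000)
c = enc(torch.randint(0, 10000, (4, 12)))            # 4 utterances, 12 tokens each
print(c.shape)                                        # torch.Size([4, 512])
```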
  • In sub-step 4022, the server acquires at least two first dialogue features of the first dialogue, where a first dialogue feature of the first dialogue is obtained by processing the first vector of the first dialogue above and the second vector of the first dialogue reply through the posterior network.
  • That is, the server obtains the first dialogue feature based on the posterior network and according to the first vector of the first dialogue above and the second vector of the first dialogue reply.
  • The posterior network is used to learn the distribution of the dialogue features of a dialogue based on the dialogue above and the dialogue reply; by taking the reply information into account, the distribution of the dialogue features in the trained dialogue model can be made more accurate.
  • the probability distribution of the dialogue features output by the posterior network is called the posterior distribution, which is used to estimate the prior distribution, that is, the probability distribution of the dialogue features output by the prior network.
  • In a possible implementation, the step in which the server obtains the first dialogue feature based on the posterior network and according to the first vector and the second vector may be as follows: based on the posterior network, and according to the first vector of the first dialogue above and the second vector of the first dialogue reply, the server obtains the first parameter mean and the first parameter variance of the posterior distribution.
  • The server may then obtain the first dialogue feature according to the first parameter mean, the first parameter variance, and a first sample value.
  • the first sampling value is the value obtained by sampling from the standard normal distribution, that is, the value of the sampling point.
  • Since a value sampled from the standard normal distribution is used to obtain the first dialogue feature, during training the decoder reconstructs the dialogue reply based on the first dialogue feature, and the parameters of the dialogue model are adjusted based on the difference between the reconstructed dialogue reply and the first dialogue reply, so that the difference between the first dialogue feature and the first dialogue reply becomes small and the first dialogue feature can be used to represent the first dialogue reply.
  • When the server obtains the first dialogue feature based on the posterior network, it is calculated by the following formulas (5) and (6):

    $[\mu_k; \sigma_k] = g_\phi(W[x_k; c] + b)$  (5)

    $z_k = \mu_k + \sigma_k \odot \epsilon$  (6)

  • where $\mu_k$ represents the first parameter mean, $\sigma_k$ represents the first parameter variance, $W$ represents a variable parameter, $g_\phi(\cdot)$ represents the posterior network, $x_k$ represents the second vector of the first dialogue reply, $c$ represents the first vector of the first dialogue above, $b$ represents a bias parameter, $z_k$ represents the first dialogue feature, and $\epsilon$ represents the first sample value drawn from the standard normal distribution.
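  • A minimal sketch of a posterior network implementing formulas (5) and (6): a single linear layer stands in for $g_\phi$, and the reparameterized draw with $\epsilon \sim N(0, I)$ mirrors the first sample value described above. Predicting a log-variance (rather than the variance directly) is a common stabilizing assumption, not something stated in this application.

```python
import torch
import torch.nn as nn

class PosteriorNetwork(nn.Module):
    """g_phi: maps [x_k; c] to (mu_k, sigma_k) and samples z_k by formula (6)."""
    def __init__(self, feat_dim, z_dim):
        super().__init__()
        # One linear layer plays the role of W[x_k; c] + b in formula (5).
        self.proj = nn.Linear(2 * feat_dim, 2 * z_dim)

    def forward(self, x_k, c):
        mu, log_var = self.proj(torch.cat([x_k, c], dim=-1)).chunk(2, dim=-1)
        sigma = torch.exp(0.5 * log_var)    # predict log-variance for stability
        eps = torch.randn_like(mu)          # first sample value, eps ~ N(0, I)
        return mu + sigma * eps, mu, sigma  # z_k = mu_k + sigma_k * eps

r_net = PosteriorNetwork(feat_dim=512, z_dim=64)
z_k, mu_k, sigma_k = r_net(torch.randn(4, 512), torch.randn(4, 512))
print(z_k.shape)  # torch.Size([4, 64])
```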
  • the server may obtain a second dialogue feature based on the prior network and according to the first vector and the reply category to which the first dialogue reply belongs, where the reply category includes at least one other dialogue reply belonging to the same category as the first dialogue reply .
  • The prior network is used to represent the probability distribution of the real dialogue features, which is estimated by the posterior distribution.
  • at least two dialogue replies corresponding to one dialogue can be clustered to obtain multiple reply categories.
  • the sub-distribution in the prior distribution is selected according to the reply category to which the reply of the first dialogue belongs.
  • the server selects the sub-distribution according to the reply category to which the first dialog reply belongs, and then samples the second dialog feature from the sub-distribution.
  • In a possible implementation, the step in which the server obtains the second dialogue feature based on the prior network and according to the first vector and the reply category to which the first dialogue reply belongs may be as follows: the server determines a target probability distribution according to the first vector and the reply category to which the first dialogue reply belongs, where the target probability distribution is the probability distribution corresponding to the reply category in the probability distribution of dialogue features output by the prior network, that is, the sub-distribution used to match the posterior distribution.
  • the server may obtain the mean value of the second parameter and the variance of the second parameter according to the first vector based on the prior network.
  • the server may obtain the second dialog feature according to the second parameter mean value, the second parameter variance, and the second sample value.
  • The second sample value is the value of the sampling point obtained from the target probability distribution. Since the second dialogue feature is obtained by sampling from a sub-distribution of the mixed Gaussian distribution, during training the Wasserstein distance between the prior distribution and the posterior distribution is obtained based on the second dialogue feature and the first dialogue feature, so that the prior distribution and the posterior distribution can be matched accurately.
  • According to the first dialogue, which includes at least two dialogue replies, at least two posterior distributions can be obtained, and one first dialogue feature $z_k$ can be sampled from each posterior distribution.
  • One prior distribution can be obtained, which includes at least two sub-distributions, and one second dialogue feature can be sampled from each sub-distribution. That is, for the same first dialogue, the at least two second dialogue features are obtained from the same prior distribution.
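  • A minimal sketch of a prior network of this shape: given the context vector, it parameterizes K Gaussian sub-distributions, and the reply category selects which sub-distribution the second dialogue feature is sampled from. The parameterization (a single linear layer, reparameterized sampling) and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PriorNetwork(nn.Module):
    """Parameterizes K Gaussian sub-distributions over dialogue features given
    the context vector c; the reply category selects the sub-distribution."""
    def __init__(self, feat_dim, z_dim, num_clusters):
        super().__init__()
        self.proj = nn.Linear(feat_dim, 2 * z_dim * num_clusters)
        self.z_dim, self.num_clusters = z_dim, num_clusters

    def forward(self, c, category):  # category: (batch,) tensor of cluster ids
        params = self.proj(c).view(-1, self.num_clusters, 2 * self.z_dim)
        params = params[torch.arange(c.size(0)), category]  # target sub-distribution
        mu, log_var = params.chunk(2, dim=-1)  # second parameter mean / variance
        sigma = torch.exp(0.5 * log_var)
        # Reparameterized draw: the second sample value effectively comes from
        # the selected target probability distribution.
        return mu + sigma * torch.randn_like(mu)

p_net = PriorNetwork(feat_dim=512, z_dim=64, num_clusters=5)
z = p_net(torch.randn(4, 512), torch.randint(0, 5, (4,)))
print(z.shape)  # torch.Size([4, 64])
```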
  • the server updates the dialogue model based on at least two first dialogue features and at least two second dialogue features of the first dialogue.
  • the server may obtain the first dialogue feature and the second dialogue feature corresponding to the first dialogue reply.
  • The server may obtain the discriminator loss and the reconstruction loss according to the first vector obtained by encoding the first dialogue above, and the first dialogue feature and the second dialogue feature corresponding to the first dialogue reply.
  • the server can update the parameters of the posterior network and the prior network in the dialogue model according to the discriminator loss, and update the parameters of the encoder, the posterior network, the prior network and the decoder in the dialogue model according to the reconstruction loss.
  • the server can update the parameters of the discriminator of the dialogue model according to the loss of the discriminator.
  • The discriminator loss is obtained by optimizing the Wasserstein distance between the posterior distribution and the prior distribution through an adversarial network.
  • Based on the discriminator of the dialogue model, and according to the first vector of the first dialogue above and the first dialogue feature and second dialogue feature corresponding to the first dialogue reply, the server obtains the first Wasserstein distance between the first dialogue feature and the second dialogue feature, and uses this distance as the discriminator loss; the discriminator loss can be calculated by formula (10).
  • The server updates the parameters of the prior network in the dialogue model according to the discriminator loss:

    $\theta_{P\text{-}net} \leftarrow \theta_{P\text{-}net} - lr \cdot \nabla_{\theta_{P\text{-}net}} \mathcal{L}_{Disc}$  (11)

  • where $\theta_{P\text{-}net}$ represents the parameters of the prior network, $lr$ represents the learning rate of the dialogue model, $\nabla$ represents derivation, and $\mathcal{L}_{Disc}$ represents the discriminator loss.
  • The server updates the parameters of the posterior network in the dialogue model according to the discriminator loss; the parameters of the posterior network are calculated by formula (12):

    $\theta_{R\text{-}net} \leftarrow \theta_{R\text{-}net} - lr \cdot \nabla_{\theta_{R\text{-}net}} \mathcal{L}_{Disc}$  (12)

  • where $\theta_{R\text{-}net}$ represents the parameters of the posterior network, $lr$ represents the learning rate of the dialogue model, $\nabla$ represents derivation, and $\mathcal{L}_{Disc}$ represents the discriminator loss.
  • The reconstruction loss can be obtained as follows: the first dialogue feature is sampled from the posterior distribution, the decoder decodes the first dialogue feature to reconstruct a dialogue reply, and the reconstruction loss is determined based on the error between the reconstructed dialogue reply and the first dialogue reply.
  • In a possible implementation, the server decodes the first dialogue feature based on the decoder in the dialogue model to obtain the target dialogue reply, and obtains the reconstruction loss according to the first vector, the first dialogue feature, the second dialogue feature, and the target dialogue feature.
  • The reconstruction loss can be calculated by formula (13), for example as the negative log-likelihood of the reply under the decoder:

    $\mathcal{L}_{rec} = -\log p_\theta(x_k \mid c, z_k)$  (13)

  • where $p_\theta(\cdot)$ represents the decoder and $x_k$ represents the target dialogue feature.
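  • A minimal sketch of a decoder and its reconstruction loss, assuming (as is common, though not specified here) a GRU decoder whose initial state is derived from $[c; z_k]$ and a token-level negative log-likelihood computed with teacher forcing; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReplyDecoder(nn.Module):
    """GRU decoder for p_theta(x_k | c, z_k); the reconstruction loss is the
    token-level negative log-likelihood of the reply under teacher forcing."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, cond_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.init_h = nn.Linear(cond_dim, hidden_dim)  # maps [c; z_k] to h_0
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def reconstruction_loss(self, reply_ids, c, z_k):  # reply_ids: (batch, T)
        h0 = torch.tanh(self.init_h(torch.cat([c, z_k], dim=-1))).unsqueeze(0)
        hidden, _ = self.gru(self.embed(reply_ids[:, :-1]), h0)
        logits = self.out(hidden)
        # -log p_theta(x_k | c, z_k), the formula (13) term, averaged over tokens
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               reply_ids[:, 1:].reshape(-1))

dec = ReplyDecoder(vocab_size=10000, emb_dim=128, hidden_dim=256, cond_dim=512 + 64)
loss = dec.reconstruction_loss(torch.randint(0, 10000, (4, 12)),
                               torch.randn(4, 512), torch.randn(4, 64))
```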
  • When the server updates the parameters of the encoder, the posterior network, the prior network, and the decoder in the dialogue model, they can be calculated by formula (14):

    $\theta_{net} \leftarrow \theta_{net} - lr \cdot \nabla_{\theta_{net}} \mathcal{L}_{rec}, \quad net \in \{Enc, P\text{-}net, R\text{-}net, Dec\}$  (14)

  • where $\theta_{net}$ represents the parameters of $net$, $lr$ represents the learning rate of the dialogue model, $net \in \{Enc, P\text{-}net, R\text{-}net, Dec\}$ means that $net$ is one of $Enc$, $P\text{-}net$, $R\text{-}net$, and $Dec$, $Enc$ represents the encoder, $P\text{-}net$ represents the prior network, $R\text{-}net$ represents the posterior network, and $Dec$ represents the decoder.
  • The parameters of the discriminator can be calculated by formula (15):

    $\theta_{Disc} \leftarrow \theta_{Disc} + lr \cdot \nabla_{\theta_{Disc}} \mathcal{L}_{Disc}$  (15)

  • where $\theta_{Disc}$ represents the parameters of the discriminator, $lr$ represents the learning rate of the dialogue model, $\nabla$ represents derivation, and $\mathcal{L}_{Disc}$ represents the discriminator loss; in keeping with the adversarial optimization of the Wasserstein distance, the discriminator is updated in the direction opposite to the prior and posterior networks.
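  • The update formulas (11), (12), (14), and (15) share one pattern: take the gradient of a loss with respect to one module's parameters and step by the learning rate. A minimal sketch of that shared step follows, with a commented, schematic update round reusing the module names from the sketches above; the sign used for the discriminator step is an assumption based on the adversarial reading of formula (15).

```python
import torch

def sgd_step(module, loss, lr):
    """theta <- theta - lr * grad(loss, theta): the shared update pattern of
    formulas (11), (12), (14), and (18)."""
    params = [p for p in module.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(lr * g)

# One update round, reusing the modules sketched above (schematic):
# disc_loss = disc(z_post).mean() - disc(z_prior).mean()  # formula (10) estimate
# sgd_step(p_net, disc_loss, lr)                          # formula (11)
# sgd_step(r_net, disc_loss, lr)                          # formula (12)
# rec = dec.reconstruction_loss(reply_ids, c, z_post)     # formula (13)
# for net in (enc, r_net, p_net, dec):                    # formula (14)
#     sgd_step(net, rec, lr)
# sgd_step(disc, -disc_loss, lr)                          # formula (15), ascent
```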
  • the server updates the posterior network based on at least two first dialog features of the first dialog.
  • Through the above steps, the server can obtain at least two first dialogue features, that is, posterior features.
  • The server can take the semantic distance as the optimization goal and control the semantic distance between the posterior distributions corresponding to the same dialogue above.
  • In a possible implementation, the server may maximize the Wasserstein distance between one first dialogue feature and the average value of the other first dialogue features by using the maximum mean discrepancy.
  • In a possible implementation, the process by which the server updates the posterior network based on the at least two first dialogue features of the first dialogue may be as follows: for any first dialogue feature, the server obtains the average value of the other first dialogue features among the at least two first dialogue features, excluding this first dialogue feature, and uses this average value as the average dialogue feature.
  • the server may obtain the second Wasserstein distance between the first dialogue feature and the average dialogue feature, and use the second Wasserstein distance as a semantic loss.
  • The server can then update the parameters of the posterior network based on the semantic loss. Since the semantic distance between the posterior distributions is controlled, the prior distribution becomes a distinguishable multi-semantic distribution.
  • When the server obtains the average value of the other first dialogue features among the at least two first dialogue features, it can be calculated by the following formula (16):

    $\overline{z} = \frac{1}{K-1} \sum_{i=1, i \neq k}^{K} z_i$  (16)

  • where $K$ represents the number of first dialogue features and $z_i$ represents the $i$-th first dialogue feature.
  • The server calculates the semantic loss with the following formula (17), which estimates the distance between $z_k$ and $\overline{z}$ by the maximum mean discrepancy with a Gaussian kernel:

    $\mathcal{L}_{sem} = \mathbb{E}[\mathrm{GKF}(z_k, z_k')] + \mathbb{E}[\mathrm{GKF}(\overline{z}, \overline{z}')] - 2\,\mathbb{E}[\mathrm{GKF}(z_k, \overline{z})]$  (17)

  • where $z_k$ represents the first dialogue feature, $\overline{z}$ represents the average dialogue feature of the other posterior distributions, and $\mathrm{GKF}(\cdot)$ represents the Gaussian kernel function; the optimization makes the mathematical expectation of the distance between $z_k$ and the average dialogue feature large enough, while the expectation of the distances within each distribution remains small enough.
  • The parameters of the posterior network are then calculated by the following formula (18):

    $\theta_{R\text{-}net} \leftarrow \theta_{R\text{-}net} - lr \cdot \nabla_{\theta_{R\text{-}net}} \mathcal{L}_{sem}$  (18)

  • where $\theta_{R\text{-}net}$ represents the parameters of the posterior network, $lr$ represents the learning rate of the dialogue model, $\nabla$ represents derivation, and $\mathcal{L}_{sem}$ represents the semantic loss.
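  • A minimal sketch of the semantic loss in the MMD form of formulas (16)-(17). With a single feature vector on each side the estimate is degenerate but still illustrates the computation; the kernel bandwidth and the negated return value (so that gradient descent enlarges the distance) are assumptions, not details stated in this application.

```python
import torch

def gaussian_kernel(a, b, bandwidth=1.0):
    """GKF: Gaussian kernel between the rows of a and b."""
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2.0 * bandwidth ** 2))

def semantic_loss(z_all, k):
    """MMD-based estimate of the distance between z_k and the average of the
    other posterior features (formulas (16)-(17)); returned negated so that
    descending the loss pushes z_k away from the other distributions."""
    K = z_all.size(0)
    z_k = z_all[k:k + 1]                                    # (1, z_dim)
    z_bar = (z_all.sum(0, keepdim=True) - z_k) / (K - 1)    # formula (16)
    mmd = (gaussian_kernel(z_k, z_k).mean()
           + gaussian_kernel(z_bar, z_bar).mean()
           - 2.0 * gaussian_kernel(z_k, z_bar).mean())      # formula (17) estimate
    return -mmd

z_all = torch.randn(5, 64, requires_grad=True)  # K = 5 first dialogue features
print(semantic_loss(z_all, k=0))
```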
  • In step 405, the server updates the discriminator of the dialogue model according to at least two first dialogue features and at least two second dialogue features of a second dialogue, where the second dialogue includes a second dialogue above and at least two second dialogue replies.
  • In a possible implementation, the server can set the number of discriminator updates. For each discriminator update, the server selects at least one of the multiple dialogues as the second dialogue and then obtains at least two first dialogue features and at least two second dialogue features of the second dialogue; for details, refer to step 402, which is not repeated here.
  • For any second dialogue reply, the server can obtain the discriminator loss according to the first dialogue feature and the second dialogue feature corresponding to the second dialogue reply; for details, refer to step 403, which is not repeated here.
  • The server can then update the discriminator in the dialogue model according to the discriminator loss; when updating the parameters of the discriminator, refer to formula (15) above, which is not repeated here.
  • Steps 401 to 405 constitute one iteration of the dialogue model training method provided in the embodiments of this application, and the server repeats the above steps until the training end condition is satisfied.
  • In this method, the dialogue model is updated multiple times through multiple dialogue features of the first dialogue, the posterior network is then updated again, and the discriminator in the dialogue model is updated according to multiple dialogue features of the second dialogue, so that the different semantics of the dialogue are taken into consideration, the dialogue replies contain multiple semantics, and the diversity of the dialogue replies generated by the dialogue model is improved.
  • FIG. 5 is a schematic structural diagram of a dialogue model provided according to an embodiment of the present application.
  • On the left side, a first dialogue is schematically shown; the first dialogue includes a first dialogue above and K first dialogue replies.
  • Inputting the first dialogue to the encoder can obtain the first vector above the first dialogue and the second vector of the first dialogue reply.
  • the first vector is input into the prior network to obtain the prior distribution, and multiple second dialogue features can be sampled from each sub-distribution of the prior distribution.
  • The first dialogue feature corresponding to the k-th first dialogue reply is $z_k$, and the average value of the other first dialogue features is $\overline{z}$. The decoder decodes the first dialogue feature $z_k$ to obtain a reconstructed dialogue reply; the more similar the reconstructed dialogue reply is to the first dialogue reply, the better.
  • The following introduces the multi-semantic WAE (Wasserstein Auto-Encoder) algorithm used to train the above dialogue model in the embodiments of this application.
  • Enc: Encoder (encoder);
  • R-net: Posterior Network (posterior network);
  • P-net: Prior Network (prior network); Disc: Discriminator (discriminator);
  • Input: the dialogue corpus, the number of reply clusters K, the number of discriminator iterations n_critic, and the number of model iterations max-step.
  • FIG. 6 is a schematic flowchart of a multi-semantic WAE algorithm provided according to an embodiment of the present application.
  • the input of the WAE algorithm is multiple dialogues.
  • Step 1 is to initialize the encoder parameters;
  • Step 2 is the model-iteration judgment;
  • Step 3 is to obtain at least one first dialogue;
  • Step 4 is the start of the iteration over the first dialogue replies in the first dialogue;
  • Step 5 is to encode the first dialogue above and the first dialogue reply;
  • Step 6 is to obtain the first dialogue feature based on the posterior network;
  • Step 7 is to obtain the second dialogue feature based on the prior network;
  • Step 8 is to update the prior network according to the discriminator loss;
  • Step 9 is to update the posterior network according to the discriminator loss;
  • Step 10 is to update the encoder, the posterior network, the prior network, and the decoder according to the reconstruction loss;
  • Step 11 is to update the discriminator according to the discriminator loss;
  • Step 12 is the end of the iteration over the first dialogue replies;
  • Step 13 is the iteration judgment over the first dialogue features;
  • Step 14 is to update the posterior network based on the semantic loss;
  • Step 15 is the end of the iteration over the first dialogue features;
  • Step 16 is the iteration judgment based on the number of discriminator updates;
  • Step 17 is to obtain at least one second dialogue;
  • Step 18 is the iteration judgment over the second dialogue replies in the second dialogue;
  • Step 19 is to repeat the above steps 5 to 7;
  • Step 20 is to update the discriminator based on the discriminator loss;
  • Step 21 is the end of the iteration over the second dialogue replies;
  • Step 22 is to increase the number of discriminator updates by 1;
  • Step 23 is the end of the iteration based on the number of discriminator updates;
  • Step 24 is to increase the number of model iterations by 1;
  • Step 25 is the end of the model iteration.
  • Dialogue data set: the number of dialogues to be selected, M, and a threshold.
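  • Steps 1 to 25 above can be restated as Python-style pseudocode, as in the sketch below. Every helper name is a placeholder for an operation described in this document, not an official API; the structure, not the names, is the point.

```python
def train_ms_wae(dialogues, K, n_critic, max_step, lr):
    init_parameters()                                        # step 1
    for step in range(max_step):                             # steps 2, 24-25
        for dialog in sample_first_dialogs(dialogues):       # step 3
            for reply in dialog.replies:                     # steps 4, 12
                c, x = encode(dialog.context, reply)         # step 5
                z_post = posterior(c, x)                     # step 6
                z_prior = prior(c, reply.category)           # step 7
                d_loss = disc_loss(z_post, z_prior)          # formula (10)
                update(prior_net, d_loss, lr)                # step 8
                update(posterior_net, d_loss, lr)            # step 9
                update([enc, posterior_net, prior_net, dec],
                       recon_loss(c, z_post, reply), lr)     # step 10
                update(discriminator, d_loss, lr)            # step 11
            for z_k in posterior_features(dialog):           # steps 13, 15
                update(posterior_net, semantic_loss(z_k), lr)  # step 14
        for _ in range(n_critic):                            # steps 16, 22-23
            for dialog in sample_second_dialogs(dialogues):  # step 17
                for reply in dialog.replies:                 # steps 18, 21
                    z_post, z_prior = features(dialog.context, reply)  # step 19
                    update(discriminator,
                           disc_loss(z_post, z_prior), lr)   # step 20
```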
  • the embodiment of the application also designs experiments for verification.
  • the experiment is evaluated through two public session data sets.
  • One data set is Douban (from Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, and Ming Zhou, "Response generation by context-aware prototype editing", Proceedings of the AAAI Conference on Artificial Intelligence, Volume 33, pages 7281-7288).
  • BLEU: an automatic evaluation metric from machine translation;
  • BOW Embedding: bag-of-words embedding;
  • intra-dist: internal difference;
  • inter-dist: external difference.
  • BLEU includes Recall (recall rate), Precision (precision), and F1 (F1-Score);
  • BOW Embedding includes Average (average value), Extreme (extreme value), and Greedy (greedy value);
  • intra-dist includes dist-1 and dist-2, and inter-dist includes dist-1 and dist-2.
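  • dist-1 and dist-2 are commonly computed as the ratio of unique unigrams or bigrams to the total number of unigrams or bigrams in a set of replies; the patent text does not give the exact formula, so the following is a sketch of the conventional computation.

```python
def distinct_n(replies, n):
    """dist-n: ratio of unique n-grams to total n-grams over tokenized replies.
    Applied within one reply set for intra-dist, across sets for inter-dist."""
    ngrams = [tuple(toks[i:i + n])
              for toks in replies for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

replies = [["i", "like", "tea"], ["i", "like", "coffee"]]
print(distinct_n(replies, 1), distinct_n(replies, 2))  # dist-1, dist-2
```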
  • The numbers marked with a + sign in Table 2 indicate values that exceed the best baseline and are statistically significant.
  • The data in Table 2 show that the MS-WAE method proposed in this application significantly improves diversity while maintaining relevance.
  • this application also designs a manual evaluation experiment.
  • 5 participants were recruited to evaluate the dialogue replies in terms of Informativeness (whether a dialogue reply provides meaningful information), Appropriateness (whether a dialogue reply is logically appropriate), and Semantic Diversity (semantic diversity).
  • The score ranges from 0 to 2, with 0 being the worst and 2 being the best.
  • Table 3 shows the mean ± standard deviation for all methods. The results show that in terms of semantic diversity, MS-WAE is much better than the other methods on both data sets, exceeding the baselines.
  • Fig. 7 is a block diagram of an apparatus for training a dialogue model provided according to an embodiment of the present application.
  • The apparatus is used to execute the steps of the above dialogue model training method.
  • the device includes: a feature acquisition module 701, a model update module 702, and a model acquisition module 703.
  • The feature acquisition module 701 is configured to acquire at least two first dialogue features and at least two second dialogue features of a first dialogue, where the first dialogue feature and the second dialogue feature are used to represent the posterior feature and the prior feature, respectively, of the first dialogue above and one first dialogue reply, and one dialogue above corresponds to at least two dialogue replies.
  • Optionally, the feature acquisition module 701 is configured to acquire at least two first dialogue features and at least two second dialogue features of the first dialogue based on the prior network and the posterior network in the dialogue model, where the prior network is used to output the probability distribution of dialogue features, the posterior network is used to estimate the probability distribution of the dialogue features output by the prior network, the first dialogue feature is used to represent the posterior feature of the dialogue above and one dialogue reply in a dialogue, and the second dialogue feature is used to represent the prior feature of the dialogue above and one dialogue reply in a dialogue;
  • the first dialogue includes a first dialogue above and at least two first dialogue replies.
  • the model update module 702 is configured to update the dialog model based on at least two first dialog features and at least two second dialog features of the first dialog, the dialog model includes a priori network and a posterior network, the posterior network Used to estimate the probability distribution of the dialogue features output by the prior network;
  • the model update module 702 is further configured to update the posterior network based on at least two first dialog features of the first dialog;
  • The model update module 702 is further configured to update the discriminator of the dialogue model based on at least two first dialogue features and at least two second dialogue features of a second dialogue, where the second dialogue includes a second dialogue above and at least two second dialogue replies.
  • the model acquisition module 703 is configured to use the trained model as a dialogue model in response to meeting the training end condition.
  • the feature acquisition module 701 is configured to, for any first dialog reply, based on the dialog model, respectively encode the first dialog above and the first dialog reply to obtain the The first vector above the first dialogue and the second vector of the first dialogue reply; based on the posterior network, according to the first vector and the second vector, the first dialogue feature is obtained; based on the prior network, according to The first vector and the reply category to which the first dialogue reply belongs, and the second dialogue feature is obtained, and the reply category includes at least one other dialogue reply belonging to the same category as the first dialogue reply.
  • the feature acquisition module 701 is configured to reply to any of the first dialogs of the first dialog, and based on the encoder of the dialog model, the above and the first dialog of the first dialog A dialogue reply is coded separately to obtain the first vector above the first dialogue and the second vector of the first dialogue reply; obtain at least two first dialogue features of the first dialogue, and the first dialogue of the first dialogue A dialog feature is obtained by processing the first vector above the first dialog and the second vector of the first dialog reply through the posterior network; acquiring at least two second dialog features of the first dialog, the first The second dialogue feature of the conversation is obtained by processing the first vector above the first dialogue and the reply category to which the first dialogue reply belongs through the prior network.
  • the feature acquisition module 701 is configured to input the first dialogue above and the first dialogue reply into the encoder of the dialogue model respectively, and the encoder is based on a two-way gated loop unit Neural network construction; according to the encoder, the first dialogue above and the first dialogue reply are respectively encoded to obtain the first vector of the first dialogue above and the second vector of the first dialogue reply.
  • the feature acquisition module 701 is further configured to acquire the mean value of the first parameter and the variance of the first parameter according to the first vector and the second vector based on the posterior network; A parameter mean, the first parameter variance, and a first sample value are used to obtain a first dialog feature, and the first sample value is a value of a sample point obtained from a standard normal distribution.
  • the feature acquisition module 701 is configured to input the first vector and the second vector into the posterior network, and output the mean value of the first parameter and the variance of the first parameter; according to the first parameter
  • the mean value, the variance of the first parameter, and the first sampling value are used to obtain the first dialog feature, and the first sampling value is obtained by sampling the standard normal distribution.
  • the feature acquisition module 701 is configured to determine a target probability distribution according to the first vector and the reply category to which the first dialog reply belongs, and the target probability distribution is output by the prior network The probability distribution corresponding to the reply category in the probability distribution of the dialogue feature; based on the prior network, according to the first vector, obtain the second parameter mean and the second parameter variance; according to the second parameter mean and the second parameter variance And a second sampling value to obtain a second dialogue feature, where the second sampling value is a value of a sampling point obtained from the target probability distribution.
  • the feature acquisition module 701 is configured to determine a target probability distribution according to the first vector and the reply category to which the first dialog reply belongs, and the target probability distribution is output by the prior network The probability distribution corresponding to the response category in the probability distribution; input the first vector into the prior network to obtain the second parameter mean and the second parameter variance; according to the second parameter mean, the second parameter variance, and the second sampling Value, the second dialog feature is obtained, and the second sampling value is obtained by sampling the probability distribution of the target.
  • the model update module 702 is configured to, for any first dialogue reply of the first dialogue, obtain the first dialogue feature and the second dialogue feature corresponding to the first dialogue reply; obtain the discriminator loss from the first vector and the first and second dialogue features corresponding to the first dialogue reply, where the first vector is obtained by encoding the first dialogue context; obtain the reconstruction loss from the first vector and the first and second dialogue features corresponding to the first dialogue reply; update the parameters of the posterior network and the prior network in the dialogue model according to the discriminator loss; update the parameters of the encoder, the posterior network, the prior network, and the decoder in the dialogue model according to the reconstruction loss; and update the parameters of the discriminator in the dialogue model according to the discriminator loss.
  • the model update module 702 is configured to, based on the discriminator of the dialogue model, obtain the first Wasserstein distance between the first dialogue feature and the second dialogue feature corresponding to the first dialogue reply, according to the first vector of the first dialogue context and the first and second dialogue features corresponding to the first dialogue reply, and to use the first Wasserstein distance as the discriminator loss.
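A hedged sketch of this critic-style loss is shown below: the first Wasserstein distance is estimated as the difference of the critic's average scores on posterior and prior features, both conditioned on the context vector c. The critic architecture and dimensions are illustrative assumptions:

```python
# Hedged sketch: Wasserstein critic loss between posterior and prior features.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(512 + 64, 256), nn.ReLU(), nn.Linear(256, 1))

def discriminator_loss(z_post, z_prior, c):
    # z_post: first dialogue feature; z_prior: second dialogue feature
    d_post = critic(torch.cat([z_post, c], dim=-1)).mean()
    d_prior = critic(torch.cat([z_prior, c], dim=-1)).mean()
    return d_post - d_prior  # estimate of the first Wasserstein distance
```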
  • the model update module 702 is configured to decode the first dialogue feature based on the decoder in the dialogue model to obtain the target dialogue feature, and to obtain the reconstruction loss from the first vector, the first and second dialogue features corresponding to the first dialogue reply, and the target dialogue feature.
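The reconstruction loss can be sketched as the negative log-likelihood (token-level cross entropy) of the reference reply under the decoder; here `decoder` is an assumed callable that returns per-token vocabulary logits when teacher-forced on the reference reply, not an implementation from this application:

```python
# Hedged sketch: reconstruction loss of the reply decoded from (z, c).
import torch
import torch.nn.functional as F

def reconstruction_loss(decoder, z_post, c, reply_ids):
    # decoder(z, c, reply_ids) is assumed to return logits of shape
    # (batch, seq_len, vocab_size), teacher-forced on reply_ids
    logits = decoder(z_post, c, reply_ids)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), reply_ids.reshape(-1)
    )
```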
  • the model update module 702 is further configured to, for any first dialogue feature of the first dialogue, obtain the average of the other first dialogue features among the at least two first dialogue features excluding that first dialogue feature, and use the average as the average dialogue feature; obtain the second Wasserstein distance between the first dialogue feature and the average dialogue feature and use it as the semantic loss; and update the parameters of the posterior network according to the semantic loss.
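A hedged sketch of this semantic loss follows, using an MMD-style estimate with a Gaussian kernel as the distance between one first dialogue feature and the average of the others; since the training objective enlarges this distance, the sketch returns its negative as the loss to minimize. The kernel bandwidth and function names are assumptions:

```python
# Hedged sketch: MMD-style semantic distance between z_k and the average of
# the other posterior features, estimated with a Gaussian kernel.
import torch

def mmd(x, y, sigma=1.0):
    # biased MMD^2 estimate over batch samples with a Gaussian kernel
    def k(a, b):
        d = ((a.unsqueeze(1) - b.unsqueeze(0)) ** 2).sum(-1)
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() - 2 * k(x, y).mean() + k(y, y).mean()

def semantic_loss(z_list, k_idx):
    # z_list: per-reply posterior features of one dialogue, each (batch, dim)
    z_k = z_list[k_idx]
    z_avg = torch.stack([z for i, z in enumerate(z_list) if i != k_idx]).mean(0)
    # maximizing the distance pushes the posterior distributions of different
    # replies apart, so the minimized loss is the negative distance
    return -mmd(z_k, z_avg)
```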
  • in the embodiments of this application, the dialogue model is updated multiple times using the multiple dialogue features of the first dialogue, the posterior network is then updated again, and the discriminator in the dialogue model is updated based on the multiple dialogue features of the second dialogue. Because the parameters of the dialogue model are updated many times while taking the different semantics of the dialogue into account, the generated replies cover a variety of semantics, which improves the diversity of the dialogue replies generated by the dialogue model.
  • it should be noted that when the dialogue model training device provided in the above embodiment runs an application, the division into the above functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the dialog model training device provided in the foregoing embodiment and the dialog model training method embodiment belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • Fig. 8 is a block diagram of a device for generating a dialog reply according to an embodiment of the present application.
  • the device is configured to perform the steps of the above-mentioned dialogue reply generation method.
  • the device includes: a dialog acquisition module 801, a feature extraction module 802, a reply output module 803, and a reply display module 804.
  • the dialogue acquisition module 801 is used to acquire the dialogue context;
  • the feature extraction module 802 is configured to input the dialogue context into the dialogue model and, based on the prior network in the dialogue model, randomly extract one target dialogue feature from the first dialogue features corresponding to the multiple dialogue replies;
  • the reply output module 803 is configured to decode the target dialogue feature based on the decoder in the dialogue model and output the target dialogue reply;
  • the reply display module 804 is used to display the target dialogue reply.
  • the dialogue reply corresponding to the dialogue context is obtained by random extraction, so that when the same dialogue context is input multiple times, different dialogue replies can be obtained, thereby increasing the diversity of the dialogue replies.
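The inference path can be sketched end to end as follows; `encoder`, `prior`, and `decoder` are the assumed modules sketched earlier, and `greedy_decode` is an assumed decoding helper, so this is an illustrative composition rather than the application's implementation:

```python
# Hedged sketch: encode the context, randomly pick a prior component, sample a
# target dialogue feature, and decode it into the target dialogue reply.
import torch

@torch.no_grad()
def generate_reply(encoder, prior, decoder, context_ids, n_components=5):
    c = encoder(context_ids)                          # context vector
    j = torch.randint(0, n_components, (c.size(0),))  # random reply category
    z = prior(c, j)                                   # target dialogue feature
    return decoder.greedy_decode(z, c)                # target dialogue reply
```

Because the category j (and the sample within it) is drawn at random, feeding the same context twice can yield different features and therefore different replies, which is exactly the diversity mechanism described above.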
  • it should be noted that when the dialogue reply generating device provided in the above embodiment runs an application, the division into the above functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the dialog reply generating device provided in the above embodiment and the dialog reply generating method embodiment belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • the computer device can be configured as a terminal or a server.
  • the terminal can serve as the execution body to implement the technical solutions provided in the embodiments of this application.
  • the server can serve as the execution body to implement the technical solutions provided in the embodiments of this application, or the technical solutions provided in this application can be implemented through interaction between the terminal and the server, which is not limited in the embodiments of this application.
  • FIG. 9 is a structural block diagram of a terminal 900 provided in an embodiment of the present application.
  • the terminal 900 may be: a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer.
  • the terminal 900 may also be known by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
  • the terminal 900 includes a processor 901 and a memory 902.
  • the processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 901 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 901 may also include a main processor and a coprocessor.
  • the main processor, also called the CPU (Central Processing Unit), is a processor used to process data in the awake state; the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 901 may be integrated with a GPU (Graphics Processing Unit), and the GPU is responsible for rendering and drawing the content that the display screen needs to display.
  • the processor 901 may also include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
  • the memory 902 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 902 is used to store at least one instruction, and the at least one instruction is executed by the processor 901 to implement the dialogue model training method or the dialogue reply generation method provided in the method embodiments of this application.
  • the terminal 900 may optionally further include: a peripheral device interface 903 and at least one peripheral device.
  • the processor 901, the memory 902, and the peripheral device interface 903 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 903 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
  • the peripheral device interface 903 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 901 and the memory 902.
  • in some embodiments, the processor 901, the memory 902, and the peripheral device interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902, and the peripheral device interface 903 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 904 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 904 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 904 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on.
  • the radio frequency circuit 904 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks and/or WiFi (Wireless Fidelity, wireless fidelity) networks.
  • the radio frequency circuit 904 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
  • the display screen 905 is used to display a UI (User Interface, user interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 905 also has the ability to collect touch signals on or above the surface of the display screen 905.
  • the touch signal can be input to the processor 901 as a control signal for processing.
  • the display screen 905 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • the display screen 905 may be a flexible display screen, which is disposed on the curved surface or the folding surface of the terminal 900.
  • the display screen 905 can also be set as a non-rectangular irregular pattern, that is, a special-shaped screen.
  • the display screen 905 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera assembly 906 is used to capture images or videos.
  • the camera assembly 906 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • the camera assembly 906 may also include a flash.
  • the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
  • the audio circuit 907 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input to the processor 901 for processing, or input to the radio frequency circuit 904 to implement voice communication.
  • the microphone can also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 901 or the radio frequency circuit 904 into sound waves.
  • the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert an electrical signal into sound waves audible to humans, but also convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 907 may also include a headphone jack.
  • the positioning component 908 is used to locate the current geographic location of the terminal 900 to implement navigation or LBS (Location Based Service, location-based service).
  • the positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 909 is used to supply power to various components in the terminal 900.
  • the power source 909 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 900 further includes one or more sensors 910.
  • the one or more sensors 910 include, but are not limited to: an acceleration sensor 911, a gyroscope sensor 912, a pressure sensor 913, a fingerprint sensor 914, an optical sensor 915, and a proximity sensor 916.
  • the acceleration sensor 911 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 900.
  • the acceleration sensor 911 may be used to detect the components of gravitational acceleration on three coordinate axes.
  • the processor 901 may control the display screen 905 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 911.
  • the acceleration sensor 911 may also be used for the collection of game or user motion data.
  • the gyroscope sensor 912 can detect the body direction and the rotation angle of the terminal 900, and the gyroscope sensor 912 can cooperate with the acceleration sensor 911 to collect the user's 3D actions on the terminal 900.
  • the processor 901 can implement the following functions according to the data collected by the gyroscope sensor 912: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 913 may be disposed on the side frame of the terminal 900 and/or the lower layer of the display screen 905.
  • when the pressure sensor 913 is disposed on the side frame of the terminal 900, it can detect the user's holding signal, and the processor 901 performs left/right hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 913.
  • when the pressure sensor 913 is disposed on the lower layer of the display screen 905, the processor 901 controls the operability controls on the UI according to the user's pressure operation on the display screen 905.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 914 is used to collect the user's fingerprint.
  • the processor 901 can identify the user's identity based on the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 can identify the user's identity based on the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 901 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • the fingerprint sensor 914 may be provided on the front, back or side of the terminal 900. When a physical button or a manufacturer logo is provided on the terminal 900, the fingerprint sensor 914 may be integrated with the physical button or the manufacturer logo.
  • the optical sensor 915 is used to collect the ambient light intensity.
  • the processor 901 may control the display brightness of the display screen 905 according to the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the display screen 905 is increased; when the ambient light intensity is low, the display brightness of the display screen 905 is decreased.
  • the processor 901 may also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
  • the proximity sensor 916, also called a distance sensor, is usually provided on the front panel of the terminal 900.
  • the proximity sensor 916 is used to collect the distance between the user and the front of the terminal 900.
  • when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 gradually decreases, the processor 901 controls the display screen 905 to switch from the bright-screen state to the rest-screen state; when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 gradually increases, the processor 901 controls the display screen 905 to switch from the rest-screen state to the bright-screen state.
  • FIG. 9 does not constitute a limitation on the terminal 900, and may include more or fewer components than shown in the figure, or combine certain components, or adopt different component arrangements.
  • FIG. 10 is a schematic structural diagram of a server provided according to an embodiment of the present application.
  • the server 1000 may vary greatly due to different configurations or performance, and may include one or more processors (Central Processing Units, CPU) 1001 and one or more memories 1002, where the memory 1002 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1001 to implement the dialogue model training method or the dialogue reply generation method provided by the foregoing method embodiments.
  • of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and the server 1000 may also include other components for implementing device functions, which will not be repeated here.
  • the embodiments of this application also provide a computer-readable storage medium applied to a computer device. The computer-readable storage medium stores at least one piece of program code, and the at least one piece of program code is executed by a processor to implement the operations performed by the computer device in the dialogue model training method or the dialogue reply generation method in the embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种对话模型的训练方法、对话回复生成方法、装置、计算机设备及存储介质,属于人工智能技术领域。所述方法包括:获取第一对话的至少两个第一对话特征和至少两个第二对话特征;基于所述第一对话的至少两个第一对话特征和至少两个第二对话特征,更新对话模型;基于所述第一对话的至少两个第一对话特征,更新所述后验网络;根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型中的判别器;响应于满足训练结束条件,将训练得到的模型作为对话模型。上述技术方案,考虑了对话的不同语义,使得对话的回复包含多种语义,提高了通过对话模型生成的对话回复的多样性。

Description

对话模型的训练方法、装置、计算机设备及存储介质
本申请要求于2020年05月25日提交的申请号为2020104501940、发明名称为“对话模型的训练方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,特别涉及一种对话模型的训练方法、对话回复生成方法、装置、计算机设备及存储介质。
背景技术
随着人工智能技术的不断发展,自然语言处理可以应用在更广阔的范围。例如,闲聊机器人、对话系统以及终端智能助手等人机交互场景。计算机设备可以根据用户在对话过程中输入的对话上文,来输出对应的对话回复。如何避免计算机设备输出的对话回复过于单调,是一个需要解决的问题。
发明内容
本申请实施例提供了一种对话模型的训练方法、对话回复生成方法、装置、计算机设备及存储介质,通过根据对话的对话特征,多次更新对话模型的参数,考虑了对话的不同语义,使得对话的回复包含多种语义,提高了通过对话模型生成的对话回复的多样性。所述技术方案如下:
一方面,提供了一种对话模型的训练方法,所述方法包括:
基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,所述先验网络用于输出对话特征的概率分布,所述后验网络用于估计所述先验网络所输出的对话特征的概率分布,所述第一对话特征用于表示一个对话中对话上文和一个对话回复的后验特征,所述第二对话特征用于表示一个对话中所述对话上文和一个对话回复的先验特征,所述第一对话包括一个第一对话上文和至少两个第一对话回复;
基于所述第一对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型;
基于所述第一对话的至少两个第一对话特征,更新所述后验网络;
根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型的判别器;
响应于满足训练结束条件,将训练得到的模型作为对话模型。
另一方面,提供了一种对话回复生成方法,所述方法包括:
获取对话上文;
将所述对话上文输入对话模型,基于所述对话模型中的先验网络,从多个对话回复对应 的第二对话特征中随机抽取一个目标对话特征;
基于所述对话模型中的解码器对所述目标对话特征进行解码,输出目标对话回复;
展示所述目标对话回复。
另一方面,提供了一种对话模型的训练装置,所述装置包括:
特征获取模块,用于基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,所述先验网络用于输出对话特征的概率分布,所述后验网络用于估计所述先验网络所输出的对话特征的概率分布,所述第一对话特征用于表示一个对话中对话上文和一个对话回复的后验特征,所述第二对话特征用于表示一个对话中所述对话上文和一个对话回复的先验特征,所述第一对话包括一个第一对话上文和至少两个第一对话回复;
模型更新模块,用于基于所述第一对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型;
所述模型更新模块,还用于基于所述第一对话的至少两个第一对话特征,更新所述后验网络;
所述模型更新模块,还用于根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型的判别器;
模型获取模块,用于响应于满足训练结束条件,将训练得到的模型作为对话模型。
另一方面,提供了一种对话回复生成装置,所述装置包括:
对话获取模块,用于获取对话上文;
特征抽取模块,用于将所述对话上文输入对话模型,基于所述对话模型中的先验网络,从多个对话回复对应的第一对话特征中随机抽取一个目标对话特征;
回复输出模块,用于基于所述对话模型中的解码器对所述目标对话特征进行解码,输出目标对话回复;
回复展示模块,用于展示所述目标对话回复。
另一方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器用于存储至少一段程序代码,所述至少一段程序代码由所述处理器加载并执行以实现本申请实施例中的对话模型的训练方法中所执行的操作,或者执行以实现本申请实施例中的对话回复生成方法中所执行的操作。
另一方面,提供了一种存储介质,所述存储介质中存储有至少一段程序代码,所述至少一段程序代码用于执行本申请实施例中的对话模型的训练方法,或者执行本申请实施例中的对话回复生成方法。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是根据本申请实施例提供的一种对话模型的训练方法的实施环境示意图;
图2是本申请实施例提供的一种对话模型的训练方法的流程图;
图3是本申请实施例提供的一种对话回复生成方法的流程图;
图4是本申请实施例提供的一种对话模型的训练方法的流程图;
图5是根据本申请实施例提供的一种对话模型的结构示意图;
图6是根据本申请实施例提供的一种多语义WAE算法的流程示意图;
图7是根据本申请实施例提供的一种对话模型的训练装置的框图;
图8是根据本申请实施例提供的一种对话回复生成装置的框图;
图9是本申请实施例提供的一种终端的结构框图;
图10是根据本申请实施例提供的一种服务器的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。
下面简单介绍一下本申请实施例可能用到的技术:
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能云服务,一般也被称作是AIaaS(AI as a Service,中文为“AI即服务”)。这是目前主流的一种人工智能平台的服务方式,具体来说AIaaS平台会把几类常见的AI服务进行拆分,并在云端提供独立或者打包的服务。这种服务模式类似于开了一个AI主题商城:所有的开发者都可以通过API接口的方式来接入使用平台提供的一种或者是多种人工智能服务,部分资深的开发者还可以使用平台提供的AI框架和AI基础设施来部署和运维自已专属的云人工智能服务。
自然语言处理(Nature Language processing,NLP)是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系。自然语言处理技术通常包括文本处理、语义理解、机器翻译、机器人问答、知识图谱等技术。
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、示教学习等技术。
本申请实施例提供了一种对话模型的训练方法,可以基于人工智能技术实现。该方法训练得到的对话模型,可以应用于人机交互的场景中。例如,聊天机器人、对话系统以及终端智能助手等。用户在与聊天机器人进行聊天时,聊天机器人可以将用户输入的内容作为对话上文输入对话模型中,由对话模型输出多个对话回复,然后向用户展示其中一个对话回复。同理,对话系统和终端智能助手也可以根据用户输入的内容,来输出符合用户需求的对话回复。
下面介绍一下对话模型的训练方法的实施环境,图1是根据本申请实施例提供的一种对话模型的训练方法的实施环境示意图。该实施环境可以包括:终端110和服务器120。
终端110以及服务器120可以通过有线或无线通信方式进行直接或间接地连接,本申请在此不做限制。终端110可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表等,但并不局限于此。终端110可以安装和运行有支持人机交互的应用程序。该应用程序可以是聊天机器人类应用程序、社交类应用程序以及终端智能助手类应用程序等。示意性的,终端110是用户使用的终端,终端110中运行的应用程序内登录有用户账户。
服务器120可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器。服务器120用于为支持人机交互的应用程序提供后台服务。可选地,服务器120承担主要模型训练工作,终端110承担次要模型训练工作;或者,服务器120承担次要模型训练工作,终端110承担主要模型训练工作;或者,服务器120或终端110分别可以单独承担模型训练工作。
可选地,服务器120可以由接入服务器、模型训练服务器和数据库服务器构成。接入服务器用于提供终端110提供接入服务。模型训练服务器用于根据终端提供的已授权的对话数据进行模型训练。模型训练服务器可以是一台或多台。当模型训练服务器是多台时,存在至少两台模型训练服务器用于提供不同的服务,和/或,存在至少两台模型训练服务器用于提供相同的服务,比如以负载均衡方式提供同一种服务,本申请实施例对此不加以限定。
终端110可以泛指多个终端中的一个,本实施例仅以终端110来举例说明。
本领域技术人员可以知晓,上述终端的数量可以更多或更少。比如上述终端可以仅为一个,或者上述终端为几十个或几百个,或者更多数量,此时上述对话模型的训练方法的实施例中还包括其他终端。本申请实施例对终端的数量和设备类型不加以限定。
在本申请实施例中,可以由服务器或终端作为执行主体来实施本申请实施例提供的技术方案,也可以通过终端和服务器之间的交互来实施本申请提供的技术方法,本申请实施例对此不作限定。图2是本申请实施例提供的一种对话模型的训练方法的流程图。该实施例以执行主体为服务器为例进行说明,参见图2,该实施例包括:
201、服务器基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,该先验网络用于输出对话特征的概率分布,该后验网络用于估计该先验网络所输出的对话特征的概率分布,该第一对话特征用于表示一个对话中对话上文和一个对话回复的后验特征,该第二对话特征用于表示一个对话中该对话上文和一个对话回复的先验特征,该第一对话包括一个第一对话上文和至少两个第一对话回复。在步骤201 中,服务器获取第一对话的至少两个第一对话特征和至少两个第二对话特征,该第一对话特征和该第二对话特征分别用于表示第一对话上文和一个第一对话回复的后验特征和先验特征,一个对话上文对应于至少两个对话回复。
在本申请实施例中,服务器可以从多个对话中选取一个对话作为第一对话,该第一对话包括一个第一对话上文和该第一对话上文对应的至少两个第一对话回复。对于任一一组第一对话上文和第一对话回复,服务器可以通过先验网络和后验网络,分别获取对应先验特征和后验特征。
202、服务器基于该第一对话的至少两个第一对话特征和至少两个第二对话特征,更新对话模型。
在本申请实施例中，服务器可以获取第一对话的至少两个第一对话特征和至少两个第二对话特征。其中，根据第一对话中的第一对话上文和一个第一对话回复，可以得到一个第一对话特征和一个第二对话特征，根据该第一对话特征和第二对话特征对对话模型进行一次更新，更新对话模型中的先验网络和后验网络的参数。再根据第一对话中的第一对话上文和另一个第一对话回复，得到另一个第一对话特征和另一个第二对话特征，再对对话模型进行一次更新。对话模型更新的次数与第一对话中包含的第一对话回复的个数相同。
需要说明的是,该对话模型还可以包括编码器、解码器以及判别器,服务器在更新上述先验网络和后验网络的参数的同时,还会更新上述编码器、解码器以及判别器的参数。
203、服务器基于该第一对话的至少两个第一对话特征,更新该后验网络。
在本申请实施例中,服务器可以获取上述第一对话的至少两个第二对话特征,然后基于每个第二对话特征,更新一次后验网络的参数。
204、服务器根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新该对话模型的判别器。
其中,第二对话包括一个第二对话上文和至少两个第二对话回复。
在本申请实施例中,服务器从多个对话中选择至少一个对话作为第二对话,对于任一第二对话,服务器可以根据步骤201所描述的方式,获取该第二对话的至少两个第一对话特征和至少两个第二对话特征。对于根据任一个第二对话的第二对话上文和一个第二对话回复,可以得到一个第一对话特征和一个第二对话特征,基于该第一对话特征和第二对话特征,对判别器的参数进行一次更新。此时,判别器更新的次数为第二对话中包含的第二对话回复的个数。
需要说明的是,上述过程只是判别器的一次迭代过程,服务器可以获取判别器的迭代次数阈值,然后根据该迭代次数阈值进行多次迭代,达到迭代次数阈值则结束训练。
205、服务器响应于满足训练结束条件,将训练得到的模型作为对话模型。
在本申请实施例中,训练结束条件可以是达到预定的迭代次数,或者模型收敛,或者模型输出的结果符合目标条件,或者符合其他训练结束条件等,本申请实施例对此不进行限制。
在本申请实施例中,通过第一对话的多个对话特征,来多次更新对话模型,以及再次更新后验网络,再根据第二对话的多个对话特征,来更新对话模型的判别器,能够考虑到对话的不同语义,使得对话的回复包含多种语义,提高了对话模型的性能,也提高了通过对话模型生成的对话回复的多样性。
在本申请实施例中,可以由服务器或终端作为执行主体来实施本申请实施例提供的技术方案,也可以通过终端和服务器之间的交互来实施本申请提供的技术方法,本申请实施例对此不作限定。图3是本申请实施例提供的一种对话回复生成方法的流程图。该实施例以执行主体为终端为例进行说明,参见图3,该实施例包括:
301、终端获取对话上文。
在本申请实施例中,该对话上文可以是终端用户输入的内容,例如文字、语音或者表情符号等。
302、终端将该对话上文输入对话模型,基于该对话模型中的先验网络,从多个对话回复对应的第二对话特征中随机抽取一个目标对话特征。
在本申请实施例中,终端可以设置有对话模型,将用户输入的内容作为对话上文,输入对话模型中,由该对话模型对输入的对话上文进行编码,将编码得到的特征输入对话模型中的先验网络,基于该先验网络从多个第一对话特征中随机抽取一个目标对话特征。由于是随机抽取,当终端再次输入该对话上文时,先验网络抽取的对话特征可能与上一次抽取的对话特征不同,从而对话模型输出的对话回复也就不同。
303、终端基于该对话模型中的解码器对该目标对话特征进行解码,输出目标对话回复。
在本申请实施例中,对话模型中的解码器可以对随机抽取得到的目标对话特征进行解码,得到目标对话回复。先验网络随机抽取得到的对话特征不同,则解码器解码得到的对话回复不同。
304、终端展示该目标对话回复。
在本申请实施例中,终端可以采取语音播放、文字显示或者展示对应表情符号的方式来对上述目标对话回复进行展示。
在本申请实施例中,通过采用随机抽取的方式,获取对话上文所对应的对话回复,使得同一对话上文若多次输入对话模型,可以得到不同的对话回复,从而提高了对话回复的多样性。
需要说明的是,在上述通过对话模型进行交互的过程中,终端通过自身配置的对话模型进行对话回复的获取和输出,而在一些实施例中,终端可以通过配置在服务器上的对话模型来进行对话回复的获取,并基于获取到的对话回复来输出,以达到人机对话的效果。
在本申请实施例中,可以由服务器或终端作为执行主体来实施本申请实施例提供的技术方案,也可以通过终端和服务器之间的交互来实施本申请提供的技术方法,本申请实施例对此不作限定。图4是本申请实施例提供的一种对话模型的训练方法的流程图。该实施例以服务器进行一次迭代为例进行说明,参见图4,该实施例包括:
401、服务器从多个对话中获取第一对话。
在本申请实施例中,服务器可以从多个对话中随机选择N个对话作为第一对话,其中N为正整数。对于任一第一对话,该第一对话包括一个第一对话上文以及与该第一对话上文对应的K个第一对话回复,其中K为大于等于2的正整数。不同第一对话包括的第一对话回复的数量可以相同,也可以不同。
例如，数据集中包括1000个对话，服务器从中随机选择10个对话作为第一对话，得到第一对话A、B、C、D、E、F、G、H、I和J，其中第一对话A对应5个第一对话回复a1、a2、a3、a4以及a5，第一对话B对应6个第一对话回复b1、b2、b3、b4、b5以及b6，第一对话C对应6个第一对话回复c1、c2、c3、c4、c5以及c6。在此不再一一列举。
402、服务器基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,该第一对话特征用于表示一个对话中对话上文和一个对话回复的后验特征,该第二对话特征用于表示一个对话中该对话上文和一个对话回复的先验特征,该第一对话包括一个第一对话上文和至少两个第一对话回复。
在本申请实施例中,一个第一对话中包括至少两个第一对话回复,则N个第一对话中至少包括2N个第一对话回复。对于任一第一对话回复,服务器可以对该第一对话回复以及对应的第一对话上文进行编码,然后将编码得到的向量表示分别输入到先验网络和后验网络,得到先验特征和后验特征,也即第二对话特征和第一对话特征。
以一个第一对话为例,对于该第一对话包括的每个第一对话回复,服务器可以基于后验网络和先验网络,获取一对第一对话特征和第二对话特征,也即是,一个第一对话特征和一个第二对话特征。相应的,服务器获取一个第一对话的至少两个第一对话特征和至少两个第二对话特征的步骤,可以通过以下子步骤4021至子步骤4023来实现。
4021、对于第一对话的任一第一对话回复,服务器基于该对话模型的编码器,对第一对话上文和该第一对话回复分别进行编码,得到该第一对话上文的第一向量和第一对话回复的第二向量。
在本申请实施例中,服务器将第一对话上文和该第一对话回复,分别输入对话模型的编码器,该编码器为基于双向门控循环单元神经网络构建。服务器根据该编码器,对上述第一对话上文和该第一对话回复分别进行编码,得到该第一对话上文的第一向量和该第一对话回复的第二向量。
需要说明的是,编码器通过双向门控循环单元神经网络对所有输入,如第一对话上文和第一对话回复,进行编码,编码得到的向量为固定长度的向量。例如,以对第一对话上文
$x=(w_1,w_2,\dots,w_T)$ 进行编码得到第一向量 $c$ 为例进行说明。第一向量 $c$ 通过以下公式(1)至公式(4)计算得到。

$$\overrightarrow{h}_t=\mathrm{GRU}\big(\overrightarrow{h}_{t-1},\,e_t\big)\tag{1}$$

其中，$\overrightarrow{h}_t$ 表示第一对话上文 $x$ 中左数第 $t$ 个单词的向量表示，$\mathrm{GRU}(\cdot)$ 表示门控循环单元，$\overrightarrow{h}_{t-1}$ 表示第一对话上文 $x$ 中左数第 $t-1$ 个单词的向量表示，$e_t$ 表示第一对话上文 $x$ 中左数第 $t$ 个单词对应的编码。

$$\overleftarrow{h}_t=\mathrm{GRU}\big(\overleftarrow{h}_{t+1},\,e_t\big)\tag{2}$$

其中，$\overleftarrow{h}_t$ 表示第一对话上文 $x$ 中右数第 $t$ 个单词的向量表示，$\mathrm{GRU}(\cdot)$ 表示门控循环单元，$\overleftarrow{h}_{t+1}$ 表示第一对话上文 $x$ 中右数第 $t+1$ 个单词的向量表示，$e_t$ 表示第一对话上文 $x$ 中右数第 $t$ 个单词对应的编码。

$$h_t=\big[\overrightarrow{h}_t;\,\overleftarrow{h}_t\big]\tag{3}$$

其中，$h_t$ 表示第一对话上文 $x$ 中左数第 $t$ 个单词的向量表示和右数第 $t$ 个单词的向量表示的拼接向量。

$$c=\big[\overrightarrow{h}_T;\,\overleftarrow{h}_1\big]\tag{4}$$

其中，$c$ 表示第一对话上文 $x$ 中左数第 $T$ 个单词的向量表示和右数第 $1$ 个单词的向量表示的拼接向量，$T$ 表示第一对话上文 $x$ 中包括的单词个数。
4022、服务器获取该第一对话的至少两个第一对话特征，该第一对话的该第一对话特征通过该后验网络对该第一对话上文的第一向量和该第一对话回复的第二向量进行处理得到。
在该步骤4022中,服务器基于后验网络,根据该第一对话上文的第一向量和第二向量,获取第一对话特征。
在本申请实施例中,后验网络用于基于对话上文和对话回复,来学习对话的对话特征的分布,根据回复信息可以使得训练得到的对话模型中对话特征的分布更准确。后验网络输出的对话特征的概率分布称为后验分布,该后验分布用于估计先验分布,也即先验网络所输出的对话特征的概率分布。
在一种可选的实现方式中,假设当前需要重构的对话回复为第一对话回复,后验分布服从正态分布。相应的,服务器基于后验网络,根据该第一向量和第二向量,获取第一对话特征的步骤可以为:服务器可以基于后验网络,根据第一对话上文的第一向量和第一对话回复的第二向量,获取后验分布的第一参数均值和第一参数方差。服务器可以根据该第一参数均值、第一参数方差以及第一采样值,获取第一对话特征。其中,第一采样值为从标准正态分布中采样得到的值,也即是采样点的值。由于通过标准正态分布上采样得到的值,来获取第一对话特征,使得在训练过程中,基于该第一对话特征由解码器重构出对话回复,并基于重构出的对话回复与第一对话回复之间的差别,来调整对话模型的参数,使该第一对话特征与第一对话回复的差别较小,从而可以用第一对话特征来表示第一对话回复。
需要说明的是,服务器基于后验网络,获取第一对话特征时,通过以下公式(5)和公式(6)来计算得到。
$$\big[\mu_k;\,\sigma_k\big]=W\,g_\phi\big([x_k;\,c]\big)+b\tag{5}$$

其中，$\mu_k$ 表示第一参数均值，$\sigma_k$ 表示第一参数方差，$W$ 表示可变参数，$g_\phi(\cdot)$ 表示后验网络，$x_k$ 表示第一对话回复的第二向量，$c$ 表示第一对话上文的第一向量，$b$ 表示偏置参数。

$$z_k=\mu_k+\sigma_k\odot\epsilon,\quad\epsilon\sim\mathcal{N}(0,\,I)\tag{6}$$

其中，$z_k$ 表示第一对话特征，$\mu_k$ 表示第一参数均值，$\sigma_k$ 表示第一参数方差，$\epsilon$ 表示第一采样值，$\epsilon\sim\mathcal{N}(0,I)$ 表示 $\epsilon$ 服从标准正态分布。
4023、服务器可以基于先验网络,根据该第一向量和该第一对话回复所属的回复类别,获取第二对话特征,该回复类别包括与该第一对话回复属于相同类别的至少一个其他对话回复。
在本申请实施例中,先验网络用于表示真实的对话特征的概率分布,由后验分布估计得出。在通过后验分布估计先验分布时,从先验分布中选择一个子分布,去匹配后验分布。为了能够精确的匹配后验分布和先验分布,可以对一个对话上文对应的至少两个对话回复进行聚类,得到多个回复类别。然后在获取第二对话特征时,根据第一对话回复所属的回复类别,来选择先验分布中的子分布。
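回复类别的划分可以理解为先对同一对话上文的多条回复做聚类。专利原文给出的具体聚类算法以图像形式呈现、无法恢复，下面仅以通用的"句向量 + K-means"给出一个假设性的示意（sklearn 的 KMeans 为示意选型，并非原文实现）：

```python
# 假设性示意：对同一对话上文的多条回复向量聚类，得到每条回复的回复类别 j
import numpy as np
from sklearn.cluster import KMeans

def cluster_replies(reply_vectors: np.ndarray, n_clusters: int) -> np.ndarray:
    # reply_vectors: (回复条数, 向量维度)，例如由上文编码器得到的第二向量
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reply_vectors)

labels = cluster_replies(np.random.randn(6, 64).astype(np.float32), n_clusters=3)
print(labels)  # 每条回复所属的类别，用于在先验的混合分布中选择子分布
```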
在一种可选的实现方式中,假设先验分布服从混合高斯,服务器根据第一对话回复所属的回复类别,来选择子分布,然后从该子分布上采样得到第二对话特征。相应的,服务器基于先验网络,根据该第一向量和该第一对话回复所属的回复类别,获取第二对话特征的步骤可以为:服务器可以根据第一向量和第一对话回复所属的回复类别,确定目标概率分布,该目标概率分布为先验网络所输出的对话特征的概率分布中该回复类别对应的概率分布,也即用于与后验分布进行匹配的子分布。服务器可以基于先验网络,根据第一向量,获取第二参数均值和第二参数方差。服务器可以根据第二参数均值、第二参数方差以及第二采样值,获取第二对话特征。其中,该第二采样值为从目标概率分布中采样得到的值,也即是,采样点的值。由于通过混合高斯分布中的子分布上的采样值,来获取第二对话特征,使得在训练过程中,基于编码器,根据该第二对话特征与第一对话特征,获取先验分布和后验分布之间的Wasserstein(瓦瑟斯坦)距离,从而精确的匹配先验分布和后验分布。
需要说明的是，服务器基于先验网络，获取第二对话特征时，可以通过以下公式(7)至公式(9)计算得到。

$$\tilde{z}_k\sim p_\theta(z\mid c,\,r_k)=\sum_{j=1}^{J}\pi_j\,\mathcal{N}\big(z;\,\tilde{\mu}_j,\,\tilde{\sigma}_j^2 I\big)\tag{7}$$

其中，$\tilde{z}_k$ 表示第二对话特征，$\tilde{z}_k\sim p_\theta(\cdot)$ 表示第二对话特征服从先验分布，$j$ 表示回复类别的标识，$J$ 表示回复类别的总数，$\pi_j$ 表示子分布选择参数，$r_k$ 表示第 $k$ 个对话回复，$p_\theta(\cdot)$ 表示先验分布。

$$\big[\tilde{\mu}_j;\,\tilde{\sigma}_j\big]=\tilde{W}_j\,f_\theta(c)+\tilde{b}_j\tag{8}$$

其中，$\tilde{\mu}_j$ 表示回复类别 $j$ 对应的目标概率分布的第二参数均值，$\tilde{\sigma}_j$ 表示回复类别 $j$ 对应的目标概率分布的第二参数方差，$\tilde{W}_j$ 表示回复类别 $j$ 对应的目标概率分布的可变参数，$f_\theta(\cdot)$ 表示先验网络，$c$ 表示第一对话上文的第一向量，$\tilde{b}_j$ 表示回复类别 $j$ 对应的目标概率分布的偏置参数。

$$\tilde{z}_k=\tilde{\mu}_j+\tilde{\sigma}_j\odot\tilde{\epsilon},\quad\tilde{\epsilon}\sim\mathcal{N}(0,\,I)\tag{9}$$

其中，$\tilde{z}_k$ 表示第二对话特征，$\tilde{\mu}_j$ 表示回复类别 $j$ 对应的目标概率分布的第二参数均值，$\tilde{\sigma}_j$ 表示回复类别 $j$ 对应的目标概率分布的第二参数方差，$\tilde{\epsilon}$ 表示第二采样值，$\tilde{\epsilon}\sim\mathcal{N}(0,I)$ 表示 $\tilde{\epsilon}$ 服从标准正态分布。
需要说明的是，对于任一第一对话，根据该第一对话包括至少两个对话回复，可以得到至少两个后验分布，从每个后验分布上可以采样得到一个第一对话特征 $z_k$。根据该第一对话包括的对话上文，可以得到一个先验分布，该先验分布包括至少两个子分布，从每个子分布上可以采样得到一个第二对话特征 $\tilde{z}_k$，也即针对于同一个第一对话，得到的至少两个第二对话特征 $\tilde{z}_k$ 来自于同一个先验分布。
403、服务器基于该第一对话的至少两个第一对话特征和至少两个第二对话特征,更新对话模型。
在本申请实施例中,对于任一第一对话回复,服务器可以获取该第一对话回复对应的第一对话特征和第二对话特征。服务器可以根据第一对话上文编码得到的第一向量、该第一对话回复对应的第一对话特征和第二对话特征,获取判别器损失和重构损失。然后,服务器可以根据该判别器损失,更新对话模型中后验网络和先验网络的参数,根据该重构损失,更新对话模型中编码器、后验网络、先验网络以及解码器的参数。最后,服务器可以根据判别器损失,更新对话模型的判别器的参数。
需要说明的是，判别器的损失通过对抗式网络优化后验分布与先验分布之间的Wasserstein（瓦瑟斯坦）距离来获得。服务器基于对话模型的判别器，根据第一对话上文的第一向量、该第一对话回复对应的第一对话特征和第二对话特征，获取第一对话特征和第二对话特征之间的第一瓦瑟斯坦距离，将该第一瓦瑟斯坦距离作为判别器损失。相应的，判别器损失可以通过公式(10)计算得到。
$$\mathcal{L}_{disc}=\mathbb{E}_{z_k}\big[D(z_k,\,c)\big]-\mathbb{E}_{\tilde{z}_k}\big[D(\tilde{z}_k,\,c)\big]\tag{10}$$

其中，$\mathcal{L}_{disc}$ 表示判别器损失，$\mathbb{E}_{z_k}[\cdot]$ 表示关于第一对话特征 $z_k$ 的数学期望，$D(\cdot)$ 表示判别器，$z_k$ 表示第一对话特征，$c$ 表示第一对话上文的第一向量，$\mathbb{E}_{\tilde{z}_k}[\cdot]$ 表示关于第二对话特征 $\tilde{z}_k$ 的数学期望，$\tilde{z}_k$ 表示第二对话特征。
相应的,服务器根据判别器损失,更新对话模型中先验网络的参数时,可以通过公式(11)计算得到。
$$\theta_{P\text{-}net}\leftarrow\theta_{P\text{-}net}-lr\cdot\nabla_{\theta_{P\text{-}net}}\mathcal{L}_{disc}\tag{11}$$

其中，$\theta_{P\text{-}net}$ 表示先验网络的参数，$lr$ 表示对话模型的学习率，$\nabla$ 表示求导，$\mathcal{L}_{disc}$ 表示判别器损失。
相应的，服务器根据判别器损失，更新对话模型中后验网络的参数，该后验网络的参数通过公式(12)计算得到。

$$\theta_{R\text{-}net}\leftarrow\theta_{R\text{-}net}-lr\cdot\nabla_{\theta_{R\text{-}net}}\mathcal{L}_{disc}\tag{12}$$

其中，$\theta_{R\text{-}net}$ 表示后验网络的参数，$lr$ 表示对话模型的学习率，$\nabla$ 表示求导，$\mathcal{L}_{disc}$ 表示判别器损失。
需要说明的是,重构损失可以根据后验分布上采样得到的第一对话特征,基于解码器对该第一对话特征进行解码,以重构对话回复,基于重构的对话回复与第一对话回复之间的误差确定重构损失。服务器可以基于对话模型中的解码器,对第一对话特征进行解码,获取解码得到的目标对话回复对应的目标对话特征。服务器可以根据第一向量、第一对话特征、第二对话特征和目标对话特征,获取重构损失。相应的,重构损失可以通过公式(13)计算得到。
$$\mathcal{L}_{rec}=-\mathbb{E}_{z_k\sim q_\phi(z\mid x_k,\,c)}\big[\log p_\psi(x_k\mid z_k,\,c)\big]\tag{13}$$

其中，$\mathcal{L}_{rec}$ 表示重构损失，$\mathbb{E}_{z_k\sim q_\phi}[\cdot]$ 表示从后验分布中无限次采样得到第一对话特征 $z_k$、以使得重构的目标对话特征的概率整体足够大的数学期望，$p_\psi(\cdot)$ 表示解码器，$x_k$ 表示目标对话特征。
相应的,服务器更新对话模型中编码器、后验网络、先验网络以及解码器的参数时,可以通过公式(14)来计算得到。
$$\theta_{net}\leftarrow\theta_{net}-lr\cdot\nabla_{\theta_{net}}\mathcal{L}_{rec},\quad\text{s.t.}\ net\in\{Enc,\,P\text{-}net,\,R\text{-}net,\,Dec\}\tag{14}$$

其中，$\theta_{net}$ 表示 $net$ 的参数，$lr$ 表示对话模型的学习率，$\mathcal{L}_{rec}$ 表示重构损失，$net\in\{Enc,P\text{-}net,R\text{-}net,Dec\}$ 表示 $net$ 是 $Enc$、$P\text{-}net$、$R\text{-}net$ 以及 $Dec$ 中的一个，$Enc$ 表示编码器，$P\text{-}net$ 表示先验网络，$R\text{-}net$ 表示后验网络，$Dec$ 表示解码器。
相应的,服务器根据判别器损失,更新对话模型中判别器的参数时,判别器的参数可以通过公式(15)计算得到。
$$\theta_{Disc}\leftarrow\theta_{Disc}-lr\cdot\nabla_{\theta_{Disc}}\mathcal{L}_{disc}\tag{15}$$

其中，$\theta_{Disc}$ 表示判别器的参数，$lr$ 表示对话模型的学习率，$\nabla$ 表示求导，$\mathcal{L}_{disc}$ 表示判别器损失。
404、服务器基于该第一对话的至少两个第一对话特征,更新该后验网络。
在本申请实施例中,服务器通过上述步骤可以得到至少两个第一对话特征,也即后验特征,为了使对话模型最终学习到的先验分布是可区分的多语义分布,服务器可以基于语义距离的优化目标来控制对话上文对应的后验分布之间的语义距离。
在一种可选的实现方式中,服务器可以通过使用最大均值差异,来最大化一个第一对话特征和其他第一对话特征的平均值之间的Wasserstein距离。相应的,服务器基于该第一对话的至少两个第一对话特征,更新该后验网络的步骤可以为:对于任一第一对话特征,服务器可以获取该至少两个第一对话特征中除该第一对话特征外其他第一对话特征的平均值,将该平均值作为平均对话特征。服务器可以获取该第一对话特征与平均对话特征之间的第二瓦瑟斯坦距离,将该第二瓦瑟斯坦距离作为语义损失。服务器可以根据该语义损失,更新后验网络的参数。由于控制了后验分布之间的语义距离,使得先验分布是可区分的多语义分布。
需要说明的是,服务器在获取该至少两个第一对话特征中除该第一对话特征外其他第一对话特征的平均值时,可以通过以下公式(16)计算得到。
$$\bar{z}_{\setminus k}=\frac{1}{K-1}\sum_{i\in\mathcal{I}_{\setminus k}}z_i\tag{16}$$

其中，$\bar{z}_{\setminus k}$ 表示平均对话特征，$K$ 表示第一对话特征的数量，$z_i$ 表示第 $i$ 个第一对话特征，$i\in\mathcal{I}_{\setminus k}$ 表示 $i$ 属于集合 $\mathcal{I}_{\setminus k}$，集合 $\mathcal{I}_{\setminus k}$ 中不包括 $K$ 个第一对话特征中的 $z_k$。
相应的,服务器通过以下公式(17)来计算语义损失。
$$\mathcal{L}_{sem}=\mathbb{E}\big[\mathrm{GKF}(z_k,\,z_k')\big]-2\,\mathbb{E}\big[\mathrm{GKF}(z_k,\,\bar{z}_{\setminus k})\big]+\mathbb{E}\big[\mathrm{GKF}(\bar{z}_{\setminus k},\,\bar{z}_{\setminus k}')\big]\tag{17}$$

其中，$\mathcal{L}_{sem}$ 表示语义损失，$z_k$ 表示第一对话特征，$\bar{z}_{\setminus k}$ 表示平均对话特征，$\mathrm{GKF}(\cdot)$ 表示高斯核函数；第一项表示从后验分布上采样的不同第一对话特征 $z_k$ 之间的距离足够小的数学期望，第二项表示从后验分布上采样的第一对话特征 $z_k$ 与其他后验分布的平均对话特征 $\bar{z}_{\setminus k}$ 距离足够大的数学期望，第三项表示其他后验分布的平均对话特征 $\bar{z}_{\setminus k}$ 之间的距离足够小的数学期望。
相应的,服务器根据该语义损失,更新后验网络的参数时,后验网络的参数通过以下公式(18)来计算得到。
$$\theta_{R\text{-}net}\leftarrow\theta_{R\text{-}net}-lr\cdot\nabla_{\theta_{R\text{-}net}}\mathcal{L}_{sem}\tag{18}$$

其中，$\theta_{R\text{-}net}$ 表示后验网络的参数，$lr$ 表示对话模型的学习率，$\nabla$ 表示求导，$\mathcal{L}_{sem}$ 表示语义损失。
405、服务器根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新该对话模型的判别器,第二对话包括一个第二对话上文和至少两个第二对话回复。
在本申请实施例中,服务器可以设置判别器的更新次数,每次更新判别器时,服务器从多个对话中选择至少一个对话作为第二对话,然后获取第二对话的至少两个第一对话特征和至少两个第二对话特征,可以参见步骤402,在此不再赘述。对于任一第二对话的第二对话回复,服务器可以根据该第二对话回复对应的第一对话特征和第二对话特征,获取判别器损失,具体可以参见步骤403,在此不再赘述。服务器可以根据该判别器损失,更新对话模型中的判别器。服务器更新对话模型中的判别器的参数时,可以参见上述公式(15),在此不再赘述。
需要说明的是,上述步骤401至步骤405是本申请实施例提供的对话模型的训练方法的一次迭代过程,服务器重复上述步骤,直至满足训练结束条件。
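为更直观地展示步骤401至步骤405的交替更新顺序，下面给出一个可运行的Python流程骨架；其中 update() 仅为打印占位，实际实现中应替换为按对应损失做一次梯度更新，各损失数值亦为示意假设，并非专利原文代码：

```python
# 一次训练迭代的流程骨架：仅示意更新顺序，损失值与 update() 均为占位假设
def update(module_name, loss):
    print(f"用损失 {loss:.2f} 更新 {module_name}")  # 占位：此处应执行一次梯度下降

def train_one_iteration(first_dialogs, second_dialogs, n_critic=5):
    for dialog in first_dialogs:                     # 步骤401：获取第一对话
        for reply in dialog["replies"]:              # 步骤402~403：逐回复更新
            l_disc, l_rec = 0.8, 1.5                 # 占位：判别器损失/重构损失
            update("先验网络+后验网络", l_disc)
            update("编码器+后验网络+先验网络+解码器", l_rec)
            update("判别器", l_disc)
        for k, _ in enumerate(dialog["replies"]):    # 步骤404：语义损失更新后验网络
            update("后验网络（语义损失）", 0.3)
    for _ in range(n_critic):                        # 步骤405：多次迭代更新判别器
        for dialog in second_dialogs:
            for reply in dialog["replies"]:
                update("判别器", 0.8)

train_one_iteration(
    [{"context": "上文A", "replies": ["回复a1", "回复a2"]}],
    [{"context": "上文B", "replies": ["回复b1", "回复b2"]}],
)
```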
在本申请实施例中,通过第一对话的多个对话特征来多次更新对话模型,以及再次更新后验网络,再根据第二对话的多个对话特征,来更新对话模型中的判别器,在训练过程中,考虑了对话的不同语义,使得对话的回复包含多种语义,提高了通过对话模型生成的对话回复的多样性。
图5是根据本申请实施例提供的一种对话模型的结构示意图，如图5所示，左侧示意性示出了一个第一对话，该第一对话包括一个第一对话上文和K个第一对话回复。将该第一对话输入编码器可以得到第一对话上文的第一向量和第一对话回复的第二向量。将第一向量输入先验网络可以得到先验分布，从先验分布的各子分布上可以采样得到多个第二对话特征。将第二向量分别输入后验网络可以得到后验分布，从一个后验分布上可以采样得到一个第一对话特征，第 $k$ 个第一对话回复对应的第一对话特征为 $z_k$，其他第一对话特征的平均值为 $\bar{z}_{\setminus k}$。解码器对第一对话特征 $z_k$ 进行解码，得到重构的对话回复，重构的对话回复与第一对话回复越相似越好。
下面介绍一下本申请实施例在训练上述对话模型时用到的多语义WAE（Wasserstein Auto-Encoder，Wasserstein自动编码器）算法。
/*Enc:Encoder(编码器);R-net:PosteriorNetwork(后验网络);
P-net:PriorNetwork(先验网络);Disc:Discriminator(判别器);
Dec:Decoder(解码器)*/
Input：文集（训练用对话数据，原文以图像形式给出，符号不可恢复），回复簇的数量 $K$，判别器迭代次数 $n_{critic}$，模型迭代次数 max-step。
［多语义WAE算法的主体在原文中以图像形式给出，内容不可恢复；其流程可参见下文图6的文字描述。］
为了使上述多语义WAE算法所描述的步骤更清晰,参见图6所示,图6是根据本申请实施例提供的一种多语义WAE算法的流程示意图。该WAE算法的输入为多个对话,步骤1为初始化编码器参数;步骤2模型迭代判定条件;步骤3为获取至少一个第一对话;步骤4为基于第一对话中的第一对话回复,进行迭代判定;步骤5为对第一对话上文和第一对话回复进行编码;步骤6为根据后验网络,得到第一对话特征;步骤7为根据先验网络,得到第二对话特征;步骤8为根据判别器损失,更新先验网络;步骤9为根据判别器损失,更新后验网络;步骤10为根据重构损失,更新编码器、后验网络、先验网络以及解码器;步骤11为根据判别器损失更新判别器;步骤12为基于第一对话回复的迭代结束;步骤13为基于第一对话特征,进行迭代判定;步骤14为根据语义损失,更新后验网络。步骤15为基于第一对话特征的迭代结束;步骤16为基于判别器更新次数,进行迭代判定;步骤17为获取至少一个第二对话;步骤18为基于第二对话中的第二对话回复,进行迭代判定;步骤19为重复上述步骤5至步骤7;步骤20为根据判别器损失更新判别器;步骤21为基于第二对话回复的迭代结束;步骤22为判别器更新次数加1;步骤23为基于判别器更新次数的迭代结束;步骤24为模型迭代次数加1;步骤25为模型迭代结束。
需要说明的是，上述WAE算法中可输入的文集可以通过下述算法得到。
Input：对话数据集（原文以图像形式给出，符号不可恢复），待选数量 $M$，阈值 $\tau$。
［该预处理算法的主体在原文中以图像形式给出，内容不可恢复。］
需要说明的是,为了验证本申请实施例提供的对话模型的训练方法训练得到的对话模型,具有良好的效果,本申请实施例还设计了实验进行验证。实验通过两个公共会话数据集进行评估。一个数据集是Douban(出自Yu Wu,Furu Wei,Shaohan Huang,Yunli Wang,Zhoujun Li,and Ming Zhou在2019年发表的《Response generationby context-aware prototype editing》,发表于Proceedingsof the AAAI Conference on Artificial Intelligence,第33卷,7281-7288页)。另一个是DailyDialg(出自Yanran Li,Hui Su,Xiaoyu Shen,Wenjie Li,ZiqiangCao,and ShuziNiu在2017年发表的《DailyDialog:A manually labelled multi-turn dialogue dataset》,发表于Proceedings of the Eighth International Joint Conference onNatural Language Processing(Volume 1:Long Papers),第986-995页)。数据集中样本数量的统计汇总可以参见表1所示。需要说明的是,在Douban和DailyDialog对话数据集的词汇量分别为20,000和10,000。
表1
数据集 train valid test
Douban 894,721 15,000 15,000
DailyDialog 68,096 6,895 6,695
下面介绍一下实验中用于对比的其他方法。在实验时，将本申请实施例提供的MS-WAE（Multi-Semantic Wasserstein AutoEncoder，多语义瓦瑟斯坦自动编码）方法与Bahdanau等人于2015年提出的Seq2Seq-attn（The standard Seq2Seq architecture with attention mechanism，带有注意力机制的标准Seq2Seq架构）方法，Gao等人于2019年提出的DCVAE（A discrete CVAE for response generation on short-text conversation，用于短文本对话产生响应的离散CVAE（Conditional AutoEncoder，有条件的自动编码））方法，Chen等人于2019年提出的MMPMS（Generating multiple diverse responses with multi-mapping and posterior mapping selection，通过多映射和后验映射选择产生多种多样的响应）方法，以及Gu等人于2018年提出的DialogWAE（Multimodal response generation with conditional wasserstein autoencoder，使用条件瓦瑟斯坦自动编码器生成多模态响应）方法进行了对比。
下面介绍一下实验的评价标准。在实验时,通过BLUE(一种机器翻译的自动评价方法)、BOWEmbedding(bagofwordsEmbedding,词袋模型嵌入)、intra-dist(内在差异),以及inter-dist(外在差异)者四个大的方面来进行评价。其中,BLUE包括Recall(召回率)、Precision(精确度)以及F1(F1-Score,F1分数)。BOWEmbedding包括Average(平均值)、Extrema(极值)以及Greedy(贪婪值)。intra-dist包括dist-1和dist-2,inter-dist包括dist-1和dist-2。
实验结果可以参见表2所示。
表2
［表2的内容在原文中以图像形式给出：其对比了各方法在Douban和DailyDialog数据集上的BLEU（Recall、Precision、F1）、BOW Embedding（Average、Extrema、Greedy）、intra-dist与inter-dist（dist-1、dist-2）指标，具体数值不可恢复。］
表2中带有+号的数字表示超过最佳基线、具有统计学意义的数值。表2中的数据表明，本申请提出的MS-WAE方法显著提高了多样性并保持了相关性。
另外，本申请还设计了人工评判实验。实验过程中招募了5名参与人员，分别从Informativeness（信息性，衡量一个对话回复是否提供有意义的信息）、Appropriateness（恰当性，衡量一个对话回复是否符合逻辑）以及Semantic Diversity（语义多样性）三个方面进行评价。评分分值为0-2，0表示最差，2表示最好。
人工评判实验结果可以参见表3所示。
表3
［表3的内容在原文中以图像形式给出：其列出了各方法在Informativeness、Appropriateness与Semantic Diversity上的人工评分（平均值±标准差），具体数值不可恢复。］
表3中示出了所有方法的平均值±标准差。结果表明，在语义多样性方面，MS-WAE在两个数据集上都大大优于其他方法，超过了基线。
图7是根据本申请实施例提供的一种对话模型的训练装置的框图。该装置用于执行上述对话模型的训练方法执行时的步骤,参见图7,装置包括:特征获取模块701、模型更新模块702以及模型获取模块703。
特征获取模块701,用于获取第一对话的至少两个第一对话特征和至少两个第二对话特征,该第一对话特征和该第二对话特征分别用于表示第一对话上文和一个第一对话回复的后验特征和先验特征,一个对话上文对应于至少两个对话回复;可选地,特征获取模块701,用于基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,该先验网络用于输出对话特征的概率分布,该后验网络用于估计该先验网络所输出的对话特征的概率分布,该第一对话特征用于表示一个对话中对话上文和一个对话回复的后验特征,该第二对话特征用于表示一个对话中该对话上文和一个对话回复的先验特征,该第一对话包括一个第一对话上文和至少两个第一对话回复。
模型更新模块702,用于基于该第一对话的至少两个第一对话特征和至少两个第二对话特征,更新对话模型,该对话模型中包括先验网络和后验网络,该后验网络用于估计该先验网络所输出的对话特征的概率分布;
该模型更新模块702,还用于基于该第一对话的至少两个第一对话特征,更新该后验网络;
该模型更新模块702,还用于根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新该对话模型的判别器;该第二对话包括第二对话上文和至少两个第二对话回复。
模型获取模块703,用于响应于满足训练结束条件,将训练得到的模型作为对话模型。
在一种可选的实现方式中,该特征获取模块701,用于对于任一第一对话回复,基于该对话模型,对该第一对话上文和该第一对话回复分别进行编码,得到该第一对话上文的第一向量和该第一对话回复的第二向量;基于该后验网络,根据该第一向量和该第二向量,获取第一对话特征;基于该先验网络,根据该第一向量和该第一对话回复所属的回复类别,获取第二对话特征,该回复类别包括与该第一对话回复属于相同类别的至少一个其他对话回复。
在一种可选的实现方式中，该特征获取模块701，用于对于该第一对话的任一该第一对话回复，基于该对话模型的编码器，对该第一对话上文和该第一对话回复分别进行编码，得到该第一对话上文的第一向量和该第一对话回复的第二向量；获取该第一对话的至少两个第一对话特征，该第一对话的该第一对话特征通过该后验网络对该第一对话上文的第一向量和该第一对话回复的第二向量进行处理得到；获取该第一对话的至少两个第二对话特征，该第一对话的该第二对话特征通过该先验网络对该第一对话上文的第一向量和该第一对话回复所属的回复类别进行处理得到。
在一种可选的实现方式中,该特征获取模块701,用于将该第一对话上文和该第一对话回复分别输入该对话模型的编码器,该编码器为基于双向门控循环单元神经网络构建;根据该编码器,对该第一对话上文和该第一对话回复分别进行编码,得到该第一对话上文的第一向量和该第一对话回复的第二向量。
在一种可选的实现方式中，该特征获取模块701，还用于基于该后验网络，根据该第一向量和该第二向量，获取第一参数均值和第一参数方差；根据该第一参数均值、该第一参数方差以及第一采样值，获取第一对话特征，该第一采样值为从标准正态分布中获取的采样点的值。
在一种可选的实现方式中,该特征获取模块701,用于将该第一向量和该第二向量输入该后验网络,输出第一参数均值和第一参数方差;根据该第一参数均值、该第一参数方差以及第一采样值,获取第一对话特征,该第一采样值为对标准正态分布采样得到。
在一种可选的实现方式中,该特征获取模块701,用于根据该第一向量和该第一对话回复所属的回复类别,确定目标概率分布,该目标概率分布为该先验网络所输出的对话特征的概率分布中该回复类别对应的概率分布;基于该先验网络,根据该第一向量,获取第二参数均值和第二参数方差;根据该第二参数均值、该第二参数方差以及第二采样值,获取第二对话特征,该第二采样值为从该目标概率分布中获取的采样点的值。
在一种可选的实现方式中,该特征获取模块701,用于根据该第一向量和该第一对话回复所属的回复类别,确定目标概率分布,该目标概率分布为该先验网络所输出的概率分布中该回复类别对应的概率分布;将该第一向量输入该先验网络,得到第二参数均值和第二参数方差;根据该第二参数均值、该第二参数方差以及第二采样值,获取第二对话特征,该第二采样值为对该目标概率分布采样得到。
在一种可选的实现方式中，该模型更新模块702，用于对于该第一对话的任一第一对话回复，获取该第一对话回复对应的第一对话特征和第二对话特征；根据第一向量、该第一对话回复对应的第一对话特征和第二对话特征，获取判别器损失，该第一向量基于该第一对话上文编码得到；根据该第一向量、该第一对话回复对应的第一对话特征和第二对话特征，获取重构损失；根据该判别器损失，更新该对话模型中后验网络和先验网络的参数；根据该重构损失，更新该对话模型中编码器、该后验网络、该先验网络以及解码器的参数；根据该判别器损失，更新该对话模型中判别器的参数。
在一种可选的实现方式中，该模型更新模块702，用于基于该对话模型的判别器，根据该第一对话上文的第一向量、该第一对话回复对应的第一对话特征和第二对话特征，获取该第一对话回复对应的第一对话特征和第二对话特征之间的第一瓦瑟斯坦距离，将该第一瓦瑟斯坦距离作为判别器损失。
在一种可选的实现方式中，该模型更新模块702，用于基于该对话模型中的解码器，对该第一对话特征进行解码，获取目标对话特征；根据该第一向量、该第一对话回复对应的第一对话特征和第二对话特征、该目标对话特征，获取重构损失。
在一种可选的实现方式中,该模型更新模块702,还用于对于第一对话的任一第一对话特征,获取该至少两个第一对话特征中除该第一对话特征外其他第一对话特征的平均值,将该平均值作为平均对话特征;获取该第一对话特征与该平均对话特征之间的第二瓦瑟斯坦距离,将该第二瓦瑟斯坦距离作为语义损失;根据该语义损失,更新该后验网络的参数。
在本申请实施例中,通过第一对话的多个对话特征来多次更新对话模型,以及再次更新后验网络,再根据第二对话的多个对话特征来更新对话模型中的判别器。使得根据对话的对话特征,多次更新对话模型的参数,考虑了对话的不同语义,使得对话的回复包含多种语义,提高了通过对话模型生成的对话回复的多样性。
需要说明的是:上述实施例提供的对话模型的训练装置在运行应用程序时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的对话模型的训练装置与对话模型的训练方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图8是根据本申请实施例提供的一种对话回复生成装置的框图。该装置用于执行上述对话回复生成方法执行时的步骤,参见图8,装置包括:对话获取模块801、特征抽取模块802、回复输出模块803以及回复展示模块804。
对话获取模块801,用于获取对话上文;
特征抽取模块802,用于将该对话上文输入对话模型,基于该对话模型中的先验网络,从多个对话回复对应的第一对话特征中随机抽取一个目标对话特征;
回复输出模块803,用于基于该对话模型中的解码器对该目标对话特征进行解码,输出目标对话回复;
回复展示模块804,用于展示该目标对话回复。
在本申请实施例中,通过采用随机抽取的方式获取对话上文所对应的对话回复,使得同一对话上文在多次输入时,可以得到不同的对话回复,从而提高了对话回复的多样性。
需要说明的是:上述实施例提供的对话回复生成装置在运行应用程序时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的对话回复生成装置与对话回复生成方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
在本申请实施例中,计算机设备可被配置为终端或者服务器,当计算机设备被配置为终端时,可以由终端作为执行主体来实施本申请实施例提供的技术方案,当计算机设备被配置为服务器时,可以由服务器作为执行主体来实施本申请实施例提供的技术方案,也可以通过终端和服务器之间的交互来实施本申请提供的技术方法,本申请实施例对此不作限定。
计算机设备被配置为终端时,图9是本申请实施例提供的一种终端900的结构框图。该终端900可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端900还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端900包括有:处理器901和存储器902。
处理器901可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器901可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器901也可以包括主处理器和协处理器,主处理器是用于对 在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器901可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器901还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器902可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器902还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器902中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器901所执行以实现本申请中方法实施例提供的对话模型的训练方法,或者对话回复生成方法。
在一些实施例中,终端900还可选包括有:外围设备接口903和至少一个外围设备。处理器901、存储器902和外围设备接口903之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口903相连。具体地,外围设备包括:射频电路904、显示屏905、摄像头组件906、音频电路907、定位组件908和电源909中的至少一种。
外围设备接口903可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器901和存储器902。在一些实施例中,处理器901、存储器902和外围设备接口903被集成在同一芯片或电路板上;在一些其他实施例中,处理器901、存储器902和外围设备接口903中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路904用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路904通过电磁信号与通信网络以及其他通信设备进行通信。射频电路904将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路904包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路904可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:城域网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路904还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
显示屏905用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏905是触摸显示屏时,显示屏905还具有采集在显示屏905的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器901进行处理。此时,显示屏905还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏905可以为一个,设置终端900的前面板;在另一些实施例中,显示屏905可以为至少两个,分别设置在终端900的不同表面或呈折叠设计;在再一些实施例中,显示屏905可以是柔性显示屏,设置在终端900的弯曲表面上或折叠面上。甚至,显示屏905还可以设置成非矩形的不规则图形,也即异形屏。显示屏905可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件906用于采集图像或视频。可选地,摄像头组件906包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件906还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路907可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器901进行处理,或者输入至射频电路904以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端900的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器901或射频电路904的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路907还可以包括耳机插孔。
定位组件908用于定位终端900的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件908可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统、俄罗斯的格雷纳斯系统或欧盟的伽利略系统的定位组件。
电源909用于为终端900中的各个组件进行供电。电源909可以是交流电、直流电、一次性电池或可充电电池。当电源909包括可充电电池时,该可充电电池可以支持有线充电或无线充电。该可充电电池还可以用于支持快充技术。
在一些实施例中,终端900还包括有一个或多个传感器910。该一个或多个传感器910包括但不限于:加速度传感器911、陀螺仪传感器912、压力传感器913、指纹传感器914、光学传感器915以及接近传感器916。
加速度传感器911可以检测以终端900建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器911可以用于检测重力加速度在三个坐标轴上的分量。处理器901可以根据加速度传感器911采集的重力加速度信号,控制显示屏905以横向视图或纵向视图进行用户界面的显示。加速度传感器911还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器912可以检测终端900的机体方向及转动角度,陀螺仪传感器912可以与加速度传感器911协同采集用户对终端900的3D动作。处理器901根据陀螺仪传感器912采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器913可以设置在终端900的侧边框和/或显示屏905的下层。当压力传感器913设置在终端900的侧边框时,可以检测用户对终端900的握持信号,由处理器901根据压力传感器913采集的握持信号进行左右手识别或快捷操作。当压力传感器913设置在显示屏905的下层时,由处理器901根据用户对显示屏905的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器914用于采集用户的指纹,由处理器901根据指纹传感器914采集到的指纹 识别用户的身份,或者,由指纹传感器914根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器901授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器914可以被设置终端900的正面、背面或侧面。当终端900上设置有物理按键或厂商Logo时,指纹传感器914可以与物理按键或厂商Logo集成在一起。
光学传感器915用于采集环境光强度。在一个实施例中,处理器901可以根据光学传感器915采集的环境光强度,控制显示屏905的显示亮度。具体地,当环境光强度较高时,调高显示屏905的显示亮度;当环境光强度较低时,调低显示屏905的显示亮度。在另一个实施例中,处理器901还可以根据光学传感器915采集的环境光强度,动态调整摄像头组件906的拍摄参数。
接近传感器916,也称距离传感器,通常设置在终端900的前面板。接近传感器916用于采集用户与终端900的正面之间的距离。在一个实施例中,当接近传感器916检测到用户与终端900的正面之间的距离逐渐变小时,由处理器901控制显示屏905从亮屏状态切换为息屏状态;当接近传感器916检测到用户与终端900的正面之间的距离逐渐变大时,由处理器901控制显示屏905从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图9中示出的结构并不构成对终端900的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
计算机设备被配置为服务器时,图10是根据本申请实施例提供的一种服务器的结构示意图,该服务器1000可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(Central Processing Units,CPU)1001和一个或一个以上的存储器1002,其中,该存储器1002中存储有至少一条指令,该至少一条指令由该处理器1001加载并执行以实现上述各个方法实施例提供的对话模型的训练方法或者对话回复生成方法。当然,该服务器还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该服务器1000还可以包括其他用于实现设备功能的部件,在此不做赘述。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质应用于计算机设备,该计算机可读存储介质中存储有至少一条程序代码,该至少一条程序代码用于被处理器执行并实现本申请实施例中的对话模型的训练方法或者对话回复生成方法中计算机设备所执行的操作。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (14)

  1. 一种对话模型的训练方法,其特征在于,所述方法包括:
    基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,所述先验网络用于输出对话特征的概率分布,所述后验网络用于估计所述先验网络所输出的对话特征的概率分布,所述第一对话特征用于表示一个对话中对话上文和一个对话回复的后验特征,所述第二对话特征用于表示一个对话中所述对话上文和一个对话回复的先验特征,所述第一对话包括一个第一对话上文和至少两个第一对话回复;
    基于所述第一对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型;
    基于所述第一对话的至少两个第一对话特征,更新所述后验网络;
    根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型的判别器;
    响应于满足训练结束条件,将训练得到的模型作为对话模型。
  2. 根据权利要求1所述的方法,其特征在于,所述基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,包括:
    对于所述第一对话的任一所述第一对话回复,基于所述对话模型的编码器,对所述第一对话上文和所述第一对话回复分别进行编码,得到所述第一对话上文的第一向量和所述第一对话回复的第二向量;
    获取所述第一对话的至少两个第一对话特征，所述第一对话的所述第一对话特征通过所述后验网络对所述第一对话上文的第一向量和所述第一对话回复的第二向量进行处理得到；
    获取所述第一对话的至少两个第二对话特征，所述第一对话的所述第二对话特征通过所述先验网络对所述第一对话上文的第一向量和所述第一对话回复所属的回复类别进行处理得到。
  3. 根据权利要求2所述的方法,其特征在于,所述获取所述第一对话的至少两个第一对话特征包括:
    将所述第一向量和所述第二向量输入所述后验网络,输出第一参数均值和第一参数方差;
    根据所述第一参数均值、所述第一参数方差以及第一采样值,获取第一对话特征,所述第一采样值为对标准正态分布采样得到。
  4. 根据权利要求2所述的方法,其特征在于,所述获取所述第一对话的至少两个第二对话特征,包括:
    根据所述第一向量和所述第一对话回复所属的回复类别,确定目标概率分布,所述目标概率分布为所述先验网络所输出的概率分布中所述回复类别对应的概率分布;
    将所述第一向量输入所述先验网络,得到第二参数均值和第二参数方差;
    根据所述第二参数均值、所述第二参数方差以及第二采样值，获取第二对话特征，所述第二采样值为对所述目标概率分布采样得到。
  5. 根据权利要求1所述的方法,其特征在于,所述基于所述第一对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型,包括:
    对于所述第一对话的任一所述第一对话回复，获取所述第一对话回复对应的第一对话特征和第二对话特征；
    根据第一向量、所述第一对话回复对应的第一对话特征和第二对话特征,获取判别器损失,所述第一向量基于所述第一对话上文编码得到;
    根据所述第一向量、所述第一对话回复对应的第一对话特征和第二对话特征,获取重构损失;
    根据所述判别器损失,更新所述对话模型中后验网络和先验网络的参数;
    根据所述重构损失,更新所述对话模型中编码器、所述后验网络、所述先验网络以及解码器的参数;
    根据所述判别器损失,更新所述对话模型的判别器的参数。
  6. 根据权利要求5所述的方法,其特征在于,所述根据第一向量、所述第一对话回复对应的第一对话特征和第二对话特征,获取判别器损失,包括:
    基于所述对话模型的判别器，根据所述第一对话上文的第一向量、所述第一对话回复对应的第一对话特征和第二对话特征，获取所述第一对话回复对应的第一对话特征和第二对话特征之间的第一瓦瑟斯坦距离，将所述第一瓦瑟斯坦距离作为判别器损失。
  7. 根据权利要求5所述的方法,其特征在于,所述根据第一向量、所述第一对话回复对应的第一对话特征和第二对话特征,获取重构损失,包括:
    基于所述对话模型中的解码器,对所述第一对话特征进行解码,获取目标对话特征;
    根据所述第一向量、所述第一对话回复对应的第一对话特征和第二对话特征、所述目标对话特征，获取重构损失。
  8. 根据权利要求1所述的方法，其特征在于，所述基于所述第一对话的至少两个第一对话特征，更新所述后验网络，包括：
    对于所述第一对话的任一第一对话特征,获取所述至少两个第一对话特征中除所述第一对话特征外其他第一对话特征的平均值,将所述平均值作为平均对话特征;
    获取所述第一对话特征与所述平均对话特征之间的第二瓦瑟斯坦距离,将所述第二瓦瑟斯坦距离作为语义损失;
    根据所述语义损失,更新所述后验网络的参数。
  9. 一种对话回复生成方法,其特征在于,所述方法包括:
    获取对话上文;
    将所述对话上文输入对话模型,基于所述对话模型中的先验网络,从多个对话回复对应的第二对话特征中随机抽取一个目标对话特征;
    基于所述对话模型中的解码器对所述目标对话特征进行解码,输出目标对话回复;
    展示所述目标对话回复。
  10. 一种对话模型的训练装置,其特征在于,所述装置包括:
    特征获取模块,用于基于对话模型中的先验网络和后验网络,获取第一对话的至少两个第一对话特征和至少两个第二对话特征,所述先验网络用于输出对话特征的概率分布,所述后验网络用于估计所述先验网络所输出的对话特征的概率分布,所述第一对话特征用于表示一个对话中对话上文和一个对话回复的后验特征,所述第二对话特征用于表示一个对话中所述对话上文和一个对话回复的先验特征,所述第一对话包括一个第一对话上文和至少两个第一对话回复;
    模型更新模块,用于基于所述第一对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型;
    所述模型更新模块,还用于基于所述第一对话的至少两个第一对话特征,更新所述后验网络;
    所述模型更新模块,还用于根据第二对话的至少两个第一对话特征和至少两个第二对话特征,更新所述对话模型的判别器;
    模型获取模块,用于响应于满足训练结束条件,将训练得到的模型作为对话模型。
  11. 根据权利要求10所述的装置，其特征在于，所述特征获取模块，用于对于所述第一对话的任一所述第一对话回复，基于所述对话模型的编码器，对所述第一对话上文和所述第一对话回复分别进行编码，得到所述第一对话上文的第一向量和所述第一对话回复的第二向量；
    获取所述第一对话的至少两个第一对话特征，所述第一对话的所述第一对话特征通过所述后验网络对所述第一向量和所述第二向量进行处理得到；
    获取所述第一对话的至少两个第二对话特征，所述第一对话的所述第二对话特征通过所述先验网络对所述第一向量和所述第一对话回复所属的回复类别进行处理得到。
  12. 一种对话回复生成装置,其特征在于,所述装置包括:
    对话获取模块,用于获取对话上文;
    特征抽取模块,用于将所述对话上文输入对话模型,基于所述对话模型中的先验网络,从多个对话回复对应的第一对话特征中随机抽取一个目标对话特征;
    回复输出模块,用于基于所述对话模型中的解码器对所述目标对话特征进行解码,输出目标对话回复;
    回复展示模块,用于展示所述目标对话回复。
  13. 一种计算机设备,其特征在于,所述计算机设备包括处理器和存储器,所述存储器用于存储至少一段程序代码,所述至少一段程序代码由所述处理器加载并执行权利要求1至8所述的对话模型的训练方法,或者执行权利要求9所述的对话回复生成方法。
  14. 一种存储介质，其特征在于，所述存储介质用于存储至少一段程序代码，所述至少一段程序代码用于执行权利要求1至8任一权利要求所述的对话模型的训练方法，或者执行权利要求9所述的对话回复生成方法。
PCT/CN2021/091954 2020-05-25 2021-05-06 对话模型的训练方法、装置、计算机设备及存储介质 WO2021238599A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022538829A JP7431977B2 (ja) 2020-05-25 2021-05-06 対話モデルの訓練方法、装置、コンピュータ機器及びプログラム
US17/715,778 US20220309088A1 (en) 2020-05-25 2022-04-07 Method and apparatus for training dialog model, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010450194.0A CN111680123B (zh) 2020-05-25 2020-05-25 对话模型的训练方法、装置、计算机设备及存储介质
CN202010450194.0 2020-05-25

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/715,778 Continuation US20220309088A1 (en) 2020-05-25 2022-04-07 Method and apparatus for training dialog model, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021238599A1 true WO2021238599A1 (zh) 2021-12-02

Family

ID=72434775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091954 WO2021238599A1 (zh) 2020-05-25 2021-05-06 对话模型的训练方法、装置、计算机设备及存储介质

Country Status (4)

Country Link
US (1) US20220309088A1 (zh)
JP (1) JP7431977B2 (zh)
CN (1) CN111680123B (zh)
WO (1) WO2021238599A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680123B (zh) * 2020-05-25 2024-01-26 腾讯科技(深圳)有限公司 对话模型的训练方法、装置、计算机设备及存储介质
CN112702329B (zh) * 2020-12-21 2023-04-07 四川虹微技术有限公司 一种流量数据异常检测方法、装置和存储介质
CN113254597B (zh) * 2021-06-23 2021-09-28 腾讯科技(深圳)有限公司 模型训练方法、查询处理方法及相关设备
CN113569212B (zh) * 2021-07-30 2024-04-26 上海交通大学 基于自动编码器的击键动力学身份认证与识别方法及系统
CN117785964A (zh) * 2024-02-28 2024-03-29 宜宾市万事通网络信息服务有限公司 应用于网络服务的数据处理方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303978B1 (en) * 2018-03-26 2019-05-28 Clinc, Inc. Systems and methods for intelligently curating machine learning training data and improving machine learning model performance
CN110457457A (zh) * 2019-08-02 2019-11-15 腾讯科技(深圳)有限公司 对话生成模型的训练方法、对话生成方法及装置
CN111680123A (zh) * 2020-05-25 2020-09-18 腾讯科技(深圳)有限公司 对话模型的训练方法、装置、计算机设备及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3454260A1 (en) * 2017-09-11 2019-03-13 Tata Consultancy Services Limited Bilstm-siamese network based classifier for identifying target class of queries and providing responses thereof
CN108897797A (zh) * 2018-06-12 2018-11-27 腾讯科技(深圳)有限公司 对话模型的更新训练方法、装置、存储介质及电子设备
KR102204979B1 (ko) * 2018-08-24 2021-01-19 네이버 주식회사 딥러닝 생성모델과 다중모달 분포를 이용하여 멀티턴 대화 응답을 생성하는 방법 및 시스템
US11544524B2 (en) * 2018-09-28 2023-01-03 Samsung Electronics Co., Ltd. Electronic device and method of obtaining emotion information
CN110222164B (zh) * 2019-06-13 2022-11-29 腾讯科技(深圳)有限公司 一种问答模型训练方法、问题语句处理方法、装置及存储介质
CN111143509B (zh) * 2019-12-09 2023-06-30 天津大学 一种基于静态-动态注意力变分网络的对话生成方法

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303978B1 (en) * 2018-03-26 2019-05-28 Clinc, Inc. Systems and methods for intelligently curating machine learning training data and improving machine learning model performance
CN110457457A (zh) * 2019-08-02 2019-11-15 腾讯科技(深圳)有限公司 对话生成模型的训练方法、对话生成方法及装置
CN111680123A (zh) * 2020-05-25 2020-09-18 腾讯科技(深圳)有限公司 对话模型的训练方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN111680123B (zh) 2024-01-26
CN111680123A (zh) 2020-09-18
US20220309088A1 (en) 2022-09-29
JP2023508062A (ja) 2023-02-28
JP7431977B2 (ja) 2024-02-15

Similar Documents

Publication Publication Date Title
WO2021238599A1 (zh) 对话模型的训练方法、装置、计算机设备及存储介质
WO2021135577A9 (zh) 音频信号处理方法、装置、电子设备及存储介质
CN110097019B (zh) 字符识别方法、装置、计算机设备以及存储介质
US20220172737A1 (en) Speech signal processing method and speech separation method
CN110263131B (zh) 回复信息生成方法、装置及存储介质
CN110147533B (zh) 编码方法、装置、设备及存储介质
CN110162604B (zh) 语句生成方法、装置、设备及存储介质
CN111104980B (zh) 确定分类结果的方法、装置、设备及存储介质
CN112733970B (zh) 图像分类模型处理方法、图像分类方法及装置
CN112069309A (zh) 信息获取方法、装置、计算机设备及存储介质
CN111368525A (zh) 信息搜索方法、装置、设备及存储介质
CN111324699A (zh) 语义匹配的方法、装置、电子设备及存储介质
CN111581958A (zh) 对话状态确定方法、装置、计算机设备及存储介质
WO2020151685A1 (zh) 编码方法、装置、设备及存储介质
CN111753498A (zh) 文本处理方法、装置、设备及存储介质
CN110555102A (zh) 媒体标题识别方法、装置及存储介质
WO2022193973A1 (zh) 图像处理方法、装置、电子设备、计算机可读存储介质及计算机程序产品
CN117454954A (zh) 模型训练方法、装置、计算机设备及存储介质
CN110990549B (zh) 获取答案的方法、装置、电子设备及存储介质
CN112287070A (zh) 词语的上下位关系确定方法、装置、计算机设备及介质
CN111341307A (zh) 语音识别方法、装置、电子设备及存储介质
CN113836946B (zh) 训练评分模型的方法、装置、终端及存储介质
CN111310701B (zh) 手势识别方法、装置、设备及存储介质
CN112988984B (zh) 特征获取方法、装置、计算机设备及存储介质
CN114328815A (zh) 文本映射模型的处理方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21813897

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022538829

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 170423)

122 Ep: pct application non-entry in european phase

Ref document number: 21813897

Country of ref document: EP

Kind code of ref document: A1