CN117633198A - Training method of role dialogue model, dialogue generation method, device and equipment - Google Patents


Info

Publication number
CN117633198A
Authority
CN
China
Prior art keywords
dialogue
model
sample
reply
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311342803.0A
Other languages
Chinese (zh)
Inventor
陈春全 (Chen Chunquan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311342803.0A priority Critical patent/CN117633198A/en
Publication of CN117633198A publication Critical patent/CN117633198A/en
Pending legal-status Critical Current

Links

Classifications

    • Y – GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 – TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D – CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 – Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a training method, a dialogue generation method, a device and equipment for a character dialogue model, and belongs to the technical field of artificial intelligence. The method comprises the following steps: acquiring an initial dialogue model based on a pre-trained dialogue model, wherein the initial dialogue model comprises character parameters and the model parameters of the pre-trained dialogue model, and the character parameters are used for representing the dialogue style of a target character; acquiring a dialogue sample pair of the target character, wherein the dialogue sample pair comprises a question sample feature of a question sample sentence and a reply sample feature of a reply sample sentence, the reply sample sentence being a reply sentence of the target character; inputting the question sample feature into the initial dialogue model to obtain a predicted reply feature; and adjusting the character parameters in the initial dialogue model based on the reply sample feature and the predicted reply feature to obtain a character dialogue model of the target character, the character dialogue model being used to generate reply sentences having the dialogue style of the target character. The method improves the training efficiency of the character dialogue model.

Description

Training method of role dialogue model, dialogue generation method, device and equipment
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a training method for a character dialogue model, a dialogue generation method, a device and equipment.
Background
With the development of artificial intelligence, man-machine interaction has become a widely used technology. In the related art, a dialogue model is generally trained on dialogue corpora from various sources, and man-machine interaction is carried out with the trained dialogue model. However, because the dialogue corpora come from a wide range of sources and are highly varied, the trained dialogue model is only general-purpose and lacks personalized characteristics. Therefore, a training method for a character dialogue model is needed to obtain a dialogue model with character characteristics.
Disclosure of Invention
The embodiments of the present application provide a training method, a dialogue generation method, a device and equipment for a character dialogue model, which improve the training efficiency of the character dialogue model. The technical solutions are as follows.
In one aspect, a method for training a character conversation model is provided, the method comprising:
acquiring an initial dialogue model based on a pre-trained dialogue model, wherein the pre-trained dialogue model is trained on a plurality of groups of dialogue sample pairs, the initial dialogue model comprises character parameters and the model parameters of the pre-trained dialogue model, and the character parameters are used for representing the dialogue style of a target character;
acquiring a dialogue sample pair of the target character, wherein the dialogue sample pair of the target character comprises a question sample feature of a question sample sentence and a reply sample feature of a reply sample sentence, the reply sample sentence being a reply sentence of the target character;
inputting the question sample feature into the initial dialogue model to obtain a predicted reply feature;
adjusting the character parameters in the initial dialogue model based on the reply sample feature and the predicted reply feature to obtain a character dialogue model of the target character, wherein the character dialogue model is used for generating reply sentences having the dialogue style of the target character.
In another aspect, a method for generating a dialogue is provided, the method comprising:
acquiring a question sentence feature;
inputting the question sentence feature into a character dialogue model to obtain a reply sentence feature corresponding to the question sentence feature, wherein the character dialogue model is trained based on reply sentences of a target character, the reply sentence feature is used for indicating a reply sentence corresponding to the question sentence, and the dialogue style of the reply sentence corresponding to the question sentence belongs to the dialogue style of the target character.
In some embodiments, the inputting the question sentence feature into the character dialogue model to obtain a reply sentence feature corresponding to the question sentence feature includes:
inputting the question sentence feature into the character dialogue model;
and splicing the question sentence characteristics and the parameter values of the character parameters in the character dialogue model through the character dialogue model to obtain splicing characteristics, and obtaining the reply sentence characteristics based on the splicing characteristics.
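The splicing step above can be sketched as a concatenation of learnable character-parameter vectors with the per-word sentence features, in the style of prefix tuning. All sizes and names below are illustrative assumptions, not values from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8   # hidden size (illustrative)
n_prefix = 4  # number of character-parameter vectors (illustrative)
seq_len = 5   # number of words in the question sentence

# Character parameter: one learnable vector per prefix position.
character_params = rng.normal(size=(n_prefix, d_model))

# Question sentence feature: one vector per word.
question_feature = rng.normal(size=(seq_len, d_model))

# Splicing = concatenation along the sequence dimension.
spliced_feature = np.concatenate([character_params, question_feature], axis=0)

print(spliced_feature.shape)  # (9, 8)
```

The spliced feature then flows through the attention layer, so every word can attend to the character-parameter positions as if they were extra context tokens.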
In some embodiments, the obtaining the reply sentence feature based on the splicing feature includes:
based on key weight parameters in the character dialogue model, linearly mapping the spliced feature to obtain a key feature, wherein the key feature is used for indicating key information of a plurality of words in the question sentence;
based on value weight parameters in the character dialogue model, linearly mapping the spliced feature to obtain a value feature, wherein the value feature is used for indicating the importance of each of a plurality of words in the question sentence;
the reply sentence feature is obtained based on the key feature and the value feature.
In some embodiments, the deriving the reply sentence feature based on the key feature and the value feature comprises:
obtaining an attention score matrix based on the key feature, wherein the attention score matrix is used for indicating the similarity between each word in the question sentence and the current parameter value of the character parameter, and the similarity between each word and the other words in the question sentence;
normalizing the attention score matrix to obtain attention weights;
weighting and summing the value features based on the attention weights to obtain an attention output feature;
and performing nonlinear conversion on the attention output feature to obtain the reply sentence feature.
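A minimal sketch of this attention pipeline: linear key/value mappings of the spliced feature, a score matrix, softmax normalization, a weighted sum over the value features, and a final nonlinear conversion. It assumes standard scaled dot-product attention and illustrative dimensions; every name and weight here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, seq_len = 8, 6  # illustrative sizes

x = rng.normal(size=(seq_len, d_model))  # spliced feature (prefix + words)

# Weight parameters of the attention layer (randomly initialised for the sketch).
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))  # key weight parameter
W_v = rng.normal(size=(d_model, d_model))  # value weight parameter

Q, K, V = x @ W_q, x @ W_k, x @ W_v  # linear mappings

# Attention score matrix built from the key features.
scores = Q @ K.T / np.sqrt(d_model)

# Normalisation -> attention weights (numerically stable softmax per row).
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Weighted sum over the value features -> attention output feature.
attn_out = weights @ V

# Nonlinear conversion (a simple tanh feed-forward layer stands in here).
W_ff = rng.normal(size=(d_model, d_model))
reply_feature = np.tanh(attn_out @ W_ff)

print(reply_feature.shape)  # (6, 8)
```

Each row of `weights` sums to one, so the output at each position is a convex combination of the value features.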
In some embodiments, the attention score matrix includes a first sub-matrix for indicating a similarity between each word in the question sentence and a current parameter value of the character parameter, and a second sub-matrix for indicating a similarity between each word in the question sentence and other words in the question sentence;
the normalizing processing is performed on the attention score matrix to obtain attention weights, including:
determining the product of the first submatrix and a gating factor to obtain a third submatrix, wherein the gating factor is used for controlling the influence degree of the role parameter on the second submatrix;
normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight;
and combining the first sub-weight and the second sub-weight to obtain the attention weight.
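The gated normalization described above can be illustrated as follows, assuming a scalar gating factor and softmax normalization; splitting the score columns into a character-parameter block and a word block is one plausible reading of the first and second sub-matrices, not a detail stated in the claims:

```python
import numpy as np

rng = np.random.default_rng(2)
n_prefix, seq_len = 3, 5  # character-parameter columns vs word columns

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Raw attention scores of each word against all columns.
scores = rng.normal(size=(seq_len, n_prefix + seq_len))
first_sub = scores[:, :n_prefix]   # each word vs the character-parameter values
second_sub = scores[:, n_prefix:]  # each word vs the other words

gate = 0.5                 # gating factor (a scalar here, for simplicity)
third_sub = gate * first_sub  # scale the character-parameter scores

# Normalise each sub-matrix separately ...
first_w = softmax(third_sub)    # first sub-weight
second_w = softmax(second_sub)  # second sub-weight

# ... then combine into one attention weight matrix.
attention_weight = np.concatenate([first_w, second_w], axis=1)

print(attention_weight.shape)  # (5, 8)
```

The gate shrinks the character-parameter scores before normalization, which controls how strongly the character parameter pulls attention away from the word-to-word block.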
In some embodiments, the character dialogue model includes a plurality of attention layers connected in sequence, each of the plurality of attention layers including a character parameter; inputting the question sentence feature into the character dialogue model to obtain the reply sentence feature corresponding to the question sentence feature comprises the following steps:
inputting the question sentence characteristics into a first attention layer, splicing the question sentence characteristics and the parameter values of character parameters in the first attention layer through the first attention layer to obtain splicing characteristics, and obtaining attention output characteristics based on the splicing characteristics;
sequentially inputting the attention output characteristics into other attention layers, and obtaining the attention output characteristics output by the other attention layers based on the parameter values of the character parameters in the other attention layers;
and carrying out nonlinear conversion on the attention output characteristic output by the last attention layer to obtain the reply sentence characteristic.
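A toy sketch of this layer-by-layer flow, assuming each attention layer splices its own character parameter onto its input, runs self-attention, and passes only the word positions on to the next layer. This is a simplification under invented names and sizes; the application does not specify the layer internals at this level of detail:

```python
import numpy as np

rng = np.random.default_rng(3)
d_model, seq_len, n_prefix, n_layers = 8, 4, 2, 3

def attention_layer(x, char_param, W):
    """One attention layer: splice the layer's character parameter, attend, return word positions."""
    h = np.concatenate([char_param, x], axis=0)       # splice prefix + input
    scores = (h @ W) @ h.T / np.sqrt(d_model)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # normalise
    out = w @ h                                       # weighted sum
    return out[char_param.shape[0]:]                  # keep word positions only

x = rng.normal(size=(seq_len, d_model))  # question sentence feature

# Each layer carries its own character parameter and weight matrix.
layers = [(rng.normal(size=(n_prefix, d_model)),
           rng.normal(size=(d_model, d_model)))
          for _ in range(n_layers)]

h = x
for char_param, W in layers:  # feed the attention layers in sequence
    h = attention_layer(h, char_param, W)

# Nonlinear conversion of the last layer's attention output feature.
reply_feature = np.tanh(h @ rng.normal(size=(d_model, d_model)))
print(reply_feature.shape)  # (4, 8)
```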
In another aspect, there is provided a training apparatus for a character conversation model, the apparatus comprising:
the acquisition module is used for acquiring an initial dialogue model based on a pre-training dialogue model, wherein the pre-training dialogue model is obtained by training based on a plurality of groups of dialogue samples, the initial dialogue model comprises role parameters and model parameters of the pre-training dialogue model, and the role parameters are used for representing the dialogue style of a target role;
the acquisition module is further used for acquiring a dialogue sample pair of a target character, wherein the dialogue sample pair of the target character comprises a question sample feature of a question sample sentence and a reply sample feature of a reply sample sentence, and the reply sample sentence is a reply sentence of the target character;
the input-output module is used for inputting the question sample characteristics into the initial dialogue model to obtain predicted reply characteristics;
and the adjusting module is used for adjusting the role parameters in the initial dialogue model based on the response sample characteristics and the predicted response characteristics so as to obtain a role dialogue model of the target role, wherein the role dialogue model is used for generating a response sentence with the dialogue style of the target role.
In some embodiments, the input-output module is configured to:
inputting the question sample feature into the initial dialogue model;
and splicing the question sample characteristics and the current parameter values of the role parameters through the initial dialogue model to obtain sample splicing characteristics, and obtaining the prediction reply characteristics based on the sample splicing characteristics.
In some embodiments, the input-output module is configured to:
based on key weight parameters in the initial dialogue model, linearly mapping the sample spliced feature to obtain a key feature, wherein the key feature is used for indicating key information of a plurality of words in the question sample sentence;
based on value weight parameters in the initial dialogue model, linearly mapping the sample spliced feature to obtain a value feature, wherein the value feature is used for indicating the importance of each of a plurality of words in the question sample sentence;
the predicted reply feature is derived based on the key feature and the value feature.
In some embodiments, the input-output module is configured to:
obtaining an attention score matrix based on the key feature, wherein the attention score matrix is used for indicating the similarity between each word in the question sample sentence and the current parameter value of the character parameter, and the similarity between each word and the other words in the question sample sentence;
normalizing the attention score matrix to obtain attention weights;
weighting and summing the value features based on the attention weights to obtain an attention output feature;
and performing nonlinear conversion on the attention output feature to obtain the predicted reply feature.
In some embodiments, the attention score matrix includes a first sub-matrix for indicating a similarity between each word in the question sample sentence and a current parameter value of the character parameter and a second sub-matrix for indicating a similarity between each word in the question sample sentence and other words in the question sample sentence; the input/output module is used for:
determining the product of the first submatrix and a gating factor to obtain a third submatrix, wherein the gating factor is used for controlling the influence degree of the role parameter on the second submatrix;
normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight;
and combining the first sub-weight and the second sub-weight to obtain the attention weight.
In some embodiments, the initial dialog model comprises a plurality of attention layers connected in sequence, each of the plurality of attention layers comprising a character parameter; the input/output module is used for:
inputting the question sample characteristics into a first attention layer, splicing the question sample characteristics and the current parameter values of character parameters in the first attention layer through the first attention layer to obtain sample splicing characteristics, and obtaining attention output characteristics based on the sample splicing characteristics;
sequentially inputting the attention output characteristics into other attention layers, and obtaining the attention output characteristics output by the other attention layers based on the current parameter values of the character parameters in the other attention layers;
and performing nonlinear conversion on the attention output feature output by the last attention layer to obtain the predicted reply feature.
In some embodiments, the acquiring module is configured to:
obtaining dialogue sentences between a target character and a plurality of dialogue objects from a corpus to which the target character belongs;
acquiring at least two dialogue sentences from the dialogue sentences, wherein the at least two dialogue sentences comprise dialogue sentences of the target character and dialogue sentences of at least one dialogue object, and the ending sentence of the at least two dialogue sentences is a dialogue sentence of the target character;
and taking the ending sentence as the reply sample sentence, taking the dialogue sentences other than the ending sentence in the at least two dialogue sentences as the question sample sentence, and obtaining the question sample feature and the reply sample feature based on the question sample sentence and the reply sample sentence, respectively.
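The sample-pair construction can be illustrated with a toy corpus. The speaker names and sentences below are invented purely for illustration; each window of turns that ends with a sentence spoken by the target role yields one (question sample, reply sample) pair:

```python
# Each turn is (speaker, sentence); "wizard" plays the target role (hypothetical names).
dialogue = [
    ("user", "Where does the road lead?"),
    ("wizard", "To the mountains, if you dare."),
    ("user", "Is it dangerous?"),
    ("wizard", "Only for those who travel without a lantern."),
]

target_role = "wizard"

samples = []
# Take windows that end with a sentence spoken by the target role.
for i, (speaker, sentence) in enumerate(dialogue):
    if speaker == target_role and i > 0:
        question_sample = [s for _, s in dialogue[:i]]  # preceding turns
        reply_sample = sentence                         # ending sentence of the role
        samples.append((question_sample, reply_sample))

print(len(samples))  # 2
```

Note that later pairs carry more context: the second pair's question sample includes the target role's own earlier reply, which is what lets the model learn a consistent style across turns.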
In another aspect, a dialog generation device is provided, the device comprising:
the acquisition module is used for acquiring the characteristics of the question sentences;
the input-output module is used for inputting the question sentence feature into a character dialogue model to obtain a reply sentence feature corresponding to the question sentence feature, wherein the character dialogue model is trained based on reply sentences of a target character, the reply sentence feature is used for indicating a reply sentence corresponding to the question sentence, and the dialogue style of the reply sentence corresponding to the question sentence belongs to the dialogue style of the target character.
In some embodiments, the input-output module is configured to:
inputting the question sentence feature into the character dialogue model;
and splicing the question sentence characteristics and the parameter values of the character parameters in the character dialogue model through the character dialogue model to obtain splicing characteristics, and obtaining the reply sentence characteristics based on the splicing characteristics.
In some embodiments, the input-output module is configured to:
based on key weight parameters in the character dialogue model, carrying out linear mapping on the spliced features to obtain key features, wherein the key features are used for indicating key information of a plurality of words in the question sentence;
based on the value weight parameters in the role dialogue model, carrying out linear mapping on the spliced features to obtain value features, wherein the value features are used for indicating the importance degrees of a plurality of words in the question sentence;
the reply sentence feature is obtained based on the key feature and the value feature.
In some embodiments, the input-output module is configured to:
obtaining an attention score matrix based on the key feature, wherein the attention score matrix is used for indicating the similarity between each word in the question sentence and the parameter value of the character parameter, and the similarity between each word and the other words in the question sentence;
normalizing the attention score matrix to obtain attention weights;
weighting and summing the value features based on the attention weights to obtain an attention output feature;
and carrying out nonlinear conversion on the attention output characteristics to obtain the reply sentence characteristics.
In some embodiments, the attention score matrix includes a first sub-matrix for indicating a similarity between each word in the question sentence and the parameter value of the character parameter, and a second sub-matrix for indicating a similarity between each word in the question sentence and other words in the question sentence; the input/output module is used for:
determining the product of the first submatrix and a gating factor to obtain a third submatrix, wherein the gating factor is used for controlling the influence degree of the role parameter on the second submatrix;
normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight;
and combining the first sub-weight and the second sub-weight to obtain the attention weight.
In some embodiments, the character dialog model includes a plurality of attention layers connected in sequence, each of the plurality of attention layers including a character parameter; the input/output module is used for:
inputting the question sentence characteristics into a first attention layer, splicing the question sentence characteristics and the current parameter values of character parameters in the first attention layer through the first attention layer to obtain splicing characteristics, and obtaining attention output characteristics based on the splicing characteristics;
sequentially inputting the attention output features into the other attention layers, and obtaining the attention output features output by the other attention layers based on the current parameter values of the character parameters in the other attention layers;
and carrying out nonlinear conversion on the attention output characteristic output by the last attention layer to obtain the reply sentence characteristic.
In another aspect, a computer device is provided that includes a processor and a memory for storing at least one program that is loaded and executed by the processor to implement a training method or a dialog generation method for a character dialog model in embodiments of the present application.
In another aspect, a computer readable storage medium having at least one program stored therein is provided, the at least one program loaded and executed by a processor to implement a training method or a dialog generation method for a character dialog model in an embodiment of the present application.
In another aspect, a computer program product is provided, the computer program product including at least one program stored in a computer readable storage medium, the at least one program being read from the computer readable storage medium by a processor of a computer device, the processor executing the at least one program causing the computer device to perform the training method or the dialog generation method of a character dialog model as described in any of the above implementations.
The embodiments of the present application provide a training method for a character dialogue model, which introduces a new character parameter on top of a trained pre-trained dialogue model to obtain an initial dialogue model, and then trains the character parameter in the initial dialogue model on dialogue sample pairs of a target character. The character parameter in the trained character dialogue model thereby accurately represents the dialogue style of the target character, so the character dialogue model can generate reply sentences having that dialogue style. Because the newly introduced character parameter alone learns the dialogue style of the target character, only the character parameter needs to be trained and the other model parameters are left unchanged. On the premise of not damaging the dialogue performance of the pre-trained dialogue model, this reduces the training cost and shortens the training time, thereby improving the training efficiency of the character dialogue model.
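The efficiency claim above rests on updating only the character parameter while the pre-trained model parameters stay frozen. A toy numerical sketch of that regime follows, using a finite-difference gradient on the character parameter only; the quadratic loss, sizes, and names are illustrative and are not the application's actual training objective:

```python
import numpy as np

rng = np.random.default_rng(4)
d_model, n_prefix, seq_len = 4, 2, 3

W_frozen = rng.normal(size=(d_model, d_model))      # pre-trained model parameter (never updated)
char_param = rng.normal(size=(n_prefix, d_model))   # only this is trained

x = rng.normal(size=(seq_len, d_model))             # question sample feature
y = rng.normal(size=(n_prefix + seq_len, d_model))  # reply sample feature (toy target)

def forward(char_param):
    h = np.concatenate([char_param, x], axis=0)     # splice prefix + input
    return h @ W_frozen

def loss(char_param):
    return np.mean((forward(char_param) - y) ** 2)

lr, eps = 0.05, 1e-5
before = loss(char_param)
for _ in range(50):
    grad = np.zeros_like(char_param)
    # Numerical gradient w.r.t. the character parameter only.
    for idx in np.ndindex(*char_param.shape):
        bumped = char_param.copy()
        bumped[idx] += eps
        grad[idx] = (loss(bumped) - loss(char_param)) / eps
    char_param -= lr * grad                         # W_frozen is never touched
after = loss(char_param)
print(after < before)  # True
```

The trainable parameter count here is `n_prefix * d_model` versus `d_model * d_model` for the frozen weight, which is the source of the cost saving the passage describes: the gap grows sharply at realistic model sizes.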
Drawings
In order to describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; a person skilled in the art may obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic illustration of an implementation environment provided by embodiments of the present application;
FIG. 2 is a flow chart of a method for training a character conversation model provided in an embodiment of the present application;
FIG. 3 is a flow chart of another method of training a character conversation model provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a pre-trained dialog model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the structure of an attention layer in a pre-trained dialog model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the structure of an attention layer in an initial dialogue model according to an embodiment of the present application;
FIG. 7 is a flow chart of a training character conversation model provided in an embodiment of the present application;
FIG. 8 is a flow chart of a dialog generation method provided by an embodiment of the present application;
FIG. 9 is a block diagram of a training apparatus for a character conversation model provided in an embodiment of the present application;
FIG. 10 is a block diagram of a dialog generating apparatus provided in an embodiment of the present application;
fig. 11 is a block diagram of a terminal provided in an embodiment of the present application;
fig. 12 is a block diagram of a server provided in an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used to distinguish between identical or similar items that have substantially the same function and effect. It should be understood that there is no logical or chronological dependency among "first," "second," and "nth," nor do these terms limit the number or order of execution.
The term "at least one" in this application means one or more, and the meaning of "a plurality of" means two or more.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals referred to in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data complies with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the dialogue sample pairs referred to in this application are all acquired with sufficient authorization.
The following describes the terms of art to which the present application relates:
artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision. The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include, for example, sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. The pre-training model is also called a large model and a basic model, and can be widely applied to all large-direction downstream tasks of artificial intelligence after fine adjustment. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Machine learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. The pre-training model is the latest development of deep learning and integrates these techniques.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing involves natural language, namely the language people use in daily life, so it is closely related to the study of linguistics; it also involves important model-training techniques in computer science, mathematics, and artificial intelligence, and the pre-training model developed from the large language model (Large Language Model) in the NLP field. Through fine-tuning, a large language model can be widely applied to downstream tasks. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graph techniques, and the like.
The pre-training model (PTM), also called a foundation model or large model, refers to a deep neural network (Deep Neural Network, DNN) with a large number of parameters, trained on massive unlabeled data. The function-approximation capability of the large-parameter DNN enables the PTM to extract common features from the data, and the PTM is adapted to downstream tasks through techniques such as fine-tuning, parameter-efficient fine-tuning (PEFT), and prompt-tuning. Therefore, the pre-training model can achieve good results in small-sample (Few-shot) or zero-sample (Zero-shot) scenarios. PTMs can be classified according to the data modality they process into language models (ELMo, BERT, GPT), visual models (Swin-Transformer, ViT, V-MoE), speech models (VALL-E), multi-modal models (ViLBERT, CLIP, Flamingo, Gato), and so on, where a multi-modal model refers to a model that builds a representation of the features of two or more data modalities. The pre-training model is an important tool for producing artificial intelligence generated content (AIGC) and can also serve as a general interface for connecting multiple specific task models.
The following describes an implementation environment related to the present application:
the training method of the character dialogue model provided by the embodiment of the application can be executed by computer equipment, and the computer equipment can be provided as a server or a terminal. An environment diagram of the training method of the character dialogue model according to the embodiment of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment provided in an embodiment of the present application, where the implementation environment includes a terminal 101 and a server 102, and the implementation environment may be applied to a training method of a role dialogue model, and may also be applied to a dialogue generation method. The terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, which is not limited herein. In some embodiments, the server 102 is configured to train a character dialog model, and the character dialog model of the target character is configured to output reply sentences having the dialog style of the target character. The terminal 101 has installed thereon a target application for conducting a man-machine conversation. In some embodiments, the terminal 101 embeds a character dialogue model of the target character obtained by training, and the terminal 101 implements man-machine dialogue through the character dialogue model and outputs a reply sentence having a dialogue style of the target character. In other embodiments, terminal 101 outputs a reply sentence having a conversational style of the target character through the character conversational model on server 102.
In some embodiments, the terminal 101 may be, but is not limited to, a smart phone, tablet computer, notebook computer, desktop computer, smart voice interaction device, smart home appliance, vehicle-mounted terminal, aircraft, VR (Virtual Reality) device, AR (Augmented Reality) device, and the like. In some embodiments, the server 102 is a stand-alone server, a server cluster or distributed system formed by multiple servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. In some embodiments, the server 102 primarily takes on computing work and the terminal 101 takes on secondary computing work; alternatively, the server 102 takes on secondary computing work and the terminal 101 takes on the primary computing tasks; alternatively, a distributed computing architecture is used for collaborative computing between the server 102 and the terminal 101.
Referring to fig. 2, fig. 2 is a flowchart of a training method for a character conversation model provided in an embodiment of the present application, the method being performed by a computer device, the method comprising the following steps.
201. The computer device obtains an initial dialogue model based on a pre-trained dialogue model, the pre-trained dialogue model is trained based on a plurality of groups of dialogue sample pairs, the initial dialogue model comprises character parameters and model parameters of the pre-trained dialogue model, and the character parameters are used for representing dialogue styles of target characters.
In the embodiment of the application, a plurality of groups of dialogue sample pairs are obtained from an open-domain dialogue corpus, wherein the open-domain dialogue corpus comprises dialogue corpora with a plurality of domains and multiple roles. The pre-trained dialogue model has better man-machine dialogue capability. Optionally, the initial dialog model is derived based on the introduction of character parameters in the pre-trained dialog model. The model parameters and the character parameters are respectively represented in a matrix form. The initial parameter value of the character parameter is a random initialized value.
In the embodiments of the present application, the target character includes, but is not limited to, a character in a novel, script, movie, or the like, and may be any of various characters such as female, male, child, or elderly characters; it can be set and changed as needed and is not specifically limited herein.
202. The computer equipment acquires a dialogue sample pair of the target character, wherein the dialogue sample pair of the target character comprises a dialogue sample characteristic of a dialogue sample sentence and a reply sample characteristic of a reply sample sentence, and the reply sample sentence is a reply sentence of the target character.
In the embodiments of the present application, the question sample feature and the reply sample feature are each expressed as matrices. The question sample feature is used to describe the question sample sentence, and the reply sample feature is used to describe the reply sample sentence.
The question sample feature is a sequence of vectors, each vector describing a word in the question sample sentence. Similarly, the reply sample feature is a sequence of vectors, each vector describing a word in the reply sample sentence.
In the embodiments of the present application, the question sample sentence includes at least one dialogue sentence of at least one dialogue object. Optionally, the dialogue sentences are distinguished by a separator symbol.
203. The computer device inputs the question sample features into the initial dialogue model to obtain predicted reply features.
In the embodiment of the application, the initial dialogue model is used for outputting predicted reply features based on the question sample features. The predictive reply feature is used to describe the predictive reply sentence and is a sequence of vectors, each vector in the sequence of vectors being used to describe a word.
204. The computer device adjusts character parameters in the initial dialog model based on the reply sample features and the predicted reply features to obtain a character dialog model for the target character, the character dialog model being used to generate a reply sentence having a dialog style for the target character.
In the embodiment of the application, the dialogue sample pairs of the target role are multiple groups, and the computer equipment carries out iterative training on the initial dialogue model based on the multiple groups of dialogue sample pairs until the preset requirement is met, so as to obtain the role dialogue model of the target role. Adjusting the character parameters refers to adjusting the parameter values of the character parameters.
The reaching of the preset requirement may be that a difference between the reply sample feature and the predicted reply feature is smaller than a preset difference. Further, reaching the preset requirement means that the loss value determined based on the reply sample feature and the predicted reply feature reaches convergence, or the loss value reaches a preset threshold, or the iteration number reaches a preset number of times, which is not limited in detail herein.
In the embodiments of the present application, through training, the character parameters capture and learn the style information of the target character to adapt to the target character's dialogue style, so that the character dialogue model learns the dialogue style of the target character from the target character's dialogue sample pairs, and reply sentences with the target character's dialogue style can then be generated by the character dialogue model.
The embodiments of the present application provide a training method for a character dialogue model: a new character parameter is introduced on the basis of a trained pre-trained dialogue model to obtain an initial dialogue model, and the character parameter in the initial dialogue model is trained on dialogue sample pairs of the target character, so that the character parameter in the trained character dialogue model accurately represents the dialogue style of the target character, and reply sentences with that dialogue style can be generated through the character dialogue model. Because only the newly introduced character parameter is trained to learn the dialogue style of the target character, without training the other model parameters, the method reduces the training cost and shortens the training time on the premise of not damaging the dialogue performance of the pre-trained dialogue model, improving the training efficiency of the character dialogue model.
Fig. 2 is a basic flow of the training method of the character dialogue model, and the training method of the character dialogue model based on fig. 3 will be further described below. Referring to fig. 3, fig. 3 is a flowchart of a training method for a character conversation model provided in an embodiment of the present application, the method being performed by a computer device, the method comprising the following steps.
301. The computer device obtains a plurality of groups of conversational sample pairs.
In the embodiment of the application, the computer equipment acquires dialogue corpus in the open field. Various social media platforms can be used as data sources to obtain dialogue corpus. It should be noted that, the computer device has obtained the authorized license of any platform before obtaining the dialogue corpus from the platform.
In some embodiments, the computer device performs preprocessing and data cleansing on the obtained dialogue corpus to improve data quality. Optionally, the preprocessing includes the following steps: the computer device removes interfering information such as links, HTML (Hyper Text Markup Language) tags, and advertisements from the dialogue corpus; it then normalizes the case of the corpus, removes repeated, meaningless, or low-quality dialogue corpus, and filters out dialogues involving three or more participants so that only dialogue corpus between two dialogue objects is retained, yielding multiple groups of dialogue sample pairs. In this embodiment, preprocessing the dialogue corpus makes the training data more standardized, which facilitates model training based on that data.
For example, a multi-round dialogue between the dialogue object a and the dialogue object B is described below as an example.
Dialog object a: is play of the C-language lyrics to connect with the dragon?
Dialog object B: o can be, but i generally listen to the song and do not sing.
Dialog object a: you come first.
Dialog object B: i are inconvenient at present and wait for the convenience of I.
In embodiments of the present application, a set of dialog sample pairs may include multiple rounds of dialog. Optionally, the computer device separates the multiple rounds of conversations with special symbols to separate conversational sentences of the different conversational objects.
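A minimal sketch of the cleaning and pairing steps described above (the regular expressions, the separator token "sep", and the `(speakers, utterances)` input format are illustrative assumptions, not the actual implementation):

```python
import re

SEP = "[sep]"  # assumed special symbol used to separate rounds of dialogue

def clean_utterance(text: str) -> str:
    """Remove links and HTML tags, then normalize case."""
    text = re.sub(r"https?://\S+", "", text)   # strip links
    text = re.sub(r"<[^>]+>", "", text)        # strip HTML tags
    return text.strip().lower()

def build_sample_pairs(dialogues):
    """Keep only two-party dialogues; drop duplicates and empty turns;
    join the rounds of one dialogue with the separator symbol."""
    pairs, seen = [], set()
    for speakers, utterances in dialogues:
        if len(set(speakers)) != 2:            # filter out 3+ participants
            continue
        cleaned = [clean_utterance(u) for u in utterances]
        cleaned = [u for u in cleaned if u]    # drop empty turns
        key = SEP.join(cleaned)
        if key in seen:                        # drop duplicate dialogues
            continue
        seen.add(key)
        pairs.append(key)
    return pairs
```

This keeps the corpus restricted to clean two-object dialogues, matching the filtering rules in step 301.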
302. The computer device trains a pre-trained dialog model based on the sets of dialog sample pairs.
In some embodiments, the pre-trained dialogue model adopts a Transformer structure based on the self-attention mechanism. The pre-trained dialogue model is formed by stacking multiple identical Transformer layers, each of which combines a multi-head attention mechanism with a feed-forward neural network. The Transformer structure has strong modeling capability and good scalability, and lends itself well to parallel computation.
In some embodiments, the pre-trained dialogue model employs a left-to-right unidirectional attention mechanism, i.e., each word in a dialogue sentence can only attend to the words before it, not the words after it.
In embodiments of the present application, the computer device needs to preprocess the multi-round dialogue sentences in a dialogue sample pair to convert them into a form acceptable to the pre-trained dialogue model. Optionally, the computer device adds the special symbol "bos" at the beginning of the dialogue to mark the start of the dialogue and concatenates the rounds of dialogue with the special symbol "sep". Similarly, the dialogue sentence feature and the predicted reply feature are spliced together with the special symbol "sep", and the special symbol "eos" is added at the end of the predicted reply feature to mark the end of the output. The dialogue sentences are then segmented into words and converted into indices, i.e., into matrix form, as the input of the pre-trained dialogue model. For example, the multi-round dialogue above is segmented into N tokens of the form ["bos", "play", "Cantonese", "lyrics", "chain", "?", "sep", "sure", "but", "I", "usually", "listen", "to", "songs", "don't", "sing", ".", "sep", "you", "go", "first", ".", "sep", "not", "convenient", "now", "wait", "until", "free", ".", "eos"], where N is an integer greater than 0. That is, the input data of the pre-trained dialogue model is the first N-1 tokens (all tokens except the last token "eos"), and the label data, i.e., the reply sample sentence, is the last N-1 tokens (all tokens except the first token "bos").
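The serialization and input/label shifting described above can be sketched as follows (the special-symbol names and the pre-tokenized per-round input format are assumptions for illustration):

```python
BOS, SEP, EOS = "bos", "sep", "eos"  # assumed special-symbol names

def serialize_dialogue(rounds):
    """Flatten a multi-round dialogue into one token sequence:
    bos <round 1> sep <round 2> sep ... <last round> eos."""
    tokens = [BOS]
    for round_tokens in rounds:
        tokens.extend(round_tokens)
        tokens.append(SEP)
    tokens[-1] = EOS                 # the trailing sep becomes eos
    return tokens

def shift_for_lm(tokens):
    """Next-token prediction: inputs are the first N-1 tokens,
    labels are the last N-1 tokens."""
    return tokens[:-1], tokens[1:]
```

The shifted pair is exactly the "first N-1 tokens except eos" / "last N-1 tokens except bos" split described in the text.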
For example, referring to fig. 4, fig. 4 is a schematic structural diagram of a pre-trained dialogue model according to an embodiment of the present application. The preprocessed dialogue sentences are processed through M Transformer layers (i.e., attention layers) in the pre-trained dialogue model to obtain the output result. Optionally, the question sample feature and the reply sample feature in a dialogue sample pair are spliced and then input into the pre-trained dialogue model, and the model outputs the spliced question sample feature and predicted reply feature, thereby indicating not only the predicted reply feature but also the question sample feature to which it corresponds. The output of each subsequent reply word in the predicted reply feature depends on the preceding reply words and the question sample feature, so the predicted reply feature incorporates the preceding semantics, improving its accuracy. Moreover, inputting the reply sample feature together with the question sample feature into the pre-trained dialogue model reduces the number of data inputs and facilitates adjusting the model parameters based on the difference between the reply sample feature and the predicted reply feature. In fig. 4, x0 to x1 in the input data represent the question sample feature and y0 to y1 represent the reply sample feature.
In some embodiments, the computer device employs a cross-entropy loss function to determine the loss between the predicted output and the reference output, i.e., the loss between the reply sample feature and the predicted reply feature, and updates the model parameters of the pre-trained dialogue model by minimizing this loss. On the basis of a massive open-domain dialogue corpus, a general dialogue model with the Transformer structure is trained; owing to the large volume of the open-domain corpus, the trained general dialogue model has good man-machine interaction and dialogue capability and can generate fluent reply sentences that fit the context. However, since the pre-trained dialogue model is a general, unconditional language model, it cannot be guaranteed that the generated reply content conforms to the dialogue style and behavior pattern of a customized character. Therefore, with the method provided by the embodiments of the present application, further training is performed on the basis of the pre-trained dialogue model to obtain a dialogue model conforming to the dialogue style of the customized character.
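A minimal illustration of the cross-entropy objective between the predicted next-token distributions and the reference tokens (simplified here to plain probabilities rather than logits):

```python
import math

def cross_entropy(pred_probs, target_ids):
    """Average negative log-likelihood of the target tokens under the
    model's predicted next-token probability distributions."""
    total = 0.0
    for probs, tid in zip(pred_probs, target_ids):
        total += -math.log(probs[tid])   # penalize low probability on the reference token
    return total / len(target_ids)
```

Minimizing this quantity over the dialogue sample pairs drives the model's predicted replies toward the reference replies.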
Wherein, the calculation process of the self-attention mechanism of the attention layer in the pre-training dialogue model is shown in the following formulas (1) - (3).
Q = x·W_Q; K = x·W_K; V = x·W_V (1);
S = Q·K^T / √d_k (2);
x_out = softmax(S)·V·W_O (3);
Where x represents the input of the pre-trained dialogue model; further, if the pre-trained dialogue model includes multiple Transformer layers, x represents the input of the current Transformer layer, obtained from the output of the previous Transformer layer. R represents the set of real numbers, N represents the length of the input sequence, and h represents the hidden dimension. W_Q, W_K, W_V, and W_O are weight parameters among the model parameters: W_Q is the query weight parameter, W_K the key weight parameter, and W_V the value weight parameter. Q represents the query feature, used to calculate the associations or relationships between the current word and other words. K represents the key feature, used to indicate the key information of the words in the question sample sentence. V represents the value feature, used to indicate the importance of the words in the question sample sentence. T denotes the transpose, and d_k denotes the matrix dimension of K. softmax denotes the normalization function. W_O represents the weight parameter of the transformation matrix, used to extract semantic features. x_out represents the attention output feature, and S represents the attention score matrix.
For example, referring to fig. 5, fig. 5 is a schematic structural diagram of an attention layer in a pre-trained dialogue model according to an embodiment of the present application. The input of the Transformer layer is processed by the multi-head self-attention mechanism, passed through a normalization layer, then processed by the feed-forward neural network layer, and output after another normalization layer. In the attention layer, for each attention head, the input is linearly mapped through the query weight parameter, the key weight parameter, and the value weight parameter to obtain the query feature Q, the key feature K, and the value feature V. The query feature is multiplied by the transpose of the key feature, and the product is scaled by the matrix dimension of the key feature to obtain the attention score S; the score is normalized, and the normalized attention weight is multiplied by the value feature to obtain the result of that head. The results of the multiple heads are then spliced, and the spliced result is linearly mapped based on the weight parameter W_O of the transformation matrix to obtain the output feature of the attention layer integrating the multi-head attention.
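The single-head attention computation of formulas (1)-(3) can be sketched in NumPy as follows (shapes are assumed; multi-head splitting, normalization layers, and the feed-forward network are omitted):

```python
import numpy as np

def self_attention(x, W_Q, W_K, W_V, W_O):
    """Single-head self-attention per formulas (1)-(3):
    Q = xW_Q, K = xW_K, V = xW_V; S = QK^T / sqrt(d_k);
    x_out = softmax(S) · V · W_O."""
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V          # formula (1)
    S = Q @ K.T / np.sqrt(K.shape[-1])           # formula (2)
    A = np.exp(S - S.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)        # row-wise softmax
    return A @ V @ W_O                           # formula (3)
```

For an input of N tokens with hidden dimension h, the output keeps the (N, h) shape expected by the next Transformer layer.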
303. The computer device obtains an initial dialog model based on the pre-trained dialog model, the initial dialog model including character parameters for representing a dialog style of the target character and model parameters of the pre-trained dialog model.
In some embodiments, the initial dialog model is derived by introducing role parameters in the pre-trained dialog model.
304. The computer equipment acquires a dialogue sample pair of the target character, wherein the dialogue sample pair of the target character comprises a dialogue sample characteristic of a dialogue sample sentence and a reply sample characteristic of a reply sample sentence, and the reply sample sentence is a reply sentence of the target character.
In some embodiments, if the target character belongs to a literary work, the dialogue corpus of the target character is obtained by parsing the text of the literary work to which the target character belongs, and the dialogue sample pairs of the target character are then obtained from that corpus. Taking character C in a novel as an example, the dialogue content between character C and other dialogue objects is extracted from the novel text to construct the dialogue corpus of character C. When constructing the dialogue corpus, the dialogue objects need to be identified; since a dialogue model is customized for one target character at a time rather than for all characters simultaneously, all other objects that converse with character C are uniformly represented as "other dialogue objects". The collected raw dialogue corpus is preprocessed to obtain the dialogue corpus of the target character; the preprocessing is the same as in step 301 and is not repeated here. Based on this implementation, a vertical-domain dialogue corpus can be obtained for any customized character and then used to fine-tune the pre-trained dialogue model into a dialogue model customized for that character. For example, a dialogue corpus parsed for character C is shown below.
C: y teacher, you … … you get?
Other dialog objects: is you … … C?
C: i are C, D high school classmates, you are … ….
Other dialog objects: knowing this place, the sitting is convenient, and the user can walk around the place recently.
C: what is the Y teacher what was before you?
Other dialog objects: and, all have passed. Often, E mentions you, she says that you are … … specialized?
C: previously, i taught i a major at university, just at your school, but i had retired you from time to time.
Other dialog objects: * Professional, span is so large?
C: and D says i anyhow.
Other dialog objects: to you, monster cannot say she is clever.
C: little clever, and your daughter are not at one level.
In some embodiments, the process by which the computer device obtains a dialogue sample pair of the target character includes the following steps: the computer device obtains dialogue sentences between the target character and a plurality of dialogue objects from the corpus to which the target character belongs; it obtains at least two dialogue sentences from these, where the at least two dialogue sentences include dialogue sentences of the target character and of at least one dialogue object, and the final sentence among them is a dialogue sentence of the target character; the final sentence is taken as the reply sample sentence, the other dialogue sentences among the at least two are taken as the question sample sentence, and the question sample feature and the reply sample feature are obtained from the question sample sentence and the reply sample sentence respectively.
The corpus to which the target role belongs comprises a plurality of dialogue sentences of the target role and dialogue sentences of a plurality of dialogue objects, and the dialogue corpus comprises the dialogue sentences between the target role and the dialogue objects. For example, the target character is a character of a literary work, and the corpus to which the target character belongs includes dialogue sentences of the characters in the literary work.
The computer equipment performs word segmentation, indexing processing and the like on the question sample sentences to convert the dialogue sentences into a matrix form, further obtain question sample characteristics, and performs the same processing on the reply sample sentences to obtain reply sample characteristics.
In this embodiment, at least two dialogue sentences of the target role and other dialogue objects are obtained, and the at least two dialogue sentences end with the dialogue sentence of the target role, so that the end sentence in the at least two dialogue sentences can be directly used as a reply sample sentence, and the other dialogue sentences are used as dialogue sample sentences, thereby improving the efficiency of obtaining the dialogue sample pair.
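A rough sketch of building sample pairs from a turn-labeled corpus (the `(speaker, sentence)` input format is an assumption; here every preceding turn is used as context, whereas the embodiment may use a shorter window):

```python
def extract_sample_pairs(turns, target="C"):
    """From a sequence of (speaker, sentence) turns, take every turn
    spoken by the target character as a reply sample sentence and the
    preceding turns as the question sample sentence."""
    pairs = []
    for i, (speaker, sentence) in enumerate(turns):
        if speaker == target and i > 0:
            context = [s for _, s in turns[:i]]  # dialogue sample sentences
            pairs.append((context, sentence))    # (question, reply) pair
    return pairs
```

Each pair ends with a sentence of the target character, so the final sentence can be used directly as the reply sample sentence.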
305. The computer equipment inputs the question sample characteristics into an initial dialogue model, and the sample splicing characteristics are obtained by splicing the question sample characteristics and the current parameter values of the role parameters through the initial dialogue model.
In the embodiments of the present application, the question sample feature and the current parameter value are each matrices; their hidden dimensions are the same, while their sequence lengths may be the same or different, where the hidden dimension is the dimension of each feature vector in the question sample feature and the current parameter value. For example, the current parameter value of the character parameter is expressed as P ∈ R^(l×h), where l represents the sequence length of the character parameter and can be set and changed as needed, for example l = 20. The question sample feature is expressed as x ∈ R^(N×h); the hidden dimension h of the two is the same.
In some embodiments, the computer device inputs the question sample features into the initial dialogue model separately to ensure that the output data is a reply to the question sample features, ensuring a match with the question sample features, avoiding confusion.
In other embodiments, the computer device enters the challenge sample feature with the reply sample feature into the initial dialog model, reducing the number of data inputs to the initial dialog model, i.e., reducing the processing pressure of the initial dialog model, not only allowing the model to learn more knowledge, but also facilitating adjustment of the character parameters based on the gap between the reply sample feature and the predicted reply feature.
Optionally, the computer device splices the question sample feature and the reply sample feature with the current parameter value of the character parameter to obtain a sample splicing feature.
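The splicing of the character parameter with the question sample feature amounts to a simple row-wise concatenation, sketched below with assumed sizes (l = 20, N = 6, h = 16 are illustrative):

```python
import numpy as np

l, N, h = 20, 6, 16                                 # assumed prefix length, sequence length, hidden dim
P = np.random.default_rng(1).normal(size=(l, h))    # randomly initialized character parameter
x = np.random.default_rng(2).normal(size=(N, h))    # question sample feature

spliced = np.concatenate([P, x], axis=0)            # sample splicing feature [P; x]
```

The two inputs share the hidden dimension h, so only the sequence dimension grows from N to l + N.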
306. The computer equipment obtains the predicted reply feature based on the sample splicing feature through the initial dialogue model.
In some embodiments, the process of obtaining predicted reply features by the initial dialog model based on sample stitching features includes the steps of: the initial dialogue model carries out linear mapping on sample splicing characteristics based on key weight parameters in the initial dialogue model to obtain key characteristics, wherein the key characteristics are used for indicating key information of a plurality of words in an inquiry sample sentence; based on the value weight parameter in the initial dialogue model, carrying out linear mapping on the sample splicing characteristics to obtain value characteristics, wherein the value characteristics are used for indicating the importance degree of a plurality of words in the dialogue sample sentence; a predicted reply feature is derived based on the key feature and the value feature.
The initial dialogue model determines the product of the key weight parameter and the sample splicing characteristic to obtain the key characteristic, and determines the product of the value weight parameter and the sample splicing characteristic to obtain the value characteristic.
In this embodiment, the predictive reply feature is derived based on the attention mechanism, i.e. the initial dialog model automatically learns and selectively focuses on important information in the input, improving the performance and generalization ability of the initial dialog model, improving the accuracy of the predictive reply feature.
In some embodiments, the process by which the initial dialogue model obtains the predicted reply feature based on the key feature and the value feature includes the following steps: the initial dialogue model obtains an attention score matrix based on the key feature, where the attention score matrix is used to indicate the similarity between each word in the question sample sentence and the current parameter value of the character parameter, as well as the similarity between each word and the other words in the question sample sentence; the attention score matrix is normalized to obtain the attention weight; the attention weight is weighted and summed based on the value feature to obtain the attention output feature; and the attention output feature is non-linearly converted to obtain the predicted reply feature.
The computer equipment determines the product of the sample splicing characteristic and the query weight parameter in the model parameters to obtain the query characteristic. The query feature is used to match other words, i.e., to calculate associations or relationships between the current word and other words. The computer device determines a product of the query feature and a transpose of the key feature, determines a quotient of the product and a square root of a matrix dimension of the key feature, and obtains an attention score matrix. And then, carrying out normalization processing on the attention score matrix based on a normalization function softmax to obtain the attention weight. The attention weights are then weighted summed based on the value signature to obtain an attention output signature. Further, after weighted summation of the attention weights based on the value characteristics, the initial dialogue model also determines the product of the weight parameters of the conversion matrix in the model parameters and the sum, so as to obtain the attention output characteristics. The initial dialogue model may perform nonlinear conversion on the attention output feature through any one of a ReLU function, a tanh function, a Sigmoid function, and the like, which is not specifically limited herein.
Optionally, the pre-trained dialogue model performs the nonlinear conversion on the attention output feature through a linear layer, i.e., a fully connected layer. The linear layer maps the attention features output by the attention layer to a probability distribution over the vocabulary to obtain the predicted reply feature. After the attention features are mapped to the vocabulary probability distribution, the probability of each candidate word at each position of the predicted reply is obtained, and the candidate words with the highest probability at each position form the predicted reply feature.
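A sketch of mapping attention output features to the highest-probability word indices (the vocabulary projection matrix `W_vocab` is an assumed stand-in for the linear layer's weights):

```python
import numpy as np

def greedy_decode(attn_features, W_vocab):
    """Map attention output features to vocabulary logits via the linear
    layer, then take the highest-probability word index at each position."""
    logits = attn_features @ W_vocab               # (seq_len, vocab_size)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)      # softmax over the vocabulary
    return probs.argmax(axis=-1)                   # greedy choice per position
```

Because softmax is monotone, the argmax over probabilities equals the argmax over logits; the explicit normalization is kept to mirror the "probability distribution over the vocabulary" in the text.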
In some embodiments, the attention score matrix includes a first sub-matrix for indicating a similarity between each word in the question sample sentence and the current parameter value of the character parameter, and a second sub-matrix for indicating a similarity between each word in the question sample sentence and other words in the question sample sentence. The initial dialogue model normalizes the attention score matrix to obtain attention weight, and the method comprises the following steps: the initial dialogue model determines the product of the first submatrix and a gating factor to obtain a third submatrix, wherein the gating factor is used for controlling the influence degree of the character parameters on the second submatrix; normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight; and combining the first sub-weight and the second sub-weight to obtain the attention weight.
In some embodiments, the gating factor is a model parameter that can be learned in the initial dialog model, and can be adaptively adjusted to control the importance of the first submatrix. In some embodiments, the initial value of the gating factor is controlled to be no greater than a preset value to eliminate the effect of randomly initialized character parameters on the output of the dialog model during an early stage of training. For example, the initial gating factor, i.e., the gating factor during the first iteration, is 0.
In this embodiment, by using a zero-initialized attention mechanism and a gating mechanism, the newly injected, randomly initialized character parameters are prevented from damaging the semantic knowledge of the pre-trained dialogue model, so that training of the initial dialogue model is more stable, the performance of the trained character dialogue model is ensured, and the generated content conforms to the language style and behavior pattern of the target character.
Wherein the first sub-weight and the second sub-weight are combined, i.e. spliced, to obtain the attention weight. In this embodiment, the first submatrix and the second submatrix are normalized by using independent softmax functions, so that it can be determined that the second submatrix is not affected by the role parameter, and the original dialogue capability of the initial dialogue model is not destroyed, that is, the output of the initial dialogue model is ensured to be an effective dialogue sentence under the condition that the output of the initial dialogue model has the dialogue style of the target role.
The initial dialogue model obtains the predicted reply feature through the following formulas (4) - (8) based on the sample splicing feature.
Q=xW Q (4);
K=[P;x]W K =[PW K ;xW K ]=[K P ;K N ] (5);
V=[P;x]W V =[PW V ;xW V ]=[V P ;V N ] (6);
S_P = Q·K_P^T / √d_k; S_N = Q·K_N^T / √d_k (7);
x_out = [g·softmax(S_P); softmax(S_N)]·V·W_O (8);
Wherein P represents the current parameter value of the character parameter, g represents the gating factor, and [·;·] represents the splicing operation. K_P represents the product of the current parameter value of the character parameter and the key weight parameter, and K_N represents the product of the question sample feature and the key weight parameter. V_P represents the product of the current parameter value of the character parameter and the value weight parameter, and V_N represents the product of the question sample feature and the value weight parameter. S_P represents the first submatrix and S_N represents the second submatrix.
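Formulas (4)-(8) can be sketched in NumPy as follows (single-head, with multi-head splitting omitted); setting the gating factor g to zero reproduces plain self-attention, which is the zero-initialization property described above:

```python
import numpy as np

def softmax(S):
    e = np.exp(S - S.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_prefix_attention(x, P, W_Q, W_K, W_V, W_O, g):
    """Formulas (4)-(8): the character parameter P is spliced with the
    input before the key/value projections; its attention scores S_P are
    scaled by the gating factor g and normalized separately from S_N."""
    Q = x @ W_Q                                    # (4)
    K_P, K_N = P @ W_K, x @ W_K                    # (5)
    V = np.concatenate([P @ W_V, x @ W_V])         # (6): [V_P; V_N]
    d_k = K_N.shape[-1]
    S_P = Q @ K_P.T / np.sqrt(d_k)                 # (7): first submatrix
    S_N = Q @ K_N.T / np.sqrt(d_k)                 #      second submatrix
    A = np.concatenate([g * softmax(S_P), softmax(S_N)], axis=-1)
    return A @ V @ W_O                             # (8)
```

Note the two independent softmax calls: the second submatrix S_N is normalized without the character parameter, so the original dialogue capability is untouched and g alone controls the prefix's influence.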
For example, referring to fig. 6, fig. 6 is a schematic structural diagram of an attention layer in the initial dialogue model according to an embodiment of the present application. The attention layer adopts a multi-head attention mechanism: the input of the attention layer is processed by the multi-head self-attention mechanism, the output is processed by a normalization layer, passed through the feedforward neural network layer, and processed by another normalization layer before being output. Optionally, the attention layer takes the character parameters as input, processes them based on the model parameters in the self-attention mechanism to obtain the first submatrix, and then combines the result with the second submatrix to obtain the attention output feature of the attention layer.
In some embodiments, the initial dialogue model includes a plurality of attention layers connected in sequence, each of the plurality of attention layers including a character parameter. The process of the computer device inputting the question sample features into the initial dialogue model to obtain the predicted reply features includes the following steps: inputting the question sample features into the first attention layer, splicing the question sample features with the current parameter values of the character parameters in the first attention layer to obtain the sample splicing features, and obtaining the attention output features based on the sample splicing features; sequentially inputting the attention output features into the other attention layers, and obtaining the attention output features output by the other attention layers based on the current parameter values of the character parameters in those layers; and performing nonlinear conversion on the attention output features output by the last attention layer to obtain the predicted reply features. The input of each attention layer is the output of the previous attention layer.
In the embodiment of the application, the trainable role parameters are inserted into each attention layer, so that each attention layer has different learnable role parameters, different model layers are allowed to perform adaptive learning with more customized fit, and the performance of the character dialogue model obtained through training is improved.
In the embodiment of the present application, the process of inputting the question sample features into the initial dialogue model to obtain the predicted reply features is implemented through steps 305-306. In this embodiment, the current parameter values of the character parameters are spliced onto the question sample feature, so that the character parameters are fused into the input data of the initial dialogue model, and the predicted reply feature is obtained from the spliced feature. The character parameters are then adjusted based on the predicted reply feature and the reply sample feature of the target character, so that the character parameters gradually come to represent the dialogue style of the target character; that is, the initial dialogue model gradually learns the dialogue style of the target character, the output predicted reply feature gradually takes on that style, and a character dialogue model capable of generating reply sentences with the dialogue style of the target character is obtained.
It should be noted that steps 305-306 are only one alternative implementation of the process of inputting the question sample feature into the initial dialogue model to obtain the predicted reply feature; the process may also be implemented in other alternative ways. For example, the question sample feature and the current parameter values of the character parameters may each be passed through the initial dialogue model separately, and the two outputs spliced to serve as the predicted reply feature.
307. The computer device adjusts character parameters in the initial dialog model based on the reply sample features and the predicted reply features to obtain a character dialog model for the target character, the character dialog model being used to generate a reply sentence having a dialog style for the target character.
In an embodiment of the application, the computer device determines a loss between the reply sample feature and the predicted reply feature, and adjusts the character parameters in the initial dialog model based on the loss.
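The application does not fix the form of this loss; a common choice, used here purely as an illustrative assumption, is the average negative log-likelihood (cross-entropy) of the reply sample tokens under the predicted reply distributions:

```python
import math

def reply_loss(pred_probs, target_ids):
    """Average negative log-likelihood of the reply sample tokens.
    pred_probs: one probability vector per reply position (predicted reply
    feature after an output softmax); target_ids: reply sample token ids.
    This loss form is an assumption, not specified by the application."""
    nll = -sum(math.log(p[t]) for p, t in zip(pred_probs, target_ids))
    return nll / len(target_ids)

# two reply positions over a toy 3-token vocabulary
pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = reply_loss(pred, [0, 1])   # low loss: the target tokens are likely
```

Only the character parameters (and the gating factors) receive gradient updates from this loss; the pre-trained model parameters stay frozen.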
For example, referring to fig. 7, fig. 7 is a flowchart of training a character dialogue model provided by an embodiment of the present application. First, an open-domain dialogue corpus is collected and a pre-trained dialogue model is trained on it. Dialogue data for the custom character is then constructed, and a dialogue model for the custom character is trained based on the dialogue data for the custom character and the pre-trained dialogue model.
The embodiment of the application provides a training method for a character dialogue model which combines the strong language expression capability of a pre-trained dialogue model with the dialogue style and language style of a specific character, and can achieve a good training effect with little training data and few computing resources. During the training phase of the initial dialogue model, trainable character parameters are injected into each attention layer, so that each attention layer has its own learnable character parameters, allowing different model layers to perform more customized adaptive learning. The method uses a zero-initialized attention mechanism and a gating mechanism to prevent the newly injected, randomly initialized character parameters from damaging the semantic knowledge of the pre-trained dialogue model, so that the initial stage of training is more stable; this preserves the performance of the trained customized character dialogue model and ensures that the generated content conforms to the language style and behavior pattern of the customized character. In the training stage of the initial dialogue model, the model parameters of the pre-trained dialogue model are frozen and only the newly injected character parameters are trained, which greatly reduces the number of model parameters to be trained in this stage, so that the dialogue model of the customized character is obtained through lightweight fine-tuning. The method not only reduces the memory occupation of the training stage and the hardware-resource threshold, but also shortens training time, so that a dialogue model can be customized for a character more quickly; on the premise of not damaging the performance of the pre-trained dialogue model, it reduces the training cost, shortens the training time, and greatly improves the efficiency of customizing a dialogue model for a character.
The character dialogue model of the target character is trained by the embodiments of fig. 2 and 3, and a method for generating a dialogue based on the trained character dialogue model is described below based on fig. 8. Referring to fig. 8, fig. 8 is a flowchart of a dialog generating method according to an embodiment of the present application, which includes the following steps.
801. The computer device obtains the question sentence feature of the question sentence.
In the embodiment of the present application, the question sentence feature is used to describe a question sentence. The question sentence feature is a sequence of vectors, each vector in the sequence describing one word in the question sentence.
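As a toy illustration of this representation (the embedding table, words, and dimension below are hypothetical, not taken from the application), the question sentence feature is simply one embedding vector per word:

```python
# hypothetical word-embedding table; in practice the vectors come from the
# model's input embedding layer
EMBEDDING = {"hello": [0.1, 0.3], "there": [0.7, 0.2]}
DIM = 2

def question_sentence_feature(sentence):
    """Map each word of the question sentence to one vector; in this sketch,
    unknown words map to the zero vector."""
    return [EMBEDDING.get(w, [0.0] * DIM) for w in sentence.split()]

feature = question_sentence_feature("hello there friend")
# one vector per word: three words in, three vectors out
```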
802. The computer device inputs the question sentence features into the character dialogue model, and splices the question sentence features with the parameter values of the character parameters in the character dialogue model through the character dialogue model to obtain the spliced features; the character dialogue model is obtained by training based on the reply sentences of the target character.
In the embodiment of the present application, the process of splicing the question sentence feature and the parameter value of the character parameter in the character dialogue model in step 802 is the same as the process of splicing the question sample feature and the current parameter value of the character parameter in step 305, and is not described herein.
803. The computer equipment obtains the characteristic of the reply sentence based on the splicing characteristic through the role dialogue model, the characteristic of the reply sentence is used for indicating the reply sentence corresponding to the question sentence, and the dialogue style of the reply sentence belongs to the dialogue style of the target role.
In some embodiments, the process of obtaining the reply sentence feature by the character dialogue model based on the splicing feature includes the following steps: the character dialogue model carries out linear mapping on the splicing characteristics based on key weight parameters in the character dialogue model to obtain key characteristics, wherein the key characteristics are used for indicating key information of a plurality of words in a question sentence; based on the value weight parameters in the role dialogue model, carrying out linear mapping on the spliced features to obtain value features, wherein the value features are used for indicating the importance degrees of a plurality of words in the question sentence; and obtaining reply sentence characteristics based on the key characteristics and the value characteristics.
The process of obtaining the features of the reply sentence based on the splicing features in step 803 is the same as the process of obtaining the predicted reply feature based on the sample splicing features in step 306, and will not be described herein.
In some embodiments, the process of the character dialogue model obtaining the reply sentence features based on the key features and the value features includes the following steps: the character dialogue model obtains an attention score matrix based on the key features, wherein the attention score matrix is used for indicating the similarity between each word in the question sentence and the current parameter values of the character parameters, and between each word and the other words in the question sentence; normalizing the attention score matrix to obtain the attention weight; weighting and summing the value features based on the attention weight to obtain the attention output features; and performing nonlinear conversion on the attention output features to obtain the reply sentence features.
The process of obtaining the reply sentence feature based on the key feature and the value feature in step 803 is the same as the process of obtaining the predicted reply feature based on the key feature and the value feature in step 306, and will not be described herein.
In some embodiments, the attention score matrix includes a first sub-matrix for indicating a similarity between each word in the question sentence and the current parameter value of the character parameter, and a second sub-matrix for indicating a similarity between each word in the question sentence and other words in the question sentence; the character dialogue model normalizes the attention score matrix to obtain attention weight, and the method comprises the following steps: determining the product of the first submatrix and a gating factor to obtain a third submatrix, wherein the gating factor is used for controlling the influence degree of the character parameter on the second submatrix; normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight; and combining the first sub-weight and the second sub-weight to obtain the attention weight.
In step 803, the process of normalizing the attention score matrix to obtain the attention weight is the same as the process of normalizing the attention score matrix to obtain the attention weight in step 306, and is not described herein.
In some embodiments, the character dialogue model includes a plurality of attention layers connected in sequence, each of the plurality of attention layers including a character parameter; the process of the computer device inputting the question sentence features into the character dialogue model to obtain the reply sentence features includes the following steps: inputting the question sentence features into the first attention layer, splicing the question sentence features with the current parameter values of the character parameters in the first attention layer through the first attention layer to obtain the spliced features, and obtaining the attention output features based on the spliced features; sequentially inputting the attention output features into the other attention layers, and obtaining the attention output features output by the other attention layers based on the current parameter values of the character parameters in those layers; and performing nonlinear conversion on the attention output features output by the last attention layer to obtain the reply sentence features.
In step 803, the process of inputting the question sentence feature into the character dialogue model to obtain the reply sentence feature is the same as the process of inputting the question sample feature into the initial dialogue model to obtain the predicted reply feature in step 306, and will not be described here again.
In the embodiment of the present application, the process of inputting the question sentence features into the character dialogue model to obtain the corresponding reply sentence features is implemented through steps 802-803. In this embodiment, the question sentence features are spliced with the parameter values of the character parameters, and the reply sentence is obtained based on the spliced features. Because the character parameters represent the dialogue style of the target character, the character dialogue model outputs a reply sentence for the question sentence features in combination with that dialogue style, so the reply sentence can have the dialogue style of the target character, and such reply sentences can be produced conveniently.
It should be noted that, steps 802-803 are only one alternative implementation manner for implementing a process of inputting the question sentence feature into the character dialogue model to obtain the reply sentence feature corresponding to the question sentence feature, and the process may also be implemented by other alternative implementation manners, which are not limited herein specifically.
In the embodiment of the application, the character-customized dialogue model is widely applicable in products, for example in virtual assistant, intelligent customer service, and game character scenarios. A dialogue model customized for a specific character can be used to create a virtual assistant with a specific personality and dialogue style, providing personalized services and interactive experiences according to user needs. In the intelligent customer service field, a dialogue model of a specific character can provide more professional and friendly consultation and support for users. In games, a dialogue model of a specific character can give a game character richer individuality and interactivity; a player can hold natural-language dialogues with the game character, improving the immersion and fun of the game. In these application scenarios, the character-customized dialogue model can improve the user experience and increase the appeal and value of the product.
The method provided by the embodiment of the application does not require manually designed rules and answer templates; compared with a rule-based method, it greatly reduces labor cost and has good extensibility and generalization. The method enables the dialogue model to learn the language style and answer patterns of the character and to generate novel answers. It combines the powerful language representation capability of the pre-trained dialogue model with the language style of a specific character, and can achieve good results with little data and few computing resources. Compared with a generation method based on full-parameter fine-tuning, the model parameters of the pre-trained dialogue model are frozen in the training stage of the character dialogue model and only the newly injected character parameters are trained, which greatly reduces the number of model parameters to be trained in this stage and realizes character-customized dialogue through lightweight fine-tuning. This reduces the memory occupation of the fine-tuning stage and the hardware-resource threshold of the computer device, shortens the training time of the fine-tuning stage, and allows character dialogue customization to be realized more quickly; without damaging the performance of the dialogue model, it reduces the training cost, shortens the training time, and greatly improves the efficiency of character-customized dialogue.
The embodiment of the application provides a dialogue generation method which generates a reply sentence through a character dialogue model. Because the character dialogue model is trained based on the reply sentences of the target character, reply sentences with the dialogue style of the target character can be generated quickly and efficiently based on the character dialogue model.
Fig. 9 is a block diagram of a training apparatus for a character conversation model provided in accordance with an embodiment of the present application. Referring to fig. 9, the apparatus includes:
the obtaining module 901 is configured to obtain an initial dialogue model based on a pre-training dialogue model, where the pre-training dialogue model is obtained by training based on multiple groups of dialogue samples, the initial dialogue model includes a role parameter and a model parameter of the pre-training dialogue model, and the role parameter is used for representing a dialogue style of a target role;
the obtaining module 901 is further configured to obtain a dialogue sample pair of the target character, where the dialogue sample pair of the target character includes a question sample feature of a question sample sentence and a reply sample feature of a reply sample sentence, and the reply sample sentence is a reply sentence of the target character;
The input-output module 902 is configured to input the question sample feature into the initial dialogue model to obtain a predicted reply feature;
an adjustment module 903, configured to adjust character parameters in the initial dialogue model based on the reply sample feature and the predicted reply feature, so as to obtain a character dialogue model of the target character, where the character dialogue model is used to generate a reply sentence having a dialogue style of the target character.
In some embodiments, input output module 902 is configured to:
inputting the question sample characteristics into an initial dialogue model;
and splicing the question sample characteristics and the current parameter values of the role parameters through the initial dialogue model to obtain sample splicing characteristics, and obtaining predicted reply characteristics based on the sample splicing characteristics.
In some embodiments, input output module 902 is configured to:
based on key weight parameters in the initial dialogue model, carrying out linear mapping on sample splicing characteristics to obtain key characteristics, wherein the key characteristics are used for indicating key information of a plurality of words in an inquiry sample sentence;
based on the value weight parameters in the initial dialogue model, performing linear mapping on the sample splicing features to obtain the value features, wherein the value features are used for indicating the importance degrees of a plurality of words in the question sample sentence;
A predicted reply feature is derived based on the key feature and the value feature.
In some embodiments, input output module 902 is configured to:
obtaining an attention score matrix based on the key features, wherein the attention score matrix is used for indicating the similarity between each word in the question sample sentence and the current parameter values of the character parameters, and between each word and the other words in the question sample sentence;
normalizing the attention score matrix to obtain attention weight;
weighting and summing the value features based on the attention weights to obtain the attention output features;
and carrying out nonlinear conversion on the attention output characteristics to obtain predicted reply characteristics.
In some embodiments, the attention score matrix includes a first sub-matrix for indicating a similarity between each word in the question sample sentence and the current parameter value of the character parameter, and a second sub-matrix for indicating a similarity between each word in the question sample sentence and other words in the question sample sentence; an input-output module 902 for:
determining the product of the first submatrix and a gating factor to obtain a third submatrix, wherein the gating factor is used for controlling the influence degree of the character parameter on the second submatrix;
Normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight;
and combining the first sub-weight and the second sub-weight to obtain the attention weight.
In some embodiments, the initial dialog model includes a plurality of attention layers connected in sequence, each of the plurality of attention layers including a character parameter; an input-output module 902 for:
inputting the question sample features into the first attention layer, splicing the question sample features with the current parameter values of the character parameters in the first attention layer through the first attention layer to obtain the sample splicing features, and obtaining the attention output features based on the sample splicing features;
sequentially inputting the attention output characteristics into other attention layers, and obtaining the attention output characteristics output by the other attention layers based on the current parameter values of the character parameters in the other attention layers;
and performing nonlinear conversion on the attention output characteristic output by the last attention layer to obtain a predicted reply characteristic.
In some embodiments, the obtaining module 901 is configured to:
obtaining dialogue sentences between the target role and a plurality of dialogue objects from a corpus to which the target role belongs;
obtaining at least two dialogue sentences from the dialogue sentences, wherein the at least two dialogue sentences include dialogue sentences of the target character and dialogue sentences of at least one dialogue object, and the end sentence of the at least two dialogue sentences is a dialogue sentence of the target character;
and taking the end sentence as the reply sample sentence, taking the dialogue sentences other than the end sentence among the at least two dialogue sentences as the question sample sentences, and obtaining the question sample features and the reply sample features based on the question sample sentences and the reply sample sentence, respectively.
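The sampling rule implemented by the obtaining module can be sketched in a few lines (pure Python; the transcript format of (speaker, sentence) pairs and the context window length are assumptions of this example, and the speaker names are hypothetical):

```python
def build_dialogue_sample_pairs(dialogue, target_role, max_context=3):
    """Emit (question sample sentences, reply sample sentence) pairs whose
    end sentence is an utterance of the target role, with the preceding
    sentences serving as the question sample."""
    pairs = []
    for i, (speaker, sentence) in enumerate(dialogue):
        if speaker != target_role or i == 0:
            continue  # end sentence must be the target role's, with context
        context = [s for _, s in dialogue[max(0, i - max_context):i]]
        pairs.append((context, sentence))
    return pairs

transcript = [
    ("guard", "Who goes there?"),
    ("traveler", "A humble traveler."),
    ("guard", "State your business."),
]
pairs = build_dialogue_sample_pairs(transcript, "guard")
```

Here a single pair is emitted: "State your business." is the reply sample sentence, and the two earlier sentences are the question sample sentences.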
The embodiment of the application provides a training apparatus for a character dialogue model, which introduces new character parameters on the basis of a trained pre-trained dialogue model to obtain an initial dialogue model, and trains the character parameters in the initial dialogue model based on the dialogue sample pairs of the target character, so that the character parameters in the trained character dialogue model can accurately represent the dialogue style of the target character, and reply sentences with the dialogue style of the target character can then be generated through the character dialogue model. The apparatus learns the dialogue style of the target character by introducing new character parameters, so that only the character parameters need to be trained, without training the other model parameters; on the premise of not damaging the dialogue performance of the pre-trained dialogue model, this reduces the training cost, shortens the training time, and improves the training efficiency of the character dialogue model.
Fig. 10 is a block diagram of a dialog generating apparatus provided according to an embodiment of the present application. Referring to fig. 10, the apparatus includes:
an obtaining module 1001, configured to obtain a question sentence feature of a question sentence;
the input/output module 1002 is configured to input the feature of the question sentence into a role dialogue model, obtain a response sentence feature corresponding to the feature of the question sentence, where the role dialogue model is obtained based on training of the response sentence of the target role, and the response sentence feature is used to indicate a response sentence corresponding to the question sentence, and a dialogue style of the response sentence corresponding to the question sentence belongs to a dialogue style of the target role.
In some embodiments, the input output module is configured to:
inputting the question sentence features into the character dialogue model;
and splicing the question sentence characteristics and the parameter values of the character parameters in the character dialogue model through the character dialogue model to obtain spliced characteristics, and obtaining reply sentence characteristics based on the spliced characteristics.
In some embodiments, input output module 1002 is to:
based on key weight parameters in the role dialogue model, performing linear mapping on the spliced features to obtain key features, wherein the key features are used for indicating key information of a plurality of words in a question sentence;
Based on the value weight parameters in the role dialogue model, carrying out linear mapping on the spliced features to obtain value features, wherein the value features are used for indicating the importance degrees of a plurality of words in the question sentence;
and obtaining reply sentence characteristics based on the key characteristics and the value characteristics.
In some embodiments, input output module 1002 is to:
obtaining an attention score matrix based on the key features, wherein the attention score matrix is used for indicating the similarity between each word in the question sentence and the parameter values of the character parameters, and between each word and the other words in the question sentence;
normalizing the attention score matrix to obtain attention weight;
weighting and summing the value features based on the attention weights to obtain the attention output features;
and carrying out nonlinear conversion on the attention output characteristics to obtain reply sentence characteristics.
In some embodiments, the attention score matrix includes a first sub-matrix for indicating a similarity between each word in the question sentence and the parameter value of the character parameter, and a second sub-matrix for indicating a similarity between each word in the question sentence and other words in the question sentence;
an input-output module 1002 for:
Determining the product of the first submatrix and a gating factor to obtain a third submatrix, wherein the gating factor is used for controlling the influence degree of the character parameter on the second submatrix;
normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight;
and combining the first sub-weight and the second sub-weight to obtain the attention weight.
In some embodiments, the character dialog model includes a plurality of attention layers connected in sequence, each of the plurality of attention layers including a character parameter; an input-output module 1002 for:
inputting the question sentence features into the first attention layer, splicing the question sentence features with the parameter values of the character parameters in the first attention layer through the first attention layer to obtain the spliced features, and obtaining the attention output features based on the spliced features;
sequentially inputting the attention output characteristics into other attention layers, and obtaining the attention output characteristics output by the other attention layers based on the parameter values of the character parameters in the other attention layers;
and performing nonlinear conversion on the attention output characteristics output by the last attention layer to obtain reply sentence characteristics.
The embodiment of the application provides a device for generating a dialogue, which generates a reply sentence through a role dialogue model, and the role dialogue model is obtained based on the reply sentence training of a target role, so that the reply sentence with the dialogue style of the target role can be generated based on the role dialogue model, and the reply sentence with the dialogue style of the target role can be generated rapidly and efficiently through the role dialogue model.
Fig. 11 shows a block diagram of a terminal 1100 according to an exemplary embodiment of the present application.
Generally, the terminal 1100 includes: a processor 1111 and a memory 1102.
Processor 1111 may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc. The processor 1111 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 1111 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also referred to as a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1111 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed by the display screen. In some embodiments, the processor 1111 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1102 is used to store at least one program code for execution by processor 1111 to implement the training method or dialog generation method of the character dialog model provided by the method embodiments herein.
In some embodiments, the terminal 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1111, the memory 1102, and the peripheral interface 1103 can be connected by bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, a display screen 1105, a camera assembly 1106, audio circuitry 1107, and a power supply 1108.
The peripheral interface 1103 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102, and the peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1104 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1104 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display, the display screen 1105 also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 1101 as a control signal for processing. In this case, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, disposed on the front panel of the terminal 1100; in other embodiments, there may be at least two display screens 1105, disposed respectively on different surfaces of the terminal 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display disposed on a curved or folded surface of the terminal 1100. The display screen 1105 may even be arranged in a non-rectangular irregular pattern, that is, a shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, or the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1106 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1107 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1101 for processing, or input them to the radio frequency circuit 1104 for voice communication. For stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of the terminal 1100. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1107 may also include a headphone jack.
The power supply 1108 is used to power the various components in the terminal 1100. The power supply 1108 may use alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1108 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, terminal 1100 also includes one or more sensors 1109. The one or more sensors 1109 include, but are not limited to: acceleration sensor 1110, gyroscope sensor 1111, pressure sensor 1112, optical sensor 1113, and proximity sensor 1114.
The acceleration sensor 1110 may detect the magnitudes of acceleration on the three coordinate axes of a coordinate system established with the terminal 1100. For example, the acceleration sensor 1110 may be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 1101 may control the display screen 1105 to display the user interface in either a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1110. The acceleration sensor 1110 may also be used to acquire motion data of a game or of the user.
The gyroscope sensor 1111 may detect the body direction and rotation angle of the terminal 1100, and may cooperate with the acceleration sensor 1110 to collect the user's 3D motion of the terminal 1100. Based on the data collected by the gyroscope sensor 1111, the processor 1101 may implement the following functions: motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1112 may be disposed on a side frame of the terminal 1100 and/or on a lower layer of the display screen 1105. When the pressure sensor 1112 is disposed on a side frame of the terminal 1100, it may detect the user's grip signal on the terminal 1100, and the processor 1101 performs left-right hand recognition or a shortcut operation according to the grip signal collected by the pressure sensor 1112. When the pressure sensor 1112 is disposed on the lower layer of the display screen 1105, the processor 1101 controls an operability control on the UI according to the user's pressure operation on the display screen 1105. The operability control includes at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The optical sensor 1113 is used to collect the intensity of ambient light. In one embodiment, the processor 1101 may control the display brightness of the display screen 1105 according to the ambient light intensity collected by the optical sensor 1113. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the display screen 1105 is decreased. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 based on the ambient light intensity collected by the optical sensor 1113.
The proximity sensor 1114, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 1100. The proximity sensor 1114 is used to collect the distance between the user and the front face of the terminal 1100. In one embodiment, when the proximity sensor 1114 detects that the distance between the user and the front face of the terminal 1100 gradually decreases, the processor 1101 controls the display screen 1105 to switch from the screen-on state to the screen-off state; when the proximity sensor 1114 detects that the distance gradually increases, the processor 1101 controls the display screen 1105 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the structure shown in Fig. 11 is not limiting, and the terminal 1100 may include more or fewer components than shown, may combine certain components, or may employ a different arrangement of components.
Fig. 12 is a schematic structural diagram of a server provided according to an embodiment of the present application. The server 1200 may vary considerably in configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 1201 and one or more memories 1202. The memory 1202 is used to store executable program code, and the processor 1201 is configured to execute the executable program code to implement the training method of the character dialogue model or the dialogue generation method provided by the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described here.
An embodiment of the application also provides a computer-readable storage medium in which at least one program is stored; the at least one program is loaded and executed by a processor to implement the training method of the character dialogue model or the dialogue generation method in any of the above implementations.
An embodiment of the application also provides a computer program product comprising at least one program stored in a computer-readable storage medium. A processor of a computer device reads the at least one program from the computer-readable storage medium and executes it, so that the computer device performs the training method of the character dialogue model or the dialogue generation method in any of the above implementations.
In some embodiments, the computer program product according to the embodiments of the present application may be deployed to be executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; the multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.
The foregoing is merely illustrative of the present invention and is not to be construed as limiting it; any modification made within the spirit and principles of the present invention is intended to be covered by the scope of the present invention.

Claims (15)

1. A method of training a character dialogue model, the method comprising:
obtaining an initial dialogue model based on a pre-trained dialogue model, wherein the pre-trained dialogue model is obtained by training on a plurality of groups of dialogue samples, the initial dialogue model comprises character parameters and the model parameters of the pre-trained dialogue model, and the character parameters are used for representing the dialogue style of a target character;
obtaining a dialogue sample pair of the target character, wherein the dialogue sample pair of the target character comprises a question sample feature of a question sample sentence and a reply sample feature of a reply sample sentence, and the reply sample sentence is a reply sentence of the target character;
inputting the question sample feature into the initial dialogue model to obtain a predicted reply feature; and
adjusting the character parameters in the initial dialogue model based on the reply sample feature and the predicted reply feature, to obtain a character dialogue model of the target character, wherein the character dialogue model is used for generating a reply sentence having the dialogue style of the target character.
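The training scheme of claim 1, in which the pretrained model parameters stay frozen and only the character parameters are adjusted against the reply sample feature, can be sketched with a toy linear model. Everything here (the matrix `W`, the character vector `c`, the dimensions, and the learning rate) is illustrative and not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W = rng.normal(size=(2 * dim, dim))  # pretrained model parameters (frozen)
c = np.zeros(dim)                    # character parameters (trainable)

def predict(x, c):
    # Concatenate the question feature with the character parameters,
    # then map through the frozen pretrained weights.
    return np.concatenate([x, c]) @ W

def train_step(x, y, c, lr=0.01):
    # Gradient of squared error with respect to c only; W is untouched.
    err = predict(x, c) - y
    grad_c = 2 * W[dim:] @ err       # d(loss)/dc
    return c - lr * grad_c

x = rng.normal(size=dim)             # question sample feature
y = rng.normal(size=dim)             # reply sample feature
loss_before = np.sum((predict(x, c) - y) ** 2)
for _ in range(500):
    c = train_step(x, y, c)
loss_after = np.sum((predict(x, c) - y) ** 2)
```

Because only `c` receives gradient updates, a different character can be supported by storing a different `c` against the same frozen `W`.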
2. The method of claim 1, wherein the inputting the question sample feature into the initial dialogue model to obtain a predicted reply feature comprises:
inputting the question sample feature into the initial dialogue model; and
splicing, through the initial dialogue model, the question sample feature and the current parameter values of the character parameters to obtain a sample splicing feature, and obtaining the predicted reply feature based on the sample splicing feature.
3. The method of claim 2, wherein the obtaining the predicted reply feature based on the sample splicing feature comprises:
performing linear mapping on the sample splicing feature based on a key weight parameter in the initial dialogue model to obtain a key feature, wherein the key feature is used for indicating key information of a plurality of words in the question sample sentence;
performing linear mapping on the sample splicing feature based on a value weight parameter in the initial dialogue model to obtain a value feature, wherein the value feature is used for indicating the importance degrees of the plurality of words in the question sample sentence; and
obtaining the predicted reply feature based on the key feature and the value feature.
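The splice-then-project step of claims 2 and 3 can be sketched as follows. The weight matrices `Wk` and `Wv`, the feature dimension, and the sequence length are hypothetical placeholders for the key weight parameter and value weight parameter named in claim 3:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
question_feat = rng.normal(size=(5, d))  # features of 5 question-sentence tokens
char_param = rng.normal(size=(1, d))     # current character parameter value

# Sample splicing feature: character parameters prepended to the question tokens.
splice = np.vstack([char_param, question_feat])

Wk = rng.normal(size=(d, d)) / np.sqrt(d)  # key weight parameter (illustrative)
Wv = rng.normal(size=(d, d)) / np.sqrt(d)  # value weight parameter (illustrative)

keys = splice @ Wk    # key features: key information per spliced token
values = splice @ Wv  # value features: importance per spliced token
```

Both projections act on the spliced sequence, so the character parameters contribute one extra key/value row alongside the question tokens.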
4. The method of claim 3, wherein the obtaining the predicted reply feature based on the key feature and the value feature comprises:
obtaining an attention score matrix based on the key feature, wherein the attention score matrix is used for indicating the similarity between each word in the question sample sentence and both the current parameter value of the character parameter and the other words in the question sample sentence;
normalizing the attention score matrix to obtain attention weights;
performing weighted summation based on the attention weights and the value feature to obtain an attention output feature; and
performing nonlinear conversion on the attention output feature to obtain the predicted reply feature.
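The score, normalize, weighted-sum, nonlinear-conversion pipeline of claim 4 has the shape of standard scaled-dot-product attention. The sketch below assumes that formulation, including a separate query projection, which the claim itself does not spell out:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 8
keys = rng.normal(size=(n, d))     # key features (from claim 3)
values = rng.normal(size=(n, d))   # value features (from claim 3)
queries = rng.normal(size=(n, d))  # assumed query projection of the splice

# Attention score matrix: pairwise similarities, scaled for stability.
scores = queries @ keys.T / np.sqrt(d)

# Normalization (softmax) turns scores into attention weights per row.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

attn_out = weights @ values   # weighted summation over the value features
predicted = np.tanh(attn_out) # nonlinear conversion -> predicted reply feature
```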
5. The method of claim 4, wherein the attention score matrix comprises a first sub-matrix and a second sub-matrix, the first sub-matrix being used for indicating the similarity between each word in the question sample sentence and the current parameter value of the character parameter, and the second sub-matrix being used for indicating the similarity between each word in the question sample sentence and the other words in the question sample sentence; and
the normalizing the attention score matrix to obtain attention weights comprises:
determining the product of the first sub-matrix and a gating factor to obtain a third sub-matrix, wherein the gating factor is used for controlling the degree of influence of the character parameter on the second sub-matrix;
normalizing the third sub-matrix to obtain a first sub-weight, and normalizing the second sub-matrix to obtain a second sub-weight; and
combining the first sub-weight and the second sub-weight to obtain the attention weights.
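The gated split of claim 5 can be illustrated as below. The gating factor value is arbitrary, and reading "normalize and combine" as a single joint softmax over the concatenated sub-matrices is one plausible interpretation, not the patent's stated implementation:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
first_sub = rng.normal(size=(n, 1))   # scores vs. the character parameter
second_sub = rng.normal(size=(n, n))  # scores vs. other question words
g = 0.5                               # gating factor (illustrative value)

third_sub = g * first_sub             # gating scales the character influence

# Joint softmax over [third_sub | second_sub] both normalizes and combines,
# so each row of attention weights sums to 1.
combined = np.hstack([third_sub, second_sub])
e = np.exp(combined - combined.max(axis=-1, keepdims=True))
attn = e / e.sum(axis=-1, keepdims=True)

first_weight = attn[:, :1]   # weight on the character parameter
second_weight = attn[:, 1:]  # weights on the question words
```

With `g = 0`, the character column contributes a uniform baseline score; larger `g` amplifies or suppresses its share of each row's attention mass.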
6. The method of claim 2, wherein the initial dialogue model comprises a plurality of attention layers connected in sequence, each of the plurality of attention layers comprising character parameters; and the inputting the question sample feature into the initial dialogue model to obtain a predicted reply feature comprises:
inputting the question sample feature into the first attention layer, splicing, through the first attention layer, the question sample feature and the current parameter values of the character parameters in the first attention layer to obtain a sample splicing feature, and obtaining an attention output feature based on the sample splicing feature;
sequentially inputting the attention output feature into the other attention layers, and obtaining the attention output features output by the other attention layers based on the current parameter values of the character parameters in the other attention layers; and
performing nonlinear conversion on the attention output feature output by the last attention layer to obtain the predicted reply feature.
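The layer stack of claim 6, with each attention layer holding its own character parameters that are spliced onto that layer's input, reduces to the toy forward pass below. Each layer is collapsed to a linear map plus tanh for brevity, and all names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_layers = 4, 3
# One weight matrix and one character-parameter vector per attention layer.
Ws = [rng.normal(size=(2 * d, d)) / np.sqrt(d) for _ in range(n_layers)]
chars = [rng.normal(size=d) for _ in range(n_layers)]

def forward(x):
    h = x
    for W, c in zip(Ws, chars):
        # Splice this layer's character parameters onto the layer input,
        # then map and apply the nonlinearity.
        h = np.tanh(np.concatenate([h, c]) @ W)
    return h  # output of the last layer

out = forward(rng.normal(size=d))
```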
7. The method of claim 1, wherein the obtaining a dialogue sample pair of the target character comprises:
obtaining dialogue sentences between the target character and a plurality of dialogue objects from a corpus to which the target character belongs;
obtaining at least two dialogue sentences from the dialogue sentences, wherein the at least two dialogue sentences comprise a dialogue sentence of the target character and a dialogue sentence of at least one dialogue object, and the end sentence of the at least two dialogue sentences is a dialogue sentence of the target character; and
taking the end sentence as the reply sample sentence, taking the dialogue sentences other than the end sentence in the at least two dialogue sentences as the question sample sentence, and obtaining the question sample feature and the reply sample feature based on the question sample sentence and the reply sample sentence, respectively.
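The sample construction of claim 7, windows of turns whose end sentence is spoken by the target character, can be sketched over a made-up mini corpus (the speaker names and lines are invented for illustration):

```python
# Each turn is (speaker, sentence); "hero" plays the target character.
corpus = [
    ("user_a", "Where are you headed?"),
    ("hero", "To the northern pass, before the snow."),
    ("user_b", "Is the road safe?"),
    ("hero", "Safe enough, if you keep to the daylight."),
]

def make_pairs(turns, target="hero"):
    pairs = []
    for i, (speaker, line) in enumerate(turns):
        # The end sentence of each window must belong to the target character.
        if speaker == target and i > 0:
            question = [text for _, text in turns[:i]]  # question sample sentences
            pairs.append((question, line))              # (question, reply sample)
    return pairs

pairs = make_pairs(corpus)
```

Features would then be extracted from each question/reply pair; the feature extraction itself is outside this sketch.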
8. A dialogue generation method, the method comprising:
obtaining a question sentence feature; and
inputting the question sentence feature into a character dialogue model to obtain a reply sentence feature corresponding to the question sentence feature, wherein the character dialogue model is obtained by training based on reply sentences of a target character, the reply sentence feature is used for indicating a reply sentence corresponding to the question sentence, and the dialogue style of the reply sentence corresponding to the question sentence belongs to the dialogue style of the target character.
9. The method of claim 8, wherein the inputting the question sentence feature into a character dialogue model to obtain a reply sentence feature corresponding to the question sentence feature comprises:
inputting the question sentence feature into the character dialogue model; and
splicing, through the character dialogue model, the question sentence feature and the parameter values of the character parameters in the character dialogue model to obtain a splicing feature, and obtaining the reply sentence feature based on the splicing feature.
10. The method of claim 8, wherein the character dialogue model comprises a plurality of attention layers connected in sequence, each of the plurality of attention layers comprising character parameters; and the inputting the question sentence feature into a character dialogue model to obtain a reply sentence feature corresponding to the question sentence feature comprises:
inputting the question sentence feature into the first attention layer, splicing, through the first attention layer, the question sentence feature and the parameter values of the character parameters in the first attention layer to obtain a splicing feature, and obtaining an attention output feature based on the splicing feature;
sequentially inputting the attention output feature into the other attention layers, and obtaining the attention output features output by the other attention layers based on the parameter values of the character parameters in the other attention layers; and
performing nonlinear conversion on the attention output feature output by the last attention layer to obtain the reply sentence feature.
11. A training apparatus for a character dialogue model, the apparatus comprising:
an obtaining module, configured to obtain an initial dialogue model based on a pre-trained dialogue model, wherein the pre-trained dialogue model is obtained by training on a plurality of groups of dialogue samples, the initial dialogue model comprises character parameters and the model parameters of the pre-trained dialogue model, and the character parameters are used for representing the dialogue style of a target character;
the obtaining module being further configured to obtain a dialogue sample pair of the target character, wherein the dialogue sample pair of the target character comprises a question sample feature of a question sample sentence and a reply sample feature of a reply sample sentence, and the reply sample sentence is a reply sentence of the target character;
an input-output module, configured to input the question sample feature into the initial dialogue model to obtain a predicted reply feature; and
an adjusting module, configured to adjust the character parameters in the initial dialogue model based on the reply sample feature and the predicted reply feature, to obtain a character dialogue model of the target character, wherein the character dialogue model is used for generating a reply sentence having the dialogue style of the target character.
12. A dialogue generation apparatus, the apparatus comprising:
an obtaining module, configured to obtain a question sentence feature; and
an input-output module, configured to input the question sentence feature into a character dialogue model to obtain a reply sentence feature corresponding to the question sentence feature, wherein the character dialogue model is obtained by training based on reply sentences of a target character, the reply sentence feature is used for indicating a reply sentence corresponding to the question sentence, and the dialogue style of the reply sentence belongs to the dialogue style of the target character.
13. A computer device, comprising a processor and a memory, the memory being configured to store at least one program which is loaded and executed by the processor to perform the training method of the character dialogue model of any one of claims 1 to 7 or the dialogue generation method of any one of claims 8 to 10.
14. A computer-readable storage medium storing at least one program which is loaded and executed by a processor to perform the training method of the character dialogue model of any one of claims 1 to 7 or the dialogue generation method of any one of claims 8 to 10.
15. A computer program product, wherein the computer program product comprises at least one program stored in a computer-readable storage medium; a processor of a computer device reads the at least one program from the computer-readable storage medium and executes it, causing the computer device to perform the training method of the character dialogue model of any one of claims 1 to 7 or the dialogue generation method of any one of claims 8 to 10.
CN202311342803.0A 2023-10-17 2023-10-17 Training method of role dialogue model, dialogue generation method, device and equipment Pending CN117633198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311342803.0A CN117633198A (en) 2023-10-17 2023-10-17 Training method of role dialogue model, dialogue generation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311342803.0A CN117633198A (en) 2023-10-17 2023-10-17 Training method of role dialogue model, dialogue generation method, device and equipment

Publications (1)

Publication Number Publication Date
CN117633198A true CN117633198A (en) 2024-03-01

Family

ID=90024256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311342803.0A Pending CN117633198A (en) 2023-10-17 2023-10-17 Training method of role dialogue model, dialogue generation method, device and equipment

Country Status (1)

Country Link
CN (1) CN117633198A (en)


Legal Events

Date Code Title Description
PB01 Publication