CN116010575A - Dialogue generation method integrating basic knowledge and user information - Google Patents
- Publication number
- CN116010575A CN116010575A CN202310058399.8A CN202310058399A CN116010575A CN 116010575 A CN116010575 A CN 116010575A CN 202310058399 A CN202310058399 A CN 202310058399A CN 116010575 A CN116010575 A CN 116010575A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a dialogue generation method that integrates basic knowledge and user information. The method comprises: constructing a user information data set and a human-machine dialogue data set; acquiring a basic knowledge data set from a big-data platform; feeding the data sets to an encoder and a decoder that adopt a multi-input Transformer structure; separately encoding the historical dialogue, the user's personal information, and the basic knowledge and computing an attention vector for each; and linearly fusing all the attention vectors, so that the language model considers the three kinds of content more comprehensively and generates more reasonable replies. The method handles knowledge and role information together, improving the human-machine dialogue experience and the quality of generated replies.
Description
Technical Field
The invention relates to the technical field of artificial intelligence natural language generation, in particular to a dialogue generation method integrating basic knowledge and user information.
Background
In recent years, human-machine dialogue technology has developed rapidly and is used in many fields, such as online customer service, online medical consultation, psychological counseling, and automatic question answering; artificial intelligence is widely used in dialogue generation to provide information to people. However, most people's impression of AI dialogue still stops at the level of simple voice assistants such as "Hey Siri": the chatbot's replies often miss what the user actually said, it can only react to specific instructions, and its responses are prefabricated, so it cannot meet the needs of diverse users. In human conversation, when people provide information to others, they consider the other party's background and interests, because different people are interested in different things. Providing a large amount of knowledge without regard to the dialogue partner's own information may flood them with useless content and degrade the user experience. Against this background, a dialogue system needs to combine prior knowledge with role information when generating a reply, so as to provide information to the user more effectively. Existing data sets and dialogue generation models rarely consider both knowledge and role information, and are limited in generating dialogue that fuses the two.
Summary of the invention:
The invention aims to overcome the defects of the prior art by providing a dialogue generation method that integrates basic knowledge and user information. The method handles knowledge and role information together, and can improve the human-machine dialogue experience and the quality of language replies.
The technical solution for realizing the aim of the invention is as follows:
A dialogue generation method integrating basic knowledge and user information comprises the following steps:
1) Building a dialogue data set based on knowledge and role information: because existing dialogue data sets do not fuse knowledge and role information, a training data set that carries both knowledge and role information must be constructed, comprising: a data set D = [d_1, d_2, ..., d_n] based on the basic knowledge database DBpedia and a user characteristic information data set P = (p_1, p_2, ..., p_n); human-computer interaction is carried out with sentences carrying role-information labels to obtain dialogue data, the dialogue data comprising question sentences and reply sentences, where l_m represents a question sentence and l̂_m represents the corresponding reply sentence;
2) Obtaining a user information embedding vector and a basic knowledge embedding vector: using a multi-input Transformer structure, natural language is mapped to a vector space through word embedding. The basic knowledge sequence D and the user characteristic information P are word-embedded and encoded to obtain word embedding sequence vectors X(D) and X(P); positional encoding then converts the word embedding vectors into sine and cosine vector representations containing various frequencies, capturing the relations among words in the high-dimensional vector space and yielding the word embedding sequence vectors with position information, X_embed(D) and X_embed(P):
X_embed = Embedding + PositionalEncoding,
Embedding(D) = D · W_d, Embedding(P) = P · W_p,
PositionalEncoding_(T,2i) = sin(T / 10000^(2i/d)),
PositionalEncoding_(T,2i+1) = cos(T / 10000^(2i/d)),
wherein the symbol (·) represents word-embedding encoding; W_d and W_p represent learnable parameters; PositionalEncoding(·) represents encoding the word embedding vectors with sine and cosine functions; T represents any position in the word embedding sequence; d represents the vector dimension; PositionalEncoding_(T,2i) represents the position encoding of the word embedding sequence at position T in dimension 2i, and PositionalEncoding_(T,2i+1) that in dimension 2i+1. For the word embedding vectors X_embed(D) and X_embed(P) obtained after word embedding and positional encoding, the hidden feature sequence C_E of the source text is obtained through the subsequent calculation of the Encoder layer;
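The sinusoidal positional encoding of step 2) can be sketched as follows. This is a minimal NumPy sketch: the 10000 base and sin/cos alternation follow the standard Transformer formulation, which the patent's sine-and-cosine description matches, and the concrete sequence length and dimension are illustrative assumptions.

```python
import numpy as np

def positional_encoding(seq_len: int, d: int) -> np.ndarray:
    """Sinusoidal position encoding: sin on even dimensions 2i,
    cos on odd dimensions 2i+1, for each position T."""
    pe = np.zeros((seq_len, d))
    pos = np.arange(seq_len)[:, None]        # position T
    i = np.arange(0, d, 2)[None, :]          # dimension index 2i
    angle = pos / np.power(10000.0, i / d)
    pe[:, 0::2] = np.sin(angle)              # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)              # odd dimensions: cosine
    return pe

# X_embed = Embedding + PositionalEncoding (Embedding stood in by random data)
emb = np.random.randn(10, 16)
x_embed = emb + positional_encoding(10, 16)
```

Because the encoding depends only on the position and dimension indices, it can be precomputed once and added to any embedded sequence of compatible shape.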
3) Attention calculation: the Encoder layer of the multi-input Transformer structure encodes the user information and basic knowledge embedding vectors, and calculates the basic knowledge attention vector and the user information attention vector, specifically:
the Encoder layer consists of two sublayers: the first is a multi-head self-attention layer (Multi-Head Attention), and the other is a feed-forward neural network (FFN) layer,
calculating the multi-head attention vector Attention(·) according to the following formulas:
Q = Linear(X_embed) = X_embed · W_q,
K = Linear(X_embed) = X_embed · W_k,
V = Linear(X_embed) = X_embed · W_v,
Attention(Q, K, V) = Softmax(Q·K^T / √d)·V,
wherein Linear(·) represents a linear transformation operation; Q, K, and V represent the query vector sequence, key vector sequence, and value vector sequence respectively; W_q, W_k, and W_v are different learnable parameter matrices, with W_q, W_k, W_v ∈ R^d, where R represents the real numbers and d is the dimension of the word embedding vector X_embed; Softmax represents the normalized exponential function, and K^T is the transpose of K. Multiple heads mean multiple different parameter matrices W_i used to learn multiple meaning representations: X_embed is linearly projected onto each feature space i by multiplying it with the three weights W_q, W_k, and W_v.
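The single-head attention pooling above can be sketched in NumPy. This is a hedged sketch of the standard scaled dot-product form, which matches the quantities named in the text (Softmax, K^T, the dimension d); the matrix shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # normalized exponential function, stabilized by subtracting the row max
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x_embed, W_q, W_k, W_v):
    """Q, K, V are linear maps of X_embed; the output is
    Softmax(Q K^T / sqrt(d)) V."""
    Q, K, V = x_embed @ W_q, x_embed @ W_k, x_embed @ W_v
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # one attention row per query token
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                   # 5 tokens, embedding dimension 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention(x, W_q, W_k, W_v)             # shape (5, 8)
```

Each row of the softmaxed weight matrix sums to 1, so every output position is a convex combination of the value vectors.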
The multi-head attention layer merges knowledge from the same attention pooling carried out in h different feature spaces over the same query matrix Q, key matrix K, and value matrix V, specifically expressed as h sets of different parameter matrices W_i^Q, W_i^K, W_i^V, i ∈ (1, h). The h sets of transformed queries, keys, and values are attention-pooled in parallel; the h attention-pooled outputs head_i are spliced together and transformed by another learnable linear projection matrix W_h to produce the final basic knowledge multi-head attention output X_att(D):
X_att(D) = MultiHead(Q, K, V) = Concat(head_1; head_2; ...; head_h)·W_h,
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V),
wherein MultiHead represents the multi-head attention function; Concat represents the concatenation operation; head_i represents the attention output of the i-th subspace, i ∈ (1, h); W_i^Q represents the parameter matrix of the query vector Q in the i-th feature space, W_i^K that of the key vector K, and W_i^V that of the value vector V, i ∈ (1, h); and W_h represents a learnable linear projection matrix. The calculation of the user information attention vector X_att(P) is the same as that of the basic knowledge attention vector X_att(D) described above;
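The parallel pooling, splicing (Concat), and W_h projection of the multi-head layer can be sketched as below. A minimal sketch under assumed shapes: h heads, per-head dimension, and the random weights are all illustrative, not values from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product pooling for one head
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head(x, heads, W_h):
    """h parallel attention poolings over projected Q, K, V;
    the h outputs head_i are concatenated and mixed by W_h."""
    outs = [attention(x @ Wq, x @ Wk, x @ Wv) for (Wq, Wk, Wv) in heads]
    return np.concatenate(outs, axis=-1) @ W_h

rng = np.random.default_rng(1)
h, d, d_head = 4, 8, 2
heads = [tuple(rng.normal(size=(d, d_head)) for _ in range(3)) for _ in range(h)]
W_h = rng.normal(size=(h * d_head, d))        # learnable projection after Concat
x_att = multi_head(rng.normal(size=(6, d)), heads, W_h)   # shape (6, 8)
```

The same routine computes X_att(D) or X_att(P) depending only on which embedded sequence is passed in, matching the statement that the two attention vectors are calculated identically.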
4) Acquiring the source text hidden representation: the knowledge attention vector and the role information attention vector are fused, and the hidden representation C_E of the source text is obtained through a feed-forward neural network, specifically:
the basic knowledge attention feature vector X_att(D) calculated in step 3) and the user information attention feature vector X_att(P) are linearly fused, and the encoder outputs the fused attention vector X_hidden:
X_hidden = Linear{X_att(D); X_att(P)},
where Linear{ ; } represents the concatenation operation. The output X_hidden passes through the feed-forward neural network FFN and is then joined by a residual link, i.e., X_hidden is added position-wise to the output of the FFN, and the sum finally undergoes one LayerNorm layer normalization:
FFN = Linear{ReLU[Linear(X_hidden)]},
C_E = LayerNorm(X_hidden + FFN),
wherein LayerNorm represents the layer normalization operation and FFN represents a two-layer fully connected network with ReLU as the activation function. The encoder's source text hidden representation C_E serves as the input to the next module, the decoder;
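Step 4's fuse-then-normalize pipeline can be sketched as follows: concatenate the two attention vectors and fuse them linearly, apply a two-layer ReLU FFN, add the residual, and layer-normalize. All matrix shapes and the random inputs are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each position's feature vector to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def encoder_output(x_att_d, x_att_p, W_f, W_1, W_2):
    """X_hidden = Linear{X_att(D); X_att(P)};
    C_E = LayerNorm(X_hidden + FFN(X_hidden)) with FFN = Linear{ReLU[Linear(.)]}."""
    x_hidden = np.concatenate([x_att_d, x_att_p], axis=-1) @ W_f   # linear fusion
    ffn = np.maximum(0.0, x_hidden @ W_1) @ W_2                    # two-layer ReLU FFN
    return layer_norm(x_hidden + ffn)                              # residual + LayerNorm

rng = np.random.default_rng(2)
d = 8
x_d, x_p = rng.normal(size=(5, d)), rng.normal(size=(5, d))
C_E = encoder_output(x_d, x_p,
                     rng.normal(size=(2 * d, d)),    # fusion weights (assumed shape)
                     rng.normal(size=(d, 4 * d)),    # FFN expansion
                     rng.normal(size=(4 * d, d)))    # FFN contraction
```

After layer normalization every position of C_E has approximately zero mean across the feature dimension, which stabilizes the decoder's consumption of the source representation.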
5) Optimizing the language generation model: the Decoder of the multi-input Transformer structure simultaneously calculates attention vectors for the basic knowledge, the user information, and the current state; the three results are linearly fused to obtain a fused attention feature representation, which is combined with the source text hidden representation C_E to obtain the context hidden representation C_D; a reply text sequence Y is generated from C_D, and a loss function is defined to optimize the language generation model, specifically:
the Decoder of the multi-input Transformer structure consists of three sublayers: the first is a masked multi-head self-attention layer, the second is an encoder-decoder attention layer, and the third is a feed-forward neural network layer. The masked multi-head attention layer takes the historical dialogue data as input and extracts the masked attention vector:
X_att^mask(L) = MaskedMultiHead(X_embed(L)),
wherein X_att^mask(L) represents the masked attention vector and X_embed(L) represents the dialogue data vector after word embedding and positional encoding. The encoder-decoder multi-head attention layer then receives the encoder output C_E as input and extracts the historical dialogue attention feature vector E; E, the basic knowledge attention feature vector X_att(D), and the user information attention feature vector X_att(P) are linearly fused to output the decoder hidden representation C_D:
X_att^fuse = Linear{E; X_att(D); X_att(P)}·W_f,
C_D = LayerNorm(X_att^fuse + FFN(X_att^fuse)),
wherein E represents the historical dialogue attention feature vector; W_f represents a learnable projection matrix; X_att(D) represents the basic knowledge attention feature vector; X_att(P) represents the user information attention feature vector; X_att^fuse represents the fused attention vector that fuses the basic knowledge, user information, and historical dialogue information; Linear{ ; } represents the concatenation operation; and C_D represents the decoder hidden representation obtained after the layer normalization operation. The feed-forward neural network layer is the same as in the encoder, i.e., a two-layer fully connected network with a residual link followed by a normalization layer. The decoder output C_D, i.e., the output of the last decoder basic unit, is mapped through a linear transformation and the Softmax function to the probability distribution of the predicted word at the next time instant: given the encoder output C_E and the decoder output y_{t-1} at the previous time step, the probability distribution P(Y) of the word at the current moment is predicted, and the functional expression of the probability representation P(Y) of the generated reply sequence text Y is:
Y = (y_1, y_2, ..., y_{t-1}, y_t),
P(Y) = Softmax(FFN(C_D) + C_D),
wherein Y represents the reply text generated by the model, Softmax represents the normalized exponential function, and FFN represents a two-layer fully connected network with ReLU as the activation function. Maximum likelihood estimation is adopted for the loss function, which aims to minimize the negative log likelihood of language generation modeling, user judgment, and knowledge judgment; the overall loss function combines the maximum likelihood estimation terms with weight parameters:
L_P = -Σ_{i=1}^{m} log P(p̂_i = p_i),
L_D = -Σ_{i=1}^{n} log P(d̂_i = d_i),
Loss = α·L_P + β·L_D,
wherein m and n represent the numbers of samples; L_P represents the user-judgment loss function and L_D the knowledge-judgment loss function; p_i represents the i-th user sample used by the model in prediction; d_i represents the i-th piece of knowledge used by the model in prediction; p̂_i and d̂_i represent the model's predictions for user judgment and knowledge judgment; Loss represents the joint loss function; and α and β represent adjustable weight parameters.
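The weighted joint loss Loss = α·L_P + β·L_D can be sketched as below, assuming (as a simplification the patent does not spell out) that L_P and L_D are the negative log likelihoods of the probabilities the model assigns to the gold user samples and gold knowledge entries; the function name, the equal default weights, and the example probabilities are illustrative.

```python
import numpy as np

def joint_loss(p_probs, d_probs, alpha=0.5, beta=0.5):
    """Loss = alpha*L_P + beta*L_D, where L_P / L_D are negative log
    likelihoods over the m user samples and n knowledge entries."""
    L_P = -np.sum(np.log(p_probs))   # user-judgment NLL over m samples
    L_D = -np.sum(np.log(d_probs))   # knowledge-judgment NLL over n samples
    return alpha * L_P + beta * L_D

# probabilities the model assigned to the correct user / knowledge items
loss = joint_loss(np.array([0.9, 0.8]), np.array([0.7, 0.6, 0.5]))
```

Raising α relative to β pushes training toward matching the user profile; raising β emphasizes selecting the right knowledge, which is the role of the "adjustable weight parameters" in the text.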
The key processing procedure in the encoder of the technical solution comprises: word embedding and positional encoding are performed in turn on the basic knowledge and on the historical dialogues carrying the user's personalized labels, obtaining the word embedding feature vectors with position information, X_embed(D) and X_embed(P); X_embed(D) and X_embed(P) are input to the encoder, which calculates multi-head self-attention vectors to obtain the corresponding representations; the source text hidden feature representation C_E is then obtained through the feed-forward fully connected layer, layer normalization, and residual connection, and C_E serves as the input to the decoder.
The key processing procedure in the decoder of the technical solution comprises: the semantic vector sequence obtained by word embedding and positional encoding of the dialogue data is input to the decoder; using a multi-head attention mechanism, the attention vector E of the historical dialogue is obtained through multi-head attention calculation with the hidden state vector C_E output by the encoder; attention vectors are calculated three times, for the user's personalized information, the basic knowledge, and the historical dialogue state respectively; the three results are linearly fused, and the decoder hidden state representation C_D is extracted from the fused attention vector; C_D passes through the feed-forward neural network layer and the Softmax operation to obtain the generated sentence Y; the dialogue generation model is optimized with the joint loss function, and the trained model is obtained when the joint loss reaches its minimum during iteration. The user inputs question content into the trained dialogue generation model and finally obtains a reply sentence that fuses user information and basic knowledge.
The technical solution improves the conventional Transformer encoding and decoding layers: a multi-input attention structure is added to the original single-input structure to generate basic knowledge responses and user information responses, and a basic knowledge base and user information portraits are adopted to generate customized replies, which helps to grasp the user's core requirements accurately.
According to the technical solution, the language generation model is trained with an Encoder-Decoder structure based on the multi-input Transformer, so that the model can reply with respect to a user portrait combined with basic knowledge, improving the quality of language replies.
The method handles knowledge and role information together, and can improve the human-machine dialogue experience and the quality of language replies.
Description of the drawings:
FIG. 1 is a schematic flow chart of a method of an embodiment;
FIG. 2 is a schematic diagram of an Encoder Encoder in an embodiment;
fig. 3 is a schematic diagram of a Decoder in an embodiment.
The specific embodiment is as follows:
the present invention will now be further illustrated, but not limited, by the following figures and examples.
Examples:
Referring to fig. 1, a dialogue generation method fusing basic knowledge and user information comprises the following steps:
1) Building a dialogue data set based on knowledge and role information: because the existing dialogue data sets do not integrate knowledge and role information, a training data set carrying both knowledge and role information must be constructed, comprising: a data set D = [d_1, d_2, ..., d_n] based on the basic knowledge database DBpedia and a user characteristic information data set P = (p_1, p_2, ..., p_n); human-computer interaction is carried out with sentences carrying role-information labels to obtain dialogue data, the dialogue data comprising question sentences and reply sentences, where l_m represents a question sentence and l̂_m represents the corresponding reply sentence;
2) Obtaining a user information embedding vector and a basic knowledge embedding vector: as shown in fig. 2, a multi-input Transformer structure is adopted to map natural language to a vector space through word embedding. The basic knowledge sequence D and the user characteristic information P are word-embedded and encoded to obtain word embedding sequence vectors X(D) and X(P); positional encoding then converts the word embedding vectors into sine and cosine vector representations containing various frequencies, capturing the relations among words in the high-dimensional vector space and yielding the word embedding sequence vectors with position information, X_embed(D) and X_embed(P):
X_embed = Embedding + PositionalEncoding,
Embedding(D) = D · W_d, Embedding(P) = P · W_p,
PositionalEncoding_(T,2i) = sin(T / 10000^(2i/d)),
PositionalEncoding_(T,2i+1) = cos(T / 10000^(2i/d)),
wherein the symbol (·) represents word-embedding encoding; W_d and W_p represent learnable parameters; PositionalEncoding(·) represents encoding the word embedding vectors with sine and cosine functions; T represents any position in the word embedding sequence; d represents the vector dimension; PositionalEncoding_(T,2i) represents the position encoding of the word embedding sequence at position T in dimension 2i, and PositionalEncoding_(T,2i+1) that in dimension 2i+1. For the word embedding vectors X_embed(D) and X_embed(P) obtained after word embedding and positional encoding, the hidden feature sequence C_E of the source text is obtained through the subsequent calculation of the Encoder layer;
3) Attention calculation: the Encoder layer of the multi-input Transformer structure encodes the user information and basic knowledge embedding vectors, and calculates the basic knowledge attention vector and the user information attention vector, specifically:
the Encoder layer consists of two sublayers: the first is a multi-head self-attention layer (Multi-Head Attention), and the other is a feed-forward neural network (FFN) layer,
calculating the multi-head attention vector Attention(·) according to the following formulas:
Q = Linear(X_embed) = X_embed · W_q,
K = Linear(X_embed) = X_embed · W_k,
V = Linear(X_embed) = X_embed · W_v,
Attention(Q, K, V) = Softmax(Q·K^T / √d)·V,
wherein Linear(·) represents a linear transformation operation; Q, K, and V represent the query vector sequence, key vector sequence, and value vector sequence respectively; W_q, W_k, and W_v are different learnable parameter matrices, with W_q, W_k, W_v ∈ R^d, where R represents the real numbers and d is the dimension of the word embedding vector X_embed; Softmax represents the normalized exponential function, and K^T is the transpose of K. Multiple heads mean multiple different parameter matrices W_i used to learn multiple meaning representations: X_embed is linearly projected onto each feature space i by multiplying it with the three weights W_q, W_k, and W_v.
The multi-head attention layer merges knowledge from the same attention pooling carried out in h different feature spaces over the same query matrix Q, key matrix K, and value matrix V, specifically expressed as h sets of different parameter matrices W_i^Q, W_i^K, W_i^V, i ∈ (1, h). The h sets of transformed queries, keys, and values are attention-pooled in parallel; the h attention-pooled outputs head_i are spliced together and transformed by another learnable linear projection matrix W_h to produce the final basic knowledge multi-head attention output X_att(D):
X_att(D) = MultiHead(Q, K, V) = Concat(head_1; head_2; ...; head_h)·W_h,
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V),
wherein MultiHead represents the multi-head attention function; Concat represents the concatenation operation; head_i represents the attention output of the i-th subspace, i ∈ (1, h); W_i^Q represents the parameter matrix of the query vector Q in the i-th feature space, W_i^K that of the key vector K, and W_i^V that of the value vector V, i ∈ (1, h); and W_h represents a learnable linear projection matrix. The calculation of the user information attention vector X_att(P) is the same as that of the basic knowledge attention vector X_att(D) described above;
4) Acquiring the source text hidden representation: the knowledge attention vector and the role information attention vector are fused, and the hidden representation C_E of the source text is obtained through a feed-forward neural network, specifically:
the basic knowledge attention feature vector X_att(D) calculated in step 3) and the user information attention feature vector X_att(P) are linearly fused, and the encoder outputs the fused attention vector X_hidden:
X_hidden = Linear{X_att(D); X_att(P)},
where Linear{ ; } represents the concatenation operation. The output X_hidden passes through the feed-forward neural network FFN and is then joined by a residual link, i.e., X_hidden is added position-wise to the output of the FFN, and the sum finally undergoes one LayerNorm layer normalization:
FFN = Linear{ReLU[Linear(X_hidden)]},
C_E = LayerNorm(X_hidden + FFN),
wherein LayerNorm represents the layer normalization operation and FFN represents a two-layer fully connected network with ReLU as the activation function. The encoder's source text hidden representation C_E serves as the input to the next module, the decoder;
5) Optimizing the language generation model: the Decoder of the multi-input Transformer structure simultaneously calculates attention vectors for the basic knowledge, the user information, and the current state; the three results are linearly fused to obtain a fused attention feature representation, which is combined with the source text hidden representation C_E to obtain the context hidden representation C_D; a reply text sequence Y is generated from C_D, and a loss function is defined to optimize the language generation model, as shown in fig. 3, specifically:
the Decoder of the multi-input Transformer structure consists of three sublayers: the first is a masked multi-head self-attention layer, the second is an encoder-decoder attention layer, and the third is a feed-forward neural network layer. The masked multi-head attention layer takes the historical dialogue data as input and extracts the masked attention vector:
X_att^mask(L) = MaskedMultiHead(X_embed(L)),
wherein X_att^mask(L) represents the masked attention vector and X_embed(L) represents the dialogue data vector after word embedding and positional encoding. The encoder-decoder multi-head attention layer then receives the encoder output C_E as input and extracts the historical dialogue attention feature vector E; E, the basic knowledge attention feature vector X_att(D), and the user information attention feature vector X_att(P) are linearly fused to output the decoder hidden representation C_D:
X_att^fuse = Linear{E; X_att(D); X_att(P)}·W_f,
C_D = LayerNorm(X_att^fuse + FFN(X_att^fuse)),
wherein E represents the historical dialogue attention feature vector; W_f represents a learnable projection matrix; X_att(D) represents the basic knowledge attention feature vector; X_att(P) represents the user information attention feature vector; X_att^fuse represents the fused attention vector that fuses the basic knowledge, user information, and historical dialogue information; Linear{ ; } represents the concatenation operation; and C_D represents the decoder hidden representation obtained after the layer normalization operation. The feed-forward neural network layer is the same as in the encoder, i.e., a two-layer fully connected network with a residual link followed by a normalization layer. The decoder output C_D, i.e., the output of the last decoder basic unit, is mapped through a linear transformation and the Softmax function to the probability distribution of the predicted word at the next time instant: given the encoder output C_E and the decoder output y_{t-1} at the previous time step, the probability distribution P(Y) of the word at the current moment is predicted, and the functional expression of the probability representation P(Y) of the generated reply sequence text Y is:
Y = (y_1, y_2, ..., y_{t-1}, y_t),
P(Y) = Softmax(FFN(C_D) + C_D),
wherein Y represents the reply text generated by the model, Softmax represents the normalized exponential function, and FFN represents a two-layer fully connected network with ReLU as the activation function. Maximum likelihood estimation is adopted for the loss function, which aims to minimize the negative log likelihood of language generation modeling, user judgment, and knowledge judgment; the overall loss function combines the maximum likelihood estimation terms with weight parameters:
L_P = -Σ_{i=1}^{m} log P(p̂_i = p_i),
L_D = -Σ_{i=1}^{n} log P(d̂_i = d_i),
Loss = α·L_P + β·L_D,
wherein m and n represent the numbers of samples; L_P represents the user-judgment loss function and L_D the knowledge-judgment loss function; p_i represents the i-th user sample used by the model in prediction; d_i represents the i-th piece of knowledge used by the model in prediction; p̂_i and d̂_i represent the model's predictions for user judgment and knowledge judgment; Loss represents the joint loss function; and α and β represent adjustable weight parameters.
Claims (1)
1. A dialogue generation method integrating basic knowledge and user information is characterized by comprising the following steps:
1) Building a dialogue data set based on knowledge and role information: constructing a training data set carrying both knowledge and role information, comprising: a data set D = [d_1, d_2, ..., d_n] based on the basic knowledge database DBpedia and a user characteristic information data set P = (p_1, p_2, ..., p_n); human-computer interaction is carried out with sentences carrying role-information labels to obtain dialogue data, the dialogue data comprising question sentences and reply sentences, where l_m represents a question sentence and l̂_m represents the corresponding reply sentence;
2) Obtaining a user information embedding vector and a basic knowledge embedding vector: using a multi-input Transformer structure, natural language is mapped to a vector space through word embedding. The basic knowledge sequence D and the user characteristic information P are word-embedded and encoded to obtain word embedding sequence vectors X(D) and X(P); positional encoding then converts the word embedding vectors into sine and cosine vector representations containing various frequencies, capturing the relations among words in the high-dimensional vector space and yielding the word embedding sequence vectors with position information, X_embed(D) and X_embed(P):
X_embed = Embedding + PositionalEncoding,
Embedding(D) = D · W_d, Embedding(P) = P · W_p,
PositionalEncoding_(T,2i) = sin(T / 10000^(2i/d)),
PositionalEncoding_(T,2i+1) = cos(T / 10000^(2i/d)),
wherein the symbol (·) represents word-embedding encoding; W_d and W_p represent learnable parameters; PositionalEncoding(·) represents encoding the word embedding vectors with sine and cosine functions; T represents any position in the word embedding sequence; d represents the vector dimension; PositionalEncoding_(T,2i) represents the position encoding of the word embedding sequence at position T in dimension 2i, and PositionalEncoding_(T,2i+1) that in dimension 2i+1. For the word embedding vectors X_embed(D) and X_embed(P) obtained after word embedding and positional encoding, the hidden feature sequence C_E of the source text is obtained through the calculation of the Encoder layer;
3) Attention calculation: the Encoder layer of the multi-input Transformer structure encodes the user information and basic knowledge embedding vectors, and calculates the basic knowledge attention vector and the user information attention vector, specifically:
the Encoder layer consists of two sublayers: the first is a multi-head self-attention layer (Multi-Head Attention), and the other is a feed-forward neural network (FFN) layer,
The multi-head Attention vector Attention(·) is calculated according to the following formulas:
Q = Linear(X_embed) = X_embed · W_q,
K = Linear(X_embed) = X_embed · W_k,
V = Linear(X_embed) = X_embed · W_v,
Attention(Q, K, V) = Softmax(Q · K^T / √d) · V,
wherein Linear(·) represents a linear transformation operation; Q, K, V represent the query, key, and value vector sequences respectively; W_q, W_k, W_v are distinct learnable parameter matrices with W_q, W_k, W_v ∈ R^(d×d), R representing the real numbers and d the dimension of the word embedding vector X_embed; Softmax represents the normalized exponential function; and K^T is the transpose of K. "Multi-head" means that multiple different parameter matrices W_i linearly project X_embed into each feature space i, multiplying respectively by the three weights W_q, W_k, W_v. The multi-head attention layer merges knowledge from the same attention pooling applied in h different feature spaces of the same query matrix Q, key matrix K and value matrix V, expressed as h groups of different parameter matrices: the h groups of transformed queries, keys and values are attention-pooled in parallel, the h attention-pooled outputs head_i are spliced together, and a learnable linear projection matrix W_h transforms them to generate the final basic knowledge multi-head attention output X_att(D):
head_i = Attention(Q · W_i^Q, K · W_i^K, V · W_i^V),
X_att(D) = MultiHead(Q, K, V) = Concat(head_1; head_2; …; head_h) · W_h,
wherein MultiHead represents the multi-head attention function, Concat represents the concatenation operation, and head_i represents the attention output of the i-th subspace, i ∈ (1, h); W_i^Q represents the parameter matrix of the query vector Q in the i-th feature space, W_i^K the parameter matrix of the key vector K in the i-th feature space, and W_i^V the parameter matrix of the value vector V in the i-th feature space, i ∈ (1, h); W_h represents a learnable linear projection matrix. The user information attention vector X_att(P) is calculated in the same way as the basic knowledge attention vector X_att(D);
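The scaled dot-product and multi-head computation of step 3) can be sketched as follows (for brevity the heads are formed by slicing Q, K, V into column blocks, which plays the role of the per-head projections W_i^Q, W_i^K, W_i^V; all names are ours):

```python
import numpy as np

def softmax(x):
    """Row-wise normalized exponential function."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def multi_head(X, Wq, Wk, Wv, Wh, h):
    """Project X to Q, K, V; attend in h parallel subspaces; concat; project by Wh."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_head = Q.shape[-1] // h
    heads = [attention(Q[:, i*d_head:(i+1)*d_head],   # head_i over subspace i
                       K[:, i*d_head:(i+1)*d_head],
                       V[:, i*d_head:(i+1)*d_head]) for i in range(h)]
    return np.concatenate(heads, axis=-1) @ Wh        # Concat(head_1; ...; head_h) W_h
```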
4) Acquiring the source text hidden representation: the knowledge attention vector and the character information attention vector are fused, and the hidden representation C_E of the source text is obtained through a forward neural network, specifically:
the basic knowledge attention feature vector X_att(D) calculated in step 3) is linearly fused with the user information attention feature vector X_att(P) to output the encoder fused attention vector X_hidden:
X_hidden = Linear{X_att(D); X_att(P)},
wherein Linear{;} represents the concatenation operation. The output X_hidden passes through the feed-forward neural network FFN and is then connected with a residual link, i.e., X_hidden is added element-wise to the output of the FFN, and the sum finally undergoes one LayerNorm layer normalization, calculated as follows:
FFN = Linear{ReLU[Linear(X_hidden)]},
C_E = LayerNorm(X_hidden + FFN),
wherein LayerNorm represents the layer normalization operation and FFN represents a two-layer fully-connected network with ReLU as the activation function. The encoder's source text hidden representation C_E serves as input to the next module, the decoder;
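Step 4)'s fusion, FFN, residual link and layer normalization can be sketched like this (weight shapes and names are assumptions; LayerNorm is shown without its learnable gain and bias):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learnable gain/bias)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(X_att_D, X_att_P, W_fuse, W1, W2):
    """X_hidden = Linear{X_att(D); X_att(P)}; C_E = LayerNorm(X_hidden + FFN(X_hidden))."""
    X_hidden = np.concatenate([X_att_D, X_att_P], axis=-1) @ W_fuse  # fuse the two attentions
    ffn = np.maximum(X_hidden @ W1, 0.0) @ W2                        # Linear -> ReLU -> Linear
    return layer_norm(X_hidden + ffn)                                # residual link + LayerNorm
```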
5) Optimizing the language generation model: the Decoder of the multi-input Transformer structure simultaneously calculates attention vectors for the basic knowledge, the user information, and the current state; the three results are linearly fused to obtain a fused attention feature representation, which is combined with the source text hidden representation C_E to acquire the context hidden representation C_D; a reply text sequence Y is generated from the context hidden representation C_D, and a loss function is defined to optimize the language generation model, as shown in fig. 3, specifically:
the Decoder of the multi-input Transformer structure consists of three sublayers: the first is a masked multi-head self-attention layer, the second is an encoder-decoder attention layer, and the third is a feed-forward neural network layer. The masked multi-head attention layer takes historical dialogue data as input and extracts the masked attention vector, calculated as follows:
X_att^mask(L) = MaskedMultiHead(X_embed(L)),
wherein X_att^mask(L) represents the masked attention vector and X_embed(L) represents the word-embedded and position-encoded dialogue data vector. The encoder output C_E is received as input to the encoder-decoder multi-head attention layer, which extracts the historical dialogue attention feature vector E; E, the basic knowledge attention feature vector X_att(D) and the user information attention feature vector X_att(P) are linearly fused to output the decoder hidden representation C_D, calculated as follows:
E = MultiHead(X_att^mask(L), C_E, C_E),
X_fuse = Linear{E; X_att(D); X_att(P)} · W_f,
C_D = LayerNorm(X_fuse + FFN(X_fuse)),
wherein E represents the historical dialogue attention feature vector, W_f represents a learnable projection matrix, X_att(D) represents the basic knowledge attention feature vector, X_att(P) represents the user information attention feature vector, X_fuse represents the fused attention vector combining basic knowledge, user information and historical session information, Linear{;} represents the concatenation operation, and C_D represents the decoder hidden representation obtained after the layer normalization operation. The feed-forward neural network layer part is the same as in the encoder, i.e., a two-layer fully-connected network with a residual link followed by a normalization layer. The decoder output C_D, i.e., the output of the last decoder base unit, is mapped via a linear transformation and the Softmax function to the probability distribution of the predicted word at the next time instant: given the encoder output C_E and the decoder output y_(t-1) at the previous time, the probability distribution P(Y) of the word at the current moment is predicted, and the functional expression of the probability representation P(Y) of the reply sequence text Y is:
Y = (y_1, y_2, …, y_(t-1), y_t),
P(Y) = Softmax(FFN(C_D) + C_D),
wherein Y represents the reply text generated by the model, Softmax represents the normalized exponential function, and FFN represents a two-layer fully-connected network with ReLU as the activation function. Maximum likelihood estimation is adopted as the loss function, and the overall total loss function combines the maximum likelihood estimation functions with weight parameters:
Loss = α · L_P + β · L_D,
wherein m and n represent the numbers of samples; L_P represents the user-judgment loss function and L_D the knowledge-judgment loss function; p_i represents the i-th user sample used by the model in prediction; d_i represents the i-th piece of knowledge used by the model in prediction; the corresponding predicted values denote the model's prediction results on user judgment and knowledge judgment; Loss represents the joint loss function; and α and β represent adjustable weight parameters.
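A sketch of the weighted joint loss Loss = α · L_P + β · L_D. The exact forms of L_P and L_D are not reproduced in the text above, so negative log-likelihood (maximum likelihood) is an assumed instantiation of both terms:

```python
import numpy as np

def nll(probs, targets):
    """Negative log-likelihood of the target indices under predicted distributions."""
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))

def joint_loss(user_probs, p_targets, knowledge_probs, d_targets, alpha=0.5, beta=0.5):
    """Loss = alpha * L_P + beta * L_D: weighted sum of the user-judgment (L_P)
    and knowledge-judgment (L_D) maximum-likelihood losses."""
    return alpha * nll(user_probs, p_targets) + beta * nll(knowledge_probs, d_targets)
```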
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310058399.8A CN116010575A (en) | 2023-01-19 | 2023-01-19 | Dialogue generation method integrating basic knowledge and user information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116010575A true CN116010575A (en) | 2023-04-25 |
Family
ID=86035719
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310058399.8A Pending CN116010575A (en) | 2023-01-19 | 2023-01-19 | Dialogue generation method integrating basic knowledge and user information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116010575A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116244419A (en) * | 2023-05-12 | 2023-06-09 | 苏州大学 | Knowledge enhancement dialogue generation method and system based on character attribute |
CN116502069A (en) * | 2023-06-25 | 2023-07-28 | 四川大学 | Haptic time sequence signal identification method based on deep learning |
CN116759077A (en) * | 2023-08-18 | 2023-09-15 | 北方健康医疗大数据科技有限公司 | Medical dialogue intention recognition method based on intelligent agent |
CN116821168A (en) * | 2023-08-24 | 2023-09-29 | 吉奥时空信息技术股份有限公司 | Improved NL2SQL method based on large language model |
CN116975654A (en) * | 2023-08-22 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Object interaction method, device, electronic equipment, storage medium and program product |
CN117746078A (en) * | 2024-02-21 | 2024-03-22 | 杭州觅睿科技股份有限公司 | Object detection method and system based on user-defined category |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||