CN116010575A - Dialogue generation method integrating basic knowledge and user information - Google Patents


Publication number
CN116010575A
Authority
CN
China
Legal status: Pending
Application number
CN202310058399.8A
Other languages
Chinese (zh)
Inventor
覃远年
黎桂成
吴冬雪
雷送强
宁波
卢玉胜
Current Assignee
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Application filed by Guilin University of Electronic Technology
Priority to CN202310058399.8A
Publication of CN116010575A

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue generation method integrating basic knowledge and user information, which comprises the following steps: constructing a user information data set and a human-machine dialogue data set, acquiring a basic knowledge data set from a big-data platform, and sending the data to an encoder and a decoder adopting a multi-input Transformer structure. Attention vectors are encoded and calculated separately for the historical dialogue, the user's personal information and the basic knowledge, and all attention vectors are then linearly fused, so that the language model considers the three parts of content more comprehensively and generates more reasonable replies. The method handles knowledge and role information together, and can improve the experience of human-machine dialogue and the quality of the generated replies.

Description

Dialogue generation method integrating basic knowledge and user information
Technical Field
The invention relates to the technical field of artificial intelligence natural language generation, in particular to a dialogue generation method integrating basic knowledge and user information.
Background
In recent years, human-machine intelligent dialogue technology has developed vigorously and is used in many fields, such as online customer service, online medical consultation, psychological counseling and automatic question answering, and artificial intelligence is widely used in dialogue generation to provide information to people. However, most people's impression of AI human-machine dialogue remains at the level of "Hey Siri"-style voice assistants: chatting with such systems often feels unnatural, they can only react to specific instructions, their replies are prefabricated, and they cannot meet the needs of diverse users. In human conversation, when people provide information to others, they consider the other party's background and interests, because different people's interests differ. Providing a large amount of knowledge without regard to the dialogue partner's own information may give the other party too much useless information, reducing the user experience. Against this background of human conversation, a dialogue system needs to combine prior knowledge with role information when giving a reply, so as to provide information to the user more effectively. Existing data sets and dialogue generation models rarely consider both knowledge and role information, and are limited in generating dialogue that integrates the two.
Summary of the invention:
The invention aims to overcome the defects of the prior art and provides a dialogue generation method integrating basic knowledge and user information. The method handles knowledge and role information together, and can improve the experience of human-machine dialogue and the quality of the generated replies.
The technical scheme for realizing the aim of the invention is as follows:
a dialogue generating method integrating basic knowledge and user information comprises the following steps:
1) Building a dialogue data set based on knowledge and role information: because the existing dialogue data sets do not combine knowledge and role information, a training data set having both knowledge and role information must be constructed, comprising: a data set D = [d_1, d_2, ..., d_n] based on the basic knowledge database DBpedia and a user characteristic information data set P = (p_1, p_2, ..., p_n); human-computer interaction is carried out using sentences with role-information labels to obtain dialogue data L = (l_1, l̂_1, ..., l_m, l̂_m), which comprise question sentences and reply sentences, where l_m represents a question sentence and l̂_m represents the corresponding reply sentence;
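As an illustration of the data set structure described in step 1), a single training sample could bundle knowledge snippets, a user profile and a labeled dialogue turn together. The field names and example contents below are hypothetical, not taken from the patent:

```python
# Hypothetical layout of one training sample combining the three data sources:
# basic knowledge d_i (from DBpedia), user characteristic information p_i,
# and a (question, reply) dialogue pair with role-information labels.
sample = {
    "knowledge": ["DBpedia: Guilin is a prefecture-level city in Guangxi, China."],
    "persona": ["I enjoy hiking.", "I am a university student."],
    "dialogue": [
        ("Where should I go on holiday?",
         "Since you enjoy hiking, the karst hills around Guilin would suit you well."),
    ],
}
```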
2) Obtaining a user information embedding vector and a basic knowledge embedding vector: a multi-input Transformer structure maps natural language into a vector space through word embedding and position encoding; word embedding is applied to the basic knowledge sequence D and the user characteristic information P to obtain the word embedding sequence vectors X(D) and X(P), and position encoding converts the word embedding vectors into sine and cosine representations containing various frequencies, capturing the relations among words in a high-dimensional vector space and yielding the word embedding sequence vectors with position information, X_embed(D) and X_embed(P):
X_embed = Embedding + PositionalEncoding,
Embedding(D) = D·W_d, Embedding(P) = P·W_p,
PositionalEncoding_(T,2i) = sin(T / 10000^(2i/d)),
PositionalEncoding_(T,2i+1) = cos(T / 10000^(2i/d)),
wherein the symbol (·) denotes word-embedding encoding; W_d and W_p are learnable parameters; PositionalEncoding(·) encodes the positions of the word embedding vector using sine and cosine functions; T is any position in the word embedding sequence and d the vector dimension; PositionalEncoding_(T,2i) is the position encoding of position T in dimension 2i and PositionalEncoding_(T,2i+1) that in dimension 2i+1; from the word embedding vectors X_embed(D) and X_embed(P) obtained after word embedding and position encoding, the hidden characteristic sequence C_E of the source text is obtained by the subsequent Encoder-layer calculation;
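The word embedding plus sinusoidal position encoding of step 2) can be sketched as follows. This is a minimal NumPy sketch; the toy dimensions and random embeddings are illustrative, not from the patent:

```python
import numpy as np

def positional_encoding(seq_len, d):
    """PE(T,2i) = sin(T / 10000^(2i/d)), PE(T,2i+1) = cos(T / 10000^(2i/d))."""
    pe = np.zeros((seq_len, d))
    T = np.arange(seq_len)[:, None]                  # positions 0..seq_len-1
    div = np.power(10000.0, np.arange(0, d, 2) / d)  # 10000^(2i/d)
    pe[:, 0::2] = np.sin(T / div)
    pe[:, 1::2] = np.cos(T / div)
    return pe

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))              # toy word-embedding sequence (6 tokens, d = 8)
X_embed = X + positional_encoding(6, 8)  # X_embed = Embedding + PositionalEncoding
```

Because each dimension oscillates at a different frequency, relative positions become linearly recoverable, which is why the patent can speak of "capturing relations among words" via these encodings.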
3) Attention calculation: the Encoder layer of the multi-input Transformer structure encodes the user information and basic knowledge embedding vectors and calculates the basic knowledge attention vector and the user information attention vector, specifically:
the Encoder layer consists of two sublayers, the first being a multi-headed self-attention layer (Multi-Headed Attention) and the other a feed-forward neural network (FFN) layer.
The multi-head attention vector Attention(·) is calculated according to the following formulas:
Q = Linear(X_embed) = X_embed·W_q,
K = Linear(X_embed) = X_embed·W_k,
V = Linear(X_embed) = X_embed·W_v,
Attention(Q, K, V) = softmax(Q·K^T / √d)·V,
wherein Linear(·) denotes a linear transformation; Q, K and V are the query, key and value vector sequences; W_q, W_k and W_v are different learnable parameter matrices with W_q, W_k, W_v ∈ R^d, R denoting the real numbers and d the dimension of the word embedding vector X_embed; softmax is the normalized exponential function and K^T the transpose of K. "Multi-head" means that multiple different parameter matrices W_i are used to learn multiple meaning expressions: X_embed is linearly projected onto each feature space i and multiplied by the three weights W_q, W_k and W_v.
The multi-head attention layer merges knowledge pooled from h different feature spaces of the same query matrix Q, key matrix K and value matrix V, specifically denoted by h sets of different parameter matrices W_i^Q, W_i^K, W_i^V, i ∈ (1, h). The h groups of transformed queries, keys and values are attention-pooled in parallel; the h pooled outputs head_i are spliced together and transformed through another learnable linear projection matrix W_h to produce the final basic knowledge multi-head attention output X_att(D):
X_att(D) = MultiHead(Q, K, V) = Concat(head_1; head_2; ...; head_h)·W_h,
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V),
wherein MultiHead denotes the multi-head attention function, Concat the cascade operation and head_i the attention output of the i-th subspace, i ∈ (1, h); W_i^Q is the parameter matrix of the query vector Q in the i-th feature space, W_i^K that of the key vector K and W_i^V that of the value vector V, i ∈ (1, h); W_h is a learnable linear projection matrix. The user information attention vector X_att(P) is calculated in the same way as the basic knowledge attention vector X_att(D);
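The scaled dot-product attention and multi-head splicing described above can be sketched in NumPy as follows, with randomly initialized matrices standing in for the learnable parameters W_i^Q, W_i^K, W_i^V and W_h:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q,K,V) = softmax(Q K^T / sqrt(d_k)) V."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head(X, h, rng):
    """Project X into h feature spaces with per-head W_i^Q/K/V, pool each head
    in parallel, splice the outputs and apply the final projection W_h."""
    seq, d = X.shape
    dk = d // h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, dk)) for _ in range(3))
        heads.append(attention(X @ Wq, X @ Wk, X @ Wv))  # head_i
    W_h = rng.normal(scale=d ** -0.5, size=(d, d))
    return np.concatenate(heads, axis=-1) @ W_h          # Concat(head_1;...;head_h) W_h

rng = np.random.default_rng(1)
X_embed = rng.normal(size=(5, 8))         # toy embedded sequence (5 tokens, d = 8)
X_att = multi_head(X_embed, h=2, rng=rng) # e.g. X_att(D) or X_att(P)
```

The same routine is run once on the basic-knowledge embedding and once on the user-information embedding, which is what makes the structure "multi-input".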
4) Acquiring the source text hidden representation: the knowledge attention vector and the role information attention vector are fused, and the hidden representation C_E of the source text is obtained through a feed-forward neural network, specifically:
the basic knowledge attention feature vector X_att(D) calculated in step 3) and the user information attention feature vector X_att(P) are linearly fused, and the encoder outputs the fused attention vector X_hidden:
X_hidden = Linear{X_att(D); X_att(P)},
where Linear{;} denotes the cascade operation; the output X_hidden passes through the feed-forward neural network FFN and is connected by a residual link, i.e. X_hidden and the output of the FFN are added element-wise, and the sum finally undergoes one LayerNorm layer normalization:
FFN = Linear{ReLU[Linear(X_hidden)]},
C_E = LayerNorm(X_hidden + FFN),
wherein LayerNorm denotes the layer normalization operation and FFN a two-layer fully connected network with ReLU as activation function; the encoder source text hidden representation C_E serves as input to the next module, the decoder;
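The encoder fusion of step 4), concatenating X_att(D) and X_att(P), applying the FFN, adding the residual link and normalizing, might be sketched like this. Projecting the concatenation back to width d is an assumed realization of the Linear{;} cascade, and all weights are random stand-ins:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and (near-)unit variance."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def encoder_fuse(X_att_D, X_att_P, rng):
    """X_hidden = Linear{X_att(D); X_att(P)}: concatenate the two attention outputs
    and project back to width d (assumed realization of the cascade operation);
    then FFN = Linear(ReLU(Linear(.))), residual add, and LayerNorm -> C_E."""
    seq, d = X_att_D.shape
    W_fuse = rng.normal(scale=(2 * d) ** -0.5, size=(2 * d, d))
    X_hidden = np.concatenate([X_att_D, X_att_P], axis=-1) @ W_fuse
    W1 = rng.normal(scale=d ** -0.5, size=(d, 4 * d))
    W2 = rng.normal(scale=(4 * d) ** -0.5, size=(4 * d, d))
    ffn = np.maximum(X_hidden @ W1, 0.0) @ W2   # two-layer FFN with ReLU
    return layer_norm(X_hidden + ffn)            # C_E

rng = np.random.default_rng(2)
X_D = rng.normal(size=(4, 8))   # toy basic-knowledge attention output
X_P = rng.normal(size=(4, 8))   # toy user-information attention output
C_E = encoder_fuse(X_D, X_P, rng)
```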
5) Optimizing the language generation model: the Decoder of the multi-input Transformer structure simultaneously calculates attention vectors for the basic knowledge, the user information and the current state; the three results are linearly fused into a fused attention feature representation, which is combined with the source text hidden representation C_E to acquire the context hidden representation C_D; the reply text sequence Y is generated from C_D, and a loss function is defined to optimize the language generation model, specifically:
the Decoder of the multi-input Transformer structure consists of three sublayers: the first is a masked multi-head self-attention layer, the second an encoder-decoder attention layer and the third a feed-forward neural network layer. The masked multi-head attention layer takes historical dialogue data as input and extracts the masked attention vector:
X_att^mask = MaskedMultiHead(X_embed(L)),
wherein X_att^mask denotes the masked attention vector and X_embed(L) the word-embedded and position-encoded dialogue data vector. Receiving the encoder output C_E as input, the encoder-decoder multi-head attention layer extracts the historical dialogue attention feature vector E; E, the basic knowledge attention feature vector X_att(D) and the user information attention feature vector X_att(P) are linearly fused, and the decoder outputs the hidden representation C_D:
E = Attention(X_att^mask·W^Q, C_E·W^K, C_E·W^V),
X_fuse = Linear{E; X_att(D); X_att(P)},
C_D = LayerNorm(X_fuse + FFN(X_fuse)),
where E denotes the historical dialogue attention feature vector; W^Q, W^K and W^V are learnable projection matrices; X_att(D) the basic knowledge attention feature vector; X_att(P) the user information attention feature vector; X_fuse the fused attention vector combining basic knowledge, user information and historical session information; Linear{;} the cascade operation; and C_D the decoder hidden representation obtained after the layer normalization operation. The feed-forward neural network part is the same as in the encoder, i.e. a two-layer fully connected network with residual link followed by a normalization layer. The decoder output C_D, i.e. the output of the last decoder base unit, is mapped via a linear transformation and the Softmax function to the probability distribution of the predicted word at the next time instant: given the encoder output C_E and the decoder output y_(t-1) at the previous time, the probability distribution P(Y) of the word at the current moment is predicted, and the probability representation P(Y) of the generated reply sequence text Y is expressed as follows:
Y = (y_1, y_2, ..., y_(t-1), y_t),
P(Y) = Softmax(FFN(C_D) + C_D),
wherein Y is the reply text generated by the model, Softmax the normalized exponential function and FFN a two-layer fully connected network with ReLU as activation function. Maximum likelihood estimation is adopted for the loss function, which aims at minimizing the negative log-likelihood of language generation modeling, user judgment and knowledge judgment; the overall total loss combines the maximum likelihood terms with weight parameters:
L_P = -(1/m)·Σ_(i=1)^m log p̂_i,
L_D = -(1/n)·Σ_(i=1)^n log d̂_i,
Loss = α·L_P + β·L_D,
wherein m and n denote the numbers of samples; L_P is the user-judgment loss function and L_D the knowledge-judgment loss function; p_i is the i-th user sample used by the model in prediction and d_i the i-th knowledge item used by the model in prediction; p̂_i and d̂_i are the model's predicted probabilities for the true user judgment and knowledge judgment; Loss is the joint loss function; and α and β are adjustable weight parameters.
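The joint loss Loss = α·L_P + β·L_D can be illustrated numerically. The one-hot targets and uniform predictions below are invented for the example, and the cross-entropy form of L_P and L_D is an assumption consistent with the stated negative log-likelihood objective:

```python
import numpy as np

def nll(targets_onehot, probs, eps=1e-12):
    """Mean negative log-likelihood of the true class under the predicted distribution."""
    return -np.mean(np.log(np.sum(targets_onehot * probs, axis=-1) + eps))

# Invented example data: one-hot targets and deliberately uniform predictions.
p_true = np.eye(4)[[0, 2, 1]]      # m = 3 user samples, 4 candidate users
p_hat = np.full((3, 4), 0.25)      # uniform predicted user distribution
d_true = np.eye(5)[[1, 1, 3, 0]]   # n = 4 knowledge samples, 5 candidate items
d_hat = np.full((4, 5), 0.2)       # uniform predicted knowledge distribution

L_P = nll(p_true, p_hat)           # user-judgment loss
L_D = nll(d_true, d_hat)           # knowledge-judgment loss
alpha, beta = 0.5, 0.5             # adjustable weight parameters
loss = alpha * L_P + beta * L_D    # joint loss
```

With uniform predictions the two terms reduce to log 4 and log 5, so tuning α and β directly trades off how strongly user judgment versus knowledge judgment drives training.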
The key processing procedure in the encoder in the technical scheme comprises the following steps: word embedding and position encoding are performed in sequence on the basic knowledge and on the historical dialogues carrying the user's personalized labels, to obtain the word embedding feature vectors with position information, X_embed(D) and X_embed(P); X_embed(D) and X_embed(P) are input to the encoder, which calculates multi-head self-attention vectors to obtain the corresponding characterizations; the hidden feature representation C_E of the source text is then obtained through a feed-forward fully connected layer, layer normalization and residual connection, and C_E serves as input to the decoder.
The key processing procedure in the decoder in the technical scheme comprises the following steps: the semantic vector sequence obtained from the dialogue data by word embedding and position encoding is input into the decoder; using a multi-head attention mechanism and the hidden state vector C_E output by the encoder, the attention vector E of the historical dialogue is obtained through multi-head attention calculation; attention vectors are calculated three times, for the user's personalized information, the basic knowledge and the historical dialogue state respectively, and the three results are linearly fused; the decoder hidden state representation C_D is extracted from the fused attention vector, and C_D yields the generated sentence Y through the feed-forward neural network layer and the Softmax operation. The dialogue generation model is optimized with the joint loss function, and the trained model is obtained when the joint loss reaches its minimum during iteration; the user inputs question content into the trained dialogue generation model and finally obtains a reply sentence fusing user information and basic knowledge.
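The masked self-attention used in the decoder's first sublayer, where each position attends only to earlier positions of the historical dialogue, can be sketched as a single-head NumPy toy; the patent's multi-head masked layer follows the same masking principle:

```python
import numpy as np

def masked_self_attention(X):
    """Masked (causal) self-attention: position t may only attend to positions <= t,
    implemented by setting future scores to a large negative value before softmax."""
    seq, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    scores[np.triu(np.ones((seq, seq), dtype=bool), k=1)] = -1e9  # mask future tokens
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ X

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 8))   # toy embedded historical-dialogue sequence
out = masked_self_attention(X)
```

The mask is what lets the decoder be trained on whole reply sequences while still predicting each word from only its predecessors.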
The technical scheme improves the conventional Transformer encoding and decoding layers, adding a multi-input attention structure to the original single-input structure to generate basic knowledge responses and user information responses; by adopting a basic knowledge base and user information portraits to generate customized replies, it helps to grasp the user's core requirements accurately.
According to the technical scheme, the language generation model is trained with an Encoder-Decoder structure based on the multi-input Transformer, so that it can reply with respect to a user portrait combined with basic knowledge, improving the quality of the language replies.
The method gives consideration to processing knowledge and role information, can improve the experience of man-machine conversation and improve the quality of language reply.
Description of the drawings:
FIG. 1 is a schematic flow chart of a method of an embodiment;
FIG. 2 is a schematic diagram of the Encoder in an embodiment;
fig. 3 is a schematic diagram of a Decoder in an embodiment.
The specific embodiment is as follows:
the present invention will now be further illustrated, but not limited, by the following figures and examples.
Examples:
referring to fig. 1, a dialogue generating method for fusing basic knowledge and user information includes the steps of:
1) Building a dialogue data set based on knowledge and role information: because the existing dialogue data sets do not integrate knowledge and role information, a training data set having both knowledge and role information must be constructed, comprising: a data set D = [d_1, d_2, ..., d_n] based on the basic knowledge database DBpedia and a user characteristic information data set P = (p_1, p_2, ..., p_n); human-computer interaction is carried out using sentences with role-information labels to obtain dialogue data L = (l_1, l̂_1, ..., l_m, l̂_m), which comprise question sentences and reply sentences, where l_m represents a question sentence and l̂_m represents the corresponding reply sentence;
2) Obtaining a user information embedding vector and a basic knowledge embedding vector: as shown in fig. 2, a multi-input Transformer structure maps natural language into a vector space through word embedding and position encoding; word embedding is applied to the basic knowledge sequence D and the user characteristic information P to obtain the word embedding sequence vectors X(D) and X(P), and position encoding converts the word embedding vectors into sine and cosine representations containing various frequencies, capturing the relations among words in a high-dimensional vector space and yielding the word embedding sequence vectors with position information, X_embed(D) and X_embed(P):
X_embed = Embedding + PositionalEncoding,
Embedding(D) = D·W_d, Embedding(P) = P·W_p,
PositionalEncoding_(T,2i) = sin(T / 10000^(2i/d)),
PositionalEncoding_(T,2i+1) = cos(T / 10000^(2i/d)),
wherein the symbol (·) denotes word-embedding encoding; W_d and W_p are learnable parameters; PositionalEncoding(·) encodes the positions of the word embedding vector using sine and cosine functions; T is any position in the word embedding sequence and d the vector dimension; PositionalEncoding_(T,2i) is the position encoding of position T in dimension 2i and PositionalEncoding_(T,2i+1) that in dimension 2i+1; from the word embedding vectors X_embed(D) and X_embed(P) obtained after word embedding and position encoding, the hidden characteristic sequence C_E of the source text is obtained by the subsequent Encoder-layer calculation;
3) Attention calculation: the Encoder layer of the multi-input Transformer structure encodes the user information and basic knowledge embedding vectors and calculates the basic knowledge attention vector and the user information attention vector, specifically:
the Encoder layer consists of two sublayers, the first being a multi-headed self-attention layer (Multi-Headed Attention) and the other a feed-forward neural network (FFN) layer.
The multi-head attention vector Attention(·) is calculated according to the following formulas:
Q = Linear(X_embed) = X_embed·W_q,
K = Linear(X_embed) = X_embed·W_k,
V = Linear(X_embed) = X_embed·W_v,
Attention(Q, K, V) = softmax(Q·K^T / √d)·V,
wherein Linear(·) denotes a linear transformation; Q, K and V are the query, key and value vector sequences; W_q, W_k and W_v are different learnable parameter matrices with W_q, W_k, W_v ∈ R^d, R denoting the real numbers and d the dimension of the word embedding vector X_embed; softmax is the normalized exponential function and K^T the transpose of K. "Multi-head" means that multiple different parameter matrices W_i are used to learn multiple meaning expressions: X_embed is linearly projected onto each feature space i and multiplied by the three weights W_q, W_k and W_v.
The multi-head attention layer merges knowledge pooled from h different feature spaces of the same query matrix Q, key matrix K and value matrix V, specifically denoted by h sets of different parameter matrices W_i^Q, W_i^K, W_i^V, i ∈ (1, h). The h groups of transformed queries, keys and values are attention-pooled in parallel; the h pooled outputs head_i are spliced together and transformed through another learnable linear projection matrix W_h to produce the final basic knowledge multi-head attention output X_att(D):
X_att(D) = MultiHead(Q, K, V) = Concat(head_1; head_2; ...; head_h)·W_h,
head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V),
wherein MultiHead denotes the multi-head attention function, Concat the cascade operation and head_i the attention output of the i-th subspace, i ∈ (1, h); W_i^Q is the parameter matrix of the query vector Q in the i-th feature space, W_i^K that of the key vector K and W_i^V that of the value vector V, i ∈ (1, h); W_h is a learnable linear projection matrix. The user information attention vector X_att(P) is calculated in the same way as the basic knowledge attention vector X_att(D);
4) Acquiring the source text hidden representation: the knowledge attention vector and the role information attention vector are fused, and the hidden representation C_E of the source text is obtained through a feed-forward neural network, specifically:
the basic knowledge attention feature vector X_att(D) calculated in step 3) and the user information attention feature vector X_att(P) are linearly fused, and the encoder outputs the fused attention vector X_hidden:
X_hidden = Linear{X_att(D); X_att(P)},
where Linear{;} denotes the cascade operation; the output X_hidden passes through the feed-forward neural network FFN and is connected by a residual link, i.e. X_hidden and the output of the FFN are added element-wise, and the sum finally undergoes one LayerNorm layer normalization:
FFN = Linear{ReLU[Linear(X_hidden)]},
C_E = LayerNorm(X_hidden + FFN),
wherein LayerNorm denotes the layer normalization operation and FFN a two-layer fully connected network with ReLU as activation function; the encoder source text hidden representation C_E serves as input to the next module, the decoder;
5) Optimizing the language generation model: the Decoder of the multi-input Transformer structure simultaneously calculates attention vectors for the basic knowledge, the user information and the current state; the three results are linearly fused into a fused attention feature representation, which is combined with the source text hidden representation C_E to acquire the context hidden representation C_D; the reply text sequence Y is generated from C_D, and a loss function is defined to optimize the language generation model, as shown in fig. 3, specifically:
the Decoder of the multi-input Transformer structure consists of three sublayers: the first is a masked multi-head self-attention layer, the second an encoder-decoder attention layer and the third a feed-forward neural network layer. The masked multi-head attention layer takes historical dialogue data as input and extracts the masked attention vector:
X_att^mask = MaskedMultiHead(X_embed(L)),
wherein X_att^mask denotes the masked attention vector and X_embed(L) the word-embedded and position-encoded dialogue data vector. Receiving the encoder output C_E as input, the encoder-decoder multi-head attention layer extracts the historical dialogue attention feature vector E; E, the basic knowledge attention feature vector X_att(D) and the user information attention feature vector X_att(P) are linearly fused, and the decoder outputs the hidden representation C_D:
E = Attention(X_att^mask·W^Q, C_E·W^K, C_E·W^V),
X_fuse = Linear{E; X_att(D); X_att(P)},
C_D = LayerNorm(X_fuse + FFN(X_fuse)),
where E denotes the historical dialogue attention feature vector; W^Q, W^K and W^V are learnable projection matrices; X_att(D) the basic knowledge attention feature vector; X_att(P) the user information attention feature vector; X_fuse the fused attention vector combining basic knowledge, user information and historical session information; Linear{;} the cascade operation; and C_D the decoder hidden representation obtained after the layer normalization operation. The feed-forward neural network part is the same as in the encoder, i.e. a two-layer fully connected network with residual link followed by a normalization layer. The decoder output C_D, i.e. the output of the last decoder base unit, is mapped via a linear transformation and the Softmax function to the probability distribution of the predicted word at the next time instant: given the encoder output C_E and the decoder output y_(t-1) at the previous time, the probability distribution P(Y) of the word at the current moment is predicted, and the probability representation P(Y) of the generated reply sequence text Y is expressed as follows:
Y = (y_1, y_2, ..., y_(t-1), y_t),
P(Y) = Softmax(FFN(C_D) + C_D),
wherein Y is the reply text generated by the model, Softmax the normalized exponential function and FFN a two-layer fully connected network with ReLU as activation function. Maximum likelihood estimation is adopted for the loss function, which aims at minimizing the negative log-likelihood of language generation modeling, user judgment and knowledge judgment; the overall total loss combines the maximum likelihood terms with weight parameters:
L_P = -(1/m)·Σ_(i=1)^m log p̂_i,
L_D = -(1/n)·Σ_(i=1)^n log d̂_i,
Loss = α·L_P + β·L_D,
wherein m and n denote the numbers of samples; L_P is the user-judgment loss function and L_D the knowledge-judgment loss function; p_i is the i-th user sample used by the model in prediction and d_i the i-th knowledge item used by the model in prediction; p̂_i and d̂_i are the model's predicted probabilities for the true user judgment and knowledge judgment; Loss is the joint loss function; and α and β are adjustable weight parameters.
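The final mapping P(Y) = Softmax(FFN(C_D) + C_D) from the decoder hidden state to a next-word distribution might look like this. The separate vocabulary projection W_vocab is an assumption, since the hidden state must be mapped to vocabulary size before the Softmax; all weights here are random stand-ins:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def next_word_distribution(C_D, W1, W2, W_vocab):
    """P(Y) = Softmax(FFN(C_D) + C_D) projected to the vocabulary.
    W_vocab is a hypothetical output projection from hidden width to vocab size."""
    ffn = np.maximum(C_D @ W1, 0.0) @ W2      # two-layer FFN with ReLU
    return softmax((ffn + C_D) @ W_vocab)     # residual add, project, normalize

rng = np.random.default_rng(4)
d, vocab = 8, 20
C_D = rng.normal(size=(3, d))                 # toy decoder hidden states (3 time steps)
W1 = rng.normal(scale=d ** -0.5, size=(d, 4 * d))
W2 = rng.normal(scale=(4 * d) ** -0.5, size=(4 * d, d))
W_vocab = rng.normal(scale=d ** -0.5, size=(d, vocab))
P_Y = next_word_distribution(C_D, W1, W2, W_vocab)
```

Each row of P_Y is a valid probability distribution over the vocabulary, from which the next word y_t can be taken greedily or sampled.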

Claims (1)

1. A dialogue generation method integrating basic knowledge and user information is characterized by comprising the following steps:
1) Building a dialogue data set based on knowledge and role information: constructing a training data set with knowledge and role information at the same time, comprising: a data set D = [d_1, d_2, ..., d_n] based on the basic knowledge database DBpedia and a user characteristic information data set P = (p_1, p_2, ..., p_n); human-computer interaction is carried out using sentences with role-information labels to obtain dialogue data L = (l_1, l̂_1, ..., l_m, l̂_m), which comprise question sentences and reply sentences, where l_m represents a question sentence and l̂_m represents the corresponding reply sentence;
2) Obtaining a user information embedding vector and a basic knowledge embedding vector: a multi-input Transformer structure maps natural language into a vector space through word embedding and position encoding; word embedding is applied to the basic knowledge sequence D and the user characteristic information P to obtain the word embedding sequence vectors X(D) and X(P), and position encoding converts the word embedding vectors into sine and cosine representations containing various frequencies, capturing the relations among words in a high-dimensional vector space and yielding the word embedding sequence vectors with position information, X_embed(D) and X_embed(P):
X_embed = Embedding + PositionalEncoding,
Embedding(D) = D·W_d, Embedding(P) = P·W_p,
PositionalEncoding_(T,2i) = sin(T / 10000^(2i/d)),
PositionalEncoding_(T,2i+1) = cos(T / 10000^(2i/d)),
wherein the symbol (·) denotes word-embedding encoding; W_d and W_p are learnable parameters; PositionalEncoding(·) encodes the positions of the word embedding vector using sine and cosine functions; T is any position in the word embedding sequence and d the vector dimension; PositionalEncoding_(T,2i) is the position encoding of position T in dimension 2i and PositionalEncoding_(T,2i+1) that in dimension 2i+1; from the word embedding vectors X_embed(D) and X_embed(P) obtained after word embedding and position encoding, the hidden characteristic sequence C_E of the source text is obtained by calculation of the Encoder layer;
3) Encoder calculation: the Encoder layer of the multi-input Transformer structure encodes the user information and basic knowledge embedding vectors and computes the basic knowledge attention vector and the user information attention vector, specifically:
the Encoder layer consists of two sublayers: the first is a multi-head self-attention layer (Multi-Headed Attention), the other is a feed-forward neural network (FFN) layer.
The multi-head attention vector Attention(·) is calculated according to the following formulas:

Q = Linear(X_embed) = X_embed · W_q,

K = Linear(X_embed) = X_embed · W_k,

V = Linear(X_embed) = X_embed · W_v,

Attention(Q, K, V) = softmax(Q K^T / √d) · V,
wherein Linear(·) represents a linear transformation operation; Q, K and V represent the query vector sequence, key vector sequence and value vector sequence respectively; W_q, W_k, W_v are different learnable parameter matrices with W_q, W_k, W_v ∈ R^d, where R represents the real numbers and d is the dimension of the word embedding vector X_embed; softmax represents the normalized exponential function; and K^T is the transpose of K. "Multi-head" means that multiple different parameter matrices W_i linearly project X_embed into each feature space i, multiplying by the three weights W_q, W_k, W_v respectively. The multi-head attention layer merges knowledge from the same attention pooling, derived from h different feature spaces of the same query matrix Q, key matrix K and value matrix V, specifically denoted as h sets of different parameter matrices

{W_i^Q, W_i^K, W_i^V}, i ∈ (1, h).

The h groups of transformed queries, keys and values are attention-pooled in parallel; the h attention-pooled outputs head_i are spliced together and transformed by a learnable linear projection matrix W_h to generate the final basic knowledge multi-head attention output X_att(D):

X_att(D) = MultiHead(Q, K, V) = Concat(head_1; head_2; ...; head_h) W_h,

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V),

wherein MultiHead represents the multi-head attention function, Concat represents the concatenation operation, head_i represents the attention output of the ith subspace with i ∈ (1, h), W_i^Q represents the parameter matrix of the query vector Q in the ith feature space, W_i^K represents the parameter matrix of the key vector K in the ith feature space, W_i^V represents the parameter matrix of the value vector V in the ith feature space, i ∈ (1, h), and W_h represents a learnable linear projection matrix. The user information attention vector X_att(P) is calculated in the same way as the basic knowledge attention vector X_att(D);
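A minimal NumPy sketch of the scaled dot-product and multi-head computation above; the head count, dimensions and random weights are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, d):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def multi_head(X, h, Wq, Wk, Wv, Wh):
    heads = []
    for i in range(h):
        # head_i = Attention(X Wq_i, X Wk_i, X Wv_i)
        dk = Wq[i].shape[1]
        heads.append(attention(X @ Wq[i], X @ Wk[i], X @ Wv[i], dk))
    # Concat(head_1; ...; head_h) W_h
    return np.concatenate(heads, axis=-1) @ Wh

rng = np.random.default_rng(0)
n, d, h = 5, 16, 4
dk = d // h
X = rng.normal(size=(n, d))
Wq = rng.normal(size=(h, d, dk))
Wk = rng.normal(size=(h, d, dk))
Wv = rng.normal(size=(h, d, dk))
Wh = rng.normal(size=(d, d))
X_att = multi_head(X, h, Wq, Wk, Wv, Wh)  # e.g. X_att(D) or X_att(P)
```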
4) Acquiring the source text hidden representation: the knowledge attention vector and the character information attention vector are fused, and the hidden representation C_E of the source text is obtained through a feed-forward neural network, specifically:
the basic knowledge attention feature vector X_att(D) calculated in step 3) is linearly fused with the user information attention feature vector X_att(P) to output the encoder fused attention vector X_hidden:

X_hidden = Linear{X_att(D); X_att(P)},

where Linear{;} represents concatenation followed by a linear transformation. The output X_hidden passes through the feed-forward neural network FFN and is then connected with a residual link, i.e., X_hidden is added element-wise to the output of the FFN, and the sum is finally normalized once by LayerNorm; the calculation formulas are:

FFN = Linear{ReLU[Linear(X_hidden)]},

C_E = LayerNorm(X_hidden + FFN),

wherein LayerNorm represents the layer normalization operation and FFN represents a double-layer fully-connected network with ReLU as the activation function. The encoder source text hidden representation C_E serves as input to the next module, the decoder;
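The fusion, feed-forward and residual normalization of step 4) can be sketched as follows; the fusion weight W_fuse and the FFN hidden width are assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each row to zero mean / unit variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn(x, W1, b1, W2, b2):
    # FFN = Linear{ReLU[Linear(x)]}
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

rng = np.random.default_rng(0)
n, d = 5, 16
X_att_D = rng.normal(size=(n, d))     # knowledge attention feature vector
X_att_P = rng.normal(size=(n, d))     # user-information attention feature vector
W_fuse = rng.normal(size=(2 * d, d))  # assumed linear-fusion weight
# X_hidden = Linear{X_att(D); X_att(P)}
X_hidden = np.concatenate([X_att_D, X_att_P], axis=-1) @ W_fuse
W1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)
# C_E = LayerNorm(X_hidden + FFN)
C_E = layer_norm(X_hidden + ffn(X_hidden, W1, b1, W2, b2))
```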
5) Optimizing the language generation model: the Decoder of the multi-input Transformer structure simultaneously calculates attention vectors over the basic knowledge, the user information and the current state; the three results are linearly fused to obtain a fused attention feature representation, which is combined with the source text hidden representation C_E to acquire the context hidden representation C_D; a reply text sequence Y is generated from C_D, and a loss function is defined to optimize the language generation model, as shown in fig. 3, specifically:
the Decoder of the multi-input Transformer structure consists of three sublayers: the first is a masked multi-head self-attention layer, the second is an encoder-decoder attention layer, and the third is a feed-forward neural network layer. The masked multi-head attention layer takes historical dialogue data as input to extract the masked attention vector:

X_mask = MaskedMultiHead(X_embed(L)),

wherein X_mask represents the masked attention vector and X_embed(L) represents the word-embedded and position-encoded dialogue data vector. Taking the encoder output C_E as input to the encoder-decoder multi-head attention layer, the historical dialogue attention feature vector E is extracted; E, the basic knowledge attention feature vector X_att(D) and the user information attention feature vector X_att(P) are linearly fused to output the decoder hidden representation C_D, calculated as:

E = MultiHead(X_mask, C_E, C_E) W^E,

X_fuse = Linear{E; X_att(D); X_att(P)},

C_D = LayerNorm(X_fuse),

wherein E represents the historical dialogue attention feature vector, W^E represents a learnable projection matrix, X_att(D) represents the basic knowledge attention feature vector, X_att(P) represents the user information attention feature vector, X_fuse represents the fused attention vector combining basic knowledge, user information and historical dialogue information, Linear{;} represents the concatenation operation, and C_D represents the decoder hidden representation obtained after the layer normalization operation. The feed-forward neural network layer is the same as in the encoder, i.e., a two-layer fully-connected network with a residual link followed by a normalization layer. The decoder output C_D, i.e., the output of the last decoder base unit, is mapped via a linear transformation and the Softmax function to the probability distribution of the predicted word at the next time instant; given the encoder output C_E and the decoder output y_{t-1} at the previous time step, the probability distribution of the word at the current moment is predicted, and the functional expression of the probability representation P(Y) of the generated reply sequence text Y is as follows:
Y = (y_1, y_2, ..., y_{t-1}, y_t),

P(Y) = Softmax(FFN(C_D) + C_D),
wherein Y represents the reply text generated by the model, Softmax represents the normalized exponential function, and FFN represents a double-layer fully-connected network with ReLU as the activation function. Maximum likelihood estimation is adopted as the loss function, and the total loss function combines the maximum likelihood estimation terms with weight parameters:

L_P = -Σ_{i=1}^{m} p_i log p̂_i,

L_D = -Σ_{i=1}^{n} d_i log d̂_i,

Loss = α · L_P + β · L_D,

wherein m and n represent the numbers of samples; L_P represents the user-decision loss function and L_D the knowledge-decision loss function; p_i represents the ith user sample used by the model in prediction and d_i represents the ith knowledge item used by the model in prediction; p̂_i and d̂_i represent the model's prediction results for the user judgment and the knowledge judgment respectively; Loss represents the joint loss function; and α and β represent adjustable weight parameters.
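The weighted joint loss Loss = α·L_P + β·L_D can be sketched with toy prediction probabilities; the sample values and the choice of α, β are illustrative:

```python
import numpy as np

def nll(targets, probs):
    # mean negative log-likelihood of the target class under each predicted
    # distribution (a standard maximum-likelihood / cross-entropy loss)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

# toy predicted distributions for the user (p_i) and knowledge (d_i) decisions
user_probs = np.array([[0.7, 0.3], [0.2, 0.8]])
know_probs = np.array([[0.9, 0.1], [0.4, 0.6]])
L_P = nll(np.array([0, 1]), user_probs)  # user-decision loss
L_D = nll(np.array([0, 1]), know_probs)  # knowledge-decision loss
alpha, beta = 0.5, 0.5                   # adjustable weight parameters (example)
Loss = alpha * L_P + beta * L_D
```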
CN202310058399.8A 2023-01-19 2023-01-19 Dialogue generation method integrating basic knowledge and user information Pending CN116010575A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310058399.8A CN116010575A (en) 2023-01-19 2023-01-19 Dialogue generation method integrating basic knowledge and user information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310058399.8A CN116010575A (en) 2023-01-19 2023-01-19 Dialogue generation method integrating basic knowledge and user information

Publications (1)

Publication Number Publication Date
CN116010575A true CN116010575A (en) 2023-04-25

Family

ID=86035719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310058399.8A Pending CN116010575A (en) 2023-01-19 2023-01-19 Dialogue generation method integrating basic knowledge and user information

Country Status (1)

Country Link
CN (1) CN116010575A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244419A (en) * 2023-05-12 2023-06-09 苏州大学 Knowledge enhancement dialogue generation method and system based on character attribute
CN116244419B (en) * 2023-05-12 2023-08-25 苏州大学 Knowledge enhancement dialogue generation method and system based on character attribute
CN116502069A (en) * 2023-06-25 2023-07-28 四川大学 Haptic time sequence signal identification method based on deep learning
CN116502069B (en) * 2023-06-25 2023-09-12 四川大学 Haptic time sequence signal identification method based on deep learning
CN116759077A (en) * 2023-08-18 2023-09-15 北方健康医疗大数据科技有限公司 Medical dialogue intention recognition method based on intelligent agent
CN116975654A (en) * 2023-08-22 2023-10-31 腾讯科技(深圳)有限公司 Object interaction method, device, electronic equipment, storage medium and program product
CN116975654B (en) * 2023-08-22 2024-01-05 腾讯科技(深圳)有限公司 Object interaction method and device, electronic equipment and storage medium
CN116821168A (en) * 2023-08-24 2023-09-29 吉奥时空信息技术股份有限公司 Improved NL2SQL method based on large language model
CN116821168B (en) * 2023-08-24 2024-01-23 吉奥时空信息技术股份有限公司 Improved NL2SQL method based on large language model
CN117746078A (en) * 2024-02-21 2024-03-22 杭州觅睿科技股份有限公司 Object detection method and system based on user-defined category

Similar Documents

Publication Publication Date Title
CN116010575A (en) Dialogue generation method integrating basic knowledge and user information
CN109670035B (en) Text abstract generating method
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
CN110297887B (en) Service robot personalized dialogue system and method based on cloud platform
CN108595436B (en) Method and system for generating emotional dialogue content and storage medium
CN112259100B (en) Speech recognition method, training method of related model, related equipment and device
CN111274375B (en) Multi-turn dialogue method and system based on bidirectional GRU network
CN112115687A (en) Problem generation method combining triples and entity types in knowledge base
CN115964467A (en) Visual situation fused rich semantic dialogue generation method
CN113918813A (en) Method and device for recommending posts based on external knowledge in chat record form
CN117234341B (en) Virtual reality man-machine interaction method and system based on artificial intelligence
CN113223509A (en) Fuzzy statement identification method and system applied to multi-person mixed scene
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN116150338A (en) Intelligent customer service method and system based on multi-round dialogue
CN110929476B (en) Task type multi-round dialogue model construction method based on mixed granularity attention mechanism
CN112163080A (en) Generation type dialogue system based on multi-round emotion analysis
CN114386426B (en) Gold medal speaking skill recommendation method and device based on multivariate semantic fusion
CN114281954A (en) Multi-round dialog reply generation system and method based on relational graph attention network
CN117251562A (en) Text abstract generation method based on fact consistency enhancement
CN116759077A (en) Medical dialogue intention recognition method based on intelligent agent
CN117149977A (en) Intelligent collecting robot based on robot flow automation
CN111414466A (en) Multi-round dialogue modeling method based on depth model fusion
CN116312539A (en) Chinese dialogue round correction method and system based on large model
Yuan et al. A human–machine interaction scheme based on background knowledge in 6G-enabled IoT environment
CN115795010A (en) External knowledge assisted multi-factor hierarchical modeling common-situation dialogue generation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination