CN112148863B

CN112148863B - Abstractive dialogue summarization method incorporating commonsense knowledge

Info

Publication number: CN112148863B
Application number: CN202011104023.9A
Authority: CN
Inventors: 冯骁骋; 冯夏冲; 秦兵; 刘挺
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2020-10-15
Filing date: 2020-10-15
Publication date: 2022-07-01
Anticipated expiration: 2040-10-15
Also published as: CN112148863A

Abstract

An abstractive dialogue summarization method incorporating commonsense knowledge, belonging to the field of natural language processing. The invention addresses the inaccurate and insufficiently abstractive summaries produced by existing abstractive dialogue summarization methods, which do not use commonsense knowledge. The method comprises: step one, acquiring the commonsense knowledge base ConceptNet and the dialogue summarization dataset SAMSum; step two, using the acquired ConceptNet to introduce tuple knowledge into SAMSum and constructing a heterogeneous dialogue graph; step three, constructing a dialogue heterogeneous neural network model; and step four, training the model constructed in step three and generating the final summary of a dialogue with the trained model. The invention is applied to dialogue summary generation.

Description

Abstractive dialogue summarization method incorporating commonsense knowledge

Technical Field

The invention relates to the field of natural language processing, and in particular to an abstractive dialogue summarization method incorporating commonsense knowledge.

Background

Dialogue summarization is a branch of automatic text summarization (Automatic Summarization)^[1] (title: Constructing literature abstracts by computer: techniques and prospects; author: Chris D. Paice; year: 1990; published in Information Processing & Management) within natural language processing: given the transcript of a multi-party conversation, the task is to generate a short text that contains the key information of the conversation. FIG. 1 shows a multi-party conversation and its corresponding reference summary.

For dialogue summarization, existing work has mostly focused on the abstractive approach, which allows the final summary to contain novel words and phrases that do not appear in the source text. For example, Liu et al.^[2] (title: Automatic dialogue summary generation for customer service; author: Chunyi Liu; year: 2019; published in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining) adopt a multi-step generation approach for the customer-service dialogue summarization task. Liu et al.^[3] (author: Zhengyuan Liu; year: 2019; arXiv preprint) incorporate topic information into dialogue modeling for the doctor-patient dialogue summarization task. Ganesh et al.^[4] (title: Abstractive summarization of spoken and written conversation; author: Prakhar Ganesh; year: 2019; arXiv preprint) use the discourse structure of the dialogue as rules to remove useless sentences and then generate the summary. Recent work on dialogue response generation^[5] (title: Commonsense knowledge aware conversation generation with graph attention; author: Hao Zhou; year: 2018; IJCAI) and dialogue context modeling^[6] (title: Multi-task pretraining for multi-role dialogue representation learning; author: Tianyi Wang; year: 2020; AAAI) shows that, although neural-network-based models have strong learning capacity, integrating explicit commonsense knowledge still helps a model complete its task.

Existing dialogue summarization systems, however, ignore commonsense knowledge. On the one hand, the model cannot understand the dialogue text well and generates low-quality summaries; on the other hand, the lack of commonsense knowledge makes the generated summaries less abstractive. A dialogue summarization model that incorporates commonsense knowledge can better understand the high-level meaning behind a dialogue, and the knowledge can also serve as a bridge between incoherent sentences to help the model understand the conversation.

Commonsense knowledge can help a dialogue summarization system generate higher-quality summaries. In FIG. 1, for example, the phrases "pick up" and "bad ride" indicate that one speaker hopes to get a lift from the other (Bob and Tom in the figure); introducing the explicit commonsense knowledge "taking a free ride" (getting a lift) helps generate a better dialogue summary. After the commonsense knowledge is incorporated, three types of data (speakers, sentences and commonsense knowledge) need to be modeled; a heterogeneous graph neural network can model all three and generate the final summary.

Disclosure of Invention

The invention aims to solve the problem that existing abstractive dialogue summarization methods do not use commonsense knowledge, so the generated summaries are inaccurate and insufficiently abstractive. To this end, an abstractive dialogue summarization method incorporating commonsense knowledge is proposed.

The abstractive dialogue summarization method incorporating commonsense knowledge comprises the following steps:

step one, acquiring the commonsense knowledge base ConceptNet and the dialogue summarization dataset SAMSum; the commonsense knowledge contained in ConceptNet exists in the form of tuples, that is, a piece of tuple knowledge is expressed as:

R = (h, r, t, w),

wherein R represents a piece of tuple knowledge; h represents the head entity; r represents the relation; t represents the tail entity; w represents a weight indicating the confidence of the relation; the knowledge R states that the head entity h and the tail entity t have the relation r, with weight w;

the dialogue summarization dataset SAMSum is divided into three parts: training, development and test;

step two, using the acquired commonsense knowledge base ConceptNet to introduce tuple knowledge into the dialogue summarization dataset SAMSum, and constructing a heterogeneous dialogue graph;

step three, constructing a dialogue heterogeneous neural network model; the dialogue heterogeneous neural network model comprises a node encoder, a graph encoder and a decoder;

step 3.1, constructing the node encoder, and obtaining the initial node representations and the initial word representations with a bidirectional long short-term memory network (Bi-LSTM);

step 3.2, constructing the graph encoder, updating the node representations with a heterogeneous graph neural network, adding node position encoding information, and updating the word representations;

step 3.3, constructing the decoder;

step four, training the dialogue heterogeneous neural network model constructed in step three, and generating the final summary of a dialogue with the trained dialogue heterogeneous neural network model.

Advantageous effects

A dialogue summarization model that incorporates commonsense knowledge can better understand the high-level meaning behind a dialogue;

the incorporated commonsense knowledge can serve as a bridge between incoherent sentences and thereby help the model understand the dialogue;

introducing commonsense knowledge helps the model generate more abstractive and more general summaries;

the invention introduces commonsense knowledge into the dialogue summarization task, models the three types of data in a dialogue (speakers, sentences and commonsense knowledge) as a heterogeneous dialogue graph, and models the whole heterogeneous dialogue graph with a heterogeneous graph neural network.

The whole model adopts a graph-to-sequence framework to generate the final dialogue summary, which solves the problem that existing abstractive dialogue summarization methods ignore commonsense knowledge. In experiments, the method generates more abstractive and more accurate summaries that better cover the dialogue content, which demonstrates its effectiveness, and it achieves better results than existing methods on the ROUGE evaluation metric.

ROUGE is a recall-based similarity measure: a set of metrics for evaluating automatic summarization and machine translation that examines the adequacy and fidelity of the output; the higher the value, the better. ROUGE-1, ROUGE-2 and ROUGE-L are computed from unigrams, bigrams and the longest common subsequence, respectively.
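As a rough illustration of the unigram, bigram and longest-common-subsequence overlaps behind ROUGE-1, ROUGE-2 and ROUGE-L, the following Python sketch computes recall-style scores. It is a simplified sketch and not the official ROUGE toolkit; the function names and the lowercase whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(candidate.lower().split(), ref) / len(ref)
```

Published ROUGE figures are normally produced with a standard package that also reports precision and F-scores, so values from this sketch are only indicative.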

Drawings

FIG. 1 shows a multi-party conversation and its corresponding reference summary;

FIG. 2 is an example of a dialogue-summary pair from the SAMSum dialogue summarization dataset;

FIG. 3 is an example of a SAMSum dataset dialogue-summary pair;

FIG. 4 shows the related knowledge tuples obtained from ConceptNet;

FIG. 5 is a sentence-knowledge graph constructed in accordance with the present invention;

FIG. 6 is a speaker-sentence graph constructed in accordance with the present invention;

FIG. 7 is a heterogeneous dialogue graph constructed in accordance with the present invention;

FIG. 8 is a schematic diagram of the model of the invention, in which (a) is the heterogeneous dialogue graph construction, (b) the node encoder, (c) the graph encoder, and (d) the decoder.

Detailed Description

Embodiment one: this embodiment is an abstractive dialogue summarization method incorporating commonsense knowledge, comprising the following steps:

step one: acquiring the large-scale commonsense knowledge base ConceptNet and the dialogue summarization dataset SAMSum.

Step 1.1, acquiring the large-scale commonsense knowledge base ConceptNet:

the large-scale commonsense knowledge base ConceptNet is obtained from http://conceptnet.io; the commonsense knowledge it contains exists in the form of tuples, that is, a piece of tuple knowledge can be expressed as:

R = (h, r, t, w),

wherein R represents a piece of tuple knowledge; h represents the head entity; r represents the relation; t represents the tail entity; w represents a weight indicating the confidence of the relation; the knowledge R states that the head entity h and the tail entity t have the relation r, with weight w; for example, R = (call, associated, contact, 10) means that "call" and "contact" have the relation "associated" with weight 10. Commonsense knowledge in tuple form is available on a large scale from this website.
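A piece of tuple knowledge R = (h, r, t, w) maps naturally onto a small record type. The sketch below is only an illustration: the class name `KnowledgeTuple` and its field names are hypothetical and do not come from ConceptNet or from the invention.

```python
from typing import NamedTuple


class KnowledgeTuple(NamedTuple):
    """One piece of tuple knowledge R = (h, r, t, w)."""
    head: str        # h: head entity
    relation: str    # r: relation between head and tail
    tail: str        # t: tail entity
    weight: float    # w: confidence of the relation


# The example tuple from the text: "call" is associated with "contact", weight 10.
example = KnowledgeTuple("call", "associated", "contact", 10.0)
h, r, t, w = example  # a NamedTuple unpacks in (h, r, t, w) order
```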

Step 1.2, acquiring the dialogue summarization dataset SAMSum:

the dialogue summarization dataset SAMSum can be obtained from https://arxiv.org/abs/1911.12237; it is divided into training, development and test parts containing 14732, 818 and 819 dialogues respectively, which is a fixed, standard split; the dataset mainly covers topics such as everyday chat among participants, and each dialogue has a corresponding reference summary; FIG. 2 shows an example from this dataset;

step two: using the acquired large-scale commonsense knowledge base ConceptNet to introduce tuple knowledge into the dialogue summarization dataset SAMSum, and constructing a heterogeneous dialogue graph;

step three: constructing a dialogue heterogeneous neural network model; the model consists of three parts: a node encoder, a graph encoder and a decoder;

step 3.1, constructing the node encoder, and obtaining the initial node representations and the initial word representations with a bidirectional long short-term memory network (Bi-LSTM) (both are updated in step 3.2);

step 3.2, constructing the graph encoder, updating the node representations with a heterogeneous graph neural network, adding node position encoding information, and updating the word representations;

step 3.3, constructing the decoder, and generating the summary with a unidirectional long short-term memory network (LSTM) decoder;

step four: training the dialogue heterogeneous neural network model constructed in step three, and generating the final summary of a dialogue with the trained dialogue heterogeneous neural network model.

Embodiment two: this embodiment differs from embodiment one in that, in step two, the acquired large-scale commonsense knowledge base ConceptNet is used to introduce tuple knowledge into the dialogue summarization dataset SAMSum and a heterogeneous dialogue graph is constructed; the specific process is as follows:

step 2.1, acquiring dialogue-related knowledge: for a dialogue, the method first retrieves a series of related tuple knowledge from ConceptNet according to the words in the dialogue, then removes noisy knowledge, and finally obtains the set of tuple knowledge related to the given dialogue, as shown in FIG. 4;

step 2.2, constructing a sentence-knowledge graph:

for the related tuple knowledge acquired in step 2.1, suppose there are a sentence A and a sentence B, a word a belonging to sentence A and a word b belonging to sentence B; if the related knowledge of a and b shares the same tail entity, sentence A and sentence B are both connected to that tail entity; a sentence-knowledge graph is thereby obtained; for example, in FIG. 5, sentence A is "do you have Betty number" and sentence B is "Lao called her last time"; the words a and b are "number" and "call", respectively; the related knowledge tuples (number, located at, phonebook) and (call, related, phonebook) exist, so sentences A and B are connected to the entity "phonebook";

the commonsense knowledge obtained in this way is redundant and repetitive, so the invention also simplifies the tuple knowledge, which allows cleaner and higher-quality commonsense knowledge to be introduced:

(1) if sentences A and B are connected to several entities, the entity whose relations have the highest average weight is selected; for example, in FIG. 5 the average weight of "phonebook" is greater than that of "date", so "phonebook" is selected;

(2) if different sentences are connected to the same entity, the copies of that entity are merged into a single entity; for example, in FIG. 5 the two "contact" entities are merged into one "contact" entity;
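The sentence-knowledge graph construction and the two simplification rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the invention's implementation: the input format (a mapping from a sentence id to the tuples retrieved for its words), the function name, and the choice to keep the largest weight when a sentence reaches the same entity through several tuples are all hypothetical.

```python
from itertools import combinations


def build_sentence_knowledge_graph(knowledge_by_sentence):
    """Connect sentences that share a tail entity.

    knowledge_by_sentence maps a sentence id to (head, relation, tail, weight)
    tuples retrieved for words of that sentence (hypothetical input format).
    Returns a set of (sentence id, entity) edges.
    """
    # Weight with which each sentence reaches each tail entity
    # (assumption: keep the largest weight if there are several tuples).
    tails = {}
    for sent, tuples in knowledge_by_sentence.items():
        best = {}
        for _head, _rel, tail, weight in tuples:
            best[tail] = max(weight, best.get(tail, 0.0))
        tails[sent] = best

    edges = set()  # storing edges in a set merges duplicate entities, rule (2)
    for a, b in combinations(sorted(tails), 2):
        shared = tails[a].keys() & tails[b].keys()
        if not shared:
            continue
        # Rule (1): keep only the shared entity with the highest average weight.
        entity = max(shared, key=lambda e: (tails[a][e] + tails[b][e]) / 2)
        edges.add((a, entity))
        edges.add((b, entity))
    return edges
```

In this sketch an entity is identified by its string, so two sentences pairs that select the same entity automatically share one knowledge node.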

step 2.3, constructing a speaker-sentence graph:

edges are established between speakers and sentences according to which speaker uttered which sentence, giving a speaker-sentence graph, as shown in FIG. 6;

step 2.4, fusing the sentence-knowledge graph and the speaker-sentence graph:

the sentence nodes of the sentence-knowledge graph and of the speaker-sentence graph are the same, so they are merged, and the two graphs are fused into the final heterogeneous dialogue graph; in the constructed heterogeneous dialogue graph there are two kinds of edges between speakers and sentences, namely the "speak-by" edge from a speaker to a sentence and the "rev-speak-by" edge from a sentence to a speaker; there are also two kinds of edges between sentences and knowledge, namely the "know-by" edge from knowledge to a sentence and the "rev-know-by" edge from a sentence to knowledge; the constructed heterogeneous dialogue graph has three kinds of nodes: speaker, sentence and commonsense knowledge.
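The fusion step can be pictured as building one typed graph from the speaker-sentence pairs and the sentence-entity edges. The sketch below is illustrative only: the function name, the plain dict/list graph representation and the sample identifiers are assumptions, and the knowledge edge labels are written "know-by" and "rev-know-by" here, which is an assumed spelling.

```python
def build_heterogeneous_graph(utterances, sentence_entity_edges):
    """Fuse speaker-sentence and sentence-knowledge information.

    utterances: list of (speaker, sentence_id) pairs.
    sentence_entity_edges: iterable of (sentence_id, entity) pairs.
    Returns (nodes, edges): nodes maps a node name to its type and
    edges is a list of (source, edge_type, target) triples.
    """
    nodes = {}
    edges = []
    for speaker, sent in utterances:
        nodes[speaker] = "speaker"
        nodes[sent] = "sentence"
        edges.append((speaker, "speak-by", sent))       # speaker -> sentence
        edges.append((sent, "rev-speak-by", speaker))   # sentence -> speaker
    for sent, entity in sentence_entity_edges:
        nodes[entity] = "knowledge"
        edges.append((entity, "know-by", sent))         # knowledge -> sentence
        edges.append((sent, "rev-know-by", entity))     # sentence -> knowledge
    return nodes, edges


# Hypothetical two-sentence dialogue sharing one knowledge entity.
nodes, edges = build_heterogeneous_graph(
    [("speaker_1", "u1"), ("speaker_2", "u2")],
    [("u1", "phonebook"), ("u2", "phonebook")],
)
```

Because sentence ids are shared between the two inputs, the sentence nodes are merged automatically, which mirrors the fusion described above.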

Other steps and parameters are the same as those in the first embodiment.

Embodiment three: this embodiment differs from embodiment one or two in that, in step 3.1, the node encoder is constructed and the initial node representations $h_{v_i}^{0}$ and the initial word representations $h_{w_{i,n}}^{0}$ are obtained with a bidirectional long short-term memory network (Bi-LSTM); the specific process is as follows:

for the heterogeneous dialogue graph constructed in step two, each node $v_i$ contains $|v_i|$ words, with word sequence $\{w_{i,1}, \dots, w_{i,|v_i|}\}$, wherein $w_{i,n}$ denotes the n-th word of node $v_i$, $n \in [1, |v_i|]$; a Bi-LSTM is applied to the word sequence to generate the forward hidden sequence $\{\overrightarrow{h}_{i,1}, \dots, \overrightarrow{h}_{i,|v_i|}\}$ and the backward hidden sequence $\{\overleftarrow{h}_{i,1}, \dots, \overleftarrow{h}_{i,|v_i|}\}$, wherein the forward hidden state is

$\overrightarrow{h}_{i,n} = \overrightarrow{\mathrm{LSTM}}(x_n, \overrightarrow{h}_{i,n-1})$

and the backward hidden state is

$\overleftarrow{h}_{i,n} = \overleftarrow{\mathrm{LSTM}}(x_n, \overleftarrow{h}_{i,n+1})$

wherein $x_n$ denotes the word vector of $w_{i,n}$; the last hidden state of the forward sequence is concatenated with the first hidden state of the backward sequence to obtain the initial representation of the node:

$h_{v_i}^{0} = [\overrightarrow{h}_{i,|v_i|} ; \overleftarrow{h}_{i,1}]$

wherein ; denotes vector concatenation; the initial representation $h_{w_{i,n}}^{0}$ of each word in the node is obtained at the same time, as shown in FIG. 8.

Other steps and parameters are the same as those in the first or second embodiment.

Embodiment four: this embodiment differs from one of embodiments one to three in that, in step 3.2, the graph encoder is constructed, the node representations are updated with a heterogeneous graph neural network, node position encoding information is added, and the word representations are updated; the specific process is as follows:

given a target node t, its neighbor nodes $s \in N(t)$ are obtained, wherein N(t) denotes the set of neighbor nodes of t and s denotes one of the neighbor nodes; an edge e = (s, t) denotes an edge pointing from the neighbor node s to the target node t; the following are defined:

(1) the node type mapping function is:

$\tau(v): V \rightarrow \mathcal{A}$

wherein τ denotes the node type mapping function; v denotes a given node; V denotes the node set; $\mathcal{A}$ denotes the set of node types; the heterogeneous dialogue graph constructed in step two contains three node types: speaker, sentence and commonsense knowledge;

(2) the edge type mapping function is:

$\phi(e): E \rightarrow \mathcal{R}$

wherein φ denotes the edge type mapping function; e denotes a given edge; E denotes the edge set; $\mathcal{R}$ denotes the set of edge types;

the heterogeneous dialogue graph constructed in step two contains four edge types: speak-by, rev-speak-by, know-by and rev-know-by; for a given edge e = (s, t), s and t each have a representation from the previous layer, $H^{(l-1)}[s]$ and $H^{(l-1)}[t]$; first, $H^{(l-1)}[s]$ and $H^{(l-1)}[t]$ are mapped to $K^{(l)}(s)$ and $Q^{(l)}(t)$:

$K^{(l)}(s) = \text{K-Linear}^{(l)}_{\tau(s)}\big(H^{(l-1)}[s]\big)$

$Q^{(l)}(t) = \text{Q-Linear}^{(l)}_{\tau(t)}\big(H^{(l-1)}[t]\big)$

wherein K-Linear and Q-Linear denote mapping functions that depend on the layer and on the node type; l denotes the l-th layer of the graph network; $K^{(l)}(s)$ is the key representation of the neighbor node s at layer l; $Q^{(l)}(t)$ is the query representation of the node t at layer l;

the weight between $K^{(l)}(s)$ and $Q^{(l)}(t)$ is then calculated:

$\alpha(s, e, t) = K^{(l)}(s)\, W^{ATT,(l)}_{\phi(e)}\, Q^{(l)}(t)^{T}$

wherein $W^{ATT,(l)}_{\phi(e)}$ denotes a learnable parameter that depends on the layer and on the edge type; T denotes transposition; α(s, e, t) denotes the weight between $K^{(l)}(s)$ and $Q^{(l)}(t)$;

after the weight between each neighbor node s and the target node t is obtained, all weights are normalized:

$ATT^{(l)}(s, e, t) = \mathrm{Softmax}_{s \in N(t)}\big(\alpha(s, e, t)\big)$

wherein Softmax is a normalization function and $ATT^{(l)}(s, e, t)$ is the final normalized score;
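To make the weight computation and its normalization concrete, the sketch below scores each neighbor of a target node with a bilinear form (key vector, edge-type matrix, query vector) and normalizes the scores with Softmax over the neighbors. It is a pure-Python illustration with hypothetical function names and toy inputs; the type- and layer-specific mapping functions that produce the key and query vectors are assumed to have been applied already.

```python
import math


def bilinear_score(k, w, q):
    """alpha = k W q^T for a row vector k, a matrix w and a row vector q."""
    kw = [sum(k[i] * w[i][j] for i in range(len(k))) for j in range(len(q))]
    return sum(kw[j] * q[j] for j in range(len(q)))


def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_attention(query, neighbors, edge_weights):
    """Normalized attention of one target node over its neighbors.

    neighbors: list of (key_vector, edge_type); edge_weights maps an edge
    type to its learnable matrix (toy values in this sketch).
    """
    scores = [bilinear_score(k, edge_weights[etype], query) for k, etype in neighbors]
    return softmax(scores)


# Toy example: a sentence node with one speaker neighbor and one knowledge neighbor.
identity = [[1.0, 0.0], [0.0, 1.0]]
att = neighbor_attention(
    [1.0, 0.0],
    [([1.0, 0.0], "speak-by"), ([0.0, 1.0], "know-by")],
    {"speak-by": identity, "know-by": identity},
)
```

The scores sum to one over the neighbors of the target node, so they can be used directly as weights when the neighbor messages are aggregated.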

the previous-layer representation $H^{(l-1)}[s]$ of each neighbor node s is mapped to a message representation:

$M^{(l)}(s) = \text{M-Linear}^{(l)}_{\tau(s)}\big(H^{(l-1)}[s]\big)$

wherein M-Linear is a mapping function that depends on the node type and on the layer; $M^{(l)}(s)$ is the message representation of the neighbor node s at layer l;

after $M^{(l)}(s)$ is obtained, the final message vector is calculated:

$Msg^{(l)}(s, e, t) = M^{(l)}(s)\, W^{MSG,(l)}_{\phi(e)}$

wherein $W^{MSG,(l)}_{\phi(e)}$ is a learnable parameter that depends on the type and on the layer;

when the target node t is not a sentence node, the invention uses the normalized scores $ATT^{(l)}(s, e, t)$ as weights to compute the weighted sum of the message vectors $Msg^{(l)}(s, e, t)$:

$\tilde{H}^{(l)}[t] = \bigoplus_{s \in N(t)} \big(ATT^{(l)}(s, e, t) \cdot Msg^{(l)}(s, e, t)\big)$

wherein $\bigoplus$ denotes summation; $\cdot$ denotes multiplication; $\tilde{H}^{(l)}[t]$ is the representation that fuses all neighbor nodes of t;

when the target node t is a sentence node, the invention distinguishes the neighbor nodes s by type and fuses their information separately, wherein τ(s) denotes the type of a neighbor node, $s_k$ denotes a neighbor node whose type is knowledge, and $s_s$ denotes a neighbor node whose type is speaker;

the results of the two cases are mapped to obtain $\tilde{H}^{(l)}[t]$; the invention then maps $\tilde{H}^{(l)}[t]$ to $H^{(l)}[t]$ as the updated node representation, using a mapping function that depends on the type and on the layer together with the Sigmoid activation function;

next, position information is incorporated into the updated node representation $H^{(l)}[v_i]$: each node $v_i$ is associated with a position $p_{v_i}$; for speaker nodes and knowledge nodes, $p_{v_i} = 0$; for sentence nodes, $p_{v_i}$ is the position of the sentence in the dialogue, that is, its ordinal number; the invention sets a position vector matrix $W^{pos}$, so that for each position $p_{v_i}$ the corresponding vector representation $W^{pos}[p_{v_i}]$ can be obtained; $W^{pos}[p_{v_i}]$ is merged into $H^{(l)}[v_i]$ to obtain the updated node representation $h_{v_i}$;

finally, the updated node representation $h_{v_i}$ is concatenated with the corresponding initial word representation $h_{w_{i,n}}^{0}$ and mapped to obtain the updated word representation:

$h_{w_{i,n}} = F\_Linear\big([h_{v_i} ; h_{w_{i,n}}^{0}]\big)$

wherein F_Linear() denotes a mapping function and ; denotes vector concatenation.

Other steps and parameters are the same as those in one of the first to third embodiments.

Embodiment five: this embodiment differs from one of embodiments one to four in that, in step 3.3, the decoder is constructed; the specific process is as follows:

after the updated word representations $h_{w_{i,n}}$ are obtained, the mean $s_0$ of the representations of all words is calculated:

$s_0 = \frac{1}{\sum_{v_i \in G} |v_i|} \sum_{v_i \in G} \sum_{n=1}^{|v_i|} h_{w_{i,n}}$

wherein G denotes the set of all nodes in the heterogeneous dialogue graph; $s_0$ is assigned to both the cell state and the hidden state of the decoder to initialize the decoder; at each decoding step, a context vector $c_t$ is computed from the decoder state $s_t$:

$e^{t}_{i,n} = s_t^{T} W_a h_{w_{i,n}}$ (11)

$a^{t} = \mathrm{Softmax}(e^{t})$ (12)

$c_t = \sum_{i} \sum_{n} a^{t}_{i,n} h_{w_{i,n}}$ (13)

wherein $W_a$ is a learnable parameter; $h_{w_{i,n}}$ is the updated word representation; T denotes transposition; $e^{t}_{i,n}$ is the unnormalized weight of the n-th word of the i-th node; $s_t$ is the decoder state at time t; $e^{t}$ is the weight before normalization; $a^{t}$ is the weight after normalization; $a^{t}_{i,n}$ is the normalized weight of the n-th word of the i-th node; $c_t$ is the context vector;

from the context vector $c_t$ and the decoder state $s_t$ at time t, the probability $P_{vocab}$ of generating each word in the vocabulary is calculated:

$P_{vocab}(w) = \mathrm{Softmax}\big(V'(V[s_t ; c_t] + b) + b'\big)$ (14)

wherein V', V, b and b' are learnable parameters; $[s_t ; c_t]$ denotes the concatenation of $s_t$ and $c_t$; Softmax is a normalization function; $P_{vocab}(w)$ is the probability of generating the word w;

in addition to generating words from the vocabulary, the model also allows words to be copied from the source text; first, the probability $p_{gen}$ of generating a word is calculated:

$p_{gen} = \mathrm{Sigmoid}\big(w_c^{T} c_t + w_s^{T} s_t + w_x^{T} x_t + b_{ptr}\big)$ (15)

wherein $w_c$, $w_s$, $w_x$ and $b_{ptr}$ are learnable parameters; Sigmoid is an activation function; $p_{gen}$ is the probability of generating a word; $1 - p_{gen}$ is the probability of copying a word from the source text; $w_c^{T}$, $w_s^{T}$ and $w_x^{T}$ are the transposes of $w_c$, $w_s$ and $w_x$; $x_t$ is the word vector of the word input to the decoder at time t;

therefore, for a word w, the probability of generating it from the vocabulary and the probability of copying it from the source text are combined, and the final probability is given by formula (16):

$P(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i,n:\, w_{i,n} = w} a^{t}_{i,n}$ (16)

wherein $a^{t}_{i,n}$ is the normalized weight of the n-th word of the i-th node;

according to formula (16), the decoder selects the word with the highest probability as the output at each decoding step.
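The combination of generation and copy probabilities can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the dictionary representation of the vocabulary distribution and the toy numbers are assumptions, and the attention weights are taken as given rather than computed by a decoder.

```python
def final_distribution(p_gen, p_vocab, source_words, attention):
    """Mix generation and copy probabilities for one decoding step.

    p_vocab: dict mapping a word to its probability of being generated.
    source_words / attention: the words of the graph nodes and their
    normalized attention weights, in the same order.
    """
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for word, a in zip(source_words, attention):
        # Copy mass for a word is the attention summed over its occurrences.
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * a
    return final


def greedy_choice(distribution):
    """Pick the word with the highest final probability."""
    return max(distribution, key=distribution.get)


# Toy step: "cake" appears twice in the source and is also in the vocabulary.
dist = final_distribution(
    p_gen=0.6,
    p_vocab={"cake": 0.5, "party": 0.5},
    source_words=["cake", "Lara", "cake"],
    attention=[0.25, 0.5, 0.25],
)
```

A word that is absent from the vocabulary, such as a speaker name, can still receive probability through the copy term, which is the point of allowing copying from the source text.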

Other steps and parameters are the same as in one of the first to fourth embodiments.

Embodiment six: this embodiment differs from one of embodiments one to five in that, in step four, the dialogue heterogeneous neural network model constructed in step three is trained, and the final summary of a dialogue is generated with the trained model; the specific process is as follows:

the dialogue heterogeneous neural network model is trained by maximum likelihood estimation on the training part of the SAMSum dataset; at each decoding step of the decoder, the cross-entropy loss is calculated from the word probability predicted by formula (16) and the reference word;

for a dialogue D, given the reference summary $Y^{*} = \{y^{*}_{1}, \dots, y^{*}_{|Y^{*}|}\}$, the training objective is to minimize formula (17):

$L = -\sum_{t=1}^{|Y^{*}|} \log P\big(y^{*}_{t} \mid y^{*}_{1}, \dots, y^{*}_{t-1}, D\big)$ (17)

wherein $y^{*}_{1}$ is the first word of the reference summary; $y^{*}_{|Y^{*}|}$ is the last word of the reference summary; $y^{*}_{t}$ is the word of the reference summary to be predicted at time t; L is the cross-entropy loss function;

the dialogue heterogeneous neural network model is trained according to formula (17), the best model is selected on the development part of the SAMSum dataset, and finally the trained model generates the final dialogue summaries for the test part of the SAMSum dataset according to formula (16).
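The training objective is the negative log-likelihood of the reference summary, summed over decoding steps. The sketch below computes that quantity for given per-step word distributions; the function name, the dictionary representation and the small probability floor are assumptions for illustration, and no model or gradient update is shown.

```python
import math


def sequence_nll(step_distributions, reference_words):
    """Sum of -log P(y_t) over the words of the reference summary.

    step_distributions: one dict per decoding step, mapping a word to its
    predicted probability; reference_words: the reference summary tokens.
    """
    loss = 0.0
    for dist, word in zip(step_distributions, reference_words):
        # Floor the probability to avoid log(0); the floor value is an assumption.
        prob = max(dist.get(word, 0.0), 1e-12)
        loss -= math.log(prob)
    return loss


# Toy example with two decoding steps.
loss = sequence_nll(
    [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}],
    ["a", "b"],
)
```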

Other steps and parameters are the same as those in one of the first to fifth embodiments.

Embodiment seven: this embodiment differs from one of embodiments one to six in that the method for removing noisy knowledge is as follows:

(1) if the weight w of a piece of tuple knowledge is lower than 1, that knowledge is excluded;

(2) if the relation r of a piece of tuple knowledge is one of Antonym, EtymologicallyRelatedTo, EtymologicallyDerivedFrom, DistinctFrom or NotDesires, that knowledge is excluded.
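The two exclusion rules amount to a simple filter over (h, r, t, w) tuples. In the sketch below the relation identifiers follow ConceptNet's naming and are assumed to correspond to the five excluded relation types listed above; the function and constant names and the sample tuples are hypothetical.

```python
# Relation types treated as noise (assumed ConceptNet-style identifiers).
EXCLUDED_RELATIONS = {
    "Antonym",
    "EtymologicallyRelatedTo",
    "EtymologicallyDerivedFrom",
    "DistinctFrom",
    "NotDesires",
}
MIN_WEIGHT = 1.0  # tuples with a weight below 1 are discarded


def filter_noise(tuples):
    """Keep (h, r, t, w) tuples with w >= 1 and a non-excluded relation."""
    return [
        (h, r, t, w)
        for (h, r, t, w) in tuples
        if w >= MIN_WEIGHT and r not in EXCLUDED_RELATIONS
    ]


# Illustrative tuples; the weights are made up for the example.
candidates = [
    ("call", "RelatedTo", "contact", 10.0),
    ("hot", "Antonym", "cold", 5.0),
    ("number", "AtLocation", "phonebook", 0.5),
]
kept = filter_noise(candidates)
```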

Other steps and parameters are the same as those in one of the first to sixth embodiments.

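Embodiment eight: this embodiment differs from one of embodiments one to seven in that the process of simplifying the tuple knowledge is as follows:

(1) if sentences A and B are connected to several entities, the entity whose relations have the highest average weight is selected; for example, in FIG. 5 the average weight of "phonebook" is greater than that of "date", so "phonebook" is selected;

(2) if different sentences are connected to the same entity, the copies of that entity are merged into a single entity; for example, in FIG. 5 the two "contact" entities are merged into one "contact" entity.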

Other steps and parameters are the same as those in one of the first to seventh embodiments.

Examples

Example one:

The invention implements the proposed model and compares its output with that of a current baseline model and with the reference summary.

(1) Summary generated by the baseline model:

Gary and Lara will meet at 5pm for Tom's bday party.

(2) Summary generated by the model of the invention:

Gary and Lara are going to Tom's birthday party at 5pm. Lara will pick up the cake.

(3) Reference summary:

It’s Tom's birthday. Lara and Gary will come to Tom's place about 5pm to prepare everything. Gary has already paid for the cake Lara will pick it.

As this example shows, the model of the invention generates a result closer to the reference summary, and introducing commonsense knowledge helps it understand the dialogue better.

The present invention may be embodied in other specific forms without departing from its spirit or essential attributes, and all such changes and modifications are intended to fall within the scope of the appended claims.

Claims

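1. An abstractive dialogue summarization method incorporating commonsense knowledge, characterized by comprising the following steps:

step one, acquiring the commonsense knowledge base ConceptNet and the dialogue summarization dataset SAMSum; the commonsense knowledge contained in ConceptNet exists in the form of tuples, namely tuple knowledge, expressed as:

R = (h, r, t, w),

wherein R represents a piece of tuple knowledge; h represents the head entity; r represents the relation; t represents the tail entity; w represents a weight indicating the confidence of the relation; the knowledge R states that the head entity h and the tail entity t have the relation r, with weight w;

step two, using the acquired commonsense knowledge base ConceptNet to introduce tuple knowledge into the dialogue summarization dataset SAMSum, and constructing a heterogeneous dialogue graph;

step three, constructing a dialogue heterogeneous neural network model comprising a node encoder, a graph encoder and a decoder;

step 3.1, constructing the node encoder, and obtaining the initial node representations and the initial word representations with a bidirectional long short-term memory network;

step 3.2, constructing the graph encoder, updating the node representations with a heterogeneous graph neural network, adding node position encoding information, and updating the word representations;

step 3.3, constructing the decoder;

step four, training the dialogue heterogeneous neural network model constructed in step three, and generating the final summary of a dialogue with the trained model.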

2. The abstractive dialogue summarization method incorporating commonsense knowledge according to claim 1, characterized in that, in step two, the acquired commonsense knowledge base ConceptNet is used to introduce tuple knowledge into the dialogue summarization dataset SAMSum and a heterogeneous dialogue graph is constructed; the specific process is as follows:

step 2.1, for a dialogue, retrieving related tuple knowledge from ConceptNet according to the words in the dialogue and removing noisy knowledge, to obtain the set of tuple knowledge related to the given dialogue;

step 2.2, for the related tuple knowledge acquired in step 2.1, supposing there are a sentence A and a sentence B, a word a belonging to A and a word b belonging to B, simplifying the tuple knowledge and, if the related knowledge of a and b shares the same tail entity, connecting sentences A and B to that tail entity, to obtain a sentence-knowledge graph;

step 2.3, establishing edges between speakers and sentences according to which speaker uttered which sentence, to obtain a speaker-sentence graph;

step 2.4, fusing the sentence-knowledge graph and the speaker-sentence graph into a heterogeneous dialogue graph; in the heterogeneous dialogue graph there are two kinds of edges between speakers and sentences, namely the "speak-by" edge from a speaker to a sentence and the "rev-speak-by" edge from a sentence to a speaker; there are two kinds of edges between sentences and tuple knowledge, namely the "know-by" edge from knowledge to a sentence and the "rev-know-by" edge from a sentence to tuple knowledge; the heterogeneous dialogue graph has three types of nodes, namely speakers, sentences and commonsense knowledge.

3. The abstractive dialogue summarization method incorporating commonsense knowledge according to claim 2, characterized in that, in step 3.1, the node encoder is constructed and the initial node representations $h_{v_i}^{0}$ and the initial word representations $h_{w_{i,n}}^{0}$ are obtained with a bidirectional long short-term memory network; the specific process is as follows:

for the constructed heterogeneous dialogue graph, each node $v_i$ contains $|v_i|$ words, with word sequence $\{w_{i,1}, \dots, w_{i,|v_i|}\}$, wherein $w_{i,n}$ denotes the n-th word of node $v_i$, $n \in [1, |v_i|]$; the bidirectional long short-term memory network is applied to the word sequence to generate the forward hidden sequence $\{\overrightarrow{h}_{i,1}, \dots, \overrightarrow{h}_{i,|v_i|}\}$ and the backward hidden sequence $\{\overleftarrow{h}_{i,1}, \dots, \overleftarrow{h}_{i,|v_i|}\}$, wherein the forward hidden state is $\overrightarrow{h}_{i,n} = \overrightarrow{\mathrm{LSTM}}(x_n, \overrightarrow{h}_{i,n-1})$, the backward hidden state is $\overleftarrow{h}_{i,n} = \overleftarrow{\mathrm{LSTM}}(x_n, \overleftarrow{h}_{i,n+1})$, and $x_n$ is the word vector of $w_{i,n}$; the last hidden state of the forward sequence is concatenated with the first hidden state of the backward sequence to obtain the initial representation of the node:

$h_{v_i}^{0} = [\overrightarrow{h}_{i,|v_i|} ; \overleftarrow{h}_{i,1}]$

wherein ; denotes vector concatenation; the initial representation $h_{w_{i,n}}^{0}$ of each word in the node is obtained at the same time.

4. The method for generating dialogue summary incorporating general knowledge according to claim 1 or 2, wherein the step three or two constructs graph encoder, updates node representation by using a neural network of a heterogeneous graph, and adds node position coding information and update wordsLanguage representation

The specific process is as follows:

giving a target node t, and obtaining a neighbor node s e to N (t), wherein N (t) represents a neighbor node set of t, and s represents one neighbor node; given an edge e ═ s, t, representing an edge pointing from the neighbor node s to the target node t, defines:

(1) the node type mapping function is:

τ(v):

wherein τ represents a node type mapping function; v represents a given node; v represents a node set;

(2) the edge relationship type mapping function is:

wherein,

representing a set of edge types;

in the heterogeneous dialogue graph there are four types of edges in total: speak-by, rev-speak-by, know-by, and rev-know-by; for a given edge $e = (s, t)$, $s$ and $t$ each have a representation from the previous layer, $h_s^{(l-1)}$ and $h_t^{(l-1)}$;

$h_s^{(l-1)}$ and $h_t^{(l-1)}$ are mapped to $k^{(l)}(s)$ and $q^{(l)}(t)$ by mapping functions specific to the node type and the layer, where $l$ denotes the $l$-th layer of the network, $k^{(l)}(s)$ denotes the key representation of the neighbor node $s$ at layer $l$, and $q^{(l)}(t)$ denotes the query representation of the node $t$ at layer $l$;

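the weight between $k^{(l)}(s)$ and $q^{(l)}(t)$ is computed and then normalized:

$\mathrm{ATT}^{(l)}(s, e, t) = \mathrm{Softmax}_{s \in N(t)}\big(\alpha^{(l)}(s, e, t)\big)$

where $\alpha^{(l)}(s, e, t)$ is the weight between $k^{(l)}(s)$ and $q^{(l)}(t)$, Softmax is the normalization function, and $\mathrm{ATT}^{(l)}(s, e, t)$ is the final normalized score;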

the representation $h_s^{(l-1)}$ of each neighbor node $s$ is mapped to $v^{(l)}(s)$ by a mapping function specific to the type and the layer;

after $v^{(l)}(s)$ is obtained, the final message vector $\mathrm{Msg}^{(l)}(s, e, t)$ is computed from it using learnable parameters specific to the type and the layer;

when the target node $t$ is not a sentence node, the normalized scores $\mathrm{ATT}^{(l)}(s, e, t)$ are used as weights to sum the message vectors $\mathrm{Msg}^{(l)}(s, e, t)$, giving $\tilde{h}_t^{(l)}$:

$\tilde{h}_t^{(l)} = \bigoplus_{s \in N(t)} \big(\mathrm{ATT}^{(l)}(s, e, t) \cdot \mathrm{Msg}^{(l)}(s, e, t)\big)$ (5)

where $\bigoplus$ denotes summation, $\cdot$ denotes multiplication, and $\tilde{h}_t^{(l)}$ is the representation that fuses all neighbor nodes of $t$;

when the target node $t$ is a sentence node, the neighbor nodes $s$ are distinguished by type for information fusion, giving $\tilde{h}_t^{(l)}$ as in equation (6), where $\tau(s)$ denotes the type of a neighbor node, $s_k$ denotes a neighbor node whose type is knowledge, and $s_s$ denotes a neighbor node whose type is speaker;

after $\tilde{h}_t^{(l)}$ is obtained, it is mapped to the updated node representation $h_t^{(l)}$ using the Sigmoid activation function and a mapping function specific to the type and the layer;

position information is then added to the updated node representation $h_t^{(l)}$: for speaker nodes and knowledge nodes, the position $p_t$ takes a fixed value; for sentence nodes, $p_t$ is the position of the sentence in the dialogue, i.e., its ordinal number among the sentences;

a position vector matrix $W^{pos}$ is set, from which the vector representation corresponding to each position $p_t$ can be obtained; this position vector is merged into $h_t^{(l)}$ to obtain the updated node representation;

the updated representation of each node $v_i$ is concatenated with the corresponding initial word representations $h_{i,n}^{0}$ and then mapped to obtain the updated word representations $\hat{h}_{i,n}$.
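The attention-weighted fusion for a target node that is not a sentence node can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the score is assumed to be a bilinear form between key and query with an edge-type-specific matrix, all mappings are plain matrices held in dictionaries, and the names `aggregate`, `W_att`, `W_msg` and the edge label `"edge-1"` are hypothetical placeholders.

```python
import math

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(target, neighbors, K, Q, V, W_att, W_msg):
    """Fuse neighbor messages into one vector for a non-sentence target node.

    target: (node_type, vector); neighbors: list of (node_type, edge_type, vector).
    K, Q, V map a node type to a matrix; W_att, W_msg map an edge type to a matrix.
    """
    t_type, h_t = target
    q = matvec(Q[t_type], h_t)                       # query of the target
    scores, messages = [], []
    for s_type, e_type, h_s in neighbors:
        k = matvec(K[s_type], h_s)                   # key of the neighbor
        # Assumed score: bilinear form k^T W_att q (not confirmed by the text).
        scores.append(sum(ki * wi for ki, wi in zip(k, matvec(W_att[e_type], q))))
        messages.append(matvec(W_msg[e_type], matvec(V[s_type], h_s)))
    att = softmax(scores)                            # normalized over the neighbors
    fused = [sum(a * m[d] for a, m in zip(att, messages))
             for d in range(len(messages[0]))]
    return att, fused

I2 = [[1.0, 0.0], [0.0, 1.0]]                        # identity mappings for the demo
K, Q, V = {"sentence": I2}, {"knowledge": I2}, {"sentence": I2}
W_att, W_msg = {"edge-1": I2}, {"edge-1": I2}
target = ("knowledge", [1.0, 0.0])
neighbors = [("sentence", "edge-1", [1.0, 0.0]), ("sentence", "edge-1", [0.0, 1.0])]
att, fused = aggregate(target, neighbors, K, Q, V, W_att, W_msg)
```

In this toy setting the first neighbor is aligned with the target and therefore receives the larger weight; with identity mappings the fused vector simply equals the attention weights.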

5. The method for generating a dialogue summary incorporating commonsense knowledge according to claim 4, wherein a decoder is constructed in step three; the specific process is as follows:

after the updated word representations $\hat{h}_{i,n}$ are obtained, the mean $s_0$ of the representations of all words is computed:

$s_0 = \frac{1}{\sum_{v_i \in G} |v_i|} \sum_{v_i \in G} \sum_{n=1}^{|v_i|} \hat{h}_{i,n}$ (10)

where $G$ denotes the set of all nodes in the heterogeneous dialogue graph;

$s_0$ is assigned to the cell state and the hidden state of the decoder to initialize the decoder; at each decoding step, an attention mechanism is used to compute the context vector $c_t$ from the decoder state $s_t$:

$e^{t}_{i,n} = s_t^{T} W_a \hat{h}_{i,n}$ (11)

$a^{t} = \mathrm{Softmax}(e^{t})$ (12)

$c_t = \sum_{i} \sum_{n} a^{t}_{i,n} \hat{h}_{i,n}$ (13)

where $W_a$ is a learnable parameter; $\hat{h}_{i,n}$ is the updated word representation; $T$ denotes transposition; $e^{t}_{i,n}$ is the unnormalized weight of the $n$-th word of the $i$-th node; $s_t$ is the decoder state at time $t$; $a^{t}$ is the normalized weight; $e^{t}$ is the unnormalized weight; $c_t$ is the context vector; and $a^{t}_{i,n}$ is the normalized weight of the $n$-th word of the $i$-th node;

the context vector $c_t$ and the decoder state $s_t$ at time $t$ are used to compute the probability $P_{vocab}$ of generating each word in the vocabulary:

$P_{vocab}(w) = \mathrm{Softmax}(V'(V[s_t; c_t] + b) + b')$ (14)

in addition to generating words from the vocabulary, copying words from the original text is allowed; first, the probability $p_{gen}$ of generating a word is computed:

$p_{gen} = \mathrm{Sigmoid}(w_c^{T} c_t + w_s^{T} s_t + w_x^{T} x_t + b_{ptr})$ (15)

where $w_c$, $w_s$, $w_x$ and $b_{ptr}$ are learnable parameters; Sigmoid is the activation function; $p_{gen}$ is the probability of generating a word from the vocabulary, and $1 - p_{gen}$ is the probability of copying from the original text; $w_c^{T}$, $w_s^{T}$ and $w_x^{T}$ are the transposes of $w_c$, $w_s$ and $w_x$; $x_t$ is the word vector of the decoder input word at time $t$;

the final probability is given by equation (16):

$P(w) = p_{gen} P_{vocab}(w) + (1 - p_{gen}) \sum_{i,n:\, w_{i,n} = w} a^{t}_{i,n}$ (16)

where $a^{t}_{i,n}$ is the normalized weight of the $n$-th word of the $i$-th node;

according to equation (16), at each decoding step the decoder selects the word with the highest probability as output.
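The mixing of generation and copy probabilities in this claim can be illustrated with a short Python sketch. The vocabulary, the source words, the attention scores and the value of `p_gen` are made-up inputs, and `final_distribution` is a hypothetical helper name; the sketch only shows how attention mass on source words is added to the vocabulary distribution.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def final_distribution(p_vocab, p_gen, attention, source_words):
    """Return P(w) = p_gen * P_vocab(w) + (1 - p_gen) * attention mass on w.

    attention[j] is the normalized weight of source_words[j]; a word that occurs
    several times in the source accumulates the weights of all its occurrences.
    """
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for a, w in zip(attention, source_words):
        final[w] = final.get(w, 0.0) + (1.0 - p_gen) * a
    return final

# Illustrative inputs (assumptions, not taken from the SAMSum data).
p_vocab = {"meet": 0.6, "at": 0.3, "noon": 0.1}
source_words = ["Amanda", "noon", "Amanda"]
attention = softmax([2.0, 1.0, 2.0])
final = final_distribution(p_vocab, 0.5, attention, source_words)
best = max(final, key=final.get)  # greedy choice of the most probable word
```

Here the word "Amanda" is absent from the vocabulary distribution but can still be selected, because the copy term contributes the attention weight of both of its occurrences.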

6. The method for generating a dialogue summary incorporating commonsense knowledge according to claim 5, wherein in step four the dialogue heterogeneous neural network model constructed in step three is trained, and the trained model generates the final dialogue summary from a dialogue; the specific process is as follows:

using maximum likelihood estimation, the dialogue heterogeneous neural network model is trained on the training portion of the SAMSum dataset; at each decoding step, the cross-entropy loss is computed from the word probability predicted by equation (16) and the standard word:

for a dialogue $D$, given the standard summary

$Y = \{y_1, y_2, \ldots, y_{|Y|}\}$

the training objective is to minimize equation (17):

$L = -\sum_{t=1}^{|Y|} \log P(y_t \mid y_1, \ldots, y_{t-1}, D)$ (17)

where

$y_1$ is the first word of the standard summary and

$y_{|Y|}$ is the last word of the standard summary;

the dialogue heterogeneous neural network model is trained according to equation (17), the best model is selected using the development portion of the SAMSum dataset, and finally, for the test portion of the SAMSum dataset, the trained model generates the final dialogue summary according to equation (16).
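The cross-entropy objective of this claim amounts to the negative log-likelihood of the standard summary words under the per-step output distributions. The sketch below computes it for made-up distributions; `summary_nll` is a hypothetical helper name and the small floor `eps` is an added assumption to avoid taking the logarithm of zero.

```python
import math

def summary_nll(step_distributions, reference):
    """Negative log-likelihood of the reference words, one distribution per decoding step."""
    eps = 1e-12  # assumed floor so that a zero probability does not raise an error
    return -sum(
        math.log(max(dist.get(word, 0.0), eps))
        for dist, word in zip(step_distributions, reference)
    )

# Illustrative two-step example (probabilities are made up).
dists = [{"Amanda": 0.5, "Tom": 0.5}, {"left": 0.25, "stayed": 0.75}]
loss = summary_nll(dists, ["Amanda", "left"])
```

Minimizing this quantity over the training portion raises the probability that the model assigns to each standard summary word given the preceding words and the dialogue.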

7. The method for generating a dialogue summary incorporating commonsense knowledge according to claim 2, wherein noisy knowledge is eliminated as follows:

(1) when the weight w of a knowledge tuple is lower than 1, the tuple is excluded;

(2) when the relation r of a knowledge tuple is one of Antonym, EtymologicallyRelatedTo, EtymologicallyDerivedFrom, DistinctFrom, or NotDesires, the tuple is excluded.
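The two exclusion rules can be sketched as a simple filter. The tuple layout `(head, relation, tail, weight)`, the sample tuples, and the function name `filter_tuples` are assumptions made for illustration; the relation labels are an assumed mapping of the five excluded relations onto ConceptNet relation names.

```python
# Assumed ConceptNet labels for the five excluded relation types.
EXCLUDED_RELATIONS = {
    "Antonym",
    "EtymologicallyRelatedTo",
    "EtymologicallyDerivedFrom",
    "DistinctFrom",
    "NotDesires",
}

def filter_tuples(tuples, min_weight=1.0):
    """Keep (head, relation, tail, weight) tuples that pass both exclusion rules."""
    return [
        t for t in tuples
        if t[3] >= min_weight and t[1] not in EXCLUDED_RELATIONS
    ]

# Illustrative tuples (made up for the example).
sample = [
    ("pizza", "IsA", "food", 2.0),
    ("hot", "Antonym", "cold", 3.0),        # dropped: excluded relation
    ("bus", "RelatedTo", "station", 0.5),   # dropped: weight below 1
]
kept = filter_tuples(sample)
```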

8. The method for generating a dialogue summary incorporating commonsense knowledge according to claim 2, wherein the tuple knowledge is simplified as follows:

(1) if sentence A and sentence B are connected through a plurality of entities, the entity whose edge relations have the highest average weight is selected;

(2) if different sentence pairs are connected to entities with the same name, all entities with that name are merged into one entity.
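The two simplification rules can be sketched as follows. The candidate layout `(sentence_a, sentence_b, entity, weight_a, weight_b)`, the sample values, and the function name `simplify` are hypothetical; the sketch assumes each bridging entity carries one edge weight toward each of the two sentences.

```python
def simplify(candidates):
    """Apply both simplification rules to candidate bridging entities.

    candidates: list of (sent_a, sent_b, entity, weight_a, weight_b).
    Returns (edges, entities): for each sentence pair, the entity with the highest
    average edge weight; entities sharing a name collapse into a single node.
    """
    best = {}
    for a, b, entity, wa, wb in candidates:
        avg = (wa + wb) / 2.0
        pair = (a, b)
        if pair not in best or avg > best[pair][1]:
            best[pair] = (entity, avg)
    edges = sorted((a, b, entity) for (a, b), (entity, _) in best.items())
    entities = sorted({entity for _, _, entity in edges})  # same name, one node
    return edges, entities

# Illustrative candidates (made up for the example).
candidates = [
    (0, 1, "food", 2.0, 1.0),   # average 1.5
    (0, 1, "meal", 3.0, 2.0),   # average 2.5, wins for the pair (0, 1)
    (2, 3, "meal", 1.0, 1.0),   # same name as above, merged into one entity
]
edges, entities = simplify(candidates)
```

Both sentence pairs end up attached to a single "meal" node, so the graph contains one knowledge entity instead of three.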

9. The method for generating a dialogue summary incorporating commonsense knowledge according to claim 1, wherein the training, development, and test portions of SAMSum contain 14732, 818, and 819 samples, respectively.