CN115495552A - Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal device


Info

Publication number
CN115495552A
Authority
CN
China
Prior art keywords
semantic
vector
word
representation
semantic representation
Prior art date
Legal status
Pending
Application number
CN202211128307.0A
Other languages
Chinese (zh)
Inventor
蔡飞
张伟康
刘诗贤
陈洪辉
毛彦颖
刘登峰
王思远
李佩宏
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202211128307.0A
Publication of CN115495552A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/3329 — Natural language query formulation or dialogue systems
    • G06F16/3344 — Query execution using natural language analysis
    • G06F40/30 — Handling natural language data; Semantic analysis
    • G06N3/08 — Neural networks; Learning methods


Abstract

The invention discloses a multi-round dialogue reply generation method based on two-channel semantic enhancement, and a terminal device. The method comprises: obtaining an initial word vector of a dialogue text; obtaining a sequential semantic representation of the initial word vector, which includes obtaining an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, and recording the dialogue-level sentence semantic vector as the sequential semantic representation; obtaining a graph-domain semantic representation of the initial word vector on the graph domain; performing semantic enhancement on the dialogue text according to the sequential semantic representation and the graph-domain semantic representation to obtain an enhanced semantic representation; and generating a reply text according to the enhanced semantic representation. The invention aims to fuse the semantic advantages of different structural modelings and to obtain longer-span information association and semantic reasoning. The model of the invention outperforms the baseline models and alleviates the long-distance semantic dependency problem.

Description

Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal device
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a multi-round dialogue reply generation method based on two-channel semantic enhancement and a terminal device.
Background
With the rise of the Internet of Everything and human-machine interaction, dialogue systems, as widely applied communication media, have penetrated deeply into many scenarios such as intelligent customer service, AI speakers, and intelligent cockpits. Because of their great advantages in improving the information-service experience and assisting voice-command interaction, dialogue systems have great research and application value, with multi-turn dialogue systems being the most prominent. Multi-round dialogue reply generation is a generative dialogue task focusing on continuous dialogue and complex semantic interaction: given the interaction text between a user and an agent within a certain period of time, it produces meaningful, diverse, and fluent replies to the user, and it has gradually attracted the attention of researchers in many countries in recent years.
Most current intelligent dialogue systems are developed on end-to-end deep neural network technology. As application scenarios spread, the replies of these dialogue systems cannot always keep up: their form is monotonous and their content lacks scene value. Although research on continuously interactive multi-turn dialogue systems has improved response quality by introducing common knowledge or fixed sentence patterns, the main challenge remains to model the context effectively and obtain an accurate semantic representation.
In the prior art, during semantic information extraction, the limitations of the model structure and the long sequence structure of the dialogue history make it difficult to obtain accurate query information, and semantic noise is easily introduced into reply generation, producing non-ideal responses with poor robustness.
Disclosure of Invention
The invention provides a multi-round dialogue reply generation method based on two-channel semantic enhancement and a terminal device, solving the technical problems of poor reply quality and poor robustness of multi-round dialogue reply methods in the prior art.
The first aspect of the invention discloses a multi-round dialog reply generation method based on two-channel semantic enhancement, which comprises the following steps:
acquiring an initial word vector of a dialog text;
acquiring a sequential semantic representation of the initial word vector, including acquiring an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, and recording the dialogue-level sentence semantic vector as the sequential semantic representation;
obtaining the graph domain semantic representation of the initial word vector on the graph domain;
performing semantic enhancement on the dialog text according to the sequential semantic representation and the graph-domain semantic representation to obtain an enhanced semantic representation;
and generating a reply text according to the enhanced semantic representation.
Preferably, the acquiring of the utterance-level sentence semantic vector of the initial word vector specifically includes:
inputting the initial word vectors into a sentence-level encoder and a word attention module in sequence to obtain the utterance-level sentence semantic vector;
determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector specifically includes:
inputting the utterance-level sentence semantic vector into a context encoder and a sentence attention module in sequence to obtain the dialogue-level sentence semantic vector;
the sentence-level encoder and the context encoder are both bidirectional gated neural networks;
the mechanisms used in the word attention module and the sentence attention module are both attention mechanisms.
Preferably, obtaining a graph domain semantic representation of the initial word vector on a graph domain specifically includes:
obtaining the topic keywords of the initial word vector, and determining the nodes of a heterogeneous cognitive graph according to the topic keywords, wherein the nodes comprise topic-sentence cluster nodes, a dialogue query node, and a common node (a cluster of dialogue sentences containing no topic);
determining the edges of the heterogeneous cognitive graph and the weight of each edge according to the nodes of the heterogeneous cognitive graph, wherein the weight is determined according to the degree of topic overlap among the sentences of the dialog text corresponding to the initial word vector;
and learning the vector representations of the nodes in the heterogeneous cognitive graph by using a graph neural network to obtain the graph-domain semantic representation of the initial word vector on the graph domain.
Preferably, performing semantic enhancement on the dialog text according to the sequential semantic representation and the graph domain semantic representation, specifically including:
performing semantic enhancement on the dialog text according to a first formula, the first formula being:

$c_{final} = \delta \cdot c^{seq} + (1 - \delta) \cdot c^{graph}$

where $c_{final}$ is the enhanced semantic representation, $c^{seq}$ is the sequential semantic representation, $c^{graph}$ is the graph-domain semantic representation, $\delta$ is the amount of semantic information contributed by the sequential semantic representation, and $(1 - \delta)$ is the amount of semantic information contributed by the graph-domain semantic representation.
Preferably, generating a reply text according to the enhanced semantic representation includes:
inputting the enhanced semantic representation into a unidirectional gated neural network to obtain the hidden state of each word in the generated reply text;
and determining the generation probability of each word according to the hidden state, and determining the reply text according to the generation probability.
Preferably, inputting the enhanced semantic representation into a unidirectional gated neural network and acquiring the hidden state of each word in the generated reply text specifically includes:
generating the hidden state of each word in the reply text according to a second formula, the second formula being:

$s_i = \mathrm{GRU}(s_{i-1}, y_{i-1}, c_{final})$

where $y_i$ is the $i$-th word generated in the reply text during the training phase, $y_{i-1}$ is the $(i-1)$-th word generated in the reply text during the training phase, $s_i$ is the hidden state of $y_i$, $s_{i-1}$ is the hidden state of $y_{i-1}$, $\mathrm{GRU}(\cdot)$ denotes inputting the parameters into the gated neural network, and $c_{final}$ is the enhanced semantic representation.
Preferably, determining the generation probability of each word according to the hidden state specifically includes:
determining the generation probability of each word according to a third formula, the third formula being:

$p(\hat{y}_i) = p_V(\hat{y}_i) + p_K(\hat{y}_i)$

where $\hat{y}_i$ is the $i$-th word generated in the reply text during the prediction phase, $p(\hat{y}_i)$ is its generation probability, and $p_K(\hat{y}_i)$ and $p_V(\hat{y}_i)$ are the generation probabilities of the $i$-th word $\hat{y}_i$ over the topic keyword vocabulary and the reply text vocabulary, respectively, during the prediction phase.
Preferably, $p_V(\hat{y}_i)$ is determined according to a fourth formula, the fourth formula being:

$p_V(\hat{y}_i) = \mathrm{softmax}_{w \in V}\big(\eta(s_i, y_{i-1}, c_{final})\big)$

and $p_K(\hat{y}_i)$ is determined according to a fifth formula, the fifth formula being:

$p_K(\hat{y}_i) = \mathrm{softmax}_{w \in K}\big(\eta(s_i, y_{i-1}, c_{final})\big)$

where $\eta(\cdot)$ is the nonlinear function tanh, $V$ is the reply text vocabulary, $K$ is the topic keyword vocabulary, $s_i$ is the hidden state of the $i$-th word $y_i$ generated in the reply text during the training phase, $y_{i-1}$ is the $(i-1)$-th word generated in the reply text during the training phase, and $c_{final}$ is the enhanced semantic representation.
A second aspect of the invention discloses a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses a sequential and image domain two-channel collaborative semantic modeling and reasoning method, and aims to fuse semantic advantages in modeling of different structures and obtain information association and semantic reasoning with larger span. In detail, on one hand, a dialogue-level heterogeneous cognitive map is constructed, map nodes are the integration of subject semantics and sentence cluster semantics, the edges in the map are the degree of subject coincidence among sentences, and then a double-gated map neural network is used for deep learning to obtain semantic representation of a dialogue context on a map domain; embedding a hierarchical attention mechanism in the retained sequential channels, on the other hand, results in a sequential semantic representation of the dialog context. And finally, coordinating the information contributions of the two semantic representations to predict. The model of the invention has excellent performance on a reference model and relieves the long-distance semantic dependence problem.
The method helps promote the further development of multi-turn dialogue generation: it helps the system better understand the high-level semantic information of the context, retrieves new cognition from the reconstructed heterogeneous cognitive graph structure, helps generate diverse and valuable information, has good robustness, and improves user satisfaction and efficiency when using information services.
Drawings
FIG. 1 is a schematic flow chart of a method for generating a multi-turn dialog reply based on two-channel semantic enhancement according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of a multi-round dialog reply generation method based on two-channel semantic enhancement according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a neural network according to the present invention;
FIG. 4 is a node semantic representation encoding diagram according to an embodiment of the present invention;
FIG. 5 is an aggregation strategy for semantic features according to an embodiment of the present invention;
FIG. 6 is a graph of performance of the SGDC model and the baseline model in accordance with an embodiment of the present invention on test samples of different context lengths. In FIG. 6, (a) is the PPL value of the dataset DailyDialog, (b) is the PPL value of the dataset MuTual, (c) is the Dist-2 value of the dataset DailyDialog, (d) is the Dist-2 value of the dataset MuTual, (e) is the EA value of the dataset DailyDialog, and (f) is the EA value of the dataset MuTual.
Detailed Description
The following detailed description of the embodiments of the invention will be made with reference to the accompanying drawings. It is to be understood that the following examples are only illustrative and explanatory of the present invention and should not be construed as limiting the scope of the present invention. All the technologies realized based on the above-mentioned contents of the present invention are covered in the protection scope of the present invention.
Multi-round dialogue reply generation is essentially a sequence-to-sequence prediction problem. The invention formulates multi-round dialogue reply generation as a natural language text generation task. A multi-turn dialogue sequence Diag comprises $M > 2$ turns of utterances, with $m \in (1, M]$ denoting the $m$-th turn, so the dialogue sequence can be defined as $Diag = \{U_1, \ldots, U_{M-1}, U_M\}$, where the dialogue history utterance sequence $(U_1, \ldots, U_{M-2})$ represents the contextual information of the entire dialogue, the dialogue query utterance $U_{M-1}$ represents the current state of dialogue progress, and $U_M$ is the target response to be generated by the multi-turn dialogue reply generation task of the invention.
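The following minimal Python sketch illustrates this split of a dialogue into history, query, and target; the function and variable names are ours for illustration, not the patent's:

```python
# A minimal sketch of the task formulation: split a multi-turn dialogue
# Diag = {U_1, ..., U_M} into history (U_1..U_{M-2}), query (U_{M-1}),
# and target response (U_M). Names are illustrative, not from the patent.
from typing import List, Tuple

def split_dialogue(diag: List[str]) -> Tuple[List[str], str, str]:
    assert len(diag) > 2, "a multi-turn dialogue needs M > 2 utterances"
    history, query, target = diag[:-2], diag[-2], diag[-1]
    return history, query, target

history, query, target = split_dialogue([
    "Hi, how was your trip?",
    "Great, we visited three museums.",
    "Which one did you like best?",
    "The science museum, by far.",
])
```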
Most current intelligent dialogue systems are developed on end-to-end deep neural network technology; as application scenarios spread, their replies cannot always keep up, their form is monotonous, and their content lacks scene value. Although research on continuously interactive multi-turn dialogue systems has improved response quality by introducing common knowledge or fixed sentence patterns, the main challenge remains to model the context effectively and obtain an accurate semantic representation. The currently most common hierarchical codec framework ignores the fact that a dialogue is generated as a coherent process in which any two utterances are semantically related and complement or constrain each other. When each utterance is encoded separately without regard to these inherent relationships, a hierarchical model may fail to capture the coherence of utterances in context and ultimately produce non-ideal responses. Therefore, an encoder-decoder framework based on a hierarchical model still needs to weigh the contributions of different sentences and encode them differentially when modeling the contextual semantics of the dialogue history.
In multi-round dialogue reply generation, topic information is a high-level semantic feature extracted from the dialogue history, and models improve the informativeness and topical relevance of responses by integrating topic semantics. However, existing research that introduces or selects topics to alleviate semantic sparsity does not consider the semantic interaction between the vector representation of a topic and the specific dialogue context in which it occurs; because of the inherent ambiguity of natural language, this context-free approach may yield inaccurate topic and utterance representations, thereby impairing the effect of response generation.
The vector representation learning of multi-round dialogue text input has gradually stabilized from an unordered bag-of-words structure to a sequence structure, and semantic modeling has developed from machine learning methods to deep learning methods represented by recurrent neural networks and attention mechanisms; however, because of the sequence-structure learning paradigm of such neural networks, semantic modeling still struggles with the long-distance dependency problem. With the wide application of graph neural networks in various NLP subtasks, the multi-turn dialogue reply generation task also urgently needs to break the limitation of non-Euclidean space, deeply explore the graph structure inherent in its input, and use it to model the context and assist the current sequence-structure modeling.
A first aspect of the present invention provides a method for generating a multi-round dialog reply based on two-channel semantic enhancement, as shown in fig. 1 and fig. 2, including:
step 1, obtaining an initial word vector of a dialog text.
Step 2: acquiring a sequential semantic representation of the initial word vector, including acquiring an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, and recording the dialogue-level sentence semantic vector as the sequential semantic representation.
The main purpose of this step is to model the context of the dialogue text as it develops in time through the semantic analysis of the sequential channel, forming a semantic vector representation of the dialogue context in the sequential channel. To reduce the loss of important semantics over the course of a multi-turn dialogue, this step adds attention to important semantics at different granularities in a hierarchical encoder. The hierarchical encoder comprises a sentence-level encoder and a context encoder, and the hierarchical attention comprises word attention and sentence attention.
The step 2 specifically comprises the following steps:
and step 21, inputting the initial word vectors into a sentence layer encoder and a word attention module in sequence to obtain the utterance level sentence semantic vectors. The sentence layer encoder is a bidirectional gating neural network, and the mechanism used by the word attention module is an attention mechanism.
Sentence-level encoder and word attention are based on an initialized word vector representation of the dialog context for utterance-level sentence semantic vector learning, as in equation (1):
Figure BDA0003849867690000061
in the formula, with U i For example, the utterance-level sentence semantic vector learning,
Figure BDA0003849867690000062
is a Bidirectional gated neural network (Bidirectional gated reccurrentunit,
Figure BDA0003849867690000063
),Num i as a sentence U i The total number of words in the word(s),
Figure BDA0003849867690000064
learning vocabulary x for bi-directional gated neural network j,i Adjacent hidden state output of time, w j,i As a sentence U i The word x at the jth position j,i Is determined by the initial vector of (a) or (b),
Figure BDA0003849867690000065
as a sentence U i The word x at the jth position j,i The latest vector representation also represents the vocabulary x learned by the bidirectional gated neural network j,i And outputting the later hidden state.
Unlike the common layered-architecture encoder, the present invention does not hide the last hidden state
Figure BDA0003849867690000066
As a sentence U i Instead of using s in the decoding step t-1 And hidden state sequence
Figure BDA0003849867690000067
Similarity calculation is performed to determine respective weights { alpha ] attached to each hidden state 1,i2,i ...α j,i } thus weighted summation is U i Semantic vector of
Figure BDA0003849867690000068
As shown in equations (2) and (3):
Figure BDA0003849867690000069
Figure BDA00038498676900000610
in the formula, eta (·) represents a Relu function, which can save calculation and relieve the problems of overfitting and gradient disappearance. Therefore, the invention can obtain the sentence semantic vector sequence of the speech level
Figure BDA00038498676900000611
Step 22: inputting the utterance-level sentence semantic vectors into the context encoder and the sentence attention module in sequence to obtain the dialogue-level sentence semantic vector, wherein the context encoder is a bidirectional gated neural network and the mechanism used by the sentence attention module is an attention mechanism.

The context encoder and sentence attention perform dialogue-level sentence semantic vector learning based on the utterance-level sentence semantic vectors, similarly to the calculation above:

$h_t = \mathrm{BiGRU}(u_t, h_{t-1}), \quad t \in [1, m] \qquad (4)$

$\beta_t = \mathrm{softmax}\big(\eta(s_{t-1}, h_t)\big), \qquad c^{seq} = \sum_{t=1}^{m} \beta_t h_t \qquad (5)$

where $\eta(\cdot)$ denotes the ReLU function, $\mathrm{BiGRU}$ denotes a bidirectional gated recurrent unit, $u_t$ is the utterance-level semantic vector of the $t$-th sentence, $\beta_t$ is its attention weight, $h_t$ is the output-layer hidden-state vector obtained by the context encoder, and $c^{seq}$ is the dialogue-level sentence semantic vector representation obtained after the sentence-attention weighting calculation.

$c^{seq}$ is obtained by calculating the association weight between the decoding state and each hidden state of the context encoder and then aggregating by weighting, and the final semantic representation of the whole dialogue context can be adjusted according to the real-time application of the text. The invention generally takes $c^{seq}$, the dialogue-level sentence semantic vector, as the final semantic vector representation of the dialogue context after complex semantic interactive learning in the sequential channel; it is an important reference and input for the decoder.
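As an illustration of formulas (1)-(5), a minimal PyTorch sketch of such a hierarchical encoder is given below; it simplifies the patent's design by using a learned query vector in place of the decoding state $s_{t-1}$ and by leaving out batching, so it is a sketch under our own assumptions rather than the patent's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalEncoder(nn.Module):
    """Sentence-level BiGRU + word attention, then context BiGRU + sentence
    attention (formulas (1)-(5)). Illustrative sketch, not the patent code."""
    def __init__(self, emb_dim: int, hid: int):
        super().__init__()
        self.word_gru = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.sent_gru = nn.GRU(2 * hid, hid, bidirectional=True, batch_first=True)
        # Learned queries standing in for the decoder state s_{t-1}.
        self.word_query = nn.Parameter(torch.randn(2 * hid))
        self.sent_query = nn.Parameter(torch.randn(2 * hid))

    def attend(self, states: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # eta(.) is ReLU in the patent; scores -> softmax weights -> weighted sum.
        scores = F.relu(states @ query)               # (T,)
        alpha = F.softmax(scores, dim=0)              # formula (2)
        return (alpha.unsqueeze(-1) * states).sum(0)  # formulas (3)/(5)

    def forward(self, utterances):
        # utterances: list of (num_words, emb_dim) tensors, one per turn.
        sent_vecs = []
        for u in utterances:
            h, _ = self.word_gru(u.unsqueeze(0))      # formula (1)
            sent_vecs.append(self.attend(h.squeeze(0), self.word_query))
        sents = torch.stack(sent_vecs).unsqueeze(0)
        h, _ = self.sent_gru(sents)                   # formula (4)
        return self.attend(h.squeeze(0), self.sent_query)  # c_seq

enc = HierarchicalEncoder(emb_dim=32, hid=64)
c_seq = enc([torch.randn(5, 32), torch.randn(7, 32), torch.randn(4, 32)])
```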
Step 3: obtaining the graph-domain semantic representation of the initial word vector on the graph domain.

The main purpose here is to model the medium- and long-distance semantic associations of the dialogue text across the time sequence through the explicit-implicit semantic analysis of the graph-domain channel, and to learn the semantic vector representation of the dialogue context in the graph-domain channel. First a graph is constructed from the explicit-implicit relations in the dialogue context, then a novel graph neural network is designed to learn the vector representations of the nodes, and finally the final semantic vector representation is obtained through pooling calculation. To reduce the loss of important semantics over the course of a multi-turn dialogue, the invention designs a dual-gated filtering mechanism inside the conventional graph neural network layer; the dual-gating mechanism reduces semantic noise when node information is updated.
Compared with the simplest fully connected neural network (MLP), graph neural network technology updates node information on a graph-domain structure, adding an adjacency matrix $A$ for aggregation calculation in addition to the weight matrix, as shown in FIG. 3. In the literature, there are three general classes of graph neural networks:
Graph Convolutional Networks (GCN)

Graph convolutional neural networks are divided into spectral-domain-based GCNs and spatial GCNs; the latter are the most widely used, so the invention focuses on spatial GCNs (hereinafter abbreviated GCN). Similar to conventional CNN convolution over Euclidean data, a GCN convolves according to the node relationships in graph-domain data: the representation of a central node and the representations of its neighbors are aggregated along the edges to update the vector representation of the central node, so the GCN adapts to different positions and structures and shares weights during node computation. The information-passing calculation between nodes is:

$h_v^{(k)} = U_k\Big(h_v^{(k-1)}, \sum_{u \in N_v} M_k\big(h_v^{(k-1)}, h_u^{(k-1)}\big)\Big) \qquad (6)$

where $M_k(\cdot)$ and $U_k(\cdot)$ are functions with learnable parameters, typically implemented with fully connected neural networks (MLP), $h_v^{(k)}$ denotes the node vector of node $v$ at layer $k$, and $u \in N_v$ ranges over the set of neighbor nodes of $v$.
Graph Attention Networks (GAT)

Graph attention networks support amplifying the influence of the most important neighboring nodes. During aggregation, an attention mechanism determines the weights of neighborhood nodes, controlling how much semantic information each neighbor's representation vector feeds into the central node and producing a random-walk representation oriented toward important targets. The information-passing calculation between nodes is:

$h_v^{(k)} = U_k\Big(h_v^{(k-1)}, \sum_{u \in N_v} \alpha\big(h_v^{(k-1)}, h_u^{(k-1)}\big)\, W_k\, h_u^{(k-1)}\Big) \qquad (7)$

where $W_k(\cdot)$ and $U_k(\cdot)$ are functions with learnable parameters, typically implemented with fully connected neural networks (MLP), and $\alpha(\cdot)$ is an attention function that adaptively controls the contribution of a neighbor node $h_u^{(k-1)}$ to the semantic information of node $v$.
Graph Spatial-Temporal Networks (GSTN)

Graph spatial-temporal networks excel at spatial-temporal correlation and can be used in application scenarios such as node prediction in traffic networks. A GSTN can predict future node values or labels as well as spatial-temporal graph labels; both RNN-based and CNN-based constructions are used. Taking the RNN-based method as an example, adding a graph convolution unit captures the spatial-temporal dependency, and the node update calculation is:

$h_v^{(k)} = \mathrm{RNN}\big(h_v^{(k-1)}, \mathrm{Gconv}(h_v^{(k-1)}, A_v)\big) \qquad (8)$

where $\mathrm{Gconv}(\cdot)$ is a graph convolution unit and $A_v$ is the adjacency matrix of the central node at layer $k$, representing the association between the neighbor nodes and the central node; $\mathrm{RNN}(\cdot)$ is the classical recurrent neural network computation, detailed as:

$h_t = \sigma(U \cdot x_t + W \cdot h_{t-1}) \qquad (9)$
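To make the shared pattern behind formulas (6)-(8) concrete, here is a small framework-free sketch of one round of neighborhood aggregation in the spirit of formula (6); the mean aggregator and the tanh update are our illustrative choices, not taken from the patent:

```python
import numpy as np

def message_passing_step(h: np.ndarray, neighbors: dict, W_msg, W_upd):
    """One layer in the spirit of formula (6): aggregate neighbor messages,
    then update. h: (num_nodes, d) node matrix; neighbors[v] lists the
    neighbors of v. W_msg / W_upd play the roles of M_k / U_k."""
    h_new = np.zeros_like(h)
    for v in range(h.shape[0]):
        msgs = [np.tanh(W_msg @ h[u]) for u in neighbors[v]]        # M_k
        agg = np.mean(msgs, axis=0) if msgs else np.zeros(h.shape[1])
        h_new[v] = np.tanh(W_upd @ np.concatenate([h[v], agg]))     # U_k
    return h_new

d = 4
h = np.random.randn(3, d)
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W_msg = np.random.randn(d, d)
W_upd = np.random.randn(d, 2 * d)
h1 = message_passing_step(h, neighbors, W_msg, W_upd)
```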
based on this, the step 3 specifically includes:
and 31, obtaining the topic keywords of the initial word vector, and determining nodes of the heterogeneous cognitive map according to the topic keywords, wherein the nodes comprise topic nodes, non-topic nodes and query nodes.
The invention establishes explicit-implicit connections among the segmented dialogue contents by extracting topic keywords, thereby constructing the heterogeneous cognitive graph. First, the entire dialogue context $\{U_1, U_2, \ldots, U_{m-1}, U_m\}$ is divided into three parts $(U, Q, R)$: the dialogue history sentences $\{U_1, U_2, \ldots, U_{m-2}\}$, the dialogue query sentence $\{U_{m-1}\}$, and the reply text $\{U_m\}$. The dialogue history sentences, far from the reply text, represent the global historical information of the dialogue; the dialogue query sentence, adjacent to the reply text, represents the short-term intent of the dialogue; both carry coarse-grained semantic information. Furthermore, unlike long-text analysis, individual dialogue utterances are often irrelevant to the direction of the overall dialogue flow (e.g., "Yes, I understand"), so the invention extracts topic keywords to better understand the dialogue context. Topic keywords are special named entities: important, distinctive entities distributed over the entire dialogue context that carry fine-grained semantic information and can be used to model the semantic-flow associations of the dialogue.
The invention uses the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to extract the topic keywords, which are high-frequency words in the dialogue text that can represent the dialogue context.
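The patent gives no code for this step; a minimal sketch using scikit-learn's TfidfVectorizer, taking each dialogue's top-scoring words as its topic keywords (the top-k cutoff is our assumption), could look like this:

```python
# Hedged sketch: extract topic keywords from dialogues with TF-IDF.
# The top-k choice is ours; the patent only specifies that keywords are
# high-frequency, context-representative words.
from sklearn.feature_extraction.text import TfidfVectorizer

def topic_keywords(dialogues, top_k=5):
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(dialogues)          # one row per dialogue
    vocab = vec.get_feature_names_out()
    keywords = []
    for row in tfidf.toarray():
        top = row.argsort()[::-1][:top_k]
        keywords.append([vocab[i] for i in top if row[i] > 0])
    return keywords

dialogs = ["we visited the science museum on our trip",
           "the museum shop sold telescope kits"]
print(topic_keywords(dialogs, top_k=3))
```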
Step 32: determining the edges of the heterogeneous cognitive graph and the weight of each edge according to the nodes of the heterogeneous cognitive graph, wherein the weight is determined according to the degree of topic overlap among the sentences of the dialog text corresponding to the initial word vector.
The implementation algorithm of steps 31 and 32 is shown in table 1.
Table 1. Heterogeneous cognitive graph construction algorithm
(Algorithm 1 appears as an image in the original publication and is not reproduced here.)
Algorithm 1 describes the process of building the heterogeneous cognitive graph from dialogue text. The heterogeneous cognitive graph is established to support cognitive inference on the graph-domain channel; specifically, multi-hop inference can be performed using the cooperative information in the dialogue query, the dialogue history, and the topic keywords to obtain stronger semantic interaction. The invention uses Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP) for data processing such as word segmentation and part-of-speech tagging; since this alone is insufficient to represent the semantics of dialogue sentences, the TF-IDF algorithm is used to extract topic keywords, which are high-frequency words in the dialogue text that can represent the dialogue context.
After obtaining the topic keyword set $K$, the invention constructs the graph nodes according to how the topic keywords appear in the dialogue sentences. The set of sentences containing a certain topic $k$, together with the topic $k$ itself, forms the first class of important nodes in the heterogeneous cognitive graph, denoted $v_k$. Note that a sentence may contain several topic keywords, which indicates that it has rich semantic information and can establish connection channels for information interaction with other related sentences. When a sentence contains no topic keyword, the invention considers its semantic effect on the whole dialogue to be small and groups it into a special node $v_{empty}$. Meanwhile, since the dialogue query sentence is the utterance closest to the reply text, the invention considers its semantic role to be the most important and assigns it to another special node $v_Q$.
When establishing the connections between graph nodes, the edge set $E = \{e_{i,j}\}$ is constructed among the three classes of nodes $\{v_K, v_{empty}, v_Q\}$ according to the explicit-implicit relations. $v_k$ is a "topic-sentence cluster" node, $v_{empty}$ is a common node (a cluster of dialogue sentences containing no topic), and $v_Q$ is the dialogue query node. Note that the heterogeneous node characteristics are accounted for in the subsequent node representations, so the establishment of edges only needs to consider the connection weights. In steps 13-17 of the algorithm, when node $v_i$ and node $v_j$ share sentences, an edge $e_{i,j}$ is added; the more sentences they share, the tighter the relationship between the two nodes and the greater the weight. In addition, because the two types of special nodes are associated, the invention directly connects them to construct a special edge $e_{Q,E}$, which, guided by the query sentence, learns important relevant information from the semantic noise.
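A compact sketch of this construction, keeping only the node and edge bookkeeping and counting edge weights as the number of shared sentences (our reading of steps 13-17 of Algorithm 1), might be:

```python
from collections import defaultdict
from itertools import combinations

def build_graph(topics_per_sentence):
    """Nodes: ('topic', k) clusters, ('empty',) for topic-less sentences,
    ('query',) for the dialogue query. Edge weight = shared sentences."""
    clusters = defaultdict(set)                 # node -> sentence indices
    for idx, topics in enumerate(topics_per_sentence):
        for k in topics:
            clusters[("topic", k)].add(idx)
        if not topics:
            clusters[("empty",)].add(idx)
    clusters.setdefault(("empty",), set())
    clusters[("query",)] = set()                # v_Q is the query itself
    edges = {}
    for a, b in combinations(clusters, 2):
        shared = len(clusters[a] & clusters[b])
        if shared:
            edges[(a, b)] = shared              # weight grows with sharing
    edges[(("query",), ("empty",))] = 1         # the special edge e_{Q,E}
    return dict(clusters), edges

# history: "we toured the museum", "the museum had a telescope", "yes I see"
clusters, edges = build_graph([["museum"], ["museum", "telescope"], []])
```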
Step 33: learning the vector representations of the nodes in the heterogeneous cognitive graph by using a graph neural network to obtain the graph-domain semantic representation of the initial word vector on the graph domain.
Node semantic representation coding
Reasoning on the heterogeneous cognitive graph is based on updating and learning the graph node representations, and the initial representations of the graph nodes provide correct guidance for the subsequent graph neural network learning. Taking FIG. 4 as an example, the invention calculates the initial vector representations of the three classes of nodes:
(1) For node $v_{empty}$, the initial semantic vectors of the sentences belonging to $v_{empty}$ are average-pooled to obtain the node vector $v_e$; when no sentence falls into $v_{empty}$, all dialogue history sentences are average-pooled:

$v_e = \mathrm{AvgPool}\big(\{u_i \mid U_i \in v_{empty}\}\big) \qquad (10)$

(2) For node $v_Q$, the initial vector representation $u_{m-1}$ of the dialogue query sentence is used directly as the node vector $v_Q$:

$v_Q = u_{m-1} \qquad (11)$

(3) For node $v_k$, the topic vector and the sentence vectors belonging to the node are concatenated, and dimension conversion is performed through a single-layer fully connected network; taking the node of topic $k_1$ as an example:

$v_{k_1} = \mathrm{MLP}\big([k_1; u_{i_1}; \ldots; u_{i_n}]\big) \qquad (12)$
after the initialization vector representation of the three types of nodes is obtained, in order to facilitate the calculation of the neural network of the subsequent graph, the invention represents the vectors of all the nodes as { v } Q ,v e ,v K Is called { v } 1 ,v 2 ...v m Where m = K +2.
Transfer and update of node information
Message passing between graph nodes is achieved through two steps, information aggregation and information combination, which can be performed multiple times (commonly referred to as layers or hops). Information aggregation aggregates the semantic interaction information of adjacent nodes within the same layer; information combination updates and combines the information of the same node across different layers.
Information aggregation concerns how a node collects the semantic information of its neighbors. The invention notes that the many neighbors of a node do not all bring equal value to the central node, and after several turns of dialogue some may even introduce semantic noise. Therefore, unlike a common graph neural network, a GRU unit is selected to filter the information content of the adjacent node cluster when updating the nodes at layer $l$, thereby alleviating semantic noise. Specifically, the reset gate $R_t$ in the gating mechanism controls the information flow from a neighbor node $v_j$ to $v_i$:

$a_i^{(l)} = \sum_{r \in R} \frac{1}{|N_i^r|} \sum_{v_j \in N_i^r} \mathrm{GRU}\big(v_i^{(l)}, v_j^{(l)}\big) \qquad (13)$

where $R$ is the set of all edge types, $N_i^r$ is the neighbor cluster of node $v_i$ under edge type $r$, $v_j^{(l)}$ is the node representation of a neighbor $v_j$ at layer $l$, and $|\cdot|$ denotes the size of the adjacent node cluster. The GRU unit defines the conversion process for aggregating adjacency information; the conversion of the neighbor representations is realized by a multi-layer perceptron (MLP). $a_i^{(l)}$ denotes the aggregated information of node $v_i$ at layer $l$. Considering the complex connections and large number of nodes in the graph-domain structure, a residual connection is added, which avoids gradient vanishing while retaining important semantics:

$\tilde{v}_i^{(l)} = f_s\big(a_i^{(l)}\big) + v_i^{(l)} \qquad (14)$

where $f_s$ is realized by a multi-layer perceptron (MLP).
Information combination updates and combines the representations of the same node across layers to obtain the information content after multi-hop cognition. However, research has shown that graph neural networks are prone to an over-smoothing problem during inter-layer reasoning, which makes node representations similar and destroys their discriminative power. To solve this problem, the invention controls the size of the information flow of node $v_i$ from different sources between layer $l$ and layer $l+1$ by adding a gate weight to the information combination:

$g_i^{(l)} = \mathrm{sigmoid}\big(f_g([\tilde{v}_i^{(l)}; v_i^{(l)}])\big) \qquad (15)$

$v_i^{(l+1)} = \eta\big(g_i^{(l)} \odot \tilde{v}_i^{(l)} + (1 - g_i^{(l)}) \odot v_i^{(l)}\big) \qquad (16)$

The sigmoid determines the weight $g_i^{(l)}$ by quantifying the contribution of the different information sources of the same node to the inter-layer information update. In particular, $g_i^{(l)}$ decides, when the information is combined, how much information comes from the original node representation and how much from the updated node representation, similar to a flexible residual mechanism. $\eta(\cdot)$ is the nonlinear activation function Leaky ReLU, $\odot$ denotes element-wise multiplication, and $f_s$, $f_g$ are both realized with single-layer MLPs. After multiple layers of message passing, every node obtains its final updated node representation.
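A PyTorch sketch of one such dual-gated layer (formulas (13)-(16)) follows, with nn.GRUCell standing in for the GRU filtering unit and single-layer linear maps for $f_s$ and $f_g$; it is a simplification that ignores edge types, not a faithful reimplementation of the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualGatedLayer(nn.Module):
    """One message-passing layer: GRU-filtered aggregation (13), residual
    transform (14), and gated inter-layer combination (15)-(16)."""
    def __init__(self, d: int):
        super().__init__()
        self.gru = nn.GRUCell(d, d)     # its reset gate filters neighbor info
        self.f_s = nn.Linear(d, d)
        self.f_g = nn.Linear(2 * d, d)

    def forward(self, v: torch.Tensor, neighbors: dict) -> torch.Tensor:
        agg = torch.zeros_like(v)
        for i, nbrs in neighbors.items():
            if nbrs:
                msgs = self.gru(v[nbrs], v[i].expand(len(nbrs), -1))
                agg[i] = msgs.mean(dim=0)                # formula (13)
        v_tilde = self.f_s(agg) + v                      # formula (14)
        g = torch.sigmoid(self.f_g(torch.cat([v_tilde, v], dim=-1)))  # (15)
        return F.leaky_relu(g * v_tilde + (1 - g) * v)                # (16)

layer = DualGatedLayer(d=16)
v = torch.randn(3, 16)
v_next = layer(v, {0: [1, 2], 1: [0], 2: [0]})
```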
Step 4: performing semantic enhancement on the dialogue text according to the sequential semantic representation and the graph-domain semantic representation to obtain the enhanced semantic representation.

The sequential channel obtains the semantic representation $c^{seq}$ over sequential dialogue data through hierarchical attention and progressive encoder modeling; the graph-domain channel performs multi-hop reasoning over dialogue intent and semantics on the heterogeneous cognitive graph constructed by the invention with the designed dual-gated GNN, establishing multiple medium- and long-distance semantic association representations $\{v_1^{(L)}, v_2^{(L)}, \ldots\}$. The semantic results of the two channels complement each other, and high-level semantic cognition of the whole dialogue context is achieved through information cooperation.

In the graph-domain channel, the semantic representation of each node is obtained through multi-hop reasoning and converges the long-distance information passed between layers. After obtaining the node representations, for the convenience of the decoder and the cooperation with the sequential-channel semantics, a weight $score_i$ is used for the semantic information management of the nodes:

$score_i = \mathrm{softmax}\big(\eta(u_Q, v_i^{(L)})\big) \qquad (17)$

$c^{graph} = \sum_{i=1}^{NumL} score_i \cdot v_i^{(L)} \qquad (18)$

where $u_Q$ is the semantic representation of the dialogue query sentence before it enters either channel; it represents the initial dialogue intent and provides good guidance for the generated reply, so it is used to calculate the information-management weights of the node information; $NumL$ is the total number of semantic nodes.
In the two-channel information cooperation module, the invention also uses a gate mechanism to control the influence of the semantic information of the two channels on the reply-generation decoding flow:

$\delta = \mathrm{sigmoid}\big(f([c^{seq}; c^{graph}])\big) \qquad (19)$

$c_{final} = \delta \cdot c^{seq} + (1 - \delta) \cdot c^{graph} \qquad (20)$

where $\delta$ is the amount of semantic information the sequential channel $c^{seq}$ delivers to the decoder, and $1 - \delta$ is the amount the graph-domain channel $c^{graph}$ delivers. The two parts are added to form the final enhanced semantic representation $c_{final}$. $c_{final}$ has learned the integrated semantics of sequential semantic development and graph-domain semantic association, and its information integration emphasizes the dialogue direction of the query sentence, assisting the decoder to decode new words accurately.
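The two-channel cooperation of formulas (17)-(20) reduces to a few lines; the bilinear scoring and the linear gate below are our stand-ins for the unspecified scoring function $\eta$ and the gate function $f$:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoChannelFusion(nn.Module):
    """Graph-side attention readout (17)-(18) plus the channel gate (19)-(20)."""
    def __init__(self, d: int):
        super().__init__()
        self.score = nn.Bilinear(d, d, 1)   # stand-in for eta(u_Q, v_i)
        self.gate = nn.Linear(2 * d, d)     # stand-in for f([c_seq; c_graph])

    def forward(self, c_seq, node_vecs, u_q):
        s = self.score(u_q.expand_as(node_vecs), node_vecs).squeeze(-1)
        alpha = F.softmax(s, dim=0)                           # formula (17)
        c_graph = (alpha.unsqueeze(-1) * node_vecs).sum(0)    # formula (18)
        delta = torch.sigmoid(self.gate(torch.cat([c_seq, c_graph])))  # (19)
        return delta * c_seq + (1 - delta) * c_graph          # (20) c_final

fuse = TwoChannelFusion(d=16)
c_final = fuse(torch.randn(16), torch.randn(5, 16), torch.randn(16))
```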
Step 5: generating a reply text according to the enhanced semantic representation, which specifically includes:

Step 51: inputting the enhanced semantic representation into a unidirectional gated neural network to obtain the hidden state of each word in the generated reply text.
The decoder module uses a unidirectional GRU to decode and generate the latest hidden state, updating the semantic vector of the whole decoding layer; the resulting hidden state yields the probability distribution over the decoding vocabulary:

$s_i = \mathrm{GRU}(s_{i-1}, y_{i-1}, c_{final}) \qquad (21)$

where $s_i$ is the hidden state of the decoding layer when generating the $i$-th word $y_i$ of the reply text, $c_{final}$ is the semantic representation after two-channel information cooperation, representing the semantic inspiration after the dialogue idea is clarified, and $y_{i-1}$ is the vector representation of the $(i-1)$-th word of the reply text during training; during prediction it is replaced by the vector representation $\hat{y}_{i-1}$ of the $(i-1)$-th predicted word, which guarantees the consistency of the reply text.
Step 52: determining the generation probability of each word according to the hidden state, and determining the reply text according to the generation probability.
When decoding the generated text, the generated reply tends to extend from the topic keywords, so the decoder adds a topic bias probability that forcibly constrains the model to consider topic development; the corresponding generation probability is calculated as:

$p(\hat{y}_i) = p_V(\hat{y}_i) + p_K(\hat{y}_i) \qquad (22)$

where $K$ and $V$ denote the topic keyword vocabulary and the reply text vocabulary, respectively. Correspondingly, all probability values of $p_V$ and $p_K$ are normalized by Softmax:

$p_V(\hat{y}_i) = \mathrm{softmax}_{w \in V}\big(\eta(s_i, y_{i-1}, c_{final})\big) \qquad (23)$

$p_K(\hat{y}_i) = \mathrm{softmax}_{w \in K}\big(\eta(s_i, y_{i-1}, c_{final})\big) \qquad (24)$

where $\eta(\cdot)$ is the nonlinear function tanh. During training, the invention defines $\theta$ as the trainable parameters, divides the training text into batches, and obtains the best model effect by optimizing a cross-entropy loss function based on negative log-likelihood; parameter learning is the process of gradient back-propagation, updating, and descent:

$L(\theta) = -\sum_{i} \log p\big(y_i \mid y_{<i}, c_{final}; \theta\big) \qquad (25)$
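A minimal decoding-step sketch implementing formulas (21)-(24) follows; scattering the topic-keyword probabilities into the full vocabulary and renormalizing is our way of realizing $p_V + p_K$, which the patent leaves implicit:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicBiasedDecoder(nn.Module):
    """One GRU decode step (21) plus vocabulary and topic-keyword
    distributions (22)-(24). Illustrative, not the patent's exact code."""
    def __init__(self, emb_dim, d, vocab_size, topic_ids):
        super().__init__()
        self.gru = nn.GRUCell(emb_dim + d, d)
        self.out_v = nn.Linear(d, vocab_size)          # over vocabulary V
        self.out_k = nn.Linear(d, len(topic_ids))      # over keywords K
        self.register_buffer("topic_ids", torch.tensor(topic_ids))

    def step(self, y_prev_emb, s_prev, c_final):
        s = self.gru(torch.cat([y_prev_emb, c_final], dim=-1), s_prev)  # (21)
        p_v = F.softmax(torch.tanh(self.out_v(s)), dim=-1)              # (23)
        p_k = F.softmax(torch.tanh(self.out_k(s)), dim=-1)              # (24)
        p = p_v.clone()
        p[:, self.topic_ids] = p[:, self.topic_ids] + p_k               # (22)
        # Renormalize so the summed distribution is a valid probability
        # (a detail the patent leaves implicit).
        return p / p.sum(dim=-1, keepdim=True), s

dec = TopicBiasedDecoder(emb_dim=32, d=64, vocab_size=1000, topic_ids=[7, 42])
p, s = dec.step(torch.randn(1, 32), torch.randn(1, 64), torch.randn(1, 64))
```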
the invention provides a neural network model of a fine-grained information interaction method based on topic-enhanced dialogue historical understanding for the first time. On one hand, the model of the invention utilizes the theme semantics and each sentence to carry out fine-grained semantic interaction to obtain the enhanced semantic representation of the historical sentences of the conversation; and on the other hand, the dialog query sentence is used for guiding the topic matrix fusion to obtain the semantic representation of the dialog intention. The operation of the two aspects aims to enhance the understanding of the context by using the subject semantics, thereby breaking through the defect that the topic information is used indiscriminately in the past;
the invention breaks through the sequential structure modeling and guarding thinking of the context content of the conversation history for the first time and provides a model of a collaborative semantic modeling and reasoning method based on a sequence and graph domain dual channel. By using a 'double-tower' model in a recommendation system for reference, the two-channel model can stand at a view field to understand and train the whole conversation context, meanwhile, the two-channel collaborative semantic modeling can maximize the semantic value, and the research thought of a multi-round conversation system is widened.
The multi-round dialogue reply generation method based on two-channel semantic enhancement can be applied to scenarios such as the intelligent customer-service systems of e-commerce platforms, the voice interaction modules of bionic AI robots, and the novel retrieval of portal websites. In addition, the method can be embedded into military information services for the intelligent planning and analysis of scenarios such as battlefield-situation texts and auxiliary command decisions, improving the efficiency and interactive experience of information services.
The method provided by the invention improves the fluency, diversity, and reasonableness of multi-turn dialogue reply generation, improves the robustness of dialogue generation, and optimizes the capability of text semantic modeling.
Hereinafter, the present application will be described in more detail with reference to more specific examples.
Experimental preparation:
1. Research questions
For the Sequential and Graph-domain Dual-Channel collaborative semantic modeling and reasoning model (SGDC) provided by the invention, this embodiment poses the following three research questions to guide the subsequent experiments:
RQ1: is the SGDC model of the present application perform better than other baseline models in fluency, relevance, and diversity?
RQ2: what does the length (number of turns) of the entire dialog context have an impact on the performance of our SGDC model in the generation of multiple dialog replies?
RQ3: in the model decoding of predictive reply text, is the sequential and graph-domain two-pass synergy effect on the overall performance of our SGDC model?
2. Data set
The DailyDialog dataset and MuTual dataset were selected for the experiments in this example.
The DailyDialog dataset was collected from daily life by scholars in the field; it contains 13118 dialogues covering multiple topics such as education, travel, weather, and shopping, and reflects most everyday human communication. Its semantic structure is standard and formal, it has clear topical value, a reasonable number of speakers, and no redundant dialogue turns, giving it high research and application value.
The MuTual dataset is a high-quality, manually annotated multi-turn dialogue reasoning dataset comprising 8,860 annotated dialogues based on English listening-comprehension examinations for Chinese students. MuTual is more challenging for reasoning than previous dialogue benchmark datasets. The two datasets are summarized in Table 2.
Table 2. Dataset information

                                 DailyDialog    MuTual
Number of dialogues              13118          8860
Average number of dialogue turns 7.9            4.73
Average words per sentence       14.6           19.57
3. Baseline models for experimental comparison
In this embodiment, related multi-round dialogue generation models are selected as baseline models and compared with the model of the present application in overall performance, and the experimental results are discussed and analyzed. Brief introductions of the baseline models follow:
S2S-Att: in the most popular encoder-decoder framework, an encoder encodes an input sequence into an intermediate state, and then a decoder is used for decoding and generating, the encoder and the decoder both adopt a gated neural network GRU, and meanwhile, an Attention mechanism is added to the decoder input at each time step, so that the predicted words at each time are the most relevant to the input text.
HRED: the first hierarchical context modeling approach for response generation uses utterance-level GRUs to encode each sentence, and dialog-level GRUs to convert the utterance vector into a dialog-level vector representation. Compared with the common S2S framework, the three-level semantic progression of vocabulary-utterance-dialogue is considered, and the information aggregation and propagation on each level can be assisted, so that the multi-turn dialogue history modeling is realized.
THRED: according to the method, topic perception is introduced firstly in the field of multi-turn dialog generation, and according to the development, topic perception is introduced on the basis of an HRED model, and decoding of guidance reply is performed by using a topic-context joint attention mechanism.
RecoSa: the use of Self-orientation mechanism to associate the most closely related dialog context with reply text is an improved hybrid model of Transformer and HRED, with Attention mechanism embedded in both word-level and utterance-level coders for hierarchical modeling, and is currently the most advanced performance in the field of multi-turn dialog generation.
In addition, in order to explore the influence of a two-channel semantic collaborative mode on the model performance, three baseline models are constructed according to a common semantic feature aggregation technology, and the following are briefly introduced:
-Avg: a two-channel model of a mean aggregation strategy is adopted, and background semantics are concerned;
-Max: a two-channel model of a maximum aggregation strategy is adopted, and foreground semantics are concerned;
-Concat: a double-channel model of a mean aggregation strategy is adopted, and global semantics are concerned.
The aggregation strategy for semantic features is an important strategy used when aggregating several semantic vectors or converting a semantic matrix into a fixed-length vector representation, as shown in FIG. 5. Aggregating semantic features reduces information redundancy, gathers focal semantics, and prevents overfitting during training, similar to the function of a pooling layer in a convolutional neural network. Common aggregation strategies are maximum aggregation, mean aggregation, concatenation aggregation, and gated aggregation.
Maximum aggregation strategy
The maximum aggregation strategy takes the maximum element value of each dimension as the element value of the same dimension in the new semantic vector representation. Under this strategy, the other element values of the same dimension are not passed into the next semantic vector, so the aggregation makes the model attend to the most salient, shallow part of the semantics. It suits text representations with a single vocabulary, simple syntax, and shallow semantics, and can be implemented with a max-pooling layer. The maximum aggregation formula is:

$h_{max} = \max\{h_1, h_2, \ldots, h_i\} \qquad (26)$
mean aggregation strategy
The mean aggregation strategy averages the element values of each dimension as the element values of the same dimension in the new semantic vector representation. It regards the elements of each dimension as having equal semantic value; averaging neutralizes the semantics and reduces deviation of the semantic direction, which can improve the robustness of the model. The deeper the layer in a deep neural network, the richer and more balanced the semantic information, and the more suitable the mean aggregation strategy. The mean aggregation formula is:

$h_{avg} = \frac{1}{i} \sum_{t=1}^{i} h_t \qquad (27)$
join aggregation policy
The join aggregation strategy is an aggregation strategy which is extended by simultaneously taking advantages of mean aggregation and maximum aggregation into consideration. With v is 1 ∈R n And v 2 ∈R m For example, two feature vectors, a common join aggregation strategy operation is to splice vector elements, most notably to increase the dimension to v concat ∈R n+m The connection aggregation is a direct but effective strategy for violently adding the semantic information and maximally retaining the semantic information, and a linear mapping is often used after splicingAnd the radiation is converted to the required number of dimensions. The formula for the linkage polymerization is:
Figure BDA0003849867690000162
gated aggregation strategy
The gating aggregation strategy supports an information cooperation mode with flexible operation, and the input degree of each dimension or vector semantics is controlled through Gate calculation. The specific operation of the gating aggregation strategy is to learn the value of the Gate through an activation function sigmoid and gradient updating in deep learning, and to combine the Gate value and each state sequentially through a Hadamard product (Hadamard), so as to aggregate to obtain a final vector representation. The gating aggregation strategy supports training and updating of deep learning, can confirm important weight of each dimension aggregation according to a neural network, and is suitable for models of complex semantics and reasoning tasks. The formula for gated aggregation is:
g = σ(W_g[v_1; v_2] + b_g),    h_gate = g ⊙ v_1 + (1 − g) ⊙ v_2
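For illustration, a minimal PyTorch sketch contrasting the four aggregation strategies is given below; the tensor shapes and the single-linear-layer sigmoid gate are assumptions made for demonstration, not the exact implementation of this application.

```python
import torch
import torch.nn as nn

class GatedAggregation(nn.Module):
    """Learn a Gate via sigmoid and mix two semantic vectors by Hadamard product."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)  # assumed single-layer gate parameterization

    def forward(self, v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([v1, v2], dim=-1)))
        return g * v1 + (1 - g) * v2         # element-wise (Hadamard) mixing

# h: i semantic vectors of dimension d, stacked into an (i, d) matrix
h = torch.randn(5, 512)

h_max = h.max(dim=0).values                  # maximum aggregation: per-dimension max
h_mean = h.mean(dim=0)                       # mean aggregation: per-dimension average
h_concat = torch.cat([h[0], h[1]], dim=-1)   # join aggregation of two vectors (dim 1024)
h_gate = GatedAggregation(512)(h[0], h[1])   # gated aggregation of two vectors
```

As with a pooling layer, the first two strategies are parameter-free, while the gated strategy introduces learnable weights updated by gradient descent.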
4. Criteria and indicators for experimental evaluation
Automated evaluation of multi-turn dialogue reply generation is mainly considered from the fluency, diversity, and topic relevance of the generated text. Fluency means the text is reasonable and compliant in word grammar and semantic association, i.e., the deep learning model maximizes the probability of part-of-speech sequences and word-collocation distances, meets the requirement that the text be readable and understandable by humans, and admits no ambiguous interpretation. Diversity means the dialogue replies are semantically rich and extensible, rather than meaningless "I don't know" replies, and is a necessary requirement of chit-chat dialogue systems. Topic relevance concerns whether the reply has practical significance and conforms to the topic flow or turns of the conversation scene, and is an important dimension for judging the reply generation task.
Specifically, in order to better evaluate the strengths and weaknesses of the baseline algorithms and the research work of the present application, the fluency, diversity, and relevance of the generated replies are evaluated, following the literature, with the perplexity (PPL), Distinct-1 and Distinct-2, and Embedding-based sentence relevance indexes, respectively.
PPL: following the reference, the fluency of the generated text is evaluated using language-model perplexity (PPL). The lower the PPL value, the higher the probability of the generated reply text, the more reasonable its vocabulary arrangement and collocation, and the easier it is to understand. The formula is as follows:
PPL = exp(−(1/N) Σ_{i=1}^{N} log p(w_i | w_1, ..., w_{i−1}))
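As a concrete illustration (with placeholder log-probabilities; in practice PPL is read off the language model's cross-entropy loss), a minimal sketch:

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """PPL = exp(-(1/N) * sum_i log p(w_i)); lower values indicate more fluent text."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Log-probabilities a model assigned to each token of a generated reply (invented values)
print(perplexity([-1.2, -0.4, -2.1, -0.8]))  # ~= 3.08
```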
distingt: automated evaluation indicators Distingt-1 and Distingt-2 are used herein to evaluate content diversity of generated text. The higher the score value of the Distingt-n is, the higher the proportion of the n-tuple in the sentence is, the generated text is rich in more contents, and the better the reply effect is. The formula is calculated as follows:
Distinct-n = Count(unique n-grams) / Count(total n-grams)
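A minimal sketch of the Distinct-n computation, with an invented example reply:

```python
def distinct_n(tokens: list[str], n: int) -> float:
    """Distinct-n = number of unique n-grams / total number of n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

reply = "i am fine i am happy".split()
print(distinct_n(reply, 1))  # 4 unique unigrams / 6 total = 0.667
print(distinct_n(reply, 2))  # 4 unique bigrams  / 5 total = 0.8
```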
embedding: the method is different from an ngram method in calculating the coincidence or the co-occurrence degree between the prediction and the reality, and the text is transferred to the low-dimensional semantic representation based on the Embedding evaluation method, so that the correlation degree is measured through the text similarity. The present example uses Greedy Matching (GM), embedding Average (EA) and Vector Extreme (VE) for evaluation. The larger the three evaluation index values are, the closer the semantic correlation between the representation predicted text and the real text is, and the more critical the reply is.
Greedy Matching (GM), the embedding greedy metric, uses a greedy search so that the generated words are as semantically similar as possible to the keywords in the real text. It considers word-level alignment at a finer granularity and evaluates long text more accurately. The formula is calculated as follows:
G(r, r̂) = (Σ_{w∈r} max_{ŵ∈r̂} cos(e_w, e_ŵ)) / |r|
GM(r, r̂) = (G(r, r̂) + G(r̂, r)) / 2
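A sketch of greedy matching over pre-computed word vectors follows; how the embeddings are obtained (e.g., from a pretrained embedding table) is outside the sketch and assumed:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_match(ref_vecs, hyp_vecs) -> float:
    """For each reference word, keep its best cosine match among hypothesis words."""
    return sum(max(cosine(r, h) for h in hyp_vecs) for r in ref_vecs) / len(ref_vecs)

def gm(ref_vecs, hyp_vecs) -> float:
    """Symmetrized GM: greedily match in both directions and average the two scores."""
    return 0.5 * (greedy_match(ref_vecs, hyp_vecs) + greedy_match(hyp_vecs, ref_vecs))
```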
Embedding Average (EA), the embedding average metric, is widely used to measure text similarity. Cosine similarity is used to compare the semantic vectors of the predicted and real text, where the semantic vector of a text is computed by averaging the vector representations of its constituent words. The formula is calculated as follows:
ē_r = (Σ_{w∈r} e_w) / ‖Σ_{w∈r} e_w‖,    EA = cos(ē_r, ē_r̂)
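A corresponding sketch for Embedding Average over the same pre-computed word vectors:

```python
import numpy as np

def embedding_average(word_vecs) -> np.ndarray:
    """Sentence embedding: sum of word vectors, normalized to unit length."""
    s = np.sum(word_vecs, axis=0)
    return s / np.linalg.norm(s)

def ea(ref_vecs, hyp_vecs) -> float:
    a, b = embedding_average(ref_vecs), embedding_average(hyp_vecs)
    return float(np.dot(a, b))  # both are unit vectors, so the dot product is the cosine
```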
Vector Extrema (VE), the embedding extremum metric, uses the extreme value of each dimension of the word vectors when computing a text vector, and, as above, uses cosine similarity for comparison. It is worth noting that this evaluation index focuses on information extrema, i.e., topic information, and can therefore be used to measure topic relevance. The formula is calculated as follows:
e_ext(r)[d] = the value of largest magnitude among {e_w[d] : w ∈ r},    VE = cos(e_ext(r), e_ext(r̂))
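And a sketch for Vector Extrema, where each dimension of the sentence vector keeps the word value of largest magnitude:

```python
import numpy as np

def vector_extrema(word_vecs) -> np.ndarray:
    """Per dimension, keep the signed value with the largest absolute magnitude."""
    m = np.asarray(word_vecs)                # shape: (num_words, dim)
    idx = np.abs(m).argmax(axis=0)           # word index of the extremum per dimension
    return m[idx, np.arange(m.shape[1])]

def ve(ref_vecs, hyp_vecs) -> float:
    a, b = vector_extrema(ref_vecs), vector_extrema(hyp_vecs)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```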
Parameter settings and implementation environment
For a fair comparison of the baseline algorithms and the algorithm model of the invention, the embodiment of the invention adopts the Adam optimizer and the PyTorch framework. During training, word embedding vectors are randomly initialized and updated with the model, with a dimension of 512; the hidden dimensions of the inputs and outputs of all recurrent neural network (GRU and BiGRU) units are also 512. The model learning rate is set to 0.0001 and gradient clipping is applied; the number of samples participating in training per iteration (batch size) is set to 64. Training, validation, and prediction are carried out on a workstation with an NVIDIA TITAN RTX GPU.
In addition, the topics in the model of this embodiment are extracted via TF-IDF. In order to accelerate the training process and prevent the error accumulation caused by large model errors early in training, a teacher forcing mechanism is introduced to forcibly replace the decoder input with the target token, thereby reducing error propagation in the model and ensuring that the parameters are updated normally.
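To make these settings concrete, here is a minimal teacher-forced GRU decoder training step under the stated hyperparameters (Adam, learning rate 0.0001, 512-dimensional hidden states); the vocabulary size, module names, and clipping norm are illustrative assumptions, not the exact code of this application:

```python
import torch
import torch.nn as nn

VOCAB, DIM = 30000, 512                      # vocabulary size is an assumption
embed = nn.Embedding(VOCAB, DIM)
decoder = nn.GRU(DIM, DIM, batch_first=True)
proj = nn.Linear(DIM, VOCAB)
params = list(embed.parameters()) + list(decoder.parameters()) + list(proj.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(target: torch.Tensor, h0: torch.Tensor) -> float:
    """Teacher forcing: the decoder input is the ground-truth previous token, not its own prediction."""
    inputs = embed(target[:, :-1])           # shift right: token i is predicted from token i-1
    outputs, _ = decoder(inputs, h0)         # h0: (1, batch, DIM) context state from the encoder
    logits = proj(outputs)
    loss = loss_fn(logits.reshape(-1, VOCAB), target[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, 1.0)  # gradient clipping (norm value assumed)
    optimizer.step()
    return loss.item()
```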
The software and hardware configuration of this work is shown in Table 3 below:
TABLE 3 software and hardware implementation configurations
Analysis and discussion of experimental results:
1. Overall performance compared to the baseline models
To explore the RQ1 problem, this example performed a performance comparison of the SGDC model with the baseline models of the DialogueRNN framework on both the MuTual and DailyDialog datasets, with the fluency, relevance, and diversity evaluation results shown in Table 4, where the best baseline model index values are underlined and the best overall evaluation index values are in bold.
TABLE 4 Performance comparison of the SGDC model with baseline models of the DialogueRNN framework
As can be seen from the experimental results of Table 4:
1. It is obvious that the dual-channel semantic modeling SGDC model of this embodiment is superior to the other baseline models in most evaluations on the two datasets. The clearly winning model performance on both datasets demonstrates the effectiveness of the dual-channel semantic collaborative modeling of the sequential and graph domains, which can obtain semantic associations and reasoning effects that the DialogueRNN framework cannot mine.
2. On the relevance evaluation we find an interesting phenomenon: taking the MuTual dataset as an example, the SGDC model achieves higher scores on all three relevance indexes, with VE and EA scores about 2% higher and the GM score 3.5% higher than the best baseline model, and VE, EA, and GM scores 5% higher than all other baseline models. The same holds on the DailyDialog dataset, but the improvement is not as great as on MuTual. To explain this gap in relevance, it is important to understand the nature of these models and datasets: the MuTual dataset is more rigorously annotated and focuses on multi-turn dialogue reasoning, while the SGDC model mines semantic relations from different structural perspectives; the sequential channel can capture general progressive semantics, and the graph domain channel can obtain long-distance information dependence across distance barriers through the edge connections in multi-turn dialogue. The collaborative modeling advantage of this dual channel is clearly superior to the other baseline models of the DialogueRNN framework.
3. The score of this embodiment is significantly higher than that of ReCoSa, perhaps because the graph domain channel of the invention is constructed on the relations of "topic-sentence cluster" nodes, and semantic reasoning on a heterogeneous graph is more effective than ReCoSa's prior mechanism that only attends to question-answer correspondence. This also shows that the model of the invention can accurately perceive the topic direction of a conversation, so it obtains similarity scores closer to the real reply text and keeps the conversation on topic.
4. Surprisingly, the difference between the model of the invention and the other reference models is not large on the diversity scores Dist-1 and Dist-2, probably because the semantic modeling of the invention pursues semantic relations strongly and also uses topic bias probabilities, thus losing a certain amount of text diversity. This is a point that dialogue generation models find difficult to balance; it is worth noting, however, that from the perspective of scene application, controllable diversity can be traded for more closely relevant replies.
2. Effect of dialogue context length on performance
To explore the RQ2 problem, this example analyzed the performance of the SGDC model and the baseline models on test samples with different context lengths, i.e., different numbers of turns in a multi-turn conversation. This example manually divides the sampled test set into three groups according to the dialogue context length: short conversations (fewer than 5 turns), medium conversations (6 to 10 turns), and long conversations (more than 10 turns). The performance of each model was evaluated using some of the objective evaluation metrics, and the results are plotted in Fig. 6, where the three histograms for each model refer, from left to right, to short, medium, and long conversations.
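As an illustration of the grouping, a minimal sketch (the dialogue data structure and the handling of the boundary case of exactly 5 turns are assumptions):

```python
def bucket_by_length(dialogues):
    """Group test dialogues into short (<5 turns), medium (6-10), and long (>10)."""
    groups = {"short": [], "medium": [], "long": []}
    for d in dialogues:                       # d["turns"]: list of utterances (assumed schema)
        n = len(d["turns"])
        key = "short" if n < 5 else "medium" if n <= 10 else "long"
        groups[key].append(d)                 # dialogues with exactly 5 turns fall into "medium" here
    return groups
```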
From the experimental results it can be seen that:
1. Whether for the SGDC model or the baseline models, as the dialogue length increases, the perplexity score rises monotonically to different degrees. This means that the longer the dialogue, the more complicated the semantic modeling of the dialogue and the harder it is to capture information associations; the model is easily affected by irrelevant semantic noise, resulting in reduced prediction ability.
2. The SGDC model of the invention performs better than the baseline models on the short, medium, and long conversation test sets alike, demonstrating the robustness of the model of the invention. In addition, compared with the best reference model, SGDC improves most in long dialogues, which shows that the graph domain channel of SGDC plays its role of capturing long-distance semantic dependence and exhibits a unique capability advantage in the semantic modeling of long dialogues.
3. Effect of the dual-channel semantic collaboration mode on performance
To explore the RQ3 problem, this embodiment varies the dual-channel information cooperation mode in the SGDC model and designs three variant models to explore the aggregation strategy that is optimal for the semantic information of the two channels. The three variant models are:
· SGDC-Avg: the SGDC-Avg model selects the mean aggregation strategy; this cooperation mode assumes that the semantic information in the two channels is of equal value, so the semantic representation of the whole context is obtained by average-pooling the two semantic vectors;
· SGDC-Max: the SGDC-Max model emphasizes selecting the most important semantic features and adopts the maximum aggregation strategy; this cooperation mode assumes that the maximum values in a semantic vector representation reflect the important semantics, so the semantic representation of the whole context is obtained by max-pooling the two semantic vectors;
· SGDC-Concat: the SGDC-Concat model considers the semantic information of the two channels to be equally important and not to be damaged, so the semantic representation of the whole context is obtained by directly concatenating the two semantic vectors.
TABLE 5 Effect of the cooperation mode on performance
For ease of comparison, we refer to the model used in the above experiments as SGDC-Gate. Table 5 reports the generation performance of SGDC-Gate and its three variants on the MuTual and DailyDialog datasets; this embodiment selects the semantic relevance evaluation indexes (GM/EA/VE). From the differences of the evaluation indexes in Table 5, it can be found that:
① SGDC-Gate is significantly superior to SGDC-Avg and SGDC-Max on the three semantic relevance indexes. This shows that the semantic information of the two channels has its own uniqueness and importance, and only by finding the optimal aggregation strategy through the Gate mechanism can the benefit of semantic information aggregation be maximized. The mean aggregation of SGDC-Avg may cause the important semantics of each channel to be lost, while the maximum aggregation strategy of SGDC-Max focuses too much on locally salient semantics and struggles to capture the balance of semantic relevance, thus losing the overall semantic relevance of the reply text.
② SGDC-Gate differs little from SGDC-Concat on the EA and VE evaluations, but is clearly superior to SGDC-Concat on GM. This phenomenon can be explained by the details of the evaluation indexes. EA and VE measure the mean and extreme levels, respectively, of the embedding similarity between predicted and real text words. SGDC-Gate finds the equilibrium point of aggregation through the Gate mechanism, whereas SGDC-Concat, through direct concatenation, captures long-distance dependence via the semantic enhancement of the graph domain channel; the two are respectively a precise solution and a brute-force solution, so they differ little at the mean and extreme-value levels of the predicted text. The GM value, however, considers not only the similarity of word embeddings but also the alignment between words, and thus is a finer-grained evaluation item that reflects the advantage of the Gate mechanism: SGDC-Gate can filter semantic noise at a finer granularity than SGDC-Concat.
Drawing on the rich research ideas of graph neural network technology, the invention provides a collaborative semantic modeling and reasoning method based on sequential and graph domain dual channels, and designs a dual-gated graph neural network for the nodes on the graph. In addition, the model of the invention is experimentally verified on both an open-domain dataset and a dialogue reasoning dataset; the experimental results show the advantages of the dual-channel collaborative semantic modeling and reasoning method on each evaluation item, and the model of the invention maintains good robustness as the number of dialogue turns increases.
The invention starts from the analysis and reuse of content characteristics and the dual-channel enhancement of structural characteristics, and the proposed model scheme has good practical application value:
(1) Generative dialogue methods typically consider only the progressive semantics of the sequential structure and ignore strong long-distance context associations. To effectively and comprehensively utilize context information, the invention adopts the most direct method: breaking the sequential structure and, imitating the back-and-forth way humans reconsider earlier utterances during conversation, designing a heterogeneous graph over the context information that enables cognitive reasoning. This idea of breaking before rebuilding can expand the effectiveness of the research.
(2) In the information transmission process of node updating in the graph neural network representation, the invention designs a dual-Gate GNN, which filters and screens the transmitted information through a GRU unit for information aggregation and a Gate mechanism for information combination. This design can effectively filter semantic noise and capture key semantic information, and is a new attempt in the field of dialogue generation.
A second aspect of the invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the steps of the above method are implemented when the processor executes the computer program.
The invention provides a sequential and graph domain dual-channel collaborative semantic modeling and reasoning method, which aims to fuse the semantic advantages of modeling under different structures and obtain larger-span information association and semantic reasoning. In detail, on the one hand, the method constructs a dialogue-level heterogeneous cognitive graph, where graph nodes integrate topic semantics and sentence-cluster semantics and edges encode the degree of topic coincidence between sentences; a dual-gated graph neural network is then used for deep learning to obtain the semantic representation of the dialogue context on the graph domain. On the other hand, a hierarchical attention mechanism is embedded in the retained sequential channel to obtain the sequential semantic representation of the dialogue context. Finally, the information contributions of the two semantic representations are coordinated for prediction. The model of the invention outperforms the reference models and relieves the long-distance semantic dependence problem.
The foregoing is illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make numerous possible variations and modifications, or modify them into equivalent embodiments, using the methods and teachings disclosed above, without departing from the scope of the invention. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. A multi-round dialog reply generation method based on dual-channel semantic enhancement is characterized by comprising the following steps:
acquiring an initial word vector of a dialog text;
acquiring a sequential semantic representation of the initial word vector, including acquiring an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, and marking the dialogue-level sentence semantic vector as the sequential semantic representation;
obtaining the graph domain semantic representation of the initial word vector on the graph domain;
performing semantic enhancement on the dialog text according to the sequential semantic representation and the graph domain semantic representation to obtain enhanced semantic representation;
and generating a reply text according to the enhanced semantic representation.
2. The method as claimed in claim 1, wherein obtaining the utterance-level sentence semantic vector of the initial word vector comprises:
inputting the initial word vector into a sentence-layer encoder and a word attention module in sequence to obtain the utterance-level sentence semantic vector;
determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, specifically comprising:
inputting the utterance-level sentence semantic vector into a context encoder and a sentence attention module in sequence to obtain the dialogue-level sentence semantic vector;
the sentence layer encoder and the context encoder are both bidirectional gated neural networks;
the word attention module and the sentence attention module both use an attention mechanism.
3. The method as claimed in claim 1, wherein obtaining a domain semantic representation of the initial word vector on a domain comprises:
obtaining the topic keywords of the initial word vector, and determining nodes of a heterogeneous cognitive map according to the topic keywords, wherein the nodes comprise topic-sentence cluster nodes, conversation query nodes and common nodes;
determining the edges of the heterogeneous cognitive map and the weight of each edge according to the nodes of the heterogeneous cognitive map, wherein the weight is determined according to the coincidence degree of the topics among the sentences in the dialog text corresponding to the initial word vector;
and learning the vector representation of the nodes in the heterogeneous cognitive map by using a map neural network to obtain a map domain semantic representation of the initial word vector on a map domain.
4. The method as claimed in claim 1, wherein semantically enhancing said dialog text based on said sequential semantic representation and said graph domain semantic representation comprises:
performing semantic enhancement on the dialog text according to a first formula, wherein the first formula is as follows:
c_final = δ · c_seq + (1 − δ) · c_graph
in the formula, c_final is the enhanced semantic representation, c_seq is the sequential semantic representation, c_graph is the graph domain semantic representation, δ is the weight of the sequential semantic representation in the enhanced semantic representation, and (1 − δ) is the weight of the graph domain semantic representation.
5. The method of any of claims 1-4, wherein generating a reply text based on the enhanced semantic representation comprises:
inputting the enhanced semantic representation into a one-way gating neural network to obtain a hidden state of each word in a generated reply text;
and determining the generation probability of each word according to the hidden state, and determining a reply text according to the generation probability.
6. The method as claimed in claim 5, wherein inputting the enhanced semantic representation into a one-way gated neural network to obtain the hidden state of each word in the generated reply text comprises:
generating the hidden state of each word in the reply text according to a second formula, wherein the second formula is as follows:
h_{y_i} = GRU(h_{y_{i−1}}, y_{i−1}, c_final)
in the formula, y_i is the ith word generated in the reply text in the training phase, y_{i−1} is the (i−1)th word generated in the reply text in the training phase, h_{y_i} is the hidden state of y_i, h_{y_{i−1}} is the hidden state of y_{i−1}, GRU(·) indicates that its parameters are input into the gated neural network, and c_final is the enhanced semantic representation.
7. The method of claim 5, wherein determining a probability of generation for each word based on the hidden state comprises:
determining the generation probability of each word according to a third formula, wherein the third formula is as follows:
p(ŷ_i) = p_K(ŷ_i) + p_V(ŷ_i)
in the formula, ŷ_i is the ith word generated in the reply text in the prediction phase, p(ŷ_i) is the generation probability of ŷ_i, and p_K(ŷ_i) and p_V(ŷ_i) are the generation probabilities of ŷ_i over the topic keyword vocabulary and the reply text vocabulary, respectively.
8. The method of claim 7, wherein p_V(ŷ_i) is determined according to a fourth formula, the fourth formula being:
p_V(ŷ_i) = softmax_{vocab∈V}( η( W_V [h_{y_i}; y_{i−1}; c_final] ) )
where η(·) is the non-linear function tanh, V is the reply text vocabulary, K is the topic keyword vocabulary, h_{y_i} is the hidden state of the ith word y_i generated in the reply text in the training phase, y_{i−1} is the (i−1)th word generated in the reply text in the training phase, c_final is the enhanced semantic representation, vocab is the index variable of the softmax, and W_V is a learnable parameter matrix.
9. The method of claim 7, wherein p_K(ŷ_i) is determined according to a fifth formula, the fifth formula being:
p_K(ŷ_i) = softmax_{vocab∈K}( η( W_K [h_{y_i}; y_{i−1}; c_final] ) )
where η(·) is the non-linear function tanh, V is the reply text vocabulary, K is the topic keyword vocabulary, h_{y_i} is the hidden state of the ith word y_i generated in the reply text in the training phase, y_{i−1} is the (i−1)th word generated in the reply text in the training phase, c_final is the enhanced semantic representation, vocab is the index variable of the softmax, and W_K is a learnable parameter matrix.
10. A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor implements the steps of the method according to any of claims 1 to 9 when executing said computer program.
CN202211128307.0A 2022-09-16 2022-09-16 Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment Pending CN115495552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211128307.0A CN115495552A (en) 2022-09-16 2022-09-16 Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211128307.0A CN115495552A (en) 2022-09-16 2022-09-16 Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment

Publications (1)

Publication Number Publication Date
CN115495552A true CN115495552A (en) 2022-12-20

Family

ID=84467864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211128307.0A Pending CN115495552A (en) 2022-09-16 2022-09-16 Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment

Country Status (1)

Country Link
CN (1) CN115495552A (en)


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115879422A (en) * 2023-02-16 2023-03-31 之江实验室 Dialog reply generation method, device and storage medium
CN116361490A (en) * 2023-06-02 2023-06-30 中国传媒大学 Entity and relation extraction method, system and electronic equipment based on graph neural network
CN116361490B (en) * 2023-06-02 2023-08-22 中国传媒大学 Entity and relation extraction method, system and electronic equipment based on graph neural network
CN117828072A (en) * 2023-11-06 2024-04-05 中国矿业大学(北京) Dialogue classification method and system based on heterogeneous graph neural network
CN117234341A (en) * 2023-11-15 2023-12-15 中影年年(北京)文化传媒有限公司 Virtual reality man-machine interaction method and system based on artificial intelligence
CN117234341B (en) * 2023-11-15 2024-03-05 中影年年(北京)科技有限公司 Virtual reality man-machine interaction method and system based on artificial intelligence
CN117290491A (en) * 2023-11-27 2023-12-26 语仓科技(北京)有限公司 Aggregation retrieval enhancement-based large-model multi-round dialogue method, system and equipment
CN117892735A (en) * 2024-03-14 2024-04-16 中电科大数据研究院有限公司 Deep learning-based natural language processing method and system
CN117892735B (en) * 2024-03-14 2024-07-02 中电科大数据研究院有限公司 Deep learning-based natural language processing method and system
CN118569338A (en) * 2024-08-02 2024-08-30 国泰新点软件股份有限公司 Data proportioning method, device and equipment for vertical domain large model pre-training
CN118657223A (en) * 2024-08-22 2024-09-17 天津大学 Target-oriented dialogue method and model based on knowledge-graph bidirectional reasoning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination