CN115495552A - Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment - Google Patents
Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment

- Publication number: CN115495552A
- Application number: CN202211128307.0A
- Authority: CN (China)
- Prior art keywords: semantic, vector, word, representation, semantic representation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F16/3329 — Information retrieval; natural language query formulation or dialogue systems
- G06F16/3344 — Query execution using natural language analysis
- G06F40/30 — Handling natural language data; semantic analysis
- G06N3/02 — Computing arrangements based on biological models; neural networks
- G06N3/08 — Learning methods
Abstract
The invention discloses a multi-round dialogue reply generation method based on two-channel semantic enhancement, and a terminal device. The method comprises: obtaining an initial word vector of a dialogue text; obtaining a sequential semantic representation of the initial word vector, which includes obtaining an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector from the utterance-level sentence semantic vector, and denoting the dialogue-level sentence semantic vector as the sequential semantic representation; obtaining a graph-domain semantic representation of the initial word vector on the graph domain; performing semantic enhancement on the dialogue text according to the sequential semantic representation and the graph-domain semantic representation to obtain an enhanced semantic representation; and generating a reply text according to the enhanced semantic representation. The invention aims to fuse the semantic advantages of different structural modelings and to obtain longer-span information association and semantic reasoning. The model of the invention outperforms baseline models and alleviates the long-distance semantic dependence problem.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a multi-round dialogue reply generation method based on two-channel semantic enhancement, and a terminal device.
Background
With the rise of the Internet of Everything and human-machine interaction, dialogue systems, as widely applied communication media, have penetrated deeply into scenarios such as intelligent customer service, AI smart speakers, and smart cockpits. Because of their great advantages in improving the information-service experience and assisting voice-command interaction, dialogue systems have great research and application value, with multi-turn dialogue systems the most prominent among them. Multi-round dialogue reply generation is a generative dialogue task focusing on continuous dialogue and complex semantic interaction; it produces meaningful, diverse, and fluent replies to a user according to the interaction text between the user and an agent within a certain period of time, and has attracted growing attention from researchers in recent years.
Most current intelligent dialogue systems are developed on end-to-end deep neural network technology. As application scenarios proliferate, the replies of such dialogue systems often fail to progress over time, take a single form, and lack scenario-specific value. Although research on continuously interactive multi-turn dialogue systems has improved response quality by introducing common knowledge or fixed sentence patterns, the main challenge remains to model the context effectively and obtain an accurate semantic representation.
In the prior art, in the process of extracting semantic information, the limitations of the model structure and the long sequence structure of the conversation history make it difficult to obtain accurate query information; semantic noise is easily introduced into reply generation, producing unsatisfactory responses with poor robustness.
Disclosure of Invention
The invention provides a multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment, and solves the technical problems of poor reply quality and poor robustness of a multi-round dialogue reply method in the prior art.
The first aspect of the invention discloses a multi-round dialog reply generation method based on two-channel semantic enhancement, which comprises the following steps:
acquiring an initial word vector of a dialog text;
acquiring a sequential semantic representation of the initial word vector, including acquiring an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, and denoting the dialogue-level sentence semantic vector as the sequential semantic representation;
obtaining the graph-domain semantic representation of the initial word vector on the graph domain;
performing semantic enhancement on the dialogue text according to the sequential semantic representation and the graph-domain semantic representation to obtain an enhanced semantic representation;
and generating a reply text according to the enhanced semantic representation.
Preferably, obtaining the utterance-level sentence semantic vector of the initial word vector specifically includes:
inputting the initial word vectors into a sentence-layer encoder and a word attention module in sequence to obtain the utterance-level sentence semantic vector;
determining the dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector specifically includes:
inputting the utterance-level sentence semantic vector into a context encoder and a sentence attention module in sequence to obtain the dialogue-level sentence semantic vector;
the sentence layer encoder and the context encoder are both bidirectional gated neural networks;
the mechanisms used in the word attention module and the sentence attention module are both attention mechanisms.
Preferably, obtaining the graph-domain semantic representation of the initial word vector on the graph domain specifically includes:
obtaining the topic keywords of the initial word vector, and determining the nodes of a heterogeneous cognitive graph according to the topic keywords, wherein the nodes comprise topic–sentence-cluster nodes, a dialogue query node, and a common node (a cluster of dialogue sentences without a topic);
determining the edges of the heterogeneous cognitive graph and the weight of each edge according to the nodes, wherein each weight is determined by the degree of topic overlap between sentences in the dialogue text corresponding to the initial word vector;
and learning the vector representations of the nodes in the heterogeneous cognitive graph with a graph neural network to obtain the graph-domain semantic representation of the initial word vector on the graph domain.
Preferably, performing semantic enhancement on the dialogue text according to the sequential semantic representation and the graph-domain semantic representation specifically includes:
performing semantic enhancement on the dialogue text according to a first formula, the first formula being:

$$c_{final} = \delta \cdot c_{seq} + (1-\delta) \cdot c_{graph}$$

where $c_{final}$ is the enhanced semantic representation, $c_{seq}$ is the sequential semantic representation, $c_{graph}$ is the graph-domain semantic representation, $\delta$ is the proportion of semantic information taken from the sequential semantic representation, and $(1-\delta)$ is the proportion taken from the graph-domain semantic representation.
Preferably, generating a reply text according to the enhanced semantic representation includes:
inputting the enhanced semantic representation into a unidirectional gated neural network to obtain the hidden state of each word in the generated reply text;
and determining the generation probability of each word according to the hidden state, and determining a reply text according to the generation probability.
Preferably, inputting the enhanced semantic representation into the unidirectional gated neural network and obtaining the hidden state of each word in the generated reply text specifically includes:
generating the hidden state of each word in the reply text according to a second formula, the second formula being:

$$s_i = \mathrm{GRU}\big(s_{i-1}, [\,y_{i-1};\, c_{final}\,]\big)$$

where $y_i$ is the $i$-th word of the reply text generated in the training phase, $y_{i-1}$ is the $(i-1)$-th word, $s_i$ is the hidden state of $y_i$, $s_{i-1}$ is the hidden state of $y_{i-1}$, $\mathrm{GRU}(\cdot)$ denotes inputting the parameters into the gated neural network, and $c_{final}$ is the enhanced semantic representation.
Preferably, determining the generation probability of each word according to the hidden state specifically includes:
determining the generation probability of each word according to a third formula, the third formula being:

$$p(\hat{y}_i) = p_V(\hat{y}_i) + p_K(\hat{y}_i)$$

where $\hat{y}_i$ is the $i$-th word of the reply text generated in the prediction phase, $p(\hat{y}_i)$ is its generation probability, and $p_V(\hat{y}_i)$ and $p_K(\hat{y}_i)$ are respectively the generation probabilities of the $i$-th word over the reply-text vocabulary and the topic-keyword vocabulary in the prediction phase.

Preferably, $p_V(\hat{y}_i)$ and $p_K(\hat{y}_i)$ are determined by Softmax normalization over a nonlinear projection, for example

$$p_V(\hat{y}_i) = \mathrm{softmax}\big(\eta(s_i, y_{i-1}, c_{final})\big)$$

where $\eta(\cdot)$ is the nonlinear function tanh, $V$ is the reply-text vocabulary, $K$ is the topic-keyword vocabulary, $s_i$ is the hidden state of the $i$-th word $y_i$ of the reply text generated in the training phase, $y_{i-1}$ is the $(i-1)$-th word, and $c_{final}$ is the enhanced semantic representation.
A second aspect of the invention discloses a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses a sequential and image domain two-channel collaborative semantic modeling and reasoning method, and aims to fuse semantic advantages in modeling of different structures and obtain information association and semantic reasoning with larger span. In detail, on one hand, a dialogue-level heterogeneous cognitive map is constructed, map nodes are the integration of subject semantics and sentence cluster semantics, the edges in the map are the degree of subject coincidence among sentences, and then a double-gated map neural network is used for deep learning to obtain semantic representation of a dialogue context on a map domain; embedding a hierarchical attention mechanism in the retained sequential channels, on the other hand, results in a sequential semantic representation of the dialog context. And finally, coordinating the information contributions of the two semantic representations to predict. The model of the invention has excellent performance on a reference model and relieves the long-distance semantic dependence problem.
The method is helpful for promoting the further development of multi-turn dialog generation, helping the system to better understand the context high-level semantic information, also being capable of retrieving new cognition from the reconstructed heterogeneous cognitive map structure, helping to generate diversified and valuable information, having good robustness and improving the satisfaction degree and efficiency of the user in using information service.
Drawings
FIG. 1 is a schematic flow chart of a method for generating a multi-turn dialog reply based on two-channel semantic enhancement according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of a multi-round dialogue reply generation method based on two-channel semantic enhancement according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a graph neural network according to the present invention;
FIG. 4 is a node semantic representation encoding diagram according to an embodiment of the present invention;
FIG. 5 is an aggregation strategy for semantic features according to an embodiment of the present invention;
FIG. 6 is a graph of performance of the SGDC model and the baseline model in accordance with an embodiment of the present invention on test samples of different context lengths. In FIG. 6, (a) is the PPL value of the dataset DailyDialog, (b) is the PPL value of the dataset MuTual, (c) is the Dist-2 value of the dataset DailyDialog, (d) is the Dist-2 value of the dataset MuTual, (e) is the EA value of the dataset DailyDialog, and (f) is the EA value of the dataset MuTual.
Detailed Description
The following detailed description of the embodiments of the invention will be made with reference to the accompanying drawings. It is to be understood that the following examples are only illustrative and explanatory of the present invention and should not be construed as limiting the scope of the present invention. All the technologies realized based on the above-mentioned contents of the present invention are covered in the protection scope of the present invention.
The generation of multi-turn dialogue replies is essentially a sequence-to-sequence prediction problem. The invention formulates multi-round dialogue reply generation as a task belonging to natural language text generation. A multi-turn dialogue sequence Diag comprises M > 2 turns of utterances, with m ∈ (1, M] denoting the m-th turn; the dialogue sequence can be defined as Diag = {U_1, ..., U_{M-1}, U_M}, where the Dialogue History utterance sequence (U_1, ..., U_{M-2}) represents the contextual information of the entire dialogue, the Dialogue Query utterance U_{M-1} represents the current state of dialogue progress, and U_M is the Target Response to be generated by the multi-turn dialogue reply generation task of the present invention.
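To make the partition concrete, the following is a minimal Python sketch of the (history, query, target) split defined above; the function and variable names are illustrative, not the patent's notation.

```python
def split_dialogue(diag):
    """Partition Diag = [U_1, ..., U_M] (M > 2) into (history, query, target)."""
    assert len(diag) > 2, "a multi-turn dialogue needs M > 2 utterances"
    history = diag[:-2]   # U_1 ... U_{M-2}: global context of the conversation
    query = diag[-2]      # U_{M-1}: current state of dialogue progress
    target = diag[-1]     # U_M: the response the model must generate
    return history, query, target

history, query, target = split_dialogue([
    "Hi, any plans for the weekend?",
    "I was thinking of hiking, weather permitting.",
    "The forecast says it may rain on Saturday.",
    "Then let's go on Sunday instead.",
])
```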
Most current intelligent dialogue systems are developed on end-to-end deep neural network technology. As application scenarios proliferate, the replies of such dialogue systems often fail to progress over time, take a single form, and lack scenario-specific value. Although research on continuously interactive multi-turn dialogue systems has improved response quality by introducing common knowledge or fixed sentence patterns, the main challenge remains to model the context effectively and obtain an accurate semantic representation. The currently most common hierarchical codec framework ignores the fact that a dialogue is generated as a coherent process in which any two utterances are semantically related, complementing or constraining each other. When each utterance is encoded separately without regard to these inherent relationships, a hierarchical model may fail to capture the coherence of utterances in context and ultimately produce unsatisfactory responses. Therefore, an encoder-decoder framework based on a hierarchical model still needs to screen the contribution degrees of different sentences and encode the contextual semantics of the conversation history differentially.
In multi-round dialogue reply generation, topic information consists of high-level semantic features extracted from the dialogue history, and integrating topic semantics lets a model improve the informativeness and topical relevance of its responses. However, although existing research introduces or selects topics to alleviate semantic sparsity, the vector representation of a topic does not account for semantic interaction with the specific dialogue context in which it occurs; given the inherent ambiguity of natural language, such context-free representations can yield inaccurate topic and utterance representations, thereby impairing the effect of response generation.
Vector representation learning for multi-round dialogue text input has gradually stabilized from unordered bag-of-words structures to sequence structures, and semantic modeling has likewise developed from machine learning methods to deep learning methods represented by recurrent neural networks and attention mechanisms; yet, owing to the sequence-structured learning mode of these neural networks, semantic modeling still struggles with the long-distance dependence problem. With the wide application of graph neural networks in various NLP subtasks, the multi-turn dialogue reply generation task likewise urgently needs to break the limitation of non-Euclidean space, deeply explore the graph structure latent in its own input, and further model the context to assist the current sequence-structure modeling.
A first aspect of the present invention provides a method for generating a multi-round dialogue reply based on two-channel semantic enhancement, as shown in fig. 1 and fig. 2, including:
Step 1: acquiring an initial word vector of a dialogue text.
Step 2: acquiring a sequential semantic representation of the initial word vector, including acquiring an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, and denoting the dialogue-level sentence semantic vector as the sequential semantic representation.
The main purpose of this step is to model the dialogue text's context as it develops in temporal order through semantic analysis of the sequential channel, and to form a semantic vector representation of the dialogue context in the sequential channel. To reduce the loss of important semantics over the course of a multi-turn conversation, this step adds attention to important semantics at different granularities within a hierarchical encoder. The hierarchical encoder framework includes a sentence-level encoder and a context encoder, and the hierarchical attention includes word attention and sentence attention.
The step 2 specifically comprises the following steps:
and step 21, inputting the initial word vectors into a sentence layer encoder and a word attention module in sequence to obtain the utterance level sentence semantic vectors. The sentence layer encoder is a bidirectional gating neural network, and the mechanism used by the word attention module is an attention mechanism.
The sentence-level encoder and word attention perform utterance-level sentence semantic vector learning on the initialized word-vector representation of the dialogue context, as in equation (1), taking sentence $U_i$ as an example:

$$\overrightarrow{h}_{j,i} = \overrightarrow{\mathrm{GRU}}(\overrightarrow{h}_{j-1,i}, w_{j,i}), \quad \overleftarrow{h}_{j,i} = \overleftarrow{\mathrm{GRU}}(\overleftarrow{h}_{j+1,i}, w_{j,i}), \quad h_{j,i} = [\overrightarrow{h}_{j,i}; \overleftarrow{h}_{j,i}], \quad j \in [1, Num_i] \tag{1}$$

where $\overrightarrow{\mathrm{GRU}}$ and $\overleftarrow{\mathrm{GRU}}$ form a Bidirectional Gated Recurrent Unit (BiGRU), $Num_i$ is the total number of words in sentence $U_i$, $\overrightarrow{h}_{j,i}$ and $\overleftarrow{h}_{j,i}$ are the adjacent hidden-state outputs produced while the bidirectional gated neural network learns word $x_{j,i}$, $w_{j,i}$ is the initial vector of the word $x_{j,i}$ at the $j$-th position of sentence $U_i$, and $h_{j,i}$ is the latest vector representation of $x_{j,i}$, i.e. the hidden-state output after the bidirectional gated neural network has learned it.

Unlike a common hierarchical-architecture encoder, the invention does not take the last hidden state $h_{Num_i,i}$ as the semantic vector of sentence $U_i$; instead, at decoding step $t$ it computes the similarity between $s_{t-1}$ and the hidden-state sequence $\{h_{1,i}, \ldots, h_{Num_i,i}\}$ to determine the weight $\alpha_{j,i}$ attached to each hidden state, and the weighted sum is the semantic vector $u_i$ of $U_i$, as in equations (2) and (3):

$$\alpha_{j,i} = \frac{\exp\big(\eta(s_{t-1}, h_{j,i})\big)}{\sum_{k=1}^{Num_i} \exp\big(\eta(s_{t-1}, h_{k,i})\big)} \tag{2}$$

$$u_i = \sum_{j=1}^{Num_i} \alpha_{j,i}\, h_{j,i} \tag{3}$$

where $\eta(\cdot)$ denotes the ReLU function, which saves computation and alleviates overfitting and gradient vanishing. The invention thereby obtains the utterance-level sentence semantic vector sequence $\{u_1, u_2, \ldots, u_m\}$.
Step 22: inputting the utterance-level sentence semantic vectors into the context encoder and the sentence attention module in sequence to obtain the dialogue-level sentence semantic vector, where the context encoder is a bidirectional gated neural network and the mechanism used by the sentence attention module is an attention mechanism.
The context encoder and sentence attention perform dialogue-level sentence semantic vector learning on the utterance-level sentence semantic vectors, analogously to the calculation above:

$$H_i = \mathrm{BiGRU}(H_{i-1}, u_i), \qquad \beta_i = \frac{\exp\big(\eta(s_{t-1}, H_i)\big)}{\sum_{k} \exp\big(\eta(s_{t-1}, H_k)\big)}, \qquad c_{seq} = \sum_{i} \beta_i\, H_i$$

where $\eta(\cdot)$ denotes the ReLU function, $\mathrm{BiGRU}$ is the bidirectional gated recurrent unit, $H_i$ is the output-layer hidden-state vector obtained by the context encoder for utterance vector $u_i$, $\beta_i$ is the attention weight of sentence $U_i$, and $c_{seq}$ is the dialogue-level sentence semantic vector obtained after sentence-attention weighting. $c_{seq}$, obtained by weighting and aggregating each hidden state of the context encoder against the decoding state, is the final semantic representation of the whole dialogue context in the sequential channel and can be adjusted according to the real-time application of the text. The invention refers to this dialogue-level sentence semantic vector as the sequential semantic representation; it is the final semantic vector of the dialogue context after complex semantic interactive learning in the sequential channel, and an important reference and input for the decoder.
Step 3: obtaining the graph-domain semantic representation of the initial word vector on the graph domain.
The main purpose here is to model the medium- and long-distance semantic associations of the dialogue text across the time sequence through explicit–implicit semantic analysis in the graph-domain channel, and to learn the semantic vector representation of the dialogue context in that channel. First a graph is constructed from the explicit–implicit relations in the dialogue context, then a novel graph neural network is designed to learn the vector representations of the nodes, and finally the resulting semantic vector representation is obtained through pooling calculation. To reduce the loss of important semantics over the course of a multi-turn conversation, the invention designs a dual-gated filtering mechanism within the traditional graph neural network layer; this dual-gated mechanism reduces semantic noise when node information is updated.
Compared with the simplest fully connected neural network (MLP), graph neural network technology updates node information over a graph-domain structure, adding an adjacency matrix A for aggregation calculation alongside the weight matrix, as shown in fig. 3. In the research literature there are three general classes of graph neural networks:
Graph Convolutional Neural Networks (Graph Convolution Networks, GCN)
Graph convolutional neural networks (GCNs) are divided into spectral-domain-based and spatial-based GCNs; the latter are the most widely used, so the invention focuses on spatial GCNs (hereinafter abbreviated GCN). Analogous to conventional CNN convolution over Euclidean data, a GCN convolves according to the node relations in graph-domain data: the representation of a central node and the representations of its neighbors are aggregated along the edges to update the central node's vector representation, so a GCN can adapt to different positions and structures and share weights across node calculations. Information transfer between nodes is calculated as:

$$h_v^{(k)} = U_k\Big(h_v^{(k-1)}, \sum_{u \in N_v} M_k\big(h_v^{(k-1)}, h_u^{(k-1)}\big)\Big)$$

where $M_k(\cdot)$ and $U_k(\cdot)$ are functions with learnable parameters, typically realized as fully connected neural networks (MLPs), $h_v^{(k)}$ denotes the representation of node $v$ at layer $k$, and $u \in N_v$ ranges over the set of neighbor nodes of $v$.
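As an illustration of the message-passing equation above, a minimal spatial GCN layer might look like this; single linear layers stand in for the MLPs $M_k$ and $U_k$, and the adjacency matrix is assumed to be given (row-normalised if mean aggregation is wanted).

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One round of spatial graph convolution: aggregate neighbour
    representations along edges, then update the centre node."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)      # M_k: message function
        self.upd = nn.Linear(2 * dim, dim)  # U_k: update function

    def forward(self, h, adj):
        # h: (num_nodes, dim) node states at layer k-1; adj: (num_nodes, num_nodes)
        messages = adj @ self.msg(h)        # sum over neighbours u in N_v
        return torch.relu(self.upd(torch.cat([h, messages], dim=-1)))
```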
Graph Attention Networks (GAT)
Graph attention networks (GAT) can amplify the influence of the most important parts of neighboring nodes. During aggregation, an attention mechanism determines the weight of each neighborhood node, controlling how much of each neighbor's representation feeds semantic information into the central node, and producing a random-walk-style representation oriented toward important targets. Information transfer between nodes is calculated as:

$$h_v^{(k)} = U_k\Big(h_v^{(k-1)}, \sum_{u \in N_v} \alpha\big(h_v^{(k-1)}, h_u^{(k-1)}\big)\, W_k\, h_u^{(k-1)}\Big)$$

where $W_k(\cdot)$ and $U_k(\cdot)$ are functions with learnable parameters, typically realized as fully connected neural networks (MLPs), and $\alpha(\cdot)$ is an attention function that adaptively controls the contribution of neighbor node $u$ to the semantic information of node $v$.
Graph space-time network (Graph Spatial-Temporal Networks, GSTN)
Graph spatio-temporal networks (GSTN) excel at spatio-temporal correlation and can be used in application scenarios such as node prediction over a traffic network. A GSTN can predict future node values or labels and spatio-temporal graph labels; both RNN-based and CNN-based approaches are followed in its construction. Taking the RNN-based method as an example, adding a graph convolution unit captures the spatio-temporal dependency, and the node update is calculated as:

$$h_v^{(t)} = \mathrm{RNN}\Big(h_v^{(t-1)}, \mathrm{Gconv}\big(h_v^{(t-1)}, A_v\big)\Big)$$

where $\mathrm{Gconv}(\cdot)$ is a graph convolution unit and $A_v$ is the adjacency matrix of the central node at layer $k$, representing the association between the neighbor nodes and the central node; $\mathrm{RNN}(\cdot)$ is the classical recurrent neural network computation, detailed as:

$$h_t = \sigma(U \cdot x_t + W \cdot h_{t-1}) \tag{9}$$
based on this, the step 3 specifically includes:
and 31, obtaining the topic keywords of the initial word vector, and determining nodes of the heterogeneous cognitive map according to the topic keywords, wherein the nodes comprise topic nodes, non-topic nodes and query nodes.
The invention establishes explicit–implicit connections among the segmented conversation contents by extracting topic keywords, and so constructs the heterogeneous cognitive graph. First, the entire dialogue context $\{U_1, U_2, \ldots, U_{m-1}, U_m\}$ is divided into three parts $(U, Q, R)$: the dialogue history sentences $U = \{U_1, U_2, \ldots, U_{m-2}\}$, the dialogue query sentence $Q = \{U_{m-1}\}$, and the reply text $R = U_m$. The dialogue history sentences lie far from the reply text and represent the global historical information of the dialogue; the dialogue query sentence lies close to the reply text and represents the short-term intent of the dialogue; both are coarse-grained semantic information. Furthermore, unlike long-text analysis, individual utterances in a conversation are often irrelevant to the direction of the overall dialogue flow, e.g., "Yes, I understand"; the invention therefore extracts topic keywords to better understand the conversational context. Topic keywords are special named entities: important, recognizable entities distributed across the whole dialogue context that carry fine-grained semantic information and can be used to model the semantic-flow associations of the dialogue.
The invention uses a Term Frequency-Inverse Document Frequency algorithm (TF-IDF algorithm) to extract the topic keywords, wherein the topic keywords are high-Frequency words which can represent the conversation context in the conversation text.
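For illustration, TF-IDF topic-keyword extraction can be sketched with off-the-shelf tooling; the patent does not prescribe a library, so the use of scikit-learn here is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_topic_keywords(utterances, top_k=5):
    """Return the top_k highest-scoring TF-IDF terms over a dialogue's utterances."""
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform(utterances)            # (num_sentences, vocab_size)
    scores = tfidf.max(axis=0).toarray().ravel()     # best score per term
    terms = vec.get_feature_names_out()
    top = scores.argsort()[::-1][:top_k]
    return [terms[i] for i in top]

keywords = extract_topic_keywords([
    "I want to book a flight to Paris next week.",
    "Do you prefer a morning or an evening flight?",
    "Morning, and a hotel near the city centre too.",
])
```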
Step 32: determining the edges of the heterogeneous cognitive graph and the weight of each edge according to the nodes of the heterogeneous cognitive graph, wherein each weight is determined by the degree of topic overlap between sentences in the dialogue text corresponding to the initial word vector.
The implementation algorithm of steps 31 and 32 is shown in Table 1.
Table 1. Heterogeneous cognitive graph construction algorithm
After obtaining the topic keyword set $K$, the invention constructs the graph nodes according to where each topic keyword occurs in the dialogue sentences. The set of sentences containing a certain topic $k$, together with the topic $k$ itself, forms the first class of important nodes in the heterogeneous cognitive graph, denoted $v_k$. Note that a sentence may contain several topic keywords, which indicates that it carries rich semantic information and can establish connection channels for information interaction with other related sentences. When a sentence contains no topic keyword, the invention considers its semantic effect on the whole dialogue to be small and generalizes it into a special node $v_{empty}$. Meanwhile, the dialogue query sentence is the nearest neighbor of the reply text, so the invention considers its semantic function the most important and assigns it to another special node $v_Q$.
When establishing the connections between nodes in the graph, the edge set $E = \{e_{i,j}\}$ is constructed over the three node classes $\{v_K, v_{empty}, v_Q\}$ according to their explicit–implicit relations: $v_k$ are the "topic–sentence cluster" nodes, $v_{empty}$ is the common node (the cluster of dialogue sentences containing no topic), and $v_Q$ is the dialogue query node. Heterogeneous node characteristics are accounted for in the subsequent node representations, so establishing an edge only needs to consider the connection weight. In steps 13–17 of the algorithm, whenever node $v_i$ and node $v_j$ share sentences, an edge $e_{i,j}$ is added; the more sentences they share, the tighter the relation between the two nodes and the greater the weight. In addition, because the two types of special nodes are associated, the invention directly connects them to construct the special edge $e_{Q,E}$, which, under the heuristic of the query sentence, learns important relevant information out of the semantic noise.
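A plain-Python sketch of the node and edge construction just described; the naming is illustrative, and keyword matching by substring is a simplification of the algorithm in Table 1.

```python
def build_cognitive_graph(history, query, keywords):
    """Nodes {v_k, v_empty, v_Q} and weighted edges, where an edge's weight
    counts the sentences shared by two nodes' sentence clusters."""
    nodes = {f"topic:{k}": [s for s in history if k in s.lower()] for k in keywords}
    nodes["v_empty"] = [s for s in history
                        if not any(k in s.lower() for k in keywords)]  # no-topic cluster
    nodes["v_Q"] = [query]                                             # dialogue query node
    edges, names = {}, list(nodes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = len(set(nodes[a]) & set(nodes[b]))  # shared sentences
            if shared:
                edges[(a, b)] = shared   # more shared sentences -> heavier edge
    # special edge e_{Q,E} directly linking the two special nodes
    edges.setdefault(("v_empty", "v_Q"), 1)
    return nodes, edges
```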
Step 33: learning the vector representations of the nodes in the heterogeneous cognitive graph with a graph neural network to obtain the graph-domain semantic representation of the initial word vector on the graph domain.
Node semantic representation coding
Reasoning on the heterogeneous cognitive graph is based on updating and learning the graph node representations, and the initial representations of the graph nodes provide correct guidance for the subsequent graph neural network learning. Taking fig. 4 as an example, the invention calculates the initial vector representations of the three node types:
(1) For node $v_{empty}$, the invention average-pools the initial semantic vectors of the sentences belonging to $v_{empty}$ to obtain the node vector $v_e$; when no sentence falls under $v_{empty}$, the invention average-pools all the dialogue history sentences instead:

$$v_e = \mathrm{AvgPool}\big(\{u_s : U_s \in v_{empty}\}\big)$$

(2) For node $v_Q$, the invention directly takes the initial vector representation $u_Q$ of the dialogue query sentence as the node vector $v_Q$:

$$v_Q = u_Q$$

(3) For node $v_k$, the invention concatenates the topic vector with the sentence vectors belonging to the node and performs dimension conversion through a single-layer fully connected network; taking the node of topic $k_1$ as an example:

$$v_{k_1} = \mathrm{FC}\big([\,t_{k_1};\, \mathrm{AvgPool}(\{u_s : U_s \in v_{k_1}\})\,]\big)$$

After the initialization vector representations of the three node types are obtained, for convenience in the subsequent graph neural network calculation, the invention denotes the vectors of all nodes $\{v_Q, v_e, v_K\}$ as $\{v_1, v_2, \ldots, v_m\}$, where $m = |K| + 2$.
Transfer and update of node information
Message passing between graph nodes is achieved in two steps, information aggregation and information combination, which can be performed multiple times (the repetitions are commonly referred to as layers or hops). Information aggregation gathers the semantic interaction information of adjacent nodes within the same layer; information combination updates and merges the information of the same node across different layers.
Information aggregation concerns how a node collects the semantic information of its neighbors. The invention observes that when a node has many neighbors, not all of them bring equal value to the central node, and after several turns of conversation some may even introduce semantic noise. Therefore, unlike a common graph neural network, the invention selects a GRU unit to filter the information content of the adjacent node cluster when updating a node at layer $l$, thereby alleviating semantic noise. Specifically, the reset gate of the gating mechanism controls the information flow from a neighbor node $v_j$ to $v_i$:

$$a_i^{(l)} = \mathrm{GRU}\Big(v_i^{(l)}, \frac{1}{|N_i^r|}\sum_{r \in R}\sum_{v_j \in N_i^r} v_j^{(l)}\Big)$$

where $R$ is the set of all edge types, $N_i^r$ is the neighbor cluster of node $v_i$ under edge type $r$, $v_j^{(l)}$ is the layer-$l$ node representation of a neighbor $v_j$, and $|\cdot|$ denotes the size of the adjacent node cluster. The GRU unit defines the conversion process for aggregating adjacency information, and the conversion of the neighbor representations can be realized by a multi-layer perceptron (MLP). Let $m_i^{(l)}$ denote the aggregated information of node $v_i$ at layer $l$; considering the complex connections and the large number of nodes in the graph-domain structure, a residual connection is added so that important semantics are kept while gradient vanishing is avoided:

$$m_i^{(l)} = f_s\big(a_i^{(l)}\big) + v_i^{(l)}$$

where $f_s$ is realized by a multi-layer perceptron (MLP).
Information combination emphasizes updating and merging the representations of the same node across different layers to obtain the information content after multi-hop cognition. However, research has shown that graph neural networks are prone to an over-smoothing problem during inter-layer reasoning, which makes node representations similar and destroys their discriminative power. To solve this problem, the invention controls the size of the information flow from layer $l$ to layer $l+1$ for node $v_i$ from its different sources, adding a gate weight to the information combination:

$$g_i^{(l)} = \mathrm{sigmoid}\Big(f_g\big([\,v_i^{(l)};\, m_i^{(l)}\,]\big)\Big), \qquad v_i^{(l+1)} = \eta\big(g_i^{(l)} \odot m_i^{(l)} + (1 - g_i^{(l)}) \odot v_i^{(l)}\big)$$

The sigmoid quantifies the contribution of the different information sources of the same node to the inter-layer update and thereby determines the weight $g_i^{(l)}$; in particular, what is weighed at combination time is the amount of information coming from the original node representation versus the updated node representation, similar to a flexible residual mechanism. $\eta(\cdot)$ is the nonlinear activation function Leaky ReLU, $\odot$ denotes element-wise multiplication, and $f_s$, $f_g$ are each realized with a single-layer MLP. After multiple layers of message passing, every node has its final updated representation.
Step 4: performing semantic enhancement on the dialogue text according to the sequential semantic representation and the graph-domain semantic representation to obtain the enhanced semantic representation.
The sequential channel obtains a semantic representation $c_{seq}$ over the sequential dialogue data through hierarchical attention and the progressive modeling of the encoder; the graph-domain channel, through the heterogeneous cognitive graph constructed by the invention and the designed dual-gated GNN, performs multi-hop inference over dialogue intent and semantics on the graph and establishes several medium- and long-distance semantic association representations. The semantic results of the two channels complement each other, and through information cooperation a high-level semantic cognition of the whole dialogue context can be reached.
In the graph-domain channel, the semantic representation of each node is obtained by multi-hop reasoning and converges the long-distance information transmitted between layers. After obtaining the final node representations, for cooperation with the decoder and the sequential-channel semantics, a weight $score_i$ is used for semantic information management of the nodes:

$$score_i = \mathrm{softmax}\big(\eta(u_Q, v_i)\big), \qquad c_{graph} = \sum_{i=1}^{NumL} score_i \cdot v_i$$

where $u_Q$ is the semantic representation of the dialogue query sentence before it enters either channel; it represents the initial dialogue intent and provides good guidance for the generated reply, so it is used to calculate the information-management weight of the node information; $NumL$ is the total number of semantic nodes.
In the two-channel information cooperation module, the invention further uses a gate mechanism to control the influence of the two channels' semantic information on the reply-generation decoding flow:

$$c_{final} = \delta \cdot c_{seq} + (1 - \delta) \cdot c_{graph}$$

where $\delta$ is the proportion of the sequential-channel semantics $c_{seq}$ delivered to the decoder, and $1 - \delta$ is the proportion of the graph-domain-channel semantics $c_{graph}$ delivered to the decoder. The two parts are added to form the final enhanced semantic representation $c_{final}$. Having learned the integrated semantics of sequential development and graph-domain association, with the dialogue direction of the query sentence emphasized in the information integration, $c_{final}$ can assist the decoder in decoding accurately to generate new words.
Step 5: generating the reply text according to the enhanced semantic representation, which specifically includes:
Step 51: inputting the enhanced semantic representation into a unidirectional gated neural network to obtain the hidden state of each word in the generated reply text.
The decoder module uses a unidirectional GRU to decode and produce the latest hidden state, which updates the semantic vector of the whole decoding layer; from this hidden state the probability distribution over the decoding vocabulary is obtained:

$$s_i = \mathrm{GRU}\big(s_{i-1}, [\,y_{i-1};\, c_{final}\,]\big)$$

where $s_i$ is the hidden state of the decoding layer when generating the $i$-th word $y_i$ of the reply text, and $c_{final}$ is the semantic representation after two-channel information cooperation, representing the semantic inspiration of a clarified line of dialogue thought; $y_{i-1}$ is the vector representation of the $(i-1)$-th word of the reply text during training, replaced by the vector representation of the $(i-1)$-th predicted word $\hat{y}_{i-1}$ during prediction, which preserves the consistency of the reply text.
Step 52: determining the generation probability of each word according to the hidden state, and determining the reply text according to the generation probability.
When decoding the generated text, the generated reply should tend to extend from the topic keywords, so the decoder adds a topic-bias probability that forcibly constrains the model to consider topic development. The corresponding generation probability is calculated as:

$$p(\hat{y}_i) = p_V(\hat{y}_i) + p_K(\hat{y}_i)$$

where $K$ and $V$ denote the topic-keyword vocabulary and the reply-text vocabulary, respectively. Correspondingly, all probability values of $p_V$ and $p_K$ are normalized by Softmax:

$$p_V(\hat{y}_i) = \mathrm{softmax}\big(\eta(s_i, y_{i-1}, c_{final})\big), \qquad p_K(\hat{y}_i) = \mathrm{softmax}\big(\eta(s_i, y_{i-1}, c_{final})\big)$$

where $\eta(\cdot)$ is the nonlinear function tanh. In the training process, the invention defines $\theta$ as the trainable parameters, divides the training text into batches for training, and obtains the best model effect by optimizing a cross-entropy loss function based on negative log-likelihood; learning the parameters is the process of backward gradient propagation, update, and descent:

$$\mathcal{L}(\theta) = -\sum_{i} \log p\big(y_i \mid y_{<i}, c_{final}; \theta\big)$$
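Putting Steps 51–52 together, below is a sketch of one decoding step with the topic-biased output distribution and the negative log-likelihood loss; the projection shapes and the mixing of the two heads by simple addition are assumptions drawn from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicBiasedDecoder(nn.Module):
    """One GRU decoding step: s_i = GRU([y_{i-1}; c_final], s_{i-1}).  The word
    distribution mixes a reply-vocabulary head p_V and a topic-keyword head p_K,
    each softmax-normalised over a tanh (eta) projection of [s_i; y_{i-1}; c_final]."""
    def __init__(self, emb_dim, hid_dim, vocab_size, topic_size):
        super().__init__()
        self.cell = nn.GRUCell(emb_dim + hid_dim, hid_dim)
        self.p_V = nn.Linear(2 * hid_dim + emb_dim, vocab_size)  # reply-text vocabulary
        self.p_K = nn.Linear(2 * hid_dim + emb_dim, topic_size)  # topic-keyword bias

    def step(self, y_prev, s_prev, c_final):
        s = self.cell(torch.cat([y_prev, c_final], dim=-1), s_prev)
        feats = torch.tanh(torch.cat([s, y_prev, c_final], dim=-1))
        return s, F.softmax(self.p_V(feats), -1), F.softmax(self.p_K(feats), -1)

def nll_loss(p_vocab, gold_ids):
    """Cross entropy based on the negative log-likelihood of the gold words."""
    return F.nll_loss(torch.log(p_vocab + 1e-12), gold_ids)
```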
the invention provides a neural network model of a fine-grained information interaction method based on topic-enhanced dialogue historical understanding for the first time. On one hand, the model of the invention utilizes the theme semantics and each sentence to carry out fine-grained semantic interaction to obtain the enhanced semantic representation of the historical sentences of the conversation; and on the other hand, the dialog query sentence is used for guiding the topic matrix fusion to obtain the semantic representation of the dialog intention. The operation of the two aspects aims to enhance the understanding of the context by using the subject semantics, thereby breaking through the defect that the topic information is used indiscriminately in the past;
the invention breaks through the sequential structure modeling and guarding thinking of the context content of the conversation history for the first time and provides a model of a collaborative semantic modeling and reasoning method based on a sequence and graph domain dual channel. By using a 'double-tower' model in a recommendation system for reference, the two-channel model can stand at a view field to understand and train the whole conversation context, meanwhile, the two-channel collaborative semantic modeling can maximize the semantic value, and the research thought of a multi-round conversation system is widened.
The multi-round dialogue reply generation method based on two-channel semantic enhancement can be applied to scenarios such as the intelligent customer-service systems of e-commerce platforms, the voice interaction modules of bionic AI robots, and novel retrieval on portal websites. In addition, the method can be embedded into military information services for the intelligent planning and analysis of scenarios such as battle-situation texts and auxiliary command decisions, improving the efficiency of and interactive experience with information services.
The method provided by the invention improves the fluency, diversity, and rationality of multi-turn dialogue reply generation, improves the robustness of the generated dialogue, and optimizes the capability of text semantic modeling.
Hereinafter, the present application will be described in more detail with reference to more specific examples.
Preparation of the experiment:
1. Research questions
For the Sequential and Graph-domain Dual-channel Collaborative semantic modeling and reasoning model (SGDC) provided by the invention, this embodiment poses the following three research questions to guide the subsequent experiments:
RQ1: Does the SGDC model of the present application perform better than other baseline models in fluency, relevance, and diversity?
RQ2: What impact does the length (number of turns) of the entire dialogue context have on the performance of our SGDC model in multi-turn dialogue reply generation?
RQ3: In the model's decoding of predicted reply text, what effect does the sequential and graph-domain two-channel cooperation have on the overall performance of our SGDC model?
2. Datasets
The DailyDialog dataset and MuTual dataset were selected for the experiments in this example.
The DailyDialog dataset was collected from daily life by scholars in the field; it contains 13,118 conversations covering multiple topics such as education, travel, weather, and shopping, and reflects most everyday communication between humans. DailyDialog's semantic structure is comparatively standard and formal, carries more topical value, has a reasonable number of speakers and no redundant conversation turns, and therefore has high research and application value.
The MuTual dataset is a high-quality, manually annotated multi-turn dialogue reasoning dataset comprising 8,860 manually annotated dialogues based on English listening-comprehension examinations for Chinese students. MuTual is more challenging for reasoning than previous dialogue benchmark datasets. The two datasets are shown in Table 2.
Table 2. Dataset information

| | DailyDialog | MuTual |
| --- | --- | --- |
| Number of conversations | 13118 | 8860 |
| Average number of turns per conversation | 7.9 | 4.73 |
| Average number of words per sentence | 14.6 | 19.57 |
3. Baseline models for experimental comparison
In this embodiment, five related multi-turn dialogue generation models are selected as baseline algorithm models against which the overall performance of the present algorithm model is compared, and the experimental results are discussed and analyzed. A brief introduction of the baseline algorithm models follows:
S2S-Att: in the most popular encoder-decoder framework, an encoder encodes an input sequence into an intermediate state, and then a decoder is used for decoding and generating, the encoder and the decoder both adopt a gated neural network GRU, and meanwhile, an Attention mechanism is added to the decoder input at each time step, so that the predicted words at each time are the most relevant to the input text.
HRED: the first hierarchical context modeling approach for response generation uses utterance-level GRUs to encode each sentence, and dialog-level GRUs to convert the utterance vector into a dialog-level vector representation. Compared with the common S2S framework, the three-level semantic progression of vocabulary-utterance-dialogue is considered, and the information aggregation and propagation on each level can be assisted, so that the multi-turn dialogue history modeling is realized.
THRED: according to the method, topic perception is introduced firstly in the field of multi-turn dialog generation, and according to the development, topic perception is introduced on the basis of an HRED model, and decoding of guidance reply is performed by using a topic-context joint attention mechanism.
RecoSa: the use of Self-orientation mechanism to associate the most closely related dialog context with reply text is an improved hybrid model of Transformer and HRED, with Attention mechanism embedded in both word-level and utterance-level coders for hierarchical modeling, and is currently the most advanced performance in the field of multi-turn dialog generation.
In addition, in order to explore the influence of a two-channel semantic collaborative mode on the model performance, three baseline models are constructed according to a common semantic feature aggregation technology, and the following are briefly introduced:
-Avg: a two-channel model of a mean aggregation strategy is adopted, and background semantics are concerned;
-Max: a two-channel model of a maximum aggregation strategy is adopted, and foreground semantics are concerned;
-Concat: a double-channel model of a mean aggregation strategy is adopted, and global semantics are concerned.
A semantic-feature aggregation strategy is the strategy adopted when several semantic vectors are aggregated, or when a semantic matrix is converted into a fixed-length vector representation, as shown in fig. 5. Aggregating semantic features reduces information redundancy, concentrates the focal semantics, and prevents over-fitting during training, similar to the function of a pooling layer in a convolutional neural network. Common aggregation strategies are maximum aggregation, mean aggregation, join aggregation, and gated aggregation.
Maximum aggregation strategy
The operation of the maximum aggregation strategy is to take, for each dimension, the maximum element value across the vectors as the element value of that dimension in the new semantic vector representation. Under this strategy the other element values in the same dimension are not passed into the next semantic vector, so the aggregation makes the model attend to the most salient shallow part of the semantics. This aggregation strategy suits text representations with a single vocabulary, simple syntax, and shallow semantics, and the aggregation can be performed with a max-pooling layer. The maximum aggregation formula is:

$$h_{max} = \max\{h_1, h_2, \ldots, h_n\} \tag{26}$$
Mean aggregation strategy
The operation of the mean aggregation strategy is to average the vector element values of each dimension as the vector values of the same latitude elements of the new semantic vector representation. The average aggregation strategy considers that elements of each dimension have equal semantic value, semantic neutralization can be performed through an averaging mode, deviation of the semantic direction is reduced, and the strategy can improve the robustness of the model. The farther back in the deep neural network, the more the semantic information is rich and balanced, and the more the mean value aggregation strategy is suitable. The mean aggregation formula is:
Join aggregation strategy
The join aggregation strategy extends aggregation by drawing on the advantages of both mean and maximum aggregation. Taking two feature vectors $v_1 \in \mathbb{R}^n$ and $v_2 \in \mathbb{R}^m$ as an example, the common join aggregation operation splices the vector elements, most notably increasing the dimension to $v_{concat} \in \mathbb{R}^{n+m}$. Join aggregation is a direct but effective strategy that brute-force combines the semantics and maximally preserves the semantic information; a linear (affine) mapping is often applied after splicing to convert to the required number of dimensions. The join aggregation formula is:

$v_{concat} = [v_1; v_2]$
Gated aggregation strategy
The gated aggregation strategy supports a flexible information-cooperation mode, controlling how much each dimension or vector contributes through a Gate computation. Concretely, the value of the Gate is learned through a sigmoid activation and gradient updates in deep learning, and the Gate value is combined with each state in turn through the Hadamard (element-wise) product, aggregating a final vector representation. Gated aggregation supports end-to-end training and updating, lets the neural network confirm the importance weight of each aggregated dimension, and suits models with complex semantics and reasoning tasks. The gated aggregation formula, reconstructed from this description, is:

$g = \sigma(W[h_1; h_2] + b), \quad h_{gate} = g \odot h_1 + (1 - g) \odot h_2$
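To make the four strategies concrete, the following is a minimal sketch in PyTorch (the framework used later in this description). The tensor shapes, the inline linear mapping after concatenation, and the gate parameterization are illustrative assumptions, not the exact implementation of the invention:

```python
import torch
import torch.nn as nn

class GatedAggregation(nn.Module):
    """Learns a Gate g via sigmoid and mixes two semantic vectors with
    Hadamard products: h = g * h1 + (1 - g) * h2."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h1, h2):
        g = torch.sigmoid(self.gate(torch.cat([h1, h2], dim=-1)))
        return g * h1 + (1.0 - g) * h2

# h: (num_vectors, dim), a stack of semantic vectors to aggregate
h = torch.randn(5, 512)

h_max = h.max(dim=0).values           # maximum aggregation: salient features
h_avg = h.mean(dim=0)                 # mean aggregation: semantic neutralization
h_cat = torch.cat([h_max, h_avg])     # join aggregation: splice the elements,
h_cat = nn.Linear(1024, 512)(h_cat)   # then map back to the required dimension

h_gate = GatedAggregation(512)(h_max, h_avg)  # gated aggregation: learned mixing
```

Note that the gated variant is the only one with trainable parameters, which is why, as discussed later, it can learn per-dimension importance weights while the other three remain fixed operations.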
4. Criteria and indicators for experimental evaluation
Automated evaluation of multi-turn dialogue reply generation mainly considers the fluency, diversity, and topical relevance of the generated text. Fluency means the text is grammatically and semantically well-formed: the deep learning model maximizes the probability of reasonable part-of-speech sequences and word collocations, so the text is readable and understandable by humans without ambiguity. Diversity means the reply is semantically rich and extensible in the dialogue context, rather than a meaningless "I don't know" response; this is a basic requirement of a chit-chat dialogue system. Topical relevance asks whether the reply is actually meaningful and fits the topic flow or topic turns of the conversation scene, an important criterion for judging the reply generation task.
Specifically, to better evaluate the strengths and weaknesses of the baseline algorithms and the research work of this application, the fluency, diversity, and relevance of the generated replies are evaluated, following the literature, with perplexity (PPL), Dist-1 and Dist-2, and Embedding-based sentence relevance indexes, respectively.
PPL: following the reference, the fluency of generated text is evaluated with language-model perplexity (PPL). The lower the PPL value, the higher the probability of the generated reply text, the more reasonable the word arrangement and collocation, and the easier the text is to understand. The standard form of the formula is:

$\mathrm{PPL} = \exp\Big(-\frac{1}{N}\sum_{t=1}^{N}\log p(w_t \mid w_{<t})\Big)$
Distinct: the automated evaluation indicators Distinct-1 and Distinct-2 are used herein to evaluate the content diversity of generated text. The higher the Distinct-n score, the higher the proportion of distinct n-grams in the sentence, the richer the generated content, and the better the reply. The standard form of the formula is:

$\mathrm{Dist}\text{-}n = \frac{\#\{\text{distinct } n\text{-grams}\}}{\#\{n\text{-grams}\}}$
Embedding: unlike n-gram methods that compute overlap or co-occurrence between prediction and ground truth, Embedding-based evaluation maps text into a low-dimensional semantic representation and measures relevance through text similarity. This example uses Greedy Matching (GM), Embedding Average (EA), and Vector Extrema (VE). The larger these three index values, the closer the semantic correlation between the predicted text and the real text, and the more on-point the reply.
Greedy Matching (GM): an embedding greedy-match metric. Greedy search matches each generated word or semantic unit to the keywords of the real text, considering word-level alignment at a finer granularity, which evaluates long text more accurately. The standard form of the formula is:

$\mathrm{GM}(r,\hat r) = \frac{G(r,\hat r) + G(\hat r,r)}{2}, \quad G(a,b) = \frac{1}{|a|}\sum_{w \in a}\max_{v \in b}\cos(e_w, e_v)$
Embedding Average (EA): the embedded average metric, widely used to measure text similarity. Cosine similarity is computed between the semantic vectors of the predicted and real text, where the semantic vector of a text is the average of the vector representations of its constituent words. The standard form of the formula is:

$\mathrm{EA}(r,\hat r) = \cos\Big(\frac{1}{|r|}\sum_{w \in r} e_w,\ \frac{1}{|\hat r|}\sum_{w \in \hat r} e_w\Big)$
Vector Extrema (VE): the embedded extrema metric keeps, for each dimension of the word vectors, the extreme value when computing the text vector, and compares with cosine similarity as above. Notably, this index focuses on extreme-value information, i.e., topic information, and can therefore measure topical relevance. The standard form is $\mathrm{VE}(r,\hat r) = \cos(\mathrm{ext}(r), \mathrm{ext}(\hat r))$, where $\mathrm{ext}(\cdot)$ keeps, per dimension, the element of largest magnitude across the word vectors.
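For reference, the following is a hedged sketch of Distinct-n and the three embedding-based metrics in their standard form, using numpy; `emb` is an assumed word-to-vector lookup table, and the exact tokenization and embedding source are not specified by this description:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def distinct_n(tokens, n):
    # Dist-n: ratio of distinct n-grams to total n-grams in the text
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(grams)) / max(len(grams), 1)

def embedding_average(pred, ref, emb):
    # EA: cosine similarity of the mean word vectors of the two texts
    return cosine(np.mean([emb[w] for w in pred], axis=0),
                  np.mean([emb[w] for w in ref], axis=0))

def vector_extrema(pred, ref, emb):
    # VE: keep the element of largest magnitude per dimension, then compare
    def extrema(words):
        m = np.stack([emb[w] for w in words])
        hi, lo = m.max(axis=0), m.min(axis=0)
        return np.where(np.abs(hi) >= np.abs(lo), hi, lo)
    return cosine(extrema(pred), extrema(ref))

def greedy_matching(pred, ref, emb):
    # GM: greedily align each word with its best match, averaged both ways
    def one_way(src, dst):
        return np.mean([max(cosine(emb[s], emb[d]) for d in dst) for s in src])
    return (one_way(pred, ref) + one_way(ref, pred)) / 2.0
```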
Parameter setting and implementation environment
For a fair comparison between the baseline algorithms and the algorithm model of the invention, this embodiment uses the Adam optimizer and the PyTorch framework. During training, word embedding vectors are randomly initialized and updated with the model, with a dimension of 512; the hidden input and output dimensions of all gated recurrent network units (GRU and BiGRU) are also 512. The model learning rate is set to 0.0001 with gradient clipping, the number of samples participating in each training iteration (batch size) is set to 64, and training, validation, and prediction are carried out on a workstation with an NVIDIA TITAN RTX GPU.
In addition, topics in the model of this embodiment are extracted by TF-IDF. To speed up the training process and prevent error accumulation caused by large model errors early in training, a teacher forcing mechanism is introduced, which forcibly replaces the decoder input with the target token; this reduces error propagation in the model and ensures that the parameters update normally.
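As an illustration of these settings, the following sketch shows a teacher-forced decoder training step with Adam, learning rate 0.0001, and gradient clipping; the decoder structure, vocabulary size, and clipping norm are placeholder assumptions rather than the actual code of the invention:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Stand-in one-way GRU decoder conditioned on the enhanced
    context vector c_final (structure assumed for illustration)."""
    def __init__(self, vocab_size=20000, dim=512):
        super().__init__()
        self.dim = dim
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRUCell(2 * dim, dim)   # input: [word embedding ; c_final]
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, target_ids, c_final):
        # target_ids: (batch, seq_len) ground-truth reply; c_final: (batch, dim)
        s = torch.zeros(target_ids.size(0), self.dim, device=target_ids.device)
        logits = []
        for t in range(target_ids.size(1) - 1):
            # Teacher forcing: the decoder input at each step is the
            # ground-truth token, not the model's previous prediction,
            # which limits error accumulation early in training.
            x = torch.cat([self.embed(target_ids[:, t]), c_final], dim=-1)
            s = self.gru(x, s)
            logits.append(self.out(s))
        return torch.stack(logits, dim=1)

decoder = Decoder()
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)  # lr = 0.0001
loss_fn = nn.CrossEntropyLoss()

def train_step(target_ids, c_final):
    logits = decoder(target_ids, c_final)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                   target_ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping; the max_norm value here is an assumption.
    nn.utils.clip_grad_norm_(decoder.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```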
The software and hardware implementation configuration of this operation is shown in table 3 below:
TABLE 3 software and hardware implementation configurations
Analysis and discussion of experimental results:
1. Overall performance compared with the baseline models
To explore RQ1, this example compares the performance of the SGDC model with the baseline models of the DialogueRNN framework on the MuTual and DailyDialog datasets; the fluency, relevance, and diversity evaluation results are shown in table 4, where the best baseline index value is underlined and the best overall index value is in bold.
TABLE 4 Performance comparison of the SGDC model with baseline models of the DialogueRNN framework
As can be seen from the experimental results of table 4:
1. The SGDC model with two-channel semantic modeling clearly outperforms the other baseline models on most evaluations across both datasets. The decisive wins on both datasets demonstrate the effectiveness of the sequence and graph-domain two-channel collaborative modeling mode, which captures semantic associations and reasoning effects that the DialogueRNN framework cannot mine.
2. On the relevance evaluation, we find an interesting phenomenon: taking the MuTual dataset as an example, the SGDC model achieves the higher scores on all three relevance indexes, with VE and EA scores about 2% higher and the GM score 3.5% higher than the best baseline model, and VE, EA, and GM scores more than 5% higher than all other baseline models. The same holds on the DailyDialog dataset, though the gains are smaller than on MuTual. To explain this relevance gap, it helps to understand the nature of these models and datasets: MuTual is more rigorously annotated and focuses on multi-turn dialogue reasoning, while the SGDC model mines semantic relations from different structural perspectives; the sequential channel captures general progressive semantics, and the graph-domain channel obtains long-distance information dependencies across distance barriers through edge connections in the multi-turn dialogue. This two-channel collaborative modeling advantage is clearly superior to the other baseline models of the DialogueRNN framework.
3. The score of this embodiment is significantly higher than that of ReCoSa, perhaps because the graph-domain channel of the invention is built on the relations of "topic-sentence cluster" nodes; semantic reasoning over a heterogeneous graph is more effective than ReCoSa's prior mechanism, which only attends to question-answer correspondence. This also shows that the model of the invention can accurately perceive the topical direction of a conversation, obtaining similarity scores closer to the real reply text and maintaining the dialogue's track on the topic.
4. Surprisingly, the model of the invention differs little from the other reference models on the diversity scores Dist-1 and Dist-2, probably because its semantic modeling pursues semantic relations strongly and also uses topic bias probabilities, thus sacrificing some text diversity. This is a balance that dialogue generation models find hard to strike, but it is worth noting that, from the perspective of practical application scenarios, controllable diversity can reasonably be traded for more closely relevant replies.
2. Effect of dialogue context length on performance
To explore RQ2, this example analyzes the performance of the SGDC model and the baseline models on test samples of different context lengths, i.e., different numbers of turns in a multi-turn conversation. The sampled test set is manually grouped by context length into three groups: short conversations (fewer than 5 turns), medium conversations (6 to 10 turns), and long conversations (more than 10 turns). Each model is evaluated with part of the objective evaluation metrics, and the results are plotted in fig. 6, where the three bars for each model denote short, medium, and long conversations, from left to right.
From the experimental results it can be seen that:
1. For both the SGDC model and the baseline models, the perplexity score rises monotonically, to varying degrees, as the dialogue length increases. The longer the dialogue, the more complicated its semantic modeling and the harder the information associations are to capture, and the more easily the model is affected by irrelevant semantic noise, reducing its predictive ability.
2. The SGDC model of the invention outperforms the baseline models on the short, medium, and long conversation test sets alike, demonstrating the robustness of the model. Moreover, SGDC shows its largest gain over the best reference model on long dialogues, indicating that its graph-domain channel captures long-distance semantic dependencies and provides a unique capability advantage in the semantic modeling of long conversations.
3. Effect of the two-channel semantic cooperation mode on performance
To explore RQ3, this embodiment varies the two-channel information cooperation mode in the SGDC model and designs three variant models to find the aggregation strategy that best cooperates the semantic information of the two channels. The three variant models are:
·SGDC_Avg: the model adopts the mean aggregation strategy; this cooperation mode assumes the semantic information of the two channels is of equal value, so the semantic representation of the whole context is obtained by average-pooling the two semantic vectors;
·SGDC_Max: the model emphasizes selecting the most important semantic features and adopts the maximum aggregation strategy; this mode assumes the maximum values in the semantic vector representations reflect the important semantics, so the semantic representation of the whole context is obtained by max-pooling the two semantic vectors;
·SGDC_Concat: the model considers the semantic information of the two channels equally important and not to be damaged, so the semantic representation of the whole context is obtained by directly concatenating the semantic vectors.
table 5 results of the synergistic effect on performance
For ease of comparison, the model used in the above experiments is denoted SGDC_Gate. Table 5 records the generation performance of SGDC_Gate and its three variants on the MuTual and DailyDialog datasets, evaluated with the semantic relevance indexes (GM/EA/VE). From the index differences in Table 5, it can be found that:
①SGDC_Gate is clearly superior to SGDC_Avg and SGDC_Max on all three semantic relevance indexes. This shows that the semantic information of the two channels has its own uniqueness and importance, and only by finding the optimal aggregation strategy through the Gate mechanism can the benefit of aggregating the semantic information be maximized: SGDC_Avg may lose each channel's important semantics, while the maximum aggregation of SGDC_Max over-focuses on locally important semantics and struggles to capture a balanced semantic correlation, losing the overall semantic relevance of the reply text.
②SGDC_Gate differs little from SGDC_Concat on the EA and VE evaluations, but is clearly superior to SGDC_Concat on GM. This curiosity can be explained from the details of the evaluation indexes: EA and VE measure, respectively, the mean level and the extreme-value level of the embedding similarity between predicted and real text words. SGDC_Gate finds the equilibrium point of aggregation through the Gate mechanism, while SGDC_Concat simply collects everything and relies on the semantic enhancement of the graph-domain channel to capture long-distance dependencies; the two are, respectively, a precise solution and a brute-force solution, so they differ little at the mean and extreme-value levels of the predicted text. The GM value, however, considers not only word-embedding similarity but also word-to-word alignment, and thus belongs to a finer-grained evaluation item; it reflects the advantage of the Gate mechanism, namely that SGDC_Gate can filter semantic noise at a finer granularity than SGDC_Concat.
Drawing on the rich research ideas of graph neural network technology, the invention proposes a collaborative semantic modeling and reasoning method based on sequence and graph-domain dual channels, and designs a dual-gated graph neural network over the nodes of the graph. The model of the invention is experimentally verified on both an open-domain dataset and a dialogue-reasoning dataset; the results show the advantages of the dual-channel collaborative semantic modeling and reasoning method on each evaluation item, and the model remains robust as the number of dialogue turns increases.
The invention starts from the analysis and reuse of content features and the two-channel enhancement of structural features, and the proposed model scheme has good practical application value:
(1) Generative dialogue methods that only consider the progressive semantics of the sequential structure ignore strong long-distance context associations and interactions. To use the context information effectively and comprehensively, the invention adopts the most direct method: breaking the sequential structure and, imitating the repeated-deliberation mode of thinking in human conversation, designing from the context a heterogeneous graph capable of cognitive reasoning. This break-then-rebuild idea can extend the effectiveness of the research.
(2) In the information-transmission process of updating node representations in the graph neural network, the invention designs a dual-Gate GNN, which filters and screens the transmitted information through a GRU unit for information aggregation and a Gate mechanism for information combination. This design effectively filters semantic noise and captures key semantic information, and is a novel attempt in the field of dialogue generation.
A second aspect of the invention provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the steps of the method being implemented when the computer program is executed by the processor.
The invention provides a sequential and graph-domain two-channel collaborative semantic modeling and reasoning method, aiming to fuse the semantic advantages of modeling different structures and to obtain wider-span information association and semantic reasoning. In detail, on one hand, the method constructs a dialogue-level heterogeneous cognitive graph whose nodes integrate topic semantics and sentence-cluster semantics and whose edges encode the degree of topic overlap between sentences, and then learns with a dual-gated graph neural network to obtain the semantic representation of the dialogue context in the graph domain; on the other hand, a hierarchical attention mechanism embedded in the retained sequential channel yields the sequential semantic representation of the dialogue context. Finally, the information contributions of the two semantic representations are coordinated for prediction. The model of the invention outperforms the reference models and alleviates the long-distance semantic dependency problem.
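A minimal sketch of this final coordination step follows, assuming a learned sigmoid gate produces the mixing coefficient δ between the two channel representations; the exact parameterization of δ is not fixed by this description and is an assumption:

```python
import torch
import torch.nn as nn

class TwoChannelFusion(nn.Module):
    """Coordinates the sequential representation c_seq and the graph-domain
    representation c_graph into c_final = delta * c_seq + (1 - delta) * c_graph."""
    def __init__(self, dim=512):
        super().__init__()
        self.w = nn.Linear(2 * dim, dim)

    def forward(self, c_seq, c_graph):
        # delta is learned from both channels through a sigmoid Gate
        delta = torch.sigmoid(self.w(torch.cat([c_seq, c_graph], dim=-1)))
        return delta * c_seq + (1.0 - delta) * c_graph  # c_final

fusion = TwoChannelFusion()
c_final = fusion(torch.randn(1, 512), torch.randn(1, 512))
```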
The foregoing is illustrative of the preferred embodiments of the present invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make numerous possible variations and modifications to the present invention, or modify equivalent embodiments to equivalent variations, without departing from the scope of the invention, using the teachings disclosed above. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention shall fall within the protection scope of the technical solution of the present invention, unless the technical essence of the present invention departs from the content of the technical solution of the present invention.
Claims (10)
1. A multi-round dialog reply generation method based on dual-channel semantic enhancement is characterized by comprising the following steps:
acquiring an initial word vector of a dialog text;
acquiring a sequential semantic representation of the initial word vector, including acquiring an utterance-level sentence semantic vector of the initial word vector, determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector, and marking the dialogue-level sentence semantic vector as the sequential semantic representation;
obtaining the graph domain semantic representation of the initial word vector on the graph domain;
performing semantic enhancement on the dialog text according to the sequential semantic representation and the graph domain semantic representation to obtain enhanced semantic representation;
and generating a reply text according to the enhanced semantic representation.
2. The method as claimed in claim 1, wherein obtaining the utterance-level sentence semantic vector of the initial word vector comprises:
inputting the initial word vector into a sentence-layer encoder and a word attention module in sequence to obtain the utterance-level sentence semantic vector;
determining a dialogue-level sentence semantic vector of the initial word vector according to the utterance-level sentence semantic vector specifically comprises:
inputting the utterance-level sentence semantic vector into a context encoder and a sentence attention module in sequence to obtain the dialogue-level sentence semantic vector;
the sentence-layer encoder and the context encoder are both bidirectional gated neural networks;
the word attention module and the sentence attention module both adopt the Attention mechanism.
3. The method as claimed in claim 1, wherein obtaining a domain semantic representation of the initial word vector on a domain comprises:
obtaining the topic key words of the initial word vector, and determining nodes of a heterogeneous cognitive map according to the topic key words, wherein the nodes comprise topic-sentence cluster nodes, conversation query nodes and common nodes;
determining the edges of the heterogeneous cognitive map and the weight of each edge according to the nodes of the heterogeneous cognitive map, wherein the weight is determined according to the coincidence degree of the topics among the sentences in the dialog text corresponding to the initial word vector;
and learning the vector representation of the nodes in the heterogeneous cognitive map by using a map neural network to obtain a map domain semantic representation of the initial word vector on a map domain.
4. The method as claimed in claim 1, wherein semantically enhancing said dialog text based on said sequential semantic representation and said graph domain semantic representation comprises:
performing semantic enhancement on the dialog text according to a first formula, wherein the first formula is:

$c_{final} = \delta \cdot c^{seq} + (1-\delta) \cdot c^{graph}$

in the formula, $c_{final}$ is the enhanced semantic representation, $c^{seq}$ is the sequential semantic representation, $c^{graph}$ is the graph-domain semantic representation, $\delta$ is the proportion of the sequential semantic representation, and $(1-\delta)$ is the proportion of the graph-domain semantic representation.
5. The method of any of claims 1-4, wherein generating a reply text based on the enhanced semantic representation comprises:
inputting the enhanced semantic representation into a one-way gating neural network to obtain a hidden state of each word in a generated reply text;
and determining the generation probability of each word according to the hidden state, and determining a reply text according to the generation probability.
6. The method as claimed in claim 5, wherein inputting the enhanced semantic representation into a one-way gated neural network to obtain the hidden state of each word in the generated reply text comprises:
generating the hidden state of each word in the reply text according to a second formula, wherein the second formula is:

$s_i = \mathrm{GRU}(s_{i-1}, y_{i-1}, c_{final})$

in the formula, $y_i$ is the i-th word generated in the reply text during the training phase, $y_{i-1}$ is the (i-1)-th word generated in the reply text during the training phase, $s_i$ is the hidden state of $y_i$, $\mathrm{GRU}(\cdot)$ denotes inputting its arguments into the gated neural network, $s_{i-1}$ is the hidden state of $y_{i-1}$, and $c_{final}$ is the enhanced semantic representation.
7. The method of claim 5, wherein determining a probability of generation for each word based on the hidden state comprises:
determining the generation probability of each word according to a third formula, wherein the third formula is as follows:
8. The method of claim 7, wherein the generation probability over the reply text vocabulary is determined according to a fourth formula, in which:
$\eta(\cdot)$ is the non-linear function tanh, $V$ is the reply text vocabulary, $K$ is the topic keyword vocabulary, $s_i$ is the hidden state of the i-th word $y_i$ generated in the reply text during the training phase, $y_{i-1}$ is the (i-1)-th word generated in the reply text during the training phase, $c_{final}$ is the enhanced semantic representation, and vocab represents the variable i.
9. The method of claim 7, wherein the generation probability over the topic keyword vocabulary is determined according to a fifth formula, in which:
$\eta(\cdot)$ is the non-linear function tanh, $V$ is the reply text vocabulary, $K$ is the topic keyword vocabulary, $s_i$ is the hidden state of the i-th word $y_i$ generated in the reply text during the training phase, $y_{i-1}$ is the (i-1)-th word generated in the reply text during the training phase, $c_{final}$ is the enhanced semantic representation, and vocab represents the variable i.
10. A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor implements the steps of the method according to any of claims 1 to 9 when executing said computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211128307.0A CN115495552A (en) | 2022-09-16 | 2022-09-16 | Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115495552A true CN115495552A (en) | 2022-12-20 |
Family
ID=84467864
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211128307.0A Pending CN115495552A (en) | 2022-09-16 | 2022-09-16 | Multi-round dialogue reply generation method based on two-channel semantic enhancement and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115495552A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115879422A (en) * | 2023-02-16 | 2023-03-31 | 之江实验室 | Dialog reply generation method, device and storage medium |
CN116361490A (en) * | 2023-06-02 | 2023-06-30 | 中国传媒大学 | Entity and relation extraction method, system and electronic equipment based on graph neural network |
CN116361490B (en) * | 2023-06-02 | 2023-08-22 | 中国传媒大学 | Entity and relation extraction method, system and electronic equipment based on graph neural network |
CN117828072A (en) * | 2023-11-06 | 2024-04-05 | 中国矿业大学(北京) | Dialogue classification method and system based on heterogeneous graph neural network |
CN117234341A (en) * | 2023-11-15 | 2023-12-15 | 中影年年(北京)文化传媒有限公司 | Virtual reality man-machine interaction method and system based on artificial intelligence |
CN117234341B (en) * | 2023-11-15 | 2024-03-05 | 中影年年(北京)科技有限公司 | Virtual reality man-machine interaction method and system based on artificial intelligence |
CN117290491A (en) * | 2023-11-27 | 2023-12-26 | 语仓科技(北京)有限公司 | Aggregation retrieval enhancement-based large-model multi-round dialogue method, system and equipment |
CN117892735A (en) * | 2024-03-14 | 2024-04-16 | 中电科大数据研究院有限公司 | Deep learning-based natural language processing method and system |
CN117892735B (en) * | 2024-03-14 | 2024-07-02 | 中电科大数据研究院有限公司 | Deep learning-based natural language processing method and system |
CN118569338A (en) * | 2024-08-02 | 2024-08-30 | 国泰新点软件股份有限公司 | Data proportioning method, device and equipment for vertical domain large model pre-training |
CN118657223A (en) * | 2024-08-22 | 2024-09-17 | 天津大学 | Target-oriented dialogue method and model based on knowledge-graph bidirectional reasoning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |