CN106126596A - A question-answering method based on a hierarchical memory network - Google Patents

A question-answering method based on a hierarchical memory network

Info

Publication number
CN106126596A
CN106126596A (application CN201610447676.4A)
Authority
CN
China
Prior art keywords
word
sentence
granularity
memory unit
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610447676.4A
Other languages
Chinese (zh)
Other versions
CN106126596B (en)
Inventor
许家铭
石晶
姚轶群
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201610447676.4A priority Critical patent/CN106126596B/en
Publication of CN106126596A publication Critical patent/CN106126596A/en
Application granted granted Critical
Publication of CN106126596B publication Critical patent/CN106126596B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a question-answering method based on a hierarchical memory network. Sentence-granularity memory coding is performed first and, under the stimulus of the question semantic code, information reasoning over the sentence-granularity memory unit is completed through a multi-round iterative attention mechanism. Sentences are then screened by k-max sampling, and word-granularity memory coding is performed on top of the sentence-granularity memory coding, i.e. memory coding is carried out at two levels to form a hierarchical memory code. The output word probability distribution is jointly predicted from the sentence-granularity and word-granularity memory units, which improves the accuracy of automatic question answering and effectively solves the answer selection problem for low-frequency words and unregistered words.

Description

Question-answering method based on hierarchical memory network
Technical Field
The invention relates to the technical field of automatic question-answering system construction, in particular to an end-to-end question-answering method based on a hierarchical memory network.
Background
Automatic question answering has long been one of the most challenging tasks in natural language processing; it requires a deep understanding of the text and the screening of candidate answers as system responses. The conventional methods currently available include: independently training each module of the text processing pipeline and then fusing their outputs; and constructing a large-scale structured knowledge base and performing information reasoning and answer prediction on top of it. In recent years, end-to-end systems based on deep learning methods have been widely used to solve various tasks without manually constructing features and without tuning individual modules separately.
A question-answering system can be roughly divided into two steps: first, the relevant semantic information is located (the "activation phase"), and then a response is generated based on that information (the "generation phase"). Recently, neural memory network models have achieved good results on question-answering tasks. However, the biggest shortcoming of these models is that they adopt a memory unit with a single level of sentence granularity and therefore cannot handle low-frequency words or unknown words well. Moreover, to reduce the time complexity of the model, the dictionary size usually has to be reduced. In that case, existing end-to-end neural network models cannot reliably select low-frequency or unknown words as answer output. That is, when the target answer word lies outside the training dictionary, existing methods cannot select the accurate answer as the model output in the online testing stage. Take the following dialogue text as an example:
1. May I ask your name, sir?
2. Uh, my name is Williamson.
3. Please tell me your passport number.
4. It is 577838771.
5. And your phone number?
6. The number is 0016178290851.
Assuming that "williamson", "577838771" and "0016178290851" are low frequency words or unknown words, none of these methods can select accurate user information from the dialog text if the conventional methods discard these words or collectively replace them with "unk" symbols. However, in practical applications, most answer information comes from low-frequency words or long-tail words, and how to design an answer selection method capable of effectively solving the problem of unknown words is a task urgently needed in the field of the automatic question and answer system at present.
Disclosure of Invention
Technical problem to be solved
In order to solve the problems in the prior art, the invention provides a question-answering method based on a hierarchical memory network.
(II) technical scheme
The invention provides a question-answering method based on a hierarchical memory network, which comprises the following steps: step S101: integrating the position of a word and the time sequence information of a sentence, and performing sentence granularity memory coding on the sentence in the sentence set to obtain a double-channel memory coding of a sentence granularity memory unit; step S102: under the stimulation of problem semantic coding, completing information reasoning of the sentence granularity memory unit through a multi-round iterative attention mechanism to obtain the probability distribution of output words in dictionary dimensions on the sentence granularity memory unit; step S103: k maximum sampling is carried out on the information inference result of the sentence granularity memory unit, and a k maximum sampling important sentence set is screened out from the sentence set; step S104: performing word granularity memory coding on the sentence set by using a bidirectional cyclic neural network model to obtain memory coding of a word granularity memory unit; step S105: obtaining word granularity output word probability distribution through an attention mechanism based on the problem semantic code, the memory code of the word granularity memory unit and the k maximum sampling important sentence set; and step S106: and jointly predicting the probability distribution of output words from the sentence granularity and word granularity memory units, and performing supervision training by using the cross entropy.
(III) advantageous effects
According to the technical scheme, the question-answering method based on the hierarchical memory network has the following beneficial effects:
(1) the method firstly carries out sentence granularity memory coding, completes information reasoning of a sentence granularity memory unit through a multi-round iterative attention mechanism under the stimulation of question semantic coding, can improve the accuracy and timeliness of automatic question answering, and is favorable for answer selection of low-frequency words and unknown words;
(2) sentences are screened through k maximum sampling, so that the automatic question answering efficiency can be improved, and the calculation complexity is reduced;
(3) word granularity memory coding is carried out on the basis of sentence granularity memory coding, namely memory coding is carried out on two layers to form hierarchical memory coding, so that the accuracy of automatic question answering can be further improved;
(4) when the cyclic neural network is used for word granularity memory coding, the operation is carried out on the full sentence set X, the method can introduce context environment semantic information of words in the full sentence set in the word granularity memory coding process, and can improve the accuracy and timeliness of automatic question answering;
(5) the attention mechanism on the word granularity memory unit is operated on the word granularity memory unit subset after k sampling, so that interference information in memory coding is avoided, and the calculated amount of the word granularity attention mechanism is reduced;
(6) the sentence granularity and word granularity memory unit is used for jointly predicting the probability distribution of output words, so that the accuracy of automatic question answering can be further improved, and the answer selection problem of low-frequency words and unknown words is effectively solved.
Drawings
FIG. 1 is a flowchart of a question answering method based on a hierarchical memory network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a hierarchical memory network-based question-answering method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating sentence-granularity memory coding and information inference based on sentence-granularity memory coding according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating word granularity memory coding and attention activation based on the word granularity memory coding according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the performance of the question-answering method based on the hierarchical memory network according to an embodiment of the present invention;
FIG. 6 is another schematic diagram of the performance of the question-answering method based on the hierarchical memory network according to an embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
The invention discloses a question-answering method based on a hierarchical memory network, which is based on an end-to-end model of a whole neural network structure, can realize information reasoning, screening and word granularity selection in a sentence set, and effectively solves the problem of answer selection of a question-answering system under big data on low-frequency words or unknown words. The question-answering method of the invention respectively carries out two hierarchical memory codes on a sentence set with time sequence information, which respectively are as follows: sentence granularity memory coding and word granularity memory coding. And then carrying out information reasoning, screening and activation based on the hierarchical memory network, and jointly predicting the probability distribution of the candidate answer words.
The question-answering method firstly carries out sentence-vectorization memory coding on a sentence set through a hierarchical memory network, considers the position information of words in sentences and the sequence time information of the sentences in the sentence set, then completes the information reasoning of a sentence granularity memory unit through a multi-round iterative attention mechanism, carries out k maximum sampling based on the reasoning result and screens out important sentence information. And then, carrying out word granularity sequence coding on the sentence set by using a bidirectional circulation network model, carrying out information activation of a word granularity memory unit from the screened information through an attention mechanism, finally predicting output word probability distribution from the sentence granularity memory unit and the word granularity memory unit respectively, carrying out joint supervision training through Softmax, and learning an end-to-end automatic question-answering model.
The question-answering method based on the hierarchical memory network as an embodiment of the invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a question-answering method based on a hierarchical memory network according to an embodiment of the present invention, and referring to fig. 1, the question-answering method includes:
step S101: and fusing the position of the word and the time sequence information of the sentence, and performing sentence granularity memory coding on the sentence in the sentence set to obtain the double-channel memory coding of the sentence granularity memory unit.
Referring to fig. 3, step S101 includes:
sub-step S101 a: and carrying out double-channel word vector mapping on the sentences with the time sequence information in the sentence set to obtain double-channel word vectorization codes of the sentences.
The sub-step S101a includes: given a sentence set with time-sequence information X = {x_i}, i = 1, 2, ..., n, where i is the current time index of a sentence and n is the maximum time-sequence length of the sentence set, randomly initialize two word vector matrices A ∈ R^(|V|×d) and C ∈ R^(|V|×d), where |V| is the dictionary dimension and d is the dimension of the word vector; A and C are randomly initialized from a normal distribution with standard deviation 0.1 and mean 0. Two-channel word vector mapping is performed on each sentence x_i in the sentence set X, so that the word x_ij in sentence x_i obtains the two-channel vectorized codes A x_ij and C x_ij, where j is the position information of the word in sentence x_i.
Sub-step S101 b: and updating the two-channel word vectorization codes according to the position information of the words in the sentences.
The sub-step S101b includes: generate an update matrix l from the position information j of the words in the sentence and the word-vector dimension d, and update the two-channel word vectorized codes to l_gj · (A x_ij) and l_gj · (C x_ij), where:

l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)    (1)

where J_i is the number of words in sentence x_i, g is the current dimension index in the d-dimensional word vector, 1 ≤ j ≤ J_i, and 1 ≤ g ≤ d.
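For illustration, the position-weight matrix of formula (1) could be computed as in the following NumPy sketch; the function name and the array layout (rows indexed by word position, columns by vector dimension) are assumptions made for this example.

```python
import numpy as np

def position_weights(J, d):
    """Position-weight matrix l of formula (1) for a sentence of J words and
    d-dimensional word vectors; entry [j-1, g-1] holds l_gj."""
    j = np.arange(1, J + 1)[:, None]   # word positions 1..J
    g = np.arange(1, d + 1)[None, :]   # vector dimensions 1..d
    return (1 - j / J) - (g / d) * (1 - 2 * j / J)
```

For instance, position_weights(3, 4) yields the 3x4 weight matrix that rescales each dimension of each word vector of a three-word sentence.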
Sub-step S101 c: and merging the time sequence information of the sentences to perform sentence granularity memory coding on the sentences to obtain the double-channel memory coding of the sentence granularity memory unit.
The sub-step S101c includes: randomly initialize two sentence time-sequence vectorization matrices T_A ∈ R^(n×d) and T_C ∈ R^(n×d), where n is the maximum time-sequence length of the sentence set and d is the time-vector dimension, the same as the word-vector dimension; T_A and T_C are randomly initialized from a normal distribution with standard deviation 0.1 and mean 0. The two-channel memory code of the sentence-granularity memory unit is then M^(S) = {{a_i}, {c_i}}, where:

a_i = Σ_j l_j · (A x_ij) + T_A(i)    (2)

c_i = Σ_j l_j · (C x_ij) + T_C(i)    (3)

where l_j is the update vector of the j-th word in the update matrix l of sentence x_i, and the operator · denotes element-wise multiplication between vectors; for example, l_j · (A x_ij) in formula (2) is the element-wise product of the vector l_j and the vector A x_ij.
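As a worked illustration of formulas (2) and (3), the NumPy sketch below encodes one sentence into its two-channel memory code; the dictionary size, vector dimension, initialization and helper names are assumptions for the example, not the patented implementation itself.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, n = 1000, 100, 16             # assumed dictionary size, vector dim, max sentences
A  = rng.normal(0.0, 0.1, (V, d))   # word-embedding channel A
C  = rng.normal(0.0, 0.1, (V, d))   # word-embedding channel C
TA = rng.normal(0.0, 0.1, (n, d))   # temporal embedding T_A
TC = rng.normal(0.0, 0.1, (n, d))   # temporal embedding T_C

def encode_sentence(word_ids, i):
    """Two-channel memory code (a_i, c_i) of the i-th sentence, formulas (2)-(3)."""
    J = len(word_ids)
    j = np.arange(1, J + 1)[:, None]
    g = np.arange(1, d + 1)[None, :]
    l = (1 - j / J) - (g / d) * (1 - 2 * j / J)   # position weights, formula (1)
    a_i = (l * A[word_ids]).sum(axis=0) + TA[i]   # formula (2)
    c_i = (l * C[word_ids]).sum(axis=0) + TC[i]   # formula (3)
    return a_i, c_i

# Example: sentence i=0 containing the words with ids 5, 17 and 42
a0, c0 = encode_sentence([5, 17, 42], 0)
```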
Step S102: under the stimulus of the question semantic code, information reasoning of the sentence-granularity memory unit is completed through a multi-round iterative attention mechanism, obtaining the output word probability distribution in dictionary dimensions on the sentence-granularity memory unit.
Step S102 includes:
sub-step S102 a: and vectorizing and expressing the question text to obtain the semantic code of the question.
The sub-step S102a includes: using the word vector matrixFor the jth word q in the question text qjPerforming vectorized representationAnd updating the vectorized representation based on the position j of the word in the question text to obtain the semantic code of the question:
u 1 ( S ) = Σ j l j · ( Aq j ) - - - ( 4 )
the same as the formulas (2) and (3), ljTo update the matrix l in sentence xiThe update vector of the jth word in (b).
Sub-step S102 b: under the stimulation of problem semantic coding, information activation is carried out in the double-channel memory coding of the sentence granularity memory unit by using an attention mechanism;
the sub-step S102b includes: calculating attention weight of problem semantic codes in a sentence granularity memory unit by adopting a dot product mode:
α i ( S ) = s o f t max ( a i T u 1 ( S ) ) - - - ( 5 )
and under the stimulation of problem semantic coding, the activation information of the double-channel memory coding of the sentence-granularity memory unit is as follows:
sub-step S102 c: and finishing information reasoning on the sentence granularity memory unit through a multi-round iterative attention mechanism to obtain the probability distribution of output words in dictionary dimensions on the sentence granularity memory unit.
The sub-step S102c includes: perform R rounds of information activation on the sentence-granularity memory unit, find the candidate sentence set, and obtain the activation information o_R of the R-th round, where in the (r+1)-th round of information activation:

u_(r+1)^(S) = o_r + u_r^(S)    (6)

α_i^(S) = softmax(a_i^T u_(r+1)^(S))    (7)

a_i = Σ_j l_j · (A_(r+1) x_ij) + T_A^(r+1)(i)    (8)

o_(r+1) = Σ_i α_i^(S) c_i    (9)

c_i = Σ_j l_j · (C_(r+1) x_ij) + T_C^(r+1)(i)    (10)

where 1 ≤ r ≤ (R-1). The (r+1)-th round of information activation uses independent word vector matrices A_(r+1) and C_(r+1) and time vector matrices T_A^(r+1) and T_C^(r+1) to vectorize the sentence set, with adjacent rounds sharing parameters as A_(r+1) = C_r; all of these matrices are randomly initialized from a normal distribution with standard deviation 0.1 and mean 0.
Information reasoning on the sentence-granularity memory unit is completed through the R-round iterative attention mechanism, and the output word probability distribution in dictionary dimensions on the sentence-granularity memory unit is obtained as:

p^(S)(w) = softmax((C_R)^T (o_R + u_R^(S)))    (11)

where w = {w_t}, t = 1, 2, ..., |V|, is the dictionary-dimension word set, C_R is the word vector matrix of the R-th round of information activation, and T denotes the transpose operator.
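The multi-round reasoning of formulas (5)-(11) could look roughly like the following NumPy sketch; for brevity it reuses a single pair of memory codes (a, c) across rounds instead of the per-round matrices A_(r+1) and C_(r+1), so it is a simplified approximation of the described procedure rather than the exact method.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sentence_reasoning(a, c, u1, C_R, R=3):
    """Multi-round attention over the sentence-granularity memory.

    a, c : (n, d) channel-A / channel-C sentence memory codes
    u1   : (d,)   question semantic code, formula (4)
    C_R  : (V, d) word-embedding matrix of the last round
    """
    u = u1
    for _ in range(R):
        alpha = softmax(a @ u)      # formulas (5)/(7): attention over sentences
        o = alpha @ c               # formula (9): activated information
        u = o + u                   # formula (6): query update
    p_sentence = softmax(C_R @ u)   # formula (11): dictionary-level distribution
    return p_sentence, alpha        # alpha is reused for k-max sampling (step S103)
```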
The invention firstly carries out sentence granularity memory coding, completes the information reasoning of the sentence granularity memory unit through a multi-round iterative attention mechanism under the stimulation of question semantic coding, can improve the accuracy and timeliness of automatic question answering, and is beneficial to the answer selection of low-frequency words and unknown words.
Step S103: k maximum sampling is carried out on the information reasoning result of the sentence granularity memory unit, and an important sentence set with k maximum sampling is screened out from the sentence set.
Step S103 includes:
sub-step S103 a: attention weight vector activated for R-th round information on sentence-granularity memory unitSelecting k largest attention weight subsets
Sub-step S103 b: selecting k largest attention weight subsetsCorresponding sentence set as k maximum sampling important sentence setSentences in the important sentence setIs an important sentence.
According to the invention, the sentences are screened by k-max sampling, which improves the efficiency of automatic question answering, reduces the computational complexity, and facilitates answer selection for low-frequency words and unknown words.
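A minimal sketch of the k-max sampling of step S103, assuming the final-round attention weights produced by the previous sketch; the function name is illustrative only.

```python
import numpy as np

def k_max_sampling(alpha, k=1):
    """Indices of the k sentences with the largest attention weights (step S103)."""
    return np.argsort(alpha)[::-1][:k]

# Example: with alpha = [0.1, 0.7, 0.2] and k = 1, only sentence index 1 is kept
important = k_max_sampling(np.array([0.1, 0.7, 0.2]), k=1)
```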
Step S104: and performing word granularity memory coding on the sentence set by using the bidirectional cyclic neural network model to obtain the memory coding of the word granularity memory unit.
Referring to fig. 4, step S104 includes:
sub-step S104 a: and (4) encoding words in the important sentence set according to the time sequence by using the bidirectional circulation network model to obtain the hidden state of the bidirectional circulation network model. There are many existing models of the bidirectional loop network model, and this embodiment adopts one of them: gate cycle network model (GRU).
The sub-step S104a includes: use the gated recurrent unit (GRU) model to encode all the words {w_t}, t = 1, 2, ..., |t|, in the sentence set X forward and backward in time order. For the word feature C_R w_t at time t, the forward GRU encoding produces the hidden state h_t^(fw) and the backward GRU encoding produces the hidden state h_t^(bw), where |t| is the maximum sequence length after arranging all words in the sentence set X in time order, the dimensions of h_t^(fw) and h_t^(bw) are the same as the word-vector dimension d, and C_R is the word vector matrix of the R-th round of information activation in the sentence-granularity memory unit.
Sub-step S104 b: and fusing the hidden states of the bidirectional circulation network model to obtain the memory code of the word granularity memory unit.
The sub-step S104b includes: directly add the hidden states of the bidirectional recurrent network model to obtain the memory code of the word-granularity memory unit, M^(W) = {m_t}, t = 1, 2, ..., |t|, where m_t = h_t^(fw) + h_t^(bw).
The invention uses the recurrent neural network to carry out word granularity memory coding, which is operated on the whole sentence set X, and the method can introduce the context environment semantic information of the words in the whole sentence set in the word granularity memory coding process, can improve the accuracy and timeliness of automatic question answering, and is beneficial to the answer selection of low-frequency words and unknown words.
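The word-granularity memory coding of step S104 could be illustrated as below with a minimal NumPy GRU; the class layout and the random weights are stand-ins for parameters that would be learned during training, so this is an assumed sketch rather than the exact model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MiniGRU:
    """Minimal single-layer GRU used only to illustrate step S104."""
    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz, self.Uz = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
        self.Wr, self.Ur = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))
        self.Wh, self.Uh = rng.normal(0, 0.1, (d, d)), rng.normal(0, 0.1, (d, d))

    def run(self, xs):
        h, states = np.zeros(xs.shape[1]), []
        for x in xs:                                   # xs: (|t|, d) word features
            z = sigmoid(self.Wz @ x + self.Uz @ h)     # update gate
            r = sigmoid(self.Wr @ x + self.Ur @ h)     # reset gate
            h_new = np.tanh(self.Wh @ x + self.Uh @ (r * h))
            h = (1 - z) * h + z * h_new
            states.append(h)
        return np.stack(states)

def word_memory(word_vecs):
    """M^(W): forward and backward hidden states added element-wise (sub-step S104b)."""
    d = word_vecs.shape[1]
    forward  = MiniGRU(d, seed=1).run(word_vecs)
    backward = MiniGRU(d, seed=2).run(word_vecs[::-1])[::-1]
    return forward + backward
```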
Step S105: based on problem semantic coding, memory coding of a word granularity memory unit and k maximum sampling important sentence sets, word granularity output word probability distribution is obtained through an attention mechanism.
Step S105 includes:
sub-step S105 a: calculating the attention weight of the word granularity memory unit according to the problem semantic code and the memory code of the word granularity memory unit;
the sub-step S105a includes: semantic coding based on problem in R-th round information activation process on sentence granularity memory unitMemory code M of word granularity memory unit(W)={mt}t=1,2,..,|t|)And k max sampling important sentence setObtaining the attention weight vector of the normalized word granularity memory unitWherein:
α t ( w ) = s o f t m a x ( v T tanh ( Wu R ( S ) + U m ^ t ) ) - - - ( 12 )
wherein,is k max sample important sentence setSet of words in (1)Corresponding word granularity memory code M(W)={mt}t=(1,2,...,|t|)A subset ofAttention weight vector α(W)Dimension of (2) and collection of important sentences in time sequenceAll the words inThe maximum sequence length of the arranged words is consistent, namely the maximum sequence length is Andall learning parameters are learning parameters, v, W and U are initialized randomly by adopting normal distribution with standard deviation of 0.1 and mean value of 0, and are updated in a training stage.
Sub-step S105b: obtain the word-granularity output word probability distribution from the attention weight of the word-granularity memory unit. In this embodiment of the invention, the normalized attention weight α^(W) of the word-granularity memory unit is directly used as the word-granularity output word probability distribution:

p^(W)(ŵ) = α^(W)    (13)

Here the word-granularity output word probability distribution has the same dimension as the attention weight, i.e. p^(W)(ŵ) is defined over ŵ = {ŵ_t}, t = 1, 2, ..., |t̂|, the set of all words in the important sentence set X̂.
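A minimal sketch of the word-granularity attention of formulas (12) and (13); v, W and U are drawn randomly here purely for illustration, whereas in the method they are learned parameters updated during training.

```python
import numpy as np

def word_attention(u_R, m_hat, seed=0):
    """Normalised word-granularity attention used directly as p^(W), formulas (12)-(13).

    u_R   : (d,)      question semantic code of the last reasoning round
    m_hat : (|t^|, d) word memory codes of the k-max sampled sentences
    """
    d = u_R.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.normal(0, 0.1, d)
    W = rng.normal(0, 0.1, (d, d))
    U = rng.normal(0, 0.1, (d, d))
    scores = np.tanh(m_hat @ U.T + W @ u_R) @ v   # v^T tanh(W u_R + U m_t) for every t
    e = np.exp(scores - scores.max())
    return e / e.sum()                            # p^(W)(w^) = alpha^(W)
```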
The invention also carries out word granularity memory coding on the basis of sentence granularity memory coding, namely, memory coding is carried out on two layers to form hierarchical memory coding, thereby further improving the accuracy of automatic question answering and being more beneficial to answer selection of low-frequency words and unknown words. Meanwhile, the attention mechanism on the word granularity memory unit is operated on the word granularity memory unit subset after k sampling, so that interference information in memory coding is avoided, and the calculation amount of the word granularity attention mechanism is reduced.
Step S106: and jointly predicting the probability distribution of output words from the sentence granularity and word granularity memory units, and performing supervision training by using the cross entropy.
Step S106 includes:
sub-step S106 a: performing output word joint prediction based on the output word probability distribution and the word granularity output word probability distribution of the dictionary dimension on the sentence granularity memory unit, wherein the joint prediction output word distribution p (w) has the expression:
p ( w ) = p ( S ) ( w ) + p ( W ) ( w ) = p ( S ) ( w ) + t r a n s ( p ( W ) ( w ^ ) ) - - - ( 14 )
where trans (·) denotes the word granularity of the subset to output the word probability distributionWord granularity output word probability distribution mapped to dictionary dimension corpusThe mapping operation particularly refers to a probability distribution of the output wordsMiddle probability value according to its corresponding word subsetDictionary dimensional word corpus of words in (1)The position in the full set is mapped with probability value, if some words in the full set do not appear in the sub-set, the output probability is set as 0, and the output probability distribution of the mapped words is obtained
Sub-step S106 b: and performing cross entropy supervision training on the distribution of the joint prediction output words by using the distribution of the target answer words. And given that the target answer word distribution of the training set is y, performing joint optimization based on a cross entropy function of the target answer word distribution y and the joint prediction output word distribution p (w).
In an exemplary embodiment of the invention, the objective function of the joint optimization is optimized by error back-propagation with stochastic gradient descent. The optimized parameters include the word vector matrices {A_r} and {C_r}, r = 1, 2, ..., R, and the time vector matrices T_A^r and T_C^r of the sentence-granularity memory unit, the full parameter set {θ_GRU} of the bidirectional GRU model used in the word-granularity memory coding process, and v, W and U in the attention weight of the word-granularity memory unit (formula (12)).
The method jointly predicts the probability distribution of output words in the sentence granularity and word granularity memory unit, can further improve the accuracy of automatic question answering, and is more favorable for answer selection of low-frequency words and unknown words.
Fig. 2 is a schematic diagram of a framework of a question-answering method based on a hierarchical memory network according to an embodiment of the present invention. Referring to fig. 2, the question-answering method based on the hierarchical memory network has two layers of memory network units, which are respectively:
Memory unit 1: the sentence set is encoded and memorized at sentence granularity in time order;
Memory unit 2: all words in the sentence set are encoded and memorized at word granularity in time order.
Between the different memory-unit layers, k-max sampling is used to screen and filter the important information.
The information processing stage of the model has two information activation mechanisms:
Activation mechanism 1: a reasoning mechanism is used for information activation on the sentence-granularity memory unit;
Activation mechanism 2: an attention mechanism is used for word selection on the word-granularity memory unit.
The whole model training stage is guided by two kinds of supervision information:
Supervision signal 1: the sentence-granularity memory unit decodes the output vector after information reasoning and fits the target word information through a Softmax output;
Supervision signal 2: the word-granularity memory unit fits the target word information through attention-mechanism activation followed by a Softmax output.
In order to accurately evaluate the response performance of the method for automatic question answering, performance is compared by counting error samples, i.e. samples for which the answer word selected and output by the model differs from the ground-truth answer word in the data.
TABLE 1
Data field | Training/testing question-answer pairs | Dictionary size (whole/training/testing) | Unregistered target words (percentage)
Airline ticket booking | 7,000/7,000 | 10,682/5,612/5,618 | 5,070 (72.43%)
The experiments of the invention use a Chinese air-ticket-booking text data set containing 2,000 complete dialogue histories and 14,000 question-answer pairs, split into a training set and a test set at a ratio of 5:5. The invention performs no preprocessing (including stop-word removal and stemming) on these text data. Specific statistics of the data set are shown in Table 1; unregistered target words account for 72.43% of the test set, which has a considerable impact on conventional model training.
The following comparative methods were used in the experiments of the invention:
the first comparison method comprises the following steps: the method is based on a pointer network model of an attention mechanism, all words in a sentence set are regarded as a long sentence according to a time sequence for coding, and answers are generated by directly utilizing the attention mechanism of question and word coding;
and a second comparison method comprises the following steps: the neural memory network model is used for carrying out sentence granularity coding on a sentence set, and carrying out answer matching on a full dictionary space directly by using information obtained after semantic activation is carried out on a coding vector of a question.
The parameters used in the experiments of the present invention are set as shown in table 2:
TABLE 2
n | d | R | k | lr | bs
16 | 100 | 3 | 1 | 0.01 | 10
In Table 2, the parameter n is the maximum sentence time-sequence length of the experimental data's sentence set, d is the word vector dimension and the hidden-layer coding dimension, R is the number of iterations of the reasoning mechanism on the sentence-granularity memory unit, k is the maximum sampling number between the different memory layers, lr is the learning rate used for model parameter optimization with stochastic gradient descent, and bs is the number of samples per batch during model training.
In the experiment of the invention, 15 rounds of iterative training are carried out, all the methods are converged as shown in fig. 5, and the final converged experiment result is shown in table 3:
TABLE 3
Method | Number of wrong samples
Comparison method 1 | 109
Comparison method 2 | 56
The method of the invention | 0
FIG. 5 and Table 3 show the evaluation results for the number of wrong samples on the data set for the method of the invention, comparison method 1 and comparison method 2. The experimental results show that the convergence speed of the method of the invention is clearly superior to the other methods, and the final convergence results in Table 3 show that it is significantly better than the other methods, completely solving the answer selection problem on the unregistered-word set and reaching 100% accuracy.
Meanwhile, the invention verifies how the maximum sampling number k used for information screening between the hierarchical memory units affects the number of wrong samples in the answer selection problem; the experimental results are shown in FIG. 6 and Table 4. It can be seen that when the maximum sampling number is 1, both the convergence speed and the final convergence result of the method are optimal, which further illustrates the importance of information selection between the hierarchical memory units.
TABLE 4
Maximum number of samples | Number of wrong samples
3 | 5
2 | 4
1 | 0
So far, the embodiments of the present invention have been described in detail with reference to the accompanying drawings. From the above description, those skilled in the art should clearly recognize that the present invention is a question-answering method based on a hierarchical memory network.
The invention relates to a question-answering method based on a hierarchical memory network, which comprises the steps of firstly carrying out sentence granularity memory coding, finishing information reasoning of a sentence granularity memory unit through a multi-round iterative attention mechanism under the stimulation of question semantic coding, improving the accuracy and timeliness of automatic question-answering, and facilitating the answer selection of low-frequency words and unknown words; the sentences are screened through the k maximum sampling, so that the efficiency of automatic question answering can be improved, the calculation complexity is reduced, word granularity memory coding is performed on the basis of sentence granularity memory coding, namely, memory coding is performed on two levels to form hierarchical memory coding, and the accuracy of automatic question answering can be further improved; when the cyclic neural network is used for word granularity memory coding, the operation is carried out on the full sentence set X, the method can introduce context environment semantic information of words in the full sentence set in the word granularity memory coding process, and can improve the accuracy and timeliness of automatic question answering; the attention mechanism on the word granularity memory unit is operated on the word granularity memory unit subset after k sampling, so that interference information in memory coding is avoided, and the calculated amount of the word granularity attention mechanism is reduced; the sentence granularity and word granularity memory unit is used for jointly predicting the probability distribution of output words, so that the accuracy of automatic question answering can be further improved, and the answer selection problem of low-frequency words and unknown words is effectively solved.
It is to be noted that, in the attached drawings or in the description, the implementation modes not shown or described are all the modes known by the ordinary skilled person in the field of technology, and are not described in detail. In addition, the above definitions of the respective elements are not limited to the various manners mentioned in the embodiments, and those skilled in the art may easily modify or replace them, for example:
(1) directional phrases used in the embodiments, such as "upper", "lower", "front", "rear", "left", "right", etc., refer only to the orientation of the attached drawings and are not intended to limit the scope of the present invention;
(2) the embodiments described above may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e. technical features in different embodiments may be freely combined to form further embodiments.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A question-answering method based on a hierarchical memory network is characterized by comprising the following steps:
step S101: integrating the position of a word and the time sequence information of a sentence, and performing sentence granularity memory coding on the sentence in the sentence set to obtain a double-channel memory coding of a sentence granularity memory unit;
step S102: under the stimulation of problem semantic coding, completing information reasoning of the sentence granularity memory unit through a multi-round iterative attention mechanism to obtain the probability distribution of output words in dictionary dimensions on the sentence granularity memory unit;
step S103: k maximum sampling is carried out on the information inference result of the sentence granularity memory unit, and a k maximum sampling important sentence set is screened out from the sentence set;
step S104: performing word granularity memory coding on the sentence set by using a bidirectional cyclic neural network model to obtain memory coding of a word granularity memory unit;
step S105: obtaining word granularity output word probability distribution through an attention mechanism based on the problem semantic code, the memory code of the word granularity memory unit and the k maximum sampling important sentence set; and
step S106: and jointly predicting the probability distribution of output words from the sentence granularity and word granularity memory units, and performing supervision training by using the cross entropy.
2. The question-answering method according to claim 1, characterized in that said step S101 comprises:
sub-step S101 a: given a set of sentences with time series information X ═ { X ═ Xi}i=(1,2,...,n)Randomly initializing a word vector matrixAndsentence xiWord x inijIs a two-channel vectorized code ofAnd
wherein i is the current time series of sentences; n is the maximum time series length of the sentence set; | V | is a dictionary dimension; d is the dimension of the word vector; j is a word in the sentence xiThe location information in (1);
sub-step S101 b: updating the two-channel word vectorization codes according to the position information of the words in the sentences; and
sub-step S101 c: and merging the time sequence information of the sentences to perform sentence granularity memory coding on the sentences to obtain the double-channel memory coding of the sentence granularity memory unit.
3. The question-answering method according to claim 2, characterized in that said sub-step S101b comprises:
the updated two-channel word vectorized codes are l_gj · (A x_ij) and l_gj · (C x_ij), wherein

l_gj = (1 - j/J_i) - (g/d)(1 - 2j/J_i)    (1)

wherein J_i is the number of words in sentence x_i, g is the current dimension index in the d-dimensional word vector, 1 ≤ j ≤ J_i, and 1 ≤ g ≤ d.
4. The question-answering method according to claim 3, characterized in that said sub-step S101c comprises:
randomly initialize the sentence time vector matrices T_A ∈ R^(n×d) and T_C ∈ R^(n×d); the two-channel memory code of the sentence-granularity memory unit is M^(S) = {{a_i}, {c_i}}, wherein

a_i = Σ_j l_j · (A x_ij) + T_A(i)    (2)

c_i = Σ_j l_j · (C x_ij) + T_C(i)    (3)

wherein l_j is the update vector of the j-th word in the update matrix l of sentence x_i; the operator · denotes element-wise multiplication between vectors; n is the maximum time-sequence length of the sentence set; d is the time vector dimension, the same as the dimension of the word vector.
5. The question-answering method according to claim 4, characterized in that said step S102 comprises:
sub-step S102 a: using word vector matricesFor the jth word q in the question text qjPerforming vectorized representationObtaining problem semantic codes:
u 1 ( S ) = Σ j l j · ( Aq j ) - - - ( 4 )
wherein ljTo update the matrix l in sentence xiAn update vector of the jth word;
sub-step S102 b: calculating attention weight of problem semantic code in sentence-granularity memory unit
α i ( S ) = s o f t m a x ( a i T u 1 ( S ) ) - - - ( 5 )
Under the stimulation of problem semantic coding, the activation information of the double-channel memory coding of the sentence granularity memory unit is as follows:and
sub-step S102 c: and finishing information reasoning on the sentence granularity memory unit through a multi-round iterative attention mechanism to obtain the probability distribution of output words in dictionary dimensions on the sentence granularity memory unit.
6. The question-answering method according to claim 5, characterized in that said sub-step S102c comprises:
performing R round information activation on the sentence granularity memory unit to obtain the activation information O of the R roundRWherein, in the r +1 th round of information activation,
u r + 1 ( S ) = o r + u r ( S ) - - - ( 6 )
α i ( S ) = s o f t max ( a i T u r + 1 ( S ) ) - - - ( 7 )
a i = Σ j l j · ( A r + 1 x i j ) + T A r + 1 ( i ) - - - ( 8 )
o r + 1 = Σ i α i ( S ) c i - - - ( 9 )
c i = Σ j l j · ( C r + 1 x i j ) + T C r + 1 ( i ) - - - ( 10 )
wherein R is more than or equal to 1 and less than or equal to (R-1); a. ther+1=Cr
The probability distribution of output words in dictionary dimensions on the sentence granularity memory unit is as follows:
p ( S ) ( w ) = s o f t m a x ( ( C R ) r ( o R + u R ( S ) ) ) - - - ( 11 )
wherein w ═ { w ═ wt}t=(1,2,...,|V|)A dictionary dimension word set is obtained;a word vector matrix activated for the R-th round of information; t is a transpose operator.
7. The question-answering method according to claim 6, characterized in that said step S103 comprises:
sub-step S103 a: attention weight vector activated for R-th round information on sentence-granularity memory unitSelecting k largest attention weight subsetsAnd
sub-step S103 b: selecting k largest attention weight subsetsCorresponding sentence set as k maximum sampling important sentence set
8. The question-answering method according to claim 7, characterized in that said step S104 comprises:
sub-step S104 a: respectively aligning all words in sentence set X by using gate cycle network modelForward and backward encoding is carried out according to time sequence, and the hidden state of forward GRU encoding is that for the word characteristics at the time tThe hidden state of backward GRU coding is
Wherein, | t | is the maximum sequence length of words after arranging all words in the sentence set X according to the time sequence;andis the same as the dimension d of the word vector;
sub-step S104 b: obtaining the memory code M of the word granularity memory unit(W)={mt}t=(1,2,...,|t|)Wherein
9. The question-answering method according to claim 8, characterized in that said step S105 comprises:
sub-step S105 a: computing an attention weight vector for the normalized word granularity memory unitWherein:
α t ( W ) = s o f t m a x ( v T tanh ( Wu R ( S ) + U m ^ t ) ) - - - ( 12 )
wherein,is k max sample important sentence setSet of words in (1)Corresponding word granularity memory code M(W)={mt}t=(1,2,...,|t|)A subset ofAttention weight vector α(W)Has the dimension of Andis a learning parameter;
sub-step S105 b: word granularity output word probability distributionComprises the following steps:
p ( W ) ( w ^ ) = α ( W ) - - - ( 13 )
wherein, for the set of all words in the set of important sentences
10. The question-answering method according to claim 9, characterized in that said step S106 comprises:
sub-step S106 a: performing output word joint prediction based on the output word probability distribution and the word granularity output word probability distribution of the dictionary dimension on the sentence granularity memory unit, wherein the joint prediction output word distribution p (w) has the expression:
p ( w ) = p ( S ) ( w ) + p ( W ) ( w ) = p ( S ) ( w ) + t r a n s ( p ( W ) ( w ^ ) ) - - - ( 14 )
where trans (·) denotes the word granularity of the subset to output the word probability distributionWord granularity output word probability distribution mapped to dictionary dimension corpus
Sub-step S106 b: and performing cross entropy supervision training on the distribution of the joint prediction output words by using the distribution of the target answer words.
CN201610447676.4A 2016-06-20 2016-06-20 A question-answering method based on a hierarchical memory network Active CN106126596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610447676.4A CN106126596B (en) 2016-06-20 2016-06-20 A question-answering method based on a hierarchical memory network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610447676.4A CN106126596B (en) 2016-06-20 2016-06-20 A question-answering method based on a hierarchical memory network

Publications (2)

Publication Number Publication Date
CN106126596A true CN106126596A (en) 2016-11-16
CN106126596B CN106126596B (en) 2019-08-23

Family

ID=57470348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610447676.4A Active CN106126596B (en) 2016-06-20 2016-06-20 A question-answering method based on a hierarchical memory network

Country Status (1)

Country Link
CN (1) CN106126596B (en)

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Talk with the method and device of performance for lifting conversational system
CN106778014A (en) * 2016-12-29 2017-05-31 浙江大学 A kind of risk Forecasting Methodology based on Recognition with Recurrent Neural Network
CN107273487A (en) * 2017-06-13 2017-10-20 北京百度网讯科技有限公司 Generation method, device and the computer equipment of chat data based on artificial intelligence
CN107491541A (en) * 2017-08-24 2017-12-19 北京丁牛科技有限公司 File classification method and device
CN107766506A (en) * 2017-10-20 2018-03-06 哈尔滨工业大学 A kind of more wheel dialog model construction methods based on stratification notice mechanism
CN107818306A (en) * 2017-10-31 2018-03-20 天津大学 A kind of video answering method based on attention model
CN107844533A (en) * 2017-10-19 2018-03-27 云南大学 A kind of intelligent Answer System and analysis method
CN108108428A (en) * 2017-12-18 2018-06-01 苏州思必驰信息科技有限公司 A kind of method, input method and system for building language model
CN108388561A (en) * 2017-02-03 2018-08-10 百度在线网络技术(北京)有限公司 Neural network machine interpretation method and device
CN108417210A (en) * 2018-01-10 2018-08-17 苏州思必驰信息科技有限公司 A kind of word insertion language model training method, words recognition method and system
CN108549850A (en) * 2018-03-27 2018-09-18 联想(北京)有限公司 A kind of image-recognizing method and electronic equipment
CN108628935A (en) * 2018-03-19 2018-10-09 中国科学院大学 A kind of answering method based on end-to-end memory network
CN108959246A (en) * 2018-06-12 2018-12-07 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on improved attention mechanism
CN109033463A (en) * 2018-08-28 2018-12-18 广东工业大学 A kind of community's question and answer content recommendation method based on end-to-end memory network
CN109388706A (en) * 2017-08-10 2019-02-26 华东师范大学 A kind of problem fine grit classification method, system and device
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN109597884A (en) * 2018-12-28 2019-04-09 北京百度网讯科技有限公司 Talk with method, apparatus, storage medium and the terminal device generated
CN109614473A (en) * 2018-06-05 2019-04-12 安徽省泰岳祥升软件有限公司 Knowledge reasoning method and device applied to intelligent interaction
CN109658270A (en) * 2018-12-19 2019-04-19 前海企保科技(深圳)有限公司 It is a kind of to read the core compensation system and method understood based on insurance products
CN109829631A (en) * 2019-01-14 2019-05-31 北京中兴通网络科技股份有限公司 A kind of business risk early warning analysis method and system based on memory network
CN109840322A (en) * 2018-11-08 2019-06-04 中山大学 It is a kind of based on intensified learning cloze test type reading understand analysis model and method
CN109977428A (en) * 2019-03-29 2019-07-05 北京金山数字娱乐科技有限公司 A kind of method and device that answer obtains
CN109992657A (en) * 2019-04-03 2019-07-09 浙江大学 A kind of interactive problem generation method based on reinforcing Dynamic Inference
CN110019719A (en) * 2017-12-15 2019-07-16 微软技术许可有限责任公司 Based on the question and answer asserted
CN110046244A (en) * 2019-04-24 2019-07-23 中国人民解放军国防科技大学 Answer selection method for question-answering system
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 A kind of implementation method based on more attention mechanism converged network question answering systems
CN110147532A (en) * 2019-01-24 2019-08-20 腾讯科技(深圳)有限公司 Coding method, device, equipment and storage medium
CN110334195A (en) * 2019-06-26 2019-10-15 北京科技大学 A kind of answering method and system based on local attention mechanism memory network
CN110348462A (en) * 2019-07-09 2019-10-18 北京金山数字娱乐科技有限公司 A kind of characteristics of image determination, vision answering method, device, equipment and medium
CN110389996A (en) * 2018-04-16 2019-10-29 国际商业机器公司 Realize the full sentence recurrent neural network language model for being used for natural language processing
CN110555097A (en) * 2018-05-31 2019-12-10 罗伯特·博世有限公司 Slot filling with joint pointer and attention in spoken language understanding
CN110866403A (en) * 2018-08-13 2020-03-06 中国科学院声学研究所 End-to-end conversation state tracking method and system based on convolution cycle entity network
CN111047482A (en) * 2019-11-14 2020-04-21 华中师范大学 Knowledge tracking system and method based on hierarchical memory network
CN111291803A (en) * 2020-01-21 2020-06-16 中国科学技术大学 Image grading granularity migration method, system, equipment and medium
CN111310848A (en) * 2020-02-28 2020-06-19 支付宝(杭州)信息技术有限公司 Training method and device of multi-task model
CN112732879A (en) * 2020-12-23 2021-04-30 重庆理工大学 Downstream task processing method and model of question-answering task
CN113704437A (en) * 2021-09-03 2021-11-26 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
CN105159890A (en) * 2014-06-06 2015-12-16 谷歌公司 Generating representations of input sequences using neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159890A (en) * 2014-06-06 2015-12-16 谷歌公司 Generating representations of input sequences using neural networks
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SAINBAYAR SUKHBAATAR ET AL.: "End-To-End Memory Networks", 《ARXIV:1503.08895V5》 *
SARATH CHANDAR ET AL.: "Hierarchical Memory Networks", 《ARXIV:1605.07427V1》 *

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778014A (en) * 2016-12-29 2017-05-31 浙江大学 A kind of risk Forecasting Methodology based on Recognition with Recurrent Neural Network
CN106778014B (en) * 2016-12-29 2020-06-16 浙江大学 Disease risk prediction modeling method based on recurrent neural network
CN106776578A (en) * 2017-01-03 2017-05-31 竹间智能科技(上海)有限公司 Talk with the method and device of performance for lifting conversational system
US11403520B2 (en) 2017-02-03 2022-08-02 Baidu Online Network Technology (Beijing) Co., Ltd. Neural network machine translation method and apparatus
CN108388561A (en) * 2017-02-03 2018-08-10 百度在线网络技术(北京)有限公司 Neural network machine interpretation method and device
CN108388561B (en) * 2017-02-03 2022-02-25 百度在线网络技术(北京)有限公司 Neural network machine translation method and device
CN107273487A (en) * 2017-06-13 2017-10-20 北京百度网讯科技有限公司 Generation method, device and the computer equipment of chat data based on artificial intelligence
US10762305B2 (en) 2017-06-13 2020-09-01 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for generating chatting data based on artificial intelligence, computer device and computer-readable storage medium
CN109388706A (en) * 2017-08-10 2019-02-26 华东师范大学 A kind of problem fine grit classification method, system and device
CN107491541A (en) * 2017-08-24 2017-12-19 北京丁牛科技有限公司 File classification method and device
CN107491541B (en) * 2017-08-24 2021-03-02 北京丁牛科技有限公司 Text classification method and device
CN107844533A (en) * 2017-10-19 2018-03-27 云南大学 A kind of intelligent Answer System and analysis method
CN107766506A (en) * 2017-10-20 2018-03-06 哈尔滨工业大学 A kind of more wheel dialog model construction methods based on stratification notice mechanism
CN107818306B (en) * 2017-10-31 2020-08-07 天津大学 Video question-answering method based on attention model
CN107818306A (en) * 2017-10-31 2018-03-20 天津大学 A kind of video answering method based on attention model
CN110019719B (en) * 2017-12-15 2023-04-25 微软技术许可有限责任公司 Assertion-based question and answer
CN110019719A (en) * 2017-12-15 2019-07-16 微软技术许可有限责任公司 Based on the question and answer asserted
CN108108428B (en) * 2017-12-18 2020-05-12 苏州思必驰信息科技有限公司 Method, input method and system for constructing language model
CN108108428A (en) * 2017-12-18 2018-06-01 苏州思必驰信息科技有限公司 A kind of method, input method and system for building language model
CN108417210A (en) * 2018-01-10 2018-08-17 苏州思必驰信息科技有限公司 A kind of word insertion language model training method, words recognition method and system
CN108417210B (en) * 2018-01-10 2020-06-26 苏州思必驰信息科技有限公司 Word embedding language model training method, word recognition method and system
CN108628935B (en) * 2018-03-19 2021-10-15 中国科学院大学 Question-answering method based on end-to-end memory network
CN108628935A (en) * 2018-03-19 2018-10-09 中国科学院大学 A kind of answering method based on end-to-end memory network
CN108549850B (en) * 2018-03-27 2021-07-16 联想(北京)有限公司 Image identification method and electronic equipment
CN108549850A (en) * 2018-03-27 2018-09-18 联想(北京)有限公司 A kind of image-recognizing method and electronic equipment
CN110389996A (en) * 2018-04-16 2019-10-29 国际商业机器公司 Realize the full sentence recurrent neural network language model for being used for natural language processing
CN110555097A (en) * 2018-05-31 2019-12-10 罗伯特·博世有限公司 Slot filling with joint pointer and attention in spoken language understanding
CN109614473A (en) * 2018-06-05 2019-04-12 安徽省泰岳祥升软件有限公司 Knowledge reasoning method and device applied to intelligent interaction
CN108959246B (en) * 2018-06-12 2022-07-12 北京慧闻科技(集团)有限公司 Answer selection method and device based on improved attention mechanism and electronic equipment
CN108959246A (en) * 2018-06-12 2018-12-07 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on improved attention mechanism
CN110866403A (en) * 2018-08-13 2020-03-06 中国科学院声学研究所 End-to-end conversation state tracking method and system based on convolutional recurrent entity network
CN110866403B (en) * 2018-08-13 2021-06-08 中国科学院声学研究所 End-to-end conversation state tracking method and system based on convolutional recurrent entity network
CN109033463B (en) * 2018-08-28 2021-11-26 广东工业大学 Community question-answer content recommendation method based on end-to-end memory network
CN109033463A (en) * 2018-08-28 2018-12-18 广东工业大学 Community question-and-answer content recommendation method based on end-to-end memory network
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document classification method based on hierarchical multi-attention networks
CN109840322A (en) * 2018-11-08 2019-06-04 中山大学 Cloze-style reading comprehension analysis model and method based on reinforcement learning
CN109658270A (en) * 2018-12-19 2019-04-19 前海企保科技(深圳)有限公司 Claim settlement system and method based on reading comprehension of insurance products
CN109597884A (en) * 2018-12-28 2019-04-09 北京百度网讯科技有限公司 Dialogue generation method, apparatus, storage medium and terminal device
CN109829631A (en) * 2019-01-14 2019-05-31 北京中兴通网络科技股份有限公司 Business risk early-warning analysis method and system based on memory network
US11995406B2 (en) 2019-01-24 2024-05-28 Tencent Technology (Shenzhen) Company Limited Encoding method, apparatus, and device, and storage medium
CN110147532A (en) * 2019-01-24 2019-08-20 腾讯科技(深圳)有限公司 Coding method, device, equipment and storage medium
CN110147532B (en) * 2019-01-24 2023-08-25 腾讯科技(深圳)有限公司 Encoding method, apparatus, device and storage medium
CN109977428A (en) * 2019-03-29 2019-07-05 北京金山数字娱乐科技有限公司 Answer obtaining method and device
CN109977428B (en) * 2019-03-29 2024-04-02 北京金山数字娱乐科技有限公司 Answer obtaining method and device
CN109992657A (en) * 2019-04-03 2019-07-09 浙江大学 Interactive question generation method based on reinforced dynamic reasoning
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 Implementation method of a question-answering system based on a multi-attention fusion network
CN110134771B (en) * 2019-04-09 2022-03-04 广东工业大学 Implementation method of multi-attention-machine-based fusion network question-answering system
CN110046244A (en) * 2019-04-24 2019-07-23 中国人民解放军国防科技大学 Answer selection method for question-answering system
CN110046244B (en) * 2019-04-24 2021-06-08 中国人民解放军国防科技大学 Answer selection method for question-answering system
CN110334195A (en) * 2019-06-26 2019-10-15 北京科技大学 Question-answering method and system based on local attention mechanism memory network
CN110348462A (en) * 2019-07-09 2019-10-18 北京金山数字娱乐科技有限公司 Image feature determination and visual question-answering method, device, equipment and medium
CN110348462B (en) * 2019-07-09 2022-03-04 北京金山数字娱乐科技有限公司 Image feature determination and visual question and answer method, device, equipment and medium
CN111047482A (en) * 2019-11-14 2020-04-21 华中师范大学 Knowledge tracking system and method based on hierarchical memory network
CN111291803B (en) * 2020-01-21 2022-07-29 中国科学技术大学 Image grading granularity migration method, system, equipment and medium
CN111291803A (en) * 2020-01-21 2020-06-16 中国科学技术大学 Image grading granularity migration method, system, equipment and medium
CN111310848A (en) * 2020-02-28 2020-06-19 支付宝(杭州)信息技术有限公司 Training method and device of multi-task model
CN111310848B (en) * 2020-02-28 2022-06-28 支付宝(杭州)信息技术有限公司 Training method and device for multi-task model
CN112732879A (en) * 2020-12-23 2021-04-30 重庆理工大学 Downstream task processing method and model of question-answering task
CN113704437B (en) * 2021-09-03 2023-08-11 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN113704437A (en) * 2021-09-03 2021-11-26 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding

Also Published As

Publication number Publication date
CN106126596B (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN106126596B (en) A kind of answering method based on stratification memory network
CN113544703B (en) Efficient off-policy credit allocation
US20220067278A1 (en) System for entity and evidence-guided relation prediction and method of using the same
CN108681610B (en) Generative multi-turn chat dialogue method, system and computer-readable storage medium
Chen et al. Strategies for training large vocabulary neural language models
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN113190688B (en) Complex network link prediction method and system based on logical reasoning and graph convolution
Xia et al. Fully dynamic inference with deep neural networks
CN110019843A (en) Knowledge graph processing method and processing device
CN111782961B (en) Answer recommendation method oriented to machine reading understanding
CN114912419B (en) Unified machine reading understanding method based on recombination countermeasure
CN115510814B (en) Chapter-level complex problem generation method based on dual planning
CN113362963A (en) Method and system for predicting side effects among medicines based on multi-source heterogeneous network
Moriya et al. Evolution-strategy-based automation of system development for high-performance speech recognition
CN114880428B (en) Method for recognizing speech part components based on graph neural network
Serras et al. User-aware dialogue management policies over attributed bi-automata
CN116502648A (en) Machine reading understanding semantic reasoning method based on multi-hop reasoning
CN114036938B (en) News classification method for extracting text features by combining topic information and word vectors
Cífka et al. Black-box language model explanation by context length probing
Eyraud et al. TAYSIR Competition: Transformer+RNN: Algorithms to Yield Simple and Interpretable Representations
CN117744760A (en) Text information identification method and device, storage medium and electronic equipment
Cho et al. Parallel parsing in a Gradient Symbolic Computation parser
CN111723186A (en) Knowledge graph generation method based on artificial intelligence for dialog system and electronic equipment
KR20210002027A (en) Apparatus and method for evaluating self-introduction based on natural language processing
Mo et al. Fine grained knowledge transfer for personalized task-oriented dialogue systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant