CN110765755A - Semantic similarity feature extraction method based on double selection gates - Google Patents
Semantic similarity feature extraction method based on double selection gates

Info

- Publication number: CN110765755A
- Application number: CN201911032492.1A
- Authority: CN (China)
- Prior art keywords: sentence, vector, matching, context information, ith
- Prior art date: 2019-10-28
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/22: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06N3/044: Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Recurrent networks, e.g. Hopfield networks
- G06N3/045: Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
Abstract
The invention discloses a semantic similarity feature extraction method based on double selection gates, and relates to the field of natural language processing. The method effectively alleviates the low matching efficiency caused by information redundancy, while avoiding the cost of manually extracting core information.
Description
Technical Field
The invention relates to the field of natural language processing, and in particular to a semantic similarity feature extraction method based on double selection gates.
Background
The world is full of massive information, most of which is stored in the form of text, and an important task of artificial intelligence is to organize text information into a representation that a computer can understand as a human does. Because many words in a language have multiple meanings, the same concept can be expressed in different ways, and other sources of uncertainty exist, traditional text similarity calculation methods based on string matching, as used in search engines, question answering systems and the like, can hardly meet user needs. When a user inputs keywords to retrieve matching information, much of the returned content may not correspond to the query, and only a little may match the searched keywords, which brings great inconvenience to the user. Computing text similarity through deeper semantic understanding has therefore become a hotspot of current natural language research.
Many sentence semantic similarity matching methods exist in the prior art, and most initially focused on string matching. The basic flow generally has two steps: first, the two sentences whose similarity is to be judged are input into a recurrent network and mapped into vector representations; then the similarity of the two sentences is judged from the cosine distance between the two sentence vectors. Although judging the similarity of sentence pairs by the traditional string method helps people, to a certain extent, to filter out some irrelevant information when searching for related questions, the quality of the search results remains unsatisfactory. Because string-based similarity only computes distances between words at the word level, without contextual semantic information, information is mismatched and ambiguous, and the user ultimately cannot quickly find the information related to the keywords.
Therefore, it is necessary to invent a new semantic similarity feature extraction method.
Disclosure of Invention
The invention aims to provide a semantic similarity feature extraction method based on double selection gates, which can automatically judge the semantic similarity of two sentences, effectively reduce redundant information in the sentences through two rounds of automatic selection of core information, and improve the accuracy and efficiency of sentence similarity judgment.
The technical scheme is as follows:
S100, performing word segmentation on the sentence pair P and Q to be processed, and vectorizing the segmented words to obtain word vectors;
S200, inputting all word vectors of the sentence pair P and Q obtained in step S100 into a first recurrent neural network in sequence to obtain context information vectors, wherein the last context information vector of a sentence represents the sentence vector of that sentence;
S300, inputting the sentence vectors of the sentence pair P and Q into a primary selection gate to obtain core information features;
S400, inputting the core information obtained in step S300 into a secondary selection gate to acquire the core information features again;
S500, inputting the core information acquired in step S400 into a multi-angle semantic matching network, which comprises four modes, namely full matching, maximum pooling matching, attention matching and maximum attention matching, to obtain the feature matching vectors of the sentence pair;
S600, fusing the feature matching vectors obtained in step S500 into a fixed-length vector through a second neural network, and inputting the vector into a prediction layer to calculate the similarity probability distribution of the sentence pair.
Preferably, the first recurrent neural network is configured to generate a state vector of context information.
Preferably, the first layer of the first recurrent neural network is a unidirectional long short-term memory (LSTM) network, the second layer is a bidirectional LSTM network, and each layer comprises a plurality of connected LSTM cell modules.
Preferably, the first recurrent neural network comprises two hierarchies;
a first layer of the first recurrent neural network is used to generate word-level vectors;
a second layer of the first recurrent neural network is used to generate a context information vector.
Preferably, the primary selection gate and the secondary selection gate comprise a plurality of primary selection gate units and secondary selection gate units, respectively;
the primary selection gate and the secondary selection gate differ in structure and in parameters.
Preferably, in step S200, all word vectors of the sentence pair obtained in step S100 are input into the first recurrent network in sequence, so as to obtain the sentence state vector after each word is input, specifically:
the ith word vector and the output at the (i-1)th moment are input into the ith LSTM cell module, which processes them to obtain the state vector of the sentence after the ith word is input.
Preferably, in step S300, inputting the sentence vectors of the sentence pair into the primary selection gate to acquire the core information features comprises:
inputting the context information vector at each moment of the sentence P (and of the sentence Q) together with the sentence vector into the ith primary selection gate unit, which processes them to obtain the core information.
Preferably, in step S400, inputting the core information obtained in step S300 into the secondary selection gate and acquiring the core information features again comprises:
inputting the core information processed by the ith primary selection gate unit into the ith secondary selection gate unit, which processes it to obtain the core information features.
Preferably, in step S500, the step of inputting the core information acquired in step S400 into a multi-angle semantic matching network to obtain a feature matching vector includes:
in full matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector;
in maximum pooling matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q, and the maximum value is selected as the feature matching vector;
in attention matching, cosine values are calculated between the context information vector at the ith moment of the sentence P and the context information vector at each moment of the sentence Q; the cosine values are normalized into attention weights and multiplied by the context information vectors of the sentence Q at each moment, and the weighted result is then matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector;
in maximum attention matching, the cosine values are calculated in the same way, the maximum value among them is taken as the attention weight and multiplied by the context information of the sentence Q, and the result is matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector.
Preferably, the second neural network comprises two bidirectional long short-term memory networks, and is used for processing the feature matching vectors of the sentence pair and aggregating them into a fixed-length vector.
Preferably, in step S600, fusing the matching vectors obtained in step S500 into a fixed-length vector through the second neural network and inputting the vector into the prediction layer to calculate the similarity probability distribution of the sentence pair comprises:
aggregating the four feature matching vectors obtained by the four matchings of the sentence P into a fixed-length feature matching vector through the second recurrent neural network;
aggregating the four feature matching vectors obtained by the four matchings of the sentence Q into a fixed-length feature matching vector through the bidirectional long short-term memory network;
and inputting the two feature matching vectors of the sentence P and the sentence Q into a prediction layer to obtain the sentence pair similarity.
Preferably, in step S100 Word2Vec is adopted to vectorize the words after Jieba word segmentation. Word2Vec is a predictive model that can efficiently learn word embeddings; its basic idea is to represent each word in natural language as a short, dense vector of a unified dimension.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. According to the semantic similarity feature extraction method based on double selection gates, the core information in the sentences is acquired automatically without manually removing redundant information, and the semantic similarity of two sentences can be judged automatically by the semantic similarity model with higher accuracy and efficiency, helping users find better-matched results in question-answering or search systems.
2. The semantic similarity feature extraction method based on double selection gates uses a bidirectional long short-term memory network to produce the context information vector representation of a sentence. The cell state of this network can capture long-distance dependencies in the text, remember long-term state, and realize the updating, forgetting and filtering of information, so it expresses context better and alleviates the gradient vanishing and gradient explosion problems of the network. A conventional RNN connects the past output with the current input and controls the output through an activation function, so it can only take the most recent state into account.
3. According to the semantic similarity feature extraction method based on the double selection gates, the core semantic information in the sentence is automatically acquired by utilizing the two selection gates, so that the influence of redundant information on the judgment of the semantic similarity of the sentence is avoided, and the matching efficiency is improved.
4. The semantic similarity feature extraction method based on double selection gates uses the multi-angle semantic matching network to perform four matching modes on the two sentences: full matching, maximum pooling matching, attention matching and maximum attention matching. These four modes make full use of the context information vectors for finer multi-angle matching, effectively avoiding the low accuracy of the traditional method that judges similarity only by the cosine distance between the words of two sentences. A bidirectional long short-term memory network is adopted to fuse the matching vectors into fixed-length vectors, which effectively controls the dimensionality of the matching vectors and facilitates the sentence-pair similarity calculation of the prediction layer.
5. The semantic similarity feature extraction method based on double selection gates can effectively improve the accuracy and efficiency of sentence semantic similarity judgment, and is applicable to both Chinese and English sentence-pair corpora.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
FIG. 2 is a block diagram of a dual select gate module according to an embodiment of the present invention.
FIG. 3 is a diagram of a multi-angle semantic matching network structure according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. Of course, the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example 1
Referring to fig. 1, the present invention provides a semantic similarity feature extraction method based on double selection gates, including:
S100, performing word segmentation on the sentence pair P and Q to be processed, and vectorizing the segmented words to obtain word vectors.
The word segmentation in step S100 is the process of cutting a sentence into a reasonable word sequence that conforms to the contextual meaning. It is one of the key technologies and difficulties in natural language understanding and text information processing, and an important processing step in the semantic similarity model. Chinese word segmentation is complex because there is no explicit delimiter between words, and words are used flexibly, vary widely, are semantically rich and easily produce ambiguity. Research shows that the main difficulties of statistics-based Chinese text segmentation are ambiguity resolution, proper nouns and new-word discovery. The invention adopts Jieba to segment Chinese text and Nltk to segment English text, thereby improving segmentation accuracy.
Models for vectorizing words include the One-hot model and the Distributed model. The One-hot model is simple, but its dimensionality cannot be controlled and it cannot represent relations between words well, so the method adopts the Distributed model, specifically Word2Vec, to vectorize the words.
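As a minimal sketch of this step (assuming the jieba tokenizer and the gensim implementation of Word2Vec; the toy corpus and all training parameters below are illustrative assumptions, not values specified by the patent):

```python
import jieba                                 # Chinese segmentation; Nltk would be used for English text
from gensim.models import Word2Vec

sentence_pair = ["今天天气很好", "今天天气不错"]           # toy sentences P and Q
tokenized = [jieba.lcut(s) for s in sentence_pair]       # e.g. ['今天', '天气', '很', '好']

# Train a small Word2Vec (Distributed) model; vector_size, window, min_count are illustrative
model = Word2Vec(tokenized, vector_size=100, window=5, min_count=1, sg=1)

# Look up the word vector of every segmented word (the output of step S100)
word_vectors = [[model.wv[w] for w in sent] for sent in tokenized]
```

In practice the Word2Vec model would be trained on a large corpus rather than on the sentence pair itself.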
S200, all word vectors of the sentence pairs P and Q obtained in the step S100 are input into a first recurrent neural network in sequence to obtain context information vectors, wherein the last context information vector of the sentence represents a sentence vector of the sentence;
the first recurrent neural network is used for generating a state vector of the context information; the first recurrent neural network comprises two hierarchical structures, wherein the first layer is a single long-term and short-term memory network and is used for generating word-level vectors; the second layer is a bidirectional long-time and short-time memory network and is used for generating context information vectors; each hierarchy comprising a plurality of linked LSTM cell modules; the module parameters at different hierarchies are different in order to generate the word level and context information vectors.
All word vectors of the sentence pair obtained in step S100 are input into the first recurrent network in sequence, so as to obtain the sentence state vector after each word is input, specifically:
the ith word vector and the output at the (i-1)th moment are input into the ith LSTM cell module, which processes them to obtain the state vector of the sentence after the ith word is input.
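To make this recurrence concrete, the following is a minimal sketch of the per-step computation (PyTorch and all sizes are illustrative assumptions; the patent does not name a framework):

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 100, 128                  # illustrative sizes
cell = nn.LSTMCell(embed_dim, hidden_dim)

word_vecs = torch.randn(9, embed_dim)             # word vectors of one sentence from step S100
h = torch.zeros(1, hidden_dim)                    # output at "moment 0"
c = torch.zeros(1, hidden_dim)                    # initial cell state

states = []
for i in range(word_vecs.size(0)):
    # the ith word vector and the (i-1)th output enter the ith LSTM cell module
    h, c = cell(word_vecs[i].unsqueeze(0), (h, c))
    states.append(h)                              # state vector of the sentence after the ith word

sentence_vector = states[-1]                      # the last state h_n represents the sentence vector
```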
S300, inputting sentence vectors of the sentence pairs P and Q into a primary selection gate to obtain core information characteristics;
specifically, a context information vector at each moment of the sentence P and an ith sentence vector of the sentence Q are input into a first-level selection gate unit, and the core information is obtained through the processing of the ith first-level selection gate unit.
S400, inputting the core information obtained in the step S300 into a secondary selection gate, and acquiring the core information characteristics again; specifically, the core information obtained by processing of the ith primary selection gate unit is input into the ith secondary selection gate unit, and the core information characteristics are obtained by processing of the ith secondary selection gate unit.
The primary selection gate and the secondary selection gate comprise a plurality of primary selection gate units and secondary selection gate units, respectively;
the primary selection gate and the secondary selection gate differ in structure and in parameters.
S500, inputting the core information acquired in step S400 into the multi-angle semantic matching network, which comprises four modes, namely full matching, maximum pooling matching, attention matching and maximum attention matching, to obtain the feature matching vectors of the sentence pair. Specifically (an illustrative sketch of the four matching operations is given after the list):
in full matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector;
in maximum pooling matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q, and the maximum value is selected as the feature matching vector;
in attention matching, cosine values are calculated between the context information vector at the ith moment of the sentence P and the context information vector at each moment of the sentence Q; the cosine values are normalized into attention weights and multiplied by the context information vectors of the sentence Q at each moment, and the weighted result is then matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector;
in maximum attention matching, the cosine values are calculated in the same way, the maximum value among them is taken as the attention weight and multiplied by the context information of the sentence Q, and the result is matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector.
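A minimal sketch of the four matching operations, assuming the context information vectors of P and Q are already available as matrices (PyTorch, single-perspective cosine; any multi-perspective weighting the full model may apply is omitted):

```python
import torch
import torch.nn.functional as F

def cosine(a, b):
    # cosine similarity between every row of a and every row of b
    return F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).t()

def multi_angle_match(Hp, Hq):
    # Hp: (n, d) context vectors of sentence P; Hq: (m, d) context vectors of sentence Q
    full = cosine(Hp, Hq[-1:])                                   # full matching: each moment of P vs Q's sentence vector
    maxpool = cosine(Hp, Hq).max(dim=1, keepdim=True).values     # maximum pooling matching
    sim = cosine(Hp, Hq)                                         # (n, m) cosine values
    weights = F.softmax(sim, dim=1)                              # attention matching: normalized cosine weights
    q_weighted = weights @ Hq                                    # weighted sum of Q's context vectors
    attentive = F.cosine_similarity(Hp, q_weighted, dim=-1).unsqueeze(1)
    best = sim.argmax(dim=1)                                     # maximum attention matching: best moment of Q
    max_attentive = F.cosine_similarity(Hp, Hq[best], dim=-1).unsqueeze(1)
    return torch.cat([full, maxpool, attentive, max_attentive], dim=1)   # (n, 4)

Hp, Hq = torch.randn(7, 256), torch.randn(9, 256)
match_p = multi_angle_match(Hp, Hq)    # feature matching vectors of sentence P
match_q = multi_angle_match(Hq, Hp)    # and symmetrically of sentence Q
```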
The second neural network comprises two bidirectional long short-term memory networks, and is used for processing the feature matching vectors of the sentence pair and aggregating them into a fixed-length vector.
S600, passing the matching vectors obtained in step S500 through the second neural network, so that the feature matching vectors are fused into a fixed-length vector, which is input into the prediction layer to calculate the similarity probability distribution of the sentence pair. Specifically,
the four feature matching vectors obtained by the four matchings of the sentence P are aggregated into a fixed-length feature matching vector through the second recurrent neural network;
the four feature matching vectors obtained by the four matchings of the sentence Q are aggregated into a fixed-length feature matching vector through the bidirectional long short-term memory network;
and the two feature matching vectors of the sentence P and the sentence Q are input into the prediction layer to obtain the sentence-pair similarity.
In step S100, Word2Vec is used to vectorize the words after Jieba word segmentation.
Example 2
On the basis of embodiment 1, the first recurrent neural network consists of one layer of unidirectional LSTM network and one layer of bidirectional LSTM network; each layer comprises a plurality of connected LSTM cell modules, which process the current input and the output of the previous moment through an input gate, a forget gate, an update gate and an output gate. The first layer of the first recurrent neural network comprises a plurality of connected unidirectional LSTM cell modules for deriving the state vector of each word. The second layer comprises a plurality of connected bidirectional LSTM cell modules for deriving the context information vectors of the sentences.
In the method, the words and context information of a sentence are first modeled by the first recurrent neural network to obtain the state vector of each word at the corresponding moment and the context information vector of the sentence at each moment. As shown in fig. 2, in step S200 the first recurrent neural network uses the long short-term memory (LSTM) network, whose calculation formulas are as follows:
f_t = σ(W_f·w_t + U_f·h_{t-1} + b_f);
i_t = σ(W_i·w_t + U_i·h_{t-1} + b_i);
o_t = σ(W_o·w_t + U_o·h_{t-1} + b_o);
c̃_t = tanh(W_c·w_t + U_c·h_{t-1} + b_c);
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t;
h_t = o_t ⊙ tanh(c_t);
In the above formulas, f_t is the output of the forget gate; i_t is the output of the input gate; o_t is the output of the output gate; W_f, W_i, W_o, W_c and b_f, b_i, b_o, b_c are the weight matrices and bias vectors of the forget gate, input gate, output gate and candidate memory; c̃_t is the new memory information; c_t is the updated memory content of the LSTM cell; σ is the sigmoid function; ⊙ is the element-wise product; h_{t-1} is the hidden-layer output at moment t-1; and w_t is the input at moment t.
In the method of the invention, because the context of the sentence is modeled by the recurrent neural network, the state vector of the sentence after the word input at moment t theoretically contains the information of all the preceding words; that is, the sentence state vector h_n obtained after the last word is input contains all the information of the whole sentence. Therefore h_n represents the state vector of the entire sentence, i.e., the sentence vector.
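A minimal sketch of this first recurrent neural network, a unidirectional word-level LSTM feeding a bidirectional context-level LSTM (PyTorch and all layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """First recurrent neural network: word-level LSTM + context-level BiLSTM."""
    def __init__(self, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.word_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)     # layer 1: word-level vectors
        self.ctx_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True,
                                bidirectional=True)                           # layer 2: context vectors

    def forward(self, word_vecs):                    # word_vecs: (batch, seq_len, embed_dim)
        word_states, _ = self.word_lstm(word_vecs)   # state vector after each word
        ctx, _ = self.ctx_lstm(word_states)          # context information vector at each moment
        sentence_vec = ctx[:, -1, :]                 # last context vector = sentence vector h_n
        return ctx, sentence_vec

encoder = ContextEncoder()
ctx_p, s_p = encoder(torch.randn(1, 7, 100))         # sentence P with 7 words
ctx_q, s_q = encoder(torch.randn(1, 9, 100))         # sentence Q with 9 words
```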
Example 3
On the basis of embodiment 1 or 2, the double selection gate comprises two selection gate structures, which differ in structure and in parameters. Passing through different selection gates helps filter out redundant information in the sentences and acquire the core information more accurately. The calculation formulas of the first-layer selection gate are as follows:
s = h_n;
sGate_i = σ(W_s·h_i + U_s·s + b);
h'_i = h_i ⊙ sGate_i;
In the above formulas, the sentence vector is constructed from the context hidden vectors of the sentence: the hidden state h_n is taken as the sentence vector s; sGate_i is the gate vector; W_s and U_s are weight matrices; b is a bias vector; σ is the sigmoid activation function; and ⊙ is the element-wise product, which yields the gated context vector h'_i.
The second-layer selection gate calculates the context vector at moment t, using the sentence vector of the previous moment s_{t-1} and the gated hidden state h'_i of the first selection gate to compute the selection-gate weights, which are finally normalized. The calculation formulas are as follows:
e_{t,i} = v_a^T·tanh(W_a·s_{t-1} + U_a·h'_i);
a_{t,i} = exp(e_{t,i}) / Σ_j exp(e_{t,j});
In the formulas, h'_i is the context hidden vector; v_a, W_a and U_a are weight parameters; a_{t,i} is the normalized selection-gate weight, from which the core feature vector of the k-th sentence is obtained, k = 1, 2, where 2 is the number of sentences in the text.
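A minimal sketch of the two selection-gate layers under the formulas above, in the selective-encoding style of the cited Zhou et al. reference; the tensor shapes, the use of the final sentence vector in place of s_{t-1}, and the weighting of the gated vectors are assumptions where the patent text is ambiguous:

```python
import torch
import torch.nn as nn

class DoubleSelectionGate(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # first-level gate: sGate_i = sigmoid(W_s h_i + U_s s + b)
        self.W_s = nn.Linear(dim, dim, bias=True)
        self.U_s = nn.Linear(dim, dim, bias=False)
        # second-level gate: e_i = v_a^T tanh(W_a s + U_a h'_i)
        self.W_a = nn.Linear(dim, dim, bias=False)
        self.U_a = nn.Linear(dim, dim, bias=False)
        self.v_a = nn.Linear(dim, 1, bias=False)

    def forward(self, H):                                   # H: (seq_len, dim) context vectors
        s = H[-1]                                           # sentence vector s = h_n
        gate = torch.sigmoid(self.W_s(H) + self.U_s(s))     # first selection gate vector
        H1 = H * gate                                       # h'_i = h_i ⊙ sGate_i
        e = self.v_a(torch.tanh(self.W_a(s) + self.U_a(H1)))
        a = torch.softmax(e, dim=0)                         # normalized selection-gate weights
        return a * H1                                       # re-weighted core information features

gate = DoubleSelectionGate()
core_p = gate(torch.randn(7, 256))    # core information features of sentence P
core_q = gate(torch.randn(9, 256))    # core information features of sentence Q
```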
Referring to fig. 2, P = [p_1, p_2, ..., p_i, ..., p_n] and Q = [q_1, q_2, ..., q_i, ..., q_m] denote the input sentence pair sequences. The model inputs one word at a time; through step S200 the context information vector representation of each moment is obtained, giving the context hidden vector matrix H^P = [h^P_1, h^P_2, ..., h^P_n] of the sentence P and the context vector matrix H^Q = [h^Q_1, h^Q_2, ..., h^Q_m] of the sentence Q. The core information is then obtained through the two selection gate layers in steps S300 and S400, yielding the core feature representation P' = [p'_1, p'_2, ..., p'_n] of the sentence P and, by analogy, Q' = [q'_1, q'_2, ..., q'_m] of the sentence Q.
The method of the invention obtains the context information vectors of the sentences through the recurrent neural network, which strengthens the contextual semantic relevance between the two sentences and allows their semantic similarity to be judged better.
As shown in fig. 3, the second recurrent neural network is a bidirectional LSTM network comprising a plurality of connected bidirectional LSTM cell modules. To turn the feature matching vectors generated by the multi-angle matching network into a fixed-length vector for the prediction layer, the matching vectors are input into the bidirectional LSTM network and fused into a fixed-length vector.
To obtain the similarity judgment of the two sentences, the second recurrent neural network is used: the four feature matching vectors of the sentence P are input into it and fused into a fixed-length vector, the four feature matching vectors of the sentence Q are processed in the same way, and the two resulting fixed-length matching vectors are input into the prediction layer to obtain the sentence-pair similarity probability distribution.
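A minimal sketch of this aggregation and prediction stage (layer sizes, the use of the final hidden state, and the two-class softmax output are illustrative assumptions; the method uses one bidirectional LSTM per sentence, for which a shared module stands in here):

```python
import torch
import torch.nn as nn

class AggregatePredict(nn.Module):
    def __init__(self, match_dim=4, hidden_dim=64, num_classes=2):
        super().__init__()
        # second recurrent neural network: bidirectional LSTM over the matching vectors
        self.agg = nn.LSTM(match_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.predict = nn.Linear(4 * hidden_dim, num_classes)   # fused vectors of P and Q concatenated

    def fuse(self, match_seq):                 # match_seq: (1, seq_len, match_dim)
        out, _ = self.agg(match_seq)
        return out[:, -1, :]                   # fixed-length fused vector

    def forward(self, match_p, match_q):
        fused = torch.cat([self.fuse(match_p), self.fuse(match_q)], dim=-1)
        # prediction layer: similarity probability distribution of the sentence pair
        return torch.softmax(self.predict(fused), dim=-1)

model = AggregatePredict()
probs = model(torch.randn(1, 7, 4), torch.randn(1, 9, 4))
```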
Besides using the context information between sentences, the method of the invention automatically extracts the core information features from the sentences as the input of the matching network, which improves the matching accuracy, reduces the matching network's processing of redundant information, and improves the matching efficiency. For words with the same meaning but different surface forms, such as the two Chinese terms for 'computer', the model can still judge similarity: it considers not only the distance between the words but also the context information of the sentences in which they appear.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A semantic similarity feature extraction method based on double selection gates, characterized by comprising the following steps:
S100, performing word segmentation on the sentence pair P and Q to be processed, and vectorizing the segmented words to obtain word vectors;
S200, inputting all word vectors of the sentence pair P and Q obtained in step S100 into a first recurrent neural network in sequence to obtain context information vectors, wherein the last context information vector of a sentence represents the sentence vector of that sentence;
S300, inputting the sentence vectors of the sentence pair P and Q into a primary selection gate to obtain core information features;
S400, inputting the core information obtained in step S300 into a secondary selection gate to acquire the core information features again;
S500, inputting the core information acquired in step S400 into a multi-angle semantic matching network, which comprises four modes, namely full matching, maximum pooling matching, attention matching and maximum attention matching, to obtain the feature matching vectors of the sentence pair;
S600, fusing the feature matching vectors obtained in step S500 into a fixed-length vector through a second neural network, and inputting the vector into a prediction layer to calculate the similarity probability distribution of the sentence pair.
2. The semantic similarity feature extraction method based on double selection gates according to claim 1, wherein the first recurrent neural network is used for generating the state vectors of the context information.
3. The semantic similarity feature extraction method based on double selection gates according to claim 1, wherein the first layer of the first recurrent neural network is a unidirectional long short-term memory network, the second layer is a bidirectional long short-term memory network, and each layer comprises a plurality of connected LSTM cell modules.
4. The semantic similarity feature extraction method based on double selection gates according to claim 3, wherein
the first recurrent neural network comprises two hierarchies;
a first layer of the first recurrent neural network is used to generate word-level vectors;
a second layer of the first recurrent neural network is used to generate a context information vector.
5. The semantic similarity feature extraction method based on double selection gates according to claim 1, wherein the primary selection gate and the secondary selection gate comprise a plurality of primary selection gate units and secondary selection gate units, respectively;
6. The semantic similarity feature extraction method based on double selection gates according to claim 3, wherein
in step S200, all word vectors of the sentence pair obtained in step S100 are input into the first recurrent network in sequence, so as to obtain the sentence state vector after each word is input, specifically:
the ith word vector and the output at the (i-1)th moment are input into the ith LSTM cell module, which processes them to obtain the state vector of the sentence after the ith word is input.
7. The semantic similarity feature extraction method based on double selection gates according to claim 5, wherein
in step S300, inputting the sentence vectors of the sentence pair into the primary selection gate to acquire the core information features comprises:
inputting the context information vector at each moment of the sentence P (and of the sentence Q) together with the sentence vector into the ith primary selection gate unit, which processes them to obtain the core information.
8. The semantic similarity feature extraction method based on double selection gates according to any one of claims 1 to 7, wherein
in step S400, inputting the core information obtained in step S300 into the secondary selection gate and acquiring the core information features again comprises:
inputting the core information processed by the ith primary selection gate unit into the ith secondary selection gate unit, which processes it to obtain the core information features.
9. The semantic similarity feature extraction method based on double selection gates according to any one of claims 1 to 8, wherein in step S500, inputting the core information acquired in step S400 into the multi-angle semantic matching network to obtain the feature matching vectors comprises:
in full matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector;
in maximum pooling matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q, and the maximum value is selected as the feature matching vector;
in attention matching, cosine values are calculated between the context information vector at the ith moment of the sentence P and the context information vector at each moment of the sentence Q; the cosine values are normalized into attention weights and multiplied by the context information vectors of the sentence Q at each moment, and the weighted result is then matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector;
in maximum attention matching, the cosine values are calculated in the same way, the maximum value among them is taken as the attention weight and multiplied by the context information of the sentence Q, and the result is matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector.
10. The semantic similarity feature extraction method based on double selection gates according to any one of claims 1 to 9, wherein in step S600, passing the matching vectors obtained in step S500 through the second neural network to fuse the feature matching vectors into a fixed-length vector and inputting the vector into the prediction layer to calculate the similarity probability distribution of the sentence pair comprises:
aggregating the four feature matching vectors obtained by the four matchings of the sentence P into a fixed-length feature matching vector through the second recurrent neural network;
aggregating the four feature matching vectors obtained by the four matchings of the sentence Q into a fixed-length feature matching vector through the bidirectional long short-term memory network;
and inputting the two feature matching vectors of the sentence P and the sentence Q into a prediction layer to obtain the sentence pair similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911032492.1A | 2019-10-28 | 2019-10-28 | Semantic similarity feature extraction method based on double selection gates
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911032492.1A | 2019-10-28 | 2019-10-28 | Semantic similarity feature extraction method based on double selection gates
Publications (1)
Publication Number | Publication Date |
---|---|
CN110765755A (en) | 2020-02-07
Family
ID=69334325
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911032492.1A | Semantic similarity feature extraction method based on double selection gates (status: Pending) | 2019-10-28 | 2019-10-28
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110765755A (en) |
2019-10-28: CN application CN201911032492.1A filed, published as CN110765755A (status: Pending)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106547885A (en) * | 2016-10-27 | 2017-03-29 | 桂林电子科技大学 | A kind of Text Classification System and method |
CN109101494A (en) * | 2018-08-10 | 2018-12-28 | 哈尔滨工业大学(威海) | A method of it is calculated for Chinese sentence semantic similarity, equipment and computer readable storage medium |
CN109214001A (en) * | 2018-08-23 | 2019-01-15 | 桂林电子科技大学 | A kind of semantic matching system of Chinese and method |
CN109165300A (en) * | 2018-08-31 | 2019-01-08 | 中国科学院自动化研究所 | Text contains recognition methods and device |
CN110162593A (en) * | 2018-11-29 | 2019-08-23 | 腾讯科技(深圳)有限公司 | A kind of processing of search result, similarity model training method and device |
CN109800390A (en) * | 2018-12-21 | 2019-05-24 | 北京石油化工学院 | A kind of calculation method and device of individualized emotion abstract |
CN110298037A (en) * | 2019-06-13 | 2019-10-01 | 同济大学 | The matched text recognition method of convolutional neural networks based on enhancing attention mechanism |
Non-Patent Citations (2)
Title |
---|
QINGYU ZHOU et al.: "Selective Encoding for Abstractive Sentence Summarization", arXiv:1704.07073v1, 24 April 2017 |
ZHIGUO WANG et al.: "Bilateral Multi-Perspective Matching for Natural Language Sentences", arXiv:1702.03814v3, 14 July 2017 |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111339249A (en) * | 2020-02-20 | 2020-06-26 | 齐鲁工业大学 | Deep intelligent text matching method and device combining multi-angle features |
CN111523241A (en) * | 2020-04-28 | 2020-08-11 | 国网浙江省电力有限公司湖州供电公司 | Method for constructing novel power load logic information model |
CN111523241B (en) * | 2020-04-28 | 2023-06-13 | 国网浙江省电力有限公司湖州供电公司 | Construction method of power load logic information model |
CN111651973A (en) * | 2020-06-03 | 2020-09-11 | 拾音智能科技有限公司 | Text matching method based on syntax perception |
CN111651973B (en) * | 2020-06-03 | 2023-11-07 | 拾音智能科技有限公司 | Text matching method based on syntactic perception |
CN111523301A (en) * | 2020-06-05 | 2020-08-11 | 泰康保险集团股份有限公司 | Contract document compliance checking method and device |
CN112434514B (en) * | 2020-11-25 | 2022-06-21 | 重庆邮电大学 | Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment |
CN112434514A (en) * | 2020-11-25 | 2021-03-02 | 重庆邮电大学 | Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment |
CN112560502B (en) * | 2020-12-28 | 2022-05-13 | 桂林电子科技大学 | Semantic similarity matching method and device and storage medium |
CN112560502A (en) * | 2020-12-28 | 2021-03-26 | 桂林电子科技大学 | Semantic similarity matching method and device and storage medium |
CN113157889A (en) * | 2021-04-21 | 2021-07-23 | 韶鼎人工智能科技有限公司 | Visual question-answering model construction method based on theme loss |
CN113177406A (en) * | 2021-04-23 | 2021-07-27 | 珠海格力电器股份有限公司 | Text processing method and device, electronic equipment and computer readable medium |
CN113177406B (en) * | 2021-04-23 | 2023-07-07 | 珠海格力电器股份有限公司 | Text processing method, text processing device, electronic equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200207 |