CN110765755A

CN110765755A - Semantic similarity feature extraction method based on double selection gates

Info

Publication number: CN110765755A
Application number: CN201911032492.1A
Authority: CN
Inventors: 蔡晓东; 秦菲
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2019-10-28
Filing date: 2019-10-28
Publication date: 2020-02-07

Abstract

The invention discloses a semantic similarity feature extraction method based on double selection gates, which relates to the field of natural language processing. The method effectively relieves the problem of low matching efficiency caused by information redundancy, and simultaneously avoids the cost problem of manually extracting the core information.

Description

Semantic similarity feature extraction method based on double selection gates

Technical Field

The invention relates to the field of natural language processing, in particular to a semantic similarity feature extraction method based on a double selection gate.

Background

The world is full of massive information, most of the information is stored in the form of texts, and an important subject of artificial intelligence is to arrange the text information into an expression so that a computer can understand the information like a human being. Because many words in a language have multiple meanings, the same concept can be expressed in different ways and other uncertain factors exist, the traditional text similarity calculation method based on character string matching is in a search engine, a question and answer system and the like, and the user requirements are difficult to meet, when a user inputs keywords to search information matched with the keywords, the contents fed back by searching may correspond to non-conforming contents, only a few contents may conform to the searched keywords, and extreme invariance is brought to the user, so that the calculation of text similarity through deeper semantic understanding becomes a hotspot of current natural language research.

In the prior art, a plurality of sentence semantic similarity matching methods are provided, and basically focus on matching character strings at first, the basic flow is generally divided into two steps, firstly, two sentences of which the similarity needs to be judged are input into a circulating network and mapped into vector representation, and then the obtained two sentence vectors are used for judging the similarity of the two sentences through cosine distance. Although the traditional character string method is adopted to judge the similarity of sentence pairs to a certain extent, people are helped to filter out some irrelevant information when searching for relevant problems, the search result is still unsatisfactory in quality. Because the similarity between sentences judged by character strings is only the distance between words calculated at the word level, and no context semantic information exists, information is mismatched and ambiguous, and finally a user cannot quickly find related information of keywords.

Therefore, it is necessary to invent a new semantic similarity feature extraction method.

Disclosure of Invention

The invention aims to provide a semantic similarity feature extraction method based on a double selection gate, which can automatically judge the semantic similarity of two sentences, effectively reduce redundant information of the sentences through double automatic selection of core information, and improve the accuracy and judgment efficiency of the sentence similarity.

The technical scheme is as follows:

s100, carrying out word segmentation on P and Q in sentences to be processed, and carrying out vectorization representation on words subjected to word segmentation to obtain word vectors;

s200, all word vectors of the sentence pairs P and Q obtained in the step S100 are input into a first recurrent neural network in sequence to obtain context information vectors, wherein the last context information vector of the sentence represents a sentence vector of the sentence;

s300, inputting sentence vectors of the sentence pairs P and Q into a primary selection gate to obtain core information characteristics;

s400, inputting the core information obtained in the step S300 into a secondary selection gate, and acquiring the core information characteristics again;

s500, inputting the core information acquired in the step S400 into a multi-angle semantic matching network, wherein the multi-angle semantic matching network comprises four modes of full matching, maximum pooling matching, attention matching and maximum attention matching to obtain feature matching vectors of sentence pairs;

and S600, fusing the feature matching vectors obtained in the step S500 into a vector with a fixed length through a second neural network, and inputting the vector into a prediction layer to calculate the similarity probability distribution of sentence pairs.

Preferably, the first recurrent neural network is configured to generate a state vector of context information.

Preferably, the first layer of the first recurrent neural network is a single long-term and short-term memory network, the second layer of the first recurrent neural network is a bidirectional long-term and short-term memory network, and each hierarchical structure comprises a plurality of connected LSTM cell modules.

Preferably, the first recurrent neural network comprises two hierarchies;

a first layer of the first recurrent neural network is used to generate word-level vectors;

a second layer of the first recurrent neural network is used to generate a context information vector.

Preferably, the first-stage selection gate and the second-stage selection gate respectively comprise a plurality of first-stage selection gate units and second-stage selection gate units;

the primary selection gate and the secondary selection gate are different in structure and different in parameter.

Preferably, in step S200, all word vectors of the sentence pairs obtained in step S100 are sequentially input to the first cyclic network, so as to obtain a sentence state vector after each word is input, specifically:

and inputting the ith word vector and the output word vector at the ith-1 moment into the ith LSTM cell module, and processing the ith word vector and the output word vector by the ith LSTM cell module to obtain the state vector of the sentence after the ith word vector.

Preferably, in step S300, the inputting a sentence vector of a sentence pair into the first-level selection gate, and the acquiring the core information feature includes:

and inputting the context information vector at each moment of the sentence P and the ith sentence vector of the sentence Q into the first-level selection gate unit, and processing the context information vector and the ith sentence vector by the first-level selection gate unit to obtain core information.

Preferably, the step S400 of inputting the core information obtained in the step S300 into the second-level selection gate, and the step of acquiring the core information feature again includes:

and inputting the core information processed by the ith primary selection gate unit into the ith secondary selection gate unit, and processing the core information by the ith secondary selection gate unit to obtain the core information characteristics.

Preferably, in step S500, the step of inputting the core information acquired in step S400 into a multi-angle semantic matching network to obtain a feature matching vector includes:

the full matching carries out cosine similarity calculation on the context information vector at each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector;

the maximum pooling matching is used for performing cosine similarity calculation on the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q, and selecting the maximum value as a feature matching vector;

the attention matching carries out cosine calculation on the context information vector at the ith moment of the sentence P and the context information vector at the ith moment of the sentence Q respectively to obtain i cosine values of the sentence P, the i cosine values are weighted to be taken as attention weights and are multiplied by the context information at each moment of the sentence Q, and the obtained result is further subjected to cosine calculation with the context information vector at each moment of the sentence P to obtain a feature matching vector;

the maximum attention matching respectively performs cosine calculation on the context information vector at the ith moment of the sentence P and the context information vector at the ith moment of the sentence Q to obtain i cosine values of the sentence P, the maximum value is selected from the i cosine values to be taken as the attention weight and is multiplied by the context information of the sentence Q, and the obtained result is subjected to cosine calculation with the context information vector at each moment of the sentence P to obtain a feature matching vector.

Preferably, the second neural network comprises two bidirectional long-time memory networks, and is used for processing the feature matching vectors of sentence pairs and aggregating the feature matching vectors into a vector with a fixed length.

Preferably, the step S600 of fusing the matching vectors obtained in the step S500 into a vector with a fixed length by passing the matching vectors through a second neural network, and inputting the vector into the prediction layer to calculate the probability distribution of similarity between sentence pairs includes:

aggregating four feature matching vectors obtained by four matching of the sentence P into a feature matching vector with a fixed length through the second recurrent neural network;

aggregating four feature matching vectors obtained by four matching of the sentence Q into a feature matching vector with a fixed length through the bidirectional long-short time memory network;

and inputting the two feature matching vectors of the sentence P and the sentence Q into a prediction layer to obtain the sentence pair similarity.

Preferably, Word2Vec is adopted in step S100 to perform vectorization representation on the Word after the Jieba Word segmentation processing. Word2Vec is a prediction model that can efficiently learn embedded words, and the basic idea of Word2Vec is to represent each Word in natural language as a short vector with unified meaning and unified dimension.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

1. according to the semantic similarity feature extraction method based on the double selection gates, the core information in the sentences is automatically acquired without manually removing redundant information, the semantic similarity of the two sentences can be automatically judged through the semantic similarity model, the sentence similarity judgment accuracy and efficiency are higher through the semantic similarity model, and a user can be helped to find a more matched result in a question-answering or search system.

2. The semantic similarity feature extraction method based on the double selection gates utilizes a bidirectional long-time and short-time memory network to carry out context information vectorization expression on sentences. The network has a long-distance dependence relationship of a cell state capable of capturing texts, can remember a long-term state, realizes the updating, forgetting and filtering of information, better expresses a context relationship, and can solve the problems of gradient disappearance and explosion of the network. Conventional RNN networks connect past outputs and current inputs together and control both outputs by activating functions, which can only take into account the state at the most recent time.

3. According to the semantic similarity feature extraction method based on the double selection gates, the core semantic information in the sentence is automatically acquired by utilizing the two selection gates, so that the influence of redundant information on the judgment of the semantic similarity of the sentence is avoided, and the matching efficiency is improved.

4. The semantic similarity feature extraction method based on the double selection gates utilizes the multi-angle semantic matching network to perform four matching modes of full matching, maximum pooling matching, attention matching and maximum attention matching on two sentences, fully utilizes the context information vectors to perform multi-angle more detailed matching in the four matching modes, effectively avoids the problem that the accuracy of similarity judgment is low only through the cosine distance between two sentence words in the traditional method, and adopts the two-way long-short time memory network to fuse the matching vectors into the city fixed length vectors, effectively controls the dimensionality of the matching vectors and is beneficial to the calculation of the similarity of sentence pairs of a prediction layer.

5. The semantic similarity feature extraction method based on the double selection gates can effectively improve the judgment accuracy and efficiency of the semantic similarity of the sentences, and is suitable for Chinese and English sentence pair corpora.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present invention.

FIG. 2 is a block diagram of a dual select gate module according to an embodiment of the present invention.

FIG. 3 is a diagram of a multi-angle semantic matching network structure according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. Of course, the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.

It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.

Example 1

Referring to fig. 1, the present invention provides a semantic similarity feature extraction method based on a dual-selection gate, including:

s100, carrying out word segmentation on P and Q in the sentences to be processed, and carrying out vectorization representation on the words subjected to word segmentation to obtain word vectors.

The word segmentation in step S100 is a process of segmenting words in a sentence into reasonable word sequences conforming to the context meaning, and is one of key technologies and difficulties in natural language understanding and text information processing, and is also an important processing link in a semantic similarity model. The Chinese word segmentation problem is complex because there is no obvious mark between words, the words are flexible to use, varied, rich in semantics and easy to generate ambiguity. According to research, the main difficulties of Chinese text word segmentation based on statistics are ambiguity resolution, inherent nouns and new word discovery, the invention adopts Jieba to segment Chinese texts, adopts Nltk to segment English texts, thereby improving word segmentation accuracy.

Models for vectorizing words include One-hot models and Distributed models. The One-hot model is simple, but the dimension cannot be controlled, and the relation between words cannot be well represented, so that the method adopts the Distributed model, and particularly adopts Word2Vec to vectorize the words.

the first recurrent neural network is used for generating a state vector of the context information; the first recurrent neural network comprises two hierarchical structures, wherein the first layer is a single long-term and short-term memory network and is used for generating word-level vectors; the second layer is a bidirectional long-time and short-time memory network and is used for generating context information vectors; each hierarchy comprising a plurality of linked LSTM cell modules; the module parameters at different hierarchies are different in order to generate the word level and context information vectors.

Inputting all word vectors of the sentence pairs obtained in step S100 into the first cyclic network in sequence, thereby obtaining a sentence state vector after each word is input, specifically:

and inputting the ith word vector and the output word vector at the (i-1) th moment into the ith LSTM cell module, and processing the ith word vector and the output word vector by the ith LSTM cell module to obtain the state vector of the sentence after the ith word vector.

specifically, a context information vector at each moment of the sentence P and an ith sentence vector of the sentence Q are input into a first-level selection gate unit, and the core information is obtained through the processing of the ith first-level selection gate unit.

S400, inputting the core information obtained in the step S300 into a secondary selection gate, and acquiring the core information characteristics again; specifically, the core information obtained by processing of the ith primary selection gate unit is input into the ith secondary selection gate unit, and the core information characteristics are obtained by processing of the ith secondary selection gate unit.

The first-stage selection gate and the second-stage selection gate respectively comprise a plurality of first-stage selection gate units and second-stage selection gate units;

the first-level selection gate and the second-level selection gate are different in structure and different in parameter.

S500, inputting the core information acquired in the step S400 into a multi-angle semantic matching network, wherein the multi-angle semantic matching network comprises four modes of full matching, maximum pooling matching, attention matching and maximum attention matching to obtain feature matching vectors of sentence pairs; in particular to a method for preparing a high-performance nano-silver alloy,

performing cosine similarity calculation on the context information vector of each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector by full matching;

performing maximum pooling matching on the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q for cosine similarity calculation, and selecting the maximum value as a feature matching vector;

the attention matching carries out cosine calculation on the context information vector of the ith moment of the sentence P and the context information vector of the ith moment of the sentence Q respectively to obtain i cosine values of the sentence P, the i cosine values are weighted to be taken as attention weights and are multiplied by the context information of each moment of the sentence Q, and the obtained result is subjected to cosine calculation with the context information vector of each moment of the sentence P to obtain a feature matching vector;

the maximum attention matching carries out cosine calculation on the context information vector of the ith moment of the sentence P and the context information vector of the ith moment of the sentence Q respectively to obtain i cosine values of the sentence P, the maximum value is selected from the i cosine values to serve as the attention weight and is multiplied by the context information of the sentence Q, and the obtained result is further subjected to cosine calculation with the context information vector of each moment of the sentence P to obtain a feature matching vector.

The second neural network comprises two bidirectional long-time and short-time memory networks and is used for processing the feature matching vectors of sentence pairs and aggregating the feature matching vectors into a vector with a fixed length.

S600, the matching vector obtained in the step S500 is passed through a second neural network, so that the feature matching vector is fused into a vector with a fixed length, and the vector is input into a prediction layer to calculate the similarity probability distribution of sentence pairs, specifically,

aggregating four feature matching vectors obtained by four matching of the sentence P into a feature matching vector with a fixed length through a second recurrent neural network;

aggregating four feature matching vectors obtained by four matching of the sentence Q and a passing bidirectional long-and-short time memory network into a feature matching vector with a fixed length;

In step S100, Word2Vec is used to perform vectorization representation on the words subjected to Jieba Word segmentation processing.

Example 2

On the basis of the embodiment 1, the first recurrent neural network is composed of a layer of unidirectional LSTM network and a layer of bidirectional LSTM network, each layer comprises a plurality of connected LSTM cell modules, and current input information and output information at the previous moment are processed according to an input gate, a forgetting gate, an updating gate and a filtering output gate in the LSTM cell modules. The first layer of the first recurrent neural network includes a plurality of connected unidirectional LSTM cell modules for deriving a state vector for each word. The second layer of the first recurrent neural network includes a plurality of connected bi-directional LSTM cell modules for sentence-to-sentence context information vectors.

In the method, firstly, words and context information of a sentence are modeled through a first recurrent neural network, and a state vector of each word of the sentence at a corresponding moment and a context information vector of the sentence at each moment are obtained. As shown in fig. 2, in the step S200, a Long Short Term memory network (LSTM) is used in the first recurrent neural network, and a calculation formula of the network is as follows:

f_t＝σ(W_fw_t+U_fh_t-1+b_f)；

i_t＝σ(W_iw_t+U_ih_t-1+b_i)；

o_t＝σ(W_ow_t+U_oh_t-1+b_o)；

h_t＝o_ttanh(c_t)；

in the above formula f_tIs the output of the forgetting gate; i.e. i_tIs the output of the input gate; o_tIs the output of the output gate; w_f、W_i、W_o、W_c、b_f、b_i、b_o、b_cThe weight matrixes and the offset vectors are forgetting gates, input gates, output gates and selection gates;new memory information; c. C_tFor updating memory content of LSTM network unit, sigma is sigmoid function, ⊙ is element product, h is_t-1For hidden layer output at time t-1, W_tIs the input information at time t.

In the method of the invention, because the context of the sentence is modeled by the recurrent neural network, the state vector of the corresponding sentence after the word is input at the time t theoretically contains the information of all the words before the time, that is, the state vector h of the sentence obtained after the last word is input_nContains all the information of the whole sentence, therefore h_nA state vector representing the entire sentence, i.e., a sentence vector.

Example 3

On the basis of embodiment 1 or 2, the double selection gate comprises two selection gate structures, and the two selection gate structures are different and parameters are also different. Through different selection gates, the method is beneficial to filtering out redundant information in sentences and more accurately acquiring core information. The first floor select gate calculation formula is as follows:

s＝h_n；

sGate_i＝σ(W_sh_i+U_ss+b)；

in the above formula, the sentence vector is constructed by using the context hidden vector of the sentence, and the hidden layer h of the sentence is taken_nAs sentence vectors s, sGate_iIs a gate vector, W_sAnd U_sIs a weight matrix, b is a bias vector, σ is a sigmoid activation function,

is a dot product between elements.

The second layer selection gate calculates the context vector at the time t, and utilizes the sentence vector at the previous time and the hidden state h 'of the selection gate'_iCalculating the weight of the selection gate, and finally normalizing the weight of the selection gate, wherein the calculation formula is as follows:

e_i,j＝v_a ^Ttanh(W_as_t-1+U_ah'_i)；

h 'in the formula'_iA context implicit vector;

as a weight matrix, a_i,jThe normalization is selected for the selection of the gate,

as a core of the k-th statementThe heart feature vector, k 1,2, is the number of sentences in the text.

Referring to fig. 2, the sentence P is P ═ P₁,p₂,...,p_i,...,p_n]The term Q denotes Q ═ Q₁,q₂,...,q_i,...,q_m]Representing the input sentence pair sequence, inputting words by the model once, obtaining the context information vector representation of each moment of the sentence through the step S200, and obtaining the implicit vector expression matrix of the P sentence context

Context vector expression matrix for sum Q statements

Obtaining core information through two layers of selection gates in steps S300 and S400, and obtaining statement P core characteristic feature expression By analogy, statement Q expression

The method of the invention obtains the context information vector of the sentence through the recurrent neural network, thereby enabling the context semantic relevance of the two sentences to be stronger and better judging the semantic similarity of the two sentences.

As shown in fig. 3, the second recurrent neural network is a bidirectional LSTM neural network, which includes a plurality of bidirectional LSTM cell modules connected together. In order to change the feature matching vector generated by the multi-angle matching network into a vector with a fixed length and input the vector into the prediction layer, the matching vector needs to be input into the bidirectional LSTM network and fused into a vector with a fixed length.

In order to obtain the similarity judgment of two sentences, a second cyclic neural network is used, four feature matching vectors of a sentence P and a sentence Q are input into the second cyclic neural network and fused to obtain a fixed-length vector, the four feature matching vectors of the sentence Q and the sentence P are operated in the same way to respectively obtain two fixed-length matching vectors, and the vectors are input into a prediction layer to obtain the sentence pair similarity probability distribution.

The sentence semantic similarity determined by the method of the invention automatically extracts the core information characteristics from the sentences as the input of the matching network besides using the context information between the sentences, thereby improving the matching accuracy, reducing the processing of the matching network on the redundant information and improving the matching efficiency. For some words with the same meaning and different expression forms in the sentence, similarity can be judged through models, for example, two words of 'computer' and 'computer', when similarity judgment is carried out on the two words, not only the distance between the words is considered, but also the context information of the sentence where the words are located is used for judging the similarity.

The present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A semantic similarity feature extraction method based on double selection gates is characterized by comprising the following steps

2. The dual-selection-gate-based semantic similarity feature extraction method according to claim 1, wherein the first recurrent neural network is used for generating a state vector of context information.

3. The semantic similarity feature extraction method based on the dual selection gates as claimed in claim 1, wherein the first recurrent neural network is a single long-term and short-term memory network at the first layer and a bidirectional long-term and short-term memory network at the second layer, and each hierarchical structure comprises a plurality of connected LSTM cell modules.

4. The semantic similarity feature extraction method based on dual-choice gates according to claim 3,

the first recurrent neural network comprises two hierarchies;

5. The semantic similarity feature extraction method based on the dual selection gates as claimed in claim 1, wherein the primary selection gate and the secondary selection gate respectively comprise a plurality of primary selection gate units and secondary selection gate units;

6. the semantic similarity feature extraction method based on dual-choice gates according to claim 3,

in step S200, all word vectors of the sentence pairs obtained in step S100 are sequentially input to the first cyclic network, so as to obtain a sentence state vector after each word is input, specifically:

7. The semantic similarity feature extraction method based on dual-choice gates according to claim 5,

in step S300, the sentence vector of the sentence pair is input into the first-level selection gate, and obtaining the core information features includes:

8. The semantic similarity feature extraction method based on dual selection gates according to claims 1-7,

in step S400, the core information obtained in step S300 is input into the secondary selection gate, and the obtaining the core information feature again includes:

9. The method for extracting semantic similarity features based on dual selection gates according to claims 1-8, wherein in the step S500, the step of inputting the core information acquired in the step S400 into a multi-angle semantic matching network to obtain a feature matching vector comprises:

10. The method for semantic similarity feature extraction based on dual-choice gates according to claims 1-9, wherein the step S600 of passing the matching vector obtained in step S500 through a second neural network to fuse the feature matching vector into a vector with a fixed length, and inputting the vector into a prediction layer to calculate the probability distribution of similarity of sentence pairs comprises: