CN113535914B - Text semantic similarity calculation method - Google Patents

Text semantic similarity calculation method

Info

Publication number
CN113535914B
Authority
CN
China
Prior art keywords
text
matrix
texts
information
bilstm
Prior art date
Legal status
Active
Application number
CN202110654980.7A
Other languages
Chinese (zh)
Other versions
CN113535914A (en)
Inventor
许晓伟
张善平
王晓东
曹媛
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
Filing date: 2021-06-11
Publication date: 2024-05-21
Application filed by Ocean University of China
Priority to CN202110654980.7A
Publication of CN113535914A
Application granted
Publication of CN113535914B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text semantic similarity calculation method comprising the following steps: converting the texts into vector matrices; using a Siamese BiLSTM network combined with a Co-Attention mechanism to obtain global feature matrices containing text interaction information; using a CapsNet network to extract the local features of the texts and a SENet network to automatically calibrate the importance of each local feature, yielding the local feature matrices of the texts. The global and local feature matrices are fused, context information is extracted again with a BiLSTM network to obtain the similarity matrices of the two texts, and finally the semantic similarity of the texts is judged through fusion, pooling, and fully connected layers, so that the effective feature information of the texts is fully extracted. Because the interaction information of the texts to be matched is extracted, the valuable information of the texts is retained and the accuracy of text semantic similarity calculation is improved.

Description

Text semantic similarity calculation method
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a text semantic similarity calculation method based on adaptive feature calibration and a Co-Attention mechanism.
Background
With the rapid development of internet technology, natural language processing, which aims to enable computers to better understand human intentions and to return the information people need from massive amounts of text, has become a popular field of research. With the development and widespread adoption of social networks, text such as microblogs, reviews, and news has grown explosively.
Text semantic similarity calculation is a basic task in the field of natural language processing and plays an irreplaceable role in information retrieval, text classification, question-answering systems, and other applications; research on text similarity calculation is therefore of great significance.
Research on text semantic similarity in recent years can be divided into three categories. First, traditional methods based on keyword and string matching: Kondrak proposed an N-gram model that characterizes the similarity of two texts by the ratio of their common N-tuples to the total number of N-tuples; Niwattanakul et al. proposed the Jaccard coefficient to measure the similarity between two texts without focusing on the differences between set elements. Keyword-based methods only consider lexical-level similarity of the texts to be matched and ignore their semantic information, which greatly limits them. Second, calculation methods based on vector space models: Salton et al. first proposed the Vector Space Model (VSM), which maps text into vectors based on word-frequency statistics; Landauer et al. proposed the LSA model on top of VSM, mapping text from a word vector space to a semantic vector space via Singular Value Decomposition (SVD) so that the vectors carry some semantic information. Vector-space methods usually measure text similarity with Euclidean distance, cosine distance, Manhattan distance, and the like, which compensates well for the shortcomings of traditional lexical matching. However, in such methods the text is represented as a high-dimensional, sparse vector, which is detrimental to similarity computation; words are assumed to be mutually independent, ignoring polysemy, hyponymy, and synonymy, so the assumption rarely matches reality; and the contextual relations between words are ignored, so sufficient semantic information is lacking. Third, calculation methods based on deep learning: Mikolov et al. proposed the word2vec model, which, compared with one-hot representations, trains low-dimensional, dense word vectors from word context and thus carries richer semantic information; Yin et al. modeled sentence pairs, proposed three schemes for fusing an attention mechanism with CNN, and verified them on tasks such as answer selection (AS) and paraphrase identification (PI); following the development of word embeddings, Kusner et al. developed the Word Mover's Distance to measure the similarity between two texts. Deep-learning methods extract text features through neural networks to judge text similarity; they capture semantic information to some extent but lack text features such as the contrast between sentences.
Traditional statistics-based text similarity calculation methods have several shortcomings: the text is represented as a high-dimensional, sparse vector, which is not conducive to similarity calculation; words are assumed to be mutually independent, ignoring polysemy, hyponymy, and synonymy, so the assumption rarely matches reality; and contextual links between words are ignored, leaving no semantic information. Methods that combine a pre-trained language model with a neural network also have disadvantages: a convolutional neural network focuses on the local information of the text and extracts local features from the vector matrix, but it lacks contextual interaction information, and the convolution and pooling process loses many effective features; a recurrent neural network extracts contextual interaction information but lacks the ability to extract text features with long-range dependencies.
Disclosure of Invention
The invention aims to provide a text semantic similarity calculation method that overcomes the defects of the prior art: it uses a Co-Attention mechanism to extract the interaction information of the texts to be matched, uses a SECapsNet network to fully retain the valuable information of the texts, fully extracts the effective feature information of the texts, and improves the accuracy of short-text semantic similarity calculation.
The invention is realized by adopting the following technical scheme:
The text semantic similarity calculation method comprises the following steps:
step1, converting a text into a vector matrix;
Step 2, extracting global features of the text by utilizing Siamese BiLSTM networks, and extracting a global feature matrix containing text interaction information by combining a Co-Attention mechanism;
Step 3, extracting local features of the text by using CapsNet for the vector matrix, and calibrating the importance degree of each local feature by using a SENet network to obtain a local feature matrix;
Step 4, fusing the global feature matrix and the local feature matrix, and extracting context information by using a BiLSTM network to obtain the semantic similarity matrices of the two texts;
And 5, fusing the two semantic similarity matrixes, and judging the semantic similarity of the two texts through a pooling and full-connection layer.
Further, step 1 specifically includes: truncating the sentence to length m, performing word embedding with an n-dimensional pre-trained GloVe model, and representing the text as an m×n vector matrix containing semantic information.
Further, step 2 specifically includes: extracting the global features of the texts by using a parameter-sharing Siamese BiLSTM network to obtain the global feature matrices M and N of the two texts; combining the global feature matrices M and N with a Co-Attention mechanism to obtain the matrix L = Mᵀ × N; computing the softmax of L by rows and by columns respectively to obtain the attention matrices A_N = softmax(L) and A_M = softmax(Lᵀ) of the first text with respect to the second; and applying attention to the second text and generating the attention matrices C_N = M × A_N and C_M = C_N × A_M of the first text based on the information of the second text after attention, which yields the global feature matrices containing interaction information.
Further, using the SENet network to calibrate the importance of each local feature specifically includes: taking the output of the DigitCaps layer of the CapsNet network as the text local feature matrix, inputting it into the SENet network, and constructing a SECapsNet network to calibrate the local features of the text.
Further, extracting the global features of the texts by using the parameter-sharing Siamese BiLSTM network to obtain the global feature matrices M and N of the two texts is specifically as follows:
Two LSTMs operate in the forward and backward directions, and the output of the BiLSTM at time t is:

$$h_t = w_f \overrightarrow{h_t} + w_b \overleftarrow{h_t} + b_t$$

where $\overrightarrow{h_t}$ represents the output of the forward LSTM at time t, $\overleftarrow{h_t}$ represents the output of the backward LSTM at time t, $w_f$ and $w_b$ are the hidden-layer states of the forward and backward LSTM respectively, $b_t$ represents the bias, and $h_t$ represents the output of the BiLSTM at time t.
Compared with the prior art, the invention has the following advantages and positive effects. The text semantic similarity calculation method provided by the invention first preprocesses the text and performs word embedding with a pre-trained GloVe model to convert the text into a vector matrix. It then extracts the global features and interaction information of the texts with a Siamese BiLSTM network combined with a Co-Attention mechanism, obtaining global feature matrices containing text interaction information. At the same time, it extracts the local features of the texts with a CapsNet network and automatically calibrates the importance of each local feature with a SENet network, forming a SECapsNet network and obtaining the local feature matrices of the texts. The global and local feature matrices are fused, context information is extracted again with a BiLSTM network to obtain the similarity matrices of the two texts, and finally the semantic similarity of the texts is judged through fusion, pooling, and fully connected layers, so that the effective feature information of the texts is fully extracted. Compared with the prior art, the Co-Attention mechanism extracts the interaction information of the texts to be matched while the SECapsNet network fully retains the valuable information of the texts, so the effective feature information of the texts can be fully extracted and the accuracy of text semantic similarity calculation is improved.
Other features and advantages of the present invention will become more apparent from the following detailed description of embodiments of the present invention, which is to be read in connection with the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a text semantic similarity calculation method provided by the invention;
FIG. 2 is a technical flow chart of a text semantic similarity calculation method according to the present invention;
FIG. 3 is a diagram of the BiLSTM model structure in the present invention;
FIG. 4 is a diagram of a model structure of CapsNet in the present invention.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings.
The text semantic similarity calculation method provided by the invention, as shown in FIG. 1 and FIG. 2, comprises the following steps:
step S1: the text is converted into a vector matrix.
First, the texts to be matched (hereinafter a first text and a second text are taken as examples) are preprocessed to resolve problems such as misspellings and inconsistent casing, including but not limited to spell checking, symbol replacement, and unification of abbreviations.
After preprocessing, word embedding is performed using a pre-trained GloVe model:
Because training a GloVe model on a large dataset takes a lot of time, some embodiments of the invention perform word embedding with a 300-dimensional pre-trained GloVe model trained on a billion-scale corpus. The generated text vectors contain rich semantic information, avoiding both the insufficient information carried by too few dimensions and the curse-of-dimensionality problems caused by too many.
Words outside the embedding dictionary are given random 300-dimensional embeddings. Meanwhile, because sentence lengths in the dataset vary, some embodiments of the invention truncate the sentence length to 25, so that the word-embedding layer represents the original sentence as a 25×300 vector matrix.
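As an illustration of step S1, a minimal Python sketch of the embedding step follows; the `glove` lookup table and the whitespace tokenizer are placeholder assumptions, not part of the patent.

```python
import numpy as np

MAX_LEN, EMB_DIM = 25, 300  # sentence length m = 25 and GloVe dimension n = 300 as above

def embed_sentence(tokens, glove, rng=np.random.default_rng(0)):
    """Map a tokenized sentence to a MAX_LEN x EMB_DIM matrix.

    `glove` is assumed to be a dict mapping token -> 300-d vector; words
    outside the dictionary get a random embedding, as described above.
    """
    matrix = np.zeros((MAX_LEN, EMB_DIM), dtype=np.float32)
    for i, tok in enumerate(tokens[:MAX_LEN]):            # truncate to 25 tokens
        matrix[i] = glove.get(tok, rng.standard_normal(EMB_DIM))
    return matrix                                         # zero-padded if shorter

# Example: vec = embed_sentence("how do i learn python".split(), glove={})
```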
Step S2: extract the global features of the texts with a Siamese BiLSTM network, and extract global feature matrices containing text interaction information by combining a Co-Attention mechanism.
Two LSTMs operate in the forward and backward directions; the BiLSTM model structure is shown in FIG. 3, and the output of the BiLSTM at time t is:

$$h_t = w_f \overrightarrow{h_t} + w_b \overleftarrow{h_t} + b_t$$

where $\overrightarrow{h_t}$ represents the output of the forward LSTM at time t, $\overleftarrow{h_t}$ represents the output of the backward LSTM at time t, $w_f$ and $w_b$ are the hidden-layer states of the forward and backward LSTM respectively, and $b_t$ represents the bias; the output $h_t$ of the BiLSTM at time t is thus determined jointly by $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$. Finally, the global feature matrices M and N of the two texts are extracted through the BiLSTM network. In the figure, $e_t$ denotes a word after segmentation.
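A minimal PyTorch sketch of the weight-sharing Siamese BiLSTM encoder is given below; the hidden size of 128 is an assumption, and the concatenation of the two directional outputs stands in for the weighted combination in the formula above.

```python
import torch
import torch.nn as nn

class SiameseBiLSTM(nn.Module):
    """One BiLSTM instance encodes both texts, so the parameters are shared."""
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):          # x: (batch, 25, 300) embedding matrix from step S1
        h, _ = self.bilstm(x)      # h: (batch, 25, 2*hidden), forward/backward outputs
        return h

encoder = SiameseBiLSTM()
M = encoder(torch.randn(2, 25, 300))   # global features of the first texts
N = encoder(torch.randn(2, 25, 300))   # same module call -> shared parameters
```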
The Co-Attention mechanism is then used to acquire the interaction information of the two texts, yielding global feature matrices C_M and C_N that contain the text interaction information.
Specifically, the information of the global feature matrices M and N extracted by the BiLSTM is first combined using the Co-Attention mechanism to obtain the matrix

L = Mᵀ × N

The softmax of the combined matrix L is then computed by rows and by columns respectively, giving the attention matrices of the first text A and the second text B:

A_N = softmax(L)
A_M = softmax(Lᵀ)
Matrix A_N expresses an attention weight of each word in the first text A for each word in the second text B, and likewise for matrix A_M.
Next, attention is applied to the second text B, and the attention of the first text A is generated based on the information of the second text B after attention:

C_N = M × A_N
C_M = C_N × A_M
Thus, the information interaction of the two texts is realized, and the global feature matrices C_M and C_N of the two texts containing interaction information are obtained.
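The co-attention computation above can be sketched directly in NumPy; the softmax axes are an assumption, since the patent fixes only that L is normalized by rows and by columns.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))     # numerically stabilized
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(M, N):
    """M, N: (d, m) global feature matrices of the two texts (columns = words)."""
    L = M.T @ N                    # affinity matrix L = M^T x N, shape (m, m)
    A_N = softmax(L, axis=1)       # row-wise softmax: attention over the second text
    A_M = softmax(L.T, axis=1)     # softmax of L^T: attention over the first text
    C_N = M @ A_N                  # C_N = M x A_N
    C_M = C_N @ A_M                # C_M = C_N x A_M
    return C_M, C_N

C_M, C_N = co_attention(np.random.rand(256, 25), np.random.rand(256, 25))
```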
Step S3: extracting local features of the text from the vector matrix by using CapsNet, and calibrating the importance degree of each local feature by using a SENet network to obtain a local feature matrix.
The invention uses a CapsNet network to extract the local features of the text; the model structure is shown in FIG. 4. When used for classification tasks, CapsNet consists of an input layer, a convolutional layer, a PrimaryCaps layer, a DigitCaps layer, and a fully connected layer. The input layer is the matrix representation of the raw data; the Conv1 layer is a standard convolutional layer that extracts feature information from the input data through convolution kernels; in the PrimaryCaps layer, the extracted features are encapsulated into a number of primary capsules that comprehensively reflect certain classes of features; the DigitCaps layer performs propagation and dynamic routing updates on the vectors output by the PrimaryCaps layer; finally, the fully connected layer decodes the extracted feature information to judge the class of the raw data.
The dynamic routing algorithm in the CapsNet network proceeds as follows:
1) Initialize for all capsules i in layer l and capsules j in layer l+1: b_ij = 0
2) for r = 1 to 3 do
3)   for each capsule i in layer l: c_ij = softmax(b_ij)
4)   for each capsule j in layer l+1: u_i = v_i · w_i, s_j = Σ_i c_ij · u_i
5)   for each capsule j in layer l+1: v_j = Squash(s_j)
6)   for all capsules i in layer l and capsules j in layer l+1: b_ij = b_ij + u_i · v_j
7) return v_j
Here v_i is the input vector, w_i is the weight matrix, and u_i is the product of v_i and w_i; c_ij are the coupling coefficients, which express the likelihood that a lower-layer capsule activates an upper-layer capsule and are iteratively updated through b_ij; Squash is the squashing operation, similar to an activation function; and v_j is the output vector.
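A NumPy sketch of this routing-by-agreement loop follows; it assumes the prediction vectors (the products v_i · w_i from step 4) have already been computed and stacked into one array.

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squash nonlinearity: keeps the direction of s, shrinks its norm into (0, 1)."""
    sq = (s ** 2).sum(axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u, iterations=3):
    """u: (n_lower, n_upper, dim) prediction vectors; returns upper-capsule outputs."""
    b = np.zeros(u.shape[:2])                                  # 1) b_ij = 0
    for _ in range(iterations):                                # 2) for r = 1 to 3
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # 3) c_ij = softmax(b_ij)
        s = (c[..., None] * u).sum(axis=0)                     # 4) s_j = sum_i c_ij u_i
        v = squash(s)                                          # 5) v_j = Squash(s_j)
        b = b + (u * v[None]).sum(axis=-1)                     # 6) b_ij += u_i . v_j
    return v                                                   # 7) return v_j

v = dynamic_routing(np.random.rand(32, 10, 16))                # e.g. 32 -> 10 capsules
```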
Compared with the traditional convolutional neural network, the method for extracting the local features of the text by using the CapsNet network can retain more valuable information such as original sequence and semantic representation of the text.
Meanwhile, the invention takes the output of the DigitCaps layer of the CapsNet network as the text local feature matrix and inputs it into the SENet network, constructing a SECapsNet network. The SENet network mainly comprises three parts: Squeeze, Excitation, and Reweight. Squeeze applies global average pooling to compress the multi-channel feature map U obtained by convolution, changing the receptive field from local to global so that enough effective information can be obtained; Excitation is responsible for generating a weight for each feature channel, similar to the gating mechanism in a recurrent neural network; Reweight, on the basis of the Squeeze and Excitation operations, completes the recalibration of the original features in the channel dimension.
In this way, the SECapsNet network not only contains more text features and semantic information but also realizes the adaptive calibration of the local features of the text.
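The Squeeze-Excitation-Reweight pipeline just described can be sketched over a channel-first feature map as below; the reduction ratio and weight shapes are assumptions, not taken from the patent.

```python
import numpy as np

def se_recalibrate(U, W1, W2):
    """U: (channels, length) local feature map; W1: (channels//r, channels) and
    W2: (channels, channels//r) are the two excitation layers (shapes assumed)."""
    z = U.mean(axis=1)                                         # Squeeze: global average pooling
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))  # Excitation: ReLU then sigmoid gate
    return U * s[:, None]                                      # Reweight: rescale each channel

C, r = 64, 4
U = np.random.rand(C, 25)
out = se_recalibrate(U, np.random.rand(C // r, C), np.random.rand(C, C // r))
```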
Step S4: and fusing the global feature matrix and the local feature matrix, and extracting context information by using BILSTM networks to obtain semantic similarity matrixes of the two texts.
In the embodiment of the invention, the global and local features of the text are fused by concatenation, and the fused feature matrix contains more text features.
The BiLSTM network here still adopts the parameter-sharing Siamese structure; context information and text features are extracted again from the fused feature matrix, yielding the semantic similarity matrices of the two texts, as sketched below.
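Under the assumption that the global and local feature matrices share the sequence length, the fusion and re-encoding of step S4 might look like this; all feature sizes are placeholders.

```python
import torch
import torch.nn as nn

def fuse_and_encode(global_feats, local_feats, encoder):
    """Concatenate features along the feature axis, then re-encode with a
    shared BiLSTM; both inputs are assumed to be (batch, seq_len, dim)."""
    fused = torch.cat([global_feats, local_feats], dim=-1)   # concatenation step
    sim_matrix, _ = encoder(fused)                           # semantic similarity matrix
    return sim_matrix

encoder = nn.LSTM(input_size=512, hidden_size=128, batch_first=True, bidirectional=True)
sim_A = fuse_and_encode(torch.randn(2, 25, 256), torch.randn(2, 25, 256), encoder)
sim_B = fuse_and_encode(torch.randn(2, 25, 256), torch.randn(2, 25, 256), encoder)  # shared weights
```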
Step S5: and fusing the two semantic similarity matrixes, and judging the semantic similarity of the two texts through a pooling and full-connection layer.
And fusing the semantic similarity matrix of the first text A with the semantic similarity matrix of the second text B, extracting text features through pooling operation, wherein the pooling mode is global average pooling and global maximum pooling.
For the fully connected part, a 3-layer Dense network is used, with hidden units of 128, 32, and 1 respectively.
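Step S5 can be sketched as a single PyTorch module; the input feature dimension is an assumption, while the 128/32/1 hidden units follow the description above.

```python
import torch
import torch.nn as nn

class MatchingHead(nn.Module):
    """Fuse the two similarity matrices, pool, and score with a 3-layer Dense net."""
    def __init__(self, dim=256):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(4 * dim, 128), nn.ReLU(),   # hidden units 128, 32, 1 as above
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, sim_a, sim_b):                    # (batch, seq_len, dim) each
        fused = torch.cat([sim_a, sim_b], dim=-1)       # fusion of the two matrices
        avg = fused.mean(dim=1)                         # global average pooling
        mx = fused.max(dim=1).values                    # global max pooling
        return self.fc(torch.cat([avg, mx], dim=-1))    # similarity score in (0, 1)

score = MatchingHead()(torch.randn(2, 25, 256), torch.randn(2, 25, 256))
```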
The semantic similarity calculation method proposed by the invention is tested on the Quora Question Pairs dataset, which consists of more than 400,000 question pairs, where 0 or 1 indicates whether the meanings of the two questions are the same.
In the embodiment of the invention, 10,000 balanced samples are selected as the development set and 10,000 balanced samples as the test set, with the remaining examples kept as the training set. Samples of the Quora dataset are shown in Table 1 below:
Table 1
The experimental process uses accuracy, recall, precision, and F1 values to comprehensively evaluate the performance of the model.
For a binary classification problem, pairing the predicted result with the actual result produces four possible situations: TP, TN, FP, FN.
TP (True Positive) denotes the number of positive classes predicted as positive classes; TN (True Negative) denotes the number of negative classes predicted as negative classes; FP (False Positive) denotes the number of predicted negative classes as positive classes; FN (False Negative) denotes the number of positive classes predicted as negative classes.
The Accuracy (Accuracy) is the percentage of the correct prediction result to the total sample, and can measure the discrimination capability of the model to the whole data set; recall (Recall) refers to the probability of being predicted as a positive sample among samples that are actually positive; precision is the probability that all samples predicted to be positive are actually positive samples; the F1 value is the harmonic mean of the precision and recall, and higher F1 values indicate better overall model performance.
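The four evaluation metrics follow directly from the counts defined above; a minimal helper:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from the four counts defined above."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)                             # coverage of actual positives
    precision = tp / (tp + fp)                          # reliability of positive predictions
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, recall, precision, f1

print(metrics(tp=80, tn=90, fp=10, fn=20))              # (0.85, 0.8, 0.888..., 0.842...)
```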
In order to verify the effectiveness of the method proposed by the invention, three sets of comparative experiments were set up:
Experiment 1: comparison experiments with classical models.
Experiment 2: the attention mechanism and the CapsNet network are sequentially integrated in the experiment process.
Experiment 3: comparative experiments with the methods proposed in other documents.
In experiment 1, the model of the invention was compared with classical models such as LSTM, BiLSTM, and Siamese LSTM. The experimental results are shown in Table 2:
Table 2
As can be seen from Table 2, compared with traditional methods, the method provided by the invention performs better on text semantic similarity, with accuracy reaching 87.31%.
Since the BiLSTM model can extract both forward and backward sequence information, it performs better than the LSTM model. After the Siamese structure is incorporated, the accuracy, recall, precision, and F1 value of the BiLSTM model improve markedly compared with the LSTM model, which indicates that the twin structure can effectively improve model performance. The method provided by the invention therefore adopts the Siamese structure.
In experiment 2, the method presented herein was compared with models that successively incorporated the attention mechanism and the CapsNet network. The experimental results are shown in Table 3:
Table 3
Comparing the results of the BiLSTM and BiLSTM-Attention models, the BiLSTM model incorporating the attention mechanism has a higher accuracy and F1 value, which shows that the attention mechanism improves model accuracy by extracting interaction information between texts. Table 3 also shows that the accuracy after incorporating the CapsNet network reaches 88.27%, indicating that the CapsNet network better extracts the local features of the text. After the SENet network is incorporated, the accuracy of the method drops slightly but the other indicators improve, with the recall rate in particular rising by 3.3%; this shows that the SENet network can learn the dependency relationships among feature channels and thereby further improve feature utilization.
In experiment 3, the method proposed by the invention was compared with the methods proposed in other documents; the experimental results are shown in Table 4:
Table 4
Finally, comparative experiments were performed against the methods proposed in other documents. In the document proposing the BiLSTM-DenseNet method, stacked BiLSTM networks are used to extract text features, so a long time is needed to train the model; compared with that model, the accuracy of the proposed method improves by 1.81% and the F1 value by 0.25%. Table 4 also shows that the method provided by the invention achieves better precision and F1 value than the BiLSTM-CNN model.
Taken together, the experimental results on the Quora question-pairs dataset demonstrate the effectiveness of the method.
It should be noted that the above description is not intended to limit the invention, but rather the invention is not limited to the above examples, and that variations, modifications, additions or substitutions within the spirit and scope of the invention will be within the scope of the invention.

Claims (1)

1. A text semantic similarity calculation method, comprising:
step1, converting a text into a vector matrix;
Step 2, extracting global features of the text by utilizing Siamese BiLSTM networks, and extracting a global feature matrix containing text interaction information by combining a Co-Attention mechanism;
Step 3, extracting local features of the text by using CapsNet for the vector matrix, and calibrating the importance degree of each local feature by using a SENet network to obtain a local feature matrix;
Step 4, fusing the global feature matrix and the local feature matrix, and extracting context information by using a BiLSTM network to obtain the semantic similarity matrices of the two texts;
Step 5, fusing the two semantic similarity matrixes, and judging the semantic similarity of the two texts through a pooling and full-connection layer;
The step1 specifically comprises the following steps:
Truncating the sentence to length m, performing word embedding with an n-dimensional pre-trained GloVe model, and representing the text as an m×n vector matrix containing semantic information;
The step2 specifically comprises the following steps:
extracting global features of the texts by using Siamese BiLSTM networks sharing parameters to obtain global feature matrixes M and N of the two texts;
Combining the global feature matrices M and N by using a Co-Attention mechanism to obtain the matrix L = Mᵀ × N;
Computing the softmax of L by rows and by columns respectively to obtain the attention matrices A_N = softmax(L) and A_M = softmax(Lᵀ) of the first text with respect to the second text;
Applying attention to the second text, and generating the attention matrices C_N = M × A_N and C_M = C_N × A_M of the first text based on the information of the second text after attention, to obtain the global feature matrices containing interaction information;
the use of SENet networks to calibrate the importance of each local feature specifically includes:
taking the output of the DigitCaps layer of the CapsNet network as the text local feature matrix, inputting it into the SENet network, and constructing a SECapsNet network to calibrate the local features of the text;
extracting the global features of the texts by using the parameter-sharing Siamese BiLSTM network to obtain the global feature matrices M and N of the two texts is specifically:
Two LSTMs operate in the forward and backward directions, and the output of the BiLSTM at time t is $h_t = w_f \overrightarrow{h_t} + w_b \overleftarrow{h_t} + b_t$, where $\overrightarrow{h_t}$ represents the output of the forward LSTM at time t, $\overleftarrow{h_t}$ represents the output of the backward LSTM at time t, $w_f$ and $w_b$ are the hidden-layer states of the forward and backward LSTM respectively, $b_t$ represents the bias, and $h_t$ represents the output of the BiLSTM at time t.
CN202110654980.7A 2021-06-11 2021-06-11 Text semantic similarity calculation method Active CN113535914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110654980.7A CN113535914B (en) 2021-06-11 2021-06-11 Text semantic similarity calculation method

Publications (2)

Publication Number Publication Date
CN113535914A CN113535914A (en) 2021-10-22
CN113535914B (en) 2024-05-21

Family

ID=78124918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110654980.7A Active CN113535914B (en) 2021-06-11 2021-06-11 Text semantic similarity calculation method

Country Status (1)

Country Link
CN (1) CN113535914B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783817A (en) * 2019-01-15 2019-05-21 浙江大学城市学院 A text semantic similarity calculation model based on deep reinforcement learning
CN109948089A (en) * 2019-02-21 2019-06-28 中国海洋大学 A method and device for extracting Web page text
CN110245234A (en) * 2019-03-27 2019-09-17 中国海洋大学 A multi-source data sample association method based on ontology and semantic similarity
CN110909673A (en) * 2019-11-21 2020-03-24 河北工业大学 A pedestrian re-identification method based on natural language description


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mohammad Hadi Goldani et al., "Detecting fake news with capsule neural networks," Applied Soft Computing Journal, pp. 1-8. *
象炫璐, "Feature representation and retrieval of example images," China Masters' Theses Full-text Database, Information Science and Technology II, I138-1322. *

Also Published As

Publication number Publication date
CN113535914A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
TWI732271B (en) Human-machine dialog method, device, electronic apparatus and computer readable medium
Li et al. Visual to text: Survey of image and video captioning
CN108319686B (en) Antagonism cross-media retrieval method based on limited text space
CN107291693B (en) Semantic calculation method for improved word vector model
CN106919646B (en) Chinese text abstract generating system and method
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN110674252A (en) High-precision semantic search system for judicial domain
CN111027595A (en) Double-stage semantic word vector generation method
CN109101490B (en) Factual implicit emotion recognition method and system based on fusion feature representation
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN112232053A (en) Text similarity calculation system, method and storage medium based on multi-keyword pair matching
CN114428850B (en) Text retrieval matching method and system
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN116205222A (en) Aspect-level emotion analysis system and method based on multichannel attention fusion
CN113111663A (en) Abstract generation method fusing key information
CN113407697A (en) Chinese medical question classification system for deep encyclopedia learning
CN113326374A (en) Short text emotion classification method and system based on feature enhancement
Fang et al. A method of automatic text summarisation based on long short-term memory
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN115563314A (en) Knowledge graph representation learning method for multi-source information fusion enhancement
Hashemzadeh et al. Improving keyword extraction in multilingual texts.
CN115018941A (en) Text-to-image generation algorithm based on improved version text parser
CN114880427A (en) Model based on multi-level attention mechanism, event argument extraction method and system
CN114757184A (en) Method and system for realizing knowledge question answering in aviation field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant