CN114723013A - Multi-granularity knowledge enhanced semantic matching method - Google Patents

Multi-granularity knowledge enhanced semantic matching method

Info

Publication number
CN114723013A
Authority
CN
China
Prior art keywords
word
text
granularity
matching
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210390694.9A
Other languages
Chinese (zh)
Inventor
曹小鹏
王凯丽
杨笑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Posts and Telecommunications
Priority to CN202210390694.9A
Publication of CN114723013A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-granularity knowledge enhanced semantic matching method, which addresses the problems of word ambiguity and improper word segmentation in text matching. The technical scheme mainly comprises the following steps: (1) constructing an embedding model; (2) capturing matching features; (3) calculating text similarity. The method is mainly applied to text semantic matching tasks.

Description

Multi-granularity knowledge enhanced semantic matching method
Technical Field
The invention belongs to the field of computer natural language processing, and particularly relates to a method that performs semantic matching through multi-granularity knowledge enhancement.
Background
Text semantic matching is a fundamental problem and a research hotspot in natural language processing, and is widely applied in real life. For example, in dialogue question answering, semantics are matched between contexts, or a question is compared with candidate answers to select the correct answer; in reading comprehension, the passage is matched against the question to select an answer. Text matching technology therefore plays an important role in natural language processing.
Traditional short text matching mainly matches sentences at the lexical level, generally considering words and sentence patterns; the words are treated independently, context is lacking, and the semantic information of words is largely ignored. Many Chinese words are ambiguous, which makes semantic understanding difficult. Existing interaction models use only a single word vector for interaction and cannot effectively exploit context information between sentences, so the semantic features implicit in the text cannot be fully mined.
In 2013, Huang et al. proposed the Deep Structured Semantic Model (DSSM), one of the earliest applications of deep learning to text matching. Words or sentences are mapped to feature vectors with an MLP; the query and the documents are projected by two deep feed-forward neural networks into equal-length low-dimensional vectors in a latent space, and relevance is measured by cosine similarity. The model reduces dependence on word segmentation and improves generalization ability.
In 2015, Huawei's Noah's Ark Lab adopted a CNN model to address semantic matching, proposing two network architectures, ARC-I and ARC-II; ARC-II fuses the two texts after the first convolution layer. Wang and Jiang proposed a compare-aggregate model for matching text sequences, performing word-level matching and aggregating with a convolutional neural network. Subsequently, Wang et al. proposed the BiMPM model, which matches texts from multiple perspectives and performs well on both paraphrase identification and natural language inference tasks.
In 2016, Pang et al. proposed the MatchPyramid model, which focuses on the relationships between words: dot product, cosine similarity, and other calculations over the words of a sentence pair yield a matching matrix, over which a two-dimensional convolution extracts features. MatchPyramid works well for text matching but lacks the matching information that arises once words are composed into phrases. The long short-term memory network (LSTM) extracts features from long text sequences to obtain global information, overcoming CNN's inability to extract global features. Chen et al. proposed the ESIM model, an enhanced LSTM that performs local inference with an inter-sentence attention mechanism and then composes it into global inference.
In 2018, Google proposed the BERT model, which is pre-trained with masked language modeling (MLM) and next sentence prediction (NSP) and uses stacked bidirectional Transformer blocks to produce deep bidirectional language representations fused with context information. BERT performs well on NLP tasks, but the model is huge, has many network parameters, and is slow to pre-train or fine-tune.
Disclosure of Invention
The invention provides a multi-granularity knowledge enhanced semantic matching method, which mainly comprises the following steps: 1. Constructing the embedding model: the text is embedded at character granularity and word granularity; a Lattice LSTM fuses character-level and word-level information, and the HowNet external knowledge base is introduced to obtain all latent word information in the input sentence and resolve word ambiguity. 2. Capturing matching features: the two sentences are encoded at character and word granularity, and an attention mechanism captures the hidden information of the text at both granularities. 3. Calculating text similarity: maximum pooling and average pooling extract the global features and key features of the text respectively, which are fed into a prediction layer to judge whether the two sentences are similar.
The invention has the following effects: the method was evaluated on the LCQMC and BQ data sets. The best accuracy and F1 value on the LCQMC data set are 86.13% and 86.95% respectively, and on the BQ data set 84.36% and 84.40% respectively, a text matching effect superior to that of traditional models.
Drawings
FIG. 1 Overall model structure diagram
FIG. 2 Encoding structure diagram
Detailed Description
The specific implementation of the invention is divided into three steps: 1. constructing the embedding model; 2. capturing matching features; 3. calculating text similarity. First, the text is embedded at character and word granularity, and the HowNet external knowledge base is introduced. Second, the two sentences are encoded at character and word granularity, and an attention mechanism acquires the hidden information of the text at both granularities. Finally, maximum pooling and average pooling extract the global features and key features of the text respectively, which are fed into the prediction layer to judge whether the two sentences are similar. The structure of the method is shown in FIG. 1:
FIG. 1 Overall model structure diagram
(1) Constructing the embedding model
The text first needs to be preprocessed. Two sentences X = (x_1, x_2, ..., x_m) and Y = (y_1, y_2, ..., y_n) are input. Each sentence is segmented into characters and into words by different segmentation methods and converted into representation vectors, so that each input sentence obtains a multi-granularity representation over its characters and words. Existing short text matching mainly matches sentences at the lexical level, ignoring the semantic information of words and failing to fully account for the ambiguity of Chinese words. For example, the word "apple" has different meanings in different contexts: it may refer to a fruit, an electronic product, a company, and so on. Therefore, to better capture word-level characteristics, the method uses a Lattice LSTM to fuse character-level and word-level information and introduces the HowNet external knowledge base, resolving word ambiguity and obtaining all latent word information in the input sentence.
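The patent does not disclose its segmentation code, so the following Python sketch only illustrates the two granularities: the character-level view of a sentence, plus every latent word span a dictionary match would add to the lattice. The LEXICON below is a toy, hypothetical stand-in for a real dictionary.

```python
# Minimal sketch (not the patented implementation) of multi-granularity
# segmentation: character tokens plus all lexicon-matched word spans.
LEXICON = {"苹果", "手机", "喜欢", "苹果手机"}   # hypothetical toy dictionary

def char_tokens(sentence: str) -> list[str]:
    """Character-granularity view: one token per Chinese character."""
    return list(sentence)

def lattice_words(sentence: str, lexicon: set[str], max_len: int = 4):
    """All (start, end, word) spans whose surface form appears in the lexicon;
    these are the latent words a Lattice LSTM can attach to the char sequence."""
    spans = []
    n = len(sentence)
    for s in range(n):
        for e in range(s + 1, min(n, s + max_len) + 1):
            if e - s > 1 and sentence[s:e] in lexicon:
                spans.append((s, e - 1, sentence[s:e]))
    return spans

sentence = "我喜欢苹果手机"
print(char_tokens(sentence))             # ['我', '喜', '欢', '苹', '果', '手', '机']
print(lattice_words(sentence, LEXICON))  # [(1, 2, '喜欢'), (3, 4, '苹果'), (3, 6, '苹果手机'), (5, 6, '手机')]
```

Note how "苹果手机" contributes both the nested words and the full compound, which is exactly the ambiguity the lattice is meant to preserve.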
The Lattice LSTM can use both character and word information; its input consists of a character sequence and a word sequence. Assume the input of the Lattice LSTM model is a character sequence c_1, c_2, ..., c_n together with the word vectors of all words in a lexicon D matched within that sequence. Given the input sentence, a word w_{s,e} matched in D is embedded as:

x^w_{s,e} = e^w(w_{s,e})

where e^w denotes the word-embedding lookup table and s, e refer to the beginning and end character positions of the word.
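A minimal sketch of this lookup, assuming PyTorch's nn.Embedding plays the role of the table e^w; the vocabulary and embedding size are illustrative, not taken from the patent.

```python
# Sketch of x^w_{s,e} = e^w(w_{s,e}): map each matched word to its vector.
import torch
import torch.nn as nn

word2id = {"喜欢": 0, "苹果": 1, "苹果手机": 2, "手机": 3}       # hypothetical vocab
e_w = nn.Embedding(num_embeddings=len(word2id), embedding_dim=50)  # lookup table e^w

matched = [(1, 2, "喜欢"), (3, 4, "苹果")]                         # (s, e, w_{s,e}) spans
ids = torch.tensor([word2id[w] for (_, _, w) in matched])
x_w = e_w(ids)                       # one embedding per matched word w_{s,e}
print(x_w.shape)                     # torch.Size([2, 50])
```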
Given a word w_{s,e}, its h-th sense (as enumerated by HowNet) is represented as x^{w,h}_{s,e}, and the h-th sense is processed by the word-level cell update to produce c^{w,h}_{s,e}, the memory cell of the h-th sense of w_{s,e}. All senses are then combined into a single word-level cell, denoted c^w_{s,e}; merging the senses of an ambiguous word into c^w_{s,e} in this way captures the word's semantic information more fully. The recurrent paths of all words ending at character e then flow into the character cell c^c_e, and finally the hidden state h^c_e is computed from it.
(2) Capturing matching features
The method uses a GRU and a BiGRU to encode the two sentences at character granularity and word granularity respectively, performing deep feature extraction on the input character vectors and word vectors. Sentence X is represented by its character-level hidden states h^c_1, ..., h^c_Q and its word-level hidden states h^w_1, ..., h^w_P, where h^c_q is the hidden state the encoding module generates for the q-th character and h^w_p is the hidden state it generates for the p-th word.
The GRU layer is the first layer of the encoder; the embedding-layer output and the GRU output are combined and passed to the BiGRU layer, and finally the outputs of the GRU and BiGRU layers are combined into the final representation. During matching, to obtain information between the different granularities within a sentence, an attention mechanism computes the similarity a_{q,p} of each character-granularity and word-granularity hidden-state pair (h^c_q, h^w_p), normalized with a softmax. These attention weights over the sentence's granularities yield the final characterization of sentence X as the attention-weighted combination of its character-level and word-level hidden states.
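A hedged PyTorch sketch of this encoder and the cross-granularity attention; the exact wiring, the dot-product similarity, and the layer sizes are plausible assumptions, since the patent shows the attention formula only as an image.

```python
# Sketch: GRU -> BiGRU encoder, then attention between char and word states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GranularityEncoder(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.bigru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(dim + 2 * dim, dim)   # combine GRU and BiGRU outputs

    def forward(self, emb):                           # emb: (B, T, dim)
        g, _ = self.gru(emb)                          # (B, T, dim)
        b, _ = self.bigru(g)                          # (B, T, 2*dim)
        return self.proj(torch.cat([g, b], dim=-1))   # (B, T, dim)

def cross_granularity_attention(h_char, h_word):
    # h_char: (B, Q, dim) character states; h_word: (B, P, dim) word states
    scores = torch.bmm(h_char, h_word.transpose(1, 2))   # (B, Q, P) similarities
    a = F.softmax(scores, dim=-1)                        # attention weights a_{q,p}
    return torch.bmm(a, h_word)                          # char states enriched with word info

enc = GranularityEncoder(16)
h_c = enc(torch.randn(2, 7, 16))       # character-granularity encoding
h_w = enc(torch.randn(2, 4, 16))       # word-granularity encoding
print(cross_granularity_attention(h_c, h_w).shape)   # torch.Size([2, 7, 16])
```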
the coding structure is shown in fig. 2:
FIG. 2 is a diagram of a coding structure
(3) Calculating text similarity
To calculate text similarity, the model extracts the global features and key features of each text with maximum pooling and average pooling respectively, and concatenates the two vectors into an output vector. At the prediction layer, the method aggregates the feature representations of the two sentences X and Y in several ways and uses a softmax activation to compute the similarity probability of the two texts, judging whether the sentences are similar:

g = H([g_x, g_y, g_x ⊙ g_y, |g_x - g_y|])

where H(·) is a feed-forward neural network with two hidden layers and g_x, g_y are the sentence vectors.
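A minimal sketch of this pooling and aggregation step, following the stated formula with H as a two-hidden-layer feed-forward network; the hidden sizes and the two-class softmax output are illustrative choices, not disclosed values.

```python
# Sketch of the prediction layer: pool, build [gx, gy, gx*gy, |gx-gy|], apply H.
import torch
import torch.nn as nn

def pool(h):                                   # h: (B, T, dim)
    # global (max) and key (mean) features, concatenated
    return torch.cat([h.max(dim=1).values, h.mean(dim=1)], dim=-1)   # (B, 2*dim)

class Predictor(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        d = 2 * dim                            # pooled vector size
        self.H = nn.Sequential(                # feed-forward net, two hidden layers
            nn.Linear(4 * d, d), nn.ReLU(),
            nn.Linear(d, d), nn.ReLU(),
            nn.Linear(d, 2),                   # similar / not similar
        )

    def forward(self, hx, hy):
        gx, gy = pool(hx), pool(hy)
        feats = torch.cat([gx, gy, gx * gy, (gx - gy).abs()], dim=-1)
        return torch.softmax(self.H(feats), dim=-1)   # similarity probabilities

p = Predictor(16)
print(p(torch.randn(2, 7, 32), torch.randn(2, 5, 32)).shape)   # torch.Size([2, 2])
```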
Finally, given N training samples {(X_i, Y_i, v_i)}_{i=1..N}, the binary cross-entropy function is taken as the loss:

L = -(1/N) Σ_{i=1}^{N} [ v_i log(g_i) + (1 - v_i) log(1 - g_i) ]

where v_i ∈ {0, 1} is the label of the i-th training sample and g_i ∈ (0, 1) is the model's predicted probability.
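This loss reduces to PyTorch's built-in binary cross entropy; the tensors below are toy values, not training data.

```python
# The BCE loss above, computed directly on predicted probabilities.
import torch
import torch.nn.functional as F

labels = torch.tensor([1.0, 0.0, 1.0])          # v_i in {0, 1}
probs = torch.tensor([0.9, 0.2, 0.6])           # g_i in (0, 1)
loss = F.binary_cross_entropy(probs, labels)    # -(1/N) Σ v_i·log g_i + (1-v_i)·log(1-g_i)
print(loss.item())
```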
The experimental hardware environment is an Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz with 16 GB of memory; the software environment is Windows 10. The invention is evaluated on the LCQMC and BQ data sets, demonstrating its superiority over other methods.
Embodiment 1: semantic matching
Semantic matching determines whether two texts express the same meaning; the output may be a text-similarity score or directly a 0/1 label. In this example, the 0/1 label is used. The evaluation metrics are Accuracy (ACC) and F1-Score, calculated as follows:
ACC = (TP + TN) / (TP + TN + FP + FN)
P = TP / (TP + FP)
R = TP / (TP + FN)
F1-Score = 2PR / (P + R)
where ACC is the proportion of correctly classified examples; TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives; P and R are precision and recall, and F1-Score is their harmonic mean. A small sanity check of these metrics is sketched below; the results of the experiment are shown in Table 1.
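A quick, self-contained check of the ACC and F1 definitions above; the label arrays are toy data, not the LCQMC/BQ results.

```python
# Compute ACC and F1 from raw 0/1 predictions, matching the formulas above.
def acc_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

print(acc_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))   # (0.6, 0.666...)
```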
Table 1: results of the experiment
[Table 1 appears only as an image in the original; it reports accuracy and F1 on the LCQMC and BQ data sets for the proposed method (86.13%/86.95% and 84.36%/84.40%, per the text) against Lattice-CNN, BiLSTM, BiMPM and ESIM.]
As can be seen from Table 1, the invention outperforms the other models on both the LCQMC and BQ data sets. Its results surpass Lattice-CNN: although Lattice-CNN uses a word-lattice graph, its structure is limited and attends only to partial information of the sentences, so their semantic information remains incomplete. The results are also clearly better than BiLSTM, BiMPM and ESIM: BiLSTM can capture long-distance semantic dependencies in both directions, and BiMPM and ESIM can match from multiple perspectives, but all three start only from characters or only from words, so their feature extraction is insufficient, whereas the invention embeds both character and word granularity, acquires semantic information from an external knowledge base, and encodes the sentences accordingly. This analysis shows that the method outperforms the others on the LCQMC and BQ data sets, indicating that multi-granularity information and external knowledge play an important role in text matching research.
In summary, the invention captures text semantic information at character granularity and word granularity, combining multi-granularity text representation with an external knowledge base. Experiments show that capturing text matching features at multiple granularities in combination with external knowledge outperforms neural networks that extract text information at a single granularity.
The above examples merely illustrate the invention and should not be construed as limiting its scope; any design similar or equivalent to the present invention is intended to fall within the scope of the claims.

Claims (1)

1. A multi-granularity knowledge enhanced semantic matching method, characterized by comprising the following steps:
(1) Constructing an embedding model: the text is embedded at character granularity and word granularity; a Lattice LSTM fuses character-level and word-level information, and the HowNet external knowledge base is introduced to obtain all latent word information in the input sentence and resolve word ambiguity.
(2) Capturing matching features: the two sentences are encoded at character and word granularity, and an attention mechanism captures the hidden information of the text at both granularities. Finally, text features are extracted by pooling and fed into the prediction layer to judge whether the two sentences are similar.
CN202210390694.9A 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method Pending CN114723013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390694.9A CN114723013A (en) 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390694.9A CN114723013A (en) 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method

Publications (1)

Publication Number Publication Date
CN114723013A true CN114723013A (en) 2022-07-08

Family

ID=82243459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390694.9A Pending CN114723013A (en) 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method

Country Status (1)

Country Link
CN (1) CN114723013A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422362A (en) * 2022-10-09 2022-12-02 重庆邮电大学 Text matching method based on artificial intelligence
CN115422362B (en) * 2022-10-09 2023-10-31 郑州数智技术研究院有限公司 Text matching method based on artificial intelligence
CN115858791A (en) * 2023-02-17 2023-03-28 成都信息工程大学 Short text classification method and device, electronic equipment and storage medium
CN115858791B (en) * 2023-02-17 2023-09-15 成都信息工程大学 Short text classification method, device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination