CN114723013A - Multi-granularity knowledge enhanced semantic matching method - Google Patents

Multi-granularity knowledge enhanced semantic matching method

Info

Publication number
CN114723013A
Authority
CN
China
Prior art keywords
word
text
granularity
matching
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210390694.9A
Other languages
Chinese (zh)
Inventor
曹小鹏
王凯丽
杨笑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Posts and Telecommunications
Priority to CN202210390694.9A
Publication of CN114723013A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-granularity knowledge enhanced semantic matching method, which addresses the problems of word ambiguity and improper word segmentation in text matching. The technical scheme mainly comprises the following steps: (1) constructing an embedding model; (2) capturing matching features; (3) calculating text similarity. The method is mainly applied to text semantic matching tasks.

Description

Multi-granularity knowledge enhanced semantic matching method
Technical Field
The invention belongs to the field of computer natural language processing, and particularly relates to a method that performs semantic matching through multi-granularity knowledge enhancement.
Background
Text semantic matching is a fundamental problem and a research hotspot in natural language processing, and is widely applied in real life. For example, in dialogue question answering, semantics are matched between contexts, or a question is compared with candidate answers to select the correct answer; in reading comprehension, the passage is matched against the question to select an answer. Text matching technology therefore plays an important role in natural language processing.
Traditional short text matching mainly matches sentences at the lexical level, generally considering words and sentence patterns; the words are treated independently, context is lacking, and the semantic information of words is largely ignored. Many Chinese words are ambiguous, which makes semantic understanding difficult. Existing interaction models use only a single word vector for interaction and cannot effectively exploit context information between sentences, so the semantic features implicit in the text cannot be fully mined.
In 2013, Huang et al. proposed the Deep Structured Semantic Model (DSSM), one of the earliest applications of deep learning to text matching. Words or sentences are mapped to feature vectors with an MLP; the query and the documents are projected by two deep feed-forward neural networks into equal-length low-dimensional vectors in a latent space, and relevance is measured by cosine similarity. The model reduces dependence on word segmentation and improves generalization ability.
In 2015, Huawei's Noah's Ark Lab adopted a CNN model to address semantic matching, proposing two network architectures, ARC-I and ARC-II; ARC-II fuses the two texts after the first convolution layer. Wang and Jiang proposed a compare-aggregate model for matching text sequences, performing word-level matching and aggregating with a convolutional neural network. Subsequently, Wang et al. proposed the BiMPM model, which matches texts from multiple perspectives and performs well on both paraphrase identification and natural language inference tasks.
In 2016, Pang et al. proposed the MatchPyramid model, which focuses on the relationships between words: dot product, cosine similarity, and other calculations over the words of a sentence pair yield a matching matrix, over which a two-dimensional convolution extracts features. MatchPyramid works well for text matching but lacks the matching information that arises once words are composed into phrases. The long short-term memory network (LSTM) extracts features from long text sequences to obtain global information, overcoming CNN's inability to extract global features. Chen et al. proposed the ESIM model, an enhanced LSTM that performs local inference with an inter-sentence attention mechanism and then composes it into global inference.
In 2018, Google proposed the BERT model, which is pre-trained with masked language modeling (MLM) and next sentence prediction (NSP) and uses stacked bidirectional Transformer blocks to produce deep bidirectional language representations fused with context information. BERT performs well on NLP tasks, but the model is huge, has many network parameters, and is slow to pre-train or fine-tune.
Disclosure of Invention
The invention provides a multi-granularity knowledge enhanced semantic matching method, which mainly comprises the following steps: 1. Constructing the embedding model: the text is embedded at character granularity and word granularity; a Lattice LSTM fuses character-level and word-level information, and the HowNet external knowledge base is introduced to obtain all latent word information in the input sentence and resolve word ambiguity. 2. Capturing matching features: the two sentences are encoded at character and word granularity, and an attention mechanism captures the hidden information of the text at both granularities. 3. Calculating text similarity: maximum pooling and average pooling extract the global features and key features of the text respectively, which are fed into a prediction layer to judge whether the two sentences are similar.
The invention has the following effects: the method was evaluated on the LCQMC and BQ data sets. The best accuracy and F1 value on the LCQMC data set are 86.13% and 86.95% respectively, and on the BQ data set 84.36% and 84.40% respectively, a text matching effect superior to that of traditional models.
Drawings
FIG. 1 Overall model structure diagram
FIG. 2 Encoding structure diagram
Detailed Description
The specific implementation of the invention is divided into three steps: 1. constructing the embedding model; 2. capturing matching features; 3. calculating text similarity. First, the text is embedded at character and word granularity, and the HowNet external knowledge base is introduced. Second, the two sentences are encoded at character and word granularity, and an attention mechanism acquires the hidden information of the text at both granularities. Finally, maximum pooling and average pooling extract the global features and key features of the text respectively, which are fed into the prediction layer to judge whether the two sentences are similar. The structure of the method is shown in FIG. 1:
FIG. 1 Overall model structure diagram
(1) Constructing the embedding model
The text first needs to be preprocessed. Two sentences X = (x_1, x_2, ..., x_m) and Y = (y_1, y_2, ..., y_n) are input. Each sentence is segmented into characters and into words by different segmentation methods and converted into representation vectors, so that each input sentence obtains a multi-granularity representation over its characters and words. Existing short text matching mainly matches sentences at the lexical level, ignoring the semantic information of words and failing to fully account for the ambiguity of Chinese words. For example, the word "apple" has different meanings in different contexts: it may refer to a fruit, an electronic product, a company, and so on. Therefore, to better capture word-level characteristics, the method uses a Lattice LSTM to fuse character-level and word-level information and introduces the HowNet external knowledge base, resolving word ambiguity and obtaining all latent word information in the input sentence.
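The patent does not disclose its segmentation code, so the following Python sketch only illustrates the two granularities: the character-level view of a sentence, plus every latent word span a dictionary match would add to the lattice. The LEXICON below is a toy, hypothetical stand-in for a real dictionary.

```python
# Minimal sketch (not the patented implementation) of multi-granularity
# segmentation: character tokens plus all lexicon-matched word spans.
LEXICON = {"苹果", "手机", "喜欢", "苹果手机"}   # hypothetical toy dictionary

def char_tokens(sentence: str) -> list[str]:
    """Character-granularity view: one token per Chinese character."""
    return list(sentence)

def lattice_words(sentence: str, lexicon: set[str], max_len: int = 4):
    """All (start, end, word) spans whose surface form appears in the lexicon;
    these are the latent words a Lattice LSTM can attach to the char sequence."""
    spans = []
    n = len(sentence)
    for s in range(n):
        for e in range(s + 1, min(n, s + max_len) + 1):
            if e - s > 1 and sentence[s:e] in lexicon:
                spans.append((s, e - 1, sentence[s:e]))
    return spans

sentence = "我喜欢苹果手机"
print(char_tokens(sentence))             # ['我', '喜', '欢', '苹', '果', '手', '机']
print(lattice_words(sentence, LEXICON))  # [(1, 2, '喜欢'), (3, 4, '苹果'), (3, 6, '苹果手机'), (5, 6, '手机')]
```

Note how "苹果手机" contributes both the nested words and the full compound, which is exactly the ambiguity the lattice is meant to preserve.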
The Lattice LSTM can use both character and word information; its input consists of a character sequence and a word sequence. Assume the input of the Lattice LSTM model is a character sequence c_1, c_2, ..., c_n together with the word vectors of all words in a lexicon D matched within that sequence. Given the input sentence, a word w_{s,e} matched in D is embedded as:

x^w_{s,e} = e^w(w_{s,e})

where e^w denotes the word-embedding lookup table and s, e refer to the beginning and end character positions of the word.
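A minimal sketch of this lookup, assuming PyTorch's nn.Embedding plays the role of the table e^w; the vocabulary and embedding size are illustrative, not taken from the patent.

```python
# Sketch of x^w_{s,e} = e^w(w_{s,e}): map each matched word to its vector.
import torch
import torch.nn as nn

word2id = {"喜欢": 0, "苹果": 1, "苹果手机": 2, "手机": 3}       # hypothetical vocab
e_w = nn.Embedding(num_embeddings=len(word2id), embedding_dim=50)  # lookup table e^w

matched = [(1, 2, "喜欢"), (3, 4, "苹果")]                         # (s, e, w_{s,e}) spans
ids = torch.tensor([word2id[w] for (_, _, w) in matched])
x_w = e_w(ids)                       # one embedding per matched word w_{s,e}
print(x_w.shape)                     # torch.Size([2, 50])
```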
Given a word w_{s,e}, its h-th sense (as enumerated by HowNet) is represented as x^{w,h}_{s,e}, and the h-th sense is processed by the word-level cell update to produce c^{w,h}_{s,e}, the memory cell of the h-th sense of w_{s,e}. All senses are then combined into a single word-level cell, denoted c^w_{s,e}; merging the senses of an ambiguous word into c^w_{s,e} in this way captures the word's semantic information more fully. The recurrent paths of all words ending at character e then flow into the character cell c^c_e, and finally the hidden state h^c_e is computed from it.
(2) Capturing matching features
The method uses a GRU and a BiGRU to encode the two sentences at character granularity and word granularity respectively, performing deep feature extraction on the input character vectors and word vectors. Sentence X is represented by its character-level hidden states h^c_1, ..., h^c_Q and its word-level hidden states h^w_1, ..., h^w_P, where h^c_q is the hidden state the encoding module generates for the q-th character and h^w_p is the hidden state it generates for the p-th word.
The GRU layer is the first layer of the encoder; the embedding-layer output and the GRU output are combined and passed to the BiGRU layer, and finally the outputs of the GRU and BiGRU layers are combined into the final representation. During matching, to obtain information between the different granularities within a sentence, an attention mechanism computes the similarity a_{q,p} of each character-granularity and word-granularity hidden-state pair (h^c_q, h^w_p), normalized with a softmax. These attention weights over the sentence's granularities yield the final characterization of sentence X as the attention-weighted combination of its character-level and word-level hidden states.
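A hedged PyTorch sketch of this encoder and the cross-granularity attention; the exact wiring, the dot-product similarity, and the layer sizes are plausible assumptions, since the patent shows the attention formula only as an image.

```python
# Sketch: GRU -> BiGRU encoder, then attention between char and word states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GranularityEncoder(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.bigru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(dim + 2 * dim, dim)   # combine GRU and BiGRU outputs

    def forward(self, emb):                           # emb: (B, T, dim)
        g, _ = self.gru(emb)                          # (B, T, dim)
        b, _ = self.bigru(g)                          # (B, T, 2*dim)
        return self.proj(torch.cat([g, b], dim=-1))   # (B, T, dim)

def cross_granularity_attention(h_char, h_word):
    # h_char: (B, Q, dim) character states; h_word: (B, P, dim) word states
    scores = torch.bmm(h_char, h_word.transpose(1, 2))   # (B, Q, P) similarities
    a = F.softmax(scores, dim=-1)                        # attention weights a_{q,p}
    return torch.bmm(a, h_word)                          # char states enriched with word info

enc = GranularityEncoder(16)
h_c = enc(torch.randn(2, 7, 16))       # character-granularity encoding
h_w = enc(torch.randn(2, 4, 16))       # word-granularity encoding
print(cross_granularity_attention(h_c, h_w).shape)   # torch.Size([2, 7, 16])
```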
the coding structure is shown in fig. 2:
FIG. 2 is a diagram of a coding structure
(3) Calculating text similarity
To calculate text similarity, the model extracts the global features and key features of each text with maximum pooling and average pooling respectively, and concatenates the two vectors into an output vector. At the prediction layer, the method aggregates the feature representations of the two sentences X and Y in several ways and uses a softmax activation to compute the similarity probability of the two texts, judging whether the sentences are similar:

g = H([g_x, g_y, g_x ⊙ g_y, |g_x - g_y|])

where H(·) is a feed-forward neural network with two hidden layers and g_x, g_y are the sentence vectors.
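A minimal sketch of this pooling and aggregation step, following the stated formula with H as a two-hidden-layer feed-forward network; the hidden sizes and the two-class softmax output are illustrative choices, not disclosed values.

```python
# Sketch of the prediction layer: pool, build [gx, gy, gx*gy, |gx-gy|], apply H.
import torch
import torch.nn as nn

def pool(h):                                   # h: (B, T, dim)
    # global (max) and key (mean) features, concatenated
    return torch.cat([h.max(dim=1).values, h.mean(dim=1)], dim=-1)   # (B, 2*dim)

class Predictor(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        d = 2 * dim                            # pooled vector size
        self.H = nn.Sequential(                # feed-forward net, two hidden layers
            nn.Linear(4 * d, d), nn.ReLU(),
            nn.Linear(d, d), nn.ReLU(),
            nn.Linear(d, 2),                   # similar / not similar
        )

    def forward(self, hx, hy):
        gx, gy = pool(hx), pool(hy)
        feats = torch.cat([gx, gy, gx * gy, (gx - gy).abs()], dim=-1)
        return torch.softmax(self.H(feats), dim=-1)   # similarity probabilities

p = Predictor(16)
print(p(torch.randn(2, 7, 32), torch.randn(2, 5, 32)).shape)   # torch.Size([2, 2])
```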
Finally, given N training samples {(X_i, Y_i, v_i)}_{i=1..N}, the binary cross-entropy function is taken as the loss:

L = -(1/N) Σ_{i=1}^{N} [ v_i log(g_i) + (1 - v_i) log(1 - g_i) ]

where v_i ∈ {0, 1} is the label of the i-th training sample and g_i ∈ (0, 1) is the model's predicted probability.
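This loss reduces to PyTorch's built-in binary cross entropy; the tensors below are toy values, not training data.

```python
# The BCE loss above, computed directly on predicted probabilities.
import torch
import torch.nn.functional as F

labels = torch.tensor([1.0, 0.0, 1.0])          # v_i in {0, 1}
probs = torch.tensor([0.9, 0.2, 0.6])           # g_i in (0, 1)
loss = F.binary_cross_entropy(probs, labels)    # -(1/N) Σ v_i·log g_i + (1-v_i)·log(1-g_i)
print(loss.item())
```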
The experimental hardware environment is an Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz with 16 GB of memory; the software environment is Windows 10. The invention is evaluated on the LCQMC and BQ data sets, demonstrating its superiority over other methods.
Embodiment 1: semantic matching
Semantic matching determines whether two texts express the same meaning; the output may be a text-similarity score or directly a 0/1 label. In this example, the 0/1 label is used. The evaluation metrics are Accuracy (ACC) and F1-Score, calculated as follows:
ACC = (TP + TN) / (TP + TN + FP + FN)
P = TP / (TP + FP)
R = TP / (TP + FN)
F1-Score = 2PR / (P + R)
where ACC is the proportion of correctly classified examples; TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives; P and R are precision and recall, and F1-Score is their harmonic mean. A small sanity check of these metrics is sketched below; the results of the experiment are shown in Table 1.
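A quick, self-contained check of the ACC and F1 definitions above; the label arrays are toy data, not the LCQMC/BQ results.

```python
# Compute ACC and F1 from raw 0/1 predictions, matching the formulas above.
def acc_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

print(acc_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))   # (0.6, 0.666...)
```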
Table 1: results of the experiment
[Table 1 appears only as an image in the original; it reports accuracy and F1 on the LCQMC and BQ data sets for the proposed method (86.13%/86.95% and 84.36%/84.40%, per the text) against Lattice-CNN, BiLSTM, BiMPM and ESIM.]
As can be seen from Table 1, the invention outperforms the other models on both the LCQMC and BQ data sets. Its results surpass Lattice-CNN: although Lattice-CNN uses a word-lattice graph, its structure is limited and attends only to partial information of the sentences, so their semantic information remains incomplete. The results are also clearly better than BiLSTM, BiMPM and ESIM: BiLSTM can capture long-distance semantic dependencies in both directions, and BiMPM and ESIM can match from multiple perspectives, but all three start only from characters or only from words, so their feature extraction is insufficient, whereas the invention embeds both character and word granularity, acquires semantic information from an external knowledge base, and encodes the sentences accordingly. This analysis shows that the method outperforms the others on the LCQMC and BQ data sets, indicating that multi-granularity information and external knowledge play an important role in text matching research.
In summary, the invention captures text semantic information at character granularity and word granularity, combining multi-granularity text representation with an external knowledge base. Experiments show that capturing text matching features at multiple granularities in combination with external knowledge outperforms neural networks that extract text information at a single granularity.
The above examples merely illustrate the invention and should not be construed as limiting its scope; any design similar or equivalent to the present invention is intended to fall within the scope of the claims.

Claims (1)

1. A multi-granularity knowledge enhanced semantic matching method, characterized by comprising the following steps:
(1) Constructing an embedding model: the text is embedded at character granularity and word granularity; a Lattice LSTM fuses character-level and word-level information, and the HowNet external knowledge base is introduced to obtain all latent word information in the input sentence and resolve word ambiguity.
(2) Capturing matching features: the two sentences are encoded at character and word granularity, and an attention mechanism captures the hidden information of the text at both granularities. Finally, text features are extracted by pooling and fed into the prediction layer to judge whether the two sentences are similar.
CN202210390694.9A 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method Pending CN114723013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390694.9A CN114723013A (en) 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390694.9A CN114723013A (en) 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method

Publications (1)

Publication Number Publication Date
CN114723013A true CN114723013A (en) 2022-07-08

Family

ID=82243459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390694.9A Pending CN114723013A (en) 2022-04-14 2022-04-14 Multi-granularity knowledge enhanced semantic matching method

Country Status (1)

Country Link
CN (1) CN114723013A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422362A (en) * 2022-10-09 2022-12-02 重庆邮电大学 Text matching method based on artificial intelligence
CN115422362B (en) * 2022-10-09 2023-10-31 郑州数智技术研究院有限公司 Text matching method based on artificial intelligence
CN115858791A (en) * 2023-02-17 2023-03-28 成都信息工程大学 Short text classification method and device, electronic equipment and storage medium
CN115858791B (en) * 2023-02-17 2023-09-15 成都信息工程大学 Short text classification method, device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination