CN110765755A - Semantic similarity feature extraction method based on double selection gates - Google Patents
Semantic similarity feature extraction method based on double selection gates

Info

- Publication number: CN110765755A
- Application number: CN201911032492.1A
- Authority: CN (China)
- Prior art keywords: sentence, vector, matching, context information, ith
- Prior art date: 2019-10-28
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/22: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06N3/044: Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Recurrent networks, e.g. Hopfield networks
- G06N3/045: Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
Abstract
The invention discloses a semantic similarity feature extraction method based on double selection gates, and relates to the field of natural language processing. The method effectively alleviates the low matching efficiency caused by information redundancy, while avoiding the cost of manually extracting core information.
Description
Technical Field
The invention relates to the field of natural language processing, and in particular to a semantic similarity feature extraction method based on double selection gates.
Background
The world is full of massive information, most of which is stored in the form of text, and an important task of artificial intelligence is to organize text information into a representation that a computer can understand as a human does. Because many words in a language have multiple meanings, the same concept can be expressed in different ways, and other sources of uncertainty exist, traditional text similarity calculation methods based on string matching, as used in search engines, question answering systems and the like, can hardly meet user needs. When a user inputs keywords to retrieve matching information, much of the returned content may not correspond to the query, and only a little may match the searched keywords, which brings great inconvenience to the user. Computing text similarity through deeper semantic understanding has therefore become a hotspot of current natural language research.
Many sentence semantic similarity matching methods exist in the prior art, and most initially focused on string matching. The basic flow generally has two steps: first, the two sentences whose similarity is to be judged are input into a recurrent network and mapped into vector representations; then the similarity of the two sentences is judged from the cosine distance between the two sentence vectors. Although judging the similarity of sentence pairs by the traditional string method helps people, to a certain extent, to filter out some irrelevant information when searching for related questions, the quality of the search results remains unsatisfactory. Because string-based similarity only computes distances between words at the word level, without contextual semantic information, information is mismatched and ambiguous, and the user ultimately cannot quickly find the information related to the keywords.
Therefore, it is necessary to invent a new semantic similarity feature extraction method.
Disclosure of Invention
The invention aims to provide a semantic similarity feature extraction method based on double selection gates, which can automatically judge the semantic similarity of two sentences, effectively reduce redundant information in the sentences through two rounds of automatic selection of core information, and improve the accuracy and efficiency of sentence similarity judgment.
The technical scheme is as follows:
S100, performing word segmentation on the sentence pair P and Q to be processed, and vectorizing the segmented words to obtain word vectors;
S200, inputting all word vectors of the sentence pair P and Q obtained in step S100 into a first recurrent neural network in sequence to obtain context information vectors, wherein the last context information vector of a sentence represents the sentence vector of that sentence;
S300, inputting the sentence vectors of the sentence pair P and Q into a primary selection gate to obtain core information features;
S400, inputting the core information obtained in step S300 into a secondary selection gate to acquire the core information features again;
S500, inputting the core information acquired in step S400 into a multi-angle semantic matching network, which comprises four modes, namely full matching, maximum pooling matching, attention matching and maximum attention matching, to obtain the feature matching vectors of the sentence pair;
S600, fusing the feature matching vectors obtained in step S500 into a fixed-length vector through a second neural network, and inputting the vector into a prediction layer to calculate the similarity probability distribution of the sentence pair.
Preferably, the first recurrent neural network is configured to generate a state vector of context information.
Preferably, the first layer of the first recurrent neural network is a unidirectional long short-term memory (LSTM) network, the second layer is a bidirectional LSTM network, and each layer comprises a plurality of connected LSTM cell modules.
Preferably, the first recurrent neural network comprises two hierarchies;
a first layer of the first recurrent neural network is used to generate word-level vectors;
a second layer of the first recurrent neural network is used to generate a context information vector.
Preferably, the primary selection gate and the secondary selection gate comprise a plurality of primary selection gate units and secondary selection gate units, respectively;
the primary selection gate and the secondary selection gate differ in structure and in parameters.
Preferably, in step S200, all word vectors of the sentence pair obtained in step S100 are input into the first recurrent network in sequence, so as to obtain the sentence state vector after each word is input, specifically:
the ith word vector and the output at the (i-1)th moment are input into the ith LSTM cell module, which processes them to obtain the state vector of the sentence after the ith word is input.
Preferably, in step S300, inputting the sentence vectors of the sentence pair into the primary selection gate to acquire the core information features comprises:
inputting the context information vector at each moment of the sentence P (and of the sentence Q) together with the sentence vector into the ith primary selection gate unit, which processes them to obtain the core information.
Preferably, in step S400, inputting the core information obtained in step S300 into the secondary selection gate and acquiring the core information features again comprises:
inputting the core information processed by the ith primary selection gate unit into the ith secondary selection gate unit, which processes it to obtain the core information features.
Preferably, in step S500, the step of inputting the core information acquired in step S400 into a multi-angle semantic matching network to obtain a feature matching vector includes:
in full matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector;
in maximum pooling matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q, and the maximum value is selected as the feature matching vector;
in attention matching, cosine values are calculated between the context information vector at the ith moment of the sentence P and the context information vector at each moment of the sentence Q; the cosine values are normalized into attention weights and multiplied by the context information vectors of the sentence Q at each moment, and the weighted result is then matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector;
in maximum attention matching, the cosine values are calculated in the same way, the maximum value among them is taken as the attention weight and multiplied by the context information of the sentence Q, and the result is matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector.
Preferably, the second neural network comprises two bidirectional long short-term memory networks, and is used for processing the feature matching vectors of the sentence pair and aggregating them into a fixed-length vector.
Preferably, in step S600, fusing the matching vectors obtained in step S500 into a fixed-length vector through the second neural network and inputting the vector into the prediction layer to calculate the similarity probability distribution of the sentence pair comprises:
aggregating the four feature matching vectors obtained by the four matchings of the sentence P into a fixed-length feature matching vector through the second recurrent neural network;
aggregating the four feature matching vectors obtained by the four matchings of the sentence Q into a fixed-length feature matching vector through the bidirectional long short-term memory network;
and inputting the two feature matching vectors of the sentence P and the sentence Q into a prediction layer to obtain the sentence pair similarity.
Preferably, in step S100 Word2Vec is adopted to vectorize the words after Jieba word segmentation. Word2Vec is a predictive model that can efficiently learn word embeddings; its basic idea is to represent each word in natural language as a short, dense vector of a unified dimension.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. According to the semantic similarity feature extraction method based on double selection gates, the core information in the sentences is acquired automatically without manually removing redundant information, and the semantic similarity of two sentences can be judged automatically by the semantic similarity model with higher accuracy and efficiency, helping users find better-matched results in question-answering or search systems.
2. The semantic similarity feature extraction method based on double selection gates uses a bidirectional long short-term memory network to produce the context information vector representation of a sentence. The cell state of this network can capture long-distance dependencies in the text, remember long-term state, and realize the updating, forgetting and filtering of information, so it expresses context better and alleviates the gradient vanishing and gradient explosion problems of the network. A conventional RNN connects the past output with the current input and controls the output through an activation function, so it can only take the most recent state into account.
3. According to the semantic similarity feature extraction method based on the double selection gates, the core semantic information in the sentence is automatically acquired by utilizing the two selection gates, so that the influence of redundant information on the judgment of the semantic similarity of the sentence is avoided, and the matching efficiency is improved.
4. The semantic similarity feature extraction method based on double selection gates uses the multi-angle semantic matching network to perform four matching modes on the two sentences: full matching, maximum pooling matching, attention matching and maximum attention matching. These four modes make full use of the context information vectors for finer multi-angle matching, effectively avoiding the low accuracy of the traditional method that judges similarity only by the cosine distance between the words of two sentences. A bidirectional long short-term memory network is adopted to fuse the matching vectors into fixed-length vectors, which effectively controls the dimensionality of the matching vectors and facilitates the sentence-pair similarity calculation of the prediction layer.
5. The semantic similarity feature extraction method based on double selection gates can effectively improve the accuracy and efficiency of sentence semantic similarity judgment, and is applicable to both Chinese and English sentence-pair corpora.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
FIG. 2 is a block diagram of a dual select gate module according to an embodiment of the present invention.
FIG. 3 is a diagram of a multi-angle semantic matching network structure according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. Of course, the specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example 1
Referring to fig. 1, the present invention provides a semantic similarity feature extraction method based on double selection gates, including:
S100, performing word segmentation on the sentence pair P and Q to be processed, and vectorizing the segmented words to obtain word vectors.
The word segmentation in step S100 is the process of cutting a sentence into a reasonable word sequence that conforms to the contextual meaning. It is one of the key technologies and difficulties in natural language understanding and text information processing, and an important processing step in the semantic similarity model. Chinese word segmentation is complex because there is no explicit delimiter between words, and words are used flexibly, vary widely, are semantically rich and easily produce ambiguity. Research shows that the main difficulties of statistics-based Chinese text segmentation are ambiguity resolution, proper nouns and new-word discovery. The invention adopts Jieba to segment Chinese text and Nltk to segment English text, thereby improving segmentation accuracy.
Models for vectorizing words include the One-hot model and the Distributed model. The One-hot model is simple, but its dimensionality cannot be controlled and it cannot represent relations between words well, so the method adopts the Distributed model, specifically Word2Vec, to vectorize the words.
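As a minimal sketch of this step (assuming the jieba tokenizer and the gensim implementation of Word2Vec; the toy corpus and all training parameters below are illustrative assumptions, not values specified by the patent):

```python
import jieba                                 # Chinese segmentation; Nltk would be used for English text
from gensim.models import Word2Vec

sentence_pair = ["今天天气很好", "今天天气不错"]           # toy sentences P and Q
tokenized = [jieba.lcut(s) for s in sentence_pair]       # e.g. ['今天', '天气', '很', '好']

# Train a small Word2Vec (Distributed) model; vector_size, window, min_count are illustrative
model = Word2Vec(tokenized, vector_size=100, window=5, min_count=1, sg=1)

# Look up the word vector of every segmented word (the output of step S100)
word_vectors = [[model.wv[w] for w in sent] for sent in tokenized]
```

In practice the Word2Vec model would be trained on a large corpus rather than on the sentence pair itself.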
S200, all word vectors of the sentence pairs P and Q obtained in the step S100 are input into a first recurrent neural network in sequence to obtain context information vectors, wherein the last context information vector of the sentence represents a sentence vector of the sentence;
the first recurrent neural network is used for generating a state vector of the context information; the first recurrent neural network comprises two hierarchical structures, wherein the first layer is a single long-term and short-term memory network and is used for generating word-level vectors; the second layer is a bidirectional long-time and short-time memory network and is used for generating context information vectors; each hierarchy comprising a plurality of linked LSTM cell modules; the module parameters at different hierarchies are different in order to generate the word level and context information vectors.
All word vectors of the sentence pair obtained in step S100 are input into the first recurrent network in sequence, so as to obtain the sentence state vector after each word is input, specifically:
the ith word vector and the output at the (i-1)th moment are input into the ith LSTM cell module, which processes them to obtain the state vector of the sentence after the ith word is input.
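To make this recurrence concrete, the following is a minimal sketch of the per-step computation (PyTorch and all sizes are illustrative assumptions; the patent does not name a framework):

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 100, 128                  # illustrative sizes
cell = nn.LSTMCell(embed_dim, hidden_dim)

word_vecs = torch.randn(9, embed_dim)             # word vectors of one sentence from step S100
h = torch.zeros(1, hidden_dim)                    # output at "moment 0"
c = torch.zeros(1, hidden_dim)                    # initial cell state

states = []
for i in range(word_vecs.size(0)):
    # the ith word vector and the (i-1)th output enter the ith LSTM cell module
    h, c = cell(word_vecs[i].unsqueeze(0), (h, c))
    states.append(h)                              # state vector of the sentence after the ith word

sentence_vector = states[-1]                      # the last state h_n represents the sentence vector
```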
S300, inputting sentence vectors of the sentence pairs P and Q into a primary selection gate to obtain core information characteristics;
specifically, a context information vector at each moment of the sentence P and an ith sentence vector of the sentence Q are input into a first-level selection gate unit, and the core information is obtained through the processing of the ith first-level selection gate unit.
S400, inputting the core information obtained in the step S300 into a secondary selection gate, and acquiring the core information characteristics again; specifically, the core information obtained by processing of the ith primary selection gate unit is input into the ith secondary selection gate unit, and the core information characteristics are obtained by processing of the ith secondary selection gate unit.
The primary selection gate and the secondary selection gate comprise a plurality of primary selection gate units and secondary selection gate units, respectively;
the primary selection gate and the secondary selection gate differ in structure and in parameters.
S500, inputting the core information acquired in step S400 into the multi-angle semantic matching network, which comprises four modes, namely full matching, maximum pooling matching, attention matching and maximum attention matching, to obtain the feature matching vectors of the sentence pair. Specifically (an illustrative sketch of the four matching operations is given after the list):
in full matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector;
in maximum pooling matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q, and the maximum value is selected as the feature matching vector;
in attention matching, cosine values are calculated between the context information vector at the ith moment of the sentence P and the context information vector at each moment of the sentence Q; the cosine values are normalized into attention weights and multiplied by the context information vectors of the sentence Q at each moment, and the weighted result is then matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector;
in maximum attention matching, the cosine values are calculated in the same way, the maximum value among them is taken as the attention weight and multiplied by the context information of the sentence Q, and the result is matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector.
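A minimal sketch of the four matching operations, assuming the context information vectors of P and Q are already available as matrices (PyTorch, single-perspective cosine; any multi-perspective weighting the full model may apply is omitted):

```python
import torch
import torch.nn.functional as F

def cosine(a, b):
    # cosine similarity between every row of a and every row of b
    return F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).t()

def multi_angle_match(Hp, Hq):
    # Hp: (n, d) context vectors of sentence P; Hq: (m, d) context vectors of sentence Q
    full = cosine(Hp, Hq[-1:])                                   # full matching: each moment of P vs Q's sentence vector
    maxpool = cosine(Hp, Hq).max(dim=1, keepdim=True).values     # maximum pooling matching
    sim = cosine(Hp, Hq)                                         # (n, m) cosine values
    weights = F.softmax(sim, dim=1)                              # attention matching: normalized cosine weights
    q_weighted = weights @ Hq                                    # weighted sum of Q's context vectors
    attentive = F.cosine_similarity(Hp, q_weighted, dim=-1).unsqueeze(1)
    best = sim.argmax(dim=1)                                     # maximum attention matching: best moment of Q
    max_attentive = F.cosine_similarity(Hp, Hq[best], dim=-1).unsqueeze(1)
    return torch.cat([full, maxpool, attentive, max_attentive], dim=1)   # (n, 4)

Hp, Hq = torch.randn(7, 256), torch.randn(9, 256)
match_p = multi_angle_match(Hp, Hq)    # feature matching vectors of sentence P
match_q = multi_angle_match(Hq, Hp)    # and symmetrically of sentence Q
```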
The second neural network comprises two bidirectional long short-term memory networks, and is used for processing the feature matching vectors of the sentence pair and aggregating them into a fixed-length vector.
S600, passing the matching vectors obtained in step S500 through the second neural network, so that the feature matching vectors are fused into a fixed-length vector, which is input into the prediction layer to calculate the similarity probability distribution of the sentence pair. Specifically,
the four feature matching vectors obtained by the four matchings of the sentence P are aggregated into a fixed-length feature matching vector through the second recurrent neural network;
the four feature matching vectors obtained by the four matchings of the sentence Q are aggregated into a fixed-length feature matching vector through the bidirectional long short-term memory network;
and the two feature matching vectors of the sentence P and the sentence Q are input into the prediction layer to obtain the sentence-pair similarity.
In step S100, Word2Vec is used to vectorize the words after Jieba word segmentation.
Example 2
On the basis of embodiment 1, the first recurrent neural network consists of one layer of unidirectional LSTM network and one layer of bidirectional LSTM network; each layer comprises a plurality of connected LSTM cell modules, which process the current input and the output of the previous moment through an input gate, a forget gate, an update gate and an output gate. The first layer of the first recurrent neural network comprises a plurality of connected unidirectional LSTM cell modules for deriving the state vector of each word. The second layer comprises a plurality of connected bidirectional LSTM cell modules for deriving the context information vectors of the sentences.
In the method, the words and context information of a sentence are first modeled by the first recurrent neural network to obtain the state vector of each word at the corresponding moment and the context information vector of the sentence at each moment. As shown in fig. 2, in step S200 the first recurrent neural network uses the long short-term memory (LSTM) network, whose calculation formulas are as follows:
f_t = σ(W_f·w_t + U_f·h_{t-1} + b_f);
i_t = σ(W_i·w_t + U_i·h_{t-1} + b_i);
o_t = σ(W_o·w_t + U_o·h_{t-1} + b_o);
c̃_t = tanh(W_c·w_t + U_c·h_{t-1} + b_c);
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t;
h_t = o_t ⊙ tanh(c_t);
In the above formulas, f_t is the output of the forget gate; i_t is the output of the input gate; o_t is the output of the output gate; W_f, W_i, W_o, W_c and b_f, b_i, b_o, b_c are the weight matrices and bias vectors of the forget gate, input gate, output gate and candidate memory; c̃_t is the new memory information; c_t is the updated memory content of the LSTM cell; σ is the sigmoid function; ⊙ is the element-wise product; h_{t-1} is the hidden-layer output at moment t-1; and w_t is the input at moment t.
In the method of the invention, because the context of the sentence is modeled by the recurrent neural network, the state vector of the sentence after the word input at moment t theoretically contains the information of all the preceding words; that is, the sentence state vector h_n obtained after the last word is input contains all the information of the whole sentence. Therefore h_n represents the state vector of the entire sentence, i.e., the sentence vector.
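A minimal sketch of this first recurrent neural network, a unidirectional word-level LSTM feeding a bidirectional context-level LSTM (PyTorch and all layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """First recurrent neural network: word-level LSTM + context-level BiLSTM."""
    def __init__(self, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.word_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)     # layer 1: word-level vectors
        self.ctx_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True,
                                bidirectional=True)                           # layer 2: context vectors

    def forward(self, word_vecs):                    # word_vecs: (batch, seq_len, embed_dim)
        word_states, _ = self.word_lstm(word_vecs)   # state vector after each word
        ctx, _ = self.ctx_lstm(word_states)          # context information vector at each moment
        sentence_vec = ctx[:, -1, :]                 # last context vector = sentence vector h_n
        return ctx, sentence_vec

encoder = ContextEncoder()
ctx_p, s_p = encoder(torch.randn(1, 7, 100))         # sentence P with 7 words
ctx_q, s_q = encoder(torch.randn(1, 9, 100))         # sentence Q with 9 words
```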
Example 3
On the basis of embodiment 1 or 2, the double selection gate comprises two selection gate structures, which differ in structure and in parameters. Passing through different selection gates helps filter out redundant information in the sentences and acquire the core information more accurately. The calculation formulas of the first-layer selection gate are as follows:
s = h_n;
sGate_i = σ(W_s·h_i + U_s·s + b);
h'_i = h_i ⊙ sGate_i;
In the above formulas, the sentence vector is constructed from the context hidden vectors of the sentence: the hidden state h_n is taken as the sentence vector s; sGate_i is the gate vector; W_s and U_s are weight matrices; b is a bias vector; σ is the sigmoid activation function; and ⊙ is the element-wise product, which yields the gated context vector h'_i.
The second-layer selection gate calculates the context vector at moment t, using the sentence vector of the previous moment s_{t-1} and the gated hidden state h'_i of the first selection gate to compute the selection-gate weights, which are finally normalized. The calculation formulas are as follows:
e_{t,i} = v_a^T·tanh(W_a·s_{t-1} + U_a·h'_i);
a_{t,i} = exp(e_{t,i}) / Σ_j exp(e_{t,j});
In the formulas, h'_i is the context hidden vector; v_a, W_a and U_a are weight parameters; a_{t,i} is the normalized selection-gate weight, from which the core feature vector of the k-th sentence is obtained, k = 1, 2, where 2 is the number of sentences in the text.
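A minimal sketch of the two selection-gate layers under the formulas above, in the selective-encoding style of the cited Zhou et al. reference; the tensor shapes, the use of the final sentence vector in place of s_{t-1}, and the weighting of the gated vectors are assumptions where the patent text is ambiguous:

```python
import torch
import torch.nn as nn

class DoubleSelectionGate(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # first-level gate: sGate_i = sigmoid(W_s h_i + U_s s + b)
        self.W_s = nn.Linear(dim, dim, bias=True)
        self.U_s = nn.Linear(dim, dim, bias=False)
        # second-level gate: e_i = v_a^T tanh(W_a s + U_a h'_i)
        self.W_a = nn.Linear(dim, dim, bias=False)
        self.U_a = nn.Linear(dim, dim, bias=False)
        self.v_a = nn.Linear(dim, 1, bias=False)

    def forward(self, H):                                   # H: (seq_len, dim) context vectors
        s = H[-1]                                           # sentence vector s = h_n
        gate = torch.sigmoid(self.W_s(H) + self.U_s(s))     # first selection gate vector
        H1 = H * gate                                       # h'_i = h_i ⊙ sGate_i
        e = self.v_a(torch.tanh(self.W_a(s) + self.U_a(H1)))
        a = torch.softmax(e, dim=0)                         # normalized selection-gate weights
        return a * H1                                       # re-weighted core information features

gate = DoubleSelectionGate()
core_p = gate(torch.randn(7, 256))    # core information features of sentence P
core_q = gate(torch.randn(9, 256))    # core information features of sentence Q
```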
Referring to fig. 2, P = [p_1, p_2, ..., p_i, ..., p_n] and Q = [q_1, q_2, ..., q_i, ..., q_m] denote the input sentence pair sequences. The model inputs one word at a time; through step S200 the context information vector representation of each moment is obtained, giving the context hidden vector matrix H^P = [h^P_1, h^P_2, ..., h^P_n] of the sentence P and the context vector matrix H^Q = [h^Q_1, h^Q_2, ..., h^Q_m] of the sentence Q. The core information is then obtained through the two selection gate layers in steps S300 and S400, yielding the core feature representation P' = [p'_1, p'_2, ..., p'_n] of the sentence P and, by analogy, Q' = [q'_1, q'_2, ..., q'_m] of the sentence Q.
The method of the invention obtains the context information vectors of the sentences through the recurrent neural network, which strengthens the contextual semantic relevance between the two sentences and allows their semantic similarity to be judged better.
As shown in fig. 3, the second recurrent neural network is a bidirectional LSTM network comprising a plurality of connected bidirectional LSTM cell modules. To turn the feature matching vectors generated by the multi-angle matching network into a fixed-length vector for the prediction layer, the matching vectors are input into the bidirectional LSTM network and fused into a fixed-length vector.
To obtain the similarity judgment of the two sentences, the second recurrent neural network is used: the four feature matching vectors of the sentence P are input into it and fused into a fixed-length vector, the four feature matching vectors of the sentence Q are processed in the same way, and the two resulting fixed-length matching vectors are input into the prediction layer to obtain the sentence-pair similarity probability distribution.
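A minimal sketch of this aggregation and prediction stage (layer sizes, the use of the final hidden state, and the two-class softmax output are illustrative assumptions; the method uses one bidirectional LSTM per sentence, for which a shared module stands in here):

```python
import torch
import torch.nn as nn

class AggregatePredict(nn.Module):
    def __init__(self, match_dim=4, hidden_dim=64, num_classes=2):
        super().__init__()
        # second recurrent neural network: bidirectional LSTM over the matching vectors
        self.agg = nn.LSTM(match_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.predict = nn.Linear(4 * hidden_dim, num_classes)   # fused vectors of P and Q concatenated

    def fuse(self, match_seq):                 # match_seq: (1, seq_len, match_dim)
        out, _ = self.agg(match_seq)
        return out[:, -1, :]                   # fixed-length fused vector

    def forward(self, match_p, match_q):
        fused = torch.cat([self.fuse(match_p), self.fuse(match_q)], dim=-1)
        # prediction layer: similarity probability distribution of the sentence pair
        return torch.softmax(self.predict(fused), dim=-1)

model = AggregatePredict()
probs = model(torch.randn(1, 7, 4), torch.randn(1, 9, 4))
```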
Besides using the context information between sentences, the method of the invention automatically extracts the core information features from the sentences as the input of the matching network, which improves the matching accuracy, reduces the matching network's processing of redundant information, and improves the matching efficiency. For words with the same meaning but different surface forms, such as the two Chinese terms for 'computer', the model can still judge similarity: it considers not only the distance between the words but also the context information of the sentences in which they appear.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A semantic similarity feature extraction method based on double selection gates, characterized by comprising the following steps:
S100, performing word segmentation on the sentence pair P and Q to be processed, and vectorizing the segmented words to obtain word vectors;
S200, inputting all word vectors of the sentence pair P and Q obtained in step S100 into a first recurrent neural network in sequence to obtain context information vectors, wherein the last context information vector of a sentence represents the sentence vector of that sentence;
S300, inputting the sentence vectors of the sentence pair P and Q into a primary selection gate to obtain core information features;
S400, inputting the core information obtained in step S300 into a secondary selection gate to acquire the core information features again;
S500, inputting the core information acquired in step S400 into a multi-angle semantic matching network, which comprises four modes, namely full matching, maximum pooling matching, attention matching and maximum attention matching, to obtain the feature matching vectors of the sentence pair;
S600, fusing the feature matching vectors obtained in step S500 into a fixed-length vector through a second neural network, and inputting the vector into a prediction layer to calculate the similarity probability distribution of the sentence pair.
2. The semantic similarity feature extraction method based on double selection gates according to claim 1, wherein the first recurrent neural network is used for generating the state vectors of the context information.
3. The semantic similarity feature extraction method based on double selection gates according to claim 1, wherein the first layer of the first recurrent neural network is a unidirectional long short-term memory network, the second layer is a bidirectional long short-term memory network, and each layer comprises a plurality of connected LSTM cell modules.
4. The semantic similarity feature extraction method based on double selection gates according to claim 3, wherein
the first recurrent neural network comprises two hierarchies;
a first layer of the first recurrent neural network is used to generate word-level vectors;
a second layer of the first recurrent neural network is used to generate a context information vector.
5. The semantic similarity feature extraction method based on double selection gates according to claim 1, wherein the primary selection gate and the secondary selection gate comprise a plurality of primary selection gate units and secondary selection gate units, respectively;
6. The semantic similarity feature extraction method based on double selection gates according to claim 3, wherein
in step S200, all word vectors of the sentence pair obtained in step S100 are input into the first recurrent network in sequence, so as to obtain the sentence state vector after each word is input, specifically:
the ith word vector and the output at the (i-1)th moment are input into the ith LSTM cell module, which processes them to obtain the state vector of the sentence after the ith word is input.
7. The semantic similarity feature extraction method based on double selection gates according to claim 5, wherein
in step S300, inputting the sentence vectors of the sentence pair into the primary selection gate to acquire the core information features comprises:
inputting the context information vector at each moment of the sentence P (and of the sentence Q) together with the sentence vector into the ith primary selection gate unit, which processes them to obtain the core information.
8. The semantic similarity feature extraction method based on double selection gates according to any one of claims 1 to 7, wherein
in step S400, inputting the core information obtained in step S300 into the secondary selection gate and acquiring the core information features again comprises:
inputting the core information processed by the ith primary selection gate unit into the ith secondary selection gate unit, which processes it to obtain the core information features.
9. The semantic similarity feature extraction method based on double selection gates according to any one of claims 1 to 8, wherein in step S500, inputting the core information acquired in step S400 into the multi-angle semantic matching network to obtain the feature matching vectors comprises:
in full matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the sentence vector of the sentence Q to obtain a feature matching vector;
in maximum pooling matching, cosine similarity is calculated between the context information vector at each moment of the sentence P and the context information vector at each moment of the sentence Q, and the maximum value is selected as the feature matching vector;
in attention matching, cosine values are calculated between the context information vector at the ith moment of the sentence P and the context information vector at each moment of the sentence Q; the cosine values are normalized into attention weights and multiplied by the context information vectors of the sentence Q at each moment, and the weighted result is then matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector;
in maximum attention matching, the cosine values are calculated in the same way, the maximum value among them is taken as the attention weight and multiplied by the context information of the sentence Q, and the result is matched by cosine against the context information vector at each moment of the sentence P to obtain a feature matching vector.
10. The semantic similarity feature extraction method based on double selection gates according to any one of claims 1 to 9, wherein in step S600, passing the matching vectors obtained in step S500 through the second neural network to fuse the feature matching vectors into a fixed-length vector and inputting the vector into the prediction layer to calculate the similarity probability distribution of the sentence pair comprises:
aggregating the four feature matching vectors obtained by the four matchings of the sentence P into a fixed-length feature matching vector through the second recurrent neural network;
aggregating the four feature matching vectors obtained by the four matchings of the sentence Q into a fixed-length feature matching vector through the bidirectional long short-term memory network;
and inputting the two feature matching vectors of the sentence P and the sentence Q into a prediction layer to obtain the sentence pair similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911032492.1A | 2019-10-28 | 2019-10-28 | Semantic similarity feature extraction method based on double selection gates
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911032492.1A | 2019-10-28 | 2019-10-28 | Semantic similarity feature extraction method based on double selection gates
Publications (1)
Publication Number | Publication Date |
---|---|
CN110765755A (en) | 2020-02-07
Family
ID=69334325
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911032492.1A | Semantic similarity feature extraction method based on double selection gates (status: Pending) | 2019-10-28 | 2019-10-28
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110765755A (en) |
2019-10-28: CN application CN201911032492.1A filed, published as CN110765755A (status: Pending)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106547885A (en) * | 2016-10-27 | 2017-03-29 | 桂林电子科技大学 | A kind of Text Classification System and method |
CN109101494A (en) * | 2018-08-10 | 2018-12-28 | 哈尔滨工业大学(威海) | A method of it is calculated for Chinese sentence semantic similarity, equipment and computer readable storage medium |
CN109214001A (en) * | 2018-08-23 | 2019-01-15 | 桂林电子科技大学 | A kind of semantic matching system of Chinese and method |
CN109165300A (en) * | 2018-08-31 | 2019-01-08 | 中国科学院自动化研究所 | Text contains recognition methods and device |
CN110162593A (en) * | 2018-11-29 | 2019-08-23 | 腾讯科技(深圳)有限公司 | A kind of processing of search result, similarity model training method and device |
CN109800390A (en) * | 2018-12-21 | 2019-05-24 | 北京石油化工学院 | A kind of calculation method and device of individualized emotion abstract |
CN110298037A (en) * | 2019-06-13 | 2019-10-01 | 同济大学 | The matched text recognition method of convolutional neural networks based on enhancing attention mechanism |
Non-Patent Citations (2)
Title |
---|
QINGYU ZHOU et al.: "Selective Encoding for Abstractive Sentence Summarization", arXiv:1704.07073v1, 24 April 2017 |
ZHIGUO WANG et al.: "Bilateral Multi-Perspective Matching for Natural Language Sentences", arXiv:1702.03814v3, 14 July 2017 |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111339249A (en) * | 2020-02-20 | 2020-06-26 | 齐鲁工业大学 | Deep intelligent text matching method and device combining multi-angle features |
CN111523241A (en) * | 2020-04-28 | 2020-08-11 | 国网浙江省电力有限公司湖州供电公司 | Method for constructing novel power load logic information model |
CN111523241B (en) * | 2020-04-28 | 2023-06-13 | 国网浙江省电力有限公司湖州供电公司 | Construction method of power load logic information model |
CN111651973A (en) * | 2020-06-03 | 2020-09-11 | 拾音智能科技有限公司 | Text matching method based on syntax perception |
CN111651973B (en) * | 2020-06-03 | 2023-11-07 | 拾音智能科技有限公司 | Text matching method based on syntactic perception |
CN111523301A (en) * | 2020-06-05 | 2020-08-11 | 泰康保险集团股份有限公司 | Contract document compliance checking method and device |
CN112434514B (en) * | 2020-11-25 | 2022-06-21 | 重庆邮电大学 | Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment |
CN112434514A (en) * | 2020-11-25 | 2021-03-02 | 重庆邮电大学 | Multi-granularity multi-channel neural network based semantic matching method and device and computer equipment |
CN112560502B (en) * | 2020-12-28 | 2022-05-13 | 桂林电子科技大学 | Semantic similarity matching method and device and storage medium |
CN112560502A (en) * | 2020-12-28 | 2021-03-26 | 桂林电子科技大学 | Semantic similarity matching method and device and storage medium |
CN113157889A (en) * | 2021-04-21 | 2021-07-23 | 韶鼎人工智能科技有限公司 | Visual question-answering model construction method based on theme loss |
CN113177406A (en) * | 2021-04-23 | 2021-07-27 | 珠海格力电器股份有限公司 | Text processing method and device, electronic equipment and computer readable medium |
CN113177406B (en) * | 2021-04-23 | 2023-07-07 | 珠海格力电器股份有限公司 | Text processing method, text processing device, electronic equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200207 |