CN110807084A - Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy - Google Patents

Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy Download PDF

Info

Publication number
CN110807084A
Authority
CN
China
Prior art keywords
lstm
keyword
representing
attention mechanism
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910404547.0A
Other languages
Chinese (zh)
Inventor
董志安
吕学强
孙少奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University filed Critical Beijing Information Science and Technology University
Priority to CN201910404547.0A priority Critical patent/CN110807084A/en
Publication of CN110807084A publication Critical patent/CN110807084A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an attention mechanism-based patent term relationship extraction method using Bi-LSTM and a keyword strategy, comprising the following steps: step 1): preprocessing the patent text, identifying term features, adding position information, obtaining category keyword features through an improved TextRank algorithm, and forming a vector matrix; step 2): importing the vector matrix into a Bi-LSTM model and acquiring the global features of the text with an attention mechanism; step 3): selecting the key features of each sentence as local features with a max-pooling layer; step 4): fusing the global and local features; step 5): outputting the classification result with a softmax classifier. The method addresses the long-distance dependency problem of traditional deep learning approaches to patent term relationship extraction. Experimental comparisons show that it outperforms existing methods and meets the requirements of practical application.

Description

Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy
Technical Field
The invention belongs to the technical field of patent term relation extraction, and particularly relates to a Bi-LSTM and keyword strategy patent term relation extraction method based on an attention mechanism.
Background
With social development and scientific and technological progress, awareness of protecting scientific research achievements has grown and the number of patent applications rises year by year. To analyze the relationships between patents more effectively and to optimize patent retrieval, automatic extraction of patent term relationships has attracted more and more scholars: manual collection and earlier unsupervised learning algorithms can no longer meet the demand, so patent term relationships must be extracted automatically by computer. Automatic extraction of patent term relationships plays an important role in patent information retrieval, patent similarity detection, patent domain ontology construction, patent knowledge graph construction, latent semantic analysis and related work.
At present, the main approaches to relation extraction are pattern matching, dictionary-driven methods, statistical machine learning, and hybrids of several methods. All of them require manually engineered features such as part of speech, dependency relations and semantic roles, or depend to some extent on natural language processing tools such as part-of-speech taggers and syntactic parsers; since different tools produce somewhat different results, the final extraction result is affected.
In recent years, entity relation extraction with deep learning has become mainstream: effective text features can be learned automatically, and even without basic natural language processing tools these methods outperform traditional ones on many natural language processing tasks. They nevertheless remain limited in how they represent the local and global features of sentences.
Disclosure of Invention
In view of the above problems in the prior art, the present invention is directed to a method for extracting patent term relationships based on attention-based Bi-LSTM and keyword strategy, which can avoid the above technical disadvantages.
In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
a patent term relationship extraction method based on Bi-LSTM and keyword strategy of attention mechanism comprises the following steps:
step 1): preprocessing a patent text, identifying term characteristics, adding position information, obtaining category keyword characteristics through an improved TextRank algorithm, and forming a vector matrix;
step 2): importing the vector matrix into a Bi-LSTM model, and acquiring the overall characteristics of the text information by adopting an attention mechanism;
step 3): selecting key features of each sentence as local features by utilizing the maximum pooling layer;
step 4): fusing the global features and the local features;
step 5): and outputting a classification result by using a softmax classifier.
Further, the improved TextRank algorithm in the step 1) is specifically as follows:

Step A: input the patent text information set S = {s_1, s_2, s_3, ..., s_n} to be processed and set the parameters: damping coefficient d, sliding window size w, maximum iteration count I, and iteration stop threshold ε;

Step B: perform word segmentation and part-of-speech tagging on each text s_i in the patent text information set S, filter stop words, and keep only words of the specified parts of speech (verbs, adjectives and nouns); these words form the final candidate category-feature keywords;

Step C: calculate the TF-IDF value of each word in the patent text information set S by the TF-IDF algorithm;

Step D: traverse the words of the patent text based on the sliding window size w and build an edge between any two co-occurring words, thereby constructing the keyword graph G_i formed by the words of s_i;

Step E: iteratively compute the weight of each word in the keyword graph G_i according to formula (1) until convergence, formula (1) being as follows:

W(v_i) = (1 − d) · W′(v_i)_{TF-IDF} + d · Σ_{v_j ∈ In(v_i)} ( w_{ji} / Σ_{v_k ∈ Out(v_j)} w_{jk} ) · W(v_j)    (1)

where W(v_i) is the weight of node v_i; d is the damping coefficient, representing the probability of jumping from a given node to any other node in the graph, set to 0.85; In(v_i) is the set of nodes pointing to v_i; Out(v_j) is the set of nodes pointed to by the edges leaving v_j; w_{ji} is the weight of the edge from v_j to v_i; and W′(v_i)_{TF-IDF} is the TF-IDF value of node v_i;

Step F: sort the words of the keyword graph G_i by weight and select the word with the largest weight whose part of speech is a verb as the category-feature keyword.
Further, the step 2) is specifically as follows: the formulas used in the attention layer are shown in (2), (3) and (4):

M = tanh(H)    (2)

α = softmax(w^T M)    (3)

h* = H α^T    (4)

where H = [h_1, h_2, h_3, ..., h_T] is the matrix output by the Bi-LSTM layer over T time steps, with H ∈ ℝ^(d_w × T) and d_w the dimension of the word vectors; w is a training parameter vector and w^T its transpose; α is the attention probability distribution vector; and h* is the learned sentence representation.
Further, the step 3) is specifically as follows: the output H of the Bi-LSTM model is processed with max pooling, as shown in formula (5):

h′ = maxpool(H)    (5)
further, the step 4) is specifically as follows: feature fusion is to combine the calculation results of the attention layer and the pooling layer, as shown in formula (6):
Figure BSA0000183132410000033
whereinRepresenting vector stitching.
Further, the step 5) specifically comprises: a softmax classifier is used to predict the label ŷ from a set of discrete classes Y for a sentence S. The classifier takes the fused feature as input, as shown in formulas (7) and (8):

p(y|S) = softmax(W^(S) h̃ + b^(S))    (7)

ŷ = argmax_y p(y|S)    (8)

The loss function is the negative log-likelihood of the true class label y with L2 regularization to prevent overfitting, as shown in formula (9):

J(θ) = −(1/m) Σ_{i=1}^{m} t_i log(y_i) + λ‖θ‖²    (9)

where t_i is the one-hot form of the true class label y, y_i is the softmax-estimated probability of each class, m is the number of training samples, λ is the L2 regularization hyperparameter, and θ are the trainable parameters of the model.
The attention mechanism-based patent term relationship extraction method using Bi-LSTM and a keyword strategy provided by the invention addresses the long-distance dependency problem of traditional deep learning approaches to patent term relationship extraction. Experimental comparisons show that the method outperforms existing approaches and meets the requirements of practical application.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 illustrates the sentence vectorization;
FIG. 3 is a view of the overall framework of the model;
FIG. 4 is a comparison of the internal experiments of the model;
FIG. 5 is a comparative graph of different experimental methods.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, a patent term relationship extraction method based on attention mechanism Bi-LSTM and keyword strategy comprises the following steps:
step 1): preprocessing a patent text, identifying term characteristics, adding position information, obtaining category keyword characteristics through an improved TextRank algorithm, and forming a vector matrix;
preprocessing the patent text means splitting it into sentences at commas, semicolons and full stops, identifying the term features in each sentence while adding position information, and obtaining the category keyword feature that represents each sentence with the improved TextRank keyword extraction algorithm; the sentences and the extracted features form the final vector matrix.
The word vector model is as follows:

A word vector (word embedding) is a distributed representation of words: each word of an input sentence is mapped to a continuous real-valued vector that captures its syntactic and semantic information. Given a sentence s = {w_1, w_2, w_3, ..., w_k} containing k words, each word w_i is mapped to a low-dimensional real vector x_i by formula (1):

x_i = W^word · V_i    (1)

where x_i is the vector form of the word w_i; W^word ∈ ℝ^(d_w × m) is the vector matrix obtained from word2vec training, d_w is the word-vector dimension, m is the size of a fixed word list, and V_i is the one-hot (bag-of-words) representation of the word w_i. Mapping the words of each sentence by formula (1) yields the word-vector representation V_s = {x_1, x_2, x_3, ..., x_k} of the sentence.
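As a minimal sketch of this lookup (the vocabulary size, dimension and word indices below are illustrative assumptions, and a random matrix stands in for the word2vec-trained W^word):

```python
import numpy as np

# Illustrative sizes; W_word would come from word2vec training in practice.
m, d_w = 5000, 100                    # word-list size, word-vector dimension
W_word = np.random.randn(d_w, m)      # stand-in for the trained vector matrix

def embed(word_ids):
    """Map a sentence (list of word indices) to its vector sequence V_s."""
    # Multiplying by a one-hot V_i just selects a column of W_word,
    # so the lookup is plain column indexing.
    return np.stack([W_word[:, i] for i in word_ids], axis=0)   # shape (k, d_w)

sentence_ids = [12, 408, 7, 99]       # hypothetical indices of a 4-word sentence
V_s = embed(sentence_ids)             # V_s = {x_1, ..., x_k}
```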
The position vector features are as follows:

In the patent term relation extraction task, the words that highlight the relationship between two terms are usually distributed near the terms. To extract the relationship between terms more accurately, the distance from each word to the two terms is computed to generate a position vector matrix; the position vector information of each word is concatenated after the word vectors of the sentence. For a sentence s = {w_1, w_2, w_3, ..., w_k} containing k words, the relative distances of each word w_i to the two terms are i − t_1 and i − t_2, where i is the position index of the current word in the sentence and t_1 and t_2 are the position indexes of the two terms in the sentence. A position vector matrix is generated from the obtained word position information by the word2vec tool. The dimension of each word vector then becomes:

d_w′ = d_w + 2 d_p    (2)

where d_w′ is the vector dimension after concatenating the position vector information, d_w is the original word-vector dimension, and d_p is the position-vector dimension.
For example, the sentence "the number of chargers connected under the charging control system" segments into "charging control system / control / charger / access / number", with "charging control system" and "charger" as the two patent terms of the sentence. The distance from the word "control" to the term "charging control system" is then 1 and to the term "charger" is −1; the distance from the word "number" to the term "charging control system" is 5 and to the term "charger" is 3.
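A short sketch of this relative-distance computation (the English tokenization below is illustrative; only the "control" distances are asserted by the text above):

```python
# Compute (i - t1, i - t2) for every token index i, where t1 and t2 are the
# position indexes of the two patent terms in the segmented sentence.
def relative_positions(tokens, t1, t2):
    return [(i - t1, i - t2) for i in range(len(tokens))]

tokens = ["charging control system", "control", "charger", "access", "number"]
print(relative_positions(tokens, t1=0, t2=2))
# 'control' sits at i=1: distance 1 to the first term and -1 to the second,
# matching the example in the text.
```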
The category keyword feature extraction based on sentence level is as follows:
the TextRank algorithm is a sorting algorithm based on a graph model and can be used for extracting keywords of a text. The method comprises the steps of dividing a text into a plurality of composition units (words and sentences), establishing a graph model, sequencing important components in the text by using a voting mechanism, and extracting keywords only by using the information of a single document.
The TextRank algorithm is simple and easy to use: it exploits the associations between words and extracts keywords from a single document alone. However, because TextRank relies only on the document itself and assigns every word the same importance at initialization, it struggles to extract the keywords of a text accurately. The TF-IDF algorithm depends on the corpus environment and therefore knows the importance of a word in advance, which is where it surpasses TextRank. The TF-IDF idea is therefore incorporated into TextRank: the initial importance of each word is set to its TF-IDF value, so the importance of words is reflected from the very start of the algorithm, improving both its efficiency and accuracy. The Improved TextRank (IMTR) algorithm is described as follows:
(1) input the patent text information set S = {s_1, s_2, s_3, ..., s_n} to be processed and set the parameters: damping coefficient d, sliding window size w, maximum iteration count I, and iteration stop threshold ε;

(2) perform word segmentation and part-of-speech tagging on each text s_i in the patent text information set S, filter stop words, and keep only words of the specified parts of speech (verbs, adjectives and nouns); these words form the final candidate category-feature keywords;

(3) calculate the TF-IDF value of each word in the patent text information set S by the TF-IDF algorithm;

(4) traverse the words of the patent text based on the sliding window size w and build an edge between any two co-occurring words, constructing the keyword graph G_i formed by the words of s_i;

(5) iteratively compute the weight of each word in the keyword graph G_i according to the formula

W(v_i) = (1 − d) · W′(v_i)_{TF-IDF} + d · Σ_{v_j ∈ In(v_i)} ( w_{ji} / Σ_{v_k ∈ Out(v_j)} w_{jk} ) · W(v_j)

until convergence;

(6) sort the words of the keyword graph G_i by weight and select the word with the largest weight whose part of speech is a verb as the category-feature keyword.

In the algorithm, W(v_i) is the weight of node v_i; d is the damping coefficient, representing the probability of jumping from a given node to any other node in the graph, generally set to 0.85; In(v_i) is the set of nodes pointing to v_i; Out(v_j) is the set of nodes pointed to by the edges leaving v_j; w_{ji} is the weight of the edge from v_j to v_i; and W′(v_i)_{TF-IDF} is the TF-IDF value of node v_i.
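The following condensed Python sketch of the IMTR iteration rests on stated assumptions: tokens holds the filtered candidate words of one sentence s_i, tfidf is a precomputed map of TF-IDF values over the whole set S, and the keyword graph is a plain dictionary of co-occurrence edge weights rather than a dedicated graph library:

```python
from collections import defaultdict

def imtr(tokens, tfidf, w=5, d=0.85, max_iter=100, eps=1e-4):
    # Build undirected co-occurrence edges within a sliding window of size w.
    edges = defaultdict(float)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1:i + w]:
            if u != v:
                edges[(u, v)] += 1.0
                edges[(v, u)] += 1.0
    nodes = set(tokens)
    out_weight = defaultdict(float)           # denominator over Out(v_j)
    for (u, _), wt in edges.items():
        out_weight[u] += wt
    # Initialize node weights with TF-IDF values instead of a uniform constant.
    W = {v: tfidf.get(v, 1.0) for v in nodes}
    for _ in range(max_iter):
        W_new = {}
        for vi in nodes:
            rank = sum(edges[(vj, vi)] / out_weight[vj] * W[vj]
                       for vj in nodes if (vj, vi) in edges)
            W_new[vi] = (1 - d) * tfidf.get(vi, 1.0) + d * rank
        converged = max(abs(W_new[v] - W[v]) for v in nodes) < eps
        W = W_new
        if converged:
            break
    return sorted(W.items(), key=lambda kv: -kv[1])
```

The verb filter of step (6) is omitted here; with the part-of-speech tags of step (2) it is a one-line selection over the sorted result.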
The text features adopted by the invention include the term features, position information and keyword features; after vectorization they are concatenated into the text word vectors to form the final vectorized representation shown in FIG. 2.
Step 2): importing the vector matrix into a Bi-LSTM model, and acquiring the overall characteristics of the text information by adopting an attention mechanism;
a Long Short-Term Memory network (LSTM), proposed by Hochreiter et al. in 1997 in "Long Short-Term Memory", is a special type of Recurrent Neural Network (RNN) that makes effective use of long-distance information. LSTM was designed to preserve information faithfully over time: it introduces a memory cell that records history and controls that record selectively, leading to the concept of three control gates: an input gate, a forget gate and an output gate.
In the patent term semantic relation extraction task, both the historical information and the future context of the text should be considered. The LSTM model, however, records only historical information and knows nothing of future information. Unlike the LSTM model, the bidirectional LSTM model considers both past features (extracted by forward propagation) and future features (extracted by backward propagation). Simply understood, a bidirectional LSTM amounts to two LSTMs, one producing a forward output sequence and one a backward output sequence, whose outputs are combined as the final result. The bidirectional LSTM model thus makes effective use of the context information of the patent text and can uncover more implicit features in it.
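A minimal Keras sketch of such a bidirectional encoder (the sequence length, input dimension and hidden size are illustrative assumptions, not the tuned values of Table 3):

```python
import tensorflow as tf

seq_len, feat_dim, hidden = 50, 120, 128   # assumed shapes of the vector matrix

inputs = tf.keras.Input(shape=(seq_len, feat_dim))
# return_sequences=True keeps the per-time-step states H = [h_1, ..., h_T],
# which both the attention layer and the max-pooling layer consume later.
H = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(hidden, return_sequences=True))(inputs)
encoder = tf.keras.Model(inputs, H)
```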
The Attention mechanism derives from the human visual attention mechanism and was first applied in the field of visual images. Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate", were the first to apply the attention mechanism to natural language processing, and its use has kept growing with subsequent research on other topics in the field. By computing attention probabilities that highlight the importance of particular words to the whole sentence, the attention mechanism lets the model focus on the important information in the patent text.
In this part, an attention mechanism for the relation classification task is applied to the output of the Bi-LSTM model to obtain an attention probability distribution, from which the importance of the LSTM output state at each time step to the relation classification is obtained and a sentence representation is learned, improving the final classification effect. In this model, the attention layer uses the following formulas:
M = tanh(H)

α = softmax(w^T M)

h* = H α^T

where H = [h_1, h_2, h_3, ..., h_T] is the matrix output by the Bi-LSTM layer over T time steps, with H ∈ ℝ^(d_w × T) and d_w the dimension of the word vectors; w is a training parameter vector and w^T its transpose; α is the attention probability distribution vector; and h* is the learned sentence representation.
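A sketch of this attention layer, assuming the batched shape convention (batch, T, dim) for the Bi-LSTM output H:

```python
import tensorflow as tf

class SentenceAttention(tf.keras.layers.Layer):
    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.w = self.add_weight(name="w", shape=(dim, 1))  # training parameter vector

    def call(self, H):
        M = tf.tanh(H)                                  # M = tanh(H)
        scores = tf.squeeze(tf.matmul(M, self.w), -1)   # w^T M per time step
        alpha = tf.nn.softmax(scores, axis=-1)          # attention distribution
        # h* = H alpha^T: attention-weighted sum over the T time steps
        return tf.reduce_sum(H * tf.expand_dims(alpha, -1), axis=1)
```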
Step 3): selecting key features of each sentence as local features by utilizing the maximum pooling layer;

For the output H = [h_1, h_2, h_3, ..., h_T] of the Bi-LSTM model, in addition to the attention mechanism, the max-pooling statistic is also used to obtain the feature representation most relevant to the classification task, namely: h′ = maxpool(H).
Step 4): fusing the global features and the local features;

The feature fusion combines the calculation results of the attention layer and the pooling layer so that the multiple features complement one another, namely: h̃ = h* ⊕ h′, where ⊕ represents vector concatenation.
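In a sketch, steps 3) and 4) then reduce to a max over the time axis plus a concatenation:

```python
import tensorflow as tf

def fuse(H, h_star):
    h_local = tf.reduce_max(H, axis=1)            # h' = maxpool(H), formula (5)
    return tf.concat([h_star, h_local], axis=-1)  # fused feature, formula (6)
```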
Step 5): outputting the classification result by using a softmax classifier.

Casting the patent term relationship extraction problem as a multi-class classification problem, the invention uses a softmax classifier to predict the label ŷ from a set of discrete classes Y for a sentence S. The classifier takes the fused feature as input:

p(y|S) = softmax(W^(S) h̃ + b^(S))

ŷ = argmax_y p(y|S)

The loss function is the negative log-likelihood of the true class label y with L2 regularization to prevent overfitting:

J(θ) = −(1/m) Σ_{i=1}^{m} t_i log(y_i) + λ‖θ‖²

where t_i is the one-hot form of the true class label y, y_i is the softmax-estimated probability of each class, m is the number of training samples, λ is the L2 regularization hyperparameter, and θ are the trainable parameters of the model. The model framework used in the invention is illustrated in FIG. 3.
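A sketch of the classification and loss computation (num_classes = 7 matches the relation types of Table 1; the λ value below is an illustrative assumption, not the tuned setting):

```python
import tensorflow as tf

num_classes, lambda_l2 = 7, 1e-4

# Softmax classifier over the fused feature, with L2 weight regularization.
dense = tf.keras.layers.Dense(
    num_classes, activation="softmax",
    kernel_regularizer=tf.keras.regularizers.l2(lambda_l2))

def loss_fn(y_true_onehot, h_fused):
    y_prob = dense(h_fused)                          # p(y|S)
    # Negative log-likelihood of the true labels, averaged over the batch,
    # plus the L2 penalty collected from the layer.
    nll = tf.keras.losses.categorical_crossentropy(y_true_onehot, y_prob)
    return tf.reduce_mean(nll) + tf.add_n(dense.losses)
```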
The experimental data and evaluation criteria were as follows:

The experimental data are 9978 patent texts in the new-energy-vehicle field crawled from a patent retrieval and analysis website. The final purpose of the experiment is to extract the relationships between the domain terms in these patents; since domain terms occur in every part of a patent text, the abstract, the description and the claims are all used as corpus for extracting the domain term relationships. After preprocessing the patent text data, 6912 corpus entries were selected as experimental data, 5248 of them as training data and 1664 as test data. The specific data processing steps are as follows, with a code sketch of the splitting and filtering after the list:
(1) perform term extraction on the patent data with the patent term extraction algorithm of Lv, Xiagnru in "Patent Domain Terminology Extraction Based on Multi-feature Fusion and BILSTM-CRF Model";
(2) build a term dictionary from the extracted patent terms, add it to the jieba word segmentation tool, and segment the patent data;

(3) split the patent data into sentences at commas, semicolons and full stops, each sentence forming one corpus entry;

(4) select the sentences containing exactly two patent terms to form the final data set;

(5) label the screened data with relation categories to obtain the final experimental data.
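A sketch of the sentence splitting and filtering of steps (3) and (4), assuming the term dictionary of step (2) is given:

```python
import re

def build_corpus(patent_texts, term_dict):
    corpus = []
    for text in patent_texts:
        # Split on Chinese and ASCII commas, semicolons and full stops.
        for sent in re.split(r"[，；。,;.]", text):
            hits = [t for t in term_dict if t in sent]
            if len(hits) == 2:            # keep sentences with exactly two terms
                corpus.append((sent, hits[0], hits[1]))
    return corpus
```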
The 6912 data entries selected for the experiments of the invention contain 7 relationship types; the relationship types are shown in Table 1 and sample entries in Table 2.
TABLE 1 sample relationships
Table 2 sample examples
Precision and recall are two metrics widely used in information retrieval and statistical classification to evaluate the quality of results. Precision is the ratio of the number of retrieved relevant documents to the total number of retrieved documents, measuring the exactness of the retrieval system; recall is the ratio of the number of retrieved relevant documents to the number of all relevant documents in the library, measuring its completeness; the F value combines the two into a single comprehensive index.
To verify the correctness and validity of the proposed model, the macro-averaged F1 (macro_F1) value is used as the evaluation index of the experiments. Computing it requires the precision (P), recall (R) and F1 value of each category:

P_i = TP_i / (TP_i + FP_i)

R_i = TP_i / (TP_i + FN_i)

F1_i = 2 · P_i · R_i / (P_i + R_i)

where TP_i is the number of samples of the i-th relationship type that are predicted correctly; FP_i is the number of samples wrongly predicted to be of the i-th relationship type; and FN_i is the number of samples that belong to the i-th relationship type but are wrongly predicted as other types. The macro_F1 value is then

macro_F1 = (1/M) Σ_{i=1}^{M} F1_i

where M is the number of relationship types.
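A direct sketch of these formulas from per-class TP/FP/FN counts:

```python
import numpy as np

def macro_f1(tp, fp, fn):
    tp, fp, fn = map(np.asarray, (tp, fp, fn))
    precision = tp / np.maximum(tp + fp, 1)          # P_i
    recall = tp / np.maximum(tp + fn, 1)             # R_i
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return f1.mean()                                 # average over the M types
```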
The parameter settings and results were analyzed as follows:

The experiments ran on a Dell server with a 64-bit Ubuntu 16.04 operating system, an NVIDIA Tesla K40 GPU and 64 GB of memory. The model is implemented in Python with the TensorFlow framework. The final patent term relationship extraction quality of the model is closely tied to its parameters; locally optimal values of each parameter were found through extensive tuning experiments, and the specific settings are shown in Table 3. The final results of this experiment are shown in Table 4.
TABLE 3 model parameter settings
TABLE 4 Final Experimental results
The per-type results in Table 4 show that the simplicity or complexity of a relationship type affects the final extraction quality: a simple type (e.g., the spatial relationship) is easier for the model to learn and is recognized more accurately, while a complex type (e.g., the generic relationship) yields less semantic association during learning and is recognized less well.
The internal experimental comparisons of the model are as follows:
In order to verify the benefit of adding the keyword features and the pooling layer to the attention-based Bi-LSTM model for patent term relation extraction, four model variants were designed and compared in internal experiments; the original input of every model consists of the sentence word vectors, position feature vectors and term feature vectors. The results are shown in Table 5 and FIG. 4, with per-category results in Table 6.
TABLE 5 comparison of model internal experiments
Comparison of the experimental results for each class of the model in Table 6
The precision, recall and F1 values of each experiment in Tables 5 and 6 and FIG. 4 show that adding the keyword features and the pooling layer to the attention-based Bi-LSTM model designed by the invention works well and can effectively extract the relationships between patent terms in the new-energy-vehicle field. Experiment 1 uses only the Attention + Bi-LSTM model; it achieves a certain effect and partially solves the patent term relation extraction problem, but the final extraction result still leaves room for improvement. Experiment 2 adds the keyword features to experiment 1 and experiment 3 adds the pooling layer; both improve on experiment 1, so the keyword features and the pooling layer each contribute to better patent term relation extraction. Experiment 2 exceeds the F1 value of experiment 1 by 0.95% and experiment 3 exceeds it by 0.42%, so the keyword features contribute more than the pooling layer. The reason is that the keyword features increase the discriminability of the relation categories and compensate for the limits of the features learned automatically by the Attention + Bi-LSTM model, so adding them explicitly benefits the extraction.
The invention therefore adds the keyword features and the pooling layer to the attention-based Bi-LSTM model simultaneously; experiment 4 shows that the keyword + Attention + Bi-LSTM + pooling model achieves a better experimental result than a generic deep learning model.
The different classification methods were compared as follows:
In order to verify the advantage of the Attention + Bi-LSTM model in patent term relation extraction, it was compared with the RNN, LSTM and Bi-LSTM models on the same data set. To unify the experimental standard, all models take the same input word vectors, in the vector format of FIG. 2, and all include the pooling layer. The results are shown in Tables 7 and 8 and FIG. 5.
TABLE 7 comparison of different experimental methods
Comparison of the experimental results for each class of the model in Table 8
The comparison of the different methods in Table 7 and FIG. 5 shows that Bi-LSTM outperforms the LSTM and RNN methods. This is because the Bi-LSTM model considers both past features (extracted by forward propagation) and future features (extracted by backward propagation), making effective use of the context of the patent text to extract more implicit features. Adding the attention mechanism on top of the Bi-LSTM model improves the result further, because the attention probabilities highlight the importance of particular words to the whole sentence and let the model focus on the important information in the patent text. These comparisons confirm the effectiveness of the method of the invention for patent term relation extraction.
The attention mechanism-based patent term relationship extraction method using Bi-LSTM and a keyword strategy provided by the invention addresses the long-distance dependency problem of traditional deep learning approaches to patent term relationship extraction. Experimental comparisons show that the method outperforms existing approaches and meets the requirements of practical application.
The embodiments above describe the invention in relatively specific detail, but they are not to be construed as limiting its scope. A person skilled in the art can make several variations and improvements without departing from the inventive concept, and these fall within the scope of the invention. The protection scope of this patent is therefore subject to the appended claims.

Claims (6)

1. A patent term relation extraction method based on Bi-LSTM and keyword strategy of attention mechanism is characterized by comprising the following steps:
step 1): preprocessing a patent text, identifying term characteristics, adding position information, obtaining category keyword characteristics through an improved TextRank algorithm, and forming a vector matrix;
step 2): importing the vector matrix into a Bi-LSTM model, and acquiring the overall characteristics of the text information by adopting an attention mechanism;
step 3): selecting key features of each sentence as local features by utilizing the maximum pooling layer;
step 4): fusing the global features and the local features;
step 5): and outputting a classification result by using a softmax classifier.
2. The attention mechanism-based patent term relationship extraction method for the Bi-LSTM and keyword strategy according to claim 1, wherein the improved TextRank algorithm in the step 1) is specifically as follows:

step A: input the patent text information set S = {s_1, s_2, s_3, ..., s_n} to be processed and set the parameters: damping coefficient d, sliding window size w, maximum iteration count I, and iteration stop threshold ε;

step B: perform word segmentation and part-of-speech tagging on each text s_i in the patent text information set S, filter stop words, and keep only words of the specified parts of speech (verbs, adjectives and nouns); these words form the final candidate category-feature keywords;

step C: calculate the TF-IDF value of each word in the patent text information set S by the TF-IDF algorithm;

step D: traverse the words of the patent text based on the sliding window size w and build an edge between any two co-occurring words, thereby constructing the keyword graph G_i formed by the words of s_i;

step E: iteratively compute the weight of each word in the keyword graph G_i according to formula (1) until convergence, formula (1) being as follows:

W(v_i) = (1 − d) · W′(v_i)_{TF-IDF} + d · Σ_{v_j ∈ In(v_i)} ( w_{ji} / Σ_{v_k ∈ Out(v_j)} w_{jk} ) · W(v_j)    (1)

where W(v_i) is the weight of node v_i; d is the damping coefficient, representing the probability of jumping from a given node to any other node in the graph, set to 0.85; In(v_i) is the set of nodes pointing to v_i; Out(v_j) is the set of nodes pointed to by the edges leaving v_j; w_{ji} is the weight of the edge from v_j to v_i; and W′(v_i)_{TF-IDF} is the TF-IDF value of node v_i;

step F: sort the words of the keyword graph G_i by weight and select the word with the largest weight whose part of speech is a verb as the category-feature keyword.
3. The attention mechanism-based patent term relationship extraction method for the Bi-LSTM and keyword strategy according to claim 1, wherein the step 2) is specifically: the formulas used in the attention layer are shown in (2), (3) and (4):

M = tanh(H)    (2)

α = softmax(w^T M)    (3)

h* = H α^T    (4)

where H = [h_1, h_2, h_3, ..., h_T] is the matrix output by the Bi-LSTM layer over T time steps, with H ∈ ℝ^(d_w × T) and d_w the dimension of the word vectors; w is a training parameter vector and w^T its transpose; α is the attention probability distribution vector; and h* is the learned sentence representation.
4. The attention mechanism-based patent term relationship extraction method for the Bi-LSTM and keyword strategy according to claim 1, wherein the step 3) is specifically: the output H of the Bi-LSTM model is processed with max pooling, as shown in formula (5):

h′ = maxpool(H)    (5).
5. The attention mechanism-based patent term relationship extraction method for the Bi-LSTM and keyword strategy according to claim 1, wherein the step 4) is specifically: feature fusion combines the calculation results of the attention layer and the pooling layer, as shown in formula (6):

h̃ = h* ⊕ h′    (6)

where ⊕ represents vector concatenation.
6. The attention mechanism-based patent term relationship extraction method for the Bi-LSTM and keyword strategy according to claim 1, wherein the step 5) is specifically: a softmax classifier is used to predict the label ŷ from a set of discrete classes Y for a sentence S; the classifier takes the fused feature as input, as shown in formulas (7) and (8):

p(y|S) = softmax(W^(S) h̃ + b^(S))    (7)

ŷ = argmax_y p(y|S)    (8)

The loss function is the negative log-likelihood of the true class label y with L2 regularization to prevent overfitting, as shown in formula (9):

J(θ) = −(1/m) Σ_{i=1}^{m} t_i log(y_i) + λ‖θ‖²    (9)

where t_i is the one-hot form of the true class label y, y_i is the softmax-estimated probability of each class, m is the number of training samples, λ is the L2 regularization hyperparameter, and θ are the trainable parameters of the model.
CN201910404547.0A 2019-05-15 2019-05-15 Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy Withdrawn CN110807084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910404547.0A CN110807084A (en) 2019-05-15 2019-05-15 Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910404547.0A CN110807084A (en) 2019-05-15 2019-05-15 Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy

Publications (1)

Publication Number Publication Date
CN110807084A true CN110807084A (en) 2020-02-18

Family

ID=69487335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910404547.0A Withdrawn CN110807084A (en) 2019-05-15 2019-05-15 Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy

Country Status (1)

Country Link
CN (1) CN110807084A (en)


Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444712B (en) * 2020-03-25 2022-08-30 重庆邮电大学 Keyword extraction method, terminal and computer readable storage medium
CN111444712A (en) * 2020-03-25 2020-07-24 重庆邮电大学 Keyword extraction method, terminal and computer readable storage medium
CN111475629A (en) * 2020-03-31 2020-07-31 渤海大学 Knowledge graph construction method and system for math tutoring question-answering system
CN112052683A (en) * 2020-09-03 2020-12-08 平安科技(深圳)有限公司 Text matching method and device, computer equipment and storage medium
CN112256939A (en) * 2020-09-17 2021-01-22 青岛科技大学 Text entity relation extraction method for chemical field
CN112256939B (en) * 2020-09-17 2022-09-16 青岛科技大学 Text entity relation extraction method for chemical field
CN112163426A (en) * 2020-09-30 2021-01-01 中国矿业大学 Relationship extraction method based on combination of attention mechanism and graph long-time memory neural network
CN112364174A (en) * 2020-10-21 2021-02-12 山东大学 Patient medical record similarity evaluation method and system based on knowledge graph
CN112507109A (en) * 2020-12-11 2021-03-16 重庆知识产权大数据研究院有限公司 Retrieval method and device based on semantic analysis and keyword recognition
CN113342929A (en) * 2021-05-07 2021-09-03 上海大学 Material-component-process-performance relation quadruple extraction method for material field
CN113312532A (en) * 2021-06-01 2021-08-27 哈尔滨工业大学 Public opinion grade prediction method based on deep learning and oriented to public inspection field
CN113312532B (en) * 2021-06-01 2022-10-21 哈尔滨工业大学 Public opinion grade prediction method based on deep learning and oriented to public inspection field
CN113535948B (en) * 2021-06-02 2022-08-16 中国人民解放军海军工程大学 LSTM-Attention text classification method introducing essential point information
CN113535948A (en) * 2021-06-02 2021-10-22 中国人民解放军海军工程大学 LSTM-Attention text classification method introducing essential point information
CN113535800A (en) * 2021-06-03 2021-10-22 同盾科技有限公司 Feature representation method in credit scenario, electronic device, and storage medium
CN113743099A (en) * 2021-08-18 2021-12-03 重庆大学 Self-attention mechanism-based term extraction system, method, medium and terminal
CN113743099B (en) * 2021-08-18 2023-10-13 重庆大学 System, method, medium and terminal for extracting terms based on self-attention mechanism

Similar Documents

Publication Publication Date Title
CN110807084A (en) Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy
CN107992597B (en) Text structuring method for power grid fault case
CN109766544B (en) Document keyword extraction method and device based on LDA and word vector
CN105279495A (en) Video description method based on deep learning and text summarization
CN106294344A (en) Video retrieval method and device
Cai et al. Intelligent question answering in restricted domains using deep learning and question pair matching
CN111368088A (en) Text emotion classification method based on deep learning
CN114048354B (en) Test question retrieval method, device and medium based on multi-element characterization and metric learning
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN114936277A (en) Similarity problem matching method and user similarity problem matching system
CN112434164A (en) Network public opinion analysis method and system considering topic discovery and emotion analysis
CN110297986A (en) A kind of Sentiment orientation analysis method of hot microblog topic
CN111581943A (en) Chinese-over-bilingual multi-document news viewpoint sentence identification method based on sentence association graph
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
Gunaseelan et al. Automatic extraction of segments from resumes using machine learning
CN115168580A (en) Text classification method based on keyword extraction and attention mechanism
CN111259156A (en) Hot spot clustering method facing time sequence
CN110245234A (en) A kind of multi-source data sample correlating method based on ontology and semantic similarity
CN117291190A (en) User demand calculation method based on emotion dictionary and LDA topic model
Jiang et al. A hierarchical bidirectional LSTM sequence model for extractive text summarization in electric power systems
Yafoz et al. Analyzing machine learning algorithms for sentiments in arabic text
Utami Sentiment Analysis of Hotel User Review using RNN Algorithm
Essatouti et al. Arabic sentiment analysis using a levenshtein distance based representation approach
Liu et al. Suggestion mining from online reviews using random multimodel deep learning
Tan et al. Sentiment analysis of chinese short text based on multiple features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200218