CN110597998A - Military scenario entity relationship extraction method and device combined with syntactic analysis - Google Patents
- Publication number
- CN110597998A (application CN201910653287.0A)
- Authority
- CN
- China
- Prior art keywords
- entity
- corpus
- entity relationship
- military
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a military scenario entity relationship extraction method and device combined with syntactic analysis, wherein the method comprises the following steps: 1. predefining the target relationship types of the military scenario entity relationship extraction task; 2. constructing a training data set and a test data set for the entity relationship extraction model; 3. parsing the corpus entries one by one and filtering out sentence components that do not contribute to entity relationship extraction; 4. converting the sentence components retained after syntactic parsing into vectorized word embeddings by using a pre-trained word embedding matrix; 5. training the entity relationship extraction model on the vectorized training data; 6. extracting entity relationships from the military scenario texts to be processed. The military scenario entity relationship extraction method combined with syntactic analysis can effectively improve the computational efficiency and the accuracy of entity relationship extraction.
Description
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to an entity relationship extraction method and device for military scenarios.
Background
A military scenario is divided into a basic scenario and supplementary scenarios. It is an exercise document that hypothesizes the intentions, situations, and developments of both sides of an engagement according to the training topic, and it is the basic document for organizing and directing military exercises and operations. Entity relationships are a basic information element of military scenario data and the basis for extracting, processing, and analyzing that data. Military scenario entity relationship extraction aims to find the entity relationships hidden in unstructured military scenario text and to extract them by technical means.
At present, entity relationship extraction methods in the open domain mainly include rule-based methods, kernel-function-based methods, and deep-learning-based methods. Rule-based methods depend heavily on expert knowledge and manual induction over the domain knowledge of the corpus to be processed, so they are costly, poorly portable, and difficult to use widely. Kernel-function-based methods extract entity relationships by computing the similarity of syntactic structure trees, so their training and testing are too slow for large-scale data. Deep-learning-based methods use a deep neural network to extract high-level sentence features automatically and offer strong portability and high extraction accuracy, but for closed-domain text such as military scenarios their performance is constrained by the lack of large-scale manually annotated corpora.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and realize a military scenario entity relationship extraction method and device combined with syntactic analysis.
In order to achieve the purpose, the invention adopts the following technical scheme:
A military scenario entity relationship extraction method based on syntactic analysis and a deep neural network comprises the following steps:
S1, corpus construction: predefine the target relationship types for entity relationship extraction, annotate the original military scenario texts, and construct the training data set and the test data set for the entity relationship extraction model. This specifically comprises:
S1.1, entity relationship predefinition: analyze the military concepts in authoritative domain dictionaries and, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, predefine the entity relationship types to be extracted;
The authoritative domain dictionaries include, but are not limited to, the Chinese Military Encyclopedia, the Military Dictionary, and the Concise Military Dictionary;
S1.2, entity relationship corpus construction: manually annotate the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus. Each entry in the corpus is stored as $(e_1, e_2, r, s)$, where $e_1$ and $e_2$ denote the head entity and the tail entity respectively, $r$ denotes the semantic relationship between the two entities, and $s$ denotes the sentence stating that $e_1$ and $e_2$ have the semantic relationship $r$;
S1.3, data set division: divide the corpus obtained in step S1.2 into a training data set and a test data set according to a specific ratio;
The division ratio of the training data set to the test data set is 2:1.
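The storage form and the 2:1 division in S1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class name `RelationSample`, the function `split_corpus`, the random shuffle before splitting, and the toy entities and relation label are all assumptions introduced here.

```python
import random
from typing import List, NamedTuple, Tuple


class RelationSample(NamedTuple):
    """One corpus entry (e1, e2, r, s) as described in S1.2 (hypothetical name)."""
    head: str      # e1, head entity
    tail: str      # e2, tail entity
    relation: str  # r, semantic relationship between the two entities
    sentence: str  # s, sentence describing the relationship


def split_corpus(samples: List[RelationSample], train_parts: int = 2,
                 test_parts: int = 1, seed: int = 0
                 ) -> Tuple[List[RelationSample], List[RelationSample]]:
    """Shuffle (an assumption) and split with ratio train_parts:test_parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Toy corpus; the entities and the relation label are made up for illustration.
corpus = [
    RelationSample(f"unit_{i}", f"area_{i}", "deployed_in",
                   f"unit_{i} is deployed in area_{i}")
    for i in range(9)
]
train_set, test_set = split_corpus(corpus)
print(len(train_set), len(test_set))
```

With nine entries the split yields six training entries and three test entries, which matches the 2:1 ratio stated above.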
S2, syntactic parsing: parse the sentence $s$ of each corpus entry and filter out the sentence components that do not contribute to entity relationship extraction. This specifically comprises:
S2.1, syntax tree generation: parse the sentence $s$ of each corpus entry with an open source syntactic parsing tool to generate a syntax tree;
The open source syntactic parsing tool includes, but is not limited to, the Stanford Parser;
S2.2, parse tree pruning: prune the sentence components of the syntax tree that are irrelevant to the entity triple $(e_1, e_2, r)$ to generate a syntactic parse subtree;
S2.3, subtree recombination: recombine the syntactic parse subtree into a text sequence without changing the original order of the words.
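The pruning and recombination in S2.2 and S2.3 can be illustrated with a small sketch. It assumes the parser output has already been reduced to a parent-index list (`heads`), and it interprets "irrelevant components" as every token that is neither an entity nor an ancestor of an entity. That interpretation, the function names, and the example sentence are assumptions; no real parser API is called.

```python
from typing import List, Set


def path_to_root(index: int, heads: List[int]) -> List[int]:
    """Return the token indices from `index` up to the root (head == -1)."""
    path = [index]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def prune_and_recombine(tokens: List[str], heads: List[int],
                        entity_indices: List[int]) -> List[str]:
    """Keep entity tokens and their ancestors, preserving original word order."""
    keep: Set[int] = set()
    for idx in entity_indices:
        keep.update(path_to_root(idx, heads))
    return [tok for i, tok in enumerate(tokens) if i in keep]


# Hypothetical parse: heads[i] is the parent of token i, -1 marks the root.
tokens = ["Yesterday", "the", "brigade", "quickly", "attacked", "the", "airfield"]
heads = [4, 2, 4, 4, -1, 6, 4]
# Entity e1 = "brigade" (index 2), entity e2 = "airfield" (index 6).
pruned = prune_and_recombine(tokens, heads, entity_indices=[2, 6])
print(pruned)
```

Because the kept tokens are emitted by scanning the sentence left to right, the recombined sequence retains the original word order, as S2.3 requires.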
S3, data vectorization: convert the recombined sequence generated in step S2.3 into a set of word embeddings expressed as distributed vectors. This specifically comprises:
S3.1, training text vectorization: using the authoritative domain dictionaries, convert the currently input recombined sequence $s_i$ into one-hot vectors word by word, where $s_i$ denotes the sentence of the $i$-th input corpus entry;
S3.2, word embedding generation: convert the set of one-hot vectors obtained in step S3.1, word by word, into low-dimensional real-valued word embeddings with an open source word vector conversion tool;
The open source word vector conversion tool includes, but is not limited to, word2vec.
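The conversion from one-hot vectors to low-dimensional embeddings in S3 amounts to multiplying each one-hot vector by an embedding matrix, which selects one row of that matrix. The sketch below shows this with plain Python lists. The vocabulary, the 2-dimensional matrix values, and the function names are hypothetical; in practice the matrix would come from a tool such as word2vec.

```python
from typing import Dict, List


def one_hot(index: int, size: int) -> List[float]:
    """Build a one-hot vector of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed(one_hot_vec: List[float], matrix: List[List[float]]) -> List[float]:
    """Multiply a one-hot row vector by a |V| x k embedding matrix."""
    k = len(matrix[0])
    return [sum(one_hot_vec[v] * matrix[v][d] for v in range(len(matrix)))
            for d in range(k)]


# Hypothetical vocabulary and embedding matrix (|V| = 3, k = 2).
vocab: Dict[str, int] = {"brigade": 0, "attacked": 1, "airfield": 2}
embedding_matrix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

sentence = ["brigade", "attacked", "airfield"]
embeddings = [embed(one_hot(vocab[w], len(vocab)), embedding_matrix)
              for w in sentence]
print(embeddings)
```

Real systems usually skip the explicit multiplication and index the matrix row directly; the product is written out here only to mirror the two sub-steps S3.1 and S3.2.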
S4, model training: train the deep-neural-network-based entity relationship extraction model on the vectorized entity relationship extraction training data set. This specifically comprises:
S4.1, semantic feature extraction: select a specific neural network as the basic relationship extractor and extract the high-level semantic features of the current sentence from the vector set output in step S3.2. The model uses a bidirectional neural network to simultaneously extract the contextual semantic information of the entity pair $e_1$, $e_2$, which improves the recognition accuracy of the entity relationship. The feature of the $j$-th word of the $i$-th corpus entry is expressed as:
$$h_{ij} = [\overrightarrow{h_{ij}} \oplus \overleftarrow{h_{ij}}]$$
where $\oplus$ denotes the combination of the forward channel output and the backward channel output, $[\cdot]$ indicates that the bracketed content is a vector, $\overrightarrow{h_{ij}}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the forward channel, and $\overleftarrow{h_{ij}}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the backward channel;
The specific neural network includes, but is not limited to, a Long Short-Term Memory network (LSTM);
The bidirectional neural network includes, but is not limited to, a bidirectional Long Short-Term Memory network (BLSTM);
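The way a bidirectional network combines its forward and backward channel outputs per word can be sketched as below. To keep the example short and dependency-free, a toy tanh recurrence stands in for the LSTM cell, and the combination is taken to be concatenation. Both choices, the scalar weights, and all names are assumptions made for illustration only.

```python
import math
from typing import List

Vector = List[float]


def run_channel(inputs: List[Vector], w_in: float, w_rec: float) -> List[Vector]:
    """Toy recurrent pass: h_t = tanh(w_in * x_t + w_rec * h_{t-1}), elementwise."""
    hidden = [0.0] * len(inputs[0])
    outputs = []
    for x in inputs:
        hidden = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, hidden)]
        outputs.append(hidden)
    return outputs


def bidirectional_features(inputs: List[Vector]) -> List[Vector]:
    """Concatenate forward and backward channel outputs for every word."""
    forward = run_channel(inputs, w_in=0.5, w_rec=0.3)
    # Run over the reversed sentence, then restore the original word order.
    backward = run_channel(inputs[::-1], w_in=0.5, w_rec=0.3)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical 2-dimensional word embeddings for a three-word sentence.
word_vectors = [[0.2, -0.4], [0.6, 0.1], [-0.3, 0.5]]
features = bidirectional_features(word_vectors)
print(len(features), len(features[0]))
```

Each word ends up with a feature twice the size of a single channel's output, so the feature at any position reflects the words on both sides of it.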
S4.2, entity relationship prediction: process the feature vector output in step S4.1 with a classifier and compute, for the current corpus entry $(e_1, e_2, r, s)$, the estimated probability $\hat{p}(y_n \mid s_i)$ that the relationship $r$ is the relationship $y_n$ ($n \in [1, 8]$) in the predefined set of entity relationship types $Y = [y_1, y_2, \ldots, y_8]$:
$$\hat{p}(y \mid s_i) = \mathrm{softmax}(W h_i + b)$$
where $\mathrm{softmax}(\cdot)$ denotes the softmax classifier operation, $W$ denotes the weight matrix of the classifier network, $s_i$ denotes the sentence of the $i$-th corpus entry, $h_i$ denotes the combination of the feature vectors of all words of the sentence of the $i$-th corpus entry, and $b$ denotes the bias of the classifier network;
The relationship type with the largest estimated probability is the predicted label of the relationship $r$ in the current corpus entry, denoted $\hat{y}$:
$$\hat{y} = \arg\max_{y_n} \hat{p}(y_n \mid s_i)$$
where $\arg\max$ denotes the operation of taking the maximum, $\hat{p}(y_n \mid s_i)$ denotes the conditional probability that the entity relationship type described by the sentence $s_i$ of the $i$-th corpus entry is $y_n$, $y_n$ denotes the $n$-th predefined entity relationship type, and $s_i$ denotes the sentence of the $i$-th corpus entry;
The classifier includes, but is not limited to, a softmax classifier;
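The prediction step in S4.2, a softmax over a linear layer followed by picking the most probable relationship type, can be sketched as follows. The weight values, the 2-dimensional sentence feature, and the names `softmax` and `predict_relation` are hypothetical; only the eight relationship types and the softmax-then-maximum structure come from the text above.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest score before exp."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relation(feature: List[float], weights: List[List[float]],
                     bias: List[float], relation_types: List[str]
                     ) -> Tuple[str, List[float]]:
    """Compute softmax(W h + b) and return the most probable relationship type."""
    scores = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return relation_types[best], probs


# Eight predefined relationship types y1..y8; weights and bias are made up.
relation_types = [f"y{n}" for n in range(1, 9)]
weights = [[0.1 * n, -0.1 * n] for n in range(1, 9)]
bias = [0.0] * 8
sentence_feature = [1.0, 0.0]

label, probabilities = predict_relation(sentence_feature, weights, bias,
                                        relation_types)
print(label, round(sum(probabilities), 6))
```

Subtracting the largest score before exponentiation does not change the result but avoids overflow when scores are large.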
S4.3, cost function optimization: the cost function of the deep neural network is obtained as the negative log-likelihood of the true label $y$:
$$J(\theta) = -\frac{1}{m}\sum_{n=1}^{m} t_n \log \hat{p}(y_n \mid s_i) + \lambda \lVert \theta \rVert^2$$
where $t_n$ denotes the $n$-th component of the one-hot vector of the true label, $\hat{p}(y_n \mid s_i)$ denotes the estimated probability of each predefined relationship type output by the softmax classifier in step S4.2, $m$ denotes the number of predefined relationship types (8 here), $\lambda$ denotes the L2 regularization hyperparameter, $\theta$ denotes the parameters of the entity relationship extraction model, and $\lVert \cdot \rVert$ denotes the norm. Model training is completed by iteratively adjusting the model parameters to minimize the cost function $J(\theta)$.
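A cost of the kind described in S4.3 can be computed as in the sketch below. It assumes the cost is the negative log-likelihood of the one-hot true label, averaged over the m relationship types, plus lambda times the squared L2 norm of the parameters. That exact form, the function name `cost`, and all numeric values are assumptions for illustration.

```python
import math
from typing import List


def cost(one_hot_label: List[float], probs: List[float],
         params: List[float], lam: float) -> float:
    """Assumed form: -(1/m) * sum(t_n * log p_n) + lam * ||theta||^2."""
    m = len(probs)
    nll = -sum(t * math.log(p) for t, p in zip(one_hot_label, probs)) / m
    l2_penalty = lam * sum(p * p for p in params)
    return nll + l2_penalty


# Hypothetical example with m = 8 relationship types.
true_label = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # true type is y3
uniform_probs = [1.0 / 8] * 8                           # untrained classifier
theta = [1.0, 2.0]                                      # toy parameter vector
value = cost(true_label, uniform_probs, theta, lam=0.01)
print(round(value, 6))
```

Only the probability assigned to the true type contributes to the first term, so the cost falls as the classifier becomes more confident in the correct relationship, while the second term discourages large parameter values.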
S5, entity relationship extraction: extract entity relationships from the military scenario text to be processed by using the trained model. This specifically comprises:
S5.1, test text vectorization: vectorize the original military scenario text to be processed sentence by sentence using the processing in step S3;
S5.2, entity relationship prediction: use the model trained in step S4 to predict, sentence by sentence, the semantic relationships in the vectorized military scenario text output in step S5.1, and store the results.
The military scenario entity relationship extraction method combined with syntactic analysis adopted by the invention has the following advantages:
1. Through in-depth analysis of authoritative domain dictionaries such as the Chinese Military Encyclopedia, the target requirements of military scenario entity relationship extraction are clarified. On this basis, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, 8 target relationship types for military scenario entity relationship extraction are predefined, and a military scenario entity relationship extraction training/test corpus containing 11236 entries is constructed;
2. The language of military scenarios is highly standardized and modular. Before relationship extraction, a syntactic parser is used to parse and prune each sentence, which filters out the sentence components that do not contribute to entity relationship extraction, improves the utilization of effective information, and reduces the computational overhead of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of an embodiment of the military scenario entity relationship extraction method combined with syntactic analysis of the present invention;
FIG. 2 is a block diagram of the component architecture of the present invention;
FIG. 3 is a diagram of an entity relationship extraction model based on a deep neural network applied in the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a flow chart of the military scenario entity relationship extraction method combined with syntactic analysis is shown, which specifically includes the following steps:
S1, corpus construction: predefine the target relationship types for entity relationship extraction, annotate the original military scenario texts, and construct the training data set and the test data set for the entity relationship extraction model. This specifically comprises:
S1.1, entity relationship predefinition: analyze the military concepts in authoritative domain dictionaries such as the Chinese Military Encyclopedia, the Military Dictionary, and the Concise Military Dictionary and, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, predefine the entity relationship types to be extracted;
S1.2, entity relationship corpus construction: manually annotate the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus. Each entry in the corpus is stored as $(e_1, e_2, r, s)$, where $e_1$ and $e_2$ denote the head entity and the tail entity respectively, $r$ denotes the semantic relationship between the two entities, and $s$ denotes the sentence stating that $e_1$ and $e_2$ have the semantic relationship $r$;
S1.3, data set division: divide the corpus obtained in step S1.2 into a training data set and a test data set according to a specific ratio;
S2, syntactic parsing: parse the sentence $s$ of each corpus entry and filter out the sentence components that do not contribute to entity relationship extraction. This specifically comprises:
S2.1, syntax tree generation: parse the sentence $s$ of each corpus entry with an open source tool such as the Stanford Parser to generate a syntax tree;
S2.2, parse tree pruning: prune the sentence components of the syntax tree that are irrelevant to the entity triple $(e_1, e_2, r)$ to generate a syntactic parse subtree;
S2.3, subtree recombination: recombine the syntactic parse subtree into a text sequence without changing the original order of the words.
S3, data vectorization: convert the recombined sequence generated in step S2.3 into a set of word embeddings expressed as distributed vectors. This specifically comprises:
S3.1, training text vectorization: using the authoritative domain dictionaries, convert the currently input recombined sequence $s_i$ into one-hot vectors word by word, where $s_i$ denotes the sentence of the $i$-th input corpus entry;
S3.2, word embedding generation: convert the set of one-hot vectors obtained in step S3.1, word by word, into low-dimensional real-valued word embeddings with an open source tool such as word2vec, that is, convert the $j$-th word $x_{ij}$ of the $i$-th sentence into a $k$-dimensional vector.
S4, model training: train the deep-neural-network-based entity relationship extraction model on the vectorized entity relationship extraction training data set. This specifically comprises:
S4.1, semantic feature extraction: select a Long Short-Term Memory network (LSTM) or the like as the basic relationship extractor and extract the high-level semantic features of the current sentence from the vector set output in step S3.2. The model uses a bidirectional Long Short-Term Memory network (BLSTM) or the like to simultaneously extract the contextual semantic information of the entity pair $e_1$, $e_2$, which improves the recognition accuracy of the entity relationship. The feature of the $j$-th word of the $i$-th corpus entry is expressed as:
$$h_{ij} = [\overrightarrow{h_{ij}} \oplus \overleftarrow{h_{ij}}]$$
where $\oplus$ denotes the combination of the forward channel output and the backward channel output, $[\cdot]$ indicates that the bracketed content is a vector, $\overrightarrow{h_{ij}}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the forward channel, and $\overleftarrow{h_{ij}}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the backward channel;
S4.2, entity relationship prediction: process the feature vector output in step S4.1 with a classifier such as softmax and compute, for the current corpus entry $(e_1, e_2, r, s)$, the estimated probability $\hat{p}(y_n \mid s_i)$ that the relationship $r$ is the relationship $y_n$ ($n \in [1, 8]$) in the predefined set of entity relationship types $Y = [y_1, y_2, \ldots, y_8]$:
$$\hat{p}(y \mid s_i) = \mathrm{softmax}(W h_i + b)$$
where $\mathrm{softmax}(\cdot)$ denotes the softmax classifier operation, $W$ denotes the weight matrix of the classifier network, $s_i$ denotes the sentence of the $i$-th corpus entry, $h_i$ denotes the combination of the feature vectors of all words of the sentence of the $i$-th corpus entry, and $b$ denotes the bias of the classifier network;
The relationship type with the largest estimated probability is the predicted label of the relationship $r$ in the current corpus entry, denoted $\hat{y}$:
$$\hat{y} = \arg\max_{y_n} \hat{p}(y_n \mid s_i)$$
where $\arg\max$ denotes the operation of taking the maximum, $\hat{p}(y_n \mid s_i)$ denotes the conditional probability that the entity relationship type described by the sentence $s_i$ of the $i$-th corpus entry is $y_n$, $y_n$ denotes the $n$-th predefined entity relationship type, and $s_i$ denotes the sentence of the $i$-th corpus entry;
S4.3, cost function optimization: the cost function of the deep neural network is obtained as the negative log-likelihood of the true label $y$:
$$J(\theta) = -\frac{1}{m}\sum_{n=1}^{m} t_n \log \hat{p}(y_n \mid s_i) + \lambda \lVert \theta \rVert^2$$
where $t_n$ denotes the $n$-th component of the one-hot vector of the true label, $\hat{p}(y_n \mid s_i)$ denotes the estimated probability of each predefined relationship type output by the softmax classifier in step S4.2, $m$ denotes the number of predefined relationship types (8 here), $\lambda$ denotes the L2 regularization hyperparameter, $\theta$ denotes the parameters of the entity relationship extraction model, and $\lVert \cdot \rVert$ denotes the norm. Model training is completed by iteratively adjusting the model parameters to minimize the cost function $J(\theta)$.
S5, entity relationship extraction: extract entity relationships from the military scenario text to be processed by using the trained model. This specifically comprises:
S5.1, test text vectorization: vectorize the original military scenario text to be processed sentence by sentence using the processing in step S3;
S5.2, entity relationship prediction: use the model trained in step S4 to predict, sentence by sentence, the semantic relationships in the vectorized military scenario text output in step S5.1, and store the results.
Referring to FIG. 2, the component structure of the device of the present invention is shown, which specifically includes:
The corpus construction module 100 is configured to predefine the target relationship types for entity relationship extraction, annotate the original military scenario texts, and construct the training data set and the test data set for the entity relationship extraction model, and specifically includes:
the entity relationship predefining unit 101, which analyzes the military concepts in authoritative domain dictionaries such as the Chinese Military Encyclopedia, the Military Dictionary, and the Concise Military Dictionary and, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, predefines the entity relationship types to be extracted;
the entity relationship corpus construction unit 102, which manually annotates the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus;
the data set dividing unit 103, which divides the corpus obtained by the entity relationship corpus construction unit 102 into a training data set and a test data set according to a specific ratio.
The syntactic parsing module 200 is configured to parse the sentence of each corpus entry and filter out the sentence components that do not contribute to entity relationship extraction, and specifically includes:
the syntax tree generating unit 201, which parses the sentence of each corpus entry with an open source tool to generate a syntax tree;
the syntax tree pruning unit 202, which prunes the branches and leaves of the syntax tree other than the entities and their root node to generate a syntactic parse subtree;
the subtree recombination unit 203, which recombines the syntactic parse subtree into a text sequence without changing the original order of the words.
The data vectorization module 300 converts the recombined sequences generated by the subtree recombination unit 203 into sets of word embeddings expressed as distributed vectors, and specifically includes:
the training text vectorization unit 301, which segments the recombined sequence of the currently input corpus entry into words to obtain a word set of T words, and converts the words in the set into one-hot vectors based on the authoritative domain dictionaries;
the word embedding generating unit 302, which converts the set of one-hot vectors obtained by the training text vectorization unit 301, word by word, into low-dimensional real-valued word embeddings with an open source tool.
The model training module 400 trains the deep-neural-network-based entity relationship extraction model on the vectorized entity relationship extraction training data set, and specifically includes:
the semantic feature extraction unit 401, which selects a specific neural network as the basic relationship extractor and extracts the high-level semantic features of the current sentence from the vector set output by the word embedding generating unit 302, the model using a bidirectional neural network to simultaneously extract the contextual semantic information of the entity pair $e_1$, $e_2$ so as to improve the recognition accuracy of the entity relationship;
the entity relationship prediction unit 402, which processes the feature vector output by the semantic feature extraction unit 401 with a classifier;
the cost function optimization unit 403, which obtains the cost function of the deep neural network as the negative log-likelihood of the true label $y$, and completes model training by iteratively adjusting the model parameters to minimize the cost function.
The entity relationship extraction module 500 extracts entity relationships from the military scenario text to be processed by using the trained model, and specifically includes:
the test text vectorization unit 501, which vectorizes the original military scenario text to be processed sentence by sentence using the processing of the data vectorization module 300;
the entity relationship prediction unit 502, which uses the model trained by the model training module 400 to predict, sentence by sentence, the semantic relationships in the vectorized military scenario text output by the test text vectorization unit 501, and stores the results.
Claims (9)
1. A military scenario entity relationship extraction method combined with syntactic analysis, characterized by comprising the following steps:
S1, corpus construction: predefining the target relationship types for entity relationship extraction, annotating the original military scenario texts, and constructing the training data set and the test data set for the entity relationship extraction model, which specifically comprises:
S1.1, entity relationship predefinition: predefining the entity relationship types to be extracted by adopting the principles and methods of the Semantic Evaluation conference for defining entity relationship types;
S1.2, entity relationship corpus construction: manually annotating the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus, wherein each entry in the corpus is stored as $(e_1, e_2, r, s)$, where $e_1$ and $e_2$ denote the head entity and the tail entity respectively, $r$ denotes the semantic relationship between the two entities, and $s$ denotes the sentence stating that $e_1$ and $e_2$ have the semantic relationship $r$;
S1.3, data set division: dividing the corpus obtained in step S1.2 into a training data set and a test data set according to a specific ratio;
S2, syntactic parsing: parsing the sentence $s$ of each corpus entry and filtering out the sentence components that do not contribute to entity relationship extraction, which specifically comprises:
S2.1, syntax tree generation: parsing the sentence $s$ of each corpus entry with an open source syntactic parsing tool to generate a syntax tree;
S2.2, parse tree pruning: pruning the sentence components of the syntax tree that are irrelevant to the entity triple $(e_1, e_2, r)$ to generate a syntactic parse subtree;
S2.3, subtree recombination: recombining the syntactic parse subtree into a text sequence without changing the original order of the words;
S3, data vectorization: converting the recombined sequence generated in step S2.3 into a set of word embeddings expressed as distributed vectors, which specifically comprises:
S3.1, training text vectorization: using the authoritative domain dictionaries, converting the currently input recombined sequence $s_i$ into one-hot vectors word by word, where $s_i$ denotes the sentence of the $i$-th input corpus entry;
S3.2, word embedding generation: converting the set of one-hot vectors obtained in step S3.1, word by word, into low-dimensional real-valued word embeddings with an open source word vector conversion tool;
S4, model training: training the deep-neural-network-based entity relationship extraction model on the vectorized entity relationship extraction training data set, which specifically comprises:
S4.1, semantic feature extraction: selecting a specific neural network as the basic relationship extractor and extracting the high-level semantic features of the current sentence from the vector set output in step S3.2, the model using a bidirectional neural network to simultaneously extract the contextual semantic information of the entity pair $e_1$, $e_2$ so as to improve the recognition accuracy of the entity relationship, wherein the feature of the $j$-th word of the $i$-th corpus entry is expressed as:
$$h_{ij} = [\overrightarrow{h_{ij}} \oplus \overleftarrow{h_{ij}}]$$
where $\oplus$ denotes the combination of the forward channel output and the backward channel output, $[\cdot]$ indicates that the bracketed content is a vector, $\overrightarrow{h_{ij}}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the forward channel, and $\overleftarrow{h_{ij}}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the backward channel;
S4.2, entity relationship prediction: processing the feature vector output in step S4.1 with a classifier and computing, for the current corpus entry $(e_1, e_2, r, s)$, the estimated probability $\hat{p}(y_n \mid s_i)$ that the relationship $r$ is the relationship $y_n$ ($n \in [1, 8]$) in the predefined set of entity relationship types $Y = [y_1, y_2, \ldots, y_8]$:
$$\hat{p}(y \mid s_i) = \mathrm{softmax}(W h_i + b)$$
where $\mathrm{softmax}(\cdot)$ denotes the softmax classifier operation, $W$ denotes the weight matrix of the classifier network, $s_i$ denotes the sentence of the $i$-th corpus entry, $h_i$ denotes the combination of the feature vectors of all words of the sentence of the $i$-th corpus entry, and $b$ denotes the bias of the classifier network;
the relationship type with the largest estimated probability being the predicted label of the relationship $r$ in the current corpus entry, denoted $\hat{y}$:
$$\hat{y} = \arg\max_{y_n} \hat{p}(y_n \mid s_i)$$
where $\arg\max$ denotes the operation of taking the maximum, $\hat{p}(y_n \mid s_i)$ denotes the conditional probability that the entity relationship type described by the sentence $s_i$ of the $i$-th corpus entry is $y_n$, $y_n$ denotes the $n$-th predefined entity relationship type, and $s_i$ denotes the sentence of the $i$-th corpus entry;
S4.3, cost function optimization: obtaining the cost function of the deep neural network as the negative log-likelihood of the true label $y$:
$$J(\theta) = -\frac{1}{m}\sum_{n=1}^{m} t_n \log \hat{p}(y_n \mid s_i) + \lambda \lVert \theta \rVert^2$$
where $t_n$ denotes the $n$-th component of the one-hot vector of the true label, $\hat{p}(y_n \mid s_i)$ denotes the estimated probability of each predefined relationship type output by the softmax classifier in step S4.2, $m$ denotes the number of predefined relationship types (8 here), $\lambda$ denotes the L2 regularization hyperparameter, $\theta$ denotes the parameters of the entity relationship extraction model, and $\lVert \cdot \rVert$ denotes the norm, model training being completed by iteratively adjusting the model parameters to minimize the cost function $J(\theta)$;
S5, entity relationship extraction: entity relationships are extracted from the military scenario text to be processed using the trained model, specifically comprising:
S5.1, test text vectorization: the original military scenario text to be processed is vectorized sentence by sentence using the procedure of step S3;
S5.2, entity relationship prediction: the model trained in step S4 performs semantic relationship prediction, sentence by sentence, on the vectorized military scenario text output by step S5.1, and the results are stored.
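As an illustration of the classifier described in steps S4.2 and S4.3, the following minimal Python sketch computes softmax probabilities over 8 relationship types, picks the most probable type, and evaluates a cost made of the negative log-likelihood plus an L2 penalty. It is a sketch under stated assumptions, not the patented implementation: the weight values, the feature size of 3, the regularization weight, and all function names are hypothetical, the feature vector `h` merely stands in for the output of step S4.1, and the exact form of the cost function (for example, any scaling factor) is assumed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def affine(W, h, b):
    """Compute W h + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, h)) + bias
            for row, bias in zip(W, b)]

def predict_relation(W, h, b):
    """Return the estimated probabilities and the index of the arg-max type."""
    probs = softmax(affine(W, h, b))
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs, best

def cost(probs, t, theta, lam):
    """Negative log-likelihood of the one-hot label t plus an L2 penalty."""
    nll = -sum(t_n * math.log(p_n) for t_n, p_n in zip(t, probs) if t_n)
    return nll + lam * sum(x * x for x in theta)

# Hypothetical toy values: 8 relationship types, 3-dimensional sentence feature.
m = 8
W = [[0.1 * (n + 1) * (k - 1) for k in range(3)] for n in range(m)]
b = [0.0] * m
h = [1.0, 0.5, 2.0]

probs, best = predict_relation(W, h, b)
t = [1.0 if n == best else 0.0 for n in range(m)]   # one-hot true label (assumed)
theta = [x for row in W for x in row] + b            # flattened parameters
J = cost(probs, t, theta, lam=1e-4)
```

In a real system `h` would come from the bidirectional network and the parameters would be updated by a gradient-based optimizer; the sketch only evaluates the quantities once.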
2. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the domain authority dictionary comprises the Military Encyclopedia of China, the Military Dictionary, and the Concise Military Dictionary.
3. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the training data set and the test data set are divided in a ratio of 2:1.
4. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the open-source syntactic parsing tool is the Stanford Parser.
5. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the open-source word-vector conversion tool is word2vec.
6. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the specific neural network is a long short-term memory network.
7. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the bidirectional neural network is a bidirectional long short-term memory network.
8. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the classifier comprises a softmax classifier.
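The 2:1 division of the labeled corpus into training and test data can be sketched as follows. This is a minimal illustration only: the function name, the fixed random seed, and the decision to shuffle before splitting are assumptions, since the claims state only the ratio.

```python
import random

def split_corpus(corpus, ratio=(2, 1), seed=0):
    """Shuffle a copy of the corpus and split it by the given ratio."""
    items = list(corpus)
    random.Random(seed).shuffle(items)          # shuffling is an assumption
    cut = len(items) * ratio[0] // sum(ratio)   # size of the training part
    return items[:cut], items[cut:]

# Hypothetical placeholder entries standing in for labeled corpora.
corpus = [f"sentence-{i}" for i in range(9)]
train, test = split_corpus(corpus)
```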
9. An apparatus for military scenario entity relationship extraction combined with syntactic analysis, the apparatus comprising:
corpus construction module 100: predefines the target relationship types for entity relationship extraction, labels original military scenario texts, and constructs the training data set and the test data set of the entity relationship extraction model, specifically comprising:
entity relationship predefinition unit 101: predefines the entity relationship types to be extracted, following the principles and methods used to define entity relationship types at the Semantic Evaluation conference;
entity relationship corpus construction unit 102: manually labels original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus;
data set division unit 103: divides the corpus obtained by the entity relationship corpus construction unit 102 into a training data set and a test data set according to a specific proportion;
syntactic parsing module 200: performs syntactic analysis on the sentence of each corpus in the corpus set and filters out sentence components that do not contribute to entity relationship extraction, specifically comprising:
syntax tree generation unit 201: parses the sentence of each corpus in the corpus set with an open-source tool to generate a syntax tree;
syntax tree pruning unit 202: prunes the branches and leaves of the syntax tree other than the entities and their root nodes to generate a syntactic analysis subtree;
subtree recombination unit 203: recombines the syntactic analysis subtree into a text sequence without changing the original order of the words;
data vectorization module 300: converts the recombined sequences generated by the subtree recombination unit 203 into sets of word embeddings expressed as distributed vectors, specifically comprising:
training original text vectorization unit 301: segments the recombined sequence of the current input corpus into words to obtain a word set of T words, and converts the words in the set into one-hot vectors based on a domain authority dictionary;
word embedding generation unit 302: converts, word by word, the set of one-hot vectors obtained by the training original text vectorization unit 301 into low-dimensional real-valued word embeddings using an open-source tool;
model training module 400: trains the deep-neural-network-based entity relationship extraction model using the vectorized entity relationship extraction training data set, specifically comprising:
semantic feature extraction unit 401: selects a specific neural network as the base relationship extractor to extract high-level semantic features of the current sentence from the vector set output by the word embedding generation unit 302, wherein the model uses a bidirectional neural network to extract the contextual semantic information of the entity pair $e_1$, $e_2$ simultaneously, so as to improve the recognition accuracy of the entity relationship;
entity relationship prediction unit 402: processes the feature vectors output by the semantic feature extraction unit 401 with a classifier;
cost function optimization unit 403: obtains the cost function of the deep neural network by calculating the negative log-likelihood of the true label y, and continuously adjusts the hyper-parameters of the model by minimizing the cost function to complete model training;
entity relationship extraction module 500: extracts entity relationships from the military scenario text to be processed using the trained model, specifically comprising:
test text vectorization unit 501: vectorizes the original military scenario text to be processed sentence by sentence using the procedure of the data vectorization module 300;
entity relationship prediction unit 502: uses the model trained by the model training module 400 to perform semantic relationship prediction, sentence by sentence, on the vectorized military scenario text output by the test text vectorization unit 501, and stores the results.
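The pruning and recombination performed by units 202 and 203 can be illustrated with a small Python sketch. It reflects one possible reading of the claim, assumed here for illustration: the parse is given as a dependency-style list of head indices, every word on the path from an entity to the root is kept, and the kept words are emitted in their original order. The example sentence, the head indices, and the function names are hypothetical, and no particular parser API is implied.

```python
def path_to_root(heads, idx):
    """Indices visited when walking from token idx up to the root (head == -1)."""
    path = []
    while idx != -1:
        path.append(idx)
        idx = heads[idx]
    return path

def prune_and_recombine(tokens, heads, entity_indices):
    """Keep entities and their ancestors, then restore the original word order."""
    keep = set()
    for e in entity_indices:
        keep.update(path_to_root(heads, e))
    return [tok for i, tok in enumerate(tokens) if i in keep]

# Hypothetical parse: heads[i] is the index of token i's parent, -1 marks the root.
tokens = ["The", "fleet", "quickly", "attacked", "the", "northern", "port"]
heads = [1, 3, 3, -1, 6, 6, 3]
entities = [1, 6]                      # "fleet" and "port"

kept = prune_and_recombine(tokens, heads, entities)
```

Here the modifiers "The", "quickly", "the", and "northern" are filtered out, so only the entity pair and the word linking them reach the vectorization stage.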
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910653287.0A CN110597998A (en) | 2019-07-19 | 2019-07-19 | Military scenario entity relationship extraction method and device combined with syntactic analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110597998A true CN110597998A (en) | 2019-12-20 |
Family
ID=68852960
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910653287.0A Pending CN110597998A (en) | 2019-07-19 | 2019-07-19 | Military scenario entity relationship extraction method and device combined with syntactic analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110597998A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177383A (en) * | 2019-12-24 | 2020-05-19 | 上海大学 | Text entity relation automatic classification method fusing text syntactic structure and semantic information |
CN111309925A (en) * | 2020-02-10 | 2020-06-19 | 同方知网(北京)技术有限公司 | Knowledge graph construction method of military equipment |
CN111476035A (en) * | 2020-05-06 | 2020-07-31 | 中国人民解放军国防科技大学 | Chinese open relation prediction method and device, computer equipment and storage medium |
CN111738000A (en) * | 2020-07-22 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Phrase recommendation method and related device |
CN112149423A (en) * | 2020-10-16 | 2020-12-29 | 中国农业科学院农业信息研究所 | Corpus labeling method and system for domain-oriented entity relationship joint extraction |
CN112487206A (en) * | 2020-12-09 | 2021-03-12 | 中国电子科技集团公司第三十研究所 | Entity relationship extraction method for automatically constructing data set |
CN112685513A (en) * | 2021-01-07 | 2021-04-20 | 昆明理工大学 | Al-Si alloy material entity relation extraction method based on text mining |
CN113076421A (en) * | 2021-04-02 | 2021-07-06 | 西安交通大学 | Social noise text entity relation extraction optimization method and system |
CN113076396A (en) * | 2021-03-29 | 2021-07-06 | 中国医学科学院医学信息研究所 | Entity relationship processing method and system oriented to man-machine cooperation |
CN112214610B (en) * | 2020-09-25 | 2023-09-08 | 中国人民解放军国防科技大学 | Entity relationship joint extraction method based on span and knowledge enhancement |
CN117332761A (en) * | 2023-11-30 | 2024-01-02 | 北京一标数字科技有限公司 | PDF document intelligent identification marking system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5930746A (en) * | 1996-03-20 | 1999-07-27 | The Government Of Singapore | Parsing and translating natural language sentences automatically |
CN104933027A (en) * | 2015-06-12 | 2015-09-23 | 华东师范大学 | Open Chinese entity relation extraction method using dependency analysis |
CN109165385A (en) * | 2018-08-29 | 2019-01-08 | 中国人民解放军国防科技大学 | Multi-triple extraction method based on entity relationship joint extraction model |
CN109241538A (en) * | 2018-09-26 | 2019-01-18 | 上海德拓信息技术股份有限公司 | Based on the interdependent Chinese entity relation extraction method of keyword and verb |
CN109902301A (en) * | 2019-02-26 | 2019-06-18 | 广东工业大学 | Relation inference method, device and equipment based on deep neural network |
Non-Patent Citations (8)
Title |
---|
LI ZHEN 等: ""Research on Entity Semantic Relation Extraction in Fusion Domain"", 《2018 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND BUSINESS ANALYTICS (ICDSBA)》 * |
YUANFEI DAI 等: ""Relation Classification via LSTMs Based on Sequence and Tree Structure"", 《IEEE ACCESS》 * |
单赫源 等: ""结合词语规则和SVM模型的军事命名实体关系抽取方法"", 《指挥控制与仿真》 * |
唐弘毅: ""基于深度学习的实体关系抽取的研究"", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
唐敏: ""基于深度学习的中文实体关系抽取方法研究"", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 * |
庄成龙 等: ""基于树核函数的实体语义关系抽取方法研究"", 《中文信息学报》 * |
朱珊珊 等: ""基于BiLSTM_Att的军事领域实体关系抽取研究"", 《智能计算机与应用》 * |
李枫林 等: ""基于深度学习框架的实体关系抽取研究进展"", 《情报科学》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177383B (en) * | 2019-12-24 | 2024-01-16 | 上海大学 | Text entity relation automatic classification method integrating text grammar structure and semantic information |
CN111177383A (en) * | 2019-12-24 | 2020-05-19 | 上海大学 | Text entity relation automatic classification method fusing text syntactic structure and semantic information |
CN111309925B (en) * | 2020-02-10 | 2023-06-30 | 同方知网数字出版技术股份有限公司 | Knowledge graph construction method for military equipment |
CN111309925A (en) * | 2020-02-10 | 2020-06-19 | 同方知网(北京)技术有限公司 | Knowledge graph construction method of military equipment |
CN111476035A (en) * | 2020-05-06 | 2020-07-31 | 中国人民解放军国防科技大学 | Chinese open relation prediction method and device, computer equipment and storage medium |
CN111476035B (en) * | 2020-05-06 | 2023-09-05 | 中国人民解放军国防科技大学 | Chinese open relation prediction method, device, computer equipment and storage medium |
CN111738000B (en) * | 2020-07-22 | 2020-11-24 | 腾讯科技(深圳)有限公司 | Phrase recommendation method and related device |
CN111738000A (en) * | 2020-07-22 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Phrase recommendation method and related device |
CN112214610B (en) * | 2020-09-25 | 2023-09-08 | 中国人民解放军国防科技大学 | Entity relationship joint extraction method based on span and knowledge enhancement |
CN112149423A (en) * | 2020-10-16 | 2020-12-29 | 中国农业科学院农业信息研究所 | Corpus labeling method and system for domain-oriented entity relationship joint extraction |
CN112149423B (en) * | 2020-10-16 | 2024-01-26 | 中国农业科学院农业信息研究所 | Corpus labeling method and system for domain entity relation joint extraction |
CN112487206A (en) * | 2020-12-09 | 2021-03-12 | 中国电子科技集团公司第三十研究所 | Entity relationship extraction method for automatically constructing data set |
CN112685513A (en) * | 2021-01-07 | 2021-04-20 | 昆明理工大学 | Al-Si alloy material entity relation extraction method based on text mining |
CN113076396A (en) * | 2021-03-29 | 2021-07-06 | 中国医学科学院医学信息研究所 | Entity relationship processing method and system oriented to man-machine cooperation |
CN113076396B (en) * | 2021-03-29 | 2023-05-16 | 中国医学科学院医学信息研究所 | Entity relationship processing method and system for man-machine cooperation |
CN113076421A (en) * | 2021-04-02 | 2021-07-06 | 西安交通大学 | Social noise text entity relation extraction optimization method and system |
CN113076421B (en) * | 2021-04-02 | 2023-03-28 | 西安交通大学 | Social noise text entity relationship extraction optimization method and system |
CN117332761A (en) * | 2023-11-30 | 2024-01-02 | 北京一标数字科技有限公司 | PDF document intelligent identification marking system |
CN117332761B (en) * | 2023-11-30 | 2024-02-09 | 北京一标数字科技有限公司 | PDF document intelligent identification marking system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110597998A (en) | Military scenario entity relationship extraction method and device combined with syntactic analysis | |
US11194972B1 (en) | Semantic sentiment analysis method fusing in-depth features and time sequence models | |
CN107291693B (en) | Semantic calculation method for improved word vector model | |
CN106919646B (en) | Chinese text abstract generating system and method | |
CN110598203B (en) | Method and device for extracting entity information of military design document combined with dictionary | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN107908614A (en) | A kind of name entity recognition method based on Bi LSTM | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN110362819B (en) | Text emotion analysis method based on convolutional neural network | |
CN107885721A (en) | A kind of name entity recognition method based on LSTM | |
CN112487143A (en) | Public opinion big data analysis-based multi-label text classification method | |
CN111353306B (en) | Entity relationship and dependency Tree-LSTM-based combined event extraction method | |
CN107797987B (en) | Bi-LSTM-CNN-based mixed corpus named entity identification method | |
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer | |
CN109062904B (en) | Logic predicate extraction method and device | |
CN114020768A (en) | Construction method and application of SQL (structured query language) statement generation model of Chinese natural language | |
CN108733647B (en) | Word vector generation method based on Gaussian distribution | |
CN107977353A (en) | A kind of mixing language material name entity recognition method based on LSTM-CNN | |
Ma et al. | Tagging the web: Building a robust web tagger with neural network | |
CN110134793A (en) | Text sentiment classification method | |
CN114020906A (en) | Chinese medical text information matching method and system based on twin neural network | |
Legrand et al. | Phrase representations for multiword expressions | |
Ansari et al. | Language Identification of Hindi-English tweets using code-mixed BERT | |
Yang et al. | Multi-intent text classification using dual channel convolutional neural network | |
Ronghui et al. | Application of Improved Convolutional Neural Network in Text Classification. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191220 |